Article

Parameter Learning of Bayesian Network with Multiplicative Synergistic Constraints

Yu Zhang and Zhiming Hu
1 School of Mathematics and Economics, Bigdata Modeling and Intelligent Computing Research Institute, Hubei University of Education, Wuhan 430205, China
2 School of Statistics and Mathematics, Zhejiang Gongshang University, Hangzhou 310018, China
3 School of Zhejiang College, Shanghai University of Finance and Economics, Jinhua 321013, China
* Author to whom correspondence should be addressed.
Symmetry 2022, 14(7), 1469; https://doi.org/10.3390/sym14071469
Submission received: 14 June 2022 / Revised: 10 July 2022 / Accepted: 14 July 2022 / Published: 18 July 2022

Abstract: Learning the conditional probability table (CPT) parameters of Bayesian networks (BNs) is a key challenge in real-world decision support applications, especially when only limited data are available. The traditional approach to this challenge is to introduce domain knowledge or expert judgments encoded as qualitative parameter constraints. In this paper, we focus on multiplicative synergistic constraints; the negative and positive multiplicative synergy constraints considered here are symmetric. In order to integrate multiplicative synergistic constraints into the learning process of Bayesian network parameters, we propose four methods to handle them based on the idea of the classical isotonic regression algorithm. The four methods are evaluated by simulations on the lawn moist model and the Asia network and compared with the maximum likelihood estimation (MLE) algorithm. Simulation results show that the proposed methods are superior to the MLE algorithm in the accuracy of parameter learning; they refine the MLE results and yield more accurate estimators of the parameters. The proposed methods can reduce the dependence of parameter learning on expert experience, and combining these constraint methods with Bayesian estimation can improve the accuracy of parameter learning under small sample conditions.

1. Introduction

Bayesian networks (BNs) can model probabilistic dependence relationships among variables in many real-world problems. Therefore, they have become very popular in the artificial intelligence (AI) field over the last two decades. BNs have become a powerful tool with many applications, such as medical diagnosis, financial analysis, bioinformatics and industrial applications [1], target tracking [2], robot control [3], gene analysis [4], ecosystem modeling [5], signal processing [6], and educational measurement [7]. A BN model consists of a network structure and a set of conditional probability tables (CPTs). This paper focuses on the parameter learning of discrete BNs when the structure is known.
In practice, parameter learning requires sufficient samples. If sufficient data are available, BNs can easily be constructed using traditional methods such as the maximum likelihood (ML) method [8]. However, the ML method has difficulty obtaining accurate parameters when the data are insufficient, and the resulting model may then lead to wrong decisions [9]. It is very difficult to collect abundant data under certain circumstances, such as earthquake prediction [10], parole assessment [11], and rare disease diagnosis [12]. Thus, the data are insufficient in many cases, which may lead to an inaccurate structure and inaccurate parameters of a BN. Therefore, many scholars have begun to pay attention to the parameter learning problems of BNs under the condition of small data sets and have proposed algorithms to solve these problems.
Altendorf et al. [13] converted qualitative influence constraints into a penalty function, combined it with the ML function to form an objective optimization function, and then applied the gradient method to solve it. Feelders and van der Gaag [14] converted qualitative influence constraints into order constraints and applied the isotonic regression algorithm to adjust the parameters so that they satisfy the order constraints. de Campos and Ji [15] converted monotonic constraints into a penalty function and applied a convex optimization algorithm to solve the objective optimization function. Ren et al. [16] transformed interval constraints into beta distributions to constrain the prior parameters and combined them with Bayesian estimation to obtain the parameters. Niculescu [17] studied some equality constraints, such as normative constraints and proportional constraints, by introducing Lagrange multipliers. Chang [18] used a Monte Carlo sampling method to extract virtual data from the non-monotonic constraint space and then used the virtual data to construct a prior distribution; finally, Bayesian estimation was used to combine the prior distribution with real data to obtain the parameters of BNs. Etminani et al. [19] proposed a multi-expert parameter learning framework to fuse the knowledge of multiple experts. In addition to the above constraint-based methods, some scholars proposed parameter learning methods for BNs based on the minimum free energy (see [20]) and Noisy-OR gates (see [21]). Zhou et al. [22] studied a class of constraints that is naturally encoded in the edges of BNs with monotonic influences. Gao et al. [23] developed the “MiniMax Fitness” algorithm to address the problem that imposing prior distributions can reduce the fitness between parameters and data. For more studies on BNs, one can refer to [24,25,26,27].
The main idea of the algorithms in the existing literature is to introduce expert experience or domain knowledge into the parameter learning process of BNs through constraints. However, the constraints considered in the existing literature mainly concern network parameters with a single parent node; constraints under the synergistic condition of multiple parent nodes are rarely studied, and the methods involved are relatively complex. There are many constraints under the condition of multiple parent nodes, such as additive synergy and multiplicative synergy (see [28]). In this paper, isotonic regression is used to study the parameter learning of BNs under multiplicative synergistic constraints. The proposed methods can reduce the dependence of parameter learning on expert experience, and combining these constraint methods with Bayesian estimation can improve the accuracy of parameter learning under small sample conditions.
The remainder of this paper is organized as follows. In Section 2, we briefly review some basic theory of BNs and parameter learning. In Section 3, the classical isotonic regression algorithm is introduced. In Section 4, we present the parameter learning algorithms of this paper under multiplicative synergistic constraints, following the idea of the pool adjacent violators (PAV) algorithm. In Section 5, the effectiveness and performance of the four proposed algorithms are verified by simulations. In Section 6, some conclusions are given.

2. Preliminaries

In this section, some concepts of Bayesian networks are briefly reviewed to make the paper self-contained.

2.1. Bayesian Networks

Bayesian networks are represented as a directed acyclic graph that contains some nodes and edges. Nodes represent random variables, while edges represent the probabilistic relationships between random variables. For each variable node $X_i$, a conditional probability table is specified as $P(X_i \mid \pi(X_i))$, which describes the probability over the possible values of $X_i$ and the possible configurations of its parent variables $\pi(X_i)$. In a BN, the joint probability can be written as follows:
$P(X_1, \ldots, X_n) = \prod_{i=1}^{n} P(X_i \mid \pi(X_i)).$
To illustrate BNs more clearly, the lawn moist BN depicted in Figure 1 is employed, where “1” stands for “Cloudy”, “2” stands for “Rain”, “3” stands for “Sprinkle”, and “4” stands for “Wet”. This model will also be used as an experimental model later. The variables in the network are binary, taking the value 0 or 1; a variable is in state 1 if the corresponding event occurs or is present. Our purpose is to learn estimators of the parameters of this BN.
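As a minimal illustration of this factorization, the following Python sketch evaluates the joint probability of the lawn moist model by multiplying CPT entries. The CPT of Wet follows Table 1; the CPTs of Cloudy, Rain and Sprinkle are not reported in this paper, so the values used for them are illustrative assumptions.

```python
# Chain-rule factorization P(C, R, S, W) = P(C) P(R|C) P(S|C) P(W|R,S)
# for the lawn moist model (C = Cloudy, R = Rain, S = Sprinkle, W = Wet).
P_C = {1: 0.5, 0: 0.5}                                  # assumed prior P(Cloudy)
P_R = {1: {1: 0.8, 0: 0.2}, 0: {1: 0.2, 0: 0.8}}        # assumed P(Rain | Cloudy)
P_S = {1: {1: 0.1, 0: 0.9}, 0: {1: 0.5, 0: 0.5}}        # assumed P(Sprinkle | Cloudy)
P_W = {(0, 0): {1: 0.40, 0: 0.60}, (0, 1): {1: 0.60, 0: 0.40},
       (1, 0): {1: 0.60, 0: 0.40}, (1, 1): {1: 0.85, 0: 0.15}}  # P(Wet | Rain, Sprinkle), Table 1

def joint(c, r, s, w):
    """P(C=c, R=r, S=s, W=w) via the BN factorization."""
    return P_C[c] * P_R[c][r] * P_S[c][s] * P_W[(r, s)][w]

# The joint distribution sums to one over all 16 configurations.
print(sum(joint(c, r, s, w)
          for c in (0, 1) for r in (0, 1) for s in (0, 1) for w in (0, 1)))
```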

2.2. Maximum Likelihood Estimation

Maximum likelihood estimation is an important method for the parameter learning of BNs. The MLE of the parameters is
$\theta_{ijk}^{*} = \begin{cases} \dfrac{m_{ijk}}{\sum_{k=1}^{r_i} m_{ijk}}, & \sum_{k=1}^{r_i} m_{ijk} > 0, \\[4pt] \dfrac{1}{r_i}, & \sum_{k=1}^{r_i} m_{ijk} = 0, \end{cases}$
where $i$ is the index of node $X_i$, $j$ is the index of the parent nodes’ configuration, $k$ is the index of the states of $X_i$, $r_i$ is the number of states of $X_i$, and $m_{ijk}$ is the number of cases in the data set that satisfy $X_i = k$ and $\pi(X_i) = j$ (see [1]).
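A small sketch of this estimator, assuming the counts $m_{ijk}$ for one node have already been tallied from the data set (the function name and the array layout are our own):

```python
import numpy as np

def mle_cpt(counts):
    """MLE of theta_{ijk} for a single node: counts[j, k] = m_{ijk}, the number of
    cases with parent configuration j and child state k. Parent configurations
    with no observations fall back to the uniform value 1/r_i, as in the formula."""
    counts = np.asarray(counts, dtype=float)
    r_i = counts.shape[1]
    totals = counts.sum(axis=1, keepdims=True)            # sum_k m_{ijk} for each j
    safe_totals = np.where(totals > 0, totals, 1.0)       # avoid division by zero
    return np.where(totals > 0, counts / safe_totals, 1.0 / r_i)

# Hypothetical counts for a binary node with four parent configurations;
# the last configuration was never observed, so its row becomes [0.5, 0.5].
print(mle_cpt([[6, 4], [1, 3], [2, 2], [0, 0]]))
```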

3. Isotonic Regression

Suppose we know that a set of quantities satisfies an order relation, but their values obtained by observation or counting violate it; the values are then adjusted by weighted averaging. This is the problem solved by isotonic regression. Let $u_1, u_2, \ldots, u_k$ be estimators that are required to satisfy $u_1 \ge u_2 \ge \cdots \ge u_k$. The pool adjacent violators (PAV) algorithm proceeds as follows:
  • Step 1: Starting from $u_1$, compare $u_1, u_2, \ldots, u_k$ pairwise. If $u_i \ge u_{i+1}$, no adjustment is made; if $u_i < u_{i+1} < \cdots < u_{i+j}$, $1 \le j \le k - i$, set
    $u_i = u_{i+1} = \cdots = u_{i+j} = \frac{1}{j+1}\sum_{m=0}^{j} u_{i+m},$
    and so on.
  • Step 2: If $u_1, u_2, \ldots, u_k$ still violate the required order after Step 1, repeat the process of Step 1 starting again from $u_1$. The values of $u_1, u_2, \ldots, u_k$ adjusted by the PAV algorithm are thus obtained.
The solution of isotonic regression is unique (see [29]); that is, the values obtained by the above process satisfying $u_1 \ge u_2 \ge \cdots \ge u_k$ are unique.
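The following Python sketch implements this pooling idea for a non-increasing order. It is a minimal version of the standard PAV procedure; instead of restarting the scan from $u_1$, it backs up to re-check the previous block after each pooling, which yields the same unique solution.

```python
def pav_decreasing(u):
    """Pool adjacent violators (PAV) for a non-increasing fit
    (u_1 >= u_2 >= ... >= u_k)."""
    # Each block: [pooled value, number of original entries it covers].
    blocks = [[float(x), 1] for x in u]
    i = 0
    while i < len(blocks) - 1:
        if blocks[i][0] >= blocks[i + 1][0]:
            i += 1
            continue
        # Violation u_i < u_{i+1}: pool the two blocks into their average.
        v1, n1 = blocks[i]
        v2, n2 = blocks[i + 1]
        blocks[i:i + 2] = [[(v1 * n1 + v2 * n2) / (n1 + n2), n1 + n2]]
        i = max(i - 1, 0)  # pooling may create a new violation to the left
    # Expand the pooled blocks back to one value per original position.
    return [v for v, n in blocks for _ in range(n)]

print(pav_decreasing([5.0, 3.0, 4.0, 4.5, 2.0]))
# -> [5.0, 3.8333..., 3.8333..., 3.8333..., 2.0]
```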

4. Parameter Learning of BNs under Multiplicative Synergistic Constraints

4.1. The Model of Multiplicative Synergy

Multiplicative synergy constraints describe the synergy size relationship of parameters among three node variables. Let A , B and C be the three variables in the BNs, and all of them are discrete binary variables. Suppose that A and B are the parent nodes of C , then the negative multiplicative synergy constraint of A and B to C can be expressed as:
$P(c \mid a, b)\,P(c \mid \bar{a}, \bar{b}) \le P(c \mid \bar{a}, b)\,P(c \mid a, \bar{b}).$
Similarly, positive multiplicative synergy constraint can be expressed as:
$P(c \mid a, b)\,P(c \mid \bar{a}, \bar{b}) \ge P(c \mid \bar{a}, b)\,P(c \mid a, \bar{b}).$
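As a small illustration (the function name is ours), the negative constraint reduces to a single comparison of two products of conditional probabilities:

```python
def satisfies_negative_synergy(p_ab, p_nanb, p_nab, p_anb, tol=1e-12):
    """Check the negative multiplicative synergy constraint
    P(c|a,b) * P(c|~a,~b) <= P(c|~a,b) * P(c|a,~b).
    Arguments are the four conditional probabilities of c under the
    corresponding parent configurations."""
    return p_ab * p_nanb <= p_nab * p_anb + tol

# The true parameters of the lawn moist model (Table 1) satisfy the constraint:
print(satisfies_negative_synergy(0.85, 0.4, 0.6, 0.6))  # 0.34 <= 0.36 -> True
```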

4.2. Description of Algorithm

Under the condition of small data sets, if the variables $A$, $B$ and $C$ in the network are known to satisfy the above multiplicative synergy constraint, the key question is how to apply this constraint, in combination with the small data set, to restrict the parameter learning of the Bayesian network and obtain more accurate results.
The algorithm steps adopted in this paper are as follows:
  • Step 1: Obtain the relevant parameters from the existing small data set using the maximum likelihood estimation algorithm.
  • Step 2: For the part of the structure with multiple parent nodes, judge whether the corresponding parameters satisfy the multiplicative synergy constraint. If they do, go to Step 5; otherwise, go to Step 3.
  • Step 3: Treat the left-hand and right-hand sides of the multiplicative synergy constraint each as a whole, and modify them respectively using the “averaging” idea of the PAV algorithm.
  • Step 4: After adjusting the wholes (the products of parameters) in Step 3, modify each individual parameter.
  • Step 5: Obtain the final parameter learning result.
Steps 3 and 4 are the core of the algorithm and are described in detail below. Following the idea of the PAV algorithm, this paper proposes four different methods to carry out the order preservation of the parameters described in Steps 3 and 4. Taking the negative multiplicative synergy constraint of Section 4.1 as an example, the specific calculation methods are given in turn; order preservation under the positive multiplicative synergy constraint can be handled in a similar way. The four proposed methods are as follows:
Method 1
$\mathrm{mean} = \frac{P(c \mid a, b)\,P(c \mid \bar{a}, \bar{b}) + P(c \mid \bar{a}, b)\,P(c \mid a, \bar{b})}{2}$
$d = \left|\mathrm{mean} - P(c \mid a, b)\,P(c \mid \bar{a}, \bar{b})\right|$
$P(c \mid a, b) \leftarrow P(c \mid a, b) - d/2, \qquad P(c \mid \bar{a}, \bar{b}) \leftarrow P(c \mid \bar{a}, \bar{b}) - d/2,$
$P(c \mid \bar{a}, b) \leftarrow P(c \mid \bar{a}, b) + d/2, \qquad P(c \mid a, \bar{b}) \leftarrow P(c \mid a, \bar{b}) + d/2.$
Method 2
$\mathrm{mean} = \frac{(N_1 + N_4)\,P(c \mid a, b)\,P(c \mid \bar{a}, \bar{b}) + (N_2 + N_3)\,P(c \mid \bar{a}, b)\,P(c \mid a, \bar{b})}{N_1 + N_2 + N_3 + N_4}$
$d = \left|\mathrm{mean} - P(c \mid a, b)\,P(c \mid \bar{a}, \bar{b})\right|$
$P(c \mid a, b) \leftarrow P(c \mid a, b) - \frac{d N_1}{N_1 + N_4}, \qquad P(c \mid \bar{a}, \bar{b}) \leftarrow P(c \mid \bar{a}, \bar{b}) - \frac{d N_4}{N_1 + N_4},$
$P(c \mid \bar{a}, b) \leftarrow P(c \mid \bar{a}, b) + \frac{d N_2}{N_2 + N_3}, \qquad P(c \mid a, \bar{b}) \leftarrow P(c \mid a, \bar{b}) + \frac{d N_3}{N_2 + N_3}.$
Method 3
$\mathrm{mean} = \frac{P(c \mid a, b)\,P(c \mid \bar{a}, \bar{b}) + P(c \mid \bar{a}, b)\,P(c \mid a, \bar{b})}{2}$
$d = \left|\mathrm{mean} - P(c \mid a, b)\,P(c \mid \bar{a}, \bar{b})\right|$
$P(c \mid a, b) \leftarrow P(c \mid a, b) - \frac{d N_1}{N_1 + N_4}, \qquad P(c \mid \bar{a}, \bar{b}) \leftarrow P(c \mid \bar{a}, \bar{b}) - \frac{d N_4}{N_1 + N_4},$
$P(c \mid \bar{a}, b) \leftarrow P(c \mid \bar{a}, b) + \frac{d N_2}{N_2 + N_3}, \qquad P(c \mid a, \bar{b}) \leftarrow P(c \mid a, \bar{b}) + \frac{d N_3}{N_2 + N_3}.$
Method 4
$\mathrm{mean} = \frac{(N_1 + N_4)\,P(c \mid a, b)\,P(c \mid \bar{a}, \bar{b}) + (N_2 + N_3)\,P(c \mid \bar{a}, b)\,P(c \mid a, \bar{b})}{N_1 + N_2 + N_3 + N_4}$
$d = \left|\mathrm{mean} - P(c \mid a, b)\,P(c \mid \bar{a}, \bar{b})\right|$
$P(c \mid a, b) \leftarrow P(c \mid a, b) - d/2, \qquad P(c \mid \bar{a}, \bar{b}) \leftarrow P(c \mid \bar{a}, \bar{b}) - d/2,$
$P(c \mid \bar{a}, b) \leftarrow P(c \mid \bar{a}, b) + d/2, \qquad P(c \mid a, \bar{b}) \leftarrow P(c \mid a, \bar{b}) + d/2,$
where $N_1$ is the sample size for the parent-node configuration $(a, b)$, $N_2$ is the sample size for $(\bar{a}, b)$, $N_3$ is the sample size for $(a, \bar{b})$, and $N_4$ is the sample size for $(\bar{a}, \bar{b})$.
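For concreteness, the sketch below shows one possible reading of Method 1 (the equal-split variant), iterated as in Steps 2 to 4 until the negative synergy constraint holds; Methods 2 to 4 only change how the mean is weighted and how the correction d is split between the two factors. The function name and the stopping tolerance are our own choices.

```python
def method1_adjust(p1, p2, p3, p4, max_iter=1000, tol=1e-12):
    """Repeatedly move the two products toward their mean until the negative
    synergy constraint p1*p4 <= p2*p3 holds, where p1 = P(c|a,b), p2 = P(c|~a,b),
    p3 = P(c|a,~b), p4 = P(c|~a,~b). The per-step correction d is half of the
    current constraint violation."""
    for _ in range(max_iter):
        if p1 * p4 <= p2 * p3 + tol:
            break
        mean = (p1 * p4 + p2 * p3) / 2.0
        d = abs(mean - p1 * p4)
        p1, p4 = p1 - d / 2.0, p4 - d / 2.0
        p2, p3 = p2 + d / 2.0, p3 + d / 2.0
    return p1, p2, p3, p4

# Example: MLE-style estimates that violate the constraint (0.9*0.5 > 0.4*0.6).
print(method1_adjust(0.9, 0.4, 0.6, 0.5))
```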
The algorithms obtained from Methods 1 to 4 are convergent, and we can obtain the theorem as follows:
Theorem 1.
Let $p_{01} p_{04} > p_{02} p_{03}$. Then there exists $k$ such that
$p_{k1} p_{k4} \le p_{k2} p_{k3}$
after $k$ adjustment steps of Method 1, where $p_1 = P(c \mid a, b)$, $p_2 = P(c \mid \bar{a}, b)$, $p_3 = P(c \mid a, \bar{b})$, $p_4 = P(c \mid \bar{a}, \bar{b})$, $p_{ki}$ denotes the value of $p_i$ ($i = 1, 2, 3, 4$) after $k$ adjustment steps, $m_k = \left(p_{k1} p_{k4} + p_{k2} p_{k3}\right)/2$, and $d_k = \left|m_k - p_{k1} p_{k4}\right|$.
Proof of Theorem 1.
We give the proof for Method 1; the proofs for the other three methods are analogous.
To prove $p_{k1} p_{k4} \le p_{k2} p_{k3}$, it suffices to show that
$p_{k1} p_{k4} - p_{k2} p_{k3} \le 0.$
By the adjustment rule of Method 1,
$p_{k1} = p_{k-1,1} - \frac{d_{k-1}}{2}, \quad p_{k4} = p_{k-1,4} - \frac{d_{k-1}}{2}, \quad p_{k2} = p_{k-1,2} + \frac{d_{k-1}}{2}, \quad p_{k3} = p_{k-1,3} + \frac{d_{k-1}}{2},$
then
$\begin{aligned} p_{k1} p_{k4} - p_{k2} p_{k3} &= \left(p_{k-1,1} - \tfrac{d_{k-1}}{2}\right)\left(p_{k-1,4} - \tfrac{d_{k-1}}{2}\right) - \left(p_{k-1,2} + \tfrac{d_{k-1}}{2}\right)\left(p_{k-1,3} + \tfrac{d_{k-1}}{2}\right) \\ &= p_{k-1,1} p_{k-1,4} - p_{k-1,2} p_{k-1,3} - \tfrac{d_{k-1}}{2}\left(p_{k-1,1} + p_{k-1,4} + p_{k-1,2} + p_{k-1,3}\right) \\ &= p_{k-2,1} p_{k-2,4} - p_{k-2,2} p_{k-2,3} - \tfrac{d_{k-2} + d_{k-1}}{2}\left(p_{k-2,1} + p_{k-2,4} + p_{k-2,2} + p_{k-2,3}\right) \\ &= \cdots \\ &= p_{01} p_{04} - p_{02} p_{03} - \tfrac{d_0 + d_1 + \cdots + d_{k-1}}{2}\left(p_{01} + p_{04} + p_{02} + p_{03}\right) \\ &= p_{01} p_{04} - p_{02} p_{03} - \tfrac{d_0 + d_1 + \cdots + d_{k-1}}{2}\, M, \end{aligned}$
where $p_{01} + p_{04} + p_{02} + p_{03} = M$ ($M$ is a positive constant; the adjustment leaves this sum unchanged at every step).
Therefore, we find that
$d_k = \left|m_k - p_{k1} p_{k4}\right| = \left|\frac{p_{k1} p_{k4} + p_{k2} p_{k3}}{2} - p_{k1} p_{k4}\right| = \frac{p_{k1} p_{k4} - p_{k2} p_{k3}}{2} = \frac{p_{k-1,1} p_{k-1,4} - p_{k-1,2} p_{k-1,3} - \frac{d_{k-1}}{2} M}{2}.$
Since $p_{k-1,1} p_{k-1,4} - p_{k-1,2} p_{k-1,3} > 0$, we have
$d_{k-1} = \left|m_{k-1} - p_{k-1,1} p_{k-1,4}\right| = \frac{p_{k-1,1} p_{k-1,4} - p_{k-1,2} p_{k-1,3}}{2}.$
Hence,
$p_{k-1,1} p_{k-1,4} - p_{k-1,2} p_{k-1,3} = 2 d_{k-1}.$
Therefore,
$d_k = \frac{p_{k-1,1} p_{k-1,4} - p_{k-1,2} p_{k-1,3} - \frac{d_{k-1}}{2} M}{2} = \frac{2 d_{k-1} - \frac{d_{k-1}}{2} M}{2} = \frac{2 d_{k-1}\left(1 - \frac{M}{4}\right)}{2} = d_{k-1}\left(1 - \frac{M}{4}\right).$
By $0 < M \le 4$, we have
$0 \le 1 - \frac{M}{4} < 1, \quad \text{and thus} \quad d_k = d_{k-1}\left(1 - \frac{M}{4}\right) \le d_{k-1}.$
When $M = 4$, $d_k = 0$ for all $k \ge 1$, and then
$p_{k1} p_{k4} - p_{k2} p_{k3} = p_{01} p_{04} - p_{02} p_{03} - \frac{d_0 + d_1 + \cdots + d_{k-1}}{2} M = p_{01} p_{04} - p_{02} p_{03} - \frac{d_0}{2}\cdot 4 = 2 d_0 - 2 d_0 = 0.$
Therefore, Theorem 1 holds in this case.
When $0 < M < 4$,
$d_k = d_{k-1}\left(1 - \frac{M}{4}\right) = d_{k-2}\left(1 - \frac{M}{4}\right)^2 = \cdots = d_0\left(1 - \frac{M}{4}\right)^k,$
thus
$p_{k1} p_{k4} - p_{k2} p_{k3} = p_{01} p_{04} - p_{02} p_{03} - \frac{d_0 + d_1 + \cdots + d_{k-1}}{2} M = 2 d_0 - 8 d_0\left[1 - \left(1 - \frac{M}{4}\right)^k\right].$
Letting $p_{k1} p_{k4} - p_{k2} p_{k3} = 2 d_0 - 8 d_0\left[1 - \left(1 - \frac{M}{4}\right)^k\right] \le 0$, we find that
$\left(1 - \frac{M}{4}\right)^k \le \frac{3}{4}.$
Since $0 < M < 4$, we have
$0 < 1 - \frac{M}{4} < 1.$
Hence, there exists $k$ such that
$\left(1 - \frac{M}{4}\right)^k \le \frac{3}{4}.$
From the above, we know that there exists $k$ such that
$p_{k1} p_{k4} \le p_{k2} p_{k3}.$
This completes the proof of Theorem 1. □
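As a quick sanity check of the decay factor $(1 - M/4)$ used above, the short script below (with arbitrary test probabilities, not values from the paper) iterates the Method 1 update and compares the observed gap $d_k$ with the predicted geometric decay.

```python
# Numerical check of d_k = d_{k-1} * (1 - M/4), where M = p1 + p2 + p3 + p4
# is invariant under the Method 1 update (two entries lose d/2, two gain d/2).
p1, p2, p3, p4 = 0.9, 0.4, 0.6, 0.5      # p1*p4 > p2*p3, so the constraint is violated
M = p1 + p2 + p3 + p4
d_pred = (p1 * p4 - p2 * p3) / 2.0       # d_0
for k in range(4):
    d_actual = (p1 * p4 - p2 * p3) / 2.0
    print(k, round(d_actual, 6), round(d_pred, 6))   # the two columns agree
    p1, p4 = p1 - d_actual / 2.0, p4 - d_actual / 2.0
    p2, p3 = p2 + d_actual / 2.0, p3 + d_actual / 2.0
    d_pred *= 1.0 - M / 4.0              # predicted decay
```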

5. Experiments

In this section, we verify the effectiveness and performance of the four algorithms proposed in this paper through two simulations.

5.1. Experiment 1

5.1.1. Simulation Model

This simulation adopts the lawn moist model, as shown in Figure 1.
In the model, R , S and W meet the multiplicative synergy constraint, which can be expressed as follows:
$P(W = 1 \mid R = 1, S = 1)\,P(W = 1 \mid R = 0, S = 0) \le P(W = 1 \mid R = 1, S = 0)\,P(W = 1 \mid R = 0, S = 1).$
In the above inequality, when the value of the variable is 1, it means that the event occurs, and 0 means that it does not occur. Table 1 shows the real parameters in the network. In order to quantitatively analyze the performance of several methods in this paper, KL divergence from real parameters is introduced as an index to measure the accuracy of the algorithm. The expression of KL divergence (see [30]) is as follows:
$KL(\hat{\theta}, \theta) = \sum_{X} p_{\hat{\theta}}(X) \ln \frac{p_{\hat{\theta}}(X)}{p_{\theta}(X)}.$
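A small sketch of this measure: the paper sums over the distribution induced by the learned and true parameters, while the helper below (its name and epsilon smoothing are our own) computes the divergence between any two discrete probability vectors over the same states.

```python
import numpy as np

def kl_divergence(theta_hat, theta, eps=1e-12):
    """KL(theta_hat || theta) between two discrete distributions on the same states."""
    p = np.asarray(theta_hat, dtype=float)
    q = np.asarray(theta, dtype=float)
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

# Example: an estimated column vs. the true column P(W | R=1, S=1) of Table 1.
print(kl_divergence([0.0545, 0.9455], [0.15, 0.85]))
```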

5.1.2. Simulation Analysis

With the sample size set to 20, the simulation results obtained by the algorithms are shown in Table 2, Table 3, Table 4, Table 5 and Table 6, and the KL divergences between the learning results and the real parameters are shown in Table 7.
Table 2, Table 3, Table 4, Table 5 and Table 6 show the network parameters learned by the MLE algorithm and the four algorithms proposed in this paper when the sample size is 20. Table 7 shows the KL divergences between the learning results and the real parameters. The experimental results show that, when the sample size is small, the KL divergence between the parameters learned by each of the four proposed methods and the real parameters is smaller than that between the MLE parameters and the real parameters. This indicates that each proposed method is superior to the MLE algorithm in the accuracy of parameter learning. In addition, it can be seen from Table 7 that, among the four proposed algorithms, Method 4 achieves the highest learning accuracy, while Method 2 achieves the lowest.
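For readers who wish to reproduce the flavor of this experiment, the following end-to-end sketch samples a small data set from the lawn moist model, fits $P(W \mid R, S)$ by MLE, applies the Method 1 correction when the constraint is violated, and reports a KL-type error. The priors of Cloudy, Rain and Sprinkle are assumptions (only Table 1 is reported in the paper), so the printed numbers will not match Tables 2 to 7.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed upstream CPTs; only P(W=1 | R, S) (Table 1) comes from the paper.
p_c = 0.5
p_r_given_c = {1: 0.8, 0: 0.2}
p_s_given_c = {1: 0.1, 0: 0.5}
p_w_true = {(0, 0): 0.40, (0, 1): 0.60, (1, 0): 0.60, (1, 1): 0.85}

def sample(n):
    """Draw n complete cases (r, s, w) from the lawn moist model."""
    cases = []
    for _ in range(n):
        c = int(rng.random() < p_c)
        r = int(rng.random() < p_r_given_c[c])
        s = int(rng.random() < p_s_given_c[c])
        w = int(rng.random() < p_w_true[(r, s)])
        cases.append((r, s, w))
    return cases

def mle_pw(cases):
    """MLE of P(W=1 | R, S) with a uniform fallback for unseen configurations."""
    est = {}
    for rs in p_w_true:
        ws = [w for r, s, w in cases if (r, s) == rs]
        est[rs] = sum(ws) / len(ws) if ws else 0.5
    return est

est = mle_pw(sample(20))

# Method 1 correction: iterate until P(W=1|1,1)P(W=1|0,0) <= P(W=1|1,0)P(W=1|0,1).
p1, p2, p3, p4 = est[(1, 1)], est[(0, 1)], est[(1, 0)], est[(0, 0)]
while p1 * p4 > p2 * p3 + 1e-12:
    d = (p1 * p4 - p2 * p3) / 2.0
    p1, p4 = p1 - d / 2.0, p4 - d / 2.0
    p2, p3 = p2 + d / 2.0, p3 + d / 2.0
est[(1, 1)], est[(0, 1)], est[(1, 0)], est[(0, 0)] = p1, p2, p3, p4

# KL-type error of the learned CPT columns against the truth.
kl = sum(p_hat * np.log(max(p_hat, 1e-12) / max(p_tru, 1e-12))
         for rs, p_one in p_w_true.items()
         for p_hat, p_tru in ((est[rs], p_one), (1.0 - est[rs], 1.0 - p_one)))
print(round(float(kl), 4))
```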

5.2. Experiment 2

5.2.1. Simulation Model

This simulation adopts the Asia network, as shown in Figure 2, where “1” stands for $X_1$, “2” stands for $X_2$, “3” stands for $X_3$, “4” stands for $X_4$, “5” stands for $X_5$, “6” stands for $X_6$, “7” stands for $X_7$, and “8” stands for $X_8$.
In the model, $X_5$, $X_7$ and $X_8$ satisfy the multiplicative synergy constraint, which can be expressed as follows:
$P(X_8 = 1 \mid X_5 = 1, X_7 = 1)\,P(X_8 = 1 \mid X_5 = 0, X_7 = 0) \le P(X_8 = 1 \mid X_5 = 1, X_7 = 0)\,P(X_8 = 1 \mid X_5 = 0, X_7 = 1).$
In the above inequality, if the value of the variable is 1, it means that the event occurs, and 0 means that it does not occur. Table 8 shows the real parameters in the network.
The explanations of X i are as follows:
X 1 —Visit To Asia (2): Visit, No_Visit;
X 2 —Tuberculosis (2): Present, Absent;
X 3 —Smoking (2): Smoker, Nonsmoker;
X 4 —Lung Cancer (2): Present, Absent;
X 5 —Tuberculosis or Lung Cancer (2): True, False;
X 6 —Xray Result (2): Abnormal, Normal;
X 7 —Bronchitis (2): Present, Absent;
X 8 —Dyspnoea (2): Present, Absent.

5.2.2. Simulation Analysis

With the sample size set to 20, the simulation results obtained by the algorithms are shown in Table 9, Table 10, Table 11, Table 12 and Table 13, and the KL divergences between the learning results and the real parameters are shown in Table 14.
Table 9, Table 10, Table 11, Table 12 and Table 13 show the network parameters learned by MLE algorithm and the four algorithms proposed in this paper when the sample size is 20. Table 14 shows the KL divergences between the learning results and the real parameters. The experimental results show that the KL divergence between the learning parameters of the four methods proposed in this paper and real parameters is smaller than that between the learning parameters of MLE and the real parameters when the sample size is small. It shows that each method proposed in this paper is superior to the MLE algorithm in the accuracy of parameter learning. In addition, it can be seen from Table 14 that among the four algorithms proposed in this paper, the learning accuracy of Method 1 is the highest, while that of Method 3 is the lowest.

6. Conclusions

Following the idea of the PAV algorithm, this paper proposes four methods to deal with multiplicative synergy constraints. We analyze and compare the algorithms in terms of accuracy. The simulation results show that the four proposed algorithms are superior to the MLE algorithm in the accuracy of parameter learning; they refine the MLE results and yield more accurate estimators of the parameters.
The methods proposed in this paper can reduce the dependence of parameter learning on expert experience. Combining these constraint methods with Bayesian estimation can improve the accuracy of parameter learning under small sample conditions. However, the algorithms in this paper also have limitations: when there are many parent nodes, it is difficult to specify the order relationships among the network parameters. In future research, the constraints presented in this paper can be combined with other existing constraints to further reduce the dependence on expert experience and improve the accuracy of parameter learning.

Author Contributions

Methodology, software, writing—original draft, writing—review and editing, Y.Z.; funding acquisition, supervision, project administration, Z.H.; validation, Y.Z. and Z.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by Bigdata Modeling and Intelligent Computing Research Institute, Hubei University of Education, Scientific Research Project of Education Department of Zhejiang Province (Y202147034), Zhejiang College of Shanghai University of Finance and Economics for Scientific Research Projects at the Provincial and Above Levels, and the National Statistical Science Research Project of China (2021LY100).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Acknowledgments

The authors would like to thank everyone for help.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zhang, L.; Guo, H. Introduction to Bayesian Networks; Science Press: Beijing, China, 2006.
  2. Mascaro, S.; Nicholson, A.E.; Korb, K.B. Anomaly detection in vessel tracks using Bayesian networks. Int. J. Approx. Reason. 2014, 55, 84–98.
  3. Infantes, G.; Ghallab, M.; Ingrand, F. Learning the behavior model of a robot. Auton. Robot. 2010, 30, 157–177.
  4. Tamada, Y.; Imoto, S.; Araki, H.; Nagasaki, M.; Print, C.; Charnock-Jones, D.S.; Miyano, S. Estimating Genome-Wide Gene Networks Using Nonparametric Bayesian Network Models on Massively Parallel Computers. IEEE/ACM Trans. Comput. Biol. Bioinform. 2010, 8, 683–697.
  5. Landuyt, D.; Broekx, S.; D’hondt, R.; Engelen, G.; Aertsens, J.; Goethals, P.L. A review of Bayesian belief networks in ecosystem service modelling. Environ. Model. Softw. 2013, 46, 1–11.
  6. Wachowski, N.; Azimi-Sadjadi, M.R. Detection and Classification of Nonstationary Transient Signals Using Sparse Approximations and Bayesian Networks. IEEE/ACM Trans. Audio Speech Lang. Process. 2014, 22, 1750–1764.
  7. Almond, R.G.; Mislevy, R.J.; Steinberg, L.S.; Yan, D.; Williamson, D.M. Bayesian Networks in Educational Assessment; Springer: Berlin/Heidelberg, Germany, 2015.
  8. Redner, R.A.; Walker, H.F. Mixture Densities, Maximum Likelihood and the EM Algorithm. SIAM Rev. 1984, 26, 195–239.
  9. Isozaki, T.; Kato, N.; Ueno, M. “Data Temperature” in minimum free energies for parameter learning of Bayesian networks. Int. J. Artif. Intell. Tools 2009, 18, 653–671.
  10. Hu, J.; Tang, X.; Qiu, J. A Bayesian network approach for predicting seismic liquefaction based on interpretive structural modeling. Georisk Assess. Manag. Risk Eng. Syst. Geohazards 2015, 9, 200–217.
  11. Constantinou, A.C.; Freestone, M.; Marsh, W.; Fenton, N.; Coid, J. Risk assessment and risk management of violent reoffending among prisoners. Expert Syst. Appl. 2015, 42, 7511–7529.
  12. Seixas, F.L.; Zadrozny, B.; Laks, J.; Conci, A.; Saade, D.C.M. A Bayesian network decision model for supporting the diagnosis of dementia, Alzheimer’s disease and mild cognitive impairment. Comput. Biol. Med. 2014, 51, 140–158.
  13. Altendorf, E.; Restificar, A.; Dietterich, T. Learning from sparse data by exploiting monotonicity constraints. In Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI 2005), Edinburgh, UK, 26–29 July 2005; pp. 18–26.
  14. Feelders, A.; van der Gaag, L.C. Learning Bayesian network parameters under order constraints. Int. J. Approx. Reason. 2006, 42, 37–53.
  15. de Campos, C.P.; Ji, Q. Improving Bayesian network parameter learning using constraints. In Proceedings of the Nineteenth International Conference on Pattern Recognition (ICPR 2008), Tampa, FL, USA, 8–11 December 2008; pp. 1–4.
  16. Ren, J.; Gao, X.; Ru, W. Parameters learning of BN in small sample based on data missing. Syst. Eng. Theory Pract. 2011, 31, 172–177.
  17. Niculescu, R.S.; Mitchell, T.M.; Rao, R.B. Bayesian network learning with parameter constraints. J. Mach. Learn. Res. 2006, 7, 1357–1383.
  18. Chang, R. Advanced Algorithms of Bayesian Network Learning and Probabilistic Inference from Inconsistent Prior Knowledge and Sparse Data with Applications in Computational Biology and Computer Vision. In Bayesian Network; IntechOpen: London, UK, 2010.
  19. Etminani, K.; Naghibzadeh, M.; Peña, J.M. DemocraticOP: A Democratic way of aggregating Bayesian network parameters. Int. J. Approx. Reason. 2013, 54, 602–614.
  20. Isozaki, T.; Kato, N.; Ueno, M. Minimum Free Energies with “Data Temperature” for Parameter Learning of Bayesian Networks. In Proceedings of the 2008 20th IEEE International Conference on Tools with Artificial Intelligence, Dayton, OH, USA, 3–5 November 2008.
  21. Oniśko, A.; Druzdzel, M.J.; Wasyluk, H. Learning Bayesian network parameters from small data sets: Application of Noisy-OR gates. Int. J. Approx. Reason. 2001, 27, 165–182.
  22. Zhou, Y.; Fenton, N.; Zhu, C. An empirical study of Bayesian network parameter learning with monotonic influence constraints. Decis. Support Syst. 2016, 87, 69–79.
  23. Gao, X.; Guo, Z.; Ren, H.; Yang, Y.; Chen, D.; He, C. Learning Bayesian network parameters via minimax algorithm. Int. J. Approx. Reason. 2019, 108, 62–75.
  24. Tang, K.; Parsons, D.J.; Jude, S. Comparison of automatic and guided learning for Bayesian networks to analyse pipe failures in the water distribution system. Reliab. Eng. Syst. Saf. 2019, 186, 24–36.
  25. Imani, M.; Ghoreishi, S.F. Graph-Based Bayesian Optimization for Large-Scale Objective-Based Experimental Design. IEEE Trans. Neural Networks Learn. Syst. 2021, 1–13.
  26. Scutari, M.; Vitolo, C.; Tucker, A. Learning Bayesian networks from big data with greedy search: Computational complexity and efficient implementation. Stat. Comput. 2019, 29, 1095–1108.
  27. Imani, M.; Imani, M.; Ghoreishi, S.F. Bayesian Optimization for Expensive Smooth-Varying Functions. IEEE Intell. Syst. 2022.
  28. Wellman, M. Fundamental concepts of qualitative probabilistic networks. Artif. Intell. 1990, 44, 257–303.
  29. Di, R.; Gao, X.; Guo, Z. Discrete Bayesian network parameter learning based on monotonic constraint. Syst. Eng. Electron. 2014, 36, 272–277.
  30. Kullback, S.; Leibler, R.A. On Information and Sufficiency. Ann. Math. Stat. 1951, 22, 79–86.
Figure 1. Lawn moist model.
Figure 2. Asia network.
Table 1. Real parameters of the simulation network.
        R = 0, S = 0    R = 0, S = 1    R = 1, S = 0    R = 1, S = 1
W = 0   0.60            0.40            0.40            0.15
W = 1   0.40            0.60            0.60            0.85
Table 2. Learning parameters of MLE.
        R = 0, S = 0    R = 0, S = 1    R = 1, S = 0    R = 1, S = 1
W = 0   0.5814          0.2500          0.5250          0.0545
W = 1   0.4186          0.7500          0.4750          0.9455
Table 3. Learning parameters of Method 1.
        R = 0, S = 0    R = 0, S = 1    R = 1, S = 0    R = 1, S = 1
W = 0   0.6012          0.2311          0.5052          0.0743
W = 1   0.3988          0.7689          0.4948          0.9257
Table 4. Learning parameters of Method 2.
        R = 0, S = 0    R = 0, S = 1    R = 1, S = 0    R = 1, S = 1
W = 0   0.6333          0.1774          0.4524          0.1755
W = 1   0.3667          0.8226          0.5476          0.8245
Table 5. Learning parameters of Method 3.
        R = 0, S = 0    R = 0, S = 1    R = 1, S = 0    R = 1, S = 1
W = 0   0.6012          0.2444          0.5171          0.0611
W = 1   0.3988          0.7556          0.4829          0.9389
Table 6. Learning parameters of Method 4.
        R = 0, S = 0    R = 0, S = 1    R = 1, S = 0    R = 1, S = 1
W = 0   0.6390          0.1924          0.3684          0.2121
W = 1   0.3610          0.8076          0.6316          0.7879
Table 7. KL divergences between the learning results and the real parameters.
Algorithm       MLE       Method 1   Method 2   Method 3   Method 4
KL divergence   0.1279    0.1133     0.1258     0.1203     0.0724
Table 8. Real parameters of the simulation network.
         X5 = 0, X7 = 0   X5 = 0, X7 = 1   X5 = 1, X7 = 0   X5 = 1, X7 = 1
X8 = 0   0.65             0.40             0.30             0.05
X8 = 1   0.35             0.60             0.70             0.95
Table 9. Learning parameters of MLE.
         X5 = 0, X7 = 0   X5 = 0, X7 = 1   X5 = 1, X7 = 0   X5 = 1, X7 = 1
X8 = 0   0.6299           0.3333           0.4976           0.0015
X8 = 1   0.3701           0.6667           0.5024           0.9985
Table 10. Learning parameters of Method 1.
         X5 = 0, X7 = 0   X5 = 0, X7 = 1   X5 = 1, X7 = 0   X5 = 1, X7 = 1
X8 = 0   0.6437           0.3155           0.4798           0.0193
X8 = 1   0.3563           0.6845           0.5202           0.9807
Table 11. Learning parameters of Method 2.
         X5 = 0, X7 = 0   X5 = 0, X7 = 1   X5 = 1, X7 = 0   X5 = 1, X7 = 1
X8 = 0   0.6774           0.2653           0.4286           0.1148
X8 = 1   0.3226           0.7347           0.5714           0.8852
Table 12. Learning parameters of Method 3.
         X5 = 0, X7 = 0   X5 = 0, X7 = 1   X5 = 1, X7 = 0   X5 = 1, X7 = 1
X8 = 0   0.6413           0.3035           0.5082           0.1011
X8 = 1   0.3587           0.6965           0.4918           0.8989
Table 13. Learning parameters of Method 4.
         X5 = 0, X7 = 0   X5 = 0, X7 = 1   X5 = 1, X7 = 0   X5 = 1, X7 = 1
X8 = 0   0.6988           0.3634           0.3277           0.1714
X8 = 1   0.3012           0.6366           0.6723           0.8286
Table 14. KL divergences between the learning results and the real parameters.
Algorithm       MLE       Method 1   Method 2   Method 3   Method 4
KL divergence   0.1410    0.0961     0.1113     0.1356     0.1078
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

