(1) Potential optimal parent sets (POPS). According to the BN structure learning problem description and Formulas (3) and (4), it is necessary to find an optimal parent set from the POPS of each node $X_i$. Obviously, the number of POPS limits the efficiency of the search: if the number of remaining POPS is small, it is easy to find the parent sets that meet the requirements.
(2) Order graph. The total number of states in the order graph is $2^n$, and its scale increases exponentially with the number of nodes $n$ in the BN. The size of the search space also limits the efficiency of the search.
Based on the above two points, this paper will propose solutions to improve the efficiency of the A* algorithm.
3.1. Pruned Potential Optimal Parent Sets with MMPC
Although pruning rules can further reduce the number of POPS, the remaining number is still considerable. In the problem of BN structure learning, we ultimately need only the optimal parent sets; the other candidate sets are relatively unnecessary. Given a target variable $T$, the MMPC algorithm can quickly return the parent–child set $PC(T)$ of the target variable $T$ by means of conditional independence (CI) tests. For two variables $X$, $Y$ and a condition set $Z$, the CI can be tested with the $G^2$ statistic under the null hypothesis of conditional independence. Let $N_{xyz}$ denote the number of occurrences of $X = x$, $Y = y$, and $Z = z$ in the dataset $D$ (respectively, $x$, $y$, and $z$ denote the values specifically taken by $X$, $Y$, and $Z$; $x$ and $y$ generally are integers, and $z$ is a combination of integers); then, the statistic $G^2$ is defined as

$$G^2 = 2 \sum_{x,y,z} N_{xyz} \ln \frac{N_{xyz} \, N_{z}}{N_{xz} \, N_{yz}}.$$
Under the null hypothesis, the $G^2$ statistic asymptotically follows the $\chi^2$ distribution. Therefore, given the significance level $\alpha$, if the $p$-value calculated by the test is less than $\alpha$, the null hypothesis is rejected, and the variables $X$ and $Y$ are considered conditionally dependent given $Z$; otherwise, $X$ and $Y$ are considered conditionally independent given $Z$. The pseudocode of the MMPC algorithm is shown in Algorithm 1, in which $PC$ is the parent–child set of the target variable $T$.
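As a concrete illustration of the test above, the $G^2$ statistic can be computed directly from co-occurrence counts. The following Python sketch is our own (the function and column names are not from the paper): it tallies $N_{xyz}$, $N_{xz}$, $N_{yz}$, and $N_z$ and accumulates the sum term by term.

```python
import math
from collections import Counter

def g2_statistic(data, x, y, z_cols):
    """G^2 statistic for testing X independent of Y given Z.
    `data` is a list of dict records; `x` and `y` are column names and
    `z_cols` is a tuple of column names forming the condition set Z."""
    n_xyz, n_xz, n_yz, n_z = Counter(), Counter(), Counter(), Counter()
    for row in data:
        zv = tuple(row[c] for c in z_cols)
        n_xyz[(row[x], row[y], zv)] += 1
        n_xz[(row[x], zv)] += 1
        n_yz[(row[y], zv)] += 1
        n_z[zv] += 1
    g2 = 0.0
    for (xv, yv, zv), n in n_xyz.items():
        # Expected cell count under conditional independence: N_xz * N_yz / N_z.
        expected = n_xz[(xv, zv)] * n_yz[(yv, zv)] / n_z[zv]
        g2 += 2.0 * n * math.log(n / expected)
    return g2
```

Comparing the returned value against the $\chi^2$ critical value for the appropriate degrees of freedom (e.g., 3.841 at $\alpha = 0.05$ with one degree of freedom) reproduces the reject/accept decision described above.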
Algorithm 1: MMPC
Input: Target variable $T$, variable set $V$, and significance level $\alpha$
Output: Parent–child set $PC$ of the target variable $T$
1: $PC \leftarrow \emptyset$;
2: while $PC$ changes do
3:  for each $X \in V \setminus (PC \cup \{T\})$ do
4:   if $X$ and $T$ are conditionally dependent given every subset $Z \subseteq PC$ then $PC \leftarrow PC \cup \{X\}$; end if
5:  end for
6:  for each $X \in PC$ do
7:   if $X$ and $T$ are conditionally independent given some subset $Z \subseteq PC \setminus \{X\}$ then $PC \leftarrow PC \setminus \{X\}$; end if
8:  end for
9: end while
Given a condition set $Z$, the MMPC algorithm not only considers $Z$ itself to determine whether $X$ and $T$ are independent but also considers the subsets of $Z$, which gives stronger robustness. Certainly, this approach requires more test calculations. Finally, we can use the MMPC algorithm to compute the parent–child set $PC(X_i)$ for each variable $X_i$.
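The grow/shrink loop of Algorithm 1 can be sketched in a few lines of Python. This is a minimal illustration under our own naming, not the paper's implementation: the CI test is abstracted as a callback `is_independent(x, t, z)`, which could be backed by the $G^2$ test at a chosen significance level.

```python
from itertools import combinations

def mmpc(target, variables, is_independent):
    """Grow/shrink sketch of the MMPC loop.  `variables` is a set of
    variable names; `is_independent(x, t, z)` returns True when x and t
    are judged conditionally independent given the tuple of variables z."""
    pc = set()
    changed = True
    while changed:
        changed = False
        # Grow phase: add a variable if it is dependent on the target
        # conditioned on every subset of the current parent-child set.
        for x in sorted(variables - pc - {target}):
            subsets = [z for r in range(len(pc) + 1)
                       for z in combinations(sorted(pc), r)]
            if all(not is_independent(x, target, z) for z in subsets):
                pc.add(x)
                changed = True
        # Shrink phase: remove a variable if it becomes independent of the
        # target conditioned on some subset of the remaining set.
        for x in sorted(pc):
            rest = sorted(pc - {x})
            subsets = [z for r in range(len(rest) + 1)
                       for z in combinations(rest, r)]
            if any(is_independent(x, target, z) for z in subsets):
                pc.discard(x)
                changed = True
    return pc
```

Enumerating all subsets of the conditioning set is what makes the test robust but also what drives up the number of CI calculations, as noted above.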
Through the constraint of the parent–child set $PC(X_i)$, we can further prune the unnecessary candidate sets and their corresponding MDL score calculations for each node. Taking a four-node BN as an example, for node $X_1$, the set $\{X_2, X_3, X_4\}$ and all of its subsets could be the parent set of $X_1$. If the traditional pruning rules (Theorems 1–3) are not in effect, its POPS are still $\{X_2, X_3, X_4\}$ and all its subsets, which are represented as the parent graph of $X_1$ shown in Figure 3. If the parent–child set of $X_1$ obtained by the MMPC algorithm is a proper subset of $\{X_2, X_3, X_4\}$, then the parent graph of $X_1$ shown in Figure 3 can be pruned to the smaller parent graph shown in Figure 4. For larger BNs, this pruning is even more significant in its score calculations. The constraints of the parent–child sets calculated by the MMPC algorithm can greatly reduce unnecessary score calculations and storage, and limiting the number of corresponding POPS improves the search efficiency of the A* algorithm.
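The pruning just described amounts to discarding every candidate parent set that is not contained in the MMPC parent–child set. A minimal sketch (function names are ours; the parent–child set $\{X_2, X_3\}$ below is a hypothetical MMPC outcome for the four-node example):

```python
from itertools import chain, combinations

def all_subsets(nodes):
    """All subsets of a collection of nodes, as frozensets."""
    nodes = sorted(nodes)
    return [frozenset(c) for c in
            chain.from_iterable(combinations(nodes, r)
                                for r in range(len(nodes) + 1))]

def prune_pops(pops, pc):
    """Keep only the candidate parent sets contained in the
    parent-child set pc returned by MMPC."""
    pc = frozenset(pc)
    return [s for s in pops if s <= pc]
```

For node $X_1$ of the four-node example, the eight candidate sets (all subsets of $\{X_2, X_3, X_4\}$) collapse to the four subsets of the hypothetical parent–child set, and every pruned set's MDL score never needs to be computed or stored.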
3.2. Pruned Order Graph with Path Constraints
According to Section 2.2, the optimal BN structure learning actually searches for the shortest path in the order graph. Therefore, if path constraints can be found in the order graph, they will greatly improve the efficiency of the A* algorithm in searching for the shortest path.
Before illustrating such path constraints, consider a simple example. Table 1 shows the POPS of each variable in a six-node BN. We assume that the POPS of each variable have been obtained by the score pruning rules or by the MMPC algorithm of Section 3.1. It can be seen from Table 1 that, due to the parent–child set constraints obtained in Section 3.1, not all nodes can choose all other nodes as their parent nodes. For example, one of the nodes can only choose a single specific node as its parent or take the empty set and have no parent.
A directed graph can be obtained by connecting each node $X_i$ and its potential optimal parent sets $POPS(X_i)$. Connecting each node $X_i$ with its potential optimal parent sets from Table 1, we obtain the directed graph shown in Figure 5. In such a directed graph, if $X_j$ is a potential parent node of $X_i$, then the graph contains a directed edge from $X_j$ to $X_i$.
An interesting phenomenon can be observed in Figure 5: there are only directed edges from the nodes in one subset $V_1$ to the nodes in the other subset $V_2$, but no directed edges from the nodes in $V_2$ to the nodes in $V_1$; in other words, a node in $V_2$ cannot be the parent node of a node in $V_1$, and thus, the node set can be split into two parts: $V_1$ and $V_2$. Thus, based on Figure 5, by contracting $V_1$ to one node and $V_2$ to another node, we can finally obtain the acyclic component graph shown in Figure 6. Based on the above splitting method, we can split the order graph of learning the six-node BN into two subgraphs, as shown in Figure 7.
Obviously, the complete order graph of the six-variable BN contains $2^6 = 64$ states. However, based on the constraints from Figure 5, the order graph can be split into two subgraphs, corresponding to $V_1$ and $V_2$. We refer to this splitting method as path constraints.
As the structure of the order graph changes, the entire process of searching the order graph also changes. First, we find the shortest path from the start state $\emptyset$ to the state $V_1$ in the first subgraph of Figure 7, and then find the shortest path from $V_1$ to the goal state $V = V_1 \cup V_2$ in the second subgraph of Figure 7. The shortest path from $\emptyset$ to $V$ is obtained by concatenating the shortest paths in the two subgraphs. $V_1$ becomes a necessary state in the shortest-path search of the order graph. Therefore, the number of states in the order graph search space of Figure 7 is reduced to $2^{|V_1|} + 2^{|V_2|} - 1$.
Compared with the $2^6 = 64$ states in the complete order graph of the six-variable BN, path constraints can reduce the number of states in the order graph. As the number of nodes increases, the path constraints reduce the number of states more significantly. We give Theorem 4 to generalize and quantify this reduction.
Theorem 4. In a Bayesian network with node set $V$, given path constraints, $V$ can be split into $k$ disjoint subsets $V_1, V_2, \dots, V_k$ ($V = V_1 \cup V_2 \cup \dots \cup V_k$). Then, the number of states in the order graph is reduced from $2^{|V|}$ to $\sum_{i=1}^{k} 2^{|V_i|} - (k - 1)$.
Proof of Theorem 4. Obviously, the complete order graph of a Bayesian network with node set $V$ contains $2^{|V|}$ states in total. For the order graph under path constraints, the number of states of the subgraph corresponding to $V_i$ is $2^{|V_i|}$; there are $k$ such subgraphs, so the total is $\sum_{i=1}^{k} 2^{|V_i|}$. However, this double counts the $k - 1$ connection states shared by adjacent subgraphs. Therefore, $k - 1$ states are removed from the total. Finally, the total number of states is $\sum_{i=1}^{k} 2^{|V_i|} - (k - 1)$. □
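Theorem 4 can be checked numerically. The helper below is our own sketch; for the six-node example, a hypothetical split into two three-node subsets would shrink the order graph from $2^6 = 64$ states to $2^3 + 2^3 - 1 = 15$.

```python
def pruned_state_count(subset_sizes):
    """Number of order-graph states after splitting V into disjoint subsets
    of the given sizes (Theorem 4): the sum of 2^|V_i| over all subsets,
    minus the k-1 connection states shared by adjacent subgraphs."""
    k = len(subset_sizes)
    return sum(2 ** size for size in subset_sizes) - (k - 1)
```

For instance, `pruned_state_count([6])` recovers the unsplit count of 64, while `pruned_state_count([3, 3])` gives 15; a lopsided split such as `[1, 49]` barely helps, which motivates bounding the largest subset later in Section 3.2.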
This simple example shows that the directed graph built from each node $X_i$ and its potential optimal parent sets $POPS(X_i)$ implies path constraints, which can be used to prune the order graph. The internal principle is briefly described as follows, with the help of Theorem 5 (proved in the literature [19]).

Theorem 5. Let $U$ and $S$ be two candidate parent sets for $X$ such that $U \subseteq S$. We must have $BestScore(X, U) \geq BestScore(X, S)$.
In general, if there is only a directed path from $X$ to $Y$ in the directed graph but no directed path from $Y$ to $X$, then the order graph does not need to generate states containing $Y$ but excluding $X$. One way to think about this phenomenon is the following. Consider a current state $U$ in the order graph that includes neither $X$ nor $Y$. If we expand $X$ first and then $Y$, the path cost from state $U$ to state $U \cup \{X, Y\}$ is $BestScore(X, U) + BestScore(Y, U \cup \{X\})$. On the other hand, if we expand $Y$ first and then $X$, the path cost from state $U$ to state $U \cup \{X, Y\}$ is $BestScore(Y, U) + BestScore(X, U \cup \{Y\})$. However, since only a directed path from $X$ to $Y$ can exist, $Y$ cannot be a parent of $X$, and thus $BestScore(X, U \cup \{Y\}) = BestScore(X, U)$. For these two expansion plans, it therefore remains to compare $BestScore(Y, U \cup \{X\})$ and $BestScore(Y, U)$. According to Theorem 5 and $U \subseteq U \cup \{X\}$, a better value that makes the path smaller is obtained in the larger candidate set, and thus $BestScore(Y, U \cup \{X\}) \leq BestScore(Y, U)$. Therefore, the plan that expands $X$ first and then $Y$ is more likely to achieve the shortest path from $U$ to $U \cup \{X, Y\}$. Thus, there is no need to generate states that contain $Y$ but exclude $X$.
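The comparison above rests on the monotonicity of $BestScore$ from Theorem 5, which a brute-force sketch makes concrete. The function name and the toy score table below are our own (lower MDL score is better):

```python
from itertools import combinations

def best_score(local_scores, candidate_set):
    """BestScore(X, S): the minimum local score over all parent sets P
    contained in S.  `local_scores` maps frozenset parent sets to their
    MDL scores (lower is better); missing sets are treated as unscored."""
    best = float("inf")
    cand = sorted(candidate_set)
    for r in range(len(cand) + 1):
        for parents in combinations(cand, r):
            score = local_scores.get(frozenset(parents))
            if score is not None and score < best:
                best = score
    return best
```

Because every parent set available under $U$ is also available under $U \cup \{X\}$, enlarging the candidate set can only keep or lower the minimum, which is exactly the inequality used in the expansion-order argument.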
Based on the previous simple example, we discuss the general method of obtaining path constraints in order to prune the order graph.
A new concept, the strongly connected component (SCC), is involved in the process of splitting the directed graph built from each node $X_i$ and its potential optimal parent sets $POPS(X_i)$. In a directed graph, two nodes $u$ and $v$ are said to be strongly connected if there is a directed path from $u$ to $v$ and a directed path from $v$ to $u$. A directed graph is a strongly connected graph if any two of its nodes are strongly connected. A maximal strongly connected subgraph of a directed graph is called a strongly connected component. The strongly connected components of a directed graph form an acyclic component graph, which is also a DAG. Each node in the acyclic component graph corresponds to a strongly connected component and to a subset of the node set of the BN. The acyclic component graph gives more intuitive path constraints: if there is a directed path from component $C_i$ to component $C_j$, then a variable in $C_j$ cannot be the parent node of a variable in $C_i$.
Based on the above concept, we try to obtain path constraints by extracting SCCs to prune the order graph. At present, there are mature algorithms for SCC extraction, among which the Kosaraju algorithm is the most commonly used. The pseudocode of the algorithm that obtains path constraints by extracting SCCs is shown in Algorithm 2.
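Kosaraju's algorithm itself is straightforward: one depth-first pass records finish times on the graph, and a second pass runs DFS on the transposed graph in decreasing finish order, so that each tree of the second pass is exactly one SCC. The sketch below is our own (iterative, to avoid recursion limits on large graphs); `graph` maps each node to the set of its successors, with edges pointing from potential parent to child as in Figure 5.

```python
def kosaraju_scc(graph):
    """Return the SCCs of a directed graph given as {node: set_of_successors}.
    Every node must appear as a key.  Components come out in topological
    order of the acyclic component graph."""
    # Pass 1: iterative DFS recording finish order on the original graph.
    visited, order = set(), []
    for start in graph:
        if start in visited:
            continue
        visited.add(start)
        stack = [(start, iter(graph[start]))]
        while stack:
            node, it = stack[-1]
            advanced = False
            for nxt in it:
                if nxt not in visited:
                    visited.add(nxt)
                    stack.append((nxt, iter(graph[nxt])))
                    advanced = True
                    break
            if not advanced:
                order.append(node)  # node is finished
                stack.pop()
    # Pass 2: DFS on the transposed graph in reverse finish order.
    transpose = {u: set() for u in graph}
    for u, succs in graph.items():
        for v in succs:
            transpose[v].add(u)
    assigned, sccs = set(), []
    for root in reversed(order):
        if root in assigned:
            continue
        component, work = set(), [root]
        assigned.add(root)
        while work:
            u = work.pop()
            component.add(u)
            for v in transpose[u]:
                if v not in assigned:
                    assigned.add(v)
                    work.append(v)
        sccs.append(component)
    return sccs
```

Contracting each returned component to a single node yields the acyclic component graph of Figure 6.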
In this algorithm, the potential optimal parent sets $POPS(X_i)$ of each node $X_i$ are used to build the directed graph $G$, and the SCCs of the directed graph $G$ are extracted by the Kosaraju algorithm. It is worth noting that if the size of an SCC is too large, it is still not conducive to improving the efficiency of the algorithm or to searching a larger network. For example, the original A* algorithm itself cannot search a network of over 50 nodes. If building the directed graph $G$ and extracting SCCs through the POPS yields two SCCs of sizes 1 and 49, and the path constraints are determined accordingly, such a method is still meaningless, because the A* algorithm still cannot search a network of 49 nodes. Thus, we limit the size of the SCCs with the parameter $M$. If the size of the maximum SCC exceeds $M$, a part of the set of potential optimal parent sets $POPS(X_i)$ is selected to rebuild the directed graph $G$. We prefer to select the sets corresponding to the top $m$ local scores in $POPS(X_i)$. The parameter $m$ gradually decreases from the maximum number of POPS until SCCs that meet the conditions can be extracted from the built directed graph. Then, Algorithm 2 breaks out of the loop and returns the extracted SCCs under the constraint of the parameter $M$. However, this method is greedy: because only part of the POPS is used to build the directed graph, the extracted SCCs lose some information. The pruned order graph formed according to the path constraints of such SCCs therefore has certain problems, which will affect the shortest-path search and the accuracy of the final BN. This effect will be analyzed in detail in the experimental section.
Algorithm 2: Obtain path constraints algorithm
Input: Potential optimal parent sets $POPS(X_i)$ of each node $X_i$, maximum size $M$
Output: Path constraints (the extracted SCCs)
1: build graph $G$ according to all of $POPS(X_i)$;
2: extract the SCCs of $G$ with the Kosaraju algorithm;
3: if the size of the maximum SCC $> M$ then
4:  for $m$ from the maximum number of POPS down to $1$ do
5:   build graph $G$ according to the best $m$ sets of $POPS(X_i)$;
6:   extract the SCCs of $G$ with the Kosaraju algorithm;
7:   if the size of the maximum SCC $\leq M$ then break; end if
8:  end for
9: end if
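The control flow of Algorithm 2 can be sketched as follows. This is a minimal illustration under our own naming, not the paper's implementation: `pops` maps each node to its candidate parent sets ordered best-first by local score, and the SCC extraction (e.g., Kosaraju's algorithm) is abstracted as the callback `scc_of`.

```python
def path_constraints(pops, max_scc_size, scc_of):
    """Shrink the POPS used to build the directed graph until the largest
    SCC fits within max_scc_size.  `pops` maps node -> list of candidate
    parent sets ordered best-first; `scc_of(graph)` returns the SCCs of a
    {node: set_of_successors} graph."""
    def build_graph(m):
        # Edge from potential parent to child, using only the best m sets.
        graph = {x: set() for x in pops}
        for x, candidate_sets in pops.items():
            for parent_set in candidate_sets[:m]:
                for p in parent_set:
                    graph.setdefault(p, set()).add(x)
        return graph

    m = max(len(sets) for sets in pops.values())
    sccs = scc_of(build_graph(m))
    while max(len(c) for c in sccs) > max_scc_size and m > 1:
        m -= 1  # greedily drop the worst-scoring candidate sets
        sccs = scc_of(build_graph(m))
    return sccs
```

As noted above, this shrinking is greedy: dropping low-scoring candidate sets removes edges from the directed graph, so the returned SCCs may encode path constraints that the full POPS would not justify.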
Finally, we discuss the search complexity in the pruned order graph. The SCCs $C_1, C_2, \dots, C_k$ obtained by Algorithm 2 split the original order graph into $k$ subgraphs, where the connection state between adjacent subgraphs is $S_i = C_1 \cup C_2 \cup \dots \cup C_i$, with $S_0 = \emptyset$ and $S_k = V$. Thus, for the $i$-th subgraph, the start state is $S_{i-1}$ and the goal state is $S_i$. For the entire order graph, the search is equivalent to searching the shortest paths from $S_0$ to $S_1$, then from $S_1$ to $S_2$, all the way from $S_{k-1}$ to $S_k$. Still taking Figure 7 as an example, because there are two components $C_1$ and $C_2$, there are connection states $S_1 = C_1$ and $S_2 = C_1 \cup C_2 = V$. Therefore, we search the shortest path from $\emptyset$ to $S_1$ and then search the shortest path from $S_1$ to $S_2$. Finally, it only remains to connect the shortest paths to obtain the entire shortest path in the order graph. For each subgraph, the maximum complexity of the A* search is $O(2^{|C_i|})$. This is the worst case, which is almost impossible in practice because A* uses heuristic functions. Therefore, in the pruned order graph, which is split into $k$ subgraphs using $C_1, C_2, \dots, C_k$, the maximum complexity of the A* search is

$$O\!\left(\sum_{i=1}^{k} 2^{|C_i|}\right).$$

This conclusion also corroborates Theorem 4. This finding shows that the maximum complexity depends on the size of the maximum SCC. Therefore, it is necessary for Algorithm 2 to use the parameter $M$ to limit the size of the maximum SCC, which effectively limits the maximum complexity of the A* search of the pruned order graph.