1. Introduction
A Bayesian network (BN) is a probabilistic graphical model used for knowledge representation and reasoning, mainly to represent and reason about the conditional dependencies among random variables. It represents the causal relationships between random variables through a directed acyclic graph (DAG) and quantifies these relationships with the associated probability distributions. This approach is widely used in many fields, including medical diagnosis [1,2], fault diagnosis [3], and environmental protection [4].
Currently, BN learning methods can be categorized into global structure learning and local structure learning. The former can be further divided into three main types: constraint-based approaches, score-based approaches, and hybrid approaches. Constraint-based approaches rely on conditional independence (CI) tests to identify causal relationships between variables. The early classical constraint-based methods were SGS and TPDA [5]. However, for a Bayesian network with n nodes, these two algorithms require an exponential number of CI tests and O(n^4) CI tests, respectively, in the worst case. To reduce the computational complexity, the PC [6] algorithm and its improved version, the PC-Stable [7] algorithm, were proposed; the latter is the most commonly used and serves as one of the comparison algorithms in this article. However, these methods all build BNs under the assumptions of causal faithfulness and causal sufficiency. Therefore, when the sample size is insufficient, the accuracy of the CI tests cannot be guaranteed, and the accuracy of the algorithms is greatly reduced.
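To make the role of the CI test concrete, the following minimal Python sketch (our illustration, not the test implementation of any cited algorithm; the function name, the integer-coded data format, and the significance level alpha are our own choices) computes a G² likelihood-ratio test of whether variable x is independent of variable y given a conditioning set z on discrete data:

import numpy as np
from itertools import product
from scipy.stats import chi2

def g_square_ci_test(data, x, y, z, alpha=0.05):
    # data: (n_samples, n_vars) array of integer-coded discrete values
    # Tests whether variable x is independent of variable y given the variables in z.
    xs, ys = np.unique(data[:, x]), np.unique(data[:, y])
    z_levels = [np.unique(data[:, c]) for c in z]
    g2, dof = 0.0, 0
    for cfg in product(*z_levels):          # one stratum per configuration of z
        mask = np.ones(len(data), dtype=bool)
        for c, v in zip(z, cfg):
            mask &= data[:, c] == v
        sub = data[mask]
        if len(sub) == 0:
            continue
        # observed contingency table of x versus y within this stratum
        obs = np.array([[np.sum((sub[:, x] == xv) & (sub[:, y] == yv))
                         for yv in ys] for xv in xs], dtype=float)
        exp = obs.sum(1, keepdims=True) * obs.sum(0, keepdims=True) / obs.sum()
        nz = obs > 0
        g2 += 2.0 * np.sum(obs[nz] * np.log(obs[nz] / exp[nz]))
        dof += (len(xs) - 1) * (len(ys) - 1)
    p_value = 1.0 - chi2.cdf(g2, max(dof, 1))
    return p_value > alpha                  # True: accept independence of x and y given z

Constraint-based methods such as PC and PC-Stable apply tests of this kind repeatedly with conditioning sets of growing size, which is why their reliability degrades when the number of samples per configuration of the conditioning set becomes small.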
The score-based approach treats BN structure learning as a model selection problem and uses a scoring function and a search strategy to find the structure with the highest score in the search space. Common scoring functions include K2, the Bayesian information criterion (BIC), and the minimum description length (MDL), and common search spaces comprise the DAG space, the equivalence class (EC) space [8], and the ordering space. Search strategies can be divided into two main categories: exact search and approximate search. Exact search algorithms, such as the B&B [9], A* [10], and ILP [11] algorithms, cannot learn large-scale Bayesian network structures, whereas approximate learning algorithms use heuristic methods to improve search efficiency and are the common choice for large-scale BN learning. Hill climbing (HC) [5] and K2 [12], which are based on greedy search, were proposed first. Subsequently, a series of meta-heuristic algorithms, such as the genetic algorithm (GA) [13], evolutionary programming [14], ant colony optimization (ACO) [15], cuckoo optimization (CO) [16], water cycle optimization (WCO) [17], particle swarm optimization (PSO) [18,19], the artificial bee colony (ABC) algorithm [20], bacterial foraging optimization (BFO) [21], and the firefly algorithm (FA) [22], were proposed to improve search efficiency and escape local optima. Among these meta-heuristic algorithms, PSO is the most widely used and serves as a comparison algorithm in our experiments. Although these meta-heuristic algorithms can efficiently explore the search space, two challenges remain in the face of large-scale and highly complex DAGs:
An overly large search space leads to low search efficiency.
The search is prone to falling into local optima, which reduces the accuracy of the final graph.
Hybrid approaches combine constraint-based and score-based approaches, attempting to integrate the advantages of both: the former is used to limit the search space of the latter. The classic hybrid approach is the max–min hill-climbing (MMHC) algorithm [23], which is also one of the comparison algorithms featured in this paper. In recent years, feature selection has also been introduced into BN structure learning, as in the F2SL [24] algorithm, which is also a comparison algorithm. It first determines the skeleton of the DAG through feature selection and then determines the direction of the edges on the basis of CI tests or scoring functions.
The goal of local structure learning is to find local structures in the form of the parent–child (PC) set or Markov blanket (MB) of target variables. Existing MB learning algorithms can be divided into two main types: direct learning strategies and divide-and-conquer learning strategies. The direct learning strategy searches for MB variables directly on the basis of the conditional independence properties of the MB, without distinguishing between PC and spouse variables. Among the direct learning methods, GS [25] was the first theoretically correct MB discovery algorithm and consists of two phases: growth and shrinkage. However, its heuristic is not efficient, and the later IAMB [26] algorithm adopts a dynamic heuristic that selects the candidate node set more effectively. In addition, the IAMB variants Inter-IAMB and Fast-IAMB alternately execute the growth and shrinkage phases of the IAMB algorithm and promptly delete false features from the MB set, thereby improving the accuracy of the CI tests in the later stages of the algorithm. However, GS, IAMB, and the IAMB variants use the set of all currently selected features as the conditioning set, so the number of data samples required for the test grows exponentially with the size of the MB. The direct learning method has an advantage in efficiency, but its accuracy on high-dimensional data is not ideal. The divide-and-conquer learning strategy exploits the direct causal relationship between parent and child variables and the target variable to learn PC variables and spouse variables separately. It is usually superior to the direct learning method in terms of search accuracy and data utilization efficiency.
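As an illustration of the grow–shrink idea behind GS and IAMB, the sketch below is ours; real implementations differ (for example, IAMB dynamically orders candidates by their association with the target), and ci_test stands for any conditional independence test, such as the G² test sketched above. The candidate MB is grown by adding variables that remain dependent on the target given the current candidate set and then shrunk by removing variables that become conditionally independent:

def grow_shrink_mb(target, variables, data, ci_test):
    # ci_test(x, y, z, data) returns True if x and y are conditionally
    # independent given the set z on the data.
    mb = []
    # Growth phase: add any variable still dependent on the target given mb.
    changed = True
    while changed:
        changed = False
        for v in variables:
            if v != target and v not in mb and not ci_test(target, v, mb, data):
                mb.append(v)
                changed = True
    # Shrinkage phase: remove variables that are independent of the target
    # given the rest of the current candidate MB.
    for v in list(mb):
        rest = [u for u in mb if u != v]
        if ci_test(target, v, rest, data):
            mb.remove(v)
    return mb

Because the conditioning set in each test is the entire current candidate MB, the amount of data needed for reliable tests grows quickly with the MB size, which is the weakness noted above.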
The first MB learning algorithm based on the divide-and-conquer strategy is the MMMB [27] algorithm, which identifies spouse variables by identifying V-structures. The HITON-MB [28] algorithm and the semi-interleaved HITON-MB algorithm are improved versions of the MMMB algorithm. However, the MMMB algorithm and its variants are not theoretically correct. The PCMB [29] algorithm is the first divide-and-conquer method that is provably correct; it innovatively introduces a symmetry test based on the “AND rule”. However, the symmetry test increases the time complexity of the algorithm, and the subsequently proposed STMB [30] algorithm alleviates this problem. To strike a balance between data efficiency and time efficiency, some algorithms, such as the BAMB [31] and EEMB [32] algorithms, alternate PC learning and spouse learning. Some local structure learning algorithms, such as the PCD-by-PCD [33] and CMB [34] algorithms, can perform causal orientation on the basis of the separation sets and Meek rules while conducting PC discovery. LCS-FS [35] is a local structure learning algorithm based on feature selection. When testing the causal relationship between two nodes, it adopts a feature selection method based on mutual information that requires no conditioning set, significantly improving efficiency.
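For reference, the classical collider-orientation rule that these local methods rely on can be sketched as follows (our illustrative Python version, assuming the skeleton and the separation sets have already been learned): for every triple X – Z – Y in which X and Y are non-adjacent, the triple is oriented as the V-structure X → Z ← Y whenever Z does not belong to the separation set of X and Y.

def orient_v_structures(skeleton, sepsets):
    # skeleton: dict mapping each node to the set of its undirected neighbors
    # sepsets:  dict mapping frozenset({x, y}) of each non-adjacent pair to the
    #           conditioning set that rendered the pair independent
    directed = set()   # collected arcs (parent, child)
    for z in skeleton:
        neighbors = sorted(skeleton[z])
        for i in range(len(neighbors)):
            for j in range(i + 1, len(neighbors)):
                x, y = neighbors[i], neighbors[j]
                if y in skeleton[x]:
                    continue                    # x and y adjacent: not a candidate triple
                if z not in sepsets.get(frozenset({x, y}), set()):
                    directed.add((x, z))        # orient x -> z <- y
                    directed.add((y, z))
    return directed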
Each approach has its own limitations and advantages. How to combine the constraint-based and score-based methods more effectively, and how to combine global and local structure learning so that the two play complementary roles, are open problems in Bayesian network structure learning and constitute the research content of this paper. In our work, the local structure learning method is used to mine prior knowledge, and the global structure learning method compensates for its own limitations by integrating this prior knowledge. For the local structure learning algorithm, we design a new feature selection method that improves the recall rate of the MB while avoiding high-order CI tests and enhances the precision rate when identifying V-structures. For the global structure optimization algorithm, our goal is to develop a meta-heuristic algorithm based on knowledge fusion, which uses the obtained knowledge as constraints so that the meta-heuristic converges rapidly with improved accuracy.
The stochastic fractal search (SFS) [36] algorithm is a meta-heuristic algorithm proposed in 2015 that has few parameters, fast convergence, and high accuracy. Its inspiration comes from combining the principles of fractal geometry in nature with a random search strategy. It effectively explores the solution space by exploiting the self-similarity and randomness of fractal structures, avoiding local optima. It has been widely applied to complex optimization problems in various fields, including power and energy [37], finance [38], image processing [39], and machine learning [40]. To our knowledge, these applications all focus on continuous optimization problems. Therefore, to introduce the SFS algorithm into the field of DAG learning, we propose a new binary SFS algorithm. The SFS algorithm consists of two main processes: the diffusion process and the update process. During the diffusion process, fractal shapes are generated as random fractals, produced by modifying the random rules during the iterative process; for the DAG learning problem, we redefine a new random walk strategy to handle the binary representation. During the update process, the strategy of updating positions among particles through information exchange is linked to mutual learning among individuals in the DAG learning problem.
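To convey the flavor of the two processes on a binary adjacency-matrix encoding, the following simplified Python sketch is ours and is not the exact FS-SFS operator set; the flip-count rule, the learn_prob parameter, and the function names are assumptions made for illustration. Diffusion flips a Gaussian-controlled number of entries around a particle, and the update step copies entries from a better-scoring particle:

import numpy as np

def diffusion(adj, sigma, rng):
    # Binary analogue of the Gaussian random walk: flip a Gaussian-controlled
    # number of adjacency-matrix entries around the current particle.
    n = adj.shape[0]
    walker = adj.copy()
    n_flips = max(1, int(abs(rng.normal(0.0, sigma)) * n))
    for _ in range(n_flips):
        i, j = rng.integers(n), rng.integers(n)
        if i != j:
            walker[i, j] = 1 - walker[i, j]      # add or remove the edge i -> j
    return walker

def update(adj, better_adj, learn_prob, rng):
    # Information exchange: copy entries from a better-scoring particle with
    # probability learn_prob, mimicking mutual learning among individuals.
    mask = rng.random(adj.shape) < learn_prob
    new_adj = np.where(mask, better_adj, adj)
    np.fill_diagonal(new_adj, 0)                 # no self-loops
    return new_adj

In a full search loop, each candidate produced this way would still have to be repaired to remain acyclic and would be accepted only if its score (e.g., BIC) improves; those details are omitted here.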
The main contributions of the paper are summarized as follows:
We propose a new local structure learning method that uses a combination of feature selection and CI testing to mine prior knowledge and identify partial edges.
We propose a binary SFS algorithm, which can integrate the obtained prior knowledge as soft/hard constraints to improve the search efficiency and accuracy.
Experiments show that the proposed local structure learning algorithm, in terms of constructing the search space and obtaining structural priors, is better suited to soft constraint knowledge mining for the BN structure learning problem. Moreover, the joint cooperation of soft and hard constraints improves the performance of the SFS algorithm.
The remainder of this article is organized as follows. Section 2 introduces the background and related concepts. Section 3 discusses the research design of this study. Section 4 presents the experimental results and performance evaluation. Section 5 concludes this paper with some remarks and suggestions for future research.
4. Experiments
In this section, we first evaluate the performance of the WLBRC method in mining soft constraint knowledge: we compare it with six MB discovery algorithms to evaluate its performance in constructing the search space and with six algorithms that can identify V-structures to evaluate its performance in identifying V-structures. The six MB discovery algorithms are IAMB, STMB, PCMB, BAMB, LCS-FS, and MMPC [23]. The six algorithms that can identify V-structures are the GS, CMB, PCD-by-PCD, LCS-FS, PC-Stable, and F2SL_c [24] algorithms. Then, we evaluate the performance of the FS-SFS algorithm on different networks and datasets. Finally, the FS-SFS algorithm is compared with five known BN structure learning algorithms on different datasets. The five comparison algorithms are the PC-Stable, GS, F2SL, MMHC, and BNC-PSO algorithms. All the algorithms are run in MATLAB R2020a, and all the following experiments are performed on an AMD 1.7 GHz CPU with 16 GB of RAM.
4.1. Datasets and Evaluation Metrics
To evaluate the performance of the FS-SFS algorithm, we selected six standard Bayesian networks from the BNLEARN repository (https://www.bnlearn.com/bnrepository/, accessed on 29 April 2025) and collected 1000, 3000, 5000, and 10,000 samples for each network. The summaries of the six Bayesian networks are shown in Table 1. To better test the performance of the proposed algorithm, we selected networks with more nodes. In the BNLEARN repository, the Alarm network is a medium network; the Hepar2 and Win95pts networks are large networks; and the Munin, Andes, and Pigs networks are very large networks.
To evaluate the search performance of the SFS algorithm, we adopt the following indicators:
BIC: The BIC score of the output optimal network.
AE: The number of edges in the output optimal network that were incorrectly added.
DE: The number of edges in the output optimal network that were incorrectly deleted.
RE: The number of edges in the output optimal network that were incorrectly reversed.
SHD: The structural Hamming distance between the output structure and the original structure.
RT: The running time of the SFS algorithm.
F1: The evaluation index of graph accuracy, computed as F1 = 2 · P · R / (P + R), where P is precision and R is recall.
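All the structural indicators above can be computed from the adjacency matrices of the learned and true DAGs. The following Python sketch is our illustrative version (counting conventions, for example whether a reversed edge also contributes to other counts, vary slightly across papers):

import numpy as np

def structural_metrics(learned, true):
    # learned, true: n x n binary adjacency matrices, entry [i, j] = 1 for edge i -> j
    ae = int(np.sum((learned == 1) & (true == 0) & (true.T == 0)))     # wrongly added edges (AE)
    de = int(np.sum((true == 1) & (learned == 0) & (learned.T == 0)))  # wrongly deleted edges (DE)
    re = int(np.sum((learned == 1) & (true == 0) & (true.T == 1)))     # wrongly reversed edges (RE)
    shd = ae + de + re                                                 # structural Hamming distance
    tp = int(np.sum((learned == 1) & (true == 1)))                     # correctly oriented edges
    precision = tp / max(int(learned.sum()), 1)
    recall = tp / max(int(true.sum()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return dict(AE=ae, DE=de, RE=re, SHD=shd, F1=f1)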
Two main performance indicators were used to evaluate the performance of the proposed feature selection method: precision (P) and recall (R). Precision represents the number of correct edges in the set divided by the total number of edges in the set, and recall represents the number of correct edges in the set divided by the true number of edges in the original set. Recall and precision often have an inverse relationship: in some cases, increasing recall may reduce precision, and vice versa. The search space should adopt the principle of recall priority, because a high recall rate improves the completeness of the search space, which is crucial to the performance of the subsequent score-based search algorithm. The orientation of V-structures should adopt the principle of precision priority, because high precision improves the search efficiency of the subsequent score-based search algorithm. To quantitatively evaluate the performance of each algorithm on the search space and the oriented V-structures, we adopt the more general Fβ form of F1 to express our different preferences for precision and recall, defined as Fβ = (1 + β²) · P · R / (β² · P + R). When β > 1, the recall rate has a greater impact, and when β < 1, the precision rate has a greater impact. In our experiment, β was set to 5 when the search space was compared and to 0.2 when the V-structures were compared.
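A minimal Python sketch of this weighted measure and of the two settings used in our comparisons (β = 5 for the search space, β = 0.2 for V-structures):

def f_beta(precision, recall, beta):
    # Weighted harmonic mean of precision and recall; beta > 1 favors recall, beta < 1 favors precision.
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Example: the same (precision, recall) pair judged under the two preferences.
p, r = 0.4, 0.9
print(f_beta(p, r, 5.0))   # recall-priority score used when comparing search spaces
print(f_beta(p, r, 0.2))   # precision-priority score used when comparing V-structures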
4.2. Soft Constraint Knowledge Mining
From the previous section, we know that soft constraint knowledge contains a series of search spaces and a set of directed edges. We report the precision and recall rates of each search space and of the set of directed edges in Table 2.
For search spaces, a higher recall indicates a better completeness of the search space, and a higher precision can enhance search efficiency. By comparing , , and , we find that the precision of is too low, which reduces the search efficiency, whereas has the lowest recall, which affects the graph accuracy of the final output network. Therefore, we consider as the search space. For , we found that on the Alarm and Pigs networks, the recall rate fluctuated very little compared with that of . However, on the other four networks, the recall rate decreases to varying degrees, which may cause the final output network to lose some edges. For , the precision is relatively high, and the initial population obtained by climbing within it has a very high score.
For the V set, the precision indicates the accuracy of the identified direction. The higher the precision is, the more reliable the prior knowledge is. The recall indicates the sufficiency of soft constraint knowledge. The higher it is, the more abundant the soft constraint knowledge is. For the Alarm, Win95pts, Andes, and Pigs networks, the precision exceeds 90%, which indicates that the orientation of the V-structure is very reliable for these four networks. For the Hepar2 network and the Munin network, the recall is significantly lower than that of the other four networks, and the precision is also lower than that of the other four networks. This indicates that sufficient and reliable prior knowledge has not been mined on these two networks.
To further illustrate the performance of the algorithm we proposed in soft constraint knowledge mining, we compared the results of six local structure learning algorithms in terms of the search space and V-structure identification. The adopted dataset sizes are 1000 and 10,000, which are the minimum and maximum values of the datasets collected in our experiment, respectively.
Table 3 reports the performance comparison with the six local MB discovery algorithms in terms of the search space. For the search space, we prioritize the recall rate over precision; the Fβ value with β = 5 is the harmonic measure we set on the basis of this preference. As the sample size increases from 1000 to 10,000, the recall rates of all the algorithms improve significantly, except for LCS-FS. Moreover, the recall rate of LCS-FS is also significantly lower than that of the other algorithms on most datasets. Our analysis suggests that this is because LCS-FS adopts the FCBF method for approximate MB discovery; as explained in the previous section, this method is prone to incorrectly deleting important features, so its recall rate cannot be guaranteed.
The IAMB algorithm adopts the direct learning strategy, whereas the STMB, PCMB, and BAMB algorithms adopt the divide-and-conquer strategy. Although, in theory, the direct learning strategy has high computational efficiency but poor accuracy on high-dimensional data, our experimental results show that this is not absolute. The IAMB algorithm is significantly superior to the PCMB and BAMB algorithms in terms of time performance, but it outperforms the STMB algorithm on the Win95pts and Andes networks. The precision of the STMB algorithm is significantly lower than that of the IAMB, PCMB, and BAMB algorithms. MMPC is currently a commonly used algorithm for establishing the search space in Bayesian network heuristic algorithms. It prefers precision when establishing the search space, which is also the common preference of the other comparison algorithms. However, the improvement in precision often reduces the recall rate; among the comparison algorithms, the STMB algorithm, which has the lowest precision, maintains a relatively high recall rate. Our aim here is not to point out the shortcomings of the other algorithms but rather to highlight that they are not well suited to establishing the search space. This is also our motivation for developing a new local structure learning algorithm.
Table 4 reports the performance comparison with the six algorithms in identifying V-structures. For V-structure identification, we prioritize the precision rate over recall; the Fβ value with β = 0.2 is the harmonic measure we set according to this preference. The GS algorithm achieves high precision on the five networks other than the Munin network. However, its low precision on the Munin network and its low recall rate on all datasets indicate that its performance in identifying V-structures is weaker than that of the other algorithms. Both the LCS-FS and F2SL_c algorithms adopt the FCBF method; therefore, the recall rate of their V-structures cannot be guaranteed. In particular, on the Hepar2 network, the recall rate is only 1.63%, and it does not increase when the sample size increases to 10,000. This is also our motivation for introducing the LBRC method to rank feature importance and avoid the omission of important features. The time performance of the CMB and PCD-by-PCD algorithms is significantly lower than that of the other algorithms. The PC-Stable algorithm is a generally recognized constraint-based DAG learning method, but its precision on the Munin1000 dataset is only 5.51%.
To comprehensively illustrate the performance of the WLBRC algorithm in mining soft constraint knowledge, Table 5 reports the number of wins and losses of the proposed algorithm against the other algorithms in constructing the search space and identifying V-structures. For the search space, the WLBRC algorithm outperforms the other algorithms in terms of time and recall rate and is only on par with MMPC in terms of the comprehensive Fβ indicator. For the V-structure, the WLBRC algorithm outperforms the other algorithms in the comprehensive Fβ indicator and loses only to the F2SL algorithm in terms of time and precision. However, the F2SL algorithm adopts the FCBF method: its performance on the search space is consistent with that of the LCS-FS algorithm, and its recall rate is lower than that of the WLBRC algorithm. Overall, the WLBRC algorithm sacrifices precision in the search space to ensure recall and sacrifices recall in identifying V-structures to ensure precision. This preference leaves some indicators inferior to those of other algorithms. However, we believe that the proposed algorithm is better suited to the soft constraint knowledge mining problem in DAG learning.
4.3. Learning BNs via the FS-SFS Algorithm
The parameters of the FS-SFS algorithm are few and simple. As shown in Table 6, these parameters can be set directly without repeated optimization. To evaluate the performance of the SFS algorithm, we report the experimental results on the training sets of the different Bayesian network structures in Table 7. For each dataset, we report the average and standard deviation of each metric over 10 runs.
From the perspective of the BIC score, the standard deviations of the FS-SFS algorithm on the four datasets of the Alarm and Pigs networks are 0, which indicates that the algorithm can stably learn a network structure with the same BIC score on these two networks. Interestingly, on Alarm1000, although the acquired networks have the same BIC score, the F1 score and RE fluctuate, which indicates that the acquired networks have different structures with the same BIC score. For the other four networks, except for Munin3000, Munin5000, Munin10000, and Win95pts5000, the standard deviations of the BIC scores of the networks learned on the remaining datasets are all in the single digits, which is relatively small. An important reason for the BIC score fluctuation on the Munin network is that its complex structure prevents sufficient soft constraint knowledge from being learned from the data, thereby changing the final output structure. Overall, the SFS algorithm has very good stability in score search.
From the point of view of structural errors, the SHD tends to decrease with increasing sample size and fluctuates very little around the mean. For the Alarm and Pigs networks, the structural errors of the learned networks have almost no fluctuations. Among the other four networks, the average value of structural errors is relatively large and concentrated in DE and AE. By observing the experimental results in the previous section, we can determine that the reason is the low recall of the search space. With increasing sample size, the recall of the search space increases, and DE and AE decrease. This finding indicates that increasing the sample size is conducive to restoring the wrongly deleted edges in the learning network, and the addition of the correct edges also limits the entry of the wrong edges to a certain extent. Furthermore, we find that the standard deviations of the structural errors AE, DE, and RE are all very small relative to the mean. This indicates that the structural errors between the learned networks are very small; that is, the output results of the SFS algorithm have good stability in terms of graph accuracy. With increasing sample size, the F1 score of the learning network also tends to increase, which indicates that appropriately increasing the sample size is helpful for improving the graph accuracy of our algorithm.
From the point of view of running time, with increasing sample size, the running time did not surge sharply but increased slowly, which indicates that our algorithm is suitable for processing large-scale datasets. However, as the network scale increases, the running time of the algorithm increases significantly, which indicates that the time performance of our algorithm still needs to be further optimized.
To test the impact of parameter changes on the performance of the algorithm, we varied each of the three hyper-parameters up and down. Table 8 reports the performance after the parameter changes on the Alarm and Hepar2 networks. The sample sizes were 1000 and 10,000, and each parameter change was run only once. For the four datasets, changes in the hyper-parameter other than p and q produce no obvious change in any performance index, which indicates that the performance of the algorithm is not sensitive to this parameter. For p, all three performance indicators change significantly. The trend is that reducing p increases the BIC score and the SHD while reducing the F1 score, whereas increasing p increases the F1 score and reduces the BIC score and the SHD. Theoretically, the higher the positive constraint rate p of expert knowledge is, the better the performance of the algorithm should be; the inconsistency between the F1 score and the BIC score arises because the data are not faithful to the underlying network. For q, all three performance indicators change, but not markedly. The trend is that the smaller q is, the higher the BIC score is. The reason is that fluctuations in q shrink or expand the search space, and a change in the search space affects the output of the algorithm. The sensitivity of the algorithm's performance to the search space motivated us to develop a new local structure learning algorithm.
4.4. Comparison with Some Other Algorithms
To make the comparison fair, the FS-SFS algorithm integrates only the soft constraint knowledge, and the hard constraints are reset to 0. Furthermore, for the BNC-PSO algorithm, which is also a meta-heuristic, we also integrate the soft constraint knowledge; that is, it has the same initial population and search space as FS-SFS. For the two constraint-based algorithms, PC-Stable and GS, we still calculate the BIC scores of their outputs for convenient comparison.
Table 9 and Table 10 present the BIC scores and F1 scores, respectively, of the output structures of each algorithm on the different datasets. The best result for each dataset is displayed in bold. Since the FS-SFS algorithm is a score-based algorithm, we report a two-sided Wilcoxon signed-rank test on the BIC scores in Table 9 to assess whether the differences in the output results are statistically reliable. The p-value of the test is marked with an asterisk after the BIC score of the corresponding comparison algorithm.
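As an illustration of how such a test can be run, the following Python sketch uses SciPy's wilcoxon function on paired BIC scores; the numeric values below are placeholders, not results from Table 9:

from scipy.stats import wilcoxon

# Paired BIC scores of FS-SFS and one comparison algorithm over the same datasets (placeholder values).
fs_sfs_bic   = [-11200.5, -23045.8, -30912.1, -45210.3]
baseline_bic = [-11350.2, -23190.4, -31005.6, -45398.7]

stat, p_value = wilcoxon(fs_sfs_bic, baseline_bic, alternative="two-sided")
print(p_value)   # a small p-value suggests the difference in BIC scores is unlikely to be due to chance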
As shown in Table 9, the BIC scores of the FS-SFS algorithm on all datasets are higher than those of the other algorithms, which indicates that the SFS algorithm has a strong global search ability in score search. For the constraint-based algorithms GS and PC-Stable, the BIC scores of the output networks are significantly lower than those of the score-based algorithms, and the gap becomes more obvious as the network scale increases. Furthermore, the PC-Stable algorithm cannot calculate a score on five datasets. The four score-based algorithms are combinations of local search methods and global search methods. Among them, the F2SL algorithm and the MMHC algorithm adopt the same global search method (hill climbing). The F2SL algorithm outperforms the MMHC algorithm on the Alarm and Munin networks but loses to it on the other four networks, which indicates that the local search methods adopted by both have low robustness and cannot guarantee good performance on different BNs. Both the F2SL and MMHC algorithms lose to the meta-heuristic BNC-PSO and FS-SFS algorithms on all networks, which indicates that meta-heuristic methods may have better global search capabilities in score search than the hill climbing method. Since the BNC-PSO algorithm and the FS-SFS algorithm adopt the same local search method, we believe that the global search capability of the latter is stronger than that of the former.
As shown in Table 10, the FS-SFS algorithm has the highest F1 score on 14 of the 24 datasets. Interestingly, the datasets on which the F1 score wins differ from those on which the BIC score wins. One important reason for a high BIC score accompanied by a low F1 score is that the data are not faithful to the underlying BN. Since the algorithms involved in the comparison are either constraint-based or integrate local search methods, the F1 score of each algorithm improves as the sample size increases. For the constraint-based algorithms, the F1 scores of the GS algorithm on the Munin and Pigs networks are significantly lower than those of the other algorithms, and its F1 scores on the other four networks are also lower than those of the PC-Stable algorithm; clearly, PC-Stable performs better than GS. However, when the sample size is insufficient, the performance of the PC-Stable algorithm is severely challenged. For example, on the Munin1000 dataset, its F1 score is 0.1180, which is much lower than those of the score-based algorithms, whereas on the Munin10000 dataset, its F1 score rises to 0.5380, surpassing those of the F2SL, MMHC, and BNC-PSO algorithms. This indicates that the performance of constraint-based algorithms is constrained by the sample size. Similarly, the four score-based algorithms involved in the comparison integrate local search methods, and their performance is affected by the sample size. We compared the F1 scores of the four score-based algorithms when the sample size was 1000: the FS-SFS algorithm won three times, and the MMHC algorithm won two times. This finding indicates that FS-SFS is more suitable for handling small-sample datasets.
The above experimental results show that the FS-SFS algorithm under soft constraints has a strong global search ability in score search, and the graph accuracy of the learned network is also superior to that of the other comparison algorithms. However, when facing issues such as insufficient sample size and large-scale DAG learning, a high BIC score of the output structure often comes with a reduced F1 score. Hard constraints from expert knowledge can alleviate this problem, yielding high-score structures with high graph accuracy and thus strong robustness and accuracy. In conclusion, both soft and hard constraints have significant influences on the performance of the SFS algorithm, and the proposed local structure learning algorithm based on feature selection is well suited to soft constraint knowledge mining in the BN structure learning problem.