1. Introduction
A Bayesian network (BN) is a probabilistic graphical model used for knowledge representation and reasoning, mainly to represent and reason about the conditional dependencies among random variables. It represents the causal relationships between random variables through a directed acyclic graph (DAG) and quantifies these relationships with the associated probability distributions. This approach is widely used in many fields, including medical diagnosis [1,2], fault diagnosis [3], and environmental protection [4].
Currently, BN learning methods can be categorized into global structure learning and local structure learning. The former can be further divided into three main types: constraint-based approaches, score-based approaches, and hybrid approaches. Constraint-based approaches rely on conditional independence (CI) tests to identify causal relationships between variables. The early classical constraint-based methods were SGS and TPDA [5]. However, for a Bayesian network with n nodes, these two algorithms require an exponential number of CI tests and O(n^4) CI tests, respectively, in the worst case. To reduce the computational complexity, the PC [6] algorithm and its improved version, the PC-Stable [7] algorithm, were proposed; the latter is the most commonly used and serves as one of the comparison algorithms in this article. However, these methods all build BNs under the assumptions of causal faithfulness and causal sufficiency. Therefore, when the sample size is insufficient, the accuracy of the CI tests cannot be guaranteed, and the accuracy of the algorithms is greatly reduced.
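To make the role of the CI test concrete, the following minimal Python sketch (our illustration, not the test implementation of any cited algorithm; the function name, the integer-coded data format, and the significance level alpha are our own choices) computes a G² likelihood-ratio test of whether variable x is independent of variable y given a conditioning set z on discrete data:

import numpy as np
from itertools import product
from scipy.stats import chi2

def g_square_ci_test(data, x, y, z, alpha=0.05):
    # data: (n_samples, n_vars) array of integer-coded discrete values
    # Tests whether variable x is independent of variable y given the variables in z.
    xs, ys = np.unique(data[:, x]), np.unique(data[:, y])
    z_levels = [np.unique(data[:, c]) for c in z]
    g2, dof = 0.0, 0
    for cfg in product(*z_levels):          # one stratum per configuration of z
        mask = np.ones(len(data), dtype=bool)
        for c, v in zip(z, cfg):
            mask &= data[:, c] == v
        sub = data[mask]
        if len(sub) == 0:
            continue
        # observed contingency table of x versus y within this stratum
        obs = np.array([[np.sum((sub[:, x] == xv) & (sub[:, y] == yv))
                         for yv in ys] for xv in xs], dtype=float)
        exp = obs.sum(1, keepdims=True) * obs.sum(0, keepdims=True) / obs.sum()
        nz = obs > 0
        g2 += 2.0 * np.sum(obs[nz] * np.log(obs[nz] / exp[nz]))
        dof += (len(xs) - 1) * (len(ys) - 1)
    p_value = 1.0 - chi2.cdf(g2, max(dof, 1))
    return p_value > alpha                  # True: accept independence of x and y given z

Constraint-based methods such as PC and PC-Stable apply tests of this kind repeatedly with conditioning sets of growing size, which is why their reliability degrades when the number of samples per configuration of the conditioning set becomes small.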
The score-based approach treats BN structure learning as a model selection problem and uses a scoring function and a search strategy to find the structure with the highest score in the search space. Common scoring functions include K2, the Bayesian information criterion (BIC), and the minimum description length (MDL), and common search spaces comprise the DAG space, the equivalence class (EC) space [8], and the ordering space. Search strategies can be divided into two main categories: exact search and approximate search. Exact search algorithms, such as the B&B [9], A* [10], and ILP [11] algorithms, cannot learn large-scale Bayesian network structures, whereas approximate learning algorithms use heuristic methods to improve search efficiency and are the common choice for large-scale BN learning. Hill climbing (HC) [5] and K2 [12], which are based on greedy search, were proposed first. Subsequently, a series of meta-heuristic algorithms, such as the genetic algorithm (GA) [13], evolutionary programming [14], ant colony optimization (ACO) [15], cuckoo optimization (CO) [16], water cycle optimization (WCO) [17], particle swarm optimization (PSO) [18,19], the artificial bee colony (ABC) algorithm [20], bacterial foraging optimization (BFO) [21], and the firefly algorithm (FA) [22], were proposed to improve search efficiency and escape local optima. Among these meta-heuristic algorithms, PSO is the most widely used and serves as a comparison algorithm in our experiments. Although these meta-heuristic algorithms can efficiently explore the search space, two challenges remain in the face of large-scale and highly complex DAGs:
An overly large search space leads to low search efficiency.
The search is prone to falling into local optima, which reduces the accuracy of the final graph.
Hybrid approaches combine constraint-based and score-based approaches, attempting to integrate the advantages of both: the former is used to limit the search space of the latter. The classic hybrid approach is the max–min hill-climbing (MMHC) algorithm [23], which is also one of the comparison algorithms featured in this paper. In recent years, feature selection has also been introduced into BN structure learning, as in the F2SL [24] algorithm, which is also a comparison algorithm. It first determines the skeleton of the DAG through feature selection and then determines the direction of the edges on the basis of CI tests or scoring functions.
The goal of local structure learning is to find local structures in the form of the parent–child (PC) set or Markov blanket (MB) of target variables. Existing MB learning algorithms can be divided into two main types: direct learning strategies and divide-and-conquer learning strategies. The direct learning strategy searches for MB variables directly on the basis of the conditional independence properties of the MB, without distinguishing between PC and spouse variables. Among the direct learning methods, GS [25] was the first theoretically correct MB discovery algorithm and consists of two phases: growth and shrinkage. However, its heuristic is not efficient, and the later IAMB [26] algorithm adopts a dynamic heuristic that selects the candidate node set more effectively. In addition, the IAMB variants Inter-IAMB and Fast-IAMB alternately execute the growth and shrinkage phases of the IAMB algorithm and promptly delete false features from the MB set, thereby improving the accuracy of the CI tests in the later stages of the algorithm. However, GS, IAMB, and the IAMB variants use the set of all currently selected features as the conditioning set, so the number of data samples required for the test grows exponentially with the size of the MB. The direct learning method has an advantage in efficiency, but its accuracy on high-dimensional data is not ideal. The divide-and-conquer learning strategy exploits the direct causal relationship between parent and child variables and the target variable to learn PC variables and spouse variables separately. It is usually superior to the direct learning method in terms of search accuracy and data utilization efficiency.
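As an illustration of the grow–shrink idea behind GS and IAMB, the sketch below is ours; real implementations differ (for example, IAMB dynamically orders candidates by their association with the target), and ci_test stands for any conditional independence test, such as the G² test sketched above. The candidate MB is grown by adding variables that remain dependent on the target given the current candidate set and then shrunk by removing variables that become conditionally independent:

def grow_shrink_mb(target, variables, data, ci_test):
    # ci_test(x, y, z, data) returns True if x and y are conditionally
    # independent given the set z on the data.
    mb = []
    # Growth phase: add any variable still dependent on the target given mb.
    changed = True
    while changed:
        changed = False
        for v in variables:
            if v != target and v not in mb and not ci_test(target, v, mb, data):
                mb.append(v)
                changed = True
    # Shrinkage phase: remove variables that are independent of the target
    # given the rest of the current candidate MB.
    for v in list(mb):
        rest = [u for u in mb if u != v]
        if ci_test(target, v, rest, data):
            mb.remove(v)
    return mb

Because the conditioning set in each test is the entire current candidate MB, the amount of data needed for reliable tests grows quickly with the MB size, which is the weakness noted above.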
The first MB learning algorithm based on the divide-and-conquer strategy is the MMMB [27] algorithm, which identifies spouse variables by identifying V-structures. The HITON-MB [28] algorithm and the semi-interleaved HITON-MB algorithm are improved versions of the MMMB algorithm. However, the MMMB algorithm and its variants are not theoretically correct. The PCMB [29] algorithm is the first divide-and-conquer method that is provably correct; it innovatively introduces a symmetry test based on the “AND rule”. However, the symmetry test increases the time complexity of the algorithm, and the subsequently proposed STMB [30] algorithm alleviates this problem. To strike a balance between data efficiency and time efficiency, some algorithms, such as the BAMB [31] and EEMB [32] algorithms, alternate PC learning and spouse learning. Some local structure learning algorithms, such as the PCD-by-PCD [33] and CMB [34] algorithms, can perform causal orientation on the basis of the separation sets and Meek rules while conducting PC discovery. LCS-FS [35] is a local structure learning algorithm based on feature selection. When testing the causal relationship between two nodes, it adopts a feature selection method based on mutual information that requires no conditioning set, significantly improving efficiency.
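For reference, the classical collider-orientation rule that these local methods rely on can be sketched as follows (our illustrative Python version, assuming the skeleton and the separation sets have already been learned): for every triple X – Z – Y in which X and Y are non-adjacent, the triple is oriented as the V-structure X → Z ← Y whenever Z does not belong to the separation set of X and Y.

def orient_v_structures(skeleton, sepsets):
    # skeleton: dict mapping each node to the set of its undirected neighbors
    # sepsets:  dict mapping frozenset({x, y}) of each non-adjacent pair to the
    #           conditioning set that rendered the pair independent
    directed = set()   # collected arcs (parent, child)
    for z in skeleton:
        neighbors = sorted(skeleton[z])
        for i in range(len(neighbors)):
            for j in range(i + 1, len(neighbors)):
                x, y = neighbors[i], neighbors[j]
                if y in skeleton[x]:
                    continue                    # x and y adjacent: not a candidate triple
                if z not in sepsets.get(frozenset({x, y}), set()):
                    directed.add((x, z))        # orient x -> z <- y
                    directed.add((y, z))
    return directed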
Each approach has its own limitations and advantages. How to combine the constraint-based and score-based methods more effectively, and how to combine global and local structure learning so that the two play complementary roles, are open problems in Bayesian network structure learning and constitute the research content of this paper. In our work, the local structure learning method is used to mine prior knowledge, and the global structure learning method compensates for its own limitations by integrating this prior knowledge. For the local structure learning algorithm, we design a new feature selection method that improves the recall rate of the MB while avoiding high-order CI tests and enhances the precision rate when identifying V-structures. For the global structure optimization algorithm, our goal is to develop a meta-heuristic algorithm based on knowledge fusion, which uses the obtained knowledge as constraints so that the meta-heuristic converges rapidly with improved accuracy.
The stochastic fractal search (SFS) [36] algorithm is a meta-heuristic algorithm proposed in 2015 that has few parameters, fast convergence, and high accuracy. Its inspiration comes from combining the principles of fractal geometry in nature with a random search strategy. It effectively explores the solution space by exploiting the self-similarity and randomness of fractal structures, avoiding local optima. It has been widely applied to complex optimization problems in various fields, including power and energy [37], finance [38], image processing [39], and machine learning [40]. To our knowledge, these applications all focus on continuous optimization problems. Therefore, to introduce the SFS algorithm into the field of DAG learning, we propose a new binary SFS algorithm. The SFS algorithm consists of two main processes: the diffusion process and the update process. During the diffusion process, fractal shapes are generated as random fractals, produced by modifying the random rules during the iterative process; for the DAG learning problem, we redefine a new random walk strategy to handle the binary representation. During the update process, the strategy of updating positions among particles through information exchange is linked to mutual learning among individuals in the DAG learning problem.
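To convey the flavor of the two processes on a binary adjacency-matrix encoding, the following simplified Python sketch is ours and is not the exact FS-SFS operator set; the flip-count rule, the learn_prob parameter, and the function names are assumptions made for illustration. Diffusion flips a Gaussian-controlled number of entries around a particle, and the update step copies entries from a better-scoring particle:

import numpy as np

def diffusion(adj, sigma, rng):
    # Binary analogue of the Gaussian random walk: flip a Gaussian-controlled
    # number of adjacency-matrix entries around the current particle.
    n = adj.shape[0]
    walker = adj.copy()
    n_flips = max(1, int(abs(rng.normal(0.0, sigma)) * n))
    for _ in range(n_flips):
        i, j = rng.integers(n), rng.integers(n)
        if i != j:
            walker[i, j] = 1 - walker[i, j]      # add or remove the edge i -> j
    return walker

def update(adj, better_adj, learn_prob, rng):
    # Information exchange: copy entries from a better-scoring particle with
    # probability learn_prob, mimicking mutual learning among individuals.
    mask = rng.random(adj.shape) < learn_prob
    new_adj = np.where(mask, better_adj, adj)
    np.fill_diagonal(new_adj, 0)                 # no self-loops
    return new_adj

In a full search loop, each candidate produced this way would still have to be repaired to remain acyclic and would be accepted only if its score (e.g., BIC) improves; those details are omitted here.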
The main contributions of the paper are summarized as follows:
We propose a new local structure learning method that uses a combination of feature selection and CI testing to mine prior knowledge and identify partial edges.
We propose a binary SFS algorithm, which can integrate the obtained prior knowledge as soft/hard constraints to improve the search efficiency and accuracy.
Experiments show that the proposed local structure learning algorithm, in terms of constructing the search space and obtaining structural priors, is better suited to soft constraint knowledge mining for the BN structure learning problem. Moreover, the joint cooperation of soft and hard constraints improves the performance of the SFS algorithm.
The remainder of this article is organized as follows. Section 2 introduces the background and related concepts. Section 3 discusses the research design of this study. Section 4 presents the experimental results and performance evaluation. Section 5 concludes this paper with some remarks and suggestions for future research.
4. Experiments
In this section, we first evaluate the performance of the WLBRC method in mining soft constraint knowledge: we compare it with six MB discovery algorithms to evaluate its performance in constructing the search space and with six algorithms that can identify V-structures to evaluate its performance in identifying V-structures. The six MB discovery algorithms are IAMB, STMB, PCMB, BAMB, LCS-FS, and MMPC [23]. The six algorithms that can identify V-structures are the GS, CMB, PCD-by-PCD, LCS-FS, PC-Stable, and F2SL_c [24] algorithms. Then, we evaluate the performance of the FS-SFS algorithm on different networks and datasets. Finally, the FS-SFS algorithm is compared with five known BN structure learning algorithms on different datasets. The five comparison algorithms are the PC-Stable, GS, F2SL, MMHC, and BNC-PSO algorithms. All the algorithms are run in MATLAB R2020a, and all the following experiments are performed on an AMD 1.7 GHz CPU with 16 GB of RAM.
4.1. Datasets and Evaluation Metrics
To evaluate the performance of the FS-SFS algorithm, we selected six standard Bayesian networks from the BNLEARN repository (https://www.bnlearn.com/bnrepository/, accessed on 29 April 2025) and collected 1000, 3000, 5000, and 10,000 samples for each network. The summaries of the six Bayesian networks are shown in Table 1. To better test the performance of the proposed algorithm, we selected networks with more nodes. In the BNLEARN repository, the Alarm network is a medium network; the Hepar2 and Win95pts networks are large networks; and the Munin, Andes, and Pigs networks are very large networks.
To evaluate the search performance of the SFS algorithm, we adopt the following indicators:
BIC: The BIC score of the output optimal network.
AE: The number of edges in the output optimal network that were incorrectly added.
DE: The number of edges in the output optimal network that were incorrectly deleted.
RE: The number of edges in the output optimal network that were incorrectly reversed.
SHD: The structural Hamming distance between the output structure and the original structure.
RT: The running time of the SFS algorithm.
F1: The evaluation index of graph accuracy, computed as F1 = 2 · P · R / (P + R), where P is precision and R is recall.
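All the structural indicators above can be computed from the adjacency matrices of the learned and true DAGs. The following Python sketch is our illustrative version (counting conventions, for example whether a reversed edge also contributes to other counts, vary slightly across papers):

import numpy as np

def structural_metrics(learned, true):
    # learned, true: n x n binary adjacency matrices, entry [i, j] = 1 for edge i -> j
    ae = int(np.sum((learned == 1) & (true == 0) & (true.T == 0)))     # wrongly added edges (AE)
    de = int(np.sum((true == 1) & (learned == 0) & (learned.T == 0)))  # wrongly deleted edges (DE)
    re = int(np.sum((learned == 1) & (true == 0) & (true.T == 1)))     # wrongly reversed edges (RE)
    shd = ae + de + re                                                 # structural Hamming distance
    tp = int(np.sum((learned == 1) & (true == 1)))                     # correctly oriented edges
    precision = tp / max(int(learned.sum()), 1)
    recall = tp / max(int(true.sum()), 1)
    f1 = 2 * precision * recall / max(precision + recall, 1e-12)
    return dict(AE=ae, DE=de, RE=re, SHD=shd, F1=f1)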
Two main performance indicators were used to evaluate the performance of the proposed feature selection method: precision (P) and recall (R). Precision represents the number of correct edges in the set divided by the total number of edges in the set, and recall represents the number of correct edges in the set divided by the true number of edges in the original set. Recall and precision often have an inverse relationship: in some cases, increasing recall may reduce precision, and vice versa. The search space should adopt the principle of recall priority, because a high recall rate improves the completeness of the search space, which is crucial to the performance of the subsequent score-based search algorithm. The orientation of V-structures should adopt the principle of precision priority, because high precision improves the search efficiency of the subsequent score-based search algorithm. To quantitatively evaluate the performance of each algorithm on the search space and the oriented V-structures, we adopt the more general Fβ form of F1 to express our different preferences for precision and recall, defined as Fβ = (1 + β²) · P · R / (β² · P + R). When β > 1, the recall rate has a greater impact, and when β < 1, the precision rate has a greater impact. In our experiment, β was set to 5 when the search space was compared and to 0.2 when the V-structures were compared.
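A minimal Python sketch of this weighted measure and of the two settings used in our comparisons (β = 5 for the search space, β = 0.2 for V-structures):

def f_beta(precision, recall, beta):
    # Weighted harmonic mean of precision and recall; beta > 1 favors recall, beta < 1 favors precision.
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta ** 2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Example: the same (precision, recall) pair judged under the two preferences.
p, r = 0.4, 0.9
print(f_beta(p, r, 5.0))   # recall-priority score used when comparing search spaces
print(f_beta(p, r, 0.2))   # precision-priority score used when comparing V-structures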
4.2. Soft Constraint Knowledge Mining
From the previous section, we know that soft constraint knowledge contains a series of search spaces and a set of directed edges. We report the precision and recall rates of each search space and of the set of directed edges in Table 2.
For search spaces, a higher recall indicates a better completeness of the search space, and a higher precision can enhance search efficiency. By comparing , , and , we find that the precision of is too low, which reduces the search efficiency, whereas has the lowest recall, which affects the graph accuracy of the final output network. Therefore, we consider as the search space. For , we found that on the Alarm and Pigs networks, the recall rate fluctuated very little compared with that of . However, on the other four networks, the recall rate decreases to varying degrees, which may cause the final output network to lose some edges. For , the precision is relatively high, and the initial population obtained by climbing within it has a very high score.
For the V set, the precision indicates the accuracy of the identified direction. The higher the precision is, the more reliable the prior knowledge is. The recall indicates the sufficiency of soft constraint knowledge. The higher it is, the more abundant the soft constraint knowledge is. For the Alarm, Win95pts, Andes, and Pigs networks, the precision exceeds 90%, which indicates that the orientation of the V-structure is very reliable for these four networks. For the Hepar2 network and the Munin network, the recall is significantly lower than that of the other four networks, and the precision is also lower than that of the other four networks. This indicates that sufficient and reliable prior knowledge has not been mined on these two networks.
To further illustrate the performance of the algorithm we proposed in soft constraint knowledge mining, we compared the results of six local structure learning algorithms in terms of the search space and V-structure identification. The adopted dataset sizes are 1000 and 10,000, which are the minimum and maximum values of the datasets collected in our experiment, respectively.
Table 3 reports the performance comparison with the six local MB discovery algorithms in terms of the search space. For the search space, we prioritize the recall rate over precision; the Fβ value with β = 5 is the harmonic measure we set on the basis of this preference. As the sample size increases from 1000 to 10,000, the recall rates of all the algorithms improve significantly, except for LCS-FS. Moreover, the recall rate of LCS-FS is also significantly lower than that of the other algorithms on most datasets. Our analysis suggests that this is because LCS-FS adopts the FCBF method for approximate MB discovery; as explained in the previous section, this method is prone to incorrectly deleting important features, so its recall rate cannot be guaranteed.
The IAMB algorithm adopts the direct learning strategy, whereas the STMB, PCMB, and BAMB algorithms adopt the divide-and-conquer strategy. Although, in theory, the direct learning strategy has high computational efficiency but poor accuracy on high-dimensional data, our experimental results show that this is not absolute. The IAMB algorithm is significantly superior to the PCMB and BAMB algorithms in terms of time performance, but it outperforms the STMB algorithm on the Win95pts and Andes networks. The precision of the STMB algorithm is significantly lower than that of the IAMB, PCMB, and BAMB algorithms. MMPC is currently a commonly used algorithm for establishing the search space in Bayesian network heuristic algorithms. It prefers precision when establishing the search space, which is also the common preference of the other comparison algorithms. However, the improvement in precision often reduces the recall rate; among the comparison algorithms, the STMB algorithm, which has the lowest precision, maintains a relatively high recall rate. Our aim here is not to point out the shortcomings of the other algorithms but rather to highlight that they are not well suited to establishing the search space. This is also our motivation for developing a new local structure learning algorithm.
Table 4 reports the performance comparison with the six algorithms in identifying V-structures. For V-structure identification, we prioritize the precision rate over recall; the Fβ value with β = 0.2 is the harmonic measure we set according to this preference. The GS algorithm achieves high precision on the five networks other than the Munin network. However, its low precision on the Munin network and its low recall rate on all datasets indicate that its performance in identifying V-structures is weaker than that of the other algorithms. Both the LCS-FS and F2SL_c algorithms adopt the FCBF method; therefore, the recall rate of their V-structures cannot be guaranteed. In particular, on the Hepar2 network, the recall rate is only 1.63%, and it does not increase when the sample size increases to 10,000. This is also our motivation for introducing the LBRC method to rank feature importance and avoid the omission of important features. The time performance of the CMB and PCD-by-PCD algorithms is significantly lower than that of the other algorithms. The PC-Stable algorithm is a generally recognized constraint-based DAG learning method, but its precision on the Munin1000 dataset is only 5.51%.
To comprehensively illustrate the performance of the WLBRC algorithm in mining soft constraint knowledge, Table 5 reports the number of wins and losses of the proposed algorithm against the other algorithms in constructing the search space and identifying V-structures. For the search space, the WLBRC algorithm outperforms the other algorithms in terms of time and recall rate and is only on par with MMPC in terms of the comprehensive Fβ indicator. For the V-structure, the WLBRC algorithm outperforms the other algorithms in the comprehensive Fβ indicator and loses only to the F2SL algorithm in terms of time and precision. However, the F2SL algorithm adopts the FCBF method: its performance on the search space is consistent with that of the LCS-FS algorithm, and its recall rate is lower than that of the WLBRC algorithm. Overall, the WLBRC algorithm sacrifices precision in the search space to ensure recall and sacrifices recall in identifying V-structures to ensure precision. This preference leaves some indicators inferior to those of other algorithms. However, we believe that the proposed algorithm is better suited to the soft constraint knowledge mining problem in DAG learning.
4.3. Learning BNs via the FS-SFS Algorithm
The parameters of the FS-SFS algorithm are few and simple. As shown in Table 6, these parameters can be set directly without repeated optimization. To evaluate the performance of the SFS algorithm, we report the experimental results on the training sets of the different Bayesian network structures in Table 7. For each dataset, we report the average and standard deviation of each metric over 10 runs.
From the perspective of the BIC score, the standard deviations of the FS-SFS algorithm on the four datasets of the Alarm and Pigs networks are 0, which indicates that the algorithm can stably learn a network structure with the same BIC score on these two networks. Interestingly, on Alarm1000, although the acquired networks have the same BIC score, the F1 score and RE fluctuate, which indicates that the acquired networks have different structures with the same BIC score. For the other four networks, except for Munin3000, Munin5000, Munin10000, and Win95pts5000, the standard deviations of the BIC scores of the networks learned on the remaining datasets are all in the single digits, which is relatively small. An important reason for the BIC score fluctuation on the Munin network is that its complex structure prevents sufficient soft constraint knowledge from being learned from the data, thereby changing the final output structure. Overall, the SFS algorithm has very good stability in score search.
From the point of view of structural errors, the SHD tends to decrease with increasing sample size and fluctuates very little around the mean. For the Alarm and Pigs networks, the structural errors of the learned networks have almost no fluctuations. Among the other four networks, the average value of structural errors is relatively large and concentrated in DE and AE. By observing the experimental results in the previous section, we can determine that the reason is the low recall of the search space. With increasing sample size, the recall of the search space increases, and DE and AE decrease. This finding indicates that increasing the sample size is conducive to restoring the wrongly deleted edges in the learning network, and the addition of the correct edges also limits the entry of the wrong edges to a certain extent. Furthermore, we find that the standard deviations of the structural errors AE, DE, and RE are all very small relative to the mean. This indicates that the structural errors between the learned networks are very small; that is, the output results of the SFS algorithm have good stability in terms of graph accuracy. With increasing sample size, the F1 score of the learning network also tends to increase, which indicates that appropriately increasing the sample size is helpful for improving the graph accuracy of our algorithm.
From the point of view of running time, with increasing sample size, the running time did not surge sharply but increased slowly, which indicates that our algorithm is suitable for processing large-scale datasets. However, as the network scale increases, the running time of the algorithm increases significantly, which indicates that the time performance of our algorithm still needs to be further optimized.
To test the impact of parameter changes on the performance of the algorithm, we varied each of the three hyper-parameters up and down. Table 8 reports the performance after the parameter changes on the Alarm and Hepar2 networks. The sample sizes were 1000 and 10,000, and each parameter change was run only once. For the four datasets, changes in the hyper-parameter other than p and q produce no obvious change in any performance index, which indicates that the performance of the algorithm is not sensitive to this parameter. For p, all three performance indicators change significantly. The trend is that reducing p increases the BIC score and the SHD while reducing the F1 score, whereas increasing p increases the F1 score and reduces the BIC score and the SHD. Theoretically, the higher the positive constraint rate p of expert knowledge is, the better the performance of the algorithm should be; the inconsistency between the F1 score and the BIC score arises because the data are not faithful to the underlying network. For q, all three performance indicators change, but not markedly. The trend is that the smaller q is, the higher the BIC score is. The reason is that fluctuations in q shrink or expand the search space, and a change in the search space affects the output of the algorithm. The sensitivity of the algorithm's performance to the search space motivated us to develop a new local structure learning algorithm.
4.4. Comparison with Some Other Algorithms
To make the comparison fair, the FS-SFS algorithm integrates only the soft constraint knowledge, and the hard constraints are reset to 0. Furthermore, for the BNC-PSO algorithm, which is also a meta-heuristic, we also integrate the soft constraint knowledge; that is, it has the same initial population and search space as FS-SFS. For the two constraint-based algorithms, PC-Stable and GS, we still calculate the BIC scores of their outputs for convenient comparison.
Table 9 and Table 10 present the BIC scores and F1 scores, respectively, of the output structures of each algorithm on the different datasets. The best result for each dataset is displayed in bold. Since the FS-SFS algorithm is a score-based algorithm, we report a two-sided Wilcoxon signed-rank test on the BIC scores in Table 9 to assess whether the differences in the output results are statistically reliable. The p-value of the test is marked with an asterisk after the BIC score of the corresponding comparison algorithm.
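As an illustration of how such a test can be run, the following Python sketch uses SciPy's wilcoxon function on paired BIC scores; the numeric values below are placeholders, not results from Table 9:

from scipy.stats import wilcoxon

# Paired BIC scores of FS-SFS and one comparison algorithm over the same datasets (placeholder values).
fs_sfs_bic   = [-11200.5, -23045.8, -30912.1, -45210.3]
baseline_bic = [-11350.2, -23190.4, -31005.6, -45398.7]

stat, p_value = wilcoxon(fs_sfs_bic, baseline_bic, alternative="two-sided")
print(p_value)   # a small p-value suggests the difference in BIC scores is unlikely to be due to chance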
As shown in Table 9, the BIC scores of the FS-SFS algorithm on all datasets are higher than those of the other algorithms, which indicates that the SFS algorithm has a strong global search ability in score search. For the constraint-based algorithms GS and PC-Stable, the BIC scores of the output networks are significantly lower than those of the score-based algorithms, and the gap becomes more obvious as the network scale increases. Furthermore, the PC-Stable algorithm cannot calculate a score on five datasets. The four score-based algorithms are combinations of local search methods and global search methods. Among them, the F2SL algorithm and the MMHC algorithm adopt the same global search method (hill climbing). The F2SL algorithm outperforms the MMHC algorithm on the Alarm and Munin networks but loses to it on the other four networks, which indicates that the local search methods adopted by both have low robustness and cannot guarantee good performance on different BNs. Both the F2SL and MMHC algorithms lose to the meta-heuristic BNC-PSO and FS-SFS algorithms on all networks, which indicates that meta-heuristic methods may have better global search capabilities in score search than the hill climbing method. Since the BNC-PSO algorithm and the FS-SFS algorithm adopt the same local search method, we believe that the global search capability of the latter is stronger than that of the former.
As shown in Table 10, the FS-SFS algorithm has the highest F1 score on 14 of the 24 datasets. Interestingly, the datasets on which the F1 score wins differ from those on which the BIC score wins. One important reason for a high BIC score accompanied by a low F1 score is that the data are not faithful to the underlying BN. Since the algorithms involved in the comparison are either constraint-based or integrate local search methods, the F1 score of each algorithm improves as the sample size increases. For the constraint-based algorithms, the F1 scores of the GS algorithm on the Munin and Pigs networks are significantly lower than those of the other algorithms, and its F1 scores on the other four networks are also lower than those of the PC-Stable algorithm; clearly, PC-Stable performs better than GS. However, when the sample size is insufficient, the performance of the PC-Stable algorithm is severely challenged. For example, on the Munin1000 dataset, its F1 score is 0.1180, which is much lower than those of the score-based algorithms, whereas on the Munin10000 dataset, its F1 score rises to 0.5380, surpassing those of the F2SL, MMHC, and BNC-PSO algorithms. This indicates that the performance of constraint-based algorithms is constrained by the sample size. Similarly, the four score-based algorithms involved in the comparison integrate local search methods, and their performance is affected by the sample size. We compared the F1 scores of the four score-based algorithms when the sample size was 1000: the FS-SFS algorithm won three times, and the MMHC algorithm won two times. This finding indicates that FS-SFS is more suitable for handling small-sample datasets.
The above experimental results show that the FS-SFS algorithm under soft constraints has a strong global search ability in score search, and the graph accuracy of the learned network is also superior to that of the other comparison algorithms. However, when facing issues such as insufficient sample size and large-scale DAG learning, a high BIC score of the output structure often comes with a reduced F1 score. Hard constraints from expert knowledge can alleviate this problem, yielding high-score structures with high graph accuracy and thus strong robustness and accuracy. In conclusion, both soft and hard constraints have significant influences on the performance of the SFS algorithm, and the proposed local structure learning algorithm based on feature selection is well suited to soft constraint knowledge mining in the BN structure learning problem.