Article

A Better Mechanistic Understanding of Big Data through an Order Search Using Causal Bayesian Networks

1 Department of Biostatistics, Florida International University, Miami, FL 33199, USA
2 Department of Mathematics & Statistics, University of South Florida, Tampa, FL 33620, USA
3 Department of Environmental Health Sciences, Florida International University, Miami, FL 33199, USA
* Author to whom correspondence should be addressed.
Big Data Cogn. Comput. 2022, 6(2), 56; https://doi.org/10.3390/bdcc6020056
Submission received: 8 April 2022 / Revised: 8 May 2022 / Accepted: 10 May 2022 / Published: 17 May 2022

Abstract

Every year, biomedical data is increasing at an alarming rate and is being collected from many different sources, such as hospitals (clinical Big Data), laboratories (genomic and proteomic Big Data), and the internet (online Big Data). This article presents and evaluates a practical causal discovery algorithm that uses modern statistical, machine learning, and informatics approaches to learn causal relationships from biomedical Big Data, which in turn integrates clinical, omics (genomic and proteomic), and environmental aspects. The learning of causal relationships from data using graphical models does not usually address the hidden (unknown or not measured) mechanisms that are inherent to most measurements and analyses. In addition, many algorithms lack practical usability because they do not incorporate current mechanistic knowledge. This paper proposes a practical causal discovery algorithm that uses causal Bayesian networks to gain a better understanding of the underlying mechanistic process that generated the data. The algorithm utilizes model averaging techniques, such as searching through a relative order (e.g., if gene A regulates gene B, then gene A is of a higher order than gene B), and incorporates relevant prior mechanistic knowledge to guide the Markov chain Monte Carlo search through the orders. The algorithm was evaluated by testing its performance on datasets generated from the ALARM causal Bayesian network. Out of the 37 variables in the ALARM causal Bayesian network, two sets of nine were chosen, and the observations for those variables were provided to the algorithm. The performance of the algorithm was evaluated by comparing its predictions with the generating causal mechanism. The 28 variables that were not in use are referred to as hidden variables, and they allowed for the evaluation of the algorithm's ability to predict hidden confounded causal relationships. The algorithm's predictive performance was also compared with that of other causal discovery algorithms. The results show that incorporating order information provides a better mechanistic understanding even when hidden confounded causes are present. The prior mechanistic knowledge incorporated in the Markov chain Monte Carlo search led to better discovery of causal relationships when hidden variables were involved in generating the simulated data.

1. Introduction

The size of biomedical data, as well as the rate at which it is being produced, is increasing dramatically. Biomedical data is also being collected from many different sources, such as hospitals (clinical Big Data), laboratories (genomic and proteomic Big Data), and the internet (online Big Data). There is a growing need for statistically predictive causal discovery algorithms that incorporate the biological knowledge gained from modern statistical, machine learning, and informatics approaches used in the learning of causal relationships from biomedical Big Data comprising clinical, omics (genomic and proteomic), and environmental components.
While earlier studies focused on statistical methods to infer causality [1,2,3,4], recent statistical machine learning methods have been introduced that aim at analyzing big datasets [5,6,7,8,9,10,11,12,13,14,15,16,17,18]. However, given the many different types of clinical, genomic, and environmental data, it is rather uncommon to see statistical machine learning methods that utilize prior knowledge about the mechanisms behind the phenomena that generate those different data types. Statistical machine learning methods that recognize that many variables are not collected in the data, yet are still related to the mechanisms that produced the data (hidden variables), are also limited. Furthermore, there is a lack of statistical methods that evaluate how well such methods perform at inferring causality when hidden confounded variables are present.
There are many aspects of causality, from its representation (syntax) to its semantics, and many related concepts, e.g., the theory of inferred causation, counterfactual analyses, incomplete interventions, confounding effects, etc. [1,9]. However, in learning mechanisms from a phenomenon with collected data, the goal is to infer cause and effect relationships, with reasonable confidence, among an intricately connected web of random variables in the dataset.
Thus, the focus of this study is on the learning of causal relationships among random variables in the collected data, particularly when using causal Bayesian networks (CBNs). CBNs are directed acyclic graphs in which each arc is interpreted as a direct causal influence between a parent node and a child node, relative to the other nodes in the network [19]. CBNs consist of a structure (such as the example in Figure 1) and a set of probabilities that parameterize that structure (not shown). In general, for each variable there is a conditional probability of that variable given the states of its direct causes. Thus, the probability associated with Gliomas Grade is P(Gliomas Grade | PTNP1, LPL, EGFR). That is, we provide the probability distribution over the values of Gliomas Grade conditioned on each of the possible expression levels of the genes PTNP1, LPL, and EGFR. For variables that have no direct causes in the network, a prior probability is specified. The causal Markov condition [9] specifies the conditional independence relationships that are represented by a causal network: let X and Y be variables, and suppose that Y is neither a direct nor an indirect effect of X; then X is independent of Y, conditioned on any state of the direct causes of X. The causal Markov condition permits the joint distribution of the n variables in a CBN to be factored as follows [19]:
$$P(x_1, x_2, \ldots, x_n \mid K) = \prod_{i=1}^{n} P(x_i \mid \pi_i, K)$$
where $x_i$ denotes a state of variable $X_i$, $\pi_i$ denotes a joint state of the parents of $X_i$, and $K$ denotes background knowledge (prior probability). Since the initial research on a general Bayesian formulation for learning causal structure (including latent variables) and parameters from observational data using CBNs [20,21], Bayesian causal discovery has become an active field of research in which numerous advances have been made [1,7,8,10,22,23].
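To make the factorization concrete, the following minimal Python sketch evaluates the joint probability of one configuration of a small hypothetical three-variable network (the structure, states, and probabilities are illustrative, not those of Figure 1):

```python
# A minimal sketch of the causal Markov factorization:
# P(x1, ..., xn | K) = prod_i P(xi | parents(xi), K).
# The network, states, and probabilities below are hypothetical.

parents = {"A": [], "B": ["A"], "C": ["A", "B"]}

# Conditional probability tables: P(node = 1 | joint parent state).
# Keys are tuples of parent values, in the order listed above.
cpt = {
    "A": {(): 0.3},
    "B": {(0,): 0.2, (1,): 0.7},
    "C": {(0, 0): 0.1, (0, 1): 0.5, (1, 0): 0.4, (1, 1): 0.9},
}

def joint_probability(assignment):
    """P(assignment) as the product of each node's conditional probability."""
    p = 1.0
    for node, pa in parents.items():
        pa_state = tuple(assignment[q] for q in pa)
        p1 = cpt[node][pa_state]            # P(node = 1 | parent state)
        p *= p1 if assignment[node] == 1 else 1.0 - p1
    return p

print(joint_probability({"A": 1, "B": 0, "C": 1}))  # 0.3 * 0.3 * 0.4 = 0.036
```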
CBNs have proven suitable for analyzing Big Data sets consisting of different types of large data, including clinical, genomic, and environmental data [8,12,23,24,25,26,27,28,29]. Such causal statistical models help provide a more comprehensive understanding of human physiology and disease. More importantly, CBNs have been used as a natural way to express "causal" knowledge as a graph using nodes (representing random variables) and arcs (representing "causal" relationships). Indeed, there are many causal models built from existing causal knowledge, from simple and intuitive causal models (e.g., a model to predict whether a neighbor is out [30], a sprinkler model [1], etc.) to expert causal models (e.g., a multiple diseases model [31], the ALARM monitoring system [32], etc.). The learning of causal relationships from data has been discussed in different articles [1,9,33], and this especially holds true for cases where researchers have used Bayesian networks for learning structures [29,34,35,36,37]. Also, other algorithms, such as PC [9], K2 [5], and more recently Bayesian Inference for Directed Acyclic Graphs (BiDAG) [12], have been used to learn causal relationships from data.
Earlier structure learning methods concentrated on model selection, where we select a model $M^*$ from
$$M^* = \arg\max_i P(D \mid M_i)$$
or
$$M^* = \arg\max_i P(M_i \mid D)$$
where we assume we have $p$ mutually exclusive models, $M_1, M_2, \ldots, M_p$ [38]. Later methods incorporated model averaging [29], where we summarize how likely a feature $F$ is that is found in a subset of the models; the subset is defined by a set of indices $f \subseteq \{1, 2, \ldots, p\}$, where $f$ includes the indices of the models in which $F$ is observed. Thus, in model averaging, we calculate the probability of a feature $F$ as the following:
$$\sum_{i \in f} P(D \mid M_i)$$
or
$$\sum_{i \in f} P(M_i \mid D)$$
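To make the contrast with model selection concrete, the following sketch computes the probability of a feature by summing the scores of the models that contain it, relative to the total score; the model scores and the feature's index set are hypothetical:

```python
# A minimal sketch of model averaging over a feature F.
# Scores P(D | M_i) and the models containing F are hypothetical.

scores = {"M1": 0.04, "M2": 0.25, "M3": 0.10, "M4": 0.01}  # P(D | M_i)
f = {"M2", "M3"}  # indices of the models in which feature F is observed

# Model selection keeps only argmax_i P(D | M_i) ...
best = max(scores, key=scores.get)

# ... while model averaging weighs every model containing F.
p_feature = sum(scores[m] for m in f) / sum(scores.values())

print(best)        # 'M2'
print(p_feature)   # 0.35 / 0.40 = 0.875
```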
However, most of the structure learning methods do not address hidden variables. Since we cannot observe all relevant variables in a natural phenomenon, to better learn the underlying mechanistic process from Big Data, we need to address and evaluate the learning of causal relationships with hidden variables.
In this paper, we show that searching through the order of variables in CBNs (we describe what we mean by "order" in the Methods section) can help provide a better understanding of the underlying mechanistic process that generated the data, even in the presence of hidden variables. In addition, we propose a novel algorithm for searching through the order (we call it the PrePrior algorithm), which shows promising performance when learning the underlying mechanistic process from data containing hidden variables. The algorithm utilizes model averaging techniques, such as searching through a relative order (e.g., if gene A regulates gene B, then we can say that gene A is of a higher order than gene B), and incorporates relevant prior mechanistic knowledge to guide the Markov chain Monte Carlo (MCMC) search through the orders.

2. Methods

Given a CBN structure S and a dataset D, the Bayesian score that assesses how well the structure fits the given data can be calculated in closed form [39]:
$$P(D \mid S) = \prod_{i=1}^{n} \prod_{j=1}^{q_i} \frac{\Gamma(N'_{ij})}{\Gamma(N'_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(N'_{ijk} + N_{ijk})}{\Gamma(N'_{ijk})}$$
In the above scoring method, Dirichlet uniform parameter priors are used and parameter independence is assumed [40]; $n$ represents the number of variables in the structure; $q_i$ represents the number of configurations of the parents of a given variable $X_i$; and $r_i$ represents the total number of states of a variable $X_i$. For example, if $X_i$ is a binary random variable and it has two binary random variables as direct causes (parents), then $r_i$ equals two and $q_i$ equals four. $N_{ijk}$ represents the counts for a given variable $X_i$ under a given parent configuration (indexed by $j$) and a given state (indexed by $k$) of variable $X_i$, with $N_{ij} = \sum_k N_{ijk}$ and $N'_{ij} = \sum_k N'_{ijk}$. $N'_{ijk}$ represents the Dirichlet uniform prior, which in this case may be calculated as the following:
$$N'_{ijk} = \frac{1}{r_i q_i}$$
The number of possible structures increases super-exponentially with the number of variables, and so the above formula is sufficient for determining the best CBN only when the number of variables is small. When the number of variables is large, it becomes computationally infeasible to determine the best structure in this manner. The problem of finding the best CBN is NP-hard [41], and thus it is not always possible to find the best CBN that fits the data. This is the key limitation of model selection methods [38] when used as a means of extending our current mechanistic understanding through the learning of causal relationships from data.
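For illustration only, the closed-form score above can be transcribed nearly line-for-line in log space (to avoid numerical overflow with the Gamma function); the two-variable dataset and structures below are hypothetical, and `gammaln` is SciPy's log-gamma function:

```python
# A minimal sketch of the Bayesian structure score above, computed in log
# space with uniform Dirichlet priors N'_ijk = 1 / (r_i * q_i).
from itertools import product
from scipy.special import gammaln

def log_score(data, parents, arities):
    """log P(D | S) for a discrete dataset (list of dicts) and structure S."""
    total = 0.0
    for x, pa in parents.items():
        r = arities[x]
        q = 1
        for p in pa:
            q *= arities[p]
        n_prime = 1.0 / (r * q)  # N'_ijk
        for pa_state in product(*(range(arities[p]) for p in pa)):
            rows = [row for row in data
                    if all(row[p] == s for p, s in zip(pa, pa_state))]
            n_ij = len(rows)  # N_ij = sum_k N_ijk
            total += gammaln(r * n_prime) - gammaln(r * n_prime + n_ij)
            for k in range(r):
                n_ijk = sum(1 for row in rows if row[x] == k)
                total += gammaln(n_prime + n_ijk) - gammaln(n_prime)
    return total

data = [{"A": 0, "B": 0}, {"A": 0, "B": 0}, {"A": 1, "B": 1}]
arities = {"A": 2, "B": 2}
print(log_score(data, {"A": [], "B": ["A"]}, arities))  # structure A -> B
print(log_score(data, {"A": [], "B": []}, arities))     # no-arc structure
```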
The algorithm we introduce in this paper utilizes model averaging techniques, such as searching through a relative order [29] (e.g., cause is in a higher order than effect) and incorporating prior mechanistic knowledge to guide the MCMC (Markov Chain Monte Carlo) search through the order. An order describes the relationships between variables based on describing whether a variable can be a direct cause (parent) for another variable.
Definition 1 (Order $\succ$). $X_i \succ X_j$ iff $X_j \notin \mathrm{Pa}(X_i)$.
With the above definition of the order, we are stating that $X_i$ is considered to be of a higher order than $X_j$ if, and only if, $X_j$ cannot be found among the direct causes (parents) of $X_i$. A potential ordering for a list of three variables is <X1, X2, X3>. This order implies that X1 can be a direct cause (parent) of X2 and/or X3, but X2 and X3 cannot be direct causes (parents) of X1. Similarly, X2 can be a direct cause (parent) of X3, but X3 cannot be a direct cause (parent) of X2. Note that any given order of random variables can summarize mechanistic (causal) relationships better than just one structure. For example, the order <X1, X2, X3> includes the following three structures (Figure 2):
Orders are useful because, in a manner similar to structures, they can be scored. Since an order represents a set of structures, it may be scored by summing over all structures consistent with the given order. Scoring an order this way is not efficient, however, because it would require a score for every structure consistent with the given order. We therefore consider an alternative method for scoring orders, presented by Friedman and Koller [29], which uses the direct cause (parent) sets of the variables. The equation for this scoring procedure is:
$$P(D \mid O) = \prod_{i=1}^{n} \sum_{U \in U_{i,o}} \prod_{j=1}^{q_{i,U}} \frac{\Gamma(N'_{ij})}{\Gamma(N'_{ij} + N_{ij})} \prod_{k=1}^{r_i} \frac{\Gamma(N'_{ijk} + N_{ijk})}{\Gamma(N'_{ijk})}$$
The above equation is an expansion of the Bayesian scoring presented by Heckerman [33]. Here, $O$ represents an ordering, $U_{i,o}$ represents the possible parent sets of a given variable under a given ordering, and $q_{i,U}$ represents the possible configurations of the parents of a variable $i$ within a parent set $U$. All other parameters in the equation are represented in the same manner as in Equation (6).
The benefit of scoring orders over scoring structures is that, once one is dealing with two or more variables, there are more structures than orders. For example, when the number of variables equals four, there are 543 structures but only 24 different orders.
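These counts can be reproduced with Robinson's recurrence for the number of labeled DAGs, set against the n! possible orders; a short sketch:

```python
# A minimal sketch reproducing the structures-versus-orders gap: the number
# of labeled DAGs on n nodes (Robinson's recurrence) against the n! orders.
from math import comb, factorial

def num_dags(n):
    a = [1]  # one (empty) DAG on zero nodes
    for m in range(1, n + 1):
        a.append(sum((-1) ** (k + 1) * comb(m, k) * 2 ** (k * (m - k)) * a[m - k]
                     for k in range(1, m + 1)))
    return a[n]

for n in range(1, 6):
    print(n, num_dags(n), factorial(n))
# n = 4 gives 543 structures versus 24 orders, as quoted above.
```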
An MCMC search is used to search through the orders. At any given step of the MCMC search, we have a current order (denote it as $o$) and a proposed order (denote it as $o'$), and we decide whether the proposed order will replace the current order with a probability that is returned by a decision function $f(o, o')$. A proposed order is generated by applying either a local perturbation (i.e., swapping two variables in an order, for example, <X1, X2, …, Xi, …, Xj, …, Xn> to <X1, X2, …, Xj, …, Xi, …, Xn>) or a global perturbation (also known as "cutting the deck", i.e., swapping groups of variables in an order, for example, <X1, X2, …, Xi, Xi+1, …, Xn> to <Xi+1, …, Xn, X1, X2, …, Xi>). Initially, a random order is generated.
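The two proposal moves can be sketched as simple list operations (a hypothetical illustration, not the authors' C++ implementation):

```python
# A minimal sketch of the two order-proposal moves used in the MCMC search.
# Orders are represented as Python lists of variable names.
import random

def local_perturbation(order):
    """Swap two randomly chosen variables in the order."""
    o = list(order)
    i, j = random.sample(range(len(o)), 2)
    o[i], o[j] = o[j], o[i]
    return o

def global_perturbation(order):
    """'Cut the deck': swap the two blocks around a random cut point."""
    o = list(order)
    cut = random.randrange(1, len(o))
    return o[cut:] + o[:cut]

order = ["X1", "X2", "X3", "X4", "X5"]
print(local_perturbation(order))   # e.g., ['X1', 'X4', 'X3', 'X2', 'X5']
print(global_perturbation(order))  # e.g., ['X3', 'X4', 'X5', 'X1', 'X2']
```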
Friedman and Koller [29] propose the following two algorithms for the MCMC search, with different $f(o, o')$:
- Random Algorithm: uses $f(o, o') = \min\left(1, \frac{P(D \mid o')}{P(D \mid o)}\right)$
- Prior Algorithm: uses $f(o, o') = \min\left(1, \frac{P(D \mid o')\, P(o \mid o')}{P(D \mid o)\, P(o' \mid o)}\right)$

where $o$, $o'$, and $D$ represent the current order under consideration, a proposed order, and a dataset, respectively.
We further propose a new algorithm, called the PrePrior Algorithm, whose MCMC search uses the same $f(o, o')$ as the Prior algorithm with one additional step:
- PrePrior Algorithm: uses $P(o' \mid o)$, based on a user-defined prior, to sample $o'$; uses $f(o, o') = \min\left(1, \frac{P(D \mid o')\, P(o \mid o')}{P(D \mid o)\, P(o' \mid o)}\right)$

Note that the PrePrior algorithm generates proposed orders based on the priors $P(o)$ and $P(o')$ that the user provides.
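A compact way to see how the three variants differ is to write their acceptance decisions side by side; in the sketch below, `score` and `proposal` are hypothetical placeholders for $P(D \mid o)$ and the proposal density, respectively:

```python
# A minimal sketch of the three acceptance rules. Here score(o) stands in
# for P(D | o) and proposal(a, b) for the probability of proposing order b
# from order a; both are hypothetical placeholders.
import random

def accept_random(score, o, o2):
    """Random algorithm: accept with probability min(1, P(D|o') / P(D|o))."""
    return random.random() < min(1.0, score(o2) / score(o))

def accept_prior(score, proposal, o, o2):
    """Prior/PrePrior algorithms: Metropolis-Hastings with proposal density."""
    ratio = (score(o2) * proposal(o2, o)) / (score(o) * proposal(o, o2))
    return random.random() < min(1.0, ratio)

def preprior_step(score, proposal, sample_from_prior, o):
    """PrePrior: draw o' from the user-defined prior, then accept/reject."""
    o2 = sample_from_prior(o)
    return o2 if accept_prior(score, proposal, o, o2) else o

# Tiny demo with a hypothetical score that favors orders starting with "X1"
# and a symmetric proposal density.
score = lambda o: 2.0 if o[0] == "X1" else 1.0
proposal = lambda a, b: 1.0
print(accept_random(score, ["X2", "X1"], ["X1", "X2"]))  # True: ratio is 2
```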
User's Prior of an Order. To specify a prior of mechanistic causal knowledge in terms of an order $o$ (if X is known to cause Y, we say X has a higher order than Y, i.e., $X \succ Y$), or $P(o)$, we assume the following:
i. If no prior is provided, a uniform prior over orders is assumed. For example, for a pairwise order of X and Y, if no prior is provided then $P(X \succ Y) = P(Y \succ X) = 0.5$. In general, for $n$ variables the uniform prior of any order $o$ is $P(o) = \frac{1}{n!}$.
ii. The prior of an order is specified as a probability relative to the uniform prior. For example, if prior publications show that gene Y regulates gene X, a user might specify $P(Y \succ X) = 0.9$, and if studies suggest that gene Z regulates gene W, a user might specify $P(Z \succ W) = 0.6$ (see the sketch below).
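One simple way to turn such pairwise statements into an (unnormalized) prior over a full order is to multiply the pairwise probabilities that the order satisfies; the genes and probabilities below are hypothetical:

```python
# A minimal sketch of scoring an order against pairwise prior statements.
# pairwise[(A, B)] = P(A > B), the prior belief that A precedes B in the
# order; unstated pairs default to the uniform 0.5. All values hypothetical.

pairwise = {("Y", "X"): 0.9,   # publications suggest gene Y regulates gene X
            ("Z", "W"): 0.6}   # weaker evidence that gene Z regulates gene W

def order_prior(order):
    """Unnormalized P(o): product over stated pairs of how well o agrees."""
    pos = {v: i for i, v in enumerate(order)}
    p = 1.0
    for (a, b), prob in pairwise.items():
        p *= prob if pos[a] < pos[b] else 1.0 - prob
    return p

print(order_prior(["Y", "X", "Z", "W"]))  # 0.9 * 0.6 = 0.54
print(order_prior(["X", "Y", "W", "Z"]))  # 0.1 * 0.4 = 0.04
```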
For mechanism discovery, the correct discovery of the generating structure is the most important aspect of the algorithm. Datasets consisting of 50 and 1000 simulated observational cases were generated from the ALARM Bayesian network [27]. To see how well the algorithm correctly discovered the generating structure in the presence of hidden variables, we selected two sets of nine variables each from the 37 variables in the network. The first variable set is referred to as the Close 9 variables (C9) and was created by selecting variables that were closely situated in the network (Figure 3a; all the grayed-out variables are hidden and not selected). The second variable set is referred to as the Sparse 9 variables (S9) and was created by selecting variables situated relatively far apart in the network (Figure 3b; all the grayed-out variables are hidden and not selected).
Another reason we selected these nine-variable sets was to see how well the causal discovery algorithms predicted the four pairwise relationships shown in Figure 4. Distinguishing these four pairwise relationships is the first step in better understanding the mechanistic process involved in generating these datasets.
Different numbers of pairwise causal relationships are found in the Close 9 variables (C9) and Sparse 9 variables (S9) (Table 1). For example, in C9, TPR and VentLung are neither confounded nor causally related (denoted as ØX Y in Figure 4a), and TPR and HR are not confounded but causally related (denoted as ØX→Y in Figure 4b). In S9, ExpCO2 and Catechol are confounded but not causally related (denoted as HX Y in Figure 4c, with ArtCO2 playing the role of H), and ArtCO2 and VentAlv are confounded and causally related (denoted as HX→Y in Figure 4d, where VentLung takes the role of H).
Two datasets were generated from each of the two sets of variables. Two of the datasets had 50 observational cases each and were named D50C9 and D50S9 because they were generated from the C9 and S9 variable sets, respectively. The other two datasets had 1000 observational cases each and were named D1KC9 and D1KS9 because they were generated from the C9 and S9 variable sets, respectively. Many biological mechanistic networks are not completely connected, i.e., each variable has a limited number of causes (e.g., fewer than five). As a result, we limited the number of possible parents to five and scored all possible orders using Equation (8). It took roughly one month to score all of the possible orders for the four datasets. The resulting dataset is referred to as the Dataset Global BDe Best Order. It contains the scores for all of the possible orders, and therefore we know the best order (and the best Bayesian network structure) that would be identified if the BDe metric [5] (similar to Equation (8)) were used on a given dataset.
The Random, Prior, and PrePrior algorithms were independently run three times on D50C9 and D50S9 for 1 h, 2 h, and 4 h, and on D1KC9 and D1KS9 for 2 h, 4 h, and 16 h. We used five Linux machines running in parallel, for a total of 522 h (over 21 days) of runs.
The predictive performance is calculated as a pairwise causal distance from either the generating structure (denoted as $S_G$ and shown in Figure 5) or the Dataset Global BDe Order. For each variable pair X and Y, let the underlying relationship between X and Y be denoted as $R_{X,Y}$, where $R_{X,Y} \in \{X \rightarrow Y,\; X \leftarrow Y,\; X\,(\text{none})\,Y\}$. Let the likelihood scores of $R_{X,Y}$ assessed from the generating structure and the Dataset Global BDe Order be $P_G(R_{X,Y})$ and $P_G(D \mid R_{X,Y})$, respectively, where $D \in \{\mathrm{D50C9}, \mathrm{D50S9}, \mathrm{D1KC9}, \mathrm{D1KS9}\}$. Note that we calculate:
$$P_G(R_{X,Y}) = \begin{cases} 1 & \text{if } R_{X,Y} \in S_G \\ 0 & \text{if } R_{X,Y} \notin S_G \end{cases}$$
and
$$P_G(D \mid R_{X,Y}) = \sum_{o \in O} \sum_{S_o} \delta(S_o)\, P(D \mid S_o)\, P(D \mid o)$$
$$\delta(S_o) = \begin{cases} 1 & \text{if } R_{X,Y} \in S_o \\ 0 & \text{if } R_{X,Y} \notin S_o \end{cases}$$
where $O$ is the set of orders that satisfies $\frac{\sum_{o \in O} P(D \mid o)}{\sum_{\phi \in \Phi_O} P(D \mid \phi)} > 0.99$ over all possible orders (denoted $\Phi_O$), and $S_o$ is the set of structures consistent with an order $o \in O$ that satisfies $\frac{\sum_{S \in S_o} P(D \mid S)}{\sum_{\phi \in \Phi_{S_o}} P(D \mid \phi)} > 0.99$ over all possible structures consistent with $o$ (denoted $\Phi_{S_o}$).
Additionally, we calculate $P_{SG}(D \mid R_{X,Y})$:
$$P_{SG}(D \mid R_{X,Y}) = \sum_{S \in \mathcal{S}} \delta(S)\, P(D \mid S)$$
$$\delta(S) = \begin{cases} 1 & \text{if } R_{X,Y} \in S \\ 0 & \text{if } R_{X,Y} \notin S \end{cases}$$
where $\mathcal{S}$ is the set of structures that satisfies $\frac{\sum_{S \in \mathcal{S}} P(D \mid S)}{\sum_{\phi \in \Phi_S} P(D \mid \phi)} > 0.99$ over all possible structures (denoted $\Phi_S$) from all possible orders.
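The >0.99 sets above can be built greedily: sort the candidates by score and keep adding them until the kept mass exceeds 99% of the total. A minimal sketch, with hypothetical scores:

```python
# A minimal sketch of selecting the smallest set of candidates (orders or
# structures) whose combined score exceeds 99% of the total score. The
# scores are hypothetical stand-ins for P(D | o) or P(D | S).

def top_mass(scores, threshold=0.99):
    """Greedily keep the highest-scoring candidates until they cover >99%."""
    total = sum(scores.values())
    kept, mass = [], 0.0
    for name, s in sorted(scores.items(), key=lambda kv: -kv[1]):
        kept.append(name)
        mass += s
        if mass / total > threshold:
            break
    return kept

scores = {"o1": 0.700, "o2": 0.250, "o3": 0.045, "o4": 0.005}
print(top_mass(scores))  # ['o1', 'o2', 'o3']: 0.995 of the total mass
```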
We use $P_{SG}(D \mid R_{X,Y})$ and $P_G(D \mid R_{X,Y})$ for all X and Y to generate a consensus causal structure by drawing arcs between X and Y: the thickest arcs where $P_{SG}(D \mid R_{X,Y})$ or $P_G(D \mid R_{X,Y})$ is above 0.9999, and the thinnest arcs where $P_{SG}(D \mid R_{X,Y})$ or $P_G(D \mid R_{X,Y})$ is close to 0.0001. If $P_{SG}(D \mid X \rightarrow Y)$ and $P_{SG}(D \mid Y \rightarrow X)$ are both less than 0.0001, then no arc is drawn between X and Y.
We first compare the generating causal structure and the Dataset Global BDe Best Order by calculating the following:
$$\sum_{R_{X,Y}} \left| P_G(R_{X,Y}) - P_G(D \mid R_{X,Y}) \right|$$
$$\sum_{R_{X,Y}} \left| P_G(R_{X,Y}) - P_{SG}(D \mid R_{X,Y}) \right|$$
These results show how well the BDe metric approximates the generating causal structure given the generated datasets. In addition to comparing the predictive ability of these algorithms, we compared the causal structure predictive ability of the algorithms that use the BDe metric with the Dataset Global BDe Best Order.
We report each Dataset Global BDe Best Order prediction using the Markov blanket of a variable (Catechol) that appears in both the Close 9 variables (C9) and the Sparse 9 variables (S9), and compare it with the Markov blanket of Catechol in the generating structure.
Denote the probabilities of $R_{X,Y}$ predicted by an algorithm as $P_A(D \mid R_{X,Y})$ and $P_{SA}(D \mid R_{X,Y})$. Note that $P_A(D \mid R_{X,Y})$ is calculated in the same way as $P_G(D \mid R_{X,Y})$ described above. We report the distance from the generating structure as
$$\sum_{R_{X,Y}} \left| P_G(R_{X,Y}) - P_A(D \mid R_{X,Y}) \right|$$
$$\sum_{R_{X,Y}} \left| P_G(R_{X,Y}) - P_{SA}(D \mid R_{X,Y}) \right|$$
and the distance from the Dataset Global BDe Order as
$$\sum_{R_{X,Y}} \left| P_G(D \mid R_{X,Y}) - P_A(D \mid R_{X,Y}) \right|$$
$$\sum_{R_{X,Y}} \left| P_{SG}(D \mid R_{X,Y}) - P_{SA}(D \mid R_{X,Y}) \right|$$
Note that here we consider indirect causation when assessing $R_{X,Y}$; i.e., we check whether X appears as an ancestor of Y (by repeatedly applying the parent-of function: parent-of(Y), parent-of(parent-of(Y)), and so on), or whether Y appears as an ancestor of X in the overall network.
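The ancestor test described here is simply a reachability check over the parent-of relation; a minimal sketch with a hypothetical network:

```python
# A minimal sketch of the ancestor check: X is an ancestor of Y if X is
# reachable from Y by repeatedly applying parent-of. Network is hypothetical.

parents = {"Y": ["B"], "B": ["X"], "X": [], "Z": []}

def is_ancestor(x, y):
    """True if x appears among parent-of(y), parent-of(parent-of(y)), ..."""
    frontier, seen = list(parents.get(y, [])), set()
    while frontier:
        p = frontier.pop()
        if p == x:
            return True
        if p not in seen:
            seen.add(p)
            frontier.extend(parents.get(p, []))
    return False

print(is_ancestor("X", "Y"))  # True: X -> B -> Y
print(is_ancestor("Z", "Y"))  # False
```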
We report how well the algorithms predict the Markov blanket of each variable in the Close 9 variables (C9) and Sparse 9 variables (S9) (denote all such Markov blankets as $A_M$) and compare with the Markov blanket of the variable from the Dataset Global BDe Best Order (denote all such Markov blankets as $G_M$) by calculating the following distances:
$$\sum_{g_M \in G_M} \sum_{a_M \in A_M} d(g_M, a_M)$$
$$d(g_M, a_M) = \begin{cases} \left| P_G(D \mid g_M) - P_A(D \mid a_M) \right| & \text{if } g_M = a_M \\ P_G(D \mid g_M) & \text{if } g_M \notin A_M \\ P_A(D \mid a_M) & \text{if } a_M \notin G_M \\ 0 & \text{otherwise} \end{cases}$$
$$\sum_{g_M \in G_M} \sum_{a_M \in A_M} d_S(g_M, a_M)$$
$$d_S(g_M, a_M) = \begin{cases} \left| P_{SG}(D \mid g_M) - P_{SA}(D \mid a_M) \right| & \text{if } g_M = a_M \\ P_{SG}(D \mid g_M) & \text{if } g_M \notin A_M \\ P_{SA}(D \mid a_M) & \text{if } a_M \notin G_M \\ 0 & \text{otherwise} \end{cases}$$
Note that $P_G(D \mid g_M)$ and $P_A(D \mid a_M)$ are calculated by incorporating the order weight (as we calculated $P_G(D \mid R_{X,Y})$ and $P_A(D \mid R_{X,Y})$, by multiplying by $P(D \mid O)$), and $P_{SG}(D \mid g_M)$ and $P_{SA}(D \mid a_M)$ are calculated without incorporating the order weight (as we calculated $P_{SG}(D \mid R_{X,Y})$ and $P_{SA}(D \mid R_{X,Y})$, without multiplying by $P(D \mid O)$).
We also report all algorithms' predictive performance, i.e., how well they predict the four causal pairwise relationships (ØX Y, ØX→Y, HX Y, and HX→Y) introduced in Table 1, by comparing each algorithm's prediction of $R_{X,Y} \in \{X \rightarrow Y,\; X \leftarrow Y,\; X\,(\text{none})\,Y\}$ with the true underlying relationship $T_{X,Y} \in$ {ØX Y, ØX→Y, HX Y, HX→Y}. In addition to the predictive performance, we also report the following for each $R_{X,Y}$ and each $T_{X,Y}$:
$$P_A(R_{X,Y} \mid T_{X,Y}) = \frac{\sum_{X,Y} \delta(T_{X,Y})\, P_A(D \mid R_{X,Y})}{\sum_{X,Y} \delta(T_{X,Y})}$$
$$P_{SA}(R_{X,Y} \mid T_{X,Y}) = \frac{\sum_{X,Y} \delta(T_{X,Y})\, P_{SA}(D \mid R_{X,Y})}{\sum_{X,Y} \delta(T_{X,Y})}$$
$$\delta(T_{X,Y}) = \begin{cases} 1 & \text{if the true relationship is } T_{X,Y} \\ 0 & \text{if the true relationship is not } T_{X,Y} \end{cases}$$
where $\sum_{X,Y} \delta(T_{X,Y})$ is the number of underlying true relationships (i.e., the counts in Table 1). Finally, we report the percentage of the algorithm's most probable predictions of $R_{X,Y}$ given the true underlying relationship $T_{X,Y}$ by calculating the following:
$$C_A(R_{X,Y} \mid T_{X,Y}) = \frac{\sum_{X,Y} \delta(R_{X,Y}, T_{X,Y})}{\sum_{X,Y} \delta(T_{X,Y})}$$
$$\delta(R_{X,Y}, T_{X,Y}) = \begin{cases} 1 & \text{if the true relationship is } T_{X,Y} \text{ and } R_{X,Y} = \arg\max_{r_{X,Y}} P_A(D \mid r_{X,Y}) \\ 0 & \text{otherwise} \end{cases}$$
$$C_{SA}(R_{X,Y} \mid T_{X,Y}) = \frac{\sum_{X,Y} \delta_S(R_{X,Y}, T_{X,Y})}{\sum_{X,Y} \delta(T_{X,Y})}$$
$$\delta_S(R_{X,Y}, T_{X,Y}) = \begin{cases} 1 & \text{if the true relationship is } T_{X,Y} \text{ and } R_{X,Y} = \arg\max_{r_{X,Y}} P_{SA}(D \mid r_{X,Y}) \\ 0 & \text{otherwise} \end{cases}$$
with $\delta(T_{X,Y})$ as defined above.
We also ran other causal discovery algorithms, such as PC [9], K2 [5], and BiDAG [12], on the same datasets, i.e., 50 and 1000 cases for the Sparse 9 variables (D50S9 and D1KS9) and 50 and 1000 cases for the Close 9 variables (D50C9 and D1KC9). Since BiDAG could only incorporate binary random variables when learning from discrete data, we converted all the variables in the datasets to continuous variables. This was done by adding normal noise with $\mu = 0$, $\sigma = 0.01$ to each measurement of the discrete data. We used these noise parameters because they gave the most consistent conditional independencies among the variables when we compared the original discrete data and the converted continuous data.
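The conversion itself is a one-liner; the sketch below uses a hypothetical discrete dataset and the noise parameters reported above:

```python
# A minimal sketch of converting discrete measurements to continuous ones by
# adding Gaussian noise (mean 0, standard deviation 0.01), as was done for
# the BiDAG runs. The discrete dataset below is hypothetical.
import numpy as np

rng = np.random.default_rng(0)
discrete = np.array([[0, 1, 2],
                     [1, 1, 0],
                     [2, 0, 1]])

continuous = discrete + rng.normal(loc=0.0, scale=0.01, size=discrete.shape)
print(continuous)  # each entry is its discrete value plus small noise
```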

3. Results

Figure 6 reports the highest-scoring structure by BDe score for each dataset. It is interesting to note that even with a large number of samples and a significantly more likely Global BDe structure, i.e., for 1000 cases (D1KS9) with a BDe percentage structure score of >99%, incorrect mechanisms are predicted, e.g., HRBP is predicted as a cause of CO, and CO is predicted as a cause of LVFailure (Figure 6c). However, the generating structure shows that HRBP is not a cause of CO (they are confounded by Catechol) and that LVFailure is a cause of CO (Figure 5a). Another interesting result is that even with many cases (i.e., 1000), the highest-scoring BDe structure may obtain a mere 4% of the total BDe structure score.
Figure 7 shows consensus structures using $P_{SG}(D \mid R_{X,Y})$ (without incorporating the order weight) for D50S9, D50C9, D1KS9, and D1KC9. The arc thicknesses are based on $P_{SG}(D \mid X \rightarrow Y)$ or $P_{SG}(D \mid Y \rightarrow X)$. Where $P_{SG}(D \mid X \rightarrow Y)$ is displayed as a percentage, $P_{SG}(D \mid Y \rightarrow X)$ is also displayed as a percentage in parentheses. If $P_{SG}(D \mid X \rightarrow Y)$ and $P_{SG}(D \mid Y \rightarrow X)$ are both less than 0.0001, then no arc is drawn between X and Y. The labels >99 and ~0 indicate where the pairwise causal relationship probability is greater than 0.9999 or less than 0.0001, respectively. Similarly, Figure 8 shows consensus structures using $P_G(D \mid R_{X,Y})$ (incorporating the order weight) for D50S9, D50C9, D1KS9, and D1KC9.
The Global BDe structure using D50S9 was marginally better (maximum likelihood of 0.1423) than the other structures. All of the models incorrectly identified causal effects from LVFailure to VentAlv, from Catechol to ExpCO2, and from HRBP to CO when compared to the generating structure (Figure 5a). In D50S9, the consensus structures generated with the order weight (Figure 8a) and without the order weight (Figure 7a) differed from the Global BDe structure (Figure 6a). A significant difference between the consensus structures generated with the order weight (Figure 8a) and without the order weight (Figure 7a) was the causal relationship between Catechol and ExpCO2. The consensus structure generated with the order weight predicted $P_G(D \mid \text{ExpCO2} \rightarrow \text{Catechol}) = 0.4803$ as the most probable relationship, whereas the consensus structure generated without the order weight predicted $P_{SG}(D \mid \text{Catechol} \rightarrow \text{ExpCO2}) = 0.4409$ as the most probable relationship; the generating structure (Figure 3a) shows that Catechol and ExpCO2 have no direct causal influence on each other. It is also noteworthy that one of their common causes, VentAlv, was correctly predicted to be a common cause in both consensus structures. This shows that, to some extent, we can use the disagreement between the consensus structures generated with and without the order weight to identify confounded relationships without any direct causal relationship.
The Global BDe structure using D50C9 was marginally better (maximum likelihood of 0.1571) than the other structures. In D50C9, the consensus structures generated with the order weight (Figure 8b) and without the order weight (Figure 7b) were slightly different from the Global BDe structure (Figure 6b). All models incorrectly identified causal effects from Anaphylaxis to ArtCO2 and from InsuffAnesth to ArtCO2, and predicted a reversed causal direction between ArtCO2 and ExpCO2 compared to the generating structure (Figure 5b). In contrast to D50S9 (also 50 cases), no significant differences were observed between the consensus structures generated with the order weight (Figure 8b) and without the order weight (Figure 7b).
Only in D1KS9 did both consensus structures, generated with (Figure 8c) and without the order weight (Figure 7c), agree with the Global BDe structure (Figure 6c). This is not surprising because the Global BDe structure was significantly better (>0.9999) than any other structure. However, all models incorrectly predicted the following three causal relationships compared to the generating structure (Figure 5a): between CO and LVFailure (reversed causal prediction); between Intubation and ExpCO2 (missing causal prediction); and between Catechol and BP (spurious causal prediction).
The Global BDe structure using D1KC9 was marginally better (maximum likelihood of 0.0403) than the other structures. Among the four datasets, it resulted in the lowest maximum likelihood, making D1KC9 the most difficult dataset to learn causal relationships from. All models incorrectly identified a causal effect from ArtCO2 to SaO2 (Figure 5b). In D1KC9, the consensus structures generated with the order weight (Figure 8d) and without the order weight (Figure 7d) differed from the Global BDe structure (Figure 6d). A significant difference between the consensus structures generated with the order weight (Figure 8d) and without the order weight (Figure 7d) was the prediction of a causal relationship between VentLung and ArtCO2. The consensus structure generated with the order weight predicted $P_G(D \mid \text{ArtCO2} \rightarrow \text{VentLung}) = 0.5556$ as the most probable relationship, whereas the consensus structure generated without the order weight predicted $P_{SG}(D \mid \text{VentLung} \rightarrow \text{ArtCO2}) = 0.6154$ as the most probable relationship. As the generating structure (Figure 3b) shows, VentLung and ArtCO2 have a direct causal influence between each other, and their common cause, Intubation, is hidden in the dataset. This shows how difficult it is to learn reliable causal relationships among upstream variables when most of the confounding causes are hidden in the dataset.
We believe all of these results are due to the omission of the 28 variables and to random sampling effects. Also, as the later results will show, with 50 cases it is more difficult to learn the generating structure of C9, and with 1000 cases it is more difficult to learn the generating structure of S9.
Table 2 shows all the orders (out of the total 9! = 362,880 orders) that received a combined percentage score of >99%. Interestingly, the means were all 7.1429%. However, depending on the dataset, the standard deviations of the scores differed. The data sampled from S9 tended to show tighter percentage scores among the orders than the data sampled from C9. This means that order scores from S9 had less impact than those from C9.
Table 3 summarizes our claim that incorporating the ordering results can help us gain mechanistic knowledge. According to the distances, the BDe score had difficulty learning the true underlying mechanisms from the generating structure with 50 cases of C9. However, adding more samples, i.e., with 1000 cases of C9, improved the ability to learn the true underlying mechanisms from the generating structure.
Overall, the results shown in Table 3 illustrate that the order weight improves the learning of the true underlying mechanisms from the generating structure. In the 1000 cases of S9 (D1KS9), as mentioned earlier (and shown in Figure 6c), there was only one structure that was significant in terms of BDe score (i.e., >99% of the total BDe structure score). Because of this, all orders compliant with the dominating structure had very similar scores with a very tight margin, resulting in nearly identical order scores (Table 2). Therefore, in this situation we can see why the order score will not improve the learning of the true underlying mechanisms from the generating structure.
Table 4 and Table 5 compare the structure distances between (1) the algorithms' predicted structures and the generating structure (Generated δ), and (2) the algorithms' predicted structures and the structure with the best BDe score (Global BDe δ). In some sense, Generated δ measures how well an algorithm learns the underlying mechanism from a phenomenon, and Global BDe δ measures how well an algorithm estimates the best BDe (or BGe) score from the sample.
In the 50-case results spanning Table 4a and Table 5a, it is clear that all the MCMC ordering algorithms (Random, Prior, and PrePrior) outperformed the constrained variant algorithms (BiDAG, K2, and PC) in terms of Generated δ and Global BDe δ with datasets D50S9 and D50C9. Also, in general, algorithms with the order weight predicted the generating structures better (i.e., lower Generated δ and Global BDe δ) with higher confidence (i.e., lower variance).
With the maximum run time (4 h), Random and PrePrior converged in their predictions; however, Prior showed some variance in performance. We note that with shorter runs (1 and 2 h), PrePrior showed better performance (better predictions with confidence, i.e., less variance) than Random in D50S9 and comparable predictions in D50C9 (in the 1 h run, Random's Generated δ was 22.31 with a variance of 0.302, while PrePrior Weak Correct achieved a Generated δ of 22.65 with a very low variance, 0.001 (Table 4a)).
The structure distances for 1000 cases are shown in Table 4b and Table 5b. K2 showed the best Generated δ and Global BDe δ in D1KS9; however, its performance was the lowest among all the algorithms in D1KC9. We believe this was because, in D1KS9, as mentioned earlier (and shown in Figure 6c), there was only one structure that was significant in terms of its BDe score (>99% of the total BDe structure score).
BiDAG's performance in Global BDe δ in D1KS9 was the second best (next to K2's); however, its Generated δ in D1KS9 was either comparable to or worse than that of the MCMC ordering algorithms (Random, Prior, and PrePrior). It seems the MCMC ordering algorithms need more than 16 h to converge; structure distances were generally decreasing in D1KC9, but that trend is questionable in D1KS9.
In the 1000-case results, we could not find the general pattern we saw with 50 cases, where the order weight led to better prediction of the generating structures (lower Generated δ and Global BDe δ) with higher confidence, i.e., lower variance. We believe this has to do with the result mentioned above, i.e., that the MCMC ordering algorithms need more than 16 h to converge.
Alongside the outstanding performance of K2 in D1KS9 reported earlier, we must also mention the outstanding performance of the Prior algorithm with the Strong Correct prior, which achieved a statistically significantly better performance in a mere 2 h run in D1KC9. In D1KC9, all algorithms showed a Generated δ larger than ten, except for Prior. Prior achieved a Generated δ lower than ten with high confidence (a variance of 8.136, significantly lower than the second lowest variance of 18.0 from BiDAG).
Table 6 and Table 7 compare the Markov blanket distances between the algorithms' predicted Markov blanket of each variable in the structures (for short, we refer to it as MB) and the MB in the generating structure (Generated δ), as well as the distance between the algorithms' predicted MB and the MB of the structure with the best BDe score (Global BDe δ).
In the 50-case results from Table 6a and Table 7a, it is clear that all the MCMC ordering algorithms (Random, Prior, and PrePrior) outperformed the constrained variant algorithms (BiDAG, K2, and PC) in Generated δ and Global BDe δ with dataset D50S9. In dataset D50C9, BiDAG was slightly better (16.0 vs. 16.19) in Generated δ; however, it was significantly worse in Global BDe δ. Also, in general, the Generated δ and Global BDe δ of the algorithms with the order weight did not change much because the MB distances were low to begin with (Generated δ ranged from 16.00 to 16.53, and with the order weight it ranged from 16.00 to 16.50; Global BDe δ ranged from 0.00 to 8.93, and with the order weight it ranged from 0.00 to 5.03). We note that with the order weight, the 1 h runs in D50C9 showed a lower Global BDe δ with higher confidence, i.e., lower variance.
With the maximum run time (4 h), the Random and PrePrior predictions converged; however, Prior showed some variance in its performance. We note that with shorter runs (1 and 2 h), PrePrior showed better performance (better predictions with higher confidence, i.e., lower variance) than Random in D50S9, and comparable performance in D50C9 (in the 1 h run, Random's Generated δ was 16.17 with a variance of 0.0, while PrePrior Weak Correct achieved a Generated δ of 16.16 with a very low variance, 0.0 (Table 6a)).
The MB distances for 1000 cases are shown in Table 6b and Table 7b. In D1KS9, PrePrior with the Strong and Weak priors achieved the best Generated δ (16.00) with a variance of 12.0. K2 showed the best Global BDe δ (0.0) in D1KS9. Also, in general, the Generated δ and Global BDe δ of the algorithms with the order weight did not change much because the MB distances were low to begin with (Generated δ ranged from 7.99 to 18.0, and with the order weight it ranged from 7.81 to 18.0; Global BDe δ ranged from 4.66 (excluding 0.0 from K2) to 13.83 (excluding 18.0 from BiDAG), and with the order weight it ranged from 4.80 (excluding 0.0 from K2) to 13.75 (excluding 18.0 from BiDAG)).
In D1KC9, most of the MCMC ordering algorithms (Random, Prior, and PrePrior) outperformed the constrained variant algorithms (BiDAG, K2, and PC) in Generated δ and Global BDe δ. In the 2 h runs, Prior with the Weak Correct prior achieved the best Generated δ (7.99; the runner-up was PrePrior Weak Correct with 9.15) and Global BDe δ (5.82; the runner-up was PrePrior Weak Correct with 8.16); however, the most confident prediction in Generated δ came from PrePrior Weak Correct (a variance of 0.912; the runner-up was Prior Weak Correct with 0.938).
Also, in D1KC9 with the 4 h runs, PrePrior with the Strong Correct prior achieved the best Generated δ (10.27; the runner-up was Random with 10.31), and Random achieved the best Global BDe δ (8.07; the runner-up was PrePrior Strong Correct with 9.97). In the 16 h runs, Random achieved the best Generated δ (8.38; the runner-up was PrePrior Weak Correct with 9.43) and Global BDe δ (4.66; the runner-up was PrePrior Strong Correct with 7.26).
Table 8 and Table 9 show the algorithms' predicted probabilities of the four causal pairwise relationships shown in Figure 4. In all four datasets, all the MCMC ordering algorithms (Random, Prior, and PrePrior) outperformed the constrained variant algorithms (BiDAG and K2) on the confounded relationships HX Y (no causal relationship) and HX→Y (causal relationship). K2 and BiDAG incorrectly predicted (with a probability of 0.0) the true underlying confounded relationships: for example, with 1000 cases, using D1KS9, BiDAG predicted all three true HX→Y relationships with a probability of 0.0, and using D1KC9, BiDAG and K2 predicted all four true HX Y relationships with a probability of 0.0. Typically, algorithms with the order weight tended to perform better at correctly predicting true causally independent relationships (ØX Y and HX Y) and worse at correctly predicting true causal relationships (ØX→Y and HX→Y).
Table 10 and Table 11 show the algorithms' most probable prediction rates for the four causal pairwise relationships shown in Figure 4. As noted earlier for Table 8 and Table 9, in all four datasets, all the MCMC ordering algorithms (Random, Prior, and PrePrior) outperformed the constrained variant algorithms (BiDAG and K2) on the confounded relationships HX Y (no causal relationship) and HX→Y (causal relationship). In D50S9, the order weight left the most probable prediction rates of the confounded and causally independent predictions (HX Y) of the MCMC ordering algorithms unchanged, except for PrePrior with the Weak Correct prior (one relationship prediction of Y→X was changed to X→Y). Another change from weighing orders was noticed in D1KC9. There, the order weight changed the most probable prediction rates of the confounded and causally independent predictions (HX Y) and the confounded causal predictions (HX→Y) of PrePrior with the Weak Correct prior. For HX Y, five relationship predictions of X→Y were correctly changed to the true underlying relationship, X Y; and for HX→Y, one relationship prediction of Y→X was correctly changed to the true underlying relationship, X→Y.

4. Discussion and Future Work

The results from this study show that learning causal relationships from data is difficult, especially because many variables are hidden from us, whether we are aware of it or not. Many Big Data analytic methods have dealt with Big Data characteristics such as large volume, fast growth in size, or variety of data types. However, as we have shown in this study, it is important to incorporate and develop causal discovery frameworks to discover the underlying mechanistic processes in Big Data.
Searching through the order of variables in a CBN and incorporating the likelihood of the order helped us better search through plausible underlying mechanistic processes even when hidden variables were present. Further incorporating the prior of the order in the search process (the PrePrior algorithm) showed an increase in performance over other published methods that do not incorporate the prior of the order, especially when a limited number of cases was available. We believe combining different types of data, e.g., environmental, genomic, neurological, social media, etc., will further strengthen our capabilities for discovering underlying mechanistic processes in Big Data.
Our study focused on discovering underlying mechanistic processes using a small number of variables, i.e., <30. It was practical to use a small number of variables because we were focused on understanding the effect of hidden variables when learning causal relationships from data. Thus, the results reported here should be interpreted under this premise. As pointed out earlier, our study is limited in telling what the other characteristics of Big Data can contribute to the discovery of underlying mechanistic processes. Understanding the effects of those characteristics, and their combined effects, will lead us to develop novel methods that will revolutionize future Big Data analytics.
The PrePrior algorithm can be extended in many different directions. As shown, with 1000 cases, the MCMC ordering algorithms could not converge in their predictions. This can be overcome by incorporating constraint-based methods in conjunction with the Bayesian MCMC sampling methods using BDe (or BGe) scores. This will enable us to analyze not only larger samples but also larger numbers of variables, one of the hallmark characteristics of Big Data. It will also extend the causal discovery ability when we model hidden variables explicitly or implicitly in the PrePrior algorithm.

5. Conclusions

We have shown that searching through the order of variables in a CBN and incorporating the likelihood of the order helped us better understand the underlying mechanistic process that generated the data, even when hidden variables were introduced in the experimental design. Also, the novel order-searching algorithm we proposed (the PrePrior algorithm) showed promising performance in learning the underlying mechanistic process that generated the data, especially confounded causal relationships, with a reasonable number of samples (≈50).

Author Contributions

Conceptualization, C.Y. and D.R.; methodology, C.Y.; software, E.G. and Z.G.; validation, E.G. and Z.G.; formal analysis, E.G. and Z.G.; investigation, C.Y., E.G. and Z.G.; resources, C.Y.; data curation, E.G. and Z.G.; writing—original draft preparation, C.Y.; writing—review and editing, C.Y., E.G. and D.R.; visualization, Z.G.; supervision, C.Y.; project administration, C.Y.; funding acquisition, C.Y. All authors have read and agreed to the published version of the manuscript.

Funding

C.Y., E.G. and Z.G. were funded by NIH SC3GM096948 grant.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The PrePrior and order-searching algorithms were implemented in C++ using the SMILE (Structural Modeling, Inference, and Learning Engine, Bayes Fusion LLC) C++ library. The package is available in the SMLG (Statistical Machine Learning Group) GitHub repository at https://github.com/smlgfiuedu/Order-Score (accessed on 7 April 2022). Also, all data is available in the SMLG forum at http://smlg.fiu.edu/phpbb/viewtopic.php?f=87&t=161 (accessed on 7 April 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations were used in this manuscript:
BiDAG: Bayesian Inference for Directed Acyclic Graphs, a CBN search algorithm
BDe: Bayesian Dirichlet prior
C9: Nine variables that were connected closely in the ALARM Bayesian network
CBN: Causal Bayesian network
D1KC9: 1000 observational cases generated from C9
D1KS9: 1000 observational cases generated from S9
D50C9: 50 observational cases generated from C9
D50S9: 50 observational cases generated from S9
K2: A constraint-based CBN search algorithm
MCMC: Markov chain Monte Carlo
NP-hard: At least as hard as a nondeterministic polynomial-time problem
PC: A constraint-based CBN search algorithm
PrePrior: A new order-searching algorithm that uses a prior of the order to search CBNs
S9: Nine variables that were connected sparsely in the ALARM Bayesian network

References

  1. Pearl, J. Causality: Models, Reasoning, and Inference, 2nd ed.; Cambridge University Press: Cambridge, UK; New York, NY, USA, 2009; p. 464. [Google Scholar]
  2. Good, I.J. A causal calculus I & II. Br. J. Philos. Sci. 1961, 11–12, 305–318, 343–351. [Google Scholar]
  3. Suppes, P. A Probabilistic Theory of Causality; North Holland: Amsterdam, The Netherlands, 1970. [Google Scholar]
  4. Glymour, C.; Scheines, R.; Spirtes, P.; Kelley, K. Discovering Causal Structure; Academic Press: New York, NY, USA, 1987. [Google Scholar]
  5. Cooper, G.F.; Herskovits, E.H. A Bayesian method for constructing Bayesian belief networks from databases. In Proceedings of the Uncertainty in Artificial Intelligence, Los Angeles, CA, USA, 15 July 1991; pp. 86–94. [Google Scholar]
  6. Spirtes, P.; Glymour, C.; Scheines, R. An algorithm for fast recovery of sparse causal graphs. Soc. Sci. Comput. Rev. 1991, 9, 62–72. [Google Scholar] [CrossRef] [Green Version]
  7. Cooper, G.F.; Yoo, C. Causal Discovery from a Mixture of Experimental and Observational Data. arXiv 1999, arXiv:1301.6686. [Google Scholar]
  8. Heckerman, D.; Meek, C.; Cooper, G.F. A Bayesian Approach to Causal Discovery; Glymour, C., Cooper, G.F., Eds.; AAAI Press: Menlo Park, CA, USA, 1999; pp. 141–165. [Google Scholar]
  9. Spirtes, P.; Glymour, C.; Scheines, R. Causation, Prediction, and Search, 2nd ed.; MIT Press: Cambridge, MA, USA, 2000. [Google Scholar]
  10. Yoo, C.; Blitz, E. Local Causal Discovery Algorithm using Causal Bayesian networks. Ann. N. Y. Acad. Sci. 2009, 1158, 93–101. [Google Scholar] [CrossRef] [Green Version]
  11. Pearl, J.; Glymour, M.; Jewell, N.P. Causal Inference in Statistics: A Primer; John Wiley & Sons: Hoboken, NJ, USA, 2016. [Google Scholar]
  12. Kuipers, J.; Suter, P.; Moffa, G. Efficient Structure Learning and Sampling of Bayesian Networks. arXiv 2018, arXiv:1803.07859. [Google Scholar] [CrossRef]
  13. Sazal, M.; Stebliankin, V.; Mathee, K.; Yoo, C.; Narasimhan, G. Causal effects in microbiomes using interventional calculus. Sci. Rep. 2021, 11, 1–15. [Google Scholar]
  14. Chauhan, A.S.; Cuzzocrea, A.; Fan, L.; Harvey, J.D.; Leung, C.K.; Pazdor, A.G.; Wang, T. Predictive Big Data Analytics for Service Requests: A Framework. Procedia Comput. Sci. 2022, 198, 102–111. [Google Scholar] [CrossRef]
  15. Binelli, C. Estimating Causal Effects When the Treatment Affects All Subjects Simultaneously: An Application. Big Data Cogn. Comput. 2021, 5, 22. [Google Scholar] [CrossRef]
  16. Park, S.B.; Hwang, K.T.; Chung, C.K.; Roy, D.; Yoo, C. Causal Bayesian gene networks associated with bone, brain and lung metastasis of breast cancer. Clin. Exp. Metastasis 2020, 37, 657–674. [Google Scholar] [CrossRef]
  17. Chowdhury, D.; Das, A.; Dey, A.; Sarkar, S.; Dwivedi, A.D.; Rao Mukkamala, R.; Murmu, L. ABCanDroid: A Cloud Integrated Android App for Noninvasive Early Breast Cancer Detection Using Transfer Learning. Sensors 2022, 22, 832. [Google Scholar] [CrossRef]
  18. Ye, Q.; Amini, A.A.; Zhou, Q. Distributed Learning of Generalized Linear Causal Networks. arXiv 2022, arXiv:2201.09194. [Google Scholar]
  19. Pearl, J. Probabilistic Reasoning in Intelligent Systems; Morgan Kaufmann: San Mateo, CA, USA, 1988. [Google Scholar]
  20. Spirtes, P.; Glymour, C.; Scheines, R. Causation, Prediction, and Search, 1st ed.; MIT Press: Cambridge, MA, USA, 1993. [Google Scholar]
  21. Pearl, J.; Verma, T.S. A Theory of Inferred Causality. In Studies in Logic and the Foundations of Mathematics; Elsevier: Amsterdam, The Netherlands, 1995; Volume 134, pp. 789–811. [Google Scholar]
  22. Yoo, C.; Cooper, G. Causal Discovery of Latent-Variable Models from a Mixture of Experimental and Observational Data; Center for Biomedical Informatics: Pittsburgh, PA, USA, 2001. [Google Scholar]
  23. Yoo, C. Bayesian Method for Causal Discovery of Latent-Variable Models from a Mixture of Experimental and Observational Data. Comput. Stat. Data Anal. 2012, 56, 2183–2205. [Google Scholar] [CrossRef] [PubMed]
  24. Meek, C. Causal inference and causal explanation with background knowledge. arXiv 2013, arXiv:1302.4972. [Google Scholar]
  25. Druzdzel, M.; Simon, H. Causality in Bayesian Belief Networks. In Uncertainty in Artificial Intelligence; Elsevier: Amsterdam, The Netherlands, 1993; pp. 3–11. [Google Scholar]
  26. Cooper, G.F. A simple constraint-based algorithm for efficiently mining observational databases for causal relationships. J. Data Min. Knowl. Discov. 1997, 1, 203–224. [Google Scholar] [CrossRef]
  27. Meek, C. Selecting Graphical Models: Causal and Statistical Modeling; Department of Philosophy, Carnegie Mellon University: Pittsburgh, PA, USA, 1997. [Google Scholar]
  28. Aliferis, C.F.; Cooper, G.F. Causal Modeling with Modifiable Temporal Belief Networks; Center for Biomedical Informatics: Pittsburgh, PA, USA, 1998. [Google Scholar]
  29. Friedman, N.; Koller, D. Being Bayesian about network structure. arXiv 2013, arXiv:1301.3856. [Google Scholar]
  30. Charniak, E. Bayesian networks without tears. AI Mag. 1991, 12, 50–63. [Google Scholar]
  31. Heckerman, D.E. A Tractable Inference Algorithm for Diagnosing Multiple Diseases; Elsevier: Amsterdam, The Netherlands; Windsor, ON, Canada, 1989; pp. 174–181. [Google Scholar]
  32. Beinlich, I.A.; Suermondt, H.J.; Chavez, R.M.; Cooper, G.F. The ALARM monitoring system: A case study with two probabilistic inference techniques for belief networks. In Proceedings of the Second European Conference on Artificial Intelligence in Medical Care, Berlin, Germany, 1989; pp. 247–256. [Google Scholar]
  33. Heckerman, D. A Bayesian Approach to Learning Causal Networks. arXiv 1995, arXiv:1302.4958. [Google Scholar]
  34. Chickering, D.M.; Heckerman, D.; Meek, C. A Bayesian approach to learning Bayesian networks with local structure. arXiv 2013, arXiv:1302.1528. [Google Scholar]
  35. Chen, X.W.; Anantha, G.; Lin, X. Improving Bayesian Network Structure Learning with Mutual Information-Based Node Ordering in the K2 Algorithm. IEEE Trans. Knowl. Data Eng. 2008, 20, 628–640. [Google Scholar] [CrossRef]
  36. Mani, S.; Cooper, G.; Spirtes, P. A Theoretical Study of Y Structures for Causal Discovery. arXiv 2006, arXiv:1206.6853. [Google Scholar]
  37. Silander, T.; Myllymaki, P. A simple approach for finding the globally optimal Bayesian network structure. In Proceedings of the Uncertainty in Artificial Intelligence, Cambridge, MA, USA, 13–16 July 2006; pp. 445–452. [Google Scholar]
  38. Hartemink, A.J.; Berger, H. Banjo: Bayesian Network Inference with Java Objects; Duke University: Durham, NC, USA, 2005–2008. Available online: https://users.cs.duke.edu/~amink/software/banjo/ (accessed on 7 April 2022). [Google Scholar]
  39. Cooper, G.F.; Herskovits, E. A Bayesian method for the induction of probabilistic networks from data. Mach. Learn. 1992, 9, 309–347. [Google Scholar] [CrossRef]
  40. Geiger, D.; Heckerman, D. A characterization of the Dirichlet distribution with application to learning Bayesian networks. In Maximum Entropy and Bayesian Methods; Springer: Berlin/Heidelberg, Germany, 1995; pp. 196–207. [Google Scholar]
  41. Cooper, G.F. Probabilistic Inference Using Belief Networks Is NP-Hard; KSL8-727; Stanford University: Stanford, CA, USA, 1987. [Google Scholar]
Figure 1. A causal Bayesian network example.
Figure 2. Three structures included in the order <X1, X2, X3>.
Figure 3. Two sets of nine variables. All the grayed-out variables are hidden and not selected. (a) Close 9 variables (C9). (b) Sparse 9 variables (S9).
Figure 4. Four pairwise causal relationships. H represents a variable that is shaded, meaning that it is present in the ALARM network but not included in the datasets using C9 and S9. Not confounded and not causally related is denoted as ØX Y in (a). Not confounded but causally related is denoted as ØX→Y in (b). Confounded but not causally related is denoted as HX Y in (c). Confounded and causally related is denoted as HX→Y in (d).
Figure 5. Generating structures for Sparse 9 (a) and Close 9 (b) variables.
Figure 6. The highest-scoring Global BDe structure for (a) D50S9 (14.23%), (b) D50C9 (15.71%), (c) D1KS9 (>99%), and (d) D1KC9 (4.03%); BDe percentage scores are given in parentheses.
Figure 7. Consensus structure without the order weight for (a) D50S9, (b) D50C9, (c) D1KS9, and (d) D1KC9. Arc thickness reflects the pairwise causal relationship probability, shown as a percentage label (the probability of the reverse causal relationship is given in parentheses). >99 and ~0 denote pairwise causal relationship probabilities greater than 0.9999 and less than 0.0001, respectively.
Figure 8. Consensus structure with the order weight for (a) D50S9, (b) D50C9, (c) D1KS9, and (d) D1KC9. Arc thickness reflects the pairwise causal relationship probability, shown as a percentage label (the probability of the reverse causal relationship is given in parentheses). >99 and ~0 denote pairwise causal relationship probabilities greater than 0.9999 and less than 0.0001, respectively.
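The arc labels in Figures 7 and 8 are model-averaged pairwise causal relationship probabilities. The sketch below is a simplified illustration of that averaging, not the paper's code (pairwise_probability is a hypothetical name, and the generic weights argument stands in for the BDe-based and order-based weights the paper uses): P(X→Y) is estimated as the weighted fraction of sampled structures in which X is an ancestor of Y.

```python
def ancestors(node, parents):
    """Ancestors of node in a DAG given a child -> parents map."""
    seen, stack = set(), list(parents.get(node, ()))
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(parents.get(p, ()))
    return seen

def pairwise_probability(samples, weights, x, y):
    """Weighted fraction of sampled DAGs with a directed path X -> Y,
    with Y -> X, or with neither; the weights would normally come from
    the posterior score of each sampled structure."""
    total = sum(weights)
    fwd = sum(w for dag, w in zip(samples, weights) if x in ancestors(y, dag)) / total
    rev = sum(w for dag, w in zip(samples, weights) if y in ancestors(x, dag)) / total
    return {"X->Y": fwd, "Y->X": rev, "X Y": 1.0 - fwd - rev}

# Two equally weighted samples: one with X -> Z -> Y, one with no path.
samples = [{"Z": {"X"}, "Y": {"Z"}}, {"Y": set(), "X": set()}]
print(pairwise_probability(samples, [1.0, 1.0], "X", "Y"))  # X->Y: 0.5
```

Because the two directed-path events are mutually exclusive in any single DAG, the three returned probabilities always sum to one, matching the column sums of roughly 1.0 in Tables 8 and 9.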
Table 1. Number of pairwise causal relationships in Close 9 variables (C9) and Sparse 9 variables (S9). H represents a variable that is shaded (hidden). (a) Close 9 variables (C9). (b) Sparse 9 variables (S9).

(a)
Pairwise relationship | ØX Y | ØX→Y | HX Y | HX→Y
Count                 | 14   | 8    | 4    | 10

(b)
Pairwise relationship | ØX Y | ØX→Y | HX Y | HX→Y
Count                 | 7    | 20   | 6    | 3
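Counts like those in Table 1 can be reproduced mechanically from the full ALARM graph. The sketch below is an illustration under our own simplifying assumptions, not the paper's bookkeeping (the helpers ancestors and classify_pair are hypothetical): a visible pair is treated as causally related when one variable is an ancestor of the other, and as confounded when some hidden variable reaches both members along paths that avoid the other member.

```python
def ancestors(node, parents, avoid=None):
    """Ancestors of node in a DAG given a child -> parents map,
    following only paths that never pass through `avoid`."""
    seen, stack = set(), [p for p in parents.get(node, ()) if p != avoid]
    while stack:
        p = stack.pop()
        if p not in seen:
            seen.add(p)
            stack.extend(q for q in parents.get(p, ()) if q != avoid)
    return seen

def classify_pair(x, y, parents, hidden):
    """Return one of the four Figure 4 labels for a visible pair."""
    causal = x in ancestors(y, parents) or y in ancestors(x, parents)
    confounded = any(h in ancestors(x, parents, avoid=y)
                     and h in ancestors(y, parents, avoid=x)
                     for h in hidden)
    return ("H" if confounded else "Ø") + ("X->Y" if causal else "X Y")

# Toy check: hidden H drives both X and Y, and X also drives Y.
toy = {"X": {"H"}, "Y": {"H", "X"}}
print(classify_pair("X", "Y", toy, hidden={"H"}))  # HX->Y
```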
Table 2. Mean and Standard Deviation (S.D.) of the Dataset Global BDe Best Order percentage score.

Dataset | D50S9    | D50C9   | D1KS9       | D1KC9
Mean    | 7.1429%  | 7.1429% | 7.1429%     | 7.1429%
S.D.    | 0.001470 | 0.00203 | 4.48 × 10−8 | 8.37 × 10−6
Table 3. Structure distances between the generating causal structure and the Dataset Global BDe Best Order.

Dataset                  | D50S9   | D50C9   | D1KS9   | D1KC9
Without the order weight | 18.2123 | 21.6760 | 18.0000 | 14.2517
With the order weight    | 17.2238 | 21.6370 | 18.0000 | 13.5725
Table 4. Structure distances without the order weight. (a) 50-case datasets. (b) 1000-case datasets. In each sub-table, dark shaded cells represent the lowest distance or variance in each timed run for the dataset; bright shaded cells represent the second lowest.

(a)
Run / Method | D50S9 Generated δ Mean | Var | D50S9 Global BDe δ Mean | Var | D50C9 Generated δ Mean | Var | D50C9 Global BDe δ Mean | Var
1 h Random | 20.14 | 11.175 | 2.91 | 25.420 | 22.31 | 0.302 | 12.89 | 124.57
P SC | 24.71 | 9.612 | 11.40 | 44.969 | 24.25 | 7.271 | 20.32 | 6.456
P WC | 22.53 | 14.353 | 8.24 | 60.001 | 25.80 | 7.252 | 21.82 | 6.606
P SI | 22.50 | 0.898 | 8.67 | 16.204 | 25.85 | 7.491 | 21.69 | 6.102
P WI | 23.18 | 0.746 | 7.68 | 2.070 | 27.55 | 0.029 | 23.43 | 0.005
PP SC | 18.21 | 0.000 | 0.00 | 0.000 | 22.32 | 0.315 | 12.95 | 125.70
PP WC | 18.21 | 0.000 | 0.00 | 0.000 | 22.65 | 0.001 | 19.17 | 0.077
PP SI | 17.36 | 2.204 | 1.46 | 6.416 | 22.34 | 0.327 | 13.00 | 126.82
PP WI | 18.21 | 0.000 | 0.00 | 0.000 | 22.34 | 0.327 | 13.00 | 126.82
2 h Random | 18.21 | 0.000 | 0.00 | 0.000 | 21.68 | 0.000 | 0.00 | 0.000
P SC | 18.13 | 0.019 | 0.15 | 0.071 | 22.67 | 0.000 | 19.51 | 0.000
P WC | 18.14 | 0.016 | 0.14 | 0.059 | 22.68 | 0.000 | 19.29 | 0.143
P SI | 18.14 | 0.016 | 0.14 | 0.059 | 22.67 | 0.000 | 19.51 | 0.000
P WI | 17.36 | 2.204 | 1.46 | 6.416 | 22.68 | 0.000 | 19.29 | 0.143
PP SC | 18.21 | 0.000 | 0.00 | 0.000 | 21.68 | 0.000 | 0.00 | 0.000
PP WC | 18.21 | 0.000 | 0.00 | 0.000 | 21.68 | 0.000 | 0.00 | 0.000
PP SI | 18.21 | 0.000 | 0.00 | 0.000 | 21.68 | 0.000 | 0.00 | 0.000
PP WI | 18.21 | 0.000 | 0.00 | 0.000 | 21.68 | 0.000 | 0.00 | 0.000
4 h Random | 18.21 | 0.000 | 0.00 | 0.000 | 21.68 | 0.000 | 0.00 | 0.000
P SC | 18.21 | 0.000 | 0.00 | 0.000 | 21.68 | 0.000 | 0.00 | 0.000
P WC | 18.21 | 0.000 | 0.00 | 0.000 | 22.32 | 0.315 | 12.95 | 125.70
P SI | 18.21 | 0.000 | 0.00 | 0.000 | 22.00 | 0.317 | 6.46 | 125.18
P WI | 18.21 | 0.000 | 0.01 | 0.001 | 21.68 | 0.000 | 0.00 | 0.000
PP SC | 18.21 | 0.000 | 0.00 | 0.000 | 21.68 | 0.000 | 0.00 | 0.000
PP WC | 18.21 | 0.000 | 0.00 | 0.000 | 21.68 | 0.000 | 0.00 | 0.000
PP SI | 18.21 | 0.000 | 0.00 | 0.000 | 21.68 | 0.000 | 0.00 | 0.000
PP WI | 18.21 | 0.000 | 0.00 | 0.000 | 21.68 | 0.000 | 0.00 | 0.000
BiDAG | 48.00 | 8.000 | 58.94 | 8.957 | 39.00 | 2.000 | 44.24 | 3.068
K2 | 28.00 | - | 14.46 | - | 44.00 | - | 46.16 | -
PC | 40.00 | - | 50.98 | - | 33.00 | - | 32.85 | -

(b)
Run / Method | D1KS9 Generated δ Mean | Variance | D1KS9 Global BDe δ Mean | Variance | D1KC9 Generated δ Mean | Variance | D1KC9 Global BDe δ Mean | Variance
2 h Random | 39.68 | 58.607 | 42.37 | 71.345 | 25.83 | 201.870 | 26.68 | 200.887
P SC | 33.11 | 285.037 | 28.44 | 377.926 | 9.14 | 8.136 | 10.86 | 2.249
P WC | 33.81 | 78.699 | 42.54 | 26.088 | 25.20 | 243.398 | 25.39 | 223.036
P SI | 40.67 | 37.333 | 36.00 | 156.000 | 13.52 | 19.847 | 17.25 | 36.087
P WI | 36.67 | 65.333 | 46.00 | 12.000 | 25.34 | 253.669 | 25.40 | 278.016
PP SC | 37.56 | 37.926 | 36.44 | 17.926 | 20.24 | 103.441 | 24.50 | 70.396
PP WC | 36.00 | 300.000 | 33.78 | 509.481 | 12.79 | 37.146 | 16.07 | 20.011
PP SI | 35.33 | 341.333 | 35.33 | 645.333 | 27.45 | 271.048 | 29.21 | 236.507
PP WI | 45.11 | 6.370 | 45.85 | 36.067 | 24.92 | 263.521 | 27.72 | 284.627
4 h Random | 41.83 | 14.083 | 40.17 | 116.083 | 13.33 | 35.665 | 14.84 | 92.436
P SC | 39.00 | 19.000 | 41.00 | 39.000 | 23.34 | 70.611 | 25.36 | 180.337
P WC | 44.00 | 4.000 | 44.67 | 17.333 | 25.01 | 276.117 | 25.79 | 303.470
P SI | 38.61 | 11.122 | 42.77 | 76.699 | 23.80 | 259.143 | 26.79 | 214.084
P WI | 39.39 | 5.106 | 39.33 | 105.333 | 20.45 | 95.313 | 21.83 | 79.829
PP SC | 38.00 | 16.000 | 42.67 | 57.333 | 19.59 | 400.968 | 24.97 | 301.929
PP WC | 36.37 | 32.514 | 37.12 | 25.046 | 21.31 | 51.338 | 22.20 | 67.272
PP SI | 38.00 | 16.000 | 41.11 | 18.370 | 26.79 | 179.473 | 27.00 | 205.282
PP WI | 38.67 | 25.333 | 38.00 | 100.000 | 26.79 | 179.473 | 27.00 | 205.282
16 h Random | 37.44 | 17.926 | 37.72 | 48.898 | 11.11 | 3.567 | 9.43 | 9.872
P SC | 38.76 | 10.313 | 32.90 | 37.027 | 14.69 | 22.697 | 17.57 | 62.939
P WC | 44.67 | 1.333 | 48.57 | 10.122 | 13.95 | 16.385 | 18.71 | 13.462
P SI | 39.33 | 17.333 | 39.00 | 73.000 | 14.44 | 28.325 | 19.52 | 2.893
P WI | 29.67 | 140.333 | 29.67 | 364.333 | 15.46 | 43.441 | 17.66 | 9.317
PP SC | 43.00 | 3.000 | 49.67 | 16.333 | 14.39 | 15.414 | 14.43 | 17.557
PP WC | 40.00 | 28.000 | 41.67 | 58.333 | 16.43 | 9.102 | 15.33 | 5.494
PP SI | 32.37 | 174.601 | 29.03 | 651.995 | 23.44 | 225.689 | 25.63 | 158.151
PP WI | 44.22 | 9.481 | 46.89 | 37.926 | 12.24 | 8.376 | 13.33 | 57.693
BiDAG | 40.00 | 0.000 | 28.00 | 0.000 | 17.00 | 18.000 | 18.35 | 8.960
K2 | 18.00 | - | 0.00 | - | 66.00 | - | 60.37 | -
PC | 41.00 | - | 48.00 | - | 30.00 | - | 24.27 | -
P: Prior, PP: PrePrior, SC: Strong Correct, WC: Weak Correct, SI: Strong Incorrect, WI: Weak Incorrect.
Table 5. Structure distances with the order weight. (a) 50-case datasets. (b) 1000-case datasets. In each sub-table, dark shaded cells represent the lowest distance or variance in each timed run for the dataset; bright shaded cells represent the second lowest.

(a)
Run / Method | D50S9 Generated δ Mean | Var | D50S9 Global BDe δ Mean | Var | D50C9 Generated δ Mean | Var | D50C9 Global BDe δ Mean | Var
1 h Random | 19.64 | 30.35 | 5.81 | 51.162 | 21.70 | 0.005 | 1.48 | 1.493
P SC | 30.14 | 60.89 | 19.72 | 120.601 | 23.77 | 11.813 | 8.08 | 33.905
P WC | 19.04 | 5.786 | 6.56 | 15.904 | 24.70 | 16.125 | 9.96 | 18.692
P SI | 21.06 | 6.438 | 8.61 | 7.391 | 24.38 | 5.997 | 12.28 | 10.007
P WI | 21.56 | 5.154 | 8.72 | 12.071 | 26.15 | 10.588 | 12.91 | 57.855
PP SC | 16.86 | 0.097 | 0.84 | 0.454 | 21.88 | 0.090 | 4.46 | 34.262
PP WC | 16.95 | 0.073 | 0.63 | 0.342 | 21.80 | 0.011 | 3.73 | 7.171
PP SI | 16.32 | 0.697 | 1.97 | 3.170 | 21.99 | 0.115 | 6.74 | 44.955
PP WI | 16.86 | 0.027 | 0.82 | 0.107 | 21.90 | 0.084 | 5.20 | 31.125
2 h Random | 16.53 | 0.012 | 1.53 | 0.055 | 21.63 | 0.000 | 0.07 | 0.000
P SC | 16.20 | 0.079 | 2.12 | 0.341 | 21.94 | 0.001 | 5.67 | 0.502
P WC | 16.33 | 0.022 | 2.13 | 0.167 | 21.97 | 0.030 | 6.64 | 11.473
P SI | 16.29 | 0.003 | 2.05 | 0.014 | 21.96 | 0.015 | 6.15 | 4.471
P WI | 16.04 | 0.172 | 2.61 | 0.788 | 21.96 | 0.029 | 6.20 | 14.410
PP SC | 17.04 | 0.024 | 0.43 | 0.111 | 21.63 | 0.000 | 0.05 | 0.000
PP WC | 17.04 | 0.025 | 0.43 | 0.105 | 21.62 | 0.000 | 0.07 | 0.000
PP SI | 17.04 | 0.024 | 0.43 | 0.112 | 21.63 | 0.000 | 0.06 | 0.000
PP WI | 17.03 | 0.101 | 0.43 | 0.453 | 21.64 | 0.001 | 0.09 | 0.002
4 h Random | 16.56 | 0.001 | 1.47 | 0.003 | 21.63 | 0.000 | 0.05 | 0.000
P SC | 16.56 | 0.001 | 1.47 | 0.003 | 21.65 | 0.000 | 0.08 | 0.000
P WC | 16.52 | 0.007 | 1.56 | 0.030 | 21.72 | 0.005 | 1.35 | 1.645
P SI | 16.54 | 0.014 | 1.52 | 0.057 | 21.67 | 0.002 | 0.49 | 0.498
P WI | 16.40 | 0.072 | 1.82 | 0.314 | 21.64 | 0.000 | 0.08 | 0.000
PP SC | 17.22 | 0.000 | 0.04 | 0.000 | 21.62 | 0.000 | 0.06 | 0.000
PP WC | 17.22 | 0.000 | 0.04 | 0.000 | 21.63 | 0.000 | 0.05 | 0.000
PP SI | 17.22 | 0.000 | 0.05 | 0.000 | 21.63 | 0.000 | 0.07 | 0.000
PP WI | 17.22 | 0.000 | 0.04 | 0.000 | 21.64 | 0.000 | 0.06 | 0.000
BiDAG | 48.00 | 8.000 | 58.94 | 8.957 | 39.00 | 2.000 | 44.24 | 3.068
K2 | 28.00 | - | 14.46 | - | 44.00 | - | 46.16 | -
PC | 40.00 | - | 50.98 | - | 33.00 | - | 32.85 | -

(b)
Run / Method | D1KS9 Generated δ Mean | Variance | D1KS9 Global BDe δ Mean | Variance | D1KC9 Generated δ Mean | Variance | D1KC9 Global BDe δ Mean | Variance
2 h Random | 39.59 | 42.692 | 42.97 | 56.086 | 25.56 | 209.278 | 27.13 | 217.735
P SC | 33.11 | 285.037 | 28.44 | 377.927 | 8.41 | 4.901 | 10.99 | 0.078
P WC | 34.03 | 76.580 | 42.53 | 27.008 | 25.61 | 226.015 | 26.49 | 246.942
P SI | 40.22 | 46.815 | 35.33 | 185.334 | 13.08 | 25.578 | 17.13 | 56.053
P WI | 36.67 | 65.333 | 46.00 | 12.000 | 25.27 | 243.761 | 26.92 | 279.100
PP SC | 38.00 | 47.967 | 36.95 | 26.069 | 20.10 | 105.759 | 25.59 | 77.767
PP WC | 35.10 | 273.648 | 33.43 | 489.516 | 12.76 | 37.207 | 15.97 | 35.531
PP SI | 35.33 | 341.333 | 35.33 | 645.333 | 26.50 | 291.276 | 29.65 | 294.053
PP WI | 43.87 | 17.647 | 45.25 | 52.435 | 25.21 | 252.008 | 28.56 | 306.454
4 h Random | 42.10 | 13.594 | 40.57 | 120.587 | 12.87 | 39.938 | 15.80 | 100.781
P SC | 39.00 | 19.000 | 41.00 | 39.000 | 23.32 | 71.322 | 26.40 | 192.191
P WC | 44.00 | 4.000 | 44.67 | 17.333 | 25.14 | 271.993 | 26.98 | 289.026
P SI | 37.43 | 5.247 | 41.73 | 79.882 | 23.18 | 279.931 | 27.16 | 239.742
P WI | 39.42 | 5.003 | 39.33 | 105.333 | 18.50 | 12.703 | 21.95 | 40.858
PP SC | 38.00 | 16.000 | 42.67 | 57.333 | 21.03 | 80.477 | 22.92 | 77.419
PP WC | 34.84 | 28.731 | 35.45 | 22.966 | 19.69 | 398.657 | 25.21 | 327.081
PP SI | 38.67 | 17.333 | 42.00 | 16.000 | 21.20 | 48.632 | 23.62 | 67.978
PP WI | 39.11 | 33.037 | 38.44 | 113.926 | 26.85 | 177.831 | 28.76 | 188.909
16 h Random | 37.10 | 21.481 | 37.85 | 50.113 | 10.50 | 9.386 | 10.13 | 19.393
P SC | 38.50 | 20.920 | 33.91 | 55.929 | 14.37 | 25.397 | 18.40 | 64.781
P WC | 44.93 | 0.657 | 48.80 | 12.263 | 14.04 | 15.646 | 19.26 | 14.133
P SI | 39.24 | 18.301 | 39.14 | 69.670 | 14.39 | 28.542 | 19.90 | 5.251
P WI | 29.27 | 157.213 | 29.27 | 390.813 | 16.17 | 39.229 | 18.72 | 11.306
PP SC | 43.00 | 3.000 | 49.67 | 16.333 | 14.50 | 16.538 | 15.93 | 15.372
PP WC | 40.45 | 36.578 | 42.11 | 70.072 | 16.51 | 7.749 | 16.57 | 7.196
PP SI | 32.20 | 173.354 | 28.87 | 647.190 | 24.20 | 244.548 | 26.97 | 221.240
PP WI | 44.22 | 9.489 | 46.89 | 37.936 | 11.36 | 13.995 | 14.00 | 47.746
BiDAG | 40.00 | 0.000 | 28.00 | 0.000 | 17.00 | 18.000 | 18.35 | 8.960
K2 | 18.00 | - | 0.00 | - | 66.00 | - | 60.37 | -
PC | 41.00 | - | 48.00 | - | 30.00 | - | 24.27 | -
P: Prior, PP: PrePrior, SC: Strong Correct, WC: Weak Correct, SI: Strong Incorrect, WI: Weak Incorrect.
Table 6. Markov blanket distances without the order weight. (a) 50-case datasets. (b) 1000-case datasets. In each sub-table, dark shaded cells represent the lowest distance or variance in each timed run for the dataset; bright shaded cells represent the second lowest.

(a)
Run / Method | D50S9 Generated δ Mean | Var | D50S9 Global BDe δ Mean | Var | D50C9 Generated δ Mean | Var | D50C9 Global BDe δ Mean | Var
1 h Random | 16.05 | 0.01 | 0.76 | 1.734 | 16.17 | 0.000 | 4.78 | 17.170
P SC | 16.14 | 0.01 | 3.14 | 4.233 | 16.28 | 0.048 | 7.68 | 1.059
P WC | 16.14 | 0.02 | 2.45 | 5.900 | 16.40 | 0.047 | 8.28 | 1.077
P SI | 16.04 | 0.00 | 2.57 | 2.746 | 16.40 | 0.045 | 8.26 | 1.048
P WI | 16.11 | 0.00 | 2.01 | 0.166 | 16.53 | 0.000 | 8.93 | 0.001
PP SC | 16.00 | 0.00 | 0.00 | 0.000 | 16.17 | 0.000 | 4.80 | 17.286
PP WC | 16.00 | 0.00 | 0.00 | 0.000 | 16.16 | 0.000 | 7.15 | 0.003
PP SI | 16.00 | 0.00 | 1.07 | 3.407 | 16.17 | 0.000 | 4.82 | 17.401
PP WI | 16.00 | 0.00 | 0.00 | 0.000 | 16.17 | 0.000 | 4.82 | 17.401
2 h Random | 16.00 | 0.00 | 0.00 | 0.000 | 16.19 | 0.000 | 0.00 | 0.000
P SC | 16.00 | 0.00 | 0.17 | 0.083 | 16.16 | 0.000 | 7.23 | 0.000
P WC | 16.00 | 0.00 | 0.15 | 0.072 | 16.16 | 0.000 | 7.18 | 0.007
P SI | 16.00 | 0.00 | 0.15 | 0.072 | 16.16 | 0.000 | 7.23 | 0.000
P WI | 16.00 | 0.00 | 1.07 | 3.407 | 16.16 | 0.000 | 7.18 | 0.007
PP SC | 16.00 | 0.00 | 0.00 | 0.000 | 16.19 | 0.000 | 0.00 | 0.000
PP WC | 16.00 | 0.00 | 0.00 | 0.000 | 16.19 | 0.000 | 0.00 | 0.000
PP SI | 16.00 | 0.00 | 0.00 | 0.000 | 16.19 | 0.000 | 0.00 | 0.000
PP WI | 16.00 | 0.00 | 0.00 | 0.000 | 16.19 | 0.000 | 0.00 | 0.000
4 h Random | 16.00 | 0.00 | 0.00 | 0.000 | 16.19 | 0.000 | 0.00 | 0.000
P SC | 16.00 | 0.00 | 0.00 | 0.000 | 16.19 | 0.000 | 0.00 | 0.000
P WC | 16.00 | 0.00 | 0.00 | 0.000 | 16.17 | 0.000 | 4.80 | 17.286
P SI | 16.00 | 0.00 | 0.00 | 0.000 | 16.18 | 0.000 | 2.39 | 17.170
P WI | 16.00 | 0.00 | 0.02 | 0.001 | 16.19 | 0.000 | 0.00 | 0.000
PP SC | 16.00 | 0.00 | 0.00 | 0.000 | 16.19 | 0.000 | 0.00 | 0.000
PP WC | 16.00 | 0.00 | 0.00 | 0.000 | 16.19 | 0.000 | 0.00 | 0.000
PP SI | 16.00 | 0.00 | 0.00 | 0.000 | 16.19 | 0.000 | 0.00 | 0.000
PP WI | 16.00 | 0.00 | 0.00 | 0.000 | 16.19 | 0.000 | 0.00 | 0.000
BiDAG | 18.00 | 0.000 | 18.00 | 0.000 | 16.00 | 0.000 | 18.00 | 0.000
K2 | 16.00 | - | 13.79 | - | 18.00 | - | 18.00 | -
PC | 18.00 | - | 18.00 | - | 18.00 | - | 18.00 | -

(b)
Run / Method | D1KS9 Generated δ Mean | Variance | D1KS9 Global BDe δ Mean | Variance | D1KC9 Generated δ Mean | Variance | D1KC9 Global BDe δ Mean | Variance
2 h Random | 18.00 | 0.00 | 13.03 | 1.003 | 11.50 | 8.502 | 10.95 | 13.371
P SC | 18.00 | 0.00 | 9.33 | 17.333 | 7.99 | 0.938 | 5.82 | 0.333
P WC | 17.41 | 1.03 | 13.83 | 0.090 | 12.76 | 10.277 | 12.30 | 15.951
P SI | 18.00 | 0.00 | 8.67 | 9.333 | 9.46 | 1.369 | 8.82 | 4.790
P WI | 17.33 | 1.33 | 13.67 | 0.333 | 11.79 | 14.828 | 10.85 | 30.014
PP SC | 18.00 | 0.00 | 11.11 | 2.370 | 12.39 | 7.741 | 12.13 | 14.534
PP WC | 18.00 | 0.00 | 10.00 | 9.000 | 9.15 | 0.912 | 8.16 | 2.063
PP SI | 18.00 | 0.00 | 11.78 | 25.481 | 12.32 | 8.611 | 11.61 | 17.310
PP WI | 18.00 | 0.00 | 11.82 | 0.183 | 12.51 | 6.667 | 11.76 | 21.675
4 h Random | 18.00 | 0.00 | 9.83 | 11.083 | 10.31 | 10.219 | 8.07 | 30.555
P SC | 18.00 | 0.00 | 12.67 | 1.333 | 12.09 | 10.991 | 11.43 | 23.473
P WC | 18.00 | 0.00 | 11.33 | 9.333 | 11.80 | 5.340 | 10.90 | 16.552
P SI | 17.49 | 0.77 | 13.42 | 0.468 | 12.76 | 16.142 | 12.34 | 25.819
P WI | 18.00 | 0.00 | 11.67 | 16.333 | 11.13 | 6.473 | 10.48 | 12.860
PP SC | 18.00 | 0.00 | 12.67 | 1.333 | 10.27 | 13.426 | 9.97 | 15.665
PP WC | 18.00 | 0.00 | 12.18 | 1.522 | 11.16 | 11.527 | 10.79 | 20.176
PP SI | 18.00 | 0.00 | 12.22 | 0.148 | 12.40 | 2.221 | 11.58 | 10.344
PP WI | 18.00 | 0.00 | 10.33 | 14.333 | 12.40 | 2.221 | 11.58 | 10.344
16 h Random | 18.00 | 0.00 | 11.89 | 1.454 | 8.38 | 0.250 | 4.66 | 0.750
P SC | 18.00 | 0.00 | 8.33 | 4.333 | 10.73 | 9.584 | 9.25 | 23.921
P WC | 18.00 | 0.00 | 12.90 | 3.313 | 9.69 | 0.677 | 8.66 | 4.616
P SI | 18.00 | 0.00 | 12.33 | 8.333 | 9.50 | 1.190 | 9.24 | 1.328
P WI | 18.00 | 0.00 | 10.50 | 9.250 | 11.60 | 11.056 | 10.38 | 17.422
PP SC | 16.00 | 12.00 | 12.33 | 0.333 | 9.61 | 0.634 | 8.24 | 4.689
PP WC | 16.00 | 12.00 | 11.67 | 2.333 | 9.43 | 1.452 | 7.26 | 1.587
PP SI | 16.33 | 8.33 | 8.67 | 57.333 | 12.52 | 6.552 | 11.65 | 19.964
PP WI | 18.00 | 0.00 | 12.22 | 4.148 | 9.63 | 5.630 | 7.63 | 15.827
BiDAG | 18.00 | 0.000 | 18.00 | 0.000 | 11.00 | 2.000 | 11.13 | 2.000
K2 | 18.00 | - | 0.00 | - | 18.00 | - | 17.86 | -
PC | 18.00 | - | 18.00 | - | 18.00 | - | 18.00 | -
P: Prior, PP: PrePrior, SC: Strong Correct, WC: Weak Correct, SI: Strong Incorrect, WI: Weak Incorrect.
Table 7. Markov blanket distances with the order weight. (a) 50-case datasets. (b) 1000-case datasets. In each sub-table, dark shaded cells represent the lowest distance or variance in each timed run for the dataset; bright shaded cells represent the second lowest.

(a)
Run / Method | D50S9 Generated δ Mean | Var | D50S9 Global BDe δ Mean | Var | D50C9 Generated δ Mean | Var | D50C9 Global BDe δ Mean | Var
1 h Random | 16.05 | 0.006 | 2.06 | 2.129 | 16.19 | 0.000 | 0.60 | 0.165
P SC | 16.16 | 0.013 | 5.03 | 7.234 | 16.33 | 0.068 | 3.09 | 4.954
P WC | 16.07 | 0.004 | 2.61 | 1.096 | 16.40 | 0.103 | 3.82 | 2.873
P SI | 16.03 | 0.000 | 2.77 | 0.561 | 16.35 | 0.036 | 4.67 | 1.479
P WI | 16.08 | 0.003 | 2.70 | 0.779 | 16.50 | 0.052 | 4.93 | 8.424
PP SC | 16.00 | 0.000 | 0.66 | 0.226 | 16.18 | 0.000 | 1.72 | 4.575
PP WC | 16.00 | 0.000 | 0.49 | 0.173 | 16.18 | 0.000 | 1.41 | 0.999
PP SI | 16.00 | 0.000 | 1.44 | 1.565 | 16.18 | 0.000 | 2.56 | 5.958
PP WI | 16.00 | 0.000 | 0.62 | 0.042 | 16.18 | 0.000 | 1.97 | 4.155
2 h Random | 16.00 | 0.000 | 1.14 | 0.029 | 16.19 | 0.000 | 0.11 | 0.000
P SC | 16.00 | 0.000 | 1.51 | 0.176 | 16.18 | 0.000 | 2.13 | 0.069
P WC | 16.00 | 0.000 | 1.52 | 0.086 | 16.18 | 0.000 | 2.49 | 1.568
P SI | 16.00 | 0.000 | 1.49 | 0.004 | 16.18 | 0.000 | 2.32 | 0.617
P WI | 16.00 | 0.000 | 1.89 | 0.393 | 16.18 | 0.000 | 2.34 | 1.980
PP SC | 16.00 | 0.000 | 0.34 | 0.051 | 16.19 | 0.000 | 0.11 | 0.000
PP WC | 16.00 | 0.000 | 0.34 | 0.039 | 16.19 | 0.000 | 0.12 | 0.000
PP SI | 16.00 | 0.000 | 0.34 | 0.053 | 16.19 | 0.000 | 0.11 | 0.000
PP WI | 16.00 | 0.000 | 0.34 | 0.211 | 16.19 | 0.000 | 0.14 | 0.001
4 h Random | 16.00 | 0.000 | 1.09 | 0.002 | 16.19 | 0.000 | 0.11 | 0.000
P SC | 16.00 | 0.000 | 1.10 | 0.001 | 16.19 | 0.000 | 0.13 | 0.000
P WC | 16.00 | 0.000 | 1.16 | 0.012 | 16.18 | 0.000 | 0.56 | 0.199
P SI | 16.00 | 0.000 | 1.12 | 0.033 | 16.19 | 0.000 | 0.27 | 0.053
P WI | 16.00 | 0.000 | 1.35 | 0.137 | 16.19 | 0.000 | 0.13 | 0.000
PP SC | 16.00 | 0.000 | 0.08 | 0.000 | 16.19 | 0.000 | 0.10 | 0.000
PP WC | 16.00 | 0.000 | 0.08 | 0.000 | 16.19 | 0.000 | 0.11 | 0.000
PP SI | 16.00 | 0.000 | 0.08 | 0.000 | 16.19 | 0.000 | 0.11 | 0.000
PP WI | 16.00 | 0.000 | 0.08 | 0.000 | 16.19 | 0.000 | 0.11 | 0.000
BiDAG | 18.00 | 0.000 | 18.00 | 0.000 | 16.00 | 0.000 | 18.00 | 0.000
K2 | 16.00 | - | 13.79 | - | 18.00 | - | 18.00 | -
PC | 18.00 | - | 18.00 | - | 18.00 | - | 18.00 | -

(b)
Run / Method | D1KS9 Generated δ Mean | Variance | D1KS9 Global BDe δ Mean | Variance | D1KC9 Generated δ Mean | Variance | D1KC9 Global BDe δ Mean | Variance
2 h Random | 18.00 | 0.000 | 13.07 | 1.014 | 11.33 | 10.106 | 11.20 | 13.067
P SC | 18.00 | 0.000 | 9.33 | 17.333 | 7.81 | 0.754 | 5.85 | 0.182
P WC | 17.44 | 0.947 | 13.75 | 0.187 | 12.85 | 10.937 | 12.26 | 19.666
P SI | 18.00 | 0.000 | 8.67 | 9.333 | 9.35 | 1.854 | 8.70 | 8.596
P WI | 17.33 | 1.333 | 13.67 | 0.333 | 11.79 | 14.794 | 11.32 | 26.604
PP SC | 18.00 | 0.000 | 11.14 | 2.212 | 12.33 | 8.413 | 12.35 | 13.923
PP WC | 18.00 | 0.000 | 10.02 | 7.929 | 9.13 | 0.956 | 8.22 | 6.685
PP SI | 18.00 | 0.000 | 11.78 | 25.481 | 12.06 | 11.445 | 11.45 | 24.335
PP WI | 18.00 | 0.000 | 12.03 | 0.199 | 12.52 | 6.597 | 11.79 | 23.237
4 h Random | 18.00 | 0.000 | 9.71 | 10.526 | 10.24 | 10.616 | 8.23 | 30.807
P SC | 18.00 | 0.000 | 12.67 | 1.333 | 11.98 | 12.267 | 11.42 | 27.102
P WC | 18.00 | 0.000 | 11.33 | 9.333 | 11.80 | 5.289 | 11.02 | 16.626
P SI | 17.49 | 0.765 | 13.08 | 0.685 | 12.61 | 18.288 | 12.41 | 25.506
P WI | 18.00 | 0.000 | 11.67 | 16.333 | 10.92 | 7.615 | 10.64 | 13.564
PP SC | 18.00 | 0.000 | 12.67 | 1.333 | 11.42 | 4.975 | 10.74 | 12.474
PP WC | 18.00 | 0.000 | 12.00 | 0.694 | 10.27 | 13.420 | 9.71 | 18.499
PP SI | 18.00 | 0.000 | 12.33 | 0.333 | 11.16 | 11.522 | 10.91 | 18.678
PP WI | 18.00 | 0.000 | 10.44 | 15.259 | 12.26 | 2.796 | 11.69 | 9.394
16 h Random | 18.00 | 0.000 | 12.03 | 0.481 | 8.25 | 0.529 | 4.80 | 1.709
P SC | 18.00 | 0.000 | 8.51 | 4.844 | 10.78 | 8.952 | 9.38 | 22.068
P WC | 18.00 | 0.000 | 12.93 | 4.398 | 9.53 | 1.103 | 8.76 | 4.205
P SI | 18.00 | 0.000 | 12.47 | 6.981 | 9.22 | 0.950 | 9.24 | 1.265
P WI | 18.00 | 0.000 | 10.30 | 11.470 | 12.01 | 9.357 | 10.63 | 16.177
PP SC | 18.00 | 0.000 | 12.33 | 0.333 | 9.72 | 0.400 | 8.30 | 3.678
PP WC | 18.00 | 0.000 | 11.78 | 2.821 | 9.31 | 0.754 | 7.43 | 1.861
PP SI | 18.00 | 0.000 | 8.67 | 57.333 | 12.58 | 6.028 | 11.64 | 23.386
PP WI | 18.00 | 0.000 | 12.22 | 4.153 | 9.47 | 6.395 | 7.62 | 16.874
BiDAG | 18.00 | 0.000 | 18.00 | 0.000 | 11.00 | 2.000 | 11.13 | 2.000
K2 | 18.00 | - | 0.00 | - | 18.00 | - | 17.86 | -
PC | 18.00 | - | 18.00 | - | 18.00 | - | 18.00 | -
P: Prior, PP: PrePrior, SC: Strong Correct, WC: Weak Correct, SI: Strong Incorrect, WI: Weak Incorrect.
Table 8. Algorithms’ predicted probabilities of the four causal pairwise relationships without the order weight. (a) Sparse 9 variables with 50 cases (D50S9). (b) Close 9 variables with 50 cases (D50C9). (c) Sparse 9 variables with 1000 cases (D1KS9). (d) Close 9 variables with 1000 cases (D1KC9). In each sub-table, dark shaded cells represent the best prediction of the correct causal relationship; bright shaded cells represent the second best. Columns give the true pairwise relationship; rows give the predicted relationship.

(a)
Algorithm | Prediction | True: ØX Y | ØX→Y | HX Y | HX→Y
Random | X→Y | 0.168 | 0.342 | 0.152 | 1.000
Random | Y→X | 0.442 | 0.635 | 0.578 | 0.000
Random | X Y | 0.390 | 0.023 | 0.270 | 0.000
Prior SC | X→Y | 0.168 | 0.342 | 0.152 | 1.000
Prior SC | Y→X | 0.442 | 0.635 | 0.578 | 0.000
Prior SC | X Y | 0.390 | 0.023 | 0.270 | 0.000
Prior WC | X→Y | 0.168 | 0.342 | 0.152 | 1.000
Prior WC | Y→X | 0.442 | 0.635 | 0.578 | 0.000
Prior WC | X Y | 0.390 | 0.023 | 0.270 | 0.000
PrePrior SC | X→Y | 0.168 | 0.342 | 0.152 | 1.000
PrePrior SC | Y→X | 0.442 | 0.635 | 0.578 | 0.000
PrePrior SC | X Y | 0.390 | 0.023 | 0.270 | 0.000
PrePrior WC | X→Y | 0.168 | 0.342 | 0.152 | 1.000
PrePrior WC | Y→X | 0.442 | 0.635 | 0.578 | 0.000
PrePrior WC | X Y | 0.390 | 0.023 | 0.270 | 0.000
BiDAG | X→Y | 0.143 | 0.350 | 0.000 | 0.000
BiDAG | Y→X | 0.000 | 0.100 | 0.167 | 0.667
BiDAG | X Y | 0.857 | 0.550 | 0.833 | 0.333
K2 | X→Y | 0.286 | 0.250 | 0.167 | 0.667
K2 | Y→X | 0.429 | 0.700 | 0.833 | 0.000
K2 | X Y | 0.286 | 0.050 | 0.000 | 0.333

(b)
Algorithm | Prediction | True: ØX Y | ØX→Y | HX Y | HX→Y
Random | X→Y | 0.038 | 0.470 | 0.485 | 0.204
Random | Y→X | 0.156 | 0.253 | 0.244 | 0.601
Random | X Y | 0.805 | 0.277 | 0.271 | 0.195
Prior SC | X→Y | 0.038 | 0.470 | 0.485 | 0.204
Prior SC | Y→X | 0.156 | 0.253 | 0.244 | 0.601
Prior SC | X Y | 0.805 | 0.277 | 0.271 | 0.195
Prior WC | X→Y | 0.032 | 0.474 | 0.393 | 0.402
Prior WC | Y→X | 0.189 | 0.253 | 0.246 | 0.409
Prior WC | X Y | 0.779 | 0.274 | 0.361 | 0.189
PrePrior SC | X→Y | 0.038 | 0.470 | 0.485 | 0.204
PrePrior SC | Y→X | 0.156 | 0.253 | 0.244 | 0.601
PrePrior SC | X Y | 0.805 | 0.277 | 0.271 | 0.195
PrePrior WC | X→Y | 0.038 | 0.470 | 0.485 | 0.204
PrePrior WC | Y→X | 0.156 | 0.253 | 0.244 | 0.601
PrePrior WC | X Y | 0.805 | 0.277 | 0.271 | 0.195
BiDAG | X→Y | 0.071 | 1.000 | 0.250 | 0.400
BiDAG | Y→X | 0.500 | 0.000 | 0.750 | 0.300
BiDAG | X Y | 0.429 | 0.000 | 0.000 | 0.300
K2 | X→Y | 0.429 | 0.500 | 0.750 | 0.700
K2 | Y→X | 0.286 | 0.375 | 0.250 | 0.300
K2 | X Y | 0.286 | 0.125 | 0.000 | 0.000

(c)
Algorithm | Prediction | True: ØX Y | ØX→Y | HX Y | HX→Y
Random | X→Y | 0.139 | 0.371 | 0.269 | 0.935
Random | Y→X | 0.135 | 0.136 | 0.046 | 0.065
Random | X Y | 0.726 | 0.493 | 0.685 | 0.000
Prior SC | X→Y | 0.248 | 0.390 | 0.107 | 1.000
Prior SC | Y→X | 0.143 | 0.093 | 0.238 | 0.000
Prior SC | X Y | 0.609 | 0.517 | 0.655 | 0.000
Prior WC | X→Y | 0.109 | 0.411 | 0.468 | 0.730
Prior WC | Y→X | 0.048 | 0.163 | 0.071 | 0.270
Prior WC | X Y | 0.844 | 0.426 | 0.460 | 0.000
PrePrior SC | X→Y | 0.024 | 0.367 | 0.472 | 0.944
PrePrior SC | Y→X | 0.024 | 0.175 | 0.000 | 0.056
PrePrior SC | X Y | 0.952 | 0.458 | 0.528 | 0.000
PrePrior WC | X→Y | 0.156 | 0.340 | 0.139 | 1.000
PrePrior WC | Y→X | 0.429 | 0.631 | 0.518 | 0.000
PrePrior WC | X Y | 0.415 | 0.028 | 0.343 | 0.000
BiDAG | X→Y | 0.571 | 0.400 | 0.500 | 0.000
BiDAG | Y→X | 0.286 | 0.400 | 0.333 | 0.667
BiDAG | X Y | 0.143 | 0.200 | 0.167 | 0.333
K2 | X→Y | 0.429 | 0.400 | 0.000 | 1.000
K2 | Y→X | 0.286 | 0.550 | 0.333 | 0.000
K2 | X Y | 0.286 | 0.050 | 0.667 | 0.000

(d)
Algorithm | Prediction | True: ØX Y | ØX→Y | HX Y | HX→Y
Random | X→Y | 0.046 | 0.641 | 0.279 | 0.582
Random | Y→X | 0.046 | 0.287 | 0.205 | 0.374
Random | X Y | 0.908 | 0.072 | 0.517 | 0.044
Prior SC | X→Y | 0.023 | 0.486 | 0.132 | 0.535
Prior SC | Y→X | 0.036 | 0.344 | 0.272 | 0.287
Prior SC | X Y | 0.941 | 0.170 | 0.595 | 0.178
Prior WC | X→Y | 0.012 | 0.433 | 0.122 | 0.645
Prior WC | Y→X | 0.016 | 0.358 | 0.268 | 0.252
Prior WC | X Y | 0.972 | 0.209 | 0.610 | 0.103
PrePrior SC | X→Y | 0.042 | 0.543 | 0.211 | 0.635
PrePrior SC | Y→X | 0.037 | 0.324 | 0.241 | 0.257
PrePrior SC | X Y | 0.920 | 0.133 | 0.548 | 0.109
PrePrior WC | X→Y | 0.052 | 0.571 | 0.270 | 0.652
PrePrior WC | Y→X | 0.061 | 0.320 | 0.329 | 0.237
PrePrior WC | X Y | 0.887 | 0.109 | 0.401 | 0.111
BiDAG | X→Y | 0.000 | 0.750 | 0.750 | 0.400
BiDAG | Y→X | 0.143 | 0.250 | 0.250 | 0.600
BiDAG | X Y | 0.857 | 0.000 | 0.000 | 0.000
K2 | X→Y | 0.429 | 0.375 | 0.750 | 0.500
K2 | Y→X | 0.571 | 0.375 | 0.250 | 0.500
K2 | X Y | 0.000 | 0.250 | 0.000 | 0.000
Table 9. Algorithms’ predicted probabilities of the four causal pairwise relationships with the order weight. (a) Sparse 9 variables with 50 cases (D50S9). (b) Close 9 variables with 50 cases (D50C9). (c) Sparse 9 variables with 1000 cases (D1KS9). (d) Close 9 variables with 1000 cases (D1KC9). In each sub-table, dark shaded cells represent the best prediction of the correct causal relationship; bright shaded cells represent the second best. Columns give the true pairwise relationship; rows give the predicted relationship.

(a)
Algorithm | Prediction | True: ØX Y | ØX→Y | HX Y | HX→Y
Random | X→Y | 0.157 | 0.338 | 0.140 | 1.000
Random | Y→X | 0.426 | 0.628 | 0.448 | 0.000
Random | X Y | 0.417 | 0.033 | 0.412 | 0.000
Prior SC | X→Y | 0.157 | 0.338 | 0.140 | 1.000
Prior SC | Y→X | 0.426 | 0.628 | 0.448 | 0.000
Prior SC | X Y | 0.417 | 0.033 | 0.412 | 0.000
Prior WC | X→Y | 0.157 | 0.338 | 0.140 | 1.000
Prior WC | Y→X | 0.426 | 0.628 | 0.444 | 0.000
Prior WC | X Y | 0.417 | 0.034 | 0.416 | 0.000
PrePrior SC | X→Y | 0.156 | 0.340 | 0.139 | 1.000
PrePrior SC | Y→X | 0.429 | 0.631 | 0.518 | 0.000
PrePrior SC | X Y | 0.415 | 0.028 | 0.343 | 0.000
PrePrior WC | X→Y | 0.156 | 0.340 | 0.139 | 1.000
PrePrior WC | Y→X | 0.429 | 0.631 | 0.518 | 0.000
PrePrior WC | X Y | 0.415 | 0.028 | 0.343 | 0.000
BiDAG | X→Y | 0.143 | 0.350 | 0.000 | 0.000
BiDAG | Y→X | 0.000 | 0.100 | 0.167 | 0.667
BiDAG | X Y | 0.857 | 0.550 | 0.833 | 0.333
K2 | X→Y | 0.286 | 0.250 | 0.167 | 0.667
K2 | Y→X | 0.429 | 0.700 | 0.833 | 0.000
K2 | X Y | 0.286 | 0.050 | 0.000 | 0.333

(b)
Algorithm | Prediction | True: ØX Y | ØX→Y | HX Y | HX→Y
Random | X→Y | 0.037 | 0.470 | 0.484 | 0.203
Random | Y→X | 0.156 | 0.252 | 0.242 | 0.600
Random | X Y | 0.807 | 0.278 | 0.275 | 0.197
Prior SC | X→Y | 0.037 | 0.470 | 0.483 | 0.202
Prior SC | Y→X | 0.156 | 0.252 | 0.242 | 0.600
Prior SC | X Y | 0.807 | 0.278 | 0.275 | 0.198
Prior WC | X→Y | 0.037 | 0.470 | 0.473 | 0.223
Prior WC | Y→X | 0.159 | 0.252 | 0.242 | 0.581
Prior WC | X Y | 0.804 | 0.278 | 0.284 | 0.197
PrePrior SC | X→Y | 0.037 | 0.470 | 0.484 | 0.203
PrePrior SC | Y→X | 0.156 | 0.252 | 0.242 | 0.600
PrePrior SC | X Y | 0.807 | 0.278 | 0.275 | 0.197
PrePrior WC | X→Y | 0.037 | 0.470 | 0.484 | 0.203
PrePrior WC | Y→X | 0.156 | 0.252 | 0.242 | 0.600
PrePrior WC | X Y | 0.807 | 0.278 | 0.275 | 0.197
BiDAG | X→Y | 0.071 | 1.000 | 0.250 | 0.400
BiDAG | Y→X | 0.500 | 0.000 | 0.750 | 0.300
BiDAG | X Y | 0.429 | 0.000 | 0.000 | 0.300
K2 | X→Y | 0.429 | 0.500 | 0.750 | 0.700
K2 | Y→X | 0.286 | 0.375 | 0.250 | 0.300
K2 | X Y | 0.286 | 0.125 | 0.000 | 0.000

(c)
Algorithm | Prediction | True: ØX Y | ØX→Y | HX Y | HX→Y
Random | X→Y | 0.133 | 0.377 | 0.257 | 0.955
Random | Y→X | 0.125 | 0.114 | 0.032 | 0.045
Random | X Y | 0.743 | 0.509 | 0.712 | 0.000
Prior SC | X→Y | 0.232 | 0.390 | 0.121 | 1.000
Prior SC | Y→X | 0.145 | 0.094 | 0.230 | 0.000
Prior SC | X Y | 0.622 | 0.517 | 0.649 | 0.000
Prior WC | X→Y | 0.105 | 0.419 | 0.489 | 0.689
Prior WC | Y→X | 0.032 | 0.161 | 0.067 | 0.311
Prior WC | X Y | 0.863 | 0.420 | 0.444 | 0.000
PrePrior SC | X→Y | 0.024 | 0.367 | 0.472 | 0.944
PrePrior SC | Y→X | 0.024 | 0.175 | 0.000 | 0.056
PrePrior SC | X Y | 0.952 | 0.458 | 0.528 | 0.000
PrePrior WC | X→Y | 0.167 | 0.350 | 0.352 | 0.852
PrePrior WC | Y→X | 0.024 | 0.136 | 0.056 | 0.148
PrePrior WC | X Y | 0.809 | 0.514 | 0.593 | 0.000
BiDAG | X→Y | 0.571 | 0.400 | 0.500 | 0.000
BiDAG | Y→X | 0.286 | 0.400 | 0.333 | 0.667
BiDAG | X Y | 0.143 | 0.200 | 0.167 | 0.333
K2 | X→Y | 0.429 | 0.400 | 0.000 | 1.000
K2 | Y→X | 0.286 | 0.550 | 0.333 | 0.000
K2 | X Y | 0.286 | 0.050 | 0.667 | 0.000

(d)
Algorithm | Prediction | True: ØX Y | ØX→Y | HX Y | HX→Y
Random | X→Y | 0.045 | 0.640 | 0.260 | 0.586
Random | Y→X | 0.041 | 0.287 | 0.174 | 0.376
Random | X Y | 0.914 | 0.073 | 0.566 | 0.039
Prior SC | X→Y | 0.020 | 0.473 | 0.140 | 0.534
Prior SC | Y→X | 0.035 | 0.349 | 0.245 | 0.301
Prior SC | X Y | 0.945 | 0.179 | 0.615 | 0.165
Prior WC | X→Y | 0.011 | 0.449 | 0.115 | 0.649
Prior WC | Y→X | 0.022 | 0.353 | 0.280 | 0.243
Prior WC | X Y | 0.967 | 0.198 | 0.606 | 0.109
PrePrior SC | X→Y | 0.036 | 0.524 | 0.199 | 0.637
PrePrior SC | Y→X | 0.037 | 0.331 | 0.236 | 0.248
PrePrior SC | X Y | 0.927 | 0.146 | 0.565 | 0.115
PrePrior WC | X→Y | 0.046 | 0.580 | 0.288 | 0.667
PrePrior WC | Y→X | 0.051 | 0.316 | 0.341 | 0.211
PrePrior WC | X Y | 0.904 | 0.104 | 0.371 | 0.122
BiDAG | X→Y | 0.000 | 0.750 | 0.750 | 0.400
BiDAG | Y→X | 0.143 | 0.250 | 0.250 | 0.600
BiDAG | X Y | 0.857 | 0.000 | 0.000 | 0.000
K2 | X→Y | 0.429 | 0.375 | 0.750 | 0.500
K2 | Y→X | 0.571 | 0.375 | 0.250 | 0.500
K2 | X Y | 0.000 | 0.250 | 0.000 | 0.000
Table 10. Algorithms’ most probable prediction rates by the four causal pairwise relationships without the order weight. (a) Sparse 9 variables with 50 cases (D50S9). (b) Close 9 variables with 50 cases (D50C9). (c) Sparse 9 variables with 1000 cases (D1KS9). (d) Close 9 variables with 1000 cases (D1KC9). In each sub-table, dark shaded cells represent the best prediction of the correct causal relationship; bright shaded cells represent the second best. Columns give the true pairwise relationship; rows give the predicted relationship.

(a)
Algorithm | Prediction | True: ØX Y | ØX→Y | HX Y | HX→Y
Random | X→Y | 0.14 | 0.35 | 0.00 | 1.00
Random | Y→X | 0.43 | 0.65 | 0.50 | 0.00
Random | X Y | 0.43 | 0.00 | 0.50 | 0.00
Prior SC | X→Y | 0.14 | 0.35 | 0.00 | 1.00
Prior SC | Y→X | 0.43 | 0.65 | 0.50 | 0.00
Prior SC | X Y | 0.43 | 0.00 | 0.50 | 0.00
Prior WC | X→Y | 0.14 | 0.35 | 0.00 | 1.00
Prior WC | Y→X | 0.43 | 0.65 | 0.50 | 0.00
Prior WC | X Y | 0.43 | 0.00 | 0.50 | 0.00
PrePrior SC | X→Y | 0.14 | 0.35 | 0.00 | 1.00
PrePrior SC | Y→X | 0.43 | 0.65 | 0.50 | 0.00
PrePrior SC | X Y | 0.43 | 0.00 | 0.50 | 0.00
PrePrior WC | X→Y | 0.14 | 0.35 | 0.17 | 1.00
PrePrior WC | Y→X | 0.43 | 0.65 | 0.33 | 0.00
PrePrior WC | X Y | 0.43 | 0.00 | 0.50 | 0.00
BiDAG | X→Y | 0.14 | 0.35 | 0.00 | 0.00
BiDAG | Y→X | 0.00 | 0.10 | 0.17 | 0.67
BiDAG | X Y | 0.86 | 0.55 | 0.83 | 0.33
K2 | X→Y | 0.29 | 0.25 | 0.17 | 0.67
K2 | Y→X | 0.43 | 0.70 | 0.83 | 0.00
K2 | X Y | 0.29 | 0.05 | 0.00 | 0.33

(b)
Algorithm | Prediction | True: ØX Y | ØX→Y | HX Y | HX→Y
Random | X→Y | 0.00 | 0.50 | 0.50 | 0.20
Random | Y→X | 0.14 | 0.25 | 0.25 | 0.60
Random | X Y | 0.86 | 0.25 | 0.25 | 0.20
Prior SC | X→Y | 0.00 | 0.50 | 0.50 | 0.20
Prior SC | Y→X | 0.14 | 0.25 | 0.25 | 0.60
Prior SC | X Y | 0.86 | 0.25 | 0.25 | 0.20
Prior WC | X→Y | 0.00 | 0.50 | 0.50 | 0.20
Prior WC | Y→X | 0.14 | 0.25 | 0.25 | 0.60
Prior WC | X Y | 0.86 | 0.25 | 0.25 | 0.20
PrePrior SC | X→Y | 0.00 | 0.50 | 0.50 | 0.20
PrePrior SC | Y→X | 0.14 | 0.25 | 0.25 | 0.60
PrePrior SC | X Y | 0.86 | 0.25 | 0.25 | 0.20
PrePrior WC | X→Y | 0.00 | 0.50 | 0.50 | 0.20
PrePrior WC | Y→X | 0.14 | 0.25 | 0.25 | 0.60
PrePrior WC | X Y | 0.86 | 0.25 | 0.25 | 0.20
BiDAG | X→Y | 0.07 | 1.00 | 0.25 | 0.40
BiDAG | Y→X | 0.50 | 0.00 | 0.75 | 0.30
BiDAG | X Y | 0.43 | 0.00 | 0.00 | 0.30
K2 | X→Y | 0.43 | 0.50 | 0.75 | 0.70
K2 | Y→X | 0.29 | 0.38 | 0.25 | 0.30
K2 | X Y | 0.29 | 0.13 | 0.00 | 0.00

(c)
Algorithm | Prediction | True: ØX Y | ØX→Y | HX Y | HX→Y
Random | X→Y | 0.14 | 0.40 | 0.17 | 1.00
Random | Y→X | 0.14 | 0.05 | 0.00 | 0.00
Random | X Y | 0.71 | 0.55 | 0.83 | 0.00
Prior SC | X→Y | 0.29 | 0.40 | 0.00 | 1.00
Prior SC | Y→X | 0.14 | 0.05 | 0.33 | 0.00
Prior SC | X Y | 0.57 | 0.55 | 0.67 | 0.00
Prior WC | X→Y | 0.14 | 0.30 | 0.33 | 1.00
Prior WC | Y→X | 0.00 | 0.15 | 0.00 | 0.00
Prior WC | X Y | 0.86 | 0.55 | 0.67 | 0.00
PrePrior SC | X→Y | 0.00 | 0.35 | 0.50 | 1.00
PrePrior SC | Y→X | 0.00 | 0.20 | 0.00 | 0.00
PrePrior SC | X Y | 1.00 | 0.45 | 0.50 | 0.00
PrePrior WC | X→Y | 0.14 | 0.30 | 0.33 | 1.00
PrePrior WC | Y→X | 0.00 | 0.10 | 0.00 | 0.00
PrePrior WC | X Y | 0.86 | 0.60 | 0.67 | 0.00
BiDAG | X→Y | 0.57 | 0.40 | 0.50 | 0.00
BiDAG | Y→X | 0.29 | 0.40 | 0.33 | 0.67
BiDAG | X Y | 0.14 | 0.20 | 0.17 | 0.33
K2 | X→Y | 0.43 | 0.40 | 0.00 | 1.00
K2 | Y→X | 0.29 | 0.55 | 0.33 | 0.00
K2 | X Y | 0.29 | 0.05 | 0.67 | 0.00

(d)
Algorithm | Prediction | True: ØX Y | ØX→Y | HX Y | HX→Y
Random | X→Y | 0.00 | 0.75 | 0.25 | 0.60
Random | Y→X | 0.00 | 0.25 | 0.00 | 0.40
Random | X Y | 1.00 | 0.00 | 0.75 | 0.00
Prior SC | X→Y | 0.00 | 0.38 | 0.25 | 0.60
Prior SC | Y→X | 0.00 | 0.38 | 0.25 | 0.40
Prior SC | X Y | 1.00 | 0.25 | 0.50 | 0.00
Prior WC | X→Y | 0.00 | 0.38 | 0.00 | 0.70
Prior WC | Y→X | 0.00 | 0.38 | 0.50 | 0.10
Prior WC | X Y | 1.00 | 0.25 | 0.50 | 0.20
PrePrior SC | X→Y | 0.00 | 0.38 | 0.25 | 0.70
PrePrior SC | Y→X | 0.00 | 0.38 | 0.00 | 0.10
PrePrior SC | X Y | 1.00 | 0.25 | 0.75 | 0.20
PrePrior WC | X→Y | 0.00 | 0.63 | 0.75 | 0.60
PrePrior WC | Y→X | 0.00 | 0.38 | 0.25 | 0.40
PrePrior WC | X Y | 1.00 | 0.00 | 0.00 | 0.00
BiDAG | X→Y | 0.00 | 0.75 | 0.75 | 0.40
BiDAG | Y→X | 0.14 | 0.25 | 0.25 | 0.60
BiDAG | X Y | 0.86 | 0.00 | 0.00 | 0.00
K2 | X→Y | 0.43 | 0.38 | 0.75 | 0.50
K2 | Y→X | 0.57 | 0.38 | 0.25 | 0.50
K2 | X Y | 0.00 | 0.25 | 0.00 | 0.00
Table 11. Algorithms’ most probable prediction rates by the four causal pairwise relationships with the order weight. (a) Sparse 9 variables with 50 cases (D50S9). (b) Close 9 variables with 50 cases (D50C9). (c) Sparse 9 variables with 1000 cases (D1KS9). (d) Close 9 variables with 1000 cases (D1KC9). In each sub-table, dark shaded cells represent the best prediction of the correct causal relationship; bright shaded cells represent the second best. Columns give the true pairwise relationship; rows give the predicted relationship.

(a)
Algorithm | Prediction | True: ØX Y | ØX→Y | HX Y | HX→Y
Random | X→Y | 0.14 | 0.35 | 0.17 | 1.00
Random | Y→X | 0.43 | 0.65 | 0.33 | 0.00
Random | X Y | 0.43 | 0.00 | 0.50 | 0.00
Prior SC | X→Y | 0.14 | 0.35 | 0.17 | 1.00
Prior SC | Y→X | 0.43 | 0.65 | 0.33 | 0.00
Prior SC | X Y | 0.43 | 0.00 | 0.50 | 0.00
Prior WC | X→Y | 0.14 | 0.35 | 0.17 | 1.00
Prior WC | Y→X | 0.43 | 0.65 | 0.33 | 0.00
Prior WC | X Y | 0.43 | 0.00 | 0.50 | 0.00
PrePrior SC | X→Y | 0.14 | 0.35 | 0.17 | 1.00
PrePrior SC | Y→X | 0.43 | 0.65 | 0.33 | 0.00
PrePrior SC | X Y | 0.43 | 0.00 | 0.50 | 0.00
PrePrior WC | X→Y | 0.14 | 0.35 | 0.17 | 1.00
PrePrior WC | Y→X | 0.43 | 0.65 | 0.33 | 0.00
PrePrior WC | X Y | 0.43 | 0.00 | 0.50 | 0.00
BiDAG | X→Y | 0.14 | 0.35 | 0.00 | 0.00
BiDAG | Y→X | 0.00 | 0.10 | 0.17 | 0.67
BiDAG | X Y | 0.86 | 0.55 | 0.83 | 0.33
K2 | X→Y | 0.29 | 0.25 | 0.17 | 0.67
K2 | Y→X | 0.43 | 0.70 | 0.83 | 0.00
K2 | X Y | 0.29 | 0.05 | 0.00 | 0.33

(b)
Algorithm | Prediction | True: ØX Y | ØX→Y | HX Y | HX→Y
Random | X→Y | 0.00 | 0.50 | 0.50 | 0.20
Random | Y→X | 0.14 | 0.25 | 0.25 | 0.60
Random | X Y | 0.86 | 0.25 | 0.25 | 0.20
Prior SC | X→Y | 0.00 | 0.50 | 0.50 | 0.20
Prior SC | Y→X | 0.14 | 0.25 | 0.25 | 0.60
Prior SC | X Y | 0.86 | 0.25 | 0.25 | 0.20
Prior WC | X→Y | 0.00 | 0.50 | 0.50 | 0.20
Prior WC | Y→X | 0.14 | 0.25 | 0.25 | 0.60
Prior WC | X Y | 0.86 | 0.25 | 0.25 | 0.20
PrePrior SC | X→Y | 0.00 | 0.50 | 0.50 | 0.20
PrePrior SC | Y→X | 0.14 | 0.25 | 0.25 | 0.60
PrePrior SC | X Y | 0.86 | 0.25 | 0.25 | 0.20
PrePrior WC | X→Y | 0.00 | 0.50 | 0.50 | 0.20
PrePrior WC | Y→X | 0.14 | 0.25 | 0.25 | 0.60
PrePrior WC | X Y | 0.86 | 0.25 | 0.25 | 0.20
BiDAG | X→Y | 0.07 | 1.00 | 0.25 | 0.40
BiDAG | Y→X | 0.50 | 0.00 | 0.75 | 0.30
BiDAG | X Y | 0.43 | 0.00 | 0.00 | 0.30
K2 | X→Y | 0.43 | 0.50 | 0.75 | 0.70
K2 | Y→X | 0.29 | 0.38 | 0.25 | 0.30
K2 | X Y | 0.29 | 0.13 | 0.00 | 0.00

(c)
Algorithm | Prediction | True: ØX Y | ØX→Y | HX Y | HX→Y
Random | X→Y | 0.14 | 0.40 | 0.17 | 1.00
Random | Y→X | 0.14 | 0.05 | 0.00 | 0.00
Random | X Y | 0.71 | 0.55 | 0.83 | 0.00
Prior SC | X→Y | 0.29 | 0.40 | 0.00 | 1.00
Prior SC | Y→X | 0.14 | 0.05 | 0.33 | 0.00
Prior SC | X Y | 0.57 | 0.55 | 0.67 | 0.00
Prior WC | X→Y | 0.14 | 0.30 | 0.33 | 1.00
Prior WC | Y→X | 0.00 | 0.15 | 0.00 | 0.00
Prior WC | X Y | 0.86 | 0.55 | 0.67 | 0.00
PrePrior SC | X→Y | 0.00 | 0.35 | 0.50 | 1.00
PrePrior SC | Y→X | 0.00 | 0.20 | 0.00 | 0.00
PrePrior SC | X Y | 1.00 | 0.45 | 0.50 | 0.00
PrePrior WC | X→Y | 0.14 | 0.30 | 0.33 | 1.00
PrePrior WC | Y→X | 0.00 | 0.10 | 0.00 | 0.00
PrePrior WC | X Y | 0.86 | 0.60 | 0.67 | 0.00
BiDAG | X→Y | 0.57 | 0.40 | 0.50 | 0.00
BiDAG | Y→X | 0.29 | 0.40 | 0.33 | 0.67
BiDAG | X Y | 0.14 | 0.20 | 0.17 | 0.33
K2 | X→Y | 0.43 | 0.40 | 0.00 | 1.00
K2 | Y→X | 0.29 | 0.55 | 0.33 | 0.00
K2 | X Y | 0.29 | 0.05 | 0.67 | 0.00

(d)
Algorithm | Prediction | True: ØX Y | ØX→Y | HX Y | HX→Y
Random | X→Y | 0.00 | 0.75 | 0.25 | 0.60
Random | Y→X | 0.00 | 0.25 | 0.00 | 0.40
Random | X Y | 1.00 | 0.00 | 0.75 | 0.00
Prior SC | X→Y | 0.00 | 0.38 | 0.25 | 0.60
Prior SC | Y→X | 0.00 | 0.38 | 0.25 | 0.20
Prior SC | X Y | 1.00 | 0.25 | 0.50 | 0.20
Prior WC | X→Y | 0.00 | 0.38 | 0.00 | 0.70
Prior WC | Y→X | 0.00 | 0.38 | 0.50 | 0.10
Prior WC | X Y | 1.00 | 0.25 | 0.50 | 0.20
PrePrior SC | X→Y | 0.00 | 0.38 | 0.25 | 0.70
PrePrior SC | Y→X | 0.00 | 0.38 | 0.00 | 0.10
PrePrior SC | X Y | 1.00 | 0.25 | 0.75 | 0.20
PrePrior WC | X→Y | 0.00 | 0.63 | 0.25 | 0.70
PrePrior WC | Y→X | 0.00 | 0.38 | 0.25 | 0.10
PrePrior WC | X Y | 1.00 | 0.00 | 0.50 | 0.20
BiDAG | X→Y | 0.00 | 0.75 | 0.75 | 0.40
BiDAG | Y→X | 0.14 | 0.25 | 0.25 | 0.60
BiDAG | X Y | 0.86 | 0.00 | 0.00 | 0.00
K2 | X→Y | 0.43 | 0.38 | 0.75 | 0.50
K2 | Y→X | 0.57 | 0.38 | 0.25 | 0.50
K2 | X Y | 0.00 | 0.25 | 0.00 | 0.00
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
