Article

Learning Causes of Functional Dynamic Targets: Screening and Local Methods

1 School of Mathematical Sciences, Peking University, Beijing 100871, China
2 College of Science, Beijing Forestry University, Beijing 100083, China
* Authors to whom correspondence should be addressed.
Entropy 2024, 26(7), 541; https://doi.org/10.3390/e26070541
Submission received: 2 April 2024 / Revised: 13 June 2024 / Accepted: 23 June 2024 / Published: 24 June 2024
(This article belongs to the Section Information Theory, Probability and Statistics)

Abstract: This paper addresses the challenge of identifying causes for functional dynamic targets, which are functions of various variables over time. We develop screening and local learning methods to learn the direct causes of the target, as well as all indirect causes up to a given distance. We first discuss the modeling of the functional dynamic target. Then, we propose a screening method to select the variables that are significantly correlated with the target. On this basis, we introduce an algorithm that combines screening and structural learning techniques to uncover the causal structure among the target and its causes. To tackle the distance effect, where long causal paths weaken correlation, we propose a local method to discover the direct causes of the target in these significant variables and further sequentially find all indirect causes up to a given distance. We show theoretically that our proposed methods can learn the causes correctly under some regular assumptions. Experiments based on synthetic data also show that the proposed methods perform well in learning the causes of the target.

1. Introduction

Identifying the causes of a target variable is a primary objective in numerous research studies. Sometimes, these target variables are dynamic, observed at distinct time intervals, and typically characterized by functions or distinct curves that depend on other variables and time. We call them functional dynamic targets. For example, in nature, the growth of animals and plants is usually multistage and nonlinear with respect to time [1,2,3]. The popular growth curve functions, including Logistic, Gompertz, Richards, Hossfeld IV, and Double-Logistic functions, have S shapes [3], and have been widely used to model the patterns of growth. In psychological and cognitive science, researchers usually fit individual learning and forgetting curves by power functions; individuals may have different curve parameters [4,5].
The causal graphical model is widely used for the automated derivation of causal influences in variables [6,7,8,9] and demonstrates excellent performance in presenting complex causal relationships between multiple variables and expressing causal hypotheses [7,10,11]. In this paper, we aim to identify the underlying causes of these functional dynamic targets using the graphical model. There are three main challenges for this purpose. Firstly, identifying the causes is generally more challenging than exploring associations, even though the latter has received substantial attention, as evidenced by the extensive use of Genome-Wide Association Studies (GWAS) within the field of bioinformatics. Secondly, it is difficult to use a causal graphical model to represent the generating mechanism of dynamic targets and to find the causes of the targets from observational data when the number of variables is very large. For example, one needs to find the genes that affect the growth curve of individuals from more than thousands of Single-Nucleotide Polymorphisms (SNPs). Finally, the variables considered are mixed, which increases the complexity of representing and learning the causal model. We discuss these three challenges in detail below.
First of all, traditional statistical methods can only discover correlations between variables rather than causal relationships, which may give false positive or false negative results when searching for the real causes of the target. In fact, the association between the target and variables may originate from different causal mechanisms. For example, Figure 1 displays several different causal mechanisms that can result in a statistically significant association between the target and variables. In Figure 1, $X_1, X_2, X_3$ are three random variables, $\mathbf{Y} = (Y_1, \ldots, Y_n)$ is a vector representing a functional dynamic target, in which $Y_i$, $i = 1, \ldots, n$, are the states of the target at $n$ time points, and directed edges represent direct causal relations among them. Using statistical methods, we are very likely to find that $X_1$ is significantly associated with $\mathbf{Y}$ in all four cases. However, it is hard to identify whether $X_1$ is a real cause of $\mathbf{Y}$ without further causal learning. As shown in Figure 1, $X_1$ might be a direct cause of $\mathbf{Y}$ in Figure 1a,c, a cause but not a direct cause in Figure 1b, and not a cause in Figure 1d.
In addition, when the number of candidate variables is huge, both learning causal structures and discovering the causes of the target become very difficult. In fact, learning the complete causal graph is redundant and wasteful for the task of finding causes, as the focus should be on the target variable’s local structure. The PCD-by-PCD algorithm [12] is adept at identifying such local structures and efficiently distinguishing parents, children, and some descendants. The MB-by-MB method [13], in contrast, simplifies this by learning Markov Blanket (MB) sets to identify direct causes/effects, leveraging MB discovery methods such as PCMB, STMB, and EEMB [14,15,16], which are simpler and quicker than learning PCD sets. The CMB algorithm further streamlines this process using a topology-based MB discovery approach [17]. However, Ling [18] pointed out that Expand-Backtracking-type algorithms, such as the PCD-by-PCD and CMB algorithms, may overlook some v-structures, leading to numerous incorrect edge orientations. To tackle these issues, the APSL algorithm was introduced and designed to learn the subgraph within a specific distance centered around the target variable. Nonetheless, its dependence on the FCBF method for Markov Blanket learning tends to produce approximate sets rather than precise ones [19]. Furthermore, Ling [18] emphasized learning the local graph within a certain distance from the target rather than focusing on the causes of the target.
Finally, the variables in question are varied; specifically, the targets consist of dynamic time series or complex curves, while the other variables may be either discrete or continuous. Consequently, measuring the connections between the target and other variables presents significant challenges. For instance, traditional statistical methods used to assess independence or conditional independence between variables and complex targets might not only be inefficient but also ineffective, especially when there is an insufficient sample size to accurately measure high-order conditional independence.
In this paper, we introduce a causal graphical model tailored for analyzing dynamic targets and propose two methods to identify the causes of such a functional dynamic target assuming no hidden variables or selection biases. Initially, after establishing our dynamic target causal graphical model, we conduct an association analysis to filter out most variables unrelated to the target. With data from the remaining significantly associated variables, we then combine the screening method with structural learning algorithms and introduce the SSL algorithm to identify the causes of the target. Finally, to mitigate the distance effects that can mask the association between a cause and the target in data sets where the causal chain from cause to target is excessively long, we propose a local method. This method initially identifies the direct causes of the target and then proceeds to learn the causes sequentially in reverse order along the causal path.
The main contributions of this paper include the following:
  • We introduce a causal graphical model that combines Bayesian networks and functional dynamic targets to represent the causal mechanism of variables and the target.
  • We present a screening method that significantly reduces the dimensions of potential factors and combines it with structural learning algorithms to learn the causes of a given target and prove that all identifiable causes can be learned correctly.
  • We propose a screening-based and local method to learn the causes of the functional dynamic target up to any given distance among all factors. This method is helpful when association disappears due to the long distance between indirect causes and the target.
  • We experimentally study our proposed method on a simulation data set to demonstrate the validity of the proposed methods.

2. Preliminary

Before introducing the main results of this paper, we need to clarify some definitions and notations related to graphs. Unless otherwise specified, we use capital letters such as $V$ to denote variables or vertices, boldface letters such as $\mathbf{V}$ to denote variable sets or vectors, and lowercase letters such as $v$ and $\mathbf{v}$ to denote the realization of a variable or a vector, respectively.
A graph G is a pair ( V , E ) , in which V = { V 1 , , V p } is the vertex set and E E ( V ) : = ( V × V ) { ( V i , V i ) V i V } is the edge set. To simplify the symbols, we use V to represent both random variables and the corresponding nodes in the graph. For any two nodes V i , V j V , an undirected edge between V i and V j , denoted by V i V j , is an edge satisfying ( V i , V j ) E and ( V j , V i ) E , while a directed edge between V i and V j , denoted by V i V j , is an edge satisfying ( V i , V j ) E and ( V j , V i ) E . If all edges in a graph are undirected (directed), the graph is called an undirected (directed) graph. If a graph has both undirected and directed edges, then it is called a partially directed graph. For a given graph G , we use V ( G ) and E ( G ) to denote its vertex set and edge set, respectively, where G can be an undirected, directed, or partially directed graph. For any V V , the induced subgraph of G over V , denoted by G ( V ) or G V , is the graph with vertex set V and edge set E ( V ) E containing all and only edges between vertices in V , that is, G V = ( V , E ( V ) ) , where E ( V ) : = E ( V × V ) .
In a graph G , V i is a parent of V j and V j is a child of V i if the directed edge V i V j is in G . V i and V j are neighbors of each other if the undirected edge V i V j is in G . V i and V j are called adjacent if they are connected by an edge, regardless of whether the edge is directed or undirected. We use P a ( V i , G ) , C h ( V i , G ) , N e ( V i , G ) , A d j ( V i , G ) to denote the sets of parents, children, neighbors, and adjacent vertices of V i in G , respectively. For any vertex set V V , the parent set of V in G can be defined as P a ( V , G ) = V i V P a ( V i , G ) V . The sets of children, neighbors, and adjacent vertices of V in G can be defined similarly. A root vertex is the vertex without parents. For any vertex V i V , the degree of V i in G , denoted by d e g ( V i , G ) , is the number of V i ’s adjacent vertices, that is, d e g ( V i , G ) = | A d j ( V i , G ) | . The skeleton of G , denoted by G s , is an undirected graph obtained by transforming all directed edges in G to undirected edges, that is, G s : = ( V , E S ) , where E s : = { ( V i , V j ) V × V V i A d j ( V j , G ) } .
The sequence < V i 1 , , V i n > in graph G is an ordered collection of distinct vertices V i 1 , , V i n . A sequence becomes a path, denoted by ( V i 1 , , V i n ) , if every pair of consecutive vertices in the sequence is adjacent in G . The vertices V i 1 and V i n serve as the endpoints, with the rest being intermediate vertices. For a path π = ( V i 1 , , V i n ) in G , and for any 1 k n , the subpath from V i 1 to V i k is π ( V i 1 , V i k ) = ( V i 1 , , V i k ) , and path π can thus be represented as a combination of its subpaths, denoted by π = π ( V i 1 , V i k ) π ( V i k , V i n ) . A path is partially directed if there is no directed edge V i k + 1 V i k in G for any k = 1 , , n 1 . A partially directed path is directed (or undirected) if all its edges are directed (or undirected). A vertex V i is an ancestor of V j and V j is a descendant of V i if there exists a directed path from V i to V j or V i = V j . The sets of ancestors and descendants of V i in the graph G are denoted by A n ( V i , G ) and D e ( V i , G ) , respectively. Furthermore, a vertex V i is a possible ancestor of V j and V j is a possible descendant of V i if there is a partially directed path from V i to V j . The sets of possible ancestors and possible descendants of V i in graph G are denoted by P o s s A n ( V i , G ) and P o s s D e ( V i , G ) , respectively. For any vertex set V V , the ancestor set of V in graph G is A n ( V , G ) : = V i V A n ( V i , G ) . The sets of possible ancestors and (possible) descendants of V in graph G can be defined similarly.
A (directed, partially directed, or undirected) cycle is a (directed, partially directed, or undirected) path from a node to itself. The length of a path (cycle) is the number of edges on the path (cycle). The distance between two variables V i and V j is the length of the shortest directed path from V i to V j . A directed acyclic graph (DAG) is a directed graph without directed cycles, and a partially directed acyclic graph (PDAG) is a partially directed graph without directed cycles. A chain graph is a partially directed graph in which all partially directed cycles are undirected. This indicates that both DAGs and undirected graphs can be considered as specific types of chain graphs.
In a graph $\mathcal{G}$, a v-structure is a tuple $(V_i, V_j, V_k)$ satisfying $V_i \rightarrow V_j \leftarrow V_k$ with $V_i \notin Adj(V_k, \mathcal{G})$, in which $V_j$ is called a collider. A path $\pi$ is d-separated (blocked) by a set of vertices $\mathbf{Z}$ if (1) $\pi$ contains a chain $V_i \rightarrow V_j \rightarrow V_k$ or a fork $V_i \leftarrow V_j \rightarrow V_k$ with $V_j \in \mathbf{Z}$, or (2) $\pi$ contains a v-structure $V_i \rightarrow V_j \leftarrow V_k$ with $De(V_j, \mathcal{G}) \cap \mathbf{Z} = \emptyset$; otherwise, $\pi$ is d-connected [20]. Sets of vertices $\mathbf{X}$ and $\mathbf{Y}$ are d-separated by $\mathbf{Z}$ if and only if $\mathbf{Z}$ blocks all paths from any vertex $V_i \in \mathbf{X}$ to any vertex $V_j \in \mathbf{Y}$, denoted by $\mathbf{X} \perp_{\mathcal{G}} \mathbf{Y} \mid \mathbf{Z}$. Furthermore, for any distribution $P$, $\mathbf{X} \perp_{P} \mathbf{Y} \mid \mathbf{Z}$ denotes that $\mathbf{X}$ and $\mathbf{Y}$ are conditionally independent given $\mathbf{Z}$. Given a DAG $\mathcal{G}$ and a distribution $P$, the Markov condition holds if $\mathbf{X} \perp_{\mathcal{G}} \mathbf{Y} \mid \mathbf{Z} \Rightarrow \mathbf{X} \perp_{P} \mathbf{Y} \mid \mathbf{Z}$, while faithfulness holds if $\mathbf{X} \perp_{P} \mathbf{Y} \mid \mathbf{Z} \Rightarrow \mathbf{X} \perp_{\mathcal{G}} \mathbf{Y} \mid \mathbf{Z}$. In fact, for any distribution, there exists at least one DAG such that the Markov condition holds, but there are certain distributions that are not faithful to any DAG. Therefore, unlike the Markov condition, faithfulness is often regarded as an assumption. In this paper, unless otherwise stated, we assume that faithfulness holds, that is, $\mathbf{X} \perp_{\mathcal{G}} \mathbf{Y} \mid \mathbf{Z} \Leftrightarrow \mathbf{X} \perp_{P} \mathbf{Y} \mid \mathbf{Z}$. For simplicity, we use the symbol $\perp$ to denote both (conditional) independence and d-separation.
From the concepts described, it can be inferred that a DAG characterizes the (conditional) independence relationships among a set of variables. In fact, multiple different DAGs may encode exactly the same conditional independence relationships. According to the Markov condition and faithfulness assumption, if the d-separation relationships contained in two DAGs are exactly the same, then these two DAGs are said to be Markov equivalent. Furthermore, two DAGs are Markov equivalent if and only if they share the same skeleton and v-structures [21]. All Markov equivalent DAGs constitute a Markov equivalence class, which can be represented by a completed partially directed acyclic graph (CPDAG) $\mathcal{G}^{*}$. Two vertices are adjacent in the CPDAG $\mathcal{G}^{*}$ if and only if they are adjacent in all DAGs in the equivalence class. A directed edge $V_i \rightarrow V_j$ in the CPDAG $\mathcal{G}^{*}$ indicates that this directed edge appears in all DAGs within the equivalence class, whereas an undirected edge $V_i - V_j$ signifies that $V_i \rightarrow V_j$ is present in some DAGs and $V_i \leftarrow V_j$ in others within the equivalence class [22]. A CPDAG is a chain graph [23] and can be learned from observational data using Meek's rules [24] (Figure 2).

3. The Causal Graphical Model of Potential Factors and Functional Dynamic Target

Let X = { X 1 , , X p } be a set of random variables representing potential factors and Y = ( Y 1 , , Y q ) be a functional dynamic target, where Y i , for i = 1 , , q , represents the state of the target at q different time points. Let G be a DAG defined over X Y , and let G X be the subgraph induced by G over the set of potential factors X . Suppose that the causal network of X can be represented by G X , and when combined with the joint probabilities over X , denoted by P ( · ) , we obtain a causal graphical model ( G X , P ) . Consequently, the data generation mechanisms of X and Y follow a causal Bayesian network model of G X and a model determined by the direct causes P a ( Y , G ) of Y , respectively. Formally, we define a causal graphical model of the functional dynamic target as follows.
Definition 1. 
Let $\mathcal{G}$ be a DAG over $\mathbf{X} \cup \mathbf{Y}$, $Pa(\mathbf{Y}, \mathcal{G})$ denote the direct causes of $\mathbf{Y}$ in $\mathcal{G}$ with $Ch(\mathbf{Y}, \mathcal{G}) = \emptyset$, $P(\cdot)$ be a joint distribution over $\mathbf{X}$, and $\Theta$ be parameters determining the expectations of the functional dynamic target $\mathbf{Y}$, which is influenced by $Pa(\mathbf{Y}, \mathcal{G})$. Then, the triple $(\mathcal{G}, P(\cdot), \Theta)$ constitutes a causal graphical model for $\mathbf{Y}$ if the following two conditions hold:
  • The pair ( G X , P ) constitutes a Bayesian network model for X .
  • The functional dynamic target Y follows the following model:
$$\mathbf{Y} = \tilde{\mu}(\Theta) + \tilde{\epsilon}_{\mathbf{Y}}, \qquad (1)$$
where $\tilde{\mu}(\Theta) = (\mu(t_1, \Theta), \ldots, \mu(t_q, \Theta))$ is the vector of the mean function at times $t_1, \ldots, t_q$, and $\tilde{\epsilon}_{\mathbf{Y}} = (\epsilon_{\mathbf{Y}, t_1}, \ldots, \epsilon_{\mathbf{Y}, t_q})$ is the vector of error terms with mean zero, that is, $E(\epsilon_{\mathbf{Y}, t_i}) = 0$, $i = 1, \ldots, q$.
Different functional dynamic targets use different mean functions. For example, the optimal mean function for the growth curves of different species varies among the Gompertz function, $\mu(t, (a, b, c)) = a e^{-b e^{-ct}}$; the Richards function, $\mu(t, (a, b, c, d)) = a / (1 + b e^{-ct})^{d}$; the Hossfeld function, $\mu(t, (a, k, c)) = a t^{k} / (c + t^{k})$; the Logistic function, $\mu(t, (a, r, c)) = a / (1 + e^{-r(t - c)})$; and the Double-Logistic function, $\mu(t, (a_1, r_1, c_1, a_2, r_2, c_2)) = a_1 / (1 + e^{-r_1(t - c_1)}) + a_2 / (1 + e^{-r_2(t - c_2)})$ [25,26,27].
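For concreteness, the following Python sketch implements these standard growth-curve mean functions; the function and parameter names are ours and are not taken from the paper.

import numpy as np

# Standard growth-curve mean functions mu(t, Theta) for functional dynamic targets.
def gompertz(t, a, b, c):
    return a * np.exp(-b * np.exp(-c * t))

def richards(t, a, b, c, d):
    return a / (1.0 + b * np.exp(-c * t)) ** d

def hossfeld(t, a, k, c):
    return a * t ** k / (c + t ** k)

def logistic(t, a, r, c):
    return a / (1.0 + np.exp(-r * (t - c)))

def double_logistic(t, a1, r1, c1, a2, r2, c2):
    return logistic(t, a1, r1, c1) + logistic(t, a2, r2, c2)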
A causal graphical model of the functional dynamic target can be interpreted as a data generation mechanism of variables in X and Y as follows. First, the root variables in G X are generated according to their marginal probabilities. Then, following the topological ordering of the DAG G X , for any non-root-variable X, when its parent nodes P a ( X , G ) have been generated, X can be drawn from P ( X P a ( X , G ) ) , which is the conditional probability of X given its parent set P a ( X , G ) . Finally, the target is generated by Equation (1). According to Definition 1, the Markov condition holds for the causal graphical model of a dynamic target, that is, for any pair of variables X i and X j , the d-separation of X i and X j given a set Z in G implies that X i and X j are conditionally independent given Z .
Given a mean function $\mu(t, \Theta)$, we can estimate the parameters $\hat{\Theta}$ as follows:
$$\hat{\Theta} = \arg\min_{\Theta} \sum_{i=1}^{n} \sum_{j=1}^{q} \left( y_{i, t_j} - \mu(t_j, \Theta) \right)^2,$$
where $n$ and $q$ represent the number of individuals and the length of the functional dynamic target, respectively. The residual sum of squares (RSS) is minimized at $\hat{\Theta}$. The Akaike information criterion (AIC) can be used to select the appropriate mean function $\mu$ to fit the functional dynamic targets. We have
$$\mu^{*} = \arg\min_{\mu} \left( nq + nq \log(2\pi) + nq \log(\mathrm{RSS}/q) + 2 |\hat{\Theta}| \right).$$
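As a minimal illustration of this fitting and model-selection step, the sketch below estimates the parameters by nonlinear least squares and compares candidate mean functions with the AIC-type criterion above; it assumes the mean functions from the previous sketch, and all names are ours. Since the additive constants in the criterion are identical across candidates, they do not change which mean function is selected.

import numpy as np
from scipy.optimize import minimize

def fit_mean_function(mu, theta0, t, Y):
    # Nonlinear least squares: minimize the residual sum of squares over Theta.
    def rss(theta):
        resid = Y - mu(t, *theta)      # Y is an (n x q) array of curves, t a length-q time grid
        return np.sum(resid ** 2)
    res = minimize(rss, theta0, method="Nelder-Mead")
    return res.x, rss(res.x)

def aic_score(rss_value, n, q, n_params):
    # AIC-type criterion as written above; constant terms do not affect the arg-min over candidates.
    return n * q + n * q * np.log(2 * np.pi) + n * q * np.log(rss_value / q) + 2 * n_params

# Example usage: choose between Logistic and Double-Logistic for curves at q = 24 time points.
# t = np.arange(1, 25); Y = ...  # observed n x 24 curves
# _, rss_l = fit_mean_function(logistic, [1.0, 0.5, 12.0], t, Y)
# _, rss_d = fit_mean_function(double_logistic, [0.5, 0.5, 6.0, 0.5, 0.5, 18.0], t, Y)
# best = min([("Logistic", aic_score(rss_l, len(Y), 24, 3)),
#             ("Double-Logistic", aic_score(rss_d, len(Y), 24, 6))], key=lambda s: s[1])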

4. Variable Screening for Causal Discovery

For the set of potential factors X and the functional dynamic target Y , our task is to find the direct causes and all causes of Y up to a given distance. An intuitive method involves learning the causal graph G to find all causes of Y . Alternatively, we could first learn the causal graph G X and then identify all variables that have directed paths to Y . However, as mentioned in Section 1, this intuitive approach has three main drawbacks. To address these challenges, we propose a variable screening method to reduce the number of potential factors, and a hypothesis testing method to test for (conditional) independence between potential factors and Y . By integrating these methods with structural learning approaches, we have developed an algorithm capable of learning and identifying all causes of functional dynamic targets.
Let $X$ be a variable with $K$ levels. The variable $X$ is not independent of $\mathbf{Y}$ if there exist at least two values of $X$, say $X = x_1$ and $X = x_2$, such that the conditional distributions of $\mathbf{Y}$ given $X = x_1$ and $X = x_2$ are different. Conversely, if the conditional distribution of $\mathbf{Y}$ given $X = x$ remains unchanged for any $x$, then $X$ and $\mathbf{Y}$ are independent. Let $\Theta_x$ be the parameter of the mean function of the functional dynamic target when $X = x$. To ascertain whether the variable $X$ is not independent of $\mathbf{Y}$, we implement the following test:
$$H_0: \Theta_x = \Theta_{x'}, \quad \forall \, x, x' \in \{1, \ldots, K\}, \qquad (2)$$
$$H_1: \Theta_x \neq \Theta_{x'}, \quad \exists \, x, x' \in \{1, \ldots, K\}. \qquad (3)$$
Let $\mathbf{y}_i = (y_{i, t_1}, \ldots, y_{i, t_q})$ be the $i$th sample of the functional dynamic target with $X = x_i$. Under the null hypothesis, $\mathbf{y}_i$ is modeled as $\mathbf{y}_i = \tilde{\mu}(\Theta) + \tilde{\epsilon}$, whereas under the alternative hypothesis, it is modeled as $\mathbf{y}_i = \tilde{\mu}(\Theta_{x_i}) + \tilde{\epsilon}$. Let $\ln L(\mathbf{Y}, \Theta)$ denote the restricted log-likelihood of $\mathbf{Y}$ under $H_0$ and let $\ln L(\mathbf{Y}, \Theta_{H_1}) = \sum_{x=1}^{K} \ln L(\mathbf{Y}, \Theta_x)$ denote the unrestricted log-likelihood of $\mathbf{Y}$ under $H_1$. The likelihood ratio statistic is calculated as follows:
$$LR = -2 \left( \ln L(\mathbf{Y}, \Theta) - \ln L(\mathbf{Y}, \Theta_{H_1}) \right). \qquad (4)$$
Under certain regularity conditions, the statistic $LR$ approximately follows a $\chi^2$ distribution, whose degrees of freedom are determined by the difference in the numbers of parameters between $H_0$ and $H_1$, as specified in Equations (2) and (3).
Therefore, by applying the hypothesis test described in Equations (2) and (3) to each potential factor, we can identify all variables significantly associated with the dynamic target. We denote these significant variables by $\mathbf{X}_{sig} = \{X \in \mathbf{X} : X \not\perp \mathbf{Y}\}$. Indeed, since the mean function of the dynamic target depends on its direct causes, which in turn depend on indirect causes, the dynamic target ultimately depends on all its causes. Therefore, when $X$ is a cause of $\mathbf{Y}$, we can reject the null hypothesis in Equation (2), implying that $\mathbf{X}_{sig}$ includes all causes of the dynamic target, assuming no statistical errors. Hence, given a dynamic target $\mathbf{Y}$, we perform the hypothesis test of $H_0$ against $H_1$ defined in Equations (2) and (3) for each potential factor sequentially, obtaining the set $\mathbf{X}_{sig}$ and the corresponding p-values $\{X_{pv}\}_{X \in \mathbf{X}_{sig}}$, in which $X_{pv}$ is the p-value of the variable $X \in \mathbf{X}_{sig}$.
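The screening test can be sketched in Python as follows, reusing fit_mean_function from the earlier sketch and assuming i.i.d. Gaussian errors whose variance is profiled out within each model; this is only an illustration of Equations (2)–(4), and all names are ours.

import numpy as np
from scipy.stats import chi2

def gaussian_loglik(rss_value, n_obs):
    # Profiled Gaussian log-likelihood for n_obs residuals with residual sum of squares rss_value.
    sigma2 = rss_value / n_obs
    return -0.5 * n_obs * (np.log(2 * np.pi * sigma2) + 1.0)

def lr_screening_test(mu, theta0, t, Y, x):
    # H0: a single parameter vector for all levels of X; H1: one parameter vector per level of X.
    n, q = Y.shape
    _, rss0 = fit_mean_function(mu, theta0, t, Y)
    ll0 = gaussian_loglik(rss0, n * q)
    ll1, params1 = 0.0, 0
    for level in np.unique(x):
        Yk = Y[x == level]
        _, rssk = fit_mean_function(mu, theta0, t, Yk)
        ll1 += gaussian_loglik(rssk, Yk.size)
        params1 += len(theta0)
    lr = -2.0 * (ll0 - ll1)
    df = params1 - len(theta0)     # difference in the numbers of parameters between H1 and H0
    return lr, chi2.sf(lr, df)     # a small p-value keeps X in X_sig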
A causal graphical model, as described in Definition 1, necessitates adherence to the Markov conditions for variables and the functional dynamic target. Given the Markov condition and the faithfulness assumption, a natural approach to identifying the causes of the functional dynamic target involves learning the causal structure of X s i g and subsequently discerning the relationship between each variable X X s i g and Y . For significant variables X s i g , we present the following theorem, with its proof available in Appendix A.2:
Theorem 1. 
Suppose that ( G , P , Θ ) constitutes a causal graphical model for the functional dynamic target Y as defined in Definition 1, with the faithfulness assumption being satisfied. Let X s i g denote the set comprising all variables in X that are dependent on Y . Then, the following assertions hold:
1. $\mathbf{X}_{sig}$ consists of all causes of $\mathbf{Y}$ and the descendants of these causes, that is, $\mathbf{X}_{sig} = An(\mathbf{Y}, \mathcal{G}) \cup De(An(\mathbf{Y}, \mathcal{G}), \mathcal{G})$.
2. For any two variables $X_1, X_2 \in \mathbf{X}_{sig}$, if either $X_1$ or $X_2$ is a cause of $\mathbf{Y}$, then $X_1, X_2$ are not adjacent in $\mathcal{G}_{\mathbf{X}_{sig}}$ if and only if there exists a set $\mathbf{A} \subseteq \mathbf{X}_{sig}$ such that $X_1 \perp X_2 \mid \mathbf{A}$.
3. For any two variables $X_1, X_2 \in \mathbf{X}_{sig}$, if there exists a set $\mathbf{A} \subseteq \mathbf{X}_{sig}$ such that $X_1 \perp X_2 \mid \mathbf{A}$, then $X_1, X_2$ are not adjacent in $\mathcal{G}_{\mathbf{X}_{sig}}$.
The first result of Theorem 1 implies the soundness and rationality of the method for finding $\mathbf{X}_{sig}$ described above. The second result indicates that when at least one end of an edge is a cause of $\mathbf{Y}$, this edge can be accurately identified (in terms of its skeleton, not its direction) using any well-known structural learning method, such as the PC algorithm [28] or the GES algorithm [29]. Contrasting with the second result, the third specifies that for any pair of variables $X_1, X_2 \in \mathbf{X}_{sig}$, if a separation set exists in $\mathbf{X}_{sig}$ that blocks $X_1$ and $X_2$, then these variables are not adjacent in the true graph $\mathcal{G}$. However, the converse does not necessarily hold, because a confounder or common cause $X_3 \notin \mathbf{X}_{sig}$ can lead to an extraneous edge between $X_1$ and $X_2$ in the graph learned solely from data on $\mathbf{X}_{sig}$. To accommodate this, we denote the CPDAG learned from data on $\mathbf{X}_{sig}$ by $\widehat{\mathcal{G}}_{\mathbf{X}_{sig}}$, and the induced subgraph of the true CPDAG $\mathcal{G}^{*}$ over $\mathbf{X}_{sig}$ by $\mathcal{G}^{*}_{\mathbf{X}_{sig}}$. An illustrative example follows to elaborate on this explanation.
Example 1. 
In Figure 3, Figure 3a presents a true graph $\mathcal{G}_1$ defined over $\mathbf{X} = \{X_1, \ldots, X_5\}$ and $\mathbf{Y}$. Here, the set of significant variables is $\mathbf{X}_{sig} = \{X_1, X_2, X_3, X_5\}$, and $X_4$ is independent of $\mathbf{Y}$. Figure 3b illustrates the induced subgraph $\mathcal{G}^{*}_{1, \mathbf{X}_{sig}}$ of the CPDAG $\mathcal{G}^{*}_1$ over the set $\mathbf{X}_{sig}$, while Figure 3c displays the graph $\widehat{\mathcal{G}}_{1, \mathbf{X}_{sig}}$ learned through a structural learning method, such as the PC algorithm, applied to $\mathbf{X}_{sig}$. It should be noted that, in $\mathcal{G}_1$, $\{X_4\}$ is a separation set of $X_1$ and $X_2$, that is, $X_1 \perp X_2 \mid X_4$. However, since $X_4 \notin \mathbf{X}_{sig}$ and structural learning only utilizes data concerning $\mathbf{X}_{sig}$, no separation set exists for $X_1$ and $X_2$ within $\mathbf{X}_{sig}$. Consequently, $X_1$ and $X_2$ appear adjacent in the learned graph $\widehat{\mathcal{G}}_{1, \mathbf{X}_{sig}}$. Furthermore, given $X_2 \perp X_3$ and $X_2 \not\perp X_3 \mid X_1$, the structural learning method identifies a v-structure $X_2 \rightarrow X_1 \leftarrow X_3$. A similar process yields $X_1 \rightarrow X_2 \leftarrow X_5$. Therefore, a bidirected edge $X_1 \leftrightarrow X_2$ appears in the learned graph $\widehat{\mathcal{G}}_{1, \mathbf{X}_{sig}}$ but not in $\mathcal{G}^{*}_{1, \mathbf{X}_{sig}}$, as highlighted by the red edge in Figure 3c.
Similarly, Figure 3d presents a true graph $\mathcal{G}_2$ defined over $\mathbf{X} = \{X_1, \ldots, X_5\}$ and $\mathbf{Y}$. In this scenario, the set of significant variables is again $\mathbf{X}_{sig} = \{X_1, X_2, X_3, X_5\}$, with $X_4$ being independent of $\mathbf{Y}$. Figure 3e depicts the induced subgraph $\mathcal{G}^{*}_{2, \mathbf{X}_{sig}}$ of the CPDAG $\mathcal{G}^{*}_2$ over $\mathbf{X}_{sig}$, while Figure 3f illustrates the graph $\widehat{\mathcal{G}}_{2, \mathbf{X}_{sig}}$ learned through a structural learning method, such as the PC algorithm, applied to $\mathbf{X}_{sig}$. In $\mathcal{G}_2$, the set $\{X_1, X_4\}$ acts as a separation set between $X_2$ and $X_3$, indicating $X_2 \perp X_3 \mid \{X_1, X_4\}$. However, with $X_4 \notin \mathbf{X}_{sig}$ and structural learning relying solely on data concerning $\mathbf{X}_{sig}$, a separation set for $X_2$ and $X_3$ within $\mathbf{X}_{sig}$ no longer exists. As a result, $X_2$ and $X_3$ appear adjacent in the learned graph $\widehat{\mathcal{G}}_{2, \mathbf{X}_{sig}}$. Furthermore, given $X_3 \perp X_5$ and $X_3 \not\perp X_5 \mid X_2$, the structural learning method identifies a v-structure $X_3 \rightarrow X_2 \leftarrow X_5$. Therefore, a directed edge $X_3 \rightarrow X_2$ is present in the learned graph $\widehat{\mathcal{G}}_{2, \mathbf{X}_{sig}}$ but not in $\mathcal{G}^{*}_{2, \mathbf{X}_{sig}}$, as highlighted by the red edge in Figure 3f.
Example 1 illustrates two scenarios in which the learned graph $\widehat{\mathcal{G}}_{\mathbf{X}_{sig}}$ might include false positive edges that do not exist in $\mathcal{G}^{*}_{\mathbf{X}_{sig}}$. Importantly, these additional false edges cannot appear between two causes of $\mathbf{Y}$. Instead, they may occur between the causes and noncauses of $\mathbf{Y}$, or exclusively among the noncauses of $\mathbf{Y}$, as delineated in Theorem 1. The complete result is given by Proposition A1 in Appendix A.2. Indeed, a more profound inference can be drawn: the presence of extra edges does not compromise the structural integrity concerning the causes of $\mathbf{Y}$, affecting neither the skeleton nor the orientation.
Theorem 2. 
The edges in $E_s(\widehat{\mathcal{G}}_{\mathbf{X}_{sig}}) \setminus E_s(\mathcal{G}^{*}_{\mathbf{X}_{sig}})$, if any exist, do not affect the skeleton or the orientation of edges among the ancestors of $\mathbf{Y}$ in $\mathcal{G}$. Furthermore, we have $An(\mathbf{Y}, \widehat{\mathcal{G}}_{\mathbf{X}_{sig} \cup \mathbf{Y}}) = An(\mathbf{Y}, \mathcal{G}^{*}_{\mathbf{X}_{sig} \cup \mathbf{Y}})$ and $PossAn(\mathbf{Y}, \mathcal{G}^{*}_{\mathbf{X}_{sig} \cup \mathbf{Y}}) \subseteq PossAn(\mathbf{Y}, \widehat{\mathcal{G}}_{\mathbf{X}_{sig} \cup \mathbf{Y}})$, where $\widehat{\mathcal{G}}_{\mathbf{X}_{sig} \cup \mathbf{Y}}$ and $\mathcal{G}^{*}_{\mathbf{X}_{sig} \cup \mathbf{Y}}$ are the graphs obtained by adding a node $\mathbf{Y}$ and directed edges from $\mathbf{Y}$'s direct causes to $\mathbf{Y}$ in the graphs $\widehat{\mathcal{G}}_{\mathbf{X}_{sig}}$ and $\mathcal{G}^{*}_{\mathbf{X}_{sig}}$, respectively.
According to Theorem 2, although the graph $\widehat{\mathcal{G}}_{\mathbf{X}_{sig}}$ obtained through structural learning does not exactly match the induced subgraph $\mathcal{G}^{*}_{\mathbf{X}_{sig}}$ of the true CPDAG $\mathcal{G}^{*}$, the causes of the functional dynamic target $\mathbf{Y}$ in these two graphs are identical, including the structure among these causes. Thus, in terms of identifying the causes, the two graphs can be considered equivalent. Furthermore, Theorem 2 indicates that all possible ancestors of $\mathbf{Y}$ in $\mathcal{G}^{*}_{\mathbf{X}_{sig} \cup \mathbf{Y}}$ are also possible ancestors of $\mathbf{Y}$ in $\widehat{\mathcal{G}}_{\mathbf{X}_{sig} \cup \mathbf{Y}}$, though the converse may not hold. The detailed proof is available in Appendix A.2.
Example 2. 
The true graph $\mathcal{G}$ is given by Figure 4a, and the corresponding CPDAG $\mathcal{G}^{*}$ is the graph itself, that is, $\mathcal{G}^{*} = \mathcal{G}$. In this case, the set of significant variables is $\mathbf{X}_{sig} = \{X_1, X_2, X_4\}$. Figure 4b is the induced graph $\mathcal{G}^{*}_{\mathbf{X}_{sig} \cup \mathbf{Y}}$ of $\mathcal{G}$ ($\mathcal{G}^{*}$) over $\mathbf{X}_{sig}$, and Figure 4c is the CPDAG $\widehat{\mathcal{G}}_{\mathbf{X}_{sig} \cup \mathbf{Y}}$ obtained by using the structural learning method on $\mathbf{X}_{sig}$. Then, we have $An(\mathbf{Y}, \widehat{\mathcal{G}}_{\mathbf{X}_{sig} \cup \mathbf{Y}}) = An(\mathbf{Y}, \mathcal{G}^{*}_{\mathbf{X}_{sig} \cup \mathbf{Y}}) = \{X_1\}$, while $X_2$ and $X_4$ are possible ancestors of $\mathbf{Y}$ in $\widehat{\mathcal{G}}_{\mathbf{X}_{sig} \cup \mathbf{Y}}$ but not in $\mathcal{G}^{*}_{\mathbf{X}_{sig} \cup \mathbf{Y}}$, so the converse inclusion in Theorem 2 fails here.
According to the causal graphical model in Definition 1 and the faithfulness assumption, $\mathbf{Y}$ is the sum of the mean function and an independent noise, and the mean function is a deterministic function of $\mathbf{Y}$'s direct causes. Therefore, any nondescendant $X$ of $\mathbf{Y}$ is independent of $\mathbf{Y}$ given the direct causes $Pa(\mathbf{Y}, \mathcal{G})$ of $\mathbf{Y}$. Conversely, for any $X \in \mathbf{X}_{sig}$, $X$ is a direct cause of $\mathbf{Y}$ if and only if there is no subset $\mathbf{A} \subseteq Adj(X, \mathcal{G})$ such that $X \perp \mathbf{Y} \mid \mathbf{A}$.
Let $\mathbf{A}$ be a subset of $\mathbf{X}_{sig}$. For any $X \in \mathbf{X}_{sig}$ with $X \notin \mathbf{A}$, to test the conditional independence $X \perp \mathbf{Y} \mid \mathbf{A}$, consider the following test:
$$H_0: \Theta_{\mathbf{A} = a, X = x} = \Theta_{\mathbf{A} = a, X = x'}, \quad \forall \, a, x, x', \qquad (5)$$
$$H_1: \Theta_{\mathbf{A} = a, X = x} \neq \Theta_{\mathbf{A} = a, X = x'}, \quad \exists \, a, x, x'. \qquad (6)$$
Under the null hypothesis, the parameter depends only on the value of the set $\mathbf{A}$ and can be denoted by $\Theta_{\mathbf{A}}$, while under the alternative hypothesis, the parameter is determined by the values of both $\mathbf{A}$ and $X$ and can be denoted by $\Theta_{\mathbf{A}, X}$. Let $\ln L(\mathbf{Y}, \Theta_{\mathbf{A}})$ be the log-likelihood of $\mathbf{Y}$ under $H_0$, and $\ln L(\mathbf{Y}, \Theta_{\mathbf{A}, X})$ be the log-likelihood of $\mathbf{Y}$ under $H_1$. The likelihood ratio statistic is
$$LR = -2 \left( \ln L(\mathbf{Y}, \Theta_{\mathbf{A}}) - \ln L(\mathbf{Y}, \Theta_{\mathbf{A}, X}) \right).$$
Under certain regularity conditions, the statistic $LR$ approximately follows a $\chi^2$ distribution, with degrees of freedom equal to $|\Theta_{\mathbf{A}, X}| - |\Theta_{\mathbf{A}}|$.
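The conditional test reuses exactly the same machinery: fit one curve per configuration of $\mathbf{A}$ under $H_0$ and one per configuration of $(\mathbf{A}, X)$ under $H_1$. A hedged sketch, building on gaussian_loglik and fit_mean_function from the earlier sketches (all names are ours), is:

import numpy as np
from scipy.stats import chi2

def lr_conditional_test(mu, theta0, t, Y, x, A):
    # Tests Y independent of X given A by comparing per-group curve fits; A is an (n x |A|) array of levels.
    def grouped_loglik(labels):
        ll, n_params = 0.0, 0
        for g in np.unique(labels):
            Yg = Y[labels == g]
            _, rss_g = fit_mean_function(mu, theta0, t, Yg)
            ll += gaussian_loglik(rss_g, Yg.size)
            n_params += len(theta0)
        return ll, n_params

    A = np.atleast_2d(A).reshape(len(x), -1)
    a_labels = np.array(["|".join(map(str, row)) for row in A])         # configurations of A
    ax_labels = np.array([f"{a}#{xi}" for a, xi in zip(a_labels, x)])   # configurations of (A, X)
    ll0, k0 = grouped_loglik(a_labels)
    ll1, k1 = grouped_loglik(ax_labels)
    lr = -2.0 * (ll0 - ll1)
    return lr, chi2.sf(lr, k1 - k0)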
Based on the above results, we propose a screening and structural learning-based algorithm to identify the causes of the functional dynamic target Y , as detailed in Algorithm 1.
In Algorithm 1, the initial step involves learning the structure over $\mathbf{X}_{sig}$ from the data related to $\mathbf{X}_{sig}$ through a structural learning method, detailed in Lines 1–6. The notation $X \overset{*}{\rightarrow} Y$ in Lines 3–4 signifies that the connection between $X$ and $Y$ could be either $X \rightarrow Y$ or $X \leftrightarrow Y$. We first learn the skeleton of $\mathbf{X}_{sig}$ following the same procedure as the PC algorithm (Line 1), with the details in Appendix A.1. Nevertheless, due to the potential occurrence of bidirected edges, adjustments are made when identifying v-structures (Lines 2–5), and all bidirected edges are eliminated afterwards (Line 6). According to Theorem 1, these bidirected edges, which are removed directly, are present only between causative and noncausative variables or among noncausative variables of the functional dynamic target. Since these variable pairs are inherently (conditionally) independent, removing such edges does not compromise the (conditional) independence relationships among the remaining variables, as shown in Theorem 2 and Example 1. Subsequently, we initialize the set of direct causes as $\mathbf{DC} := \mathbf{X}_{sig}$ and sort these variables in ascending order of their associations with $\mathbf{Y}$ (Lines 7–8). This is because variables with weaker associations are less likely to be direct causes of $\mathbf{Y}$; placing them at the beginning of the sequence allows the subsequent conditional independence tests to exclude non-direct-cause variables quickly, thereby enhancing the algorithm's efficiency, simplifying its complexity, and reducing the required number of conditional independence tests. Next, we add directed edges from all vertices in $\mathbf{X}_{sig}$ to $\mathbf{Y}$ (Line 9) to construct the graph $\widehat{\mathcal{G}}_{\mathbf{X}_{sig} \cup \mathbf{Y}}$. For each directed edge, say $X \rightarrow \mathbf{Y}$, we check the conditional independence of $X$ and $\mathbf{Y}$ given a subset $\mathbf{A}_X$ of $\mathbf{DC}$ (Lines 12–14). In seeking the separation set $\mathbf{A}_X$, the search starts with single-element sets, progressing to sets of two elements, and so forth. Upon identifying a separation set, the vertex and the corresponding directed edge are removed from $\mathbf{DC}$ and $\widehat{\mathcal{G}}_{\mathbf{X}_{sig} \cup \mathbf{Y}}$, respectively (Lines 15–17). Lastly, if the separation set size $k$ surpasses the size of $\mathbf{DC}$, implying that no conditional independence of $X$ and $\mathbf{Y}$ can be found given any subset of $\mathbf{DC}$, the directed edge $X \rightarrow \mathbf{Y}$ remains in $\widehat{\mathcal{G}}_{\mathbf{X}_{sig} \cup \mathbf{Y}}$.
Algorithm 1 SSL: Screening and structural learning-based algorithm
Require: $\mathbf{X}_{sig}$ and their corresponding p-values $\{X_{pv}\}_{X \in \mathbf{X}_{sig}}$, data sets about $\mathbf{X}_{sig}$ and $\mathbf{Y}$.
Ensure: Causes of $\mathbf{Y}$.
1: Learn the skeleton $\mathcal{G}_s(\mathbf{X}_{sig})$ of the CPDAG defined on $\mathbf{X}_{sig}$ and obtain the corresponding separation sets $\mathcal{S}$ based on the data set related to $\mathbf{X}_{sig}$ via Algorithm A1 in Appendix A.1;
2: repeat
3:   Find a structure $X - Y - Z$ satisfying $X \notin Adj(Z, \mathcal{G})$ in the graph $\mathcal{G}_s(\mathbf{X}_{sig})$;
4:   If $Y \notin \mathcal{S}(X, Z)$, then orient it as $X \overset{*}{\rightarrow} Y \overset{*}{\leftarrow} Z$;
5: until all structures $X - Y - Z$ with $X \notin Adj(Z, \mathcal{G})$ in $\mathcal{G}_s(\mathbf{X}_{sig})$ have been tested;
6: Construct the CPDAG $\widehat{\mathcal{G}}_{\mathbf{X}_{sig}}$ by deleting all bidirected edges and using Meek's rules to orient as many undirected edges as possible in the graph $\mathcal{G}_s(\mathbf{X}_{sig})$;
7: Let $\mathbf{DC} := \mathbf{X}_{sig}$;
8: Sort $\mathbf{DC}$ in ascending order of associations with $\mathbf{Y}$ using $\{X_{pv}\}_{X \in \mathbf{X}_{sig}}$;
9: Let $\widehat{\mathcal{G}}_{\mathbf{X}_{sig} \cup \mathbf{Y}}$ be the graph obtained by adding a node $\mathbf{Y}$ to the graph $\widehat{\mathcal{G}}_{\mathbf{X}_{sig}}$, and for each $X \in \mathbf{X}_{sig}$, add a directed edge $X \rightarrow \mathbf{Y}$ to the graph $\widehat{\mathcal{G}}_{\mathbf{X}_{sig} \cup \mathbf{Y}}$;
10: Set $k := 1$;
11: while $k < |\mathbf{DC}|$ do
12:   for each vertex $X \in \mathbf{DC}$ do
13:     for each subset $\mathbf{A}_X$ of $\mathbf{DC} \setminus \{X\}$ with $k$ vertices do
14:       Test the conditional independence $\mathbf{Y} \perp X \mid \mathbf{A}_X$ using Equations (5) and (6);
15:       if $\mathbf{Y} \perp X \mid \mathbf{A}_X$ then
16:         Delete the directed edge $X \rightarrow \mathbf{Y}$ in the graph $\widehat{\mathcal{G}}_{\mathbf{X}_{sig} \cup \mathbf{Y}}$;
17:         Let $\mathbf{DC} := \mathbf{DC} \setminus \{X\}$;
18:       end if
19:     end for
20:   end for
21:   $k := k + 1$;
22: end while
23: return $\widehat{\mathcal{G}}_{\mathbf{X}_{sig} \cup \mathbf{Y}}$.
According to Theorem 1 and the discussion after Example 2, $\mathbf{DC}$ is the set of all direct causes of $\mathbf{Y}$ if all assumptions in Theorem 1 hold and all statistical tests are correct. Further, according to Theorem 2, all ancestors of $\mathbf{Y}$ can be obtained from the graph $\widehat{\mathcal{G}}_{\mathbf{X}_{sig} \cup \mathbf{Y}}$. Therefore, Algorithm 1 can learn all the causes of $\mathbf{Y}$ correctly.
Note that in Algorithm 1, we first traverse the sizes of the separation set (Line 11) and then, for each given size, traverse all variables in the $\mathbf{DC}$ set and all possible separation sets of that size (Lines 12 and 13) to test the conditional independence of each variable and $\mathbf{Y}$. That is, we first fix the size of the separation set to 1 and traverse all variables; after all variables have been traversed once, we increase the size of the separation set to 2 and traverse all variables again. The advantage of this arrangement is that it quickly removes the nondirect causes of $\mathbf{Y}$ and shrinks the $\mathbf{DC}$ set, thereby reducing the number of conditional independence tests and improving their accuracy. Furthermore, the reason why we directly add directed edges from the variables in $\mathbf{DC}$ to $\mathbf{Y}$ in the graph $\widehat{\mathcal{G}}_{\mathbf{X}_{sig} \cup \mathbf{Y}}$ (Line 9) is that we assume the descendant set of $\mathbf{Y}$ is empty, as shown in Definition 1; in this case, $\mathbf{Y}$'s adjacent set is exactly the set of direct causes we are looking for. Without this assumption, it would be necessary to examine the variables in $\mathbf{Y}$'s adjacent set and distinguish the parents from the children.
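To make the pruning stage of Algorithm 1 (Lines 10–22) concrete, here is a hedged Python sketch; ci_test stands for any conditional independence test returning a p-value for $\mathbf{Y} \perp X \mid \mathbf{A}$ (for instance lr_conditional_test above), and all other names are ours.

from itertools import combinations

def prune_direct_causes(X_sig_sorted, ci_test, alpha=0.05):
    # X_sig_sorted: candidate variables in ascending order of association with Y.
    # Enlarge the conditioning-set size k step by step and drop every X found independent of Y.
    DC = list(X_sig_sorted)
    k = 1
    while k < len(DC):
        for X in list(DC):
            if X not in DC:
                continue
            for A in combinations([V for V in DC if V != X], k):
                if ci_test(X, A) > alpha:   # fail to reject independence of Y and X given A
                    DC.remove(X)            # X is separated from Y, so it is not a direct cause
                    break
        k += 1
    return DC                               # remaining variables are the learned direct causes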

5. A Screening-Based and Local Algorithm

Based on the previous results and discussions, we can conclude that Algorithm 1 is capable of correctly identifying the causes of a functional dynamic target. However, Algorithm 1 requires recovering the complete causal structure of X s i g and Y . As analyzed in Section 1, learning the complete structure is unnecessary for identifying the causes of the target. Furthermore, Algorithm 1 may be influenced by the distance effect, whereby the correlation between a cause and the target may diminish from the data when the path from the cause to the target is too lengthy. Consequently, identifying this cause variable through observational data becomes challenging, potentially leading to missed causes. Therefore, we propose a screening-based and local approach to address these challenges.
In this section, we introduce a three-stage approach to learn the causes of functional dynamic targets. Initially, utilizing the causal graphical model, we apply a hypothesis testing method to screen variables, identifying factors significantly correlated with the target. Subsequently, we employ a constraint-based method to find the direct causes of the target from these significant variables. Lastly, we present a local learning method to discover the causes of these direct causes within any specified distance. We begin with the introduction of a screening-based algorithm that can learn the direct causes of Y , as shown in Algorithm 2.
In Algorithm 2, we initially set the set of direct causes $\mathbf{DC} := \mathbf{X}_{sig}$ and arrange these variables in ascending order of their associations with $\mathbf{Y}$ (Lines 1–2), as in Algorithm 1. We introduce a set $\mathbf{N}_X$ to contain variables determined not to belong to $X$'s separation set, initialized as an empty set (Line 3). We then check the conditional independence of each variable $X \in \mathbf{DC}$ with $\mathbf{Y}$. During the search for the separation set $\mathbf{A}_X$, the candidate collection $\mathcal{A}$ consists of all subsets of $\mathbf{DC} \setminus (\{X\} \cup \mathbf{N}_X)$ with $k$ variables and is arranged roughly in descending order of their associations with $\mathbf{Y}$ (Lines 7–8). This is because variables with a stronger association with $\mathbf{Y}$ are more likely to be direct causes and are also more likely to belong to the separation sets of other variables; placing them at the beginning of the order helps to find the separation sets of nondirect causes quickly and remove those variables from $\mathbf{DC}$, which reduces the number of conditional independence tests and accelerates the algorithm. Once we find the separation set $\mathbf{A}_X$ for $X$ and $\mathbf{Y}$, we remove $X$ from $\mathbf{DC}$ and add $X$ to $\mathbf{N}_V$ for each $V \in \mathbf{A}_X$ (Lines 11–13). This is because when $\mathbf{A}_X$ is the separation set of $X$ and $\mathbf{Y}$, the variables in $\mathbf{A}_X$ appear on the paths from $X$ to $\mathbf{Y}$; consequently, $X$ should not be in the separation set of the variables in $\mathbf{A}_X$ with respect to $\mathbf{Y}$. Compared with Algorithm 1, introducing $\mathbf{N}_X$ in Algorithm 2 improves efficiency and speed: while Algorithm 1 requires examining every subset of $\mathbf{DC} \setminus \{X\}$ (Line 13 in Algorithm 1), Algorithm 2 only needs to evaluate subsets of $\mathbf{DC} \setminus (\{X\} \cup \mathbf{N}_X)$ (Line 7 in Algorithm 2). The theoretical validation of Algorithm 2's correctness is presented below.
Algorithm 2 Screening-based algorithm for learning direct causes of $\mathbf{Y}$
Require: $\mathbf{X}_{sig}$ and their corresponding p-values $\{X_{pv}\}_{X \in \mathbf{X}_{sig}}$, data sets about $\mathbf{X}_{sig}$ and $\mathbf{Y}$.
Ensure: Direct causes of $\mathbf{Y}$.
1: Let $\mathbf{DC} := \mathbf{X}_{sig}$;
2: Sort $\mathbf{DC}$ in ascending order of associations with $\mathbf{Y}$ using $\{X_{pv}\}_{X \in \mathbf{X}_{sig}}$;
3: Let $\mathbf{N}_X := \emptyset$ for each $X \in \mathbf{DC}$;
4: Set $k := 1$;
5: while $k < |\mathbf{DC}|$ do
6:   for each vertex $X$ in $\mathbf{DC}$ do
7:     Let $\mathcal{A}$ be the set of all subsets of $\mathbf{DC} \setminus (\{X\} \cup \mathbf{N}_X)$ with $k$ variables;
8:     Sort $\mathcal{A}$ approximately in descending order of associations with $\mathbf{Y}$;
9:     for each $\mathbf{A}_X \in \mathcal{A}$ do
10:       Test the conditional independence $\mathbf{Y} \perp X \mid \mathbf{A}_X$ using Equations (5) and (6);
11:       if $\mathbf{Y} \perp X \mid \mathbf{A}_X$ then
12:         Set $\mathbf{DC} := \mathbf{DC} \setminus \{X\}$;
13:         Add $X$ to $\mathbf{N}_V$ for each $V \in \mathbf{A}_X$;
14:         break
15:       end if
16:     end for
17:   end for
18:   $k := k + 1$;
19: end while
20: return $\mathbf{DC}$.
Theorem 3. 
If all assumptions in Theorem 1 hold, and there are no errors in the independence tests, then Algorithm 2 can correctly identify all direct causes of Y .
Next, we aim to identify all causes of Y within a specified distance. One natural method is to recursively apply Algorithm 2, starting with Y ’s direct causes and then expanding to their direct causes. This process continues until all causes within the set distance are found. However, this method’s effectiveness for Y relies on the assumption that Y has no descendants, making its adjacent set its parent set. This is not the case for other variables. Thus, we must further analyze and distinguish variables in the adjacent set of other variables. Consequently, we introduce the LPC algorithm in Algorithm 3.
Algorithm 3 $LPC(T, \mathbf{U})$ algorithm
Require: a target node $T$, a data set over variables $\mathbf{X}$, a non-PC set $\mathbf{U}$.
Ensure: the PCD set of $T$ and a set $\mathcal{S}$ containing all separation relations.
1: Set $\mathbf{PCD} := \{X : X \in \mathbf{X} \setminus \mathbf{U} \text{ and } T \not\perp X\}$; $k := 1$; $\mathcal{S} := \emptyset$;
2: while $k < |\mathbf{PCD}|$ do
3:   for each vertex $X \in \mathbf{PCD}$ do
4:     if there exists $\mathbf{A}_X \subseteq \mathbf{PCD} \setminus \{X\}$ such that $|\mathbf{A}_X| = k$ and $T \perp X \mid \mathbf{A}_X$ then
5:       Set $\mathbf{PCD} := \mathbf{PCD} \setminus \{X\}$ and add the tuple $(T, X, \mathbf{A}_X)$ to $\mathcal{S}$;
6:     end if
7:   end for
8:   $k := k + 1$;
9: end while
10: return $\mathbf{PCD}$ and $\mathcal{S}$.
Algorithm 3 aims to learn the local structure of a given target variable T, but in fact, the final P C D set includes T’s Parents, Children, and Descendants. This is because when verifying the conditional independence (Line 4), we remove some nonadjacent variables of T in advance (Line 1), resulting in some descendant variables being unable to find the corresponding separation set.
Example 3. 
In Figure 5, let $T = X_1$ and $\mathbf{U} = \emptyset$. Since $X_1 \perp X_4$, we initially have $\mathbf{PCD} = \{X_2, X_3\}$ (Line 1 in Algorithm 3). Note that the conditional independence relationship $X_1 \perp X_3 \mid (X_2, X_4)$ holds in the graph, but since the vertex $X_4$ has been removed in advance, there is no longer a separation set of $X_1$ and $X_3$ within $\mathbf{PCD}$. Therefore, $X_3$ cannot be removed from $\mathbf{PCD}$, and the output is $\mathbf{PCD}_{X_1} = \{X_2, X_3\}$; that is, $X_3$, which is a descendant of $X_1$ but not a child of $X_1$, is included in $\mathbf{PCD}_{X_1}$.
Example 3 illustrates that there may indeed be some non-child descendants of the target variable in the $\mathbf{PCD}$ set obtained by Algorithm 3. Below, we show that one can identify these non-child descendants by repeatedly applying Algorithm 3. For example, in Example 3, the $\mathbf{PCD}$ set of $X_1$ is $\mathbf{PCD}_{X_1} = \{X_2, X_3\}$. Applying Algorithm 3 to $X_3$, we find that the $\mathbf{PCD}$ set of $X_3$ is $\mathbf{PCD}_{X_3} = \{X_2, X_4\}$. Since $X_1$ is not in $\mathbf{PCD}_{X_3}$, we can conclude that $X_3$ is a non-child descendant of $X_1$; otherwise, $X_1$ would have to be in $\mathbf{PCD}_{X_3}$. Through this symmetry check, we can delete the non-child descendants from the $\mathbf{PCD}$ set, so that it contains only the parents and children of the target variable. Based on this idea, we propose a step-by-step algorithm to learn all causes of a functional dynamic target locally, as shown in Algorithm 4; a small code sketch of the symmetry check is given first.
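A minimal sketch of this symmetry check is given below; lpc stands for any routine in the spirit of Algorithm 3 that returns the PCD set of a node given a non-PC set, and the names are ours.

def parents_children_by_symmetry(target, lpc):
    # Keep V in PC(target) only if the check is symmetric, i.e., target also lies in PCD(V).
    pcd_target, _ = lpc(target, set())
    pc_target = set()
    for V in pcd_target:
        pcd_v, _ = lpc(V, set())
        if target in pcd_v:
            pc_target.add(V)      # V is a genuine parent or child of target
        # otherwise V is a non-child descendant of target and is discarded
    return pc_target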
Algorithm 4 PC-by-PC: Finding all causes of target $\mathbf{T}$ within a given distance
Require: a target set $\mathbf{T} \subseteq \mathbf{V} = \mathbf{X} \cup \mathbf{Y}$, a data set over $\mathbf{V}$, and the maximum distance $m$.
Ensure: all causes of $\mathbf{T}$ with distance up to $m$.
1: Set $n := 1$, $n' := 0$, $\mathbf{CanC} := \mathbf{T}$;
2: Initialize the graph $\mathcal{G}$ with directed edges from each vertex in $\mathbf{T}$ to an auxiliary node $L$;
3: repeat
4:   Set $X = \mathbf{CanC}_n$;
5:   Let $\mathbf{U} = \{V : X \notin \mathbf{PC}_V, V \in \mathbf{CanC}_{1:n-1}\}$;
6:   Obtain the PCD set and the separation set $(\mathbf{PC}_X, \mathcal{S}_X) = LPC(X, \mathbf{U})$;
7:   for each $V \in \mathbf{PC}_X \cap \mathbf{CanC}_{1:n-1}$ do
8:     if $X \in \mathbf{PC}_V$ then
9:       Add an undirected edge $X - V$ to the graph $\mathcal{G}$;
10:     else
11:       $\mathbf{PC}_X := \mathbf{PC}_X \setminus \{V\}$, $\mathbf{PC}_V := \mathbf{PC}_V \setminus \{X\}$;
12:     end if
13:   end for
14:   Update $\mathcal{G}$ by modifying structures like $V_1 - X - V_2$, $X - V_1 - V_2$, and $X - V_1 \leftarrow V_2$ to $V_1 \rightarrow X \leftarrow V_2$, $X \rightarrow V_1 \leftarrow V_2$, and $X \rightarrow V_1 \leftarrow V_2$, respectively, if the middle vertex is not in the separation set of the two end vertices;
15:   if $X$ is the last vertex of $\mathbf{CanC}$ then
16:     Update $\mathcal{G}$ by orienting as many undirected edges as possible via Meek's rules;
17:     for each $V \in \mathbf{CanC}_{n':n}$ do
18:       Add $\mathbf{PC}_V \setminus \mathbf{CanC}$ to the end of $\mathbf{CanC}$ if $|Path(V, L)| < m$ or the $m$-th edge closest to $L$ on $Path(V, L)$ is undirected;
19:     end for
20:   end if
21:   $n' := n$, $n := n + 1$;
22: until $X$ is the last vertex of $\mathbf{CanC}$;
23: return $\mathcal{G}$ and $\mathcal{S}$.
In Algorithm 4, $\mathbf{CanC}_n$ represents the $n$-th variable in the set $\mathbf{CanC}$, and $\mathbf{CanC}_{1:n-1}$ represents the first through $(n-1)$-th variables in $\mathbf{CanC}$. $Path(V, L)$ denotes the shortest path from $V$ to $L$ in the graph $\mathcal{G}$; there are many methods to compute shortest paths, such as Dijkstra's algorithm [30]. Algorithm 4 uses the symmetric validation method mentioned above to remove descendants from the $\mathbf{PCD}$ set (Lines 7–13), and hence, we directly write the $\mathbf{PCD}$ set as the $\mathbf{PC}$ set (Line 6). When our task is to learn all causes of a functional dynamic target $\mathbf{Y}$, the target set $\mathbf{T}$ given as the algorithm input is the set of all direct causes of $\mathbf{Y}$, which can be obtained by Algorithm 2, and the auxiliary node $L$ is exactly the functional dynamic target $\mathbf{Y}$ (Line 2). In fact, we can prove this theoretically, as shown below.
Theorem 4. 
If the faithfulness assumption holds and all independence tests are correct, then Algorithm 4 can correctly learn all causes of the input target set $\mathbf{T}$ within a given distance $m$. Further, if all assumptions in Theorem 1 hold, $\mathbf{T}$ is the set of direct causes of the functional dynamic target $\mathbf{Y}$, and the auxiliary node $L$ in Algorithm 4 is $\mathbf{Y}$, then Algorithm 4 can correctly learn all causes of $\mathbf{Y}$ within a given distance $m + 1$.
Note that the above algorithm gradually spreads outward from the direct causes of Y , and at each step, the newly added nodes are all in the P C set of previous nodes (Line 18), which only involves the local structure of all causes of Y , greatly improving the efficiency and accuracy of the algorithm. Moreover, Algorithm 4 identifies the shortest path between each cause variable and Y . When the m-th edge on one path from Y cannot be oriented, it only continues to expand from that path, instead of expanding all paths (Line 18 in Algorithm 4), which simplifies the algorithm and reduces the learning of redundant structures.

6. Experiments

In this section, we compare the effectiveness of different methods for learning the direct causes and all causes of a functional dynamic target through simulation experiments. As mentioned before, to our knowledge, existing structural learning algorithms lack the specificity needed to identify causes of functional dynamic targets, so we only compare the methods we proposed, which are as follows:
  • SSL algorithm: The screening and structural learning-based algorithm given in Algorithm 1, which can learn both direct and all causes of a dynamic target simultaneously;
  • S-Local algorithm: First, use the screening-based algorithm given in Algorithm 2, which can learn direct causes of a functional dynamic target, and then use the PC-by-PC algorithm given in Algorithm 4, which can learn all causes of a functional dynamic target.
In fact, our proposed SSL algorithm integrates elements of the screening method with those of traditional constraint-based structural learning techniques, as depicted in Algorithm 1. In its initial phase, the SSL algorithm is a modified version of the PC algorithm, extending its capabilities to effectively handle bidirectional edges introduced by the screening process. This extension of the PC algorithm, tailored to address the causes of the dynamic target, positions the SSL algorithm as a strong candidate for a benchmark.
In this simulation experiment, we randomly generate a causal graph $\mathcal{G}$ consisting of a dynamic target $\mathbf{Y}$ and $p = (15, 100, 1000, 10{,}000)$ potential factors. Additionally, we randomly select 1 to 2 variables from these potential factors to serve as direct causes of $\mathbf{Y}$. The potential factors are all discrete with finite levels, while the functional dynamic target $\mathbf{Y} = (Y_1, \ldots, Y_{24})$ is a continuous vector, and its mean function is a Double-Logistic function, that is,
$$Y_t = \mu_t^{Pa(\mathbf{Y}, \mathcal{G})} + \epsilon_t, \quad t = 1, \ldots, 24,$$
where
$$\mu_t^{Pa(\mathbf{Y}, \mathcal{G})} = \frac{a_1^{Pa(\mathbf{Y}, \mathcal{G})}}{1 + \exp\!\left(-r_1^{Pa(\mathbf{Y}, \mathcal{G})}\left(t - c_1^{Pa(\mathbf{Y}, \mathcal{G})}\right)\right)} + \frac{a_2^{Pa(\mathbf{Y}, \mathcal{G})}}{1 + \exp\!\left(-r_2^{Pa(\mathbf{Y}, \mathcal{G})}\left(t - c_2^{Pa(\mathbf{Y}, \mathcal{G})}\right)\right)},$$
and $\epsilon_t = \epsilon_{t-1} + \varepsilon_t$, $\varepsilon_t \sim N(0, 0.02^2)$. The $Pa(\mathbf{Y}, \mathcal{G})$ attached to the parameters in the above equations indicates that the parameters are affected by the direct causes of $\mathbf{Y}$. For each causal graph $\mathcal{G}$, we randomly generate the corresponding causal mechanism, that is, the marginal and conditional distributions of the potential factors and the functional dynamic target, and generate the simulation data from it. We use different sample sizes $n = (50, 100, 200, 500, 1000)$ and repeat the experiment 100 times for each sample size. In addition, we adopt adaptive significance levels in the experiment, because as the number of potential factors increases, the strength of screening also increases. In other words, as the number of potential factors $p$ increases, the significance level $\alpha$ of the (conditional) independence test decreases. For example, $\alpha$ is 0.05 when $p = 100$, while $\alpha$ is 0.0005 when $p = 10{,}000$.
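For reference, a hedged Python sketch of the data-generating mechanism for $\mathbf{Y}$ described above is given below; param_table, which maps each configuration of the direct causes to a Double-Logistic parameter vector, is an assumption of this sketch and not part of the paper.

import numpy as np

def simulate_target(parent_values, param_table, T=24, sigma=0.02, rng=np.random.default_rng()):
    # Double-Logistic mean whose parameters depend on the configuration of Pa(Y, G),
    # plus a random-walk error with N(0, sigma^2) increments.
    a1, r1, c1, a2, r2, c2 = param_table[tuple(parent_values)]
    t = np.arange(1, T + 1)
    mu = a1 / (1 + np.exp(-r1 * (t - c1))) + a2 / (1 + np.exp(-r2 * (t - c2)))
    eps = np.cumsum(rng.normal(0.0, sigma, size=T))   # epsilon_t = epsilon_{t-1} + N(0, sigma^2)
    return mu + eps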
To evaluate the effectiveness of the different methods, suppose $\mathbf{X}_l$ is the set of learned direct causes of $\mathbf{Y}$ produced by an algorithm and $\mathbf{X}_d$ is the set of true direct causes of $\mathbf{Y}$ in the generated graph. Then, let $TP = |\mathbf{X}_l \cap \mathbf{X}_d|$, $FP = |\mathbf{X}_l \setminus \mathbf{X}_d|$, and $FN = |\mathbf{X}_d \setminus \mathbf{X}_l|$, and we have
$$recall = \frac{TP}{TP + FN}, \quad precision = \frac{TP}{TP + FP}, \quad accuracy = \frac{p - FP - FN}{p},$$
where p is the number of potential factors. It can be seen that the recall measures how much the algorithm has learned among all the true direct causes. Precision measures how much of the direct causes learned by the algorithm are correct. Accuracy measures the proportion of correct judgments on whether each variable is a direct cause or not. The evaluation indicators for learning all causes can also be defined similarly.
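These three indicators can be computed directly from the learned and true cause sets, as in the following sketch (names are ours):

def evaluate(learned, true_causes, p):
    # recall, precision, and accuracy as defined above; p is the number of potential factors.
    learned, true_causes = set(learned), set(true_causes)
    tp = len(learned & true_causes)
    fp = len(learned - true_causes)
    fn = len(true_causes - learned)
    recall = tp / (tp + fn) if (tp + fn) > 0 else 1.0
    precision = tp / (tp + fp) if (tp + fp) > 0 else 1.0
    accuracy = (p - fp - fn) / p
    return recall, precision, accuracy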
The experiment results are shown in Table 1, in which time represents the total time (in seconds) consumed by the algorithm, and rec, prec, and acc represent the average values of recall, precision, and accuracy over the 100 experiments, respectively. In addition, different subscripts represent different methods, and the subscripts DC and AC denote that the algorithms learn direct causes and all causes, respectively.
In Table 1, since the SSL algorithm obtains direct and all causes simultaneously through complete structural learning, for the sake of fairness, we only count the total time for both algorithms. It can be seen that the time of the two algorithms is approximately linearly related to the number of potential factors p. Moreover, when p is fixed, the algorithm takes longer and longer as the sample size n increases. In fact, for SSL algorithms, most of the time is spent on learning the complete graph structure. Therefore, as n increases, the (conditional) independence test becomes more accurate, resulting in an increase in the size of set X s i g and a larger graph to learn, which naturally increases the time required. For the S-Local algorithm, more than 99% of the time is spent on optimizing the log-likelihood function during the (conditional) independence test in the screening stage. As n increases, the optimization time becomes longer and the total time also increases accordingly. This also explains why the time of the S-Local algorithm increases linearly as the number of variables increases, since the number of independence tests required increases roughly linearly. In addition, it can be seen that in most cases, the S-Local algorithm takes less time than the SSL algorithm, especially when p is small. However, when p is large, the time used by the two algorithms is similar. This is mainly because in this experimental setting, the mechanism of the functional dynamic target Y is relatively complex, and its mean function is a Double-Logistic function with too many parameters, which requires much time for optimization. In fact, even if there is only one binary direct cause, the mean function will have 13 parameters. When the mechanism of the functional dynamic target is relatively simple, the time required for the S-Local algorithm will also be greatly reduced. Besides, it should be noted that more than 99% of the time, the S-Local algorithm is used to check the independence in the screening step, and in practice, this step can be performed in parallel, which will greatly reduce the time required.
When learning the direct causes, whether in terms of recall, precision, or accuracy, the results of the S-Local algorithm are much better than those of the SSL algorithm, especially the precision values. The precision values of the SSL algorithm are very small, mainly because the accuracy of learning the complete graph structure is relatively low, so many non-direct-cause variables end up in the learned local structure of $\mathbf{Y}$; in particular, when $p$ is large, it is difficult to correctly recover the local structure of $\mathbf{Y}$. Moreover, it should be noted that under the same sample size, when $p$ is small, the values of recall, precision, and accuracy obtained by the S-Local algorithm are not as good as those obtained when $p$ is large. For example, when $p = 15$ and $n = 50$, we have $rec_{S\text{-}Local} = 0.500$, $prec_{S\text{-}Local} = 0.474$, and $acc_{S\text{-}Local} = 0.929$, but when $p = 10{,}000$ and $n = 50$, we have $rec_{S\text{-}Local} = 1.000$, $prec_{S\text{-}Local} = 1.000$, and $acc_{S\text{-}Local} = 1.000$. The recall and accuracy values of the SSL algorithm show similar behavior. This result does not violate our intuition, as we use adaptive significance levels in the experiment. When $p$ is large, in order to increase the strength of screening and facilitate the subsequent learning of all causes, we use a smaller significance level. Therefore, the algorithm is more rigorous in determining whether a variable is a direct cause of $\mathbf{Y}$, making it easier to exclude non-direct-cause variables.
When learning all causes, the recall and accuracy values of the SSL algorithm and S-Local algorithm increase monotonically with respect to the sample size, and even in cases with many potential factors, both algorithms can achieve very good results. For example, when p = 10,000, the accuracy values of both algorithms are above 99.9 % . Of course, overall, the results of the S-Local algorithm are significantly better than those of the SSL algorithm. However, it should be noted that the values of precision of the two algorithms show different trends. The precision value of the SSL algorithm increases monotonically with n when p is large, but the trend is not significant when p is small. This is because the SSL algorithm is affected by the distance effect, and as n gradually increases, (conditional) independence tests also become more accurate. As a result, many causes that are far away from Y can be identified. When p is large, the number of causes that are far away from Y is also large. Therefore, the precision of the SSL algorithm will gradually increase. However, when p is small, most variables have a short distance from Y . Although the SSL algorithm can also obtain more causes (the value of r e c S S L increases), it also includes some noncause variables that are strongly related to Y in the set of causes. At this time, the value of precision does not have a clear trend. On the other hand, the precision value of the S-Local algorithm monotonically increases with respect to n when p is small, and as p gradually increases, this trend gradually transforms into a monotonic decrease. This is because when p is small, as n increases, the S-Local algorithm can identify more causes through a more accurate (conditional) independence test. However, when p is large, the number of noncause variables obtained by the S-Local algorithm is greater than the number of causes. Therefore, the recall value still increases, but the precision value gradually decreases. In other words, in this case, there is a trade-off between the values of recall and precision of the S-Local algorithm. However, it should be noted that although the trends of precision values are different, the accuracy values of both algorithms increase with the increase in sample size.
It should be noted that the primary objective of the models and algorithms introduced in this paper is to identify the causes of functional dynamic targets, addressing the "Cause of Effect" (CoE) challenge, rather than to predict Y directly. Nevertheless, under the causal graphical model for these targets, correctly identifying Y's direct causes is sufficient for accurate prediction. In the simulation experiment with 15 nodes and 1000 samples, the mean squared error (MSE) of prediction is 0.281 in runs that learn Y's direct causes incorrectly, and drops to 0.185 when the direct causes are identified correctly, a reduction in prediction error of roughly 34%. Additionally, as illustrated in Table 1, the S-Local algorithm identifies the direct causes with an accuracy above 98% in most settings. This high level of accuracy indicates that our algorithms also perform well in predicting Y.
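As a rough illustration of this point (not a reproduction of the simulation settings above), the following minimal sketch compares the out-of-sample prediction error of a linear regression of Y on a correctly identified set of direct causes with that of a regression on a wrongly learned set; all variable names, coefficients, and the linear mechanism are hypothetical:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
n = 1000

# Hypothetical linear mechanism: X1 and X2 are the direct causes of Y,
# while X3 is merely correlated with Y through X1 (not a cause of Y).
X1 = rng.normal(size=n)
X2 = rng.normal(size=n)
X3 = X1 + rng.normal(scale=0.5, size=n)
Y = 2.0 * X1 - 1.5 * X2 + rng.normal(scale=0.4, size=n)

def holdout_mse(features):
    """Train/test-split MSE of a linear regression of Y on the given feature columns."""
    X = np.column_stack(features)
    tr, te = slice(0, n // 2), slice(n // 2, n)
    model = LinearRegression().fit(X[tr], Y[tr])
    return mean_squared_error(Y[te], model.predict(X[te]))

print("MSE with the correct direct causes {X1, X2}:", holdout_mse([X1, X2]))
print("MSE with a wrongly learned set {X1, X3}:   ", holdout_mse([X1, X3]))
```

In such a sketch the regression on the true direct causes attains a noticeably lower test error, mirroring the qualitative behaviour reported above.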

7. Discussion and Conclusions

In this paper, we first establish a causal graphical model for functional dynamic targets and discuss hypothesis testing methods for the (conditional) independence between random variables and functional dynamic targets. To handle situations with a very large number of potential factors, we propose a screening algorithm that selects, from these factors, the variables significantly related to the functional dynamic target. On this basis, we propose the SSL algorithm and the S-Local algorithm to learn the direct causes, and all causes within a given distance, of functional dynamic targets. The former combines the screening algorithm with structural learning methods and learns the direct causes and all causes simultaneously by recovering the complete graph structure over the screened variables. Its disadvantage is that learning the complete structure of the graph is difficult and largely redundant, and it is also affected by the distance effect, which lowers the accuracy of the learned causes. The latter first uses a screening-based algorithm to learn the direct causes of the functional dynamic target and then uses our proposed PC-by-PC algorithm, a step-by-step local learning algorithm, to learn all causes within a given distance. Its advantage is that every learning step is restricted to the local structure of the current nodes, so the algorithm is no longer affected by the distance effect. In fact, this algorithm only considers the local structure of each cause variable, rather than the complete graph structure, which greatly saves time and space. Moreover, the algorithm keeps track not only of the distance but also of the directed paths between each cause variable and the functional dynamic target, so it does not need to identify the whole structure of a given part but only learns the portion of each local structure that involves cause variables, further reducing the learning of redundant structures.
It should be noted that when the causal mechanism of a functional dynamic target is very complex, the running time of the S-Local algorithm may increase considerably. In addition, the choice of significance level also affects the precision of the algorithm. How to simplify the causal model of functional dynamic targets and how to choose an appropriate significance level are therefore two directions for our future work.

Author Contributions

Conceptualization, R.Z. and X.Y.; methodology, R.Z. and X.Y.; software, R.Z. and X.Y.; validation, R.Z. and Y.H.; formal analysis, R.Z., X.Y. and Y.H.; investigation, R.Z., X.Y. and Y.H.; resources, Y.H.; data curation, R.Z. and X.Y.; writing—original draft preparation, R.Z.; writing—review and editing, R.Z. and Y.H.; visualization, R.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by National Key Research and Development Program of China grant number 2022ZD0160300.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The simulated data can be regenerated using the codes, which can be provided to interested users via an email request to the corresponding author.

Acknowledgments

We thank Qingyuan Zheng for providing technical support.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Appendix A.1

In Algorithm 1, we first screen the variables significantly related to Y through the hypothesis test of H_0 against H_1 as defined in Equations (2) and (3), and then learn the causal structure over these variables via a structural learning method. Note, however, that due to the absence of some variables from certain separation sets, the graph learned here is not the true CPDAG; it may contain some extra edges, which may be directed or bidirected. We therefore need to modify the structural learning algorithm. As shown in Algorithm 1, we first learn the skeleton with the original structural learning method and then find the v-structures using a variant procedure (Lines 2–5 in Algorithm 1). Taking the PC algorithm as an example, we give the method for learning the skeleton G_s(X_sig) below.
Algorithm A1 PC algorithm to learn the skeleton G_s(X_sig).
Require: X_sig, data sets about X_sig.
Ensure: the skeleton G_s(X_sig) and all separation sets S.
 1: Construct a complete undirected graph G defined over X_sig;
 2: Set k := −1;
 3: repeat
 4:   k := k + 1;
 5:   repeat
 6:     Find an ordered variable pair (X, Y) satisfying |Adj(X, G) ∖ {Y}| ≥ k in graph G;
 7:     repeat
 8:       Find a subset S ⊆ Adj(X, G) ∖ {Y} with size k;
 9:       if X ⊥ Y | S, then
10:         delete the undirected edge X − Y from graph G;
11:         save S as S(X, Y) and S(Y, X) and add them into S;
12:       end if
13:     until the undirected edge X − Y is deleted or all subsets S ⊆ Adj(X, G) ∖ {Y} with size k have been selected;
14:   until the conditional independence tests have been completed for all ordered variable pairs (X, Y) and all sets S that meet the conditions;
15: until for each ordered variable pair (X, Y), we have |Adj(X, G) ∖ {Y}| < k;
16: return G and S.
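For concreteness, the following is a minimal Python sketch of the skeleton phase of Algorithm A1. The Fisher-z partial-correlation test used here is only an illustrative stand-in for the (conditional) independence tests discussed in the paper, and the function names are ours:

```python
import numpy as np
from itertools import combinations
from scipy import stats

def fisher_z_independent(data, i, j, S, alpha=0.01):
    """Gaussian Fisher-z test of whether columns i and j are independent given columns S."""
    n = data.shape[0]
    idx = [i, j] + list(S)
    corr = np.corrcoef(data[:, idx], rowvar=False)
    prec = np.linalg.pinv(corr)
    r = -prec[0, 1] / np.sqrt(prec[0, 0] * prec[1, 1])   # partial correlation of X_i, X_j given X_S
    r = np.clip(r, -0.9999999, 0.9999999)
    z = 0.5 * np.log((1 + r) / (1 - r)) * np.sqrt(n - len(S) - 3)
    p_value = 2 * (1 - stats.norm.cdf(abs(z)))
    return p_value > alpha                                # True: do not reject independence

def pc_skeleton(data, alpha=0.01):
    """Skeleton phase of Algorithm A1 over the screened variables (columns of `data`)."""
    p = data.shape[1]
    adj = {i: set(range(p)) - {i} for i in range(p)}      # start from the complete graph
    sepset = {}
    k = -1
    while True:
        k += 1
        # ordered pairs that still have enough neighbours to supply a size-k conditioning set
        pairs = [(i, j) for i in range(p) for j in adj[i] if len(adj[i] - {j}) >= k]
        if not pairs:
            break
        for i, j in pairs:
            if j not in adj[i]:                           # edge already removed in this sweep
                continue
            for S in combinations(sorted(adj[i] - {j}), k):
                if fisher_z_independent(data, i, j, S, alpha):
                    adj[i].discard(j)
                    adj[j].discard(i)
                    sepset[(i, j)] = sepset[(j, i)] = set(S)
                    break
    return adj, sepset
```

Applied to an n-by-|X_sig| data matrix, such a routine returns the adjacency sets of the skeleton together with the separation sets, which are then used for the v-structure detection step of Algorithm 1.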

Appendix A.2

Proof of Theorem 1. 
We know that all causes of Y and the descendants of these causes are d-connected with Y. Since the faithfulness assumption holds for the causal graphical model (G, P, Θ), all causes of Y and the descendants of these causes are not independent of Y. From the definition X_sig = {X : X ∈ X and X is not independent of Y}, the set X_sig contains all causes of Y and the descendants of these causes. Meanwhile, according to the definition of the causal graphical model (G, P, Θ), the functional dynamic target Y has no children. Therefore, every d-connected path from a vertex X ∈ X_sig to Y is either a directed path X → ⋯ → Y or a path of the form X ← ⋯ ← X′ → ⋯ → Y for some vertex X′. Clearly, in the first case X is a cause of Y, and in the second case X is a descendant of the cause X′ of Y. Hence, the vertices in X_sig are either causes of Y or descendants of causes of Y. Statement 1 is proved.
Now, we prove Statement 2. The faithfulness assumption ensures that X₁ and X₂ are not adjacent in G_{X_sig} if there exists a set A ⊆ X_sig such that X₁ ⊥ X₂ | A. We only need to prove the "only if" part, that is, there exists a set A ⊆ X_sig such that X₁ ⊥ X₂ | A if X₁ and X₂ are not adjacent in G_{X_sig}.
  • Consider the case that both X₁ and X₂ are causes of Y. Since X₁ and X₂ are not adjacent in G_{X_sig}, either X₁ is a nondescendant of X₂ or X₂ is a nondescendant of X₁. Without loss of generality, we assume that X₁ is a nondescendant of X₂. Since X₂ is a cause of Y, according to Statement 1, we have Pa(X₂, G) = Pa(X₂, G_{X_sig}) ⊆ X_sig. Letting A = Pa(X₂, G), we obtain X₁ ⊥ X₂ | A.
  • Consider the case that X₁ is a cause of Y and X₂ is not a cause of Y. If X₂ is not a descendant of X₁, then, similarly to the previous case, we have X₁ ⊥ X₂ | A with A = Pa(X₁, G) = Pa(X₁, G_{X_sig}) ⊆ X_sig. If X₂ is a descendant of X₁, then all paths from X₂ to X₁ whose first edge is directed out of X₂ (i.e., of the form X₂ → ·) are d-separated; otherwise, there would be a directed cycle. Let A₁ consist of all parents of X₂ that are d-connected with X₁ in G. Clearly, A₁ ⊆ X_sig since X₁ is a cause of Y. Then, let A = A₁ ∪ Pa(X₁, G) ⊆ X_sig. Since Pa(X₁, G) blocks all d-connected paths between X₁ and X₂ that pass through Pa(X₁, G), and the set A₁ blocks all paths of the form X₁ ⋯ → X₂, the set A blocks all d-connected paths between X₁ and X₂ in G, that is, X₁ ⊥ X₂ | A holds.
  • Consider the case that X₂ is a cause of Y and X₁ is not a cause of Y; this case is symmetric to the second one and can be discussed similarly.
Therefore, the “only if” part of Statement 2 is proven.
Statement 3 holds directly since the faithfulness assumption holds for the causal graphical model ( G , P , Θ ) . □
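For a concrete illustration of the separating-set arguments above, the following minimal sketch checks the corresponding d-separation statements on a toy DAG of our own construction (it is not one of the paper's simulation graphs, and it assumes a NetworkX version that provides nx.d_separated):

```python
import networkx as nx

# Toy DAG matching the setting of Theorem 1: Y has no children,
# X1, X2, X3 are causes of Y, and X4 is a descendant of a cause but not a cause itself.
G = nx.DiGraph([
    ("X1", "X2"), ("X2", "Y"),
    ("X1", "X3"), ("X3", "Y"),
    ("X2", "X4"),
])

# Statement 2 style checks: two non-adjacent screened variables admit a separating
# set inside X_sig, e.g. X2 and X3 are separated by Pa(X3, G) = {X1},
# and X1 and X4 are separated by Pa(X4, G) = {X2}.
print(nx.d_separated(G, {"X2"}, {"X3"}, {"X1"}))   # True
print(nx.d_separated(G, {"X1"}, {"X4"}, {"X2"}))   # True

# Without the separating set they are d-connected, hence dependent under faithfulness.
print(nx.d_separated(G, {"X2"}, {"X3"}, set()))    # False
```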
Proposition A1. 
The extra edges, i.e., the edges present in the skeleton learned over X_sig but absent from the skeleton of the induced subgraph G_{X_sig}, do not appear between two ancestors of Y in G (if any such edges exist).
Proof of Proposition A1. 
For any two nonadjacent vertices X₁, X₂ ∈ An(Y, G), if there is no directed path from X₁ (or X₂) to X₂ (or X₁), then Pa(X₁, G) (or Pa(X₂, G)) ⊆ X_sig is a separation set relative to the pair (X₁, X₂), and hence there is no edge between X₁ and X₂ in the learned graph. Therefore, without loss of generality, we assume X₁ ∈ An(X₂, G). In this case, since X₂ is an ancestor of Y, the set S = Pa(Cn(X₁, X₂, G), G) ∖ De(Cn(X₁, X₂, G), G) ⊆ X_sig is a separation set relative to the pair (X₁, X₂), where Cn(X₁, X₂, G) is the set of all intermediate nodes on the directed paths from X₁ to X₂ in G. Hence, there is no edge between X₁ and X₂ in the learned graph. □
Proof of Theorem 2. 
According to Proposition A1, an extra edge between X₁ and X₂ (an edge appearing in the graph learned over X_sig but not in G_{X_sig}) can only arise in two cases:
Case 1. X₁ ∈ An(Y, G) and X₂ ∈ De(An(Y, G), G) ∖ An(Y, G).
Case 1.1. X₁ ∉ An(X₂, G). Since X₂ is then a nondescendant of X₁, Pa(X₁, G) ⊆ X_sig is a separation set relative to (X₁, X₂). Hence, there is no edge between X₁ and X₂ in the learned graph, which is a contradiction.
Case 1.2. X₁ ∈ An(X₂, G). In this case, there are four possible types of paths between X₁ and X₂ in the graph G:
Case 1.2.1. A causal path X₁ → ⋯ → X₂. In this case, given any vertex Z on this path other than X₁ and X₂, the path can be blocked, implying that X₁ and X₂ are not adjacent in the learned graph, which is a contradiction.
Case 1.2.2. A non-causal path whose first edge points into X₁, i.e., of the form X₁ ← ⋯ X₂. Since X₁ ∈ An(Y, G), all parents of X₁ are in the set X_sig. Hence, conditioning on the parent of X₁ on this path blocks the path, implying that X₁ and X₂ are not adjacent in the learned graph, which is a contradiction.
Case 1.2.3. A non-causal path whose last edge points out of X₂, i.e., of the form X₁ → ⋯ ← X₂. There must exist at least one v-structure on this path, and the path can be blocked given the empty set. Suppose the v-structure nearest to X₂ is W → Z ← X₂; then the collider Z cannot appear on any causal path from X₁ to X₂, since otherwise there would be a directed cycle in G. Hence, X₁ and X₂ are not adjacent in the learned graph, which is a contradiction.
Case 1.2.4. A non-causal path of the form X₁ → ⋯ → X₂ (first edge out of X₁, last edge into X₂) that is not a directed path. There must exist at least one v-structure on the path, and the path can be blocked given the empty set. Different from Case 1.2.3, however, the colliders of the v-structures on this path may occur on some causal paths from X₁ to X₂. According to Case 1.2.1, these vertices may need to be conditioned on in order to block the causal paths. We use Figure A1 for a detailed illustration. In Figure A1, without loss of generality, we suppose Z_i ∈ De(An(Y, G), G) ∖ An(Y, G), i = 1, …, p; if not, let the vertex on the path that is farthest from X₁ and belongs to An(Y, G) be the new X₁. In this situation, on the non-causal path p₀ = X₁ → Z₁ ← U₁ → ⋯ → Z_p ← U_p → X₂, all colliders Z_i lie on the causal path p₁ = X₁ → Z₁ → ⋯ → Z_p → X₂, and U_i ∉ X_sig, i = 1, …, p. In order to block the causal path p₁, it is necessary to condition on some vertices in {Z_i, i = 1, …, p}, say {Z_{k₁}, …, Z_{k_l}}. But then the concatenated path p = p₁(X₁, Z_{k₁}) p₀(Z_{k₁}, Z_{k₁+1}) p₁(Z_{k₁+1}, Z_{k₂}) ⋯ p₀(Z_{k_l}, Z_{k_l+1}) p₁(Z_{k_l+1}, X₂) cannot be blocked. In fact, even if all Z_i, i = 1, …, p, are conditioned on, the non-causal path p₀ still cannot be blocked.
Figure A1. Illustration for Case 1.2.4.
In general, in Case 1, the extra edges appear only when the situation of Case 1.2.4 (Figure A1) occurs. In that situation, X₁ is adjacent to each Z_i, i = 1, …, p, and to X₂ in the learned graph. Note that Rules 1–3 of Meek's rules only orient edges backward. In other words, in Case 1, no matter how the extra edges are oriented, they do not affect the orientation of the edges between vertices in An(Y, G). For instance, when applying Rule 3 of Meek's rules, as shown in Figure A2, if X₁ − X₂ is an extra edge arising from Case 1.2.4, then the following hold:
  • If X₁ ∈ An(Y, G) and X₂ ∈ De(An(Y, G), G) ∖ An(Y, G), then, because of the directed edge X₂ → X₄, we obtain X₄ ∈ De(An(Y, G), G) ∖ An(Y, G), implying that the newly oriented edge X₁ → X₄ is a directed edge pointing out of the set An(Y, G), which does not affect the orientation of edges between vertices in An(Y, G).
  • If X₂ ∈ An(Y, G) and X₁ ∈ De(An(Y, G), G) ∖ An(Y, G), then, because of the directed edge X₂ → X₄, we have X₄ ∈ De(An(Y, G), G). Note that if X₄ ∈ An(Y, G), then X₃ ∈ An(Y, G) and the edge between X₃ and X₁ is a true edge X₃ → X₁ in G. Since, in Case 1.2.4, only the paths between X₂ and X₁ cannot be blocked, and all such paths have an arrow pointing into X₁, in the process of learning the graph over X_sig we find that X₂ and X₃ are not adjacent and that X₁ does not belong to their separation set, implying that the v-structure X₃ → X₁ ← X₂ occurs in the learned graph before applying Meek's rules, which is a contradiction. Hence, X₄ ∈ De(An(Y, G), G) ∖ An(Y, G), implying that the newly oriented edge X₁ → X₄ is a directed edge between vertices in the set De(An(Y, G), G) ∖ An(Y, G), which does not affect the orientation of edges between vertices in An(Y, G).
Figure A2. Rule 3 of Meek's rules as an example to illustrate Case 1.
Other cases of Meek's rules can be proved similarly. In fact, for Case 1 as shown in Figure A1, if there exists a vertex W ∈ De(An(Y, G), G) ∩ Pa(X₂, G) such that the edge between X₂ and W might be misoriented in the learned graph due to the new extra edges, then W ∈ Adj(Z_p, G); otherwise, W → X₂ ← Z_p forms a v-structure and the edge W → X₂ is oriented correctly in both the learned graph and G_{X_sig}. Then, W ∈ Adj(Z_{p−1}, G); otherwise, W → Z_p can be oriented by the v-structure, and W → X₂ can be oriented correctly in both graphs by applying Rule 2 of Meek's rules if the edge between Z_p and X₂ is directed, and by using Lemma 1 in [24] if the edge between Z_p and X₂ is undirected. Similarly, W ∈ Adj(Z_i, G), i = 1, …, p, and W ∈ Adj(X₁, G). Note that the vertices X₁, X₂, and W form a triangle, implying that the edge between W and X₂ cannot be oriented by applying Meek's rules to the edge between X₁ and X₂ in the learned graph, which contradicts the assumption.
Case 2. X₁, X₂ ∈ De(An(Y, G), G) ∖ An(Y, G).
Case 2.1. X₁ ∉ An(X₂, G) and X₂ ∉ An(X₁, G). In this case, there are three possible types of paths between X₁ and X₂ in the graph G:
Case 2.1.1. A non-causal path of the form X₁ → ⋯ ← X₂. The discussion is similar to that of Case 1.2.3.
Case 2.1.2. A non-causal path of the form X₁ → ⋯ → X₂ (or X₁ ← ⋯ ← X₂). There must exist at least one v-structure on the path, and the path can be blocked given the empty set. Note that, different from Case 1.2.4, there is no causal path between X₁ and X₂ here, so the collider of the v-structure closest to X₂ (or X₁) will never be conditioned on. Therefore, X₁ and X₂ are not adjacent in the learned graph, which is a contradiction.
Case 2.1.3. A non-causal path of the form X₁ ← ⋯ → X₂. Since some parents of X₁ or X₂ may not belong to the set X_sig, this path may not be blockable. For example, for the fork X₁ ← Z → X₂ with Z ∉ X_sig, an extra edge appears between X₁ and X₂. Similar to the discussion at the end of Case 1, no matter how this edge is oriented, Meek's rules only orient edges backward, so the orientation of this edge happens only inside the set De(An(Y, G), G) and does not affect the orientation within the set An(Y, G).
Case 2.2. X₁ ∈ An(X₂, G) (the case X₂ ∈ An(X₁, G) can be discussed similarly). In this case, there are four possible types of paths between X₁ and X₂ in the graph G:
Case 2.2.1. A causal path X₁ → ⋯ → X₂. The discussion is the same as in Case 1.2.1.
Case 2.2.2. A non-causal path whose last edge points out of X₂, i.e., of the form X₁ → ⋯ ← X₂ or X₁ ← ⋯ ← X₂. The discussion is the same as in Case 1.2.3.
Case 2.2.3. A non-causal path of the form X₁ → ⋯ → X₂ that is not a directed path. The discussion is the same as in Case 1.2.4.
Case 2.2.4. A non-causal path of the form X₁ ← ⋯ → X₂. The discussion is the same as in Case 2.1.3.
Case 2.3. This case is symmetric to Case 2.2 and can be discussed similarly.
We have thus proved that the extra edges do not affect the skeleton or the orientation of edges between ancestors of Y in G. It is worth mentioning that all discussions in the above proof are carried out in the graph G, implying that the orientation of edges between vertices in An(Y, G) is not affected by the extra edges. In other words, the ancestors of Y are the same in the two graphs, and every possible ancestor of Y in G_{X_sig} is also a possible ancestor of Y in the graph learned over X_sig, but not vice versa. □
Proof of Theorem 3. 
We need to prove that for any X ∈ X_sig, X is a direct cause of Y if and only if there is no subset A ⊆ X_sig ∖ ({X} ∪ N_X) such that X ⊥ Y | A. According to Theorem 1, the "only if" part holds obviously.
Now, we prove the "if" part. According to Theorem 1, for any X ∈ X_sig, X is a direct cause of Y if and only if there is no subset A ⊆ X_sig ∖ {X} such that X ⊥ Y | A. Hence, using the definition of N_X in Algorithm 2, it suffices to prove that if Y ⊥ X | A for some set A, then for each non-direct-cause variable V ∈ A there exists at least one separation set of V and Y that does not contain X. In fact, the paths between V and Y can be divided into two types: paths that go through X and paths that do not. For paths that do not go through X, the separation set naturally does not need to contain X. For any path that goes through X, the path can be represented as the concatenation of p(V, X) and p(X, Y), where p(X, Y) is already blocked by some variables in the set A. Therefore, whether or not the subpath p(V, X) is blocked, the path p(V, Y) can be separated by a set that does not contain X. □
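A minimal sketch of the resulting direct-cause criterion (ignoring the N_X refinement of Algorithm 2, which is not restated here) could look as follows; ci_independent is an abstract (conditional) independence test supplied by the user, and all names are ours:

```python
from itertools import combinations

def direct_causes(ci_independent, x_sig, target, max_cond=3):
    """Keep X in x_sig as a direct cause of `target` if no conditioning set A inside
    x_sig \\ {X} (up to size max_cond) renders X independent of the target given A.
    `ci_independent(x, y, S)` returns True when x and y are judged independent given S."""
    causes = []
    for x in x_sig:
        others = [v for v in x_sig if v != x]
        max_k = min(max_cond, len(others))
        separated = any(
            ci_independent(x, target, set(S))
            for k in range(max_k + 1)
            for S in combinations(others, k)
        )
        if not separated:
            causes.append(x)
    return causes
```

In the actual Algorithm 2, the search over conditioning sets is restricted further (via the set N_X), which reduces the number of tests without changing the identified set of direct causes.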
Proof of Theorem 4. 
We first prove that Algorithm 3 can learn the PCD set PCD_T of a target node T correctly. Similar to the definition of X_sig, let T_sig ⊆ X be the set of variables associated with T. As shown in Line 1 of Algorithm 3, the initial value PCD_T^0 of PCD_T is {X : X ∈ V ∖ U and X is not independent of T}, which is a subset of T_sig; in fact, if the non-PC set U is empty, then PCD_T^0 = T_sig. According to the Markov condition and the faithfulness assumption, for any X ∈ T_sig, X is a parent or child of T if and only if there is no subset A ⊆ T_sig ∖ {X} such that X ⊥ T | A. Hence, PCD_T contains all parents and children of T. For any nondescendant variable of T, the set of T's parents separates it from T. Due to the lack of such a separation set, some descendant variables of T may be included in PCD_T, as shown in Example 3. Therefore, the set PCD_T obtained by Algorithm 3 consists of T's parents, children, and some descendants.
Now, we prove that Algorithm 4 can learn all causes of T within a given distance m. First, according to the discussion following Example 3, Algorithm 4 can learn the PC set of each variable correctly by using Algorithm 3 and symmetric validation method (Lines 7–13 in Algorithm 4). In other words, Algorithm 4 can learn the skeleton of the local structure of each variable correctly. Next, note that once the skeleton of the local structure of a variable is determined, its separation sets from other variables are also obtained at the same time (Line 6 in Algorithm 4). Therefore, all v-structures can be learned correctly because they are determined by local structures and separation sets (Line 14 in Algorithm 4). Combined with Meek’s rules, Algorithm 4 learns the orientation of the local structure of each variable correctly. Finally, we show that continuing the algorithm cannot obtain more causes of T within a distance m. Notice that we learn the local structure of nodes layer by layer, and we only learn the next layer after all the nodes of a certain layer have been learned (Line 15 in Algorithm 4). Hence, once Algorithm 4 is stopped, it means that all directed paths pointing to T with a distance less than or equal to m have been found, and the m-th edge of these paths has been directed. As shown above, we can correctly obtain all edges and v-structures and their orientations. Hence, continuing the algorithm can only orient new edges that are farther away from T ( > m ), which is not what we care about.
We already showed that Algorithm 4 can correctly learn all causes of T that are within a distance of m from T . Note that the distance between a functional dynamic target Y and its direct causes is always 1. Thus, obviously, if T is exactly the set of Y ’s direct causes obtained from Algorithm 2, and the node L in Line 2 in Algorithm 4 is exactly Y , then according to Theorem 3 and the proof above, Algorithm 4 learns all causes of Y within a given distance m + 1 correctly. □
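As an illustration of the symmetric validation and the layer-by-layer expansion referred to in this proof, the following minimal sketch paraphrases the two ingredients; the pcd_of routine is a hypothetical stand-in for Algorithm 3, and the edge orientation that the actual Algorithm 4 performs with v-structures and Meek's rules is omitted here:

```python
def pc_set(pcd_of, t):
    """Symmetric validation: keep X as a parent/child of t only if the PCD relation
    holds in both directions, which removes the spurious descendants that a PCD set
    may still contain. `pcd_of(node)` returns the PCD set of a node."""
    return {x for x in pcd_of(t) if t in pcd_of(x)}

def nodes_within_pc_distance(pcd_of, target, m):
    """Layer-by-layer sketch in the spirit of Algorithm 4: expand PC sets outward from
    the target, one layer per unit of distance, up to distance m. Without orienting
    edges, this only bounds the candidate causes; the real algorithm additionally
    identifies directed paths pointing to the target."""
    frontier, found = {target}, set()
    for _ in range(m):
        next_frontier = set()
        for node in frontier:
            for x in pc_set(pcd_of, node):
                if x != target and x not in found:
                    found.add(x)
                    next_frontier.add(x)
        frontier = next_frontier
    return found
```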

References

  1. Karkach, A.S. Trajectories and models of individual growth. Demogr. Res. 2006, 15, 347–400. [Google Scholar] [CrossRef]
  2. Richards, F.J. A flexible growth function for empirical use. J. Exp. Bot. 1959, 10, 290–301. [Google Scholar] [CrossRef]
  3. Zimmerman, D.L.; Núñez-Antón, V. Parametric modelling of growth curve data: An overview. Test 2001, 10, 1–73. [Google Scholar] [CrossRef]
  4. Murre, J.M.; Chessa, A.G. Power laws from individual differences in learning and forgetting: Mathematical analyses. Psychon. Bull. Rev. 2011, 18, 592–597. [Google Scholar] [CrossRef] [PubMed]
  5. Wixted, J.T.; Chessa, A.G. On Common Ground: Jost’s (1897) law of forgetting and Ribot’s (1881) law of retrograde amnesia. Psychol. Rev. 2004, 111, 864–879. [Google Scholar] [CrossRef] [PubMed]
  6. Sachs, K.; Perez, O.; Pe’er, D.; Lauffenburger, D.A.; Nolan, G.P. Causal protein-signaling networks derived from multiparameter single-cell data. Science 2005, 308, 523–529. [Google Scholar] [CrossRef] [PubMed]
  7. Pearl, J. Causality Models, Reasoning and Inference, 2nd ed.; Cambridge University Press: Cambridge, UK, 2009. [Google Scholar]
  8. Han, B.; Park, M.; Chen, X.W. A Markov blanket-based method for detecting causal SNPs in GWAS. BMC Bioinform. 2010, 11, S5. [Google Scholar] [CrossRef]
  9. Duren, Z.; Wang, Y. A systematic method to identify modulation of transcriptional regulation via chromatin activity reveals regulatory network during mESC differentiation. Sci. Rep. 2016, 6, 22656. [Google Scholar] [CrossRef] [PubMed]
  10. Heckman, J.J. Comment on “Identification of causal effects using instrumental variables”. J. Am. Stat. Assoc. 1996, 91, 459–462. [Google Scholar] [CrossRef]
  11. Winship, C.; Morgan, S.L. The estimation of causal effects from observational data. Annu. Rev. Sociol. 1999, 25, 659–706. [Google Scholar] [CrossRef]
  12. Yin, J.; Zhou, Y.; Wang, C.; He, P.; Zheng, C.; Geng, Z. Partial Orientation and Local Structural Learning of Causal Networks for Prediction. In Proceedings of the Causation and Prediction Challenge at WCCI, Hong Kong, China, 1–6 June 2008; pp. 93–105. [Google Scholar]
  13. Wang, C.; Zhou, Y.; Zhao, Q.; Geng, Z. Discovering and Orienting the Edges Connected to a Target Variable in a DAG via a Sequential Local Learning Approach. Comput. Stat. Data Anal. 2014, 77, 252–266. [Google Scholar] [CrossRef]
  14. Pena, J.M.; Nilsson, R.; Bjorkegren, J.; Tegner, J. Towards Scalable and Data Efficient Learning of Markov Boundaries. J. Mach. Learn. Res. 2007, 45, 211–232. [Google Scholar]
  15. Gao, T.; Ji, Q. Efficient Markov Blanket Discovery and Its Application. IEEE Trans. Cybern. 2017, 47, 1169–1179. [Google Scholar] [CrossRef]
  16. Wang, H.; Ling, Z.; Yu, K.; Wu, X. Towards Efficient and Effective Discovery of Markov Blankets for Feature Selection. Inf. Sci. 2020, 509, 227–242. [Google Scholar] [CrossRef]
  17. Gao, T.; Ji, Q. Local Causal Discovery of Direct Causes and Effects. In Proceedings of the Advances in Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; pp. 2512–2520. [Google Scholar]
  18. Ling, Z.; Yu, K.; Wang, H.; Liu, L.; Li, J. Any Part of Bayesian Network Structure Learning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 14, 1–14. [Google Scholar]
  19. Yu, L.; Liu, H. Efficient Feature Selection via Analysis of Relevance and Redundancy. J. Mach. Learn. Res. 2004, 5, 1205–1224. [Google Scholar]
  20. Pearl, J.; Shafer, G. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference; Morgan Kaufmann: San Mateo, CA, USA, 1988. [Google Scholar]
  21. Verma, T.; Pearl, J. Equivalence and synthesis of causal models. In Proceedings of the 6th Conference on Uncertainty in Artificial Intelligence, Cambridge, MA, USA, 27–29 July 1990; pp. 220–227. [Google Scholar]
  22. Pearl, J.; Geiger, D.; Verma, T. Conditional independence and its representations. Kybernetika 1989, 25, 33–44. [Google Scholar]
  23. Andersson, S.A.; Madigan, D.; Perlman, M.D. A characterization of Markov equivalence classes for acyclic digraphs. Ann. Stat. 1997, 25, 505–541. [Google Scholar] [CrossRef]
  24. Meek, C. Causal inference and causal explanation with background knowledge. In Proceedings of the Eleventh Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, 18–20 August 1995; pp. 403–410. [Google Scholar]
  25. Fekedulegn, D.; Mac Siúrtáin, M.P.; Colbert, J.J. Parameter estimation of nonlinear models in forestry. Silva Fenn. 1999, 33, 327–336. [Google Scholar] [CrossRef]
  26. Grossman, M.; Koops, W. Multiphasic analysis of growth curves in chickens. Poultry Sci. 1988, 67, 33–42. [Google Scholar] [CrossRef]
  27. Xu, M.J.; Zhu, L.B.; Zhou, S.; Ye, C.G.; Mao, M.X.; Sun, K.; Su, L.D.; Pan, X.H.; Zhang, H.X.; Huang, S.G.; et al. A computational framework for mapping the timing of vegetative phase change. New Phytol. 2016, 211, 750–760. [Google Scholar] [CrossRef]
  28. Spirtes, P.; Glymour, C. An algorithm for fast recovery of sparse causal graphs. Soc. Sci. Comput. Rev. 1991, 9, 62–72. [Google Scholar] [CrossRef]
  29. Chickering, D.M. Optimal structure identification with greedy search. J. Mach. Learn. Res. 2002, 3, 507–554. [Google Scholar]
  30. West, D. Introduction to Graph Theory; Prentice Hall: Upper Saddle River, NJ, USA, 1996. [Google Scholar]
Figure 1. The causal graphs when the variable X 1 is significantly associated with Y : (a) V-structures. (b) Chains. (c) Triangles. (d) Forks.
Figure 2. Meek’s rules comprise four orientation rules. If the graph on the left-hand side of a rule is an induced subgraph of a PDAG, then the corresponding rule can be applied to replace an undirected edge in the induced subgraph with a directed edge. This replacement results in the induced subgraph transforming into the graph depicted on the right-hand side of the rule.
Figure 3. An example to illustrate the difference between G X s i g and G X s i g : (a) True graph G 1 . (b) G 1 , X s i g . (c) G 1 , X s i g . (d) True graph G 2 . (e) G 2 , X s i g . (f) G 2 , X s i g .
Figure 4. An example to illustrate the results in Theorem 2: (a) True graph G . (b) G X s i g Y . (c) G X s i g Y .
Figure 5. An example to illustrate the P C D set obtained by Algorithm 3.
Table 1. Experimental results of the SSL algorithm and the S-Local algorithm under different settings (DC: learning the direct causes; AC: learning all causes).

p        n      time_SSL  time_S-Local  Cause  rec_SSL  rec_S-Local  prec_SSL  prec_S-Local  acc_SSL  acc_S-Local
15       50     50        55            DC     0.487    0.500        0.352     0.474         0.866    0.929
                                        AC     0.157    0.487        0.721     0.814         0.445    0.640
         100    64        49            DC     0.557    0.829        0.485     0.814         0.905    0.975
                                        AC     0.143    0.656        0.786     0.877         0.451    0.733
         200    112       60            DC     0.538    0.885        0.386     0.856         0.888    0.981
                                        AC     0.224    0.882        0.603     0.878         0.476    0.854
         500    676       85            DC     0.551    0.939        0.466     0.929         0.927    0.989
                                        AC     0.363    0.977        0.567     0.893         0.552    0.916
         1000   1126      126           DC     0.778    0.981        0.691     0.963         0.957    0.996
                                        AC     0.566    0.996        0.702     0.890         0.667    0.923
100      50     251       277           DC     0.293    0.283        0.072     0.256         0.934    0.984
                                        AC     0.112    0.223        0.154     0.358         0.867    0.915
         100    224       227           DC     0.398    0.755        0.141     0.656         0.956    0.993
                                        AC     0.104    0.604        0.226     0.688         0.884    0.946
         200    290       235           DC     0.292    0.917        0.113     0.828         0.962    0.996
                                        AC     0.123    0.891        0.205     0.763         0.891    0.961
         500    890       336           DC     0.221    0.916        0.071     0.900         0.962    0.997
                                        AC     0.145    0.966        0.156     0.753         0.892    0.957
         1000   1509      527           DC     0.462    0.978        0.156     0.962         0.967    0.999
                                        AC     0.327    0.996        0.308     0.668         0.908    0.933
1000     50     836       839           DC     0.814    1.000        0.073     1.000         0.989    1.000
                                        AC     0.336    0.204        0.235     0.924         0.985    0.992
         100    936       962           DC     0.860    1.000        0.069     1.000         0.988    1.000
                                        AC     0.587    0.573        0.379     0.867         0.988    0.995
         200    1204      1222          DC     0.980    1.000        0.083     1.000         0.989    1.000
                                        AC     0.724    0.847        0.446     0.861         0.989    0.997
         500    2376      1930          DC     1.000    1.000        0.118     1.000         0.992    1.000
                                        AC     0.804    0.922        0.500     0.814         0.991    0.997
         1000   4015      3118          DC     1.000    1.000        0.121     1.000         0.993    1.000
                                        AC     0.873    0.998        0.523     0.813         0.992    0.998
10,000   50     9109      9480          DC     0.667    1.000        0.150     1.000         1.000    1.000
                                        AC     0.148    0.148        0.194     1.000         0.999    0.999
         100    10,008    10,463        DC     0.654    1.000        0.101     1.000         0.999    1.000
                                        AC     0.376    0.538        0.285     0.884         0.999    1.000
         200    13,710    13,836        DC     0.923    1.000        0.101     1.000         0.999    1.000
                                        AC     0.551    0.833        0.395     0.871         0.999    1.000
         500    21,084    18,343        DC     1.000    1.000        0.130     1.000         0.999    1.000
                                        AC     0.782    0.919        0.502     0.813         0.999    1.000
         1000   31,476    31,862        DC     1.000    1.000        0.126     1.000         0.999    1.000
                                        AC     0.813    0.959        0.505     0.787         0.999    1.000