1. Introduction
Identifying the causes of a target variable is a primary objective in numerous research studies. Sometimes, these target variables are dynamic, observed at distinct time intervals, and typically characterized by functions or distinct curves that depend on other variables and time. We call them functional dynamic targets. For example, in nature, the growth of animals and plants is usually multistage and nonlinear with respect to time [1,2,3]. The popular growth curve functions, including the Logistic, Gompertz, Richards, Hossfeld IV, and Double-Logistic functions, have S shapes [3] and have been widely used to model patterns of growth. In psychological and cognitive science, researchers usually fit individual learning and forgetting curves with power functions; individuals may have different curve parameters [4,5].
The causal graphical model is widely used for the automated derivation of causal influences among variables [6,7,8,9] and demonstrates excellent performance in representing complex causal relationships between multiple variables and expressing causal hypotheses [7,10,11]. In this paper, we aim to identify the underlying causes of these functional dynamic targets using the graphical model. There are three main challenges. First, identifying causes is generally harder than exploring associations, even though the latter has received substantial attention, as evidenced by the extensive use of Genome-Wide Association Studies (GWAS) in bioinformatics. Second, it is difficult to use a causal graphical model to represent the generating mechanism of dynamic targets and to find the causes of the targets from observational data when the number of variables is very large; for example, one may need to find the genes that affect the growth curve of individuals among thousands of single-nucleotide polymorphisms (SNPs). Finally, the variables considered are mixed, which increases the complexity of representing and learning the causal model. We discuss these three challenges in detail below.
First of all, traditional statistical methods can only discover correlations between variables rather than causal relationships, which may give false positive or false negative results when searching for the real causes of the target. In fact, the association between the target and variables may originate from different causal mechanisms. For example, Figure 1 displays several different causal mechanisms that can result in a statistically significant association between the target and variables. In Figure 1, are three random variables, is a vector representing a functional dynamic target, in which are the states of the target at n time points, and directed edges represent direct causal relations among them. Using statistical methods, we are very likely to find that is significantly associated with in all four cases. However, it is hard to identify whether is a real cause of without further causal learning. As shown in Figure 1, might be a direct cause of in Figure 1a,c, a cause but not a direct cause in Figure 1b, and not a cause in Figure 1d.
In addition, when the number of candidate variables is very large, both learning causal structures and discovering the causes of a target become very difficult. In fact, learning the complete causal graph is redundant and wasteful for the task of finding causes, as the focus should be on the target variable's local structure. The PCD-by-PCD algorithm [12] is adept at identifying such local structures and efficiently distinguishing parents, children, and some descendants. The MB-by-MB method [13], in contrast, simplifies this by learning Markov Blanket (MB) sets to identify direct causes/effects, leveraging techniques that are simpler and quicker than PCD learning, with methods like PCMB, STMB, and EEMB [14,15,16]. The CMB algorithm further streamlines this process using a topology-based MB discovery approach [17]. However, Ling [18] pointed out that Expand-Backtracking-type algorithms, such as the PCD-by-PCD and CMB algorithms, may overlook some v-structures, leading to numerous incorrect edge orientations. To tackle these issues, the APSL algorithm was introduced and designed to learn the subgraph within a specific distance centered around the target variable. Nonetheless, its dependence on the FCBF method for Markov Blanket learning tends to produce approximate sets rather than precise ones [19]. Furthermore, Ling [18] emphasized learning the local graph within a certain distance from the target rather than focusing on the causes of the target.
Finally, the variables in question are varied; specifically, the targets consist of dynamic time series or complex curves, while the other variables may be either discrete or continuous. Consequently, measuring the connections between the target and other variables presents significant challenges. For instance, traditional statistical methods used to assess independence or conditional independence between variables and complex targets might not only be inefficient but also ineffective, especially when there is an insufficient sample size to accurately measure high-order conditional independence.
In this paper, we introduce a causal graphical model tailored for analyzing dynamic targets and propose two methods to identify the causes of such a functional dynamic target assuming no hidden variables or selection biases. Initially, after establishing our dynamic target causal graphical model, we conduct an association analysis to filter out most variables unrelated to the target. With data from the remaining significantly associated variables, we then combine the screening method with structural learning algorithms and introduce the SSL algorithm to identify the causes of the target. Finally, to mitigate the distance effects that can mask the association between a cause and the target in data sets where the causal chain from cause to target is excessively long, we propose a local method. This method initially identifies the direct causes of the target and then proceeds to learn the causes sequentially in reverse order along the causal path.
The main contributions of this paper include the following:
We introduce a causal graphical model that combines Bayesian networks and functional dynamic targets to represent the causal mechanism of variables and the target.
We present a screening method that significantly reduces the dimensions of potential factors and combines it with structural learning algorithms to learn the causes of a given target and prove that all identifiable causes can be learned correctly.
We propose a screening-based and local method to learn the causes of the functional dynamic target up to any given distance among all factors. This method is helpful when association disappears due to the long distance between indirect causes and the target.
We experimentally study our proposed method on a simulation data set to demonstrate the validity of the proposed methods.
2. Preliminary
Before introducing the main results of this paper, we need to clarify some definitions and notations related to graphs. Furthermore, unless otherwise specified, we use capital letters such as V to denote variables or vertices, boldface letters such as to denote variable sets or vectors, and lowercase letters such as v and to denote the realization of a variable or vector, respectively.
A graph is a pair , in which is the vertex set and is the edge set. To simplify the symbols, we use to represent both random variables and the corresponding nodes in the graph. For any two nodes , an undirected edge between and , denoted by , is an edge satisfying and , while a directed edge between and , denoted by , is an edge satisfying and . If all edges in a graph are undirected (directed), the graph is called an undirected (directed) graph. If a graph has both undirected and directed edges, then it is called a partially directed graph. For a given graph , we use and to denote its vertex set and edge set, respectively, where can be an undirected, directed, or partially directed graph. For any , the induced subgraph of over , denoted by or , is the graph with vertex set and edge set containing all and only edges between vertices in , that is, , where .
In a graph , is a parent of and is a child of if the directed edge is in . and are neighbors of each other if the undirected edge is in . and are called adjacent if they are connected by an edge, regardless of whether the edge is directed or undirected. We use to denote the sets of parents, children, neighbors, and adjacent vertices of in , respectively. For any vertex set , the parent set of in can be defined as . The sets of children, neighbors, and adjacent vertices of in can be defined similarly. A root vertex is the vertex without parents. For any vertex , the degree of in , denoted by , is the number of ’s adjacent vertices, that is, . The skeleton of , denoted by , is an undirected graph obtained by transforming all directed edges in to undirected edges, that is, , where .
The sequence in graph is an ordered collection of distinct vertices . A sequence becomes a path, denoted by , if every pair of consecutive vertices in the sequence is adjacent in . The vertices and serve as the endpoints, with the rest being intermediate vertices. For a path in , and for any , the subpath from to is , and path can thus be represented as a combination of its subpaths, denoted by . A path is partially directed if there is no directed edge in for any . A partially directed path is directed (or undirected) if all its edges are directed (or undirected). A vertex is an ancestor of and is a descendant of if there exists a directed path from to or . The sets of ancestors and descendants of in the graph are denoted by and , respectively. Furthermore, a vertex is a possible ancestor of and is a possible descendant of if there is a partially directed path from to . The sets of possible ancestors and possible descendants of in graph are denoted by and , respectively. For any vertex set , the ancestor set of in graph is . The sets of possible ancestors and (possible) descendants of in graph can be defined similarly.
A (directed, partially directed, or undirected) cycle is a (directed, partially directed, or undirected) path from a node to itself. The length of a path (cycle) is the number of edges on the path (cycle). The distance between two variables and is the length of the shortest directed path from to . A directed acyclic graph (DAG) is a directed graph without directed cycles, and a partially directed acyclic graph (PDAG) is a partially directed graph without directed cycles. A chain graph is a partially directed graph in which all partially directed cycles are undirected. This indicates that both DAGs and undirected graphs can be considered as specific types of chain graphs.
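Acyclicity, the defining property of a DAG above, can be checked mechanically with a topological sort. The following sketch uses Kahn's algorithm; the adjacency-dictionary encoding is our own illustrative choice, not notation from this paper:

```python
from collections import deque

def is_dag(adj):
    """Check acyclicity of a directed graph given as {vertex: set of children}
    using Kahn's algorithm: repeatedly remove vertices with in-degree zero."""
    indeg = {v: 0 for v in adj}
    for v in adj:
        for w in adj[v]:
            indeg[w] += 1
    queue = deque(v for v in adj if indeg[v] == 0)
    removed = 0
    while queue:
        v = queue.popleft()
        removed += 1
        for w in adj[v]:
            indeg[w] -= 1
            if indeg[w] == 0:
                queue.append(w)
    # A directed cycle exists exactly when some vertices are never removed.
    return removed == len(adj)
```

For instance, `is_dag({'A': {'B'}, 'B': {'C'}, 'C': set()})` accepts a chain, while adding the edge C → A creates a directed cycle and is rejected.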
In a graph , a v-structure is a tuple satisfying with , in which is called a collider. A path is d-separated (blocked) by a set of vertices if (1) contains a chain or a fork with ; or (2) contains a v-structure with , and is d-connected otherwise [20]. Sets of vertices and are d-separated by if and only if blocks all paths from any vertex to any vertex , denoted by . Furthermore, for any distribution P, denotes that and are conditionally independent given . Given a DAG and a distribution P, the Markov condition holds if , while faithfulness holds if . In fact, for any distribution, there exists at least one DAG such that the Markov condition holds, but there are certain distributions that are not faithful to any DAG. Therefore, unlike the Markov condition, faithfulness is often regarded as an assumption. In this paper, unless otherwise stated, we assume that faithfulness holds, that is, . For simplicity, we use the symbol to denote both (conditional) independence and d-separation.
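The d-separation criterion defined above can be tested algorithmically by the classical moralization route: restrict the DAG to the ancestors of the three vertex sets, marry co-parents of a common child, drop edge directions, and check ordinary separation in the resulting undirected graph. A minimal sketch, again with an illustrative adjacency-dictionary encoding:

```python
from collections import deque

def ancestors(adj, nodes):
    """All vertices with a directed path into `nodes`, plus `nodes` itself."""
    parents = {v: set() for v in adj}
    for v in adj:
        for w in adj[v]:
            parents[w].add(v)
    seen, stack = set(nodes), list(nodes)
    while stack:
        v = stack.pop()
        for p in parents[v]:
            if p not in seen:
                seen.add(p)
                stack.append(p)
    return seen

def d_separated(adj, xs, ys, zs):
    """Test whether zs d-separates xs from ys in the DAG adj ({v: children})."""
    keep = ancestors(adj, set(xs) | set(ys) | set(zs))
    # Moral graph on the ancestral set: drop directions, marry co-parents.
    und = {v: set() for v in keep}
    for v in keep:
        for w in adj[v]:
            if w in keep:
                und[v].add(w)
                und[w].add(v)
    for v in keep:
        pars = [p for p in keep if v in adj[p]]
        for i in range(len(pars)):
            for j in range(i + 1, len(pars)):
                und[pars[i]].add(pars[j])
                und[pars[j]].add(pars[i])
    # d-separation holds iff ys is unreachable from xs while avoiding zs.
    blocked = set(zs)
    seen = set(xs) - blocked
    queue = deque(seen)
    while queue:
        v = queue.popleft()
        if v in ys:
            return False
        for w in und[v]:
            if w not in seen and w not in blocked:
                seen.add(w)
                queue.append(w)
    return True
```

On the collider A → C ← B, the empty set d-separates A and B, while conditioning on the collider C d-connects them, matching condition (2) above.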
From the concepts described, it can be inferred that a DAG characterizes the (conditional) independence relationships among a set of variables. In fact, multiple different DAGs may encode the same conditional independence relationships. According to the Markov condition and faithfulness assumption, if the d-separation relationships contained in two DAGs are exactly the same, then these two DAGs are said to be Markov equivalent. Furthermore, two DAGs are Markov equivalent if and only if they share the same skeleton and v-structures [21]. All Markov equivalent DAGs constitute a Markov equivalence class, which can be represented by a completed partially directed acyclic graph (CPDAG) . Two vertices are adjacent in the CPDAG if and only if they are adjacent in all DAGs in the equivalence class. The directed edge in the CPDAG indicates that this directed edge appears in all DAGs within the equivalence class, whereas the undirected edge signifies that is present in some DAGs and in others within the equivalence class [22]. A CPDAG is a chain graph [23] and can be learned from observational data and Meek's rules [24] (Figure 2).
3. The Causal Graphical Model of Potential Factors and Functional Dynamic Target
Let be a set of random variables representing potential factors and be a functional dynamic target, where , for , represents the state of the target at q different time points. Let be a DAG defined over , and let be the subgraph induced by over the set of potential factors . Suppose that the causal network of can be represented by , and when combined with the joint probabilities over , denoted by , we obtain a causal graphical model . Consequently, the data generation mechanisms of and follow a causal Bayesian network model of and a model determined by the direct causes of , respectively. Formally, we define a causal graphical model of the functional dynamic target as follows.
Definition 1. Let be a DAG over , denote the direct causes of in and , be a joint distribution over , and be parameters determining the expectations of the functional dynamic target , which is influenced by . Then, the triple constitutes a causal graphical model for if the following two conditions hold:
Different functional dynamic targets use different mean functions. For example, the optimal mean function of growth curves of different species varies among the Gompertz function, , the Richards function, , the Hossfeld function, , the Logistic function, , and the Double-Logistic function, [25,26,27].
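The growth functions named above have standard textbook forms; since the paper's own formulas and symbols are not reproduced here, the sketch below uses common parameterizations (a: upper asymptote, b and c: shape/rate parameters) that may differ from those in [25,26,27]:

```python
import math

def logistic(t, a, b, c):
    """Three-parameter logistic curve: a is the upper asymptote,
    b the growth rate, c the inflection time (symmetric S shape)."""
    return a / (1.0 + math.exp(-b * (t - c)))

def gompertz(t, a, b, c):
    """Gompertz curve: also S-shaped, but its inflection value is a/e,
    below half the asymptote, giving an asymmetric S."""
    return a * math.exp(-b * math.exp(-c * t))

# Both curves are monotone increasing and bounded by the asymptote a.
ts = [0.1 * i for i in range(101)]
ys = [logistic(t, 100.0, 1.2, 5.0) for t in ts]
gs = [gompertz(t, 100.0, 5.0, 1.0) for t in ts]
```

The Richards, Hossfeld, and Double-Logistic families generalize these shapes with extra parameters controlling the position of the inflection point and the presence of two growth phases.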
A causal graphical model of the functional dynamic target can be interpreted as a data generation mechanism of variables in and as follows. First, the root variables in are generated according to their marginal probabilities. Then, following the topological ordering of the DAG , for any non-root-variable X, when its parent nodes have been generated, X can be drawn from , which is the conditional probability of X given its parent set . Finally, the target is generated by Equation (1). According to Definition 1, the Markov condition holds for the causal graphical model of a dynamic target, that is, for any pair of variables and , the d-separation of and given a set in implies that and are conditionally independent given .
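The three-step generation mechanism just described (roots drawn from their marginals, non-roots from conditionals given their parents, the target from its mean function plus independent noise) can be sketched on a toy two-variable DAG. The structure and distributions below are purely illustrative, not from the paper:

```python
import random

def draw_sample(rng, q=5):
    """One draw from a toy causal model X1 -> X2 -> target, following
    the topological order of the DAG (all distributions are illustrative)."""
    x1 = rng.choice([0, 1])                      # root: marginal P(X1)
    x2 = rng.choice([0, 1]) if x1 == 0 else 1    # non-root: P(X2 | X1)
    # Target: mean curve determined by the direct cause X2, plus noise.
    mean = [(2.0 if x2 else 1.0) * t for t in range(q)]
    y = [m + rng.gauss(0.0, 0.1) for m in mean]
    return x1, x2, y

rng = random.Random(0)
samples = [draw_sample(rng) for _ in range(3)]
```

Note that the target's mean depends on its direct cause X2 only, so given X2 the target is independent of X1, mirroring the Markov condition stated above.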
Given a mean function , we can estimate parameters as follows, where n and q represent the number of individuals and the length of the functional dynamic target, respectively. The residual sum of squares (RSS) is minimized at . The Akaike information criterion (AIC) can be used to select the appropriate mean function to fit the functional dynamic targets. We have AIC = 2k − 2 ln L̂, where k is the number of parameters of the mean function and L̂ is the corresponding maximized likelihood.
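Under Gaussian errors, the AIC comparison can be carried out directly from each candidate's residual sum of squares, since AIC equals n ln(RSS/n) + 2k up to an additive constant. The sketch below compares a constant mean with a straight-line mean; the paper's actual candidates are the nonlinear growth functions above, which require iterative least squares, so this is only a stand-in for the selection step:

```python
import math

def aic_from_rss(rss, n_obs, k):
    """Gaussian AIC up to an additive constant: n*ln(RSS/n) + 2k."""
    return n_obs * math.log(rss / n_obs) + 2 * k

def rss_constant(ts, ys):
    """RSS of a constant-mean fit (k = 1 parameter)."""
    mu = sum(ys) / len(ys)
    return sum((y - mu) ** 2 for y in ys)

def rss_line(ts, ys):
    """RSS of an ordinary least-squares straight line (k = 2 parameters)."""
    n = len(ts)
    tbar, ybar = sum(ts) / n, sum(ys) / n
    b = sum((t - tbar) * (y - ybar) for t, y in zip(ts, ys)) / \
        sum((t - tbar) ** 2 for t in ts)
    a = ybar - b * tbar
    return sum((y - a - b * t) ** 2 for t, y in zip(ts, ys))

ts = list(range(10))
ys = [1.0 + 0.5 * t + 0.01 * ((-1) ** t) for t in ts]   # near-linear toy curve
aic_const = aic_from_rss(rss_constant(ts, ys), len(ts), 1)
aic_line = aic_from_rss(rss_line(ts, ys), len(ts), 2)
```

Here the linear mean wins the AIC comparison despite its extra parameter, because its RSS is orders of magnitude smaller.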
4. Variable Screening for Causal Discovery
For the set of potential factors and the functional dynamic target , our task is to find the direct causes and all causes of up to a given distance. An intuitive method involves learning the causal graph to find all causes of . Alternatively, we could first learn the causal graph and then identify all variables that have directed paths to . However, as mentioned in Section 1, this intuitive approach has three main drawbacks. To address these challenges, we propose a variable screening method to reduce the number of potential factors, and a hypothesis testing method to test for (conditional) independence between potential factors and . By integrating these methods with structural learning approaches, we have developed an algorithm capable of learning and identifying all causes of functional dynamic targets.
Let X be a variable with level K. The variable X is not independent of if there exist at least two values of X, say and , such that the conditional distributions of given and are different. Conversely, if the conditional distribution of given remains unchanged for any x, then X and are independent. Let be the parameter of the mean function of the functional dynamic target with . To ascertain whether the variable X is not independent of , we implement the following test:
Let be the ith sample of the functional dynamic target with . Under the null hypothesis, is modeled as , whereas under the alternative hypothesis, it is modeled as . Let denote the unrestricted log-likelihood of under and let denote the restricted log-likelihood of under . The likelihood ratio statistic is calculated as follows:
Under certain regularity conditions, the statistic approximately follows a chi-squared distribution, whose degrees of freedom are determined by the difference in the numbers of parameters between and , as specified in Equations (2) and (3).
Therefore, by applying the hypothesis tests described in Equations (2) and (3) to each potential factor, we can identify all variables significantly associated with the dynamic target. We denote these significant variables as , defined as . Indeed, since the mean function of the dynamic target depends on its direct causes, which in turn depend on indirect causes, the dynamic target ultimately depends on all its causes. Therefore, when X is a cause of , we can reject the null hypothesis in Equation (2), implying that includes all causes of the dynamic target, assuming no statistical errors. Hence, given a dynamic target , by performing the hypothesis test of against defined in Equations (2) and (3) for each potential factor sequentially, we obtain the set and the corresponding p-values , in which is the p-value of the variable .
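As a hedged illustration of the screening test, under a Gaussian working model the likelihood-ratio statistic reduces to N ln(RSS0/RSS1), where RSS0 pools all samples around one mean curve and RSS1 fits a separate mean curve per level of X. The sketch below uses unrestricted pointwise means in place of the parametric growth-curve fits of Equations (2) and (3), and compares the statistic against a chi-squared quantile would complete the test:

```python
import math

def rss_pointwise(curves):
    """RSS of curves around their pointwise mean (a saturated mean model)."""
    q = len(curves[0])
    mean = [sum(c[t] for c in curves) / len(curves) for t in range(q)]
    return sum((c[t] - mean[t]) ** 2 for c in curves for t in range(q))

def lrt_screen(x, curves):
    """Gaussian working LRT for H0: one shared mean curve vs
    H1: a separate mean curve per level of X."""
    rss0 = rss_pointwise(curves)
    rss1 = sum(rss_pointwise([c for xi, c in zip(x, curves) if xi == lev])
               for lev in set(x))
    n_obs = len(curves) * len(curves[0])
    return n_obs * math.log(rss0 / rss1)

# Toy data: x_assoc shifts the whole curve, x_noise is an unrelated labelling.
x_assoc = [0, 0, 0, 1, 1, 1]
x_noise = [0, 1, 0, 1, 0, 1]
curves = [[xa * 3.0 + 0.1 * ((-1) ** i) + t for t in range(4)]
          for i, xa in enumerate(x_assoc)]
stat_assoc = lrt_screen(x_assoc, curves)
stat_noise = lrt_screen(x_noise, curves)
```

The statistic is large for the genuinely associated factor and near zero for the unrelated one, which is exactly the contrast the screening step exploits.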
A causal graphical model, as described in Definition 1, necessitates adherence to the Markov conditions for variables and the functional dynamic target. Given the Markov condition and the faithfulness assumption, a natural approach to identifying the causes of the functional dynamic target involves learning the causal structure of and subsequently discerning the relationship between each variable and . For significant variables , we present the following theorem, with its proof available in Appendix A.2:
Theorem 1. Suppose that constitutes a causal graphical model for the functional dynamic target as defined in Definition 1, with the faithfulness assumption being satisfied. Let denote the set comprising all variables in that are dependent on . Then, the following assertions hold:
- 1. consists of all causes and the descendants of these causes of , that is, .
- 2. For any two variables , if either or is a cause of , then are not adjacent in if and only if there exists a set such that .
- 3. For any two variables , if there exists a set such that , then are not adjacent in .
The first result of Theorem 1 implies the soundness and rationality of the method for finding mentioned above. The second result indicates that when at least one end of an edge is a cause of , this edge can be accurately identified (in terms of its skeleton, not its direction) using any well-known structural learning method, such as the PC algorithm [28] or the GES algorithm [29]. Contrasting with the second result, the third specifies that for any pair of variables , if a separation set exists in that blocks , then these variables are not adjacent in the true graph . However, the converse does not necessarily hold due to the potential presence of a confounder or common cause , which can lead to the appearance of an extraneous edge between and in the causal graph derived solely from data on . To accommodate this, the CPDAG learned from is denoted as , and the induced subgraph that corresponds to the true graph over is represented as . An illustrative example follows to elaborate on this explanation.
Example 1. In Figure 3, Figure 3a presents a true graph defined over and . Here, the set of significant variables is , and is independent of . Figure 3b illustrates the induced subgraph of the CPDAG over the set , while Figure 3c displays the graph learned through a structural learning method, such as the PC algorithm, applied to . It should be noted that, in , is a separation set of and , that is, . However, since and structural learning only utilize data concerning , no separation set exists for and in . Consequently, and appear adjacent in the learned graph . Furthermore, given and , the structural learning method identifies a v-structure . A similar process yields . Therefore, a bidirected edge appears in the learned graph but not in , as highlighted by the red edge in Figure 3c. Similarly, Figure 3d presents a true graph defined over and . In this scenario, the set of significant variables is identified as , with being independent of . Figure 3e depicts the induced subgraph of the CPDAG over , while Figure 3f illustrates the graph learned through the structural learning method applied to . In , the set acts as a separation set between and , indicating . However, with and structural learning relying solely on data concerning , no separation set for and exists. As a result, and appear adjacent in the learned graph . Furthermore, given and , the structural learning method identifies a v-structure . Therefore, a directed edge is present in the learned graph but not in , as highlighted by the red edge in Figure 3f.

Example 1 illustrates two scenarios in which the graph might include false positive edges that do not exist in . Importantly, these additional false edges cannot appear between the causes of . Instead, they may occur between the causes and noncauses of , or exclusively among the noncauses of , as delineated in Theorem 1. The complete result is given by Proposition A1 in Appendix A.2. Indeed, a more profound inference can be drawn: the presence of extra edges does not compromise the structural integrity concerning the causes of , affecting neither the skeleton nor the orientation.
Theorem 2. The edges in , if they exist, do not affect the skeleton or the orientation of edges among the ancestors of in . Furthermore, we have and , where and are the graphs obtained by adding a node and directed edges from 's direct causes to in graphs and , respectively.
According to Theorem 2, it is evident that although the graph obtained through structural learning does not exactly match the induced subgraph of the CPDAG over corresponding to the true graph, the causes of the functional dynamic target in these two graphs are identical, including the structure among these causes. Thus, in terms of identifying the causes, the two graphs can be considered equivalent. Furthermore, Theorem 2 indicates that all possible ancestors of in are also possible ancestors in , though the converse may not hold. The detailed proof is available in Appendix A.2.
Example 2. The true graph is given by Figure 4a, and the corresponding CPDAG is itself, that is, . In this case, the set of significant variables is . Figure 4b is the induced graph of () over , and Figure 4c is the CPDAG obtained by using the structural learning method on . Then, we have , while . According to the causal graphical model in Definition 1 and the faithfulness assumption, is the sum of the mean function and an independent noise, and the mean function is a deterministic function of ’s direct causes. Therefore, for any nondescendant of , say X, given the direct causes of , that is, , X is independent of . On the contrary, for any , X is a direct cause of if and only if there is no subset such that .
Let be a subset of . For any and , to test the conditional independence , consider the following test: Under the null hypothesis, the parameter depends only on the value of the set , which can be denoted as , while under the alternative hypothesis, the parameter is determined by the values of both and X, which can be denoted as . Let be the log-likelihood of under , and be the log-likelihood of under . The likelihood ratio statistic is
Under certain regularity conditions, the statistic approximately follows a chi-squared distribution, with degrees of freedom equal to .
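The conditional test can be sketched in the same Gaussian working style as the marginal one: stratify the samples by the conditioning set S under the null and by the pair (S, X) under the alternative, and compare the resulting residual sums of squares. As before, unrestricted pointwise mean curves stand in for the parametric mean functions of Equations (5) and (6):

```python
import math

def rss_by(keys, curves):
    """RSS of curves around pointwise means computed within each group,
    where sample i belongs to the group labelled keys[i]."""
    total, q = 0.0, len(curves[0])
    for g in set(keys):
        cs = [c for k, c in zip(keys, curves) if k == g]
        mean = [sum(c[t] for c in cs) / len(cs) for t in range(q)]
        total += sum((c[t] - mean[t]) ** 2 for c in cs for t in range(q))
    return total

def conditional_lrt(x, s, curves):
    """Gaussian working LRT for X independent of the target given S:
    under H0 the mean curve depends on S only, under H1 on the pair (S, X)."""
    rss0 = rss_by(list(s), curves)
    rss1 = rss_by(list(zip(s, x)), curves)
    n_obs = len(curves) * len(curves[0])
    return n_obs * math.log(rss0 / rss1)

# Toy data: the mean depends on s and x_rel; x_irr is unrelated.
s = [0, 0, 0, 0, 1, 1, 1, 1]
x_rel = [0, 0, 1, 1, 0, 0, 1, 1]
x_irr = [0, 1, 0, 1, 0, 1, 0, 1]
curves = [[2.0 * si + 3.0 * xi + 0.1 * ((-1) ** i) + t for t in range(3)]
          for i, (si, xi) in enumerate(zip(s, x_rel))]
stat_rel = conditional_lrt(x_rel, s, curves)
stat_irr = conditional_lrt(x_irr, s, curves)
```

Since the finer (S, X) partition can never increase the RSS, the statistic is nonnegative by construction; it is large only when X carries information about the target beyond S.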
Based on the above results, we propose a screening and structural learning-based algorithm to identify the causes of the functional dynamic target , as detailed in Algorithm 1.
In Algorithm 1, the initial step involves learning the structure over utilizing data related to through a structural learning method, detailed in Lines 1–6. The notation in Lines 3–4 signifies that the connection between X and Y could be either or . We first learn the skeleton of following the same procedure as the PC algorithm (Line 1), with the details in Appendix A.1. Nevertheless, due to the potential occurrence of bidirected edges, adjustments are made in identifying v-structures (Lines 2–5), culminating in the elimination of all bidirected edges. According to Theorem 1, these bidirected edges, which are removed directly (Line 5), are present only between causative and noncausative variables or among noncausative variables of the functional dynamic target. Since these variable pairs are inherently (conditionally) independent, removing such edges does not compromise the (conditional) independence relationships among the remaining variables, as shown in Theorem 2 and Example 1. Subsequently, we designate the set of candidate direct causes as and sequence these variables in ascending order of their correlations with (Lines 7–8). This is because variables with weaker correlation are less likely to be direct causes of . Placing these variables at the beginning of the sequence can quickly exclude non-direct-cause variables in the subsequent conditional independence tests, thereby enhancing the algorithm's efficiency, simplifying its complexity, and reducing the required number of conditional independence tests. Next, we add directed edges from all vertices in to (Line 9) to construct the graph . For each directed edge, say , we check the conditional independence of X and given a subset of (Lines 12–14). In seeking the separation set , the search starts with single-element sets, progressing to sets comprising two elements, and so forth. Upon identifying a separation set, both vertices and directed edges are removed from and , respectively (Lines 15–17). Lastly, if the separation set's size k surpasses that of , implying that no conditional independence of X and can be found given any subset of , the directed edge remains in .
Algorithm 1 SSL: Screening and structural learning-based algorithm
Require: and their corresponding p-values , data sets about and .
Ensure: Causes of .
1: Learn the skeleton of the CPDAG defined on and obtain the corresponding separation sets based on the data set related to via Algorithm A1 in Appendix A.1;
2: repeat
3: Find the structure satisfying in graph ;
4: If , then orient as ;
5: until All structures with in have been tested;
6: Construct the CPDAG by deleting all bidirected edges and using Meek's rules to orient as many undirected edges as possible in graph ;
7: Let ;
8: Sort in ascending order of associations with using ;
9: Let be the graph obtained by adding a node to the graph , and for each , adding a directed edge to the graph ;
10: Set ;
11: while , do
12: for each vertex , do
13: for each subset of with k vertices, do
14: Test the conditional independence using Equations (5) and (6);
15: if , then
16: Delete the directed edge in graph ;
17: Let ;
18: end if
19: end for
20: end for
21: ;
22: end while
23: return .
According to Theorem 1 and the discussion after Example 2, is the set of all direct causes of if all assumptions in Theorem 1 hold and all statistical tests are correct. Further, according to Theorem 2, all ancestors of can be obtained from the graph . Therefore, Algorithm 1 can learn all the causes of correctly.
Note that in Algorithm 1, we first traverse the sizes of the separation set (Line 11) and then, for each given size, traverse all variables in the set and all possible separation sets of that size (Lines 12 and 13) to test the conditional independence of each variable and . That is, we first fix the size of the separation set to 1 and traverse all variables; after all variables have been traversed once, we increase the size of the separation set to 2 and traverse all variables again. The advantage of this arrangement is that it quickly removes the nondirect causes of and reduces the size of the set, thereby reducing the number of conditional independence tests and improving their accuracy. Furthermore, it is worth mentioning that the reason we directly add directed edges from variables in to in graph (Line 9) is that we assume the descendant set of is empty, as shown in Definition 1; in this case, 's adjacent set is exactly the set of direct causes we are looking for. Without this assumption, it would be necessary to examine the variables in 's adjacent set and distinguish the parents from the children.
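The traversal just described (fix the subset size k, sweep all remaining candidates, then increase k) can be sketched as follows. Here `ci_test` stands in for the likelihood-ratio test of Equations (5) and (6) and is supplied as an oracle; the function and variable names are our own:

```python
from itertools import combinations

def prune_direct_causes(candidates, ci_test):
    """Pruning loop of Algorithm 1 (sketch). `candidates` is ordered by
    ascending association with the target; ci_test(x, cond_set) returns True
    when x is independent of the target given cond_set. Marginal (k = 0)
    independence is assumed to have been handled by the screening step."""
    pc = list(candidates)
    k = 1
    while k <= len(pc) - 1:
        for x in list(pc):
            others = [v for v in pc if v != x]
            if any(ci_test(x, set(s)) for s in combinations(others, k)):
                pc.remove(x)   # separating set found: x is not a direct cause
        k += 1
    return pc

# Toy oracle: 'c' is an indirect cause, separated from the target by {'a'}.
def oracle(x, cond):
    return x == 'c' and 'a' in cond

direct = prune_direct_causes(['c', 'a', 'b'], oracle)
```

With this oracle, the weakly associated candidate 'c' is removed at k = 1, and the loop terminates once k exceeds the size of the shrunken candidate set, as in Lines 11–22 of Algorithm 1.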
5. A Screening-Based and Local Algorithm
Based on the previous results and discussions, we can conclude that Algorithm 1 is capable of correctly identifying the causes of a functional dynamic target. However, Algorithm 1 requires recovering the complete causal structure of and . As analyzed in Section 1, learning the complete structure is unnecessary for identifying the causes of the target. Furthermore, Algorithm 1 may be influenced by the distance effect, whereby the correlation between a cause and the target may diminish from the data when the path from the cause to the target is too lengthy. Consequently, identifying this cause variable through observational data becomes challenging, potentially leading to missed causes. Therefore, we propose a screening-based and local approach to address these challenges.
In this section, we introduce a three-stage approach to learn the causes of functional dynamic targets. Initially, utilizing the causal graphical model, we apply a hypothesis testing method to screen variables, identifying factors significantly correlated with the target. Subsequently, we employ a constraint-based method to find the direct causes of the target from these significant variables. Lastly, we present a local learning method to discover the causes of these direct causes within any specified distance. We begin with the introduction of a screening-based algorithm that can learn the direct causes of , as shown in Algorithm 2.
In Algorithm 2, we initially set the set of direct causes and arrange these variables in ascending order of their correlations with (Lines 1–2), as in Algorithm 1. We introduce a set to contain variables determined not to belong to X's separation set, starting as an empty set (Line 3). We then check the conditional independence of each variable with . During the search for the separation set , is set as all subsets of with k variables and is arranged roughly in descending order of their associations with (Lines 7–8). This is because variables that have a stronger correlation with are more likely to be direct causes and are also more likely to belong to the separation sets of other variables. Placing these variables at the beginning of the order can quickly find the separation sets of nondirect causes and remove these variables from , which reduces the number of conditional independence tests and accelerates the algorithm. Once we find the separation set for X and , we remove X from and add X to for each (Lines 11–13). This is because when is the separation set of X and , the variables in appear on the paths from X to . Consequently, X should not be in the separation set for variables in with respect to . Compared with Algorithm 1, introducing in Algorithm 2 improves efficiency and speed. While Algorithm 1 requires examining every subset of (Line 8 in Algorithm 1), Algorithm 2 only needs to evaluate subsets of (Line 7 in Algorithm 2). The theoretical validation of Algorithm 2's correctness is presented below.
Algorithm 2 Screening-based algorithm for learning direct causes of the target
Require: the screened variables and their corresponding p-values, data sets for these variables and the target. Ensure: direct causes of the target.
- 1: Initialize the candidate set with the screened variables;
- 2: Sort the candidate set in ascending order of associations with the target using the p-values;
- 3: Let the ruled-out set be empty for each variable;
- 4: Set k = 0;
- 5: while the candidate set contains more than k variables, do
- 6: for each vertex X in the candidate set, do
- 7: Let the collection of conditioning sets be all k-variable subsets of the remaining candidates that exclude X's ruled-out variables;
- 8: Sort this collection approximately in descending order of associations with the target;
- 9: for each conditioning set S in the collection, do
- 10: Test the conditional independence of X and the target given S using Equations (5) and (6);
- 11: if X and the target are conditionally independent given S, then
- 12: Remove X from the candidate set;
- 13: Add X to the ruled-out set of every variable in S;
- 14: break
- 15: end if
- 16: end for
- 17: end for
- 18: Set k = k + 1;
- 19: end while
- 20: return the candidate set.
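The bookkeeping of Algorithm 2 can be sketched in a few lines. The following Python sketch is illustrative only: `ci_test` is a hypothetical stand-in for the functional conditional independence test of Equations (5) and (6) (returning a p-value), and the screening p-values are used to order both the candidates and the conditioning sets.

```python
from itertools import combinations

def direct_causes(candidates, pvalues, ci_test, alpha=0.05):
    """Sketch of Algorithm 2: screening-based search for direct causes.

    candidates : screened variables for the target
    pvalues    : screening p-value of each variable (smaller = stronger)
    ci_test    : hypothetical callable ci_test(x, cond_set) returning a
                 p-value for "x independent of the target given cond_set"
    """
    # Lines 1-2: candidates in ascending order of association with the
    # target, i.e., largest screening p-value (weakest variable) first.
    dc = sorted(candidates, key=lambda v: pvalues[v], reverse=True)
    # Line 3: variables known not to be in each variable's separation set.
    non_sep = {v: set() for v in candidates}
    k = 0
    while len(dc) > k:                            # Line 5
        for x in list(dc):                        # Line 6
            # Line 7: remaining candidates, excluding x and the
            # variables already ruled out for x.
            pool = [v for v in dc if v != x and v not in non_sep[x]]
            # Line 8: strongest-association variables first.
            pool.sort(key=lambda v: pvalues[v])
            for s in combinations(pool, k):       # Line 9
                if ci_test(x, set(s)) > alpha:    # Lines 10-11
                    dc.remove(x)                  # Line 12
                    for w in s:                   # Line 13
                        non_sep[w].add(x)
                    break                         # Line 14
        k += 1                                    # Line 18
    return dc                                     # Line 20
```

Note that sorting the candidates by descending p-value puts the weakest candidates, which are the easiest to remove, at the front of the search order.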
Theorem 3. If all assumptions in Theorem 1 hold, and there are no errors in the independence tests, then Algorithm 2 can correctly identify all direct causes of the target.
Next, we aim to identify all causes of the target within a specified distance. One natural method is to recursively apply Algorithm 2, starting with the target's direct causes and then expanding to their direct causes, continuing until all causes within the given distance are found. However, the validity of this method for the target relies on the fact that the target has no descendants, so that its adjacent set coincides with its parent set. This is not the case for other variables. Thus, we must further analyze and distinguish the variables in the adjacent sets of other variables. Consequently, we introduce the LPC algorithm in Algorithm 3.
Algorithm 3 LPC algorithm
Require: a target node T, a data set over the variables, a non-PC set. Ensure: the PCD set of T and a set containing all separation relations.
- 1: Initialize the PCD set with all variables except T and the non-PC set; let the set of separation relations be empty; set k = 0;
- 2: while the PCD set contains more than k variables, do
- 3: for each vertex X in the PCD set, do
- 4: if there exists a k-variable subset S of the remaining PCD variables such that T and X are conditionally independent given S, then
- 5: Remove X from the PCD set and add the tuple (T, X, S) to the set of separation relations;
- 6: end if
- 7: end for
- 8: Set k = k + 1;
- 9: end while
- 10: return the PCD set and the set of separation relations.
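The LPC search of Algorithm 3 can be sketched as follows; `ci_test(a, b, cond)` is again a hypothetical conditional independence test returning a p-value, not the paper's implementation.

```python
from itertools import combinations

def lpc(target, variables, non_pc, ci_test, alpha=0.05):
    """Sketch of Algorithm 3 (LPC): candidate Parents/Children/Descendants
    of `target`, with a separation-relation record."""
    # Line 1: all variables except the target and the known non-PC set.
    pcd = [v for v in variables if v != target and v not in non_pc]
    sepset = {}
    k = 0
    while len(pcd) > k:                            # Line 2
        for x in list(pcd):                        # Line 3
            others = [v for v in pcd if v != x]
            for s in combinations(others, k):      # Line 4
                if ci_test(target, x, set(s)) > alpha:
                    pcd.remove(x)                  # Line 5
                    sepset[(target, x)] = set(s)
                    break
        k += 1                                     # Line 8
    return pcd, sepset                             # Line 10
```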
Algorithm 3 aims to learn the local structure of a given target variable T, but in fact, the final set includes T's parents, children, and some descendants. This is because, when verifying conditional independence (Line 4), some variables nonadjacent to T have already been removed in advance (Line 1), so some descendant variables can no longer find their corresponding separation sets.
Example 3. In Figure 5, suppose one of the target's nonadjacent vertices is placed in the non-PC set and removed in advance (Line 1 in Algorithm 3). A conditional independence relation that holds in the full graph may then lose its separating vertex among the remaining variables, so the corresponding variable can no longer be removed from the PCD set. As a result, the output contains a vertex that is a descendant, but not a child, of the target.
Example 3 illustrates that there may indeed be some non-child descendants of the target variable in the PCD set obtained by Algorithm 3. Below, we show that one can identify these non-child descendants by repeatedly applying Algorithm 3. In Example 3, the PCD set of the target contains such a descendant; applying Algorithm 3 to that descendant, we find that its PCD set does not contain the target. Hence, we can conclude that the vertex is a non-child descendant of the target; otherwise, the target would have to appear in its PCD set. Through this symmetric check, we can delete the non-child descendants from the PCD set, so that the set only contains the parents and children of the target variable. Based on this idea, we propose a step-by-step algorithm to learn all causes of a functional dynamic target locally, as shown in Algorithm 4.
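The symmetric check described above can be sketched as follows; `pcd_of` is a hypothetical callable returning a variable's PCD set, e.g., as computed by Algorithm 3.

```python
def parents_children(target, pcd_of):
    """Prune non-child descendants from the PCD set of `target` via the
    symmetric check; `pcd_of(v)` is assumed to return v's PCD set."""
    pc = []
    for v in pcd_of(target):
        # A non-child descendant of `target` does not list `target`
        # in its own PCD set, so asymmetric pairs are discarded.
        if target in pcd_of(v):
            pc.append(v)
    return pc
```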
Algorithm 4 PC-by-PC: finding all causes of the target within a given distance
Require: a target set, a data set over the variables, and the maximum distance m. Ensure: all causes of the target set within distance up to m.
- 1: Initialize the queue with the target set;
- 2: Initialize graph G with directed edges from each vertex in the target set to an auxiliary node L;
- 3: repeat
- 4: Set X to the next unprocessed vertex in the queue;
- 5: Let the non-PC set of X be the variables already separated from X;
- 6: Get the PC set of X and the set of separation relations via Algorithm 3;
- 7: for each vertex V in the PC set of X do
- 8: if X also belongs to the PCD set of V then
- 9: Add an undirected edge between X and V to graph G;
- 10: else
- 11: Remove V from the PC set of X;
- 12: end if
- 13: end for
- 14: Update G by orienting unshielded triples as v-structures whenever the middle vertex is not in the separation set of the two end vertices;
- 15: if X is the last vertex of the queue then
- 16: Update G by orienting undirected edges as much as possible via Meek's rules;
- 17: for each vertex V found in the PC sets of the processed vertices, do
- 18: Add V to the end of the queue if its shortest path to L is shorter than m edges or the m-th edge close to L on that path is undirected;
- 19: end for
- 20: end if
- 21: Move to the next vertex in the queue;
- 22: until X is the last vertex of the queue;
- 23: return G and the discovered causes.
In Algorithm 4, the queue records the vertices to be processed: at each step, X is the next vertex in the queue, and the vertices before it have already been processed. The shortest path from a vertex V to L in graph G can be found by many standard methods, such as Dijkstra's algorithm [30]. Algorithm 4 uses the aforementioned symmetric validation to remove descendants from the PCD set (Lines 7–13); hence, we directly refer to the resulting set as the PC set (Line 6). When our task is to learn all causes of a functional dynamic target, the target set supplied as the algorithm's input is the set of all its direct causes, which can be obtained by Algorithm 2, and the auxiliary node L is exactly the functional dynamic target itself (Line 2). In fact, we can prove this theoretically, as shown below.
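Since the graph here is unweighted, a breadth-first search already yields the shortest path used by Algorithm 4 (Dijkstra's algorithm generalizes this to weighted edges). A minimal sketch over an adjacency-list graph, with illustrative names:

```python
from collections import deque

def shortest_path(graph, start, goal):
    """Breadth-first search for a shortest path in an unweighted graph
    given as {vertex: iterable of neighbours}; returns the list of
    vertices from start to goal, or None if no path exists."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in graph.get(path[-1], ()):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None
```

The number of edges on the returned path is what Algorithm 4 compares against the maximum distance m.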
Theorem 4. If the faithfulness assumption holds and all independence tests are correct, then Algorithm 4 correctly learns all causes of the input target set within a given distance m. Further, if all assumptions in Theorem 1 hold, the input target set is the set of direct causes of the functional dynamic target, and the auxiliary node L in Algorithm 4 is the functional dynamic target itself, then Algorithm 4 correctly learns all causes of the functional dynamic target within a given distance.
Note that the above algorithm gradually spreads outward from the direct causes of the target, and at each step the newly added nodes all belong to the PC sets of previously processed nodes (Line 18); thus, it only involves the local structures around the causes of the target, greatly improving the efficiency and accuracy of the algorithm. Moreover, Algorithm 4 tracks the shortest path between each cause variable and the target. When the m-th edge on a path from the target cannot be oriented, the algorithm continues to expand only along that path, instead of expanding all paths (Line 18 in Algorithm 4), which simplifies the algorithm and reduces the learning of redundant structures.
6. Experiments
In this section, we compare the effectiveness of different methods for learning the direct and all causes of a functional dynamic target through simulation experiments. As mentioned before, to our knowledge, existing structural learning algorithms lack the specificity needed to identify causes of functional dynamic targets, so we only compare the methods we propose, which are as follows:
SSL algorithm: The screening and structural learning-based algorithm given in Algorithm 1, which can learn both direct and all causes of a dynamic target simultaneously;
S-Local algorithm: First, use the screening-based algorithm given in Algorithm 2, which can learn direct causes of a functional dynamic target, and then use the PC-by-PC algorithm given in Algorithm 4, which can learn all causes of a functional dynamic target.
In fact, our proposed SSL algorithm integrates elements of the screening method with those of traditional constraint-based structural learning techniques, as depicted in Algorithm 1. In its initial phase, the SSL algorithm is a modified version of the PC algorithm, extending its capabilities to effectively handle bidirectional edges introduced by the screening process. This extension of the PC algorithm, tailored to address the causes of the dynamic target, positions the SSL algorithm as a strong candidate for a benchmark.
In this simulation experiment, we randomly generate a causal graph consisting of a dynamic target and p = 15, 100, 1000, or 10,000 potential factors. Additionally, we randomly select 1 to 2 variables from these potential factors to serve as direct causes of the target. The potential factors are all discrete with finite levels, while the functional dynamic target is a continuous vector whose mean function is a Double-Logistic function, that is, the sum of two Logistic components, whose parameters are affected by the direct causes of the target. For each causal graph, we randomly generate the corresponding causal mechanism, that is, the marginal and conditional distributions of the potential factors and the functional dynamic target, and generate the simulation data from it. We use different sample sizes n and repeat the experiment 100 times for each sample size. In addition, we adopt adaptive significance levels in the experiment, because as the number of potential factors increases, the strength of screening must also increase. In other words, as the number of potential factors p increases, the significance level of the (conditional) independence test decreases; for example, the significance level is 0.05 for the smallest p, while it is 0.0005 when p = 10,000.
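As a toy illustration of this generating mechanism, the sketch below draws noisy curves whose Double-Logistic mean switches with a binary direct cause. All parameter values and function names here are invented for illustration and are not the paper's simulation settings.

```python
import math
import random

def double_logistic(t, a1, b1, c1, a2, b2, c2):
    """Sum of two Logistic components: an S-shaped, two-stage mean curve."""
    return (a1 / (1.0 + math.exp(-b1 * (t - c1)))
            + a2 / (1.0 + math.exp(-b2 * (t - c2))))

# Illustrative parameter sets: the curve shape switches with a binary cause.
PARAMS = {0: (1.0, 1.5, 2.0, 0.8, 1.2, 6.0),
          1: (1.4, 1.0, 3.0, 1.0, 1.5, 7.0)}

def simulate_target(cause, times, noise_sd=0.1, seed=None):
    """Observed curve = cause-dependent Double-Logistic mean + Gaussian noise."""
    rng = random.Random(seed)
    mean = [double_logistic(t, *PARAMS[cause]) for t in times]
    return [m + rng.gauss(0.0, noise_sd) for m in mean]

times = [i * 0.2 for i in range(51)]   # observation grid on [0, 10]
curve = simulate_target(1, times, seed=0)
```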
To evaluate the effectiveness of different methods, suppose D′ is the set of direct causes of the target learned by an algorithm, and D is the set of true direct causes in the generated graph. Then, let TP = |D′ ∩ D|, FP = |D′ \ D|, and FN = |D \ D′|, and we have
Recall = TP / (TP + FN), Precision = TP / (TP + FP), Accuracy = (p − FP − FN) / p,
where p is the number of potential factors. It can be seen that the recall measures how many of the true direct causes the algorithm has learned. Precision measures how many of the direct causes learned by the algorithm are correct. Accuracy measures the proportion of correct judgments on whether each variable is a direct cause or not. The evaluation indicators for learning all causes can be defined similarly.
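Treating the learned and true cause sets as Python sets, these scores reduce to a short routine (a sketch of the verbal definitions above, not the authors' code):

```python
def cause_scores(learned, true, p):
    """Recall, precision, and accuracy for a learned set of causes
    among p potential factors (both arguments are sets of variables)."""
    tp = len(learned & true)        # correctly identified causes
    fp = len(learned - true)        # non-causes declared to be causes
    fn = len(true - learned)        # causes that were missed
    recall = tp / len(true) if true else 1.0
    precision = tp / len(learned) if learned else 1.0
    accuracy = (p - fp - fn) / p    # correct yes/no calls over all p factors
    return recall, precision, accuracy
```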
The experiment results are shown in Table 1, in which the time column reports the total time (in seconds) consumed by the algorithm, and the recall, precision, and accuracy columns report the average values over the 100 experiments. In addition, subscripts distinguish the methods and indicate whether the algorithms learn direct causes or all causes.
In Table 1, since the SSL algorithm obtains direct and all causes simultaneously through complete structural learning, for the sake of fairness, we only count the total time for both algorithms. It can be seen that the time of the two algorithms is approximately linear in the number of potential factors p. Moreover, when p is fixed, the algorithms take longer as the sample size n increases. In fact, for the SSL algorithm, most of the time is spent on learning the complete graph structure. Therefore, as n increases, the (conditional) independence tests become more accurate, resulting in a larger screened set and a larger graph to learn, which naturally increases the time required. For the S-Local algorithm, more than 99% of the time is spent on optimizing the log-likelihood function during the (conditional) independence tests in the screening stage. As n increases, the optimization time becomes longer and the total time increases accordingly. This also explains why the time of the S-Local algorithm increases linearly with the number of variables, since the number of independence tests required grows roughly linearly. In addition, it can be seen that in most cases the S-Local algorithm takes less time than the SSL algorithm, especially when p is small. However, when p is large, the times of the two algorithms are similar. This is mainly because, in this experimental setting, the mechanism of the functional dynamic target is relatively complex: its mean function is a Double-Logistic function with many parameters, which requires much time to optimize. In fact, even if there is only one binary direct cause, the mean function has 13 parameters. When the mechanism of the functional dynamic target is simpler, the time required by the S-Local algorithm is also greatly reduced. Besides, it should be noted that since more than 99% of the S-Local algorithm's time is spent checking independence in the screening step, and this step can be performed in parallel in practice, the time required can be greatly reduced.
When learning the direct causes, whether measured by recall, precision, or accuracy, the results of the S-Local algorithm are much higher than those of the SSL algorithm, especially in precision. The precision values of the SSL algorithm are very small, mainly because the accuracy of learning the complete graph structure is relatively low, which introduces many non-direct-cause variables into the local structure of the target. Particularly when p is large, it is difficult to correctly recover the local structure of the target. Moreover, it should be noted that, under the same sample size, the recall, precision, and accuracy values obtained by the S-Local algorithm when p is small are not as good as those obtained when p is large. For example, when p = 10,000 and n = 50, the recall, precision, and accuracy all equal 1.000, exceeding the corresponding values for small p. The recall and accuracy values of the SSL algorithm show similar behavior. This result does not violate our intuition, as we use adaptive significance levels in the experiment. When p is large, in order to increase the strength of screening and facilitate the subsequent learning of all causes, we use a smaller significance level. Therefore, the algorithm is more rigorous in judging whether a variable is a direct cause of the target, making it easier to exclude non-direct-cause variables.
When learning all causes, the recall and accuracy values of the SSL and S-Local algorithms increase monotonically with the sample size, and even with many potential factors, both algorithms achieve very good results; for example, when p = 10,000, both algorithms attain very high accuracy values. Overall, the results of the S-Local algorithm are significantly better than those of the SSL algorithm. However, it should be noted that the precision values of the two algorithms show different trends. The precision value of the SSL algorithm increases monotonically with n when p is large, but the trend is not significant when p is small. This is because the SSL algorithm is affected by the distance effect: as n increases, the (conditional) independence tests become more accurate, so many causes that are far from the target can be identified. When p is large, the number of causes far from the target is also large, and the precision of the SSL algorithm therefore gradually increases. However, when p is small, most variables are a short distance from the target. Although the SSL algorithm can then identify more causes (its recall increases), it also includes in the set of causes some non-cause variables that are strongly related to the target, so the precision shows no clear trend. On the other hand, the precision value of the S-Local algorithm increases monotonically with n when p is small, and as p increases this trend gradually turns into a monotonic decrease. This is because when p is small, as n increases, the S-Local algorithm identifies more causes through more accurate (conditional) independence tests. However, when p is large, the number of non-cause variables admitted by the S-Local algorithm grows faster than the number of causes, so the recall value still increases while the precision value gradually decreases.
In other words, in this case there is a trade-off between the recall and precision of the S-Local algorithm. Nevertheless, although the trends in precision differ, the accuracy values of both algorithms increase with the sample size.
It should be noted that the primary objective of the models and algorithms introduced in this paper is to identify the causes of functional dynamic targets, addressing the "Cause of Effect" (CoE) challenge, rather than directly predicting the target. However, based on the causal graphical model for these targets, correctly identifying the target's direct causes is indeed sufficient for making accurate predictions. In the simulation experiment with 15 nodes and 1000 samples, the Mean Squared Error (MSE) of prediction is 0.281 for runs that learn the target's direct causes incorrectly. This figure drops to 0.185 when the causes are correctly identified, a reduction in prediction error of approximately 34%. Additionally, as illustrated in Table 1, the S-Local algorithm demonstrates exceptional accuracy in identifying the direct causes, with a success rate consistently above 98% in most cases. This high level of accuracy indicates that our algorithms also perform well in predicting the target.
7. Discussion and Conclusions
In this paper, we first establish a causal graphical model for functional dynamic targets and discuss hypothesis testing methods for the (conditional) independence between random variables and functional dynamic targets. To deal with situations where there are too many potential factors, we propose a screening algorithm that selects, from a large number of potential factors, the variables significantly related to the functional dynamic target. On this basis, we propose the SSL and S-Local algorithms to learn the direct causes and all causes within a given distance of functional dynamic targets. The former combines the screening algorithm with structural learning methods to learn both the direct and all causes simultaneously by recovering the complete graph structure over the screened variables. Its disadvantage is that learning the complete structure of the graph is difficult and redundant, and it is also affected by the distance effect, resulting in low accuracy in learning causes. The latter first uses a screening-based algorithm to learn the direct causes of functional dynamic targets, and then uses our proposed PC-by-PC algorithm, a step-by-step local learning algorithm, to learn all causes within a given distance. The advantage of this algorithm is that all learning is confined to the local structures of the current nodes, so the algorithm is no longer affected by the distance effect. In fact, this algorithm only focuses on the local structure of each cause variable, rather than learning the complete graph structure, greatly saving time and space.
Moreover, the algorithm not only respects the distance bound but also identifies the shortest path between each cause variable and the functional dynamic target, so it does not need to recover the whole structure of a region but only the part of the local structure involving the cause variables, further reducing the learning of redundant structures.
It should be noted that when the causal mechanism of functional dynamic targets is very complex, the time required for the S-Local algorithm may greatly increase. In addition, the choice of significance level will also have an impact on the precision of the algorithm. Thus, how to simplify the causal model of functional dynamic targets and how to reasonably choose an appropriate significance level are two directions of our future work.