Article

Relaxation of Some Confusions about Confounders

by Ádám Zlatniczki 1,2, Marcell Stippinger 3, Zsigmond Benkő 3, Zoltán Somogyvári 3 and András Telcs 1,2,3,4,*
1 Department of Computer Science and Information Theory, Budapest University of Technology and Economics, H-1111 Budapest, Hungary
2 Ericsson Hungary, H-1117 Budapest, Hungary
3 Wigner Research Centre for Physics, H-1121 Budapest, Hungary
4 Department of Quantitative Methods, University of Pannonia, H-8200 Veszprém, Hungary
* Author to whom correspondence should be addressed.
Entropy 2021, 23(11), 1450; https://doi.org/10.3390/e23111450
Submission received: 2 October 2021 / Revised: 24 October 2021 / Accepted: 26 October 2021 / Published: 31 October 2021
(This article belongs to the Special Issue Causal Discovery of Time Series)

Abstract

This work is about observational causal discovery for deterministic and stochastic dynamic systems. We explore what additional knowledge can be gained by the use of standard conditional independence tests and by the assumption that the interacting systems are located in a geodesic space.

1. Introduction

It is not necessary to emphasize the importance of the concept of causality in science, and in the natural sciences in particular. The concept traverses all disciplines, and it is a matter of extensive research fueled by the exponentially increasing availability of scientific data and computation power. Revealing causal relations between systems via the time series they produce is one of the most attractive challenges. The first major advancement was due to Granger, who used an auto-regressive framework for a practical implementation of the predictive causality principle of Wiener [1].
The very popular Granger [2] method has some theoretical and practical limitations. It is not able to detect a hidden common cause and, instead, indicates a false directional causal relation between the observed systems (for details of all the pros and cons, cf. [3]). Several further methods have appeared in the last two decades (for a concise review, see Runge [4] or [5,6]). One of the most prominent is the convergent cross mapping method developed by Sugihara [7] to investigate deterministic dynamic systems, which essentially utilizes Takens’ embedding theorem [8]. Stark [9,10] generalized Takens’ result and showed the theoretical limitations of applying it to stochastic dynamic systems. For deterministic dynamics, a new approach was presented in a recent work [11], based on comparing the dimensions of the attractors of the given systems and of their joint observation.
The present paper investigates the causal relation of a pair of dynamic systems (which might be deterministic or stochastic). Facts are revealed that, to the best of our knowledge, have escaped the attention of previous studies. We show that if there is dependence between the systems at the smallest positive time difference, then the common driver is an i.i.d. sequence, i.e., shared observational noise. We also show that if the pair is located in a non-abstract physical space where the speed of information transfer is known, then the direct causation and common cause cases can be distinguished, which, in general, is theoretically impossible.

Basic Definitions

First, we provide the framework of our investigation. Our aim is to find the causal relationship between two stochastic dynamic systems X and Y, from which we observe the time series $\{x_i\}_{i=1}^n$ and $\{y_i\}_{i=1}^n$.
Assumption 1.
We assume that there is a set of systems $S = \{X, Y, L\}$, with $X \in \mathbb{R}^{d_X}$, $Y \in \mathbb{R}^{d_Y}$, $L \in \mathbb{R}^{d_L}$ and $D = d_X + d_Y + d_L$, and an external source of noise W, such that the process $m_i = (s_i, \omega_i)$, with $s_i = (x_i, y_i, l_i^1, \ldots, l_i^m)$, $m_i \in \mathbb{R}^{2D}$, $s_i \in \mathbb{R}^{D}$, $\omega_i \in \mathbb{R}^{D}$, has a joint distribution. The series $l^n$ are unobserved, hidden series that are not i.i.d. (see Figure 1).
In what follows, we will use $\xi$ for $\omega^1$, emphasizing that it influences x, and similarly $\eta$ for $\omega^2$, which influences y.
For brevity, we will use the multi-index of the involved dimensions: $\underline{d} = (d_X, d_Y, d_L)$.
Assumption 2.
The external noise $\omega_i \in \mathbb{R}^{D}$ is modeled as an unobserved i.i.d. sequence and affects all the systems through independent components $\xi$, $\eta$, $\omega^l_i$. Furthermore, $\omega_{i+1}$ is independent of $S_0^i = \{s_j\}_{j=0}^i$.
Assumption 3.
The process m i is stationary.
Assumption 4.
The causal structure of the time series is time invariant and non random.
In what follows, we use the expression “drives” for all the terms “causes”, “influences” and “injects information” in relation to dynamic systems.
Following [12], we use the next model. The visible and invisible systems can be described by an order-p Structural Vector Auto-Regressive (SVAR) model:

$m_{n+1} = f(m_n, \ldots, m_{n-p+1}, \omega_{n+1})$    (1)

$n \in \mathbb{N}$, and $m_0$ follows the stationary distribution of the system. (It is a SVAR($\underline{d}$, p) process, where $\underline{d}$ is the multi-index of dimensions of the variables and p is the order of the auto-regression.)
The recursion can clearly be transformed with time-delay embedding into a higher-dimensional first-order SVAR, in particular into SVAR($2D \times p$, 1), in short SVAR(1), with $M_n = (m_n, m_{n-1}, \ldots, m_{n-p+1})$:

$M_{n+1} = g(M_n, \omega_{n+1})$    (2)
We make the same restriction as in [12] that (2) must be recursive in the variables, which ensures that there is no directed functional cycle. Variables with capital letters denote the same “embedding” as in (1).
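The time-delay embedding of (1) into the first-order form (2) can be sketched in a few lines of code. This is a minimal illustration of the stacking construction only; the toy map `f`, the dimensions, and the noise value are hypothetical choices, not taken from the paper:

```python
import numpy as np

def step_svar1(M, f, p, dim, omega):
    """One step of the embedded first-order system M_{n+1} = g(M_n, omega_{n+1}),
    where M_n stacks the last p states (m_n, ..., m_{n-p+1}) and f is the
    original order-p map m_{n+1} = f(m_n, ..., m_{n-p+1}, omega_{n+1})."""
    blocks = [M[i * dim:(i + 1) * dim] for i in range(p)]   # m_n, ..., m_{n-p+1}
    m_next = np.atleast_1d(f(*blocks, omega))
    # shift register: prepend the new state, drop the oldest block
    return np.concatenate([m_next, M[:(p - 1) * dim]])

# toy order-2 scalar map: m_{n+1} = 0.5 m_n + 0.3 m_{n-1} + omega
f = lambda m0, m1, w: 0.5 * m0 + 0.3 * m1 + w
M = np.array([1.0, 2.0])                  # M_1 = (m_1, m_0)
M = step_svar1(M, f, p=2, dim=1, omega=0.1)
# M is now M_2 = (m_2, m_1) = (1.2, 1.0)
```

The stacked state is a pure bookkeeping device: the dynamics live entirely in the first block, and the remaining blocks shift forward unchanged.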
Assumption 5.
The process M is exact (cf. Definition 4.3.2. [13]).
Exactness means that if the process starts from a set with positive probability, then, after a long time, the set in which it can be found has probability one. It is natural to assume exactness given that we work with an observation: the support of the observed process will, for us, be the whole set where the process can run and, consequently, has probability one. On the other hand, exactness implies mixing for stationary processes, and a mixing stationary process is α-mixing (or strong mixing). Let us note that ergodicity also follows from strong mixing, but we do not need that fact (see also [13]).
In our discovery scheme, we may allow instantaneous causation between all variables; however, we do not elaborate on that case here. For brevity, that is not reflected in (2). We note that a system like (2) with contemporaneous interaction but without a directed cycle can be rewritten into the form of (2) using time shifts thanks to the acyclic recursivity.
Definition 1.
We will say that X drives Y if there is a $k > 0$ such that

$Y_{n+k} = f(X_n, Y_n, L_n, \eta_{n+k})$    (3)

where $L_n$ stands for the set of latent variables, $\eta_{n+k}$ is an i.i.d. sequence that is independent of $\{X_i, Y_i, L_i\}_{i=0}^n$, and $X_n$ cannot be omitted without violating the validity of (3).
Let us explain that key definition. It says that there is no function g such that

$Y_{n+k} = g(Y_n, L_n, \eta_{n+k})$

whose existence would make explicit that $Y_{n+k}$ can be created without $X_n$. Here, one should also observe that the i.i.d. part is the same as in (3), so there is no possibility for an i.i.d. $X_n$ to be hidden in $\eta_{n+k}$.

2. Causal Discovery Schemes

The literature on causal discovery is huge. This work has been inspired by two recent contributions, with their strengths and limitations. First, we found the framework defined by Malinsky in [12] very appealing, while the complex set of assumptions and the suggested algorithm in [14] presented an essential challenge. The algorithm in [12] is an extension of [15,16]. It provides a theoretically complete recall of the underlying causal structure, at the price that some relations are marked undetermined and some causal relations are not, or only partially, revealed.
In [14], in addition to many other assumptions, it is assumed that all hidden processes that influence an observed one have no memory (Assumption A9 in [14]). That assumption, like A6 in [14], cannot be checked. In [12], such restrictions are eliminated. That paper, like most works based on Pearl’s DAG analysis, has theoretical limitations, as admitted in [12]. In what follows, we investigate some situations in which those limitations can be relaxed.
Information from X to Y can be transferred along a chain of direct causal links, i.e., along a directed path $\pi(X, Y)$. The length of the path (the number of intermediate components plus one) is denoted by $l = l(X, Y) = l(\pi(X, Y))$. Such a path has a starting and an ending time, $n$ and $n + l$ (for arbitrary $n \ge 0$, $l > 0$); the difference is the time lag.
Assumption 6.
We assume that, with some background information, the minimal lag between the systems X and Y can be determined.
We consider the smallest lag $\tau$ for which dependence can be detected in the “direction” X to Y:

$\tau = \tau_{X,Y} = \min_{\pi} l(\pi(X, Y)) > 0$
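As a toy illustration of Assumption 6, the minimal lag can be searched for by scanning candidate lags and testing for lagged dependence. Here a plain Pearson correlation with an arbitrary cutoff stands in for a proper (conditional) dependence test; both the statistic and the cutoff are illustrative assumptions, not the paper’s method:

```python
import numpy as np

def minimal_lag(x, y, max_lag, cutoff=0.2):
    """Return the smallest positive lag at which dependence from x to y
    is detected, or None if none is found up to max_lag. The lagged
    correlation is a crude stand-in for a proper dependence test."""
    for k in range(1, max_lag + 1):
        r = np.corrcoef(x[:-k], y[k:])[0, 1]
        if abs(r) > cutoff:
            return k
    return None

rng = np.random.default_rng(0)
x = rng.standard_normal(500)
y = np.roll(x, 3) + 0.1 * rng.standard_normal(500)   # y lags x by 3 steps
# minimal_lag(x, y, max_lag=10) returns 3
```

In practice the cutoff would be replaced by a significance level obtained from a permutation or analytic null distribution.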

2.1. The Decomposable Case

We introduce our notation. In order to save space, let $(A, B) = (X, Y)$ or $(Y, X)$. Let I stand for the Shannon entropy/differential entropy based mutual information. We define conditional mutual information between elements of the time series $a_n$, $b_n$, and similarly for other series. A segment from k to l of a time series $a_n$ is denoted by $A_k^l = (a_k, \ldots, a_l)$. Such segments are used in the condition, representing a part of or the full past. In order to investigate whether there is information transfer from B to A with a given time lag $\tau_{B,A}$, we use the conditional mutual information between $a_{n+\tau_{B,A}}$ and $b_n$ given the full past of both series, $A_0^{n+\tau_{B,A}-1}$ and $B_0^{n-1}$, and we denote it by $I_B$. We define the following conditional mutual information values:

$I_B = I(a_{n+\tau_{B,A}}; b_n \mid A_0^{n+\tau_{B,A}-1}, B_0^{n-1})$,
$I_{A,B}^k = I(a_n; b_{n+k} \mid A_0^{n-1}, B_0^{n+k-1})$, for any $0 \le k < \tau_{A,B}$.
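The quantities $I_B$ and $I_{A,B}^k$ require estimating conditional mutual information from data. As a minimal sketch, under a joint-Gaussian assumption the CMI reduces to log-determinants of covariance matrices; real applications would use nonparametric estimators such as the nearest-neighbor one cited later [19]. All variable names and the generated data below are illustrative:

```python
import numpy as np

def gaussian_cmi(x, y, z):
    """I(x; y | z) for (approximately) jointly Gaussian data:
    0.5 * (log|C_xz| + log|C_yz| - log|C_z| - log|C_xyz|).
    x, y are 1-D samples; z holds the conditioning variable(s)."""
    def logdet(*cols):
        m = np.column_stack(cols)
        c = np.asarray(np.cov(m, rowvar=False)).reshape(m.shape[1], m.shape[1])
        return np.linalg.slogdet(c)[1]
    return 0.5 * (logdet(x, z) + logdet(y, z) - logdet(z) - logdet(x, y, z))

rng = np.random.default_rng(1)
n = 5000
z = rng.standard_normal(n)                  # common driver
x = z + 0.5 * rng.standard_normal(n)
y = z + 0.5 * rng.standard_normal(n)        # depends on x only through z
y2 = x + 0.5 * rng.standard_normal(n)       # directly driven by x
# gaussian_cmi(x, y, z) is near 0, while gaussian_cmi(x, y2, z) stays positive
```

Conditioning on the driver z removes the spurious dependence between x and y, while the directly driven copy y2 keeps positive CMI given an unrelated condition.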
Proposition 1.
Let, for L, A, B ($(A, B) = (X, Y)$ or $(Y, X)$),

$\delta = \delta_{L,A,B} = \tau_{L,B} - \tau_{L,A} \ge 0$.

Under Assumptions 1–6, for $\delta_{L,A,B} = k$, $0 \le k < \tau_{A,B}$, the following implications hold.
    I_A     I_B    |  A → B   B → A
    = c     = 0    |  yes     no
    = 0     = c'   |  no      yes
    = c     = c'   |  yes     yes
    = 0     = 0    |  no      no

        ×

    I_{A,B}^k         |  CD
    = 0 for all k     |  no
    > 0 for some k    |  yes

Relation 1. Logical relations between conditional mutual information values and causal relations, where CD stands for Common Driver and $c, c', c'' > 0$. In the right table, = 0 means that $I_{A,B}^k = 0$ holds for all $0 \le k < \tau_{A,B}$, while > 0 means that there is at least one such k for which $I_{A,B}^k > 0$.
The proposition summarizes the possible inferences in a concise way. Relation 1 is the direct product of two tables of cases: the header of each table contains, on the left, the quantities that are decisive and, on the right, the possible causal scenarios. As an example, in the left table, the first row shows that if and only if $I_B = 0$ but $I_A = c > 0$ (significantly different from zero), then B does not drive A but A drives B. In the right table, $I_{A,B}^k = 0$ for all $0 \le k < \tau_{A,B}$ means that there is no common information between members of the series for $k < \tau_{A,B}$; in the opposite case, there must be a common driver, given that there is shared information that cannot be attributed to driving with a lag below $\tau_{A,B}$. If $\delta = \tau_{X,Y}$ (or $= \tau_{Y,X}$), then causation between X and Y and a common driver may coexist, and we cannot separate those models. In the next section, we provide some observations on that situation.
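Read as a decision rule, Relation 1 can be coded directly from estimated CMI values. The significance cutoff `eps` below is an illustrative stand-in for a proper statistical test, and the function is only a sketch of the table’s logic:

```python
def relation1_verdict(I_A, I_B, I_AB_values, eps=1e-3):
    """Map the estimated CMIs of Relation 1 to causal statements.
    I_A tests driving A -> B, I_B tests B -> A, and I_AB_values holds
    the estimates of I_{A,B}^k for 0 <= k < tau_{A,B} (common-driver test)."""
    verdict = []
    if I_A > eps:
        verdict.append("A drives B")
    if I_B > eps:
        verdict.append("B drives A")
    if not verdict:
        verdict.append("no driving detected")
    # right-hand table: common driver iff some I_{A,B}^k is positive
    if max(I_AB_values) > eps:
        verdict.append("common driver present")
    else:
        verdict.append("no common driver")
    return verdict

# example: only A -> B driving, no residual shared information
# relation1_verdict(0.4, 0.0, [0.0, 0.0]) == ["A drives B", "no common driver"]
```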

2.2. The Confounder Case

We assume that $\delta_{L,X,Y} = \tau_{X,Y}$ but $\tau > 0$. If $\tau > 0$, we can investigate the common information between $X_{n+1}$ and $Y_{n+\tau}$. Unfortunately, the variables $X_n$, $Y_{n+\tau}$ have a confounder; therefore, we cannot tell which causal relation is behind the dependence. However, some internal structure can be revealed. In line with the assumptions $\delta_{L,X,Y} = \tau_{X,Y}$ and $\tau > 0$, we assume that

$I_0 = I(X_n; Y_{n+\tau} \mid X_0^{n-1}, Y_0^{n+\tau-1}) > 0$    (6)

$I_1 = I(X_{n+1}; Y_{n+\tau} \mid X_0^{n}, Y_0^{n+\tau-1}) = 0$    (7)
Let $b_1$ be the information that is passed from $X_n$ to $Y_{n+\tau}$, or $b_i$, $i = 1, 2$, the information passed from an L to both (depending on which information transfer takes place). We also let $a_1$ be the information passed from $X_n$ to $X_{n+1}$, as Figure 2 shows.
From (7), we have that $b_1$ is independent of $a_1$ and that $b_1$ is independent of $b_2$. Thus, the information $b_n$ injected into $X_n$ and $Y_n$ from L is an i.i.d. sequence. A similar argument shows that the information $c_1$ passed from $Y_{n+\tau}$ to $Y_{n+\tau+1}$ is independent of $b_2$. We still cannot decide whether X drives Y or L drives both; however, in the latter situation, we may say that L emits observational noise for X and does not influence its evolution (the values $a_i$). Alternatively, we may consider $b_i$ as the “part” of X that is injected into Y. Let us note that L itself is not necessarily an i.i.d. sequence, but, from the point of view of its impact on X and Y, that is indifferent.
One may appeal to the Occam’s razor principle (if other background knowledge does not dictate otherwise) and take L itself to be an i.i.d. process. Whether b is part of X or external noise cannot be decided without further knowledge; referring again to Occam’s razor, we may assume that there is no third system acting as a common driver, but rather that X injects an i.i.d. sequence into Y.

2.3. Geodesic Spaces

Now, we investigate the case when the subsystems of M are located in a geodesic metric space with unique geodesics between any pair of points. We assume that the information transfer speed is uniform and constant in the space, regardless of the locations of the source and the target. Under that assumption, we can speak interchangeably about distance in space and in time.

2.4. Strict Reversed Triangular Inequality

If $\delta = \min_L \delta_{L,X,Y}$ and

$\delta > \tau$

then we have

$\tau_{L,Y} > \tau_{L,X} + \tau_{X,Y}$,

the reversed, strict triangular inequality. If, in addition, there is shared information between $X_n$ and $Y_{n+\tau}$, then no L can be a common driver of $X_n$ and $Y_{n+\tau}$ (cf. Figure 3), so direct driving must take place from X to Y.

2.5. Strict Triangular Inequality

On the other hand, if for an L

$\delta_{L,X,Y} < \tau_{X,Y}$

then we have

$\tau_{L,Y} < \tau_{L,X} + \tau_{X,Y}$,

and if $X_n$ and $Y_{n+\tau}$ have positive conditional mutual information conditioned on the past, then only L, the common driver, can produce it, not causation (see Figure 4).

2.6. The Equality, the Confounder Case

Finally, if
$\tau_{X,Y} = \delta_{L,X,Y}$,

$\tau_{L,Y} = \tau_{L,X} + \tau_{X,Y}$,

we have a confounder.
If the metric space has a unique geodesic from L to Y, then X should be on the geodesic between L and Y, and this means that the information from L either enters X along the path to Y or avoids it in a tricky way by an infinitesimal detour, as Figure 5 depicts.
In the former case, we have no confounder but the causal chain $L \to X \to Y$. This is a situation that, again, cannot be resolved without additional information about the actual systems under scrutiny. Economists would call such an L an instrumental variable.
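Sections 2.4–2.6 amount to a three-way comparison of the lag difference with the direct lag. A compact sketch, assuming unique geodesics and a constant transfer speed as above (function and message strings are illustrative):

```python
def geodesic_verdict(tau_LX, tau_LY, tau_XY):
    """Classify the relation of X and Y given the lags from a candidate
    common driver L, under the geodesic-space assumptions of the text."""
    delta = tau_LY - tau_LX          # lag difference delta_{L,X,Y}
    if delta > tau_XY:               # reversed strict triangular inequality
        return "direct driving X -> Y; L cannot be a common driver"
    if delta < tau_XY:               # strict triangular inequality
        return "hidden common driver L, not direct causation"
    return "confounded: driving X -> Y and/or chain L -> X -> Y"

# geodesic_verdict(2, 7, 3) reports direct driving, since 7 - 2 > 3
```

Only the equality branch is genuinely undecidable without further information about the systems, as the text explains.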
Now, let us recall that the inequalities (8) and (10) read as

$\tau_{L,Y} > \tau_{L,X} + \tau_{X,Y}$,

$\tau_{L,Y} < \tau_{L,X} + \tau_{X,Y}$.

The latter is the strict triangular inequality, and the former is its converse (both with strict inequality). Here, we arrive at the interpretation of causation in M. If it is a system in an abstract space without metric properties, there is no point in speaking about distances in it, and there is no link between information transfer time (lag, in short) and distances.
On the other hand, if
  • the system M is located in a geodesic metric space,
  • the geodesics are unique,
  • the information propagates along the geodesics, and
  • the information transfer has a constant speed,
then distances are proportional to the delays, with the same constant factor for all members. The triangular inequality is inherited from distances to lags. In a metric space with unique geodesics, like the Euclidean, hyperbolic, and spherical spaces (except when X and Y are positioned antipodally on the sphere), the triangular inequality holds; thus, (13) is impossible, and L cannot be a common driver that mimics driving or acts in parallel to a driving between X and Y. Let us note that the triangular inequality holds for space-like vectors in Minkowski space, while the converse holds for time-like positions. Finally, the case of strict equality needs further investigation.
In the case of different transfer speeds, the picture is more complex, and the above geometric consideration is applicable only in particular settings. In the human brain, information transfer has different speeds depending on the transfer mode: via sequences of neural cells, long axon bundles, or volume or surface currents. The transfer speed depends on the number of intermediate relay nodes of the network as well. Consequently, the causality analysis of brain regions needs detailed information on the connection types and speeds between them. It is likely that in many other topical areas, like climate science and geophysics, specific knowledge of the metric properties and transfer speeds may contribute to the success of causal discovery. In other areas, there is no information about the temporal arrangement of the unobserved factors, and consequently, revealing a perfect description of the causal structure seems impossible.

2.7. Conditions and Mixing

Let us recall here that all the methods based on Pearl’s DAG analysis use d-separation (or causal Markovness) via a conditional independence test (CIT) in which the parents are the conditioning variables. As such, they need access to the parents, which is impossible if those are not observed, and the computation cost can be prohibitive for large networks. Note that d-separation uses the parents as a cut set in the DAG. In Section 2, we used the full past of both observed processes. In practice, it is impossible to put the whole past in the condition; therefore, we should work with a shorter history. Let us consider, as an example, the case when $0 \le k < \tau_{X,Y}$, which means that there is no information transfer from $x_n$ to $y_{n+k}$, and investigate $I(x_n; y_{n+k} \mid X_0^{n-1}, Y_0^{n+k-1})$. If $I(x_n; y_{n+k} \mid X_0^{n-1}, Y_0^{n+k-1}) = 0$, i.e., there is no hidden common driver, one can show that

$I(x_n; y_{n+k} \mid J_d) \to 0$

as $d \to \infty$, where $J_d = (X_{n-d}^{n-1}, Y_{n-d}^{n+k-1})$. For the proof, see Appendix A.
With this argument, we have that the convergence of the conditional mutual information to a positive constant or to zero determines whether there is a driving between X and Y and whether there is a common driver (as indicated in Relation 1). Under Assumption 5, it is evident that if there is a hidden common driver, the information is passed along a fixed-length path from the common cause to X and Y, and its effect on the dependency does not diminish. If there is no common driver, the exchanged information should traverse longer and longer paths, and the Conditional Mutual Information (CMI) should go to zero as d goes to infinity.
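The decay of the CMI with the history length can be made concrete in the simplest Gaussian case: for a stationary AR(1) chain, the mutual information between states d+1 steps apart is available in closed form and vanishes geometrically, mirroring the mixing argument of Appendix A. The AR(1) model and the coefficient 0.8 are illustrative choices, not the paper’s setting:

```python
import numpy as np

def ar1_mutual_info(a, d):
    """I(M_0; M_{d+1}) for a stationary Gaussian AR(1) process with |a| < 1:
    corr(M_0, M_{d+1}) = a**(d+1), and for a jointly Gaussian pair
    I = -0.5 * log(1 - rho**2), which decays geometrically to zero."""
    rho = a ** (d + 1)
    return -0.5 * np.log(1.0 - rho ** 2)

vals = [ar1_mutual_info(0.8, d) for d in (0, 5, 10, 20)]
# monotonically decreasing toward zero as the separation d grows
```

A fixed-length path from a hidden common driver would instead keep this quantity bounded away from zero, which is exactly the dichotomy exploited in the text.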
The conditional independence test (and proper estimate of CMI) has recently been the focus of research motivated by applications in machine learning and artificial intelligence. This is known to be a challenging task (cf. [6,12,17,18,19]).

3. Related Works and Discussion

There are numerous extensions and refinements of the original PC algorithm of Spirtes and Glymour [24] and of Pearl’s framework [20]. This applies to the study of causal discovery for dynamic systems based on observed time series as well. We mention some prominent works [4,6,12,21,22] and refer to their bibliographies for further reading (see also the extended surveys [5,23]). The recent works [12,14] (see also [23]) take a very similar approach to the present one. In particular, we also use the structural modeling framework; however, we limit our focus to the discovery of a causal relation between a pair of systems. The method can be extended to the study of many time series by considering vector-valued observations and/or many pairwise investigations.
The capabilities and limitations of causal discovery algorithms were investigated in detail in the seminal works [15,16,20,24] and recently in [14,21]. The recent generalizations are complete. They extend the labeling of edge ends of classical DAGs, while completeness does not mean that all relations are well specified. Completeness means that all the possible MAGs (Maximal Ancestral Graphs) can be created.
In this paper, we used an essential assumption and two unavoidable approximations. First, we assumed that the continuous-time process can be inferred from a discrete-time, limited-resolution time series observation. Next, we assumed that the discrete-time process can be well approximated by an order-p SVAR model. Finally, if the processes contain continuous variables, the condition is not restricted to a single state value but to a set of them, and, as a consequence, it does not perfectly block the information flow between the marginal variables. This deficiency might be eliminated by the local permutation method proposed by Runge in [19].

Author Contributions

Conceptualization, Á.Z. and A.T.; Formal analysis, A.T.; Methodology, Á.Z. and A.T.; Writing—original draft, A.T.; Writing—review and editing Z.B., M.S. and Z.S. All authors have read and agreed to the published version of the manuscript.

Funding

Z.B., Z.S. and A.T. were partially supported by an award from the National Brain Research Program of Hungary (NAP-B, KTIA NAP 12-2-201). Z.S. was supported by the Hungarian National Research, Development and Innovation Fund (NKFIH) under grant number K 135837.

Data Availability Statement

Not applicable.

Acknowledgments

We thank Roberta Rehus for editing the paper.

Conflicts of Interest

The authors declare no conflict of interest.

Appendix A

We show that if

$I(x_n; y_{n+k} \mid X_0^{n-1}, Y_0^{n+k-1}) = 0$    (A1)

then it can be confirmed by collecting evidence that

$I(x_n; y_{n+k} \mid X_{n-d}^{n-1}, Y_{n-d}^{n+k-1}) \to 0$    (A2)

as d tends to infinity. Let us assume that (A1) holds. Since $k < \tau_{X,Y}$ and there is no common cause, any information shared by $x_n$ and $y_{n+k}$ should come from their past.
Let us introduce the short notation $J_d = (X_{n-d}^{n-1}, Y_{n-d}^{n+k-1})$ and estimate $I(x_n; y_{n+k} \mid X_0^{n-1}, Y_0^{n+k-1})$ from above using the monotonicity of the conditional mutual information:

$I(x_n; y_{n+k} \mid X_0^{n-1}, Y_0^{n+k-1}) \le I(x_n; y_{n+k} \mid X_{n-d}^{n-1}, Y_{n-d}^{n+k-1}) \le I(X_n^{n+k}; Y_n^{n+k} \mid J_d)$.

In the first step, we use monotonicity, then the two assumptions and monotonicity again. Next, we use that the shared information comes from the past, and then monotonicity again, in the second step:

$I(X_n^{n+k}; Y_n^{n+k} \mid J_d) \le I(M_n^{n+k}; M_0^{n-d-1} \mid J_d) \le I(M_n^{n+k}; M_0^{n-d-1})$.
Figure A1. The figure shows why the conditions do not block the common driver.
Now, we use the fact that m is a first-order Markov chain, then that it is time homogeneous, and finally that exactness implies that it is α-mixing:

$I(x_n; y_{n+k} \mid X_0^{n-1}, Y_0^{n+k-1}) \le I(M_n^{n+k}; M_0^{n-d-1}) = I(M_{d+1}; M_0) \le \alpha_{d+2} \to 0$

as $d \to \infty$, due to the strong mixing property following from the exactness assumption (Assumption 5).

References

  1. Wiener, N. The theory of prediction. In Modern Mathematics for Engineers; Beckenbach, E., Ed.; McGraw-Hill: New York, NY, USA, 1956. [Google Scholar]
  2. Granger, C. Investigating causal relations by econometric models and cross-spectral methods. Econometrica 1969, 37, 424–438. [Google Scholar] [CrossRef]
  3. Maziarz, M. A review of the Granger-causality fallacy. J. Philos. Econ. Reflections Econ. Soc. Issues 2015, 8, 86–105. [Google Scholar]
  4. Runge, J.; Bathiany, S.; Bollt, E.; Camps-Valls, G.; Coumou, D.; Deyle, E.; Glymour, C.; Kretschmer, M.; Mahecha, M.D.; Muñoz-Marí, J.; et al. Inferring causation from time series in Earth system sciences. Nat. Commun. 2019, 10, 2553. [Google Scholar] [CrossRef] [PubMed]
  5. Guo, R.; Cheng, L.; Li, J.; Hahn, P.R.; Liu, H. A survey of learning causality with data: Problems and methods. ACM Comput. Surv. CSUR 2020, 53, 1–37. [Google Scholar] [CrossRef]
  6. Guyon, I.; Janzing, D.; Schölkopf, B. Causality: Objectives and assessment. In Causality: Objectives and Assessment; PMLR: Maastricht, The Netherlands, 2010; pp. 1–42. [Google Scholar]
  7. Sugihara, G.; May, R.; Ye, H.; Hsieh, C.H.; Deyle, E.; Fogarty, M.; Munch, S. Detecting causality in complex ecosystems. Science 2012, 338, 496–500. [Google Scholar] [CrossRef] [PubMed]
  8. Takens, F. Detecting strange attractors in turbulence. In Dynamical Systems and Turbulence, Warwick, 1980; Springer: Berlin/Heidelberg, Germany, 1981; pp. 366–381. [Google Scholar]
  9. Stark, J. Delay embeddings for forced systems. I. Deterministic forcing. J. Nonlinear Sci. 1999, 9, 255–332. [Google Scholar] [CrossRef]
  10. Stark, J.; Broomhead, D.S.; Davies, M.E.; Huke, J. Delay embeddings for forced systems. II. Stochastic forcing. J. Nonlinear Sci. 2003, 13, 519–577. [Google Scholar] [CrossRef]
  11. Benko, Z.; Zlatniczki, A.; Fabó, D.; Sólyom, A.; Erőss, L.; Telcs, A.; Somogyvári, Z. Complete inference of causal relations in dynamical systems. arXiv 2018, arXiv:1808.10806. [Google Scholar]
  12. Malinsky, D.; Spirtes, P. Causal structure learning from multivariate time series in settings with unmeasured confounding. In Proceedings of the 2018 ACM SIGKDD Workshop on Causal Discovery, London, UK, 20 August 2018; PMLR: Maastricht, The Netherlands, 2018; pp. 23–47. [Google Scholar]
  13. Lasota, A.; Mackey, M.C. Chaos, Fractals, and Noise: Stochastic Aspects of Dynamics; Springer Science & Business Media: Berlin/Heidelberg, Germany, 2013; Volume 97. [Google Scholar]
  14. Mastakouri, A.A.; Schölkopf, B.; Janzing, D. Necessary and sufficient conditions for causal feature selection in time series with latent common causes. arXiv 2020, arXiv:2005.08543. [Google Scholar]
  15. Zhang, J. Causal reasoning with ancestral graphs. J. Mach. Learn. Res. 2008, 9, 1437–1474. [Google Scholar]
  16. Zhang, J. On the completeness of orientation rules for causal discovery in the presence of latent confounders and selection bias. Artif. Intell. 2008, 172, 1873–1896. [Google Scholar] [CrossRef] [Green Version]
  17. Li, C.; Fan, X. On nonparametric conditional independence tests for continuous variables. Wiley Interdiscip. Comput. Stat. 2020, 12, e1489. [Google Scholar] [CrossRef] [Green Version]
  18. Lundborg, A.R.; Shah, R.D.; Peters, J. Conditional Independence Testing in Hilbert Spaces with Applications to Functional Data Analysis. arXiv 2021, arXiv:2101.07108. [Google Scholar]
  19. Runge, J. Conditional independence testing based on a nearest-neighbor estimator of conditional mutual information. In Proceedings of the International Conference on Artificial Intelligence and Statistics, Beijing, China, 18–20 August 2018; PMLR: Maastricht, The Netherlands, 2018; pp. 938–947. [Google Scholar]
  20. Pearl, J. Causality; Cambridge University Press: Cambridge, UK, 2009. [Google Scholar]
  21. Lin, H.; Zhang, J. On Learning Causal Structures from Non-Experimental Data without Any Faithfulness Assumption. In Algorithmic Learning Theory; PMLR: Maastricht, The Netherlands, 2020; pp. 554–582. [Google Scholar]
  22. Sun, J.; Taylor, D.; Bollt, E.M. Causal network inference by optimal causation entropy. SIAM J. Appl. Dyn. 2015, 14, 73–106. [Google Scholar] [CrossRef] [Green Version]
  23. Vowels, M.J.; Camgoz, N.C.; Bowden, R. D’ya like DAGs? A Survey on Structure Learning and Causal Discovery. arXiv 2021, arXiv:2103.02582. [Google Scholar]
  24. Spirtes, P.; Glymour, C. An algorithm for fast recovery of sparse causal graphs. Soc. Sci. Comput. Rev. 1991, 9, 62–72. [Google Scholar] [CrossRef] [Green Version]
Figure 1. An example of a possible causation scheme for the system M. The observed series are X and Y, and in between them we have L, a common cause. Above X and below Y, small circles represent the i.i.d. inputs $\xi$, $\eta$, and the large circles $V_X$, $V_Y$ (also belonging to the set of unobserved series) represent the non-i.i.d. influences that are not shared and not common for X and Y. In this example, X drives Y, and they have L as a common cause.
Figure 2. The lag τ and lag difference δ are equal.
Figure 3. The causation has smaller time lag τ compared with the difference δ from the common driver.
Figure 4. The causation has larger time lag τ compared with the difference δ from the common driver.
Figure 5. The causation time lag τ is equal to the lag difference δ.
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
