Data-Driven Methods for the Detection of Causal Structures in Process Technology

Kühnert, Christian; Beyerer, Jürgen

doi:10.3390/machines2040255


by Christian Kühnert ¹,* and Jürgen Beyerer ¹,²

¹ Fraunhofer Institute of Optronics, System Technologies and Image Exploitation IOSB, Fraunhoferstraße 1, Karlsruhe 76131, Germany

² Institute for Anthropomatics, Karlsruhe Institute of Technology, Adenauerring 4, Karlsruhe 76131, Germany

* Author to whom correspondence should be addressed.

Machines 2014, 2(4), 255-274; https://doi.org/10.3390/machines2040255

Submission received: 5 March 2014 / Revised: 5 August 2014 / Accepted: 13 October 2014 / Published: 4 November 2014

(This article belongs to the Special Issue Machinery Diagnostics and Prognostics)


Abstract:

In modern industrial plants, process units are strongly cross-linked with each other, and disturbances occurring in one unit potentially become plant-wide. This can lead to a flood of alarms at the supervisory control and data acquisition system, hiding the original fault causing the disturbance. Hence, one major aim in fault diagnosis is to backtrack the disturbance propagation path and to localize the root cause of the fault. Since detecting correlation in the data is not sufficient to describe the direction of the propagation path, cause-effect dependencies among process variables need to be detected. Process variables that show a strong causal impact on other variables in the process come into consideration as being the root cause. In this paper, different data-driven methods are proposed, compared and combined that can detect causal relationships while relying solely on process data. The information on causal dependencies is used for localization of the root cause of a fault. All proposed methods consist of a statistical part, which determines whether the disturbance traveling from one process variable to a second is significant, and a quantitative part, which calculates the causal information the first process variable has about the second. The methods are tested on simulated data from a chemical stirred-tank reactor and on a laboratory plant.

Keywords:

root cause localization; causal structure discovery; time series analysis

1. Introduction

Modern industrial plants are complex systems that need to run over several weeks or months. During a production run, operating conditions can change, which can lead to abnormal behavior of the process. Since the control and measurement devices of modern plants are strongly cross-linked with each other, a failure in a major piece of equipment can potentially lead to plant-wide disturbances and, for example, result in a flood of alarms, making it difficult to localize the root cause of the disturbance. As not all relations between the different process parameters are well known, data-driven methods can be of great help to localize or at least to narrow down the cause of a disturbance.

Backtracking the disturbance propagation path using data-driven methods means detecting temporal cause-effect relationships in a data set. In detail, this means that statistical relationships and time-shifts are used to reconstruct the propagation direction of the disturbance. Several methods have already been developed to test for temporal causal dependencies in data:

One of the first approaches was made by Granger [1], who compares two autoregressive models. The first model contains only past values of itself; the second model is augmented with past values of another variable. If the augmentation improves the regression, it is assumed that this variable has a causal impact on the other. In [2], an algorithm for root cause localization based on the cross-correlation function is presented for causal analysis, especially when having valve stiction. Schreiber [3] presents a concept, named transfer entropy, which detects causal dependencies by measuring the reduction of uncertainty when one variable predicts future values of the other. Further methods for the detection of causal dependencies are proposed in terms of dynamic Bayesian networks [4,5] or nearest neighbor approaches [6]. An overview about different methods, tested on artificial benchmark data sets and for biosignal analysis, is given in [7].

In this paper, different algorithms are proposed, which are based on the cross-correlation function, the transfer entropy, Granger causality and support vector machines. Next, the results of the different methods are combined, and a root cause priority list is generated. This priority list ranks the different process variables by their likelihood of being the actual cause of the fault.

All proposed algorithms consist of a statistical test, which determines whether the disturbance traveling from one process variable to a second is significant, and a quantitative part, called the causal strength, which defines the influence the input variable has on the output variable. For all methods, the causal strength takes values between zero (= no causal dependency) and one (= causal dependency). Finally, this value is used for generating the root cause priority list. The paper summarizes the main results given in [8], containing the following main contributions: (1) a proposal of a new algorithm based on support vector machines by using a recursive variable selection and model reduction approach; (2) the development of a design approach to combine all methods into one causal matrix and transfer it into a root cause priority list; (3) the extension of an existing method based on the cross-correlation function by using permutation tests for the significance test; and (4) the development of a visualization method for the causal matrices. The paper is structured as follows:

In Section 2, the foundations for the detection of causal dependencies in dynamic systems are introduced. Section 3 explains the algorithms. In Section 4, it is pointed out how the results of the different methods can be combined and how the root cause priority list is calculated. Additionally, a new way to visualize the results from the different methods is described. Finally, Section 5 tests the methods on data from a simulated chemical stirred-tank reactor and on an experimental laboratory plant.

2. Detecting Causal Dependencies in Process Measurements

Definition (causal system): A time-invariant system is causal if, for all input signals with $u_{1} (t) \equiv u_{2} (t)$ and $t \leq t_{1}$, for any $t_{1}$, the output signals for $0 \leq t \leq t_{1}$ show the characteristic $y_{1} \{x_{0}, u_{1} (t)\} \equiv y_{2} \{x_{0}, u_{2} (t)\}$ (with $x_{0}$: the initial state). Systems that are not causal are called acausal. In causal systems, the input signals for $t > t_{1}$ do not have an impact on the behavior of the output signal until time $t_{1}$. Additionally, for causal systems, the impulse response for $t < 0$ is zero.

This definition of causality results in several system theoretic consequences. For all static systems $f : \mathbb{R} \to \mathbb{R}$, causality exists regardless of whether u or y is seen as the input signal, since in both cases, the output signal does not depend on future values, but only on the current value of the input signal.

Additionally, all linear time-continuous systems with one input and one output signal are causal, since $Y (s) = G (s) U (s)$ and $U (s) = G^{- 1} (s) Y (s)$ explain the data equally well ($U (s), Y (s)$: Laplace-transformed input and output signals). In that case, it is only possible to test if the transfer function $G (s)$ is realizable (order of the numerator polynomial ≤ order of the denominator polynomial). Still, this does not contain information about causality.

To backtrack the disturbance propagation path in a plant, a signal decay time or a signal dead time needs to be present in the measurements. Decay times between $u (t)$ and $y (t)$ characterize a smoothing effect, so that $y (t)$ seems to be delayed with respect to $u (t)$. A dead time $T_{D}$ between $u (t)$ and $y (t)$ exists if $y (t)$ does not depend on $u (τ)$ for $τ \in (t - T_{D}, t]$, but only on $u (τ)$ for $τ \leq t - T_{D}$. In other words, a dead time is the interval that a change in the input signal needs to become visible at the output of the system. In industrial processes, dead times exist, e.g., in tubes, when a fluid concentration is measured by sensors at different positions.

Another point of view on causality is given by Pearl [9]. The central idea is that a cause C increases the probability of the appearance of an effect E. This means that C can only be the cause of E if $P (E = 1 | C = 1) > P (E = 1 | C = 0)$ is fulfilled. However, an increase of the probability can only be done through an active intervention. Therefore, Pearl introduces the do-operator, which forces C to a fixed value. In other words, C only has a causal impact on E if $P (E = 1 | do (C = 1)) > P (E = 1 | do (C = 0))$ is fulfilled. This approach is the only possibility to avoid the detection of false causal dependencies.

An example for the detection of a false causal dependency is a non-measured signal $u_{z}$, which has an impact on u, as well as a later one on y, without having a direct dependency pointing from u to y. In [10], it is shown how the approach given by Pearl can be used to detect causal structures in process data. Since the methods proposed in this paper only rely on observational data, meaning that no active intervention is performed, this approach will not be pursued in this paper.
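The difference between conditioning and intervening can be illustrated with a small simulation. The following Python sketch is not taken from the paper: the function names `simulate` and `prob_effect` and all probabilities are hypothetical choices for illustration. A hidden binary variable plays the role of the non-measured signal $u_{z}$ and drives both C and E, while E never depends on C directly.

```python
import random

def simulate(n, intervene=None, seed=0):
    """Draw n samples (c, e) from a toy model with a hidden common cause z.

    If `intervene` is True or False, C is forced to that value (do-operator);
    if it is None, C follows the hidden cause (purely observational data).
    All probabilities below are illustrative assumptions.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        z = rng.random() < 0.5                      # hidden common cause
        if intervene is None:
            c = rng.random() < (0.9 if z else 0.1)  # C depends on z
        else:
            c = intervene                           # do(C = intervene)
        e = rng.random() < (0.8 if z else 0.2)      # E depends on z only
        samples.append((c, e))
    return samples

def prob_effect(samples, c_value):
    """Relative frequency of E = 1 among the samples with C = c_value."""
    selected = [e for c, e in samples if c == c_value]
    return sum(selected) / len(selected)
```

On observational samples, `prob_effect(samples, True)` is clearly larger than `prob_effect(samples, False)`, although C has no influence on E. With `intervene=True` and `intervene=False`, the two frequencies are expected to coincide up to sampling noise.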

3. Methods for Reconstructing the Disturbance Propagation Path

3.1. Cross-Correlation Function

The cross-correlation function (CCF) [11] quantifies the linear similarity of two equidistantly sampled time series $u [k], y [k]$ that are time-shifted by a constant lag λ. The CCF is defined as:

{\hat{c}}_{u y} [λ] = \frac{1}{K} \frac{\sum_{k = 1}^{K - λ} (u [k] - {\hat{μ}}_{u}) (y [k + λ] - {\hat{μ}}_{y})}{\sqrt{\sum_{k = 1}^{K - λ} {(u [k] - {\hat{μ}}_{u})}^{2} \sum_{k = 1}^{K - λ} {(y [k] - {\hat{μ}}_{y})}^{2}}}

(1)

with $λ \in \{1 - K, 2 - K, \dots, K - 2, K - 1\}$ and:

{\hat{μ}}_{u} := \frac{1}{K} \sum_{k = 1}^{K} u [k], \quad {\hat{μ}}_{y} := \frac{1}{K} \sum_{k = 1}^{K} y [k]

(2)

with ${\hat{c}}_{u y} [λ] \in [- 1, 1]$. A value of $\max | {\hat{c}}_{u y} [λ] | = 1$ means that $u [k]$ and $y [k]$ are perfectly correlated at a shift λ, while values close to zero indicate that there is no correlation.

3.1.1. Significance Test

To check for a significant cause-effect relationship $u \to y$, two tests are established. The first test is used to check if $\max | {\hat{c}}_{u y} [λ] |$ differs significantly from zero; the second test checks if the causal strength of $u \to y$ differs significantly from that of $y \to u$ to have a clear indication of the direction of the propagation path.

Significant time-shifted correlation: Since the available samples of the two time series are limited, Equation (1) only gives an estimation of the CCF, so that $\max | {\hat{c}}_{u y} [λ] |$ randomly differs from zero for two uncorrelated signals. To test if $\max | {\hat{c}}_{u y} [λ] |$ differs significantly from zero, a hypothesis test is performed. We define:

λ^{max} := \arg \max_{λ} | {\hat{c}}_{u y} [λ] | \quad \text{with} \quad λ \in \{1 - K, \dots, - 1, 1, \dots, K - 1\} .

(3)

If $λ^{max} > 0$ holds, there could be a causal relation $u \to y$, and the hypothesis test is performed. To check if a significant correlation between the signals is present, a t-test with the null hypothesis defined as having no correlation is used with significance level $α = 0.05$. If this test fails, it is assumed that no cause-effect relationship $u \to y$ exists.

Significant causal direction: This test covers the possibility that the resulting CCF can have a global maximum for $λ > 0$ and a slightly lower local maximum for $λ < 0$, meaning that the causal direction is not obvious. Therefore, this test checks if the maximum for $λ > 0$ is significantly different from the maximum for $λ < 0$. To perform the test, a compound parameter $C^{CCF}$ defined as:

C^{CCF} := \frac{\max_{λ > 0} | {\hat{c}}_{u y} [λ] | - \max_{λ < 0} | {\hat{c}}_{u y} [λ] |}{\max_{λ > 0} | {\hat{c}}_{u y} [λ] | + \max_{λ < 0} | {\hat{c}}_{u y} [λ] |}

(4)

with $- 1 \leq C^{CCF} \leq 1$ is used. A value $C^{CCF} > 0$ indicates a causal dependency $u \to y$. As $C^{CCF}$ strongly depends on the characteristics of $u [k]$ and $y [k]$, an adaptive threshold is derived through a $3 σ$ permutation test. Since performing a complete permutation of $u [k]$ destroys all causal information, the resulting value of $C^{CCF}$ should be close to zero. This idea is exploited by calculating random permutations of $u [k]$ and generating several values for $C_{π}^{CCF}$. The threshold $C_{thresh}^{CCF}$ for each pair of variables is calculated as:

C_{thresh}^{CCF} := μ_{C_{π}^{CCF}} + 3 σ_{C_{π}^{CCF}}

(5)

with $μ_{C_{π}^{CCF}}$ being the mean value and $σ_{C_{π}^{CCF}}$ the standard deviation of the permuted values. If $C^{CCF} > C_{thresh}^{CCF}$ holds, the test has passed successfully.

If both significance tests are passed, the found causal dependency is assumed to exist.

3.1.2. Causal Strength

The causal strength of $u \to y$ is defined as:

Q^{CCF} := {(\max (0, C^{CCF}))}^{β_{CCF}}

(6)

with $0 \leq Q^{CCF} \leq 1$. $β_{CCF}$ is a design parameter to make the different methods numerically compatible, and its selection will be explained in Section 4. The proposed algorithm for the detection of cause-effect relationships for two time series $u [k]$ and $y [k]$ using the CCF is summarized as follows:

Algorithm 1: Algorithm based on cross-correlation.

Compute ${\hat{c}}_{u y}$ of $u [k]$ and $y [k]$ and determine $λ^{max}$ ;
If $λ^{max} < 0$ , then $u ↛ y$ , else test for a significant correlation;
Calculate $C^{CCF}$ and corresponding threshold $C_{thresh}^{CCF}$ ;
Check if $C^{CCF} > C_{thresh}^{CCF}$ ; calculate the causal strength $u \to y$ by $Q^{CCF}$
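
The direction test and the causal strength of Algorithm 1 can be sketched in a few lines of Python. The sketch below is an illustration under stated assumptions, not the authors' implementation: the function names, the restriction to a maximum lag `max_lag`, and the number of permutations `n_perm` are hypothetical choices. The correlation is normalized over the overlapping samples so that it stays within $[- 1, 1]$, and the t-test on the maximum correlation is omitted.

```python
import math
import random

def ccf(u, y, lag):
    """Normalized cross-correlation between u[k] and y[k + lag]."""
    K = len(u)
    mu_u, mu_y = sum(u) / K, sum(y) / K
    if lag >= 0:
        pairs = [(u[k], y[k + lag]) for k in range(K - lag)]
    else:
        pairs = [(u[k - lag], y[k]) for k in range(K + lag)]
    num = sum((a - mu_u) * (b - mu_y) for a, b in pairs)
    den = math.sqrt(sum((a - mu_u) ** 2 for a, _ in pairs)
                    * sum((b - mu_y) ** 2 for _, b in pairs))
    return num / den if den > 0 else 0.0

def compound_parameter(u, y, max_lag):
    """C^CCF of Equation (4), evaluated for lags 1..max_lag on each side."""
    pos = max(abs(ccf(u, y, lag)) for lag in range(1, max_lag + 1))
    neg = max(abs(ccf(u, y, -lag)) for lag in range(1, max_lag + 1))
    return (pos - neg) / (pos + neg) if pos + neg > 0 else 0.0

def ccf_causal_strength(u, y, max_lag=20, n_perm=100, beta=1.0, seed=0):
    """Return (Q^CCF, C^CCF, threshold) for the candidate dependency u -> y."""
    rng = random.Random(seed)
    c = compound_parameter(u, y, max_lag)
    perm_values = []
    for _ in range(n_perm):
        u_perm = u[:]
        rng.shuffle(u_perm)  # permuting u destroys the temporal information
        perm_values.append(compound_parameter(u_perm, y, max_lag))
    mean = sum(perm_values) / n_perm
    std = math.sqrt(sum((v - mean) ** 2 for v in perm_values) / n_perm)
    threshold = mean + 3.0 * std  # 3-sigma threshold of Equation (5)
    significant = c > 0 and c > threshold
    strength = max(0.0, c) ** beta if significant else 0.0
    return strength, c, threshold
```

For a signal `y` that is a delayed copy of `u`, the call `ccf_causal_strength(u, y)` is expected to return a positive strength, while the reverse call `ccf_causal_strength(y, u)` yields a negative compound parameter and therefore a strength of zero.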

3.2. Granger Causality

The Granger causality (GC) has been introduced by Clive Granger [1] and is traditionally used in the field of economics [12,13]. Recently, the application of GC is of growing interest, especially in the field of neuroscience [14] and biology [15].

The central idea is that a signal $u_{i} [k]$ is assumed to have a causal influence on $y [k]$ if past values of $u_{i} [k]$ and $y [k]$ result in a better prediction of $y [k]$ than past values of $y [k]$ alone. GC takes into account that, besides the signal $u_{i} [k]$, $y [k]$ can depend on further signals $u_{l} [k] \in \{u_{1} [k], \dots, u_{p} [k]\}$ with $l \neq i$. A comparison is done using two vector autoregressive models and performing a one-step-ahead prediction. If the prediction error of the model that includes $u_{i}$ is substantially smaller than that of the model without $u_{i}$, a causal dependency $u_{i} \to y$ is concluded. For the proposed algorithm, each time series is selected once as output $y := u_{m}$, while the remaining $r - 1$ time series are used as input. With n defining the model order, this can be formulated as:

E_{{\bar{U}}_{i} Y} = \sum_{k = n + 1}^{K} {(y [k] - {\hat{a}}_{0} - \sum_{j = 1}^{n} {\hat{a}}_{j} y [k - j] - \sum_{\substack{l = 1 \\ l \neq i, m}}^{r} \sum_{j = 1}^{n} {\hat{b}}_{l j} u_{l} [k - j])}^{2}

(7)

E_{U_{i} Y} = \sum_{k = n + 1}^{K} {(y [k] - {\hat{a}}_{0} - \sum_{j = 1}^{n} {\hat{a}}_{j} y [k - j] - \sum_{\substack{l = 1 \\ l \neq m}}^{r} \sum_{j = 1}^{n} {\hat{b}}_{l j} u_{l} [k - j])}^{2}

(8)

By definition, the parameters ${\hat{a}}_{j}$, ${\hat{b}}_{l j}$ in Equations (7) and (8) result from separate estimations. The impact of the i-th input signal is measured in terms of the sum of the squares of residuals without $u_{i}$, denoted $E_{{\bar{U}}_{i} Y}$, and with $u_{i}$, denoted $E_{U_{i} Y}$. Both values are used to test for causal significance and for calculating the causal strength.

The performance of the model strongly depends on the model order n. Too small a value for n leads to a large prediction error, and setting n too large results in an overfitted model. Therefore, the Akaike information criterion (AIC) [16] is used for model order estimation, taking into account the prediction errors, the sample size K, the number of variables r and the model order n. For model order estimation, the loss function is set to $V = E_{U_{i} Y}$ for the unrestricted model and to $V = E_{{\bar{U}}_{i} Y}$ for the restricted model. The AIC is defined as:

\mathrm{AIC} (n) = \log (V) + \frac{2 n p^{2}}{K}

(9)

and the estimated order is selected as $\arg \min_{n} \mathrm{AIC} (n)$. Furthermore, all models are tested for consistency by computing the Durbin–Watson statistic [17] on the residuals.

3.2.1. Significance Test

This test is performed to check whether $E_{{\bar{U}}_{i} Y}$ and $E_{U_{i} Y}$ differ significantly. According to [3], $E_{{\bar{U}}_{i} Y}$ and $E_{U_{i} Y}$ follow $χ^{2}$ distributions, and an F-test can be performed to verify if the time series $u_{i} [k]$ has a causal influence on $y [k]$. The test is performed on the restricted and unrestricted model under the null hypothesis that $E_{U_{i} Y} < E_{{\bar{U}}_{i} Y}$ with significance level $α = 0.05$.

3.2.2. Causal Strength

The strength of the significant causal relationship $u_{i} \to y$ is defined as:

Q^{GC} := {(1 - \frac{E_{U_{i} Y}}{E_{{\bar{U}}_{i} Y}})}^{β_{GC}}

(10)

with $0 \leq Q^{GC} \leq 1$, since $E_{U_{i} Y} < E_{{\bar{U}}_{i} Y}$ holds. The design parameter $β_{GC}$ will be determined in Section 4. The suggested algorithm is summarized as follows:

Algorithm 2: Algorithm based on Granger causality.

Compute $E_{{\bar{U}}_{i} Y}$ and $E_{U_{i} Y}$ using the model order estimated through AIC;
Test for model consistency for both models using the Durbin–Watson statistic;
Perform a significance test based on an F-test for $E_{U_{i} Y} < E_{{\bar{U}}_{i} Y}$ ;
If the causal dependency is significant, calculate the causal strength $u_{i} \to y$ by $Q^{GC}$ ;
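
The comparison of the restricted and the unrestricted model can be sketched for the bivariate case ($r = 2$) as follows. This is a simplified illustration, not the implementation used in the paper: the helper names are hypothetical, the model order `n` is passed in directly instead of being estimated through AIC, the Durbin–Watson check is omitted, and the F-statistic is returned without being compared to a critical value. The least-squares fit uses the normal equations and assumes that the regressors are not collinear and that the residuals are not exactly zero.

```python
def rss_least_squares(rows, targets):
    """Residual sum of squares of an ordinary least-squares fit."""
    p = len(rows[0])
    # Augmented normal equations [X^T X | X^T z].
    A = [[sum(row[i] * row[j] for row in rows) for j in range(p)]
         + [sum(row[i] * t for row, t in zip(rows, targets))]
         for i in range(p)]
    for col in range(p):  # Gaussian elimination with partial pivoting
        pivot = max(range(col, p), key=lambda idx: abs(A[idx][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for below in range(col + 1, p):
            factor = A[below][col] / A[col][col]
            for c in range(col, p + 1):
                A[below][c] -= factor * A[col][c]
    coef = [0.0] * p
    for i in reversed(range(p)):  # back substitution
        known = sum(A[i][j] * coef[j] for j in range(i + 1, p))
        coef[i] = (A[i][p] - known) / A[i][i]
    return sum((t - sum(c * x for c, x in zip(coef, row))) ** 2
               for row, t in zip(rows, targets))

def granger_strength(u, y, n=2, beta=1.0):
    """Return (Q^GC, F-statistic) for the candidate dependency u -> y."""
    restricted, unrestricted, targets = [], [], []
    for k in range(n, len(y)):
        past_y = [y[k - j] for j in range(1, n + 1)]
        past_u = [u[k - j] for j in range(1, n + 1)]
        restricted.append([1.0] + past_y)             # model without u
        unrestricted.append([1.0] + past_y + past_u)  # model with u
        targets.append(y[k])
    e_without = rss_least_squares(restricted, targets)
    e_with = rss_least_squares(unrestricted, targets)
    dof = len(targets) - (2 * n + 1)
    f_stat = ((e_without - e_with) / n) / (e_with / dof)
    q = max(0.0, 1.0 - e_with / e_without) ** beta
    return q, f_stat
```

If `y` is driven by past values of `u`, `granger_strength(u, y)` is expected to return a strength close to one, whereas the reverse direction yields a value close to zero.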

3.3. Transfer Entropy

The transfer entropy (TE) is an information theoretic measure and was first introduced by Schreiber [18]. Applications of the TE can be found, e.g., in neuroscience [19,20] and in financial data analysis [21]. In the field of process engineering, research has been conducted by Bauer [6], who uses TE for the causal analysis of measurements taken from chemical processes.

From its definition, it can be used to detect causal dependencies by testing how much information is transferred from $u [k]$ to $y [k]$ and how much information is transferred from $y [k]$ to $u [k]$. The transition probability for $y [k]$ is defined as $P (y_{n + 1} | y)$, which is used as short notation for:

P (y_{n + 1} = d_{n + 1} | y_{n} = d_{n}, \dots, y_{1} = d_{1})

(11)

with n defining the time horizon and $d_{v} \in \{1, \dots, D_{y}\}$ the quantization levels of $y [k]$. The transition probability $P (y_{n + 1} | y, u)$ is defined accordingly. The transfer entropy is then given as:

{TE}_{u y}^{⋆} (λ) = \sum_{d_{1}, \dots, d_{n + 1} = 1}^{D_{y}} \sum_{e_{1}, \dots, e_{n} = 1}^{D_{u}} P (y_{n + 1}, y, u) \log \frac{P (y_{n + 1} | y, u)}{P (y_{n + 1} | y)} .

(12)

According to [21], the boundaries of the TE are $0 \leq {TE}_{u y}^{⋆} (λ) \leq H_{y}$, with $H_{y}$ being the entropy of the output signal. To capture dead times in the data, the parameter λ is introduced to perform a backward-shifting of $u [k]$, and Equation (12) is calculated for different $u [k - λ]$. For the calculation of causal dependencies, the transfer entropy is set to its maximum value over all shifts, ${TE}_{u y} := \max_{λ} ({TE}_{u y}^{⋆} (λ))$. To determine the time horizon n, the residual sum of squares of several vector autoregressive models is calculated:

{\hat{σ}}_{y}^{2} = \sum_{k = n + 1}^{K} {(y [k] - {\hat{a}}_{0} - \sum_{j = 1}^{n} {\hat{a}}_{j} y [k - j])}^{2}

(13)

with ${\hat{a}}_{j}$ resulting from a least squares estimation. The order $n_{TE}$ is then chosen as the minimizer of the Akaike information criterion defined as $\mathrm{AIC} (n) = \log {\hat{σ}}_{y}^{2} + \frac{2 n}{K}$ [16].

3.3.1. Significance Test

Testing for a significant causal relationship is done by a test introduced in [3]. The key idea is to generate an adaptive threshold ${TE}_{u y}^{thresh}$ based on the permuted input time series $u_{π} [k]$ and the generation of several values for ${TE}_{u_{π} y}$. The values of ${TE}_{u_{π} y}$ are finally used to calculate the threshold ${TE}_{u y}^{thresh}$ in terms of a $3 σ$-test:

{TE}_{u y}^{thresh} := μ_{{TE}_{u_{π} y}} + 3 σ_{{TE}_{u_{π} y}}

(14)

with mean value $μ_{{TE}_{u_{π} y}}$ and standard deviation $σ_{{TE}_{u_{π} y}}$. If ${TE}_{u y} > {TE}_{u y}^{thresh}$ holds, a causal dependency $u \to y$ is concluded, and the causal strength can be calculated.

3.3.2. Causal Strength

By definition, ${TE}_{u y} \leq H_{y}$, meaning that for normalizing to values between zero and one, the causal strength of the transfer entropy can be defined as:

Q^{TE} := {(\frac{{TE}_{u y}}{H_{y}})}^{β_{TE}} .

(15)

The design parameter $β_{TE}$ will be chosen in Section 4. The suggested algorithm for the detection of cause-effect relations is summarized as:

Algorithm 3: Algorithm based on transfer entropy.

Compute model order $n_{TE}$ of the transfer entropy using VAR models and AIC;
Compute ${TE}_{u y}^{⋆} (λ)$ of the time series $u [k]$ and $y [k]$ , set ${TE}_{u y} = max ({TE}_{u y}^{⋆} (λ))$ ;
Calculate ${TE}_{u y}^{thresh}$ and check if ${TE}_{u y} > {TE}_{u y}^{thresh}$ ;
Set $Q^{TE}$ as the resulting value of the causal strength $u \to y$
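
A plug-in estimate of the transfer entropy for the simplest case of a time horizon $n = 1$ can be sketched as follows. The sketch rests on several assumptions that are not taken from the paper: the signals are quantized into equally wide bins (`levels`), probabilities are estimated by relative frequencies, the time horizon is fixed instead of being estimated through AIC, and the function names are hypothetical.

```python
import math
import random
from collections import Counter

def quantize(x, levels):
    """Map a real-valued series onto the integer levels 0..levels-1."""
    lo, hi = min(x), max(x)
    width = (hi - lo) / levels or 1.0
    return [min(int((v - lo) / width), levels - 1) for v in x]

def transfer_entropy(u, y, lag=1, levels=4):
    """Plug-in TE from u to y with horizon n = 1 and u shifted back by lag."""
    uq, yq = quantize(u, levels), quantize(y, levels)
    triples = [(yq[k + 1], yq[k], uq[k + 1 - lag])
               for k in range(lag - 1, len(yq) - 1)]
    total = len(triples)
    n_full = Counter(triples)                              # (y_next, y_now, u)
    n_cond = Counter((y0, u0) for _, y0, u0 in triples)    # (y_now, u)
    n_self = Counter((y1, y0) for y1, y0, _ in triples)    # (y_next, y_now)
    n_now = Counter(y0 for _, y0, _ in triples)            # y_now
    te = 0.0
    for (y1, y0, u0), count in n_full.items():
        p_with_u = count / n_cond[(y0, u0)]
        p_without_u = n_self[(y1, y0)] / n_now[y0]
        te += count / total * math.log(p_with_u / p_without_u)
    return te

def entropy(x, levels):
    """Plug-in entropy of the quantized series."""
    counts = Counter(quantize(x, levels))
    return -sum(c / len(x) * math.log(c / len(x)) for c in counts.values())

def te_causal_strength(u, y, max_lag=5, levels=4, n_perm=50, beta=1.0, seed=0):
    """Q^TE for u -> y, or 0.0 if the 3-sigma permutation test is not passed."""
    rng = random.Random(seed)
    lags = range(1, max_lag + 1)
    te = max(transfer_entropy(u, y, lag, levels) for lag in lags)
    perm = []
    for _ in range(n_perm):
        u_perm = u[:]
        rng.shuffle(u_perm)
        perm.append(max(transfer_entropy(u_perm, y, lag, levels) for lag in lags))
    mean = sum(perm) / n_perm
    std = math.sqrt(sum((v - mean) ** 2 for v in perm) / n_perm)
    h_y = entropy(y, levels)
    if te <= mean + 3.0 * std or h_y == 0:
        return 0.0
    return min(1.0, te / h_y) ** beta
```

The clipping with `min(1.0, ...)` only guards against small estimation effects, since the entropy and the transfer entropy are estimated from slightly different sample sets.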

3.4. Support Vector Machines for Regression

Support vector machines (SVM) are learning methods that are used for supervised learning. Originally, they were developed by Vapnik [22] for classification and later on were extended towards regression and time series prediction [23]. In industrial processes, they are used, e.g., for fault detection and diagnosis [24] or optimization [25]. For the detection of causal structures in data, a model reduction approach is proposed.

Given a training set $\{x_{i}, z_{i}\}_{i = 1}^{K}$ with $x_{i} \in \mathbb{R}^{n}$ and $z_{i} \in \mathbb{R}$, support vector regression (SVR) seeks the regression function:

f (x) = 〈w, x〉 + b

(16)

with the normal vector w and the bias b as parameters, and $〈\cdot, \cdot〉$ denoting the scalar product. The function $f (x)$ should deviate by at most ϵ from the values $z_{i}$ for the whole training data set, while the normal vector w should be as small as possible. As the selection of a too small insensitivity zone ϵ would lead to equations with infeasible constraints, slack variables $ξ, \hat{ξ} \in \mathbb{R}_{\geq 0}$ and an additional weighting parameter $C \in \mathbb{R}_{> 0}$ are introduced, leading to the objective function:

\begin{aligned} \min_{w, ξ, \hat{ξ}} \quad & \frac{1}{2} {\| w \|}^{2} + C \sum_{i = 1}^{K} (ξ_{i} + {\hat{ξ}}_{i}) \\ \text{with} \quad & f (x_{i}) - z_{i} \leq ϵ + ξ_{i} \\ \text{and} \quad & z_{i} - f (x_{i}) \leq ϵ + {\hat{ξ}}_{i} . \end{aligned}

(17)

The optimization problem is solved by means of its dual containing the Lagrange multipliers $α, \hat{α}$. The solution for the regression function is finally given by:

f (x) = \sum_{i = 1}^{K} (α_{i} - {\hat{α}}_{i}) 〈x_{i}, x〉 + b .

(18)

Kernel functions: One of the main reasons why SVMs are employed is their ability to deal with nonlinear dependencies by introducing so-called kernel functions; see, e.g., [26,27]. The input data in $\mathbb{R}^{n}$ is mapped into some feature space $\mathcal{F}$ with a possibly higher dimension by using a nonlinear transformation function Φ and searching for the flattest function in the defined feature space. Since SVMs solely depend on the calculation of dot products, the computational complexity for this transformation remains feasible. A kernel function, defined as $k (x, x^{'}) := 〈Φ (x), Φ (x^{'})〉$, can be plugged into Equation (18), resulting in the final regression function:

f (x) = \sum_{i = 1}^{K} (α_{i} - {\hat{α}}_{i}) k (x_{i}, x) + b .

(19)

In this paper, only Gaussian kernels, defined as $k (x, x^{'}) = e^{- \frac{{\| x - x^{'} \|}^{2}}{2 σ^{2}}}$, are used for the detection of causal structures. To optimize the parameters $ϵ, C$ and σ, a downhill simplex algorithm [28] is used.
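
Once the dual problem has been solved, evaluating the regression function of Equation (19) only requires kernel evaluations. The following sketch illustrates this step. It assumes that the training points and the differences $α_{i} - {\hat{α}}_{i}$ (here `coefficients`) are already available from a solver, and it uses the squared Euclidean distance in the exponent, which is the usual form of the Gaussian kernel. The function names are hypothetical.

```python
import math

def gaussian_kernel(x, x_prime, sigma):
    """Gaussian kernel exp(-||x - x'||^2 / (2 sigma^2)) for two vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, x_prime))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def svr_predict(x, support_vectors, coefficients, bias, sigma):
    """Evaluate f(x) = sum_i (alpha_i - alpha_hat_i) k(x_i, x) + b.

    `coefficients[i]` is assumed to hold alpha_i - alpha_hat_i for the
    training point `support_vectors[i]`.
    """
    return bias + sum(c * gaussian_kernel(sv, x, sigma)
                      for sv, c in zip(support_vectors, coefficients))
```

Training points whose coefficient is zero do not contribute to the sum, so only the support vectors need to be stored for prediction.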

3.4.1. Detecting Causal Dependencies

To detect cause-effect dependencies, an input data set $Ψ_{u y} = \{u [k - 1], \dots, u [k - n], y [k - 1], \dots, y [k - n]\}$ for predicting $y [k]$ is generated. The SVM is trained once and optimized regarding $ϵ, C$ and σ on the complete data set. The time horizon n is estimated as given in Equation (13) for the transfer entropy.


In the first step, the variables in $Ψ_{u y}$ are ranked in terms of their prediction accuracy of $y [k]$ by performing a recursive variable elimination algorithm based on the Lagrange multipliers as proposed by [29].

In the second step, a relevant subset of input variables is selected. If the resulting subset contains one or several past values of $u [k]$, it is assumed that u causes y. For the selection of the size of the subset, an F-test [11] ($α = 0.05$) is performed on the resulting residual sums of squares of two SVMs, where the first SVM contains ψ variables and the second SVM $ψ + 1$ variables. If the null hypothesis cannot be rejected, the residual sum of squares does not change significantly, and the found subset of variables is set to size ψ.

3.4.2. Causal Strength

Similar to Granger causality, the causal strength is calculated based on the comparison of the squared sum of residuals. The causal strength $Q^{SVM}$ is calculated through a comparison of two different squared sums of residuals, named $E_{u y}$ and $E_{y}$, with $E_{u y} \leq E_{y}$. In detail, $E_{u y}$ is calculated using the SVM explained above with the subset of input variables resulting from the initial set $Ψ_{u y}$. The residual sum of squares $E_{y}$ for the prediction of $y [k]$ is calculated by performing the same algorithm, only starting with the reduced set $Ψ_{y} = \{y [k - 1], \dots, y [k - n]\}$, which does not contain the time series $u [k]$, and by using the same parameters $ϵ_{opt}, C_{opt}$ and $σ_{opt}$.

Again, a tuning parameter $β_{SVM}$ is defined. Selecting the parameter value is postponed to Section 4. The resulting value $Q^{SVM}$ is therefore defined as:

Q^{SVM} := {(1 - \frac{E_{u y}}{E_{y}})}^{β_{SVM}}

(20)

with $0 \leq Q^{SVM} \leq 1$, where zero equals no causal dependency and one means maximum causal strength. The complete algorithm using support vector machines for the detection of causal dependencies is summarized as Algorithm 4.

Algorithm 4: Algorithm based on support vector machines.

Estimate the time horizon $n_{SVM}$ using a VAR model and AIC to generate $Ψ_{u y}$ ;
Train SVM and fit user-selected parameters $ϵ, C, σ$ using downhill simplex algorithm and check the consistency of the SVM using the Durbin–Watson statistic;
Perform variable selection and calculate subset;
If u is in the subset, set $Q^{SVM}$ as the resulting value of the causal strength $u \to y$ ;

As for each pair of variables a distinct SVM needs to be trained and tuned, this algorithm is the most complex one of the four presented. Regarding the transfer entropy, transition probabilities need to be calculated, meaning that this method can also become computationally intense. The cross-correlation function and Granger causality, being linear measures, are rather cheap to compute. For a detailed comparison of the different algorithms, containing a large set of benchmark data, refer to [8].

4. Reconstruction of the Disturbance Propagation Path and Localization of the Root Cause

Each pair of process variables results in a value that represents the causal influence one variable has on the other. For displaying the complete information, all relationships are written into a causal matrix $Q \in \mathbb{R}^{r \times r}$ defined as:

Q := \begin{bmatrix} - & q_{X_{2} \to X_{1}} & \dots & q_{X_{r} \to X_{1}} \\ q_{X_{1} \to X_{2}} & - & \dots & q_{X_{r} \to X_{2}} \\ \vdots & \vdots & \ddots & \vdots \\ q_{X_{1} \to X_{r}} & q_{X_{2} \to X_{r}} & \dots & - \end{bmatrix}

(21)

consisting of the causal strengths $q \in [0, 1]$ of the process variables $X_{i}$ with $i = 1, \dots, r$. In the matrix, the row index represents the variable that is the causing candidate and the column index represents the effect candidate. Values close to zero describe weak causal strengths, and values close to one describe strong ones.

4.1. Balancing and Combining Causal Matrices

Since each method uses a different mathematical approach, the causal matrices of the four methods are not directly comparable. To make them comparable to each other, the previously introduced exponential fitting parameters $β_{CCF}, β_{TE}, β_{GC}, β_{SVM} \in [0, \infty)$ are used.

The proposed design approach is based on the assumption that, on average, all methods will work equally well on the data set. In that case, equally well means that for the found significant causal dependencies, all causal matrices will result in the same mean value. Hence, the value of each β parameter is fitted in a way so that the matrices $Q^{CCF}$, $Q^{TE}$, $Q^{GC}$ and $Q^{SVM}$ give the same mean for the significant cause-effect relationships. Regarding the investigated use cases in Section 5, the mean value for the causal matrix from each method is set to $0.5$.

Finally, to calculate the combined causal matrix, the mean is taken over all balanced causal matrices for all causal dependencies. In that case, non-significant causal dependencies are set to zero.
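
The balancing step can be sketched as a one-dimensional root search for each method. The following Python sketch is one possible realization under stated assumptions and is not taken from the paper: the causal matrices hold the unbalanced strengths (computed with $β = 1$), non-significant dependencies are stored as `0.0`, diagonal entries are stored as `None`, and β is found by bisection, which works because the mean of $q^{β}$ decreases monotonically in β for values $0 < q < 1$. The function names are hypothetical.

```python
def fit_beta(values, target=0.5):
    """Find beta >= 0 such that the mean of v**beta over the significant
    (non-zero) values equals `target`, using bisection."""
    values = [v for v in values if v > 0.0]
    if not values:
        return 1.0

    def mean_pow(beta):
        return sum(v ** beta for v in values) / len(values)

    lo, hi = 0.0, 1.0
    while mean_pow(hi) > target and hi < 1e6:  # widen the search interval
        hi *= 2.0
    for _ in range(200):
        mid = 0.5 * (lo + hi)
        if mean_pow(mid) > target:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def balance(matrix, target=0.5):
    """Apply the fitted exponent to all significant entries of one matrix."""
    entries = [v for row in matrix for v in row if v is not None]
    beta = fit_beta(entries, target)
    return [[None if v is None else (v ** beta if v > 0.0 else 0.0)
             for v in row] for row in matrix]

def combine(matrices):
    """Element-wise mean over the balanced matrices of all methods."""
    r = len(matrices[0])
    return [[None if i == j else sum(m[i][j] for m in matrices) / len(matrices)
             for j in range(r)] for i in range(r)]
```

A dependency that is significant in only one of the methods therefore still enters the combined matrix, but with a correspondingly reduced value.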

4.2. Root Cause Priority List

This list contains a ranking of the analyzed process variables with regard to their likelihood of being the actual root cause. To this end, a value defined as RC is associated with each variable. This is done by summing up the causal influence one variable has on the other variables, defined as:

{RC}_{n} : = \sum_{i = 1, i \neq n}^{r} q_{X_{n} \to X_{i}} .

(22)

The variable having the maximum value of RC is ranked first, meaning that this variable is most likely to be the root cause of the disturbance. Table 1 outlines the representation of the root cause priority list.

Table 1. Root cause priority list from the causal matrix.

Rank	Process variable	RC
1	$X_{n}$	$\sum_{i = 1, i \neq n}^{r} q_{X_{n} \to X_{i}}$
⋮	⋮	⋮
r	$X_{k}$	$\sum_{i = 1, i \neq k}^{r} q_{X_{k} \to X_{i}}$
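
Equation (22) and the ranking of Table 1 translate directly into a few lines of code. The sketch below assumes that the combined causal matrix is stored as a nested list in which `Q[n][i]` holds $q_{X_{n} \to X_{i}}$, i.e., the row index is the causing candidate. Equation (21) as printed lists the causes along the columns, so a matrix stored in that layout would have to be transposed first. The function name is hypothetical.

```python
def root_cause_priority(Q, names):
    """Rank process variables by RC_n = sum over i != n of q_{X_n -> X_i}.

    Assumes Q[n][i] is the causal strength of X_n on X_i; the diagonal
    entries are never read.
    """
    r = len(Q)
    scores = [sum(Q[n][i] for i in range(r) if i != n) for n in range(r)]
    order = sorted(range(r), key=lambda n: scores[n], reverse=True)
    return [(rank + 1, names[n], scores[n]) for rank, n in enumerate(order)]
```

Each returned tuple corresponds to one row of Table 1 (rank, process variable, RC), with the most likely root cause listed first.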

4.3. Visualizing Causal Matrices

Several techniques have already been developed that deal with the visualization of causal matrices. In [18], circular directional charts are suggested, and bubble charts are proposed in [6]. In [5], it is suggested to use heat maps to illustrate causal dependencies. Still, all of these methods have the drawback that only one causal matrix can be visualized at a time. Hence, to compare the different causal matrices better, several ways of visualization are utilized in this paper.

Partially-directed graph: In these graphs, process variables are represented by nodes and causal dependencies by directed edges. It is possible that several edges point onto one node or that several edges leave one node. The main purpose of this representation is to give a fast overview of the disturbance propagation path, while the root cause is the first variable of the chain. Furthermore, the size of the arrowhead is used to indicate the strength of the causal dependency. The graph represents the combined causal matrix.

Doughnut chart: These graphs are circular charts that are divided into several sectors, while having a blank center. To represent the causal matrices, the quantity of each sector results from the calculated entries in Q plus one blank sector. The value in the middle of the doughnut represents the combined causal strength.

Bar chart: Bar charts represent values as rectangular bars whose length is proportional to the causal strength. This visualization avoids a drawback of the doughnut chart, whose sectors are hard to compare because they are bent.

5. Use Cases

Two use cases, namely a simulated continuously-stirred-tank reactor and a laboratory plant, are used to test the methods. Further experiments on the laboratory plant and more simulation results on the tank reactor, as well as on other benchmark data sets, can be found in [8].

5.1. Continuously-Stirred-Tank Reactor

To study the performance of the different methods, the model of a continuously-stirred-tank reactor (CSTR), explained in [30], is used. The underlying chemical reaction scheme consists of two irreversible follow-up reactions, where an educt A reacts to an intermediate product B, and this reacts to the resulting product C. The reactants are dissolved in a fluid and can be measured in terms of the three concentrations $c_{A}, c_{B}, c_{C}$ at the outlet of the CSTR. The CSTR is continuously filled, while the fluid has the reactant concentration $c_{in}$ and the temperature $ϑ_{fl}$. V describes the volume of the CSTR and F the selected volume flow rate. The parameters $k_{1}$ and $k_{2}$ are empirical parameters and describe the relationship between the temperature and speed of the chemical reaction. $E_{1}$ and $E_{2}$ are the activation energies of the reactants, and R is the universal gas constant. The parameter values are given in Table 2.

**Table 2.** Continuously-stirred-tank reactor (CSTR) parameters taken from [30].
Parameter	Value	Unit
F	100	$L / min$
V	100	L
$k_{1}$	$7.2 \times 10^{10}$	$1 / min$
$k_{2}$	$5.2 \times 10^{10}$	$1 / min$
$E_{1} / R$	8750	K
$E_{2} / R$	9750	K

Finally, the underlying differential equations of the CSTR are:

\begin{aligned} {\dot{c}}_{A} (t) &= \frac{F}{V} (c_{in} (t) - c_{A} (t)) - k_{1} c_{A} (t) e^{- E_{1} / (R ϑ_{fl})}, \\ {\dot{c}}_{B} (t) &= k_{1} c_{A} (t) e^{- E_{1} / (R ϑ_{fl})} - k_{2} c_{B} (t) e^{- E_{2} / (R ϑ_{fl})} - \frac{F}{V} c_{B} (t), \\ {\dot{c}}_{C} (t) &= k_{2} c_{B} (t) e^{- E_{2} / (R ϑ_{fl})} - \frac{F}{V} c_{C} (t) . \end{aligned}

(23)
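
To make the model concrete, the following sketch integrates Equation (23) with the parameter values from Table 2. It assumes that the intermediate B is formed from A at the rate $k_{1} c_{A} e^{- E_{1} / (R ϑ_{fl})}$, as the reaction scheme A → B → C implies. The explicit Euler scheme, the step size, the zero initial state, and the way the noise is drawn once per step are assumptions made for illustration only and are not taken from the paper.

```python
import math
import random

# Parameters from Table 2 (time unit: minutes).
F_OVER_V = 100.0 / 100.0       # F / V in 1/min
K1, K2 = 7.2e10, 5.2e10        # pre-exponential factors in 1/min
E1_R, E2_R = 8750.0, 9750.0    # E1/R and E2/R in K


def cstr_rhs(c, c_in, theta):
    """Right-hand side of Equation (23) for the state c = (c_A, c_B, c_C)."""
    c_a, c_b, c_c = c
    r1 = K1 * math.exp(-E1_R / theta)   # rate constant of A -> B
    r2 = K2 * math.exp(-E2_R / theta)   # rate constant of B -> C
    return (
        F_OVER_V * (c_in - c_a) - r1 * c_a,
        r1 * c_a - r2 * c_b - F_OVER_V * c_b,
        r2 * c_b - F_OVER_V * c_c,
    )


def simulate_cstr(n_steps, dt=0.01, noisy=False, seed=0):
    """Explicit Euler integration, starting from an empty reactor (assumed).

    With noisy=True, the inputs are perturbed once per step with Gaussian
    noise of variance 3 K^2 (temperature) and 0.1 (mol/L)^2 (inlet
    concentration), a crude stand-in for the white noise in the text.
    """
    rng = random.Random(seed)
    c = (0.0, 0.0, 0.0)
    trajectory = [c]
    for _ in range(n_steps):
        theta = 350.0 + (rng.gauss(0.0, math.sqrt(3.0)) if noisy else 0.0)
        c_in = 1.0 + (rng.gauss(0.0, math.sqrt(0.1)) if noisy else 0.0)
        dc = cstr_rhs(c, c_in, theta)
        c = tuple(x + dt * d for x, d in zip(c, dc))
        trajectory.append(c)
    return trajectory


# Noise-free run over 50 min: the state settles at the operating point.
steady = simulate_cstr(5000)[-1]
```

In the noise-free case, the three concentrations add up to the inlet concentration at steady state, which is a quick plausibility check of the mass balance.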

The set-points of the two input variables are chosen as $ϑ_{fl, OP} = 350$ K and $c_{in, OP} = 1$ mol/L, while $ϑ_{fl}$ is superposed with white noise having $N (0, 3\ K^{2})$ and $c_{in}$ with white noise having $N (0, 0.1\ {(mol / L)}^{2})$. To calculate the causal matrices, in total, $K = 1000$ samples are used. An extract of the data set used for the analysis is given in Figure 1. From the differential equations, it is expected that the methods deliver as a result the disturbance propagation path $ϑ_{fl} \to c_{in} \to c_{A} \to c_{B} \to c_{C}$. $ϑ_{fl}$ should be ranked in first position, since it has an impact on all three chemical reactions.

The results are illustrated in Figure 2, with the red squares marking the expected causal dependencies. The design parameters result in $β_{CCF} = 0.51$, $β_{TE} = 0.11$, $β_{GC} = 0.59$ and $β_{SVM} = 0.28$.

Figure 1. Simulated data from the CSTR used for the analysis.

Figure 2. Causal matrices for the stirred-tank-reactor. The red squares describe the expected causal dependencies to be detected by the methods.

The bar chart in Figure 2 illustrates that all methods detect a large causal strength of $ϑ_{fl}$ pointing towards the other process variables. This is plausible when taking into account the underlying differential equations of the CSTR (see Equation (23)), as the temperature has a direct impact on all three concentrations. Furthermore, the result indicates that the nonlinearity implied in the exponential function has been correctly fitted by the Granger causality and cross-correlation function. Another large causal strength has been found for $c_{in} \to c_{A}$. This can also be explained through the differential equations, as $c_{in}$ has a direct impact on $c_{A}$. The relationship $c_{in} \to c_{B}$ is the only indirect causal dependency detected by all methods, and $c_{A} \to c_{B}$ and $c_{B} \to c_{C}$ are detected by GC, TE and the SVM. The cause-effect dependencies $c_{in} \to c_{C}$ and $c_{A} \to c_{C}$ have been found by TE and the SVM. Except for the SVM, which detects the wrong causal dependency $c_{C} \to c_{B}$, no method finds wrong causal dependencies. Furthermore, the transfer entropy is the only method that detects all expected causal dependencies. The propagation path from the combination of the methods is given in Table 3.

**Table 3.** Root cause priority list calculated from the causal matrix in Figure 2.
Rank	Process variable	RC
1	$ϑ_{fl}$	$2.05$
2	$c_{in}$	$1.20$
3	$c_{A}$	$0.48$
4	$c_{B}$	$0.20$
5	$c_{C}$	$0.12$

The resulting root cause priority list is given in Table 3. $ϑ_{fl}$ is correctly detected as being the root cause, and $c_{in}$, being the source of disturbances, is ranked second. Furthermore, the results show that causal dependencies at the end of the reaction chain result in weaker causal strengths or that these dependencies do not pass significance tests. The reason is that the disturbances are low-pass filtered each time a reaction takes place, so that fewer fluctuations for inferring causal dependencies are present in the data. Additionally, when merging the methods into one resulting causal matrix, the correct causal dependencies obtain a much stronger weighting compared to the causal matrices resulting from only one method at a time.
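
The low-pass argument can be made explicit with a short worked derivation. It assumes a constant temperature $ϑ_{fl} = ϑ_{fl, OP}$, so that Equation (23) is linear in the concentrations. The abbreviations $r_{1}$, $r_{2}$ and the transfer functions $G_{A}$, $G_{B}$, $G_{C}$ are introduced here for illustration only and do not appear in the paper.

```latex
% Abbreviations at constant temperature (assumption):
%   r_1 = k_1 e^{-E_1 / (R ϑ_{fl})},  r_2 = k_2 e^{-E_2 / (R ϑ_{fl})}
% Laplace transform of Equation (23), stage by stage:
\begin{aligned}
G_{A}(s) &= \frac{C_{A}(s)}{C_{in}(s)} = \frac{F/V}{s + F/V + r_{1}}, \\
G_{B}(s) &= \frac{C_{B}(s)}{C_{A}(s)} = \frac{r_{1}}{s + F/V + r_{2}}, \\
G_{C}(s) &= \frac{C_{C}(s)}{C_{B}(s)} = \frac{r_{2}}{s + F/V}.
\end{aligned}
% Each stage is a first-order lag with magnitude
%   |G(j\omega)| = \frac{g}{\sqrt{\omega^{2} + a^{2}}},
% which decreases monotonically with the frequency \omega.
```

With the values from Table 2 at 350 K, $r_{1} \approx 1.0$ 1/min and $r_{2} \approx 0.04$ 1/min. Fluctuations of $c_{in}$ therefore pass through up to three cascaded first-order lags before reaching $c_{C}$, and the last stage additionally has a static gain of only about 0.04, which is consistent with the weak causal strengths observed at the end of the chain.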

Figure 3. Experimental setup of the laboratory plant.

5.2. Experimental Laboratory Plant

To evaluate the methods on real-world data, a fault is generated in an experimental laboratory plant that pumps water in cycles. A photo of the plant is given in Figure 3, and the connection of the different process devices is sketched in Figure 4. The process starts by setting a pump ($x_{1}$) positioned on the lower side of the plant into feed-forward control to transfer water into the ball-shaped upper tank. From the upper tank, the water passes several measurement devices before flowing into a lower cylindrical tank. Finally, the water flows from the lower tank back to the pump and closes the water cycle. Between the two tanks, pressure ($x_{2}$) and flow ($x_{3}$) are measured. With a valve ($x_{4}$), placed between the flow meter and the lower tank, the water flow can be controlled. Additionally, the filling level ($x_{5}$) is measured in the lower tank.

Figure 4. Schematic drawing of the laboratory plant.

To generate a fault, a connection cable between the valve and compressor is removed and reattached randomly. The pump is set to 50% of its maximum feeding rate. The resulting data are illustrated in Figure 5. The instant the valve closes, the water is blocked from flowing from the upper to the lower tank. As it reopens, the process goes back into stationary phase. For the process variables, this means that the flow reduces, while the level meter measures a continuous reduction of the water in the lower tank. The hydraulic pressure increases, until the pump stops delivering water from the lower to the upper tank. It is expected that the valve ($x_{4}$) is detected as being the root cause of the disturbance. The results are given in Figure 6, where the red squares mark the expected causal dependencies, and the design parameters result in $β_{CCF} = 0.33$, $β_{TE} = 0.26$, $β_{GC} = 0.31$ and $β_{SVM} = 0.82$.

No single method detects all expected causal dependencies, but each expected causal dependency is found by at least two methods. Table 4 gives the root cause priority list from the combined causal matrix. The valve is ranked first and is therefore correctly detected as being the root cause of the fault.

As with the data from the CSTR, the outcome shows that when merging the methods into one resulting causal matrix, the correct causal dependencies obtain a stronger weighting, meaning that the causal dependencies wrongly detected by some methods become less relevant.

Figure 5. Data of the laboratory plant when having a faulty valve. Sampling rate $T_{s} = 2$ s.

Figure 6. Causal matrices for the laboratory plant. The red squares describe the expected causal dependencies to be detected by the methods.

**Table 4.** Root cause priority list calculated from the causal matrix in Figure 6.
Rank	Process Variable	RC
1	Valve opening ( $x_{4}$ )	1.61
2	Pressure ( $x_{2}$ )	1.04
3	Flow ( $x_{3}$ )	0.69
4	Filling level ( $x_{5}$ )	0.61
5	Pump feeding rate ( $x_{1}$ )	0

6. Conclusion and Future Work

Several data-driven methods have been proposed to detect causal dependencies in measurements by exploiting information contained in time-shifts and statistical relationships.

As use cases, the methods were applied to backtrack the disturbance propagation path and localize the root cause with data coming from a simulated stirred-tank reactor and from a laboratory plant. In both cases, the variable causing the fault was found correctly. Additionally, the results of the use cases suggest that it is useful to apply more than one method in an analysis. Since a detected causal dependency is only a hypothesis, it is better supported if different methods indicate the same causal dependency.

There is much room for future research. As all proposed methods localize faults solely from process data, one part of future work will focus on the integration of prior knowledge available from a plant (e.g., through the scheme of the plant or known process characteristics). Additionally, attenuation of fluctuations along their propagation through the system can be used as a further source of information for reconstructing the propagation path. Another topic is how the methods can be extended to work on MIMO systems, as it is also possible that two faults occur at the same time in a plant. Finally, the approach for combining the methods shows good results, but is rather ad hoc. Approaches using fuzzy decision making or Bayesian statistics need to be investigated.

Author Contributions

The main contributions that have been presented in this paper are: (1) a new algorithm based on support vector machines for the detection of causal dependencies in data by using a recursive variable selection and model reduction approach; (2) a design approach to combine the different methods into one causal matrix. As each method follows a different mathematical approach, this was done by introducing exponential fitting parameters into each method. In a subsequent step, the combined causal matrix is used to generate a root cause priority list to decide which process variable is most likely to be the root cause of the found causal dependencies; (3) an existing method based on the cross-correlation function has been extended by using permutation tests as the significance test; and (4) a new visualization method for the representation of causal matrices was developed. This visualization allows a better comparison of the results coming from the different methods, since all causal matrices can be represented in a single graphic.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Granger, C.W.J. Investigating Causal Relations by Econometric Models and Cross-Spectral Methods. Econometrica 1969, 37, 424–438.
2. Horch, A. Condition Monitoring of Control Loops; Trita-S3-REG, Royal Institute of Technology: Stockholm, Sweden, 2000.
3. Schreiber, T. Measuring information transfer. Phys. Rev. Lett. 2000, 85, 461–464.
4. Murphy, K.P. Dynamic Bayesian Networks: Representation, Inference and Learning. PhD Thesis, University of California, Berkeley, CA, USA, 2002.
5. Eaton, D.; Murphy, K. Belief net structure learning from uncertain interventions. J. Mach. Learn. Res. 2007, 1, 1–48.
6. Bauer, M. Data-driven Methods for Process Analysis. PhD Thesis, University College London, London, UK, 2005.
7. Kühnert, C.; Gröll, L.; Heizmann, M.; Mikut, R. Methoden zur datengetriebenen Formulierung und Visualisierung von Kausalitätshypothesen. Automatisierungstechnik 2012, 60, 630–640.
8. Kuehnert, C. Data-driven Methods for Fault Localization in Process Technology; Karlsruher Schriften zur Anthropomatik, KIT Scientific Publishing: Karlsruhe, Germany, 2013.
9. Verma, T.; Pearl, J. A theory of inferred causation. In Proceedings of the Second International Conference on the Principles of Knowledge Representation and Reasoning, Cambridge, MA, USA, 22–25 April 1991.
10. Kühnert, C.; Bernard, T.; Frey, C.W. Causal Structure Learning in Process Engineering Using Bayes Nets and Soft Interventions. In Proceedings of the IEEE International Conference on Industrial Informatics, Caparica, Lisbon, Portugal, 26–29 July 2011; pp. 69–74.
11. Johansson, R. System Modeling and Identification; Information and System Sciences Series 1; Prentice Hall: Englewood Cliffs, NJ, USA, 1993.
12. Gelper, S.; Croux, C. On the Construction of the European Economic Sentiment Indicator. Oxf. Bull. Econ. Stat. 2010, 72, 47–62.
13. Saunders, A. The Short-Run Causal Relationship between U.K. Interest Rates, Share Prices and Dividend Yields. Scott. J. Polit. Econ. 1979, 26, 61–71.
14. Xiaoyan, M.; Kewei, C.; Rui, L.; Xiaotong, W.; Li, Y.; Xia, W. Application of Granger causality analysis to effective connectivity of the default-mode network. In Proceedings of the 2010 IEEE/ICME International Conference on Complex Medical Engineering (CME), Gold Coast, QLD, Australia, 13–15 July 2010; pp. 156–160.
15. Yang, W.; Luo, Q. Modeling Protein-Signaling Networks with Granger Causality Test. In Frontiers in Computational and Systems Biology; Computational Biology, Series 15; Springer: London, UK, 2010; pp. 249–257.
16. Akaike, H. A new look at the statistical model identification. IEEE Trans. Autom. Control 1974, 19, 716–723.
17. Durbin, J.; Watson, G.S. Testing for serial correlation in least squares regression. Biometrika 1950, 37, 409–428.
18. Seth, A.K. A MATLAB toolbox for Granger causal connectivity analysis. J. Neurosci. Methods 2010, 186, 262–273.
19. Chavez, M.; Martinerie, J.; Le Van Quyen, M. Statistical assessment of nonlinear causality: Application to epileptic EEG signals. J. Neurosci. Methods 2003, 124, 113–128.
20. Staniek, M.; Lehnertz, K. Symbolic transfer entropy: Inferring directionality. Biomed. Tech. 2009, 54, 323–328.
21. Marschinski, R.; Kantz, H. Analysing the information flow between financial time series. Eur. Phys. J. B Condens. Matter 2002, 30, 275–281.
22. Vapnik, V.N. Estimation of Dependencies Based on Empirical Data; Springer: New York, NY, USA, 1982.
23. Smola, A.J.; Schölkopf, B. A tutorial on support vector regression. Stat. Comput. 2004, 14, 199–222.
24. Wu, F.; Yin, S.; Karimi, H. Fault Detection and Diagnosis in Process Data Using Support Vector Machines. J. Appl. Math. 2014, 2014.
25. Shi, F.; Chen, J.; Xu, Y.; Karimi, H. Optimization of Biodiesel Injection Parameters Based on Support Vector Machine. Math. Probl. Eng. 2013, 2013.
26. Cristianini, N.; Shawe-Taylor, J. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods, 1st ed.; Cambridge University Press: Cambridge, MA, USA, 2000.
27. Schölkopf, B.; Smola, A.J. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond; MIT Press: Cambridge, MA, USA, 2001.
28. Nelder, J.A.; Mead, R. A Simplex Method for Function Minimization. Comput. J. 1965, 7, 308–313.
29. Rakotomamonjy, A. Analysis of SVM regression bounds for variable ranking. Neurocomputing 2007, 70, 1489–1501.
30. Tenny, M.; Rawlings, J. Closed-loop Behavior of Nonlinear Model Predictive Control. Tech. Rep. 2002, 50, 2142–2154.

© 2014 by the authors; licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/4.0/).
