1. Introduction
The use of Bayesian Networks (BNs) for the study of reliability has been widely advocated in the literature [1]. However, the asymmetric processes that are common in system reliability can hardly be fully captured by the framework of a BN.
Fortunately, it has been shown that any discrete BN can be embellished into a tree-based graph called a Chain Event Graph (CEG) [2,3]. The CEG is a graphical model that is a function of an underlying event tree and certain context-specific conditional independence statements. In particular, the CEG can model and depict the types of structural asymmetries that the BN framework struggles to embody [4]. This can then provide a framework for studying the causal mechanisms behind the failures of a given system. For example, Cowell and Smith [2] developed a dynamic programming algorithm for maximum a posteriori (MAP) structural learning for causal discovery within a restricted class of CEGs called stratified CEGs.
Conventional causal algebras have been adapted from Pearl’s do-calculus for BNs [5] to the singular manipulation on a CEG, and the back-door theorem has been generalised by previous research to estimate the effect of this manipulation [6,7]. In a different strand of research, Barclay, Hutton, and Smith [8] developed a class of CEGs suited to incorporating various missing data structures directly through their topology. Unlike BNs, conjugate inference is still well supported by the structure of CEGs even in the presence of missingness [2].
In Section 2, we adapt the MAP structural learning algorithm [2] to search for the best scoring structure of a CEG when some data is informedly missing. The selected model provides the best explanation of the observed data that has been informedly censored. By assuming that each candidate CEG is causal in the sense formally defined in [6,7], the best scoring CEG describes the system in its idle mode, and causal deductions can then be made from it.
In our recent work [9], we demonstrated how to embed the causal reasoning underlying engineering reports within CEGs designed specifically for applications in system reliability. The causal calculus we developed there only provided a framework to study and analyse the impact of remedial interventions, i.e., interventions designed to rectify the root cause after a failure had been observed.
In Section 3, we extend the use of the CEG causal framework with missingness to express and analyse a different kind of intervention called a routine intervention. This new class of intervention is necessary when we are evaluating the impact of interventions within scheduled maintenance regimes. These regimes are prepared in advance and are used to inspect machines with the objective of preventing future failures that might be about to happen. In this context, although the data may be informedly missing, we can still develop algorithms that, under certain stated hypotheses, produce formulae to give quantitative estimates of the impacts of various candidate routine interventions of this type.
In this paper, we therefore show how the underlying CEG model can be used to predict the effect of various types of such interventions. In particular, we report a new back-door criterion, an analogue of Pearl’s back-door criterion for BNs [5]. This gives a quick sufficient condition for whether the effect of such an intervention is identifiable when data is censored in a way that induces informed missingness. This criterion significantly increases the scope of the original causal calculus using CEGs designed for the singular manipulation [6] and the stochastic manipulation established for BNs [5]. It thus enables us to transfer causal technologies so that they apply to this graphical family.
In Section 4, we demonstrate how to interpret the causal structures of a best scoring CEG through a simple example of a conservator system. Furthermore, comparative experiments are designed to show that the proposed new causal algebras can embellish the current structural learning algorithm to capture the causal effects of a routine intervention.
The contributions of this paper are threefold. First, we formally derive a method for selecting a CEG that provides the framework of a probability model of maintenance regimes, one which acknowledges the presence of informed missingness within the fitted data endemic in these applications. Second, we devise new causal algebras for the routine intervention and prove the identifiability of its causal effects in the presence of the types of missing data that we might expect from this application. Third, we demonstrate how important this new intervention calculus can be in making valid inferences, and how naive inferences that treat the system as uncontrolled and ignore the underlying causal structure within this application can severely mislead the analyst.
2. Causal Identifiability on Chain Event Graphs with Informed Missingness
We begin this section by briefly reviewing and then extending the definition of a CEG [2,3,6,7,8] before providing a systematic approach to embedding information about the context-specific missingness into a CEG customised for the domain of reliability [9,10].
Suppose we have a vector of variables $\boldsymbol{X} = (X_1, \dots, X_n)$ taking values in a state space $\mathbb{X}$, among which we explore various putative causal hypotheses. An event tree $\mathcal{T} = (V(\mathcal{T}), E(\mathcal{T}))$ can be constructed to represent relationships embedded in $\boldsymbol{X}$, where $V(\mathcal{T})$ denotes the vertex set and $E(\mathcal{T})$ denotes the edge set of $\mathcal{T}$. Each non-leaf node is also called a situation. Let $S(\mathcal{T}) \subset V(\mathcal{T})$ denote the set of non-leaf nodes. The floret of a situation $v \in S(\mathcal{T})$ is a subtree of $\mathcal{T}$, denoted by $\mathcal{F}(v) = (V(\mathcal{F}(v)), E(\mathcal{F}(v)))$. The vertex set of $\mathcal{F}(v)$ consists of $v$ and the vertices in $V(\mathcal{T})$ connected from $v$ by a directed edge in $E(\mathcal{T})$: $V(\mathcal{F}(v)) = \{v\} \cup \{v' \in V(\mathcal{T}) : (v, v') \in E(\mathcal{T})\}$. The edge set of $\mathcal{F}(v)$ is the subset of $E(\mathcal{T})$ satisfying $E(\mathcal{F}(v)) = \{(v, v') \in E(\mathcal{T}) : v' \in V(\mathcal{F}(v))\}$.
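The floret construction just described can be sketched with a minimal data structure. The class and method names below are illustrative only, not taken from any published implementation:

```python
# A minimal event tree as an adjacency list; all names are illustrative.
class EventTree:
    def __init__(self, edges):
        # edges: dict mapping each situation to a list of (child, label) pairs
        self.edges = edges

    def situations(self):
        # Situations are exactly the non-leaf vertices, i.e., those with children.
        return set(self.edges)

    def floret(self, v):
        # The floret of situation v consists of v, its children, and the
        # edges connecting v to those children.
        children = [child for child, _ in self.edges[v]]
        return {"vertices": [v] + children,
                "edges": [(v, child) for child in children]}


# A two-variable toy tree: a defect indicator followed by a failure indicator.
tree = EventTree({
    "v0": [("v1", "defect"), ("v2", "no defect")],
    "v1": [("l1", "fail"), ("l2", "operate")],
    "v2": [("l3", "fail"), ("l4", "operate")],
})
```

Note that the tree can be asymmetric: nothing in this representation forces every situation at the same depth to represent the same variable.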
Let $\mathbb{F} = \{\mathcal{F}(v) : v \in S(\mathcal{T})\}$ denote the collection of all florets on the tree $\mathcal{T}$. Let $\lambda(v_0, v)$ denote a subpath from the root node $v_0$ to a node $v$ on the event tree. Every floret $\mathcal{F}(v)$ represents a random variable conditional on $\lambda(v_0, v)$. We denote this conditional random variable by $X(v)$ for $v \in S(\mathcal{T})$. Each emanating edge of $v$ is labelled by a value of $X(v)$. Thus, every conditional variable $X_i$, $i = 1, \dots, n$, is represented on a set of florets on the event tree, denoted by $\mathbb{F}_i$. Previous research [3,4,6,7] has demonstrated the capability of a tree-like structure to encode asymmetric information. The corresponding event tree $\mathcal{T}$ associated with this description can be asymmetric and non-stratified [2,4], so that the florets representing the same variable can have different distances from the root node $v_0$.
Figure 1 depicts an event tree for a conservator system. Its variables are $(X_1, X_2, X_3, X_4, X_5)$. The categorical variable $X_1$ represents causes of defects and has three levels; $X_2$ is the oil leak indicator; $X_3$ is the alarm indicator; $X_4$ is an indicator of whether there is a sight glass defect or a buchholz defect; and $X_5$ is a failure indicator. This tree is constructed under the assumption that the fault caused by low temperature is irrelevant to the sight glass or buchholz defect, labelled as s/b on the tree. The situations of the tree are annotated as $v_0, v_1, \dots$, and the leaves are the unlabelled vertices. Since the last variable modelled on this tree is the failure indicator $X_5$, the leaves represent the status of the conservator being failed or operational.
Let $\Lambda(\mathcal{T})$ denote the set of all root-to-leaf paths on the tree and $\Lambda(v)$ denote the root-to-leaf paths passing through vertex $v$. The vector $\boldsymbol{\theta} = (\theta_e)_{e \in E(\mathcal{T})}$ is called the vector of primitive probabilities, which satisfies $\theta_e \geq 0$ and $\sum_{e \in E(\mathcal{F}(v))} \theta_e = 1$ for all $v \in S(\mathcal{T})$. Then, the pair $(\mathcal{T}, \boldsymbol{\theta})$ indexes the probability tree [2,3] defined over $\boldsymbol{X}$.
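A probability tree therefore attaches a primitive probability to every edge, subject to a local sum-to-one constraint within each floret, and the probability of a root-to-leaf path is the product of its edge probabilities. The following sketch checks the constraint and evaluates a path probability; all names and numbers are illustrative:

```python
# Check the primitive-probability constraints of a probability tree:
# every theta_e is non-negative and each floret's probabilities sum to one.
def is_probability_tree(edges, theta, tol=1e-9):
    """edges: situation -> list of children; theta: (parent, child) -> probability."""
    for v, children in edges.items():
        local = [theta[(v, c)] for c in children]
        if any(p < 0 for p in local):        # theta_e >= 0
            return False
        if abs(sum(local) - 1.0) > tol:      # sum over the floret equals 1
            return False
    return True

def path_probability(theta, path):
    # A root-to-leaf path's probability is the product of its edge probabilities.
    p = 1.0
    for e in zip(path, path[1:]):
        p *= theta[e]
    return p

# Illustrative two-level tree and primitive probabilities.
edges = {"v0": ["v1", "v2"], "v1": ["l1", "l2"], "v2": ["l3", "l4"]}
theta = {("v0", "v1"): 0.3, ("v0", "v2"): 0.7,
         ("v1", "l1"): 0.6, ("v1", "l2"): 0.4,
         ("v2", "l3"): 0.1, ("v2", "l4"): 0.9}
```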
The BN is capable of handling missing data whenever this applies to all values of a pre-assigned set of variables, by assigning a missingness indicator to each unobservable variable within that set. It is, therefore, possible to use the BN as a framework for identifying when causal hypotheses are identifiable in this rather restricted setting. The associated analyses use various graphically stated criteria, such as the front-door and the back-door criteria; see, e.g., [11,12,13]. However, unfortunately, the types of missingness that routinely occur in reliability, and in particular those associated with the data we collect when performing routine maintenance, are rarely missing across the original random vector associated with the system in this sort of symmetric way. This is because we only learn about those parts of a system that we have chosen to inspect.
In contrast, the probability tree provides a natural and more flexible way to visualise and model context-specific missingness, where the unobservability of a variable depends in part on which path of the tree it lies on. Here, we import the informed missingness into the event tree by defining floret-dependent missingness [14]. Thus, consider a floret $\mathcal{F}(v)$: if the value of the corresponding variable $X(v)$ is unobservable, then we classify this floret into the set $\mathbb{F}^m$. On the other hand, if, conditional on $\lambda(v_0, v)$, the value of the variable $X(v)$ is always observed, then the corresponding floret is classified into the set $\mathbb{F}^o$. Accordingly, we have two subsets of florets, $\mathbb{F}^m$ and $\mathbb{F}^o$, representing unobservable florets and fully observed florets, respectively. Then, $\mathbb{F} = \mathbb{F}^m \cup \mathbb{F}^o$ and $\mathbb{F}^m \cap \mathbb{F}^o = \emptyset$. For every unobservable floret $\mathcal{F}(v) \in \mathbb{F}^m$, we define a missing floret indicator as:
$$M(v) = \begin{cases} 1 & \text{if the value of } X(v) \text{ is unobserved,} \\ 0 & \text{if the value of } X(v) \text{ is observed.} \end{cases}$$
Then, $M(v) = 1$ represents the conditional missingness of $X(v)$. When $\mathcal{F}(v) \in \mathbb{F}^m$, we construct a floret representing this indicator, denote this by $\mathcal{F}(M(v))$, and call it a missing indicator floret. We then reconstruct an event tree by importing the missing indicator florets onto $\mathcal{T}$. We call this a missingness event tree (m-tree). Here, we assume that $M(v)$ precedes $X(v)$, denoted by $M(v) \prec X(v)$. In particular, $\mathcal{F}(v)$ is appended to the edge emanating from $\mathcal{F}(M(v))$ labelled by the value of $M(v)$ corresponding to $X(v)$ being observed. This artificially introduced ordering has already been shown to be useful for interpreting an event tree constructed with informed missingness [8]. The m-tree then has a new class of florets $\mathbb{F}^M = \{\mathcal{F}(M(v)) : \mathcal{F}(v) \in \mathbb{F}^m\}$, which is the set of missing indicator florets. The variables associated with the m-tree are expanded to include the missingness indicators. We denote the topology of the m-tree by $\mathcal{T}^m$. An example of the missingness event tree is shown in Figure 2.
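Mechanically, an m-tree can be obtained by splicing an indicator floret ahead of each unobservable situation and reattaching the original floret to the branch on which the variable is observed. The following sketch uses this convention; the function, node, and label names are illustrative only:

```python
# Splice missing indicator florets into an event tree to form an m-tree.
def insert_missing_indicators(edges, unobservable):
    """edges: situation -> list of (child, label); unobservable: situations
    whose variable can be censored. Returns the m-tree adjacency list."""
    m_edges = {}
    for v, children in edges.items():
        # Redirect every edge into an unobservable situation w through its
        # indicator node m(w).
        m_edges[v] = [(f"m({c})" if c in unobservable else c, lab)
                      for c, lab in children]
    for w in unobservable:
        # The indicator floret: the "observed" edge leads on to w's own
        # floret; the "missing" edge leads to a leaf where the value is censored.
        m_edges[f"m({w})"] = [(w, "observed"), (f"{w}_missing", "missing")]
    return m_edges

# Illustrative example: the oil-leak situation v1 can be unobserved.
edges = {"v0": [("v1", "leak"), ("v2", "no leak")],
         "v1": [("l1", "fail"), ("l2", "operate")],
         "v2": [("l3", "fail"), ("l4", "operate")]}
m_tree = insert_missing_indicators(edges, {"v1"})
```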
Having a missingness event tree, we further elicit a missingness staged tree from $\mathcal{T}^m$. For two situations $v$ and $w$, if $X(v)$ and $X(w)$ represent the same variable, then these two situations are in the same stage whenever $\boldsymbol{\theta}_v = \boldsymbol{\theta}_w$ [3], and an emanating edge $e \in E(\mathcal{F}(v))$ is labelled by the same value of $X$ as $e' \in E(\mathcal{F}(w))$ when $\theta_e = \theta_{e'}$. Here, we relax the restriction of a stratified staged tree, where two situations in the same stage must have the same distance from the root node [2,4]. For example, two situations at different depths of the missingness event tree in Figure 2 can be in the same stage; for a similar example, see [8].
Here, we assume that situations along the same root-to-leaf path cannot be in the same stage. This is the square-free condition defined by Collazo et al. [3]. Vertices in the same stage are assigned a unique colour, and the edges emanating from the same stage with the same label are assigned the same colour. Such a coloured tree that embeds context-specific conditional independence relations is a missingness staged tree. Let $U$ denote the set of stages in the m-tree. Let $U_i$ represent the set of stages associated with variable $X_i$ and $U_{\boldsymbol{X}} = \bigcup_{i=1}^n U_i$. Let $U_M$ denote the set of stages associated with the missing floret indicators. An example of a missingness staged tree of the m-tree in Figure 2 is depicted in Figure 3.
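The stage partition above amounts to grouping situations by the variable they represent together with their floret distribution. A minimal sketch, with illustrative names and toy numbers:

```python
# Partition situations into stages: situations representing the same variable
# fall in the same stage exactly when their floret distributions agree.
def stages(variable_of, dist_of):
    """variable_of: situation -> variable name; dist_of: situation -> tuple of
    edge probabilities. Returns the stages as a list of frozensets."""
    groups = {}
    for v in variable_of:
        key = (variable_of[v], dist_of[v])   # same variable AND same distribution
        groups.setdefault(key, set()).add(v)
    return [frozenset(g) for g in groups.values()]

# Three situations representing the same variable; two share a distribution.
variable_of = {"v1": "X2", "v2": "X2", "v3": "X2"}
dist_of = {"v1": (0.6, 0.4), "v2": (0.6, 0.4), "v3": (0.2, 0.8)}
stage_partition = stages(variable_of, dist_of)
```

Since the situations are keyed only by variable and distribution, nothing here requires them to lie at the same depth, matching the relaxation of the stratified condition described above.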
Two situations in the same stage are in the same position $w$ if their rooted subtrees are isomorphic. This clustering gives a finer partition of the vertices than $U$, denoted by $W$. A missingness chain event graph (MCEG) can be constructed from a missingness staged tree as follows. A sink node $w_\infty$ is created by merging all the leaves of the staged tree. Then, the vertex set of the MCEG is $W \cup \{w_\infty\}$.
For any two positions $w, w' \in W$, we create an edge $(w, w')$ for every edge $(v, v')$ such that the situation $v \in w$ and the child node $v'$ belongs to the position $w'$, where the annotating edge probability is the same as that of $(v, v')$ and is inherited from the original tree. The colours of the vertices and edges of the MCEG are the same as those of the corresponding stages and edges in the missingness staged tree [15].
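Refining stages into positions requires testing whether two coloured subtrees are isomorphic. One way to sketch this is to compute a recursive canonical signature for each subtree and group situations with equal signatures; the function names and the red colour are illustrative only:

```python
# Refine one stage into positions by comparing coloured-subtree signatures.
def signature(edges, colour, v):
    """A canonical form of the coloured subtree rooted at v: the colour of v
    plus the sorted (label, child-signature) pairs of its floret."""
    children = edges.get(v, [])
    return (colour.get(v), tuple(sorted(
        (lab, signature(edges, colour, c)) for c, lab in children)))

def positions(edges, colour, stage):
    """Partition the situations of a single stage into positions."""
    groups = {}
    for v in stage:
        groups.setdefault(signature(edges, colour, v), set()).add(v)
    return [frozenset(g) for g in groups.values()]

# Two same-stage situations whose subtrees are isomorphic, hence one position.
edges = {"v1": [("a1", "fail"), ("a2", "operate")],
         "v2": [("b1", "fail"), ("b2", "operate")]}
colour = {"v1": "red", "v2": "red"}
pos = positions(edges, colour, {"v1", "v2"})
```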
Note that the events on the event tree are chronologically ordered. By definition, a cause comes before its effects. We can be reasonably confident in providing the variables with a plausible order. For example, the trajectory of the events that lead to a machine’s failure always starts with a cause, followed by symptoms, and terminates with a failure. Therefore, we can construct event trees for analysing system failures following this order. In this case, having failed or not is always modelled on the leaves of the tree. Examples are shown in Figure 1 and Figure 2.
It follows that, for this special application of CEGs in system reliability, it is convenient to adapt the semantics and to replace the single sink node defined above by two sink nodes, $w_\infty^{f}$ and $w_\infty^{o}$. In this way, $w_\infty^{f}$ is the receiving node of the edges labelled by a failure, while $w_\infty^{o}$ is the receiving node of the edges labelled by an operational condition. Thus, we can classify the root-to-sink paths into two categories: failure paths and deteriorating paths. The former terminate in $w_\infty^{f}$, while the latter terminate in $w_\infty^{o}$. Figure 4 gives an example of such an MCEG derived from Figure 3.
It is possible to perform conjugate inference on an idle MCEG even when the data is informedly censored [8,16]. This enables us to greatly speed up the search for good explanatory models. The simplest prior to set up in this context assumes that each stage vector $\boldsymbol{\theta}_u$ is independently Dirichlet with parameters $\boldsymbol{\alpha}_u$ [3,8]. This is identical to the case when there are no missingness indicators:
$$\boldsymbol{\theta}_u \sim \mathrm{Dirichlet}(\boldsymbol{\alpha}_u), \quad u \in U.$$
Let $\bar{\alpha} = \sum_{u \in U} \sum_k \alpha_{uk}$ so that, in particular, the equivalent sample size is $\bar{\alpha}$. Then, given a set of observations $D$, the posterior can be computed in a closed form due to Dirichlet-multinomial conjugacy. Thus,
$$\boldsymbol{\theta}_u \mid D \sim \mathrm{Dirichlet}(\boldsymbol{\alpha}_u^{*}),$$
where $\boldsymbol{\alpha}_u^{*} = \boldsymbol{\alpha}_u + \boldsymbol{N}_u$, $\boldsymbol{N}_u$ denotes the vector of edge counts observed for stage $u$, and $\boldsymbol{\alpha}_u^{*}$ is the updated parameter vector.
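The conjugate update for a single stage is a one-line computation: the prior hyperparameters are incremented by the observed edge counts. A minimal sketch with illustrative numbers:

```python
# Dirichlet-multinomial conjugate update for one stage of an MCEG.
def dirichlet_update(alpha, counts):
    """alpha: prior hyperparameters for a stage's edges; counts: observed
    edge counts for that stage. Returns the posterior hyperparameters."""
    return [a + n for a, n in zip(alpha, counts)]

def posterior_mean(alpha_star):
    # The posterior mean of each edge probability under the Dirichlet posterior.
    total = sum(alpha_star)
    return [a / total for a in alpha_star]

# A binary stage with a uniform Dirichlet(1, 1) prior and 10 observations.
alpha_star = dirichlet_update([1, 1], [8, 2])
```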
The log-likelihood score for an MCEG $\mathcal{C}$ can be decomposed into local scores associated with the variables and the missingness indicators. We can explicitly compute the log marginal likelihood in a closed form:
$$\log p(D \mid \mathcal{C}) = \sum_{u \in U} \left( \log \Gamma(\bar{\alpha}_u) - \log \Gamma(\bar{\alpha}_u + \bar{N}_u) + \sum_{k} \left( \log \Gamma(\alpha_{uk} + N_{uk}) - \log \Gamma(\alpha_{uk}) \right) \right),$$
where $\bar{\alpha}_u = \sum_k \alpha_{uk}$ and $\bar{N}_u = \sum_k N_{uk}$.
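This closed-form score, the standard Dirichlet-multinomial marginal likelihood summed over stages, can be sketched directly with the log-gamma function; the function names are illustrative:

```python
# Closed-form log marginal likelihood of an MCEG as a sum of local stage scores.
from math import lgamma

def local_score(alpha, counts):
    """Dirichlet-multinomial log marginal likelihood of one stage."""
    a_bar, n_bar = sum(alpha), sum(counts)
    score = lgamma(a_bar) - lgamma(a_bar + n_bar)
    for a, n in zip(alpha, counts):
        score += lgamma(a + n) - lgamma(a)
    return score

def total_score(stage_alphas, stage_counts):
    # The total score decomposes as the sum of the local stage scores.
    return sum(local_score(a, n) for a, n in zip(stage_alphas, stage_counts))
```

For a binary stage with a Dirichlet(1, 1) prior and a single observation, the marginal likelihood is 1/2, so the local score is −log 2, which gives a quick sanity check.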
To elicit a best scoring CEG from an event tree, it is necessary to search over all possible orderings of the variables modelled by the tree when the total order over the variables is unknown. The event tree is defined to be built with respect to an ordering of $\boldsymbol{X}$, and the associated missingness event tree is built as a function of that ordering with appropriate hypotheses of missingness. Therefore, even when the dataset has missing values, we still only search over permutations of $\boldsymbol{X}$ to find an appropriate ordering that best explains the observed process.
Let $\sigma$ denote an ordering of $\boldsymbol{X}$. This could be a set of partial orderings. All variables represented on the m-tree can automatically be ordered given $\sigma$. We denote the m-tree with a specified ordering by $\mathcal{T}^m_\sigma$.
It is non-trivial to identify causal structure from a finite observational dataset. However, the idle model first needs to be estimated before any causal relations can be explored. Many advances have been made in casting causal discovery as a Bayesian model selection problem [2,17,18]. The MAP structural learning algorithm is a popular and well-developed tool for selecting the CEG topology that best explains the data. Under the hypothesis that there are no unobserved confounders [2], we render the best scoring structure selected by the MAP algorithm causal and assume it is the model of the idle system when no intervention is imported. This enables us to further perform causal analysis. Given such a causal graph, we can derive causal hypotheses from the structure and estimate causal effects under different hypothesised underlying causal mechanisms.
Sometimes there is only a putative partial order, rather than a total order, on the variables whose causal relationships need to be explored. However, in this setting we can still perform the search over candidate CEGs for the best fitting model, provided that the missing variables only extend to later nodes of the tree.
Cowell and Smith [2] and Collazo et al. [3] presented a recursive algorithm that finds the best sink variable for every subset of the variables, ordered by increasing size. This algorithm can be simply adapted to the tree built for the informedly missing data. Let $\boldsymbol{X}' \subseteq \boldsymbol{X}$ denote the subset of variables whose ordering needs to be learned and $\sigma^{*}$ denote the best partial ordering over $\boldsymbol{X}'$. Then, by applying the algorithm designed by [2,3] to every subset $A \subseteq \boldsymbol{X}'$, we can find the best ordering over the variables defined on the tree. Here, we search over the corresponding subspaces of models for each $A$ and compute the local scores with respect to the corresponding missingness indicators. In particular, for every $A \subseteq \boldsymbol{X}'$, we find a best sink variable $X^{*} \in A$ for every subset that has already been ordered appropriately. The best sink variable $X^{*}$ is found by computing the local score of the best subtree spanned by each candidate together with the corresponding missingness indicators.
The MAP score can be evaluated directly from the local scores that have been computed, because the total score is the sum of the local scores, as shown in Equation (5). Two MCEGs $\mathcal{C}_1$ and $\mathcal{C}_2$ fitted to the same data set can be compared by the log-posterior Bayes factor. Suppose both trees have Dirichlet priors whose hyperparameters are $\boldsymbol{\alpha}_1$ and $\boldsymbol{\alpha}_2$. The Bayes factor then has a closed form [3]:
$$\log \frac{p(\mathcal{C}_1 \mid D)}{p(\mathcal{C}_2 \mid D)} = \log p(D \mid \mathcal{C}_1) - \log p(D \mid \mathcal{C}_2) + \log p(\mathcal{C}_1) - \log p(\mathcal{C}_2),$$
where $\log p(\mathcal{C}_i)$ denotes the log prior of model $\mathcal{C}_i$. Different priors over models can be chosen given expert judgement on different missingness mechanisms and conditional dependencies. When using a uniform prior, $p(\mathcal{C}_i) = 1/K$, where $K$ denotes the total number of models.
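As a final sketch, the log-posterior Bayes factor is a simple combination of the two log marginal likelihoods and log priors; under a uniform model prior the prior terms cancel, so the comparison reduces to the difference of the marginal likelihood scores. The log marginal likelihood values below are purely illustrative:

```python
# Compare two candidate MCEGs by the log-posterior Bayes factor.
from math import log

def log_posterior_bayes_factor(logml_1, logml_2, log_prior_1, log_prior_2):
    """Positive values favour the first model."""
    return (logml_1 + log_prior_1) - (logml_2 + log_prior_2)

K = 10                 # total number of candidate models
uniform = -log(K)      # log of the uniform model prior 1/K

# Illustrative log marginal likelihoods for two candidate structures.
lbf = log_posterior_bayes_factor(-105.3, -110.8, uniform, uniform)
```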