Sequential Ignorability and Dismissible Treatment Components to Identify Mediation Effects

Yuhao Deng; Haoyu Wei; Xia Xiao; Yuan Zhang; Yuanmin Huang

doi:10.3390/math12152332

Abstract

Mediation analysis is a useful tool to study the mechanism of how a treatment exerts effects on the outcome. Classical mediation analysis requires a sequential ignorability assumption to rule out cross-world reliance of the potential outcome of interest on the counterfactual mediator in order to identify the natural direct and indirect effects. In recent years, the separable effects framework has adopted dismissible treatment components to identify the separable direct and indirect effects. In this article, we compare the sequential ignorability and dismissible treatment components for longitudinal outcomes and time-to-event outcomes with time-varying confounding and random censoring. We argue that the dismissible treatment components assumption has advantages in interpretation and identification over sequential ignorability, whereas these two conditions lead to identical estimators for the direct and indirect effects. As an illustration, we study the effect of transplant modalities on overall survival mediated by leukemia relapse in patients undergoing allogeneic stem cell transplantation. We find that Haplo-SCT reduces the risk of overall mortality through reducing the risk of relapse, and Haplo-SCT can serve as an alternative to MSDT in allogeneic stem cell transplantation.

Keywords:

dismissible treatment component; identification; mediation; separable effect; sequential ignorability

MSC:

62N01

1. Introduction

Mediation analysis is a useful tool to detect the causal mechanism in randomized trials and observational studies [1]. The original version of mediation analysis was introduced using structural equations. Under the potential outcomes framework, the mediator is considered as a potential outcome of the initial treatment, whereas the primary outcome is considered as the potential outcome of both the initial treatment and the mediator [2,3,4]. The notion of causal mediation analysis stands on a philosophical view that the mediator can be manipulated. The total effect can be decomposed into a natural direct effect and a natural indirect effect. Identification of these natural effects typically requires a sequential ignorability assumption, which is sometimes hard to interpret [5,6]. Sequential ignorability means that the mediator is as-if-randomized given baseline covariates and treatment.

Because the mediator may not be manipulable, classical mediation analysis has been criticized by many researchers [7,8,9,10]. An interventionist approach, the separable effects framework, decomposes the initial treatment into several components, each of which exerts an effect on a single event [11,12,13]. The core identification assumption in separable effects is the dismissible treatment components assumption. The target estimands are usually termed separable direct and indirect effects, which share the same identification formula with the natural direct and indirect effects under slightly different sets of assumptions. The estimands of separable effects can be more interpretable when the treatment components can be found. When sequential ignorability (or the dismissible treatment components assumption) fails, the separable effects still offer a causal explanation, although they may not be identifiable.

To relax the original sequential ignorability, post-treatment variables are taken into consideration in the assumption. It is common that there is post-treatment confounding that affects the mediator and primary outcome [14,15,16]. The definition of natural direct and indirect effects should be modified for identification. When it comes to longitudinal studies or time-to-event studies, the mediation analysis becomes far more complicated [17,18,19]. The post-treatment variables are time-varying. Following the original idea of mediation analysis, a post-treatment variable should be considered as a potential outcome of all manipulable variables measured prior to this variable. Generalizing the sequential ignorability is tedious work as it involves dealing with these post-treatment variables. The sequential ignorability assumption is usually deemed unreasonable because the post-treatment variables may not be manipulable [20]. Even under the separable effects framework, there is a lack of desirable justification of assumptions, since the treatment components can exert effects on the primary outcome through time-varying confounding [13,21]. The separation of causal pathways is determined a priori. It is still worth studying the difference between the sequential ignorability and separable effects frameworks in identifying, estimating and interpreting the treatment effect, and how to extend the frameworks to complicated scenarios.

In this article, we study mediation analysis for longitudinal outcomes and time-to-event outcomes. We compare the sequential ignorability assumption in classical mediation analysis with the dismissible treatment components assumption in the separable effects framework. We formally state the assumptions for identification. We argue that the dismissible treatment components assumption can be more interpretable and admits more concise notation than sequential ignorability. A randomized mediator may not be appropriate, but a treatment component with an isolated effect on a single event may exist in some scenarios. Furthermore, the dismissible treatment components assumption is weaker than sequential ignorability because the former allows unmeasured confounding of some special types. For time-to-event outcomes, it is even unclear whether sequential ignorability can be formally expressed [22].

The remainder of this article is organized as follows: In Section 2, we consider longitudinal outcomes. From the comparison of sequential ignorability and dismissible treatment components, it is easy to see the advantage of dismissible treatment components. In Section 3, we consider time-to-event outcomes. We only introduce the dismissible treatment components because sequential ignorability is infeasible both in notation and interpretation. In Section 4, we apply the framework to a stem cell transplantation study, where the outcome is a time-to-event and there is a binary time-varying covariate. Finally, this article ends with a brief conclusion and discussion.

2. Longitudinal Outcomes

2.1. Identifiability by Sequential Ignorability

Suppose that there are a total of K periods of measurements. Let

A \in {0, 1}

be the treatment, where

A = 0

stands for the control (placebo) and

A = 1

stands for the active treatment. Let

L_{0}

be the baseline covariate,

L_{M}^{k}

be the time-varying mediator-inducing confounding,

L_{Y}^{k}

be the time-varying outcome-inducing confounding,

M^{k}

be the mediator and

Y^{k}

be the outcome at period k (

k = 1, \dots, K

). We aim to study the direct treatment effect on the outcome of interest

Y^{K}

measured at the last period and the indirect effect through time-varying mediators. In fact, at the period k, the time-varying confounding

L_{M}^{k}

,

L_{Y}^{k}

, mediator

M^{k}

and outcome

Y^{k}

are potential outcomes of the treatment (

a_{M}

and

a_{Y}

) and history (

g_{L_{M}}^{k}

,

g_{L_{Y}}^{k}

,

g_{M}^{k}

and

g_{Y}^{k}

), denoted by

\begin{matrix} L_{M}^{k} (a_{M}, g_{L_{M}}^{k}) & = L_{M}^{k} (a_{M}, l_{M}^{j}, l_{Y}^{j}, m^{j}, y^{j} : j = 1, \dots, k - 1), \end{matrix}

(1)

\begin{matrix} L_{Y}^{k} (a_{Y}, g_{L_{Y}}^{k}) & = L_{Y}^{k} (a_{Y}, l_{M}^{j}, l_{Y}^{j}, m^{j}, y^{j}, l_{M}^{k} : j = 1, \dots, k - 1), \end{matrix}

(2)

\begin{matrix} M^{k} (a_{M}, g_{M}^{k}) & = M^{k} (a_{M}, l_{M}^{j}, l_{Y}^{j}, m^{j}, y^{j}, l_{M}^{k}, l_{Y}^{k} : j = 1, \dots, k - 1), \end{matrix}

(3)

\begin{matrix} Y^{k} (a_{Y}, g_{Y}^{k}) & = Y^{k} (a_{Y}, l_{M}^{j}, l_{Y}^{j}, m^{j}, y^{j}, l_{M}^{k}, l_{Y}^{k}, m^{k} : j = 1, \dots, k - 1), \end{matrix}

(4)

where

l_{M}^{0} = l_{Y}^{0} = m^{0} = y^{0} = \emptyset

. We do not consider the post-treatment variables as potential outcomes of baseline covariates because baseline covariates are not manipulable. For notational convenience, we also define

L_{M}^{K + 1} (a_{M}, g_{L_{M}}^{K + 1}) = L_{Y}^{K + 1} (a_{Y}, g_{L_{Y}}^{K + 1}) = M^{K + 1} (a_{M}, g_{M}^{K + 1}) = Y^{K + 1} (a_{Y}, g_{Y}^{K + 1}) = \emptyset .

A significant challenge in conducting mediation analysis for longitudinal outcomes is that the time-varying mediators and outcomes are interacting with each other. The outcome

Y^{k}

at period k can have an effect on the mediator

M^{k + 1}

at period

k + 1

. Therefore, it is not straightforward to define the natural level of the mediators. A possible approach is to define the natural level of time-varying confounding, mediators and outcomes iteratively. At period 1, we set the treatment at

a_{M}

and obtain a natural level of the mediator-inducing confounding (possibly null)

L_{M}^{1} (a_{M})

. Then, we set the treatment at

a_{Y}

and obtain a counterfactual level of the outcome-inducing confounding (possibly null)

L_{Y}^{1} (a_{Y}, L_{M}^{1} (a_{M}))

. Next, we set the treatment at

a_{M}

again and obtain the counterfactual level of the mediator. Finally, we set the treatment at

a_{Y}

and obtain the counterfactual level of the outcome. Repeating this procedure, we can derive the counterfactual levels of the mediator-inducing confounding, outcome-inducing confounding, mediators and outcomes at all periods, denoted by

L_{M}^{k} (a_{M}, a_{Y})

,

L_{Y}^{k} (a_{M}, a_{Y})

,

M^{k} (a_{M}, a_{Y})

and

Y^{k} (a_{M}, a_{Y})

, respectively, for

k = 1, \dots, K

. Let

G_{L_{M}}^{k} (a_{M}, a_{Y})

,

G_{L_{Y}}^{k} (a_{M}, a_{Y})

,

G_{M}^{k} (a_{M}, a_{Y})

and

G_{Y}^{k} (a_{M}, a_{Y})

be the history of

L_{M}^{k}

,

L_{Y}^{k}

,

M^{k}

and

Y^{k}

under such a sequential intervention. The natural direct effect (NDE) and natural indirect effect (NIE) are defined as

\begin{matrix} NDE & = E {Y^{K} (0, 1) - Y^{K} (0, 0)}, \end{matrix}

(5)

\begin{matrix} NIE & = E {Y^{K} (1, 1) - Y^{K} (0, 1)} . \end{matrix}

(6)
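As a short worked check of these definitions, the two effects add up to the total effect of switching both arguments from 0 to 1. The display below uses only the definitions in (5) and (6); the cross-world term cancels.

```latex
\begin{aligned}
\mathrm{NDE} + \mathrm{NIE}
  &= E\{Y^{K}(0,1) - Y^{K}(0,0)\} + E\{Y^{K}(1,1) - Y^{K}(0,1)\} \\
  &= E\{Y^{K}(1,1) - Y^{K}(0,0)\}.
\end{aligned}
```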

Let

G_{L_{M}}^{k}

,

G_{L_{Y}}^{k}

,

G_{M}^{k}

and

G_{Y}^{k}

be the observed history prior to

L_{M}

,

L_{Y}

, M and Y. We always make the stable unit treatment value assumption (SUTVA) that all individuals are independent of each other and the potential outcomes are well defined. Consistency states that the observed variables are equal to the potential variables under the observed treatment.

Assumption 1

(Consistency). For

k = 1, \dots, K

,

\begin{matrix} L_{M}^{k} = L_{M}^{k} (A, A), L_{Y}^{k} = L_{Y}^{k} (A, A), M^{k} = M^{k} (A, A), Y^{k} = Y^{k} (A, A), \\ G_{L_{M}}^{k} = G_{L_{M}}^{k} (A, A), G_{L_{Y}}^{k} = G_{L_{Y}}^{k} (A, A), G_{M}^{k} = G_{M}^{k} (A, A), G_{Y}^{k} = G_{Y}^{k} (A, A) . \end{matrix}

Identification of the NDE and NIE requires identifying the distribution of the counterfactual outcomes. To tackle the dependence of cross-world quantities on the history, we make the ignorability and sequential ignorability assumption as follows:

Assumption 2

(Ignorability for longitudinal outcomes).

\begin{matrix} A ⫫ (L_{M}^{k} (a_{M}, g_{L_{M}}^{k}), L_{Y}^{k} (a_{Y}, g_{L_{Y}}^{k}), M^{k} (a_{M}, g_{M}^{k}), Y^{k} (a_{Y}, g_{Y}^{k}) : k = 1, \dots, K) ∣ L_{0} . \end{matrix}

Assumption 3

(Sequential ignorability for longitudinal outcomes).

\begin{matrix} L_{M}^{k} ⫫ (L_{M}^{j + 1} (a_{M}, g_{L_{M}}^{j + 1}), L_{Y}^{j} (a_{Y}, g_{L_{Y}}^{j}), M^{j} (a_{M}, g_{M}^{j}), Y^{j} (a_{Y}, g_{Y}^{j}) : j = k, \dots, K) ∣ A, L_{0}, G_{L_{M}}^{k} = g_{L_{M}}^{k}, \\ L_{Y}^{k} ⫫ (L_{M}^{j + 1} (a_{M}, g_{L_{M}}^{j + 1}), L_{Y}^{j + 1} (a_{Y}, g_{L_{Y}}^{j + 1}), M^{j} (a_{M}, g_{M}^{j}), Y^{j} (a_{Y}, g_{Y}^{j}) : j = k, \dots, K) ∣ A, L_{0}, G_{L_{Y}}^{k} = g_{L_{Y}}^{k}, \\ M^{k} ⫫ (L_{M}^{j + 1} (a_{M}, g_{L_{M}}^{j + 1}), L_{Y}^{j} (a_{Y}, g_{L_{Y}}^{j}), M^{j + 1} (a_{M}, g_{M}^{j + 1}), Y^{j} (a_{Y}, g_{Y}^{j}) : j = k, \dots, K) ∣ A, L_{0}, G_{M}^{k} = g_{M}^{k}, \\ Y^{k} ⫫ (L_{M}^{j + 1} (a_{M}, g_{L_{M}}^{j + 1}), L_{Y}^{j} (a_{Y}, g_{L_{Y}}^{j}), M^{j + 1} (a_{M}, g_{M}^{j + 1}), Y^{j + 1} (a_{Y}, g_{Y}^{j + 1}) : j = k, \dots, K) ∣ A, L_{0}, G_{Y}^{k} = g_{Y}^{k}, \end{matrix}

where

k = 1, \dots, K

.

Ignorability simply means that the treatment is independent of all potential variables. The first independence in the sequential ignorability means that the treatment mechanism is ignorable given baseline covariates. At the period k, given all the observed variables prior to

L_{M}^{k}

(or

L_{Y}^{k}

,

M^{k}, Y^{k}

) including baseline covariates and observed treatment, the observed

L_{M}^{k}

(or

L_{Y}^{k}

,

M^{k}, Y^{k}

) are independent of all potential variables later than

L_{M}^{k}

(or

L_{Y}^{k}

,

M^{k}, Y^{k}

). Every post-treatment variable is as-if-randomized given the observed variables prior to the current time. To make the sequential ignorability assumption hold, unmeasured confounding between any two of

L_{M}^{j_{1}}

,

L_{Y}^{j_{2}}

,

M^{j_{3}}

and

Y^{j_{4}}

should be excluded (

j_{1}, j_{2}, j_{3}, j_{4} \in {1, \dots, K}

). Figure 1 displays a directed acyclic graph (DAG) that satisfies sequential ignorability. To derive the potential outcome

Y^{K} (a_{M}, a_{Y})

, we set

L_{M}^{k}

and

M^{k}

at the level under the treatment

a_{M}

, and set

L_{Y}^{k}

and

Y^{k}

at the level under the treatment

a_{Y}

, given the baseline covariates and history. No unmeasured confounding is allowed on the graph.

Figure 1. A directed acyclic graph (DAG) for longitudinal outcomes with 3 periods. Here, A is the treatment,

L_{M}^{j}

is the mediator-inducing confounding,

L_{Y}^{j}

is the outcome-inducing confounding,

M^{j}

is the mediator and

Y^{j}

is the outcome at period j. The baseline covariates

L_{0}

, which can have direct edges to all variables, are omitted. Red lines depict edges into the mediator, blue lines depict edges into the mediator-inducing confounding, green lines depict edges into the outcome, and black lines depict edges into the outcome-inducing confounding. This DAG also satisfies Markovness.

In the presence of censoring, let

Δ_{L_{M}}^{k}

,

Δ_{L_{Y}}^{k}

,

Δ_{M}^{k}

and

Δ_{Y}^{k}

be the censoring indicators for

L_{M}

,

L_{Y}

, M and Y, respectively, at period k. The censoring indicator equals 1 if the variable is observed and 0 if the variable is missing. We assume that the censoring is random given the observed history, not depending on potential variables. The random censoring assumption we use here is weaker than the non-informative censoring assumption usually made in the survival analysis literature, which assumes that censoring is independent of all potential variables, because we allow the censoring probability to be explained by the time-varying observed variables.

Assumption 4

(Random censoring for longitudinal outcomes). For

k = 1, \dots, K

,

\begin{matrix} Δ_{L_{M}}^{k} ⫫ L_{M}^{k} (a_{M}, g_{L_{M}}^{k}) ∣ A = a_{M}, L_{0}, G_{L_{M}}^{k} = g_{L_{M}}^{k}, \\ Δ_{L_{Y}}^{k} ⫫ L_{Y}^{k} (a_{Y}, g_{L_{Y}}^{k}) ∣ A = a_{Y}, L_{0}, G_{L_{Y}}^{k} = g_{L_{Y}}^{k}, \\ Δ_{M}^{k} ⫫ M^{k} (a_{M}, g_{M}^{k}) ∣ A = a_{M}, L_{0}, G_{M}^{k} = g_{M}^{k}, \\ Δ_{Y}^{k} ⫫ Y^{k} (a_{Y}, g_{Y}^{k}) ∣ A = a_{Y}, L_{0}, G_{Y}^{k} = g_{Y}^{k} . \end{matrix}

In addition, we assume positivity. The supports of potential variables should overlap between hypothetical treatments, and the censoring time should be large enough that we have information about the variable distributions at every period.

Assumption 5

(Positivity for longitudinal outcomes). The following statements hold:

\begin{matrix} P (L_{0} = l_{0}) > 0 & \Rightarrow P (A = a, L_{0} = l_{0}) > 0, \\ P (L_{0} = l_{0}, G_{L_{M}}^{k} (a_{M}, a_{Y}) = g_{L_{M}}^{k}) > 0 & \Rightarrow P (A = a_{M}, L_{0} = l_{0}, G_{L_{M}}^{k} = g_{L_{M}}^{k}, Δ_{L_{M}}^{k} = 1) > 0, \\ P (L_{0} = l_{0}, G_{L_{Y}}^{k} (a_{M}, a_{Y}) = g_{L_{Y}}^{k}) > 0 & \Rightarrow P (A = a_{Y}, L_{0} = l_{0}, G_{L_{Y}}^{k} = g_{L_{Y}}^{k}, Δ_{L_{Y}}^{k} = 1) > 0, \\ P (L_{0} = l_{0}, G_{M}^{k} (a_{M}, a_{Y}) = g_{M}^{k}) > 0 & \Rightarrow P (A = a_{M}, L_{0} = l_{0}, G_{M}^{k} = g_{M}^{k}, Δ_{M}^{k} = 1) > 0, \\ P (L_{0} = l_{0}, G_{Y}^{k} (a_{M}, a_{Y}) = g_{Y}^{k}) > 0 & \Rightarrow P (A = a_{Y}, L_{0} = l_{0}, G_{Y}^{k} = g_{Y}^{k}, Δ_{Y}^{k} = 1) > 0, \end{matrix}

for every

a, a_{M}, a_{Y} \in {0, 1}

.

Under ignorability, sequential ignorability, consistency and random censoring, we can show that (see Appendix A)

\begin{matrix} d P (L_{M}^{k} (a_{M}, g_{L_{M}}^{k}) = l_{M}^{k} ∣ L_{0} = l_{0}, G_{L_{M}}^{k} (a_{M}, a_{Y}) = g_{L_{M}}^{k}) \\ = d P (L_{M}^{k} = l_{M}^{k} ∣ A = a_{M}, L_{0} = l_{0}, G_{L_{M}}^{k} = g_{L_{M}}^{k}, Δ_{L_{M}}^{k} = 1), \end{matrix}

(7)

\begin{matrix} d P (L_{Y}^{k} (a_{Y}, g_{L_{Y}}^{k}) = l_{Y}^{k} ∣ L_{0} = l_{0}, G_{L_{Y}}^{k} (a_{M}, a_{Y}) = g_{L_{Y}}^{k}) \\ = d P (L_{Y}^{k} = l_{Y}^{k} ∣ A = a_{Y}, L_{0} = l_{0}, G_{L_{Y}}^{k} = g_{L_{Y}}^{k}, Δ_{L_{Y}}^{k} = 1), \end{matrix}

(8)

\begin{matrix} d P (M^{k} (a_{M}, g_{M}^{k}) = m^{k} ∣ L_{0} = l_{0}, G_{M}^{k} (a_{M}, a_{Y}) = g_{M}^{k}) \\ = d P (M^{k} = m^{k} ∣ A = a_{M}, L_{0} = l_{0}, G_{M}^{k} = g_{M}^{k}, Δ_{M}^{k} = 1), \end{matrix}

(9)

\begin{matrix} d P (Y^{k} (a_{Y}, g_{Y}^{k}) = y^{k} ∣ L_{0} = l_{0}, G_{Y}^{k} (a_{M}, a_{Y}) = g_{Y}^{k}) \\ = d P (Y^{k} = y^{k} ∣ A = a_{Y}, L_{0} = l_{0}, G_{Y}^{k} = g_{Y}^{k}, Δ_{Y}^{k} = 1) . \end{matrix}

(10)

We take the expectation for

Y^{K} (a_{M}, a_{Y})

over the distribution of all potential variables using the g-formula [23]. Under positivity, the expectation of the potential outcome of interest at period K is identified as

\begin{matrix} E {Y^{K} (a_{M}, a_{Y})} = \int y^{K} \prod_{k = 1}^{K} & d P (L_{M}^{k} = l_{M}^{k} ∣ A = a_{M}, L_{0} = l_{0}, G_{L_{M}}^{k} = g_{L_{M}}^{k}, Δ_{L_{M}}^{k} = 1) \\ d P (L_{Y}^{k} = l_{Y}^{k} ∣ A = a_{Y}, L_{0} = l_{0}, G_{L_{Y}}^{k} = g_{L_{Y}}^{k}, Δ_{L_{Y}}^{k} = 1) \\ d P (M^{k} = m^{k} ∣ A = a_{M}, L_{0} = l_{0}, G_{M}^{k} = g_{M}^{k}, Δ_{M}^{k} = 1) \\ d P (Y^{k} = y^{k} ∣ A = a_{Y}, L_{0} = l_{0}, G_{Y}^{k} = g_{Y}^{k}, Δ_{Y}^{k} = 1) \\ d P (L_{0} = l_{0}) . \end{matrix}

(11)

The integration is conducted over the support of

(L_{0}, L_{M}^{k}, L_{Y}^{k}, M^{k}, Y^{k} : k = 1, \dots, K)

. We summarize the result in the following theorem.

Theorem 1.

Under Assumptions 1–5, NDE and NIE are identifiable.
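As a sanity check on (11), consider the special case K = 1 with no time-varying confounding (that is, null L_{M}^{1} and L_{Y}^{1}) and no censoring. These simplifications are assumptions made only for illustration. The history of M^{1} is then empty and the history of Y^{1} contains only m^{1}, so the formula reduces to the familiar single-period mediation formula:

```latex
E\{Y^{1}(a_{M}, a_{Y})\}
  = \int\!\!\int E(Y^{1} \mid A = a_{Y},\, M^{1} = m,\, L_{0} = l_{0})\,
    dP(M^{1} = m \mid A = a_{M},\, L_{0} = l_{0})\,
    dP(L_{0} = l_{0}).
```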

The model involves more predictors with k growing larger. Estimation becomes more complicated if the measurements have too many periods. For simplicity, we may assume Markovness (exclusion restriction) that the potential variables at period k only rely on the preceding variables for at most one period.

Assumption 6

(Markovness for longitudinal outcomes). For

k = 1, \dots, K

,

\begin{matrix} L_{M}^{k} (a_{M}, g_{L_{M}}^{k}) & = L_{M}^{k} (a_{M}, l_{M}^{k - 1}, l_{Y}^{k - 1}, m^{k - 1}, y^{k - 1}), \\ L_{Y}^{k} (a_{Y}, g_{L_{Y}}^{k}) & = L_{Y}^{k} (a_{Y}, l_{Y}^{k - 1}, m^{k - 1}, y^{k - 1}, l_{M}^{k}), \\ M^{k} (a_{M}, g_{M}^{k}) & = M^{k} (a_{M}, m^{k - 1}, y^{k - 1}, l_{M}^{k}, l_{Y}^{k}), \\ Y^{k} (a_{Y}, g_{Y}^{k}) & = Y^{k} (a_{Y}, y^{k - 1}, l_{M}^{k}, l_{Y}^{k}, m^{k}), \end{matrix}

in which

l_{M}^{k}

,

l_{Y}^{k}

,

m^{k}

and

y^{k}

are consistent with the history

g_{L_{M}}^{k}

,

g_{L_{Y}}^{k}

,

g_{M}^{k}

and

g_{Y}^{k}

.

Under Assumption 6,

\begin{matrix} d P (L_{M}^{k} (a_{M}, g_{L_{M}}^{k}) = l_{M}^{k} ∣ L_{0} = l_{0}, G_{L_{M}}^{k} (a_{M}, a_{M}) = g_{L_{M}}^{k}) \\ = d P (L_{M}^{k} = l_{M}^{k} ∣ A = a_{M}, L_{0} = l_{0}, L_{M}^{k - 1} = l_{M}^{k - 1}, L_{Y}^{k - 1} = l_{Y}^{k - 1}, M^{k - 1} = m^{k - 1}, Y^{k - 1} = y^{k - 1}), \\ d P (L_{Y}^{k} (a_{Y}, g_{L_{Y}}^{k}) = l_{Y}^{k} ∣ L_{0} = l_{0}, G_{L_{Y}}^{k} (a_{Y}, a_{Y}) = g_{L_{Y}}^{k}) \\ = d P (L_{Y}^{k} = l_{Y}^{k} ∣ A = a_{Y}, L_{0} = l_{0}, L_{Y}^{k - 1} = l_{Y}^{k - 1}, M^{k - 1} = m^{k - 1}, Y^{k - 1} = y^{k - 1}, L_{M}^{k} = l_{M}^{k}), \\ d P (M^{k} (a_{M}, g_{M}^{k}) = m^{k} ∣ L_{0} = l_{0}, G_{M}^{k} (a_{M}, a_{M}) = g_{M}^{k}) \\ = d P (M^{k} = m^{k} ∣ A = a_{M}, L_{0} = l_{0}, M^{k - 1} = m^{k - 1}, Y^{k - 1} = y^{k - 1}, L_{M}^{k} = l_{M}^{k}, L_{Y}^{k} = l_{Y}^{k}), \\ d P (Y^{k} (a_{Y}, g_{Y}^{k}) = y^{k} ∣ L_{0} = l_{0}, G_{Y}^{k} (a_{Y}, a_{Y}) = g_{Y}^{k}) \\ = d P (Y^{k} = y^{k} ∣ A = a_{Y}, L_{0} = l_{0}, Y^{k - 1} = y^{k - 1}, L_{M}^{k} = l_{M}^{k}, L_{Y}^{k} = l_{Y}^{k}, M^{k} = m^{k}) . \end{matrix}

So, we can use a pooled model to estimate these conditional probabilities. Then, the expectation of potential outcome can be estimated by the g-formula. An alternative to estimate the target estimand

E {Y^{K} (a_{M}, a_{Y})}

is to employ weighting methods [24].
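The g-formula can be evaluated by brute-force enumeration in a toy setting. The following is a minimal sketch, assuming binary mediators and outcomes, no baseline covariates, no time-varying confounding and no censoring. The functions `p_m` and `p_y` and their coefficients are hypothetical stand-ins for fitted pooled models under Markovness, not quantities from this article, and the empty initial history is coded as 0 for convenience.

```python
from itertools import product


def p_m(m, a_m, m_prev, y_prev):
    """Hypothetical P(M^k = m | A = a_m, M^{k-1} = m_prev, Y^{k-1} = y_prev)."""
    p1 = 0.2 + 0.3 * a_m + 0.2 * m_prev + 0.1 * y_prev
    return p1 if m == 1 else 1.0 - p1


def p_y(y, a_y, y_prev, m):
    """Hypothetical P(Y^k = y | A = a_y, Y^{k-1} = y_prev, M^k = m)."""
    p1 = 0.1 + 0.2 * a_y + 0.3 * m + 0.2 * y_prev
    return p1 if y == 1 else 1.0 - p1


def g_formula(a_m, a_y, K):
    """E{Y^K(a_m, a_y)}: sum y^K times the product of conditional probabilities."""
    total = 0.0
    # Enumerate every path (m^1, y^1, ..., m^K, y^K) of binary values.
    for path in product([0, 1], repeat=2 * K):
        ms, ys = path[0::2], path[1::2]
        weight, m_prev, y_prev = 1.0, 0, 0  # assumption: empty history coded as 0
        for m, y in zip(ms, ys):
            # Mediator terms use a_m; outcome terms use a_y.
            weight *= p_m(m, a_m, m_prev, y_prev) * p_y(y, a_y, y_prev, m)
            m_prev, y_prev = m, y
        total += ys[-1] * weight
    return total


K = 3
nde = g_formula(0, 1, K) - g_formula(0, 0, K)
nie = g_formula(1, 1, K) - g_formula(0, 1, K)
total = g_formula(1, 1, K) - g_formula(0, 0, K)
print(nde, nie, total)
```

Enumeration costs 4^K terms, so it is only practical for small K; for longer follow-up, Monte Carlo simulation from the fitted conditional models is a common substitute.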

2.2. Identifiability by Dismissible Treatment Components

Identification by sequential ignorability is notationally complex and difficult to interpret. An interventionist approach to studying mediation effects is the separable effects framework [11,13,21]. The initial treatment A is divided into two components

A_{M}

and

A_{Y}

, where

A_{M}

exerts effects on the mediator-inducing confounding

L_{M}^{k}

and mediators

M^{k}

, while

A_{Y}

exerts effects on the outcome-inducing confounding

L_{Y}^{k}

and outcomes

Y^{k}

. All post-treatment variables are potential outcomes of

A_{M}

and

A_{Y}

. Since we do not intervene on post-treatment variables, there is no need to express the post-treatment variables as potential outcomes of their history. In the realized experiment,

A = A_{M} = A_{Y}

, but in hypothetical or future experiments, we can let

A_{M} \neq A_{Y}

.

Specifically, we have the potential mediator-inducing confounding

L_{M}^{k} (a_{M}, a_{Y})

, the potential outcome-inducing confounding

L_{Y}^{k} (a_{M}, a_{Y})

, the potential mediator

M^{k} (a_{M}, a_{Y})

and the potential outcome of interest

Y^{k} (a_{M}, a_{Y})

at period k. Let

L_{0}

be the baseline covariates. Denote the history for the mediator-inducing confounding, outcome-inducing confounding, mediator and outcome at period k by

G_{L_{M}}^{k} (a_{M}, a_{Y})

,

G_{L_{Y}}^{k} (a_{M}, a_{Y})

,

G_{M}^{k} (a_{M}, a_{Y})

and

G_{Y}^{k} (a_{M}, a_{Y})

, respectively, including all the post-treatment variables prior to this variable under the hypothetical treatment components

(a_{M}, a_{Y})

. Let

L_{M}^{k}

,

L_{Y}^{k}

,

M^{k}

and

Y^{k}

be the observed mediator-inducing confounding, outcome-inducing confounding, mediator and outcome at period k. Let

G_{L_{M}}^{k}

,

G_{L_{Y}}^{k}

,

G_{M}^{k}

and

G_{Y}^{k}

be the observed history at period k. We assume that the observed variables are equal to the potential counterparts under the realized treatment.

Assumption 7

(Consistency). For

k = 1, \dots, K

,

\begin{matrix} L_{M}^{k} = L_{M}^{k} (A, A), L_{Y}^{k} = L_{Y}^{k} (A, A), M^{k} = M^{k} (A, A), Y^{k} = Y^{k} (A, A), \\ G_{L_{M}}^{k} = G_{L_{M}}^{k} (A, A), G_{L_{Y}}^{k} = G_{L_{Y}}^{k} (A, A), G_{M}^{k} = G_{M}^{k} (A, A), G_{Y}^{k} = G_{Y}^{k} (A, A) . \end{matrix}

Under the separable effects framework, the natural direct effect (NDE) and natural indirect effect (NIE) on the outcome of interest at period K are defined as

\begin{matrix} NDE & = E {Y^{K} (0, 1) - Y^{K} (0, 0)}, \end{matrix}

(12)

\begin{matrix} NIE & = E {Y^{K} (1, 1) - Y^{K} (0, 1)} . \end{matrix}

(13)

In the literature on separable effects, these estimands are also called the separable direct effect (SDE) and separable indirect effect (SIE).

We assume that the treatment mechanism is ignorable (playing the same role as Assumption 2).

Assumption 8

(Ignorability for longitudinal outcomes).

\begin{matrix} A ⫫ {L_{M}^{k} (a_{M}, a_{Y}), L_{Y}^{k} (a_{M}, a_{Y}), M^{k} (a_{M}, a_{Y}), Y^{k} (a_{M}, a_{Y}) : k = 1, \dots, K} ∣ L_{0} . \end{matrix}

The core assumption for identification is the dismissible treatment components assumption, an alternative to sequential ignorability.

Assumption 9

(Dismissible treatment components for longitudinal outcomes).

\begin{matrix} d P (L_{M}^{k} (a_{M}, a_{Y}) = l_{M}^{k} ∣ L_{0} = l_{0}, G_{L_{M}}^{k} (a_{M}, a_{Y}) = g_{L_{M}}^{k}) \\ = d P (L_{M}^{k} (a_{M}, a_{Y}) = l_{M}^{k} ∣ L_{0} = l_{0}, G_{L_{M}}^{k} (a_{M}, a_{M}) = g_{L_{M}}^{k}), \\ d P (L_{Y}^{k} (a_{M}, a_{Y}) = l_{Y}^{k} ∣ L_{0} = l_{0}, G_{L_{Y}}^{k} (a_{M}, a_{Y}) = g_{L_{Y}}^{k}) \\ = d P (L_{Y}^{k} (a_{M}, a_{Y}) = l_{Y}^{k} ∣ L_{0} = l_{0}, G_{L_{Y}}^{k} (a_{Y}, a_{Y}) = g_{L_{Y}}^{k}), \\ d P (M^{k} (a_{M}, a_{Y}) = m^{k} ∣ L_{0} = l_{0}, G_{M}^{k} (a_{M}, a_{Y}) = g_{M}^{k}) \\ = d P (M^{k} (a_{M}, a_{Y}) = m^{k} ∣ L_{0} = l_{0}, G_{M}^{k} (a_{M}, a_{M}) = g_{M}^{k}), \\ d P (Y^{k} (a_{M}, a_{Y}) = y^{k} ∣ L_{0} = l_{0}, G_{Y}^{k} (a_{M}, a_{Y}) = g_{Y}^{k}) \\ = d P (Y^{k} (a_{M}, a_{Y}) = y^{k} ∣ L_{0} = l_{0}, G_{Y}^{k} (a_{Y}, a_{Y}) = g_{Y}^{k}) . \end{matrix}

The dismissible treatment components assumption implies that the discrete-time hazards of the mediator-inducing confounding and mediators are irrelevant to

A_{Y}

, while the discrete-time hazards of the outcome-inducing confounding and outcomes are irrelevant to

A_{M}

, given the baseline covariates and history. This assumption bypasses the complicated derivation of sequential ignorability by using more concise and interpretable notations. Figure 2 shows the extended DAG for longitudinal outcomes under the separable effects framework. Even if there is unmeasured confounding between

L_{M}^{j_{1}}

and

L_{M}^{j_{2}}

, or between

L_{M}^{j_{3}}

and

M^{j_{4}}

, the paths to deliver effects of

A_{Y}

on

M^{j}

are blocked by the conditioning on the history (

j_{1}, j_{2}, j_{3}, j_{4}, j \in {1, \dots, K}

). Similarly, if there is unmeasured confounding between

L_{Y}^{j_{1}}

and

L_{Y}^{j_{2}}

, or between

L_{Y}^{j_{3}}

and

Y^{j_{4}}

, the paths to deliver effects of

A_{M}

on

Y^{j}

are blocked by the conditioning on the history (

j_{1}, j_{2}, j_{3}, j_{4}, j \in {1, \dots, K}

). When such confounding exists, the sequential ignorability fails but the dismissible treatment components assumption holds. The dismissible treatment components assumption holds as long as there is no unmeasured confounding between the sequences of

(L_{M}^{k}, M^{k})

and

(L_{Y}^{k}, Y^{k})

, under which the effects of

A_{M}

and

A_{Y}

can be isolated. Therefore, the dismissible treatment components assumption has weaker implications than the sequential ignorability.

Figure 2. An extended directed acyclic graph (eDAG) for longitudinal outcomes with 3 periods. The treatment A is divided into two components:

A_{M}

and

A_{Y}

. In addition,

L_{M}^{j}

is the mediator-inducing confounding,

L_{Y}^{j}

is the outcome-inducing confounding,

M^{j}

is the mediator, and

Y^{j}

is the outcome at the period j. The baseline covariates

L_{0}

, which can have direct edges to all variables, are omitted. Red lines depict edges into the mediator, blue lines depict edges into the mediator-inducing confounding, green lines depict edges into the outcome, black lines depict edges into the outcome-inducing confounding, and grey lines depict unmeasured confounding. In the presence of unmeasured confounding

U_{L_{1}}

,

U_{L_{2}}

,

U_{M}

and

U_{Y}

, the dismissible treatment components assumption holds but the sequential ignorability assumption is violated.

Let

Δ_{L_{M}}^{k}

,

Δ_{L_{Y}}^{k}

,

Δ_{M}^{k}

and

Δ_{Y}^{k}

be the censoring indicators of

L_{M}

,

L_{Y}

, M and Y, respectively, at period k. In addition to dismissible treatment components, we need random censoring and positivity.

Assumption 10

(Random censoring for longitudinal outcomes). For

k = 1, \dots, K

,

\begin{matrix} Δ_{L_{M}}^{k} ⫫ L_{M}^{k} (a_{M}, a_{M}) ∣ A = a_{M}, L_{0}, G_{L_{M}}^{k} = g_{L_{M}}^{k}, \\ Δ_{L_{Y}}^{k} ⫫ L_{Y}^{k} (a_{Y}, a_{Y}) ∣ A = a_{Y}, L_{0}, G_{L_{Y}}^{k} = g_{L_{Y}}^{k}, \\ Δ_{M}^{k} ⫫ M^{k} (a_{M}, a_{M}) ∣ A = a_{M}, L_{0}, G_{M}^{k} = g_{M}^{k}, \\ Δ_{Y}^{k} ⫫ Y^{k} (a_{Y}, a_{Y}) ∣ A = a_{Y}, L_{0}, G_{Y}^{k} = g_{Y}^{k} . \end{matrix}

Assumption 11

(Positivity for longitudinal outcomes). The following statements hold:

\begin{matrix} P (L_{0} = l_{0}) > 0 & \Rightarrow P (A = a, L_{0} = l_{0}) > 0, \\ P (L_{0} = l_{0}, G_{L_{M}}^{k} (a_{M}, a_{Y}) = g_{L_{M}}^{k}) > 0 & \Rightarrow P (A = a_{M}, L_{0} = l_{0}, G_{L_{M}}^{k} = g_{L_{M}}^{k}, Δ_{L_{M}}^{k} = 1) > 0, \\ P (L_{0} = l_{0}, G_{L_{Y}}^{k} (a_{M}, a_{Y}) = g_{L_{Y}}^{k}) > 0 & \Rightarrow P (A = a_{Y}, L_{0} = l_{0}, G_{L_{Y}}^{k} = g_{L_{Y}}^{k}, Δ_{L_{Y}}^{k} = 1) > 0, \\ P (L_{0} = l_{0}, G_{M}^{k} (a_{M}, a_{Y}) = g_{M}^{k}) > 0 & \Rightarrow P (A = a_{M}, L_{0} = l_{0}, G_{M}^{k} = g_{M}^{k}, Δ_{M}^{k} = 1) > 0, \\ P (L_{0} = l_{0}, G_{Y}^{k} (a_{M}, a_{Y}) = g_{Y}^{k}) > 0 & \Rightarrow P (A = a_{Y}, L_{0} = l_{0}, G_{Y}^{k} = g_{Y}^{k}, Δ_{Y}^{k} = 1) > 0, \end{matrix}

for every

a, a_{M}, a_{Y} \in {0, 1}

.

Assumptions 10 and 11 have similar meanings to Assumptions 4 and 5. Under ignorability, dismissible treatment components and random censoring, we can show that

\begin{matrix} d P (L_{M}^{k} (a_{M}, a_{Y}) = l_{M}^{k} ∣ L_{0} = l_{0}, G_{L_{M}}^{k} (a_{M}, a_{Y}) = g_{L_{M}}^{k}) \\ = d P (L_{M}^{k} = l_{M}^{k} ∣ A = a_{M}, L_{0} = l_{0}, G_{L_{M}}^{k} = g_{L_{M}}^{k}, Δ_{L_{M}}^{k} = 1), \end{matrix}

(14)

\begin{matrix} d P (L_{Y}^{k} (a_{M}, a_{Y}) = l_{Y}^{k} ∣ L_{0} = l_{0}, G_{L_{Y}}^{k} (a_{M}, a_{Y}) = g_{L_{Y}}^{k}) \\ = d P (L_{Y}^{k} = l_{Y}^{k} ∣ A = a_{Y}, L_{0} = l_{0}, G_{L_{Y}}^{k} = g_{L_{Y}}^{k}, Δ_{L_{Y}}^{k} = 1), \end{matrix}

(15)

\begin{matrix} d P (M^{k} (a_{M}, a_{Y}) = m^{k} ∣ L_{0} = l_{0}, G_{M}^{k} (a_{M}, a_{Y}) = g_{M}^{k}) \\ = d P (M^{k} = m^{k} ∣ A = a_{M}, L_{0} = l_{0}, G_{M}^{k} = g_{M}^{k}, Δ_{M}^{k} = 1), \end{matrix}

(16)

\begin{matrix} d P (Y^{k} (a_{M}, a_{Y}) = y^{k} ∣ L_{0} = l_{0}, G_{Y}^{k} (a_{M}, a_{Y}) = g_{Y}^{k}) \\ = d P (Y^{k} = y^{k} ∣ A = a_{Y}, L_{0} = l_{0}, G_{Y}^{k} = g_{Y}^{k}, Δ_{Y}^{k} = 1) . \end{matrix}

(17)

These equalities follow directly from the property of the dismissible treatment components. It is then straightforward to apply the g-formula to identify the expectation of the potential outcome of interest under positivity:

\begin{matrix} E {Y^{K} (a_{M}, a_{Y})} = \int y^{K} \prod_{k = 1}^{K} & d P (L_{M}^{k} = l_{M}^{k} ∣ A = a_{M}, L_{0} = l_{0}, G_{L_{M}}^{k} = g_{L_{M}}^{k}, Δ_{L_{M}}^{k} = 1) \\ d P (L_{Y}^{k} = l_{Y}^{k} ∣ A = a_{Y}, L_{0} = l_{0}, G_{L_{Y}}^{k} = g_{L_{Y}}^{k}, Δ_{L_{Y}}^{k} = 1) \\ d P (M^{k} = m^{k} ∣ A = a_{M}, L_{0} = l_{0}, G_{M}^{k} = g_{M}^{k}, Δ_{M}^{k} = 1) \\ d P (Y^{k} = y^{k} ∣ A = a_{Y}, L_{0} = l_{0}, G_{Y}^{k} = g_{Y}^{k}, Δ_{Y}^{k} = 1) \\ d P (L_{0} = l_{0}) . \end{matrix}

(18)

The identification formula under the separable effects framework is identical to that under sequential ignorability. Although these two frameworks introduce different estimands with different interpretations, the assumptions incorporated have the same goal to eliminate cross-world quantities. The cross-world reliance is erased by conditioning on observed history. We summarize the identification results in the following theorem.

Theorem 2.

Under Assumptions 7–11, both NIE and NDE are identifiable.

A practically useful assumption is Markovness, which simplifies the conditional probabilities.

Assumption 12

(Markovness for longitudinal outcomes). For

k = 1, \dots, K

,

\begin{matrix} d P (L_{M}^{k} (a_{M}, a_{M}) = l_{M}^{k} ∣ L_{0} = l_{0}, G_{L_{M}}^{k} (a_{M}, a_{M}) = g_{L_{M}}^{k}) \\ = d P (L_{M}^{k} (a_{M}, a_{M}) = l_{M}^{k} ∣ L_{0} = l_{0}, L_{M}^{k - 1} (a_{M}, a_{M}) = l_{M}^{k - 1}, L_{Y}^{k - 1} (a_{M}, a_{M}) = l_{Y}^{k - 1}, \\ M^{k - 1} (a_{M}, a_{M}) = m^{k - 1}, Y^{k - 1} (a_{M}, a_{M}) = y^{k - 1}), \\ d P (L_{Y}^{k} (a_{Y}, a_{Y}) = l_{Y}^{k} ∣ L_{0} = l_{0}, G_{L_{Y}}^{k} (a_{Y}, a_{Y}) = g_{L_{Y}}^{k}) \\ = d P (L_{Y}^{k} (a_{Y}, a_{Y}) = l_{Y}^{k} ∣ L_{0} = l_{0}, L_{M}^{k} (a_{Y}, a_{Y}) = l_{M}^{k}, L_{Y}^{k - 1} (a_{Y}, a_{Y}) = l_{Y}^{k - 1}, \\ M^{k - 1} (a_{Y}, a_{Y}) = m^{k - 1}, Y^{k - 1} (a_{Y}, a_{Y}) = y^{k - 1}), \\ d P (M^{k} (a_{M}, a_{M}) = m^{k} ∣ L_{0} = l_{0}, G_{M}^{k} (a_{M}, a_{M}) = g_{M}^{k}) \\ = d P (M^{k} (a_{M}, a_{M}) = m^{k} ∣ L_{0} = l_{0}, L_{M}^{k} (a_{M}, a_{M}) = l_{M}^{k}, L_{Y}^{k} (a_{M}, a_{M}) = l_{Y}^{k}, \\ M^{k - 1} (a_{M}, a_{M}) = m^{k - 1}, Y^{k - 1} (a_{M}, a_{M}) = y^{k - 1}), \\ d P (Y^{k} (a_{Y}, a_{Y}) = y^{k} ∣ L_{0} = l_{0}, G_{Y}^{k} (a_{Y}, a_{Y}) = g_{Y}^{k}) \\ = d P (Y^{k} (a_{Y}, a_{Y}) = y^{k} ∣ L_{0} = l_{0}, L_{M}^{k} (a_{Y}, a_{Y}) = l_{M}^{k}, L_{Y}^{k} (a_{Y}, a_{Y}) = l_{Y}^{k}, \\ M^{k} (a_{Y}, a_{Y}) = m^{k}, Y^{k - 1} (a_{Y}, a_{Y}) = y^{k - 1}) . \end{matrix}

Then, we only need to involve the most recent measurements of

L_{M}

,

L_{Y}

, M and Y as well as

L_{0}

and A in the conditional distribution model. A regression estimator and a weighting estimator for the natural effects under Markovness with competing events have been proposed [21]. This idea can be generalized to longitudinal mediation analysis by finding plug-in estimators using the identification formula. Theoretically, we can derive the efficient influence functions for each term in the identification formula of

E {Y^{K} (a_{M}, a_{Y})}

and then apply the functional delta method to obtain an efficient estimator for

E {Y^{K} (a_{M}, a_{Y})}

[19,25]. The resulting estimator may enjoy some multiple robustness based on the efficient influence functions. However, the efficient estimator can be very complicated, involving too many models. The multiple robustness may not be very meaningful since the regression models are often incorrectly specified simultaneously. The variance derived by the asymptotic form based on efficient influence functions can be unstable. In practice, simple estimators with sensitivity analysis are desired.
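As an illustration of the plug-in idea under Markovness, the following Python sketch evaluates E{Y^K(a_M, a_Y)} by forward recursion in a deliberately reduced setting: binary M^k and Y^k, no baseline or time-varying covariates, and no censoring. The functions `p_m` and `p_y`, their coefficients, and the starting values `m0` and `y0` are hypothetical stand-ins for fitted regression models; this is not the estimator of [21]. The contrasts mirror the ordering used in (19) and (20).

```python
def expected_final_outcome(p_m, p_y, a_m, a_y, n_periods, m0=0, y0=0):
    """Plug-in value of E{Y^K(a_M, a_Y)} for binary M^k and Y^k.

    p_m(a, m_prev, y_prev): P(M^k = 1 | A = a, M^{k-1}, Y^{k-1})
    p_y(a, m_curr, y_prev): P(Y^k = 1 | A = a, M^k, Y^{k-1})
    The mediator model is evaluated at a_m, the outcome model at a_y.
    """
    # Joint distribution of the most recent (M, Y); Markovness means
    # nothing older than the previous period has to be tracked.
    dist = {(m0, y0): 1.0}
    for _ in range(n_periods):
        new_dist = {}
        for (m_prev, y_prev), weight in dist.items():
            pm1 = p_m(a_m, m_prev, y_prev)
            for m in (0, 1):
                w_m = weight * (pm1 if m == 1 else 1.0 - pm1)
                py1 = p_y(a_y, m, y_prev)
                for y in (0, 1):
                    w_y = w_m * (py1 if y == 1 else 1.0 - py1)
                    new_dist[(m, y)] = new_dist.get((m, y), 0.0) + w_y
        dist = new_dist
    return sum(w for (_, y), w in dist.items() if y == 1)


# Hypothetical conditional models (illustrative coefficients only).
def p_m(a, m_prev, y_prev):
    return 0.2 + 0.3 * a + 0.2 * m_prev


def p_y(a, m_curr, y_prev):
    return 0.1 + 0.2 * a + 0.3 * m_curr + 0.1 * y_prev


K = 3
nde = (expected_final_outcome(p_m, p_y, 0, 1, K)
       - expected_final_outcome(p_m, p_y, 0, 0, K))
nie = (expected_final_outcome(p_m, p_y, 1, 1, K)
       - expected_final_outcome(p_m, p_y, 0, 1, K))
print("NDE:", nde, "NIE:", nie)
```

With time-varying covariates the state carried through the recursion would also have to include the latest values of L_M and L_Y, which is where the number of models grows quickly.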

3. Time-to-Event Outcomes

Mediation analysis for time-to-event outcomes is referred to as semi-competing risks [26,27]. There is an intermediate event (the counterpart of the mediator) and a terminal event (the counterpart of the outcome). If the terminal event occurs at time t, then the intermediate event can never occur after t. Semi-competing risks can be understood by discretizing continuous time. Let

k = 1, \dots, K

be a sequence of times, and the statuses of intermediate event

M^{k} \in {0, 1}

and terminal event

Y^{k} \in {0, 1}

be measured at each time. The status indicator equals 0 if the event has not occurred by time k and 1 if it has already occurred by time k. Because the statuses are binary, the conditional probabilities of

M^{k}

and

Y^{k}

given the history can be expressed by the hazard using counting processes, that is, the probability that the event occurs at time k given the baseline covariates and the history prior to k.
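To make the link between binary statuses and hazards concrete, here is a minimal Python sketch that computes empirical discrete-time hazards from status paths and recovers the cumulative incidence as one minus the product of (1 - hazard). It assumes no censoring and no covariates, and the function names and toy paths are hypothetical.

```python
def discrete_hazards(status_paths):
    """Empirical hazard at each period k: events at k / subjects at risk at k.

    status_paths: list of monotone 0/1 sequences (Y^1, ..., Y^K), one per
    subject, where 1 means the event has occurred by that period.
    """
    n_periods = len(status_paths[0])
    hazards = []
    for k in range(n_periods):
        # At risk at period k: the event had not occurred by period k - 1.
        at_risk = [p for p in status_paths if k == 0 or p[k - 1] == 0]
        events = sum(1 for p in at_risk if p[k] == 1)
        hazards.append(events / len(at_risk) if at_risk else 0.0)
    return hazards


def cumulative_incidence(hazards):
    """Cumulative incidence by period k: 1 - prod_{j <= k} (1 - h_j)."""
    survival, curve = 1.0, []
    for h in hazards:
        survival *= 1.0 - h
        curve.append(1.0 - survival)
    return curve


# Four hypothetical subjects followed over three periods.
paths = [[0, 0, 1], [0, 1, 1], [1, 1, 1], [0, 0, 0]]
hazards = discrete_hazards(paths)
print(hazards)
print(cumulative_incidence(hazards))
```

Without censoring, the product formula reproduces the raw proportion of subjects with the event by each period; with censoring, only the at-risk set in the denominator changes.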

We use counting processes to formalize the mediation analysis for time-to-event outcomes [22,27]. Suppose there are time-varying intermediate event-inducing confounding

L_{1} (t)

and terminal event-inducing confounding

L_{2} (t)

. Let

{\tilde{N}}_{1} (t; a_{1}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot))

be the counting process of the potential intermediate event when the treatment is set at

a_{1}

, the intermediate event process prior to t is set at

{\tilde{n}}_{1} (s)

, the terminal event process prior to t is set at

{\tilde{n}}_{2} (s)

, the intermediate event-inducing confounding process prior to t is set at

{\tilde{l}}_{1} (s)

and the terminal event-inducing confounding process prior to t is set at

{\tilde{l}}_{2} (s)

,

s < t

. Analogously, let

{\tilde{N}}_{2} (t; a_{2}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot))

be the counting process of the terminal event when the treatment is set at

a_{2}

, and the processes prior to t are set at

{\tilde{n}}_{1} (s)

,

{\tilde{n}}_{2} (s)

,

{\tilde{l}}_{1} (s)

and

{\tilde{l}}_{2} (s)

,

s < t

. Let

L_{0}

be the baseline covariates, and denote the potential processes of the intermediate event-inducing confounding and terminal event-inducing confounding by

{\tilde{L}}_{1} (t; a_{1}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)), {\tilde{L}}_{2} (t; a_{2}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)),

respectively.

Let

{\tilde{N}}_{1} (t)

,

{\tilde{N}}_{2} (t)

,

{\tilde{L}}_{1} (t)

and

{\tilde{L}}_{2} (t)

be the counting processes of the intermediate event, terminal event, intermediate event-inducing confounding and terminal event-inducing confounding in the realized trial at time

t \in [0, t^{*}]

. We assume consistency: under the observed treatment A, the observable processes coincide with the corresponding potential processes.

Assumption 13

(Consistency). For

t \in [0, t^{*}]

,

\begin{matrix} {\tilde{N}}_{1} (t) = {\tilde{N}}_{1} (t; A, {\tilde{N}}_{1} (\cdot), {\tilde{N}}_{2} (\cdot), {\tilde{L}}_{1} (\cdot), {\tilde{L}}_{2} (\cdot)), {\tilde{N}}_{2} (t) = {\tilde{N}}_{2} (t; A, {\tilde{N}}_{1} (\cdot), {\tilde{N}}_{2} (\cdot), {\tilde{L}}_{1} (\cdot), {\tilde{L}}_{2} (\cdot)), \\ {\tilde{L}}_{1} (t) = {\tilde{L}}_{1} (t; A, {\tilde{N}}_{1} (\cdot), {\tilde{N}}_{2} (\cdot), {\tilde{L}}_{1} (\cdot), {\tilde{L}}_{2} (\cdot)), {\tilde{L}}_{2} (t) = {\tilde{L}}_{2} (t; A, {\tilde{N}}_{1} (\cdot), {\tilde{N}}_{2} (\cdot), {\tilde{L}}_{1} (\cdot), {\tilde{L}}_{2} (\cdot)) . \end{matrix}

The sequential ignorability assumption and dismissible treatment components assumption can be extended to time-to-event outcomes from discrete times. Sequential ignorability requires that there is no unmeasured confounding between any two processes at any two times [22]. This requirement is sometimes too strong, because the processes can be determined by some underlying features. It is hard to imagine timewise randomization for the processes. The dismissible treatment components assumption only requires that there is no unmeasured confounding between the processes of

({\tilde{N}}_{1} (\cdot), {\tilde{L}}_{1} (\cdot))

and

({\tilde{N}}_{2} (\cdot), {\tilde{L}}_{2} (\cdot))

, which is weaker than sequential ignorability. Therefore, we only formalize the assumptions under the separable effects framework.

When conducting mediation analysis, we would set the treatment for the intermediate event-inducing confounding and intermediate event at

a_{1}

, and set the treatment for the terminal event-inducing confounding and terminal event at

a_{2}

. Therefore, we have natural processes of the intermediate event-inducing confounding, terminal event-inducing confounding, intermediate event and terminal event

{\tilde{L}}_{1} (t; a_{1}, a_{2})

,

{\tilde{L}}_{2} (t; a_{1}, a_{2})

,

{\tilde{N}}_{1} (t; a_{1}, a_{2})

and

{\tilde{N}}_{2} (t; a_{1}, a_{2})

, respectively. The natural direct effect (NDE) and natural indirect effect (NIE) are defined by contrasting the counterfactual cumulative incidence of the terminal event:

\begin{matrix} NDE (t) & = P ({\tilde{N}}_{2} (t; 0, 1) = 1) - P ({\tilde{N}}_{2} (t; 0, 0) = 1), \end{matrix}

(19)

\begin{matrix} NIE (t) & = P ({\tilde{N}}_{2} (t; 1, 1) = 1) - P ({\tilde{N}}_{2} (t; 0, 1) = 1) . \end{matrix}

(20)
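Adding (19) and (20) makes the decomposition explicit: the cross-world term with treatment components (0, 1) cancels, so the two effects sum to the total effect. The symbol TE(t) is introduced here only for illustration.

```latex
\begin{matrix} NDE (t) + NIE (t) & = P ({\tilde{N}}_{2} (t; 1, 1) = 1) - P ({\tilde{N}}_{2} (t; 0, 0) = 1) = TE (t) . \end{matrix}
```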

Given the baseline covariates and natural processes, the continuous-time hazards of the intermediate event and terminal event can be expressed as

\begin{matrix} d Λ_{1} (t; a_{1}, a_{2}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)), d Λ_{2} (t; a_{1}, a_{2}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)), \end{matrix}

and the transition density of the intermediate event-inducing confounding to

l_{1}

and terminal event-inducing confounding to

l_{2}

at time t as

\begin{matrix} d P_{L_{1}} (t, l_{1}; a_{1}, a_{2}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)), d P_{L_{2}} (t, l_{2}; a_{1}, a_{2}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)), \end{matrix}

respectively. The treatment components

A_{1}

and

A_{2}

should have separable effects on

({\tilde{N}}_{1} (t), {\tilde{L}}_{1} (t))

and

({\tilde{N}}_{2} (t), {\tilde{L}}_{2} (t))

. In other words, given the baseline covariates and processes history, all the paths from

A_{1}

to

({\tilde{N}}_{2} (t), {\tilde{L}}_{2} (t))

are blocked, and all the paths from

A_{2}

to

({\tilde{N}}_{1} (t), {\tilde{L}}_{1} (t))

are blocked.

Assumption 14

(Ignorability for continuous time).

\begin{matrix} A ⫫ { & {\tilde{N}}_{1} (t; a_{1}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)), \\ {\tilde{N}}_{2} (t; a_{2}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)), \\ {\tilde{L}}_{1} (t; a_{1}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)), \\ {\tilde{L}}_{2} (t; a_{2}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) : 0 < t \leq t^{*}} ∣ L_{0} . \end{matrix}

Assumption 15

(Dismissible treatment components for continuous time). For

t \in [0, t^{*}]

,

\begin{matrix} d Λ_{1} (t; a_{1}, a_{2}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) & = d Λ_{1} (t; a_{1}, a_{1}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)), \\ d Λ_{2} (t; a_{1}, a_{2}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) & = d Λ_{2} (t; a_{2}, a_{2}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)), \\ d P_{L_{1}} (t, l_{1}; a_{1}, a_{2}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) & = d P_{L_{1}} (t, l_{1}; a_{1}, a_{1}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)), \\ d P_{L_{2}} (t, l_{2}; a_{1}, a_{2}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) & = d P_{L_{2}} (t, l_{2}; a_{2}, a_{2}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) . \end{matrix}

If

{\tilde{N}}_{2} (t) = 1

, the terminal event has occurred and we no longer record observations after t. Since the counting processes can only jump from 0 to 1, the hazard of an event is only meaningful when the individual is at risk of this event. We can further simplify the notations,

\begin{matrix} d Λ_{1} (t; a_{1}, a_{2}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) & = d Λ_{1} (t; a_{1}, l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)), \end{matrix}

(21)

\begin{matrix} d Λ_{2} (t; a_{1}, a_{2}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) & = d Λ_{2} (t; a_{2}, {\tilde{n}}_{1} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)), \end{matrix}

(22)

\begin{matrix} d P_{L_{1}} (t, l_{1}; a_{1}, a_{2}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) & = d P_{L_{1}} (t, l_{1}; a_{1}, {\tilde{n}}_{1} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)), \end{matrix}

(23)

\begin{matrix} d P_{L_{2}} (t, l_{2}; a_{1}, a_{2}, {\tilde{n}}_{1} (\cdot), {\tilde{n}}_{2} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) & = d P_{L_{2}} (t, l_{2}; a_{2}, {\tilde{n}}_{1} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) . \end{matrix}

(24)

To account for censoring, let

{\tilde{N}}_{C} (t)

be the censoring process. We assume that the processes of intermediate event-inducing confounding, terminal event-inducing confounding, intermediate event and terminal event at time t are either observed or censored simultaneously. We assume random censoring and positivity.

Assumption 16

(Random censoring for continuous time).

\begin{matrix} d {\tilde{N}}_{C} (t) ⫫ ({\tilde{L}}_{1} (t; a, a), {\tilde{L}}_{2} (t; a, a), {\tilde{N}}_{1} (t; a, a), {\tilde{N}}_{2} (t; a, a)) \\ ∣ A = a, L_{0}, {\tilde{N}}_{C} (s) = 0, {\tilde{N}}_{1} (s), {\tilde{N}}_{2} (s) = 0, {\tilde{L}}_{1} (s), {\tilde{L}}_{2} (s) : s < t, 0 < t \leq t^{*} \end{matrix}

Assumption 17

(Positivity for continuous time).

\begin{matrix} P (L_{0} = l_{0}) > 0 \Rightarrow P (A = a, L_{0} = l_{0}) > 0, \\ P (L_{0} = l_{0}, {\tilde{N}}_{1} (t; a_{1}, a_{2}) = {\tilde{N}}_{2} (t; a_{1}, a_{2}) = 0, {\tilde{L}}_{1} (t; a_{1}, a_{2}) = {\tilde{l}}_{1}, {\tilde{L}}_{2} (t; a_{1}, a_{2}) = {\tilde{l}}_{2}) > 0 \\ \Rightarrow P (A = a, L_{0} = l_{0}, {\tilde{N}}_{1} (t) = {\tilde{N}}_{2} (t) = {\tilde{N}}_{C} (t) = 0, {\tilde{L}}_{1} (t) = {\tilde{l}}_{1}, {\tilde{L}}_{2} (t) = {\tilde{l}}_{2}) > 0 . \end{matrix}

Positivity ensures that the at-risk set has positive probability so we have data to estimate the hazards and transition densities.

Theorem 3.

Under Assumptions 13–17, NIE and NDE are identifiable for

t \in [0, t^{*}]

.

The identification formula is complicated, but the idea is intuitive. The counterfactual cumulative incidence can be expressed through product limits of hazards (or conditional densities) of counterfactual variables [28,29]. Under the dismissible treatment components assumption within the separable effects framework, these hazards (or conditional densities) equal their observed counterparts, obtained by substituting the observables for the counterfactuals. For

0 < t \leq t^{*}

,

\begin{matrix} P ({\tilde{N}}_{2} (t; a_{1}, a_{2}) = 1) & = \int \int_{0}^{t} \prod_{0 < s \leq u} \int d P_{L_{1}} (s, l_{1}; a_{1}, {\tilde{n}}_{1} (s) = 0, l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) \\ d P_{L_{2}} (s, l_{2}; a_{2}, {\tilde{n}}_{1} (s) = 0, l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) \\ {1 - d Λ_{1} (s; a_{1}, l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot))} \\ {1 - d Λ_{2} (s; a_{2}, {\tilde{n}}_{1} (s) = 0, l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot))} \\ d Λ_{2} (u; a_{2}, {\tilde{n}}_{1} (s) = 0, l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) \\ + \int \int_{0}^{t} \int_{0}^{v} \prod_{0 < s \leq u} \int d P_{L_{1}} (s, l_{1}; a_{1}, {\tilde{n}}_{1} (s) = 0, l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) \\ d P_{L_{2}} (s, l_{2}; a_{2}, {\tilde{n}}_{1} (s) = 0, l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) \\ {1 - d Λ_{1} (s; a_{1}, l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot))} \\ {1 - d Λ_{2} (s; a_{2}, {\tilde{n}}_{1} (s) = 0, l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot))} \\ d Λ_{1} (u; a_{1}, l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) \\ \prod_{u < s \leq v} \int d P_{L_{1}} (s, l_{1}; a_{1}, d {\tilde{n}}_{1} (u) = 1, l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) \\ d P_{L_{2}} (s, l_{2}; a_{2}, d {\tilde{n}}_{1} (u) = 1, l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) \\ {1 - d Λ_{2} (s; a_{2}, l_{0}, d {\tilde{n}}_{1} (u) = 1, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot))} \\ d Λ_{2} (v; a_{2}, l_{0}, d {\tilde{n}}_{1} (u) = 1, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)), \end{matrix}

(25)

where the inner integration is conducted over the support of

({\tilde{L}}_{1} (s), {\tilde{L}}_{2} (s))

and the outer integration is conducted over the support of

L_{0}

. The first term is the incidence of the terminal event without a history of intermediate event, and the second term is the incidence of the terminal event with a history of intermediate event. Usually, the time-varying confounding can only change values at finite time points, so the transition density can be parameterized as a product of the density of the time to change values and the distribution function when changing values [28].
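The two-term structure of (25) can be checked numerically in a stripped-down special case. The Python sketch below assumes no baseline or time-varying covariates and constant hazards: `lam1` for the intermediate event (governed by a_1), `lam2` for the terminal event without a prior intermediate event and `lam3` for the terminal event after the intermediate event (both governed by a_2). These names and all numeric values are hypothetical, and the closed form requires `lam3 != lam1 + lam2`.

```python
import math


def cif_numeric(lam1, lam2, lam3, t, n_steps=20000):
    """Midpoint-rule evaluation of the two-term incidence formula."""
    du = t / n_steps
    total = 0.0
    for i in range(n_steps):
        u = (i + 0.5) * du
        event_free = math.exp(-(lam1 + lam2) * u)
        # First term: terminal event at u with no intermediate event.
        direct = event_free * lam2
        # Second term: intermediate event at u, terminal event in (u, t].
        via_intermediate = event_free * lam1 * (1.0 - math.exp(-lam3 * (t - u)))
        total += (direct + via_intermediate) * du
    return total


def cif_closed_form(lam1, lam2, lam3, t):
    """Same quantity in closed form (requires lam3 != lam1 + lam2)."""
    c = lam1 + lam2
    alive_after_intermediate = (
        lam1 * math.exp(-lam3 * t) * (math.exp((lam3 - c) * t) - 1.0) / (lam3 - c)
    )
    return 1.0 - math.exp(-c * t) - alive_after_intermediate


# Hypothetical hazards indexed by treatment component level.
relapse_hazard = {0: 0.30, 1: 0.50}          # depends on a_1
death_hazard = {0: 0.20, 1: 0.15}            # depends on a_2
death_after_relapse = {0: 1.00, 1: 1.00}     # depends on a_2


def incidence(a1, a2, t):
    return cif_closed_form(relapse_hazard[a1], death_hazard[a2],
                           death_after_relapse[a2], t)


t = 2.0
nde = incidence(0, 1, t) - incidence(0, 0, t)
nie = incidence(1, 1, t) - incidence(0, 1, t)
print("NDE(t):", nde, "NIE(t):", nie)
```

The separable structure is visible in `incidence`: the relapse hazard is looked up with a_1 only, and both mortality hazards are looked up with a_2 only.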

We can additionally assume Markovness to simplify the identification. The hazard and transition density only depend on the current status rather than the full history given baseline covariates.

Assumption 18

(Markovness for continuous time). For

t \in [0, t^{*}]

,

\begin{matrix} d Λ_{1} (t; a_{1}, l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) & = d Λ_{1} (t; a_{1}, l_{0}, {\tilde{l}}_{1} (t^{-}), {\tilde{l}}_{2} (t^{-})), \\ d Λ_{2} (t; a_{2}, {\tilde{n}}_{1} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) & = d Λ_{2} (t; a_{2}, {\tilde{n}}_{1} (t^{-}), l_{0}, {\tilde{l}}_{1} (t^{-}), {\tilde{l}}_{2} (t^{-})), \\ d P_{L_{1}} (t, l_{1}; a_{1}, {\tilde{n}}_{1} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) & = d P_{L_{1}} (t, l_{1}; a_{1}, {\tilde{n}}_{1} (t^{-}), l_{0}, {\tilde{l}}_{1} (t^{-}), {\tilde{l}}_{2} (t^{-})), \\ d P_{L_{2}} (t, l_{2}; a_{2}, {\tilde{n}}_{1} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot), {\tilde{l}}_{2} (\cdot)) & = d P_{L_{2}} (t, l_{2}; a_{2}, {\tilde{n}}_{1} (t^{-}), l_{0}, {\tilde{l}}_{1} (t^{-}), {\tilde{l}}_{2} (t^{-})) . \end{matrix}

Although Markovness is not necessary for identification, it brings great convenience for estimation. Under Markovness, we can use typical survival models like the proportional hazards model with time-varying confounding to estimate the hazards. If Markovness does not hold, the time origins of the transitions from the intermediate event to the terminal event are not aligned. There would be a biased sampling issue since the censoring probability varies with time [30]. More elaborate models are required to obtain a closed-form estimator without Markovness. An estimator for the counterfactual cumulative incidence using parametric models based on efficient influence functions has been proposed for semi-competing risks, i.e., illness–death models, when there is no time-varying confounding [31]. The resulting estimator is multiply robust: it is consistent if (1) all three transition hazards are correctly specified or (2) the propensity score and censoring probability are correctly specified, and two of the three transition hazards are correctly specified. Theoretically, this idea can be generalized to the case with time-varying confounding under semiparametric models using the functional delta method, but there is no easy-to-implement estimating procedure if there are too many time-varying covariates or the time-varying covariates are too complex.

4. Application to Stem Cell Transplantation Data

Allogeneic stem cell transplantation is a widely applied therapy to treat acute lymphoblastic leukemia (ALL), including two sorts of transplant modalities: human leukocyte antigen (HLA)-matched sibling donor transplantation (MSDT) and haploidentical stem cell transplantation from family (Haplo-SCT). MSDT has long been regarded as the first choice of transplantation because MSDT leads to lower transplant-related mortality (TRM), also known as non-relapse mortality (NRM) [32]. Another source of mortality is relapse, known as relapse-related mortality (RRM). In recent years, some benefits of Haplo-SCT have been noticed in that patients with positive pre-transplantation minimal residual disease (MRD) undergoing Haplo-SCT have a better prognosis in relapse, and hence lower relapse-related mortality [33]. It is of interest to study how the transplant modalities exert effects on overall mortality.

A total of

n = 303

patients with positive MRD undergoing allogeneic stem cell transplantation were included in our study [22,34]. Among these patients, 65 received MSDT (

A = 1

) and 238 received Haplo-SCT (

A = 0

). The transplantation type is “genetically randomized” in that there is no specific consideration to prefer Haplo-SCT over MSDT whenever MSDT is accessible [33]. Therefore, we expect ignorability to hold. Four baseline covariates were considered: age, sex (male, female), diagnosis (T-ALL or B-ALL) and complete remission status (CR1, CR>1). A time-varying covariate is the occurrence of graft-versus-host disease (GVHD). These five covariates are risk factors associated with relapse and mortality according to the previous literature. The outcome is of the time-to-event type, subject to right censoring. The mean follow-up time was 1336 days. The terminal event is overall mortality, and the intermediate event is relapse. In the MSDT group, relapse was observed in 47.7% of patients and mortality in 53.8%. In the Haplo-SCT group, relapse was observed in 30.0% of patients and mortality in 36.6%. Summary statistics are presented in Table 1.

Table 1. Summary statistics in the data application, stratified by treatment groups. We list the mean and standard deviation (SD) of baseline covariates in each group. We also list the proportion of certain observed uncensored events (GVHD, relapse, mortality) and the time to the observed uncensored event in each group.

We adopt the separable effects framework to study the mediation effect of transplant modalities on overall mortality. We can find clinical interpretation for the treatment components. Haplo-SCT has fewer matched HLA loci compared with MSDT, so stronger immune rejection is anticipated. In practice, patients receiving Haplo-SCT should additionally use antithymocyte globulin (ATG) to facilitate engraftment [35]. Therefore,

A_{2}

is the treatment component that, through delaying immune reconstitution (the combined use of ATG), increases the risk of transplant-related mortality. Relapse is caused by the presence of minimal residual disease. The stronger immune rejection with Haplo-SCT kills body cells but also kills the minimal residual disease cells, which are referred to as the graft-versus-host and graft-versus-leukemia (GVL) effects, respectively [33]. Therefore,

A_{1}

is a treatment component through GVHD, which increases the risk of GVHD but reduces the risk of relapse. Let

L_{1} (\cdot)

be the time-varying GVHD status, affected by

A_{1}

. Following the notations in the preceding section,

L_{2} (\cdot)

is null.

In the presence of time-varying covariates, it is very difficult to apply the g-formula to obtain a simple regression estimator, because there are too many terms in the identification formula. Taking advantage of the fact that the occurrence of GVHD is binary, we may as well consider the GVHD as a state within the multi-state model. In this way, there are a total of four states, the initial state, the GVHD state, the relapse state and the mortality state, as shown in Figure 3. The x-axis is the day after transplantation, and the y-axis is on the scale of cumulative incidence. To avoid bidirectional transition between GVHD and relapse, we can further divide the GVHD state into an acute GVHD state (after treatment but before relapse) and a chronic GVHD state (after relapse but before mortality). By modeling the transition hazards between states, we can derive the cumulative incidence function of the overall mortality through integrating functions of hazards.
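The multi-state view can be sketched in discrete time: with mortality as an absorbing state, the cumulative incidence of mortality is the occupation probability of that state after repeatedly applying the one-step transition matrix. The Python example below is a simplification that uses a single GVHD state, ignores GVHD after relapse, and takes a time-constant matrix whose entries are hypothetical; it is meant only to show the mechanics, not the fitted model.

```python
def step(occupancy, matrix):
    """One period of evolution: occupancy (row vector) times transition matrix."""
    n = len(matrix)
    return [sum(occupancy[i] * matrix[i][j] for i in range(n)) for j in range(n)]


def mortality_curve(matrix, n_periods, death_state=3):
    """Cumulative incidence of mortality when everyone starts in state 0."""
    occupancy = [1.0] + [0.0] * (len(matrix) - 1)
    curve = []
    for _ in range(n_periods):
        occupancy = step(occupancy, matrix)
        curve.append(occupancy[death_state])
    return curve


# States: 0 = initial, 1 = GVHD, 2 = relapse, 3 = mortality (absorbing).
# Hypothetical one-period transition probabilities; each row sums to one.
P = [
    [0.80, 0.10, 0.06, 0.04],
    [0.00, 0.85, 0.08, 0.07],
    [0.00, 0.00, 0.75, 0.25],
    [0.00, 0.00, 0.00, 1.00],
]

curve = mortality_curve(P, n_periods=5)
print(curve)
```

In the separable effects reading, the entries leading into GVHD and relapse would be set by a_1 and the entries leading into mortality by a_2, so a cross-world curve is obtained by mixing rows' entries estimated under different arms.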

Figure 3. A multi-state model illustration for the leukemia data. Relapse is the intermediate event (

{\tilde{N}}_{1}

), mortality is the terminal event (

{\tilde{N}}_{2}

) and GVHD status is a time-varying covariate taking values 0 or 1 (

{\tilde{L}}_{1}

). To ease estimation, we regard GVHD as a state. GVHD can transit to relapse and relapse can transit to GVHD. Within the separable effects framework, the treatment component

A_{2}

has an effect on the hazard of mortality (

{\tilde{N}}_{2}

), whereas the treatment component

A_{1}

has effects on the hazard of GVHD (

{\tilde{L}}_{1}

) and relapse (

{\tilde{N}}_{1}

).

We impose a semiparametric proportional hazards model for the transition rates with Markovness:

\begin{matrix} d Λ_{1} (t; a_{1}, l_{0}, {\tilde{l}}_{1} (\cdot)) & = d Λ_{01, a_{1}} (t) exp (β_{1, a_{1}}^{'} l_{0} + γ_{1, a_{1}} l_{1} (t^{-})), \end{matrix}

(26)

\begin{matrix} d Λ_{2} (t; a_{2}, {\tilde{n}}_{1} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot)) & = d Λ_{02, a_{2}} (t) exp (β_{2, a_{2}}^{'} l_{0} + γ_{2, a_{2}} l_{1} (t^{-}) + η_{2, a_{2}} {\tilde{n}}_{1} (t^{-})), \end{matrix}

(27)

\begin{matrix} d Λ_{L_{1}} (t; a_{1}, {\tilde{n}}_{1} (\cdot), l_{0}, {\tilde{l}}_{1} (\cdot)) & = d Λ_{03, a_{1}} (t) exp (β_{3, a_{1}}^{'} l_{0} + γ_{3, a_{1}} l_{1} (t^{-}) + η_{3, a_{1}} {\tilde{n}}_{1} (t^{-})), \end{matrix}

(28)

where

d Λ_{01, a_{1}} (t)

,

d Λ_{02, a_{2}} (t)

and

d Λ_{03, a_{1}} (t)

are unknown baseline hazards. The baseline hazards can differ across treatment groups. In particular, the hazards of mortality and GVHD can depend on the relapse status. The statuses of the intermediate event and confounding serve as time-varying covariates in the extended Cox model [36,37]. The censoring probability is also estimated by a semiparametric proportional hazards model. The unknown parameters in the above models are estimated by nonparametric maximum likelihood estimation (NPMLE), where the estimated baseline hazards are step functions with jumps only at observed event times [38]. The parameter estimates are updated by the Newton–Raphson algorithm and are considered to have converged when an update changes their values by less than 0.0001. The estimation procedure is implemented using R (version 4.4.0) [39].
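A minimal sketch of this kind of fitting loop, written in Python rather than the authors' R code, is given below. It assumes a single time-fixed covariate, no tied event times, and maximizes the Cox partial likelihood by Newton–Raphson with the 0.0001 stopping rule, then forms a Breslow-type baseline hazard that jumps only at observed event times. The function names and the toy data are hypothetical, and the sketch does not handle time-varying covariates.

```python
import math


def cox_newton_raphson(times, events, x, tol=1e-4, max_iter=50):
    """Newton-Raphson for a one-covariate Cox partial likelihood (no ties)."""
    beta = 0.0
    for _ in range(max_iter):
        score, info = 0.0, 0.0
        for i, (t_i, d_i) in enumerate(zip(times, events)):
            if not d_i:
                continue  # censored subjects contribute only through risk sets
            risk = [j for j, t_j in enumerate(times) if t_j >= t_i]
            s0 = sum(math.exp(beta * x[j]) for j in risk)
            s1 = sum(x[j] * math.exp(beta * x[j]) for j in risk)
            s2 = sum(x[j] ** 2 * math.exp(beta * x[j]) for j in risk)
            score += x[i] - s1 / s0
            info += s2 / s0 - (s1 / s0) ** 2
        update = score / info
        beta += update
        if abs(update) < tol:  # stop when the update is below the tolerance
            break
    return beta


def breslow_jumps(times, events, x, beta):
    """Baseline hazard increments, one jump per observed event time."""
    jumps = {}
    for t_i, d_i in zip(times, events):
        if d_i:
            s0 = sum(math.exp(beta * x[j])
                     for j, t_j in enumerate(times) if t_j >= t_i)
            jumps[t_i] = 1.0 / s0
    return jumps


# Hypothetical toy data: follow-up times, event indicators, binary covariate.
times = [2, 3, 5, 7, 8, 11, 13, 14]
events = [1, 1, 0, 1, 1, 1, 0, 1]
x = [1, 0, 1, 1, 0, 0, 1, 0]

beta_hat = cox_newton_raphson(times, events, x)
baseline = breslow_jumps(times, events, x, beta_hat)
print(beta_hat, baseline)
```

Fitting separate baseline hazards per treatment group, as in (26)–(28), would amount to running such a loop within each arm or stratifying the risk sets by arm.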

Figure 4 shows the estimated counterfactual cumulative incidences of overall mortality in the left panel. The 95% confidence intervals are obtained by bootstrap with 200 resamplings. The red line represents the cumulative incidence of mortality when receiving MSDT, and the cyan line represents the cumulative incidence of mortality when receiving Haplo-SCT. We can see that the mortality rate is higher for MSDT in MRD-positive patients, indicating a stronger graft-versus-leukemia effect for Haplo-SCT. In a hypothetical world, suppose that the GVHD/GVL component is set at the level for Haplo-SCT

A_{1} = 0

, and the delayed immune reconstitution component is set at the level for MSDT

A_{2} = 1

. Then, the blue line represents the cumulative incidence of mortality in this hypothetical world. The right panel of Figure 4 shows the natural direct effect (NDE) and natural indirect effect (NIE). The natural indirect effect is significantly positive, indicating that Haplo-SCT reduces the risk of overall mortality through reducing the risk of relapse. The natural direct effect is not statistically significant, which suggests that the delayed immune reconstitution associated with ATG use does not have a large impact on mortality.
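For readers unfamiliar with bootstrap intervals, the following Python sketch shows a percentile bootstrap with 200 resamples for a generic statistic. Whether the authors used percentile or another bootstrap interval is not stated, so the percentile rule here is an assumption, and the function name, seed and toy data are hypothetical.

```python
import random


def bootstrap_ci(data, statistic, n_boot=200, level=0.95, seed=0):
    """Percentile bootstrap interval: resample subjects with replacement."""
    rng = random.Random(seed)
    n = len(data)
    estimates = sorted(
        statistic([data[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    tail = (1.0 - level) / 2.0
    lower = estimates[int(tail * (n_boot - 1))]
    upper = estimates[int(round((1.0 - tail) * (n_boot - 1)))]
    return lower, upper


def mean(values):
    return sum(values) / len(values)


# Hypothetical death indicators, not the study data.
toy_deaths = [1] * 12 + [0] * 18
lower, upper = bootstrap_ci(toy_deaths, mean)
print(lower, upper)
```

In the application, `statistic` would refit the hazard models on each resample and return the counterfactual cumulative incidence (or NDE/NIE) at a given time, so the interval is computed pointwise along the curve.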

Figure 4. The counterfactual cumulative incidence functions and the natural direct/indirect effects. The dashed lines represent the 95% bootstrap confidence intervals.

A sensitivity analysis can be conducted by assuming that chronic GVHD belongs to the terminal event-inducing confounding, or that both acute GVHD and chronic GVHD belong to the terminal event-inducing confounding. The estimated cumulative incidences and treatment effects are similar to those in Figure 4, which strengthens our conclusion. The empirical findings provide guidance on allogeneic stem cell transplantation. Since Haplo-SCT is more accessible than MSDT, we argue that Haplo-SCT is a reasonable alternative to MSDT. Although Haplo-SCT may lead to slightly higher non-relapse mortality, it is promising for practitioners to pay special attention to patients receiving Haplo-SCT in order to reduce relapse-related mortality. The overall mortality can be significantly reduced due to the strong graft-versus-leukemia effect of Haplo-SCT.

5. Conclusions

The estimand in mediation analysis involves cross-world quantities. In this article, we studied two assumptions for mediation analysis, namely, sequential ignorability and dismissible treatment components. The former is conventional in mediation analysis and can be understood from the view of sequential randomization. The latter comes from the separable effects framework. The dismissible treatment components condition is weaker than sequential ignorability because it allows some types of unmeasured confounding. The dismissible treatment components assumption can be tested in future experiments if the treatment components are known. It is easier to interpret the dismissible treatment components assumption than sequential ignorability. Even if the dismissible treatment components assumption is violated, the estimands of natural direct and indirect effects are still meaningful, although not identifiable. In medical studies, the separable effects framework is becoming increasingly popular. For time-to-event outcomes, the advantage of the separable effects framework is significant due to its notational simplicity and interpretability.

Through the application of real data on allogeneic stem cell transplantation, we show the usefulness of the separable effects framework. A post-treatment time-varying covariate, GVHD, is considered to modify the treatment effect. We find that Haplo-SCT reduces the risk of overall mortality through reducing the risk of relapse. Haplo-SCT has the potential to serve as an alternative to MSDT. In this real-data example, we have explicit interpretation for the separable effects. Conclusions drawn from the separable effects framework can inform new clinical knowledge, and also inspire biological research on the micro-foundation of treatment components.

6. Discussion

The separable effects framework has some limitations. The dismissible treatment components assumption is not testable in real-world trials. In the presence of time-varying confounding, it is essential to discuss with subject experts to determine whether the time-varying confounding is intermediate event-inducing or terminal event-inducing. If the classification of time-varying confounding is ambiguous, sensitivity analysis is encouraged. The sequential ignorability framework can be more useful in this case. We can find a certain type of natural direct effect or path-specific effect that is identifiable, and this partial effect may be of scientific interest [15,16].

Although the identifiability of the natural direct and indirect effects is proven, estimation remains challenging. The regression estimator using g-formula is not only complicated but also subject to model mis-specification. It is still worth studying how to derive more efficient estimators relying on weaker model assumptions. Since the estimation of cumulative incidence is recursive, slight model mis-specification may lead to a huge estimation bias at large time points. Modeling multiple and multi-valued time-varying covariates can be extremely difficult, so existing studies only focused on simple time-varying covariates [28]. In the application, the time-varying covariate is binary, so we take advantage of multi-state models to derive the cumulative incidence. It is questionable whether desirable statistical properties can be maintained in the presence of complex time-varying covariates.

There are a few future research directions. First, the asymptotic properties of estimators based on the g-formula are complicated in longitudinal studies. Theoretically, the asymptotic properties can be established based on the influence function. However, the influence function is tedious to derive, and there are no efficient algorithms to implement the g-formula in longitudinal studies with many periods. Second, mediation analysis requires that the treatment lasts from beginning to end without noncompliance. In longitudinal studies and time-to-event studies, patients may switch to other treatments at some time [19,40]. A new problem is how to define the estimand and formalize identification assumptions. Third, in the separable effects framework, it is possible to decompose the initial treatment into more components. There may be a separable treatment component that influences time-varying covariates. Fourth, the direct outcome event following the treatment and the indirect outcome event following the intermediate event may be driven by different treatment components. Therefore, the total effect should be decomposed into more than two natural effects [29]. Furthermore, there can be multiple mediators or intermediate events. It is worth studying the decomposition, identification and estimation with more complex treatment–mediator–outcome structures [41,42,43].

Another type of mediation estimand is the randomized interventional effects [40,44,45]. In general cases, the randomized interventional effects are distinct from the natural effects (or separable effects). The randomized interventional effects randomly draw post-treatment mediators and confoundings from the observed distribution associated with a given treatment policy. Weaker assumptions are required to identify the treatment effects. The identification assumptions can be understood from the view of nonparametric structural equation models for the data generating process of the time-varying treatments, confoundings, mediators and outcomes. It is not necessary to separate the post-treatment confounding into several components associated with treatment components for identification. Sequential doubly robust estimation can be used to estimate the treatment effects [46,47]. The idea of randomized interventional effects has the potential to be generalized to time-to-event outcomes [48].

Author Contributions

Conceptualization, Y.D.; methodology, Y.D. and H.W.; investigation, Y.D., H.W. and X.X.; writing—original draft, Y.D.; writing—review and editing, Y.Z. and Y.H. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the Science and Technology Project of Guangxi (Guike AD21220114); the National Key Research and Development Program of China, Grant No. 2021YFF0901400; and the National Natural Science Foundation of China, Grant No. 12026606, 12226005, 12361055.

Data Availability Statement

The data and R (version 4.4.0) code that support our findings are available on GitHub at https://github.com/naiiife/multistate (accessed on 22 July 2024).

Acknowledgments

We thank the Guest Editor for the invitation. We thank Yuewen Wang at Peking University People’s Hospital for introducing the background of allogeneic stem cell transplantation.

Conflicts of Interest

Author Xia Xiao was employed by Geely Automobile Holdings Limited. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest. Geely Automobile Holdings Limited had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Appendix A. Proof of Theorem 1

We only prove the result for $Y^{k} (a_{Y}, g_{Y}^{k})$; the other proofs are similar. In the following proof, the labels A1 (Assumption 1), A2 (Assumption 2), … at the end of each line indicate which assumption is used to derive that equality.

\begin{matrix} P (Y^{k} (a_{Y}, g_{Y}^{k}) = y^{k} ∣ L_{0} = l_{0}, G_{Y}^{k} (a_{M}, a_{Y}) = g_{Y}^{k}) \\ = P (Y^{k} (a_{Y}, g_{Y}^{k}) = y^{k} ∣ L_{0} = l_{0}, L_{M}^{1} (a_{M}) = l_{M}^{1}, L_{Y}^{1} (a_{Y}, l_{L_{Y}}^{1}) = l_{Y}^{1}, M^{1} (a_{M}, g_{M}^{1}) = m^{1}, \dots) \\ = P (Y^{k} (a_{Y}, g_{Y}^{k}) = y^{k} ∣ A = a_{M}, L_{0} = l_{0}, L_{M}^{1} (a_{M}) = l_{M}^{1}, L_{Y}^{1} (a_{Y}, l_{L_{Y}}^{1}) = l_{Y}^{1}, M^{1} (a_{M}, g_{M}^{1}) = m^{1}, \dots) (A 2) \\ = P (Y^{k} (a_{Y}, g_{Y}^{k}) = y^{k} ∣ A = a_{M}, L_{0} = l_{0}, L_{M}^{1} = l_{M}^{1}, L_{Y}^{1} (a_{Y}, l_{L_{Y}}^{1}) = l_{Y}^{1}, M^{1} (a_{M}, g_{M}^{1}) = m^{1}, \dots) (A 1) \\ = P (Y^{k} (a_{Y}, g_{Y}^{k}) = y^{k} ∣ A = a_{M}, L_{0} = l_{0}, L_{Y}^{1} (a_{Y}, l_{L_{Y}}^{1}) = l_{Y}^{1}, M^{1} (a_{M}, g_{M}^{1}) = m^{1}, \dots) (A 3) \\ = P (Y^{k} (a_{Y}, g_{Y}^{k}) = y^{k} ∣ A = a_{Y}, L_{0} = l_{0}, L_{Y}^{1} (a_{Y}, l_{L_{Y}}^{1}) = l_{Y}^{1}, M^{1} (a_{M}, g_{M}^{1}) = m^{1}, \dots) (A 2) \\ = P (Y^{k} (a_{Y}, g_{Y}^{k}) = y^{k} ∣ A = a_{Y}, L_{0} = l_{0}, L_{M}^{1} = l_{M}^{1}, L_{Y}^{1} (a_{Y}, l_{L_{Y}}^{1}) = l_{Y}^{1}, M^{1} (a_{M}, g_{M}^{1}) = m^{1}, \dots) (A 3) \\ = P (Y^{k} (a_{Y}, g_{Y}^{k}) = y^{k} ∣ A = a_{Y}, L_{0} = l_{0}, L_{M}^{1} = l_{M}^{1}, L_{Y}^{1} = l_{Y}^{1}, M^{1} (a_{M}, g_{M}^{1}) = m^{1}, \dots) (A 1) \\ = \dots (R e p e a t i n g A 3, A 2, A 3, A 1) \\ = P (Y^{k} (a_{Y}, g_{Y}^{k}) = y^{k} ∣ A = a_{M}, L_{0} = l_{0}, L_{M}^{1} = l_{M}^{1}, L_{Y}^{1} = l_{Y}^{1}, M^{1} = m^{1}, \dots, M^{k} = m^{k}) (A 1) \\ = P (Y^{k} (a_{Y}, g_{Y}^{k}) = y^{k} ∣ A = a_{M}, L_{0} = l_{0}) (A 3) \\ = P (Y^{k} (a_{Y}, g_{Y}^{k}) = y^{k} ∣ A = a_{Y}, L_{0} = l_{0}) (A 2) \\ = P (Y^{k} (a_{Y}, g_{Y}^{k}) = y^{k} ∣ A = a_{Y}, L_{0} = l_{0}, L_{M}^{1} = l_{M}^{1}, L_{Y}^{1} = l_{Y}^{1}, M^{1} = m^{1}, \dots, M^{k} = m^{k}) (A 3) \\ = P (Y^{k} = y^{k} ∣ A = a_{Y}, L_{0} = l_{0}, L_{M}^{1} = l_{M}^{1}, L_{Y}^{1} = l_{Y}^{1}, M^{1} = m^{1}, \dots, M^{k} = m^{k}) (A 1) \\ = P (Y^{k} 
= y^{k} ∣ A = a_{Y}, L_{0} = l_{0}, L_{M}^{1} = l_{M}^{1}, L_{Y}^{1} = l_{Y}^{1}, M^{1} = m^{1}, \dots, M^{k} = m^{k}, Δ_{Y}^{k} = 1) (A 4) . \end{matrix}

Positivity (A5) ensures that the conditional probability is well defined. Finally, by the g-formula, we obtain the identification expression for $E \{Y^{K} (a_{Y}, g_{Y}^{M})\} = E \{Y^{K} (a_{M}, a_{Y})\}$:

\begin{aligned}
E \{Y^{K} (a_{M}, a_{Y})\} = \int y^{K} \prod_{k = 1}^{K} \Big\{ & d P (L_{M}^{k} (a_{M}, g_{L_{M}}^{k}) = l_{M}^{k}) \, d P (L_{Y}^{k} (a_{Y}, g_{L_{Y}}^{k}) = l_{Y}^{k}) \\
& d P (M^{k} (a_{M}, g_{M}^{k}) = m^{k}) \, d P (Y^{k} (a_{Y}, g_{Y}^{k}) = y^{k}) \Big\} \, d P (L_{0} = l_{0}) \\
= \int y^{K} \prod_{k = 1}^{K} \Big\{ & d P (L_{M}^{k} = l_{M}^{k} \mid A = a_{M}, L_{0} = l_{0}, G_{L_{M}}^{k} = g_{L_{M}}^{k}, \Delta_{L_{M}}^{k} = 1) \\
& d P (L_{Y}^{k} = l_{Y}^{k} \mid A = a_{Y}, L_{0} = l_{0}, G_{L_{Y}}^{k} = g_{L_{Y}}^{k}, \Delta_{L_{Y}}^{k} = 1) \\
& d P (M^{k} = m^{k} \mid A = a_{M}, L_{0} = l_{0}, G_{M}^{k} = g_{M}^{k}, \Delta_{M}^{k} = 1) \\
& d P (Y^{k} = y^{k} \mid A = a_{Y}, L_{0} = l_{0}, G_{Y}^{k} = g_{Y}^{k}, \Delta_{Y}^{k} = 1) \Big\} \, d P (L_{0} = l_{0}) .
\end{aligned}
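To make the structure of the identification expression above concrete, the following is a minimal sketch of a plug-in g-formula in a deliberately simplified special case: one period (K = 1), binary baseline covariate, mediator and outcome, no time-varying confounders and no censoring. It is not the authors' R implementation. The function names `simulate` and `g_formula`, the simulated data, and all numerical parameter values are assumptions introduced only for illustration. In this special case the expression reduces to summing E[Y | A = a_Y, L_0, M] over the mediator distribution under A = a_M and the baseline covariate distribution.

```python
import random
from collections import defaultdict


def simulate(n, seed=0):
    """Draw n records (l0, a, m, y) from a toy one-period model (assumed values)."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        l0 = int(rng.random() < 0.5)                      # baseline covariate L_0
        a = int(rng.random() < 0.5)                       # randomized treatment A
        m = int(rng.random() < 0.2 + 0.4 * a + 0.2 * l0)  # mediator M
        y = int(rng.random() < 0.1 + 0.2 * a + 0.4 * m + 0.1 * l0)  # outcome Y
        rows.append((l0, a, m, y))
    return rows


def g_formula(rows, a_m, a_y):
    """Plug-in estimate of E{Y(a_M, a_Y)} for K = 1:
    sum over (l0, m) of E[Y | A=a_Y, l0, m] * P(m | A=a_M, l0) * P(l0)."""
    n = len(rows)
    n_l0 = defaultdict(int)   # counts of L_0 = l0
    n_al = defaultdict(int)   # counts of (A, L_0)
    n_alm = defaultdict(int)  # counts of (A, L_0, M)
    y_alm = defaultdict(int)  # sums of Y within (A, L_0, M)
    for l0, a, m, y in rows:
        n_l0[l0] += 1
        n_al[(a, l0)] += 1
        n_alm[(a, l0, m)] += 1
        y_alm[(a, l0, m)] += y
    total = 0.0
    for l0 in (0, 1):
        for m in (0, 1):
            # Positivity is needed here: every cell must be non-empty.
            p_m = n_alm[(a_m, l0, m)] / n_al[(a_m, l0)]       # mediator law under a_M
            e_y = y_alm[(a_y, l0, m)] / n_alm[(a_y, l0, m)]   # outcome mean under a_Y
            total += e_y * p_m * n_l0[l0] / n
    return total


rows = simulate(100000, seed=1)
total_effect = g_formula(rows, 1, 1) - g_formula(rows, 0, 0)
direct_effect = g_formula(rows, 0, 1) - g_formula(rows, 0, 0)    # change a_Y only
indirect_effect = g_formula(rows, 1, 1) - g_formula(rows, 0, 1)  # change a_M only
print(total_effect, direct_effect, indirect_effect)
```

Under the assumed simulation parameters, the true value is E{Y(a_M, a_Y)} = 0.27 + 0.16 a_M + 0.20 a_Y, so the estimates can be checked against it. The direct and indirect contrasts sum to the total effect by construction. With several periods, time-varying confounders or censoring, the same nesting applies period by period, but cell counts become impractical and model-based estimates of each conditional distribution would be needed.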

References

1. Baron, R.M.; Kenny, D.A. The moderator–mediator variable distinction in social psychological research: Conceptual, strategic, and statistical considerations. J. Personal. Soc. Psychol. 1986, 51, 1173–1182.
2. Robins, J.M.; Greenland, S. Identifiability and exchangeability for direct and indirect effects. Epidemiology 1992, 3, 143–155.
3. Tchetgen, E.J.T.; Shpitser, I. Semiparametric theory for causal mediation analysis: Efficiency bounds, multiple robustness, and sensitivity analysis. Ann. Stat. 2012, 40, 1816–1845.
4. Pearl, J. Interpretation and identification of causal mediation. Psychol. Methods 2014, 19, 459–481.
5. Imai, K.; Keele, L.; Tingley, D. A general approach to causal mediation analysis. Psychol. Methods 2010, 15, 309.
6. Imai, K.; Keele, L.; Yamamoto, T. Identification, inference and sensitivity analysis for causal mediation effects. Stat. Sci. 2010, 25, 51–71.
7. Fiedler, K.; Schott, M.; Meiser, T. What mediation analysis can (not) do. J. Exp. Soc. Psychol. 2011, 47, 1231–1236.
8. Lok, J.J. Defining and estimating causal direct and indirect effects when setting the mediator to specific values is not feasible. Stat. Med. 2016, 35, 4008–4020.
9. Moreno-Betancur, M.; Carlin, J.B. Understanding interventional effects: A more natural approach to mediation analysis? Epidemiology 2018, 29, 614–617.
10. Lok, J.J.; Bosch, R.J. Causal organic indirect and direct effects: Closer to the original approach to mediation analysis, with a product method for binary mediators. Epidemiology 2021, 32, 412–420.
11. Robins, J.M.; Richardson, T.S. Alternative graphical causal models and the identification of direct effects. In Causality and Psychopathology: Finding the Determinants of Disorders and Their Cures; Oxford University Press: Oxford, UK, 2010; Volume 84, pp. 103–158.
12. Robins, J.M.; Richardson, T.S.; Shpitser, I. An interventionist approach to mediation analysis. In Probabilistic and Causal Inference: The Works of Judea Pearl; ACM: New York, NY, USA, 2022; pp. 713–764.
13. Stensrud, M.J.; Young, J.G.; Didelez, V.; Robins, J.M.; Hernán, M.A. Separable effects for causal inference in the presence of competing events. J. Am. Stat. Assoc. 2022, 117, 175–183.
14. Wodtke, G.T.; Zhou, X. Effect decomposition in the presence of treatment-induced confounding: A regression-with-residuals approach. Epidemiology 2020, 31, 369–375.
15. Miles, C.H.; Shpitser, I.; Kanki, P.; Meloni, S.; Tchetgen Tchetgen, E.J. On semiparametric estimation of a path-specific effect in the presence of mediator-outcome confounding. Biometrika 2020, 107, 159–172.
16. Xia, F.; Chan, K.C.G. Identification, semiparametric efficiency, and quadruply robust estimation in mediation analysis with treatment-induced confounding. J. Am. Stat. Assoc. 2023, 118, 1272–1281.
17. Bind, M.A.; Vanderweele, T.; Coull, B.; Schwartz, J. Causal mediation analysis for longitudinal data with exogenous exposure. Biostatistics 2016, 17, 122–134.
18. Jose, P.E. The merits of using longitudinal mediation. Educ. Psychol. 2016, 51, 331–341.
19. Zheng, W.; van der Laan, M. Longitudinal mediation analysis with time-varying mediators and exposures, with application to survival outcomes. J. Causal Inference 2017, 5, 20160006.
20. Ten Have, T.R.; Joffe, M.M. A review of causal estimation of effects in mediation analyses. Stat. Methods Med Res. 2012, 21, 77–107.
21. Stensrud, M.J.; Hernán, M.A.; Tchetgen Tchetgen, E.J.; Robins, J.M.; Didelez, V.; Young, J.G. A generalized theory of separable effects in competing event settings. Lifetime Data Anal. 2021, 27, 588–631.
22. Deng, Y.; Wang, Y.; Zhou, X.H. Direct and indirect treatment effects in the presence of semicompeting risks. Biometrics 2024, 80, ujae032.
23. Robins, J. A new approach to causal inference in mortality studies with a sustained exposure period—Application to control of the healthy worker survivor effect. Math. Model. 1986, 7, 1393–1512.
24. Young, J.G.; Stensrud, M.J.; Tchetgen Tchetgen, E.J.; Hernán, M.A. A causal framework for classical statistical estimands in failure-time settings with competing events. Stat. Med. 2020, 39, 1199–1236.
25. Wang, Z.; van der Laan, L.; Petersen, M.; Gerds, T.; Kvist, K.; van der Laan, M. Targeted maximum likelihood based estimation for longitudinal mediation analysis. arXiv 2023, arXiv:2304.04904.
26. Fine, J.P.; Jiang, H.; Chappell, R. On semi-competing risks data. Biometrika 2001, 88, 907–919.
27. Huang, Y.T. Causal mediation of semicompeting risks. Biometrics 2021, 77, 1143–1154.
28. Rytgaard, H.C.; Gerds, T.A.; van der Laan, M.J. Continuous-time targeted minimum loss-based estimation of intervention-specific mean outcomes. Ann. Stat. 2022, 50, 2469–2491.
29. Deng, Y.; Wang, Y.; Zhan, X.; Zhou, X.H. Separable pathway effects of semi-competing risks via multi-state models. arXiv 2023, arXiv:2306.15947.
30. Asgharian, M.; M’Lan, C.E.; Wolfson, D.B. Length-biased sampling with right censoring: An unconditional approach. J. Am. Stat. Assoc. 2002, 97, 201–209.
31. Breum, M.S.; Munch, A.; Gerds, T.A.; Martinussen, T. Estimation of separable direct and indirect effects in a continuous-time illness-death model. Lifetime Data Anal. 2024, 30, 143–180.
32. Kanakry, C.G.; Fuchs, E.J.; Luznik, L. Modern approaches to HLA-haploidentical blood or marrow transplantation. Nat. Rev. Clin. Oncol. 2016, 13, 10–24.
33. Chang, Y.J.; Wang, Y.; Xu, L.P.; Zhang, X.H.; Chen, H.; Chen, Y.H.; Wang, F.R.; Sun, Y.Q.; Yan, C.H.; Tang, F.F.; et al. Haploidentical donor is preferred over matched sibling donor for pre-transplantation MRD positive ALL: A phase 3 genetically randomized study. J. Hematol. Oncol. 2020, 13, 27.
34. Ma, R.; Xu, L.P.; Zhang, X.H.; Wang, Y.; Chen, H.; Chen, Y.H.; Wang, F.R.; Han, W.; Sun, Y.Q.; Yan, C.H.; et al. An Integrative Scoring System Mainly Based on Quantitative Dynamics of Minimal/Measurable Residual Disease for Relapse Prediction in Patients with Acute Lymphoblastic Leukemia. 2021. Available online: https://library.ehaweb.org/eha/2021/eha2021-virtual-congress/324642 (accessed on 12 June 2021).
35. Walker, I.; Panzarella, T.; Couban, S.; Couture, F.; Devins, G.; Elemary, M.; Gallagher, G.; Kerr, H.; Kuruvilla, J.; Lee, S.J.; et al. Pretreatment with anti-thymocyte globulin versus no anti-thymocyte globulin in patients with haematological malignancies undergoing haemopoietic cell transplantation from unrelated donors: A randomised, controlled, open-label, phase 3, multicentre trial. Lancet Oncol. 2016, 17, 164–173.
36. Crowley, J.; Hu, M. Covariance analysis of heart transplant survival data. J. Am. Stat. Assoc. 1977, 72, 27–36.
37. Thackham, M.; Ma, J. On maximum likelihood estimation of the semi-parametric Cox model with time-varying covariates. J. Appl. Stat. 2020, 47, 1511–1528.
38. Zeng, D.; Lin, D. Maximum likelihood estimation in semiparametric regression models with censored data. J. R. Stat. Soc. Ser. B Stat. Methodol. 2007, 69, 507–564.
39. R Core Team. R: A Language and Environment for Statistical Computing; Version 4.4.0 [Computer Software]; R Foundation for Statistical Computing: Vienna, Austria, 2024.
40. VanderWeele, T.J.; Tchetgen Tchetgen, E.J. Mediation analysis with time varying exposures and mediators. J. R. Stat. Soc. Ser. B Stat. Methodol. 2017, 79, 917–938.
41. Xia, F.; Chan, K.C.G. Decomposition, identification and multiply robust estimation of natural mediation effects with multiple mediators. Biometrika 2022, 109, 1085–1100.
42. Zhou, X. Semiparametric estimation for causal mediation analysis with multiple causally ordered mediators. J. R. Stat. Soc. Ser. B Stat. Methodol. 2022, 84, 794–821.
43. Wei, H.; Cai, H.; Shi, C.; Song, R. On efficient inference of causal effects with multiple mediators. arXiv 2024, arXiv:2401.05517.
44. VanderWeele, T.J.; Vansteelandt, S.; Robins, J.M. Effect decomposition in the presence of an exposure-induced mediator-outcome confounder. Epidemiology 2014, 25, 300–306.
45. Rudolph, K.E.; Williams, N.T.; Diaz, I. Practical causal mediation analysis: Extending nonparametric estimators to accommodate multiple mediators and multiple intermediate confounders. Biostatistics 2024, kxae012.
46. Díaz, I.; Williams, N.; Rudolph, K.E. Efficient and flexible mediation analysis with time-varying mediators, treatments, and confounders. J. Causal Inference 2023, 11, 20220077.
47. Gilbert, B.; Hoffman, K.L.; Williams, N.; Rudolph, K.E.; Schenck, E.J.; Díaz, I. Identification and estimation of mediational effects of longitudinal modified treatment policies. arXiv 2024, arXiv:2403.09928.
48. Díaz, I.; Hoffman, K.L.; Hejazi, N.S. Causal survival analysis under competing risks using longitudinal modified treatment policies. Lifetime Data Anal. 2024, 30, 213–236.

Figure 1. A directed acyclic graph (DAG) for longitudinal outcomes with 3 periods. Here, A is the treatment, $L_{M}^{j}$ is the mediator-inducing confounding, $L_{Y}^{j}$ is the outcome-inducing confounding, $M^{j}$ is the mediator and $Y^{j}$ is the outcome at period j. The baseline covariates $L_{0}$, which can have direct edges to all variables, are omitted. Red lines depict edges into the mediator, blue lines depict edges into the mediator-inducing confounding, green lines depict edges into the outcome, and black lines depict edges into the outcome-inducing confounding. This DAG also satisfies the Markov property.

Figure 2. An extended directed acyclic graph (eDAG) for longitudinal outcomes with 3 periods. The treatment A is divided into two components: $A_{M}$ and $A_{Y}$. In addition, $L_{M}^{j}$ is the mediator-inducing confounding, $L_{Y}^{j}$ is the outcome-inducing confounding, $M^{j}$ is the mediator, and $Y^{j}$ is the outcome at period j. The baseline covariates $L_{0}$, which can have direct edges to all variables, are omitted. Red lines depict edges into the mediator, blue lines depict edges into the mediator-inducing confounding, green lines depict edges into the outcome, black lines depict edges into the outcome-inducing confounding, and grey lines depict unmeasured confounding. In the presence of unmeasured confounding $U_{L_{1}}$, $U_{L_{2}}$, $U_{M}$ and $U_{Y}$, the dismissible treatment components assumption holds but the sequential ignorability assumption is violated.

Figure 3. A multi-state model illustration for the leukemia data. Relapse is the intermediate event (${\tilde{N}}_{1}$), mortality is the terminal event (${\tilde{N}}_{2}$) and GVHD status is a time-varying covariate taking values 0 or 1 (${\tilde{L}}_{1}$). To ease estimation, we regard GVHD as a state. GVHD can transition to relapse and relapse can transition to GVHD. Within the separable effects framework, the treatment component $A_{2}$ has an effect on the hazard of mortality (${\tilde{N}}_{2}$), whereas the treatment component $A_{1}$ has effects on the hazards of GVHD (${\tilde{L}}_{1}$) and relapse (${\tilde{N}}_{1}$).

Figure 4. The counterfactual cumulative incidence functions and the natural direct/indirect effects. The dashed lines represent the 95% bootstrap confidence intervals.

Table 1. Summary statistics in the data application, stratified by treatment groups. We list the mean and standard deviation (SD) of baseline covariates in each group. We also list the proportion of certain observed uncensored events (GVHD, relapse, mortality) and the time to the observed uncensored event in each group.

	Haplo-SCT ( $A = 1$ )		MSDT ( $A = 0$ )
	Mean	(SD)	Mean	(SD)
Baseline covariates
Age	26.697	(12.232)	35.000	(13.077)
Sex	0.374	(0.485)	0.415	(0.497)
CR	0.227	(0.420)	0.154	(0.364)
Diagnosis	0.160	(0.367)	0.046	(0.211)
Observed events
GVHD	0.727	(0.446)	0.585	(0.497)
Time to GVHD	141.876	(402.252)	194.368	(178.842)
Relapse	0.290	(0.455)	0.477	(0.503)
Time to relapse	371.667	(406.247)	420.774	(369.805)
Mortality	0.366	(0.483)	0.538	(0.502)
Time to mortality	393.264	(346.752)	528.257	(410.212)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
