**2. Proofs**

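Let us write $\mathcal{B}\_n \uparrow \mathcal{B}$ for a sequence $(\mathcal{B}\_n)\_{n\in\mathbb{N}}$ of fields such that $\mathcal{B}\_1 \subset \mathcal{B}\_2 \subset \dots \subset \mathcal{B} = \bigcup\_{n\in\mathbb{N}} \mathcal{B}\_n$. ($\mathcal{B}$ need not be a $\sigma$-field.) Our proof of Theorem 1 will rest on a few approximation results and the following statement by Dębowski [1] (Theorem 1):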

**Theorem 2.** *Let* A*,* B*,* B*n, and* C *be subfields of* J *.*


Let $A^c = \Omega \setminus A$. Subsequently, let us denote the symmetric difference

$$A \triangle B := (A \backslash B) \cup (B \backslash A) = (A \cup B) \backslash (A \cap B). \tag{4}$$

Symmetric difference satisfies the following identities, which will be used:

$$A^{c} \triangle B^{c} = A \triangle B,\tag{5}$$

$$A \triangle B \subset (A \triangle C) \cup (C \triangle B),\tag{6}$$

$$(A \backslash C) \triangle B \subset (A \triangle B) \cup (C \cap B),\tag{7}$$

$$\left(\bigcup\_{i\in\mathcal{C}} A\_i\right) \bigtriangleup \left(\bigcup\_{i\in\mathcal{C}} B\_i\right) \subset \bigcup\_{i\in\mathcal{C}} (A\_i \triangle B\_i). \tag{8}$$

Moreover, we will apply the Bonferroni inequalities

$$0 \le \sum\_{1 \le i \le n} P(A\_i) - P\left(\bigcup\_{1 \le i \le n} A\_i\right) \le \sum\_{1 \le i < j \le n} P(A\_i \cap A\_j) \tag{9}$$

and the inequality $P(A) \le P(B) + P(A \triangle B)$.
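As a quick sanity check, identities (5)-(8), the Bonferroni inequalities (9), and the last inequality can be tested numerically on a small finite sample space. The sketch below is only an illustration under assumptions that are not part of the paper: the sample space is a hypothetical eight-point set with the uniform measure, and the helper name `check_identities` is made up for this example.

```python
import itertools
import random


def check_identities(omega_size=8, trials=200, seed=0):
    """Randomly test identities (5)-(8) and inequalities (9) on a uniform finite space."""
    rng = random.Random(seed)
    omega = frozenset(range(omega_size))

    def rand_set():
        return frozenset(x for x in omega if rng.random() < 0.5)

    def prob(s):
        # Uniform probability measure on omega (an assumption of this sketch).
        return len(s) / omega_size

    for _ in range(trials):
        a, b, c = rand_set(), rand_set(), rand_set()
        # (5): complements have the same symmetric difference.
        assert (omega - a) ^ (omega - b) == a ^ b
        # (6): triangle-type inclusion.
        assert (a ^ b) <= (a ^ c) | (c ^ b)
        # (7): removing C from A changes the symmetric difference by at most C ∩ B.
        assert ((a - c) ^ b) <= (a ^ b) | (c & b)
        # (8): symmetric difference of unions.
        a_sets = [rand_set() for _ in range(3)]
        b_sets = [rand_set() for _ in range(3)]
        union_a = frozenset().union(*a_sets)
        union_b = frozenset().union(*b_sets)
        pairwise = frozenset().union(*(x ^ y for x, y in zip(a_sets, b_sets)))
        assert (union_a ^ union_b) <= pairwise
        # (9): Bonferroni inequalities.
        gap = sum(prob(s) for s in a_sets) - prob(union_a)
        overlap = sum(prob(x & y) for x, y in itertools.combinations(a_sets, 2))
        assert -1e-12 <= gap <= overlap + 1e-12
        # P(A) <= P(B) + P(A △ B).
        assert prob(a) <= prob(b) + prob(a ^ b) + 1e-12
    return True
```

A random test of this kind does not prove the identities; it only guards against misreading their direction.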

In the following, we will derive the necessary approximation results. Our point of departure is the following folklore fact.

**Theorem 3** (approximation of $\sigma$-fields)**.** *For any field* $\mathcal{K}$ *and any event* $G \in \sigma(\mathcal{K})$*, there is a sequence of events* $K\_1, K\_2, \ldots \in \mathcal{K}$ *such that*

$$\lim\_{n \to \infty} P(G \triangle K\_n) = 0.\tag{10}$$

**Proof.** Denote the class of sets *G* that satisfy (10) as G. It is sufficient to show that G is a complete *σ*-field that contains the field K. Clearly, all *G* ∈ K satisfy (10) so G⊃K. Now, we verify the conditions for G to be a *σ*-field.


$$P\left(\left(\bigcap\_{i=1}^{n} A\_i\right) \triangle \left(\bigcap\_{i=1}^{n} K\_i^{i+n}\right)\right) \le \sum\_{i=1}^{n} P(A\_i \triangle K\_i^{i+n}) \le 2^{-n}.\tag{11}$$

Moreover,

$$P\left(\left(\bigcap\_{i=1}^{\infty} A\_i\right) \bigtriangleup \left(\bigcap\_{i=1}^{n} A\_i\right)\right) = P\left(\bigcap\_{i=1}^{n} A\_i\right) - P\left(\bigcap\_{i=1}^{\infty} A\_i\right). \tag{12}$$

Hence,

$$\begin{split} &P\left(\left(\bigcap\_{i=1}^{\infty} A\_i\right) \bigtriangleup \left(\bigcap\_{i=1}^{n} K\_i^{i+n}\right)\right) \\ &\leq P\left(\left(\bigcap\_{i=1}^{\infty} A\_i\right) \bigtriangleup \left(\bigcap\_{i=1}^{n} A\_i\right)\right) + P\left(\left(\bigcap\_{i=1}^{n} A\_i\right) \bigtriangleup \left(\bigcap\_{i=1}^{n} K\_i^{i+n}\right)\right) \\ &\leq P\left(\bigcap\_{i=1}^{n} A\_i\right) - P\left(\bigcap\_{i=1}^{\infty} A\_i\right) + 2^{-n}, \end{split} \tag{13}$$

which tends to 0 for $n$ going to infinity. Since $\bigcap\_{i=1}^{n} K\_i^{i+n} \in \mathcal{K}$, we thus obtain that $\bigcap\_{i=1}^{\infty} A\_i \in \mathcal{G}$.

Completeness of the $\sigma$-field $\mathcal{G}$ is straightforward since, for any $A \in \mathcal{G}$ and any event $A'$ with $P(A \triangle A') = 0$, we obtain $A' \in \mathcal{G}$ using the same sequence of approximating events in the field $\mathcal{K}$ as for the event $A$.
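To make Theorem 3 concrete, consider a toy instance that is not taken from the paper: Ω = [0, 1) with the Lebesgue measure, the field K of finite unions of dyadic intervals, and the event G = [0, 1/3), which lies in σ(K) but not in K. The sketch below (function names are hypothetical) uses exact rational arithmetic to compute P(G △ K_n) for the largest level-n dyadic interval K_n = [0, m/2^n) contained in G.

```python
import math
from fractions import Fraction


def dyadic_approximation(g_end, n):
    """Right endpoint m / 2**n of the largest level-n dyadic interval [0, m / 2**n) inside [0, g_end)."""
    m = math.floor(g_end * 2**n)
    return Fraction(m, 2**n)


def sym_diff_measure(g_end, n):
    """Lebesgue measure of G △ K_n for G = [0, g_end) and K_n = [0, m / 2**n)."""
    # K_n is contained in G, so the symmetric difference is the leftover interval.
    return g_end - dyadic_approximation(g_end, n)


if __name__ == "__main__":
    for n in range(1, 8):
        print(n, sym_diff_measure(Fraction(1, 3), n))
```

In this example P(G △ K_n) is always smaller than 2^(-n), so the limit (10) holds.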

The second approximation result is the following bound:

**Theorem 4** (continuity of entropy)**.** *Fix an* $\epsilon \in (0, e^{-1}]$ *and a field* $\mathcal{C}$*. For finite partitions* $\alpha = \{A\_i\}\_{i=1}^{I}$ *and* $\alpha' = \{A\_i'\}\_{i=1}^{I}$ *such that* $P(A\_i \triangle A\_i') \le \epsilon$ *for all* $i \in \{1, \ldots, I\}$*, we have*

$$\left| H(\alpha|\mathcal{C}) - H(\alpha'|\mathcal{C}) \right| \le I\sqrt{\epsilon} \log \frac{I}{\sqrt{\epsilon}}.\tag{14}$$

**Proof.** We have the expectation $\int P(A\_i \triangle A\_i'|\mathcal{C})\,dP = P(A\_i \triangle A\_i') \le \epsilon$. Hence, by the Markov inequality we obtain

$$P(P(A\_i \triangle A\_i' | \mathcal{C}) \ge \sqrt{\epsilon}) \le \sqrt{\epsilon}.\tag{15}$$

Denote

$$B = \left( P(A\_i \triangle A\_i' | \mathcal{C}) < \sqrt{\epsilon} \text{ for all } i \in \{1, \ldots, I\} \right). \tag{16}$$

From the Bonferroni inequality, we obtain $P(B^c) \le I\sqrt{\epsilon}$. Subsequently, we observe that $|H(\alpha||\mathcal{C}) - H(\alpha'||\mathcal{C})| \le \log I$ holds almost surely. Hence,

$$\begin{split} \left| H(\alpha|\mathcal{C}) - H(\alpha'|\mathcal{C}) \right| &= \left| \int \left[ H(\alpha||\mathcal{C}) - H(\alpha'||\mathcal{C}) \right] dP \right| \\ &\leq P(B^{c}) \log I + \int\_{B} \left| H(\alpha||\mathcal{C}) - H(\alpha'||\mathcal{C}) \right| dP \\ &\leq I\sqrt{\epsilon} \log I + \int\_{B} \left| H(\alpha||\mathcal{C}) - H(\alpha'||\mathcal{C}) \right| dP. \end{split} \tag{17}$$

Function −*x* log *x* is subadditive and increasing for *x* ∈ (0,*e*<sup>−</sup><sup>1</sup>]. In particular, we have |(*x* + *y*)log(*x* + *y*) − *x* log *x*| ≤ −*y* log *y* for *x*, *y* ≥ 0. Thus, on the event *B* we obtain

$$\begin{split} \left| H(\alpha || \mathcal{C}) - H(\alpha' || \mathcal{C}) \right| &= \left| \sum\_{i=1}^{I} P(A\_i' | \mathcal{C}) \log P(A\_i' | \mathcal{C}) - \sum\_{i=1}^{I} P(A\_i | \mathcal{C}) \log P(A\_i | \mathcal{C}) \right| \\ &\leq -\sum\_{i=1}^{I} \left| P(A\_i | \mathcal{C}) - P(A\_i' | \mathcal{C}) \right| \log \left| P(A\_i | \mathcal{C}) - P(A\_i' | \mathcal{C}) \right| \\ &\leq -\sum\_{i=1}^{I} P(A\_i \triangle A\_i' | \mathcal{C}) \log P(A\_i \triangle A\_i' | \mathcal{C}) \\ &\leq -I\sqrt{\epsilon} \log \sqrt{\epsilon}. \end{split} \tag{18}$$

Plugging (18) into (17) yields the claim.
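The bound (14) can be illustrated numerically in the simplest special case, where C is the trivial field, so that H(α|C) reduces to the Shannon entropy of α. The sketch below rests on assumptions that are not in the paper: a uniform measure on a hypothetical 1000-point space, a partition α given by point labels, and a perturbed partition α' obtained by relabeling a few points. Natural logarithms are used throughout.

```python
import math
import random
from collections import Counter


def entropy(labels):
    """Shannon entropy (in nats) of the partition induced by labels under the uniform measure."""
    n = len(labels)
    return -sum((c / n) * math.log(c / n) for c in Counter(labels).values())


def entropy_gap_and_bound(n_points=1000, n_cells=4, n_moved=10, seed=0):
    """Return (|H(alpha) - H(alpha')|, right-hand side of (14), epsilon)."""
    rng = random.Random(seed)
    alpha = [x % n_cells for x in range(n_points)]
    alpha_prime = list(alpha)
    # Perturb the partition by relabeling a few randomly chosen points.
    for x in rng.sample(range(n_points), n_moved):
        alpha_prime[x] = rng.randrange(n_cells)
    # epsilon := max_i P(A_i △ A_i'), computed cell by cell.
    eps = max(
        sum((a == i) != (b == i) for a, b in zip(alpha, alpha_prime)) / n_points
        for i in range(n_cells)
    )
    gap = abs(entropy(alpha) - entropy(alpha_prime))
    if eps > 0:
        bound = n_cells * math.sqrt(eps) * math.log(n_cells / math.sqrt(eps))
    else:
        bound = 0.0
    return gap, bound, eps
```

In such experiments the observed gap is typically far below the bound, which is expected since (14) is a worst-case estimate.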

Now, we can prove the invariance of completion. Note that

$$I(\alpha; \beta | \mathcal{C}) = H(\alpha | \mathcal{C}) + H(\beta | \mathcal{C}) - H(\alpha \wedge \beta | \mathcal{C}). \tag{19}$$

**Proof of Theorem 1.1 (invariance of completion):** Consider some measurable fields $\mathcal{A}$, $\mathcal{B}$, and $\mathcal{C}$. We are going to demonstrate

$$I(\mathcal{A}; \mathcal{B} | \mathcal{C}) = I(\mathcal{A}; \sigma(\mathcal{B}) | \mathcal{C}) = I(\mathcal{A}; \mathcal{B} | \sigma(\mathcal{C})). \tag{20}$$

The equality $I(\mathcal{A}; \mathcal{B}|\mathcal{C}) = I(\mathcal{A}; \mathcal{B}|\sigma(\mathcal{C}))$ is straightforward since $P(A|\mathcal{C}) = P(A|\sigma(\mathcal{C}))$ almost surely for all $A \in \mathcal{J}$. It remains to prove $I(\mathcal{A}; \mathcal{B}|\mathcal{C}) = I(\mathcal{A}; \sigma(\mathcal{B})|\mathcal{C})$. For this goal, it suffices to show that for any $\varepsilon > 0$ and any finite partitions $\alpha \subset \mathcal{A}$ and $\beta' \subset \sigma(\mathcal{B})$ there exists a finite partition $\beta \subset \mathcal{B}$ such that

$$\left| I(\alpha; \beta | \mathcal{C}) - I(\alpha; \beta' | \mathcal{C}) \right| < \varepsilon. \tag{21}$$

Fix then some $\varepsilon > 0$ and finite partitions $\alpha := \{A\_i\}\_{i=1}^{I} \subset \mathcal{A}$ and $\beta' := \{B\_j'\}\_{j=1}^{J} \subset \sigma(\mathcal{B})$. Invoking Theorem 3, we know that for each $\eta > 0$ there exists a class of sets $\{C\_j\}\_{j=1}^{J} \subset \mathcal{B}$, which need not be a partition, such that

$$P(C\_j \triangle B'\_j) \le \eta \tag{22}$$

for all $j \in \{1, \ldots, J\}$. Let us put $B\_{J+1}' := \emptyset$ and let us construct sets $D\_0 := \emptyset$ and $D\_j := \bigcup\_{k=1}^{j} C\_k$ for $j \in \{1, \ldots, J\}$. Subsequently, we put $B\_j := C\_j \setminus D\_{j-1}$ for $j \in \{1, \ldots, J\}$ and $B\_{J+1} := \Omega \setminus D\_J$. In this way, we obtain a partition $\beta := \{B\_j\}\_{j=1}^{J+1} \subset \mathcal{B}$.

The next step of the proof is showing an analogue of bound (22) for partitions $\beta$ and $\beta'$. To begin, for $j \in \{1, \ldots, J\}$, we have

$$\begin{split} P(B\_{j}\triangle B\_{j}') &= P((C\_{j}\backslash D\_{j-1})\triangle B\_{j}') \leq P(C\_{j}\triangle B\_{j}') + P(D\_{j-1}\cap B\_{j}') \\ &\leq \eta + \sum\_{k=1}^{j-1} P(C\_{k}\cap B\_{j}') \\ &\leq \eta + \sum\_{k=1}^{j-1} \left[ P(B\_{k}'\cap B\_{j}') + P((C\_{k}\cap B\_{j}')\triangle(B\_{k}'\cap B\_{j}')) \right] \\ &\leq \eta + \sum\_{k=1}^{j-1} \left[ 0 + P(C\_{k}\triangle B\_{k}') \right] \leq j\eta. \end{split} \tag{23}$$

Now, we observe for $j, k \in \{1, \ldots, J\}$ and $j \neq k$ that

$$P(C\_{j}) \ge P(B\_{j}') - P(C\_{j} \triangle B\_{j}') \ge P(B\_{j}') - \eta, \tag{24}$$

$$\begin{split} P(C\_{j}\cap C\_{k}) &\le P(B\_{j}'\cap B\_{k}') + P((C\_{j}\cap C\_{k})\triangle(B\_{j}'\cap B\_{k}')) \\ &\le 0 + P(C\_{j}\triangle B\_{j}') + P(C\_{k}\triangle B\_{k}') \le 2\eta. \end{split} \tag{25}$$

Hence, by the Bonferroni inequality we derive

$$\begin{split} P(B\_{J+1} \triangle B\_{J+1}') &= P((\Omega \backslash D\_{J}) \triangle \emptyset) = P(\Omega \backslash D\_{J}) = 1 - P(D\_{J}) \\ &\leq 1 - \sum\_{1 \leq j \leq J} P(C\_{j}) + \sum\_{1 \leq j < k \leq J} P(C\_{j} \cap C\_{k}) \\ &\leq 1 - \sum\_{1 \leq j \leq J} P(B\_{j}') + J\eta + \sum\_{1 \leq j < k \leq J} 2\eta = J^{2}\eta. \end{split} \tag{26}$$

Summarizing our bounds, we obtain

$$P((A\_{i} \cap B\_{j}) \triangle (A\_{i} \cap B\_{j}')) \le P(B\_{j} \triangle B\_{j}') \le J^2 \eta \tag{27}$$

for all $i \in \{1, \ldots, I\}$ and $j \in \{1, \ldots, J+1\}$. Then, invoking Theorem 4 yields

$$\begin{split} \left| I(\alpha; \beta | \mathcal{C}) - I(\alpha; \beta' | \mathcal{C}) \right| &\leq \left| H(\alpha \wedge \beta | \mathcal{C}) - H(\alpha \wedge \beta' | \mathcal{C}) \right| + \left| H(\beta | \mathcal{C}) - H(\beta' | \mathcal{C}) \right| \\ &\leq I(J+1) \sqrt{J^{2} \eta} \log \frac{I(J+1)}{\sqrt{J^{2} \eta}} + (J+1) \sqrt{J^{2} \eta} \log \frac{J+1}{\sqrt{J^{2} \eta}}. \end{split} \tag{28}$$

Taking *η* sufficiently small, we obtain (21), which is the desired claim.
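The step of the proof that turns an arbitrary finite class of sets C_1, ..., C_J into a partition B_1, ..., B_{J+1} can be sketched on finite sets as follows. This is only an illustration: the function name `disjointify` and the representation of events as Python sets are assumptions of this example, not part of the paper.

```python
def disjointify(cover_sets, omega):
    """Return cells B_1, ..., B_{J+1} built from C_1, ..., C_J.

    B_j is C_j minus D_{j-1}, where D_j is the union of C_1, ..., C_j,
    and the last cell B_{J+1} is omega minus D_J. Cells may be empty.
    """
    omega = frozenset(omega)
    d = frozenset()  # D_0 is the empty set
    cells = []
    for c in cover_sets:
        c = frozenset(c) & omega
        cells.append(c - d)  # B_j = C_j minus D_{j-1}
        d |= c               # D_j = D_{j-1} union C_j
    cells.append(omega - d)  # B_{J+1} = omega minus D_J
    return cells


if __name__ == "__main__":
    print(disjointify([{0, 1, 2}, {2, 3}, {3, 4}], range(6)))
```

Since each cell is obtained from the sets C_j by finitely many set operations, the resulting partition stays inside any field that contains the sets C_j, which is what the proof uses.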

One consequence of the above result is the following approximation result, proved by Dobrushin [5] and Pinsker [6] and used by Wyner [7] to demonstrate the chain rule. Applying the invariance of completion, we supply a proof different from those of Dobrushin [5] and Pinsker [6].

**Theorem 5** (split of join)**.** *Let* A*,* B*,* C*, and* D *be subfields of* J *. We have*

$$I(\mathcal{A}; \mathcal{B} \wedge \mathcal{C} | \mathcal{D}) = \sup\_{\alpha \subset \mathcal{A}, \beta \subset \mathcal{B}, \gamma \subset \mathcal{C}} \mathbb{E} \, I(\alpha; \beta \wedge \gamma || \mathcal{D}), \tag{29}$$

*where the supremum is taken over all finite subpartitions.*

**Proof.** Define class

$$\mathcal{E} := \bigcup\_{\beta \subset \mathcal{B}, \gamma \subset \mathcal{C}} \sigma(\beta \wedge \gamma). \tag{30}$$

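It can be easily verified that $\mathcal{E}$ is a field such that $\sigma(\mathcal{E}) = \sigma(\mathcal{B} \wedge \mathcal{C})$. Thus, for all finite partitions $\beta \subset \mathcal{B}$ and $\gamma \subset \mathcal{C}$ we have $\beta \wedge \gamma \subset \mathcal{E}$. Moreover, by the definition of $\mathcal{E}$, for each finite partition $\varepsilon \subset \mathcal{E}$ there exist finite partitions $\beta \subset \mathcal{B}$ and $\gamma \subset \mathcal{C}$ such that the partition $\beta \wedge \gamma$ is finer than $\varepsilon$. Hence, by Theorem 2.4, we obtain in this case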

$$\mathbb{E}\,I(\alpha;\varepsilon||\mathcal{D}) \le \mathbb{E}\,I(\alpha;\beta \wedge \gamma||\mathcal{D}) \le I(\alpha;\mathcal{E}|\mathcal{D}).\tag{31}$$

In consequence, by Theorem 1.1, we obtain the claim

$$\begin{split} I(\mathcal{A}; \mathcal{B} \wedge \mathcal{C} | \mathcal{D}) = I(\mathcal{A}; \mathcal{E} | \mathcal{D}) &= \sup\_{\alpha \subset \mathcal{A}, \varepsilon \subset \mathcal{E}} \mathbb{E} \, I(\alpha; \varepsilon || \mathcal{D}) \\ &= \sup\_{\alpha \subset \mathcal{A}, \beta \subset \mathcal{B}, \gamma \subset \mathcal{C}} \mathbb{E} \, I(\alpha; \beta \wedge \gamma || \mathcal{D}). \end{split} \tag{32}$$

The final approximation result which we need to prove the chain rule is as follows:

**Theorem 6** (convergence of conditioning)**.** *Let* $\alpha = \{A\_i\}\_{i=1}^{I}$ *be a finite partition and let* $\mathcal{C}$ *be a field. For each* $\epsilon > 0$*, there exists a finite partition* $\gamma' \subset \sigma(\mathcal{C})$ *such that for any partition* $\gamma \subset \sigma(\mathcal{C})$ *finer than* $\gamma'$ *we have*

$$|H(\alpha|\mathcal{C}) - H(\alpha|\gamma)| \le \epsilon. \tag{33}$$

**Proof.** Fix an $\epsilon > 0$. For each $n \in \mathbb{N}$ and $A \in \mathcal{J}$, partition

$$\gamma\_A := \{ ((k-1)/n < P(A|\mathcal{C}) \le k/n) : k \in \{0, 1, \dots, n\} \}\tag{34}$$

is finite and belongs to $\sigma(\mathcal{C})$. If we consider partition $\gamma' := \bigwedge\_{i=1}^{I} \gamma\_{A\_i}$, it remains finite and still satisfies $\gamma' \subset \sigma(\mathcal{C})$. Let a partition $\gamma \subset \sigma(\mathcal{C})$ be finer than $\gamma'$. Then,

$$|P(A\_i|\mathcal{C}) - P(A\_i|\gamma)| \le 1/n \tag{35}$$

almost surely for all $i \in \{1, \ldots, I\}$. We also observe

$$|H(\alpha|\mathcal{C}) - H(\alpha|\gamma)| \le \int |H(\alpha||\mathcal{C}) - H(\alpha||\gamma)| \, dP. \tag{36}$$

We recall that function −*x* log *x* is subadditive and increasing for *x* ∈ (0,*e*<sup>−</sup><sup>1</sup>]. In particular, we have |(*x* + *y*)log(*x* + *y*) − *x* log *x*| ≤ −*y* log *y* for *x*, *y* ≥ 0. Hence, for *n* ≥ *e* we obtain almost surely

$$\begin{split} |H(\alpha||\mathcal{C}) - H(\alpha||\gamma)| &= \left| \sum\_{i=1}^{I} P(A\_i|\mathcal{C}) \log P(A\_i|\mathcal{C}) - \sum\_{i=1}^{I} P(A\_i|\gamma) \log P(A\_i|\gamma) \right| \\ &\leq -\sum\_{i=1}^{I} |P(A\_i|\mathcal{C}) - P(A\_i|\gamma)| \log |P(A\_i|\mathcal{C}) - P(A\_i|\gamma)| \\ &\leq \frac{I \log n}{n} . \end{split} \tag{37}$$

Taking $n$ so large that $n^{-1} I \log n \le \epsilon$ yields the claim.
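The quantization behind (34) and (35) can be illustrated in a toy setting that is an assumption of this example rather than the paper's general setup: the field C is generated by finitely many cells of positive probability, so that P(A|C) is constant on each cell, and γ is taken to be the partition γ_A itself. The hypothetical helper below groups the cells by the quantized value of P(A|C) and returns P(A|γ) on each cell, using exact rational arithmetic.

```python
import math
from collections import defaultdict
from fractions import Fraction


def quantized_conditional(cell_probs, cond_probs, n):
    """Return P(A | gamma_A) on every cell of the generating partition of C.

    cell_probs[c] is the (positive) probability of cell c and cond_probs[c]
    is P(A | cell c). Cells with (k - 1)/n < P(A | C) <= k/n share the block k.
    """
    blocks = defaultdict(list)
    for idx, q in enumerate(cond_probs):
        blocks[math.ceil(q * n)].append(idx)
    coarse = [None] * len(cond_probs)
    for members in blocks.values():
        mass = sum(cell_probs[i] for i in members)
        # P(A | block) is the probability-weighted average of P(A | cell).
        avg = sum(cell_probs[i] * cond_probs[i] for i in members) / mass
        for i in members:
            coarse[i] = avg
    return coarse


if __name__ == "__main__":
    p = [Fraction(1, 4)] * 4
    q = [Fraction(1, 10), Fraction(3, 20), Fraction(1, 2), Fraction(11, 20)]
    print(quantized_conditional(p, q, n=5))
```

Within each block all values of P(A|C) lie in an interval of length 1/n, and so does their weighted average, which is the reason for the bound (35) in this special case.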

Taking the above into account, we can demonstrate the chain rule. Our proof essentially follows the ideas of Wyner [7], except for invoking Theorem 6.

**Proof of Theorem 1.2 (chain rule):** Let $\mathcal{A}$, $\mathcal{B}$, $\mathcal{C}$, and $\mathcal{D}$ be arbitrary fields, and let $\alpha$, $\beta$, $\gamma$, and $\delta$ be finite partitions. Our point of departure is the chain rule for finite partitions [9] (Equation 2.60)

$$I(\alpha; \beta \wedge \gamma) = I(\alpha; \beta) + I(\alpha; \gamma | \beta). \tag{38}$$

By Definition 1 and Theorems 1.1, 5, and 6, the conditional mutual information $I(\mathcal{A}; \mathcal{B}|\mathcal{C})$ can be approximated by $I(\alpha; \beta|\gamma)$, where we take appropriate limits of refined finite partitions with a certain care.

In particular, by Theorems 1.1, 5, and 6, taking sufficiently fine finite partitions of arbitrary fields $\mathcal{B}$ and $\mathcal{C}$, the chain rule (38) for finite partitions implies

$$I(\alpha; \mathcal{B} \wedge \mathcal{C}) = I(\alpha; \mathcal{B}) + I(\alpha; \mathcal{C}|\mathcal{B}),\tag{39}$$

where all expressions are finite. Hence, we also obtain

$$\begin{split} 0 &= \left[ I(\alpha; \mathcal{B} \wedge \mathcal{C} \wedge \mathcal{D}) - I(\alpha; \mathcal{D}) - I(\alpha; \mathcal{B} \wedge \mathcal{C} | \mathcal{D}) \right] \\ &- \left[ I(\alpha; \mathcal{B} \wedge \mathcal{D}) - I(\alpha; \mathcal{D}) - I(\alpha; \mathcal{B} | \mathcal{D}) \right] \\ &- \left[ I(\alpha; \mathcal{B} \wedge \mathcal{C} \wedge \mathcal{D}) - I(\alpha; \mathcal{B} \wedge \mathcal{D}) - I(\alpha; \mathcal{C} | \mathcal{B} \wedge \mathcal{D}) \right] \\ &= I(\alpha; \mathcal{B} | \mathcal{D}) + I(\alpha; \mathcal{C} | \mathcal{B} \wedge \mathcal{D}) - I(\alpha; \mathcal{B} \wedge \mathcal{C} | \mathcal{D}), \end{split}$$

where all expressions are finite. Having established the above claim for a finite partition *α*, we generalize it to

$$I(\mathcal{A}; \mathcal{B} \wedge \mathcal{C} | \mathcal{D}) = I(\mathcal{A}; \mathcal{B} | \mathcal{D}) + I(\mathcal{A}; \mathcal{C} | \mathcal{B} \wedge \mathcal{D}) \tag{40}$$

for an arbitrary field $\mathcal{A}$, taking its appropriately fine finite partitions.
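The finite-partition chain rule (38), which is the starting point of the proof above, can be checked numerically for discrete random variables. The sketch below is a minimal illustration under assumptions of this example only: the partitions α, β, γ are represented by the three coordinates of a randomly generated joint probability table, and all function names are hypothetical.

```python
import itertools
import math
import random


def random_joint(shape, seed=0):
    """Random strictly positive joint pmf over a product of finite alphabets."""
    rng = random.Random(seed)
    keys = itertools.product(*(range(s) for s in shape))
    weights = {key: rng.random() + 0.01 for key in keys}
    total = sum(weights.values())
    return {key: w / total for key, w in weights.items()}


def marginal(joint, axes):
    """Marginal pmf of the coordinates listed in axes."""
    out = {}
    for key, p in joint.items():
        sub = tuple(key[a] for a in axes)
        out[sub] = out.get(sub, 0.0) + p
    return out


def entropy(dist):
    return -sum(p * math.log(p) for p in dist.values() if p > 0)


def mutual_information(joint, xs, ys, zs=()):
    """I(X; Y | Z) = H(X, Z) + H(Y, Z) - H(X, Y, Z) - H(Z), with axes given as tuples."""
    def h(axes):
        return entropy(marginal(joint, axes))
    return h(xs + zs) + h(ys + zs) - h(xs + ys + zs) - h(zs)


def chain_rule_sides(shape=(2, 3, 2), seed=0):
    """Return both sides of (38) with alpha, beta, gamma as coordinates 0, 1, 2."""
    joint = random_joint(shape, seed)
    lhs = mutual_information(joint, (0,), (1, 2))
    rhs = mutual_information(joint, (0,), (1,)) + mutual_information(joint, (0,), (2,), (1,))
    return lhs, rhs
```

Both sides agree up to floating-point rounding, as expected from (38); the content of the proof above is the passage from this finite case to arbitrary fields.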
