**Abbreviations**

The following abbreviations are used in this manuscript:


### **Appendix A. The Basics of Channel Theory: Information Flow and Context Dependency**

The Channel Theory of [28] introduces the idea of a "classifier" (or "classification") as accommodating a "context" in terms of its constituent "tokens" in some language and the "types" to which they belong.

**Definition A1.** *A classifier* A *is a triple* ⟨*Tok*(A), *Typ*(A), |=A⟩*, where Tok*(A) *is a set of "tokens", Typ*(A) *is a set of "types", and* |=A *is a "classification" relation between tokens and types.*

Note that this definition specifies a classifier/classification as an object in the category of Chu spaces [67–69] where '|=A' is realized by a satisfaction relation valued in some set **K** (with no structure assumed). The arrows (morphisms) between classifiers are specified by the following:
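As a minimal illustrative sketch (not part of the formalism of [28]), Definition A1 can be encoded by treating the classification relation as a set of (token, type) pairs; all names below are hypothetical:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Classifier:
    """A classifier <Tok, Typ, |=> in the sense of Definition A1:
    a set of tokens, a set of types, and a classification relation."""
    tokens: frozenset
    types: frozenset
    satisfies: frozenset  # subset of tokens x types, as (token, type) pairs

    def classifies(self, token, typ) -> bool:
        # token |= typ  iff  the pair belongs to the relation
        return (token, typ) in self.satisfies

# A toy classifier: tokens are observed events, types are properties.
A = Classifier(
    tokens=frozenset({"e1", "e2"}),
    types=frozenset({"hot", "cold"}),
    satisfies=frozenset({("e1", "hot"), ("e2", "cold")}),
)
print(A.classifies("e1", "hot"))  # True: e1 |= hot in this toy relation
```

Viewed this way, a classifier is precisely a **K**-valued matrix indexed by tokens and types, which is the Chu-space picture referred to above.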

**Definition A2.** *Given two classifiers* A = ⟨*Tok*(A), *Typ*(A), |=A⟩ *and* B = ⟨*Tok*(B), *Typ*(B), |=B⟩*, an infomorphism f* : A → B *is a pair of maps* *f⃗* : *Tok*(B) → *Tok*(A) *and* *f⃖* : *Typ*(A) → *Typ*(B) *such that* ∀*b* ∈ *Tok*(B) *and* ∀*a* ∈ *Typ*(A)*,* *f⃗*(*b*) |=A *a if and only if b* |=B *f⃖*(*a*)*.*
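Under the same set-based encoding (an illustrative sketch only; the classifiers and maps below are toy assumptions), the biconditional of Definition A2 can be checked exhaustively:

```python
# Classifiers as triples (tokens, types, relation), relation ⊆ tokens × types.
A = ({"a1", "a2"}, {"p", "q"}, {("a1", "p"), ("a2", "q")})
B = ({"b1", "b2"}, {"P", "Q"}, {("b1", "P"), ("b2", "Q")})

# Candidate infomorphism: a token map Tok(B) -> Tok(A) (contravariant)
# and a type map Typ(A) -> Typ(B) (covariant).
f_tok = {"b1": "a1", "b2": "a2"}
f_typ = {"p": "P", "q": "Q"}

def is_infomorphism(A, B, f_tok, f_typ) -> bool:
    """Definition A2: f_tok(b) |=A a  iff  b |=B f_typ(a), for all b, a."""
    _, typ_a, rel_a = A
    tok_b, _, rel_b = B
    return all(
        ((f_tok[b], a) in rel_a) == ((b, f_typ[a]) in rel_b)
        for b in tok_b
        for a in typ_a
    )

print(is_infomorphism(A, B, f_tok, f_typ))  # True for this pair of maps
```

Swapping the type map (e.g., sending *p* to *Q*) breaks the biconditional, so the check returns False, which is exactly the constraint-preservation that the definition enforces.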

Information is inherently a physical mode of distinctions and of the relationships between them; it is not simply reducible to a quantity of bits, as it would be for Shannon information, which neglects the substance of reasoning. Instead, it conforms to the set of logical constraints imposed by Definition A2. An infomorphism, as a mapping between classifiers, provides the basic building block for constructing multi-level, quasi-hierarchical classification systems. Such a framework of information transfer is indicative of causation, which may itself be viewed as a form of computation in view of the regular relations in a distributed system [70]. References [6,32,33] bring to the forefront many examples and applications of the above concepts, including probability distributions, Bayesian belief networks, event space structures, formal concept analysis, and fuzzy relationships (as further relevant to this issue, let us point out that the Sorkin model of spacetime causal sets [71,72] has been interpreted in terms of classifiers (Chu spaces) in [73] (reviewed in [32])). In particular, Reference [33] focuses on orders of contextuality with ramifications for active inference and for the Frame Problem.

The specifics of transmitting information via classifiers and infomorphisms lead, in [28], to the idea of an information channel over classifiers, the most general of which leads to the categorical notion of a cocone with core **C**, the colimit of all possible upward-going structure-preserving maps from the classifiers A*<sup>i</sup>*. Such a colimit core, when it exists, can be regarded as a classifier which embraces the totality of information common to the component classifiers A*<sup>i</sup>*. The resulting structure is a cocone diagram (CCD), as exemplified in Figure 3. Within such a framework, the means by which channels encode sets of mutual constraints between classifiers is regulated by a local logic, as presented formally in ([28], Ch. 12) (reviewed in [32,33]). Basically, the idea is that the types of a (regular) theory specify the logical structure of a given situation. A local logic is essentially a classifier having a (regular) theory along with a subset of tokens that satisfy all constraints of the theory as specified by a sequent (a sequent *M* |=A *N* of a classifier A is a pair of subsets *M*, *N* of *Typ*(A) such that ∀*x* ∈ *Tok*(A), *x* |=A *M* ⇒ *x* |=A *N*). An infomorphism preserving this additional logical structure is then promoted to a logic infomorphism. In short, a local logic "identifies" the token(s) satisfying all of the types, the logic infomorphisms are those infomorphisms that transfer token-identification information between local logics, and the channels comprise sets of (logic) infomorphisms encoding mutual constraints that assemble multiple identified tokens. As demonstrated in [74], a sequent of a theory can be weakened to a conditional probability, so that a CCD becomes a network of hierarchical Bayesian inference, as reviewed and formulated in [32,33] (cf. [75]), whose structure is compatible with the variational free energy principle, the latter being fundamental to the precision of perceptual inference [76] (the sequent relation can be weakened by requiring only that if *x* |=A *M*, there is some probability *Prob*(*N*|*M*) that *x* |=A *N*; essentially, this is how a conditional probability interprets the logical implication "⇒" [77]).
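The strict sequent and its probabilistic weakening can be sketched over a toy classifier (all tokens, types, and the relation below are illustrative assumptions, with *Prob*(*N*|*M*) estimated as a frequency over tokens):

```python
# A toy classifier: a token x satisfies a set of types M when
# (x, t) is in the relation for every t in M.
tokens = ["x1", "x2", "x3", "x4"]
rel = {("x1", "m"), ("x1", "n"), ("x2", "m"), ("x2", "n"),
       ("x3", "m"), ("x4", "n")}

def satisfies_all(x, types_set):
    return all((x, t) in rel for t in types_set)

def sequent_holds(M, N):
    """Strict sequent M |= N: every token satisfying all of M satisfies all of N."""
    return all(satisfies_all(x, N) for x in tokens if satisfies_all(x, M))

def prob_sequent(M, N):
    """Weakened sequent: Prob(N | M) as the fraction of M-satisfying
    tokens that also satisfy N."""
    m_toks = [x for x in tokens if satisfies_all(x, M)]
    if not m_toks:
        return None  # Prob(N | M) undefined when no token satisfies M
    return sum(satisfies_all(x, N) for x in m_toks) / len(m_toks)

print(sequent_holds({"m"}, {"n"}))  # False: x3 satisfies m but not n
print(prob_sequent({"m"}, {"n"}))   # 2/3: two of three m-tokens satisfy n
```

Here the strict sequent fails because of a single exceptional token, while the weakened version records the degree to which it holds, which is the step that turns a CCD into a network of hierarchical Bayesian inference.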

### *Appendix A.1. Example: Observables in Context*

One fundamental example incorporating "context", instrumental in [33], has the following Chu space ingredients. Consider the following countable (in practice, finite) sets:


(i) a set *A* of events;

(ii) a set *B* of "objects" (or "contents");

(iii) a set *R* of contexts (or, in certain instances, a set of "detectors", "measurements" or "methods").

The set *B* can be decomposed as *B* = *B<sup>M</sup>* ∪ *B<sup>C</sup>* (disjoint union), where *B<sup>M</sup>* contains "objects/contents" or "degrees of freedom" that are observed or measured in some event *a* ∈ *A*, and *B<sup>C</sup>* contains what is not observed in the events in *A*. This leads to defining a 'large' space,

$$X := B \times R = (B^M \cup B^C) \times R,\tag{A1}$$

assuming that *A*, *B* and *R* are subsets of the same (even larger) probability space P (we make no assumptions about the corresponding types of probability distributions (e.g., discrete versus continuous) in relation to P; neither do we specify the nature of the random variables, nor the possible orders of "connectedness" of the distributions). Thus, based on these data, we consider the classifier,

$$\mathcal{A} = \langle A, X, \models \rangle,\tag{A2}$$

as comprising observables in context, where, as in Section 3, the classification relation '|=A' is realized by the Chu space valuation in the set **K** = {−1, 1}. Notably, in [33], '|=A' can be realized for an inferential process by the conditional probability *p*(*a*|*x*) = *p*(*a*|{*b*, *c*}), whenever defined, for *a* ∈ *A*, *b* ∈ *B* and *c* ∈ *R*, which, for suitable indexing, leads to an information flow of hierarchical Bayesian inference within a CCD [33]. The background to the results in Section 5 here can be found in ([33], Section 7). In particular, ([33], Th. 7.1) states the criteria for intrinsic contextuality (non-co-deployable observables) in terms of noncommutativity of a CCD. Note that the above classifier (Chu space) formalism of contextuality is very general. Special cases of the set *X* = *B* × *R* are the sets of binary random variables labelled by a measurement (contents-context) system, as basic to the theory of Contextuality-by-Default [78,79]. Much amounts to the question of determining the nature of an empirical model *e* relative to how a probability distribution can be obtained as the marginals of a global probability distribution on the outcomes of all measurements. For example, *e* is said to be contextual in [80] if the corresponding probability distribution cannot be obtained by such global means. This has a compatible interpretation in terms of the non-existence of a global section of a sheaf defined relative to a "measurement cover" in [81]. These methods of studying contextuality are also very general and, like those of [33], can extend beyond quantum theory to such disciplines as linguistics and psychology. To see the explicit connections between these various approaches would indeed be a worthwhile undertaking.
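The construction of (A1)–(A2) can be sketched concretely; the sets, the {−1, +1} valuation rule, and the frequency estimate of *p*(*a*|{*b*, *c*}) below are all toy assumptions for illustration, not data from [33]:

```python
import itertools

# Toy ingredients: events A, contents B = B_M ∪ B_C, contexts R.
A_events = ["a1", "a2"]
B_M, B_C = ["b1"], ["b2"]   # observed vs. unobserved degrees of freedom
R = ["r1", "r2"]            # contexts / measurement settings

# The 'large' space X = B × R of equation (A1).
X = list(itertools.product(B_M + B_C, R))

def valuation(a, x):
    """A {-1, +1}-valued Chu valuation: +1 when event a is classified
    under x = (b, c). The pairing rule here is an illustrative assumption."""
    b, c = x
    return +1 if (a, b) in {("a1", "b1"), ("a2", "b2")} else -1

def p(a, x):
    """Probabilistic realization p(a | {b, c}): the fraction of events
    classified +1 under x that equal a; None when undefined."""
    hits = [e for e in A_events if valuation(e, x) == +1]
    return (a in hits) / len(hits) if hits else None

for x in X:
    print(x, [p(a, x) for a in A_events])
```

Indexing such classifiers by context and connecting them with infomorphisms is what assembles the hierarchical Bayesian information flow within a CCD described above.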
