**DC1**: *Probabilities that are conditioned on one subdomain are not affected by information about other non-overlapping subdomains.*

Consider a subdomain D ⊂ X composed of atomic propositions *i* ∈ D, and suppose the information to be processed refers to some other subdomain D′ ⊂ X that does not overlap with D, D ∩ D′ = ∅. In the absence of any new information about D, the PMU demands that we do not change our minds about probabilities that are conditional on D. Thus, we design the inference method so that *q*(*i*|D), the prior probability of *i* conditioned on *i* ∈ D, is not updated, and the selected conditional posterior is

$$P(i|\mathcal{D}) = q(i|\mathcal{D})\,. \tag{1}$$

We adopt the following notation: priors are denoted by *q*, candidate posteriors by lower-case *p*, and the selected posterior by upper-case *P*. We shall write either *p*(*i*) or *p<sub>i</sub>*. Furthermore, we adopt the notation, standard in physics, in which the probabilities of *x* and *θ* are written *p*(*x*) and *p*(*θ*) with no implication that *p* refers to the same mathematical function in both cases.

We emphasize that the point is not that we make the unwarranted assumption that keeping *q*(*i*|D) unchanged is guaranteed to lead to correct inferences. It need not; induction is risky. The point is, rather, that in the absence of any evidence to the contrary, there is no reason to change our minds and the prior information takes priority.

**The consequence of DC1** is that non-overlapping domains of *i* contribute additively to the entropy,

$$S(p,q) = \sum_i F(p_i, q_i)\,, \tag{2}$$

where *F* is some unknown function of two arguments. The proof is given in Appendix A.

**Comment 1:** It is essential that DC1 refers to *conditional* probabilities—local information about a domain D can (via normalization) have a non-local effect on the probability of another domain D′.

**Comment 2:** An important special case is the "update" from a prior *q*(*i*) to a posterior *P*(*i*) in a situation in which no new information is available. The criterion DC1 applied to a situation where the subdomain D covers the whole space of *i*s, D = X , requires that *in the absence of any new information, the prior conditional probabilities are not to be updated: P*(*i*|X ) = *q*(*i*|X ) *or P*(*i*) = *q*(*i*).

**Comment 3:** The criterion DC1 implies Bayesian conditionalization as a special case. Indeed, if the information is given through the constraint *p*(D̃) = 0, where D̃ is the complement of D, then *P*(*i*|D) = *q*(*i*|D), which is referred to as Bayesian conditionalization. More explicitly, if *θ* is the variable to be inferred on the basis of prior information about a likelihood function *q*(*i*|*θ*) and observed data *i*′, then the update from the prior *q* to the posterior *P*,

$$q(i, \theta) = q(i)q(\theta|i) \to P(i, \theta) = P(i)P(\theta|i) \,. \tag{3}$$

consists of updating *q*(*i*) → *P*(*i*) = *δ*<sub>*ii*′</sub> to agree with the new information and invoking the PMU so that *P*(*θ*|*i*′) = *q*(*θ*|*i*′) remains unchanged. Therefore,

$$P(i', \theta) = \delta\_{ii'} q(\theta | i') \quad \text{so that} \quad P(\theta) = q(\theta | i') = q(\theta) \frac{q(i'|\theta)}{q(i')} \, , \tag{4}$$

which is Bayes' rule. Thus, *entropic inference is designed to include Bayesian inference as a special case*. Note, however, that imposing DC1 is not identical to imposing Bayesian conditionalization: DC1 is not restricted to information in the form of absolute certainties, such as *p*(D) = 1.
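As a numeric sanity check, the entropic route to Bayes' rule in Eq. (4) can be sketched in a few lines; the prior, likelihood, and observed datum below are hypothetical:

```python
import numpy as np

# Hypothetical discrete model: 3 possible data values i, 4 parameter values theta
q_theta = np.array([0.1, 0.2, 0.3, 0.4])            # prior q(theta)
q_i_given_theta = np.array([[0.2, 0.5, 0.1, 0.3],
                            [0.3, 0.2, 0.6, 0.3],
                            [0.5, 0.3, 0.3, 0.4]])  # likelihood q(i|theta); columns sum to 1

q_joint = q_i_given_theta * q_theta                 # q(i, theta) = q(i|theta) q(theta)
q_i = q_joint.sum(axis=1)                           # marginal q(i)

i_obs = 2                                           # the observed datum i'

# Entropic update: P(i) = delta_{i i'} and, by the PMU, P(theta|i') = q(theta|i'),
# so P(theta) = q(theta) q(i'|theta) / q(i')  -- Bayes' rule, Eq. (4)
P_theta = q_theta * q_i_given_theta[i_obs] / q_i[i_obs]

# Cross-check: conditioning the joint distribution directly gives the same posterior
P_theta_check = q_joint[i_obs] / q_joint[i_obs].sum()
assert np.allclose(P_theta, P_theta_check)
assert np.isclose(P_theta.sum(), 1.0)
```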

**Comment 4:** If the label *i* is turned into a continuous variable *x*, the criterion DC1 requires that information that refers to points infinitely close but just outside the domain D will have no influence on probabilities conditional on D. This may seem surprising, as it may lead to updated probability distributions that are discontinuous, but it is not a problem. In situations where we have explicit reasons to believe that conditions of continuity or differentiability hold, then such conditions should be imposed explicitly. The inference process should not be expected to discover and replicate information with which it was not supplied.

### 3.3.2. Subsystem Independence

**DC2**: *When two systems are a priori believed to be independent and the information we receive about one of them makes no reference to the other, then it should not matter whether the latter is included in the analysis of the former or not.*

Consider a system of propositions labeled by a composite index, *i* = (*i*1, *i*2) ∈ X = X1 × X2. For example, {*i*1} = X1 and {*i*2} = X2 might describe the microstates of two separate physical systems. Assume that all prior evidence led us to believe the two subsystems are independent, that is, any two propositions *i*1 ∈ X1 and *i*2 ∈ X2 are believed to be independent. This belief is reflected in the prior distribution: if the individual subsystem priors are *q*1(*i*1) and *q*2(*i*2), then the prior for the whole system is the product *q*1(*i*1)*q*2(*i*2). Next, suppose that new information is acquired such that *q*1(*i*1) would by itself be updated to *P*1(*i*1), and that *q*2(*i*2) would by itself be updated to *P*2(*i*2). DC2 requires that *S*[*p*, *q*] be such that the joint prior *q*1(*i*1)*q*2(*i*2) updates to the product *P*1(*i*1)*P*2(*i*2), so that inferences about one subsystem do not affect inferences about the other.

**The consequence of DC2** is to fully determine the unknown function *F* in (2) so that probability distributions *p*(*i*) should be ranked relative to the prior *q*(*i*) according to the relative entropy,

$$S[p,q] = -\sum_i p(i) \log \frac{p(i)}{q(i)}\,. \tag{5}$$
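For concreteness, Eq. (5) is straightforward to evaluate numerically. The sketch below (with made-up distributions) also illustrates that *S*[*p*, *q*] ≤ 0, with equality only for *p* = *q*, so that in the absence of any constraints the top-ranked posterior is the prior itself:

```python
import numpy as np

def relative_entropy(p, q):
    """S[p, q] = -sum_i p_i log(p_i / q_i), Eq. (5); terms with p_i = 0 contribute 0."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask] / q[mask]))

q = np.array([0.25, 0.25, 0.25, 0.25])   # hypothetical prior
p = np.array([0.70, 0.10, 0.10, 0.10])   # a candidate posterior

assert relative_entropy(p, q) < 0.0      # any p != q ranks strictly below q
assert relative_entropy(q, q) == 0.0     # the maximum, attained at p = q
```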

**Comment 1:** We emphasize that the point is not that when we have no evidence for correlations, we draw the firm conclusion that the systems must necessarily be independent. Induction involves risk; the systems might, in actual fact, be correlated through some unknown interaction potential. The point is rather that if the joint prior reflected independence and the new evidence is silent on the matter of correlations, then the evidence we actually have—namely, the prior—takes precedence, and there is no reason to change our minds. As before, the PMU requires that a feature of the probability distribution—in this case, independence—will not be updated unless the evidence requires it.

**Comment 2:** We also emphasize that DC2 *is not a consistency requirement*. The argument we deploy is *not* that both the prior *and* the new information tell us the systems are independent, in which case consistency would require that it not matter whether the systems are treated jointly or separately. DC2 refers to a situation where the new information does not say whether the systems are independent or not. Rather, the updating is *designed* through the PMU so that the independence reflected in the prior is maintained in the posterior by default.
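The independence property behind DC2 can be checked numerically: for product priors and product candidates, Eq. (5) is additive over the subsystems, and a correlated candidate with the same marginals ranks strictly lower. All distributions below are made up for illustration:

```python
import numpy as np

def S(p, q):
    # Relative entropy of Eq. (5)
    mask = p > 0
    return -np.sum(p[mask] * np.log(p[mask] / q[mask]))

# Hypothetical subsystem priors and single-subsystem posteriors
q1, P1 = np.array([0.5, 0.5]), np.array([0.8, 0.2])
q2, P2 = np.array([0.2, 0.3, 0.5]), np.array([0.1, 0.6, 0.3])

q12 = np.outer(q1, q2)                   # joint prior q1(i1) q2(i2)
P12 = np.outer(P1, P2)                   # product posterior P1(i1) P2(i2)

# Additivity: S[P1 P2, q1 q2] = S[P1, q1] + S[P2, q2]
assert np.isclose(S(P12.ravel(), q12.ravel()), S(P1, q1) + S(P2, q2))

# A correlated candidate with the same marginals P1 and P2 ranks strictly lower
P_corr = P12 + np.array([[0.01, -0.01, 0.0],
                         [-0.01, 0.01, 0.0]])
assert np.allclose(P_corr.sum(axis=1), P1) and np.allclose(P_corr.sum(axis=0), P2)
assert S(P_corr.ravel(), q12.ravel()) < S(P12.ravel(), q12.ravel())
```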

**Comment 3:** The generalization to continuous variables *x* ∈ X is approached as a Riemann limit from the discrete case. Continuous probability densities *p*(*x*) and *q*(*x*) can be approximated by discrete distributions: divide the region of interest X into a large number *N* of small cells. The probability of each cell is

$$p_i = p(x_i)\,\Delta x_i \quad \text{and} \quad q_i = q(x_i)\,\Delta x_i\,, \tag{6}$$

where Δ*xi* is an appropriately small interval. The discrete entropy of *p<sub>i</sub>* relative to *q<sub>i</sub>* is

$$S_N = -\sum_{i=1}^N \Delta x_i \, p(x_i) \log \left[ \frac{p(x_i)\,\Delta x_i}{q(x_i)\,\Delta x_i} \right] = -\sum_{i=1}^N \Delta x_i \, p(x_i) \log \left[ \frac{p(x_i)}{q(x_i)} \right] \tag{7}$$

and in the limit as *N* → ∞ and Δ*xi* → 0 we get the Riemann integral

$$S[p,q] = -\int dx \, p(x) \, \log \left[ \frac{p(x)}{q(x)} \right] \,. \tag{8}$$

(To simplify the notation, we write *d*<sup>*n*</sup>*x* = *dx* so that multi-dimensional integrals are included.) It is easy to check that the ranking of distributions induced by *S*[*p*, *q*] is invariant under coordinate transformations. The insight that coordinate invariance could be derived as a consequence of the requirement of subsystem independence first appeared in [5].
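Both the Riemann limit (7)–(8) and the coordinate invariance just mentioned are easy to verify numerically. The sketch below uses two made-up Gaussian densities and the (arbitrarily chosen) monotonic coordinate change *y* = *e*<sup>*x*</sup>; the Jacobian factors cancel inside the logarithm, so the two evaluations of Eq. (8) agree:

```python
import numpy as np

def gauss(x, mu, sigma):
    return np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))

def trapezoid(f, t):
    # Riemann (trapezoid) sum approximating the N -> infinity limit of Eq. (7)
    return np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t))

x = np.linspace(-8.0, 8.0, 200001)
p = gauss(x, 0.0, 1.0)      # hypothetical posterior density p(x)
q = gauss(x, 1.0, 2.0)      # hypothetical prior density q(x)

S_x = -trapezoid(p * np.log(p / q), x)

# Coordinate change y = exp(x): each density picks up a Jacobian factor 1/y,
# which cancels inside the logarithm, leaving S unchanged
y = np.exp(x)
S_y = -trapezoid((p / y) * np.log((p / y) / (q / y)), y)

assert S_x < 0.0                          # S[p, q] <= 0, with equality iff p = q
assert np.isclose(S_x, S_y, rtol=1e-4)    # same ranking in either coordinate system
```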
