### **Pitfalls?**

Entropy is a tool for reasoning and, as with all tools, for reasoning or otherwise, it can be misused, leading to unsatisfactory results [52]. Should that happen, the inevitable questions are "what went wrong?" and "how do we fix it?" It helps to first ask which components of the analysis can be trusted, so that the possible mistakes can be looked for elsewhere. The answers proposed by the ME method are radically conservative: problems always arise through a wrong choice of variables, priors, or constraints. One should not blame the entropic method for failing to discover and take into account relevant information that was never explicitly introduced into the analysis. Indeed, just as one would be very reticent about questioning the basic rules of arithmetic or of calculus, one should not question the basic sum and product rules of the probability calculus and, taking this one step farther, one should not question the applicability of entropy as the updating tool. Adopting this conservative approach leads us to reject alternative entropies and quantum probabilities. Fortunately, those constructs are not actually needed: as mentioned above, those Tsallis distributions that have turned out to be useful can be derived with standard entropic methods [8,48–51], and quantum mechanics can be handled within standard probability theory without invoking exotic probabilities [39,53].

**Funding:** This research received no external funding.

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Acknowledgments:** I would like to acknowledge many valuable discussions on probability and entropy with N. Caticha, A. Giffin, K. Knuth, R. Preuss, C. Rodríguez, J. Skilling, and K. Vanslette.

**Conflicts of Interest:** The author declares no conflict of interest.

### **Appendix A. DC1—Mutually Exclusive Subdomains**

In these appendices, we establish the consequences of the two criteria DC1 and DC2, leading to the final result: Equation (9). The details of the proofs are important not just because they lead to our final conclusions, but also because the translation of the verbal statement of the criteria into precise mathematical form is a crucial part of unambiguously specifying what the criteria actually say.

First, we prove that criterion DC1 leads to the expression Equation (2) for $S[p,q]$. Consider the case of a discrete variable, $p_i$ with $i = 1 \ldots n$, so that $S[p,q] = S(p_1 \ldots p_n, q_1 \ldots q_n)$. Suppose the space of states $\mathcal{X}$ is partitioned into two non-overlapping domains $\mathcal{D}$ and $\tilde{\mathcal{D}}$ with $\mathcal{D} \cup \tilde{\mathcal{D}} = \mathcal{X}$, and that the information to be processed is in the form of a constraint that refers to the domain $\tilde{\mathcal{D}}$,

$$\sum_{j \in \tilde{\mathcal{D}}} a_j p_j = A \,. \tag{A1}$$

DC1 states that the constraint on $\tilde{\mathcal{D}}$ does not have an influence on the *conditional* probabilities $p_{i|\mathcal{D}}$. It may, however, influence the probabilities $p_i$ within $\mathcal{D}$ through an overall multiplicative factor. To deal with this complication, consider then a special case where the overall probabilities of $\mathcal{D}$ and $\tilde{\mathcal{D}}$ are also constrained:

$$\sum_{i \in \mathcal{D}} p_i = P_{\mathcal{D}} \quad \text{and} \quad \sum_{j \in \tilde{\mathcal{D}}} p_j = P_{\tilde{\mathcal{D}}}\,, \tag{A2}$$

with $P_{\mathcal{D}} + P_{\tilde{\mathcal{D}}} = 1$. Under these special circumstances, constraints on $\tilde{\mathcal{D}}$ will not influence the $p_i$s within $\mathcal{D}$, and vice versa.

To obtain the posterior, maximize $S[p,q]$ subject to these three constraints,

$$0 = \delta\left[ S - \lambda \left( \sum_{i \in \mathcal{D}} p_i - P_{\mathcal{D}} \right) - \tilde{\lambda} \left( \sum_{j \in \tilde{\mathcal{D}}} p_j - P_{\tilde{\mathcal{D}}} \right) - \mu \left( \sum_{j \in \tilde{\mathcal{D}}} a_j p_j - A \right) \right],$$

leading to

$$\frac{\partial S}{\partial p_i} = \lambda \quad \text{for} \quad i \in \mathcal{D}\,, \tag{A3}$$

$$\frac{\partial S}{\partial p_j} = \tilde{\lambda} + \mu a_j \quad \text{for} \quad j \in \tilde{\mathcal{D}}\,. \tag{A4}$$

Equations (A1)–(A4) are $n + 3$ equations; we must solve for the $p_i$s and the three Lagrange multipliers, $\lambda$, $\tilde{\lambda}$, and $\mu$. Since $S = S(p_1 \ldots p_n, q_1 \ldots q_n)$, its derivative

$$\frac{\partial S}{\partial p_i} = f_i(p_1 \ldots p_n, q_1 \ldots q_n)$$

could, in principle, also depend on all $2n$ variables. However, this would violate the DC1 criterion, because any arbitrary change in $a_j$ within $\tilde{\mathcal{D}}$ would then influence the $p_i$s within $\mathcal{D}$. The only way that probabilities conditioned on $\mathcal{D}$ can be shielded from arbitrary changes in the constraints pertaining to $\tilde{\mathcal{D}}$ is that for any $i \in \mathcal{D}$, the function $f_i$ depends only on $p_j$s with $j \in \mathcal{D}$. Furthermore, this must hold not just for one particular partition of $\mathcal{X}$ into domains $\mathcal{D}$ and $\tilde{\mathcal{D}}$; it must hold for *all conceivable partitions*, including the partition into atomic propositions. Therefore, $f_i$ can depend only on $p_i$,

$$\frac{\partial S}{\partial p_i} = f_i(p_i, q_1 \ldots q_n) \,. \tag{A5}$$

The power of the criterion DC1 is not yet exhausted. The information that affects the posterior can enter not just through constraints, but also through the prior. Suppose that the local information about domain $\tilde{\mathcal{D}}$ is altered by changing the prior within $\tilde{\mathcal{D}}$: let $q_j \to q_j + \delta q_j$ for $j \in \tilde{\mathcal{D}}$. Then (A5) becomes

$$\frac{\partial S}{\partial p_i} = f_i(p_i, q_1 \ldots q_j + \delta q_j \ldots q_n)\,,$$

which shows that $p_i$ with $i \in \mathcal{D}$ will be influenced by information about $\tilde{\mathcal{D}}$ unless $f_i$ with $i \in \mathcal{D}$ is independent of all the $q_j$s for $j \in \tilde{\mathcal{D}}$. Again, this must hold for all possible partitions into $\mathcal{D}$ and $\tilde{\mathcal{D}}$, and therefore,

$$\frac{\partial S}{\partial p_i} = f_i(p_i, q_i) \quad \text{for all} \quad i \in \mathcal{X}\,.$$
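This shielding property can be illustrated numerically (a sanity check, not part of the proof). The sketch below assumes the familiar relative-entropy form $S[p,q] = -\sum_i p_i \log(p_i/q_i)$ that these criteria ultimately single out; the prior, the domains, and the constraint values are arbitrary illustrative choices. Two very different constraints on $\tilde{\mathcal{D}}$ leave the conditional probabilities within $\mathcal{D}$ untouched:

```python
# Numerical illustration of DC1 shielding, assuming S[p,q] = -sum p log(p/q).
# The prior q, domains D and D-tilde, and constraint values are arbitrary.
import math

q = [0.1, 0.2, 0.3, 0.4]          # prior over X = {0, 1, 2, 3}
D = [0, 1]                         # domain D
Dt = [2, 3]                        # domain D-tilde
P_D, P_Dt = 0.6, 0.4               # constrained overall probabilities, as in (A2)

def posterior(a, A):
    """Maximize S subject to (A1) and (A2). Analytically, p_i ∝ q_i within D,
    and p_j ∝ q_j exp(-mu a_j) within D-tilde, with mu found by bisection."""
    QD = sum(q[i] for i in D)
    p = {i: P_D * q[i] / QD for i in D}       # conditionals in D = prior conditionals
    def mean_a(mu):                           # expectation of a under p(.|D-tilde)
        w = [q[j] * math.exp(-mu * a[j]) for j in Dt]
        return sum(wj * a[j] for wj, j in zip(w, Dt)) / sum(w)
    lo, hi = -50.0, 50.0                      # mean_a(mu) decreases monotonically
    for _ in range(200):
        mid = (lo + hi) / 2
        if mean_a(mid) > A / P_Dt:
            lo = mid
        else:
            hi = mid
    mu = (lo + hi) / 2
    w = {j: q[j] * math.exp(-mu * a[j]) for j in Dt}
    Z = sum(w.values())
    p.update({j: P_Dt * w[j] / Z for j in Dt})
    return p

# Two very different constraints on D-tilde ...
p1 = posterior(a={2: 1.0, 3: 2.0}, A=0.55)
p2 = posterior(a={2: 5.0, 3: -1.0}, A=0.9)
# ... give identical conditional probabilities within D:
cond1 = [p1[i] / P_D for i in D]
cond2 = [p2[i] / P_D for i in D]
print(cond1, cond2)   # both equal the prior conditionals q_i / Q_D
```
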

The choice of the functions $f_i(p_i, q_i)$ can be restricted further. If we maximize $S[p,q]$ subject to the constraints

$$\sum_i p_i = 1 \quad \text{and} \quad \sum_i a_i p_i = A\,,$$

we obtain

$$\frac{\partial S}{\partial p_i} = f_i(p_i, q_i) = \lambda + \mu a_i \quad \text{for all} \quad i \in \mathcal{X}\,,$$

where $\lambda$ and $\mu$ are Lagrange multipliers. Solving for $p_i$ gives a posterior,

$$P_i = g_i(q_i, \lambda, \mu, a_i)\,,$$

for some functions $g_i$. As stated in Section 3.3, we do not assume that the labels $i$ themselves carry any particular significance. This means, in particular, that for any proposition labeled $i$, we want the selected posterior $P_i$ to depend only on the numbers $q_i$, $\lambda$, $\mu$, and $a_i$. We do not want different updating rules for different propositions: two different propositions $i$ and $i'$ with the same $q_i = q_{i'}$ and the same $a_i = a_{i'}$ should be updated to the same posteriors, $P_i = P_{i'}$. In other words, the functions $g_i$ and $f_i$ must be independent of $i$. Therefore,

$$\frac{\partial S}{\partial p_i} = f(p_i, q_i) \quad \text{for all} \quad i \in \mathcal{X}\,. \tag{A6}$$

Integrating, one obtains

$$S[p,q] = \sum_i F(p_i, q_i) + \text{constant}\,,$$

for some still undetermined function $F$. The constant has no effect on the entropy maximization and can be dropped.
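To see the machinery of (A3)–(A6) in action, one can pick a concrete entropy and solve for the posterior. The sketch below assumes the familiar choice $F(p,q) = -p\log(p/q)$, for which $f(p,q) = -\log(p/q) - 1$; the stationarity condition $f(P_i, q_i) = \lambda + \mu a_i$ then solves to the Gibbs-like posterior $P_i \propto q_i\, e^{-\mu a_i}$. The prior, the $a_i$, and $A$ are arbitrary illustrative choices, with $\mu$ found by bisection:

```python
# Sketch: for F(p,q) = -p*log(p/q), maximizing subject to normalization and
# sum_i a_i p_i = A yields P_i ∝ q_i exp(-mu a_i). All numbers are arbitrary.
import math

q = [0.25, 0.25, 0.25, 0.25]             # prior
a = [0.0, 1.0, 2.0, 3.0]                 # constraint coefficients
A = 1.2                                  # target expectation value

def gibbs(mu):
    w = [qi * math.exp(-mu * ai) for qi, ai in zip(q, a)]
    Z = sum(w)
    return [wi / Z for wi in w]

def mean_a(mu):
    return sum(ai * pi for ai, pi in zip(a, gibbs(mu)))

lo, hi = -50.0, 50.0                     # mean_a(mu) decreases monotonically
for _ in range(200):
    mid = (lo + hi) / 2
    if mean_a(mid) > A:
        lo = mid
    else:
        hi = mid
mu = (lo + hi) / 2
P = gibbs(mu)

# f(P_i, q_i) = -log(P_i/q_i) - 1 is affine in a_i, as (A6) and the
# stationarity condition lambda + mu*a_i require:
f = [-math.log(P[i] / q[i]) - 1 for i in range(4)]
print(P, mu)
```
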

The corresponding expression for a continuous variable $x$ is obtained by replacing $i$ by $x$ and the sum over $i$ by an integral over $x$, leading to Equation (2),

$$S[p,q] = \int dx\, F\big(p(x), q(x)\big) \,.$$

### **Appendix B. DC2—Independent Subsystems**

Here, we show that DC2 leads to Equation (9). Let the microstates of a composite system be labeled by $(i_1, i_2) \in \mathcal{X} = \mathcal{X}_1 \times \mathcal{X}_2$. We shall consider two special cases.

#### Case (a)

First, we treat the two subsystems separately. Suppose that for subsystem 1, we have the extremely constraining information that updates $q_1(i_1)$ to $P_1(i_1)$, while for subsystem 2 we have no new information at all. For subsystem 1, we maximize $S_1[p_1, q_1]$ subject to the constraint $p_1(i_1) = P_1(i_1)$, and the selected posterior is, of course, $p_1(i_1) = P_1(i_1)$. For subsystem 2, we maximize $S_2[p_2, q_2]$ subject only to normalization, and there is no update: $P_2(i_2) = q_2(i_2)$.
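The statement that subsystem 2 is not updated can be checked by brute force for a concrete entropy. The sketch below assumes the relative-entropy form $S_2[p_2, q_2] = -\sum p_2 \log(p_2/q_2)$ and an arbitrary two-state prior; maximizing subject only to normalization returns the prior itself:

```python
# Sketch: "no new information means no update", assuming the relative-entropy
# form for S_2. The two-state prior q2 is an arbitrary illustrative choice.
import math

q2 = (0.3, 0.7)                      # prior for subsystem 2

def S(p):
    return -sum(pi * math.log(pi / qi) for pi, qi in zip(p, q2))

# Scan normalized candidates p = (t, 1 - t) and pick the maximizer.
grid = [k / 10000 for k in range(1, 10000)]
best = max(grid, key=lambda t: S((t, 1 - t)))
print(best)   # 0.3: the selected posterior is the prior, P2 = q2
```
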

When the systems are treated jointly, however, the inference is not nearly as trivial. We want to maximize the entropy of the joint system,

$$S[p,q] = \sum_{i_1, i_2} F\big(p(i_1, i_2),\, q_1(i_1)\, q_2(i_2)\big)\,,$$

subject to the constraint on subsystem 1,

$$\sum_{i_2} p(i_1, i_2) = P_1(i_1)\,.$$

Notice that this is not just one constraint: there is one constraint for each value of $i_1$, and each must be supplied with its own Lagrange multiplier, $\lambda_1(i_1)$. Then,

$$\delta\left[S - \sum_{i_1} \lambda_1(i_1) \left(\sum_{i_2} p(i_1, i_2) - P_1(i_1)\right)\right] = 0\,.$$

The independent variations $\delta p(i_1, i_2)$ yield the following:

$$f\big(p(i_1, i_2),\, q_1(i_1)\, q_2(i_2)\big) = \lambda_1(i_1)\,,$$

where $f$ is given in (A6),

$$\frac{\partial S}{\partial p} = \frac{\partial}{\partial p} F(p, q_1 q_2) = f(p, q_1 q_2)\,.$$

Next, we impose that the selected posterior be the product $P_1(i_1)\, q_2(i_2)$. The function $f$ must then be such that

$$f\big(P_1 q_2,\, q_1 q_2\big) = \lambda_1\,.$$

Since the RHS is independent of the argument $i_2$, the function $f$ must be such that the $i_2$-dependence cancels out, and this cancellation must occur for all values of $i_2$ and all choices of the prior $q_2$. Therefore, we impose that for any value of $x$ the function $f(p, q)$ must satisfy

$$f(px, qx) = f(p, q)\,.$$

Choosing $x = 1/q$, we obtain

$$f\left(\frac{p}{q},1\right) = f(p,q) \quad \text{or} \quad \frac{\partial F}{\partial p} = f(p,q) = \phi\left(\frac{p}{q}\right). \tag{A7}$$

Thus, the function $f(p, q)$ has been reduced to a function $\phi(p/q)$ of a single argument.
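As a quick consistency check, the familiar choice $F(p,q) = -p\log(p/q)$ does satisfy (A7): its derivative $f(p,q) = -\log(p/q) - 1$ is invariant under $(p,q) \to (px, qx)$ and equals $\phi(p/q)$ with $\phi(r) = -\log r - 1$. The sample points below are arbitrary:

```python
# Sketch: verify f(px, qx) = f(p, q) and f(p, q) = phi(p/q) for the candidate
# F(p,q) = -p*log(p/q). The sample points (p, q, x) are arbitrary.
import math

def f(p, q):
    return -math.log(p / q) - 1      # ∂/∂p of F(p,q) = -p*log(p/q)

def phi(r):
    return -math.log(r) - 1

checks = []
for p, q, x in [(0.2, 0.5, 3.0), (0.7, 0.1, 0.25), (0.05, 0.9, 10.0)]:
    checks.append(abs(f(p * x, q * x) - f(p, q)) < 1e-12)   # (A7) invariance
    checks.append(abs(f(p, q) - phi(p / q)) < 1e-12)        # single-argument form
print(all(checks))   # True
```
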
