### *4.2. MacKay (2003)*

In what soon became a standard Bayesian textbook, MacKay [55] devoted one chapter (Chapter 28) to links between simplicity and likelihood. He did not actually claim equivalence, but as I discussed in [57] and revisit here, he mistakenly equated surprisals and description lengths, and he made an admittedly compelling argument that was subsequently overinterpreted by others, who did go on to claim equivalence.

One of MacKay's conclusions was that "MDL has no apparent advantages over the direct probabilistic approach" [55] (p. 352). However, he attributed MDL not to MDL developer Rissanen [42] but to MML developers Wallace and Boulton [40], just as [61] later did, by the way. In fact, throughout the chapter, MacKay mistakenly wrote "MDL" instead of "MML" and "description length" instead of "message length" or "surprisal" (Baxter and Oliver [62] also noticed this mistake in MacKay [63]). He therefore in fact discussed the Bayesian MML rather than the non-Bayesian MDL, so it is no wonder that he saw "no apparent advantages". Unfortunately, his mistake added to the already existing misconceptions surrounding simplicity and likelihood. For instance, Feldman [53,64–67] subsequently also mixed up MDL's description lengths (which, in terms of modern IT's descriptive codes, aim at a minimal code length for an individual thing) and MML's surprisals (which, in terms of classical IT's label codes, minimize the long-term average code length for large sets of identical and nonidentical things).
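The classical-IT side of this distinction can be made concrete. The following toy sketch (my own illustrative example, with an assumed distribution, not taken from the article) shows that label codes of length equal to the surprisal, −log2 *p*(*x*), minimize the long-term *average* code length over many transmissions, whereas MDL-style description lengths concern the shortest description of an *individual* object:

```python
import math

# Illustrative sketch (assumed toy distribution, not from the article):
# classical IT's label codes give each outcome a code whose length equals
# its surprisal, -log2 p(x); this minimizes the long-term AVERAGE code
# length per symbol, which then equals the entropy of the distribution.
p = {"a": 0.5, "b": 0.25, "c": 0.25}

# Surprisal of each individual outcome, in bits:
surprisal = {x: -math.log2(px) for x, px in p.items()}

# Expected (long-term average) code length = sum of p(x) * surprisal(x):
avg_code_length = sum(px * surprisal[x] for x, px in p.items())

print(surprisal)        # {'a': 1.0, 'b': 2.0, 'c': 2.0}
print(avg_code_length)  # 1.5 bits per symbol on average
```

The point of contrast is that the averaging step only makes sense for a known distribution over a set of things; a description length for one individual thing involves no such averaging.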

MacKay's mistake above may already have triggered equivalence claims, but another conclusion, however unintentionally, may have done so more strongly. That is, he also argued that "coherent inference (as embodied by Bayesian probability) automatically embodies Occam's razor" [55] (p. 344). This is easily read as suggesting equivalence (see, e.g., [52,53]), but notice that MacKay reasoned as follows.

"Simple models tend to make precise predictions. Complex models, by their nature, are capable of making a greater variety of predictions [...]. So if *H*2 is a more complex model [than *H*1], it must spread its predictive probability *P*(*D*|*H*2) more thinly over the data space than *H*1. Thus, in the case where the data are compatible with both theories, the simpler *H*1 will turn out more probable than *H*2, without our having to express any subjective dislike for complex models." [55] (p. 344)

In other words, he argued that conditional probabilities, as used in Bayesian modeling, show a bias towards hypotheses with low prior complexity. This is definitely interesting and compelling, and as he noted, it reveals subtle intricacies in Bayesian inference.
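MacKay's "spread more thinly" reasoning can be illustrated with a toy numeric sketch (my own example, with assumed numbers, not MacKay's): a simple model that can generate only a few data sets assigns each a higher likelihood than a complex model that spreads its probability over many, so under uniform priors the posterior favors the simpler model whenever the data are compatible with both.

```python
# Toy sketch of MacKay's argument (assumed numbers, not from the book):
# H1 ("simple") can generate few data sets, H2 ("complex") many,
# each model spreading its predictive probability uniformly.
n_datasets_H1 = 4     # H1 distributes P(D|H1) over 4 possible data sets
n_datasets_H2 = 100   # H2 distributes P(D|H2) over 100 possible data sets

# Likelihood of one observed data set D compatible with both models:
p_D_given_H1 = 1 / n_datasets_H1   # 0.25
p_D_given_H2 = 1 / n_datasets_H2   # 0.01

# Uniform priors P(H1) = P(H2) = 0.5 (MacKay's assumption), Bayes' rule:
evidence = 0.5 * p_D_given_H1 + 0.5 * p_D_given_H2
post_H1 = 0.5 * p_D_given_H1 / evidence
post_H2 = 0.5 * p_D_given_H2 / evidence

print(round(post_H1, 3))  # 0.962: the simpler model wins
```

Note that the outcome hinges on the uniform priors; a sufficiently nonuniform prior favoring H2 would overrule this bias, which is exactly the limitation discussed next.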

Currently relevant, however, is that it does not imply equivalence of simplicity and likelihood. For instance, regarding both priors and conditionals, it is silent about how close (fairly stable) simplicity-based precisals and (fairly flexible) Bayesian probabilities might be. Furthermore, whereas prior precisals are nonuniform by nature, MacKay explicitly assumed uniform prior probabilities (he needed this not-truly-Bayesian assumption because nonuniform prior probabilities could easily overrule the bias he attributed to conditional probabilities). This assumption as such already excludes equivalence. Notice, furthermore, that he gave neither a formal definition of complexity nor a formal proof of his argument. This means that his argument, though certainly compelling, does not reflect a formally proven fact. It thereby has the same status as, for instance, van der Helm's [17] argument that, specifically in visual perceptual organization, simplicity-based conditional precisals are close to intuitively real conditional probabilities, which would imply that precisals are fairly reliable in everyday perception by moving observers. It is true that both arguments reflect interesting rapprochements between simplicity and likelihood, but neither argument asserts equivalence.

### *4.3. Summary (2)*

My objective here was to trace back where Pinna and Conti's misguided equivalence claim came from. This led to Chater [46] and MacKay [55], whose flawed comprehension of the links between classical IT and modern IT seems to have given rise to various misconceptions. It is true that they pointed at interesting things, but they did not provide any evidence of equivalence of simplicity and likelihood. With fundamentally different baits, classical IT and modern IT are fishing in the same pond of probabilities and information measures, using a perhaps mind-boggling body of terms. It is therefore understandable that comparisons between them may be confusing, particularly to those less trained in formal reasoning. Persisting in an equivalence claim after having been informed in detail that such a claim is nonsense, as Pinna and Conti did, is another matter, however, and in my view, scientifically inappropriate.
