**4. Model**

In this section, we propose an information-theoretic model of decision-making with prior beliefs in the presence of Smithian competition and market feedback. Given an agent's prior beliefs and an observed macroeconomic outcome (such as the distribution of returns), the model can infer the least biased decisions that would result in such returns. Importantly, incorporating prior beliefs allows the agent's decision-making to be reasoned about in terms of both those beliefs and their utility-maximisation behaviour.

We develop upon the maximum-entropy model of inference from [5], and the thermodynamic treatment of prior beliefs formalised by [4], as outlined in Section 3.

#### *4.1. Maximum Entropy Component*

The proposed approach can be seen as a generalisation of QRSE, allowing for the incorporation of heterogeneous prior beliefs based on the free-energy principle. The key element is the information acquisition cost, measured as the KL-divergence, which arises from the free-energy principle and has been shown to provide a fundamentally grounded application of Bayesian inference [46]. In order to derive decisions *f*[*a*|*x*] for an action or choice *a* (e.g., buy, hold or sell) given an observed return *x* (e.g., a return on investment), we maximise the expected utility *U* subject to a constraint on the acquisition of information, measured as the maximal divergence *d* between the posterior decisions and the prior beliefs *p*[*a*]. As mentioned, *d* is measured as the KL-divergence, the generalised extension of the original (Shannon) entropy constraint [43] introduced in Equation (2):

$$\max\_{f[a|\mathbf{x}]} \sum\_{a \in A} f[a|\mathbf{x}] \, \mathcal{U}[a, \mathbf{x}]$$

$$\text{subject to } \sum\_{a \in A} f[a|\mathbf{x}] \log \left( \frac{f[a|\mathbf{x}]}{p[a]} \right) \le d \tag{14}$$

$$\sum\_{a \in A} f[a|\mathbf{x}] = 1$$

The Lagrangian for Equation (14) then becomes

$$\mathcal{L} = \sum\_{a \in A} f[a|\mathbf{x}] \, \mathcal{U}[a, \mathbf{x}] - \lambda \left( \sum\_{a \in A} f[a|\mathbf{x}] - 1 \right) - T \left( \sum\_{a \in A} f[a|\mathbf{x}] \log \left( \frac{f[a|\mathbf{x}]}{p[a]} \right) - d \right) \tag{15}$$

There are two distinct modelling views on such a formulation [47–50]. The first assumes that specific constraints are known from the data; for example, a maximal divergence *d* may be specified based on actual observations of agent behaviour. The second view instead considers the Lagrange multiplier *T* to be a free parameter of the model, with the constraint *d* representing an arbitrary maximum value; this approach would thus optimise *T* in finding the best fit. In this work, we take the second perspective, since the underlying decision data is unavailable and a specific restriction on the information acquisition cost should not be enforced. In other words, *T* is considered a free model parameter corresponding to different information acquisition costs, mapping to different (unknown) cognitive and information-processing limits *d*.

Looking at the final term in Equation (15), in the case of homogeneous priors, log *p*[*a*] is a constant that drops out of the solution; the problem then reduces to the optimisation of Equation (3) and thus recovers the original QRSE model. In the general case, the dependence on log *p*[*a*] means that *T* instead serves as the Lagrange multiplier for the cost of information acquisition. Taking the first-order conditions of Equation (15) and solving for *f*[*a*|*x*] (as shown in Appendix A.2) yields

$$f[a|\mathbf{x}] = \frac{1}{Z\_{A|\mathbf{x}}} \, p[a] \, e^{\frac{\mathcal{U}[a,\mathbf{x}]}{T}} \tag{16}$$

We see this as a generalisation of the logit function, one which allows for the separation of the prior beliefs from the agent's utility function.
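Numerically, Equation (16) is a prior-weighted softmax. The following minimal sketch (the action set, utilities and prior values are our own illustrative choices, not taken from the paper's data) computes *f*[*a*|*x*] and shows how a non-uniform prior shifts probability relative to the uniform (standard QRSE) case:

```python
import numpy as np

def choice_probs(utilities, prior, T):
    """Prior-weighted softmax of Eq. (16): f[a|x] proportional to p[a] exp(U[a,x]/T).

    `utilities` and `prior` are arrays over the action set A;
    T > 0 is the information-acquisition cost (temperature).
    """
    logits = np.log(prior) + utilities / T
    logits -= logits.max()            # subtract max for numerical stability
    w = np.exp(logits)
    return w / w.sum()

# With a uniform prior this reduces to the ordinary logit (QRSE) case:
f_uniform = choice_probs(np.array([1.0, -1.0]), np.array([0.5, 0.5]), T=1.0)
# A prior favouring the first action shifts probability towards it:
f_biased = choice_probs(np.array([1.0, -1.0]), np.array([0.8, 0.2]), T=1.0)
```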

In the more general case, *p*[*a*] can be heterogeneous for all *a*. Parameter *T* therefore controls the deviations from the prior (rather than from the base case of uniformity), that is, it controls the cost of information acquisition. Following [4], we observe the following limits

$$\begin{aligned} \lim\_{T \to \infty} f[a|\mathbf{x}] &= p[a] \\ \lim\_{T \to 0^{+}} f[a|\mathbf{x}] &= \mathbb{1}\!\left[ a = \arg\max\_{a' \in A} \mathcal{U}[a', \mathbf{x}] \right] \\ \lim\_{T \to 0^{-}} f[a|\mathbf{x}] &= \mathbb{1}\!\left[ a = \arg\min\_{a' \in A} \mathcal{U}[a', \mathbf{x}] \right] \end{aligned} \tag{17}$$

In the limit *T* → ∞ (i.e., infinite information acquisition costs), the agent simply falls back on their prior beliefs, as it becomes impossible to obtain new information. In the limit *T* → 0, the agent becomes a perfect utility maximiser (i.e., if information is free to obtain, the agent could obtain it all and choose the option that best maximises payoff with probability 1). The *T* < 0 case corresponds to anti-rationality. For economic decision-making, we can limit temperatures to be non-negative, *T* ≥ 0, although there are specific cases where such anti-rationality may be useful (e.g., modelling a pessimistic observer or adversarial environments [4]). The relationship between temperature and utility is visualised in Figure 1.

Crucially, large temperatures (costly acquisition) do not revert to the uniform distribution (as in the typical QRSE case, unless the prior is uniform), instead reverting to prior beliefs. This is visualised in Figure 2, and discussed in more detail in Section 4.3.
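The limiting behaviour of Equation (17) can be checked numerically. The sketch below (the utilities, the prior, and the helper `choice_probs` are our own illustrative constructions) confirms that a very large *T* reverts the decision to the prior, while a very small positive *T* concentrates all mass on the utility-maximising action:

```python
import numpy as np

def choice_probs(utilities, prior, T):
    """f[a|x] of Eq. (16), computed in log-space for stability."""
    logits = np.log(prior) + utilities / T
    logits -= logits.max()
    w = np.exp(logits)
    return w / w.sum()

U = np.array([0.3, -0.3])     # illustrative utilities U[a, x]
p = np.array([0.7, 0.3])      # illustrative non-uniform prior

f_costly = choice_probs(U, p, T=1e4)   # T -> infinity: reverts to the prior
f_free = choice_probs(U, p, T=1e-4)    # T -> 0+: perfect utility maximiser
```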

#### *4.2. Feedback Between Observed Outcomes and Actions*

Following [5], we use a joint distribution to model the interaction between the economic outcome *x*, and the action of agents *a*.

To recover the joint probability, we need to determine *f*[*x*] (since *f*[*a*, *x*] = *f*[*a*|*x*] *f*[*x*]), which we do with the maximum entropy principle, as shown in Section 3.1: we maximise the joint entropy with respect to the marginal probabilities. That is,

$$\begin{split} \mathcal{L} &= -\int\_{\mathcal{X}} f[\mathbf{x}] \log f[\mathbf{x}] \, d\mathbf{x} + \int\_{\mathcal{X}} f[\mathbf{x}] H[A|\mathbf{x}] \, d\mathbf{x} - \lambda \left( \int\_{\mathcal{X}} f[\mathbf{x}] \, d\mathbf{x} - 1 \right) \\ &\quad - \gamma \left( \int\_{\mathcal{X}} f[\mathbf{x}] \, \mathbf{x} \, d\mathbf{x} - \xi \right) - \rho \left( \int\_{\mathcal{X}} f[\mathbf{x}] \, \frac{p[a] e^{\frac{\mathcal{U}[a,\mathbf{x}]}{T}} - p[\bar{a}] e^{\frac{\mathcal{U}[\bar{a},\mathbf{x}]}{T}}}{Z\_{A|\mathbf{x}}} \, \mathbf{x} \, d\mathbf{x} - \delta \right) \end{split} \tag{18}$$

with

$$\begin{split} H[A|\mathbf{x}] &= -\sum\_{a \in A} f[a|\mathbf{x}] \log f[a|\mathbf{x}] \\ &= -\frac{1}{Z\_{A|\mathbf{x}}} \sum\_{a \in A} p[a] e^{\frac{\mathcal{U}[a,\mathbf{x}]}{T}} \left( \log p[a] + \frac{\mathcal{U}[a,\mathbf{x}]}{T} - \log Z\_{A|\mathbf{x}} \right) \end{split} \tag{19}$$

An important point here is that *H*[*A*|*x*] still measures (Shannon) entropy. We have seen above how the new definition of *f*[*a*|*x*] uses the KL-divergence as a generalised extension of entropy when incorporating prior information. In Equation (19), we do not use this divergence for an important reason: in Equation (14) we measure the divergence from known prior beliefs, whereas when optimising Equation (18) we wish to infer decisions in the absence of observed decision data. This is where the principle of maximum entropy comes into play, i.e., we wish to maximise the entropy of the resulting choice distribution (itself derived from the KL-divergence against prior beliefs), but we do not wish to perform cross-entropy minimisation, as we do not have the true decisions *f̄*[*a*|*x*]. With this in mind, we still utilise the principle of maximum entropy, as is done in QRSE, to obtain the least biased resulting decisions. This keeps the proposed extension in the realm of QRSE, though comparisons to the principle of minimum cross-entropy [51,52] could be considered in future work, particularly when some target distributions are known directly.
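As a sanity check, the closed form of Equation (19) can be verified against the direct Shannon entropy of *f*[*a*|*x*]. The prior, utilities and temperature below are illustrative values of our own choosing:

```python
import numpy as np

# Conditional entropy H[A|x] of the decision function, evaluated both
# directly from f[a|x] and via the closed form of Eq. (19).
p = np.array([0.6, 0.4])     # prior beliefs (illustrative)
U = np.array([0.5, -0.5])    # utilities U[a, x] at a fixed x (illustrative)
T = 1.0

w = p * np.exp(U / T)
Z = w.sum()                  # Z_{A|x}
f = w / Z                    # f[a|x], Eq. (16)

H_direct = -np.sum(f * np.log(f))
H_closed = -np.sum(w * (np.log(p) + U / T - np.log(Z))) / Z   # Eq. (19)
```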

**Figure 2.** Decision functions. All cases have equivalent utility functions. Each row has equivalent temperatures, showing how, with matched parameters and utility, an alternate prior can shift the decision-maker's preference. Each column has a different prior, given along the top of the first row, to show how decision-makers' choices change based on their prior beliefs. On the left-hand side, preference is shifted towards buying; likewise, on the right-hand side, preference is given to selling. The uniform case with equal preference is shown in the middle.

In Equation (18), *ξ* is known from the mean of the observed macroeconomic outcome, and so this constraint is used explicitly. This is in contrast to *d* (and *δ*), which are unknown, as outlined in Section 4.1. The important distinction in Equation (18) is that *f*[*a*|*x*] (and hence *H*[*A*|*x*]) now uses the updated expression incorporating the prior beliefs. Taking the partial derivative of *L* with respect to *f*[*x*], and solving for *f*[*x*], gives

$$f[\mathbf{x}] = \frac{1}{Z\_A} \, e^{H[A|\mathbf{x}] - \gamma \mathbf{x} - \rho \mathbf{x} \left( \frac{p[a] e^{\frac{\mathcal{U}[a,\mathbf{x}]}{T}} - p[\bar{a}] e^{\frac{\mathcal{U}[\bar{a},\mathbf{x}]}{T}}}{Z\_{A|\mathbf{x}}} \right)} \tag{20}$$

Equation (20) expresses the information acquisition cost in the form of the Lagrange multiplier *T* (from Equation (15)), and a competition cost in the form of the multiplier *ρ*.

As we have solutions for *f*[*a*|*x*] (Equation (16)) and *f*[*x*] (Equation (20)) in terms of prior beliefs and information acquisition costs, we can then derive all other probability functions using Bayes' rule. That is, we can obtain *f*[*a*, *x*], *f*[*x*|*a*] and *f*[*a*], which in turn incorporate these prior beliefs/acquisition costs:

$$\begin{split} f[a, \mathbf{x}] &= f[a|\mathbf{x}] f[\mathbf{x}] \\ &= \frac{p[a]}{Z\_{A|\mathbf{x}} Z\_A} \, e^{\frac{\mathcal{U}[a,\mathbf{x}]}{T} + H[A|\mathbf{x}] - \gamma \mathbf{x} - \rho \mathbf{x} \left( \frac{p[a] e^{\frac{\mathcal{U}[a,\mathbf{x}]}{T}} - p[\bar{a}] e^{\frac{\mathcal{U}[\bar{a},\mathbf{x}]}{T}}}{Z\_{A|\mathbf{x}}} \right)} \end{split} \tag{21}$$

We can obtain *f* [*a*] by marginalising out *x* from the joint distribution:

$$\begin{split} f[a] &= \int\_{\mathcal{X}} f[a, \mathbf{x}] \, d\mathbf{x} \\ &= \frac{1}{Z\_A} \int\_{\mathcal{X}} \frac{p[a]}{Z\_{A|\mathbf{x}}} \, e^{\frac{\mathcal{U}[a,\mathbf{x}]}{T} + H[A|\mathbf{x}] - \gamma \mathbf{x} - \rho \mathbf{x} \left( \frac{p[a] e^{\frac{\mathcal{U}[a,\mathbf{x}]}{T}} - p[\bar{a}] e^{\frac{\mathcal{U}[\bar{a},\mathbf{x}]}{T}}}{Z\_{A|\mathbf{x}}} \right)} \, d\mathbf{x} \end{split} \tag{22}$$

Finally, *f*[*x*|*a*] can then be computed by a direct application of Bayes' rule: *f*[*x*|*a*] = *f*[*a*, *x*]/ *f*[*a*].

Given only an expected average value *ξ* (and the usual normalisation constraints), we have derived a joint probability distribution that maximises the entropy subject to an information acquisition cost *d*, along with a competition cost *δ*. The resulting distribution's free parameters (the Lagrange multipliers) are those which fit most closely to the true underlying distribution of returns. Thus, we have provided a generalisation of QRSE that is fully compatible with the incorporation of prior beliefs.
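Under the binary linear payoff used later in Section 4.3, the marginal of Equation (20) can be evaluated numerically on a grid. In this sketch, the parameter values (*μ*, *T*, *γ*, *ρ*, priors) are illustrative rather than fitted, and the grid normalisation plays the role of *Z*<sub>*A*</sub>:

```python
import numpy as np

# Numerical sketch of the marginal f[x] of Eq. (20) on a grid, with
# U[x, a] = x - mu and U[x, abar] = -(x - mu). Values are illustrative.
mu, T, gamma, rho = 0.0, 1.0, 0.2, 0.5
p = np.array([0.6, 0.4])                  # priors over {a, abar}

x = np.linspace(-5.0, 5.0, 2001)
dx = x[1] - x[0]
U = np.stack([x - mu, -(x - mu)])         # utilities U[a, x] and U[abar, x]
w = p[:, None] * np.exp(U / T)
Z_ax = w.sum(axis=0)                      # Z_{A|x}
f_ax = w / Z_ax                           # f[a|x], Eq. (16)

H = -np.sum(f_ax * np.log(f_ax), axis=0)  # conditional entropy H[A|x]
diff = f_ax[0] - f_ax[1]                  # (p[a]e^{U/T} - p[abar]e^{U/T}) / Z_{A|x}
log_fx = H - gamma * x - rho * x * diff   # exponent of Eq. (20)
fx = np.exp(log_fx - log_fx.max())
fx /= fx.sum() * dx                       # grid normalisation stands in for Z_A
```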

#### *4.3. Priors and Decisions*

The introduced priors affect the conditional probabilities of agent decisions by shifting focus towards preferred choices, allowing the decision-maker to place more weight on actions deemed important a priori.

In Section 3.2 we showed how to separate the initial energy potential and new energy potential for distinguishing prior beliefs and utility functions. It is instructive to interpret these again as potentials, by setting *α<sub>a</sub>* = *T* log *p*[*a*], which allows us to represent the choice probability as

$$f[a|\mathbf{x}] = \frac{1}{Z\_{A|\mathbf{x}}} \, e^{\frac{\mathcal{U}[\mathbf{x}, a] + \alpha\_a}{T}} \,. \tag{23}$$

Equation (23) shows how *α* shifts the likelihood based on the prior preferences. An example of these shifts is visualised in Figure 2. This can be interpreted as placing more emphasis on actions deemed useful a priori as *T* increases. The information acquisition cost component *T* then controls the sensitivity between the utility and a priori knowledge, with a high *T* meaning higher dependence on prior information, and low *T* indicating a stronger focus on the utility alone.
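The equivalence between the prior-weighted form of Equation (16) and the potential form of Equation (23) is immediate, since exp((*U* + *T* log *p*)/*T*) = *p* exp(*U*/*T*). A quick numerical check with illustrative values:

```python
import numpy as np

# Verify that alpha_a = T log p[a] turns the prior into an additive
# potential: Eq. (23) reproduces the prior-weighted weights of Eq. (16).
T = 0.8
p = np.array([0.25, 0.75])    # illustrative prior
U = np.array([0.4, -0.1])     # illustrative utilities at a fixed x

alpha = T * np.log(p)
w16 = p * np.exp(U / T)           # Eq. (16) weights
w23 = np.exp((U + alpha) / T)     # Eq. (23) weights
f16 = w16 / w16.sum()
f23 = w23 / w23.sum()
```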

The majority of binary QRSE models use a simple linear payoff definition for utility:

$$\mathcal{U}[\mathbf{x}, a] = \mathbf{x} - \mu, \qquad \mathcal{U}[\mathbf{x}, \bar{a}] = -(\mathbf{x} - \mu) \,.$$

With this definition, a tunable shift parameter *μ* serves as the expected fundamental rate of return. The relationship between *μ* and the real market returns *ξ* (used as a constraint in Equation (7)) then serves as a measure of fulfilled (*μ* = *ξ*) or unfulfilled (*μ* ≠ *ξ*) expectations. This implies a symmetric shift parameter *μ*. As a specific example, if *a* = sell and *ā* = buy, *μ* = 0.25 means that at *x* = 0.25, buyers and sellers will be equally likely to participate in the market, i.e., *f*[sell|*μ*] = *f*[buy|*μ*] = 0.5. In this sense, *μ* can be seen as the indifference point. The symmetry arises from the fact that *f*[buy|*x*] + *f*[sell|*x*] = 1. Therefore, in the binary action case, for any priors *p* = [*c*, 1 − *c*] with *c* ∈ (0, 1), it is possible to find a *μ*∗ such that the decision functions under uniform priors *p* = [0.5, 0.5] with *μ*∗ are equivalent to those under the original *μ* with the biased priors. In this sense, *μ* can be seen as encapsulating a prior belief.
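This equivalence can be made concrete. For the binary linear payoff, a biased prior *p* = [*c*, 1 − *c*] is absorbed exactly by the shifted indifference point *μ*∗ = *μ* − (*T*/2) log(*c*/(1 − *c*)); this closed form is our own derivation from Equation (16), and the parameter values below are illustrative:

```python
import numpy as np

def f_sell(x, mu, T, c):
    """f[sell|x] for U[x, sell] = x - mu, U[x, buy] = -(x - mu), prior [c, 1-c]."""
    w_sell = c * np.exp((x - mu) / T)
    w_buy = (1 - c) * np.exp(-(x - mu) / T)
    return w_sell / (w_sell + w_buy)

mu, T, c = 0.25, 1.0, 0.7
x = np.linspace(-2.0, 2.0, 401)

# At x = mu with uniform priors, buying and selling are equally likely.
# A biased prior is absorbed by the shifted indifference point mu_star:
mu_star = mu - 0.5 * T * np.log(c / (1 - c))
lhs = f_sell(x, mu, T, c)            # biased prior, original mu
rhs = f_sell(x, mu_star, T, 0.5)     # uniform prior, shifted mu_star
```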

However, the explicit incorporation of prior beliefs on actions is useful here, as it helps to separate the agents' expectations from their prior beliefs (e.g., a higher *μ* may result from needing to change past behaviour) and to choose the actions for which an agent should emphasise acquiring more information. The introduced prior beliefs are strictly known before any inference is performed, whereas *μ* is the result of the inference process. This separation of prior beliefs and current expectations is important, as *μ* alone cannot capture an agent's predisposition prior to performing any information processing. In addition, this applies more generally to arbitrary utility functions (as QRSE is, of course, not limited to the linear shift utility with *μ* outlined above), or whenever any preference over decisions is known a priori.

Consider also the three-action case, *A* = {buy, hold, sell}, with the same utility functions as above plus the extra utility for holding, *U*[*x*, hold] = 0. We can see that it would be desirable if buying and selling no longer required this symmetry. The use of priors can introduce this asymmetry, by providing separate indifference points for buy/hold and sell/hold. Such asymmetry alters the resulting frequency distribution of transactions, and may help to explain various trading patterns [16]. The difference between symmetric and asymmetric buy and sell curves is shown in Figure 3, which also shows that such functions could be recovered by introducing a secondary shift parameter *μ*<sub>2</sub>. Parameter *μ*<sub>1</sub> (the original *μ*) then becomes the indifference point for buy and hold, and *μ*<sub>2</sub> for sell and hold. This is the method proposed in [30]. Introducing priors into this case again allows for the separation of expectation *μ* from prior belief, following the same methodology as outlined above for the binary case. Furthermore, if we set *p*[hold] = 0, we recover the binary case. This highlights that the standard QRSE with binary actions and uniform priors is a special case of the ternary action case with heterogeneous priors.

**Figure 3.** In the three-action case, the priors can introduce asymmetries by biasing the decision functions. This allows for separate indifference points (**right**) vs. the uniform priors implying a single intersection (**left**).
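A minimal sketch of this asymmetry (the action labels, priors and parameter values are illustrative): with the linear payoffs and zero utility for holding, the pairwise crossings of the exponential weights give the buy/hold and sell/hold indifference points, which coincide at *μ* for uniform priors and separate otherwise:

```python
import numpy as np

# Three-action case A = {buy, hold, sell} with U[x, buy] = -(x - mu),
# U[x, hold] = 0, U[x, sell] = x - mu (cf. Figure 3).
mu, T = 0.0, 1.0

def probs(x, p):
    """f[a|x] over (buy, hold, sell) with prior p = [p_buy, p_hold, p_sell]."""
    w = np.array([p[0] * np.exp(-(x - mu) / T),   # buy
                  p[1],                            # hold (exp(0/T) = 1)
                  p[2] * np.exp((x - mu) / T)])    # sell
    return w / w.sum()

def crossings(p):
    """Analytic x-values where the pairwise weights cross (indifference points)."""
    x_buy_hold = mu - T * np.log(p[1] / p[0])
    x_sell_hold = mu + T * np.log(p[1] / p[2])
    return x_buy_hold, x_sell_hold

uniform = crossings([1/3, 1/3, 1/3])    # both crossings coincide at mu
biased = crossings([0.5, 0.3, 0.2])     # two distinct indifference points
```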

From this, we can see how introducing priors alters the decision functions by allowing agents to focus on suitable a priori candidate actions. We have also shown how, in the binary case, a utility function with a shift parameter can be reformulated to achieve equivalent results using a uniform prior and an altered shift parameter. However, in the multi-action case, the priors allow for asymmetry, and in general, the priors may help with the optimisation process (by providing an alternate initial configuration). This approach also allows for the explicit separation of the two factors affecting an agent's choice, by distinguishing the contributions of prior beliefs and utility maximisation.

#### *4.4. Rolling Prior Beliefs*

The proposed extension is general and allows for the incorporation of any form of prior beliefs, and in this section, we illustrate an example where the priors at time *t* are set as the resulting marginal probabilities from the previous time *t* − 1:

$$p\_t[a] = f\_{t-1}[a]$$

i.e., the prior belief *p<sub>t</sub>*[*a*] is set as the previous marginal probability *f*<sub>*t*−1</sub>[*a*] for taking action *a* (at *t* = 0, we use a uniform prior). Using the previous marginal probability as a prior introduces an "information-switching" cost, where *T* relates to the divergence from the previous actions, resulting in the following decision function:

$$f\_t[a|\mathbf{x}] = \frac{1}{Z\_{A|\mathbf{x}}} f\_{t-1}[a] \, e^{\frac{\mathcal{U}[\mathbf{x},a]}{T}}$$

That is, acquiring information on top of the previous knowledge comes at a cost (controlled by *T*). When the cost of information acquisition is high (large *T*), the agent reverts to the previously learnt knowledge (i.e., the marginal probabilities from *t* − 1). In contrast, when *T* is extremely small, the agent is able to acquire new information, allowing deviation from their prior knowledge at *t* − 1. In the special case of *T* = 0, information is free, and the agent can become a perfect utility maximiser.

Given the expression for *f<sub>t</sub>*[*a*|*x*], we obtain the following solution for *f<sub>t</sub>*[*x*]:

$$f\_t[\mathbf{x}] = \frac{1}{Z\_A} \, e^{H[A|\mathbf{x}] - \gamma \mathbf{x} - \rho \mathbf{x} \left( \frac{f\_{t-1}[a] e^{\frac{\mathcal{U}[a,\mathbf{x}]}{T}} - f\_{t-1}[\bar{a}] e^{\frac{\mathcal{U}[\bar{a},\mathbf{x}]}{T}}}{Z\_{A|\mathbf{x}}} \right)}$$

from which we can derive the joint and other probabilities, as shown in Section 4.2. This is exemplified in Section 5, in which we examine various priors for time-dependent applications.
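The rolling-prior update can be sketched as follows. For simplicity, this sketch approximates the marginal *f*<sub>*t*−1</sub>[*a*] by averaging *f*<sub>*t*−1</sub>[*a*|*x*] over a toy sample of returns, rather than integrating against the full model-implied *f*[*x*]; all parameter values and the return distribution are illustrative:

```python
import numpy as np

# Rolling prior: p_t[a] = f_{t-1}[a], starting from a uniform prior at t = 0.
# The marginal f_t[a] is approximated by a sample average (a simplification
# of the full model's marginalisation over f_t[x]).
rng = np.random.default_rng(0)
T, mu = 1.0, 0.0
prior = np.array([0.5, 0.5])                 # uniform prior at t = 0

for t in range(5):
    x = rng.normal(0.1, 0.5, size=1000)      # toy observed returns for period t
    U = np.stack([x - mu, -(x - mu)])        # U[x, sell], U[x, buy]
    w = prior[:, None] * np.exp(U / T)
    f_ax = w / w.sum(axis=0)                 # f_t[a|x] with the rolling prior
    prior = f_ax.mean(axis=1)                # approximate f_t[a] -> p_{t+1}[a]
```

Because the toy returns have a mean above *μ*, the selling action accumulates prior mass over successive periods, illustrating how past behaviour compounds under the information-switching cost.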

#### **5. Australian Housing Market**

To exemplify the model, we use the Greater Sydney house price dataset provided by SIRCA-CoreLogic and utilised in [53,54]. This dataset is outlined in Appendix B. In [54], an agent-based model is used to explain and forecast house price trends and movement patterns as arising from the individual agents' buy and sell decisions. Furthermore, the ABM implemented boundedly rational agents driven by social influences (e.g., fear of missing out) and partial information about submarkets. While the resulting dynamics produced by the ABM accurately match the actual price trends, the decision-making mechanism and the bounded rationality of the agents were not theoretically grounded. In the following section, we aim to explain how the boundedly rational behaviour of the agents operating in the housing market can be aligned with the model proposed in this study, based on the prior beliefs of agents and Smithian competition within the market. In this example, Smithian competition can be seen as agent decisions (buying or selling) affecting returns for an area, with agents' decisions in turn being made based on returns for particular areas, i.e., a feedback loop is assumed in the market.

In particular, we want to explore what role an agent's prior beliefs play in their resulting decisions. For example, given equivalent configurations (e.g., utility and returns) and different prior knowledge, how would the agent's behaviour differ? Furthermore, we would like to explore the rationality of the agents, measured in terms of the cost of information acquisition, in order to see how the agents behave. For example, are agents predominantly reliant on past knowledge in times of market growth, resulting in unexpected downturns from mismanaged agent expectations? Alternatively, in deciding if it is a good time to buy or sell, the agents may balance their past knowledge with utility and current returns (i.e., the past knowledge would not be a predominant factor). The proposed model is particularly suited for answering such questions due to the low number

of free (and microeconomically interpretable) parameters, as well as the explicit separation of prior beliefs (as opposed to previous QRSE approaches). Our goal is not to infer the "best" prior, but rather to explore and compare the dynamics resulting from various priors. In addition, we aim to verify the conjecture that during crises, and periods exhibiting non-linear market dynamics, macroeconomic conditions may become more heterogeneous, and thus, non-uniform priors may outperform uniform ones in such times.
