*5.2. Results*

We fit the distributions with the various priors outlined in Section 5.1.1 to the actual underlying return data, to estimate how well we are able to capture this distribution and to explore the effects that these priors have on the resulting distribution. The results are presented in Table 1, which summarises the likelihood and the percentage of explained variability (measured as Information Distinguishability (I.D.) [61]) compared to the underlying distribution. In general, we see no large differences between the priors in terms of explained variability. However, the goal here is not to argue for the "best" prior in terms of explained variability, but rather to explore differences in agent behaviour based on prior knowledge (using the housing dataset as an example). Thus, the resulting fitted distributions *f* [*x*], visualised in Figure A5, are more interesting. We observe that altering prior beliefs results in different fitted distributions, and discuss how incorporating prior beliefs allows the agents' utility maximisation behaviour to be separated from their previous knowledge. Figure A5 also shows how the priors can alter the optimisation process: For example, a good (bad) prior may help (harm) the optimisation by providing alternate initial configurations. The extreme priors can be harmful, for example, in 2012, where the resulting distributions are unable to capture the true underlying distribution. The reason is the inability to find a suitable *T* that enables appropriate divergence from the extreme prior beliefs. In contrast, well-selected priors can help the optimisation process and result in better-fitting distributions, such as in 2016, where the decisions resulting from the mean and previous priors fit the true data significantly better than the uniform prior.

The agents' decision functions *f* [*a*|*x*] are visualised in Figure A7, which makes clear how each prior adjusts the resulting probability of taking an action (and thus alters the decisions). From this, we can see different probabilistic behaviours, despite equivalent utility functions and optimisation processes, due to varying prior beliefs. For example, with the extreme priors, we observe a clear shift towards the strongly preferred action.
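To make the role of the prior concrete, the decision function derived in Appendix A.2 (Equation (A5)) can be evaluated directly. Below is a minimal sketch, assuming the standard binary QRSE payoffs *u*[buy, *x*] = *x* − *μ* and *u*[sell, *x*] = *μ* − *x*; the function and parameter names are illustrative rather than taken from our actual implementation.

```python
import numpy as np

def decision_function(x, prior, mu=0.0, T=1.0):
    """Prior-weighted logit f[a|x] proportional to p[a] exp(u[a, x] / T), cf. Equation (A5).

    x     : array of outcomes (e.g., quarterly returns)
    prior : dict with prior probabilities p['buy'] and p['sell']
    """
    u = {'buy': x - mu, 'sell': mu - x}           # assumed binary QRSE payoffs
    w = {a: prior[a] * np.exp(u[a] / T) for a in u}
    Z = w['buy'] + w['sell']                      # partition function Z_{A|x}
    return {a: w[a] / Z for a in w}

x = np.linspace(-0.05, 0.05, 201)                 # a +-5% grid of quarterly returns
uniform = decision_function(x, {'buy': 0.5, 'sell': 0.5})
extreme_sell = decision_function(x, {'buy': 0.1, 'sell': 0.9})
# The extreme sell prior lifts f[sell|x] at every x, reproducing the shift
# towards the strongly preferred action seen in Figure A7.
print(uniform['sell'][100], extreme_sell['sell'][100])  # at x = 0: 0.5 vs 0.9
```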

Figure A6 shows the resulting joint distributions *f* [*a*, *x*], combining the results of Figures A5 and A7, since *f* [*a*, *x*] = *f* [*a*|*x*] *f* [*x*]. Looking at the second row of each plot in Figure A6, we can see a visual representation of how the joint probabilities adjust over time when using the previous year as the prior belief.


**Table 1.** Resulting likelihood and percentage of variability explained for each year, when compared to the actual underlying distribution (i.e., those given in Figure A2). Optimisation is done by minimising the negative log-likelihood between the resulting distributions and the actual distribution of returns.

The resulting marginal action probabilities are visualised in Figure 4, where we observe clear market peaks and dips which match the actual returns of Figure 5, aligning with the general trends observed in Figure A1. The priors act to either increase or decrease the resulting marginal probabilities. For example, in the extreme sell case we see much higher resulting probabilities for *f* [sell]; likewise, in the extreme buy case, we see much higher probabilities for *f* [buy]. The general peaks/dips remain in both cases. Overall, this shows how the prior belief can influence the resulting marginal probabilities.
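The marginal action probabilities of Figure 4 follow from integrating the decision function against the fitted outcome distribution, *f* [*a*] = ∫ *f* [*a*|*x*] *f* [*x*] *dx*. A short continuation of the sketch above, using a Gaussian stand-in for the fitted *f* [*x*] (an assumption for illustration only):

```python
from scipy.integrate import trapezoid
from scipy.stats import norm

f_x = norm.pdf(x, loc=0.01, scale=0.02)   # illustrative stand-in for the fitted f[x]
f_x /= trapezoid(f_x, x)                  # renormalise on the finite grid
f_a = {a: trapezoid(uniform[a] * f_x, x) for a in uniform}
print(f_a)  # f[buy] > f[sell] when the return distribution leans positive
```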

**Figure 4.** Resulting marginal probabilities *f* [*a*] for varying priors. Green represents *f* [buy], and red represents *f* [sell].

**Figure 5.** Real Average Returns.

Using the previous year's marginal probability as a prior for the current year has a smoothing effect on the resulting year-to-year marginal probabilities. Comparing the previous prior with the uniform prior in Figure 4, we observe, particularly during 2015–2018, a more defined/well-behaved step-off in *f* [sell]. This indicates the slowing of returns during these years. At the same time, the uniform priors are more affected by local noise, potentially overfitting to only the current time period, since no consideration can be given to the past behaviour of the market. This results in larger fluctuations in the agent behaviour, as the agents have no concept of market history.

#### *5.3. Role of Parameters*

One of the benefits of QRSE is the low number of free parameters, which results in a relatively interpretable model. There are four free parameters in the typical QRSE distribution: *T*, *μ*, *ρ* and *γ*, each with a corresponding microeconomic foundation. In this section, we discuss the two main parameters of interest in this work, the decision temperature *T* and the agent expectations *μ*, and the effect that prior beliefs have on the resulting values (and interpretation) of these parameters. We also include a discussion of the impact of decisions on resulting outcomes (*ρ*) and the skewness of the resulting distributions (*γ*) in Appendix D, since *ρ* and *γ* were less affected by the introduced extensions. There is an additional parameter *ξ* (shown in Figure 5), which is not a free parameter: It represents the mean of the actual returns and serves as a constraint on the mean outcome in Equation (7).

#### 5.3.1. Decision Temperature

The decision temperature *T* controls the level of rationality and the deviation from an agent's prior beliefs. An extremely high temperature corresponds to a high information acquisition cost and results in choosing actions simply based on the prior belief. In contrast, an extremely low temperature corresponds to utility maximisation, and in the case of free information (*T* = 0) a perfect utility maximiser is recovered (i.e., homo economicus). In the housing example used here, *T* relates to the ability of an agent to learn all the required knowledge of the market, i.e., the actual profit rates for the various areas. With *T* = 0, the agent has perfect knowledge of the current market profitability. *T* > 0 represents some friction in acquiring such information, e.g., it can be difficult to gather all the information required to make an informed choice due to, for example, search costs. From a psychological perspective, *T* can be a measure of the "just-noticeable difference" [62], meaning, microeconomically, that *T* relates to the ability of an agent to observe quantitative differences in the resulting choices. A high *T* means the agent is unable to distinguish choices based on *U*, due to high information-processing costs, and instead acts according to their previously learnt knowledge.
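Both limits can be checked numerically with the decision function sketched in Section 5.2: As *T* → 0 the choice collapses onto the utility-maximising action regardless of the prior, while for large *T* the conditional probabilities revert to the prior. Under the same illustrative assumptions:

```python
p = {'buy': 0.3, 'sell': 0.7}                     # a non-uniform prior belief
for T in [1e-4, 1.0, 1e4]:
    f = decision_function(np.array([0.02]), p, mu=0.0, T=T)
    print(T, f['buy'][0])
# T = 1e-4 -> f[buy] ~ 1.0  (perfect utility maximiser: x > mu, so buy)
# T = 1e4  -> f[buy] ~ 0.3  (acts on the prior belief alone)
```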

Since *T* is related to the prior, we see differences in the resulting values, visualised in Figure 6. Looking at the general trends of *T*, we observe that it peaks in years with high average growth (large *ξ*), such as 2015, as these years correspond to a growing market in which agents require less attention to market conditions, although this depends on the prior used.

Looking at the previous marginal probability as the prior (the orange profile), we observe increasing decision temperatures in the build-up phase to 2015, corresponding to agents acting on these previous beliefs. As these beliefs were also positive (i.e., agents expected favourable returns), the large returns can be explained by the agents continuously expecting this growth. This pattern changed in 2016, when the market "reverses": Now the agents must focus instead on their current utility, since their prior beliefs no longer reflect the current market state. Such market reversals are characterised by low decision temperatures, since using the previous action probabilities now becomes misinformative (in contrast to the "building"/trend-following stages). This indicates an increased focus on agent rationality in times of market reversals. The incorporation of prior beliefs (particularly the previous priors) is useful as it allows the discussion to be extended in the temporal sense (as is done here). In other words, we can consider the "building" of agents' beliefs as a possible underlying cause of market collapses, and relate the rationality of agents to the relative state of the market.

#### 5.3.2. Agent Expectations

In microeconomic terms, the parameter *μ* captures the agent's expectations. A large *μ* corresponds to an optimistic agent, expecting high returns from the market. In contrast, a low *μ* corresponds to a pessimistic agent, expecting poor returns from the market. As this shifts the decision functions, there is a relation between the prior and the parameter *μ*, since the prior also shifts preferences towards a priori preferred actions, as shown in Section 4.3. There is also a relationship between *μ* and *γ* (outlined in Appendix D.2), since *γ* can help to account for unfulfilled agent expectations by adjusting the skew of the resulting distributions.

Generally, the agents' beliefs are within the ±2.5% range (expecting between a 2.5% quarterly growth and a 2.5% dip), which corresponds to the bulk of the area under the curve in Figure A2. This means that the agents' expectations develop in accordance with actual market conditions, as can be seen in Figure 7.

**Figure 7.** Agent Expectations vs. Actual Returns (in black).

The extreme priors result in larger absolute values of *μ*, since larger shifts are needed to offset the (potentially) poor prior beliefs. This can be seen particularly in 2014, where the extreme sell prior has *μ* = 10%.

The values of the previous prior *μ* tend to have a larger magnitude than those of the uniform prior since, as mentioned, these priors can capture the build-up of beliefs (and as such, some "trend-following" can be captured). For example, the year 2008 saw the lowest average returns *ξ*, as shown in Figure 5. Using the previous prior, the agents' expectations correctly match the sign of the actual returns in 2008 (i.e., agents correctly expected a decline in house prices). This results in more pessimistic agents than those using the uniform prior, since they can reflect on the market performance from 2007. Likewise, during 2013–2015, the values of the previous prior *μ* become larger than those for the uniform prior, since they build on the previous years' expectations, which were all positive. In contrast, the period 2015–2017 saw a steady decline in agents' expectations of returns under the previous priors, reflecting the overall market state, which appeared to be in a downward trend. The previous priors were able to capture this trend. Using the uniform priors, the year 2016 had a higher *μ* than the market peak of 2015. The reason is that uniform priors are unable to capture the fact that the previous timestep had higher (or lower) returns than the current timestep. In this case, the discussion cannot be extended in the temporal sense of "building" on beliefs, and agents may miss such crucial temporal information without the incorporation of prior beliefs. This is evidenced by the significantly lower performance of the uniform prior in 2016 in comparison to the previous prior, as shown in Table 1, highlighting the usefulness of non-uniform (and temporal-based) priors in times of market crises and reversals.

#### *5.4. Temporal Effects of Data Granularity on Decisions*

In Section 5.2, we analysed agent decisions over the previous 15 years, with decisions grouped annually. This level of granularity was chosen to examine differences in agent behaviour from year to year. However, other levels of grouping can also be explored to give insight into the impact of noise on the inference process. For example, an extremely fine grouping will likely introduce additional noise into the decision-making process, which may or may not be mitigated by the incorporation of prior beliefs. Likewise, a coarse grouping can be seen as "pre-smoothed", which may work in a similar fashion to the incorporation of prior temporal-based beliefs at a higher granularity, which we have seen can smooth the resulting decisions. In this section, we examine the usefulness of prior beliefs in such situations, providing comparisons with alternate data representations.

Two additional levels of granularity are considered, one more granular and one less granular than the annual groupings introduced in Section 5.2. We look at quarterly data, as well as aggregate groupings based on market state. In doing so, we have three levels for categorising agent behaviour: Quarterly, annual, and aggregated market state. This allows us to compare the resulting agent decisions across different temporal scales, comparing the differences generated by the incorporation of prior beliefs and by various data-level modifications.

The aggregate market state data groups years into "terms", which correspond to various "stages" of the market. These are growth and crash phases, highlighted as "Pre Crash" (Mid 2006–2007), "Crash" (2008), "Recovery 1" (2009–Mid 2011), "Small Crash" (Mid 2011–Mid 2012), "Recovery 2" (Mid 2012–Mid 2018) and "Recent Crash" (Mid 2018–2020). The overall market trends, showing the market returns for each corresponding "term", are visualised in Figure A1.

The resulting decision likelihoods *f* [*a*] are presented in Figure 8. In analysing the differences in the resulting marginal probabilities between the various granularities, we can observe the impact of data-level modifications, i.e., performing inference on a larger time scale for macroeconomic observations, and how the incorporation of prior information affects such results. In Section 5.2, we mentioned that the previous and mean priors can have a smoothing effect on the resulting decisions; in this sense, the lowest granularity grouping (the market-state-based grouping) can also be seen as a smoothed version of the macroeconomic outcomes, i.e., pre-smoothing the data by considering a much larger interval composed of several years.

We see that the incorporation of prior information helps preserve some important information in such settings. Looking at the left-most column of Figure 8 (the uniform priors), we can see that the overall "shape" of the peaks and dips in preferences *f* [*a*] is lost with aggregate groupings. For example, in the quarterly breakdown, there is a clear preference for selling in the later part of the range 2014–2017, corresponding to the highest-growing market, which is labelled "Recovery 2" in the aggregated version. When considering "Recovery 2" with uniform priors, such a clear preference is lost, and the "Pre Crash" and "Recovery 1" terms have a higher corresponding preference. This is because the agents cannot separate past market information from the current market state and act purely on the current utility. In contrast, with both the mean and the previous prior, such overall trends are preserved across the various granularities, since agents can distinguish favourable environments when compared with previous market states (as captured by their prior beliefs). This additional temporal insight provides an important consideration and shows that, even with various data-level smoothing or preprocessing (i.e., considering alternate data groupings), the prior information remains useful and highlights the various market states and corresponding agent preferences.

**Figure 8.** *f* [*a*] for varying granularities.

A key takeaway from this exploration is that the temporal analysis introduced by the prior beliefs provides additional insights into decision-making. These insights cannot be generated by simple data-level modifications. Furthermore, the decision temperature *T* provides a way to modulate market state changes when considering agent decision-making.

#### **6. Discussion and Conclusions**

Despite many well-founded doubts about perfect rationality in decision-making, agents are often still modelled as perfect utility maximisers. In this paper, we proposed an approach for the inference of agent choice based on prior beliefs and market feedback, in which agents may deviate from the assumption of perfect rationality.

The main contribution of this work is a theoretically grounded method for the incorporation of an agent's prior knowledge in the inference of agent decisions. This is achieved by extending a maximum entropy model of statistical equilibrium (specifically, Quantal Response Statistical Equilibrium, QRSE), and introducing bounds on the agent's processing abilities, measured as the KL-divergence from their prior beliefs. The proposed model can be seen as a generalisation of QRSE in which prior preferences across an action set do not have to be uniform; when uniform prior preferences are assumed, the typical QRSE model is recovered. The result is an approach that can successfully infer least-biased agent choices, and produce a distribution of outcomes matching the actual observed macroeconomic outcomes when individual choice-level data are unobserved.

In the proposed approach, agent rationality can vary from acting purely on prior beliefs to perfect utility maximisation behaviour, by altering the decision temperature. Low decision temperatures correspond to rational actors, while high decision temperatures represent a high cost of information acquisition and, thus, a reversion to prior beliefs. We showed how varying an agent's prior belief altered the resulting decisions and behaviour of agents, even those with equivalent utility functions. Importantly, the incorporation of prior beliefs into the decision-making framework allowed the separation of two key elements: The agent's utility maximisation, and the contribution of the agent's past beliefs. This separation allowed for a discussion of the decision-making process in a temporal sense, being able to refer to previous decisions. This allows for investigation into the building of beliefs over time, elucidating the resulting microeconomic foundations in terms of the underlying parameters.

It is worth pointing out some parallels with, and differences from, the frameworks of embodied intelligence and information-driven (guided) self-organisation, in which embodiment is seen as a fundamental principle for the organisation of biological and cognitive systems [63–66]. Similar to these approaches, we consider information-processing as a dynamic phenomenon and treat information as a quantity that flows between the agent and its environment. As a result, an adaptive decision-making behaviour emerges from these interactions under some constraints. Maximisation of potential information flows is often proposed as a universal utility for such emergent agent behaviour, guiding and shaping relevant decisions and actions within the perception-action loops [67–70]. Importantly, these studies incorporate a trade-off between minimising generic and task-independent information-processing costs and maximising expected utility, following the tradition of the information bottleneck [71].

In our approach, we instead consider specific information acquisition costs incurred when the agents need to update their relevant beliefs in the presence of (Smithian) competition and market feedback. The adopted thermodynamic treatment of decision-making allows us to interpret relevant economic parameters in physical terms, e.g., agent's decision temperature *T*, the strength of negative feedback *ρ*, and skewness of the resulting energy distribution *γ*. Interestingly, the decision temperature appears in our formalism as the Lagrange multiplier of the information cost incurred when switching posterior and prior beliefs (KL-divergence). The KL-divergence can be interpreted as the expected excess code-length that is needed if a non-optimal code that was optimal for the prior (outdated) belief is used instead of an optimal code based on the posterior (correct) belief. Thus, the decision temperature modulates the inference problem of determining the true distribution given new evidence, in a forward time direction [72]. Moreover, the thermodynamic time arrow (asymmetry) is maintained only when decision temperatures are non-zero.
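The coding interpretation above can be made concrete: If actions are encoded with code lengths −log₂ *p*[*a*] that are optimal for the prior, the expected length under the posterior exceeds the posterior's entropy by exactly the KL-divergence. A small numerical sketch (the two distributions are illustrative):

```python
import numpy as np

f = np.array([0.8, 0.2])                   # posterior (correct) belief
p = np.array([0.5, 0.5])                   # prior (outdated) belief
code_len_prior = -np.log2(p)               # code lengths optimal for the prior
expected_len = np.sum(f * code_len_prior)  # expected length using the stale code
entropy = -np.sum(f * np.log2(f))          # optimal expected length for f
kl = np.sum(f * np.log2(f / p))
print(expected_len - entropy, kl)          # both equal D_KL(f || p) in bits
```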

We demonstrated the applicability of the method using actual Australian housing data, showing how the incorporation of prior knowledge can result in agents building on past beliefs. In particular, the agents' focus can be shown to shift from utility maximisation to acting on previous knowledge. In other words, during periods when the market had been performing well, the agents were shown to become overly optimistic based on past performance.

The generality of the proposed approach makes it useful for incorporating any form of prior information on the agent's choice set. Moreover, we have shown that the default QRSE is a special case of the proposed extension with uniform (i.e., uninformative) priors. Therefore, the proposed approach can be seen as an extension of QRSE that accounts for prior agent beliefs based on information acquisition costs. As the QRSE framework continues to be expanded, the generalised model proposed here could become an important approach, particularly whenever prior knowledge of agent decisions is available, as well as in multi-action cases where the IIA property of the general logit function is undesirable. Other relevant applications include scenarios with multiple time periods, allowing for a detailed temporal analysis and exploration of the cost of switching between equilibria (measured as an information acquisition cost from prior beliefs).

**Author Contributions:** B.P.E. and M.P.; Funding acquisition, M.P.; Software, B.P.E.; Supervision, M.P.; Writing—original draft, B.P.E. and M.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was funded by the Australian Research Council Discovery Project DP170102927.

**Data Availability Statement:** The real-estate pricing data used in this work were made available under license for this study by SIRCA-CoreLogic (https://www.corelogic.com.au/industries/residential-real-estate).

**Acknowledgments:** The authors would like to thank Kirill Glavatskiy and Michael S. Harré for many helpful discussions regarding the Australian housing market, as well as Adrián Carro, Jangho Yang and anonymous reviewers for various comments. The authors would also like to acknowledge the Securities Industry Research Centre of Asia-Pacific (SIRCA) and CoreLogic, Inc. (Sydney, Australia) for their data on Greater Sydney housing prices.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

#### **Appendix A. Derivations**

#### *Appendix A.1. Decision Duality*

There are two main perspectives: The first is that of the agent performing actions within the system, and the second is that of the system observer [29].

Each of the two perspectives allows us to capture the uncertainty faced by either the actor or the observer, by imposing a constraint on entropy. In this section, we outline the duality that arises from these perspectives, showing that a duality exists between maximum entropy models and entropy-constrained models [7]. Additional discussion on these perspectives is given in [21].

Modelling the actor corresponds to maximising the expected utility subject to a fixed entropy constraint. This is the method outlined in Section 3.1.1. In this case, the agent can be seen as a boundedly rational decision-maker, in that they might not have all of the information required to make a perfectly rational choice.

The alternate perspective, modelling an observer, corresponds to maximising the entropy of the decisions subject to a fixed expected utility. With this perspective, we capture the modelling uncertainty of the observer. The observer's problem is formulated as follows

$$\begin{aligned} \max \quad & -\sum\_{a \in A} f[a|\mathbf{x}] \log f[a|\mathbf{x}] \\ \text{subject to} \quad & \sum\_{a \in A} f[a|\mathbf{x}] = 1 \\ & \sum\_{a \in A} f[a|\mathbf{x}] \, U[a, \mathbf{x}] \ge U\_{\min} \end{aligned} \tag{A1}$$

where $U\_{\min}$ represents the minimum expected utility. To see the duality of Equations (A1) and (1), we formulate the following Lagrangian, converting Equation (A1) into an unconstrained optimisation problem.

$$\mathcal{L} = -\sum\_{a \in A} f[a|\mathbf{x}] \log f[a|\mathbf{x}] - \lambda \left(\sum\_{a \in A} f[a|\mathbf{x}] - 1\right) + \beta \left(\sum\_{a \in A} f[a|\mathbf{x}] \, U[a, \mathbf{x}] - U\_{\min}\right) \tag{A2}$$

where again, taking the first-order conditions and solving for *f* [*a*|*x*] yields

$$f[a|\mathbf{x}] = \frac{1}{Z} e^{\beta \, U[a, \mathbf{x}]} \tag{A3}$$

We can see that Equation (A3) is equivalent to Equation (4) with *β* = 1/*T*, which highlights an important dualism between the two perspectives.
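The duality can also be checked numerically: Maximising entropy subject to the expected utility being held at the value attained by the softmax of Equation (A3) recovers that same softmax. A small sketch with illustrative utilities (not part of the original derivation):

```python
import numpy as np
from scipy.optimize import minimize

U = np.array([1.0, 0.3, -0.5])                       # illustrative utilities
beta = 2.0
softmax = np.exp(beta * U) / np.exp(beta * U).sum()  # Equation (A3)

# Observer's problem (A1): maximise entropy subject to the expected utility
# being fixed at the softmax's value; the optimum recovers the same distribution.
U_min = softmax @ U
res = minimize(
    lambda f: np.sum(f * np.log(f)),                 # minimise -H (negative entropy)
    x0=np.ones(3) / 3,
    bounds=[(1e-9, 1)] * 3,
    constraints=[{'type': 'eq', 'fun': lambda f: f.sum() - 1},
                 {'type': 'eq', 'fun': lambda f: f @ U - U_min}],
)
print(res.x, softmax)                                # numerically identical
```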

#### *Appendix A.2. Decision Function*

By setting the partial derivative of the unconstrained optimisation problem given in Equation (15) with respect to *f* [*a*|*x*] to 0, we can obtain the following definition for *f* [*a*|*x*]:

$$\begin{aligned} \frac{\partial \mathcal{L}}{\partial f[a|\mathbf{x}]} &= U[a, \mathbf{x}] - \lambda - T \log \left( \frac{f[a|\mathbf{x}]}{p[a]} \right) = 0 \\ f[a|\mathbf{x}] &= e^{\frac{U[a, \mathbf{x}] - \lambda}{T} + \log p[a]} \end{aligned} \tag{A4}$$

and, using the normalisation constraint $\sum\_{a \in A} f[a|\mathbf{x}] = 1$, we obtain the following decision function

$$\begin{aligned} f[a|\mathbf{x}] &= \frac{1}{Z\_{A|\mathbf{x}}} e^{\frac{U[a, \mathbf{x}]}{T} + \log p[a]} \\ &= \frac{1}{Z\_{A|\mathbf{x}}} \, p[a] \, e^{\frac{U[a, \mathbf{x}]}{T}} \end{aligned} \tag{A5}$$

with the partition function $Z\_{A|\mathbf{x}} = \sum\_{a \in A} p[a] \, e^{\frac{U[a, \mathbf{x}]}{T}}$.

#### **Appendix B. Australian Housing Market Data**

Data from 2006–2020 are used and split into individual years. We use the rolling median price for each area and then measure the quarterly percentage growth rate for each area. The month-to-month percentage changes are visualised in Figure A1. The distributions of the returns are visualised in Figure A2.
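A minimal sketch of this preprocessing, assuming a long-format table of median sale prices per area; the file and column names are hypothetical (not those of the licensed CoreLogic extract), and a simple quarterly resample stands in for the rolling median:

```python
import pandas as pd

# prices: DataFrame with columns ['date', 'area', 'median_price'] (hypothetical layout)
prices = pd.read_csv('sydney_prices.csv', parse_dates=['date'])

quarterly = (prices
             .set_index('date')
             .groupby('area')['median_price']
             .resample('QE')              # quarter-end; use 'Q' on older pandas
             .median())
returns = quarterly.groupby('area').pct_change()  # quarterly percentage growth
# Annual groupings as used in Section 5.2:
annual_groups = returns.groupby(returns.index.get_level_values('date').year)
```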

**Figure A1.** Quarterly returns in the Sydney housing market.

**Figure A2.** Density plots of returns grouped by year. Each year's distribution follows a different shape but shows some striking regularities, representing a statistical equilibrium.

#### **Appendix C. Relation to Rational Inattention**

In his seminal work, [2] outlined rational inattention "based on the idea that individual people have limited capacity for processing information". This work introduced information-processing constraints into the macroeconomic literature, using mutual information as a measure of such information costs.

Of particular interest are the developments of [3], who showed how to apply rational inattention (RI) to discrete decision-making. The key contribution was a modification to the logit function that arises from considering a cost to decision-makers of deviating from prior knowledge. In this section, we highlight the similarities of RI to the thermodynamic approach of [4] and to the work proposed here.

The problem to be solved is formulated as follows. A utility-maximising agent must make a discrete choice, while it is costly to acquire information about the available options *A*:

$$\begin{aligned} \max\_{f[a, \mathbf{x}]} \quad & \sum\_{a \in A} \int\_{\mathcal{X}} f[a, \mathbf{x}] \, U[a, \mathbf{x}] \, d\mathbf{x} - T \sum\_{a \in A} \int\_{\mathcal{X}} f[a, \mathbf{x}] \log \left( \frac{f[a, \mathbf{x}]}{p[\mathbf{x}] \, f[a]} \right) d\mathbf{x} \\ \text{subject to} \quad & \sum\_{a \in A} f[a|\mathbf{x}] = 1 \end{aligned} \tag{A6}$$

where the first term is the expected utility, and the second is a cost of information (following Sims [2], the mutual information). This is a similar setup to that of [4], which also corresponds to maximising the expected utility subject to an information cost; however, the information cost in [4] is instead measured as the KL-divergence. A key difference between the two is that Equation (A6) adds a dependence on *f* [*a*] in the denominator of the information cost term. We can take the first-order conditions of the resulting Lagrangian for (A6) and solve for *f* [*a*|*x*], yielding:

$$f[a|\mathbf{x}] = \frac{e^{\frac{U[a, \mathbf{x}]}{T} + \log f[a]}}{\sum\_{a' \in A} e^{\frac{U[a', \mathbf{x}]}{T} + \log f[a']}} = \frac{f[a] \, e^{\frac{U[a, \mathbf{x}]}{T}}}{\sum\_{a' \in A} f[a'] \, e^{\frac{U[a', \mathbf{x}]}{T}}} \tag{A7}$$

which is not yet fully solved, as there is a dependence on the unconditional probability *f* [*a*]. Since $f[a] = \int\_{\mathcal{X}} f[a|\mathbf{x}] \, p[\mathbf{x}] \, d\mathbf{x}$, *f* [*a*] depends on *f* [*a*|*x*], and *f* [*a*|*x*] depends on *f* [*a*]; this must (generally) be solved numerically, for example, with the Blahut–Arimoto algorithm, by first making a guess for *f* [*a*] and then iterating from there (see Caplin et al. [73] or Matějka and McKay [3] for solutions). It is for this reason that we utilise the configuration of [4] for the decision-making component, which depends only on the prior probabilities and not on the unconditional action probabilities *f* [*a*], meaning an analytical solution can be obtained. However, the RI framework can be seen as equivalent to choosing an "optimal" prior in the free energy framework of [4], as both can be seen as applications of rate-distortion theory [56].
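A compact sketch of this fixed-point iteration, assuming a discretised outcome grid with probabilities *p*[*x*] and an illustrative buy/sell utility matrix (the setup mirrors Equation (A7) but is not taken from the cited implementations):

```python
import numpy as np

def blahut_arimoto(U, p_x, T=1.0, iters=200):
    """Iterate f[a] <- sum_x p[x] f[a|x], with f[a|x] given by Equation (A7).

    U   : (n_x, n_a) utility matrix U[x, a]
    p_x : (n_x,) probabilities of the discretised outcomes
    """
    f_a = np.full(U.shape[1], 1.0 / U.shape[1])   # initial guess: uniform f[a]
    for _ in range(iters):
        w = f_a * np.exp(U / T)                   # unnormalised f[a|x]
        f_ax = w / w.sum(axis=1, keepdims=True)   # Equation (A7), row-wise
        f_a = p_x @ f_ax                          # update unconditional f[a]
    return f_a, f_ax

x = np.linspace(-0.05, 0.05, 101)
p_x = np.exp(-0.5 * (x / 0.02) ** 2); p_x /= p_x.sum()  # illustrative p[x]
U = np.column_stack([x, -x])                      # assumed buy/sell payoffs
f_a, f_ax = blahut_arimoto(U, p_x, T=0.01)
```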

Further discussion on the relationship between RI and QRSE is given in [30].

#### **Appendix D. Additional Parameters**

While *μ* and *T* are the main parameters of interest in this work, since they contribute directly to the modified decision function introduced here, *ρ* and *γ* remain important, although to a lesser extent, as they are only indirectly affected. *ρ* is the Lagrange multiplier for the competition constraint, and *γ* controls the skewness of the resulting distribution.

#### *Appendix D.1. Impact of Decisions on Outcomes*

Parameter *ρ* measures the impact of individual decisions on housing prices. A large *ρ* corresponds to a highly effective market (a high impact of actions on the response). In contrast, a low *ρ* corresponds to a weaker market response and, thus, lower market effectiveness. Parameter *ρ*, therefore, corresponds to the strength of the negative feedback mechanism, with the case of *ρ* = 0 implying no market feedback (i.e., no impact of actions on the outcome). In all cases, we see relatively large *ρ*'s, peaking in 2013 and 2019, indicating the presence of a well-functioning feedback loop across the years. We see little variation between the uniform, previous, and mean priors in Figure A3, perhaps because the priors act as linear weightings on the difference between the conditional action probabilities, as shown in Equation (20).

**Figure A3.** Competition.

#### *Appendix D.2. Skewness*

The parameter *γ* affects the skew of the resulting exponential distribution. This skew arises from (potentially) unfulfilled agent expectations, i.e., where *μ* ≠ *ξ* [21]. Parameter *γ*, therefore, is a measure of skewness in the binary action case. In the asymmetric multi-action QRSE case, *γ* is replaced by alternate *μ*'s explaining such skew. As mentioned, the priors can also introduce such a skew (without the need for a *γ*). This is shown by the extreme buy *γ* in Figure A4, which is almost always near zero, as the buying preference already creates the skew needed to describe the underlying distribution (i.e., the skewness is already explained by *p*). In contrast, the extreme sell prior needs small *γ*'s to offset its (incorrect) skew.

**Figure A4.** Skewness.

Negative *γ* corresponds to positive skewness, and positive *γ* corresponds to negative skewness. In most cases here, we see (at least slightly) positively skewed distributions (resulting in negative *γ*'s), with the exception of 2019, which is negatively skewed, as can be verified in Figure A5.

Generally, the *γ*'s for the mean, previous, and uniform priors follow similar paths, except during 2013–2016. In 2014 and 2016, the *γ*'s for the previous priors differ from those of the other priors. This can be explained by the fact that, in both cases, the prior had a strong sell preference (shown in Figure 4), meaning an adjusted *γ* was needed to correctly capture the current distribution's shift (and offset the influence of the prior).

**Figure A5.** Resulting fitted marginal distributions *f* [*x*] for each year. Each coloured line represents a different prior (with the legend given in the top left). The blue bars show the (discretised) actual return distribution.

#### **Appendix E. Probability Plots**

In this section, we provide the resulting probability plots for *f* [*x*] (Figure A5), *f* [*a*, *x*] (Figure A6), and *f* [*a*|*x*] (Figure A7) across all years analysed.

**Figure A6.** Resulting joint distributions. Red lines represent *f* [sell, *x*], and green lines represent *f* [buy, *x*]. Each plot from top to bottom shows: Uniform, previous, mean, extreme buy and extreme sell priors (in that order).

**Figure A7.** Decision functions for selling. Buying curves are excluded, as they are simply the complement (1 − sell). The green lines represent the extreme buy a priori preference, which shifts the resulting probabilities of selling far to the right, i.e., the majority of the area comprises buying actions, with only the extreme positive growth rates assigned to selling. In contrast, the red lines represent the sell preference, which "pulls" the area to the left, resulting in a strong conditional preference for selling.
