*3.1. QRSE*

The QRSE framework aims to explain macroeconomic regularities as arising from social interactions between agents. Two key assumptions stem from the idea of Smithian competition: agents observe and respond to macroeconomic outcomes, and agent actions affect the macroeconomic outcome, i.e., a feedback loop is assumed. It is this feedback that is deemed to cause the macroeconomic outcome to have a distribution that stabilises around an average value. Given only the macroeconomic outcome, QRSE uses the principle of maximum entropy to infer the least biased distribution of decisions that results in the observed macroeconomic distribution. This makes QRSE particularly useful for inferring decisions when individual decision-level data is unobserved. In the following section, we outline the key notions behind QRSE [5].

#### 3.1.1. Deriving Decisions

Agents are assumed to respond (i.e., make decisions) based on the macroeconomic outcome, for example, based on profit rates *x*. This is captured by the agents' utility *U*. However, agents are assumed to act in a boundedly rational way, such that they may not always choose the option with the highest *U*, for example, if it becomes impractical to consider all outcomes. That is, agents are attempting to maximise their expected utility, subject to an entropy constraint capturing the uncertainty:

$$\max \sum\_{a \in A} f\left[a \mid \mathbf{x}\right] \mathcal{U}\left[a, \mathbf{x}\right] \tag{1}$$

$$\begin{aligned} \text{subject to} & \sum\_{a \in A} f[a|\mathbf{x}] = 1\\ & -\sum\_{a \in A} f[a|\mathbf{x}] \log f[a|\mathbf{x}] \ge H\_{\text{min}} \end{aligned} \tag{2}$$

where *f* [*a*|*x*] represents the probability of an agent choosing action *a* if rate *x* is observed. The first constraint ensures the probabilities sum to 1, while the second is a constraint on the minimum entropy. The minimum entropy constraint implies a level of boundedness such that there is some limit to the agents' processing abilities, which allows QRSE to deviate from perfect rationality.

Lagrange multipliers can be used to turn the constrained optimisation problem of Equations (1) and (2) into an unconstrained one, forming the following Lagrangian function:

$$\mathcal{L} = \sum\_{a \in A} f[a|\mathbf{x}] \, \mathcal{U}[a, \mathbf{x}] - \lambda \left( \sum\_{a \in A} f[a|\mathbf{x}] - 1 \right) + T \left( -\sum\_{a \in A} f[a|\mathbf{x}] \log f[a|\mathbf{x}] - H\_{\text{min}} \right) \tag{3}$$

Taking the first order conditions of Equation (3) and solving for *f* [*a*|*x*] yields:

$$f[a|\mathbf{x}] = \frac{1}{Z} e^{\frac{\mathcal{U}[a,\mathbf{x}]}{T}} \tag{4}$$

representing a choice of a mixed strategy by maximising the expected utility subject to an entropy constraint. This problem is dual to maximising entropy of the mixed strategy, subject to a constraint on the expected utility as detailed in Appendix A.1.
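The decision function of Equation (4) can be sketched numerically. The following is a minimal illustration, not the source's implementation; the utility values and temperature are arbitrary assumptions:

```python
import numpy as np

def logit_choice(utilities, T):
    """Quantal response probabilities f[a|x] = exp(U[a,x]/T) / Z (Equation (4)).

    `utilities` holds U[a, x] for each action a at a fixed outcome x;
    T is the temperature arising from the entropy constraint.
    """
    z = np.exp(np.asarray(utilities, dtype=float) / T)
    return z / z.sum()

# Illustrative binary enter/exit example with U[enter, x] = x, U[exit, x] = -x.
x = 0.5
probs = logit_choice([x, -x], T=1.0)
```

As *T* grows, the choice probabilities approach uniform (maximum entropy); as *T* → 0, the agent deterministically picks the highest-utility action, recovering perfect rationality.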

#### 3.1.2. Deriving Statistical Equilibrium

From Section 3.1.1 we have a derivation for a decision function in which agents maximise expected utility subject to an entropy constraint that introduces bounds on the agents' processing abilities. In order to infer the statistical equilibrium based on observed macroeconomic outcomes, the joint probability *f* [*a*, *x*] must be computed.

The joint distribution captures the resulting statistical equilibrium which arises from the individual agent decisions. While there are many potential joint distributions, using the principle of maximum entropy allows for inference of the least biased distribution. From an observer perspective, maximising the entropy of the model accounts for model uncertainty, by providing the maximally noncommittal joint distribution. To compute this, Scharfenaker and Foley [5] maximise the joint entropy with respect to the marginal probabilities (since individual action data is not available), by decomposing the joint entropy into a sum of the marginal entropy and the (average) conditional entropy.

The solution for *f* [*a*|*x*], given by Equation (4), can be used to compute the joint probability *f* [*a*, *x*], as long as the marginal *f* [*x*] is determined (since *f* [*a*, *x*] = *f* [*a*|*x*] *f* [*x*]). In order to derive *f* [*x*], the approach considers the state-dependent conditional entropy, represented as

$$H[A|\mathbf{x}] = -\sum\_{a \in A} f[a|\mathbf{x}] \log f[a|\mathbf{x}] \tag{5}$$

Scharfenaker and Foley [5] then use the principle of maximum entropy to find the distribution *f* [*x*] that maximises

$$\max\_{f[\mathbf{x}]\geq 0} H = -\int\_{\mathbf{x}} f[\mathbf{x}] \log f[\mathbf{x}] d\mathbf{x} + \int\_{\mathbf{x}} f[\mathbf{x}] H[A|\mathbf{x}] d\mathbf{x} \tag{6}$$

$$\begin{aligned} \text{subject to} & \int\_{\mathcal{X}} f[\mathbf{x}] d\mathbf{x} = 1\\ & \int\_{\mathcal{X}} f[\mathbf{x}] \mathbf{x} d\mathbf{x} = \xi \end{aligned} \tag{7}$$

The first constraint ensures the probabilities sum to 1, and the second constraint applies to the mean outcome (with *ξ* being the mean of the actual observed data distribution $\bar{f}[\mathbf{x}]$). Importantly, there is also an additional constraint which models Smithian competition [32] in the market. Smithian competition models the feedback structure of competitive markets: for example, entrance into a market tends to lower the profit rates, and exit tends to raise the profit rates. This is captured as the difference between the expected returns conditioned on entering and the expected returns conditioned on exiting. This competition constraint can be represented as

$$\text{subject to } \int\_{\mathcal{X}} f[\mathbf{x}] (f[a|\mathbf{x}] - f[\bar{a}|\mathbf{x}]) \mathbf{x} d\mathbf{x} = \delta \tag{8}$$

The combination of the conditional probabilities of Equation (4), which stipulate that the agents enter and exit based on profit rates, and the competition constraint of Equation (8) models a negative feedback loop that results in a distribution of the profit rates around an average (*ξ*).

Again, using the method of Lagrange multipliers, the associated Lagrangian becomes 

$$\begin{aligned} \mathcal{L} = & -\int\_{\mathcal{X}} f[\mathbf{x}] \log f[\mathbf{x}] d\mathbf{x} + \int\_{\mathcal{X}} f[\mathbf{x}] H[A|\mathbf{x}] d\mathbf{x} \\ & - \lambda \left( \int\_{\mathcal{X}} f[\mathbf{x}] d\mathbf{x} - 1 \right) - \gamma \left( \int\_{\mathcal{X}} f[\mathbf{x}] \mathbf{x} d\mathbf{x} - \xi \right) - \rho \left( \int\_{\mathcal{X}} f[\mathbf{x}] (f[a|\mathbf{x}] - f[\bar{a}|\mathbf{x}]) \mathbf{x} d\mathbf{x} - \delta \right) \end{aligned} \tag{9}$$

Taking the first order conditions of Equation (9) and solving for *f* [*x*] yields

$$f[\mathbf{x}] = \frac{1}{Z\_A} e^{H[A|\mathbf{x}] - \gamma \mathbf{x} - \rho \mathbf{x} (f[a|\mathbf{x}] - f[\bar{a}|\mathbf{x}])} \tag{10}$$

where $Z\_A$ is the partition function $Z\_A = \int\_{\mathcal{X}} e^{H[A|\mathbf{x}] - \gamma \mathbf{x} - \rho \mathbf{x} (f[a|\mathbf{x}] - f[\bar{a}|\mathbf{x}])} d\mathbf{x}$. Note that in Equation (9) we use *ρ* as the Lagrange multiplier for the competition constraint. This parameter is referred to as *β* in [5]; we have avoided that notation to prevent confusion with the thermodynamic *β* (inverse temperature) discussed in later sections.

Equations (4) and (10) comprise a fully defined joint probability. Crucially, QRSE allows for modelling the resultant statistical equilibrium even when the individual actions are unobserved—by inferring these decisions based on the principle of maximum entropy.

#### 3.1.3. Limitations of Logit Response

In Section 3.1.1 we have seen how the logit response function used for decision-making in QRSE is derived from entropy maximisation. Mirroring the Boltzmann distribution well known in thermodynamics, this logit response has seen extensive use throughout the literature, arising in a variety of domains. For example, the logit function appears as the sigmoid or softmax in neural networks, in logistic regression, and in many applications in economics and game theory [33,34]. However, one important development not yet discussed is the incorporation of prior knowledge into the formation of beliefs. Up until now, we have considered a choice to be the result of expected utility maximisation subject to entropy constraints, from which the logit models have arisen. However, from psychology [35], behavioural economics [36,37], and Bayesian methods [38,39] we know that the incorporation of a priori information is often an important factor in decision-making. Thus, we explore the incorporation of prior beliefs into agent decisions in more detail in the following section (and the remainder of the paper).

Furthermore, one criticism of the logit response arises from the independence of irrelevant alternatives (IIA) property of multinomial logit models (which would extend to the conditional function used in QRSE in a multi-action case), which states that the ratio between two choice probabilities should not change based on a third, irrelevant alternative. Initially, this may seem desirable; however, it becomes problematic for correlated outcomes, which many real choice sets exhibit. This criticism has been borne out in several thought experiment studies showing violations of the IIA assumption [40]. The classical example is the Red Bus/Blue Bus problem [41,42].

Consider a decision-maker who must choose between a car and a (blue) bus, *A* = {car, blue bus}. The agent is indifferent between taking the car or the bus, i.e., *p*(car) = *p*(blue bus) = 0.5. However, suppose a third option is added: a red bus which is equivalent to the blue bus in all but colour. The agent is indifferent to the colour of the bus, so when faced with *A*1 = {blue bus, red bus} the agent would choose *p*(red bus) = *p*(blue bus) = 0.5. Now suppose the agent is faced with a choice between *A*2 = {car, blue bus, red bus}. As per the IIA property, the ratio *p*(blue bus)/*p*(car) (from *A*, 0.5/0.5) must remain constant. So, adding in the third option, the probability of taking any *a* becomes *p*(*a*) = 1/3 (for all *a*), maintaining *p*(blue bus)/*p*(car) = 1. However, this has reduced the probability of taking the car from 0.5 to 0.33 based on the addition of an irrelevant alternative (the red bus, given that the agent does not care about the colour of the bus). In reality, the probability of taking the car should have stayed fixed at *p*(car) = 0.5, and the probability of taking each bus reduced to 0.25. This reduction in *p*(car) does not make sense for a decision-maker who is indifferent to the colour of the bus, and is the basis for the criticism. This may not be immediately relevant for current QRSE models (especially binary ones), but with potential future applications, for example, in portfolio allocation, it could become an important consideration. For example, when adding a stock to a portfolio which is similar to an existing stock, it may not be desirable to reduce the likelihood of selecting other (unrelated) stocks.
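The Red Bus/Blue Bus calculation above can be reproduced directly with a multinomial logit. This is a minimal sketch: setting equal utilities is our assumption standing in for the agent's indifference:

```python
import numpy as np

def logit(utilities, T=1.0):
    # Multinomial logit: probabilities proportional to exp(U/T).
    z = np.exp(np.asarray(utilities, dtype=float) / T)
    return z / z.sum()

u = 0.0  # indifferent agent: every alternative carries the same utility

p_two = logit([u, u])        # A  = {car, blue bus}
p_three = logit([u, u, u])   # A2 = {car, blue bus, red bus}

# The ratio p(blue bus)/p(car) is preserved (IIA), but p(car) falls from
# 0.5 to 1/3 after adding an alternative irrelevant to a colour-blind chooser.
```

Under the logit, `p_two` is [0.5, 0.5] and `p_three` is [1/3, 1/3, 1/3], exactly the counterintuitive reallocation described above.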

#### *3.2. Thermodynamics of Decision-Making*

A thermodynamically inspired model of decision-making, which explicitly considers information costs as well as the incorporation of prior knowledge, is proposed by [4]. The proposed approach can be seen as a generalisation of the logit function: the typical logit function is recovered as a special case, while the more general case manages to avoid the IIA property.

Ortega and Braun [4] represent changing probabilistic states as isothermal transformations. Given some initial state *x* ∈ *X* with initial energy potential *φ*0[*x*], the probability of being in state *x* is $p[\mathbf{x}] = \frac{e^{-\beta \phi\_0[\mathbf{x}]}}{\sum\_{\mathbf{x} \in X} e^{-\beta \phi\_0[\mathbf{x}]}}$ (from the Boltzmann distribution). Updating the state to *f* [*x*] corresponds to adding a new potential Δ*φ*[*x*]. The transformation requires physical work, given by the free energy difference Δ*F*[ *f* ]. The free energy difference between the initial and resulting states is then

$$\begin{split} \Delta F[f] &= F[f] - F[p] \\ &= \sum\_{\mathbf{x} \in \mathcal{X}} f[\mathbf{x}] \Delta \phi[\mathbf{x}] + \frac{1}{\beta} \sum\_{\mathbf{x} \in \mathcal{X}} f[\mathbf{x}] \log \left( \frac{f[\mathbf{x}]}{p[\mathbf{x}]} \right) \end{split} \tag{11}$$

which allows the separation of the prior *p*[*x*] from the new potential Δ*φ*[*x*]. In an economic sense, representing the negative of the new potential as the utility gain, i.e., *U*[*x*] = −Δ*φ*[*x*], allows for reasoning about utility maximisation subject to an informational constraint, given here as the Kullback-Leibler (KL) divergence from the prior distribution [4]. Golan [43] shows how the KL-divergence naturally arises as a generalisation of Shannon entropy (of Equation (2)) when considering prior information, and Hafner et al. [44] show how various objective functions can be seen as functionally equivalent to minimising a (joint) KL-divergence, even those not directly motivated by the free energy principle. Such analysis makes the KL-divergence a logical and fundamentally grounded measure of information acquisition costs, captured as the divergence from a prior distribution.
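The decomposition in Equation (11) can be verified numerically. The potentials and inverse temperature below are arbitrary illustrative values:

```python
import numpy as np

rng = np.random.default_rng(0)
beta = 2.0
phi0 = rng.normal(size=5)          # initial potentials phi_0[x] (arbitrary)
dphi = rng.normal(size=5)          # added potentials Delta-phi[x] (arbitrary)

p = np.exp(-beta * phi0); p /= p.sum()              # initial Boltzmann state
f = np.exp(-beta * (phi0 + dphi)); f /= f.sum()     # updated Boltzmann state

def free_energy(q, phi):
    # F[q] = E_q[phi] + (1/beta) * E_q[log q]
    return q @ phi + (1.0 / beta) * (q @ np.log(q))

lhs = free_energy(f, phi0 + dphi) - free_energy(p, phi0)
rhs = f @ dphi + (1.0 / beta) * (f @ np.log(f / p))  # Equation (11)
# lhs and rhs agree up to floating-point error
```

The agreement of `lhs` and `rhs` reflects the separation Equation (11) exploits: the work of the transformation splits into an expected utility gain (the Δ*φ* term) and a KL information cost relative to the prior.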

Ortega and Stocker [45] then apply this formulation to discrete choice by introducing a choice set *A* (space of actions), which leads to the following negative free energy difference, for a given observation *x*:

$$-\Delta F[f[a|\mathbf{x}]] = \sum\_{a \in A} f[a|\mathbf{x}] \, \mathcal{U}[a, \mathbf{x}] - \frac{1}{\beta} \sum\_{a \in A} f[a|\mathbf{x}] \log \left(\frac{f[a|\mathbf{x}]}{p[a]}\right) \tag{12}$$

where again *a* represents a choice (or action), and *U* the utility for the agent. The first term of Equation (12) is maximising the expected utility, and the second term is a regularisation on the cost of information acquisition. Again, in this representation, information cost is measured as the KL-divergence from the prior distribution.

Taking the first order conditions of Equation (12) and solving for *f* [*a*|*x*] yields

$$f[a|\mathbf{x}] = \frac{p[a] \, e^{\frac{\mathcal{U}[a,\mathbf{x}]}{T}}}{\sum\_{a' \in A} p[a'] \, e^{\frac{\mathcal{U}[a',\mathbf{x}]}{T}}} \tag{13}$$

where we have moved from the inverse temperature *β* to the temperature *T* for notational convenience, i.e., *T* = 1/*β*. The key formulation here is the separation of the prior probability *p* from the utility gain (i.e., the new potential added to the initial potential). *T* then arises as the Lagrange multiplier for the cost of information acquisition (as opposed to the entropy constraint of QRSE, described in Section 3.1). We emphasise this aspect in later sections.

Revisiting the IIA property, the incorporation of the prior probabilities in Equation (A7) can shift the choices away from the logit equation, thus avoiding IIA. However, if desired, the free energy model reverts to the typical logit function in the case of uniform priors, so this property can be recovered. In the economics literature, a similar model is given by Rational Inattention (R.I.) [2]. The relationship between R.I. and the free energy approach of [4,45] is detailed in Appendix C.
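The prior-weighted choice rule of Equation (13) can be sketched on the Red Bus/Blue Bus example. The specific prior below, which treats the two bus variants as one "bus" mode, is our illustrative assumption:

```python
import numpy as np

def free_energy_choice(utilities, priors, T=1.0):
    """Equation (13): f[a|x] proportional to p[a] * exp(U[a,x]/T)."""
    w = np.asarray(priors, dtype=float) * np.exp(np.asarray(utilities, dtype=float) / T)
    return w / w.sum()

u = [0.0, 0.0, 0.0]   # indifferent agent over {car, blue bus, red bus}

# Uniform prior recovers the ordinary logit: each option gets 1/3.
p_logit = free_energy_choice(u, [1/3, 1/3, 1/3])

# A prior splitting mass between "car" and "bus" keeps p(car) at 0.5,
# dividing the remaining 0.5 between the two bus colours.
p_prior = free_energy_choice(u, [0.5, 0.25, 0.25])
```

With the non-uniform prior, adding the red bus no longer erodes the probability of taking the car, which is exactly the IIA violation behaviour the plain logit cannot express.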
