*3.6. Stochastic Choice Probabilities*

As defined in Sections 3.1–3.3, each learning model generates an attraction for each strategy *j*: *RLA<sup>j</sup> i* (*t*), *EWA<sup>j</sup> i* (*t*), and *CBA<sup>j</sup> i* (*t*). We map the attractions of all three models into choice probabilities with the same function: a logit response rule. Let *A<sup>j</sup> i* (*t*) denote any of these three attractions. Equation (12) then gives the choice probabilities they imply:

$$P\_i^j(t+1) = \frac{\mathbf{e}^{\lambda \cdot A\_i^j(t)} }{\sum\_{k=1}^{m\_i} \mathbf{e}^{\lambda \cdot A\_i^k(t)}} \tag{12}$$
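Equation (12) is a softmax over the attractions scaled by *λ*. A minimal sketch of the computation (function and variable names are ours, for illustration only):

```python
import numpy as np

def logit_choice_probs(attractions, lam):
    """Equation (12): logit response over a player's attraction vector.

    `attractions` holds A_i^j(t) for each of player i's strategies j;
    `lam` is the sensitivity parameter lambda.
    """
    z = lam * np.asarray(attractions, dtype=float)
    z -= z.max()  # subtract the max before exponentiating; the ratio is unchanged
    e = np.exp(z)
    return e / e.sum()  # probabilities P_i^j(t+1), summing to one
```

Subtracting the maximum before exponentiating is a standard numerical-stability step and leaves the probabilities unchanged, since the common factor cancels in the ratio.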

Logit response has been used extensively in the stochastic choice learning literature and, if the exponentials of the attractions are interpreted as "choice intensities", this formulation is consistent with the Luce Choice Rule [6], as discussed in the introduction.<sup>9</sup> Equation (12) is the stochastic choice rule used to fit the models to the data, explaining each choice *j* by each subject *i* in every time period *t*. The learning equations are estimated by maximum likelihood to assess the fit of each model and to provide estimates of the learning parameters. This includes experimenting with various initial parameter values and optimization algorithms.<sup>10</sup>
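The maximum-likelihood fit sums the log of Equation (12) over observed choices. A sketch of the objective, assuming hypothetical data structures (a list of per-period attraction vectors and the index of the strategy actually chosen in the following period):

```python
import numpy as np

def neg_log_likelihood(lam, attraction_history, choice_history):
    """Sum of -log P_i^j(t+1) over the observed choices.

    attraction_history: one attraction vector A_i(t) per period;
    choice_history: index of the strategy chosen in period t+1.
    (Illustrative structures; not the paper's actual Stata code.)
    """
    lam = float(np.atleast_1d(lam)[0])  # accept a scalar or a 1-element array
    nll = 0.0
    for A, j in zip(attraction_history, choice_history):
        z = lam * np.asarray(A, dtype=float)
        z -= z.max()
        log_probs = z - np.log(np.exp(z).sum())  # log of Equation (12)
        nll -= log_probs[j]
    return nll

# In practice this objective would be minimized over lambda (and the learning
# parameters that generate the attractions) with a quasi-Newton routine, e.g.
# scipy.optimize.minimize, standing in for the Newton-Raphson / DFP algorithms
# the paper runs in Stata.
```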

In this logit rule, *λ* measures the sensitivity of choice to the attractions: a low value of *λ* implies choices that are close to random, while a high value implies choices that are almost fully determined by the attractions. This parameter is estimated from the empirical data and could vary for a variety of reasons, such as the subject's motivation in the game or unobserved components of payoffs.
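The two limiting cases can be checked numerically. For a hypothetical attraction vector, a very small *λ* yields near-uniform probabilities, while a large *λ* concentrates nearly all probability on the highest-attraction strategy:

```python
import numpy as np

def logit_probs(attractions, lam):
    """Equation (12) for a single player and period (illustrative names)."""
    z = lam * np.asarray(attractions, dtype=float)
    z -= z.max()  # numerical stability; leaves the ratio unchanged
    e = np.exp(z)
    return e / e.sum()

A = [1.0, 0.5, 0.0]             # hypothetical attractions for three strategies
p_low = logit_probs(A, 0.01)    # close to uniform: choices look random
p_high = logit_probs(A, 50.0)   # nearly all mass on the best strategy
```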

<sup>9</sup> In addition to logit response, we also estimate a power logit function, but find that it does not change the conclusions or generally improve the fit of the learning models estimated here.

<sup>10</sup> We use STATA to estimate the maximum likelihood functions using variations of the Newton–Raphson and Davidon–Fletcher–Powell algorithms, depending on success in estimation. Code is available upon request.
