*3.3. Self-Tuning Experience Weighted Attraction*

Self-tuning experience weighted attraction was developed by Ho et al. [10] to condense experience weighted attraction [31] into a simple one-parameter model. EWA incorporates both RL and belief learning; the latter relies on so-called "fictitious play", in which the payoffs of forgone strategies are weighted alongside realized payoffs. Self-tuning EWA is thus a compact and flexible way to incorporate different types of learning in one algorithm.

Equation (6) describes self-tuning EWA: a *δ* weight is placed on fictitious play and a (1 − *δ*) weight is placed on realized outcomes. Self-tuning EWA has been successful at explaining game play in a number of different settings, including the data that we use in this paper.

$$EWA\_i^j(t) = \frac{N(t-1)\,\phi(t)\,EWA\_i^j(t-1) + \left[\delta + (1-\delta)\cdot I(s\_i^j, s\_i(t))\right] \cdot \pi(s\_i^j, s\_{-i}(t))}{N(t)}\tag{6}$$

In self-tuning EWA, the parameter *N* evolves by the rule *N*(*t*) = *φ*(*t*) · *N*(*t* − 1) + 1, with *N*(0) = 1. The function *I*(·) is an indicator that takes a value of 1 when *s<sub>i</sub>*(*t*) = *s<sub>i</sub><sup>j</sup>* and 0 otherwise. The parameter *φ* acts as a discount on past experience, representing either agents' forgetfulness or their belief that the conditions of the game may be changing. It evolves as *φ*(*t*) = 1 − *Sp*(*t*)/2, where *Sp*(*t*) is a surprise index that measures the extent to which an agent's partners deviate from their previous play. More precisely, *Sp*(*t*) is defined from the cumulative history of play *h<sub>j</sub><sup>k</sup>*(*t*) and a vector of the most recent play *r<sub>j</sub><sup>k</sup>*(*t*) for strategy *j* and opposing player *k*, as given in Equations (7) and (8).
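The experience-weight recursion can be sketched in a few lines of Python (a minimal illustration with names of our choosing; we hold *φ* fixed for clarity, although in self-tuning EWA it varies with the surprise index defined below):

```python
def update_N(N_prev, phi_t):
    """One step of the experience-weight recursion N(t) = phi(t) * N(t-1) + 1."""
    return phi_t * N_prev + 1.0

# Starting from N(0) = 1, with a constant phi = 0.5 purely for illustration:
N = 1.0
for _ in range(3):
    N = update_N(N, 0.5)
```

With a constant discount *φ* < 1, the recursion converges toward 1/(1 − *φ*), so the effective weight of accumulated experience remains bounded.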

$$h\_j^k(t) = \frac{\sum\_{\tau=1}^t I(s\_j^k, s^k(\tau))}{t} \tag{7}$$

$$r\_j^k(t) = \frac{\sum\_{\tau=t-W+1}^t I(s\_j^k, s^k(\tau))}{W} \tag{8}$$

<sup>8</sup> The bucket analogy is also apropos because Erev and Roth [7] describe a spillover effect, in which buckets can slosh over to neighboring buckets. We do not investigate the spillover effect in this paper, since with only two actions (in 2 × 2 games) the spillover effect washes out.

We set *W* = 2 because only two strategies are available to each agent in these games. In the experiments used in this paper, subjects cannot identify their opposing players, so we treat all opposing players as a representative average player, following Chmura et al. [9], when defining histories and the surprise index. Equation (9) defines the surprise index as the quadratic distance between the cumulative and immediate histories.

$$Sp(t) = \sum\_{j=1}^{2} (r\_j^k(t) - h\_j^k(t))^2 \tag{9}$$
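Equations (7)–(9) and the resulting discount *φ*(*t*) can be sketched as follows (a hedged illustration: the function names and the representation of the opponent's history as a list of strategy indices 0/1 are our own, not the paper's notation):

```python
def surprise_index(opponent_history, W=2):
    """Sp(t): squared distance between the cumulative frequency h_j (Eq. 7)
    and the recent-window frequency r_j (Eq. 8) of each strategy j (Eq. 9).
    opponent_history is the list of strategy indices (0 or 1) played so far."""
    t = len(opponent_history)
    h = [opponent_history.count(j) / t for j in (0, 1)]   # cumulative history
    recent = opponent_history[-W:]
    r = [recent.count(j) / W for j in (0, 1)]             # immediate history
    return sum((r_j - h_j) ** 2 for r_j, h_j in zip(r, h))

def discount(opponent_history, W=2):
    """Change-detection discount phi(t) = 1 - Sp(t)/2."""
    return 1.0 - 0.5 * surprise_index(opponent_history, W)
```

For example, after the history [0, 0, 0, 1] the cumulative frequencies are (0.75, 0.25) while the last-two-period frequencies are (0.5, 0.5), giving *Sp*(*t*) = 0.125 and *φ*(*t*) = 0.9375: a mild surprise produces a mild discount on past experience.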

The fictitious play coefficient *δ* shifts attention toward high-payoff strategies: *δ* = 1/*W* if *π*(*s<sub>i</sub><sup>j</sup>*, *s*<sub>−*i*</sub>(*t*)) > *π*(*t*) and *δ* = 0 otherwise, where *π*(*t*) is the payoff realized at time *t*.
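Putting the pieces together, one full self-tuning EWA step for a two-strategy game might look like the following (a sketch under our own naming conventions; attractions, payoffs, and the opponent's history are plain Python lists, and the opponent is the representative average player described above):

```python
def ewa_update(A_prev, N_prev, payoffs, played, opponent_history, W=2):
    """One self-tuning EWA attraction update (Equation 6) for a 2-strategy game.
    A_prev: attractions [A_0, A_1] at t-1
    N_prev: experience weight N(t-1)
    payoffs: [pi(s^0, s_-i(t)), pi(s^1, s_-i(t))] against realized opponent play
    played: index of the strategy actually chosen at t
    opponent_history: opponent strategy indices up to and including t"""
    t = len(opponent_history)
    # Surprise index and discount phi(t) = 1 - Sp(t)/2 (Equations 7-9)
    h = [opponent_history.count(j) / t for j in (0, 1)]
    recent = opponent_history[-W:]
    r = [recent.count(j) / W for j in (0, 1)]
    sp = sum((r_j - h_j) ** 2 for r_j, h_j in zip(r, h))
    phi_t = 1.0 - 0.5 * sp
    realized = payoffs[played]
    N_new = phi_t * N_prev + 1.0
    A_new = []
    for j in (0, 1):
        # Attention: delta = 1/W for forgone strategies beating the realized payoff
        delta = (1.0 / W) if payoffs[j] > realized else 0.0
        weight = delta + (1.0 - delta) * (1.0 if j == played else 0.0)
        A_new.append((N_prev * phi_t * A_prev[j] + weight * payoffs[j]) / N_new)
    return A_new, N_new
```

Note that the played strategy always receives full weight on its realized payoff, since *δ* + (1 − *δ*) · 1 = 1, while a forgone strategy earns weight *δ* on its hypothetical payoff only when that payoff exceeds the realized one.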
