**3. Learning Algorithms**

Learning models in economics have served dual purposes. First, a learning algorithm can play a theoretical role as a model of dynamics that converge to equilibrium. This is the explicit goal of the "belief learning" model [29]. Second, a learning algorithm can play an empirical role in explaining the observed dynamics of game play over time. This goal is explicit in the "reinforcement learning" model [7], which draws heavily on models from artificial intelligence and psychology.

Both purposes are served by the Experience Weighted Attraction (EWA) model, which, appropriately enough, explicitly nests the belief learning and reinforcement learning models [30–32]. EWA and its one-parameter successor, self-tuning EWA, have proved to be particularly successful accounts of human experimental game play. Here, we discuss the reinforcement learning and self-tuning EWA models and compare and contrast them with the case-based learning approach.

In the repeated games that follow, we use the following notation: there is a set of agents indexed by $i = 1, \ldots, n$, each with a strategy set $S_i$ consisting of $m_i$ discrete choices, so that $S_i = \{s_i^1, \ldots, s_i^{m_i}\}$. Strategies are indexed with $j$ (e.g., $s_i^j$). Let $s = (s_1, s_2, \ldots, s_n)$ be a strategy profile, one strategy for each agent; in typical notation, $s_{-i}$ denotes the strategy profile with agent $i$ excluded, so $s_{-i} = (s_1, \ldots, s_{i-1}, s_{i+1}, \ldots, s_n)$. Scalar payoffs for player $i$ are denoted by the function $\pi_i(s_i, s_{-i})$. Finally, let $s_i(t)$ denote agent $i$'s strategy choice at time $t$, so that $s_{-i}(t)$ is the profile of strategy choices of all other agents at time $t$.

<sup>3</sup> But see Section 7.2. In fact, Erev and Roth [7] discuss using such a similarity between situations to define experimentation by a subject when choosing strategies.

<sup>4</sup> Similarity is used in a way that maps closely to how learning models work in general: by repeating successful choices under certain conditions. Cerigioni [28] uses similarity to determine when choices become automated through the dual decision processes familiar from psychology.
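To make the notation concrete, the following is a minimal Python sketch of a two-player repeated-game setup in this notation. The payoff tables and all identifiers are illustrative assumptions, not taken from the original.

```python
import numpy as np

n = 2       # number of agents, i = 1, ..., n
m = [2, 2]  # m_i: number of discrete choices for each agent
# S_i = {s_i^1, ..., s_i^{m_i}}; here strategies are indexed 0, ..., m_i - 1.

# Illustrative payoff tables for a 2x2 coordination game (an assumption);
# payoffs[i][s_i, s_other] encodes pi_i(s_i, s_-i).
payoffs = [
    np.array([[1.0, 0.0],
              [0.0, 2.0]]),  # pi_1
    np.array([[1.0, 0.0],
              [0.0, 2.0]]),  # pi_2
]

def pi(i, s):
    """Scalar payoff pi_i(s_i, s_-i) for agent i under strategy profile s."""
    s_i = s[i]
    s_minus_i = s[1 - i]  # with n = 2, s_-i is just the other agent's choice
    return payoffs[i][s_i, s_minus_i]

s = (1, 1)                 # strategy profile s = (s_1, s_2)
print(pi(0, s), pi(1, s))  # -> 2.0 2.0
```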

Erev and Roth [7] argue that, empirically, behavior in experimental game theory appears to be probabilistic rather than deterministic. Accordingly, instead of recommending deterministic choices, these learning models produce what the EWA approach has come to call "attractions." An attraction of agent $i$ to strategy $j$ at time $t$ is a scalar corresponding to the likelihood that the agent will choose this strategy at this time, relative to the other strategies available to that agent. The attraction of agent $i$ to strategy $j$ at time $t$ under an arbitrary learning model will be written $A_i^j(t)$. Different models provide different functions for generating these attractions; we will write, e.g., $CBA_i^j(t)$ for the attraction generated by the case-based model.

Because a given attraction corresponds to a likelihood, a vector of attractions $\{A_i^j(t)\}_{j=1}^{m_i}$ corresponds to a probability distribution over the choices available at time $t$ and therefore fully describes how the agent will choose at time $t$.<sup>5</sup>
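The text above does not pin down how attractions are converted into choice probabilities; in the EWA literature this is commonly done with a logit (softmax) response rule governed by a sensitivity parameter $\lambda$. The sketch below assumes that rule; the function name and the parameter `lam` are illustrative, not from the original.

```python
import numpy as np

def choice_probabilities(attractions, lam=1.0):
    """Map a vector of attractions {A_i^j(t)}_{j=1}^{m_i} to a probability
    distribution over agent i's m_i choices via a logit (softmax) rule.
    lam is the response sensitivity: lam -> 0 gives uniform randomization,
    while large lam approaches a best response to the attractions."""
    a = lam * np.asarray(attractions, dtype=float)
    a -= a.max()            # subtract the max for numerical stability
    expa = np.exp(a)
    return expa / expa.sum()

# Example: three strategies with attractions 1.0, 2.0, 0.5
print(choice_probabilities([1.0, 2.0, 0.5], lam=2.0))
# -> roughly [0.11, 0.84, 0.04]; probabilities sum to 1
```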

We consider case-based learning (CBL), reinforcement learning (RL), and self-tuning experience weighted attraction (self-tuning EWA) in turn.
