1. Introduction
Over the past few decades, learning models have received much attention in the theoretical and experimental literature of cognitive science. One such model is fictitious play, where players form beliefs about their opponents’ play and best respond to these beliefs. In the fictitious play model, players know the payoff structure and their opponents’ strategy sets.
In contrast, there are other learning models in which players have only limited information about the game structure: players may not know the payoff structure or their opponents’ strategy sets, or they may not even know whether they are playing against other players. In this situation, they may not be able to form beliefs about the way that their opponents play or about all possible outcomes. What they do know is their own available actions and the results of previous play, that is, the realized payoffs from chosen actions. Instead of forming beliefs about all possible outcomes, each player makes a subjective assessment of each of his actions based on the realized payoffs from the action and tends to pick the action that has achieved better results than the others.
One such model with limited information is the reinforcement learning model introduced by Erev and Roth [
1] (ER, hereafter), where they model the observed behaviour of agents in the lab.
In their model, the agent chooses an action randomly, where the choice probability of each action is the fraction of the payoffs realized from the action over the total payoffs realized from all available actions.
In another learning model, introduced by Sarin and Vahid [
2] (SV, hereafter), each player makes a subjective payoff assessment on each of his actions, where the assessment is a weighted average of realized payoffs from the action, and chooses the action that has the highest assessment. After receiving a payoff, each player updates the assessment of the chosen action adaptively; the assessment of the chosen action is adjusted toward the received payoff.
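The SV update just described can be sketched as a one-line rule; the function name and the example numbers below are our own illustration, assuming the convex-combination form described above:

```python
def update_assessment(current, payoff, weight):
    """Adjust the chosen action's assessment toward the realized payoff:
    the new assessment is a convex combination of the old assessment
    and the received payoff, with `weight` on the payoff."""
    return (1 - weight) * current + weight * payoff

# Example: current assessment 1.0, realized payoff 3.0, weight 0.25
new = update_assessment(1.0, 3.0, 0.25)  # moves a quarter of the way to 3.0
```

Unchosen actions keep their assessments unchanged; only the played action is updated.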
In this paper, we provide a theoretical prediction of how adaptive players in the SV model behave in the long run in general games, and mostly in coordination games, which are of interest to a wide range of researchers.
In this model, the initial assessment of each action is assumed to take a value between the maximum and the minimum payoff that the action can provide.
For instance, players may have experienced the game in advance, so they may use their knowledge of previous payoffs to form the initial assessment of each action. Given those assessments, each player chooses the action that has the highest assessment.
After playing a game and receiving a payoff, each player updates his assessment using the realized payoff; the new assessment of a chosen action is a convex combination of the current assessment and the realized payoff. In this paper, the sequence of weighting parameters is assumed to be stochastic,
meaning that the amount of new payoff information that each player incorporates into his new assessment in each period is uncertain. The randomness assumption is reasonable, because the weights are subjective and may depend on each player’s capacity or emotion. For instance, a player may sometimes capture new information inaccurately because of a lack of concentration, and his subjective weight on the new payoff information may depend on his unexpected mood.
Since the initial assessment of each action is smaller than the best payoff that the action can give, each player increases his assessment of an action whenever he receives its best payoff. If there exists an action profile at which each player receives the best payoff that his current action can give and players play that action profile in some period, then they will keep choosing it in all subsequent periods. We call such an action profile an absorbing state; an action profile is absorbing if, once players play it in some period, they play it in all subsequent periods.
Furthermore, there exist other cases where players stick to one action profile: if there exists a period in which, for each player, the assessment of his chosen action and the realized payoff are both greater than the assessments of his other actions, then players keep choosing the action profile in all subsequent periods. It is shown that each strict Nash equilibrium is always a candidate convergence point; that is, for each strict Nash equilibrium, there exists a range of assessments for all players and actions such that players stick to the Nash equilibrium forever. In addition, if: (i) at any non-Nash equilibrium action profile, at least one player receives a payoff that is less than his maximin payoff; or (ii) all non-Nash equilibrium action profiles give the same payoff, then players end up playing a strict Nash equilibrium with probability one. To see this in detail, we focus on 2 × 2 coordination games and one non-2 × 2 coordination game.
First, we focus on 2 × 2 coordination games. For analytical purposes, we divide 2 × 2 coordination games into the following two mutually exclusive categories: (i) at each non-Nash equilibrium action profile, there exists a player who receives his worst payoff; and (ii) there exists a non-Nash equilibrium action profile at which each player’s action corresponds to his unique maximin action. It is then shown that players end up playing a strict Nash equilibrium in coordination games in category (i), while players end up playing either a strict Nash equilibrium or the action profile that consists of each player’s unique maximin action in coordination games in category (ii). To understand the argument further, it is helpful to see which well-known coordination games fall into each category. In particular, category (i) includes the battle of the sexes game, the pure coordination game and the stag hunt game, whereas category (ii) includes the game of chicken and market entry games.
Next, we focus on a non-2 × 2 coordination game introduced by Van Huyck, Battalio and Beil [
6] (VHBB, hereafter) to compare the theoretical results from this model with their experimental results. In the game, each player is asked to pick a number from a finite set, and coordination is achieved when players pick the same number. If players fail to coordinate, the player who picks the smallest number among players’ choices receives the highest payoff. In addition, each number gives a better payoff when the choice is closer to the smallest number among all the players’ choices. We show that each Nash equilibrium is absorbing.
It is also shown that the smallest number among the players’ choices weakly decreases over time. Next, we consider the case where the second best payoff from each action is lower than the payoff from the maximin action, which is the smallest number of the choice set; hence, players are better off choosing the smallest number of the choice set whenever they fail to pick the smallest number among the players’ choices. In this case, we show that players end up playing a Nash equilibrium with probability one, which is also observed in the experimental results of VHBB.
It is also intriguing to consider cases where the weighting parameters are not stochastic. For example, we consider players who believe that the situation they are involved in is stationary, so that each action’s assessment is the arithmetic mean of its past payoffs. We also consider the case where players believe that the environment is non-stationary and put the same weight on all new payoff information. Then, for each case, we show the necessary and sufficient condition for coordination failure, where players play non-Nash equilibrium action profiles alternately forever. In fact, in fictitious play, an example of the correlated play on off-diagonal action profiles is shown (Fudenberg and Levine [
8]), and the empirical frequencies of play should converge to the mixed Nash equilibrium in every 2 × 2 game with the diagonal property (Monderer and Shapley [
9]). We, however, show a case in which the empirical frequencies of a correlated play on non-Nash equilibrium action profiles do not converge to the mixed Nash equilibrium.
3. General Games
We consider a case in which $M$ players play the same game repeatedly over periods. Let $I = \{1, \dots, M\}$ be the set of players. In each period, $n \in \{1, 2, \dots\}$, each player chooses an action from his own action set simultaneously. Let $A_i$ be the finite set of actions for player $i \in I$. After all the players choose actions, each player receives a payoff. If players play $a = (a_1, \dots, a_M) \in A = \times_{i \in I} A_i$, then player $i$’s realized payoff is denoted by $\pi_i(a)$, where $\pi_i : A \to \mathbb{R}$. When choosing an action, each player knows neither the payoff functions nor the environment in which he is involved.
In each period, each player assigns subjective payoff assessments to his actions; let $u_i^n(a_i)$ denote player $i$’s assessment of action $a_i \in A_i$ in period $n$, and let $u_i^n = (u_i^n(a_i))_{a_i \in A_i}$ be the vector of assessments of all actions for player $i$. We assume that the initial assessment of each action of each player takes a value between the maximum and the minimum payoff that the action gives; thus:

$$\min_{a_{-i}} \pi_i(a_i, a_{-i}) \le u_i^1(a_i) \le \max_{a_{-i}} \pi_i(a_i, a_{-i}) \quad (1)$$

for all $i \in I$ and $a_i \in A_i$. If $\min_{a_{-i}} \pi_i(a_i, a_{-i}) = \max_{a_{-i}} \pi_i(a_i, a_{-i})$, then we assume that $u_i^n(a_i)$ equals that common payoff. In each period, each player chooses the action that he believes will give the highest payoff; given his assessments, he chooses the action that has the highest assessment in the period. Therefore, if $a_i^n$ is the action that player $i$ chooses in period $n$, then:

$$a_i^n \in \arg\max_{a_i \in A_i} u_i^n(a_i).$$
For a tie break situation, which arises when two or more actions share the highest assessment, we introduce two types of tie break rules. We say that a tie break rule satisfies the inertia condition if the rule chooses the action that was chosen in the last period; if none of the actions with the highest assessment was chosen in the last period, then the rule picks one of them randomly. As a comparison, we also introduce another tie break condition, the uniform condition, where the rule picks each of the actions with the highest assessment with equal probability. In the following argument, we specify a tie break rule whenever the result depends on it; otherwise, the results do not depend on the tie break rule assumption.
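The two tie break rules can be sketched as follows; the function and its signature are our own illustration, not notation from the paper:

```python
import random

def choose_action(assessments, last_action=None, rule="inertia"):
    """Pick an action with the highest assessment from a dict
    {action: assessment}; break ties by inertia or uniformly."""
    best = max(assessments.values())
    tied = [a for a, u in assessments.items() if u == best]
    if len(tied) == 1:
        return tied[0]                 # no tie: unique argmax
    if rule == "inertia" and last_action in tied:
        return last_action             # inertia: repeat last period's action
    return random.choice(tied)         # otherwise randomize over the tie
```

Under the uniform condition, `last_action` is simply ignored and every tied action is equally likely.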
After playing the game in each period, each player observes only his own payoff; players observe neither their opponents’ actions nor their payoffs. Given his own realized payoff, each player updates his assessment of the action chosen in the previous period. Specifically, if player $i$ receives a payoff $\pi_i(a^n)$ when players play $a^n = (a_1^n, \dots, a_M^n)$, then he updates $u_i^n(a_i^n)$ as follows:

$$u_i^{n+1}(a_i^n) = (1 - \lambda_i^n(a_i^n)) u_i^n(a_i^n) + \lambda_i^n(a_i^n) \pi_i(a^n), \qquad u_i^{n+1}(a_i) = u_i^n(a_i) \text{ for } a_i \ne a_i^n,$$

where $\lambda_i^n(a_i^n)$ is player $i$’s weighting parameter for action $a_i^n$ in period $n$. We assume that $\lambda_i^n(a_i)$ is a random variable that takes a value between zero and one; $\lambda_i^n(a_i) \in (0, 1)$. It reflects the idea that players are uncertain how far to incorporate the new payoff information into their new assessments. The uncertainty can also be interpreted as players’ emotional shocks: how far they incorporate the new payoff information depends on their random mood. We also assume that: (i) the sequence of weighting parameters, $\{\lambda_i^n(a_i)\}_{i, a_i, n}$, is independent across periods, players and actions and is identically distributed across periods; and (ii) each component, $\lambda_i^n(a_i)$, has a density function that is strictly positive on the domain $(0, 1)$ for all $i$, $a_i$ and $n$.
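Under these assumptions, the whole process can be sketched as a short simulation. Uniform weights are one admissible distribution satisfying assumptions (i) and (ii); the payoffs are those of the stag hunt example used later in the text, and the helper names are our own:

```python
import random

# Stag hunt payoffs from the text, keyed by (row action, column action);
# "S" = stag, "R" = rabbit.
stag_hunt = [
    {("S", "S"): 10, ("S", "R"): 0, ("R", "S"): 8, ("R", "R"): 4},  # player 1
    {("S", "S"): 10, ("S", "R"): 8, ("R", "S"): 0, ("R", "R"): 4},  # player 2
]

def simulate(payoffs, init, periods=200, seed=0):
    """Run the two-player assessment dynamics; weights are drawn i.i.d.
    (approximately) uniform on (0, 1)."""
    rng = random.Random(seed)
    u = [dict(init[0]), dict(init[1])]
    history = []
    for _ in range(periods):
        # each player picks his highest-assessment action; exact ties are a
        # probability-zero event here, so no tie break rule is modelled
        a = tuple(max(u[i], key=u[i].get) for i in range(2))
        history.append(a)
        for i in range(2):
            lam = rng.random()  # random weighting parameter
            u[i][a[i]] = (1 - lam) * u[i][a[i]] + lam * payoffs[i][a]
    return history, u

# Initial assessments within each action's payoff bounds, per Equation (1)
history, final = simulate(stag_hunt, [{"S": 5.0, "R": 6.0}, {"S": 5.0, "R": 6.0}])
```

Because each update is a convex combination of the current assessment and a payoff of the same action, every assessment stays between that action’s minimum and maximum payoff throughout the run.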
4. Results
In this section, we investigate the convergence results in general games. In later sections, we focus on more specific games, in particular, coordination games.
We first show a sufficient condition under which an action profile is absorbing. We say that $a \in A$ is absorbing if, once players play the action profile in a period, they play it in all subsequent periods.
Proposition 1. If $a \in A$ is such that: (i) for all $i$:

$$\pi_i(a) \ge \pi_i(a_i, a'_{-i}) \text{ for all } a'_{-i} \in A_{-i};$$

and (ii) for all $i$, there exists $a'_{-i} \in A_{-i}$, such that:

$$\pi_i(a) > \pi_i(a_i, a'_{-i}),$$

then $a$ is absorbing.

Proof. Consider the case where players pick the action profile, $a$, in some period, $n$. In that case, player $i$ receives the payoff, $\pi_i(a)$. Note that, by condition (i), this value is the maximum value that action $a_i$ can give; therefore, by condition (ii), player $i$ inflates the assessment of the action, $a_i$. Since the assessments of the other actions do not change, player $i$ plays action $a_i$ again in period $n + 1$. Since this logic can be applied to all subsequent periods and player $i$ was picked arbitrarily, players play the same action profile in all the subsequent periods. ☐
Proposition 1 says that if, at an action profile, each player receives the best payoff that his current action can give, and strictly more than at least one other payoff from that action, then the action profile is absorbing.
If the inertia condition is always assumed for each player’s tie break rule, then condition (ii) in Proposition 1 is not required. However, if the uniform condition is assumed, then without condition (ii), players may not end up playing one action profile. As an extreme example, if two of a player’s actions give the same payoff for any of the opponents’ actions, and that payoff is higher than any payoff that any other action can give, then he plays those two actions with equal probability forever.
From Proposition 1, it is easy to see that even action profiles that consist of dominated strategies for all players can be absorbing. To see this, assume that two players play a prisoner’s dilemma game in which the strategy “C” is strictly dominated by the strategy “D” for both players. Notice that at (C,C), both players receive the highest payoffs that the action “C” can give; $\pi_1(C, C) > \pi_1(C, D)$ and $\pi_2(C, C) > \pi_2(D, C)$. Hence, if players play (C,C) once, then they always play it afterwards.
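A quick numerical check of this claim, using a canonical prisoner’s dilemma instance (the specific payoffs are our own assumption, chosen so that “C” pays most at (C, C)) and a fixed weight for simplicity:

```python
# Canonical prisoner's dilemma payoffs, keyed by (row action, column action).
pd = [
    {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 4, ("D", "D"): 1},  # player 1
    {("C", "C"): 3, ("C", "D"): 4, ("D", "C"): 0, ("D", "D"): 1},  # player 2
]

# Start both players with "C" on top; any weight in (0, 1) gives the same result.
u = [{"C": 2.5, "D": 2.0}, {"C": 2.5, "D": 2.0}]
lam = 0.5
plays = []
for _ in range(50):
    a = (max(u[0], key=u[0].get), max(u[1], key=u[1].get))
    plays.append(a)
    for i in range(2):
        u[i][a[i]] = (1 - lam) * u[i][a[i]] + lam * pd[i][a]
# every period is (C, C): the dominated profile is absorbing
```

The assessment of “C” only moves toward 3 and never drops below the frozen assessment of “D”, so the profile of dominated strategies is played forever.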
In the next statement, we show that player $i$ stops playing an action if the assessment of the action becomes smaller than the minimum payoff that another action can give.
Proposition 2. If, in some period, $n$, $u_i^n(a_i) < \min_{a_{-i}} \pi_i(a'_i, a_{-i})$ for some $a'_i \ne a_i$, then player $i$ does not choose $a_i$ after period $n$.
Proof. From the fact that each assessment is a convex combination of the initial assessment and realized payoffs of the action, we have $u_i^n(a'_i) \ge \min_{a_{-i}} \pi_i(a'_i, a_{-i}) > u_i^n(a_i)$. Notice that $a_i$ is not chosen in period $n$. Then, we have $u_i^{n+1}(a_i) = u_i^n(a_i)$, and player $i$ will not choose $a_i$ in period $n + 1$. The same logic can be applied in later periods, and thus, player $i$ will not choose $a_i$ in any subsequent period. ☐
Once the assessment of an action becomes lower than the worst payoff from another action, the action will not be chosen again. Therefore, if the worst payoff from one action is greater than the best payoff from another action, then the latter action is never chosen at any time. One natural question is whether players end up playing a strict Nash equilibrium. We say that players end up playing $a$ if there exists $n$, such that for all periods after $n$, players play $a$. If the condition, $\min\{u_i^n(a_i), \pi_i(a)\} > u_i^n(a'_i)$, is satisfied for all $i$ and $a'_i \ne a_i$, then players end up playing $a$.
In the following statement, we show that for any strict Nash equilibrium, there exist assessments for all players, such that they end up playing the strict Nash equilibrium:
Proposition 3. For any strict Nash equilibrium and any period, there exist assessments for all players, such that they play the Nash equilibrium in the period and all subsequent periods.

Proof. Let $a^* = (a_i^*, a_{-i}^*)$ be a strict Nash equilibrium, where $a_i^*$ is player $i$’s strategy at the strict Nash equilibrium. Then, we have the following condition; for all $i \in I$:

$$\pi_i(a_i^*, a_{-i}^*) > \pi_i(a'_i, a_{-i}^*) \quad (2)$$

for all $a'_i \ne a_i^*$. Now, we pick assessments of players, such that the following conditions are satisfied; for all $i \in I$ and $a'_i \ne a_i^*$:

$$u_i^n(a_i^*) > u_i^n(a'_i) \text{ and } \pi_i(a_i^*, a_{-i}^*) > u_i^n(a'_i). \quad (3)$$

Note that Equation (3) can hold, since by Equation (1), the minimum value of the assessment of action $a'_i$ is less than or equal to $\pi_i(a'_i, a_{-i}^*)$, which is strictly less than $\pi_i(a_i^*, a_{-i}^*)$. Thus, by Equations (2) and (3), players play the strict Nash equilibrium in period $n$ and:

$$u_i^{n+1}(a_i^*) = (1 - \lambda_i^n(a_i^*)) u_i^n(a_i^*) + \lambda_i^n(a_i^*) \pi_i(a^*) > u_i^n(a'_i) = u_i^{n+1}(a'_i)$$

for all $a'_i \ne a_i^*$. Therefore, players play the strict Nash equilibrium again in period $n + 1$. Notice that we can apply the same argument to the following periods, and thus, players play the strict Nash equilibrium in all subsequent periods. ☐
Proposition 3 says that for any strict Nash equilibrium, there exists a case in which players end up playing it. However, it is also possible that players end up playing a non-Nash equilibrium action profile. Hence, it is natural to look for conditions under which, if players converge to one action profile, it must be a strict Nash equilibrium.
In the following statements, we focus on the cases where all pure Nash equilibria are strict. We also assume that there do not exist redundant actions that always give the same constant payoff; for any $i$ and actions $a_i, a'_i \in A_i$ with $a_i \ne a'_i$, the following condition does not hold:

$$\pi_i(a_i, a_{-i}) = \pi_i(a'_i, a'_{-i}) \text{ for all } a_{-i}, a'_{-i} \in A_{-i}.$$
Lemma 1. For any initial assessments, players never end up playing $a$ if there exists $i$, such that:

$$\pi_i(a) \le \max_{a'_i \ne a_i} \min_{a_{-i}} \pi_i(a'_i, a_{-i}), \quad (4)$$

where, if the inequality holds with equality, then the following condition also holds; for $a'_i$ attaining the maximum: there exists $a_{-i} \in A_{-i}$, such that $\pi_i(a'_i, a_{-i}) > \pi_i(a)$.

Proof. To prove the statement, we consider the case in which Equation (4) holds strictly and the case in which Equation (4) holds with equality. In both cases, we prove by contradiction.

We first consider the case in which Equation (4) holds strictly. Assume that there exist assessments, such that players end up playing $a$. It implies that there exists $N$, such that for all $n \ge N$ and $a'_i \ne a_i$: $u_i^n(a_i) > u_i^n(a'_i)$. Note that, for all such $n$, $u_i^n(a_i)$ should be greater than $u_i^n(a'_i) \ge \min_{a_{-i}} \pi_i(a'_i, a_{-i})$; it is the condition for the other actions not to be chosen in the next period. However, since $u_i^n(a_i)$ converges to $\pi_i(a)$ as player $i$ keeps playing $a_i$, it contradicts the fact that Equation (4) holds strictly.

We next assume that Equation (4) holds with equality, and we again assume that there exist assessments, such that players end up playing $a$. Since we have the additional condition, it implies that, for the maximin action $a'_i$ and $n$ large enough, $u_i^n(a'_i) > \min_{a_{-i}} \pi_i(a'_i, a_{-i}) = \pi_i(a)$. However, since $u_i^n(a_i)$ converges to $\pi_i(a)$, it contradicts the fact that Equation (4) holds with equality. ☐
If Equation (4) is satisfied at every non-Nash equilibrium action profile, then players never end up playing one of them. It is also obvious that the condition is not satisfied at any strict Nash equilibrium. Equation (4) says that there exists a player who receives a payoff that is no greater than his maximin payoff. Though the condition limits the class of games, there still exist interesting games that satisfy it. For example, the stag hunt game satisfies Equation (4) at the non-Nash equilibrium action profiles and has the following payoff matrix:
| Rabbit | Stag |
Rabbit | 1,1 | 2,0 |
Stag | 0,2 | 5,5 |
At each non-Nash equilibrium action profile, one player decides to hunt a stag, while the other decides to hunt a rabbit. The player who decides to hunt a stag fails and receives nothing. Note that this payoff is less than the maximin payoff, which is obtained when both players decide to hunt a rabbit and share the rabbit.
Another coordination game that satisfies Equation (4) is the first order statistic game, where each player chooses a number from a finite set, and coordination occurs when all of them pick the same number. In addition, if players succeed in coordinating at a higher number, then they receive a better payoff. When they fail to coordinate on choosing the same number, the player who has chosen the smallest number receives the best payoff, the player who has chosen the second smallest number receives the second best payoff, and so on; the smaller the number the player has chosen, the better the payoff he receives. For example, we consider the case where each player picks a number from one to four and the payoff matrix of each player is expressed as follows:
| 1 | 2 | 3 | 4 |
1 | 1 | 1.5 | 1.5 | 1.5 |
2 | 0 | 2 | 2.5 | 2.5 |
3 | -1 | 0 | 3 | 3.5 |
4 | -2 | -1 | 0 | 4 |
The first column represents player i’s choice, while the first row represents the minimum value of his opponents’ choices. It is easy to see that at each Nash equilibrium, all players pick the same number. Since action 1 gives at least one, and players who fail to pick the smallest number receive at most zero, this game satisfies Equation (4).
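The payoff table above can be encoded compactly; the closed-form case split below is our reading of the table, not a formula from the paper:

```python
def payoff(own, opp_min):
    """Payoff of the four-action first order statistic game above, as a
    function of a player's own choice and the minimum of his opponents'
    choices."""
    if own == opp_min:
        return float(own)            # coordination on a common number
    if own < opp_min:
        return own + 0.5             # own choice is the overall minimum
    return float(opp_min - own + 1)  # penalty grows with the overshoot
```

Since `payoff(1, m)` is at least one for every `m`, while any player who overshoots the minimum receives at most zero, Equation (4) holds strictly at every non-Nash equilibrium action profile, exactly as stated in the text.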
In both games, Equation (4) holds strictly. In other games, such as the battle of the sexes game, Equation (4) holds with equality; in particular, $\pi_i(a) = \max_{a'_i \ne a_i} \min_{a_{-i}} \pi_i(a'_i, a_{-i})$ for all $i$ and $a \notin E$, where $E$ is the set of pure Nash equilibria. For instance, a battle of the sexes game has the following payoff matrix:

| Opera | Football |
Opera | 1,2 | 0,0 |
Football | 0,0 | 2,1 |
In the following theorem, we show that players end up playing a Nash equilibrium almost surely if: (i) Equation (4) is satisfied strictly at every non-Nash equilibrium action profile; or (ii) each player’s payoffs at all non-Nash equilibrium action profiles are equal.
Theorem 1. Players end up playing a strict Nash equilibrium almost surely if: (i) for all $a \notin E$, there exists $i$, such that:

$$\pi_i(a) < \max_{a'_i \ne a_i} \min_{a_{-i}} \pi_i(a'_i, a_{-i});$$

or (ii) for all $i$ and $a, a' \notin E$: $\pi_i(a) = \pi_i(a')$.

Proof. Case (i): (Only the intuition of the proof is provided here. For details, see the Appendix.) It is a direct consequence of Lemma 1 that if players end up playing one action profile, then it should be a strict Nash equilibrium. Therefore, it remains to be shown that they actually end up playing one. The intuition is as follows. Since non-Nash equilibrium action profiles cannot be played infinitely often, there exists a period after which players play only strict Nash equilibrium action profiles. Since we consider games in which all pure Nash equilibria are strict, players would have to change their actions at the same time whenever they move from one Nash equilibrium to another. However, since the sequence of weighting parameters is independent and each component has a density function that is positive on its domain, such perfectly correlated play on strict Nash equilibria is impossible.
Case (ii): Note that if condition (ii) is satisfied, then the payoff from any strict Nash equilibrium should be greater than the payoff from any non-Nash equilibrium action profile; for all $i$, $a^* \in E$ and $a \notin E$: $\pi_i(a^*) > \pi_i(a)$. Therefore, each strict Nash equilibrium is absorbing, and thus, once players play a strict Nash equilibrium, they play it forever. By the same logic as in the proof of (i), players cannot play only non-Nash equilibrium action profiles forever. That is, with probability one, players play a strict Nash equilibrium at some time and then play it in all subsequent periods. ☐
Theorem 1 shows that players end up playing a strict Nash equilibrium almost surely in games that satisfy condition (i) or (ii). To see this in detail, in the following sections, we investigate 2 × 2 and non-2 × 2 coordination games.
5. 2 × 2 Coordination Games
In this section, we focus on 2 × 2 coordination games, which have the following payoff matrix:

| | |
| a11, b11 | a12, b12 |
| a21, b21 | a22, b22 |

where $a_{11} > a_{21}$, $a_{22} > a_{12}$, $b_{11} > b_{12}$ and $b_{22} > b_{21}$. Note that, given these conditions, the pure Nash equilibria are the two diagonal action profiles, with payoffs $(a_{11}, b_{11})$ and $(a_{22}, b_{22})$. For the purposes of the analysis, we divide 2 × 2 coordination games into the following two mutually exclusive categories. In the first category, at each non-Nash equilibrium action profile, there exists at least one player who receives his worst payoff; $a_{12} = \min_{k,l} a_{kl}$ or $b_{12} = \min_{k,l} b_{kl}$, and $a_{21} = \min_{k,l} a_{kl}$ or $b_{21} = \min_{k,l} b_{kl}$. In the second category, there exists a non-Nash equilibrium action profile at which each player’s action corresponds to his unique maximin action; for instance, at the profile with payoffs $(a_{21}, b_{21})$: $\min\{a_{21}, a_{22}\} > \min\{a_{11}, a_{12}\}$ and $\min\{b_{11}, b_{21}\} > \min\{b_{12}, b_{22}\}$. Then, we have the following result:
Proposition 4. With probability one, players end up playing: (i) a strict Nash equilibrium in 2 × 2 coordination games in the first category; or (ii) a strict Nash equilibrium or the action profile at which each player’s action corresponds to his unique maximin action in 2 × 2 coordination games in the second category.
Proof. (i) Note that, by Theorem 1, we only have to consider the case in which only one of the inequalities in Equation (4) holds with equality: without loss of generality, we assume that Equation (4) holds strictly for player 2 at one non-Nash equilibrium action profile and holds with equality for player 1 at the other. Notice that each strict Nash equilibrium is absorbing. Since, at the former action profile, player 2 receives a payoff that is strictly lower than the worst payoff of his other action, players never play that action profile infinitely many times. Since player 1 receives his worst payoff, which equals his maximin payoff, at the latter action profile, players never end up playing it. Therefore, the last case to be considered is that players play only the two non-Nash equilibrium action profiles alternately forever without ending up playing one of them. However, this cannot happen, since the former action profile is played only finitely many times. Therefore, players end up playing one of the strict Nash equilibria.
(ii) Without loss of generality, we assume that the action profile at which each player’s action corresponds to his unique maximin action is the one with payoffs $(a_{21}, b_{21})$; thus, $\min\{a_{21}, a_{22}\} > \min\{a_{11}, a_{12}\}$ and $\min\{b_{11}, b_{21}\} > \min\{b_{12}, b_{22}\}$. Note that the other non-Nash equilibrium action profile is not played infinitely many times, since, at that action profile, each player receives his worst payoff. It is easy to show that there exists a case in which players end up playing the maximin action profile. What we have to show is that players actually end up playing a strict Nash equilibrium or the maximin action profile with probability one. To show that, we consider the following four possible cases, according to the number of absorbing states, under the uniform condition for each player’s tie break rule.
(1) Consider the case where there exist two absorbing states that correspond to the strict Nash equilibria. Since strict Nash equilibria are absorbing, it is obvious that players end up playing a strict Nash equilibrium or the maximin action profile.
(2) Consider the case where there exists one absorbing state that corresponds to a strict Nash equilibrium; let it be the one with payoffs $(a_{11}, b_{11})$. The only case to be considered is that players play only the other strict Nash equilibrium and the maximin action profile alternately without converging to one of them. Since both are played infinitely many times, it should be that $b_{21} \ge b_{22}$, which contradicts the condition for the coordination game. Thus, players end up playing a strict Nash equilibrium or the maximin action profile.
(3) Consider the case where the maximin action profile is absorbing. It can be shown by the argument in Theorem 1 that perfectly correlated play on the strict Nash equilibria is impossible, and thus, players end up playing a strict Nash equilibrium or the maximin action profile.
(4) Lastly, consider the case where there exists no absorbing state. Then, the following condition on players’ payoffs should hold: $a_{21} \ge a_{22}$ and $b_{21} \ge b_{11}$, where at least one of them should hold with equality. Without loss of generality, we assume that $a_{21} > a_{22}$ and $b_{21} = b_{11}$. Note that once the maximin action profile is played, it is played in all subsequent periods. Since, by the same logic as above, the remaining action profiles are not played alternately forever, players end up playing a strict Nash equilibrium or the maximin action profile. ☐
In fact, this categorization helps us to understand the long run outcomes of the adaptive learning process in 2 × 2 coordination games. In the following argument, we focus on specific coordination games: the battle of the sexes game, the stag hunt game, the game of chicken and market entry games.
Consider first the battle of the sexes game, which has the following payoff matrix:
| Opera | Football |
Opera | 1, 2 | 0, 0 |
Football | 0, 0 | 2, 1 |
In this game, the row player prefers going to a football game together to going to an opera together, while the column player enjoys going to the opera together rather than going to the football game together. However, players are worse off when they fail to coordinate to go to one of them. By Proposition 4, we know that players end up playing a strict Nash equilibrium almost surely.
There exists another form of the battle of the sexes game, which has the following payoff matrix:
| Opera | Football |
Opera | 1,2 | 0,0 |
Football | 0.5,0.5 | 2,1 |
Notice that the row player enjoys going to a football game alone rather than going to an opera alone. The column player is in the opposite situation; she enjoys going to the opera alone rather than going to the football game alone. In this case, it is a possible outcome that players fail to coordinate, and they end up playing their favored actions (football, opera).
We secondly consider the stag hunt game, which has the following payoff matrix:
| Stag | Rabbit |
Stag | 10,10 | 0,8 |
Rabbit | 8,0 | 4,4 |
In the stag hunt game, at each off-diagonal action profile, one player receives the worst payoff. Therefore, by Theorem 1 or Proposition 4, players end up playing a strict Nash equilibrium almost surely.
We thirdly consider the game of chicken:
| Swerve | Stay |
Stay | 1,-1 | -10,-10 |
Swerve | 0,0 | -1,1 |
The game describes the following situation. Two drivers face each other to show their braveness. When one driver swerves while his opponent stays, he shows his cowardice to the audience. If both drivers swerve, then both of them are safe and receive nothing. The best outcome for each driver is that he stays while the opponent swerves, so that he can show his braveness; the worst scenario is that both drivers stay and have a severe accident. Note that each player receives the worst payoff at (stay, stay). Therefore, by Proposition 4, players end up playing a strict Nash equilibrium or (swerve, swerve).
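Proposition 1’s sufficient condition can be checked mechanically; the helper below is our own sketch, applied to the chicken game above:

```python
from itertools import product

def is_absorbing(payoffs, profile, action_sets):
    """Check Proposition 1's sufficient condition: each player's payoff at
    `profile` is the strict maximum that his own action can give."""
    for i, ai in enumerate(profile):
        others = [payoffs[i][p] for p in product(*action_sets)
                  if p[i] == ai and p != profile]
        if others and payoffs[i][profile] <= max(others):
            return False
    return True

# Chicken payoffs from the text, keyed by (row action, column action).
chicken = [
    {("Stay", "Swerve"): 1, ("Stay", "Stay"): -10,
     ("Swerve", "Swerve"): 0, ("Swerve", "Stay"): -1},
    {("Stay", "Swerve"): -1, ("Stay", "Stay"): -10,
     ("Swerve", "Swerve"): 0, ("Swerve", "Stay"): 1},
]
acts = [["Stay", "Swerve"], ["Stay", "Swerve"]]
```

Note that (stay, swerve), although a strict Nash equilibrium, fails this check (the column player’s swerve pays more at (swerve, swerve)): Proposition 1’s condition is sufficient for absorption, not necessary.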
Lastly, consider a market entry game, which has the following payoff matrix:
| Stay Out | Enter |
Enter | 100,0 | -50,-50 |
Stay Out | 0,0 | 0,100 |
In this game, players have to decide whether to enter a market or stay out of it. If a player decides to stay out, then regardless of his opponent’s action, he receives nothing. If one player decides to enter while his opponent stays out, he enjoys the profit from the market. However, if both players decide to enter, they face severe competition and earn negative profits. In this case, by Proposition 4, players end up playing a strict Nash equilibrium or (stay out, stay out) almost surely.
7. Non-Random Weighting Parameters
7.1. Coordination Failure
In this section, we assume that players’ weighting parameters are not random variables. For example, players may believe that all past experiences equally represent the corresponding action’s value, that is, that the environments in which they are involved are stationary. Therefore, in each period, players put the same weight on all past experiences, and each player’s assessment becomes the arithmetic mean of past payoffs. Note that the weighting parameters for each player are then $\lambda_i^n(a_i) = \frac{1}{k_i^n(a_i) + 1}$ for all $i$ and $a_i$, where $k_i^n(a_i)$ is the number of times that the action, $a_i$, has been played until period $n$.
We also consider players who have the following weighting parameters: $\lambda_i^n(a_i) = \lambda \in (0, 1)$ for all $i$, $a_i$ and $n$, as in Sarin and Vahid [12]; all players have constant weighting parameters in all periods, that is, they always put the same weight on the received payoff in each period. It is reasonable to assume this condition if players believe that the situation they are facing is non-stationary. If $\lambda$ is close to one, then players believe that only the most recent payoffs give information about the values of the corresponding actions. If $\lambda$ is close to zero, then players believe that the initial assessments of actions mostly represent the actions’ values.
In this section, we consider the battle of the sexes game, in which players may play the off-diagonal action profiles alternately without ending up at a Nash equilibrium. In detail, we first consider the case where $\lambda_i^n(a_i) = \frac{1}{k_i^n(a_i) + 1}$ for all $i$, $a_i$ and $n$, and where the off-diagonal payoffs for each player are all equivalent; $a_{12} = a_{21}$ and $b_{12} = b_{21}$. In particular, we assume that $a_{12} = a_{21} = 0$ and $b_{12} = b_{21} = 0$.
As an example, suppose that the players’ initial assessments lead them to play one off-diagonal action profile in the first period, so that both players receive a payoff of zero. Updating then pulls the assessments of the chosen actions down toward zero, while the assessments of the other actions stay unchanged; hence, after some periods, players switch together to the other off-diagonal action profile and again both receive a payoff of zero. If the initial assessments involve an irrational component, $\epsilon$, then the assessments of each player’s two actions never coincide in any period, so no tie ever arises, and players keep switching between the two off-diagonal action profiles forever without ending up playing one of them.
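Under this frequentist scheme, each assessment is just a running average. A minimal sketch, assuming the convention that the initial assessment counts as one observation:

```python
def mean_update(current, payoff, n_obs):
    """Running-average update: `current` is the mean of `n_obs` observations;
    fold in one more payoff, which amounts to a weight of 1/(n_obs + 1)."""
    return current + (payoff - current) / (n_obs + 1)

# Initial assessment 5.0 treated as the first observation (an assumed
# convention), followed by realized payoffs 2, 4, 6:
u, k = 5.0, 1
for pi in [2, 4, 6]:
    u = mean_update(u, pi, k)
    k += 1
# u is now the arithmetic mean of 5, 2, 4, 6
```

Each successive payoff therefore receives a smaller and smaller weight, which is why the ratios of the initial assessments can keep the alternating pattern alive forever.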
When $\lambda_i^n(a_i) = \frac{1}{k_i^n(a_i) + 1}$ for all $i$, $a_i$ and $n$, the following statement shows the condition on initial assessments for coordination failure, that is, for perpetually alternating play on the off-diagonal action profiles. In this section, we assume that players’ tie break rules satisfy the inertia condition.
Proposition 7. In 2 × 2 coordination games with $a_{12} = a_{21} = b_{12} = b_{21} = 0$ under the inertia condition, if $\lambda_i^n(a_i) = \frac{1}{k_i^n(a_i) + 1}$ for all $i$, $a_i$ and $n$, then coordination failure occurs if and only if the players’ initial assessments satisfy the knife-edge condition given in the Appendix. Proof. See the Appendix. ☐
This result says that players will play the off-diagonal action profiles alternately forever if and only if the players’ ratios of initial assessments “coordinate”.
Next, we consider players who have the following weighting parameters: $\lambda_i^n(a_i) = \lambda$ for all $i$, $a_i$ and $n$. Then, the necessary and sufficient condition on initial assessments for coordination failure is as follows:
Proposition 8. In 2 × 2 coordination games with $a_{12} = a_{21} = b_{12} = b_{21} = 0$ under the inertia condition, if $\lambda_i^n(a_i) = \lambda$ for all $i$, $a_i$ and $n$, then coordination failure occurs if and only if the players’ initial assessments satisfy one of the two knife-edge conditions given in the Appendix. Proof. See the Appendix. ☐
Since players play a Nash equilibrium forever once they coordinate on it, for each case, the negation of the condition characterizes the success of coordination. For instance, if the off-diagonal payoffs are all zero and players are frequentists, then they coordinate in some period and in all subsequent periods if and only if their initial assessments violate the condition in Proposition 7.
7.2. Non-Convergence to a Mixed Nash Equilibrium
It is an interesting question whether the empirical frequencies of play on the off-diagonal action profiles converge to the mixed Nash equilibrium. In fictitious play, Monderer and Shapley [
9] show that every 2 × 2 game with the diagonal property
has the fictitious play property; the empirical frequencies of past play, which form each player’s belief about his opponent’s behaviour, converge to a Nash equilibrium.
First, note that 2 × 2 coordination games with $a_{12} = a_{21} = b_{12} = b_{21} = 0$ also have the diagonal property. In this case, under the condition for coordination failure, players forever play the off-diagonal action profiles alternately. However, the frequency of play need not converge to the mixed Nash equilibrium. We show this by an example. Consider a pure coordination game, which has the following payoff matrix:
| | |
| 1,2 | 0,0 |
| 0,0 | 2,1 |
We assume constant weighting parameters and initial assessments for the players such that, under the inertia condition for both players, players play only the two off-diagonal action profiles: in the first periods, the assessments of the chosen actions fall toward zero while those of the unchosen actions stay unchanged, players switch between the off-diagonal action profiles, and the play eventually alternates between them. Therefore, the empirical frequencies of play for both players converge to $(1/2, 1/2)$, while the mixed Nash equilibrium in this game is $((1/3, 2/3), (2/3, 1/3))$.
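A concrete instance of such correlated play can be sketched as follows; the initial assessments and the constant weight $\lambda = 1/2$ are our own illustrative choices for the payoff matrix above, with actions labelled T/B for the row player and L/R for the column player:

```python
# Coordination failure under constant weights in the pure coordination
# game above, keyed by (row action, column action).
payoffs = [
    {("T", "L"): 1, ("T", "R"): 0, ("B", "L"): 0, ("B", "R"): 2},  # row
    {("T", "L"): 2, ("T", "R"): 0, ("B", "L"): 0, ("B", "R"): 1},  # column
]
u = [{"T": 0.6, "B": 0.5}, {"L": 0.5, "R": 0.6}]
lam = 0.5
history = []
for _ in range(40):
    # assessments never tie here, so a plain argmax needs no tie break rule
    a = (max(u[0], key=u[0].get), max(u[1], key=u[1].get))
    history.append(a)
    u[0][a[0]] = (1 - lam) * u[0][a[0]] + lam * payoffs[0][a]
    u[1][a[1]] = (1 - lam) * u[1][a[1]] + lam * payoffs[1][a]

freq_T = sum(a[0] == "T" for a in history) / len(history)
# play alternates (T, R), (B, L), (T, R), ...; freq_T is exactly 0.5
```

Every realized payoff is zero, so each chosen action’s assessment halves while the other is frozen; the play alternates between the two off-diagonal profiles, and each player’s empirical frequencies converge to (1/2, 1/2) rather than to the mixed Nash equilibrium ((1/3, 2/3), (2/3, 1/3)).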