This article is an open-access article distributed under the terms and conditions of the Creative Commons Attribution license (http://creativecommons.org/licenses/by/3.0/).

The finitely repeated Prisoners’ Dilemma is a good illustration of the discrepancy between the strategic behaviour suggested by a game-theoretic analysis and the behaviour often observed among human players, where cooperation is maintained through most of the game. Game-theoretic reasoning based on backward induction eliminates strategies step by step until defection from the first round is the only remaining choice, reflecting the Nash equilibrium of the game. We investigate the Nash equilibrium solution for two different sets of strategies in an evolutionary context, using replicator-mutation dynamics. The first set consists of conditional cooperators, up to a certain round, while the second set additionally contains two strategy types that react differently to the first-round action: the “Convincer” strategies insist on two rounds of initial cooperation, trying to establish more cooperative play in the game, while the “Follower” strategies, although first-round defectors, are able to respond to an invitation to cooperate made in the first round. For both of these strategy sets, iterated elimination of strategies shows that the only Nash equilibria are given by defection from the first round. We show that the evolutionary dynamics of the first set is always characterised by a stable fixed point, corresponding to the Nash equilibrium, if the mutation rate is sufficiently small (but still positive). The second strategy set is investigated numerically, and we find that there are regions of parameter space where fixed points become unstable and the dynamics exhibits cycles of different strategy compositions. The results indicate that, even in the limit of very small mutation rate, the replicator-mutation dynamics does not necessarily bring the system with Convincers and Followers to the fixed point corresponding to the Nash equilibrium of the game. We also perform a detailed analysis of how the evolutionary behaviour depends on payoffs, game length, and mutation rate.

During the past two decades there has been a huge expansion in the development and use of agent-based models for a variety of societal systems and economic phenomena, ranging from markets of various types and societal activities such as energy systems and land use, see e.g., [

In the modeling and construction of agents it is therefore of high importance that the assumptions made on rationality and the reasoning process are made explicit. Binmore discusses this in his classic papers ”Modeling rational players” [

One of the major achievements in game theory is the establishment of the Nash equilibrium concept and the existence proof that any finite game has at least one such equilibrium [

In several finitely repeated games, in which the number of rounds is known, the solution of how players choose actions can be guided by the backward induction procedure. This is often exemplified by the Prisoners’ Dilemma, for which the single round game has a unique Nash equilibrium with both players defecting, while the indefinitely repeated game has an uncountable infinity of equilibria allowing for cooperation. However, when the exact number of rounds,

There are at least two important objections to the generality of the reasoning based on backward induction. The first objection is empirical: studies of how human players behave in the game show a substantial level of cooperation, with a transition to lower levels of cooperation towards the end. Several explanations have been proposed, which suggests that several mechanisms are at play. For example, it has been observed in the laboratory that subjects cooperate initially but attempt to cheat each other by deviating towards the end [

The second objection is conceptual and strongly connected to the notion of rationality and what can be considered as a rational way of reasoning. The only equilibrium that can exist in a given finite repetition is the Nash equilibrium, but whether that is to be considered as rational is the question. A critical point concerns what conclusion a player should draw if the opponent deviates from what backward induction implies and instead cooperates

In this situation, the choice between (i) following backward induction and defecting from the start and (ii) deviating from backward induction by starting with cooperation becomes a strategic decision. One can imagine “rational” players in both categories. In the first category, there are two options: either one plays defect throughout the game whatever the opponent does, as backward induction suggests, or one switches to cooperation if the opponent cooperates. In the second category, both players again face the question of when to switch to defection, provided the opponent is cooperating. Obviously, there cannot exist a fixed procedure for deciding when it is optimal to switch from cooperation to defection, since such a strategy would be dominated by the one that switches one round earlier. However, the interaction and survival of different ways of handling first-round cooperation can be studied using evolutionary methods.

The purpose of this paper is to investigate in an evolutionary context the performance of strategies representing the strategic choices discussed above in the finitely repeated Prisoners’ Dilemma. In Binmore’s terms, we focus on an evolutive process, in which each agent has a certain, relatively simple strategy for the game, and the mix of strategies and their evolution is investigated on the population level. Importantly, the chosen strategies can all be seen as components in the reasoning processes discussed above: both (i) the steps involved in the backward induction process, and (ii) the steps initiating and responding to cooperation in the first round which then reflects the possibility for strategies to deviate from equilibrium play. It is well known that evolutionary drift or mutations, at least if sufficiently strong, can drive the population away from a fixed point corresponding to the Nash equilibrium. Under what circumstances does the evolutionary dynamics lead to the same result as the backward induction process with a Nash equilibrium as its fixed point, and when can deviation from Nash equilibrium play alter that process? The answer, which is elaborated in this paper, depends on choices of a number of critical model characteristics and parameters: selected strategy space, mutation rate, payoff matrix, and the length of the game.

We prove that, for a simple set of strategies,

Most of the work related to evolutionary dynamics, backward induction, and the finitely repeated Prisoners’ Dilemma concerns the replicator dynamics [

Several authors have investigated various types of evolutionary dynamics under the effect of perturbations or mutations ([

Ponti [

Our work is also related to the literature on the ”backward induction paradox” [

The evolution of strategies in the finitely repeated Prisoners’ Dilemma is studied using replicator dynamics with a uniform mutation rate. This is a model of an infinite population where all interact with all, and in which each strategy _{i}_{i}

The score

Finite state machine illustrating the first strategy set, Γ_{1}, of the _{i}_{0} that starts with defection in the right node (D). _{i}

We investigate the evolutionary behaviour considering two sets of strategies. The first one is a strategy set Γ_{1} that represents various levels of depth in applying the backward induction procedure to conditional cooperation. A strategy in this set is denoted _{k}_{N}_{0} defects from the first round. It is then clear that strategy _{k}_{k}_{+1} (for _{0}.
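The conditional cooperators described above can be sketched as a small round-by-round simulation. This is a minimal illustration and not the authors' code. Several ingredients are assumptions: the function names `action` and `play`, the reading of "conditional cooperation" as a grim trigger (a player defects for the rest of the game once the opponent has defected), and the payoff normalisation R = 1, S = 0. The values T = 1.33 and P = 0.2 are those of the example game used later in the paper.

```python
# Payoffs: R and S_PAYOFF are an assumed normalisation; T and P follow the
# example game (P = 0.2, T = 1.33).
R, S_PAYOFF, T, P = 1.0, 0.0, 1.33, 0.2
PAYOFF = {("C", "C"): R, ("C", "D"): S_PAYOFF, ("D", "C"): T, ("D", "D"): P}


def action(k, round_no, opponent_defected):
    """Action of the level-k conditional cooperator in round round_no (1-based).

    Assumption: cooperate while round_no <= k and the opponent has never
    defected; otherwise defect.
    """
    if round_no > k or opponent_defected:
        return "D"
    return "C"


def play(k_a, k_b, n_rounds=10):
    """Return the total scores (a, b) when level k_a meets level k_b."""
    score_a = score_b = 0.0
    a_defected = b_defected = False  # has each player defected so far?
    for round_no in range(1, n_rounds + 1):
        act_a = action(k_a, round_no, b_defected)
        act_b = action(k_b, round_no, a_defected)
        score_a += PAYOFF[(act_a, act_b)]
        score_b += PAYOFF[(act_b, act_a)]
        a_defected = a_defected or act_a == "D"
        b_defected = b_defected or act_b == "D"
    return score_a, score_b
```

Under these assumptions, `play(9, 10)` gives the level-9 player 9R + T, which exceeds the 10R that two level-10 players earn against each other, and a player deviating from mutual level-0 play earns S + 9P instead of 10P.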

Note that the entire strategy set Ω for the N-round game contains 2^{2^N − 1} possible strategies, e.g., about 10^{308} for N = 10. The selection of strategies to consider is critical. One can certainly introduce strategies so that other Nash equilibria are introduced along with those characterized by unconditional defection. For example, selecting only three strategies, e.g., {_{0}, _{5}, _{10}} will lead to a game with three equilibria, one for each strategy. But this is an artefact of the specific selection made. Here, we do not want to create new Nash equilibria; we want to investigate whether and how the evolutionary dynamics brings the population to a fixed point dominated by defect actions, corresponding to the original Nash equilibrium. One way to achieve this is to make sure that iterated elimination of weakly dominated strategies can be applied to the constructed strategy set in such a way that only strategies that defect throughout the game remain, keeping the non-cooperative characteristic of the Nash equilibrium.
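Iterated elimination of weakly dominated strategies can be checked mechanically for the first strategy set. The sketch below is illustrative only. It assumes grim-trigger conditional cooperators indexed by the last round k in which they are willing to cooperate, the payoff normalisation R = 1, S = 0, and the example values T = 1.33, P = 0.2. The function names are hypothetical, and all strategies dominated at a given step are removed simultaneously, which is one of several possible elimination orders.

```python
R, S_PAYOFF, T, P = 1.0, 0.0, 1.33, 0.2  # assumed normalisation, example T and P


def payoff(j, k, n_rounds=10):
    """Score of level j against level k (closed form under the assumptions above)."""
    m = min(j, k)  # number of rounds with mutual cooperation
    if j == k:
        return m * R + (n_rounds - m) * P
    # In round m + 1 the lower level defects first, then both defect.
    first = T if j < k else S_PAYOFF
    return m * R + first + (n_rounds - m - 1) * P


def weakly_dominated(s, alive, n_rounds):
    """True if some other surviving strategy weakly dominates s."""
    for d in alive:
        if d == s:
            continue
        diffs = [payoff(d, o, n_rounds) - payoff(s, o, n_rounds) for o in alive]
        if all(x >= 0 for x in diffs) and any(x > 0 for x in diffs):
            return True
    return False


def iterated_elimination(n_rounds=10):
    """Return (surviving levels, list of levels removed at each step)."""
    alive = list(range(n_rounds + 1))
    removed_per_step = []
    while True:
        dominated = [s for s in alive if weakly_dominated(s, alive, n_rounds)]
        if not dominated:
            return alive, removed_per_step
        removed_per_step.append(dominated)
        alive = [s for s in alive if s not in dominated]
```

With these assumptions the procedure removes one level per step, starting from the strategy that cooperates throughout, and only the level-0 strategy (defection from the first round) survives, mirroring the backward induction argument.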

The second set of strategies Γ_{2} (in _{1} to include also strategies corresponding to steps of reasoning in which one (i) tries to establish cooperation even if the opponent defects in the first round and (ii) responds to such attempts by switching to cooperation for a certain number of rounds. We refer to such strategies as ”Convincers” and ”Followers”, respectively.

Finite state machine illustrating the extended strategy set Γ_{2} consisting of the strategies _{i}_{i}_{i}_{i}_{i}_{i}_{i}_{i}_{i}

A Convincer strategy _{k}_{k}

A Follower strategy _{k}_{k}_{k}_{k}_{k}

For the extended strategy set, Γ_{2}, it is straightforward to see that iterated elimination of weakly dominated strategies, starting with those cooperating throughout the game, leads to a Nash equilibrium with only defectors.

For the first strategy set, Γ_{1}, the Nash equilibrium of (_{0}, _{0}) is strict since any player deviating would score less. For the second strategy set, Γ_{2}, this Nash equilibrium is no longer strict as one of the players could switch to a Follower strategy, still defecting and scoring the same. For the first strategy set, the NE is unique, but for the second one that is not necessarily the case. Since backward induction still applies in the second set, we know that any NE is characterized by defection only, which can be represented by a pair (_{0}, _{k}_{0} player cannot gain by switching to _{k}_{−}_{1} (or to _{2} if _{0}, _{k}_{j}_{k}

This means that in a part of the payoff parameter region, for _{0}, _{0}). This illustrates the fact that in the NE one can switch from a pure defector to a Follower without reducing the payoff. If this happens under genetic drift in evolutionary dynamics, the situation may change so that Convincers may benefit and cooperation can emerge.

The dynamic behaviour and the stability properties of the fixed points are investigated both analytically and numerically, for the two strategy sets presented in _{N}_{i}_{k}_{k}

Realisations of the evolutionary dynamics, Equation (3), for both the simple and the extended strategy sets are shown in

Illustration of the dynamics for a particular game (P=0.2, T=1.33) for 10-round Repeated Prisoners’ Dilemma (_{0}, ..., _{10}). Below: the extended strategy set, which includes also Convincers (_{2}, ..., _{10}) and Followers (_{2}, ..., _{10}). When lowering

First, we consider the case with the simple strategies (_{0}, ..., _{10}) in _{10} players, the dynamics will, for both levels of mutation rate, lead to a gradual unraveling of cooperation to a point where _{0}, full defection, dominates the population. The first step of this unraveling occurs because _{9} defecting in the final round obtains a higher payoff than _{10}. At this stage _{0} is much worse off, but the population goes through a series of transitions reminiscent of a backward induction process. This can also be seen in terms of average payoff, as illustrated in ^{−}^{12}, there is no re-appearance of cooperation. When the mutation rate gets too low, strategies other than defection are kept at a level that is too low to promote further cooperation. This demonstrates that the mutation rate can affect whether cooperation re-appears or not.

Second, we consider the case with extended strategies (the 3_{0}, ..._{10}, _{2}, ..., _{10}, _{2}, ..., _{10}) shown at the bottom of

In the next section, we will investigate the dynamics and the stability characteristics of both the simple and the extended strategy sets in detail, varying the payoff parameters over the full ranges, and investigating the behaviour in the limit of diminishing mutation rate.

We now turn to examine the existence of stable fixed points in the dynamics for low mutation rates. For the simple strategy set the following proposition holds.

_{1}, the fixed point associated with the Nash equilibrium, dominated by strategy _{0}, is stable under the replicator-mutation dynamics, if the mutation rate is sufficiently small (but positive).
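The proposition can be illustrated numerically, although such a check is no substitute for the proof. The sketch below does not reproduce the paper's Equation (3). It assumes a commonly used replicator-mutation form, dx_i/dt = x_i (s_i − s̄) + ε (1/n − x_i), together with grim-trigger conditional cooperators, the normalisation R = 1, S = 0, and the example values T = 1.33, P = 0.2. All function names are hypothetical, and a plain Euler scheme is used for brevity.

```python
R, S_PAYOFF, T, P = 1.0, 0.0, 1.33, 0.2  # assumed normalisation, example T and P
N_ROUNDS = 10


def payoff(j, k, n_rounds=N_ROUNDS):
    """Score of level j against level k under the assumptions stated above."""
    m = min(j, k)
    if j == k:
        return m * R + (n_rounds - m) * P
    first = T if j < k else S_PAYOFF
    return m * R + first + (n_rounds - m - 1) * P


def derivative(x, eps, u):
    """Assumed form: dx_i/dt = x_i (s_i - s_avg) + eps (1/n - x_i)."""
    n = len(x)
    s = [sum(u[i][j] * x[j] for j in range(n)) for i in range(n)]
    s_avg = sum(x[i] * s[i] for i in range(n))
    return [x[i] * (s[i] - s_avg) + eps * (1.0 / n - x[i]) for i in range(n)]


def integrate(x, eps, t_end=50.0, dt=0.05):
    """Euler integration of the assumed dynamics for the simple strategy set."""
    n = len(x)
    u = [[payoff(i, j) for j in range(n)] for i in range(n)]
    for _ in range(int(t_end / dt)):
        dx = derivative(x, eps, u)
        x = [xi + dt * dxi for xi, dxi in zip(x, dx)]
        total = sum(x)
        x = [xi / total for xi in x]  # guard against round-off drift
    return x


def perturbed_start(delta=0.01, n=N_ROUNDS + 1):
    """Population of mostly level-0 players, with delta spread over the rest."""
    return [1.0 - delta] + [delta / (n - 1)] * (n - 1)
```

Calling `integrate(perturbed_start(0.01), eps=1e-6)` starts with 1% of the population spread evenly over the cooperating levels. Under the stated assumptions the share of full defectors grows back towards one, consistent with a locally stable fixed point. Larger perturbations need not return, since stability here is only local.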

Numerical analysis showing which games have stable fixed points. Stable fixed points have been found to the right of a given line, at which they disappear if

Our results for the extended strategy set, Γ_{2}, are based on numerical investigations: by using an eigenvalue analysis of the Jacobian of the replicator-mutation dynamics, Equation (3), we determine for which parameters

This stability analysis over the parameter space and the results are presented for different lengths of the game in

Note from the discussion in

Motivated by the findings above in

The interesting case is when mutation rates are small: higher mutation rates introduce a background of all different strategies, which can be seen as artificially keeping up cooperative behaviour in the population. To avoid this effect, we investigate the dynamics with 0 <

First, we characterise the simple and the extended strategy set for different game lengths, varying mutation rates, and varying the parameters of _{0} strategy becomes stable. On the other hand, for the extended strategy set, a considerable fraction of games seem to offer recurring phases of cooperation despite lowering the mutation. For the 10-round game, the line describing the critical parameters is seen to converge as the mutation rate decreases, _{k}_{k}

The discussion in

Parameter diagram showing which games have recurrent cooperation in the evolutionary dynamics from a starting point of initial cooperation. Top row: simple strategy set. Middle row: extended strategy set with Convincers and Followers. In the graphs, recurrent cooperation exists to the left of the line and a fixed point with defectors characterizes the behaviour to the right. For simple strategies, lowering

A key point in game theory is that a player’s strategic choice must take into account the strategic choice the opponent is making. For finitely repeated games, backward induction has become established as a solution concept by assuming that players’ beliefs are based on common knowledge of rationality. However, this assumption says nothing about how players would react to a deviation from full defection (that is, from the Nash equilibrium), since it rules out a priori actions and reactions that exemplify other ways of reasoning.

Motivated by the general importance of backward induction and what has been called its “paradox”, we have introduced an evolutionary analysis of the interaction in a population of strategies that react differently to out-of-equilibrium play in the first round of the game. We have shown how extending a strategy set to allow for this possibility, in the special case of the repeated Prisoners’ Dilemma, allows for stable limit cycles in which cooperative players return after a period of defection. Convincers and Followers, representing strategies that try to establish cooperation and strategies that are capable of responding to such attempts, respectively, are introduced in a way that preserves the structure of the selected strategy set, so that elimination of weakly dominated strategies still leads to full defection.

For the simple strategy set, as the mutation rate becomes sufficiently small, the cyclic behaviour disappears and the system is attracted to a stable fixed point. The stability of this fixed point was shown analytically for a sufficiently small mutation rate

For the extended strategy set, at low levels of mutation, the numerical investigation of fixed-point stability and oscillatory modes indicates that, for a certain part of the payoff parameter space, the evolutionary dynamics does not reach a stable fixed point but stays in an oscillatory mode, unlike the case of the simple strategy set. We characterise our results by a detailed quantitative analysis of where this occurs, showing how the length of the repeated game and the mutation rate affect the boundaries of this region.

One of the main results of the study is an affirmative answer to the question whether different responses to out-of-equilibrium play in the first round can make the dynamics avoid fixed points, and the corresponding Nash equilibrium. Additionally, the fixed point analysis showed the co-existence of a stable fixed point and stable oscillations with recurring phases of cooperation. This means that a system with different responses to out-of-equilibrium play may be found far from its possible stable fixed point. Taken together, this illustrates that the Nash equilibrium play can be unstable at the population level when mutations make explorations off the equilibrium path possible.

This paper contributes to the backward induction discussion in game theory, but more broadly to the study of repeated social and economic interaction. Many models, typically much larger and less transparent ones, of social and economic systems involve agents. If solving these systems means finding the Nash equilibria, then one may doubt whether that is a good representation of rational behaviour except under certain conditions as we have discussed in the paper. We have shown that strategies corresponding to the Nash equilibrium cannot be taken for granted when they interact and compete with strategies that act and respond differently to out-of-equilibrium play.

Financial support from the Swedish Energy Agency is gratefully acknowledged. We would also like to thank two anonymous reviewers for constructive criticism and for inspiring us to prove Proposition 1. We also thank David Bryngelsson for valuable comments on the introduction.