Correction published on 21 July 2010, see Games 2010, 1(3), 221-225.
Article

A Choice Prediction Competition for Market Entry Games: An Introduction

Ido Erev, Eyal Ert and Alvin E. Roth

1 Max Wertheimer Minerva Center for Cognitive Studies, Faculty of Industrial Engineering and Management, Technion, Haifa 32000, Israel
2 Computer Laboratory for Experimental Research, Harvard Business School, Boston, MA 02163, USA
3 Department of Economics, 308 Littauer, Harvard University, Cambridge, MA 02138, USA
4 Harvard Business School, 441 Baker Library, Boston, MA 02163, USA
* Author to whom correspondence should be addressed.
Games 2010, 1(2), 117-136; https://doi.org/10.3390/g1020117
Submission received: 30 April 2010 / Accepted: 12 May 2010 / Published: 14 May 2010
(This article belongs to the Special Issue Predicting Behavior in Games)

Abstract

A choice prediction competition is organized that focuses on decisions from experience in market entry games (http://sites.google.com/site/gpredcomp/ and https://www.mdpi.com/si/games/predict-behavior/). The competition is based on two experiments: an estimation experiment, and a competition experiment. The two experiments use the same methods and subject pool, and examine games randomly selected from the same distribution. The current introductory paper presents the results of the estimation experiment, and clarifies the descriptive value of several baseline models. The experimental results reveal the robustness of eight behavioral tendencies that were documented in previous studies of market entry games and individual decisions from experience. The best baseline model (I-SAW) assumes reliance on small samples of experiences, and strong inertia when the recent results are not surprising. The competition experiment will be run in May 2010 (after the completion of this introduction), but its results will not be revealed until September. To participate in the competition, researchers are asked to e-mail the organizers models (implemented in computer programs) that read the incentive structure as input, and derive the predicted behavior as output. The submitted models will be ranked based on their prediction error. The winners of the competition will be invited to publish a paper that describes their model.

1. Background

Previous studies of the effect of experience on economic behavior highlight the potential of simple learning models. Several investigations demonstrate that simple models can provide surprisingly accurate ex-ante predictions of behavior in some situations. For example, Erev and Roth [1] demonstrate that a 3-parameter reinforcement learning model provides useful predictions of choice rates in 12 games with a unique mixed-strategy equilibrium. Additional indications of the potential of simple learning models come from the observed similarities of the basic reaction to feedback across species (e.g., [2,3,4]), and the discovery that the activity of certain dopamine neurons is correlated with one of the terms assumed by reinforcement learning models [5].
However, more recent studies reveal that the task of advancing beyond the demonstrations of the potential of simple models is not simple. Different studies appear to support different models, and the relationship between the distinct results is not always clear [6].
We believe there are two main reasons for the inconsistencies in the learning-in-games literature. First is the fact that learning is only one of the factors that affect behavior in repeated games. Other important factors include: framing, fairness, reciprocation, and reputation. It is possible that different studies reached different conclusions because they studied learning in environments in which these important factors have different implications.
A second cause of confusion is a tendency to focus on relatively small data sets and relatively small sets of models. Erev and Roth [1] tried to address this problem by studying 12 games and two families of models. We now understand that this data set and group of models were not large enough. Recent research shows that it is surprisingly easy to over-fit learning data sets [7,8].
Erev et al. [9] took two measures to address these problems. The first is an extensive experimental study of the effect of experience under conditions that minimize the effect of other factors. The second is the organization of an open choice prediction competition that facilitates the evaluation of a wide class of models. Specifically, Erev et al. organized a simplified version of the competition run by Arifovic, McKelvey, and Pevnitskaya [10]. They ran two large experiments examining different problems drawn randomly from the same space, and challenged other researchers to predict the results of the second study based on evaluation of the results of the first study. The main result of (the repeated decisions from experience part of) that investigation was an indication of a clear advantage of models that assume instance-based reasoning (and reliance on a small set of experiences) over more popular models that assume sequential adaptation of propensities (like reinforcement learning and fictitious play). The winner of the competition was an instance-based model that assumes an ACT-R cognitive architecture (submitted by Stewart, West and Lebiere based on [11]).
The main goal of the current research is to extend Erev et al.’s competition along two dimensions. The first extension involves the source of uncertainty. Erev et al. focused on individual decisions under uncertainty; thus, the environment (state of nature) was the sole source of uncertainty in the situations they studied. The current study focuses on games that involve both environmental and strategic uncertainty. The second extension is related to the available feedback. Erev et al. focused on situations in which the feedback is limited to the obtained outcomes. The current study focuses on learning under more complete feedback.
We believe that the set of situations considered here (decisions under environmental and strategic uncertainty based on complete feedback when the role of framing, fairness, reciprocation and reputation is small) is more than a good test bed for learning models. This set of situations is also a simulation of many natural environments that have similar characteristics. One set of natural examples involves transportation dilemmas. When deciding how to commute to work, for example, commuters are likely to rely on past experience and pay limited attention to fairness and similar considerations. To clarify the relationship of the current investigation to this class of natural problems, we chose to focus on Market Entry games [12,13,14] that model simple transportation dilemmas. Specifically, we consider 4-person, 2-alternative Market Entry games. Each player in these games has to decide between a safe option and risky entry to a market in which the payoff decreases with the number of entrants. Under one transportation cover story, the safer option abstracts “taking the train”, and the risky option abstracts “driving”.
Previous experimental studies of behavior in Market Entry games [14,15,16] reveal surprisingly fast convergence to Nash equilibrium. For example, experimental study of the market entry game presented in Table 1 documents convergence to equilibrium after 5 trials even when the players did not receive a description of the incentive structure [16]. At first glance this observation appears to be inconsistent with the observation of a robust deviation from maximization even after hundreds of trials in studies of individual decisions under uncertainty based on feedback [17]. However, it is possible that the same learning process leads to different behaviors in different environments (as in e.g. [18]). We hope that the present study will shed light on this possibility.
Table 1. An example of an experimental study of decisions from experience in a market entry game [16].
At each trial, each of 12 players has to decide (individually) between “entering a risky market”, or “staying out” (a safer prospect). The payoff from entering decreases with the number of entrants (E). The exact payoff for player i at trial t is:
$$V_i(t) = \begin{cases} 1 + 2(8 - E) & \text{if } i \text{ enters} \\ 1 & \text{if } i \text{ does not enter} \end{cases}$$
The participants did not receive a description of the incentive structure. Their information was limited to the payoff they received after each trial.
Like Erev et al.’s [9] competition, the current competition is based on the data from two lab experiments: an estimation experiment, and a competition experiment. The estimation experiment was run in March 2010 and focused on the 40 games presented in Table 2. The experimental procedure and the results obtained in this study are presented below. The competition experiment will be run after 28 April 2010 (after the completion of the current introduction paper). It will use the same method as the estimation experiment, but will focus on different games and different experimental subjects.
Table 2. The 40 market entry games that were studied in the estimation experiment. At each trial, each of four players has to decide (individually) between “entering a risky market”, or “staying out” (a safer prospect). The payoffs depend on a realization of a binary gamble (the realization at trial t is denoted Gt, and yields “H with probability Ph; and L otherwise”), the number of entrants (E), and two additional parameters (k and S). The exact payoff for player i at trial t is:
$$V_i(t) = \begin{cases} 10 - k(E) + G_t & \text{if } i \text{ enters} \\ \operatorname{round}(G_t/S) \text{ with } p = .5;\ -\operatorname{round}(G_t/S) \text{ otherwise} & \text{if } i \text{ does not enter} \end{cases}$$
The left-hand columns present the exact values of the parameters in the 40 games; the right-hand columns present the equilibrium predictions and the main experimental results in the first and second blocks of 25 trials (B1 and B2).
| # | k | Ph | H | L | S | Eq. entry (pure) | Eq. entry (symmetric) | Entry B1 | Entry B2 | Efficiency B1 | Efficiency B2 | Alternations B1 | Alternations B2 |
|---|---|----|---|---|---|------|------|------|------|------|------|------|------|
| 1 | 2 | 0.04 | 70 | -3 | 5 | 1.00 | 1.00 | 0.71 | 0.80 | 2.77 | 2.66 | 0.18 | 0.15 |
| 2 | 2 | 0.23 | 30 | -9 | 4 | 1.00 | 1.00 | 0.55 | 0.62 | 2.64 | 2.75 | 0.27 | 0.24 |
| 3 | 2 | 0.67 | 1 | -2 | 3 | 1.00 | 1.00 | 0.88 | 0.94 | 2.39 | 2.24 | 0.12 | 0.04 |
| 4 | 2 | 0.73 | 30 | -80 | 4 | 1.00 | 1.00 | 0.71 | 0.64 | 2.58 | 2.57 | 0.30 | 0.28 |
| 5 | 2 | 0.80 | 20 | -80 | 5 | 1.00 | 1.00 | 0.66 | 0.67 | 2.50 | 2.67 | 0.34 | 0.28 |
| 6 | 2 | 0.83 | 4 | -20 | 3 | 1.00 | 1.00 | 0.73 | 0.82 | 2.45 | 2.50 | 0.28 | 0.18 |
| 7 | 2 | 0.94 | 6 | -90 | 5 | 1.00 | 1.00 | 0.86 | 0.87 | 2.34 | 2.38 | 0.15 | 0.13 |
| 8 | 2 | 0.95 | 1 | -20 | 5 | 1.00 | 1.00 | 0.86 | 0.91 | 2.48 | 2.31 | 0.14 | 0.10 |
| 9 | 2 | 0.96 | 4 | -90 | 3 | 1.00 | 1.00 | 0.87 | 0.90 | 2.36 | 2.34 | 0.14 | 0.08 |
| 10 | 3 | 0.10 | 70 | -8 | 4 | 0.75 | 0.77 | 0.42 | 0.48 | 1.22 | 1.11 | 0.35 | 0.28 |
| 11 | 3 | 0.90 | 9 | -80 | 4 | 0.75 | 0.77 | 0.80 | 0.73 | -0.33 | 0.29 | 0.20 | 0.24 |
| 12 | 3 | 0.91 | 7 | -70 | 6 | 0.75 | 0.77 | 0.76 | 0.83 | 0.10 | -0.41 | 0.21 | 0.14 |
| 13 | 4 | 0.06 | 60 | -4 | 2 | 0.50 | 0.50 | 0.42 | 0.41 | 0.52 | 0.84 | 0.27 | 0.17 |
| 14 | 4 | 0.20 | 40 | -10 | 4 | 0.50 | 0.50 | 0.48 | 0.46 | -0.34 | 0.04 | 0.36 | 0.34 |
| 15 | 4 | 0.31 | 20 | -9 | 4 | 0.50 | 0.50 | 0.49 | 0.44 | -0.07 | 0.30 | 0.38 | 0.37 |
| 16 | 4 | 0.60 | 4 | -6 | 2 | 0.50 | 0.50 | 0.56 | 0.58 | -0.27 | -0.26 | 0.28 | 0.30 |
| 17 | 4 | 0.60 | 40 | -60 | 3 | 0.50 | 0.50 | 0.58 | 0.55 | -0.96 | -0.20 | 0.33 | 0.28 |
| 18 | 4 | 0.73 | 3 | -8 | 2 | 0.50 | 0.50 | 0.57 | 0.55 | -0.29 | 0.09 | 0.26 | 0.23 |
| 19 | 4 | 0.80 | 20 | -80 | 2 | 0.50 | 0.50 | 0.64 | 0.63 | -1.30 | -1.21 | 0.29 | 0.28 |
| 20 | 4 | 0.90 | 1 | -9 | 6 | 0.50 | 0.50 | 0.53 | 0.48 | 0.12 | 0.63 | 0.24 | 0.18 |
| 21 | 4 | 0.96 | 3 | -70 | 3 | 0.50 | 0.50 | 0.65 | 0.62 | -0.84 | -0.38 | 0.27 | 0.19 |
| 22 | 5 | 0.02 | 80 | -2 | 3 | 0.25 | 0.33 | 0.36 | 0.31 | 0.24 | 0.64 | 0.21 | 0.19 |
| 23 | 5 | 0.07 | 90 | -7 | 3 | 0.25 | 0.33 | 0.39 | 0.24 | -0.81 | 0.34 | 0.24 | 0.17 |
| 24 | 5 | 0.53 | 80 | -90 | 5 | 0.25 | 0.33 | 0.65 | 0.58 | -3.41 | -2.44 | 0.29 | 0.38 |
| 25 | 5 | 0.80 | 1 | -4 | 2 | 0.25 | 0.33 | 0.45 | 0.42 | -0.31 | 0.11 | 0.24 | 0.19 |
| 26 | 5 | 0.88 | 4 | -30 | 3 | 0.25 | 0.33 | 0.52 | 0.49 | -0.95 | -0.57 | 0.24 | 0.21 |
| 27 | 5 | 0.93 | 5 | -70 | 4 | 0.25 | 0.33 | 0.57 | 0.57 | -1.63 | -1.43 | 0.31 | 0.24 |
| 28 | 6 | 0.10 | 90 | -10 | 5 | 0.25 | 0.22 | 0.26 | 0.27 | -0.13 | 0.07 | 0.25 | 0.20 |
| 29 | 6 | 0.19 | 30 | -7 | 3 | 0.25 | 0.22 | 0.39 | 0.32 | -1.35 | -0.45 | 0.29 | 0.28 |
| 30 | 6 | 0.29 | 50 | -20 | 3 | 0.25 | 0.22 | 0.47 | 0.48 | -2.74 | -2.43 | 0.40 | 0.36 |
| 31 | 6 | 0.46 | 7 | -6 | 6 | 0.25 | 0.22 | 0.38 | 0.34 | -0.90 | -0.38 | 0.27 | 0.24 |
| 32 | 6 | 0.57 | 6 | -8 | 4 | 0.25 | 0.22 | 0.44 | 0.39 | -1.56 | -0.59 | 0.28 | 0.29 |
| 33 | 6 | 0.82 | 20 | -90 | 3 | 0.25 | 0.22 | 0.63 | 0.55 | -5.33 | -3.14 | 0.32 | 0.24 |
| 34 | 6 | 0.88 | 8 | -60 | 4 | 0.25 | 0.22 | 0.57 | 0.50 | -3.30 | -1.96 | 0.22 | 0.22 |
| 35 | 7 | 0.06 | 90 | -6 | 4 | 0.25 | 0.14 | 0.31 | 0.35 | -1.40 | -1.43 | 0.31 | 0.24 |
| 36 | 7 | 0.21 | 30 | -8 | 3 | 0.25 | 0.14 | 0.39 | 0.31 | -2.20 | -1.04 | 0.35 | 0.26 |
| 37 | 7 | 0.50 | 80 | -80 | 5 | 0.25 | 0.14 | 0.51 | 0.55 | -4.18 | -4.78 | 0.37 | 0.40 |
| 38 | 7 | 0.69 | 9 | -20 | 5 | 0.25 | 0.14 | 0.46 | 0.34 | -2.62 | -0.88 | 0.31 | 0.23 |
| 39 | 7 | 0.81 | 7 | -30 | 2 | 0.25 | 0.14 | 0.41 | 0.34 | -2.25 | -0.93 | 0.27 | 0.25 |
| 40 | 7 | 0.91 | 1 | -10 | 2 | 0.25 | 0.14 | 0.34 | 0.27 | -0.71 | -0.30 | 0.22 | 0.20 |
| Means | | | | | | 0.51 | 0.51 | 0.56 | 0.54 | -0.39 | 0.04 | 0.27 | 0.23 |
| Estimated error variance | | | | | | | | .0016 | .0015 | .1370 | .1188 | .0018 | .0015 |
On 24 April 2010 (before running the competition experiment) we posted the data of the estimation experiment and a description of the baseline models (presented below) on the Web (http://sites.google.com/site/gpredcomp/ and https://www.mdpi.com/si/games/predict-behavior/; see also the Notes and appendices below), and we are now inviting other researchers to participate in a competition that focuses on the prediction of the data of the second (competition) experiment. The call to participate in the competition will be published in the journal Games and on the e-mail lists of the leading scientific organizations that focus on decision-making, game theory, and behavioral economics. The competition is open to all; there are no prior requirements. The prediction submission deadline is 1 September 2010 (at midnight EST).
Researchers participating in the competition are allowed to study the results of the estimation study. Their goal is to develop a model that would predict the results of the competition study. The model has to be implemented in a computer program that reads the payoff distributions of the relevant games as an input and predicts the main results as output.

The parameters of the games, and their selection algorithm.

At each trial of the games studied here, each of four players has to decide (individually) between “entering a risky market”, or “staying out” (a safer prospect). The payoffs depend on a realization of a binary gamble (the realization at trial t is denoted Gt, and yields “H with probability Ph; and L otherwise”), the number of entrants (E), and two additional parameters (k and S). The exact payoff for player i at trial t is:
$$V_i(t) = \begin{cases} 10 - k(E) + G_t & \text{if } i \text{ enters} \\ \operatorname{round}(G_t/S) \text{ with } p = .5;\ -\operatorname{round}(G_t/S) \text{ otherwise} & \text{if } i \text{ does not enter} \end{cases}$$
The left-hand columns of Table 2 present the exact values of the different parameters of the 40 games that were studied in the estimation experiment. The problems were generated by a random selection of the parameters (k, S, L, H, and Ph) using the algorithm described in Appendix 1. Notice that the algorithm implies that the expected value of the gamble is 0, and that k is uniformly distributed between 2 and 7. These constraints imply that the risk-neutral Nash equilibria of the games are determined by the value of k, and that the mean entry rate at equilibrium over games is 50% (these predictions are discussed in Section 5.1 below).
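To make the payoff rule concrete, the following minimal Python sketch simulates one trial (our own illustration, not the competition code; the function name, and the assumption that the stay-out coin is flipped once per trial, are ours):

```python
import random

def trial_payoffs(enter, k, S, H, L, Ph):
    """Payoffs for one trial of the game defined above.
    enter: list of four booleans (True = the player enters).
    k, S, H, L, Ph: game parameters as in Table 2."""
    E = sum(enter)                                # number of entrants
    Gt = H if random.random() < Ph else L         # realization of the gamble
    # Stay-out payoff: round(Gt/S) with p = .5 and -round(Gt/S) otherwise;
    # assumed here to be drawn once per trial.
    out = round(Gt / S) if random.random() < 0.5 else -round(Gt / S)
    return [10 - k * E + Gt if e else out for e in enter]

# Example: Game 1 of Table 2 (k = 2, Ph = 0.04, H = 70, L = -3, S = 5)
print(trial_payoffs([True, False, True, True], k=2, S=5, H=70, L=-3, Ph=0.04))
```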

2. Experimental Method

The estimation experiment was run in March 2010 in the CLER lab at Harvard. One hundred and twenty students, members of the lab’s subject pool that includes more than 2000 students, participated in the study. The study was run in eight independent sessions, each of which included between 12 and 20 participants. Each session focused on 10 of the 40 entry games presented in Table 2, and each subset of 10 games was run twice, counterbalancing the order of the problems. The experiment was computerized using z-Tree [19]. After the instructions were read by the experimenter, each participant was randomly matched with three other participants, and the four subjects then played each of the 10 games for 50 trials.
The participants’ payoff in each trial was computed with the game’s payoff rule described above. This rule implies that the payoff is a function of the player’s own choice, the choices of the other three participants in the group (the more players enter, the lower the payoff from entry), and the trial’s state of nature. Participants did not receive a description of the payoff structure, but received feedback after each trial that included the result of their choice, and the result they could have received had they selected otherwise (the “foregone” payoff).1 Notice that the incentive structure implies that the obtained payoff of the entrants (10 - k(E) + Gt) is larger than the forgone payoff from entering observed by the players who did not enter (10 - k(E+1) + Gt).
The whole procedure took about 70 minutes on average. Participants’ final payoffs were composed from the sum of a $25 show-up fee, their payoff (gain/loss) in one randomly selected trial, and a small bonus they received each time they responded within 4 seconds in a given trial. Final payoffs ranged between $18 and $36.25.

3. Experimental results

The “Entry rates,” “Efficiency,” and “Alternations” columns in Table 2 present the main results in the first block (B1, the first 25 trials) and the second block (B2, the last 25 trials) in each game. The entry rates over games are higher than 50%, and are relatively stable over blocks. The difference between the first block (56%) and the second block (54%) is only marginally significant (t(39) = 1.87, p < .1, using game as the unit of analysis). Additional analysis reveals a higher entry rate in the first trial of each game (66%). This rate is significantly higher than 50% (t(119) = 6.36, p < .001, using participant as the unit of analysis). The observed entry rate in the very first trial (the first trial in the first game played by each participant) is even higher (72%).
The efficiency columns present the observed expected payoffs. The expected payoff of Player i is 10 - k(E) if i entered, and 0 otherwise. These columns show an increase in efficiency from -0.39 in the first block to +0.04 in the second block. The increase is significant (t(39) = 4.66, p < .0001).
The alternation columns present the proportion of times that players change their choices between trials (i.e., each trial in which a player chooses a different option than she chose in the previous trial is marked as a “change”). The difference between the first block (27%) and the second block (23%) is small but highly significant (t(39) = 5.8, p < .0001).
Additional analyses reveal replications of eight behavioral regularities that have been observed in previous studies of market entry games and individual decisions from experience. These regularities are summarized below:
The payoff variability effect. Comparison of the observed entry rates in the second block with the equilibrium predictions reveals high correlations: 0.81 and 0.84 with the symmetric and pure strategy predictions, respectively. The accuracy of the equilibrium predictions is particularly high in games with relatively low environmental uncertainty. For example, in Games 16 and 18 (absolute values of H and L below 10) the equilibrium predictions are 50%, and the observed entry rates in both blocks are between 50% and 60%. This pattern is a replication of the results documented by Rapoport and his co-authors [14,15,16]. The bias of the equilibrium predictions increases with the standard deviation of the gamble. For example, the correlation between the “bias of the symmetric equilibrium prediction of the entry rate” and the standard deviation of the gamble is 0.47. This pattern is consistent with “the payoff variability effect” [20,21]: high payoff variability appears to reduce sensitivity to the incentive structure.
High sensitivity to forgone payoffs. Comparison of the effect of obtained and forgone payoffs reveals “high sensitivity to forgone payoffs”. For example, the tendency to repeat the last choice was slightly better predicted by the most recent forgone payoff than by the most recent obtained payoff. The absolute correlations between the probability of repetition (1 for repetition, 0 otherwise) and the recent payoffs (over all choices in trials 2 to 50) were 0.06 and 0.05 for the forgone and the obtained payoff respectively. Similar results are reported by Grosskopf et al. [22].
Excess entry. The proportion of choices of the risky alternative does not appear to reflect risk aversion and/or loss aversion [23]. The overall R-rate (the rate of choosing the risky alternative, 55%) is higher than the predicted R-rate at equilibrium under the assumption of risk neutrality (less than 51%). Thus, the results reflect excess entry (see [24]).
Underweighting of rare events. Analysis of the correlation between the parameters of the environmental uncertainty (H, Ph, and L) and the observed entry rates reveals an interesting pattern. The entry rates (using game as a unit of analysis) are positively correlated with Ph (0.54) and negatively correlated with H and L (-.40 and -.52). Recall that in the current set of games, Ph is negatively correlated with H and L (-.80 and -.59). Thus, the negative correlation of the entry rates with L and H can be a product of higher sensitivity to the probabilities than to the exact outcomes. This pattern is consistent with the observation of underweighting of rare events in decisions from experience (see [25]).
Surprise-triggers-change. Figure 1 presents the alternation rate as a function of Ph (using game as a unit of analysis). It reveals a reversed U relationship: The alternation rate is maximal when Ph is close to 0.5. Nevo and Erev [26] who observed a similar pattern in individual choice tasks note that it can be captured with the assertion that surprise-triggers-change.
Figure 1. Proportion of alternation as a function of Ph. Each data point summarizes the results of one game. The outlier (alternation rate of 0.08 when Ph = 0.67) is Problem 3, which involves the lowest payoff variance, and it is the only problem in which entry cannot lead to losses.
The very recent effect. Analysis of the results in the second block shows that the proportion of choices of the alternative that led to the best outcome in the most recent trial is 67.4%. This “Best Reply 1” rate is larger than the proportion of choices of the alternative that led to the best outcome two trials back (Best Reply 2 = 65.4%). The difference is significant (t(119) = 3.89, p < .001) and implies a recency effect. However, this recency effect does not extend far back: Best Reply 2 is not larger than the mean Best Reply to earlier trials. Evaluation of the effect of the recent 12 trials (excluding the most recent) reveals that the lowest score is Best Reply 3 (65.1%) and the highest scores are Best Reply 8 and Best Reply 12 (66.0% and 65.9%, respectively). Nevo and Erev [26] refer to this pattern as the “very recent effect”.
Strong inertia. The results presented above imply that the participants select the option that led to the best outcome in the most recent experience in 67.4% of the trials, and repeat their last choice in 75% of the trials. Thus, they are more likely to exhibit inertia than to respond to their recent experience (see similar observation in [27]).
Individual differences. Table 3 summarizes the results of four analyses that examine the correlation between behaviors in the different games. These analyses use participant as the unit of analysis and focus on four variables: entry rate, expected payoff, alternation, and recency (the proportion of choices of the option that led to the best payoff in the last trial). Since we ran four sets of ten games, we could compute 45 × 4 = 180 different correlations between pairs of games. Table 3 presents the mean over the 180 correlations, and the proportion of positive correlations. The results show that most correlations are positive. Thus, they are consistent with previous research that suggests robust individual differences in decisions from experience [28,29].
Table 3. Summary of correlation analyses that examine the possibility of consistent individual differences. The summary scores are based on 180 correlation analyses (180 pairs of games) for each of the four variables.
| Variable | Mean correlation | Proportion of positive correlations |
|---|---|---|
| Entry rate | 0.249 | 0.844 |
| Maximization | 0.058 | 0.611 |
| Alternation | 0.372 | 0.977 |
| Recency | 0.194 | 0.806 |

4. Competition criteria

The current competition focuses on the prediction of the six statistics presented in Table 2; that is, the entry rates, the efficiency (mean payoffs), and the alternation rates in the first and the second block of 25 trials. As in Erev et al. [9], the accuracy of the prediction will be evaluated using a mean squared error criterion. Specifically, we focus on normalized mean deviation scores. The computation of this score for each of the six statistics includes three steps: (i) computation of the squared deviation between the model’s prediction and the observed statistic in each of the 40 games; (ii) computation of the mean squared deviation over the 40 games; and (iii) normalization of each score by the variable’s estimated error variance (the estimated error variances are presented below the means in Table 2). The competition criterion (the model’s final score) is the mean of the six normalized MSD (nMSD) scores.
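For concreteness, here is a minimal Python sketch of this scoring rule (our own illustration; the function and variable names are not part of the official submission interface):

```python
import numpy as np

def nmsd(predicted, observed, error_variance):
    """Normalized MSD for one statistic: the mean squared deviation
    over the 40 games, divided by the statistic's estimated error variance."""
    predicted, observed = np.asarray(predicted), np.asarray(observed)
    return np.mean((predicted - observed) ** 2) / error_variance

def final_score(six_statistics):
    """six_statistics: (predicted, observed, error_variance) triples for
    entry, efficiency, and alternation rates in blocks B1 and B2."""
    return float(np.mean([nmsd(p, o, v) for p, o, v in six_statistics]))
```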

5. Baseline models

The results of the estimation study were posted on the competition website on 24 April 2010 (before the beginning of the competition study). At the same time we posted several baseline models. Each model was implemented as a computer program that satisfies the requirements for submission to the competition. The baseline models were selected to achieve two main goals. The first goal is technical: The programs of the baseline models are part of the “instructions to participants”. They serve as examples of feasible submissions.
The second goal is to illustrate the range of MSD scores that can be obtained with different modeling approaches. Participants are encouraged to build on the best baselines while developing their models. The baseline models will not participate in the competition. The following sections describe nine baseline models.

5.1. Nash equilibria

Two classes of Nash equilibrium models were considered. The first class, referred to as “pure”, allows for the possibility that the players are asymmetric, and assumes that each player consistently chooses one of the pure strategies; that is, each player makes the same choice, “Enter” or “Stay out”, in all 50 trials of each game. When k = 2, the current games have a unique “pure strategy equilibrium”; all the players enter the market in that equilibrium. When k > 2, the current games have multiple equilibria. All these multiple equilibria make the same predictions concerning the aggregate (over the four players) entry rate, efficiency, and alternation rates. For example, when k = 7, these models predict that exactly one player will enter in all trials. Thus, they predict an entry rate of .25, an efficiency of 0.75 (the entrant’s expected payoff is 10 - 7 = 3, and the expected payoff of the other three players is 0), and an alternation rate of 0.
The second Nash equilibrium model, referred to as “symmetric”, assumes symmetry among the four players, and risk neutrality. These imply that the players select the dominant strategy when they have one (Enter with probability 1 when k = 2), and select the symmetric risk neutral mixed strategy equilibrium strategy in the other cases. The probability of entry in this case is the one that makes the other players indifferent between entering and staying out.
The pure and symmetric columns in Table 2 present the expected entry rates under the two equilibrium concepts. Table 4 summarizes the scores of these models. It shows that the data are closer to the symmetric (mixed when k > 2) equilibrium prediction.
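Because both the gamble and the stay-out payoff have an expected value of 0, the indifference condition under risk neutrality is 10 - k(1 + 3p) = 0, where p is the common entry probability. A small sketch of this computation (our own illustration; the function name is ours):

```python
def symmetric_entry_rate(k):
    """Risk-neutral symmetric equilibrium entry probability.
    A player who stays out expects a payoff of 0, so indifference
    requires 10 - k*(1 + 3p) = 0, i.e., p = (10/k - 1)/3.
    When k = 2, entering is dominant and p = 1."""
    return min(1.0, (10 / k - 1) / 3)

# Reproduces the "symmetric" column of Table 2 (up to rounding):
# k=2 -> 1.00, k=3 -> 0.78, k=4 -> 0.50, k=5 -> 0.33, k=6 -> 0.22, k=7 -> 0.14
```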

5.2. Stochastic Fictitious Play and Reinforcement Learning models

Four of the baseline learning models considered here can be captured with two basic assumptions. The first assumption is a stochastic choice rule in which the probability of selecting action k at trial t is given by:
$$P_k(t) = \frac{e^{q_k(t)\,D(t)}}{\sum_{j=1}^{2} e^{q_j(t)\,D(t)}}$$
where qk(t) is the propensity to select action k, and D(t) modulates how decisive the decision maker is (when D(t) equals 0, each action is chosen with equal probability; when it is large, the action with the higher propensity is chosen with high probability).
Table 4. The baseline models, the estimated parameters, and the implied normalized MSD scores by statistic and block.

| Model | Fitted parameters | Entry rates B1 | Entry rates B2 | Efficiency B1 | Efficiency B2 | Alternation B1 | Alternation B2 | Mean |
|---|---|---|---|---|---|---|---|---|
| Pure | | 26.524 | 21.563 | 39.413 | 28.010 | 64.997 | 39.936 | 36.741 |
| Symmetric | | 29.266 | 24.280 | 20.608 | 13.484 | 25.325 | 22.102 | 22.511 |
| RL | λ = 4, w = 0.01 | 8.587 | 16.645 | 5.303 | 8.991 | 14.780 | 12.233 | 11.090 |
| NRL | λ = 10, w = 0.02 | 4.365 | 10.299 | 2.904 | 5.616 | 6.380 | 1.757 | 5.220 |
| SFP | λ = 1, w = 0.1 | 5.533 | 5.211 | 7.650 | 12.042 | 8.203 | 6.870 | 7.585 |
| NFP | λ = 3, w = 0.15 | 4.345 | 4.113 | 2.854 | 4.755 | 2.140 | 4.265 | 3.745 |
| EWA | λ = 7, φ = .8, δ = .5, ρ = .6 | 9.017 | 7.990 | 4.006 | 7.894 | 5.997 | 4.538 | 6.573 |
| SAW | εi ~ U[0,.08], wi ~ U[0,1], ρi ~ U[0,.4], µi = {1, 2, or 3} | 3.461 | 2.317 | 1.713 | 1.668 | 3.362 | 4.880 | 2.900 |
| I-SAW | εi ~ U[0,.28], wi ~ U[0,1], ρi ~ U[0,.4], πi ~ U[0,.4], µi = {1, 2, or 3} | 1.935 | 1.516 | 1.501 | 1.475 | 1.543 | 1.119 | 1.512 |
The second assumption concerns the adjustment of propensities as experience is gained. The propensity to select action k at trial t+1 is a weighted average of qk(t), the propensity at t, and vk(t), the payoff from selecting this strategy at t:
$$q_k(t+1) = [1 - W(t)] \cdot q_k(t) + W(t) \cdot v_k(t)$$
The initial value of the propensity to stay out is set equal to zero (q1(1) = 0). The initial value of the propensity to enter the market was estimated to capture the observed entry rate in the first trial. Under the current set of models, this rate implies that the initial propensity to enter is q2(1) = ln(.66/.34)·[1/D(1)].
The models differ with respect to the decisiveness function D(t) and the weighting function W(t). The first model, referred to as reinforcement learning (RL), assumes stable payoff sensitivity, D(t) = λ, and insensitivity to forgone payoffs: W(t) = w (i.e., a constant) if k was selected at t, and 0 otherwise.
The second model, referred to as normalized reinforcement learning (NRL), is similar to the model proposed by Erev, Bereby-Meyer, and Roth [30]. It is identical to RL with one exception: payoff sensitivity is assumed to decrease with payoff variability. Specifically,
D(t) = λ/S(t)
where S(t) is the weighted average of the difference between the obtained payoff at trial t and the maximal recent payoff:
$$S(t+1) = \left(\frac{t}{t+1}\right)S(t) + \left(\frac{1}{t+1}\right)\left|\max\big(v_1(t), v_2(t)\big) - \text{obtained}(t)\right|$$
The initial value, S(1), is assumed to equal 1.
The third model is stochastic fictitious play (SFP, see [31,32,33,34]). It assumes stable payoff sensitivity, D(t) = λ, and sensitivity to forgone payoffs: W(t) = w.
The fourth model, referred to as normalized fictitious play (NFP), was proposed by Ert and Erev [35] to capture choice behavior in individual decision tasks. It is identical to SFP with the exception of the payoff sensitivity function. Like NRL it assumes D(t) = λ/S(t).
The parameters of the four models were estimated using a simulation-based grid search procedure.2 The parameters that best fit the estimation study, and the nMSD scores of the fitted models, are presented in Table 4. The results reveal that the best model in the current set of four 2-parameter models is normalized fictitious play.
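A minimal Python sketch of the shared structure of these four models (our own illustration; a complete model would also specify the initial propensities and, for the normalized variants, the payoff-variability term S(t) defined above):

```python
import math

def choice_prob(q, D):
    """Equation 1: logit choice probabilities over the two actions."""
    w = [math.exp(qk * D) for qk in q]
    return [wk / sum(w) for wk in w]

def update_propensities(q, v, chosen, w, use_foregone):
    """q_k(t+1) = [1 - W(t)]*q_k(t) + W(t)*v_k(t).
    W(t) = w for the chosen action; for the unchosen action,
    W(t) = w under fictitious play (use_foregone=True) and
    0 under reinforcement learning (use_foregone=False)."""
    return [(1 - w) * qk + w * vk if (k == chosen or use_foregone) else qk
            for k, (qk, vk) in enumerate(zip(q, v))]
```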

5.3. Experience Weighted Attraction (EWA)

The fifth learning model considered here is a simplified variant of the experience weighted attraction (EWA) model proposed by Camerer and Ho [36]. This model uses Equation 1’s choice rule, and a modified adjustment rule that implies a non-linear combination of a reinforcement learning and a fictitious play model (but not of the reinforcement learning and fictitious play models presented above). Specifically, this model assumes that:
$$q_k(t+1) = \frac{\varphi \cdot N(t-1) \cdot q_k(t) + \left[\delta + (1-\delta) \cdot I(t,k)\right] \cdot v_k(t)}{N(t)}$$
where φ is a forgetting parameter, N(1) =1, N(t)= ρN(t-1) + 1 (for t > 1) is a function of the number of trials, ρ is a depreciation rate parameter, δ is a parameter that determines the relative weight for obtained and forgone payoffs, I(t,k) is an index function that returns the value 1 if strategy k was selected in trial t and 0 otherwise, and vk(t) is the payoff that the player would receive for a choice of strategy k at trial t.
Table 4 shows that the fit of EWA is better than the fit of reinforcement learning and fictitious play without normalization, but not as good as the fit of the variants of these models with normalization.
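A minimal sketch of this simplified EWA adjustment rule (our own illustration; the function name and argument order are ours):

```python
def ewa_update(q, N_prev, v, chosen, phi, delta, rho):
    """Simplified EWA: N(t) = rho*N(t-1) + 1, and
    q_k(t+1) = (phi*N(t-1)*q_k(t)
                + [delta + (1-delta)*I(t,k)] * v_k(t)) / N(t),
    where I(t,k) = 1 if action k was chosen at trial t, else 0."""
    N = rho * N_prev + 1
    q_next = [(phi * N_prev * q[k]
               + (delta + (1 - delta) * (k == chosen)) * v[k]) / N
              for k in range(len(q))]
    return q_next, N
```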

5.4. The Sampling and Weighting (SAW) model

SAW is a modification of the explorative sampler model, the baseline model that provided the best predictions of behavior in the Erev et al. [9] competition. A modification is necessary because the explorative sampler model is designed to capture learning when the feedback is limited to the obtained payoffs. The modification includes two simplifying assumptions (a fixed exploration probability, and a linear value function), and the added assumption of individual differences (in the values of the parameters).
The model distinguishes between two response modes: exploration and exploitation. Exploration implies entry with probability P0. The probability of exploration, by individual i, is 1 in the first trial, and εi (a trait of i) in all other trials. The value of P0 is estimated based on the observed choice rate in the first trial (0.66 in the current setting).
During exploitation trials, individual i selects the alternative with the highest Estimated Subjective Value (ESV). The ESV of alternative j at trial t > 1 is:
ESV(j,t) = (1-wi)(SampleMj) + wi(GrandMj)
The SampleMj (sample mean) is the average payoff from Alternative j in a small sample of µi previous experiences (trials),3 the GrandMj (grand mean) is the average payoff from j over all (t-1) previous trials (µi and wi are traits). The assumed reliance on small samples was introduced to capture the observed tendency to underweight rare events [25,37].
The µi draws are assumed to be independent (sampled with replacement) and biased toward the most recent experience (Trial t-1). Each draw is biased with probability ρi (a trait), and unbiased otherwise. A bias implies a selection of the most recent trial (Trial t-1). In unbiased draws all previous trials are equally likely to be sampled. The motivation behind this assumption is the “very recent effect”.
The traits are assumed to be independently drawn from uniform distributions between the minimal possible value allowed by the model and a higher point. Thus, the estimation focused on estimating the upper points (five free parameters) of the relevant distributions. The results (cf. Table 4) show that SAW fits the data better than the models presented above.4
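A minimal sketch of the exploitation-mode valuation (our own illustration; the function name is ours, and the sampling follows the description above):

```python
import random

def esv(payoffs_j, w_i, mu_i, rho_i):
    """Estimated Subjective Value of alternative j for player i at trial t.
    payoffs_j: the payoffs alternative j yielded in trials 1..t-1.
    Each of the mu_i draws (with replacement) is the most recent
    experience with probability rho_i, and uniform over all previous
    trials otherwise."""
    draws = [payoffs_j[-1] if random.random() < rho_i
             else random.choice(payoffs_j)
             for _ in range(mu_i)]
    sample_mean = sum(draws) / mu_i
    grand_mean = sum(payoffs_j) / len(payoffs_j)
    return (1 - w_i) * sample_mean + w_i * grand_mean
```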

5.5. The Inertia, Sampling and Weighting (I-SAW) model

The final baseline model was explicitly designed to capture the eight behavioral regularities listed above. This model (proposed in [26]) is a generalization of SAW that allows for the possibility of a third response mode: Inertia (see similar additions in [27,38]). In this mode, the players simply repeat their last choice.
The exact probability of inertia at trial t+1 is assumed to decrease when the recent outcomes are surprising. Specifically, if the exploration mode was not selected, the probability of inertia is:
$$P(\text{Inertia at } t+1) = \pi_i^{\,\text{Surprise}(t)}$$
where 0 ≤ πi < 1 is a trait that captures the tendency toward inertia. The value of the surprise term is assumed to depend on the gap (absolute difference) between past and present payoffs. The payoffs are compared to the most recent payoffs, and to the mean payoffs:
$$\text{Gap}(t) = \frac{1}{4}\left[\sum_{j=1}^{2} \left|\text{Obtained}_j(t-1) - \text{Obtained}_j(t)\right| + \sum_{j=1}^{2} \left|\text{GrandM}_j(t) - \text{Obtained}_j(t)\right|\right]$$
where Obtainedj(t) is the payoff obtained from j at trial t, and GrandMj(t) is the average payoff obtained from j in the first t-1 trials. The surprise at t is normalized by the mean gap (in the first t-1 trials):
Surprise(t) = Gap(t)/[Mean_Gap(t) +Gap(t)]
The mean gap at t is a running average of the gap in the previous trials (with Mean_Gap(1) = .00001). Specifically,
Mean_Gap(t+1) = Mean_Gap(t)(1-1/r) +Gap(t)(1/r)
where r is the expected number of trials in the experiment (50 in the current study).
Notice that the normalization of Surprise(t) implies that its value is between 0 and 1, and that the probability of inertia is between πi (when Surprise(t) = 1) and 1 (when Surprise(t) = 0). An interesting justification for the gap-based abstraction of surprise comes from the observation that the activity of certain dopamine-related neurons is correlated with the difference between the average past payoff and the present outcome [5].
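A minimal sketch of this surprise-triggered inertia mechanism (our own illustration; the function name is ours):

```python
def inertia_step(gap, mean_gap, pi_i, r=50):
    """One trial of the surprise-triggers-change mechanism:
    Surprise(t) = Gap(t)/(Mean_Gap(t) + Gap(t)); given that the
    exploration mode was not selected, the probability of repeating
    the last choice is pi_i ** Surprise(t). The running mean gap is
    updated with weight 1/r (r = expected number of trials, 50 here).
    Mean_Gap(1) = .00001 in the paper, which avoids division by zero."""
    surprise = gap / (mean_gap + gap)
    p_inertia = pi_i ** surprise
    new_mean_gap = mean_gap * (1 - 1 / r) + gap * (1 / r)
    return p_inertia, new_mean_gap
```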
We chose to estimate I-SAW under the working assumption of the same learning process (and parameters) in the 40 games described above and the 20 individual decision tasks studied by Nevo and Erev [26]. The estimation reveals that the best fit (over the 60 conditions) is obtained with the trait distributions εi~U[0,.28], wi~U[0,1], ρi~U[0,.4], πi~U[0,.4], and µi = {1, 2, or 3 with equal probability}. Table 4 shows that the fit of I-SAW (on the 40 market entry games) is much better than the fit of the other models. Table 5 presents the predictions of I-SAW for each statistic by game. Comparison of these predictions and the experimental results (Table 2) reveals high correspondence. The lower panel in Table 5 presents the correlations by statistic.
Table 5. The predictions of the best baseline model (I-SAW). The lowest row presents the correlation with the experimental results by statistic.

| # | k | Ph | H | L | S | Entry B1 | Entry B2 | Efficiency B1 | Efficiency B2 | Alternations B1 | Alternations B2 |
|---|---|----|---|---|---|------|------|------|------|------|------|
| 1 | 2 | 0.04 | 70 | -3 | 5 | 0.79 | 0.83 | 2.58 | 2.54 | 0.19 | 0.15 |
| 2 | 2 | 0.23 | 30 | -9 | 4 | 0.60 | 0.66 | 2.39 | 2.64 | 0.27 | 0.23 |
| 3 | 2 | 0.67 | 1 | -2 | 3 | 0.91 | 0.91 | 2.31 | 2.31 | 0.11 | 0.10 |
| 4 | 2 | 0.73 | 30 | -80 | 4 | 0.65 | 0.65 | 2.40 | 2.56 | 0.28 | 0.27 |
| 5 | 2 | 0.80 | 20 | -80 | 5 | 0.68 | 0.68 | 2.42 | 2.57 | 0.26 | 0.25 |
| 6 | 2 | 0.83 | 4 | -20 | 3 | 0.78 | 0.80 | 2.45 | 2.54 | 0.22 | 0.20 |
| 7 | 2 | 0.94 | 6 | -90 | 5 | 0.81 | 0.82 | 2.37 | 2.47 | 0.17 | 0.16 |
| 8 | 2 | 0.95 | 1 | -20 | 5 | 0.86 | 0.86 | 2.37 | 2.43 | 0.15 | 0.13 |
| 9 | 2 | 0.96 | 4 | -90 | 3 | 0.84 | 0.85 | 2.34 | 2.42 | 0.15 | 0.13 |
| 10 | 3 | 0.10 | 70 | -8 | 4 | 0.44 | 0.49 | 0.87 | 1.15 | 0.25 | 0.22 |
| 11 | 3 | 0.90 | 9 | -80 | 4 | 0.74 | 0.73 | -0.09 | 0.14 | 0.21 | 0.20 |
| 12 | 3 | 0.91 | 7 | -70 | 6 | 0.74 | 0.74 | -0.14 | 0.10 | 0.20 | 0.19 |
| 13 | 4 | 0.06 | 60 | -4 | 2 | 0.42 | 0.44 | 0.10 | 0.26 | 0.27 | 0.24 |
| 14 | 4 | 0.20 | 40 | -10 | 4 | 0.44 | 0.46 | -0.19 | 0.06 | 0.30 | 0.28 |
| 15 | 4 | 0.31 | 20 | -9 | 4 | 0.48 | 0.50 | -0.35 | -0.09 | 0.32 | 0.31 |
| 16 | 4 | 0.60 | 4 | -6 | 2 | 0.51 | 0.51 | -0.23 | 0.01 | 0.33 | 0.30 |
| 17 | 4 | 0.60 | 40 | -60 | 3 | 0.56 | 0.55 | -0.94 | -0.58 | 0.31 | 0.31 |
| 18 | 4 | 0.73 | 3 | -8 | 2 | 0.52 | 0.52 | -0.22 | -0.02 | 0.32 | 0.29 |
| 19 | 4 | 0.80 | 20 | -80 | 2 | 0.63 | 0.62 | -1.61 | -1.12 | 0.27 | 0.26 |
| 20 | 4 | 0.90 | 1 | -9 | 6 | 0.51 | 0.52 | 0.08 | 0.15 | 0.28 | 0.24 |
| 21 | 4 | 0.96 | 3 | -70 | 3 | 0.60 | 0.57 | -0.75 | -0.39 | 0.24 | 0.21 |
| 22 | 5 | 0.02 | 80 | -2 | 3 | 0.35 | 0.36 | -0.15 | 0.05 | 0.26 | 0.23 |
| 23 | 5 | 0.07 | 90 | -7 | 3 | 0.32 | 0.33 | -0.60 | -0.21 | 0.24 | 0.22 |
| 24 | 5 | 0.53 | 80 | -90 | 5 | 0.52 | 0.51 | -2.18 | -1.69 | 0.32 | 0.31 |
| 25 | 5 | 0.80 | 1 | -4 | 2 | 0.40 | 0.39 | -0.21 | -0.03 | 0.28 | 0.25 |
| 26 | 5 | 0.88 | 4 | -30 | 3 | 0.46 | 0.45 | -0.83 | -0.57 | 0.28 | 0.25 |
| 27 | 5 | 0.93 | 5 | -70 | 4 | 0.50 | 0.48 | -1.38 | -0.87 | 0.27 | 0.22 |
| 28 | 6 | 0.10 | 90 | -10 | 5 | 0.32 | 0.32 | -1.22 | -0.74 | 0.25 | 0.24 |
| 29 | 6 | 0.19 | 30 | -7 | 3 | 0.34 | 0.34 | -1.29 | -0.80 | 0.29 | 0.27 |
| 30 | 6 | 0.29 | 50 | -20 | 3 | 0.42 | 0.41 | -2.16 | -1.58 | 0.32 | 0.30 |
| 31 | 6 | 0.46 | 7 | -6 | 6 | 0.34 | 0.33 | -0.84 | -0.53 | 0.30 | 0.27 |
| 32 | 6 | 0.57 | 6 | -8 | 4 | 0.35 | 0.34 | -0.90 | -0.59 | 0.30 | 0.27 |
| 33 | 6 | 0.82 | 20 | -90 | 3 | 0.57 | 0.53 | -4.38 | -3.12 | 0.27 | 0.26 |
| 34 | 6 | 0.88 | 8 | -60 | 4 | 0.46 | 0.43 | -2.12 | -1.49 | 0.28 | 0.24 |
| 35 | 7 | 0.06 | 90 | -6 | 4 | 0.26 | 0.26 | -1.30 | -0.74 | 0.23 | 0.21 |
| 36 | 7 | 0.21 | 30 | -8 | 3 | 0.32 | 0.31 | -1.76 | -1.18 | 0.29 | 0.27 |
| 37 | 7 | 0.50 | 80 | -80 | 5 | 0.48 | 0.46 | -4.43 | -3.50 | 0.32 | 0.31 |
| 38 | 7 | 0.69 | 9 | -20 | 5 | 0.35 | 0.34 | -1.70 | -1.25 | 0.29 | 0.26 |
| 39 | 7 | 0.81 | 7 | -30 | 2 | 0.36 | 0.34 | -1.73 | -1.26 | 0.29 | 0.26 |
| 40 | 7 | 0.91 | 1 | -10 | 2 | 0.28 | 0.28 | -0.73 | -0.50 | 0.25 | 0.22 |
| Means | | | | | | 0.523 | 0.523 | -0.294 | 0.039 | 0.261 | 0.238 |
| Correlation with the experimental results | | | | | | 0.966 | 0.973 | 0.980 | 0.972 | 0.776 | 0.888 |

6. Summary

The current competition is designed to improve our understanding of the effect of experience on choice behavior in situations that involve both environmental and strategic uncertainty. It focuses on pure decisions from experience in market entry games: The participants did not receive a description of the incentive structure and had to rely on the (complete) feedback that was provided after each trial. The experimental results reveal the robustness of eight qualitative behavioral tendencies that were documented in previous studies of market entry games and individual decisions from experience: (i) Payoff variability effect: Fast convergence to equilibrium when the payoff variability is low, and weaker sensitivity to the incentive structure when the payoff variability is high. (ii) Excess entry: The entry rate does not reflect loss aversion or risk aversion; it is higher than the equilibrium predictions. (iii) High sensitivity to forgone payoffs. (iv) Underweighting of rare events: The tendency to enter the market increases when this behavior is likely to lead to the best payoffs, even when this behavior decreases expected payoff. (v) Surprise triggers change: The probability of alternation decreases when the obtained outcomes are similar to the typical outcomes. (vi) Very recent effect: Choice behavior is most sensitive to the most recent experience, and all previous experiences appear to have the same effect. (vii) Strong inertia: The participants tend to repeat their last choice even when the forgone payoff is higher than the obtained payoff. (viii) Robust individual differences.
Our attempt to capture the quantitative results with different learning models highlights the significance of the eight qualitative regularities listed above. The descriptive value of the different models appears to increase with the number of qualitative regularities that they abstract. The best fit was provided by the I-SAW model, which abstracts all eight regularities.
We hope that the prediction competition will clarify and extend these results in qualitative and quantitative ways. One set of possible qualitative contributions involves the clarification of the necessary and sufficient assumptions for effective prediction of behavior in the current setting. It is possible that the competition will highlight the value of simple models that do not abstract all eight of the regularities considered above. It is also possible that the competition will highlight additional regularities that should be abstracted to optimize predictions.
One set of likely quantitative contributions involves the quantification of the different regularities. We hope that the competition will facilitate the development and evaluation of more creative and effective quantifications.

Acknowledgements

This research was supported by a grant from the U.S.A.-Israel Binational Science Foundation (2008243).

References and Notes

1. Erev, I.; Roth, A. Predicting How People Play Games: Reinforcement Learning in Games with Unique, Mixed Strategy Equilibria. Am. Econ. Rev. 1998, 88, 848–881.
2. Thorndike, E.L. Animal Intelligence: An Experimental Study of the Associative Processes in Animals. Psychol. Rev. Monograph Supplement 1898.
3. Skinner, B.F. The Behavior of Organisms; Appleton-Century-Crofts: New York, NY, USA, 1938.
4. Shafir, S.; Reich, T.; Tsur, E.; Erev, I.; Lotem, A. Perceptual Accuracy and Conflicting Effects of Certainty on Risk-Taking Behavior. Nature 2008, 453, 917–920.
5. Schultz, W. Predictive Reward Signal of Dopamine Neurons. J. Neurophysiol. 1998, 80, 1–27.
6. Erev, I.; Haruvy, E. Learning and the Economics of Small Decisions. In The Handbook of Experimental Economics; Kagel, J.H., Roth, A.E., Eds.; Princeton University Press: Princeton, NJ, USA, in press. http://www.utdallas.edu/~eeh017200/papers/LearningChapter.pdf.
7. Salmon, T. An Evaluation of Econometric Models of Adaptive Learning. Econometrica 2001, 69, 1597–1628.
8. Hopkins, E. Two Competing Models of How People Learn in Games. Econometrica 2002, 70, 2141–2166.
9. Erev, I.; Ert, E.; Roth, A.E.; Haruvy, E.; Herzog, S.; Hau, R.; Hertwig, R.; Stewart, T.; West, R.; Lebiere, C. A Choice Prediction Competition for Choices from Experience and from Description. J. Behav. Decis. Making 2010, 23, 15–47.
10. Arifovic, J.; McKelvey, R.D.; Pevnitskaya, S. An Initial Implementation of the Turing Tournament to Learning in Repeated Two-Person Games. Games Econ. Behav. 2006, 57, 93–122.
11. Gonzalez, C.; Lerch, F.J.; Lebiere, C. Instance-Based Learning in Real-Time Dynamic Decision Making. Cognit. Sci. 2003, 27, 591–635.
12. Selten, R.; Guth, W. Equilibrium Point Selection in a Class of Market Entry Games. In Games, Economic Dynamics, and Time Series Analysis; Diestler, M., Furst, E., Schwadiauer, G., Eds.; Physica-Verlag: Wien-Wurzburg, Austria-Germany, 1982.
13. Kahneman, D. Experimental Economics: A Psychological Perspective. In Bounded Rational Behavior in Experimental Games and Markets; Tietz, R., Albers, W., Selten, R., Eds.; Springer-Verlag: Berlin, Germany, 1988.
14. Rapoport, A. Individual Strategies in a Market-Entry Game. Group Decis. Negot. 1995, 4, 117–133.
15. Sundali, J.A.; Rapoport, A.; Seale, D.A. Coordination in Market Entry Games with Symmetric Players. Organ. Behav. Human Decis. Proc. 1995, 64, 203–218.
16. Erev, I.; Rapoport, A. Magic, Reinforcement Learning and Coordination in a Market Entry Game. Games Econ. Behav. 1998, 23, 146–175.
17. Erev, I.; Barron, G. On Adaptation, Maximization and Reinforcement Learning Among Cognitive Strategies. Psychol. Rev. 2005, 112, 912–931.
18. Roth, A.E.; Erev, I. Learning in Extensive Form Games: Experimental Data and Simple Dynamic Models in the Intermediate Term. Games Econ. Behav. 1995, 8, 164–212.
19. Fischbacher, U. z-Tree: Zurich Toolbox for Ready-Made Economic Experiments. Exper. Econ. 2007, 10, 171–178.
20. Myers, J.L.; Sadler, E. Effects of Range of Payoffs as a Variable in Risk Taking. J. Exper. Psychol. 1960, 60, 306–309.
21. Busemeyer, J.R.; Townsend, J.T. Decision Field Theory: A Dynamic-Cognitive Approach to Decision Making in an Uncertain Environment. Psychol. Rev. 1993, 100, 432–459.
22. Grosskopf, B.; Erev, I.; Yechiam, E. Forgone with the Wind: Indirect Payoff Information and its Implications for Choice. Int. J. Game Theory 2006, 34, 285–302.
23. Erev, I.; Ert, E.; Yechiam, E. Loss Aversion, Diminishing Sensitivity, and the Effect of Experience on Repeated Decisions. J. Behav. Decis. Making 2008, 21, 575–597.
24. Camerer, C.; Lovallo, D. Overconfidence and Excess Entry: An Experimental Approach. Am. Econ. Rev. 1999, 89, 306–318.
25. Barron, G.; Erev, I. Small Feedback-Based Decisions and Their Limited Correspondence to Description-Based Decisions. J. Behav. Decis. Making 2003, 16, 215–233.
26. Nevo, I.; Erev, I. On Surprise, Change, and the Effect of Recent Outcomes. Technion: Haifa, Israel, unpublished work, 2010.
27. Biele, G.; Erev, I.; Ert, E. Learning, Risk Attitude and Hot Stoves in Restless Bandit Problems. J. Math. Psychol. 2009, 53, 155–167.
28. Ert, E.; Yechiam, E. Consistent Constructs in Individuals’ Risk Taking in Decisions from Experience. Acta Psychol. 2010, 134, 225–232.
29. Yechiam, E.; Busemeyer, J.R. Evaluating Generalizability and Parameter Consistency in Learning Models. Games Econ. Behav. 2008, 63, 370–394.
30. Erev, I.; Bereby-Meyer, Y.; Roth, A.E. The Effect of Adding a Constant to all Payoffs: Experimental Investigation, and Implications for Reinforcement Learning Models. J. Econ. Behav. Organ. 1999, 39, 111–128.
31. Cheung, Y.-W.; Friedman, D. Individual Learning in Normal Form Games: Some Laboratory Results. Games Econ. Behav. 1997, 19, 46–76.
32. Cooper, D.; Garvin, S.; Kagel, J. Signaling and Adaptive Learning in an Entry Limit Pricing Game. Rand J. Econ. 1997, 28, 662–683.
33. Fudenberg, D.; Levine, D.K. The Theory of Learning in Games; MIT Press: Cambridge, MA, USA, 1998.
34. Nyarko, Y.; Schotter, A. An Experimental Study of Belief Learning Using Elicited Beliefs. Econometrica 2002, 70, 971–1006.
35. Ert, E.; Erev, I. Replicated Alternatives and the Role of Confusion, Chasing, and Regret in Decisions from Experience. J. Behav. Decis. Making 2007, 20, 305–322.
36. Camerer, C.; Ho, T.H. Experience Weighted Attraction Learning in Normal Form Games. Econometrica 1999, 67, 827–874.
37. Hertwig, R.; Erev, I. The Description–Experience Gap in Risky Choice. Trends Cognit. Sci. 2009, 13, 517–523.
38. Cooper, D.J.; Kagel, J.H. Learning and Transfer in Signaling Games. Econ. Theory 2008, 34, 415–439.

Appendix 1

Problem Selection Algorithm

At each trial, each of 4 players has to decide (individually) between “entering”, or “staying out” (a safer prospect). The payoffs depend on a realization of a binary gamble (the realization at trial t is denoted Gt, and yields “H with probability Ph; and L otherwise”), the number of entrants (E), and two additional parameters (k and S).
The exact payoff for player i at trial t is:
$$V_i(t) = \begin{cases} 10 - k(E) + G_t & \text{if } i \text{ enters} \\ \operatorname{round}(G_t/S) \text{ with } p = .5;\ -\operatorname{round}(G_t/S) \text{ otherwise} & \text{if } i \text{ does not enter} \end{cases}$$
k is drawn (with equal probability) from {2, 3, 4, 5, 6, 7}
S is drawn (with equal probability) from {2, 3, 4, 5, 6}
The parameters of the binary gamble:
  • high is drawn (with equal probability) from {1, 2, 3, 4, 5, 6, 7, 8, 9, 10}.
  • A random number r is generated from (0, 1).
  • If r<0.5 then H=high, otherwise H=10*high
  • low is drawn (with equal probability) from {-10, -9, -8, -7, -6, -5, -4, -3, -2, -1}.
  • A random number r’ is generated from (0, 1).
  • If r’<0.5 then L=low, otherwise L=10*low
  • Ph=round[-L/(H-L), .01]
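A direct Python transcription of the algorithm above (our own sketch; the function name is ours):

```python
import random

def draw_game():
    """One draw from the problem-selection algorithm."""
    k = random.choice([2, 3, 4, 5, 6, 7])
    S = random.choice([2, 3, 4, 5, 6])
    high = random.choice(range(1, 11))
    H = high if random.random() < 0.5 else 10 * high
    low = random.choice(range(-10, 0))
    L = low if random.random() < 0.5 else 10 * low
    Ph = round(-L / (H - L), 2)   # makes the gamble's expected value 0
    return {"k": k, "S": S, "H": H, "L": L, "Ph": Ph}

print(draw_game())
```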

Appendix 2

Instructions

This experiment includes several games. In each game you will be matched to interact with 3 other participants, for several trials. At each trial each participant will be asked to choose between two options: “stay out” or “enter”.
Your payoff in each trial will depend on your choice, the state of nature, and on the choices of the other participants (such that the more people enter the less is the payoff from entry).
You will not receive a description of the exact payoff rule, but you will receive feedback after each trial. This feedback will include your payoff in that trial, and the payoff that you would have gotten had you selected the other option.
In addition, each time you make your decision within 2 seconds and confirm your feedback information (by pressing OK) within 2 seconds, you will receive a bonus of .03 experimental units.
The different games will involve different payoff rules. Before the start of each new game you will receive a notice.
Your final payoff will be composed of a starting fee of $25 plus/minus the experimental payoff in one randomly selected trial (where each experimental unit equals $0.1), and the bonus.
Good Luck!

Notes

1. See a copy of the instructions in Appendix 2.
2. We feel that the known limitation of this procedure (it does not guarantee convergence to the “correct parameters”) is not very important in the current context. We do not use models to find the correct parameters. Rather, we challenge readers to derive more useful predictions.
3. It is natural to assume that a previous experience is more likely to be sampled if the current trial is similar to the trial that led to that experience. This similarity rule can be used to capture discrimination between different states of nature [11]. However, the current implementation of the model is simplified by the assumption that all previous trials, but the most recent, are equally similar. This simplification assumption has to be modified to address learning in dynamic settings.
4. In an additional analysis we estimated a variant of SAW that assumes that all the players behave in accordance with the same parameters. This assumption reduces the fit to the level of the fit of NFP.
