**1. Introduction**

The use of the level-*k* model has prevailed in the literature for characterizing people's initial responses in laboratory strategic games [1,2]. The model characterizes players' systematic deviations from the Nash equilibrium using a bounded-rationality explanation. The level-0 type's action is assumed to be uniformly distributed over all actions (or, in some cases, to be the most salient action available), whereas the level-1 type best responds to the expected action of the level-0 type. The level-2 type best responds to the expected action of the level-1 type. The iterations follow this pattern: the level-*k* type always best responds to the action of the level-(*k* − 1) type. Such patterns of off-equilibrium play have been documented in many laboratory experiments. In her *p*-beauty contest game, Nagel found spikes corresponding to the first and second rounds of iterative best responses [1]. Stahl and Wilson found similar evidence of level-1 and level-2 types in 10 matrix games [2]. Camerer et al. developed a cognitive hierarchy model [3]: instead of believing that all other players are of type *k* − 1, a level-*k* player in the cognitive hierarchy model assigns a probability distribution over all lower types. Many other studies have used the level-*k* model to explain laboratory data (matrix games [4]; beauty contest games [5–8]; sequential games [9]; auctions [10,11]; Crawford, Costa-Gomes, and Iriberri also provide a comprehensive literature review [12]).
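The iterative structure of the level-*k* model can be made concrete with a short numerical sketch. In a *p*-beauty contest with guesses in [0, 100] and target equal to *p* times the mean guess, a level-0 player's expected guess is 50, and each higher level simply best responds by multiplying the previous level's expected guess by *p*. The function name and the choice *p* = 2/3 below are illustrative assumptions, not taken from the cited papers.

```python
# Level-k predictions in a p-beauty contest (guesses in [0, 100],
# target = p * mean of guesses). Illustrative sketch with p = 2/3.

def level_k_guess(k: int, p: float = 2 / 3, level0_mean: float = 50.0) -> float:
    """Level-0 guesses uniformly on [0, 100] (expected guess 50); each
    higher level best responds to the previous level's expected guess."""
    guess = level0_mean
    for _ in range(k):
        guess *= p  # best response to the expected guess of level k-1
    return guess

for k in range(4):
    print(k, round(level_k_guess(k), 2))
# 0 50.0
# 1 33.33
# 2 22.22
# 3 14.81
```

The spikes Nagel observed correspond to the k = 1 and k = 2 values of this sequence, which converges to the Nash equilibrium of 0 only as k grows without bound.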

However, although the level-*k* model has proven useful in characterizing initial responses in many laboratory games, its predictive power remains ambiguous because (1) it is often used ex post to classify a player's type given their actions and (2) the model lacks components related to individual characteristics that could help identify different types of players. Understanding how each individual reaches a certain level is therefore a natural starting point for any discussion of the model's predictive power. Alaoui and Penta developed a framework, the endogenous depth of reasoning (EDR) model, to explain what may happen in a player's head when they encounter a given strategic situation [13]. The EDR model captures individual characteristics by introducing a cost of reasoning, which is determined both by the strategic environment and by a player's endogenous cognitive ability. The model incorporates game-specific characteristics by introducing a benefit of reasoning through the games' payoffs. Lastly, the model allows a clear separation between cognitive bounds and the behavioral levels observed in games by introducing higher-order beliefs. This separation makes room for individual adjustments of *k*-levels in different strategic environments. As a result, a level-1 action observed in a game does not necessarily classify the player as a level-1 player; such an action can instead be a product of the player's cost-benefit analysis and their belief about their opponents.
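The separation between cognitive bounds and behavioral levels can be illustrated with a stylized sketch. This is not Alaoui and Penta's formal model: the function names and the linear cost and constant benefit forms are illustrative assumptions. The key idea is that a player keeps reasoning one step further as long as the perceived benefit of the next step exceeds its cost, which pins down a cognitive bound, and the observed behavioral level is then additionally capped by the player's belief about the opponent's bound.

```python
# Stylized sketch (not Alaoui and Penta's formal EDR model) of how a
# cost-benefit trade-off yields a cognitive bound, and how beliefs about
# the opponent cap the behavioral level actually observed in a game.

def cognitive_bound(benefit_of_step, cost_of_step, max_steps=100):
    """Highest k such that every reasoning step up to k is worth its cost."""
    k = 0
    while k < max_steps and benefit_of_step(k + 1) > cost_of_step(k + 1):
        k += 1
    return k

def behavioral_level(own_bound, believed_opponent_bound):
    """A player best responds to the opponent's believed level, so the
    observed level is at most one above that belief, and can never
    exceed the player's own cognitive bound."""
    return min(own_bound, believed_opponent_bound + 1)

# Example: constant benefit per step, cost rising with cognitive load.
bound = cognitive_bound(lambda k: 10.0, lambda k: 2.0 * k)
print(bound)                       # 4: steps 1..4 are worth the cost
print(behavioral_level(bound, 2))  # 3: the belief, not the bound, binds
```

In this toy example the player is capable of four rounds of reasoning, yet plays a level-3 action because they believe their opponent reasons only to level 2, which is exactly why an observed level-1 action need not classify its player as a level-1 type.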

The EDR model provides a plausible starting point for studying the persistence of the level-*k* model. However, because individuals have heterogeneous costs of reasoning and belief systems across strategic situations, it is hard to conduct direct comparisons across games to test whether behavioral *k*-levels follow the EDR model's predictions. In this paper, I use Costa-Gomes and Crawford's two-person guessing games (henceforth CGC06) and cognitive load to create different strategic environments [14]. By controlling cognitive load, I create a common standard for the cost of reasoning across all subjects. Although individual cognitive ability may still have an effect, the within-subject experimental design ensures that this individual effect does not contaminate comparisons of strategic levels across games for the same subject. The revelation of information about the strategic environment is also carefully manipulated to control the subjects' belief space. The goal was to test whether the EDR model provides directional predictions about the changes in *k*-levels across games for any given subject. Alaoui and Penta tested the benefit side of their model using the 11–20 money request game with altered bonus rewards [13,15]. To the best of my knowledge, this is the first paper to provide experimental tests of the EDR model by introducing different strategic environments with controlled cost and belief space.

With the 18 two-person guessing games in the experiment, the results suggest that subjects' behavioral levels vary systematically across games. Subjects are mostly responsive to changes in the strategic environment, and their directional changes in behavioral levels can be predicted by the EDR model when they are more cognitively capable or their opponent is less cognitively capable. Subjects also exhibit an inherent cognitive bound in each strategic environment: comparing a subject's behavioral levels across all games played with the same amount of cognitive resources, their behavioral levels rarely exceed their cognitive bound for that environment.

A few other papers have also studied the correlation of individual *k*-levels with cognitive ability. Allred et al. investigated the effects of cognitive load on strategic sophistication [16]. In their experiments, subjects performed a memorization task involving either a three- or nine-digit binary number concurrently with strategic games such as the beauty contest, the 11–20 game, and 10 matrix games. They found that subjects under high load (i.e., the nine-digit number) were less capable of computing best responses, especially in the beauty contest game, and were aware of their strategic disadvantage; the net effect of cognitive load depended on the specific strategic context. Burnham et al. used a standard psychometric test to measure their subjects' cognitive abilities and correlated the test results with subjects' performance in a *p*-beauty contest game [17]. They found a negative correlation between cognitive test scores and entries in the beauty contest game, indicating that subjects with higher cognitive ability tend to be more strategically sophisticated in such games. Gill and Prowse used a 60-question non-verbal Raven test to assign subjects to high- and low-cognitive-ability groups [18]. They asked the subjects to play a *p*-beauty contest game for 10 rounds and found that subjects in the high-cognitive-ability group converged to equilibrium faster. These studies provide some evidence of a correlation between individual *k*-levels and cognitive ability or carefully controlled cognitive tasks. In my experiment, I used memorization tasks to manipulate the subjects' cost of reasoning in the context of a two-person guessing game. According to Allred et al., higher cognitive load negatively affects a subject's ability to calculate best responses in this type of guessing game [16].
To attain a higher level of strategic sophistication, players have to exert more effort to combat the effects of cognitive load; therefore, the cost of reasoning increases with cognitive load in this strategic situation. Every subject experienced both the low and high cognitive loads at some point during the experiment, so they were fully aware of the additional cost of reasoning added by these memorization tasks. As a result, their cost of reasoning and their belief about their opponent's cost of reasoning can be quantified by the cognitive load.

The stability of *k*-levels is an important question in the level-*k* literature. Stahl and Wilson used twelve normal-form games to estimate players' levels [19]. They found that, using a relatively low threshold, 35 of 48 subjects could be classified as stable across games. Fragiadakis et al. asked subjects to replicate their decisions in a series of two-person guessing games and then to best respond to their own past actions [20]. They found that only 40% of the subjects who were able to replicate their decisions could be classified as a known behavioral type. A few works have touched on the predictive power of strategic sophistication. Arad and Rubinstein used a multidimensional Colonel Blotto game to observe subjects' multidimensional iterative reasoning [21]. They found that subjects with a higher level of reasoning in the 11–20 money request game also tended to exhibit more rounds of iterative reasoning in this game.

Perhaps the work most closely related to this paper is Georganas, Healy, and Weber's 2015 paper [22]. They conducted an experiment examining the cross-game stability of *k*-levels, using four matrix undercutting games and six two-person guessing games and comparing levels at the individual level. They found no correlation between the levels of reasoning across the two families of games, but some evidence of cross-game stability within the class of undercutting games. I study a similar question about the cross-game stability of the level-*k* model. Instead of introducing a second family of games, I used cognitive load to mimic different strategic environments and restricted the subjects to fixed pairs while playing the games. The belief space was therefore carefully controlled, and the uncertainty from playing against a new random player each round was completely eliminated. The data suggest that systematic level changes can be predicted by the EDR model under certain conditions. In Section 2, I provide a brief introduction to the EDR model to cover the necessary background and theoretical predictions. In Section 3, the experimental design is introduced in detail. Sections 4 and 5 cover the data analysis procedure and the discussion of the results, respectively. Section 6 provides the concluding remarks.
