Article

Synthetic User Generation in Games: Cloning Player Behavior with Transformer Models

by Alfredo Chapa Mata 1, Hisa Nimi 2 and Juan Carlos Chacón 2,*

1 School of Arts and Design, Universidad de Monterrey, Monterrey 66238, Mexico
2 Faculty of Informatics, Chiba University, Chiba 263-8522, Japan
* Author to whom correspondence should be addressed.
Information 2025, 16(4), 329; https://doi.org/10.3390/info16040329
Submission received: 6 March 2025 / Revised: 15 April 2025 / Accepted: 18 April 2025 / Published: 21 April 2025

Abstract

User-centered design (UCD) commonly requires direct player participation, yet budget limitations or restricted access to users can impede this goal. To address these challenges, this research explores a transformer-based approach coupled with a diffusion process to replicate real player behavior in a 2D side-scrolling action–adventure environment that emphasizes exploration. By collecting an extensive set of gameplay data from real participants in an open-source game, “A Robot Named Fight!”, this study gathered comprehensive state and input information for training. A transformer model was then adapted to generate button-press sequences from encoded game states, while the diffusion mechanism iteratively introduced and removed noise to refine its predictions. The results indicate a high degree of replication of the participant’s actions in contexts similar to the training data, as well as reasonable adaptation to previously unseen scenarios. Observational analysis further confirmed that the model mirrored essential aspects of the user’s style, including navigation strategies, the avoidance of unnecessary combat, and selective obstacle clearance. Despite hardware constraints and reliance on a single observer’s feedback, these findings suggest that a transformer–diffusion methodology can robustly approximate user behavior. This approach holds promise not only for automated playtesting and level design assistance in similar action–adventure games but also for broader domains where simulating user interaction can streamline iterative design and enhance player-centric outcomes.

1. Introduction

Video game design focuses on tasks such as the creation of game mechanics, rules, challenges, narratives, and aesthetics, in which designers must always consider the players and their desires [1,2,3]. As Brathwaite and Schreiber highlighted, good game design is player centric [1]. This approach of putting the player at the center of the design process is known as user-centered design (UCD) and aims to provide a satisfying and usable product that meets the needs of its users [4,5]. Ideally, integrating the user to work in a co-design environment will allow for exploring new ideas and developing knowledge collaboratively [6]. However, integrating human players into the design process can be time consuming and expensive, requiring significant effort from both developers and players to gather valuable data that can enhance the overall experience [7]. To investigate alternative ways of integrating players, this research proposes the development of a virtual user or artificial agent capable of simulating real player behavior. Specifically, it explores the potential of using a generative approach, utilizing a transformer model combined with a diffusion process, to interact with a Metroidvania game.
Introduced by Google in 2017, the transformer is a deep learning architecture designed to learn context by taking a sequence of input data and identifying the relationships between them [8,9,10]. By transforming an input sequence into an output sequence [11], transformers have been widely adopted in applications such as natural language processing [10], image generation [12], vision [13], and speech recognition [14]. Leveraging its advantage of requiring less training and processing time and its potential to adapt to other tasks [8], this research aims to investigate whether a transformer model and a diffusion process can replicate player behavior in Metroidvania games, enabling virtual agents to interact with the game environment as effectively as real users. The ultimate goal is to integrate these virtual agents into a game design team to enrich the process and ensure player-centered experiences. Two hypotheses guide this exploration: first, that a transformer model combined with a diffusion process is a viable strategy for cloning human behavior, potentially contributing to simulation and interaction experiences; and, second, that such a virtual agent can transform text-based instructions into actions, interacting seamlessly with a Metroidvania video game and paving the way for integration into game design and testing processes.
This paper is structured as follows: After the introductory section, Section 2 presents a comprehensive review of related work, focusing on synthetic user modeling, digital twins, and AI-driven player behavior replication in video games. Section 3 describes the methodology, including the data collection process, the proposed transformer–diffusion architecture, and the training approach. Section 4 reports the experimental results and evaluation, incorporating quantitative performance metrics, such as cross-entropy and perplexity, and observational analysis of the virtual agent’s behavior. Section 5 provides a discussion contextualizing the findings, identifying key strengths, limitations, and areas for potential enhancement. Finally, Section 6 concludes and outlines future research directions, highlighting the possible applications of AI-driven virtual users in automated playtesting and player-centric game development.

2. Related Work

A key objective of artificial intelligence research is the development of AI agents capable of learning and exhibiting human-like behavior, which is a critical element for enhancing human–AI collaboration within interactive systems [15]. Achieving such human-like behaviors is essential not only for autonomous decision-making processes in AI systems but also for significantly improving user engagement in video game environments through more intuitive interactions for players [16]. Previous work has emphasized the role of artificial intelligence techniques in video game design, recognizing it as a fundamental milestone that can be used for different purposes like game personalization, user data analysis, and enhancing graphics and sound [17]. One area of interest within artificial intelligence (AI) research for computer games involves creating non-player characters (NPCs) that exhibit human-like behaviors. Modern NPCs utilize diverse decision-making methods to determine their actions across various scenarios, enabling them to dynamically influence the state of the game environment [18]. To address this challenge, extensive research has explored integrating artificial agents or users capable of emulating human behavior into video game environments. This includes studies on synthetic users, which are artificially generated agents designed to simulate human behavior [19,20,21]; player digital twins, which involve creating personalized virtual models of individual players by capturing their unique playing styles, decisions, and preferences to enable tailored interactions and predictive analysis [22,23,24]; and AI-driven player modeling, which refers to techniques employing machine learning algorithms to analyze, predict, and adapt to player behaviors dynamically, enhancing user engagement and gameplay personalization [25,26,27].
AI-driven agents have been thoroughly researched for their capability to mimic human behavior in digital environments, particularly in areas such as automated playtesting [28,29,30,31] and game analytics [32]. Various methodologies have been employed for modeling human interactions in virtual spaces, including behavioral cloning (BC), reinforcement learning, and generative modeling techniques [33,34,35,36].
Behavioral cloning is a widely used imitation learning approach that has been successful across application domains, such as autonomous driving [37] and video game playing [38]. However, traditional BC models face significant limitations in complex tasks due to oversimplified approximations of human behavior, dataset biases, and overfitting, leading to poor generalization in novel settings, training instability with high variance, and undermined reliability and reproducibility [39]. To address these limitations, recent research has explored generative approaches, such as diffusion models, which have attained state-of-the-art results in image, video, and audio generation [40]. Diffusion models can enhance AI behavior modeling by faithfully representing the multimodal and complex action distributions found in human behavior, enabling AI agents to handle new situations satisfactorily and adapt to diverse scenarios [41]. By applying diffusion models to behavioral cloning, researchers have demonstrated improved action consistency and adaptability, developing an artificial agent that matches the performance of the medium-difficulty built-in AI in the deathmatch game mode while exhibiting a human-like play style in Counter-Strike: Global Offensive (CS:GO) [42]. Building on these advancements, this study proposes a transformer-based generative model combined with a diffusion process to replicate player behavior in a 2D side-scrolling action–adventure game. This approach aims to enhance player simulation techniques, contributing to AI-driven game testing, automated playtesting, and player-centric design.

3. Methodology

The main objective of this research is to explore a way to create a generated user capable of replicating how a specific real player interacts with a Metroidvania video game. To achieve this, we took a natural language processing (NLP) approach and developed and trained a transformer model that interprets a sequence of gameplay data and transforms it into a sequence of actions in the form of buttons, allowing it to interact with the game.
The development of the proposed model started with the selection of the users to be cloned, using their game patterns as a reference for training. Through several play sessions, the necessary data were collected to form a dataset. Subsequently, extensive training was carried out using the dataset as input, allowing the model to learn patterns and develop the ability to react to different scenarios. Finally, integrated tests were performed in the video game to evaluate the generated agent’s accuracy in moving between levels and replicating the user’s play style. Figure 1 shows a graphical visualization of these steps.

3.1. Participants

The experiment involved a group of five participants from Mexico consisting of three men and two women. The selection aimed to include individuals with varying levels of experience in video games to ensure a diverse behavioral dataset. The age of the participants ranged from 27 to 67 years, with a mean age of 41.2 years (SD = 15.57).

3.2. Materials

3.2.1. Video Game

After establishing that 2D side-scrolling, exploration-focused action–adventure games offer ample opportunities for collecting varied data—thanks to level exploration elements, platforming mechanics, and character upgrades—an evaluation was carried out to decide whether to use an existing game of this type or develop a small custom test level. Following research and feasibility checks, “A Robot Named Fight!”, developed by Morningstar Game Studio in 2017, was selected. This title is described by its creators as a roguelike action–adventure game that emphasizes exploration and item collection [43]. One reason for its selection was the procedural generation mechanic, which allows each new game to feature a different layout, thereby broadening possible data samples. Another crucial factor was the availability of its open-source code [44], which enabled the integration of custom data-capturing functionalities without undermining the stability of the final build.

3.2.2. Software Tools

Unity (version 2023.2.19) and C# were used to introduce game modifications into “A Robot Named Fight!”, specifically for logging and storing data frames during play sessions. Python served as the primary language for dataset management, with PyTorch (version 2.4.0) as the core machine learning framework used to build and train the transformer.

3.3. Data Collection

For the data collection process, approximately 11,104,548 distinct game frames were analyzed, distributed across five player cohorts, each representing a unique play style or skill level. The game environment was modified to annotate each frame with detailed information, including the player’s position, available weapons, selected weapons, directional collision detections, on-screen obstacles or enemies, and the input keys pressed during gameplay. All recorded data were stored in multiple JSON files. These records constituted the primary dataset used for the model’s training phase.
Following the completion of training, an additional 49,140 frames (equivalent to approximately 13.65 min of gameplay) were captured for testing (Figure 2). These frames featured previously unseen level seeds and configurations, enabling the evaluation of the virtual agent’s capacity to generalize to new environments.
The captured data are related to the elements of the level visible on the screen in each frame, with which the user interacts to move and complete the level. This is seen from two perspectives: that of the character being controlled and that of the real player. From the character’s perspective, the data of the elements nearby in 8 directions from the character, such as walls, platforms, obstacles, and enemies, with which they can collide, are captured (Figure 3a). From the real player’s perspective, the data of the elements with which they can collide are also collected, but, in this case, these include all elements that are present on the screen, even if they are hidden behind a wall (Figure 3b). See Table 1 for a detailed description of each item stored for the dataset.
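To make the dataset structure concrete, a single logged frame can be pictured as the record below. The field names follow Table 1; all values and collider names are hypothetical.

```python
# Hypothetical example of one logged frame, following the dataset structure
# in Table 1. Values and object names are illustrative, not recorded data.
frame_record = {
    "positionX": 12.4,                 # player's X-coordinate
    "positionY": -3.1,                 # player's Y-coordinate
    "health": 5,                       # current life points
    "energy": 2,                       # current energy points
    "weaponsCollected": ["BasicBlaster", "Spread"],
    "weaponsSelected": "BasicBlaster", # weapon currently equipped
    "inputKey": ["Right", "Jump"],     # buttons pressed on this frame
    "rayPointer": [                    # 8-direction collision probes
        {"direction": "Up", "collider": "Wall", "distance": 1.8},
        {"direction": "Right", "collider": "Enemy", "distance": 4.2},
    ],
    "collidersOnScreen": [             # everything visible, even behind walls
        {"collider": "Door", "location": [18.0, -2.0], "distance": 6.1},
    ],
}
```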

Data Preprocessing

In preparation for model training, a comprehensive preprocessing pipeline was implemented to ensure data quality and consistency. This process involved three primary stages: First, filtering and normalization were applied to remove irrelevant metadata, such as timestamp drift or logging anomalies, and to discard incompatible or corrupted frames. Second, a tokenization step was performed, wherein each recorded player action was converted into discrete tokens to create structured sequences suitable for model input. For example, an action such as USE_SKILL_FIREBALL AT_LOCATION would be transformed into a short sequence of tokens representing the action and its parameters. Finally, considerations regarding sequence length and vocabulary size were addressed: the average sequence length per frame cluster ranged between 50 and 60 tokens, and the constructed vocabulary encompassed approximately 5000 unique actions or sub-actions. This preprocessing ensured that the dataset was standardized and semantically meaningful, facilitating efficient and accurate model training.
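A minimal sketch of this tokenization step is given below. The paper reports only the resulting statistics (roughly 5000 unique tokens and 50–60 tokens per sequence), so the vocabulary-building scheme shown here is an illustrative assumption.

```python
# Minimal action-tokenization sketch: split each recorded action string into
# tokens and assign integer IDs on first sight. The special tokens and the
# splitting rule are assumptions for illustration.

class ActionTokenizer:
    def __init__(self):
        self.token_to_id = {"<pad>": 0, "<bos>": 1, "<eos>": 2}

    def encode(self, action_string):
        """Turn e.g. 'USE_SKILL_FIREBALL AT_LOCATION' into token IDs."""
        ids = [self.token_to_id["<bos>"]]
        for tok in action_string.split():
            if tok not in self.token_to_id:
                self.token_to_id[tok] = len(self.token_to_id)
            ids.append(self.token_to_id[tok])
        ids.append(self.token_to_id["<eos>"])
        return ids

tokenizer = ActionTokenizer()
print(tokenizer.encode("USE_SKILL_FIREBALL AT_LOCATION"))  # [1, 3, 4, 2]
```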

3.4. Model and Training Process

The objective of this exploration was to train a transformer model augmented with diffusion regularization that can effectively clone user actions recorded across over 11 million game frames. Specifically, we focused on perplexity as a measure of the model’s predictive confidence in accurately replicating players’ decisions. The methodology involved injecting controlled Gaussian noise into decoder embeddings, thereby teaching the transformer to recover from noisy inputs. This strategy provided more fine-grained control over the generative process, enabling more precise iterative corrections and extending its applicability to a broader range of tasks.
Compared to ablated versions without diffusion, our approach markedly reduced perplexity and maintained fidelity even when applied to data from new, unseen game frames. We further discuss the architectural details, domain-specific considerations, and future expansions, such as exploring advanced schedulers and domain adaptation strategies.
In this work, diffusion regularization was incorporated alongside standard teacher forcing. This method trains the transformer to effectively “denoise” noisy labels, thus simultaneously learning the precise mapping of actions and improving its resilience against minor input or embedding perturbations. Perplexity—a principal metric for probability estimation in language modeling—was adopted as our core performance indicator, as it gauges the model’s certainty in sequential decision prediction.

3.4.1. Transformer Architecture and Training

Model Design

The core of the proposed model is based on a standard transformer architecture consisting of an encoder–decoder framework. The encoder processes the input data, such as partial game frame information, and transforms it into a latent representation that captures the contextual relationships within the frame. The decoder then utilizes this encoded information, along with the previously generated tokens, to predict the next user action in the sequence. This architecture enables the model to learn complex temporal and spatial dependencies inherent in gameplay behavior, facilitating accurate action generation based on dynamic game environments. The architecture, a 12-layer encoder–decoder transformer with 8 attention heads per layer and a hidden dimensionality of 512, was chosen to balance expressiveness and computational feasibility. Preliminary hyperparameter tuning showed diminishing returns beyond 12 layers for this dataset.
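The sketch below shows how such a backbone could be assembled in PyTorch with the reported hyperparameters (12 encoder and 12 decoder layers, 8 attention heads, hidden size 512). The embedding scheme and masking details are our assumptions rather than the exact implementation.

```python
# Sketch of the encoder-decoder backbone with the reported hyperparameters.
# VOCAB_SIZE and MAX_LEN follow Sections 3.3 and 3.4.2; details not stated
# in the paper (positional embeddings, masking) are assumptions.

import torch
import torch.nn as nn

VOCAB_SIZE = 5000   # ~5000 unique actions/sub-actions (Section 3.3)
MAX_LEN = 64        # fixed maximum sequence length (Section 3.4.2)
D_MODEL = 512

class ActionCloningTransformer(nn.Module):
    def __init__(self):
        super().__init__()
        self.src_embed = nn.Embedding(VOCAB_SIZE, D_MODEL)  # encoded game state
        self.tgt_embed = nn.Embedding(VOCAB_SIZE, D_MODEL)  # action tokens
        self.pos_embed = nn.Embedding(MAX_LEN, D_MODEL)     # learned positions
        self.transformer = nn.Transformer(
            d_model=D_MODEL, nhead=8,
            num_encoder_layers=12, num_decoder_layers=12,
            batch_first=True)
        self.out_proj = nn.Linear(D_MODEL, VOCAB_SIZE)      # token logits

    def add_pos(self, x):
        pos = torch.arange(x.size(1), device=x.device)
        return x + self.pos_embed(pos)

    def forward(self, src_ids, tgt_ids):
        src = self.add_pos(self.src_embed(src_ids))
        tgt = self.add_pos(self.tgt_embed(tgt_ids))
        mask = self.transformer.generate_square_subsequent_mask(
            tgt_ids.size(1)).to(tgt_ids.device)             # causal mask
        hidden = self.transformer(src, tgt, tgt_mask=mask)
        return self.out_proj(hidden)                        # (batch, seq, vocab)
```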

Diffusion Regularization Mechanism

To enhance the model’s robustness and generalization, a noise injection and diffusion-based loss strategy was employed during training. First, Gaussian noise ε ∼ N(0, σ²) was added to the clean decoder embeddings E, which represent the ground-truth user actions, resulting in a perturbed or “diffused” version:

E_noisy = E + ε.

These noisy embeddings were then passed through the decoder to generate a set of output logits denoted as noisy_logits. A diffusion loss L_diff was computed to align noisy_logits with the true user actions.

The final objective function combined the standard cross-entropy loss from the model’s primary prediction path with the diffusion loss, forming a total loss defined as

L_total = L_ce + λ · L_diff,

where L_ce is the standard cross-entropy loss from the model’s primary path and λ is a weighting factor. Over the course of training, σ (the noise standard deviation) and λ are ramped up linearly.
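Under these definitions, the combined objective can be sketched as follows, reusing the ActionCloningTransformer sketch above. The ramp endpoints sigma_max and lambda_max are illustrative assumptions; the paper states only that both quantities ramp up linearly.

```python
# Sketch of the diffusion-regularized objective: a clean teacher-forced pass
# gives L_ce, a pass over noise-perturbed decoder embeddings gives L_diff,
# and the two are combined as L_total = L_ce + lambda * L_diff.

import torch
import torch.nn.functional as F

def total_loss(model, src_ids, tgt_ids, step, total_steps,
               sigma_max=0.1, lambda_max=0.5):   # ramp endpoints assumed
    ramp = step / total_steps                    # linear ramp over training
    sigma, lam = sigma_max * ramp, lambda_max * ramp

    dec_in, labels = tgt_ids[:, :-1], tgt_ids[:, 1:]

    # Clean path: standard teacher-forced cross-entropy, L_ce.
    logits = model(src_ids, dec_in)
    l_ce = F.cross_entropy(logits.reshape(-1, logits.size(-1)),
                           labels.reshape(-1))

    # Noisy path: E_noisy = E + eps, with eps ~ N(0, sigma^2).
    tgt_embed = model.add_pos(model.tgt_embed(dec_in))
    noisy_embed = tgt_embed + sigma * torch.randn_like(tgt_embed)
    src = model.add_pos(model.src_embed(src_ids))
    mask = model.transformer.generate_square_subsequent_mask(
        dec_in.size(1)).to(dec_in.device)
    noisy_logits = model.out_proj(
        model.transformer(src, noisy_embed, tgt_mask=mask))
    l_diff = F.cross_entropy(noisy_logits.reshape(-1, noisy_logits.size(-1)),
                             labels.reshape(-1))

    return l_ce + lam * l_diff                   # L_total = L_ce + lambda*L_diff
```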

3.4.2. Training Regime

The training process was conducted using a batch size of 8 and a learning rate of 1 × 10−4 optimized with the AdamW algorithm. Preliminary experiments indicated that the use of teacher forcing during the initial training epochs was essential for stabilizing and grounding the model’s learning behavior. To ensure consistency in input structure across training batches, a fixed maximum sequence length of 64 tokens was adopted. This constraint helped maintain uniformity in batch processing and contributed to efficient model convergence.
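A minimal training-loop sketch under this regime is shown below, reusing the model and loss sketches above. The random tensors standing in for the tokenized dataset, and the epoch count, are placeholders.

```python
# Training-loop sketch with the reported settings: batch size 8, AdamW at
# 1e-4, and teacher forcing via the shifted target sequence inside
# total_loss(). Dataset wiring is illustrative only.

import torch
from torch.utils.data import DataLoader, TensorDataset

model = ActionCloningTransformer()
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

# Dummy tensors standing in for tokenized game states and action sequences.
src = torch.randint(3, VOCAB_SIZE, (64, MAX_LEN))
tgt = torch.randint(3, VOCAB_SIZE, (64, MAX_LEN))
loader = DataLoader(TensorDataset(src, tgt), batch_size=8, shuffle=True)

EPOCHS = 10                        # placeholder; not reported in the paper
total_steps = EPOCHS * len(loader)
step = 0
for epoch in range(EPOCHS):
    for src_ids, tgt_ids in loader:
        loss = total_loss(model, src_ids, tgt_ids, step, total_steps)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        step += 1
```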

3.4.3. Model Evaluation

Once trained, the model was integrated back into the modified “A Robot Named Fight!” to function as an autonomous agent. During gameplay, the environment continually fed data frames to the model, which responded with sequences of button presses executed in real time, allowing the agent to move, jump, shoot, and otherwise interact with the level. The flow of how this process works is shown in Figure 4.
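This loop can be pictured as greedy token-by-token decoding, as sketched below. The encode_frame and press hooks stand in for the Unity-side integration, which the paper does not detail, and greedy decoding is our assumption.

```python
# Autonomous-play sketch (Figure 4): the game supplies a frame, the model
# decodes a button-press sequence, and the buttons are executed in-engine.

import torch

@torch.no_grad()
def act(model, frame_tokens, bos_id=1, eos_id=2, max_len=64):
    src = torch.tensor([frame_tokens])            # (1, src_len) game state
    out = torch.tensor([[bos_id]])                # decoding starts from <bos>
    for _ in range(max_len - 1):
        logits = model(src, out)
        next_id = logits[0, -1].argmax().item()   # greedy choice (assumed)
        if next_id == eos_id:
            break
        out = torch.cat([out, torch.tensor([[next_id]])], dim=1)
    return out[0, 1:].tolist()                    # button-press token IDs

# Hypothetical engine-side loop:
# while game_running:
#     buttons = act(model, encode_frame(current_frame))
#     for b in buttons:
#         press(b)   # executed in real time inside the modified game
```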
To assess how effectively the model replicated the user’s play style, two methods were employed.

3.4.4. Quantitative Metrics

The primary evaluation metric used to assess the model’s performance was perplexity (PPL), a standard measure in sequence prediction tasks. Although secondary metrics, such as exact match rate and edit similarity, were computed, perplexity served as the principal criterion due to its sensitivity to token-level prediction quality. On an evaluation set composed of 10,308 game frames, the model achieved a perplexity of 1.0000 and an average cross-entropy loss of 0.0000. These results indicate a near-perfect prediction of user actions, demonstrating extremely high fidelity in replicating the behavior observed in the training data (Table 2). To evaluate the model’s generalizability, it was tested on a separate dataset comprising 10,308 new game frames that were not part of the original dataset. The model achieved a perplexity of 1.0012 and an average cross-entropy loss of 0.0012. While marginally higher than the values observed for the original dataset, these results confirm the model’s strong capacity to generalize and accurately replicate novel user-action sequences. An ablation study was conducted to assess the contribution of diffusion-based regularization. When the diffusion component was minimized or removed, the model’s performance declined slightly, yielding a perplexity of 1.0045 and an average cross-entropy loss of 0.0045. This modest increase highlights the beneficial role of diffusion in enhancing prediction stability and consistency.
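Since perplexity is the exponential of the average cross-entropy, PPL = exp(CE), the paired values reported in Table 2 are internally consistent, as the following check illustrates:

```python
# Perplexity is exp(cross-entropy); reproducing the pairs in Table 2.
import math

for ce in (0.0000, 0.0012, 0.0045):
    print(f"CE = {ce:.4f} -> PPL = {math.exp(ce):.4f}")
# CE = 0.0000 -> PPL = 1.0000
# CE = 0.0012 -> PPL = 1.0012
# CE = 0.0045 -> PPL = 1.0045
```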
In terms of the evaluation results across datasets, the model consistently achieved low perplexity, validating its effectiveness in cloning user actions with and without diffusion-based regularization.
As part of the evaluation, a comparative test was conducted between our model (based on a diffusion transformer architecture) and a standard transformer model (Table 3). The presented results demonstrate a nuanced performance difference between our model (diffusion transformer) and a standard transformer architecture when applied to the task of action cloning in a game environment. While total counts for frames and actions remained equivalent across both models, indicating consistent action coverage, several key performance metrics revealed the impact of the diffusion component. The diffusion transformer model consistently achieved superior Frame Accuracy, edit similarity, and Action Success Rate, suggesting improved fidelity in replicating user actions within the game frames. Statistical analysis, specifically the t-test and Cohen’s d, confirmed that these differences were statistically significant, bolstering the conclusion that the diffusion component meaningfully enhances performance. The observed decreases in norm_levenshtein and avg_entropy in the ablation model suggest a loss of diversity and potentially an increase in the probability of producing less optimal or less varied action sequences.
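A sketch of this kind of significance test is given below. The raw per-frame scores are not published, so the arrays are placeholders sampled around the reported Frame Accuracy means, and Welch's t-test is our assumption for the exact test variant.

```python
# Illustrative significance test on per-frame accuracy: Welch's t-test plus
# Cohen's d. The binary per-frame outcomes are simulated placeholders drawn
# from the reported means (0.9730 vs. 0.8798), not the study's raw data.

import numpy as np
from scipy.stats import ttest_ind

def cohens_d(a, b):
    pooled = np.sqrt((np.var(a, ddof=1) + np.var(b, ddof=1)) / 2)
    return (np.mean(a) - np.mean(b)) / pooled

rng = np.random.default_rng(0)
diffusion = rng.binomial(1, 0.9730, 10_308)   # diffusion transformer hits
baseline = rng.binomial(1, 0.8798, 10_308)    # standard transformer hits

t, p = ttest_ind(diffusion, baseline, equal_var=False)
print(f"t = {t:.2f}, p = {p:.3g}, d = {cohens_d(diffusion, baseline):.2f}")
```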
These findings suggest that the diffusion process introduces a valuable regularization or exploration component, enabling the model to generate more accurate and robust action sequences. This is likely due to the diffusion process’s ability to model the inherent stochasticity of human action, generating a distribution of potential actions rather than deterministic outputs. While the gains were not dramatic across all metrics, the consistency of improvement across several key indicators, coupled with statistical significance, supports the inclusion of diffusion mechanisms in action-cloning architectures.
Both models processed an equivalent number of frames and actions, ensuring consistency in task coverage. The diffusion transformer consistently outperformed the baseline in key areas including Action Success Rate, Average Action Distance, Frame Accuracy, and Entropy.
These results demonstrate a clear performance advantage for the diffusion transformer architecture in the task of action cloning within a game environment. While the total number of actions and frames processed remained consistent between both models, the diffusion transformer consistently achieved higher levels of accuracy and action quality, as evidenced by the significantly improved Frame Accuracy (0.0932 difference) and reduced Average Action Distance (−0.016). These improvements were not merely statistically significant (p < 0.0001 for Frame Accuracy, and further statistical significance demonstrated for other metrics), but also exhibited meaningful effect sizes, bolstering the claim that the inclusion of diffusion mechanisms meaningfully enhances performance.
The reduction in Average Action Distance is particularly noteworthy, suggesting that the diffusion transformer not only successfully replicates user actions more often, but also produces more accurate and nuanced action predictions when it does. Furthermore, the observed decrease in Entropy indicates a more predictable and consistent behavior from the diffusion transformer, implying a more stable and reliable action-cloning process.

3.4.5. Observational Analysis

The second method was a non-quantitative assessment (Figure 5). The original participant reviewed gameplay recordings of the agent, offering feedback on how closely the clone’s choices resembled his own. This included instances where the agent decided to bypass enemies, how it navigated unfamiliar room layouts, and its precision in platforming. Although valuable for highlighting nuances in movement and strategy, this evaluation relied on a single observer’s perspective. Future research will aim to incorporate multiple evaluators and standardized protocols to minimize bias.
During observational analysis, the model convincingly emulated the participant’s gameplay style across multiple elements. It advanced to the edges of the screen to locate entrances, shot at doors to open them, timed jumps fairly accurately, selectively removed obstructions, and prioritized avoiding unnecessary combat. Overall, these behaviors showcased the model’s ability to remain stable under varied in-game challenges, reflecting the synergy between the collected data and the modeling process.

4. Results

The model achieved near-perfect perplexity scores across both the original and unseen datasets, indicating exceptionally low predictive uncertainty and a high degree of fidelity in mimicking player behavior. Ablation experiments further revealed that removing the diffusion component led to a modest but measurable decline in performance, underscoring the critical role of noise injection as a regularization mechanism. These findings highlight the value of denoising strategies in stabilizing and enhancing sequence prediction tasks. The testing phase of the agent’s interaction with the video game environment also yielded very promising behavioral results.

The comparative model test supports the conclusion that incorporating diffusion models into action-cloning architectures leads to measurable improvements in both the accuracy and quality of cloned actions. The consistent and statistically significant advantages observed across multiple key metrics provide strong evidence that diffusion mechanisms effectively address challenges associated with accurately capturing and replicating complex user behaviors. Future research should focus on exploring the optimal configurations of diffusion models within action-cloning pipelines and investigating the potential of these architectures to generalize across diverse and complex action spaces. These results offer a compelling argument for the continued development and adoption of diffusion-based approaches for action cloning and imitation learning.

The following are the findings from the agent’s performance in different gameplay tasks.

4.1. Navigation

A positive result was obtained in this task, as the agent was always able to move through the scenarios. The agent knew that it had to move to one of the edges of the screen to find a door that would take it to the next scenario. Figure 6a,b show examples of this behavior.

4.2. Platforming

It was noted that, in most cases, the generated agent had good dexterity in tracing the edges of the platforms and moving between them, jumping to avoid hazards (Figure 7a). On isolated occasions, the agent, when trying to simulate many queued movement instructions, reacted too late and fell off the platform, taking damage. Fortunately, the agent managed to react and recover from the error by jumping to the next platform (Figure 7b).

4.3. Obstacle Detection

The agent was able to identify and remove on-screen obstacles that obstructed its path by shooting at them. This helped the agent continue its navigation within each scenario. Examples of these behaviors are shown in Figure 8a,b.

4.4. Combat

In most cases, the agent chose to dodge enemies to concentrate on moving forward in the scenario. This behavior was similar to that adopted by the real user during data collection, as he preferred to skip them in order to move quickly. On some occasions, when enemies were close to the character, the agent decided to shoot them (Figure 9a). For this action, we found different results depending on the enemies’ behavior. When the enemies were statically positioned at one point or had a simple movement pattern, moving slowly from one point to another, they were very easy to defeat (Figure 9b). However, when the enemies moved faster or in more complex directions, it was more difficult to shoot them. The agent could successfully aim in the direction of the enemy but missed its shot because it was not fast enough (Figure 9c).

4.5. Areas of Opportunity

Despite the good results obtained, there were some undesired behaviors that decreased the accuracy of the agent’s interactions. One of these was the sporadic stalling of the player character in some parts of the scenario (Figure 10). This consisted of the character repeating the same action, such as running or shooting, for a short period of time until it reacted again and continued on its way. This behavior is believed to have been caused by the previously mentioned limitation that causes button saturation in the simulation.

5. Discussion

Research has demonstrated the use of behavioral cloning and imitation learning to replicate player interactions [45]. For instance, Tastan and Sukthankar (2011) used inverse reinforcement learning to model human-like behavior in first-person-shooter (FPS) games by extracting high-level patterns from player movement and aiming data [46]. Similarly, Gordillo et al. (2021) applied reinforcement learning to generate curiosity-driven playtesting agents capable of exploring 3D scenarios and uncovering potential issues inside a game [47]. Unlike these approaches, which rely on reward-based learning, our method leverages a transformer to infer button-press sequences directly from game states, capturing temporal dependencies without requiring explicit reinforcement signals. Additionally, our diffusion mechanism improves action prediction by iteratively refining output sequences.

These findings confirm that a transformer model incorporating a diffusion process can effectively replicate a real user’s gameplay style in a 2D side-scrolling, exploration-focused action–adventure environment, and they also underscore the critical role of diffusion regularization in enhancing model robustness and generalization. By training the model to reconstruct user action tokens from noisy embeddings, the diffusion branch functions as an effective denoising mechanism. This approach yields a resilient generative model capable of accurately replicating user behaviors in the dynamic context of video game environments.

Such capabilities open a wide range of applications in game development and testing. For instance, low-perplexity models can serve as player simulators, enabling the automated testing of new content with behavior that closely mirrors that of human players. Similarly, these models could be deployed as AI opponents, generating more natural and human-like behavior in adversarial scenarios. In quality assurance and balancing, the ability to simulate thousands of plausible user actions at scale presents a valuable tool for identifying design flaws and optimizing gameplay experiences.

However, certain limitations remain. Future research could explore adaptive or non-linear noise schedules to further improve training dynamics. Expanding the system’s applicability across a broader range of game genres and diverse user behaviors would also enhance its generalizability. Moreover, incorporating multimodal inputs, such as synchronized audio or visual signals, holds promise for improving the fidelity of user-action predictions in more complex, sensor-rich environments.
Quantitative comparisons, combined with observational insights, identified areas of both strength and potential refinement. The agent’s tendency to mirror the participant’s reluctance to engage enemies unless necessary aligned strongly with the source data. However, misplaced or late attacks suggest the model’s limitations in highly dynamic contexts. Likewise, occasional input lag—stemming from hardware constraints—underscored the importance of a robust simulation environment for proper evaluation. Nevertheless, this approach demonstrates the feasibility of training an agent to embody a user’s in-game decision-making patterns. It underscores how further enhancements, such as domain adaptation methods, hardware optimization, and additional play data, could generate even more robust clones. In practice, these clones might assist with iterative design, automated QA testing, or early-stage concept validation.

6. Conclusions and Future Work

The research presented here suggests that a transformer-based model, combined with a diffusion process, can reproduce a substantial segment of a real player’s behaviors in 2D side-scrolling, exploration-focused action–adventure games. Despite hardware constraints, the agent displayed an impressive level of fidelity in replicating navigation, platforming, obstacle clearance, and selective combat avoidance. Crucially, its adaptability in unfamiliar contexts highlights its potential for broader applications.
Several directions for future research and development emerge from these results. First, gathering richer datasets, including multiple users or more diverse genres, could determine how effectively this approach generalizes to various gaming styles and preferences. Second, improving the system’s capacity to process high-frequency input streams without delay or loss would yield more accurate reflections of user behavior. Third, introducing domain adaptation techniques may help the agent handle significantly altered or entirely new environments, strengthening its utility for cross-game or cross-genre testing.
Finally, these user-cloning agents could transform design pipelines by standing in for human testers, providing rapid, player-like responses to new levels, mechanics, or balancing updates. While human play and feedback remain irreplaceable for capturing creative and emotional nuances, a reliable virtual proxy offers an adaptive, resource-efficient complement to traditional user-centered design processes. Further refinement of this transformer–diffusion approach holds significant promise not only for 2D side-scrolling, exploration-focused action–adventure titles with procedural elements, but also for a wide spectrum of interactive systems seeking robust, player-oriented experiences.
Beyond these specific 2D exploration-based action–adventure contexts, the user-cloning methodology described here could be adapted to other game genres or interactive experiences. For instance, strategy games might benefit from simulating diverse tactical styles to test balance and difficulty scenarios, while puzzle- or narrative-driven titles could use cloned agents to refine level design or dialog pacing. By systematically extending data collection and model training across varied gaming paradigms, designers may gain deeper insights into user decision making, streamline prototyping, and ultimately deliver more engaging, player-centric products.

Author Contributions

A.C.M.: investigation, data curation, writing—original draft. H.N.: project administration, conceptualization, supervision, visualization, writing—review and editing. J.C.C.: supervision, conceptualization, methodology, resources, software, validation. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

This research was a collaborative project between Universidad de Monterrey in Mexico and Chiba University in Japan.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Brathwaite, B.; Schreiber, I. Challenges for Game Designers; Charles River Media: Needham, MA, USA, 2009; pp. 1–2. [Google Scholar]
  2. Schell, J. The Art of Game Design: A Book of Lenses; CRC Press: Boca Raton, FL, USA, 2008; pp. 40–46. [Google Scholar]
  3. Zubek, R. Elements of Game Design; MIT Press: Cambridge, MA, USA, 2020; pp. 10–22. [Google Scholar]
  4. Abras, C.; Maloney-Krichmar, D.; Preece, J. User-Centered Design. Encycl. Hum.-Comput. Interact. 2004, 37, 445–456. [Google Scholar]
  5. Lowdermilk, T. User-Centered Design: A Developer’s Guide to Building User-Friendly Applications; O’Reilly Media, Inc.: Sebastopol, CA, USA, 2013; pp. 5–13. [Google Scholar]
  6. Sanders, E.B.N.; Stappers, P.J. Co-Creation and the New Landscapes of Design. CoDesign 2008, 4, 5–18. [Google Scholar] [CrossRef]
  7. Suetake, H.; Fukusato, T.; Arzate Cruz, C.; Nealen, A.; Igarashi, T. Interactive design exploration of game stages using adjustable synthetic testers. In Proceedings of the International Conference on the Foundations of Digital Games, Bugibba, Malta, 15–18 September 2020; pp. 1–4. [Google Scholar] [CrossRef]
  8. Amatriain, X. Transformer Models: An Introduction and Catalog. arXiv 2023, arXiv:2302.07730. [Google Scholar]
  9. Merritt, R. What Is a Transformer Model? NVIDIA Blogs. 2024. Available online: https://blogs.nvidia.com/blog/what-is-a-transformer-model/ (accessed on 18 November 2024).
  10. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762. [Google Scholar]
  11. Giacaglia, G. How Transformers Work—Towards Data Science. Medium. 3 December 2024.
  12. Parmar, N.; Vaswani, A.; Uszkoreit, J.; Kaiser, Ł.; Shazeer, N.; Ku, A.; Tran, D. Image Transformer. arXiv 2018, arXiv:1802.05751. [Google Scholar]
  13. Khan, S.; Naseer, M.; Hayat, M.; Zamir, S.W.; Khan, F.S.; Shah, M. Transformers in Vision: A Survey. ACM Comput. Surv. 2022, 54, 1–41. [Google Scholar] [CrossRef]
  14. Radford, A.; Kim, J.W.; Xu, T.; Brockman, G.; McLeavey, C.; Sutskever, I. Robust Speech Recognition via Large-Scale Weak Supervision. arXiv 2022, arXiv:2212.04356. [Google Scholar]
  15. Brooks, R.A.; Breazeal, C.; Irie, R.; Kemp, C.C.; Marjanovic, M.; Scassellati, B.; Williamson, M.M. Alternative Essences of Intelligence. In Proceedings of the AAAI Conference on Artificial Intelligence 15, Madison, WI, USA, 26–30 July 1998. [Google Scholar] [CrossRef]
  16. Conroy, D.; Wyeth, P.; Johnson, D. Modeling player-like behavior for game AI design. In Proceedings of the 8th International Conference on Advances in Computer Entertainment Technology, Lisbon, Portugal, 8–11 November 2011; pp. 1–8. [Google Scholar]
  17. Filipović, A. The role of Artificial Intelligence in video game development. Kult. Polisa 2023, 20, 50–67. [Google Scholar] [CrossRef]
  18. Uludağlı, M.Ç.; Oğuz, K. Non-player character decision-making in Computer Games. Artif. Intell. Rev. 2023, 56, 14159–14191. [Google Scholar] [CrossRef]
  19. Ariyurek, S.; Betin-Can, A.; Surer, E. Automated video game testing using synthetic and humanlike agents. IEEE Trans. Games 2021, 13, 50–67. [Google Scholar] [CrossRef]
  20. Smed, J.; Hakonen, H. Synthetic Players: A Quest for Artificial Intelligence in Computer Games. Hum. IT 2005, 7, 57–77. [Google Scholar]
  21. Arrabales, R.; Ledezma, A.; Sanchis, A. Towards conscious-like behavior in computer game characters. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence and Games, Milano, Italy, 7–10 September 2009; pp. 217–224. [Google Scholar] [CrossRef]
  22. Antunes, A. Designing a digital twin for adaptive serious games-based therapy. In Proceedings of the 22nd International Conference on Mobile and Ubiquitous Multimedia, Vienna, Austria, 3–6 December 2023; pp. 574–576. [Google Scholar] [CrossRef]
  23. Alexander, K.; McArthur, J.J.; Lachapelle, G.; El Mokhtari, K.; Damm, M. Applying video game design to Building Digital Twin Creation. In Proceedings of the Computing in Construction, Crete, Greece, 10–12 July 2023. [Google Scholar] [CrossRef]
  24. Tanberk, S.; Bilgin Tukel, D.; Acar, K. The Design of a 3D Character Animation System for Digital Twins in the Metaverse. arXiv 2024, arXiv:2407.18934. [Google Scholar] [CrossRef]
  25. Yannakakis, G.N.; Togelius, J. Modeling players. In Artificial Intelligence and Games; Springer: Berlin/Heidelberg, Germany, 2018; pp. 203–255. [Google Scholar] [CrossRef]
  26. Bakkes, S.C.J.; Spronck, P.H.M.; van Lankveld, G. Player behavioural modelling for video games. Entertain. Comput. 2012, 3, 71–79. [Google Scholar] [CrossRef]
  27. Hooshyar, D.; Yousefi, M.; Lim, H. Data-driven approaches to game player modeling. ACM Comput. Surv. 2018, 50, 1–19. [Google Scholar] [CrossRef]
  28. Amadori, P.V.; Bradley, T.; Spick, R.; Moss, G. Robust Imitation Learning for Automated Game Testing. arXiv 2024, arXiv:2401.04572. [Google Scholar] [CrossRef]
  29. Hernández Bécares, J.; Costero Valero, L.; Gómez Martín, P.P. An approach to automated videogame beta testing. Entertain. Comput. 2017, 18, 79–92. [Google Scholar] [CrossRef]
  30. Mastain, V.; Petrillo, F. A behavior-driven development and reinforcement learning approach for Videogame Automated Testing. In Proceedings of the ACM/IEEE 8th International Workshop on Games and Software Engineering, Lisbon, Portugal, 14 April 2024; pp. 1–8. [Google Scholar] [CrossRef]
  31. Ferdous, R.; Kifetew, F.; Prandi, D.; Susi, A. Curiosity Driven Multi-agent Reinforcement Learning for 3D Game Testing. arXiv 2025, arXiv:2502.14606. [Google Scholar] [CrossRef]
  32. Agarwal, S.; Herrmann, C.; Wallner, G.; Beck, F. Visualizing AI playtesting data of 2D side-scrolling games. In Proceedings of the 2020 IEEE Conference on Games (CoG), Osaka, Japan, 24–27 August 2020; pp. 572–575. [Google Scholar] [CrossRef]
  33. Bain, M.; Sammut, C. A framework for Behavioural Cloning. In Machine Intelligence 15; Oxford University Press: Oxford, UK, 2000; pp. 103–129. [Google Scholar] [CrossRef]
  34. Kanervisto, A.; Pussinen, J.; Hautamaki, V. Benchmarking end-to-end behavioural cloning on video games. In Proceedings of the 2020 IEEE Conference on Games (CoG), Osaka, Japan, 24–27 August 2020; pp. 558–565. [Google Scholar] [CrossRef]
  35. Gharbi, H.X.; Fennan, A.; Lotfi, E. Replicating video game players’ behavior through deep reinforcement learning algorithms. J. Theor. Appl. Inf. Technol. 2024, 102, 5735. [Google Scholar]
  36. Pearce, T.; Rashid, T.; Kanervisto, A.; Bignell, D.; Sun, M.; Georgescu, R.; Valcarcel Macua, S.; Tan, S.Z.; Momennejad, I.; Hofmann, K.; et al. Imitating human behaviour with diffusion models. arXiv 2023, arXiv:2301.10677. [Google Scholar] [CrossRef]
  37. Sharma, S.; Tewolde, G.; Kwon, J. Behavioral cloning for lateral motion control of autonomous vehicles using Deep Learning. In Proceedings of the 2018 IEEE International Conference on Electro/Information Technology (EIT), Rochester, MI, USA, 3–5 May 2018. [Google Scholar]
  38. Chen, B.; Tandon, S.; Gorsich, D.; Gorodetsky, A.; Veerapaneni, S. Behavioral cloning in Atari games using a combined variational Autoencoder and predictor model. In Proceedings of the 2021 IEEE Congress on Evolutionary Computation (CEC), Krakow, Poland, 28 June–1 July 2021; pp. 2077–2084. [Google Scholar]
  39. Codevilla, F.; Santana, E.; Lopez, A.; Gaidon, A. Exploring the limitations of behavior cloning for autonomous driving. In Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), Seoul, Republic of Korea, 27 October–2 November 2019; pp. 9328–9337. [Google Scholar]
  40. Yang, L.; Zhang, Z.; Song, Y.; Hong, S.; Xu, R.; Zhao, Y.; Zhang, W.; Cui, B.; Yang, M.H. Diffusion models: A comprehensive survey of methods and applications. ACM Comput. Surv. 2023, 56, 1–39. [Google Scholar] [CrossRef]
  41. Chen, S.F.; Wang, H.C.; Hsu, M.H.; Lai, C.M.; Sun, S.H. Diffusion Model-Augmented Behavioral Cloning. In Proceedings of the 41st International Conference on Machine Learning, Vienna, Austria, 21–27 July 2024. [Google Scholar] [CrossRef]
  42. Pearce, T.; Zhu, J. Counter-Strike deathmatch with large-scale behavioural cloning. In Proceedings of the 2022 IEEE Conference on Games (CoG), 2022; pp. 104–111. [Google Scholar]
  43. Morningstar Game Studios. A Robot Named Fight! On Steam. 2017. Available online: https://store.steampowered.com/app/603530/A_Robot_Named_Fight/ (accessed on 9 November 2024).
  44. Bitner, M. I’m Making the A Robot Named Fight Source Code Public! 2022. Available online: https://store.steampowered.com/news/app/603530/view/3293844171243621529 (accessed on 9 November 2024).
  45. Jaderberg, M.; Czarnecki, W.M.; Dunning, I.; Marris, L.; Lever, G.; Castañeda, A.G.; Graepel, T. Human-level performance in 3D multiplayer games with population-based reinforcement learning. Science 2019, 364, 859–865. [Google Scholar] [CrossRef] [PubMed]
  46. Tastan, B.; Sukthankar, G.R. Learning policies for first-person shooter games using inverse reinforcement learning. In Proceedings of the AAAI Conference on Artificial Intelligence and Interactive Digital Entertainment, Palo Alto, CA, USA, 11–14 October 2011; Available online: https://api.semanticscholar.org/CorpusID:7835693 (accessed on 9 November 2024).
  47. Gordillo, C.; Bergdahl, J.; Tollmar, K.; Gisslen, L. Improving playtesting coverage via Curiosity Driven Reinforcement Learning Agents. In Proceedings of the 2021 IEEE Conference on Games (CoG), Copenhagen, Denmark, 17–20 August 2021; pp. 1–8. [Google Scholar] [CrossRef]
Figure 1. Steps followed in this methodology.
Figure 2. Game session with a participant.
Figure 3. Two different perspectives are used for data capture. (a) The character’s perspective uses 8 ray pointers surrounding the character that detect elements that can be interacted with and the distance at which they are located. (b) From the real player’s perspective, all the elements that appear on screen and with which the player can interact are detected. The screenshot was retrieved from “A Robot Named Fight!” from Morningstar Game Studio, 2017.
Figure 4. Virtual agent’s autonomous play flow.
Figure 5. Observational analysis of the virtual agent’s performance during gameplay. The agent’s actions were reviewed, allowing for a frame-by-frame inspection of its decisions. The original participant assessed these recordings to qualitatively evaluate the extent to which the agent’s behavior aligned with his own play style.
Figure 6. Navigation results: (a) the agent showed he knew that, to go through doors, he had to shoot them first; (b) no matter the position of the door, the agent was able to identify it and walk through it. Screenshots retrieved from “A Robot Named Fight!” from Morningstar Game Studio, 2017.
Figure 7. Platforming results: (a) in most cases, moving between platforms was achieved without any problem; (b) when the character had too much speed and there was still an instruction to move forward in the simulation queue, it would fall into the hazard until it reacted again. Screenshots retrieved from “A Robot Named Fight!” from Morningstar Game Studio, 2017.
Figure 8. Obstacle detection results: (a) the agent was successful in knowing where there was an obstacle and removing it; (b) the task could be performed even if there was a distance between the agent and the obstacle. Screenshots retrieved from “A Robot Named Fight!” from Morningstar Game Studio, 2017.
Figure 9. Combat results: (a) when an enemy was close by, the agent decided to shoot it; (b) it was relatively easy for him to defeat enemies who were stationary in one spot; (c) in this case, the enemy was faster than the agent’s response and was able to escape being shot. Screenshots retrieved from “A Robot Named Fight!” from Morningstar Game Studio, 2017.
Figure 10. There were a few times where the movements would cycle for a moment, which made the character become stuck. Screenshot retrieved from “A Robot Named Fight!” from Morningstar Game Studio, 2017.
Table 1. Dataset structure.

Attribute | Definition
positionX | Player’s position in the X-coordinate
positionY | Player’s position in the Y-coordinate
health | Current life points
energy | Current energy points
weaponsCollected | List of weapons obtained
weaponsSelected | Weapon currently equipped
inputKey | List of buttons pressed
rayPointer | List of collider detector pointers
  direction | Direction in which the ray points
  collider | Name of the colliding object
  distance | Distance between the player and the collider
collidersOnScreen | List of colliders displayed on the screen
  collider | Name of the colliding object
  location | Collider’s screen coordinates
  distance | Distance between the player and the collider
Table 2. Evaluation outcomes across all scenarios, underscoring perplexity (PPL) as the central performance measure.

Evaluation | Samples | PPL | CE Loss
Evaluation set | 10,308 | 1.0000 | 0.0000
New Frames set | 10,308 | 1.0012 | 0.0012
Ablation over Evaluation set | 10,308 | 1.0045 | 0.0045
Table 3. Comparative performance metrics of our model and a standard transformer architecture in the task of action cloning within a game environment.

Group | Metric | Diffusion Transformer | Transformer | Difference (DT − T)
Action | Total | 375,779 | 375,779 | -
Action | Action Success Rate | 0.9993 ± 0.0045 | 0.9965 ± 0.0097 | 0.0028
Action | Avg. Action Distance | 0.052 ± 0.012 | 0.068 ± 0.018 | −0.016
Frame | Total | 10,308 | 10,308 | -
Frame | Frame Accuracy | 0.9730 ± 0.1620 | 0.8798 ± 0.3252 | 0.0932
Frame | Avg. Frame Confidence | 0.9990 ± 0.0026 | 0.9965 ± 0.0051 | 0.0025
Entropy | Avg. Entropy | 0.0031 ± 0.0060 | 0.0101 ± 0.0124 | −0.0070