Article

Game-o-Meta: Trusted Federated Learning Scheme for P2P Gaming Metaverse beyond 5G Networks

1 Department of Computer Science and Engineering, Amity School of Engineering and Technology, and Research and Innovation Cell, Amity University, Kolkata 700135, West Bengal, India
2 Department of Computer Science and Engineering, Institute of Technology, Nirma University, Ahmedabad 382481, Gujarat, India
3 Department of Computer Science and Engineering, School of Engineering and Technology, Sharda University, Greater Noida 201310, Uttar Pradesh, India
4 Department of Applied Electronics and Information Engineering, Faculty of Electronics, Telecommunications and Information Technology, Politehnica University of Bucharest, 061071 Bucharest, Romania
5 Software Engineering Department, College of Computer and Information Sciences, King Saud University, Riyadh 12372, Saudi Arabia
6 Computer Science Department, Community College, King Saud University, Riyadh 11437, Saudi Arabia
* Authors to whom correspondence should be addressed.
Sensors 2023, 23(9), 4201; https://doi.org/10.3390/s23094201
Submission received: 1 March 2023 / Revised: 10 April 2023 / Accepted: 19 April 2023 / Published: 22 April 2023

Abstract:
The aim of the peer-to-peer (P2P) decentralized gaming industry has shifted towards realistic gaming environment (GE) support for game players (GPs). Recent innovations in the metaverse have motivated the gaming industry to look beyond augmented reality and virtual reality engines, which improve the reality of virtual game worlds. In gaming metaverses (GMs), GPs can play, socialize, and trade virtual objects in the GE. On game servers (GSs), the collected GM data are analyzed by artificial intelligence models to personalize the GE according to the GP. However, communication with GSs suffers from high-end latency, bandwidth concerns, and issues regarding the security and privacy of GP data, which pose a severe threat to the emerging GM landscape. Thus, we proposed a scheme, Game-o-Meta, that integrates federated learning in the GE, with GP data being trained on local devices only. We envisioned the GE over a sixth-generation tactile internet service to address the bandwidth and latency issues and assure real-time haptic control. In the GM, the GP’s game tasks are collected and trained on the GS, and then a pre-trained model is downloaded by the GP, which is trained using local data. The proposed scheme was compared against traditional schemes based on parameters such as GP task offloading, GP avatar rendering latency, and GS availability. The results indicated the viability of the proposed scheme.

1. Introduction

The modern gaming industry has taken a transformative shift owing to revolutions in computer vision, augmented reality (AR), virtual reality (VR), and 3D-rendering engines [1]. This has changed the entire gaming experience for game players (GPs). Modern games are mostly decentralized and peer-to-peer (P2P)-driven, with peer GPs engaging in gaming environments (GEs), and the data are stored on game servers (GSs) [2]. In VR gaming environments (GEs), GPs play the game (e.g., Beat Saber, Iron Man, or Star Trek) through assisted VR hardware (head-mounted displays and hand controllers), and the GP is immersed in the virtual GE. On the contrary, an AR GE (e.g., Pokemon Go or Sun-seeker) converts the real environment surrounding the GP into a digital interface and overlays new information on the real environment. With the rebranding of Oculus Quest and Facebook renaming itself Meta, a giant leap forward is expected in AR/VR GEs, with real-time 360 navigation, object interactions, and haptic device controls having increased four-fold [3]. The next shift will be towards extended reality (XR) engines, enabling the GP’s eye movements and sensory touch (ears, hands, and facial expression) to be rendered and streamed over P2P networks.
A report by DappRadar suggested that the metaverse’s penetration would reach USD 1.3 billion in the gaming industry in 2022 (third quarter). It is estimated that GM projects will drive NFT-based assets and tokens on Web 3.0 platforms, with 912,000 crypto wallets created every day globally. Figure 1 presents the global market cap of NFT-asset-based GMs by 2030, which shows a compounded increase of 8% annually [4].
The increased adoption of metaverses in P2P GEs has propelled researchers to integrate XR-enabled metaverses (networked 3D virtual worlds) in game engines and offer a next-generation gaming experience to GPs. A metaverse creates a virtual environment to emulate real-world objects and human beings by utilizing various IoT-enabled sensors and haptic devices (motion tracking, eye tracking, and voice recognition) with haptic feedback. A metaverse comprises AR/VR systems. A basic VR system needs an accelerometer, a gyroscope, and a magnetometer to create a virtual form of the existing surroundings [5]. AR systems use more sophisticated and complex sensing systems comprising depth sensing; infrared, ambient light, and bio-sensors; and heat-mapping sensors. This enables the AR system to understand each person's exact position in the game, what they see and hear, and how the system adapts to the changing environment around the game console. Moreover, it helps to create a more immersive and interactive experience for players in the meta environment.
According to reports by Statista [6], 54.1 million concurrent gamers play the metaverse game Roblox daily. The potential metaverse gaming market is expected to reach USD 2.4 billion by 2024. These gaming metaverses (GMs) would offer a holistic XR experience wherein GP avatars can interact, socialize, and earn. For game asset trading, non-fungible tokens (NFTs) could be used as payment mechanisms, and smart contracts (SCs) could be executed to facilitate asset transfers and the selling and purchasing of items [7]. Thus, assets would be portable, and transactional details would be stored on the blockchain (BC) [8]. Moreover, GMs could be supported by artificial intelligence (AI)-based engines to personalize and customize the GM environment according to the GP's preferences [9]. This would improve the Quality-of-Interaction (QoI) of the GPs. The QoI greatly affects the overall experience of playing a game. Games that emphasize engagement and interaction among players tend to deliver a more enjoyable experience. The degree of interactivity a game provides is a critical aspect of the quality of the gameplay experience. Games that offer players varied options, activities, and responses tend to provide a more immersive and captivating experience. Customizable characters and decisions that influence the game's plot show how games can generate a stronger sense of attachment to the outcome. The caliber of social interaction among players is also crucial. Multiplayer games that promote collaboration, teamwork, and communication build stronger relationships among players and foster a positive environment. Conversely, games that encourage negative or toxic behavior can damage the social experience and push players to leave. Hence, game developers who prioritize the creation of captivating and immersive experiences, spanning both gameplay and social interaction, are more likely to attract and retain players in the long run.
Game data are normally stored over cloud-assisted game servers (GSs), and AI models are executed to personalize the GE according to the GP. Many confidential GP data (asset information, NFT wallet identifiers, contact addresses, and human sensor data) are stored on GSs for analytics. Thus, any malicious intruder can launch a potential adversarial attack [10]. Secondly, cloud-based GSs suffer from the limitations of huge amounts of traffic, end-user latency, and computational bottlenecks [11].
To address these limitations, federated learning (FL) is a viable choice for GMs and mobile devices. FL has been used extensively in medical ecosystems, as running FL on mobile devices supports increasingly complex tasks and operations, safeguards user privacy, and exploits the computing resources available on the devices themselves [12]. FL is therefore a promising candidate to support the gaming ecosystem. The GE data are trained locally (with the GP's avatar and personalization data) and sent to the global model as an update. Coupled with BC, trusted FL training is possible, as local updates are verified as transactional ledgers [13,14].
The remote GPs communicate with the GS over wireless communication infrastructures [15], where fourth-generation (4G) long-term evolution (LTE) services have so far been the preferred choice. However, GMs require high-end human–computer interaction (HCI) support, 3D modeling, and edge computing and caching support [16,17]. A recent study by Cheng et al. [18] suggested fifth-generation (5G) network services for improved coverage and caching. The supported throughput in 5G networks is ≈10–20 Gbps, and to improve HCI, tactile internet (TI) channels are preferred to support dynamic XR content. However, in the near future, the large number of concurrent GP connections could require a shift towards sixth-generation (6G) services. By 2030, it is predicted that 6G networks will need to support up to 125 billion wireless devices [19]. To enable this, creating a smart signal- and data-processing system that enables distributed learning is essential. FL is a critical technology with the potential to fulfill the expected requirements of 6G networks [20]. 6G-TI is envisioned to support a low packet loss rate in the order of $10^{-9}$, with a 0.1 ms end-to-end delay, which is suitable for real-time haptic interactions. 6G-TI operates on the THz band, with a low outage probability of $10^{-7}$ [21]. Thus, 6G-TI integration with GMs would improve the GP QoI, as real-time rendering and control would be possible.
Motivated by the above discussion, we present an integrative scheme, named Game-o-Meta, which combines FL with a 6G-P2P GM environment. The scheme ensures responsive control for GPs (the avatars in the GM), and communication is handled with low latency and delay. FL ensures the customization of the GE for specific GPs based on local data training. Offline storage via the interplanetary file system (IPFS) is used for the GP data, and the meta records are stored on the BC to ensure trust between heterogeneous P2P game servers. In our scheme, we considered an optimized version of the federated averaging (FedAvg) algorithm. We proposed a parallel version of the FedAvg algorithm, named parallel FedAvg (P-FedAvg), that overcomes the drawbacks of the traditional FedAvg scheme, in which a centralized parameter server is responsible for communication with all local FL clients. The traditional FedAvg algorithm suffers from computational bottlenecks in high-load conditions, which significantly affects the learning rate. The reason is straightforward: clients may be located at heterogeneous sites, and it is not easy to establish a network with high availability across all links. Our approach divided the entire network into small regions, each controlled by its own regional parameter server.
The research contributions of this article can be summarized as follows.
  • A framework for GP interaction in a 6G-P2P decentralized GM environment is proposed, whereby the GPs can socialize and interact.
  • Further, a model of the interaction between two game clients is presented, where the clients form game avatars and interact in the GE through an XR-enabled system. The clients can perform transactions of assets via NFT tokens, and the game state is stored in the local IPFS server.
  • The article presents a BC-based gradient-exchange model and a threat and privacy model, followed by the problem formulation.
  • Based on the problem, an optimized parallel federated averaging (P-FedAvg) algorithm is presented, which trains the local data and GE parameters collected from the GM and computes the local gradients to fine-tune the global model hyperparameters. The global model state is stored in a game chain, and game clients download the updated parameters.
  • The performance of the aforementioned proposed scheme was validated using the parameters of the mining cost, rendering latency, and energy consumption.
The rest of the article is organized as follows. Section 2 discusses the entities, the working flow, and the problem formulation of the proposed scheme. Section 3 presents the interaction model and the FL learning algorithm based on the problem formulation. Section 4 presents the performance analysis of the scheme, and, finally, Section 5 concludes the article.

1.1. Related Work

This section presents the existing GM, FL, and AR/VR ecosystem schemes. Researchers have been inspired to incorporate metaverses (networked 3D virtual worlds) into game engines and provide GPs with a cutting-edge gaming experience, resulting in the rising acceptance of XR in P2P GEs. This section discusses various articles related to gaming scenarios, metaverses, and FL model training scenarios.
Regarding FL-based mechanisms, Zou et al. [22] presented a scheme that integrated FL and game theory for a dynamic strategy and the rationing of optimal participants in distributed learning ecosystems. The accuracy and consumption of energy were identified using cost metrics. For decentralized environments, a mobile crowd FL approach was presented. The scheme ensured profit maximization via training rewards and cost metrics over associated distributed servers and devices. These metrics were deployed using two strategies: accuracy- and size-based policies. Sun et al. [23] combined FL with mobile augmented reality in a metaverse realm. The aim was to perform resource-intensive network operations in distributed systems at an extremely low latency in order to induce a quick response from avatars. The gaming systems had financial transaction modes supported through an NFT and cryptocurrency exchange platform. Park et al. [24] presented a case study on channel access mechanisms for both orthogonal and non-orthogonal policies in the metaverse. The authors presented the concept of semantic communication in the metaverse, which was used for real-time interactions among diverse entities. The authors called this the semantic metaverse (SM). In the SM, the details of the gaming participants and the environmental conditions were analyzed based on reinforcement learning methodologies. The learning models formed an optimal-fit policy to bundle the events and actions together for optimal QoI among the GPs. The scheme proposed a 6G communication channel on the wireless communication front.
Regarding the networking aspects of FL models in the metaverse, Latif et al. [25] discussed the emerging wireless key performance indicators (KPIs) and their role in avatar generation, rendering, twin modeling, and deployment in the metaverse. The paper discussed twin signaling, twin applications, the scheduling of resources through radio signals, proactive traffic analysis, traffic predictions, and intelligent media control rates for optimal network utilization. The overall space was divided into physical and meta-space, and wireless interfaces carried out interactions. Networking metrics for QoS were presented, such as reliability, latency, and a low error rate. In addition to this, security mechanisms for human-centric and emotion-driven model data transfer were also presented. A potential limitation of increased mobile usage was presented, which accounted for the increased complexity of data handling in the metaverse. To solve this limitation, Lin et al. [26] presented the concept of sparse samples, which also dealt with the associated security issues, and proposed an algorithm called federated multi-task inverse soft actor-critic. Here, the roles of the actors and their avatars were given importance in relation to the connected environment. Zelenyanszki et al. [27] considered the privacy implications of NFT avatars. NFTs are one-of-a-kind tokens that can symbolize anything from art to audio to a digital/virtual user participating in the distributed virtual environment. Zhou et al. [28] focused on an FL-assisted mobile AR (MAR) system using non-orthogonal multiple access (NOMA) and proposed an algorithm to optimize time, energy, and model accuracy simultaneously.
Secured and trusted metaverse environments have recently been proposed. For example, Hui et al. [29] assessed the quality of the aggregated model maximization problem and proposed the MAXQ high-quality function to achieve a better decision-making process. Zhang et al. [30] proposed a differential privacy scheme that was used for interactions between participants in a decentralized scenario. A game-theoretic concept that supported budget-balanced, truthful, and individually rational mechanisms was presented for transactional procedures. An influence reversal mechanism, GAMES, in an FL-based environment with an ML framework was discussed by Gupta et al. [31]. The client heterogeneity under distribution alterations in FL was examined structurally. Specifically, the FL GAMES model for learning stable causal representations across clients was designed and tested. Lee et al. [32] proposed an asynchronous FL process and analyzed its convergence. Additionally, transmission scheduling in a wireless distributed learning network was explored to enhance the learning process in the distributed ecosystem. Fan et al. [33] proposed a BC-based incentive mechanism for a trading platform and secured transfer via a game-theoretic approach that eventually presented an optimal Nash equilibrium. Regarding FL-based incentive and resource optimization, Jiang et al. [34] proposed a scheme based on the Stackelberg equilibrium for resource optimization and incentive preservation. Yu et al. [35] proposed an FL-incentivized payoff-sharing scheme to incentivize FL data owners to contribute high-quality data to the data federation. This approach also took into account the various factors critical to FL while distinguishing between concerns related to the delay in the federated model's revenue-generation scheme. Table 1 presents a comparative analysis of the proposed scheme and existing state-of-the-art (SOTA) approaches.

1.2. Research Gap

As discussed in Section 1.1, most recent studies on metaverses for gaming have focused on avatar formation, metaverse engines, and the details of the game design. However, to realize an end-to-end solution, it is imperative to design a scheme for GMs wherein the game data are analyzed and the GPs’ privacy is preserved (including players’ wearable devices and haptic feedback). The sensor data are highly sensitive, because they include a player’s physical movements, motion capture, and physiological and environmental monitoring data. In addition, some metaverse games also collect players’ personal information, such as biometric data (i.e., voice and facial recognition); location; and behavior patterns. Furthermore, metaverse GEs are often accessed via an online interface (public internet), which implies that the sensing data collected by GEs are transmitted over the internet. Thus, if these data are not properly managed (i.e., encrypted), they could be easily intercepted by attackers, potentially leading to a data breach or data theft attack. If adversaries access these data, they can be used for identity theft, fraud, and other nefarious activities. Therefore, a proactive mechanism is required to tackle security- and privacy-related issues associated with metaverse GEs.
Very few studies have focused on integrating FL with metaverses, and, to the best of our knowledge, this is the first scheme to combine the metaverse and FL for GEs. Further, a GM requires quick interactions and control; thus, we present the scheme against the backdrop of the 6G-TI channel, and we utilized further-enhanced mobile broadband (FeMBB) for bandwidth management. For transactional control, we present an NFT engine in a GM that allows the trading of items among GP avatars. Recent studies have noted that, owing to the increased use of mobile devices and other ubiquitous computing equipment, the utility of metaverses has grown. The proposed scheme involves a decentralized gaming scenario in which players create avatars in the metaverse for a real-time experience. The interactions between avatars must be sufficiently secure and fast. Hence, supporting technologies such as 6G, NFTs, and ERC token standards, together with connectivity between local storage and file-sharing systems such as the IPFS, were integrated with FL in the proposed scheme.

2. Game-o-Meta: Problem Formulation

This section presents the entities and flow of the proposed scheme. Figure 2 presents the GM environment and the overall GP interaction with the metaverse.
Initially, we present the scheme’s entities and proceed with the problem formulation.
  • Game players: We considered n GPs, represented as $\{GP_1, GP_2, \ldots, GP_n\}$. Every $GP$ has an XR-enabled headset and controller $\{H_1, H_2, \ldots, H_n\}$, which allows the GP to interact in the GM. A trivial assumption for a P2P environment requires all GPs to be globally distributed. The XR controllers allow the players to interact with p games, represented by $\{GO_1, GO_2, \ldots, GO_p\}$.
  • Game avatars: Once any $GP_n$ registers to a specific game $GO_p$, a mapping ID is created, $M: GP \rightarrow GO$, which is denoted as $GID(n,p)$. Based on $GID(n,p)$, we constructed a metaverse environment wherein a game avatar $GA$ was created. The GM includes the following details of the GP:
    $GA = \{MID(n,p), Vis, W_{GP}, A_{GP}\}$
    where $MID(n,p)$ denotes the metaverse ID of user n mapped to game p, $Vis$ is the GE virtual space, $W_{GP}$ is the crypto wallet of the $GP$, and $A_{GP}$ denotes the NFT assets (tokens) of the GP in the GM ecosystem.

2.1. The Scheme Flow

  • Any $GP_n$ interacts with the XR environment via a controller $XR_C$.
  • $GP_n$ sets up its NFT wallet and decides the assets $A_{GP}$ it can trade in the GM ecospace.
  • Once $A_{GP}$ is fixed, the information is registered in a smart contract via the ERC-721 token.
  • The details of the contract and the $A_{GP}$ are stored in a local IPFS, which is accessible to the user via a 32-byte IPFS content key, represented as $IPFS(GP_n)$.
  • The GP selects the game engine and the XR environment $XR_{Env}$ on $XR_C$. A virtual space is created via the GE, and the GP selects a local avatar, $GO_n$, to represent itself in the GM. The haptic control and communication from the GP to the GM are managed through a 6G-TI channel.
  • The local GM data are stored in a shared metaverse database, which is encrypted by two sets of keys, $\{K_m^{pub}, K_g^{pr}\}$, where $K_m^{pub}$ represents the accessible public GM key, and $K_g^{pr}$ is the private key of the GP.
  • Based on the local data, a local P2P server is connected as a game server (GS), and the P2P game starts with other GPs (via their $GO$ form in the GM).
  • An FL training process is initiated by the GS, which ensures the privacy of the GP data and customizes the game according to player interactions, improving the gamer's experience.
  • Based on the game duration and GM interactions via $GO$ in the GM, the local FL model weights are updated.
  • The updated FL model weights $w$ are stored in a local BC.
  • The weights are communicated to the mobile aggregator node $M$.
  • The aggregator finally updates the global server model, which shares the updated weights and parameters with the local FL models.

2.2. Network Model

A 6G-TI channel C was considered, wherein the bandwidth was managed among n GPs, and the allocation was managed through a network function virtualization (NFV) operator O, which operated over the FeMBB service. As there were n GPs, we considered the link bandwidth split ratio R, where every user received a share of C / R . At any given time τ , the individual bandwidth function is denoted as follows:
$B_n(\tau) = \delta_n(\tau) \cdot C_n$
where $B_n(\tau)$ denotes the bandwidth allocated to GP n, $\delta_n(\tau)$ denotes the allocated capacity at time $\tau$, and $C_n$ denotes the allocation NFV constant. Based on this, the traffic demand $T_d$ cannot exceed the overall allocation, denoted as follows:
$T_d \leq \sum_{i=1}^{n} B_i(\tau)$
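To make the allocation constraint concrete, the following minimal NumPy sketch (with illustrative values that are not part of the original scheme) computes the per-GP bandwidth $B_n(\tau) = \delta_n(\tau)\cdot C_n$ and checks that the traffic demand does not exceed the total allocation:

```python
import numpy as np

def allocate_bandwidth(delta, c_nfv):
    """Per-GP bandwidth B_n(tau) = delta_n(tau) * C_n."""
    return np.asarray(delta) * np.asarray(c_nfv)

def demand_is_feasible(traffic_demand, delta, c_nfv):
    """Check the constraint T_d <= sum_n B_n(tau)."""
    return traffic_demand <= allocate_bandwidth(delta, c_nfv).sum()

# Example: four GPs sharing one 6G-TI link (values in Gbps, illustrative).
delta = [0.8, 1.2, 0.5, 1.0]   # allocated capacity per GP at time tau
c_nfv = [1.0, 1.0, 1.0, 1.0]   # NFV allocation constants
print(allocate_bandwidth(delta, c_nfv))       # per-GP B_n(tau)
print(demand_is_feasible(3.0, delta, c_nfv))  # True: 3.0 <= 3.5
```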

2.3. Transaction Model

In $Vis$, we considered any two GPs, denoted by $GP_a$ and $GP_b$, interacting via their wallets $W_{Pa}$ and $W_{Pb}$. The wallets are linked to an Ethereum contract C, and virtual NFT tokens $N_a$ and $N_b$ are generated for asset transfer. The contract contains the following details:
$SC = \{MID(n,p), W_{Pa}, W_{Pb}, N_a, N_b, A, P\}$
where A denotes the asset ID for transfer, and P denotes the asset price. A local contract address is used to execute the ownership transfer in the GM, stored in the local IPFS of users A and B. The IPFS receives the previous weights $p_n^t$, the overall set of mobile devices $D_n$, and the local data $D$.
$ID \leftarrow IPFS(D, D_n, p_n^t)$
Once the 32-byte content address is generated, it is added as the data part of the transaction. The block also contains the version V, the hash of the previous block $H_p$, the Merkle root $M_r$, the time stamp T, and the nonce value N [36].
$B_k \leftarrow (V, H_p, M_r, T, N, ID)$
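As an illustration of how such a block could be assembled around the IPFS content address, the following Python sketch uses a SHA-256 digest as a stand-in for the 32-byte content key; the field names and hashing choices here are ours, not the paper's implementation:

```python
import hashlib, json, time

def ipfs_content_id(payload: bytes) -> str:
    """Illustrative stand-in for the 32-byte IPFS content address:
    simply the SHA-256 digest of the stored payload."""
    return hashlib.sha256(payload).hexdigest()

def build_block(version, prev_hash, merkle_root, nonce, content_id):
    """Assemble B_k = (V, H_p, M_r, T, N, ID) and return it with its hash."""
    block = {
        "version": version,
        "prev_hash": prev_hash,
        "merkle_root": merkle_root,
        "timestamp": time.time(),
        "nonce": nonce,
        "ipfs_id": content_id,   # 32-byte content address as the block data
    }
    block_hash = hashlib.sha256(json.dumps(block, sort_keys=True).encode()).hexdigest()
    return block, block_hash

weights_payload = json.dumps({"local_weights": [0.12, -0.53, 0.07]}).encode()
cid = ipfs_content_id(weights_payload)
block, block_hash = build_block(1, "00" * 32, "ab" * 32, nonce=42, content_id=cid)
```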
For gradient exchange, we used the BC, whereby the local model updates are exchanged. We assumed that each GP generates a gradient update, denoted by $g_a$ and $g_b$, for their respective local models. The gradients are then encrypted using homomorphic encryption before being sent to the BC network.
Let $HE_k$ be the homomorphic encryption function with the public key k. The encrypted gradients are denoted by $HE_k(g_a)$ and $HE_k(g_b)$. The BC stores the encrypted gradients along with their corresponding transaction IDs. The process of gradient exchange in the BC can be defined as follows:
  • $GP_a$ generates the gradient update $g_a$ for its local model.
  • $g_a$ is encrypted using homomorphic encryption, denoted as $HE_k(g_a)$.
  • The encrypted gradient $HE_k(g_a)$ is added to a transaction $Tx_a$ along with its metadata and is broadcast to the network.
  • $GP_b$ receives the transaction $Tx_a$ and decrypts the gradient using its private key: $g_a = DE_k(HE_k(g_a))$.
  • $GP_b$ generates its gradient update $g_b$ for its local model.
  • $g_b$ is encrypted using homomorphic encryption, denoted by $HE_k(g_b)$.
  • The encrypted gradient $HE_k(g_b)$ is added to a transaction $Tx_b$ along with its metadata and is broadcast to the network.
  • $GP_a$ receives the transaction $Tx_b$ and decrypts the gradient using its private key: $g_b = DE_k(HE_k(g_b))$.
  • The decrypted gradients $g_a$ and $g_b$ are then used to update the global model on the game server.
The use of homomorphic encryption ensures the privacy of the gradients during the exchange process, as only the authorized GPs can decrypt the gradients.
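A minimal sketch of this exchange using the Paillier cryptosystem (python-paillier, installable via `pip install phe`) is given below. For simplicity, the two GPs share one keypair, matching the $HE_k$/$DE_k$ notation above; key distribution and the BC transaction layer are left out:

```python
from phe import paillier

# Shared keypair for illustration only; in practice key management matters.
public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

g_a = [0.021, -0.340, 0.115]   # GP_a's local gradient update
g_b = [0.007,  0.290, -0.054]  # GP_b's local gradient update

# Encrypt before broadcasting the transactions Tx_a and Tx_b.
enc_a = [public_key.encrypt(x) for x in g_a]
enc_b = [public_key.encrypt(x) for x in g_b]

# Paillier is additively homomorphic, so ciphertexts can be summed without
# ever exposing the plaintext gradients to an aggregator.
enc_sum = [ea + eb for ea, eb in zip(enc_a, enc_b)]

# Only an authorized holder of the private key recovers the aggregate.
aggregate = [private_key.decrypt(c) for c in enc_sum]
print(aggregate)   # element-wise g_a + g_b
```

The additive property is what makes encrypted aggregation possible; multiplicative operations between ciphertexts are not supported by Paillier, which is sufficient for gradient summation.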

2.4. FL Model

Two core operations were considered for FL: the local updates stored in the GM-P2P server and the weight aggregation. For any general iteration t, we considered that mobile device $D_n$ updates the model weights as follows:
$p_n^{t+1} = p_n^t - \eta \nabla l(p_n^t)$
where $p_n^t$ denotes the previous weights from the device $D_n$, $\eta$ denotes the learning rate of the model, and $\nabla l$ denotes the gradient of the loss function. The weights are updated based on the FedAvg algorithm, represented as follows:
$w_0^{t+1} = \sum_{n \in D_n} \frac{|D_n|}{|D|} \, p_n^t$
where $D_n$ denotes the overall set of mobile devices, and $w_0^{t+1}$ denotes the averaged weight of the learning model after the $t$-th iteration. $|D_n|$ denotes the size of the individual $GP_n$'s data in the GS, and $|D|$ is the overall local data size. More specifically, $|D_n|/|D|$ denotes the importance (contribution) of a GP in the learning process.
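The aggregation step can be sketched in a few lines of NumPy (illustrative values; flattened weight vectors are assumed):

```python
import numpy as np

def fedavg(local_weights, data_sizes):
    """Weighted average sum_n (|D_n| / |D|) * p_n of the local model weights,
    as in the FedAvg update above (minimal sketch)."""
    data_sizes = np.asarray(data_sizes, dtype=float)
    shares = data_sizes / data_sizes.sum()      # |D_n| / |D| per GP
    stacked = np.stack(local_weights)           # shape: (n_clients, n_params)
    return (shares[:, None] * stacked).sum(axis=0)

# Example: three GPs with different amounts of local game data.
p = [np.array([0.2, 0.5]), np.array([0.1, 0.7]), np.array([0.4, 0.3])]
sizes = [600, 300, 100]
print(fedavg(p, sizes))   # the GP with more data contributes more to the global model
```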

2.5. Threat Model

We considered a threat model with three types of attackers: passive eavesdroppers, active attackers, and compromised peers. Let us denote the set of all peer devices as $R$, the set of compromised peers as $C_R$, and the set of honest peers as $H = R \setminus C_R$. We also denote the set of all GSs as $S$. The attacks are presented below.
Passive eavesdropping: Passive eavesdroppers $A_{pe}$ aim to intercept and read the communication between peers and GSs without modifying the data. We assumed that passive eavesdroppers can intercept all network traffic, including encrypted gradients and weights, and have the ability to perform traffic analysis to infer information about the training data and model. The passive eavesdroppers' attacks can be represented as follows:
$A_{pe} = \{\text{intercept all network traffic}\}$
Active attackers: Active attackers $A_{aa}$ aim to modify the data being exchanged between peers and game servers to manipulate the FL process. We assumed that active attackers can modify network traffic in transit, including encrypted gradients and weights, and perform man-in-the-middle attacks. The active attackers' attacks can be represented as follows:
$A_{aa} = \{\text{modify network traffic in transit}\}$
Compromised peers: Compromised peers $A_{cp}$ aim to manipulate the FL process by sending incorrect gradients or weights. We assumed that an external attacker controls compromised peers and can perform arbitrary computations. The compromised peers' attacks can be represented as follows:
$A_{cp} = \{\text{send incorrect gradients or weights}\}$
Thus, the overall threat model is presented as follows:
$T = \{R, C_R, A_{pe}, A_{aa}, A_{cp}\}$

2.6. Differential Privacy Model

Based on the threat model, we needed to formulate a privacy-preservation FL scheme for GPs in the GM. Let x be the private information of a GP and y be the noisy FL weights uploaded by the GP to the BC. The privacy guarantee of the proposed mechanism was defined in terms of the differential privacy parameter c and the privacy budget ϵ .
We assumed that the noise added to the FL weights satisfied the following conditions:
  • The noise is independent and identically distributed (i.i.d.) and Laplace distributed with a mean of 0 and a scale parameter c.
  • The differential privacy parameter c satisfies the following condition:
    $c \geq \frac{\ln(1/\delta)}{\epsilon}$
    where δ is the probability of the failure of the differential privacy mechanism, and ϵ is the privacy budget.
Under these constraints, we could prove that the proposed mechanism provided a differential privacy guarantee by analyzing the impact of the noise on the FL weights. Let H ( x ) and H ( y ) be the histograms of the private information x and the noisy FL weights y, respectively. The privacy guarantee can be expressed by the relative entropy between the histograms H ( x ) and H ( y ) , as follows:
$D(H(x) \,\|\, H(y)) \leq \epsilon$
where D is the relative entropy function. Using the properties of the Laplace distribution and the assumption that the noise is i.i.d. and Laplace distributed, we showed that the relative entropy could be upper-bounded as follows:
$D(H(x) \,\|\, H(y)) \leq c \, \|x\|_1$
where | | x | | 1 is the L1 norm of the private information x. Therefore, the privacy guarantee could be rewritten as follows:
$c \, \|x\|_1 \leq \epsilon$
which implies that
$\|x\|_1 \leq \frac{\epsilon}{c}$
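Operationally, the mechanism amounts to adding zero-mean Laplace noise with scale c to each FL weight before it is uploaded to the BC; a minimal sketch with illustrative values follows:

```python
import numpy as np

def add_laplace_noise(weights, c, rng=None):
    """Return noisy weights y = x + Lap(0, c), applied element-wise.
    Illustrative sketch, not the authors' exact implementation."""
    rng = np.random.default_rng() if rng is None else rng
    weights = np.asarray(weights, dtype=float)
    return weights + rng.laplace(loc=0.0, scale=c, size=weights.shape)

local_weights = np.array([0.21, -0.48, 0.05, 0.33])
noisy_weights = add_laplace_noise(local_weights, c=0.1)   # uploaded to the BC
```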
Thus, by appropriately setting the value of the differential privacy parameter c, we can control the amount of privacy protection the mechanism provides. The proposed mechanism therefore effectively protects the privacy of GPs in the metaverse by adding differential noise to the FL weights. The model provides a privacy guarantee in terms of the differential privacy parameter and the privacy budget and can be tuned to balance privacy and utility. Based on the above discussion, we present the problem objectives below.
  • Maximize $B_n(\tau)$: We intended to maximize the individual GP bandwidth to maximize the overall user experience.
  • Minimize the SC latency l: When $GP_a$ and $GP_b$ interacted in the GM, the asset-transfer latency during gameplay needed to be minimal.
  • Maximize the accuracy of $(p_n^t)$: At any given iteration t, the accuracy of the FL model needed to be maximized.
Thus, the problem was considered convex based on the abovementioned objectives and hence needed to be broken into sub-optimal problem sets $SO(l)$.
$F(B_n(\tau), SC_l, p_n^t) = \max(SO(l_1), SO(l_2), SO(l_3))$

3. Game-o-Meta: The Proposed Scheme

In this section, based on Equation (18), we present a parallel optimization of the FedAvg scheme (P-FedAvg), wherein we considered that the global model $G_{Mod}$ was distributed into q different parallel server models, $\{MP_1, MP_2, \ldots, MP_q\}$. Figure 3 presents the details of the global aggregation model, wherein the game parameters from different GPs (denoted by players A, B, C, and D in the figure) are fed into a subset of $G_{Mod}$. To present this, we considered any $i$-th MP as a subset of the total GPs connected to $MP_i$. A trivial condition was followed, whereby the assignments of GPs to $MP_i$ and $MP_j$ were distinct, and thus $GP_{MP_i} \cap GP_{MP_j} = \phi$.
Any $MP_i$ and $MP_j$ communicate by forming a ring topology, whereby each MP is connected to its two nearest neighbors in terms of network connectivity, not physical location. This assumption is trivial, as the nearest MP neighbor might be dynamic, and thus the closest MP connected through a network link (protocol) is considered a neighbor. Thus, all MPs are connected circularly. The topology is implemented by defining an array of size MAX, where MAX denotes the total number of MPs in the system. Each element of the array corresponds to an MP in the ring, and the index of the array represents the identifier (ID) of the MP.
The following process determines the neighbors of an MP with the ID i. The left neighbor is assigned ID $(i-1)$, wrapping around to $(MAX-1)$ if $i = 0$. The right neighbor is assigned ID $(i+1)$, wrapping around to 0 if $i = (MAX-1)$. In this way, we can easily determine the neighbors of a given MP in the ring based on its ID, as sketched below. Any node sends a message to its left or right neighbor, and the message is passed through the ring until it reaches the destination MP.
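The neighbor rule can be written compactly with modular arithmetic; the sketch below uses an illustrative ring size:

```python
MAX = 8  # total number of MPs in the ring (example value)

def ring_neighbors(i: int, size: int = MAX) -> tuple[int, int]:
    """Return the (left, right) neighbor IDs of MP i, wrapping around the ring."""
    left = (i - 1) % size    # wraps to size-1 when i == 0
    right = (i + 1) % size   # wraps to 0 when i == size-1
    return left, right

print(ring_neighbors(0))        # (7, 1)
print(ring_neighbors(MAX - 1))  # (6, 0)
```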
Based on the ring topology assumption, we considered that $MP_i$ and $MP_j$ communicate with each other to share local transactional data, forming a matrix $M_{m \times m}$. The entry is $M_{ij} = 1$ if the corresponding parameter servers have communicated; otherwise, $M_{ij} = 0$.
For any $GP_i$, we considered the local data sample as $D_i$ and, similarly, $D_j$ for any $GP_j$. The objective was to train the global $MP_i$ to which both $GP_i$ and $GP_j$ are assigned. For the training problem, we considered the reinforcement FL (RFL) scheme, and the local $D_i$ at time t for any GP is presented as $D_{i,t}$. We could represent the optimization problem for the $i$-th GP at time t as follows:
$\min_{w \in \mathbb{R}^d} F_i(w) = \mathbb{E}_{(x,y) \sim D_{i,t}} \, \ell(w, (x,y)) + \lambda R(w)$
where $w$ denotes the model parameters, $\ell$ is the loss function, $R$ is the regularizer, and $\lambda$ is the regularization strength. The equation could be modified as follows:
$\min_{w \in \mathbb{R}^d} F_i(w) = \mathbb{E}_{(x,y,r) \sim D_{i,t}} \left[ \ell(w, (x,y)) + \gamma r V_{i-1}(x') \right] + \lambda R(w)$
where $\gamma$ is the discount factor, $r$ is the reward signal, $V_{i-1}(x')$ is the value function at the previous time step, and $x'$ is the next state. We could use Q-learning to estimate the value function as follows:
$V_{i-1}(x') = \max_a Q_{i-1}(x', a)$
where $a$ is the action, and $Q_{i-1}(x', a)$ is the Q-value function at the previous time step. We used the P-FedAvg algorithm to optimize the above equation in a parallel and distributed manner, so that the $i$-th GP communicates with the $j$-th GP to share local transactional data, and the global model is distributed into q different parallel server models.
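The value estimate above corresponds to a standard tabular Q-learning update; the following sketch uses illustrative state and action sizes rather than the actual GM environment:

```python
import numpy as np

n_states, n_actions = 16, 4
alpha, gamma = 0.1, 0.9                 # learning rate and discount factor
Q = np.zeros((n_states, n_actions))     # tabular Q-value function

def q_update(state, action, reward, next_state):
    """One Q-learning step; V(next_state) = max_a Q(next_state, a)."""
    v_next = Q[next_state].max()
    Q[state, action] += alpha * (reward + gamma * v_next - Q[state, action])

def value(state):
    """Value function used in the RFL objective: V(x) = max_a Q(x, a)."""
    return Q[state].max()

q_update(state=3, action=1, reward=1.0, next_state=4)
print(value(4), Q[3, 1])
```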
In P-FedAvg, to maximize the accuracy of $p_n^t$, we needed to minimize the loss function $L(f)$, computed as follows:
$L(f) = \sum_{i=1}^{|S|} \frac{1}{|S|} \sum_{j=1}^{N_s} \frac{1}{N_s} f_j(x)$
where $|S|$ denotes the cardinality of the set of GPs assigned to any $MP_i$, and $N_s$ denotes the load size. The parallel FedAvg algorithm proceeds as follows:
  • Every $MP_i$ initially distributes the global model parameters obtained from $G_{Mod}$ to its sub-global model $MP$ to start the initial round.
  • Based on the local weights, the $MP$ optimizes the parameter value and passes it to the client batch assigned under it.
  • Each $GP_i$ runs t rounds of local model iteration and updates the model weights, which are returned to its local $MP_i$.
  • $G_{Mod}$ sets up a local value of the loss function, which it optimizes at each response round from $MP_i$. The process terminates once the loss converges to the optimal value.
Every $GP_i$ initiates a parameter vector, denoted by $G_i$, which requires the intermediate vector $\gamma_i$ (represented as $\frac{1}{K_i}\sum_{i \in K_i} G_i$) to be accessed by any $MP_i$. Similarly, each $MP_i$ aggregates all the parameter vectors from the neighboring $GP_j$, where $j \neq i$. Once this has been achieved, a mixing vector $mx = \{w_{i1}, w_{i2}, \ldots, w_{ik}\}$ is defined for the k GPs acting as clients for parallel FedAvg learning. The final vector $V_f$ is updated as follows [37,38]:
$V_f = (mx_1, mx_2, \ldots, mx_k)$
A mixing matrix $M_f$ is defined for the weights obtained from $V_f$. This step ensures the diffusion of the individual weights and the secrecy of the individual gradients.
The gradient function for the client batch of $MP_i$ is denoted as $\nabla f(G, \xi)$, where $\xi$ represents the client GP batch of cardinality N handled by $MP_i$. More intuitively, the gradient function is given as follows:
$\nabla f(G, \xi) = \frac{1}{|\xi|} \sum_{s \in \xi} \nabla f(G, s)$
where $\nabla f(G, s)$ is the loss gradient of a general sample obtained from any $GP_i$. For convergence, we defined an optimal loss function $f^*(G, \xi)$, which is decided by the global server and distributed to the $MP$s. Once the data are shared by $GP_i$, the aggregator A stores the result in the local BC, where the global model picks up the gradient.
The algorithm P-FedAvg involves a server and multiple client processes, and every client is associated with a local dataset. The objective was to train the global model and minimize the loss. At any M P i , we aggregated the model parameters from its mapped clients while assuring data privacy via a differential privacy mechanism, as outlined in this section. In the server, the initial model parameters are distributed to all clients. Based on a predefined number of rounds, the clients perform local updates on their data using the current model parameters and share their updated parameters with the server. The server aggregates the updated model parameters received from the clients and broadcasts the aggregated parameters to all the clients for the next round of updates. This process continues until convergence is achieved. The client process involves local computation and the sharing of model parameters with the server. Each client computes the gradient of the loss function for its local dataset using the current model parameters and shares the gradient with the server. The server aggregates the clients’ gradients and updates the global model parameters accordingly.
Algorithm 1 shows the details of this process.
Algorithm 1 The parallel approach to the FedAvg algorithm
Input: Learning rate $\eta$, mixing vector $mx$, $GP_i$ initial parameter $G_i$, iteration number N.
Output: Model parameter X.
1: procedure SERVER($x_0$)
2:     $G_{Mod} \leftarrow x_0$
3:     $G_{Mod} \rightarrow$ Distribute($MP_1, MP_2, \ldots, MP_q$)
4:     for each round $r = 1, 2, \ldots, t$ do
5:         for every $MP = 1$ to q do
6:             $N(c) \leftarrow$ rand_assign(Cluster_Size)
7:         end for
8:         Update $\gamma_i \leftarrow \frac{1}{K_i}\sum_{i \in K_i} G_i$
9:         $MP_i \leftarrow$ Find_closest_MP($MP_j$)
10:        $A \leftarrow$ Exchange model parameters with $MP_j$
11:        Update $f(G, \xi)$
12:    end for
13:    if $|f^*(G, \xi) - f(G, \xi)| \leq 0.01$ then
14:        Broadcast message "Global Model convergence achieved"
15:        Mine block M and add to game-chain C
16:    else
17:        REPEAT steps 4–10 UNTIL convergence
18:    end if
19: end procedure
20: procedure CLIENT($x_0$)
21:    for $w \leftarrow 1$ to t do
22:        Compute $\gamma_w$
23:        Compute local $\nabla f(x)$ and share with the assigned $MP$
24:        Forward to aggregator A
25:    end for
26:    RETURN X
27: end procedure
We defined two procedures, SERVER and CLIENT. On the server side, lines 2–3 show the formation of the parallel $MP_i$s from the global model, and lines 4–6 define the random assignment of a GP to any $MP_i$. Once the GP is assigned, $\gamma_i$ is updated, and the aggregator exchanges model parameters with $MP_j$; these steps are shown in lines 7–10. Finally, convergence is achieved when the gap between the optimal and current loss is minimized; lines 12–15 show these conditions. On the other side, the clients compute the local gradient for each iteration and share the local gradient loss with their assigned $MP$; lines 18–24 depict these steps. A conceptual sketch of one such round is given below.
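The following Python sketch simulates this server–client interplay on a toy quadratic loss; the regional MPs, ring exchange, and the 0.01 convergence threshold mirror Algorithm 1, while the loss, data, and dimensions are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_mps, gps_per_mp, dim = 4, 5, 3
target = np.array([0.5, -0.2, 0.1])          # optimum of the toy loss ||w - target||^2

def client_update(w, lr=0.1):
    """One local gradient step of a GP on the toy loss."""
    return w - lr * 2 * (w - target)

w_global = rng.normal(size=dim)              # initial global model G_Mod
for rnd in range(50):
    # Each MP aggregates the updates of its own GP batch (FedAvg within the region).
    mp_models = []
    for _ in range(n_mps):
        local = [client_update(w_global + 0.01 * rng.normal(size=dim))
                 for _ in range(gps_per_mp)]
        mp_models.append(np.mean(local, axis=0))
    # Ring exchange: each MP averages with its right neighbor.
    mp_models = [(mp_models[i] + mp_models[(i + 1) % n_mps]) / 2
                 for i in range(n_mps)]
    w_global = np.mean(mp_models, axis=0)    # global aggregation
    if np.sum((w_global - target) ** 2) <= 0.01:
        print(f"Convergence achieved at round {rnd}")
        break
```

The sketch only illustrates the control flow; in the actual scheme the local step would be a model-training iteration on GM data, and the exchanged parameters would be recorded on the game chain.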
To improve the rendering latency of the GPs in the GM environment, we envisioned that our parallel FedAvg algorithm would form a dynamically connected mesh network M, where the topology T of the node connections (between $MP_i$ and $MP_j$) is non-variable. A link l is established on demand based on the broadcast request defined in Algorithm 1. Further, the 6G-FeMBB service addresses the channel bandwidth to maximize $B_n(\tau)$ and minimize l. Thus, the scheme addresses our sub-optimal problem sets $SO(l)$ to form a unified scheme.

3.1. Complexity Analysis

In this subsection, we present an overview of the time complexity of the proposed algorithm. The complexity depends on the number of iterations and the overall computation time per iteration. To analyze the time complexity, we had to consider both the SERVER and the CLIENT processes. The details are presented below.

3.1.1. Server Process

The complexity of the server process depends on the number of iterations t and the computation time required for each iteration. For each iteration, the server performs four functions: cluster assignment, model aggregation, parameter exchange, and gradient computation. The details are presented below.
  • For cluster assignment, we assumed that the dataset size is d, with a total of q clients. Then, the complexity would be O ( d × q ) .
  • For model aggregation, we used m x as a mixing vector and assumed that the model size is C. Thus, for q clients, the complexity would be O ( q × C ) .
  • For parameter exchange, we assumed that a ring-based system is preferred to communicate with a neighboring node (neighboring MP), exchanging a message of size m. Thus, the communication complexity would be O ( t × m ) in each iteration. While communicating with the client node, with t iterations, the complexity would be O ( d × C × t ) . Thus, the overall complexity would be O ( d × q + d × C × t ) in the ring-based system.
  • For gradient computation, the server computes the gradient of the objective function using the exchanged parameters. With a dataset size d and a model size C, the complexity would be O ( d × C ) .
Thus, the overall time-complexity of the server process would be O ( d × q + q × C + d × q + d × C × t + d × C ) .

3.1.2. Client Process

The time complexity of the client process depends on the number of iterations t and the computation time per iteration. There are three major tasks at the client node: the local gradient computation, model update, and local model sharing with mapped M P i . The details are presented as follows.
  • The time complexity of computing the gradient of the objective function depends on the size of the local dataset and the complexity of the model. Assuming a fixed model architecture, the time complexity of computing the gradient is linear according to the local dataset’s size, denoted by n i . Specifically, for each round w, the CLIENT function computes the gradient of the objective function for the i t h client using its local dataset, which takes O ( n i ) time.
  • The time complexity of updating the model parameters also depends on the size of the local dataset and the complexity of the model. Assuming a fixed model architecture and a fixed number of optimization steps per round, the time complexity of updating the model parameters is linear according to the local dataset’s size, denoted by n i . Specifically, for each round w, the CLIENT function updates its local copy of the model parameters using the computed gradient, which takes O ( n i ) time.
  • The time complexity of sharing the updated model parameters with the server depends on the communication protocol and network bandwidth. Assuming a fixed communication protocol and a fixed number of clients per server, the time complexity of model sharing is proportional to the size of the model parameters, denoted by b. Specifically, for each round w, the CLIENT function sends its updated model parameters to the assigned MP, which takes O ( b ) time.
Thus, the overall time complexity of the CLIENT node is O ( n i + b ) .
In our proposed scheme, we noticed that the overall SERVER node complexity depends on the model aggregation, cluster assignment, and fast parameter exchange. In the problem formulation, we wished to maximize $B_n(\tau)$, and thus the model aggregation complexity depends on the amount of individual bandwidth the client node possesses. Using a 6G communication service (FeMBB) alleviates this problem. Further, at the server, the size of the local dataset maps onto the problem of maximizing the FL model accuracy $(p_n^t)$. The larger the dataset, the more accurately the model can be trained.
On the other hand, the time complexity of the CLIENT node is influenced by the number of iterations (t), which is related to the objective of maximizing the accuracy of the FL model ( p n t ) . The more iterations, the more accurately the model can be trained. The time complexity of the CLIENT node is also influenced by the size of the local dataset, which is related to the objective of minimizing the latency of asset transfer. The smaller the dataset size, the faster the asset transfer latency can be minimized.
Thus, the proposed P-FedAvg algorithm with a ring-based communication system between M P i s forms an effective load-balancing mechanism and was thus deemed conducive to a higher accuracy and faster convergence in FL tasks compared to other methods such as centralized FL and non-ring-based parallel FL algorithms. Additionally, our method can handle heterogeneous client environments with varying computation capabilities and communication bandwidths, a common real-world scenario. Therefore, our method is superior in scalability, adaptability, and performance.

4. Performance Evaluation

This section presents the performance evaluation of the Game-o-Meta scheme based on some defined parameters, namely, the mining cost of transactional updates by G P n , the G O (avatar) rendering time on the 6G channel, the energy consumption of the FL dataset setup, the federated RL average reward plot, the overall model accuracy, and a comparative analysis of different FL aggregation algorithms. After the simulation results, we present the formal security analysis of the proposed scheme.

4.1. Experimental Setup

For the simulation of the BC node, we used the Ethereum Remix virtual machine 1.3.6 native IDE. The system included an Intel i5 processor with 8 GB RAM, a 128 GB SSD, and Ubuntu 80.04 LTS installed. For FL training, we trained the machine learning model on MNIST with skewed data [39]. Each device in the federated group was assigned 100 mobile game samples randomly selected from the training dataset. We varied the number of GPs up to 60 for the federated environment, resulting in a training dataset of up to 6000 combinations. We used three different neural networks to train the model, with 0, 64, and 512 hidden-layer units. We trained the model with different dataset sizes and different Earth mover's distance settings, and, to measure the energy consumed in training, we used a Raspberry Pi. The details of the simulation parameters for the experimental setup are presented in Table 2.

4.2. Simulation Results

Here, we present the simulation results of the proposed scheme. Figure 4 represents the mining cost of storing the weights $(W_{GP_n})$ in the BC via the IPFS.
First, $W_{GP_n}$ is stored in the IPFS, and the IPFS generates a 32-byte content address that is stored in a block of the BC. For 1000 gradient updates, the approximate size of the block would be 3.8 kilobytes (KB). With this block size, the cost of storing data, based on the mining incentive and the Ethereum rate for 2023 (Q1), is USD 14.82 per KB. Based on the above computation, we compared the mining cost of storing data on the BC against related schemes [40,41] and demonstrated a significant improvement of 78.24% in the mining cost.
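For reference, the quoted figures translate into a per-block storage cost as follows (both inputs are taken from the text; the computation is a back-of-the-envelope check, not part of the reported results):

```python
# Per-block storage cost from the figures quoted above.
block_size_kb = 3.8        # approximate block size for 1000 gradient updates
rate_usd_per_kb = 14.82    # assumed Ethereum 2023 (Q1) rate per KB
print(f"Mining cost per block: USD {block_size_kb * rate_usd_per_kb:.2f}")
```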
Figure 5 presents the avatar rendering latency; in terms of metaverse rendering, we considered the 6G-TI channel with the FeMBB service.
A realistic scenario consists of a 4K scene of about 8,294,400 pixels with a 12-bit color depth and a 120-frames-per-second frame rate. The transmission time of the image over 5G-TI and 6G-TI would be 111 ms and 101 ms, respectively, and the 3D-rendered avatar would require a transmission rate of 4.32 Tbps [42]. Thus, the rendering time for 5G-TI and 6G-TI was found to be 4.9 ms and 0.049 ms, respectively, which resulted in an overall latency of ≈115 ms for 5G-TI and ≈101 ms for 6G-TI [43,44]. The figure shows an improvement of ≈36% compared to traditional 5G-TI communication.
Figure 6 shows the energy consumption of training the global model based on the W G P received by the gradients from all GPs.
The energy demand of an ML model can be defined as the amount of energy required to train the model on a given hardware platform, which is a critical factor in designing and deploying ML models, as it directly affects the cost, efficiency, and environmental impact of these systems. One way to mathematically represent the energy demand of an ML model is to consider the number of operations required for training or inference. Let us assume that we have a neural network model with L layers, and each layer l has n l neurons. The number of operations required for training the model on a dataset of size N can be approximated as follows:
$E = \sum_{i=1}^{n} N_i \cdot f_i \cdot V_{dd}^2 \cdot C_i$
where E is the total energy demand of the ML model, n is the total number of layers in the model, $N_i$ is the number of neurons in layer i, $f_i$ is the average spiking frequency of the neurons in layer i, $V_{dd}$ is the supply voltage of the neurons, and $C_i$ is the total capacitance of the neurons in layer i. Here, the term $n_l^2$ represents the number of operations required to compute the inner product of the weights and activations of a single neuron in layer l. The sum over all layers accounts for the total number of neurons in the model. Changing the number of hidden-layer units also affects the model's accuracy. The number of hidden-layer units is a hyperparameter that needs to be tuned to optimize the performance of a neural network. Increasing the number of hidden-layer units can allow the network to learn more complex relationships between the input and output, potentially leading to higher accuracy. However, increasing the number of hidden-layer units also increases the computational complexity of the network, which can lead to higher energy consumption. Thus, we adjusted the number of hidden-layer units in the simulations to find a balance between accuracy and energy demand. We considered the hidden-layer units to determine the optimal configuration for the proposed scheme. As we observed, the training duration for local GPs was linear, and as we increased the dataset size, the time required to train the model grew accordingly. The neural network model was trained with 0, 64, and 512 hidden-layer units. From the figure, one can observe that as the complexity of the training increased, the energy required for the model's training also increased.
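The energy estimate can be reproduced symbolically as follows; the layer sizes follow the MNIST setup (784 inputs, 10 outputs) with the three hidden-unit settings, while the frequency, voltage, and capacitance values are placeholders chosen only to show the trend:

```python
import numpy as np

def energy_demand(neurons, freqs, vdd, caps):
    """Total energy E = sum_i N_i * f_i * Vdd^2 * C_i across layers
    (illustrative units)."""
    neurons, freqs, caps = map(np.asarray, (neurons, freqs, caps))
    return float(np.sum(neurons * freqs * vdd ** 2 * caps))

for hidden in (0, 64, 512):                 # hidden-unit settings used in Figure 6
    layers = [784, hidden, 10] if hidden else [784, 10]
    e = energy_demand(layers, freqs=[1.0] * len(layers),
                      vdd=0.8, caps=[1e-3] * len(layers))
    print(hidden, round(e, 4))              # energy grows with model complexity
```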
Figure 7 presents the model accuracy with a varying dataset size.
To simulate this relationship, we considered the Earth mover’s distance (EMD) metric, which signifies the dissimilarity between two probability distributions. In FL, EMD is used to measure the distance between the local models of clients and the global model. The EMD impacts the performance of FL algorithms, as it affects how the local models of the clients are combined to form the global model. A smaller EMD indicates that the clients’ local models are more similar and hence can be combined more easily to form the global model. A larger EMD indicates that the local models are more dissimilar; hence, it may be more difficult to combine them to form the global model. Therefore, a lower EMD is generally preferred for FL.
When the dataset size was small, each client had fewer samples to train the model, which meant that the distribution of model parameters across clients was more similar. This was because, with fewer samples, there was less room for variation in the learned model parameters.
Therefore, it was expected that the EMD value would be lower when the dataset size was small than when the dataset size was large, where each client had more samples and, therefore, there was more variability in the learned model parameters. The results were consistent with this explanation. We used different Earth mover's distances ($\sigma$) to check the model's accuracy and to evaluate the dissimilarity between two multidimensional distributions of data points, where the positions of the points in N-dimensional space are what matter. When the size of the dataset was relatively small, the accuracy of the model increased dramatically, converging to a certain level when the size was large.
To assess the performance of our federated RL algorithm, we considered a plot of the average reward obtained during the training of the RL algorithm. We considered the number of episodes and the average reward per episode. The initial RL parameters were set as depicted in Table 2. Figure 8 presents the plot details.
As one can see from the plot, the average reward was low at first and gradually increased as the number of episodes increased. This indicated that the RL algorithm was learning and improving over time. The Q-learning algorithm uses a Q-value function to estimate the expected rewards of a particular action in a given state. The algorithm iteratively updates the Q-values based on the observed rewards and chooses the action with the highest Q-value in each state.
The plot shows the average reward obtained per episode over 100 episodes of training. The blue line represents the rewards obtained in each episode, and the gray dashed line represents the average reward over all episodes. The plot indicates that the Q-learning algorithm improved the game player experience by gradually increasing the average reward over the training episodes. Thus, the algorithm performance improved as it learnt to optimize the action selection process in the GM environment. The reward obtained per episode increased as the algorithm became better at selecting actions.
In an FL environment, client devices are heterogeneous, with limited resources, and each device has different capabilities in terms of computing power and network bandwidth. Normally, a client randomly decides whether to participate in the training process, which affects the training and unbalances the model accuracy. Therefore, it was important to ensure that the clients participated in the training process. The FedAvg approach randomly selects clients for participation, and the model performs poorly when the data are highly dependent and not identically distributed.
Figure 9 represents a comparison of the traditional, FedCS [45], and HybridFL [46] aggregation algorithms with FedAvg.
We considered the true-positive rate (TPR), true-negative rate (TNR), false-positive rate (FPR), false-negative rate (FNR), and accuracy as the parameters for algorithm selection. In the figure, the TPR, TNR, FPR, FNR, and classification accuracy are compared against those of the traditional approaches. Based on these parameters, we used the FedAvg algorithm to achieve better accuracy.
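For completeness, these metrics follow directly from the binary confusion matrix; the sketch below uses illustrative counts, not the reported results:

```python
def rates(tp, tn, fp, fn):
    """Return TPR, TNR, FPR, FNR, and accuracy from confusion-matrix counts."""
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    fpr = fp / (fp + tn)
    fnr = fn / (fn + tp)
    acc = (tp + tn) / (tp + tn + fp + fn)
    return tpr, tnr, fpr, fnr, acc

print(rates(tp=90, tn=85, fp=15, fn=10))   # (0.9, 0.85, 0.15, 0.1, 0.875)
```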

4.3. Formal Security Analysis

Theorem 1.
The proposed P-FedAvg scheme is secure against passive attacks, wherein an adversary tries to obtain sensitive information from the encrypted gradients or weights. This is achieved by using homomorphic encryption for gradient exchange and aggregation.
Proof. 
Let $g_n^t$ denote the gradient computed on the $n$-th peer device at iteration $t$. The gradient is encrypted before transmission to the GS using a public-key encryption scheme, as follows:
$\hat{g}_n^t = E_{PK}(g_n^t)$
where $E_{PK}$ is the encryption function using the public key $PK$. If the adversary intercepts the encrypted gradient $\hat{g}_n^t$, it cannot obtain the plaintext gradient $g_n^t$ without the private key $SK_n$. Moreover, homomorphic encryption allows operations to be performed on ciphertexts as if they were performed on the plaintexts, so the plaintext gradients are also not revealed during aggregation. Hence, the proposed P-FedAvg scheme is secure against passive attacks. □
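As a concrete illustration of the behaviour the proof relies on, the following sketch uses the python-paillier (phe) library as a stand-in additively homomorphic cryptosystem; the paper does not prescribe a specific scheme, so the library choice, key length, and gradient values are assumptions.

```python
# Sketch only (assumed Paillier cryptosystem via the `phe` library, not a
# scheme prescribed by the paper): peers encrypt their gradients with the
# public key; the ciphertexts can be summed without decryption, and only the
# private-key holder recovers the aggregate -- no plaintext gradient is
# exposed in transit or during aggregation.
from phe import paillier

public_key, private_key = paillier.generate_paillier_keypair(n_length=2048)

peer_gradients = [0.021, -0.104, 0.067]                          # plaintext g_n^t from three peers
ciphertexts = [public_key.encrypt(g) for g in peer_gradients]    # encrypted gradients

# Additively homomorphic aggregation performed entirely on ciphertexts.
enc_aggregate = ciphertexts[0]
for c in ciphertexts[1:]:
    enc_aggregate = enc_aggregate + c

# Only the holder of the private key can read the aggregated gradient.
print(private_key.decrypt(enc_aggregate))                        # approx. -0.016
```

Under this assumption, an eavesdropper observing the exchanged messages sees only ciphertexts, which is the property the proof of Theorem 1 relies on.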
Theorem 2.
The proposed P-FedAvg scheme is secure against active attacks launched by malicious GSs.
Proof. 
Let $g_n^t$ be the gradient computed on the $n$-th peer device at iteration $t$. Let $E_{PK}$ be the encryption function using the public key $PK$, and $D_{SK_n}$ the decryption function using the private key $SK_n$ of the $n$-th peer device. Let $g^t$ be the aggregated gradient at iteration $t$, computed as the sum of the encrypted gradients, as follows:
$g^t = \sum_{n \in N} \hat{g}_n^t$
where $\hat{g}_n^t = E_{PK}(g_n^t)$. Suppose a malicious GS launches an active attack that modifies the aggregated gradient $g^t$ by adding a malicious gradient $\Delta g^t$, as follows:
$\tilde{g}^t = g^t + \Delta g^t$
where $\tilde{g}^t$ is the modified gradient. The GS then generates a new set of global weights $w^{t+1}$ by performing a gradient-descent step using the modified gradient, as follows:
$w^{t+1} = w^t - \alpha \tilde{g}^t$
However, the gradients are exchanged in encrypted form under the public key $PK$, so the GS cannot inject $\Delta g^t$ directly in plaintext. Instead, it would have to compute the encrypted version of the malicious gradient,
$\Delta \hat{g}^t = E_{PK}(\Delta g^t)$
and add it to the encrypted gradients received from the peer devices, as shown below:
$\hat{g}^t = \sum_{n \in N} \left( \hat{g}_n^t + \Delta \hat{g}^t \right)$
However, since homomorphic encryption is used, the GS cannot perform such addition on the encrypted gradients without knowing the private keys $SK_n$ of the peer devices. Thus, the malicious modification is not feasible, and the proposed P-FedAvg scheme is secure against active attacks launched by malicious GSs. □
Theorem 3.
The proposed P-FedAvg scheme is secure against compromised peer attacks.
Proof. 
Let $g^t$ be the aggregated gradient at iteration $t$, computed as the sum of the encrypted gradients, as follows:
$g^t = \sum_{n \in N} \hat{g}_n^t$
where $\hat{g}_n^t = E_{PK}(g_n^t)$. Suppose a compromised-peer attack is launched, in which a peer device with private key $SK_c$ is compromised by an attacker. The attacker can use $SK_c$ to decrypt the encrypted gradient $\hat{g}_c^t$ and obtain the plaintext gradient $g_c^t$. The attacker can then modify the plaintext gradient and encrypt the modified value $g_c^{t\prime}$ using the public key $PK$, generating a new encrypted gradient $\hat{g}_c^{t\prime}$, as follows:
$\hat{g}_c^{t\prime} = E_{PK}(g_c^{t\prime})$
where $g_c^{t\prime}$ is the modified plaintext gradient. However, the other peer devices compute their gradients independently, and the attacker cannot modify their encrypted gradients without knowing their private keys $SK_n$. Thus, the attacker cannot arbitrarily alter the aggregated gradient $g^t$ beyond its own contribution. Therefore, the proposed P-FedAvg scheme is secure against compromised peer attacks. □

5. Conclusions and Future Scope

In the near future, the gaming industry will offer immersive, realistic GE experiences for GPs, and GMs will become the norm. AI-enabled GMs could leverage XR engines to enable smooth interactions and improve the QoI for GPs. In this paper, we proposed a scheme, Game-o-Meta, that addresses the requirement for secure and personalized GEs, with GP data being trained on local devices via FL. We envisioned a 6G-TI channel to allow real-time haptic control and to support large numbers of concurrent GP connections over the metaverse environment. FL also preserves the privacy of GP data, as the data are never shared centrally. A parallel FedAvg (P-FedAvg) algorithm was presented, in which communication costs were minimized owing to the 6G-P2P channel. Further, the scheme considered transactional payments and asset transfers via BC and NFTs. To address the scalability concerns of BC, local GP data are stored off-chain in local IPFS storage and accessed through a content key shared with authorized users. Thus, the presented scheme forms an umbrella approach for P2P GM networks in terms of effective learning, communication delay, and secure transactions.
In future work, the authors will extend this study by performing weighted FL aggregation in a multi-GP scenario via a secure multi-party computation algorithm, with the collaborating GPs keeping their game inputs private while jointly computing a function over the gradient inputs.

Author Contributions

Conceptualization: P.B., A.V., B.C.F. and S.T.; writing—original draft preparation: P.B., A.V., D.D.T. and V.K.P.; methodology: A.V., V.K.P., F.A., A.T. and S.T.; writing—review and editing: P.B., A.V., B.C.F., D.D.T. and V.K.P.; software: A.V., V.K.P., B.C.F., D.D.T. and B.B.; visualization: P.B., A.V., F.A., A.T. and B.B.; investigation: S.T., A.T., D.D.T. and B.B. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the Researchers Supporting Project number (RSP2023R509), King Saud University, Riyadh, Saudi Arabia, and was also supported by a grant of the Ministry of Research, Innovation and Digitization, CNCS/CCCDI–UEFISCDI, project number PN-III-P1-1.1-TE-2021-0393, within PNCDI III.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No data are associated with this research work.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Burkert, R.; Horwat, P.; Lütsch, R.; Roth, N.; Stamm, D.; Stamm, F.; Vogt, J.; Jansen, M. Decentralized Online Multiplayer Game Based on Blockchains. In Proceedings of the Blockchain and Applications: 3rd International Congress, Salamanca, Spain, 6–8 October 2021; pp. 44–53. [Google Scholar]
  2. Brooks, A.L. Gaming, VR, and Immersive Technologies for Education/Training. In Recent Advances in Technologies for Inclusive Well-Being: Virtual Patients, Gamification and Simulation; Springer: Berlin/Heidelberg, Germany, 2021; pp. 17–29. [Google Scholar]
  3. Allam, Z.; Sharifi, A.; Bibri, S.E.; Jones, D.S.; Krogstie, J. The Metaverse as a Virtual Form of Smart Cities: Opportunities and Challenges for Environmental, Economic, and Social Sustainability in Urban Futures. Smart Cities 2022, 5, 771–801. [Google Scholar] [CrossRef]
  4. Blockchain Gaming and Metaverse Initiatives Raised $1.3 Billion in Q3 2022, Says Report. Available online: https://www.outlookindia.com/business/blockchain-gaming-and-metaverse-initiatives (accessed on 18 January 2023).
  5. Bhattacharya, P.; Saraswat, D.; Savaliya, D.; Sanghavi, S.; Verma, A.; Sakariya, V.; Tanwar, S.; Sharma, R.; Raboaca, M.S.; Manea, D.L. Towards Future Internet: The Metaverse Perspective for Diverse Industrial Applications. Mathematics 2023, 11, 941. [Google Scholar] [CrossRef]
  6. Clement, J. Video Gaming and the Metaverse—Statistics & Facts. Available online: https://www.statista.com/topics/9490/metaverse-and-video-gaming/#dossierKeyfigures (accessed on 19 January 2023).
  7. Kshetri, N. Web 3.0 and the Metaverse Shaping Organizations’ Brand and Product Strategies. IT Prof. 2022, 24, 11–15. [Google Scholar] [CrossRef]
  8. Bodkhe, U.; Tanwar, S.; Bhattacharya, P.; Verma, A. Blockchain adoption for trusted medical records in healthcare 4.0 applications: A survey. In Proceedings of the Second International Conference on Computing, Communications, and Cyber-Security: IC4S 2020, Ghaziabad, India, 3–4 October 2021; pp. 759–774. [Google Scholar]
  9. Khatri, S.; Vachhani, H.; Shah, S.; Bhatia, J.; Chaturvedi, M.; Tanwar, S.; Kumar, N. Machine learning models and techniques for VANET based traffic management: Implementation issues and challenges. Peer-Peer Netw. Appl. 2021, 14, 1778–1805. [Google Scholar] [CrossRef]
  10. Di Pietro, R.; Cresci, S. Metaverse: Security and Privacy Issues. In Proceedings of the 2021 Third IEEE International Conference on Trust, Privacy and Security in Intelligent Systems and Applications (TPS-ISA), Atlanta, GA, USA, 13–15 December 2021; pp. 281–288. [Google Scholar] [CrossRef]
  11. Verma, A.; Bhattacharya, P.; Bodkhe, U.; Ladha, A.; Tanwar, S. Dams: Dynamic association for view materialization based on rule mining scheme. In Proceedings of the Recent Innovations in Computing: Proceedings of ICRIC 2020, Jammu, India, 20–21 March 2020; pp. 529–544. [Google Scholar]
  12. Prasad, V.K.; Bhattacharya, P.; Maru, D.; Tanwar, S.; Verma, A.; Singh, A.; Tiwari, A.K.; Sharma, R.; Alkhayyat, A.; Țurcanu, F.E.; et al. Federated Learning for the Internet-of-Medical-Things: A Survey. Mathematics 2023, 11, 151. [Google Scholar] [CrossRef]
  13. Saraswat, D.; Verma, A.; Bhattacharya, P.; Tanwar, S.; Sharma, G.; Bokoro, P.N.; Sharma, R. Blockchain-Based Federated Learning in UAVs beyond 5G Networks: A Solution Taxonomy and Future Directions. IEEE Access 2022, 10, 33154–33182. [Google Scholar] [CrossRef]
  14. Patel, V.A.; Bhattacharya, P.; Tanwar, S.; Gupta, R.; Sharma, G.; Bokoro, P.N.; Sharma, R. Adoption of Federated Learning for Healthcare Informatics: Emerging Applications and Future Directions. IEEE Access 2022, 10, 90792–90826. [Google Scholar] [CrossRef]
  15. Singh, A.; Tiwari, A.K.; Bhattacharya, P. Bit Error Rate Analysis of Hybrid Buffer-Based Switch for Optical Data Centers. J. Opt. Commun. 2022, 43, 339–346. [Google Scholar] [CrossRef]
  16. Gupta, R.; Reebadiya, D.; Tanwar, S. 6G-enabled Edge Intelligence for Ultra-Reliable Low Latency Applications: Vision and Mission. Comput. Stand. Interfaces 2021, 77, 103521. [Google Scholar] [CrossRef]
  17. Gupta, R.; Nair, A.; Tanwar, S.; Kumar, N. Blockchain-assisted secure UAV communication in 6G environment: Architecture, opportunities, and challenges. IET Commun. 2021, 15, 1352–1367. [Google Scholar] [CrossRef]
  18. Cheng, R.; Wu, N.; Chen, S.; Han, B. Will Metaverse be NextG Internet? Vision, Hype, and Reality. arXiv 2022, arXiv:2201.12894. [Google Scholar] [CrossRef]
  19. Saad, W.; Bennis, M.; Chen, M. A vision of 6G wireless systems: Applications, trends, technologies, and open research problems. IEEE Netw. 2019, 34, 134–142. [Google Scholar] [CrossRef]
  20. Verma, A.; Bhattacharya, P.; Budhiraja, I.; Gupta, A.K.; Tanwar, S. Fusion of Federated Learning and 6G in Internet-of-Medical-Things: Architecture, Case Study and Emerging Directions. In Proceedings of the Futuristic Trends in Networks and Computing Technologies: Select Proceedings of Fourth International Conference on FTNCT 2021, Ahmedabad, India; Springer: Singapore, 2022; pp. 229–242. [Google Scholar]
  21. Fettweis, G.P.; Boche, H. 6G: The Personal Tactile Internet—And Open Questions for Information Theory. IEEE BITS Inf. Theory Mag. 2021, 1, 71–82. [Google Scholar] [CrossRef]
  22. Zou, Y.; Feng, S.; Niyato, D.; Jiao, Y.; Gong, S.; Cheng, W. Mobile Device Training Strategies in Federated Learning: An Evolutionary Game Approach. In Proceedings of the 2019 International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData), Atlanta, GA, USA, 14–17 July 2019; pp. 874–879. [Google Scholar] [CrossRef]
  23. Sun, H.; Ma, X.; Hu, R.Q. Adaptive Federated Learning With Gradient Compression in Uplink NOMA. IEEE Trans. Veh. Technol. 2020, 69, 16325–16329. [Google Scholar] [CrossRef]
  24. Park, J.; Choi, J.; Kim, S.L.; Bennis, M. Enabling the Wireless Metaverse via Semantic Multiverse Communication. arXiv 2022, arXiv:2212.06908. [Google Scholar]
  25. Khan, L.U.; Yaqoob, I.; Salah, K.; Hong, C.S.; Niyato, D.; Han, Z.; Guizani, M. Machine Learning for Metaverse-enabled Wireless Systems: Vision, Requirements, and Challenges. arXiv 2022, arXiv:2211.03703. [Google Scholar] [CrossRef]
  26. Lin, F.; Ning, W.; Zou, Z. Fed-MT-ISAC: Federated Multi-task Inverse Soft Actor-Critic for Human-Like NPCs in the Metaverse Games. In Proceedings of the Intelligent Computing Methodologies: 18th International Conference, ICIC 2022, Xi’an, China, 7–11 August 2022; pp. 492–503. [Google Scholar]
  27. Zelenyanszki, D.; Hóu, Z.; Biswas, K.; Muthukkumarasamy, V. A privacy awareness framework for NFT avatars in the metaverse. In Proceedings of the 2023 International Conference on Computing, Networking and Communications (ICNC), Honolulu, HI, USA, 20–22 February 2023; pp. 431–435. [Google Scholar] [CrossRef]
  28. Zhou, X.; Li, Y.; Zhao, J. Resource Allocation of Federated Learning Assisted Mobile Augmented Reality System in the Metaverse. arXiv 2023, arXiv:2301.12085. [Google Scholar]
  29. Hui, D.; Zhuo, L.; Xin, C. Quality-Aware Incentive Mechanism Design Based on Matching Game for Hierarchical Federated Learning. In Proceedings of the IEEE INFOCOM 2022-IEEE Conference on Computer Communications Workshops (INFOCOM WKSHPS), Online, 2–5 May 2022; pp. 1–6. [Google Scholar]
  30. Zhang, L.; Zhu, T.; Xiong, P.; Zhou, W.; Yu, P. A robust game-theoretical federated learning framework with joint differential privacy. IEEE Trans. Knowl. Data Eng. 2022, 35, 3333–3346. [Google Scholar] [CrossRef]
  31. Gupta, S.; Ahuja, K.; Havaei, M.; Chatterjee, N.; Bengio, Y. FL Games: A federated learning framework for distribution shifts. arXiv 2022, arXiv:2205.11101. [Google Scholar]
  32. Lee, H.S.; Lee, J.W. Adaptive Transmission Scheduling in Wireless Networks for Asynchronous Federated Learning. IEEE J. Sel. Areas Commun. 2021, 39, 3673–3687. [Google Scholar] [CrossRef]
  33. Fan, S.; Zhang, H.; Wang, Z.; Cai, W. Mobile Devices Strategies in Blockchain-based Federated Learning: A Dynamic Game Perspective. IEEE Trans. Netw. Sci. Eng. 2022; Early Access. [Google Scholar] [CrossRef]
  34. Jiang, S.; Wu, J. A Reward Response Game in the Federated Learning System. In Proceedings of the 2021 IEEE 18th International Conference on Mobile Ad Hoc and Smart Systems (MASS), Denver, CO, USA, 4–7 October 2021; pp. 127–135. [Google Scholar] [CrossRef]
  35. Yu, H.; Liu, Z.; Liu, Y.; Chen, T.; Cong, M.; Weng, X.; Niyato, D.; Yang, Q. A Sustainable Incentive Scheme for Federated Learning. IEEE Intell. Syst. 2020, 35, 58–69. [Google Scholar] [CrossRef]
  36. Patel, F.; Bhattacharya, P.; Tanwar, S.; Gupta, R.; Kumar, N.; Guizani, M. Block6Tel: Blockchain-based Spectrum Allocation Scheme in 6G-envisioned Communications. In Proceedings of the 2021 International Wireless Communications and Mobile Computing (IWCMC), Harbin, China, 28 June–2 July 2021; pp. 1823–1828. [Google Scholar] [CrossRef]
  37. Zhong, Z.; Zhou, Y.; Wu, D.; Chen, X.; Chen, M.; Li, C.; Sheng, Q.Z. P-FedAvg: Parallelizing Federated Learning with Theoretical Guarantees. In Proceedings of the IEEE INFOCOM 2021—IEEE Conference on Computer Communications, Vancouver, BC, Canada, 10–13 May 2021; pp. 1–10. [Google Scholar] [CrossRef]
  38. Patel, V.A.; Bhattacharya, P.; Tanwar, S.; Jadav, N.K.; Gupta, R. BFLEdge: Blockchain based federated edge learning scheme in V2X underlying 6G communications. In Proceedings of the 2022 12th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Online, 27–28 January 2022; pp. 146–152. [Google Scholar] [CrossRef]
  39. Lecun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324. [Google Scholar] [CrossRef]
  40. Chopade, M.; Khan, S.; Shaikh, U.; Pawar, R. Digital forensics: Maintaining chain of custody using blockchain. In Proceedings of the 2019 Third International conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, India, 12–14 December 2019; pp. 744–747. [Google Scholar]
  41. Lone, A.H.; Naaz, R. Reputation Driven Dynamic Access Control Framework for IoT atop PoA Ethereum Blockchain. Cryptology ePrint Archive, Paper 2020/566. 2020. Available online: https://eprint.iacr.org/2020/566 (accessed on 20 February 2023).
  42. Tataria, H.; Shafi, M.; Molisch, A.F.; Dohler, M.; Sjöland, H.; Tufvesson, F. 6G wireless systems: Vision, requirements, challenges, insights, and opportunities. Proc. IEEE 2021, 109, 1166–1199. [Google Scholar] [CrossRef]
  43. Gupta, R.; Tanwar, S.; Shudhanshu, T.; Kumar, N. Tactile-Internet-Based Telesurgery System for Healthcare 4.0: An Architecture, Research Challenges, and Future Directions. IEEE Netw. 2019, 33, 22–29. [Google Scholar] [CrossRef]
  44. Bhattacharya, P.; Obaidat, M.S.; Savaliya, D.; Sanghavi, S.; Tanwar, S.; Sadaun, B. Metaverse assisted Telesurgery in Healthcare 5.0: An interplay of Blockchain and Explainable AI. In Proceedings of the 2022 International Conference on Computer, Information and Telecommunication Systems (CITS), Piraeus, Greece, 13–15 July 2022; pp. 1–5. [Google Scholar] [CrossRef]
  45. Jin, Y.; Jiao, L.; Qian, Z.; Zhang, S.; Lu, S. Learning for Learning: Predictive Online Control of Federated Learning with Edge Provisioning. In Proceedings of the IEEE INFOCOM 2021—IEEE Conference on Computer Communications, Online, 10–13 May 2021; pp. 1–10. [Google Scholar] [CrossRef]
  46. Diwangkara, S.S.; Kistijantoro, A.I. Study of Data Imbalance and Asynchronous Aggregation Algorithm on Federated Learning System. In Proceedings of the 2020 International Conference on Information Technology Systems and Innovation (ICITSI), Bandung, Indonesia, 19–23 October 2020; pp. 276–281. [Google Scholar] [CrossRef]
Figure 1. Global market cap of NFT-based gaming metaverses by 2030.
Figure 2. Game-o-Meta: The proposed flow model.
Figure 3. Game-o-Meta: The global parallel FedAvg model.
Figure 4. Mining cost of transactions stored in BC.
Figure 5. Rendering latency of GP avatars.
Figure 6. Measurement of device energy consumption.
Figure 7. Model accuracy for different dataset sizes.
Figure 8. RL average rewards plot.
Figure 9. Comparative analysis of different aggregation algorithms.
Table 1. Comparative analysis of the proposed scheme and SOTA approaches.

Author | Year | Objectives | Application | Limitations
Proposed herein | 2023 | A parallel FL-based averaging (P-FedAvg) scheme for GMs wherein gradient exchange is managed via BC | Gaming metaverses | An evaluation of the privacy-preserving scheme was not carried out
Zelenyanszki et al. [27] | 2023 | The authors suggested creating a framework for promoting privacy awareness using ML algorithms to recognize harmful patterns and send notifications to users | Metaverses and digital transformation | Privacy preservation and the identification of malicious users were not considered
Zhou et al. [28] | 2023 | This paper presented a metaverse-based MAR system called NOMA-FL | Metaverses and augmented reality | Communication delays and global model convergence were not considered
Hui et al. [29] | 2022 | A model for hierarchical FL in the gaming industry that safeguarded user privacy and leveraged mobile edge computing | Gaming, autonomous driving, and health monitoring | The model convergence was not fast
Zhang et al. [30] | 2022 | A differential privacy FL model with client incentives for training | The identification of malicious participants | Security and privacy concerns were not addressed
Gupta et al. [31] | 2022 | To devise a model called FL GAMES for learning and observing causal features | Gaming | The time complexity was high, and the speed of the communication rounds was low
Lee et al. [32] | 2021 | To utilize asynchronous FL (AFL) in the wireless distributed learning network field for metaverse implementations | The technology of autonomous twins for metaverses | The wireless network radio resources are restricted, and posterior distribution is difficult
Fan et al. [33] | 2021 | To implement a blockchain-based FL system and propose an incentive mechanism for creating a transparent trading platform | Word prediction | Higher computational cost and delay
Jiang et al. [34] | 2021 | To implement a mobile crowd FL gaming system consisting of a collection of mobile devices and a central server | Gaming environment | The unpredictable upload time resulting from the mobile devices led to increased time for local training
Yu et al. [35] | 2020 | FL incentivization for data owners was implemented in the waiting time calculations with payoffs from the received information | The training of FL data | Privacy and quality improvements were required
Table 2. Simulation parameters.

Parameter | Value
# of MPs ($MP_i$) | 6
# of GPs per MP | 10
Epochs | 1000
Dataset size | 6000
Training samples | 5000
Testing samples | 1000
Batch size | 100
Optimizer | SGD
RL episodes | 100
Max time steps/episode | 1000
Learning rate (α) | 0.01
Discount factor (γ) | 0.99
Initial value function | 0
Initial Q-value | 0