Mathematics · Open Access Article
19 July 2024

A Multi-Agent Reinforcement Learning-Based Task-Offloading Strategy in a Blockchain-Enabled Edge Computing Network

Affiliations:
1 Key Laboratory of Broadband Wireless Communication and Sensor Network Technology (Ministry of Education), Nanjing University of Posts and Telecommunications, New Mofan Road No. 66, Nanjing 210003, China
2 Post Big Data Technology and Application Engineering Research Center of Jiangsu Province, Nanjing University of Posts and Telecommunications, New Mofan Road No. 66, Nanjing 210003, China
3 Post Industry Technology Research and Development Center of the State Posts Bureau (Internet of Things Technology), Nanjing University of Posts and Telecommunications, New Mofan Road No. 66, Nanjing 210003, China
* Author to whom correspondence should be addressed.
This article belongs to the Special Issue Fuzzy Modeling and Fuzzy Control Systems

Abstract

In recent years, many mobile edge computing network solutions have enhanced data privacy and security and built a trusted network mechanism by introducing blockchain technology. However, this also complicates the task-offloading problem of blockchain-enabled mobile edge computing, which traditional evolutionary learning and single-agent reinforcement learning algorithms struggle to solve effectively. In this paper, we propose a blockchain-enabled mobile edge computing task-offloading strategy based on multi-agent reinforcement learning. First, we propose a novel blockchain-enabled mobile edge computing task-offloading model that comprehensively considers optimization objectives such as task execution energy consumption, processing delay, user privacy metrics, and blockchain incentive rewards. Then, we propose a deep reinforcement learning algorithm in which multiple agents share a global memory pool under the actor–critic architecture, enabling each agent to acquire the experience of the other agents during training and thereby enhancing collaboration among agents and overall performance. In addition, we add attenuatable Gaussian noise to the action-selection process in the actor network to avoid falling into local optima. Finally, experiments show that the proposed scheme improves comprehensive cost performance by more than 10% compared with other multi-agent reinforcement learning algorithms, while Gaussian-noise-based action exploration and the global memory pool improve performance by 38.36% and 43.59%, respectively.

1. Introduction

Mobile Edge Computing (MEC) is an emerging computing paradigm that deploys computing and storage resources at the edge of the network, close to the data source, to provide low-latency, high-bandwidth, and customizable services. MEC can improve the performance, efficiency, and security of applications that require low latency, high bandwidth, and data privacy, such as augmented reality, smart cities, and autonomous driving. However, because the MEC network environment is open and dynamic, it is vulnerable to malicious node intrusion and data attacks. Such attacks can cause problems such as shared data leakage, task execution interference, and resource allocation anomalies within the network, seriously affecting the security of MEC. Therefore, ensuring the safe sharing of data and trustworthy collaboration of nodes is an important issue that needs to be solved in MEC [1,2].
Blockchain, known as a distributed, tamper-proof, decentralized data storage technology, was first proposed by Nakamoto in the context of Bitcoin, which enables secure, transparent, and immutable transactions and transfer of data records between multiple parties without relying on a trusted third party [3]. Blockchain has been widely used in various fields, such as digital asset management, supply chain finance, and intelligent manufacturing. Blockchain-based MEC (BMEC) is a new type of architecture that applies blockchain technology to MEC systems and can solve many challenges, such as data security, privacy protection, incentive mechanisms, resource management, etc. [4]. The in-depth integration of blockchain and MEC has been widely discussed [5]. In telematics and intelligent transportation systems, blockchain can provide collaborative management of service resources [6], data security sharing management [7], and collaborative node identity authentication [8]. In the smart grid, blockchain-based MEC is mainly applied to system architecture design [9], energy transaction pricing [10], and transaction security [11]. In addition, benefiting from the advantages of blockchain-based MEC, intelligent health care [12] and artificial intelligence [13] are also beginning to be applied.
Although the blockchain-based MEC system has excellent application prospects and research value, it also faces some important problems, including task offloading. MEC task offloading refers to the technology of offloading computing tasks from user devices to edge nodes or clouds for execution in order to compensate for the deficiencies of user devices in resource storage, computational performance, and energy efficiency. For the task-offloading problem in blockchain-based MEC systems, current research still has limitations in the following respects: (1) most existing works only consider the quality of service indicators of traditional MEC task offloading, such as task processing latency and energy consumption, but ignore blockchain mechanisms and user privacy leakage, which makes the problem modeling insufficient [14,15,16], and (2) task-offloading algorithms are often based on heuristic learning methods or single-agent reinforcement learning algorithms [17,18,19], whose solution quality and efficiency are unsatisfactory for dynamically changing, high-dimensional, non-convex task-offloading problems. In this paper, blockchain-based mobile edge computing task-offloading modeling and a multi-agent reinforcement learning method are investigated, and the main innovative contributions are summarized as follows:
  • We propose a novel task-offloading model for blockchain-based MEC networks that comprehensively considers the blockchain-specific incentive mechanism and consensus mechanism. It also takes the user privacy metric as the optimization objective, together with the task service quality as the joint optimization objective, which makes the modeling of the optimization problem more in line with the practical environment;
  • We propose a reinforcement learning algorithm based on a multi-agent global memory pool. Agents can enhance the overall collaborative ability among the agents by sharing parameters;
  • We adopt attenuatable Gaussian random noise in the action space selection process in the actor network to enhance the search capability and avoid falling into local optimum;
  • We conduct several sets of comparative experiments to validate the performance of the proposed algorithm in dealing with the task-offloading problem.
This paper is structured as follows. Section 2 investigates state-of-the-art research related to the research content of this paper. Section 3 presents the proposed blockchain MEC network task-offloading optimization model to be solved in this paper. Section 4 describes the principle and process of the reinforcement learning algorithm used in this paper. Section 5 conducts simulation experiments to evaluate the performance and effectiveness of the proposed algorithm. Section 6 summarizes the full paper.

3. Model

In this section, we propose a blockchain-based MEC system architecture, then provide a system overview and describe the operational flow and the blockchain consensus process.

3.1. System Model

In this paper, we propose a blockchain mobile edge network task-offloading model, whose architecture is shown in Figure 1. The network model contains a blockchain layer, an edge server layer, and a device layer. The device layer contains the set of user devices; each device interacts with the MEC network environment, sends task-offloading requests to an edge service node, and receives the offloading policy fed back by that node to complete task offloading. The edge server layer contains the set of edge nodes; each node holds certain task-processing resources, receives the task-offloading requests sent by users, and completes them through cooperation between nodes. The edge service nodes also act as blockchain nodes, which participate in network consensus and reward allocation in the blockchain layer and jointly maintain the blockchain that stores network information, ensuring the security of the network and incentivizing node participation.
Figure 1. Blockchain-based edge computing network model.
The edge network of this model has a device set $U = \{u_1, u_2, \dots, u_n\}$ consisting of $n$ user devices and an edge node set $E = \{e_1, e_2, \dots, e_m\}$ consisting of $m$ edge nodes. Any user device $u_i = (pw_{l_i}, f_{l_i}, enc_i^{max}, tk_i^t, loc_{l_i}^t = (x_{l_i}^t, y_{l_i}^t))$ can move along an irregular trajectory within a time slot and initiate a task-offloading request to the edge servers, where $pw_{l_i}$ is the total transmission power of the device, $f_{l_i}$ is its processing speed (the number of processing cycles per second), $enc_i^{max}$ is its upper energy limit, $tk_i^t$ is the task initiated by the user in time slot $t$, $loc_{l_i}^t$ is the device location, and $x_{l_i}^t$ and $y_{l_i}^t$ are the position coordinates.
In addition, the task $tk_i^t$ of any user device can be expressed as a tuple $(d_i^t, D_i^t, Td_{i,max}^t)$, where $d_i^t$ (bit) is the size of the task, $D_i^t$ is the number of computation cycles the task requires (in this paper, processing 1 bit of task data takes 500 CPU cycles), and $Td_{i,max}^t$ is the maximum tolerable delay of the task. Due to the energy and computational limitations of the device, these tasks cannot all be computed locally at the same time, and part of each task needs to be offloaded to the edge nodes. We use $tk_i^{t,l}$ and $tk_{ij}^{t,o}$ to denote the locally computed part of task $tk_i^t$ and the part offloaded to node $e_j$, respectively, and denote the corresponding data sizes and computation cycles as $d_i^{t,l}$, $D_i^{t,l}$, $d_{ij}^{t,o}$, and $D_{ij}^{t,o}$.
For any edge node, $e_j = (k_j, pw_{e_j}, f_{e_j}, loc_{e_j} = (x_{e_j}, y_{e_j}))$ can receive the task data offloaded by a device, process the task using its computational resources, and return the result to the smart device after processing is completed. Here, $k_j$ denotes the number of tokens held by the blockchain node corresponding to the edge node, $pw_{e_j}$ denotes the transmission power of the edge node, $f_{e_j}$ its processing speed, $loc_{e_j}$ the fixed location of the server, and $x_{e_j}$ and $y_{e_j}$ the location coordinates.
In the blockchain of this model, all the mobile edge network nodes also have the role of blockchain nodes, sharing parameters and recording proof of workload through the blockchain. The consortium blockchain uses a Proof of Stake (PoS)-based consensus method to validate the workload of the computing nodes and distribute incentive rewards to each of the individual nodes involved in the computation of the offloading task.

3.2. Consensus Model

The consensus mechanism in blockchain is the core method of ensuring that all participating nodes agree on the state of the blockchain. Currently, mainstream blockchain systems use two main consensus mechanisms, namely Proof of Work (PoW) and Proof of Stake (PoS). In PoW, all entities compete to solve a mathematical puzzle to generate blocks and receive a reward. However, PoW is very computationally intensive and is therefore poorly suited to mobile edge network scenarios. PoS was proposed to address the limitations of PoW; unlike PoW, the probability of an entity obtaining the right to publish a block depends on its stake, i.e., the number of tokens the entity owns [38]. A comparison of the two consensus mechanisms is shown in Table 1.
Table 1. Consensus mechanism comparison.
In this paper, the PoS-based consensus mechanism is used to implement the workload consensus checking of computing nodes. Its execution process is as follows:
(1)
Packing node selection: The system periodically selects the block creation node ($e_g \in E$) among all staked nodes [39], based on the number of tokens held by each candidate, and this node constructs the new block;
(2)
New block creation: The block creation node packages all blockchain network transactions in the system during time slot $t$ into a new block. We assume block $b_i$ consists of two parts: the offloading transaction data $d_b^t$ of tasks $tk_i^t$ and the fixed block data $d_0$. The size of the transaction data is converted from the original task size at a conversion rate $s$. The block size $d_{b_i}^t$ can then be expressed as

$$d_{b_i}^t = d_b^t + d_0 = s \sum_{e_j \in E} d_{ij}^{t,o} + d_0$$
(3)
Block validation: The consortium chain computes each edge node's selection probability from the number of tokens it owns, using a Poisson distribution with parameter $\lambda$. The $v$ nodes with the highest probability constitute the validation node set $E_V$, where the probability of edge node $e_j$ being selected as a validation node is

$$p_j^v = P(K = k_j) = \frac{\lambda^{k_j}}{k_j!} e^{-\lambda}$$
(4)
Block addition: Once a new block is recognized by all the validation nodes, it is added to the blockchain;
(5)
Incentive distribution: Based on the incentive mechanism, a certain reward is provided to the network nodes that participate in the task to compute and verify the new block.
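Steps (1)–(3) of the consensus process can be sketched in a few lines of Python. The token counts, conversion rate, and fixed block data size below are hypothetical values for illustration only, not the paper's experimental settings:

```python
import math

def poisson_pmf(k: int, lam: float) -> float:
    """Selection probability p_j^v = P(K = k_j) = lam^k / k! * e^(-lam)."""
    return lam ** k / math.factorial(k) * math.exp(-lam)

def select_validators(tokens: dict, lam: float, v: int) -> list:
    """Rank edge nodes by the Poisson probability of their token count k_j
    and keep the top-v nodes as the validation set E_V (step 3)."""
    probs = {node: poisson_pmf(k, lam) for node, k in tokens.items()}
    return sorted(probs, key=probs.get, reverse=True)[:v]

def block_size(offloaded_bits: list, s: float, d0: float) -> float:
    """d_{b_i}^t = s * (sum of offloaded task data) + fixed block data d_0 (step 2)."""
    return s * sum(offloaded_bits) + d0

# Hypothetical token holdings for four edge nodes and two offloaded tasks
validators = select_validators({"e1": 2, "e2": 5, "e3": 3, "e4": 9}, lam=3.0, v=2)
size = block_size([4e5, 6e5], s=0.01, d0=2048)  # 0.01 * 1e6 + 2048 = 12048.0
```

Note that with $\lambda = 3$, nodes holding 2 and 3 tokens have equal selection probability, so the ranking rewards token counts near $\lambda$ rather than the largest holdings.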
The workflow of the blockchain-based MEC task-offloading system described in this section is shown in Figure 2.
Figure 2. BMEC data process.

3.3. Quality of Service Model

In this section, the blockchain-based MEC task-offloading quality of service model proposed in this paper is described in detail, following the design methodology of quality of service models in existing MEC task-offloading research [15,29,33]. To simulate a blockchain–mobile edge network, within each time slot we investigate the delay and energy consumption generated by user task computation, task-offloading communication, block verification, etc., and construct a communication model and a computation model oriented toward blockchain-based network quality of service.

3.3.1. Communication Model

In this paper, it is assumed that the size of the task calculation result is much smaller than the task itself and that the communication overhead required to transmit the result is negligible. Therefore, this paper mainly considers the two data communication scenarios of user device task offloading and block verification and calculates the energy consumption and transmission delay in the communication process.
In this system model, the devices and the MEC servers are linked through a wireless network, and the transmission rate between them is affected by the transmission environment, communication resources, and transmission distance. In this paper, we refer to [40,41] and calculate the channel gain $h_{ij}^t$ from any device $u_i$ to edge node $e_j$ in time slot $t$ using the following formula:

$$h_{ij}^t = h_0 \left( dist_{ij}^t \right)^{-\varphi/2}$$

where $h_0$ denotes the initial gain of the channel, $\varphi$ is the path loss exponent, and $dist_{ij}^t = \sqrt{\left( x_{l_i}^t - x_{e_j} \right)^2 + \left( y_{l_i}^t - y_{e_j} \right)^2}$ denotes the distance from device $u_i$ to edge node $e_j$ at time slot $t$.
The signal-to-interference-plus-noise ratio $SINR_{i,j}^t$ from device $u_i$ to edge node $e_j$ is

$$SINR_{i,j}^t = \frac{pw_{ij}^t \left| h_{ij}^t \right|^2}{\sum_{e_{j'} \in E \setminus \{e_j\}} pw_{ij'}^t \left| h_{ij'}^t \right|^2 + N_0}$$

where $pw_{ij}^t$, $N_0$, and $B$ denote the transmission power from device $u_i$ to edge node $e_j$ in time slot $t$, the Gaussian noise power in the channel, and the channel communication bandwidth, respectively, with $pw_{l_i} = \sum_{e_j \in E} pw_{ij}^t$. The data transmission rate from device $u_i$ to edge node $e_j$ in time slot $t$ is then

$$R_{ij}^t = B \cdot \log_2 \left( 1 + SINR_{i,j}^t \right)$$
Therefore, in the task-offloading communication scenario, the communication delay $Td_{ij,comm}^{t,o}$ and energy consumption $En_{ij,comm}^{t,o}$ of user device $u_i$ transmitting the offloaded task to edge node $e_j$ during time slot $t$ are expressed as follows:

$$Td_{ij,comm}^{t,o} = \frac{d_{ij}^{t,o}}{R_{ij}^t}$$

$$En_{ij,comm}^{t,o} = pw_{ij}^t \, Td_{ij,comm}^{t,o} = \frac{pw_{ij}^t \, d_{ij}^{t,o}}{R_{ij}^t}$$
During the block consensus process, the block generation node transmits the block to the validation nodes for verification. Assuming a fixed network transmission speed $R$ between the block generation node $e_g$ and the $v$ validation nodes, the consensus verification communication delay $Td_{iv,comm}^{t,v}$ and energy consumption $En_{iv,comm}^{t,v}$ between the block generation node $e_g$ and a validation node $e_v$ for block $b_i$ are

$$Td_{iv,comm}^{t,v} = \frac{d_{b_i}^t}{R}$$

$$En_{iv,comm}^{t,v} = pw_{e_g} \, Td_{iv,comm}^{t,v} = \frac{pw_{e_g} \, d_{b_i}^t}{R}$$
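The communication model above can be sketched directly from the formulas. The numerical values below (initial gain, distance, powers, bandwidth, noise power) are illustrative assumptions, not the paper's experimental settings:

```python
import math

def channel_gain(h0: float, dist: float, phi: float) -> float:
    """h_ij^t = h0 * dist^(-phi/2), so |h|^2 decays as dist^(-phi)."""
    return h0 * dist ** (-phi / 2)

def sinr(pw: float, h: float, interference: float, n0: float) -> float:
    """SINR = pw * |h|^2 / (interference from other links + N0)."""
    return pw * abs(h) ** 2 / (interference + n0)

def comm_cost(d_offload: float, pw: float, bandwidth: float, s: float):
    """Rate R = B * log2(1 + SINR), delay Td = d/R, energy En = pw * Td."""
    rate = bandwidth * math.log2(1 + s)
    td = d_offload / rate
    return rate, td, pw * td

# One device-to-node link with no interfering transmissions (assumed values)
h = channel_gain(h0=1.0, dist=100.0, phi=4.0)      # 1e-4
s = sinr(pw=0.5, h=h, interference=0.0, n0=1e-9)   # 5.0
rate, td, en = comm_cost(d_offload=1e6, pw=0.5, bandwidth=1e6, s=s)
```

With these numbers, a 1 Mbit offloaded task over a 1 MHz channel takes roughly 0.39 s to transmit, which shows how quickly the communication delay can approach a task's tolerable-delay budget.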

3.3.2. Computing Model

In this paper, we mainly consider three kinds of computing scenarios, namely local task computing, task-offloading computing, and block verification computing. The computing model must determine the processing delay and energy consumption according to the computing process. It is assumed that the blockchain selects block generation nodes according to the number of tokens owned by the nodes, and the calculation volume of generation node selection is ignored in this model.
In the task computation scenario executed locally by the user device, the energy consumption coefficient of the user device is assumed to be $\varepsilon_l = 10^{-11}$ [42]. The delay $Td_{i,comp}^{t,l}$ and energy consumption $En_{i,comp}^{t,l}$ of device $u_i$ for local task processing are

$$Td_{i,comp}^{t,l} = \frac{D_i^{t,l}}{f_{l_i}}$$

$$En_{i,comp}^{t,l} = \varepsilon_l \, D_i^{t,l} \, f_{l_i}^2$$
In the offloaded-task scenario executed by edge nodes, this paper assumes that each edge node provides a separate CPU core for each offloaded task, i.e., tasks offloaded to the same edge node have the same computing speed, and the number of offloaded tasks an edge node can host simultaneously is bounded by its number of CPU cores. The energy consumption coefficient of the edge node is defined as $\varepsilon_o = 10^{-27}$ [34]. The delay $Td_{ij,comp}^{t,o}$ and energy consumption $En_{ij,comp}^{t,o}$ of edge node $e_j$ in computing the offloaded task $tk_{ij}^{t,o}$ are

$$Td_{ij,comp}^{t,o} = \frac{D_{ij}^{t,o}}{f_{e_j}}$$

$$En_{ij,comp}^{t,o} = \varepsilon_o \, D_{ij}^{t,o} \, f_{e_j}^2$$
In the block consensus verification scenario, the delay and energy consumption of block generation are not counted, because the overhead of block creation is small compared to that of block verification, which involves a large number of validation links, so its impact on overall system performance is low. When an edge node performs block validation, assuming the validation computation period of block $b_i$ is $D_{b_i}^t$, the validation delay $Td_{iv,comp}^{t,v}$ and energy consumption $En_{iv,comp}^{t,v}$ of blockchain validation node $e_v$ are

$$Td_{iv,comp}^{t,v} = \frac{D_{b_i}^t}{f_{e_v}}$$

$$En_{iv,comp}^{t,v} = \varepsilon_o \, D_{b_i}^t \, f_{e_v}^2$$
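All three computing scenarios share the same delay and energy form, differing only in the energy coefficient and processor speed. A minimal sketch, with task size and clock speeds as assumed illustrative values:

```python
EPS_LOCAL = 1e-11  # epsilon_l, device energy coefficient [42]
EPS_EDGE = 1e-27   # epsilon_o, edge-node energy coefficient [34]

def comp_cost(cycles: float, f: float, eps: float):
    """Td = D / f and En = eps * D * f^2, the form shared by local
    computation, offloaded computation, and block validation."""
    return cycles / f, eps * cycles * f ** 2

# A 1 Mbit task at 500 cycles/bit; device at 1 GHz, edge core at 5 GHz (assumed)
D = 1e6 * 500
td_local, en_local = comp_cost(D, 1e9, EPS_LOCAL)  # 0.5 s locally
td_edge, en_edge = comp_cost(D, 5e9, EPS_EDGE)     # 0.1 s on the edge node
```

The faster edge core cuts the computation delay fivefold, which is the basic incentive for offloading in the first place.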

3.3.3. Comprehensive Model

In this paper, we comprehensively calculate the delay and energy cost of the blockchain-based MEC task-offloading model by combining the designed communication and computation models.
(1)
Latency Cost
The time delay in the quality of service model designed in this paper comprises two links, namely task processing and block verification. When calculating the delay of the task-processing link, it is assumed that all users start local task execution and task offloading from the same moment, i.e., local computation and offload transmission proceed in parallel, so the actual task-processing delay is the maximum of the local computation delay and the offload-processing delay. The task-offloading delay consists of the communication delay $Td_{ij,comm}^{t,o}$ of the user offloading the task to an edge node and the computation delay $Td_{ij,comp}^{t,o}$ of the task on that node. If the user offloads the task to more than one edge node, only the longest processing delay counts; the task-offloading delay $Td_i^{t,o}$ of user $u_i$ is thus denoted as

$$Td_i^{t,o} = \max \left( Td_{i1,comm}^{t,o} + Td_{i1,comp}^{t,o}, \dots, Td_{im,comm}^{t,o} + Td_{im,comp}^{t,o} \right)$$

Furthermore, the task-processing delay of user $u_i$, denoted $Td_i^{t,p}$, is

$$Td_i^{t,p} = \max \left( Td_{i,comp}^{t,l}, \; Td_i^{t,o} \right)$$

Similarly, when calculating the delay of the block verification link, since the packing node sends the block to all verification nodes simultaneously, the block verification delay $Td_i^{t,v}$ is the maximum delay over the verification nodes:

$$Td_i^{t,v} = \max \left( Td_{i1,comm}^{t,v} + Td_{i1,comp}^{t,v}, \dots, Td_{iv,comm}^{t,v} + Td_{iv,comp}^{t,v} \right)$$

In summary, the delay $Td_i^t$ of the quality of service model for user device $u_i$ in time slot $t$ is

$$Td_i^t = Td_i^{t,p} + Td_i^{t,v}$$
(2)
Energy Cost
In the energy consumption calculation process, the energy consumption of the communication model and the computation model is obtained by summing the processing energy consumption of each task.
Then, the communication and computation energy consumption of user device $u_i$ in time slot $t$ are

$$En_i^{t,comm} = \sum_{e_j \in E} En_{ij,comm}^{t,o} + \sum_{e_v \in E_V} En_{iv,comm}^{t,v}$$

$$En_i^{t,comp} = En_{i,comp}^{t,l} + \sum_{e_j \in E} En_{ij,comp}^{t,o} + \sum_{e_v \in E_V} En_{iv,comp}^{t,v}$$

In summary, the energy consumption $En_i^t$ of the quality of service model for user device $u_i$ in time slot $t$ is

$$En_i^t = En_i^{t,comm} + En_i^{t,comp}$$
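The asymmetry between the two cost aggregations (delays take the maximum over parallel paths, energies sum over all links) can be captured in a short sketch; the numeric values are illustrative assumptions:

```python
def task_delay(td_local: float, offload_pairs: list) -> float:
    """Task-processing delay: max of the local delay and the slowest
    (comm + comp) offload path, since all paths run in parallel."""
    td_off = max((comm + comp for comm, comp in offload_pairs), default=0.0)
    return max(td_local, td_off)

def total_delay(td_task: float, verify_pairs: list) -> float:
    """Adds the block-verification delay: the slowest validator dominates."""
    td_v = max((comm + comp for comm, comp in verify_pairs), default=0.0)
    return td_task + td_v

def total_energy(en_comm: list, en_comp: list) -> float:
    """Energies, unlike delays, simply sum across all links and tasks."""
    return sum(en_comm) + sum(en_comp)

# Illustrative delays in seconds: (comm, comp) per edge node / validator
td = task_delay(0.50, [(0.10, 0.30), (0.20, 0.35)])    # max(0.50, 0.55) = 0.55
total = total_delay(td, [(0.02, 0.05), (0.01, 0.08)])  # 0.55 + 0.09 = 0.64
```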

3.4. Incentive Reward Model

Previous research [22,43,44,45] has integrated the incentive mechanism of blockchain into the study of task pricing and resource allocation in MEC, balancing the allocation of edge service resources and value gains through game theory and auction theory. In this paper, the design of the incentive mechanism is simplified: only the edge nodes participating in task-offloading computation and block verification are provided with incentive tokens, in proportion to their energy consumption. Hence, the incentive model encourages edge nodes to obtain more incentive tokens and thereby gain more benefits. In the incentive model, the blockchain converts each unit of energy consumed by an edge node into tokens at a rate $\beta$; the tokens generated for user $u_i$'s tasks are then calculated as

$$I_i^t = \begin{cases} \beta \sum_{e_j \in E} \left( En_{ij,comp}^{t,o} + En_{ij,comp}^{t,v} \right), & e_j \in E_V \\[4pt] \beta \sum_{e_j \in E} En_{ij,comp}^{t,o}, & e_j \notin E_V \end{cases}$$
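The two-branch incentive rule reduces to a small function. The conversion rate and energy values below are hypothetical, for illustration only:

```python
def incentive_tokens(beta: float, en_offload: list, en_verify: list,
                     in_validator_set: bool) -> float:
    """I_i^t: offload-compute energy is always rewarded at rate beta;
    verification energy is rewarded only for nodes in E_V."""
    reward = beta * sum(en_offload)
    if in_validator_set:
        reward += beta * sum(en_verify)
    return reward

# beta = 0.1 tokens per joule, energies in joules (assumed values)
r_validator = incentive_tokens(0.1, [12.5, 8.0], [2.5], True)       # 2.3
r_non_validator = incentive_tokens(0.1, [12.5, 8.0], [2.5], False)  # 2.05
```

Validators earn strictly more for the same offload work, which (combined with the PoS selection in Section 3.2) couples token holdings to future earning opportunities.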

3.5. Privacy Model

In this section, we mainly consider that in the process of MEC task offloading, if we consider the energy consumption and delay factors of task communication and computation, user terminals often tend to offload a large number of tasks to edge nodes that are closer to them and have higher levels of resources. However, such a task-offloading method potentially risks data privacy leakage because MEC tasks usually contain sensitive private data such as the physical location of the device, identity characteristics, task data, etc. Suppose that many tasks containing private information are offloaded to an edge node. In that case, the edge node, out of its curiosity or due to being hijacked by an adversary, may collect and infer the user’s location and business characteristics based on the user’s offloading preferences. More seriously, the edge node may predict the user’s private information based on these data characteristics, resulting in user privacy leakage [46]. Therefore, it is necessary to design a privacy metric model to evaluate the degree of privacy leakage that may be caused by the user in the process of task offloading.
Information entropy is a concept that measures uncertainty or the amount of information, and it can be utilized in privacy computing models to assess and reduce privacy risks. The information entropy-based privacy measure is well suited to measuring the privacy leakage of user data and has been applied in research on MEC task offloading [41,47]. Therefore, this paper uses an information entropy-based privacy metric to measure the privacy protection effect of MEC task offloading.
We define user $u_i$'s task-offloading preference $P_i^t$ as the ratio of user $u_i$'s offloaded task data volume to its total task data volume, which measures the probability that user $u_i$'s data are exposed to edge nodes:

$$P_i^t = \frac{d_i^{t,o}}{d_i^t} = \frac{\sum_{e_j \in E} d_{ij}^{t,o}}{d_i^t}$$

Based on the user's task-offloading preference, we further adopt the concept of privacy entropy to describe the amount of private information carried by the offloading strategy of user $u_i$, denoted $H_i^t$. When there is no task offloading from the user's terminal, i.e., $P_i^t = 0$, the edge nodes cannot infer the user's task information, and the privacy entropy takes its maximum value $H_{max}$; in this paper, we set the maximum entropy to 10. The privacy entropy of user $u_i$ is calculated as

$$H_i^t = \begin{cases} -P_i^t \log_2 P_i^t, & 0 < P_i^t < 1 \\ H_{max}, & P_i^t = 0 \end{cases}$$
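A minimal sketch of the privacy metric follows; the minus sign keeps the entropy non-negative for $0 < P_i^t < 1$ (the extraction of the source dropped it, so this reconstruction is our reading of the standard entropy form):

```python
import math

H_MAX = 10.0  # maximum privacy entropy used in this paper

def offload_preference(d_offloaded: float, d_total: float) -> float:
    """P_i^t: share of the task data exposed to edge nodes."""
    return d_offloaded / d_total

def privacy_entropy(p: float) -> float:
    """H_i^t = -P log2 P for 0 < P < 1; H_MAX when nothing is offloaded."""
    if p == 0.0:
        return H_MAX
    return -p * math.log2(p)

h_half = privacy_entropy(offload_preference(5e5, 1e6))  # P = 0.5 -> H = 0.5
h_full = privacy_entropy(1.0)                           # all offloaded -> 0.0
```

Offloading everything drives the entropy to zero (the offloading pattern is fully predictable), which is exactly the behavior the optimization objective penalizes.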

4. Problem Description

This paper's optimization objectives for task offloading in mobile blockchain edge networks focus on privacy preservation, quality of service, and incentive reward. Privacy protection requires maximizing the privacy entropy of the privacy-preserving model, so that users do not offload too much private data to the edge servers and leak their privacy. A high quality of service requires minimizing the latency and energy consumption of offloading user tasks. Incentive rewards require maximizing the workload of nodes in the blockchain edge network, improving node utilization and efficiency. By comprehensively considering offloading privacy, quality of service, and incentive reward factors, the optimization problem can be formulated as maximizing the comprehensive objective for user device $u_i$ and the edge servers within time slot $t$ subject to multiple constraints. The specific optimization objective function and constraints are expressed as follows:
$$\mathbf{P}: \max \; C_i^t = \omega_1 H_i^t + \omega_2 I_i^t - \omega_3 Td_i^t - \omega_4 En_i^t$$

$$\text{s.t.} \quad Td_i^t \le Td_{i,max}^t \quad (27)$$

$$0 \le pw_{ij}^t \le pw_{l_i} \quad (28)$$

$$0 \le P_i^t \le 1 \quad (29)$$

$$H_i^t \le H_{max} \quad (30)$$

where $\omega_1$, $\omega_2$, $\omega_3$, and $\omega_4$ are the weights of the indicators, used to specify the relative importance of the different objectives. Constraint (27) means the total task delay is bounded by the maximum tolerable delay of the task. Constraint (28) means the device-to-node transmission power is bounded by the device's total transmission power. Constraint (29) means the amount of offloaded task data of any user device does not exceed the total task data. Constraint (30) means the user's privacy entropy is bounded by the maximum entropy value.
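Evaluating the comprehensive objective and checking feasibility is straightforward; the weights and metric values below are illustrative assumptions, not the paper's experimental settings:

```python
def comprehensive_objective(h: float, i_t: float, td: float, en: float,
                            w=(1.0, 1.0, 1.0, 1.0)) -> float:
    """C_i^t = w1*H_i^t + w2*I_i^t - w3*Td_i^t - w4*En_i^t."""
    w1, w2, w3, w4 = w
    return w1 * h + w2 * i_t - w3 * td - w4 * en

def feasible(td, td_max, pw, pw_total, p_ratio, h, h_max=10.0) -> bool:
    """Constraints (27)-(30): delay, power, offload-ratio, and entropy bounds."""
    return (td <= td_max and 0 <= pw <= pw_total
            and 0 <= p_ratio <= 1 and h <= h_max)

c = comprehensive_objective(h=0.5, i_t=2.3, td=0.64, en=1.2,
                            w=(1.0, 0.5, 0.8, 0.2))
ok = feasible(td=0.64, td_max=1.0, pw=0.5, pw_total=1.0, p_ratio=0.5, h=0.5)
```

The weight vector trades privacy and token income against latency and energy; the reinforcement learning reward in Section 5 reuses exactly this objective, with a penalty when `feasible` fails.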
It is not difficult to see that the optimization problem presented in this paper is a mixed-integer linear programming problem; such problems are usually NP-hard and therefore difficult to solve to global optimality. Moreover, the decision making for such problems occurs in a dynamically changing environment over a long optimization horizon, which makes it difficult for traditional convex optimization algorithms to adapt to unknown environments and perform adaptive optimization.

5. Algorithm

To address the environmental complexity and competing objectives of the above optimization problem, this section first proposes an actor–critic deep reinforcement learning algorithm based on multiple agents sharing a global memory pool to improve robustness and stability. Secondly, the optimization problem is reformulated as a Markov decision process (MDP) by constructing each agent's state space, action space, immediate rewards, and state transitions, and the algorithmic framework is described in detail.

5.1. Construction of the Markov Decision Process

In the blockchain mobile edge network task-offloading environment designed in this paper, each user device acts as a reinforcement learning agent, adopting a decentralized execution and centralized training model, which enables the agent to make independent decisions based on its observed and learned strategies. Multiple edge servers form a federated blockchain, sharing network parameters to jointly hold global information about the entire system. At the beginning of each time slot, user devices can initiate task processing requests, sending task and localization information to edge servers. After the edge server obtains the global network state information through blockchain sharing, it conducts centralized training. After training, each agent makes distributed local decisions based on its observations.
In order to solve the above optimization problem, it needs to be converted to the standard form of the Markov decision process (MDP) when using reinforcement learning algorithms. The key components of this transformation include defining the state space, action space, reward space, and state space transitions for each agent.
(1)
State Space
The state space $s_i^t$ of agent $i$ in time slot $t$ consists of the location $loc_{l_i}^t = (x_{l_i}^t, y_{l_i}^t)$ of its corresponding user device $u_i$ and the amount of requested task data $d_i^t$, i.e., $s_i^t = (loc_{l_i}^t, d_i^t)$. The overall state space of the reinforcement learning algorithm is therefore denoted as $s^t = (s_1^t, \dots, s_n^t)$.
(2)
Action space
The action space $a_i^t$ of agent $i$ in time slot $t$ represents the allocation of request-data processing and channel power of user device $u_i$ in the current network state, i.e., $a_i^t = (d_i^{t,l}, d_{i1}^{t,o}, \dots, d_{im}^{t,o}, pw_{i1}^t, \dots, pw_{im}^t)$.
(3)
Reward function
The reward function of the blockchain mobile edge network task-offloading model aims to maximize the optimization objective $C_i^t$ of each agent: it maximizes the privacy entropy of the user device to safeguard user data privacy and the blockchain rewards earned by completing offloaded tasks, while minimizing the task-processing latency and energy consumption of the user device to provide a higher quality of service. The reward function at time slot $t$ is expressed as follows:

$$r_i^t = \begin{cases} \omega_1 H_i^t + \omega_2 I_i^t - \omega_3 Td_i^t - \omega_4 En_i^t, & \text{Equations (27)--(30) satisfied} \\ r_0, & \text{otherwise} \end{cases}$$

where $r_0$ is a constant much smaller than 0 that represents the base reward given by the environment when the current policy does not satisfy the constraints of Equations (27)–(30).

5.2. Algorithmic Framework

The framework of the algorithm proposed in this paper is shown in Figure 3. The algorithm sets a corresponding agent for each user device, including an actor network, a critic network, and a random sampler. The actor network and critic network adopt a dual neural network structure. The current network is responsible for constructing the actor’s policy network ( π i ) and the critic’s value network ( Q i ). The Q value of the critic network represents the expected reward for taking a particular action in a given state. The target network is softly updated using the current network parameters ( θ i π and θ i Q ), thus guaranteeing the stability of network learning.
Figure 3. Algorithm structure.
We denote the value estimate of the critic target network in time slot $t$ by $Q_i'(s_i^t, a_i^t \mid \theta_i^{Q'})$; then, the target Q value can be calculated as
q_i = r_i^t + \gamma Q_i'(s^{t+1}, a_i^{t+1} \mid \theta_i^{Q'}),
where γ denotes the discount factor.
To update the critic’s current network parameter ( θ i Q ), the loss values of the parameters are computed using a mean-square error function. The mean-square error function can help the critic network accurately predict the value of a state or state–action pair.
Loss(\theta_i^Q) = \mathbb{E}\big[(Q_i(s^t, a_i^t \mid \theta_i^Q) - q_i)^2\big] = \frac{1}{n} \sum_{i=1}^{n} \big(Q_i(s^t, a_i^t \mid \theta_i^Q) - q_i\big)^2
We minimize $Loss(\theta_i^Q)$ by gradient descent, so the update of the parameter $\theta_i^Q$ is
\theta_i^Q \leftarrow \theta_i^Q - \alpha \nabla_{\theta_i^Q} Loss(\theta_i^Q),
where $\alpha$ is the learning rate of the critic's current network parameter $\theta_i^Q$.
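The two critic steps above, bootstrapping the target and scoring the prediction with a mean-square error, can be sketched numerically. The function names and default discount factor are illustrative, not the paper's implementation:

```python
def target_q(r, q_next, gamma=0.95):
    """Bootstrapped target: q_i = r_i^t + gamma * Q'(s^{t+1}, a^{t+1})."""
    return r + gamma * q_next

def critic_loss(q_pred, q_target):
    """Mean-square error between the critic's Q values and the targets."""
    n = len(q_pred)
    return sum((p - t) ** 2 for p, t in zip(q_pred, q_target)) / n
```

In practice, both steps run on mini-batches sampled from the memory pool, and the loss is minimized with a stochastic gradient method.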
The actor network constructs the action policy ( π i ) based on the state space ( s i t ) of the reinforcement learning agent in time slot t and the reward function ( r i t ) and generates the action ( a i t ) in the time slot, which can be represented as
a_i^t = \pi_i(s_i^t \mid \theta_i^\pi)
However, directly using the output of the policy network does not allow the agent to discover more strategies, so an exploration mechanism is constructed by adding noise:
a_i^t = \pi_i(s_i^t \mid \theta_i^\pi) + \tau N_t,
where $\tau$ denotes the attenuation factor of the noise, which gradually decreases with the number of algorithm iterations to guarantee the stability of network training, and $N_t$ is zero-mean Gaussian noise.
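The noisy action selection with a decaying factor $\tau$ can be sketched as follows; the noise scale, decay rate, and clipping range are illustrative assumptions rather than the paper's settings:

```python
import random

def noisy_action(policy_action, tau, sigma=0.2, low=0.0, high=1.0):
    """Add attenuatable Gaussian noise tau * N_t to the deterministic
    policy output, then clip into the valid action range."""
    noisy = [x + tau * random.gauss(0.0, sigma) for x in policy_action]
    return [min(high, max(low, x)) for x in noisy]

def decay_tau(tau, rate=0.995, tau_min=0.01):
    """Shrink the noise factor each iteration to stabilize late training."""
    return max(tau_min, tau * rate)
```

Early in training, a large $\tau$ encourages broad exploration of the action space; as $\tau$ decays toward its floor, the policy output dominates and training stabilizes.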
The policy objective function of the actor network is
J(\pi_i) = \mathbb{E}\big[Q_i(s^t, a_i^t \mid \theta_i^Q)\big]
Then, the gradient of the policy objective function is expressed as
\nabla_{\theta_i^\pi} J(\pi_i) = \mathbb{E}\big[\nabla_{a_i} Q_i(s^t, a_i^t \mid \theta_i^Q)\, \nabla_{\theta_i^\pi} \pi_i(s^t \mid \theta_i^\pi)\big]
Then, the update of the parameter $\theta_i^\pi$ is expressed as
\theta_i^\pi \leftarrow \theta_i^\pi + \beta \nabla_{\theta_i^\pi} J(\pi_i),
where $\beta$ is the learning rate of the actor network's parameter $\theta_i^\pi$.
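Both parameter updates are plain first-order steps: gradient descent on the critic loss and gradient ascent on $J(\pi_i)$. A one-line sketch with parameter vectors as Python lists (illustrative only):

```python
def ascend(theta, grad, lr):
    """theta <- theta + lr * grad: gradient ascent for the actor's J(pi).
    Passing the negated gradient gives the critic's descent step."""
    return [t + lr * g for t, g in zip(theta, grad)]
```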
In addition, the soft updates of the actor and critic target network parameters ($\theta_i^{\pi'}$ and $\theta_i^{Q'}$) can be represented as
\theta_i^{\pi'} \leftarrow \sigma \theta_i^\pi + (1 - \sigma)\, \theta_i^{\pi'}
\theta_i^{Q'} \leftarrow \sigma \theta_i^Q + (1 - \sigma)\, \theta_i^{Q'}
where $\sigma \in (0, 1)$ is the soft update weight.
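The soft update is an exponential moving average of the current parameters into the target parameters. A sketch with list-valued parameters (names illustrative):

```python
def soft_update(target, current, sigma=0.01):
    """theta' <- sigma * theta + (1 - sigma) * theta', element-wise."""
    return [sigma * c + (1.0 - sigma) * t for t, c in zip(target, current)]
```

Because $\sigma$ is small, the target networks drift slowly toward the current networks, which keeps the bootstrapped targets stable during learning.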
To reduce the environmental non-stationarity caused by the policy learning of other agents, this paper adopts a global memory pool that stores the experience samples $(s_i^t, s_i^{t+1}, a_i^t, r_i^t)$ of every agent and uses them to train the agents' neural networks. In practical deployments, the global memory pool can be implemented on the blockchain to share information among agents.
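A global memory pool shared by all agents can be sketched as a bounded buffer. The class name and capacity are illustrative, and the on-chain sharing mentioned above is abstracted away here:

```python
import random
from collections import deque

class GlobalMemoryPool:
    """Shared experience buffer: every agent pushes its transition
    (s_t, s_{t+1}, a_t, r_t) and samples from the common pool, so each
    agent can learn from the experience of the others."""

    def __init__(self, capacity=10000):
        self.buffer = deque(maxlen=capacity)  # oldest samples evicted first

    def push(self, s, s_next, a, r):
        self.buffer.append((s, s_next, a, r))

    def sample(self, batch_size):
        # Uniform random mini-batch; never larger than the stored pool.
        return random.sample(list(self.buffer), min(batch_size, len(self.buffer)))
```

Because all agents draw from the same buffer, a transition observed by one agent can shape the critic updates of every other agent, which is the collaboration mechanism the paper attributes to the global memory pool.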
In order to better understand the idea and process of this paper, the pseudo-code of the algorithm is shown in Algorithm 1.  
Algorithm 1: Actor–Critic Algorithm for Blockchain–MEC Task Offloading

5.3. Complexity Analysis

We measure the computational complexity of the proposed algorithm as the sum of the training overhead of all agents. Let $n$ be the number of agents, $L_a$ and $L_c$ the numbers of neural network layers of the actor and critic networks, $S$ the number of samples each agent draws from the global memory pool, $I$ the number of algorithm iterations, $d_s$ the state-space dimension, and $d_a$ the action-space dimension. The computational complexity of the algorithm is then $O(n S I (L_a + L_c)(d_s + d_a)^2)$.
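The complexity expression can be read as a rough operation count; a toy calculator (purely illustrative) makes the quadratic dependence on the combined state–action dimension explicit:

```python
def training_ops(n, S, I, L_a, L_c, d_s, d_a):
    """Operation count proportional to O(n*S*I*(L_a + L_c)*(d_s + d_a)^2)."""
    return n * S * I * (L_a + L_c) * (d_s + d_a) ** 2
```

Doubling $d_s + d_a$ quadruples the estimate, while each of the other factors scales it only linearly.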

6. Experiment and Discussion

In this section, our proposed algorithm is evaluated and analyzed through simulation experiments.

6.1. Experimental Environment

The hardware and software specifications of the experimental environment described in this paper are shown in Table 2.
Table 2. Hardware and software specifications.

6.2. Parameter Design

To simulate the network model, this paper emulates a real-world mobile-user task-offloading environment in a 1000 × 1000 area (Figure 4) containing four blockchain–MEC servers at fixed locations and user mobile devices moving along the path marked by black arrows. The servers receive task-offloading requests from the user devices and determine each device's offloading policy through collaborative planning among multiple servers. Each user device moves along the fixed, irregular black-arrow path with a constant step size per time slot and generates a random amount of task data, which is offloaded to one or more servers for processing according to the task-offloading policy.
Figure 4. Network environment simulation.
The parameters of the reinforcement learning algorithm and blockchain edge network environment are shown in the following Table 3.
Table 3. Experimental parameter settings.

6.3. Experimental Analysis

6.3.1. Contrasted Algorithms

In this paper, the following algorithms are selected to be analyzed and compared:
  • JODRL-PP [33]: JODRL-PP (Joint Optimal Deep Reinforcement Learning with Privacy Preservation) is a multi-agent deep reinforcement learning algorithm for task offloading in multi-access-point environments, built on stochastic game theory. The algorithm uses a trusted third party for centralized training and executes in a distributed manner to improve solution quality, while accounting for the dynamics of multi-user environments and handling the complexity of multiple users and access points through stochastic game theory.
  • IQL [48]: IQL (Independent Q-Learning) is a reinforcement learning algorithm applied in multi-agent systems. In a multi-agent system, each agent learns its own Q-value function independently without considering the actions and strategies of other agents and uses only its own state and action information in the learning process. In the IQL-based task-offloading algorithm, if an agent does not cache the corresponding requested service, the agent migrates the task to be executed to another agent that has cached the service based on the service cache information shared among the agents at the beginning of each time slot.
  • QMIX [49]: QMIX (Q-value Mixing Network) is a value-based multi-agent reinforcement learning algorithm that trains decentralized policies in a centralized, end-to-end manner. Its mixing network estimates the joint action value as a complex nonlinear combination of per-agent values that condition only on local observations. QMIX constrains the joint action value to be monotonic in each agent's value, which makes maximizing the joint action value tractable in off-policy learning and ensures consistency between the centralized and decentralized policies.
  • VDN [50]: The VDN (Value-Decomposition Network) is a value-decomposition method for multi-agent systems that decomposes the global value function into local value functions, so that each agent learns only the local value function associated with it. This architecture learns to decompose the team value function into per-agent value functions, addressing cooperative multi-agent reinforcement learning from a single joint reward signal. The VDN algorithm does not consider the type of service request or the state of the wireless network among agents; it directly decomposes the joint action value function into the sum of all agents' local action value functions.

6.3.2. Results

(1)
Experiment 1: Performance Comparison
We placed ten randomly moving users in the experimental simulation environment and recorded the reward function value over 1000 iterations of the reinforcement learning algorithm; the result is shown in Figure 5. The figure shows that the proposed algorithm's stabilized reward function value is significantly higher than that of the other algorithms and fluctuates less after stabilization.
Figure 5. Reward function value iteration.
To minimize the impact of single-run error on the results, we conducted five repeated experiments and recorded the average reward function values of the different algorithms over all training cycles; the comparison is shown in Figure 6. The proposed algorithm improves performance by more than 40% relative to QMIX, IQL, and VDN and also outperforms JODRL-PP, indicating that it obtains a better solution to the problem formulated in this paper.
Figure 6. Average reward function value comparison.
In our experiments, we also recorded the average costs of task processing energy consumption, task processing latency, user privacy metrics, and blockchain incentive rewards in the reward function; the comparison graphs are shown in Figure 7. The comparison shows that the proposed algorithm significantly outperforms QMIX, IQL, and VDN in all cost terms except blockchain incentive rewards. Compared with JODRL-PP, the proposed algorithm reduces the energy cost by 44.38% and improves the blockchain incentive rewards by 13.27%, although it is inferior in terms of task processing latency and user privacy metrics.
Figure 7. (a) Average processing delay comparison; (b) average energy consumption comparison; (c) average incentive reward comparison; (d) average privacy metric comparison.
(2)
Experiment 2: Performance Comparison under Different User Scales
To test how the algorithms behave on the proposed optimization problem under different user scales, we set the number of users to 10, 15, 20, 25, and 30 and recorded the average reward function value of each algorithm in five groups of repeated experiments. The results are shown in Figure 8. As the number of users increases, the average reward function value of all models decreases: more users means more user tasks, so the delay and energy required for task processing rise, the user privacy metric is bounded by its upper limit, and the blockchain incentive rewards are limited by the number of nodes able to receive tasks. A decrease in the reward function value is therefore expected. The proposed algorithm still achieves the best average reward function value at every agent scale.
Figure 8. Reward function value iteration for different user scales.
The experimental comparison graphs of the average costs of task processing energy consumption, task processing delay, user privacy metrics, and blockchain incentive rewards are shown in Figure 9. The proposed algorithm retains its advantages in several individual cost metrics as the user scale grows, and the results are similar to those of Experiment 1.
Figure 9. (a) Average processing delay comparison; (b) average energy consumption comparison; (c) average incentive reward comparison; (d) average privacy metric comparison for different agent scales.
(3)
Experiment 3: Ablation Experiment
In this paper, we design ablation experiments to investigate how the Gaussian noise-based action-space search and the agents' global memory pool affect the algorithm's performance. As in Experiment 1, we placed 10 randomly moving users in the simulation environment and recorded the reward function value over 1000 iterations; the result is shown in Figure 10. The curve of the proposed algorithm reaches a stable level faster than the other two configurations and fluctuates less after stabilization. In addition, its final stabilized reward function value is significantly higher, indicating improved overall performance.
Figure 10. Reward function value iteration.
In addition, we compared the average rewards of the different algorithm configurations over five repeated experiments, as shown in Figure 11. The average reward function value of the proposed algorithm maintains a significant advantage throughout the training cycle, improving performance by 38.36% and 43.59% over the configurations without Gaussian action-space selection noise and without the global memory pool, respectively.
Figure 11. Average reward function value comparison.
According to the ablation results, introducing Gaussian noise-based action-space search and the global shared memory pool significantly improved the algorithm's performance. These two mechanisms enhance the algorithm's ability to explore and to exploit historical information, improving learning efficiency and the long-run quality of the policy. This matters in complex, dynamic environments, where the algorithm must adapt quickly and discover new, better strategies.
In summary, the proposed algorithm was validated through extensive comparative experiments, which demonstrated its advantage over the compared algorithms on the global optimization objective. The ablation experiments confirmed the important roles of Gaussian noise-based action-space search and the global shared memory pool. However, the proposed algorithm remains at a disadvantage in task processing delay cost.

7. Conclusions and Future Works

In this paper, we propose a multi-agent reinforcement learning-based task-offloading strategy for blockchain-enabled MEC that uses a global memory pool so that each agent can acquire the experience of the other agents during training, enhancing inter-agent collaboration and overall system performance. The algorithm also introduces an exploration strategy that adds decayable Gaussian random noise to the action space, broadening the agents' exploration to avoid falling into local optima. For the optimization objective, this paper comprehensively considers cost factors such as task execution energy consumption, processing delay, user privacy metrics, and blockchain incentive rewards and innovatively proposes a blockchain-enabled MEC task-offloading model. The experimental results show that, compared with other algorithms, the proposed algorithm improves the global optimization objective by more than 10% and has clear advantages in energy consumption and blockchain incentive rewards. In addition, the ablation experiments show that the Gaussian action-space selection noise and the global memory pool improve performance by 38.36% and 43.59%, respectively.
However, this work has limitations in problem modeling and algorithm design. First, we used an existing consensus mechanism and a simplified incentive mechanism to simulate blockchain execution on MEC, which still deviates considerably from real scenarios. Second, the model design should consider more security aspects of MEC task offloading. Third, the algorithm's execution efficiency still needs improvement. Further research and optimization of problem modeling and algorithm design for blockchain-enabled MEC task offloading are therefore important directions for our future work.

Author Contributions

C.L. developed the idea, performed research and analyses, and wrote the manuscript. Z.S. verified and revised the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This work was funded by the National Natural Science Foundation of China (No. 62272239), the Postgraduate Research & Innovation Plan of Jiangsu Province (No. KYCX20_0761), and the Jiangsu Agriculture Science and Technology Innovation Fund (No. CX(22)1007).

Data Availability Statement

Data are contained within the article.

Acknowledgments

We wish to thank all code providers. We also wish to thank all colleagues, reviewers, and editors who provided valuable suggestions.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Mao, Y.; You, C.; Zhang, J.; Huang, K.; Letaief, K.B. A Survey on Mobile Edge Computing: The Communication Perspective. IEEE Commun. Surv. Tutor. 2017, 19, 2322–2358. [Google Scholar] [CrossRef]
  2. Qiu, H.; Zhu, K.; Luong, N.C.; Yi, C.; Niyato, D.; Kim, D.I. Applications of Auction and Mechanism Design in Edge Computing: A Survey. IEEE Trans. Cogn. Commun. Netw. 2022, 8, 1034–1058. [Google Scholar] [CrossRef]
  3. Nakamoto, S. Bitcoin: A peer-to-peer electronic cash system. Decentralized Bus. Rev. 2008, 21260. Available online: https://bitcoin.org/bitcoin.pdf (accessed on 14 July 2024).
  4. Yu, R.; Oguti, A.M.; Obaidat, M.S.; Li, S.; Wang, P.; Hsiao, K.F. Blockchain-based solutions for mobile crowdsensing: A comprehensive survey. Comput. Sci. Rev. 2023, 50, 100589. [Google Scholar] [CrossRef]
  5. Yang, R.; Yu, F.R.; Si, P.; Yang, Z.; Zhang, Y. Integrated Blockchain and Edge Computing Systems: A Survey, Some Research Issues and Challenges. IEEE Commun. Surv. Tutor. 2019, 21, 1508–1532. [Google Scholar] [CrossRef]
  6. Wang, S.; Ye, D.; Huang, X.; Yu, R.; Wang, Y.; Zhang, Y. Consortium Blockchain for Secure Resource Sharing in Vehicular Edge Computing: A Contract-Based Approach. IEEE Trans. Netw. Sci. Eng. 2021, 8, 1189–1201. [Google Scholar] [CrossRef]
  7. Aujla, G.S.; Singh, A.; Singh, M.; Sharma, S.; Kumar, N.; Choo, K.K.R. BloCkEd: Blockchain-Based Secure Data Processing Framework in Edge Envisioned V2X Environment. IEEE Trans. Veh. Technol. 2020, 69, 5850–5863. [Google Scholar] [CrossRef]
  8. Liu, H.; Zhang, P.; Pu, G.; Yang, T.; Maharjan, S.; Zhang, Y. Blockchain Empowered Cooperative Authentication with Data Traceability in Vehicular Edge Computing. IEEE Trans. Veh. Technol. 2020, 69, 4221–4232. [Google Scholar] [CrossRef]
  9. Lu, Y.; Tang, X.; Liu, L.; Yu, F.R.; Dustdar, S. Speeding at the Edge: An Efficient and Secure Redactable Blockchain for IoT-Based Smart Grid Systems. IEEE Internet Things J. 2023, 10, 12886–12897. [Google Scholar] [CrossRef]
  10. Bao, Z.; Tang, C.; Lin, F.; Zheng, Z.; Yu, X. Rating-protocol optimization for blockchain-enabled hybrid energy trading in smart grids. Sci. China Inf. Sci. 2023, 66, 159205. [Google Scholar] [CrossRef]
  11. Guan, Z.; Zhou, X.; Liu, P.; Wu, L.; Yang, W. A Blockchain-Based Dual-Side Privacy-Preserving Multiparty Computation Scheme for Edge-Enabled Smart Grid. IEEE Internet Things J. 2022, 9, 14287–14299. [Google Scholar] [CrossRef]
  12. Li, Z.; Zhang, J.; Zhang, J.; Zheng, Y.; Zong, X. Integrated Edge Computing and Blockchain: A General Medical Data Sharing Framework. IEEE Trans. Emerg. Top. Comput. 2023, 1–14. [Google Scholar] [CrossRef]
  13. Sharma, D.; Kumar, R.; Jung, K.H. A bibliometric analysis of convergence of artificial intelligence and blockchain for edge of things. J. Grid Comput. 2023, 21, 79. [Google Scholar] [CrossRef]
  14. Lin, Y.; Kang, J.; Niyato, D.; Gao, Z.; Wang, Q. Efficient Consensus and Elastic Resource Allocation Empowered Blockchain for Vehicular Networks. IEEE Trans. Veh. Technol. 2023, 72, 5513–5517. [Google Scholar] [CrossRef]
  15. Zhang, X.; Zhu, X.; Chikuvanyanga, M.; Chen, M. Resource sharing of mobile edge computing networks based on auction game and blockchain. EURASIP J. Adv. Signal Process. 2021, 2021, 26. [Google Scholar] [CrossRef]
  16. Xu, S.; Liao, B.; Yang, C.; Guo, S.; Hu, B.; Zhao, J.; Jin, L. Deep reinforcement learning assisted edge-terminal collaborative offloading algorithm of blockchain computing tasks for energy Internet. Int. J. Electr. Power Energy Syst. 2021, 131, 107022. [Google Scholar] [CrossRef]
  17. Moghaddasi, K.; Rajabi, S.; Gharehchopogh, F.S. Multi-Objective Secure Task Offloading Strategy for Blockchain-Enabled IoV-MEC Systems: A Double Deep Q-Network Approach. IEEE Access 2024, 12, 3437–3463. [Google Scholar] [CrossRef]
  18. Wu, H.; Wolter, K.; Jiao, P.; Deng, Y.; Zhao, Y.; Xu, M. EEDTO: An Energy-Efficient Dynamic Task Offloading Algorithm for Blockchain-Enabled IoT-Edge-Cloud Orchestrated Computing. IEEE Internet Things J. 2021, 8, 2163–2176. [Google Scholar] [CrossRef]
  19. Nguyen, D.C.; Pathirana, P.N.; Ding, M.; Seneviratne, A. Privacy-Preserved Task Offloading in Mobile Blockchain with Deep Reinforcement Learning. IEEE Trans. Netw. Serv. Manag. 2020, 17, 2536–2549. [Google Scholar] [CrossRef]
  20. Le, Y.; Ling, X.; Wang, J.; Guo, R.; Huang, Y.; Wang, C.X.; You, X. Resource Sharing and Trading of Blockchain Radio Access Networks: Architecture and Prototype Design. IEEE Internet Things J. 2023, 10, 12025–12043. [Google Scholar] [CrossRef]
  21. Salim, M.M.; Pan, Y.; Park, J.H. Energy-efficient resource allocation in blockchain-based Cybertwin-driven 6G. J. Ambient. Intell. Humaniz. Comput. 2024, 15, 103–114. [Google Scholar] [CrossRef]
  22. Sun, W.; Liu, J.; Yue, Y.; Wang, P. Joint Resource Allocation and Incentive Design for Blockchain-Based Mobile Edge Computing. IEEE Trans. Wirel. Commun. 2020, 19, 6050–6064. [Google Scholar] [CrossRef]
  23. Ding, J.; Han, L.; Li, J.; Zhang, D. Resource allocation strategy for blockchain-enabled NOMA-based MEC networks. J. Cloud Comput. 2023, 12, 142. [Google Scholar] [CrossRef]
  24. Zhang, L.; Zou, Y.; Wang, W.; Jin, Z.; Su, Y.; Chen, H. Resource allocation and trust computing for blockchain-enabled edge computing system. Comput. Secur. 2021, 105, 102249. [Google Scholar] [CrossRef]
  25. Baranwal, G.; Kumar, D.; Vidyarthi, D.P. Blockchain based resource allocation in cloud and distributed edge computing: A survey. Comput. Commun. 2023, 209, 469–498. [Google Scholar] [CrossRef]
  26. Xue, H.; Chen, D.; Zhang, N.; Dai, H.N.; Yu, K. Integration of blockchain and edge computing in internet of things: A survey. Future Gener. Comput. Syst. 2023, 144, 307–326. [Google Scholar] [CrossRef]
  27. Liu, X. Towards blockchain-based resource allocation models for cloud-edge computing in IoT applications. Wirel. Pers. Commun. 2021, 135, 2483. [Google Scholar] [CrossRef]
  28. Guo, S.; Dai, Y.; Guo, S.; Qiu, X.; Qi, F. Blockchain Meets Edge Computing: Stackelberg Game and Double Auction Based Task Offloading for Mobile Blockchain. IEEE Trans. Veh. Technol. 2020, 69, 5549–5561. [Google Scholar] [CrossRef]
  29. Devi, I.; Karpagam, G.R. Energy-Aware Scheduling for Tasks with Target-Time in Blockchain based Data Centres. Comput. Syst. Sci. Eng. 2022, 40, 405–419. [Google Scholar] [CrossRef]
  30. Xiong, J.; Guo, P.; Wang, Y.; Meng, X.; Zhang, J.; Qian, L.; Yu, Z. Multi-agent deep reinforcement learning for task offloading in group distributed manufacturing systems. Eng. Appl. Artif. Intell. 2023, 118, 105710. [Google Scholar] [CrossRef]
  31. Lu, K.; Li, R.D.; Li, M.C.; Xu, G.R. MADDPG-based joint optimization of task partitioning and computation resource allocation in mobile edge computing. Neural Comput. Appl. 2023, 35, 16559–16576. [Google Scholar] [CrossRef]
  32. Li, K.; Wang, X.; He, Q.; Yang, M.; Huang, M.; Dustdar, S. Task Computation Offloading for Multi-Access Edge Computing via Attention Communication Deep Reinforcement Learning. IEEE Trans. Serv. Comput. 2023, 16, 2985–2999. [Google Scholar] [CrossRef]
  33. Wu, G.; Chen, X.; Gao, Z.; Zhang, H.; Yu, S.; Shen, S. Privacy-preserving offloading scheme in multi-access mobile edge computing based on MADRL. J. Parallel Distrib. Comput. 2024, 183, 104775. [Google Scholar] [CrossRef]
  34. Yang, L.; Li, M.; Si, P.; Yang, R.; Sun, E.; Zhang, Y. Energy-Efficient Resource Allocation for Blockchain-Enabled Industrial Internet of Things with Deep Reinforcement Learning. IEEE Internet Things J. 2021, 8, 2318–2329. [Google Scholar] [CrossRef]
  35. Nguyen, D.C.; Ding, M.; Pathirana, P.N.; Seneviratne, A.; Li, J.; Poor, H.V. Cooperative Task Offloading and Block Mining in Blockchain-Based Edge Computing with Multi-Agent Deep Reinforcement Learning. IEEE Trans. Mob. Comput. 2023, 22, 2021–2037. [Google Scholar] [CrossRef]
  36. Yao, S.; Wang, M.; Qu, Q.; Zhang, Z.; Zhang, Y.F.; Xu, K.; Xu, M. Blockchain-Empowered Collaborative Task Offloading for Cloud-Edge-Device Computing. IEEE J. Sel. Areas Commun. 2022, 40, 3485–3500. [Google Scholar] [CrossRef]
  37. Wang, C.; Jiang, C.; Wang, J.; Shen, S.; Guo, S.; Zhang, P. Blockchain-Aided Network Resource Orchestration in Intelligent Internet of Things. IEEE Internet Things J. 2023, 10, 6151–6163. [Google Scholar] [CrossRef]
  38. Du, Y.; Wang, Z.; Li, J.; Shi, L.; Jayakody, D.N.K.; Chen, Q.; Chen, W.; Han, Z. Blockchain-Aided Edge Computing Market: Smart Contract and Consensus Mechanisms. IEEE Trans. Mob. Comput. 2023, 22, 3193–3208. [Google Scholar] [CrossRef]
  39. Kaur, M.; Khan, M.Z.; Gupta, S.; Noorwali, A.; Chakraborty, C.; Pani, S.K. MBCP: Performance Analysis of Large Scale Mainstream Blockchain Consensus Protocols. IEEE Access 2021, 9, 80931–80944. [Google Scholar] [CrossRef]
  40. Liang, L.; Kim, J.; Jha, S.C.; Sivanesan, K.; Li, G.Y. Spectrum and Power Allocation for Vehicular Communications with Delayed CSI Feedback. IEEE Wirel. Commun. Lett. 2017, 6, 458–461. [Google Scholar] [CrossRef]
  41. Xu, X.; Liu, X.; Yin, X.; Wang, S.; Qi, Q.; Qi, L. Privacy-aware offloading for training tasks of generative adversarial network in edge computing. Inf. Sci. 2020, 532, 1–15. [Google Scholar] [CrossRef]
  42. Chen, X. Decentralized Computation Offloading Game for Mobile Cloud Computing. IEEE Trans. Parallel Distrib. Syst. 2015, 26, 974–983. [Google Scholar] [CrossRef]
  43. Huang, X.; Zhang, B.; Li, C. Incentive Mechanisms for Mobile Edge Computing: Present and Future Directions. IEEE Netw. 2022, 36, 199–205. [Google Scholar] [CrossRef]
  44. Xu, Y.; Zhang, H.; Li, X.; Yu, F.R.; Ji, H.; Leung, V.C.M. Blockchain-Based Edge Collaboration with Incentive Mechanism for MEC-Enabled VR Systems. IEEE Trans. Wirel. Commun. 2024, 23, 3706–3720. [Google Scholar] [CrossRef]
  45. Gao, Q.; Xiao, J.; Cao, Y.; Deng, S.; Ouyang, C.; Feng, Z. Blockchain-based collaborative edge computing: Efficiency, incentive and trust. J. Cloud Comput. 2023, 12, 72. [Google Scholar] [CrossRef]
  46. Li, X.; Liu, S.; Wu, F.; Kumari, S.; Rodrigues, J.J.P.C. Privacy Preserving Data Aggregation Scheme for Mobile Edge Computing Assisted IoT Applications. IEEE Internet Things J. 2019, 6, 4755–4763. [Google Scholar] [CrossRef]
  47. Xu, X.; He, C.; Xu, Z.; Qi, L.; Wan, S.; Bhuiyan, M.Z.A. Joint Optimization of Offloading Utility and Privacy for Edge Computing Enabled IoT. IEEE Internet Things J. 2020, 7, 2622–2629. [Google Scholar] [CrossRef]
  48. Tampuu, A.; Matiisen, T.; Kodelja, D.; Kuzovkin, I.; Korjus, K.; Aru, J.; Aru, J.; Vicente, R. Multiagent cooperation and competition with deep reinforcement learning. PLoS ONE 2017, 12, e0172395. [Google Scholar] [CrossRef]
  49. Rashid, T.; Samvelyan, M.; de Witt, C.S.; Farquhar, G.; Foerster, J.; Whiteson, S. Monotonic Value Function Factorisation for Deep Multi-Agent Reinforcement Learning. J. Mach. Learn. Res. 2020, 21, 1–51. [Google Scholar]
  50. Sunehag, P.; Lever, G.; Gruslys, A.; Czarnecki, W.M.; Zambaldi, V.F.; Jaderberg, M.; Lanctot, M.; Sonnerat, N.; Leibo, J.Z.; Tuyls, K.; et al. Value-Decomposition Networks For Cooperative Multi-Agent Learning. arXiv 2017, arXiv:1706.05296. [Google Scholar]
