Article

MEC-Chain: Towards a New Framework for a MEC-Enabled Mobile Blockchain Network Under the PoS Consensus

1
College of Technological Innovation, Zayed University, Abu Dhabi P.O. Box 144534, United Arab Emirates
2
Faculty of Economics and Management of Sfax, Multimedia, Information Systems and Advanced Computing Laboratory (MIRACL), Sfax University, Sfax 3000, Tunisia
*
Author to whom correspondence should be addressed.
Future Internet 2025, 17(12), 563; https://doi.org/10.3390/fi17120563
Submission received: 6 September 2025 / Revised: 20 November 2025 / Accepted: 25 November 2025 / Published: 5 December 2025

Abstract

The Proof of Stake (PoS) consensus mechanism is increasingly used in blockchain systems; however, resource allocation for PoS-based mobile blockchain networks remains underexplored, particularly given the constraints of mobile devices. This work introduces MEC-Chain, a new framework that integrates Mobile Edge Computing (MEC) with mobile blockchain to support efficient validator-node execution under PoS. MEC-Chain formalizes a multi-objective resource-allocation problem that jointly considers latency, reliability, and cost from both the validator and MEC-provider perspectives. To address this challenge, we develop a deep reinforcement learning-based allocation agent using the Proximal Policy Optimization (PPO) algorithm. Experimental results show that PPO achieves a 30–40% reduction in total execution time, 25–35% lower transmission latency, and 10–15% higher reliability compared to A2C (Advantage Actor–Critic) and DQN (Deep Q-Network), while offering comparable cost savings across all methods. These results demonstrate the effectiveness of MEC-Chain in enabling low-latency, reliable, and resource-efficient PoS validation within mobile blockchain environments.


1. Introduction

Blockchain technology is continually evolving in response to the success and growing popularity of digital currencies, such as Bitcoin. It is an innovative distributed ledger technology that has sparked widespread interest by addressing the issues of centralized parties, high costs, inefficiencies, and data storage challenges. Blockchain guarantees a tamper-proof ledger and transparent, secure transactions. Blockchain technology can change how we share information and affect the future of the digital economy and society, promoting its adoption in many fields, such as the Internet of Things (IoT) [1], circular economy [2], social crowdfunding [3], insurance systems [4], and smart manufacturing [5].
Within the blockchain network, data integrity and validation are maintained by the nodes that execute the consensus mechanism, such as Proof of Work (PoW) [6] or Proof of Stake (PoS) [7]. Under PoW, this process is called mining: miners use extensive computing resources to solve a mathematical puzzle by generating a hash that satisfies a set of requirements. The puzzle is tied to the specific transactions that must be added to the blockchain’s ledger. In recent years, many researchers have studied the problem of resource allocation for the execution of the consensus mechanism [8,9,10]. They primarily focused on selecting miners and determining the optimal resources to allocate to each participating miner. Despite their importance, these studies consider only wired blockchain networks and ignore more challenging settings, such as mobile blockchains operating on mobile devices. Mobile blockchains, also known as wireless blockchain networks, have garnered significant attention in recent years, and several authors have demonstrated their effectiveness and their potential to drive substantial changes in various domains [11]. However, mobile devices usually have limited resources (energy, storage, computation, etc.) that cannot support validation and verification tasks. With the emergence of mobile edge computing (MEC), a mobile blockchain can offload mining and verification tasks from mobile devices to nearby edge servers that possess sufficient resources, thereby improving communication efficiency. Because MEC deploys edge servers with sufficient computing and storage capacity near mobile devices, it can overcome both the high latency of traditional cloud computing and the limited computational power of mobile devices [12]. MEC servers offer a distributed platform with integrated networking, computing, storage, and application processing capabilities.
Several studies on MEC-enabled wireless blockchain networks have been undertaken to enhance data security and transactional throughput. Most of these studies applied game theory to resource allocation problems or presented deep learning-based allocation [13,14]. Other works used auctions to allocate virtual machines to end users. All these works assign tasks related to the PoW protocol, such as mining and transaction verification, which require significant computing resources and demand that all nodes participate in these tasks. In contrast to these works, we are particularly interested in blockchain systems built upon the PoS protocol, which are less energy-intensive and resource-demanding than PoW systems. Indeed, under PoW, miners compete to find a valid block hash, and the first to do so gains the right to create a new block. PoS, which Ethereum 2.0 uses, instead selects validators randomly among those who have deposited a stake. PoS introduces new considerations for resource allocation that both the validator and the MEC provider must take into account. First, when leasing MEC resources, the validator must accurately calculate the utility of this allocation in terms of rewards, considering the chances of being selected as a validator and the allocation costs incurred. Second, the MEC provider must maximize its income, considering that clients are mobile, and must properly balance the load on its resources. The provider must also offer the security mechanisms that the validator needs in order to outsource validation and verification scripts.
Although several studies have explored resource allocation for mobile blockchain networks, most of them focus on PoW and do not reflect the specific characteristics and constraints of PoS. Existing work typically assumes stable computation resources and therefore overlooks the dynamic and heterogeneous resource availability of mobile devices, which directly affects validators’ ability to operate PoS nodes reliably. Moreover, prior studies generally optimize resource allocation solely from the user perspective and do not incorporate the dual interests of both validators and MEC providers, such as balancing cost, reliability, and profit. Finally, current models rarely address the strict latency requirements imposed by PoS block creation times (e.g., Ethereum’s 12-second slot), which are crucial for ensuring timely attestation and block proposal. These limitations highlight the absence of a comprehensive framework that can address mobile resource variability, multi-stakeholder objectives, and PoS-specific timing constraints, underscoring the need for the proposed MEC-Chain framework.
The main contributions of this paper are summarized as follows:
  • We propose MEC-Chain, a framework that integrates MEC with mobile PoS validation and supports both full and partial delegation modes, ensuring flexible, low-latency, and reliable validator operation in dynamic mobile environments.
  • We formulate a PoS-aware resource allocation problem that jointly considers transmission latency, server reliability, and the minimum computing resources required to maintain continuous validator availability.
  • We design a deep reinforcement learning agent based on Proximal Policy Optimization (PPO) to learn optimal offloading and resource allocation strategies under varying network and mobility conditions.
  • We implement and evaluate MEC-Chain through extensive simulations, demonstrating that PPO outperforms A2C and DQN in execution time, transmission latency, and server reliability.
The rest of the paper is organized as follows. Section 2 provides the necessary background. Related works are presented in Section 3. Section 4 presents the MEC-Chain framework. Section 5 details the design of the DRL-based resource allocation algorithm. Section 6 describes and discusses the preliminary evaluation results, and Section 7 concludes the paper and outlines future work.

2. Background

This section introduces the fundamental concepts needed to understand our approach.

2.1. Proof of Stake Consensus Mechanism

Ethereum, one of the leading blockchain platforms, has transitioned to a Proof of Stake (PoS) consensus mechanism as part of its Ethereum 2.0 upgrade [15]. This transformative shift aims to enhance the network’s scalability, security, and sustainability. In Ethereum’s PoS, validators are chosen to propose and validate new blocks based on the amount of cryptocurrency they lock up as collateral, known as staking. With a 12-s block time, Ethereum PoS significantly reduces the time required for block creation, thereby improving transaction throughput. This transition aligns with Ethereum’s commitment to sustainability by reducing energy consumption compared to the previous PoW model. Stakers, motivated by the prospect of earning rewards, play a pivotal role in maintaining the network’s integrity. Ethereum’s PoS introduces a dynamic and participatory approach to consensus, fostering a decentralized and secure ecosystem as the platform evolves.

2.2. Mobile Blockchain Network

The concept of a mobile blockchain refers to the integration of blockchain technology with mobile devices, extending the capabilities and benefits of decentralized ledgers to the mobile ecosystem [16]. The use of a mobile blockchain for Ethereum validators is a significant advancement [17]. This notion is simple and convenient, allowing validators to remain active regardless of their location. Because mobile devices are portable, they can validate transactions while on the go. This is consistent with the availability and connection criteria, as validators remain linked and active. On the other hand, creating a validation node on a mobile device may pose technical and security problems related to PoS consensus procedures. These activities often necessitate a robust, dependable, and high-performance computer infrastructure, which may not be available on a mobile device. Furthermore, due to their frequent use in public contexts, mobile devices may be more vulnerable to attacks than desktop or laptop computers, increasing their exposure to malicious actors.

2.3. Mobile Edge Computing and Resource Allocation

Mobile Edge Computing (MEC) represents a paradigm shift in how computing resources are utilized in mobile networks [18]. MEC brings computational capabilities closer to the network’s edge, reducing latency and enhancing the overall user experience. Efficient resource allocation is a cornerstone of MEC, where computing resources are dynamically distributed based on their proximity to end users and the specific requirements of applications. This allocation strategy optimizes available resources, ensuring that processing tasks are performed closer to the data source and reducing the need for data to traverse long distances to centralized cloud servers. The challenge lies in designing algorithms and frameworks that can adaptively allocate resources based on varying workloads and user demands, striking a balance between efficiency and responsiveness in the dynamic landscape of mobile edge computing.

3. Related Work

Some studies have explored the challenge of compute offloading for mobile blockchain networks, aiming to address the computation-intensive proof-of-work (PoW) and promote trust between resource requesters and providers.
Liu et al. [19] suggested a wireless blockchain framework with mobile edge computing (MEC) capabilities. The computationally difficult mining activities can be offloaded to nearby edge computing nodes, and the cryptographic hashes of blocks can be saved on the MEC server. Jiao et al. [20] proposed an auction-based resource allocation system to maximize social welfare. The authors enhanced their initial work [21] by proposing two bidding schemes: constant and multi-demand. In the former, each miner bids for a fixed quantity of resources, while in the latter, the miners can submit their preferred demands and bids. The authors introduce an auction mechanism for the constant-demand bidding scheme that achieves optimal social welfare. In the multi-demand bidding scheme, they design an approximate algorithm that guarantees truthfulness, individual rationality, and computational efficiency. Zuo et al. [13] designed a new mobile blockchain-enabled edge computing network based on a nonce hash computing ordering (HCO) mechanism. First, the authors formulated the demands for computing an individual user’s nonce hash as a non-cooperative game that maximizes personal revenue. Then, they analyzed the existence of Nash equilibrium in the non-cooperative game and designed an alternating optimization algorithm to achieve the optimal nonce selection strategies for all users. Nevertheless, the transmission delay between the miners and the MEC server is not taken into account in the different approaches [19,20,21]. A large transmission delay may occur if too many miners simultaneously offload tasks to the MEC server.
Qiu et al. [14] proposed a model for the optimal bid-rigging mechanism in a mobile blockchain edge computing resource auction. To increase the revenue of the MEC service provider, the authors proposed a method for determining the optimal reserve price using the simulated annealing algorithm. This algorithm calculates the optimal reserve price based on bidders’ willingness to pay and the number of bid-rigging participants. The suggested method enables edge service providers (ESPs) to discourage user cheating in resource optimization auctions, reduce revenue losses, and address user data privacy concerns. Zhang et al. [22] investigated both the regular task offloading problem and the mining task offloading problem in a blockchain-enabled beyond 5G network. The former problem is formulated as a double auction market, and the latter is formulated as a Stackelberg game. Wang et al. [23] proposed a new differential evolution algorithm to jointly optimize the mining decision and resource allocation for an MEC-enabled wireless blockchain network. Ding et al. [24] proposed a NOMA-based MEC wireless blockchain system that optimizes task offloading, user clustering, computing resources, and transmission power to reduce overall energy consumption. They decompose the non-convex problem into sequential algorithms and show through simulations that their joint optimization approach effectively lowers system energy use. Another recent effort is SharpEdge [25], a QoS-driven task-scheduling scheme for mobile edge computing that leverages blockchain to enable secure, efficient, and trustworthy peer offloading between edge servers from different infrastructure providers. In SharpEdge, edge servers publish tasks with associated rewards and select reliable executors through a reputation mechanism built on historical performance. After execution, the results and executor performance are immutably recorded on the blockchain. To meet MEC’s low-latency requirements, the authors design a concurrent consensus mechanism based on sharding, which significantly improves scheduling efficiency. Fang et al. [26] present a blockchain-assisted MEC offloading scheme in which user terminals offload sensitive tasks to a base station that executes them while blockchain consensus ensures security. They formulate a joint optimization problem to minimize user energy under delay constraints and solve it using a collective reinforcement learning approach, with simulations confirming its effectiveness. However, this work focuses on generic task offloading and minimizing user energy consumption, without addressing PoS validation or mobile validator behavior.
Overall, most of the above works tackle task offloading from a user perspective and concentrate on PoW mining tasks. Since PoW relies on computational work to select the miner who adds a new block, resource allocation should concentrate on optimizing computational tasks. However, in PoS, given the shorter block creation times, minimizing latency while allocating resources is crucial. Unlike existing works, we focus on PoS consensus and aim to propose an approach that considers the viewpoints of both mobile blockchain users and MEC providers. The user seeks a reliable, secure, and cost-effective resource. As for the providers, they aim to maximize their profit by accommodating requests from validators.

4. The MEC-Chain Framework

The proposed framework for a MEC-enabled mobile blockchain network under the PoS consensus is depicted in Figure 1. In this framework, we make certain assumptions to streamline the deployment and functionality of edge nodes. Firstly, we assume that the edge node, while not pervasive, maintains a satisfactory level of stability. Furthermore, it is configured to operate as a node within the Proof-of-Stake (PoS) consensus mechanism. This presupposes a reliable and stable environment for the edge node’s consistent performance. Our assumptions include deploying edge servers equipped with Docker images that encompass all the required software components to function as an Ethereum node. The Docker image is designed to generate a container, facilitating the efficient configuration of Ethereum nodes on edge servers. These assumptions collectively contribute to the foundation of our approach to mobile edge computing and resource allocation.
The ultimate goal of the MEC-Chain framework is to enable validators to run their nodes via mobile devices. In general, validators have three options for executing their nodes: individual staking, service staking, and group staking, each with its advantages and disadvantages. Once the 32 ETH staking requirement is met, the main challenge is to provide suitable infrastructure. Whatever the validator’s decision, it must preserve validator revenues and node security, while also ensuring network stability and security.
To guarantee these objectives, the framework consists of two basic layers: the validator layer and the edge server layer. The validator layer comprises a set of mobile devices, including smartphones, tablets, and PCs. Each mobile device is considered a node in the blockchain network and can be a full node or a lightweight node. The tasks have nearly identical characteristics and are described as validator node execution. The edge server layer comprises servers located in different regions, operated by providers that deploy several edge computing nodes according to their own competitive strategy. Each provider offers its own quality of service in terms of availability, price, and trust. Providers compete with each other for groups of users with similar needs in the same area.
Edge computing servers provide their customers with a user-friendly interface, enabling validators to easily manage their validation nodes. Furthermore, to simplify the interaction between validators and Edge servers, we propose a smart contract with two main functions. The first is the save() function, which initiates an allocation process by saving public information (server and validator identifiers, allocation duration, price, and allocated resources), as well as the validator’s private key, which is essential for consensus participation. It is essential to note that this key is solely used for validation purposes and does not grant access to the funds. To ensure secure handling of validator credentials, we assume that private keys are never stored or transmitted in plaintext. Before invoking the save() function, the validator encrypts the key locally, and the MEC server is only authorized to use the encrypted material for validation tasks under strict access controls. This prevents sensitive credentials from being exposed during the allocation process. The second function is the finish-allocation() function, allowing the validator to free up its computational resources. Together, these functions ensure the efficient management of the validation node’s parameters and the computational resources utilized by the validator.
In the context of MEC-Chain, the two smart contract functions (save() and finish-allocation()) require a secure execution environment. For this reason, the design of the contract must explicitly account for common blockchain vulnerabilities. In particular, reentrancy can be mitigated by structuring these functions according to the checks–effects–interactions pattern or by applying a nonReentrant modifier when external calls are involved. To prevent front-running during the submission of allocation parameters or sensitive validator metadata, a commit–reveal scheme may be incorporated when appropriate. Furthermore, access to save() and finish-allocation() must be restricted to authenticated validator addresses to avoid unauthorized invocation. These security considerations are essential for ensuring that the MEC-Chain smart contract can be implemented without exposing validator credentials or compromising the integrity of the allocation process.
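The paper does not give contract code, so the following is only an off-chain Python model of the behavior described for save() and finish-allocation(): caller authentication, and state updates performed before any follow-up interaction (the checks, effects, interactions ordering). All names (`AllocationRegistry`, `Allocation`, `encrypted_key`, and the field layout) are hypothetical, and a real deployment would be written in a smart contract language rather than Python.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """Public allocation record plus the locally encrypted validator key."""
    server_id: str
    validator_id: str
    duration_days: int
    price: float
    resources: dict
    encrypted_key: bytes  # ciphertext only; the plaintext key is never stored
    active: bool = True


class AllocationRegistry:
    """Toy stand-in for the contract state (hypothetical, for illustration)."""

    def __init__(self, authorized_validators):
        self._authorized = set(authorized_validators)
        self._allocations = {}

    def _require_authorized(self, caller):
        if caller not in self._authorized:
            raise PermissionError(f"{caller} is not an authenticated validator")

    def save(self, caller, allocation):
        # Checks: the caller must be authenticated and own the allocation.
        self._require_authorized(caller)
        if caller != allocation.validator_id:
            raise PermissionError("caller may only save its own allocation")
        existing = self._allocations.get(caller)
        if existing is not None and existing.active:
            raise ValueError("validator already has an active allocation")
        # Effects: record the allocation before anything else happens.
        self._allocations[caller] = allocation

    def finish_allocation(self, caller):
        # Checks
        self._require_authorized(caller)
        allocation = self._allocations.get(caller)
        if allocation is None or not allocation.active:
            raise ValueError("no active allocation to finish")
        # Effects first: mark inactive before any follow-up interaction,
        # such as notifying the server that resources can be released.
        allocation.active = False
        return allocation.server_id
```

In this model, an unauthenticated caller is rejected before any state is read or written, and a repeated finish_allocation call fails because the allocation was already marked inactive.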

4.1. The MEC-Chain Process

The proposed framework follows a process that begins when users decide to run their tasks on an edge server and specify their requirements, and ends when the task is fully executed (see Figure 2).
This process includes five steps:
  • Step 1—Determining the resource requirements: In this step, the validators evaluate the resources required to operate their nodes, such as computing power, storage space, bandwidth, allocation period, and core number. The validator’s location is also taken into account to reduce latency times. It is important to note that the basic requirements of the validator are generally constant. However, in a context of equitable resource distribution, each validator can keep a history recording the activity of its node and the resources required as a function of network throughput. This method enables the estimation of the ideal resources required to run the node reliably and efficiently. Finally, validators send their requests to the system once these specific requirements have been defined.
  • Step 2—Resource allocation: When a validator submits a request to the system, this request, including task-specific characteristics, is captured by the resource allocation agent. The latter receives requests from validators, extracts information about the task, summarizes it in a vector adapted to the decision engine, and then transmits it to the decision optimization engine. The decision engine receives a matrix describing the environment, including server status, available resources, and validator requests. It updates this data to form a complete state, at which point the optimization engine makes the allocation decision and implements it by physically deploying the decision via the peripheral network. A more detailed explanation of this decision algorithm is presented in Section 5.
  • Step 3—Validator’s Node Configuration: Once the system has assigned the allocation to each validator, validators can log in to their session. At this stage, the validator faces several choices, such as the operating system, consensus client type, and runtime client type. The server will then take these choices and create a Docker image of the operating system, runtime client type, and consensus type. The server will then execute this image to launch the container, making the server ready to run the validator node. Once the configurations have been completed, the server will import the validator’s validation key from the smart contract, preparing the node for function. This process simplifies the management of PoS validators by automating the creation and configuration of containers.
  • Step 4—Exchanging requests and responses: During this step, the validator must choose between two modes—full delegation or partial delegation—taking into account various external factors, such as battery charge. By opting for full delegation, the validator delegates all its work to the server. The server, acting as a validator, decides which tasks will be performed and how. The user interface (UI) is then transformed into a dashboard displaying statistics, the results of various node missions, and the status of the network and nodes. In this mode, tasks are script-automated (see Figure 3) according to a series of instructions defined in specific code and using the signature key stored in the smart contract to authorize the execution of validation tasks.
    In the partial delegation mode, the validator is the decision-maker (see Figure 4). The validator has full rights to manage the node, and the Ethereum blockchain network notifies the validator’s node of all network details and information, as well as its tasks (attesting, block creation, and transaction validation). This information is then transmitted to the validator via the interface. During this stage, the validator makes decisions concerning these tasks and sends its response as a query containing the parameters required to execute the necessary instructions. For example, once a task has been created in a block, the validator examines it to ensure that it complies with Ethereum’s security and confidentiality policies. If these policies are not respected, the validator rejects the task and sends a rejection message to the sending node. Otherwise, the validator accepts the task and sends a creation request to the issuing node.
In summary, whether validators choose full or partial delegation, the MEC server must guarantee that the validator node operates reliably and responds within the timing constraints imposed by PoS. Although PoS does not require intensive computation like PoW, the validator still depends on stable processing capacity, low transmission latency, and high server availability to avoid missed attestations or delayed participation in consensus. Consequently, the resource management problem in MEC-Chain focuses not on heavy mining tasks but on ensuring continuous validator responsiveness under mobile conditions. These PoS-specific constraints motivate the need for a dedicated resource-allocation model, which we formalize in the next subsection.

4.2. Resource Allocation System Model

Mobile devices have different allocation periods and resources requested, depending on the validator’s needs. Similarly, edge servers are characterized by their associated processing capacities and qualities.
The Edge network is made up of $N$ Edge servers, each of which is denoted by $R_j$, where $j = 1, 2, 3, \ldots, N$.
Each Edge server $j$ is defined by:
  • $R_j \langle RAM_j, Disk_j, Core\_Number_j \rangle$: the available capacity of RAM, disk, and the number of cores.
  • $Price_j$: the proposed server’s price (per day).
  • $B_j$: the proposed server’s bandwidth to receive data and return task results after execution, measured in Mbit/s.
  • $Loc_j$: the server location $(X_j, Y_j)$.
  • $Relia\_level_j$: the reliability level, defined as the percentage of tasks correctly executed by the server in the last period.
The validator’s request $V_i$ is characterized by:
  • $Rd_i \langle RAM_i, Disk_i, Core\_Number_i \rangle$: the required capacity of RAM, disk, and the number of cores.
  • $Offer\_Price_i$: the price suggested by the validator.
  • $P_i$: the allocation period (number of days).
  • $L_i$: the list of validator moves, expressed in terms of the points $P_{max}(X_{max}, Y_{max})$ and $P_{min}(X_{min}, Y_{min})$. To find a point close to all validator movements, we calculate the center point as a function of $P_{max}$ and $P_{min}$:
    $$center\_point_i = \left( \frac{X_{max} + X_{min}}{2}, \frac{Y_{max} + Y_{min}}{2} \right) \tag{1}$$
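The server and request descriptions above map naturally onto small record types. The sketch below is one possible encoding, not the authors' implementation: the class and field names are hypothetical, the units (GB for RAM and disk) are assumed, and reliability is stored as a fraction rather than a percentage.

```python
from dataclasses import dataclass


@dataclass
class Resources:
    ram_gb: float   # assumed unit
    disk_gb: float  # assumed unit
    cores: int


@dataclass
class EdgeServer:
    resources: Resources   # R_j: available RAM, disk, and cores
    price_per_day: float   # Price_j
    bandwidth_mbps: float  # B_j in Mbit/s
    loc: tuple             # Loc_j = (X_j, Y_j)
    reliability: float     # Relia_level_j, stored here as a fraction in [0, 1]


@dataclass
class ValidatorRequest:
    demand: Resources   # Rd_i: required RAM, disk, and cores
    offer_price: float  # Offer_Price_i
    period_days: int    # P_i
    p_max: tuple        # P_max = (X_max, Y_max)
    p_min: tuple        # P_min = (X_min, Y_min)

    def center_point(self):
        """Midpoint of P_max and P_min, used as the validator's reference location."""
        return ((self.p_max[0] + self.p_min[0]) / 2,
                (self.p_max[1] + self.p_min[1]) / 2)


# Example: a validator moving between (2, 4) and (10, 8)
request = ValidatorRequest(Resources(16, 500, 4), 3.0, 30, (10.0, 8.0), (2.0, 4.0))
center = request.center_point()
```

Using the midpoint keeps the request description compact: only two points are stored, regardless of how many moves the validator makes.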
Servers and validators communicate over a bandwidth-limited link, which is responsible for connection management, data transmission, and quality of service (QoS). To evaluate the performance of the communication link between users and servers, we can use the following criteria:
  • The speed of transmission, estimated by the transmission time $T_{ij}$, is deduced from the transmission rate $D_{ij}$ (per second), which is calculated from the distance, signal quality, and bandwidth $B_j$ using Claude Shannon’s theorem [27].
    $$D_{ij} = B \times \log_2 \left( 1 + \frac{P \times h}{N} \right) \tag{2}$$
Note that $N$ here denotes a constant noise value, $B$ is the bandwidth value, $P$ represents the signal power, and $h$ is the channel gain estimated using the signal power propagation model, which is calculated as:
$$h_{ij} = g_{ij} \cdot \left( \frac{c}{4 \pi f \, dist_{ij}} \right)^{2} \tag{3}$$
where $dist_{ij}$ denotes the distance between validator $i$ and server $j$, $f$ is the carrier frequency, $c$ is the speed of light, and $g_{ij}$ represents the small-scale fading coefficient (e.g., Rayleigh or Rician distributed). In this work, for tractability, we assume the average fading case $E[|g_{ij}|^2] = 1$, which is equivalent to neglecting instantaneous fading and considering only large-scale path loss. This simplification enables us to focus on resource allocation strategies.
  • The distance between communication points, $dist_{ij}$, $\forall i, j \in N$, is calculated according to the Pythagorean theorem, using the server location and the validator’s center point:
$$dist_{ij} = \sqrt{(X_{loc_j} - X_{center_i})^2 + (Y_{loc_j} - Y_{center_i})^2} \tag{4}$$
As a result, the transmission time $T_{ij}$ between the validator and the server will be calculated as follows:
$$T_{ij} = \frac{S}{D_{ij}} \tag{5}$$
where $S$ is the request size in Mbit.
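To make the link model concrete, the sketch below chains the distance, channel gain, Shannon rate, and transmission time formulas. It is an illustration under stated assumptions, not the authors' simulator: the function names are hypothetical, the fading coefficient defaults to 1 (the average-fading case), and the example values (2.4 GHz carrier, 0.2 W transmit power, 1e-13 W noise, bandwidth expressed in MHz so that the rate comes out in Mbit/s) are chosen only for demonstration.

```python
import math

SPEED_OF_LIGHT = 299_792_458.0  # c, in m/s


def distance(server_loc, center):
    """Euclidean distance between the server location and the validator's center point."""
    return math.hypot(server_loc[0] - center[0], server_loc[1] - center[1])


def channel_gain(dist_m, carrier_hz, fading=1.0):
    """h_ij = g_ij * (c / (4*pi*f*dist_ij))^2; fading=1.0 is the average-fading case."""
    return fading * (SPEED_OF_LIGHT / (4 * math.pi * carrier_hz * dist_m)) ** 2


def shannon_rate(bandwidth, power, gain, noise):
    """D_ij = B * log2(1 + P*h/N); the rate unit follows the bandwidth unit."""
    return bandwidth * math.log2(1 + power * gain / noise)


def transmission_time(size, rate):
    """T_ij = S / D_ij, with S and D_ij in matching units (e.g., Mbit and Mbit/s)."""
    return size / rate


# Illustrative values only (assumptions, not taken from the paper)
d = distance((300.0, 400.0), (0.0, 0.0))          # metres
h = channel_gain(d, carrier_hz=2.4e9)
rate = shannon_rate(bandwidth=20.0, power=0.2, gain=h, noise=1e-13)
t = transmission_time(size=1.0, rate=rate)         # seconds for a 1 Mbit request
```

Because the gain falls with the square of the distance, doubling the distance between validator and server divides the gain by four, which is why the validator's center point matters when choosing a server.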
Decision variable: The allocation of a task to a server is represented by the decision variable $X_{ij}$ for each validator $i$ and server $j$.
In our context, the task is to run an Ethereum validation node for a validator, which can only be run on a single server. This constraint is defined by Equation (6), indicating that a task can only be assigned to one server at a time.
$$X_{ij} \in \{0, 1\}, \quad \forall i \in I, \forall j \in J \tag{6}$$
The validation node must be properly configured to ensure optimal functionality. To accomplish this, each request must be routed to a server with sufficient capacity. This condition can be expressed as follows:
$$Rd_i \le R_j \tag{7}$$
where $Rd_i$ represents the capacity requested by the validator and $R_j$ represents the server’s available capacity.
In practice, the validator provides a price for the amount of resources per day, while the edge provider gives a total price for all the resources. We therefore calculate the percentage of the available resources that the validator requests and derive from it the provider’s price for the validator’s request, $Price\_Percent_{ji}$:
$$Price\_Percent_{ji} = Req\_Usage \times Price_j \tag{8}$$
where $Req\_Usage$ represents the resource usage percentage requested by the validator.
At the same time, $Price\_Percent_{ji}$ must not exceed the price proposed by the validator, $Offer\_Price_i$:
$$Price\_Percent_{ji} \le Offer\_Price_i \tag{9}$$
Another important condition is the validator’s mission completion time $T_{delay}$, often referred to as the block time. Ethereum’s block time, denoted $T_{Block}$, is 12 s and generally encompasses the local execution time. In our case, $T_{delay}$ includes the time required for the request from the server to the validator ($T_{t,ji}$), the response time from the validator to the server ($T_{r,ij}$), and the execution time ($T_{ex}$). This value must not exceed the block time.
$$T_{delay} = T_{t,ji} + T_{r,ij} + T_{ex} \le T_{Block} \tag{10}$$
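The capacity, price, and delay constraints above can be checked together for a candidate validator-server pair. The sketch below is illustrative only: the `Request` and `Server` classes, the component-wise capacity comparison, and the reading of $Price\_Percent_{ji}$ as the requested share (a fraction between 0 and 1) multiplied by the provider's total price are all assumptions, not the authors' implementation.

```python
from dataclasses import dataclass

T_BLOCK = 12.0  # Ethereum block time in seconds, as stated in the text


@dataclass
class Request:  # hypothetical container for a validator request
    disk: float
    cores: int
    ram: float
    offer_price: float


@dataclass
class Server:  # hypothetical container for an edge server
    disk: float
    cores: int
    ram: float
    price: float  # total price for all of the server's resources


def price_percent(req_usage: float, server: Server) -> float:
    """Price of the requested share (assumed: share times total price)."""
    return req_usage * server.price


def is_feasible(req: Request, server: Server, req_usage: float,
                t_request: float, t_response: float, t_exec: float) -> bool:
    """Check the capacity, price, and delay constraints for one pair."""
    enough_capacity = (req.disk <= server.disk
                       and req.cores <= server.cores
                       and req.ram <= server.ram)
    affordable = price_percent(req_usage, server) <= req.offer_price
    on_time = t_request + t_response + t_exec <= T_BLOCK
    return enough_capacity and affordable and on_time


# Hypothetical example: a quarter of a server priced at 40 costs 10.
req = Request(disk=1000, cores=2, ram=16, offer_price=12.0)
server = Server(disk=2048, cores=8, ram=32, price=40.0)
print(is_feasible(req, server, req_usage=0.25,
                  t_request=0.4, t_response=0.4, t_exec=1.0))  # True
```

An allocation that fails any one of the three checks is rejected, which mirrors the way the constraints are stated as independent conditions.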
We present a resolution to the resource allocation problem in what follows.

5. DRL-Based Resource Allocation Algorithm

Our objective is to develop an advanced optimization algorithm that efficiently allocates resources across various validators. For the validators, this entails minimizing latency, enhancing the provider’s reliability, and reducing costs. For edge server providers, the focus is on maximizing profit and achieving optimal resource allocation.
Mathematically, this problem is formalized as a constrained multi-objective optimization problem in which we aim to maximize several objective functions: ensuring the quality of service provided to the validators so that they meet necessary standards, coordinating between requests and available resources to address as many requests as possible, and guaranteeing the provider’s profit, all while adhering to a set of constraints, such as resource availability, server capacity, and block creation time.
To address the complex and dynamic nature of this resource allocation problem, we propose a solution based on deep reinforcement learning (DRL), specifically using the Proximal Policy Optimization (PPO) algorithm. DRL is particularly well-suited for this problem because it excels in environments where decisions must be made sequentially and under uncertainty. Unlike traditional optimization methods, which may struggle with the problem’s high dimensionality and non-stationary characteristics, DRL can learn from the environment and adapt its strategies over time. DRL is fundamentally based on Markovian formalization, particularly through the use of Markov Decision Processes (MDPs).

5.1. Markov Formalization

In this section, we formulate our problem in the context of Markov Decision Process (MDP) theory by defining a set of states, actions, and rewards that allows us to efficiently model the interactions between the resource allocation agent and the environment.
  • State: the state space $S$ represents the state of the allocation system at each instant $t$, denoted $s_t$, where $s_t = \{s_{t,1}, s_{t,2}, \ldots, s_{t,n}\}$. In our model, the state is structured as a dictionary consisting of two parts. The first part contains the task queue, which is a list of requests waiting to be assigned to a server. This part is identified by the key “Request” and is presented in the form of a matrix, where each row corresponds to a specific request and is encoded according to its “request status”:
    • If the request status is 0, the request has not been processed. Its row is a vector representing the characteristics of the requested resources ($R_{d_i}$), which include disk ($Disk_i$), cores ($Core\_Number_i$), RAM ($RAM_i$), nearest center ($point\_center_i$), proposed price ($Offer\_Price_i$), and request status.
    • If the request status is 1, the request has already been processed and assigned to a server; its row is a vector of zeros.
    • If the request status is −1, the agent made a wrong decision for this request; its row is a vector of −1.
    The second part represents the state of the servers, identified by the key “Servers”, and presented in the form of a matrix, where each row corresponds to the state of a particular server.
    Each server is characterized by its available resources $R_j$, including disk ($Disk_j$), cores ($Core\_Number_j$), RAM ($RAM_j$), bandwidth ($B_j$), location ($Loc_j$), reliability level ($Relia\_level_j$), and price of available resources ($Price_j$).
  • Action: the action space is a discrete space ranging from 0 to J, where J represents the number of available servers. Each action a ∈ A indicates the server to which the request is allocated.
  • Reward: a reward $r_{ij}$ is assigned at each time step $t$, following a feasible action $A(t)$; that is, each allocation $X_{ij}$ yields a reward $r_{ij}$. This reward must be consistent with the needs of both the validators and the edge server providers. To this end, we have divided the reward into three parts:
    • Provider reliability reward: The reward can be estimated using the reliability level to encourage the agent to choose a server with a higher confidence value.
      $$r_r = Relia\_level_j - 0.5 \tag{11}$$
    • Latency minimization reward: Efficiency in our problem boils down to minimizing transmission time and the cost of server access fees.
      $$r_l = 1 - T_{ij} \tag{12}$$
      Equation (12) encourages the agent to find a server closer to the validator.
    • Profit maximization reward: The agent will seek to maximize the difference between the price proposed by the validator and the cost of server access fees, which guarantees savings compared to the validator’s budget.
      $$r_p = Price\_dif_{ij} \tag{13}$$
      where:
      $$Price\_dif_{ij} = Offer\_Price_i - Price_{ji} \tag{14}$$
Once the various reward values have been calculated, the agent receives the following reward vector:
$$R = [r_r, r_l, r_p] \tag{15}$$
To promote optimal decision-making by the agent, we introduced a condition emphasizing the significance of strategic choices. Specifically, when both $r_r > 0$ (indicating a reliability level above the threshold of 0.5) and $r_l > 0$ (indicating a transmission time below 1 s), the rewards for reliability ($r_r$) and latency ($r_l$) are amplified by a factor of 10. Conversely, to discourage poor decisions, we established a critical condition for operational success: if the available resources ($R_j$) fall short of the required demand ($R_{d_i}$), the agent incurs a penalty that reduces the rewards for provider reliability, latency minimization, and profit maximization by 10 points each. This framework encourages desired behaviors while penalizing actions that could compromise system performance.
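The reward shaping described above can be sketched in a few lines. This is a minimal illustration rather than the authors' code: the function and constant names are hypothetical, the three components are read as reliability minus 0.5, one minus the transmission time, and offer price minus server price, and applying the bonus before the penalty is an assumption about ordering.

```python
RELIABILITY_THRESHOLD = 0.5  # threshold stated in the text
BONUS_FACTOR = 10.0          # amplification when both rewards are positive
PENALTY = 10.0               # points removed from each part on a bad allocation


def reward_vector(relia_level: float, t_ij: float, offer_price: float,
                  price_ji: float, has_capacity: bool) -> list:
    """Return [r_r, r_l, r_p] for one allocation (illustrative sketch)."""
    r_r = relia_level - RELIABILITY_THRESHOLD  # reliability reward
    r_l = 1.0 - t_ij                           # latency reward
    r_p = offer_price - price_ji               # profit (savings) reward

    # Amplify reliability and latency when both are positive.
    if r_r > 0 and r_l > 0:
        r_r *= BONUS_FACTOR
        r_l *= BONUS_FACTOR

    # Penalize every part when the server lacks the requested resources.
    if not has_capacity:
        r_r -= PENALTY
        r_l -= PENALTY
        r_p -= PENALTY

    return [r_r, r_l, r_p]


# Hypothetical allocation: reliable, nearby server with enough capacity.
rewards = reward_vector(0.8, 0.4, 12.0, 10.0, has_capacity=True)
print([round(r, 2) for r in rewards])  # [3.0, 6.0, 2.0]
```

Because PPO in Stable-Baselines3 expects a scalar reward, the vector would still need to be combined (for example, summed) before training; how the authors combine it is not stated here.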

5.2. Proximal Policy Optimization as a Resource Allocation Method

The proposed DRL agent is based on the Proximal Policy Optimization (PPO) algorithm to solve the optimization problem (see Algorithm 1). PPO was chosen for its ability to efficiently handle complex and varied reward functions. It strikes a balance between exploration and exploitation, ensuring that the algorithm explores a wide range of possible solutions while still converging towards optimal policies. This is crucial in our multi-objective scenario, where different objectives, such as minimizing latency, enhancing reliability, reducing costs, and maximizing provider profit, must be balanced under stringent constraints such as resource availability, server capacity, and block creation time.
Algorithm 1 Resource Allocation with PPO
Input: resource_list, request_list
Output: allocation_list
for num_step in range(len(request_list)) do
    State ← get_state(num_step)
    action ← AgentPPO(State)
    if server is available then
        reward ← Calculate_reward(request, server)
        alloc ← allocate(server, request)
        allocation_list.append(alloc)
    end if
    num_step ← num_step + 1
end for
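The loop of Algorithm 1 can be made concrete with a small, self-contained sketch. Everything below is illustrative: the dictionary fields, the reduced feature set (disk, cores, RAM, price), and especially `first_fit_policy`, which is a simple stand-in for the trained `AgentPPO` and not the learned policy. The reward computation is omitted because it only matters during training.

```python
def get_state(requests, servers):
    """Build the dictionary state with "Request" and "Servers" matrices."""
    request_rows = []
    for req in requests:
        if req["status"] == 0:    # pending: expose the request features
            row = [req["disk"], req["cores"], req["ram"], req["offer_price"], 0]
        elif req["status"] == 1:  # already allocated: vector of zeros
            row = [0] * 5
        else:                     # wrong decision: vector of -1
            row = [-1] * 5
        request_rows.append(row)
    server_rows = [[s["disk"], s["cores"], s["ram"], s["price"]] for s in servers]
    return {"Request": request_rows, "Servers": server_rows}


def first_fit_policy(state, step):
    """Hypothetical stand-in for AgentPPO: pick the first server that fits."""
    disk, cores, ram = state["Request"][step][:3]
    for j, (s_disk, s_cores, s_ram, _price) in enumerate(state["Servers"]):
        if disk <= s_disk and cores <= s_cores and ram <= s_ram:
            return j
    return 0  # nothing fits; the loop below marks the request as failed


def allocate_all(requests, servers, policy):
    """Process requests in order and return a list of (request, server) pairs."""
    allocation_list = []
    for step, req in enumerate(requests):
        state = get_state(requests, servers)
        action = policy(state, step)
        server = servers[action]
        fits = (req["disk"] <= server["disk"]
                and req["cores"] <= server["cores"]
                and req["ram"] <= server["ram"])
        if fits:
            for key in ("disk", "cores", "ram"):  # reserve the resources
                server[key] -= req[key]
            req["status"] = 1
            allocation_list.append((step, action))
        else:
            req["status"] = -1
    return allocation_list


servers = [
    {"disk": 1024, "cores": 4, "ram": 8, "price": 20.0},
    {"disk": 4096, "cores": 16, "ram": 64, "price": 45.0},
]
requests = [
    {"disk": 900, "cores": 2, "ram": 8, "offer_price": 15.0, "status": 0},
    {"disk": 1500, "cores": 4, "ram": 16, "offer_price": 30.0, "status": 0},
    {"disk": 800, "cores": 2, "ram": 8, "offer_price": 12.0, "status": 0},
]
print(allocate_all(requests, servers, first_fit_policy))  # [(0, 0), (1, 1), (2, 1)]
```

Passing the policy as a callable keeps the loop unchanged if the stand-in is later replaced by a trained agent's prediction function.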

6. Experiments and Results

The main focus of the experiments is to assess the performance of the proposed DRL-based resource allocation algorithm. This algorithm was implemented using the Stable-Baselines3 library [28], chosen for its robustness, flexibility, and comprehensive documentation. This library offers optimized implementations of numerous reinforcement learning algorithms. We implemented the resource allocation algorithm and trained and tested the agent on a laptop with 6 cores, 16 GB RAM, and a 512 GB SSD under Windows 11.
To assess the performance of the proposed DRL-based resource allocation algorithm, we rely on a simulation that models real-world scenarios based on the validators’ needs and edge server characteristics. The simulation models a homogeneous environment where edge resources are geographically distributed, with validators in the same area requesting resources based on their operational needs. Table 1 and Table 2 outline the configuration parameters for edge servers and requests. The simulation, conducted in Python v. 3.10, uses a random distribution according to the specified configuration.
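A simulated environment of this kind can be generated with the standard library alone. The sketch below draws servers and requests from the ranges listed in Table 1 and Table 2; uniform sampling, the seeds, the field names, and reading the location range as [-5000, 5000] are assumptions, since the text only states that a random distribution is used.

```python
import random


def make_servers(n=50, seed=0):
    """Draw n edge servers from the ranges of Table 1 (uniform sampling assumed)."""
    rng = random.Random(seed)
    return [{
        "disk": rng.randint(1024, 8192),   # GB
        "ram": rng.randint(8, 64),         # GB
        "cores": rng.randint(4, 20),
        "price": rng.uniform(10, 50),      # total cost
        "bandwidth": 25,                   # Mbps, fixed in Table 1
        "loc": (rng.uniform(-5000, 5000), rng.uniform(-5000, 5000)),
        "reliability": rng.uniform(0, 1),
    } for _ in range(n)]


def make_requests(n=100, seed=1):
    """Draw n validator requests from the ranges of Table 2 (uniform sampling assumed)."""
    rng = random.Random(seed)
    return [{
        "disk": rng.randint(700, 2000),    # GB
        "ram": rng.randint(8, 24),         # GB
        "cores": rng.randint(2, 4),
        "loc": (rng.uniform(-5000, 5000), rng.uniform(-5000, 5000)),
    } for _ in range(n)]


servers = make_servers()
requests = make_requests()
print(len(servers), len(requests))  # 50 100
```

Fixing the seeds makes a run reproducible, while changing them yields a fresh set of requests and resources for each evaluation iteration.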

6.1. PPO-Based Approach Training Assessment

This section explores the evaluation of our algorithm during the learning process, followed by a comparison with the performance of two other reinforcement learning algorithms: Advantage Actor-Critic (A2C) [29] and Deep Q-Network (DQN) [30]. A2C was chosen for its balance of computational efficiency and performance in policy gradient methods, while DQN was selected for its strong track record in handling discrete action spaces using value-based methods. Table 3 presents the hyperparameters used for all three algorithms (our PPO-based algorithm, A2C, and DQN).
The curves in Figure 5 illustrate the performance evolution, in terms of rewards, of three reinforcement learning algorithms during the training phase: our algorithm based on PPO, A2C, and DQN. The curves in Figure 5a,b show the rewards over more than 16,000 episodes for the PPO and A2C agents. The blue line represents the raw reward per episode, while the orange line represents the moving average of rewards over every 5 episodes. Initially, the PPO and A2C agents start with negative rewards, meaning their initial attempts to solve the resource allocation problem are unsuccessful. Over time, as the agents learn by trial and error, the rewards increase markedly. Around episode 6000, the curve in Figure 5b shows that the A2C agent’s performance stabilizes, reaching a maximum value of approximately 850. Its performance stops improving once it reaches its allocation limit, unlike the PPO agent, which continues to improve slowly until episode 16,000.
The constant reward increase suggests an improved allocation quality for each request.
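The smoothed reward curve mentioned above can be reproduced with a short helper. This sketch reads "moving average of rewards for every 5 episodes" as a sliding window of 5 episodes, which is an assumption; the function name and the sample rewards are hypothetical.

```python
def moving_average(rewards, window=5):
    """Return the sliding-window mean of per-episode rewards."""
    if window <= 0:
        raise ValueError("window must be positive")
    averages = []
    running = 0.0
    for k, reward in enumerate(rewards):
        running += reward
        if k >= window:
            running -= rewards[k - window]  # drop the episode leaving the window
        if k >= window - 1:
            averages.append(running / window)
    return averages


# Hypothetical rewards that start negative and improve with training.
episode_rewards = [-20, -10, 0, 10, 20, 30, 40]
print(moving_average(episode_rewards))  # [0.0, 10.0, 20.0]
```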
Furthermore, comparing Figure 5a,c reveals that the PPO curve shows faster reward improvement than the DQN curve, which requires more than 60,000 episodes. The DQN agent also shows larger fluctuations, whereas the PPO agent follows a much more stable trajectory.
In summary, as demonstrated in this experiment, our PPO-based algorithm outperformed both A2C and DQN during the training phase.

6.2. PPO-Based Approach Test Assessment

In this experiment, we compare the results of using our agent with two other algorithms, A2C and DQN, to test the effectiveness of our approach in terms of total execution time, cost savings, total transmission time, and reliability.
We have created a new simulation with the same parameters mentioned in Table 1 and Table 2 for use in the test.
Figure 6 presents the experimental results comparing the performance of the three algorithms: our approach based on PPO, A2C, and DQN. Each algorithm was evaluated over 500 iterations, with each iteration using a new set of requests and a corresponding set of available resources.
  • Total execution time: The first graph shows the evolution of the total execution time as a function of the number of iterations for the three reinforcement learning algorithms: our approach, A2C, and DQN. Our PPO-based approach is distinguished by systematically shorter execution times, demonstrating superior efficiency. A2C shows intermediate execution times, stabilizing around 0.5 s, slightly higher than those of PPO, while DQN has the longest execution times of the three. This may be due to the value-based nature of DQN, which derives its policy from estimated Q-values rather than learning it directly and can be more computationally expensive than the policy-based methods used by PPO and A2C. However, it should be noted that from iteration 400 onward, the execution time of both PPO and A2C starts to increase and fluctuate.
  • Cost savings: When we analyze the savings achieved through price negotiation, we find that the three algorithms, PPO, A2C, and DQN, generate nearly equal average savings of between $1000 and $1600.
  • Transmission time: The third graph shows that the PPO algorithm has the shortest total transmission time, stabilizing at around 30 s. DQN has a higher transmission time, hovering around 40 s, while A2C has the longest transmission time, stabilizing at around 60 s.
  • Reliability: The last graph shows that the PPO algorithm has a more stable reliability rate, fluctuating between 75 and 80; the A2C and DQN algorithms, on the other hand, exhibit more variable reliability rates, fluctuating between 50 and 75.

7. Conclusions

The shift to PoS has provided mobile blockchain with new opportunities, but it has also raised issues due to the constrained resources of mobile devices. Validators depend on the availability and seamless functioning of nodes, which are impacted by these limitations. This study proposed a novel framework, MEC-Chain, that integrates Mobile Edge Computing (MEC) into a mobile blockchain network operating under the Proof of Stake (PoS) consensus mechanism. The proposed framework begins with the formulation of user needs and ends with task completion, helping to optimize task execution on edge servers. One unique feature of this design is its flexibility to adapt to the specific needs of validators, which is facilitated by both full and partial delegation modes. Additionally, the architecture includes a resource allocation agent that uses deep reinforcement learning (DRL), more precisely the Proximal Policy Optimization (PPO) algorithm, to balance the interests of mobile blockchain users and the MEC providers. We used multi-objective reinforcement learning constrained to a single strategy. To evaluate the effectiveness of our approach, we carried out simulations. The results demonstrate the ability of our PPO-based approach to perform quality resource allocations. A comparison with the A2C and DQN algorithms highlighted PPO’s superiority in server reliability and transmission time, but not in cost, which may be a limitation of our PPO model.
To further optimize our results, we plan to adopt a multi-strategy approach that enables the agent to learn and combine different allocation strategies in response to changing system conditions. For example, the Pareto strategy could help identify efficient solutions that balance key objectives such as cost and latency. This study also has two main limitations: the wireless channel model relies on average fading and does not capture fast variations or interference, and the evaluation remains simulation-based, as MEC-Chain has not yet been tested in a real MEC–PoS deployment. To address these limitations, future work will focus on integrating more realistic wireless models and evaluating MEC-Chain in real-world environments to assess its usability, adaptability, and potential for further optimization.

Author Contributions

R.G. designed the research framework and contributed to the supervision and overall project coordination. K.B. implemented the methodology and performed the experiments. S.E. conducted data analysis and contributed to the interpretation of the results. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data presented in this study are available on request from the corresponding author. (The data are not publicly available due to privacy or ethical restrictions.)

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Loukil, F.; Ghedira-Guegan, C.; Boukadi, K.; Benharkat, A.N.; Benkhelifa, E. Data privacy based on IoT device behavior control using blockchain. ACM Trans. Internet Technol. (TOIT) 2021, 21, 1–20.
  2. Grati, R.; Loukil, F.; Boukadi, K.; Abed, M. A blockchain-based framework for circular end-of-life vehicle processing. Clust. Comput. 2024, 27, 707–720.
  3. Feki, E.; Boukadi, K.; Loukil, F.; Abed, M. BELONG: Blockchain basEd pLatform fOr donation & social project fuNdinG. In Proceedings of the 2022 IEEE/ACS 19th International Conference on Computer Systems and Applications (AICCSA), Abu Dhabi, United Arab Emirates, 5–8 December 2022; pp. 1–8.
  4. Loukil, F.; Boukadi, K.; Hussain, R.; Abed, M. Ciosy: A collaborative blockchain-based insurance system. Electronics 2021, 10, 1343.
  5. Leng, J.; Yan, D.; Liu, Q.; Xu, K.; Zhao, J.L.; Shi, R.; Wei, L.; Zhang, D.; Chen, X. ManuChain: Combining permissioned blockchain with a holistic optimization model as bi-level intelligence for smart manufacturing. IEEE Trans. Syst. Man Cybern. Syst. 2019, 50, 182–192.
  6. Nakamoto, S. Bitcoin: A peer-to-peer electronic cash system. Decentralized Bus. Rev. 2008, 21260.
  7. Vasin, P. Blackcoin’s Proof-of-Stake Protocol v2. 2014; Volume 71. Available online: https://blackcoin.org/blackcoin-pos-protocol-v2-whitepaper.pdf (accessed on 5 September 2025).
  8. Kiayias, A.; Koutsoupias, E.; Kyropoulou, M.; Tselekounis, Y. Blockchain mining games. In Proceedings of the 2016 ACM Conference on Economics and Computation, Maastricht, The Netherlands, 24–28 July 2016; pp. 365–382.
  9. Liu, X.; Wang, W.; Niyato, D.; Zhao, N.; Wang, P. Evolutionary game for mining pool selection in blockchain networks. IEEE Wirel. Commun. Lett. 2018, 7, 760–763.
  10. Tang, C.; Li, C.; Yu, X.; Zheng, Z.; Chen, Z. Cooperative mining in blockchain networks with zero-determinant strategies. IEEE Trans. Cybern. 2019, 50, 4544–4549.
  11. Christidis, K.; Devetsikiotis, M. Blockchains and smart contracts for the Internet of Things. IEEE Access 2016, 4, 2292–2303.
  12. Chen, X.; Liu, G. Energy-efficient task offloading and resource allocation via deep reinforcement learning for augmented reality in mobile edge networks. IEEE Internet Things J. 2021, 8, 10843–10856.
  13. Zuo, Y.; Zhang, S.; Han, Y.; Jin, S. Computation resource allocation in mobile blockchain-enabled edge computing networks. In Proceedings of the 2020 IEEE/CIC International Conference on Communications in China (ICCC), Chongqing, China, 9–11 August 2020; pp. 617–622.
  14. Qiu, H.; Li, T. Auction method to prevent bid-rigging strategies in mobile blockchain edge computing resource allocation. Future Gener. Comput. Syst. 2022, 128, 1–15.
  15. ethereum.org. Proof-of-Stake (PoS) | Ethereum.org. 2023. Available online: https://ethereum.org/en/developers/docs/consensus-mechanisms/pos/ (accessed on 22 December 2023).
  16. Xiong, Z.; Zhang, Y.; Niyato, D.; Wang, P.; Han, Z. When mobile blockchain meets edge computing. IEEE Commun. Mag. 2018, 56, 33–39.
  17. Cortes-Goicoechea, M.; Mohandas-Daryanani, T.; Muñoz-Tapia, J.L.; Bautista-Gomez, L. The impact of connectivity and software in Ethereum validator performance. Clust. Comput. 2025, 28, 366.
  18. Abbas, N.; Zhang, Y.; Taherkordi, A.; Skeie, T. Mobile edge computing: A survey. IEEE Internet Things J. 2017, 5, 450–465.
  19. Liu, M.; Yu, F.R.; Teng, Y.; Leung, V.C.; Song, M. Computation offloading and content caching in wireless blockchain networks with mobile edge computing. IEEE Trans. Veh. Technol. 2018, 67, 11008–11021.
  20. Jiao, Y.; Wang, P.; Niyato, D.; Xiong, Z. Social welfare maximization auction in edge computing resource allocation for mobile blockchain. In Proceedings of the 2018 IEEE International Conference on Communications (ICC), Kansas City, MO, USA, 20–24 May 2018; pp. 1–6.
  21. Jiao, Y.; Wang, P.; Niyato, D.; Suankaewmanee, K. Auction mechanisms in cloud/fog computing resource allocation for public blockchain networks. IEEE Trans. Parallel Distrib. Syst. 2019, 30, 1975–1989.
  22. Zhang, K.; Gui, X.; Ren, D.; Du, T.; He, X. Optimal pricing-based computation offloading and resource allocation for blockchain-enabled beyond 5G networks. Comput. Netw. 2022, 203, 108674.
  23. Wang, Y.; Chen, C.R.; Huang, P.Q.; Wang, K. A new differential evolution algorithm for joint mining decision and resource allocation in a MEC-enabled wireless blockchain network. Comput. Ind. Eng. 2021, 155, 107186.
  24. Ding, J.; Han, L.; Li, J.; Zhang, D. Resource allocation strategy for blockchain-enabled NOMA-based MEC networks. J. Cloud Comput. 2023, 12, 142.
  25. Gu, J.; Liu, Y.; Xu, X. SharpEdge: A QoS-driven task scheduling scheme with blockchain in mobile edge computing. Concurr. Comput. Pract. Exp. 2024, 36, e8161.
  26. Fang, R.; Lin, P.; Liu, Y.; Liu, Y. Task offloading and resource allocation for blockchain-enabled mobile edge computing. IET Commun. 2024, 18, 1889–1899.
  27. Shannon, C.E. Coding theorems for a discrete source with a fidelity criterion. IRE Nat. Conv. Rec 1959, 4, 1.
  28. Raffin, A.; Hill, A.; Gleave, A.; Kanervisto, A.; Ernestus, M.; Dormann, N. Stable-Baselines3: Reliable Reinforcement Learning Implementations, 2021. Software. Available online: https://github.com/DLR-RM/stable-baselines3 (accessed on 5 September 2025).
  29. Mnih, V. Asynchronous Methods for Deep Reinforcement Learning. arXiv 2016, arXiv:1602.01783.
  30. Mnih, V.; Kavukcuoglu, K.; Silver, D.; Rusu, A.A.; Veness, J.; Bellemare, M.G.; Graves, A.; Riedmiller, M.; Fidjeland, A.K.; Ostrovski, G.; et al. Human-level control through deep reinforcement learning. Nature 2015, 518, 529–533.
Figure 1. MEC-Chain: a new framework for a MEC-enabled mobile blockchain network.
Figure 2. MEC-Chain process.
Figure 3. Exchanging requests and responses within the full delegation mode.
Figure 4. Exchanging requests and responses within the partial delegation mode.
Figure 5. Reward evaluation curves during the agents’ training phase: (a) PPO, (b) A2C, (c) DQN.
Figure 6. Comparison of performance between the PPO-based approach, A2C, and DQN algorithms.
Table 1. Edge environment simulation parameters.
| Parameter | Description | Value |
|---|---|---|
| $N_{edge}$ | Total number of edge servers | 50 |
| $Disk_j$ | Disk size (GB) | [1024, 8192] |
| $RAM_j$ | Memory RAM (GB) | [8, 64] |
| $Number\_cores_j$ | Number of cores | [4, 20] |
| $Cout_j$ | Total cost | [10, 50] |
| $B_j$ | Bandwidth rate (Mbps) | 25 |
| $Loc_j$ | Location (X, Y) | $X, Y \in [-5000, 5000]$ |
| $Relia\_level_j$ | Reliability level | [0, 1] |
Values in brackets indicate ranges.
Table 2. Parameters for simulating validator requests.
| Parameter | Description | Value |
|---|---|---|
| $N_{validator}$ | Number of validators | 100 |
| $Disk_i$ | Disk size (GB) | [700, 2000] |
| $RAM_i$ | Memory RAM (GB) | [8, 24] |
| $Number\_cores_i$ | Number of cores | [2, 4] |
| $Loc_i$ | Location (X, Y) | $X, Y \in [-5000, 5000]$ |
Values in brackets indicate ranges.
Table 3. Agents training parameters for PPO, A2C, and DQN.
| Parameter | Description | PPO | A2C | DQN |
|---|---|---|---|---|
| learning_rate | Step size for updating neural network weights | $3 \times 10^{-3}$ | $7 \times 10^{-4}$ | $10^{-4}$ |
| n_epochs | Number of times each mini-batch is used to update the model | 10 | 5 | 10 |
| clip_range | Clipping range for action probability ratios | $2 \times 10^{-1}$ | – | – |
| batch_size | Batch size | 64 | 64 | 32 |
| sde_sample_freq | Stochastic differential equation sampling frequency | −1 | −1 | −1 |
| vf_coef | Value function coefficient | $5 \times 10^{-1}$ | $5 \times 10^{-1}$ | $5 \times 10^{-1}$ |
| buffer_size | Size of the replay buffer | – | – | 1,000,000 |
Dashes (–) indicate parameters not applicable to the algorithm.

Share and Cite

MDPI and ACS Style

Grati, R.; Boukadi, K.; Elleuch, S. MEC-Chain: Towards a New Framework for a MEC-Enabled Mobile Blockchain Network Under the PoS Consensus. Future Internet 2025, 17, 563. https://doi.org/10.3390/fi17120563
