Article

MADDPG-Based Deployment Algorithm for 5G Network Slicing

1 School of Communication and Information Engineering, Xi’an University of Posts & Telecommunications, Xi’an 710121, China
2 School of Information Science and Technology, Northwest University, Xi’an 710169, China
3 School of Science, Xi’an University of Posts & Telecommunications, Xi’an 710121, China
4 School of Automation, Xi’an University of Posts & Telecommunications, Xi’an 710121, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(16), 3189; https://doi.org/10.3390/electronics13163189
Submission received: 5 July 2024 / Revised: 1 August 2024 / Accepted: 4 August 2024 / Published: 12 August 2024

Abstract: One of the core features of 5G networks is the ability to support multiple services on the same infrastructure, with network slicing being a key technology. However, existing network slicing architectures have limitations in efficiently handling slice requests with different requirements, particularly when addressing high-reliability and high-demand services, where many issues remain unresolved. For example, predicting whether actual physical resources can meet network slice request demands and achieving flexible, on-demand resource allocation for different types of slice requests are significant challenges. To address the need for more flexible and efficient service demands, this paper proposes a 5G network slicing deployment algorithm based on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG). Firstly, a new 5G network slicing deployment system framework is established, which measures resources for three typical 5G network slicing scenarios (eMBB, mMTC, uRLLC) and processes different types of slice requests by predicting slice request traffic. Secondly, by adopting the multi-agent approach of MADDPG, the algorithm enhances cooperation between multiple service requests, decentralizes action selection for requests, and schedules resources separately for the three types of slice requests, thereby optimizing resource allocation. Finally, simulation results demonstrate that the proposed algorithm significantly outperforms existing algorithms in terms of resource efficiency and slice request acceptance rate, showcasing the advantages of multi-agent approaches in slice request handling.

1. Introduction

With the significant growth in mobile data traffic, the application of 5G network technology across a wide range of services has become increasingly important. The International Telecommunication Union (ITU) defines three typical service scenarios: enhanced mobile broadband (eMBB), ultra-reliable low-latency communication (uRLLC), and massive machine-type communication (mMTC) [1]. A core characteristic of 5G networks is their ability to provide different services on the same infrastructure. According to [2,3,4,5], deploying network slices using Network Function Virtualization (NFV) realizes the advantages of differentiated services: Virtual Network Functions (VNFs) ensure platform flexibility and scalability while reducing operational costs. Studies [6,7,8,9,10,11,12] have investigated the use of NFV technology in network slicing deployment; by monitoring the resource status of VNFs and considering factors such as load and bandwidth, such optimization can reduce network latency and improve resource utilization.
The authors of [13,14,15,16] analyze the virtual network mapping scenario, optimizing the mapping process for nodes and links and modeling the Virtual Network Embedding (VNE) problem to satisfy node and link constraints while optimizing physical resource allocation to enhance deployment success rates. The authors of [17,18,19,20,21] explore solutions to the resource allocation problem based on Markov chains, big data analysis, and queuing theory. The authors of [22,23,24] employ a near-optimal Integer Linear Programming (ILP) approach to the network slicing deployment problem; however, this method is slow on large-scale problem instances and can only be used in offline scenarios. Owing to the excellent performance of the Deep Deterministic Policy Gradient (DDPG) algorithm on problems with continuous action spaces, the authors of [25,26,27] use DDPG for dynamic resource allocation to optimize network performance. However, these heuristic and reinforcement learning approaches have limitations in addressing the separation between slice requests and resource allocation: they may struggle to schedule resources for multiple correlated requests and to adopt the more conservative criteria needed for highly reliable and demanding services.
To implement network slicing requests efficiently on physical networks while providing high reliability and quality of service, this paper proposes a 5G network slicing deployment algorithm based on the Multi-Agent Deep Deterministic Policy Gradient (MADDPG). The algorithm aims to optimize the deployment process for different types of slicing requests. The main contributions of this paper are as follows: (1) A new 5G network slicing deployment system framework is established, which measures resources for three typical 5G network slicing scenarios (eMBB, mMTC, uRLLC) and addresses different types of slicing requests by predicting slice request traffic. (2) The proposed algorithm applies multi-agent deep reinforcement learning to enhance cooperation among agents; it trains the action selection for slice requests and performs resource scheduling for each slicing request separately to achieve the optimal resource allocation reward. Simulation results show that the proposed algorithm performs well in terms of resource utilization and slice deployment success rate, demonstrating its effectiveness and advantages.

2. System Model

As shown in Figure 1, the following two modules have been created for the deployment framework of 5G network slicing: a prediction module and a scheduling module.
  • First, an arriving network slicing request (NSR) reaches the prediction module, which characterizes the user request based on traffic prediction and then decides whether to accept or reject the NSR by combining the amount of resources in the scheduling module with the state of the physical network at the previous moment.
  • If the prediction module accepts the NSR, it notifies the scheduling module to schedule the resources and embed the request in the physical network; otherwise, the request is refused.
  • Upon receiving the notification, the scheduling module instructs the physical network at the next moment to virtualize the deployment targets for the three different types of slices and to provision the corresponding resources.

2.1. Physical Network

We unify the various hardware facilities into physical nodes that provide physical resources. The underlying physical network consists of a network of virtual machines and can be represented by an undirected graph $G_S = (N_S, L_S)$, where $N_S$ denotes the set of physical nodes and $L_S$ the set of bidirectional physical links. Each physical node $n_S \in N_S$ is characterized by a set of attributes $R_n$ that help determine whether it can host a virtual node.

2.2. NSR

Three types of NSR based on 5G services are considered: eMBB, mMTC, and uRLLC. We denote each NSR as an undirected graph $G_V = (N_V, L_V)$, where $N_V$ denotes the set of virtual nodes and $L_V$ the set of virtual links in the NSR. The requirements of each virtual node or virtual link typically relate to the same parameters as those of the physical nodes or links. The legend for the symbols is shown in Table 1 below:

2.3. System Framework for 5G Network Slicing Deployment

2.3.1. User Traffic Characteristics

Traffic prediction is based on the aggregation of each accepted tenant. Tenants may request different network slices according to their specific service requirements, without specifying the set of physical cells where their users are located. In fact, a particular vertical tenant may only require a small number of cells to provide specific services for its users. For example, the automotive industry may only need cells that cover suburban roads. Our approach involves classifying traffic requests according to relevant service requirements and geographic locations, thus enabling separate predictions for each cell and each slice. In our analysis, we initially assume that traffic requests are uniformly distributed across the entire network.
We assume the following traffic model, with different classes of traffic based on specific SLAs, as shown in Table 2 [28]. Let the traffic volume of tenant $i$ for traffic class $k$ be modeled as a point process, $\zeta_i^{(k)} = \sum_{t=0}^{T} \delta_t \sum_{c \in \Psi} r_{i,c}^{(k)}$, where $\delta_t$ denotes the Dirac measure for sample $t$ and $\Psi$ is the set of cells. We express each user's traffic request $r_{i,c}^{(k)}(t)$ as the required resource metric and, for each user $i$, express the aggregate traffic as $r_i^{(k)} = \sum_{c \in \Psi} r_{i,c}^{(k)}$.
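To make the aggregation concrete, the following is a minimal sketch of the per-cell aggregation $r_i^{(k)} = \sum_{c} r_{i,c}^{(k)}$; the dict-of-dicts layout is an illustrative assumption, not the paper's data model:

```python
def aggregate_traffic(per_cell_requests):
    """Aggregate one user's per-cell traffic requests r_{i,c}^{(k)} into
    r_i^{(k)}, keyed by traffic class k (illustrative data layout)."""
    totals = {}
    for cell_requests in per_cell_requests.values():
        for k, r in cell_requests.items():
            totals[k] = totals.get(k, 0.0) + r
    return totals
```

For example, a user with eMBB demand in two cells and uRLLC demand in one cell would have its per-class totals summed over all cells.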

2.3.2. Traffic Forecasting

In our model, a key assumption is that traffic requests follow a cyclical pattern, which is essential for applying time series forecasting algorithms. Based on the assumed periodicity, traffic forecasting is conducted over an observed time window $T_0$ and is represented by a vector $r_i^{(k)} = (r_i^{(k)}(t - T_0), r_i^{(k)}(t - T_0 + 1), \ldots, r_i^{(k)}(t))$. We then fix a future time window $T_F$ and obtain the predicted traffic for the period $[t+1, t+T_F]$, $\hat{r}_i^{(k)} = (\hat{r}_i^{(k)}(t+1), \hat{r}_i^{(k)}(t+2), \ldots, \hat{r}_i^{(k)}(t+T_F))$, through a forecast function $f$. As the observed time window $T_0$ extends and more information is collected, the accuracy of the traffic predictions for each user's slice over the future time window $T_F$ increases.
Under the above assumptions, the system exhibits periodic behavior, where $W_s$ represents a season that repeats over time. Within a single season, we assume that the process $\zeta_i^{(k)}$ is stationary and ergodic [29]:
$$\bar{\mu}_i^{(k)} = \frac{1}{Z} \sum_{z=0}^{Z} X[z] = \frac{1}{T} \sum_{t=0}^{T} r_{i,k}[t]$$
where $\bar{\mu}_i^{(k)}$ expresses the average traffic request for the $k$-th type of traffic of the $i$-th user, $Z$ represents the number of units, and $r_{i,k}[t]$ is defined as the $k$-th type of traffic request of the $i$-th user at time $t$.
To this end, we use the Holt–Winters (HW) forecasting method to analyze and predict future traffic requests associated with specific network slices across all selected cells. The definition of the forecasting function f H is as follows:
$$f_H : \mathbb{R}^{T_0 + 1} \to \mathbb{R}^{T_F}, \quad r_i^{(k)} \mapsto \hat{r}_i^{(k)}$$
We denote a specific predicted traffic request $\hat{r}_i^{(k)}(t)$ by $\hat{r}_i^{(k)}$. We use the additive version of the HW forecasting model, as the seasonal effect does not depend on the mean traffic level of the observed time window but is instead added to values predicted through the level and trend effects. Following the standard HW procedure and assuming a seasonality frequency $W$ based on traffic characteristics, we can predict such requests using the level $s_t$, trend $b_t$, and seasonal $u_t$ factors, as follows:
$$\hat{r}_{i,t+T_F}^{(k)} = s_t + b_t T_F + u_{t + T_F - W(k+1)}$$
$$s_t = \alpha \big( r_{i,t}^{(k)} - u_{t-W} \big) + (1 - \alpha)(s_{t-1} + b_{t-1})$$
$$b_t = \beta (s_t - s_{t-1}) + (1 - \beta) b_{t-1}$$
$$u_t = \gamma \big( r_{i,t}^{(k)} - s_{t-1} - b_{t-1} \big) + (1 - \gamma) u_{t-W}$$
where k denotes the integer part of ( T F 1 ) / W and the set of optimal HW parameters α , β , and γ can be obtained during a training period employing existing techniques. We focus on how prediction errors and inaccuracies affect our network slicing solutions. Inaccurate predicted traffic values may lead to incorrect embedding of network slices, which in turn may fail to adapt to the system’s capacity, resulting in degraded service quality. Therefore, the training prediction error is further defined as follows:
$$m_{i,t}^{(k)} = r_{i,t}^{(k)} - \hat{r}_{i,t}^{(k)} = r_{i,t}^{(k)} - (s_{t-1} + b_{t-1} + u_{t-1})$$
which can be computed during the training of the prediction algorithm. Given that our process $\zeta_i^{(k)}$ is stationary and assuming an optimal set of HW parameters, for any predicted value at time $y$ we can derive a prediction interval $[\hat{l}_{i,y}^{(k,\eta)}, \hat{h}_{i,y}^{(k,\eta)}]$ that contains the future traffic request of that particular network slice with probability $\eta_i^{(k)}$. Therefore, it is considered that
$$\Pr \big\{ \hat{l}_{i,y}^{(k,\eta)} \le \hat{r}_{i,y}^{(k)} \le \hat{h}_{i,y}^{(k,\eta)} \big\} = \eta_i^{(k)}, \quad \forall y \in [t+1, t+T_F]$$
where $\hat{h}_{i,y}^{(k,\eta)}$ (or $\hat{l}_{i,y}^{(k,\eta)}$) $= \hat{r}_{i,y}^{(k)} + (-)\, \Omega_\eta \sqrt{Var(m_{i,y}^{(k)})}$, $\Omega_\eta$ denotes the one-tailed value of a standard normal distribution, $Var(m_{i,y}^{(k)}) \approx \big(1 + (y-1)\,\alpha^2 \big[1 + y\beta + \tfrac{y(2y-1)}{6}\beta^2\big]\big)\,\sigma_m^2$, $\eta_i^{(k)}$ expresses the probability, and $\sigma_m^2$ is the variance of the one-step training prediction error over the observed time window, i.e., $\sigma_m^2 = Var(m_{i,t}^{(k)})$.
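The additive HW level/trend/seasonal recursions above can be sketched directly. The following is a minimal illustration, not the paper's implementation; the initialization of the level, trend, and seasonal factors is a simple convention assumed here:

```python
def holt_winters_additive(r, W, alpha, beta, gamma, horizon):
    """Additive Holt-Winters smoothing of one traffic series r with season
    length W, returning `horizon` future forecasts. The initialization of
    the level, trend, and seasonal factors is a naive convention."""
    s = float(r[0])                        # level
    b = float(r[1] - r[0])                 # trend
    season_mean = sum(r[:W]) / W
    u = [x - season_mean for x in r[:W]]   # seasonal factors (naive init)
    for t in range(W, len(r)):
        s_prev, b_prev = s, b
        s = alpha * (r[t] - u[t - W]) + (1 - alpha) * (s_prev + b_prev)
        b = beta * (s - s_prev) + (1 - beta) * b_prev
        u.append(gamma * (r[t] - s_prev - b_prev) + (1 - gamma) * u[t - W])
    # forecast h steps ahead: level + trend * h + matching seasonal factor
    return [s + b * h + u[len(u) - W + (h - 1) % W]
            for h in range(1, horizon + 1)]
```

In practice, the smoothing parameters $\alpha, \beta, \gamma$ would be fitted over a training period, as the text describes.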
Owing to the requirements imposed by traffic SLAs, we focus only on the upper bound of the prediction interval, as it provides the "worst case" of a forecasted traffic level. Best-effort traffic requests with no stringent requirements can tolerate a forecast with a longer time pace that yields imprecise values; this makes the upper bound $\hat{h}_{i,y}^{(k,\eta)}$ very close to the future value $r_{i,y}^{(k,\eta)}$ regardless of the error probability $\eta_i^{(k)}$, and the number of predicted values $y$ is finite. On the other hand, when bit-rate traffic requirements must be met within a shorter period, the forecasting process becomes more complex and requires more predicted values $y$; such traffic needs to be modeled with a higher prediction error probability $\eta_i^{(k)}$.
According to the traffic categories defined in Table 1, traffic category $k = 0$ requires a shorter prediction range than the other categories, so further predictions are necessary, for which an upper bound on the per-user prediction error probability for that category is derived. We define the maximum gain between slice requests and predicted traffic requests as $\hat{d}_i^{(k)} = \max_{y \in T_F} \big( R_i^{(k)} - \hat{r}_{i,y}^{(k)} \big)$. The prediction error probability is then obtained from
$$\eta_i^{(k=0)}: \quad \Omega_\eta \sqrt{Var(m_{i,y}^{(k=0)})} = \hat{d}_i^{(k=0)}$$
As soon as the potential gain $\hat{d}_i^{(k=0)}$ becomes very large, we cap the one-tailed value $\Omega_\eta$ at 3.49, resulting in $\eta_i^{(k=0)} = 99.9\%$. Conversely, for best-effort traffic ($k = 4$), we set the forecasting error probability to $\eta_i^{(k=|K|)} = 50\%$ owing to its more relaxed service requirements. For the other traffic classes $k$, intermediate forecasting error probabilities $\eta_i^{(k)}$ are calculated from (6) by deriving the $\hat{d}_i^{(k)}$ values from the upper and lower bound values.
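The capping of the one-tailed value can be illustrated as follows. `prediction_error_probability` is a hypothetical helper (not the paper's code) that inverts the relation $\Omega_\eta \sqrt{Var(m)} = \hat{d}$ for the z-value and maps it through the standard normal CDF:

```python
from statistics import NormalDist

def prediction_error_probability(gain, var_m, cap=3.49):
    """Invert Omega_eta * sqrt(Var(m)) = d_hat for the one-tailed z-value,
    cap it at 3.49 (~99.9%), and map it through the standard normal CDF.
    Illustrative helper, not the paper's exact procedure."""
    omega = min(gain / var_m ** 0.5, cap)
    return NormalDist().cdf(omega)
```

For a gain of zero this recovers the best-effort value of 50%, while very large gains saturate at roughly 99.9%, matching the capping rule in the text.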

2.3.3. Scheduling Model

We define a network slicing request as $\omega_i^{(k)} = \{R, T, i, k\}$, where $R$ is the amount of resources required, $T$ is the duration of the slice, $i$ denotes the tenant, and $k$ is the traffic class. In general, the user requests are denoted as $R_i^{(k)}(T_i)$, and the traffic classes are predicted over a fixed time window $T_F$. We represent the physical network resources as a rectangular box of limited capacity, where the width corresponds to $T_F$ and the height to the total amount of available resources $\vartheta$. The cost of each network slice request $s_i$ is proportional to the amount of resources requested, $R_i$. This resource allocation problem is NP-hard.
The amount of resources required at time $y$ for predicted traffic type $k$ (with a given prediction error probability $\eta_i^{(k)}$) is denoted by $\hat{R}_{i,y}^{(k)} = \hat{h}_{i,y}^{(k,\eta)}$, and $\hat{r}_{i,y}^{(k)}$ denotes the request of user $i$ for category $k$. The five traffic categories shown in Table 1 are considered, each characterized by a time window $T^{(k)}$ that identifies the duration $[y, y + T^{(k)}]$ over which the category's traffic metric must be met; this window is shorter for highly demanding traffic categories and longer for more moderate ones. The scheduling module grants the appropriate amount of resources $\hat{R}_{i,y}^{(k)}$ to meet the traffic metric within this duration.
The scheduling module thus expects the network slice traffic level to be no lower than the predicted traffic boundary, $r_{i,y}^{(k)} > \hat{R}_{i,y}^{(k)}, \forall y \in T_i$. Its key goal is to minimize the consumed resources, i.e., minimize the cost, while guaranteeing the SLA of the traffic within the network slice, by setting a time window $T_s$ that encompasses the time windows of all classes, $T_s \ge T^{(k)}$. The mathematical model of the scheduling module is established as follows:
$$\begin{aligned} \min \; & \sum_{n \in T_s} \big( z_{i,n}^{(k)} + \xi P_{i,n}^{(k)} \big) \\ \text{s.t.} \; & \Big( \sum_{n=y_k}^{y_k + T^{(k)}} z_{i,n}^{(k)} \Big) \ge r_{i,y}^{(k)} x_i^{(k)}, \quad \forall y \in [0, T_i - T^{(k)} - 1] \\ & z_{i,n}^{(k)} \le \vartheta + P_{i,n}^{(k)}, \quad \forall n \in T \\ & z_{i,n}^{(k)} \in \mathbb{R}, \quad \forall i \in N,\; n \in T,\; k \in \tau \end{aligned}$$
where $\vartheta$ is the total capacity of physical resources; $P_{i,n}^{(k)}$ is the penalty generated by failing to meet the SLA of the user's network slicing service, used as feedback to the prediction; $\xi$ is a large constant factor set to ensure that reducing $P_{i,n}^{(k)}$ always has the highest priority; and $x_i^{(k)}$ indicates whether the request is accepted (1) or refused (0).
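A minimal sketch of the objective and of the windowed demand constraint, assuming per-slot allocations and penalties are stored as plain lists (the function names are illustrative):

```python
def scheduling_objective(z, P, xi=1000.0):
    """Objective of the scheduling model: allocated resources z_{i,n} plus
    SLA penalties P_{i,n} weighted by a large constant xi, so that reducing
    penalties always dominates (values are illustrative)."""
    return sum(z_n + xi * p_n for z_n, p_n in zip(z, P))

def demand_met(z, demand, window):
    """Check that within every window of length T(k), the allocated
    resources cover the requested amount."""
    return all(sum(z[y:y + window]) >= demand
               for y in range(len(z) - window + 1))
```

With `xi` large, any schedule incurring a penalty costs more than any penalty-free schedule, mirroring the priority rule described above.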
The prediction error probability for the general flow category can be derived from Equation (6):
$$\eta_i^{(k)}: \quad h_i^{(k)}\, \Omega_\eta \sqrt{Var(m_{i,y}^{(k)})} = \hat{d}_i^{(k)}$$
where $h_i^{(k)}$, a historical penalty function, is defined as follows:
$$h_i^{(k)} = \frac{m\, n_m}{W_c + n_m}$$
where $n_m$ is defined as the number of times the penalty is zero, $P_{i,j}^{(k)} = 0, \forall j$, and $W_c$ is defined as the length of the season in the forecasting process. The historical penalty function sets the control policy that keeps the system from SLA violations: it derives a larger prediction error probability $\eta_i^{(k)}$ in the case of prediction failure and obtains a smaller gain, preventing the waste of resources. The deployment objectives of the three slice types determine the metric $\delta_t$ of each user traffic request in the prediction model.
eMBB: Enhanced mobile broadband is a key application scenario of 5G technology, aiming to provide wide-area coverage and hotspot connectivity. In hotspot scenarios, where user density is high, there is a need for very high traffic capacity, while the requirements for mobility are lower and user data rates are higher. This type of slice does not require strict delay constraints or abundant resources. Therefore, the deployment goal of eMBB slices should be to maximize the remaining resources on physical nodes, which can be represented as follows:
$$\max \Big[ \sum_{n_s \in N_s} C_s(n_s) - \sum_{n_v \in N_v} C_v(n_v)\, \mu_{n_v, n_s} \Big]$$
where $n_s$ and $n_v$ represent the physical and virtual nodes, respectively, $C_s(n_s)$ denotes the resource capacity of a physical node, $C_v(n_v)$ denotes the resource demand of a virtual node, and $\mu_{n_v, n_s}$ represents the mapping relationship of virtual node $n_v$ onto physical node $n_s$.
mMTC: Massive machine-type communications (mMTC) primarily address the large-scale connectivity needs of Internet of Things (IoT) devices, characterized by a vast number of connected devices, typically transmitting low volumes of data with latency insensitivity. Due to the need to handle a large number of connections, there is a high demand for computational resources and low congestion rates. Consequently, this type of slice aims to minimize bandwidth usage on physical links. In other words, it should maximize the remaining bandwidth on physical links. Therefore, the deployment objective for mMTC slices can be expressed as follows:
$$\max \Big[ \sum_{l_i \in L_i} B_i(l_i) - \sum_{l_v \in L_v} B_v(l_v) \cdot \rho_{l_v, l_i} \Big]$$
where $l_i$ and $l_v$ represent the sets of physical and virtual links, respectively, $B_i(l_i)$ represents the bandwidth of the physical link, $B_v(l_v)$ denotes the bandwidth of the virtual link, and $\rho_{l_v, l_i}$ represents the number of virtual links corresponding to the physical link.
uRLLC: Ultra-reliable low-latency communication aims to provide extremely high reliability and very low communication latency to meet the needs of applications requiring high real-time performance and stability. uRLLC has strict latency requirements, and the deployment objective should be to minimize the slice’s latency. We translate latency time into hops, so minimizing latency is equivalent to minimizing the length of each physical path. Therefore, the deployment objective for uRLLC slices is as follows:
$$\min \Big[ \sum_{l_v \in L_v} \rho_{l_v, l_i} \Big]$$
The constraints corresponding to the overall deployment objectives are as follows:
$$\sum_{n_v \in N_v} \mu_{n_v, n_i} = 1, \quad \forall n_i \in N_i$$
$$\sum_{n_i \in N_i} \mu_{n_v, n_i} \le 1, \quad \forall n_v \in N_v$$
$$\sum_{l_i \in L_i} B_i(l_i) \le B_i$$
The above constraints express the following: (a) each virtual node is mapped to only one physical node; (b) each physical node hosts at most one virtual node; (c) the bandwidth occupied on each link does not exceed the total available bandwidth.
In order to meet the demands of the virtual nodes, it is necessary to add two constraints to ensure that the resource capacity of each physical node can meet the total demand of the virtual nodes mapped to it. At the same time, we ensure that the available computing resources of the physical nodes mapping the virtual nodes are not less than the demands of the virtual nodes, which can be expressed as follows:
$$\sum_{n_v \in N_v} \mu_{n_v, n_i}\, C_v(n_v) \le C_i(n_i), \quad \forall n_i \in N_i$$
$$\sum_{n_i \in N_i} \mu_{n_v, n_i}\, C_i(n_i) \ge C_v(n_v), \quad \forall n_v \in N_v$$
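Taken together, the mapping and capacity constraints can be verified programmatically. The sketch below assumes a binary mapping matrix `mu[v][s]` and plain-list capacities; it is an illustration, not the paper's code:

```python
def mapping_is_valid(mu, C_v, C_i):
    """mu[v][s] = 1 if virtual node v is mapped onto physical node s.
    Checks constraints (a), (b) and the node-capacity constraint
    (illustrative data layout)."""
    V, S = len(mu), len(mu[0])
    # (a) each virtual node is mapped to exactly one physical node
    if any(sum(row) != 1 for row in mu):
        return False
    # (b) each physical node hosts at most one virtual node
    if any(sum(mu[v][s] for v in range(V)) > 1 for s in range(S)):
        return False
    # capacity: demand mapped onto a physical node must not exceed its capacity
    return all(sum(mu[v][s] * C_v[v] for v in range(V)) <= C_i[s]
               for s in range(S))
```

Such a check is useful as a sanity filter before a candidate mapping is accepted by the deployment algorithms of Section 3.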

3. Algorithm Design

We model the network slicing agents as multiple agents that update their own policies, and we train them via an MDP formulation to ensure a stable environment. The Markov decision process of $N$ agents can be represented as a tuple $(S, A, r_1, r_2, \ldots, r_n, p, \gamma)$, where $S = \{s_1, \ldots, s_n\}$ is the state space and $A = \{a_1, \ldots, a_n\}$ is the action space. In the whole system, $r_n$ is the reward function of the $n$-th slicing agent (the amount of resources that the slicing agent requests and expects to be allocated), and $p$ is the transition probability: when all agents in the current state $s_n \in S$ take actions $a_n \in A$ and transfer to a new state $s_n' \in S$, the transition probability is $p(s_n' \mid s_n, a_1, \ldots, a_n)$. The learning goal of each agent is then to maximize the cumulative return:
$$R_N = \sum_{n=0}^{T_s} \gamma^n r_n$$
where $\gamma^n$ represents the discount factor $\gamma$ raised to the power of the time step $n$. The discount factor, which typically takes values between 0 and 1, balances immediate and future rewards by assigning higher weight to rewards closer to the present. $r_n$ represents the immediate reward obtained by the agent at time step $n$.
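The discounted cumulative return can be computed with a straightforward loop (a generic sketch, not specific to the paper's agents):

```python
def discounted_return(rewards, gamma):
    """Sum gamma^n * r_n over a reward sequence, as in the cumulative
    return formula above."""
    total, g = 0.0, 1.0
    for r in rewards:
        total += g * r
        g *= gamma
    return total
```

For example, rewards `[1, 1, 1]` with `gamma = 0.5` give `1 + 0.5 + 0.25 = 1.75`.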
$T_s$ is the time window containing all categories of slice requests, and $\gamma \in [0, 1)$ is the discount factor. Each slicing agent is trained centrally and executed in a decentralized manner. Denoting the policies of all agents as $\mu = \{\mu_1, \ldots, \mu_n\}$, the action-value function of agent $n$ is as follows:
$$Q_n^{\mu}(s, a) = \mathbb{E}_{s', r_n} \big[ r_n + \gamma\, Q_n^{\mu}(s', \mu(s')) \big]$$
The gradient of the expected payoff J n = E [ R n ] of an agent n is as follows:
$$\nabla_{\theta_n^{\mu}} J_n \approx \mathbb{E}_{s, a \sim D} \Big[ \nabla_{\theta_n^{\mu}} Q_n(s, a \mid \theta_n^{Q}) \Big|_{a_n = \mu(s_n \mid \theta_n^{\mu})} \Big]$$
where $\theta_n^{\mu}$ represents the parameters of the policy network of agent $n$ and $\theta_n^{Q}$ represents the parameters of the action-value network of agent $n$. $D$ is the experience pool in which the historical information of each agent is stored, $s$ and $a$ represent the state and action, respectively, and $\mu(s_n \mid \theta_n^{\mu})$ represents the action generated by the policy network $\mu$ in state $s_n$. In the MADDPG-based 5G network slicing deployment algorithm, the actor and critic networks of all agents are represented as $\mu = \{\mu_1, \ldots, \mu_n\}$ and $Q = \{Q_1, \ldots, Q_n\}$.

3.1. State Space

The real-time resource situation of each physical node in the 5G physical network constitutes the state space. We define two feature vectors, $\varphi_S(G_S, t)$ and $\varphi_V(G_V, t)$, that represent the physical network state and the network slice request state at time $t$.
$\varphi_S(G_S, t)$ consists of the following characteristics: (1) number of physical nodes $N$: the number of physical nodes $N(t) = \{n \in N_S \mid A_N(n, t) > 0\}$ with available resources at time $t$; (2) number of physical links $L$: the number of physical links $L(t) = \{l \in L_S \mid A_L(l, t) > 0\}$ with available bandwidth capacity at time $t$; (3) free/occupied resources: the total amount of available/occupied resources in the physical network at time $t$; (4) available/unavailable bandwidth: the total available/unavailable bandwidth at time $t$.
$\varphi_V(G_V, t)$ consists of the following characteristics: (1) number of virtual nodes $N_V$: the number of virtual nodes at time $t$; (2) number of virtual links $L_V$: the number of virtual links at time $t$; (3) user traffic request $r_i^{(k)}$: the request expressed as a specific requested resource metric; (4) the traffic category corresponding to the type of slice request.

3.2. Action Space

The action at each moment $t$ contains two parts: one is $x_i^{(k)}$, which is 1 if the slice request is accepted and 0 otherwise; the other is the resource allocation for the three slice types. Each action in the action set is therefore denoted as $a = \{x, \phi\}$.
$\phi \in \{\phi_1, \phi_2, \phi_3\}$ is the resource allocation action for the three types of slices. It first requires selecting nodes in the physical network to host the virtual nodes of the slicing requests while satisfying the capacity requirements. Considering the degree and centrality of nodes, we first assess node importance. The node degree and centrality are normalized; the node degree is expressed as follows:
$$d_i = \frac{d_i}{N - 1}$$
Node centrality is expressed as follows:
$$b_i = \frac{2 b_i}{(N - 1)(N - 2)}$$
Based on the above normalization criteria, combined with the resources and structure of the nodes, the node importance is defined as follows:
$$NR(i) = C(i) \sum_{l \in L_i} B(l)$$
$$NI(i) = NR(i) \cdot \frac{d_i + b_i}{2}$$
$NR(i)$ represents the resources of the node, including the capacity $C(i)$ of the node, the set $L_i$ of links connected to node $i$, and the available bandwidth $B(l)$ of link $l$. Based on node importance, the virtual nodes are sorted using Algorithm 1, and node mapping is then performed.
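The normalizations and the importance formula above can be combined into a small helper; the argument names are illustrative assumptions:

```python
def node_importance(deg, betw, cap, link_bw, N):
    """NI(i) = NR(i) * (d_i + b_i) / 2, using the normalizations above.
    deg/betw are node i's raw degree and betweenness centrality; cap is its
    capacity C(i); link_bw lists the available bandwidths of its links."""
    d_norm = deg / (N - 1)
    b_norm = 2 * betw / ((N - 1) * (N - 2))
    nr = cap * sum(link_bw)   # NR(i) = C(i) * sum of attached link bandwidths
    return nr * (d_norm + b_norm) / 2
```

For instance, in a 4-node network, a node with degree 2, betweenness 1, capacity 10, and attached bandwidths 1 and 2 has importance 15.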
Algorithm 1. Virtual node sorting algorithm
Input: N v , the set of virtual nodes requested by the slice.
Output: N I ( i ) , the sequence of the sorted virtual nodes.
(1) Calculate the importance degree $NI(i)$ of each virtual node;
(2) Sort the virtual nodes in non-increasing order according to their importance values;
(3) Select the virtual node with the highest importance value as the root node, labeled as $P$;
(4) Traverse the network slice request graph using the BFS algorithm to obtain the BFS tree $T$;
(5) Sort the virtual nodes in each layer of $T$ in non-increasing order according to their importance values;
(6) Return $NI(i)$.
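A possible Python rendering of Algorithm 1, assuming an adjacency dict and precomputed importance values (a sketch, not the paper's implementation):

```python
def sort_virtual_nodes(adj, importance):
    """Algorithm 1 sketch: BFS from the most important node, emitting each
    BFS layer sorted by non-increasing importance. `adj` maps a node to its
    neighbours; `importance` maps a node to its NI(i) value."""
    root = max(importance, key=importance.get)
    order, visited, layer = [], {root}, [root]
    while layer:
        order.extend(sorted(layer, key=importance.get, reverse=True))
        nxt = []
        for v in layer:
            for w in adj[v]:
                if w not in visited:
                    visited.add(w)
                    nxt.append(w)
        layer = nxt
    return order
```

The resulting order is consumed by the node-mapping steps of Algorithms 2-4.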
Secondly, for the three types of slicing requests, the node mapping and link mapping are also classified into three specific types. We use Algorithms 2–4 to complete this process, and the details of the algorithms are as follows:
Algorithm 2. eMBB slice deployment algorithm
Input: Network slicing request and physical network.
Output: Success or failure of the node mapping process.
(1) Sort the virtual nodes $v \in N_v^i$ according to Algorithm 1;
(2) For $v \in N_v^i$, do
(3) If $v$ is the root node
(4) Map $v$ to the physical node with the highest physical resource capacity;
(5) Else
(6) Find the parent virtual node $P$ of virtual node $v$ in tree $T$;
(7) Find the physical node $P_P$ to which $P$ is mapped;
(8) Set the physical nodes adjacent to $P_P$ as $Q$;
(9) Map $v$ to the physical node in $Q$ with the highest physical resources;
(10) End;
(11) End;
(12) Sort the virtual links $(i, j) \in L_v^i$ in non-increasing order according to the bandwidth requirements of the network slice request;
(13) For $(i, j) \in L_v^i$, do
(14) Remove the physical links that cannot meet the bandwidth requirements;
(15) Find the physical nodes $a, b$ that map virtual nodes $i, j$;
(16) Map the virtual link $(i, j)$ to the shortest physical path between $a, b$;
(17) End.
Algorithm 3. mMTC slice deployment algorithm
Input: Network slicing request and physical network.
Output: Success or failure of link mapping process.
(1) Sort the virtual links $(i, j) \in L_v^i$ in non-increasing order according to the bandwidth requirements of the network slicing requests;
(2) For $(i, j) \in L_v^i$, do
(3) Find all candidate physical paths $AB$ that can map virtual link $(i, j)$;
(4) For $ab \in AB$, do
(5) Find the physical link $l_{ab}$ with the minimum bandwidth resource on $ab$;
(6) End;
(7) Select the path $m_{al}$ whose minimum-bandwidth link $l_{ab}$ has the maximum value;
(8) Map virtual link $(i, j)$ to path $m_{al}$;
(9) Map virtual nodes $i, j$ to the physical source and destination nodes on path $m_{al}$;
(10) End.
Algorithm 4. uRLLC slice deployment algorithm
Input: Network slicing request and physical network.
Output: Success or failure of link mapping process.
(1) Sort the virtual links $(i, j) \in L_v^i$ in non-increasing order according to the bandwidth requirements of the network slicing request;
(2) For $(i, j) \in L_v^i$, do
(3) Find all candidate physical paths $AB$ that can map virtual link $(i, j)$;
(4) Select the path $m_{il} \in AB$ with the minimum number of hops;
(5) Map virtual link $(i, j)$ to $m_{il}$;
(6) Map virtual nodes $i, j$ to the physical source and destination nodes on path $m_{il}$;
(7) End.
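The hop-minimizing mapping in Algorithm 4 reduces to a BFS shortest path over links with sufficient free bandwidth; the following is a sketch under that assumption (data layout is illustrative):

```python
from collections import deque

def min_hop_path(adj, src, dst, bw, demand):
    """BFS shortest (fewest-hop) path from src to dst, using only links
    whose free bandwidth covers the demand. `bw` maps frozenset edges to
    available bandwidth. Returns a node list, or None if no path exists."""
    prev = {src: None}
    q = deque([src])
    while q:
        v = q.popleft()
        if v == dst:
            path = []
            while v is not None:
                path.append(v)
                v = prev[v]
            return path[::-1]
        for w in adj[v]:
            if w not in prev and bw.get(frozenset((v, w)), 0) >= demand:
                prev[w] = v
                q.append(w)
    return None
```

Because BFS explores links in order of hop count, the first time the destination is dequeued the reconstructed path is hop-minimal, matching the uRLLC objective.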

3.3. Reward Functions

The learning of the whole system is driven by the reward function. The agent maximizes the reward by interacting with the environment. When the agent decides to access the network slicing request, it chooses the type of slicing request to improve the efficiency of resource utilization and reduce the cost. When the agent rejects the slicing request, it directly ends the action without wasting time. The reward function is as follows:
$$r_x = \begin{cases} z_{i,n}^{(k)} + \xi P_{i,n}^{(k)}, & x = 1 \\ 0, & x = 0 \end{cases}$$
Based on the state space and action space described above, training is conducted with the goal of obtaining the optimal reward. The flow of the MADDPG-based 5G network slicing deployment algorithm comprises two parts, centralized training and decentralized decision-making, as shown in Figure 2.
During training, the agent runs at each time slot $j$ and selects the slice traffic requests $r_i^{(k)}$ among the admitted network slicing requests ($x_i^{(k)} = 1$), giving higher priority to requests closer to their deadline $T_i^{(k)}$. Slicing requests $r_i^{(k)}$ that are not fully satisfied are deferred to the next time slot $j + 1$. If a request is not served by its deadline $T_i^{(k)}$, a penalty value $P_{i,j}^{(k)}$ is set, and the prediction parameters are dynamically adjusted by the penalty value. The centralized training algorithm for MADDPG-based 5G network slicing deployment is shown in Algorithm 5.
Algorithm 5. MADDPG-based centralized training algorithm for 5G network slicing deployment
Input: Virtual network parameters, physical network parameters, network slice request, S , r i ( k ) , and ϑ .
Output: Target actor network parameters θ μ .
(1) Randomly initialize the actor network and critic network parameters θ_μ, θ_Q; set z_{i,n}(k) = 0, P_{i,n}(k) = 0, ∀ i ∈ N;
(2) Initialize the experience replay area;
(3) Set the initial state of all agents s^0 = {s_1^0, …, s_n^0};
(4) For r_{i,y}(k), T_i(k), do;
(5) If x_i(k) == 1;
(6) If (T_i(k) - n) ≥ 0;
(7) μ_i = (r_{i,y}(k) - Σ_{t=0}^{n} z_{i,t}(k)) / ((T_i(k) - n) · ϑ);
(8) Else
(9) P_{i,n}(k) = r_i(k) - Σ_{t=0}^{n} z_{i,t}(k);
(10) r_i(k) = 0;
(11) End if;
(12) End if;
(13) S = ϑ; perform actions, obtain rewards, and observe the next state;
(14) Place the generated states, actions, and returns in the experience replay area;
(15) While S > 0, do;
(16) Execute μ_i in action space A, with rewards obtained in non-increasing order;
(17) Place the highest reward in H;
(18) z_{i,n}(k) = min{r_i(k), S/|H|};
(19) S = S - Σ_{i∈N} z_{i,n}(k);
(20) r_i(k) = r_i(k) - z_{i,n}(k);
(21) End while;
(22) Update the critic networks;
(23) Update the actor networks;
(24) Update the target networks;
(25) End for.
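The inner allocation loop of Algorithm 5 (steps (15)-(21)) repeatedly splits the remaining capacity S evenly across the set H of outstanding requests, capping each share at the request's residual demand. A sketch under that reading, with the function name and dict representation as assumptions:

```python
def allocate_slot(S, demands):
    """Proportional water-filling allocation: split remaining capacity S evenly
    across outstanding requests H, cap each share at the residual demand
    z_i,n(k) = min{r_i(k), S/|H|}, and repeat until S or all demand is exhausted."""
    residual = dict(demands)               # r_i(k): remaining demand per request
    granted = {i: 0.0 for i in residual}
    while S > 1e-9 and any(r > 1e-9 for r in residual.values()):
        H = [i for i, r in residual.items() if r > 1e-9]
        share = S / len(H)                 # S / |H|
        for i in H:
            z = min(residual[i], share)    # z_i,n(k)
            granted[i] += z
            residual[i] -= z               # r_i(k) <- r_i(k) - z_i,n(k)
            S -= z                         # S <- S - sum of allocations
    return granted, S
```

Because every share is at most S/|H|, the total granted per round never exceeds S, so the loop terminates with S ≥ 0.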
The decentralized decision process takes the centrally trained policy as input and outputs the action value Q, which evaluates the quality of each candidate action so that a specific action can be selected. The specific algorithm steps are shown in Algorithm 6:
Algorithm 6. Decentralized decision-making algorithm for 5G network slicing deployment based on MADDPG
Input: Virtual network parameters, physical network parameters, target network parameters θ μ .
Output: Q -value.
(1) Import the parameters θ_μ;
(2) Observe the state S = {s_1, …, s_n};
(3) Select actions from the action set a = {x, ϕ} and allocate resources to the three types of slicing requests;
(4) Execute action a and compute the system return;
(5) Return the Q-value.
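Algorithm 6 reduces to scoring candidate actions a = (x, ϕ) with the trained critic and executing the best one. A sketch under that reading; `critic` is a stand-in callable for the trained value network, and the signature is an assumption:

```python
def decide(critic, state, actions):
    """Decentralized decision sketch: score each candidate action with the trained
    critic and return the best action together with its Q-value."""
    q_values = {a: critic(state, a) for a in actions}
    best = max(q_values, key=q_values.get)
    return best, q_values[best]
```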

4. Simulation Result

Time complexity and space complexity help to evaluate performance and resource requirements, compare the efficiency of different algorithms, optimize performance bottlenecks, manage limited computing resources, determine the feasibility of algorithms under specific hardware and time constraints, and theoretically prove the correctness and optimality of algorithms. To this end, we analyzed the time complexity and space complexity of the six proposed algorithms as shown in Table 3, where n represents virtual nodes, m represents physical nodes, T represents the time steps required for the training process, l represents virtual links, and p is the number of candidate paths.
This section presents the simulation results of the 5G network slicing deployment algorithm based on MADDPG. In the underlying physical network, various physical resources are unified as the physical resources of nodes. We used Python for simulation, with specific parameter settings as follows: the time window is defined as 3600 s and 7200 s, the number of physical nodes is 30, 60, and 90, the capacity of physical nodes is in the range of [20, 50], the link bandwidth is in the range of [20, 50], the virtual node capacity requirement is in the range of [5, 25], and the virtual link bandwidth requirement is in the range of [5, 25]. In our analysis, we mainly focus on the dynamic analysis of the system, the impact of the prediction module on the system, the resource efficiency of the three types of network slice requests, and the overall system operational efficiency. By comparing our proposed MADDPG algorithm with the DDPG algorithm and the ILP algorithm in terms of resource efficiency and slice request acceptance rate, the simulation results demonstrate the advantages of multi-agent deployment in slice requests. The specific simulation results are as follows:
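The substrate used in the simulation can be reproduced from the stated parameter ranges. This is a hedged sketch, not the authors' code: the function name, the use of `random`, and the 0.2 connection probability are assumptions; only the node counts and the [20, 50] capacity/bandwidth ranges come from the text.

```python
import random

def build_physical_network(num_nodes=30, cap_range=(20, 50), bw_range=(20, 50), seed=0):
    """Generate a random physical substrate matching the simulation settings:
    node capacity and link bandwidth both drawn uniformly from [20, 50]."""
    rng = random.Random(seed)
    caps = {n: rng.randint(*cap_range) for n in range(num_nodes)}
    links = {}
    for u in range(num_nodes):
        for v in range(u + 1, num_nodes):
            if rng.random() < 0.2:       # illustrative connection probability
                links[(u, v)] = rng.randint(*bw_range)
    return caps, links
```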
In the test of system returns, we compare the impact of prediction on the system. Figure 3 shows the system return R = ϑ Σ_{i,k} z_{i,n}(k), n ∈ T_F, during training. The prediction process provides traffic request information to the scheduling module; based on this information, the scheduling module decides whether to admit each network slicing request, ensuring quality of service, rational use of resources, and reduced cost.
Figure 4 compares the resource efficiency of three types of network slicing requests (eMBB, mMTC, and uRLLC) using three different algorithms. As the number of slicing requests and the mapped physical paths increase, the resource efficiency decreases in all three cases. However, multi-agent collaboration significantly improves the resource allocation efficiency for slicing requests. The other two algorithms are less efficient than the one proposed in this paper, indicating its superiority in achieving on-demand resource allocation. Specifically, our algorithm allocates resources more effectively, ensuring the quality of service for various slicing requests while maximizing the utilization of the available physical resources.
Figure 5 compares the running times of the three types of network slicing requests (eMBB, mMTC, and uRLLC). As shown in the figure, mMTC has the longest running time, followed by eMBB, and uRLLC has the shortest running time. This is consistent with our time complexity analysis of the algorithms. In the mMTC algorithm, node mapping and link mapping must consider more physical nodes and physical links, which increases its time complexity. In the uRLLC algorithm, since the focus is on selecting the path with the fewest hops, the time complexity of path selection and link mapping is usually lower.
Figure 6 compares the performance of the proposed MADDPG algorithm with two other algorithms (DDPG and ILP) in terms of network slicing request acceptance rates. The results show that as the number of slicing requests increases, resource consumption also increases, while the acceptance rate tends to decrease. Due to the predictive capabilities of the prediction module and the presence of the experience replay area, the proposed algorithm significantly outperforms the other two algorithms in accepting slicing requests. This advantage makes the MADDPG algorithm particularly effective in handling complex network environments and high-load conditions, thereby helping to maintain a high level of service quality.
Figure 7 compares the running time of the proposed MADDPG algorithm with the other two algorithms (DDPG and ILP) as the number of network slicing requests increases. The results show that although our proposed algorithm is slightly inferior to the other two algorithms in terms of running time, it has significant advantages in resource allocation efficiency and acceptance rate. This is because MADDPG and DDPG both require multiple iterations for training, whereas ILP solves for the optimal solution in a single run, resulting in a shorter solving time under the same conditions. MADDPG needs to train multiple agents simultaneously, each with its own policy and value network, significantly increasing the computational complexity and time. However, our proposed algorithm significantly outperforms the other two algorithms in other aspects, making it worthwhile to consider the increased running time in exchange for improved network efficiency.

5. Conclusions

This paper proposes a 5G network slicing deployment algorithm based on Multi-Agent Deep Deterministic Policy Gradient (MADDPG), aimed at efficiently implementing network slicing requests while providing high reliability and high-quality service. Through prediction and scheduling modules, the algorithm can deploy and allocate resources for three typical slicing requests (eMBB, mMTC, uRLLC). Utilizing the concept of deep reinforcement learning, slicing agents act as intelligent bodies, taking the real-time resource status of the underlying physical network as the state space and using admission requests and resource allocation as actions. The optimal actions are produced through centralized training, and optimal returns are achieved through decentralized decision-making in interaction with the environment. The final simulation results show that the proposed algorithm performs excellently in terms of resource utilization and the slice deployment success rate, demonstrating its effectiveness and advantages. By improving network efficiency, this algorithm effectively implements network slicing requests on physical networks, showing significant application prospects. However, our algorithm still has room for improvement in terms of running time. In future work, we will optimize the algorithm process to reduce its time complexity.

Author Contributions

Conceptualization, L.Z. and J.L.; methodology, L.Z. and J.L.; software, L.Z.; validation, J.L.; formal analysis, L.Z. and J.L.; investigation, Q.Y. and C.X.; resources, L.Z.; data curation, J.L.; writing—original draft preparation, J.L. and Q.Y.; writing—review and editing, L.Z., Q.Y. and J.L.; visualization, J.L.; supervision, F.Z.; project administration, F.Z.; funding acquisition, F.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China, grant number 62375219, and the Natural Science Foundation of Shaanxi Province, grant number 2023-JC-JQ-58.

Data Availability Statement

The data that support the findings of this study are available from the corresponding author upon reasonable request.

Acknowledgments

This work was supported by Feng Zhao from the School of Automation at Xi’an University of Posts and Telecommunications.

Conflicts of Interest

The authors have no conflicts to disclose.

Figure 1. Framework for 5G network slicing deployment system.
Figure 2. MADDPG-based 5G network slicing deployment algorithm flow.
Figure 3. Comparison of system returns with and without prediction.
Figure 4. The resource efficiency of the three types of network slicing requests: eMBB, mMTC, and uRLLC.
Figure 5. Scheduling time for each slice.
Figure 6. Deployment acceptance rate.
Figure 7. Runtime.
Table 1. Symbol description table.

Notation      Description
G_S           Physical network
N_S           Physical nodes
L_S           Physical links
G_V           Virtual network
N_V           Virtual nodes
L_V           Virtual links
C_s(n_s)      The resource capacity of the physical node
C_v(n_v)      The resource capacity of the virtual node
B_s(l_s)      The bandwidth of the physical link
B_v(l_v)      The bandwidth of the virtual link
Table 2. Table of network slicing traffic types.

k    T         Type / 5QI
0    10 ms     GBR / 65
1    100 ms    GBR / 1
2    50 ms     Non-GBR / 79
3    300 ms    Non-GBR / 6
4    10 ms     Delay-critical GBR / 83
Table 3. Complexity analysis of the algorithm.

Algorithm                        Time Complexity     Space Complexity
Sorting                          O(n + n log n)      O(n)
eMBB Slice Deployment            O(mn + n log n)     O(n + m^2)
mMTC Slice Deployment            O(lpm)              O(lpm)
uRLLC Slice Deployment           O(lp + l log l)     O(lpm)
Centralized Training             O(Tn log n)         O(Tn)
Decentralized Decision-Making    O(n)                O(n)
