Article

An Improved Adaptive Service Function Chain Mapping Method Based on Deep Reinforcement Learning

1 College of Software Engineering, Zhengzhou University of Light Industry, Zhengzhou 450001, China
2 School of Electronic and Communication Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(6), 1307; https://doi.org/10.3390/electronics12061307
Submission received: 6 February 2023 / Revised: 5 March 2023 / Accepted: 7 March 2023 / Published: 9 March 2023
(This article belongs to the Section Networks)

Abstract: With the vigorous development of network functions virtualization (NFV), service function chain (SFC) resource management, which aims to provide users with diversified, customized network function services, has gradually become a research hotspot. The network services desired by users are typically random and time-sensitive, so the resulting service function chain requests (SFCRs) are dynamic and arrive in real time, which requires that SFC mapping adapt to dynamically changing user requests. In this regard, this paper proposes an improved adaptive SFC mapping method based on deep reinforcement learning (ISM-DRL). Firstly, an improved SFC request mapping model is proposed to abstract the SFC mapping process and decompose the SFC mapping problem into the SFCR mapping problem and the VNF re-orchestration problem. Secondly, we use the deep deterministic policy gradient (DDPG), a deep reinforcement learning framework, to jointly optimize the effective service cost rate and the mapping rate and thus approximate the optimal mapping strategy for the current network. Then, we design four VNF orchestration strategies based on the VNF request rate, mapping rate, and related statistics to enhance how well the ISM-DRL method matches different networks. Finally, the results show that the proposed method can handle SFC mapping under dynamic requests. Under different experimental conditions, the ISM-DRL method outperforms the DDPG and DQN methods in terms of average effective cost utilisation and average mapping rate.

1. Introduction

1.1. Global Technology Status

In recent years, with the explosive growth in the number of network users and the increasing diversification of network service needs, new applications and demands have forced network service providers to offer more flexible and higher-quality network services. In traditional network architectures, network functions (firewall, load balancing, DPI, etc.) are deployed on dedicated hardware devices that provide those functions to users. However, dedicated hardware devices are generally bulky, expensive and tied to dedicated software, and cannot easily be moved, extended or managed in a unified manner after deployment. To provide new network services, network service providers must continuously add new dedicated hardware, resulting in increasingly large and bloated networks that lack flexibility.
Network function virtualization (NFV) [1] was created to address these challenges of traditional network function service deployment. Network operators can use NFV to transform the way they build and operate their networks by implementing traditional network functions in software as virtual network functions (VNFs) [2], separating functions such as firewalls and message routing from dedicated hardware and consolidating them as virtual network function instances (VNFIs) [3] on a range of industry-standard commodity servers and storage. With NFV, operators can dynamically migrate virtual network functions and instantiate them in different networks on demand, without installing new hardware: deploying a virtualised network function component on a common hardware platform takes only a few hours, compared with the months needed to deploy dedicated network function hardware in a traditional network. NFV therefore greatly increases the flexibility and scalability of network function deployment while reducing hardware investment and operation and maintenance costs.
In NFV, users obtain network services by initiating a service request to a network service provider. The network service data flow passes through a series of VNFs in a specific order from the source node to the destination node, and this chained service request is called a service function chain (SFC) [4]. A network service request initiated by a user is called a service function chain request (SFCR) [5]. SFCRs arrive at the network at different moments and stay in the network for a period of time before leaving. Different network services usually have different resource requirements, such as bandwidth and waiting time [6]. The network infrastructure resources provided by network service providers are limited and costly to maintain. Faced with complex and changing service requests, network operators can use appropriate SFC mapping policies to steer data flows through network functions quickly and on time, improving network resource utilisation and service processing speed; scientific and effective flow-steering decisions have therefore become a key means of ensuring the efficient operation of network services. At the same time, the rapid development of artificial intelligence has had a large impact on SFC applications, since its powerful learning capability can greatly improve the selection and steering of SFC data flows. Introducing AI techniques into the SFC mapping problem thus has important research implications for improving the SFC mapping rate and reducing the expenses and overheads of network service operators.

1.2. Related Works

Most of the research on SFC mapping problems concentrates on diverse network scenarios such as 5G networks, the Internet of Things, and the Internet [7]. Wei et al. [8] proposed a delay-aware multi-path parallel SFC mapping method. This method divides SFCRs into multiple streams for parallel transmission and solves the mapping optimization problem of multiple SFC streams with a quantum genetic algorithm. Compared with the hill-climbing algorithm, it minimizes resource consumption and routing energy consumption and effectively improves the mapping efficiency of parallel SFCs. Xu et al. [9] proposed a reliability-and-energy-balanced service function chain mapping and migration method for IoT applications. Aimed at cost optimization, load balancing and resource configuration optimization, this method maps SFC requests to the network and provides backups to improve the acceptance rate and reliability of mapped requests. Yaghoubpour et al. [10] proposed a multi-level delay-guaranteed mapping method that considers queuing delay and propagation delay and shares VNFIs when mapping an SFC to physical nodes, so as to maximize the profit of service providers. For service function chain mapping under specific physical network topologies, the above methods effectively improve the mapping rate, reliability and revenue of network services. However, these methods do not consider that SFC requests change dynamically over time, so the existing physical network environment becomes unsuited to the changed requests, which overloads network links and affects the stability of the network environment.
With the rise of artificial intelligence, researchers have applied machine learning methods to the dynamic SFC mapping problem and proposed a series of adaptive SFC mapping methods [11]. Fu et al. [12] proposed a DQN-based SFC mapping algorithm for dynamic and complex IoT environments. The method adopts the resource allocation strategy of the Deep Q-network and uses experience replay and a target network to improve convergence. To overcome the instability and divergence of mapping results caused by different network topologies, the algorithm adaptively learns and adjusts the mapping strategy while improving the performance and convergence of the mapping scheme. However, DQN is usually applied to discrete action problems and struggles with continuous ones, so the mapping effect of this method still leaves room for improvement. Li et al. [13] proposed a service chain mapping method based on reinforcement learning that optimizes system load balancing by adjusting the feedback value function and effectively reduces the average link delay of SFCs under different network topologies. Tang et al. [14] proposed a DDPG-based virtual network function migration optimization method, which addresses the VNF migration problem caused by dynamic demand changes and jointly optimizes SFC latency and system energy consumption to maximize the benefits. Ouamri et al. [15] proposed a DRL-based load-balancing optimization method that guarantees fast convergence to the optimal solution by minimising load imbalance through DDPG. Filali et al. [16] proposed a pre-emptive SDN load-balancing method with machine learning for delay-sensitive applications, which pre-emptively balances the load in the SDN control plane to support network flows that require low-latency communication. Wang et al. [17] proposed DRL-SFCP, which maximizes the long-term average revenue by combining a graph convolutional network, which extracts the features of the physical network, with a sequence-to-sequence model, which captures the ordered information of the SFC request, to generate placement strategies. Tam et al. [18] proposed a priority-aware resource management method based on deep reinforcement learning for adaptive service function chaining. This method models the primary features that affect the optimization of configuration times and resource utilization and orders the mappings according to SFCR priorities by modelling the configuration of real-time IoT service requests. In summary, machine-learning-based SFC mapping methods show a performance advantage in adapting to dynamically changing SFC mappings. However, blindly improving the mapping rate of service requests while ignoring the cost of service operation and maintenance leads to a high cost of SFC mapping.

1.3. Motivation and Contribution

Inspired by the literature [12,14], this paper investigates adaptive SFC mapping methods under dynamic requests. The comparison of optimisation objectives and methods in Table 1 shows that both works target only a single objective, and neither considers changing the deployment of VNFs in the underlying physical network to make it better adapted to dynamic service requests.
In response to the above problems, this paper improves the SFC mapping model by analysing the SFC mapping process and proposes an improved adaptive SFC mapping method based on deep reinforcement learning (ISM-DRL). DQN suffers from low sample utilisation and unstable outputs and mainly handles discrete action problems, whereas SFC mapping requires continuous processing of incoming SFCRs and involves continuous actions. DDPG uses an experience replay pool, target network freezing, a separate policy network, and soft updates, which effectively address the instability of samples and target values and suit continuous action problems. Therefore, the ISM-DRL method uses an improved deep deterministic policy gradient (DDPG) [19] as its deep reinforcement learning framework, which enables continuous traffic scheduling and optimization of the joint objective. Based on the request rate and utilization rate of each VNF in the historical SFC mappings, the VNFs are re-orchestrated after each mapping period ends, so as to improve the adaptability of the network topology to the SFCRs.
The specific contributions are as follows:
(1)
We show that existing SFC mapping methods struggle to balance the mapping rate and the effective service cost when facing dynamic SFC requests, and propose an adaptive SFC mapping method that jointly optimizes both.
(2)
The SFC mapping problem is decomposed into the SFCR mapping problem and the VNF re-orchestration problem; an improved SFCR mapping algorithm based on DDPG and a VNF re-orchestration algorithm based on historical mapping data are proposed, respectively.
(3)
The pdh network topology in SNDlib is used as the experimental topology, and the trade-off between the mapping rate and the effective service cost of ISM-DRL under different weight settings is verified.
The rest of the paper is structured as follows: Section 2 establishes the SFC mapping model and optimization objective; Section 3 presents an improved SFCR mapping algorithm based on DDPG and a VNF re-orchestration algorithm based on historical mapping data; Section 4 verifies the advantages of ISM-DRL in terms of average mapping rate and average service cost through experimental comparisons; finally, Section 5 presents our conclusions. Table 2 lists all the parameters used in this paper.

2. Modelling of SFC Mapping

In this section, we describe the SFC mapping model and the SFC mapping problem, including the relevant parameters of the physical network, the set of network function requests, and the request mapping layer involved in the mapping process, and we formally decompose the SFC mapping problem into the SFCR mapping problem and the VNF orchestration problem.

2.1. Mapping Model

In the research of SFC mapping, the SFC mapping model usually includes two parts: the underlying physical network topology and the network function service request set [20]; this is unclear in describing how a service request is mapped onto the physical network. We propose an improved SFC mapping model in which a request mapping layer is added to abstract the mapping of SFCRs onto the underlying physical network, hiding extraneous mapping information. The improved SFC mapping model is shown in Figure 1.
As shown in Figure 1, the underlying physical network topology includes physical service nodes and data switches and is usually represented as a weighted undirected graph $G = \{N, L\}$, where $N = \{n_1, n_2, \dots, n_m\}$ denotes the set of $m$ physical service nodes and $L = \{l_{i,j} \mid i, j \le m\}$ denotes the physical links between physical service nodes $n_i$ and $n_j$. A variety of VNFIs can be deployed on each physical service node $n_i$, and the VNFI set of the node is denoted as $VNFIs_i = \{VNF_{x,p} \mid p = 0, 1\}$, where $p = 0$ indicates that the $x$th VNF is dormant and cannot provide its network function service, while $p = 1$ indicates that it is active and can provide the service. The current remaining CPU and memory resources of physical service node $n_i$ are denoted by $(C(n_i), M(n_i))$. The bandwidth resource of the physical link $l_{i,j}$ between physical service nodes $n_i$ and $n_j$ is denoted by $B(l_{i,j})$.
A set of network service function requests is denoted as $SRs = \{SFCR_1, SFCR_2, SFCR_3, \dots\}$, where the number of SFCRs in the set may vary due to the uncertainty of user requirements. The $f$th SFCR is denoted as $SR_f = (V_f, E_f, d_f)$, where $V_f = \{v_1^f, v_2^f, \dots, v_l^f\}$ denotes the virtual network function nodes, $v_1^f$ denotes the source node of $SR_f$, $v_l^f$ denotes the destination node, and $l$ denotes the number of functions required by this SFCR; the order of $v_2^f, \dots, v_{l-1}^f$ is the sequence in which the SFC network flow or service flow passes through the network functions. $E_f = \{e_{i,j}^f \mid i, j \le l\}$ denotes the virtual links between virtual network function nodes $v_i^f$ and $v_j^f$; $d_f$ indicates that the SFCR will release the service node resources and bandwidth resources it occupies after $d$ units of time. The CPU and memory resource requirements of each virtual network function node $v_i^f \in V_f$ during normal operation are denoted by $(vC(v_i^f), vM(v_i^f))$, and the bandwidth resource required by each virtual link $e_{i,j}^f \in E_f$ is denoted by $vB(e_{i,j}^f)$.
In the request mapping layer, the undirected graph $G_M = (V_f, N, vE)$ represents the service mapping graph of an SFCR in the physical network, where $V_f$ is the set of virtual network function nodes of the $f$th SFCR, $N$ is the set of physical network function service nodes, and $vE = \{M_{v_i^f, n_j}\}$ represents the mapping links between the $i$th virtual network function node $v_i^f$ of the $f$th SFCR and the $j$th physical service node $n_j$.
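The three layers of the model above (physical network, SFCR set, and request mapping layer) can be sketched as plain data structures. The following Python sketch is purely illustrative: all class and field names (PhysicalNode, SFCR, MappingLayer, etc.) are hypothetical and not taken from the paper.

```python
from dataclasses import dataclass, field

@dataclass
class PhysicalNode:
    """A physical service node n_i with remaining resources (C(n_i), M(n_i))."""
    cpu: float
    mem: float
    # VNFI set of the node: {vnf_type x: p}, p = 1 active, p = 0 dormant
    vnfis: dict = field(default_factory=dict)

@dataclass
class SFCR:
    """The f-th request SR_f = (V_f, E_f, d_f)."""
    vnf_chain: list   # ordered VNF types from source v_1 to destination v_l
    cpu_req: list     # vC(v_i) per virtual node
    mem_req: list     # vM(v_i) per virtual node
    bw_req: float     # vB(e_ij), assumed uniform per virtual link for brevity
    duration: int     # d: time units before occupied resources are released

@dataclass
class PhysicalNetwork:
    """Weighted undirected graph G = {N, L}."""
    nodes: dict       # {node_id: PhysicalNode}
    links: dict       # {(i, j): remaining bandwidth B(l_ij)}

@dataclass
class MappingLayer:
    """Request mapping layer G_M: which physical node serves each virtual node."""
    assignments: dict = field(default_factory=dict)  # {(sfcr_id, vnf_idx): node_id}
```

A toy network with two nodes and one link could then be built as `PhysicalNetwork(nodes={0: PhysicalNode(8, 8), 1: PhysicalNode(8, 8)}, links={(0, 1): 100.0})`.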

2.2. Problem Description

Since user requests for specific network services change over time, this paper sets a time period $T = \{t_1, t_2, t_3, \dots\}$, treats each time period as one processing round, and lets the number of SFCRs $SRs$ processed each round be the total number of requests in each time slot $t$. In NFV and SFC research, the time slot $t$ is usually set to 100 time units, where a time unit may be expressed in minutes (min) or seconds (s) [21]. Since a variety of VNFIs can be deployed on each physical service node, two key problems need to be solved when mapping an SFC to the physical network: SFCR mapping and VNF orchestration.

2.2.1. SFCR Mapping

The essence of the SFCR mapping problem is to find, within time slot $t$, a set of service nodes in the underlying physical network that can satisfy the VNFIs requested by users and the resources required for their normal operation, and to pass through these service nodes in the order of the VNFs in the service request [22]. The following constraints must be satisfied when mapping the SFCRs in the network service function request set $SRs$:
$$\forall v_i^f \in V_f:\ \sum_{n_j \in N} M_{v_i^f, n_j} = 1 \tag{1}$$
$$\forall i \le l,\ j \le m:\ (vC(v_i^f), vM(v_i^f)) \le (C(n_j), M(n_j)) \tag{2}$$
$$\forall e_{i,j}^f \in E_f,\ l_{i,j} \in L:\ vB(e_{i,j}^f) \le B(l_{i,j}) \tag{3}$$
Equation (1) states that exactly one physical network service node is chosen for each VNF in the same SFCR. Equation (2) states that the remaining CPU and memory resources of a physical service node must meet the VNF's normal operating requirements during mapping. Equation (3) states that the bandwidth required by virtual link $e_{i,j}^f$ must not exceed that of physical link $l_{i,j}$. When the mapping constraints are satisfied, the SFCR is added to the request mapping layer $G_M$. The number of SFCRs successfully mapped in time slot $t$ is denoted by $num(SRs(t), SR_f \in G_M(t))$, and the number of SFCR requests by $num(SRs(t))$. The average SFCR mapping rate is therefore given by Equation (4):
$$avgM(t) = \frac{\sum_{t=1}^{T} num(SRs(t), SR_f \in G_M(t))}{\sum_{t=1}^{T} num(SRs(t))} \times 100\% \tag{4}$$
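Constraints (1)-(3) and the mapping rate of Equation (4) can be checked mechanically for a candidate assignment. The sketch below is a simplified illustration, assuming a single uniform bandwidth demand per virtual link and a per-VNF resource check (a full implementation would aggregate the demands of co-located VNFs); all function and parameter names are hypothetical.

```python
def is_feasible(mapping, node_cpu, node_mem, link_bw, cpu_req, mem_req, bw_req):
    """Check Eqs. (1)-(3) for one SFCR.

    mapping: list of physical node ids, one per virtual node, so Eq. (1)
    (exactly one node per VNF) holds by construction.
    node_cpu/node_mem: {node_id: remaining C(n_j) / M(n_j)}.
    link_bw: {(i, j): remaining bandwidth B(l_ij)}, i < j.
    cpu_req/mem_req: per-virtual-node demands vC, vM; bw_req: vB per link.
    """
    # Eq. (2): each node must cover its VNF's CPU and memory demand.
    # (Simplification: co-located VNFs are checked independently.)
    for node, c, m in zip(mapping, cpu_req, mem_req):
        if node_cpu[node] < c or node_mem[node] < m:
            return False
    # Eq. (3): each traversed physical link needs enough bandwidth.
    for a, b in zip(mapping, mapping[1:]):
        if a != b:  # co-located consecutive VNFs need no physical link
            key = (min(a, b), max(a, b))
            if link_bw.get(key, 0.0) < bw_req:
                return False
    return True

def avg_mapping_rate(num_mapped, num_requested):
    """Eq. (4): percentage of successfully mapped SFCRs over all requests,
    accumulated over the time slots."""
    total = sum(num_requested)
    return 100.0 * sum(num_mapped) / total if total else 0.0
```

For example, a two-VNF chain mapped to nodes 0 and 1 is feasible only while the link (0, 1) still has at least `bw_req` bandwidth left.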

2.2.2. VNF Orchestration

The essence of the VNF orchestration problem is deciding whether each VNFI deployed on a physical service node should be active or dormant in time slot $t$. To maximize resource utilization and minimize cost, after several rounds of mapping, the VNFs are re-orchestrated in the physical service network topology according to the collected mapping data, so that the overall network function topology better matches user needs and the stability of the network environment is enhanced.
To save service costs, a network service provider does not activate all VNFIs in response to an SFCR. If the VNFI set of a physical service node through which the SFCR traffic flows contains the required VNF type, a VNFI of that type must be activated. Otherwise, to reduce operating costs, it is best to keep VNFIs dormant when no business needs to be processed. The total operating cost within time slot $t$ is therefore calculated as shown in Equation (5):
$$C_r(t) = \sum_{i=1}^{m}\sum_{x=1}^{k} VNFIs_i\{VNF_x, p \mid p = 1\} \times r(VNF_x) \tag{5}$$
Here, $m$ is the number of physical service nodes, $k$ is the number of VNF types, $r(VNF_x)$ is the operating cost of a VNFI of type $x$, and $VNFIs_i\{VNF_x, p \mid p = 1\}$ denotes the deployed and activated VNFIs of type $x$ on the current physical service node. An activation cost is incurred when the activated VNFIs of type $x$ are fully occupied, or their resources do not meet the mapping constraints, and dormant VNFIs must be activated to provide sufficient resources. The total activation cost is calculated as shown in Equation (6):
$$C_a(t) = \sum_{i=1}^{m}\sum_{x=1}^{k} \left(\{VNFIs_i(t), VNF_x \mid p = 1\} \cap \{VNFIs_i(t-1), VNF_x \mid p = 0\}\right) \times a(VNF_x) \tag{6}$$
Here, $\{VNFIs_i(t), VNF_x \mid p = 1\}$ denotes a type-$x$ VNF that is activated in time slot $t$, $\{VNFIs_i(t-1), VNF_x \mid p = 0\}$ denotes that the type-$x$ VNF was dormant in the previous time slot, and $a(VNF_x)$ is the activation cost of a VNFI of this type. When all instances of a type-$x$ VNF are occupied, deployment costs are incurred if new VNFIs need to be deployed on the service nodes. The total deployment cost is shown in Equation (7):
$$C_s(t) = \sum_{i=1}^{m}\sum_{x=1}^{k} \left(\{VNFIs_i(t) \mid VNF_x = 1\} \cap \{VNFIs_i(t-1) \mid VNF_x = 0\}\right) \times s(VNF_x) \tag{7}$$
where $\{VNFIs_i(t) \mid VNF_x = 1\}$ represents type-$x$ VNFs that are deployed in time slot $t$, $\{VNFIs_i(t-1) \mid VNF_x = 0\}$ indicates type-$x$ VNFs that were not deployed in time slot $t-1$, and $s(VNF_x)$ is the deployment cost of this VNFI type. The total service cost in time slot $t$ is therefore given by Equation (8):
$$C_o(t) = C_r(t) + C_a(t) + C_s(t) \tag{8}$$
When processing user service requests, the service provider's total service cost includes many unnecessary costs, such as activated but idle VNFIs, and therefore does not reflect the cost of effective service. This paper consequently proposes the effective service cost rate: the ratio of necessary costs to total costs when serving users. It reflects how much of the current total service cost is necessary and, being on the same scale as the mapping rate, can be combined with it directly. The effective service cost rate is given by Equation (9):
$$U_r(t) = \frac{\sum_{t=1}^{T} \sum_{VNFI \in G_M,\, p=1} C_r(t)}{\sum_{t=1}^{T} C_o(t)} \times 100\% \tag{9}$$
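Under simplifying assumptions, Equations (5)-(9) amount to the bookkeeping sketched below: instance counts are kept per VNF type rather than per node, and differences of per-type counts between consecutive slots stand in for the set intersections of Equations (6) and (7). All names are illustrative, not the paper's notation.

```python
def service_costs(active_now, active_prev, deployed_now, deployed_prev,
                  run_cost, act_cost, dep_cost):
    """Total service cost C_o(t) of Eq. (8), from per-type instance counts.

    active_*/deployed_*: {vnf_type: count} of active / deployed instances
    at slot t and t-1; *_cost: {vnf_type: unit cost}.
    """
    # Eq. (5): running cost of every activated instance.
    c_r = sum(active_now[x] * run_cost[x] for x in active_now)
    # Eq. (6): activation cost for instances dormant at t-1, active at t.
    c_a = sum(max(active_now[x] - active_prev.get(x, 0), 0) * act_cost[x]
              for x in active_now)
    # Eq. (7): deployment cost for instances newly deployed at t.
    c_s = sum(max(deployed_now[x] - deployed_prev.get(x, 0), 0) * dep_cost[x]
              for x in deployed_now)
    return c_r + c_a + c_s  # Eq. (8)

def effective_cost_rate(useful_run_costs, total_costs):
    """Eq. (9): share of total cost spent on VNFIs that actually serve
    mapped requests, accumulated over the T time slots (percent)."""
    total = sum(total_costs)
    return 100.0 * sum(useful_run_costs) / total if total else 0.0
```

For instance, activating a second firewall VNFI in slot t incurs its running cost plus one activation cost, but no deployment cost if the instance was already deployed.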

3. SFC Adaptive Mapping Scheme

In this section, we give an overview of the entire ISM-DRL method. For the SFCR mapping problem, we describe the process and parameters of the proposed DDPG-based improved SFCR mapping algorithm and present it as an algorithm table; similarly, for the VNF orchestration problem, we describe the parameters involved in the orchestration process and present the VNF re-orchestration algorithm as an algorithm table.

3.1. ISM-DRL Framework

Deep reinforcement learning combines the perceptual capability of deep learning with the decision-making capability of reinforcement learning, allowing DRL methods to be applied to more complex and continuous action spaces. In this paper, we propose the DRL-based SFC adaptive mapping method ISM-DRL. The ISM-DRL framework uses DDPG, a recent DRL method that constructs an actor-critic framework by combining the DQN method with the deterministic policy gradient method, using neural networks in place of the policy and Q functions to form an efficient and stable continuous action control model. The ISM-DRL framework is shown in Figure 2.
Figure 2 shows the SFCR mapping process and the VNF re-orchestration process in the ISM-DRL framework. In the SFCR mapping process, we introduce the DDPG algorithm to map SFCRs based on the current network environment. Exploiting DDPG's online and target networks together with the soft update algorithm, the method explores the optimal mapping strategy using the effective service cost rate and the mapping rate as joint optimization objectives, which promotes a more stable learning process and ensures model convergence. In the VNF re-orchestration process, VNFs are redeployed, and their VNFIs activated or put to sleep, according to a preset orchestration period and historical data collected during the mapping phase, such as the VNFI mapping rate and utilization rate, so that the network environment better matches user demand patterns and the effective service cost rate and mapping rate improve.

3.2. SFCR Mapping Algorithm

In this paper, the service cost and mapping rate involved in the SFC mapping process are formulated, and an improved SFCR mapping algorithm based on DRL is proposed. It is defined by the tuple $\langle S, A, R \rangle$, denoting the state space, action space and reward space, respectively.
(1)
State space
$S(t) = \{s_1, s_2, s_3, \dots\}$ denotes the state space. The method takes the effective service cost rate and the mapping rate as the joint optimization objective, which depends mainly on the state information of the physical service node network and the set of service function requests at the current and historical moments. These two state features together form the state $s_t = \{G(t), SRs(t)\}$ for time slot $t$, which is fed into the neural network for training.
(2)
Action space
$A(t) = \{a_1, a_2, a_3, \dots\}$ denotes the action space. The set of actions in time slot $t$ is $a_t = \{a_v, a_m, a_s\}$, where $a_v$ is the mapping action between a VNF and a physical network service node, $a_m$ is the mapping between a virtual link and a physical link, and $a_s$ is the VNF activation/dormancy action. The action strategy $\mu$ taken in each state is given by $a_t = \mu(s_t)$.
(3)
Reward space
$R(t) = \{r_1, r_2, r_3, \dots\}$ denotes the reward space. Each action taken generates an immediate return $r(s_t, a_t)$. Since the reward takes the effective service cost rate and the mapping rate as the joint optimization objective, the immediate return can be expressed as $r(s_t, a_t) = \alpha_1 U_r(t) + \alpha_2\, avgM(t)$, with $\alpha_1, \alpha_2 \in [0, 1]$, where the $\alpha$ values are weights: the larger a weight, the more the corresponding term influences the final mapping result.
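The immediate return is thus a straightforward weighted sum. A minimal sketch, with default weights chosen purely for illustration:

```python
def reward(u_r, avg_m, alpha1=0.5, alpha2=0.5):
    """Immediate return r(s_t, a_t) = alpha1 * Ur(t) + alpha2 * avgM(t).

    u_r and avg_m are percentages in [0, 100]; the weights alpha1 and
    alpha2 (both in [0, 1]) tilt the learned policy toward cost
    efficiency or mapping success. Default weights are illustrative,
    not from the paper.
    """
    return alpha1 * u_r + alpha2 * avg_m
```

For example, with weights (0.3, 0.7) the policy is rewarded more for raising the mapping rate than for saving cost.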
This paper defines the action value function $Q_\mu(s_t, a_t)$ as the expected reward obtained when action strategy $\mu$ is taken at state $s_t$ and followed thereafter. It is used to judge the quality of the current action strategy and can be expressed by the Bellman Equation (10) [23].
$$Q_\mu(s_t, a_t) = \mathbb{E}\left[r(s_t, a_t) + \gamma Q_\mu(s_{t+1}, \mu(s_{t+1}))\right] \tag{10}$$
In this paper, convolutional neural networks are used as the actor (policy) network and the critic (value) network to fit the current action policy function and action value function, respectively. Since the learning process of a single neural network is very unstable, the policy network and the value network each comprise two neural networks, called the online network and the target network. The parameters of the online policy and value networks are denoted $\theta^\mu$ and $\theta^Q$, and those of the target policy and value networks $\theta^{\mu'}$ and $\theta^{Q'}$, respectively. When updating the network parameters, the actual value $Q(s_i, a_i \mid \theta^Q)$ is obtained by feeding the current state $s_i$ and action $a_i$ into the online value network. $N^*$ samples are randomly drawn from the experience replay pool $D$ and sent to the target value network, yielding the target value $Q'$. The target return is then obtained as $y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'})$. Finally, the critic parameters $\theta^Q$ are updated via the loss function $Loss = \frac{1}{N^*} \sum_i (y_i - Q(s_i, a_i \mid \theta^Q))^2$ between the target return and the actual value, and the actor parameters $\theta^\mu$ are updated using the policy gradient $\nabla_{\theta^\mu} \mu|_{s_i} \approx \frac{1}{N^*} \sum_i \nabla_a Q(s_i, \mu(s_i) \mid \theta^Q)\, \nabla_{\theta^\mu} \mu(s_i \mid \theta^\mu)$.
The parameters $\theta^{\mu'}$ and $\theta^{Q'}$ of the target policy network and target value network are updated by the soft update algorithm, with the update formula shown in Equation (11).
$$\theta^{\mu'} = \tau\theta^\mu + (1-\tau)\theta^{\mu'}, \qquad \theta^{Q'} = \tau\theta^Q + (1-\tau)\theta^{Q'} \tag{11}$$
The soft update algorithm blends the target parameters toward the online parameters in small iterative steps, which ensures the stability of the target network; $\tau$ is usually taken as 0.001. The improved SFCR mapping algorithm based on DDPG is shown in Algorithm 1.
Algorithm 1: The improved SFCR mapping algorithm based on DDPG
1: Input: network topology $G = \{N, L\}$, network service function request set $SRs = \{SFCR_1, SFCR_2, SFCR_3, \dots\}$
2: Output: mapping topology $G_M = (V_f, N, vE)$
3: Randomly initialize parameters $\theta^\mu$, $\theta^Q$, $\theta^{\mu'}$, $\theta^{Q'}$ and replay pool $D$
4:    For episode = 1, T do
5:       Initialize mapping topology $G_M = (V_f, N, vE)$, state $s_1 = \{G(episode), SRs(episode)\}$ and update parameter $\tau$
6:       For t in $SRs$ do
7:          Select action $a_t = \mu(s_t \mid \theta^\mu)$ according to the current policy
8:          Update the mapping topology $G_M \leftarrow (G_M, a_t)$
9:          Obtain $r(s_t, a_t)$ and $s_{t+1}$
10:          Store transition $[s_t, a_t, r_t, s_{t+1}]$ in $D$
11:          Sample a random mini-batch of $N^*$ transitions from $D$
12:          For $[s_i, a_i, r_i, s_{i+1}]$ in $N^*$ do
13:              $y_i = r_i + \gamma Q'(s_{i+1}, \mu'(s_{i+1} \mid \theta^{\mu'}) \mid \theta^{Q'})$
14:              $Loss = \frac{1}{N^*} \sum_i (y_i - Q(s_i, a_i \mid \theta^Q))^2$
15:              $\nabla_{\theta^\mu} \mu|_{s_i} \approx \frac{1}{N^*} \sum_i \nabla_a Q(s_i, \mu(s_i) \mid \theta^Q)\, \nabla_{\theta^\mu} \mu(s_i \mid \theta^\mu)$
16:              $\theta^{\mu'} = \tau\theta^\mu + (1-\tau)\theta^{\mu'}$
17:              $\theta^{Q'} = \tau\theta^Q + (1-\tau)\theta^{Q'}$
18:          End for
19:       End for
20:    End for
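The target-network soft update of Equation (11), used in lines 16-17 of Algorithm 1, can be sketched with flat lists of floats standing in for network weights (a real implementation would walk the tensors of each network layer by layer):

```python
def soft_update(target_params, online_params, tau=0.001):
    """Soft (Polyak) update of Eq. (11): theta' <- tau*theta + (1-tau)*theta'.

    target_params / online_params: flat lists of floats representing the
    target and online network parameters; returns the new target list.
    The flat-list representation is an illustrative simplification.
    """
    return [tau * o + (1.0 - tau) * t
            for t, o in zip(target_params, online_params)]
```

With tau = 0.001 the target network drifts toward the online network very slowly, which is what keeps the target return $y_i$ stable during training.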

3.3. VNF Re-orchestration Algorithm

The improved SFCR mapping algorithm based on DDPG obtains optimal mapping results for the current network VNF deployment. However, user demand for different VNFs varies over time; if SFC mapping is performed only against the current VNF deployment, service nodes and links in high demand remain under long-term load while those in low demand sit idle for long periods. This leads to unreasonable resource allocation and fails to improve the effective service cost rate and mapping rate. Therefore, this paper proposes a VNF re-orchestration algorithm based on historical data such as the request rate and utilization rate of each VNF in the SFC mappings.
The algorithm operates on the set $Temps$ of samples from the first $H$ time slots, extracted from the experience replay pool $D$; if $Temps$ is empty, the VNF deployment in the base physical network keeps its initial setting. Otherwise, the VNFs are redeployed according to statistics of the request rate $Res$, the utilization rate $Uses$, the number of existing deployments $Va$ and the number of inactive instances $Slp$ of each VNF. The current traversal period is recorded by the parameter $t$ and the currently traversed VNF by the parameter $x$. The corresponding statistics are given in Equations (12)-(15).
$$Res(x) = \frac{Sum(SRs(t), VNF_x)}{Sum(SRs(t))} \times 100\% \tag{12}$$
$$Uses(x) = \frac{Sum(G_M(t), VNF_x)}{Sum(SRs(t))} \times 100\% \tag{13}$$
$$Va(x) = Sum(SRs(t) \mid VNF_x, p = 0) + Sum(SRs(t) \mid VNF_x, p = 1) \tag{14}$$
$$Slp(x) = Sum(SRs(t) \mid VNF_x, p = 0) \tag{15}$$
The VNF re-orchestration algorithm is shown in Algorithm 2.
Algorithm 2: VNF re-orchestration algorithm
1: Input: network topology G = {N, L}
2: Output: updated G
3: Obtain the first H samples from D to form Temps
4:    If Temps = ∅: Initialize G = {N, L}; Break
5: Initialize Res = (0, …, 0), Uses = (0, …, 0), Va = (0, …, 0), Slp = (0, …, 0)
6: Calculate Res(x), Uses(x), Va(x) and Slp(x) of the various VNFs at different time slots t according to Equations (12)–(15)
7:    For x = 1, …, M do
8:       Average request rate: AvgRes = Σ_{x=0}^{len(Res)} Res(x) / len(Res)
9:       Average usage rate: AvgUses = Σ_{x=0}^{len(Uses)} Uses(x) / len(Uses)
10:       Average number of deployments: AvgVa = Σ_{x=0}^{len(Va)} Va(x) / len(Va)
11:       Average number of inactivations: AvgSlp = Σ_{x=0}^{len(Slp)} Slp(x) / len(Slp)
12:    End for
13:    For x = 1, …, M do
14:       If Res(x) ≤ AvgRes × 0.7 AND Uses(x) ≤ AvgUses × 0.7 AND Va(x) ≥ AvgVa × 1.2 AND Slp(x) ≥ AvgSlp × 1.1:
15:          Uninstall 20% of this VNF
16:       If Res(x) ≥ AvgRes × 1.3 AND Uses(x) ≥ AvgUses × 1.3 AND Va(x) ≤ AvgVa × 0.8 AND Slp(x) == 0:
17:          Install 20% of this VNF
18:       If Res(x) ≥ AvgRes × 1.1 AND Uses(x) ≥ AvgUses × 1.1 AND Slp(x) ≠ 0:
19:          Activate 10% of this VNF
20:       If Res(x) ≤ AvgRes × 0.9 AND Uses(x) ≤ AvgUses × 0.9 AND Va(x) − Slp(x) ≠ 0:
21:          Put 10% of this VNF into dormancy
22:    End for
23:    Update the network VNF deployment G
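The threshold tests in steps 14–21 amount to a small decision function per VNF type. The sketch below uses the multipliers from Algorithm 2 (0.7, 1.2, 1.3, etc.); the comparison directions and the function/label names reflect our reading of the install/uninstall semantics rather than a verbatim implementation:

```python
def reorchestrate(res, uses, va, slp, avg_res, avg_uses, avg_va, avg_slp):
    """Map one VNF type's statistics to a re-orchestration action,
    following the threshold rules of Algorithm 2 (steps 14-21)."""
    # Under-requested, under-used, over-deployed, many dormant -> scale down.
    if (res <= avg_res * 0.7 and uses <= avg_uses * 0.7
            and va >= avg_va * 1.2 and slp >= avg_slp * 1.1):
        return "uninstall 20%"
    # Heavily requested and used, under-deployed, no spares -> scale up.
    if (res >= avg_res * 1.3 and uses >= avg_uses * 1.3
            and va <= avg_va * 0.8 and slp == 0):
        return "install 20%"
    # Demand above average and dormant instances available -> wake some up.
    if res >= avg_res * 1.1 and uses >= avg_uses * 1.1 and slp != 0:
        return "activate 10%"
    # Demand below average and active instances remain -> put some to sleep.
    if res <= avg_res * 0.9 and uses <= avg_uses * 0.9 and va - slp != 0:
        return "dormant 10%"
    return "no change"

# An over-provisioned, under-used VNF type is scaled down.
print(reorchestrate(res=5, uses=4, va=13, slp=3,
                    avg_res=10, avg_uses=10, avg_va=10, avg_slp=2))
```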

3.4. Time Complexity

In order to verify and analyse the practicality of the algorithm, the time complexity of the ISM-DRL method needs to be calculated. ISM-DRL comprises the SFCR mapping process and the VNF re-orchestration process, so the complexity of both parts must be analysed. The SFCR mapping process is learned over multiple iterations of the DDPG algorithm, so, taking the number of iterations and the training batch as the basis, its time complexity is O(T·N*·V), where T is the total number of iterations, N* is the training batch size, and V is the size of the SFCR set. During VNF re-orchestration, the time complexity is O(M·V²); since M denotes the number of VNF types in the SFC and is a constant, the complexity of VNF re-orchestration reduces to O(V²). Therefore, the total time complexity is O(V² + T·N*·V).

4. Experimental Evaluation

In this section, we introduce the physical and software environments used to build the simulation experiments, and configure the physical network topology, the SFCs, the DDPG learning rates, the experience replay pool size and the other relevant parameters used in the simulations. We then analyse the experimental results obtained by running the proposed method against the DDPG and DQN methods in three sets of environments with different weights.

4.1. Experimental Environment and Parameter Configuration

To evaluate the effectiveness of the ISM-DRL method, the effective service cost rate and the mapping rate are used as evaluation criteria. The simulation experiments are conducted on a device with 32 GB of DDR4 memory and two GTX-3060 8 GB graphics cards, using Python 3.5 and tensorflow1.6-gpu. To verify the accuracy and credibility of the evaluation, ISM-DRL is compared with the DDPG and DQN methods, and the Monte Carlo method is used: we ran 100 simulation tests in each experimental scenario and took the average of the 100 runs as the test result.
The underlying physical service network uses the pdh network topology from SNDlib, which contains 11 nodes and 34 links. To simulate a real network environment, the remaining experimental parameters are set as shown in Table 3:
During ISM-DRL training, the neural networks use the Adam optimizer and the ReLU activation function. The parameters involved include the number of training steps, the learning rates, the target network parameter update rate, and the size of the experience replay pool used during DDPG training. The specific DDPG configuration is shown in Table 4.
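For reference, the Table 4 settings can be gathered into a single configuration object; a sketch in which the field names are illustrative, not taken from the paper's code:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DDPGConfig:
    # Values taken from Table 4.
    training_steps: int = 5000   # training steps T
    lr_actor: float = 0.002      # actor learning rate
    lr_critic: float = 0.001     # critic learning rate
    tau: float = 0.001           # target network parameter update rate
    replay_size: int = 5000      # experience replay pool D
    gamma: float = 0.7           # discount factor (lambda in Table 4)
    epsilon: float = 0.01        # greedy exploration parameter
    batch_size: int = 128        # training batch size N*

cfg = DDPGConfig()
print(cfg.lr_actor, cfg.lr_critic, cfg.batch_size)  # 0.002 0.001 128
```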

4.2. Experimental Comparison

In designing the experimental environment, we consider the influence of the weights of the effective service cost rate and the mapping rate on the optimization effect of the ISM-DRL method. With the average effective service cost rate and the average mapping rate as objectives, the experiments are divided into three groups: (1) α1 = 0.3, α2 = 0.7; (2) α1 = 0.5, α2 = 0.5; and (3) α1 = 0.7, α2 = 0.3, where α1 and α2 represent the degree of influence of the effective service cost rate and the mapping rate, respectively, on the final optimization effect. In each group, comparison experiments are conducted against the DDPG and DQN methods.
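Assuming the joint objective is a simple weighted sum of the two rates (the paper defines its exact reward earlier in the article; this is an illustrative simplification), the role of α1 and α2 can be shown as:

```python
def joint_reward(cost_rate, mapping_rate, alpha1, alpha2):
    """Weighted joint objective: alpha1 weights the effective service
    cost rate, alpha2 weights the mapping rate, with alpha1 + alpha2 = 1."""
    assert abs(alpha1 + alpha2 - 1.0) < 1e-9
    return alpha1 * cost_rate + alpha2 * mapping_rate

# The three experimental groups from Section 4.2 applied to sample rates.
for a1, a2 in [(0.3, 0.7), (0.5, 0.5), (0.7, 0.3)]:
    print(a1, a2, round(joint_reward(0.8, 0.6, a1, a2), 3))
```

Sweeping (α1, α2) this way shifts the optimization pressure between cost efficiency and mapping success, which is exactly what the three experimental groups probe.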

4.2.1. Average Effective Service Cost Rate

The experimental comparisons of the average effective service cost rate under the different experimental environments are shown in Figure 3, Figure 4 and Figure 5. The results show that as the reward weight α1 of the effective service cost rate increases, the average effective service cost rate after convergence of all three algorithms shows an increasing trend, and ISM-DRL outperforms the other algorithms. From Table 5, at T = 500 the average effective service cost rate of the ISM-DRL method at α1 = 0.7 is 84.86%, an improvement of approximately 18.63 percentage points over the 66.23% obtained at α1 = 0.3, while DDPG improves by 15.24 and DQN by 15.18 percentage points. At α1 = 0.7, ISM-DRL improves on DDPG by 13.9 and on DQN by 24.3 percentage points. ISM-DRL is more effective than DDPG and DQN at improving the effective service cost rate because it uses the request rate and related statistics to guide VNF redeployment, reducing unnecessary unused service cost at mapping time by adjusting the number of deployed VNFIs and their activation and dormancy states.

4.2.2. Average Mapping Rate

The experimental comparisons of the average mapping rate under the different experimental environments are shown in Figure 6, Figure 7 and Figure 8. The results show that as the mapping rate reward weight α2 decreases, the average mapping rate of all three algorithms shows a decreasing trend. This is because, as the time period increases, the cumulative number of SFC requests grows, so node and link resources in the underlying physical network are continuously occupied, causing some SFCs to fail to map due to insufficient resources. However, the mapping rate of the ISM-DRL method declines significantly more slowly than that of the DDPG and DQN methods. From Table 6, at T = 500 the mapping rate of ISM-DRL is 75.76% at α2 = 0.7, which is 15.09 percentage points higher than DDPG and 32.82 percentage points higher than DQN. ISM-DRL converges best on the average mapping rate because it redeploys VNFs every 50 time periods based on the user request rate and the actual mapping rate of each VNF. By increasing the number of VNFIs for VNFs with high request rates and low actual mapping rates, and setting them to the active state, more alternative VNFIs are provided for the high-request-rate VNFs in subsequent SFCs. In Figure 3, Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8, because ISM-DRL re-orchestrates the VNFs every 50 time periods, the curve corresponding to ISM-DRL fluctuates noticeably after each orchestration, while the DDPG and DQN methods do not regulate the number of deployed and activated VNFs, so their overall trends are smoother.

5. Conclusions

This paper investigates SFC mapping in dynamic request scenarios and divides the SFC mapping process into the SFCR mapping problem and the VNF re-orchestration problem. Firstly, an improved SFCR mapping algorithm based on DDPG is proposed for the SFCR mapping problem; it learns the demand degree of the VNFs in the SFCRs at each time slot for dynamic service requests and improves the mapping rate of the SFC. Secondly, the VNF re-orchestration algorithm is proposed for VNF deployments in underlying physical networks that cannot adapt to changing SFCRs. Re-orchestrating VNFs based on historical SFCR information increases the installation and activation of high-request-rate VNFs and uninstalls or puts to sleep low-request-rate VNFs, through the install, uninstall, activate and dormancy operations on VNFIs, thereby improving the effective cost utilisation. The experimental results show that the proposed method can effectively improve the effective cost utilisation and mapping rate in the face of dynamic SFCRs, and improves the ability of the underlying network to adapt itself to upper-layer services.
In future work, we will continue our research on adaptive mapping under dynamic requests, exploring how deep reinforcement learning can better incorporate SFCRs into the SFCR mapping problem, yielding faster algorithm convergence and higher mapping rates. For the VNF orchestration problem, we will also consider introducing deep reinforcement learning methods, using their autonomous learning and exploration capabilities to predict dynamic changes in future requests and thereby better improve the adaptability of the underlying network to dynamic requests.

Author Contributions

Conceptualization, W.H. and S.L.; methodology, W.H., S.L. and S.W.; software, S.L. and H.L.; validation, W.H., S.L., S.W. and H.L.; formal analysis, W.H. and S.L.; investigation, S.L., S.W. and H.L.; data curation, W.H., S.L. and H.L.; writing—original draft preparation, W.H. and S.L.; writing—review and editing, W.H., S.L., S.W. and H.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in part by the Project of Science and Technology in Henan Province (222102210175, 222102210111, 222102210096), and in part by the Postgraduate Education Reform and Quality Improvement Project of Henan Province (YJS2022AL035).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
SFC: Service Function Chain
SFCR: Service Function Chain Request
NFV: Network Functions Virtualization
VNF: Virtual Network Function
VNFI: Virtual Network Function Instance
DRL: Deep Reinforcement Learning
DDPG: Deep Deterministic Policy Gradient

References

  1. Sun, G.; Xu, Z.; Yu, H.; Chang, V. Dynamic network function provisioning to enable network in box for industrial applications. IEEE Trans. Ind. Inform. 2020, 17, 7155–7164. [Google Scholar] [CrossRef]
  2. Mei, C.; Liu, J.; Li, J.; Zhang, L.; Shao, M. 5G network slices embedding with sharable virtual network functions. J. Commun. Netw. 2020, 22, 415–427. [Google Scholar] [CrossRef]
  3. Fang, L.; Zhang, X.; Sood, K.; Wang, Y.; Yu, S. Reliability-aware virtual network function placement in carrier networks. J. Netw. Comput. Appl. 2020, 154, 102536. [Google Scholar] [CrossRef]
  4. Qiu, H.; Tang, H.; You, W. Online Service Function Chain Deployment Method Based on Deep Q Network. J. Electron. Inf. Technol. 2021, 43, 3122–3130. [Google Scholar]
  5. Herrera, J.G.; Botero, J.F. Tabu Search For Service Function Chain Composition In NFV. IEEE Lat. Am. Trans. 2021, 19, 17–25. [Google Scholar] [CrossRef]
  6. Han, X.; Meng, X.; Yu, Z.; Zhai, D. A Dynamic Adjustment Method of Service Function Chain Resource Configuration. KSII Trans. Internet Inf. Syst. 2021, 15, 2783–2804. [Google Scholar]
  7. Zhang, D.; Zheng, Z.; Lin, X.; Chen, X.; Wu, C. Dynamic backup sharing scheme of service function chains in NFV. China Commun. 2022, 19, 178–190. [Google Scholar] [CrossRef]
  8. Wei, S.; Zhou, J.; Chen, S. Delay-Aware Multipath Parallel SFC Orchestration. IEEE Access 2022, 10, 120035–120055. [Google Scholar] [CrossRef]
  9. Xu, S.; Liao, B.; Hu, B.; Han, C.; Yang, C.; Wang, Z.; Xiong, A. A reliability-and-energy-balanced service function chain mapping and migration method for Internet of Things. IEEE Access 2020, 8, 168196–168209. [Google Scholar] [CrossRef]
  10. Yaghoubpour, F.; Bakhshi, B.; Seifi, F. End-to-end delay guaranteed Service Function Chain deployment: A multi-level mapping approach. Comput. Commun. 2022, 194, 433–445. [Google Scholar] [CrossRef]
  11. Li, G.; Feng, B.; Zhou, H.; Zhang, Y.; Sood, K.; Yu, S. Adaptive service function chaining mappings in 5G using deep Q-learning. Comput. Commun. 2020, 152, 305–315. [Google Scholar] [CrossRef]
  12. Fu, X.; Yu, F.R.; Wang, J.; Qi, Q.; Liao, J. Dynamic service function chain embedding for NFV-enabled IoT: A deep reinforcement learning approach. IEEE Trans. Wirel. Commun. 2019, 19, 507–519. [Google Scholar] [CrossRef]
  13. Li, W.; Wu, H.; Jiang, C.; Jia, P.; Li, N.; Lin, P. Service Chain Mapping Algorithm Based on Reinforcement Learning. In Proceedings of the 2020 International Wireless Communications and Mobile Computing (IWCMC), Limassol, Cyprus, 15–19 June 2020; pp. 800–805. [Google Scholar]
  14. Tang, L.; He, L.; Tan, Q.; Chen, Q. Virtual Network Function Migration Optimization Algorithm Based on Deep Deterministic Policy Gradient. J. Electron. Inf. Technol. 2021, 43, 404. [Google Scholar]
  15. Ouamri, M.A.; Barb, G.; Singh, D.; Alexa, F. Load Balancing Optimization in Software-Defined Wide Area Networking (SD-WAN) using Deep Reinforcement Learning. In Proceedings of the 2022 International Symposium on Electronics and Telecommunications (ISETC), Timisoara, Romania, 10–11 November 2022; pp. 1–6. [Google Scholar] [CrossRef]
  16. Filali, A.; Mlika, Z.; Cherkaoui, S.; Kobbane, A. Preemptive SDN Load Balancing With Machine Learning for Delay Sensitive Applications. IEEE Trans. Veh. Technol. 2020, 69, 15947–15963. [Google Scholar] [CrossRef]
  17. Wang, T.; Fan, Q.; Li, X.; Zhang, X.; Xiong, Q.; Fu, S.; Gao, M. DRL-SFCP: Adaptive Service Function Chains Placement with Deep Reinforcement Learning. In Proceedings of the ICC 2021—IEEE International Conference on Communications, Montreal, QC, Canada, 14–23 June 2021; pp. 1–6. [Google Scholar] [CrossRef]
  18. Tam, P.; Math, S.; Kim, S. Priority-Aware Resource Management for Adaptive Service Function Chaining in Real-Time Intelligent IoT Services. Electronics 2022, 11, 2976. [Google Scholar] [CrossRef]
  19. Xu, S.; Li, Y.; Guo, S.; Lei, C.; Liu, D.; Qiu, X. Cloud–Edge Collaborative SFC Mapping for Industrial IoT Using Deep Reinforcement Learning. IEEE Trans. Ind. Inform. 2022, 18, 4158–4168. [Google Scholar] [CrossRef]
  20. Li, J.; Shi, W.; Wu, H.; Zhang, S.; Shen, X. Cost-Aware Dynamic SFC Mapping and Scheduling in SDN/NFV-Enabled Space–Air–Ground-Integrated Networks for Internet of Vehicles. IEEE Internet Things J. 2022, 9, 5824–5838. [Google Scholar] [CrossRef]
  21. Xiao, Y.; Zhang, Q.; Liu, F.; Wang, J.; Zhao, M.; Zhang, Z.; Zhang, J. NFVdeep: Adaptive Online Service Function Chain Deployment with Deep Reinforcement Learning. In Proceedings of the International Symposium on Quality of Service, IWQoS’19, Phoenix, AZ, USA, 24–25 June 2019; Association for Computing Machinery: New York, NY, USA, 2019. [Google Scholar]
  22. Yue, Y.; Cheng, B.; Liu, X.; Wang, M.; Li, B.; Chen, J. Resource optimization and delay guarantee virtual network function placement for mapping SFC requests in cloud networks. IEEE Trans. Netw. Serv. Manag. 2021, 18, 1508–1523. [Google Scholar] [CrossRef]
  23. Muzahid, A.J.M.; Kamarulzaman, S.F.; Rahman, M.A.; Alenezi, A.H. Deep Reinforcement Learning-Based Driving Strategy for Avoidance of Chain Collisions and Its Safety Efficiency Analysis in Autonomous Vehicles. IEEE Access 2022, 10, 43303–43319. [Google Scholar] [CrossRef]
Figure 1. The improved SFC mapping model.
Figure 2. The ISM-DRL framework.
Figure 3. Average effective service cost rate at α 1 = 0.3 , α 2 = 0.7 .
Figure 4. Average effective service cost rate at α 1 = 0.5 , α 2 = 0.5 .
Figure 5. Average effective service cost rate at α 1 = 0.7 , α 2 = 0.3 .
Figure 6. Average mapping rate at α 1 = 0.3 , α 2 = 0.7 .
Figure 7. Average mapping rate at α 1 = 0.5 , α 2 = 0.5 .
Figure 8. Average mapping rate at α 1 = 0.7 , α 2 = 0.3 .
Table 1. Comparison of Optimization Objectives and Methods.
 | Optimising Costs | Optimised SFCR Mapping | VNF Reorchestration | Method
This paper | ✓ | ✓ | ✓ | DDPG
Literature [12] | ✓ | × | × | DQN
Literature [14] | ✓ | × | × | DDPG
Table 2. List of all the parameters.
Parameter | Description
G = {N, L} | The underlying physical network topology
N | Physical service nodes
L | Physical links
VNFIs_i | VNFI set of a physical service node
(C(n_i), M(n_i)) | The current remaining CPU and memory resources
B(l_{i,j}) | The current remaining bandwidth resource
SRs | A set of network service function requests
SR_f | The f-th SFCR
v_i^f (v_i^f ∈ V^f) | The CPU and memory resource requirements of a VNF
vB(e_{i,j}^f) | The bandwidth resources required by each virtual link
GM = (V^f, N, vE) | The service mapping graph
T | Time period and training steps
Table 3. Simulation experiment parameters configuration.
Experimental Parameters | Parameter Value
Physical node CPU and memory resources | Random distribution between [200, 300]
Physical link resources | Random distribution between [200, 300]
Number of VNFIs deployed per physical node | Uniform distribution between [2, 5]
Variety of virtual network functions | M = 5 kinds
SFCR composition | 2–5 kinds of network functions
Time period | 10 s
Number of SFCRs arriving in the time period | Poisson distribution with an average of 5
Service time per request d | Random distribution between [10, 50]
CPU and memory resources required for normal operation of a VNF | Random distribution between [10, 20]
Virtual link bandwidth required for normal operation | Random distribution between [10, 20]
Table 4. The specific configuration of DDPG.
Experimental Parameters | Parameter Value
Training steps T | 5000
Learning rate of actor/critic lr | 0.002/0.001
Target network parameter update rate tau | 0.001
Size of the experience replay pool D | 5000
Discount factor λ | 0.7
Greedy ε | 0.01
Training batch size N* | 128
Reward weight parameters α1, α2 | 0–1
Table 5. Average effective cost rate at T = 500.
 | ISM-DRL | DDPG | DQN
α1 = 0.3 | 66.23% | 55.72% | 45.38%
α1 = 0.5 | 75.89% | 64.72% | 53.49%
α1 = 0.7 | 84.86% | 70.96% | 60.56%
Table 6. Average mapping rate at T = 500.
 | ISM-DRL | DDPG | DQN
α2 = 0.7 | 75.76% | 60.67% | 42.94%
α2 = 0.5 | 62.83% | 49.75% | 40.11%
α2 = 0.3 | 57.68% | 46.45% | 31.41%

Huang, W.; Li, S.; Wang, S.; Li, H. An Improved Adaptive Service Function Chain Mapping Method Based on Deep Reinforcement Learning. Electronics 2023, 12, 1307. https://doi.org/10.3390/electronics12061307
