
Knowledge-Assisted Actor Critic Proximal Policy Optimization-Based Service Function Chain Reconfiguration Algorithm for 6G IoT Scenario

1 School of Communication and Information Engineering, Chongqing University of Posts and Telecommunications, Chongqing 400065, China
2 Department of Electronic Engineering, Tsinghua University, Beijing 100084, China
* Author to whom correspondence should be addressed.
Entropy 2024, 26(10), 820; https://doi.org/10.3390/e26100820
Submission received: 6 August 2024 / Revised: 14 September 2024 / Accepted: 23 September 2024 / Published: 25 September 2024

Abstract

Future 6G networks will inherit and develop the Network Function Virtualization (NFV) architecture. With an NFV-enabled network architecture, it becomes possible to establish different virtual networks within the same infrastructure, create different Virtual Network Functions (VNFs) in different virtual networks, and form Service Function Chains (SFCs) that meet different service requirements through the orderly combination of VNFs. These SFCs can be deployed to physical entities as needed to provide network functions that support different services. To meet the highly dynamic service requirements of the future 6G Internet of Things (IoT) scenario, a highly flexible and efficient SFC reconfiguration algorithm is a key research direction. Deep-learning-based algorithms have shown their advantages in solving this type of dynamic optimization problem. The efficiency of the traditional Actor Critic (AC) algorithm is limited because the policy does not directly participate in the value function update. In this paper, we use the Proximal Policy Optimization (PPO) clip function to restrict the difference between the new policy and the old policy, to ensure the stability of the updating process. We combine PPO with AC, and further bring in historical decision information as network knowledge to offer better initial policies and accelerate training. We then propose the Knowledge-Assisted Actor Critic Proximal Policy Optimization (KA-ACPPO)-based SFC reconfiguration algorithm to ensure the Quality of Service (QoS) of end-to-end services. Simulation results show that the proposed KA-ACPPO algorithm can effectively reduce computing cost and power consumption.

1. Introduction

Looking towards 2030 and beyond, 6G will provide lower latency and more reliable end-to-end connectivity to power the transition from Industry 4.0 to Industry 5.0 [1]. The network environment and service requirements are highly dynamic in future 6G IoT scenarios. To cope with highly dynamic service requirements, 6G networks will inherit and develop the NFV architecture. With an NFV-enabled network, it becomes possible to establish different virtual networks within the same infrastructure and create different VNFs in different virtual networks. When service requirements arrive, VNFs are combined in an orderly manner to form SFCs that meet different service requirements. In this way, SFC deployment based on mobile edge computing and network function virtualization can be a viable solution for providing flexible and controlled network services [2]. When the service requirements change, the corresponding SFC needs to be reconfigured to guarantee the QoS requirements. To meet the low latency requirements of 6G services, highly flexible and efficient SFC orchestration and reconfiguration algorithms have become a crucial problem.
The SFC reconfiguration problem involves the study of how to modify, re-organize, and monitor the deployed SFC to cope with network dynamic changes and meet new service requirements. SFC reconfiguration is always implemented by VNF migration. Reference [3] modeled the SFC reconfiguration problem as an Integer Linear Programming (ILP) formulation and proposed a greedy-based heuristic algorithm to solve it. Reference [4] proposed a threshold-dependent scalable cluster VNF migration algorithm to minimize the cost of embedding VNFs under the condition of satisfying delay constraints. Reference [5] described the VNF migration problem as a new graph theory problem and proposed an efficient heuristic algorithm based on dynamic programming to effectively alleviate dynamic traffic and reduce total traffic costs. Reference [6] studied the reconfiguration problem of a set of SFCs with different priorities. A polynomial time heuristic algorithm was proposed to quickly deploy emergency SFCs while meeting the requirements of emergency SFC input traffic and resource constraints. Another exact algorithm was also proposed to achieve maximum profit for service providers. Reference [7] aimed to minimize the total migration cost and used a temporal convolutional network to predict network traffic, proposing a fast and efficient heuristic VNF migration algorithm.
Nowadays, with the rapid development of Artificial Intelligence (AI), deep learning has been widely investigated and has shown advantages in coping with this type of dynamic problem. Reference [8] proposed a DRL-based algorithm to provide fast VNF migration decisions in highly dynamic environments, with the goal of minimizing the weighted total latency and cost of VNF migration. Reference [9] used a mixed-density neural network to accurately model complex user migration patterns in reality, in order to support the prediction of user edge cloud access probabilities and minimize the sum of operational costs and potential losses caused by downtime. Reference [10] proposed a deep Dyna-Q approach to solve the SFC reconfiguration problem under the premise of guaranteeing QoS and resource constraints. Reference [11] proposed a SFC management scheme based on the prediction of service. Reference [12] proposed the Dueling Double Deep Q Network (D3QN)-based SFC orchestration scheme to ensure QoE of users in the highly dynamic and resource-constrained Unmanned Aerial Vehicle (UAV) scenario.
The aforementioned research demonstrates the tremendous potential of deep learning algorithms. The AC algorithm, as a policy-based algorithm, has been used in many dynamic optimization problems. However, the efficiency of classical AC is limited because the policy does not directly participate in the value function update. We therefore use the principle of PPO, that is, proximal ratio pruning, to limit the magnitude of policy updates, and form the ACPPO algorithm. Based on this, we further utilize historical decision information as network knowledge to offer better initial policies and accelerate the training speed.
The main contributions of this paper are synthesized as follows:
  • We formulate the SFC reconfiguration as a VNF migration problem, aiming to minimize the migration cost, to cope with the highly dynamic service requirements in 6G IoT scenarios.
  • We use proximal ratio pruning to limit the magnitude of policy updates, reducing the impact of the coupling relationship in the AC algorithm, and combine the advantages of the AC and PPO algorithms to solve the VNF migration problem. We further introduce the 6G knowledge base, which is dynamically updated by saving high-quality historical policies, to expand the action space and formulate the KA-ACPPO algorithm.
  • Simulation results show that our proposed KA-ACPPO can effectively reduce computing cost and power consumption.
This paper is organized as follows. In Section 2, the system model is presented and the problem is formulated. Section 3 proposes the KA-ACPPO algorithm. Section 4 gives the simulation results. Finally, Section 5 concludes the paper and gives future research directions.

2. System Model and Problem Formulation

2.1. Network Model

Consider the 6G IoT scenario. Service requirements are satisfied by SFC orchestration, which is accomplished through global control with the help of the 6G knowledge base. When a device produces a service request, it is assigned to the closest edge node and communicates with the top-level SDN controller and NFV manager. The VNF manager completes the VNF orchestration, forming an ordered set of VNFs, that is, the SFC deployment policy. Finally, the policy is issued to the infrastructure layer to deploy the network functions that meet the service requirements. When the service requirements change, the SFC can be reconfigured through VNF migration, for example, migrating a VNF from one edge node to another.
As shown in Figure 1, the edge nodes and the physical links in the infrastructure layer are expressed as the undirected graph $G = (X, L)$, in which $X$ and $L$, respectively, describe the set of edge nodes and the set of physical links. The available computing resource of edge node $x \in X$ can be expressed as $C_x^{CPU}$. $l_{x,y}$ describes the physical link between edge node $x$ and edge node $y$, in which $x, y \in X$ and $x \neq y$. SFC reconfiguration is implemented through VNF migration, so the physical link needs to reserve bandwidth $B$ for VNF migration. In fact, every SFC can be expressed as an ordered set of VNFs, denoted as $V = \{v_1, v_2, \ldots, v_{|V|}\}$. We divide IoT services into two categories: delay-sensitive services and delay-tolerant services. The corresponding SFC requests can be denoted as delay-sensitive requests $j \in J$ and delay-tolerant requests $k \in K$, and the corresponding computing resources provided by the edge nodes can be, respectively, denoted as $b_j$ and $b_k$. This paper ensures the QoS of service requests by controlling the reconfiguration costs of VNF migration in three aspects: computing cost, power consumption cost, and delay cost.
For clear presentation, we summarize the notations used in the following formulation in Table 1.

2.2. The Computing Cost of VNF Migration

SFCs are deployed in the edge nodes and are allocated computing resources from the edge nodes to meet the service requirements. To represent the computing resource consumption of node $x$, we denote the decision-making variables of SFC reconfiguration as follows:
$$\theta_{x,j} = \begin{cases} 1 & \text{if resource } b_j \text{ is allocated to request } j, \\ 0 & \text{otherwise,} \end{cases}$$
$$\theta_{x,k} = \begin{cases} 1 & \text{if resource } b_k \text{ is allocated to request } k, \\ 0 & \text{otherwise.} \end{cases}$$
The computing resource consumption of the edge node x can be denoted as follows:
$$z_x = \sum_{j=1}^{J} \theta_{x,j} b_j + \sum_{k=1}^{K} \theta_{x,k} b_k.$$
During the VNF migration process, only the CPU state between the initial node and the target node is considered. Therefore, the computing cost of VNF migration can be denoted as follows:
$$\mathrm{Cost}_1 = \sum_{x=1}^{X} \sum_{v=1}^{V} W_x^v \left(1 - U_x^v\right) z_x,$$
where $U_x^v$ is the variable of SFC deployment, and $W_x^v$ is the variable of SFC reconfiguration. For example, if the first VNF of the current SFC is not deployed at edge node 1, it can be denoted as $U_1^1 = 0$. If the VNF is migrated to edge node 1, it can be denoted as $W_1^1 = 1$, and $W_1^1 (1 - U_1^1) = 1$, which means the VNF is migrated.
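As an illustrative sketch (the function and variable names are ours, not from the paper's implementation), the computing cost $\mathrm{Cost}_1$ can be evaluated directly from the binary deployment and reconfiguration variables:

```python
# Sketch of the VNF-migration computing cost; names are illustrative.
def computing_cost(W, U, z):
    """W[x][v], U[x][v]: binary reconfiguration/deployment variables;
    z[x]: computing-resource consumption of edge node x.
    A term contributes only when VNF v is newly placed at node x
    (W = 1, U = 0), i.e. an actual migration."""
    return sum(
        W[x][v] * (1 - U[x][v]) * z[x]
        for x in range(len(W))
        for v in range(len(W[0]))
    )

# Example: VNF 0 migrates to node 0 (W=1, U=0) on a node consuming z=4.
W = [[1, 0], [0, 1]]
U = [[0, 0], [0, 1]]
z = [4, 2]
print(computing_cost(W, U, z))  # -> 4
```

Only newly placed VNFs ($W = 1$, $U = 0$) contribute, matching the migration indicator $W(1-U)$ above.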

2.3. The Power Consumption Cost of VNF Migration

VNF migration can cause changes in the working status of certain edge nodes, resulting in additional power consumption. Power consumption is related to the usage of computing resources at edge nodes. Based on the current SFC deployment policy, the power consumption of the edge node x can be denoted as [13]:
$$P_x = P(0\%) + \left(P(100\%) - P(0\%)\right)\left(2 z_x - z_x^{1.4}\right),$$
where P ( 0 % ) denotes the power consumption of the edge nodes in idle modes, and P ( 100 % ) denotes the power consumption of the edge nodes in full-load modes.
Therefore, the power consumption cost of VNF migration can be denoted as:
$$\mathrm{Cost}_2 = \sum_{x=1}^{X} \sum_{v=1}^{V} W_x^v \left(1 - U_x^v\right) P_x.$$
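As a minimal sketch of the node power model (assuming $z_x$ here is the normalized CPU utilization in $[0, 1]$; the idle and full-load powers are illustrative values, not taken from the paper):

```python
# Sketch of the edge-node power model; p_idle/p_full are illustrative.
def node_power(z_x, p_idle=87.0, p_full=145.0):
    """z_x: normalized CPU utilization in [0, 1].
    P_x = P(0%) + (P(100%) - P(0%)) * (2*z_x - z_x**1.4)."""
    return p_idle + (p_full - p_idle) * (2 * z_x - z_x ** 1.4)

print(round(node_power(0.0), 1))  # -> 87.0  (idle power)
print(round(node_power(1.0), 1))  # -> 145.0 (full-load power)
```

The boundary checks confirm the model interpolates between the idle and full-load powers.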

2.4. The Delay Cost of VNF Migration

In this paper, the delay-sensitive services and the delay-tolerant services are both modeled as M/M/1 queues, where the arrival rates are, respectively, denoted as $\lambda_j$ and $\lambda_k$. The total arrival rate at edge node $x$ can be denoted as follows:
$$\lambda = \sum_{j=1}^{J} \lambda_j + \sum_{k=1}^{K} \lambda_k.$$
The CPU working frequency of the edge node x is denoted as F, and the serving rate of VNF can be denoted as:
$$\mu = z_x F.$$
And the traffic intensity can be denoted as:
$$\rho = \frac{\lambda}{\mu}.$$
Therefore, the queuing delay of the SFC requests can be denoted as:
$$D_q = \frac{\rho^2}{\lambda(1-\rho)}.$$
For simplicity, only the processing delay of the delay-sensitive SFC requests is additionally considered, which is denoted as:
$$D_j = \frac{\beta_j}{b_j},$$
where β j denotes the data size of the jth delay-sensitive SFC request.
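The queuing and processing delays above can be sketched as follows (illustrative names; the M/M/1 waiting time $D_q = \rho^2 / (\lambda(1-\rho))$ agrees with the classical form $\rho/(\mu - \lambda)$):

```python
# Sketch of the delay terms of Section 2.4; names are illustrative.
def mm1_queue_delay(lam, mu):
    """M/M/1 waiting time: rho = lam/mu, D_q = rho^2 / (lam * (1 - rho)).
    Requires rho < 1 (the stability condition, cf. constraint C4)."""
    rho = lam / mu
    assert rho < 1, "queue is unstable"
    return rho ** 2 / (lam * (1 - rho))

def processing_delay(beta_j, b_j):
    """D_j = beta_j / b_j: data size over allocated computing resource."""
    return beta_j / b_j

# Example: arrival rate 2, service rate 4 -> rho = 0.5.
print(mm1_queue_delay(2.0, 4.0))   # -> 0.25
print(processing_delay(1.0, 2.0))  # -> 0.5
```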
Considering the network overhead of the infrastructure layer and the stability of network services, the impact of VNF migration latency and downtime should be minimized to the greatest extent possible. The VNF migration latency is related to the position change of VNF, and the number of migrated VNF at edge node x can be denoted as:
$$\mu_x = \sum_{v=1}^{V} \left| W_x^v - U_x^v \right|.$$
The number of VNF migrations between edge node $x$ and edge node $y$ can be denoted as:
$$\mu_{x,y} = \sum_{v=1}^{V} W_x^v \left(1 - U_y^v\right).$$
The corresponding delay of VNF migration between edge node x and edge node y can be denoted as:
$$D_{x,y}^v = \frac{d_j}{B} W_x^v U_y^v \max_{(x,y) \in X}\left(\mu_x, \mu_{x,y}, \mu_y\right).$$
The total delay of VNF migration caused by the position change of VNFs can be denoted as:
$$D_m = \max_{(v,x,y)} D_{x,y}^v.$$
Downtime refers to the time period during which the migrated VNF is unresponsive, because its state is still being migrated from the initial node to the target node or the network has not yet converged. Therefore, it is necessary to minimize the possibility of network service interruption caused by prolonged downtime. Using the current resource utilization rate of the network edge nodes as a penalty factor to calculate the additional delay caused by excessive downtime, the CPU utilization rate of the current network edge node $x$ can be expressed as:
$$\rho_x = \frac{z_x}{C_x^{CPU}}.$$
The additional delay caused by prolonged downtime is denoted as:
$$D_o = \sum_{x=1}^{X} \sum_{v=1}^{V} \left(D_j + D_q\right) W_x^v \left(1 - U_x^v\right) \rho_x.$$
When the downtime of VNF migration is too long, it can lead to delay-sensitive SFC requests not being processed in a timely manner during migration, resulting in QoS degradation of delay-sensitive SFC requests. The QoS degradation rate of delay-sensitive SFC requests caused by long VNF downtime can be expressed as:
$$Q_j = \frac{D_o}{D_q + D_j + D_m + D_o}.$$
Taking into account the total latency of VNF migration and the additional latency caused by excessive downtime, the latency cost of VNF migration can be expressed as:
$$\mathrm{Cost}_3 = D_m + D_o.$$
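A small sketch tying the delay terms together (function names are illustrative, not from the paper's code):

```python
# Sketch of the QoS degradation rate and delay cost of Section 2.4.
def qos_degradation(D_q, D_j, D_m, D_o):
    """Q_j: share of the end-to-end delay contributed by the extra
    downtime delay D_o."""
    return D_o / (D_q + D_j + D_m + D_o)

def delay_cost(D_m, D_o):
    """Cost_3: migration delay plus downtime delay."""
    return D_m + D_o

print(qos_degradation(0.25, 0.5, 1.0, 0.25))  # -> 0.125
print(delay_cost(1.0, 0.25))                  # -> 1.25
```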

2.5. Problem Formulation

The total cost of SFC reconfiguration in the 6G IoT scenario can be denoted as:
$$\mathrm{Cost} = \frac{\mathrm{Cost}_1}{\sigma_1} + \frac{\mathrm{Cost}_2}{\sigma_2} + \frac{\mathrm{Cost}_3}{\sigma_3},$$
where $\sigma_1$, $\sigma_2$, and $\sigma_3$ are normalization constants. In this paper, we minimize the total cost of SFC reconfiguration and formulate the problem as follows:
$$\min_{U_x^v, W_x^v} \left( \frac{\mathrm{Cost}_1}{\sigma_1} + \frac{\mathrm{Cost}_2}{\sigma_2} + \frac{\mathrm{Cost}_3}{\sigma_3} \right),$$
subject to the constraints:
$$\begin{aligned} C_1 &: z_x \leq C_x^{CPU}, \quad \forall x \in X, \\ C_2 &: \sum_{x=1}^{X} W_x^v = 1, \quad \forall v \in V, \\ C_3 &: D_q + D_j + D_m + D_o \leq D, \quad \forall v \in V,\ x, y \in X, \\ C_4 &: z_x F - \Big(\sum_{j=1}^{J} \lambda_j + \sum_{k=1}^{K} \lambda_k\Big) \geq 0, \quad \forall x \in X,\ j \in J,\ k \in K, \end{aligned}$$
where $C_1$ ensures that edge node $x$ is not overloaded. $C_2$ ensures that every VNF of each SFC is deployed at exactly one edge node. $C_3$ is the delay constraint of the delay-sensitive services. $C_4$ ensures the stability of the SFC request queueing.
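Under the reconstruction above (reading the $\sigma_i$ as normalization divisors, which is our assumption), the objective and a per-node feasibility check can be sketched as:

```python
# Sketch of the objective and constraint check; names are illustrative.
def total_cost(cost1, cost2, cost3, sigma=(1.0, 1.0, 1.0)):
    """Normalized SFC-reconfiguration cost; sigma values are
    illustrative normalization constants."""
    return cost1 / sigma[0] + cost2 / sigma[1] + cost3 / sigma[2]

def feasible(z, C_cpu, total_delay, D_max, F, lam_total):
    """Check C1 (no node overload), C3 (end-to-end delay bound), and
    C4 (queue stability, z*F >= total arrival rate) for one edge node;
    C2 is enforced by construction of the placement variables."""
    return z <= C_cpu and total_delay <= D_max and z * F - lam_total >= 0

print(total_cost(4.0, 58.0, 1.25, sigma=(4.0, 58.0, 1.25)))  # -> 3.0
print(feasible(z=0.5, C_cpu=1.0, total_delay=2.0,
               D_max=5.0, F=10.0, lam_total=4.0))            # -> True
```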

3. Knowledge-Assisted ACPPO Algorithm

In this section, the Markov Decision Process (MDP) is used to model the above problem. The MDP can be simply defined as $\langle S, A, R \rangle$, where the state $S$, action $A$, and reward $R$ are, respectively, defined as follows.

3.1. State

Define the state $S$ as the system state at time $t$, which can be written as follows:
$$s(t) = \{c_j(t), b_k(t), z_x(t)\},$$
where $c_j(t)$ and $b_k(t)$, respectively, denote the resource requirements of delay-sensitive and delay-tolerant SFC requests, which are detailed as follows:
$$c_j(t) = \{c_1(t), c_2(t), \ldots, c_J(t)\},$$
$$b_k(t) = \{b_1(t), b_2(t), \ldots, b_K(t)\},$$
$$z_x(t) = \{z_1(t), z_2(t), \ldots, z_X(t)\}.$$

3.2. Action

Action $A$ denotes the set of VNF migrations, and the VNF migration at time $t$ can be denoted as:
$$a(t) = \{a_1(t), a_2(t), \ldots, a_i(t), \ldots, a_{|V|}(t)\}, \quad i \in V,$$
where $a_i(t)$ indicates whether the $i$th VNF is migrated: $a_i(t) = 1$ means that the $i$th VNF is migrated, and otherwise $a_i(t) = 0$. To better adapt to network changes, the knowledge-based VNF migration method from the 6G network knowledge base is introduced as prior knowledge to expand the action space, allowing more choices in the VNF migration decision-making and optimization processes and thereby improving the flexibility and adaptability of the algorithm. At the same time, the action space can be explored more fully, giving the algorithm the opportunity to find better solutions and improve its performance and efficiency.

3.3. Reward

Since this section aims to minimize the reconfiguration cost of the SFC, the reward function can be denoted as:
$$r(t) = -\alpha \, \mathrm{Cost},$$
where $\alpha > 0$ is the reward factor; the larger $r(t)$, the smaller the SFC reconfiguration cost.
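The MDP interface can be sketched as follows (the state dimensions and the reward factor $\alpha$ are illustrative, not values from the paper):

```python
# Minimal sketch of the MDP interface (state and reward).
ALPHA = 0.1  # illustrative reward factor

def make_state(c_j, b_k, z_x):
    """s(t) = {c_j(t), b_k(t), z_x(t)} flattened into one vector."""
    return list(c_j) + list(b_k) + list(z_x)

def reward(cost):
    """r(t) = -alpha * Cost: a smaller reconfiguration cost gives a
    larger reward."""
    return -ALPHA * cost

s = make_state([1.0, 2.0], [0.5], [0.3, 0.7])
print(len(s))        # -> 5
print(reward(10.0))  # -> -1.0
```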
Based on the above MDP model, the KA-ACPPO algorithm is proposed to solve the SFC reconfiguration problem. Combining the advantages of the AC algorithm and the PPO algorithm, the ACPPO algorithm is efficient, stable, and adaptable. However, the selection of hyperparameters and the efficiency of sample processing are still issues that need to be considered when using the ACPPO algorithm. Therefore, under the 6G autonomous control framework, the KA-ACPPO algorithm introduces network knowledge related to the algorithm from the 6G network knowledge base and initializes the Actor network parameters and the Critic network parameters; by providing a good initial policy, it accelerates the learning process and reduces training time, avoiding a large amount of exploration. Meanwhile, a knowledge-based VNF migration method is introduced from the 6G network knowledge base as prior knowledge to expand the action space, helping the algorithm converge and learn faster, improving its performance in complex tasks, and enabling it to cope with a wider range of 6G scenarios.
The framework of the KA-ACPPO algorithm is shown in Figure 2. The current network topology $G$ and the VNF deployment state are taken as input, and the Actor network parameters and the Critic network parameters are initialized according to the 6G knowledge base module. The historic policies in the knowledge base module are updated based on the SFC reconfiguration policy. The Actor network is responsible for selecting VNF migration actions, while the Critic network evaluates the value of each action. The gradient method is used to update the parameters of the Critic network and the Actor network, and the optimal SFC reconfiguration policy can be denoted as:
$$\pi^* = \arg\max_{a} V_\pi(s, a), \quad \forall s, a.$$
The Critic network evaluates the actions that the Actor network selects based on its current output policy by using the advantage function, thereby assisting the Actor in policy updates. In the current state $s(t)$, the advantage function of selecting action $a(t)$ based on the policy of the Actor network can be denoted as:
$$\hat{A}_t = Q_{\pi_{old}}(s_t, a_t) - V_{\pi_{old}}(s_t),$$
where $Q_{\pi_{old}}(s_t, a_t)$ denotes the value of taking action $a_t$ in state $s_t$, and $V_{\pi_{old}}(s_t)$ is the value of state $s_t$.
According to generalized advantage estimation [14], the advantage function can be denoted as:
$$\hat{A}_t = \delta_t + (\gamma\lambda)\delta_{t+1} + \cdots + (\gamma\lambda)^{T-t-1}\delta_{T-1},$$
where $\delta_t = r(t) + \gamma V(s_{t+1}) - V(s_t)$ is the Temporal Difference (TD) error, $\gamma$ is the discount factor, and $\lambda$ is the estimation parameter. Letting $\lambda = 1$, the advantage function can be denoted as:
$$\hat{A}_t = -V(s_t) + r_t + \gamma r_{t+1} + \cdots + \gamma^{T-t-1} r_{T-1} + \gamma^{T-t} V(s_T).$$
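With $\lambda = 1$, the advantage reduces to the discounted return minus the value baseline, which can be sketched as (illustrative names):

```python
# Sketch of the lambda = 1 advantage estimate.
def advantage_lambda1(rewards, values, gamma=0.99):
    """A_t = -V(s_t) + r_t + gamma*r_{t+1} + ... + gamma^{T-t}*V(s_T).
    rewards: [r_t, ..., r_{T-1}]; values: (V(s_t), V(s_T))."""
    v_t, v_T = values
    discounted_return = sum(gamma ** i * r for i, r in enumerate(rewards))
    return -v_t + discounted_return + gamma ** len(rewards) * v_T

# Two-step episode segment with gamma = 0.5.
print(advantage_lambda1([1.0, 1.0], (0.5, 2.0), gamma=0.5))  # -> 1.5
```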
The larger the value of the advantage function, the better the selected action. The advantage function is designed to increase the stability of the policy and avoid the unstable optimization caused by excessive policy updates. At the same time, by limiting the amplitude of policy updates, the Actor network's update step will not be too large. The update amplitude of the Actor network's policy is the ratio of the probability of the current policy taking action $a(t)$ in state $s(t)$ to the probability of the old policy taking action $a(t)$ in state $s(t)$, which is denoted as:
$$r_t(\theta) = \frac{\pi_\theta(a_t \mid s_t)}{\pi_{\theta_{old}}(a_t \mid s_t)}.$$
The clip function limits the size of the policy update step, preventing excessive policy updates and thus avoiding instability issues. The clip function can be denoted as:
$$\mathrm{clip}\left(r_t(\theta), 1-\varepsilon, 1+\varepsilon\right) = \begin{cases} 1+\varepsilon & r_t(\theta) > 1+\varepsilon, \\ 1-\varepsilon & r_t(\theta) < 1-\varepsilon, \\ r_t(\theta) & \text{otherwise.} \end{cases}$$
The clip function restricts the difference between the new policy and the old policy, to ensure the stability of the updating process. The proximal clip loss function in this section can be denoted as:
$$L^{clip}(\theta) = \mathbb{E}\left[\min\left(r_t(\theta)\hat{A}_t,\ \mathrm{clip}\left(r_t(\theta), 1-\varepsilon, 1+\varepsilon\right)\hat{A}_t\right)\right].$$
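The clipped surrogate objective can be sketched in plain Python (illustrative; a real implementation would operate on tensors and batches):

```python
# Sketch of the PPO clipped surrogate objective.
def ppo_clip_objective(ratios, advantages, eps=0.2):
    """Mean over samples of min(r_t * A_t, clip(r_t, 1-eps, 1+eps) * A_t).
    Returns the objective to be maximized (negate it for a loss)."""
    def clip(r):
        return max(1 - eps, min(1 + eps, r))
    terms = [min(r * a, clip(r) * a) for r, a in zip(ratios, advantages)]
    return sum(terms) / len(terms)

# A ratio of 2.0 with positive advantage is clipped to 1.2.
print(round(ppo_clip_objective([2.0, 1.0], [1.0, 1.0], eps=0.2), 3))  # -> 1.1
```

Clipping caps the contribution of samples whose probability ratio has drifted far from 1, which is exactly how the update magnitude is bounded.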
The Critic network updates as follows:
$$\phi \leftarrow \phi - \beta_c \nabla_\phi L(\phi),$$
where $\beta_c$ is the learning rate of the Critic network, and $L(\phi)$ is the loss function of the Critic network that minimizes the TD error, which can be denoted as:
$$L(\phi) = \left(r_t + \gamma r_{t+1} + \cdots + \gamma^{T-t-1} r_{T-1} + \gamma^{T-t} V(s_T) - V(s_t)\right)^2.$$
The Actor network updates as follows:
$$\theta \leftarrow \theta + \beta_a \nabla_\theta L^{clip}(\theta),$$
where $\beta_a$ is the learning rate of the Actor network. Algorithm 1 details the training process of KA-ACPPO.
Algorithm 1 The Training Process of KA-ACPPO.
1: Input: the current network topology G and the current VNF deployment locations
2: Output: the best SFC reconfiguration policy π*
3: Initialize the Actor network parameters θ and the Critic network parameters ϕ
4: for thread number = 1, …, N do
5:     for iteration number = 1, …, K do
6:         Obtain the current network state s(t), and apply the knowledge-based VNF migration to expand the action space
7:         Select the action a(t) according to the policy of the Actor network
8:         if the constraints C1–C4 are satisfied then
9:             Execute the action a(t), get the instant reward r(t), and move to the next state s(t+1)
10:        else
11:            Reselect the action a(t)
12:        end if
13:    end for
14:    Calculate Â_t and L^clip(θ)
15:    Update the Critic network based on Equation (35)
16:    Update the Actor network based on Equation (37)
17: end for

4. Simulation and Result Analysis

4.1. Parameter Setting

In this section, we set up the simulation environment to evaluate the performance of the proposed KA-ACPPO algorithm. The key parameters are listed in Table 2 [15]. It is worth noting that the parameter settings of the service requests are generated by random selection from the sum of Gaussian functions with different parameters [15].

4.2. Results Analysis

Figure 3 shows the reward of the KA-ACPPO algorithm for different numbers of neurons. It can be seen that, as the number of neurons increases, the reward of the KA-ACPPO algorithm also increases. The reason is that increasing the number of neurons improves the representation ability of the neural networks, enabling them to better learn and fit complex functional relationships and to better express the complex relationships between states and actions, thereby improving the reward. However, increasing the number of hidden layer neurons increases the complexity of the model and may slow down the training process. Therefore, when selecting the number of hidden layer neurons, it is necessary to balance training efficiency against the reward gain. Considering training efficiency and reward comprehensively, the number of neurons for the KA-ACPPO algorithm is set to 128 in the following simulations.
Figure 4 shows the normalized reconfiguration costs of the different algorithms, which is our optimization objective in Equation (21). From the figure, we can see that the AC algorithm converges quickly, but it fluctuates greatly throughout the entire training cycle and converges to a worse value. This is because the policy and value functions in the AC algorithm are interdependent, and their updates affect each other. This coupling relationship leads to fluctuations in the training process of the AC algorithm. The ACPPO algorithm uses proximal ratio pruning to limit the magnitude of policy updates, reducing the impact of this coupling relationship to ensure a stable training process. Moreover, the proposed KA-ACPPO algorithm introduces 6G knowledge on the basis of the ACPPO algorithm, which helps the algorithm learn and converge quickly, ultimately achieving smaller fluctuations and better convergence.
Figure 5 shows the computing cost of VNF migration using different algorithms, which is related to Equation (4). The length of the SFC affects the computing resource cost of VNF migration in two aspects: the current usage of the computing resources of the edge nodes, and the number of VNF migrations. As the length of the SFC increases, the number of VNFs per SFC increases, and the current usage of edge node computing resources becomes more complex. With the goal of minimizing reconfiguration costs, the number of VNF migrations should be controlled while migrating the VNFs that use fewer computing resources. From the figure, it can also be observed that as the length of the SFC increases, the computing resource cost of VNF migration for all three algorithms increases. The computing resource cost of VNF migration using the proposed KA-ACPPO algorithm is always lower than that of the other two algorithms, ultimately controlling the computing resource cost of VNF migration at 34%.
Figure 6 shows the power costs of VNF migration for different algorithms, which is related to Equation (6); the comparison algorithm is the Reinforcement Learning (RL)-based VNF deployment scheme proposed in [16]. The power cost of VNF migration is related to the usage of network node computing resources. Therefore, the length of the SFC also affects the power cost of VNF migration in two aspects: the current usage of edge node computing resources and the number of VNF migrations. As the length of the SFC increases, the number of VNFs per SFC increases, the computing resources currently used at the network edge nodes increase, and the power consumption increases accordingly. Considering that VNF migration may change the working mode of edge nodes, with the goal of minimizing reconfiguration costs, the number of VNF migrations should be controlled to minimize the additional power consumption caused by such changes. From Figure 6, it can be observed that as the length of the SFC increases, the power cost of VNF migration for all three algorithms increases. The power cost of VNF migration for the KA-ACPPO algorithm is always lower than that of the other two algorithms.
Figure 7 shows the QoS degradation rates of delay-sensitive SFC requests using different algorithms, which is related to Equation (18). According to Equation (18), the QoS degradation rate of a delay-sensitive SFC is defined as the proportion of the additional delay caused by excessive downtime to the total end-to-end delay. The length of the SFC affects the QoS degradation rate of delay-sensitive SFCs in two aspects. Firstly, as the length of the SFC increases, the total end-to-end latency also increases. Secondly, as the length of the SFC increases, the number of VNF migrations per SFC may also increase, and the negative impact of migration latency and downtime on the QoS of delay-sensitive SFCs increases accordingly. From Figure 7, it can be observed that as the length of the SFC increases, the QoS degradation rate of the delay-sensitive SFCs of all three algorithms increases. The QoS degradation rate of the delay-sensitive SFCs of the KA-ACPPO algorithm is always lower than that of the other two algorithms, and the QoS degradation rate of the delay-sensitive SFCs is ultimately controlled at 39%.

5. Conclusions and Discussion

6G will inherit and develop the NFV architecture. With this NFV-enabled flexible network architecture, SFC orchestrates different VNFs to maintain the QoS of different services. Considering that the coupling relationship leads to fluctuations in the training process of the traditional AC algorithm, this paper introduces proximal ratio pruning to limit the magnitude of policy updates and combines the advantages of the AC algorithm and the PPO algorithm. Moreover, we introduce the 6G knowledge base to provide a better initial policy, accelerating training, and form the KA-ACPPO-based VNF migration algorithm. Simulation results show that the proposed algorithm can effectively reduce computing cost and power consumption.
In this paper, we only consider historic policies as network knowledge to improve performance. In future work, more dimensions of knowledge, such as model-based algorithms and expert experience, could be studied to further improve the performance of deep learning algorithms.

Author Contributions

Conceptualization, X.S. and B.L.; methodology, B.L.; software, S.L.; validation, B.L. and S.L.; formal analysis, B.L.; writing—original draft preparation, B.L.; writing—review and editing, S.L. and X.S.; supervision, X.S.; funding acquisition, X.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by National Key R&D Program of China (No. 2020YFB1806702).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

No new data were created.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
NFV	Network Function Virtualization
VNF	Virtual Network Function
SFC	Service Function Chains
IoT	Internet of Things
AC	Actor Critic
PPO	Proximal Policy Optimization
QoS	Quality of Service
AI	Artificial Intelligence
DQN	Deep Q Network
D3QN	Dueling Double Deep Q Network
UAV	Unmanned Aerial Vehicle
RL	Reinforcement Learning

Figure 1. 6G IoT scenario and system model.
Figure 2. The framework of the proposed KA-ACPPO algorithm.
Figure 3. Rewards of the KA-ACPPO algorithm for different numbers of neurons.
Figure 4. The normalized reconfiguration costs for different algorithms.
Figure 5. The computing cost of VNF migration for different algorithms.
Figure 6. The power consumption cost of VNF migration for different algorithms.
Figure 7. QoS degradation rate of delay-sensitive SFC for different algorithms.
Table 1. Notations used in this paper.

Notation: Description
X: the set of edge nodes
L: the set of physical links
l_{xy}: the link between edge node x and edge node y
j ∈ J: the index of delay-sensitive requests
k ∈ K: the index of delay-tolerant requests
b_j: the computing resource allocated to request j
b_k: the computing resource allocated to request k
z_x: the computing resource consumption of edge node x
U_x^V: the variable of SFC deployment
W_x^V: the variable of SFC reconfiguration
P_x: the power consumption cost
λ_j, λ_k: the arrival rates of delay-sensitive and delay-tolerant requests
λ: the total arrival rate of requests
μ: the serving rate of a VNF
ρ: the traffic intensity
D_q: the queuing delay of SFC requests
D_j: the delay of the j-th delay-sensitive SFC request
β_j: the data size of the j-th delay-sensitive SFC request
D_m: the total delay of VNF migration caused by the position change of VNFs
D_o: the additional delay caused by prolonged downtime
Q_j: the QoS degradation rate of the j-th delay-sensitive SFC request
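The queuing quantities in the notation above (arrival rate λ, serving rate μ, traffic intensity ρ, queuing delay D_q) can be related by a standard queuing formula. As a minimal sketch, the snippet below assumes each VNF behaves as an M/M/1 queue, so D_q equals the mean waiting time W_q = ρ/(μ − λ); this queuing discipline is an assumption for illustration, not a restatement of the paper's exact model.

```python
def mm1_queuing_delay(lam: float, mu: float) -> float:
    """Mean waiting time in queue, W_q, for an M/M/1 queue.

    lam: total request arrival rate (lambda = lambda_j + lambda_k), requests/s
    mu:  VNF serving rate, requests/s
    """
    rho = lam / mu  # traffic intensity
    if rho >= 1.0:
        # The queue is only stable when rho < 1.
        raise ValueError("unstable queue: traffic intensity rho must be < 1")
    return rho / (mu - lam)  # W_q = rho / (mu - lam)

# Example: lambda = 8 requests/s over both request classes, mu = 10 requests/s
d_q = mm1_queuing_delay(8.0, 10.0)  # rho = 0.8, W_q = 0.8 / (10 - 8) = 0.4 s
```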
Table 2. Simulation parameters.

Simulation Parameter: Simulation Value
number of edge nodes: 10
computation resource of edge nodes: [100, 200] units
power consumption of an edge node at idle state: 87 W
power consumption of an edge node at full-load state: 145 W
reserved bandwidth for a VNF: [5, 15] Mbps
required computing resource for a delay-sensitive SFC: [10, 30] units
required computing resource for a delay-tolerant SFC: [5, 15] units
length of SFC: {4, 5, 6, 7}
end-to-end delay constraint: 60 ms
parameter of the clip function: 0.2
learning rate of the Actor network: 0.001
learning rate of the Critic network: 0.01
discount factor: 0.99
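For reproducing a setup like Table 2, the parameters can be gathered into a single configuration object. The dictionary and sampling helper below are hypothetical names introduced for illustration (they do not come from the paper); ranges in brackets are treated as uniform sampling intervals, which is an assumption.

```python
import random

# Hypothetical container for the simulation parameters of Table 2;
# key names and the sampling helper are illustrative, not from the paper.
SIM_PARAMS = {
    "num_edge_nodes": 10,
    "node_computing_resource": (100, 200),   # units, sampled per edge node
    "idle_power_w": 87,
    "full_load_power_w": 145,
    "vnf_reserved_bandwidth_mbps": (5, 15),
    "delay_sensitive_cpu_units": (10, 30),
    "delay_tolerant_cpu_units": (5, 15),
    "sfc_lengths": [4, 5, 6, 7],
    "e2e_delay_constraint_ms": 60,
    "ppo_clip_epsilon": 0.2,
    "actor_lr": 1e-3,
    "critic_lr": 1e-2,
    "discount_factor": 0.99,
}

def sample_node_resources(params: dict, seed: int = 0) -> list:
    """Draw one computing-resource value per edge node from its range
    (assumed uniform over [100, 200] units)."""
    rng = random.Random(seed)
    low, high = params["node_computing_resource"]
    return [rng.randint(low, high) for _ in range(params["num_edge_nodes"])]
```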
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Liu, B.; Long, S.; Su, X. Knowledge-Assisted Actor Critic Proximal Policy Optimization-Based Service Function Chain Reconfiguration Algorithm for 6G IoT Scenario. Entropy 2024, 26, 820. https://doi.org/10.3390/e26100820


Note that from the first issue of 2016, this journal uses article numbers instead of page numbers.
