Article

Research on QoS Flow Path Intelligent Allocation of Multi-Services in 5G and Industrial SDN Heterogeneous Network for Smart Factory

Qing Guo, Qibing Jin, Zhen Liu, Mingshi Luo, Liangchao Chen, Zhan Dou and Xu Diao
1 College of Information Science and Technology, Beijing University of Chemical Technology, Beijing 100029, China
2 College of Communication and Information Technology, Xi’an University of Science and Technology, Xi’an 710054, China
3 School of Computer Science, Xi’an Shiyou University, Xi’an 710065, China
4 College of Mechanical and Electrical Engineering, Beijing University of Chemical Technology, Beijing 100029, China
5 High-Tech Research Institute, Beijing University of Chemical Technology, Beijing 100029, China
6 China Academy of Safety Science and Technology, Beijing 100012, China
* Authors to whom correspondence should be addressed.
Sustainability 2023, 15(15), 11847; https://doi.org/10.3390/su151511847
Submission received: 22 May 2023 / Revised: 17 July 2023 / Accepted: 25 July 2023 / Published: 1 August 2023
(This article belongs to the Special Issue Risk Assessment and Management in the Process Industries)

Abstract

In this paper, an intelligent traffic path allocation scheme with multiple Quality of Service (QoS) constraints and a corresponding algorithm are proposed. The proposed method enhances the deep Q-learning network (DQN) with a graph neural network (GNN) and prioritized experience replay to fit the heterogeneous network that serves the production management and edge intelligent applications of a smart factory. Moreover, by designing the reward function, the learning efficiency of the agent is improved under the sparse reward condition, and multi-objective optimization is realized. The simulation results show that the proposed method has high learning efficiency and strong generalization ability, adapting to changes of the network topology caused by network errors, and is more suitable for this scenario than the compared methods. In addition, it is verified that combining field knowledge with deep reinforcement learning (DRL) can improve the performance of the agent. The proposed method also achieves good performance in the network slicing scenario.

1. Introduction

The purpose of combining Operation Technology (OT) and Information Technology (IT) is to build an information system for the smart factory based on industrial control network technology. The new industrial network scheme based on SDN and TSN can satisfy the demanding requirements of industrial networks for low latency, flexibility and dynamics [1]. Due to the mobility of devices caused by flexible manufacturing, the programmable logic controllers (PLCs) of lathes and the main controllers need to be connected by a wireless network [2]. The control traffic of PLCs is sensitive to latency, and the fifth generation-advanced (5G-A) network, which supports ultra-reliable low-latency communications (URLLC) and interconnects with TSN, can serve such traffic with deterministic latency well [3].
By exploiting edge intelligence (EI), the data stream can route directly to the near edge Data Center (DC) implemented in a workshop instead of transmitting to the application system in the central DC [4]. The structure of the inner network of the smart factory is presented in Figure 1.
The factory network is divided into access layer, distribution layer and core layer. The part between the vertical dotted line a and b is the access layer, which is built by the access switch and wireless network. The part between the vertical dotted line b and c is the distribution layer, which is a ring or MESH network built by distributed switches. The part between the vertical dotted line c and d is the core layer, which is built by two core switches. The network switch is controlled by an SDN controller through open flow protocol.
The edge devices, such as vehicles, augmented reality (AR) glasses, robots, cameras, digital lathes and intelligent terminals, are to the left of the vertical dotted line a in Figure 1. The application system for production management is implemented in the central DC, the EI applications are implemented in the edge DC and the edge devices are connected to the switches of the distribution layer.
A heterogeneous network, diverse traffic, mobile access and applications deployed in both the cloud and the edge are the characteristics of the network and traffic of a smart factory. The mobility of staff and devices requires dynamic rather than static traffic path allocation. In addition, the diversity of Quality of Service (QoS) requirements makes operation and maintenance difficult. Operators have studied the Autonomous Driving Network (ADN), which introduces artificial intelligence (AI) into network management. ADN has attracted numerous researchers, but studies on industrial networks are rare. In fact, the industrial network needs ADN even more because of its complexity.
On the one hand, network slicing can prevent burst data from degrading the delay jitter of real-time traffic. On the other hand, network slicing can provide a solution for transmission with deterministic delay. The authors in [1] divided an SDN switch into two slices: one for highly reliable low-latency traffic, such as communication between PLCs and remote operation, and another for delay-insensitive traffic, such as data collection and safety video monitoring.
The method proposed in [5] used hard slicing to satisfy the different QoS requirements of traffic, which realizes the isolation of real-time and non-real-time traffic in a long term evolution (LTE) network. The authors in [2] proposed a QoS cooperative control scheme and procedure for a 5G and SDN heterogeneous network for the smart factory, and modeled the production network of a specific smart factory. Besides 5G network slicing, SDN network slicing was also proposed in [2], together with a method for dynamic allocation of multiple-QoS traffic flow paths and global optimization by linear programming.
The Full Paths Re-computation (FPR), Heuristic Full Paths Re-computation (HPR) and Partial Paths Re-computation (PPR) network stream optimization algorithms for a single bandwidth parameter in 5G SDN were proposed in [6]. Although the HPR and PPR algorithms decrease the operation delay, the allocation delay approaches 1 s when the number of users exceeds 50, which cannot satisfy the access delay requirement.
With the increase in traffic volume and network scale, conventional methods for dynamic flow path allocation and global optimization require a lot of computing resources, take a long computation time and suffer from the ‘curse of dimensionality’. In addition, the access delay becomes too long to satisfy the requirement. The authors in [2] proposed utilizing Deep Reinforcement Learning (DRL) in the factory network to realize the establishment and dynamic optimization of multi-parameter QoS flow paths.
By exploiting the DRL method, an agent is introduced into the management plane. It interacts with the SDN controller, collects network state information, including traffic access requirements, and makes routing decisions according to its policy. According to the decision, the SDN controller generates the traffic flow path and sends it to the SDN switch in the data plane via the OpenFlow protocol to realize the intelligent allocation of the traffic path [7,8,9,10].
However, the aforementioned DRL-based methods are not designed for industrial networks and do not explicitly consider the bandwidth and delay requirements together. Therefore, these methods may not satisfy the requirements of an industrial network when the resulting delay exceeds the maximum traffic latency requirement. Recently, the study in [11] proposed a graph embedding-based DRL framework for adaptive path selection (GRL-PS), which achieved near-optimal performance in both latency and throughput, especially in large-scale dynamic networks.
Furthermore, these DRL-based methods may not adapt well to a network topology that differs from the one seen during training, because a conventional neural network may not handle the complex graph problems in the industrial network [12]. Industrial networks are highly reliable and often adopt ring or mesh topologies that can recover from faults on their own. When an error occurs in a link or switch node, the topological structure of the network changes, and the allocation method needs to adapt to this change.
When the network satisfies the traffic QoS, the maximum network traffic is used to evaluate the agent performance. However, the reward of each individual path allocation is hard to obtain; only the reward corresponding to the maximum traffic of the network is available when an episode finishes, which is a sparse reward problem. Therefore, a reward function needs to be designed to guide the learning of the agent.
Model-free DRL methods rely heavily on large amounts of training data, which are not easy to obtain in a real network, especially in sparse traffic situations. Although the random experience replay method solves the correlation and non-stationarity problems of experience data, uniform sampling and batch-based updating lead to a scenario where valuable data are not fully used [13].
Although most research focuses on general-purpose networks, such as NSFNET, GEANT2 and GBN, there has also been research on industrial networks in recent years. The authors of [14] designed a deep federated Q-learning-based algorithm, which dynamically assigns traffic flow paths to different network slices according to QoS requirements in IIoT networks based on LoRaWAN technology. Studies on QoS flow path intelligent allocation in 5G and SDN heterogeneous networks for the smart factory are rare. Such studies should consider network reliability, network slicing and multi-service mixing in industrial situations.

2. Methodology

Ref. [15] proposes an overall architecture of an intelligent routing method based on Dueling DQN reinforcement learning, which significantly improves the network throughput and effectively reduces the network delay and packet loss rate. To fit the topological structure of the inner network of a smart factory and the characteristics of its traffic, we study a multiple-QoS traffic path allocation method based on DRL for the 5G and industrial SDN heterogeneous network. The traffic flow path allocation and optimization architecture is shown in Figure 2. It consists of the data plane, control plane and management plane. The function of each plane is described below.
(1)
The data plane mainly includes the 5G user plane and the SDN switches, which are controlled by the 5G control plane and the SDN controller of the control plane, respectively. The collected states of the network are reported to the control plane. Service requests originating from terminal equipment or servers are sent to the control plane, which allocates the traffic flow path.
(2)
The control plane mainly includes the 5G control plane, the SDN controller and the IWF-NEF. The IWF-NEF is responsible for QoS collaborative control between the application, 5G and the SDN controller. The network status, including traffic requests, is sent to the management plane by the control plane. After receiving the action from the QoS policy agent, the new state and reward are sent back to the management plane as feedback.
(3)
The management plane is a Deep Reinforcement Learning (DRL)-based policy agent, which acquires the ability of QoS flow path intelligent allocation through training and outputs an action corresponding to the QoS flow path.
Due to different production organizations, factory network topologies differ. Meanwhile, due to the high reliability of an industrial network, the traffic path should be allocated properly even when some nodes fail. Since a graph neural network (GNN) has a strong ability in modeling and optimizing graph structures, as well as good generalization [16,17,18], it is reasonable to exploit a GNN to model the network structure and realize relational reasoning and combinatorial generalization. Ref. [19] uses a GNN to forecast SDN end-to-end latency, which can enhance the network’s routing strategy. In this paper, a GNN is utilized in the DQN to learn the network state and the allocation method based on deep learning.
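As an illustration of how a GNN can be embedded in the Q-value computation, the following is a minimal NumPy sketch of message passing over the switch graph followed by a sum-pooling readout. The two-step update, the feature choice and all weight shapes are our assumptions for illustration, not the exact E-DQN architecture used in the paper.

import numpy as np

def message_passing(adj, feats, w_self, w_neigh, steps=2):
    # Minimal GNN message passing: each node mixes its own features with
    # the sum of its neighbours' features, followed by a ReLU.
    h = feats
    for _ in range(steps):
        neigh_sum = adj @ h                                   # aggregate neighbour states
        h = np.maximum(0.0, h @ w_self + neigh_sum @ w_neigh)
    return h

def q_value(adj, feats, w_self, w_neigh, w_out):
    # Read out a scalar Q-value for one candidate path encoded in feats.
    h = message_passing(adj, feats, w_self, w_neigh)
    return float(h.sum(axis=0) @ w_out)                       # sum-pool over nodes, then project

# Toy example: 4 switch nodes, 3 features per node (e.g. free capacity,
# betweenness, a flag marking nodes on the candidate path).
rng = np.random.default_rng(0)
adj = np.array([[0, 1, 1, 0],
                [1, 0, 1, 1],
                [1, 1, 0, 1],
                [0, 1, 1, 0]], dtype=float)
feats = rng.random((4, 3))
w_self, w_neigh, w_out = rng.random((3, 3)), rng.random((3, 3)), rng.random(3)
print(q_value(adj, feats, w_self, w_neigh, w_out))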
The prioritized experience replay method can solve the problem of valuable data not being fully used under random experience replay, and it improves the learning efficiency. The insight of the prioritized experience replay method [13] is to evaluate the value of data for learning based on the TD-error, and to replay valuable data more often. To avoid the lack of diversity caused by purely TD-error-based prioritization, we utilize a probability-based prioritized experience replay method. Moreover, importance sampling is introduced to correct the bias caused by prioritized replay, and the weights generated by importance sampling are applied in the Q-learning update.
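The following is a minimal sketch of proportional prioritization with importance sampling in the spirit of [13]; the buffer handling, the hyper-parameters alpha, beta and eps, and the class name are illustrative assumptions, not the implementation used in the paper.

import numpy as np

class PrioritizedReplayBuffer:
    # Proportional prioritized replay: priority p_i = |TD-error| + eps,
    # sampling probability P(i) proportional to p_i^alpha, and
    # importance-sampling weights correct the induced bias.

    def __init__(self, capacity, alpha=0.6, eps=1e-3):
        self.capacity, self.alpha, self.eps = capacity, alpha, eps
        self.data, self.priorities, self.pos = [], np.zeros(capacity), 0

    def add(self, transition):
        max_p = self.priorities.max() if self.data else 1.0    # new samples get max priority
        if len(self.data) < self.capacity:
            self.data.append(transition)
        else:
            self.data[self.pos] = transition
        self.priorities[self.pos] = max_p
        self.pos = (self.pos + 1) % self.capacity

    def sample(self, batch_size, beta=0.4):
        p = self.priorities[:len(self.data)] ** self.alpha
        prob = p / p.sum()
        idx = np.random.choice(len(self.data), batch_size, p=prob)
        weights = (len(self.data) * prob[idx]) ** (-beta)
        weights /= weights.max()                               # normalize for stability
        return idx, [self.data[i] for i in idx], weights

    def update_priorities(self, idx, td_errors):
        self.priorities[idx] = np.abs(td_errors) + self.eps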
As for the sparse reward problem, we aim to improve the learning of the agent by improving the learning method or adding a new reward, including the reward shaping method [20], designing a new reward module [21], and adding a new reward learning method [22,23]. The reward shaping method designs the reward according to prior information about the environment. Therefore, we design the reward function by combining graph theory with conventional network programming and optimization to help the agent find the policy quickly.

2.1. Algorithm Framework

The framework of the proposed algorithm is presented in Figure 3, which contains a training stage and a test stage. The simulation environment is composed of a traffic generator and an industrial SDN model. The enhanced DQN (E-DQN), GNN and prioritized experience replay method are used as the agent, the deep learning structure and the experience replay memory, respectively. The evaluation module collects information from the environment and the agent and evaluates the performance of the agent.
In the training stage, the traffic generator generates traffic requirements according to the QoS. The path set, i.e., the action set, is established by the industrial SDN model. By interacting with the policy network of the agent, an action is selected and sent to the environment. The environment calculates the reward based on the designed reward function, sends the complete transition {s, a, r, s′} to the agent, and records it in the experience replay memory. In the proposed algorithm, E-DQN uses the prioritized experience replay method to choose valuable data for training, and the GNN is used as the neural network. The network traffic, QoS indexes and reward are evaluated periodically in the training stage.

2.2. Network Model and Traffic Model

2.2.1. Factory Network Model

A factory network has its own characteristics, such as a three-layer network architecture and intelligent applications with edge-cloud collaboration. The production management application is implemented at the factory- or company-level data center (DC), while some real-time edge intelligent applications are implemented on the production site due to the application of edge computing [24]. From the viewpoint of the network framework, once the access point is determined, the uplink path from the edge device to the access switch is fixed, regardless of wired or wireless access. Therefore, the key to traffic path allocation lies in the distribution layer and the core layer.
For simplification, the study objects of the network model are chosen as the distribution networks of Workshop 1 and Workshop 2 and the core network of the factory. The topological structure of the network is presented in Figure 4, where SX denotes an application server in the central DC and EC-SXX denotes an edge intelligent application system implemented on site, such as a main PLC or a remote operator.
{SW01, SW02} are core switches, and the remaining switches are distribution switches. The bandwidth of links connecting to the core switches is 1000 Mbps, and the bandwidth between distribution switches is 200 Mbps.
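For illustration, a partial sketch of this topology can be built with networkx. Only a handful of the switches from Figure 4 are listed, and connecting every listed distribution switch to both core switches is an assumption made for brevity rather than the paper's exact link list.

import networkx as nx

G = nx.Graph()
core = ["SW01", "SW02"]
distribution = ["SW31", "SW32", "SW33", "SW37", "SW91", "SW92", "SW93"]

for sw in distribution:
    for c in core:
        G.add_edge(c, sw, capacity=1000)    # links to core switches: 1000 M
G.add_edge("SW31", "SW32", capacity=200)    # links between distribution switches: 200 M
G.add_edge("SW32", "SW33", capacity=200)

betweenness = nx.betweenness_centrality(G)  # static node feature used later by the reward
print(sorted(betweenness, key=betweenness.get, reverse=True)[:3])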

2.2.2. Traffic Model

As shown in Figure 4, the production management application system is usually implemented in the central DC. The production management application includes monitoring of production data, video of key locations, device health, and dispatching of speech and video, which can be divided into real-time (RT) and non-real-time (NRT) service types.
The source nodes and target nodes of the traffic towards the central DC are as follows.
(1)
Source nodes: {sw31, sw32, sw33, sw34, sw35, sw36, sw37, sw38, sw39, sw40, sw41, sw42, sw43}
(2)
Target nodes: {sw91, sw92, sw93}
The edge intelligent application systems EC-S01, EC-S02, EC-S03, EC-S04, EC-S05, EC-S06 and EC-S07 handle highly real-time traffic, including control messages between PLCs, local traffic for security monitoring and remote operation of vehicles, which can be divided into RT and highly reliable ultra-low-latency (URT) types. It is worth mentioning that the traffic of a workshop usually accesses the edge computing system of that workshop, which means the source and target nodes are divided into different groups. The group details are as follows.
Group 1:
(1)
Source nodes 1: {sw31, sw37, sw38, sw39, sw40, sw41, sw42, sw43}
(2)
Target nodes 1: {sw37, sw39, sw40, sw42}
Group 2:
(1)
Source nodes 2: {sw32, sw33, sw34, sw35, sw36, sw44}
(2)
Target nodes 2: {sw33, sw34, sw35}
The bandwidth and delay requirements of the traffic are shown in Table 1. For simplicity, the delay requirement is defined as the maximum number of nodes allowed in the flow path. The traffic bandwidth is chosen from the set {2, 4, 8, 16, 32}.
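A minimal sketch of a traffic generator consistent with Table 1 is given below; the uniform sampling and the function name are assumptions, since the paper does not state the distribution its traffic generator uses.

import random

BANDWIDTHS = [2, 4, 8, 16, 32]                 # Mbps, as in Table 1
SOURCES_DC = [f"sw{i}" for i in range(31, 44)] # sw31 .. sw43
TARGETS_DC = ["sw91", "sw92", "sw93"]

def generate_request(max_delay_hops=(7, 8)):
    # Draw one NRT traffic request towards the central DC.
    return {
        "src": random.choice(SOURCES_DC),
        "dst": random.choice(TARGETS_DC),
        "bandwidth": random.choice(BANDWIDTHS),
        "max_delay": random.choice(max_delay_hops),  # max number of nodes on the path
    }

print(generate_request())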

2.2.3. Network Slicing Model

The industrial network and the switch nodes are both divided into two slices: Slice 1 for the NRT type and Slice 2 for the RT and URT types. Figure 4 shows the typical way of slicing and connecting switches; the details are as follows.
(1) SW92 is divided into SW92_sl1 and SW92_sl2; the RT application server S5 is connected to SW92_sl2, and the NRT application servers S3 and S4 are connected to SW92_sl1. (2) SW02 is divided into SW02_sl1 and SW02_sl2; the link between SW02 and SW92 is divided into L_sl1 and L_sl2, which connect SW92_sl1 with SW02_sl1 and SW92_sl2 with SW02_sl2, respectively.
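The following illustrative helper shows one way to split a switch node into two slice sub-nodes in the graph model. It simply halves the capacity of every incident link, whereas in the paper the peer switch and the link are sliced in the same way so that the _sl1 sub-nodes connect to each other and the _sl2 sub-nodes connect to each other; the function is ours, not part of the paper.

import networkx as nx

def split_node_into_slices(G, node, ratio=0.5):
    # Replace `node` by two slice sub-nodes and split the capacity of its links.
    for sl in ("_sl1", "_sl2"):
        G.add_node(node + sl)
    for nbr in list(G.neighbors(node)):
        cap = G[node][nbr].get("capacity", 0)
        G.add_edge(node + "_sl1", nbr, capacity=cap * ratio)
        G.add_edge(node + "_sl2", nbr, capacity=cap * (1 - ratio))
    G.remove_node(node)

G = nx.Graph()
G.add_edge("SW02", "SW92", capacity=1000)
split_node_into_slices(G, "SW92")
print(list(G.edges(data=True)))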

2.3. QoS Optimization Model

The topological structure of the inner network of a factory is modeled by an undirected graph G = (V, E), where V is the node set of G, representing the distribution and core switches, and E is the edge set of G, representing the links of the network. When network slicing is applied, the network is divided into multiple graphs corresponding to the slices, denoted as G^SLi = (V^SLi, E^SLi). The slice index, nodes, links, link capacity, link bandwidth and exchanging delay of nodes are denoted as SLi, v^SLi ∈ V^SLi, e^SLi ∈ E^SLi, c_e^SLi, C_e^SLi and T_sd^SLi, respectively. To simplify, the exchanging delays of nodes in the same slice are assumed to be equal.
The source-target node pairs required by the QoS traffic are composed of any two nodes v^SLi from the source and target node sets in V^SLi. The set of traffic-requesting node pairs is denoted as K^SLi. One forwarding path of node pair k is p_k, and the set of all forwarding paths of the node pair set is P^SLi. The required bandwidth and the maximum delay of the traffic are denoted as b^R and T_d^R, respectively.
The optimization targets of QoS are as follows.
(1)
The path delay of required traffic is less than the maximum traffic delay, which is presented as
Len(p_k) × T_sd^SLi < T_d^R
(2)
Maximizing the traffic capacity of the network, which is presented as
F = max Σ_{p_k ∈ P^SLi} b^R
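To make the two targets concrete, a minimal sketch of the per-path feasibility check and bandwidth reservation follows. It assumes a graph whose edges carry a "capacity" attribute (such as the networkx graph sketched in Section 2.2.1), a per-node exchanging delay T_sd = 1, and hop count as the path length; the helper names are ours, not the paper's.

def feasible(G, path, bw_req, max_delay, t_sd=1):
    # Target (1): the path delay Len(p_k) * T_sd must stay below the requirement.
    if len(path) * t_sd >= max_delay:
        return False
    # Every link on the path must still have bw_req of free capacity.
    return all(G[u][v]["capacity"] >= bw_req for u, v in zip(path, path[1:]))

def allocate(G, path, bw_req):
    # Target (2): each accepted allocation adds bw_req to the network traffic F.
    for u, v in zip(path, path[1:]):
        G[u][v]["capacity"] -= bw_req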

2.4. Algorithm for Agent

2.4.1. Environment State

The network state, which can be presented as [S, D, b^R, T_d^R, C_e^SLi], includes static and dynamic information. The topological structure of the network, the exchanging delay of nodes, the link bandwidth and the betweenness of nodes are static information. The link flows, QoS parameters and source-target nodes are dynamic information.

2.4.2. Action

The number of routing combinations for the source-target pairs required by the traffic is large, which leads to a high-dimensional action space in a real large-scale network and makes route selection difficult for the agent. Since delay is a key QoS parameter, applying the K-shortest paths implicitly takes the delay requirement into account. Moreover, the dimensionality is reduced because the action space is only a subset of all possible paths.
The choice of the K value depends on the size of the network and the number of routing combinations. The agent chooses only one path according to the environment state. The allocation of the traffic flow path is proper as long as the path delay and bandwidth satisfy the requirements.
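A minimal sketch of building the candidate action set with networkx is shown below; nx.shortest_simple_paths yields simple paths in order of increasing length, and K = 4 matches the value used later in the simulations. The toy ring graph is only for demonstration; if fewer than K simple paths exist, all of them are returned.

import networkx as nx
from itertools import islice

def k_shortest_paths(G, source, target, k=4):
    # Candidate action set: the K shortest paths of one source-target pair.
    return list(islice(nx.shortest_simple_paths(G, source, target), k))

G = nx.cycle_graph(6)
for path in k_shortest_paths(G, 0, 3):
    print(path)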

2.4.3. Reward

The reward is defined as a function of the maximum traffic allocated by the agent on the network, which can only be calculated when an episode finishes. Its mathematical form is
r = f(F)

2.5. Designing of Reward Function with Sparse Reward

The optimization objective is to maximize the traffic flow while the delay and bandwidth requirements are satisfied. Since a traffic path allocation may not be optimal before an episode finishes, the reward value of each action cannot be given directly. Therefore, we use reward shaping to guide the learning of the agent and find an optimal policy under the sparse reward. The delay requirement must be satisfied for factory traffic; meanwhile, pursuing a lower delay than required is not necessary.
For traffic path allocation, a large betweenness of a network node means that the node lies on many shortest paths. Such nodes are occupied early, which prevents subsequent traffic from being allocated and reduces the traffic flow. To avoid this, a negative term related to the betweenness of the path nodes can be introduced. Each proper path allocation increases the traffic flow, so the bandwidth of the allocated path is designed as a positive term. The reward function is therefore designed as
r = α × b^R − β × betw(p_k) + γ × r_d
where α, β and γ are the weights of the three terms, and betw(p_k) is the sum of the betweenness values of the nodes on traffic path p_k. The delay penalty r_d is
r_d = 0,  if Len(p_k) × T_sd^SLi ≤ T_d^R
r_d = −1, if Len(p_k) × T_sd^SLi > T_d^R
The reward function can also evaluate the agent using only the bandwidth and delay terms after traffic path allocation, without the aid of betw(p_k); in this case β is set to 0.
Since the path delay is easy to obtain, the paths that cannot satisfy the delay requirement can also be deleted when the initial path set is built; in this case γ is set to 0.
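A minimal sketch of the shaped reward follows. It assumes the betweenness term is subtracted and the delay penalty is −1 per violation, consistent with both terms being described as punishments; the default α = 1, β = 0, γ = 5 reflects one of the settings evaluated in Section 3, and the function signature is ours.

def reward(path, bw_req, max_delay, betweenness, t_sd=1,
           alpha=1.0, beta=0.0, gamma=5.0):
    # Positive bandwidth term, negative betweenness term, delay penalty r_d.
    betw_sum = sum(betweenness[n] for n in path)
    r_d = 0.0 if len(path) * t_sd <= max_delay else -1.0
    return alpha * bw_req - beta * betw_sum + gamma * r_d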

2.6. Algorithm

Algorithm 1 presents the training procedure of the agent. The algorithm consists of two parts: experience data generation and training. The total number of iterations N_ite, the size of the experience memory S_m, the training frequency of the main network F_m, the update frequency of the target network parameters F_t and the experience memory buffer are initialized first (Line 1).
Algorithm 1 The training procedure of the agent
initialize algorithm parameters
while n_i < N_ite do
   initialize environment
   while NOT DONE do
      while j < S_m do
         get k_shortest_path
         for i = 0 to k do
            p_i ← k_shortest_path[i]
            q_val ← comput_q
         end for
         q ← ε-greedy
         a ← action corresponding to q
         r, S′, DONE ← SimulationEnvironment
         compute w_i, p_i
         Experience buffer ← (S, a, r, S′)
         j = j + 1
      end while
      if n_i % F_m == 0 then
         sample Experience buffer
         update p_i ← TD-error |δ_i|
         Δ ← Δ + w_i δ_i
         update MainNet θ ← θ + Δ
         if n_i % F_t == 0 then
            update TargetNet θ′ ← θ
         end if
      end if
   end while
   n_i = n_i + 1
end while

2.6.1. Experience Data Generation and Memory

Each iteration corresponds to one episode, which finishes when a traffic path cannot satisfy the bandwidth requirement. We initialize the environment state, set the link capacities to their maximum and calculate the link betweenness (Line 3).
Then, the traffic generator sends traffic requirements described by [S, D, b^R, T_d^R, C_e^SLi] to the network model. The network model calculates the K-shortest paths and forms the action set according to the source-target nodes. The simulation environment sends the network state, traffic requirement, action set, available link capacities and betweenness to the E-DQN, which calculates the q-value of each action and feeds back the action corresponding to the maximum q-value according to the ε-greedy policy. The fed-back action is used by the industrial SDN network model to generate a new state S′ and the reward (Lines 6–13).
The environment model sends the experience data [S, D, b^R, T_d^R, C_e^SLi, a, r, S′] to the experience replay memory function of E-DQN, which calculates the priority and weight of the experience data and stores the data in the experience buffer (Lines 14–15).
When the network cannot allocate new traffic, it sets ‘DONE’ to ‘True’, exits the loop and starts the next loop until the maximum number of epochs is reached. Training is called every F_m iterations.
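The ε-greedy selection over the Q-values of the K candidate paths can be sketched as follows; the ε value and the function name are illustrative assumptions.

import random

def select_action(q_values, epsilon=0.1):
    # Explore a random candidate with probability epsilon,
    # otherwise pick the path with the maximum Q-value.
    if random.random() < epsilon:
        return random.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])

# Example: Q-values for K = 4 candidate paths of one traffic request.
print(select_action([0.2, 1.3, 0.7, 0.9], epsilon=0.1))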

2.6.2. Training

We extract data from the experience buffer to train the main network of E-DQN and update the priorities and weights. The parameters of the main network are updated with the new weights, and the parameters of the target network are updated every F_t iterations.

3. Performance Evaluation

3.1. Simulation

The factory network environment is simulated on the OpenAI gym platform. The E-DQN algorithm is built using the Keras and TensorFlow deep learning frameworks. The experimental settings are shown in Table 2. The network model described in Section 2.2 is used.
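A skeleton of such an environment under the gym 0.21 API might look as follows; the class name, observation size and the zero placeholders are assumptions, since the paper does not publish its environment code.

import gym
import numpy as np
from gym import spaces

class FactoryRoutingEnv(gym.Env):
    # Skeleton of a gym environment for the path-allocation task (illustrative only).

    def __init__(self, k_paths=4, state_dim=16):
        super().__init__()
        self.action_space = spaces.Discrete(k_paths)   # choose one of the K candidate paths
        self.observation_space = spaces.Box(-np.inf, np.inf,
                                            shape=(state_dim,), dtype=np.float32)

    def reset(self):
        # Restore link capacities and return the initial network state.
        return np.zeros(self.observation_space.shape, dtype=np.float32)

    def step(self, action):
        # Apply the chosen path, compute the shaped reward, report episode end.
        obs = np.zeros(self.observation_space.shape, dtype=np.float32)
        reward, done, info = 0.0, False, {}
        return obs, reward, done, info

env = FactoryRoutingEnv()
obs = env.reset()
obs, r, done, info = env.step(env.action_space.sample())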

3.2. Simulation Methods

This experiment evaluates and analyzes the influence of the reward function on the learning efficiency and effectiveness of the agent algorithm. After training, the proposed method allocates paths in the factory network. The maximum communication capacity is used to compare the performance of the proposed method with the conventional Dijkstra and load-balancing (LB) routing policies. The LB policy randomly selects one path among the K = 4 candidate shortest paths. Moreover, the performance of the agent algorithm is evaluated under network slicing. In addition, the performance of the agent is analyzed under the condition of a network node error.

3.3. Simulation Results

3.3.1. The Reward Function Has a Significant Effect on the Learning Efficiency of the Agent

(1)
Observing the learning efficiency with γ = 0.5 and γ = 5 when α = 1 and β = 0
We set α = 1 and β = 0, ignore the betweenness of the nodes in the path, and simulate the learning procedure of the agent with γ = 0.5 and γ = 5, respectively. The simulation results are shown in Figure 5, which contains two subfigures. Figure 5a shows the average values of the bandwidth score and latency score obtained by the agent over the entire training process, and Figure 5b shows only the average value of the latency score from the beginning of training until the value stabilizes. The curves for γ = 0.5 and γ = 5 are plotted in a single figure, and the bandwidth and delay are recorded. The solid lines in Figure 5 present the average number of allocated paths that do not satisfy the delay requirement during learning. It can be seen in Figure 5b that the number of allocations violating the delay requirement decreases at the same speed for γ = 0.5 and γ = 5. When N > 13, i.e., when the number of learning episodes exceeds 10,400 (13 × 40 × 20), the average number of allocated paths that do not satisfy the delay requirement converges to around 1. The reason is that the probability of generating a path that violates the delay requirement is low, because the candidate set consists of the K = 4 shortest paths. Therefore, the punishment has little effect on the learning efficiency. However, from the viewpoint of learning performance, the average number of delay violations per episode is less than 1 when γ = 5, while it is between 1 and 2 when γ = 0.5. This leads to the conclusion that a larger γ yields better learning performance, as shown in Figure 5a.
(2)
Observing the learning efficiency and performance when β ≠ 0, i.e., considering the betweenness of the nodes in the path.
The simulation results are shown in Figure 6, which contains two subfigures. Figure 6a shows the average values of the bandwidth score, betweenness score and latency score obtained by the agent over the entire training process, and Figure 6b shows only the average value of the latency score in the early stage of training. The larger the betweenness of the path assigned by the agent, the more its nodes intersect with other existing paths, and the smaller the reward to the agent. This guides the agent to allocate paths that share as few nodes as possible with existing paths when new traffic requests arrive, so that the link load does not become too concentrated. This policy also makes newly assigned paths tend to pass through fewer nodes, thereby reducing violations of the latency requirement. The verification results for the early stage of training are shown in Figure 6b. It can be seen that when N = 5, the magnitude of the delay penalty quickly drops below 10 (i.e., r_d > −10). Compared with Figure 5b, the learning efficiency of the agent is improved. Using the sum of the betweenness of all nodes on the path as a term of the reward function has an obvious effect on improving the delay reward.
(3)
When γ = 0, we use an implicit delay limitation: during the learning stage, paths that do not satisfy the delay requirement are deleted and only the remaining paths are saved into the experience memory. After training, we count the cases in which the allocated flow paths do not satisfy the delay requirement in the test stage.
From Table 3, it can be seen that the training performance of E-DQN is not satisfactory; on average, the allocated path fails to meet the delay requirement more than five times per episode. The results suggest that, although all flow paths used for training satisfy the delay requirement, the agent does not learn the delay requirement, because the delay punishment is not part of the reward function.

3.3.2. The Effect of the Reward Function Parameters on Performance

We set the parameters of the reward function to α = 1, β = 0, and γ = 0.5 and 5, respectively. After training, we record the maximum communication capacity and the average number of times the allocated path does not satisfy the delay requirement. The results are shown in Table 4.
The table shows that when γ = 0.5, the allocated path fails to satisfy the delay requirement about twice per episode on average, while this value is less than 1 when γ = 5. The results imply that a larger γ achieves better performance.

3.3.3. Introducing a Delay-Requirement Selection Mechanism Achieves Better Performance

From the simulation results, we find that although the punishment of paths violating the delay requirement can be strengthened by increasing the delay punishment parameter, such paths still occur. Therefore, we propose a delay-requirement selection mechanism for the agent. Specifically, after the candidate path set is generated, we first delete the paths that do not satisfy the delay requirement, and the agent then chooses among the remaining paths. The simulation results show that the delay-requirement selection mechanism not only ensures that all paths satisfy the delay requirement without decreasing the communication capacity, but also outperforms LB and Dijkstra.

3.3.4. Performance Comparison with Other Algorithms

From the results in Table 5 and Table 6, we can see that E-DQN achieves the highest maximum communication capacity among the compared methods. Although Dijkstra outperforms the proposed method in terms of delay performance, the average number of times per episode that the proposed method allocates a path violating the delay requirement is less than one, which is also excellent.
Comparing Table 6 and Table 7, we can also see that introducing the betweenness of the path nodes into the reward function decreases the delay performance, which suggests that weakening the delay component has a negative influence on the delay performance.

3.3.5. Performance of Agent with Network Slicing Model

The parameters of the reward function are set to α = 1, β = 0 and γ = 5. The network is divided into two slices. The switch nodes in each slice keep 50% of the switching capacity and the same exchanging delay.
From Table 8, it can be seen that, owing to slice isolation, all paths satisfy the delay requirement because the NRT traffic is not influenced by the traffic paths of the RT and U-RT traffic.
From Table 9, it can be seen that, owing to slice isolation, the large-bandwidth traffic does not influence the RT and U-RT traffic and achieves a large communication bandwidth. The number of allocations that do not satisfy the delay requirement increases slightly because more paths are allocated. From Table 10, when the exchanging delay of each switch node in Slice 2 is reduced to 1/2, the maximum communication capacity is achieved and all paths satisfy the delay requirement.

3.3.6. Agent Performance with Error of Core Switches

The parameters of the reward function are set to α = 1, β = 0 and γ = 5. The agent performance over 40 episodes is shown in Table 11. From the table, we can conclude that a proper flow path allocation can still be obtained and the delay performance does not decrease, even with an error of core switch SW01. It is reasonable that the communication capacity decreases due to the switch error.

4. Conclusions

In this paper, the proposed algorithm uses a GNN to improve the generalization ability of the agent with respect to the network topology. It can allocate proper traffic paths even when the topological structure of the network changes due to node or link errors. Moreover, the proposed algorithm uses the prioritized experience replay method to improve the training efficiency, which matters in real networks where experience data are sparse. Through theoretical analysis and simulation, the effect of the designed reward function on the training results is verified under the multi-objective constraints of bandwidth and delay. The effectiveness of the implicit guidance based on prior knowledge is also verified.
The research objects of this paper are the inner network of a smart factory and its production management and edge intelligent application traffic. Simulation results show that the agent performs well for diverse traffic flow path allocation and can satisfy the delay and bandwidth requirements simultaneously. This study lays a foundation for building an autonomous inner network for the smart factory.

Author Contributions

Conceptualization, Q.G. and Q.J.; methodology, Q.G.; software, Z.L.; validation, M.L.; formal analysis, Q.G.; investigation, Q.G.; resources, L.C.; data curation, Z.L. and M.L.; writing—original draft preparation, Q.G.; writing—review and editing, L.C.; visualization, Z.L.; supervision, L.C. and Z.D.; project administration, X.D.; funding acquisition, Z.D. and L.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key R&D Program of China (2021YFB3301100) and the Fundamental Research Funds for the Central Universities (ZY2302).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Liu, J.; Li, D.; Zeng, P. Research on future industrial network architecture based on SDN and TSN. Autom Panor 2018, 35, 56. [Google Scholar]
  2. Jin, Q.; Guo, Q.; Niu, Y.; Wang, Z.; Luo, M. Collaborative Control and Optimization of QoS in 5G and Industrial SDN Heterogeneous Networks for Smart Factory. In Proceedings of the 2021 International Conference on Space-Air-Ground Computing (SAGC), Huizhou, China, 23–25 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 89–94. [Google Scholar]
  3. 5G Plus Industrial Internet Application Development White Paper. 2019. Available online: http://www.aii-alliance.org/index/c316/n58.html (accessed on 1 September 2022).
  4. Edge Native Technical Architecture White Paper1.0. 2021. Available online: http://www.ecconsortium.org/Lists/show/id/552.html (accessed on 10 October 2022).
  5. Jin, Q.; Guo, Q.; Luo, M.; Zhang, Y.; Cai, W. Research on High Performance 4G Wireless VPN for Smart Factory Based on Key Technologies of 5G Network Architecture. In Proceedings of the 2020 International Wireless Communications and Mobile Computing (IWCMC), Limassol, Cyprus, 15–19 June 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1443–1447. [Google Scholar]
  6. Bagaa, M.; Dutra, D.L.C.; Taleb, T.; Samdanis, K. On sdn-driven network optimization and qos aware routing using multiple paths. IEEE Trans. Wirel. Commun. 2020, 19, 4700–4714. [Google Scholar] [CrossRef]
  7. Dapyun, H.; Jin, Q.; Qianchun, L.; Feng, L.; Hongqiang, F. Research and application of traffic engineering algorithm based on deep learning. Telecommun. Sci. 2021, 37, 107–114. [Google Scholar]
  8. Lan, J.; Zhang, X.; Hu, Y.; Sun, P. Software-defined networking QoS optimization based on deep reinforcement learning. J. Commun. 2019, 40, 60–67. [Google Scholar]
  9. Casas-Velasco, D.M.; Rendon, O.M.C.; da Fonseca, N.L. Intelligent routing based on reinforcement learning for software-defined networking. IEEE Trans. Netw. Serv. Manag. 2020, 18, 870–881. [Google Scholar] [CrossRef]
  10. Xu, Z.; Tang, J.; Meng, J.; Zhang, W.; Wang, Y.; Liu, C.H.; Yang, D. Experience-driven networking: A deep reinforcement learning based approach. In Proceedings of the IEEE INFOCOM 2018-IEEE Conference on Computer Communications, Honolulu, HI, USA, 16–19 April 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1871–1879. [Google Scholar]
  11. Wei, W.; Fu, L.; Gu, H.; Zhang, Y.; Zou, T.; Wang, C.; Wang, N. GRL-PS: Graph embedding-based DRL approach for adaptive path selection. IEEE Trans. Netw. Serv. Manag. 2023. [Google Scholar] [CrossRef]
  12. Shoupeng, L.; Siyu, Q.; Shaofu, L.; Xiliang, L.; Huamin, C. Survey of Graph Neural Network and its applications in communication networks. J. Beijing Univ. Technol. 2021, 47, 971–981. [Google Scholar] [CrossRef]
  13. Schaul, T.; Quan, J.; Antonoglou, I.; Silver, D. Prioritized experience replay. arXiv 2015, arXiv:1511.05952. [Google Scholar]
  14. Messaoud, S.; Bradai, A.; Ahmed, O.B.; Quang, P.T.A.; Atri, M.; Hossain, M.S. Deep Federated Q-Learning-Based Network Slicing for Industrial IoT. IEEE Trans. Ind. Inform. 2021, 17, 5572–5582. [Google Scholar] [CrossRef]
  15. Huang, L.; Ye, M.; Xue, X.; Wang, Y.; Qiu, H.; Deng, X. Intelligent routing method based on Dueling DQN reinforcement learning and network traffic state prediction in SDN. Wirel. Netw. 2022, 1–19. [Google Scholar] [CrossRef]
  16. Abadal, S.; Jain, A.; Guirado, R.; López-Alonso, J.; Alarcón, E. Computing graph neural networks: A survey from algorithms to accelerators. ACM Comput. Surv. (CSUR) 2021, 54, 1–38. [Google Scholar] [CrossRef]
  17. Badia-Sampera, A.; Suárez-Varela, J.; Almasan, P.; Rusek, K.; Barlet-Ros, P.; Cabellos-Aparicio, A. Towards more realistic network models based on graph neural networks. In Proceedings of the 15th International Conference on Emerging Networking Experiments and Technologies, Orlando, FL, USA, 9–12 December 2019; pp. 14–16. [Google Scholar]
  18. Almasan, P.; Suárez-Varela, J.; Rusek, K.; Barlet-Ros, P.; Cabellos-Aparicio, A. Deep reinforcement learning meets graph neural networks: Exploring a routing optimization use case. Comput. Commun. 2022, 196, 184–194. [Google Scholar] [CrossRef]
  19. Ge, Z.; Hou, J.; Nayak, A. Forecasting SDN End-to-End Latency Using Graph Neural Network. In Proceedings of the 2023 International Conference on Information Networking (ICOIN), Bangkok, Thailand, 11–14 January 2023; pp. 293–298. [Google Scholar] [CrossRef]
  20. Jin, C.; Krishnamurthy, A.; Simchowitz, M.; Yu, T. Reward-free exploration for reinforcement learning. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 4870–4879. [Google Scholar]
  21. Savinov, N.; Raichuk, A.; Marinier, R.; Vincent, D.; Pollefeys, M.; Lillicrap, T.; Gelly, S. Episodic curiosity through reachability. arXiv 2018, arXiv:1810.02274. [Google Scholar]
  22. Reddy, S.; Dragan, A.D.; Levine, S. Sqil: Imitation learning via reinforcement learning with sparse rewards. arXiv 2019, arXiv:1905.11108. [Google Scholar]
  23. Racaniere, S.; Lampinen, A.K.; Santoro, A.; Reichert, D.P.; Firoiu, V.; Lillicrap, T.P. Automated curricula through setter-solver interactions. arXiv 2019, arXiv:1909.12892. [Google Scholar]
  24. IIOT Network Connection Technology White Paper (Ver2.0). 2021. Available online: http://www.aii-alliance.org/index/c316/n2569.html (accessed on 1 September 2022).
Figure 1. The structure of the inner network of the smart factory.
Figure 2. Traffic flow path allocation and optimization architecture.
Figure 3. Algorithm framework.
Figure 4. Industrial SDN network topology.
Figure 5. Learning efficiency of the reward function with γ = 0.5 and γ = 5.
Figure 6. Learning performance of the reward function with path node betweenness.
Table 1. Target nodes and their QoS.
Dest Nodes | Traffic Type | Max Delay | Data Rate
SW91, SW92 | NRT | 7, 8 | 2, 4, 8, 16, 32
SW92, SW93, SW33 | RT | 5, 6 | 2, 4, 8, 16
SW34, SW35, SW39 | U-RT | 3, 4 | 2, 4
SW37, SW40, SW42 | RT | 5, 6 | 2, 4, 8, 16
Table 2. Experimental settings.
Item | Configuration
Server | AMAX 2080Ti
OS | Ubuntu 20.04
GPU | NVIDIA GeForce 2080
Memory | 32 GB
Algorithm | TensorFlow 2.7, Keras 2.7
DRL Environment | gym 0.21.0
Table 3. Training performance with γ = 0.
Algorithm | R-bw | R-Latency | Betweenness
E-DQN | 32.5656 | −28.375 | 29.6014
Table 4. Training performance with β = 0.
Setting | R-bw | R-Latency | AVG (delay violations per episode)
γ = 0.5 | 32.4844 | −0.95 | 1.9
γ = 5 | 34.0845 | −4.125 | 0.825
Table 5. Agent performance with the delay constraint.
Algorithm | R-bw | R-Latency | Betweenness
Dijkstra | 24.775 | 0 | 19.3767
LB | 2.475 | 0 | 2.9193
E-DQN | 33.2859 | 0 | 30.3916
Table 6. Agent performance with β = 0, γ = 5.
Algorithm | R-bw | R-Latency
Dijkstra | 25.2781 | 0
LB | 28.1609 | −61
E-DQN | 34.0845 | −4.125
Table 7. Agent performance with β = 1, γ = 5.
Algorithm | R-bw | R-Latency | Betweenness
Dijkstra | 25.2781 | 0 | 25.0043
LB | 28.1609 | −59.25 | 24.0102
E-DQN | 31.4156 | −4.75 | 30.0176
Table 8. Agent performance in Slice 1.
Algorithm | R-bw | R-Latency
Dijkstra | 9.6612 | 0
LB | 13.0352 | 0
E-DQN | 14.2633 | 0
Table 9. Agent performance in Slice 2.
Algorithm | R-bw | R-Latency
Dijkstra | 13.0969 | 0
LB | 14.8477 | −40.8750
E-DQN | 17.9242 | −3.0625
Table 10. Agent performance in Slice 2 with 1/2 switch delay.
Algorithm | R-bw | R-Latency
Dijkstra | 13.0969 | 0
LB | 14.8477 | −0.5
E-DQN | 17.9242 | 0
Table 11. Agent performance with a switch node error.
Algorithm | R-bw | R-Latency
Dijkstra | 24.775 | 0
LB | 29.26 | −38.45
E-DQN | 31.7766 | −3.85
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
