1. Introduction
In the context of advancing 6G technologies, the demands on optical communication networks have become more intricate and diverse [1]. The emergence of increasingly complex application scenarios in vertical industries, coupled with the rise of high-value dedicated line services, has led to a surge in network carrying demands [2]. These demands encompass small bandwidth, high isolation, deterministic low latency, and a strong emphasis on high security and reliability. To meet the burgeoning need for fine-granularity resource allocation beyond conventional coarse-grained provisioning, a pivotal shift in network architecture and protocols is imperative. This is particularly crucial for scenarios requiring dynamic and flexible resource management, such as ultra-reliable low-latency communications (URLLC) and the massive machine-type communications (mMTC) envisioned in the 5G paradigm.
In response to this exigency, Flexible Ethernet (FlexE) technology has emerged as a transformative solution [3]. FlexE is a pivotal advancement in optical networking that enables the subdivision of Ethernet connections into smaller, flexible sub-connections, each with independently configurable bandwidth and characteristics [4]. This granular resource allocation empowers network operators to utilize the physical network infrastructure efficiently, thereby accommodating the diverse traffic demands expected in the 6G landscape [5].
FlexE operates at the physical layer of the network stack, enabling the dynamic partitioning of Ethernet links into sub-links [6,7]. These sub-links, referred to as “FlexE groups”, are configurable with respect to their bandwidth allocation, providing a level of flexibility unprecedented in traditional Ethernet architectures [8]. The key innovation lies in the ability to allocate and manage bandwidth in increments as small as 10 Mbps, allowing for fine-grained control over network resources.
Moreover, FlexE introduces the concept of “slots”, allowing the precise allocation of bandwidth resources within a FlexE group. Each slot corresponds to a discrete unit of time during which a specific amount of bandwidth is available for data transmission. This temporal granularity enhances the adaptability of the network to varying traffic patterns, ensuring the efficient utilization of resources in scenarios characterized by dynamic and bursty traffic [9].
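To illustrate the arithmetic implied by this granularity, the following minimal sketch (our own illustration; the helper name is hypothetical, and only the 10 Mbps and 5 Gbps figures come from the text) computes how many fine-granularity slots a client of a given bandwidth consumes:

```python
import math

FINE_GRANULARITY_MBPS = 10  # the 10 Mbps fine-granularity unit cited above

def slots_needed(client_bandwidth_mbps: float) -> int:
    """Number of 10 Mbps calendar slots a FlexE client would consume."""
    return math.ceil(client_bandwidth_mbps / FINE_GRANULARITY_MBPS)

# A 25 Mbps client needs 3 fine-granularity slots.
print(slots_needed(25))
# Each 5 Gbps coarse slot subdivides into 500 fine slots, which is why the
# per-link slot count grows so sharply (see Section 1).
print(5000 // FINE_GRANULARITY_MBPS)
```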
Upon the arrival of a service within a network slice, ensuring Quality of Service (QoS) for distinct services requires allocating route paths according to the current network status, together with provisioning the corresponding time slot resources along each link [10]. The transition from a granularity of 5 Gbps to 10 Mbps introduces a new fine-granularity unit (FGU) sublayer within the FlexE frame structure. This sublayer, together with the further segmentation and reuse of 5 Gbps time slots as FGUs, dramatically increases the number of time slots managed per link (each 5 Gbps slot subdivides into 500 ten-Mbps FGUs). The reduction in granularity makes the non-deterministic delay induced by time slot conflicts non-negligible. Additionally, the dynamic fluctuation of service bandwidth makes fast and reliable time slot reallocation challenging: when a service exceeds its prescribed bandwidth threshold, not only must new slot resources be assigned, but routing decisions may also be affected, potentially causing momentary service interruptions and fluctuations. In recent years, traffic prediction has been widely applied to improve network decision-making under dynamically varying traffic trends. Empowered by the a priori knowledge afforded by traffic prediction, networks can proactively perform resource computation and bandwidth reservation for network services. Ensuring QoS for latency-sensitive services therefore requires a time slot scheduling mechanism tailored to 10 Mbps fine-granularity slices, alongside a time slot reallocation mechanism underpinned by traffic prediction.
In this study, we address fine-granularity Flexible Ethernet slot orchestration for dynamic service scenarios based on proactive multi-flow assisted service awareness. We propose a proactive multi-flow assisted attention-based neural network (PMFAN) for high-accuracy service traffic prediction. Building upon service-awareness information, we further present a genetic algorithm-based deterministic slot orchestration algorithm (GDSO) to support end-to-end low-latency transmission, resolving the slot scheduling challenge posed by dynamic traffic requirements. We evaluate the proposed traffic prediction algorithm and slot orchestration mechanism using real-world traffic datasets and network data. The results demonstrate that the proactive a priori knowledge provided by traffic prediction reduces computation time by up to 46.8%.
The rest of the paper is organized as follows. Section 2 reviews related work on FlexE and network traffic prediction. Section 3 formulates the problem. Section 4 presents the PMFAN prediction algorithm and the GDSO orchestration algorithm. The simulation results are presented in Section 5, and Section 6 concludes this paper.
2. Related Works
FlexE is an interface technology that enables the bearer network to realize service isolation and network slicing. Since the ITU-T standards organization accelerated the FlexE standardization process [11], the technology has developed rapidly in recent years. Eira et al. [4] provided a solution for decoupling the interface rates between routers and transport devices from the actual data flows and evaluated the trade-off between a transport box’s complexity and its ability to utilize light paths effectively, offering insights into the impact of FlexE use cases on router port efficiency, transport box provisioning, and DWDM layer capacity in DCI contexts. D. Koulougli et al. [12] explored optimized routing in complex multi-layer, multi-domain IP-optical networks using a hierarchical PCE and FlexE technology; their work formulates optimization problems that consider QoS, privacy, and FlexE constraints and introduces novel algorithms for efficient routing and client assignment. P. Zhu et al. [13] addressed security concerns in next-generation RAN transport, focusing on eavesdropping attacks at the physical layer; a cross-layer approach using FlexE and WDM is proposed for enhanced security, various attack levels are considered, and the trade-off between resource efficiency and security is explored, with numerical results demonstrating the effectiveness of the defense strategies. D. Koulougli et al. [14] addressed routing optimization in multi-layer multi-domain networks, emphasizing inter-layer and inter-domain coordination; they introduce a hierarchical path computation engine (PCE) that leverages FlexE technology to enhance network performance by linking the IP and optical domains, presenting an efficient algorithm for routing and FlexE assignment and achieving 87% of the optimal throughput. H. Liang et al. [15] addressed the integration of Flexible Ethernet (FlexE) and elastic optical networks (EONs) in FlexE-over-EON scenarios, focusing on a FlexE-aware architecture; they introduce mixed integer linear programming (MILP) and integer linear programming (ILP) models for single-hop and multi-hop scenarios, respectively, and present highly time-efficient approximation algorithms whose solutions closely approach the optimal ones in large-scale planning. Building on [12,14], D. Koulougli et al. [16] investigated the FlexE Traffic Restoration (FTR) problem, which aims to maintain high network utilization via the fast recovery of FlexE clients at minimum cost using the spare capacity of already deployed PHYs. Building on [13], P. Zhu et al. [17] introduced a cross-layer security design for FlexE over WDM networks that specifically addresses eavesdropping threats at the physical layer; the approach combines universal hashing-based FlexE data block permutation with parallel fiber transmission to enhance security, and the study evaluates different attack levels, balancing resource efficiency against security, and demonstrates the effectiveness of the proposed cross-layer defense strategies. M. Wu et al. [18] investigated cross-layer restoration (CLR) in FlexE-over-EONs, specifically addressing temporary outages in FlexE switches; extensive simulations confirm the effectiveness of these CLR strategies, highlighting the potential of FlexE and elastic optical networks for efficient restoration when FlexE switches fail. Recently, Gu, R., et al. [19] established a routing-embedded timeslot scheduling model for the routing of fine-granularity slices and the timeslot scheduling problem in SPN-based FlexE interfaces, for which a deterministic timeslot allocation mechanism supporting end-to-end low-latency transmission is proposed.
Currently, most optical network resource scheduling algorithms focus on improving resource utilization through handcrafted rules, making limited use of knowledge available within the network, such as historical traffic data and historical decision data. This leads to a lack of learning tailored to the deployment environment in network decision-making. As a result, some studies are dedicated to predicting key information that affects optical network resource scheduling in order to enhance resource allocation. This paper utilizes traffic prediction to anticipate the future operational state of FlexE clients, enabling the proactive reservation of slot resources. However, network traffic exhibits highly nonlinear and bursty characteristics, posing significant challenges for accurate traffic prediction. Additionally, rapid traffic fluctuations over short time steps, in contrast to long-term trends, present substantial challenges for prediction, particularly in the context of FlexE service calendar switching, which necessitates recalculation.
Traffic prediction methods can be categorized into two types: classical models and deep learning methods. Classical models exhibit good interpretability and perform well in linear prediction. Auto-Regressive Integrated Moving Average (ARIMA) combines autoregressive, differencing, and moving average components to capture linear dependencies and trends in the data [20]. However, ARIMA struggles to provide accurate predictions for the nonlinear part of the traffic data in present networks.
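As a point of reference, such a baseline can be fit in a few lines with the statsmodels library; the synthetic series and the ARIMA order (2, 1, 2) below are arbitrary illustrative choices, not those of [20]:

```python
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

# Synthetic stand-in for a per-service traffic series (Mbps per interval).
history = np.cumsum(np.random.default_rng(0).normal(0, 1, 200)) + 100

# Fit an ARIMA(p, d, q) model; the differencing term (d=1) handles the trend.
model = ARIMA(history, order=(2, 1, 2)).fit()

# Forecast the next 5 time slots.
print(model.forecast(steps=5))
```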
With the growth of data and computing power, deep learning methods have been widely applied to network traffic prediction. Recurrent Neural Networks (RNNs) and their variants are a class of deep models widely applied in time series forecasting due to their ability to retain information from previous sequences. Hallas et al. applied RNNs to network traffic prediction [21]. To capture longer-term dependencies, Hochreiter et al. introduced Long Short-Term Memory (LSTM), whose modified memory cells preserve long-term information [23]. Trinh et al. utilized LSTM for traffic prediction on real-world datasets [22]. Lazaris et al. demonstrated that LSTM outperforms the ARIMA model by approximately 30% in accurately predicting link throughput [24]. Zhang et al. proposed LSTM-based Network Traffic Prediction (LNTP), a hybrid optimized model for end-to-end network traffic prediction [25].
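A minimal single-flow LSTM forecaster of the kind used in these works might look as follows (a PyTorch sketch; the layer sizes, window length, and class name are illustrative, not those of any cited model):

```python
import torch
import torch.nn as nn

class LSTMForecaster(nn.Module):
    """Minimal single-flow LSTM predictor in the spirit of [22,24,25]."""
    def __init__(self, hidden: int = 64, horizon: int = 1):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon)

    def forward(self, x):             # x: (batch, T, 1) past traffic samples
        out, _ = self.lstm(x)
        return self.head(out[:, -1])  # predict the next `horizon` slots

model = LSTMForecaster(horizon=3)
window = torch.randn(8, 50, 1)        # 8 flows, 50 historical slots each
print(model(window).shape)            # torch.Size([8, 3])
```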
The aforementioned deep learning methods primarily exploit the temporal features of individual flows. Vinchoff et al. utilized Graph Convolutional Networks (GCNs) to improve prediction accuracy [26,27]. Lin et al. combined a Multi-Graph Convolutional Network (MGCN) with LSTM for wireless traffic prediction [28]. Zhang et al. utilized densely connected Convolutional Neural Networks (CNNs) to capture the spatial dependencies of cell traffic [29]. Huang et al. experimentally verified that CNNs are suitable for extracting inter-node correlations, while RNNs are effective at capturing temporal features; their methods achieved 70% to 80% accuracy in various forecasting tasks, outperforming CNN and 3DCNN baselines [30]. Li et al. proposed LA-ResNet, which combines residual networks with RNNs and incorporates attention mechanisms to assist traffic prediction [31]. Cui et al. [32] employed Convolutional Long Short-Term Memory (ConvLSTM) to integrate convolutional layers into LSTM cells, enabling the combination of spatial–temporal features. However, these works mainly exploit spatial or topological information to assist wireless traffic prediction without considering the correlation between services, which could further improve prediction accuracy.
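One way to model such inter-service correlation is to let the representation of each flow attend over all other flows. The sketch below illustrates this general idea only; it is not the PMFAN architecture presented later, and the dimensions and class name are ours:

```python
import torch
import torch.nn as nn

class MultiFlowAttention(nn.Module):
    """Illustrative cross-service attention: each flow's embedding attends
    over all flows, exposing inter-service correlation to the predictor."""
    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, flow_feats):    # (batch, num_flows, dim)
        mixed, weights = self.attn(flow_feats, flow_feats, flow_feats)
        return mixed, weights         # weights reveal which flows co-vary

feats = torch.randn(2, 10, 64)        # 10 services, 64-dim temporal embeddings
mixed, w = MultiFlowAttention()(feats)
print(mixed.shape, w.shape)           # (2, 10, 64) (2, 10, 10)
```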
Several studies have explored the application of Generalized Nets (GNs) to network traffic prediction, demonstrating their effectiveness in capturing the dynamics of network traffic [33]. Smith et al. [34] proposed a GN-based network traffic prediction model, leveraging GNs to describe network elements and their interactions; the model demonstrated high accuracy and robustness in predicting network traffic fluctuations. Li and Wang [35] introduced a GN-based approach that combines time series analysis and machine learning techniques, employing GNs to describe the dynamic changes in network traffic and applying time series analysis to the GN node and connection attributes; the integration of GNs and machine learning algorithms improved the accuracy and stability of traffic prediction. In addition, Chen et al. [36] proposed a fusion model that combines GNs with deep learning, in which GNs describe the network elements and deep learning algorithms operate on the GN node and connection attributes; the fusion model exhibited superior performance in capturing the complex patterns and features of network traffic. In summary, GNs have shown promising applications in network traffic prediction: by describing and simulating network elements and their relationships, GNs effectively capture the dynamic changes and fluctuations in network traffic, and their integration with other techniques, such as time series analysis and deep learning, enhances the accuracy and reliability of prediction. Future research can further explore the applications of GNs in network traffic prediction and develop more efficient and robust prediction models.
3. Problem Formulation
In this section, we analyze and formally describe the Flexible Ethernet slot orchestration problem based on traffic prediction.
3.1. FlexE Architecture
In traditional Ethernet based on IEEE 802.3, the service data flow is first encapsulated into MAC frames at the MAC layer and connected to the PHY layer through the Reconciliation Sublayer (RS). Within the PHY layer, the Physical Coding Sublayer (PCS) performs the 64B/66B encoding; the Physical Medium Attachment (PMA) sublayer performs serial-to-parallel conversion, clock synthesis, and clock recovery; and finally, the Physical Medium Dependent (PMD) sublayer provides the interface to the various actual physical media.
As depicted in Figure 1, the FlexE architecture consists of the FlexE Client, FlexE Shim, and FlexE Group. First, the FlexE Client represents the various standard service interfaces; it mainly carries the MAC data flows in the FlexE network, converting each data stream into 66-bit code blocks via 64B/66B encoding and passing them to the FlexE Shim layer. Second, the FlexE Shim implements the core function of FlexE technology between the MAC and the PHY defined by IEEE 802.3, completing the mapping from Client to Group based on the calendar mechanism. Finally, the FlexE Group is essentially a collection of multiple PHYs, which carries the 66-bit data code blocks mapped by the FlexE Shim from the FlexE Client layer and also supports carrying multiple FlexE Instances.
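The calendar mechanism can be pictured as a table mapping each slot position in the frame to a client. The following toy sketch is a first-fit simplification of our own, not the shim's actual distribution logic (a real shim spreads a client's slots across the frame for smoother pacing):

```python
def build_calendar(clients: dict, total_slots: int) -> list:
    """Toy FlexE Shim calendar: first-fit mapping of clients to slots.
    `clients` maps a client id to the number of slots granted to it."""
    calendar = [None] * total_slots
    free = iter(range(total_slots))
    for client, n_slots in clients.items():
        for _ in range(n_slots):
            # next() raises StopIteration if the group is oversubscribed
            calendar[next(free)] = client
    return calendar

# e.g. a 20-slot calendar shared by three clients
print(build_calendar({"A": 5, "B": 10, "C": 3}, total_slots=20))
```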
3.2. Network Model
A FlexE-interface-based network can be represented by a directed graph $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_N\}$ represents the set of nodes, $|V| = N$ indicates that the number of nodes is $N$, and $E$ represents the set of links in the network. Adjacent nodes are connected by two links in opposite directions; for example, $e_{i,j}$ denotes the link from node $v_i$ to node $v_j$, and $e_{j,i}$ denotes the link from node $v_j$ to node $v_i$. The link length set is defined as $L = \{l_{i,j}\}$, and $B = \{b_{i,j}\}$ represents the set of link bandwidths in the FlexE network.
3.3. Service Model
For any service $f$ in the set of network connection requests, we represent it by a triple $f = (s, d, b)$, where $s$ denotes the source node, $d$ denotes the destination node, and $b$ denotes the bandwidth required by the service. For the service $f$, we define the candidate path set as $P = \{p_1, p_2, \ldots, p_K\}$, composed of $K$ pre-computed paths. In addition, we assume that the service is transmitted from the source node to the destination node through intermediate nodes, considering only the forwarding method based on slot crossing.
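The $K$ pre-computed candidate paths can be obtained, for example, with a K-shortest-paths routine; a sketch using the networkx library on a toy directed graph follows (the topology, edge weights, and the `length` attribute name are illustrative):

```python
import networkx as nx
from itertools import islice

def k_shortest_paths(G, source, target, k: int):
    """Pre-compute the K candidate paths for a service (s, d, b)."""
    return list(islice(
        nx.shortest_simple_paths(G, source, target, weight="length"), k))

G = nx.DiGraph()
G.add_weighted_edges_from(
    [(1, 2, 10), (2, 4, 10), (1, 3, 15), (3, 4, 15), (2, 3, 5)],
    weight="length")
print(k_shortest_paths(G, 1, 4, k=3))  # paths in increasing length order
```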
Each service in the FlexE network is an Ethernet traffic flow based on the MAC data rate, and all services are represented as a set $F = \{f_1, f_2, \ldots, f_M\}$. In order to accurately predict the traffic demand, our goal is to calculate the most likely traffic volume matrix at time slot $k$ based on the historical data of the previous $T$ slots. The problem is formulated as follows:

$$\hat{X}_k = \arg\max_{X_k} P\left(X_k \mid X_{k-T}, X_{k-T+1}, \ldots, X_{k-1}\right), \tag{1}$$

where $X_{k-T}, X_{k-T+1}, \ldots, X_{k-1}$ are the historical observations of traffic flow at the previous $T$ time slots.
3.4. Problem Formulation
The slot occupancy of the FlexE frames transmitted on link $e_{i,j}$ is represented by an array $C_{i,j} = [c_1, c_2, \ldots, c_{N_s}]$, where each binary bit $c_t$ of the array represents the occupancy of the corresponding slot of the frame: the value “0” means that the slot is available in the idle state, the value “1” means the slot has been occupied, and $N_s$ denotes the total number of slots in each FlexE frame. In addition, we assume that some background services in the network already occupy slot resources.
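This occupancy array directly supports idle-slot lookup; a minimal sketch (the helper name is ours) follows:

```python
def free_slots(occupancy: list, needed: int) -> list:
    """Return the indices of the first `needed` idle slots ('0') on a link,
    mirroring the binary occupancy array defined above."""
    idle = [t for t, bit in enumerate(occupancy) if bit == 0]
    return idle[:needed] if len(idle) >= needed else []

# A frame in which background services already hold some slots:
link_occupancy = [1, 0, 0, 1, 0, 1, 0, 0]
print(free_slots(link_occupancy, needed=3))  # [1, 2, 4]
```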
Importantly, current cellular networks face challenges in propagating high-capacity data with improved speed, QoS, latency, and efficient handover (HO) and mobility management [37]. Ref. [38] proposed an overall method for predicting the QoE parameters of users and telecommunication networks based on predicting the QoS indicators’ values, discussed four normalization techniques, and proposed a normalization method for the index scale. However, the problem considered in this paper is oriented toward scenarios with stringent low-latency requirements, and delay is one of the most important QoS factors. Therefore, we set the optimization objective to minimize the end-to-end delay, with the objective function defined as follows:

$$\min \; D = \sum_{f \in F} \left( D^{prop}_{f} + D^{fwd}_{f} + D^{sch}_{f} \right), \tag{2}$$

where $D^{prop}_{f}$ indicates the propagation delay of service $f$ from $s$ to $d$, $D^{fwd}_{f}$ is the forwarding delay generated by the service at the intermediate nodes of the path, and $D^{sch}_{f}$ is the scheduling delay incurred by the service due to slot allocation.
The above objective function should be minimized under the following constraints.
Routing Continuity Constraint: Constraint (3) enforces routing continuity between $s$ and $d$:

$$\sum_{e_{i,j} \in E} x^{f}_{i,j} - \sum_{e_{j,i} \in E} x^{f}_{j,i} = \begin{cases} 1, & v_i = s, \\ -1, & v_i = d, \\ 0, & \text{otherwise}, \end{cases} \quad \forall v_i \in V, \tag{3}$$

where $x^{f}_{i,j}$ is a binary variable that is equal to 1 if the service $f$ occupies link $e_{i,j}$ and equal to 0 otherwise.
Bandwidth Continuity Constraint: Constraint (4) ensures that the service $f$ occupies the same amount of bandwidth resources on the different links $e_{i,j}$ and $e_{m,n}$ along its route:

$$b^{f}_{i,j} = b^{f}_{m,n} = b, \quad \forall e_{i,j}, e_{m,n} \in p_f, \tag{4}$$

where $b^{f}_{i,j}$ denotes the bandwidth occupied by service $f$ on link $e_{i,j}$.
Capacity Constraint: Constraint (5) ensures that the PHYs allocated to the services provide a sufficient number of calendar slots:

$$\sum_{f \in F} \left\lceil \frac{b}{g} \right\rceil x^{f}_{i,j} \le N_s, \quad \forall e_{i,j} \in E, \tag{5}$$

where $g$ is the slot granularity (10 Mbps) and $N_s$ is the number of calendar slots per FlexE frame.
Slot Uniqueness Constraint: Constraint (6) indicates that each slot of each link may carry only one data code block at any given moment:

$$\sum_{f \in F} x^{f}_{i,j} \, y^{f}_{t} \le 1, \quad \forall e_{i,j} \in E, \; \forall t \in \{1, \ldots, N_s\}, \tag{6}$$

where the variable $x^{f}_{i,j}$ is equal to 1 if the route of service $f$ contains link $e_{i,j}$ and equal to 0 otherwise, while the variable $y^{f}_{t}$ is equal to 1 if service $f$ occupies the $t$th slot and equal to 0 otherwise.
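To make the orchestration step concrete, the following schematic genetic-algorithm loop operates in the spirit of GDSO over (path index, slot tuple) chromosomes; the encoding, operators, and fitness below are simplified placeholders of our own, not the algorithm of Section 4:

```python
import random

def evolve_slot_assignment(candidates, fitness, generations=50, pop_size=20, p_mut=0.1):
    """Schematic GA loop: a chromosome is one (path index, slot tuple) candidate.
    `fitness` should score the end-to-end delay objective (2) and heavily
    penalize any violation of constraints (3)-(6)."""
    population = [random.choice(candidates) for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        parents = population[: pop_size // 2]          # truncation selection
        offspring = []
        while len(offspring) < pop_size - len(parents):
            a, b = random.sample(parents, 2)
            child = (a[0], b[1])                       # one-point crossover
            if random.random() < p_mut:
                child = random.choice(candidates)      # mutation: random jump
            offspring.append(child)
        population = parents + offspring
    return max(population, key=fitness)

# Toy usage: 3 candidate paths x 4 candidate slot tuples; prefer low slot indices.
cands = [(p, s) for p in range(3) for s in [(0, 1), (2, 3), (4, 5), (6, 7)]]
print(evolve_slot_assignment(cands, fitness=lambda c: -(c[0] + sum(c[1]))))
```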
5. Results
In this section, we first describe the experimental parameter settings and the simulation environment. Then, we present extensive simulations that evaluate the performance and feasibility of the proposed approach.
All simulations were performed on servers with a Core i7 CPU and 32 GB of RAM. In order to evaluate the algorithm's performance on a complex and realistic network topology, we conducted the subsequent experimental work based on a regional optical backbone network topology, shown in Figure 4. In the follow-up experiments, we also randomly generated several networks of different sizes but the same structure using a scale-free algorithm, setting the number of network nodes to $N$ ($N$ = 32, 50, 68, 88, 102, 118).
In this paper, a dataset provided by China Telecom is used for validation in our experiments. To show the effectiveness of the model, we also evaluated a non-deep learning method and other deep learning methods with proven good performance as comparative baselines. The specific models include LSTM, GRU, and 3DCNN, all of which are capable of multi-step prediction, i.e., forecasting multiple future time steps from the available input data. The ability to perform multi-step prediction allows the forecast to be issued in advance, thereby guaranteeing that the allocation algorithm completes its execution before the predicted time point arrives.
In this paper, we used two evaluation indicators to compare the performance of the different service traffic prediction models: the R-squared coefficient ($R^2$) and the Normalized Mean Squared Error (NMSE), whose definitions are given in (18) and (19).
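For concreteness, the sketch below computes both metrics under the standard conventions (the NMSE here is normalized by the sum of squared observations, one common choice; this is an assumption, as is every name in the snippet):

```python
import numpy as np

def r_square(y_true, y_pred):
    """R^2: 1 minus the ratio of residual to total sum of squares."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def nmse(y_true, y_pred):
    """NMSE: squared error normalized by the observed signal energy."""
    return np.sum((y_true - y_pred) ** 2) / np.sum(y_true ** 2)

y = np.array([10.0, 12.0, 9.0, 14.0])
y_hat = np.array([9.5, 12.5, 9.2, 13.4])
print(r_square(y, y_hat), nmse(y, y_hat))
```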
Table 1 shows the indicators of the prediction results of the different models. ARIMA, as a traditional linear model, is limited in capturing nonlinear features, which leads to a subpar prediction performance, with $R^2$ reaching only 0.826654. In contrast, LSTM and its variant GRU, with their ability to capture nonlinear features, significantly improve the prediction accuracy, raising $R^2$ by 9% and reducing the NMSE by 48%. The 3DCNN, by incorporating inter-service correlations, further improves $R^2$ to 0.925660 and reduces the NMSE to 0.018534. Our proposed model effectively leverages features from both the inter-service correlation and temporal dimensions, yielding the best performance: $R^2$ reaches 0.935641 and the NMSE drops to 0.016684.
Figure 5 illustrates the predicted traffic volumes for two different services. It is a direct representation of our prediction results and shows that the predictions align with the distinct traffic patterns exhibited by each service, indicating that the proposed model can simultaneously capture and fit the various traffic patterns associated with different types of services.
A schematic diagram of the calendar resource orchestration process is shown in Figure 6, where Figure 6a shows the calendar orchestration process without the traffic prediction strategy, and Figure 6b shows the process based on the multi-flow prediction strategy.
In Stage 1, the services' bandwidths are updated under two strategies: (a) without prediction and (b) with multi-flow-based prediction. In Stage 2, scheme (a) calculates the calendar allocation result for the current moment, whereas scheme (b) performs both processes simultaneously: one performs the calendar calculation for the current moment, while the other performs the next-stage process based on the PMFAN algorithm. In Stage 3, scheme (a) calculates the calendar allocation based on the real service bandwidth updates, while scheme (b), given a successful prediction, allocates time slots directly according to the pre-allocation result calculated in Stage 2. In Stage 4, scheme (a) allocates all of the calendar slots the service needs based on the results calculated in Stage 3, whereas scheme (b) has already completed the service allocation.
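The overlap exploited by scheme (b) can be sketched as two concurrent tasks, where the next stage's prediction and pre-allocation run while the current calendar is computed (the function bodies below are timing stand-ins of our own, not the PMFAN or GDSO implementations):

```python
from concurrent.futures import ThreadPoolExecutor
import time

def allocate(bandwidths):   # stand-in for the GDSO calendar calculation
    time.sleep(0.1)
    return {f: b for f, b in bandwidths.items()}

def predict(history):       # stand-in for the PMFAN forecast
    time.sleep(0.05)
    return {f: series[-1] for f, series in history.items()}

# Scheme (b): while stage k's calendar is computed, stage k+1's bandwidths
# are predicted and pre-allocated in parallel, so stage k+1 merely applies them.
with ThreadPoolExecutor(max_workers=2) as pool:
    current = pool.submit(allocate, {"f1": 30, "f2": 50})
    upcoming = pool.submit(predict, {"f1": [20, 30], "f2": [40, 50]})
    calendar_now = current.result()
    pre_alloc = pool.submit(allocate, upcoming.result()).result()
print(calendar_now, pre_alloc)
```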
Analysis of a large number of repeated experiments shows that the calendar allocation scheme based on PMFAN-GDSO can save up to 46.8% of the computation time while ensuring prediction accuracy.
In our algorithm, the actual delay generated by the allocation algorithm can be obtained using (20):
As the number of network nodes increases, the running time $T_{GDSO}$ of the allocation algorithm also increases; therefore, as the network scale expands, our algorithm saves correspondingly more computation time. As shown in Figure 7, the curve demonstrates a gradual increase in time savings as the number of nodes increases: when the number of nodes grows from 32 to 118, the time savings increase by a factor of 5. However, within each individual topology structure, our algorithm saves a roughly fixed proportion of time, approximately 50%. This is because the traffic dataset and the network topology we utilize are independent of each other; as a result, employing the same prediction algorithm yields similar prediction accuracy, leading to a convergence of the gain ratios across different topologies. Nevertheless, the running time of the underlying allocation algorithm differs significantly across topology structures, which allows our algorithm to save a substantial portion of the allocation algorithm's running time. This finding underscores the practicality and efficiency of employing traffic prediction in time slot resource allocation, offering a promising approach for enhancing resource management in dynamic network environments. The demonstrated computational efficiency is critical for ensuring timely and responsive network operations, especially in scenarios with rapidly evolving traffic patterns and dynamic node configurations.