Article

ComEdge: Cloud-Native Platform for Integrated Computing and Communication in Satellite–Terrestrial Network

School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(20), 4252; https://doi.org/10.3390/electronics12204252
Submission received: 11 September 2023 / Revised: 10 October 2023 / Accepted: 11 October 2023 / Published: 14 October 2023

Abstract

Leveraging technological advancements such as containers, microservices, and service mesh, cloud-native edge computing (CNEC) has been extensively discussed and applied in both academia and industry. The integration of mobile edge computing and communication is crucial for future communication architectures in order to fully utilize distributed and fragmented communication resources and computing power. Cloud-native integration can help merge mobile edge computing and communication, enhancing network flexibility and resource utilization. This paper investigates an implementation plan for extending cloud-native capabilities to integrated computing and communication (INCCOM) in the satellite–terrestrial network. We construct an experimental verification platform called ComEdge in a real-world setting and analyze the platform's architecture, functional characteristics, and deployment. Furthermore, we explore a deep reinforcement learning solution for the deployment of the cloud-native core network and conduct a preliminary verification of the platform's potential to enable artificial intelligence in a real production environment, which will provide guidance to both the academic and industrial sectors. Finally, we analyze the challenges and opportunities encountered by the cloud-native INCCOM network system.

1. Introduction

In the past few years, the ground communication network has continued to expand and develop as communication volume has increased worldwide. However, the ground communication network cannot offer equal coverage to remote areas or the network edge due to insufficient resource planning, which leaves communication service quality in these areas unassured. At the same time, satellite communication systems leverage the benefits of high altitude and multicast/broadcast capabilities while complementing traditional ground networks to enable novel mobile communication networks [1]. Increasingly, organizations are initiating projects on the satellite–terrestrial hybrid network (STHN). Notably, OneWeb, O3b, SpaceX, and Telesat are among the companies proposing satellite-based Internet solutions [2]. As shown in Figure 1, owing to its intrinsic benefits encompassing wide scope, efficient processing capabilities, and versatile nature, STHN finds application across numerous practical domains, such as intelligent transportation systems [3], military tasks, and disaster relief [4].
On the one hand, cloud-native computing inherits from the development of cloud computing and has the characteristics of Infrastructure as a Service (IaaS), Platform as a Service (PaaS), and Software as a Service (SaaS) [5]. Additionally, cloud-native computing achieves high reliability, elasticity, manageability, and observability of applications through means such as containerization, microservices architecture, declarative API, automated deployment, dynamic configuration, and auto-scaling, better adapting to the rapid changes in and complexity of cloud environments.
On the other hand, future universal computation requires an open and integrated network architecture that deeply integrates computing power, communication, and industry intelligence in a decentralized network. This new network architecture will transform traditional cloud AI into edge AI and then support network for AI (NET4AI) [6], enabling a variety of AI applications such as multi-task transfer learning on edge devices [7] and the integration of edge cloud-native computing and intelligent wireless communication. Cloud-native computing and edge intelligence are the core parts of NET4AI. Compared with centralized cloud computing alone, artificial intelligence can achieve better results at edges that are closer to the data and offer greater elasticity.
In this context, the trend of cloud-native computing in the telecommunications industry is on the rise. Compared to almost any other service, telecommunication services have higher requirements for resilience, security, and performance [8]. The Third Generation Partnership Project (3GPP) initially introduced the Service-Based Architecture (SBA) paradigm [9] for the 5G Core (5GC). This paradigm decomposes monolithic Network Functions (NFs) into multiple microservices offering fine-grained functionalities. These microservices are designed for deployment as Virtual Machines (VMs) or lightweight containers (e.g., Docker) in a cloud-native fashion, making efficient use of cloud computing resources. Consequently, NFs can be dynamically scaled both horizontally (out/in) and vertically (up/down) to accommodate fluctuations in signaling traffic from a multitude of devices. The emergence of open-source core network platforms such as Open5GS [10] and Free5GC [11] further assists research in realizing cloud-native core networks.
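As an illustration of the horizontal (out/in) scaling mentioned above, the following minimal Python sketch computes how many NF instances a simple threshold-based autoscaler might request for a given signaling load; the capacity and bound values are hypothetical and not taken from any particular 5GC implementation.

```python
import math

def scale_out_in(total_requests, capacity_per_instance,
                 min_instances=1, max_instances=10):
    """Return the smallest instance count that keeps the per-instance
    load at or below capacity, clamped to the allowed range."""
    desired = math.ceil(total_requests / capacity_per_instance)
    return max(min_instances, min(max_instances, desired))

five = scale_out_in(950, 200)   # 5 instances absorb 950 requests at 200 each
one = scale_out_in(10, 200)     # load is low, but at least one instance stays up
```

In a real cloud-native deployment, this decision would typically be delegated to the orchestrator's autoscaler rather than hand-rolled; the sketch only illustrates the out/in rule.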
While the emerging mobile core network can enhance deployment flexibility and scalability through cloud-native environments [12,13], several challenges need to be addressed in practical deployments at multiple edge nodes close to users. First, due to limited computing and data storage resources at edge nodes, the cloud-native network functions (CNFs) should be distributed across multiple edge nodes and run in response to terminal service requests following communication protocol procedures. It is foreseeable that edge-distributed core networks will consume more communication resources than remote data center-based core networks. Furthermore, running CNFs on edge nodes incurs costs for container deployment and operation. Therefore, a cloud-native edge core network deployment scheme should be designed to reduce operational costs and control traffic overhead.
The paper is structured as follows: The challenge of cloud-native INCCOM is introduced in Section 2. The ComEdge experimental platform is detailed in Section 3. Following this, Section 4 discusses the deployment challenges and solutions for the cloud-native mobile core network in ComEdge. Section 5 provides an analysis of the challenges and opportunities encountered by the INCCOM network system. Lastly, Section 6 offers the paper’s conclusion.

2. The Challenge of Cloud-Native INCCOM

As mobile edge computing (MEC) and AI continue to evolve, computing power has become ubiquitous. Networks must now offer intelligent services that enable efficient collaboration between cloud, edge, and terminal computing power [14]. The upcoming 6G network aims to achieve seamless global coverage spanning land, sea, air, and space. It will also enable the establishment of a hierarchical network that facilitates centralized and distributed collaboration. The network will support distributed edge autonomy and management, alongside unified and simpler protocols. The integration of computing and communication systems will enable seamless collaboration among cloud, edge, network, and computing power. Communication and computing elements will understand and collaborate with each other, thereby achieving real-time and precise discovery of available computing power as well as flexible and dynamic computing and connection service scheduling. These advancements will offer ubiquitous services, optimize the allocation of computing resources, and enhance the utilization efficiency of communication and computing resources. At the same time, cloud-native architecture is becoming a prevailing trend in modern software architecture design, with the objective of enhancing the elasticity, reliability, and scalability of applications. Cloud-native architecture highlights container-based design, microservices, continuous delivery, and automated management in order to improve application performance and ease of management. There is thus an urgent need to design and implement a smart cloud-native network platform that integrates centralized and distributed architectures and blends communication and computing. The main challenges are as follows.
The primary challenge is how to effectively control and utilize fragmented resources on demand, encompassing efficient scheduling of distributed computing resources, complete utilization of fragmented data silos, appropriate utilization of diverse communication methods, and effective consolidation of decentralized and heterogeneous models. Task scheduling, node management, and structure merging are essential aspects to consider when scheduling fragmented resources. Distributed frameworks such as Hadoop [15] and Spark [16] have attained maturity and are extensively used in fields like data processing and machine learning, providing the ability to manage and schedule computing power in distributed environments. The problems of data silos and model consolidation can be addressed by utilizing machine learning frameworks such as federated learning, transfer learning, and lifelong learning.
The second challenge for INCCOM involves ensuring network determinacy and security. The primary distinction between deterministic and non-deterministic networks is the assurance of transmission time. Non-deterministic networks may encounter issues such as lost, delayed, or disordered data packets, which can affect real-time applications. In contrast, deterministic networks utilize techniques that bound the transmission time of data packets and guarantee the reliability and stability of real-time applications. In cluster networks experiencing real-time topology changes, data security, communication security, and model security are crucial considerations. Blockchain technology and federated learning have potential to enhance network and data security, making them a vital aspect of network security.
It is crucial to note that the deployment method of the core network plays a significant role in the performance of satellite-ground integrated communication systems in both the present and future stages. However, the orchestration of cloud-native network functions in the core network remains an unresolved issue, with very few works dedicated specifically to addressing this problem. In the context of the satellite-ground fusion network, refs. [17,18,19] discuss several novel core network deployment and management architectures. Ref. [17] proposes the vision of deploying certain critical network functions in the core network on satellites to reduce latency. Ref. [18] further elaborates on this vision by distributing all or some of the network functions of the 5G core network across multiple satellites in different orbits and leveraging the “service set” mechanism in 5G to alleviate the significant signaling pressure between satellites and the ground. Ref. [19] addresses the issue of signaling storms introduced by satellite mobility and presents the in-orbit stateless core network case. It is important to note that the focus above is on the overall architectural research of the mobile core network in the satellite-ground fusion network scenario, with a lack of research on actual deployment algorithms for distributed core networks. In a typical network environment, the authors of refs. [20,21] implemented a cloud-native Service Function Chaining (SFC) framework that provides traffic steering mechanisms for establishing end-to-end network services. In [22], the authors propose a horizontal scaling algorithm that leverages Control Theory to dynamically adjust the number of instances of the Access and Mobility Management Function (AMF) based on traffic load. To conserve resources required for User Plane Function (UPF) instances, the authors of ref. [23] propose a UPF instance scaling mechanism based on deep reinforcement learning. 
Although these efforts reduce the traffic overhead in the control plane of the core network deployed in the form of virtual machines, containers, or slices, there is a lack of consideration for the relationship between fluctuating service requests, deployment costs of the core network at multiple edge nodes, and inter-node control signaling overhead.
Clearly, to simultaneously minimize deployment costs and control signaling overhead while adapting to the dynamic nature of user service requests, an intelligent CNFs deployment architecture is required. This architecture provides deployment strategies for CNFs by learning the spatiotemporal relationships between user service requests, deployment costs, and control signaling overhead.

3. ComEdge Experimental Platform

With the rapid advancement of architectural technology, the cloud-native concept has gained widespread acceptance. Various technologies, including service meshes, microservices, containers, and Kubernetes, have become the standard for cloud edge architecture in the 5G era. Many industry, academic, and research organizations are engaged in the study of cloud-native edge computing, with examples such as OpenYurt [24], launched by Alibaba, EdgeX Foundry [25] from the Linux Foundation, and KubeEdge [26] by Huawei. Following an evaluation of different architectural options, we chose KubeEdge to establish our platform. KubeEdge seamlessly connects cloud and edge applications and resources, creating a unified computing architecture that brings better data processing efficiency and user experience. This section presents a cloud-native INCCOM platform developed using KubeEdge.

3.1. Platform Scenario

Figure 1 illustrates the cloud-native edge intelligence scenario, in which the functions of each node can be flexibly deployed on various real-world devices, such as drones, vehicles, ships, satellites, and cloud hosts. The physical communication methods used within the cluster are also varied, including but not limited to fiber optics, Wi-Fi, microwave, and different types of satellites. Moreover, despite the various underlying communication methods used, the cloud-native EdgeMesh [12] technology shields the complexity of the underlying topology, enabling the platform to be agile in the allocation and scheduling of services, tasks, and resources. The majority of nodes in the system are equipped with containerized 5G base stations, and the master node deploys a containerized 5G core network, granting access to numerous 5G terminals through the platform's southbound interface. The platform's quick deployment on nodes of varied architectures and forms, the flexibility and maneuverability of the actual physical nodes, and the integration of satellite links greatly expand the application scenarios of the platform.
In the actual construction of the platform, we use five physical nodes with different architectures, computing power, and communication capabilities as the main cluster parts of ComEdge. The actual situation of each physical node is shown in Table 1.
Meanwhile, the various nodes in the platform can operate in two working modes as needed based on the actual situation, and some devices of the ComEdge platform are shown in Figure 2.
Mesh Mode: A network is formed between Nodes 1, 2, 3, and 4 through a Wi-Fi hotspot, with Node 1 creating the hotspot and Nodes 2, 3, and 4 connecting wirelessly, while Node 5 joins through a wired connection, thus forming a fully functional high-availability cluster. Our platform deploys multiple master nodes to establish a high-availability cluster with a primary–secondary architecture: the master node on physical Node 1 acts as the primary node, while the other master nodes act as secondary nodes. The primary node bears the responsibility of processing all requests and forwarding them to the secondary nodes, which receive and perform the corresponding operations. In case of failure of the primary node, a secondary node automatically takes over its role to ensure the high availability and reliability of the cluster. The multiple master nodes collaborate with each other and manage all components and resources in the cluster.
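The primary/secondary takeover described above can be sketched as a simple election rule; the node records, health flags, and lowest-id policy below are illustrative assumptions, not the platform's actual API.

```python
def elect_primary(masters):
    """Pick the primary among the healthy masters; lowest node id wins.
    (Illustrative policy only -- real clusters use leader election.)"""
    healthy = [m for m in masters if m["healthy"]]
    if not healthy:
        raise RuntimeError("no healthy master available")
    return min(healthy, key=lambda m: m["id"])["name"]

cluster = [
    {"id": 1, "name": "node1", "healthy": True},
    {"id": 2, "name": "node2", "healthy": True},
    {"id": 5, "name": "node5", "healthy": True},
]
primary = elect_primary(cluster)     # Node 1 acts as the primary
cluster[0]["healthy"] = False        # simulate a failure of the primary
fallback = elect_primary(cluster)    # a secondary automatically takes over
```

Real high-availability clusters delegate this to a consensus-based leader election (e.g., via etcd); the sketch only captures the takeover behavior described in the text.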
Standalone Mode: The standalone mode refers to the case when only one physical node is present in the system or when a task is designated to a single physical node. It proves useful in scenarios where tasks are simplistic or necessitate specific requirements for edge devices.

3.2. Platform Architecture

The platform architecture, depicted in Figure 3, consists of three main parts. Nodes integrated with the 5G communication system connect to various terminal devices via the southbound interface depending on the scenario, such as IoT sensors, vehicles in the vehicular network, and communication devices across military and civilian applications, including mobile phones and smart watches. Additionally, nodes integrated with satellite antennas establish a communication link with high-throughput Earth-orbiting satellites via the northbound interface, connecting to the data network. The platform can be deployed on general architectures, including x86 and ARM. The system resources are virtualized, and various services and data can be carried in a containerized way.
The control plane of the platform cluster includes the following important features:
  • Node management: Node management is an extremely important part of cluster operation and maintenance. It mainly includes operations such as adding, deleting, maintaining, monitoring, and auto-scaling nodes.
  • Resource management: Resource management achieves flexible configuration and scheduling of compute and storage resources through the use of container resource limitations, requests, and pod scheduling strategies to achieve optimal resource utilization efficiency.
  • Service orchestration: Service orchestration is achieved by managing and coordinating a group of related containers as an application. The main goal of service orchestration is to coordinate aspects such as container storage, networking, deployment, scaling, and upgrading to create highly available and scalable services, and to provide rich service discovery and routing mechanisms.
  • Lifecycle management: Lifecycle management functionality allows users to manage and monitor the entire lifecycle of containers, including creation, operation, monitoring, updating, and destruction. Through comprehensive container management, lifecycle management ensures stable operation and high availability of containers, improving application efficiency and reliability.
The services of each node in the platform cluster include the following aspects. (1) Task processing: the tasks to be processed can be resource-intensive tasks distributed over a certain period of time, or they can be short, lightweight, and stateless tasks. (2) Data storage: it refers to how to store data in containers and manage the data in the containers. Considering the characteristics of containers, data persistence is usually not needed, and only short-term storage is required. (3) Model derivation: Edge intelligence is achieved by deploying federated learning, transfer learning [27], reinforcement learning, and other algorithms on various edge devices.
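As a toy illustration of the resource management and scheduling features above, the following sketch greedily places container requests (in abstract memory units) onto nodes using a best-fit rule; the container names, capacities, and the best-fit policy are hypothetical, not the platform's actual scheduler.

```python
def place_containers(requests, node_capacity):
    """Greedy best-fit placement: assign each container request to the
    node with the least remaining capacity that can still fit it."""
    free = dict(node_capacity)   # remaining resource units per node
    placement = {}
    for name, need in requests.items():
        candidates = [n for n in free if free[n] >= need]
        if not candidates:
            raise RuntimeError(f"no node can fit container {name!r}")
        node = min(candidates, key=lambda n: free[n])  # tightest fit
        free[node] -= need
        placement[name] = node
    return placement

plan = place_containers({"amf": 3, "smf": 2, "upf": 4},
                        {"node1": 8, "node2": 4})
```

Best fit keeps larger contiguous capacity free on big nodes for later requests; a real orchestrator would additionally weigh affinity, taints, and network locality.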

4. Deployment of Cloud-Native Core Network in ComEdge

In addressing the deployment of a cloud-native edge core network on the ComEdge platform, we formulate the following research hypothesis: because it fully leverages the dynamic temporal relationships among user service requests and core network deployment and operational costs, DQN-CNFDA can utilize fewer resources while effectively reducing system overhead, whereas approaches that consider only a single factor perform relatively worse in terms of platform deployment, operational costs, and resource utilization.
To validate our research hypotheses, in this section, we first introduce the key processes and deployment examples of the core network. Afterward, we formulate the deployment optimization problem for the cloud-native edge core network, aiming to minimize the deployment and control traffic overhead of the core network.

4.1. Example Analysis

To further elucidate the actual processes of core network deployment costs and inter-node control signaling overhead in multiple edge nodes, it is necessary to provide a comprehensive analysis with illustrative examples of the key control plane processes that affect the placement cost of cloud-native edge core networks, such as user registration and session establishment. Furthermore, to gain a more intuitive understanding of the distributed core network deployment approaches, it is necessary to provide examples and compare different static edge core network deployment schemes.
The two fundamental processes of initial registration and session establishment clearly demonstrate the control signaling interactions among the various network functions of the core network when fulfilling user service requests. Figure 4 illustrates the procedures for initial registration and session establishment in the context of 5G network deployment. During the UE's initial registration, the Access and Mobility Management Function (AMF) authenticates the UE and communicates QoS/billing profiles to the Session Management Function (SMF). Following this, the SMF selects a User Plane Function (UPF) to serve as the anchor gateway for session establishment. To initiate uplink data transmission, the UE first establishes a radio connection with the base station, which subsequently sends a service request to the AMF. The AMF, in turn, replicates session states to the base station for effective QoS enforcement. For the delivery of downlink traffic, it is imperative for the anchor gateway to notify the AMF of data arrival. Subsequently, the AMF notifies the base station to initiate paging for the UE. If this paging process succeeds, the UE repeats the aforementioned procedure to establish the session anew. Based on the above process, it is possible to quantitatively determine the control signaling overhead between CNFs.
With the rise of edge computing and related infrastructure development, deploying the core network at the edge has become a major trend. The edge core network is positioned closer to users, which reduces data transmission delay and the need to upload sensitive data. This proximity helps conserve network bandwidth, enhance data transmission efficiency, and alleviate network congestion while simultaneously safeguarding privacy and data security. In addition, utilizing edge nodes as backup nodes reduces the risk of a single point of failure and enhances system reliability. Lastly, in comparison to large data centers, the edge core network significantly reduces energy consumption and operational costs. Below, we briefly describe three examples of edge core network deployment.
In Figure 5a, a relatively complete core network is deployed only on Edge Node 2. The direct consequence is that users outside Area 2 incur higher control overhead for the initial registration and session establishment processes. Additionally, the resource utilization of the entire edge system is severely imbalanced. This is a scenario that we should try to avoid in practical deployments. In Figure 5b, a relatively complete core network is deployed in a distributed manner on Edge Nodes 1, 2, and 3. In this case, the resource utilization of the edge system is more balanced, and the backhaul control overhead for users in Areas 1, 2, and 3 is reduced to some extent. However, this deployment approach still has drawbacks when the number of users increases, especially when there is a significant increase in users in Areas 3 and 4: interactions among the AMF, SMF, and UPF become more frequent, consuming a substantial amount of inter-node communication resources. Compared to the first two scenarios, the third deployment scenario is more reasonable. In this scenario, network functions that interact frequently with the terminal, such as the AMF and UPF, are deployed on almost every edge node, while network functions related to authentication, such as the UDM and AUSF, are concentrated on one node. This ensures load balancing between nodes and effectively contains the control signaling overhead of the system.
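The trade-off among these static placements can be made concrete with a small sketch that counts inter-node control messages for a given CNF placement; the interaction pairs and per-request message counts below are illustrative assumptions, not 3GPP-specified values.

```python
def cross_node_signaling(placement, interactions, n_requests):
    """Inter-node control messages: an interaction between two CNFs
    costs only when they are deployed on different edge nodes."""
    per_request = sum(msgs for (a, b), msgs in interactions.items()
                      if placement[a] != placement[b])
    return per_request * n_requests

# hypothetical CNF-pair message counts per registration/session request
inter = {("AMF", "SMF"): 2, ("SMF", "UPF"): 2,
         ("AMF", "AUSF"): 1, ("AUSF", "UDM"): 1}

centralized = dict.fromkeys(["AMF", "SMF", "UPF", "AUSF", "UDM"], "node2")
spread = {"AMF": "node1", "SMF": "node2", "UPF": "node3",
          "AUSF": "node2", "UDM": "node2"}

central_cost = cross_node_signaling(centralized, inter, 100)
spread_cost = cross_node_signaling(spread, inter, 100)
```

The fully centralized placement of Figure 5a incurs no inter-node signaling but concentrates all load on one node, which is exactly the imbalance the text warns against; spreading CNFs rebalances resources at the price of cross-node control traffic.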
It should be noted that the number of users and the volume of requests in the wireless coverage areas of various edge nodes are dynamically changing. The above three examples of edge core network deployments are all static solutions and cannot adapt to dynamic real-world situations. In the following subsection, we introduce a DQN-based CNF deployment algorithm.

4.2. DQN-Based CNF Deployment Algorithm (DQN-CNFDA)

We developed a CNF deployment framework based on DQN to address the intricate relationships among service requests, provisioning costs, and control traffic overhead. This framework is designed to reduce costs by learning spatiotemporal patterns within service requests. Once the deployment of each CNF is completed, achieving user registration services is possible through CNFs interacting within or between nodes following the communication protocol process. To adaptively place CNFs in response to fluctuations in service requests, our approach, DQN-CNFDA, learns the optimal policy by gathering rewards through trial-and-error interactions with the environment [28].
To minimize the operation costs of the cloud-native edge core network, we begin by quantifying the communication between nodes and calculating the associated cost within the core network. We simplify the model by solely considering the communication cost between nodes, disregarding the cost within nodes. The meanings of each notation are explained in Table 2.
Formula (1) indicates whether communication between CNF $f \in F_s$ and CNF $f+1 \in F_s$ occurs across edge nodes within the time interval $t$: $M_t(f,s)$ equals zero if CNFs $f$ and $f+1$ are provisioned on the same edge node; otherwise, $M_t(f,s)$ equals one.

$$M_t(f,s) = \min\Big(1, \sum_{p \in P} \sum_{e \in E} \sum_{\substack{e' \in E \\ e' \neq e}} x^t_{p,f,e} \cdot x^t_{p,f+1,e'}\Big). \tag{1}$$
$D_{CM}$ in (2) represents the cost of an individual communication. Meanwhile, $C^t_{CM}$ represents the overall cost of communication between nodes necessary to accomplish the core network services:

$$C^t_{CM} = \sum_{s \in S} \sum_{f \in F_s} M_t(f,s) \sum_{e \in E} n^t_{s,f,e} \cdot D_{CM}. \tag{2}$$
$D_{OP}$ in (3) represents the deployment cost of an individual CNF, and $C^t_{OP}$ represents the deployment cost of the entire edge core network across all nodes:

$$C^t_{OP} = \sum_{p \in P} \sum_{f \in F_s} \sum_{e \in E} x^t_{p,f,e} \cdot D_{OP}. \tag{3}$$
The optimization objective can be described as

$$\min \; C^t_{OP} + C^t_{CM}. \tag{4}$$
Resources on edge nodes are limited; the CNFs deployed must not exceed the number of available resource units on each edge node:

$$\sum_{p \in P} \sum_{f \in F_s} x^t_{p,f,e} \le u^t_e, \quad \forall t, \; \forall e \in E. \tag{5}$$
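A minimal numerical sketch of the costs above, assuming a single service chain and binary single-instance placements `x[(cnf, node)]`; the chain, message count, and placement are hypothetical, while the unit costs reuse the $D_{OP}$ and $D_{CM}$ values quoted in Section 4.3.

```python
D_OP, D_CM = 0.875, 0.233   # unit costs per hour (Section 4.3)

def operation_cost(x):
    """C_OP: one D_OP per provisioned CNF instance (eq. 3, simplified)."""
    return D_OP * sum(x.values())

def communication_cost(x, chains, n_msgs):
    """C_CM: inter-node cost for consecutive CNFs f, f+1 of each chain.
    M_t(f, s) is 1 when the two CNFs share no node, else 0 (eq. 1-2)."""
    cost = 0.0
    for chain in chains:
        for f, g in zip(chain, chain[1:]):
            nodes_f = {n for (c, n), k in x.items() if c == f and k > 0}
            nodes_g = {n for (c, n), k in x.items() if c == g and k > 0}
            m = 0 if nodes_f & nodes_g else 1
            cost += m * n_msgs * D_CM
    return cost

def objective(x, chains, n_msgs):
    """Eq. 4: total deployment plus inter-node communication cost."""
    return operation_cost(x) + communication_cost(x, chains, n_msgs)

x = {("AMF", "e1"): 1, ("SMF", "e1"): 1, ("UPF", "e2"): 1}
chains = [("AMF", "SMF", "UPF")]
total = objective(x, chains, n_msgs=10)   # AMF-SMF co-located, SMF-UPF not
```

This deliberately drops the per-instance index $p$ and per-service index $s$ of the full model; it only shows how (1)–(4) combine into a single scalar the DQN agent later minimizes.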
We use the available resource units $u^t_e$ of each edge node $e$ within time $t$ as the state $s_t$. $A$ represents the action space, and an action $a_t$ at time $t$ is one of the combinations of edge nodes, where the number of combinations is $2^{N_E}$:

$$s_t = \{u^t_1, u^t_2, u^t_3, \ldots, u^t_{N_E}\}. \tag{6}$$
The reward $r_t$ at time $t$ is defined as the negative total cost:

$$r_t = -C^t_{OP} - C^t_{CM}. \tag{7}$$
Algorithm 1 presents the CNF deployment algorithm based on DQN. The action-value weights θ, the target network weights θ⁻, and the replay memory D are initialized in Lines 1–3. In Lines 6–12, the model explores placement schemes for CNFs via an ε-greedy policy: a random probability δ is drawn, and a random action is selected if δ < ε, while the action maximizing the Q value is selected otherwise. In Line 13, the reward r_t and the new state s_{t+1} are obtained. The experience replay buffer D filled in Line 14 weakens the correlation between samples, increasing the stability of the neural network. Lines 15–21 select a random minibatch of transitions from D in order to compute gradients: the algorithm performs a gradient descent step on the squared difference between the target value (computed with the old parameters θ⁻) and the Q value (computed with the current parameters θ). The old parameters are updated with the new parameters every C steps (Line 22).
Algorithm 1 DQN-based CNFs Deployment Algorithm (DQN-CNFDA)
1:  Initialize action-value function Q with random weights θ
2:  Initialize target action-value function Q̂ with weights θ⁻ = θ
3:  Initialize replay memory D to capacity N
4:  for episode = 1, M do
5:      Initialize state s
6:      for t = 1, T do
7:          Generate a probability δ ∈ (0, 1)
8:          if δ < ε then
9:              Randomly select action a_t ∈ A
10:         else
11:             a_t ← argmax_a Q(s_t, a; θ)
12:         end if
13:         Set r_t and s_{t+1}
14:         Store transition (s_t, a_t, r_t, s_{t+1}) in buffer D
15:         Select random minibatch of transitions (s_i, a_i, r_i, s_{i+1}) from D
16:         if episode ends at time i + 1 then
17:             Set y_i = r_i
18:         else
19:             y_i = r_i + γ max_{a_{i+1}} Q̂(s_{i+1}, a_{i+1}; θ⁻)
20:         end if
21:         Perform a gradient descent step on (y_i − Q(s_i, a_i; θ))² with respect to θ
22:         Every C steps reset Q̂ = Q
23:     end for
24: end for
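To make Algorithm 1 concrete, the following self-contained sketch replaces the deep Q-network with a small tabular Q function while keeping the ε-greedy exploration, experience replay, and periodic target synchronization of Lines 1–22; the two-state toy environment and its cost-based rewards are purely illustrative and not the ComEdge setup.

```python
import random

def train_q(env_step, n_states, n_actions, episodes=50, horizon=20,
            gamma=0.9, eps=0.2, alpha=0.1, batch=8, sync_every=10, seed=0):
    """Tabular stand-in for DQN-CNFDA: epsilon-greedy exploration,
    replay buffer, and a target table synced every `sync_every` steps."""
    rng = random.Random(seed)
    Q = [[0.0] * n_actions for _ in range(n_states)]
    Q_target = [row[:] for row in Q]          # Line 2: Q-hat = Q
    replay, step = [], 0
    for _ in range(episodes):
        s = 0
        for _ in range(horizon):
            if rng.random() < eps:            # Lines 7-9: explore
                a = rng.randrange(n_actions)
            else:                             # Line 11: exploit
                a = max(range(n_actions), key=lambda i: Q[s][i])
            s2, r, done = env_step(s, a)      # Line 13
            replay.append((s, a, r, s2, done))  # Line 14
            # Lines 15-21: minibatch update toward the target value
            for s_i, a_i, r_i, s2_i, d_i in rng.sample(replay,
                                                       min(batch, len(replay))):
                y = r_i if d_i else r_i + gamma * max(Q_target[s2_i])
                Q[s_i][a_i] += alpha * (y - Q[s_i][a_i])
            step += 1
            if step % sync_every == 0:        # Line 22
                Q_target = [row[:] for row in Q]
            s = s2
            if done:
                break
    return Q

def env_step(s, a):
    # toy environment: action 1 is the cheaper placement,
    # reward = -(deployment + control traffic cost), as in eq. (7)
    reward = -1.0 if a == 1 else -3.0
    return (s + 1) % 2, reward, False

Q = train_q(env_step, n_states=2, n_actions=2)
best_action = max(range(2), key=lambda a: Q[0][a])
```

With a neural network in place of the table (as in the paper's PyTorch implementation), the minibatch loop becomes a gradient descent step on (y − Q(s, a; θ))²; the control flow is otherwise identical.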

4.3. Evaluation Results

The ComEdge platform’s main experimental environment consists of five edge nodes as shown in Table 1. The available resource units for edge nodes are set to the minimum memory value occupied by all CNFs. The main observations in the experiments focus on the two core network processes introduced in Section 4.1: registration and session establishment. Based on Alibaba Cloud’s unit costs for CNF’s operation and control traffic [29], D O P and D C M are, respectively, set to 0.875 RMB per hour and 0.233 RMB per hour. In addition, we implemented DQN-CNFDA using PyTorch and conducted replays on the ComEdge platform with real data from different cells [30].
The experiments compared the differences in cost and resource utilization among the following edge core network deployment schemes: (1) DQN-CNFDA described in Section 4; (2) the minimum deployment cost scheme (MDCS) mentioned in [31]; (3) engineering-experience-based CNF deployment scheme (ENE-CNFDS); (4) random CNF deployment scheme (R-CNFDS). Among these, Scheme 2 represents a category of current edge core network deployments (as described in [31]), which primarily focus on minimizing the deployment cost of network functions while not paying sufficient attention to control traffic overhead. Scheme 3 represents the approach described in [17,18,19], where either the entire or a portion of core network functions are statically deployed at satellite edge nodes. This approach solely addresses the significant control traffic overhead issue caused by frequent user-triggered registration process handovers in the ground core network, while neglecting the critical factor of deployment costs in the edge core network. The experimental results of the four aforementioned schemes in the ComEdge platform are presented in Figure 6.
Figure 6a illustrates a comparison of platform resource usage for different schemes as the number of users in the wireless coverage area of edge nodes increases. We use the more intuitive platform average memory usage. Consistent with subjective inference, when the number of users within the coverage of edge nodes increases, the number of running containers in the entire platform also increases accordingly, leading to an increase in platform memory usage. R-CNFDS neglects the control traffic overhead between nodes, resulting in inefficient resource utilization. This situation becomes even more severe as the number of users increases. It is worth noting that when the number of users is small, there is little difference in resource utilization between MDCS, ENE-CNFDS, and DQN-CNFDA. With fewer user requests within the node coverage, the required number of containers to be deployed is limited, resulting in relatively good performance for ENE-CNFDS, even though it does not consider deployment costs as effectively. Additionally, the lower volume of control signaling between nodes does not significantly impact MDCS in terms of control traffic. However, as the number of users increases (e.g., when a single node covers 70 to 100 users), DQN-CNFDA outperforms MDCS and ENE-CNFDS in effectively managing the relationship between deployment costs, control traffic, and dynamic user requests. It efficiently scales network functions on each node, leading to more pronounced advantages in system resource utilization.
Figure 6b compares the costs of distributed deployment and operation of the core network on edge nodes under the four schemes. All required CNFs are launched simultaneously at 7:00, so the four schemes start with equal cost. As the number of edge users grows over time, the operating cost of the edge core network rises correspondingly. From 7:00 to 18:00, DQN-CNFDA kept costs lowest, fully demonstrating the advantage of deep reinforcement learning for deploying CNFs on the ComEdge platform. Notably, DQN-CNFDA can anticipate network function placement from users' service request behavior, whereas R-CNFDS involves no learning from user behavior, producing a significant gap between the two. MDCS and ENE-CNFDS consider minimizing deployment cost and limiting inter-node signaling, respectively, which narrows their gap with DQN-CNFDA. However, because they do not exploit the relationship among dynamic user requests, deployment cost, and control traffic, MDCS and ENE-CNFDS exhibited significant cost fluctuations (from 13:00 to 17:00) and therefore higher deployment and operational costs overall. Specifically, the DQN-based scheme achieved a cost reduction of up to 6.9% compared with the engineering-experience-based CNF deployment scheme.
Based on these experimental results, the proposed DQN-CNFDA captures the relationship among dynamic, time-varying user service requests, edge core network deployment, and operational cost better than engineering experience and the other baseline algorithms, thereby reducing system expense and improving resource utilization efficiency.
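The core trade-off that DQN-CNFDA learns — per-slot deployment cost versus control-traffic cost for requests served off-node — can be sketched with a simplified tabular Q-learning stand-in for the full deep Q-network. The cost constants, node names, and action space below are illustrative assumptions, not the platform's actual parameters:

```python
import random
from collections import defaultdict
from itertools import combinations

# Hypothetical cost model (illustrative constants, not measured values):
# a fixed cost per deployed CNF instance, plus a control-traffic cost for
# every request served by a node that does not host the CNF locally.
DEPLOY_COST = 1.0
TRAFFIC_COST = 0.4

def step_cost(placement, demand):
    """placement: nodes hosting the CNF; demand: {node: request count}."""
    deploy = DEPLOY_COST * len(placement)
    traffic = TRAFFIC_COST * sum(r for n, r in demand.items() if n not in placement)
    return deploy + traffic

def placements(nodes):
    """All non-empty subsets of nodes, i.e., the placement action space."""
    return [frozenset(c) for k in range(1, len(nodes) + 1)
            for c in combinations(nodes, k)]

def train(demands, nodes, episodes=100, alpha=0.5, gamma=0.9, eps=0.2):
    """Tabular Q-learning: state = time slot, action = CNF placement,
    reward = negative per-slot cost (so maximizing return minimizes cost)."""
    actions = placements(nodes)
    Q = defaultdict(float)
    for _ in range(episodes):
        for t, demand in enumerate(demands):
            a = (random.choice(actions) if random.random() < eps
                 else max(actions, key=lambda x: Q[(t, x)]))
            reward = -step_cost(a, demand)
            nxt = max(Q[(t + 1, x)] for x in actions)  # terminal slots stay 0
            Q[(t, a)] += alpha * (reward + gamma * nxt - Q[(t, a)])
    # Greedy placement per time slot after training.
    return [max(actions, key=lambda x: Q[(t, x)]) for t in range(len(demands))]
```

With demand concentrated on a single node and a low traffic cost, the learned policy tends to deploy the CNF only on that node rather than everywhere, mirroring the cost behavior observed in Figure 6b.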

5. Opportunities and Challenges

5.1. Intelligent Cluster

In the future integrated network, terminals, base stations, core networks, and other components will evolve towards intelligence and become intelligent network elements. Collaboration among multiple intelligent entities can target a single task or multiple tasks, and encompasses task decomposition and combination, goal analysis and modeling, model training and inference, and parameter iteration and sharing. At the same time, for tasks anchored in the physical world, resource scheduling and performance optimization must be carried out jointly across computing and communication to ensure reliable task execution.
Traditional mobile communication networks also employ collaboration mechanisms, but these focus on connection-level collaboration with limited scope and data volume, restricted to network elements at the same level. In an AI-oriented integrated computing network, by contrast, multi-agent collaboration involves far greater collaboration scale, interaction data volume, and collaboration cost, and may span multiple levels both horizontally and vertically. As a result, existing collaboration mechanisms cannot simply be reused. The challenges include selecting collaboration objects, mining and extracting collaboration information, and determining collaboration modes and mechanisms while keeping collaboration cost under control.

5.2. Telecommunications Industry

In the future, society will enter an intelligent era of universal connectivity, and the next-generation mobile communication network will bridge humanity and the digital world. The evolution of the core network is crucial to integrated networking: it acts as the convergence point for network services and applications, driving future development, and it sits at the center of the network topology, connecting various terminals and access networks, so changes to it ripple across the entire network. To accommodate the integration of communication and computation, core network evolution needs improvement in the following areas:
  • Firstly, the growing number of network functions leads to more service interactions between network functions, weaker security mechanisms for those interactions, and complex operational steps for traffic-switching configuration when migrating between old and new service versions.
  • Secondly, the observability of network function services is poor: topology visualization is weak, and service operation logs and performance indicators are not standardized. As a result, maintaining network function services and locating and troubleshooting faults requires a large amount of manual work.
  • Finally, operators already built pools of virtualized resources in the network function virtualization (NFV) stage, mostly distributed as virtual machines. These virtualized resources should therefore be reused and migrated onto a cloud-native telecommunications infrastructure, running the 5G network as containerized resources to further improve network flexibility and resource utilization.
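The traffic switching mentioned in the first point can be illustrated with a minimal weighted-routing sketch for a canary-style rollout between two versions of a network function service. The version labels and the 90/10 split are illustrative assumptions, not part of the platform:

```python
import random

def pick_version(weights):
    """Weighted routing between NF service versions during a rollout.
    weights: {version: weight}; returns a version drawn proportionally."""
    total = sum(weights.values())
    r = random.uniform(0, total)
    for version, w in weights.items():
        r -= w
        if r <= 0:
            return version
    return version  # guard against floating-point remainder

# Illustrative canary split: 90% of traffic to the old version, 10% to the new.
canary = {"nf-v1": 90, "nf-v2": 10}
```

Gradually shifting weight from "nf-v1" to "nf-v2" migrates traffic without reconfiguring clients, which is the kind of operational step that a service mesh automates in a cloud-native core.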

6. Conclusions

Driven by advances in cloud-native edge computing and the need for effective collaboration among cloud, edge, and terminal computing power, the INCCOM network paves the way for next-generation network evolution. The main work of this paper is as follows. We begin by discussing cloud-native INCCOM in the context of a satellite–terrestrial hybrid network. We then introduce the ComEdge experimental verification platform, which we built and tested in a real-world setting, and analyze its architecture, functional characteristics, and deployment status in actual production environments. Additionally, we explore a deep reinforcement learning solution for deploying the cloud-native core network and conduct a preliminary verification of the platform's potential to enable artificial intelligence in a real production environment. Lastly, we analyze the challenges and opportunities facing the integrated network from the perspectives of intelligent clusters and the telecommunications industry.

Author Contributions

Conceptualization, H.S.; Methodology, H.S.; Software, P.W., J.C. and Y.Z.; Resources, X.Z.; Data curation, H.S., P.W. and J.C.; Writing—original draft, H.S.; Supervision, X.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Key Research and Development Program of China under Grant 2021YFB2900504.

Data Availability Statement

All relevant data are within the paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Wang, P.; Zhang, J.; Zhang, X.; Yan, Z.; Evans, B.G.; Wang, W. Convergence of satellite and terrestrial networks: A comprehensive survey. IEEE Access 2019, 8, 5550–5588. [Google Scholar] [CrossRef]
  2. Karavolos, M.; Nomikos, N.; Vouyioukas, D.; Mathiopoulos, P.T. HST-NNC: A novel hybrid satellite-terrestrial communication with NOMA and network coding systems. IEEE Open J. Commun. Soc. 2021, 2, 887–898. [Google Scholar] [CrossRef]
  3. Xiong, G.; Zhu, F.; Dong, X.; Fan, H.; Hu, B.; Kong, Q.; Kang, W.; Teng, T. A kind of novel ITS based on space-air-ground big-data. IEEE Intell. Transp. Syst. Mag. 2016, 8, 10–22. [Google Scholar] [CrossRef]
  4. Hubenko, V.P.; Raines, R.A.; Mills, R.F.; Baldwin, R.O.; Mullins, B.E.; Grimaila, M.R. Improving the global information grid’s performance through satellite communications layer enhancements. IEEE Commun. Mag. 2006, 44, 66–72. [Google Scholar] [CrossRef]
  5. Li, F. Cloud-native database systems at Alibaba: Opportunities and challenges. Proc. Vldb Endow. 2019, 12, 2263–2272. [Google Scholar] [CrossRef]
  6. Yang, T.; Ning, J.; Lan, D.; Zhang, J.; Yang, Y.; Wang, X.; Taherkordi, A. Kubeedge wireless for integrated communication and computing services everywhere. IEEE Wirel. Commun. 2022, 29, 140–145. [Google Scholar] [CrossRef]
  7. Chen, Q.; Zheng, Z.; Hu, C.; Wang, D.; Liu, F. On-edge multi-task transfer learning: Model and practice with data-driven task allocation. IEEE Trans. Parallel Distrib. Syst. 2019, 31, 1357–1371. [Google Scholar] [CrossRef]
  8. TU Group. Cloud Native Thinking for Telecommunications. 2020. Available online: https://github.com/cncf/telecom-user-group/blob/master/whitepaper/cloud_native_thinking_for_telecommunications.md (accessed on 27 March 2020).
  9. 3GPP. System Architecture for the 5G System. 3GPP TS 23.501 V15. 3.0. 2018. Available online: https://www.etsi.org/deliver/etsi_ts/123500_123599/123501/15.03.00_60/ts_123501v150300p.pdf (accessed on 20 September 2018).
  10. Open5GS: Open Source Project of 5GC and EPC (Release 16). Available online: https://open5gs.org/ (accessed on 11 June 2021).
  11. Free5GC: An Open-Source Project for 5th Generation (5G) Mobile Core Networks. Available online: https://free5gc.org/ (accessed on 3 May 2021).
  12. Imadali, S.; Bousselmi, A. Cloud native 5g virtual network functions: Design principles and use cases. In Proceedings of the 2018 IEEE 8th International Symposium on Cloud and Service Computing (SC2), Paris, France, 18–21 November 2018; pp. 91–96. [Google Scholar]
  13. Luong, D.H.; Thieu, H.T.; Outtagarts, A.; Ghamri-Doudane, Y. Predictive autoscaling orchestration for cloud-native telecom microservices. In Proceedings of the 2018 IEEE 5G World Forum (5GWF), Silicon Valley, CA, USA, 9–11 July 2018; pp. 153–158. [Google Scholar]
  14. Tang, X.; Cao, C.; Wang, Y.; Zhang, S.; Liu, Y.; Li, M.; He, T. Computing power network: The architecture of convergence of computing and networking towards 6G requirement. China Commun. 2021, 18, 175–185. [Google Scholar] [CrossRef]
  15. Apache Hadoop Project Develops Open-Source Software for Reliable, Scalable, Distributed Computing. Available online: https://hadoop.apache.org/ (accessed on 16 July 2022).
  16. Spark: Unified Engine for Large-Scale Data Analytics. Available online: https://spark.apache.org/ (accessed on 20 August 2022).
  17. Cui, H.; Zhang, J.; Geng, Y.; Xiao, Z.; Sun, T.; Zhang, N.; Liu, J.; Wu, Q.; Cao, X. Space-air-ground integrated network (SAGIN) for 6G: Requirements, architecture and challenges. China Commun. 2022, 19, 90–108. [Google Scholar] [CrossRef]
  18. Wang, X.; Sun, T.; Duan, X.; Wang, D.; Li, Y.; Zhao, M.; Tian, Z. Holistic service-based architecture for space-air-ground integrated network for 5G-advanced and beyond. China Commun. 2022, 19, 14–28. [Google Scholar] [CrossRef]
  19. Li, Y.; Li, H.; Liu, W.; Liu, L.; Chen, Y.; Wu, J.; Wu, Q.; Liu, J.; Lai, Z. A case for stateless mobile core network functions in space. In Proceedings of the ACM SIGCOMM 2022 Conference, Amsterdam, The Netherlands, 22–26 August 2022; pp. 298–313. [Google Scholar]
  20. Bouridah, A.; Fajjari, I.; Aitsaadi, N.; Belhadef, H. Optimized scalable SFC traffic steering scheme for cloud native based applications. In Proceedings of the 2021 IEEE 18th Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 9–12 January 2021; pp. 1–6. [Google Scholar]
  21. Dab, B.; Fajjari, I.; Rohon, M.; Auboin, C.; Diquélou, A. Cloud-native service function chaining for 5G based on network service mesh. In Proceedings of the ICC 2020—2020 IEEE International Conference On Communications (ICC), Dublin, Ireland, 7–11 June 2020; pp. 1–7. [Google Scholar]
  22. Alawe, I.; Hadjadj-Aoul, Y.; Ksentini, A.; Bertin, P.; Darche, D. On the scalability of 5G core network: The AMF case. In Proceedings of the 2018 15th IEEE Annual Consumer Communications & Networking Conference (CCNC), Las Vegas, NV, USA, 12–15 January 2018; pp. 1–6. [Google Scholar]
  23. Nguyen, H.T.; Van Do, T.; Rotter, C. Scaling upf instances in 5g/6g core with deep reinforcement learning. IEEE Access 2021, 9, 165892–165906. [Google Scholar] [CrossRef]
  24. OpenYurt: An Open Platform That Extends Upstream Kubernetes to Edge. Available online: https://openyurt.io/ (accessed on 22 July 2021).
  25. EdgeFoundry: The Preferred Open Source Edge Platform. Available online: https://www.edgexfoundry.org/ (accessed on 18 August 2021).
  26. Kubeedge: Kubernetes Native Edge Computing Framework. Available online: https://kubeedge.io/ (accessed on 7 January 2022).
  27. Chen, Q.; Zheng, Z.; Hu, C.; Wang, D.; Liu, F. Data-driven task allocation for multi-task transfer learning on the edge. In Proceedings of the 2019 IEEE 39th International Conference On Distributed Computing Systems (ICDCS), Dallas, TX, USA, 7–10 July 2019; pp. 1040–1050. [Google Scholar]
  28. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 2018. [Google Scholar]
  29. Alibaba Cloud. Alibaba Cloud Pricing. Available online: https://www.aliyun.com/price/ (accessed on 24 May 2022).
  30. Harvard Dataverse. A Multi-Source Dataset of Urban Life in the City of Milan and the Province of Trentino Dataverse. Available online: https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/EGZHFV (accessed on 16 November 2022).
  31. Zheng, J.; Tian, C.; Dai, H.; Ma, Q.; Zhang, W.; Chen, G.; Zhang, G. Optimizing NFV chain deployment in software-defined cellular core. IEEE J. Sel. Areas Commun. 2019, 38, 248–262. [Google Scholar] [CrossRef]
Figure 1. ComEdge platform scenario.
Figure 2. Actual deployment of ComEdge platform.
Figure 3. Architecture of ComEdge platform.
Figure 4. Partial 5G Signaling Process.
Figure 5. Three Examples of edge core network deployment. (a) A single core network is deployed on a single edge node. (b) A single core network is deployed on multiple edge nodes. (c) Multiple core networks are deployed on multiple edge nodes.
Figure 6. The results of the experiments conducted on the ComEdge platform.
Table 1. The actual situation of each physical node.
Node      | CPU                 | Memory | Communication Method
node1     | x86/64-bit, 16-core | 126 GB | Wi-Fi; Ethernet
node2 & 3 | x86/64-bit, 8-core  | 8 GB   | Wi-Fi
node4     | ARM/64-bit, 4-core  | 8 GB   | Ethernet
node5     | ARM/64-bit, 4-core  | 4 GB   | Wi-Fi; Ethernet
Table 2. The meanings of each notation.
Notation    | Description
E           | Set of edge nodes
e           | Edge node index
N_E         | Number of edge nodes
u_e^t       | Number of available resource units on edge node e at time t
S           | Set of services
s           | Service index
F_s         | Set of CNFs in service s
f           | CNF index within F_s
p           | CNF type index
x_{p,f,e}^t | Whether CNF f of type p is provisioned on edge node e at time t
n_{s,f,e}^t | Number of requests from service s for CNF f to edge node e at time t
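A minimal sketch of the node-capacity check implied by the notation in Table 2: the CNF instances x_{p,f,e}^t placed on each edge node at a fixed time slot must fit within that node's available resource units u_e^t. The per-type resource requirements and the node/CNF names below are illustrative assumptions:

```python
def check_capacity(x, resource_req, u):
    """x[(p, f, e)] -> 0/1 placement at a fixed time slot t (x_{p,f,e}^t);
    resource_req[p] -> units one instance of CNF type p consumes (assumed);
    u[e] -> available resource units on edge node e (u_e^t).
    Returns per-node usage and the list of overloaded nodes."""
    used = {e: 0 for e in u}
    for (p, f, e), placed in x.items():
        if placed:
            used[e] += resource_req[p]
    return used, [e for e in u if used[e] > u[e]]

# Illustrative example: two CNFs on node1, one on node4.
x = {("amf", 0, "node1"): 1, ("smf", 0, "node1"): 1, ("upf", 0, "node4"): 1}
resource_req = {"amf": 2, "smf": 1, "upf": 3}
u = {"node1": 4, "node4": 2}
used, overloaded = check_capacity(x, resource_req, u)  # node4 exceeds its 2 units
```

A deployment scheme such as DQN-CNFDA must keep this check satisfied on every node while also trading off deployment cost against control traffic.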

Share and Cite

MDPI and ACS Style

Shi, H.; Zhang, X.; Wu, P.; Chen, J.; Zhang, Y. ComEdge: Cloud-Native Platform for Integrated Computing and Communication in Satellite–Terrestrial Network. Electronics 2023, 12, 4252. https://doi.org/10.3390/electronics12204252
