Deep Reinforcement Learning-Based Multipath Routing for LEO Megaconstellation Networks

Han, Chi; Xiong, Wei; Yu, Ronghuan

doi:10.3390/electronics13153054

Open Access Article


by Chi Han ¹, Wei Xiong ¹,²,* and Ronghuan Yu ¹,²

¹ National Key Laboratory of Space Target Awareness, Space Engineering University, Beijing 101400, China

² School of Space Information, Space Engineering University, Beijing 101400, China

* Author to whom correspondence should be addressed.

Electronics 2024, 13(15), 3054; https://doi.org/10.3390/electronics13153054

Submission received: 12 July 2024 / Revised: 25 July 2024 / Accepted: 31 July 2024 / Published: 1 August 2024

(This article belongs to the Special Issue Satellite Terrestrial Networks: Technologies, Security and Applications)


Abstract

The expansion of megaconstellation networks (MCNs) represents a promising solution for achieving global Internet coverage. To meet the growing demand for satellite services, multipath routing establishes multiple transmission paths simultaneously so that flows can be transmitted in parallel. Nevertheless, satellite mobility and time-varying link states make it difficult to discover optimal paths and to schedule traffic in multipath routing. Because traditional static deep reinforcement learning (DRL)-based routing algorithms cannot flexibly handle time-varying constellation topologies, we propose DRL-based multipath routing (DMR), enabled by a graph neural network (GNN), to enhance the transmission performance of MCNs. DMR decouples the stochastic optimization problem of multipath routing under traffic and bandwidth constraints into two subproblems: multipath route discovery and multipath traffic scheduling. First, a minimum hop count-based multipath route discovery algorithm (MHMRD) is proposed to compute multiple available paths between all source and destination nodes. Second, a GNN-based multipath traffic scheduling scheme (GMTS) is proposed to dynamically schedule the traffic of each data stream over its available paths, based on the state information of inter-satellite links (ISLs) and the traffic demand. Simulation results demonstrate that the proposed scheme scales to constellations with different configurations without repeated training and, compared with the shortest path first (SPF) algorithm, improves the throughput, completion ratio, and delay by 42.64%, 17.39%, and 3.66%, respectively.

Keywords:

satellite network; multipath routing; deep reinforcement learning; traffic scheduling; hop count

1. Introduction

The characteristics of megaconstellation networks (MCNs), such as wide area coverage, low latency, and high bandwidth, confer upon them unique advantages in remote area communications and delay-sensitive services (e.g., emergency communications and disaster monitoring) [1]. The aforementioned advantages have resulted in the accelerated development of MCNs, which are anticipated to fulfill a multitude of services, including communications, remote sensing, navigation, and positioning. Furthermore, MCNs are poised to become a pivotal component of the infrastructure of the space-air-ground integrated network (SAGIN) [2,3,4].

Multipath routing is a promising approach to meeting the growing demand for satellite communication services. It allows multiple transmission paths to be established simultaneously between the source and destination nodes of each data stream, which can increase network throughput [5]. Because a single service can be transmitted over multiple paths in parallel, multipath routing, in contrast to single-path routing, makes better use of the available bandwidth, thereby alleviating congestion, improving throughput, and enhancing fault tolerance. Consequently, multipath routing helps resource-constrained satellite networks withstand node or link failures. However, the highly dynamic topologies and link states of satellite networks present challenges for multipath routing discovery and multipath traffic scheduling [6,7,8].

Multipath routing discovery is the first challenge, posed by the time-varying nature of satellite topology and traffic. Factors such as changes in satellite attitude and complex electromagnetic environments can reduce laser link transmission rates or even cause the links to fail. As a result, link failures in MCNs may be frequent rather than exceptional, as they are in terrestrial networks. Frequent link switching and failures prevent the broadcast mechanism used in terrestrial networks from discovering and maintaining routes in a timely manner, and they increase the overhead of broadcasting and route control. Consequently, topology-based ad hoc routing protocols such as AODV [9] perform suboptimally in MCNs. Furthermore, owing to the mobility and energy constraints of LEO satellites, established links often cannot be maintained stably. Topological changes during path discovery may cause discovered paths to expire quickly, so that a path may no longer exist when data are transferred over it. Multipath routing discovery must therefore detect link failures promptly and set up multiple alternative paths sensibly. Some schemes currently deploy path computation in a ground network control center (NCC) [10]. The NCC improves network transmission performance by jointly considering the distribution of terrestrial user traffic and link utilization, and by exploiting the link redundancy of the satellite network's mesh topology to plan multiple parallel transmission paths for each flow at the same time.

Multipath traffic scheduling in MCNs is the second challenge. Complex electromagnetic environments, satellite attitude changes, and other factors cause link quality to fluctuate. As a consequence, traditional multipath traffic scheduling based on static routing is no longer applicable in MCNs. Furthermore, because terrestrial user traffic is time-varying and link quality fluctuates, the satellite network cannot accurately identify traffic patterns, and fine-grained traffic scheduling among multiple paths often lacks effective information. Deep reinforcement learning (DRL) has demonstrated significant advantages in describing spatiotemporal features, making flow decisions, and related tasks, and many researchers have adopted DRL for adaptive multipath flow scheduling [11,12,13,14]. However, because satellite networks are time-varying and the quality of inter-satellite links (ISLs) fluctuates, DRL models must aggregate multiple time slices to make decisions. As the constellation size increases, the model's dimension expands dramatically, which greatly increases the computational effort and reduces the model's scalability. Graph neural networks (GNNs) can identify features and patterns in graph-structured data and perform relational reasoning and combinatorial generalization on it [15,16], and they can be applied without additional model tuning to constellations of different topologies and sizes [17,18].

Recently, DRL has demonstrated considerable potential in learning temporal and spatial traffic characteristics to make routing decisions. Researchers have employed DRL for adaptive traffic assignment, with the aim of replacing static traffic segmentation. However, DRL-based models must summarize consecutive topology snapshots to make decisions in a dynamic satellite environment. Satellite networks have high-dimensional and sparse state spaces, and the dimensionality of DRL models grows dramatically with the constellation size and the traffic demand, which leads to poor model scalability. GNNs support inference over relational data and the generalization of combinatorial structures in graph-structured information, which allows a DRL model to be applied to satellite constellations of varying sizes without additional modifications. In light of this analysis, we propose DRL-based multipath routing (DMR) embedded with a GNN to improve the transmission performance of MCNs. The multipath routing problem is first modeled as a stochastic optimization problem whose objective is to maximize network efficiency under traffic and bandwidth constraints. To solve it, the original problem is decoupled into two subproblems: multipath routing discovery and multipath traffic scheduling. For the first subproblem, we fully exploit the redundant topology of the satellite network and introduce a minimum hop count-based multipath routing discovery (MHMRD) algorithm, which computes a number of available paths between all source and destination nodes. On this basis, the second subproblem is addressed by GNN-based multipath traffic scheduling (GMTS).
GMTS models the multipath traffic scheduling process as a Markov decision process (MDP) and dynamically schedules the traffic of each data stream on each available path based on the satellite network's link state information and traffic demand. Simulation results on Iridium and OneWeb demonstrate that the proposed scheme does not require retraining for constellations of different sizes. Furthermore, it outperforms the baseline scheme in network throughput, average delay, and flow completion rate.

The main contributions of this paper are summarized as follows:

The multipath routing problem in satellite networks is modeled as a stochastic optimization problem under traffic and bandwidth constraints. The objective is to maximize network efficiency. The problem is decoupled into a multipath routing discovery subproblem and a multipath traffic scheduling subproblem.
To address the multipath routing discovery subproblem, the MHMRD algorithm is proposed to plan multiple minimum hop count paths for each flow between any pair of nodes in the network based on link state information.
The multipath traffic scheduling subproblem is modeled as an MDP, with the GMTS scheme proposed to dynamically schedule the proportion of traffic on each available path for each data stream. GMTS is a scalable solution which can be applied to different constellations.

The rest of this paper is organized as follows. Section 2 reviews related work. Section 3 presents the system model and problem formulation. Section 4 describes DRL-based multipath routing. Section 5 presents the simulations and main results. Finally, Section 6 summarizes the paper.

2. Related Work

2.1. Multipath Routing

In satellite networks, there are often multiple end-to-end paths with the same hop count. To improve the reliability of data transmission and enhance network load balancing, researchers have proposed multipath routing for satellite networks, in which each flow is divided into multiple subflows that are transmitted in parallel along different paths. To distribute traffic over multiple paths, the traditional equal-cost multipath (ECMP) method performs static traffic splitting based on packet header information. However, ECMP does not consider network parameters such as bandwidth and delay and is prone to congestion when the network load is high. To achieve dynamic flow management for multipath routing, software-defined networking (SDN)-based satellite network architectures have been proposed for centralized topology control and traffic control [18,19,20,21]. In an SDN-based architecture, the data plane and control plane are separated: the SDN controller determines routes based on network parameters and QoS requirements, and the satellites are responsible for data forwarding. In [22], the authors integrated network coding and multipath routing to enhance the data transmission efficiency of satellite networks. However, centralized control is susceptible to high latency. In [23], the authors employed distributed routing control to reduce the control latency and improve transmission stability. In [24], the authors proposed network coding-based multipath cooperative routing (NCMCR), in which each flow is transmitted along multiple disjoint links to optimize transmission capacity. However, these multipath routing schemes do not consider the link parameters of the various forwarding paths and thus cannot adaptively split traffic in a reasonable way.
In [1], the authors designed an adaptive traffic balancing scheme among multiple temporal paths by combining sparse and redundant network coding mechanisms, achieving deterministic delay guarantees in scenarios with limited node resources and multi-user contention. Nevertheless, all of the aforementioned routing schemes assume static topologies, which prevents them from adapting to satellite networks with dynamically changing topologies.

2.2. Intelligent Routing

The time-varying satellite network environment and dynamic inter-satellite links pose challenges for satellite network routing. To ensure quality of service as the numbers of users and the constellation sizes grow, multipath routing requires more efficient path planning and finer-grained traffic control strategies. Reinforcement learning supports decision making through interaction with the environment [25,26,27], and researchers have proposed numerous intelligent routing schemes based on reinforcement learning. In [25], the authors developed a supervised deep learning system for constructing routing tables. In [28], the authors applied graph-based deep learning to satellite networks through a neural network architecture called the graph-query neural network. Nevertheless, such supervised learning-based routing schemes cannot generalize and adapt across different constellations. To address this issue, the authors of [29] proposed a Markov decision process (MDP) model for satellite network routing, employing multi-agent deep reinforcement learning (DRL) to satisfy diverse quality of service (QoS) requirements. However, a fixed objective policy makes DRL-based routing algorithms inflexible, as they cannot adapt to dynamically changing networks; static objective configurations cannot reflect the varying importance of different metrics in dynamic network environments. To address this issue, the authors of [17,30] integrated a graph neural network (GNN) into DRL agents, enabling network performance metrics to be adjusted promptly by predicting the trend of the routing algorithm's optimization objectives. This enables the agents to learn optimal paths that adapt to different environmental changes.
In [31,32], the authors proposed different DRL architectures, both of which dynamically adjust link weights according to the load of key nodes or links. In [33], the authors extended the TCP options to piggyback the relevant control information and flexibly support communication between transport-layer subflows and the SDN controller. Their SDN-cooperated MPTCP (scMPTCP) architecture selects routes for new subflows based on the available bandwidth of each route, avoids the bottlenecks of other subflows, and adapts to changes in network load. In [34], the authors proposed a DQN controller that incorporates a path loss model and is based on an MDP model for network selection and adaptive resource allocation in heterogeneous networks.

Nevertheless, existing DRL-based studies are not readily scalable, because they are tightly coupled to their input states and to the limited topologies seen in their training data. To resolve this limitation, the proposed DRL-based multipath routing embedded with a GNN can readily identify the optimal action within a continuous action space. DMR decouples the multipath routing problem into two distinct components, multipath discovery and multipath traffic scheduling, to reduce the solution's complexity. Concurrently, the target policy adopted by DMR can be adjusted to the prevailing environmental conditions, which makes the system more flexible with respect to different topologies.

3. System Model and Problem Formulation

3.1. Multipath Scenario

The multipath routing scenario in the MCN is shown in Figure 1, where the satellite network is divided into mutually independent data and control planes via SDN. In the terrestrial network, data in the send buffer are transmitted to the sender through routers over multiple hops and aggregated into multiple subflows. Each subflow can select the optimal transmission path according to the current network conditions, and the subflows then converge at the receiver.

The data plane comprises the ground stations and a uniformly and symmetrically distributed Walker Delta constellation $N_{S}N_{P}/N_{P}/F$. The $N_{P}$ orbital planes are distributed uniformly along the equator, and the $N_{S}$ satellites are distributed uniformly within each plane. Each satellite can maintain four inter-satellite links (ISLs): two intra-plane ISLs and two inter-plane ISLs. In the control plane, the network control center (NCC) is responsible for unified topology management, traffic scheduling, and path control. Owing to the mobility of LEO megaconstellation networks, inter-plane ISLs are frequently broken and rebuilt, and the topology is in constant flux. Consequently, the NCC must continuously update its view of the evolving topology, which poses a significant challenge for multipath route planning and traffic scheduling.

3.2. Multipath Routing and Traffic Model

The MCN is modeled as a spatiotemporal graph $G = \{(V, E, T, N)\}$, where $V = S \cup G$ is the node set consisting of all satellites and terrestrial gateways, $E = E_{S} \cup E_{G}$ is the union of the set of ISLs $E_{S}$ and the set of terrestrial links $E_{G}$, and $T$ is the time slot vector of length $N$. As illustrated in Figure 1, the mesh topology of the MCN allows multiple potential routes between any given pair of nodes. Assume that the number of packet flows at time slot $t$ is $M$ (i.e., $P_{t} = \{(s_{i}, d_{i}) \mid i = 1, 2, \dots, M\}$). In other words, $M$ represents the number of subflows; Figure 1 shows three subflows at the sender awaiting transmission via multiple paths. The number of available paths between the source node $s_{i}$ and the destination node $d_{i}$ is $L_{s_{i}, d_{i}}$, and the set of available paths is denoted as $R_{t} = \{p_{1, 1}^{t}, \dots, p_{1, L}^{t}, \dots, p_{M, L}^{t}\}$. Here, $p_{m, l}^{t}$ is the $l$th path between the $m$th pair of source and destination nodes and consists of all ISLs on that path (i.e., $p_{m, l}^{t} = \{\varepsilon_{1}, \varepsilon_{2}, \dots, \varepsilon_{j}, \dots, \varepsilon_{n_{m, l}}\}$, where $n_{m, l}$ is the total hop count of the path).

Let $q_{\varepsilon_{j}}^{t}$ be the queuing delay of link $\varepsilon_{j}$, $r_{\varepsilon_{j}}^{t}$ the packet processing delay, and $\gamma_{\varepsilon_{j}}^{t}$ the link distance. Then, the delay of the $l$th transmission path of the $m$th source-destination node pair can be expressed as follows:

$$\eta_{m, l}^{t} = \sum_{j = 1}^{n_{m, l}} \left(q_{\varepsilon_{j}}^{t} + r_{\varepsilon_{j}}^{t} + \frac{\gamma_{\varepsilon_{j}}^{t}}{c}\right) \tag{1}$$

where $c$ is the speed of light in vacuum and $n_{m, l}$ denotes the hop count of the path. Each source node can divide its traffic into $L_{s_{i}, d_{i}}$ subflows that traverse $L_{s_{i}, d_{i}}$ paths in parallel. Let $\omega_{m, l}^{t}$ be the traffic ratio of each subflow. Then, the traffic ratios of the $L_{s_{i}, d_{i}}$ subflows between any pair of source and destination nodes must sum to one:

$$\sum_{l = 1}^{L} \omega_{m, l}^{t} = 1, \quad \forall m = 1, 2, \dots, M \tag{2}$$

Therefore, the traffic of link $\varepsilon_{j}$ at time $t$ can be expressed by

$$f_{\varepsilon_{j}}^{t} = \sum_{m = 1}^{M} \sum_{l = 1}^{L} b_{m}^{t} \omega_{m, l}^{t} \delta_{m, l}^{t} \tag{3}$$

where $b_{m}^{t}$ is the bandwidth requirement of the $m$th source-destination node pair and $\delta_{m, l}^{t}$ is the path packet loss rate. Given that the maximum bandwidth of an inter-satellite link is $BW$, the ISL traffic must not exceed this limit:

$$f_{\varepsilon_{j}}^{t} \leq BW, \quad \forall \varepsilon_{j} \in p_{m, l}^{t} \tag{4}$$

Consequently, the throughput of the $l$th path between the $m$th node pair can be expressed by

$$f_{m, l}^{t} = \max \{f_{\varepsilon_{j}}^{t}\}, \quad \forall \varepsilon_{j} \in p_{m, l}^{t} \tag{5}$$
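The following Python sketch evaluates Equations (1) to (5) for a toy set of paths. It is an illustration under stated assumptions, not the authors' implementation: links are identified by hypothetical string IDs, each path is a list of such IDs, and in Equation (3) a path is assumed to contribute only to the links it actually traverses (the printed sum does not show this membership condition explicitly). The helper names (`Link`, `path_delay`, `link_loads`, `path_throughput`, `within_capacity`) are hypothetical.

```python
from dataclasses import dataclass

C_LIGHT = 299_792_458.0  # speed of light in vacuum, m/s


@dataclass
class Link:
    """Per-hop quantities of Eq. (1); field names are illustrative."""
    queue_delay: float  # q_{eps_j}^t, seconds
    proc_delay: float   # r_{eps_j}^t, seconds
    distance: float     # gamma_{eps_j}^t, metres


def path_delay(hops):
    """Eq. (1): sum of queuing, processing, and propagation delay over all hops."""
    return sum(h.queue_delay + h.proc_delay + h.distance / C_LIGHT for h in hops)


def link_loads(demands, ratios, deltas, paths):
    """Eq. (3): traffic carried by each link.

    demands[m]   -> b_m^t
    ratios[m][l] -> omega_{m,l}^t (each row must satisfy Eq. (2))
    deltas[m][l] -> delta_{m,l}^t, the multiplicative factor printed in Eq. (3)
    paths[m][l]  -> list of link IDs on p_{m,l}^t
    Assumption: a path contributes only to the links it traverses.
    """
    loads = {}
    for m, b in enumerate(demands):
        if abs(sum(ratios[m]) - 1.0) > 1e-9:
            raise ValueError(f"ratios of pair {m} violate Eq. (2)")
        for l, path in enumerate(paths[m]):
            for link_id in path:
                share = b * ratios[m][l] * deltas[m][l]
                loads[link_id] = loads.get(link_id, 0.0) + share
    return loads


def path_throughput(loads, path):
    """Eq. (5) as printed: the maximum link traffic along the path."""
    return max(loads[link_id] for link_id in path)


def within_capacity(loads, bw):
    """Eq. (4): no ISL may carry more than BW."""
    return all(f <= bw for f in loads.values())
```

For example, splitting a 10-unit demand evenly over paths `["a", "b"]` and `["c"]` while a second 6-unit flow also uses link `b` gives `b` a load of 11 units, which Equation (4) rejects when BW = 10.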

3.3. Problem Formulation

To measure the delay and throughput of ISLs, the utility function of the MCN is defined as

$$U(\bar{f}_{t}, \bar{d}_{t}) = \beta_{1} \log(\bar{f}_{t}) - \beta_{2} \log(\bar{d}_{t}) \tag{6}$$

where $\beta_{1}$ and $\beta_{2}$ are the importance coefficients of the throughput and delay, respectively, and satisfy $\beta_{1} + \beta_{2} = 1$, and $\bar{f}_{t}$ and $\bar{d}_{t}$ are the average throughput and delay of the current time slot, respectively, which can be expressed as follows:

$$\bar{f}_{t} = \sum_{m = 1}^{M} \sum_{l = 1}^{|f_{m}|} \sum_{y = 1}^{|p_{k}|} \frac{\nabla_{pak}}{t}, \quad \forall t \in [1, T] \tag{7}$$

$$\bar{d}_{t} = \frac{\sum_{m = 1}^{M} \sum_{l = 1}^{|f_{m}|} \sum_{y = 1}^{|p_{k}|} d_{m, l}^{t} \kappa_{y}}{\sum_{m = 1}^{M} \sum_{l = 1}^{|f_{m}|} \sum_{y = 1}^{|p_{k}|} \kappa_{y}}, \quad \forall t \in [1, T] \tag{8}$$

where $|f_{m}|$ is the number of data flows between the $m$th node pair, $|p_{k}|$ is the number of packets in the $k$th data flow, $\nabla_{pak}$ is the packet size, and $\kappa_{y} \in \{0, 1\}$ is a binary variable indicating whether the $y$th packet was successfully sent.
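A minimal Python sketch of Equations (6) to (8), together with the per-path reward of Equation (26), is given below. Assumptions: Equation (6) uses the natural logarithm (the base is not stated), $t$ in Equation (7) is read as the slot duration, Equation (7) is applied as printed (all packets of the slot are counted, independent of $\kappa_y$), and per-packet delays are passed as a flat list. The function names are hypothetical.

```python
import math


def avg_throughput(packet_sizes, t):
    """Eq. (7) as printed: total packet size in the slot divided by t."""
    if t <= 0:
        raise ValueError("t must be positive")
    return sum(packet_sizes) / t


def avg_delay(delays, delivered):
    """Eq. (8): mean delay over the packets with kappa_y = 1."""
    num = sum(d for d, k in zip(delays, delivered) if k)
    den = sum(1 for k in delivered if k)
    if den == 0:
        raise ValueError("no packet was delivered in this slot")
    return num / den


def utility(f_bar, d_bar, beta1=0.5):
    """Eq. (6) with beta2 = 1 - beta1 (natural log assumed)."""
    if not 0.0 <= beta1 <= 1.0:
        raise ValueError("beta1 must lie in [0, 1]")
    beta2 = 1.0 - beta1
    return beta1 * math.log(f_bar) - beta2 * math.log(d_bar)


def reward(per_path_stats, beta1=0.5):
    """Eq. (26): sum of U over all (f_bar_{m,l}, d_bar_{m,l}) pairs."""
    return sum(utility(f, d, beta1) for f, d in per_path_stats)
```

Because of the logarithms, doubling the throughput raises $U$ by the same amount ($\beta_1 \log 2$) whatever the starting value, so the utility rewards relative rather than absolute gains.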

Consequently, multipath routing optimization in the satellite network can be expressed as the following optimization problem:

$$\begin{aligned} P0:\ & \max\ U(\bar{f}, \bar{d}), \quad \forall t \in [1, T] \\ \text{s.t.}\ & C1: \bar{d} = \frac{1}{N} \sum_{t = 1}^{N} \bar{d}_{t} \\ & C2: \bar{f} = \frac{1}{N} \sum_{t = 1}^{N} \bar{f}_{t} \\ & C3: f_{\varepsilon_{j}}^{t} \leq BW, \quad \forall \varepsilon_{j} \in p_{m, l}^{t} \\ & C4: \sum_{l = 1}^{L} \omega_{m, l}^{t} = 1, \quad \forall m \in [1, M] \\ & C5: d_{m, l}^{t} \leq d_{m, y}^{t}, \quad \forall 1 \leq l \leq y \leq L,\ \forall m \in [1, M] \end{aligned} \tag{9}$$

where constraints C1 and C2 define the average delay and throughput of the satellite network, respectively; constraint C3 stipulates that the throughput of link $\varepsilon_{j}$ at any moment $t$ cannot exceed the bandwidth limit; constraint C4 states that the total traffic between any pair of nodes is the sum of the subflows traversing multiple paths; and constraint C5 orders the candidate paths between each pair of nodes by delay. P0 is an NP-hard problem with a rugged solution space. It couples two parts, multipath routing discovery and multipath traffic scheduling, and solving it directly by stochastic optimization converges slowly and may fall into local optima. To reduce the solution's complexity and improve the efficiency of the algorithm, this paper decouples the problem into two subproblems: routing discovery and traffic scheduling.

4. DRL-Based Multipath Routing

In this paper, we propose DRL-based multipath routing (DMR), shown in Figure 2, which first uses the minimum hop count-based multipath route discovery (MHMRD) algorithm to find multiple paths with acceptable delays and then uses the GMTS algorithm to determine the traffic scheduling scheme based on the state of the constellation network.

4.1. Multipath Routing Discovery

The objective of the multipath routing discovery subproblem is to identify the optimal-delay path and the suboptimal-delay paths. In MCNs, the propagation delay, which depends on the number of path hops, is a significant component of the end-to-end delay. Consequently, the hop-optimal and hop-suboptimal paths in the network are the primary candidates for multipath routing. The multipath routing discovery subproblem can be expressed as follows:

$$\begin{aligned} P1:\ & \min\ d_{m, l}^{t}, \quad \forall t \in T,\ m \in [1, M],\ l \in [1, L] \\ \text{s.t.}\ & d_{m, l}^{t} \leq d_{m, y}^{t}, \quad \forall 1 \leq l \leq y \leq L \end{aligned} \tag{10}$$

The minimum hop count path between any pair of nodes is determined first. On this basis, the NCC determines the suboptimal transmission paths (i.e., the backup paths) for each flow in descending order of traffic, because large aggregated flows are more likely to congest bottleneck links than small dispersed flows. In the Walker Delta constellation $N_{S}N_{P}/N_{P}/F$ shown in Figure 3, the inclination is $\alpha$ and the right ascension of the ascending node (RAAN) difference between adjacent orbits is $\Delta\Omega = 2\pi / N_{P}$. The phase difference between neighboring satellites in the same orbit is $\Delta\Phi = 2\pi / N_{S}$, and the phase difference between neighboring satellites in adjacent orbits is $\Delta\upsilon = 2\pi F / (N_{S} N_{P})$. Let $u_{1}$ and $u_{2}$ denote the argument of latitude (or phase angle) of the source and destination satellites, respectively. The argument of latitude is the angle between the ascending node and the satellite, measured along the orbit, and thus defines the position of the satellite in its orbit; $\zeta(u_{1})$ and $\zeta(u_{2})$ denote the longitude differences of the source and destination satellites relative to their corresponding ascending nodes, respectively. In the Walker Delta constellation, the end-to-end hop count comprises both transverse inter-plane hops $H_{h}$ and intra-plane hops $H_{v}$.

4.1.1. Inter-Plane Hops $H_{h}$

The number of inter-plane hops depends on the RAAN difference $\Delta\Upsilon_{0}$ between the orbits of the source and destination nodes. Given a source and destination satellite pair $(s, d)$, $\Delta\Upsilon_{0}$ is calculated as follows:

$$\Delta\Upsilon_{0} = (\Upsilon_{2} - \Upsilon_{1}) \bmod 2\pi \in [0, 2\pi) \tag{11}$$

where $\Upsilon_{1}$ and $\Upsilon_{2}$ are the RAAN values of $s$ and $d$, respectively. If the destination satellite lies to the west of the source satellite, the RAAN difference is $2\pi - \Delta\Upsilon_{0}$. Since the RAAN difference between adjacent orbits is constant (i.e., $\Delta\Omega = 2\pi / N_{P}$), the inter-plane hop counts in the westward and eastward directions can be expressed as follows:

$$H_{h}^{\leftarrow} = \left\langle \frac{2\pi - \Delta\Upsilon_{0}}{\Delta\Omega} \right\rangle \tag{12}$$

$$H_{h}^{\to} = \left\langle \frac{\Delta\Upsilon_{0}}{\Delta\Omega} \right\rangle \tag{13}$$

where $\langle x \rangle = \operatorname{sgn}(x) \lfloor |x| + 1/2 \rfloor$ denotes the integer closest to $x$, $H_{h}^{\leftarrow}$ is the westward inter-plane hop count, and $H_{h}^{\to}$ is the eastward inter-plane hop count.
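Equations (11) to (13) can be checked numerically with the short Python sketch below. It assumes RAAN values in radians and takes $\langle\cdot\rangle$ to round to the nearest integer with halves rounded away from zero; the function names are illustrative. For a six-plane constellation ($\Delta\Omega = 60^\circ$), a source in the plane at RAAN $300^\circ$ and a destination in the plane at RAAN $60^\circ$ give $\Delta\Upsilon_0 = 120^\circ$, hence two eastward or four westward inter-plane hops.

```python
import math

TWO_PI = 2.0 * math.pi


def nearest_int(x):
    """<x> = sgn(x) * floor(|x| + 1/2): nearest integer, halves rounded away from zero."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def inter_plane_hops(raan_src, raan_dst, n_planes):
    """Return (westward, eastward) inter-plane hop counts, Eqs. (11)-(13)."""
    d_omega = TWO_PI / n_planes                      # RAAN spacing between adjacent planes
    d_raan = (raan_dst - raan_src) % TWO_PI          # Eq. (11); Python's % yields [0, 2*pi)
    west = nearest_int((TWO_PI - d_raan) / d_omega)  # Eq. (12)
    east = nearest_int(d_raan / d_omega)             # Eq. (13)
    return west, east
```

Rounding rather than truncating absorbs small floating-point errors in the RAAN values, so planes that are exactly $k$ spacings apart always map to $k$ hops.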

4.1.2. Intra-Plane Hops $H_{v}$

The intra-plane hops depend on the phase angle difference $\Delta u$ between the satellites. Each intra-plane hop increases the phase angle by $\Delta\Phi$, and each inter-plane hop increases it by $\Delta f$. Therefore, the phase angle of the destination satellite can be expressed as follows:

$$u_{2} = u_{1} + \Delta f \cdot H_{h}^{\to} + \underbrace{\Delta\Phi \cdot H_{v}^{\nearrow}}_{\Delta\vec{u}} \tag{14}$$

where $H_{v}^{\nearrow}$ is the eastward intra-plane hop count. To compute $H_{v}$, the phase difference $\Delta f \cdot H_{h}^{\to}$ caused by $H_{h}$ must first be removed from the phase difference $\Delta u = u_{2} - u_{1}$, where $u_{1}$ and $u_{2}$ are the argument of latitude values of the source and destination satellites, respectively. To distinguish between the phase differences for eastward and westward propagation, $\Delta u$ is expressed separately as follows:

$$\Delta\vec{u} = (u_{2} - u_{1} - H_{h}^{\to} \cdot \Delta f) \bmod 2\pi \tag{15}$$

$$\Delta\overleftarrow{u} = (u_{2} - u_{1} + H_{h}^{\leftarrow} \cdot \Delta f) \bmod 2\pi \tag{16}$$

where $\Delta f$ is the phase angle change due to inter-plane hops. As illustrated in Figure 3, the sub-satellite track is divided into an ascending segment (from southwest to northeast) and a descending segment (from northwest to southeast), so data can propagate in both directions. Consequently, this paper calculates the intra-plane hops in four directions, which can be expressed as follows:

$$H_{v}^{\nwarrow} = \left| \frac{\Delta\overleftarrow{u}}{\Delta\Phi} \right| \tag{17}$$

$$H_{v}^{\nearrow} = \left| \frac{\Delta\vec{u}}{\Delta\Phi} \right| \tag{18}$$

$$H_{v}^{\swarrow} = \left| \frac{2\pi - \Delta\overleftarrow{u}}{\Delta\Phi} \right| \tag{19}$$

$$H_{v}^{\searrow} = \left| \frac{2\pi - \Delta\vec{u}}{\Delta\Phi} \right| \tag{20}$$

where $\Delta\overleftarrow{u}$ and $\Delta\vec{u}$ denote the residual phase differences for westward and eastward propagation between adjacent orbits, respectively, and $H_{v}^{\nwarrow}$, $H_{v}^{\nearrow}$, $H_{v}^{\swarrow}$, and $H_{v}^{\searrow}$ denote the intra-plane hop counts toward the northwest, northeast, southwest, and southeast, respectively.

With the inter-plane hops $H_{h}$ and the intra-plane hops $H_{v}$ calculated above, the end-to-end minimum hop count in the inclined orbit constellation can be expressed as follows:

$$H = \min \left\{ H_{h}^{\leftarrow} + H_{v}^{\nwarrow},\ H_{h}^{\leftarrow} + H_{v}^{\swarrow},\ H_{h}^{\to} + H_{v}^{\nearrow},\ H_{h}^{\to} + H_{v}^{\searrow} \right\} \tag{21}$$
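The complete minimum-hop computation of Equations (11) to (21) is sketched below in Python. Two assumptions should be noted: the inter-plane phase increment $\Delta f$ is taken to equal the Walker phasing offset $\Delta\upsilon = 2\pi F/(N_S N_P)$, and the ratios in Equations (17) to (20) are rounded with the same $\langle\cdot\rangle$ operator as Equations (12) and (13) so that hop counts are integers. With $F = 0$ the constellation reduces to a regular wrap-around grid, so the result should then equal the grid distance, which gives a simple sanity check. The function names are illustrative.

```python
import math

TWO_PI = 2.0 * math.pi


def nearest_int(x):
    """<x> = sgn(x) * floor(|x| + 1/2)."""
    return int(math.copysign(math.floor(abs(x) + 0.5), x))


def min_hop_count(raan_src, u_src, raan_dst, u_dst, n_sats, n_planes, phasing_f):
    """End-to-end minimum hop count H of Eq. (21); all angles in radians."""
    d_omega = TWO_PI / n_planes                          # Delta Omega
    d_phi = TWO_PI / n_sats                              # Delta Phi
    d_f = TWO_PI * phasing_f / (n_sats * n_planes)       # Delta f, assumed = Delta upsilon

    d_raan = (raan_dst - raan_src) % TWO_PI              # Eq. (11)
    hh_west = nearest_int((TWO_PI - d_raan) / d_omega)   # Eq. (12)
    hh_east = nearest_int(d_raan / d_omega)              # Eq. (13)

    du_east = (u_dst - u_src - hh_east * d_f) % TWO_PI   # Eq. (15)
    du_west = (u_dst - u_src + hh_west * d_f) % TWO_PI   # Eq. (16)

    hv_nw = nearest_int(du_west / d_phi)                 # Eq. (17)
    hv_ne = nearest_int(du_east / d_phi)                 # Eq. (18)
    hv_sw = nearest_int((TWO_PI - du_west) / d_phi)      # Eq. (19)
    hv_se = nearest_int((TWO_PI - du_east) / d_phi)      # Eq. (20)

    return min(hh_west + hv_nw, hh_west + hv_sw,         # Eq. (21)
               hh_east + hv_ne, hh_east + hv_se)
```

For instance, with four planes of eight satellites and $F = 0$, a source at plane 0, slot 0 and a destination at plane 3, slot 7 are one westward inter-plane hop and one intra-plane hop apart, so $H = 2$.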

Once the minimum hop count path has been obtained, multiple available paths are generated. Taking the computation of the suboptimal path for the node pair $(s, d)$ as an example, the occupancy frequency $F_{a, b}^{t}$ of each link and the minimum hop count path set $R_{a, b}^{t}$ are computed first, and a link occupancy frequency threshold $\xi_{\varepsilon}$ is defined. The weight of link $(a, b)$ is set to $m_{a, b}^{t} \in [r_{1}, r_{2}]$ if $F_{a, b}^{t} \leq \xi_{\varepsilon}$ and to $m_{a, b}^{t} \in [r_{2}, r_{3}]$ otherwise, where $r_{1} < r_{2} < r_{3}$ are random numbers. To prevent different paths from sharing links, the NCC removes the links in $R_{a, b}^{t}$ from the network. The NCC then identifies the optimal feasible path for $(s, d)$ among the remaining links using the shortest end-to-end hop count criterion. This process is repeated until $L$ possible paths between any two nodes in the network have been identified. The specific process of MHMRD is given in Algorithm 1. The computational complexity of Algorithm 1 is $O(N \log N)$.

Algorithm 1: Minimum hop count-based multipath route discovery (MHMRD).
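Because the body of Algorithm 1 is not reproduced here, the following Python sketch shows one possible reading of the MHMRD loop described above, under explicit assumptions: the topology is a hypothetical +Grid torus without a seam or phasing offset, a link's random weight (drawn from $[r_1, r_2]$ or $[r_2, r_3]$ according to its occupancy frequency) is used only to break ties between paths with equal hop counts, and each discovered path's links are removed before the next search, so the returned paths are link-disjoint. It is not the authors' Algorithm 1, and all names and default parameter values are illustrative.

```python
import heapq
import random


def torus_grid(n_planes, n_sats):
    """Hypothetical +Grid topology: node (p, s) links to 2 intra- and 2 inter-plane neighbours."""
    return {
        (p, s): {((p + 1) % n_planes, s), ((p - 1) % n_planes, s),
                 (p, (s + 1) % n_sats), (p, (s - 1) % n_sats)}
        for p in range(n_planes) for s in range(n_sats)
    }


def link_key(a, b):
    """Undirected link identifier."""
    return (a, b) if a <= b else (b, a)


def assign_weights(adj, occupancy, xi, r, rng):
    """Random link weights: drawn from [r1, r2] if occupancy <= xi, else from [r2, r3]."""
    r1, r2, r3 = r
    weights = {}
    for a, nbrs in adj.items():
        for b in nbrs:
            key = link_key(a, b)
            if key not in weights:
                lo, hi = (r1, r2) if occupancy.get(key, 0) <= xi else (r2, r3)
                weights[key] = rng.uniform(lo, hi)
    return weights


def min_hop_path(adj, weights, src, dst, removed):
    """Dijkstra on (hop count, weight sum): fewest hops first, lighter links break ties."""
    heap = [(0, 0.0, src, [src])]
    done = set()
    while heap:
        hops, w, node, path = heapq.heappop(heap)
        if node == dst:
            return path
        if node in done:
            continue
        done.add(node)
        for nxt in adj[node]:
            key = link_key(node, nxt)
            if key not in removed and nxt not in done:
                heapq.heappush(heap, (hops + 1, w + weights[key], nxt, path + [nxt]))
    return None


def mhmrd(adj, src, dst, n_paths, occupancy=None, xi=0, r=(1.0, 2.0, 3.0), seed=0):
    """Return up to n_paths link-disjoint paths, shortest hop count first."""
    rng = random.Random(seed)
    weights = assign_weights(adj, occupancy or {}, xi, r, rng)
    removed, paths = set(), []
    while len(paths) < n_paths:
        path = min_hop_path(adj, weights, src, dst, removed)
        if path is None:
            break  # remaining graph no longer connects src and dst
        paths.append(path)
        removed.update(link_key(a, b) for a, b in zip(path, path[1:]))
    return paths
```

On a 4 x 4 torus, for example, `mhmrd(torus_grid(4, 4), (0, 0), (2, 2), 4)` returns a 4-hop path first, followed by progressively longer link-disjoint alternatives until the graph runs out of unused links between the two nodes.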

4.2. Multipath Traffic Scheduling

With multipath route discovery completed, the second subproblem of multipath routing for MCNs is multipath traffic scheduling, which can be expressed as follows:

$$\begin{aligned} P2:\ & \max\ U(\bar{f}, \bar{d}), \quad \forall t \in T \\ \text{s.t.}\ & \sum_{l = 1}^{L} \omega_{m, l}^{t} = 1, \quad \forall m = 1, \dots, M \\ & f_{\varepsilon_{j}}^{t} \leq BW, \quad \forall \varepsilon_{j} \in p_{m, l}^{t} \end{aligned} \tag{22}$$

Since the traffic assignment decision in satellite networks depends on the states of the current node and its surrounding nodes rather than on the historical state, the multipath traffic scheduling subproblem can be described as a Markov decision process (MDP). Failures of ISLs or nodes impair the functionality of the satellite network and frequently alter its topology. Given that GNNs generalize well across topologies of varying sizes, this paper adopts GNN-based multipath traffic scheduling (GMTS), as illustrated in Figure 4. The DRL agent uses the proximal policy optimization (PPO) algorithm, an actor-critic method. At each step of the MDP, the actor model selects a traffic scheduling decision, and the critic model scores that decision based on the environment's reward feedback. This process is repeated to optimize the flow scheduling policy so as to maximize the cumulative reward.
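The agent is stated to be trained with PPO. For reference, the sketch below computes PPO's standard clipped surrogate objective, $L^{CLIP} = \mathbb{E}_t[\min(\rho_t A_t, \operatorname{clip}(\rho_t, 1-\epsilon, 1+\epsilon) A_t)]$, where $\rho_t$ is the ratio of the new to the old probability of the chosen action and $A_t$ is an advantage estimate computed with the help of the critic. The clipping range $\epsilon = 0.2$ is a common default, not a value taken from this paper, and the sketch deliberately omits the critic loss, entropy bonus, advantage estimation, and the GNN policy network.

```python
import math


def ppo_clip_objective(new_logp, old_logp, advantages, eps=0.2):
    """Mean PPO clipped surrogate objective (to be maximized).

    new_logp / old_logp: log-probabilities of the taken actions under the
    current policy and under the policy that collected the data.
    advantages: advantage estimates A_t for the same steps.
    """
    if not (len(new_logp) == len(old_logp) == len(advantages)) or not advantages:
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for lp_new, lp_old, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(lp_new - lp_old)                # rho_t
        clipped = min(max(ratio, 1.0 - eps), 1.0 + eps)  # clip(rho_t, 1 - eps, 1 + eps)
        total += min(ratio * adv, clipped * adv)         # pessimistic (lower) bound
    return total / len(advantages)
```

When the advantage is positive, the objective stops rewarding increases of $\rho_t$ beyond $1 + \epsilon$, which keeps each update of the traffic-splitting actor close to the policy that collected the data.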

4.2.1. State

By dividing time into multiple time slots, the traffic demand matrix within each time slot can be denoted as $TR = [b_{1}, b_{2}, \dots, b_{M}]$, where $b_{i}$ denotes the bandwidth demand of the ith pair of nodes. According to Equation (5), the throughput of the ith link is $f_{i}$. Consequently, the residual bandwidth $c_{i}$ of the ith link can be expressed in terms of the link bandwidth $BW$:

c_{i} = B W - f_{i}, i \in [1, L]

(23)

Given that the maximum number of hops between any pair of nodes is $H_{max}$, the jth path between the ith pair of nodes can be written as $p_{i, j} = \{e_{1}, e_{2}, \dots, e_{H_{max}}\}$, and the corresponding path matrix as $P_{t} = [p_{1, 1}, \dots, p_{1, L}, \dots, p_{M, L}]$. Therefore, the state vector at time slot t can be expressed as follows:

s_{t} = [C_{t}, T R_{t}, G_{t}, P_{t}], s_{t} \in S

(24)

where $C_{t}$ collects the residual bandwidths $c_{i}$ of all links in time slot t, and S is the satellite network state space.
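The state of Equation (24) can be assembled as in the following Python sketch. The dictionary layout, the padding value -1 for unused hops, the edge-list stand-in for $G_{t}$, and all numeric values are illustrative assumptions rather than the paper's encoding.

```python
def residual_bandwidth(link_throughput, bw):
    """Eq. (23): c_i = BW - f_i for every link i."""
    return [bw - f for f in link_throughput]


def pad_path(path_links, h_max, pad=-1):
    """Encode a path as exactly H_max link IDs, filling unused hops with a hypothetical pad ID."""
    if len(path_links) > h_max:
        raise ValueError("path has more than H_max hops")
    return list(path_links) + [pad] * (h_max - len(path_links))


def build_state(link_throughput, bw, demands, topology, candidate_paths, h_max):
    """Assemble s_t = [C_t, TR_t, G_t, P_t]; candidate_paths[m][l] lists the link IDs of p_{m,l}."""
    return {
        "C": residual_bandwidth(link_throughput, bw),
        "TR": list(demands),
        "G": topology,
        # Flattened in the order p_{1,1}, ..., p_{1,L}, ..., p_{M,L}, as in P_t.
        "P": [pad_path(p, h_max) for pair in candidate_paths for p in pair],
    }


state = build_state(
    link_throughput=[2.0, 5.5, 0.0],     # f_i in Gbps (illustrative values)
    bw=10.0,                             # BW in Gbps (illustrative)
    demands=[3.0, 1.5],                  # b_1, b_2 for M = 2 node pairs
    topology=[(0, 1), (1, 2), (0, 2)],   # hypothetical edge list standing in for G_t
    candidate_paths=[[[0, 1], [2]], [[1], [0, 2]]],  # L = 2 paths per pair, as link IDs
    h_max=3,
)
print(state["C"])  # [8.0, 4.5, 10.0]
print(state["P"])  # [[0, 1, -1], [2, -1, -1], [1, -1, -1], [0, 2, -1]]
```

Padding every path to $H_{max}$ entries gives $P_{t}$ a fixed shape of $(M \cdot L) \times H_{max}$ for a given M and L.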

4.2.2. Action

Based on the information in the state space, the NCC splits the traffic of each node pair across its L precomputed candidate paths. The action is therefore defined as the vector $a_{t}$ of traffic proportions on the candidate paths, which can be expressed as follows:

a_{t} = [ω_{1, 1}^{t}, \dots, ω_{1, L}^{t}, \dots, ω_{M, L}^{t}]

(25)

where the traffic proportions on the L paths of each node pair must satisfy Equation (2).
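The paper does not state how the actor's raw outputs are mapped to split ratios that satisfy Equation (2). One common choice, assumed here, is a softmax over the L outputs of each node pair, which makes every ratio positive and makes each pair's ratios sum to 1. The sketch below also multiplies each ratio by the pair's demand $b_{m}$ to obtain per-path traffic; the function names are hypothetical.

```python
import math


def split_ratios(logits, num_pairs, num_paths):
    """Map M*L raw actor outputs to a_t = [w_{1,1}, ..., w_{M,L}] with sum_l w_{m,l} = 1."""
    if len(logits) != num_pairs * num_paths:
        raise ValueError("expected M * L actor outputs")
    ratios = []
    for m in range(num_pairs):
        block = logits[m * num_paths:(m + 1) * num_paths]
        peak = max(block)  # subtract the maximum for numerical stability
        exps = [math.exp(x - peak) for x in block]
        total = sum(exps)
        ratios.extend(e / total for e in exps)
    return ratios


def split_traffic(demands, ratios, num_paths):
    """Traffic assigned to each candidate path: b_m * w_{m,l}."""
    return [demands[i // num_paths] * w for i, w in enumerate(ratios)]


ratios = split_ratios([0.0, 0.0, 3.0, 3.0], num_pairs=2, num_paths=2)
print(ratios)                                # [0.5, 0.5, 0.5, 0.5]
print(split_traffic([4.0, 2.0], ratios, 2))  # [2.0, 2.0, 1.0, 1.0]
```

A softmax never outputs an exact zero, so every candidate path keeps a small share of traffic; a masking or thresholding step would be needed if paths must be switched off entirely.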

4.2.3. Reward

After executing action $a_{t}$ in state $s_{t}$ at time t, the agent receives feedback that evaluates the effectiveness of the action. In multipath traffic scheduling, the reward is the objective function of subproblem P2, which can be expressed as follows:

r_{t} = \sum_{m = 1}^{M} \sum_{l = 1}^{L} U ({\bar{f}}_{m, l}, {\bar{d}}_{m, l})

(26)

where ${\bar{f}}_{m, l}$ and ${\bar{d}}_{m, l}$ are the average throughput and the average delay, respectively, on the lth path between the mth pair of nodes. The agent therefore seeks to complete multipath traffic scheduling with higher throughput and lower latency. Since decisions are made over multiple stages, the cumulative reward $R_{t}$, built from the per-step reward $r_{t}$, can be expressed as follows:

R_{t} = \sum_{k = 0}^{\infty} ρ^{k} r_{t + k + 1}

(27)

where $ρ \in [0, 1]$ is the discount factor. Since multiple actions are possible in each state, the expected return of each state and of each state-action pair under policy μ can be expressed as follows:

Γ_{μ} (s) = E_{μ} (R_{t}| s_{t} = s)

(28)

Ψ_{μ} (s, a) = E_{μ} (R_{t}| s_{t} = s, a_{t} = a)

(29)

Here, $Ψ_{μ} (s, a)$ is maximized when the DRL agent finds the optimal policy $μ^{*}$. The iterative update of $Ψ_{μ} (s, a)$ can then be written in Bellman form:

Ψ_{μ} (s, a) = E_{μ} [r_{t} + ρ Ψ_{μ} (s_{t + 1}, a_{t + 1})| s_{t} = s, a_{t} = a]

(30)

After the DRL agent converges to a stable level, the corresponding optimal action $a_{μ^{*}}$ can be expressed as follows:

a_{μ^{*}} = arg max_{a} Ψ_{μ^{*}} (s, a)

(31)
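Equations (27), (30), and (31) can be illustrated on a toy tabular problem. The Python sketch below computes a finite-horizon discounted return, performs one sample-based update toward the Bellman target of Equation (30) (a tabular SARSA step with an assumed learning rate), and selects the greedy action of Equation (31). GMTS uses neural function approximators rather than a table, so this only illustrates the equations; the state and action labels are hypothetical, and ρ = 0.8 follows Table 2.

```python
def discounted_return(rewards, rho):
    """Eq. (27) truncated to a finite episode: R_t = sum_k rho^k r_{t+k+1}."""
    total = 0.0
    for r in reversed(rewards):
        total = r + rho * total
    return total


def sarsa_update(q, s, a, r, s_next, a_next, rho, lr):
    """Move Psi(s, a) toward the Bellman target r + rho * Psi(s', a') of Eq. (30)."""
    target = r + rho * q[(s_next, a_next)]
    q[(s, a)] += lr * (target - q[(s, a)])
    return q[(s, a)]


def greedy_action(q, s, actions):
    """Eq. (31): a* = argmax_a Psi(s, a)."""
    return max(actions, key=lambda a: q[(s, a)])


print(round(discounted_return([1.0, 1.0, 1.0], rho=0.8), 4))  # 2.44
q = {("s0", "a0"): 0.0, ("s0", "a1"): 0.2, ("s1", "a1"): 1.0}
sarsa_update(q, "s0", "a0", r=1.0, s_next="s1", a_next="a1", rho=0.8, lr=0.5)
print(greedy_action(q, "s0", ["a0", "a1"]))  # a0
```

After the update, $Ψ(s_{0}, a_{0}) = 0.5 \times (1 + 0.8 \times 1) = 0.9$, which exceeds $Ψ(s_{0}, a_{1}) = 0.2$, so the greedy policy switches to $a_{0}$.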

4.3. Training Process of GMTS

The pseudo-code of the GMTS training process is given in Algorithm 2, whose computational complexity is $O(L)$. First, the actor network $μ (s| θ^{μ})$ and the critic network $Q (s, a| θ^{Q})$ are initialized, where $θ^{μ}$ and $θ^{Q}$ are the parameters of the actor network and the critic network, respectively. During training, an experience replay buffer stores the data from interactions with the environment, including states, actions, and rewards. To avoid overfitting to highly correlated consecutive experiences, the sample $(s_{t}, a_{t}, r_{t}, s_{t + 1}, χ)$ generated at each decision step is stored in the replay buffer, where $χ \in \{0, 1\}$ indicates whether the current training episode has terminated. Samples $(s_{i}, a_{i}, r_{i}, s_{i + 1}, χ)$ are drawn from the replay buffer at specified intervals, according to the buffer threshold $B_{max}$, to train the networks. The target Q value can then be expressed as follows:

y_{i} = r_{i} + ρ (1 - χ) [Q (s_{i + 1}, a_{i + 1}| θ^{Q})]

(32)

In the GMTS algorithm, the temporal difference error (TD error) is used to measure the importance of each transition. The TD error between the target Q value and the current Q value can be expressed as follows:

δ_{i} = |y_{i} - Q (s_{i}, a_{i}| θ^{Q})|

(33)

The corresponding proportional prioritization is

p_{i} = δ_{i} + η

(34)

where η is a small positive constant that ensures a transition can still be sampled when its TD error is zero. Stochastic sampling is adopted, and the probability of sampling transition i is

P (i) = \frac{p_{i}^{α}}{\sum_{k} p_{k}^{α}}

(35)

where α determines how much prioritization is used; when $α = 0$, the sampling method reduces to uniform random sampling. The importance-sampling (IS) weights of $(s_{i}, a_{i}, r_{i}, s_{i + 1}, χ)$ are

ω_{i} = {(\frac{P_{min}}{P (i)})}^{β}

(36)

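where $β \in [0, 1]$ controls the amount of bias correction: $β = 1$ fully compensates for the non-uniform sampling probabilities, while smaller values correct only partially. Given the learning rate $ϑ^{Q}$, the parameters $θ^{Q}$ of the critic network can be updated as follows: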

θ^{Q} \leftarrow θ^{Q} + ϑ^{Q} δ_{i} \nabla_{θ^{Q}} Q (s, a| θ^{Q})

(37)

The loss functions of the critic network and the actor network can therefore be expressed as follows:

L_{q} = \frac{1}{N_{B}} \sum_{i = 1}^{N_{B}} ω_{i} δ_{i}^{2}

(38)

L_{μ} = - Q (s, μ (s| θ^{μ})| θ^{Q})

(39)

where $N_{B}$ is the size of the mini-batch sampled from the replay buffer B.

Algorithm 2: Training process of the GNN-based multipath traffic scheduling (GMTS).
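The replay and critic-loss computations of Equations (32) to (38) can be sketched in Python as follows. This is an illustration, not the listing of Algorithm 2. It assumes the standard prioritized experience replay normalization $ω_{i} = (P_{min}/P(i))^{β}$, under which frequently sampled transitions receive smaller weights and all weights lie in (0, 1], and it treats $N_{B}$ as the mini-batch size. The function names and the constant η = 1e-6 are hypothetical; α = 0.6, β = 0.5, and ρ = 0.8 follow Table 2.

```python
import random


def td_target(r, q_next, rho, done):
    """Eq. (32): y_i = r_i + rho * (1 - chi) * Q(s_{i+1}, a_{i+1})."""
    return r + rho * (1 - done) * q_next


def priorities(td_errors, eta=1e-6):
    """Eq. (34): p_i = |delta_i| + eta, so zero-error transitions stay sampleable."""
    return [abs(d) + eta for d in td_errors]


def sampling_probs(prios, alpha):
    """Eq. (35): P(i) = p_i^alpha / sum_k p_k^alpha (alpha = 0 gives uniform sampling)."""
    scaled = [p ** alpha for p in prios]
    total = sum(scaled)
    return [s / total for s in scaled]


def is_weights(probs, beta):
    """Normalized IS weights (P_min / P(i))^beta; the rarest transition gets weight 1."""
    p_min = min(probs)
    return [(p_min / p) ** beta for p in probs]


def sample_batch(probs, batch_size, rng):
    """Draw transition indices with replacement according to P(i)."""
    return rng.choices(range(len(probs)), weights=probs, k=batch_size)


def weighted_critic_loss(weights, td_errors):
    """Eq. (38) over a mini-batch: mean of w_i * delta_i^2."""
    return sum(w * d * d for w, d in zip(weights, td_errors)) / len(td_errors)


rng = random.Random(0)
td_errors = [0.1, 0.5, 2.0, 0.0]  # illustrative TD errors for four stored transitions
probs = sampling_probs(priorities(td_errors), alpha=0.6)
weights = is_weights(probs, beta=0.5)
batch = sample_batch(probs, batch_size=2, rng=rng)
loss = weighted_critic_loss([weights[i] for i in batch], [td_errors[i] for i in batch])
print(batch, round(loss, 4))
```

Without the weights, transitions with large TD errors would be over-represented in the loss; the IS weights scale their contribution down so that the expected gradient stays closer to that of uniform sampling.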

4.4. Workflow of DMR

The DRL-based multipath routing (DMR) proposed in this paper consists of two phases, minimum hop count-based multipath route discovery (MHMRD) and GNN-based multipath traffic scheduling (GMTS), as shown in Figure 5. In the first phase, the network control center plans multiple optimal paths between arbitrary node pairs according to the network topology, following the minimum hop count principle. Specifically, owing to the periodicity and predictability of satellite orbits, the network is divided into multiple time slices at fixed time slot intervals, and the network topology remains stable within each time slice. Within each time slot, MHMRD computes L minimum-hop candidate paths for each service, processing services in a priority order determined by their end-to-end traffic volume. After the calculation for the current time slot is complete, it moves on to the next time slot. In the second phase, the network control center uses the model obtained from GMTS training to generate the split ratios of each service across its paths and sends them to the corresponding satellites before the start of each time slot.
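The time-slot workflow described above can be sketched in Python. The uniform slot length, the rule that larger end-to-end volumes are served first, the upload lead time, and all numbers are illustrative assumptions; the paper does not specify these values, and the function names are hypothetical.

```python
import math


def time_slots(period_s, slot_s):
    """Split one orbital period into consecutive [start, end) slots of slot_s seconds."""
    n = math.ceil(period_s / slot_s)
    return [(k * slot_s, min((k + 1) * slot_s, period_s)) for k in range(n)]


def service_order(volumes):
    """Service indices in descending order of end-to-end traffic volume (assumed priority rule)."""
    return sorted(range(len(volumes)), key=lambda i: volumes[i], reverse=True)


def upload_time(slot_start_s, lead_s):
    """Time at which the NCC sends split ratios, lead_s seconds before the slot starts."""
    return max(0.0, slot_start_s - lead_s)


for start, end in time_slots(period_s=100.0, slot_s=30.0):
    print(start, end, upload_time(start, lead_s=5.0))
print(service_order([5.0, 20.0, 10.0]))  # [1, 2, 0]
```

The last slot is truncated at the end of the period, so the topology snapshots repeat from the first slot in the next orbital period.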

On the one hand, this paper adopts prioritized experience replay, sampling transitions from the replay buffer according to their priority to speed up the convergence of the model. On the other hand, owing to the dynamic nature of satellite networks, the network topology and traffic matrices are time-varying, and conventional DNNs with fixed input dimensions cannot effectively handle such changing inputs. Therefore, this paper adopts a GNN, which can process graph structures of variable size and the corresponding route configurations. Since GNNs generalize better across dynamic network structures of different scales, they can aggregate elementary features without a predefined input dimension.
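The size-independence argument can be illustrated with a minimal message-passing layer in Python. This is a generic mean-aggregation GNN layer, not the GMTS architecture: the scalar node features, the two shared weights, and the ring topologies are simplifying assumptions. Because the same two parameters are applied to each node's own feature and to the mean of its neighbors' features, the layer runs unchanged on graphs with any number of nodes.

```python
def message_passing(features, adj, w_self, w_neigh):
    """One mean-aggregation layer with shared scalar weights:
    h_v' = relu(w_self * h_v + w_neigh * mean_{u in N(v)} h_u)."""
    out = {}
    for v, h in features.items():
        neigh = adj.get(v, [])
        agg = sum(features[u] for u in neigh) / len(neigh) if neigh else 0.0
        out[v] = max(0.0, w_self * h + w_neigh * agg)
    return out


def readout(features):
    """Permutation-invariant graph summary: mean over all nodes."""
    return sum(features.values()) / len(features)


def ring(n):
    """Hypothetical ring topology with n nodes (e.g., satellites in one orbital plane)."""
    return {i: [(i - 1) % n, (i + 1) % n] for i in range(n)}


w_self, w_neigh = 0.5, 0.25  # the same two parameters are reused for every graph size
for n in (4, 11):
    feats = {i: float(i % 2) for i in range(n)}  # toy node features (e.g., normalized load)
    h = message_passing(feats, ring(n), w_self, w_neigh)
    print(n, "nodes ->", round(readout(h), 4))
```

A fully connected layer would instead need a weight matrix sized to the number of nodes, so a model built for the 66-satellite Iridium topology could not be applied directly to the 648-satellite OneWeb topology.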

5. Performance Evaluation

5.1. Simulation Set-Up

In this paper, we implement routing computation and multipath traffic scheduling for LEO megaconstellations in NS3 to analyze the performance of the proposed DMR algorithm. To analyze the scalability of multipath routing across constellations, tests were conducted on two constellations of different sizes. The first was the Iridium constellation [35], with 66 polar-orbiting satellites, and the second was the OneWeb constellation [36], with 648 satellites evenly distributed over 18 orbital planes of 36 satellites each. The parameters of the two constellations are shown in Table 1. To validate the performance of multipath routing under different traffic demands, a traffic dataset was generated based on the ground traffic density, with 50 source-destination node pairs. The proposed GMTS algorithm was implemented in Python 3.9 and PyTorch 1.14, and the corresponding parameters are shown in Table 2.

The performance of the proposed MHMRD algorithm, which distributes end-to-end traffic in equal proportions across the precomputed paths, was tested first. On this basis, the GMTS model was trained using an inclined-orbit constellation topology and traffic matrices [37], deployed to the NCC after training, and tested on key metrics such as the average throughput, latency, and flow completion ratio of the satellite network. The following comparison algorithms were used in this paper:

Shortest path first algorithm (SPF): SPF employs the average transmission delay as a link metric with the objective of minimizing the total delay.
Network coding-based multipath cooperative routing (NCMCR) [24]: NCMCR is designed to address the routing challenges posed by a frequently changing topology and potentially sparse and intermittent connectivity. The NCMCR algorithm takes advantage of the predictability of the relative motion of the satellites, with the time-varying topology modeled as a spatiotemporal map.
Ant-based multipath backbone routing for load balancing (AMBRLB) [38]: AMBRLB is an ant-based, load-balancing multipath backbone routing algorithm for MANETs, proposed to overcome traffic overflow and routing overhead. When a source node initiates transmission to a destination node, the ant colony optimization (ACO) algorithm is employed to identify multiple paths with the highest probability of success.
Deep deterministic policy gradient traffic engineering (DDPG-TE) [39]: This method employs the DDPG algorithm to dynamically allocate the traffic proportions of different transmission paths. It should be noted that the model is not applicable to arbitrary satellite constellations, as the dimensions of the state and action spaces are constrained by the size of the input topology and traffic matrices.

5.2. Results and Analysis

5.2.1. Throughput

The average throughput of the MCN over the Iridium and OneWeb constellations is illustrated in Figure 6, with each data point representing the average throughput over five consecutive topological snapshots. In multipath routing, as the input traffic volume increased, a greater proportion of traffic was allocated to multiple paths for forwarding according to each policy. This resulted in an overall increase in the average network throughput, although the growth gradually slowed. The proposed DMR algorithm achieved superior throughput at all user sizes tested. When the total service volume reached its upper limit, DMR improved the average throughput by 30.26% and 8.45% compared with SPF and NCMCR, respectively. Figure 6b illustrates the throughput of multipath routing on the OneWeb constellation. Here, DMR used the same model as that used for the Iridium constellation, whereas the other algorithms were retrained for the characteristics of the new network. DMR continued to outperform the other benchmark algorithms despite the change in constellation topology and size, reflecting its generalization ability. At a traffic intensity of 8 Gbps, DMR improved the average throughput by 42.64% and 9.55% compared with SPF and NCMCR, respectively.

5.2.2. Average Flow Completion Ratio

Figure 7 illustrates the average flow completion ratio for varying traffic intensities on the Iridium and OneWeb constellations. As the traffic intensity increased, some nodes in the network gradually became congested, as evidenced by the decreasing average flow completion ratios of all five routing schemes. At a traffic intensity of 8 Gbps, the average flow completion ratio of the proposed DMR scheme improved by 17.39% and 11.52% on the Iridium and OneWeb constellations, respectively, compared with SPF. Meanwhile, on the OneWeb constellation, the average flow completion ratio of DMR improved by 1.98%, 2.58%, and 6.49% compared with NCMCR, AMBRLB, and DDPG-TE, respectively. Moreover, the average flow completion ratio on the larger OneWeb constellation was consistently higher than that on the smaller Iridium constellation across traffic intensities, which demonstrates the advantage of large-scale constellations in handling high loads. Overall, the proposed DMR scheme generalizes well across constellations of different scales and achieves a higher flow completion ratio than the baseline schemes.

5.2.3. Average End-to-End Delay

Figure 8 illustrates the average end-to-end delay as a function of traffic intensity on the Iridium and OneWeb constellations. As the network load increased, the queuing delay rose and more flows were transmitted along candidate paths with higher hop counts. Consequently, the average end-to-end delay gradually increased for all compared schemes. On the Iridium constellation, SPF achieved the lowest end-to-end delay when the network load was low, because it always forwards packets along the shortest path, which is the best choice when all links in the network are in a normal state. As the load increased and the total service volume reached its upper bound, however, SPF could not dynamically switch paths to avoid the congestion caused by the increased load. As a result, on the Iridium constellation, the DMR scheme reduced the average end-to-end delay by 3.67%, 1.78%, 1.86%, and 0.86% compared with SPF, NCMCR, DDPG-TE, and AMBRLB, respectively. When multipath routing was extended to the larger OneWeb constellation, the delay of DMR approached those of NCMCR, AMBRLB, and DDPG-TE as the traffic intensity reached its upper limit. This is because both DMR and DDPG-TE dynamically distribute traffic across different paths. In general, the mean end-to-end delay of DMR at high load lay between those of SPF and DDPG-TE. The primary objective of multipath routing is to enhance the network's throughput and resilience when the topology is disrupted. In small-scale constellations, multipath routing offers additional routing options that can be used to circumvent congested nodes in a timely manner, thereby reducing the delay. In large-scale constellations, however, sufficient alternative routes are always available, so the improvement in end-to-end delay is not substantial.

The experimental results show that SPF, which always uses the shortest path, achieved a lower average delay in low-load scenarios. However, SPF exhibited the lowest average throughput and average flow completion ratio, and its performance deteriorated rapidly under high load. In contrast, the proposed DMR scheme achieved a higher average throughput and flow completion ratio under different loads, and it generalized well to constellations of different sizes and topologies without repeated training.

6. Conclusions

This paper proposed a deep reinforcement learning (DRL)-based multipath routing (DMR) solution for satellite networks to address the multipath routing transmission problem in LEO megaconstellation networks. DMR splits multipath routing into two subproblems: multipath route selection and multipath traffic scheduling. For the multipath route selection subproblem, this paper proposed MHMRD, a centralized routing scheme that generates a set of available paths between end-to-end nodes by collecting node state information based on the current end-to-end minimum hop count and link utilization. For the multipath traffic scheduling subproblem, GNN-based multipath traffic scheduling (GMTS) was proposed. Each routing node is controlled by a DQN agent, and the GNN is used to handle the time-varying topology of satellite networks. The optimal multipath traffic scheduling model was obtained through iterative training. Finally, the proposed DMR scheme was validated on two megaconstellations of different sizes in the NS3 simulation environment. The simulation results demonstrate that DMR achieved a higher average network throughput and average flow completion ratio than the baseline schemes. DMR also handled load-balanced satellite network routing effectively with only a minimal increase in delay. Furthermore, DMR exhibited favorable scalability and adaptability to evolving constellation topologies and can be extended to satellite networks with changing topologies without additional training.

It should be noted that as the size of the constellation continues to expand, the computational requirements of DQN-based online routing also rise. In the future, we intend to explore the potential of adopting solutions such as multi-controller deployment to enhance the responsiveness of the network and further validate its efficacy in real-world scenarios.

Author Contributions

Conceptualization, C.H.; methodology, C.H.; software, R.Y.; validation, W.X.; formal analysis, C.H.; investigation, W.X.; resources, W.X.; data curation, C.H.; writing—original draft preparation, C.H.; writing—review and editing, C.H.; visualization, C.H.; supervision, W.X.; project administration, W.X. and R.Y.; funding acquisition, W.X. and R.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Key Laboratory Fund Project under grant number 614201003022207 and the Aerospace Discipline Education New Engineering Project under grant number 145AXL250004000X.

Data Availability Statement

The relevant data and source code of this paper can be accessed by contacting hanchi@hgd.edu.cn.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

DRL	Deep reinforcement learning
DMR	DRL-based multipath routing
MHMRD	Minimum hop count-based multipath routing discovery algorithm
GNN	Graph neural network
GMTS	GNN-based multipath traffic scheduling
MCN	Megaconstellation network
SAGIN	Space-air-ground integrated network
ISLs	Inter-satellite links
SDN	Software-defined network
RAAN	Right ascension of ascending node
NCC	Network control center
LEO	Low Earth orbit
DQN	Deep Q network

References

1. Jiang, X.; Huang, Y.; Li, J.; He, H.; Chen, S.; Yang, F.; Yang, J. Spatio-Temporal Routing, Redundant Coding and Multipath Scheduling for Deterministic Satellite Network Transmission. IEEE Trans. Commun. 2023, 71, 2860–2875.
2. Kodheli, O.; Lagunas, E.; Maturo, N.; Sharma, S.K.; Shankar, B.; Montoya, J.F.M.; Duncan, J.C.M.; Spano, D.; Chatzinotas, S.; Kisseleff, S.; et al. Satellite Communications in the New Space Era: A Survey and Future Challenges. IEEE Commun. Surv. Tutor. 2021, 23, 70–109.
3. Rakhmanov, A.; Wiseman, Y. Compression of GNSS Data with the Aim of Speeding up Communication to Autonomous Vehicles. Remote Sens. 2023, 15, 2165.
4. Correia, S.D.; Perez, R.; Matos-Carvalho, J.; Leithardt, V.R.Q. µJSON, a Lightweight Compression Scheme for Embedded GNSS Data Transmission on IoT Nodes. In Proceedings of the 2022 5th Conference on Cloud and Internet of Things (CIoT), Marrakech, Morocco, 28–30 March 2022; pp. 232–238.
5. Xue, C.; Li, W.; Yu, L.; Shang, J.; Chen, X.; Lu, S. SERO: A Model-Driven Seamless Roaming Framework for Wireless Mesh Network With Multipath TCP. IEEE Trans. Commun. 2019, 67, 1284–1296.
6. Liu, X.; Ma, T.; Qin, X.; Zhou, H.; Zhao, L. A DRL Empowered Multipath Cooperative Routing for Ultra-Dense LEO Satellite Networks. In Proceedings of the GLOBECOM 2023, Kuala Lumpur, Malaysia, 4–8 December 2023; pp. 5961–5966.
7. Li, R.; Zhang, J.; Zheng, S.; Wang, K.; Wang, P.; Zhang, X. LEO Mega-Constellations Routing Algorithm Based on Area Segmentation. In Proceedings of the 2023 IEEE Wireless Communications and Networking Conference (WCNC), Glasgow, UK, 26–29 March 2023; pp. 1–6.
8. Li, Y.; Zhang, Q.; Yao, H.; Gao, R.; Xin, X.; Yu, F.R. Stigmergy and Hierarchical Learning for Routing Optimization in Multi-Domain Collaborative Satellite Networks. IEEE J. Select. Areas Commun. 2024, 42, 1188–1203.
9. Perkins, C.E.; Royer, E.M. Ad-Hoc on-Demand Distance Vector Routing. In Proceedings of the WMCSA’99. Second IEEE Workshop on Mobile Computing Systems and Applications, New Orleans, LA, USA, 25–26 February 1999; pp. 90–100.
10. Jiang, F.; Zhang, Q.; Yang, Z.; Yuan, P. A Space–Time Graph Based Multipath Routing in Disruption-Tolerant Earth-Observing Satellite Networks. IEEE Trans. Aerosp. Electron. Syst. 2019, 55, 2592–2603.
11. Huang, Y.; Yang, D.; Feng, B.; Tian, A.; Dong, P.; Yu, S.; Zhang, H. A GNN-Enabled Multipath Routing Algorithm for Spatial-Temporal Varying LEO Satellite Networks. IEEE Trans. Veh. Technol. 2024, 73, 5454–5468.
12. Huang, Y.; Jiang, X.; Chen, S.; Yang, F.; Yang, J. Pheromone Incentivized Intelligent Multipath Traffic Scheduling Approach for LEO Satellite Networks. IEEE Trans. Wirel. Commun. 2022, 21, 5889–5902.
13. Tian, A.; Feng, B.; Zhou, H.; Huang, Y.; Sood, K.; Yu, S.; Zhang, H. Efficient Federated DRL-Based Cooperative Caching for Mobile Edge Networks. IEEE Trans. Netw. Serv. Manag. 2023, 20, 246–260.
14. Huang, X.; Yuan, T.; Qiao, G.; Ren, Y. Deep Reinforcement Learning for Multimedia Traffic Control in Software Defined Networking. IEEE Netw. 2018, 32, 35–41.
15. Lei, K.; Guo, P.; Wang, Y.; Wu, X.; Zhao, W. Solve Routing Problems with a Residual Edge-Graph Attention Neural Network. Neurocomputing 2022, 508, 79–98.
16. Zheng, X.; Huang, W.; Li, H.; Li, G. Research on Generalized Intelligent Routing Technology Based on Graph Neural Network. Electronics 2022, 11, 2952.
17. Wei, H.; Zhao, Y.; Xu, K. G-Routing: Graph Neural Networks-Based Flexible Online Routing. IEEE Netw. 2023, 37, 90–96.
18. Rusek, K.; Suarez-Varela, J.; Almasan, P.; Barlet-Ros, P.; Cabellos-Aparicio, A. RouteNet: Leveraging Graph Neural Networks for Network Modeling and Optimization in SDN. IEEE J. Select. Areas Commun. 2020, 38, 2260–2270.
19. Bi, Y.; Han, G.; Xu, S.; Wang, X.; Lin, C.; Yu, Z.; Sun, P. Software Defined Space-Terrestrial Integrated Networks: Architecture, Challenges, and Solutions. IEEE Netw. 2019, 33, 22–28.
20. Hu, M.; Xiao, M.; Hu, Y.; Cai, C.; Deng, T.; Peng, K. Software Defined Multicast Using Segment Routing in LEO Satellite Networks. IEEE Trans. Mob. Comput. 2024, 23, 835–849.
21. Han, Z.; Xu, C.; Zhao, G.; Wang, S.; Cheng, K.; Yu, S. Time-Varying Topology Model for Dynamic Routing in LEO Satellite Constellation Networks. IEEE Trans. Veh. Technol. 2023, 72, 3440–3454.
22. Giambene, G.; Luong, D.K.; De Cola, T.; Le, V.A.; Muhammad, M. Analysis of a Packet-Level Block Coding Approach for Terrestrial-Satellite Mobile Systems. IEEE Trans. Veh. Technol. 2019, 68, 8117–8132.
23. Chen, Z.; Zhou, W.; Wu, S.; Cheng, L. An Adaptive On-Demand Multipath Routing Protocol With QoS Support for High-Speed MANET. IEEE Access 2020, 8, 44760–44773.
24. Tang, F.; Zhang, H.; Yang, L.T. Multipath Cooperative Routing with Efficient Acknowledgement for LEO Satellite Networks. IEEE Trans. Mob. Comput. 2019, 18, 179–192.
25. Li, G.; Zhou, H.; Feng, B.; Zhang, Y.; Yu, S. Efficient Provision of Service Function Chains in Overlay Networks Using Reinforcement Learning. IEEE Trans. Cloud Comput. 2022, 10, 383–395.
26. Kato, N.; Fadlullah, Z.M.; Mao, B.; Tang, F.; Akashi, O.; Inoue, T.; Mizutani, K. The Deep Learning Vision for Heterogeneous Network Traffic Control: Proposal, Challenges, and Future Perspective. IEEE Wirel. Commun. 2017, 24, 146–153.
27. Mao, B.; Fadlullah, Z.M.; Tang, F.; Kato, N.; Akashi, O.; Inoue, T.; Mizutani, K. Routing or Computing? The Paradigm Shift Towards Intelligent Computer Network Packet Transmission Based on Deep Learning. IEEE Trans. Comput. 2017, 66, 1946–1960.
28. Geyer, F.; Carle, G. Learning and Generating Distributed Routing Protocols Using Graph-Based Deep Learning. In Proceedings of the 2018 Workshop on Big Data Analytics and Machine Learning for Data Communication Networks, Budapest, Hungary, 20 August 2018; pp. 40–45.
29. Liu, C.; Xu, M.; Yang, Y.; Geng, N. DRL-OR: Deep Reinforcement Learning-Based Online Routing for Multi-Type Service Requirements. In Proceedings of the IEEE INFOCOM 2021—IEEE Conference on Computer Communications, Vancouver, BC, Canada, 10–13 May 2021; pp. 1–10.
30. Almasan, P.; Suárez-Varela, J.; Rusek, K.; Barlet-Ros, P.; Cabellos-Aparicio, A. Deep Reinforcement Learning Meets Graph Neural Networks: Exploring a Routing Optimization Use Case. Comput. Commun. 2022, 196, 184–194.
31. Sun, P.; Guo, Z.; Li, J.; Xu, Y.; Lan, J.; Hu, Y. Enabling Scalable Routing in Software-Defined Networks With Deep Reinforcement Learning on Critical Nodes. IEEE/ACM Trans. Netw. 2022, 30, 629–640.
32. Kim, G.; Kim, Y.; Lim, H. Deep Reinforcement Learning-Based Routing on Software-Defined Networks. IEEE Access 2022, 10, 18121–18133.
33. Jiang, Z.; Wu, Q.; Li, H.; Wu, J. scMPTCP: SDN Cooperated Multipath Transfer for Satellite Network With Load Awareness. IEEE Access 2018, 6, 19823–19832.
34. De Santis, E.; Giuseppi, A.; Pietrabissa, A.; Capponi, M.; Delli Priscoli, F. Satellite Integration into 5G: Deep Reinforcement Learning for Network Selection. Mach. Intell. Res. 2022, 19, 127–137.
35. Iridium Constellation. Available online: https://celestrak.org/NORAD/elements/table.php?GROUP=iridium (accessed on 30 July 2024).
36. Oneweb Constellation. Available online: https://celestrak.org/NORAD/elements/table.php?GROUP=oneweb (accessed on 30 July 2024).
37. MAWI Working Group Traffic Archive. Available online: https://mawi.wide.ad.jp/mawi/ (accessed on 30 July 2024).
38. Francis Antony Selvi, P.; Manikandan, M.S.K. Ant Based Multipath Backbone Routing for Load Balancing in MANET. IET Commun. 2017, 11, 136–141.
39. Chen, B.; Sun, P.; Zhang, P.; Lan, J.; Bu, Y.; Shen, J. Traffic Engineering Based on Deep Reinforcement Learning in Hybrid IP/SR Network. China Commun. 2021, 18, 204–213.

Figure 1. Multipath routing in the megaconstellation network.

Figure 2. The overall structure of DRL-based multipath routing.

Figure 3. Track of sub-satellite point of megaconstellation networks.

Figure 4. The framework of the proposed GNN-based multipath traffic scheduling (GMTS) algorithm.

Figure 5. DRL-based multipath routing for LEO megaconstellation networks.

Figure 6. Average throughput on different constellations with various traffic intensities. (a) Average throughput for Iridium. (b) Average throughput for OneWeb.

Figure 7. Average flow completion ratio of different constellations with various traffic intensities. (a) Average flow completion ratio for Iridium constellation. (b) Average flow completion ratio for OneWeb constellation.

Figure 8. Average end-to-end delay for different constellations with various traffic intensities. (a) Average end-to-end delay for Iridium constellation. (b) Average end-to-end delay for OneWeb constellation.

Table 1. Constellation parameters.

Constellation	$N_{P}$ (orbital planes)	$N_{S}$ (satellites per plane)	$N_{G}$	$ε$ (inclination)	h (altitude)	E
Iridium	6	11	16	$90^{°}$	780 km	$20^{°}$
OneWeb	18	36	16	$53^{°}$	550 km	$20^{°}$

Table 2. Training parameters.

	Parameter	Value
Network parameter	$N_{g}$	24
	$L_{q}$	100
	$\nabla_{p a k}$	1 KB
Learning parameter	$N_{B}$	32
	$B_{max}$	100
	$α$	0.6
	$β$	0.5
	$ρ$	0.8


© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Han, C.; Xiong, W.; Yu, R. Deep Reinforcement Learning-Based Multipath Routing for LEO Megaconstellation Networks. Electronics 2024, 13, 3054. https://doi.org/10.3390/electronics13153054
