1. Introduction
Compared with GEO satellite networks, LEO satellite networks offer low communication delay, wide coverage, and low transmission power for communication equipment. With the growth of global communication services, LEO satellite networks have become more widely used in military, commercial, and civilian applications [1]. An LEO satellite communication system is composed of LEO satellite constellations, user terminals, and ground base stations. In the forward link, traffic is uploaded from a ground base station, routed to the corresponding satellite, and finally transmitted to the user terminal. Correspondingly, in the reverse link, traffic is uploaded from the user terminal, routed to the corresponding satellite, and finally transmitted to the ground base station. Routing is therefore an important issue in satellite communication systems.
Laser terminals are widely used in intersatellite communication, for example in StarLink, LeoSat, and Telesat, owing to their advantages in intensity, transmission rate, and anti-interference capability [2]. However, laser intersatellite communication is unstable under events such as sudden changes in channel status, changes in satellite operating conditions, and on-board equipment failure, any of which may interrupt communication (hereinafter referred to as link-off). Furthermore, failures of an entire satellite can also be abstracted into corresponding link failure events, which require specific consideration in routing algorithm design [3].
In recent years, numerous routing algorithms have been designed for LEO satellite networks. To cope with the topology changes caused by satellite movement, two main topology discovery strategies are used [4]. The first is the virtual topology strategy [5], which is suitable for complex network structures and changeable network topologies. The network topology is stored on each satellite in fragments, which avoids a large amount of routing information exchange and routing calculation but requires the satellites to have a certain storage capacity. The second is the virtual node strategy [6], which exploits the periodicity of topology changes to avoid storage requirements but is only suitable for LEO satellite networks with simple topology and good coverage. Each strategy has its own advantages and disadvantages. The virtual topology strategy divides feeder link switching more finely but requires more storage space. The virtual node strategy saves storage space, but its feeder link utilization is low, and it requires the satellite nodes to switch IP addresses synchronously. However, none of these routing algorithms considers the instability of intersatellite communication.
To achieve network load balancing in the space segment, there are three types of solution [7]. The first type, centralized offline routing [8], handles the difficulties of real-time routing well, but it cannot adapt to changes in service demand. The second type, distributed on-demand routing [9], mainly saves routing information overhead and unnecessary on-board routing calculation. It supports on-board real-time autonomous routing calculation and has a certain load balancing capability. This kind of scheme is suitable for sudden service access, but it is difficult to reach a global optimum. The third type, distributed multi-path routing [1,3,10,11], is designed to handle real-time link load jitter and link failures; it avoids the link overload and packet loss caused by burst traffic, and it responds quickly. This strategy can achieve local load balancing. However, existing research mostly uses router interface buffers to measure link congestion, which is impractical in satellite networks. Satellite networks already experience high propagation delay and cannot tolerate additional queuing delay; moreover, because the intersatellite link bandwidth is large, any increase in queuing delay implies a large buffer, which places higher performance requirements on the space router [12]. Therefore, it is more reasonable to study this type of algorithm from the perspective of real-time link load.
There is currently no accepted solution to the problem of intersatellite link failure and interruption. FSA [8], a centralized routing algorithm based on virtual topology, uses predictable topology changes and service demand changes to dynamically allocate intersatellite link bandwidth, but it cannot handle unexpected link failure events. Drawing on the distributed algorithms AODV and OLSR commonly used in terrestrial wireless ad hoc networks, the LAOR algorithm [9] and the DODR algorithm [13] were proposed for link failure scenarios in satellite networks, and the orbit speaker strategy was proposed in CEMR [14]. These algorithms mainly save routing information overhead and unnecessary on-board routing calculation; they also support on-board real-time autonomous route calculation and have a certain load balancing capability. However, satellite networks have higher requirements for quality of service. When a link fails, these algorithms cannot provide an immediate and reliable transmission path, so they are prone to packet loss. DTDR, the algorithm proposed in this paper, instead exploits the mesh structure and periodicity of the space segment topology and combines static routing with dynamic algorithms, which greatly improves the efficiency of routing calculation. Distributed multi-path routing algorithms such as ELB [10] are suitable for handling real-time link load jitter and link failures; they avoid the link overload and packet loss caused by burst traffic and respond quickly. However, this kind of algorithm can only achieve local load balancing, and it easily produces routing loops and cascading congestion. In addition to propagating link failure information to neighboring nodes, DTDR floods the information to nodes within a specified number of hops to achieve load balancing in the area around a failed link. It also includes a loop avoidance mechanism based on the topology characteristics of the space segment. OPSPF [15] generates an instantaneous routing table through periodic routing calculation and uses an on-demand dynamic routing mechanism to deal with link failures. The algorithm saves the storage space of the network topology, but its routing information overhead is large when a link fails, and in large satellite constellations convergence is slow and wrong paths are easily formed. DTDR floods the failure information only to neighboring nodes within a certain area, which reduces routing overhead and allows the algorithm to converge quickly.
For LEO satellite constellation routing, the immediacy and reliability of the path are a major challenge. The bandwidth of the laser link is large while on-board resources are limited, so there is no large buffer to hold data packets [12]. The buffer can only store the small number of data packets that accumulate during the pre-switching of an intersatellite link [16,17]. Therefore, when a link is interrupted, a large number of packets will be lost unless there is a timely route switching strategy. When a link interruption occurs, a backup path must be available immediately, and that path should be reliable, that is, it should keep the data deliverable under certain severe conditions. The advantage of link-state routing algorithms is that they can perceive the topology of the entire network in real time, but they are more vulnerable when a link is interrupted. Before the link state of the entire network is fully synchronized, routing loops easily form between satellite nodes because reliable backup paths are lacking. At the same time, broadcasting link failure information across the entire network is costly and synchronization takes a long time, which causes a large amount of data loss.
In response to the above problems, we designed a disruption tolerant distributed routing algorithm (DTDR) for LEO satellite networks, based on partial link state information. The algorithm not only provides immediate backup paths for space segment routing but also balances the load around the failed link, while keeping the routing cost limited. Since this paper focuses on the routing scheme for link failure, it gives no explicit strategy for global or local load balancing. Instead, the algorithm can be combined with the three types of load balancing solution described above to form a complete space segment routing system.
2. LEO Satellite Network Model
In our constellation system, the space router on each satellite communicates with neighboring satellites through four interfaces connected to laser terminals, and receives traffic from a ground base router and a space base station through the other two interfaces. In addition to carrying data packets to and from the space router, each laser terminal can also inform the space router whether the laser link is in a fault state.
As shown in Figure 1, each satellite is connected to neighboring satellites through inter-plane ISLs and intra-plane ISLs. There are two seams between the two hemispheres; the satellites on either side of a seam move in opposite directions and do not establish links. The four intersatellite link interfaces of a satellite in the western hemisphere are numbered 0, 1, 2, 3 counterclockwise, starting from north. As shown in Figure 2, in the western hemisphere, satellite B communicates with satellite A, which is in a different orbit, through its own interface 1 and interface 3 of satellite A. When the two satellites cross the polar region and enter the eastern hemisphere, they do not adjust their attitude but continuously adjust the positions of interface 1 and interface 3. As a result, in the eastern hemisphere, satellite B still communicates with satellite A through its own interface 1 and interface 3 of satellite A, although the positions of interface 1 and interface 3 are now swapped. Throughout the polar crossing, the inter-plane ISL remains established. From the perspective of routing, only the length of the inter-plane ISL changes; the link establishment status and the correspondence between the interfaces remain constant.
For a polar orbit constellation, its constellation configuration is determined by the following parameters: orbit radius , orbit number , number of satellites in each orbit , phase difference between adjacent orbiting satellites , and right ascension difference of the ascending node of the adjacent orbit .
The length of the intra-plane intersatellite links is fixed, and is calculated by
The length of the inter-plane intersatellite links varies with latitude. As shown in Figure 1, if one takes a point B′ at the same latitude as satellite B on the orbital plane of satellite A, and a point A′ at the same latitude as satellite A on the orbital plane of satellite B, then AA′BB′ is an isosceles trapezoid.
where
is the latitude of star
A and
is the latitude of star
B. If
A and
B are both in the southern or northern hemisphere,
, if
A and
B are in different hemispheres,
.
According to the isosceles trapezoidal waist length formula, the length of the inter-plane intersatellite links is
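The two ISL lengths discussed in this section can be sketched numerically. The snippet below is a minimal illustration, assuming circular orbits, evenly spaced satellites, and exactly polar planes, so that adjacent planes differ in longitude by the right ascension difference. The names `orbit_radius`, `sats_per_orbit`, and `delta_raan`, the signed-latitude convention, and all sample values are assumptions introduced here, not notation from the paper. The inter-plane length is taken as the straight-line chord AB, which is the diagonal of the trapezoid AA′BB′; the helper `trapezoid_diagonal` recomputes it from the trapezoid sides as a cross-check.

```python
import math


def intra_plane_length(orbit_radius: float, sats_per_orbit: int) -> float:
    """Chord between two adjacent satellites evenly spaced on one circular orbit."""
    return 2.0 * orbit_radius * math.sin(math.pi / sats_per_orbit)


def inter_plane_length(orbit_radius: float, delta_raan: float,
                       lat_a: float, lat_b: float) -> float:
    """Chord between satellite A and satellite B in the neighboring plane.

    Latitudes are signed radians (north positive), so the same-hemisphere and
    different-hemisphere cases need no separate branch.
    """
    # Spherical law of cosines for the central angle between A and B.
    cos_gamma = (math.sin(lat_a) * math.sin(lat_b)
                 + math.cos(lat_a) * math.cos(lat_b) * math.cos(delta_raan))
    # chord^2 = 2 R^2 (1 - cos gamma); clamp tiny negative rounding errors.
    return orbit_radius * math.sqrt(max(0.0, 2.0 - 2.0 * cos_gamma))


def trapezoid_diagonal(orbit_radius: float, delta_raan: float,
                       lat_a: float, lat_b: float) -> float:
    """Same length from AA'BB': diagonal^2 = leg^2 + base_a * base_b."""
    # Parallel sides: chords along the parallels of latitude of A and of B.
    base_a = 2.0 * orbit_radius * math.cos(lat_a) * math.sin(delta_raan / 2.0)
    base_b = 2.0 * orbit_radius * math.cos(lat_b) * math.sin(delta_raan / 2.0)
    # Leg: chord along one orbital plane between the two latitudes.
    leg = 2.0 * orbit_radius * math.sin(abs(lat_a - lat_b) / 2.0)
    return math.sqrt(leg * leg + base_a * base_b)


if __name__ == "__main__":
    radius = 6371e3 + 780e3          # hypothetical: Earth radius + 780 km
    raan_step = math.radians(30.0)   # hypothetical plane spacing
    for lat_deg in (0.0, 30.0, 60.0, 80.0):
        lat = math.radians(lat_deg)
        length_km = inter_plane_length(radius, raan_step, lat, lat) / 1e3
        print(f"latitude {lat_deg:4.0f} deg: inter-plane ISL {length_km:8.1f} km")
```

Under these assumptions, the intra-plane length depends only on the orbit radius and the number of satellites per orbit, while the inter-plane length shrinks toward the poles.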
4. Detailed Algorithm Description
4.1. Algorithm Parameters and Performance Analysis
Since the services carried by the space segment network mainly go through three to four hops, we compared the routing performance of the strategies in which the link failure information is propagated to one hop, two hops, and three hops for the V1 -> V9 data flow in a 3 × 3 network topology. As shown in
Figure 5,
,
,
. The main path of V1 -> V9 is V1-V2-V3-V6-V9, and the failed link is V6-V9. Due to the long intersatellite link length, and the buffer size of the space router being relatively small compared to the laser link bandwidth, the transmission delay of the data packet is much smaller than the propagation delay. We take the propagation delay
and the detour distance
as the evaluation index of the detour path, the propagation delay
, where
is the total length of the intersatellite link that the detour path passes through, and
is the speed of light, which is used to measure the quality of service provided by the detour path, while the detour distance
is represented by the graphic area enclosed by the detour path and the main path, which is used to measure the load balancing capacity of the detour path for the intersatellite link.
Let represent the length of the ISL V1-V2, represent the length of the ISL V4-V5, represent the length of the ISL V7-V8, and represent the length of intra-plane ISL. If the link failure information is propagated to one hop, the data flow starts to detour when it reaches V3. The detour path is V1-V2-V3-V2-V5-V8-V9. Under this circumstance, the propagation delay , the detour distance . When the link failure information is propagated to two hops, the data flow starts to detour when it reaches V2, and the detour path is V1-V2-V5-V8-V9. In this case, the propagation delay , the detour distance . When the link failure information is propagated to three hops, the data flow starts to detour at V1, and the detour path is V1-V4-V7-V8-V9. In this case, the propagation delay , the detour distance .
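The propagation delays of the three detour paths can be sketched as follows. This is an illustration only: it assumes the 3 × 3 grid is numbered row by row (V1-V2-V3 on one latitude row, V1-V4-V7 on one orbital plane), which is inferred from the links named above. The names `row_lengths` and `intra_length` and the sample lengths are hypothetical.

```python
SPEED_OF_LIGHT = 299_792_458.0  # m/s


def link_length(u: int, v: int, row_lengths, intra_length: float) -> float:
    """Length of the ISL between grid neighbors Vu and Vv (1-based, row-major)."""
    row_u, col_u = divmod(u - 1, 3)
    row_v, col_v = divmod(v - 1, 3)
    if row_u == row_v and abs(col_u - col_v) == 1:
        return row_lengths[row_u]   # inter-plane ISL, depends on the row's latitude
    if col_u == col_v and abs(row_u - row_v) == 1:
        return intra_length         # intra-plane ISL, fixed length
    raise ValueError(f"V{u} and V{v} are not neighbors")


def propagation_delay(path, row_lengths, intra_length: float) -> float:
    """Total ISL length along the path divided by the speed of light."""
    total = sum(link_length(u, v, row_lengths, intra_length)
                for u, v in zip(path, path[1:]))
    return total / SPEED_OF_LIGHT


# Detour paths quoted in the text, keyed by how many hops the failure
# information of V6-V9 is propagated.
DETOURS = {
    1: [1, 2, 3, 2, 5, 8, 9],
    2: [1, 2, 5, 8, 9],
    3: [1, 4, 7, 8, 9],
}

# Hypothetical sample lengths in meters: rows V1-V2, V4-V5, V7-V8, then intra-plane.
ROW_LENGTHS = [4.0e6, 3.0e6, 2.0e6]
INTRA_LENGTH = 4.0e6

if __name__ == "__main__":
    for hops, path in DETOURS.items():
        delay_ms = propagation_delay(path, ROW_LENGTHS, INTRA_LENGTH) * 1e3
        print(f"{hops}-hop dissemination: {delay_ms:.2f} ms")
```

Whatever lengths are chosen, the one-hop detour is longer than the two-hop detour by twice the V1-V2 length, because the packet travels V2-V3 and then retraces it.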
The above considers only the failure of V6-V9; the links V1-V2, V2-V3, and V3-V6 fail with the same probability. Under the premise that each link in the network fails with equal probability, the routing performance under the various strategies is comprehensively analyzed, and the calculation results are as follows:
Dissemination of failure information to one hop.
,
.
Dissemination of failure information to two hops.
,
.
Dissemination of failure information to three hops.
,
.
The one-hop dissemination strategy is clearly undesirable because it produces repeated path segments and has poor traffic diversion capability. We therefore compare the two-hop and three-hop dissemination strategies.
The variables in this formula are and , in , , , in , , , since , , so , .
According to the changing law of the cosine function, when the satellite is at low latitudes, the change of
is relatively gentle, whereas at high latitudes the change is relatively large, so the two-hop dissemination strategy has a clear delay advantage at high latitudes. At the same time, LEO satellite network traffic is relatively dense at low latitudes and relatively sparse at high latitudes [19], so the three-hop dissemination strategy is more important at low latitudes. In summary, satellites at high latitudes can adopt the two-hop dissemination strategy, and satellites at low latitudes can adopt the three-hop dissemination strategy.
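The latitude dependence described above can be illustrated numerically. The sketch below assumes that the inter-plane ISL length between two satellites at a common latitude scales with the cosine of that latitude. The switch-over latitude `threshold_deg` is hypothetical: no boundary between low and high latitude is stated here, so the default value is purely illustrative.

```python
import math


def relative_inter_plane_length(lat_deg: float) -> float:
    """Inter-plane ISL length relative to its value at the equator (cosine law)."""
    return math.cos(math.radians(lat_deg))


def dissemination_hops(lat_deg: float, threshold_deg: float = 60.0) -> int:
    """Hops over which link failure information is propagated.

    Two hops at high latitudes, three hops at low latitudes; the threshold
    is an assumed parameter, not a value taken from the paper.
    """
    return 2 if abs(lat_deg) >= threshold_deg else 3


if __name__ == "__main__":
    for start in (0.0, 30.0, 60.0):
        drop = (relative_inter_plane_length(start)
                - relative_inter_plane_length(start + 10.0))
        print(f"{start:4.0f} -> {start + 10:4.0f} deg: relative length drops by "
              f"{drop:.3f}, hops = {dissemination_hops(start)}")
```

The same 10-degree step changes the relative link length far more near the poles than near the equator, which is the behavior the cosine argument relies on.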
4.2. Forwarding Process and Loop Avoidance
The routing strategy described above involves three routing tables, in priority order: 1. detour path routing table, 2. main path routing table, and 3. alternate path routing table. During forwarding, a packet moves to the next-priority table under the following conditions (the transitions to which each condition applies are given in parentheses):
The satellite is not in the failure-area state, that is, there is no failed link in the area (1 -> 2).
The value for the destination address of the data packet in the detour path routing table is −1, that is, its main path does not pass through the failed link (1 -> 2).
According to the current routing table, the forwarding interface is in an unestablished state (it points to the seam) or the link to which it points has failed (1 -> 2, 2 -> 3).
According to the current routing table, the forwarding interface is the same as the incoming interface of the data packet, which would send the packet back the way it came (1 -> 2, 2 -> 3).
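The fall-through between the three routing tables can be sketched as below. This is a minimal illustration of the conditions listed above, not the on-board implementation: the class `SatelliteState`, its field names, and the representation of tables as dictionaries from destination to interface number are all assumptions made here. Only the value −1 in the detour table is taken from the text.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Set

NO_DETOUR = -1  # detour-table value: the main path avoids the failed link


@dataclass
class SatelliteState:
    detour_table: Dict[str, int]
    main_table: Dict[str, int]
    alternate_table: Dict[str, int]
    in_failure_area: bool = False
    # Interfaces that point to the seam or whose link has failed.
    unusable_interfaces: Set[int] = field(default_factory=set)


def is_usable(state: SatelliteState, interface: int,
              incoming: Optional[int]) -> bool:
    """An interface is rejected if its link is down or it would turn the packet back."""
    if interface in state.unusable_interfaces:
        return False
    return interface != incoming


def select_interface(state: SatelliteState, destination: str,
                     incoming: Optional[int]) -> Optional[int]:
    """Return the forwarding interface, or None if all three tables are exhausted."""
    # Priority 1: detour table, consulted only inside a failure area and only
    # when the destination's main path crosses the failed link.
    if state.in_failure_area:
        candidate = state.detour_table.get(destination, NO_DETOUR)
        if candidate != NO_DETOUR and is_usable(state, candidate, incoming):
            return candidate
    # Priority 2, then priority 3: main table, then alternate table.
    for table in (state.main_table, state.alternate_table):
        candidate = table[destination]
        if is_usable(state, candidate, incoming):
            return candidate
    return None
```

A return value of `None` corresponds to the case in which both the main path and the alternate path are unavailable, which the next paragraph handles separately.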
As shown in Figure 6, the data packet travels from node 4 to node 5; p1 is the main path and p2 is the alternate path. When both the main path and the alternate path are unavailable, the direction p3, opposite to the alternate path, is selected for forwarding. At this point, whether the forwarding interface is in the same direction as the incoming interface is no longer checked, that is, the data packet is allowed to turn back. If p3 is also unavailable, the remaining direction p4 is used for forwarding, and the local satellite is marked as invalid in the header of the data packet. The satellites adjacent to this satellite will then no longer send the data packet to it.
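The last-resort choice between p3 and p4 can be sketched as follows. The sketch assumes the four ISL interfaces are numbered 0 to 3 counterclockwise, so that the opposite interface is two positions away, and that the main and alternate interfaces are adjacent rather than opposite. The function name `last_resort` and the returned flag are hypothetical.

```python
from typing import Set, Tuple

NUM_INTERFACES = 4


def opposite(interface: int) -> int:
    """Interface facing the opposite direction under counterclockwise numbering."""
    return (interface + 2) % NUM_INTERFACES


def last_resort(main_if: int, alternate_if: int,
                failed: Set[int]) -> Tuple[int, bool]:
    """Pick p3 or p4 once the main (p1) and alternate (p2) paths are unavailable.

    Returns (interface, mark_invalid). mark_invalid is True when p4 is used,
    meaning the local satellite should be flagged in the packet header.
    Turning back is allowed here, so the incoming interface is not checked.
    """
    p3 = opposite(alternate_if)
    if p3 == main_if:
        raise ValueError("main and alternate interfaces are assumed to be adjacent")
    if p3 not in failed:
        return p3, False
    # p4 is the single interface that is none of p1, p2, p3.
    p4 = next(i for i in range(NUM_INTERFACES)
              if i not in (main_if, alternate_if, p3))
    return p4, True
```

For example, with main interface 0 and alternate interface 3, the function returns interface 1 as p3; if interface 1 has also failed, it returns interface 2 as p4 together with the invalid mark.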