1. Introduction
The satellite communication system is one of the most important components of future global communication systems. Among satellite systems, the low earth orbit (LEO) satellite network offers wide coverage, low latency and flexible deployment, and will provide low-cost commercial communication services worldwide. LEO satellite networks have especially great application prospects and development potential in remote areas with poor infrastructure and low population density. In recent years, many commercial companies have begun to research and deploy large-scale LEO satellite networks to provide communication services around the world; representative networks include Starlink, OneWeb and Telesat [1]. The main abbreviations used in this article are listed in Table 1.
Laser light is highly monochromatic, its energy is highly concentrated, its beam is very narrow and its wavelength lies between microwave and infrared [2]. Exploiting the distinctive properties of lasers, namely high intensity, high monochromaticity, high coherence and high directionality, for inter-satellite communication brings a series of advantages, including larger capacity, a narrower beam, higher gain, higher speed, stronger anti-interference capability and better confidentiality. This makes the laser the most suitable carrier for the development of space-based satellite communication [3].
Solar noise is broadband noise whose radiation intensity increases with frequency. Therefore, the impact of sun outage on the received signal-to-noise ratio depends on the solar noise level, the bandwidth and the operating frequency. Sun outage interference is greatest during peak periods of solar activity. The wider the operating band, the more noise is received and the more serious the solar interference; the higher the operating frequency, the greater the noise intensity received in the corresponding band [4]. Because of its high frequency and large bandwidth, the laser inter-satellite link is extremely susceptible to sun outage: when the laser link is collinear with sunlight, it is interrupted for a period of time [5].
Sun outage has a very significant impact on the satellite transmission network, especially when it affects the direct link of a satellite at the edge of a ground station's visible area. Such a link carries a large amount of traffic from satellite terminals to the core network, and the failure lasts a long time and cannot be recovered, so load balancing in the failure area is essential. As a type of topology change, sun outage failure is predictable. For predictable topology changes, routes can be calculated in advance, so the algorithm can afford a certain computational complexity and calculation time. In addition, link information from a specific area can be collected in a targeted manner during a period of time before the topology change occurs. This not only improves the accuracy of the algorithm but also avoids the large routing overhead of transmitting link information.
For the LEO satellite transmission network, in addition to sun outage, predictable topology changes mainly include changes in the propagation delay of inter-satellite links, switching of inter-satellite links in polar regions and switching of feeder links between ground stations and satellites. Strategies such as virtual topology and virtual nodes have been proposed to handle these changes. These strategies aim to generate the topology of the whole network and compute routes for the whole network with a simple link-cost method. The virtual topology strategy [6] suits complex network structures and changing topologies. The network topology is stored in slices on each satellite, which avoids a large amount of routing information exchange and route calculation, but it requires satellites to have a certain storage capacity [7]. The virtual node strategy [8] exploits the periodicity of topology changes and avoids the need for storage, but it is only applicable to polar-orbit LEO constellations and requires a highly regular satellite network topology. Of these, the virtual topology strategy can help satellite nodes cope with topology changes caused by sun outage failure. At the same time, a load-balancing algorithm is needed to solve the congestion that link failure may cause.
In this paper, we design a sun-outage-recovery algorithm for LEO inter-satellite links based on virtual topology. During a period of time before sun outage occurs, it collects network information in the area where a link is about to fail, and it redistributes the traffic of the failed link to achieve load balancing in the failure area during sun outage. The contributions of this article are as follows:
Taking a typical polar-orbit LEO satellite network as the analysis object, we simulated sun outage on the inter-satellite links of the LEO satellite network and analyzed the duration, location and occurrence pattern of inter-satellite link sun outage failures.
A local pre-rerouting algorithm (LPRA) is proposed. This algorithm only collects link information in a limited area and performs centralized routing calculations, which solves the load-balancing problem in LEO satellite networks during sun outage.
An algorithm for uplink satellite reselection based on feeder link load and minimum hop count is developed, and its scalability to multi-layer satellite networks and large-scale constellations is discussed.
The remainder of the article is structured as follows. First, the system architecture and traffic model of the LEO satellite network are introduced. Next, the sun outage phenomenon in the satellite system is analyzed through simulation. A routing algorithm that handles sun outage failure is then proposed, and finally its performance is evaluated through network simulation.
2. Related Work
Although sun outage has a significant impact on ISLs, research on routing algorithms for sun outage interruptions in LEO constellations is currently lacking. To handle interruptions in satellite networks, most routing algorithms use backup paths: some respond in real time, while others are calculated on demand. These algorithms fall into three categories. The first category uses different forms of shortest-path algorithms to calculate alternative paths. The second category calculates alternative paths for different exits in advance and uses active queue management (AQM) for path switching. The third category finds the optimal backup path using different link models. These algorithms are prone to local optima, tend to pass congestion on to neighboring nodes and cannot effectively improve network performance. The LPRA algorithm proposed in this article exploits the predictability of sun outage and the mesh structure of the LEO satellite network's space segment: it collects routing information in the failure area in advance and achieves load balancing through centralized route calculation. The following paragraphs compare each of the three categories with LPRA.
The ASER (area-based satellite routing) algorithm [9] adopts a distributed, domain-based routing mechanism. Most topology changes require updating only a few intra-domain routing tables, while the inter-domain routing table is updated only when inter-domain neighbor relationships change, which enables fast rerouting in dynamic networks. Zhang et al. [10] proposed an on-board autonomous routing algorithm based on inter-satellite link status information, in which each satellite makes its hop-by-hop forwarding decision based on the link status with adjacent satellites and the network topology, optimizing the packet arrival rate and end-to-end delay. OPSPF (Orbit Prediction Shortest Path First) [11] generates an instantaneous routing table through periodic route calculation and uses an on-demand dynamic routing mechanism to cope with link failures [8]. These algorithms have low routing overhead. However, calculating routes with only a static or dynamic shortest-path algorithm cannot effectively achieve load balancing. LPRA exploits the predictability of sun outage: it calculates the routes used during the failure in advance and achieves load balancing in the failure area with an algorithm of moderate complexity.
The ELB (Explicit Load Balancing) routing algorithm [12] relies on the dynamic exchange of link information. Neighboring satellites exchange their queue usage to indicate their current congestion status and determine the forwarding path of the local and previous-hop nodes based on queue occupancy, thereby reducing network congestion. The FSRS (fuzzy satellite routing strategy) algorithm [13] uses a Fuzzy Queue Congestion Index (FQCI) to reflect the degree of queue congestion, and uses queuing theory and Naive Bayes methods to predict queuing delays, distinguish packets with different QoS requirements and obtain effective paths that meet these requirements. These algorithms respond quickly and are suitable for real-time link load jitter or link failure, preventing sudden traffic from overloading links and causing packet loss, but they achieve only local load balancing. Under high load, routing loops and cascading congestion form easily [7]. In contrast, LPRA does not balance load hop by hop. Instead, it defines a limited area and redistributes the traffic of links that are about to fail, achieving load balancing on a larger scale, so the network does not face routing loops and cascading congestion. In fact, the above algorithms can complement LPRA: the paths calculated by LPRA can serve as their main paths, with backup paths switched according to real-time queue congestion indicators to cope with real-time link load jitter.
The IADR (ISL attributes-based dynamic routing) algorithm [14] evaluates link utilization by modeling the SNR, link duration and buffer queue of each link, and then selects from a set of alternative paths the one whose links have the lowest utilization as the forwarding path. The SALB (state-aware and load-balanced) routing method [15] integrates load changes and fault conditions of links and nodes, quantitatively estimates link status and dynamically adjusts the weight of queuing delay, improving QoS while maintaining low routing overhead. Based on multi-objective decision theory, Yang et al. [16] calculated paths that meet service QoS requirements from the actual status of satellite network nodes and links and the specific requirements of each service, achieving multi-objective dynamic routing. These algorithms provide high-quality real-time access and some load-balancing capability. However, because each service computes its optimal path independently, their effect is poor when a large number of services must be rerouted. LPRA does not use a model to evaluate link availability; instead, it performs centralized route calculation based on the actual load of links within a limited area. Compared with distributed route calculation, LPRA is therefore better suited to scenarios in which a large number of services must be rerouted.
4. Analysis of Sun Outage
Inter-satellite link sun outage is the phenomenon in which the incident angle of sunlight falls within the main-lobe width of the laser terminal's receiving antenna, interrupting inter-satellite link communication. We jointly used the Satellite Tool Kit (STK) and MATLAB to simulate the actual impact of sun outage on inter-satellite links. The simulation runs from the vernal equinox to the summer solstice of 2023. The constellation is the polar-orbit Walker constellation 324/9/4.05 with an orbital inclination of 89°; the right ascension of the ascending node of the first orbit points to the vernal equinox, and the initial true anomaly is 0°. In STK, we establish the vector from each satellite to its adjacent satellite and the vector from the satellite to the sun, and check whether the angle between the two vectors is less than 3°; if it is, the link is considered to have failed. The simulation has two parts. The first part simulates the occurrence of sun outage from the vernal equinox to the summer solstice with a step of 10 s; the failure duration and number of failures are then calculated separately for intra-plane and inter-plane ISLs, distinguishing links inside the polar area from those outside it. In the second part, based on the results of the first part, a period in which inter-satellite link failures outside the polar region are more frequent is selected and simulated for 1 h with a step of 1 s, and the duration and sequence of inter-satellite link failures during this period are analyzed.
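The outage test described above reduces to an angle check between two vectors. The following Python sketch assumes that the satellite-to-neighbor and satellite-to-sun vectors are already available (for example, exported from STK) as plain 3-D tuples; it flags a link as failed when the angle is below 3° and turns a sampled failure series into (start, duration) intervals. The function names are hypothetical and are not part of STK or MATLAB.

```python
import math

Vec = tuple[float, float, float]


def angle_deg(u: Vec, v: Vec) -> float:
    """Angle between two 3-D vectors, in degrees."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    cos_t = max(-1.0, min(1.0, dot / norm))  # clamp rounding error before acos
    return math.degrees(math.acos(cos_t))


def link_failed(sat_to_neighbor: Vec, sat_to_sun: Vec, threshold_deg: float = 3.0) -> bool:
    """Treat the ISL as failed when the sun is within threshold_deg of the link direction."""
    return angle_deg(sat_to_neighbor, sat_to_sun) < threshold_deg


def outage_intervals(flags: list[bool], step_s: float) -> list[tuple[float, float]]:
    """Convert a sampled failed/working series into (start time, duration) intervals."""
    intervals: list[tuple[float, float]] = []
    start = None
    for i, failed in enumerate(flags):
        if failed and start is None:
            start = i
        elif not failed and start is not None:
            intervals.append((start * step_s, (i - start) * step_s))
            start = None
    if start is not None:  # outage still ongoing at the last sample
        intervals.append((start * step_s, (len(flags) - start) * step_s))
    return intervals
```

With the 10 s step of the first simulation part, a series such as `[False, True, True, True, False]` yields one outage that starts at 10 s and lasts 30 s.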
Figure 5 shows the interruptions of inter-satellite links across the entire network during sun outage. The failure duration of intra-plane ISLs is mainly concentrated around 100 s, while that of inter-plane ISLs ranges from 10 s to 200 s.
Figure 6 shows the failures of inter-satellite links in the polar region from 4:00 to 5:00 on April 21. The abscissa is the simulation time, the ordinate is the number of the link's destination satellite and the length of each line segment is the failure duration. Blue segments represent westward inter-plane ISLs and red segments represent eastward inter-plane ISLs. During this period, no sun outage occurred on intra-plane ISLs outside the polar region. The results show that sun outage on inter-satellite links is discontinuous. At times, two inter-plane ISLs in the network experience sun outage simultaneously, but they lie on different sides of the Earth and have little influence on each other. Inter-satellite links in the polar region carry less traffic and their topology changes relatively frequently, so their failures can be handled with pre-designed backup paths that exploit the mesh characteristics of the topology. For inter-satellite link failures outside the polar region, we established a traffic distribution model that redistributes traffic to achieve load balancing in the failure area.
5. Local Pre-Rerouting Algorithm
5.1. Algorithm Overview
The routing algorithm proposed in this article is based on the virtual topology routing strategy [6]. The virtual topology switches periodically, and the network topology is considered unchanged within each topology cycle; only the propagation delays of links need to be updated between topologies. This information is stored in the space router, so no additional routing overhead is required, and the routing table for each cycle is generated by the Dijkstra shortest-path (DSP) algorithm. Our algorithm collects link information in a limited area, together with traffic information from links that are about to fail, during a set period before sun outage occurs, and centrally calculates paths for all pairs of source and destination satellites so that the traffic can be redistributed before the failure. After sun outage occurs, these services are forwarded along the pre-calculated paths, and forwarding along a pre-calculated path takes priority over forwarding according to the routing table.
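The forwarding priority described above can be sketched as follows. This is a hypothetical Python model, not the space router implementation: each router is assumed to keep a DSP routing table keyed by destination and a table of pre-calculated paths keyed by the (source, destination) pair, and an explicit path, when one exists and passes through the router, overrides the routing table.

```python
from dataclasses import dataclass, field


@dataclass
class SpaceRouter:
    """Hypothetical model of a space router during sun outage."""
    node: int
    routing_table: dict[int, int]  # destination -> next hop from the DSP routing table
    # (source, destination) -> pre-calculated path installed before sun outage
    service_paths: dict[tuple[int, int], list[int]] = field(default_factory=dict)

    def next_hop(self, src: int, dst: int) -> int:
        path = self.service_paths.get((src, dst))
        if path is not None and self.node in path:
            i = path.index(self.node)
            if i + 1 < len(path):
                return path[i + 1]  # pre-calculated path takes priority
        return self.routing_table[dst]  # otherwise fall back to the routing table
```

For example, a router at node 5 holding the path `[0, 5, 6, 3, 4, 10]` for the pair (0, 10) forwards that service to node 6, while traffic of other pairs toward node 10 still follows the routing table.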
5.2. Collection of Routing Information
When sun outage occurs, the traffic in the failed link is redistributed to other paths so that it arrives with the minimum delay without exceeding the carrying capacity of the surrounding links. Because sun outage is predictable, routes can be calculated before it occurs, so the algorithm can tolerate a certain calculation time. The algorithm is illustrated in Figure 7, which shows the link that is about to fail. Link load is collected only within a fixed number of hops around this link, and this area is called the limited area. The hop count of the limited area is determined by the number of satellites in the constellation and their service requirements; its setting is explained in Section 7.1, "Simulation Design". During a period of time before sun outage occurs, the satellite connected to each link in the limited area counts the total number of data packets forwarded through that link during this period and sends this link information to the satellite directly connected to the link that is about to fail (node 1), which uses it to estimate the average load of each link.
At the same time, the satellite directly connected to the link that is about to fail counts the traffic within that link, called link-failure traffic, collecting the source address, destination address, required bandwidth and path of each service in this traffic. The service shown in Figure 7 is part of the link-failure traffic: its source is node s, its destination is node d and its path before sun outage is s-1-2-3-4-d. The traffic volume of each such service is estimated from the total number of its data packets forwarded via the failing link during the collection period.
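A minimal sketch of the two estimates, assuming fixed-size packets and a known link capacity (neither is specified in the text): the average load of a link is taken as the forwarded bits divided by the product of the collection period and the link capacity, and the traffic volume of a service as its forwarded bits per second on the failing link. The function and parameter names are illustrative.

```python
def average_link_load(packets: int, period_s: float, packet_bits: int, capacity_bps: float) -> float:
    """Average utilization of a link over the collection period (fixed packet size assumed)."""
    return packets * packet_bits / (period_s * capacity_bps)


def service_rate_bps(packets: int, period_s: float, packet_bits: int) -> float:
    """Average traffic volume of one service on the failing link, in bit/s."""
    return packets * packet_bits / period_s
```

For example, 30,000 packets of 8,000 bits forwarded in 10 s over a 100 Mbit/s link correspond to an average load of 0.24 under these assumptions.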
5.3. Alternate and Detour Path Calculations
With this design, we obtain the distribution of services on the links affected by sun outage. At the same time, the link loads in the limited area are updated along the original paths of these services, which yields the link loads in the limited area after the link-failure traffic is removed. For each link e in the limited area, the updated load equals its measured average load minus the traffic of every service in the link that is about to fail whose pre-outage path p passes through e, where a binary indicator records whether path p passes through link e.
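The load update can be sketched in Python as follows, assuming undirected links keyed by sorted node pairs and each link-failure service described by its pre-outage path and traffic rate. Only links inside the limited area (the keys of `loads`) are updated; all names are hypothetical.

```python
Link = tuple[int, int]  # undirected link stored as (smaller node, larger node)


def updated_loads(loads: dict[Link, float],
                  failure_services: dict[str, tuple[list[int], float]]) -> dict[Link, float]:
    """Remove link-failure traffic from every limited-area link on its original path.

    loads: measured average load of each limited-area link.
    failure_services: service id -> (pre-outage path as a node list, traffic rate).
    """
    new = dict(loads)
    for path, rate in failure_services.values():
        for u, v in zip(path, path[1:]):
            e = (min(u, v), max(u, v))
            if e in new:  # indicator = 1: the path passes through link e
                new[e] = max(0.0, new[e] - rate)
    return new
```

Links on a service's path that lie outside the limited area are skipped, which mirrors the restriction of the load update to the limited area.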
Next, we design three paths for each source–destination node pair. The first is the shortest path, calculated with the Dijkstra algorithm on the original topology after removing the link that is about to fail; in Figure 7 it is s-5-2-3-4-d. The second is the second-shortest path, calculated with Yen's algorithm on the topology in which sun outage occurs; in Figure 7 it is s-5-6-3-4-d. The third is the detour path. In the topology in which sun outage occurs, the links whose updated load exceeds a set multiple of the average updated load over the k links in the limited area are removed, and the Dijkstra algorithm is then used to calculate the detour path; in Figure 7 it is s-7-8-9-6-3-4-d.
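The three path computations can be sketched with a heap-based Dijkstra search and Yen's algorithm restricted to K = 2. This Python sketch is an illustration, not the authors' implementation: it assumes an undirected graph whose link costs stand for propagation delays, identifies links by frozensets of their end nodes, and takes the high-load factor `alpha` as a hypothetical parameter because its value is not given in this section.

```python
import heapq
from typing import Optional

Graph = dict[int, dict[int, float]]  # undirected: node -> {neighbor: link cost}
Route = tuple[float, list[int]]      # (total cost, node sequence)


def link(u: int, v: int) -> frozenset:
    """Key for an undirected link."""
    return frozenset((u, v))


def build(edges: list[tuple[int, int, float]]) -> Graph:
    g: Graph = {}
    for u, v, w in edges:
        g.setdefault(u, {})[v] = w
        g.setdefault(v, {})[u] = w
    return g


def dijkstra(g: Graph, src: int, dst: int, banned_links: frozenset = frozenset(),
             banned_nodes: frozenset = frozenset()) -> Optional[Route]:
    """Least-cost path from src to dst avoiding the banned links and nodes."""
    dist, prev = {src: 0.0}, {}
    heap = [(0.0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:  # dst's distance is final once it is popped
            path = [dst]
            while path[-1] != src:
                path.append(prev[path[-1]])
            return d, path[::-1]
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in g[u].items():
            if v in banned_nodes or link(u, v) in banned_links:
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return None


def second_shortest(g: Graph, src: int, dst: int, banned_links: frozenset,
                    first: list[int]) -> Optional[Route]:
    """Yen's algorithm with K = 2: the cheapest deviation from the first path."""
    best: Optional[Route] = None
    for i in range(len(first) - 1):
        spur, root = first[i], first[: i + 1]
        # Block the first path's link leaving the spur node and the root nodes
        # before the spur node, so the candidate differs and stays loopless.
        blocked = banned_links | {link(spur, first[i + 1])}
        spur_route = dijkstra(g, spur, dst, blocked, frozenset(root[:-1]))
        if spur_route is None:
            continue
        root_cost = sum(g[a][b] for a, b in zip(root, root[1:]))
        cand = (root_cost + spur_route[0], root + spur_route[1][1:])
        if best is None or cand[0] < best[0]:
            best = cand
    return best


def alternative_paths(g: Graph, src: int, dst: int, failing: frozenset,
                      loads: dict[frozenset, float], alpha: float):
    """Shortest, second-shortest and detour paths once the failing link is removed.

    loads: updated load of each limited-area link; alpha: high-load factor (assumed).
    """
    banned = frozenset({failing})
    path1 = dijkstra(g, src, dst, banned)
    path2 = second_shortest(g, src, dst, banned, path1[1]) if path1 else None
    mean_load = sum(loads.values()) / len(loads)
    hot = frozenset(e for e, load in loads.items() if load > alpha * mean_load)
    path3 = dijkstra(g, src, dst, banned | hot)
    return path1, path2, path3
```

On a small topology modeled on Figure 7 (s and d mapped to nodes 0 and 10, the failing link assumed to be 1-2, unit link costs except a slightly more expensive 5-6 link, and high updated loads on links 0-5, 5-2 and 5-6), the sketch returns 0-5-2-3-4-10, 0-5-6-3-4-10 and 0-7-8-9-6-3-4-10, following the same pattern as the three paths described above.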
5.4. Solve for the Optimal Path Set
In the previous section, we obtained three alternative paths by calculation: the shortest path path1, the second-shortest path path2 and the detour path path3. The goal is to select one path for each service in the link that is about to fail so that the total propagation delay is minimized and no link load in the limited area exceeds a set maximum.
The model uses the following quantities: the set of paths enabled by all services; a binary variable indicating whether each path is enabled; the set of inter-satellite links in the entire network; the cost of each link and the cost of each path; the current load of each link; the set maximum link bandwidth; and the remaining bandwidth of each link.
We further denote the set of inter-satellite links in the limited area, the set of service requirements in the link that is about to fail, the set of paths of all services, the total traffic between the source and destination addresses of each service, the set of feasible paths of each service and the total propagation delay. The redistribution of failed-link traffic within the limited area can then be formulated as an optimization problem that minimizes the total propagation delay subject to constraints (18a), (18b) and (18c).
Constraints (18a) and (18c) ensure that each service uses exactly one path for transmission, and constraint (18b) ensures that the total traffic of all services transmitted on each link is less than the remaining capacity of that link. The problem is a 0-1 integer linear program and can be solved with the implicit enumeration method. If the maximum link load is set too small, the problem may have no solution; in that case, the maximum link load can be increased gradually until a solution is found.
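The following Python sketch solves a 0-1 problem with the same structure as the one described above, under stated assumptions: each service picks exactly one candidate path, the objective is the sum of the chosen path costs, and on every constrained link the total rate of the services routed over it may not exceed the residual capacity, assumed here to be the maximum link load minus the current load. It uses a depth-first implicit enumeration (branch and bound) with feasibility and cost-bound pruning, plus a wrapper that raises the maximum link load step by step when no feasible assignment exists. All names are hypothetical.

```python
from typing import Optional

# service -> (traffic rate, [(path cost, links on the path), ...])
Candidates = dict[str, tuple[float, list[tuple[float, list[str]]]]]


def assign_paths(services: Candidates, residual: dict[str, float]) -> Optional[dict[str, int]]:
    """Choose one candidate path index per service, minimizing total path cost."""
    names = list(services)
    # tail[k]: lower bound on the cost still to be added by services k, k+1, ...
    tail = [0.0] * (len(names) + 1)
    for k in range(len(names) - 1, -1, -1):
        tail[k] = tail[k + 1] + min(c for c, _ in services[names[k]][1])
    best_cost, best = float("inf"), None
    used: dict[str, float] = {}
    choice: dict[str, int] = {}

    def search(k: int, cost: float) -> None:
        nonlocal best_cost, best
        if cost + tail[k] >= best_cost:
            return  # bound: this branch cannot beat the incumbent
        if k == len(names):
            best_cost, best = cost, dict(choice)
            return
        rate, paths = services[names[k]]
        for idx in sorted(range(len(paths)), key=lambda i: paths[i][0]):
            c, links = paths[idx]
            if any(used.get(e, 0.0) + rate > residual.get(e, float("inf")) for e in links):
                continue  # capacity constraint violated on some link
            for e in links:
                used[e] = used.get(e, 0.0) + rate
            choice[names[k]] = idx
            search(k + 1, cost + c)
            for e in links:  # undo before trying the next candidate
                used[e] -= rate
            del choice[names[k]]

    search(0, 0.0)
    return best


def solve_with_relaxation(services: Candidates, loads: dict[str, float], b_max: float,
                          step: float, b_limit: float):
    """Raise the maximum link load step by step until a feasible assignment exists."""
    while b_max <= b_limit:
        residual = {e: b_max - load for e, load in loads.items()}  # assumed residual capacity
        choice = assign_paths(services, residual)
        if choice is not None:
            return b_max, choice
        b_max += step
    return None
```

Trying each service's cheapest candidate first finds a good incumbent early, which lets the cost bound prune most of the remaining branches; with only three candidates per service, the search stays small for the number of services in a single failing link.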
Due to the existence of limited areas, traffic distribution only considers load balancing within the area rather than global load balancing. The algorithm calculates three alternative paths for each source–destination node pair, and finally obtains a set of enabled paths. Considering that space routers have limited cache capabilities and insufficient support for multi-path routing, the algorithm proposed in this article only selects one of the paths. In different scenarios, the algorithm constraints can also be modified so that the traffic of specific services is distributed among different paths.
5.5. Algorithm Iteration
Because we use the virtual topology strategy to generate the topology and sun outage is likely to span two virtual topology cycles, routes can likewise be calculated in advance before the virtual topology switches. The iterative algorithm follows the same idea as the routing algorithm applied before sun outage occurs. Because the traffic affected by sun outage is forwarded along pre-calculated paths, it can easily be distinguished from other traffic, and statistics on it are collected before topology switching. The difference is that this traffic is counted by the source nodes of these services rather than by the nodes connected to the failed link. In addition, the portion of the link loads in the limited area that is forwarded according to the routing table must also be counted. The node connected to the failed link collects this routing information, performs the route calculation before topology switching and sends the new paths to the source nodes of the services. In the iterative algorithm, the three alternative paths are calculated on the virtual topology of the next topology cycle.
Figure 8 shows the overall flow of the algorithm.
7. Performance Analysis
7.1. Simulation Design
We derived the orbital parameters of the satellites with STK, built a LEO satellite network simulation platform in OPNET and compared the performance of different algorithms in sun outage scenarios. The polar-orbit satellite constellation used in this article has Walker parameters 324/9/4.05 and an orbital inclination of 89°. There are two seams between the first and last orbits, and no links are established across the seams.
According to the traffic model described in Section 3.2, the inter-arrival time of data packets between any two satellite nodes $i$ and $j$ follows an exponential distribution with parameter $\lambda_{ij}$:
$$f(t) = \lambda_{ij} e^{-\lambda_{ij} t}, \quad t \ge 0.$$
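Exponential inter-arrival times correspond to a Poisson packet stream. The following Python sketch, using only the standard `random` module, generates arrival times for one node pair and for a set of node pairs; the function names and the rates in the usage note are hypothetical.

```python
import random


def packet_arrivals(rate_pps: float, duration_s: float, rng: random.Random) -> list[float]:
    """Arrival times of a Poisson packet stream for one node pair.

    Inter-arrival gaps are exponential with parameter rate_pps (mean gap 1 / rate_pps).
    """
    t, times = 0.0, []
    while True:
        t += rng.expovariate(rate_pps)
        if t > duration_s:
            return times
        times.append(t)


def traffic_matrix_arrivals(rates: dict[tuple[int, int], float], duration_s: float,
                            seed: int = 0) -> dict[tuple[int, int], list[float]]:
    """Independent Poisson streams for every (source, destination) pair."""
    rng = random.Random(seed)
    return {pair: packet_arrivals(r, duration_s, rng) for pair, r in rates.items()}
```

For instance, `packet_arrivals(100.0, 60.0, random.Random(1))` produces roughly 6,000 arrivals with a mean gap of about 0.01 s; scaling all pair rates by a common factor changes the average link load while keeping their ratios fixed.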
The parameters of the local pre-rerouting algorithm (LPRA) set for the simulation are the high-load link threshold in the limited area, the initial value of the maximum link load, the time period for collecting routing information before sun outage and the reserved time period for collecting routing information and calculating paths. The virtual topology cycle is calculated from the gravitational constant, the mass of the Earth, the radius of the Earth and the number of satellites in each orbit, and its value follows from the constellation scale used in this article.
In the simulation, the scope of the limited area can be set according to the traffic density at the satellite's location. For the constellation size used in this article, different limited-area hop numbers are used depending on whether the satellite experiencing sun outage is over North America or Eurasia, over other continents, or over the ocean far from the mainland. The main simulation parameter settings are shown in Table 2.
We compared the proposed algorithm with the SPF (shortest path first) algorithm and the IADR algorithm. With SPF, all nodes forward packets along the shortest path after link failure. The IADR algorithm first evaluates link utilization within a limited area and, based on the links each path passes through, selects the path with the lowest utilization from the set of alternative paths as the forwarding path after sun outage failure. For all algorithms, paths or routing tables are calculated, generated and stored in the corresponding space routers in advance, and each space router switches its forwarding method immediately when sun outage occurs.
To evaluate the performance of the algorithms, we analyzed the services that originally passed through the failed link and calculated their average end-to-end delay and packet loss rate during sun outage. In addition, to evaluate load-balancing capability, we recorded the load of the links in the limited area during sun outage to analyze how the detour paths affect the load on surrounding links.
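The evaluation metrics can be sketched as follows, assuming per-packet send and receive timestamps are available for the rerouted services. The average delay is computed over delivered packets only, so dropped packets do not contribute to it. The load-level boundaries passed to `load_level_shares` are hypothetical, since the thresholds of the load levels are not stated in this section.

```python
from bisect import bisect_right
from statistics import mean


def service_metrics(sent: dict[int, float], received: dict[int, float]) -> tuple[float, float]:
    """Packet loss rate and average end-to-end delay of the rerouted services.

    sent: packet id -> send time; received: packet id -> arrival time (delivered only).
    The delay is averaged over delivered packets only, so dropped packets are excluded.
    """
    loss_rate = 1 - len(received) / len(sent)
    delays = [received[p] - sent[p] for p in received]
    return loss_rate, (mean(delays) if delays else float("nan"))


def load_level_shares(loads: list[float], edges: list[float]) -> list[float]:
    """Share of limited-area links in each load level; edges are level boundaries."""
    counts = [0] * (len(edges) + 1)
    for x in loads:
        counts[bisect_right(edges, x)] += 1
    return [c / len(loads) for c in counts]
```

Because lost packets are excluded from `delays`, an algorithm that drops its slowest packets can report a lower average delay, which is why the delay and loss metrics are best read together.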
7.2. Simulation Results
In sun outage scenarios in LEO satellite networks, two characteristics are particularly important. The first is the duration of sun outage. A longer outage challenges the adaptability of the algorithm, because the link and traffic information collected before the failure becomes less reliable over time; the longer the failure, the worse the algorithm adapts. The second is the average link load in the network, which reflects the degree of network congestion and allows the algorithm to be evaluated in different network environments. In the simulation, the reference values are a sun outage duration of 120 s and an average network load of 0.33. We fixed one characteristic at its reference value and analyzed the impact of the other on algorithm performance.
7.2.1. Performance Comparison under Different Sun Outage Durations
To evaluate the algorithms under different sun outage durations, based on the simulation results in Section 4, we varied the sun outage duration from 60 s to 180 s in steps of 20 s and fixed the average link load at 0.33. As shown in Figure 11, LPRA always achieves a lower packet loss rate than the other algorithms. SPF performs far worse than LPRA and IADR: after the link fails, services bypass it along the shortest path, making already congested links even more congested and causing very serious packet loss. The IADR algorithm evaluates link utilization and selects the best path, but because each service makes this selection independently, global load balancing is not considered. After the link fails, the amount of traffic to be rerouted is very large, and the originally optimal alternative path becomes unsatisfactory. In terms of latency, LPRA still performs best, but it is also the most affected by the sun outage duration. The main reason is that LPRA uses many detour paths, and the latency performance of these preset paths gradually degrades as the topology changes.
7.2.2. Performance Comparison under Different Link Average Loads
To evaluate the impact of the average link load on algorithm performance, we changed the uplink traffic rate of the satellite nodes while keeping their ratios consistent with the traffic model in Section 3.2, so that the average link load ranged from 0.23 to 0.42, and fixed the sun outage duration at 120 s. Figure 12 shows that the packet loss rate of all three algorithms increases with link load, and that LPRA always has the lowest packet loss rate. The packet loss rate of the IADR algorithm increases significantly when the link load reaches 0.35. At this load the links reach their bottleneck, beyond which link statistics no longer reflect the true network status, so the performance of algorithms that rely on link information drops significantly. While the packet loss rate of IADR increases significantly, its delay decreases slightly, mainly because some high-delay packets are dropped and therefore excluded from the delay statistics. In terms of latency, SPF is always the highest. After the load exceeds 0.35, the delay of LPRA exceeds that of IADR. This is because LPRA uses more detour paths, whereas the IADR paths with higher queuing delays have already caused packet loss that is excluded from the statistics. For the packets that actually arrive, the average delay of IADR is lower than that of LPRA.
7.2.3. Comparison of Link Load Balancing Capabilities
This section compares the load-balancing capabilities of the algorithms at low and high average link loads. For each algorithm, we calculated the load of every inter-satellite link in the limited area and the proportion of links at each load level. As shown in Figure 13, under both low and high average link loads, LPRA leaves no inter-satellite links at the highest load level and places more links at intermediate load levels than SPF and IADR. This shows that LPRA has the strongest load-balancing capability and is consistent with its low packet loss rate without excessive degradation in delay performance.
7.3. Simulation Analysis
The simulation results show that, because the proposed algorithm uses centralized route calculation, it has stronger load-balancing capability than the distributed routing algorithms. As a result, fewer ISLs in the failure area carry the highest level of load, and the packet loss rate in the network is lower. However, its advantage in delay is less obvious. This is partly because the higher packet loss rates of the other algorithms make their delay statistics inaccurate, since dropped packets are not counted. In addition, the proposed algorithm uses many detour paths; although they avoid congested links and reduce queuing delay, they also increase propagation delay, and this increase is more significant when the sun outage lasts longer.
The three alternative paths proposed in this article are the shortest path, the second-shortest path and the detour path. The second-shortest path serves as a substitute for the shortest path; because most of its links coincide with those of the shortest path, it is rarely selected. The detour path differs considerably from the shortest path and trades part of the delay performance for network load balancing, so it is more likely to be selected when the network load is high. However, the selection of alternative paths can be further optimized. The detour path considers only the congestion status of links before sun outage and does not consider whether it overlaps excessively with the paths of other services in the algorithm. Adding more detour paths could further improve performance, but it would also increase the computational complexity of the algorithm.
8. Conclusions
This paper proposes a local pre-rerouting algorithm (LPRA) that solves the routing problem during sun outage of inter-satellite links in LEO satellite networks and improves their load-balancing capability. Taking into account the characteristics of LEO satellite networks, LPRA collects link information only within a limited area and performs centralized route calculation. Before sun outage occurs, it collects local routing information and the traffic information of links that are about to fail, and calculates a path for each pair of source and destination satellites so that the traffic can be redistributed before the failure. After sun outage occurs, these services are forwarded along the pre-calculated paths. In addition, to optimize the uplink path of traffic from ground stations during sun outage, an algorithm for uplink satellite reselection based on feeder link load and minimum hop count is developed, and its scalability to multi-layer and large-scale satellite networks is discussed. Simulation results show that the proposed algorithm outperforms existing algorithms in packet loss rate and load-balancing capability. When the sun outage lasts long, its delay advantage is not clear, and when the link load is high, its delay performance degrades significantly. This is consistent with the design of the algorithm, which sacrifices part of the delay performance in exchange for throughput. In future work, the selection of alternative paths can be further optimized to improve the delay performance of the algorithm.