1. Introduction
Amid a resurgence in the space industry, the declining cost of producing and launching nanosatellites has driven exponential growth in low Earth orbit (LEO) constellations over the past two decades [1,2]. The volume of space-native raw data grows with constellation size, and, due to the bandwidth constraints of the satellite–ground link, timely downloads of all the updated data are not achievable [3]. Therefore, as a distinctive distributed learning method, federated learning (FL) [4,5,6] holds immense potential for broad deployment on satellites. By avoiding the transmission of training data between satellites, FL preserves the security of satellite data [7] and reduces costly traffic in satellite communication [8,9,10].
In traditional Earth observation missions, ground users must wait for all satellites to send back data, which typically takes two to three days [11]. In contrast, FL can harness the local computational resources of satellites to decentralize model training among them, eliminating the need to transmit data to ground stations (GSs) or other satellites [12].
FL has been widely used in terrestrial networks, where the Google team developed the basic FL algorithm FedAvg [13]. Chen et al. [14] applied it to an LEO constellation to verify the effectiveness of satellite FL (SFL). However, as a synchronous approach, FedAvg forces all satellites to complete local training and transmit their parameters to the server in each global round, so a model can take several days to converge. Razmi et al. [15] therefore took an asynchronous approach and proposed FedSat, in which the server no longer waits for the parameters of all satellites before starting the next round of training. Although the asynchronous approach has benefits, it also introduces a new problem, model aging, because a few satellites fall too far behind the global rounds. So et al. [16] then proposed the FedSpace algorithm, which balances the idle connections of satellites against the obsolescence of local models; they computed the time-varying satellite connectivity from the satellite orbits and Earth's rotation to determine the global aggregation schedule. On this basis, Razmi et al. [17] proposed FedISL, which uses inter-satellite links to reduce the delay of FL. They further considered whether satellites can complete local model training within the communication window and designed a scheduling algorithm, FedSatSchedule [18]. Due to the intermittent connectivity between satellites and GSs, stale gradients and unstable learning remain challenges in SFL. Hence, Wu et al. [19] proposed FedGSM, a novel asynchronous FL algorithm that introduces a compensation mechanism to mitigate gradient staleness.
The introduction of FL to satellite networks has brought many new opportunities. For example, Al-Hawawreh et al. [20] proposed an FL-assisted distributed intrusion detection system that uses a mesh satellite network to protect autonomous vehicles. Li et al. [21] developed an FL module for multi-satellite, multi-modality in-orbit data fusion, which compressed communication costs by a factor of 4 and reduced training time by 48.4 min (15.18%). Salim et al. [22] proposed a novel FL-based threat detection model that proactively identifies intrusions in satellite communication networks while utilizing decentralized on-device data and preserving data privacy.
However, applying FL in satellite networks still faces many unique challenges. Satellite mobility makes communication links unstable, so system designs must rely on the predictability of satellite visits [23]. One limitation of LEO satellites is their brief visibility to GSs: while their orbital period typically ranges from 90 to 120 min, they are in direct contact with a GS for only 5 to 20 min per orbit [19]. Moreover, failures of the network, hardware, and software expose SFL to more challenges than terrestrial FL [12].
Client selection (CS) is necessary in FL, and the problem has been studied thoroughly in terrestrial networks [24]. To the best of our knowledge, however, it has received little attention in satellite networks, where the orbital characteristics of the client satellites, the value of their data, their computing capabilities, and other factors must all be considered to improve model performance in satellite federated learning (SFL).
In this paper, we first investigate the characteristics of satellite links and discuss the positioning of the parameter server (PS) in two distinct scenarios: deployment at a GS and deployment on an LEO satellite. Second, we account for these specific attributes of satellites in the context of SFL and propose an index called "client affinity" that gauges each client satellite's contribution to the global model. Finally, we validate the efficacy of our methods through experiments against two benchmark methods, FedSat [15] and FedSpace [16].
The contributions of this paper can be summarized as follows:
We demonstrate an SFL paradigm in which LEO satellites act as PSs, and conduct simulations on a constellation of 120 low-orbit satellites.
We describe the communication and mobility models of SFL in detail, and model the CS problem in SFL as a 0–1 knapsack problem.
We establish a model quality evaluation function for client satellites and use affinity to describe each client's contribution to global training. We then combine client access and communication to establish a CS mechanism.
Simulation results verify that the proposed method effectively improves the convergence speed and accuracy of the model in SFL.
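Modeling CS as a 0–1 knapsack means choosing, each round, the subset of clients that maximizes total contribution under a communication budget. The following is a minimal sketch using the textbook dynamic-programming solver; the affinity values, integer cost units, and budget are hypothetical placeholders, not the paper's exact formulation:

```python
def select_clients(values, costs, capacity):
    """Pick a subset of clients maximizing total affinity under a
    communication-budget capacity (classic 0-1 knapsack DP).

    values[i] -- affinity (contribution) score of client i (assumed given)
    costs[i]  -- integer communication cost of client i (e.g., seconds)
    capacity  -- total communication budget available this round
    """
    n = len(values)
    # dp[c] = best total affinity achievable with budget c
    dp = [0.0] * (capacity + 1)
    choice = [[False] * (capacity + 1) for _ in range(n)]
    for i in range(n):
        # iterate the budget in reverse so each client is selected at most once
        for c in range(capacity, costs[i] - 1, -1):
            if dp[c - costs[i]] + values[i] > dp[c]:
                dp[c] = dp[c - costs[i]] + values[i]
                choice[i][c] = True
    # backtrack to recover which clients were selected
    selected, c = [], capacity
    for i in range(n - 1, -1, -1):
        if choice[i][c]:
            selected.append(i)
            c -= costs[i]
    return sorted(selected), dp[capacity]
```

For example, with affinities `[3, 4, 5, 6]`, costs `[2, 3, 4, 5]`, and a budget of 5, the solver picks clients 0 and 1 (total affinity 7) rather than the single high-affinity client 3 (affinity 6).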
2. Motivation
In this section, we demonstrate how SFL performance can be improved when an LEO satellite plays the role of the PS. One of the benefits of LEO satellites is their short orbital period. For example, a satellite in a circular orbit at an altitude of 500 km has a period of approximately 95 min. However, the restricted communication range between GSs and satellites inherently limits the access time: the communication window between an LEO satellite and a GS is typically only a few minutes long.
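The orbital period follows directly from Kepler's third law, T = 2π√(a³/μ), where a is the semi-major axis (Earth radius plus altitude) and μ is Earth's gravitational parameter. A quick check of the figure above (the constants and function name here are ours):

```python
import math

MU_EARTH = 3.986004418e14  # Earth's gravitational parameter, m^3/s^2
R_EARTH = 6.371e6          # mean Earth radius, m

def orbital_period(altitude_m):
    """Period of a circular orbit, T = 2*pi*sqrt(a^3 / mu), in seconds."""
    a = R_EARTH + altitude_m   # semi-major axis = Earth radius + altitude
    return 2 * math.pi * math.sqrt(a**3 / MU_EARTH)

print(orbital_period(500e3) / 60)   # ~94.5 min at 500 km altitude
```

This also confirms the 90–120 min range quoted earlier: the 400 km and 600 km shells have periods of roughly 92 and 97 min, respectively.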
We simulated an LEO satellite constellation with a total of 120 satellites and separately counted the number and duration of accesses in two cases: an LEO satellite as the PS and a GS as the PS. We constructed Walker delta constellations at three orbital altitudes of 400 km, 500 km, and 600 km, with inclinations of 40°, 45°, and 50°, respectively. At each altitude, five orbital planes were placed, each with eight satellites.
In the satellite-to-ground scenario (a GS as the PS), we placed a GS at a location in Shanghai, China, and counted the number and duration of all 120 satellites' visits to the GS within 24 h. In the satellite-to-satellite scenario (an LEO satellite as the PS), we counted the number and duration of visits from the other satellites to the PS satellite.
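A Walker delta pattern is fully determined by the total satellite count t, the number of planes p, and a relative phasing factor f: ascending nodes are spaced evenly across planes, and in-plane slots are offset between adjacent planes by f·360°/t. A minimal sketch of generating one 40-satellite shell is below; the phasing factor f = 1 is an illustrative assumption, as the paper does not state it:

```python
def walker_delta(t, p, f, inclination_deg):
    """Orbital angles for a Walker delta pattern i:t/p/f.

    t -- total satellites, p -- planes, f -- relative phasing factor.
    Returns one (RAAN, in-plane anomaly, inclination) tuple per satellite,
    all in degrees. The phasing factor is an assumption for illustration.
    """
    s = t // p                      # satellites per plane
    sats = []
    for j in range(p):              # plane index
        raan = j * 360.0 / p        # ascending nodes evenly spaced
        for k in range(s):          # slot index within the plane
            anomaly = (k * 360.0 / s + j * f * 360.0 / t) % 360.0
            sats.append((raan, anomaly, inclination_deg))
    return sats

# one shell of the simulated constellation: 5 planes x 8 satellites at 45 deg
shell_500km = walker_delta(40, 5, 1, 45.0)
```

Repeating this for the 400 km (40°) and 600 km (50°) shells yields the full 120-satellite constellation.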
As Figure 1 shows, although the PS satellite cannot access all other satellites, some of them can establish a very long communication window with the PS, which is difficult for a GS to achieve. The statistics show an average access duration of 6571.9 s when the satellite is the PS, versus only 555.3 s for the GS. Similarly, Figure 2 shows the number of client accesses when the GS and the satellite, respectively, serve as the PS. When a GS is the PS, each satellite has a relatively equal chance of access, but the access frequency is much lower than when a satellite is the PS: the average number of accesses is 30 with a satellite PS but only 8 with the GS. Furthermore, Figure 3 illustrates the numbers, durations, and temporal relationships of accesses between the clients and the PS. When a GS is the PS, accesses by different satellites show no significant differences. However, when a satellite is the PS, some satellites can establish a stable and continuous connection, others access the server only intermittently, and still others cannot access the server at all during the simulation period.
In summary, in FL we want clients to have longer connection windows and more frequent access to the PS, but accessibility in the satellite–ground scenario is far worse than in the satellite–satellite scenario. Moreover, the revisit times of some satellites are too long in both scenarios, which degrades the performance of both synchronous and asynchronous SFL. Therefore, it is necessary to filter clients to improve overall training performance.
5. Simulation
We simulated an LEO constellation of 120 satellites in STK and exported the link information between the clients and the PS as a JSON file. Our program then checked the connections between the clients and the PS at every time slot (0.1 s) based on this file, so we knew which clients were connected to the PS at any time. We then compared SFL under satellite-to-ground and inter-satellite communication and validated the effectiveness of our CS algorithm.
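The per-slot connectivity check can be sketched as follows. The JSON layout shown here is hypothetical (a mapping from client ID to a list of [start, end] access windows in seconds); the paper does not specify the exported schema:

```python
import json

def load_windows(path):
    """Read per-client access windows exported from STK.
    Assumed (hypothetical) layout: {"client_id": [[start_s, end_s], ...], ...}."""
    with open(path) as fh:
        return {int(cid): wins for cid, wins in json.load(fh).items()}

def connected_clients(windows, t):
    """Clients whose link to the PS is up at simulation time t (seconds)."""
    return sorted(cid for cid, wins in windows.items()
                  if any(start <= t <= end for start, end in wins))

def sweep(windows, t_end, dt=0.1):
    """Check the client-PS links every dt-second slot, as in the simulation."""
    t = 0.0
    while t <= t_end:
        yield t, connected_clients(windows, t)
        t += dt
```

For example, with `{1: [[0, 10], [50, 60]], 2: [[5, 20]]}`, clients 1 and 2 are both connected at t = 7 s, no one is connected at t = 30 s, and only client 1 is connected at t = 55 s.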
5.1. LEO Constellation
We consider an LEO Walker constellation of 120 satellites distributed across three altitudes of 400 km, 500 km, and 600 km. Five orbital planes are evenly deployed at each altitude, each containing eight satellites. The constellation simulation parameters are listed in Table 1, and the Walker constellation simulated in STK from these parameters is shown in Figure 8.
In the simulation, we established a GS in Shanghai, situated at latitude 31.25° and longitude 121.48°, to serve as the ground PS. Additionally, we selected a satellite orbiting at an altitude of 500 km to serve as the space PS.
5.2. Experimental Environment, Datasets, and Hyper-Parameters
The constellations in this study were constructed and simulated with STK 12.2 and Python 3.9, and client access times (start and end) were exported as JSON files. We built the SFL system with Python 3.9 and PyTorch 2.0 with CUDA 11.7. The experiments ran on a personal computer equipped with a 64-bit Windows 11 system (Microsoft, Redmond, WA, USA), 64 GB of memory, a 36-thread Intel i9-10980XE CPU (Intel, Shanghai, China), and an NVIDIA RTX 3090 graphics card (ASUS, Taipei, China) with 24 GB of memory.
The ResNet-50 [30] model served as the backbone network in this experiment. It consists of five stages of convolution and identity blocks, where each convolution block and each identity block contains three convolution layers. ResNet-50 has over 23 million trainable parameters, which makes it challenging to train but capable of producing highly accurate results.
The Fashion-MNIST [31] and CIFAR-100 [32] datasets were used in the experimental phase. Fashion-MNIST comprises 70,000 28 × 28 grayscale images of fashion products in 10 categories, with 7000 images per category; its training set has 60,000 images and its test set has 10,000 images. CIFAR-100 has 100 classes, including animals, foods, vehicles, and flowers, containing 600 images each: 500 training images and 100 testing images per class.
In the experiment, the training set was shuffled and then randomly distributed across the clients, with an overlap coefficient controlling the possibility of identical data appearing at different clients. The test set was deployed on the server side for evaluating the aggregated model. An overlap coefficient of 1 signifies that all clients possess identical datasets, whereas 0 implies that no client datasets overlap. In one of our experiments, the overlap coefficient was set to 0.2, indicating that each client shared 20% of its data with one or more other clients.
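A minimal sketch of such a partition is below. The exact sharing scheme (disjoint base shards, with an overlap fraction of each client's quota re-drawn from other clients' shards) is an illustrative assumption, not the paper's implementation:

```python
import random

def partition_with_overlap(num_samples, num_clients, overlap, seed=0):
    """Split a shuffled training set across clients with a tunable overlap.

    overlap=0 gives disjoint shards; with overlap=0.2, roughly 20% of each
    client's samples also appear at some other client. This scheme is an
    illustrative assumption, not the paper's exact implementation.
    """
    rng = random.Random(seed)
    idx = list(range(num_samples))
    rng.shuffle(idx)
    shard = num_samples // num_clients
    shards = [idx[i * shard:(i + 1) * shard] for i in range(num_clients)]
    parts = []
    for i, own in enumerate(shards):
        n_shared = int(overlap * len(own))
        keep = own[: len(own) - n_shared]          # samples unique to client i
        others = [s for j, s in enumerate(shards) if j != i]
        # the shared portion duplicates samples held by other clients
        borrowed = [rng.choice(rng.choice(others)) for _ in range(n_shared)]
        parts.append(keep + borrowed)
    return parts
```

With `overlap=0.0` the client shards are disjoint and jointly cover the training set; with `overlap=0.2` each client's shard is the same size but one fifth of it is drawn from other clients' data.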
We set 5 epochs for local training and a total of 200 rounds for global training. When a client finishes its five epochs of local learning, it immediately sends the result to the PS. For model evaluation, the ratio coefficient was set to 0.2. In a single round of training, at most 40 clients can be selected. On the client side, there were three rounds of localized training, with a fixed learning rate and a batch size of 32.
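A client's local phase, several epochs of mini-batch SGD at a fixed learning rate followed by an immediate upload to the PS, can be sketched in a few lines. The least-squares model below is a stand-in for ResNet-50 training, and the default hyper-parameter values are placeholders:

```python
import random

def local_update(weights, data, lr=0.01, epochs=5, batch_size=32, seed=0):
    """One client's local phase: `epochs` passes of mini-batch SGD, then the
    updated weights are returned for upload to the PS.

    The model and loss (least squares for y ~ w . x) are placeholders for the
    paper's ResNet-50 training; data is a list of (x_vector, y) pairs.
    """
    rng = random.Random(seed)
    data = list(data)               # avoid mutating the caller's dataset
    w = list(weights)
    for _ in range(epochs):
        rng.shuffle(data)
        for b in range(0, len(data), batch_size):
            batch = data[b:b + batch_size]
            # gradient of the mean squared error with respect to w
            grad = [0.0] * len(w)
            for x, y in batch:
                err = sum(wi * xi for wi, xi in zip(w, x)) - y
                for k, xk in enumerate(x):
                    grad[k] += 2 * err * xk / len(batch)
            w = [wi - lr * gi for wi, gi in zip(w, grad)]
    return w
```

On the toy problem y = 2x, repeated local epochs drive the single weight toward 2, after which the client would transmit `w` to the PS for aggregation.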
5.3. Experiment Results
We measured accuracy and convergence speed in different scenarios; that is, we compared the impact of deploying the server at the GS versus on a satellite, with FedSat [15] and FedSpace [16] as the two benchmark methods, and we compared the impact of the CS mechanism on these SFL methods.
First, we considered the impact of the CS mechanism on both methods when the PS is located at the GS. As Figure 9 shows, over 200 rounds of iterations, the algorithms equipped with the CS mechanism not only converged faster but also achieved significantly higher model accuracy. Under these stringent visitation conditions, neither FedSpace nor FedSat without CS could reach model convergence within 48 h, with final accuracies of only 38.4% and 37.5%, respectively. CS mitigated this situation and significantly improved the convergence speed and accuracy: both methods reached more than 80% model accuracy within 48 h.
Next, we illustrate that, in SFL, deploying the PS on a satellite is the better choice. We used the same training parameters and configuration as for the GS but moved the PS to a satellite at an altitude of 500 km. As Figure 10 shows, both benchmark methods improved significantly in convergence speed and accuracy, which reveals the potential of inter-satellite FL: compared with the traditional methods, accuracy improved by more than 30%. Moreover, using CS shortened the model convergence time to less than 16 h, with FedSpace reaching an accuracy of 85.5% and FedSat 79.8%. A detailed comparison of the results is shown in Table 2.
We further validated the effectiveness of the algorithm on the CIFAR-100 dataset. As Figure 11 shows, although the greater dataset variety indeed challenges satellite federated learning, our method still achieved convergence within 24 h and attained accuracies of 82.3% and 77.4%.
6. Conclusions
Federated learning (FL) is a communication-efficient machine learning framework well suited to satellite applications, where communication capabilities are often limited. Additionally, satellite data transmission via wireless broadcasting inherently poses privacy and security risks; FL's strong privacy protection can help ensure the privacy of remote sensing data. This study investigated the positioning of the parameter server (PS) and the problem of client selection (CS) in satellite federated learning (SFL). We examined and contrasted the number and duration of accesses from client satellites to both a ground station (GS) and a server satellite, and observed that a satellite acting as the PS can receive client parameters within its field of view far better than a GS can. We described the communication and mobility models of SFL and modeled CS as a 0–1 knapsack problem to be solved. A comparative analysis against two benchmark methods, FedSat and FedSpace, established the advantages of inter-satellite FL, with Fashion-MNIST and CIFAR-100 applied as test datasets. Notably, the experimental results indicate that the CS mechanism can shorten the convergence time of SFL to as little as 12 h while achieving an accuracy surpassing 80%.
We have not yet conducted experiments on a larger-scale constellation. Future constellations may expand to tens of thousands of low Earth orbit (LEO) satellites, as with Starlink (https://www.starlink.com/, accessed on 1 December 2023), so the CS mechanism proposed in this paper may incur very high computational overheads in large-scale scenarios; this is a problem worth studying in future work. In addition, we used general-purpose datasets in the simulation, whereas real satellite remote sensing data may differ, and the impact of long-tailed data distributions on SFL and client selection has not yet been considered. In the future, we will explore more diverse SFL paradigms for applications as well as more comprehensive CS mechanisms.