1. Introduction
The satellite management unit (SMU) is one of the main service systems that ensure the satellite completes its mission. It consists of the hardware platform, the platform software (operating system, board support package, etc.) and the application software. The SMU is responsible for on-board device management, attitude and orbit control, autonomous mission management, telemetry, telecommand and other functions. It is key to the autonomous flight and management of the satellite and to improving the satellite's reliability and security.
In recent years, space science exploration has been moving gradually toward deep space, and the required autonomy of space mission management is rising. The scale and complexity of SMU software are constantly increasing, and the demand for deploying artificial intelligence algorithms on spacecraft is also growing. Both trends pose greater challenges to the performance of on-board computers, operating systems and SMU software architectures.
Take space gravitational wave detection as an example. Because it places high requirements on the control accuracy and stability of the spacecraft, it needs more complex control algorithms and a higher control frequency. At the same time, since the satellite is tens of millions of kilometers away from the Earth during gravitational wave detection, the telecommand delay is high, with a one-way delay of more than 200 s, which requires the satellite to manage itself intelligently and autonomously [1].
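The order of magnitude of this delay can be checked with a short calculation. The sketch below is only illustrative: the three distances are hypothetical values chosen to span "tens of millions of kilometers", not mission parameters taken from this paper.

```python
# Back-of-the-envelope one-way signal delay for a distant spacecraft.
SPEED_OF_LIGHT_KM_S = 299_792.458  # speed of light in vacuum, km/s


def one_way_delay_s(distance_km: float) -> float:
    """Return the one-way light travel time in seconds for a range in km."""
    return distance_km / SPEED_OF_LIGHT_KM_S


# Hypothetical ranges, chosen only for illustration.
for distance_km in (30e6, 60e6, 90e6):
    delay = one_way_delay_s(distance_km)
    print(f"{distance_km / 1e6:.0f} million km -> {delay:.0f} s one way")
```

Under these assumed distances, a range of about 60 million km already gives a one-way delay of roughly 200 s, so closed-loop control from the ground is impractical at such ranges.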
In terms of on-board computer performance, single-core processors are increasingly unable to cope with the growth in computation brought by the above changes. Constrained by the limits of Moore's Law and by the power wall, a single-core processor cannot further improve its computing capacity under a limited power budget. Multi-core processors offer advantages in both computing power and energy consumption, and rad-hard multi-core processors have begun to emerge in recent years. Therefore, using a multi-core processor in the design of the satellite control and data processing unit is an inevitable choice for improving the processing capacity of the satellite information system [2,3]. To exploit the performance advantage of the multi-core processor and make it easy to use in the SMU, support is needed in many areas, such as the operating system, the software ecosystem and the software architecture design. These have become important problems that urgently need to be solved.
At the operating system level, uC/OS struggles to meet complex multi-interface communication requirements, while VxWorks is expensive and not open source. The Real-Time Executive for Multiprocessor Systems (RTEMS) is a space-qualified operating system that supports multiprocessing, hard real-time operation and the portable operating system interface (POSIX), and is freely distributable. However, an RTEMS system runs as a single process with multiple threads, so threads can influence each other relatively strongly. Its limited popularity in industry also makes it difficult to port the latest ground applications and libraries to the SMU, which is an obstacle to making satellites intelligent. Compared with the operating systems mentioned above, Linux offers rich system functions, modularity, open source code, multi-core processor support and numerous intelligent application libraries [4]. Therefore, Linux is more suitable for the high-performance SMU of future complex satellites.
As a general-purpose operating system, Linux has the advantages mentioned above, but its real-time performance is limited in the standard configuration and its kernel is large [5]. To apply Linux to spacecraft, it is therefore necessary to configure and tailor it properly so as to reduce the kernel size and improve the real-time performance. Linux has already been partially applied in traditional satellites, but mostly on single-core processors with small memory. Such designs focus on extreme kernel size compression and often need a real-time operating system (RTOS) on the main computer, while Linux is used only for payload management, as in STRaND-1 [6] and TacSat-1 [7]. The situation with high-performance multi-core SoCs is quite different and requires a new Linux application scheme that makes full use of the multi-core processor and meets the mission requirements of future complex satellites. A balance is needed between kernel size, real-time performance, scalability and support for intelligent applications.
To get the most out of a multi-core processor and the benefits of Linux, the traditional SMU software architecture based on a single-core processor is no longer suitable. Some existing studies have designed SMU software architectures for multi-core processors, but most are still based on an RTOS and few on Linux, which makes it difficult to meet the requirements of future complex satellites. He et al. [8] analyzed dual-core processors and satellite control and data-processing systems and proposed an asymmetric multiprocessing (AMP)-based architecture. The data acquisition/collection thread runs on the core without a floating-point unit (FPU), and the other threads run on the core with an FPU. The running sequence of the threads is statically arranged. Jan et al. [9] proposed a symmetric multiprocessing (SMP)-based architecture. Each application has a separate partition for isolation and protection. The scheduling unit is the partition, and partitions have no priority. The partition schedule is determined in advance and repeats with a fixed period. Ref. [10] designed a fault-tolerant processor architecture based on a tri-core processor, which supports up to triple modular redundancy and can be configured into three modes. In performance mode, the cores run separately. In normal mode, two cores form a dual lock-step fault-tolerant processor, and the third core works independently. In reliable mode, the three cores form triple modular redundancy. Although it is a hardware design, it is a useful reference for the design of the SMU software architecture.
A new SMU software architecture is therefore necessary. While ensuring reliability and scalability, it should maximize the parallelism of the SMU software to exploit the multi-core processor. It should also make full use of Linux system functions for the scalability of the SMU, and run intelligent autonomous management computations without affecting the real-time tasks. In this way, it can meet the needs of future complex satellites for higher control complexity and frequency and for intelligent autonomous management.
The main contributions of this paper are as follows:
We use a rad-hard multi-core hardware platform to enhance the on-board computing power. Linux is used to manage multi-core resources and provide rich system functions for applications. With Linux, the environment of the SMU software is consistent with that on a personal computer (PC), which helps to improve the development process and efficiency. Linux is configured and tailored with regard to size, real-time performance, extensibility and support for intelligent applications.
Based on the hardware platform and Linux, we design a software architecture with three modes. The high-performance and high-reliability modes can be switched flexibly according to operating conditions so as to make full use of multi-core resources in various situations.
Against the background of the gravitational wave detection project, the application of the SMU designed in this paper is analyzed, and a large number of experiments are carried out. Compared with other methods, our design achieves a comparable multi-core speedup ratio and has much better expandability and generality.
The organization of this article is as follows:
Section 2 describes the multi-core SoC used in this article. Section 3 describes the configuration and tailoring of Linux. Section 4 describes the porting of the SMU software and the SMU software architecture designed for the SMP multi-core processor and Linux, and analyzes its application in the space gravitational wave detection project. Section 5 demonstrates, through experiments based on the actual SMU software, the performance advantages of our SMU over traditional schemes, and shows its effect against the task requirements of the space gravitational wave detection project. Section 6 summarizes this paper.
3. Operating System Selection and Design
3.1. Overview
Because the GR740 is a high-performance multi-core processor with a memory management unit (MMU), it can run Linux directly, without resorting to uClinux.
SpaceX’s Falcon 9 rocket demonstrated that Linux with the Preempt-RT patch can be used well in hard real-time systems [17]. For our missions, there are generally two aspects of real-time requirements.
Timely data acquisition: Data need to be fetched from the buffer in time to avoid data loss. However, interfaces with this requirement, such as the controller area network (CAN), are equipped with direct memory access (DMA) or first-in-first-out (FIFO) buffers, which relax this requirement.
Periodic calculation: The results are used to periodically control attitude, orbit, etc. With a handling policy for missed deadlines, a deadline-miss probability of less than 5% can be tolerated. If an event-based task misses its deadline, it drives no event in that period. If a state-based task misses its deadline, the state it controls remains the same as the current state.
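The deadline-miss handling policy described above can be sketched in a few lines. This is a minimal illustration, not the SMU implementation: the names `CycleResult`, `event_output`, `state_output` and `miss_ratio_ok` are hypothetical, and the 5% threshold is the tolerance quoted in the text.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class CycleResult:
    """Outcome of one periodic task in one control cycle (hypothetical type)."""
    finished_in_time: bool
    output: Optional[float] = None  # value computed if the task finished


def event_output(result: CycleResult) -> Optional[float]:
    """Event-based task: a missed deadline drives no event in this period."""
    return result.output if result.finished_in_time else None


def state_output(result: CycleResult, current_state: float) -> float:
    """State-based task: a missed deadline leaves the controlled state unchanged."""
    if result.finished_in_time and result.output is not None:
        return result.output
    return current_state


def miss_ratio_ok(results: List[CycleResult], tolerance: float = 0.05) -> bool:
    """Check that the observed fraction of missed deadlines is below the tolerance."""
    if not results:
        return True  # nothing observed yet, so nothing violated
    misses = sum(not r.finished_in_time for r in results)
    return misses / len(results) < tolerance
```

The point of the sketch is that a late result is simply discarded: the application falls back to "no event" or "hold the last state", so an occasional miss degrades one cycle without propagating a stale command.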
To sum up, by using Linux with the Preempt-RT patch and making the applications tolerant of infrequent deadline misses, the real-time requirements of our system can be met [18]. Dedicated real-time Linux variants, such as RTLinux [19] and RTAI [20], are therefore not needed.
The benefit of using the mainline Linux kernel is that it makes the environment of the SMU software consistent with that on a PC, which greatly facilitates building an integrated ground and on-board software development environment. The SMU software can use the full Linux system functionality and the full hardware performance. Part of the on-board software can be developed and debugged on a desktop Linux computer and easily ported to the on-board computer environment. This greatly improves the efficiency of on-board software development, which is very important for increasingly complex on-board software [21]. Advanced applications used on ground Linux systems, such as databases and Docker, as well as mobile AI libraries, such as Tengine [22] and NCNN [23], can be conveniently ported to the SMU software, which is conducive to the development of on-board intelligence. Therefore, the high-performance SMU can adopt mainline Linux with the Preempt-RT patch as its operating system.
3.2. Linux Kernel
The Linux kernel version used in this article is 5.10. The kernel supports SPARC V8 systems with FPU and MMU in single-core and SMP configurations. The kernel consists of three parts: the Linux mainline kernel, the LEON kernel patch and the GRLIB driver package. The Linux mainline kernel is a long-term stable version of the official Linux kernel that is actively maintained by a large community and by industry. The LEON kernel patch adds LEON support to the Linux kernel and is actively developed by Cobham Gaisler. The GRLIB driver package provides support for some of the intellectual property (IP) cores of the GRLIB IP library [24]. The structure of the Linux kernel is shown in Figure 2.
3.3. Configuration
In this paper, the configuration file for the 5.10 kernel provided by Cobham Gaisler [25] is used as the basis for further configuration and tailoring and is referred to as the default configuration. Because the processor performance is high and the memory capacity is sufficient, and considering the needs of intelligent applications, the basic idea of our configuration and tailoring is not to pursue extreme size reduction or real-time performance, but to provide support for advanced applications and AI libraries and to use modularity to achieve scalability. The main points are as follows:
In terms of real-time performance, the Preempt-RT patch [26] is used to set the preemption model to the fully preemptible kernel. Some of the changes it brings are as follows: converting interrupt service routines to real-time (RT) threads with priority 50; converting soft interrupt requests (softIRQs) to RT threads with priority 49; turning spinlocks into mutexes with priority inheritance; and enabling the high-precision timer. Except for very low-level and critical code paths, all of the kernel code is preemptible, making the system more responsive. The timer frequency is set to 1000 Hz to provide fast response.
In terms of memory management, the swap function is disabled because it makes the memory access time difficult to predict and greatly affects the real-time performance of the program. The simple list of blocks (SLOB) allocator, which is specifically designed for embedded devices with small memory capacity, is used as the slab allocator.
For the file system, the random access memory file system (Ramfs) is used as the root file system, and the other file systems are removed. At present, the SMU usually has no file system, and data are stored sequentially on the non-volatile memory in pre-specified areas. Ramfs is sufficient to provide the functionality needed to run Linux-based SMU software, including the storage of the C libraries and the software itself, and thus already satisfies the basic file system requirements of an on-board Linux system. Since Ramfs is compressed in the image along with the Linux kernel and decompressed into memory during system startup, it also helps to speed up the reset of the SMU. If a file system is needed to manage the data on the non-volatile memory for advanced requirements such as an on-board database, other file system modules, such as Second Extended Filesystem (EXT2) and New Technology File System (NTFS), can be installed after the system has started, thanks to the modularity of Linux.
For the C libraries, Glibc is chosen over uClibc. Although uClibc is much smaller than Glibc, the memory capacity of a high-performance hardware platform is often more than 128 MB, which is enough to support Glibc. At the same time, because uClibc does not implement all Glibc interfaces, and ground applications and function libraries are often developed against Glibc, using uClibc would hinder porting them. While it is convenient to replace the library files in the root file system, the applications would need to be recompiled, with a risk of incompatibility, so using Glibc directly is more appropriate.
In terms of the network, network support is enabled, but the drivers for specific interface types are tailored out of the kernel image. CAN is the interface type widely used in current on-board computers, and there are more and more attempts to use Ethernet as the payload data interface, which has good application prospects. Since both depend on the network support of the Linux system, network support is enabled. Thanks to the modular design of Linux, the drivers can be installed and uninstalled as modules after the system is running, according to requirements, so they can be removed from the base image.
In terms of device drivers, only the serial port driver is kept, and the other device drivers are removed. Since interaction with the Linux console requires a serial port, the serial port driver is retained. Other device drivers can be installed as modules after the system is running, according to specific requirements.
For other aspects, future complex satellites will need in-orbit function upgrades and will use different file system or device driver modules depending on the functions of each satellite, so loadable module support is enabled in Linux to allow modules to be installed and uninstalled. Since debugging and test-related functions may affect system performance, especially on embedded systems, we disable them during the experiments. Since on-board applications are trusted and satellite-ground and inter-satellite communications are encrypted, the system operates in a secure environment, and the security module of Linux can be turned off.
After the above configuration and tailoring, the Linux system we use is smaller, has better real-time performance, is scalable and supports intelligent applications.
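As a rough illustration of how such a configuration could be checked automatically, the sketch below parses the text of a kernel `.config` file and reports options that differ from the choices described above. The mapping from each choice to a Kconfig symbol (`CONFIG_PREEMPT_RT`, `CONFIG_HIGH_RES_TIMERS`, `CONFIG_HZ_1000`, `CONFIG_SWAP`, `CONFIG_SLOB`, `CONFIG_MODULES`) is our assumption about a 5.10 kernel with the Preempt-RT patch; the actual configuration file used in this paper is not reproduced here, and `SAMPLE` is a made-up fragment.

```python
# Assumed mapping from the configuration choices in the text to Kconfig symbols.
EXPECTED = {
    "CONFIG_PREEMPT_RT": "y",       # fully preemptible kernel (Preempt-RT)
    "CONFIG_HIGH_RES_TIMERS": "y",  # high-precision timer
    "CONFIG_HZ_1000": "y",          # 1000 Hz timer frequency
    "CONFIG_SWAP": "n",             # swap disabled
    "CONFIG_SLOB": "y",             # SLOB allocator
    "CONFIG_MODULES": "y",          # loadable module support
}


def parse_config(text):
    """Parse .config text into {symbol: value}; unset symbols map to 'n'."""
    options = {}
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("# CONFIG_") and line.endswith(" is not set"):
            options[line[2:-len(" is not set")]] = "n"
        elif line.startswith("CONFIG_") and "=" in line:
            name, value = line.split("=", 1)
            options[name] = value
    return options


def check_config(text):
    """Return the symbols whose value differs from EXPECTED (missing counts as 'n')."""
    options = parse_config(text)
    return [name for name, want in EXPECTED.items()
            if options.get(name, "n") != want]


# Made-up fragment for illustration only.
SAMPLE = """\
CONFIG_PREEMPT_RT=y
CONFIG_HIGH_RES_TIMERS=y
CONFIG_HZ_1000=y
# CONFIG_SWAP is not set
CONFIG_SLOB=y
CONFIG_MODULES=y
"""

print(check_config(SAMPLE))  # an empty list means every expected option matches
```

A check of this kind could run in the build pipeline so that a later kernel update does not silently re-enable swap or change the preemption model.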
5. Experiment and Verification
To demonstrate the performance of the SMU designed in this paper, a large number of experiments were conducted to obtain performance data, and the results were compared with those of traditional methods. The space gravitational wave detection project was then taken as a case study to show the effect of the SMU designed in this paper.
5.1. Configuration
Based on the GR-CPCI-GR740 development board and Linux configured as described in Section 3, the SMU software was run in high-performance mode, and the performance data collected during the experiments were analyzed.
Due to the complexity of the applications, the operating system and the multi-core processor, accurate static analysis of the worst-case execution time (WCET) is difficult. We adopt a coarse hybrid analysis technique to measure the WCET, in which dynamic analysis is performed on the basis of a coarse static analysis of the applications. Guided by the static analysis at the functional level, the satellite state parameters in the program are set so that the executed path is as long as possible. For example, the AOCS-C module is set to the path that involves the calculation of the extended Kalman filter, and the TaskMng module is set to conduct forecast and estimation, including the power estimation for the execution of the mission, the antenna selection calculation, etc. A large number of dynamic running tests were then carried out to measure the WCET.
In terms of the operating system, the configured Linux image size is 2.50 MB, 54.4% less than the default size of 5.48 MB. The number of kernel threads after system startup is 41, which is 19.6% less than the 51 of the default configuration.
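The two percentages follow directly from the reported figures, as the short check below shows. The helper name `percent_reduction` is ours.

```python
def percent_reduction(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100.0


# Figures reported in the text: image size in MB and kernel thread count.
print(f"image size:     {percent_reduction(5.48, 2.50):.1f}%")  # 54.4%
print(f"kernel threads: {percent_reduction(51, 41):.1f}%")      # 19.6%
```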
Regarding the SMU software, the functions in high-performance mode are the most complex and cover the performance parameters required by the other modes, so the SMU software runs in high-performance mode. Because the current SMU software demands little computing power, we simulate the increased demand brought by the higher control precision of future satellites by raising the control frequency from 8 Hz to 10 Hz, based on the Space Variable Objects Monitor (SVOM) [30] SMU software. Moreover, the extended Kalman filter calculation described in [31] is added to the Orbit, AOCS-D, AOCS-C and Thermal modules. The execution time and operating cycle of each module are shown in Table 4. Because the development board has a limited number of interfaces, and for ease of testing, the Socat software was ported from the Linux PC to create virtual interfaces, and its administrative process was run on Core3, isolated from the SMU software modules.
5.2. Performance Experiment
First, the SMU is connected to the ground test computer through the serial port, and the ground test software is used to send telecommands to the SMU. The telemetry data received by the ground test system confirm that the SMU designed in this paper works properly. Then, according to the requirements of the SMU, the response time jitter, the deadline and the multi-core speedup are tested. Finally, the reliability improvement of the triplication-redundancy mode is evaluated.
5.2.1. Response Time Jitter
Each module of the SMU software is a periodic task. TaskMng is awakened by the periodic clock signal, and the periodic execution of the other modules is controlled by TaskMng. Therefore, the jitter of the response time to the periodic clock signal is very important for TaskMng and reflects the real-time performance of the system. We run the SMU software for 10,000 cycles, record and analyze the jitter of the response time, and compare the results with those of the default Linux configuration. The results are shown in Table 5. The comparison of the jitter distribution is shown in Figure 10.
Compared with the default configuration, the Linux configuration designed in this paper reduces the average response jitter of periodic signals by 3.79 µs, the maximum jitter by 36 µs and the variance of the jitter by 49.02. At the same time, the maximum jitter is less than 0.05% of the cycle length, which meets the response time jitter requirement for the cyclic execution of modules.
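The statistics reported above can be obtained from recorded wake-up timestamps along the following lines. This is a sketch under stated assumptions: jitter is defined here as the absolute deviation of each wake-up from an ideal periodic schedule anchored at the first wake-up, which may differ from the exact definition used in the experiments; the timestamps are made-up values; and the 100 ms period is assumed from the 10 Hz control frequency.

```python
from statistics import mean, pvariance


def jitter_stats(wakeups_us, period_us):
    """Summarize wake-up jitter for a periodic task.

    wakeups_us: measured wake-up timestamps in microseconds.
    period_us:  nominal period in microseconds.
    Jitter of wake-up k is |(t_k - t_0) - k * period| (assumed definition).
    """
    start = wakeups_us[0]
    jitter = [abs((t - start) - k * period_us) for k, t in enumerate(wakeups_us)]
    return {
        "mean_us": mean(jitter),
        "max_us": max(jitter),
        "variance_us2": pvariance(jitter),
        "max_fraction_of_period": max(jitter) / period_us,
    }


# Made-up timestamps for a 100 ms period (assumed from the 10 Hz control frequency).
stats = jitter_stats([0, 100_004, 199_998, 300_010, 400_000], period_us=100_000)
print(stats)
print("within 0.05% of cycle:", stats["max_fraction_of_period"] < 0.0005)
```

With the assumed 100 ms period, the 0.05% criterion corresponds to a maximum jitter of 50 µs.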
5.2.2. Deadline and Multi-Core Speedup
Since the periodic tasks need to be completed before the deadline, 10,000 cycles are run and the execution time of each cycle is recorded and analyzed. The results are compared with those obtained using only one processor core to determine the multi-core speedup. They are also compared with the results of the traditional method of designing a static scheduling table, to analyze how the scalability and generality of our proposed scheduling method affect performance. The scheduling schemes are compared in Figure 11, and the running times are compared in Table 6.
The comparison shows that our proposed scheduling scheme meets the deadline requirements in the scenario of high computing power demand, with an average maximum multi-core speedup ratio of 1.51, while that of static scheduling is 1.52. The analysis also shows that the multi-core speedup is limited because the critical path composed of the attitude and orbit control-related modules is too long, so parallel execution at the module level alone can improve the computing efficiency only to a limited extent. Therefore, the attitude and orbit control-related modules need to be parallelized in the future to further improve the multi-core speedup ratio.
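The effect of the critical path on the achievable speedup can be illustrated with the standard work/critical-path bound. The module times below are hypothetical and are not the values in Table 6; the function names are ours.

```python
def speedup(single_core_ms: float, multi_core_ms: float) -> float:
    """Measured speedup: single-core cycle time over multi-core cycle time."""
    return single_core_ms / multi_core_ms


def speedup_upper_bound(total_work_ms: float, critical_path_ms: float, cores: int) -> float:
    """Upper bound on speedup when modules are scheduled as indivisible units.

    The cycle can finish no earlier than the longest dependency chain
    (critical path), and no more than `cores` modules run at once.
    """
    return min(cores, total_work_ms / critical_path_ms)


# Hypothetical cycle: 60 ms of total module work, of which a 40 ms chain
# of attitude and orbit control modules must run in sequence.
print(speedup_upper_bound(total_work_ms=60, critical_path_ms=40, cores=3))  # 1.5

# Shortening the chain to 15 ms (e.g. by parallelizing inside those modules)
# lifts the bound until the core count becomes the limit.
print(speedup_upper_bound(total_work_ms=60, critical_path_ms=15, cores=3))  # 3
```

In this hypothetical case, adding cores does not help while the 40 ms chain remains; only shortening the chain raises the bound, which is the motivation for parallelizing the attitude and orbit control modules themselves.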
The multi-core speedup ratio of the proposed scheduling method is close to that of static scheduling. However, static scheduling has poor expandability and generality: the scheduling scheme must be redesigned whenever the tasks change, and its design requires the running time of each module per cycle. In the multi-core scenario, there are complex constraints between modules, and the running time is highly coupled with the scheduling scheme, which must be optimized iteratively to obtain good scheduling performance. At the same time, because in-orbit operation demands high reliability, long-duration tests are also needed to evaluate the reliability of the scheduling scheme. As a result, developing and debugging a static scheduling scheme can often take several weeks, which makes in-orbit code updates and in-orbit task extensions very difficult. In contrast, code updates and task extensions with our scheduling method only require specifying the appropriate partition for the module that has changed. Therefore, the proposed scheduling method is better suited to the needs of future complex satellites because of its strong expandability and generality.
5.2.3. Reliability Improvement
The triplication-redundancy mode is selected. Its reliability improvement is evaluated for the following orbits (solar minimum):
Geosynchronous Earth orbit (GEO): 36,000 km, AP-8 min for Radiation Belt Model;
Low Earth orbit (LEO): 700 km, 98.7° inclination, AP-8 min for Radiation Belt Model.
In the above environments, the device error rates per day due to heavy ions for the GR740 flight silicon are estimated with Weibull data presented in [32]. The estimated results are 7.81 × 10^(…) events/device/day (GEO) and 2.09 × 10^(…) events/device/day (LEO). Based on the device error rates, functional failure rates of triplication-redundancy mode and configuration without redundancy are estimated and compared. The results are shown in Table 7.
According to the comparison, the reliability is improved by five orders of magnitude in the GEO environment and by six orders of magnitude in the LEO environment. This shows that the triplication-redundancy mode designed in this paper improves reliability substantially.
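Why triple redundancy yields a gain of several orders of magnitude can be seen from the textbook majority-voting model sketched below. This is not necessarily the estimation method used for Table 7: it assumes three replicas that fail independently with the same probability `p` within the interval of interest, a perfect voter and no repair, and the value of `p` is hypothetical.

```python
import math


def tmr_failure_probability(p: float) -> float:
    """Probability that a 2-of-3 majority vote fails.

    The vote fails when at least two of the three replicas fail:
    3 * p^2 * (1 - p) for exactly two failures, plus p^3 for all three.
    """
    return 3 * p**2 * (1 - p) + p**3


def orders_of_magnitude_gain(p: float) -> float:
    """log10 of the ratio between non-redundant and TMR failure probability."""
    return math.log10(p / tmr_failure_probability(p))


p = 1e-5  # hypothetical per-replica failure probability, for illustration only
print(f"TMR failure probability: {tmr_failure_probability(p):.3e}")
print(f"gain: {orders_of_magnitude_gain(p):.2f} orders of magnitude")
```

For small `p` the TMR failure probability is approximately 3p², so the gain grows as the per-replica error rate falls. In this simplified model, that is consistent with a lower device error rate giving a larger improvement.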
5.3. Application Experiment
The application experiment is carried out against the background of the space gravitational wave detection project. According to the requirements of the project, and to demonstrate the generality and scalability of the SMU, the Ethernet driver is installed in Linux on top of the configuration in Section 5.1, and the drag-free control module is added to the SMU software. The running time is shown in Table 8. Furthermore, the data transmission window planning software, which is based on depth-first search and designed for the data protection period of the space gravitational wave detection project, is added on Core3 as an example of a satellite intelligent autonomous management program.
The SMU is connected to the National Instruments (NI) platform through an Ethernet port, and the semi-physical simulation test of drag-free control is carried out. According to the test results, the control accuracy stays within the target continuously. At the same time, the average calculation time of the data transmission window planning is 147 s, so the planner delivers its results within an acceptable time without interfering with the execution of the real-time modules.
Further, the running time of each cycle after the addition of the drag-free module is analyzed. The specific scheduling scheme comparison is shown in Figure 12, and the comparison result of the running time is shown in Table 9.
The comparison shows that, with the addition of the drag-free module, the average single-core running time of many cycles is close to 80 ms, with a risk of exceeding the deadline, which also shows the necessity of designing the SMU around a multi-core processor. The average maximum multi-core speedup ratio of our scheduling method is 1.91, while that of static scheduling is 1.97. This indicates that, as the workload increases, the gap between the multi-core speedup ratios of the two scheduling methods also widens gradually but remains small. The static scheduling scheme needs to be carefully redesigned, whereas our scheduling method only needs the partition of the new module to be set, which makes it more expandable and general. At the same time, neither method shows a significant increase in running time after the addition of the drag-free module, indicating that both schemes can make full use of the processor idle time caused by the attitude and orbit control critical path and thus improve the multi-core speedup ratio. The comparison of the three methods is summarized in Table 10.
It can be seen that the SMU designed in this paper can meet the requirements of future complex satellites, such as space gravitational wave detection satellites, that arise from more complex control algorithms, higher control frequencies and the intelligent autonomous management of satellites.
6. Conclusions
This paper presents the design of a high-performance general-purpose SMU. A rad-hard multi-core SoC is used to improve the on-board computing power. Linux is used to manage multi-core computing resources and make the environment of the SMU software consistent with that on a PC. Linux is configured and tailored with regard to size, real-time performance, extensibility and support for intelligent applications. The SMU software architecture with three modes is designed to make full use of multi-core processor resources in various situations. The high-performance and high-reliability modes can be switched flexibly according to the operating conditions. The software is designed to be expandable and general. Safety design for space application is also considered. In this way, the needs for computing power and software management brought by the growing scale and complexity of SMU software and by intelligent autonomous management are satisfied.
The performance experiment shows that, compared with the default configuration, the average jitter of the response to the periodic signal is reduced by 3.79 µs and the maximum jitter by 36 µs, making the maximum jitter less than 0.05% of the cycle length and indicating a significant improvement in real-time performance. Meanwhile, under the task configuration of the performance experiment, the multi-core speedup ratio of the proposed scheduling method is 1.51, only 0.01 less than that of static scheduling, while the method offers higher scalability and generality. Based on the low device error rate of the GR740 due to heavy ions, the triplication-redundancy mode is estimated to further improve the reliability in the GEO environment by five orders of magnitude. Against the background of the space gravitational wave detection project, the application experiment and the semi-physical simulation of drag-free control further verify that the SMU designed in this paper can meet the needs of future complex satellites.
Future work includes applying the high-performance general-purpose SMU described in this paper to the new-technology experiment satellite of our institute and continuously improving its design through in-orbit application. Eventually, we expect to provide more effective support for various types of spacecraft, meeting the requirements of autonomous management by an intelligent computing system so that more challenging and scientifically valuable tasks can be completed.