Article

LTE: Lightweight Transformer Encoder for Orbit Prediction

by Seungwon Jeong 1 and Youjin Shin 2,*
1 Department of Data Science, Sejong University, Seoul 05006, Republic of Korea
2 Department of Data Science, The Catholic University of Korea, Bucheon 14662, Republic of Korea
* Author to whom correspondence should be addressed.
Electronics 2024, 13(22), 4371; https://doi.org/10.3390/electronics13224371
Submission received: 30 September 2024 / Revised: 4 November 2024 / Accepted: 6 November 2024 / Published: 7 November 2024

Abstract:
As the focus of space exploration has recently shifted from national efforts to private enterprises, interest in the space industry has increased. With the rising number of satellite launches, the risk of collisions between satellites and between satellites and space debris has grown, which can lead not only to property damage but also to casualties caused by falling debris. To address this issue, various machine learning and deep learning-based methods have been researched to improve the accuracy of satellite orbit prediction and mitigate these risks. However, most studies have applied basic machine learning models to orbit prediction without considering the model size and execution time, even though satellite operations require lightweight models that offer both strong prediction performance and rapid execution. In this study, we propose a time series forecasting framework, the Lightweight Transformer Encoder (LTE), for satellite orbit prediction. The LTE is a prediction model that modifies the encoder structure of the Transformer model to enhance the accuracy of satellite orbit prediction and reduce the computational resources used. To evaluate its performance, we conducted experiments using about 4.8 million data points collected every minute from January 2016 to December 2018 by the KOMPSAT-3, KOMPSAT-3A, and KOMPSAT-5 satellites, which are part of the Korea Multi-Purpose Satellite (KOMPSAT) series operated by the Korea Aerospace Research Institute (KARI). We compare the performance of our model against various baseline models in terms of prediction error, execution time, and the number of parameters used. Our LTE model demonstrates significant improvements: it reduces the orbit prediction error by 50.61% in the KOMPSAT-3 dataset, 42.40% in the KOMPSAT-3A dataset, and 30.00% in the KOMPSAT-5 dataset compared to the next-best-performing model. Additionally, in the KOMPSAT-3 dataset, it decreases the execution time by 36.86% (from 1731 to 1093 s) and lowers the number of parameters by 2.33% compared to the next-best-performing model.

1. Introduction

Satellites can be categorized into Geostationary Earth Orbit (GEO), Medium Earth Orbit (MEO), and Low Earth Orbit (LEO) satellites, with each serving different purposes. GEO satellites, positioned at an altitude of around 35,786 km, remain fixed relative to a point on Earth and are primarily used for communication, weather monitoring, and broadcasting services. MEO satellites, typically ranging from 2000 to 35,786 km, are often used for navigation systems like GPS and regional communication services. LEO satellites, operating at altitudes between 160 and 2000 km, are primarily deployed by private space exploration companies for services such as Earth observation, broadband internet, and satellite communication, as they offer lower latency and quicker data transmission compared to higher-orbit satellites. In recent years, the cost of launching LEO satellites has decreased dramatically, drawing significant attention from private space exploration companies such as SpaceX [1], Blue Origin [2], Rocket Lab [3], and OneWeb [4]. For instance, SpaceX plans to regularly launch 60 satellites each month until it deploys up to 30,000 satellites [5]. This massive increase in satellite deployments increases the risk of collisions between satellites and with space debris, which could result in property damage and even casualties due to falling debris. As a result, accurate satellite orbit prediction has become crucial for preventing these risks.
Conventionally, orbit prediction algorithms have relied on statistical and physical methods [6,7,8]. However, in recent years, machine learning and deep learning methods have been introduced to enhance prediction accuracy. Some notable studies have employed conventional machine learning architectures [9,10,11,12,13], but these studies were limited by their use of datasets obtained from simulations, which do not consider the irregular patterns that occur during actual satellite operations. For example, Peng and Bai [9] noted that the reason for the failure of physics-based prediction models is that they are not based on historical data of the space environment and resident space objects. Therefore, physics-based prediction models cannot learn the patterns of orbit prediction errors. For this reason, the authors proposed an approach using the support vector machine (SVM), a machine learning model, to learn the patterns of orbit prediction errors based on historical data. They demonstrated that the SVM shows a better prediction performance compared to physics-based prediction models, and the model’s performance can be further improved by adding more training data.
Also, Saxena et al. [11] used the Two-Line Element (TLE) [14] set provided by the National Aeronautics and Space Administration (NASA) and the North American Aerospace Defense Command (NORAD) for the orbit prediction of Resident Space Object (RSO) satellites. They proposed a methodology that combines a machine learning model, specifically the Gradient-Boosting Regression Tree (GBRT), with a physics-based prediction model. They built a physics model that predicts future data points from historical data points. The prediction errors obtained from the physics model were fed into the GBRT model along with the points’ actual values, so that the GBRT model learns to estimate the errors made by the physics model. As a result, the physics model improves its performance by adjusting its predictions using the prediction error estimated by the GBRT model. They demonstrated that their model outperformed Artificial Neural Network (ANN) and Gaussian Process (GP) models. The research carried out by Peng and Bai, which uses only a machine learning model, and the research performed by Saxena et al., which combines both a physics model and a machine learning model, each improved the performance of a prediction model using historical data. However, these approaches remain close to traditional numerical optimization and statistical techniques and thus cannot effectively handle large datasets or complex patterns.
Other research has explored recurrent neural network (RNN)-based models to predict satellite orbits and spatial targets [15]. Moreover, several studies have applied Long Short-Term Memory (LSTM) models, significantly reducing orbit prediction errors [16,17,18,19,20,21,22]. For example, Qu and Wei [15] predicted the orbit of spatial targets using an LSTM-based model combined with the Particle Swarm Optimization (PSO) algorithm. The PSO algorithm simulates the movement of particles in the search space to find the optimal solution within that space. The PSO-LSTM model calculates the current fitness value of space objects during training based on the mean absolute error, and it determines the position and velocity of space objects. Through this approach, the authors demonstrated that the PSO-LSTM model outperforms the traditional LSTM model in prediction performance. Osama et al. [19] aimed to predict satellite orbits using TLE set conversions based on LSTM. They preprocessed historical data from the TLE set to obtain velocity and position vectors, aiming to predict the satellite’s orbit using these two values. Then, they used the velocity vector as input to the LSTM model to predict the position vector and vice versa. The authors demonstrated that the proposed approach predicted both position and velocity vectors with a high accuracy of 98%. However, both studies utilized LSTM to predict the orbits of spatial targets or the orbits of satellites but did not consider factors like execution time or model parameter counts.
More recently, growing attention has been directed toward orbit prediction using the Transformer model, which has demonstrated remarkable performance in Natural Language Processing (NLP) as well as time series forecasting. Introduced in 2017, the Transformer model [23] is useful in parallel processing as it performs independent calculations across all elements in a sequence. Its attention mechanism captures the interactions between all positions in the input sequence, providing a more comprehensive context compared to earlier time series models.
In this work, we leverage Transformer Networks for orbit prediction, as these have demonstrated promising results in time series prediction. We propose the Lightweight Transformer Encoder (LTE) model, which predicts satellite orbits with high performance while reducing execution time and model parameters. Our main contributions are as follows: We introduce the LTE model, specifically adapted for satellite orbit prediction by modifying the Transformer Encoder structure. The modifications include the removal of Positional Encoding (PE) and Layer Normalization (LN) to prevent the distortion of input data and over-normalization. Additionally, using real-world data from the Korea Multi-Purpose Satellite (KOMPSAT) series, the LTE model showed significant improvements, reducing the Mean Squared Error (MSE) by up to 50.61% over comparable models. It also showed efficiency gains, with reduced execution time and model parameters. Finally, our analysis across different configurations of PE and LN further supports the LTE model’s suitability for orbital datasets, demonstrating its predictive accuracy and efficiency.
The structure of this paper is as follows: Section 2 introduces the dataset, preprocessing, model architecture, experimental setup, and baseline models. Section 3 presents an analysis of the prediction performance of the LTE model, along with its efficiency in terms of execution time and parameters, as well as the impact of layer removal on the LTE model. Section 4 discusses the limitations and future work. Finally, Section 5 provides a summary of the study and presents our key findings.

2. Materials and Methods

In this section, we provide a detailed description of our LTE model for orbit prediction. The LTE model is composed of three main parts, and its overall architecture is depicted in Figure 1: Step (a) is collecting the TLE set from operational satellites, Step (b) is preprocessing the collected data to make them suitable for orbit prediction, and Step (c) is constructing the LTE architecture for orbit prediction. In particular, we describe the dataset used in our experiment, including the types and characteristics of the data, and present detailed information about the satellites from which the data were collected in Section 2.1. In Section 2.2, we explain the preprocessing step, focusing on the extraction of six orbital elements from the TLE set and their subsequent normalization. Then, in Section 2.3, we introduce the detailed architecture of the LTE model, which includes a modified Transformer Encoder. Finally, in Section 2.4, we introduce the experimental settings, including the baseline models and hyperparameters used for comparison with the LTE model.

2.1. Dataset

The Korea Aerospace Research Institute (KARI) [24], established in 1989, is South Korea’s national aerospace research institute. It was founded to explore, develop, and disseminate new aviation and aerospace science and technology. Among its projects is the Korea Multi-Purpose Satellite (KOMPSAT) series, which was designed to observe the surface of South Korea in high resolution to enhance national security and public services. In this study, we utilized orbit data from the KOMPSAT-3 (K3), KOMPSAT-3A (K3A), and KOMPSAT-5 (K5) satellites, consisting of approximately 4.8 million data points collected every minute from January 2016 to December 2018.
Measuring the position and velocity of a satellite directly from the ground is difficult. In addition, unexpected perturbations arise from various dynamical forces such as Earth’s gravitational potential, the gravitational pull of the sun and moon, and solar radiation pressure. To address these challenges, the TLE set is used to predict the orbit of moving satellites. The TLE set is a standard format for representing the orbital parameters of Earth-orbiting satellites, encoding information in two lines that describe an object’s orbit through 12 specific elements: the Satellite number, International designator, Epoch, First derivative of mean motion, BSTAR drag term, Eccentricity, Inclination, Right Ascension of the Ascending Node (RAAN), Argument of perigee, Mean anomaly, Mean motion, and Revolution number. Among these, we used the six orbital elements most relevant to satellite position and velocity for model training. The preprocessing steps for extracting these six orbital elements from the TLE data are detailed in Section 2.2.
To understand the characteristics of these six orbital elements, we displayed the elements from the K3 satellite dataset within the data points from 300,000 to 302,000, as shown in Figure 2. In the case of RAAN, a different range of data points (300,000 to 1,700,000) was provided since it has a long-period repetitive pattern compared to the other elements. Each element exhibits a unique period and wave shape, and a repetitive pattern is observed within each element individually. However, the data show subtle deviations in fluctuation even within these repetitive patterns, and the errors caused by these subtle deviations in fluctuation can lead to major satellite orbit changes.

2.2. Data Preprocessing

There are various mathematical representations of the same orbit, but specific parameter sets of six orbital elements are commonly used in fields like astronomy and orbital mechanics. These traditional orbital parameters are known as the six Keplerian elements, named after Johannes Kepler and his laws of planetary motion [25]: the Semi-major axis, Eccentricity, Inclination, RAAN, the Argument of perigee, and True anomaly. Although True anomaly is the parameter used for precisely indicating a satellite’s position along its orbit, it is challenging to calculate due to the nonlinear motion of objects in elliptical orbits. Instead, Mean anomaly is often used, as it increases linearly over time, providing a simpler and more predictable measure of position. The Semi-major axis is not included in the TLE set directly. However, by using Kepler’s third law [26], which states that the square of an orbital period is proportional to the cube of the Semi-major axis, we can substitute Mean motion with the Semi-major axis [27]. Thus, we can use the Mean anomaly of the TLE set instead of the True anomaly and calculate the Semi-major axis from the Mean motion of the TLE set. As a result, we extracted six elements from the TLE set for satellite orbit prediction, and their descriptions are provided in Table 1.
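As a concrete illustration of this step, the sketch below reads the six elements from line 2 of a TLE record and derives the Semi-major axis from the Mean motion via Kepler’s third law. The fixed column positions follow the standard NORAD line-2 layout, and the function name, returned keys, and use of Python are our own choices rather than the authors’ preprocessing code.

```python
import math

MU_EARTH = 398600.4418  # Earth's gravitational parameter, km^3/s^2

def tle_line2_to_elements(line2: str) -> dict:
    """Extract the six orbital elements used in this study from TLE line 2."""
    inclination = float(line2[8:16])                    # degrees
    raan = float(line2[17:25])                          # degrees
    eccentricity = float("0." + line2[26:33].strip())   # leading decimal point is implied
    arg_perigee = float(line2[34:42])                   # degrees
    mean_anomaly = float(line2[43:51])                  # degrees
    mean_motion_rev_day = float(line2[52:63])           # revolutions per day

    # Kepler's third law, n^2 a^3 = mu, rearranged as a = (mu / n^2)^(1/3),
    # converts Mean motion (rev/day -> rad/s) into the Semi-major axis (km).
    n_rad_s = mean_motion_rev_day * 2.0 * math.pi / 86400.0
    semi_major_axis = (MU_EARTH / n_rad_s ** 2) ** (1.0 / 3.0)

    return {"S": semi_major_axis, "E": eccentricity, "I": inclination,
            "R": raan, "A": arg_perigee, "M": mean_anomaly}
```

For a roughly 98 min low Earth orbit such as K3’s, this conversion yields a Semi-major axis of about 7060 km, in line with the ranges reported in Table 1.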
Given a time $t$, if $S_t, E_t, I_t, R_t, A_t, M_t$ denote the six orbital elements extracted from the TLE set—namely, the Semi-major axis, Eccentricity, Inclination, RAAN, Argument of perigee, and Mean anomaly—then the final vector $X_t$ can be expressed using the following formula:
$$X_t = (S_t, E_t, I_t, R_t, A_t, M_t), \quad t = 0, \ldots, n \tag{1}$$
These six orbital elements collected from the TLE set of each satellite exhibit values that have different ranges. The ranges of these six orbital elements in each dataset are provided in Table 1. As shown in this table, Eccentricity ranges from 0.000016 to 0.004229, while the Semi-major axis ranges from 6892.097 to 6915.121. As a result, the influence of Eccentricity is weaker compared to the Semi-major axis during model training. To address this, we applied normalization to the data of the six orbital elements collected from the TLE sets of the K3, K3A, and K5 satellites. Each orbital element was normalized by subtracting the minimum value from each data point and then dividing by the range, which is computed as the difference between the maximum and minimum values. The equation for normalization is shown in Equation (2). Here, $Z$ represents the normalized value, $x$ denotes the individual data point, $\min(x)$ refers to the minimum value within the column containing the data point to be normalized, and $\max(x)$ refers to the maximum value within the same column.
$$Z = \frac{x - \min(x)}{\max(x) - \min(x)} \tag{2}$$
We note that 2016 is a leap year, which occurs once every four years and includes an extra day, 29 February, so data measured on this day exist. This single day’s data are excluded from our research to develop a more consistent model, as including them would require the manual adjustment of inputs for that day every leap year. By excluding these data, we ensure uniformity across all years and enhance the model’s generalization capability.
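The two preprocessing steps above, min-max normalization and the removal of leap-day records, can be summarized in a short sketch. It assumes the elements are held in a pandas DataFrame with a DatetimeIndex and one column per element; the column layout and function name are illustrative, not the authors’ pipeline.

```python
import pandas as pd

def preprocess(df: pd.DataFrame) -> pd.DataFrame:
    """Drop 29 February records and min-max normalize each orbital element."""
    # Exclude leap-day measurements so that every year has the same length.
    df = df[~((df.index.month == 2) & (df.index.day == 29))]
    # Equation (2): Z = (x - min(x)) / (max(x) - min(x)), applied per column.
    return (df - df.min()) / (df.max() - df.min())
```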

2.3. Prediction Model

In Figure 2, each element of the dataset exhibits a unique period and wave shape, and a repetitive pattern is observed within each element individually. This suggests that an original Transformer with a complex structure, containing numerous parameters and layers, may not be necessary for orbit prediction. Therefore, we leverage only the encoder of the Transformer for our LTE model, excluding the decoder, and further removing unnecessary layers to better fit the characteristics of the dataset. Figure 3a represents the structure of the Transformer encoder, while Figure 3b illustrates the structure of the LTE model. The Transformer encoder applies PE to the input $Y$ and passes it through the Multi-Head Attention layer. The output $Z$ from the Multi-Head Attention layer is fed through a residual connection and the LN layer to obtain $Z'$. Subsequently, it passes through a Feed-Forward layer and once again through a residual connection and the LN layer to obtain the final output $Y'$.
Unlike the Transformer Encoder, the LTE model passes the input $Y$ through the Multi-Head Attention layer without applying PE. The value $Z$ that passes through the Multi-Head Attention layer is added to the input $Y$ in the residual connection stage, resulting in $Z'$. At this point, the Transformer encoder applies both a residual connection and LN, but the LTE model does not apply LN. Next, $Z'$ passes through the Feed-Forward layer and then through another residual connection stage to obtain the final output $Y'$. Here, too, the LTE model applies only a residual connection, without applying LN. The final output, $Y'$, represents the prediction of the six elements included in the input $Y$ at a given time.
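To make this data flow concrete, the following is a minimal PyTorch sketch of the LTE block, written as we understand Figure 3b rather than taken from the authors’ implementation; the use of nn.MultiheadAttention, the layer sizes, and the use_pe/use_ln flags (both defaulting to False, and kept only so the Section 3.3 ablation variants can be reproduced) are our assumptions.

```python
import torch
import torch.nn as nn


def sinusoidal_pe(seq_len: int, d_model: int) -> torch.Tensor:
    """Sine/cosine positional encoding of the original Transformer (even d_model
    assumed); kept only for the ablations, since the LTE itself does not use PE."""
    pos = torch.arange(seq_len, dtype=torch.float32).unsqueeze(1)
    div = torch.pow(10000.0, torch.arange(0, d_model, 2, dtype=torch.float32) / d_model)
    pe = torch.zeros(seq_len, d_model)
    pe[:, 0::2] = torch.sin(pos / div)
    pe[:, 1::2] = torch.cos(pos / div)
    return pe


class LTEBlock(nn.Module):
    """Lightweight Transformer Encoder block: Multi-Head Attention and
    Feed-Forward sublayers joined by residual connections only."""

    def __init__(self, dim: int = 6, n_head: int = 3, hidden: int = 64,
                 dropout: float = 0.1, use_pe: bool = False, use_ln: bool = False):
        super().__init__()
        self.use_pe = use_pe
        self.attn = nn.MultiheadAttention(dim, n_head, dropout=dropout,
                                          batch_first=True)
        self.ff = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(),
                                nn.Linear(hidden, dim))
        self.ln1 = nn.LayerNorm(dim) if use_ln else nn.Identity()
        self.ln2 = nn.LayerNorm(dim) if use_ln else nn.Identity()

    def forward(self, y: torch.Tensor) -> torch.Tensor:  # y: (batch, seq_len, dim)
        if self.use_pe:
            y = y + sinusoidal_pe(y.size(1), y.size(2)).to(y.device)
        z, _ = self.attn(y, y, y)         # Multi-Head Attention
        z = self.ln1(y + z)               # residual connection (LN skipped in the LTE)
        return self.ln2(z + self.ff(z))   # Feed-Forward + residual connection
```

With both flags left at False, the forward pass reduces to exactly the sequence described above: attention, residual connection, Feed-Forward, residual connection.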
PE is a method used in the original Transformer model to encode the position of each token in a sequence, since Transformers process input data in parallel and do not inherently track order like RNNs. By adding positional information to each token’s embedding, using sine and cosine functions at different frequencies, the model can capture both short-term and long-term dependencies in the sequence. This allows the Transformer to understand the order of tokens and properly model relationships across time or sequence positions. However, in cases where the data exhibit repetitive patterns or do not require complex positional relationships, PE can unnecessarily complicate the model by introducing artificial patterns and higher frequencies, which can lead to overfitting or poor generalization. Therefore, careful consideration should be given to whether PE is truly beneficial for a given task.
In Figure 4, we can observe the effect of applying PE to the Semi-major axis data. The yellow line, representing the original data without PE, shows a smooth and regular pattern with a relatively low frequency, which aligns with the natural fluctuation expected from satellite orbit dynamics. However, when PE is applied, as shown by the blue line, the data become significantly more complex, introducing higher-frequency components and irregular patterns. This added complexity distorts the original data, altering the natural characteristics of the Semi-major axis. Such distortion can mislead the model during training, causing it to focus on artificial frequencies rather than true orbital patterns. This not only complicates the learning process but also risks overfitting, where the model performs well on the distorted training data but fails to generalize to new, unseen data. Consequently, applying PE in our dataset may degrade the model’s performance, making it harder to accurately forecast satellite orbits. The use of PE for this type of data is therefore not appropriate, as it introduces unnecessary complexity and reduces the overall effectiveness of the model. For these reasons, we removed PE from our model. This allows the model to focus on the inherent characteristics of the orbital data, improving both its learning efficiency and generalization to unseen data.
Additionally, we removed LN. The decision to eliminate LN was motivated by the characteristics of our dataset, which exhibits regular and repetitive patterns. Since our data are already normalized through a preprocessing step, LN may add unnecessary complexity and potentially over-regularize the model, limiting its ability to fully learn underlying patterns. By removing this component, we aimed to simplify the model and allow it to focus more effectively on learning the core patterns in the data. Furthermore, the absence of LN helps to improve the gradient flow, allowing the model to train faster and more freely without being overly constrained by normalization. As a result, the removal of LN led to better performance, demonstrating that in the case of simple, pre-normalized datasets like ours, additional normalization layers may not always be necessary and can sometimes hinder model optimization. The efficiency analysis of removing PE and LN is presented in Section 3.3.
We now explain the process of the LTE in terms of its algorithms. The multivariate sequence input $Y \in \mathbb{R}^{B \times N}$ is projected into $Q \in \mathbb{R}^{B \times N}$, $K \in \mathbb{R}^{B \times N}$, and $V \in \mathbb{R}^{B \times N}$, which are combined as in Equation (3) [23]. Here, $B$ denotes the batch size and $N$ the dimension of the dataset. First, the dot product of $Q$ and $K^T$ is computed, and this value is divided by $\sqrt{d_k}$ to reduce the variance of the weights caused by the dot product. The definition of $d_k$ is given together with Equation (4) in the next paragraph. The softmax function normalizes the scaled $QK^T$ scores so that they sum to 1 and assigns a relative importance to each value when they are combined with $V$.
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{QK^{T}}{\sqrt{d_k}}\right)V \tag{3}$$
The Transformer model uses Multi-Head Attention, a variant of Attention, which allows the model to simultaneously attend to different subspaces of information at various positions, enabling it to learn the data’s information more comprehensively. The computation of Multi-Head Attention begins by dividing the input into multiple heads, each performing independent attention operations. The number-of-heads parameter ($n_{head}$) determines how many attention heads are used, allowing the model to distribute its focus across different parts of the input and learn diverse representations. Division into multiple heads enables the model to capture more detailed relationships within the data. To perform the operations of Multi-Head Attention, we first divide $Q$, $K$, and $V$ into $Q_i \in \mathbb{R}^{B \times \frac{N}{n_{head}}}$, $K_i \in \mathbb{R}^{B \times \frac{N}{n_{head}}}$, and $V_i \in \mathbb{R}^{B \times \frac{N}{n_{head}}}$. Here, $B$ is the batch size, $N$ is the dimension of the dataset, and $n_{head}$ is the number of heads used to divide the data during Multi-Head Attention. We then apply Equation (4) [23]. We can now define $d_k$ in Equation (3), which refers to the data dimension $N$ divided by $n_{head}$.
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \mathrm{head}_2, \ldots, \mathrm{head}_h)W^{O}, \quad \text{where } \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V}) \tag{4}$$
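For readers who prefer code to notation, the sketch below restates Equations (3) and (4); the (batch, sequence length, $N$) shape convention and the omission of the learned projections $W_i^Q$, $W_i^K$, $W_i^V$, and $W^O$ are simplifications on our part.

```python
import math
import torch

def scaled_dot_product_attention(Q, K, V):
    """Equation (3): softmax(Q K^T / sqrt(d_k)) V, applied over the last two dimensions."""
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / math.sqrt(d_k)
    return torch.softmax(scores, dim=-1) @ V

def multi_head_attention(Q, K, V, n_head):
    """Equation (4) without the learned projections: split the feature dimension N
    into n_head subspaces, attend in each independently, and concatenate the heads."""
    B, L, N = Q.shape
    assert N % n_head == 0, "n_head must divide the data dimension N"
    d = N // n_head                                              # d_k in Equation (3)
    split = lambda x: x.view(B, L, n_head, d).transpose(1, 2)    # (B, n_head, L, d)
    heads = scaled_dot_product_attention(split(Q), split(K), split(V))
    return heads.transpose(1, 2).reshape(B, L, N)                # Concat(head_1, ..., head_h)
```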
The values that pass through the Multi-Head Attention layer pass through a residual connection layer, followed by the Feed-Forward layer, and then through another residual connection layer to finally obtain the predicted result $Y' \in \mathbb{R}^{B \times N}$. For the loss function used in training the prediction model, we employ the MSE, as shown in Equation (5). Here, $Y_i$ denotes the actual data value, $Y'_i$ represents the predicted data value, and $n$ is the length of the entire dataset.
$$\mathrm{MSE} = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - Y'_i\right)^2 \tag{5}$$

2.4. Experiments

For model training, we used approximately 4.8 million data points collected from the K3, K3A, and K5 satellites from January 2016 to December 2018. The dataset was divided by setting January to December 2016 as the training dataset, January to December 2017 as the validation dataset, and January to December 2018 as the test dataset. All the models in the experiments utilized the same training, validation, and test data. In Section 3.1, we present the results of comparing the prediction performance of our LTE model with that of various baseline models, along with comparisons of the execution time and the number of parameters used against those of Transformer-based models. The baseline models used in our experiments are listed below:
  • Support Vector Regressor (SVR) [28]: A machine learning model that uses support vector machines (SVMs) for regression tasks by finding the best-fitting hyperplane that minimizes the error within a predefined threshold.
  • eXtreme Gradient Boosting Regressor (XGBR) [29]: An ensemble learning method based on gradient boosting, which builds a series of weak learners (decision trees) to improve predictive performance.
  • Long Short-Term Memory (LSTM) [30]: A type of recurrent neural network (RNN) designed to model sequences and time series data by effectively learning long-term dependencies and preventing the vanishing gradient problem.
  • Variational AutoEncoder (VAE) [31]: A generative model used for unsupervised learning that learns a probabilistic representation of the data by encoding them into a lower-dimensional space and then decoding them back into their original format.
  • Gated Recurrent Unit (GRU) [32]: Another RNN variant similar to LSTM, but with a simpler architecture, using fewer gates to capture dependencies in sequential data while also addressing the vanishing gradient problem.
  • Bi-directional Long Short-Term Memory (Bi-LSTM) [33]: An extension of LSTM that processes input sequences in both forward and backward directions, improving its ability to capture context from both past and future information in sequential data.
  • Transformer [23]: A neural network architecture based on attention mechanisms that process sequential data without relying on recurrence, making it highly effective for tasks like machine translation and time series prediction.
  • Transformer (encoder only): A model that separates the encoder structure from the Transformer architecture. The model first applies PE to the input, followed by Multi-Head Attention, residual connections, LN, and a Feed-Forward layer.
  • Transformer (decoder only): A model that separates the decoder structure from the Transformer architecture. The model first applies PE to the input, followed by Multi-Head Attention, residual connections, and LN. Then, it passes through another Multi-Head Attention layer, residual connections, and LN, before passing through a Feed-Forward layer.
We can categorize these 9 models into three groups: (1) machine learning-based models (SVR and XGBR), (2) conventional neural networks (LSTM, VAE, GRU, and Bi-LSTM), and (3) Transformer-based models (Transformer, Transformer (encoder only), and Transformer (decoder only)). Our experiments were conducted in different configurations within each group using various hyperparameters. Also, each experiment was repeated five times, and the average of the five results was recorded in the result table.
All models, except for the machine learning-based models, used the Adam optimizer with a learning rate of 0.001, and the activation function was set to a Rectified Linear Unit (ReLU). Additionally, the batch size was set to 60, the number of hidden units to 64, and the dropout rate to 0.1. In the Transformer-based models, $n_{head}$ for Multi-Head Attention was set to 3 and 6, and the number of Transformer encoder and decoder layers was set to 1.
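For reference, a training loop under these settings might look as follows. LTEBlock refers to the sketch in Section 2.3, the random tensors merely stand in for the normalized 2016 training windows (2017 and 2018 being reserved for validation and testing), and the 50-epoch budget follows the recommendation given later in Section 4; none of this is the authors’ exact script.

```python
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

# Settings from Section 2.4: Adam, lr 0.001, ReLU, batch size 60, 64 hidden units,
# dropout 0.1, n_head = 3, one encoder layer. MSE is the loss of Equation (5).
model = LTEBlock(dim=6, n_head=3, hidden=64, dropout=0.1)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
criterion = nn.MSELoss()

x = torch.rand(1024, 60, 6)   # placeholder for normalized input windows
y = torch.rand(1024, 60, 6)   # placeholder for prediction targets
train_loader = DataLoader(TensorDataset(x, y), batch_size=60, shuffle=False)

for epoch in range(50):
    for y_in, y_target in train_loader:
        optimizer.zero_grad()
        loss = criterion(model(y_in), y_target)
        loss.backward()
        optimizer.step()
```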

3. Results

In this section, we present and discuss the experimental results of the LTE model and compare them with those of the baseline models. In Section 3.1, we explain the prediction performance of the LTE model in comparison with the 12 baseline models, and in Section 3.2 we compare the execution time and the number of parameters between Transformer-based models, including our LTE model. In Section 3.3, we analyze the efficiency of the presence or absence of PE and LN.

3.1. Prediction Performance

Table 2 displays the prediction errors of the LTE and the baseline models. Excluding the header row, rows 1 and 2 (SVR, XGBR) represent the machine learning-based models, rows 3 to 6 (LSTM, VAE, GRU, Bi-LSTM) represent the conventional deep learning models, and the eight models from row 7 to the last row are Transformer-based models. Due to the characteristics of satellite orbits, even an error range of 0.001 can translate to actual orbital errors of tens to hundreds of kilometers. Therefore, our experiments present results with precision up to the seventh decimal place to ensure very accurate measurements of errors. The value in bold with a checkmark indicates the best-performing model, and the value of the second-best-performing model is indicated in bold without a checkmark. Additionally, the underlined value is the best-performing model excluding our LTE model. “Improvement” refers to the percentage of accuracy improvement between the best-performing model (bold with a checkmark) and the best-performing model except ours (underlined). The prediction error is calculated using the MSE, with lower values indicating a better performance. Also, ‘h3’ means that $n_{head}$ is set to 3, while ‘h6’ indicates that $n_{head}$ is set to 6.
Performance on K3: The LTE model consistently performed well regardless of the $n_{head}$ value, with the best performance achieved by the LTE-h3 model, yielding a prediction error of 0.0000241. The second-best-performing model is LTE-h6, with a prediction error of 0.0000292. Among a total of 12 models, excluding our models, Transformer-h3 (encoder only) was the best-performing model, with a prediction error of 0.0000488. As a result, LTE-h3 showed a 50.61% improvement compared to Transformer-h3 (encoder only).
Performance on K3A: Similarly, the LTE model performs well across all $n_{head}$ values. LTE-h6 had the lowest prediction error of 0.0000072, followed by LTE-h3 with 0.0000107. The best-performing model excluding our models was Transformer-h6 (encoder only), with a prediction error of 0.0000125. As a consequence, LTE-h6 achieved a 42.40% improvement over Transformer-h6 (encoder only).
Performance on K5: For K5, the LTE model consistently performed well regardless of the $n_{head}$ value. The best-performing model was LTE-h3, with a prediction error of 0.0000007, followed by LTE-h6 with a prediction error of 0.0000008. Among a total of 12 models, excluding ours, Transformer-h3 (encoder only) performed best, with a prediction error of 0.0000010. Consequently, LTE-h3 demonstrated a 30.00% improvement over Transformer-h3 (encoder only).
In summary, our LTE models rank as the best (bold with a checkmark) and the second best (bold without a checkmark) among a total of 14 models. The best-performing model (bold with a checkmark) demonstrates a substantial improvement across datasets, significantly reducing prediction errors compared to the best-performing model excluding our models (underlined). The results clearly show that LTE models outperform all other models in their prediction performance. The LTE model achieves excellent results with both three and six attention heads, suggesting the model’s robustness. Additionally, not only for the LTE model but also for other algorithms (e.g., Transformer (encoder only)), the overall performance was generally higher when $n_{head}$ was set to three rather than six. This demonstrates that when using less complex datasets, capturing relations between elements in a simplified way is more effective than capturing more complex relations.

3.2. Efficiency Analysis: Execution Time and Parameters

Table 3 compares the performance of four Transformer-based models based on their execution time and the number of parameters. The Transformer model has the longest execution time, at 2779 s, and the largest number of parameters, at 2282. The Transformer (decoder only) records an execution time of 1731 s with 1174 parameters, while the Transformer (encoder only) records an execution time of 1863 s with 1030 parameters. Our LTE model demonstrates the best performance, achieving the fastest execution time of 1093 s, which is a 36.86% improvement compared to the Transformer (decoder only), which achieves the second-fastest execution time of 1731 s. In terms of the number of parameters, the LTE model has 1006 parameters, the lowest among all models, representing a 2.33% reduction compared to the Transformer (encoder only) model, which has the second-lowest number of parameters at 1030. These results indicate that our LTE model not only offers excellent prediction performance but is also more efficient, with both shorter execution times and fewer parameters.
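Parameter counts such as those in Table 3 can be obtained by summing the sizes of all trainable tensors; the helper below is the standard PyTorch idiom rather than the authors’ measurement script, and exact totals naturally depend on implementation details.

```python
import torch.nn as nn

def count_parameters(model: nn.Module) -> int:
    """Total number of trainable parameters, as compared in Table 3."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# Example: count_parameters(LTEBlock(dim=6, n_head=3, hidden=64))
```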

3.3. Efficiency Analysis: Removing Layer Normalization and Positional Encoding

To verify the impact of removing PE and LN on the prediction performance of our model, we conducted an efficiency analysis on the presence or absence of PE and LN. Figure 5 is a graph comparing the values (0 to 400 data points) predicted by our model, LTE-h3, which has no PE or LN, with the ground truth values. The blue lines present the predicted values, and the yellow lines show the ground truth values. As shown in Figure 5, it can be confirmed that there is almost no difference between the actual data and the predicted data.
On the other hand, Figure 6 is a graph comparing the values (0 to 400 data points) predicted by the LTE-h3 + PE model, which includes PE but no LN, with the ground truth values. As depicted in this figure, a distinct difference (prediction error) between the ground truth values and the predicted values can be observed. Similarly, Figure 7 is a graph comparing the values (0 to 400 data points) predicted by the LTE-h3 + LN model, which includes LN but no PE, with the ground truth values. As in Figure 6, a clear difference (prediction error) between the ground truth values and the predicted values can be observed. These graphs visually show that removing PE and LN leads to better performance.
Table 4 provides the exact numerical performance of various configurations of the LTE model, including different combinations of PE and LN, with different numbers of attention heads ($n_{head}$). The best-performing model is indicated in bold with a checkmark, while the second-best model is indicated in bold without a checkmark.
The results clearly show that the best performance is achieved by our model, LTE-h3, without either PE or LN, when the number of heads is three. This configuration produces a prediction error of 0.0000238, demonstrating the superior performance of this simpler model. The second-best model was also our model, LTE-h6, without either PE or LN, when the number of heads is six, resulting in a prediction error of 0.0000288.
Excluding the LTE model, the best-performing model is the Transformer encoder, with both PE and LN included, which achieved a prediction error of 0.0000299 with six heads (LTE-h6 + PE + LN). The next-best-performing model is also the Transformer encoder with both PE and LN, but with three heads (LTE-h3 + PE + LN), which achieved a prediction error of 0.0000499. This result suggests that while the Transformer encoder with both modules performs reasonably well, it cannot outperform our LTE model.
On the other hand, the performance of models using either PE or LN alone is notably worse. For instance, with PE alone, the prediction error was 0.1211978 for the LTE-h3 + PE model and 0.0836508 for the LTE-h6 + PE model. Similarly, using LN alone resulted in prediction errors of 0.0737088 for LTE-h3 + LN and 0.0712439 for LTE-h6 + LN. These results indicate that when complex data are introduced through PE, it is necessary to normalize them with LN to maintain the model’s performance. However, when the original data are used without PE, applying LN can degrade the model’s performance due to excessive normalization.
In summary, the best performance is achieved when both modules are excluded, as our dataset exhibits simple, regular, and repetitive patterns, allowing the model to fully capture the natural structure of the data.
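Assuming the LTEBlock sketch from Section 2.3 is in scope, the four configurations compared in Table 4 can be generated by toggling its two flags, for example:

```python
# n_head = 3 shown; repeating with n_head = 6 gives the h6 rows of Table 4.
configs = [("LTE-h3",           False, False),
           ("LTE-h3 + PE",      True,  False),
           ("LTE-h3 + LN",      False, True),
           ("LTE-h3 + PE + LN", True,  True)]

for name, use_pe, use_ln in configs:
    model = LTEBlock(dim=6, n_head=3, hidden=64, use_pe=use_pe, use_ln=use_ln)
    # ...train as in Section 2.4 and record the test MSE for each variant...
```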

4. Discussion

In this research, for satellite data with repetitive patterns (e.g., K3, K3A, and K5), our model demonstrated faster execution times, required fewer parameters, and achieved reduced prediction errors. Therefore, if other researchers choose to apply our model, it appears best suited for relatively simple time series data with repetitive patterns. In such cases, they may find it helpful to begin their experiments with approximately 50 training epochs and a learning rate of 0.001. As for the selection of $n_{head}$, it should be set as a divisor of the data dimension. When the dataset shows relatively simple repetitive patterns, we recommend using a smaller value, such as $n_{head} = 3$. However, defining the criteria for determining the simplicity of data remains challenging, and further exploration is required to establish guidelines on the optimal use of $n_{head}$ based on data complexity. Investigating concrete criteria for setting $n_{head}$ according to the complexity of data will be one of the topics of our future research.
Another direction for future research involves conducting additional experiments with more complex and diverse satellite datasets to validate the robustness and generalizability of the LTE model. To further extend our study’s generalization capabilities, we plan to test the LTE model on more complex time series data, such as the Electricity Transformer Temperature (ETT) dataset [34] and the M4 dataset [35].
Furthermore, our approach has focused on solving the orbit prediction problem solely from an orbit data-driven perspective, primarily using computer science methodologies. Future research could explore integrating this approach with traditional methods from fields such as astrophysics and statistics. By incorporating external factors that affect the space environment, such as solar activity or atmospheric conditions, as additional features, its prediction performance could be further improved. Moreover, the use of repetitive and periodic time series data collected from sensors on non-satellite objects, combined with methods such as transfer learning, could enhance the generalization ability of the model. Leveraging these techniques would enable the LTE model to adapt more effectively to diverse types of data and extend its applicability beyond satellite orbit prediction.

5. Conclusions

As the space environment becomes increasingly complex due to the rapid rise in the number of satellites, the risk of collisions is also growing. Consequently, the need for accurate satellite orbit prediction has never been more critical. Therefore, a new approach was needed, one that differs from traditional methods that rely on predicting satellite orbits based on simulated data or physical models. Additionally, to efficiently process large volumes of data, a methodology that considers both execution time and the number of parameters used was required.
In this paper, we proposed the LTE model for satellite orbit prediction by modifying the Transformer structure. Our model was tailored to the characteristics of orbital data by eliminating PE, which could distort input data and unnecessarily increase complexity, as well as LN, which could hinder model training through over-normalization. Additionally, we collected a TLE set from the K3, K3A, and K5 satellites operated by the KARI, which comprised approximately 4.8 million data points collected every minute from January 2016 to December 2018. Leveraging data from an operational satellite, rather than simulations, highlighted the practical utility and effectiveness of our model.
We demonstrated the superiority of the LTE model by comparing its performance against a total of 14 prediction models. The LTE model reduced the MSE compared to the best-performing model other than the LTE. The reduction was 50.61% for K3, 42.40% for K3A, and 30.00% for K5. Additionally, on the K3 dataset, our model was 36.86% faster than the second-fastest model and used 2.33% fewer parameters than the model with the second-lowest parameter count. This showed that the LTE model not only improved prediction performance but also had a reduced execution time and parameter usage. Also, we conducted an efficiency analysis on the presence or absence of PE and LN. By varying the number of attention heads, we experimented with all combinations of PE and LN for the LTE model and evaluated the resulting prediction performances. The results indicated that PE and LN were not well suited to orbital data with repetitive and periodic patterns, thereby enhancing the explanatory power of our model.
The LTE model also offers substantial benefits for satellite operations. Ground stations need to process large volumes of satellite data generated every second across multiple satellites. Thus, a prediction model that is both efficient and lightweight, while still achieving a high performance, is essential. Our study demonstrates the LTE model’s feasibility and practical value in addressing these operational demands effectively. In conclusion, we hope that the LTE model can make a meaningful contribution to improving satellite orbit prediction, helping to mitigate the risk of collisions between satellites and space debris and thereby enhancing the safety and sustainability of the space environment.

Author Contributions

Conceptualization, S.J.; methodology, S.J.; formal analysis, S.J.; investigation, S.J.; resources, S.J.; data curation, S.J.; writing—original draft preparation, Y.S.; writing—review and editing, Y.S.; project administration, Y.S.; supervision, Y.S.; funding acquisition, Y.S. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Research Fund of The Catholic University of Korea in 2024, and a National Research Foundation of Korea (NRF) grant funded by the Korea government (MSIT) (No. RS-2023-00213456).

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in this article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Space Exploration Technologies Corp. SpaceX. 2002–2023. Available online: https://www.spacex.com (accessed on 12 July 2024).
  2. Blue Origin, Enterprises, L.P. Blue Origin. 2007–2023. Available online: https://www.blueorigin.com/ (accessed on 12 July 2024).
  3. Rocket Lab USA, Inc. Rocket Lab. 2006–2023. Available online: https://www.rocketlabusa.com/ (accessed on 12 July 2024).
  4. Eutelsat Oneweb. OneWeb. 2012–2022. Available online: https://www.oneweb.world (accessed on 12 July 2024).
  5. Allen, B. SpaceX Launches Starlink Satellites. 2024. Available online: https://earthsky.org/spaceflight/spacex-starlink-launches-june-2024/ (accessed on 13 July 2024).
  6. Kelecy, T.; Jah, M. Analysis of orbital prediction accuracy improvements using high fidelity physical solar radiation pressure models for tracking high area-to-mass ratio objects. In Proceedings of the Fifth European Conference on Space Debris, Darmstadt, Germany, 30 March–2 April 2009; Volume 5. [Google Scholar]
  7. Puente, C.; Sáenz-Nuño, M.A.; Villa-Monte, A.; Olivas, J.A. Satellite Orbit Prediction Using Big Data and Soft Computing Techniques to Avoid Space Collisions. Mathematics 2021, 9, 2040. [Google Scholar] [CrossRef]
  8. Peng, H.; Bai, X. Gaussian Processes for improving orbit prediction accuracy. Acta Astronaut. 2019, 161, 44–56. [Google Scholar] [CrossRef]
  9. Peng, H.; Bai, X. Exploring Capability of Support Vector Machine for Improving Satellite Orbit Prediction Accuracy. J. Aerosp. Inf. Syst. 2018, 15, 366–381. [Google Scholar] [CrossRef]
  10. Proença, P.F.; Gao, Y. Deep Learning for Spacecraft Pose Estimation from Photorealistic Rendering. In Proceedings of the 2020 IEEE International Conference on Robotics and Automation (ICRA), Paris, France, 31 May–31 August 2020; pp. 6007–6013. [Google Scholar] [CrossRef]
  11. Saxena, A.; Baraha, S.; Sahoo, A.K. Improved Orbit Prediction using Gradient Boost Regression Tree. In Proceedings of the 2023 International Conference on Microwave, Optical, and Communication Engineering (ICMOCE), Bhubaneswar, India, 26–28 May 2023; pp. 1–4. [Google Scholar] [CrossRef]
  12. Peng, H.; Bai, X. Improving orbit prediction accuracy through supervised machine learning. Adv. Space Res. 2018, 61, 2628–2646. [Google Scholar] [CrossRef]
  13. Peng, H.; Bai, X. Artificial Neural Network–Based Machine Learning Approach to Improve Orbit Prediction Accuracy. J. Spacecr. Rocket. 2018, 55, 1248–1260. [Google Scholar] [CrossRef]
  14. Wikipedia. Two-Line Element set. 2023. Available online: https://en.wikipedia.org/wiki/Two-line_element_set (accessed on 12 July 2024).
  15. Qu, Z.; Wei, C. A PSO-LSTM-based Method for Spatial Target Orbit Phase Prediction. In Proceedings of the 2023 4th International Conference on Computer Vision, Image and Deep Learning (CVIDL), Zhuhai, China, 12–14 May 2023; pp. 358–362. [Google Scholar] [CrossRef]
  16. Chen, Y.; Wang, K. Prediction of Satellite Time Series Data Based on Long Short Term Memory-Autoregressive Integrated Moving Average Model (LSTM-ARIMA). In Proceedings of the 2019 IEEE 4th International Conference on Signal and Image Processing (ICSIP), Wuxi, China, 19–21 July 2019; pp. 308–312. [Google Scholar] [CrossRef]
  17. Ren, H.; Chen, X.; Guan, B.; Wang, Y.; Liu, T.; Peng, K. Research on Satellite Orbit Prediction Based on Neural Network Algorithm. In Proceedings of the 2019 3rd High Performance Computing and Cluster Technologies Conference, Guangzhou, China, 22–24 June 2019; pp. 267–273. [Google Scholar]
  18. Salleh, N.; Azmi, N.F.M.; Yuhaniz, S.S. An Adaptation of Deep Learning Technique In Orbit Propagation Model Using Long Short-Term Memory. In Proceedings of the 2021 International Conference on Electrical, Communication, and Computer Engineering (ICECCE), Kuala Lumpur, Malaysia, 12–13 June 2021; pp. 1–6. [Google Scholar]
  19. Osama, A.; Raafat, M.; Darwish, A.; Abdelghafar, S.; Hassanien, A.E. Satellite Orbit Prediction Based on Recurrent Neural Network using Two Line Elements. In Proceedings of the 2022 5th International Conference on Computing and Informatics (ICCI), Cairo, Egypt, 9–10 March 2022; pp. 298–302. [Google Scholar] [CrossRef]
  20. Shin, Y.; Park, E.J.; Woo, S.S.; Jung, O.; Chung, D. Selective Tensorized Multi-layer LSTM for Orbit Prediction. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, CIKM ’22, Atlanta, GA, USA, 17–21 October 2022; pp. 3495–3504. [Google Scholar] [CrossRef]
  21. Napoli, C.; De Magistris, G.; Ciancarelli, C.; Corallo, F.; Russo, F.; Nardi, D. Exploiting Wavelet Recurrent Neural Networks for satellite telemetry data modeling, prediction and control. Expert Syst. Appl. 2022, 206, 117831. [Google Scholar] [CrossRef]
  22. Yang, H.T.; Zhu, J.P.; Zhang, J. The Research of Low Earth Orbit Prediction of Satellite Based on Deep Neural Network. DEStech Trans. Comput. Sci. Eng. 2018. [Google Scholar] [CrossRef] [PubMed]
  23. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.u.; Polosukhin, I. Attention is All you Need. In Proceedings of the Advances in Neural Information Processing Systems; Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan, S., Garnett, R., Eds.; Curran Associates, Inc.: Long Beach, CA, USA, 2017; Volume 30. [Google Scholar]
  24. KARI. Korea Aerospace Research Institute, 1989–2023. Available online: https://www.kari.re.kr (accessed on 13 July 2024).
  25. Wikipedia. Kepler’s Laws of Planetary Motion, 2023. Available online: https://en.wikipedia.org/wiki/Kepler%27s_laws_of_planetary_motion (accessed on 13 July 2024).
  26. Nasa Science. Orbits and Kepler’s Laws, 2023. Available online: https://science.nasa.gov/resource/orbits-and-keplers-laws/ (accessed on 13 July 2024).
  27. Nasa Science. Orbits and Kepler’s Laws, 2024. Available online: https://science.nasa.gov/solar-system/orbits-and-keplers-laws/ (accessed on 13 July 2024).
  28. Drucker, H.; Burges, C.J.C.; Kaufman, L.; Smola, A.; Vapnik, V. Support Vector Regression Machines. In Proceedings of the Advances in Neural Information Processing Systems; Mozer, M., Jordan, M., Petsche, T., Eds.; MIT Press: Denver, CO, USA, 1996; Volume 9. [Google Scholar]
  29. Zhai, N.; Yao, P.; Zhou, X. Multivariate Time Series Forecast in Industrial Process Based on XGBoost and GRU. In Proceedings of the 2020 IEEE 9th Joint International Information Technology and Artificial Intelligence Conference (ITAIC), Chongqing, China, 11–13 December 2020; Volume 9, pp. 1397–1400. [Google Scholar] [CrossRef]
  30. Hochreiter, S.; Schmidhuber, J. Long Short-Term Memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  31. Kingma, D.P.; Welling, M. Auto-Encoding Variational Bayes. In Proceedings of the 2nd International Conference on Learning Representations, ICLR 2014, Banff, AB, Canada, 14–16 April 2014. Conference Track Proceedings. [Google Scholar]
  32. Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), Doha, Qatar, 25–29 October 2014; pp. 1724–1734. [Google Scholar] [CrossRef]
  33. Schuster, M.; Paliwal, K. Bidirectional recurrent neural networks. IEEE Trans. Signal Process. 1997, 45, 2673–2681. [Google Scholar] [CrossRef]
  34. Zhou, T.; Ma, Z.; Wen, Q.; Sun, L.; Chen, X.; Cai, X.; Zhu, Y.; Zhang, R.; Ma, J.; Xu, Z. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. In Proceedings of the Thirty-Fifth AAAI Conference on Artificial Intelligence, AAAI 2021, Virtual, 2–9 February 2021; pp. 11106–11115. [Google Scholar]
  35. Makridakis, S.; Spiliotis, E.; Assimakopoulos, V. The M4 Competition: Results, findings, conclusion and way forward. Int. J. Forecast. 2018, 34, 802–808. [Google Scholar] [CrossRef]
Figure 1. The end-to-end research pipeline for our LTE model. Step (a) correlates with Section 2.1, where details of the dataset used in our study are presented. Step (b) corresponds to Section 2.2, which outlines our preprocessing procedure. In Section 2.3, represented by Step (c), we introduce our prediction model, LTE.
Figure 2. Plot of the elements from the KOMPSAT-3 satellite dataset, displaying the data points from 300,000 to 302,000. In the case of RAAN, a different range of data points (300,000 to 1,700,000) was provided since it has a long-period repetitive pattern compared to the other elements. Each element exhibits a unique period and wave shape, but a repetitive pattern is observed within each element individually. However, the data show subtle deviations in fluctuation even within these repetitive patterns, and the errors caused by these subtle deviations in fluctuation can lead to major satellite orbit changes.
Figure 3. Comparison of the architectures of the Transformer encoder (a) and the LTE model (b). To improve prediction performance and reduce computational resources, the LTE model removes two unnecessary layers from the Transformer Encoder: PE and LN.
Figure 4. A representation of 200 data points of the Semi-major axis with PE applied. The blue line indicates the input data with PE, while the yellow line represents the input data without PE.
Figure 5. Plot comparing predicted values (0–400 data points) from LTE-h3, without PE and LN, with the ground truth. The blue line represents the predicted values and the yellow line represents the ground truth values.
Figure 6. Plot comparing predicted values (0–400 data points) from LTE-h3 + PE, with PE and without LN, with the ground truth. The blue line represents the predicted values and the yellow line represents the ground truth values.
Figure 7. Plot comparing predicted values (0–400 data points) from LTE-h3 + LN, without PE and with LN, with the ground truth. The blue line represents the predicted values and the yellow line represents the ground truth values.
Table 1. Description of the six orbital elements used in our study from the TLE set, which consists of twelve elements. The table includes the name and symbol of each element, a brief description, and the range of each element, as obtained from the data of each satellite.
Element’s Name | Description | Dataset | Range
Semi-major axis (S) | A value that indicates the size of the orbit, expressed in kilometers, and signifies how far the satellite is from the Earth. | K3 | 7052.951∼7073.345
 | | K3A | 6892.097∼6915.121
 | | K5 | 6928.670∼6937.687
Eccentricity (E) | A value that characterizes the orbit as an elliptical shape, ranging between 0 and 1. | K3 | 0.000001∼0.003911
 | | K3A | 0.000002∼0.004223
 | | K5 | 0.000410∼0.003133
Inclination (I) | The angle by which the orbital plane is tilted with respect to the equatorial plane. | K3 | 98.11845∼98.19598
 | | K3A | 97.45451∼97.59076
 | | K5 | 97.59100∼97.63480
RAAN (R) | The angle between the vernal equinox and the ascending node of the satellite’s orbit. | K3 | 0.00034∼359.99997
 | | K3A | 0.00006∼359.99992
 | | K5 | 0.00004∼359.99992
Argument of perigee (A) | The angle formed between the point where the satellite is nearest to the Earth and the satellite’s ascending node in its orbital path. | K3 | 0.00016∼359.99936
 | | K3A | 0.00028∼359.99987
 | | K5 | 0.00014∼359.99990
Mean anomaly (M) | The angle at a specific point in the orbit, measured from the perigee. | K3 | 0.00079∼359.99899
 | | K3A | 0.00107∼359.99975
 | | K5 | 61.20162∼298.27813
Table 2. Orbit prediction performance of 14 different models. The prediction error is calculated using the MSE. The best-performing model is indicated in bold with a checkmark, the second-best-performing model is indicated in bold without a checkmark, and the best-performing model excluding our model is underlined. “Improvement” is the percentage of accuracy improvement between the best-performing model (bold with a checkmark) and the best-performing model except ours (underlined). “h3” in the model name indicates that $n_{head}$ is set to 3, while “h6” in the model name indicates that $n_{head}$ is set to 6.
Model | $n_{head}$ | K3 (MSE) | K3A (MSE) | K5 (MSE)
SVR | - | 0.0033472 | 0.0015751 | 0.0010871
XGBR | - | 0.0633434 | 0.0576222 | 0.0653850
LSTM | - | 0.0156621 | 0.0038404 | 0.0004618
VAE | - | 0.0713109 | 0.0211886 | 0.0016111
GRU | - | 0.0154912 | 0.0204662 | 0.0101774
Bi-LSTM | - | 0.0086924 | 0.0041601 | 0.0003658
Transformer-h3 | 3 | 0.1071742 | 0.1492044 | 0.1120841
Transformer-h3 (encoder only) | 3 | 0.0000488 | 0.0000214 | 0.0000010
Transformer-h3 (decoder only) | 3 | 0.0009327 | 0.0000934 | 0.0000011
LTE-h3, ours | 3 | ✓ 0.0000241 | 0.0000107 | ✓ 0.0000007
Transformer-h6 | 6 | 0.0778209 | 0.1337293 | 0.1125202
Transformer-h6 (encoder only) | 6 | 0.0000538 | 0.0000125 | 0.0000018
Transformer-h6 (decoder only) | 6 | 0.0003631 | 0.0000558 | 0.0000030
LTE-h6, ours | 6 | 0.0000292 | ✓ 0.0000072 | 0.0000008
Improvement | - | 50.61% | 42.40% | 30.00%
Table 3. Execution time and number of parameters for Transformer-based models and the LTE model. The best model is indicated in bold with a checkmark, and the second-best model is indicated in bold without a checkmark. “Improvement” is the percentage of improvement between the best-performing model (bold with a checkmark) and the best-performing model except ours (bold without a checkmark). In terms of execution time, our model is 36.86% faster than the second-fastest model, Transformer (decoder only). Regarding the number of parameters, our model achieves a 2.33% reduction compared to the second-best model, Transformer (encoder only).
Model | Execution Time (s) | Num. of Parameters
Transformer | 2779 | 2282
Transformer (encoder only) | 1863 | 1030
Transformer (decoder only) | 1731 | 1174
LTE, ours | ✓ 1093 | ✓ 1006
Improvement (Reduction Rate) | 36.86% | 2.33%
Table 4. Comparison of prediction performance based on the presence or absence of PE and LN, with different numbers of heads used ($n_{head}$). The best-performing model is indicated in bold with a checkmark, while the second-best-performing model is indicated in bold without a checkmark.
Model | $n_{head}$ | Prediction Error (MSE)
LTE-h3 + PE + LN | 3 | 0.0000499
LTE-h6 + PE + LN | 6 | 0.0000299
LTE-h3 + PE | 3 | 0.1211978
LTE-h6 + PE | 6 | 0.0836058
LTE-h3 + LN | 3 | 0.0737088
LTE-h6 + LN | 6 | 0.0712439
LTE-h3, ours | ✓ 3 | ✓ 0.0000238
LTE-h6, ours | 6 | 0.0000288
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
