1. Introduction
Optimal filtering addresses the problem of optimally estimating the state or signal of a system. The classical Wiener filter is a frequency-domain method that employs the spectral decomposition of stationary stochastic processes. However, Wiener filtering is limited by its restriction to stationary processes, the need to store all historical data, and high computational complexity; thus, its application is somewhat limited [1]. The Kalman filter (KF) proposed later is a time-domain filtering method. It employs state-space techniques and can be computed recursively in real time, making it suitable for time-varying systems, non-stationary stochastic processes, and multidimensional signal filtering. Moreover, the KF overcomes the shortcomings and limitations of the classical Wiener filter and is widely used in many fields, such as target tracking [2,3], relative orbit estimation [4,5], and navigation [6].
When the system obtains the observation vector, instability of the system and the large amount of clutter and noise in the observation background [7] often produce data points in the observations that deviate severely from the true target value, that is, outliers [8]. Outliers are characterized by irregularity, short duration, and large amplitude. Under their interference, a filtering algorithm is prone to numerical overflow, increased data processing errors, and even filter divergence [9,10].
Existing studies on outliers can generally be categorized into two main types. One approach incorporates outliers into conventional noise and investigates their probability density. Because this distribution has heavier tails than the Gaussian distribution, it is referred to as a heavy-tailed distribution [11]. Based on its statistical properties, a filtering procedure suited to non-Gaussian distributions can then be employed [12]. The other approach regards outliers as statistically irregular points to be identified and processed separately [13]. In practice, outliers often accompany abrupt system failures or sudden changes in the external environment and therefore lack predictable regularity, making offline methods difficult to apply in real time [14]. Moreover, introducing low-probability outliers into the noise distribution makes it difficult for the statistical characteristics of the observations to reflect the outlier-free statistics, which reduces the robustness of the filter [15]. Therefore, investigating how to identify and process outliers in real time is of both theoretical significance and practical value.
To precisely detect outliers, the authors of [16] proposed an innovation-based method, where the innovation denotes the difference between the observation and its predicted value. Ref. [17] proposes a semidefinite program for outlier detection. These filtering algorithms tend to detect outliers and then correct them, which is of little use in the face of prominent outliers or missing observations. Ref. [18] reveals that for outlier-prone processes the optimal filter is nonlinear, even if the system dynamics are linear. Since machine learning methods are well suited to finding complex nonlinear relationships in high-dimensional function spaces, this class of methods is currently a strong choice. Ref. [19] uses the innovation as the input of a deep neural network to train a Long Short-Term Memory (LSTM) network [20] that produces an initial estimate, and the uncertainty of the system noise variance is used as a new measurement in the filter. This method uses the network output directly as a measurement; its filtering effect is limited by the type of noise used for training and depends heavily on the stability of the network. Many neural-network-based methods are also used for state prediction [21,22], but these methods cannot make accurate predictions for data containing noise [23,24]. When filtering the states of complex and variable objects, it is hard to build a precise state prediction model [25]; the observations then significantly affect the filtering results, and aircraft tracking is a prime example of such problems [26]. In addition, transformers [27,28] exhibit superior global information extraction and processing capabilities compared to traditional neural networks such as LSTM, particularly for time-series prediction tasks. Therefore, an outlier-robust filter that combines accurate prediction with outlier processing capabilities is of great research value [29].
In this paper, drawing on the characteristics of existing outlier-robust filtering methods and of knowledge- and data-driven state prediction methods with high prediction accuracy and strong real-time performance, the Transformer-based Outlier-Robust Kalman Filter (TORKF) is proposed. This method accurately identifies outliers in the observations and corrects them with a high-precision prediction method, which improves the accuracy and robustness of the filtering algorithm. The main innovations in this study can be summarized as follows:
(1) Instead of investigating the probability density of outliers, an Outlier-Robust Kalman Filter (ORKF) based on innovation is proposed to identify and process outliers precisely.
(2) To address the issue of prediction errors caused by incomplete matching between the state space model and the actual state of objects, a Transformer-based Prediction Error Compensation (TPEC) model is proposed.
(3) A Prediction Error Covariance Correction (PECC) method is proposed to correct the misalignment of the prediction error covariance matrix caused by outliers.
The remainder of this paper is organized as follows. Section 2 briefly describes the outlier-robust filtering problem and its modeling method. Section 3 provides the details of the proposed TORKF algorithm. The experimental results and discussion are presented in Section 4. The conclusions are drawn in Section 5.
2. Problem Description
This paper considers the state space model as the following discretized linear system [30]:
where x_k is the n-dimensional state vector of the system at moment k, z_k is the m-dimensional measurement vector of the system at moment k, f is the system state function, and h is the system measurement function. w_k is the model noise at moment k, and v_k is the observation noise at moment k, which does not contain outliers. Both are zero-mean Gaussian white noises and are independent of each other; their covariance matrices are Q_k and R_k, respectively.
Outliers are challenging to predict and unrelated to other noise; they also have large amplitudes. Therefore, the observation equation is redefined as Equation (3).
where o_k is the outlier, which is bounded. o_k is statistically independent of w_k and v_k, and has its own covariance matrix.
Define the sequence of input vectors, the observation marker, and the outlier-robust filtering method accordingly. The objective is that the error between the current outlier-robust filtering result and the current true value is minimized, satisfying both objectives, as shown in Equation (4).
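To make the redefined observation equation concrete, the sketch below simulates measurements of the form z_k = H x_k + v_k + o_k, where v_k is regular zero-mean Gaussian noise and o_k is a sparse, large-amplitude outlier. The matrices, probabilities, and amplitudes are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def simulate_measurements(x_true, H, R, outlier_prob=0.05, outlier_scale=50.0):
    """Generate z_k = H x_k + v_k + o_k: Gaussian noise v_k with
    covariance R plus a rare, bounded, large-amplitude outlier o_k."""
    m = H.shape[0]
    zs = []
    for x in x_true:
        v = rng.multivariate_normal(np.zeros(m), R)   # regular observation noise
        o = np.zeros(m)
        if rng.random() < outlier_prob:               # sparse outlier event
            o = outlier_scale * rng.standard_normal(m)
        zs.append(H @ x + v + o)
    return np.asarray(zs)

# toy example: 1D constant-velocity state, position-only measurement
x_true = np.stack([np.array([t * 1.0, 1.0]) for t in range(200)])
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
z = simulate_measurements(x_true, H, R)
```

The resulting sequence mixes well-behaved Gaussian errors with occasional large spikes, which is exactly the regime the outlier-robust filter must handle.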
3. Proposed Algorithm
The overall structure of the proposed TORKF is shown in Figure 1. The innovation-based ORKF, the TPEC model, and the PECC method are all designed for TORKF. After initialization, TORKF first checks whether there is an observation input. In the absence of input, and before the termination condition is met, it treats the missing input as if it were an outlier. With input available, the filter proceeds with prediction. The prediction result and the observation serve as inputs for outlier detection, and the filter updates itself in the absence of outliers. When an outlier is detected or there is no observation, the TPEC model is employed to adjust the prediction result, after which the prediction error covariance is corrected to obtain the final filtered output.
A more detailed description of the filtering process is shown in Figure 2, which includes the complete judgment logic and clear inputs and outputs.
3.1. Outlier-Robust Kalman Filter
Considering the discrete linear model described in the problem statement, the core of the KF is to minimize the variance of the estimation error while guaranteeing that the linear optimal estimate is unbiased, namely,
where x̂_k is the state estimate of the KF. When outliers are present in the observation noise, the observation error no longer conforms to the statistical properties of zero-mean Gaussian white noise, and it is then difficult for the KF output to satisfy the minimum-variance principle. Therefore, each input observation vector is evaluated for outliers. Like the KF, the ORKF can be presented in two parts: the time update and the measurement update.
3.1.1. Time Update
During the time update, the ORKF propagates the linear estimate based on the state equation and updates the state error covariance, as shown in Equations (6) and (7), respectively.
where x_{k-1} is the n-dimensional state estimate of the system at moment k-1, x_{k|k-1} is the n-dimensional prediction vector of the system at moment k, F_k is the state transition matrix, and P_{k-1} is the system state covariance matrix at moment k-1. P_{k|k-1} is the prediction error covariance matrix at moment k, and Q_{k-1} is the model error covariance matrix at moment k-1.
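Equations (6) and (7) are the standard linear time update, which can be sketched in a few lines of Python; the constant-velocity matrices below are illustrative assumptions chosen only to exercise the function.

```python
import numpy as np

def time_update(x, P, F, Q):
    """KF/ORKF time update: propagate the state estimate (Eq. (6))
    and the prediction error covariance (Eq. (7))."""
    x_pred = F @ x                    # state prediction
    P_pred = F @ P @ F.T + Q          # covariance prediction
    return x_pred, P_pred

# assumed 1D constant-velocity example, dt matching the paper's 0.5 s period
dt = 0.5
F = np.array([[1.0, dt], [0.0, 1.0]])
Q = 0.01 * np.eye(2)
x, P = np.array([0.0, 1.0]), np.eye(2)
x_pred, P_pred = time_update(x, P, F, Q)
```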
3.1.2. Measurement Update
In the measurement update phase, outlier detection must first be carried out on the input measurement. The purpose of outlier detection is to identify observation vectors containing anomalous noise. To identify outliers accurately, this study uses the innovation variance as the identification vector and the prediction covariance as the identification control vector. The innovation is calculated as follows:
where z_k is the m-dimensional observation vector of the system at moment k, H_k is the measurement matrix, and o_k is the outlier, which is bounded. o_k is statistically independent of w_k and v_k, and has its own covariance matrix.
When there is no outlier, the prediction covariance matrix is as follows:
When there is an outlier, the prediction covariance matrix is shown in Equation (11).
Select the innovation covariance as the matrix for determining whether the innovation vector contains outliers; it is shown in Equation (12). The judgment matrix is given in Equation (13), and the judgment equation is shown in Equation (14).
where the mark vector indicates whether an outlier exists: one value marks an outlier, and the other indicates that no outlier exists. The coefficient vector used for determining an outlier depends on the statistical characteristics of the outlier; when those characteristics are apparent, the coefficients are determined according to their differences. When the statistical characteristics of the outlier are ambiguous, the criteria can be selected by reference to the three-sigma rule [31].
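As a concrete sketch of this innovation-based test, the fragment below flags an observation when any innovation component exceeds a three-sigma bound derived from the outlier-free innovation covariance. The per-component thresholding and all numerical values are illustrative assumptions, not the paper's exact Equations (12) to (14).

```python
import numpy as np

def detect_outlier(z, x_pred, P_pred, H, R, n_sigma=3.0):
    """Flag an observation as an outlier if the innovation exceeds
    n_sigma predicted standard deviations (three-sigma rule)."""
    e = z - H @ x_pred                    # innovation
    S = H @ P_pred @ H.T + R              # innovation covariance without outliers
    sigma = np.sqrt(np.diag(S))           # per-component standard deviation
    return np.any(np.abs(e) > n_sigma * sigma), e

H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
x_pred = np.array([10.0, 1.0])
P_pred = np.eye(2)
is_out_clean, _ = detect_outlier(np.array([10.5]), x_pred, P_pred, H, R)  # small error
is_out_bad, _ = detect_outlier(np.array([60.0]), x_pred, P_pred, H, R)    # large spike
```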
If no outlier is present, the filtering operation is carried out using Equations (15)-(17). When an outlier is detected, TORKF is updated as shown in Figure 2. The termination condition is determined by how long the outliers persist, and the termination judgment value depends on the capability of the TPEC model described below.
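For completeness, the outlier-free branch, Equations (15) to (17), is the standard KF measurement update: gain, state correction, and covariance update. The matrices below are assumed toy values.

```python
import numpy as np

def measurement_update(x_pred, P_pred, z, H, R):
    """Standard KF measurement update (Equations (15)-(17))."""
    S = H @ P_pred @ H.T + R                        # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)             # Kalman gain
    x = x_pred + K @ (z - H @ x_pred)               # corrected state estimate
    P = (np.eye(len(x_pred)) - K @ H) @ P_pred      # corrected error covariance
    return x, P

x_pred, P_pred = np.array([0.0, 1.0]), np.eye(2)
H = np.array([[1.0, 0.0]])
R = np.array([[1.0]])
x, P = measurement_update(x_pred, P_pred, np.array([0.4]), H, R)
```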
3.2. Transformer-Based Error Compensation Model
In this article, the TPEC model is designed to improve the prediction accuracy of the predictive model, and thereby the overall robustness and filtering precision of the filter, given the limited precision of the forecasting model. The TPEC model can be divided into three parts: input matrix construction, feature extraction, and a Multilayer Perceptron (MLP). The structure of the TPEC model is shown in Figure 3.
This article provides two methods for constructing the input matrix in the input matrix construction portion.
When the regularity of the time series is strong, the time-sequence feature extraction procedure is used. The model's input matrix at frame k is composed of the single-frame input vectors and a normalization matrix, as shown in Equation (18).
where each column is the input vector of a single frame and T is the total number of frames. The single-frame input vector contains m values, and the normalization matrix is a diagonal matrix of matching dimension; both are shown in Equations (19) and (20).
where x_k is the state vector at moment k, the second component is the difference between the state vector at moment k and the state vector at the previous moment, and the third component is the measurement error in the absence of outliers.
When a single sequence contains more complete characteristics, the prediction vector input of dimension s is used, with each dimension treated as a word vector; the input can then be divided into a matrix of appropriate dimension as the input of the feature extraction model.
The transformer model features fast forward propagation, low structural complexity, and high feature extraction efficiency. Therefore, this article employs the transformer encoder to extract features from the filter prediction vector. Attention is the foundation of the transformer model [32]; its calculation formula is shown in Equation (21).
where Q and K are the queries and keys of dimension d_k, and V contains the values of dimension d_v.
The transformer model is built on multi-head attention, together with residual connections, normalization, MLPs, and other structures, as shown in Figure 3. The encoder consists of a stack of N identical layers, each containing two sub-layers: a multi-head attention mechanism and a feed-forward neural network.
The transformer encoder is used to extract the features of the input matrix. In practice, the number of attention heads is limited by the available graphics memory and the running time of the algorithm. Although more heads extract richer feature information, these physical constraints must be balanced against that richness; the choice is usually made among even numbers of heads, often in the range of 2-18. The choice of the number of encoder layers is similar. The number of MLP layers usually requires experimentation based on the data volume and the complexity of the nonlinear relationships, with 2-3 layers often being a good starting point.
Considering the machine learning process of the transformer encoder and MLP as a TEC function, its output C_k has the same dimension as the state vector and is used to compensate the state vector, yielding the final output state vector, as shown in Equations (22) and (23).
where M is the diagonal transformation matrix mapping the output vector of the transformer model to the compensation vector.
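The compensation step itself is a small linear operation: the network output is scaled by the diagonal matrix M and added to the predicted state. In the sketch below the TEC network is replaced by a fixed stand-in vector, and M is assumed to be the identity; both are illustrative assumptions.

```python
import numpy as np

def compensate_state(x_pred, c, M):
    """Apply the TPEC compensation in the spirit of Equations (22)-(23):
    map the network output c through the diagonal matrix M and add it
    to the predicted state."""
    return x_pred + M @ c

x_pred = np.array([100.0, 200.0, 300.0])
c = np.array([1.5, -2.0, 0.5])        # stand-in for the TEC network output C_k
M = np.diag([1.0, 1.0, 1.0])          # assumed identity scaling
x_k = compensate_state(x_pred, c, M)
```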
3.3. Prediction Error Covariance Correction Method
When outliers appear in the observation vector, compensating for the prediction error alone is not enough: the prediction error covariance matrix updated in the Kalman prediction phase also becomes misaligned, so the error covariance matrix must be adjusted according to the result of the prediction error compensation to ensure subsequent filtering robustness. The structure of the PECC method is shown in Figure 4.
The input of the PECC method comprises the prediction compensation vector, the prediction vector, and the prediction error covariance matrix. The inputs are concatenated and flattened by the pre-processor, whose output is processed by a three-layer MLP to produce the error adjustment matrix Γ_k. Γ_k is defined as the prediction error covariance correction matrix, a diagonal matrix used to rectify the observation error vector after prediction error compensation when outliers appear in the observation vector; it is also used to correct the prediction error covariance, as indicated in Equation (24). The corrected state error covariance enters the error covariance update, and the corrected error covariance is obtained in the form of Equation (25).
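The exact form of Equations (24) and (25) is not reproduced here; as one plausible sketch of how a diagonal Γ_k can correct the covariance, the symmetric scaling Γ P Γ^T is used below, since it keeps the result symmetric and positive semi-definite. Both the form and the numbers are assumptions for illustration only.

```python
import numpy as np

def correct_covariance(P_pred, gamma_diag):
    """Sketch of a diagonal covariance correction: scale the prediction
    error covariance symmetrically by Γ = diag(gamma_diag). This is an
    assumed stand-in for Equations (24)-(25), not the trained PECC output."""
    G = np.diag(gamma_diag)
    return G @ P_pred @ G.T

P_pred = np.array([[4.0, 1.0], [1.0, 2.0]])
P_corr = correct_covariance(P_pred, np.array([1.2, 0.8]))
```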
The pseudo code of the TORKF is shown in Algorithm 1.
Algorithm 1 TORKF method
Require: number of time series length T ≥ 1, system state and measurement equations f, h, covariance matrices Q_k and R_k of motion and measurement noise.
Ensure: predicted object states.
Initialize: the error covariance matrix P and the system state vector x.
for k = 1 to T do
    predict as (6) and (7)
    detect outliers as (14)
    if outlier: discard the observation
    else: update as (15)-(17)
end for
for k = T to the end time do
    predict as (6) and (7)
    detect outliers as (14)
    if outlier:
        calculate the error compensation C_k as (22)
        calculate the state vector x_k as (23)
        calculate the correction matrix Γ_k as (24)
        calculate the error covariance matrix as (25)
    else:
        update as (15)-(17)
end for
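The control flow of Algorithm 1 can be made runnable by replacing the learned components (TPEC and PECC) with simple placeholders: on an outlier, the placeholder keeps the model prediction and mildly inflates the covariance. This only mirrors the algorithm's structure under assumed matrices, not the trained networks.

```python
import numpy as np

rng = np.random.default_rng(2)

def torkf_like_loop(zs, F, H, Q, R, x0, P0, n_sigma=3.0):
    """Skeleton of the TORKF loop: predict, detect, then either the
    standard update or a placeholder compensation branch."""
    x, P = x0.copy(), P0.copy()
    I = np.eye(len(x0))
    outliers = 0
    for z in zs:
        x_pred, P_pred = F @ x, F @ P @ F.T + Q              # Eqs. (6)-(7)
        e = z - H @ x_pred                                   # innovation
        S = H @ P_pred @ H.T + R
        if np.any(np.abs(e) > n_sigma * np.sqrt(np.diag(S))):  # test as in (14)
            outliers += 1
            x = x_pred                                       # placeholder for (22)-(23)
            P = 1.1 * P_pred                                 # placeholder for (24)-(25)
        else:
            K = P_pred @ H.T @ np.linalg.inv(S)              # Eqs. (15)-(17)
            x = x_pred + K @ e
            P = (I - K @ H) @ P_pred
    return x, outliers

dt = 0.5
F = np.array([[1.0, dt], [0.0, 1.0]])
H = np.array([[1.0, 0.0]])
Q, R = 0.01 * np.eye(2), np.array([[1.0]])
zs = [np.array([k * 0.5 + rng.normal(0.0, 1.0)]) for k in range(100)]
zs[50] = np.array([500.0])                                   # inject one large outlier
x_end, n_out = torkf_like_loop(zs, F, H, Q, R, np.array([0.0, 1.0]), np.eye(2))
```

Despite the injected spike, the estimate stays near the true trajectory because the outlier branch never lets the corrupted measurement enter the update.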
4. Experiment and Analysis
To verify the effectiveness and advancement of the proposed transformer-based outlier-robust filtering method, this study tests it on a three-dimensional aircraft position tracking model, shown in Equations (26) and (27).
where U and Q are shown in Equations (28) and (29), respectively.
where T is the filter time interval, and α is the motorization frequency, which expresses the maneuverability of an aircraft. The remaining terms are shown in Equations (30) and (31).
where a_k is the acceleration at the current moment, and ā is the average acceleration.
4.1. Datasets and Environments
The datasets used in this experiment consist of flight data generated by a pilot on a semi-physical simulator during one-on-one engagements, with detection errors superimposed to test the algorithm. The semi-physical simulator consists of a one-to-one reproduction of the aircraft operating unit, wide-angle scene reproduction hardware, and a server. The pilot is a professional pilot with many years of flight experience, so the data reproduces real scenarios with high fidelity. The flight datasets comprise two categories: a low-speed dataset (Dataset I) and a high-speed dataset (Dataset II), whose specifications are shown in Table 1.
The two datasets used in the experiment were generated by two distinct models of aircraft: the low-speed dataset corresponds to a propeller aircraft and the high-speed dataset to a jet aircraft.
The experiments on Dataset I and Dataset II used 90% of the data for model training and the remaining 10% for validation and testing; the following experimental results are from the test set. The experimental computer was configured with an Intel i7-13700K CPU, an NVIDIA RTX 4090 GPU, and 32 GB of RAM. The simulation environment was Windows 11, implemented under the PyTorch 2.4 framework using Python 3.11.
4.2. Experiment Details
The aircraft coordinate system used in the experiment is a geographic coordinate system. During the experiment, the sampling period T was set to 0.5 s and the motorization frequency to 0.5. All the neural network models were trained with the same loss function, optimizer, and learning rate: the L2 loss function and the Adam optimizer were selected, with the learning rate set to 0.001. Considering the limitations of computational resources and the real-time requirement, and because more iterations did not increase accuracy, we chose 12 attention heads and a 6-layer encoder. A 3-layer MLP network gives a good balance of accuracy and computation time in both TPEC and PECC. The input is a 128-frame matrix, as shown in Equation (32). All neural network models are trained with the same input for 50 epochs.
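For reference, the experimental hyperparameters stated above can be collected in one place; the dictionary layout itself is illustrative, not from the paper.

```python
# Hyperparameters from the experimental setup in Section 4.2.
# The dictionary structure is an illustrative convenience.
train_config = {
    "sampling_period_s": 0.5,      # filter time interval T
    "motorization_frequency": 0.5,
    "loss": "L2",                  # training loss function
    "optimizer": "Adam",
    "learning_rate": 1e-3,
    "attention_heads": 12,
    "encoder_layers": 6,
    "mlp_layers": 3,               # in both TPEC and PECC
    "input_frames": 128,
    "epochs": 50,
}
```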
The Root Mean Square Error (RMSE) is used as the evaluation criterion. Suppose that the current frame is k, x̂_k is the filter output, x_k is the true value of frame k, and N is the total number of computed frames. The RMSE of frame k is calculated as
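Since the RMSE equation itself is not reproduced here, the sketch below shows one common reading for tracking experiments: the squared position error is summed over the coordinate axes and averaged over repeated runs at each frame before taking the square root. The array shapes are assumptions for illustration.

```python
import numpy as np

def rmse_per_frame(est, truth):
    """Per-frame position RMSE: est and truth have shape
    (runs, frames, dims); returns one RMSE value per frame."""
    sq_err = np.sum((est - truth) ** 2, axis=-1)   # squared error per run and frame
    return np.sqrt(np.mean(sq_err, axis=0))        # average over runs, then root

# toy check: a constant 1 m error on each of 3 axes gives RMSE sqrt(3)
truth = np.zeros((4, 5, 3))
est = truth + 1.0
r = rmse_per_frame(est, truth)
```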
4.3. Overall Evaluation
In the experiment, TORKF is compared with the following robust filters.
(1) CS-MAEKF [33]: an adaptive extended Kalman filter based on the current statistical model of maneuvering acceleration.
(2) WRPF [34]: a statistically robust particle filter that exhibits high statistical efficiency and good robustness to outliers.
(3) LSTM-KF [35]: a hybrid LSTM and KF model.
(4) MMCKF [36]: a Kalman filter that combines M-estimation with information-theoretic learning (ITL) under impulsive noises.
Table 2 presents a comprehensive comparison between TORKF and state-of-the-art filters in terms of RMSE, running time, and maximum runtime per frame. Compared to the other algorithms, TORKF achieved significant reductions in RMSE, reaching maximum improvements of 133.11 m and 242.88 m across two datasets, with minimum reductions of 2.1 m and 5.67 m, respectively. Moreover, regardless of the dataset, the runtime of TORKF remains below 100 ms, satisfying the conditions for real-time operation.
Figure 5 and Figure 6 show the RMSE performance of the test results for all compared methods on Dataset I and Dataset II, respectively. For each dataset, we give the RMSE of the airplane's x, y, and z positions and the average position RMSE of the three axes over time. Comparing Figure 5 and Figure 6, it can be observed that the filters' outputs show similar patterns of RMSE variation when faced with aircraft of different maneuvering capabilities, maximum speeds, and maximum accelerations, as well as different noise and outliers in the datasets.
Specifically, WRPF converges more slowly than the KF-based filtering methods because its Monte Carlo-based mechanism differs from that of the KF. The accuracy of the filters that use an outlier-robust method or a neural network method is higher than that of those that do not. The transformer-based error compensation model and the PECC method also greatly improve the filtering accuracy, and TORKF achieves the best filtering results in both datasets.
Under different datasets and motion characteristics, the RMSEs of ORKF, CS-MAEKF, and MMCKF follow similar trends, with the RMSE of MMCKF and CS-MAEKF significantly higher than that of ORKF. The RMSE of WRPF grows more sluggishly owing to its 100 sampling points, which fit the noise better; however, ORKF converges, and its RMSE becomes significantly smaller than that of WRPF after about 200 s.
In particular, although MMCKF utilizes M-estimation and an ITL-based method to enhance its robustness, the algorithm shows poor accuracy when handling data containing numerous outliers in both datasets. Upon reviewing the filtering results, we observed that while the method prevents the results from diverging, addressing an outlier in one frame causes the results to deviate over multiple subsequent frames. As a result, the RMSE of MMCKF is significantly larger than that of the other methods.
To validate the effectiveness of our proposed three methods, this paper not only compares with existing state-of-the-art approaches but also conducts ablation studies. The comparison methods used in the ablation study involve the combination of the proposed methods, including ORKF, Attention-based Outlier-Robust Kalman Filter (AORKF), prediction-error-covariance correction Outlier-Robust Kalman Filter (PORKF), and LSTM-based prediction-error-covariance correction Outlier-Robust Kalman Filter (LPORKF). In the above methods, AORKF is enhanced by adding the TPEC model to ORKF, PORKF is the combination of ORKF and the PECC method, and LPORKF is an algorithm that integrates LSTM and PECC into ORKF.
4.4. Ablation Study
4.4.1. Innovation-Based ORKF Ablation Study
The effectiveness of the innovation-based ORKF is demonstrated through the comparative analysis presented in Table 2. Compared to other outlier-resistant Kalman filters such as CS-MAEKF and MMCKF, ORKF achieves significant performance improvements, reducing RMSE by at least 53.84 m and 82.51 m in the two datasets, respectively, while consuming approximately the same amount of time.
The convergence speed is also a crucial metric for assessing filter performance. According to Figure 5 and Figure 6, ORKF converges faster than CS-MAEKF, MMCKF, and WRPF in both datasets.
4.4.2. Transformer-Based Error Compensation Model Ablation Study
Table 3 shows that the RMSE of AORKF is reduced by 2.67 m and 10.91 m relative to ORKF in Dataset I and Dataset II, respectively. LSTM-KF employs LSTM for end-to-end processing of observations with outliers, and this approach demonstrates a certain effectiveness: compared to ORKF, the RMSE of LSTM-KF is reduced by 0.84 m and 8.88 m in Dataset I and Dataset II, respectively. However, compared to AORKF, LSTM-KF still has a higher RMSE, by 1.83 m and 2.03 m, respectively.
The experimental results indicate that the filtering accuracy and robustness are superior to those achieved by the filtering algorithm alone, whether using deep learning methods for error compensation or for observation data processing.
Figure 7 and Figure 8 show the comparison of RMSE before and after implementing the PECC method with different fixed compensation coefficients for Dataset I and Dataset II, respectively. The RMSE trends over time and the convergence rates of ORKF, LSTM-KF, and AORKF are basically the same in both datasets, as shown in Figure 7 and Figure 8. After the filtering results converge, the data of different dimensions in both datasets show the same accuracy characteristics. AORKF consistently outperforms the other two methods in both the combined RMSE and the RMSE of the X, Y, and Z axes, which reflects the superiority of the transformer-based error compensation model over the filter prediction method and the LSTM method.
4.4.3. Prediction Error Covariance Correction Method Ablation Study
The PECC method compensates for error covariance misalignment caused by missing or eliminated observation vectors. Figure 7 and Figure 8 compare the RMSE before and after PECC implementation and with different compensation matrices across the datasets. The RMSE of the input data and of PORKF remains invariant to the compensation coefficient, while the variation of ORKF demonstrates that an optimal coefficient minimizing the RMSE exists for different datasets and motion laws.
The experiments in Figure 7 and Figure 8 aim to verify whether, under different error compensation coefficients, there exists an optimal compensation coefficient other than 1 that yields the highest filtering accuracy. The experimental results show that such an optimal coefficient does exist for different datasets and noises. Although the PECC method proposed in this paper cannot reach the optimal value in every case, it can still improve the robustness and accuracy of the filter in the face of outliers.
According to the results in Table 3, the PECC method greatly improves the accuracy and robustness of ORKF, LSTM-KF, and AORKF in both datasets. Specifically, using the PECC method in TORKF reduces the RMSE of AORKF by 0.33 m and 3.64 m in Dataset I and Dataset II, respectively. The RMSE of LPORKF is lower than that of LSTM-KF, and that of PORKF lower than that of ORKF. The accuracy improvement brought by the PECC method is significant, although the more the error compensation model reduces the error, the more this improvement is attenuated.
It can also be observed that AORKF, which employs an attention-based approach, is more effective than the LSTM-based alternative. Additionally, whether considering the average runtime or the maximum runtime per frame, AORKF consistently requires more time than TORKF. Upon analysis, this is attributed to the introduction of the PECC method into the training process, which enhances the network's runtime efficiency.
4.5. 3D Comparison Results
The 3D comparison results for 303 s of continuous flight data are presented in Figure 9. The test data come from a complete maneuver trajectory in Dataset II and include the true data as well as the results of all algorithms tested in this paper.
As shown in Figure 9, under the interference of outliers far from the true value, MMCKF deviates significantly and converges slowly, while the other filtering algorithms have better convergence properties, so their filtering results remain near the true value.
Upon analyzing all experimental results, it is evident that the proposed improvement methods in this paper bring the filtering results closer to the true value when compared to existing algorithms. The outlier detection model accurately detects and effectively eliminates outliers. The transformer model effectively learns the motion characteristics of the trajectory and compensates for them. Additionally, the prediction error covariance correction method improves the accuracy of the filtering results.
5. Conclusions
(1) A Transformer-based Outlier-Robust Kalman Filter is proposed in this work. The outlier-robust method is utilized to detect the outliers precisely. The transformer-based error compensation model is developed to improve the prediction accuracy, and the PECC method is proposed to correct the error covariance further to enhance the accuracy and robustness of the filter.
(2) The proposed method was validated on two representative aircraft tracking datasets featuring complex motion models and outlier-contaminated observations: the propeller and jet aircraft datasets. Experimental results demonstrate that TORKF achieves a reduction in the RMSE of over 12.7% compared to current state-of-the-art methods while exhibiting faster convergence speed and enhanced robustness.
(3) It is noted that the average running time of the method proposed in this paper is longer than that of the compared methods, although it still meets the timeliness requirement for aircraft tracking.
(4) In future research, we aim to develop a more time-efficient model and extend it to other application areas, such as autonomous driving and robotics, whose sensor information is highly similar to that obtained from aircraft sensors.