Interactive Multiple-Model Learning Filter for Spacecraft Pursuit–Evasion Game Strategy Switch Based on Long Short-Term Memory Network

Wang, Chuangge; Chen, Danhe; Liao, Wenhe

doi:10.3390/aerospace11110894

Open AccessArticle

Interactive Multiple-Model Learning Filter for Spacecraft Pursuit–Evasion Game Strategy Switch Based on Long Short-Term Memory Network

by

Chuangge Wang

,

Danhe Chen

^*

and

Wenhe Liao

School of Mechanical Engineering, Nanjing University of Science and Technology, Nanjing 210094, China

^*

Author to whom correspondence should be addressed.

Aerospace 2024, 11(11), 894; https://doi.org/10.3390/aerospace11110894

Submission received: 26 August 2024 / Revised: 20 October 2024 / Accepted: 28 October 2024 / Published: 30 October 2024

(This article belongs to the Section Astronautics & Space Science)

Download

Browse Figures

Versions Notes

Abstract

:

Aiming to address the problem of pursuit and interception for spacecraft using multiple evasion strategies, a pursuit strategy involving the use of an interactive multiple-model filter (IMM) in a pursuit–evasion game is considered, where the Evader adopts a switchable evasion strategy based on a linear quadratic method and zero-effort miss method. In this case, an improved interactive multiple-model feedback-learning filter method based on a long short-term memory network (LSTM-IMML) is proposed to estimate the Evader’s strategy mode, with the resulting estimation allowing the Pursuer to then switch its own strategy to the appropriate pursuit strategy to intercept the Evader. Also, the improved interactive multiple-model feedback learning filter can feed back the fusion estimation of the last-time state to the next-time state to improve estimation accuracy. An LSTM-based probability estimation network is designed to accurately estimate the probability of different modes. The proposed LSTM-IMML method can be used in the pursuit–evasion game when the Evader is able to switch its evasion strategy. The simulation results show that the LSTM-IMML method has better state estimation accuracy, and the mode probability estimation of the Evader is more exact and stable.

Keywords:

pursuit–evasion game; interactive multiple-model filter; long short-term memory network; mode probability

1. Introduction

Recently, the pursuit–evasion problem between spacecraft remains an important area of study in aerospace engineering, where one spacecraft acts as the pursuer and the other as the evader. Various pursuit-and-evasion strategies have been proposed to solve the pursuit–evasion problem between two spacecraft. But these studies mostly focus on a single pursuit–evasion strategy; that is, the Evader only uses one evasion strategy. However, with the improvement of spacecrafts’ intelligence levels, the Evader may adopt a variety of strategies and will be able to switch to a more appropriate one according to the situation.

For the pursuit–evasion problem, the use of differential game theory, which had been widely used by other researchers in this field, was initially proposed as a solution by Issacs in 1965 [1]. Mauro et al. [2] studied the problem of a long-distance pursuit–evasion game by transforming it into a two-point boundary value problem and proposed a semi-direct method to obtain the saddle point solution. Ye et al. [3] also transformed the pursuit-and-evasion problem into a two-point boundary value problem and solved it through a heuristic search. This algorithm was applied in a close-range pursuit–evasion game under different thrust configurations. Li et al. [4] and Zhang et al. [5] studied the pursuit–evasion game considering the J2 perturbation and proposed new methods with which to quickly find the saddle point solution. Prince et al. [6] used the indirect heuristic method to study the differential game of proximity operations in elliptical orbits. Pang et al. [7] studied the pursuit–evasion game along an elliptical orbit by providing a precise gradient. Unfortunately, the methods mentioned above are all open-loop solutions and cannot be used for real-time feedback control.

Consequently, researchers studied the real-time feedback control of the pursuit–evasion problem. Li et al. [8] proposed an infinite-horizon nonlinear quadratic differential game considering the motion camouflage pursuit problem. Wang et al. [9] and Ye et al. [10] investigated pursuit–evasion control based on zero-effort miss and deduced a pursuit–evasion feedback control strategy. Zhang et al. [11] proposed a new adaptive weighted differential game guidance law to intercept maneuvering targets by combining two guidance laws derived from complete and incomplete information modes. Li et al. [12] designed a linear- quadratic duration-adaptive strategy to solve the orbital pursuit–evasion–defense game problem.

In addition, other methods have also been studied to solve the pursuit–evasion problem. Gong et al. [13] used the reachable region method to study the pursuit–evasion game under continuous thrust and derived the analytical form of the reachable region based on the Hill–Clohessy–Wiltshire (HCW) equation. Zhao et al. [14] proposed an impulsive pursuit–evasion algorithm based on a multi-agent deep deterministic policy gradient (MADDPG) that can yield a pursuit–evasion strategy under multiple constraints. In summary, in the continuous thrust pursuit–evasion game scenario, the real-time feedback control law based on game theory is more likely to be applied in space due to its simple structure and excellent pursuit ability.

In order to evade interception by the Pursuer more effectively, the Evader may employ a variety of strategies and switch from one to another in the process depending on the situation [15]. For example, in reference [16], Evaders have varying structural dynamics and will switch from one mode to another in the game, resulting in changes of evasion strategies. Therefore, for the Pursuer, it is necessary to adjust its pursuit mode to achieve efficient interception of the Evader. In the complete information pursuit–evasion game, the Pursuer can obtain all the information of the Evader and adjust its pursuit strategy in real time. However, the Pursuer generally cannot obtain all the information on the Evader in practice. Under the condition of incomplete information, the Pursuer has to estimate the strategy adopted by the Evader according to the states of the two players and adjust its strategy in real time. This motivated us to study the problem of intercepting the Evader with a switchable evasion strategy under conditions of incomplete information.

The interactive multiple-model filter (IMM) is a method for estimating the state of dynamic systems, and it can be used for Markov stochastic jump systems. It is widely used in state estimation [17], dynamic target tracking [18], fault detection [19,20], and so on. Considering the characteristics of multiple-model estimation, the IMM method is used to solve the problem of Evader strategy switching in a pursuit–evasion game. Zou et al. [21] proposed a cooperative estimation method that combines the IMM of the evader and the Kalman filter of the defender. The cooperative method can effectively estimate the state of the Pursuer and significantly improve the accuracy of active defense guidance. Tang et al. [22] used the IMM method to estimate the state information of the Evader by combining the smooth variable sliding filter with mode matching, achieving a good interception effect.

However, the difference in control strategies is very small in spacecraft pursuit–evasion game strategy switching due to the low thrust of satellite thrusters. The classical IMM cannot predict the evasion mode precisely, which will affect interception performance. In addition, the model probability estimated by an IMM is unstable and fluctuates greatly due to the influence of navigation information error. Motivated by the above problems, this paper aims to design a method that can accurately and stably estimate the strategy mode of the Evader so as to intercept the Evader quickly.

The main novelties and contributions of this paper are as follows: (1) A switchable escape strategy based on a linear quadratic game strategy and a zero-effort miss game strategy was designed. (2) An interactive multiple-model learning filter is proposed by introducing the idea of feedback. (3) An interactive multiple-model learning filter based on an LSTM network (LSTM-IMML) is proposed.

The remainder of this paper is organized as follows. The Pursuer and Evader dynamics in the pursuit–evasion game are introduced in Section 2. In Section 3, a switchable pursuit–evasion strategy based on linear quadratic and zero-effort miss distance is designed. Section 4 details the interactive multiple-model learning filtering method based on an LSTM network. In Section 5, the proposed LSTM-IMML method is compared with IMM to validate the performance of the proposed method. Finally, Section 6 concludes the paper.

2. Dynamics of Spacecraft Pursuit–Evasion

To describe the maneuver game of two players, a reference spacecraft which is very close to the Pursuer and Evader is selected as the origin of the reference coordinate system oxyz as shown in Figure 1. The coordinate systems OXYZ and oxyz represent the Earth inertial and the reference coordinate systems, respectively. The axis ox points from the Earth’s center to the center of mass of the reference spacecraft; the axis oy points to the velocity direction, and the oz axis completes the right-hand rule.

Assuming the orbit of the reference spacecraft is circular and the two satellites maneuver near the reference spacecraft, the dynamics of the Pursuer and Evader in the reference coordinate system oxyz can be simplified to the HCW equation.

\begin{matrix} \{\begin{cases} {\ddot{x}}_{i} = 3 ω^{2} x_{i} + 2 ω {\dot{y}}_{i} + u_{x_{i}} \\ {\ddot{y}}_{i} = - 2 ω {\dot{x}}_{i} + u_{y_{i}} \\ {\ddot{z}}_{i} = - ω^{2} z_{i} + u_{z_{i}} \end{cases} & i = P, E \end{matrix}

(1)

where, subscript P represents the Pursuer and subscript E represents the Evader,

x_{i}, y_{i}, z_{i}

represent the positions of the Pursuer and the Evader in the reference coordinate system oxyz,

{\dot{x}}_{i}, {\dot{y}}_{i}, {\dot{z}}_{i}

denote the velocities,

ω

is the orbital angular velocity of the reference spacecraft,

u_{x_{i}}, u_{y_{i}}, u_{z_{i}}

represent the thrust acceleration in three axes.

Defining state vector

X = {[\begin{array}{l} x & y & z & \dot{x} & \dot{y} & \dot{z} \end{array}]}^{T}

and control vector

U = {[\begin{matrix} u_{x} & u_{y} & u_{z} \end{matrix}]}^{T}

, Equation (1) can be written

{\dot{X}}_{i} = A X_{i} + B U_{i}

(2)

where

A = [\begin{matrix} \begin{matrix} \begin{array}{l} 0 \\ 0 \\ 0 \\ 3 ω^{2} \\ 0 \\ 0 \end{array} & \begin{array}{l} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 0 \end{array} & \begin{array}{l} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ ω^{2} \end{array} \end{matrix} & \begin{array}{l} 1 \\ 0 \\ 0 \\ 0 \\ - 2 ω \\ 0 \end{array} & \begin{array}{l} 0 \\ 1 \\ 0 \\ 2 ω \\ 0 \\ 0 \end{array} & \begin{array}{l} 0 \\ 0 \\ 1 \\ 0 \\ 0 \\ 0 \end{array} \end{matrix}], B = [\begin{matrix} \begin{array}{l} 0 \\ 0 \\ 0 \\ 1 \\ 0 \\ 0 \end{array} & \begin{array}{l} 0 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{array} & \begin{array}{l} 0 \\ 0 \\ 0 \\ 0 \\ 0 \\ 1 \end{array} \end{matrix}]

Then, the state space equations of the Pursuer and Evader can be obtained as

\begin{array}{l} {\dot{X}}_{P} = A X_{P} + B U_{P} \\ {\dot{X}}_{E} = A X_{E} + B U_{E} \end{array}

(3)

where subscript P represents the Pursuer and subscript E represents the Evader. In the reference coordinate system, the dynamics of the pursuit–evasion game are the difference between the dynamics of the Pursuer and the Evader. Defining

X_{P E} = X_{P} - X_{E}

, the relative dynamics can be written as

{\dot{X}}_{P E} = A X_{P E} + B U_{P} - B U_{E}

(4)

Considering the actual thruster limitation of the satellite, it is assumed that the constraints of thrust acceleration amplitude are

\{\begin{cases} ‖U_{P}‖ \leq ρ_{P} \\ ‖U_{E}‖ \leq ρ_{E} \end{cases}

(5)

Using the control strategy that satisfies Equation (5), the Pursuer and Evader compete for the terminal distance. The goal of the Pursuer is to intercept the Evader in the shortest time, while the Evader expects to avoid the interception. The terminal interception set is defined as

ψ = \{X_{P E} : ‖r_{P E}‖ = ‖r_{P} - r_{E}‖ \leq d_{c}\}

(6)

where

r_{P}

and

r_{E}

represent the position vectors of the Pursuer and the Evader, respectively. The Pursuer wants to use control

U_{P}

to make the relative state enter the set

ψ

as soon as possible, while the Evader expects to avoid this by applying control

U_{E}

.

3. Game Strategy Switch

In the pursuit–evasion game scenario, the Evader expects to increase the relative distance and quickly maneuver away from the Pursuer during the approach process. However, it will easily be intercepted by the Pursuer if the Evader adopts a fixed evasion strategy and this leads to the game strategy switch of the Evader. As the relative distance changes, the Evader will adjust evasion strategies in due time. This paper assumes that two game strategies are adopted, one is the linear quadratic game in consideration of fuel consumption, and the other is the zero-effort miss game strategy with maximum thrust.

3.1. Linear Quadratic Game Strategy

Linear quadratic differential game theory is widely used in pursuit–evasion problems. As described in [23], the objective function is first constructed, which is the quadratic function of the state difference and the control vector of the Pursuer and the Evader. The objective function of the Pursuer is

J_{P} = \frac{1}{2} X_{P E}^{T} (t_{f}) Q_{f} X_{P E} (t_{f}) + \frac{1}{2} \int_{t_{0}}^{t_{f}} (X_{P E}^{T} (t) Q X_{P E} (t) + U_{P}^{T} (t) R_{P} U_{P} (t) - U_{E}^{T} (t) R_{E} U_{E} (t)) d t

(7)

Due to the opposite goals of the Pursuer and the Evader, the designed objective function is also opposite. Then the objective function of the Pursuer can be represented as

J_{E} = - J_{P}

(8)

where,

Q

is a positive semi-definite matrix and

Q_{f}, R_{P}, R_{E}

are positive definite matrices. The conditions for the control sets

U_{P}^{*} (t)

and

U_{E}^{*} (t)

of the Pursuer and the Evader to arrive at the saddle point solution of the game are

\begin{array}{l} J_{E} (U_{P}^{*}, U_{E}^{*}) \leq J_{E} (U_{P}^{*}, U_{E}) \\ J_{P} (U_{P}^{*}, U_{E}^{*}) \leq J_{P} (U_{P}, U_{E}^{*}) \end{array}

(9)

Based on linear quadratic differential game theory, the optimal feedback control law of both sides can be derived [23].

U_{P}^{*} (t) = - R_{P}^{- 1} B^{T} P (t) X_{P E} (t)

(10)

U_{E}^{*} (t) = - R_{E}^{- 1} B^{T} P (t) X_{P E} (t)

(11)

where, P is a symmetric matrix obtained by solving the algebraic Riccati equation reversely, which satisfies

\begin{matrix} \dot{P} = - A^{T} P - P A - Q + P (B R_{P}^{- 1} B^{T} - B R_{E}^{- 1} B^{T}) P & P (t_{f}) \end{matrix} = Q_{f}

(12)

3.2. Zero-Effort Miss Game Strategy

When the pursuit–evasion game comes to the final stage, the Pursuer and Evader compete for the distance between them. In this case, only the relative distance of the two satellites is considered without considering fuel consumption, and the zero-effort miss

Z_{P E}

is introduced.

Z_{P E} = D Φ (t_{f}, t) X_{P E} = [\begin{matrix} I_{3} & 0_{3} \end{matrix}] Φ (t_{f}, t) X_{P E}

(13)

where,

D = [\begin{matrix} I_{3} & 0_{3} \end{matrix}]

and

Φ (t_{f}, t) = e^{A (t_{f} - t)}

are the state transition matrix of system (2) from

t

to

t_{f}

, as shown in [10], which satisfies

\dot{Φ} (t_{f}, t) = - Φ (t_{f}, t) A

(14)

By taking the derivative of zero-effort miss, we can obtain

{\dot{Z}}_{P E} = D (\dot{Φ} (t_{f}, t) X_{P E} + Φ (t_{f}, t) {\dot{X}}_{P E}) = B_{P} U_{P} - B_{E} U_{E}

(15)

where,

B_{P} = B_{E} = D Φ (t_{f}, t) B = Φ_{12} (t_{f}, t)

.

In the pursuit–evasion process, the Pursuer expects to reduce the zero-effort miss, while the Evader expects to increase the zero-effort miss in a way that is beneficial to itself as much as possible, so the zero-effort miss as the objective function is defined as

\min_{U_{P}} \max_{U_{E}} J = ‖Z_{P E} (t_{f})‖

(16)

According to the derivation process in reference [10], the control strategy of the Pursuer under maximum thrust acceleration can be obtained as

U_{P} = - ρ_{P} \frac{{(\frac{Z_{P E}^{T} (t)}{‖Z_{P E} (t)‖} B_{P})}^{T}}{‖\frac{Z_{P E}^{T} (t)}{‖Z_{P E} (t)‖} B_{P}‖} = - ρ_{P} \frac{B_{P}^{T} Z_{P E} (t)}{‖B_{P}^{T} Z_{P E} (t)‖}

(17)

where

ρ_{P}

is the maximum thrust acceleration amplitude of the Pursuer.

In the same way, the control strategy of the Evader under maximum thrust acceleration is

U_{E} = - ρ_{E} \frac{{(\frac{Z_{P E}^{T} (t)}{‖Z_{P E} (t)‖} B_{E})}^{T}}{‖\frac{Z_{P E}^{T} (t)}{‖Z_{P E} (t)‖} B_{E}‖} = - ρ_{E} \frac{B_{E}^{T} Z_{P E} (t)}{‖B_{E}^{T} Z_{P E} (t)‖}

(18)

where

ρ_{E}

is the maximum thrust acceleration amplitude of the Evader.

3.3. The Design of Switchable Pursuit-Evasion Strategy

The pursuit–evasion strategy based on the linear quadratic method takes into consideration the fuel consumption, so it is suitable for long-distance pursuit and evasion. In addition, the thrust acceleration according to the linear quadratic feedback control law is related to

R_{E}

and the relative state

X_{P E}

, and the thrust acceleration of the Evader will decrease as the distance between the two satellites decreases. Hence, the Evader will increase its thrust acceleration by switching the parameters of linear quadratic strategy. That is, when the Pursuer approaches the Evader, the Evader increases the output thrust acceleration by decreasing

R_{E}

.

When the distance between the Pursuer and the Evader is reduced to the warning range of the Evader, the best option for the Evader is to maneuver away from the Pursuer with a maximum thrust amplitude. That is, the Evader will switch to the zero-effort miss evasion strategy regardless of fuel consumption.

Based on the above analysis, the evasion strategy of the Evader can be designed. Suppose that the Evader has

M

evasion modes. Firstly, the Evader switches between different linear quadratic strategies by changing

R_{E}

in the former

M - 1

strategies, and then, finally, the Evader switches to the zero-effort miss strategy of the M-th strategy. The Evader has

M

evasion modes, the former

M - 1

strategies are linear quadratic evasion strategies, and the M-th is a zero-effort miss evasion strategy, which can be expressed as

U_{E} = \{\begin{array}{l} - R_{E_{1}}^{- 1} B^{T} P_{1} X_{P E} & ‖r_{P E}‖ > d_{P E_{1}} \\ ⋮ & ⋮ \\ - R_{E_{M - 1}}^{- 1} B^{T} P_{M - 1} X_{P E} & ‖r_{P E}‖ \geq d_{P E_{M - 1}} & ‖r_{P E}‖ < d_{P E_{M - 2}} \\ - ρ_{E} \frac{B_{E}^{T} Z_{P E} (t)}{‖B_{E}^{T} Z_{P E} (t)‖} & ‖r_{P E}‖ \leq d_{P E_{M - 1}} \end{array}

(19)

where

‖r_{P E}‖

represents the distance between the two spacecraft and

d_{P E_{i}}, i = 1, 2, \dots, M - 1

represents the strategy switch boundary of the Evader.

The Pursuer will also switch its strategy after the Evader performs a strategy switch. Therefore, the Pursuer’s strategy can be designed as

U_{P} = \{\begin{array}{l} - R_{P_{1}}^{- 1} B^{T} P_{1} X_{P E} & U_{E} = U_{E_{1}} \\ ⋮ & ⋮ \\ - R_{P_{M - 1}}^{- 1} B^{T} P_{M - 1} X_{P E} & U_{E} = U_{E_{M - 1}} \\ - ρ_{P} \frac{B_{E}^{T} Z_{P E} (t)}{‖B_{E}^{T} Z_{P E} (t)‖} & U_{E} = U_{E_{M}} \end{array}

(20)

However, in the case of incomplete information, the Pursuer does not know the strategy adopted by the Evader. Hence, the difficulty of strategy switching for the Pursuer lies in the estimation of the evasion strategy used by the Evader.

4. Strategy Estimation Method

To maneuver away from the Pursuer more effectively, the Evader will actively switch evasion strategy, which requires the Pursuer to estimate the evasion strategy in real time, and then change to its appropriate pursuit strategy. In this section, an interactive multiple-model learning filtering combined with an LSTM neural network is proposed. Feedback learning filters are used to estimate the state of the Evader, and then the evasion strategy is estimated by the LSTM network. Afterwards, the Pursuer switches to an appropriate pursuit strategy based on the estimated evasion strategy to intercept the Evader.

4.1. IMM-Based Strategy Switch Method

The multiple-model idea of the strategy switch of the pursuit–evasion game is to map the possible evasion strategy of the Evader into a model set, where each model corresponds to an evasion strategy. At the same time, multiple filters are used, working in parallel to estimate the state of each model. Then, the evasion strategy of the Evader is obtained by calculating the effective probability of each model.

The strategy switch method based on IMM is mainly divided into the following four steps:

Step 1: Reinitialization of Model Condition

The dynamics of the pursuit–evasion game between two spacecraft satisfy the Markov process, and the transition probability is

π_{i j} = \Pr \{r_{k} = j | r_{k} = i\}

(21)

where

r_{k}

represents the system mode at time k.

According to the estimation of each filter in the previous moment, the inputs of the filter corresponding to the j-th model at the current moment should be calculated first, these being mixed probability, mixed state estimation, and the corresponding error covariance matrix.

Mixed probability can be expressed as

μ_{k - 1 | k - 1}^{i | j} = \Pr {r_{k - 1} = i | r_{k} = j, Z_{k - 1}} = \frac{1}{c_{j}} π_{i j} μ_{k - 1}^{i}

(22)

where

c_{j} = \sum_{i = 1}^{M} π_{i j} μ_{k - 1}^{i}

is constant,

μ_{k - 1}^{i}

is the probability of matching model i at time

k - 1

,

π_{i j}

represents the transition probability from model i to model j.

Mixed state estimation and the corresponding error covariance matrix are given as

{\hat{x}}_{k - 1 | k - 1}^{0 j} = E \{x_{k - 1} | r_{k} = j, Z_{k - 1}\} = \sum_{i = 1}^{M} μ_{k - 1 | k - 1}^{i j} {\hat{x}}_{k - 1 | k - 1}^{i}

(23)

\begin{array}{l} P_{k - 1 | k - 1}^{0 j} = E \{(x_{k - 1} - {\hat{x}}_{k - 1 | k - 1}^{0 j}) {(x_{k - 1} - {\hat{x}}_{k - 1 | k - 1}^{0 j})}^{T} | r_{k} = j, Z_{k - 1}\} \\ = \sum_{i = 1}^{M} μ_{k - 1 | k - 1}^{i j} [P_{k - 1 | k - 1}^{i j} + ({\hat{x}}_{k - 1 | k - 1}^{i} - {\hat{x}}_{k - 1 | k - 1}^{0 j}) {({\hat{x}}_{k - 1 | k - 1}^{i} - {\hat{x}}_{k - 1 | k - 1}^{0 j})}^{T}] \end{array}

(24)

where

{\hat{x}}_{k - 1 | k - 1}^{i}

and

P_{k - 1 | k - 1}^{i}

are the state estimation and error covariance matrix of the i-th model filter at time k − 1, respectively.

Step 2: Multiple-Model Filtering

According to different evasion strategies, the corresponding pursuit–evasion strategy model is constructed.

For the pursuit–evasion model with a linear quadratic evasion strategy, it can be expressed as

{\dot{X}}_{P E} = Γ_{j} X_{P E} + B U_{P} = A X_{P E} + B R_{E_{j}}^{- 1} B^{T} P X_{P E} + B U_{P}

(25)

where

Γ_{j} = A + B R_{E_{j}}^{- 1} B^{T} P

and

j = 1, \dots, M - 1

denotes the

M - 1

linear quadratic pursuit strategy.

By discretizing the above equation, the state transition equation in discrete form can be obtained.

X_{P E_{k}} = F_{k | k - 1}^{j} X_{P E_{k - 1}} + B_{F_{j}} U_{P} + W_{k - 1}

(26)

where

W_{k - 1}

is zero-mean Gaussian white noise and represents the process noise sequence.

For the pursuit–evasion model with a zero-effort miss evasion strategy, it can be expressed as

\begin{array}{l} {\dot{X}}_{P E} = Γ_{M} X_{P E} + B U_{P} = A X_{P E} - ρ_{E} B \frac{B_{E}^{T} Z_{P E} (t)}{‖B_{E}^{T} Z_{P E} (t)‖} + B U_{P} \\ = (A + ρ_{E} B \frac{B_{E}^{T} D Φ (t_{f}, t)}{‖B_{E}^{T} Z_{P E} (t)‖}) X_{P E} + B U_{P} \end{array}

(27)

Similarly, the state transition equation can be obtained by discretizing the above equation

X_{P E_{k}} = F_{k | k - 1}^{M} X_{P E_{k - 1}} + B_{F_{M}} U_{P} + W_{k - 1}

(28)

In this way, the state transition matrix under the two game strategies can be acquired after receiving the new measurement information. Then, a Kalman filter is used to update the state estimation of the corresponding matching model j based on the new measurement information, which includes

\begin{array}{l} {\hat{x}}_{k | k - 1}^{j} = F_{k - 1}^{j} {\hat{x}}_{k - 1 | k - 1}^{0 j} + B_{F}^{j} U_{P_{k - 1}} \\ P_{k | k - 1}^{j} = F_{k - 1}^{j} P_{k - 1 | k - 1}^{0 j} {(F_{k - 1}^{j})}^{T} + Q_{k - 1}^{j} \\ {\hat{z}}_{k | k - 1}^{j} = H_{k - 1}^{j} {\hat{x}}_{k | k - 1}^{j} \\ S_{k}^{j} = H_{k}^{j} P_{k | k - 1}^{j} {(H_{k}^{j})}^{T} + R_{k}^{j} \\ K_{k}^{j} = P_{k | k - 1}^{j} {(H_{k}^{j})}^{T} {(S_{k}^{j})}^{- 1} \\ {\hat{x}}_{k | k - 1}^{j} = {\hat{x}}_{k | k - 1}^{j} + K_{k}^{j} (z_{k} - {\hat{z}}_{k | k - 1}^{j}) \\ P_{k | k}^{j} = P_{k | k - 1}^{j} - K_{k}^{j} S_{k}^{j} {(K_{k}^{j})}^{T} \end{array}

(29)

where superscript

j = 1, 2, \dots, M

represents the j-th filter corresponding to the j-th pursuit–evasion model,

Q_{k - 1}

and

R_{k}

are the process noise covariance and measurement noise covariance,

H_{k - 1}

is the measurement matrix, and

z_{k}

is the measured value at time k.

Step 3: Model probability update and pursuit strategy switch

For the model

j = 1, 2, \dots, M

, the model posterior probability at time k is calculated

μ_{k}^{j} = \Pr \{r_{k} = j | Z_{k}\} = \frac{c_{j} Λ_{k}^{j}}{\sum_{i = 1}^{M} c_{j} Λ_{k}^{j}}

(30)

where

Λ_{k}^{j} = N (z_{k}; {\hat{z}}_{k | k - 1}^{j}, S_{k}^{j})

is the Gaussian likelihood function of the j-th model.

N (z_{k}; {\hat{z}}_{k | k - 1}^{j}, S_{k}^{j}) \overset{d e f}{=} \frac{1}{\sqrt{\det (2 π S_{k}^{j})}} \exp [- \frac{1}{2} {(z_{k} - {\hat{z}}_{k | k - 1}^{j})}^{T} {(S_{k}^{j})}^{T} (z_{k} - {\hat{z}}_{k | k - 1}^{j})]

(31)

Obviously, the posterior probability at step k is satisfied

\sum_{j = 1}^{M} μ_{k}^{j} = 1

(32)

According to the posterior probability of each model, the evasion strategy corresponding to the model with the highest probability is the strategy adopted by the Evader.

U_{E} = U_{E}^{j}, j = \arg \max μ_{k}^{j}

(33)

Then, as shown in Equation (20), the Pursuer chooses the corresponding pursuit strategy according to the evasion strategy.

Step 4: Estimate fusion

Based on the output of each sub-filter, the overall estimate and estimation error covariance matrix at time

k

can be calculated

{\hat{x}}_{k | k} = E \{x_{k} | Z_{k}\} = \sum_{j = 1}^{M} μ_{k}^{j} {\hat{x}}_{k | k}^{j}

(34)

P_{k | k} = E \{(x_{k} - {\hat{x}}_{k | k}) {(x_{k} - {\hat{x}}_{k | k})}^{T} | Z_{k}\} = \sum_{j = 1}^{M} μ_{k}^{j} [P_{k | k}^{j} + ({\hat{x}}_{k | k}^{j} - {\hat{x}}_{k | k}) {({\hat{x}}_{k | k}^{j} - {\hat{x}}_{k | k})}^{T}]

(35)

4.2. Interactive Multiple-Model Feedback Learning Filter

In the IMM method, the final fusion estimate is only the output, with no feedback for the estimation of the next state. However, the final fusion tends to be more accurate than the model-dependent estimates, so the final fusion can be used as a reference for next state estimation. On this basis, an interactive multiple-model filter based on fusion estimation feedback learning is proposed.

First, the feedback learning term is defined based on the IMM method, namely

l_{k}^{j} = {\hat{ϕ}}_{k}^{j} - {\hat{x}}_{k | k - 1}^{j} = F_{k - 1}^{j} ({\hat{x}}_{k - 1 | k - 1} - {\hat{x}}_{k - 1 | k - 1}^{j})

(36)

where

{\hat{ϕ}}_{k}^{j}

is obtained from the overall estimate at time

k - 1

through the system state transition matrix, i.e.,

{\hat{ϕ}}_{k}^{j} = F_{k - 1}^{j} {\hat{x}}_{k - 1 | k - 1}

(37)

where

{\hat{x}}_{k - 1 | k - 1}

is the total estimate at time

k - 1

, calculated by Equation (34).

The feedback learning term is used for state estimation in the next moment; that is,

{\hat{x}}_{k | k}^{j} = {\hat{x}}_{k | k - 1}^{j} + K_{k}^{j} (z_{k} - H_{k}^{j} {\hat{x}}_{k | k - 1}^{j}) + ε_{k}^{j} l_{k}^{j}

(38)

where

ε_{k}^{j}

is the gain constant of the feedback learning term.

Due to the introduction of the feedback learning term, the gain matrix

K_{k}^{j}

needs to be redesigned. The error is defined as follows.

\begin{array}{l} e_{k}^{j} = x_{k} - {\hat{x}}_{k | k}^{j} \\ η_{k}^{j} = x_{k} - {\hat{x}}_{k | k - 1}^{j} \\ δ_{k}^{j} = x_{k} - {\hat{ϕ}}_{k}^{j} \end{array}

(39)

According to the updated estimation equation, the relationship between the three errors can be represented as

\begin{array}{l} e_{k}^{j} = η_{k}^{j} - K_{k}^{j} H_{k}^{j} η_{k}^{j} - K_{k}^{j} v_{k}^{j} - ε_{k}^{j} (η_{k}^{j} - δ_{k}^{j}) \\ = (I - K_{k}^{j} H_{k}^{j} - ε_{k}^{j}) η_{k}^{j} - K_{k}^{j} v_{k}^{j} + ε_{k}^{j} δ_{k}^{j} \end{array}

(40)

The corresponding covariance matrix is

\begin{array}{l} P_{k | k}^{j} = E \{e_{k}^{j} {(e_{k}^{j})}^{T}\} \\ = (I - K_{k}^{j} H_{k}^{j} - ε_{k}^{j} I) P_{k | k - 1}^{j} (I - K_{k}^{j} H_{k}^{j} - ε_{k}^{j} I) + \\ K_{k}^{j} R_{k}^{j} {(K_{k}^{j})}^{T} + {(ε_{k}^{j})}^{2} Φ_{k}^{j} + ε_{k}^{j} (I - K_{k}^{j} H_{k}^{j} - ε_{k}^{j} I) \sum_{k}^{j} + \\ ε_{k}^{j} {(\sum_{k}^{j})}^{T} {(I - K_{k}^{j} H_{k}^{j} - ε_{k}^{j} I)}^{T} \end{array}

(41)

where

I

is the identity matrix and

\begin{array}{l} Φ_{k}^{j} = E \{δ_{k}^{j} {(δ_{k}^{j})}^{T}\} = F_{k - 1}^{j} P_{k - 1 | k - 1} {(F_{k - 1}^{j})}^{T} + Q_{k - 1}^{j} \\ \sum_{k}^{j} = E \{η_{k}^{j} {(δ_{k}^{j})}^{T}\} = F_{k - 1}^{j} \sum_{i = 1}^{M} μ_{k - 1}^{i} E \{e_{k - 1}^{j} {(e_{k - 1}^{i})}^{T}\} {(F_{k - 1}^{j})}^{T} + Q_{k - 1}^{j} \end{array}

(42)

The optimal gain matrix

K_{k}^{j}

can be obtained by minimizing the covariance matrix trace, i.e.,

\frac{\partial t r \{P_{k | k}^{j}\}}{\partial K_{k}^{j}} = 0

(43)

Using the matrix calculation theory, any matrix satisfies

\begin{array}{l} \frac{\partial t r \{X Y\}}{\partial Y} = X^{T} \\ \frac{\partial t r \{X^{T} Y X\}}{\partial X} = (Y + Y^{T}) X \end{array}

(44)

Hence, we can obtain

K_{k}^{j} H_{k}^{j} P_{k | k - 1}^{j} {(H_{k}^{j})}^{T} - (1 - ε_{k}^{j}) P_{k | k - 1}^{j} {(H_{k}^{j})}^{T} + K_{k}^{j} R_{k}^{j} - ε_{k}^{j} \sum_{k}^{j} {(H_{k}^{j})}^{T} = 0

(45)

The optimal gain matrix is

K_{k}^{j} = [(1 - ε_{k}^{j}) P_{k | k - 1}^{j} {(H_{k}^{j})}^{T} + ε_{k}^{j} \sum_{k}^{j} {(H_{k}^{j})}^{T}] {[H_{k}^{j} P_{k | k - 1}^{j} {(H_{k}^{j})}^{T} + R_{k}^{j}]}^{- 1}

(46)

Thus, the Kalman filter in Step 2 can be replaced by

\begin{array}{l} {\hat{x}}_{k | k - 1}^{j} = F_{k - 1}^{j} {\hat{x}}_{k - 1 | k - 1}^{0 j} + B_{F} U_{P_{k - 1}} \\ P_{k | k - 1}^{j} = F_{k - 1}^{j} P_{k - 1 | k - 1}^{0 j} {(F_{k - 1}^{j})}^{T} + Q_{k - 1}^{j} \\ {\hat{ϕ}}_{k}^{j} = F_{k - 1}^{j} {\hat{x}}_{k - 1 | k - 1} \\ Φ_{k}^{j} = F_{k - 1}^{j} P_{k - 1 | k - 1} {(F_{k - 1}^{j})}^{T} + Q_{k - 1}^{j} \\ {\hat{x}}_{k | k}^{j} = {\hat{x}}_{k | k - 1}^{j} + K_{k}^{j} (z_{k} - H_{k}^{j} {\hat{x}}_{k | k - 1}^{j}) + ε_{k}^{j} ({\hat{ϕ}}_{k}^{j} - {\hat{x}}_{k | k - 1}^{j}) \\ S_{k}^{j} = H_{k}^{j} P_{k | k - 1}^{j} {(H_{k}^{j})}^{T} + R_{k}^{j} \\ P_{k | k}^{j} = (I - K_{k}^{j} H_{k}^{j} - ε_{k}^{j} I) P_{k | k - 1}^{j} (I - K_{k}^{j} H_{k}^{j} - ε_{k}^{j} I) + \\ K_{k}^{j} R_{k}^{j} {(K_{k}^{j})}^{T} + {(ε_{k}^{j})}^{2} Φ_{k}^{j} + ε_{k}^{j} (I - K_{k}^{j} H_{k}^{j} - ε_{k}^{j} I) \sum_{k}^{j} + ε_{k}^{j} {(\sum_{k}^{j})}^{T} {(I - K_{k}^{j} H_{k}^{j} - ε_{k}^{j} I)}^{T} \\ K_{k}^{j} = [(1 - ε_{k}^{j}) P_{k | k - 1}^{j} {(H_{k}^{j})}^{T} + ε_{k}^{j} \sum_{k}^{j} {(H_{k}^{j})}^{T}] {[H_{k}^{j} P_{k | k - 1}^{j} {(H_{k}^{j})}^{T} + R_{k}^{j}]}^{- 1} \end{array}

(47)

In this way, the interactive multiple-model feedback learning filter is obtained. In Step 2, the multiple-model feedback learning filter is used to estimate the state of the corresponding mode.

4.3. LSTM-IMML Method

RNN (Recurrent Neural Network) is a neural network used to deal with time series problems. It will memorize the previous information and apply it to the current output [24]. LSTM is a special RNN network, which is designed to solve the problem of long dependence. It can process sequence data efficiently and has been adopted in natural language processing, trajectory prediction [25], time series prediction [26], and other fields [27]. The LSTM unit can determine whether the current input is important, so long-term information is not affected by recursive operations and is stored in a more secure manner [24]. When estimating the evasion strategy, it is necessary to combine the current measurement information with the previous estimation state to improve accuracy and stability, which have a long-term dependence on information. Therefore, the LSTM network is embedded into the IMML method to estimate the mode probability of the evasion strategy in this section.

The LSTM cell structure is shown in Figure 2.

O_{t}

,

i_{t}

, and

C_{t}

are the states of output, input, and the cell state in the t-th epoch, respectively. A standard LSTM unit includes a forget gate, an input gate, and an output gate [28]. The forget gate is used to determine which information will be forgotten from the cell state, the input gate determines whether new information can be kept in the cell state, and the output gate determines which information will be output. The mathematical expressions of the forget gate, input gate, and output gate can be seen in [27].

In this paper, the LSTM network is mainly used to estimate the possible mode probability of the Evader based on the state information output by the multiple-model feedback filter. As shown in Figure 3, a probability estimation neural network based on LSTM is established.

As illustrated in Figure 3, the LSTM-based probability estimation network includes an input layer, two LSTM layers, two fully connected layers, and an output layer. The number of neurons in each layer is shown in Figure 3. A dropout layer is added after each LSTM layer to prevent the LSTM network from overfitting and to enhance the generalization ability of the network [29]. The dropout probability of the dropout layer is set as 10%. During training, the dropout layer discards the activation value of the LSTM neurons with a probability of 10%, thereby improving the generalization ability of the network.

The framework of the proposed LSTM-IMML method for the Evader’s strategy estimation is shown in Figure 4. The method consists of two parts. The top block represents the offline training process and the bottom block represents the online estimation process. During training, the training data sets and the training test sets are first constructed according to the IMM method, and then the training is completed on the ground computer. After the training of the LSTM network is completed, the LSTM network can be applied in the actual space pursuit–evasion game scenario without online training.

The steps of using the LSTM-IMML algorithm to estimate the evasion strategy are detailed below.

(1).: Based on the last filter estimation, calculate the mixing probability $μ_{k - 1 | k - 1}$ , the mixing state estimation ${\hat{x}}_{k - 1 | k - 1}^{0 j}$ , and the mixing error covariance matrix $P_{k - 1 | k - 1}^{0 j}$ ;
(2).: Use multiple feedback learning filters to estimate and update the state of each model based on the new measurement information $z_{k}$ ;
(3).: Take the measured residual $z_{k} - {\hat{z}}_{k | k - 1}^{j}$ of the filter as the input, adopt the trained LSTM network to calculate each mode probability $μ_{k}$ ;
(4).: Estimate and fuse the output of each filter, then repeat step 1.

Compared with the classic IMM method, the proposed LSTM-IMML method mainly has the following improvements. (1) The IMML method was inspired by the feedback idea, and the final fusion estimation is introduced into the estimation of the next state in the pursuit–evasion game, which can effectively improve the precision of state estimation. (2) Combined with its memory function, the LSTM network is introduced to improve the estimation stability of the Evader’s evasion strategy mode and reduce fluctuations. (3) The introduction of the LSTM network avoids the error caused by singular value in model probability estimation using the IMM, which increases the robustness of the method.

5. Numerical Simulation

The pursuit–evasion game scenario is simulated to verify the effectiveness of the proposed strategy switch algorithm based on an LSTM network and multiple-model feedback learning filtering. The reference orbit is a geosynchronous orbit, the Pursuer and Evader maneuver around the reference spacecraft. The maximal thrust acceleration amplitude of the Pursuer and Evader are

ρ_{P} = 0.1 {m / s}^{2}

and

ρ_{E} = 0.075 {m / s}^{2}

, respectively. The interception range is set as

d_{c} = 20 m

. The initial state of the Pursuer is

{[\begin{array}{l} 500 & - 1500 & 200 & 0 & 0 & 0 \end{array}]}^{T}

, and the initial state of the Evader is

{[\begin{array}{l} 0 & 0 & 0 & 0 & 0 & 0 \end{array}]}^{T}

. The Evader adopts three evasion strategies, the first two are linear quadratic evasion strategies, and the third is a zero-effort miss evasion strategy.

U_{E} = \{\begin{array}{l} - R_{E_{1}}^{- 1} B^{T} P_{1} X_{P E} & ‖r_{P E}‖ > d_{P E_{1}} \\ - R_{E_{2}}^{- 1} B^{T} P_{2} X_{P E} & ‖r_{P E}‖ \geq d_{{P E}_{2}} & ‖r_{P E}‖ < d_{{P E}_{1}} \\ - ρ_{E} \frac{B_{E}^{T} Z_{P E} (t)}{‖B_{E}^{T} Z_{P E} (t)‖} & ‖r_{P E}‖ \leq d_{{P E}_{2}} \end{array}

(48)

where the mode switch boundary

d_{P E_{1}} = 1000 m, d_{P E_{2}} = 500 m

. In the linear quadratic pursuit–evasion strategy,

Q_{f} = d i a g ([\begin{array}{l} 1 & 1 & 1 & 0 & 0 & 0 \end{array}])

,

R_{E_{1}} = d i a g ([\begin{matrix} 3 & 3 & 3 \end{matrix}]) \times 10^{- 7}

,

R_{E_{2}} = d i a g ([\begin{matrix} 2 & 2 & 2 \end{matrix}]) \times 10^{- 7}

, and

R_{P} = d i a g ([\begin{matrix} 1.5 & 1.5 & 1.5 \end{matrix}]) \times 10^{- 7}

. In the LSTM-IMML method, the Markov transition probability matrix is

π = [\begin{matrix} 0.9 & 0.05 & 0.05 \\ 0.05 & 0.9 & 0.05 \\ 0.05 & 0.05 & 0.9 \end{matrix}]

, and the elements in this matrix correspond to

π_{i j}

in Equation (21). The gain constant

ε

is 0.05.

In the feedback learning filter, the measurement noise covariance matrix is

R_{k} = d i a g ([\begin{matrix} 0.04 & 0.04 & 0.04 \end{matrix}])

and the process noise covariance matrix is

Q_{k} = d i a g ([\begin{array}{l} 1 & 1 & 1 & 0.01 & 0.01 & 0.01 \end{array}]) \times 10^{- 6}

. All prior model probabilities are set to the same value, i.e., the initial moment mode probability is

μ_{0} = {[\begin{matrix} 1 / 3 & 1 / 3 & 1 / 3 \end{matrix}]}^{T}

.

Two cases are simulated in this section. One is that the Pursuer adopts a fixed pursuit strategy, and the other is that the Pursuer switches its own pursuit strategy according to estimation of the Evader’s strategy. The Evader performs a strategy switch in both cases.

5.1. The Case of a Pursuer with a Fixed Strategy

In this scenario, the Evader performs multiple strategy switches, as shown in Equation (46). The Pursuer adopts a fixed strategy, which is the linear quadratic pursuit strategy. The simulation results of pursuit–evasion are as follows.

Figure 5a shows the maneuvering trajectories of the Pursuer and the Evader, and the distance between the two satellites is depicted in Figure 5b. The simulation results indicate that the Pursuer does not intercept the Evader successfully, although the Pursuer gradually approaches the Evader at the beginning. After the distance reaches the switch boundary, the Evader switches its strategy to the zero-effort miss strategy. Then, the Evader maneuvers away from the Pursuer at maximum thrust acceleration, as shown in Figure 5d, which leads to an increase in the relative distance. Figure 5c,d show the velocity and control acceleration of the two satellites, respectively, and it is clear that the Evader adopts two strategy switches to increase the relative velocity between the Pursuer and the Evader.

From the above simulation results, we can conclude that the Pursuer with a fixed pursuit strategy may not be able to intercept the Evader with multiple switchable evasion strategies. Thus, the Pursuer also needs to switch its own strategy to intercept the Evader.

5.2. The Case of a Pursuer with Strategy Switch

In this scenario, the Pursuer performs a strategy switch to match the Evader’s strategy, and the two IMM and LSTM-IMML methods are compared.

Firstly, the constructed strategy estimation network needs to be trained offline. According to the IMM method, the strategy of the Evader is estimated, and the strategy of the Pursuer is switched to perform the pursuit–evasion game simulation. Based on this, the training data set is constructed. The maximum number of training sets is 250, the gradient threshold is set as 1, and the learning rate is 0.005. The loss value in the training process is shown in Figure 6. From the training results, the LSTM neural network tends to converge after 250 iterations, indicating that the LSTM neural network is well trained.

Under the same conditions, based on the trained LSTM neural network, the simulation results of the LSTM-IMML method are shown as follows.

Figure 7a,b depict the maneuvering trajectory and relative distance when the Pursuer adopts the LSTM-IMML method for the estimation of the evasion strategy. It is clear that the Pursuer approaches the interception range within 20 m and finally intercepts the Evader. Figure 7c shows the velocities of the Pursuer and Evader. The control acceleration of the two satellites is shown in Figure 7d, which indicates the Pursuer quickly switches strategy after the Evader switches its evasion strategy. The effectiveness of the proposed LSTM-IMML strategy switch method is verified.

Furthermore, the evasion mode probabilities of the Evader estimated by the IMM method and the LSTM-IMML method are further compared, as shown in the following.

Figure 8 shows the probability that the Pursuer uses the IMM method to estimate the evasion strategy adopted by the Evader. In general, the IMM method can estimate the evasion strategy effectively. However, when the difference between the different evasion strategies is small, the estimated probability difference is not obvious. Especially between 375 s and 420 s, the probability of mode 2 does not show an advantage, and the probability difference between mode 1 and mode 2 is weak. This is because, when the distance between the two satellites is reduced, the control outputs of the Evader’s strategy 1 and strategy 2 are similar, which leads to a weakening of the observability, and the filter cannot distinguish between the two strategy models. In addition, between 500 s and 600 s, although mode 3 is dominant, there is still a jump phenomenon.

The mode probability of the Evader estimated by the LSTM-IMML method is shown in Figure 9, which illustrates that the evasion strategy used by the Evader can be accurately estimated. Compared with the IMM method, the LSTM network is more accurate and stable in estimating mode probability. In the corresponding time interval, the mode probability corresponding to the evasion strategy estimated by the Pursuer is above 0.8. In particular, between 300 and 420 s, the output difference between the control strategies of mode 1 and mode 2 is very small, and the observability becomes weaker. However, the mode 2 probability estimated by the LSTM-IMML method still occupies an advantage, which also demonstrates that the LSTM-IMML method is better. In addition, compared with the IMM method, the proposed LSTM-IMML method is more stable, and the estimated probability does not appear prone to drastic fluctuation.

To further analyze the accuracy of state estimation by the IMM method and the LSTM-IMML method, 100 Monte Carlo simulations were performed to calculate the error between the estimated state and the real value. The root mean square error of the position and the root mean square error of the velocity obtained by the IMM method and the LSTM-IMML method are shown in Figure 10 and Figure 11, respectively. The position and velocity root mean square error of the LSTM-IMML method are smaller than those of the IMM method, indicating that the introduction of feedback items in the filter can improve state estimation accuracy.

In addition, the case of the Evader using four evasion strategies is considered to further compare the IMM and LSTM-IMML methods; that is, the first three are linear quadratic strategies, and the fourth is a zero-effort miss strategy. The boundaries of mode switch are

d_{P E_{1}} = 1400 m, d_{P E_{2}} = 1000 m, d_{P E_{3}} = 500 m

. In the linear quadratic pursuit–evasion strategy,

R_{E_{1}} = d i a g ([\begin{matrix} 3 & 3 & 3 \end{matrix}]) \times 10^{- 7}, R_{E_{2}} = d i a g ([\begin{matrix} 2 & 2 & 2 \end{matrix}]) \times 10^{- 7}, R_{E_{3}} = d i a g ([\begin{matrix} 1.4 & 1.4 & 1.4 \end{matrix}]) \times 10^{- 7}

. In the LSTM-IMML method, the Markov transition probability matrix is

π = [\begin{matrix} 0.85 & 0.05 & 0.05 & 0.05 \\ 0.05 & 0.85 & 0.05 & 0.05 \\ 0.05 & 0.05 & 0.85 & 0.05 \\ 0.05 & 0.05 & 0.05 & 0.85 \end{matrix}]

. The IMM and LSTM-IMML methods are used for simulation, and the simulation results are shown in Figure 12 and Figure 13.

Figure 12a and Figure 13a show the relative distance between the two satellites when the Pursuer adopts the IMM and LSTM-IMML methods, respectively. When the Evader adopts a switchable evasion strategy with four modes, the Pursuer using the LSTM-IMML method approaches the interception range within 20 m at the end, while the Pursuer using the IMM method does not. This is because evasion mode 2 and mode 3 are very similar, which cannot be accurately estimated using the IMM method, but can be accurately estimated using the LSTM-IMML method, which can be seen from Figure 12b and Figure 13b. This simulation further proves the superiority of the proposed LSTM-IMML method in mode probability estimation.

6. Conclusions

In this paper, a new switchable pursuit strategy for the Pursuer is proposed when the Evader adopts multiple switchable evasion strategies. Firstly, the linear quadratic and zero-effort miss pursuit–evasion strategies are designed. Then, the IMM method is used to identify the evasion strategy of the Evader in parallel by using multiple filters. To overcome the problem that the IMM method has poor accuracy in estimating the Evader’s state, a feedback learning filter is proposed to improve the state estimation accuracy. The estimation accuracy is improved by introducing the feedback term. In addition, a mode probability estimation network based on LSTM is proposed to enhance the stability of the probability estimation, which is embedded in the interactive multiple-model learning filter. The simulation results verify the effectiveness of the proposed LSTM-IMML method. Compared with the IMM, the state estimation accuracy of the proposed LSTM-IMML method is improved, the estimated mode probability is more exact and stable, and the observability is stronger, which improves the interception effectiveness of the Pursuer. The proposed LSTM-IMML game switching strategy further enhances the recognition accuracy and stability of the Evader’s mode probability, which can be implemented in a space pursuit–evasion game mission with incomplete information. In subsequent studies, we will consider constraints such as navigation information loss and sensor operating distance.

Author Contributions

Conceptualization, D.C. and W.L.; methodology, C.W.; resources, W.L.; supervision, W.L.; writing—original draft, C.W.; writing—review and editing, D.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Laboratory of Space Intelligent Control Technology, China (grant number HTKJ2023KL502009), and the Basic Research Program of the Natural Science Foundation-Youth Fund Project (grant number BK20241460).

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy reasons.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Isaacs, R. Differential Games: A Mathematical Theory with Applications to Warfare and Pursuit, Control and Optimization; Dover Publications: Mineola, NY, USA, 1965; pp. 278–280. [Google Scholar]
Pontani, M.; Conway, B.A. Numerical solution of the three-dimensional orbital pursuit-evasion game. J. Guid. Control Dynam. 2009, 32, 474–487. [Google Scholar]
Ye, D.; Shi, M.M.; Sun, Z.W. Satellite proximate pursuit-evasion game with different thrust configurations. Aerosp. Sci. Technol. 2020, 99, 105715. [Google Scholar]
Li, Z.Y.; Zhu, H.; Yang, Z.L. Saddle point of orbital pursuit-evasion game under J2-perturbed dynamics. J. Guid. Control Dynam. 2020, 43, 1733–1739. [Google Scholar]
Zhang, Z.T.; Zhang, Y.K.; Wang, B. Application of the hp-adaptive pseudospectral method in spacecraft orbit pursuit-evasion game. Adv. Space Res. 2024, 73, 1597–1610. [Google Scholar]
Prince, E.R.; Hess, J.A.; Cobb, R.G. Elliptical orbit proximity operations differential games. J. Guid. Control Dynam. 2019, 42, 1458–1472. [Google Scholar]
Pang, B.; Wen, C.X.; Han, H.W.; Qiao, D. Solving Pursuit/Evasion Game Along Elliptical Orbit by Providing Precise Gradient. J. Guid. Control Dynam. 2024, 47, 797–807. [Google Scholar]
Li, J.Q.; Li, C.Y.; Zhang, Y.H. Guidance strategy of motion camouflage for spacecraft pursuit-evasion game. Chin. J. Aeronaut. 2024, 37, 312–319. [Google Scholar]
Wang, Q.; Ye, D.; Fan, N.J. Terminal orbital control of satellite pursuit evasion game based on zero effort miss. Trans. Beijing Inst. Technol. 2016, 36, 1171–1176. [Google Scholar]
Ye, D.; Shi, M.M.; Sun, Z.W. Satellite proximate interception vector guidance based on differential games. Chin. J. Aeronaut. 2018, 31, 1352–1361. [Google Scholar]
Zhang, P.; Fang, Y.W.; Zhang, F.M. An adaptive weighted differential game guidance law. Chin. J. Aeronaut. 2012, 25, 739–746. [Google Scholar]
Li, Z.Y. Orbital Pursuit–Evasion–Defense Linear-Quadratic Differential Game. Aerospace 2024, 11, 443. [Google Scholar] [CrossRef]
Gong, H.R.; Gong, S.P.; Li, J.F. Pursuit–evasion game for satellites based on continuous thrust reachable domain. IEEE Trans. Aerosp. Electron. Syst. 2020, 56, 4626–4637. [Google Scholar]
Zhao, L.R.; Zhang, Y.L.; Dang, Z.H. PRD-MADDPG: An efficient learning-based algorithm for orbital pursuit-evasion game with impulsive maneuvers. Adv. Space Res. 2023, 72, 211–230. [Google Scholar]
Shinar, J.; Glizer, V.Y.; Turetsky, V. Robust pursuit of a hybrid evader. Appl. Math. Comput. 2010, 217, 1231–1245. [Google Scholar]
Shinar, J.; Glizer, V.Y.; Turetsky, V. Capture zone of linear strategies in interception problems with variable structure dynamics. J. Frankl. I 2014, 351, 2378–2395. [Google Scholar]
Khairallah, N.; Kassas, Z.M. An Interacting Multiple Model Estimator of LEO Satellite Clocks for Improved Positioning. In Proceedings of the 2022 IEEE 95th Vehicular Technology Conference: (VTC2022-Spring), Helsinki, Finland 19–22 June 2022; pp. 1–5. [Google Scholar]
Li, X.F.; Ren, J.; Li, Y.B. Multi-mode filter target tracking method for mobile robot using multi-agent reinforcement learning. Eng. Appl. Artif. Intel. 2024, 127, 107398. [Google Scholar]
Wang, S.Y.; Zeng, Q.H.; Shao, C.; Li, F.D.; Liu, J.Y. Fault Detection and Interactive Multiple Models Optimization Algorithm Based on Factor Graph Navigation System. Remote Sens. 2024, 16, 1651. [Google Scholar] [CrossRef]
Li, X.Y.; Du, W.; Zhang, L.Q. Fault Diagnosis and Tolerant Control of Spacecraft via Interacting Multiple Model. In Proceedings of the 2024 IEEE 7th Advanced Information Technology, Electronic and Automation Control Conference (IAEAC), Chongqing, China, 15–17 March 2024; pp. 640–645. [Google Scholar]
Zou, X.G.; Zhou, D.; Du, R.L.; Liu, J.Q. Active defense guidance law via cooperative identification and estimation. J. Guid. Control Dynam. 2018, 41, 2507–2512. [Google Scholar]
Tang, X.; Ye, D.; Huang, L.; Sun, Z.W.; Sun, J.Y. Pursuit-evasion game switching strategies for spacecraft with incomplete-information. Aerosp. Sci. Technol. 2021, 119, 107112. [Google Scholar]
Jagat, A.; Sinclair, A.J. Optimization of Spacecraft Pursuit-Evasion Game Trajectories in the Euler-Hill Reference Frame. In Proceedings of the AIAA/AAS Astrodynamics Specialist Conference, San Diego, CA, USA, 4–7 August 2014. Paper Number 4131. [Google Scholar]
Sherstinsky, A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys. D Nonlinear Phenom. 2020, 404, 132306. [Google Scholar]
Silvestrini, S.; Lavagna, M. Neural-based predictive control for safe autonomous spacecraft relative maneuvers. J. Guid. Control Dynam. 2021, 44, 2303–2310. [Google Scholar]
Ding, G.; Qin, L. Study on the prediction of stock price based on the associated network model of LSTM. Int. J. Mach. Learn. Cyb. 2020, 11, 1307–1317. [Google Scholar]
Zhou, X.Y.; Qin, T.; Ji, M.J.; Qiao, D. A LSTM assisted orbit determination algorithm for spacecraft executing continuous maneuver. Acta Astronaut. 2023, 204, 568–582. [Google Scholar]
Alizadegan, H.; Rashidi Malki, B.; Radmehr, A.; Karimi, H.; Asghari Ilani, M. Comparative study of long short-term memory (LSTM), bidirectional LSTM, and traditional machine learning approaches for energy consumption prediction. Energy Explor. Exploit 2024, 01445987241269496. [Google Scholar]
Zhao, H.; Wei, B.; Zhang, P.; Guo, P.; Shao, Z.; Xu, S.; Jiang, L.; Hu, H.; Zeng, Y.; Xiang, P. Safety analysis of high-speed trains on bridges under earthquakes using a LSTM-RNN-based surrogate model. Comput. Struct 2024, 294, 107274. [Google Scholar]

Figure 1. Reference coordinate system.

Figure 2. LSTM cell structure.

Figure 3. The probability estimation neural network.

Figure 4. The framework of the LSTM-IMML method.

Figure 5. Simulation results of Pursuer with fixed strategy.

Figure 6. Loss during training.

Figure 7. Simulation results of Pursuer with 3 switchable strategies.

Figure 8. Mode probabilities estimated by IMM.

Figure 9. Mode probabilities estimated by LSTM-IMML.

Figure 10. Root mean square error of position.

Figure 11. Root mean square error of velocity.

Figure 12. Simulation results of Evader using four evasion strategies and Pursuer using IMM method.

Figure 13. Simulation results of Evader using four evasion strategies and Pursuer using LSTM-IMML method.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Wang, C.; Chen, D.; Liao, W. Interactive Multiple-Model Learning Filter for Spacecraft Pursuit–Evasion Game Strategy Switch Based on Long Short-Term Memory Network. Aerospace 2024, 11, 894. https://doi.org/10.3390/aerospace11110894

AMA Style

Wang C, Chen D, Liao W. Interactive Multiple-Model Learning Filter for Spacecraft Pursuit–Evasion Game Strategy Switch Based on Long Short-Term Memory Network. Aerospace. 2024; 11(11):894. https://doi.org/10.3390/aerospace11110894

Chicago/Turabian Style

Wang, Chuangge, Danhe Chen, and Wenhe Liao. 2024. "Interactive Multiple-Model Learning Filter for Spacecraft Pursuit–Evasion Game Strategy Switch Based on Long Short-Term Memory Network" Aerospace 11, no. 11: 894. https://doi.org/10.3390/aerospace11110894

APA Style

Wang, C., Chen, D., & Liao, W. (2024). Interactive Multiple-Model Learning Filter for Spacecraft Pursuit–Evasion Game Strategy Switch Based on Long Short-Term Memory Network. Aerospace, 11(11), 894. https://doi.org/10.3390/aerospace11110894

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

Interactive Multiple-Model Learning Filter for Spacecraft Pursuit–Evasion Game Strategy Switch Based on Long Short-Term Memory Network

Abstract

1. Introduction

2. Dynamics of Spacecraft Pursuit–Evasion

3. Game Strategy Switch

3.1. Linear Quadratic Game Strategy

3.2. Zero-Effort Miss Game Strategy

3.3. The Design of Switchable Pursuit-Evasion Strategy

4. Strategy Estimation Method

4.1. IMM-Based Strategy Switch Method

4.2. Interactive Multiple-Model Feedback Learning Filter

4.3. LSTM-IMML Method

5. Numerical Simulation

5.1. The Case of a Pursuer with a Fixed Strategy

5.2. The Case of a Pursuer with Strategy Switch

6. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI