1. Introduction
Recently, increasing research efforts [1] have been dedicated to orbital differential games in which spacecraft are regarded as conflicting players, each attempting to maximize its own interests. The most typical problem among them is the orbital pursuit–evasion game, in which each spacecraft aims to optimize survival-related performance indices such as the miss distance, the game duration, the energy consumption, or a combination of them. While most previous research focuses on two-player orbital pursuit–evasion games [2], the game scenario that involves three spacecraft has received far less attention and remains largely open. This paper investigates the three-player orbital pursuit–evasion–defense (PED) game, in which the roles of the three spacecraft are the pursuer, the evader, and the defender, respectively. A motivating scenario for the PED problem is the active protection of in-orbit spacecraft from space debris (or out-of-control satellites) [3]. In this scenario, the in-orbit spacecraft is the evader, the space debris is considered to be the pursuer, and the spacecraft accompanying the evader is the defender. The defender can reduce the impact threat of the debris by actively intercepting the pursuer. This study formulates this active protection problem as a PED differential game, as in [4].
Pontani and Conway [5] conducted an early study on a type of two-player orbital pursuit–evasion problem for spacecraft interception, where the pursuer attempts to minimize the interception time while the evader aims to maximize it. A two-sided optimal solution, i.e., the saddle-point solution, is obtained by solving a challenging high-dimensional two-point boundary value problem (TPBVP). Thereafter, many researchers have merged state-of-the-art intelligent optimization algorithms, such as evolutionary algorithms [6], with traditional gradient-based optimization algorithms, such as gradient descent [7], to form a variety of numerical methods for solving the TPBVP, such as the sensitivity method [8], the shooting method [9], and combined shooting and collocation methods [10]. Instead of focusing on the orbital pursuit–evasion game with interception time as the only objective, Jagat and Sinclair [11] examined the orbital linear-quadratic differential game (LQDG), where the pursuer and the evader each try to optimize an individual performance index combining both the miss distance and the energy consumption, and a pair of two-sided linear-quadratic (LQ) guidance laws is obtained. The orbital LQDG was further extended to nonlinear-quadratic cases by considering the nonlinear spacecraft dynamics [12]. Taking into account a more realistic information condition in the LQDG, Li et al. [13] investigated the orbital pursuit–evasion game with incomplete information and proposed an optimal strategy for the evader. The above research primarily focuses on orbital differential games between two players, whereas this paper investigates a game that involves three players.
Three-player differential games have been chiefly examined in the field of missile defense in recent years [14], with a focus on the target–attacker–defender (TAD) problem. In this problem, a missile (the attacker) pursues a non-maneuverable aircraft (the target), while the target can launch another missile (the defender) to protect itself by actively intercepting the attacker. Research efforts from various perspectives have been devoted to this problem. Shaferman and Shima [15] proposed a cooperative guidance law for the defender in which the possible guidance laws and parameters of the incoming homing missile are represented by a multiple-model adaptive estimator. Ratnoo and Shima [16] proposed a guidance strategy for the pursuer for situations in which a known class of guidance laws, such as line-of-sight guidance, is implemented by the defending missile. Considering both sides, Perelman et al. [17] reported cooperative evasion and pursuit strategies based on linear kinematics. More specifically, Prokopov and Shima [18] categorized the possible cooperation schemes between the target and the defender into three types: one-way cooperation realized by the target, one-way cooperation realized by the defender, and two-way cooperation; comparison results showed that two-way cooperation exhibits the best performance. Taking energy consumption as the optimization objective, Weiss et al. [19] determined the minimum interception and evasion effort required to achieve a desired performance in terms of miss distance. Several prominent studies have improved the above strategies from different aspects: the optimization of the switched system [20], the accessibility of the control information [21], suitability for the large-distance case [22], and the utilization of learning-based methods [23]. Moreover, a qualitative analysis of the three-player conflict problem was presented by Rubinsky and Gutman [24], in which algebraic conditions for the pursuer to capture the evader while escaping from the defender are examined.
Although extensive research has been conducted on the three-player TAD differential game in the missile field, few works address the orbital PED game that involves three spacecraft. Owing to the dissimilarity in dynamics and game environment, the methods developed for the missile TAD problem may not be directly applicable to the orbital PED game, in which the gravitational differences among the players need to be considered. Moreover, the game duration of the orbital game cannot be estimated using a linearized collision triangle as in the TAD problem, which makes it more complex and difficult to solve. In addition, the defender in space does not necessarily have to be launched by the evader; it can hover or lurk at a distance from the evader at the beginning of the game. Existing research on the orbital PED problem is very limited. Liu et al. [25] optimized the interception trajectories of the orbital PED game under a continuous-thrust assumption; because the game is treated as a trajectory optimization problem rather than a closed-loop guidance problem, the obtained solutions are open-loop and thus cannot reflect real-time conflicts among the three players. Later, they also optimized an impulsive-transfer solution to this problem [26], but the solution is still open-loop. Liang et al. [27] considered a space active defense problem in a two-on-two engagement involving an interceptor, a protector, a target spacecraft, and a defender; closed-loop guidance laws are derived for them, but the differences in gravitational force among the players are neglected in the game formulation.
In this paper, the three-player orbital PED game is examined and treated as a closed-loop guidance problem. Orbital dynamics accounting for the gravitational force are employed for the spacecraft. Considering both the energy consumption and the miss distance, an LQDG formulation is adopted to model the game. Three categories of guidance strategies are progressively designed and compared. The key contributions of this study are as follows: (1) a two-sided optimal pursuit strategy is designed for the pursuer, which enhances its self-defense ability while it carries out a pursuit mission; (2) a cooperative evasion–defense strategy is devised for the evader and the defender to enhance their cooperation. Moreover, a variant of the LQ strategy, the linear-quadratic duration-adaptive (LQDA) strategy, is presented, with the time-to-go specifically designed to achieve the interception condition.
The remainder of this paper is organized as follows: Section 2 briefly reviews the orbital dynamics and the two-player game model, and further introduces the formulation of the three-player orbital PED game. Section 3 presents the details of deriving and computing the LQDA guidance law, along with the two-sided optimal pursuit strategy and the design of the cooperative evasion–defense strategy. Section 4 demonstrates the effectiveness of the proposed strategies via numerical simulations, followed by concluding remarks in Section 5.
3. Strategy Design
Different strategies are designed for the three introduced cases. The simplest scenario is one in which the three players are independent: the pursuer merely chases the evader and makes no maneuvers to evade the defender, the evader simply dodges the interception of the pursuer and has no coordination with the defender, and the defender focuses solely on chasing the pursuer. For this scenario, a linear-quadratic duration-adaptive (LQDA) strategy is designed for the three players. Further, considering that the pursuer can chase the evader and simultaneously evade the defender, a two-sided optimal pursuit strategy is proposed for the pursuer. Finally, considering the potential coordination between the evader and the defender, a cooperative optimal evasion–defense strategy is designed.
3.1. Linear-Quadratic Duration-Adaptive Strategy
As discussed in Section 2.3, the three-player game in the first scenario can be formulated as two sets of two-player pursuit–evasion games. However, the LQ strategy in Equations (7) and (8) cannot be applied directly to this game, since it terminates after a fixed duration. For this three-player game, which is terminated by the interception condition in Equation (15), an LQDA strategy is designed for the spacecraft players.
The primary idea of the LQDA strategy is to adjust the time-to-go according to the real-time miss distance so as to achieve a terminal interception, i.e., to complete a terminal control (TC) [30] in the pursuit–evasion game. More specifically, the LQDA strategy has the same form of control laws as the classic LQ strategy, but the time-to-go is designed in a different manner. The idea of adjusting the time-to-go can be found in the missile literature [31]. Gutman and Rubinsky [32] derived analytical vector guidance laws through an analysis of the time-to-go. Later, Ye et al. [33] applied this idea to a kind of orbital pursuit–evasion game that considers the terminal miss distance as the only payoff function. However, these methods do not apply to the LQ pursuit–evasion game that considers both the miss distance and the energy consumption, so this section presents the details of the LQDA strategy.
3.1.1. Derivation
For convenience of expression, a reduced state vector is utilized to describe the interception game:
$$\mathbf{Z}(t) = \mathbf{D}\,\boldsymbol{\Phi}(t_f, t)\,\mathbf{x}(t), \qquad \mathbf{D} = \begin{bmatrix} \mathbf{I}_{3\times3} & \mathbf{0}_{3\times3} \end{bmatrix},$$
where $\mathbf{Z}(t)$ is a three-dimensional state vector known as the zero-effort miss (ZEM) [31], i.e., the terminal relative position predicted at time $t$ under the assumption that no control effort is applied during the remaining interval $[t, t_f]$, and $\boldsymbol{\Phi}(t_f, t)$ represents the state-transition matrix.
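To make the ZEM construction concrete, the following minimal sketch evaluates it for Clohessy–Wiltshire (CW) relative dynamics in the LVLH frame; the CW model, the function names, and the state ordering [x, y, z, vx, vy, vz] are illustrative assumptions and are not taken from Equations (10)–(13) of the paper.

```python
import numpy as np

def cw_stm(tau: float, n: float) -> np.ndarray:
    """Clohessy-Wiltshire state-transition matrix over a horizon tau
    for the state [x, y, z, vx, vy, vz] (radial, along-track, cross-track)."""
    s, c = np.sin(n * tau), np.cos(n * tau)
    phi = np.zeros((6, 6))
    phi[0, :] = [4 - 3 * c, 0, 0, s / n, 2 * (1 - c) / n, 0]
    phi[1, :] = [6 * (s - n * tau), 1, 0, 2 * (c - 1) / n, (4 * s - 3 * n * tau) / n, 0]
    phi[2, :] = [0, 0, c, 0, 0, s / n]
    phi[3, :] = [3 * n * s, 0, 0, c, 2 * s, 0]
    phi[4, :] = [6 * n * (c - 1), 0, 0, -2 * s, 4 * c - 3, 0]
    phi[5, :] = [0, 0, -n * s, 0, 0, c]
    return phi

def zero_effort_miss(x_rel: np.ndarray, t_go: float, n: float) -> np.ndarray:
    """ZEM: predicted terminal relative position if neither player maneuvers
    during the remaining time-to-go, Z = [I 0] * Phi(t_go) * x_rel."""
    D = np.hstack([np.eye(3), np.zeros((3, 3))])
    return D @ cw_stm(t_go, n) @ x_rel

# Example: pursuer-evader relative state with case-1-like values (meters, m/s).
n = 0.00113   # mean motion of a ~400 km circular orbit, rad/s
x_rel = np.array([-6000.0, -16000.0, 4000.0, -9.0, 13.6, 0.0])
print(zero_effort_miss(x_rel, t_go=600.0, n=n))
```

The same construction applies to the defender–pursuer pair by supplying the corresponding relative state and time-to-go.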
Further, the game defined by Equations (10) and (11) can be simplified using this reduced state, with the control matrix and the weighting matrix redefined accordingly.
Accordingly, the saddle-point solution takes the following form [34], in which the time-to-go $t_{go} = t_f - t$ replaces the game duration and the gains are determined by the reduced controllability matrices of the two players.
The above solutions are derived from a standard LQ finite-time game, which lacks the terminal interception constraint of Equations (14) and (15); imposing this constraint transforms the original problem into a free-time pursuit–evasion TC problem [30]. To make the solutions in Equations (18) and (19) adaptive to this problem, the time-to-go is specifically designed.
With the controls of the saddle-point strategy substituted in, the final ZEM is governed by the closed-loop state equation, Equation (23). Denoting its time-varying coefficient matrix by $\mathbf{A}(t)$ and combining the initial condition $\mathbf{Z}(t)$ at the current time, Equation (23) can be rewritten as Equation (24), a first-order linear time-variant ordinary differential equation. A closed-form solution of the final ZEM $\mathbf{Z}_f$ does not exist except in special cases [35]. $\mathbf{Z}_f$ can be computed by numerically integrating Equation (24), but an approximate method is adopted here to solve it efficiently.
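As a reference against which the approximation below can be checked, the closed-loop ZEM equation can also be integrated numerically, as in the following sketch. The explicit coefficient matrix of Equation (24) is not reproduced here, so the callable `A_of_t` is an assumed placeholder supplied by the user.

```python
import numpy as np
from scipy.integrate import solve_ivp

def propagate_zem(z_t, t, t_f, A_of_t):
    """Numerically integrate dZ/ds = A(s) Z from s = t to s = t_f.

    z_t    : current 3-D ZEM vector
    A_of_t : callable returning the 3x3 closed-loop coefficient matrix
             of Equation (24) at time s (placeholder, not given here)
    """
    sol = solve_ivp(lambda s, z: A_of_t(s) @ z, (t, t_f), np.asarray(z_t, float),
                    rtol=1e-9, atol=1e-12)
    return sol.y[:, -1]   # final ZEM Z_f
```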
First, the interval $[t, t_f]$ is evenly divided into $N$ subintervals, within each of which the coefficient matrix $\mathbf{A}(t)$ is approximated by a constant matrix. According to the analytical solution of a linear time-invariant system without control inputs [35], the ZEM is propagated across each subinterval by the corresponding matrix exponential, Equation (25); chaining the $N$ subintervals then expresses the approximate final ZEM in terms of the current ZEM $\mathbf{Z}(t)$ and the time-to-go, Equation (26). Combining Equations (22) and (26), the relationship in Equation (27) can be derived.
The designed time-to-go is the exact solution of Equation (27); given the current miss distance, the corresponding time-to-go is obtained by inverting this relationship.
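The piecewise-constant approximation and the resulting implicit equation for the time-to-go can be sketched as follows. Again, `A_of_t` stands in for the coefficient matrix of Equation (24) and is an assumption; the residual function mirrors Equation (27) by comparing the predicted final miss with the permitted miss distance $m$.

```python
import numpy as np
from scipy.linalg import expm

def final_zem_approx(z_t, t, t_go, A_of_t, N=20):
    """Approximate Z_f by freezing A(t) on each of N equal subintervals
    of [t, t + t_go] and chaining the corresponding matrix exponentials."""
    dt = t_go / N
    z = np.asarray(z_t, dtype=float)
    for k in range(N):
        Ak = A_of_t(t + (k + 0.5) * dt)  # one constant matrix per subinterval
        z = expm(Ak * dt) @ z
    return z

def miss_residual(t_go, z_t, t, A_of_t, m):
    """Counterpart of Equation (27): zero when the predicted final miss
    equals the permitted miss distance m."""
    return np.linalg.norm(final_zem_approx(z_t, t, t_go, A_of_t)) - m
```

Evaluating the constant matrix at the subinterval midpoint is one possible choice; any fixed evaluation point within the subinterval yields the same first-order approximation.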
The procedure for computing the time-to-go is presented in Section 3.1.2. Given the time-to-go $t_{go}$, the control laws for the pursuer and the evader are given by Equations (28) and (29), in which the control gains vary with the real-time miss distance. Note that although the control laws have a linear form, they are nonlinear overall, since the equation determining the time-to-go is nonlinear.
Similar to the pursuer's law in Equation (28), the control law of the defender is given by Equation (30), where the time-to-go of the defender is calculated by Equation (13) and the corresponding ZEM is that of the defender with respect to the pursuer.
Eventually, Equations (28)–(30) constitute the LQDA strategy for the three independent players.
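One guidance update of the LQDA strategy can be organized as in the sketch below. The explicit gain formulas of Equations (18), (19), and (30) are not reproduced in this excerpt, so `lq_gain` is an assumed placeholder; `solve_time_to_go` refers to the routine of Section 3.1.2, and `zero_effort_miss` is the helper sketched after the ZEM definition.

```python
import numpy as np

def lqda_step(x_pe, x_dp, t, n, lq_gain, solve_time_to_go, m=100.0):
    """One guidance update of the LQDA strategy for the three independent players.

    x_pe, x_dp       : relative states (pursuer-evader, defender-pursuer) in the LVLH frame
    lq_gain          : callable (t_go, role) -> 3x3 feedback gain of Eqs. (18)/(19)/(30) [placeholder]
    solve_time_to_go : callable implementing Algorithm 1 for the given pair
    """
    # Time-to-go of each pair, adapted to the permitted miss distance m.
    tgo_pe = solve_time_to_go(x_pe, t, m)
    tgo_dp = solve_time_to_go(x_dp, t, m)

    # ZEM of each pair.
    z_pe = zero_effort_miss(x_pe, tgo_pe, n)
    z_dp = zero_effort_miss(x_dp, tgo_dp, n)

    # LQ feedback on the ZEM: the chasing player drives "its" ZEM toward zero,
    # while the evading player pushes it away (opposite sign), as in the LQ saddle point.
    u_p = -lq_gain(tgo_pe, "pursuer") @ z_pe
    u_e = +lq_gain(tgo_pe, "evader") @ z_pe
    u_d = -lq_gain(tgo_dp, "defender") @ z_dp
    return u_p, u_e, u_d
```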
3.1.2. Calculation of Time-to-Go
Precisely, the time-to-go is the minimum positive zero point of the function defined in Equation (31). Since this function is not monotone and may have multiple zero points, solving for the time-to-go requires special techniques. Note that the function is evaluated with the ZEM frozen at its current value, because the ZEM changes with time during the game.
Two rational assumptions are made: (1) at the beginning of the game, the distance between the two players is larger than the permitted miss distance ($m$ in Equation (14)); (2) given the stronger maneuverability of the chasing player along with sufficient time, the miss distance can be made smaller than the permitted value. These two assumptions guarantee the existence of the time-to-go.
Based on these assumptions, the first zero point must lie in a descending segment of the curve of the function, i.e., where its derivative with respect to the time-to-go is negative, which excludes the other zero points lying in ascending segments. After extensive simulations with a variety of different parameter values, two further observations can be made: (3) the time interval between a pair of adjacent peak and trough of the function is no less than 50 s, even for extremely fluctuating cases (see Figure 2); and (4) the change in the time-to-go between two consecutive guidance steps is significantly less than 50 s unless a critical point appears (see Figure 3). Based on these observations, the time-to-go calculated at the last step is taken as the initial guess, and Newton's method [36] is used to efficiently solve for the current time-to-go.
However, for the time-to-go at the initial phase of the game, or when a critical point appears, an appropriate initial guess must be sought explicitly. According to observations (3) and (4), a suitable candidate can be obtained by scanning the time-to-go from zero to a sufficiently large upper bound with a fixed step and retaining the minimum value that satisfies the required condition. This candidate is then used as the initial guess in Newton's method. The entire algorithm is summarized in Algorithm 1.
Algorithm 1. Calculation of the time-to-go in the game
1:  Initialize all the parameters of the zero-point function at the start of the game.
2:  For each candidate time-to-go on the coarse grid do
3:      Calculate the derivative of the function with respect to the time-to-go.
4:      If the derivative is negative (descending segment),
5:          Take this candidate as the initial guess and solve for the time-to-go using Newton's method.
6:          If a zero point is found within the iteration accuracy,
7:              Break the loop.
8:          End if
9:      End if
10: End for
11: For each subsequent guidance step do
12:     Let the time-to-go of the previous step be the candidate and calculate the value of the function.
13:     If the candidate lies in a descending segment of the function,
14:         Take it as the initial guess and solve for the time-to-go using Newton's method.
15:         If a zero point is found within the iteration accuracy,
16:             Break the loop.
17:         Else
18:             Repeat steps 2 to 8 to obtain the time-to-go and break the loop.
19:         End if
20:     Else
21:         Reset the candidate via the coarse scan and calculate the value of the function.
22:     End if
23:     Repeat steps 11 to 18 at the next guidance step.
24: End for
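A compact implementation of Algorithm 1's structure (a coarse scan for an initial guess on a descending segment, followed by a warm-started Newton iteration with a grid-search fallback) might look as follows. The residual `f` is the `miss_residual` sketched after Equation (27), and the 50 s step reflects observations (3) and (4); the function names and tolerances are illustrative assumptions.

```python
import numpy as np

def newton_zero(f, x0, tol=1e-3, max_iter=30, h=1e-2):
    """Newton's method with a finite-difference derivative."""
    x = x0
    for _ in range(max_iter):
        fx = f(x)
        if abs(fx) < tol:
            return x
        dfx = (f(x + h) - f(x - h)) / (2.0 * h)
        if dfx == 0.0:
            break
        x -= fx / dfx
    return None  # did not converge

def time_to_go(f, tgo_prev=None, t_max=3000.0, step=50.0, tol=1e-3):
    """Minimum positive zero of the residual f(t_go), in the spirit of Equation (31).

    f        : callable t_go -> predicted miss minus permitted miss (see miss_residual)
    tgo_prev : time-to-go from the previous guidance step (warm start), if available
    """
    # Warm start with the previous step's value (observation (4)).
    if tgo_prev is not None:
        root = newton_zero(f, tgo_prev, tol)
        if root is not None and root > 0.0:
            return root
    # Otherwise scan with a step smaller than the peak-trough spacing (observation (3))
    # and start Newton from the first candidate lying on a descending segment.
    for tg in np.arange(step, t_max, step):
        slope = (f(tg + 1.0) - f(tg - 1.0)) / 2.0
        if slope < 0.0:
            root = newton_zero(f, tg, tol)
            if root is not None and root > 0.0:
                return root
    return None
```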
3.2. Two-Sided Optimal Pursuit Strategy
In the second scenario, the pursuer can chase the evader and simultaneously evade the defender. A two-sided optimal pursuit strategy is therefore proposed for the pursuer, where "two-sided" means that both the evader and the defender are taken into consideration.
From the pursuer's point of view, the game system is the combination of Equations (10) and (12), given in Equation (32). To behave as both a pursuer and an evader, a comprehensive payoff function, Equation (33), is constructed for the pursuer so that it can simultaneously handle the two opponents. To solve for the pursuer's optimal controls, a Hamiltonian and a terminal function are constructed as in Equations (34) and (35), where the co-states are associated with the state equations of the pursuer–evader and pursuer–defender subsystems, respectively. Substituting the evader's and defender's control laws, Equations (29) and (30), into Equation (34) yields the reduced Hamiltonian faced by the pursuer.
According to Pontryagin's maximum principle (PMP), the optimal controls of the pursuer must minimize the Hamiltonian; thus, the derivative of the Hamiltonian with respect to the pursuer's control inputs must vanish, which yields the pursuer's optimal control expression in Equation (37).
The evolution of the co-state vectors follows the adjoint equations in Equations (38) and (39), and the transversality conditions of the co-states are given by Equations (40) and (41). Substituting Equations (38) and (39) into Equation (37), and further substituting Equation (37) into Equation (32), Equation (42) is obtained. Combining the equations of the state and the co-state then yields Equation (43).
Defining the augmented state–co-state vector and the corresponding coefficient matrices as in Equations (44) and (45), Equation (43) can be rewritten in matrix form as Equation (46).
A linear relationship between the state and the co-state is then obtained, Equation (47), in which the coefficient matrix is the Riccati matrix. By substituting Equation (47) into Equation (46), a matrix Riccati differential equation, Equation (48), is derived.
Substituting Equation (47) into the transversality conditions, Equations (40) and (41), the boundary condition of Equation (48) is derived as Equation (49). By integrating Equation (48) backwards from this boundary condition, the Riccati matrix can be solved. The optimal control law of the pursuer in Equation (37) can then be computed in state-feedback form as Equation (50), with its gain determined by the Riccati matrix.
Eventually, Equation (50) provides the two-sided optimal pursuit strategy for a pursuer that is simultaneously offensive and defensive.
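The backward sweep and the resulting feedback law can be sketched generically as below. The matrices A, S, and Q stand in for the coefficient matrices assembled in Equations (44)–(46), and P_f for the boundary condition of Equation (49); none of these are reproduced in this excerpt, so they are passed in as assumed inputs (shown here as constant matrices; time-varying coefficients would be passed as callables instead).

```python
import numpy as np
from scipy.integrate import solve_ivp

def solve_riccati_backward(A, S, Q, P_f, t0, t_f):
    """Integrate a matrix Riccati equation dP/dt = -(A^T P + P A - P S P + Q)
    backward from P(t_f) = P_f (a generic LQ-game form standing in for Eq. (48))."""
    n = P_f.shape[0]

    def rhs(t, p_flat):
        P = p_flat.reshape(n, n)
        dP = -(A.T @ P + P @ A - P @ S @ P + Q)
        return dP.ravel()

    sol = solve_ivp(rhs, (t_f, t0), P_f.ravel(), dense_output=True,
                    rtol=1e-8, atol=1e-10)
    return lambda t: sol.sol(t).reshape(n, n)   # P(t) as an interpolant

def pursuer_feedback(B_p, a_p, P_of_t):
    """Two-sided pursuit law in the spirit of Equation (50):
    u_P(t, x) = -(1/a_p) B_p^T P(t) x for the augmented state x."""
    return lambda t, x: -(1.0 / a_p) * B_p.T @ P_of_t(t) @ x
```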
3.3. Cooperative Evasion–Defense Strategy
The evader and the defender can coordinate with each other to defeat the pursuer. With the help of the defender, the evader does not need to move far away from the pursuer; it only needs to maintain a reasonable distance from the pursuer, preferably larger than the distance between the defender and the pursuer. Based on this, the payoff function in Equation (51) is constructed for the evader and the defender. In this payoff function, the squared distance is used in place of the distance itself in order to maintain a quadratic form, consistent with Equation (33). The corresponding Hamiltonian for the evader–defender pair is then constructed analogously to Equation (34).
According to the PMP, the optimal controls of the evader and the defender must minimize the Hamiltonian; the derivatives of the Hamiltonian with respect to their respective controls must therefore vanish, which yields their optimal control expressions. The evolution of their co-state vectors follows the corresponding adjoint equations. Substituting Equations (55) and (56) into the state equation, Equation (32), and then combining the equations of the state and the co-state, a procedure similar to that of Equations (45)–(49) yields the coefficient matrices of a matrix Riccati differential equation with the same form as Equation (48).
After solving the matrix Riccati differential equation of the form of Equation (48), the optimal control laws of the evader and the defender are obtained as Equations (61) and (62), with their feedback gains determined by the corresponding Riccati matrix.
Consequently, Equations (61) and (62) constitute the optimal cooperative evasion–defense strategy for a coordinated evader–defender pair.
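Since the explicit form of Equation (51) is not reproduced in this excerpt, the sketch below only evaluates a payoff consistent with the verbal description above: reward a large pursuer–evader distance, penalize a large pursuer–defender distance (both squared to keep the quadratic form), and penalize the evader's and defender's control effort. The weighting names are assumptions; such a function can be used to compare strategy pairs in simulation.

```python
import numpy as np

def cooperative_payoff(r_pe_f, r_pd_f, u_e_hist, u_d_hist, dt,
                       beta_pe=1.0, beta_pd=1.0, a_e=1.0, a_d=1.0):
    """Terminal-plus-effort payoff consistent with the description of Eq. (51).

    r_pe_f, r_pd_f     : terminal relative position vectors (P-E and P-D)
    u_e_hist, u_d_hist : arrays of shape (steps, 3) with the applied controls
    """
    terminal = beta_pe * np.dot(r_pe_f, r_pe_f) - beta_pd * np.dot(r_pd_f, r_pd_f)
    effort = dt * (a_e * np.sum(u_e_hist ** 2) + a_d * np.sum(u_d_hist ** 2))
    return terminal - effort   # the evader-defender team seeks to maximize this value
```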
4. Simulation Results
To verify the performance of the proposed strategies, a three-player orbital pursuit–evasion–defense game with different strategy combinations is simulated and compared in this section. Initially, the evader is moving in a circular orbit with an altitude of 400 km. To establish the LVLH coordinate frame, the initial orbit of the evader is chosen as the reference orbit, and the point coinciding with the initial position of the evader is selected as the reference moving point. The initial states of the pursuer, the evader, and the defender in the LVLH frame are chosen as $\mathbf{x}_{P0}$ = [−6 km, −16 km, 4 km, −9 m/s, 13.6 m/s, 0 m/s], $\mathbf{x}_{E0}$ = [0 km, 0 km, 0 km, 0 m/s, 0 m/s, 0 m/s], and $\mathbf{x}_{D0}$ = [−1 km, 3 km, 0 km, 0 m/s, 0 m/s, 0 m/s], respectively. The permitted terminal miss distance in the game is 100 m, and the payoff weightings of the game are tuned accordingly. The above case (case 1 in Table 1) serves as a typical representative and is used throughout Section 4.1, Section 4.2, and Section 4.3 for a vertical comparison. Some other cases with different initial positions are also provided in Table 1 (in case 2, the pursuer flies around the evader and the defender hovers; in case 3, the defender flies around the evader and the pursuer hovers).
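For reproducibility, the case-1 setup can be encoded as below; the mean motion follows from the stated 400 km circular reference orbit, while the variable names and the CW/LVLH modeling choice are assumptions carried over from the earlier sketches.

```python
import numpy as np

MU_EARTH = 398600.4418e9          # gravitational parameter, m^3/s^2
R_EARTH = 6378.137e3              # Earth equatorial radius, m
a_ref = R_EARTH + 400.0e3         # 400 km circular reference orbit of the evader
n_ref = np.sqrt(MU_EARTH / a_ref**3)   # mean motion, ~1.13e-3 rad/s

# Case 1 initial states in the LVLH frame: [x, y, z, vx, vy, vz] in m and m/s.
x_P0 = np.array([-6000.0, -16000.0, 4000.0, -9.0, 13.6, 0.0])   # pursuer
x_E0 = np.zeros(6)                                               # evader (reference point)
x_D0 = np.array([-1000.0, 3000.0, 0.0, 0.0, 0.0, 0.0])          # defender

MISS_DISTANCE = 100.0   # permitted terminal miss distance, m
```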
4.1. Results of the LQDA Strategy
First, we examine the LQDA strategy in the two-player pursuit–evasion game (on the basis of case 1, the defender is temporarily omitted). As illustrated in Figure 4a, the pursuer can successfully intercept the evader. Compared with the results of the classic LQ strategy in Figure 4b, the LQDA strategy achieves higher interception accuracy. Then, a defender using the LQDA strategy is added to the case (thus recovering case 1), and the game results are illustrated in Figure 4c. It can be observed that the defender intercepts the pursuer before the pursuer reaches the evader. Compared with Figure 4a, the appearance of an unexpected defender dramatically changes the ending of the game, signifying the importance of the defender to the evader. However, if the defender uses the classic LQ strategy, the evader loses the game, as shown in Figure 4d. Figure 4e,f further depict the control histories of the three players when the LQDA and the LQ strategies are employed by the defender in case 1, respectively. Cases 2 and 3 further validate the significance of the defender; in these two cases, the pursuer is intercepted by the defender after 262 s and 242 s, respectively.
Figure 5 demonstrates the relationship between the miss distance and the time-to-go under different parameter settings. A turning point at around 100 m can be observed, before which the curve drops sharply and after which the curve descends gradually (if the permitted miss distance is 100 m, the interception time is 560 s, whereas if it is 10 m, the interception time is 1018 s). This suggests that when the permitted miss distance is smaller than a certain value, the required interception time increases greatly as the miss distance decreases.
4.2. Results of the Two-Sided Optimal Pursuit Strategy
When the control law of the pursuer has no components against the defender, the defender can break the two-player equilibrium between the pursuer and the evader. Thus, if the pursuer can pursue the evader and avoid the defender simultaneously, the ending of the game may be in favor of the pursuer.
In this case, the parameters of the game are the same as those in Section 4.1, but the pursuer employs the two-sided optimal pursuit strategy, which aims to chase the evader and dodge the defender simultaneously. Figure 6a depicts the trajectories of the game. Compared with the results of the LQDA strategy in Figure 4a, the pursuer in Figure 6a succeeds in avoiding interception by the defender, leading to a sharp transition in the trajectory of the defender. As seen from the locally enlarged view in Figure 6a, the pursuer successfully intercepts the evader in the end.
The control histories are illustrated in Figure 6b. Several jumps can be observed in the three players' controls, accompanying the jumps in the time-to-go; this is because the control gain of the optimal feedback strategy is a function of the time-to-go. To further demonstrate the self-defense ability of the pursuer under the two-sided optimal pursuit strategy, the defender is placed at another initial position closer to the pursuer. The results in Figure 7a show that the pursuer can still safely complete the interception.
The results of the LQDA strategy and the two-sided optimal pursuit strategy under different settings of the pursuer's control-effort weighting are compared in Figure 7b. Given a lower weighting (i.e., the importance of the energy consumption decreases for the pursuer), the pursuer still cannot succeed under the LQDA strategy, whereas the results under the two-sided optimal pursuit strategy are different. From these simulations, it can be concluded that it is indeed crucial for the pursuer to be both offensive and defensive in such a PED game, and that the designed two-sided optimal pursuit strategy helps the pursuer achieve high self-defense performance during an offensive task.
4.3. Results of the Cooperative Evasion–Defense Strategy
For the evader and the defender, the LQDA strategy also has drawbacks, because it ignores the information sharing and the potential cooperation between the two players. Under the LQDA strategy, the defender focuses solely on intercepting the pursuer and is not concerned with indirectly increasing the distance between the pursuer and the evader, while the evader does not actually need to maintain a considerable distance from the pursuer: it is sufficient to keep the distance between the pursuer and the evader larger than that between the pursuer and the defender.
An example is given in Figure 8a, where the pursuer under the LQDA strategy succeeds in intercepting the evader before being intercepted by the defender. In this case, although the evader's control-effort weighting is smaller than that in Section 4.1 (i.e., the importance of the energy consumption decreases for the evader), the evader is still unable to escape the pursuer's interception. However, if the evader and the defender adopt the cooperative evasion–defense strategy, the ending of the game is quite different, as illustrated in Figure 8b. In cooperation with the defender, the evader behaves like bait and flies toward the defender to reduce the distance between the defender and the pursuer. Eventually, the defender achieves a head-on interception of the pursuer.
In Figure 9, the game results are depicted to demonstrate the cooperation effect between the evader and the defender when one of them has no control. In Figure 9a, the defender is a free-flying spacecraft with no external control force, and the evader employs the proposed cooperative strategy; as the game proceeds, the evader gradually flies toward the defender. Similarly, when the evader has no control (see Figure 9b), the defender actively flies toward the evader and intercepts the pursuer. From these cases, it can be concluded that the proposed cooperative evasion–defense strategy helps the evader and the defender establish cooperation and achieve a better outcome in the orbital PED game.