1. Introduction
A hypersonic vehicle is a vehicle that traverses the atmosphere at speeds exceeding Mach 5. In recent years, gliding hypersonic vehicles have gained significant prominence due to their remarkable long-range and cross-range flight capabilities, as well as their ability to achieve high-precision targeting in both military and civilian domains [1]. However, when operating in complex flight environments characterized by heating, dynamic pressure, and overload, the dynamics of a hypersonic vehicle become coupled, uncertain, and highly nonlinear [2]. To ensure the success of flight missions, the entry guidance algorithm of a hypersonic vehicle requires enhanced robustness and autonomy [3]. Therefore, online real-time trajectory optimization algorithms are particularly needed [4]. Nevertheless, designing an optimal or near-optimal guidance strategy for onboard applications with guaranteed stability and real-time performance remains a significant challenge [5]. In this paper, a novel entry guidance algorithm based on inverse reinforcement learning is proposed to generate optimal or near-optimal control commands in real time under complex hypersonic flight environments.
Optimal guidance can be formulated as a trajectory optimization or optimal control problem (OCP), which aims to optimize a performance index while satisfying complex constraints. Traditionally, OCP algorithms are classified into two main types: indirect methods and direct methods [6,7]. Based on Pontryagin's minimum principle, indirect methods transform the OCP into a two-point boundary value problem [8]. Numerous indirect methods have been developed to solve OCPs with high precision [9,10,11]. However, owing to the main drawbacks of indirect methods, namely convergence difficulty and the handling of path constraints, direct methods have gained broader application. Direct methods transform the OCP into a finite-dimensional parameter optimization problem through discretization, which is then solved by nonlinear programming solvers. By combining convex optimization theory and the pseudospectral method, direct methods offer advantages in real-time performance and solution accuracy and have been successfully applied to many OCPs [12,13,14]. Ref. [15] developed a two-stage trajectory optimization framework using convex optimization and the pseudospectral method to solve the hypersonic vehicle entry problem, improving computational efficiency. Additionally, a Chebyshev pseudospectral method based on differential flatness theory was applied to the hypersonic vehicle entry problem, demonstrating that the guidance algorithm can reduce the solution time for a single trajectory [16]. Unfortunately, losslessly reformulating the constraints of the trajectory planning problem in a convex form is difficult, particularly for hypersonic systems with highly nonlinear dynamics and constraints. Moreover, the computational cost escalates rapidly as the number of discrete points increases, and the number of iterations becomes unpredictable when a high-precision solution is required [17,18]. Consequently, these shortcomings limit the online application of OCP algorithms.
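For clarity, the generic OCP form referred to above can be sketched as follows; the symbols (state $x$, control $u$, performance index $J$, path constraints $g$, boundary conditions $\psi$) are illustrative placeholders rather than the specific notation defined later in this paper:
\[
% Generic Bolza-form OCP; x, u, J, f, g, and \psi are illustrative placeholders,
% not this paper's notation.
\begin{aligned}
\min_{u(t)}\quad & J=\phi\big(x(t_f),t_f\big)+\int_{t_0}^{t_f} L\big(x(t),u(t),t\big)\,\mathrm{d}t\\
\text{s.t.}\quad & \dot{x}(t)=f\big(x(t),u(t),t\big),\\
& g\big(x(t),u(t),t\big)\le 0,\\
& \psi\big(x(t_0),x(t_f)\big)=0.
\end{aligned}
\]
Indirect methods derive the first-order necessary conditions of this problem, whereas direct methods discretize $x(t)$ and $u(t)$ and pass the resulting nonlinear program to a numerical solver.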
In recent years, artificial intelligence (AI)-based guidance algorithms have gained significant attention in the aerospace field, primarily due to their real-time performance and adaptability. Ref. [19] proposed that these algorithms can be broadly classified into two implementations: supervised learning (SL) and reinforcement learning (RL). In supervised learning, neural networks are trained on extensive datasets of optimal trajectories generated by OCP algorithms. Several SL-based guidance algorithms have been proposed for onboard applications [20]. For instance, Ref. [21] presented a deep neural network (DNN)-based guidance framework for planetary landing, capable of predicting fuel-optimal controls from raw images captured by an onboard optical camera. Ref. [22] introduced a DNN-based guidance method for two-degree-of-freedom (2DOF) entry trajectory planning of hypersonic vehicles, and numerical simulations demonstrated its ability to provide stable, real-time control commands that maximize terminal velocity. Ref. [23] proposed a real-time DNN-based algorithm to solve the three-degree-of-freedom (3DOF) entry problem of hypersonic vehicles, and the results showed its capability to generate optimal controls onboard. Similarly, Ref. [24] proposed a DNN-based controller that maps states to optimal control commands, and a hardware-in-the-loop (HIL) system was developed to support the real-time performance conclusions for the controller. However, both Ref. [23] and Ref. [24] required generating a large number of trajectories before training, which is extremely costly in practical applications. Consequently, ensuring the convergence accuracy of existing SL-based algorithms for hypersonic entry problems necessitates constructing large datasets that cover all scenarios, which remains a drawback when missions are time-sensitive.
On the other hand, reinforcement learning offers an alternative approach that does not rely on pre-existing datasets. RL algorithms continuously update model parameters through interactions with the environment, leading to improved generalization and robustness. RL has also shown promising results in addressing aerospace problems [25,26]. Compared with traditional guidance algorithms, RL-based guidance algorithms exhibit strong anti-disturbance capabilities and real-time performance [27,28,29,30]. Ref. [31] proposed an RL-based adaptive real-time guidance algorithm for the 3DOF entry problem of hypersonic vehicles, and numerical simulations demonstrated that the proposed algorithm achieved a higher terminal success rate than the Linear Quadratic Regulator (LQR) method. The convergence of RL-based algorithms heavily relies on the design of the reward function. In the implementation of Ref. [31], dense rewards were provided by tracking a human-designed guidance law, which made it difficult for the model to search for the globally optimal solution. Hence, designing an improved reward function is key to generating optimal control commands with RL-based algorithms.
In hypersonic entry flight environments, the reward signal is often sparse, meaning that the agent receives a reward only after completing a mission. To address this challenge, a reward shaping function needs to be designed to provide dense rewards throughout the learning process, motivating the agent to learn continuously. However, a reasonable reward shaping function is difficult to design manually. Fortunately, the inverse reinforcement learning (IRL) method is a potential solution to this problem. IRL represents an innovative branch of RL. Diverging from traditional RL algorithms, the IRL method aims to infer an underlying reward function from observed expert demonstrations. Furthermore, IRL can be viewed as an inverse problem in which the objective is to understand the motivation behind expert behavior rather than to directly learn a policy.
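As an illustrative, hedged sketch of this idea (not necessarily the exact formulation adopted in Section 3), adversarial IRL methods such as GAIL train a discriminator $D$ to separate expert state-action pairs from those of the agent policy $\pi$ and recover a dense surrogate reward from it:
\[
% Illustrative adversarial IRL objective; D, \pi, and \pi_E (expert policy) are
% generic symbols, not this paper's notation.
\min_{\pi}\max_{D}\;
\mathbb{E}_{(s,a)\sim\pi_E}\big[\log D(s,a)\big]
+\mathbb{E}_{(s,a)\sim\pi}\big[\log\big(1-D(s,a)\big)\big],
\qquad
r(s,a)=-\log\big(1-D(s,a)\big),
\]
where $D(s,a)$ estimates the probability that a sample comes from the expert, so the agent receives a dense reward for behavior the discriminator cannot distinguish from the expert demonstrations.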
This paper presents a novel guidance algorithm based on IRL to address the guidance problem during the entry phase of hypersonic vehicles. In comparison with the AI-based algorithms and traditional optimal control algorithms explored in previous works, the controller of the proposed algorithm can generate optimal actions that meet the requirements of onboard applications using only a few trajectories as the dataset. To the best of our knowledge, few studies have reported the generation of optimal actions for hypersonic vehicles via a well-trained DNN-based controller supported by only a few trajectories; this work attempts to address that gap. In our work, the guidance algorithm is implemented as a policy neural network updated through simulated experience gathered from interactions with a hypersonic entry simulation environment. In the proposed IRL framework, a customized version of the Proximal Policy Optimization (PPO) algorithm [32] is used to optimize the policy network. In particular, a generative adversarial neural network is designed to distinguish between the agent trajectories and the optimal datasets provided by optimal control theory, which effectively addresses the sparse reward problem while maintaining optimality. It is worth noting that the optimal dataset consists of only a few trajectories. After model optimization, the policy can provide high-frequency closed-loop guidance commands for onboard applications. To fully demonstrate the applicability of the proposed algorithm, numerical simulations are conducted on two typical hypersonic vehicles: the Common Aero Vehicle-Hypersonic (CAV-H) [33] and the Reusable Launch Vehicle (RLV). The two vehicles correspond to different flight conditions, which is sufficient to illustrate the generalization capability of the proposed algorithm.
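For reference, the customized policy update in the proposed framework builds on the standard clipped surrogate objective of PPO [32]:
\[
% Standard PPO clipped objective from [32]; r_t(\theta) is the probability ratio,
% \hat{A}_t an advantage estimate, and \epsilon the clipping parameter.
L^{\mathrm{CLIP}}(\theta)=\mathbb{E}_t\Big[\min\Big(r_t(\theta)\hat{A}_t,\;
\operatorname{clip}\big(r_t(\theta),1-\epsilon,1+\epsilon\big)\hat{A}_t\Big)\Big],
\qquad
r_t(\theta)=\frac{\pi_\theta(a_t\mid s_t)}{\pi_{\theta_{\mathrm{old}}}(a_t\mid s_t)},
\]
and the specific modifications adopted in this work are detailed in Section 3.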
This paper is structured as follows. Section 2 introduces the entry problem for hypersonic vehicles, highlighting its highly nonlinear dynamics. The IRL-based guidance method is detailed in Section 3, including the algorithm framework, the reward function design, and the network structures used in the approach. Section 4 verifies the effectiveness and optimality of the proposed algorithm through a number of simulations and comparisons with the General Pseudospectral Optimal Control Software (GPOPS) [34]. Section 5 concludes the paper.