Article

H∞ Differential Game of Nonlinear Half-Car Active Suspension via Off-Policy Reinforcement Learning

by Gang Wang *, Jiafan Deng, Tingting Zhou and Suqi Liu
Guangxi Key Laboratory of Manufacturing System & Advanced Manufacturing Technology, Guilin University of Electronic Technology, Guilin 541004, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(17), 2665; https://doi.org/10.3390/math12172665
Submission received: 20 July 2024 / Revised: 14 August 2024 / Accepted: 26 August 2024 / Published: 27 August 2024
(This article belongs to the Special Issue New Advances in Vibration Control and Nonlinear Dynamics)

Abstract:
This paper investigates a parameter-free H∞ differential game approach for nonlinear active vehicle suspensions. The study accounts for the geometric nonlinearity of the half-car active suspension and the cubic nonlinearity of the damping elements. The nonlinear H∞ control problem is reformulated as a zero-sum game between two players, leading to the establishment of the Hamilton–Jacobi–Isaacs (HJI) equation with a Nash equilibrium solution. To minimize reliance on model parameters during the solution process, an actor–critic framework employing neural networks is utilized to approximate the control policy and value function. An off-policy reinforcement learning method is implemented to iteratively solve the HJI equation. In this approach, the disturbance policy is derived directly from the value function, requiring only a limited amount of driving data to approximate the HJI equation’s solution. The primary innovation of this method lies in its capacity to effectively address system nonlinearities without the need for model parameters, making it particularly advantageous for practical engineering applications. Numerical simulations confirm the method’s effectiveness and applicable range. The off-policy reinforcement learning approach ensures the safety of the design process. For low-frequency road disturbances, the designed H∞ control policy enhances both ride comfort and stability.

1. Introduction

The active suspension system is a critical component for intelligent vehicle operation, significantly impacting driving, steering, braking, and obstacle navigation [1]. With advancements in actuators and drive-by-wire technology, the application of active suspensions is becoming increasingly widespread. Particularly during the pilot phase of intelligent driving technology, active suspension control based on AI large models has emerged as a challenging and vital research area [2,3].
In recent years, researchers have proposed numerous actuator and controller design methods for active suspension systems, including air suspension, electromagnetic suspension, and magnetorheological dampers [4,5,6]. Reference [7] investigated finite-time neural control of electromagnetic suspension, taking partial actuator failures into account. Reference [8] designed a static output feedback controller for active suspension, which reduces implementation costs. With the advancement of sensing technology, road preview-based active suspension control techniques have also been developed, such as wheelbase preview control [9,10] and comfort-oriented longitudinal velocity planning [11].
In the design of active suspension control, it is crucial to consider the system’s nonlinearity and fault tolerance. References [12,13] developed a nonlinear model predictive controller and a robust fault-tolerant controller with saturation for a quarter-car active suspension, achieving favorable practical results. Additionally, various robust control methods have been extensively researched, including finite-time H∞ control [14], fuzzy sampled-data H∞ control [15], and finite-frequency H∞ control [16], all of which address system robustness. Reference [17] explored model predictive control for active suspension, achieving optimal control performance while satisfying constraints. However, most methods assume known model parameters or only consider vertical dynamics, with limited focus on nonlinear coupling models in complex and variable environments. Some researchers have investigated model-free control methods, such as neuro-fuzzy H2/H∞ control [18], adaptive optimal control based on neural networks [19], and approximation-free preset-time control [20]. Given that parameter calibration is time-consuming and parameters often vary in complex and dynamic environments, model-free control methods are more practical for real-world applications.
As a technical approach within AI large models, data-driven reinforcement learning (RL) methods have garnered considerable attention. References [21,22,23] proposed data-driven optimal control methods for active suspensions, utilizing driving data to train and optimize control policies. Data-driven RL algorithms are highly effective for nonlinear H∞ differential games, nonlinear optimal control, static output-feedback, and event-triggered control [24,25,26]. While some researchers have applied data-driven RL methods to active suspensions, most have only considered vertical dynamics, neglecting the nonlinear coupling characteristics of front and rear wheels and pitch stability. In previous studies on H∞ control [21,23,27,28], the system is typically assumed to be linear, and only simplified quarter-car active suspension models are considered, which deviates from real-world scenarios. Additionally, most methods require system model parameters, which inevitably increase control costs and complexity. This is the primary motivation for our research. In summary, research on nonlinear H∞ differential games for active suspensions remains insufficient. This paper establishes a more realistic half-car active suspension model and transforms the nonlinear H∞ control problem into a differential game between two players. An off-policy RL algorithm, which does not require model parameters, is designed to approximate the solution to the Hamilton–Jacobi–Isaacs (HJI) equation by collecting a portion of vehicle vibration data. Finally, the effectiveness of control optimization and implementation is verified through hardware-in-the-loop simulation, and the effective vibration reduction range of the control policy is analyzed in detail. The main contributions of this paper are as follows:
  • To enhance the vibration control performance of active suspension systems, a more realistic half-car suspension dynamics model is established, and a nonlinear H∞ differential game method is proposed;
  • A neural network-based approach is utilized to derive an off-policy RL algorithm for solving the HJI equation, providing an optimal solution without requiring any model parameters;
  • A hardware-in-the-loop simulation platform is developed, validating the effectiveness and feasibility of the proposed method through numerical simulations.
The remainder of the paper is structured as follows. Section 2 introduces the nonlinear half-car active suspension control model. Section 3 describes the methodology in detail. Section 4 presents the numerical simulation results. Section 5 concludes with a summary of the findings.

2. Mathematical Model

To represent the nonlinear coupling relationship between the front and rear suspensions, a widely used half-car active suspension model is established, as shown in Figure 1, with the symbols defined in Table 1. According to Lagrange’s equations of the second kind, the dynamic equations of the half-car active suspension are as follows [9,10,29,30]:
$$
\begin{aligned}
M\ddot{z}_c &= f_1 + f_2\\
J\ddot{\theta} &= a f_1 - b f_2\\
m_{t1}\ddot{\eta}_1 &= -k_{t1}(\eta_1-\mu_1) - f_1\\
m_{t2}\ddot{\eta}_2 &= -k_{t2}(\eta_2-\mu_2) - f_2
\end{aligned}
$$
where
$$
\begin{aligned}
f_1 &= f_{s1} + f_{d1} + u_1, \qquad f_2 = f_{s2} + f_{d2} + u_2\\
f_{s1} &= k_{s1}(\eta_1-z_1) + k_{sn1}(\eta_1-z_1)^2 + k_{sn2}(\eta_1-z_1)^3\\
f_{d1} &= b_{s1}(\dot{\eta}_1-\dot{z}_1) + b_{sn1}(\dot{\eta}_1-\dot{z}_1)^2\\
f_{s2} &= k_{s2}(\eta_2-z_2) + k_{sn3}(\eta_2-z_2)^2 + k_{sn4}(\eta_2-z_2)^3\\
f_{d2} &= b_{s2}(\dot{\eta}_2-\dot{z}_2) + b_{sn2}(\dot{\eta}_2-\dot{z}_2)^2
\end{aligned}
$$
In Equations (1) and (2), the quadratic and cubic nonlinearities of the spring and damping elements are considered, along with the nonlinear coupling of the front and rear suspensions. For larger road excitations, these nonlinear characteristics cannot be ignored.
Define
$$
z_1 = z_c + a\sin\theta, \qquad z_2 = z_c - b\sin\theta
$$
Choose the state variables of the system as
$$
x = \begin{bmatrix} x_1 & x_2 & x_3 & x_4 & x_5 & x_6 & x_7 & x_8 \end{bmatrix}^{\mathrm{T}}
$$
where
$$
\begin{aligned}
x_1 &= z_c + a\sin\theta - \eta_1, & x_2 &= z_c - b\sin\theta - \eta_2, & x_3 &= \eta_1 - \mu_1, & x_4 &= \eta_2 - \mu_2,\\
x_5 &= \dot{z}_c + a\cos\theta\,\dot{\theta}, & x_6 &= \dot{z}_c - b\cos\theta\,\dot{\theta}, & x_7 &= \dot{\eta}_1, & x_8 &= \dot{\eta}_2
\end{aligned}
$$
Define
$$
a_1 = \frac{1}{M} + \frac{a^2\cos\theta}{J}, \qquad a_2 = \frac{1}{M} - \frac{ab\cos\theta}{J}, \qquad a_3 = \frac{1}{M} + \frac{b^2\cos\theta}{J}
$$
Combining Equations (1)–(4), the nonlinear state-space equations of the half-car active suspension are obtained:
$$
\begin{aligned}
\dot{x}(t) &= A(x) + B(x)u + C\omega\\
y(t) &= x(t)
\end{aligned}
$$
where
$$
A(x) = \begin{bmatrix}
x_5 - x_7\\
x_6 - x_8\\
x_7\\
x_8\\
-a_1 k_{s1} x_1 - a_2 k_{s2} x_2 - a_1 b_{s1} x_5 - a_2 b_{s2} x_6 + a_1 b_{s1} x_7 + a_2 b_{s2} x_8 - a\sin\theta\,\dot{\theta}^2 + a_1 f_{n1} + a_2 f_{n2}\\
-a_2 k_{s1} x_1 - a_3 k_{s2} x_2 - a_2 b_{s1} x_5 - a_3 b_{s2} x_6 + a_2 b_{s1} x_7 + a_3 b_{s2} x_8 + b\sin\theta\,\dot{\theta}^2 + a_2 f_{n1} + a_3 f_{n2}\\
-\dfrac{k_{t1}}{m_{t1}} x_3 + \dfrac{k_{s1}}{m_{t1}} x_1 - \dfrac{b_{s1}}{m_{t1}} x_7 + \dfrac{b_{s1}}{m_{t1}} x_5 - \dfrac{1}{m_{t1}} f_{n1}\\
-\dfrac{k_{t2}}{m_{t2}} x_4 + \dfrac{k_{s2}}{m_{t2}} x_2 - \dfrac{b_{s2}}{m_{t2}} x_8 + \dfrac{b_{s2}}{m_{t2}} x_6 - \dfrac{1}{m_{t2}} f_{n2}
\end{bmatrix}
$$
$$
B(x) = \begin{bmatrix}
0_{4\times 1} & 0_{4\times 1}\\
a_1 & a_2\\
a_2 & a_3\\
-\dfrac{1}{m_{t1}} & 0\\
0 & -\dfrac{1}{m_{t2}}
\end{bmatrix}, \qquad
C = \begin{bmatrix}
0 & 0\\
0 & 0\\
-1 & 0\\
0 & -1\\
0_{4\times 1} & 0_{4\times 1}
\end{bmatrix}, \qquad
u = \begin{bmatrix} u_1\\ u_2 \end{bmatrix}, \qquad
\omega = \begin{bmatrix} \omega_1\\ \omega_2 \end{bmatrix} = \begin{bmatrix} \dot{\mu}_1\\ \dot{\mu}_2 \end{bmatrix}
$$
$$
\begin{aligned}
f_{n1} &= k_{sn1}(\eta_1-z_1)^2 + k_{sn2}(\eta_1-z_1)^3 + b_{sn1}(\dot{\eta}_1-\dot{z}_1)^2\\
f_{n2} &= k_{sn3}(\eta_2-z_2)^2 + k_{sn4}(\eta_2-z_2)^3 + b_{sn2}(\dot{\eta}_2-\dot{z}_2)^2
\end{aligned}
$$
In Equation (5), the dimensions of the matrices are A(x) ∈ ℝ^{8×1}, B(x) ∈ ℝ^{8×2}, and C ∈ ℝ^{8×2}.
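To make the structure of the state-space model concrete, the sketch below evaluates ẋ = A(x) + B(x)u + Cω numerically. It is an illustrative reading of Equation (5), not the authors’ simulation code; the parameter values come from Table 2, and the pitch angle θ and rate θ̇ are passed in separately because the coupling coefficients a1, a2, a3 and the centrifugal terms depend on them.

```python
import numpy as np

# Illustrative parameter values from Table 2; treat them as assumptions of this sketch.
M, J, a, b = 500.0, 910.0, 1.25, 1.45
mt1, mt2 = 30.0, 40.0
ks1, ks2, kt1, kt2 = 1.0e4, 1.0e4, 1.0e5, 1.0e5
bs1, bs2 = 1.0e3, 1.0e3
ksn1, ksn2, ksn3, ksn4 = 1.0e3, 2.0e4, 1.0e3, 2.0e4
bsn1, bsn2 = 200.0, 200.0

def half_car_derivative(x, u, w, theta, theta_dot):
    """Evaluate x_dot = A(x) + B(x) u + C w of Equation (5).

    x = [x1..x8] as defined above, u = [u1, u2], w = [mu1_dot, mu2_dot]."""
    x1, x2, x3, x4, x5, x6, x7, x8 = x
    c, s = np.cos(theta), np.sin(theta)
    a1 = 1.0 / M + a * a * c / J
    a2 = 1.0 / M - a * b * c / J
    a3 = 1.0 / M + b * b * c / J
    # Nonlinear forces f_n1, f_n2 rewritten in the state coordinates:
    # eta1 - z1 = -x1 and eta1_dot - z1_dot = x7 - x5 (similarly for the rear).
    fn1 = ksn1 * x1**2 - ksn2 * x1**3 + bsn1 * (x7 - x5)**2
    fn2 = ksn3 * x2**2 - ksn4 * x2**3 + bsn2 * (x8 - x6)**2
    A = np.array([
        x5 - x7,
        x6 - x8,
        x7,
        x8,
        -a1*ks1*x1 - a2*ks2*x2 - a1*bs1*x5 - a2*bs2*x6 + a1*bs1*x7 + a2*bs2*x8
            - a*s*theta_dot**2 + a1*fn1 + a2*fn2,
        -a2*ks1*x1 - a3*ks2*x2 - a2*bs1*x5 - a3*bs2*x6 + a2*bs1*x7 + a3*bs2*x8
            + b*s*theta_dot**2 + a2*fn1 + a3*fn2,
        -kt1/mt1*x3 + ks1/mt1*x1 - bs1/mt1*x7 + bs1/mt1*x5 - fn1/mt1,
        -kt2/mt2*x4 + ks2/mt2*x2 - bs2/mt2*x8 + bs2/mt2*x6 - fn2/mt2,
    ])
    B = np.zeros((8, 2))
    B[4], B[5] = [a1, a2], [a2, a3]
    B[6, 0], B[7, 1] = -1.0 / mt1, -1.0 / mt2
    C = np.zeros((8, 2))
    C[2, 0], C[3, 1] = -1.0, -1.0
    return A + B @ np.asarray(u, dtype=float) + C @ np.asarray(w, dtype=float)
```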
Define the evaluation metrics for control performance as follows:
$$
\Lambda_1 = \begin{bmatrix} \ddot{z}_c & \ddot{\theta} \end{bmatrix}^{\mathrm{T}}, \qquad
\Lambda_2 = \begin{bmatrix}
\dfrac{z_1-\eta_1}{z_{1\max}} &
\dfrac{z_2-\eta_2}{z_{2\max}} &
\dfrac{k_{t1}(\eta_1-\mu_1)}{9.8\left(\dfrac{bM}{a+b}+m_{t1}\right)} &
\dfrac{k_{t2}(\eta_2-\mu_2)}{9.8\left(\dfrac{aM}{a+b}+m_{t2}\right)} &
\dfrac{u_1}{u_{1\max}} &
\dfrac{u_2}{u_{2\max}}
\end{bmatrix}^{\mathrm{T}}
$$
Here, Λ1 collects the variables to be minimized, namely the vertical body acceleration and the pitch acceleration. Λ2 requires that the suspension dynamic travel, tire dynamic load, and control forces all remain below their physical limits, i.e., Λ2(i) < 1, i = 1, 2, …, 6.
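As a small illustration of how the constraint vector Λ2 of Equation (6) can be monitored during a simulation run, the helper below evaluates the six ratios; the limits and static tire loads use the values of Table 2, and the function and variable names are arbitrary choices of this sketch.

```python
import numpy as np

# Parameters and limits from Table 2 (assumed values for this sketch).
M, a, b, mt1, mt2 = 500.0, 1.25, 1.45, 30.0, 40.0
kt1, kt2 = 1.0e5, 1.0e5
z1_max, z2_max = 0.1, 0.1
u1_max, u2_max = 2000.0, 2000.0
g = 9.8

def lambda2(z1, z2, eta1, eta2, mu1, mu2, u1, u2):
    """The six constraint indicators of Equation (6); each must stay below 1."""
    F1_static = g * (b * M / (a + b) + mt1)   # static front tire load
    F2_static = g * (a * M / (a + b) + mt2)   # static rear tire load
    return np.array([
        (z1 - eta1) / z1_max,                 # front suspension travel ratio
        (z2 - eta2) / z2_max,                 # rear suspension travel ratio
        kt1 * (eta1 - mu1) / F1_static,       # front tire dynamic/static load ratio
        kt2 * (eta2 - mu2) / F2_static,       # rear tire dynamic/static load ratio
        u1 / u1_max,                          # front control force ratio
        u2 / u2_max,                          # rear control force ratio
    ])
```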
For vehicles traveling at a certain speed, instantaneous impacts caused by road irregularities typically have a significant effect on ride comfort and stability. Therefore, the following road excitation model is considered [31]
$$
\mu(t) = \frac{h}{2}\left(1 - \cos\frac{2\pi\upsilon}{\lambda}\, t\right)
$$
where h is the height of the road irregularity, λ represents the width of the road irregularity, and υ denotes the vehicle speed. The excitation frequency of this impact model depends on the vehicle speed, so different excitation frequencies can be evaluated by adjusting the speed.
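A brief illustration of the bump excitation of Equation (7) is given below. The bump height, width, and vehicle speed are placeholder values, and the profile is set to zero once the wheel has left the bump, which is the usual convention for this type of impact model and an assumption of this sketch.

```python
import numpy as np

def bump_profile(t, height=0.05, width=5.0, speed=10.0):
    """Road bump of Equation (7): mu(t) = (h/2) * (1 - cos(2*pi*v*t / width))
    while the wheel is on the bump (0 <= t <= width/speed), zero afterwards."""
    t = np.asarray(t, dtype=float)
    on_bump = (t >= 0.0) & (t <= width / speed)
    return np.where(on_bump,
                    0.5 * height * (1.0 - np.cos(2.0 * np.pi * speed * t / width)),
                    0.0)

# Example: a 5 m wide, 5 cm high bump crossed at 10 m/s (36 km/h).
t = np.linspace(0.0, 1.0, 1001)
mu = bump_profile(t)
```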

3. Methodology

3.1. H∞ Differential Game

In this section, we design a nonlinear H∞ controller based on off-policy RL for the established nonlinear active suspension state-space Equation (5). The road and the controller are considered as two independent players; thus, the design of the H∞ controller can be transformed into an H∞ differential game.
First, define the cost function as follows:
$$
J(x_0, u, \omega) = \int_0^{\infty} \left( x^{\mathrm{T}} Q x + u^{\mathrm{T}} R u - \gamma^2 \|\omega\|^2 \right) \mathrm{d}t
$$
where Q and R are positive definite matrices and γ is a positive constant. Q and R are also the weight matrices of the cost function and are closely related to the performance metrics (6). Under zero initial conditions, if J(x0, u, ω) ≤ 0, then the system achieves the H∞ performance, and γ is called the disturbance attenuation level.
Define the value function of the H∞ differential game as
$$
V(x(t)) = \int_t^{\infty} \left( x^{\mathrm{T}} Q x + u^{\mathrm{T}} R u - \gamma^2 \|\omega\|^2 \right) \mathrm{d}\tau
$$
Combining Equations (8) and (9), the H∞ differential game can be expressed as
$$
V^*(x(t)) = \min_{u} \max_{\omega} J(x_0, u, \omega)
$$
where V*(x(t)) is the optimal value function.
In Equation (10), the control policy u needs to minimize the cost function, while the disturbance policy ω needs to maximize the cost function. Therefore, there exists the following Nash equilibrium point:
$$
u^* = \arg\min_{u} J(x_0, u, \omega), \qquad \omega^* = \arg\max_{\omega} J(x_0, u, \omega)
$$
Equation (11) is also referred to as the optimal game policy.
Definition 1.
If inequality (12) holds, the policy pair (u*, ω*) is a Nash equilibrium point of the H∞ differential game.
$$
J(x_0, u^*, \omega) \le J(x_0, u^*, \omega^*) \le J(x_0, u, \omega^*)
$$
To find the Nash equilibrium point of the H∞ differential game, first establish the Hamilton–Jacobi–Isaacs (HJI) equation for the active suspension. Differentiating the value function (9) yields
$$
x^{\mathrm{T}} Q x + u^{\mathrm{T}} R u - \gamma^2 \|\omega\|^2 + \nabla V^{\mathrm{T}} \left( A(x) + B(x)u + C\omega \right) = 0
$$
where ∇V = ∂V/∂x. Equation (13) is called the Bellman equation, and solving this partial differential equation yields the value function V(x).
Define the Hamiltonian function of Equation (13) as
$$
H(x, \nabla V, u, \omega) = x^{\mathrm{T}} Q x + u^{\mathrm{T}} R u - \gamma^2 \|\omega\|^2 + \nabla V^{\mathrm{T}} \left( A(x) + B(x)u + C\omega \right)
$$
Based on the stationarity conditions of (14), ∂H/∂u = 0 and ∂H/∂ω = 0, the Nash equilibrium point of the H∞ differential game is obtained as
$$
u^* = -\frac{1}{2} R^{-1} B(x)^{\mathrm{T}} \nabla V^*, \qquad
\omega^* = \frac{1}{2\gamma^2} C^{\mathrm{T}} \nabla V^*
$$
Equation (15) is the saddle point of the Hamiltonian function, and substituting it into Equation (13) yields the HJI equation
$$
x^{\mathrm{T}} Q x + \frac{1}{4} \nabla V^{*\mathrm{T}} B(x) R^{-1} B(x)^{\mathrm{T}} \nabla V^* - \frac{1}{4\gamma^2} \nabla V^{*\mathrm{T}} C C^{\mathrm{T}} \nabla V^* + \nabla V^{*\mathrm{T}} \left( A(x) - \frac{1}{2} B(x) R^{-1} B(x)^{\mathrm{T}} \nabla V^* + \frac{1}{2\gamma^2} C C^{\mathrm{T}} \nabla V^* \right) = 0
$$
The analytical solution of (16) is the optimal value function V*(x(t)), which is positive semi-definite.
From the above analysis, it is evident that solving the HJI equation yields the optimal value function and the optimal game policies. Substituting the analytical solution back into Equation (14) gives H(x, ∇V*, u*, ω*) = 0. However, directly solving the HJI equation is extremely difficult. This paper will design a method based on off-policy RL for approximate solutions.
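Before turning to the learning-based solution, note that Equation (15) ties both equilibrium policies to the gradient of the value function. The following minimal sketch (not the authors’ code) illustrates this step for an arbitrary differentiable value-function approximation, using a finite-difference gradient; all names are illustrative.

```python
import numpy as np

def numerical_gradient(V, x, eps=1e-6):
    """Finite-difference approximation of grad V(x)."""
    grad = np.zeros_like(x, dtype=float)
    for i in range(x.size):
        dx = np.zeros_like(x, dtype=float)
        dx[i] = eps
        grad[i] = (V(x + dx) - V(x - dx)) / (2.0 * eps)
    return grad

def saddle_point_policies(x, V, B_of_x, C, R, gamma):
    """Equation (15): u* = -1/2 R^{-1} B(x)^T grad V*,  w* = 1/(2 gamma^2) C^T grad V*."""
    grad_V = numerical_gradient(V, x)
    u_star = -0.5 * np.linalg.solve(R, B_of_x(x).T @ grad_V)
    w_star = (0.5 / gamma**2) * (C.T @ grad_V)
    return u_star, w_star
```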
Theorem 1.
If a positive semi-definite solution V*(x(t)) of the HJI Equation (16) exists, then (u*, ω*) satisfies the following conditions: (1) for all ω ∈ L2[0, ∞), the closed-loop half-car active suspension system (5) meets the H∞ performance under zero initial conditions and is asymptotically stable in the absence of disturbances; (2) (u*, ω*) is the Nash equilibrium of the H∞ differential game.
Proof. 
Since V*(x(t)) ≥ 0 is a solution of the HJI equation and V*(x) = 0 when x = 0, we choose V*(x(t)) as the Lyapunov function candidate. Differentiating it along the system trajectories gives
$$
\dot{V}^*(x) = \nabla V^{*\mathrm{T}} \left( A(x) + B(x)u + C\omega \right)
$$
Clearly, combining Equation (17) with Equation (15), we obtain
$$
\begin{aligned}
\nabla V^{*\mathrm{T}} \left( A(x) + B(x)u + C\omega \right) + x^{\mathrm{T}} Q x + u^{\mathrm{T}} R u - \gamma^2 \|\omega\|^2
&= H(x, \nabla V^*, u, \omega)\\
&= H(x, \nabla V^*, u^*, \omega^*) + (u - u^*)^{\mathrm{T}} R (u - u^*) - \gamma^2 (\omega - \omega^*)^{\mathrm{T}} (\omega - \omega^*)
\end{aligned}
$$
If u = u* and V = V*, then, since H(x, ∇V*, u*, ω*) = 0 by the HJI Equation (16), Equation (18) reduces to
$$
H(x, \nabla V^*, u^*, \omega) = -\gamma^2 (\omega - \omega^*)^{\mathrm{T}} (\omega - \omega^*) \le 0
$$
Integrating Equation (17) and using (19) gives
$$
V^*(x(T)) - V^*(x(0)) \le -\int_0^{T} \left( x^{\mathrm{T}} Q x + u^{*\mathrm{T}} R u^* - \gamma^2 \|\omega\|^2 \right) \mathrm{d}\tau
$$
Since V*(x(0)) = 0 and V*(x(T)) ≥ 0, the closed-loop system satisfies the H∞ performance under zero initial conditions:
$$
\int_0^{T} \left( x^{\mathrm{T}} Q x + u^{*\mathrm{T}} R u^* \right) \mathrm{d}\tau \le \int_0^{T} \gamma^2 \|\omega\|^2 \, \mathrm{d}\tau, \qquad \forall \omega \in L_2[0, \infty)
$$
Note that when ω = 0, Equations (17) and (19) give V̇*(x) ≤ 0, indicating asymptotic stability of the closed-loop system in the absence of disturbances.
To prove that (u*, ω*) is a Nash equilibrium point of the H∞ differential game, rewrite the cost function (8) as
$$
\begin{aligned}
J(x_0, u, \omega) &= \int_0^{\infty} \left( x^{\mathrm{T}} Q x + u^{\mathrm{T}} R u - \gamma^2 \|\omega\|^2 \right) \mathrm{d}t + \int_0^{\infty} \dot{V} \, \mathrm{d}t - V(x(\infty)) + V(x_0)\\
&= \int_0^{\infty} H(x, \nabla V, u, \omega) \, \mathrm{d}t - V(x(\infty)) + V(x_0)
\end{aligned}
$$
Considering t → ∞ and V(x(∞)) → 0, we have
$$
J(x_0, u, \omega) = \int_0^{\infty} H(x, \nabla V, u^*, \omega^*) \, \mathrm{d}t + V(x_0) + \int_0^{\infty} \left[ (u - u^*)^{\mathrm{T}} R (u - u^*) - \gamma^2 (\omega - \omega^*)^{\mathrm{T}} (\omega - \omega^*) \right] \mathrm{d}t
$$
Let V(x) = V*(x); then H(x, ∇V*, u*, ω*) = 0 by the HJI Equation (16), and (23) yields inequality (12). According to Definition 1, (u*, ω*) is indeed a Nash equilibrium point of the H∞ differential game. This completes the proof. □

3.2. Off-Policy RL Algorithm

To solve the HJI Equation (16) numerically, this section designs an off-policy RL algorithm. Compared with on-policy RL, the off-policy scheme does not require the policy being learned to be applied to the plant while data are collected, so the learning process cannot destabilize the actual physical system; in addition, the method requires no model parameter information, which reduces design costs. The algorithm structure is illustrated in Figure 2: the state data recorded during the vehicle’s operation are used as input, and actor and critic neural networks learn the solution of the integral Bellman equation, with the neural network (NN) weights updated iteratively until convergence. Only a portion of the active suspension vibration data needs to be extracted to update the value function, control policy, and disturbance policy within the actor–critic RL framework.
Firstly, rewrite the nonlinear state-space equations of the active suspension as
$$
\dot{x} = A(x) + B(x)u_k + C\omega_k + B(x)(u - u_k) + C(\omega - \omega_k)
$$
where u and ω are arbitrary but reasonable control inputs and road disturbances, and u k and ω k are the control and disturbance policies to be updated.
Combining (15) and (24), differentiating the value function (9) yields
$$
\begin{aligned}
\dot{V}_k(x) &= \nabla V_k^{\mathrm{T}} \left( A(x) + B(x)u_k + C\omega_k \right) + \nabla V_k^{\mathrm{T}} \left[ B(x)(u - u_k) + C(\omega - \omega_k) \right]\\
&= -\left( x^{\mathrm{T}} Q x + u_k^{\mathrm{T}} R u_k - \gamma^2 \|\omega_k\|^2 \right) - 2 u_{k+1}^{\mathrm{T}} R (u - u_k) + \nabla V_k^{\mathrm{T}} C (\omega - \omega_k)
\end{aligned}
$$
Integrating (25) yields
$$
V_k(x)\Big|_{x(t)}^{x(t+T)} = -\int_t^{t+T} \left( x^{\mathrm{T}} Q x + u_k^{\mathrm{T}} R u_k - \gamma^2 \|\omega_k\|^2 \right) \mathrm{d}\tau - \int_t^{t+T} 2 u_{k+1}^{\mathrm{T}} R (u - u_k) \, \mathrm{d}\tau + \int_t^{t+T} \nabla V_k^{\mathrm{T}} C (\omega - \omega_k) \, \mathrm{d}\tau
$$
In Equation (26), to simultaneously solve for V k and u k + 1 , define the actor NN and critic NN as follows:
$$
u_{k+1}(x) = \Theta_{k+1}^{\mathrm{T}} \sigma(x), \qquad V_k(x) = W_k^{\mathrm{T}} \phi(x)
$$
where σ(x) ∈ ℝ^α and φ(x) ∈ ℝ^β are the basis functions of the actor and critic NNs, and Θ_{k+1} and W_k are the corresponding weight matrices.
Substituting (27) into (26) gives
$$
W_k^{\mathrm{T}} \phi(x)\Big|_{x(t)}^{x(t+T)} = -\int_t^{t+T} \left( x^{\mathrm{T}} Q x + u_k^{\mathrm{T}} R u_k - \gamma^2 \|\omega_k\|^2 \right) \mathrm{d}\tau - \int_t^{t+T} 2 \sigma(x)^{\mathrm{T}} \Theta_{k+1} R (u - u_k) \, \mathrm{d}\tau + \int_t^{t+T} W_k^{\mathrm{T}} \left[ \frac{\partial \phi(x)}{\partial x_1}\; \frac{\partial \phi(x)}{\partial x_2}\; \cdots\; \frac{\partial \phi(x)}{\partial x_8} \right] C (\omega - \omega_k) \, \mathrm{d}\tau
$$
From Equation (28), it can be seen that the NN weight coefficients can be computed offline without any model parameters. To ensure uniqueness of the solution of Equation (28), define
$$
\xi(t) = \phi(x)^{\mathrm{T}}\Big|_{x(t)}^{x(t+T)} - \int_t^{t+T} (\omega - \omega_k)^{\mathrm{T}} C^{\mathrm{T}} \left[ \frac{\partial \phi(x)}{\partial x_1}\; \frac{\partial \phi(x)}{\partial x_2}\; \cdots\; \frac{\partial \phi(x)}{\partial x_8} \right]^{\mathrm{T}} \mathrm{d}\tau
$$
$$
\zeta(t) = \int_t^{t+T} 2 (u - u_k)^{\mathrm{T}} R^{\mathrm{T}} \otimes \sigma(x)^{\mathrm{T}} \, \mathrm{d}\tau
$$
$$
\varsigma(t) = \int_t^{t+T} \left( x^{\mathrm{T}} Q x + u_k^{\mathrm{T}} R u_k - \gamma^2 \|\omega_k\|^2 \right) \mathrm{d}\tau
$$
where
$$
t \in \{ t_{k,1}, t_{k,2}, \ldots, t_{k,\ell} \}, \qquad 0 \le t_{k,i} + T \le t_{k,i+1}, \qquad t_{k,\ell} + T \le t_{k+1,1}, \qquad k = 0, 1, \ldots, \quad i = 1, 2, \ldots, \ell
$$
Combining (29)–(31), Equation (28) can be rewritten compactly as the following equation:
$$
\Phi_k \Xi_k = \Omega_k
$$
where
$$
\Phi_k = \begin{bmatrix}
\xi(t_{k,1}) & \zeta(t_{k,1})\\
\xi(t_{k,2}) & \zeta(t_{k,2})\\
\vdots & \vdots\\
\xi(t_{k,\ell}) & \zeta(t_{k,\ell})
\end{bmatrix}, \qquad
\Xi_k = \begin{bmatrix}
\operatorname{vec}(W_k)\\
\operatorname{vec}(\Theta_{k+1})
\end{bmatrix}, \qquad
\Omega_k = -\begin{bmatrix}
\varsigma(t_{k,1})\\
\varsigma(t_{k,2})\\
\vdots\\
\varsigma(t_{k,\ell})
\end{bmatrix}
$$
To ensure Equation (32) has a unique solution, it is necessary to satisfy
$$
\operatorname{rank}(\Phi_k) = 2\alpha + \beta
$$
where ℓ ≥ 2α + β is the number of collected data samples, and α and β are the dimensions of the basis functions of the actor NN and critic NN, respectively. Thus, the off-policy RL algorithm (Algorithm 1) can be summarized as follows:
Algorithm 1. Off-policy RL algorithm for the nonlinear active suspension H∞ differential game.
Step 1: Set k = 1, initialize the parameters Θ1 and W0, apply any feasible u and ω, and collect the data x(t_{k,1}), …, x(t_{k,ℓ});
Step 2: If the rank condition (33) holds, solve Equation (32) to obtain the NN weights W_k and Θ_{k+1};
Step 3: Update the actor policies u_{k+1}, ω_{k+1} and the critic V_k using (15) and (27);
Step 4: Set k = k + 1 and repeat Steps 2–3 until ‖Ξ_k − Ξ_{k−1}‖ ≤ ε (ε is a small positive number).
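The core numerical step of Algorithm 1 is the repeated least-squares solution of Equation (32). The sketch below is an illustrative outline only, not the authors’ implementation: it assumes a user-supplied routine build_rows that evaluates the integrals of Equations (29)–(31) from the stored driving data for the policies implied by the current weights, and it stops with the criterion of Step 4.

```python
import numpy as np

def off_policy_rl(build_rows, beta, alpha, n_u=2, eps=1.0, max_iter=50):
    """Algorithm 1 sketch: iteratively solve Phi_k Xi_k = Omega_k (Eq. (32)).

    build_rows(W, Theta) is assumed to return, for the current policies,
        xi_rows   : (l, beta)        rows  xi(t_{k,i})       of Eq. (29)
        zeta_rows : (l, n_u * alpha) rows  zeta(t_{k,i})     of Eq. (30)
        sig_rows  : (l,)             scalars varsigma(t_{k,i}) of Eq. (31)
    using the same vec ordering as the reshape below."""
    W = np.zeros(beta)                       # critic weights W_0
    Theta = np.zeros((alpha, n_u))           # actor weights Theta_1
    Xi_prev = np.concatenate([W, Theta.ravel()])
    for k in range(max_iter):
        xi_rows, zeta_rows, sig_rows = build_rows(W, Theta)
        Phi = np.hstack([xi_rows, zeta_rows])    # data matrix Phi_k
        Omega = -sig_rows                        # right-hand side Omega_k
        # Rank condition (33): Phi must have full column rank for a unique solution.
        Xi, *_ = np.linalg.lstsq(Phi, Omega, rcond=None)
        W, Theta = Xi[:beta], Xi[beta:].reshape(alpha, n_u)
        if np.linalg.norm(Xi - Xi_prev) <= eps:  # stopping rule of Step 4
            break
        Xi_prev = Xi
    return W, Theta
```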

4. Numerical Simulation

4.1. Implementation of Algorithm 1

To validate the feasibility of the off-policy RL algorithm for the nonlinear active suspension differential game, this section applies Algorithm 1 for policy optimization. The simulation uses a hardware-in-the-loop platform, as depicted in Figure 3. The nonlinear active suspension model runs on a MicroAutoBox II (dSPACE, Germany), which is equipped with an IBM PPC 750GL processor (900 MHz), while Algorithm 1 runs on a Speedgoat controller (Speedgoat, Bern, Switzerland) featuring a four-core Intel Celeron 2 GHz CPU. Real-time data are displayed on the host computer, on which the Simulink program runs concurrently.
During the simulation, the parameters of the vehicle active suspension are listed in Table 2. The simulation parameters of Algorithm 1 are set as follows: T = 0.01 s, ℓ = 132, ε = 1, Q = diag(10, 10, 10, 10, 1000, 1000, 10, 10), R = 10⁻⁵ I₂, and γ = 35. All other parameters are initialized to zero. The NN basis functions are defined as
$$
\phi(x) = \left[ x_1^2,\; 2x_1x_2,\; 2x_1x_3,\; \ldots,\; 2x_1x_8,\; x_2^2,\; 2x_2x_3,\; \ldots,\; 2x_7x_8,\; x_8^2,\; x_1^3,\; x_2^3,\; \ldots,\; x_8^3 \right]^{\mathrm{T}}
$$
$$
\sigma(x) = \left[ x_1,\; x_2,\; \ldots,\; x_8,\; x_1^2,\; 2x_1x_2,\; 2x_1x_3,\; \ldots,\; 2x_7x_8,\; x_8^2 \right]^{\mathrm{T}}
$$
that is, φ(x) collects the 36 distinct quadratic monomials (with the cross terms doubled) followed by the 8 cubic terms, and σ(x) collects the 8 states followed by the same 36 quadratic monomials, so that α = β = 44.
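For reference, the two basis vectors can also be generated programmatically. This is an illustrative helper (not the authors’ code); it reproduces the ordering listed above: 36 quadratic monomials with the cross terms doubled, the 8 cubes for φ(x), and the 8 states plus the same 36 quadratic terms for σ(x).

```python
import numpy as np

def quadratic_terms(x):
    """The 36 terms x_i^2 and 2*x_i*x_j (i < j) in the order listed above."""
    terms = []
    n = len(x)
    for i in range(n):
        terms.append(x[i] * x[i])
        for j in range(i + 1, n):
            terms.append(2.0 * x[i] * x[j])
    return terms

def phi_basis(x):
    """Critic basis phi(x): 36 quadratic terms followed by the 8 cubes (44 entries)."""
    return np.array(quadratic_terms(x) + [xi ** 3 for xi in x])

def sigma_basis(x):
    """Actor basis sigma(x): the 8 states followed by the 36 quadratic terms (44 entries)."""
    return np.array(list(x) + quadratic_terms(x))

x = np.random.randn(8)
assert phi_basis(x).size == 44 and sigma_basis(x).size == 44
```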
The simulation results are shown in Figure 4, Figure 5, Figure 6, Figure 7 and Figure 8. Figure 4 displays the training results of the critic NN weight coefficients, Figure 5 shows the training results of the actor NN weight coefficients, and Figure 6 illustrates the number of policy updates. It can be observed from the figures that the convergence condition is met after 10 updates. Given the sampling time of 0.01 s, the total update time amounts to 13.2 s. Notably, the control and disturbance inputs required for data collection are random white noise signals, which do not necessitate a specific functional form, thereby offering greater applicability. Figure 7 and Figure 8 illustrate the control and disturbance inputs during the data collection process, demonstrating that small inputs are sufficient to meet the requirements. This method ensures the safety of real vehicle data collection.

4.2. Vibration Control Performance Analysis

To further verify the effectiveness of the trained control policies, simulations are conducted using the road impact model (7). The vehicle speed is varied over υ = 0.36–108 km/h with λ = 5 m, corresponding to a frequency range from 0.1 Hz to 8 Hz and covering typical road conditions. The simulation results are depicted in Figure 9, Figure 10, Figure 11, Figure 12 and Figure 13, and the classic linear H∞ algorithm [23,27,28] is used for comparison.
Figure 9 shows the road excitation profile, Figure 10 displays the pitch acceleration of the vehicle, Figure 11 illustrates the vertical acceleration of the vehicle body, and Figure 12 presents the response of the suspension performance constraints. From Figure 10 and Figure 11, it can be observed that the trained control policies effectively attenuate vibrations in the frequency range of 0.1–6 Hz, corresponding to vehicle speeds below 80 km/h. For intelligent driving vehicles equipped with visual sensors, the vehicle speed can be controlled and planned according to road conditions [1]. Figure 10 and Figure 11 also show that, in terms of ride comfort, the off-policy RL solution outperforms both the traditional linear H∞ algorithm and the passive suspension. This advantage arises from the method’s ability to account for system nonlinearities, resulting in more effective control performance, and notably the method requires no model information. Figure 12 indicates that the suspension travel index of the linear H∞ algorithm exceeds its maximum limit, whereas the proposed method consistently remains within a reasonable range; all constraint indicators of the proposed method are less than 1, thereby satisfying the physical constraints encountered in actual driving.
To quantitatively evaluate the simulation results, Table 3 presents the root mean square (RMS) values of Λ1 and u. As shown in Table 3, the off-policy RL solution achieves smaller RMS values, whereas the traditional linear H∞ algorithm, which does not account for nonlinearities, exhibits degraded control performance.
Table 4 provides the peak values of Λ2. As evident from Table 4, the off-policy RL solution meets the constraint condition Λ2(i) < 1. In contrast, the suspension travel index of the linear H∞ algorithm exceeds the constraint value, so the constraint loses effectiveness. This occurs because the linear H∞ method does not account for system nonlinearities, leaving certain indicators uncontrolled.
To further illustrate the H∞ performance of the closed-loop system, define the function
$$
r(t) = \frac{\displaystyle\int_0^{t} \left( x^{\mathrm{T}} Q x + u^{\mathrm{T}} R u \right) \mathrm{d}\tau}{\displaystyle\int_0^{t} \|\omega\|^2 \, \mathrm{d}\tau}
$$
Figure 13 shows the response of r(t); its value remains consistently below the maximum value γ², thereby satisfying the H∞ performance requirement.
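The ratio r(t) can be evaluated directly from logged trajectories. A minimal sketch, assuming uniformly sampled state, control, and disturbance histories and the weighting matrices of Section 4.1, is given below; the names are illustrative.

```python
import numpy as np

def attenuation_ratio(x_traj, u_traj, w_traj, Q, R, dt):
    """Running ratio r(t): cumulative weighted output energy over cumulative
    disturbance energy, both approximated by rectangular integration."""
    num = np.cumsum([x @ Q @ x + u @ R @ u for x, u in zip(x_traj, u_traj)]) * dt
    den = np.cumsum([w @ w for w in w_traj]) * dt
    return num / np.maximum(den, 1e-12)   # guard against division by zero early on

# The H-infinity requirement holds if r(t) stays below gamma**2 (gamma = 35 in Section 4.1).
```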

5. Conclusions

This paper investigates the H∞ differential game problem for nonlinear active suspensions, accounting for both geometric nonlinearity and higher-order nonlinearity of the damping elements in the design scheme. An off-policy RL method utilizing an actor–critic structure is employed to approximate the solution to the HJI equation without requiring any model parameters. The simulation results demonstrate the method’s effectiveness. By extracting a portion of the vehicle driving data, the policies can be optimized and converge to the optimal solution after several iterations. Frequency sweep excitation tests reveal that the control policy is effective within the low-frequency range of 0–6 Hz, significantly reducing body vibrations at the first mode while maintaining the physical constraints of the vehicle suspension. Compared to passive suspension and traditional methods, the proposed method reduces vertical acceleration by 20% and 10%, respectively, and pitch acceleration by 10% and 5%. Additionally, the peak control force of the proposed method is 10% smaller than that of traditional methods. All other metrics of the proposed method are below 1, satisfying the time-domain constraints. Future research will explore multi-player collaborative game mechanisms, control saturation constraints, and finite-frequency constraints to further enhance the nonlinear vibration control performance of active suspensions.

Author Contributions

Methodology, G.W.; writing—original draft preparation, J.D.; writing—review and editing, T.Z.; supervision, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Natural Science Fund of China (No. 12202112), Guangxi Natural Science Foundation (No. 2021JJB160015, No. 2021JJA160252), Guangxi Key Laboratory of Manufacturing System & Advanced Manufacturing Technology (No. 22-35-4-S006).

Data Availability Statement

Data are contained within the article. The original contributions presented in this study are included in the article; further inquiries can be directed to the corresponding author.

Conflicts of Interest

On behalf of all authors, the corresponding author states that there is no conflict of interest.

References

  1. Yu, M.; Evangelou, S.A.; Dini, D. Advances in Active Suspension Systems for Road Vehicles. Engineering 2023, 33, 160–177. [Google Scholar] [CrossRef]
  2. Pan, H.; Zhang, C.; Sun, W. Fault-tolerant multiplayer tracking control for autonomous vehicle via model-free adaptive dynamic programming. IEEE Trans. Reliab. 2022, 72, 1395–1406. [Google Scholar] [CrossRef]
  3. Li, Q.; Chen, Z.; Song, H.; Dong, Y. Model predictive control for speed-dependent active suspension system with road preview information. Sensors 2024, 24, 2255. [Google Scholar] [CrossRef]
  4. Zhang, J.; Yang, Y.; Hu, C. An adaptive controller design for nonlinear active air suspension systems with uncertainties. Mathematics 2023, 11, 2626. [Google Scholar] [CrossRef]
  5. Su, X.; Yang, X.; Shi, P.; Wu, L. Fuzzy control of nonlinear electromagnetic suspension systems. Mechatronics 2014, 24, 328–335. [Google Scholar] [CrossRef]
  6. Humaidi, A.J.; Sadiq, M.E.; Abdulkareem, A.I.; Ibraheem, I.K.; Azar, A.T. Adaptive backstepping sliding mode control design for vibration suppression of earth-quaked building supported by magneto-rheological damper. J. Low Freq. Noise Vib. Act. Control 2022, 41, 768–783. [Google Scholar] [CrossRef]
  7. Liu, L.; Sun, M.; Wang, R.; Zhu, C.; Zeng, Q. Finite-Time Neural Control of Stochastic Active Electromagnetic Suspension System with Actuator Failure. IEEE Trans. Intell. Veh. 2024, 1–12. [Google Scholar] [CrossRef]
  8. Kim, J.; Yim, S. Design of Static Output Feedback Suspension Controllers for Ride Comfort Improvement and Motion Sickness Reduction. Processes 2024, 12, 968. [Google Scholar] [CrossRef]
  9. Li, P.; Lam, J.; Cheung, K.C. Multi-objective control for active vehicle suspension with wheelbase preview. J. Sound Vib. 2014, 333, 5269–5282. [Google Scholar] [CrossRef]
  10. Pang, H.; Wang, Y.; Zhang, X.; Xu, Z. Robust state-feedback control design for active suspension system with time-varying input delay and wheelbase preview information. J. Frankl. Inst. 2019, 356, 1899–1923. [Google Scholar] [CrossRef]
  11. Liu, Z.; Si, Y.; Sun, W. Ride comfort oriented integrated design of preview active suspension control and longitudinal velocity planning. Mech. Syst. Signal Process. 2024, 208, 110992. [Google Scholar] [CrossRef]
  12. Rodriguez-Guevara, D.; Favela-Contreras, A.; Beltran-Carbajal, F.; Sotelo, C.; Sotelo, D. A Differential Flatness-Based Model Predictive Control Strategy for a Nonlinear Quarter-Car Active Suspension System. Mathematics 2023, 11, 1067. [Google Scholar] [CrossRef]
  13. Guo, X.; Zhang, J.; Sun, W. Robust saturated fault-tolerant control for active suspension system via partial measurement information. Mech. Syst. Signal Process. 2023, 191, 110116. [Google Scholar] [CrossRef]
  14. Xue, W.; Li, K.; Chen, Q.; Liu, G. Mixed FTS/H∞ control of vehicle active suspensions with shock road disturbance. Veh. Syst. Dyn. 2019, 57, 841–854. [Google Scholar] [CrossRef]
  15. Li, H.; Jing, X.; Lam, H.K.; Shi, P. Fuzzy sampled-data control for uncertain vehicle suspension systems. IEEE Trans. Cybern. 2013, 44, 1111–1126. [Google Scholar]
  16. Sun, W.; Gao, H.; Kaynak, O. Finite frequency H∞ control for vehicle active suspension systems. IEEE Trans. Control Syst. Technol. 2010, 19, 416–422. [Google Scholar] [CrossRef]
  17. Dogruer, C.U. Constrained model predictive control of a vehicle suspension using Laguerre polynomials. Proc. Inst. Mech. Eng. Part C J. Mech. Eng. Sci. 2020, 234, 1253–1268. [Google Scholar] [CrossRef]
  18. Esmaeili, J.S.; Akbari, A.; Farnam, A.; Azad, N.L.; Crevecoeur, G. Adaptive Neuro-Fuzzy Control of Active Vehicle Suspension Based on H2 and H∞ Synthesis. Machines 2023, 11, 1022. [Google Scholar] [CrossRef]
  19. Han, X.; Zhao, X.; Karimi, H.R.; Wang, D.; Zong, G. Adaptive optimal control for unknown constrained nonlinear systems with a novel quasi-model network. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 2867–2878. [Google Scholar] [CrossRef]
  20. Huang, T.; Wang, J.; Pan, H. Approximation-free prespecified time bionic reliable control for vehicle suspension. IEEE Trans. Autom. Sci. Eng. 2023, 1–11. [Google Scholar] [CrossRef]
  21. Qin, Z.C.; Xin, Y. Data-driven H∞ vibration control design and verification for an active suspension system with unknown pseudo-drift dynamics. Commun. Nonlinear Sci. Numer. Simul. 2023, 125, 107397. [Google Scholar] [CrossRef]
  22. Mazouchi, M.; Yang, Y.; Modares, H. Data-driven dynamic multiobjective optimal control: An aspiration-satisfying reinforcement learning approach. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 6183–6193. [Google Scholar] [CrossRef]
  23. Wang, G.; Li, K.; Liu, S.; Jing, H. Model-Free H∞ Output Feedback Control of Road Sensing in Vehicle Active Suspension Based on Reinforcement Learning. J. Dyn. Syst. Meas. Control 2023, 145, 061003. [Google Scholar] [CrossRef]
  24. Wang, A.; Liao, X.; Dong, T. Event-driven optimal control for uncertain nonlinear systems with external disturbance via adaptive dynamic programming. Neurocomputing 2018, 281, 188–195. [Google Scholar] [CrossRef]
  25. Wu, H.N.; Luo, B. Neural Network Based Online Simultaneous Policy Update Algorithm for Solving the HJI Equation in Nonlinear H∞ Control. IEEE Trans. Neural Netw. Learn. Syst. 2012, 23, 1884–1895. [Google Scholar] [PubMed]
  26. Luo, B.; Wu, H.N.; Huang, T. Off-policy reinforcement learning for H∞ control design. IEEE Trans. Cybern. 2014, 45, 65–76. [Google Scholar] [CrossRef]
  27. Kiumarsi, B.; Lewis, F.L.; Jiang, Z.P. H∞ control of linear discrete-time systems: Off-policy reinforcement learning. Automatica 2017, 78, 144–152. [Google Scholar] [CrossRef]
  28. Wu, H.N.; Luo, B. Simultaneous policy update algorithms for learning the solution of linear continuous-time H∞ state feedback control. Inf. Sci. 2013, 222, 472–485. [Google Scholar] [CrossRef]
  29. Valadbeigi, A.P.; Sedigh, A.K.; Lewis, F.L. H∞ Static Output-Feedback Control Design for Discrete-Time Systems Using Reinforcement Learning. IEEE Trans. Neural Netw. Learn. Syst. 2019, 31, 396–406. [Google Scholar] [CrossRef]
  30. Sun, W.; Zhao, Z.; Gao, H. Saturated adaptive robust control for active suspension systems. IEEE Trans. Ind. Electron. 2012, 60, 3889–3896. [Google Scholar] [CrossRef]
  31. Li, W.; Du, H.; Feng, Z.; Ning, D.; Li, W.; Sun, S.; Tu, L.; Wei, J. Singular system-based approach for active vibration control of vehicle seat suspension. J. Dyn. Syst. Meas. Control 2020, 142, 091003. [Google Scholar] [CrossRef]
Figure 1. Half-car suspension model.
Figure 2. Actor–critic structure for off-policy RL.
Figure 3. Hardware-in-the-loop simulation.
Figure 4. Critic NN weight coefficients.
Figure 5. Actor NN weight coefficients.
Figure 6. Number of updates.
Figure 7. Control actions for data collection.
Figure 8. Disturbance actions for data collection.
Figure 9. Road excitation profile for test.
Figure 10. The evolution of the pitch acceleration.
Figure 11. The evolution of the vertical acceleration.
Figure 12. The evolution of the suspension performance Λ2.
Figure 13. Evolution of r(t).
Table 1. Symbol definitions.

Symbol | Meaning
M | Sprung mass
J | Pitch moment of inertia
m_t1, m_t2 | Unsprung masses
a | Distance from front axle to center of mass
b | Distance from rear axle to center of mass
k_t1, k_t2 | Tire stiffness
k_s1, k_s2 | Suspension spring linear stiffness
k_sn1, k_sn2, k_sn3, k_sn4 | Suspension spring nonlinear stiffness
b_s1, b_s2 | Suspension hydraulic linear damping
b_sn1, b_sn2 | Suspension hydraulic nonlinear damping
u_1, u_2 | Active control forces
z_c | Vertical displacement of the center of mass
θ | Pitch angle
η_1, η_2 | Vertical displacements of the unsprung masses
μ_1, μ_2 | Road disturbances
Table 2. Model parameters.

Parameter | Value | Parameter | Value
M (kg) | 500 | a (m) | 1.25
J (kg·m²) | 910 | b (m) | 1.45
m_t1 (kg) | 30 | k_s1 (N/m) | 10,000
m_t2 (kg) | 40 | k_s2 (N/m) | 10,000
k_sn1 (N/m²) | 1000 | k_sn3 (N/m²) | 1000
k_sn2 (N/m³) | 20,000 | k_sn4 (N/m³) | 20,000
b_s1 (N·s/m) | 1000 | b_s2 (N·s/m) | 1000
k_t1 (N/m) | 100,000 | u_1max (N) | 2000
k_t2 (N/m) | 100,000 | u_2max (N) | 2000
z_1max (m) | 0.1 | b_sn1 (N·s²/m²) | 200
z_2max (m) | 0.1 | b_sn2 (N·s²/m²) | 200
Table 3. RMS evaluation of simulation results.

Method | z̈_c (m/s²) | θ̈ (rad/s²) | u_1 (N) | u_2 (N)
Passive suspension | 1.371 | 1.081 | — | —
Linear H∞ algorithm | 1.228 | 1.004 | 253.6 | 255.1
Off-policy RL solution | 1.131 | 0.964 | 247.2 | 249.2
Table 4. Peak evaluation of simulation results.

Method | Λ2(1) | Λ2(2) | Λ2(3) | Λ2(4) | Λ2(5) | Λ2(6)
Passive suspension | 0.9301 | 0.8131 | 0.5832 | 0.9135 | — | —
Linear H∞ algorithm | 1.076 | 1.082 | 0.6153 | 0.899 | 0.5484 | 0.5513
Off-policy RL solution | 0.9725 | 0.9735 | 0.6514 | 0.9894 | 0.4921 | 0.4928
