Article

Reinforcement Learning-Based Vibration Control for Half-Car Active Suspension Considering Unknown Dynamics and Preset Convergence Rate

Guangxi Key Laboratory of Manufacturing System & Advanced Manufacturing Technology, Guilin University of Electronic Technology, Guilin 541004, China
* Author to whom correspondence should be addressed.
Processes 2024, 12(8), 1591; https://doi.org/10.3390/pr12081591
Submission received: 14 June 2024 / Revised: 23 July 2024 / Accepted: 27 July 2024 / Published: 29 July 2024
(This article belongs to the Section Automation Control Systems)

Abstract

Electromagnetic actuators, characterized by their lack of pneumatic or hydraulic circuits, rapid response, and ease of control, have the potential to significantly enhance the dynamic performance of automotive active suspensions. However, the complexity of their models and the calibration of control parameters hamper the efficiency of control design. To address this issue, this paper proposes a reinforcement learning vibration control strategy for electromagnetic active suspension. Firstly, a half-car dynamic model with electromagnetic active suspension is established. Considering the unknown dynamics of the actuator and its preset convergence performance, an optimal control method based on reinforcement learning is investigated. Secondly, a heuristic PI adaptive dynamic programming algorithm is presented. This method converges to the optimal control solution without requiring model parameters or a manually designed initial gain. Finally, the energy consumption and dynamic performance of this method are analyzed through rapid prototyping control simulation. The results show that the ride comfort of the vehicle suspension can be improved at the given preset convergence rate.

1. Introduction

In recent years, with the development of intelligent driving and electronic control technology, active suspension systems have received significant attention. Especially in the automotive industry, integrating active suspension systems into steer-by-wire chassis to enhance overall vehicle safety is a hot research topic. However, this integration also faces new challenges, such as coordinating with other subsystems and participating in higher-level vehicle decision-making [1,2].
In terms of actuators, active suspension systems have always been developing towards a compact structure, lightweight design, low power consumption, high-frequency response, and low cost [3]. Electromagnetic active suspensions, characterized by features such as no need for pneumatic or hydraulic pathways, fast response, and ease of control, have promising market prospects [4,5]. In addition to hardware design, the performance of active suspension systems mainly depends on the design of the controllers. Numerous scholars have proposed new methods, such as event-triggered control [6,7], fixed-time control [8], bioinspired trajectory-tracking control [9,10], data-driven control, and predictive control that considers road perception [11,12]. In active suspension control, better handling of various adverse features in the control system has always been a concern for scholars. These features include actuator saturation, state estimation, parameter uncertainty, and nonlinear constraints. In [13], a robust saturation control method is proposed for active suspension systems. In [14], an exact output feedback control method is proposed. To enhance comfort and reduce motion sickness, Jeong et al. [15] proposed a static output feedback control method for active suspension. A control method for electromagnetic active suspension considering actuator faults and addressing finite-time and dimension explosion issues is studied in [16]. To better handle mechanical constraints, parameter uncertainty, and network attacks, some nonlinear suspension control methods have been proposed. Among these methods, quadratic programming, LMI, backstepping control, and barrier Lyapunov functions are applied [17,18,19,20].
Considering the complexity of vehicle models and the tedious parameter calibration process, reinforcement learning methods are gradually being applied to vehicle system control. Pan et al. [21] proposed a reinforcement learning-based model-free trajectory-tracking control for vehicles. In [22], Li et al. combined model predictive control with road preview information to design a controller for active suspension; however, this method requires accurate model parameters. To reduce reliance on model parameters, Wang et al. [23] utilized reinforcement learning to design a model-free control method for active suspension based on output feedback. Currently, most robust control methods exhibit excellent control performance, but these methods often require accurate model parameters, which undoubtedly increase the complexity of design and research and development efforts [24]. Since active suspension systems typically need to consider multiple control and optimization objectives, such as ride comfort, suspension travel, tire dynamic loads, and control energy consumption, this further increases the difficulty of control [25]. Furthermore, to achieve better overall performance, the rate of convergence of system states should be considered in control design [26,27,28]. However, research on multi-objective optimal control of electromagnetic active suspensions that considers both model-free approaches and preset convergence rates is relatively limited. Reinforcement learning is a data-driven machine learning algorithm that can iteratively learn the optimal control solution for uncertain systems either online or offline [29]. This method is an effective model-free approach that can be combined with many well-established classical control methods, such as LQ control [30,31], H∞ control [32,33], and multi-player games [34], thereby enhancing control design efficiency. 
Although some studies have attempted to apply reinforcement learning methods to active suspension control, they have primarily focused on single-input quarter-car active suspension models. Research on more complex half-car or full-car suspension models is relatively sparse, which is one of the motivations for this study. Moreover, in actual vehicle systems there are numerous model parameters, and accurately calibrating these parameters is time-consuming and costly, which is not conducive to the application of classical optimal control methods. In contrast, input–output data is more readily available [35]. Therefore, this paper applies reinforcement learning methods to the optimal control of a half-car active suspension, exploring the feasibility of model-free methods. Additionally, by considering preset convergence rates, this method can flexibly configure the eigenvalues of the closed-loop system within a given region to achieve specific dynamic performance, which will be beneficial for practical vehicle applications.
Based on the above analysis, this paper investigates a model-free control strategy for electromagnetic active suspensions considering preset convergence rates. The study focuses on a half-car active suspension with front and rear wheel coupling and designs a reinforcement learning vibration control strategy. The main contributions of this paper are as follows:
  • A heuristic reinforcement learning algorithm based on PI is proposed, which rapidly computes the optimal control solution of the system without requiring model parameters.
  • In controller design, the preset convergence performance is considered, and by adjusting the preset convergence rate, a balance between control energy consumption and other performance indicators is achieved.
  • Finally, a rapid prototyping control simulation platform was established to evaluate the energy consumption and dynamic performance of the active suspension through bumped road tests and frequency domain analysis.
The remainder of this paper is organized as follows: Section 2 establishes the half-car control model for electromagnetic active suspensions. Section 3 presents a vibration control algorithm based on reinforcement learning. Section 4 provides simulation results of active suspension control. Finally, Section 5 concludes the paper.

2. Half-Car Control Model for Electromagnetic Active Suspension

2.1. Electromagnetic Actuator Mechanical Model

The electromagnetic suspension primarily consists of a linear motor, a helical spring, and a hydraulic damper, as depicted in Figure 1. The linear motor generates active control force, with the helical spring located externally and the hydraulic damper internally [3,4,5,36].
The internal structure of the linear motor actuator is illustrated in Figure 2, comprising primary and secondary components. The primary component consists of a permanent magnet ring, a soft iron ring, a fixed ring, and a pedestal. The permanent magnet ring and the soft iron ring are alternately arranged, with adjacent permanent magnet rings repelling each other. The fixed ring is located at the top to prevent the permanent magnet ring from moving due to repulsive forces. The secondary component mainly consists of three sets of coils and a coil framework. Coil 2 is utilized for active control, while coils 1 and 3 are employed for energy recovery.
Figure 3 illustrates the mechanical model of the linear motor. Here, $r_k$ represents the radius of the $k$-th layer of coil windings, counted from the inside out, $k = 1, \ldots, n_e$, and $d$ denotes the diameter of the enameled wire used for the coil windings. $r_m$ is the outer radius of the permanent magnet ring and the soft iron ring. $h_{i1}$, $h_{i2}$, and $h_{i3}$ denote the heights of the soft iron rings, and $h_{c1}$, $h_{c2}$, and $h_{c3}$ the heights of the coils. $\Phi_1$, $\Phi_2$, $\Phi_3$, and $\Phi_4$ represent the main magnetic fluxes in the equivalent magnetic circuit, while $\Phi_5$, $\Phi_6$, $\Phi_7$, and $\Phi_8$ represent the leakage fluxes. The relevant dimensions are shown in Table 1.
In Figure 3, the magnetic induction intensity in the air gap corresponding to the soft iron ring 2 is given by the following:
$$B_g^{g2}(r_k) = \frac{\Phi_2 + \Phi_3}{2\pi h_{i2}(r_m + r_k)}, \tag{1}$$
Assuming that the change in direction of the magnetic flux lines within coil 2 is negligible, applying Ampère's force law together with Equation (1) gives the following:
$$F = B_g^{g2} L I = \sum_{k=1}^{n_e} \frac{(\Phi_2 + \Phi_3)\, r_k h_{c2}}{h_{i2}(r_m + r_k)\, d} I, \tag{2}$$
where F is the output force of the motor, L is the length of the coil cutting the magnetic flux lines, and I is the current.
Therefore, the control force of the electromagnetic suspension can be expressed as follows:
$$F = k_f I, \tag{3}$$
where
$$k_f = \sum_{k=1}^{n_e} \frac{(\Phi_2 + \Phi_3)\, r_k h_{c2}}{h_{i2}(r_m + r_k)\, d}. \tag{4}$$
In practical applications, the thrust coefficient $k_f$ is usually difficult to measure accurately, so it is treated as an unknown parameter. The stroke length of the linear motor is 115 mm. For the magnetic flux density distribution, please refer to Figure 4 of [36].
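Purely to illustrate the layer-sum structure of Eq. (4), the thrust coefficient can be evaluated directly; every numeric input below is a hypothetical placeholder, since the actual fluxes and winding geometry are not published and, as noted above, $k_f$ is treated as unknown in practice:

```python
# Layer-by-layer sum of Eq. (4). All numeric values here are placeholders,
# not parameters of the actuator described in the paper.
def thrust_coefficient(phi2, phi3, radii, h_c2, h_i2, r_m, d):
    """k_f = sum over winding layers k of (Phi2 + Phi3) r_k h_c2 / (h_i2 (r_m + r_k) d)."""
    return sum((phi2 + phi3) * r_k * h_c2 / (h_i2 * (r_m + r_k) * d)
               for r_k in radii)

# Hypothetical geometry: four layers of 1 mm wire just outside a 20 mm magnet ring.
radii = [0.021 + 0.001 * k for k in range(4)]
k_f = thrust_coefficient(phi2=3e-4, phi3=3e-4, radii=radii,
                         h_c2=0.04, h_i2=0.01, r_m=0.020, d=0.001)
print(k_f)  # thrust coefficient in N/A, so that F = k_f * I
```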

2.2. Half-Car Active Suspension Dynamics Modeling

Compared to the quarter-car active suspension, the half-car active suspension more comprehensively reflects the vehicle’s vertical and pitch characteristics, as well as the coupling relationship between the front and rear suspensions. Therefore, a half-car active suspension dynamics model, as shown in Figure 4, is employed [25].
Neglecting the minor tire damping, the dynamic equations of the half-car active suspension are as follows:
$$\begin{cases} M\ddot{z}_c = f_1 + f_2 \\ J\ddot{\theta} = a f_1 - b f_2 \\ m_{t1}\ddot{\eta}_1 = -k_{t1}(\eta_1 - \mu_1) - f_1 \\ m_{t2}\ddot{\eta}_2 = -k_{t2}(\eta_2 - \mu_2) - f_2 \end{cases} \tag{5}$$
where
$$\begin{cases} f_1 = k_{s1}(\eta_1 - z_1) + b_{s1}(\dot{\eta}_1 - \dot{z}_1) + F_1 \\ f_2 = k_{s2}(\eta_2 - z_2) + b_{s2}(\dot{\eta}_2 - \dot{z}_2) + F_2 \end{cases}$$
with
$$z_1 = z_c + a\theta, \qquad z_2 = z_c - b\theta.$$
The state variables of the system are as follows:
$$x = \begin{bmatrix} x_1 & x_2 & x_3 & x_4 & x_5 & x_6 & x_7 & x_8 \end{bmatrix}^T,$$
where
$$\begin{aligned} x_1 &= z_c + a\theta - \eta_1, & x_2 &= z_c - b\theta - \eta_2, & x_3 &= \eta_1 - \mu_1, & x_4 &= \eta_2 - \mu_2, \\ x_5 &= \dot{z}_c + a\dot{\theta}, & x_6 &= \dot{z}_c - b\dot{\theta}, & x_7 &= \dot{\eta}_1, & x_8 &= \dot{\eta}_2. \end{aligned}$$
Assuming all state variables are measurable or can be estimated using a Kalman filter, combined with Equations (1)–(5), the system’s state-space equations are as follows:
$$\begin{aligned} \dot{x}(t) &= A x(t) + B u(t) + C\omega(t) \\ y(t) &= x(t), \end{aligned} \tag{6}$$
where
$$A = \begin{bmatrix} 0 & 0 & 0 & 0 & 1 & 0 & -1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 1 & 0 & -1 \\ 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 \\ -a_1 k_{s1} & -a_2 k_{s2} & 0 & 0 & -a_1 b_{s1} & -a_2 b_{s2} & a_1 b_{s1} & a_2 b_{s2} \\ -a_2 k_{s1} & -a_3 k_{s2} & 0 & 0 & -a_2 b_{s1} & -a_3 b_{s2} & a_2 b_{s1} & a_3 b_{s2} \\ \frac{k_{s1}}{m_{t1}} & 0 & -\frac{k_{t1}}{m_{t1}} & 0 & \frac{b_{s1}}{m_{t1}} & 0 & -\frac{b_{s1}}{m_{t1}} & 0 \\ 0 & \frac{k_{s2}}{m_{t2}} & 0 & -\frac{k_{t2}}{m_{t2}} & 0 & \frac{b_{s2}}{m_{t2}} & 0 & -\frac{b_{s2}}{m_{t2}} \end{bmatrix}$$

$$B = \begin{bmatrix} 0_{4\times 1} & 0_{4\times 1} \\ a_1 k_f & a_2 k_f \\ a_2 k_f & a_3 k_f \\ -\frac{k_f}{m_{t1}} & 0 \\ 0 & -\frac{k_f}{m_{t2}} \end{bmatrix}, \quad C = \begin{bmatrix} 0 & 0 \\ 0 & 0 \\ -1 & 0 \\ 0 & -1 \\ 0_{4\times 1} & 0_{4\times 1} \end{bmatrix}, \quad u = \begin{bmatrix} I_1 \\ I_2 \end{bmatrix}, \quad \omega = \begin{bmatrix} \dot{\mu}_1 \\ \dot{\mu}_2 \end{bmatrix},$$

$$a_1 = \frac{1}{M} + \frac{a^2}{J}, \quad a_2 = \frac{1}{M} - \frac{ab}{J}, \quad a_3 = \frac{1}{M} + \frac{b^2}{J}.$$
The control input signal of the system is the current, and the external input is the road disturbance. To better evaluate the system’s energy consumption, consider the root mean square (RMS) value of the motor’s thermal power as follows:
$$P_{rms}^{i} = \sqrt{\frac{1}{T}\int_0^T I_i^2 r^2 \, dt}, \quad i = 1, 2.$$
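As a quick sketch, this RMS quantity can be approximated from uniformly sampled current data; the current trace and resistance value below are illustrative placeholders, not parameters from the paper:

```python
import numpy as np

# Discrete approximation of the RMS thermal power formula above:
# P_rms = sqrt((1/T) * integral of I^2 r^2 dt) over a record of length T.
# The resistance r and the current samples are illustrative values only.
def rms_thermal_power(current, r):
    """RMS thermal power for uniformly sampled current."""
    current = np.asarray(current, dtype=float)
    return np.sqrt(np.mean(current ** 2 * r ** 2))

t = np.arange(0.0, 2.0, 0.001)
I = 2.0 * np.sin(2 * np.pi * 1.5 * t)   # hypothetical motor current (A)
print(rms_thermal_power(I, r=0.8))
```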
The road disturbance adopts a bumped road model that significantly affects the vehicle’s attitude. Its mathematical model can be represented as follows [22]:
$$\mu(t) = \begin{cases} \dfrac{h}{2}\left(1 - \cos\dfrac{2\pi\upsilon}{L} t\right), & \dfrac{2L}{\upsilon} \le t \le \dfrac{3L}{\upsilon} \\ 0, & \text{else}, \end{cases} \tag{8}$$
where h is the height of the bump, L is the length of the bump, and υ is the vehicle’s longitudinal velocity.
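The bump profile of Eq. (8) can be sketched as follows; the default parameters match the test scenario later described in Section 4.2 (bump height 0.1 m, bump length 5 m, vehicle speed 36 km/h = 10 m/s):

```python
import numpy as np

# Bump road disturbance from Eq. (8): nonzero only for 2L/v <= t <= 3L/v.
# Defaults follow the Section 4.2 scenario: h = 0.1 m, L = 5 m, v = 10 m/s.
def bump_road(t, h=0.1, L=5.0, v=10.0):
    t = np.atleast_1d(np.asarray(t, dtype=float))
    mu = np.zeros_like(t)
    mask = (t >= 2.0 * L / v) & (t <= 3.0 * L / v)
    mu[mask] = 0.5 * h * (1.0 - np.cos(2.0 * np.pi * v / L * t[mask]))
    return mu

t = np.arange(0.0, 3.0, 0.001)
profile = bump_road(t)
print(profile.max())   # peak height equals h at the bump midpoint t = 2.5 L / v
```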

2.3. Control Problem Formulation

In active suspension control design, the typical objective is to achieve rapid decay of the system’s state variables with amplitudes smaller than those of passive suspensions. Simultaneously, efforts are made to minimize control energy consumption. Hence, the system’s performance index is defined as follows [26]:
$$J(x_0, u) = \int_0^\infty e^{2at}\left(x^T Q x + u^T R u\right) dt, \tag{9}$$
where $a \ge 0$ represents the preset convergence rate, while $Q = Q^T > 0$ and $R = R^T > 0$ are weighting matrices related to the system state and control inputs.
The control objective of this paper can be summarized as designing an optimal controller without requiring model parameters, thereby ensuring that the system meets the performance index (9).
In evaluating active suspension performance, it is typically necessary to minimize the vertical and pitch accelerations of the vehicle’s center of gravity while simultaneously satisfying constraints such as suspension travel, tire dynamic load, and control saturation. Therefore, the following observation vector is defined:
$$\Lambda_1 = \begin{bmatrix} \ddot{z}_c & \ddot{\theta} \end{bmatrix}^T, \qquad \Lambda_2 = \begin{bmatrix} \dfrac{z_1 - \eta_1}{z_{1\max}} & \dfrac{z_2 - \eta_2}{z_{2\max}} & \dfrac{k_{t1}(\eta_1 - \mu_1)}{9.8\left(\frac{bM}{a+b} + m_{t1}\right)} & \dfrac{k_{t2}(\eta_2 - \mu_2)}{9.8\left(\frac{aM}{a+b} + m_{t2}\right)} & \dfrac{u_1}{u_{1\max}} & \dfrac{u_2}{u_{2\max}} \end{bmatrix}^T, \tag{10}$$
where $z_{1\max}$ and $z_{2\max}$ represent the maximum travel of the front and rear suspensions, respectively, while $u_{1\max}$ and $u_{2\max}$ represent the maximum motor output forces.

3. Reinforcement Learning-Based Vibration Control Strategy

3.1. Optimal Control Considering Preset Convergence Rate

To ensure the electromagnetic suspension meets the performance indices (9) and (10), the following optimal controller is designed:
$$u = -Kx, \tag{11}$$
where $K \in \mathbb{R}^{2\times 8}$ is the control gain matrix to be determined. Based on linear optimal control theory [26], Equation (11) can be transformed into solving the following Algebraic Riccati Equation (ARE):
$$(A + aI)^T P + P(A + aI) + Q - PBR^{-1}B^T P = 0, \tag{12}$$
where
$$P = \int_0^\infty e^{(A + aI - BK)^T t}\left(Q + K^T R K\right) e^{(A + aI - BK)t} \, dt.$$
The optimal control gain is then given by $K = R^{-1}B^T P$.
If $(A, B)$ is stabilizable and $(A, Q^{1/2})$ is observable, then the ARE (12) has a unique solution, and the closed-loop system satisfies the following global exponential stability condition:
$$\lim_{t\to\infty} x(t)e^{at} = 0. \tag{13}$$
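When the model is available, this preset-convergence design reduces to a standard ARE for the shifted pair $(A + aI, B)$; the resulting gain places all closed-loop eigenvalues strictly left of $-a$. A minimal sketch with SciPy, using a small illustrative two-state system rather than the suspension matrices:

```python
import numpy as np
from scipy.linalg import solve_continuous_are

# Preset-convergence LQR: solve the ARE for the shifted system (A + aI, B).
# The resulting K guarantees max Re(eig(A - BK)) < -a, i.e. x(t) e^{at} -> 0.
# A, B, Q, R below are an illustrative system, not the half-car model.
a = 1.0
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[0.1]])

P = solve_continuous_are(A + a * np.eye(2), B, Q, R)
K = np.linalg.solve(R, B.T @ P)          # K = R^{-1} B^T P
eigs = np.linalg.eigvals(A - B @ K)
print(eigs.real.max())                   # strictly less than -a
```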
Considering the nonlinear nature of the ARE (12), obtaining its analytical solution directly is challenging. Therefore, a computational adaptive policy iteration (PI) algorithm is introduced to obtain the numerical solution, as follows [27].
Computational adaptive PI algorithm: Choose an initial gain $K_0 \in \mathbb{R}^{2\times 8}$ such that $\max \operatorname{Re} \lambda(A - BK_0) < -a$, where $a > 0$ is the preset convergence rate. Given a small positive constant $\delta$, repeat the following steps starting from $k = 0$.
Step 1: Solve the following Lyapunov equation:
$$(A + aI - BK_k)^T P_k + P_k (A + aI - BK_k) + Q + K_k^T R K_k = 0, \tag{14}$$
Step 2: Update the gain matrix:
$$K_{k+1} = R^{-1} B^T P_k, \tag{15}$$
Step 3: If $\|P_{k+1} - P_k\| < \delta$, stop the iteration; otherwise, set $k \leftarrow k + 1$ and return to Step 1.
The resulting $K_k$ and $P_k$ possess the following properties:
(1) $\max \operatorname{Re} \lambda(A - BK_k) < -a$;
(2) $P \le P_{k+1} \le P_k$;
(3) $\lim_{k\to\infty} K_k = K$, $\lim_{k\to\infty} P_k = P$.
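The iteration above (a Kleinman-type policy iteration) can be sketched in a model-based form; this is illustrative only, on a small two-state Hurwitz system so that $K_0 = 0$ is a valid stabilizing start when $a = 0$, mirroring Step 1 of the heuristic algorithm in Section 3.3:

```python
import numpy as np
from scipy.linalg import solve_continuous_are, solve_continuous_lyapunov

# Model-based computational adaptive PI: repeatedly solve the Lyapunov
# equation (14) for P_k, then update the gain by (15). Illustrative system.
A = np.array([[0.0, 1.0], [-2.0, -0.5]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2)
R = np.array([[0.1]])
a = 0.0
K = np.zeros((1, 2))        # K0 = 0 is admissible here because A is Hurwitz
delta = 1e-9
P_prev = None

for k in range(50):
    Ak = (A + a * np.eye(2)) - B @ K
    # Lyapunov equation (14): Ak^T P + P Ak + Q + K^T R K = 0
    P = solve_continuous_lyapunov(Ak.T, -(Q + K.T @ R @ K))
    K = np.linalg.solve(R, B.T @ P)      # gain update (15)
    if P_prev is not None and np.linalg.norm(P - P_prev) < delta:
        break
    P_prev = P

# Compare against the direct ARE solution of (12)
P_star = solve_continuous_are(A + a * np.eye(2), B, Q, R)
print(np.allclose(P, P_star))
```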
Using the computational adaptive PI algorithm, the optimal control solution that satisfies conditions (9) and (10) can be found. However, this method requires precise information about the electromagnetic suspension model $(A, B)$, which can be costly and time-consuming to obtain in practical applications. The next section presents an online PI algorithm based on reinforcement learning that does not require model parameters.

3.2. PI-Based Online Reinforcement Learning Algorithm

First, the following new state-space equation is established:
$$\dot{\bar{x}} = \bar{A}\bar{x} + \bar{B}\bar{u} + \bar{C}\bar{\omega}(t), \tag{16}$$
where
$$\bar{x} = x e^{at}, \quad \bar{u} = u e^{at}, \quad \bar{\omega}(t) = \omega e^{at}, \quad \bar{A} = A + aI, \quad \bar{B} = B, \quad \bar{C} = C.$$
It is easy to prove that the new state-space Equation (16) is equivalent to the original system state-space Equation (6), as shown in Figure 5.
To satisfy the persistent excitation condition, a control input with detection noise ε is introduced as follows:
$$\bar{u} = -K_k \bar{x} + \varepsilon, \tag{17}$$
Define $\bar{A}_k = \bar{A} - \bar{B}K_k$. Substituting (17) into (16) yields the following:
$$\dot{\bar{x}} = \bar{A}_k \bar{x} + \bar{B}\varepsilon, \tag{18}$$
For the closed-loop system (18), a Lyapunov function is established as $V = \bar{x}^T P_k \bar{x}$. Taking its derivative, we obtain the following:
$$\dot{V} = \bar{x}^T\left(\bar{A}_k^T P_k + P_k \bar{A}_k\right)\bar{x} + 2\varepsilon^T \bar{B}^T P_k \bar{x}, \tag{19}$$
From Equations (14) and (15), we know that $\bar{A}_k^T P_k + P_k \bar{A}_k = -Q - K_k^T R K_k$ and $R K_{k+1} = \bar{B}^T P_k$. Substituting these into (19), we obtain the following:
$$\dot{V} = -\bar{x}^T\left(Q + K_k^T R K_k\right)\bar{x} + 2\varepsilon^T R K_{k+1} \bar{x}, \tag{20}$$
It can be seen that (20) does not contain the system’s parameter matrix A , B . Therefore, it is possible to further employ reinforcement learning methods for an online solution without requiring model parameters.
Integrating (20) over a given interval $[t, t + \delta t]$ yields the following:
$$\bar{x}^T(t + \delta t) P_k \bar{x}(t + \delta t) - \bar{x}^T(t) P_k \bar{x}(t) - 2\int_t^{t+\delta t} \varepsilon^T R K_{k+1} \bar{x} \, d\tau = -\int_t^{t+\delta t} \bar{x}^T\left(Q + K_k^T R K_k\right)\bar{x} \, d\tau, \tag{21}$$
The terms $\bar{x}^T P_k \bar{x}$ and $\varepsilon^T R K_{k+1} \bar{x}$ can be rewritten as follows:
$$\bar{x}^T P_k \bar{x} = \left(\bar{x}^T \otimes \bar{x}^T\right)\operatorname{vec}(P_k), \qquad \varepsilon^T R K_{k+1} \bar{x} = \left(\bar{x}^T \otimes \varepsilon^T R\right)\operatorname{vec}(K_{k+1}), \tag{22}$$
Combining (21) and (22) yields the following:
$$\begin{bmatrix} \left.\left(\bar{x}^T \otimes \bar{x}^T\right)\right|_t^{t+\delta t} & -2\displaystyle\int_t^{t+\delta t} \left(\bar{x}^T \otimes \varepsilon^T R\right) d\tau \end{bmatrix} \begin{bmatrix} \operatorname{vec}(P_k) \\ \operatorname{vec}(K_{k+1}) \end{bmatrix} = -\int_t^{t+\delta t} \bar{x}^T\left(Q + K_k^T R K_k\right)\bar{x} \, d\tau, \tag{23}$$
Notably, in (23), $\bar{x}$ can be measured online. Define a small increment $\delta t$, and collect the following data set:
$$\Theta_k \begin{bmatrix} \operatorname{vec}(P_k) \\ \operatorname{vec}(K_{k+1}) \end{bmatrix} = \Xi_k, \quad k = 0, 1, \ldots \tag{24}$$
where
$$\Theta_k = \begin{bmatrix} \left.\left(\bar{x}^T \otimes \bar{x}^T\right)\right|_{t_{k,1}}^{t_{k,1}+\delta t} & -2\displaystyle\int_{t_{k,1}}^{t_{k,1}+\delta t} \left(\bar{x}^T \otimes \varepsilon^T R\right) d\tau \\ \vdots & \vdots \\ \left.\left(\bar{x}^T \otimes \bar{x}^T\right)\right|_{t_{k,l_k}}^{t_{k,l_k}+\delta t} & -2\displaystyle\int_{t_{k,l_k}}^{t_{k,l_k}+\delta t} \left(\bar{x}^T \otimes \varepsilon^T R\right) d\tau \end{bmatrix}, \quad \Xi_k = \begin{bmatrix} -\displaystyle\int_{t_{k,1}}^{t_{k,1}+\delta t} \bar{x}^T\left(Q + K_k^T R K_k\right)\bar{x} \, d\tau \\ \vdots \\ -\displaystyle\int_{t_{k,l_k}}^{t_{k,l_k}+\delta t} \bar{x}^T\left(Q + K_k^T R K_k\right)\bar{x} \, d\tau \end{bmatrix}$$
During each data collection, we require $0 \le t_{k,i} + \delta t \le t_{k,i+1}$ and $t_{k,l_k} + \delta t \le t_{k+1,1}$, where $i = 1, 2, \ldots, l_k$. Under the condition of persistent excitation, to ensure the solvability of (24), the data set length $l_k > 0$ should be sufficiently large to satisfy $\operatorname{rank}(\Theta_k) = \frac{n(n+1)}{2} + nm$, where $n$ represents the number of system state variables and $m$ represents the number of control inputs. In this work, $l_k = 52$ is selected, and the least squares method is used to solve Equation (24).
To satisfy the persistent excitation condition, detection noise is introduced, specifically the commonly used Gaussian white noise signal. When the condition $\operatorname{rank}(\Theta_k) = 52$ is met, Equation (24) has a unique solution $(P_k, K_{k+1})$.
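The rank requirement simply counts the unknowns in (24): the symmetric $P_k$ contributes $n(n+1)/2$ independent entries and $K_{k+1}$ contributes $nm$. A one-line check reproduces the value used here:

```python
# Minimum rank of Theta_k needed for Eq. (24) to have a unique solution:
# n(n+1)/2 unknowns from the symmetric P_k plus n*m unknowns from K_{k+1}.
def required_rank(n, m):
    return n * (n + 1) // 2 + n * m

print(required_rank(8, 2))  # half-car model: n = 8 states, m = 2 inputs -> 52
```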
The PI-based online reinforcement learning (PI-RL) algorithm is illustrated in Figure 6. It is easy to prove that the PI-RL algorithm also satisfies the following properties: (1) $\max \operatorname{Re} \lambda(A - BK_k) < -a$; (2) $P \le P_{k+1} \le P_k$; and (3) $\lim_{k\to\infty} K_k = K$, $\lim_{k\to\infty} P_k = P$.

3.3. Heuristic Algorithm

It is noted that the aforementioned algorithm requires presetting an initial gain matrix $K_0 \in \mathbb{R}^{2\times 8}$ satisfying $\max \operatorname{Re} \lambda(A - BK_0) < -a$, which is difficult and cumbersome for high-dimensional systems. Therefore, this paper proposes the following heuristic Algorithm 1.
Algorithm 1: Heuristic Algorithm
Step 1: Set $K_0 = 0_{2\times 8}$ and $a = 0$, and execute the algorithm shown in Figure 6 to obtain the optimal gain $K_{a=0}$;
Step 2: Set $K_0 = \gamma K_{a=0}$, where $\gamma$ is a positive parameter. Execute the algorithm shown in Figure 6. If a solution is found, output the optimal $K$; otherwise, adjust $a$ and $\gamma$ and repeat Step 2.
In Step 1 of the heuristic algorithm, setting $a = 0$ means that the preset convergence rate need not be considered. We can therefore set $K_0 = 0_{2\times 8}$, because the vehicle suspension system is naturally stable, i.e., $A$ is Hurwitz. In Step 2, $\gamma$ is a positive parameter that needs to be fine-tuned around 1. By adjusting $\gamma$ and $a$, it is always possible to satisfy the condition $\max \operatorname{Re} \lambda(A - BK_0) < -a$.
This heuristic method is effective for high-dimensional systems because there always exist positive constants $\alpha_1$, $\alpha_2$, $\beta_1$, and $\beta_2$ such that the following condition is satisfied:
$$-\alpha_2 < \max \operatorname{Re} \lambda(A) < -\alpha_1, \qquad -\beta_2 < \max \operatorname{Re} \lambda(A - BK_{a=0}) < -\beta_1, \tag{25}$$
where $\beta_1 > \alpha_1$ and $\beta_2 > \alpha_2$.
When $\gamma \le 1$, we have the following:
$$\max \operatorname{Re} \lambda(A - \gamma BK_{a=0}) \le \gamma \max \operatorname{Re} \lambda(A - BK_{a=0}) + (1 - \gamma)\max \operatorname{Re} \lambda(A) < -\gamma\beta_1 - (1 - \gamma)\alpha_1, \tag{26}$$
so there exist suitable $\gamma$ and $a$ such that $\max \operatorname{Re} \lambda(A - \gamma BK_{a=0}) < -a$.
When $\gamma > 1$, we have the following:
$$\max \operatorname{Re} \lambda(A - \gamma BK_{a=0}) \ge \gamma \max \operatorname{Re} \lambda(A - BK_{a=0}) + (1 - \gamma)\max \operatorname{Re} \lambda(A) > -\gamma\beta_2 - (1 - \gamma)\alpha_2, \tag{27}$$
so there exist suitable $\gamma$ and $a$ such that $\max \operatorname{Re} \lambda(A - \gamma BK_{a=0}) < -a$. Therefore, fine-tuning the positive parameter $\gamma$ around 1 is crucial for this heuristic algorithm.

4. Rapid Prototyping Control Simulation

4.1. Online Learning and Optimization of Control Parameters

A rapid prototyping control (RPC) simulation platform, as shown in Figure 7, was established. The simulation hardware includes the MicroAutoBox II real-time system, Speedgoat baseline real-time target machines, and a host computer. The MicroAutoBox II is equipped with an IBM PPC 750GL processor (900 MHz), the host computer software is Matlab 2021b, and the Speedgoat baseline real-time target machines feature an Intel Celeron 2 GHz CPU with four cores. The host computer primarily runs Simulink interface programs, which transmit data to the Speedgoat prototype controller via UDP communication through a high-speed network port.
The MicroAutoBox II provides robust computational power, enabling real-time operation of complex half-car active suspension models. It calculates vehicle state information based on control signals sent from the Speedgoat target machine and transmits this information to the host computer. CAN communication can be configured using the Speedgoat and dSPACE driver libraries in Simulink, including settings for writing, reading, and status detection functions. The Speedgoat controller is used to execute the proposed online PI-RL algorithm, while the MicroAutoBox II runs the vehicle dynamics model and sensor model. A PC serves as the host computer for data recording and display. For the RPC system, UDP is preferred due to its low latency. Compared to TCP, using UDP reduces development and maintenance complexity and facilitates rapid deployment and debugging; UDP is therefore chosen for host computer communication in this setup. In UDP communication, the host computer is connected to the host-link ports of the Speedgoat and MicroAutoBox II, and IP addresses are configured to ensure network communication. Communication between the controller and the vehicle dynamics model utilizes the CAN protocol, while data exchange between the controller, sensors, and host computer occurs via UDP. The vehicle model parameters are listed in Table 2, which also gives the inductance $L_c$ and number of turns $N_c$ of coil 2, and the controller design parameters are selected as follows:
$$a = 1, \quad Q = \operatorname{diag}(10, 10, 10, 10, 200, 300, 10, 10), \quad R = 0.0001 I_2, \quad \delta = 0.0001, \quad \delta t = 0.01.$$
During the operation of the heuristic algorithm, the core step is to first run the RL algorithm without considering the preset convergence rate, i.e., with $a = 0$, which allows the initial gain $K_0$ to be set to zero. The optimization result of this first step is then used as the initial parameter for the second step. This approach avoids the complexity of parameter tuning, which is advantageous for high-dimensional systems. In the weight matrix $Q$, since our objective is to reduce vertical acceleration and pitch acceleration, $Q_5$ and $Q_6$ are set relatively high. It is important to note that the PI algorithm is executed only when $\operatorname{rank}(\Theta_k) = 0.5n(n+1) + nm$ is satisfied; otherwise, sampling continues until the condition is met. The proposed heuristic algorithm is not sensitive to the exploratory noise, and for any Gaussian white noise, the basic optimization conditions can be met when the data length is $l_k = 0.5n(n+1) + nm$. First, the initial step of the heuristic algorithm is run. During the learning process, the detection noise, shown in Figure 8, is mainly generated by the linear motor; its amplitude is typically small and its frequencies are far from the system's natural frequencies. Figure 9 displays the learning status signal, while Figure 10 illustrates the suspension's state variables. Figure 11 depicts the variation of $\|P_{k+1} - P_k\|$, indicating that training completes at 3.77 s: when $\|P_{k+1} - P_k\| \le \delta$, the solution $P_k$ of the ARE converges to the optimal solution, i.e., at iteration step $k = 5$. To validate the effectiveness of this method, directly solving the ARE (12) yields the ideal optimal solution; Figure 12 compares the actual control gain with this ideal value.
Furthermore, by feeding the solved optimal control gain into the second step of the heuristic algorithm as $K_0$ and choosing $\gamma = 0.9$, the final optimal control gain considering the preset convergence performance is obtained, as depicted in Figure 13. Figure 14 illustrates the variation of the suspension state variables during the learning process. In practice, values of $\gamma$ near 1 ensure the feasibility of the algorithm. Moreover, this method does not require the system's model parameters; by introducing detection noise online, it can rapidly reach the optimal control solution considering the preset convergence performance.

4.2. Control Implementation

To evaluate the performance of the proposed reinforcement learning vibration controller, road tests are conducted using bump road excitation (8). The test vehicle speed is set to 36 km/h, with a bump length of 5 m and a bump height of 0.1 m. The system’s initial state is set as a random number, and the simulation results are shown in Figure 15, Figure 16 and Figure 17. Figure 15 displays the vertical acceleration response curve of the vehicle. Figure 16 shows the pitch acceleration response curve, and Figure 17 presents the suspension output performance indicators.
The simulation compared the PI-RL solution with the ideal optimal control solution, demonstrating the effectiveness of the PI-RL algorithm proposed in the paper, as all performance metrics align with the ideal scenario. Table 3 compares the root mean square values of key suspension indicators under different preset convergence rates. It can be observed from the table that the preset convergence rate impacts motor thermal power. Increasing the preset convergence rate can reduce motor thermal power, although other indicators may slightly increase.
Figure 18 and Figure 19 show the power spectral density estimates of the vehicle's vertical acceleration and pitch acceleration. The relationship between the normalized frequency $\omega$ and the temporal frequency $f$ is given by $f = \omega f_s / (2\pi)$, where $f_s$ is the sampling frequency, set at 1000 Hz in this study. From the figures, it is evident that effective attenuation is achieved in the 0–500 Hz range for both vertical and pitch acceleration, with particularly notable attenuation in the low-frequency range of 0–10 Hz (vibrations in the 4–8 Hz range have the most significant impact on human comfort). On highways and well-maintained urban roads, road excitation frequencies are typically below 3 Hz, whereas on rough roads or in rural areas, they are generally below 11 Hz [24]. Therefore, the PI-RL-based vibration control method is effective overall. Additionally, as noted in [15], reducing vertical and pitch acceleration in the 0.8–8 Hz range helps mitigate motion sickness.

5. Conclusions

This paper investigates a reinforcement learning-based vibration control method for electromagnetic active suspension systems with unknown model parameters. A half-car active suspension control model is established. An online reinforcement learning algorithm based on PI is applied to design an optimal controller considering the preset convergence rate. A heuristic algorithm is proposed to ensure the feasibility of the solution. Rapid prototyping control simulations verify the feasibility of the proposed method, with the control energy consumption index decreasing as the preset convergence rate increases. Simulation tests on rough road surfaces indicate that all metrics of this method are significantly better than those of passive suspensions, with over 40% reduction in vertical and pitch accelerations. Frequency domain analysis demonstrates the effectiveness of this method within the low-frequency range of 0–10 Hz, which can satisfy the driving comfort requirements on most common road surfaces. Future research will further consider the nonlinear characteristics of the model and fault-tolerant control performance, as well as the coordinated control mechanism between active suspension and steering systems.

Author Contributions

Methodology, G.W.; writing—original draft preparation, J.D.; writing—review and editing, T.Z.; supervision, S.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the National Natural Science Fund of China (No. 12202112), the Guangxi Natural Science Foundation (No. 2021JJB160015, No. 2021JJA160252), and the Guangxi Key Laboratory of Manufacturing System & Advanced Manufacturing Technology (No. 22-35-4-S006).

Data Availability Statement

All data are contained within the article.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

$M$: Sprung mass
$J$: Pitch moment of inertia
$m_{t1}$, $m_{t2}$: Unsprung masses
$a$: Distance from front axle to center of mass
$b$: Distance from rear axle to center of mass
$k_{t1}$, $k_{t2}$: Tire stiffness
$k_{s1}$, $k_{s2}$: Suspension spring stiffness
$b_{s1}$, $b_{s2}$: Suspension hydraulic damping
$F_1$, $F_2$: Motor control forces
$z_c$: Vertical displacement of the center of mass
$\theta$: Pitch angle
$\eta_1$, $\eta_2$: Vertical displacements of the unsprung masses
$\mu_1$, $\mu_2$: Road disturbances

Figure 1. Structure of electromagnetic suspension.
Figure 2. Linear motor structure.
Figure 3. Modeling nomenclature of a linear motor.
Figure 4. Half-car active suspension model.
Figure 5. Equivalent state-space model.
Figure 6. PI-RL algorithm.
Figure 7. Rapid prototyping control simulation platform.
Figure 8. Exploration noise.
Figure 9. Learning status signal.
Figure 10. Suspension state variables (a = 0).
Figure 11. ‖P_{k+1} − P_k‖.
Figure 12. PI-RL solution and ideal solution (a = 0).
Figure 13. PI-RL solution and ideal solution (a = 1).
Figure 14. Suspension state variables (a = 1).
Figure 15. Vertical acceleration curve.
Figure 16. Pitch acceleration curve.
Figure 17. Suspension performance output Λ_2.
Figure 18. Vertical acceleration power spectral density.
Figure 19. Pitch acceleration power spectral density.
Table 1. Motor dimension parameters.

Symbol        Value   Unit
h_c1, h_c3    20      mm
h_c2          40      mm
h_i1, h_i3    5       mm
h_i2          10      mm
r_m           22.6    mm
r_k           32.1    mm
Table 2. Vehicle model parameters.

Parameter        Value     Parameter        Value
M (kg)           500       a (m)            1.25
J (kg·m²)        910       b (m)            1.45
m_t1 (kg)        30        k_s1 (N/m)       10,000
m_t2 (kg)        40        k_s2 (N/m)       10,000
k_t1 (N/m)       100,000   u_1max (N)       2000
k_t2 (N/m)       100,000   u_2max (N)       2000
b_s1 (N·s/m)     1000      b_s2 (N·s/m)     1000
z_1max (m)       0.1       k_f (N/A)        40
z_2max (m)       0.1       r (Ω)            25.3
L_c (mH)         23.12     N_c              1262
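Assuming the linear motor force law F = k_f · i (the usual reading of a force constant in N/A, not stated explicitly here), the Table 2 limits imply the peak coil current and resistive loss as a back-of-envelope check:

```python
# Back-of-envelope check from the Table 2 limits, assuming the linear
# force law F = k_f * i and purely resistive copper loss P = i^2 * r.
k_f = 40.0      # N/A, motor force constant
r = 25.3        # ohm, coil resistance
u_max = 2000.0  # N, force saturation limit

i_max = u_max / k_f       # peak current needed to reach the force limit
p_max = r * i_max ** 2    # peak resistive (copper) loss at that current
print(i_max, p_max)       # → 50.0 63250.0
```

The peak loss is far above the RMS power levels reported later, which is consistent with the force saturation bound being hit only briefly.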
Table 3. Suspension performance RMS values.

Method                z̈_c (m/s²)   θ̈ (rad/s²)   P_rms1 (W)   P_rms2 (W)
Passive suspension    0.4531       0.4799       —            —
PI-RL (a = 0)         0.1818       0.2265       1133         1378
PI-RL (a = 0.3)       0.1858       0.2324       1087         1299
PI-RL (a = 0.6)       0.1914       0.2406       1034         1210
PI-RL (a = 1)         0.2010       0.2551      958.6         1086
PI-RL (a = 1.2)       0.2067       0.2638      922.6         1026
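The RMS figures above are the standard root-mean-square of the sampled signals over the simulation window. A minimal sketch, using an illustrative sinusoidal trace rather than the paper's data:

```python
import numpy as np

def rms(x):
    """Root-mean-square of a sampled signal."""
    x = np.asarray(x, dtype=float)
    return float(np.sqrt(np.mean(x ** 2)))

# Hypothetical 5 Hz acceleration trace with amplitude 0.64 m/s^2;
# for a pure sinusoid sampled over whole periods, RMS = amplitude / sqrt(2).
t = np.linspace(0.0, 1.0, 10_000, endpoint=False)
acc = 0.64 * np.sin(2 * np.pi * 5.0 * t)
print(round(rms(acc), 4))  # → 0.4525  (= 0.64 / sqrt(2))
```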
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Wang, G.; Deng, J.; Zhou, T.; Liu, S. Reinforcement Learning-Based Vibration Control for Half-Car Active Suspension Considering Unknown Dynamics and Preset Convergence Rate. Processes 2024, 12, 1591. https://doi.org/10.3390/pr12081591

