1. Introduction
As environmental pollution and energy shortages become increasingly serious, energy conservation and environmental protection have received growing attention [1,2,3]. The continuous development of new energy vehicle technologies has not only alleviated the environmental problems caused by the automobile industry but has also come to be regarded as a promising field [4,5,6]. Among these technologies, hydrogen fuel cell vehicles offer advantages such as zero emissions, long battery life, and high durability [7], and breakthroughs in related technologies have attracted extensive attention from scholars. A good energy management strategy (EMS) can further extend the cruising range of hydrogen fuel cell vehicles and effectively reduce hydrogen consumption [8]. At the same time, the development of vehicle-to-everything technology has greatly improved the ability of vehicles to obtain environmental information [9] and is an important technical means for fuel cell vehicles to achieve better autonomous driving and energy management [10]. Research on the energy management of hydrogen fuel cell vehicles that incorporates environmental information is also an important part of promoting and applying hydrogen fuel cell vehicles [11].
The EMS is a key technology for fuel cell buses. Its role is to distribute power rationally between the fuel cell and the power battery according to the current vehicle state, improving driving efficiency and energy economy [12,13]. Scholars have conducted extensive research on EMS for fuel cell buses, with methods including rule-based EMS [14], optimization-based EMS [15], and learning-based EMS [16]. Rule-based EMS appeared mostly in early research; while its computational cost is low and it is easy to implement, it relies on expert experience and cannot respond reasonably and effectively to complex working conditions [17]. Optimization-based EMS, a popular research direction, includes dynamic programming (DP) [18,19,20], Pontryagin’s minimum principle (PMP) [21,22], model predictive control (MPC) [23], etc.
Optimization-based EMS optimizes the control law to reduce the energy consumption required for vehicle driving. Fu et al. [24], aiming to address the impact of fuel cell power fluctuations on fuel cell life, proposed an EMS based on a fuzzy control method. The EMS limits the power fluctuation of the fuel cell to within 300 W/s and prolongs the service life of the fuel cell. It was verified under the highway fuel economy test (HFET), urban dynamometer driving schedule (UDDS), and new European driving cycle (NEDC) operating conditions. Saman Ahmadi et al. [25], aiming to address the high hydrogen consumption of fuel cells and the poor state-of-charge (SOC) retention of power batteries, proposed an EMS based on fuzzy logic control, which improves the equivalent fuel economy and power performance of the vehicle while reducing SOC fluctuations of the power battery. Huang et al. [26], aiming to address the balance between fuel cell economy and durability, proposed an EMS based on the PMP. Wang et al. [27], aiming to improve fuel cell economy, proposed a fuzzy control optimization strategy based on driving cycle recognition. Compared with the traditional fuzzy control strategy, fuel cell hydrogen consumption was reduced by 40.5%; compared with the genetic algorithm fuzzy control strategy, it was reduced by 16.55%. The optimized EMS was also verified under the Economic Commission for Europe (ECE) cycle and other working conditions.
With the development of machine learning, learning-based EMS has gradually become a popular direction for fuel cell bus energy management research, with methods including support vector machines, neural networks, Markov chains, genetic algorithms, etc. Min et al. [28] proposed a neural network optimized by a genetic algorithm as an EMS. Targeting the shortened fuel cell life caused by frequent start–stop events and load changes, the strategy uses the optimization ability of the algorithm to avoid unnecessary start–stop events and load changes, which reduces energy consumption and extends fuel cell life. Wu et al. [29] proposed a fuel cell EMS based on deep reinforcement learning (DRL) with continuous state parameters, which improves the cost-effectiveness of reinforcement-learning-based energy management. By studying the mean square error and Huber loss functions, highly random load distributions are handled effectively. To improve the economy of fuel cell vehicles and prolong the service life of fuel cells, Tang et al. [30] designed a deep Q network (DQN) with prioritized experience replay, building on the traditional DQN. Based on this, an EMS was proposed that optimizes fuel economy and battery durability by adjusting the corresponding weights of the objective function in the fuel cell system. Huang et al. [31], aiming to improve the power performance and cruising range of fuel cell vehicles, designed a double-layer deep deterministic policy gradient (DDPG) algorithm and proposed a dual-mode operation scheme for extended-range fuel cell hybrid electric vehicles to achieve optimal power distribution in both modes. The EMS based on this double-layer algorithm improves the power performance and economy of the vehicle. Huo et al. [32], aiming to address the economy and durability of fuel cell hybrid electric vehicles, proposed an EMS based on deep Q-learning (DQL) with prioritized experience replay and DDPG. Their strategy incorporates fuel economy and power fluctuation into a multi-objective reward function, which reduces fuel consumption and extends fuel cell life.
The above studies have greatly promoted the development of fuel cell bus energy management technology. Optimization-based and learning-based EMSs perform better in terms of optimization effect and accuracy and can achieve globally or locally optimal results [33]. However, problems remain: most studies focus on the fuel cell vehicle itself and train or optimize on standard working conditions, so uncertain working conditions are not fully considered. The expressway, an important part of urban traffic, has a more complex traffic environment because of its large traffic flow and high average speed. A fuel cell bus driving on an expressway must first consider the traffic environment and control its own speed, and then perform energy management at that speed, so as to achieve safe and stable driving and reduce equivalent hydrogen consumption. In this situation, an EMS trained under the existing working conditions may not achieve good results. Controlling the speed of the fuel cell bus in real time in response to changing traffic information, and optimizing energy management under these conditions, offers better generalization and greater practical significance [34].
At present, advanced driver assistance systems (ADAS) for vehicles have become relatively mature, and adaptive cruise control (ACC) is widely used in longitudinal speed control [35]. The self-learning ability of DRL enables more efficient, safe, and stable driving control in complex traffic scenarios [36]. Wang et al. [37] used deep Q-learning to control a vehicle so that it could automatically merge from a ramp onto a main road while ensuring safe driving. Zhu et al. [38] used a deterministic policy gradient (DPG) method to learn from human driving data and proposed a car-following controller. These applications of DRL have achieved good results in car-following control; however, in the speed control of fuel cell buses, driving economy is also important and cannot be ignored. At present, scholars have promoted the theory of ecological driving [39], which pays more attention to the interaction between the vehicle and the traffic environment in order to achieve safe driving while also saving energy and protecting the environment [40].
Combined with vehicle-to-everything technology, vehicles can easily obtain information on the surrounding traffic environment, and machine learning methods can handle vehicle speed control and energy management simultaneously. This paper proposes a deep reinforcement learning-based energy management study for hydrogen fuel cell buses that considers speed control in urban expressway scenarios. A two-lane urban expressway is built using the SUMO simulation software. In this traffic environment, the soft actor–critic (SAC) algorithm controls the speed of the hydrogen fuel cell bus so that it runs safely and stably while, at the same time, the fuel cell output power is allocated reasonably. This reduces hydrogen consumption, improves energy economy, and reduces the fluctuation of the fuel cell output power to improve the durability of the fuel cell.
The main contributions of this paper are:
Considering the impact of the traffic environment on vehicle driving and on the energy management of hydrogen fuel cell buses, a more complex traffic environment is constructed to improve the generalization of the EMS;
The SAC algorithm, a DRL algorithm, is used to control the vehicle’s speed and the fuel cell output power, which improves driving efficiency, safety, and energy economy and extends fuel cell life by reducing fluctuations in the fuel cell output power;
In the action space of the SAC algorithm, action selection is biased towards a chosen interval, which accelerates the convergence of DRL training and improves the performance of the EMS.
The remainder of this paper is organized as follows. Section 2 introduces the hydrogen fuel cell bus model, speed control model and traffic environment model; Section 3 introduces the speed control and fuel cell EMS based on DRL; Section 4 analyzes and compares the results; finally, Section 5 gives conclusions.
2. System Model
This paper studies energy management and energy-saving optimization for hydrogen fuel cell buses driving on urban expressways, subject to safe and stable driving. To this end, a model of the hydrogen fuel cell bus and its power system, a speed control model, and a traffic environment model are established.
2.1. Fuel Cell Bus and Power System Model
The basic parameters of the fuel cell hybrid electric bus used in this paper are shown in Table 1. The fuel cell system consists mainly of a proton exchange membrane fuel cell and its accessories, and the vehicle is powered mainly by the fuel cell and the power battery working together. The structure of the powertrain is shown in Figure 1. In this hydrogen fuel cell bus, the fuel cell system boosts its output voltage through a DC/DC converter and is then connected in parallel with the power battery pack. Both the fuel cell system and the power battery system can transfer energy directly to the drive motor, and the two can also output energy at the same time to drive the motor together when a large output power is required.
The characteristics of the power system components of the fuel cell bus are shown in Figure 2. Figure 2a shows the relationship between hydrogen consumption, efficiency, and fuel cell power, obtained by interpolating the specific parameters of the vehicle. Figure 2b shows the internal resistance characteristics of the power battery. Finally, Figure 2c shows the relationship between the torque, speed, and efficiency of the drive motor.
In this fuel cell stack, the relationship between the output voltages of the cells is shown in Formulas (1) and (2), and the total voltage of the stack is shown in Formula (3).
In the formulae, the cell voltage, activation loss voltage, ohmic loss voltage, concentration loss voltage, and proton exchange membrane fuel cell stack voltage are expressed as , respectively. Additionally, activation loss resistance, ohmic loss resistance, and concentration loss resistance are expressed as , respectively. The open-circuit voltage source is expressed as , while the output current is expressed as and the number of cells in the stack is expressed as .
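Formulas (1)–(3) are not reproduced in this text. As a hedged sketch only, a common equivalent-circuit form that is consistent with the quantities listed above is given below. The symbols $V_{\mathrm{cell}}$, $V_{\mathrm{act}}$, $V_{\mathrm{ohm}}$, $V_{\mathrm{conc}}$, $V_{\mathrm{stack}}$, $R_{\mathrm{act}}$, $R_{\mathrm{ohm}}$, $R_{\mathrm{conc}}$, $E_{\mathrm{oc}}$, $I_{\mathrm{fc}}$ and $N$ are assumed names, not necessarily the paper's notation, and the grouping into three relations is an assumption.

```latex
\begin{align}
V_{\mathrm{cell}}  &= E_{\mathrm{oc}} - V_{\mathrm{act}} - V_{\mathrm{ohm}} - V_{\mathrm{conc}} \\
V_{\mathrm{act}}   &= I_{\mathrm{fc}} R_{\mathrm{act}}, \quad
V_{\mathrm{ohm}}    = I_{\mathrm{fc}} R_{\mathrm{ohm}}, \quad
V_{\mathrm{conc}}   = I_{\mathrm{fc}} R_{\mathrm{conc}} \\
V_{\mathrm{stack}} &= N \, V_{\mathrm{cell}}
\end{align}
```

In this form each loss is modelled as a voltage drop across an equivalent resistance, and the stack voltage is the cell voltage multiplied by the number of cells connected in series.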
The power battery system in this paper can be expressed as shown in Formulas (4)–(7).
In the formulae, the output voltage of the power battery and the voltage drop generated by the polarization phenomenon are expressed as , respectively. The ideal voltage source is expressed as , the power battery current is expressed as , and ohmic resistance and polarization resistance are expressed as , respectively. The polarization capacitance is expressed as , the load power of the power battery is expressed as , the initial charge and the remaining charge of the power battery are expressed as , respectively, and indicates the initial capacity of the battery.
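The quantities listed above (ideal voltage source, ohmic resistance, polarization resistance and capacitance, load power, and charge) match a first-order RC equivalent-circuit battery model. The sketch below shows how one time step of such a model could be computed. It is an illustration under stated assumptions: the function and field names and all default parameter values are hypothetical, and the equations are the textbook form of this model rather than a copy of Formulas (4)–(7).

```python
import math
from dataclasses import dataclass


@dataclass
class BatteryState:
    soc: float  # remaining charge as a fraction of capacity (0..1)
    u_p: float  # polarization voltage across the RC branch [V]


def battery_step(state, p_load, dt, e_oc=600.0, r_o=0.10,
                 r_p=0.05, c_p=2000.0, capacity_ah=100.0):
    """Advance a first-order RC battery model by one step of dt seconds.

    p_load > 0 discharges the battery, p_load < 0 charges it [W].
    All default parameter values are hypothetical placeholders.
    Returns (new_state, current [A], terminal_voltage [V]).
    """
    # Terminal voltage U = E - U_p - I*R_o and P = U*I give a quadratic in I:
    #   R_o*I^2 - (E - U_p)*I + P = 0.
    # Take the root that tends to zero as the load power tends to zero.
    v = e_oc - state.u_p
    disc = v * v - 4.0 * r_o * p_load
    if disc < 0.0:
        raise ValueError("requested power exceeds what the battery can deliver")
    current = (v - math.sqrt(disc)) / (2.0 * r_o)
    u_term = v - current * r_o

    # Polarization branch: dU_p/dt = I/C_p - U_p/(R_p*C_p), explicit Euler step.
    du_p = current / c_p - state.u_p / (r_p * c_p)

    # Coulomb counting: the charge drawn over the step lowers the SOC.
    new_soc = state.soc - current * dt / (capacity_ah * 3600.0)
    return BatteryState(new_soc, state.u_p + du_p * dt), current, u_term
```

For example, `battery_step(BatteryState(0.6, 0.0), 60000.0, 1.0)` returns a slightly lower SOC, a positive current, and a terminal voltage whose product with the current equals the requested 60 kW.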
The driving motor relationship is shown in Formula (8).
In the formula, the power of the drive motor is expressed as , the torque and speed of the drive motor are expressed as , respectively, and the efficiency of the drive motor is expressed as .
In addition, the fuel cell bus powertrain is subject to physical constraints, as shown in Formula (9).
where
and
are engine generator speed and torque, respectively.
2.2. Velocity Control Model
As shown in
Figure 3, for this study, the fuel cell bus (target vehicle) is defined as driving on an urban expressway. Under the relevant traffic laws and regulations, the bus may only drive in the far-right lane, whereas in real traffic other vehicles may have different driving styles and routes and may drive in any lane as long as they do so safely and efficiently. The target vehicle therefore inevitably faces changes of the leading vehicle. In this scenario, a conventional car-following model may pose safety risks. For example, a vehicle in the adjacent lane may cut in ahead of the target vehicle, which suddenly reduces the following distance. Therefore, to ensure that the vehicle can brake safely in an emergency, this paper uses maximum and minimum following distances to ensure safe driving. The minimum following distance is defined in Formula (10).
where
is the speed of the hydrogen fuel cell bus.
In order to ensure the normal driving of the target vehicle, its speed must also be kept within a relatively normal range to reduce its impact on the way in which the environmental vehicle is driven. However, it is expensive to obtain the speed information of other vehicles in the traffic environment, so the maximum following distance is defined to ensure that the speed of the target vehicle is not too different from the speed of the surrounding vehicles. The maximum following distance is as follows (11):
With the limitation of the maximum–minimum following distance, the longitudinal driving of the vehicle can be reasonably controlled to ensure the safe and stable driving of the vehicle under the condition that it matches the target vehicle’s own speed.
2.3. Traffic Environment Model
This paper mainly studies the EMS of fuel cell buses based on longitudinal speed control on urban expressways, so a two-lane expressway is established, as shown in Figure 4. The target vehicle is arranged to drive in the far-right lane without considering lane changes, while other vehicles in the environment are not restricted in this way. The target vehicle therefore only needs to control its own speed to maintain the distance from the vehicle in front and to ensure safe and efficient driving. Additionally, after the leading vehicle changes (the vehicle in front moves to the left lane, or a vehicle from the adjacent lane cuts in), the target vehicle must first adjust its own speed appropriately to continue driving safely. The traffic flow settings, which are based on statistics, are shown in
Table 2.
3. Fuel Cell Bus EMS and Speed Control Based on DRL
3.1. Problem Description
The main research purpose of this paper is to ensure the safe and stable driving of the fuel cell bus by controlling its longitudinal driving. On this basis, we are able to optimize the fuel cell energy management system to reduce hydrogen consumption and minimize the large fluctuation of the fuel cell output power to prolong its life. The research purpose can be summarized as Formula (12):
where
represent user-defined weight parameters for different targets,
represents the equivalent cost of hydrogen consumption of the engine,
represents the difference between the output power fluctuations of the fuel cell at adjacent times,
represents the safety cost of the fuel cell bus and
represents the comfort cost of car following.
is determined by
Figure 2a, since the fuel cell output power directly determines the hydrogen consumption. The energy drawn from the power battery is converted into an equivalent hydrogen consumption and added to the direct hydrogen consumption to give the final equivalent hydrogen consumption; the smaller this value, the better the cost.
is determined by the output power difference of adjacent time steps, and its cost function is defined as Formula (13):
where
, and
are the fuel cell output power at time
and time
, respectively.
In the speed control model, a collision of the fuel cell bus, or a following distance outside the range set by the minimum and maximum following distances, leads to an increase in the safety cost, so its cost function is defined as Equation (14):
where
is the following distance of the fuel cell bus at the current moment.
When the following distance of the vehicle is less than the minimum following distance, the current vehicle speed is used as the safety cost, i.e., the greater the speed at this time, the greater the cost. When the following distance is greater than the maximum following distance, a speed-based cost does not adequately reflect safety, so the difference between the following distance and the maximum following distance is used as the safety cost instead.
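The piecewise rule described above can be sketched as a small function. This is a minimal illustration, not the paper's Equation (14): the function and argument names are hypothetical, the minimum and maximum following distances are passed in rather than computed, the zero cost inside the allowed range is an assumption, and any extra penalty for a collision is omitted.

```python
def safety_cost(gap, speed, d_min, d_max):
    """Safety cost of one time step from the following distance.

    gap:   current following distance to the leading vehicle [m]
    speed: current speed of the target vehicle [m/s]
    d_min, d_max: minimum and maximum following distances [m]
    """
    if gap < d_min:
        # Too close: the faster the bus is going, the larger the cost.
        return speed
    if gap > d_max:
        # Too far: cost grows with the excess over the maximum distance.
        return gap - d_max
    # Assumption: no safety cost while the gap stays inside the allowed range.
    return 0.0
```

With `d_min=10` and `d_max=50`, a gap of 5 m at 20 m/s costs 20, a gap of 60 m costs 10, and a gap of 30 m costs nothing.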
During the driving process of the fuel cell bus, the real-time change of the vehicle acceleration not only affects vehicle comfort, but also has a certain impact on the fuel cell hydrogen consumption level. Thus, the acceleration of the vehicle and the jerk value are closely related to the comfortable driving of the vehicle and the reduction of hydrogen consumption. The larger the jerk value, the lower the comfort and the higher the hydrogen consumption. Therefore, its cost function is set as Formula (15):
where
is the acceleration of the vehicle, and the acceleration value of the hydrogen fuel cell bus involved in this paper is
.
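To make the cost terms of this section concrete, the sketch below computes a power fluctuation cost from adjacent time steps, a jerk-based comfort cost, and their weighted combination with the hydrogen and safety costs. It is a hedged illustration: the absolute-value forms, the function names, and the unit default weights are assumptions, since Formulas (12), (13) and (15) are not reproduced in this text.

```python
def power_fluctuation_cost(p_fc_now, p_fc_prev):
    """Change in fuel cell output power between adjacent time steps [W]."""
    return abs(p_fc_now - p_fc_prev)


def comfort_cost(a_now, a_prev, dt):
    """Magnitude of the jerk, i.e. the rate of change of acceleration [m/s^3]."""
    return abs((a_now - a_prev) / dt)


def step_cost(h2_equiv, fluctuation, safety, comfort,
              weights=(1.0, 1.0, 1.0, 1.0)):
    """Weighted sum of the four per-step costs.

    The weights stand for the user-defined weight parameters of the
    objective; the default values here are placeholders only.
    """
    w_h2, w_fluct, w_safe, w_comf = weights
    return (w_h2 * h2_equiv + w_fluct * fluctuation
            + w_safe * safety + w_comf * comfort)
```

Because the agent maximizes reward, one natural choice (again an assumption) is to use the negative of `step_cost` as the per-step reward.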
3.2. Deep Reinforcement Learning
In DRL, the agent interacts with the environment by taking exploratory actions and learns from the rewards (including negative rewards, or punishments) obtained from the interaction, and thus becomes better adapted to the environment. This paper uses the soft actor–critic (SAC) algorithm to solve the problem proposed above.
Like the DDPG algorithm, which is widely used in deep reinforcement learning, the SAC algorithm includes actor networks and critic networks. Unlike DDPG, however, SAC integrates maximum entropy learning into the actor–critic framework: while the reward obtained by the agent in the interaction is maximized, the entropy of its policy is also maximized, which gives the algorithm better exploration ability and, in turn, better performance.
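For reference, the maximum entropy objective that SAC optimizes is usually written as follows. The notation is the standard one from the SAC literature rather than this paper's: $\pi$ is the policy, $\rho_\pi$ its state-action distribution, $r$ the reward, $\mathcal{H}$ the entropy, and $\alpha$ the temperature that weights entropy against reward.

```latex
\begin{align}
J(\pi) &= \sum_{t} \mathbb{E}_{(s_t, a_t) \sim \rho_\pi}
          \big[\, r(s_t, a_t) + \alpha \, \mathcal{H}\big(\pi(\cdot \mid s_t)\big) \big] \\
\mathcal{H}\big(\pi(\cdot \mid s_t)\big) &= -\,\mathbb{E}_{a \sim \pi(\cdot \mid s_t)}
          \big[ \log \pi(a \mid s_t) \big]
\end{align}
```

Setting $\alpha = 0$ recovers the ordinary expected-return objective, so the entropy term is what drives the additional exploration.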
3.3. Speed Control and EMS of Fuel Cell Bus Based on SAC
Based on the problem of speed control and energy management of hydrogen fuel cell buses proposed above, a speed planning and fuel cell bus energy management model based on SAC algorithm is designed. The agent controls the longitudinal speed of the vehicle based on the environmental information provided by the state space to ensure the safe and comfortable driving of the vehicle; at the same time, the optimization of the energy management system is completed through the speed information to reduce the equivalent hydrogen consumption and power fluctuation of the fuel cell bus.
In the SAC algorithm, the objective is to maximize the cumulative reward, and the reward function
is:
The selection of the state space
must give the agent a complete description of the driving situation. In this study, the state space includes the speed
and acceleration
of the hydrogen fuel cell bus, the speed of the front vehicle of the hydrogen fuel cell bus
, the acceleration
and its distance from the target vehicle
, the hydrogen fuel cell bus equivalent hydrogen consumption
, the hydrogen fuel cell output power
, and power battery remaining charge
. Expressed as:
In the speed control of hydrogen fuel cell buses, lane changes need not be considered, so only the longitudinal acceleration is defined in the speed control action space. In the energy management system, the required output power of the power system is directly related to the torque and speed of the drive motor, so once the speed and torque are known, the required power can be determined; in energy management, the action is therefore defined as the output power of the fuel cell. That is, the action space
is:
The SAC algorithm hyperparameter settings are shown in
Table 3.
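The state and action definitions above can be sketched in code as follows. This is an illustrative layout under stated assumptions: the field names, the `[-1, 1]` range of the raw policy output, and the numeric acceleration and power ranges are hypothetical placeholders, not values taken from the paper.

```python
from typing import NamedTuple


class Observation(NamedTuple):
    """The eight state quantities listed for the agent."""
    ego_speed: float   # speed of the fuel cell bus [m/s]
    ego_accel: float   # acceleration of the fuel cell bus [m/s^2]
    lead_speed: float  # speed of the vehicle in front [m/s]
    lead_accel: float  # acceleration of the vehicle in front [m/s^2]
    gap: float         # distance to the vehicle in front [m]
    h2_equiv: float    # equivalent hydrogen consumption
    p_fc: float        # fuel cell output power [W]
    soc: float         # remaining charge of the power battery (0..1)


def scale_action(raw, low, high):
    """Map a raw policy output in [-1, 1] linearly onto [low, high]."""
    raw = max(-1.0, min(1.0, raw))
    return low + 0.5 * (raw + 1.0) * (high - low)


def decode_action(raw_action, accel_range=(-2.0, 2.0),
                  p_fc_range=(0.0, 60000.0)):
    """Turn the two raw outputs into (acceleration, fuel cell power).

    Both ranges are hypothetical and would come from the vehicle limits.
    """
    raw_accel, raw_power = raw_action
    return (scale_action(raw_accel, *accel_range),
            scale_action(raw_power, *p_fc_range))
```

Keeping the policy output in a normalized range and scaling it afterwards is a common design choice, because it lets the same network handle quantities with very different magnitudes, such as acceleration and power.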
In particular, and as shown in
Figure 2a, the efficiency of the fuel cell depends directly on its output power, and higher efficiency is achieved when the fuel cell output power lies in the range
. This means that, within this interval, the fuel cell can deliver a relatively large output power at a low hydrogen consumption, which is critical for reducing the equivalent hydrogen consumption. Therefore, this paper optimizes action selection in the SAC algorithm (the optimized SAC algorithm is called SAC–OPT): the expected value of the action selection function for the fuel cell output power is biased towards the high-efficiency working interval of the fuel cell. Without affecting the stochastic policy of the SAC algorithm, this change can effectively reduce hydrogen consumption and, to a certain extent, reduce the fluctuation of the fuel cell output power, which in turn extends the life of the hydrogen fuel cell.
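The text does not give the exact form of the biased action selection, so the following is only one possible reading, shown to make the idea concrete. It assumes the policy emits a raw value in `[-1, 1]` and maps it piecewise linearly so that a raw value of zero lands at the centre of the high-efficiency interval instead of the centre of the full power range. The function name, the interval bounds, and the power limits are all hypothetical.

```python
def biased_power_action(raw, p_min, p_max, eff_low, eff_high):
    """Map a raw action in [-1, 1] to fuel cell power, centred on the
    high-efficiency interval [eff_low, eff_high].

    raw = 0 maps to the interval centre, raw = -1 to p_min and raw = +1
    to p_max, so the full power range stays reachable.
    """
    raw = max(-1.0, min(1.0, raw))
    center = 0.5 * (eff_low + eff_high)
    if raw >= 0.0:
        # Upper half: stretch from the interval centre up to p_max.
        return center + raw * (p_max - center)
    # Lower half: stretch from the interval centre down to p_min.
    return center + raw * (center - p_min)
```

Under this mapping the policy remains stochastic, but samples drawn around zero translate into powers near the efficient region, which is one way the expected action could be shifted as described.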
After determining the key elements of the SAC algorithm, the complete framework of the research content of this paper is proposed as shown in
Figure 5.
3.4. Simulation Scene Construction
This paper uses Simulation of Urban Mobility (SUMO) as the simulation platform to construct a car-following scene. In SUMO, the Traffic Control Interface (TraCI) can be used to connect with Python and easily obtain all necessary information in the traffic environment, such as the speed and acceleration of the target vehicle and information on surrounding vehicles. This information allows the DRL algorithm to control the target vehicle more effectively. At the same time, the SUMO simulation software includes a variety of complete car-following models and lane-changing models, which can simulate the normal driving of vehicles in the traffic environment more realistically and improve the simulation effect.
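As a hedged illustration of how such information could be read through TraCI, the sketch below collects the car-following quantities for one vehicle. The calls `vehicle.getSpeed`, `vehicle.getAcceleration` and `vehicle.getLeader` are part of the TraCI Python API; the function name, the vehicle ids, the look-ahead distance, and the small stand-in object (used here so the sketch runs without SUMO installed) are hypothetical.

```python
from types import SimpleNamespace


def read_following_state(traci, ego_id, lookahead=200.0):
    """Collect speed, acceleration, gap and leader speed for one vehicle.

    `traci` is the TraCI module (or any object exposing the same calls).
    """
    speed = traci.vehicle.getSpeed(ego_id)         # m/s
    accel = traci.vehicle.getAcceleration(ego_id)  # m/s^2
    leader = traci.vehicle.getLeader(ego_id, lookahead)
    # getLeader yields (leader_id, gap). A missing leader is reported as
    # None or, in some setups, as an empty id, so both are treated alike.
    if not leader or not leader[0]:
        return {"speed": speed, "accel": accel,
                "gap": None, "lead_speed": None}
    lead_id, gap = leader
    return {"speed": speed, "accel": accel, "gap": gap,
            "lead_speed": traci.vehicle.getSpeed(lead_id)}


# Offline stand-in for the traci module, with made-up values.
_speeds = {"bus": 20.0, "car1": 18.0}
stub = SimpleNamespace(vehicle=SimpleNamespace(
    getSpeed=lambda vid: _speeds[vid],
    getAcceleration=lambda vid: 0.5,
    getLeader=lambda vid, dist: ("car1", 35.0),
))
demo = read_following_state(stub, "bus")
```

In a live run, the real `traci` module would be passed in after the simulation has been started with `traci.start(...)`, with `traci.simulationStep()` advancing the simulation between reads.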
Through the traffic environment model established above, a simulation environment is built in the SUMO simulation software, and the specific road and vehicle parameters are shown in
Table 4. In the simulation environment established in this paper, the detailed road design is shown in
Figure 6.
To make the simulation more realistic, the traffic flow must already be in a normal state when the target vehicle enters the road, so in each episode the target vehicle does not enter the road until the 150th step. This ensures that there are enough vehicles ahead of the target vehicle to affect its driving. At the same time, to ensure that the simulation program does not fail when the target vehicle leaves the road, only the first 12 km of road data are counted in the data analysis.
5. Conclusions
To address the energy-saving driving of hydrogen fuel cell buses in complex traffic environments, this paper simultaneously optimizes the speed control and energy management of hydrogen fuel cell buses using deep reinforcement learning. An urban expressway with an effective distance of 12 km is built in SUMO as a complex traffic environment, in which the longitudinal speed of the fuel cell bus is controlled to constrain its following distance and to ensure safe and efficient driving; at the same time, the real-time vehicle speed is used as the working condition data. The fuel cell energy management system is optimized, and the expected value of the action selection function is adjusted in the algorithm’s action space, which effectively reduces the equivalent hydrogen consumption of the hydrogen fuel cell bus, reduces the fluctuation of the fuel cell output power, and improves the fuel cell’s durability.
In terms of speed control, compared with the SUMO–IDM car-following model, the average speed of vehicles is kept the same, and the average acceleration and acceleration change values decrease by 10.22% and 11.57%, respectively. Compared with DDPG, the average speed increases by 1.18%, and the average acceleration and acceleration change values decrease by 4.82% and 5.31%, respectively. In terms of energy management, the hydrogen consumption of the SAC–OPT-based energy management strategy reaches 95.52% of that of the DP algorithm, and the fluctuation range is reduced by 32.65%. Compared with the SAC strategy, the fluctuation range is reduced by 15.29%. The durability of the fuel cell is thus effectively improved.
Judging from current research, DRL has achieved promising results in simulation scenarios, including autonomous driving and the energy management of hydrogen fuel cell vehicles, and has produced increasingly mature and accessible autonomous driving simulation systems and energy management strategies. However, for real vehicle tests and practical applications, capturing the many environmental factors and developing and deploying on-board hardware remain key issues to be solved. This is also a limitation of this paper: effective experiments with real vehicles could not be carried out. Nevertheless, with the continuous updating of DRL, the upgrading of on-board sensors, and the development of on-board hardware, DRL will become ever more mature in the autonomous driving and energy management of hydrogen fuel cell vehicles and may even become the primary technology for high-level unmanned driving.