1. Introduction
With the depletion of oil resources and increasing climate variability, energy has attracted the attention of a wide range of industries worldwide. To address climate risk and promote the use of clean energy, the goals of peaking carbon emissions and achieving carbon neutrality have been widely embraced across sectors [1]. Compared to other power sources, lithium-ion batteries offer advantages such as high energy density, low self-discharge, and no memory effect [2]. Therefore, lithium-ion batteries are widely used as the main energy source in electric vehicles [3,4,5,6]. However, electric vehicles relying solely on lithium-ion batteries struggle to handle high-rate currents and fluctuating driving conditions, leading to faster battery aging and a reduced lifespan [7,8]. Ultracapacitors, on the other hand, offer strong instantaneous power output. Combining the two components into a hybrid energy storage system (HESS) and implementing an efficient energy management strategy (EMS) can therefore reduce the adverse effects of high-rate currents on lithium-ion batteries, improving battery performance and enhancing the reliability of battery pack operation [9,10,11]. The EMS is thus essential for efficient power distribution. This paper investigates the impact of two deep reinforcement learning (DRL) algorithms, the deep Q-network (DQN) and the deep deterministic policy gradient (DDPG), on energy efficiency, battery lifespan, and power distribution under identical training conditions.
EMSs fall into two primary types: rule-based and optimization-based approaches. Rule-based EMSs are widely used in electric vehicles owing to their low computational complexity and high reliability [12,13,14]. However, due to the complex dynamic characteristics of HESSs, rule-based EMSs are difficult to adjust online to real driving conditions, which degrades the control performance of the HESS; they also rely heavily on the experience of engineering designers. Optimization-based EMSs can be further divided into global optimization methods and real-time optimization approaches. Dynamic programming (DP) is a well-known global optimization strategy with excellent control performance [15]. In contrast to rule-based EMSs, optimization-based EMSs achieve lower energy consumption for the HESS. However, these methods involve significant computational costs and may suffer from the curse of dimensionality and discretization errors, making them unsuitable for practical applications; they are therefore often used as reference points to evaluate the performance of other EMSs. To realize optimal power allocation in HESSs, Ref. [16] utilized the outcomes of DP to enhance an adaptive rule-based EMS; the simulation results showed that this strategy protects the battery effectively and reduces overall vehicle energy loss under unknown driving conditions. Ref. [17] combined wavelet transforms, neural networks, and fuzzy logic: neural network models are trained offline on data sets obtained from wavelet decomposition and then used to predict the low-frequency power demand of the battery, achieving real-time, efficient power allocation for the HESS. Real-time optimization-based EMSs are generally divided into the equivalent consumption minimization strategy (ECMS) [18], model predictive control (MPC) [19], and the adaptive equivalent consumption minimization strategy (A-ECMS) [20]. In Ref. [21], a combination of MPC and Pontryagin’s minimum principle is presented to carry out energy management for the HESS. Ref. [22] proposed an online ECMS-based EMS for electric vehicles; compared to existing ECMS methods, the strategy reduces fuel consumption by 8–14%. Although these real-time optimization strategies reduce the computational complexity to a certain extent, they are prone to getting stuck in local optima during the solving process, which restricts the full potential of vehicle performance.
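To make the global optimization baseline concrete, the sketch below shows how a DP-based power split for a battery/ultracapacitor HESS can be computed by backward induction over a discretized ultracapacitor SOC grid. It is a minimal illustration only: the demand profile, loss model, and all parameter values are placeholders, not the vehicle models or driving-cycle data used in this paper.

```python
import numpy as np

# Illustrative power demand profile (kW per step) and parameters -- placeholders.
P_dem = np.array([12.0, 30.0, -8.0, 45.0, 5.0, -20.0, 25.0])
dt = 1.0                                  # time step (s)
E_uc = 100.0                              # usable ultracapacitor energy (kJ)
soc_grid = np.linspace(0.2, 0.9, 71)      # discretized ultracapacitor SOC states
u_grid = np.linspace(-30.0, 30.0, 61)     # candidate ultracapacitor power (kW)

def step_loss(p_batt, p_uc):
    """Toy quadratic resistive-loss model standing in for the Section 2 models."""
    return 0.02 * p_batt ** 2 + 0.005 * p_uc ** 2

T = len(P_dem)
V = np.zeros((T + 1, len(soc_grid)))      # cost-to-go, terminal cost = 0
policy = np.zeros((T, len(soc_grid)))     # optimal UC power per (time, state)

for t in range(T - 1, -1, -1):            # backward induction over time
    for i, soc in enumerate(soc_grid):
        best = np.inf
        for p_uc in u_grid:
            p_batt = P_dem[t] - p_uc      # battery supplies the remainder
            soc_next = soc - p_uc * dt / E_uc
            if not (soc_grid[0] <= soc_next <= soc_grid[-1]):
                continue                  # prune SOC-infeasible transitions
            cost = step_loss(p_batt, p_uc) + np.interp(soc_next, soc_grid, V[t + 1])
            if cost < best:
                best, policy[t, i] = cost, p_uc
        V[t, i] = best

print(policy[0])                          # optimal first-step UC power per state
```

Because the entire demand profile must be known in advance and the nested loops grow with the state and action grid resolutions, DP of this kind is practical only offline, which is why it serves purely as a benchmark in this paper.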
With the rapid development of internet technology and artificial intelligence (AI) algorithms, reinforcement learning (RL) algorithms have demonstrated remarkable decision-making capabilities in real engineering applications. These AI algorithms can obtain optimal EMSs for HESSs with unknown system structures and parameters. RL algorithms can generally be divided into traditional RL and deep reinforcement learning (DRL). As a typical representative of traditional RL, Q-Learning has been employed extensively across industries, and Q-Learning-based EMSs for HESSs are analyzed comprehensively in Refs. [23,24]. However, the training process of Q-Learning is highly unstable owing to the necessity of discretizing the state and action spaces. To ensure the stability of the energy allocation process, Ref. [25] proposed a two-stage EMS based on Q-Learning; compared with recent EMSs, the training time and average absolute error were reduced by 23% and 20%, respectively. To better adapt to actual driving conditions, an EMS based on the DQN algorithm for HESSs was proposed in Ref. [26], reducing battery capacity degradation by 26.36% while guaranteeing favorable fuel economy for electric vehicles. Ref. [27] employed a hierarchical deep Q-Learning (DQL-H) algorithm to determine the best solution for the EMS; this hierarchical algorithm addresses the challenge of limited feedback during training while also enhancing training efficiency and reducing fuel consumption. A remaining problem is that the DQN overestimates Q values during training. A novel Double DQN-based EMS is introduced in Ref. [28], which achieves cost savings by converting discrete state parameters into continuous ones; the simulation results showed that the policy could further decrease costs by 5.5% and reduce training time by 93.8%. Another Double DQN-based EMS framework for HESSs was constructed to address the shortcomings of traditional control strategies and RL [29], and the experimental results indicated that the proposed strategy significantly improves vehicle fuel economy. However, highly discretized state-action spaces not only inflate the dimensionality of the control algorithm but also increase convergence difficulties. To address this challenge, DRL algorithms with an Actor–Critic structure have been widely employed to handle high-dimensional continuous state-action spaces. In Ref. [30], the DDPG method was combined with transfer learning to optimize the EMS of a HESS; the simulation results illustrated superior early performance and quicker convergence, with strong robustness and adaptability. A hierarchical EMS based on DDPG is proposed in Ref. [31]: in the upper-level strategy, the DDPG algorithm employs historical operating condition information to generate the State of Charge (SOC) for future driving segments, while the lower-level strategy uses a long short-term memory (LSTM) neural network to forecast the vehicle’s short-term speed. The analysis revealed that the method improves the overall vehicle fuel economy.
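The structural difference between the two algorithm families discussed above can be summarized in a few lines. The sketch below, written in PyTorch with illustrative state and action dimensions (the actual network architectures of Section 3 may differ), shows why a DQN requires a discretized action set while DDPG’s Actor–Critic structure outputs a continuous power-split action directly.

```python
import torch
import torch.nn as nn

STATE_DIM = 4    # e.g., [battery SOC, UC SOC, demanded power, speed] -- illustrative
N_ACTIONS = 11   # DQN: discretized split ratios {0.0, 0.1, ..., 1.0} -- illustrative

# DQN maps a state to one Q-value per *discrete* action; the greedy action
# is the argmax over that fixed, finite set.
q_net = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, N_ACTIONS))

# The DDPG actor maps a state directly to a *continuous* action in [0, 1],
# read here as the fraction of demanded power assigned to the ultracapacitor.
actor = nn.Sequential(nn.Linear(STATE_DIM, 64), nn.ReLU(), nn.Linear(64, 1), nn.Sigmoid())

# The DDPG critic scores a (state, action) pair, so the actor can be updated
# by following the critic's gradient with respect to the action.
critic = nn.Sequential(nn.Linear(STATE_DIM + 1, 64), nn.ReLU(), nn.Linear(64, 1))

state = torch.rand(1, STATE_DIM)
dqn_action = q_net(state).argmax(dim=1)                    # index into the discrete set
ddpg_action = actor(state)                                 # continuous split ratio
q_value = critic(torch.cat([state, ddpg_action], dim=1))   # Q(s, a)
```

Because the actor outputs a continuous split ratio, DDPG avoids the action-space discretization that inflates the DQN’s output dimensionality and hampers convergence.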
A comprehensive literature review indicates that DRL-based EMSs have gained significant attention in recent research. However, notable challenges still impede further advances in this domain. It is widely accepted that advanced DRL algorithms can improve the performance of EMSs; nevertheless, the lack of standardized benchmarks for comparing EMSs based on different DRL algorithms is a major obstacle. Many studies only validate that DRL algorithms with specific parameter settings outperform traditional RL algorithms under particular driving conditions. This limitation complicates the assessment and comparison of DRL-based EMSs and hinders the development of the research field. Future research should establish standardized benchmarks to facilitate the comparison and evaluation of diverse EMSs, laying the groundwork for further progress in DRL-based energy management.
Therefore, to meet these challenges, the total energy loss is taken as the optimization control target for the HESS. The differences in EMS performance under different DRL methods are explained by analyzing two DRL algorithms and their principal frameworks. The key contributions of this paper are outlined below: (1) Two DRL algorithms and their schematics are presented, and systematic comparative experiments on EMSs for HESSs are conducted. (2) By comparing different DRL-based EMSs for electric vehicles under the same benchmark, this paper highlights future directions for improving DRL-based EMSs.
The structure of this paper is as follows. Section 2 establishes the power system models of the HESS. Section 3 presents two DRL-based EMSs. Simulation experiment results of the different strategies are analyzed and discussed in Section 4. Section 5 summarizes the conclusions of this research.
4. Simulation Results and Discussion
To visually compare the impact of different DRL-based EMSs in electric vehicles, the rule-based, DP-based, DQN-based, and DDPG-based EMSs are selected for comparison. The DP-based strategy, with its outstanding global optimization characteristics, is used to evaluate the performance of the other EMSs. The comparison results for the SOC, current, and power under the different EMSs are shown in Figure 9. In addition, to provide a better comparison of the effects of the different EMSs, Table 7 summarizes some key features of the HESS over the whole control process.
Figure 9a,b show the change curves of the battery SOC and ultracapacitor SOC, respectively. Because of the low capacity of the ultracapacitor, it is mainly used to supply peak power demand; the economy of each EMS is therefore determined by the terminal SOC of the battery. As shown in Table 7, the terminal SOC values for the DP-based, DDPG-based, DQN-based, and rule-based EMSs are 0.3465, 0.3452, 0.3124, and 0.3146, respectively, over four UDDS driving cycles. These terminal SOC values are averaged across multiple driving cycles to ensure robustness. Compared with the terminal SOC of the DP-based EMS, the differences for the DDPG-based, DQN-based, and rule-based EMSs are 0.0013, 0.0341, and 0.0319, respectively. This indicates that, under the same state-action space, reward function, and training hyperparameters, the gap between the DDPG-based EMS and the DP-based EMS is reduced to 0.37%, and the economy is improved by 10.49% compared to the DQN-based EMS.
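For clarity, the two percentages follow directly from the Table 7 terminal SOC values, assuming the gap is expressed relative to the DP-based terminal SOC and the improvement relative to the DQN-based terminal SOC:

\[
\frac{0.3465 - 0.3452}{0.3465} \approx 0.37\%, \qquad \frac{0.3452 - 0.3124}{0.3124} \approx 10.5\%.
\]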
Additionally, Figure 9c,d show that the differences among the EMSs lead to significant disparities in the energy allocation of the electric vehicle. The DRL-controlled EMSs effectively limit the maximum current over the driving cycle by exploiting the ultracapacitor’s strength in high-power applications, and the terminal SOC under the DQN-based EMS reflects improved utilization of the ultracapacitor. Table 7 lists the maximum charging currents of the battery and ultracapacitor under the various EMSs; the DDPG-based EMS effectively reduces the impact of peak currents on the battery and extends the battery’s lifespan.
In addition, the variation curves of the battery power and ultracapacitor power are shown in Figure 9e,f. The majority of the regenerative energy is absorbed by the ultracapacitor, and the DDPG-based EMS further reduces the fluctuation amplitude of the battery output power. It can therefore effectively maintain the stability of the lithium-ion battery’s output and decrease the driving costs of electric vehicles. Moreover, Figure 10 shows the energy losses under the different EMSs, with the specific values listed in Table 8. Under the same benchmark, the DDPG-based EMS narrows the total energy loss gap with the DP-based EMS to 0.7%, and compared to the DQN-based EMS it reduces energy losses by 40.4%, indicating the higher economic efficiency of the DDPG-based EMS.
5. Conclusions
In this paper, two DRL-based EMSs are designed for electric vehicles. To investigate the impact of different DRL algorithms on EMSs for HESSs, the same benchmark is used to compare and analyze the performance of each EMS. The simulation results demonstrate that the DDPG-based EMS allocates the output power of the various components in the HESS more effectively. Compared with the rule-based EMS, the DQN-based and DDPG-based EMSs improve the economic efficiency by 28.3% and 33.6%, respectively. Furthermore, the energy loss gap between the DDPG-based EMS and the DP-based EMS is reduced to 0.7%. The DQN-based EMS maximizes ultracapacitor efficiency in recovering regenerative energy under varying driving conditions, while the DDPG-based EMS restrains the peak current of the lithium-ion battery, demonstrating its adaptability.
In future research, additional enhancements will be made to overcome some limitations of the current work. For example, the agents in DRL algorithms are highly sensitive to hyperparameter settings, which can reduce the efficiency of data interaction. Additionally, the influence of temperature and aging on the EMS is not considered in the proposed method. Therefore, in future work, factors such as aging status, temperature status, and traffic conditions will be incorporated into the DRL-based EMSs to improve the management performance of the HESS.