1. Introduction
Hybrid electric vehicle (HEV) and battery electric vehicle (BEV) architectures have become increasingly popular in recent years, driven by government incentives for green energy, global emission regulations, and the high demand for improved fuel efficiency. HEVs also serve as a stepping stone toward wider adoption of electric vehicles in daily life [
1,
2].
HEVs come in different topologies, such as series, parallel, and power split; additionally, they can be classified according to the number of electric motors and their position in the architecture [
3,
4,
5].
The P2 HEV powertrain has six main components: the engine, clutch (separation clutch), electric motor (EM), high-voltage battery, transmission, and differential. The mechanical and electrical connections between these components are shown in
Figure 1. The engine and EM are the two main traction sources of the vehicle and are placed on the same shaft. The energy transfer between the battery and the EM is bidirectional: the battery supplies power to the EM during traction, and the EM charges the battery during regenerative braking. The separation clutch decouples the engine from the EM, which is the most characteristic feature of this hybrid architecture. The transmission includes a gearbox that sets the gear ratio, and the differential transmits the rotation of the input shaft to the wheels through the final drive ratio. The P2 HEV architecture allows traction from the engine and EM separately or together, depending on the driver request. Therefore, the most significant problem in achieving higher efficiency is the management of the torque distribution.
As a power source for electric motors, Li-ion batteries offer clear advantages in specific energy, specific power, life span, cost, and performance, which makes them suitable for the automotive field [
6,
7,
8].
The current, voltage, state of charge (SOC), and energy capacity that a Li-ion battery can provide are bounded. The operating ranges of the current, voltage, and SOC strongly affect energy efficiency. For this reason, techniques such as the SOC window have been developed, and studies have been conducted to establish more accurate SOC estimation methods [
9,
10]. In addition, these values change with the state of health (SOH). The SOH is one of the key quantities to be optimized, and it provides information about the health status and aging of the battery. Battery aging depends on the temperature, number of full cycles, battery chemistry, resistance, SOC, and environmental conditions [
11,
12,
13].
Various controllers are frequently used for the torque distribution problem, such as model predictive control (MPC), fuzzy logic, and reinforcement learning (RL).
Energy management strategies for HEVs are developed in [
14] using fuzzy logic and Elman neural network tuning. The strategies aim to optimize the power split between the engine and electric motor by considering driving conditions, the battery state of charge, and other relevant factors. The fuzzy logic and Elman neural network tuning algorithms are compared with other energy management strategies using simulations.
The study in [
15] compares two energy management strategies, fuzzy logic and the equivalent consumption minimization strategy (ECMS), for P2 HEVs. The strategies aim to optimize the power distribution between traction sources. A rule-based energy management strategy for a light-duty commercial P2 HEV using dynamic programming (DP) is presented in [
16].
A two-level MPC approach for the energy management of parallel HEVs is proposed in [
17]. The upper-level controller optimizes the power split between the engine, motor, and battery based on current driving conditions and driver demands. The lower-level controller implements the power split and manages the battery SOC. This approach enables the separate optimization of the energy management problem, uses simplified models for the powertrain components, and provides a more efficient and responsive system.
The study in [
18] examines the impact of uncertain SOC estimations on the performance of the MPC-based energy management strategy for HEVs. The study shows that higher levels of SOC uncertainty result in decreased fuel economy and increased battery degradation. Additionally, the study highlights the importance of accurate SOC estimations for optimal HEV performance and longevity.
An adaptive MPC strategy is proposed in [
19], and it includes battery thermal limitations to optimize the fuel consumption of a P2 HEV. The proposed strategy considers the battery’s thermal behavior and adjusts the power distribution between the engine and electric motor to reduce fuel consumption while maintaining the battery’s thermal limits.
A hierarchical control strategy is proposed in [
20] with a robust MPC for the energy management of parallel HEVs in the presence of uncertainty. The high-level controller is responsible for determining the setpoints of the low-level controller based on the driver demand and the vehicle’s operating conditions. The low-level controller is responsible for optimizing the energy flow between the powertrain components of the HEV based on the setpoints determined by the high-level controller.
A novel energy management strategy is proposed in [
21] using a hierarchical MPC framework for plug-in HEVs. The study introduces a dynamic programming (DP) algorithm to solve the high-level MPC optimization problem efficiently. The DP algorithm optimizes the energy management strategy over the entire driving cycle and provides a set of optimal setpoints for the low-level controller to follow.
Using stochastic MPC and RL, the energy management of plug-in HEVs is analyzed in [
22]. The proposed framework accounts for uncertainties in the PHEV’s operating conditions and uses a probabilistic model to solve a stochastic optimization problem. The study introduces an RL algorithm to learn the optimal control policy based on historical data, which significantly improves fuel economy compared to conventional rule-based strategies.
The main drawback of MPC is the execution time of the controller, which results from its complex structure. Many studies have therefore sought to reduce how often the controller is executed [
23,
24,
25]. Triggering instants are determined based on either the difference between the current measured state and the measurement at the previous sampling instant or the difference between the current measured state and the predicted optimal state [
23,
26,
27,
28,
29].
The potential benefits of using event-triggered MPC strategies are demonstrated for BEVs and HEVs, including improved fuel economy, battery life, and reduced computational complexity, in [
30,
31,
32]. MPC and the deep Q-network (DQN) are different approaches that solve different kinds of problems. While MPC computes an optimized control strategy for a dynamic system, DQN combines the Q-learning algorithm with deep learning, allowing an agent to learn which actions to take to achieve the highest overall reward in a given environment. DQN has also been used in the literature to solve the energy management problem in automotive applications [
33,
34].
Studies have been conducted to compare MPC and DQN [
35]. In addition, there are studies in which the combination of MPC-DQN is used to increase the performance of MPC and to select design parameters, such as the prediction horizon and weight [
36,
37,
38].
In this study, two nonlinear model predictive controllers are used for different purposes. The first MPC provides the total torque request for reference velocity tracking over a finite prediction horizon, subject to the drivability criteria. The second MPC, also over a finite prediction horizon, provides the optimal torque distribution within the limits of the traction sources. The internal combustion engine, electric motor, and battery each have physical limitations; in particular, the battery voltage and current must stay within certain bounds. In addition, the behavior of the battery changes as it ages, so the factors affecting battery aging are taken into account. Finally, to slow battery aging, an SOC window is defined, and the battery is operated between fixed SOC limits. The MPC for torque distribution therefore accounts for both the SOC and the SOH.
A shortcoming of MPC is its execution time, because a complex optimal control problem must be solved at each step to produce the predicted control sequence. In this study, to reduce how often the optimal control problem is solved, an event-triggered mechanism is constructed via DQN and integrated into MPC. In addition, because the efficiencies of the internal combustion engine and the electric motor, as well as the requested total torque, change during driving, the weight terms of the cost function are adapted during operation. To this end, a DQN is trained to determine appropriate weight terms. In this way, the efficiency of energy consumption is improved while the controller is executed less frequently.
This paper is organized as follows: In
Section 2, the vehicle specifications, suitable battery/cell selection, and simulation and modeling environment are presented. In
Section 3, the MPC methodology and vehicle dynamics are provided. In
Section 4, the event-triggered mechanism and MPC weights via DQN are proposed. In
Section 5, the results are provided and discussed, and
Section 6 concludes this work.
2. Materials and Methods
A P2 HEV is modeled in the MATLAB/Simulink environment using physical modeling with Simscape™ blocks. This approach allows the specifications of the powertrain components and the vehicle dynamics to be integrated with high accuracy. Additionally, three modeling standards, namely, MathWorks Advisory Board (MAB), Japan MATLAB Automotive Advisory Board (JMAAB), and Motor Industry Software Reliability Association (MISRA) 2012, are used to check that the vehicle model is built correctly. The checks are run with the MATLAB Model Advisor, and the model passes them without failures. All vehicle components are modeled according to the publicly shared specifications of the Kia Niro P2 HEV. The sampling time is chosen as 0.01 s, as in most automotive applications.
The electric motor used in this vehicle is an Interior Permanent Magnet Synchronous Motor (IPMSM), a type that is often used in the automotive industry. The technical data, which comprise the motor type, maximum torque, maximum motor power, operating voltage, maximum speed, and the efficiencies of the inverter and motor, are given in
Table 1. In addition to this information, an electric motor efficiency map based on the motor shaft speed and motor torque is also available. Since not all the required parameters are available in the shared battery data, the battery pack and cell are selected by taking into account the system configuration and motor performance criteria.
An automobile consists of many different components and subsystems, which combine to form a larger system that meets the functional requirements of the vehicle. However, managing all the systems of a vehicle as a whole is difficult and complex. Therefore, requirements in the automotive industry are organized into system levels, from general to specific, so that they can be managed at each level.
For the battery and cell, the requirements are detailed starting from the L0 vehicle level, followed by the L1 powertrain level, the L2 battery level, and the L3 module/cell level. Finally, after checking that the candidate cell meets the higher-level requirements, the battery pack and cell selections are completed.
Starting from system level 1, the battery power in the charge and discharge cases should be calculated based on the powertrain efficiency. The powertrain efficiency ratio, obtained from the inverter and electric motor efficiencies, is given in
Table 2.
With these efficiency values, the target battery power values for the discharge case (traction) and the charge case (recuperation, such as regenerative braking) are as shown in
Table 3.
Afterward, a usable energy target, shown in
Table 4, is derived from the estimated consumption values and the maximum range that can be reached in one battery cycle using only the motor for traction.
Another important criterion in selecting the battery cell is the chemistry of the battery. The weight, cost, safety, performance, power, and voltage of the battery vary with the chemistry. Six chemistries are frequently used in Li-ion batteries (
Table 5) [
39,
40].
Among other criteria, specific power and specific energy are given high priority, and NMC, which is the most widely used chemistry in the automotive industry, is selected as the battery chemistry. As the system level increases from 2 to 3, the series–parallel configuration is determined by considering the battery’s chemistry and power demands. The nominal cell voltage of NMC is 3.7 V, so a total of 65 cells in series are needed to supply the system voltage. After determining the number of cells that need to be connected in series, using the battery’s charge and discharge power values and the system voltage, the maximum requested charge and discharge currents of the battery are as follows:

$$V_{sys} = N_s V_{cell}, \qquad I_{dch} = \frac{P_{dch}}{V_{sys}}, \qquad I_{ch} = \frac{P_{ch}}{V_{sys}}$$

where $V_{sys}$ and $V_{cell}$ are the system and cell nominal voltages; $N_s$ is the number of cells connected in series; and $P_{dch}$, $P_{ch}$, $I_{dch}$, and $I_{ch}$ are the 10 s peak discharge and charge powers and currents. With 65 cells at a 3.7 V cell voltage, the battery provides a nominal voltage of 240.5 V. With 65 cells in series, the usable energy capacities are determined for two candidate parallel configurations, 65s1p and 65s2p (
Table 6). The usable energy capacity of the battery is obtained as the ratio of the usable target energy to the voltage. Because an SOC window is used, the installed and usable energy capacities of the battery differ. There are various ways to operate a battery pack over a long time, or over many cycles, with consistent power and without shortening its lifespan too much; one of them is to define an SOC window. The SOC window is intended to improve performance by keeping the SOC within a narrower range than the full 0–100% range. The width of the SOC window depends on the structure and architecture of the vehicle: it is larger in electric vehicles and smaller in hybrid electric vehicles. The window ensures that the battery is not used at very high or very low SOC values. At a high SOC, the cell voltage is high, which stresses the cell and reduces the life of the battery; exceeding the maximum voltage limit is dangerous and puts the safety of the cell and battery pack at risk. At a low SOC, the open-circuit voltage decreases, and the internal resistance increases; thus, during discharge, the voltage may drop further and fall below the minimum voltage limit. For all these reasons, and after examining other applications, the SOC is operated between 30% and 80%, which gives an SOC window of 50%. The installed energy capacity of the battery is calculated by dividing the usable energy capacity of the battery by the SOC window.
An NMC cell should therefore be selected that provides an installed capacity of 12.474 Ah or 6.237 Ah per cell (for the 65s1p and 65s2p configurations, respectively) and can deliver more than the required current values. Cells that are frequently used in the automotive industry and have been the subject of long-term tests are examined, and a battery that meets the system requirements is selected. The battery specifications are shown below (
Table 7).
The characteristics of the selected battery satisfy the system requirements for usable energy capacity, current limits, and voltage.
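The sizing arithmetic of this section can be sketched as follows. This is a minimal illustration and not the authors' tool: the function name `size_pack`, the peak power values, and the 1500 Wh usable energy target are assumptions (the contents of Tables 3 and 4 are not reproduced here); the energy value is chosen only because it is consistent with the 12.474 Ah and 6.237 Ah figures quoted above.

```python
def size_pack(v_sys, v_cell, p_dch_w, p_ch_w, e_usable_wh, soc_window, n_parallel):
    """Sketch of the pack sizing steps; every input is supplied by the caller."""
    n_series = round(v_sys / v_cell)            # cells connected in series
    v_nom = n_series * v_cell                   # nominal pack voltage [V]
    q_usable_ah = e_usable_wh / v_nom           # usable pack capacity [Ah]
    q_installed_ah = q_usable_ah / soc_window   # installed pack capacity [Ah]
    return {
        "n_series": n_series,
        "v_nom": v_nom,
        "i_dch_a": p_dch_w / v_nom,             # peak discharge current [A]
        "i_ch_a": p_ch_w / v_nom,               # peak charge current [A]
        "q_cell_installed_ah": q_installed_ah / n_parallel,
    }


# Hypothetical targets: 40 kW / 30 kW peak power and 1500 Wh usable energy.
pack_1p = size_pack(240.5, 3.7, 40e3, 30e3, 1500.0, 0.5, n_parallel=1)
pack_2p = size_pack(240.5, 3.7, 40e3, 30e3, 1500.0, 0.5, n_parallel=2)
```

With a 50% SOC window, the installed capacity is twice the usable capacity, and splitting the pack into two parallel strings halves the capacity each cell must provide.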
3. System Dynamics and Prediction Model
In this section, the flow diagram of the study (which can be seen in
Figure 2), vehicle dynamics equations, and relations for nonlinear model predictive control are explained.
As can be seen in
Figure 2, while tracking the Worldwide Harmonized Light Vehicle Test Procedure (WLTP) velocity reference and distributing the torque between the engine and motor, two MPCs are integrated into the powertrain system, and they dynamically adjust the torque to optimize performance and efficiency.
The first MPC calculates the total torque request, which is the control signal $u$, based on the WLTP velocity reference $V_{ref}$, the vehicle position $x_1$, and the velocity $x_2$ by considering drivability. The total torque request is the setpoint of the second MPC, which distributes the requested total torque between the engine and motor subject to a number of constraints, such as the SOC, engine torque limits, motor torque limits, and battery current $I_{batt}$. The distribution of the torque is adjusted continuously based on feedback from the vehicle velocity, SOC level, open-circuit voltage, and actual gear.
The requested engine and motor torque outputs of the powertrain system are controlled and regulated by various components, such as the engine control unit (ECU), the motor control unit (MCU), and the transmission control unit (TCU). The produced engine and motor torques are then transmitted to the wheels via the transmission and the final drive ratio. In an HEV, the electric motor is powered by a battery pack, which provides electrical energy to the motor. The current flow in the powertrain system refers to the flow of electrical power from the battery to the motor, as well as the flow of energy back to the battery during regenerative braking. The current is controlled and regulated by the battery management system (BMS), which adjusts the voltage and current supplied to the motor based on the desired torque output and other factors. The event-triggered mechanism and the weight adaptation of the second MPC, both realized via DQN, are integrated to increase the efficiency of the MPCs.
In this study, two MPCs are integrated to obtain the optimal solution. Using two simple MPCs instead of one complex MPC has three advantages in systems with many constraints. First, feasibility improves: it can be challenging to find a solution that satisfies all constraints at once, whereas each of the two controllers handles only a subset of the constraints. Second, computational complexity is reduced: a single complex MPC may require a large number of optimization variables, whereas each of the two controllers works with a smaller set. Lastly, each controller can be designed to optimize a different aspect of the control performance, which provides greater flexibility in the control system.
For the prediction model, the resistance forces given in Equation (7) act on the vehicle in the direction opposite to its motion and depend on the vehicle velocity, mass, frontal area, etc. There are three main resistance forces applied to the vehicle.
By taking into consideration the road gradient, wind speed, and mechanical motion, the resistance forces are calculated in the prediction model to generate a control signal in order to track the reference input.
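As an illustration of how such resistance forces are typically evaluated, the sketch below assumes the usual three terms (aerodynamic drag, rolling resistance, and grade resistance). The function name, the parameter names, and the numeric values are hypothetical and are not taken from Equation (7) or from the vehicle data.

```python
import math


def resistance_force(v, mass, c_rr, rho, c_d, area, grade_rad, v_wind=0.0, g=9.81):
    """Total resistance force [N] at speed v [m/s]; v_wind > 0 is a headwind."""
    f_aero = 0.5 * rho * c_d * area * (v + v_wind) ** 2   # aerodynamic drag
    f_roll = c_rr * mass * g * math.cos(grade_rad)        # rolling resistance
    f_grade = mass * g * math.sin(grade_rad)              # road gradient
    return f_aero + f_roll + f_grade


# Hypothetical parameters for a compact vehicle on a flat road.
f_standstill = resistance_force(0.0, 1500.0, 0.01, 1.2, 0.3, 2.2, 0.0)
f_cruise = resistance_force(20.0, 1500.0, 0.01, 1.2, 0.3, 2.2, 0.0)
```

At standstill on a flat road only the rolling term remains, while the drag term grows with the square of the relative air speed.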
With the first MPC, the total torque request must provide enough traction force not only to overcome all resistance forces but also to correct the velocity tracking error between the vehicle velocity and the reference velocity. The nonlinear model equations are as follows:
where the control signal is the requested total delta torque, and $x_1$ and $x_2$ are the system states, namely the vehicle position and the vehicle velocity. The axle-based torque values are converted to wheel-based torque by the powertrain components, namely the transmission and the final drive ratio. The wheel-based torque is converted to the requested force by dividing it by the wheel radius. $x_2$ is updated from the acceleration, that is, the net force divided by the vehicle mass. For the vehicle dynamics of the P2 HEV, the nomenclature is given in
Table 8.
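The state update described above can be sketched as one explicit Euler step. This is an assumption-laden illustration, not the paper's model equations: the function name `predict_step`, the lossless driveline, and the example values are hypothetical; only the 0.01 s sampling time comes from Section 2.

```python
def predict_step(x1, x2, t_axle, f_res, mass, gear_ratio, final_drive,
                 wheel_radius, dt=0.01):
    """Advance position x1 [m] and velocity x2 [m/s] by one sample."""
    t_wheel = t_axle * gear_ratio * final_drive   # axle torque -> wheel torque
    f_trac = t_wheel / wheel_radius               # wheel torque -> traction force
    accel = (f_trac - f_res) / mass               # net force over vehicle mass
    return x1 + x2 * dt, x2 + accel * dt


# Hypothetical case: traction force exactly balances the resistance force.
balanced_res = 100.0 * 2.0 * 4.0 / 0.3
x1_next, x2_next = predict_step(0.0, 10.0, 100.0, balanced_res,
                                1500.0, 2.0, 4.0, 0.3)
```

When the traction force equals the resistance force, the velocity stays constant and only the position advances.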
The limit of motion, as defined in the ISO 2631-5 standard (
Table 9) [
41], is a measure of the maximum displacement of a vehicle’s body or chassis experienced by the driver or passengers during normal driving conditions. It is expressed as a percentage of the total body or chassis displacement and is used to evaluate the drivability of a vehicle. The limit of motion can significantly affect the comfort and safety of the vehicle’s occupants: vehicles with a lower limit of motion are typically more stable and comfortable to drive, whereas vehicles with a higher limit of motion may be more prone to vibrations and other movements that can be unsettling for the driver and passengers.
According to the ISO 2631-5 standard, comfort levels are given according to the acceleration values. Torque demands and torque variation occur in line with acceleration changes. In order for the requested torque and requested torque changes to be realizable, the acceleration changes specified in the standard are taken as a reference. Using the acceleration–torque relationship, the torque variation limits are determined as follows:
where the upper and lower limits of the control signal correspond to the ISO 2631-5 acceleration limits. For the total torque request, the cost function is established as follows:
where $N$ is the prediction horizon, $W$ is the tracking weight, and $x_{ref}$ is the reference velocity at discrete time step $k$. After the total requested torque is calculated, it must be decided how much torque to demand from each traction source for minimum power consumption. In HEVs, the torque distribution between the engine and the motor can have a significant impact on the vehicle’s performance and fuel efficiency. The engine and motor can work together to provide the necessary power to the vehicle, or they can operate independently to optimize efficiency.
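The two ingredients of the first MPC described above, a torque variation limit derived from an acceleration limit and a velocity tracking cost, can be sketched as follows. Both functions are assumptions for illustration: the quadratic form of the cost, the lossless acceleration-to-torque mapping, and all names and values are hypothetical rather than the paper's exact equations.

```python
def torque_rate_limit(delta_accel, mass, wheel_radius, gear_ratio, final_drive):
    """Axle torque change [Nm] that produces the acceleration change delta_accel."""
    return mass * delta_accel * wheel_radius / (gear_ratio * final_drive)


def tracking_cost(v_pred, v_ref, weight):
    """Weighted sum of squared velocity errors over the prediction horizon."""
    return sum(weight * (vp - vr) ** 2 for vp, vr in zip(v_pred, v_ref))


# Hypothetical values: 1 m/s^2 step limit, 1500 kg vehicle, 0.3 m wheel radius.
du_max = torque_rate_limit(1.0, 1500.0, 0.3, 2.0, 4.0)
cost_perfect = tracking_cost([10.0, 11.0], [10.0, 11.0], weight=2.0)
cost_offset = tracking_cost([10.0, 12.0], [10.0, 11.0], weight=2.0)
```

A symmetric bound of `-du_max` to `du_max` on the control signal would then keep the requested torque changes within the chosen acceleration limit.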
For example, in a hybrid electric vehicle, the engine may be used to power the car at higher speeds, while the electric motor is used at lower speeds or during acceleration to provide additional torque. This can help to reduce the workload on the engine and improve fuel efficiency. Similarly, the electric motor may be used to power the vehicle during stop-and-go traffic or at low speeds, which can help to reduce fuel consumption and emissions.
The variations (deltas) of the motor torque and engine torque are selected as control signals. Using the delta torques as control signals introduces integral action, which prevents steady-state errors. The equality constraint in Equation (18) is applied so that the requested torque for the HEV can be generated.
Powertrain components inherently work within a limited operating range; hence, the limits of the motor, engine, battery, and associated system variables are taken into consideration in the controller design.
Figure 3a,b show the characteristics of the engine and motor, respectively. The blue lines represent the rotational speed and torque curves. These torque curves are the physical limits of the engine and motor; hence, they are used as MPC constraints. The colored regions show the engine and motor efficiency, which takes the values shown on the efficiency bar.
For the torque constraints, the rotational speed is calculated via the vehicle speed dynamically.
The engine and motor constraints for the HEV are considered with the following equations:
where the four bounds are the upper and lower torque limits of the engine and the motor. The torque and rotational speed axis values of the engine and motor, taken from the vehicle datasheet, are predefined so that the limit curves can be generated via interpolation: the upper limits of the engine and motor are obtained by interpolating these axis values at the actual rotational speed. The engine cannot produce negative torque, but the motor can work as a generator to charge the battery, so the lower limit of the engine is 0, and the lower limit of the motor is the negative of its upper limit.
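The limit construction described above can be sketched with a simple clamped linear interpolation. The axis values below are hypothetical placeholders, not the datasheet values of the modeled vehicle, and the helper names are illustrative.

```python
from bisect import bisect_right

# Hypothetical axis values [rpm] and full-load torque curves [Nm].
ENG_SPEED = [1000.0, 2000.0, 3000.0, 4000.0, 5000.0, 6000.0]
ENG_TMAX = [100.0, 130.0, 145.0, 147.0, 140.0, 120.0]
MOT_SPEED = [0.0, 1800.0, 3000.0, 4500.0, 6000.0]
MOT_TMAX = [170.0, 170.0, 102.0, 68.0, 51.0]


def interp(x, xs, ys):
    """Piecewise-linear interpolation, clamped at both ends of the table."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    frac = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + frac * (ys[i + 1] - ys[i])


def torque_limits(eng_rpm, mot_rpm):
    """Return ((eng_min, eng_max), (mot_min, mot_max)) at the given speeds."""
    eng_max = interp(eng_rpm, ENG_SPEED, ENG_TMAX)
    mot_max = interp(mot_rpm, MOT_SPEED, MOT_TMAX)
    return (0.0, eng_max), (-mot_max, mot_max)


eng_bounds, mot_bounds = torque_limits(1500.0, 900.0)
```

The engine bound starts at zero because the engine cannot produce negative torque, while the motor bound is mirrored to allow generator operation.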
To deliver the torque demanded from the motor, the battery must supply current. The required battery current can be calculated from the motor torque by taking the motor efficiency into account.
The motor torque can be represented by the control signal, while the motor speed in the next sample is obtained via the vehicle speed using Equation (
19). Motor efficiency is calculated with a lookup table that includes the motor torque and motor speed.
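A minimal sketch of this torque-to-current relationship is given below. It assumes that the efficiency has already been read from the lookup table and is passed in as a number, and it uses the sign convention stated later in this section (charge current positive, discharge current negative). The function name and the example values are hypothetical.

```python
import math


def battery_current(t_mot, n_mot_rpm, eta, v_batt):
    """Battery current [A] for a motor torque [Nm] at a motor speed [rpm]."""
    omega = n_mot_rpm * 2.0 * math.pi / 60.0     # shaft speed [rad/s]
    p_mech = t_mot * omega                       # mechanical power [W]
    # Traction draws more electrical power than it delivers mechanically;
    # regeneration returns less than the mechanical power absorbed.
    p_elec = p_mech / eta if p_mech >= 0.0 else p_mech * eta
    return -p_elec / v_batt                      # discharge is negative


i_traction = battery_current(100.0, 3000.0, 0.9, 240.5)
i_regen = battery_current(-100.0, 3000.0, 0.9, 240.5)
```

For the same torque magnitude, the discharge current is larger in magnitude than the recovered charge current because the losses act in both directions.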
The voltage can be found from the battery equivalent circuit as a function of the current, using the ohmic resistance, the open-circuit voltage, and an RC pair that represents Butler–Volmer effects or the double-layer effect. The double-layer effect is a phenomenon that occurs at the interface between a solid and a liquid (such as an electrode and an electrolyte). It is caused by the presence of a thin layer of charges at the interface, which can be either positive or negative depending on the materials involved. This thin layer of charges, known as the electric double layer, can have a significant effect on the overall behavior of the system, and it behaves similarly to a capacitor [
42]. The Butler–Volmer equation is a mathematical expression that describes the current flowing through an electrode–electrolyte system in terms of the overpotential (a measure of the driving force for the reaction) and the exchange current density (a measure of the rate at which the reaction occurs). The equation takes into account the effect of the electric double layer on the overall behavior of the system and can be used to predict the behavior of electrochemical cells under various conditions [
43]. In light of this information, the battery equivalent model is given in
Figure 4, and the mathematical equations of battery are as follows:
where $V_1$ is the voltage across the single RC pair, $\tau_1$ is the time constant of the RC pair used to represent the double-layer effect, $R_0$ is the ohmic resistance, $V_{batt}$ is the battery voltage, and $V_{oc}(SOC)$ is the open-circuit voltage. The open-circuit voltage behaves differently under charge and discharge conditions, and the charging and discharging speeds can differ because of different current values. The charging rate of the battery is expressed as the C-rate, the ratio of the current to the rated capacity. For example, if a battery with a capacity of 6.5 Ah is charged with 13 A, the rate is 2C, and the battery is fully charged in half an hour.
$V_{oc}(SOC)$ is calculated according to the curves presented in
Figure 5.
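The C-rate example and a single-RC equivalent circuit can be sketched as follows. The discretization, the resistance `r1` of the RC pair, and all numeric values are assumptions for illustration and are not taken from the paper's equations; the charge current is treated as positive, in line with the convention used in this section.

```python
import math


def c_rate(current_a, capacity_ah):
    """C-rate: current divided by rated capacity."""
    return abs(current_a) / capacity_ah


def rc_step(v1, i_batt, dt, r1, tau):
    """Exact one-step update of the RC-pair voltage for a constant current."""
    decay = math.exp(-dt / tau)
    return v1 * decay + r1 * (1.0 - decay) * i_batt


def terminal_voltage(v_oc, v1, i_batt, r0):
    """Terminal voltage; a positive (charge) current raises it above v_oc."""
    return v_oc + r0 * i_batt + v1


rate = c_rate(13.0, 6.5)          # the 6.5 Ah / 13 A example from the text
hours_to_full = 1.0 / rate
v1_settled = rc_step(0.0, 50.0, dt=1000.0, r1=0.01, tau=10.0)  # hypothetical
```

After many time constants the RC-pair voltage settles at `r1 * i_batt`, so the terminal voltage differs from the open-circuit voltage by the total resistive drop.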
Another electrochemical phenomenon is mass transport, or the Warburg effect. Since long periods, such as minutes or hours, are required to observe this effect, it is ignored here, and the same battery circuit model is used, as in many commercial vehicles’ battery models. The battery current limits vary with the pack temperature and the SOC, and they differ between the charge and discharge processes. In the model used, the battery temperature is assumed to be constant at 25 °C; in this way, the three-dimensional representation given in
Figure 6 becomes a two-dimensional representation.
In
Figure 7, the maximum charge and discharge current limits are given as a function of the SOC at constant temperature.
The charge current is taken as positive, and the discharge current is taken as negative. For this reason, the discharge current limit is shown as the lower limit, and the charge current limit is shown as the upper limit.
The SOC variation is determined using the ratio of the battery current to the rated capacity. The SOC is calculated by adding the SOC variation to the initial SOC value.
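$$SOC(t) = SOC_0 + \frac{1}{Q}\int_{0}^{t} \eta\, I_{batt}\, dt$$

where $SOC_0$ is the initial SOC, $I_{batt}$ is the battery current, $Q$ is the rated capacity, and $\eta$ is the cell coulombic efficiency in the charge phase [
44].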
The initial value of the SOC and regular current measurements are very important in this method. Although the method is very widely used, calendar aging should be taken into account when calculating the initial SOC value, and the effects of the SOH on the capacity $Q$ should also be considered.
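The current-integration method described above can be sketched as a discrete update. This is a minimal illustration under stated assumptions: the SOC is expressed as a fraction between 0 and 1, the coulombic efficiency is applied only while charging, and the function name and values are hypothetical.

```python
def update_soc(soc, i_batt, dt, q_ah, eta_chg=1.0):
    """One coulomb-counting step; i_batt > 0 charges, i_batt < 0 discharges."""
    eta = eta_chg if i_batt > 0.0 else 1.0       # efficiency in the charge phase
    return soc + eta * i_batt * dt / (q_ah * 3600.0)


# Hypothetical 10 Ah battery discharged at 5 A for half an hour.
soc_after_discharge = update_soc(0.5, -5.0, 1800.0, 10.0)
# The same battery charged at 5 A for half an hour with 98% efficiency.
soc_after_charge = update_soc(0.5, 5.0, 1800.0, 10.0, eta_chg=0.98)
```

Because the capacity appears in the denominator, an aged battery with a smaller capacity shows a larger SOC change for the same current, which is why the capacity is worth updating with the SOH.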
The capacity $Q$ changes with the SOH. As a rule of thumb, a battery is assumed to have reached the end of its life for automotive applications when its capacity drops to 80% of its initial value. For this reason, the aging effects of the battery are examined between 100% and 80% of the SOH value. The investigated effects are cyclic effects and temperature effects.
Each charge and discharge of the battery is counted as one cycle. The battery ages with each cycle, and this cycling effect also depends on the SOC window used.
Cell temperature is another factor that can have a significant effect on the SOH of a battery. High temperatures accelerate the degradation of the battery, reducing its overall lifespan and performance, because chemical reactions within the battery occur at a faster rate and increase chemical breakdown. However, low temperatures can also have a negative impact on the SOH by slowing down the battery’s chemical reactions, which reduces its performance and can decrease battery capacity and overall lifespan. The capacity retention and internal resistance change are given below for cell temperatures of −10, 25, and 45 °C as a function of the number of cycles.
According to
Figure 8, the SOH function is created based on the number of cycles and cell temperature in Equation (
33).
$Q$ is updated via the lookup table based on the number of cycles $N$ and the cell temperature $T$. Thus, the accuracy of the estimation is increased by using the SOH effects given in Equation (33) in the SOC estimation.
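A lookup-table capacity update of this kind can be sketched as interpolation over cycles and temperature. The retention numbers below are hypothetical placeholders and do not reproduce Figure 8 or Equation (33); only the three temperatures are taken from the text.

```python
from bisect import bisect_right

CYCLES = [0.0, 500.0, 1000.0, 1500.0]      # hypothetical cycle axis
TEMPS_C = [-10.0, 25.0, 45.0]              # temperatures named in the text
RETENTION = {                              # hypothetical capacity retention
    -10.0: [1.00, 0.95, 0.90, 0.84],
    25.0: [1.00, 0.97, 0.94, 0.91],
    45.0: [1.00, 0.94, 0.88, 0.81],
}


def _interp(x, xs, ys):
    """Piecewise-linear interpolation, clamped at both ends of the table."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    i = bisect_right(xs, x) - 1
    frac = (x - xs[i]) / (xs[i + 1] - xs[i])
    return ys[i] + frac * (ys[i + 1] - ys[i])


def aged_capacity(q_rated_ah, n_cycles, temp_c):
    """Rated capacity scaled by retention interpolated in cycles, then temperature."""
    per_temp = [_interp(n_cycles, CYCLES, RETENTION[t]) for t in TEMPS_C]
    return q_rated_ah * _interp(temp_c, TEMPS_C, per_temp)


q_mid_life = aged_capacity(10.0, 750.0, 25.0)
```

The aged capacity returned here would replace the rated capacity in the SOC calculation, so the same current produces a larger SOC change as the battery ages.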
The SOC window is important for managing the battery’s SOC within a safe and optimal range. Operating the battery at very high or very low SOC levels can lead to accelerated degradation, reduced performance, and safety hazards, as explained in
Section 2. By selecting an appropriate SOC window, one can maintain the battery within a safe and effective operating range, prevent overcharging or deep discharge, avoid thermal runaway, and prolong the battery’s useful life; hence, the SOC is requested to operate in this operating range and is added as a constraint to the MPC. The SOC window and MPC constraint are shown in
Figure 9.
A lower power consumption indicates a lower power loss. For this reason, the cost function is as follows:
where the three $W$ terms are the weighting coefficients for the state and control signals. Additionally, the motor, engine, and battery losses are represented as follows:
All constraints for the torque distribution are as follows:
4. MPC Event-Triggered and Weight Adaptation Mechanism
4.1. Training of Event-Triggered Mechanism
In the HEV, a DQN-based event-triggered mechanism is implemented in the MPCs of the velocity tracking system.
Event-triggered MPC has two main advantages over traditional MPC. First, it reduces the computational burden of the control algorithm, as the control action is only updated when necessary. This is particularly important in systems with limited computing resources or in applications where frequent updates of the control action are not required. Second, it can improve control performance by updating the control action in response to significant changes in the system behavior rather than at every step regardless of the system state. This can result in a more efficient and effective control strategy, especially for systems with nonlinear or time-varying dynamics. Here, a DQN is used to realize the event-triggered mechanism.
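The event-triggered idea can be illustrated with a minimal Python sketch. The names `solve_mpc` and `should_trigger` are hypothetical placeholders, as is the initial control action; the sketch only shows that the expensive solve runs when the trigger fires and that the previous control action is held otherwise.

```python
from typing import Callable, List, Tuple


def run_event_triggered(
    states: List[float],
    solve_mpc: Callable[[float], float],      # hypothetical: expensive controller solve
    should_trigger: Callable[[float], bool],  # hypothetical: trigger policy (a DQN in the text)
) -> Tuple[List[float], int]:
    """Hold the last control action unless the trigger policy requests a new solve."""
    controls: List[float] = []
    solves = 0
    u = 0.0  # assumed initial control action
    for k, x in enumerate(states):
        # Always solve on the first step so that a valid action exists.
        if k == 0 or should_trigger(x):
            u = solve_mpc(x)
            solves += 1
        controls.append(u)
    return controls, solves


# Toy usage: a proportional law and a threshold rule stand in for the MPC
# and the trained DQN.
errors = [0.0, 0.1, 0.9, 0.2, 1.5]
controls, solves = run_event_triggered(
    errors,
    solve_mpc=lambda e: -2.0 * e,
    should_trigger=lambda e: abs(e) > 0.5,
)
```

In this toy run, the controller is solved on only three of the five steps, and the held action fills the remaining steps.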
The goal of realizing the event-triggered mechanism via DQN is to learn the optimal control policy that maximizes the expected cumulative reward over a finite time horizon, given the current state
s of the system. In the event-triggered DQN, the Q-function
Q is used to approximate the expected cumulative reward of taking action
a in state
s and following the optimal control policy thereafter:
where
r is the reward at time
k, which is a function of the state
s and the control action
a. The
Q-function is updated as follows:
where
r is the immediate reward for taking action
a in state
s;
s′ is the next state; and
α and
γ are the learning rate and discount factor, respectively. These two hyperparameters affect how the agent learns and makes decisions. The learning rate sets the step size at each iteration of the Q-learning algorithm, which updates the
Q-values of the state–action pairs based on the rewards received by the agent. The discount factor sets how much the agent values long-term rewards relative to immediate rewards.
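As a concrete reading of the update described above, the following Python sketch applies the textbook Q-learning rule, Q(s, a) ← Q(s, a) + α [r + γ max Q(s′, ·) − Q(s, a)], to a small table. The tabular form is an assumption made for illustration; a DQN replaces the table with a neural network trained towards the same target. The state labels and numeric values are hypothetical.

```python
from typing import Dict, List


def q_update(
    q: Dict[str, List[float]],
    s: str,
    a: int,
    r: float,
    s_next: str,
    alpha: float,
    gamma: float,
) -> float:
    """Apply one tabular Q-learning step in place and return the new Q(s, a)."""
    # Bootstrapped target: immediate reward plus discounted best next value.
    target = r + gamma * max(q[s_next])
    # Move the current estimate towards the target by the learning rate.
    q[s][a] += alpha * (target - q[s][a])
    return q[s][a]


# Hypothetical two-state table with two actions (0 = no trigger, 1 = trigger).
q_table = {"s0": [0.0, 0.0], "s1": [1.0, 2.0]}
new_value = q_update(q_table, "s0", 1, r=0.5, s_next="s1", alpha=0.1, gamma=0.9)
```

A larger learning rate moves the estimate further towards the target in one step, and a larger discount factor gives the best next-state value more weight in that target.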
The main aim is to track the velocity reference with less triggering. To this end, the states and action are defined as follows:
The states are the differences between the vehicle velocity and the next four reference velocity values. For the triggering mechanism, the action set is {0, 1}, where 0 means no triggering and 1 means triggering.
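The state definition can be sketched in Python as follows. The sign convention (reference minus velocity), the choice of steps k+1 to k+4 as the "next four" values, and the repetition of the last reference value at the end of the cycle are assumptions of this sketch; the velocity values are hypothetical.

```python
from typing import List, Sequence

HORIZON = 4       # the text uses four upcoming reference values
ACTIONS = (0, 1)  # 0 = no triggering, 1 = triggering


def tracking_state(v: float, v_ref: Sequence[float], k: int) -> List[float]:
    """Differences between the upcoming reference velocities and the current velocity."""
    last = len(v_ref) - 1
    # Assumption: near the end of the cycle the last reference value is repeated.
    return [v_ref[min(k + i, last)] - v for i in range(1, HORIZON + 1)]


reference = [10.0, 10.5, 11.0, 11.5, 12.0, 12.5]
state = tracking_state(10.0, reference, k=0)
```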
The reward is an essential design element of the DQN. It is a metric that tells the agent how well it is performing. It can serve to maximize gains, minimize losses, etc., but the general goal is to maximize the total reward over time. To fulfill this objective, the reward is based on the tracking ability and the triggering.
where
w and
w are weight calibration parameters, and
w is the calibration lookup table used to calibrate the reward function. The first term of the reward is the state difference term, which acts as a penalty: the larger the difference between the vehicle velocity and the next four reference velocities, the larger the penalty.
The second term of the reward is the tracking reward. It is related to the first term: if the difference between the vehicle speed and the next four velocity reference values is lower than a threshold parameter Lim, a positive reward is given according to the velocity difference and a 1D lookup table. Otherwise, no reward is given.
The third reward term is related to triggering. The goal is to perform velocity tracking with minimal triggering. For this reason, a penalty value is applied in the case of triggering.
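The three reward terms can be put together in a short Python sketch. All numeric values (`W_ERR`, `W_TRIG`, `LIM`, and the lookup table points) are hypothetical calibration values, and the use of the mean absolute state difference as the tracking error is an assumption of this sketch rather than the formulation used in the study.

```python
from typing import Sequence

W_ERR = 1.0              # hypothetical weight of the state difference penalty
W_TRIG = 0.5             # hypothetical weight of the triggering penalty
LIM = 1.0                # hypothetical tracking threshold
ERR_PTS = (0.0, 1.0)     # hypothetical 1D lookup table: error breakpoints
BONUS_PTS = (2.0, 0.0)   # hypothetical 1D lookup table: reward at each breakpoint


def lookup_1d(x: float, xs: Sequence[float], ys: Sequence[float]) -> float:
    """Piecewise-linear 1D lookup, clamped at both ends of the table."""
    if x <= xs[0]:
        return ys[0]
    if x >= xs[-1]:
        return ys[-1]
    for i in range(1, len(xs)):
        if x <= xs[i]:
            t = (x - xs[i - 1]) / (xs[i] - xs[i - 1])
            return ys[i - 1] + t * (ys[i] - ys[i - 1])
    return ys[-1]


def trigger_reward(state: Sequence[float], action: int) -> float:
    """Sum of the state difference penalty, tracking reward, and triggering penalty."""
    err = sum(abs(e) for e in state) / len(state)  # assumed aggregate tracking error
    penalty = -W_ERR * err
    bonus = lookup_1d(err, ERR_PTS, BONUS_PTS) if err < LIM else 0.0
    trigger_cost = -W_TRIG * action  # action is 0 (no trigger) or 1 (trigger)
    return penalty + bonus + trigger_cost
```

With these placeholder values, a small tracking error without triggering gives the highest reward, while a large error gives a negative reward whether or not the controller is triggered.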
There are a few ways to determine the number of neurons and hidden layers, such as experiments, intuition, and literature studies. Here, the number of neurons, the number of hidden layers, and the activation function type are determined by trial and error: the activation function is the rectified linear unit (ReLU), the number of hidden layers is 3, and the number of neurons per layer is 24. The event-triggered mechanism is activated when the DQN output is greater than 0.5.
The environment is the world, real or virtual, in which the agent performs actions. When the reinforcement learning agent begins exploring its environment, it is entirely unaware of the system behavior; by observing the results of its actions, it gains more experience. The exploration vs. exploitation trade-off is typically controlled by an epsilon-greedy policy. During training, the agent selects the action with the highest Q-value with probability 1 − ε and selects a random action with probability ε. This way, the agent can balance the need to explore and the need to exploit.
where
ε is a hyperparameter that controls the trade-off between exploration and exploitation in the agent’s decision-making process. Moreover,
ε is updated as follows:
With every episode, ε decays according to the epsilon decay parameter. In this way, in the first phase of training, the agent mainly explores the environment. As ε decreases, the agent increasingly exploits what it has learned and gains more reward. Finally, the MATLAB Reinforcement Learning Toolbox is used for training.
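A minimal Python sketch of the epsilon-greedy policy with a decaying epsilon is given below. The multiplicative decay with a lower bound is one common choice and is assumed here; the values of `decay` and `epsilon_min` and the Q-values are hypothetical.

```python
import random
from typing import List, Sequence


def select_action(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Epsilon-greedy choice: random action with probability epsilon, greedy otherwise."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda i: q_values[i])


def decay_epsilon(epsilon: float, decay: float, epsilon_min: float) -> float:
    """Assumed update rule: multiplicative decay that never drops below epsilon_min."""
    return max(epsilon_min, epsilon * (1.0 - decay))


rng = random.Random(0)
epsilon = 0.95
history: List[int] = []
for episode in range(3):
    action = select_action([0.2, 0.8], epsilon, rng)  # 0 = no trigger, 1 = trigger
    history.append(action)
    epsilon = decay_epsilon(epsilon, decay=0.05, epsilon_min=0.01)
```

Early in training, epsilon is close to 1 and most actions are random; as epsilon decays, the greedy action is selected more and more often.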
4.2. Training for Adaptation of Weights of MPC’s Cost Function
In this section, the weight terms of the MPC cost function are updated with DQN. The main advantage of using DQN for MPC weight training is that it can handle complex, high-dimensional state and action spaces that may be difficult to optimize using traditional optimization techniques. DQN can learn an optimal policy by approximating the Q-value function, which is a measure of the expected cumulative reward obtained by following a particular policy from a given state.
By training the MPC weights with DQN, the cost function can be optimized to minimize the error between the predicted and actual system behavior. This is particularly useful in systems with nonlinear dynamics, where it may be difficult to design an accurate cost function using traditional optimization techniques.
In addition, DQN can learn from experience, which enables it to adapt to changing system dynamics over time. This makes it possible to improve the MPC weights as the system evolves and to handle disturbances or uncertainties in the system.
The main aim is to increase energy performance and decrease fuel consumption by modifying the weight terms. In this way, better optimization can be provided for the torque distribution problem.
For the DQN design, axle-based torques are predicted from the next four reference velocities, the actual vehicle velocity, and the resistance forces, following the same principle as in Equations (7) and (13); they are then assigned as the DQN states. The cost function consists of the internal combustion engine power, the electric motor power, and the associated battery power. The DQN states are chosen in this way so that the DQN can establish a link between the predicted axle-level torque values and these power values.
To determine the weights of the cost function, the DQN action set is defined over the range from 0.1 to 5. The weight coefficient of the electric motor power
W is determined during training and passed to the MPC as the action value. The weight of the battery
W is selected as
W for the sake of simplicity. The weight coefficient of the internal combustion engine power
W is chosen as the complement of the weight coefficient of the electric motor. In other words,
W is obtained by subtracting the trained weight
W from 5, which is the specified maximum weight. Because the two traction sources work in coordination, this relationship reduces the computational load and complexity of the training: only the motor power weight is trained instead of two weight terms. The DQN state and action for energy minimization via the weights are given in Equation (56).
where
a and
a are the reference acceleration and vehicle acceleration, respectively. By using the reward function, the DQN agent learns to optimize the MPC weight terms to minimize the total power consumption of the system and to maximize the motor and engine efficiencies. The agent learns to select the torque distribution among the axles that results in the highest efficiencies while still meeting the performance requirements of the system.
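The mapping from the DQN action to the MPC weights described above (the engine weight is the maximum weight of 5 minus the trained motor weight) can be sketched in Python. The function name, the dictionary keys, and the clamping of out-of-range actions are assumptions of this sketch; the battery weight is left out.

```python
from typing import Dict

W_MIN = 0.1  # lower end of the action range given in the text
W_MAX = 5.0  # specified maximum weight


def mpc_weights(action: float) -> Dict[str, float]:
    """Map a DQN action to the motor weight and its complementary engine weight."""
    # Assumption: actions outside the stated range are clamped to it.
    w_motor = min(max(action, W_MIN), W_MAX)
    w_engine = W_MAX - w_motor
    return {"motor": w_motor, "engine": w_engine}


weights = mpc_weights(2.0)
```

Because the two weights always sum to the maximum weight, a single scalar action is enough to fix both of them.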
The use of this reward function provides a strong incentive to the agent to improve the energy efficiency of the system. The reward function provides informative feedback to the agent on the energy efficiency of its behavior and guides the agent towards the optimal policy that maximizes the motor and engine efficiencies. In this direction, the reward function is given in Equation (
57).
For the reward function that tries to maximize the average efficiency of the motor and engine, if the average efficiency is higher than the value of the limit parameter, which is Lim, the difference is given as a reward via the 1D lookup table. If the average efficiency is lower than the value of the limit parameter, the difference is multiplied by the adjustment parameter and used as a punishment.
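The efficiency-based reward logic can be sketched in Python as follows. The function signature, the `lookup` callable standing in for the 1D lookup table, and all numeric values are hypothetical; only the branching on the limit parameter follows the description above.

```python
from typing import Callable


def efficiency_reward(
    eta_motor: float,
    eta_engine: float,
    lim: float,
    adjust: float,
    lookup: Callable[[float], float],  # hypothetical stand-in for the 1D lookup table
) -> float:
    """Reward the average efficiency above the limit and punish it below the limit."""
    avg = 0.5 * (eta_motor + eta_engine)
    diff = avg - lim
    if diff > 0.0:
        # Above the limit: the difference is mapped to a reward by the lookup table.
        return lookup(diff)
    # At or below the limit: the non-positive difference is scaled into a punishment.
    return adjust * diff


def linear_table(d: float) -> float:
    """Hypothetical lookup table: reward proportional to the efficiency margin."""
    return 10.0 * d


good = efficiency_reward(0.90, 0.30, lim=0.55, adjust=2.0, lookup=linear_table)
bad = efficiency_reward(0.80, 0.20, lim=0.55, adjust=2.0, lookup=linear_table)
```

Here `good` is positive and `bad` is negative, so the agent is pushed towards weight choices that keep the average efficiency above the limit.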
The same neural network structure is used: 3 hidden layers with 24 ReLU neurons per layer. For the training hyperparameters, the learning rate, discount factor, epsilon, and mini-batch size are set to 0.001, 0.99, 0.95, and 64, respectively.
5. Results and Discussion
In this section, the outputs of the first and second MPCs and of the vehicle are presented.
The first 300 s of the WLTP are used as a reference. The reason for using the first 300 s of the WLTP test cycle is that they are considered to be representative of typical urban driving conditions. During this phase, the vehicle undergoes multiple acceleration and deceleration cycles, simulating stop-and-go traffic and city driving. The results of the total torque demand of the first MPC, which is the total torque request control signal, and the torque distribution of the second MPC, taking into account the corresponding constraints of the engine and the motor, are given in
Figure 10.
In
Figure 11 and
Figure 12, the torque produced by the engine and motor, shaft speed, consumed power, and lost power results are given.
The gear shifting of the gearbox is performed via the lookup table based on the accelerator pedal and vehicle velocity, which is converted from the output shaft speed. The gear shifting results, which are obtained by also considering the slope of the road, are given in
Figure 13. For some period of the simulation, due to the braking priority, the accelerator pedal is set to zero.
The SOC, voltage, and current results are given in
Figure 14 for the battery. When the SOC is 80%, a voltage of 260 V is provided, and the voltage value changes as the SOC value changes. Other factors affecting the change in the voltage value are the ohmic resistance and the double-layer effect (Butler–Volmer). In line with these effects, relaxation is observed between 80 and 140 s. The charging current is shown as a positive value and the discharging current as a negative value; the current stays within the limits, which are the MPC current constraints given in
Figure 7.
To examine the behavior of the MPC under different initial SOC values and with respect to the SOC constraint, simulations are run with an initial SOC at the upper SOC limit, in the middle of the SOC window, and close to the lower SOC limit. For all initial SOC values, the SOC constraints are not violated, but the behavior of the MPC differs. When the initial SOC is close to the lower limit, the electric motor generally works in the charging region so as not to violate the lower SOC limit. When the initial SOC is 80% or 50%, the electric motor is used frequently for higher efficiency. The battery voltage also changes with the initial SOC value. The SOC and battery voltage results are given in
Figure 15.
In the simulation where the initial SOC value is close to the lower limit, the current values are usually in the positive (charging) region so as not to violate the lower SOC limit. However, in the simulations with initial SOC values of 80% and 50%, a similar current profile with frequent charging and discharging is observed in
Figure 16.
Finally, motor and engine torque behaviors are given at different initial SOC values in
Figure 17. In the simulation, where the initial SOC value is close to the lower limit, the motor is generally used as a generator, and the engine mainly provides the requested traction torque.
An examination of the effects of the SOH on the SOC is one of the important points of this study. The battery chemistry, cycle aging, and cell temperature are the main factors in SOH changes. The cycling aging and cell temperature effects on the SOH are shown in accordance with the SOC window in
Figure 18.
In the figure above, the SOH effect of the battery under different conditions is examined. Under the conditions of 3200 cycles and a cell temperature of 25 °C, the SOH is 90%, the internal resistance is equal to 108.5% of the initial resistance value, and Q is equal to 5.85 Ah. Under the conditions of 4400 cycles and a cell temperature of 25 °C, the SOH is 85%, the internal resistance is equal to 116.4% of the initial resistance value, and Q is equal to 5.525 Ah. Under the conditions of 5500 cycles and a cell temperature of 25 °C, the SOH is 80%, the internal resistance is equal to 130.2% of the initial resistance value, and Q is equal to 5.2 Ah.
The figure shows that the battery capacity decreases under these effects. The same SOC value is reached at markedly different times, and the battery discharges faster when the SOH is low.
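The reported capacities at 25 °C are consistent with Q = SOH × 6.5 Ah. The nominal capacity of 6.5 Ah is inferred from the reported pairs and is not stated explicitly. The Python sketch below checks this relation and linearly interpolates the SOH between the three reported cycle counts; the interpolation and the clamping outside the reported range are assumptions of this sketch, not the lookup table used in the study.

```python
from typing import List, Tuple

# (cycles, SOH, capacity in Ah) as reported for a cell temperature of 25 °C.
REPORTED_25C: List[Tuple[int, float, float]] = [
    (3200, 0.90, 5.85),
    (4400, 0.85, 5.525),
    (5500, 0.80, 5.2),
]
Q_NOMINAL_AH = 6.5  # inferred from the reported pairs, not stated in the text


def capacity_from_soh(soh: float, q_nominal: float = Q_NOMINAL_AH) -> float:
    """Usable capacity as the SOH fraction of the nominal capacity."""
    return soh * q_nominal


def soh_at_cycles(cycles: float) -> float:
    """Linear interpolation between the reported points, clamped outside them."""
    if cycles <= REPORTED_25C[0][0]:
        return REPORTED_25C[0][1]
    for (n0, s0, _), (n1, s1, _) in zip(REPORTED_25C, REPORTED_25C[1:]):
        if cycles <= n1:
            t = (cycles - n0) / (n1 - n0)
            return s0 + t * (s1 - s0)
    return REPORTED_25C[-1][1]


# Largest deviation between the reported capacities and SOH * Q_NOMINAL_AH.
max_error = max(abs(capacity_from_soh(soh) - q) for _, soh, q in REPORTED_25C)
```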
Finally, the velocity tracking results are given in
Figure 19.
Event-triggered MPC with a DQN agent can achieve similar or better performance than traditional time-triggered MPC while reducing communication and computation requirements. For the event-triggered mechanism via DQN, the reward and action, which are designed based on the number of triggers and the error in velocity tracking, are given in
Figure 20.
After observing that similar velocity tracking results are achieved with fewer triggers, the next aim is to reduce the motor and engine losses by modifying the weights with DQN. To test the MPC whose weights are provided by DQN and to compare its performance with the previous method without DQN, results are generated. For the WLTP reference, the motor and engine weights obtained with DQN are given in
Figure 21, and the DQN reward and action are given in
Figure 22.
For energy minimization in WLTP, the engine and motor operational points are marked and plotted on efficiency maps. The cyan marks represent the method without DQN, and the white marks represent the DQN weight algorithm as in
Figure 23 and
Figure 24.
Reporting the fuel consumption of the algorithms with and without DQN provides a way to quantify and compare the performance of the control strategies in terms of energy efficiency. For this reason, the fuel consumption results are given in
Figure 25. The Simulink Simscape engine module is used to model and simulate the behavior of the engine, and it provides the power output of the engine under different operating conditions. Brake-specific fuel consumption (BSFC), which is obtained from the given vehicle specifications, is a measure of the rate of fuel consumption per unit of power produced, where the power is provided by the Simscape model. It is then converted to fuel consumption in units of g/s for comparison purposes. As shown in
Figure 25b, the event-triggered double-MPC structure with DQNs completes the drive cycle under consideration with a lower fuel consumption rate in grams per second. In addition, integrating the fuel consumption rate gives a total consumption of 102.076 g with the event-triggered double-MPC structure with DQNs, compared with 104.996 g with the classic double-MPC structure.
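The conversion from BSFC to a fuel rate and the integration to a total can be sketched in Python. The BSFC value of 250 g/kWh, the engine power of 36 kW, and the time grid are hypothetical; only the two totals (104.996 g and 102.076 g) are taken from the results above. The unit conversion assumes BSFC in g/kWh and power in kW.

```python
from typing import Sequence


def fuel_rate_g_per_s(bsfc_g_per_kwh: float, power_kw: float) -> float:
    """Convert BSFC (g/kWh) and engine power (kW) to a fuel rate in g/s."""
    return bsfc_g_per_kwh * power_kw / 3600.0


def integrate_trapezoid(t: Sequence[float], y: Sequence[float]) -> float:
    """Trapezoidal integral of y over t, e.g. a fuel rate in g/s over time in s."""
    total = 0.0
    for i in range(1, len(t)):
        total += 0.5 * (y[i] + y[i - 1]) * (t[i] - t[i - 1])
    return total


def relative_saving(baseline: float, improved: float) -> float:
    """Fractional reduction of the improved value with respect to the baseline."""
    return (baseline - improved) / baseline


# Hypothetical operating point held for two seconds.
rate = fuel_rate_g_per_s(250.0, 36.0)
fuel_g = integrate_trapezoid([0.0, 1.0, 2.0], [rate, rate, rate])

# Totals reported for the classic and the event-triggered DQN structures.
saving = relative_saving(104.996, 102.076)
```

With the reported totals, the relative saving evaluates to about 2.8%.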
After the triggering instances and weight training, the execution time, triggering number, average velocity error, and motor and engine efficiency values are as shown in
Table 10 for the WLTP reference.
The purpose of the event-triggered mechanism is to reduce the computational cost by triggering the controller less frequently while following the velocity reference as closely as possible. The execution time of a simulation depends on various factors, such as the complexity of the simulated system, the accuracy and resolution of the simulation model, the available computational resources, and the simulation software. Averaging multiple measurements of the execution time reduces the impact of transient factors and improves the reliability of the result. For this reason, the execution time is measured more than ten times with the Simulink Profiler, and the average execution time is reported.
As can be seen in
Table 11, the classic double-MPC structure and the event-triggered double-MPC structure are compared. In a Simulink Profiler report, time refers to the total execution time of the subsystem, including the time spent executing its own code as well as any code in the MPC subsystem. Self-time refers only to the time spent executing the code directly within the MPC, excluding the time spent in its elements that transfer the outputs. Calls indicates the number of triggering events, and the time/call metric is the average execution time of the MPCs per function call. The results represent the total for the two MPCs. According to the results, the event-triggered mechanism reduces the execution time by triggering the controllers less often.
After this stage, the weights of the MPC are trained with DQN to distribute the torque more efficiently. According to the results, the efficiency of the motor increases by 3.61%, while the efficiency of the engine increases by 2.86%. This efficiency increase is achieved with 52.01% fewer controller triggers. Thus, the computational cost is reduced while the energy efficiency is increased with MPC and DQN.