#### *2.4. Aging and Fuel Economy Trade-Off*

This paper analyzes the trade-off between battery aging and energy consumption in an electric vehicle with a HESS. This section briefly touches on why there must necessarily be a trade-off.

Consider the two paths along which power can flow through the HESS, shown in Figure 5. In the first case, power flows primarily or entirely along the upper path, directly between the battery and the electric motor. The ultracapacitor is then used marginally or not at all, so there is little to no change in either energy consumption or battery aging compared to a conventional EV without an ultracapacitor. In the second case, power flows primarily along the lower path, so that the ultracapacitor is heavily used and acts as a buffer between the battery and the electric motor. On one hand, the power flowing to or from the battery can now be shaped to reduce aging factors such as large currents. On the other hand, the ultracapacitor introduces new resistances into the energy storage system, as well as converter inefficiencies, resulting in increased losses. Any use of the ultracapacitor to reduce battery aging therefore incurs additional energy losses from the internal resistance of the ultracapacitor and its converter. In short, battery lifespan cannot be extended without an increase in energy consumption.

**Figure 5.** HESS power paths. Power on the direct path between battery and motor experiences minimal losses, while power on the path through the UC experiences additional losses from the UC internal resistance.

#### **3. Control**

To fully investigate the benefits of aging-aware control, seven types of energy management systems are considered: three based on deterministic dynamic programming (DDP-B, DDP-EC, and DDP-P), three based on stochastic dynamic programming (SDP-B, SDP-EC, and SDP-P), and a Load Leveling controller.


#### *3.1. Dynamic Programming*

The first six strategies use DP to generate an optimal controller: the first three use DDP and the remaining three use SDP. The development of DP for HEV energy management is well covered in the literature, such as [40–43]. For both DDP and SDP, the optimization problem considers a discrete-time dynamic system

$$x(k+1) = f(x(k), u(k), w(k))\tag{42}$$

where *x*(*k*) is the state vector at time *k*, *u*(*k*) is the control vector, and *w*(*k*) is a vector of any inputs or disturbances. *x*, *u*, and *w* are assumed to exist in finite ranges *x* ∈ *X*, *u* ∈ *U*, and *w* ∈ *W*.

DDP uses exact knowledge of the driver behavior, including knowledge of future behavior, to minimize a given cost function over the complete driving trajectory.

$$J = \sum\_{k=0}^{N} L(x(k), u(k), w(k))\tag{43}$$

where *L*(*x*, *u*, *w*) is an instantaneous cost function, and *x*, *u*, and *w* are the state variables, controlled variables, and system inputs, respectively. Equation (43) is minimized by solving a recursive cost-to-go function

$$V(\mathbf{x}, N) = \min\_{u \in \mathcal{U}} \{ L(\mathbf{x}, u, w(N)) \} \tag{44}$$

$$V(\mathbf{x},k) = \min\_{u \in \mathcal{U}} \{ L(\mathbf{x}, u, w(k)) + V(f(\mathbf{x}, u, w(k)), k+1) \} \tag{45}$$

$$\text{for } k = N - 1, \dots, 0$$

starting at *k* = *N* and working backward through time to *k* = 0. The key point of the DDP method is that, at each optimization step, the entire cost from the current time *k* to the final time *N* is minimized, not just the instantaneous cost. *V*(*x*, *k*) is evaluated for each *x* ∈ *X*, so that *V*(*f*(*x*, *u*, *w*(*k*)), *k* + 1) can be interpolated from the prior update. The optimal control is found by a direct search of *u* ∈ *U*. Then, the optimal control *u*<sup>∗</sup> is given by

$$u^\*(\mathbf{x}, k) = \arg\min\_{u \in \mathcal{U}} \{ L(\mathbf{x}, u, w(k)) + V(f(\mathbf{x}, u, w(k)), k+1) \} \tag{46}$$
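
As a concrete illustration of Equations (44)–(46), the sketch below implements the backward recursion in Python. It assumes a single scalar state for readability (the controllers in this paper use two states), and the `dynamics` and `stage_cost` callables are placeholders standing in for the system model of Equation (42) and the cost functions of Section 3.1; none of the names or grid choices come from this research.

```python
import numpy as np

def ddp_solve(x_grid, u_grid, w_profile, dynamics, stage_cost):
    """Deterministic DP (Eqs. 44-46): backward recursion over a fully known
    disturbance profile w_profile[k], k = 0..N. Illustrative sketch only."""
    N = len(w_profile) - 1
    n_x = len(x_grid)

    V = np.zeros((N + 1, n_x))        # cost-to-go V(x, k) on the state grid
    u_opt = np.zeros((N + 1, n_x))    # optimal control u*(x, k)

    # Terminal update, Eq. (44): only the instantaneous cost at k = N.
    for i, x in enumerate(x_grid):
        costs = [stage_cost(x, u, w_profile[N]) for u in u_grid]
        V[N, i] = min(costs)
        u_opt[N, i] = u_grid[int(np.argmin(costs))]

    # Backward recursion, Eqs. (45)-(46), for k = N-1, ..., 0.
    for k in range(N - 1, -1, -1):
        for i, x in enumerate(x_grid):
            costs = []
            for u in u_grid:
                x_next = dynamics(x, u, w_profile[k])
                # Interpolate V(., k+1) because x_next falls between grid points.
                future = np.interp(x_next, x_grid, V[k + 1])
                costs.append(stage_cost(x, u, w_profile[k]) + future)
            V[k, i] = min(costs)
            u_opt[k, i] = u_grid[int(np.argmin(costs))]

    return V, u_opt
```

The direct search over `u_grid` and the interpolation of the prior update mirror the procedure described above; only the grid resolutions and the cost function change from strategy to strategy.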

Meanwhile, SDP uses a stochastic model of driver behavior to anticipate the future driver power or torque requests and minimize the expected value of a given cost function

$$J = \mathbb{E}\left[\sum\_{k=0}^{N} \gamma^k L(x(k), u(k), w(k))\right] \tag{47}$$

where E[·] denotes the expected value, and *γ* is a discount factor, 0 < *γ* < 1, that allows the cost function to converge as *N* → ∞. Equation (47) is again minimized with a recursive cost-to-go function

$$V(\mathbf{x}, w, N) = \min\_{\mathbf{u} \in \mathcal{U}} \{ L(\mathbf{x}, \mathbf{u}, w) \} \tag{48}$$

$$V(\mathbf{x}, w, k) = \min\_{\mathbf{u} \in \mathcal{U}} \{ L(\mathbf{x}, \mathbf{u}, \mathbf{w}) + \gamma \cdot \mathbb{E} [V(f(\mathbf{x}, \mathbf{u}, \mathbf{w}), \mathbf{w}, k + 1)] \}\tag{49}$$

$$\text{for } k = N - 1, \dots, 0$$

where, this time, the expected future costs are considered, rather than the exact future costs.

The SDP problem can be treated as a finite horizon problem, where *N* is a fixed number of updates. Alternatively, it can be treated as an infinite horizon problem, where *N* is arbitrarily large and the updates to the cost-to-go function are carried out until the control policy converges, in other words

$$V(\mathbf{x}, w, k) = V(\mathbf{x}, w, k+1) \,\forall \, \mathbf{x} \in X \text{ and } w \in W. \tag{50}$$

As noted earlier, a value of 0 < *γ* < 1 ensures convergence of the cost-to-go function as *N* → ∞ [44]. Then, the optimal control *u*<sup>∗</sup> is given by

$$u^\*(\mathbf{x}, w) = \arg\min\_{u \in \mathcal{U}} \{ L(\mathbf{x}, u, w) + \gamma \cdot \mathbb{E}[V(f(\mathbf{x}, u, w), w, 1)] \}. \tag{51}$$

That is, the control optimizes the final update of the cost-to-go function. Although the SDP problem is solved backwards in time like the DDP problem, the resulting control policy is both time-invariant and causal. This is because the SDP problem does not require future knowledge of *w*; instead, it relies on the time-invariant stochastic model.
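
In its infinite-horizon form, Equations (48)–(51) can be sketched as a value iteration that repeats the cost-to-go update until the convergence condition of Equation (50) is met. In the hypothetical Python sketch below, the stochastic driver model is represented by an assumed Markov transition matrix `w_trans`, with the expectation taken over the next disturbance value; as before, `dynamics` and `stage_cost` are placeholders, not the implementation used in this paper.

```python
import numpy as np

def sdp_solve(x_grid, u_grid, w_grid, w_trans, dynamics, stage_cost,
              gamma=0.95, tol=1e-6, max_iter=10_000):
    """Stochastic DP (Eqs. 48-51), infinite-horizon form. w_trans[j, j2] is an
    assumed Markov transition probability P(w_j2 | w_j). Illustrative sketch."""
    n_x, n_w = len(x_grid), len(w_grid)
    V = np.zeros((n_x, n_w))          # time-invariant cost-to-go V(x, w)
    policy = np.zeros((n_x, n_w))     # causal control policy u*(x, w)

    for _ in range(max_iter):
        V_new = np.empty_like(V)
        for i, x in enumerate(x_grid):
            for j, w in enumerate(w_grid):
                costs = []
                for u in u_grid:
                    x_next = dynamics(x, u, w)
                    # Interpolate V(., w') for each possible next disturbance,
                    # then take the expectation over the Markov model.
                    future = np.array([np.interp(x_next, x_grid, V[:, j2])
                                       for j2 in range(n_w)])
                    expected = w_trans[j] @ future
                    costs.append(stage_cost(x, u, w) + gamma * expected)
                V_new[i, j] = min(costs)
                policy[i, j] = u_grid[int(np.argmin(costs))]
        diff = np.max(np.abs(V_new - V))
        V = V_new
        if diff < tol:                # Eq. (50): cost-to-go has converged
            break

    return V, policy
```

Because the converged policy depends only on the current state and disturbance, it can be stored and used causally, as discussed above.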

For this research, the state variables are the ultracapacitor state of charge *SOCc* and the battery depth of discharge for the current cycle *DoD*. The controlled variable is the power allotted to the ultracapacitor *Puc*. The driver power request *Preq* is an input to the controller. For DDP, the power request is a precisely known function of time, while, for SDP, the future power request is estimated from the current driver power request and the current vehicle wheel speed *ωwh*, based on the stochastic model described in [40].

It should be noted that, in general, dynamic programming control strategies are considered too computationally expensive to run in real time on a vehicle [5]. Instead, the control policy must be computed off-line and implemented on the vehicle using a lookup table. This approach requires quantizing the variables into discrete grids of points; linear interpolation can then be used to find the optimal control at any given operating point. The implementation of such lookup tables has been shown to operate well in real time [45].
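
As an illustration of such a lookup-table implementation, the sketch below stores a hypothetical policy on a two-dimensional grid (UC SOC and driver power request only, for brevity) and interpolates it linearly at run time; the grid ranges, table contents, and function names are placeholders rather than values from this research.

```python
import numpy as np
from scipy.interpolate import RegularGridInterpolator

# Placeholder grids and policy table; in practice puc_table would hold the
# converged off-line DP policy u*(SOC_c, P_req).
soc_grid = np.linspace(0.2, 0.9, 15)            # UC state of charge grid
preq_grid = np.linspace(-40e3, 80e3, 25)        # driver power request grid [W]
puc_table = np.zeros((soc_grid.size, preq_grid.size))

policy_lut = RegularGridInterpolator((soc_grid, preq_grid), puc_table,
                                     method="linear", bounds_error=False,
                                     fill_value=None)

def controller_step(soc_c, p_req):
    """One real-time control update: interpolate the UC power command."""
    return float(policy_lut([(soc_c, p_req)])[0])
```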

The three strategies employed by DDP and SDP in this research each include a component in their respective instantaneous cost functions *L*(*x*, *u*, *w*) that penalizes battery aging and a component that penalizes deviation of the UC SOC from a target value *SOCc*,*tgt* = 60%. The SOC deviation penalty serves two purposes. First, it helps to maintain the UC's readiness to handle large currents: if the UC is near its maximum charge, it may be unable to accept a large charging power request, and if it is near its minimum charge, it may be unable to supply a large discharging power request, both of which can strain the battery. Keeping the charge near a central value combats this problem. Second, by varying the penalty on the deviation from the target value, the extent to which the ultracapacitor is used for aging control can be tuned, allowing for a better comparison of lifespan improvements between simulation cases. The manner in which the instantaneous cost functions penalize aging varies, as described below.

The first strategy, employed by DDP-B and SDP-B, directly penalizes battery aging according to

$$L(\mathbf{x}, u, w) = (\text{SOC}\_{c} - \text{SOC}\_{c,\text{tgt}})^2 + Q\_{1, \Delta D} \cdot \Delta D \tag{52}$$

where Δ*D* is the damage to the battery as a result of a given control decision, as given in Equation (38), and *Q*1,Δ*D* is a tuned weighting parameter. This strategy is denoted as DP-B when referring to the DDP-B and SDP-B types together.

The second, used for DDP-EC and SDP-EC, penalizes a combination of battery aging, ultracapacitor aging rate, and electrical energy losses according to

$$L(\mathbf{x}, u, w) = Q\_{2, \text{SOC}} (\text{SOC}\_{c} - \text{SOC}\_{c, \text{tgt}})^2 + Q\_{2, \Delta D} \cdot \Delta D + Q\_{2, \text{SOA}} \cdot \frac{d\text{SOA}}{dt} + Q\_{2, \text{loss}} \cdot E\_{\text{loss}} \tag{53}$$

where *Eloss* is the combined energy loss in the battery and ultracapacitor, obtained from

$$E\_{loss} = R\_{eq} I\_{batt}^2 + R\_{uc,pack} I\_{uc,pack}^2 \tag{54}$$

where *Req* is the battery pack series resistance, *Ibatt* is the current through the battery, *Ruc*,*pack* is the ultracapacitor pack series resistance, and *Iuc*,*pack* is the current through the ultracapacitor pack, per the models presented in Section 2.1. Returning to Equation (53), the *Q*2,*i* terms are weighting parameters for their respective elements in the cost function. The three weighting parameters *Q*2,Δ*D*, *Q*2,*SOA*, and *Q*2,*loss* are set according to industry-average prices for lithium-ion batteries, ultracapacitors, and energy from the electrical grid [46,47], such that the battery aging, ultracapacitor aging, and energy loss terms are all equally weighted based on their real-world values. The remaining parameter *Q*2,*SOC* is then used to tune the strategy. This strategy is denoted as DP-EC when referring to the DDP-EC and SDP-EC types together.

The third and final strategy does not directly penalize aging; rather, it penalizes large power flows to or from the battery

$$L(\mathbf{x}, u, w) = (\text{SOC}\_{c} - \text{SOC}\_{c,\text{tgt}})^2 + Q\_{3, P} \cdot P\_{batt}^2 \tag{55}$$

where *Q*3,*P* is a tuned weighting parameter and *Pbatt* is the power going to or from the battery, per Equation (12). In this way, battery damage is limited using only the simple knowledge that large currents to and from the battery degrade it. Thus, we can distinguish the benefits of direct aging control in the DDP-B, SDP-B, DDP-EC, and SDP-EC strategies from the benefits of DP control generally. This strategy is denoted as DP-P when referring to the DDP-P and SDP-P types together.
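
For reference, the three instantaneous cost functions of Equations (52), (53)–(54), and (55) could be written in code as below. This is an illustrative sketch only: the weighting parameters are left as arguments, and the price-based values used in this research to set *Q*2,Δ*D*, *Q*2,*SOA*, and *Q*2,*loss* are not reproduced.

```python
# Illustrative stage costs for the DP-B, DP-EC, and DP-P strategies; all Q_*
# weights are placeholder tuning parameters, not the values used in the paper.
SOC_TGT = 0.60   # ultracapacitor SOC target (60%)

def cost_dp_b(soc_c, delta_d, q1_dd):
    """DP-B, Eq. (52): SOC deviation plus a direct battery-damage penalty."""
    return (soc_c - SOC_TGT) ** 2 + q1_dd * delta_d

def cost_dp_ec(soc_c, delta_d, dsoa_dt, i_batt, i_uc_pack,
               r_eq, r_uc_pack, q2_soc, q2_dd, q2_soa, q2_loss):
    """DP-EC, Eqs. (53)-(54): battery aging, UC aging rate, and energy losses;
    q2_dd, q2_soa, and q2_loss would be derived from component/energy prices."""
    e_loss = r_eq * i_batt ** 2 + r_uc_pack * i_uc_pack ** 2   # Eq. (54)
    return (q2_soc * (soc_c - SOC_TGT) ** 2
            + q2_dd * delta_d + q2_soa * dsoa_dt + q2_loss * e_loss)

def cost_dp_p(soc_c, p_batt, q3_p):
    """DP-P, Eq. (55): penalize large battery power rather than aging itself."""
    return (soc_c - SOC_TGT) ** 2 + q3_p * p_batt ** 2
```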

In this research, DDP is used to obtain the global-optimal control strategy for a given cost function and represents the best-case scenario for a controller type. SDP, on the other hand, represents a causal, implementable controller and offers a more realistic understanding of the capabilities of a given cost function design. Because it is causal, it is also a better comparison to the Load Leveling controller. It should be noted that it is possible to adapt the results of DDP optimization into a causal rule base; however, this method is not used in this research.
