Article

Extending the BESS Lifetime: A Cooperative Multi-Agent Deep Q Network Framework for a Parallel-Series Connected Battery Pack

Department of Electrical, Electronic and Computer Engineering, University of Ulsan, Ulsan 44610, Republic of Korea
* Author to whom correspondence should be addressed.
Energies 2024, 17(18), 4604; https://doi.org/10.3390/en17184604
Submission received: 23 July 2024 / Revised: 27 August 2024 / Accepted: 11 September 2024 / Published: 13 September 2024
(This article belongs to the Section D2: Electrochem: Batteries, Fuel Cells, Capacitors)

Abstract

In this paper, we propose a battery management algorithm to maximize the lifetime of a parallel-series connected battery pack with heterogeneous states of health in a battery energy storage system. The growing number of retired lithium-ion batteries from electric vehicles is expanding the applications of battery energy storage systems, which typically group multiple individual batteries with heterogeneous states of health in parallel and in series to achieve the required voltage and capacity. However, previous work has primarily focused on either parallel or series connections of batteries due to the complexity of managing diverse battery states, such as state of charge and state of health. To address scheduling in parallel-series connections, we propose a cooperative multi-agent deep Q network framework that leverages multi-agent deep reinforcement learning to observe multiple states within the battery energy storage system and to optimize the scheduling of cells and modules in a parallel-series connected battery pack. Our approach not only balances the states of health across cells and modules but also extends the overall lifetime of the battery pack. Through simulation, we demonstrate that our algorithm extends the battery pack’s lifetime by up to 16.27% compared to previous work and is robust to varying power demand conditions.

1. Introduction

Lithium-ion batteries used in electric vehicles (EVs) have a finite lifetime and, for safety reasons, must be replaced when their capacity drops to 80% or below [1]. However, these batteries can still be repurposed for less demanding applications for which their remaining capacity is sufficient. Figure 1 illustrates the life cycle of lithium-ion batteries, where a battery energy storage system (BESS) can effectively utilize retired batteries whose state of health (SOH) is between 60% and 80% [2,3]. The BESS comprises one or more battery packs, each of which uses a group of battery cells connected in parallel and in series [4] to store electrical energy as backup power for households, data centers, charging stations, etc. [5]. Battery cells in a battery pack can be connected in one of the two architectures shown in Figure 2: (a) a module groups the batteries in series, and the modules are then connected in parallel (denoted S-P), which is useful for high-voltage applications; and (b) a module groups batteries in parallel, and the modules are then connected in series (denoted P-S) for applications requiring high capacity. In the past few years, many projects around the world have implemented a BESS by repurposing EV batteries. In the Netherlands, a 2.8 MWh BESS was installed for the Johan Cruijff Arena in 2018 by reusing Nissan LEAF battery packs, each consisting of 192 cells, in an S-P connection [6]. In Finland, a 2.6 MWh BESS was built in 2021 as a backup power resource for the power grid by repurposing Tesla Model S battery packs, each consisting of 7104 cells, in a P-S connection [7]. Table 1 summarizes recent projects that repurposed EV batteries for a BESS. By leveraging a BESS, the demand for new batteries can be significantly reduced, thereby lessening the environmental impact of battery production. However, a BESS that reuses retired lithium-ion batteries from EVs still has some practical limitations.
One significant challenge in implementing a BESS is the varying capacity of the repurposed batteries connected in parallel and in series, which reflects their heterogeneous SOHs. Cells with heterogeneous SOHs within a retired battery pack can cause capacity and voltage imbalances, leading to inefficient energy storage and distribution [8]. This capacity mismatch can cause weaker cells to discharge faster or to overheat, potentially shortening the lifetime of the entire battery pack and posing safety risks. Furthermore, the varying SOHs among the cells complicate the task of maintaining balanced charging or discharging across the battery pack, making it difficult to achieve optimal performance and lifetime [9]. As a result, the cells and modules need to be appropriately connected or bypassed (known as scheduling) to mitigate these issues and ensure the reliable operation of a BESS that reuses EV batteries.
Table 1. Projects around the world that reused batteries from EVs.
| Project name | Applications | Capacity | EV model | Battery pack configuration |
|---|---|---|---|---|
| Johan Cruijff Arena (Netherlands) [6] | PV power supply, emergency supply | 2.8 MWh | 590 Nissan LEAF battery packs | 96S-2P ¹ (192 cells) |
| Former coal-fired power plant in Elverlingsen (Germany) [7] | Energy storage system for households | 3.0 MWh | 72 Renault Zoe battery packs | 96S-2P (192 cells) |
| Cactos One Energy Storages (Finland) [10] | Energy storage system for households | 2.6 MWh | Tesla Model S battery packs | 74P-96S ² (7104 cells) |
| EUREF Campus (Germany) [11] | Multi-use storage unit that compensates for fluctuations in the grid | 1.9 MWh | Audi battery packs | 4P-108S (432 cells) |
| TGN Energy battery energy storage (Norway) [12] | Increased self-consumption | 216 kWh | Mercedes-Benz battery packs | N/A |
| Landafors hydropower plant (Sweden) [13] | Fast frequency reserve regulation for the power markets | 250 kWh | 48 Volvo plug-in hybrid battery packs | N/A |

¹ Ninety-six series-connected batteries form a module, and two modules are connected in parallel. ² Seventy-four parallel-connected batteries form a module, and 96 modules are connected in series.
In the BESS, switches are integrated to schedule cells and modules by connecting or bypassing them [14]. These switches enable scheduling to selectively isolate degraded cells or modules, thereby extending the battery pack’s useful lifetime [15]. By dynamically adjusting the connections between cells and modules, the load can be balanced effectively and the impact of lower-capacity cells and modules can be mitigated. Scheduling not only improves the reliability and efficiency of the BESS but also reduces the need for new batteries by maximizing the utilization of existing battery resources. To achieve this, scheduling requires all the states of the BESS, including the state of charge (SOC) and the SOH of the cells and modules, the terminal voltage and output current of the battery pack, the power demand (e.g., from households), and the available power supply (e.g., from solar energy) [16,17,18]. The SOC of a battery indicates its current charge level as a percentage of its maximum capacity, whereas the SOH is the ratio of the battery’s current maximum capacity to its original rated capacity. The required power demand and available power supply, collectively known as external systems information, refer to the amount of power the BESS needs to discharge and recharge. Incorporating the BESS states into the scheduling policy protects the battery pack by preventing excessive charging or discharging and helps connect or bypass cells and modules appropriately. Furthermore, scheduling balances the SOC and SOH across cells and modules, ensuring that no single cell or module is overloaded [19]. In particular, SOH balancing through scheduling reduces the SOH differences among cells by using the cells with higher SOHs and bypassing those with the lowest SOHs. In this way, the degradation rate of cells or modules with lower SOHs is reduced. This balance distributes the load more evenly, further protecting the battery pack and extending its useful lifetime.
Scheduling for battery packs in a BESS has been explored in the literature. In [20], an adaptive control algorithm was proposed to balance the SOCs of a series-connected battery pack. However, this approach overlooked the SOH and did not address parallel connections, limiting its effectiveness for comprehensive battery management. In [21], a controller was designed to balance the SOCs of parallel-connected batteries to prevent overcharge or overdischarge, but it did not consider SOHs or series connections. In [15], SOC balancing in parallel-series connections was achieved through automatic configuration according to the dynamic load, the storage demand, and the condition of each cell (i.e., SOC and current), yet the approach ignored the SOH, which affects BESS lifetime. Other methods, proposed in [18,22], aimed to balance SOHs by adjusting the charge and discharge durations of cells with weaker SOHs, but they ignored the SOC and external systems information. Traditional methods such as those in [15,18,20,21,22] are limited because they do not consider both SOC and SOH; ignoring either one can lead to suboptimal performance and a reduced battery lifetime [19].
Deep reinforcement learning (DRL) has become a promising direction for battery pack scheduling owing to its ability to observe multiple states in a BESS and to learn scheduling policies that solve the formulated optimization problem. The combination of neural networks with reinforcement learning in DRL has proven to be a significant breakthrough, enabling more scalable and efficient battery management strategies [23]. Unlike traditional reinforcement learning, which struggles with high-dimensional state spaces, DRL uses neural networks to approximate value functions and policies more effectively [24], allowing it to manage a larger number of cells and modules while responding in real time to critical factors such as SOC, SOH, and external systems information. In [17], a DRL-based SOH balancing framework was proposed for a series-connected battery pack to minimize the SOH imbalance among battery cells by observing the cell SOCs and SOHs. However, this approach did not observe factors such as power demand, terminal voltage, and output current, which are essential for effective switch scheduling and battery pack longevity. In [16], the authors proposed a DRL-based battery management algorithm to maximize the lifetime of retired batteries with varying SOHs in a parallel-connected battery pack, but they did not schedule series-connected modules, so the approach is unsuitable for applications requiring higher voltages. Moreover, the computational complexity in [16,17] was relatively high due to the large state space considered, and both proposals relied on a single agent, which can struggle to scale to a large number of cells and modules.
In [25], a multi-agent DRL-based method was proposed to reduce the SOC and SOH imbalance among battery cells, but it overlooked external systems information, which directly affects battery pack lifetime because it is needed to prevent overcharging and overdischarging, especially of cells or modules with lower SOHs. Table 2 classifies the battery scheduling algorithms discussed above.
In this study, we propose a battery management algorithm to maximize the BESS lifetime in a parallel-series connected battery pack with heterogeneous SOHs. To this end, the proposed algorithm first jointly estimates the SOC and SOH of all cells online. Then, based on the SOCs and SOHs of the cells and modules, a cooperative multi-agent deep Q network framework schedules the switches in the parallel-series connected battery pack by connecting or bypassing battery cells and modules. The proposed algorithm maximizes the battery pack lifetime and reduces the SOH imbalance among cells and modules. It also adapts to changes in the external systems (i.e., power demand and available power supply). We demonstrate the effectiveness of the proposed algorithm through simulations using real measured data, in comparison with previous work.
The rest of this paper is organized as follows: Section 2 describes the proposed parallel-series connected battery pack and the associated scheduling challenges. Section 3 formulates the optimization problem of minimizing the SOH reduction in the battery pack. Section 4 presents the framework of the proposed algorithm. Section 5 details the simulation setup, results, and the algorithm’s impact. Finally, Section 6 concludes this work.

2. System Model

In this paper, we consider a parallel-series connected BESS [15] with a power supply (e.g., a solar energy generator) and a load (e.g., a household), as shown in Figure 3. The BESS comprises a parallel-series connected battery pack and a battery management system (BMS) that controls charging and discharging. We consider a discrete-time model with time slots $t$ ($t = 0, 1, 2, \ldots$), each of duration $\Delta t$.

2.1. Parallel-Series Connected Battery Pack Model

The battery pack consists of $m \times n$ battery cells, where $m$ battery modules are connected in series to increase the voltage at the battery pack terminals, and $n$ battery cells are connected in parallel within each module to provide a higher current (or capacity), as shown in Figure 3. To schedule modules and cells in the battery pack, we consider $m \times (n + 1)$ controllable switches. Specifically, at time slot $t$, switch $X_{ij}(t)$ ($i \in \{1, 2, \ldots, m\}$, $j \in \{1, 2, \ldots, n\}$) connects or disconnects battery cell $j$ of module $i$, denoted as cell $(i, j)$, and switch $X_i(t)$ connects or bypasses module $i$ in the battery pack. Switch $X_{ij}(t)$ for cell $(i, j)$ at time slot $t$ is defined as
$$X_{ij}(t) = \begin{cases} 1, & \text{if cell } (i, j) \text{ is connected}, \\ 0, & \text{if cell } (i, j) \text{ is bypassed}. \end{cases} \tag{1}$$
Switch $X_i(t)$ for module $i$ at time slot $t$ is defined as
$$X_i(t) = \begin{cases} 1, & \text{if } X_{ij}(t) = 0 \text{ for all } j, \\ 0, & \text{otherwise}. \end{cases} \tag{2}$$
Switch $X_i(t)$ is turned ON (1) if all $n$ cells in module $i$ are disconnected (0), which means that module $i$ is bypassed from the power supply or load at time slot $t$; this ensures that the charging or discharging process of the battery pack is not interrupted. Otherwise, switch $X_i(t)$ is turned OFF (0) if any cell in module $i$ is connected (1), which means that module $i$ can be charged or discharged at time slot $t$.
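Eq. (2) can be written as a one-line rule over the cell switches. The following is a minimal Python sketch, assuming the cell switches of each module are stored as a list of 0/1 values; the function and variable names are illustrative and not taken from the paper.

```python
from typing import List


def module_bypass_switches(cell_switches: List[List[int]]) -> List[int]:
    """Derive the module switches X_i(t) from the cell switches X_ij(t), Eq. (2).

    cell_switches[i][j] is 1 if cell (i, j) is connected and 0 if it is bypassed.
    X_i(t) = 1 (module bypassed) only when every cell of module i is disconnected,
    which keeps the series string closed; otherwise X_i(t) = 0.
    """
    return [1 if all(x == 0 for x in row) else 0 for row in cell_switches]


# Hypothetical 3 x 2 pack: the second module has both of its cells bypassed.
X_cells = [[1, 0], [0, 0], [1, 1]]
print(module_bypass_switches(X_cells))  # [0, 1, 0]
```

Deriving $X_i(t)$ from the cell switches, rather than choosing it independently, rules out a module that is left open with no connected cell, which would interrupt the pack current.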
Each battery cell is modeled according to a second-order Thévenin equivalent model [26] based on the material structure of the lithium-ion battery [25], as shown in Figure 4. Battery cell $(i, j)$, the $j$th cell in module $i$, has electrochemical impedance spectroscopy (EIS) parameters including open-circuit voltage $V^{O}_{ij}$, internal resistance $R^{S}_{ij}$, and two polarization RC pairs connected in series; each RC pair consists of a resistor and a capacitor connected in parallel, i.e., $(R^{P1}_{ij}, C^{P1}_{ij})$ and $(R^{P2}_{ij}, C^{P2}_{ij})$. The terminal voltage of cell $(i, j)$ at time slot $t$, $V_{ij}(t)$, is calculated by
$$V_{ij}(t) = V^{O}_{ij}(t) - V^{P1}_{ij}(t) - V^{P2}_{ij}(t) - R^{S}_{ij}(t)\, I_{ij}(t), \tag{3}$$
where $I_{ij}(t)$ is the measured current of cell $(i, j)$ at time slot $t$, and $V^{P1}_{ij}(t)$ and $V^{P2}_{ij}(t)$ are the polarization voltages of the RC pairs $(R^{P1}_{ij}, C^{P1}_{ij})$ and $(R^{P2}_{ij}, C^{P2}_{ij})$, respectively, in cell $(i, j)$ at time slot $t$. They are calculated from the EIS parameters at the previous time slot $t-1$ as [26]
$$V^{P1}_{ij}(t) = e^{-\frac{\Delta t}{R^{P1}_{ij}(t-1)\, C^{P1}_{ij}(t-1)}}\, V^{P1}_{ij}(t-1) + R^{P1}_{ij}(t-1) \left( 1 - e^{-\frac{\Delta t}{R^{P1}_{ij}(t-1)\, C^{P1}_{ij}(t-1)}} \right) I_{ij}(t-1) \tag{4}$$
and
$$V^{P2}_{ij}(t) = e^{-\frac{\Delta t}{R^{P2}_{ij}(t-1)\, C^{P2}_{ij}(t-1)}}\, V^{P2}_{ij}(t-1) + R^{P2}_{ij}(t-1) \left( 1 - e^{-\frac{\Delta t}{R^{P2}_{ij}(t-1)\, C^{P2}_{ij}(t-1)}} \right) I_{ij}(t-1). \tag{5}$$
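The cell model of Eqs. (3) to (5) can be stepped in a few lines of Python. This is a minimal sketch under simplifying assumptions that the paper does not make: the open-circuit voltage and the EIS parameters are held constant (in the paper they vary with SOC and SOH), both RC pairs start relaxed, and the parameter values in the example are illustrative only.

```python
import math
from typing import List, Sequence


def rc_step(v_p_prev: float, r_p: float, c_p: float, i_prev: float, dt: float) -> float:
    """Discrete update of one polarization RC pair, Eqs. (4)-(5)."""
    decay = math.exp(-dt / (r_p * c_p))
    return decay * v_p_prev + r_p * (1.0 - decay) * i_prev


def simulate_cell(currents: Sequence[float], dt: float, v_oc: float, r_s: float,
                  r_p1: float, c_p1: float, r_p2: float, c_p2: float) -> List[float]:
    """Terminal voltage of Eq. (3) for a current profile (positive = discharge).

    Assumptions of this sketch: constant OCV and EIS parameters, and zero
    polarization voltage and zero current before the first slot.
    """
    v_p1 = v_p2 = 0.0
    i_prev = 0.0
    voltages = []
    for i in currents:
        v_p1 = rc_step(v_p1, r_p1, c_p1, i_prev, dt)   # driven by I(t-1), Eq. (4)
        v_p2 = rc_step(v_p2, r_p2, c_p2, i_prev, dt)   # driven by I(t-1), Eq. (5)
        voltages.append(v_oc - v_p1 - v_p2 - r_s * i)  # Eq. (3)
        i_prev = i
    return voltages


# Illustrative parameters only: a 2 A discharge for 600 one-second slots.
v = simulate_cell([2.0] * 600, dt=1.0, v_oc=3.7, r_s=0.02,
                  r_p1=0.015, c_p1=2000.0, r_p2=0.03, c_p2=20000.0)
print(len(v), round(v[0], 3))  # 600 3.66
```

With these values the fast pair ($R^{P1} C^{P1} = 30$ s) settles within a few minutes, while the slow pair ($R^{P2} C^{P2} = 600$ s) is still relaxing at the end of the profile, illustrating how the two RC pairs capture two time scales.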
Based on the measurement data (terminal voltage and current) and the EIS parameters, the BMS estimates the SOC and SOH of each cell in order to schedule the switches in the battery pack. Details are explained in the following subsection.

2.2. Battery Management System

The BMS schedules the switches in the battery pack and its interaction with the external systems (i.e., the power supply or load demand) according to the SOC and SOH estimates, as shown in Figure 3. To estimate SOC and SOH, the BMS monitors the voltage, current, and temperature of each battery cell. We define the SOC of cell $(i, j)$ at time slot $t$ as its level of charge relative to its maximum capacity by [18]
$$\mathrm{SOC}_{ij}(t) = \mathrm{SOC}_{ij}(t-1) - \frac{\eta\, \Delta t\, I_{ij}(t-1)}{M_{ij}(t)}, \tag{6}$$
where $M_{ij}(t)$ is the estimated capacity of cell $(i, j)$ at time slot $t$, and $\eta$ is the Coulombic efficiency of the discharging and charging processes. To simplify Equation (6), the measured current $I_{ij}(t-1)$ is taken as positive during discharge and negative during charge.
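Eq. (6) is plain Coulomb counting. A minimal Python sketch follows; the function name, the default $\eta = 1$, and the use of ampere-hours and hours are assumptions for illustration (the units only need to be consistent).

```python
def soc_update(soc_prev: float, i_prev: float, capacity: float, dt: float,
               eta: float = 1.0) -> float:
    """Coulomb-counting SOC update of Eq. (6).

    i_prev is positive for discharge and negative for charge. capacity and dt must
    use consistent units (ampere-hours and hours here, an assumption of this sketch).
    """
    return soc_prev - eta * dt * i_prev / capacity


# Hypothetical 2.0 Ah cell: discharge at 1 A for half an hour, then charge back.
soc = soc_update(0.8, 1.0, 2.0, 0.5)    # SOC drops by 0.25, to about 0.55
soc = soc_update(soc, -1.0, 2.0, 0.5)   # negative current restores about 0.8
```

Because $M_{ij}(t)$ shrinks as a cell ages, the same current moves the SOC of a retired cell faster than that of a new one, which is why the capacity is re-estimated online in Section 4.1.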
In module $i$, whose cells are connected in parallel, all cells share the same voltage while their individual currents add up, so the total capacity of the module increases cumulatively with each cell [27]. Therefore, the capacity $M_i(t)$ of module $i$ is the sum of the capacities of the individual cells, based on Kirchhoff’s current law [28]:
$$M_i(t) = \sum_{j=1}^{n} M_{ij}(t). \tag{7}$$
We define the SOC of module $i$ at time slot $t$ as the ratio of the total charge stored in all cells of module $i$ to the total capacity $M_i(t)$ of module $i$ [18]:
$$\mathrm{SOC}_i(t) = \frac{\sum_{j=1}^{n} \mathrm{SOC}_{ij}(t)\, M_{ij}(t)}{M_i(t)}. \tag{8}$$
The SOH of cell $(i, j)$ at time slot $t$, which is the ratio of its maximum capacity at time slot $t$ to its rated capacity [29], is defined as
$$\mathrm{SOH}_{ij}(t) = \frac{M_{ij}(t)}{M^{new}_{ij}} = \frac{M_{ij}(t)}{M^{new}}, \tag{9}$$
where $M^{new}_{ij}$ is the initial capacity of new cell $(i, j)$; in this paper, we consider all battery cells to be of the same type with the same initial capacity, $M^{new}_{ij} = M^{new}$ [30,31]. The cells in the BESS nevertheless exhibit heterogeneous SOHs. From Equation (7), the capacity $M_i(t)$ of module $i$ at time slot $t$ is the sum of the capacities of its parallel-connected cells. Therefore, combining this with Equation (9), the SOH of module $i$ at time slot $t$, $\mathrm{SOH}_i(t)$, is the average SOH of all parallel-connected cells in module $i$:
$$\mathrm{SOH}_i(t) = \frac{M_i(t)}{n\, M^{new}} = \frac{1}{n} \sum_{j=1}^{n} \mathrm{SOH}_{ij}(t), \tag{10}$$
where $\mathrm{SOH}_{ij}(t)$ is the SOH of cell $j$ in module $i$ at time slot $t$.
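Eqs. (7), (8), and (10) aggregate the cells of one module. A minimal Python sketch follows; the function name and the example capacities are hypothetical.

```python
from typing import Sequence, Tuple


def module_states(cell_socs: Sequence[float], cell_caps: Sequence[float],
                  m_new: float) -> Tuple[float, float, float]:
    """Aggregate parallel-connected cells into module capacity, SOC, and SOH.

    Returns (M_i, SOC_i, SOH_i) following Eqs. (7), (8), and (10).
    """
    m_i = sum(cell_caps)                                              # Eq. (7)
    soc_i = sum(s * c for s, c in zip(cell_socs, cell_caps)) / m_i    # Eq. (8)
    soh_i = m_i / (len(cell_caps) * m_new)                            # Eq. (10)
    return m_i, soc_i, soh_i


# Hypothetical module of three retired cells, each rated at 2.5 Ah when new.
m_i, soc_i, soh_i = module_states([0.9, 0.6, 0.5], [2.0, 1.6, 1.4], m_new=2.5)
# M_i = 5.0 Ah, SOC_i = 3.46 / 5.0 = 0.692, SOH_i = 5.0 / 7.5, about 0.667
```

The module SOC weights each cell by its capacity, so in this example the weak 1.4 Ah cell at 50% SOC pulls the module SOC down less than an unweighted average (about 0.667) would.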
The SOH of individual aging cells is not uniform, which leads to SOH inconsistencies between modules. Because the modules of the battery pack are connected in series, modules with lower SOHs decrease the terminal voltage and limit the fulfillment of demand [32]. Therefore, we define the SOH of the battery pack, $\mathrm{SOH}_P(t)$, as the lowest SOH among the series-connected modules [30,32]:
$$\mathrm{SOH}_P(t) = \min_{i} \mathrm{SOH}_i(t), \tag{11}$$
where $\mathrm{SOH}_i(t)$ is the SOH of module $i$ at time slot $t$. The SOH of the battery pack, $\mathrm{SOH}_P(t)$, which represents battery pack aging, is a non-increasing function until the end of its life cycle. In this paper, we define the lifetime of the battery pack, $T_{EoL}$, as the first time slot at which $\mathrm{SOH}_P(t)$ reaches a threshold $Th_{EoL}$, which is assumed to be an SOH of 60%, as seen in Figure 1:
$$T_{EoL} = \min \left\{ t \;:\; \mathrm{SOH}_P(t) \le Th_{EoL} \right\}. \tag{12}$$
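Eqs. (11) and (12) translate directly into two small functions. The Python sketch below is illustrative; the names and the SOH series are hypothetical, and the default threshold follows the 60% value stated above.

```python
from typing import Optional, Sequence


def pack_soh(module_sohs: Sequence[float]) -> float:
    """Eq. (11): the weakest series-connected module sets the pack SOH."""
    return min(module_sohs)


def end_of_life_slot(pack_soh_series: Sequence[float],
                     threshold: float = 0.6) -> Optional[int]:
    """Eq. (12): first time slot t with SOH_P(t) <= Th_EoL, or None if not reached."""
    for t, soh in enumerate(pack_soh_series):
        if soh <= threshold:
            return t
    return None


print(pack_soh([0.78, 0.71, 0.74]))                        # 0.71
print(end_of_life_slot([0.80, 0.70, 0.61, 0.60, 0.59]))   # 3
```

Because of the minimum in Eq. (11), only the weakest module determines $T_{EoL}$, which is why balancing the module SOHs, rather than raising the average, is what extends the pack lifetime.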
The BMS protects battery cells from overcharge, overdischarge, overheating, and excess current by controlling the BESS so that it fulfills the demand when discharging to the load and then recharges from the power supply, based on the SOC and SOH information. Note that the load and power supply change continuously under actual conditions; $d_D(t)$ and $d_C(t)$ denote the power demand and the available power supply, respectively, at time slot $t$, and are constrained as [16]
$$d_D(t)\, d_C(t) = 0, \quad \text{with } d_D(t), d_C(t) \ge 0, \tag{13}$$
which means that the battery pack only charges, discharges, or is idle at a given time slot. We define $t_D$ as the first time slot of a discharge process if $d_D(t_D) > 0$ and $d_D(t_D - 1) = 0$. Similarly, $t_C$ is the first time slot of a charge process if $d_C(t_C) > 0$ and $d_C(t_C - 1) = 0$. When discharging, the BMS considers the amount of discharged power, $l_D(t)$, and the maximum dischargeable power, $E_D(t)$, of the battery pack at time slot $t$ [33]:
$$l_D(t) = \sum_{i=1}^{m} \left[ \mathrm{SOC}_i(t_D) - \mathrm{SOC}_i(t) \right] M_i(t) \tag{14}$$
and
$$E_D(t) = \sum_{i=1}^{m} \left[ \mathrm{SOC}_i(t_D) - \mathrm{SOC}_{min} \right] M_i(t), \tag{15}$$
where $\mathrm{SOC}_{min}$ is the lower bound of the SOC to prevent overdischarge, and $t_D$ is the first time slot of the discharge process containing $t$. When charging, the BMS considers the amount of charged power, $l_C(t)$, and the maximum chargeable power, $E_C(t)$, of the battery pack at time slot $t$ [33]:
$$l_C(t) = \sum_{i=1}^{m} \left[ \mathrm{SOC}_i(t) - \mathrm{SOC}_i(t_C) \right] M_i(t) \tag{16}$$
and
$$E_C(t) = \sum_{i=1}^{m} \left[ \mathrm{SOC}_{max} - \mathrm{SOC}_i(t_C) \right] M_i(t), \tag{17}$$
where $\mathrm{SOC}_{max}$ is the upper bound of the SOC to prevent overcharging, and $t_C$ is the first time slot of the charge process containing $t$.
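Eqs. (14) to (17) are capacity-weighted SOC differences over the $m$ modules. The Python sketch below is a minimal illustration with hypothetical names and values; note that, computed this way, the quantities come out in the units of $M_i(t)$ (e.g., ampere-hours).

```python
from typing import Sequence


def discharged(soc_start: Sequence[float], soc_now: Sequence[float],
               caps: Sequence[float]) -> float:
    """Eq. (14): l_D(t), the amount removed since the discharge started at t_D."""
    return sum((s0 - s) * m for s0, s, m in zip(soc_start, soc_now, caps))


def max_dischargeable(soc_start: Sequence[float], soc_min: float,
                      caps: Sequence[float]) -> float:
    """Eq. (15): E_D(t), the amount available above SOC_min at t_D."""
    return sum((s0 - soc_min) * m for s0, m in zip(soc_start, caps))


def charged(soc_start: Sequence[float], soc_now: Sequence[float],
            caps: Sequence[float]) -> float:
    """Eq. (16): l_C(t), the amount added since the charge started at t_C."""
    return sum((s - s0) * m for s0, s, m in zip(soc_start, soc_now, caps))


def max_chargeable(soc_start: Sequence[float], soc_max: float,
                   caps: Sequence[float]) -> float:
    """Eq. (17): E_C(t), the headroom below SOC_max at t_C."""
    return sum((soc_max - s0) * m for s0, m in zip(soc_start, caps))


# Hypothetical two-module pack with module capacities M_i(t) of 5.0 and 4.0 Ah.
caps = [5.0, 4.0]
l_D = discharged([0.9, 0.8], [0.7, 0.6], caps)     # 0.2*5 + 0.2*4 = 1.8
E_D = max_dischargeable([0.9, 0.8], 0.1, caps)     # 0.8*5 + 0.7*4 = 6.8
```

Since $l_D(t) < E_D(t)$ in this example, the discharge limit has not yet been reached.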
The BMS lets the battery pack discharge if $d_D(t) > 0$ and $l_D(t) < E_D(t)$, or charge if $d_C(t) > 0$ and $l_C(t) < E_C(t)$. Otherwise, the battery pack remains idle for the remainder of the discharge or charge process. Finally, after estimating the SOCs and SOHs and controlling the charging or discharging process of the battery pack, the BMS schedules all switches in the battery pack to extend its lifetime and to reduce the SOH imbalance among the cells and modules.

3. Problem Formulation

The aim of this paper is to extend the lifetime of the battery pack in the BESS. To this end, we formulate the lifetime maximization of the battery pack as
$$\begin{aligned} \max \quad & T_{EoL} \\ \text{s.t.} \quad & I^{-}_{min} \le I_{ij}(t) \le I^{+}_{max}, \\ & \mathrm{SOC}_{min} \le \mathrm{SOC}_{ij}(t) \le \mathrm{SOC}_{max}, \\ & l_D(t) \le E_D(t), \\ & l_C(t) \le E_C(t), \end{aligned} \tag{18}$$
where $I^{+}_{max}$ and $I^{-}_{min}$ are the discharge-current and charge-current thresholds that limit the discharging and charging rates, respectively; $\mathrm{SOC}_{min}$ and $\mathrm{SOC}_{max}$ are the lower and upper limits of the SOC, respectively, which prevent overdischarging and overcharging; $l_D(t)$ and $l_C(t)$ are the discharged and charged power, respectively, up to time slot $t$; and $E_D(t)$ and $E_C(t)$ are the maximum power load of the battery pack when discharging and charging, respectively. Maximizing the battery pack lifetime $T_{EoL}$ means maximizing the number of time slots in the second life cycle (Stage 2 in Figure 1). To achieve this, we minimize the rate of battery pack aging (i.e., the SOH reduction in the battery pack). We first define the SOH reduction in the battery pack at time slot $t$ as
$$\Delta \mathrm{SOH}_P(t) = \mathrm{SOH}_P(t-1) - \mathrm{SOH}_P(t), \tag{19}$$
where $\mathrm{SOH}_P(t-1)$ and $\mathrm{SOH}_P(t)$ denote the SOH of the battery pack at time slots $t-1$ and $t$, respectively. Since $\mathrm{SOH}_P(t)$ is a non-increasing function, we constrain it with
$$\Delta \mathrm{SOH}_P(t) \ge 0. \tag{20}$$
The problem is thus framed as minimizing the SOH reduction of the battery pack, which is mathematically expressed as
$$\begin{aligned} \min \quad & \sum_{t} \Delta \mathrm{SOH}_P(t) \\ \text{s.t.} \quad & \Delta \mathrm{SOH}_P(t) \ge 0, \\ & I^{-}_{min} \le I_{ij}(t) \le I^{+}_{max}, \\ & \mathrm{SOC}_{min} \le \mathrm{SOC}_{ij}(t) \le \mathrm{SOC}_{max}, \\ & l_D(t) \le E_D(t), \\ & l_C(t) \le E_C(t). \end{aligned} \tag{21}$$
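A short derivation (our addition, not part of the original formulation) shows why minimizing the cumulative SOH reduction serves the lifetime objective. Using the definition of $\Delta \mathrm{SOH}_P(t)$ as $\mathrm{SOH}_P(t-1) - \mathrm{SOH}_P(t)$, the sum telescopes for any horizon $T$:

$$\sum_{t=1}^{T} \Delta \mathrm{SOH}_P(t) = \sum_{t=1}^{T} \left[ \mathrm{SOH}_P(t-1) - \mathrm{SOH}_P(t) \right] = \mathrm{SOH}_P(0) - \mathrm{SOH}_P(T).$$

For a fixed initial pack SOH, minimizing the cumulative reduction up to $T$ is therefore the same as maximizing $\mathrm{SOH}_P(T)$. If one schedule keeps $\mathrm{SOH}_P(T)$ at least as high as another for every $T$, then at the slot where it first falls to $Th_{EoL}$ the other schedule has already reached the threshold, so its lifetime $T_{EoL}$ is no shorter.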

4. The Proposed Algorithm

To enhance the lifetime of the battery pack in the BESS, we propose a battery management algorithm that minimizes the SOH reduction in the battery pack. The overall flow of the proposed algorithm is shown in Algorithm 1. At each time slot, the algorithm first gathers measurement data, including the terminal voltage, current, and temperature of each cell. It then estimates the SOC and SOH (Algorithm 2) and manages the BESS charging or discharging process based on the dynamic power demand (Algorithm 3). After updating the states of the BESS, the algorithm schedules all switches in the battery pack to prolong its second lifetime using the cooperative multi-agent deep Q network (Algorithm 4). The following subsections discuss each component of the proposed algorithm in detail.
Algorithm 1 BESS Scheduling Algorithm
1: Gather measurement data V, I, T
2: Estimate SOCs and SOHs (Algorithm 2)
3: Control discharge, charge, and idle processes of the BESS (Algorithm 3)
4: Update the BESS states
5: Schedule switches in the battery pack (Algorithm 4)
Algorithm 2 EKF-based SOC and SOH estimation
1: Input: Measurement data V, I, T; Data tables
2: Output: $\mathrm{SOC}_{ij}(t)$, $\mathrm{SOH}_{ij}(t)$
3: for each cell $(i, j)$ do
4:     Estimate state vector $\hat{x}_{ij}(t)$ and error covariance $\hat{P}_{ij}(t)$ using (23) and (24)
5:     Estimate terminal voltage $\hat{V}_{ij}(t)$ using (28)
6:     Compute Kalman gain $G_{ij}(t)$ using (32)
7:     Correct state vector $x_{ij}(t)$ and error covariance $P_{ij}(t)$ using (33) and (34)
8:     Update $\mathrm{SOC}_{ij}(t)$ and $M_{ij}(t)$
9:     Update $\mathrm{SOH}_{ij}(t)$ using (35)
10: end for
11: Update SOC and SOH of each module using (8) and (10)
12: Update SOH of the battery pack using (11)
Algorithm 3 Charge, Discharge, and Idle Controlling
1: Input: $l_D(t)$, $l_C(t)$, $d_D(t)$, $d_C(t)$, $E_D(t)$, $E_C(t)$
2: Output: Discharge, charge, or idle
3: Determine the current process (discharge, charge, or idle)
4: if $d_D(t) > 0$ then                  ▹ Discharging
5:     if $l_D(t) \ge d_D(t)$ then
6:         Convert discharge to charge
7:     else
8:         if $l_D(t) \ge E_D(t)$ then
9:             Convert discharge to idle
10:        else
11:            Continue to discharge
12:        end if
13:    end if
14: else if $d_C(t) > 0$ then                  ▹ Charging
15:    if $l_C(t) \ge d_C(t)$ then
16:        Convert charge to discharge
17:    else
18:        if $l_C(t) \ge E_C(t)$ then
19:            Convert charge to idle
20:        else
21:            Continue to charge
22:        end if
23:    end if
24: end if
Algorithm 4 Switch Scheduling
1: Input: state vector $s(t)$
2: Output: Optimal joint action $a(t)$
3: Construct main network $Q_i$ and target network $\bar{Q}_i$ for each agent
4: Initialize acquired knowledge $K$
5: Select module switch action $a_{m+1}(t)$ by $\epsilon$-greedy policy   ▹ Switch ON/OFF modules
6: for each agent $i = 1$ to $m$ do
7:     if module $i$ is OFF then
8:         Limit action $a_i(t)$               ▹ Switch OFF all cells in module $i$
9:     else
10:        Select action $a_i(t)$ by $\epsilon$-greedy policy       ▹ Switch ON/OFF cells in module $i$
11:    end if
12: end for
13: Execute joint action $a(t)$
14: for each agent $i = 1$ to $m$ do
15:     Compute local reward $r_i(t)$
16: end for
17: Compute global reward $r_{m+1}(t)$
18: Update new state $s(t+1)$
19: Add $(s_i(t), a_i(t), r_i(t), s_i(t+1))$ into $K$ for each agent
20: Compute target action value $\bar{Q}_i(t)$ for each agent
21: Perform gradient descent to minimize loss function $L_{\phi_i}(t)$ for each agent
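The action-selection part of Algorithm 4 (steps 5 to 12) can be sketched as follows. This is a hypothetical illustration, not the authors' implementation: the Q-values are plain lists standing in for the outputs of the main networks $Q_i$, each agent's discrete action is assumed to be an integer whose bits are the switch states of its modules or cells, and skipping the all-open cell action for an active module is our own choice, made to stay consistent with Eq. (2). Training (rewards, the knowledge buffer $K$, the target networks $\bar{Q}_i$, and the loss $L_{\phi_i}$) is not shown.

```python
import random
from typing import List, Sequence, Tuple


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Random action with probability epsilon, otherwise the greedy (arg max) action."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda a: q_values[a])


def to_bits(index: int, width: int) -> List[int]:
    """Decode a discrete action index into one 0/1 switch state per module or cell."""
    return [(index >> k) & 1 for k in range(width)]


def select_joint_action(q_module: Sequence[float], q_cells: Sequence[Sequence[float]],
                        m: int, n: int, epsilon: float,
                        rng: random.Random) -> Tuple[List[int], List[List[int]]]:
    """Hypothetical steps 5-12 of Algorithm 4.

    q_module has 2**m entries (agent m+1); q_cells[i] has 2**n entries (agent i+1).
    Returns (module_active, cell_switches): module_active[i] = 1 keeps module i in
    the series string, and cell_switches[i][j] plays the role of X_ij(t).
    """
    module_active = to_bits(epsilon_greedy(q_module, epsilon, rng), m)
    cell_switches = []
    for i in range(m):
        if module_active[i] == 0:
            # Limited action: a module taken out of service has all its cells bypassed.
            cell_switches.append([0] * n)
        else:
            # Skip index 0 (all cells open) so an active module keeps at least one cell.
            a = 1 + epsilon_greedy(q_cells[i][1:], epsilon, rng)
            cell_switches.append(to_bits(a, n))
    return module_active, cell_switches


# Greedy selection (epsilon = 0) for a hypothetical pack with m = 2 modules, n = 2 cells.
q_mod = [0.1, 0.2, 0.9, 0.3]                      # index 2 decodes to [0, 1]
q_cel = [[0.0, 0.0, 0.0, 0.0], [5.0, 0.1, 0.3, 0.2]]
print(select_joint_action(q_mod, q_cel, 2, 2, 0.0, random.Random(0)))
# ([0, 1], [[0, 0], [0, 1]])
```

In this encoding, module_active[i] equals $1 - X_i(t)$, since $X_i(t) = 1$ denotes a bypassed module. Encoding the module action as a single index grows as $2^m$; it is used here only for brevity.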

4.1. SOC and SOH Estimation

To observe the states of the battery pack, the proposed algorithm estimates the SOC and SOH of each battery cell from its measured terminal voltage, current, and temperature. To do so, the algorithm first uses a fourth-order extended Kalman filter (EKF) to estimate the SOC and SOH of each battery cell, and then updates the SOC and SOH of each battery module and of the battery pack.
For each cell $(i, j)$, we define the corrected state vector of cell $(i, j)$ at time slot $t$, denoted as $x_{ij}(t)$, by
$$x_{ij}(t) = \left[ \mathrm{SOC}_{ij}(t),\; V^{P1}_{ij}(t),\; V^{P2}_{ij}(t),\; 1/M_{ij}(t) \right]^{T}. \tag{22}$$
Based on the measured current $I_{ij}(t-1)$ and the modeled EIS parameters at the previous time slot, the algorithm estimates the state vector $\hat{x}_{ij}(t)$ and the error covariance $\hat{P}_{ij}(t)$ as
$$\hat{x}_{ij}(t) = A_{ij}(t-1)\, x_{ij}(t-1) + B_{ij}(t-1)\, I_{ij}(t-1) \tag{23}$$
and
$$\hat{P}_{ij}(t) = A_{ij}(t-1)\, P_{ij}(t-1)\, A_{ij}(t-1)^{T}, \tag{24}$$
where $x_{ij}(t-1)$ is the corrected state vector of cell $(i, j)$ at time slot $t-1$, and $A_{ij}(t-1)$ and $B_{ij}(t-1)$ denote the transition matrix and the input matrix, respectively. Matrices $A_{ij}(t-1)$ and $B_{ij}(t-1)$ are defined as
$$A_{ij}(t-1) = \begin{bmatrix} 1 & 0 & 0 & -\eta\, \Delta t\, I_{ij}(t-1) \\ 0 & e^{-\frac{\Delta t}{R^{P1}_{ij}(t-1)\, C^{P1}_{ij}(t-1)}} & 0 & 0 \\ 0 & 0 & e^{-\frac{\Delta t}{R^{P2}_{ij}(t-1)\, C^{P2}_{ij}(t-1)}} & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix} \tag{25}$$
and
$$B_{ij}(t-1) = \begin{bmatrix} 0 \\ R^{P1}_{ij}(t-1) \left( 1 - e^{-\frac{\Delta t}{R^{P1}_{ij}(t-1)\, C^{P1}_{ij}(t-1)}} \right) \\ R^{P2}_{ij}(t-1) \left( 1 - e^{-\frac{\Delta t}{R^{P2}_{ij}(t-1)\, C^{P2}_{ij}(t-1)}} \right) \\ 0 \end{bmatrix}, \tag{26}$$
where $I_{ij}(t-1)$ is the measured current of cell $(i, j)$ at time slot $t-1$. The EIS parameters $R^{S}_{ij}$, $R^{P1}_{ij}$, $C^{P1}_{ij}$, $R^{P2}_{ij}$, and $C^{P2}_{ij}$ are functions of $\mathrm{SOC}_{ij}$ and $\mathrm{SOH}_{ij}$. Specifically, each EIS parameter is an exponential function of $\mathrm{SOC}_{ij}(t-1)$ of the form
$$x_1 \exp\left( x_2\, \mathrm{SOC}_{ij}(t-1) \right) + x_3, \tag{27}$$
where $x_1$, $x_2$, and $x_3$ are real coefficients that depend on the SOH of each cell, $\mathrm{SOH}_{ij}$. A dataset [34] is used to construct look-up tables for each EIS parameter ($R^{S}_{ij}$, $R^{P1}_{ij}$, $C^{P1}_{ij}$, $R^{P2}_{ij}$, or $C^{P2}_{ij}$) based on the above exponential functions. The algorithm estimates the terminal voltage $\hat{V}_{ij}(t)$ using the state vector $\hat{x}_{ij}(t)$ and the Jacobian matrices $C_{ij}(t)$ and $D_{ij}(t)$ as
$$\hat{V}_{ij}(t) = C_{ij}(t)\, \hat{x}_{ij}(t) + D_{ij}(t)\, I_{ij}(t), \tag{28}$$
where $I_{ij}(t)$ is the measured current of cell $(i, j)$ at time slot $t$. Matrices $C_{ij}(t)$ and $D_{ij}(t)$ are defined, respectively, as
$$C_{ij}(t) = \begin{bmatrix} \dfrac{\partial V^{O}_{ij}(t)}{\partial \mathrm{SOC}_{ij}(t)} & -1 & -1 & 0 \end{bmatrix} \tag{29}$$
and
$$D_{ij}(t) = -R^{S}_{ij}(t). \tag{30}$$
The open-circuit voltage $V^{O}_{ij}(t)$ is defined as an $a$th-order polynomial function of $\mathrm{SOC}_{ij}(t)$ by
$$V^{O}_{ij}(t) = \sum_{b=0}^{a} y_b\, \mathrm{SOC}_{ij}(t)^{b}, \tag{31}$$
where $y_b$ is a real coefficient that depends on $\mathrm{SOH}_{ij}$. To account for the error of the estimated value, the algorithm calculates the Kalman gain $G_{ij}(t)$ as
$$G_{ij}(t) = \hat{P}_{ij}(t)\, C_{ij}(t)^{T} \left( C_{ij}(t)\, \hat{P}_{ij}(t)\, C_{ij}(t)^{T} \right)^{-1}. \tag{32}$$
Using the measured terminal voltage of cell $(i, j)$, $V_{ij}(t)$, the algorithm corrects the state vector $x_{ij}(t)$ and the error covariance $P_{ij}(t)$ as
$$x_{ij}(t) = \hat{x}_{ij}(t) + G_{ij}(t) \left( V_{ij}(t) - \hat{V}_{ij}(t) \right) \tag{33}$$
and
$$P_{ij}(t) = \left( \mathbf{I} - G_{ij}(t)\, C_{ij}(t) \right) \hat{P}_{ij}(t). \tag{34}$$
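One EKF cycle of Eqs. (23) to (34) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the EIS parameters are passed in as constants instead of being read from the SOC/SOH look-up tables built from [34]; small noise variances q and r are added, which Eqs. (24) and (32) do not contain (with q = r = 0 the covariance and gain updates reduce to those equations); and the predicted voltage is evaluated with the nonlinear model of Eq. (3) rather than the linear form of Eq. (28), which is the usual EKF practice. Consistent SI units are assumed (Δt in seconds, capacity in ampere-seconds), and all numbers are illustrative.

```python
import numpy as np
from numpy.polynomial import polynomial as npoly


def ekf_step(x, P, i_prev, i_now, v_meas, dt, eta, params, ocv_coeffs, q=1e-10, r=1e-4):
    """One EKF predict/correct cycle for x = [SOC, V_P1, V_P2, 1/M].

    params = (R_S, R_P1, C_P1, R_P2, C_P2) at the previous slot; ocv_coeffs are the
    OCV polynomial coefficients y_0..y_a of Eq. (31) in increasing order. q and r
    are assumed process and measurement noise variances.
    """
    r_s, r_p1, c_p1, r_p2, c_p2 = params
    a1 = np.exp(-dt / (r_p1 * c_p1))
    a2 = np.exp(-dt / (r_p2 * c_p2))
    A = np.array([[1.0, 0.0, 0.0, -eta * dt * i_prev],           # Eq. (25)
                  [0.0, a1, 0.0, 0.0],
                  [0.0, 0.0, a2, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    B = np.array([0.0, r_p1 * (1.0 - a1), r_p2 * (1.0 - a2), 0.0])  # Eq. (26)

    x_pred = A @ x + B * i_prev                                    # Eq. (23)
    P_pred = A @ P @ A.T + q * np.eye(4)                            # Eq. (24) plus q

    soc = x_pred[0]
    docv = npoly.polyval(soc, npoly.polyder(ocv_coeffs))           # dV_O/dSOC
    C = np.array([docv, -1.0, -1.0, 0.0])                           # Eq. (29)
    # Predicted terminal voltage from the nonlinear cell model, Eq. (3).
    v_pred = npoly.polyval(soc, ocv_coeffs) - x_pred[1] - x_pred[2] - r_s * i_now

    s = C @ P_pred @ C + r                                          # innovation variance
    G = P_pred @ C / s                                              # Eq. (32) plus r
    x_new = x_pred + G * (v_meas - v_pred)                          # Eq. (33)
    P_new = (np.eye(4) - np.outer(G, C)) @ P_pred                   # Eq. (34)
    return x_new, P_new, v_pred


# Illustrative cell: rated 2 Ah (7200 A*s), aged to 1.5 Ah (5400 A*s), hypothetical OCV cubic.
params = (0.02, 0.015, 2000.0, 0.03, 20000.0)
ocv = [3.0, 1.2, -0.5, 0.5]
x0 = np.array([0.5, 0.0, 0.0, 1.0 / 5400.0])
P0 = np.diag([1e-2, 1e-6, 1e-6, 1e-12])
# The measured voltage is 10 mV above the prediction, so the SOC estimate moves up.
x1, P1, _ = ekf_step(x0, P0, 0.0, 0.0, npoly.polyval(0.5, ocv) + 0.01, 1.0, 1.0, params, ocv)
soh = (1.0 / x1[3]) / 7200.0   # Eq. (9): M_ij / M_new
```

The capacity enters the state as $1/M_{ij}$, so the SOH follows from Eq. (9) as $(1/x_4)/M^{new}$. In this sketch it is only corrected while current flows, because the entry $-\eta\, \Delta t\, I_{ij}(t-1)$ of $A_{ij}(t-1)$, which couples SOC and $1/M_{ij}$, vanishes at rest.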
From the corrected state vector $x_{ij}(t)$, the SOC $\mathrm{SOC}_{ij}(t)$ and the maximum capacity $M_{ij}(t)$ of cell $(i, j)$ at time slot $t$ are updated. Because the SOH does not degrade noticeably within a single time slot or a few time slots [35], the estimation algorithm updates the SOH of cell $(i, j)$ by averaging it over a charge or discharge process once that process is completed:
$$\mathrm{SOH}_{ij}(t) = \begin{cases} \dfrac{1}{t - t_D + 1} \displaystyle\sum_{\tau = t_D}^{t} \mathrm{SOH}_{ij}(\tau), & \text{if a discharge is completed at time slot } t; \\ \dfrac{1}{t - t_C + 1} \displaystyle\sum_{\tau = t_C}^{t} \mathrm{SOH}_{ij}(\tau), & \text{if a charge is completed at time slot } t; \\ \mathrm{SOH}_{ij}(t-1), & \text{otherwise}, \end{cases} \tag{35}$$
where $t_D$ and $t_C$ are the first time slots of the discharging and charging processes containing $t$. Finally, the estimation algorithm updates the SOC and SOH of the modules ($\mathrm{SOC}_i(t)$ and $\mathrm{SOH}_i(t)$, $i \in \{1, 2, \ldots, m\}$) using Equations (8) and (10), and updates the battery pack SOH, $\mathrm{SOH}_P(t)$, using Equation (11). Algorithm 2 summarizes the EKF-based joint SOC and SOH estimation.
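Eq. (35) holds the cell SOH constant during a process and replaces it with an average when the process ends. The Python sketch below reads the values inside the average as the per-slot EKF estimates $M_{ij}(\tau)/M^{new}$; that interpretation, the function name, and the list container are our assumptions.

```python
from typing import Sequence


def soh_update(raw_soh: Sequence[float], t: int, t_start: int,
               process_completed: bool, soh_prev: float) -> float:
    """Eq. (35): average the per-slot SOH estimates over a finished process.

    raw_soh[tau] holds the per-slot estimate M_ij(tau) / M_new from the EKF
    (a hypothetical container); t_start is t_D or t_C of the process containing t.
    """
    if process_completed:
        window = raw_soh[t_start:t + 1]      # tau = t_start, ..., t
        return sum(window) / len(window)     # divides by t - t_start + 1
    return soh_prev                          # SOH_ij(t - 1): hold the last value


# A discharge that started at slot 2 is completed at slot 4.
print(round(soh_update([0.90, 0.80, 0.79, 0.81, 0.80], 4, 2, True, 0.85), 4))  # 0.8
```

Averaging over a whole process filters slot-to-slot noise in the capacity estimate, so the SOH seen by the scheduler changes only at process boundaries.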

4.2. Charge, Discharge, and Idle Controlling

The proposed algorithm controls the charging and discharging of the battery pack to prevent overcharging and overdischarging in the BESS. By comparing the loaded power ($l_D(t)$ and $l_C(t)$), the maximum power load ($E_D(t)$ and $E_C(t)$), the power demand ($d_D(t)$), and the available power supply ($d_C(t)$), the algorithm determines whether the BESS is discharging, charging, or idle and controls it accordingly.
If the battery pack is discharging, i.e., $d_D(t) > 0$, the algorithm calculates the discharged power $l_D(t)$ and the maximum dischargeable power $E_D(t)$ using Equations (14) and (15). If $l_D(t)$ reaches the power demand $d_D(t)$, the algorithm switches the BESS status from discharging to charging. If $l_D(t)$ reaches the maximum discharge power load of the battery pack, $E_D(t)$, the algorithm sets the battery pack to idle. Otherwise, discharging continues.
If the battery pack is charging, i.e., $d_C(t) > 0$, the algorithm calculates the charged power $l_C(t)$ and the maximum chargeable power $E_C(t)$ using Equations (16) and (17). If $l_C(t)$ reaches $d_C(t)$, the algorithm switches the BESS status from charging to discharging. If $l_C(t)$ reaches the maximum charge power load of the battery pack at time slot $t$, $E_C(t)$, the algorithm sets the battery pack to idle. Otherwise, charging continues. Algorithm 3 summarizes the charge and discharge control of the battery pack.
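The per-slot decision described above can be sketched as a pure function. This is an illustration under stated assumptions, not the authors' Algorithm 3: "reaches" is read as `>=`, the demand check is evaluated before the pack-limit check because the text lists them in that order, and returning `"idle"` when there is neither demand nor supply is an assumption. The function name `control_step` is hypothetical.

```python
def control_step(d_D: float, d_C: float, l_D: float, l_C: float,
                 E_D: float, E_C: float) -> str:
    """Return the BESS status for the next step: "discharging", "charging", or "idle".

    d_D, d_C -- power demand and available power supply
    l_D, l_C -- power discharged / charged so far
    E_D, E_C -- maximum dischargeable / chargeable power of the pack
    """
    if d_D > 0:  # the pack is discharging
        if l_D >= d_D:
            return "charging"      # demand met: switch direction
        if l_D >= E_D:
            return "idle"          # pack discharge limit reached
        return "discharging"
    if d_C > 0:  # the pack is charging
        if l_C >= d_C:
            return "discharging"   # available supply absorbed: switch direction
        if l_C >= E_C:
            return "idle"          # pack charge limit reached
        return "charging"
    return "idle"  # assumption: no demand and no supply
```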

4.3. Cooperative Multi-Agent Deep Reinforcement Learning-Based Battery Switch Scheduling

A cooperative multi-agent deep Q network (CM-DQN) scheduling algorithm is proposed to control the switches so as to minimize the SOH reduction in the battery pack. The algorithm uses $m+1$ agents: agents 1 to $m$ correspond to the $m$ modules and schedule the cell switches in each parallel network, while agent $m+1$ evaluates the battery pack as a whole and schedules the module switches for the series connection. Agents 1 to $m$ optimize local rewards (i.e., minimize the SOH reduction in each module) while sharing policies, whereas agent $m+1$ observes global states (from the modules and the BESS) to minimize the SOH reduction of the battery pack over its second life cycle. The algorithm perceives the environment state $s(t) = \{s_i(t) \mid i = 1, \dots, m+1\}$ from the estimation algorithm (Algorithm 2) and the charge and discharge control algorithm (Algorithm 3), and chooses a joint action $a(t) = \{a_i(t) \mid i = 1, \dots, m+1\}$ to maximize the cumulative rewards $r(t) = \{r_i(t) \mid i = 1, \dots, m+1\}$. The switch scheduling algorithm first monitors the current state of the battery pack and then derives the state vector of each agent as
$$ s_i(t) = \begin{cases} \left[ C_i(t), H_i(t), I_P(t), d_D(t), d_C(t) \right], & i = 1, 2, \dots, m, \\ \left[ C(t), H(t), V_P(t), d_D(t), d_C(t) \right], & i = m+1, \end{cases} $$
where $C(t)$ and $H(t)$ are the sets of SOCs and SOHs of the $m$ series-connected modules, respectively; $C_i(t)$ and $H_i(t)$ are the sets of SOCs and SOHs of the $n$ parallel-connected cells in module $i$, respectively; $V_P(t)$ and $I_P(t)$ are the measured terminal voltage and the measured current of the battery pack at time slot $t$; and $d_D(t)$ and $d_C(t)$ are the load demand and the available power supply at time slot $t$, respectively. Action $a_i(t)$, $i = 1, \dots, m+1$, controls the switches in the battery pack and is defined as
$$ a_i(t) = \begin{cases} \left[ X_{i1}(t), \dots, X_{in}(t) \right], & i = 1, 2, \dots, m \text{ (cell switches)}, \\ \left[ X_1(t), \dots, X_m(t) \right], & i = m+1 \text{ (module switches)}. \end{cases} $$
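As a concrete reading of the state definitions, the sketch below flattens each agent's observation into a plain list. The ordering of entries and the helper names are assumptions. With $n = 4$ cells per module and $m = 6$ modules, the lengths $2n + 3 = 11$ and $2m + 3 = 15$ match the input-layer sizes given in Section 5.1.

```python
from typing import List, Sequence


def module_agent_state(cell_soc: Sequence[float], cell_soh: Sequence[float],
                       pack_current: float, d_D: float, d_C: float) -> List[float]:
    """s_i(t) for agents 1..m: [C_i(t), H_i(t), I_P(t), d_D(t), d_C(t)], flattened."""
    return [*cell_soc, *cell_soh, pack_current, d_D, d_C]


def pack_agent_state(module_soc: Sequence[float], module_soh: Sequence[float],
                     pack_voltage: float, d_D: float, d_C: float) -> List[float]:
    """s_{m+1}(t): [C(t), H(t), V_P(t), d_D(t), d_C(t)], flattened."""
    return [*module_soc, *module_soh, pack_voltage, d_D, d_C]
```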
All agents interact continuously by sharing states and policies, so that training proceeds synchronously across agents.
To minimize the SOH reduction in the battery pack given the observed state, the algorithm uses acquired knowledge $\mathcal{K}$, which includes a replay buffer that stores one experience per time slot in the form $\left( s_i(t), a_i(t), r_i(t), s_i(t+1) \right)$. For each agent $i \in \{1, \dots, m+1\}$, the algorithm constructs a main network $Q_i$ and a target network $\bar{Q}_i$ with the same structure and identical random initial weights $\phi_i = \bar{\phi}_i$, where $Q_i$ approximates the action-value function $Q_i\left( s_i(t), a_i(t) \mid \phi_i \right)$ at time slot $t$ and $\bar{Q}_i$ computes the target action value $\bar{Q}_i(t)$.
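A per-agent replay buffer of the kind described above might look like the minimal sketch below. The buffer capacity and mini-batch size are not reported in this section, so the values used in the usage check are hypothetical, as are the class and method names.

```python
import random
from collections import deque
from typing import Deque, List, NamedTuple, Sequence


class Experience(NamedTuple):
    state: tuple
    action: int
    reward: float
    next_state: tuple


class ReplayBuffer:
    """Fixed-capacity store of (s_i(t), a_i(t), r_i(t), s_i(t+1)) for one agent."""

    def __init__(self, capacity: int) -> None:
        # deque(maxlen=...) discards the oldest experience once full.
        self._data: Deque[Experience] = deque(maxlen=capacity)

    def push(self, state: Sequence[float], action: int, reward: float,
             next_state: Sequence[float]) -> None:
        self._data.append(Experience(tuple(state), action, reward, tuple(next_state)))

    def sample(self, batch_size: int, rng: random.Random) -> List[Experience]:
        # Uniform sampling without replacement decorrelates consecutive slots.
        return rng.sample(list(self._data), batch_size)

    def __len__(self) -> int:
        return len(self._data)
```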
To exploit past experience, the algorithm checks whether state $s_{m+1}(t)$ is in the acquired knowledge $\mathcal{K}$. If it is, the algorithm chooses action $a_{m+1}(t)$ using an $\epsilon$-greedy policy [36]: it chooses a random action with probability $\epsilon$, or the action with the highest action value with probability $1 - \epsilon$:
$$ a_{m+1}(t) = \begin{cases} \text{random action}, & \text{with probability } \epsilon, \\ \arg\max_{a_{m+1}(t)} Q_{m+1}\left( s_{m+1}(t), a_{m+1}(t) \mid \phi_{m+1} \right), & \text{with probability } 1 - \epsilon. \end{cases} $$
If state $s_{m+1}(t)$ is not in $\mathcal{K}$, action $a_{m+1}(t)$ is chosen at random. The algorithm then determines which modules are bypassed at time slot $t$ and restricts the actions of the cell-switch scheduling agents (agents 1 to $m$) accordingly. If module $i$ is bypassed, action $a_i(t)$ is restricted to turning OFF all cells in module $i$. Otherwise, the algorithm checks whether state $s_i(t)$ is in $\mathcal{K}$: if it is, the algorithm selects $a_i(t)$ with the $\epsilon$-greedy policy specified in Equation (38); if not, it chooses $a_i(t)$ at random. Once all actions are chosen, the joint action $a(t) = \{a_i(t) \mid i = 1, \dots, m+1\}$ is executed.
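The hierarchical selection described above (module agent first, then cell agents restricted by bypassed modules) can be sketched as follows. Switch vectors use 1 for ON and 0 for OFF. The helper names, the `known` flags standing in for the membership test against $\mathcal{K}$, and the encoding of actions as indices into fixed lists are all assumptions.

```python
import random
from typing import List, Sequence, Tuple

Action = Tuple[int, ...]  # 1 = switch ON, 0 = switch OFF


def epsilon_greedy(q_values: Sequence[float], epsilon: float, rng: random.Random) -> int:
    """Random action index with probability epsilon, else the index of the largest Q-value."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=lambda k: q_values[k])


def select_index(q_values: Sequence[float], epsilon: float, known: bool,
                 rng: random.Random) -> int:
    # A state not found in the acquired knowledge gets a uniformly random action.
    if not known:
        return rng.randrange(len(q_values))
    return epsilon_greedy(q_values, epsilon, rng)


def joint_action(module_q: Sequence[float], module_known: bool,
                 module_actions: Sequence[Action],
                 cell_qs: Sequence[Sequence[float]], cell_known: Sequence[bool],
                 cell_actions: Sequence[Action], epsilon: float,
                 rng: random.Random) -> Tuple[List[Action], Action]:
    """Choose a_{m+1}(t) first, then a_1(t)..a_m(t) restricted by bypassed modules."""
    a_pack = module_actions[select_index(module_q, epsilon, module_known, rng)]
    n_cells = len(cell_actions[0])
    a_cells: List[Action] = []
    for i, module_on in enumerate(a_pack):
        if not module_on:
            a_cells.append((0,) * n_cells)  # bypassed module: all of its cells OFF
        else:
            k = select_index(cell_qs[i], epsilon, cell_known[i], rng)
            a_cells.append(cell_actions[k])
    return a_cells, a_pack
```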
For agent $i$, which schedules the cell switches of module $i$, the algorithm evaluates the local immediate reward as
$$ R_i(t) = -\mathbb{E}\left[ \Delta SOH_i(t) \right], $$
where $\Delta SOH_i(t)$ is the SOH reduction in module $i$ at time slot $t$, which is non-negative because the SOH is non-increasing over time:
$$ \Delta SOH_i(t) = SOH_i(t-1) - SOH_i(t) \geq 0. $$
The algorithm evaluates the local cumulative reward through interactions with the environment and seeks an optimal policy that maximizes it:
$$ r_i(t) = \sum_{h=t}^{\infty} \gamma^{h-t} R_i(h). $$
Similarly, the algorithm computes the global cumulative reward, whose maximization minimizes the SOH reduction in the battery pack, as
$$ r_{m+1}(t) = \sum_{h=t}^{\infty} \gamma^{h-t} R_{m+1}(h), $$
where $R_{m+1}(t)$ is the global immediate reward, computed by
$$ R_{m+1}(t) = -\mathbb{E}\left[ \sum_{i=1}^{m} \Delta SOH_i(t) \right]. $$
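The reward terms can be computed per time slot as in the sketch below. Two assumptions are made: the expectation is replaced by the single observed transition, and the reward is taken as the negative of the non-negative SOH reduction, so that maximizing cumulative reward minimizes degradation. The infinite sum is truncated to the rewards supplied. Function names are hypothetical.

```python
from typing import List, Sequence


def soh_reductions(prev_soh: Sequence[float], curr_soh: Sequence[float]) -> List[float]:
    """Delta SOH_i(t) = SOH_i(t-1) - SOH_i(t) for every module (>= 0 as SOH fades)."""
    return [p - c for p, c in zip(prev_soh, curr_soh)]


def local_rewards(prev_soh: Sequence[float], curr_soh: Sequence[float]) -> List[float]:
    """R_i(t) for agents 1..m: negative SOH reduction of each module."""
    return [-d for d in soh_reductions(prev_soh, curr_soh)]


def global_reward(prev_soh: Sequence[float], curr_soh: Sequence[float]) -> float:
    """R_{m+1}(t): negative total SOH reduction over the m modules."""
    return -sum(soh_reductions(prev_soh, curr_soh))


def discounted_return(rewards: Sequence[float], gamma: float) -> float:
    """sum_h gamma^(h - t) R(h), with rewards[0] taken at slot t; truncated to len(rewards)."""
    total = 0.0
    for k, r in enumerate(rewards):
        total += gamma ** k * r
    return total
```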
By executing the joint action $a(t)$, the algorithm observes the new state $s(t+1)$ and stores the sample $\left( s_i(t), a_i(t), r_i(t), s_i(t+1) \right)$ in the acquired knowledge $\mathcal{K}$. The target action value $\bar{Q}_i(t)$ is computed by
$$ \bar{Q}_i(t) = r_i(t) + \gamma \max_{a_i(t+1)} Q_i\left( s_i(t+1), a_i(t+1) \mid \bar{\phi}_i \right), $$
where $\gamma \in (0, 1]$ is the discount factor, which determines the weight placed on future rewards. The CM-DQN updates the acquired knowledge $\mathcal{K}$ by minimizing the loss function $L\left( \phi_i(t) \right)$ through gradient descent, where the loss function is defined as
$$ L\left( \phi_i(t) \right) = \mathbb{E}\left[ \left( \bar{Q}_i(t) - Q_i\left( s_i(t), a_i(t) \mid \phi_i \right) \right)^{2} \right]. $$
The weights $\phi_i(t)$ are updated by gradient descent on the loss function:
$$ \phi_i(t) = \phi_i(t-1) - \alpha \nabla_{\phi_i} L\left( \phi_i(t) \right), $$
where $\alpha \in (0, 1]$ is the learning rate. To keep the algorithm stable, the target network $\bar{Q}_i$ copies the weights of the main network $Q_i$, i.e., $\bar{\phi}_i = \phi_i$, every $P$ time slots [36]. Minimizing the loss function $L\left( \phi_i(t) \right)$ drives the action value $Q_i\left( s_i(t), a_i(t) \mid \phi_i \right)$ toward the target action value $\bar{Q}_i(t)$, which also minimizes the SOH reduction in the battery pack. The proposed switch scheduling algorithm is summarized in Algorithm 4, and the training process of the CM-DQN is shown in Figure 5.
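The target computation, gradient-descent update, and periodic target-network copy can be illustrated with the minimal sketch below. To stay self-contained, it substitutes a linear Q-function for the two-hidden-layer network of Section 5.1, assumes one update per time slot so that copying the target every `sync_period` updates corresponds to every $P$ time slots, and uses whatever reward value is stored with each transition. All class names are hypothetical; the defaults follow Table 4 (α = 0.001, γ = 0.99, P = 100).

```python
import copy
from typing import List, Sequence, Tuple

Transition = Tuple[Sequence[float], int, float, Sequence[float]]


class LinearQ:
    """Q(s, a | phi) = w[a] . s + b[a]; a linear stand-in for the paper's network."""

    def __init__(self, state_dim: int, n_actions: int) -> None:
        self.w = [[0.0] * state_dim for _ in range(n_actions)]
        self.b = [0.0] * n_actions

    def q(self, s: Sequence[float], a: int) -> float:
        return sum(wk * sk for wk, sk in zip(self.w[a], s)) + self.b[a]

    def q_all(self, s: Sequence[float]) -> List[float]:
        return [self.q(s, a) for a in range(len(self.w))]


class DQNAgent:
    def __init__(self, state_dim: int, n_actions: int, alpha: float = 0.001,
                 gamma: float = 0.99, sync_period: int = 100) -> None:
        self.main = LinearQ(state_dim, n_actions)
        self.target = copy.deepcopy(self.main)  # phi_bar = phi at initialization
        self.alpha, self.gamma, self.sync_period = alpha, gamma, sync_period
        self.updates = 0

    def td_target(self, r: float, s_next: Sequence[float]) -> float:
        # Q_bar = r + gamma * max_a' Q(s', a' | phi_bar)
        return r + self.gamma * max(self.target.q_all(s_next))

    def update(self, batch: Sequence[Transition]) -> None:
        """One gradient-descent step on the mean squared TD error of a mini-batch."""
        n_actions, dim = len(self.main.w), len(self.main.w[0])
        gw = [[0.0] * dim for _ in range(n_actions)]
        gb = [0.0] * n_actions
        for s, a, r, s_next in batch:
            err = self.td_target(r, s_next) - self.main.q(s, a)
            # Gradient of (target - Q)^2 w.r.t. w[a] is -2 * err * s, averaged over the batch.
            for k in range(dim):
                gw[a][k] += -2.0 * err * s[k] / len(batch)
            gb[a] += -2.0 * err / len(batch)
        for a in range(n_actions):
            for k in range(dim):
                self.main.w[a][k] -= self.alpha * gw[a][k]  # phi <- phi - alpha * grad
            self.main.b[a] -= self.alpha * gb[a]
        self.updates += 1
        if self.updates % self.sync_period == 0:
            self.target = copy.deepcopy(self.main)  # hard copy every sync_period updates
```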

5. Performance Evaluation

5.1. Simulation Environment

The BESS was simulated using a lithium-ion battery pack model implemented in MATLAB and Simulink 2023b. To assess the performance of the proposed algorithm, we designed a $6 \times 4$ parallel-series connected battery pack [15] containing 24 lithium-ion batteries with heterogeneous SOHs, whose initial values are listed in Table 3. Each battery cell was modeled as a second-order Thévenin equivalent circuit, with SOH degradation derived from a NASA dataset [34]. That dataset includes 28 lithium-ion cobalt oxide 18650 cells with a nominal voltage of 3.7 V, a nominal capacity of 2.2 Ah, and a maximum current of 4 A, and it contains real-time measurements of terminal voltage, current, cell temperature, discharge capacity, and EIS impedance. Using the dataset, we identified the EIS parameters, including $V_{ij}^{O}$, $R_{ij}^{S}$, $R_{ij}^{P1}$, $C_{ij}^{P1}$, $R_{ij}^{P2}$, and $C_{ij}^{P2}$, over the 90% to 60% SOH range. A dynamic power demand condition was used to evaluate the effectiveness of the algorithm. The power demand was drawn from a uniform distribution between 60 Wh and 100 Wh, and the terminal voltage was drawn from a uniform distribution between 13.75 V and 16.25 V (requiring at least four series-connected modules to be ON at any time, because the nominal voltage of each cell is 3.7 V). We set the load current for both discharging and charging the battery pack to 8 A (requiring at least two parallel-connected batteries to be ON at any time, because the maximum output current of one battery is 4 A).
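The sizing constraints stated above follow from simple arithmetic, which the sketch below reproduces: the nominal energy of 24 new cells (matching the 195.36 Wh total capacity in Table 4), the number of series modules needed to reach the lower voltage bound, and the number of parallel cells needed to share the 8 A load. The upper-bound check (`max_modules_on`) is an addition not stated in the text; it shows that a fifth module (18.5 V) would already exceed 16.25 V. Constant and function names are hypothetical.

```python
import math
import random

CELL_NOMINAL_V = 3.7          # V, per cell (and per parallel module)
CELL_CAPACITY_AH = 2.2        # Ah
CELL_MAX_I = 4.0              # A
N_MODULES, N_CELLS_PER_MODULE = 6, 4
LOAD_CURRENT = 8.0            # A, both charging and discharging
V_LOW, V_HIGH = 13.75, 16.25  # V, sampled terminal-voltage range

# Nominal energy of 24 new cells: 24 * 3.7 V * 2.2 Ah.
pack_energy_wh = N_MODULES * N_CELLS_PER_MODULE * CELL_NOMINAL_V * CELL_CAPACITY_AH

# Series modules needed to reach V_LOW, and the most that stay below V_HIGH.
min_modules_on = math.ceil(V_LOW / CELL_NOMINAL_V)
max_modules_on = math.floor(V_HIGH / CELL_NOMINAL_V)

# Parallel cells needed so that no cell exceeds its 4 A limit.
min_cells_on = math.ceil(LOAD_CURRENT / CELL_MAX_I)


def sample_demand_wh(rng: random.Random, low: float = 60.0, high: float = 100.0) -> float:
    """Draw one power-demand value from U(low, high)."""
    return rng.uniform(low, high)
```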
Two deep Q network architectures were constructed, one for cell scheduling and one for module scheduling, each consisting of one input layer, two hidden layers, and one output layer. The number of hidden layers and the number of neurons in each hidden layer can be selected by trial and error [37]. Two approaches are commonly used to determine the network size, including the number of hidden layers and the number of neurons in each layer: the constructive approach and the destructive approach [38]. We used the constructive approach [39]: we started with a small network and gradually added neurons or hidden layers to improve its performance, varying the number of neurons per hidden layer among 64, 128, and 256 and starting from one hidden layer. For both architectures, two hidden layers of 128 neurons each reached the maximum cumulative reward in the fewest episodes. For cell scheduling, the input layer has 11 neurons, corresponding to the dimension of state $s_i$ of module $i$ with four parallel-connected cells, and the output layer has 12 neurons, corresponding to the possible cell-switch scheduling actions $a_i$. For module scheduling, the input layer has 15 neurons, corresponding to the dimension of state $s_{m+1}$ of the battery pack with six series-connected modules, and the output layer has 15 neurons, corresponding to the possible module-switch scheduling actions $a_{m+1}$. We set the learning rate $\alpha$ to 0.001, which steadily reduced the loss $L\left( \phi_i(t) \right)$ over the training episodes while balancing the speed of learning against the stability of training. We set the discount factor $\gamma$ to 0.99 to prioritize the long-term cumulative reward, i.e., minimizing the SOH reduction in the battery pack, without unduly slowing convergence. The other simulation parameters are summarized in Table 4.
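The output-layer sizes can be reproduced by one plausible action encoding, sketched below. The section does not spell out the action sets, so this is an inference rather than the authors' definition: cell actions are taken as any pattern with at least two of four cells ON (8 A load, 4 A per cell) plus the all-OFF pattern used for a bypassed module, giving 6 + 4 + 1 + 1 = 12; module actions are taken as exactly four of six modules ON, giving C(6, 4) = 15. The helper name `switch_patterns` is hypothetical.

```python
from itertools import combinations
from typing import List, Tuple


def switch_patterns(n: int, min_on: int, max_on: int,
                    include_all_off: bool = False) -> List[Tuple[int, ...]]:
    """All ON/OFF vectors of length n with between min_on and max_on switches ON."""
    patterns = []
    for k in range(min_on, max_on + 1):
        for on in combinations(range(n), k):
            patterns.append(tuple(1 if j in on else 0 for j in range(n)))
    if include_all_off:
        patterns.append((0,) * n)  # whole module bypassed
    return patterns


# Hypothetical action sets consistent with the layer sizes above.
cell_actions = switch_patterns(4, 2, 4, include_all_off=True)  # 6 + 4 + 1 + 1
module_actions = switch_patterns(6, 4, 4)                      # C(6, 4)
```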
For the performance evaluation, we first investigated the effect of the proposed algorithm on SOH balancing and on the lifetime of the battery pack. Then, we evaluated its performance under dynamic power demand. To validate the proposed cooperative multi-agent deep Q network (CM-DQN) algorithm, we compared it with previous work: a self-X multicell battery algorithm [15] (Self-X), a multilayer SOH equalizer [22] (M-SOH), a DOD-SOH balancing algorithm [17] (DOD-SOH), and a multi-actor–critic scheduling algorithm [25] (M-A2C).

5.2. Capacity Balancing and Battery Pack Lifetime

We evaluated the performance of the proposed algorithm in balancing the SOHs of the modules, as shown in Figure 6. Figure 6a–e shows the SOH balancing of the modules in the battery pack under each algorithm. The proposed algorithm (CM-DQN) achieved better SOH balance than previous work. During learning, the CM-DQN framework determines the optimal switching actions, which connect or bypass cells and modules, by observing BESS states such as the SOCs and SOHs of cells and modules, the power demand $d_D(t)$, and the available power supply $d_C(t)$. Because CM-DQN fully observes the states of the BESS (including the SOCs and SOHs of modules and cells, the charging and discharging states, and the power demand), it ensures that no single module degrades significantly faster than the others. Figure 6f compares the standard deviation of the SOHs among the modules over time for each algorithm. CM-DQN exhibited the lowest standard deviation (close to zero), further confirming its ability to keep the module SOHs balanced. Self-X, which does not consider SOHs, left the SOH imbalance almost unchanged. The other methods (M-SOH, DOD-SOH, and M-A2C) consider SOHs and reduced the SOH imbalance, but because they do not observe the BESS states, their standard deviations remained higher than that of CM-DQN, indicating that the module SOHs were still unbalanced at the end of the lifetime. With heterogeneous SOHs in the battery pack, CM-DQN outperformed the other algorithms in balancing SOHs among the modules.
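For reference, the starting point of the balance metric in Figure 6f can be computed from Table 3. The module SOHs in Table 3 agree with the arithmetic mean of their four cells to within 0.01 percentage point, and $SOH_P(0)$ equals the smallest module SOH; the sketch assumes these two aggregations as stand-ins for the equations defined earlier in the paper. It also assumes a population standard deviation, which the figure does not specify.

```python
from statistics import mean, pstdev

# Initial cell SOHs (%) from Table 3, one list per module (cells 1-4).
CELL_SOH_0 = {
    1: [90.00, 86.77, 84.13, 78.15],
    2: [82.17, 84.72, 80.02, 79.28],
    3: [79.23, 88.16, 87.35, 81.19],
    4: [89.96, 84.23, 85.76, 87.54],
    5: [86.72, 83.46, 79.98, 81.75],
    6: [82.84, 86.11, 78.72, 80.14],
}

# Assumed aggregations: module SOH = mean of its cells; pack SOH = weakest module.
module_soh = {i: mean(cells) for i, cells in CELL_SOH_0.items()}
pack_soh = min(module_soh.values())

# Population standard deviation of the module SOHs (the balance metric).
module_spread = pstdev(list(module_soh.values()))
```

Under these assumptions, the initial spread of module SOHs is about 1.8 percentage points.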
Figure 7 shows the performance of the proposed algorithm in terms of battery pack lifetime, which is extended by minimizing the SOH reduction. By balancing the SOHs among the modules and minimizing the SOH reduction in each time slot, CM-DQN achieved the lowest SOH reduction in the battery pack of all the algorithms, as shown in Figure 7a. The SOH reduction under Self-X and M-SOH increased rapidly after 1200 h compared with the other algorithms, because neither considers both SOC and SOH. M-A2C performed closest to CM-DQN, but because it ignores information from external systems, its SOH reduction $\Delta SOH_P(t)$ increased faster as the battery aged beyond 1250 h. Minimizing the SOH reduction with CM-DQN extended the battery pack lifetime, as shown in Figure 7b: the SOH of the battery pack under CM-DQN reached 60% (the end of its second life) after 1558 h of operation. Because they do not observe the BESS states, all the other algorithms reached the end of the second life significantly earlier than CM-DQN. Table 5 shows that CM-DQN improved the battery pack lifetime by up to 16.27% compared with previous work. Hence, the proposed algorithm can efficiently schedule the switches in the battery pack to maximize the BESS lifetime and balance the SOHs among modules and cells.
To show the impact of SOC balancing on the lifetime of the battery pack, Figure 8 plots the standard deviation of the SOCs in the battery pack under each algorithm. Self-X, which focuses only on SOCs, achieved better SOC balance than the other algorithms at first, but its standard deviation stopped decreasing after some time, and it yielded a shorter battery pack lifetime, as shown in Figure 7b. Under CM-DQN, the standard deviation decreased continuously over time and fell below that of Self-X after 1130 h. Although CM-DQN targets only SOH balancing and uses SOCs as observations, it also reduced the SOC imbalance among the cells in the battery pack and achieved a longer lifetime. These results, shown in Figure 6f, Figure 7 and Figure 8, indicate that scheduling cells and modules for SOH balancing matters more than SOC balancing for extending the lifetime of the battery pack.

5.3. Performance of the Proposed Algorithm on Varied Demands

We studied the performance of the proposed algorithm under different power demands, as listed in Table 6. Scenario 1 keeps the same mean (80 Wh) with increasing variance (from 8.33 to 533.33), as shown in Figure 9a, and Scenario 2 keeps the same variance (133.33) with increasing mean (from 60 Wh to 100 Wh), as shown in Figure 9b. Understanding the impact of demand variability is essential for evaluating the robustness and adaptability of battery management algorithms in real-world applications.
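The variances in Table 6 follow directly from the moments of a uniform distribution U(a, b), with mean (a + b)/2 and variance (b − a)²/12, so any 40 Wh wide range gives 1600/12 ≈ 133.33. The short check below reproduces both scenarios; variances are in Wh². Names are hypothetical.

```python
from typing import List, Tuple

# Demand ranges (Wh) for each scenario, from Table 6.
SCENARIO_1 = [(75, 85), (70, 90), (60, 100), (50, 110), (40, 120)]
SCENARIO_2 = [(40, 80), (50, 90), (60, 100), (70, 110), (80, 120)]


def uniform_moments(low: float, high: float) -> Tuple[float, float]:
    """Mean (a + b) / 2 and variance (b - a)^2 / 12 of U(a, b)."""
    return (low + high) / 2, (high - low) ** 2 / 12


def summarize(ranges: List[Tuple[float, float]]) -> List[Tuple[float, ...]]:
    """(mean, variance) per range, rounded to two decimals as in Table 6."""
    return [tuple(round(v, 2) for v in uniform_moments(a, b)) for a, b in ranges]
```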
Figure 10 illustrates how load demand conditions with different demand variances ($\sigma^2$) but the same mean demand affect the performance of the algorithms. Figure 10a shows that, as the demand variance increased from 8.33 to 533.33, the standard deviation of the module SOHs increased for all algorithms. The proposed algorithm minimizes the SOH reduction in the battery pack at each time slot by balancing the SOHs of the battery modules. The charge and discharge control algorithm (Algorithm 3) and the observation of dynamic demands (included in the state vectors in Equation (36)) helped CM-DQN consistently achieve the lowest standard deviation, indicating robustness against fluctuations in demand. In contrast, the standard deviation under the previous algorithms grew with the variance of the power demand, indicating a lack of adaptation to demand dynamics. Figure 10b compares the lifetimes obtained by the algorithms. CM-DQN achieved a significantly longer lifetime than the others and maintained robust SOH balancing and lifetime extension even as the demand variance increased. This suggests that the proposed algorithm is more adaptive and more effective in extending a battery's operational lifetime under varying demand conditions. In summary, CM-DQN was the most robust of the algorithms (lower standard deviation and longer lifetime) under increasing demand variance.
Figure 11 presents the performance of the algorithms under different mean demands ($\mu$) with the variance held constant. Figure 11a shows the standard deviation of the module SOHs as the mean demand increased from 60 Wh to 100 Wh. CM-DQN consistently achieved the lowest standard deviation, indicating its robustness against changes in the mean demand. Figure 11b shows the lifetimes obtained by the algorithms as the mean demand increased. CM-DQN achieved the longest lifetime at every mean demand level, demonstrating its effectiveness in extending battery lifetime under varying power demands. Overall, CM-DQN minimized the SOH reduction in the battery pack under different mean demand conditions, extending the battery lifetime and balancing the SOHs among the modules.

6. Conclusions

In this paper, we proposed a battery management algorithm based on a cooperative multi-agent deep Q network to maximize the lifetime of the battery pack in a BESS. The BESS consisted of a battery pack in which retired batteries with heterogeneous SOHs are connected in parallel and series. The proposed algorithm scheduled the switches in the battery pack to connect or bypass battery cells and modules. It maximized the battery pack lifetime by reducing the SOH imbalance among cells and adapting to varying power demands. Simulation results showed that the proposed algorithm outperformed the other algorithms, achieving a longer battery pack lifetime and a smaller SOH imbalance among cells and modules, and extended the battery pack lifetime by up to 16.27% compared with the other algorithms.

Author Contributions

Conceptualization, N.Q.D., S.M.S. and S.K.; methodology, N.Q.D., S.M.S., S.-J.C. and S.K.; software, N.Q.D.; validation, S.M.S., T.M.D., S.-J.C. and S.K.; formal analysis, N.Q.D., S.M.S., T.M.D. and S.K.; investigation, N.Q.D., S.M.S., T.M.D. and S.K.; writing—original draft preparation, N.Q.D., S.M.S. and T.M.D.; supervision, S.K. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Basic Science Research Program through the National Research Foundation of Korea (NRF), funded by the Ministry of Education (NRF-2021R1I1A3A04037415) and the Korea Hydro and Nuclear Power Co. (2023).

Data Availability Statement

Publicly available datasets were analyzed in this study. These data can be found here: https://www.nasa.gov/intelligent-systems-division/discovery-and-systems-health/pcoe/pcoe-data-set-repository/ (accessed on 22 July 2024).

Conflicts of Interest

The funding sponsors had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.

References

  1. Hunt, G. USABC Electric Vehicle Battery Test Procedures Manual; Revision 2; United States Department of Energy: Washington, DC, USA, 1996.
  2. Examples for Reuse of Power Batteries. In Reuse and Recycling of Lithium-Ion Power Batteries; John Wiley & Sons, Ltd.: Hoboken, NJ, USA, 2017; Chapter 3; pp. 261–334.
  3. Canals Casals, L.; García, B.; Aguesse, F.; Iturrondobeitia, A. Second life of electric vehicle batteries: Relation between materials degradation and environmental impact. Int. J. Life Cycle Assess. 2015, 22, 82–93.
  4. Wang, L.; Yusheng, S.; Wang, X.; Wang, Z.; Zhao, X. Reliability Modeling Method for Lithium-ion Battery Packs Considering the Dependency of Cell Degradations Based on a Regression Model and Copulas. Materials 2019, 12, 1054.
  5. Martinez-Laserna, E.; Gandiaga, I.; Sarasketa-Zabala, E.; Badeda, J.; Stroe, D.I.; Swierczynski, M.; Goikoetxea, A. Battery second life: Hype, hope or reality? A critical review of the state of the art. Renew. Sustain. Energy Rev. 2018, 93, 701–718.
  6. Europe’s Largest Energy Storage System Now Live at the Johan Cruijff Arena. Available online: https://global.nissannews.com/en/releases/europes-largest-energy-storage-system-now-live-at-the-johan-cruijff-arena/ (accessed on 22 July 2024).
  7. Second Life for a Coal Power Plant in Germany. Available online: https://www.electrive.com/2020/11/24/second-life-for-a-coal-power-plant-in-germany/ (accessed on 22 July 2024).
  8. Nováková, K.; Pražanová, A.; Stroe, D.I.; Knap, V. Second-Life of Lithium-Ion Batteries from Electric Vehicles: Concept, Aging, Testing, and Applications. Energies 2023, 16, 2345.
  9. Rouholamini, M.; Wang, C.; Nehrir, H.; Hu, X.; Hu, Z.; Aki, H.; Zhao, B.; Miao, Z.; Strunz, K. A Review of Modeling, Management, and Applications of Grid-Connected Li-Ion Battery Storage Systems. IEEE Trans. Smart Grid 2022, 13, 4505–4524.
  10. Cactos One Energy Storages. Available online: https://www.cactos.fi/en/product/ (accessed on 22 July 2024).
  11. Audi Opens Battery Storage Unit on Berlin EUREF Campus. Available online: https://www.mobilityhouse.com/media/productattachments/files/en20190524_Audi_opens_battery_storage_unit.pdf (accessed on 22 July 2024).
  12. A. Market-Intelligence-Report-December-2022. Available online: https://projectcobra.eu/wp-content/uploads/2022/12/Market-Intelligence-Report-December-2022.pdf (accessed on 22 July 2024).
  13. Fortum Installs Innovative Battery Solution at Landafors Hydropower Plant in Sweden. Available online: https://www.fortum.com/media/2021/04/fortum-installs-innovative-battery-solution-landafors-hydropower-plant-sweden (accessed on 22 July 2024).
  14. Why CellSwitch™ Is Revolutionary for Batteries. Available online: https://www.relectrify.com/technology/cellswitch/ (accessed on 22 July 2024).
  15. Kim, T.; Qiao, W.; Qu, L. Power Electronics-Enabled Self-X Multicell Batteries: A Design toward Smart Batteries. IEEE Trans. Power Electron. 2012, 27, 4723–4733.
  16. Doan, N.Q.; Shahid, S.M.; Choi, S.J.; Kwon, S. Deep Reinforcement Learning-Based Battery Management Algorithm for Retired Electric Vehicle Batteries with a Heterogeneous State of Health in BESSs. Energies 2024, 17, 79.
  17. Yang, X.; Liu, P.; Liu, F.; Liu, Z.; Wang, D.; Zhu, J.; Wei, T. A DOD-SOH balancing control method for dynamic reconfigurable battery systems based on DQN algorithm. Front. Energy Res. 2023, 11, 1333147.
  18. Ma, Z.; Gao, F.; Gu, X.; Li, N.; Wu, Q.; Wang, X.; Wang, X. Multilayer SOH Equalization Scheme for MMC Battery Energy Storage System. IEEE Trans. Power Electron. 2020, 35, 13514–13527.
  19. Shen, L.; Li, J.; Meng, L.; Zhu, L.; Shen, H.T. Transfer Learning-Based State of Charge and State of Health Estimation for Li-Ion Batteries: A Review. IEEE Trans. Transp. Electrif. 2024, 10, 1465–1481.
  20. Chowdhury, S.; Shaheed, M.N.B.; Sozer, Y. State-of-Charge Balancing Control for Modular Battery System with Output DC Bus Regulation. IEEE Trans. Transp. Electrif. 2021, 7, 2181–2193.
  21. Abdalla, A.A.; Moursi, M.S.E.; El-Fouly, T.H.M.; Hosani, K.H.A. Reliant Monotonic Charging Controllers for Parallel-Connected Battery Storage Units to Reduce PV Power Ramp Rate and Battery Aging. IEEE Trans. Smart Grid 2023, 14, 4424–4438.
  22. Li, N.; Gao, F.; Hao, T.; Ma, Z.; Zhang, C. SOH Balancing Control Method for the MMC Battery Energy Storage System. IEEE Trans. Ind. Electron. 2018, 65, 6581–6591.
  23. Wang, X.; Wang, S.; Liang, X.; Zhao, D.; Huang, J.; Xu, X.; Dai, B.; Miao, Q. Deep Reinforcement Learning: A Survey. IEEE Trans. Neural Networks Learn. Syst. 2024, 35, 5064–5078.
  24. Wu, L.; Lyu, Z.; Huang, Z.; Zhang, C.; Wei, C. Physics-based battery SOC estimation methods: Recent advances and future perspectives. J. Energy Chem. 2024, 89, 27–40.
  25. Sui, Y.; Song, S. A Multi-Agent Reinforcement Learning Framework for Lithium-ion Battery Scheduling Problems. Energies 2020, 13, 1982.
  26. Hu, X.; Li, S.; Peng, H. A comparative study of equivalent circuit models for Li-ion batteries. J. Power Sources 2012, 198, 359–367.
  27. BU-302: Series and Parallel Battery Configurations. Available online: https://batteryuniversity.com/article/bu-302-series-and-parallel-battery-configurations# (accessed on 22 July 2024).
  28. von Bülow, F.; Meisen, T. A review on methods for state of health forecasting of lithium-ion batteries applicable in real-world operational conditions. J. Energy Storage 2023, 57, 105978.
  29. Lipu, M.H.; Hannan, M.; Hussain, A.; Hoque, M.; Ker, P.J.; Saad, M.; Ayob, A. A review of state of health and remaining useful life estimation methods for lithium-ion battery in electric vehicles: Challenges and recommendations. J. Clean. Prod. 2018, 205, 115–133.
  30. Hua, Y.; Cordoba-Arenas, A.; Warner, N.; Rizzoni, G. A multi time-scale state-of-charge and state-of-health estimation framework using nonlinear predictive filter for lithium-ion battery pack with passive balance control. J. Power Sources 2015, 280, 293–312.
  31. Chang, C.Y.; Tulpule, P.; Rizzoni, G.; Zhang, W.; Du, X. A probabilistic approach for prognosis of battery pack aging. J. Power Sources 2017, 347, 57–68.
  32. Jeng, S.L.; Tan, C.M.; Chen, P.C. Statistical distribution of Lithium-ion batteries useful life and its application for battery pack reliability. J. Energy Storage 2022, 51, 104399.
  33. Lucaferri, V.; Quercio, M.; Laudani, A.; Riganti Fulginei, F. A Review on Battery Model-Based and Data-Driven Methods for Battery Management Systems. Energies 2023, 16, 7807.
  34. Bole, B.; Kulkarni, C.; Daigle, M. Randomized Battery Usage Data Set. In NASA Prognostics Data Repository; NASA Ames Research Center: Moffett Field, CA, USA, 2009.
  35. Liu, X.; Li, J.; Yao, Z.; Wang, Z.; Si, R.; Diao, Y. Research on battery SOH estimation algorithm of energy storage frequency modulation system. Energy Rep. 2022, 8, 217–223.
  36. Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; A Bradford Book: Cambridge, MA, USA, 2018.
  37. Basheer, I.; Hajmeer, M. Artificial neural networks: Fundamentals, computing, design, and application. J. Microbiol. Methods 2000, 43, 3–31.
  38. Alpaydin, E. Introduction to Machine Learning; MIT Press: Cambridge, MA, USA, 2020.
  39. Shahid, S.M.; Ko, S.; Kwon, S. Real-Time Classification of Diesel Marine Engine Loads Using Machine Learning. Sensors 2019, 19, 3172.
Figure 1. Full life cycle of lithium-ion batteries.
Figure 2. Connections in a battery pack: (a) series-parallel (S-P), and (b) parallel-series (P-S).
Figure 3. An example of a parallel-series connected BESS.
Figure 4. Battery cell model.
Figure 5. The training process in the cooperative multi-agent deep Q network.
Figure 6. SOH balancing with different methods.
Figure 7. Extending lifetime of the battery pack with (a) SOH reduction per time slot, and (b) maximum lifetime of the battery pack until reaching 60% of SOH.
Figure 8. Standard deviation of SOCs among modules.
Figure 9. Energy demand profiles with (a) Scenario 1, and (b) Scenario 2.
Figure 10. Performance of the scheduling algorithms under dynamic power demand in Scenario 1: (a) standard deviations in SOHs among the modules and (b) operation times of the battery pack until the SOH reached 60%.
Figure 11. Performance of the scheduling algorithms under dynamic power demand in Scenario 2: (a) standard deviation in SOHs among the modules and (b) operation time of the battery pack until the SOH reached 60%.
Table 2. Classification of battery scheduling algorithms.
ReferencesBattery ConnectionSOC
Balancing
SOH
Balancing
Lifetime
Optimal
Dynamic Power Demand
ParallelSeriesParallel-Series
[20]-----
[21]-----
[15]--
[18,22]----
[17]---
[16]--
[25]----
Proposed
Table 3. Initial SOHs of the cells and modules.
| | Module 1 | Module 2 | Module 3 | Module 4 | Module 5 | Module 6 |
|---|---|---|---|---|---|---|
| $SOH_{i1}(0)$ | 90.00% | 82.17% | 79.23% | 89.96% | 86.72% | 82.84% |
| $SOH_{i2}(0)$ | 86.77% | 84.72% | 88.16% | 84.23% | 83.46% | 86.11% |
| $SOH_{i3}(0)$ | 84.13% | 80.02% | 87.35% | 85.76% | 79.98% | 78.72% |
| $SOH_{i4}(0)$ | 78.15% | 79.28% | 81.19% | 87.54% | 81.75% | 80.14% |
| $SOH_{i}(0)$ | 84.77% | 81.55% | 83.98% | 86.87% | 82.98% | 81.95% |
| $SOH_{P}(0)$ | 81.55% | | | | | |
Table 4. Simulation parameters.
Table 4. Simulation parameters.
Parameter | Value
Number of battery cells | 6 × 4
Battery type | Lithium-ion, 3.7 V / 2.2 Ah
Total capacity (new) | 195.36 Wh
$I_{discharge}$ | 8 A
$I_{charge}$ | −8 A
$I_{min}$, $I_{max}^{+}$ | −4 A, 4 A
$SOC_{min}$, $SOC_{max}$ | 10%, 90%
$\eta$ | 1 (discharge) / 0.98 (charge)
Total working time | 1600 h
$\Delta t$ | 10 min
Hidden layers in each network | 2
Neurons in each hidden layer | 128
Neurons in input layer (cell scheduling) | 11
Neurons in output layer (cell scheduling) | 12
Neurons in input layer (module scheduling) | 15
Neurons in output layer (module scheduling) | 15
Learning rate ($\alpha$) | 0.001
$\epsilon$-greedy value | 0.9
Discount factor ($\gamma$) | 0.99
Period of target network update ($P$) | 100 time slots
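Table 4 fixes the size of each Q-network: two hidden layers of 128 neurons, with 11 inputs and 12 outputs for cell scheduling and 15 inputs and 15 outputs for module scheduling, plus a target network refreshed every P = 100 time slots. The pure-Python sketch below instantiates those shapes, computes Q-values and a greedy action for a placeholder state, and performs the periodic hard target update. Fully connected layers, ReLU activations, He-style initialization, and the helper names (`init_mlp`, `q_values`, `maybe_sync_target`, `count_params`) are illustrative assumptions rather than details from the paper; training, the ε-greedy policy, and the actual state encoding are omitted.

```python
import random

# Layer widths taken from Table 4: input, hidden, hidden, output.
CELL_NET_SIZES = [11, 128, 128, 12]
MODULE_NET_SIZES = [15, 128, 128, 15]
TARGET_UPDATE_PERIOD = 100  # P, in time slots (Table 4)


def init_mlp(sizes, seed=0):
    """Fully connected layers as (weights, biases); weights[j][i] links input i to unit j.

    He-style Gaussian initialization is an illustrative choice, not the paper's.
    """
    rng = random.Random(seed)
    layers = []
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        std = (2.0 / n_in) ** 0.5
        weights = [[rng.gauss(0.0, std) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        layers.append((weights, biases))
    return layers


def q_values(layers, state):
    """Forward pass: ReLU on hidden layers (assumed), linear output with one Q-value per action."""
    x = list(state)
    last = len(layers) - 1
    for idx, (weights, biases) in enumerate(layers):
        x = [sum(w * xi for w, xi in zip(row, x)) + b for row, b in zip(weights, biases)]
        if idx < last:
            x = [max(0.0, v) for v in x]
    return x


def copy_mlp(layers):
    """Deep copy of the weights, used to refresh the target network."""
    return [([row[:] for row in weights], biases[:]) for weights, biases in layers]


def maybe_sync_target(online, target, step, period=TARGET_UPDATE_PERIOD):
    """Hard update: replace the target network with the online weights every `period` slots."""
    if step % period == 0:
        return copy_mlp(online), True
    return target, False


def count_params(sizes):
    """Weights plus biases of a fully connected network with the given layer widths."""
    return sum(n_in * n_out + n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))


online = init_mlp(CELL_NET_SIZES, seed=1)
target = copy_mlp(online)

state = [0.5] * CELL_NET_SIZES[0]  # placeholder observation, not the paper's state encoding
q = q_values(online, state)
greedy_action = max(range(len(q)), key=q.__getitem__)

syncs = 0
for step in range(1, 301):
    # A gradient update of `online` (learning rate 0.001, discount 0.99) would go here.
    target, synced = maybe_sync_target(online, target, step)
    syncs += synced

print(f"Q-values: {len(q)}, greedy action: {greedy_action}, target syncs: {syncs}")
print(f"Parameters: cell net {count_params(CELL_NET_SIZES)}, "
      f"module net {count_params(MODULE_NET_SIZES)}")
```

Under the fully connected assumption, the cell-scheduling network has 19,596 trainable parameters and the module-scheduling network 20,495.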
Table 5. Battery pack lifetime improvement of the proposed algorithm compared with previous work.
Method | Self-X | M-SOH | DOD-SOH | M-A2C
Extended lifetime by CM-DQN | 16.27% | 14.14% | 11.93% | 4.49%
Table 6. Different energy demand conditions.
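Scenario | Energy Demand | Mean $\mu$ | Variance $\sigma^2$ (Wh$^2$)
1 | U(75 Wh, 85 Wh) * | 80 Wh | 8.33
1 | U(70 Wh, 90 Wh) | 80 Wh | 33.33
1 | U(60 Wh, 100 Wh) | 80 Wh | 133.33
1 | U(50 Wh, 110 Wh) | 80 Wh | 300
1 | U(40 Wh, 120 Wh) | 80 Wh | 533.33
2 | U(40 Wh, 80 Wh) | 60 Wh | 133.33
2 | U(50 Wh, 90 Wh) | 70 Wh | 133.33
2 | U(60 Wh, 100 Wh) | 80 Wh | 133.33
2 | U(70 Wh, 110 Wh) | 90 Wh | 133.33
2 | U(80 Wh, 120 Wh) | 100 Wh | 133.33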
* U(75 Wh, 85 Wh) denotes energy demand values drawn from a uniform distribution between 75 Wh and 85 Wh; the other U(·, ·) entries are defined analogously.
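Each row of Table 6 is a uniform distribution U(a, b), so its moments follow from the standard formulas μ = (a + b)/2 and σ² = (b − a)²/12. Scenario 1 keeps μ = 80 Wh while widening the range, and Scenario 2 keeps a 40 Wh range (σ² ≈ 133.33 Wh²) while shifting the mean. The sketch below recomputes the table's moments and draws a demand sequence. How demand values are mapped to time slots in the simulation is not specified in this section, so `sample_demand` and its independent per-step sampling are a hypothetical illustration.

```python
import random

# Uniform energy-demand ranges (a, b) in Wh, transcribed from Table 6.
SCENARIOS = {
    1: [(75, 85), (70, 90), (60, 100), (50, 110), (40, 120)],  # fixed mean, growing spread
    2: [(40, 80), (50, 90), (60, 100), (70, 110), (80, 120)],  # fixed spread, shifting mean
}


def uniform_mean(a, b):
    """Mean of U(a, b): (a + b) / 2."""
    return (a + b) / 2


def uniform_variance(a, b):
    """Variance of U(a, b): (b - a)^2 / 12, in Wh^2 when a and b are in Wh."""
    return (b - a) ** 2 / 12


def sample_demand(a, b, n, seed=0):
    """Hypothetical profile: n demand values (Wh) drawn independently from U(a, b)."""
    rng = random.Random(seed)
    return [rng.uniform(a, b) for _ in range(n)]


for scenario, ranges in SCENARIOS.items():
    for a, b in ranges:
        print(f"Scenario {scenario}: U({a} Wh, {b} Wh) -> "
              f"mean {uniform_mean(a, b):.0f} Wh, variance {uniform_variance(a, b):.2f} Wh^2")

profile = sample_demand(60, 100, n=1000, seed=42)
print(f"Empirical mean of the sampled profile: {sum(profile) / len(profile):.2f} Wh")
```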
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.


MDPI and ACS Style

Doan, N.Q.; Shahid, S.M.; Duong, T.M.; Choi, S.-J.; Kwon, S. Extending the BESS Lifetime: A Cooperative Multi-Agent Deep Q Network Framework for a Parallel-Series Connected Battery Pack. Energies 2024, 17, 4604. https://doi.org/10.3390/en17184604

Chicago/Turabian Style

Doan, Nhat Quang, Syed Maaz Shahid, Tho Minh Duong, Sung-Jin Choi, and Sungoh Kwon. 2024. "Extending the BESS Lifetime: A Cooperative Multi-Agent Deep Q Network Framework for a Parallel-Series Connected Battery Pack" Energies 17, no. 18: 4604. https://doi.org/10.3390/en17184604
