Article

Prediction of Remaining Useful Life of Battery Using Partial Discharge Data

1 Department of Computer Engineering, Kongju National University, Cheonan 31080, Republic of Korea
2 Department of Computer Engineering, Inha University, Incheon 22212, Republic of Korea
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Electronics 2024, 13(17), 3475; https://doi.org/10.3390/electronics13173475
Submission received: 31 July 2024 / Revised: 26 August 2024 / Accepted: 26 August 2024 / Published: 1 September 2024
(This article belongs to the Special Issue Power Electronics and Renewable Energy System)

Abstract:
Lithium-ion batteries are cornerstones of renewable technologies and are used in many applications, particularly electric vehicles and portable electronics. Accurate estimation of the remaining useful life (RUL) of a battery is essential for durability, efficient operation, and stability. In this study, we propose an approach to predict the RUL of a battery using partial discharge data from the battery cycles. Unlike other studies that use complete cycle data and face reproducibility issues, our research utilizes only partial data, making it both practical and reproducible. To analyze this partial data, we applied various deep learning methods and compared multiple models, among which ConvLSTM showed the best performance, with an RMSE of 0.0824. By comparing the performance of ConvLSTM at various ratios and ranges, we confirmed that using partial data can achieve a performance equal to or better than that obtained when using complete cycle data.

1. Introduction

Since lithium-ion batteries are cost-effective, possess high energy density, and exhibit excellent efficiency, they are widely used in a broad range of applications, such as electric vehicles, electronics, and aircraft [1,2]. One widespread use is in electric vehicles, where they are predominantly recognized as environmentally friendly because they reduce greenhouse gas emissions and minimize fossil fuel consumption [3]. Recently, the number of electric vehicles worldwide that use lithium-ion batteries has grown substantially, and, as a result, the demand for lithium-ion batteries is also increasing [4]. Understanding lithium-ion batteries in detail is necessary for estimating their lifespan, which is important for their efficient use. Detailed knowledge of battery lifespan aids in enhanced design and manufacturing practices, improves stability and reliability, and helps to manage costs more efficiently. The pace at which lithium-ion batteries degrade varies based on their chemical composition, design, and the specific environments in which they are used. The diversity and unpredictability of deterioration patterns due to these factors make accurate prediction difficult. While lithium-ion batteries are important for the development of sustainable technologies, the accurate prediction of their lifespan remains a complex issue [5].
Battery degradation is the most critical aspect of battery health in the management systems of lithium-ion batteries. During the cycling process, lithium-ion cells undergo many physical and chemical changes, which result in reducing the lifespan and overall performance of the battery [6,7]. Conventionally, a lithium-ion battery is considered to have reached the end of life (EOL) when its capacity falls to 80% of its original value [8]. The remaining useful life (RUL) refers to the number of charging and discharging cycles remaining before the battery reaches its EOL [9]. The objective of RUL prediction is to predict the degradation of battery performance over time. By using historical and present data, it aims to forecast the future state of the battery and provide a warning before failure occurs. Precisely estimating the RUL is important to ensure stability, maintenance, and timely replacement [10].
Generally, three methods are used to estimate the RUL of a battery: model-based, data-driven, and hybrid methods. Model-based estimations rely on physics-based battery models, consist of mathematical equations, and require a complete understanding of the battery mechanism. Several adaptive filters and observers are used in these approaches to evaluate the battery's condition and estimate its RUL. Techniques such as empirical models, electrochemical models, and equivalent circuit models (ECM) are commonly used. Monitoring battery degradation specifically requires sophisticated models like the electrochemical model and Brownian motion model, along with estimation methods like the Kalman filter [11], particle filter [12], and other adaptive filters like the H∞ filter. These techniques play a pivotal role as estimation methods, but they have several drawbacks, since they are based on complex electrochemical processes. Firstly, it is difficult to adjust the model's parameters throughout the battery's life cycle. Secondly, using particle filters as observers can lead to complications such as a reduction in the number of effective particles, which ultimately makes the RUL predictions unreliable [13].
Data-driven methods are independent of complex mathematical equations and instead rely on large amounts of data and powerful computing [14]. In recent developments in the estimation of battery degradation, machine learning and deep learning models such as Gaussian process regression (GPR), support vector machines (SVM), artificial neural networks (ANN), and recurrent neural networks (RNN) [15] have been employed to predict the RUL of batteries. Several studies have applied differential voltage analysis and incremental capacity analysis under high-current scenarios to determine the relationship associated with capacity degradation; an SVM achieved a relative error of 2.1% for capacity estimation [16,17]. An LSTM+GRU model was employed for predictions using aging data from different batteries, representing various charge and discharge cycles, achieving an error of 0.0032 compared to other models [13]. Furthermore, a multiple-output GPR model was implemented to predict the state of health and the remaining useful life of lithium-ion batteries, obtaining an RMSE_EoL of 4.86 when trained on data from Cell C2 [18]. Although existing studies have made considerable progress in predicting the RUL of batteries, the predictions using these techniques typically relied on complete data from the battery cycles: data where the battery underwent repeated charging and discharging from 0 to 100% until it reached its EOL. This approach, while comprehensive, is impractical for real-world applications and suffers from low reproducibility.
To address some of the drawbacks of previous studies, new research has been proposed that uses early-cycle data from the battery instead of complete-cycle data. These studies still rely on cycle data for predictions, but instead of utilizing all cycles until the battery reaches its end of life (EOL), they concentrate only on the early cycles. Advances in machine learning and deep learning have shown excellent results in these studies. For instance, a deep learning approach based on LSTM layers was employed to predict the entire capacity degradation trajectory of batteries from only 100 cycles of early-life data, achieving a prediction error of 1.1% and a processing speed 15 times faster than that of traditional methods [19]. Similarly, another study utilized the initial 50 cycles of data from lithium-ion batteries to predict the knee point, where capacity degradation accelerates rapidly and non-linearly. Based on this knee point, Paula et al. classified the battery life cycle into 'short', 'medium', and 'long' categories, achieving a prediction error of 9.4% for the knee point; the life cycle classification of the cells used only the data from the first three to five cycles, attaining a high accuracy of 88–90%. These results paved the way for improved management and prediction of battery lifespan [20]. Furthermore, a CNN-based model used early-cycle data from the first 50 cycles, based on voltage, current, and temperature measurements of 124 Li-ion cells, and accurately predicted the knee point in capacity degradation, with an MAE of 57.8 cycles and a MAPE of 9.4% [2]. Saxena et al. also utilized a convolutional neural network to forecast the entire capacity fade curve from the first 100 cycles of discharge data, achieving mean absolute percentage errors of 19% and 17% [21]. However, these studies still face some of the same challenges as before: they require data collected from complete 0 to 100% charge and discharge cycles, so the problems of previous studies persist.
To resolve these issues, recent studies have shifted towards the use of partial data for battery RUL prediction. These studies utilize only segments of the cycle data, such as 80–100%, instead of the complete data. This strategy offers a potential solution to the issues of impracticality and low reproducibility described above. The latest developments in battery health prognostics emphasize the importance of using partial data. An RNN-LSTM method achieved an RMSE of 0.00286 by analyzing partial charge and discharge data, demonstrating the model's precision with partial data [22]. Similarly, an SVM model applied to partial discharge data, with features extracted from voltage and temperature, achieved an RMSE of 0.2159 in the voltage range of 3.75–3.5 V, comparable to full discharge data [23]. Moreover, an HPR CNN model utilized only 20% of the information from battery cycles, achieved an error of 4.15%, and predicted the cycle life with an error of 16.09 cycles [24]. Furthermore, a GPR model utilized 30% of the partial discharge data to obtain around 1% RMSE and less than 1% average standard deviation [25].
Hybrid methods combine data-driven and model-based approaches [26]. They use machine learning models, such as neural networks or support vector machines, combined with filtering techniques like the particle filter (PF) or Kalman filter (KF) [27]. One study proposes a hybrid predictive model for battery capacity degradation that employs a sequence-to-sequence deep learning framework; hybrid features are first extracted by combining electrochemical model parameters with battery discharge data [28]. Hybrid approaches face challenges such as high computational demands, complexity in integrating different techniques, and difficulties in tuning parameters accurately. These issues can lead to increased system complexity, longer processing times, and potential model overfitting.
Earlier studies proposed approaches for battery RUL prediction using complete data. However, these approaches often proved impractical and suffered from low reproducibility. To address this, new studies emerged that utilized only early-cycle data, reducing the amount of data by concentrating on the early cycles instead of the complete cycle data. Even with this improvement, some of the original limitations remained, since full 0–100% cycle data were still required. More recently, studies have been proposed that utilize partial data, particularly a portion of the cycle, such as 80–100%. This approach addresses many of the previous issues and improves efficiency, practicality, and reproducibility. The main research differences in the field of RUL prediction concern how the cycle data are utilized. Previous studies predicting the RUL of batteries progressed in the following order: utilization of complete cycle data, utilization of early cycles from the complete cycle, and utilization of partial data. Our research, in contrast, presents a systematic approach by utilizing partial cycle data across different ratios and ranges. By providing different ratios along with a broader spectrum of ranges, it offers a more comprehensive and more practical analysis.
Our paper references these recent studies in predicting the RUL of batteries. The contributions of this research are as follows:
  • By referencing existing research methods, we predicted the RUL of a battery using partial discharge data.
  • Our research introduces a more systematic approach by utilizing data across different ratios and a broader spectrum of ranges. By analyzing and predicting the RUL at various ratios, such as 5%, 10%, and 20%, and further segmenting these into corresponding ranges like 95–100%, 90–100%, and 80–100%, it offers a more comprehensive analysis.
  • We compared the performance of various models using different data ratios and applied the most effective predictive model.
  • We analyzed the impact of data quality on the predictive performance of the optimal model.
  • We demonstrated that high-quality data are important for improving the accuracy of the RUL of battery predictions.
The rest of the paper is organized as follows: Section 2 provides the materials, including information about the dataset and the partial data. Section 3 discusses the prediction of the RUL of a battery and the architecture of the deep learning models. Section 4 presents the results, in which the prediction of the RUL of a battery and the impact of data quality on performance are discussed. Finally, Section 5 includes the discussion and conclusion.

2. Materials and Methods

2.1. Dataset

The dataset [22] comprises 124 commercial lithium iron phosphate (LFP) batteries that were subjected to standard charge and discharge processes under specific conditions. The specifications of the dataset are provided in Table 1. These batteries underwent a three-step charging process and a two-step discharging process. The discharge process was carried out using the standard constant current–constant voltage (CC-CV) method, starting at a discharge rate of 4C and continuing until the voltage dropped to 2 V. Once the voltage reached the lower cutoff point, the discharge mode shifted from constant current to constant voltage. The batteries were cycled in a controlled environment maintained at 30 °C, and the end of life (EOL) of the batteries was defined as the point at which their capacity fell below 80% of the original capacity. The operational voltage range for each battery was between 3.5 V and 2.5 V.
The dataset for the proposed study consists of 8 features with over 1000 data points for each feature. The main external health indicators of the battery are current, voltage, and temperature, which can be measured directly. However, there are other characteristics of the battery inside the cell that cannot be measured directly and require specific conditions to obtain, as they are sensitive to the operating environment. Using all features is usually impractical and complex. Therefore, for the purpose of our study, we chose to concentrate on the main external health indicators and a set of important attributes, which are voltage, current, temperature, charge capacity, discharge capacity, and the rate of change in discharge capacity [9,29].

2.2. Analysis of Discharge Data

Partial data refers to segments of data from each cycle, rather than the complete set of data from the battery cycles. It consists of segmenting the battery's cycle information into defined portions. This segmentation can reveal important changes in the battery's performance, such as unusual drops in voltage or decreases in discharge capacity at specific points in the cycle. Although significant information can certainly be extracted from the complete dataset, analyzing defined portions of data enables a more detailed examination of cell behaviors that may not be apparent in a comprehensive analysis. The left-hand side of Figure 1 shows the relationship between discharge capacity and voltage, while the right-hand side shows the relationship between the differences in discharge capacity and voltage. The cycle numbers are marked with different colors. Each cycle spans the range from 3.5 V down to 2.5 V during the constant current discharge process. As the battery degrades, significant changes occur in capacity, and the discharge curves of aged and new batteries show notable differences [30]. An aged battery not only shows a lower starting capacity, but also a faster deterioration of capacity compared to a new battery. In particular, the curve deviates when the voltage is around 3.1–3.2 V, as shown in Figure 1. Since the decline starts early during the discharge process, the early indications of battery degradation observed in the dataset, as shown in Figure 1, enable the prediction of RUL even in the initial cycles.
The knee point in the voltage versus discharge capacity plot, as shown in Figure 1, represents a critical transition in the battery’s discharge profile. In our analysis, this knee point occurs between the voltage levels of 3.2 and 3.1 V. Particularly, the knee point can be observed at approximately 3.2 V, which corresponds to a discharge capacity of just above 0.75 Ah. This region is characterized by a marked shift from a relatively flat voltage decline to a more pronounced decrease, indicating the approach towards the battery’s lower operational threshold.
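As an illustration, a knee point like the one described above can also be located programmatically. The sketch below uses a common maximum-distance-from-chord heuristic on a synthetic discharge curve; the heuristic, the curve, and all names are our own illustration under stated assumptions, not the method of the cited studies.

```python
import numpy as np

def knee_index(voltage, capacity):
    """Locate the knee as the point of maximum perpendicular distance from the
    straight line joining the curve's endpoints (a common knee heuristic)."""
    p0 = np.array([capacity[0], voltage[0]])
    p1 = np.array([capacity[-1], voltage[-1]])
    line = (p1 - p0) / np.linalg.norm(p1 - p0)   # unit vector along the chord
    pts = np.stack([capacity, voltage], axis=1) - p0
    dist = np.abs(pts[:, 0] * line[1] - pts[:, 1] * line[0])  # distance to chord
    return int(np.argmax(dist))

# Synthetic discharge curve: a slow, flat decline followed by a sharp drop
# after 0.75 Ah, mimicking the shape described for Figure 1.
cap = np.linspace(0.0, 1.0, 101)
volt = np.where(cap < 0.75, 3.3 - 0.13 * cap, 3.2025 - 2.0 * (cap - 0.75))
idx = knee_index(volt, cap)
print(round(cap[idx], 2))  # 0.75
```

On this synthetic curve the heuristic recovers the corner at 0.75 Ah, matching the qualitative description of the knee region above.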

2.3. Methodologies

2.3.1. CNN (Convolutional Neural Network)

CNN is a specific type of feedforward neural network that handles data with a grid-like topology, such as images and time series data. CNN was introduced by LeCun et al. in 1998 [31], and since then it has been broadly used in the fields of image processing and natural language processing. As CNN can perform local perception and employ weight sharing, it can be used to analyze spatial features. It is an effective technique for analyzing trends in data because it extracts features of the input data and learns the patterns inside the data [32]. Since CNN can process multi-dimensional input, it is used for many deep learning tasks. The convolution operation in a CNN is defined by the following:
(Z * T)_i = Σ_{m ∈ M} Z_{i+m} · T_m
where Z is the input, T is the n-dimensional convolutional kernel, i is a vector of indices in the output feature map, and m ranges over all relative positions in the kernel.
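The sum above can be sketched directly in NumPy. This is a toy 1-D example of the operation (a "valid" correlation, as the formula is written); the variable names are ours, not the paper's.

```python
import numpy as np

def conv1d_valid(z, t):
    """Discrete 'valid' correlation matching (Z * T)_i = sum_m Z_{i+m} * T_m."""
    n_out = len(z) - len(t) + 1
    return np.array([np.dot(z[i:i + len(t)], t) for i in range(n_out)])

signal = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
kernel = np.array([0.5, 0.5])  # simple moving-average kernel
print(conv1d_valid(signal, kernel))  # [1.5 2.5 3.5 4.5]
```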
CNN consists of convolution layers, pooling layers, and fully connected layers. The convolution layers create feature maps, which capture important patterns and relationships in the data; the convolutional kernels extract specific features by sliding filters across the input data. Once a feature map is created, it is processed through the pooling layers for downsampling, which helps prevent overfitting. The data are then merged into fully connected layers, which combine all of the extracted and processed information. Lastly, the processed information is used in the final stages of the network, typically fully connected layers, to make predictions.

2.3.2. LSTM (Long Short-Term Memory)

The discharge data collected from the battery cycles are time series data, which makes the RNN architecture suitable because it uses internal memory to forecast time series. RNNs can process sequences of information through both internal feedback and feedforward connections within their processing units. However, an RNN performs worse on longer sequences than on short ones because it can only retain a limited portion of the sequence's information, leading to a drop in accuracy for longer sequences. LSTM was proposed to address this limitation of long-term dependencies in traditional RNNs, which often leads to gradient problems; it extracts information and keeps it for extended time intervals [33]. The LSTM architecture comprises units known as memory blocks in the recurrent hidden layer. The memory blocks contain memory cells with self-connections that store the network's temporal state, as well as special multiplicative units known as gates that control the flow of information. In the traditional LSTM design, each memory block has an input gate and an output gate [34]. LSTM is capable of mapping input to output while holding and maintaining information for extended periods of time. The LSTM is based on several gates and states: the forget gate f_t, input gate i_t, cell state r_t, output gate q_t, and hidden state h_t [35].
The forget gate decides what information will be eliminated from the cell state. It applies a sigmoid function to the input at the current time step, x_t, and the prior hidden state, h_{t-1}:
f_t = σ(V_{fX} x_t + V_{fh} h_{t-1} + d_f)
The next step is to decide what current information is to be stored in the internal state. A hyperbolic tangent activation function (tanh) yields a new candidate vector e_t, which is added to the cell state, and a sigmoid function σ controls which information is to be updated. These are computed as follows:
i_t = σ(V_{iX} x_t + V_{ih} h_{t-1} + d_i)
e_t = tanh(V_{gX} x_t + V_{gh} h_{t-1} + d_g)
Combining the input gate i_t and the candidate vector e_t above to update the internal state r_{t-1} into the current state r_t is achieved as follows:
r_t = e_t · i_t + r_{t-1} · f_t
Similarly, the output gate q_t determines the output information by using the sigmoid activation function. The calculation for the output gate is as follows:
q_t = σ(V_{qX} x_t + V_{qh} h_{t-1} + d_q)
The hidden state h_t is determined by passing the cell state r_t through the tanh activation function and multiplying it by the output of the output gate q_t, as follows:
h_t = tanh(r_t) · q_t
In this way, LSTM is able to keep the important information that is required for a longer period of time and discard the information that is not needed with the help of gate units, and, thus, it performs better with a longer sequence of data.
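The gate equations above can be sketched as a single LSTM step in NumPy. This is a minimal illustration in the paper's notation (V for weight matrices, d for biases, r for the cell state, q for the output gate); the dimensions and random initialization are our own assumptions.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def lstm_step(x_t, h_prev, r_prev, W, d):
    """One LSTM step using the paper's symbols."""
    f_t = sigmoid(W["fX"] @ x_t + W["fh"] @ h_prev + d["f"])  # forget gate
    i_t = sigmoid(W["iX"] @ x_t + W["ih"] @ h_prev + d["i"])  # input gate
    e_t = np.tanh(W["gX"] @ x_t + W["gh"] @ h_prev + d["g"])  # candidate vector
    r_t = e_t * i_t + r_prev * f_t                            # cell state update
    q_t = sigmoid(W["qX"] @ x_t + W["qh"] @ h_prev + d["q"])  # output gate
    h_t = np.tanh(r_t) * q_t                                  # hidden state
    return h_t, r_t

rng = np.random.default_rng(0)
n_in, n_hid = 3, 4
W = {k: rng.standard_normal((n_hid, n_in if k.endswith("X") else n_hid)) * 0.1
     for k in ["fX", "fh", "iX", "ih", "gX", "gh", "qX", "qh"]}
d = {k: np.zeros(n_hid) for k in ["f", "i", "g", "q"]}
h, r = lstm_step(rng.standard_normal(n_in), np.zeros(n_hid), np.zeros(n_hid), W, d)
print(h.shape)  # (4,)
```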

2.3.3. GRU (Gated Recurrent Unit)

GRU is a type of ANN with recurrent connections. It is suitable for sequential tasks such as modeling, learning temporal representations, and identifying and forecasting sequential data [36]. Unlike LSTMs, GRUs do not contain an output gate that controls the availability of memory contents to other units in the network; the entire content of a GRU's memory is accessible to the network at each step. The absence of an output gate may increase the speed of learning and facilitate the flow of gradients [37]. In a GRU, the hidden states act as the network's memory, and the current state is determined by both the prior state and the current input. A GRU handles the propagation of gradient information by using a reset gate and an update gate to control the internal information flow. More precisely, the reset gate controls whether to ignore information from the previous hidden state, while the update gate is responsible for updating the hidden state [38]. GRU functionality can be described by the update gate z_t, the reset gate r_t, the candidate state h̃_t, and the hidden state output h_t at each time step:
z_t = σ(V_z x_t + U_z h_{t-1} + e_z)
r_t = σ(V_r x_t + U_r h_{t-1} + e_r)
h̃_t = φ(U(r_t · h_{t-1}) + V_c x_t + e_h̃)
h_t = z_t · h_{t-1} + (1 − z_t) · h̃_t
where V_z, V_r, and V_c are input weight parameters; U_z, U_r, and U are recurrent weight parameters; e_z, e_r, and e_h̃ are bias parameters; and σ and φ are the sigmoid function and output activation function, respectively.
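A single GRU step following the equations above can be sketched in NumPy as follows (φ taken as tanh; dimensions and initialization are illustrative assumptions):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, V, U, e):
    """One GRU step in the paper's notation, with phi = tanh."""
    z_t = sigmoid(V["z"] @ x_t + U["z"] @ h_prev + e["z"])             # update gate
    r_t = sigmoid(V["r"] @ x_t + U["r"] @ h_prev + e["r"])             # reset gate
    h_cand = np.tanh(U["c"] @ (r_t * h_prev) + V["c"] @ x_t + e["c"])  # candidate state
    return z_t * h_prev + (1.0 - z_t) * h_cand                         # new hidden state

rng = np.random.default_rng(1)
n_in, n_hid = 3, 4
V = {k: rng.standard_normal((n_hid, n_in)) * 0.1 for k in "zrc"}
U = {k: rng.standard_normal((n_hid, n_hid)) * 0.1 for k in "zrc"}
e = {k: np.zeros(n_hid) for k in "zrc"}
h = gru_step(rng.standard_normal(n_in), np.zeros(n_hid), V, U, e)
print(h.shape)  # (4,)
```

Note how, compared with the LSTM, the GRU needs no separate cell state: the hidden state itself carries the memory.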

2.3.4. CNN+LSTM

In CNN+LSTM, units from a CNN are combined with those from an LSTM, a design intended to handle heterogeneous spatiotemporal data. Unlike traditional CNNs, which excel at distinguishing spatial patterns but lack temporal context, CNN+LSTM combines the strengths of both the spatial and temporal dimensions and can sequentially handle both kinds of data. A practical approach to integrating Gaussian processes (GPs) with existing neural architectures is through the use of GP layers. For instance, in an LSTM model used for time-series forecasting, a GP layer can be added after the LSTM layers to model the residual uncertainty that the LSTM layers cannot capture. This combines the LSTM's ability to capture long-term dependencies with the GP's ability to model uncertainty around those dependencies. After the input data are processed through standard LSTM layers to capture the dynamics and dependencies in the data, the outputs (and possibly states) of the LSTM can be fed into a GP layer, which treats the LSTM outputs as input features and provides a probabilistic output. In GRU and CNN+LSTM, similar integrations can be implemented, where the sequential outputs from the GRU or the spatiotemporal feature maps from CNN+LSTM are passed through a Gaussian process layer to estimate the uncertainty of the predictions at each time step.

2.3.5. ConvLSTM

Traditional LSTM is unable to handle spatiotemporal data effectively due to its reliance on fully connected layers for input-to-state and state-to-state transitions. The ConvLSTM framework combines the LSTM structure with convolutional operations, enabling it to process data with spatial hierarchies [39]. While this model adopts a learning approach similar to traditional LSTM, it uniquely processes the input, output, and state layers as three-dimensional tensors using convolutions instead of traditional matrix multiplication. ConvLSTM improves upon traditional LSTM by allowing the simultaneous learning of temporal and spatial features. In addition, during training, it feeds the predicted data back into the input. ConvLSTM can be defined by operations on the input tensor Z_t, the hidden state H_{t-1}, and the cell state E_{t-1} at a given time t, modifying the states through convolutional transformations (*), as follows:
i_t = σ(V_{zi} * Z_t + V_{hi} * H_{t-1} + V_{ei} · E_{t-1} + b_i)
f_t = σ(V_{zf} * Z_t + V_{hf} * H_{t-1} + V_{ef} · E_{t-1} + b_f)
E_t = f_t · E_{t-1} + i_t · tanh(V_{ze} * Z_t + V_{he} * H_{t-1} + b_e)
o_t = σ(V_{zo} * Z_t + V_{ho} * H_{t-1} + V_{eo} · E_t + b_o)
H_t = o_t · tanh(E_t)
where * represents the convolution operation; · represents element-wise multiplication; i_t, f_t, and o_t are the input, forget, and output gates, respectively; V and b represent the weights and biases of the gates; and σ is the sigmoid activation function.
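A single-channel ConvLSTM step following the equations above can be sketched in NumPy. The hand-rolled 'same'-padded convolution, the 5×5 state shape, and the random initialization are all illustrative assumptions, not the paper's implementation (which uses ConvLSTM2D layers).

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, k):
    """'Same'-padded 2-D correlation for a single channel (toy helper)."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    return np.array([[np.sum(xp[i:i + kh, j:j + kw] * k)
                      for j in range(x.shape[1])] for i in range(x.shape[0])])

def convlstm_step(Z_t, H_prev, E_prev, K, b):
    """One ConvLSTM step: * is convolution, the remaining products are element-wise."""
    i_t = sigmoid(conv2d_same(Z_t, K["zi"]) + conv2d_same(H_prev, K["hi"]) + K["ei"] * E_prev + b["i"])
    f_t = sigmoid(conv2d_same(Z_t, K["zf"]) + conv2d_same(H_prev, K["hf"]) + K["ef"] * E_prev + b["f"])
    E_t = f_t * E_prev + i_t * np.tanh(conv2d_same(Z_t, K["ze"]) + conv2d_same(H_prev, K["he"]) + b["e"])
    o_t = sigmoid(conv2d_same(Z_t, K["zo"]) + conv2d_same(H_prev, K["ho"]) + K["eo"] * E_t + b["o"])
    return o_t * np.tanh(E_t), E_t

rng = np.random.default_rng(2)
shape, ksz = (5, 5), (3, 3)
K = {k: rng.standard_normal(ksz) * 0.1 for k in ["zi", "hi", "zf", "hf", "ze", "he", "zo", "ho"]}
K.update({k: rng.standard_normal(shape) * 0.1 for k in ["ei", "ef", "eo"]})  # peephole weights
b = {k: np.zeros(shape) for k in "ifeo"}
H, E = convlstm_step(rng.standard_normal(shape), np.zeros(shape), np.zeros(shape), K, b)
print(H.shape)  # (5, 5)
```

The key difference from the plain LSTM step is that the input-to-state and state-to-state transitions are convolutions over a spatial grid rather than matrix multiplications over a flat vector.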

3. Experiments

3.1. Model Evaluation

We also conducted experiments using MAE and MARNE; since the performance trends were similar across metrics, we opted to report only RMSE as a single representative metric.
To evaluate the performance of the implemented model, we employed the root mean squared error (RMSE) to measure the model’s predicted capacity against the actual discharge capacity over the entire battery life cycle.
RMSE = √( (1/n) Σ_{i=1}^{n} (z_a − z_p)² )
where z_p is the predicted capacity and z_a is the actual discharge capacity.
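The metric above is straightforward to compute; a minimal sketch with illustrative values:

```python
import numpy as np

def rmse(z_actual, z_pred):
    """Root mean squared error between actual and predicted discharge capacity."""
    z_actual, z_pred = np.asarray(z_actual), np.asarray(z_pred)
    return float(np.sqrt(np.mean((z_actual - z_pred) ** 2)))

print(rmse([1.0, 1.1, 0.9], [1.0, 1.0, 1.0]))  # ≈ 0.0816
```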

3.2. Overview of Experiment

In our study, we utilized a deep learning model based on ConvLSTM2D layers to analyze temporal and spatial data patterns effectively. To optimize our model, we employed the Adam optimizer, renowned for its adaptive learning rate capabilities, which is important for avoiding local minima during training. The parameters and optimization of the experiment are given in Table 2. The training process was closely monitored using an Early Stopping callback, which is instrumental in preventing overfitting. This callback monitored the validation loss and halted the training if no improvement was observed for five consecutive epochs. Such a strategy ensures that our model generalizes well on unseen data without memorizing the training set.
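The early stopping logic described above (halt when the validation loss fails to improve for five consecutive epochs) can be sketched in plain Python. This is a minimal re-implementation of the idea for illustration, not the Keras callback used in the experiments.

```python
def train_with_early_stopping(val_losses, patience=5):
    """Return the 0-based epoch at which training halts: stop once the
    validation loss has not improved for `patience` consecutive epochs."""
    best, wait = float("inf"), 0
    for epoch, loss in enumerate(val_losses):
        if loss < best:
            best, wait = loss, 0  # improvement: reset the patience counter
        else:
            wait += 1
            if wait >= patience:
                return epoch      # patience exhausted: stop here
    return len(val_losses) - 1    # ran all epochs without triggering

# Loss improves for three epochs, then plateaus: training stops five
# epochs after the last improvement, never reaching the late dip.
losses = [1.0, 0.8, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.5]
print(train_with_early_stopping(losses))  # 7
```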
The dataset is composed of three batches, of which two are used for training and the remaining one for testing the model’s performance. Each cycle in the dataset comprises 1000 data points with eight features. In this paper, we extracted and utilized the current, voltage, temperature, and discharge capacity. The features that we used from the dataset contain outliers and missing values, which were preprocessed using the average of the nearby values. The model receives two parallel inputs. The first input consists of the initial five cycles and the most recent five cycles, which represent the initial and current states of the battery, respectively. The second input is derived by calculating the difference between the recent five cycles and the initial cycles, which enables us to consider data variation. Since the initial cycles often contain noise, the third cycle was chosen as the initial cycle.
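The two parallel inputs described above can be sketched as follows. The random array is a hypothetical stand-in for the preprocessed dataset (cycles × points × features); the shapes and helper names are our own assumptions.

```python
import numpy as np

# Hypothetical stand-in for the dataset: cycles x points x features
# (the real cycles have ~1000 data points; features here might be V, I, T, Qd).
rng = np.random.default_rng(3)
cycles = rng.standard_normal((200, 1000, 4))

def build_inputs(cycles, init_cycle=2, n=5):
    """First input: the initial n cycles plus the most recent n cycles.
    Second input: the difference between recent and initial cycles.
    The third cycle (index 2) serves as the noise-resistant initial cycle."""
    initial = cycles[init_cycle:init_cycle + n]      # initial state of the battery
    recent = cycles[-n:]                             # current state of the battery
    first_input = np.concatenate([initial, recent])  # shape: (2n, points, features)
    second_input = recent - initial                  # cycle-to-cycle variation
    return first_input, second_input

x1, x2 = build_inputs(cycles)
print(x1.shape, x2.shape)  # (10, 1000, 4) (5, 1000, 4)
```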
Figure 2 represents the overall workflow of our proposed study. The dataset contains diverse features, but the main features are voltage, current, temperature, and discharge capacity. The difference segment is based on variations between cycles: to represent the variation between the current cycle and the initial cycle, a noise-resistant initial cycle was used. Note that the ΔVoltage in the difference segment was found to be practically meaningless, since its calculation consistently showed no significant variance. Different models, namely LSTM, GRU, CNN+LSTM, and ConvLSTM, were implemented, and their performance was compared accordingly. Once the best model was identified, its performance was evaluated at the 5%, 10%, and 20% ratios across the corresponding broader spectrum of ranges. Since we observed a sharp performance decline around the 50% range during our experiments, we performed a detailed analysis at a 1% ratio over the 40–60% range. Furthermore, performance was compared between the parallel inputs and a single input, both at the 1% ratio over the 40–60% range and at the 5%, 10%, and 20% ratios with respect to all ranges.
In the experiments, we aimed to predict the discharge capacity 10 cycles ahead using partial data specified in ratios (1%, 5%, 10%, 20%, and 100%). For example, at a 5% ratio, only 50 out of 1000 data points are used. Using the optimal model identified through the model comparison outlined in Section 3.3, we predicted the battery’s discharge capacity across the entire range of the dataset, which was divided into various ratios. After that, we conducted a detailed analysis of these prediction results to evaluate the impact of data segmentation on performance. An analysis of the results revealed significant variations, particularly in segments, which were further analyzed using statistical analysis at a 1% ratio. Furthermore, we evaluated the model’s performance without the second input, which accounts for changes in the data, to directly compare its effects on the model’s effectiveness.
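The ratio/range selection described above amounts to slicing each cycle's data points. A minimal sketch (the helper name and the index-based slicing are our own illustration):

```python
import numpy as np

def partial_slice(cycle, lo_pct, hi_pct):
    """Keep only the lo-hi% portion of a cycle's data points, e.g. (95, 100)
    on a 1000-point cycle keeps the last 50 points (a 5% ratio)."""
    n = cycle.shape[0]
    return cycle[int(n * lo_pct / 100):int(n * hi_pct / 100)]

cycle = np.arange(1000.0)  # stand-in for one cycle's 1000 data points
print(len(partial_slice(cycle, 95, 100)))  # 50  (5% ratio, 95-100% range)
print(len(partial_slice(cycle, 80, 100)))  # 200 (20% ratio, 80-100% range)
print(len(partial_slice(cycle, 40, 60)))   # 200 (the window examined at the 1% ratio)
```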

3.3. Results

3.3.1. Model Comparison

We conducted experiments across the entire range at specified ratios (5%, 10%, 20%, and 100%) in order to deeply understand the predictive capabilities of each model. The purpose of this experiment was to compare the overall performance of each model and identify the optimal one. We performed five runs for each model to calculate the RMSE values, and these values were averaged to assess overall performance. Table 3 shows the overall performance of the models in terms of the average RMSE values. The results show that ConvLSTM and CNN+LSTM outperformed the other models, obtaining lower RMSE values of 0.0824 and 0.0914, respectively. The worst performing models were GRU and LSTM, which obtained RMSE values of 0.1199 and 0.1093, respectively. These results confirm that, although each model exhibited a good performance, ConvLSTM performed best.

3.3.2. Performance Based on Different Ratios and Ranges

The results in Section 3.3.1 identified ConvLSTM as the best model, making it the natural candidate for further analysis. We therefore evaluated the ConvLSTM model at the specified ratios (5%, 10%, and 20%). Table 4 compares the highest and average performances of the model at these ratios. The highest performance for each ratio was observed in the 95–100%, 90–100%, and 80–100% ranges, with RMSE values of 0.0286, 0.0332, and 0.031, respectively. The average RMSE at these ratios was 0.1373, 0.1041, and 0.1232, demonstrating performance comparable to or better than that achieved with the complete data.
Figure 3 presents an overview of the performance of the ConvLSTM model at the 5%, 10%, 20%, and 100% ratios. The horizontal axis shows the specified ratios and their corresponding ranges, and the vertical axis shows the RMSE values. As shown in Figure 3, in most cases the performance obtained with partial data was superior to that obtained with the complete data, regardless of the ratio. Additionally, significant differences were observed when comparing intervals within the 0–50% range to those within the 50–100% range, with a notable drop in performance around the 50% mark.
To determine the cause of the performance deterioration in the 0–50% range, we conducted a thorough statistical analysis of the features. Since voltage, current, and temperature exhibit consistent or regularly varying values, we focused the analysis on discharge capacity, the most important feature. We calculated the mean, median, variance, standard deviation, minimum, and maximum to characterize the distribution of the discharge capacity. Figure 4 depicts this distribution across the 5% segments: the horizontal axis shows the 5% segments by range, while the vertical axis represents the discharge capacity values. The statistical values in the 50–100% range were significantly lower than those in the 0–50% range. A sharp decline in capacity was observed around the 50% mark, followed by a nearly constant value of approximately 0.1.
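The per-segment screening can be sketched as follows (the toy data and names are illustrative only, not drawn from the actual dataset):

```python
import numpy as np

def segment_stats(capacity, ratio=0.05):
    """Summary statistics of discharge capacity for each `ratio`-sized
    segment, mirroring the per-range analysis behind Figure 4."""
    capacity = np.asarray(capacity, dtype=float)
    seg_len = int(len(capacity) * ratio)
    stats = []
    for start in range(0, len(capacity) - seg_len + 1, seg_len):
        seg = capacity[start:start + seg_len]
        stats.append({
            "range": (start / len(capacity), (start + seg_len) / len(capacity)),
            "mean": seg.mean(), "median": np.median(seg),
            "var": seg.var(), "std": seg.std(),
            "min": seg.min(), "max": seg.max(),
        })
    return stats

# Toy series: declining values early, then flat at roughly 0.1.
cap = np.concatenate([np.linspace(1.1, 0.2, 500), np.full(500, 0.1)])
stats = segment_stats(cap)
print(len(stats), round(stats[-1]["mean"], 2))  # 20 0.1
```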
To understand the impact of this sharp decline in discharge capacity on the actual prediction results, we selected and compared several results from each range. The experiments revealed that predictions tend to be more accurate in the 50–100% range than in the 0–50% range. Figure 5 presents the predicted RUL outcomes for the testing dataset, which consists of 45 batteries; each curve represents an individual battery. The horizontal axis shows the progression of cycles over time, while the vertical axis shows the discharge capacity values. The maximum discharge capacity observed is around 0.1; beyond this point, the curves fall, indicating that the battery has reached its EOL and the cycle for the next battery begins. To examine the actual results, we selected two cases each from the 50–100% and 0–50% ranges at a 5% ratio, as shown in Figure 5. The results reveal that large variations in discharge capacity strongly influence the prediction results. In particular, predictions within the 50–100% range, where the discharge capacity spans a wider range of values, are more accurate than those within the 0–50% range, where it is narrower. These findings highlight that high-quality partial data can achieve results similar to or better than those obtained with full cycle data.

3.3.3. Evaluation of Performances on Data Quality by 1%

The aim of this experiment is to investigate the significant performance differences observed around the 50% range, as highlighted in Section 3.3.2. To examine the sharp decline around the 50% mark in detail, we divided the surrounding range (40–60%) into 1% increments. Figure 6 presents the prediction results for the segments within this range that showed significant performance differences. The predictions for the 59–60%, 58–59%, and 57–58% segments were more accurate than those for the 48–49%, 47–48%, and 46–47% segments. Moreover, a sharp decline in performance was observed between the 47–48% and 48–49% segments.
In the actual prediction results, the prediction performance dropped sharply around the 48% mark. To analyze this in more detail, we conducted a statistical analysis in this range. Figure 7 shows the results obtained by dividing the 40–60% range into 1% increments. At the points where performance dropped sharply, specifically the 48–49% and 47–48% segments, the mean and median values of the discharge capacity fall below 0.2. This behavior arises from the characteristics of the dataset we used: as detailed in Section 2.1, our battery dataset uses a two-step discharge process, which produces a significant variance in discharge capacity between the 50–100% and 0–50% ranges and, in turn, a sharp decline in performance within these ranges. Among the four features used for prediction, discharge capacity is the most important; when it drops below a certain threshold, performance degrades sharply. This analysis clearly indicates how specific data ranges affect prediction performance and provides useful guidance for data collection and model improvement.
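Assuming, as the analysis above suggests, that the problematic segments are those whose mean discharge capacity falls below 0.2, the screening of 1% segments can be sketched as (function name and toy data hypothetical):

```python
import numpy as np

def weak_segments(capacity, lo=0.40, hi=0.60, step=0.01, threshold=0.2):
    """Return the start fractions of 1% segments between `lo` and `hi`
    whose mean discharge capacity falls below `threshold`
    (0.2 in the analysis of Figure 7)."""
    capacity = np.asarray(capacity, dtype=float)
    n = len(capacity)
    flagged = []
    for k in range(int(round((hi - lo) / step))):
        start = lo + k * step
        seg = capacity[int(round(n * start)):int(round(n * (start + step)))]
        if seg.mean() < threshold:
            flagged.append(round(start, 2))
    return flagged

# Toy series: capacity drops below the threshold from the 48% mark on.
cap = np.concatenate([np.full(480, 1.0), np.full(520, 0.1)])
print(weak_segments(cap))  # flags the segments from 0.48 onward
```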

3.3.4. Impact of Parallel Inputs on the Model’s Performance

The aim of this experiment is to determine the impact on performance of excluding the second input, which encodes data variation. The model was evaluated with only the first input at the specified ratios (1%, 5%, 10%, and 20%). Figure 8 presents an overview of the ConvLSTM model’s performance at these ratios. As shown in Figure 8, both the 50–100% and 0–50% ranges achieved similarly low RMSE values of approximately 0.1. However, a closer examination of the actual prediction results reveals significant differences, which stem from the model’s learning process. In the 50–100% range, learning was effective in some sub-ranges but not in others, resulting in an unstable learning process; in the 0–50% range, learning was mostly ineffective. A similar pattern occurred with parallel inputs, but with them, learning in the 50–100% range was much more stable. Therefore, the second input plays an important role in stabilizing the learning process.
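Under our reading of the setup, the parallel-input ablation can be sketched as follows (array and function names are hypothetical): the first input carries the raw features, the second carries the variation relative to the initial cycle, and the single-input ablation simply drops the latter.

```python
import numpy as np

def build_inputs(features, initial_cycle, use_second_input=True):
    """Prepare model inputs.

    features:      (n_samples, n_features) raw measurements.
    initial_cycle: (n_features,) noise-resistant reference cycle.
    Returns one array (single input) or a pair (parallel inputs),
    the second carrying the variation relative to the initial cycle.
    """
    first = np.asarray(features, dtype=float)
    if not use_second_input:
        return first                      # ablation: single input only
    second = first - np.asarray(initial_cycle, dtype=float)
    return first, second                  # parallel inputs

x = np.arange(12.0).reshape(3, 4)
ref = np.zeros(4)
first, second = build_inputs(x, ref)
print(first.shape, second.shape)  # (3, 4) (3, 4)
```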

4. Discussion

As lithium-ion battery technology advances, batteries with diverse electrical and chemical properties have proliferated. This diversity complicates the accurate estimation of battery RUL. Although existing studies have contributed to battery health prediction, they relied solely on complete data [13,18,19,20,21]. Utilizing complete data and processing all of the information is resource intensive, impractical, and difficult to reproduce. Under real-world conditions, where complete data collection is challenging, predictions using partial data from each cycle prove effective.
This paper utilizes partial discharge data to predict the RUL of batteries. The study could be expanded to include charge data, which would deepen our understanding of the interactions between the charging and discharging processes. Moreover, the 1% segmentation applied to the 40–60% range could be extended to cover the entire 0–100% range or refined into smaller increments, such as 0.5%. Our research also identified a sharp performance decline around the 50% mark caused by the characteristics of the battery dataset used; since we used only one dataset, in which the change around the 50% range was pronounced, evaluating different battery datasets would help assess the generalizability of our model and locate similar points of significant performance change. Finally, while our model uses four features from the dataset (voltage, current, temperature, and discharge capacity), it could be improved by incorporating additional features such as charge rate and internal resistance.

5. Conclusions

Previous studies predicted the RUL of batteries using complete data, but such approaches often proved impractical and suffered from low reproducibility. To address this, recent studies have shifted toward partial data. Building on this line of work, we have presented an approach that predicts battery RUL while significantly improving data processing efficiency and achieving high prediction accuracy. We segmented the partial discharge data into specified ratios, such as 5%, 10%, and 20%, and as low as 1% for a more detailed analysis. Our results consistently show that predictions using partial data are often more accurate than those using complete data. In particular, the lowest RMSE values observed with partial data were 0.0286 at 5%, 0.0332 at 10%, and 0.031 at 20%, compared to 0.0972 with the full dataset. Additionally, the model exhibited a sharp performance decline at the 50% mark, and a further statistical analysis revealed that this decline stems from inherent characteristics of the dataset. When data quality drops below a certain threshold, prediction performance declines sharply, highlighting the importance of high-quality data for reliable predictions. Moreover, our model continues to perform well with partial data as long as the data quality is sufficient, showing that performance depends on the quality of the data rather than its quantity.

Author Contributions

Conceptualization, Q.H. and S.Y.; methodology, S.Y.; software, S.Y.; validation, Q.H., S.Y. and J.K.; formal analysis, Q.H. and S.Y.; investigation, J.J., M.L. and J.K.; resources, J.K.; data curation, Q.H., S.Y. and J.K.; writing—original draft preparation, Q.H.; writing—review and editing, Q.H., S.Y., J.J., M.L. and J.K.; visualization, Q.H.; supervision, J.K.; project administration, J.K.; funding acquisition, J.K. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported by the Technology Development Program of MSS (No. S3033853) and Regional Innovation Strategy (RIS), through the National Research Foundation of Korea (NRF) funded by the Ministry of Education (MOE) (2021RIS-004).

Data Availability Statement

The dataset is publicly available at https://data.matr.io/1/ (accessed on 25 August 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Ren, L.; Dong, J.; Wang, X.; Meng, Z.; Zhao, L.; Deen, M.J. A Data-Driven Auto-CNN-LSTM Prediction Model for Lithium-Ion Battery Remaining Useful Life. IEEE Trans. Ind. Inform. 2021, 17, 3478–3487.
2. Sohn, S.; Byun, H.E.; Lee, J.H. Two-Stage Deep Learning for Online Prediction of Knee-Point in Li-Ion Battery Capacity Degradation. Appl. Energy 2022, 328, 120204.
3. Li, S.; He, H.; Zhao, P.; Cheng, S. Health-Conscious Vehicle Battery State Estimation Based on Deep Transfer Learning. Appl. Energy 2022, 316, 119120.
4. Vennam, G.; Sahoo, A.; Ahmed, S. A Survey on Lithium-Ion Battery Internal and External Degradation Modeling and State of Health Estimation. J. Energy Storage 2022, 52, 104720.
5. Paulson, N.H.; Kubal, J.; Ward, L.; Saxena, S.; Lu, W.; Babinec, S.J. Feature Engineering for Machine Learning Enabled Early Prediction of Battery Lifetime. J. Power Sources 2022, 527, 231127.
6. Qin, Y.; Yuen, C.; Yin, X.; Huang, B. A Transferable Multistage Model With Cycling Discrepancy Learning for Lithium-Ion Battery State of Health Estimation. IEEE Trans. Ind. Inform. 2023, 19, 1933–1946.
7. Birkl, C.R.; Roberts, M.R.; McTurk, E.; Bruce, P.G.; Howey, D.A. Degradation Diagnostics for Lithium Ion Cells. J. Power Sources 2017, 341, 373–386.
8. Cai, Y.; Yang, L.; Deng, Z.; Zhao, X.; Deng, H. Online Identification of Lithium-Ion Battery State-of-Health Based on Fast Wavelet Transform and Cross D-Markov Machine. Energy 2018, 147, 621–635.
9. Hu, X.; Xu, L.; Lin, X.; Pecht, M. Battery Lifetime Prognostics. Joule 2020, 4, 310–346.
10. Hu, X.; Che, Y.; Lin, X.; Onori, S. Battery Health Prediction Using Fusion-Based Feature Selection and Machine Learning. IEEE Trans. Transp. Electrif. 2021, 7, 382–398.
11. Plett, G.L. Extended Kalman Filtering for Battery Management Systems of LiPB-Based HEV Battery Packs, Part 3: State and Parameter Estimation. J. Power Sources 2004, 134, 277–292.
12. Schwunk, S.; Armbruster, N.; Straub, S.; Kehl, J.; Vetter, M. Particle Filter for State of Charge and State of Health Estimation for Lithium-Iron Phosphate Batteries. J. Power Sources 2013, 239, 705–710.
13. Liu, K.; Shang, Y.; Ouyang, Q.; Widanage, W.D. A Data-Driven Approach with Uncertainty Quantification for Predicting Future Capacities and Remaining Useful Life of Lithium-Ion Battery. IEEE Trans. Ind. Electron. 2021, 68, 3170–3180.
14. How, D.N.T.; Hannan, M.A.; Hossain Lipu, M.S.; Ker, P.J. State of Charge Estimation for Lithium-Ion Batteries Using Model-Based and Data-Driven Methods: A Review. IEEE Access 2019, 7, 136116–136136.
15. Eddahech, A.; Briat, O.; Bertrand, N.; Delétage, J.-Y.; Vinassa, J.-M. Behavior and State-of-Health Monitoring of Li-Ion Batteries Using Impedance Spectroscopy and Recurrent Neural Networks. Int. J. Electr. Power Energy Syst. 2012, 42, 487–494.
16. Ospina Agudelo, B.; Zamboni, W.; Monmasson, E. Application Domain Extension of Incremental Capacity-Based Battery SoH Indicators. Energy 2021, 234, 121224.
17. Zheng, L.; Zhu, J.; Lu, D.D.C.; Wang, G.; He, T. Incremental Capacity Analysis and Differential Voltage Analysis Based State of Charge and Capacity Estimation for Lithium-Ion Batteries. Energy 2018, 150, 759–769.
18. Richardson, R.R.; Osborne, M.A.; Howey, D.A. Gaussian Process Regression for Forecasting Battery State of Health. J. Power Sources 2017, 357, 209–219.
19. Severson, K.A.; Attia, P.M.; Jin, N.; Perkins, N.; Jiang, B.; Yang, Z.; Chen, M.H.; Aykol, M.; Herring, P.K.; Fraggedakis, D.; et al. Data-Driven Prediction of Battery Cycle Life before Capacity Degradation. Nat. Energy 2019, 4, 383–391.
20. Fermín-Cueto, P.; McTurk, E.; Allerhand, M.; Medina-Lopez, E.; Anjos, M.F.; Sylvester, J.; dos Reis, G. Identification and Machine Learning Prediction of Knee-Point and Knee-Onset in Capacity Degradation Curves of Lithium-Ion Cells. Energy AI 2020, 1, 100006.
21. Saxena, S.; Ward, L.; Kubal, J.; Lu, W.; Babinec, S.; Paulson, N. A Convolutional Neural Network Model for Battery Capacity Fade Curve Prediction Using Early Life Data. J. Power Sources 2022, 542, 231736.
22. Chinomona, B.; Chung, C.; Chang, L.K.; Su, W.C.; Tsai, M.C. Long Short-Term Memory Approach to Estimate Battery Remaining Useful Life Using Partial Data. IEEE Access 2020, 8, 165419–165431.
23. Ali, M.U.; Zafar, A.; Nengroo, S.H.; Hussain, S.; Park, G.S.; Kim, H.J. Online Remaining Useful Life Prediction for Lithium-Ion Batteries Using Partial Discharge Data Features. Energies 2019, 12, 4366.
24. Zhang, Q.; Yang, L.; Guo, W.; Qiang, J.; Peng, C.; Li, Q.; Deng, Z. A Deep Learning Method for Lithium-Ion Battery Remaining Useful Life Prediction Based on Sparse Segment Data via Cloud Computing System. Energy 2022, 241, 122716.
25. Zhao, C.; Andersen, P.B.; Træholt, C.; Hashemi, S. Data-Driven Battery Health Prognosis with Partial-Discharge Information. J. Energy Storage 2023, 65, 107151.
26. Xu, R.; Wang, Y.; Chen, Z. A Hybrid Approach to Predict Battery Health Combined with Attention-Based Transformer and Online Correction. J. Energy Storage 2023, 65, 107365.
27. Liao, L.; Köttig, F. Review of Hybrid Prognostics Approaches for Remaining Useful Life Prediction of Engineered Systems, and an Application to Battery Life Prediction. IEEE Trans. Reliab. 2014, 63, 191–207.
28. Xu, L.; Deng, Z.; Xie, Y.; Lin, X.; Hu, X. A Novel Hybrid Physics-Based and Data-Driven Approach for Degradation Trajectory Prediction in Li-Ion Batteries. IEEE Trans. Transp. Electrif. 2023, 9, 2628–2644.
29. Shi, D.; Zhao, J.; Wang, Z.; Zhao, H.; Eze, C.; Wang, J.; Lian, Y.; Burke, A.F. Cloud-Based Deep Learning for Co-Estimation of Battery State of Charge and State of Health. Energies 2023, 16, 3855.
30. Deng, Z.; Hu, X.; Lin, X.; Xu, L.; Che, Y.; Hu, L. General Discharge Voltage Information Enabled Health Evaluation for Lithium-Ion Batteries. IEEE/ASME Trans. Mechatron. 2021, 26, 1295–1306.
31. LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-Based Learning Applied to Document Recognition. Proc. IEEE 1998, 86, 2278–2324.
32. Lawrence, S.; Giles, C.L.; Tsoi, A.C.; Back, A.D. Face Recognition: A Convolutional Neural-Network Approach. IEEE Trans. Neural Netw. 1997, 8, 98–113.
33. Bengio, Y.; Simard, P.; Frasconi, P. Learning Long-Term Dependencies with Gradient Descent Is Difficult. IEEE Trans. Neural Netw. 1994, 5, 157–166.
34. Sak, H.; Senior, A.; Beaufays, F. Long Short-Term Memory Recurrent Neural Network Architectures for Large Scale Acoustic Modeling. In Proceedings of the Annual Conference of the International Speech Communication Association (INTERSPEECH), Singapore, 14–18 September 2014; pp. 338–342.
35. Zhang, Y.; Xiong, R.; He, H.; Pecht, M.G. Long Short-Term Memory Recurrent Neural Network for Remaining Useful Life Prediction of Lithium-Ion Batteries. IEEE Trans. Veh. Technol. 2018, 67, 5695–5705.
36. LeCun, Y.; Bengio, Y.; Hinton, G. Deep Learning. Nature 2015, 521, 436–444.
37. Chung, J.; Gulcehre, C.; Cho, K.; Bengio, Y. Empirical Evaluation of Gated Recurrent Neural Networks on Sequence Modeling. arXiv 2014, arXiv:1412.3555.
38. Fan, Y.; Xiao, F.; Li, C.; Yang, G.; Tang, X. A Novel Deep Learning Framework for State of Health Estimation of Lithium-Ion Battery. J. Energy Storage 2020, 32, 101741.
39. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.-Y.; Wong, W.-K.; Woo, W.-C. Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting. Adv. Neural Inf. Process. Syst. 2015, 28, 802–810.
Figure 1. The discharge capacity vs. voltage profile over 400 cycles.
Figure 2. Framework of our proposed study.
Figure 3. Overall performance evaluation of ConvLSTM in 5%, 10%, 20%, and 100% ratio.
Figure 4. Statistical values of discharge capacity across a 5% ratio.
Figure 5. Comparison of performance across 5% ratio in marked ranges.
Figure 6. Comparison of performance across 1% ratio in marked ranges.
Figure 7. Statistical values of the discharge capacity across a 1% ratio.
Figure 8. Performances of ConvLSTM in the 1%, 5%, 10%, and 20% ranges excluding the second input.
Table 1. The specifications of the lithium iron phosphate battery dataset.
Characteristics | Properties
Manufacturer | A123 Systems
Type | APR18650M1A
Active Materials | LiFePO4/Graphite
Energy Capacity/Nominal Voltage | 1.1 Ah/3.3 V
Voltage Limits | 3.5 V, 2.5 V
Current | 4 C
Table 2. The parameters and optimization of the experiments.
Parameter | Value | Description
Input Shape | (200, 4, 10, 1) | Dimensions of input data without batch size
Filters | 32 | Number of filters in ConvLSTM2D layer
Kernel Size | 3 × 3 | Dimensions of the convolution window
Activation | ReLU | Activation function for ConvLSTM2D layer
Batch Size | 128 | Number of samples per batch
Early Stopping | Yes | Stops training when validation loss plateaus
Table 3. The overall best model considering the average RMSE across all ranges.
Models | Average RMSE
LSTM | 0.1093
GRU | 0.1199
CNN+LSTM | 0.0914
ConvLSTM | 0.0824
Table 4. ConvLSTM model performance by various ratios.
Ratio | Best RMSE | Average RMSE
5% | 0.0286 | 0.1373
10% | 0.0332 | 0.1041
20% | 0.031 | 0.1232
100% | 0.0972 | 0.1371
