1. Introduction
Lithium batteries, with significant advantages such as high energy density, eco-friendliness, a low self-discharge rate, and a long lifespan, have become the preferred choice in emerging energy storage technologies and are widely used across various fields [1,2,3,4,5]. However, the capacity of these batteries gradually diminishes through repeated charging and discharging cycles. The number of cycles a battery undergoes before its capacity falls to 70–80% of its initial value is defined as its End of Life (EOL) [6]. Given the long lifespan of lithium batteries, experimentally determining their lifespan is both time-consuming and costly, so accurately predicting the EOL is particularly important. Existing studies [7,8] have successfully predicted the EOL, significantly saving time and costs. However, predicting the EOL alone is not sufficient; more crucial is predicting the Remaining Useful Life (RUL) of the battery, which provides users with real-time information about the battery's current state. Moreover, the EOL can be considered a special case of the RUL under initial conditions. Although batteries of the same model may have similar EOLs, their RULs at different stages of use can vary greatly, and batteries at different RUL stages exhibit different electrochemical characteristics, such as capacity and power. Therefore, compared with the EOL, predicting the RUL is more critical for the maintenance and optimization of battery performance. However, because batteries change nonlinearly during use and operating conditions are partly random, accurately predicting the RUL remains a significant challenge [9].
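As a concrete illustration of the EOL definition above, the following minimal sketch determines the EOL from a capacity sequence. The capacity data and the choice of an 80% threshold (one common value within the 70–80% range mentioned above) are illustrative assumptions, not values from this paper:

```python
def end_of_life(capacities, initial_capacity, threshold=0.8):
    """Return the first cycle number at which capacity drops below
    threshold * initial_capacity, or None if it never does."""
    limit = threshold * initial_capacity
    for cycle, cap in enumerate(capacities, start=1):
        if cap < limit:
            return cycle
    return None

# Hypothetical linear fade: 1.1 Ah nominal, losing 0.002 Ah per cycle.
caps = [1.1 - 0.002 * n for n in range(1, 201)]
eol = end_of_life(caps, 1.1)  # first cycle where capacity < 0.88 Ah
```

Real degradation curves are nonlinear (often with a "knee"), which is precisely why the prediction problem discussed below is difficult.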
The methods for predicting the RUL of lithium batteries fall primarily into two categories: model-based methods and data-driven approaches. Model-based methods can be further subdivided into electrochemical models [10,11], equivalent circuit models [12], and empirical models [13,14]. For instance, Xing et al. [15] proposed a model combining empirical indices and polynomial regression, which analyzes the degradation trend of batteries throughout their entire cycle life based on experimental data. However, these methods often rely on nonlinear partial differential equations and are highly sensitive to changes in environmental conditions, making the solving process extremely complex [16]. This complexity poses a significant challenge to accurately predicting the RUL. To enhance prediction accuracy, filters [17] can be used to denoise and improve the fidelity of model predictions. In 2011, He et al. [18] combined Dempster–Shafer theory with Particle Filtering (PF) to predict battery RUL. In 2013, Miao et al. [19] employed the Unscented Particle Filter (UPF) algorithm based on a degradation model to predict the RUL of lithium-ion batteries, achieving predictions within 5% of the actual RUL.
In 2014, Ng et al. [20] proposed a naive Bayes model to predict battery RUL under varying operational conditions, considering the impacts of different environmental temperatures and discharge currents. Subsequently, data-driven methods based on machine learning began to receive increasing attention. Commonly used machine learning methods include Support Vector Machines (SVMs) [21,22,23,24], Relevance Vector Machines (RVMs) [25,26,27], and Gaussian Process Regression (GPR) models [28,29]. Notably, as with model-based approaches, RVMs are often used in conjunction with filter algorithms, such as the Kalman Filter (KF) [25], to further enhance prediction accuracy. In 2019, Severson and colleagues [7] trained a simple linear model that achieved an RUL prediction error of only 9.1%. Additionally, they created the Massachusetts Institute of Technology (MIT) battery dataset, the largest open-source battery dataset to date, providing a valuable resource for developing neural network models trained on large datasets.
With significant advancements in computational capabilities, neural networks have garnered widespread attention in the field of lithium battery RUL prediction [30,31]. Ren et al. [31] achieved an accuracy of up to 88.2% in RUL prediction using 21 extracted features and a deep neural network, excelling particularly when a larger number of input cycles were involved; this represented a notable improvement over traditional methods such as linear regression and SVMs. In handling electrochemical sequence data, Recurrent Neural Networks (RNNs) [32] have shown unique advantages. Long Short-Term Memory (LSTM) networks, a variant of RNNs, can handle variable-dimensional inputs and optimize parameters using prior information, demonstrating significant accuracy in long-term RUL prediction [33,34,35,36,37]. Zhang et al. [36] used an LSTM network to predict the RUL from lithium-ion battery data, effectively avoiding the vanishing gradient problem common in traditional RNNs. Additionally, Convolutional Neural Networks (CNNs), known for extracting local spatial features from electrochemical data, have also been applied to RUL prediction [38]. Some studies [33,39,40,41] combined CNNs with RNNs and their variants to further enhance the accuracy of RUL predictions. However, because RNNs and their variants depend on the previous time step during computation, they are difficult to parallelize. To address this, Chen et al. [42] combined a 1D CNN with a 2D CNN and used an LSTM to capture temporal information, achieving an RUL prediction error of only 3.37% using just 50 cycles. Yang [43] abandoned LSTM entirely and, by combining a three-dimensional CNN (3D CNN) with a 2D CNN, achieved an RUL prediction error of 3.55% using only 10 cycles of charging data. Furthermore, considering the discontinuity of experimental data in practical applications, Zhang et al. [44] used only 20% of the charging data, sparsely sampled from 10 cycles, for RUL prediction, yet still kept the error within 4.15%.
However, in practical applications, even obtaining a continuous 20% segment of charging data is often challenging. In light of this, our study adopts a novel data processing method: each sample contains charging data from 10 cycles, but only 10 points are randomly sampled from each cycle, forming a new dataset. Jiang et al. [45] designed the Flexible Parallel Neural Network (FPNN), which achieved state-of-the-art (SOTA) results in the early prediction of battery life. In this paper, we feed these randomly sampled 10-point data into the FPNN for battery RUL prediction.
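A minimal sketch of this sparsification step, assuming each cycle's charging curve is stored as a list of measurement points; the function name, data layout, and point counts are illustrative, not from the paper:

```python
import random

def sparsify_sample(cycles, points_per_cycle=10, seed=None):
    """Randomly keep `points_per_cycle` points per cycle, preserving
    time order within each cycle. `cycles` is a list of per-cycle
    point lists (each point could be a (voltage, current, ...) tuple)."""
    rng = random.Random(seed)
    sparse = []
    for cycle in cycles:
        keep = sorted(rng.sample(range(len(cycle)), points_per_cycle))
        sparse.append([cycle[i] for i in keep])
    return sparse

# Hypothetical sample: 10 cycles, 500 measurement points per cycle.
sample = [[(i, c) for i in range(500)] for c in range(10)]
sparse = sparsify_sample(sample, points_per_cycle=10, seed=0)
# Each sample is reduced to 10 cycles x 10 points = 100 points.
```

Sorting the sampled indices keeps the points in chronological order within each cycle, so the sparse sample remains a valid (if very coarse) time series.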
The main contributions of this paper can be summarized as follows:
(1) Super-Sparse Data: This study is the first to use super-sparse random charging data consisting of only 10 points for lithium battery RUL prediction, better aligning with real-world production environments.
(2) Successful Application of FPNN in RUL Prediction: FPNN is an excellent interpretable model, and this study reaffirms its effectiveness. The combination of sparse data with FPNN enables our research to reach a new state-of-the-art level in RUL prediction.
The structure of the paper is arranged as follows.
Section 2 details the MIT dataset, including its composition and charging process data.
Section 3 describes the method of data sparsification and the evaluation metrics for model prediction performance.
Section 4 presents the experimental results and an in-depth analysis: it first evaluates the performance of the proposed method, then compares it with existing methods, conducts ablation experiments, and closes with a summary and conclusions.
4. Results and Discussion
Each individual sample was composed of data from 10 cycles. When data from the first 10 cycles were used as the sample, the quantity defined in Equation (1) equals the RUL plus 10, i.e., the cycle life of the battery. In this case, therefore, the study actually involved the early prediction of the battery's cycle life using early data, aligning with the objectives of previous research; this section accordingly treats this scenario as the early prediction of the RUL. In the other scenarios, to predict the RUL of the battery at any given time point, the test set consisted of complete data from all cycles, with each individual sample again composed of data from 10 cycles. Although the randomly sampled data more closely reflect real production conditions, this paper also considered datasets with uniformly sampled data to provide a comparative baseline alongside the randomly sampled ones.
4.1. Predictive Performance under Different Conditions
Figure 4a–c depict heatmaps of the various error metrics. Notably, the MAPE values for early predictions were significantly lower than for non-early predictions. This result even surpassed previous studies, with the error for early predictions remaining below 1% across the different numbers of sampled data points. Adding 10 to the predicted RUL yields the cycle life of the battery. This phenomenon can be attributed to the fact that, unlike Jiang et al. [45], whose samples included as few as 4 cycles of data, the samples in this study contained data from 10 cycles, providing richer and more specific electrochemical information within each sample. That the MAPE for early predictions was nevertheless smaller than for non-early predictions can be explained by Equation (2): for samples from the same battery, the actual RUL labels for early predictions are larger, whereas those for non-early predictions are smaller, and since the actual RUL label appears in the denominator, the MAPE for early predictions is smaller. This is validated in Figure 4b,c, where it can be seen that for error metrics that do not require normalization, non-early predictions were more accurate, with lower absolute errors, consistent with the common consensus that early predictions are more challenging to model accurately. Since the RMSE and MAE exhibited similar trends, only the box plots of the MAPE and MAE are shown in Figure 4d. The MAPE for non-early predictions exhibited greater variability, possibly because it was larger than that for early predictions, leading to greater differences between extreme MAPE values and a broader range of data distribution across samples. Since the MAE for non-early predictions was smaller than that for early predictions, the distribution of the MAE in Figure 4f shows the opposite trend to that of the MAPE in Figure 4d.
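The denominator effect described above can be checked with a toy computation (the numbers are illustrative, not results from the paper): for the same absolute error, a larger true RUL label produces a smaller percentage error:

```python
def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return 100.0 * sum(abs(t - p) / t
                       for t, p in zip(y_true, y_pred)) / len(y_true)

# Identical absolute error of 20 cycles in both cases.
early = mape([1000.0], [980.0])    # large RUL label -> 2.0%
non_early = mape([100.0], [80.0])  # small RUL label -> 20.0%
```

This is why the MAPE favors early predictions even when their absolute errors (MAE, RMSE) are larger.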
Subsequently, Figure 4e,g display the distribution of cycle life for the non-early and early prediction samples, respectively. Given that the entire MIT dataset comprised 124 batteries, there could be up to 124 distinct cycle life values, with all samples from the same battery sharing one cycle life. Consistent with Jiang et al. [45], the training and test sets were divided in a 94:30 ratio. Despite the large total number of RUL samples, there were relatively few samples for early prediction, which may account for the higher non-normalized error metrics (MAE, RMSE) observed there. Conversely, there were more samples for non-early prediction, covering almost all 124 possible cycle life values. With the same number of data points, random and uniform sampling each exhibited distinct, though minor, advantages. With other conditions held constant, the various error metrics fluctuated only slightly as the number of sampling points changed, possibly because different total numbers of data points could still clearly capture the overall structure of the features.
4.2. Predictive Performance under 10 Data Points
Considering the practical value of using 10 data points, this section focuses on the prediction scenarios when sampling 10 data points.
Figure 5a,b show the non-early RUL prediction scenarios for random and uniform sampling, respectively. Overall, the difference between the two is minimal, but uniform sampling has a slight edge in this context. Figure 5c,d display the early RUL prediction scenarios for random and uniform sampling, where again the overall difference is small but uniform sampling maintains a slight advantage. Figure 5e,f illustrate the prediction scenarios for the individual batteries "b1c1" and "b2c44" under the randomly sampled datasets; here, we selected single-battery data representing the extreme cases of maximum and minimum cycle life for RUL prediction. The selection of individual batteries in this study differs from previous research, in which early samples from those batteries were not randomly allocated to the test set, preventing early RUL prediction for individual batteries. The early prediction scenarios for random sampling of the "b1c1" and "b2c44" batteries are shown in Figure 5i, demonstrating that even under extreme conditions, the data processing method of this study combined with the FPNN still exhibits strong robustness. Additionally, the early and non-early RUL prediction scenarios with 10-point sampling are presented more clearly in Figure 5g,h, with conclusions consistent with those of the previous subsection.
Finally, Table 1 provides a detailed list of the specific numerical results for early and non-early RUL predictions using randomly sampled datasets with different numbers of data points. Our method is compared with other published methods in Table 2. The comparison reveals that the novel data processing approach used in this study, combined with the FPNN, demonstrates exceptional performance in predicting the RUL, successfully achieving the SOTA level.
4.3. Ablation Experiments
To validate the effectiveness of this study, this section presents comprehensive data from ablation experiments conducted for various scenarios. Detailed tabular data can be found in Appendix A, specifically in Table A1 (non-early RUL predictions) and Table A2 (early RUL predictions).
Figure 6a–c display heatmaps of the ablation experiments under all conditions. Given the significant differences in the data extremes, a simple mathematical transformation was applied to the original data, namely y = lg(x), where x represents the original error evaluation metric and y is the processed evaluation metric, which is also the value shown in the figures. 'NaN' indicates missing data: in these scenarios, after removing the initialization layer, the model consumed excessive GPU memory during training, preventing those experiments from being conducted.
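A minimal sketch of a compression of this kind, assuming the transformation is a base-10 logarithm (the following paragraph refers to a "logarithmic MAPE"); missing runs are carried through as NaN:

```python
import math

def compress(errors):
    """Apply y = lg(x) to each error metric; keep missing runs as NaN."""
    return [math.log10(x) if x is not None else float('nan')
            for x in errors]

# Metrics spanning two orders of magnitude map to a compact [0, 2] range;
# None marks a run that could not be executed.
vals = compress([1.0, 10.0, 100.0, None])
```

Such a log transform keeps heatmap color scales readable when the raw errors span several orders of magnitude.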
In these experiments, the logarithmic MAPE for early predictions was generally smaller, whereas the MAPE for non-early predictions was larger. Conversely, other non-normalized error metrics like the MAE and RMSE showed the opposite trend. This is consistent with the patterns observed in previous prediction results. It is evident that removing different components of the FPNN model impacted its RUL prediction capability in the various scenarios. Since previous research has indicated that setting the NOI to 3 performs well under different conditions, the NOI in this study was also set to 3.
In this study, special attention was given to the MAPE, a normalized metric, particularly for non-early prediction scenarios. When randomly selecting 10 data points and sequentially removing each layer in the FPNN, it was observed that the accuracy of the FPNN generally decreased. However, interestingly, when the residual was removed, the accuracy slightly improved. This suggests that under the current data distribution, residual connections might have had a minor adverse effect. However, it is important to note that removing residual connections did not always produce adverse effects in other scenarios with different numbers of data points and sampling patterns; sometimes, it even enhanced accuracy. The initial layers, differential feature branch, and 3D conv consistently contributed positively to the model, and their removal led to a decline in model performance. Particularly, the differential feature branch had the most significant impact on the FPNN’s performance, with its removal greatly diminishing the FPNN’s capabilities. The initial NOI in the current model was set to 3. For non-early predictions with 300 randomly sampled data points, removing one InceptionBlock slightly improved the FPNN’s accuracy, and the same was observed for non-early predictions with 200 uniformly sampled data points. However, in other scenarios, the FPNN’s performance typically worsened. When removing two InceptionBlocks, there was a slight improvement in accuracy for non-early predictions with 200 and 300 randomly sampled data points, as well as for 200 uniformly sampled data points. Yet, when all three InceptionBlocks were removed, the FPNN’s performance significantly declined across all non-early prediction scenarios.
In the case of early predictions, the situation changed slightly. Removing the initial layers only led to adverse results when sampling 100 data points, whereas in other scenarios with available data, the FPNN’s performance slightly improved. Similar to non-early predictions, removing the residual sometimes had beneficial effects and sometimes the opposite. The differential feature branch and 3D conv were consistently beneficial. With the initial NOI set to 3, removing one InceptionBlock generally led to a decrease in the FPNN’s performance, but there were improvements in scenarios with 10 and 100 randomly sampled points and 10 uniformly sampled points. When removing two InceptionBlocks, the FPNN’s performance generally declined, but there were improvements in scenarios with 10, 100, and 300 uniformly sampled points. Finally, when all three InceptionBlocks were removed, the FPNN’s performance generally declined, but there was an improvement in the scenario with 100 uniformly sampled points.
Given the practical significance of sampling 10 data points, Figure 6d presents bar graphs of the MAPE, MAE, and RMSE when sampling 10 data points. As previously mentioned, the differential feature branch is crucial, a fact reaffirmed in this chart. The roles of the other layers are also quite evident, with the unaltered FPNN consistently performing well under various conditions. Certain layers, particularly the residual connections and the NOI, had mixed effects on the FPNN's performance. However, this also confirms previous research findings [45] that adapting the NOI to suit different conditions can fully harness the potential of the FPNN.
Finally, for detailed information on the ablation experiments conducted for RUL prediction using datasets with 10 randomly sampled data points, please refer to Table 3.