1. Introduction
Lithium-ion batteries are widely used in electronics, new energy vehicles, and power supply systems owing to their high energy density, long cycle life, and low self-discharge rate [1,2]. However, as the number of charge-discharge cycles increases, battery performance degrades: the maximum available capacity decreases and the internal resistance increases [3]. If an aged battery is not replaced promptly, it poses serious safety risks to the equipment and undermines the stability and reliability of the power system [4,5]. Accurate estimation of battery state is therefore essential for effective battery management, and the state of health (SOH) and remaining useful life (RUL) of lithium-ion batteries are key indicators for evaluating battery performance, as they quantify the degree of aging [6].
In battery health assessment, SOH is a widely used macroscopic indicator of the overall health of a battery. Microscopic health parameters are also important for assessing health status; they include the volume fraction of the negative-electrode active material, the solid-phase diffusion coefficient, and the electrolyte concentration. These parameters describe the performance of the active materials and electrolyte inside the battery, so changes in them reflect the internal health state of the battery [7]. Although microscopic health parameters provide more detailed information about the internal state of the battery, measuring and evaluating them is relatively complex and requires advanced models and algorithms. In contrast, SOH can be assessed more easily from battery capacity, internal resistance, and similar parameters. This study therefore adopts SOH as the primary indicator of battery health, using the mainstream definition of SOH as the ratio of the maximum usable capacity to the rated capacity [8]. SOH decreases continuously as the number of charge-discharge cycles increases. When SOH falls to 70% to 80%, the battery is typically considered to have reached its end of life (EOL), and the number of cycles remaining from the current state until EOL is called the RUL of the battery [9,10].
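The two definitions can be written compactly as below. The symbols are introduced here only for illustration: $C_{\max}$ is the maximum usable capacity, $C_{\mathrm{rated}}$ the rated capacity, $N_{\mathrm{current}}$ the current cycle number, and $N_{\mathrm{EOL}}$ the cycle at which SOH first falls to the chosen EOL threshold.

```latex
\mathrm{SOH} = \frac{C_{\max}}{C_{\mathrm{rated}}} \times 100\%, \qquad
\mathrm{RUL} = N_{\mathrm{EOL}} - N_{\mathrm{current}}
```

For instance, a hypothetical cell rated at 2.0 Ah whose maximum usable capacity has faded to 1.6 Ah has an SOH of 80%, the upper end of the usual EOL range.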
In addition, SOH estimation and RUL prediction are closely related, since both can be derived from an estimate of battery capacity. The main research approaches fall into two categories: model-based and data-driven methods [11]. Model-based methods can be further divided into equivalent circuit models and electrochemical models, which rest on different modeling mechanisms. An equivalent circuit model simplifies the intricate physical and chemical processes within the battery and uses basic electronic components to simulate the battery output [12]. Zhang et al. [13] introduced a recursive least squares weighted decoupling method to improve parameter estimation accuracy across various dynamic regimes. Misyris et al. [14] used recursive least squares with variable forgetting factors to identify the parameters of an equivalent circuit model, achieving internal impedance identification and capacity estimation. Wang et al. [15] developed an equivalent circuit model from the constant voltage charging current curve and estimated battery SOH from the relevant circuit model features. However, building a precise model of battery aging is challenging owing to the intricate internal reaction mechanisms [16]. Electrochemical models describe the dynamics of battery performance degradation by simulating the physical and chemical processes within the battery [17]. Khodadadi Sadabadi et al. [18] simulated the battery charging and discharging process with an electrochemical model to analyze the factors affecting capacity degradation and thereby predict RUL. Hong et al. [19] designed an enhanced single-particle model that predicts battery aging states, using aging data from batteries with LMO-NMC cathodes and graphite anodes. Nevertheless, because electrochemical models are dynamic and nonlinear, developing a precise battery degradation model with them is difficult. The model-based methods above accurately represent the external dynamic characteristics of the battery. However, they require substantial prior knowledge, and the model must be adjusted frequently for the specific type of lithium-ion battery and for varying operating conditions, which makes them unsuitable for real-time forecasting [20].
In contrast, data-driven methods do not require in-depth examination of the battery's aging mechanism. Instead, battery performance parameters such as voltage, current, resistance, and temperature are extracted directly as health features, and the relationship between these features and the capacity degradation trend is modeled to estimate the internal state and performance of the battery. Data-driven methods are efficient, simple, and easy to use, and have attracted extensive attention in various fields [21]. Zhang et al. [22] used the rate of temperature change associated with capacity degradation as the input to a temporal convolutional network, with SOH as the output, and achieved good prediction accuracy. Wu et al. [23] predicted RUL by analyzing how the terminal voltage curve changes during battery charging and extracting degradation features as inputs to a feed-forward neural network. Khumprom et al. [24] applied deep neural networks to eight features extracted from charge-discharge curves to predict SOH and RUL, reporting better results. Chen et al. [25] used capacity and the discharge time for an equal voltage drop as health features, preprocessed these features with ensemble empirical mode decomposition (EEMD), applied phase space reconstruction to optimize the input sequences, and finally used support vector regression (SVR) to predict RUL. Zhang et al. [26] combined partial incremental capacity with an artificial neural network to estimate battery state under constant current discharge. Deng et al. [27] used the constant current charging time and capacity, together with the constant voltage charging time, as features for training an SVR model to predict SOH. Chen et al. [28] proposed a hybrid model in which capacity data were decomposed by EEMD with a modified adaptive noise algorithm and each decomposed component was predicted separately by least squares support vector regression (LSSVR). Li et al. [29] investigated changes in battery charging behavior under various health conditions and used the particle swarm algorithm to optimize the kernel function of a support vector machine for joint estimation of the state of charge (SOC) and SOH; their experiments showed that the method is highly adaptable and feasible. Tian et al. [30] extracted health features from the differential temperature curve during constant current charging and used support vector machines to predict capacity degradation trends. Wu et al. [31] extracted features from charging capacity and incremental capacity data and modeled battery aging with ridge regression. Their experiments showed that, compared with SVR and Gaussian process regression, ridge regression on the selected features gave more reliable estimates with a simpler structure and lower computational cost. The data-driven methods above are simple to implement and achieve high prediction accuracy when reliable training sets are available. However, electrochemical reactions within the battery change dynamically, for example through fine adjustment of the electrode material structure, electrolyte redistribution, and repair of the solid electrolyte interphase layer, and these changes can cause the battery capacity to regenerate within a short time [32]. Because of capacity regeneration, a battery can have the same SOH but different RUL at two different cycles. It is therefore difficult to assess the degree of battery aging comprehensively and accurately from SOH alone, and both the SOH and the RUL of the current battery should be considered to diagnose its degree of aging over the whole life cycle. However, common evaluation methods usually estimate only one of the two, and estimating both separately inevitably increases computational and algorithmic complexity.
Based on the above analysis, this paper proposes a method for estimating SOH and predicting RUL of lithium-ion batteries based on charging feature extraction and ridge regression. First, three health features are extracted from the charging voltage curve, and the Pearson correlation coefficient is used to analyze the correlation between each feature and the maximum battery capacity. Then, ridge regression is used to build a battery aging model that estimates SOH. On this basis, a multiscale prediction model is developed to forecast how the health features evolve as the number of charge-discharge cycles increases. The forecast features are fed into the battery aging model to estimate SOH multiple steps ahead, which yields the RUL prediction. Finally, the accuracy and adaptability of the proposed method are verified on two battery datasets collected under different operating conditions. Experimental results show that the proposed method predicts SOH and RUL with high accuracy and reliability.
The remainder of this paper is structured as follows:
Section 2 presents battery degradation datasets and explains the process of extracting health features.
Section 3 describes the rationale for the selected models and algorithms, as well as the overall framework for estimating SOH and predicting RUL.
Section 4 provides experimental validation of the method, followed by discussion and analysis of the results.
Section 5 summarizes the work carried out in this paper.
3. Methods
After health features are extracted from the battery charging data, a feasible battery degradation prediction model must be constructed. The correlation analysis of the health features shows that the correlation coefficients between the selected inputs and the output are all above 0.82, and most exceed 0.90, indicating a strong correlation between the selected health features and the maximum battery capacity. Because this relationship is close to linear, linear regression is a natural choice for the model; it is also computationally inexpensive and can be trained offline.
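The Pearson coefficient behind these figures is $r = \sum_k (x_k - \bar{x})(y_k - \bar{y}) / \sqrt{\sum_k (x_k - \bar{x})^2 \sum_k (y_k - \bar{y})^2}$. A minimal NumPy sketch follows; the feature and capacity arrays are hypothetical placeholders, not values from the NASA or Oxford datasets.

```python
import numpy as np

def pearson_r(feature, capacity):
    """Pearson correlation coefficient between a health feature and capacity."""
    x = np.asarray(feature, dtype=float)
    y = np.asarray(capacity, dtype=float)
    dx, dy = x - x.mean(), y - y.mean()          # centre both series
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))

# Hypothetical per-cycle values, for illustration only.
capacity = np.array([1.86, 1.84, 1.83, 1.80, 1.78, 1.79, 1.75, 1.72])
feature = np.array([2900, 2870, 2860, 2810, 2780, 2800, 2740, 2690])
r = pearson_r(feature, capacity)
print(round(r, 3))
```

The result should agree with `np.corrcoef(feature, capacity)[0, 1]`; writing it out makes explicit that the coefficient measures linear association, which is what justifies a linear model here.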
Because all of the feature variables in the previous section are extracted from, or derived from, the same charging voltage curve, they inevitably share overlapping information; that is, they are multicollinear. Highly correlated features make the coefficients of an ordinary linear regression model unstable and prone to overfitting, which ultimately degrades prediction accuracy. In addition, a plain linear regression model is highly sensitive to noise in the input data, so even small input errors can cause large changes in the output.
3.1. Ridge Regression
Ridge regression is a biased estimation method that improves on least squares estimation [36]. By adding an L2 regularization term to the objective, it yields more realistic and reliable regression coefficients, effectively handles multicollinearity among the variables, and improves the overall fit of the model. Ridge regression is defined as follows:
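$$\hat{\boldsymbol{\beta}} = \arg\min_{\boldsymbol{\beta}} \left[ \sum_{i=1}^{n} \left( y_i - \sum_{j=1}^{p} x_{ij}\beta_j \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2 \right]$$
where $\hat{\boldsymbol{\beta}}$ is the vector of coefficients obtained from ridge regression, $\beta_j$ is the coefficient of the $j$th feature, $y_i$ is the target value of the $i$th sample, $x_{ij}$ is the $j$th feature value of the $i$th sample, $n$ is the number of samples, and $p$ is the number of features. $\lambda$ is the coefficient of the penalty term, with $\lambda \ge 0$. The larger $\lambda$ is, the more strongly the coefficient vector is shrunk. When $\lambda = 0$, the above formula reduces to least squares estimation (LSE), so LSE can be regarded as a special case of ridge regression.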
The matrix form of ridge regression is:
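$$\hat{\boldsymbol{\beta}} = \left( X^{\top} X + \lambda I \right)^{-1} X^{\top} \mathbf{y}$$
where $X$ is the $n \times p$ feature matrix, $\mathbf{y}$ is the vector of target values, and $I$ is the $p \times p$ identity matrix. When the features are collinear, the matrix $X^{\top} X$ is singular (or nearly so), so it cannot be reliably inverted and $\hat{\boldsymbol{\beta}}$ cannot be solved for. For $\lambda > 0$, the penalty term makes $X^{\top} X + \lambda I$ full rank and therefore invertible, so $\hat{\boldsymbol{\beta}}$ becomes solvable. Ridge regression thus reduces the singularity of the feature matrix and, at the same time, shrinks the magnitude of the coefficient vector, mitigating the risk of overfitting.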
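A minimal NumPy sketch of the closed-form solution $\hat{\boldsymbol{\beta}} = (X^{\top} X + \lambda I)^{-1} X^{\top} \mathbf{y}$ follows. It assumes no intercept term and uses synthetic data in which the second feature is exactly twice the first, so the feature matrix has rank 1 and $X^{\top} X$ is singular; the ridge system is still solvable for every $\lambda > 0$, and the coefficient norm shrinks as $\lambda$ grows.

```python
import numpy as np

def ridge_fit(X, y, lam):
    """Closed-form ridge coefficients (X^T X + lam*I)^{-1} X^T y, no intercept."""
    n_features = X.shape[1]
    A = X.T @ X + lam * np.eye(n_features)
    # Solving the linear system is more stable than forming the inverse explicitly.
    return np.linalg.solve(A, X.T @ y)

rng = np.random.default_rng(0)
f1 = rng.standard_normal(50)
X = np.column_stack([f1, 2.0 * f1])   # second feature is an exact multiple of the first
y = 3.0 * f1 + 0.01 * rng.standard_normal(50)

print(np.linalg.matrix_rank(X))       # 1: X^T X is singular, plain LSE has no unique solution
for lam in (0.1, 1.0, 10.0):
    beta = ridge_fit(X, y, lam)
    print(lam, beta, np.linalg.norm(beta))  # the coefficient norm shrinks as lam grows
```

Because $X^{\top}\mathbf{y}$ lies along the direction $[1, 2]$, the ridge coefficients here always keep the ratio $\beta_2 / \beta_1 = 2$; the penalty spreads the weight across the duplicated features instead of letting it blow up in opposite directions.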
3.2. Multiscale Prediction Model
3.2.1. Ensemble Empirical Mode Decomposition
EEMD is an improved form of EMD based on noise-assisted analysis [37]. It performs EMD on the signal repeatedly, each time adding white noise with zero mean and fixed variance to the initial signal. Because the added white noise has zero mean, its effect largely cancels out when the results of all runs are averaged, and this ensemble mean is taken as the final result. This effectively resolves the mode mixing problem of EMD and lets each part of the signal be allocated automatically to its appropriate time scale.
The EEMD algorithm decomposes complex non-stationary signals into intrinsic mode function (IMF) components at different time scales. Compared with the initial signal, each IMF component has a smaller fluctuation amplitude, so analyzing the IMFs separately can yield more precise forecasts. The method is therefore well suited to extracting the fluctuation patterns of complex sequences and to improving the precision of prediction models. The EEMD algorithm consists of the following steps:
- (1) Add a white noise sequence $n_i(t)$ to the initial signal $x(t)$ to obtain a new signal $x_i(t)$:
$$x_i(t) = x(t) + n_i(t)$$
where $i = 1, 2, \ldots, N$ indexes the white noise additions and $N$ is the total number of additions.
- (2) Perform EMD on $x_i(t)$ to express it as the sum of the IMF components of each order and the residual component $r_i(t)$ after decomposition:
$$x_i(t) = \sum_{j=1}^{J} c_{ij}(t) + r_i(t)$$
where $c_{ij}(t)$ is the $j$th IMF component obtained from the decomposition after the $i$th addition of white noise, and $j$ ranges over $[1, J]$.
- (3) Repeat steps (1) and (2) $N$ times, sum the IMF components obtained each time, and take their mean as the result:
$$\bar{c}_j(t) = \frac{1}{N} \sum_{i=1}^{N} c_{ij}(t)$$
where $\bar{c}_j(t)$ is the $j$th averaged IMF component obtained after EEMD decomposition of the initial signal.
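The three steps can be sketched in Python as below. This is a simplified illustration under stated assumptions, not the implementation used in this paper: sifting runs for a fixed number of passes instead of using a formal stopping criterion, envelope end effects are handled by pinning both envelopes to the end samples, every run is cut to the same maximum number of IMFs (unused slots stay zero) so that IMFs can be averaged index by index, and the noise level (0.2 times the signal's standard deviation) and ensemble size are assumed values. SciPy's `CubicSpline` builds the envelopes.

```python
import numpy as np
from scipy.interpolate import CubicSpline

def _mean_envelope(h):
    """Mean of upper/lower cubic-spline envelopes, or None if h has too few extrema."""
    t = np.arange(len(h))
    d = np.diff(h)
    maxima = np.flatnonzero((d[:-1] > 0) & (d[1:] <= 0)) + 1
    minima = np.flatnonzero((d[:-1] < 0) & (d[1:] >= 0)) + 1
    if len(maxima) < 2 or len(minima) < 2:
        return None
    # Simplified boundary handling: pin both envelopes to the two end samples.
    up_idx = np.r_[0, maxima, len(h) - 1]
    lo_idx = np.r_[0, minima, len(h) - 1]
    upper = CubicSpline(up_idx, h[up_idx])(t)
    lower = CubicSpline(lo_idx, h[lo_idx])(t)
    return 0.5 * (upper + lower)

def emd(x, max_imfs=4, n_sift=10):
    """Plain EMD with a fixed number of sifting passes; unused IMF slots stay zero."""
    residue = np.asarray(x, dtype=float).copy()
    imfs = np.zeros((max_imfs, len(residue)))
    for j in range(max_imfs):
        if _mean_envelope(residue) is None:
            break  # residue is (nearly) monotonic: stop decomposing
        h = residue.copy()
        for _ in range(n_sift):
            m = _mean_envelope(h)
            if m is None:
                break
            h = h - m
        imfs[j] = h
        residue = residue - h      # step (2): x_i = sum of IMFs + residue, exactly
    return imfs, residue

def eemd(x, n_trials=50, noise_std=0.2, max_imfs=4, seed=0):
    """Steps (1) to (3): decompose N noisy copies x + n_i and average their IMFs."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x, dtype=float)
    imf_sum = np.zeros((max_imfs, len(x)))
    res_sum = np.zeros(len(x))
    for _ in range(n_trials):
        noise = noise_std * x.std() * rng.standard_normal(len(x))  # zero-mean white noise
        imfs, res = emd(x + noise, max_imfs)
        imf_sum += imfs
        res_sum += res
    return imf_sum / n_trials, res_sum / n_trials

# Hypothetical health-feature sequence: a slow decline plus a 10-cycle oscillation.
t = np.arange(200)
feature = 1.0 - 0.0015 * t + 0.02 * np.sin(2 * np.pi * t / 10)
imfs, residue = eemd(feature)
print(imfs.shape, residue.shape)
```

Because each run satisfies step (2) exactly, the averaged IMFs plus the averaged residue reconstruct the initial signal up to the mean of the added noise, which shrinks as the ensemble size grows.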
3.2.2. Multiple Linear Regression
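When a regression analysis has two or more independent variables that are linearly related to the dependent variable, it is called multiple linear regression (MLR). Compared with conventional neural networks, MLR offers high prediction accuracy, easy parameter tuning, and fast computation when dealing with low-frequency components that are highly periodic and have a flat trend [38]. The MLR model in matrix form and its expansion are:
$$Y = X\boldsymbol{\beta} + \boldsymbol{\varepsilon}, \qquad
\begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_n \end{bmatrix} =
\begin{bmatrix} 1 & x_{11} & \cdots & x_{1p} \\ 1 & x_{21} & \cdots & x_{2p} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & x_{n1} & \cdots & x_{np} \end{bmatrix}
\begin{bmatrix} \beta_0 \\ \beta_1 \\ \vdots \\ \beta_p \end{bmatrix} +
\begin{bmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_n \end{bmatrix}$$
where $Y$ is the dependent variable, $X$ is the matrix of independent variables, $\boldsymbol{\beta}$ is the vector of regression coefficients, and $\boldsymbol{\varepsilon}$ is the random error term. Solving for the parameters by least squares gives the regression function:
$$\hat{\boldsymbol{\beta}} = \left( X^{\top} X \right)^{-1} X^{\top} Y, \qquad \hat{Y} = X\hat{\boldsymbol{\beta}}$$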
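The text does not state which regressors the MLR uses for the low-frequency component; the sketch below assumes, purely for illustration, that the previous `n_lags` values of the component are the independent variables. It builds the design matrix with a column of ones for $\beta_0$, solves the least squares problem with `np.linalg.lstsq` (which also copes with a rank-deficient design), and forecasts recursively by feeding each prediction back in as the newest lag.

```python
import numpy as np

def make_lagged(series, n_lags):
    """Rows [s_t, ..., s_{t+n_lags-1}] (oldest first) with target s_{t+n_lags}."""
    s = np.asarray(series, dtype=float)
    X = np.column_stack([s[i:len(s) - n_lags + i] for i in range(n_lags)])
    return X, s[n_lags:]

def mlr_forecast(series, n_lags, steps):
    """Fit Y = X beta + eps by least squares, then forecast `steps` values recursively."""
    X, y = make_lagged(series, n_lags)
    A = np.column_stack([np.ones(len(X)), X])   # column of ones -> intercept beta_0
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    history = list(np.asarray(series, dtype=float)[-n_lags:])
    preds = []
    for _ in range(steps):
        nxt = beta[0] + np.dot(beta[1:], history[-n_lags:])
        preds.append(nxt)
        history.append(nxt)   # the prediction becomes the newest lag
    return np.array(preds)

# Hypothetical smooth low-frequency component: a straight declining line.
s = 1.0 - 0.01 * np.arange(40)
print(mlr_forecast(s, n_lags=3, steps=5))
```

On an exactly linear series the fitted model extrapolates the line without error, which is the kind of flat, regular behavior the text attributes to low-frequency components.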
3.2.3. Gated Recurrent Unit
The gated recurrent unit (GRU) simplifies the long short-term memory (LSTM) network by merging the forget gate and the input gate into a single update gate. This reduces the number of network parameters, speeds up convergence, and lowers the risk of overfitting. GRU performs well on complex, fluctuating, non-smooth, and nonlinear data. Owing to its gating mechanism, the GRU model can extract and retain the key information in high-frequency components and better capture long-term dependencies in sequence data. The GRU network structure is illustrated in Figure 4.
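Its state and output are computed as follows:
$$z_t = \sigma\left( W_z \cdot [h_{t-1}, x_t] \right)$$
$$r_t = \sigma\left( W_r \cdot [h_{t-1}, x_t] \right)$$
$$\tilde{h}_t = \tanh\left( W_{\tilde{h}} \cdot [r_t \odot h_{t-1}, x_t] \right)$$
$$h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$$
where $x_t$ is the input at the current time step; $z_t$ is the value of the update gate at the current time step; $r_t$ is the value of the reset gate at the current time step; $h_t$ and $h_{t-1}$ are the hidden-layer states at the current and previous time steps, respectively; $\tilde{h}_t$ is the candidate activation state of the hidden layer at the current time step; $W_z$, $W_r$, and $W_{\tilde{h}}$ are the weight matrices; $[\cdot, \cdot]$ denotes vector concatenation and $\odot$ element-wise multiplication; and $\sigma$ and $\tanh$ are the activation functions.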
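A single GRU layer implementing these four update equations can be sketched in NumPy as below. Bias terms are omitted to match the equations as written, `W_h` plays the role of $W_{\tilde{h}}$, $[h_{t-1}, x_t]$ is taken to be plain vector concatenation, and the weights are random placeholders rather than trained values.

```python
import numpy as np

def sigmoid(a):
    return 1.0 / (1.0 + np.exp(-a))

def gru_forward(xs, W_z, W_r, W_h, h0):
    """Run a GRU over inputs xs (T x n_in); each W is (n_hidden x (n_hidden + n_in))."""
    h = np.asarray(h0, dtype=float)
    states = []
    for x in xs:
        hx = np.concatenate([h, x])                           # [h_{t-1}, x_t]
        z = sigmoid(W_z @ hx)                                 # update gate z_t
        r = sigmoid(W_r @ hx)                                 # reset gate r_t
        h_tilde = np.tanh(W_h @ np.concatenate([r * h, x]))   # candidate state
        h = (1.0 - z) * h + z * h_tilde                       # new hidden state h_t
        states.append(h)
    return np.array(states)

rng = np.random.default_rng(0)
n_in, n_hidden, T = 1, 4, 6
W_z, W_r, W_h = (rng.standard_normal((n_hidden, n_hidden + n_in)) for _ in range(3))
xs = rng.standard_normal((T, n_in))          # hypothetical high-frequency input sequence
H = gru_forward(xs, W_z, W_r, W_h, np.zeros(n_hidden))
print(H.shape)
```

Because $h_t$ is a convex combination of $h_{t-1}$ and a `tanh` output, a state that starts at zero stays strictly inside $(-1, 1)$; the update gate controls how much of the old state is kept at each step.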
3.2.4. Multiscale Prediction Modeling
As Figure 3d shows, the health features decline with increasing cycle number in the same way as capacity, so their sequences are non-stationary, which makes them difficult to model directly. This paper proposes a multiscale prediction model to solve this problem. First, EEMD is used to decompose each health feature sequence into high-frequency and low-frequency components, which have simpler fluctuation patterns and distinct frequency characteristics. Then, the high-frequency and low-frequency components are predicted with GRU and MLR models, respectively. Finally, the predictions of the two models are combined to generate a new health feature dataset. The flowchart is shown in Figure 5.
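The decompose, predict, and combine flow can be sketched as below. The rule for splitting IMFs into high- and low-frequency groups is not given here, so the sketch assumes the first `n_high` IMFs (EMD orders IMFs from highest to lowest frequency) form the high-frequency part and the remaining IMFs plus the residue form the low-frequency part. The two predictors are hypothetical stand-ins passed in as functions; in the method described here they would be the GRU and MLR models.

```python
import numpy as np

def multiscale_forecast(imfs, residue, n_high, predict_high, predict_low, steps):
    """Split EEMD output into high/low-frequency parts, forecast each, and add them.

    imfs is (K, T), ordered from highest to lowest frequency. The first n_high IMFs
    form the high-frequency part; the rest plus the residue form the low-frequency
    part (this split rule is an assumption).
    """
    high = imfs[:n_high].sum(axis=0)
    low = imfs[n_high:].sum(axis=0) + residue
    return predict_high(high, steps) + predict_low(low, steps)

# Hypothetical stand-ins for the GRU and MLR models.
def zero_mean_forecast(series, steps):
    return np.zeros(steps)                    # assume the oscillation averages out

def linear_trend_forecast(series, steps):
    t = np.arange(len(series))
    slope, intercept = np.polyfit(t, series, 1)
    return intercept + slope * np.arange(len(series), len(series) + steps)

t = np.arange(50)
imfs = np.vstack([0.01 * np.sin(2 * np.pi * t / 5), np.zeros(50)])  # toy IMFs
residue = 1.0 - 0.01 * t                                            # toy trend
forecast = multiscale_forecast(imfs, residue, 1, zero_mean_forecast, linear_trend_forecast, 4)
print(forecast)
```

Summing the two forecasts is valid because the IMFs and residue add back up to the original sequence, so forecasts of the parts add up to a forecast of the whole.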
3.3. Overall Prediction Framework
SOH and RUL are crucial indicators for assessing the degree of battery aging, but they rest on different principles and serve different purposes. Considering both the relationship and the difference between SOH and RUL, this paper proposes a coupled framework that evaluates SOH and predicts RUL in an integrated manner.
The framework of the estimation method is illustrated in Figure 6, where the blue area represents data processing, the green area represents SOH estimation, and the yellow area represents RUL prediction. The specific steps are as follows. First, charging voltage data of the lithium-ion battery are collected, and EVDCT, IC Peak, and PD Peak are extracted from them as health features. These features are used as inputs, with SOH as the output, to build a battery aging model by ridge regression. Meanwhile, the multiscale prediction model forecasts the health features of future cycles. Finally, the forecast features are fed into the ridge regression model to perform multistep SOH prediction, from which the RUL is obtained.
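The final step, turning a multistep SOH forecast into an RUL value, can be sketched as follows. The function assumes a hypothetical `soh_model` callable (for example, a fitted ridge model) applied to forecast health features for cycles $k+1, k+2, \ldots$, and an EOL threshold supplied by the caller, since the text allows anything from 70% to 80%. RUL is counted as the number of cycles until the predicted SOH first drops to the threshold.

```python
import numpy as np

def predict_rul(future_features, soh_model, eol_soh):
    """Multistep SOH prediction -> RUL.

    future_features: (steps, n_features) forecast health features for cycles
    k+1, k+2, ...; soh_model maps them to SOH values.
    Returns the number of cycles until predicted SOH first falls to eol_soh,
    or None if that does not happen within the forecast horizon.
    """
    soh = np.asarray(soh_model(np.asarray(future_features, dtype=float)))
    hits = np.flatnonzero(soh <= eol_soh)
    return int(hits[0]) + 1 if hits.size else None   # index 0 is cycle k+1

soh_path = 0.85 - 0.01 * np.arange(10)   # hypothetical forecast SOH trajectory
print(predict_rul(soh_path[:, None], lambda F: F[:, 0], eol_soh=0.805))  # 6
```

Returning `None` when the threshold is not crossed makes it explicit that the forecast horizon was too short, rather than silently reporting the horizon length as the RUL.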
4. Results and Discussion
4.1. Evaluation Metrics
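Prediction results are evaluated using the mean absolute error (MAE), root mean square error (RMSE), and absolute error (AE), defined as follows:
$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$
$$\mathrm{RMSE} = \sqrt{ \frac{1}{n} \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2 }$$
$$\mathrm{AE} = \left| R_{\mathrm{true}} - R_{\mathrm{pred}} \right|$$
where $n$ is the number of predicted cycles, $y_i$ is the real value, $\hat{y}_i$ is the predicted value, $R_{\mathrm{true}}$ is the actual RUL, i.e., the number of cycles remaining until the battery actually reaches EOL, and $R_{\mathrm{pred}}$ is the predicted value of RUL.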
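The three metrics translate directly into NumPy; the sketch below is a straightforward transcription of the formulas, with a small hand-checkable example.

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error over the predicted cycles."""
    e = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(e)))

def rmse(y_true, y_pred):
    """Root mean square error over the predicted cycles."""
    e = np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean(e ** 2)))

def ae(rul_true, rul_pred):
    """Absolute error between actual and predicted RUL (in cycles)."""
    return abs(rul_true - rul_pred)

print(mae([1, 2, 3, 4], [1, 2, 3, 6]), rmse([1, 2, 3, 4], [1, 2, 3, 6]))  # 0.5 1.0
```

With one error of 2 among four values, MAE averages it to 0.5 while RMSE squares it and reports 1.0, which shows how RMSE weights occasional large errors more heavily.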
4.2. SOH Estimation Results
The experimental data are divided into training and test sets; the ridge regression model is built on the training set, and its performance is assessed on the test set. Specifically, for batteries B0005 and B0006 from the NASA battery dataset, the first 100 cycles are used for training and the subsequent cycles for testing. Similarly, for Cell1 and Cell2 from the Oxford battery dataset, the first 50 cycles are used for training and the remaining cycles for testing. In every prediction, the three health features extracted from the charging voltage curve serve as the inputs of the ridge regression model, and the estimated SOH is the output.
To confirm the superiority of the ridge regression model, comparative experiments were conducted against SVR and LSE. The three forecasting methods are described in Table 3. Figure 7 shows the SOH prediction results of the three methods for the four batteries, and Table 4 compares their estimation errors.
As Figure 7 shows, the proposed method performs best in predicting battery SOH. Its prediction curve closely tracks the real SOH degradation curve and accurately captures the trend of capacity decline. It also adapts well to both the deterioration of the battery and its capacity regeneration. Compared with SVR and LSE, the proposed method clearly improves prediction accuracy, and most of its predicted SOH values closely match the real values, so it reproduces the real SOH trend well. However, closer inspection of Figure 7c,d shows that the predicted results deviate slightly from the actual results and fluctuate up and down. This may be caused by errors in data collection or processing that bias the input data used for model training. The training and test data therefore need further review to reduce noise and outliers and to ensure data accuracy.
Table 4 lists the MAE and RMSE results. On all four battery datasets, the prediction errors of the proposed method are very low, remaining below 0.02, and under the same conditions the proposed method has the smallest prediction error. For Cell1 and Cell2 in particular, the MAE and RMSE stay below 0.004 and 0.005, respectively. This shows that the proposed ridge regression model not only adapts well to different datasets but also achieves superior prediction accuracy.
The estimation errors of Cell1 and Cell2 are relatively small because their degradation shows no obvious fluctuations and the overall trend is relatively smooth; consequently, all three prediction methods capture the real SOH trends of these two batteries well. For battery B0006, however, the degradation is more drastic and the up-and-down fluctuations are more pronounced. Within 60 cycles, its SOH decreased by about 15% and significant capacity regeneration occurred, resulting in larger prediction errors for B0006. Even so, the proposed method follows the real degradation trend more closely than the other two methods, with the smallest error, which demonstrates its applicability to battery data with drastic, fluctuating degradation trends.
4.3. RUL Prediction Results
RUL prediction starts from the same prediction starting point set in the SOH estimation. The left side of Figure 8 shows the real and estimated RUL values; the real RUL is the number of cycles needed to reach EOL from the current cycle. The red circles on the right side of Figure 8 show the absolute error between the real and estimated values. As Figure 8 shows, the estimated and real values of all four batteries closely match, with a maximum AE of only 2 cycles, which demonstrates the high accuracy and adaptability of the proposed RUL prediction method.
Notably, the prediction accuracy of the health features improves as more cycles become available, so the corresponding RUL estimation error decreases. The overall errors are small mainly because the multiscale prediction model decomposes the time series into components at different frequencies and applies a suitable prediction model to each. This approach captures the detailed features of the data and predicts the trend of the health features well, greatly improving the precision and robustness of RUL prediction.
Table 5 lists the MAE and RMSE results. For battery B0005, prediction starts at cycle 101 and multistep SOH prediction is performed over 25 cycles until EOL is reached; the resulting RUL predictions have an MAE of 0.3600 and an RMSE of 0.6000. The error metrics of the other three batteries are slightly larger than those of B0005, mainly because less data is available for them. Nevertheless, the MAE of every battery is below 0.8 and the highest RMSE is 1.1094, suggesting that the proposed RUL prediction method achieves reliable estimates even with limited data.
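As a consistency check on the B0005 figures, assume that the MAE and RMSE in Table 5 are computed over the 25 per-cycle RUL predictions and that each error $e_i$ is a whole number of cycles. The reported values then fix the error totals:

```latex
\sum_{i=1}^{25} \lvert e_i \rvert = 25 \times 0.36 = 9, \qquad
\sum_{i=1}^{25} e_i^{2} = 25 \times 0.6^{2} = 9
```

For an integer $e_i$, $e_i^2 \ge |e_i|$, with equality only when $|e_i| \le 1$. Since the two sums are equal, every $|e_i|$ must be 0 or 1: under these assumptions, nine of the 25 predictions are off by one cycle and the other sixteen are exact.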
4.4. Results of the Joint SOH and RUL Evaluation
Both SOH and RUL can be used to evaluate the aging state of a battery. However, SOH is based on the current capacity state of the battery, whereas RUL is based on the number of cycles the battery has left. Because of capacity regeneration, a battery may show the same SOH at different cycles during use. This does not mean that the battery has returned to its earlier degree of aging at that SOH, so further judgment is required. Because capacity regeneration is more pronounced in B0005, two points in B0005 with the same SOH but different RUL were selected for comparative analysis.
As shown in Table 6, the remaining usable capacity of battery B0005 is nearly the same at cycles 111 and 121, but the RULs at the two cycles differ by 10 cycles. This indicates that although SOH reflects the current capacity state of the battery, it cannot by itself assess the degree of battery aging comprehensively and accurately. The joint estimation method proposed in this paper predicts both the SOH and the RUL of the battery at the current moment, and both predictions agree closely with the real values, with high accuracy and reliability. Combining the SOH and RUL estimates therefore allows battery health to be assessed more comprehensively and accurately.