Lithium-Ion Battery State-of-Health Prediction for New-Energy Electric Vehicles Based on Random Forest Improved Model

Liang, Zijun; Wang, Ruihan; Zhan, Xuejuan; Li, Yuqi; Xiao, Yun

doi:10.3390/app132011407


by Zijun Liang ¹,²,*, Ruihan Wang ¹, Xuejuan Zhan ¹, Yuqi Li ¹ and Yun Xiao ¹,²

¹ School of Urban Construction and Transportation, Hefei University, Hefei 230601, China

² Anhui Province Transportation Big Data Analysis and Application Engineering Laboratory, Hefei 230601, China

* Author to whom correspondence should be addressed.

Appl. Sci. 2023, 13(20), 11407; https://doi.org/10.3390/app132011407

Submission received: 20 September 2023 / Revised: 14 October 2023 / Accepted: 16 October 2023 / Published: 17 October 2023


Abstract:

The lithium-ion battery (LIB) has become the primary power source for new-energy electric vehicles, and accurately predicting the state-of-health (SOH) of LIBs is crucial for ensuring the stable operation of electric vehicles and the sustainable development of green transportation. We collected multiple sets of charge–discharge cycle experimental data for LiFePO₄ LIBs and employed several traditional machine learning models to predict the SOH of LIBs. The random forest (RF) model yielded relatively better predictive results than the other models, confirming the feasibility of applying the RF model to SOH prediction for electric vehicle LIBs. Building on this foundation, we conducted further research on an RF improved model for LIB SOH prediction. The particle swarm optimization (PSO) algorithm was employed to adaptively optimize five major parameters of the RF model: max_depth, n_estimators, max_features, min_samples_split, and min_samples_leaf. This addresses the prediction errors that arise when the parameters of the RF model are set according to human experience. The results indicate that the RF improved model proposed in this paper further improves the prediction accuracy of LIB SOH. Its evaluation indices also outperform those of the other models, demonstrating the effectiveness of this approach for managing LIB SOH in new-energy electric vehicles. This contributes to urban environmental protection and the development of green transportation.

Keywords:

new-energy electric vehicles; lithium-ion battery; state-of-health prediction; random forest model; particle swarm optimization

1. Introduction

With the improvement in residents’ economic levels and the deepening of urbanization, traditional fuel-powered vehicles, such as gasoline and diesel vehicles, have become increasingly popular [1]. The resulting emission pollution, noise pollution, and electromagnetic pollution are becoming more and more serious, leading to heightened energy consumption and environmental contamination [2]. Developing green transportation is an important way to address these issues, and new-energy electric vehicles have become an ideal choice for green transportation [3]. The biggest difference between new-energy electric vehicles and traditional gasoline vehicles is that their core power source is a battery [4]. This makes new-energy electric vehicles capable of zero emissions, high energy efficiency, low noise levels, and energy conservation. Among the available batteries, LIBs have characteristics such as high energy density, long cycle life, and low self-discharge rate [5], making them the primary power source for new-energy electric vehicles. However, during the use of LIBs, rapid nonlinear aging may sometimes occur, severely limiting battery life [6]. When the LIBs of new-energy electric vehicles encounter problems during operation, they can easily cause traffic accidents and seriously affect traffic safety. Accurately predicting the SOH of LIBs is essential for providing decision support for the timely maintenance and replacement of LIBs in new-energy electric vehicles. It serves as a crucial guarantee for the reliability and safety of the battery energy storage system, ensuring that the LIB can operate safely and effectively, and promoting road safety and environmental protection. Therefore, accurate prediction of LIB SOH is of paramount importance [7].

The SOH is one of the important parameters of LIBs. It is usually defined as the ratio of the discharge capacity to the rated capacity of the battery under certain discharge conditions, also known as the capacity retention rate of the battery [8]. Early classical methods for SOH prediction mainly include the definition method, the electrochemical impedance spectroscopy method, and the partial discharge method [9]. With the continuous development of electric vehicles, new prediction methods have been proposed. Currently, the widely used method is the model-based battery SOH prediction method, which estimates the SOH of LIBs by combining an electrochemical model or an equivalent circuit model with established state-space equations. Examples of these models include the second-order RC equivalent circuit model [10], the fractional-order equivalent circuit model [11], and the core-buffer equivalent circuit model [12]. In recent years, with the widespread application of probability and statistical theory and the rise of big data, machine learning models have also been applied in the field of LIB SOH prediction. Examples include decision trees (DT) [13], convolutional neural networks (CNN) [14], long short-term memory neural networks (LSTM) [15], and random forest (RF) [16]. These algorithms form a data-driven approach to battery SOH assessment. This approach does not require considering internal chemical changes in the battery. By analyzing the changes in current, voltage, capacity, and other data during battery charge–discharge processes, it can predict the SOH of the battery, providing a solid foundation for the research conducted in this study.

The decision tree (DT) model is a common classification method and a straightforward machine learning algorithm [13]. It can classify and predict data, and it possesses advantages such as high accuracy, low computational cost, and high interpretability [17]. Based on the NASA LIB charge–discharge experimental data, Zhang et al. [18] proposed a model based on the gradient-boosting decision tree framework to evaluate LIB SOH; Salinas et al. [19] predicted the SOH of an aging battery based on an enhanced decision tree model and analyzed its influencing factors; Wang et al. [20] used DTs to analyze the adaptability of the battery to working conditions, temperature, and degradation, and found that the dynamic characteristics of the battery changed significantly during aging.

The convolutional neural network (CNN) is a type of feedforward neural network that includes convolutional computations and has a deep structure. It is one of the representative algorithms in deep learning and possesses strong data feature extraction capabilities [14]. Wang et al. [21] proposed integrating CNN and wavelet transform, making full use of the ability of CNN to extract features hierarchically and of the wavelet transform to analyze frequencies, so as to predict the SOH of the battery; Din et al. [22] used CNN to extract features from a large collection of image data of common battery manufacturing faults, reducing human intervention and saving time; Ruan et al. [23] proposed a CNN-based model to explain the relationship between the charging data and the SOH, so as to diagnose the SOH of the LIB; Li et al. [24] proposed using 1D-CNN technology to learn the nonlinear relationship between VRB current, flow rate, state-of-charge, and voltage.

The long short-term memory neural network (LSTM) is a type of recurrent neural network designed specifically to address the issues of classical recurrent neural networks. LSTM is well-suited for processing and predicting important events with long gaps and delays in time series data [15]. Nguyen et al. [25] proposed a method for estimating the SOH of LIBs using the LSTM network, with an estimation accuracy higher than that of the feedforward neural network; Li et al. [26] compared the prediction accuracy of two algorithms, EKF-LSTM and LSTM-EKF, for the SOH of the battery, and the results showed that the LSTM-EKF method achieved the higher estimation accuracy; Yang et al. [27] established an improved LSTM model to predict the SOH of the LIB, and the results showed that the prediction accuracy was higher than that of the LSTM model without optimization; Zhang et al. [28] conducted experiments using the NASA and CALCE battery charge–discharge datasets, and the results demonstrated that the prediction method based on the LSTM model effectively forecasts the available capacity and remaining useful life of the LIB.

Random forest (RF) is an ensemble learning algorithm initially introduced by Breiman in 2001. It possesses strong data mining capabilities, high accuracy, robustness, and interpretability [29]. The fundamental principle of RF involves randomly sampling the data from the datasets to construct new subsets, which are then used to train multiple decision trees. These decision trees are combined to create a more powerful model, thereby enhancing predictive accuracy [30]. Endzhievskaya et al. [31] suggested a preference for using the RF model in machine learning due to its ease of use and the small number of hyperparameters that require tuning. Compared to machine learning models such as DT, CNN, and LSTM, the RF model has demonstrated higher predictive accuracy in certain application scenarios [32,33,34]. These studies indicate that the RF model may offer better predictive performance. Accordingly, some studies have applied the RF model to battery SOH prediction. For example, Lin et al. [35] proposed a battery SOH estimation method based on constant current charging time, combining it with RF regression to achieve accurate and fast SOH prediction. Lin et al. [36] introduced a multi-feature-based multi-model fusion approach for LIB SOH estimation, enhancing the prediction accuracy by incorporating an RF model for preliminary SOH estimation. These studies have established an important model foundation for the research in this paper on SOH prediction for LIBs in new-energy electric vehicles.

The RF model has five major parameters: maximum depth of decision trees (max_depth), number of decision trees (n_estimators), feature selection (max_features), minimum samples required to split an internal node (min_samples_split), and minimum samples required for a leaf node (min_samples_leaf). In practical applications, most of these five parameters are set according to human experience, which can impact the prediction accuracy. For instance, the RF model parameters in [35,36] were not adaptively tuned. In this regard, some experts and scholars have proposed improvement methods for the RF model regarding the handling of model parameters. For example, Tang et al. [37] improved the accuracy of point cloud classification by using weighted voting and modifying n_estimators. Xu et al. [38] introduced the RandomizedSearchCV and GridSearchCV methods to optimize n_estimators and max_depth, thereby enhancing the average accuracy. Balyan et al. [39] considered configuring a large number of parameters during the improvement of RF but did not perform parameter optimization. Luo et al. [40] used particle swarm optimization (PSO) to optimize the parameters max_depth, n_estimators, and max_features, enhancing the control accuracy of slurry pressure. Xiong et al. [41] utilized PSO to optimize the parameters n_estimators, max_features, and min_samples_leaf, achieving high classification accuracy for ultra-high-voltage converter valve fault detection.

The aforementioned research has provided important insights for further developing an RF improved model for predicting the SOH of LIBs in new-energy electric vehicles. However, these studies optimized only some of the five major parameters of the RF model and did not examine the impact of the remaining, unoptimized parameters on the predictive accuracy of the model. Optimizing all five parameters in combination may further enhance the predictive accuracy of the RF model, but could simultaneously increase the model’s complexity and runtime. Therefore, a search algorithm is needed to improve the efficiency of the parameter optimization. The PSO algorithm is a global optimization search method known for its good convergence and fast search speed [42], making it suitable for addressing the parameter optimization problem of the RF model [40,41]. It can adaptively optimize the five major parameters of the RF model, namely max_depth, n_estimators, max_features, min_samples_split, and min_samples_leaf, thus improving both the model’s training speed and predictive accuracy. Currently, there is limited research on using PSO to enhance the predictive accuracy of LIB SOH through optimizing the RF model.

In this regard, this study makes two main contributions to SOH prediction for LIBs in new-energy electric vehicles. First, we analyze SOH prediction using various machine learning models based on charge–discharge cycle experimental data from multiple 18650 LiFePO₄ batteries. This analysis validates the feasibility of applying the RF model to SOH prediction for LIBs in new-energy electric vehicles. Second, we employ a PSO algorithm to adaptively tune the five major parameters of the RF model. The resulting RF improved model further improves the accuracy of LIB SOH prediction, contributing to the development of new-energy electric vehicles and green transportation.

2. Method

2.1. General Idea

The overall research approach in this study comprises three main phases: data collection and preprocessing, machine learning model selection and determination, and machine learning model improvement and validation. In the data collection and preprocessing phase, charge–discharge cycle experimental data from multiple 18650 LiFePO₄ batteries were collected. After data preprocessing, multiple datasets suitable for LIB SOH prediction were obtained. In the machine learning model selection and determination phase, various machine learning models, including the DT model, CNN model, LSTM model, and RF model, were considered by referencing the relevant literature. Based on multiple sets of preprocessed LIB charge–discharge cycle experimental data, Python programming was employed to conduct SOH prediction analyses using these models. The model with the relatively best SOH prediction results in this analysis, the RF model, was chosen as the machine learning model for this research. In the machine learning model improvement and validation phase, to allow comparison with the RF improved model proposed by Luo et al. [40], two models were constructed: RF improved model 1, which adaptively optimized three parameters (max_depth, n_estimators, and max_features), and RF improved model 2, which adaptively optimized five parameters (max_depth, n_estimators, max_features, min_samples_split, and min_samples_leaf). The RF model, RF improved model 1, and RF improved model 2 were compared based on their SOH prediction results to validate the effectiveness of the approach proposed in this study.

The basic process of the RF improved model proposed in this study is shown in Figure 1, and the specific steps are as follows.

1. Collect multiple sets of LIB charge–discharge cycle data, use the 3Sigma method to identify and eliminate outlier data for noise reduction, and then apply the Min-Max standardization method for data processing.
2. Use a Python program to read the processed data and partition the dataset into training sets and testing sets (out-of-bag data) according to a specified ratio. Train the model using the training sets. During prediction, calculate the importance of each feature on the testing set data and sort the features by importance, in order to extract relevant information that aids SOH prediction.
3. Using max_depth, n_estimators, max_features, min_samples_split, and min_samples_leaf as the major parameters, employ a Python program to invoke the PSO library for calculating the particles’ positions, velocities, and fitness values within the search space. Conduct multiple rounds of iterative updates until the predetermined number of iterations is reached, and output the best parameter combination for max_depth, n_estimators, max_features, min_samples_split, and min_samples_leaf.
4. Using the best parameter combination, build an RF improved model with a Python program. Evaluate the constructed RF improved model on the testing datasets and output the model’s SOH prediction and evaluation indices.

2.2. Data Collection and Preprocessing

With the rapid development of green transportation and new-energy electric vehicles, the currently industrialized cathode materials for LIBs used both domestically and internationally include LiFePO₄, lithium manganese oxide (LiMn₂O₄), and ternary materials (Li(Ni,Co,Mn)O₂). Among the metal elements comprising the cathode materials of the battery, Co is the most expensive and has limited reserves, Ni and Mn are more affordable, and Fe is the cheapest. Therefore, the LiFePO₄ battery is relatively cost-effective. In addition, the LiFePO₄ battery offers advantages such as high safety, long cycle life, good performance at high temperatures, and environmental friendliness. These qualities have made it widely adopted in new-energy electric vehicles [43]. Therefore, we selected the 18650 LiFePO₄ battery for charge–discharge cycle experiments. This allowed us to gather experimental data over many charge–discharge cycles and accumulate a substantial database. In this study, multiple 18650 LiFePO₄ batteries with a rated capacity of 2500 mAh were employed. The experiments were carried out on a battery tester, model CT-4008T-5V6A-S1, which was connected to a computer. The battery tester was configured and monitored using the BTS Control battery testing software on the computer. Battery data were collected and compiled using the BTSDA battery data collection software. The laboratory environment was maintained at a temperature of 25 ± 2 °C. Each charge–discharge cycle of the battery included the following steps.

1. Secure the LIB in the device using the battery tester’s upper and lower fixtures and allow it to sit undisturbed for ten minutes.
2. Connect the battery tester to the computer and use the BTS Control software to send commands to the battery tester: discharge the LIB at a constant current rate of 1 C to a cut-off voltage of 3 V and output the corresponding battery data.
3. Let the LIB rest on the battery tester for ten minutes.
4. Send commands to the battery tester through the BTS Control software: charge the LIB at a constant current rate of 1 C until it reaches the cut-off voltage of 4.2 V, then continue with constant voltage charging until the current drops to 0.1 C, and output the corresponding battery data.
5. Repeat steps 1 to 4. Collect data from multiple charge–discharge cycle experiments for the LIB.

The battery data output was collected using the BTSDA software on the computer connected to the battery tester. The data recorded in the experiment include the cycle index, charge–discharge capacity, median voltage, and charge–discharge time of the LIB. Data collection was concluded when the capacity of the 18650 LiFePO₄ battery decayed to 20% of the rated capacity, after which data processing was initiated.

Data standardization is a fundamental step in data preprocessing. After obtaining a set of original data, it is common to analyze whether there are any outliers present, and if so, they need to be treated to reduce noise. For LIB charge–discharge cycle data, one can plot the curves of cycle number versus actual capacity to observe the presence of outliers. We use the 3Sigma method to find abnormal data, and the specific implementation process is as follows.

1. Calculate the difference between the capacity values of the (i − 1)th and (i + 1)th cycles, where i is an integer greater than 1 and less than the total number of cycles.
2. Calculate the mean (α) and standard deviation (σ) of these differences.
3. Identify as outliers the data points whose values fall outside the range (α − 3σ, α + 3σ), which occurs with a probability lower than 0.01.
4. Replace each abnormal value with the average of the values of the same data category in the previous cycle and the next cycle.
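The procedure above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: the function name `three_sigma_repair` is hypothetical, and the sketch measures how far each interior cycle deviates from the mean of its two neighbours before applying the 3σ rule, which is one possible reading of the neighbour-difference step. The capacity series is synthetic.

```python
import numpy as np


def three_sigma_repair(capacity):
    """Flag interior cycles that break the 3-sigma rule and repair them.

    Assumption: the deviation of cycle i is its capacity minus the mean of
    cycles i-1 and i+1; the source describes the difference only briefly.
    """
    x = np.asarray(capacity, dtype=float)
    repaired = x.copy()
    neighbour_mean = (x[:-2] + x[2:]) / 2.0      # mean of previous and next cycle
    deviation = x[1:-1] - neighbour_mean         # one value per interior cycle
    alpha, sigma = deviation.mean(), deviation.std()
    is_outlier = np.abs(deviation - alpha) > 3.0 * sigma
    outlier_idx = np.where(is_outlier)[0] + 1    # shift back to indices of x
    # Replace each outlier by the average of its previous and next cycle.
    repaired[outlier_idx] = neighbour_mean[is_outlier]
    return repaired, outlier_idx


# Synthetic capacity fade (mAh) with one corrupted cycle at index 20.
capacity = 2500.0 - 0.5 * np.arange(50)
capacity[20] = 1500.0
repaired, outlier_idx = three_sigma_repair(capacity)
print(outlier_idx, repaired[outlier_idx])
```

Because the repair uses only the two neighbouring cycles, the number of data points stays unchanged, which matches the processing described in this section.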

Since the performance of the machine learning model is largely dependent on the effectiveness of data standardization [44], we employed the Min-Max standardization method for data preprocessing before constructing the RF model. Min-Max standardization transforms each original value into a value between 0 and 1 by finding the minimum and maximum values in the data set. The calculation formula is as follows.

{\bar{X}}_{i} = \frac{X_{i} - \min \{X_{1}, X_{2}, \dots, X_{n}\}}{\max \{X_{1}, X_{2}, \dots, X_{n}\} - \min \{X_{1}, X_{2}, \dots, X_{n}\}}

(1)

where X_i is the ith original data value, and \bar{X}_{i} is the corresponding standardized value.
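Equation (1) translates directly into code. The following sketch assumes a one-dimensional feature column; the function name `min_max_scale` and the handling of a constant column (mapped to zeros to avoid division by zero) are assumptions made for illustration.

```python
import numpy as np


def min_max_scale(values):
    """Apply Equation (1): map each value into [0, 1] using the column min and max."""
    x = np.asarray(values, dtype=float)
    lo, hi = x.min(), x.max()
    if hi == lo:
        # Assumption: a constant column carries no information, so return zeros.
        return np.zeros_like(x)
    return (x - lo) / (hi - lo)


# Illustrative discharge capacities in mAh (not experimental data).
print(min_max_scale([2000.0, 2250.0, 2500.0]))
```

In practice each feature column is scaled separately, so that features with different units end up on the same 0 to 1 scale.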

2.3. Feature Importance Calculation

In the RF model, 90% of the original sample set data is randomly selected to form the training sets for model training. The remaining 10% of the data is referred to as out-of-bag data (OOB) [45], also known as the testing sets, which can be used to calculate feature importance. Different features constitute a feature set, and each feature set may have an impact on the prediction results of the model. The greater the importance of features, the greater the impact on the prediction results of the model, so it is particularly important to determine the feature set according to the importance of features.

We select charge and discharge capacity, charge and discharge energy, median voltage, charge and discharge time, etc. as the characteristics in the process of LIB charge–discharge. To determine the significance of these features, the feature importance is calculated for an RF model consisting of n trees. The calculation rule for feature importance in an RF model with n trees is as follows.

F = \frac{1}{n} \sum_{i = 1}^{n} |e_{2} - e_{1}|

(2)

where e₁ is the error associated with out-of-bag data for decision tree i, and e₂ corresponds to the error generated when introducing noise interference to a certain feature for all out-of-bag data samples.
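As a rough illustration of Equation (2), the sketch below evaluates every tree of a fitted scikit-learn forest on a held-out set before and after disturbing one feature. Several choices here are assumptions rather than details from the paper: the "noise interference" is implemented as a random shuffle of the feature column, the error is the mean squared error, the held-out 10% plays the role of the out-of-bag data, and the data are synthetic. The helper name `importance_eq2` is hypothetical.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error


def importance_eq2(forest, x_holdout, y_holdout, feature, rng):
    """Mean over trees of |e2 - e1| for one feature, following Equation (2)."""
    x_noisy = x_holdout.copy()
    # Assumed form of "noise interference": shuffle the chosen feature column.
    x_noisy[:, feature] = rng.permutation(x_noisy[:, feature])
    diffs = []
    for tree in forest.estimators_:
        e1 = mean_squared_error(y_holdout, tree.predict(x_holdout))  # clean error
        e2 = mean_squared_error(y_holdout, tree.predict(x_noisy))    # disturbed error
        diffs.append(abs(e2 - e1))
    return float(np.mean(diffs))


# Synthetic data: only feature 0 drives the target.
rng = np.random.default_rng(0)
x = rng.random((300, 3))
y = 3.0 * x[:, 0] + 0.05 * rng.standard_normal(300)
x_train, y_train, x_hold, y_hold = x[:270], y[:270], x[270:], y[270:]

forest = RandomForestRegressor(n_estimators=50, random_state=0).fit(x_train, y_train)
scores = [importance_eq2(forest, x_hold, y_hold, j, rng) for j in range(3)]
print(scores)
```

A feature that the trees rely on produces a large gap between e₁ and e₂, while a feature the trees ignore leaves the error almost unchanged.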

After calculating the importance of each feature, the feature selection process is performed as follows:

1. Form the feature set from the initially selected features, calculate the feature importance, and sort the features by importance.
2. Remove the least important feature from the feature set to obtain a smaller feature set.
3. Repeat Steps 1 and 2 until the importance of every feature remaining in the feature set is greater than a certain threshold.
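The elimination loop can be sketched as follows. The importance function is passed in as a callable so that the loop itself stays independent of the model; `correlation_share` is a hypothetical stand-in used only to keep the example light, and the data, feature names, and threshold of 0.1 are illustrative assumptions.

```python
import numpy as np


def backward_select(x, y, names, importance_fn, threshold):
    """Drop the least important feature until all remaining ones reach the threshold."""
    keep = list(range(x.shape[1]))
    while len(keep) > 1:
        scores = importance_fn(x[:, keep], y)   # Step 1: recompute importance
        weakest = int(np.argmin(scores))
        if scores[weakest] >= threshold:        # Step 3: stopping condition
            break
        del keep[weakest]                       # Step 2: remove the weakest feature
    return [names[j] for j in keep]


def correlation_share(x, y):
    """Hypothetical stand-in importance: each feature's share of |corr(x_j, y)|."""
    corr = np.array([abs(np.corrcoef(x[:, j], y)[0, 1]) for j in range(x.shape[1])])
    return corr / corr.sum()


# Synthetic data: x0 and x1 drive the target, x2 and x3 are pure noise.
rng = np.random.default_rng(0)
x = rng.random((2000, 4))
y = 5.0 * x[:, 0] + 3.0 * x[:, 1]
selected = backward_select(x, y, ["x0", "x1", "x2", "x3"], correlation_share, threshold=0.1)
print(selected)
```

To follow the RF-based procedure more closely, `importance_fn` could instead fit a forest on the current feature set and return its importance scores, with the threshold chosen to suit that importance measure.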

2.4. Parameter Optimization Based on Particle Swarm Optimization

Particle swarm optimization (PSO) is a swarm intelligence optimization algorithm introduced by Kennedy and Eberhart in 1995. The basic principle is to mimic the behavior of a group of birds (particle swarm) in a forest (search space) seeking food based on the quantity of food (fitness value), in order to find the location with the most food (group best position) [46]. Each particle in the swarm is characterized by three attributes: position, velocity, and fitness value. The position represents a solution to the problem being solved, the velocity indicates the direction and distance the particle will move in the next iteration, and the fitness value is calculated from the fitness function. Initially, particle positions and velocities are randomized. Subsequently, they are updated based on the individual best position and the group best position. At each generation update, the fitness value is calculated and compared with those of the individual best position and the group best position, and the process iterates until the predetermined number of iterations is reached, yielding the overall best position for the particle swarm [47].

All particles should find the best position in the search space. Each particle position and velocity can be calculated as follows:

v_{i}^{k + 1} = w v_{i}^{k} + c_{1} r_{1} (p_{i}^{k} - x_{i}^{k}) + c_{2} r_{2} (p_{g}^{k} - x_{i}^{k})

(3)

x_{i}^{k + 1} = x_{i}^{k} + v_{i}^{k + 1}

(4)

where x_{i}^{k} is the position of the ith particle at the current moment, v_{i}^{k} is the velocity of the ith particle at the current moment, x_{i}^{k + 1} is the position of the ith particle at the next moment, and v_{i}^{k + 1} is the velocity of the ith particle at the next moment; r₁ and r₂ are random numbers in the interval [0, 1], c₁ and c₂ are the cognitive weight and social weight, respectively, and w is the inertia weight; p_{i}^{k} is the best position obtained from the search of the ith particle at the current moment, and p_{g}^{k} is the best position in the whole particle population at the current moment.
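A single application of Equations (3) and (4) can be written as a small function. This sketch assumes that r₁ and r₂ are drawn independently for every dimension, which is a common convention but is not stated in the text; the function name `pso_step` and the numerical values are illustrative.

```python
import numpy as np


def pso_step(x, v, p_best, g_best, w, c1, c2, rng):
    """One PSO update: Equation (3) for the velocity, Equation (4) for the position."""
    r1 = rng.random(x.shape)  # assumption: a fresh random number per dimension
    r2 = rng.random(x.shape)
    v_next = w * v + c1 * r1 * (p_best - x) + c2 * r2 * (g_best - x)
    x_next = x + v_next
    return x_next, v_next


rng = np.random.default_rng(0)
x = np.array([5.0, 50.0])        # current position of one particle
v = np.array([1.0, -2.0])        # current velocity
p_best = np.array([6.0, 40.0])   # best position this particle has visited
g_best = np.array([8.0, 45.0])   # best position found by the swarm
x_next, v_next = pso_step(x, v, p_best, g_best, w=0.7, c1=1.5, c2=1.5, rng=rng)
print(x_next, v_next)
```

The inertia term keeps the particle moving in its current direction, while the two random terms pull it toward its own best position and the best position of the swarm.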

The parameter values of the RF model, whether too large or too small, can lead to issues such as poor model fitting, increased computational complexity, and inappropriate feature selection, ultimately affecting the accuracy of SOH prediction. Given the advantages of simplicity and fast convergence offered by PSO, it can be applied to optimize the parameters of machine learning models. Therefore, we establish a PSO algorithm for optimizing the parameters of an RF model. Initially, we define a particle as a combination of the five parameters to be optimized within the RF model: max_depth, n_estimators, max_features, min_samples_split, and min_samples_leaf. This is equivalent to particles in a five-dimensional space searching for the best solution within the specified ranges for these five parameters. The ranges for the five parameters can be empirically set as [2, 20], [1, 100], [0.01, 1], [2, 20], and [1, 20], respectively, which defines the search space for each dimension. Next, we initialize the position and velocity of the particles and use the out-of-bag error as the fitness function to compute the fitness value of each particle within the RF model. We then check whether the current iteration count has reached the predefined value. If it has not, we compare fitness values to obtain the group best position and the individual best positions. We use Equations (3) and (4) to iteratively update the position and velocity of each particle and recalculate their fitness values. This process continues until the iteration count reaches the predefined value. At that point, we output the best position of the particle swarm, which represents the best parameter combination. Finally, we decode the best parameter combination to obtain the best solution for the five parameters. The specific steps are shown in Figure 2.
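The decoding step mentioned above is needed because PSO moves particles through a continuous space, while four of the five RF parameters must be integers. A minimal sketch of such a decoder is given below; the function name `decode_particle`, the clipping to the search ranges, and the use of rounding for the integer parameters are assumptions, not details taken from the paper. The ranges are the empirical ones listed above.

```python
import numpy as np

# Search ranges from the text, in the order used for the particle dimensions.
PARAM_RANGES = {
    "max_depth": (2, 20),
    "n_estimators": (1, 100),
    "max_features": (0.01, 1.0),
    "min_samples_split": (2, 20),
    "min_samples_leaf": (1, 20),
}
INTEGER_PARAMS = {"max_depth", "n_estimators", "min_samples_split", "min_samples_leaf"}


def decode_particle(position):
    """Turn a 5-dimensional particle position into RF keyword arguments."""
    params = {}
    for value, (name, (low, high)) in zip(position, PARAM_RANGES.items()):
        value = float(np.clip(value, low, high))  # keep the value inside its range
        # Assumption: integer-valued parameters are obtained by rounding.
        params[name] = int(round(value)) if name in INTEGER_PARAMS else value
    return params


print(decode_particle([25.3, 57.6, 0.5, 1.2, 7.49]))
```

The resulting dictionary uses the scikit-learn parameter names, so it can be passed directly as keyword arguments when constructing a `RandomForestRegressor`.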

To implement the PSO algorithm for parameter optimization in the RF model using Python, follow the development process outlined below.

1. Import the necessary Python libraries and modules, mainly RandomForestRegressor from sklearn.ensemble, and numpy.
2. Define the PSO class.
- Define the parameters of the PSO algorithm, such as the number of particles (num_particles), maximum number of iterations (max_iter), parameter ranges (param_ranges), and fitness function (fitness_func).
- Initialize the parameters of the particles, including particle positions, particle velocities, global best position, and best fitness.
3. Define the function to initialize random particle positions, initialize_particles(). Define the function to update particle velocities, update_velocity(). Define the function to update particle positions, update_position().
4. Define the PSO optimization process function, optimize(). Calculate the fitness value of the particles and update the group best solution and fitness, as well as each particle’s individual best solution and fitness value. Set the inertia weight, cognitive weight, and social weight for PSO. Finally, update the particle positions and velocities.
5. Define the fitness function, fitness_func(). Construct the RF model with the input parameter values and perform training and prediction using the training data (x_train, y_train) and testing data (x_test, y_test).
6. Define the parameter ranges, which include max_depth, n_estimators, max_features, min_samples_split, and min_samples_leaf. Initialize the PSO algorithm and call the function to perform optimization, obtaining the best combination of parameters.
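The development process above could look roughly like the following sketch. It follows the class and method names listed in the steps, but everything else is an assumption made for illustration: the inertia, cognitive, and social weights (0.7, 1.5, 1.5), the use of test-set mean squared error as the fitness value, the helper `make_rf_fitness`, the rounding of integer parameters, and the synthetic data. It is not the authors' implementation.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error


class PSO:
    """Minimal particle swarm optimizer that minimizes fitness_func."""

    def __init__(self, num_particles, max_iter, param_ranges, fitness_func,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
        self.num_particles, self.max_iter = num_particles, max_iter
        self.low = np.array([lo for lo, _ in param_ranges], dtype=float)
        self.high = np.array([hi for _, hi in param_ranges], dtype=float)
        self.fitness_func = fitness_func
        self.w, self.c1, self.c2 = w, c1, c2  # assumed weights, not from the paper
        self.rng = np.random.default_rng(seed)
        self.initialize_particles()

    def initialize_particles(self):
        shape = (self.num_particles, self.low.size)
        self.positions = self.rng.uniform(self.low, self.high, size=shape)
        self.velocities = np.zeros(shape)
        self.pbest_pos = self.positions.copy()
        self.pbest_fit = np.full(self.num_particles, np.inf)
        self.gbest_pos, self.gbest_fit = self.positions[0].copy(), np.inf

    def update_velocity(self):
        r1 = self.rng.random(self.positions.shape)
        r2 = self.rng.random(self.positions.shape)
        self.velocities = (self.w * self.velocities
                           + self.c1 * r1 * (self.pbest_pos - self.positions)
                           + self.c2 * r2 * (self.gbest_pos - self.positions))

    def update_position(self):
        # Clip so that every particle stays inside the search space.
        self.positions = np.clip(self.positions + self.velocities, self.low, self.high)

    def optimize(self):
        for _ in range(self.max_iter):
            for i, pos in enumerate(self.positions):
                fit = self.fitness_func(pos)
                if fit < self.pbest_fit[i]:
                    self.pbest_fit[i], self.pbest_pos[i] = fit, pos.copy()
                if fit < self.gbest_fit:
                    self.gbest_fit, self.gbest_pos = fit, pos.copy()
            self.update_velocity()
            self.update_position()
        return self.gbest_pos, self.gbest_fit


def make_rf_fitness(x_train, y_train, x_test, y_test):
    """Build fitness_func: train an RF with the particle's parameters, return test MSE."""
    def fitness_func(position):
        model = RandomForestRegressor(
            max_depth=int(round(position[0])),
            n_estimators=int(round(position[1])),
            max_features=float(position[2]),
            min_samples_split=int(round(position[3])),
            min_samples_leaf=int(round(position[4])),
            random_state=0,
        )
        model.fit(x_train, y_train)
        return mean_squared_error(y_test, model.predict(x_test))
    return fitness_func


# Synthetic stand-in for the standardized battery features and SOH target.
rng = np.random.default_rng(0)
x = rng.random((200, 4))
y = 1.0 - 0.2 * x[:, 0] + 0.01 * rng.standard_normal(200)
fitness = make_rf_fitness(x[:180], y[:180], x[180:], y[180:])

param_ranges = [(2, 20), (1, 100), (0.01, 1), (2, 20), (1, 20)]
pso = PSO(num_particles=5, max_iter=3, param_ranges=param_ranges, fitness_func=fitness)
best_position, best_error = pso.optimize()
print(best_position, best_error)
```

The swarm size and iteration count are kept very small here so that the example runs quickly; a real search would use larger values, at the cost of one RF training run per particle per iteration.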

2.5. Evaluation Index

A total of 10% of the data of the original sample set constitutes the testing sets of the RF model, and the models trained on the training sets are used to predict the SOH of the LIB. The prediction accuracy of the model is then verified according to the obtained evaluation indices. We adopt the mean absolute percentage error (MAPE) [43], the root-mean-square error (RMSE) [45], and the coefficient of determination R² [43] as the evaluation indices to assess the regression performance of the model. A lower MAPE value indicates better model performance, a smaller RMSE value indicates a smaller model prediction error, and a higher R² value indicates a better fit of the model. The specific calculation formulas are as follows.

\mathrm{MAPE} = \frac{100\%}{n} \sum_{i = 1}^{n} \left| \frac{p_{i} - y_{i}}{y_{i}} \right| \in [0, +\infty)

(5)

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - p_{i})^{2}} \in [0, +\infty)

(6)

R^{2} = 1 - \frac{\sum_{i = 1}^{n} (y_{i} - p_{i})^{2}}{\sum_{i = 1}^{n} (y_{i} - \bar{y})^{2}} \in (-\infty, 1]

(7)

where n is the number of samples, y_i is the true value of the ith sample, p_i is the predicted value of the ith sample, and \bar{y} is the average of the true values of the samples.
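Equations (5) to (7) can be checked with a few lines of NumPy. The function names below are chosen for illustration, and the sample values are made up to keep the arithmetic easy to verify by hand.

```python
import numpy as np


def mape(y_true, y_pred):
    """Equation (5): mean absolute percentage error, in percent."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(100.0 * np.mean(np.abs((y_pred - y_true) / y_true)))


def rmse(y_true, y_pred):
    """Equation (6): root-mean-square error."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


def r2(y_true, y_pred):
    """Equation (7): coefficient of determination."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Illustrative values only: one of three predictions is off by 1.
y_true = [1.0, 2.0, 3.0]
y_pred = [1.0, 2.0, 4.0]
print(mape(y_true, y_pred), rmse(y_true, y_pred), r2(y_true, y_pred))
```

For this example the residual sum of squares is 1 and the total sum of squares is 2, so R² is 0.5, while the single relative error of 1/3 averaged over three samples gives a MAPE of about 11.1%.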

3. LIB SOH Prediction Case Study and Analysis

3.1. Experimental Data Collection and Preprocessing

To investigate the variation in capacity and SOH during the charge–discharge process of LIBs, we employed 47 18650 LiFePO₄ batteries manufactured by DLG, each with a rated capacity of 2500 mAh. Charge–discharge cycle experiments were conducted using a battery tester, model CT-4008T-5V6A-S1. The specific experimental steps are described in Section 2.2; the battery tester equipment is shown in Figure 3, and the interface for data collection using the BTSDA software is shown in Figure 4. To ensure the applicability of the model, we selected three representative LIBs, #1, #6, and #26, from the 47 experimental batteries for SOH analysis. The datasets for these three LIBs consisted of 985, 820, and 656 data points, respectively. Taking the experimental data for the #1 LIB as an example, each data point includes 13 types of information, such as cycle index, charge capacity, discharge capacity, and charge–discharge efficiency. A portion of the original data is presented in Table 1.

The original data do not meet the conditions for direct analysis and require processing to transform them into standardized data. Firstly, any instances of missing values and outliers in the original charge–discharge cycle experimental data are removed. Secondly, the 3Sigma method is used to check whether the data contain abnormal values, which are then processed. For example, in the data of the 204th cycle in the original data in Table 1, there are anomalies in all the data except for the consistent-charge-capacity and charge-time data. The average value of the data of the same category in the previous cycle and the next cycle is taken as the processed data, as shown in Table 2. Lastly, the Min-Max standardization method is applied to the data to standardize the dimensions and scales of the feature variables. This standardization approach helps mitigate discrepancies in dimensions, which can lead to interpretation biases. It gives the model results greater interpretational significance and enhances model precision. In addition, the number of data points is unchanged after processing: there are still 985 data points, and some of the processed data are shown in Table 3.

The battery data obtained from the #1 LIB charge–discharge cycle experiment yield the discharge capacity trend shown in Figure 5. The graph shows that, as the number of charge–discharge cycles increases, the overall discharge capacity of the LIB gradually decreases. There are also instances of partial capacity recovery, which is a normal occurrence. The SOH of the LIB is defined as the ratio of discharge capacity to rated capacity under specific charge–discharge conditions, so SOH decreases as the discharge capacity decreases. When SOH falls below 80%, the battery's range, charge–discharge efficiency, and other performance metrics start to deteriorate. This deterioration can affect the normal operation of new-energy electric vehicles and pose a risk to road traffic safety. Therefore, early prediction of LIB SOH is essential.
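As a small worked example of this definition, the sketch below computes SOH as a percentage of the 2500 mAh rated capacity and checks it against the 80% level mentioned above. The function and constant names are illustrative and not taken from the paper.

```python
RATED_CAPACITY_AH = 2.5  # 2500 mAh rated capacity of the tested cells
SOH_THRESHOLD = 80.0     # SOH level (%) below which performance deteriorates

def soh_percent(discharge_capacity_ah, rated_capacity_ah=RATED_CAPACITY_AH):
    """SOH as the ratio of discharge capacity to rated capacity, in percent."""
    return 100.0 * discharge_capacity_ah / rated_capacity_ah

def below_threshold(discharge_capacity_ah):
    """True when the SOH implied by this discharge capacity is under 80%."""
    return soh_percent(discharge_capacity_ah) < SOH_THRESHOLD

# Discharge capacity of cycle 200 in Table 1 (2.3217 Ah).
soh_cycle_200 = soh_percent(2.3217)
```

With the rated capacity of 2.5 Ah, the 80% level corresponds to a discharge capacity of 2.0 Ah.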

3.2. Feature Importance Calculation

Using a Python program, the processed LIB charge–discharge cycle experimental data were read. A total of 90% of the data was randomly selected as the training set, while the remaining 10% was set aside as the testing set. The importance of each feature in the RF model was then calculated and sorted using the defined ‘importance_scores’ code, as shown in Figure 6.

Initially, a feature set consisting of 13 features was selected, including charge–discharge capacity, median voltage, charge–discharge time, and others, as shown in Table 4. Starting with this set of 13 features, the features were sorted by importance, the least important feature was removed, and the importance of the remaining features was recalculated. This process was repeated until 7 features remained. At this point, no feature in the set had an importance below the threshold of 0.003 (below which a feature is taken to have minimal impact on the model), so all remaining features were considered to have a significant influence on the model. The final selection of features and their importance rankings are shown in Table 5. The table shows that, apart from cycle index and discharge time, the features with the highest importance are discharge capacity and charge capacity.
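The elimination procedure described above can be sketched with scikit-learn as follows. This is an illustration under stated assumptions rather than the code in Figure 6: the function name `select_features`, the number of trees, the random seed, and the synthetic data are all hypothetical; only the 90/10 split and the 0.003 threshold come from the text.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split

def select_features(X, y, names, threshold=0.003, seed=0):
    """Drop the least important feature until every importance reaches threshold."""
    # 90% of the data for training, 10% held out, as in the text.
    X_train, _, y_train, _ = train_test_split(X, y, test_size=0.1, random_state=seed)
    keep = list(range(X.shape[1]))
    while True:
        model = RandomForestRegressor(n_estimators=50, random_state=seed)
        model.fit(X_train[:, keep], y_train)
        scores = model.feature_importances_
        weakest = int(np.argmin(scores))
        if scores[weakest] >= threshold or len(keep) == 1:
            pairs = zip([names[k] for k in keep], scores)
            return sorted(pairs, key=lambda pair: pair[1], reverse=True)
        del keep[weakest]  # remove the least important feature and refit

# Synthetic stand-in for the battery features; not the paper's data.
rng = np.random.default_rng(0)
X = rng.random((200, 4))
y = 10 * X[:, 0] + 2 * X[:, 1] + rng.normal(0, 0.05, 200)
ranking = select_features(X, y, ["f0", "f1", "f2", "f3"])
```

Refitting after each removal matters because impurity-based importances are normalized to sum to 1, so the scores of the remaining features shift whenever one is dropped. This is why the values in Table 5 differ from those of the same features in Table 4.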

3.3. Parameter Optimization Based on PSO

The RF improved model builds upon the Class RF model by incorporating the PSO algorithm for adaptive parameter tuning. PSO searches for the best position iteratively, and each iteration depends strongly on the previous one, which makes full parallel computation difficult. Consequently, the computational cost of the RF improved model is relatively high. To keep the performance of the different models adequate, the number of PSO iterations was set to 10. The measured runtime is 7.79 s for the Class RF model, 65.22 s for RF improved model 1 (three optimized parameters), and 89.40 s for RF improved model 2 (five optimized parameters), roughly 8 and 11 times the runtime of the Class RF model, respectively. The proposed RF improved model therefore carries a significant computational cost. The key code for parameter optimization based on the PSO algorithm is shown in Figure 7.

For each of #1, #6, and #26 LIB, 90% of the charge–discharge cycle experimental data were randomly selected as the training set and the remaining 10% were reserved as the testing set. The major parameter values for the Class RF model, RF improved model 1, and RF improved model 2 are presented in Table 6. For the Class RF model, parameters were adjusted based on manual experience. RF improved model 1 used the PSO algorithm to optimize max_depth, n_estimators, and max_features, while keeping the other two parameters the same as in the Class RF model. RF improved model 2 used the PSO algorithm to optimize all five parameters: max_depth, n_estimators, max_features, min_samples_split, and min_samples_leaf.
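A minimal sketch of PSO-based tuning of the five parameters is given below. It is not the code in Figure 7: the search bounds, swarm size, inertia weight `w`, acceleration coefficients `c1` and `c2`, and the cross-validated RMSE fitness are all assumptions made for illustration. PSO positions are continuous, as the values in Table 6 suggest, so the sketch rounds them to integers before passing them to scikit-learn, which expects integer counts for these parameters.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import cross_val_score

# Hypothetical search bounds; the ranges used in the paper are not stated.
BOUNDS = np.array([
    [2, 20],     # max_depth
    [10, 150],   # n_estimators
    [0.1, 1.0],  # max_features, as a fraction of the features
    [2, 10],     # min_samples_split
    [1, 5],      # min_samples_leaf
], dtype=float)

def build_model(position, seed=0):
    """Turn a continuous particle position into an RF with valid parameters."""
    depth, trees, feats, split, leaf = position
    return RandomForestRegressor(
        max_depth=int(round(depth)), n_estimators=int(round(trees)),
        max_features=float(feats), min_samples_split=int(round(split)),
        min_samples_leaf=int(round(leaf)), random_state=seed)

def fitness(position, X, y):
    """Assumed fitness: 3-fold cross-validated RMSE (lower is better)."""
    scores = cross_val_score(build_model(position), X, y, cv=3,
                             scoring="neg_root_mean_squared_error")
    return -scores.mean()

def pso(X, y, n_particles=6, n_iter=10, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Global-best PSO with inertia weight; returns (best position, best RMSE)."""
    rng = np.random.default_rng(seed)
    low, high = BOUNDS[:, 0], BOUNDS[:, 1]
    pos = rng.uniform(low, high, size=(n_particles, len(BOUNDS)))
    vel = np.zeros_like(pos)
    best_pos = pos.copy()
    best_cost = np.array([fitness(p, X, y) for p in pos])
    g = int(np.argmin(best_cost))
    gbest_pos, gbest_cost = best_pos[g].copy(), best_cost[g]
    for _ in range(n_iter):
        r1, r2 = rng.random(pos.shape), rng.random(pos.shape)
        # Each update depends on the bests found so far, hence the sequential loop.
        vel = w * vel + c1 * r1 * (best_pos - pos) + c2 * r2 * (gbest_pos - pos)
        pos = np.clip(pos + vel, low, high)
        cost = np.array([fitness(p, X, y) for p in pos])
        improved = cost < best_cost
        best_pos[improved], best_cost[improved] = pos[improved], cost[improved]
        g = int(np.argmin(best_cost))
        if best_cost[g] < gbest_cost:
            gbest_pos, gbest_cost = best_pos[g].copy(), best_cost[g]
    return gbest_pos, gbest_cost

# Synthetic data and a deliberately tiny swarm so the example runs quickly.
rng = np.random.default_rng(1)
X = rng.random((120, 3))
y = 5 * X[:, 0] + X[:, 1] + rng.normal(0, 0.05, 120)
best_position, best_rmse = pso(X, y, n_particles=3, n_iter=2)
```

Every fitness evaluation trains several forests, and the particles within one iteration can be evaluated in parallel while the iterations themselves cannot. This is consistent with the runtime growing as more parameters are searched.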

3.4. Analysis of Prediction Results

To validate the feasibility of the RF model in LIB SOH prediction, the DT model, CNN model, LSTM model, and Class RF model were employed to predict the SOH of three different LIBs. The comparative results of the prediction are shown in Figure 8. From Figure 8, it can be observed that among the aforementioned traditional machine learning models, the SOH predictions of the Class RF model are consistently the closest to the true values for the three different LIB experimental datasets.

To validate the effectiveness of the RF improved model proposed in this paper for LIB SOH prediction, RF improved model 1 and RF improved model 2 were further employed to predict the SOH of three different LIBs. These predictions were compared with the Class RF model. The comparative results of the different RF models are shown in Figure 9. From Figure 9, it can be observed that among the different RF models, the RF improved models with optimized parameters provide SOH predictions that are closer to the true values compared to the Class RF model. Additionally, RF improved model 2, which optimizes five parameters, consistently provides the prediction closest to the true values.

To further analyze the relationship between the predicted and true values of the different models, a simple linear analysis was conducted on the predictions of the six models mentioned above, as shown in Table 7.

In this linear analysis, a slope closer to 1 and an intercept closer to 0 indicate higher prediction accuracy. Table 7 therefore shows that, for all three LIB SOH prediction results, the Class RF model consistently exhibits higher prediction accuracy than the other three traditional machine learning models. Among the different RF models, RF improved model 2 consistently exhibits the highest prediction accuracy. To further analyze the models, the evaluation indices of the six prediction models are listed in Table 8.
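The simple linear analysis can be reproduced with an ordinary least-squares line fit. The sketch below assumes that x is the true SOH and y the predicted SOH, which the text does not state explicitly, and uses synthetic values constructed to follow the first equation in Table 7 rather than the paper's predictions.

```python
import numpy as np

def fit_line(true_soh, predicted_soh):
    """Least-squares fit of predicted = slope * true + intercept."""
    slope, intercept = np.polyfit(true_soh, predicted_soh, deg=1)
    return slope, intercept

# Synthetic example: predictions that follow y = 0.994x + 0.373 exactly.
true_soh = np.linspace(80.0, 95.0, 30)
predicted_soh = 0.994 * true_soh + 0.373
slope, intercept = fit_line(true_soh, predicted_soh)
```

A perfect predictor would give a slope of 1 and an intercept of 0, so the distance of the fitted pair from (1, 0) gives a quick summary of systematic bias.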

Smaller RMSE and MAPE values indicate smaller model errors, and a larger R² value indicates a better model fit. Table 8 shows that the evaluation indices vary across the LIB charge–discharge cycle experimental datasets, which leads to different rankings of the model prediction results. However, among the four traditional machine learning models, namely the DT model, CNN model, LSTM model, and Class RF model, the Class RF model consistently exhibits the best evaluation indices. Furthermore, the evaluation indices of the two RF improved models are superior to those of the Class RF model. In the SOH prediction results for #1, #6, and #26 LIB, RF improved model 1 reduced RMSE by 19.82%, 17.07%, and 17.61%, reduced MAPE by 19.94%, 17.39%, and 17.59%, and increased R² by 1.24%, 0.82%, and 1.14%, respectively, compared to the Class RF model. Compared to RF improved model 1, RF improved model 2 reduced RMSE by a further 21.93%, 15.90%, and 24.00%, reduced MAPE by 23.24%, 18.21%, and 24.25%, and increased R² by 0.92%, 0.61%, and 0.92%, respectively. These results demonstrate that RF improved model 2 has the best evaluation indices.
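The three evaluation indices and the relative changes quoted above can be computed as in the sketch below. The standard textbook definitions of RMSE, MAPE, and R² are assumed here, since the paper's own formulas are not part of this section, and the function names are illustrative. The last line recomputes the first RMSE reduction from the values in Table 8.

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean squared error."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

def mape(y_true, y_pred):
    """Mean absolute percentage error, in percent."""
    return float(np.mean(np.abs((y_true - y_pred) / y_true)) * 100)

def r2(y_true, y_pred):
    """Coefficient of determination."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return float(1 - ss_res / ss_tot)

def relative_change(old, new):
    """Percentage change from old to new; negative values are reductions."""
    return 100.0 * (new - old) / old

# Toy values for illustration only.
y_true = np.array([100.0, 90.0, 80.0])
y_pred = np.array([98.0, 91.0, 80.0])
toy_metrics = (rmse(y_true, y_pred), mape(y_true, y_pred), r2(y_true, y_pred))

# RMSE for #1 LIB in Table 8: Class RF model 1.342, RF improved model 1 1.076.
rmse_reduction = -relative_change(1.342, 1.076)
```

The R² increases are likewise relative changes: for #1 LIB, going from 0.966 to 0.978 is an increase of about 1.24% of the original value, not 1.24 percentage points.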

By comparing the SOH prediction results of various models across multiple sets of data, as shown in Figure 8 and Figure 9, as well as in Table 7 and Table 8, a comprehensive ranking of the prediction results for each model under three different LIB experimental datasets is presented in Table 9. From Table 9, it can be observed that the comprehensive ranking of prediction results varies with different LIB experimental data. However, among the four traditional machine learning models, the Class RF model is consistently better than the DT model, CNN model, and LSTM model, affirming the feasibility of using the RF model for LIB SOH prediction in this study. Additionally, among the three RF models, the RF improved model is consistently better than the Class RF model, with RF improved model 2 consistently being the best. This validates the effectiveness of the proposed RF improved model for LIB SOH prediction in this study.

4. Discussion

We conducted charge–discharge cycle experiments on multiple 18650 LiFePO₄ batteries, collecting several sets of LIB charge–discharge cycle experimental data. Through the selection and improvement of the RF model, research on LIB SOH prediction was carried out. To verify the feasibility of selecting the RF model, we constructed four traditional machine learning models, namely the DT model, CNN model, LSTM model, and RF model, using Python programs. We performed SOH prediction and comparative analysis on three LIBs with relatively large datasets. The results showed that different LIB datasets produced different model prediction results and therefore different rankings of the models. However, among these four traditional machine learning models, the predictions of the Class RF model consistently outperformed those of the other three models, regardless of which LIB charge–discharge cycle experimental dataset was used. This finding supports the viewpoints presented in References [32,33,34], which indicate that the RF model tends to exhibit higher prediction accuracy than the DT model, CNN model, and LSTM model in certain application scenarios. These results demonstrate the feasibility of using the RF model for LIB SOH prediction in this study.

To enhance the prediction accuracy of the model, we further proposed the RF improved model. To verify the effectiveness of the RF improved model, we constructed RF improved model 1 and RF improved model 2 for SOH prediction on multiple LIBs, comparing them with the Class RF model. In the SOH prediction results for #1, #6, and #26 LIB, RF improved model 1 achieved a reduction in RMSE of 19.82%, 17.07%, and 17.61% compared to the Class RF model. It also achieved decreases in MAPE of 19.94%, 17.39%, and 17.59%, along with increases in R² of 1.24%, 0.82%, and 1.14%. This improvement can be attributed to the fact that parameter tuning for the Class RF model was based on subjective human judgment without adaptive optimization. In contrast, RF improved model 1 used a method inspired by Reference [40] for adaptive optimization of max_depth, n_estimators, and max_features. This confirmed that using the PSO algorithm, as discussed in References [40,41], to optimize parameters of the RF model is indeed beneficial for improving prediction accuracy.

Building upon this, RF improved model 2, proposed in this study, adaptively optimized max_depth, n_estimators, max_features, min_samples_split, and min_samples_leaf. In the SOH prediction results for #1, #6, and #26 LIB, RF improved model 2 achieved a reduction in RMSE of 21.93%, 15.90%, and 24.00% compared to RF improved model 1. It also achieved decreases in MAPE of 23.24%, 18.21%, and 24.25%, along with increases in R² of 0.92%, 0.61%, and 0.92%. These results indicate that the evaluation indices of RF improved model 2 are superior to those of RF improved model 1. In conclusion, RF improved model 2, which uses the PSO algorithm to adaptively optimize five parameters, has proven effective for LIB SOH prediction and can further improve the accuracy of SOH prediction for LIBs in new-energy electric vehicles.

5. Conclusions

In line with the development trend toward green transportation, we studied SOH prediction methods for the LIBs of new-energy electric vehicles. The study accomplishes the following three objectives. First, multiple sets of LIB experimental data were collected through charge–discharge cycle experiments on 18650 LiFePO₄ batteries, providing foundational data for LIB SOH prediction. Second, SOH prediction for multiple LIBs was analyzed using four traditional machine learning models. The results indicate that the Class RF model yields relatively superior prediction results, confirming the feasibility of employing the RF model for LIB SOH prediction in this study. Third, the RF model was further optimized using the PSO algorithm, and the resulting LIB SOH predictions were analyzed. The results reveal that the RF improved model with five adaptively optimized parameters provides the best SOH prediction, validating the effectiveness of the RF improved model introduced in this paper for LIB SOH prediction.

Conducting research on SOH prediction for LIB is beneficial for timely assessment of the lifespan of new-energy electric vehicle LIBs. Simultaneously, it provides valuable insights into the lifespan analysis of the battery in various types of electric vehicles. This research contributes to addressing the issues of reliability and sustainability of power in new-energy electric vehicle batteries, ensuring operational efficiency and the promotional value of new-energy electric vehicles. Consequently, it promotes the rapid development of the new-energy electric vehicle and battery industries, serving a practical purpose in advancing green transportation based on new-energy electric vehicles and playing a significant role in reducing environmental pollution.

Because data for other types of LIB were not available, some uncertainty remains in applying the methods proposed in this paper to SOH prediction for other types of LIB. However, the modeling process for SOH prediction in different types of electric vehicle batteries follows a similar framework, with variations mainly in data collection and preprocessing. Future research will consider the acquisition and processing of data from various types of LIB. Furthermore, since the performance of LIB SOH prediction may vary under different operating conditions and environments, we will continue to explore ways to enhance the model’s generalization capability, making it more versatile and reliable. We also plan to expand its applicability to other scenarios, such as battery management systems for large-scale electric vehicles and the control and management of energy storage systems.

Author Contributions

Conceptualization, Z.L. and R.W.; methodology, Z.L.; software, X.Z.; validation, Z.L., R.W. and X.Z.; formal analysis, R.W. and Y.L.; investigation, Z.L.; resources, Z.L.; data curation, Y.L. and X.Z.; writing—original draft preparation, R.W.; writing—review and editing, Z.L.; visualization, Y.X.; supervision, Z.L.; project administration, Y.X.; funding acquisition, Z.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the University Natural Sciences Research Project of Anhui Province (grant no. 2023AH040306), the General Project of Anhui Natural Science Foundation (grant no. 2208085ME147), Anhui Province quality project (grant no. 2022xjzlts035) and Hefei University Postgraduate Cooperative Education Base Project (grant no. 2021Yjyxm07).

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Li, J. Charging Chinese Future: The Roadmap of China’s Policy for New Energy Automotive Industry. Int. J. Hydrogen Energy 2020, 45, 11409–11423.
2. Zhao, M.; Sun, T. Dynamic Spatial Spillover Effect of New Energy Vehicle Industry Policies on Carbon Emission of Transportation Sector in China. Energy Policy 2022, 165, 112991.
3. Jenn, A.; Azevedo, I.L.; Michalek, J.J. Alternative-Fuel-Vehicle Policy Interactions Increase U.S. Greenhouse Gas Emissions. Transp. Res. Part A Policy Pract. 2019, 124, 396–407.
4. Wang, T. The Current Situation and Prospect of Lithium Batteries for New Energy Vehicles. J. Phys. Conf. Ser. 2021, 2014, 012015.
5. Jiao, M.; Wang, Y.; Ye, C.; Wang, C.; Zhang, W.; Liang, C. High-Capacity SiO_x (0 ≤ x ≤ 2) as Promising Anode Materials for next-Generation Lithium-Ion Batteries. J. Alloys Compd. 2020, 842, 155774.
6. Attia, P.M.; Bills, A.; Brosa Planella, F.; Dechent, P.; Dos Reis, G.; Dubarry, M.; Gasper, P.; Gilchrist, R.; Greenbank, S.; Howey, D.; et al. Review—“Knees” in Lithium-Ion Battery Aging Trajectories. J. Electrochem. Soc. 2022, 169, 060517.
7. Wang, F.; Zhao, Z.; Zhai, Z.; Shang, Z.; Yan, R.; Chen, X. Explainability-Driven Model Improvement for SOH Estimation of Lithium-Ion Battery. Reliab. Eng. Syst. Saf. 2023, 232, 109046.
8. Wen, J.; Zou, Q.; Chen, C.; Wei, Y. Linear Correlation between State-of-Health and Incremental State-of-Charge in Li-Ion Batteries and Its Application to SoH Evaluation. Electrochim. Acta 2022, 434, 141300.
9. Faraji-Niri, M.; Rashid, M.; Sansom, J.; Sheikh, M.; Widanage, D.; Marco, J. Accelerated State of Health Estimation of Second Life Lithium-Ion Batteries via Electrochemical Impedance Spectroscopy Tests and Machine Learning Techniques. J. Energy Storage 2023, 58, 106295.
10. Zhang, H.; Deng, C.; Zong, Y.; Zuo, Q.; Guo, H.; Song, S.; Jiang, L. Effect of Sample Interval on the Parameter Identification Results of RC Equivalent Circuit Models of Li-Ion Battery: An Investigation Based on HPPC Test Data. Batteries 2022, 9, 1.
11. Bensaad, Y.; Friedrichs, F.; Baumhöfer, T.; Eswein, M.; Bähr, J.; Fill, A.; Birke, K.P. Embedded Real-Time Fractional-Order Equivalent Circuit Model for Internal Resistance Estimation of Lithium-Ion Cells. J. Energy Storage 2023, 67, 107516.
12. Li, Z.; Ma, S.; Wang, Y.; Zhu, B.; Zhang, H.; Zhang, M.; Pan, Y.; Yu, K. Research and Analysis of Equivalent Circuit Model for Core Snubber. J. Phys. Conf. Ser. 2023, 2452, 012031.
13. Li, Y.; Herrera-Viedma, E.; Kou, G.; Morente-Molinera, J.A. Z-Number-Valued Rule-Based Decision Trees. Inf. Sci. 2023, 643, 119252.
14. Hao, H.; Fuzhou, F.; Junzhen, Z.; Xun, Z.; Pengcheng, J.; Feng, J.; Jun, X.; Yazhi, L.; Guanghui, S. Research on Fault Diagnosis Method Based on Improved CNN. Shock Vib. 2022, 2022, 9312905.
15. Li, B.; Li, R.; Sun, T.; Gong, A.; Tian, F.; Khan, M.Y.A.; Ni, G. Improving LSTM Hydrological Modeling with Spatiotemporal Deep Learning and Multi-Task Learning: A Case Study of Three Mountainous Areas on the Tibetan Plateau. J. Hydrol. 2023, 620, 129401.
16. Yang, N.; Song, Z.; Hofmann, H.; Sun, J. Robust State of Health Estimation of Lithium-Ion Batteries Using Convolutional Neural Network and Random Forest. J. Energy Storage 2022, 48, 103857.
17. Liu, C.; Lin, B.; Lai, J.; Miao, D. An Improved Decision Tree Algorithm Based on Variable Precision Neighborhood Similarity. Inf. Sci. 2022, 615, 152–166.
18. Zhang, Z.; Li, L.; Li, X.; Hu, Y.; Huang, K.; Xue, B.; Wang, Y.; Yu, Y. State-of-Health Estimation for the Lithium-Ion Battery Based on Gradient Boosting Decision Tree with Autonomous Selection of Excellent Features. Int. J. Energy Res. 2022, 46, 1756–1765.
19. Salinas, F.; Kowal, J. Classifying Aged Li-Ion Cells from Notebook Batteries. Sustainability 2020, 12, 3620.
20. Wang, P.; Fan, J.; Ou, Y.; Li, Z.; Wang, Y.; Deng, B.; Zhang, Y.; Gao, Z. A comparative study of machine learning based modeling methods for Lithium-ion battery. In Proceedings of the 2020 6th International Conference on Advances in Energy, Environment and Chemical Engineering, Chongqing, China, 20–22 November 2020; pp. 659–668.
21. Wang, Y.; Wang, H. Wavelet Attention-Powered Neural Network Framework with Hierarchical Dynamic Frequency Learning for Lithium-Ion Battery State of Health Prediction. J. Energy Storage 2023, 61, 106697.
22. Din, N.U.; Zhang, L.; Yang, Y. Automated Battery Making Fault Classification Using Over-Sampled Image Data CNN Features. Sensors 2023, 23, 1927.
23. Ruan, H.; Wei, Z.; Shang, W.; Wang, X.; He, H. Artificial Intelligence-Based Health Diagnostic of Lithium-Ion Battery Leveraging Transient Stage of Constant Current and Constant Voltage Charging. Appl. Energy 2023, 336, 120751.
24. Li, R.; Xiong, B.; Zhang, S.; Zhang, X.; Li, Y.; Iu, H.; Fernando, T. A Novel One Dimensional Convolutional Neural Network Based Data-Driven Vanadium Redox Flow Battery Modelling Algorithm. J. Energy Storage 2023, 61, 106767.
25. Nguyen Van, C.; Quang, D.T. Estimation of SoH and Internal Resistances of Lithium Ion Battery Based on LSTM Network. Int. J. Electrochem. Sci. 2023, 18, 100166.
26. Li, Z.; Liao, C.; Zhang, C.; Wang, L.; Li, Y.; Wang, L. State-of-Charge Estimation of Lithium-Rich Manganese-Based Batteries Based on WOA LSTM and Extended Kalman Filter. J. Electrochem. Soc. 2023, 170, 050540.
27. Yang, J.; Zou, L.; Wei, Y.; Yuan, P.; Zhou, C. Health Status Prediction of Lithium Battery Based on LSTM Model with Optimization Algorithms. J. Phys. Conf. Ser. 2023, 2473, 012020.
28. Zhang, L.; Ji, T.; Yu, S.; Liu, G. Accurate Prediction Approach of SOH for Lithium-Ion Batteries Based on LSTM Method. Batteries 2023, 9, 177.
29. Xiang, H.; Xi, Y.; Mao, D.; Mahdianpari, M.; Zhang, J.; Wang, M.; Jia, M.; Yu, F.; Wang, Z. Mapping Potential Wetlands by a New Framework Method Using Random Forest Algorithm and Big Earth Data: A Case Study in China’s Yangtze River Basin. Glob. Ecol. Conserv. 2023, 42, e02397.
30. Zhang, C.; Wang, W.; Liu, L.; Ren, J.; Wang, L. Three-Branch Random Forest Intrusion Detection Model. Mathematics 2022, 10, 4460.
31. Endzhievskaya, I.G.; Endzhievskiy, A.S.; Galkin, M.A.; Molokeev, M.S. Machine Learning Methods in Assessing the Effect of Mixture Composition on the Physical and Mechanical Characteristics of Road Concrete. J. Build. Eng. 2023, 76, 107248.
32. Ribeiro, A.M.N.C.; Do Carmo, P.R.X.; Endo, P.T.; Rosati, P.; Lynn, T. Short- and Very Short-Term Firm-Level Load Forecasting for Warehouses: A Comparison of Machine Learning and Deep Learning Models. Energies 2022, 15, 750.
33. Wang, R. Comparison of Decision Tree, Random Forest and Linear Discriminant Analysis Models in Breast Cancer Prediction. J. Phys. Conf. Ser. 2022, 2386, 012043.
34. Moghadam, S.M.; Yeung, T.; Choisne, J. A Comparison of Machine Learning Models’ Accuracy in Predicting Lower-Limb Joints’ Kinematics, Kinetics, and Muscle Forces from Wearable Sensors. Sci. Rep. 2023, 13, 5046.
35. Lin, C.; Xu, J.; Shi, M.; Mei, X. Constant Current Charging Time Based Fast State-of-Health Estimation for Lithium-Ion Batteries. Energy 2022, 247, 123556.
36. Lin, M.; Wu, D.; Meng, J.; Wu, J.; Wu, H. A Multi-Feature-Based Multi-Model Fusion Method for State of Health Estimation of Lithium-Ion Batteries. J. Power Sources 2022, 518, 230774.
37. Tang, Q.; Zhang, L.; Lan, G.; Shi, X.; Duanmu, X.; Chen, K. A Classification Method of Point Clouds of Transmission Line Corridor Based on Improved Random Forest and Multi-Scale Features. Sensors 2023, 23, 1320.
38. Xu, J.; Feng, Z.; Tang, J.; Liu, S.; Ding, Z.; Lyu, J.; Yao, Q.; Yang, B. Improved Random Forest for the Automatic Identification of Spodoptera Frugiperda Larval Instar Stages. Agriculture 2022, 12, 1919.
39. Balyan, A.K.; Ahuja, S.; Lilhore, U.K.; Sharma, S.K.; Manoharan, P.; Algarni, A.D.; Elmannai, H.; Raahemifar, K. A Hybrid Intrusion Detection Model Using EGA-PSO and Improved Random Forest Method. Sensors 2022, 22, 5986.
40. Luo, W.; Yuan, D.; Jin, D.; Lu, P.; Chen, J. Optimal Control of Slurry Pressure during Shield Tunnelling Based on Random Forest and Particle Swarm Optimization. Comput. Model. Eng. Sci. 2021, 128, 109–127.
41. Xiong, F.; Cao, C.; Tang, M.; Wang, Z.; Tang, J.; Yi, J. Fault Detection of UHV Converter Valve Based on Optimized Cost-Sensitive Extreme Random Forest. Energies 2022, 15, 8059.
42. Gao, Z.; Xia, R.; Zhang, P. Prediction of Anti-proliferation Effect of [1,2,3] Triazolo [4,5-d] pyrimidine Derivatives by Random Forest and Mix-Kernel Function SVM with PSO. Chem. Pharm. Bull. 2022, 70, 684–693.
43. Su, Y. Comparative Analysis of Lithium Iron Phosphate Battery and Ternary Lithium Battery. J. Phys. Conf. Ser. 2022, 2152, 012056.
44. Izonin, I.; Tkachenko, R.; Shakhovska, N.; Ilchyshyn, B.; Singh, K.K. A Two-Step Data Normalization Approach for Improving Classification Accuracy in the Medical Diagnosis Domain. Mathematics 2022, 10, 1942.
45. Yan, L.; Lei, Q.; Jiang, C.; Yan, P.; Ren, Z.; Liu, B.; Liu, Z. Climate-Informed Monthly Runoff Prediction Model Using Machine Learning and Feature Importance Analysis. Front. Environ. Sci. 2022, 10, 1049840.
46. Zhang, Y. Application of Particle Swarm Algorithm in Nanoscale Damage Detection and Identification of Steel Structure. Int. J. Anal. Chem. 2022, 2022, 4300840.
47. Kumar, K.; Singh, V.; Roshni, T. Application of the PSO–Neural Network in Rainfall–Runoff Modeling. Water Pract. Technol. 2022, 18, 16–26.

Figure 1. Basic flow chart of RF improved model.

Figure 2. Basic flow chart of the PSO algorithm for optimizing the RF model parameters.

Figure 3. Battery tester equipment.

Figure 4. The data collection interface of the BTSDA software.

Figure 5. Discharge capacity of #1 LIB over 985 cycles.

Figure 6. The key code for feature importance ranking.

Figure 7. The key code for PSO algorithm parameter optimization.

Figure 8. Comparison chart of SOH prediction for the four traditional machine learning models. (a) #1 LIB. (b) #6 LIB. (c) #26 LIB.

Figure 9. Comparison chart of SOH prediction for different RF models. (a) #1 LIB. (b) #6 LIB. (c) #26 LIB.

Table 1. Part of the original experimental data.

Cycle Index	Charge Capacity (Ah)	Discharge Capacity (Ah)	Charge Discharge Efficiency (%)	Constant Current Charging Ratio (%)	Constant Current Charging Capacity (Ah)	Mean Voltage (V)	Charge Time	Discharge Time
200	2.3214	2.3217	100.01	80.81	1.8758	3.4619	01:13:42	00:55:41
201	2.3208	2.3205	99.99	80.83	1.8758	3.4616	01:13:39	00:55:39
202	2.3198	2.3209	100.05	80.86	1.8758	3.4607	01:13:41	00:55:40
203	2.32	2.3186	99.94	80.86	1.8758	3.4613	01:13:44	00:55:37
204	2.318	1.4162	61.1	45.25	0.6408	1.3564	01:13:43	00:21:24
205	2.317	2.3171	100	80.96	1.8758	3.4607	01:13:35	00:55:34
206	2.3162	2.3176	100.06	80.99	1.8758	3.4604	01:13:33	00:55:35
207	2.3171	2.3187	100.07	80.96	1.8758	3.4601	01:13:43	00:55:37
208	2.3183	2.3187	100.02	80.92	1.8758	3.4601	01:13:37	00:55:37
……	……	……	……	……	……	……	……	……

Table 2. Partial data after abnormal data processing.

Cycle Index	Charge Capacity (Ah)	Discharge Capacity (Ah)	Charge Discharge Efficiency (%)	Constant Current Charging Ratio (%)	Constant Current Charging Capacity (Ah)	Mean Voltage (V)	Charge Time	Discharge Time
200	2.3214	2.3217	100.01	80.81	1.8758	3.4619	01:13:42	00:55:41
201	2.3208	2.3205	99.99	80.83	1.8758	3.4616	01:13:39	00:55:39
202	2.3198	2.3209	100.05	80.86	1.8758	3.4607	01:13:41	00:55:40
203	2.32	2.3186	99.94	80.86	1.8758	3.4613	01:13:44	00:55:37
204	2.318	2.32	99.97	80.91	1.88	3.46	01:13:43	00:55:36
205	2.317	2.3171	100	80.96	1.8758	3.4607	01:13:35	00:55:34
206	2.3162	2.3176	100.06	80.99	1.8758	3.4604	01:13:33	00:55:35
207	2.3171	2.3187	100.07	80.96	1.8758	3.4601	01:13:43	00:55:37
208	2.3183	2.3187	100.02	80.92	1.8758	3.4601	01:13:37	00:55:37
……	……	……	……	……	……	……	……	……

Table 3. Partial data after standardized processing.

Cycle Index	Charge Capacity (Ah)	Discharge Capacity (Ah)	Charge Discharge Efficiency (%)	Constant Current Charging Ratio (%)	Constant Current Charging Capacity (Ah)	Mean Voltage (V)	Charge Time	Discharge Time
200	0.6428	0.6434	0.525	0.81	0.6479	0.619	0.568	0.6365
201	0.6416	0.641	0.475	0.83	0.6479	0.616	0.556	0.6341
202	0.6396	0.6418	0.625	0.86	0.6479	0.607	0.564	0.6353
203	0.64	0.6372	0.35	0.86	0.6479	0.613	0.576	0.6318
204	0.636	0.64	0.425	0.91	0.6552	0.6	0.572	0.6306
205	0.634	0.6342	0.5	0.96	0.6479	0.607	0.54	0.6282
206	0.6324	0.6352	0.65	0.99	0.6479	0.604	0.532	0.6294
207	0.6342	0.6374	0.675	0.96	0.6479	0.601	0.572	0.6318
208	0.6366	0.6374	0.55	0.92	0.6479	0.601	0.548	0.6318
……	……	……	……	……	……	……	……	……

Table 4. Initial feature selection and feature-importance ranking based on #1 LIB experimental data.

Feature	Importance
discharge capacity (Ah)	0.238030
cycle index	0.193353
discharge energy (Wh)	0.163842
charge capacity (Ah)	0.148204
discharge time	0.135424
charge energy (Wh)	0.104562
net discharge energy (Wh)	0.010945
mean voltage (V)	0.003314
constant current charge capacity (Ah)	0.001290
constant current charge ratio (%)	0.000938
charge time	0.000053
net discharge capacity (Ah)	0.000025
charge–discharge efficiency (%)	0.000020

Table 5. Final feature selection and feature-importance ranking based on #1 LIB experimental data.

Feature	Importance
cycle index	0.226575
discharge time	0.202087
discharge capacity (Ah)	0.164712
charge capacity (Ah)	0.139227
discharge energy (Wh)	0.131437
charge energy (Wh)	0.101534
net discharge energy (Wh)	0.034428

Table 6. Parameter values for different RF models.

LIB	Parameter	Class RF Model	RF Improved Model 1	RF Improved Model 2
#1	max_depth	13.14	9.35	18.47
	n_estimators	100	64.06	98.51
	max_features	10	1	1
	min_samples_split	2	2	2.97
	min_samples_leaf	1	1	2.31
#6	max_depth	12.71	12.26	11.64
	n_estimators	100	11.07	8.08
	max_features	10	0.94	0.51
	min_samples_split	2	2	2
	min_samples_leaf	1	1	1
#26	max_depth	11.69	10.02	8.28
	n_estimators	100	88.84	124.51
	max_features	10	1	1
	min_samples_split	2	2	4.02
	min_samples_leaf	1	1	2.48

Table 7. Simple linear analysis of LIB SOH prediction from different models.

LIB	Ranking	Model	Simple Linear Equation
#1	1	RF improved model 2	y = 0.994x + 0.373
	2	RF improved model 1	y = 0.987x + 0.947
	3	Class RF model	y = 1.021x − 1.246
	4	LSTM model	y = 1.029x − 1.860
	5	CNN model	y = 1.036x − 2.595
	6	DT model	y = 0.956x + 4.098
#6	1	RF improved model 2	y = 0.995x − 0.407
	2	RF improved model 1	y = 0.995x + 1.454
	3	Class RF model	y = 1.015x − 2.532
	4	LSTM model	y = 1.062x − 4.144
	5	DT model	y = 1.089x − 6.526
	6	CNN model	y = 1.100x − 9.996
#26	1	RF improved model 2	y = 1.009x − 0.683
	2	RF improved model 1	y = 0.980x + 2.495
	3	Class RF model	y = 0.977x + 2.832
	4	CNN model	y = 0.948x + 3.863
	5	LSTM model	y = 1.054x − 4.135
	6	DT model	y = 1.067x − 5.104

Table 8. Comparison of evaluation indices for different models.

LIB	Ranking	Model	RMSE	MAPE	R²
#1	1	RF improved model 2	0.840	0.968	0.987
	2	RF improved model 1	1.076	1.261	0.978
	3	Class RF model	1.342	1.575	0.966
	4	LSTM model	1.468	1.695	0.959
	5	CNN model	1.700	2.003	0.946
	6	DT model	1.913	2.263	0.931
#6	1	RF improved model 2	0.846	0.952	0.986
	2	RF improved model 1	1.006	1.164	0.980
	3	Class RF model	1.213	1.409	0.972
	4	LSTM model	1.435	1.660	0.960
	5	DT model	1.609	1.864	0.950
	6	CNN model	1.806	2.091	0.937
#26	1	RF improved model 2	0.608	0.653	0.988
	2	RF improved model 1	0.800	0.862	0.979
	3	Class RF model	0.971	1.046	0.968
	4	CNN model	1.138	1.229	0.957
	5	LSTM model	1.312	1.417	0.942
	6	DT model	1.441	1.555	0.930

Table 9. The comprehensive ranking of prediction results for each model.

Ranking	#1	#6	#26
1	RF improved model 2	RF improved model 2	RF improved model 2
2	RF improved model 1	RF improved model 1	RF improved model 1
3	Class RF model	Class RF model	Class RF model
4	LSTM model	LSTM model	CNN model
5	CNN model	DT model	LSTM model
6	DT model	CNN model	DT model
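
In Table 8, RMSE, MAPE, and R² order the six models identically for each LIB, so sorting on any one of them gives the order shown in Table 9. The sketch below sorts by RMSE. This section does not state how the comprehensive ranking was actually derived, so the sorting rule is an assumption, and the names `rmse_table` and `rank_by_rmse` are hypothetical. The RMSE values are copied from Table 8.

```python
# RMSE values copied from Table 8, keyed by LIB and model.
rmse_table = {
    "#1": {
        "RF improved model 2": 0.840,
        "RF improved model 1": 1.076,
        "Class RF model": 1.342,
        "LSTM model": 1.468,
        "CNN model": 1.700,
        "DT model": 1.913,
    },
    "#6": {
        "RF improved model 2": 0.846,
        "RF improved model 1": 1.006,
        "Class RF model": 1.213,
        "LSTM model": 1.435,
        "DT model": 1.609,
        "CNN model": 1.806,
    },
    "#26": {
        "RF improved model 2": 0.608,
        "RF improved model 1": 0.800,
        "Class RF model": 0.971,
        "CNN model": 1.138,
        "LSTM model": 1.312,
        "DT model": 1.441,
    },
}


def rank_by_rmse(scores):
    """Return model names ordered from lowest to highest RMSE."""
    return sorted(scores, key=scores.get)


ranking = {lib: rank_by_rmse(scores) for lib, scores in rmse_table.items()}

for lib, models in ranking.items():
    print(lib, models)
```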

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

