Generative Adversarial Network-Based Voltage Fault Diagnosis for Electric Vehicles under Unbalanced Data

Fang, Weidong; Guo, Yihan; Zhang, Ji

doi:10.3390/electronics13163131


by Weidong Fang ^1,2, Yihan Guo ^1,2,* and Ji Zhang ^1,2

¹ School of Electrical, Electronics and Physics, Fujian University of Technology, Fuzhou 350001, China

² Fujian Key Laboratory of Automotive Electronics and Electric Drive Technology, Fuzhou 350001, China

* Author to whom correspondence should be addressed.

Electronics 2024, 13(16), 3131; https://doi.org/10.3390/electronics13163131

Submission received: 4 July 2024 / Revised: 23 July 2024 / Accepted: 1 August 2024 / Published: 7 August 2024

(This article belongs to the Topic Artificial Intelligence Models, Tools and Applications)


Abstract:

Research on fault diagnosis for electric vehicle power batteries is increasingly turning to machine learning methods. However, during operation, faults occur for only a small fraction of the driving time, so fault data make up a very small proportion of the collected data and show little variety in their fault characteristics. This has hindered research progress in the field. To address this problem, this paper proposes a data enhancement method using Least Squares Generative Adversarial Networks (LSGAN). The method trains LSGAN models on the original power battery fault dataset to generate diverse sample data representing various fault states. The augmented dataset is then used to develop a fault diagnosis framework called LSGAN-RF-GWO, which combines a random forest (RF) model with the Gray Wolf Optimization (GWO) algorithm for effective fault diagnosis. The performance of the framework is evaluated on the original and enhanced datasets and compared with other commonly used models such as Support Vector Machine (SVM), Gradient Boosting Machine (GBM), and Naïve Bayes (NB). The results show that the proposed fault diagnosis scheme improves the evaluation metrics and the diagnostic accuracy, indicating that the LSGAN-RF-GWO framework can use limited data resources to diagnose power battery faults effectively.

Keywords:

electric vehicle; power battery; least squares generative adversarial network; data enhancements; uneven sample; fault diagnosis

1. Introduction

Promoting the development of new energy vehicles has become a top priority for addressing climate change and reducing dependence on fossil fuels. Their wider adoption supports the sustainable development of society, and the automotive power battery system, as a core component of new energy vehicles, plays a crucial role in overall vehicle performance. Among the various existing battery technologies, lithium-ion batteries are favored for electric vehicles because of their high energy density, low self-discharge rate, and long service life. Battery pack voltage is a key parameter of power batteries [1,2]; any abnormal voltage fluctuation may lead to overheating and, in extreme cases, even fire or explosion. Timely and accurate fault diagnosis methods for electric vehicle power batteries can improve the overall safety of electric vehicles, increase the public’s trust in them [3], and thus promote the transition from traditional transportation to green, low-carbon transportation.

Researchers are actively investigating risk assessment and early warning technologies to enhance the operational safety of electric vehicles. These studies focus on areas such as energy consumption, management strategies, and fault detection in battery power systems [4]. The primary objective is to swiftly and accurately identify faults during vehicle operation, which is crucial for preventing battery system issues and establishing real-time vehicle warning systems. These research efforts are vital for enhancing the safety of electric vehicles. Therefore, the development of diagnostic techniques capable of rapidly and precisely detecting battery pack failures in electric vehicles is essential to improve safety standards and bolster consumer confidence in this eco-friendly, low-carbon mode of transportation [5].

The anomaly detection process of a battery system covers four key aspects: fault identification, fault isolation, fault estimation, and fault tolerance measures [6]. The flowchart of the lithium-ion battery system fault diagnosis is shown in Figure 1. In the event of a battery problem, the data generated during its operation are collected through sensors and carry a fault signal. Based on the collected data and predefined diagnostic programs, it is possible to assess the operational status of the battery system and to detect and isolate potential sources of failure. Nowadays, fault detection technology has evolved into an emerging cross-disciplinary technology by combining technologies from multiple fields such as computer networks, database technology, cybernetics, and artificial intelligence [7].

In the past few years, artificial intelligence and machine learning have developed rapidly, leading to extensive research on power battery fault diagnosis by scholars worldwide. Several notable studies have achieved significant results in this field. Liu [8] proposed a model-based sensor fault diagnosis and identification method designed for series-connected battery packs. This approach uses an adaptive extended Kalman filter to estimate the state of the battery cells. The difference between the estimated voltage and the measured voltage yields a residual signal that is used to detect fault conditions in the battery system. Sidhu [9] proposed an extended Kalman filtering method to generate the voltage residual signal, which is then used to assess the occurrence of faults. This technique proves effective in identifying and diagnosing faults. Chen [10] proposed a method for detecting voltage faults in lithium-ion batteries using the Local Outlier Factor (LOF). The LOF quantifies how strongly a parameter deviates from its neighboring parameters, and the faults are evaluated using an outlier filter based on the Grubbs criterion. Hong [11] developed a predictive model for assessing the safety performance of power batteries by combining climatic conditions, vehicle conditions, and operator data, and employed a long short-term memory (LSTM) neural network to predict the battery voltage of an electric vehicle. Li [12] proposed a fault detection and localization method that combines empirical mode decomposition and sample entropy. By analyzing voltage signals extracted from automotive power batteries, battery faults were successfully detected and localized.

Nowadays, fault diagnosis technology for electric vehicle power batteries mainly relies on constructing complex data models from a large amount of historical vehicle operation data. However, during vehicle operation, the time spent in normal driving is much longer than the time during which faults occur, so fault data make up too small a proportion of the collected data and the dataset is unbalanced. As a result, models built mostly on normal data may not be able to diagnose faults accurately in real situations. To ensure the validity and general applicability of fault diagnosis models, it is crucial to solve the problem of unbalanced data samples.

The problem of unbalanced datasets has been studied by a number of experts and scholars. Extracting features from unbalanced data is a key step in data-driven fault diagnosis methods, and common approaches to dealing with unbalanced data fall into two groups. The first group focuses on enhancing cost-sensitive learning algorithms to improve diagnostic accuracy for limited fault samples [13]. However, these cost-sensitive classification methods face two challenges: (1) determining the cost associated with misclassification is often difficult; and (2) evaluating the performance of cost-sensitive classifiers is also difficult. The second group focuses on data preprocessing techniques, such as oversampling and undersampling, to mitigate the imbalance problem [14]. However, traditional oversampling does not introduce new knowledge and may lead to overfitting, while undersampling reduces the number of samples in the majority class, which leads to a loss of information.

By summarizing and reflecting on the previous research and addressing the above issues, this paper proposes a generative adversarial network approach applied to power battery time series data, which includes a well-designed data preprocessing process and a novel fault diagnosis framework architecture. Overall, the contributions of this paper are as follows:

Analyzing the influencing factors that cause the undervoltage fault of a single cell of a power battery in a pure electric vehicle and setting the fault threshold based on a real vehicle driving dataset.
Proposing the use of the LSGAN framework (model) to address the problem of unbalanced samples and single fault characteristics in time series fault data.
Proposing the use of the Gray Wolf Optimization algorithm to find the optimal hyperparameters of the random forest and carrying out the fault diagnosis experiments on the original dataset and the extended dataset, respectively; thus, the accuracy of the obtained results is improved.
Validating the proposed LSGAN-RF-GWO method’s efficacy in voltage fault diagnosis of power batteries through comparative tests.

The overall technical structure of this study is shown in Figure 2.

The rest of this paper is organized as follows: Section 2 introduces the theory of the LSGAN data enhancement model. Section 3 analyzes the causes of power battery voltage faults, extracts the key factors, and sets the fault threshold; the processed dataset is then fed into the model for data enhancement, and the results are visualized and analyzed. Section 4 presents the proposed RF-GWO fault detection model and confirms the effectiveness of LSGAN and RF-GWO with experimental data. Finally, Section 5 concludes.

2. Theory of LSGAN Data Augmentation Model

In this section, the proposed method is described in detail. The general framework of the method is first analyzed and then the structure and parameters of the method are discussed.

2.1. Generative Adversarial Networks

In 2014, Goodfellow et al. [15] proposed Generative Adversarial Networks (GANs). GANs consist of two main components: the generative model and the discriminative model. The basic structure of the GAN model is depicted in Figure 3. The generative model’s objective is to learn and extract the real data distribution, enabling it to generate new samples that closely match that distribution. Conversely, the discriminative model acts as a binary classifier, assigning scores to input data. Data with higher scores are categorized as real data, while data with lower scores are identified as fake. During the training process of GANs, the parameters of one network (either the generator or the discriminator) are kept constant while the parameters of the other network are updated using backpropagation. This process can be visualized as a “binary minimax game”. In each training round, the generative and discriminative models alternate in optimizing their parameters. Eventually, the generative model becomes capable of estimating the distribution of the sample data and generating new data that closely resemble real data samples. The overall process is adversarial in nature. GANs, a type of deep learning model, have garnered significant attention and achieved remarkable success in numerous applications, including speech recognition and image recovery. Unlike traditional generative learning algorithms, GANs do not require a priori knowledge of the data distribution and exhibit strong fitting capabilities. Theoretically, GANs have the advantage of fitting the data distribution consistently with the original distribution. Therefore, GANs are widely used in scenarios involving small sample sizes and data imbalances, such as speech restoration and image recognition [16].

The training process of the GAN involves a competitive interaction between the generator G and the discriminator D. The GAN is optimized based on the Nash equilibrium concept [17]. Ultimately, a Nash equilibrium is achieved, where the generator G generates new samples that closely resemble the actual fault data. The objective function of the GAN optimization process can be expressed as follows:

\min_{G} \max_{D} L (G, D) = E_{x \sim p_{data} (x)} [\log D (x)] + E_{z \sim p_{z} (z)} [\log (1 - D (G (z)))]

(1)

where E is the mathematical expectation over the distribution specified in the subscript, p_{data} (x) is the distribution of the real data x, p_{z} (z) is the prior distribution of the input noise z from which the generator produces its samples, D (x) is the output of the discriminator for input x, and G (z) is the data generated by the generator.

2.2. LSGAN Theory

During the training of the original GAN on the experimental fault samples, which are time series of power battery cell undervoltage alarm signals, the generated samples and the real samples may exhibit a substantial disparity in their distributions. This disparity can cause mode collapse or vanishing gradients [18]. To solve this problem, Mao et al. [19] proposed the LSGAN model, which modifies the loss function to improve the quality of the generated samples and the stability of the training process. The objective function of the LSGAN model can be expressed as follows:

\begin{cases} \min_{D} L_{LSGAN} (D) = \frac{1}{2} E_{x \sim p_{data} (x)} [{(D (x) - b)}^{2}] + \frac{1}{2} E_{z \sim p_{z} (z)} [{(D (G (z)) - a)}^{2}] \\ \min_{G} L_{LSGAN} (G) = \frac{1}{2} E_{z \sim p_{z} (z)} [{(D (G (z)) - c)}^{2}] \end{cases}

(2)

where a and b denote the labels of the generated and real samples, respectively, and c denotes the value that the generator wants the discriminator D to assign to the generated samples.

The goal of LSGAN is to gradually bring the distribution of the generated samples close to that of the real samples by replacing the cross-entropy loss with a smoother least squares loss whose gradient does not saturate [20]. To achieve this goal, the parameter a is set to 0 and the parameters b and c are set to 1, so that the generated samples are pushed as close as possible to the real data distribution. This improves the feasibility and accuracy of the GAN model when it is applied to time series data enhancement. In addition, to prevent the generator from overfitting, a Dropout layer is added to the generator, and a least squares loss function is used in the discriminator. These measures help improve the quality of the generated data and, in turn, the accuracy of the model’s diagnostic results. Figure 4 illustrates the structure of the LSGAN network.
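The two losses in Equation (2) can be illustrated with a minimal sketch. It is written in plain Python rather than TensorFlow so that it runs without dependencies, it uses the label coding a = 0 and b = c = 1 stated above, and the discriminator scores are hypothetical values chosen for illustration, not outputs of the trained networks.

```python
def _mean(values):
    """Arithmetic mean of an iterable of numbers."""
    values = list(values)
    return sum(values) / len(values)


def lsgan_d_loss(d_real, d_fake, a=0.0, b=1.0):
    """Discriminator loss of Equation (2).

    d_real: discriminator scores D(x) on real samples.
    d_fake: discriminator scores D(G(z)) on generated samples.
    """
    real_term = _mean((s - b) ** 2 for s in d_real)
    fake_term = _mean((s - a) ** 2 for s in d_fake)
    return 0.5 * real_term + 0.5 * fake_term


def lsgan_g_loss(d_fake, c=1.0):
    """Generator loss of Equation (2): push D(G(z)) toward c."""
    return 0.5 * _mean((s - c) ** 2 for s in d_fake)


# Hypothetical discriminator scores for one small batch.
d_real_scores = [0.9, 0.8]
d_fake_scores = [0.2, 0.4]

print("discriminator loss:", lsgan_d_loss(d_real_scores, d_fake_scores))
print("generator loss:", lsgan_g_loss(d_fake_scores))
```

Because the penalty grows quadratically with the distance between a score and its target label, generated samples that the discriminator scores far from c still contribute a usable gradient to the generator.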

3. Experiment Study and Analysis

3.1. Power Battery Single Undervoltage Failure Analysis

The battery system of an electric vehicle mainly consists of lithium-ion batteries and a Battery Management System (BMS); the working process of the battery system is shown in Figure 5. Differences in the manufacture and use of the individual cells in the battery pack lead to poor consistency between cells. When the difference between individual cells exceeds a certain degree, the battery pack triggers a poor-consistency alarm, performance indicators such as capacity and energy density drop significantly, and the inconsistency may even shorten the life of the battery.

Consider a battery pack with four single cells, A1, A2, A3, and A4, with consistent voltages, and assume that the undervoltage fault threshold of a single cell is 2.83 V. In Figure 5a, no single-cell undervoltage fault occurs: the voltages of A1, A3, and A4 at the current moment are 3.3 V, and the voltage of A2 is 3.2 V. Even though the voltage of A2 is lower than that of the other cells, it is not below the fault threshold. In contrast, in Figure 5b, a single-cell undervoltage fault occurs: the voltages of A1, A3, and A4 are 3.3 V, while the voltage of A2 is 2.8 V. Since the voltage of A2 is lower than the fault threshold, a single-cell undervoltage fault is determined to have occurred at that moment.
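The comparison in this example can be written as a short check. The sketch below uses the voltages and the assumed 2.83 V threshold from the example; the function name and the dictionary layout are illustrative choices, not part of the BMS.

```python
UNDERVOLTAGE_THRESHOLD_V = 2.83  # assumed threshold from the example


def undervoltage_cells(cell_voltages, threshold=UNDERVOLTAGE_THRESHOLD_V):
    """Return the names of the cells whose voltage is below the threshold."""
    return [name for name, volts in cell_voltages.items() if volts < threshold]


# Cell voltages for the two cases described in the example.
normal_case = {"A1": 3.3, "A2": 3.2, "A3": 3.3, "A4": 3.3}
fault_case = {"A1": 3.3, "A2": 2.8, "A3": 3.3, "A4": 3.3}

print(undervoltage_cells(normal_case))  # []
print(undervoltage_cells(fault_case))   # ['A2']
```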

3.2. Electric Vehicle Data Analysis

The research data in this paper come from the real operating data of four pure electric vehicles provided by a car factory in Fujian Province that cooperates with the laboratory, and the sampling time is from September to December 2023, with a sampling interval of 10 s.

3.2.1. Extraction of Key Factors for Abnormal Voltage Fault Diagnosis

The real operating dataset of pure electric vehicles used in this study contains 56 data fields. Some of these fields (e.g., accelerator pedal opening and engine status) are not directly related to abnormal battery voltage faults and are, therefore, not suitable for data enhancement and fault diagnosis modeling. The study employs the Pearson correlation coefficient to filter out the key factors leading to power battery cell voltage undervoltage faults from these 56 characteristic variables.

The Pearson correlation coefficient is a statistical measure that quantifies the strength of the linear correlation between two variables. It ranges between −1 and 1 [21] and can be expressed as follows:

r = \frac{\sum (X_{i} - \bar{X}) (Y_{i} - \bar{Y})}{\sqrt{\sum {(X_{i} - \bar{X})}^{2} \sum {(Y_{i} - \bar{Y})}^{2}}}

(3)

where r denotes the Pearson correlation coefficient, X_{i} and Y_{i} represent the i-th data points in the sample dataset, and \bar{X} and \bar{Y} represent the mean values of X and Y, respectively.

By calculating the covariance and standard deviation of the sample data, the Pearson correlation coefficient can be obtained to understand the strength of the linear correlation between the two variables.
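As a minimal sketch of Equation (3), the function below computes r directly from the sums of deviations. The two short series are hypothetical values invented for illustration and are not taken from the vehicle dataset.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Hypothetical samples of two data fields.
total_voltage = [350.0, 348.0, 345.0, 340.0, 338.0]
min_cell_voltage = [3.30, 3.28, 3.25, 3.18, 3.15]

r = pearson_r(total_voltage, min_cell_voltage)
print("|r| =", abs(r))
```

Taking the absolute value, as done for the feature screening in this section, ranks fields by the strength of the linear relationship regardless of its sign.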

For data augmentation and fault diagnosis models, too few input feature variables make it difficult to extract critical information, which affects the robustness of the model. If the dimension of the feature variables is too high, however, redundant features appear and the complexity of the model increases. Therefore, filtering out the feature variables that correlate strongly with the target variable reduces the amount of computation and improves the accuracy and efficiency of the model [22]. In this paper, seven characteristic variables affecting power battery cell undervoltage faults were finally selected from the 56 data fields in the vehicle driving dataset. Because this study considers only how strongly each factor influences the occurrence of power battery voltage faults, all the resulting Pearson correlation coefficients are taken as absolute values, as shown in Figure 6.

3.2.2. Voltage Abnormal Fault Threshold Setting

A single cell undervoltage fault is evaluated by determining whether the lowest single cell voltage is below the fault threshold. After data preprocessing, the records of the four vehicles in which single cell undervoltage faults occurred are extracted from the operating data and compared with the normal data. As can be seen in Figure 7, based on the actual single cell undervoltage fault alarms in the dataset provided by the vehicle manufacturer, the maximum value of the lowest single cell voltage among the fault records, 3.16 V, is set as the fault threshold; for every vehicle that faulted under driving conditions, the lowest single cell voltage does not exceed this threshold and thus meets the fault condition. Therefore, in the vehicle manufacturer’s dataset used in this study, the threshold value for power cell undervoltage faults under driving conditions is 3.16 V.
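The threshold rule described here, taking the largest value of the lowest single cell voltage among the alarmed records, can be sketched as follows. The record layout and field names are hypothetical, and the four records are invented so that the result matches the 3.16 V reported above.

```python
def fault_threshold(records):
    """Largest minimum-cell-voltage observed among records flagged as faults."""
    fault_minima = [r["min_cell_voltage"] for r in records if r["undervoltage_alarm"]]
    if not fault_minima:
        raise ValueError("no fault records available")
    return max(fault_minima)


# Hypothetical preprocessed records (voltages in volts).
records = [
    {"min_cell_voltage": 3.31, "undervoltage_alarm": False},
    {"min_cell_voltage": 3.16, "undervoltage_alarm": True},
    {"min_cell_voltage": 3.05, "undervoltage_alarm": True},
    {"min_cell_voltage": 3.24, "undervoltage_alarm": False},
]

threshold = fault_threshold(records)
print(threshold)  # 3.16
```

With a threshold chosen this way, every alarmed record lies at or below it by construction; whether normal records stay above it depends on the data and would need to be checked, as Figure 7 does for the manufacturer's dataset.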

3.3. Comparative Analysis of Generated and Real Samples

In this paper, the seven key feature fields selected using the Pearson correlation coefficient are retained, and 10,000 operating records are randomly sampled from the real operating data of the four vehicles in proportion to the different vehicles and time periods. The sample includes 1000 single-cell undervoltage fault records and 9000 normal records, giving an original dataset with a 10% fault percentage. The generative adversarial network is constructed using Python and the TensorFlow library. The first step in constructing the network is to apply a normalization method to the original dataset, which mitigates the impact of scale differences and ensures consistent scaling across the data. The normalization formula can be expressed as follows:

y_{i} = \frac{x_{i} - \min_{1 \leq j \leq n} \{x_{j}\}}{\max_{1 \leq j \leq n} \{x_{j}\} - \min_{1 \leq j \leq n} \{x_{j}\}}

(4)

where x_{i} and y_{i} denote the data before and after the normalization process, respectively.

The normalization process scales the data to the range [0, 1] without changing the shape of the data distribution. The normalized data are fed into the generative adversarial network model for data augmentation of the power battery cell undervoltage fault data. Each batch contains 64 samples, and the learning rate of both the generator and the discriminator is set to 0.001. Adam [23] is chosen as the optimizer for both the discriminator and the generator because it is computationally efficient and well suited to non-stationary objective functions. As the number of iterations increases, the loss curves of the discriminator and the generator tend to stabilize; the generator loss is adjusted upward appropriately in order to ensure the diversity of fault feature changes in the generated data, as shown in Figure 8.
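The min-max scaling of Equation (4) can be sketched for a single feature column as below. The voltages are hypothetical, and the guard against a constant column is an added assumption, since Equation (4) is undefined when the maximum equals the minimum.

```python
def min_max_normalize(values):
    """Scale a sequence to [0, 1] following Equation (4)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        raise ValueError("cannot normalize a constant column")
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical minimum-cell-voltage column (volts).
min_cell_voltage = [3.30, 3.16, 3.05, 3.24]
scaled = min_max_normalize(min_cell_voltage)
print(scaled)
```

In a multi-feature dataset the same scaling would be applied to each of the seven feature columns separately, and the stored minimum and maximum of each column allow generated samples to be mapped back to physical units.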

After model training, 2000 generated power battery single-cell undervoltage fault records were randomly drawn. In order to compare the similarity between the generated data and the original data, 2000 records were also randomly selected from the original fault data. Because randomly selected data cannot reflect their correlation with time, this study evaluates four important factors that have little time correlation and most directly lead to power cell undervoltage faults: the total voltage, the maximum single cell voltage, the minimum single cell voltage, and the range of the single cell voltage (the difference between the maximum and the minimum). The trends of the original and generated data are plotted on a single graph for visualization and analysis, as shown in Figure 9.

The experimental results show that the generated data follow a variation trend similar to that of the original data, with maximum and minimum values falling within the range of the original data, and that they compensate for the limited variety of fault characteristics in the original data. These results indicate that the samples generated by the LSGAN-based data enhancement proposed in this paper can be added to the power battery single-cell undervoltage fault samples as an expansion of the fault set. This enlarges the training sample size, enriches the dataset, and increases the diversity of the fault data features. Consequently, it improves the model’s generalization capability.

4. Fault Diagnosis Model

Based on the enhanced data, this paper adopts the random forest classification model for fault diagnosis of the power battery single undervoltage fault data, and it uses the Gray Wolf Optimization algorithm to optimize the hyperparameters of the model.

4.1. Random Forest Classification Model

The random forest model is a machine learning model that can effectively handle high-dimensional data, has high noise tolerance, and is not prone to overfitting problems, and it is widely used in classification and regression analysis. In classification problems, the final prediction results are determined by voting [24]. According to the characteristics of the data under study, the random forest classification model is selected to diagnose the fault of the single undervoltage abnormality of electric vehicle power battery. The structure of the random forest classification model is shown in Figure 10.

Suppose that voting is used to determine the final prediction in random forest classification, and that the random forest model contains n decision trees {h_{1}, h_{2}, …, h_{n}} and m categories {C_{1}, C_{2}, …, C_{m}}. In order to ensure the reliability of the classification outcomes, the absolute majority voting method is employed to determine the final output. Specifically, if a category receives more than half of the votes of all decision trees, then the input is predicted to belong to that category; otherwise, the classification prediction is rejected. This can be expressed using the following equation:

H (x) = \begin{cases} C_{j}, & \sum_{i = 1}^{n} h_{i}^{j} (x) > \frac{1}{2} \sum_{k = 1}^{m} \sum_{i = 1}^{n} h_{i}^{k} (x) \\ reject, & otherwise \end{cases}

(5)

where H (x) denotes the voting result; h_{i}^{j} (x) indicates whether decision tree h_{i} judges input x to belong to category C_{j} (1 if so, 0 otherwise); and reject denotes the rejection of the classification prediction.
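The absolute majority rule of Equation (5) can be sketched as follows, under the assumption that each tree casts exactly one vote, so that the double sum on the right-hand side equals the number of trees. The label strings and the REJECT marker are illustrative.

```python
from collections import Counter

REJECT = "reject"  # illustrative marker for a rejected prediction


def absolute_majority_vote(tree_predictions):
    """Return the category backed by more than half of the trees, else REJECT."""
    if not tree_predictions:
        raise ValueError("at least one tree prediction is required")
    label, votes = Counter(tree_predictions).most_common(1)[0]
    return label if votes > len(tree_predictions) / 2 else REJECT


print(absolute_majority_vote(["fault", "fault", "normal"]))  # fault
print(absolute_majority_vote(["fault", "normal"]))           # reject
```

With only two categories and an odd number of trees a rejection cannot occur; it becomes possible with an even number of trees or with more than two categories.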

4.2. Grey Wolf Optimizer

When constructing a fault diagnosis model, manually setting the hyperparameters may not be sufficient to optimize the performance of the model. Therefore, it is crucial to adopt an appropriate hyperparameter search method. The Gray Wolf Optimizer (GWO), proposed by Mirjalili et al. from Griffith University, Australia, in 2014, is a commonly used method [25]. The GWO algorithm simulates the predatory behavior of gray wolves and is simple to implement with few parameters [26]. For these reasons, it has become a popular choice for hyperparameter optimization when constructing fault diagnosis models [27].

The wolves in a gray wolf pack are divided into four tiers, α, β, δ, and ω, based on their individual fitness. The top tier, α, is the leader wolf and corresponds to the best solution found by the optimization algorithm. The β tier consists of the second-best solutions, which follow the guidance of α. The δ tier ranks below α and β but performs better than the ω tier. All other wolves belong to the ω tier.

The optimization process of GWO comprises three primary components [28].

In the first step, gray wolf populations are initialized by establishing a social hierarchy. The hierarchy of the gray wolf population is not fixed, but it is updated based on the results of each iteration.

In the second step, the gray wolves engage in a population search by gradually surrounding their prey. This behavior is mathematically modeled and can be expressed using Equations (6) and (7):

\vec{D} = |\vec{C} * {\vec{X}}_{p} (t) - \vec{X} (t)|

(6)

\vec{X} (t + 1) = {\vec{X}}_{p} (t) - \vec{A} * \vec{D}

(7)

where \vec{D} is the distance between the prey and the gray wolf, t is the current iteration number, \vec{A} and \vec{C} are coefficient vectors, and \vec{X} and {\vec{X}}_{p} represent the position vectors of the gray wolf and the prey, respectively.

The formulas for vectors \vec{A} and \vec{C} are given in Equations (8) and (9):

\vec{A} = 2 * \vec{a} * {\vec{r}}_{1} - \vec{a}

(8)

\vec{C} = 2 * {\vec{r}}_{2}

(9)

where {\vec{r}}_{1} and {\vec{r}}_{2} are random vectors with components in the range [0, 1], and \vec{a} is the convergence factor, which decreases linearly from 2 to 0 over the iterations of the algorithm. According to Equation (8), |\vec{A}| \geq 1 implies that the gray wolf performs a global search, whereas |\vec{A}| < 1 indicates that it performs a local search.
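Equations (6) to (9) can be sketched component by component as below. The linear schedule a = 2(1 - t / max_iter) is one common way to realize a convergence factor that decreases linearly, and is an assumption here; the wolf and prey positions are hypothetical.

```python
import random


def convergence_factor(t, max_iter):
    """Convergence factor a, decreasing linearly from 2 at t = 0 to 0 at t = max_iter."""
    return 2.0 * (1.0 - t / max_iter)


def coefficients(a, dim, rng):
    """Draw the coefficient vectors A (Equation (8)) and C (Equation (9))."""
    A = [2.0 * a * rng.random() - a for _ in range(dim)]
    C = [2.0 * rng.random() for _ in range(dim)]
    return A, C


def encircle(wolf, prey, A, C):
    """One encircling move: Equation (6) for D, then Equation (7) for the new position."""
    D = [abs(c * p - x) for c, p, x in zip(C, prey, wolf)]
    return [p - a_k * d for p, a_k, d in zip(prey, A, D)]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
a = convergence_factor(t=10, max_iter=50)
A, C = coefficients(a, dim=2, rng=rng)
new_position = encircle(wolf=[0.5, 0.5], prey=[1.0, 2.0], A=A, C=C)
print(a, new_position)
```

Each component of A lies in [-a, a], so large steps away from the prey are possible only early in the run while a is still greater than 1; as a shrinks, the moves contract around the prey.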

In the third step, the position of the population is updated by the hunting behavior of the gray wolves. α, β, and δ guide the other gray wolves in surrounding their prey. In order to simulate the positional changes of gray wolves during hunting, it is assumed that α, β, and δ can identify potential prey locations. Therefore, in each iteration, the remaining gray wolves update their positions based on the position information of the three best solutions, α, β, and δ. In this way, the search efficiency and the convergence of the whole gray wolf population are improved. Figure 11 provides an illustrative example of this process.

The mathematical expression for the change in the position of a single gray wolf throughout the update process is as follows:

\begin{cases} {\vec{D}}_{α} = |{\vec{C}}_{1} * {\vec{X}}_{α} - \vec{X}| \\ {\vec{D}}_{β} = |{\vec{C}}_{2} * {\vec{X}}_{β} - \vec{X}| \\ {\vec{D}}_{δ} = |{\vec{C}}_{3} * {\vec{X}}_{δ} - \vec{X}| \end{cases}

(10)

\begin{cases} {\vec{X}}_{1} = {\vec{X}}_{α} - {\vec{A}}_{1} * {\vec{D}}_{α} \\ {\vec{X}}_{2} = {\vec{X}}_{β} - {\vec{A}}_{2} * {\vec{D}}_{β} \\ {\vec{X}}_{3} = {\vec{X}}_{δ} - {\vec{A}}_{3} * {\vec{D}}_{δ} \end{cases}

(11)

\vec{X} (t + 1) = \frac{{\vec{X}}_{1} + {\vec{X}}_{2} + {\vec{X}}_{3}}{3}

(12)

where {\vec{D}}_{α}, {\vec{D}}_{β}, and {\vec{D}}_{δ} denote the distances between α, β, and δ and the other individuals, respectively; {\vec{X}}_{α}, {\vec{X}}_{β}, and {\vec{X}}_{δ} denote the positions of α, β, and δ, respectively; {\vec{C}}_{1}, {\vec{C}}_{2}, and {\vec{C}}_{3} are random vectors; {\vec{A}}_{1}, {\vec{A}}_{2}, and {\vec{A}}_{3} are coefficient vectors computed as in Equation (8); and \vec{X} denotes the current position of an individual gray wolf.

The step length and direction of an individual wolf’s movement towards the α, β, and δ wolves are defined by Equation (11), and Equation (12) gives the final position resulting from the wolf’s movement.

Figure 12 illustrates a flowchart of a Gray Wolf Optimization algorithm.
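The three steps described above can be combined into a minimal sketch of a Gray Wolf Optimizer. It is a generic illustration under stated assumptions rather than the implementation used in this study: it minimizes a toy sphere function instead of a diagnosis error, uses the linear schedule a = 2(1 - t / n_iter), and clips positions to the search bounds, which is a common practical safeguard not described in the text.

```python
import random


def gwo_minimize(objective, bounds, n_wolves=5, n_iter=50, seed=0):
    """Minimize `objective` over box `bounds` with a basic Gray Wolf Optimizer."""
    if n_wolves < 3:
        raise ValueError("need at least three wolves for alpha, beta, and delta")
    rng = random.Random(seed)
    wolves = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_wolves)]
    best, best_score = None, float("inf")
    for t in range(n_iter):
        # Step 1: re-rank the pack each iteration; the hierarchy is not fixed.
        ranked = sorted(wolves, key=objective)
        leaders = [list(w) for w in ranked[:3]]  # alpha, beta, delta
        alpha_score = objective(leaders[0])
        if alpha_score < best_score:
            best, best_score = list(leaders[0]), alpha_score
        a = 2.0 * (1.0 - t / n_iter)  # assumed linear schedule from 2 toward 0
        updated = []
        for wolf in wolves:
            position = []
            for k, (lo, hi) in enumerate(bounds):
                pulls = []
                for leader in leaders:
                    A = 2.0 * a * rng.random() - a    # Equation (8)
                    C = 2.0 * rng.random()            # Equation (9)
                    D = abs(C * leader[k] - wolf[k])  # Equation (10)
                    pulls.append(leader[k] - A * D)   # Equation (11)
                x_new = sum(pulls) / 3.0              # Equation (12)
                position.append(min(max(x_new, lo), hi))  # keep inside the bounds
            updated.append(position)
        wolves = updated
    # Account for the pack produced by the last update.
    final = min(wolves, key=objective)
    if objective(final) < best_score:
        best, best_score = list(final), objective(final)
    return best, best_score


def sphere(x):
    """Toy objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


bounds = [(-5.0, 5.0), (-5.0, 5.0)]
best, best_score = gwo_minimize(sphere, bounds)
print(best, best_score)
```

To maximize diagnostic accuracy, as in the next subsection, the objective passed to such a routine would return the negative accuracy (or one minus the accuracy) of a model trained with the hyperparameters encoded in the wolf's position.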

4.3. Troubleshooting Results

In this paper, TensorFlow is used to construct a random forest classification model for diagnosing voltage abnormality faults of electric vehicle power batteries. The model is trained on the original dataset with 10% fault data and on the LSGAN-expanded datasets with 25% and 40% fault data, respectively. The mean square error (MSE), root mean square error (RMSE), mean absolute error (MAE), and coefficient of determination (R²) are used as the evaluation indexes of the model.

MSE = \frac{1}{M} \sum_{i = 1}^{M} {(y_{i} - {\hat{y}}_{i})}^{2}

(13)

RMSE = \sqrt{\frac{1}{n - T} \sum_{t = T + 1}^{n} {(x_{t} - {\hat{x}}_{t})}^{2}}

(14)

MAE = \frac{1}{n - T} \sum_{t = T + 1}^{n} |x_{t} - {\hat{x}}_{t}|

(15)

R^{2} = 1 - \frac{SSE}{SST}

(16)
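The four indexes can be computed as in the sketch below. Two assumptions are made because the text does not define them: the averaging windows of Equations (13) to (15) are all taken to be the whole set of evaluated samples, and SSE and SST are taken in their usual sense, the sum of squared prediction errors and the sum of squared deviations of the true values from their mean. The labels and predictions are hypothetical.

```python
import math


def evaluation_indexes(y_true, y_pred):
    """Return MSE, RMSE, MAE, and R^2 for paired true and predicted values."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(y_true)
    errors = [yt - yp for yt, yp in zip(y_true, y_pred)]
    sse = sum(e * e for e in errors)  # sum of squared errors
    mean_true = sum(y_true) / n
    sst = sum((yt - mean_true) ** 2 for yt in y_true)  # total sum of squares
    if sst == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    mse = sse / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1.0 - sse / sst,
    }


# Hypothetical labels (1 = fault, 0 = normal) and model outputs.
y_true = [0, 0, 1, 1]
y_pred = [0, 1, 1, 1]
indexes = evaluation_indexes(y_true, y_pred)
print(indexes)
```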

During model training, the dataset is partitioned into a training set and a test set at a ratio of 8:2. The hyperparameters of the random forest classification model are optimized over the following value intervals: the maximum depth of the decision trees, (1, 10); the maximum number of features considered when splitting a node, (0.1, 0.5); the minimum number of samples required to split a node, (5, 10); the minimum number of samples required at a leaf node, (3, 7); and the number of trees, (1, 10). The optimization objective is the maximum fault diagnosis accuracy, and the initial number of wolves is five. After 50 Gray Wolf Optimization iterations, the optimized hyperparameters are obtained as shown in Table 1, and the test results of the model are shown in Table 2.
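One way to connect a continuous wolf position to these value intervals is sketched below. Several choices are assumptions made for illustration: the parameter names are borrowed from scikit-learn's RandomForestClassifier even though this study reports using TensorFlow, the maximum number of features is treated as a fraction of the available features, integer-valued hyperparameters are obtained by rounding, and out-of-range values are clipped to the interval.

```python
# Search intervals from the text; the third element marks integer or real values.
SEARCH_SPACE = {
    "max_depth": (1, 10, int),
    "max_features": (0.1, 0.5, float),
    "min_samples_split": (5, 10, int),
    "min_samples_leaf": (3, 7, int),
    "n_estimators": (1, 10, int),
}


def decode_position(position):
    """Map a wolf position (one value per hyperparameter) to a parameter dict."""
    if len(position) != len(SEARCH_SPACE):
        raise ValueError("position length must match the number of hyperparameters")
    params = {}
    for value, (name, (lo, hi, kind)) in zip(position, SEARCH_SPACE.items()):
        clipped = min(max(value, lo), hi)  # keep the value inside its interval
        params[name] = int(round(clipped)) if kind is int else float(clipped)
    return params


# Hypothetical wolf position, partly outside the intervals.
print(decode_position([7.6, 0.32, 4.0, 5.2, 12.0]))
```

A fitness function for the optimizer would decode each position in this way, train a random forest with the resulting hyperparameters on the training split, and return the diagnosis accuracy on held-out data.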

Table 1 lists the optimal parameters for the datasets with different proportions of fault data. In each tuple, the parameters are, from left to right, the tree depth, the maximum number of features, the minimum sample requirement for split nodes, the minimum sample requirement for leaf nodes, and the number of trees.
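To make the tuple layout concrete, the sketch below maps a continuous wolf position onto the five hyperparameters in the order used in Table 1. It rests on several assumptions: the intervals are treated as inclusive bounds, the maximum number of features is read as a fraction of all features, and the parameter names are borrowed from scikit-learn's `RandomForestClassifier` purely for readability (the paper reports using TensorFlow). The function name `decode` and the example position are hypothetical.

```python
# Search intervals from the text, in the left-to-right order of Table 1.
SEARCH_SPACE = {
    "max_depth": (1, 10),
    "max_features": (0.1, 0.5),
    "min_samples_split": (5, 10),
    "min_samples_leaf": (3, 7),
    "n_estimators": (1, 10),
}
INTEGER_PARAMS = {"max_depth", "min_samples_split", "min_samples_leaf", "n_estimators"}


def decode(position):
    """Clip a wolf position to the search space and round integer parameters."""
    params = {}
    for value, (name, (lo, hi)) in zip(position, SEARCH_SPACE.items()):
        clipped = min(hi, max(lo, value))
        if name in INTEGER_PARAMS:
            params[name] = int(round(clipped))
        else:
            params[name] = round(clipped, 2)
    return params


# A hypothetical continuous position produced by the optimizer.
params = decode((7.2, 0.364, 6.4, 6.8, 9.1))
```

With this hypothetical position, the decoded values read (7, 0.36, 6, 7, 9), the same layout as a row of Table 1.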

Table 2 shows that increasing the proportion of fault data improves the accuracy of the model. This finding supports the feasibility of using the LSGAN data enhancement model proposed in this paper to expand the single undervoltage fault data of power batteries and thereby improve the accuracy of subsequent fault detection methods. To further validate the effectiveness and feasibility of the proposed fault diagnosis method, three models commonly used in fault diagnosis, i.e., Support Vector Machine (SVM), Gradient Boosting Machine (GBM), and Naïve Bayes (NB), are selected and compared on the fault diagnosis dataset that contains 25% fault data in order to evaluate their generalization performance. The accuracy is the ratio of the number of correctly diagnosed samples (from both the positive and negative classes) to the total number of samples in the positive and negative classes, and can be obtained using the following equation:

A = \frac{T_{P} + T_{N}}{P + N}

(17)

where

A

is the diagnostic accuracy,

T_{P}

is the number of samples in which the positive category is judged to be positive,

T_{N}

is the number of samples in which the negative category is judged to be negative,

P

is the total number of samples in the positive category, and

N

is the total number of samples in the negative category. The results are shown in Table 3.
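Equation (17) can be evaluated directly from confusion-matrix counts. The sketch below takes P and N as the total numbers of positive and negative samples, which is what the prose definition of accuracy requires; the function name and the counts in the example are hypothetical and are not results from the paper.

```python
def diagnostic_accuracy(tp, tn, fp, fn):
    """Accuracy A = (T_P + T_N) / (P + N) from confusion-matrix counts."""
    positives = tp + fn   # P: all samples whose true class is positive
    negatives = tn + fp   # N: all samples whose true class is negative
    return (tp + tn) / (positives + negatives)


# Hypothetical test set: 100 fault (positive) and 300 normal (negative) samples.
example_accuracy = diagnostic_accuracy(tp=95, tn=290, fp=10, fn=5)
```

Here 385 of the 400 samples are diagnosed correctly, so the accuracy is 385/400.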

In Table 3, the diagnostic accuracy of RF-GWO is higher than that of the SVM, GBM, and NB models by 0.0885, 0.0850, and 0.0720 (i.e., 8.85, 8.50, and 7.20 percentage points), respectively, which reflects the superiority of the RF-GWO fault diagnosis framework.
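These gaps follow directly from the accuracies in Table 3. As a quick check, the sketch below converts the accuracy fractions into percentage-point differences relative to RF-GWO; the function and variable names are illustrative.

```python
# Accuracies as reported in Table 3.
TABLE_3 = {"SVM": 0.9100, "GBM": 0.9135, "NB": 0.9265, "RF-GWO": 0.9985}


def gaps_in_percentage_points(accuracies, reference="RF-GWO"):
    """Difference between the reference accuracy and each other model, in points."""
    ref = accuracies[reference]
    return {
        name: round((ref - acc) * 100, 2)
        for name, acc in accuracies.items()
        if name != reference
    }


gaps = gaps_in_percentage_points(TABLE_3)
```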

To verify the effectiveness of the optimizer selected in this study, two algorithms that have been commonly used for hyperparameter optimization in machine learning in recent years, Bayesian optimization (BO) [29] and particle swarm optimization (PSO) [30], are selected to find the optimal hyperparameters of the random forest model. Comparative tests are carried out on the expanded dataset with a fault data percentage of 25%. The optimal parameters of the three methods are shown in Table 4.

Table 4 lists the optimal parameters under each optimizer, in the same left-to-right order as in Table 1. To ensure fairness, each model is also evaluated using the accuracy, which gradually stabilizes over iterative training, as shown in Figure 13a–c.

Figure 13a shows that, with PSO optimization, the fault diagnosis accuracy gradually levels off and stops increasing after reaching 0.9901. Figure 13b,c shows that the accuracy obtained with GWO optimization is slightly higher than that obtained with BO optimization, and that the GWO curve is smoother than the BO curve as the number of iterations increases. These results indicate that the optimizer chosen in this study makes the fault diagnosis model more accurate and efficient.

5. Conclusions

In this paper, based on the real operating data of four pure electric vehicles, LSGAN is first used to augment the single undervoltage fault data of the power battery; visualization and analysis show that this method is suitable for augmenting time-series voltage fault data. The RF-GWO method is then used for fault detection, with experiments conducted on both the original dataset and the expanded datasets. The comparative analysis shows that the LSGAN data enhancement method can expand limited data samples and improve the accuracy of the fault diagnosis model.

Secondly, this paper uses the random forest (RF) model for fault diagnosis and Gray Wolf Optimization (GWO) to tune its hyperparameters. Comparative validation results show that the proposed RF-GWO model outperforms the SVM, GBM, and NB models in terms of diagnostic accuracy and generalization performance, and that GWO outperforms PSO and BO in terms of accuracy and stability.

Power battery failures in EVs are affected by a variety of factors, including driving behavior, weather conditions, and road conditions, and different types of failures are correlated with one another. This study focuses only on abnormal power battery voltage faults, using real vehicle operation data. Future research will explore the interconnections between different types of battery failures by combining driving behavior data, weather information, and other sources to investigate the causes of EV battery failures. The goal is to improve driving safety and overall traffic safety by understanding the mechanisms behind these failures.

Author Contributions

Conceptualization, W.F., Y.G. and J.Z.; methodology, W.F., Y.G. and J.Z.; software, W.F., Y.G. and J.Z.; validation, W.F., Y.G. and J.Z.; formal analysis, W.F., Y.G. and J.Z.; investigation, Y.G. and J.Z.; resources, W.F.; data curation, J.Z.; writing—original draft preparation, Y.G. and J.Z.; writing—review and editing, W.F., Y.G. and J.Z.; visualization, W.F., Y.G. and J.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This study was funded by the 2022 Municipal Regional Science and Technology Major Project of the Fuzhou Municipal Science and Technology Bureau, “Research on Intelligent Safety Early Warning System for New Energy Vehicles Based on Big Data and Industrial Application” (Item No. 2022-Q-009), and by the project “Development and Demonstration of Regulatory Service System for New Energy Vehicles in Fujian Province” (Project No. KY310044).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Duh, Y.S.; Tsai, M.T.; Kao, C.S. Characterization on the thermal runaway of commercial 18650 lithium-ion batteries used in electric vehicle. J. Therm. Anal. Calorim. 2017, 127, 983–993.
2. Zhang, J.; Zhang, L.; Sun, F.; Wang, Z. An overview on thermal safety issues of lithium-ion batteries for electric vehicle application. IEEE Access 2018, 6, 23848–23863.
3. Zhao, Y.; Liu, P.; Wang, Z.; Zhang, L.; Hong, J. Fault and defect diagnosis of battery for electric vehicles based on big data analysis methods. Appl. Energy 2017, 207, 354–362.
4. Xiong, R.; Ma, S.; Li, H.; Sun, F.; Li, J. Toward a safer battery management system: A critical review on diagnosis and prognosis of battery short circuit. iScience 2020, 23, 101010.
5. Lipu, M.H.; Hannan, M.; Karim, T.F.; Hussain, A.; Saad, M.H.M.; Ayob, A.; Miah, S.; Mahlia, T.I. Intelligent algorithms and control strategies for battery management system in electric vehicles: Progress, challenges and future outlook. J. Clean. Prod. 2021, 292, 126044.
6. Wu, C.; Zhu, C.; Ge, Y.; Zhao, Y. A review on fault mechanism and diagnosis approach for Li-ion batteries. J. Nanomater. 2015, 2015, 8.
7. Lu, L.; Han, X.; Li, J.; Hua, J.; Ouyang, M. A review on the key issues for lithium-ion battery management in electric vehicles. J. Power Sources 2013, 226, 272–288.
8. Zhu, S.; He, C.; Zhao, N.; Sha, J. Data-driven analysis on thermal effects and temperature changes of lithium-ion battery. J. Power Sources 2021, 482, 228983.
9. Sidhu, A.; Izadian, A.; Anwar, S. Adaptive nonlinear model-based fault diagnosis of Li-ion batteries. IEEE Trans. Ind. Electron. 2014, 62, 1002–1011.
10. Chen, Z.; Xu, K.; Wei, J.; Dong, G. Voltage fault detection for lithium-ion battery pack using local outlier factor. Measurement 2019, 146, 544–556.
11. Hong, J.; Wang, Z.; Yao, Y. Fault prognosis of battery system based on accurate voltage abnormity prognosis using long short-term memory neural networks. Appl. Energy 2019, 251, 113381.
12. Li, X.; Dai, K.; Wang, Z.; Han, W. Lithium-ion batteries fault diagnostic for electric vehicles using sample entropy analysis method. J. Energy Storage 2020, 27, 101121.
13. Feng, F.; Li, K.-C.; Shen, J.; Zhou, Q.; Yang, X. Using cost-sensitive learning and feature selection algorithms to improve the performance of imbalanced classification. IEEE Access 2020, 8, 69979–69996.
14. Park, S.; Park, H. Combined oversampling and undersampling method based on slow-start algorithm for imbalanced network traffic. Computing 2021, 103, 401–424.
15. Goodfellow, I.J.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial nets. Adv. Neural Inf. Process. Syst. 2014, 27.
16. Saxena, D.; Cao, J. Generative adversarial networks (GANs) challenges, solutions, and future directions. ACM Comput. Surv. (CSUR) 2021, 54, 1–42.
17. Gnanha, A.T.; Cao, W.; Mao, X.; Wu, S.; Wong, H.-S.; Li, Q. The residual generator: An improved divergence minimization framework for GAN. Pattern Recognit. 2022, 121, 108222.
18. Akkem, Y.; Biswas, S.K.; Varanasi, A. A comprehensive review of synthetic data generation in smart farming by using variational autoencoder and generative adversarial network. Eng. Appl. Artif. Intell. 2024, 131, 107881.
19. Mao, X.; Li, Q.; Xie, H.; Lau, R.Y.K.; Wang, Z.; Smolley, S.P. Least squares generative adversarial networks. In Proceedings of the IEEE International Conference on Computer Vision 2017, Venice, Italy, 22–29 October 2017; pp. 2794–2802.
20. Wang, C.; Cao, Y.; Zhang, S.; Ling, T. A reconstruction method for missing data in power system measurement based on LSGAN. Front. Energy Res. 2021, 9, 651807.
21. Gu, X.; See, K.; Liu, Y.; Arshad, B.; Zhao, L.; Wang, Y. A time-series Wasserstein GAN method for state-of-charge estimation of lithium-ion batteries. J. Power Sources 2023, 581, 233472.
22. Yu, K.; Guo, X.; Liu, L.; Li, J.; Wang, H.; Ling, Z.; Wu, X. Causality-based feature selection: Methods and evaluations. ACM Comput. Surv. (CSUR) 2020, 53, 1–36.
23. Ghahramani, Z. Probabilistic machine learning and artificial intelligence. Nature 2015, 521, 452–459.
24. Antoniadis, A.; Lambert-Lacroix, S.; Poggi, J.M. Random forests for global sensitivity analysis: A selective review. Reliab. Eng. Syst. Saf. 2021, 206, 107312.
25. Mirjalili, S.; Mirjalili, S.M.; Lewis, A. Grey wolf optimizer. Adv. Eng. Softw. 2014, 69, 46–61.
26. Zhang, X.; Hou, J.; Wang, Z.; Jiang, Y. Joint SOH-SOC estimation model for lithium-ion batteries based on GWO-BP neural network. Energies 2022, 16, 132.
27. Águila-León, J.; Vargas-Salgado, C.; Díaz-Bello, D.; Montagud-Montalvá, C. Optimizing Photovoltaic Systems: A Meta-Optimization Approach with GWO-Enhanced PSO Algorithm for Improving MPPT Controllers. Renew. Energy 2024, 230, 120892.
28. Jafari, S.; Shahbazi, Z.; Byun, Y.-C.; Lee, S.-J. Lithium-ion battery estimation in online framework using extreme gradient boosting machine learning approach. Mathematics 2022, 10, 888.
29. Wang, X.; Jin, Y.; Schmitt, S.; Olhofer, M. Recent advances in Bayesian optimization. ACM Comput. Surv. 2023, 55, 1–36.
30. Nayak, J.; Swapnarekha, H.; Naik, B.; Dhiman, G.; Vimal, S. 25 years of particle swarm optimization: Flourishing voyage of two decades. Arch. Comput. Methods Eng. 2023, 30, 1663–1725.

Figure 1. Flow of lithium-ion battery system fault diagnosis.

Figure 2. Overall technical structure.

Figure 3. Basic structure of GAN model.

Figure 4. Structure of LSGAN network.

Figure 5. (a) Normal state battery pack form; (b) Failure state battery pack form.

Figure 6. Key characterization variables.

Figure 7. (a–d) represent the operational data of 4 vehicles with battery cell undervoltage faults.

Figure 8. (a) discriminator loss function; (b) generator loss function.

Figure 9. (a) Total voltage; (b) maximum battery voltage per cell; (c) minimum battery voltage per cell; (d) battery individual voltage polarity.

Figure 10. Random forest classification model structure.

Figure 11. Schematic diagram of wolf location update.

Figure 12. Grey Wolf Optimization algorithm flowchart.

Figure 13. (a) Accuracy of the RF-PSO model; (b) accuracy of the RF-BO model; (c) accuracy of the RF-GWO model.

Table 1. Optimal hyperparameters.

Percentage of Fault Data	Optimal Parameters (Tree Depth, Max. Features, Min. Samples per Split, Min. Samples per Leaf, Number of Trees)
10%	(8, 0.38, 6, 7, 8)
25%	(7, 0.36, 6, 7, 9)
40%	(9, 0.43, 6, 7, 8)

Table 2. Comparison of accuracy of different datasets.

Percentage of Fault Data	MSE	RMSE	R²
10% (Original dataset)	0.500	0.707	−1.000
25% (Expanded dataset)	0.033	0.182	0.868
40% (Expanded dataset)	0.001	0.031	0.996

Table 3. Comparison of the accuracy of different diagnostic methods.

Diagnostic Methods	Accuracy
SVM	0.9100
GBM	0.9135
NB	0.9265
RF-GWO	0.9985

Table 4. Optimal hyperparameters.

Optimization Algorithm	Optimal Parameters
RF-PSO	(7, 0.23, 7, 3, 9)
RF-BO	(7, 0.23, 6, 3, 9)
RF-GWO	(7, 0.36, 6, 7, 9)

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
