1. Introduction
The ultrahigh voltage direct current (UHVDC) thyristor valve is the core equipment of a UHVDC transmission project [1]. The saturable reactor is a critical component of the UHVDC thyristor valve. It limits the rapid rise of surge currents during the opening and closing processes of the thyristors, bears most of the peak voltage under lightning overvoltage, and protects the thyristors under various working conditions [2]. Under long-term mechanical stress in service, the tensioning belt of a saturable reactor iron core undergoes permanent plastic deformation, which decreases the tension force and may change the air gap of the iron core. As a result, the protection capability of the saturable reactor may no longer meet the design requirements. Abnormal vibration of the iron core can also occur, which may damage the insulation and cause an excessive temperature rise due to eddy currents. Routine inspection and maintenance cannot identify the state of the iron core. It is therefore necessary to study monitoring and diagnosis methods for saturable reactor iron core looseness faults, so that the state of the iron core can be determined and reasonable operation and maintenance can be supported in engineering.
Vibration signals directly reflect the structural state of a device and have been widely used in the condition monitoring and diagnosis of power equipment [3,4,5,6]. As in a transformer or an alternating current (AC) reactor, the iron core of an operating saturable reactor vibrates because of magnetostriction, and the vibration characteristics are directly affected by the structural state of the iron core. It is therefore theoretically feasible to diagnose iron core looseness in a saturable reactor by analyzing and processing its vibration signals.
In early work, equipment status was diagnosed from time domain statistics of the vibration signal, such as the root mean square (RMS), skewness, and kurtosis [7,8,9]. However, some faults do not change the time domain characteristics significantly, so diagnostic methods based on time domain statistics cannot detect and distinguish these faults in a timely manner, which introduces delay and ambiguity in engineering practice. With the development of frequency domain analysis methods based on the Fourier transform, such as the frequency spectrum, power spectrum, and cepstrum, signal characteristics can be analyzed and calculated in the frequency domain [10,11,12,13]. Some faults have little influence on the time domain features of vibration signals but cause significant changes in the frequency domain characteristics, and frequency domain analysis can diagnose these faults earlier and more accurately. However, the vibration signals of faulty equipment are usually nonstationary, varying in both the time domain and the frequency domain [14]. Some studies used time-frequency analysis methods, such as the short-time Fourier transform and the wavelet transform, to calculate time-frequency spectrum characteristics for equipment fault diagnosis [15,16,17]. A time-frequency spectrum represents the signal in the time and frequency domains simultaneously, and it contains richer state information than time domain or frequency domain features alone [18].
Methods combining statistical features with threshold judgment have been widely used from the early days to the present. With the development of artificial intelligence, machine learning algorithms such as k-nearest neighbor (KNN), decision tree (DT), support vector machine (SVM), and artificial neural network (ANN) have been combined with time domain or frequency domain statistical features of signals and applied in vibration-based fault diagnosis [19,20,21]. Compared with threshold judgment methods, machine learning methods require neither in-depth research on the vibration mechanism of equipment faults nor manually designed judgment thresholds, which greatly lowers the barrier to applying fault diagnosis methods in engineering [22,23]. However, because statistical features are influenced by working conditions, samples of different faults collected under variable working conditions may overlap and be difficult to separate, and traditional machine learning methods cannot solve the problem of fault diagnosis under unknown working conditions. In recent years, research on deep learning-based fault diagnosis has made great progress [24,25,26,27]. Deep learning methods extract features and classify samples automatically and can distinguish small differences between samples. With sufficient samples, they achieve higher accuracy than traditional machine learning methods and have the potential to solve the problem of fault diagnosis under unknown working conditions [28,29,30].
Imbalanced data samples and fluctuating operating conditions are the two main challenges for vibration data-driven fault diagnosis of saturable reactors in UHVDC thyristor valves. These challenges can be addressed by expanding the data samples and by exploring new fault features or diagnosis models.
Sufficient vibration data samples of saturable reactors with different iron core looseness faults cannot be obtained from engineering sites. Dataset extension is therefore needed to solve the problem of imbalanced samples and to improve the accuracy and generalization capability of the fault diagnosis method. In recent years, the variational autoencoder (VAE) [31], generative adversarial networks (GANs) [32], and their derived methods, such as the conditional VAE (CVAE) and the Wasserstein GAN, have been widely used for sample expansion in different fields [33,34,35]. The CVAE-GAN model combines the advantages of the VAE and the GAN: it improves on the low quality of samples generated by the VAE and mitigates the mode collapse and sample distortion of the GAN. The CVAE-GAN model has been applied to image generation and mechanical vibration signal generation [36,37], where its performance was superior to that of the pure VAE, GAN, and their derived methods.
The vibration of a saturable reactor is affected simultaneously by its structural state and its electrical excitation. A DC transmission project may operate under different loads, so the current through a saturable reactor varies rather than remaining fixed. Because the project has no bridge current or reactor voltage acquisition device, electrical excitation data are not available in real time, which makes saturable reactor fault diagnosis more difficult. Without electrical excitation data, it is almost impossible to diagnose iron core looseness in a saturable reactor using traditional machine learning methods combined with time domain or frequency domain statistical features. More effective features and fault diagnosis models are therefore needed to cope with fluctuating and unknown working conditions. The synchrosqueezed wavelet transform (SST) is an improved wavelet transform that sharpens the time-frequency representation of a signal by relocating the wavelet transform coefficients on the time-frequency plane [38]. Compared with an ordinary wavelet time-frequency spectrum, the energy of the SST time-frequency spectrum is more concentrated and the key components are more prominent, which helps improve fault diagnostic performance. Through convolution, pooling, and related techniques, a deep convolutional neural network (DCNN) avoids the exponential growth in the number of neurons that would otherwise accompany an increase in input feature dimension and number of hidden layers [39]. It can therefore build a deeper network with fewer hidden layer parameters and has a stronger ability to learn high-dimensional input features. A DCNN can take sequences or images as input, and some studies have used the raw vibration signal sequence or its time-frequency spectrum as the DCNN input for fault diagnosis, achieving good diagnostic performance [28,40,41]. Furthermore, a DCNN can be designed with a special structure that accepts multiple different features simultaneously, thus achieving higher diagnostic performance than methods based on a single feature [36].
This paper presents a fault diagnosis method, named CVG-MFICNN, for iron core looseness in UHVDC thyristor valve saturable reactors. A CVAE-GAN model is trained on an imbalanced dataset, and the samples it generates are used to extend the original dataset until it is balanced. The vibration signals of the extended dataset are then processed from several perspectives: the time-frequency spectrum and the frequency domain sequence are calculated using the SST and the Welch method, respectively, and the time domain sequence and time domain statistics are also extracted as vibration signal features. Finally, a new MFICNN structure that integrates these multimodal features is designed to diagnose iron core looseness faults of a UHVDC thyristor valve saturable reactor under variable operating conditions.
The remainder of this paper is organized as follows. Section 2 introduces the vibration mechanism of the saturable reactor in a UHVDC thyristor valve. Section 3 describes the workflow and key model structures of the proposed CVG-MFICNN method in detail. Section 4 describes the saturable reactor vibration experiment, tests the performance of the proposed method on manually constructed imbalanced datasets collected from the experiment, and compares it with other methods. Section 5 summarizes the work of this paper.
2. Vibration Mechanism of the Saturable Reactor
A 6-pulse converter consists of six single thyristor valves; its wiring diagram and the location of the saturable reactor in a single valve are shown in Figure 1a.
Figure 1b shows the structure of a saturable reactor, mainly composed of a coil, iron cores, clamping boards, and screw bolts. The coil is covered by cast epoxy resin, and the iron cores are mounted outside the epoxy resin. Clamping boards and screw bolts connect the saturable reactor as a whole structure.
Figure 1c shows a cross-sectional view of the iron core. A single iron core comprises two U-shaped laminated silicon steel sheets and a tensioning belt. There is an air gap filled with insulating material between the two U-shaped laminated silicon steel sheets.
The primary vibration source of a saturable reactor is the magnetostrictive vibration of its iron core [42]. Under the operating conditions of the thyristor valve, the periodic, approximately trapezoidal current passing through the saturable reactor can be expressed as the sum of multiple sinusoidal currents:

$$ i(t) = \sum_{k} I_k \sin\left(k\omega t + \varphi_k\right) \tag{1} $$

where $I_k$ and $\varphi_k$ are the current amplitude and phase angle of the $k$-th harmonic, respectively, and $\omega = 2\pi \times 50$ rad/s is the fundamental angular frequency of the 50 Hz current. According to Ampère's circuital law, the magnetic field strength $H$ in the iron core of the saturable reactor is:

$$ H(t) = \frac{N\, i(t)}{l} = \frac{N}{l} \sum_{k} I_k \sin\left(k\omega t + \varphi_k\right) \tag{2} $$

where $N$ and $l$ are the number of turns of the saturable reactor coil and the equivalent length of the iron core magnetic circuit, respectively. Neglecting magnetic saturation, the small relative deformation $\varepsilon$ of the silicon steel sheet caused by magnetostriction satisfies Equation (3) [43]:

$$ \varepsilon = \frac{\Delta L}{L} = \varepsilon_s \frac{H^2}{H_c^2} \tag{3} $$

where $\Delta L$ and $L$ are the deformation and the original length of the silicon steel sheet, respectively, $\varepsilon_s$ is the saturation magnetostriction of the silicon steel sheet, $H_c$ is the coercive force, and $H$ is the magnetic field strength in the iron core. Substituting Equation (2) into Equation (3) and differentiating the deformation $\Delta L = L\varepsilon$ twice with respect to time gives the vibration acceleration of the iron core:

$$ a(t) = \frac{\mathrm{d}^2 \Delta L}{\mathrm{d}t^2} = K\omega^2 \sum_{m}\sum_{n} I_m I_n \left[ (m+n)^2 \cos\left((m+n)\omega t + \varphi_m + \varphi_n\right) - (m-n)^2 \cos\left((m-n)\omega t + \varphi_m - \varphi_n\right) \right] \tag{4} $$

where $K = L\varepsilon_s N^2 / (2 l^2 H_c^2)$ is a constant whose value depends on the saturation magnetostriction $\varepsilon_s$, the original length $L$ of the silicon steel sheet, the coercive force $H_c$, the number of coil turns $N$, and the equivalent length $l$ of the iron core magnetic circuit, and $m$ and $n$ each run over the harmonic orders $k$ of Equation (1). Equation (4) shows that the vibration signal of a saturable reactor contains components at $(m \pm n) \times 50$ Hz, which include both odd and even multiples of 50 Hz. Moreover, magnetic saturation and the magnetostriction of silicon steel sheets are both nonlinear physical processes, which further complicates the iron core vibration signal. A looseness fault of the iron core directly affects its air gap size and structural modes and, ultimately, its vibration characteristics.
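To illustrate Equation (4) numerically, the Python sketch below synthesizes a current from a few harmonics, forms $i^2(t)$ (to which the deformation is proportional under Equation (3)), differentiates it twice in the frequency domain, and lists the frequencies present in the resulting acceleration. The harmonic amplitudes and phases, the sampling rate, and the detection tolerance are arbitrary illustrative values, not parameters of the experiment.

```python
import numpy as np


def acceleration_spectrum(amps, phases, f0=50.0, fs=10_000.0, periods=10):
    """Amplitude spectrum of d^2/dt^2 [i(t)^2] for i(t) = sum_k I_k sin(k*w*t + phi_k).

    Up to the constant K in Equation (4), this is the magnetostrictive acceleration.
    amps[k-1] and phases[k-1] belong to harmonic order k.
    """
    n = int(round(fs * periods / f0))            # integer number of 50 Hz periods
    t = np.arange(n) / fs
    orders = np.arange(1, len(amps) + 1)[:, None]
    current = np.sum(np.asarray(amps, float)[:, None]
                     * np.sin(2 * np.pi * f0 * orders * t
                              + np.asarray(phases, float)[:, None]), axis=0)
    spec = np.fft.rfft(current ** 2)             # deformation is proportional to i^2
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    acc = -(2 * np.pi * freqs) ** 2 * spec       # second time derivative in frequency domain
    return freqs, np.abs(acc) * 2.0 / n          # single-sided amplitude


def present_frequencies(freqs, mag, rel_tol=1e-6):
    """Frequencies (Hz) whose amplitude exceeds rel_tol times the largest amplitude."""
    return [round(float(f), 3) for f in freqs[mag > rel_tol * mag.max()]]


# Odd harmonics only (1st and 3rd): only even multiples of 50 Hz appear.
print(present_frequencies(*acceleration_spectrum([1.0, 0.0, 0.3], [0.0, 0.0, 0.0])))
# -> [100.0, 200.0, 300.0]
# Fundamental plus 2nd harmonic: odd and even multiples appear.
print(present_frequencies(*acceleration_spectrum([1.0, 0.5], [0.0, 0.3])))
# -> [50.0, 100.0, 150.0, 200.0]
```

With odd harmonics only, the acceleration contains only even multiples of 50 Hz; once an even harmonic is present, odd multiples appear as well, consistent with the $(m \pm n)$ terms of Equation (4).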
3. Proposed CVG-MFICNN Method
Figure 2 shows the framework of the proposed CVG-MFICNN method. First, a one-dimensional (1-D) CVAE-GAN model is trained on the vibration data samples of the training set to produce generated samples, and the generated and original training samples are combined to form an extended training set. Second, the time-frequency spectrum, time domain vibration sequence, frequency spectrum sequence, and time domain statistics of the vibration signals are extracted, and the extended training feature set, validation feature set, and testing feature set are formed from these features. Then, the extended training feature set is used to train the MFICNN model, and the validation feature set is used to select the model with the best validation performance during training, so that the selected model shows neither obvious overfitting nor underfitting. Finally, the performance of the selected model is tested on the testing feature set to evaluate the effectiveness of the proposed method.
The processes of training, validation, and testing simulate the actual procedure in engineering applications. The testing corresponds to a practical diagnosis process. Before testing, only training and validation data are available for model training and selection. There is no data leakage during the whole process.
3.1. 1-D CVAE-GAN Model for Dataset Extending
The proposed method uses a 1-D CVAE-GAN model to generate vibration time series that expand the training set. The CVAE-GAN model combines a CVAE and a GAN [37]. As shown in Figure 3, the CVAE-GAN consists of four parts: (1) an encoder E; (2) a generator G; (3) a discriminator D; and (4) a classifier C. Encoder E maps a real sample $x$ to a latent representation $z$ by learning the conditional distribution $P(z \mid x, c)$ of the real sample, where $c$ is the category of the sample. Generator G learns the distribution of the real samples through the gradients provided by discriminator D, samples a latent code $z$, and produces a generated sample $x'$ through the learned conditional distribution $P(x' \mid z, c)$. Discriminator D learns to distinguish real samples from generated ones, and classifier C learns to classify samples.
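The loss functions used for CVAE-GAN training [37] are:

$$ \begin{aligned}
\mathcal{L}_D &= -\mathbb{E}_{x \sim P_r}\left[\log D(x)\right] - \mathbb{E}_{z \sim P_z}\left[\log\left(1 - D(G(z, c))\right)\right] \\
\mathcal{L}_C &= -\mathbb{E}_{x \sim P_r}\left[\log P(c \mid x)\right] \\
\mathcal{L}_G &= \frac{1}{2}\left(\lVert x - x' \rVert_2^2 + \lVert f_D(x) - f_D(x') \rVert_2^2 + \lVert f_C(x) - f_C(x') \rVert_2^2\right) \\
\mathcal{L}_E &= \frac{1}{2}\sum_j \left(\mu_j^2 + \sigma_j^2 - 2\log\sigma_j - 1\right) \\
\mathcal{L}_{GD} &= \frac{1}{2}\left\lVert \mathbb{E}_{x \sim P_r} f_D(x) - \mathbb{E}_{z \sim P_z} f_D(G(z, c)) \right\rVert_2^2 \\
\mathcal{L}_{GC} &= \frac{1}{2}\sum_{c}\left\lVert \mathbb{E}_{x \sim P_r} f_C(x) - \mathbb{E}_{z \sim P_z} f_C(G(z, c)) \right\rVert_2^2
\end{aligned} \tag{5} $$

where $\mathcal{L}_D$, $\mathcal{L}_C$, $\mathcal{L}_G$, and $\mathcal{L}_E$ are the loss functions of discriminator D, classifier C, generator G, and encoder E, respectively; $\mathcal{L}_{GD}$ and $\mathcal{L}_{GC}$ are introduced to solve the problem of unstable early training caused by gradient explosion; $P_r$ and $P_z$ are the real data distribution and the prior distribution of the latent code; $D(\cdot)$ is the output of discriminator D; $P(c \mid x)$ is the classification probability output by classifier C; $f_D(\cdot)$ and $f_C(\cdot)$ are the input vectors of the last fully connected (FC) layer of discriminator D and classifier C, respectively; and $\mu$ and $\log\sigma$ are the mean and log standard deviation vectors of the latent representation $z$. $\mathcal{L}_D$ and $\mathcal{L}_C$ are used to update discriminator D and classifier C, respectively; $\mathcal{L}_{GD}$, $\mathcal{L}_{GC}$, and $\mathcal{L}_G$ are used to update generator G; and $\mathcal{L}_E$ and $\mathcal{L}_G$ are used to update encoder E.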
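The non-adversarial terms described above can be sketched in NumPy as follows, assuming the common CVAE-GAN formulation in [37]: a KL divergence between the encoder's Gaussian (given as mean and log standard deviation vectors) and a standard normal prior, a mean feature-matching loss on the inputs to the last FC layer, and a pairwise L2 reconstruction term in data and feature space. The function names are hypothetical, and this is an illustration of the loss arithmetic rather than the authors' training code.

```python
import numpy as np


def kl_loss(mu, log_sigma):
    """Per-sample KL(N(mu, sigma^2) || N(0, I)), with log standard deviations as input."""
    sigma2 = np.exp(2.0 * log_sigma)
    return 0.5 * np.sum(mu ** 2 + sigma2 - 2.0 * log_sigma - 1.0, axis=-1)


def mean_feature_matching(f_real, f_fake):
    """0.5 * ||mean(f_real) - mean(f_fake)||^2 over a batch of feature vectors (rows)."""
    diff = f_real.mean(axis=0) - f_fake.mean(axis=0)
    return 0.5 * float(diff @ diff)


def _sq_dist(a, b):
    """Row-wise squared Euclidean distance."""
    return np.sum((a - b) ** 2, axis=-1)


def pairwise_reconstruction(x, x_gen, fd_x, fd_gen, fc_x, fc_gen):
    """Pairwise L2 term between real samples and their reconstructions."""
    return 0.5 * (_sq_dist(x, x_gen) + _sq_dist(fd_x, fd_gen) + _sq_dist(fc_x, fc_gen))


rng = np.random.default_rng(0)
mu, log_sigma = rng.normal(size=(16, 32)), rng.normal(scale=0.1, size=(16, 32))
print(bool(np.all(kl_loss(mu, log_sigma) >= 0.0)))   # KL is non-negative -> True
```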
A 1-D CVAE-GAN model based on a CNN is designed, and its structure is shown in Figure 4. The convolution layers of the encoder, discriminator, and classifier use kernels of size 17 with a stride of 5. After two consecutive convolutions, the feature map fed to the final FC layer is reduced to 1/25 of the length of the original vibration data. The decoder applies two consecutive transposed convolution layers with the same kernel parameters to convert the code back into data. The parameters of each component of the model are listed in Table 1.
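The 1/25 reduction can be checked with the standard output-length formulas for 1-D convolution and transposed convolution (the formulas documented for PyTorch's `Conv1d` and `ConvTranspose1d`). The padding of 6 used below is an assumption not stated in the paper; with it, two stride-5 layers map 25,000 samples to exactly 1,000, and two transposed layers restore 25,000.

```python
def conv1d_out_len(length, kernel=17, stride=5, padding=6, dilation=1):
    """Output length of a 1-D convolution (PyTorch Conv1d formula)."""
    return (length + 2 * padding - dilation * (kernel - 1) - 1) // stride + 1


def conv_transpose1d_out_len(length, kernel=17, stride=5, padding=6,
                             output_padding=0, dilation=1):
    """Output length of a 1-D transposed convolution (PyTorch ConvTranspose1d formula)."""
    return ((length - 1) * stride - 2 * padding
            + dilation * (kernel - 1) + output_padding + 1)


n = 25_000                      # samples per vibration record (0.25 s at 100 kHz)
down = conv1d_out_len(conv1d_out_len(n))
up = conv_transpose1d_out_len(conv_transpose1d_out_len(down))
print(down, up)                 # -> 1000 25000
```

Without padding, the same two convolutions would give 997 points instead of 1000, so some padding is needed for an exact 1/25 ratio.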
The encoder consists of four convolution layers and two FC layers, with a batch normalization layer and a ReLU activation layer after every convolution layer; the FC layers Fc1 and Fc2 output the mean and standard deviation vectors of the code, respectively. The decoder consists of one FC layer, which takes the code and the category label as input, and three transposed convolution layers. A ReLU activation layer follows each of the first two transposed convolution layers, and a sigmoid activation layer follows the last one. The discriminator and the classifier each consist of three convolution layers and two FC layers; all of their convolution layers and the first FC layer are followed by a ReLU activation layer, and the second FC layer uses a sigmoid activation function.
Model parameters are randomly initialized before training. In each training iteration, the classifier parameters are first updated using the original samples. Next, a batch of original samples is fed into the encoder to obtain the mean and standard deviation of the code corresponding to each sample in the batch. Random codes are then generated and fed into the decoder together with the category labels to obtain a batch of generated samples. The generated samples are passed to the classifier and the discriminator to obtain the classification and discrimination results, respectively, and the parameters of the decoder, encoder, and discriminator are updated according to the outputs of each component. The whole CVAE-GAN model is trained by repeating this process over multiple iterations.
In the sample generation stage, batches of original samples and their category labels are fed into the encoder, and random codes are generated according to the mean and standard deviation of the code for each original sample. The random codes and the corresponding category labels are then fed into the decoder to obtain batches of generated samples that extend the training dataset.
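A common way to draw such random codes is the reparameterization $z = \mu + \sigma \odot \epsilon$ with $\epsilon \sim \mathcal{N}(0, I)$. The NumPy sketch below assumes this scheme and an encoder that outputs log standard deviations. The function names, the latent size of 32, and the concatenation of the code with a one-hot label before the decoder's FC layer are hypothetical choices for illustration, not the paper's documented design.

```python
import numpy as np


def sample_codes(mu, log_sigma, rng):
    """Reparameterization trick: z = mu + sigma * eps, eps ~ N(0, I)."""
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(log_sigma) * eps


def decoder_input(z, labels, n_classes):
    """Concatenate codes with one-hot category labels (one plausible conditioning scheme)."""
    one_hot = np.eye(n_classes)[labels]
    return np.concatenate([z, one_hot], axis=1)


rng = np.random.default_rng(0)
mu = np.zeros((64, 32))                  # batch of 64 codes, latent size 32 (hypothetical)
log_sigma = np.full((64, 32), -1.0)
labels = rng.integers(0, 5, size=64)     # five categories (a to e)
z = sample_codes(mu, log_sigma, rng)
print(decoder_input(z, labels, 5).shape)   # -> (64, 37)
```

Sampling through $\mu$ and $\sigma$ rather than reusing the encoder mean is what lets each original sample yield several distinct generated samples.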
3.2. Features as Input of the MFICNN Model
In the proposed method, the time-frequency spectrum, the time domain sequence, the frequency domain sequence, and the time domain statistical code are used as the input features of the MFICNN model, as listed in Table 2.
3.3. MFICNN Model
Conventional convolutional neural network models cannot process multiple types of input features, such as images and sequences, simultaneously. In this paper, an MFICNN model structure that can fuse and process multiple types of input features is designed; it is shown in Figure 5. In the figure, TFDS, TDS, FDS, and TDF denote the time-frequency spectrum, time domain sequence, frequency domain sequence, and time domain statistical code, respectively.
The MFICNN model is built from several two-dimensional (2-D) and 1-D convolution pooling layer groups. Each group consists of a convolution layer, a batch normalization layer, a max pooling layer, and a ReLU activation layer, in that order. The convolution layer keeps the size of the feature map unchanged, and the max pooling layer then halves it. The parameters of the 2-D and 1-D convolution pooling layer groups are listed in Table 3. Passing the original features through several stacked convolution pooling layer groups reduces the data dimension and extracts the key fault characteristics.
The time-frequency spectrum is converted into a feature map of size 4 × 4 × 16, and each sequence feature into a feature map of size 8 × 16. These three final feature maps are flattened into 1-D sequences and concatenated with the flattened time domain statistical codes to form an integrated sequence. The integrated sequence is fed into the first FC layer, which has 16 neurons and a dropout rate of 0.4. The second FC layer has five neurons, matching the number of fault types, and a final softmax layer computes the probability that a sample belongs to each category and outputs the classification result.
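The dimensions above imply an integrated sequence of 4 × 4 × 16 + 2 × (8 × 16) + 4 × 10 = 552 values, assuming the four one-hot statistical codes of length 10 described in Section 4.3.4. The NumPy sketch below runs a forward pass of the fusion head only (flatten, concatenate, FC-16 with inverted dropout, FC-5, softmax). The random weights, the ReLU after the first FC layer, and the function names are assumptions for illustration; the convolutional branches are replaced by random feature maps of the stated sizes.

```python
import numpy as np


def softmax(logits):
    shifted = logits - logits.max(axis=1, keepdims=True)    # numerical stability
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)


def fusion_head(tfds_map, tds_map, fds_map, tdf_codes, params, rng=None, p_drop=0.4):
    """Flatten and concatenate branch outputs, then FC(16) -> FC(5) -> softmax."""
    batch = tfds_map.shape[0]
    fused = np.concatenate([tfds_map.reshape(batch, -1), tds_map.reshape(batch, -1),
                            fds_map.reshape(batch, -1), tdf_codes.reshape(batch, -1)],
                           axis=1)
    h = np.maximum(fused @ params["w1"] + params["b1"], 0.0)   # ReLU (assumed)
    if rng is not None:                                        # training: inverted dropout
        keep = rng.random(h.shape) >= p_drop
        h = h * keep / (1.0 - p_drop)
    return softmax(h @ params["w2"] + params["b2"]), fused.shape[1]


rng = np.random.default_rng(1)
batch = 8
fused_len = 4 * 4 * 16 + 8 * 16 + 8 * 16 + 4 * 10
params = {"w1": rng.normal(0, 0.05, (fused_len, 16)), "b1": np.zeros(16),
          "w2": rng.normal(0, 0.05, (16, 5)), "b2": np.zeros(5)}
probs, width = fusion_head(rng.random((batch, 4, 4, 16)), rng.random((batch, 8, 16)),
                           rng.random((batch, 8, 16)), rng.random((batch, 4, 10)), params)
print(width, probs.shape)   # -> 552 (8, 5)
```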
4. Case Study and Discussion
4.1. Saturable Reactor Vibration Experiment
A vibration experiment was conducted to obtain the vibration data of the faulty saturable reactor under different electrical excitations.
Figure 6 shows the wiring diagram of the vibration experiment. A 220 V AC supply is connected through a transformer to a DC power supply, which mainly consists of four sets of IGBTs and is used to charge capacitor $C_s$. Capacitor $C_s$ is connected to the DC power supply through switch $K_1$ and is grounded through grounding resistance $R_{cs}$ and switch $K_s$; $K_s$ ensures that $C_s$ is reliably grounded when the experiment is not running. A thyristor valve Thy controls the energy supplement for the H-bridge module. Isolation transformers and series reactors are arranged in the H-bridge module cabinet to supply energy to the IGBT drivers. The H-bridge module converts the DC voltage into the excitation applied to the saturable reactor, and the voltage peak value is adjusted by connecting an equivalent stray capacitor to simulate the operating conditions of the saturable reactor. The excitation reproduces the two voltage pulses generated at the moments of opening and closing the UHVDC thyristor valve, and the current through the saturable reactor is approximately trapezoidal. Because the voltage and current waveforms are consistent with the actual engineering waveforms, this experiment can closely simulate the operating vibration characteristics of the saturable reactor.
A high-voltage probe and a Rogowski coil are used to measure the voltage and current of the saturable reactor. The voltage, current, and vibration data are synchronously collected using a high-voltage isolated data acquisition system. The isolated data acquisition system uses laser fiber communication to ensure reliable insulation between the saturable reactor and the data acquisition device and prevent high voltage from causing safety accidents or damaging the data acquisition device. The high-voltage probe, Rogowski coil, and piezoelectric accelerometer are connected to the laser transmitter of the acquisition system and isolated from each other. The piezoelectric accelerometer and the laser transmitter use batteries for the power supply. The laser transmitter is connected to the receiving device of the data acquisition system through laser fibers, and the laser receiver and data recorder are connected to a PC for data storage.
Figure 7 shows the experimental site, the saturable reactor used in the experiment, and the installation of the accelerometer. To ensure dynamic insulation, the reactor iron core and the fixing screw bolts are kept at the same potential as the midpoint of the coil, and all discrete metal parts are wired together. The reactor iron cores are installed on the epoxy resin structure outside the coil and fixed by the tensioning belt. Iron core looseness faults of various levels are simulated by adjusting the torque of the tensioning belt screw bolt. Considering both vibration measurement accuracy and insulation requirements, the piezoelectric accelerometer was attached to the surface of the clamping board above the iron core to measure the acceleration along the length direction of the iron core.
The vibration experiment is conducted under different peak currents to collect vibration data from saturable reactors with iron core looseness faults of various levels. The specific experimental parameters are listed in Table 4, where a, b, c, d, and e denote five levels of iron core looseness.
Figure 8 shows the voltage, current, and vibration data of the saturable reactor with the iron core in the normal state (NS) at a peak current of 1280 A. Figure 8a shows the time domain waveforms of voltage, current, and vibration acceleration over five consecutive cycles, and Figure 8b shows the first valve opening and closing process in Figure 8a. Because the vibration of the saturable reactor is caused by current changes, the vibration repeats with the same 50 Hz period as the current. During a single valve opening and closing process, the vibration excitation of the saturable reactor caused by the current change and magnetostriction resembles two consecutive shocks separated by a certain time interval. Because the current change rate di/dt is higher during the closing process, the corresponding shock excitation is stronger, and the resulting vibration response is more intense than during the opening process.
Figure 9 shows vibration time domain waveforms over two periods for saturable reactors with different iron core looseness levels under different operating conditions. Because the fault state and the load affect the vibration signal simultaneously, it is difficult to distinguish the fault state of the iron core by analyzing the vibration signal alone without load information.
4.2. Dataset Splitting and Extending
The collected vibration data are resampled at 100 kHz and cut into samples 0.25 s long (25,000 points) to form the original dataset, which is divided into training, validation, and testing sets in a 50%:25%:25% ratio. To create training sets with imbalanced sample distributions, a certain number of samples of the four fault types other than the normal state are randomly removed, so that the ratio of normal samples to the samples of each fault type is 2, 4, 8, 15, and 30; these are defined as training sets I, II, III, IV, and V, respectively. Within each dataset, the number of samples per load is kept consistent. The specific division of the datasets is given in Table 5.
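The removal step can be sketched as follows. The class labels, the sample counts, and the helper name `make_imbalanced` are hypothetical; the sketch interprets the ratio as normal-state samples per sample of each fault type and, for brevity, ignores the per-load balancing described above.

```python
import numpy as np


def make_imbalanced(labels, normal_label, ratio, rng):
    """Indices keeping all normal samples and n_normal // ratio samples per fault class."""
    labels = np.asarray(labels)
    normal_idx = np.flatnonzero(labels == normal_label)
    keep = [normal_idx]
    n_fault = max(1, len(normal_idx) // ratio)
    for cls in np.unique(labels):
        if cls == normal_label:
            continue
        idx = np.flatnonzero(labels == cls)
        keep.append(rng.choice(idx, size=min(n_fault, len(idx)), replace=False))
    return np.sort(np.concatenate(keep))


rng = np.random.default_rng(0)
labels = np.repeat(np.arange(5), 120)   # hypothetical: 120 samples per class, class 0 = NS
subset = make_imbalanced(labels, normal_label=0, ratio=8, rng=rng)
print(np.bincount(labels[subset]).tolist())   # -> [120, 15, 15, 15, 15]
```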
A CVAE-GAN model is trained separately on each imbalanced training set. Because each vibration sample contains 25,000 points, the model parameters are set to N = 25,000 and K = 5. Each imbalanced training set is then extended with generated samples until it is balanced, that is, until the ratio between normal and faulty samples is 1:1.
Figure 10 shows the power spectra of the original samples in training set V (blue lines) and of the samples generated from training set V (red lines). Overall, the power spectra of the generated samples coincide with those of the real samples, indicating that the CVAE-GAN model has learned the key features of the real data. Because only a limited number of samples are available for training the CVAE-GAN model, there are local differences at certain frequencies. These differences increase the diversity of the training samples, which may alleviate overfitting and improve the generalization capability of the diagnosis model to a certain extent.
4.3. Feature Extraction
4.3.1. Time-frequency Domain Feature Extraction
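For a given signal $s(t)$, its continuous wavelet transform $W_s(a,b)$ is defined as:

$$ W_s(a,b) = \int_{-\infty}^{+\infty} s(t)\, a^{-1/2}\, \overline{\psi\left(\frac{t-b}{a}\right)}\, \mathrm{d}t \tag{6} $$

where $\psi$ is the selected mother wavelet, the overline denotes complex conjugation, and $a$ and $b$ are the scale and translation parameters, respectively. For any $(a,b)$ satisfying $W_s(a,b) \neq 0$, the instantaneous frequency $\omega_s(a,b)$ of signal $s(t)$ is:

$$ \omega_s(a,b) = \frac{-i\, \partial_b W_s(a,b)}{W_s(a,b)} \tag{7} $$

$W_s(a,b)$ is then transferred from the time-scale plane to the time-frequency plane according to the mapping $(b, a) \rightarrow (b, \omega_s(a,b))$, which is called synchrosqueezing. Both $a$ and $\omega$ are discretized into small bins, with $(\Delta a)_k = a_k - a_{k-1}$ and $\Delta\omega = \omega_l - \omega_{l-1}$. The synchrosqueezed transform $T_s(\omega_l, b)$ is determined only by the values of $W_s(a,b)$ whose instantaneous frequency falls in the bin centered at $\omega_l$ with width $\Delta\omega$ [38]:

$$ T_s(\omega_l, b) = (\Delta\omega)^{-1} \sum_{a_k:\, \left|\omega_s(a_k, b) - \omega_l\right| \le \Delta\omega/2} W_s(a_k, b)\, a_k^{-3/2}\, (\Delta a)_k \tag{8} $$

where $a_k$ is the $k$-th discrete scale and $\omega_l$ is the $l$-th discrete angular frequency. This transformation squeezes the wavelet coefficient spectrum along the scale axis and concentrates its energy distribution. The SST time-frequency spectrum therefore has higher energy concentration than the ordinary wavelet coefficient spectrum.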
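The three SST steps described above (CWT, instantaneous frequency estimation, and reassignment of coefficients into frequency bins) can be sketched in NumPy as follows. This is an illustrative implementation, not the authors' code: it assumes an analytic Morlet wavelet that is Gaussian in the frequency domain with center frequency $w_0 = 6$, computes the CWT and its time derivative via the FFT, and uses a relative magnitude threshold in place of the condition $W_s(a,b) \neq 0$. Established SST toolboxes are the more usual choice in practice.

```python
import numpy as np


def morlet_cwt(x, scales, fs, w0=6.0):
    """CWT with an analytic Morlet wavelet, plus its derivative with respect to b."""
    n = len(x)
    x_hat = np.fft.fft(x)
    xi = 2 * np.pi * np.fft.fftfreq(n, d=1.0 / fs)          # angular frequency, rad/s
    W = np.empty((len(scales), n), dtype=complex)
    dW = np.empty_like(W)
    for k, a in enumerate(scales):
        # psi_hat is real, so its conjugate equals itself; positive frequencies only
        psi_hat = np.where(xi > 0, np.exp(-0.5 * (a * xi - w0) ** 2), 0.0)
        coef = x_hat * np.sqrt(a) * psi_hat
        W[k] = np.fft.ifft(coef)
        dW[k] = np.fft.ifft(1j * xi * coef)                  # d/db in the frequency domain
    return W, dW


def synchrosqueeze(W, dW, scales, freqs_hz, rel_gamma=1e-8):
    """Add W(a_k, b) * a_k^(-3/2) * da_k / d_omega to the bin of omega_s(a_k, b)."""
    omega_bins = 2 * np.pi * np.asarray(freqs_hz)
    d_omega = omega_bins[1] - omega_bins[0]                   # uniform bin width
    da = np.abs(np.gradient(scales))
    valid = np.abs(W) > rel_gamma * np.abs(W).max()           # stands in for W != 0
    T = np.zeros((len(omega_bins), W.shape[1]), dtype=complex)
    for k, a in enumerate(scales):
        cols = np.flatnonzero(valid[k])
        omega = np.imag(dW[k, cols] / W[k, cols])             # real part of -i * dW / W
        rows = np.rint((omega - omega_bins[0]) / d_omega).astype(int)
        ok = (rows >= 0) & (rows < len(omega_bins))
        np.add.at(T, (rows[ok], cols[ok]),
                  W[k, cols[ok]] * a ** -1.5 * da[k] / d_omega)
    return T


fs, w0 = 2000.0, 6.0
t = np.arange(int(fs)) / fs                                   # 1 s of data
x = np.cos(2 * np.pi * 100.0 * t)                             # 100 Hz test tone
freqs = np.linspace(10.0, 500.0, 99)                          # 5 Hz bins
scales = w0 / (2 * np.pi * freqs)
T = synchrosqueeze(*morlet_cwt(x, scales, fs, w0), scales, freqs)
print(freqs[np.argmax(np.abs(T).sum(axis=1))])                # -> 100.0
```

For a pure tone, every scale with a non-negligible coefficient reports the same instantaneous frequency, so all of the energy collapses into a single 100 Hz bin; this is the concentration effect that makes the SST spectrum sharper than the plain wavelet scalogram.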
The time-frequency spectra of the vibration samples are calculated using the SST. Before the calculation, each vibration signal is linearly normalized to the range (−0.5, 0.5) according to its peak value. Because the vibration signal is periodic in the time domain, only the time-frequency spectrum within a single period is retained for each data sample, and this period is centered on the peak position of the time domain signal to keep the phase consistent across samples. Because the original SST time-frequency spectrum highlights only the main frequency components, it is processed with a logarithmic transform, controlled by an adjustable parameter, to better express the nonmajor components submerged in the background area. Each time-frequency spectrum matrix is then linearly normalized according to its maximum and minimum values and pseudocolor-processed using the Turbo colormap to improve its distinguishability.
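A sketch of this preprocessing chain is given below. The exact logarithmic mapping and the value of its parameter are not recoverable from the text, so `log1p(alpha * x)` with a hypothetical `alpha` stands in for it. The window extraction assumes 2000 points per 50 Hz period at 100 kHz and wraps around the ends of the sample; the colormap step is omitted, and random arrays stand in for real SST magnitudes.

```python
import numpy as np


def peak_normalize(x):
    """Scale a zero-mean signal into [-0.5, 0.5] by its peak absolute value."""
    return 0.5 * x / np.max(np.abs(x))


def one_period_around_peak(tf, signal, period_len):
    """Keep one period of a (freq x time) map, centered on the signal's peak sample."""
    center = int(np.argmax(np.abs(signal)))
    cols = (np.arange(period_len) - period_len // 2 + center) % tf.shape[1]  # wrap at ends
    return tf[:, cols]


def log_minmax(tf_mag, alpha=100.0):
    """Log-compress magnitudes (hypothetical log1p form), then min-max scale to [0, 1]."""
    z = np.log1p(alpha * tf_mag)
    return (z - z.min()) / (z.max() - z.min())


rng = np.random.default_rng(0)
sig = peak_normalize(rng.standard_normal(25_000))
tf = np.abs(rng.standard_normal((64, 25_000)))        # stand-in for |SST| magnitudes
img = log_minmax(one_period_around_peak(tf, sig, period_len=2000))
print(img.shape, float(img.min()), float(img.max()))  # -> (64, 2000) 0.0 1.0
```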
Figure 11 shows the SST time-frequency spectrum features of saturable reactors in different fault states operating under different loads. Both the operating condition and the fault state affect the colors and the shapes of the internal vortices in the time-frequency spectrum. As the load increases, the signal becomes more concentrated in the frequency domain, and the contrast between the hot spot areas and the background area of the time-frequency spectrum increases; however, the overall brightness of the image and the proportion of red, yellow, and green areas decrease. Comparing the time-frequency spectra of different fault samples under the same operating conditions shows that the color distribution and the shape, number, and position of the internal vortices all vary; the areas circled by the solid red ovals change significantly. In general, the time-frequency spectra of different faulty samples under different operating conditions differ considerably, can be roughly distinguished by eye, and can therefore serve as the input of a CNN for fault classification.
4.3.2. Time Domain Sequence Feature Extraction
The time domain vibration waveforms of the samples are shown in Figure 8. The phase of each time domain sequence segment is consistent with that of the corresponding SST time-frequency spectrum. A two-period time domain segment with a duration of 0.04 s (4000 points) is extracted from each sample. The time domain sequence is normalized according to Equation (9), which converts the original zero-mean vibration signal into a sequence with a mean of 0.5 whose maximum and minimum values stay within the range (0.05, 0.95).
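Equation (9) itself is not reproduced in the text; one affine map that satisfies the stated properties is $0.5 + 0.45\,x / \max|x|$ for a zero-mean segment $x$. The sketch below uses that form as an assumption, together with a hypothetical helper that cuts a 4000-point segment around a given peak index and clips the window to the record.

```python
import numpy as np


def two_period_segment(x, peak_idx, seg_len=4000):
    """Cut seg_len points (0.04 s at 100 kHz) centered on peak_idx, clipped to the record."""
    start = min(max(peak_idx - seg_len // 2, 0), len(x) - seg_len)
    return x[start:start + seg_len]


def to_unit_band(seg):
    """Assumed form of Eq. (9): zero-mean segment -> mean 0.5, values in [0.05, 0.95]."""
    seg = seg - seg.mean()
    return 0.5 + 0.45 * seg / np.max(np.abs(seg))


rng = np.random.default_rng(0)
t = np.arange(25_000) / 100_000.0
x = np.sin(2 * np.pi * 50 * t) + 0.1 * rng.standard_normal(t.size)   # synthetic record
y = to_unit_band(two_period_segment(x, int(np.argmax(np.abs(x)))))
print(y.size, round(float(y.mean()), 6))   # -> 4000 0.5
```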
4.3.3. Frequency Domain Sequence Feature Extraction
For each complete 0.25 s sample, a power spectrum sequence of length 4001 is calculated using the Welch method. Figure 10 shows the power spectrum curves of real and generated samples in different fault states under different loads. To improve the data distribution of the power spectrum and the resolution of nonmajor frequency components, the spectrum sequence is converted into decibels. Before being input into the neural network, the power spectrum sequence is linearly normalized to the range (0.05, 0.95) according to its maximum and minimum values, as in Equation (10).
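A self-contained NumPy sketch of this step follows. A Welch estimate with a periodic Hann window, 50% overlap, per-segment mean removal, and 8000-point segments is assumed because it yields exactly 4001 one-sided bins for a 25,000-point record at 100 kHz; the paper does not state its window settings. The decibel conversion is taken as $10\log_{10}P$ and the scaling to (0.05, 0.95) as a linear min-max map, both assumptions. With these settings the estimate should closely match the defaults of `scipy.signal.welch`.

```python
import numpy as np


def welch_psd(x, fs=100_000.0, nperseg=8000):
    """Welch PSD: periodic Hann window, 50% overlap, mean removal, one-sided density."""
    step = nperseg // 2
    win = np.hanning(nperseg + 1)[:-1]          # periodic Hann window
    scale = 1.0 / (fs * np.sum(win ** 2))
    starts = range(0, len(x) - nperseg + 1, step)
    psd = np.zeros(nperseg // 2 + 1)
    for s in starts:
        seg = x[s:s + nperseg]
        spec = np.fft.rfft((seg - seg.mean()) * win)
        psd += np.abs(spec) ** 2 * scale
    psd /= len(starts)
    psd[1:-1] *= 2.0                           # fold negative frequencies (even nperseg)
    return np.fft.rfftfreq(nperseg, d=1.0 / fs), psd


def db_minmax(psd, lo=0.05, hi=0.95, floor=1e-20):
    """Convert to dB (10*log10) and scale linearly to [lo, hi] (assumed form of Eq. (10))."""
    db = 10.0 * np.log10(np.maximum(psd, floor))
    return lo + (hi - lo) * (db - db.min()) / (db.max() - db.min())


rng = np.random.default_rng(0)
f, p = welch_psd(rng.standard_normal(25_000))
y = db_minmax(p)
print(len(f), float(f[1]), float(f[-1]))   # -> 4001 12.5 50000.0
```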
4.3.4. Time Domain Statistic Code Extraction
Because the three features above are all normalized before being input into the neural network, four time-domain statistics, namely, the peak value, peak-to-peak value, root mean square value, and peak factor, are added as value features to retain the absolute amplitude information of the vibration signals. These statistics are converted into structured codes using the one-hot method, with each feature encoded as a code of length 10: the distribution range of the feature in the dataset is divided into ten equal-width intervals, the code value at the position of the interval containing the feature is set to 1, and the other nine code values are set to 0, as shown in Figure 12.
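The statistic computation and the 10-interval one-hot coding can be sketched in plain Python as follows. The statistic definitions are the common ones (peak factor as peak over RMS). The rule that a value exactly on the upper bound falls into the last interval is an assumption, and the per-feature dataset ranges in the example are hypothetical.

```python
import math

N_BINS = 10


def time_stats(x):
    """Peak, peak-to-peak, RMS and peak factor of a raw (unnormalized) signal."""
    peak = max(abs(v) for v in x)
    p2p = max(x) - min(x)
    rms = math.sqrt(sum(v * v for v in x) / len(x))
    crest = peak / rms if rms > 0 else 0.0
    return [peak, p2p, rms, crest]


def one_hot_code(value, lo, hi, n_bins=N_BINS):
    """Length-n_bins code with a single 1 at the equal-width interval holding value."""
    if hi <= lo:
        raise ValueError("invalid feature range")
    idx = int((value - lo) / (hi - lo) * n_bins)
    idx = min(max(idx, 0), n_bins - 1)   # clamp: value == hi goes to the last interval
    code = [0] * n_bins
    code[idx] = 1
    return code


def encode_sample(x, ranges):
    """Concatenate the codes of the four statistics (4 x 10 = 40 values).

    `ranges` holds the (min, max) of each statistic over the dataset.
    """
    code = []
    for value, (lo, hi) in zip(time_stats(x), ranges):
        code.extend(one_hot_code(value, lo, hi))
    return code


# Hypothetical dataset ranges for peak, peak-to-peak, RMS and peak factor.
ranges = [(0.0, 2.0), (0.0, 4.0), (0.0, 1.5), (1.0, 6.0)]
code = encode_sample([1.0, -1.0, 1.0, -1.0], ranges)
```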
4.4. Model Training and Testing
The F1 score [45] is used to evaluate the performance of the fault diagnosis methods. Equation (11) defines the F1 score:

$$F_1 = \frac{2PR}{P + R}, \qquad P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN} \tag{11}$$

where $P$ is the precision rate, $R$ is the recall rate, $TP$ is the number of samples of a specific fault type that are correctly diagnosed, $FP$ is the number of samples wrongly diagnosed as that fault type, and $FN$ is the number of samples of that fault type that are wrongly diagnosed as another type. The precision $P$ indicates the proportion of samples diagnosed as a specific fault type that truly belong to that type. The recall $R$ represents the proportion of all samples of a specific fault type in the set that are correctly diagnosed. The F1 score combines $P$ and $R$ and therefore reflects the method's performance more comprehensively. Diagnosing the iron core looseness fault of the saturable reactor in the UHVDC thyristor valve is a multiclass classification problem. For a model trained with a given method, this paper calculates the F1 score of every fault category and uses the minimum and the average of these per-category F1 scores to characterize the method's performance.
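The per-category F1 computation of Equation (11) and the two summary indicators can be sketched in plain Python. Returning 0 when a precision or recall denominator is zero is an assumed convention. The three-class labels below are synthetic (the paper uses five categories).

```python
def per_class_f1(y_true, y_pred, n_classes):
    """F1 = 2PR/(P+R) per class, with P = TP/(TP+FP) and R = TP/(TP+FN)."""
    tp = [0] * n_classes
    fp = [0] * n_classes
    fn = [0] * n_classes
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1   # diagnosed as class p but belongs to another class
            fn[t] += 1   # a class-t sample diagnosed as another class
    scores = []
    for k in range(n_classes):
        prec = tp[k] / (tp[k] + fp[k]) if tp[k] + fp[k] else 0.0
        rec = tp[k] / (tp[k] + fn[k]) if tp[k] + fn[k] else 0.0
        scores.append(2 * prec * rec / (prec + rec) if prec + rec else 0.0)
    return scores


def summarize(scores):
    """Minimum and average per-class F1, the two indicators used here."""
    return min(scores), sum(scores) / len(scores)


# Synthetic labels: one class-1 sample is misdiagnosed as class 2.
scores = per_class_f1([0, 0, 1, 1, 2, 2], [0, 0, 1, 2, 2, 2], n_classes=3)
f1_min, f1_avg = summarize(scores)
```

Here class 1 has $P = 1$ and $R = 0.5$, giving $F_1 = 2/3$, which is the minimum. The average is pulled up by the perfectly diagnosed class 0.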
The MFICNN model is trained using the imbalanced training sets I~V and the corresponding training sets extended by CVAE-GAN. The Adam optimizer is used with a learning rate of 0.0005, a maximum of 1000 iterations, and a mini-batch size of 64. In each training run, the model with the lowest validation loss is selected.
Figure 13 shows the training process of the CVG-MFICNN method based on training set I. The solid and dotted blue lines represent the training and validation accuracy, respectively, and the solid and dotted red lines represent the training and validation loss, respectively. In the early stage, the model's accuracy rises continuously as the number of iterations increases. At 248 iterations, the training accuracy reaches 100% and remains almost unchanged thereafter, whereas the validation accuracy exceeds 95% but continues to fluctuate. The loss curves follow the opposite trend to the accuracy curves. At iteration 694, the validation accuracy reaches its maximum value of 99.47% and the validation loss reaches its minimum. After that, the validation accuracy and loss continue to fluctuate, but no later model achieves better validation performance than the model of iteration 694, which indicates that the model overfits after iteration 694. Therefore, the model of iteration 694 is selected as the final model of this training run and used for performance testing.
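The "best validation loss" selection strategy amounts to keeping a snapshot of the model whenever the validation loss improves. The framework-agnostic sketch below illustrates the idea with a hypothetical `BestValidationCheckpoint` helper and a plain dictionary standing in for the model state; with a real training framework, that framework's own state object would be passed instead. The loss values are invented for illustration and are not taken from Figure 13.

```python
import copy


class BestValidationCheckpoint:
    """Hypothetical helper: keep the model state with the lowest validation loss."""

    def __init__(self):
        self.best_loss = float("inf")
        self.best_iteration = None
        self.best_state = None

    def update(self, iteration, val_loss, state):
        # Only a strictly lower loss replaces the snapshot; ties keep the earlier model.
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_iteration = iteration
            self.best_state = copy.deepcopy(state)  # snapshot, not a live reference
        return self.best_iteration


# Invented validation losses: improve, then fluctuate without beating iteration 3.
ckpt = BestValidationCheckpoint()
state = {"weights": [0.0]}
for it, loss in enumerate([0.90, 0.50, 0.30, 0.35, 0.32, 0.41], start=1):
    state["weights"][0] = float(it)       # stand-in for parameter updates
    ckpt.update(it, loss, state)
```

The deep copy matters: storing a reference would let later (overfitted) updates silently overwrite the selected model.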
To illustrate the advantage of MFICNN over single-feature neural networks, we trained and tested single-feature models on the time-frequency spectrum, the time-domain sequence, and the frequency-domain sequence for all training sets: 1-D CNN models are trained on the two sequence features, and 2-D CNN models are trained on the time-frequency spectrum feature. The parameters of these networks are consistent with those of the MFICNN model proposed in this paper, and the training parameters and model selection strategy are the same as those used to train the MFICNN model.
The testing results of the models trained with different methods on different training sets are shown in Table 6. In the table, I~V denote the five training sets with different imbalance ratios. TDS-1DCNN, FDS-1DCNN, and TFDS-CNN denote the three single-feature CNN methods, which use the time-domain sequence, frequency-domain sequence, and time-frequency spectrum features, respectively. MFICNN denotes the method using the MFICNN model designed in this paper, and the prefix CVG denotes the combination of each method with CVAE-GAN data extension. To reduce the influence of training randomness on the testing results, 30 models were trained for each method, and the averages of the 30 models' minimum F1 scores and average F1 scores were used for performance comparison. A higher F1 score indicates better performance. The results in Table 6 are also illustrated in Figure 14 and Figure 15.
Table 6 shows that CVAE-GAN data extension improves the performance of all methods on every original imbalanced training set. The improvement is most significant for training sets III, IV, and V, whose imbalance ratios are greater than 8.
Among the three single-feature methods, TDS-1DCNN performs worst. FDS-1DCNN performs better than TDS-1DCNN; combined with CVAE-GAN data extension, the average F1 score of the CVG-FDS-1DCNN method on datasets I~IV exceeds 0.900, which is significantly better than those of the TDS-1DCNN and CVG-TDS-1DCNN methods. TFDS-CNN performs best; combined with CVAE-GAN data extension, the average F1 score of the CVG-TFDS-CNN method exceeds 0.900 on all datasets and is close to or above 0.950 on datasets I~IV.
The MFICNN method outperforms every single-feature CNN method: the F1 scores of the models trained with the MFICNN and CVG-MFICNN methods are higher than those of the models trained with the TFDS-CNN and CVG-TFDS-CNN methods, respectively. The average F1 score of the models trained with the CVG-MFICNN method on training set I reaches 0.983, and even on the extremely imbalanced training set V, with an imbalance ratio of 30, it still reaches 0.927.
To present the diagnosis results of the CVG-MFICNN method more clearly, we take training sets I and V as examples and compare the testing confusion charts of the TFDS-CNN, CVG-TFDS-CNN, MFICNN, and CVG-MFICNN methods, as shown in Figure 16. The vertical axis of each confusion chart represents the true fault category, and the horizontal axis represents the diagnosed fault category. The five fault categories are defined in Table 4. The paired comparisons A vs. B, C vs. D, E vs. F, and G vs. H show that CVAE-GAN data extension improves the performance of models trained on imbalanced training sets, and the comparisons C vs. D and G vs. H show that the improvement is especially significant for models trained on the most imbalanced training set, V. In addition, the comparisons A vs. E, B vs. F, C vs. G, and D vs. H show that the MFICNN method misclassifies fewer samples per fault category than the TFDS-CNN method, so its diagnostic performance is better.
In summary, by extending the training sets with the CVAE-GAN method, the CVG-MFICNN method proposed in this paper effectively improves the diagnostic performance of models trained on imbalanced training sets, and the improvement is especially significant for the severely imbalanced training sets III, IV, and V. By using multimodal features, it achieves higher minimum and average F1 scores than methods based on a single feature, such as the time-frequency spectrum, time-domain sequence, or frequency-domain sequence.
4.5. Comparison with Other Methods
To further illustrate the performance advantages of the CVG-MFICNN method, traditional machine learning models, namely k-nearest neighbors (KNN), decision tree (DT), support vector machine (SVM), and a fully connected neural network (FCNN), as well as classical convolutional neural network models, namely LeNet-5 [39], AlexNet [46], VGGNet-16 [47], and ResNet-50 [48], were trained, tested, and compared in this section.
The time-domain and frequency-domain statistics [49] are taken as the input features of the traditional machine learning models. The time-domain statistics include the peak-to-peak value, the average amplitude, the root mean square value, the kurtosis factor, the skewness factor, and the peak factor. The frequency-domain statistics include the average spectral amplitude, the gravity (center) frequency, the standard deviation of frequency, and the skewness and kurtosis factors of the spectral amplitude. The SST time-frequency spectrum is taken as the input feature of the classical CNN models; to fit each classical network structure, the SST time-frequency spectrum image is resized before being input into the network. The number of neurons in the last fully connected layer of every CNN model is changed to five, the number of iron core looseness fault categories of the saturable reactor in the UHVDC thyristor valve.
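The statistics listed above can be computed as sketched below. Ref. [49] may define some of them differently, so these common textbook forms are assumptions: the kurtosis and skewness factors are taken as standardized fourth and third central moments, the gravity frequency as the amplitude-weighted mean frequency, and the frequency standard deviation as the amplitude-weighted spread around it. The 1 kHz sampling rate of the example is hypothetical.

```python
import numpy as np


def time_features(x):
    """Six time-domain statistics (assumed textbook definitions)."""
    x = np.asarray(x, dtype=float)
    mu, sigma = x.mean(), x.std()
    rms = np.sqrt(np.mean(x ** 2))
    return {
        "peak_to_peak": x.max() - x.min(),
        "mean_abs": np.mean(np.abs(x)),
        "rms": rms,
        "kurtosis": np.mean((x - mu) ** 4) / sigma ** 4,
        "skewness": np.mean((x - mu) ** 3) / sigma ** 3,
        "peak_factor": np.max(np.abs(x)) / rms,
    }


def freq_features(freqs, amp):
    """Five frequency-domain statistics of a (non-constant) amplitude spectrum."""
    f = np.asarray(freqs, dtype=float)
    s = np.asarray(amp, dtype=float)
    mu_s, sd_s = s.mean(), s.std()
    fc = np.sum(f * s) / np.sum(s)            # amplitude-weighted (gravity) frequency
    return {
        "mean_amp": mu_s,
        "gravity_freq": fc,
        "freq_std": np.sqrt(np.sum((f - fc) ** 2 * s) / np.sum(s)),
        "amp_skewness": np.mean((s - mu_s) ** 3) / sd_s ** 3,
        "amp_kurtosis": np.mean((s - mu_s) ** 4) / sd_s ** 4,
    }


# Synthetic example: ten periods of a 10 Hz sine sampled at a hypothetical 1 kHz.
x = np.sin(2 * np.pi * np.arange(1000) / 100)
tf = time_features(x)
ff = freq_features(np.fft.rfftfreq(x.size, d=1 / 1000), np.abs(np.fft.rfft(x)))
```

For a pure sine, the kurtosis factor is 1.5 and the peak factor is $\sqrt{2}$, and the gravity frequency coincides with the tone frequency.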
The performance comparison results are shown in Table 7, where TFSF denotes the time-domain and frequency-domain statistics and TFDS denotes the SST time-frequency spectrum. As in Table 6, 30 models were trained for each method, and the averages of the 30 models' minimum F1 scores and average F1 scores were used for comparison. Table 7 also lists the computational cost and number of parameters of each model. The computational cost was measured by training all models on the same platform, with an i5-12600KF CPU and an Nvidia RTX 3090 GPU, and is defined as the total training time for 1000 iterations; the CVG-MFICNN method additionally requires approximately 60 h to extend the training set. The results in Table 7 are also illustrated in Figure 17 and Figure 18.
All traditional machine learning methods yield low minimum and average F1 scores. SVM performs best among them, but its average F1 score is only 0.5~0.7 and its minimum F1 score is 0.3~0.6. KNN, DT, and FCNN perform worse than SVM, with minimum F1 scores of 0.3~0.4 and average F1 scores of approximately 0.5.
Among the classical CNN methods, AlexNet performs best: the average F1 scores of the AlexNet models trained on all imbalanced training sets exceed 0.850, and those of the models trained on training sets II and III exceed 0.900. LeNet-5, which has the simplest structure, performs second best: the average F1 scores of the LeNet-5 models exceed 0.870 on training sets I, II, and III and 0.810 on training sets IV and V. ResNet-50 and VGGNet-16 perform poorly. Only the ResNet-50 models trained on training sets I and II achieve average F1 scores above 0.800; the average F1 scores of the ResNet-50 models trained on the other training sets and of the VGGNet-16 models trained on all training sets do not exceed 0.800.
The comparison shows that the CVG-MFICNN method proposed in this paper far outperforms both the traditional machine learning methods and the classical CNN methods. Its model training computational cost is similar to those of LeNet-5, AlexNet, and VGGNet-16 and more than 50% lower than that of ResNet-50. The CVG-MFICNN method requires approximately 60 h of additional computation to extend the training set during the model preparation stage, but this is acceptable because model preparation is conducted only once before deployment. Moreover, the models trained with the CVG-MFICNN method have significantly fewer parameters than most classical CNN models, which makes them easier to deploy in an actual production environment.