4.1. Dataset Description
(1) Case Western Reserve University (CWRU) Dataset: The CWRU dataset is obtained from [32]. In the experiment, single-point damage was artificially introduced by electro-discharge machining (EDM). The signals from the bearings (model SKF6205) were collected at a sampling frequency of 12 kHz over 3 different rotational speeds of a motor drive. Apart from the normal state, the faults are set as damage on the inner race, outer race and ball, with damage diameters of 0.007, 0.014 and 0.021 inches. By considering the correspondences between the damage locations and diameters, the 11 fault statuses shown in Table 1 are adopted.
(2) Xi’an Jiaotong University (XJTU) Dataset: The XJTU dataset is obtained from [33]. During the accelerated lifetime test, the data from the bearings (model LDK UER204) were collected at a sampling frequency of 64 kHz under 3 different working conditions of an AC motor. The faults are set as damage on the outer race, inner race and cage. Three fault statuses are considered, as listed in Table 2. Compared with the CWRU dataset, the rotational speeds in the XJTU dataset change considerably between working conditions, which results in significant distribution differences in the collected data.
4.2. Detailed Settings
(1) CWRU Dataset: With this dataset, transfer experiments covering 1HP → 2HP, 2HP → 1HP, 2HP → 3HP and 3HP → 2HP are designed for the 11 fault statuses. For each experimental situation, several million sampling points are collected, a fixed proportion of which is used for training and the rest for testing. A range of datasets with different imbalance ratios (IRs) is further constructed from these data. The per-class sample sizes follow a power-law distribution [34], since some categories are more likely to occur than others, and the imbalance ratio is defined as $\mathrm{IR} = n_{\max}/n_{\min}$, where $n_{\max}$ and $n_{\min}$ represent, respectively, the maximum and minimum numbers of samples among the fault classes. Detailed information is listed in Table 3 and Table 4.
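For readers who want to reproduce such a class profile, the following minimal sketch (our own illustration, not the paper's code) generates per-class sample counts from a majority-class size and a target IR under a power-law decay. The function name and the example values (1000 majority samples, IR = 10) are hypothetical; the actual counts are those in Table 3 and Table 4.

```python
def power_law_counts(n_max, imbalance_ratio, num_classes):
    """Per-class sample counts decaying smoothly from n_max (majority class)
    down to n_max / IR (rarest class), a power-law long-tailed profile."""
    counts = []
    for k in range(num_classes):
        frac = k / (num_classes - 1)   # 0 for the majority class, 1 for the rarest
        counts.append(int(round(n_max * imbalance_ratio ** (-frac))))
    return counts

# Example: 11 CWRU fault statuses with a hypothetical IR of 10
print(power_law_counts(n_max=1000, imbalance_ratio=10, num_classes=11))
```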
(2) XJTU Dataset: With this dataset, transfer experiments covering 1HP → 2HP, 2HP → 1HP, 2HP → 3HP and 3HP → 2HP are designed for the 3 fault statuses. For each experiment, several million sampling points are collected. The detailed information is also listed in Table 4, which implements similar settings to those of the CWRU dataset.
In the model training procedure, the detailed parameter settings of UDA-EO are listed in Table 5, where the optimizer is set as stochastic gradient descent (SGD).
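For concreteness, a minimal PyTorch sketch of the optimizer setup is given below. The backbone here is a stand-in and the hyperparameter values are illustrative placeholders, since the actual settings are those reported in Table 5.

```python
import torch
import torch.nn as nn

# Stand-in for the customized CNN backbone of Section 4.3
model = nn.Sequential(nn.Conv1d(1, 16, kernel_size=64), nn.ReLU(),
                      nn.AdaptiveAvgPool1d(1), nn.Flatten(),
                      nn.Linear(16, 11))

# Hyperparameter values below are assumed for illustration only
optimizer = torch.optim.SGD(model.parameters(),
                            lr=1e-2,            # assumed learning rate
                            momentum=0.9,       # common SGD momentum (assumed)
                            weight_decay=5e-4)  # assumed L2 regularization
```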
4.3. Comparison Methods
All experiments are carried out on a PC with an Intel Core i9 CPU, 32 GB of RAM and a GeForce RTX 2080 Ti GPU. The programming platform is PyTorch. In this paper, a CNN backbone is customized, where the last linear layer is replaced by a fully connected layer with Xavier-initialized weights and no bias.
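A minimal PyTorch sketch of this last-layer replacement is shown below, assuming a torchvision ResNet-18 as a stand-in for the customized CNN backbone, whose exact architecture is not reproduced here.

```python
import torch.nn as nn
from torchvision.models import resnet18

# Stand-in backbone; the paper uses its own customized CNN.
backbone = resnet18(weights=None)

# Replace the final linear layer with a fully connected layer that has
# Xavier-initialized weights and no bias, as described above.
num_classes = 11  # CWRU fault statuses
fc = nn.Linear(backbone.fc.in_features, num_classes, bias=False)
nn.init.xavier_uniform_(fc.weight)
backbone.fc = fc
```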
To validate the competitiveness of the proposed method, 5 other methods are implemented for comparison, including the baseline CNN, the domain adversarial neural network (DANN) [35], Deep CORAL [36], the deep adaptation network (DAN) [37] and the conditional domain adversarial network (CDAN) [38]. Following the detailed settings given in the references, the experimental parameters of all the models are properly fine-tuned.
- (1) DANN is a typical adversarial learning method and an approach to transfer learning under severe label imbalance.
- (2) Deep CORAL is a DA model that utilizes second-order statistics to align features between the source and target domains (a minimal sketch of its loss is given after this list).
- (3) DAN is an adaptation model that selects optimal kernels in a multi-kernel reproducing kernel Hilbert space to match the mean embeddings of the source and target distributions.
- (4) CDAN is an adaptation model that improves the discriminative ability of the classifier through multilinear conditioning and conditional entropy.
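For reference, the sketch below implements the standard CORAL loss from [36], i.e., the squared Frobenius distance between the source and target batch feature covariances, scaled by $1/(4d^2)$; the feature tensors here are random stand-ins for activations from a shared backbone.

```python
import torch

def coral_loss(source, target):
    """CORAL loss: squared Frobenius distance between the source and target
    feature covariance matrices, scaled by 1 / (4 d^2)."""
    d = source.size(1)

    def covariance(x):
        n = x.size(0)
        x = x - x.mean(dim=0, keepdim=True)   # center the batch features
        return (x.t() @ x) / (n - 1)          # unbiased covariance estimate

    diff = covariance(source) - covariance(target)
    return (diff * diff).sum() / (4.0 * d * d)

# Toy usage with random stand-in features
src_feat = torch.randn(32, 256)
tgt_feat = torch.randn(32, 256)
print(coral_loss(src_feat, tgt_feat))
```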
4.4. Experimental Results
The overall average accuracy, defined as the number of correctly identified samples divided by the total number of test samples, is adopted to measure the performance of the different methods. To eliminate randomness in the experiments, the average values over 10 experimental runs are collected for comparison.
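A minimal sketch of this evaluation protocol is given below; the labels and per-run predictions are random stand-ins for the actual test data and model outputs.

```python
import numpy as np

rng = np.random.default_rng(0)
y_test = rng.integers(0, 11, size=1000)                               # stand-in labels
all_run_preds = [rng.integers(0, 11, size=1000) for _ in range(10)]   # 10 repeated runs

def overall_accuracy(y_pred, y_true):
    """Correctly identified samples divided by the total number of test samples."""
    return float(np.mean(np.asarray(y_pred) == np.asarray(y_true)))

run_accs = [overall_accuracy(p, y_test) for p in all_run_preds]
print(f"mean accuracy over 10 runs: {np.mean(run_accs):.4f}")
```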
- (1) Result analyses between different methods under different working conditions
The overall classification results of these two datasets on the imbalanced experiments are listed in
Table 6,
Table 7 and
Table 8 and shown in
Figure 4,
Figure 5 and
Figure 6. As listed in Table 6, the proposed UDA-EO significantly outperforms the other methods on average over the 8 transfer experiments with IR1, and its average accuracy across all transfer experiments exceeds that of the second-best method by a clear margin. As shown in Figure 4, each method fluctuates to a different degree, with CNN and the proposed UDA-EO showing the largest and smallest fluctuations, respectively. These results indicate that the proposed UDA-EO is able to handle the class-imbalanced problem more effectively under variable working conditions.
In Table 7, the average accuracy of UDA-EO with IR2 over the 8 transfer experiments is again the highest, improving on the next competing method by a clear margin. In Table 8, the classification accuracy of the proposed method with IR3 reaches approximately an 11.27% improvement over the baseline CNN method. This shows that the proposed UDA-EO retains its shared-class classification capability even under the sharp IRs encountered in real applications. In addition, the proposed method displays more robust performance and keeps relatively stable classification accuracy across all eight experiments, as shown in
Figure 5 and
Figure 6.
It should be noted that the accurate classification of the imbalanced data stems from two main reasons. First, the proposed selection strategy promotes more reliable pseudosamples for model training, ensuring that the features of the minority classes can be effectively learned by the model under severely class-imbalanced conditions. Second, the data features across domains are well aligned, which guarantees that the sample categories are correctly identified by the classifier.
- (2) Feature visualization
A result randomly selected from one transfer experiment under IR2 is processed to visualize the classification effects. The 2D features are displayed by t-SNE [39], as shown in Figure 7. It is observed from Figure 7a–e that the comparison methods have limited learning and separation effects on the different health categories under severe class imbalance, resulting in quite a few incorrect classifications of the fault types. In contrast, the classes shown in Figure 7f are effectively separated, with much clearer class boundaries and more compact learned class features, based on the proposed UDA-EO. The reason is that UDA-EO fits the distributions of the 11 fault types of the imbalanced target-domain samples well, and the overlaps of the class probability distributions are reduced under conditional entropy minimization. These results indicate the effectiveness and feasibility of using the pseudolabel idea and the entropy minimization method to address the class-imbalanced problem.
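A minimal sketch of such a visualization with scikit-learn's t-SNE is given below; the features and labels are random stand-ins for the penultimate-layer activations and true fault classes that would be used in practice.

```python
import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

# Stand-in features and labels for illustration only
rng = np.random.default_rng(0)
features = rng.normal(size=(500, 256))
labels = rng.integers(0, 11, size=500)

# Embed the high-dimensional features into 2D for plotting
embedded = TSNE(n_components=2, perplexity=30, init="pca",
                random_state=0).fit_transform(features)

plt.scatter(embedded[:, 0], embedded[:, 1], c=labels, cmap="tab20", s=5)
plt.title("t-SNE of learned features (illustrative)")
plt.show()
```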
- (3) Interclass performance study
To explore the detailed impact of the proposed framework on the minority classes in the class-imbalanced datasets under cross-domain conditions, a single-class performance study is implemented on the eight imbalanced experiments. In comparison with the baseline CNN, the average accuracies of all classes can be seen in Figure 8, where subfigure (a) corresponds to the majority class and the remaining subfigures correspond to the minority classes. It is observed that both methods perform stably on the majority class, but only the proposed UDA-EO maintains good performance on the minority classes as the IRs intensify; comparatively, the performance of the baseline CNN on the minority classes degrades significantly. This indicates that the proposed method substantially improves the classification accuracy of the minority classes while maintaining good classification ability on the majority class.
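The per-class accuracies behind such a study can be computed as in the following sketch; the data and class proportions are hypothetical stand-ins.

```python
import numpy as np

def per_class_accuracy(y_pred, y_true, num_classes):
    """Accuracy computed separately for each class; this exposes minority-class
    degradation that the overall average accuracy can hide."""
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    accs = []
    for c in range(num_classes):
        mask = y_true == c
        accs.append(float(np.mean(y_pred[mask] == c)) if mask.any() else float("nan"))
    return accs

# Stand-in predictions on a hypothetically imbalanced test set
rng = np.random.default_rng(0)
y_true = rng.choice(11, size=1000, p=[0.5] + [0.05] * 10)
y_pred = rng.integers(0, 11, size=1000)
print(per_class_accuracy(y_pred, y_true, num_classes=11))
```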
To display the role of the proposed UDA-EO under imbalanced samples more thoroughly, we randomly visualize one result and compare it with the baseline CNN, as shown in Figure 9. Clearly, the proposed selection strategy achieves a marked improvement in classification accuracy over the baseline CNN. As discussed in Section 3.3, UDA-EO is designed to assist the model in improving the confidence of the pseudosamples and in overcoming the performance degradation caused by ignoring the minority classes under class imbalance, which allows the model to maintain good classification ability under severe class imbalance.
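For illustration, the sketch below shows a standard conditional entropy term of the kind minimized here: the mean Shannon entropy of the predicted class distributions, whose minimization pushes the target predictions toward confident, low-entropy outputs and thereby sharpens the pseudolabels. This is a generic formulation, not necessarily the paper's exact loss.

```python
import torch
import torch.nn.functional as F

def conditional_entropy(logits):
    """Mean Shannon entropy of the predicted class distributions.
    Lower values correspond to more confident predictions."""
    p = F.softmax(logits, dim=1)
    log_p = F.log_softmax(logits, dim=1)
    return -(p * log_p).sum(dim=1).mean()

# Uniform predictions give high entropy; peaked predictions give low entropy
print(conditional_entropy(torch.zeros(4, 11)))        # high
print(conditional_entropy(torch.eye(11)[:4] * 10.0))  # low
```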
- (4) Parameter sensitivity analysis
Because the two penalty coefficients in the objective of UDA-EO are critical, these hyperparameters are further discussed. The experimental results are listed in Table 9, which indicate that the FD ability of the proposed method is relatively stable when the two coefficients vary over the tested range. This is because the model takes advantage of the predictive consistency and robustness of the proposed EO and data augmentation strategies to select reliable target samples for the corresponding classes. The slight changes in performance indicate that the best parameter values can be identified within the tested range.
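Such a sensitivity study amounts to a small grid search over the two coefficients, as sketched below; `train_and_evaluate` is a hypothetical placeholder for one full training run, and the grid values are illustrative rather than those of Table 9.

```python
import itertools

def train_and_evaluate(penalty_1, penalty_2):
    """Hypothetical stand-in: a real run would train UDA-EO with these penalty
    coefficients and return the test accuracy on one transfer task."""
    return 0.90 - 0.01 * abs(penalty_1 - 1.0) - 0.01 * abs(penalty_2 - 1.0)

# Hypothetical grid of penalty values
grid = [0.1, 0.5, 1.0, 2.0]
results = {(a, b): train_and_evaluate(a, b)
           for a, b in itertools.product(grid, grid)}

best = max(results, key=results.get)
print("best (penalty_1, penalty_2):", best, "accuracy:", results[best])
```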
- (5) Ablation case analysis
To evaluate the effectiveness of the proposed optimized-entropy algorithm, ablation experiments are designed in this subsection. In particular, UDA-EO is compared with the following modifications: (1) w/o conditional entropy: the CNN is trained without the conditional entropy loss function; (2) w/o data augmentation: the model is trained without the proposed data augmentation strategy; (3) Ours: the full proposed method is used to train the model. The ablation results with the IR3 settings are listed in Table 10.
In Table 10, the results of (1)–(3) show that the proposed method obtains relatively higher classification accuracy, which implies that the conditional entropy term and the proposed data augmentation strategy both have positive impacts on the UDA-EO model and improve FD precision under data imbalance settings. Comparing (2) with (3), it can be seen that the precision drops on 3 of the 4 FD tasks when the data augmentation strategy is removed from the proposed method, which shows that the data augmentation strategy can effectively alleviate the model's negative learning of minority-class feature information in the initial state and improve the FD accuracy.
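Operationally, such ablation variants can be expressed as configuration toggles that gate the corresponding terms, as in the following sketch; the flag names and the weight `lam` are hypothetical, and the augmentation flag would gate the data pipeline rather than the loss.

```python
from dataclasses import dataclass
import torch

@dataclass
class AblationConfig:
    use_conditional_entropy: bool = True   # variant (1) sets this to False
    use_augmentation: bool = True          # variant (2) sets this to False (gates
                                           # the augmentation step, not shown here)

def total_loss(cls_loss, entropy_term, cfg, lam=1.0):
    """Compose the training objective according to the ablation variant."""
    loss = cls_loss
    if cfg.use_conditional_entropy:
        loss = loss + lam * entropy_term
    return loss

# Variant (1): w/o conditional entropy
cfg = AblationConfig(use_conditional_entropy=False)
print(total_loss(torch.tensor(0.7), torch.tensor(0.2), cfg))
```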