3.1. Comparison and Analysis of Time–Frequency Maps
The original Morlet wavelet has a nonzero direct-current (DC) component and therefore does not satisfy the wavelet admissibility condition. This has two consequences: (1) the time–frequency map is blurred and unclear, and (2) different frequency components interfere with each other, producing aliasing or crossing of the energy distribution in the time–frequency diagram.
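One common way to restore admissibility is to subtract the wavelet's DC offset so that the wavelet integrates to zero. The sketch below is an illustrative correction with an assumed center frequency ω0 = 5 (not necessarily the exact modification used in this paper); it numerically compares the DC component of the original and corrected wavelets:

```python
import numpy as np

OMEGA0 = 5.0  # assumed center frequency; the paper's value may differ


def morlet(t, omega0=OMEGA0):
    """Original (unnormalized) complex Morlet wavelet."""
    return np.exp(1j * omega0 * t) * np.exp(-t**2 / 2)


def morlet_corrected(t, omega0=OMEGA0):
    """Admissibility-corrected Morlet: subtracting exp(-omega0^2/2)
    removes the DC offset, so the wavelet integrates to (nearly) zero."""
    return (np.exp(1j * omega0 * t) - np.exp(-omega0**2 / 2)) * np.exp(-t**2 / 2)


# Approximate the integral of each wavelet with a fine Riemann sum.
t = np.linspace(-10.0, 10.0, 100001)
dt = t[1] - t[0]
dc_original = np.sum(morlet(t)) * dt       # small but nonzero -> inadmissible
dc_corrected = np.sum(morlet_corrected(t)) * dt  # effectively zero -> admissible

print(abs(dc_original), abs(dc_corrected))
```

The nonzero integral of the original wavelet is exactly the DC leakage that blurs the time–frequency map; the corrected version drives it to numerical zero.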
Figure 6 shows the time–frequency diagrams obtained by applying the original Morlet wavelet transform to the four signals. Figure 7 shows the time–frequency diagrams obtained with the Morlet wavelet transform, and Figure 8 shows those obtained with the modified Morlet wavelet transform.
As is apparent in Figure 6, the original Morlet wavelet does not satisfy the admissibility condition, resulting in fuzzy features, aliasing, and crossing in the time–frequency map.
Figure 7 exhibits the same effect as Figure 6: aliasing is still present in the time–frequency diagram.
Figure 8 shows the time–frequency maps obtained with the improved Morlet wavelet, which approximately satisfies the admissibility condition and alleviates the feature ambiguity and overlap in the time–frequency map.
According to the above analysis, the improved Morlet wavelet reduces the aliasing effect and improves the resolution of the time–frequency map. However, Figure 8b,c shows that a small amount of feature aliasing remains, because the modified Morlet wavelet only approximately satisfies the admissibility condition. To further separate and enhance the features, the time–frequency maps can be fed into a residual network, which applies deeper nonlinear transformations and feature extraction to the time–frequency maps. This enables more advanced feature separation and enhancement, so that the time–frequency characteristics of the signal are better captured.
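The shortcut structure that makes these deeper transformations trainable can be sketched as a minimal fully connected residual block in NumPy (the layer sizes and random weights here are illustrative, not the paper's actual convolutional ResNet):

```python
import numpy as np


def relu(x):
    return np.maximum(0.0, x)


def residual_block(x, W1, W2):
    """y = ReLU(x + W2 @ ReLU(W1 @ x)).

    The identity shortcut (the '+ x') lets the block learn only a residual
    mapping, which eases the optimization of very deep networks."""
    return relu(x + W2 @ relu(W1 @ x))


rng = np.random.default_rng(0)
x = rng.standard_normal(8)
W1 = rng.standard_normal((8, 8)) * 0.1
W2 = rng.standard_normal((8, 8)) * 0.1

y = residual_block(x, W1, W2)
print(y.shape)  # (8,)

# With zero weights the block reduces to ReLU(x): the shortcut simply
# passes the input through, so depth never hurts the identity mapping.
y_id = residual_block(x, np.zeros((8, 8)), np.zeros((8, 8)))
```

In the actual model the same skip-connection idea is applied to 2-D convolutional layers operating on the time–frequency maps.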
3.2. Wavelet Residual Network
For each working condition, the sample length is 784, and there are 600 samples per class. The dataset is divided into training, validation, and test sets in the ratio 0.5:0.25:0.25. Two experiments are reported in this paper: Experiment 1 trains a residual neural network on the raw samples, while Experiment 2 first applies the wavelet transform and then inputs the result to the residual neural network.
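The 0.5:0.25:0.25 split can be sketched as follows (a minimal NumPy version applied per class; the paper's actual shuffling procedure may differ):

```python
import numpy as np

SAMPLES_PER_CLASS = 600
SAMPLE_LENGTH = 784
RATIOS = (0.5, 0.25, 0.25)  # train : validation : test


def split_class(samples, ratios=RATIOS, seed=0):
    """Shuffle one class's samples and split them by the given ratios."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(samples))
    n_train = int(len(samples) * ratios[0])
    n_val = int(len(samples) * ratios[1])
    return (samples[idx[:n_train]],
            samples[idx[n_train:n_train + n_val]],
            samples[idx[n_train + n_val:]])


one_class = np.zeros((SAMPLES_PER_CLASS, SAMPLE_LENGTH))  # placeholder data
train, val, test = split_class(one_class)
print(train.shape, val.shape, test.shape)  # (300, 784) (150, 784) (150, 784)
```

With 600 samples per class this yields 300 training, 150 validation, and 150 test samples for each of the 10 classes.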
The sample signals are normalized, and the Adam optimization algorithm is adopted. The initial learning rate is 0.001 and is reduced to one-tenth of its value every five iterations. In total, 10 classes of sample signals are input into the model. Experiment 1 used the ResNet model: the training-set accuracy was 100%, and the test-set accuracy was 93.80%, as shown in Figure 9. In Experiment 2, the training-set and test-set accuracies were 99.15% and 99.12%, respectively, as shown in Figure 10. The results show that the proposed method improved the test accuracy by 5.32% (99.12% − 93.80%).
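The step-decay schedule described above (initial rate 0.001, divided by 10 every five iterations) can be sketched as:

```python
def learning_rate(iteration, initial_lr=0.001, drop_every=5, factor=0.1):
    """Step-decay schedule: the rate is multiplied by `factor`
    every `drop_every` iterations, as described in the text."""
    return initial_lr * factor ** (iteration // drop_every)


for it in (0, 4, 5, 10):
    print(it, learning_rate(it))
# iterations 0-4 use 0.001, iterations 5-9 use 0.0001, and so on
```

Equivalent built-in schedulers exist in most deep learning frameworks (e.g. a step-type learning-rate scheduler), so this helper is only a transparent stand-in.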
In summary, the model proposed in this paper identifies faults more accurately than ResNet under variable load conditions, and the proposed method can be effectively applied to fault diagnosis.
Figure 11 shows the confusion matrix of the residual network's training results. As is apparent from the figure, prediction errors exist in the classes labeled 0, 1, 2, and 8, whose prediction accuracies were 94.67%, 88.00%, 77.33%, and 98.67%, respectively.
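Per-class accuracy is read from the confusion matrix as the diagonal entry divided by the row sum. A minimal sketch follows; the 150-samples-per-class count comes from the 0.25 test split of 600 samples, but the off-diagonal entries below are illustrative, not the paper's exact matrix:

```python
import numpy as np


def per_class_accuracy(cm):
    """Diagonal over row sums: fraction of each true class predicted correctly."""
    cm = np.asarray(cm, dtype=float)
    return np.diag(cm) / cm.sum(axis=1)


# Illustrative 3-class confusion matrix, 150 test samples per class.
cm = np.array([
    [142,   5,   3],   # class 0: 142/150 = 94.67%
    [ 10, 132,   8],   # class 1: 132/150 = 88.00%
    [  2,   0, 148],   # class 2: 148/150 = 98.67%
])
acc = per_class_accuracy(cm)
print(np.round(acc * 100, 2))  # per-class accuracies: 94.67, 88.0, 98.67
```

Note that 94.67% and 98.67% correspond to 142 and 148 correct predictions out of 150, matching the granularity of the reported figures.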
Figure 12 shows the confusion matrix of the wavelet residual network's training results. As observed in the figure, only the class labeled 1 contains prediction errors, with an accuracy of 96.00%.
Classification performance usually varies with the number of training samples in the training set.
Table 4 shows the training results for 300, 600, 900, and 1200 samples per class. The results include accuracy and loss rates for the training and validation sets. Meanwhile,
Figure 13 shows the accuracy trend of the training set, while Figure 14 shows that of the validation set. These trends indicate whether the model converges during training and how well it generalizes to the validation set.
As can be seen from Figures 13 and 14, the accuracy of both the training and validation sets improves as the number of training samples increases, enhancing model performance. However, when the sample size reaches 1200 per class, the training and validation accuracies decrease by 0.56% and 1.85%, respectively. In general, as the number of training samples increases, the classification model can more easily capture the overall characteristics and general rules of the data, reducing the risk of overfitting. However, an excessively large training set may also cause the model to overfit the training data and perform poorly on unseen data, so there is a trade-off between overfitting and underfitting. A total of 600 samples per class is selected for the following reasons: (1) as seen in Figure 14, 600 samples per class provide enough information for learning the characteristics and regularities of the data; and (2) a smaller sample size means a shorter training time. Since the time required to train a model grows roughly in proportion to the size of the dataset, choosing a smaller number of samples significantly reduces training time and improves efficiency.
By using the t-SNE algorithm [21], Figure 15 shows the distribution of data points at each processing stage after t-SNE dimensionality reduction. Ten colors represent the ten categories of fault signals. By observing the relative positions and clustering structure of the data points, the classification behavior can be analyzed more intuitively. If the data points of the same category cluster tightly together in the dimensionality-reduced space, the model's classification ability for that category can be considered good. Conversely, if data points of the same category are scattered across different regions of the space, or points of different categories are clustered together, misclassification or confusion may be present.
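The qualitative criterion above (same-class points tight, different-class points far apart) can be quantified with a simple separation ratio on the 2-D embedding. The helper below is a hypothetical illustration, not part of the paper's pipeline; it works on any embedding array with class labels:

```python
import numpy as np


def separation_ratio(points, labels):
    """Mean inter-class distance divided by mean intra-class distance.
    Ratios well above 1 indicate tight, well-separated clusters."""
    points = np.asarray(points, dtype=float)
    labels = np.asarray(labels)
    # Pairwise Euclidean distances between all embedded points.
    d = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=-1)
    same = labels[:, None] == labels[None, :]
    off_diag = ~np.eye(len(points), dtype=bool)
    intra = d[same & off_diag].mean()  # within-class spread
    inter = d[~same].mean()            # between-class spread
    return inter / intra


# Two well-separated toy clusters in 2-D.
rng = np.random.default_rng(0)
a = rng.normal([0.0, 0.0], 0.1, size=(50, 2))
b = rng.normal([5.0, 5.0], 0.1, size=(50, 2))
pts = np.vstack([a, b])
lbl = np.array([0] * 50 + [1] * 50)
print(separation_ratio(pts, lbl))  # much greater than 1
```

Applied to the embeddings of Figure 15a–d, such a ratio would be expected to rise at each processing stage as the clusters tighten and separate.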
Figure 15a shows the distribution of the original vibration signals. Because these signals are unprocessed, the data points of different categories are randomly mixed together, and no clear class structure can be observed.
Figure 15b shows the result after the original Morlet wavelet transform. Aliasing is apparent between the data points of classes 0 and 1, and also between those of classes 7 and 8, showing that the original Morlet wavelet transform cannot effectively separate the different classes of data points.
Figure 15c shows the result of the improved Morlet wavelet transform. Compared to Figure 15b, data points with different labels are separated more widely, and data points with the same label are clustered more closely. This shows that the improved Morlet wavelet transform reduces the aliasing between features, although a small amount of aliasing can still be observed between classes 1 and 5.
Figure 15d shows the classification result after further processing by the residual network. It can be seen that data points of different types are completely separated, and those of the same type are closely clustered. This indicates that the model has good classification performance after residual network processing.
In summary,
Figure 15 shows the effect of different processing steps on vibration signal classification. The introduction of improved Morlet wavelet transform and residual network helps improve classification performance and feature separation so that different types of data points can be better distinguished.