Rolling bearings are critical components in rotating machinery, and their operating conditions under various loads directly affect their performance, stability, and endurance. To keep mechanical equipment operating normally, the vibration signals generated by the rotating mechanism must be monitored in real time [1]. Many scholars have extensively studied the fault detection and diagnosis of rolling bearings [2,3,4]. Traditional manual diagnosis can no longer cope with the large-volume, diverse, and high-speed data of the current mechanical field; it shows poor diagnostic capability and generalization performance when faced with massive amounts of mechanical equipment data that involve alternating working conditions and strongly coupled fault information [5].
The diagnosis of rolling bearings generally consists of two stages: feature extraction and classification. Signal processing approaches that are widely employed to extract features from a raw signal include the short-time Fourier transform (STFT) [6], the wavelet transform (WT) [7], and empirical mode decomposition (EMD) [8]. However, traditional fault diagnosis methods rely heavily on manual feature engineering and expert knowledge, and the process is time-consuming and laborious. In addition, when the extracted features are insufficient, the accuracy of fault diagnosis is greatly reduced, which is not conducive to diagnostic tasks on massive amounts of industrial data. In the past decade, machine-learning theories and statistical inference techniques have been widely applied to identify bearing faults, such as Bayesian networks [9], artificial neural networks (ANNs) [10], support vector machines (SVMs) [11], and k-nearest neighbors [12]. Despite the effectiveness of these methods, shallow models are restricted in their capacity to represent complicated functions with limited samples; thus, they struggle to diagnose faults from complex, high-dimensional signals.
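The two-stage scheme described above (feature extraction followed by classification) can be illustrated with a minimal sketch. It is not the pipeline of any cited work: the frame length, hop size, time-averaged STFT feature, 1-nearest-neighbor rule, and the synthetic sinusoids standing in for two bearing conditions are all assumptions chosen for illustration.

```python
import numpy as np

def stft_magnitude(x, frame_len=128, hop=64):
    """Magnitude STFT with a Hann window; rows are frames, columns are frequency bins."""
    x = np.asarray(x, dtype=float)
    n_frames = 1 + (len(x) - frame_len) // hop
    window = np.hanning(frame_len)
    frames = np.stack(
        [x[i * hop : i * hop + frame_len] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

def extract_features(x):
    """Stage 1: average the spectrogram over time to obtain one feature vector."""
    return stft_magnitude(x).mean(axis=0)

def nearest_neighbor_predict(train_feats, train_labels, feat):
    """Stage 2: 1-nearest-neighbor classification by Euclidean distance."""
    dists = np.linalg.norm(train_feats - feat, axis=1)
    return train_labels[int(np.argmin(dists))]

# Synthetic stand-ins for two conditions (hypothetical, not real bearing data):
# class 0 is dominated by a 50 Hz component, class 1 by a 120 Hz component.
rng = np.random.default_rng(0)
t = np.arange(1024) / 1024.0

def make_signal(freq):
    return np.sin(2 * np.pi * freq * t) + 0.1 * rng.standard_normal(t.size)

train_feats = np.stack([extract_features(make_signal(f)) for f in (50, 50, 120, 120)])
train_labels = np.array([0, 0, 1, 1])

# Classify a fresh signal generated with the class-1 frequency.
pred = nearest_neighbor_predict(train_feats, train_labels, extract_features(make_signal(120)))
```

The hand-chosen feature in `extract_features` is exactly the kind of manual design decision that the paragraph above identifies as the bottleneck of traditional methods.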
In recent years, deep-learning models, which use deep network structures to achieve more efficient and reliable feature extraction, have grown in popularity in the field of machine learning. Deep learning removes the dependence on manual feature extraction and expert experience, and it has achieved breakthroughs in many pattern recognition tasks such as natural-language processing [13], automatic speech recognition [14], and computer vision [15]. The application of deep-learning models in fault diagnosis and health monitoring is flourishing [16,17]. Shao et al. [18] proposed a new deep belief network optimized with the particle swarm algorithm and verified the robustness of the model. Wen et al. [19] developed a novel deep transfer learning (DTL) model for fault diagnosis that extracted features with a three-layer sparse autoencoder and achieved high prediction accuracy. Jiang et al. [20] constructed a deep recurrent neural network with an adaptive learning rate for the fault diagnosis of bearings, and the results confirmed the effectiveness of the method. Hasan et al. [21] proposed an explainable AI-based fault diagnosis model and incorporated explainability into the feature selection process.
Within the deep-learning framework, convolutional neural networks (CNNs), as end-to-end learning models with powerful feature extraction capability, have received increasing attention in fault diagnosis. Chen et al. [22] developed bearing discrimination patterns on the basis of the cyclic spectral coherence (CSCoh) maps of vibration signals and established a CNN model to learn high-level features. Guo et al. [23] proposed a deep convolutional transfer learning network (DCTLN) for transfer fault diagnosis tasks and verified the effectiveness of the model experimentally. Jia et al. [24] proposed a deep normalized convolutional neural network (DNCNN) to address imbalanced classification problems in fault diagnosis. In some scenarios, raw one-dimensional signals are converted into two-dimensional gray images whose pixels are filled by stacking the data [25,26]. However, such images may contain limited feature information because the correlation within the raw vibration sequence can be corrupted. Although there are a few commonly used image representation approaches based on time–frequency principles, such as the short-time Fourier transform (STFT) [6] and the wavelet packet transform (WPT) [27], the fixed window of the STFT makes it poorly suited to nonstationary signals such as mechanical fault signals, and choosing the number of decomposition layers for wavelet packets usually relies heavily on expert knowledge. Therefore, a new image encoding method called the Markov transition field (MTF) was introduced [28]; it preserves complete time-domain information by representing Markov transition probabilities and converts that information into two-dimensional images.
In addition, despite the great success of deep convolutional neural networks, degradation problems such as vanishing or exploding gradients can occur as the number of layers increases. To address this issue, He et al. [29] proposed residual networks, which have achieved excellent performance on various machine-learning tasks.
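The Markov transition field encoding mentioned above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the implementation used in this paper: the function name `markov_transition_field`, the quantile binning, the default of 8 bins, and the synthetic test signal are all chosen here for illustration. The idea is to quantize the series into bins, estimate a first-order transition matrix between bins, and then fill pixel (i, j) with the transition probability from the bin of sample i to the bin of sample j.

```python
import numpy as np

def markov_transition_field(x, n_bins=8):
    """Encode a 1-D series as an n x n image of Markov transition probabilities."""
    x = np.asarray(x, dtype=float)

    # Assign each sample to a quantile bin (labels 0 .. n_bins - 1).
    edges = np.quantile(x, np.linspace(0.0, 1.0, n_bins + 1)[1:-1])
    q = np.digitize(x, edges)

    # Count first-order transitions between the bins of consecutive samples.
    W = np.zeros((n_bins, n_bins))
    np.add.at(W, (q[:-1], q[1:]), 1.0)

    # Normalize each row to probabilities; rows with no outgoing transitions stay zero.
    row_sums = W.sum(axis=1, keepdims=True)
    W = np.divide(W, row_sums, out=np.zeros_like(W), where=row_sums > 0)

    # Pixel (i, j) holds the probability of moving from the bin of x[i] to the bin of x[j].
    return W[q[:, None], q[None, :]]

# Hypothetical vibration-like signal: a sinusoid plus noise.
rng = np.random.default_rng(0)
t = np.linspace(0.0, 1.0, 256)
signal = np.sin(2 * np.pi * 12 * t) + 0.2 * rng.standard_normal(t.size)

mtf_image = markov_transition_field(signal, n_bins=8)  # shape (256, 256), values in [0, 1]
```

The resulting image has one row and one column per time step, so for long signals it is usually reduced (for example, by block averaging) before being fed to a CNN; that step is omitted here.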
To efficiently represent the state characteristics of vibration signals in image form and to improve the feature learning capability of the network, this paper proposes a new intelligent bearing fault diagnosis method (MTF-ResNet). The main contributions of this paper are summarized as follows.
The remainder of this paper is organized as follows. Section 2 introduces the fundamentals of CNNs and residual networks. Section 3 elaborates the details of the proposed MTF-ResNet model for fault diagnosis. Section 4 presents an experimental analysis that verifies the effectiveness of the proposed model on a popular bearing dataset. Section 5 presents the conclusions.