Enhanced Bearing Fault Diagnosis in NC Machine Tools Using Dual-Stream CNN with Vibration Signal Analysis

Ni, Zhen; Tong, Yifei; Song, Yixuan; Wang, Ruikang

doi:10.3390/pr12091951

¹ School of Information Engineering, Nanjing Xiaozhuang University, Nanjing 211171, China

² School of Mechanical Engineering, Nanjing University of Science & Technology, Nanjing 210094, China

* Author to whom correspondence should be addressed.

Processes 2024, 12(9), 1951; https://doi.org/10.3390/pr12091951

Submission received: 13 August 2024 / Revised: 7 September 2024 / Accepted: 9 September 2024 / Published: 11 September 2024

(This article belongs to the Section Process Control and Monitoring)

Abstract

Numerically controlled (NC) machine tools, as vital production equipment in manufacturing, have been widely applied across various sectors and have become a core competitive advantage for enterprises in the global market. Therefore, ensuring the normal and efficient operation of NC machine tool groups and promptly diagnosing faults have become critical concerns for many enterprises and scholars today. This paper focuses on bearing fault diagnosis, utilizing the vibration signals from the Case Western Reserve University Bearing Data Center as the input dataset. This study constructed a dual-stream convolutional neural network (CNN) fault diagnosis model, where the first stream processes one-dimensional vibration signal spectra and the second stream handles two-dimensional time-frequency maps derived from the same signals. The model uniquely integrates convolutional attention mechanisms to enhance feature extraction along with dropout algorithms and batch normalization to prevent overfitting and improve training stability. The proposed approach enables a comprehensive learning of both temporal and spatial features, effectively identifying bearing faults with high accuracy. The model’s performance was validated against this widely recognized dataset, demonstrating superior accuracy compared to traditional methods.

Keywords: dual-stream CNN; fault diagnosis; bearing faults

1. Introduction

Numerically controlled machine tools are widely used in modern manufacturing, playing a crucial role in manufacturing-related enterprises. However, in such contexts, there are also increased potential risks. If a machine tool failure caused by a damaged component is not detected in time, it can trigger a chain reaction that affects the normal operation of other machine tools, reduces the qualification rate of processed workpieces, and even interrupts production tasks, leading to irrecoverable economic losses. Therefore, ensuring the stable and efficient operation of NC machine tool groups and focusing on the state monitoring and fault diagnosis technology of NC machine tool groups are crucial. The frequency of spindle-bearing failures is notably high. As a vulnerable component in NC machine tools, the health condition of rolling bearings directly impacts the machining precision and product quality of the equipment.

During the operation of numerically controlled machine tools, acoustic signals, temperature signals, and vibration signals are generated. When the spindle bearings are subjected to long-term loading, they release energy that causes shock waves, which can be detected through sensors. The acoustic signals collected by these sensors exhibit changes in the early stages of spindle-bearing failure. Indicators such as amplitude, energy, and average signal level are used for fault diagnosis. Although acoustic signals have been successfully applied in diagnosing spindle-bearing faults in NC machine tools, this method has limitations. It is highly susceptible to interference from external environmental noise and has relatively high detection costs. Temperature signals are greatly influenced by the environment and are typically monitored using contact sensors that measure the temperature of the spindle-bearing housing or are estimated using infrared thermography. These temperature signals can be converted into thermal images to extract features for fault diagnosis. However, this approach cannot pinpoint the exact fault location and tends to lag in detecting early-stage issues such as minor wear or initial pitting, where temperature changes are minimal. Vibration signals, on the other hand, respond quickly to changes in the spindle-bearing condition. By installing vibration sensors on the spindle housing or bearing seat, vibration signals can be collected and analyzed using various algorithms to identify the fault location. This method is currently the most commonly used in the field of spindle-bearing fault diagnosis due to its strong adaptability across different working conditions and types of bearings, as well as its lower cost and higher reliability. Therefore, this paper focuses on a fault diagnosis of spindle rolling bearings in NC machine tools based on vibration signals.

This paper adopts an end-to-end recognition algorithm to establish a convolutional model that integrates multiple feature fusion based on the characteristics of vibration signals. The effectiveness of this model in diagnosing faults in rolling bearings (hereafter referred to as bearings) was verified, and its ability to maintain high accuracy in noisy environments and domain-adaptive scenarios was demonstrated.

2. Characteristics of Bearing Vibration Signals

When manufacturing and assembly errors are disregarded, bearing vibration signals arise from two sources: inherent vibrations due to elastic contact and vibrations from localized bearing damage. The vibration signals generated by normal bearings are complex, affected by multiple interference factors, and randomly distributed. When a bearing fails, the fault changes the frequency content of the original vibration signal and appears as a periodic pulse train. Each impact also excites high-frequency vibrations in the system, so the signal contains a low-frequency part (the repetition of the impacts) and a high-frequency part (the excited vibrations); in other words, the periodic impacts generated by bearing faults modulate the inherent vibration of the bearing [1].
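The modulation described above can be illustrated with a small synthetic signal. The sketch below is only an illustration under assumed values: the sampling rate, impact rate, resonance frequency, and decay constant are hypothetical and are not taken from the CWRU dataset. Each impact starts a decaying burst of high-frequency oscillation, so a low-frequency periodic envelope rides on a high-frequency carrier.

```python
import math

def simulate_fault_signal(fs=12000, n_samples=1200, fault_freq=100.0,
                          resonance_freq=3000.0, decay=800.0):
    """Return a list of samples: one decaying resonance burst per impact.

    All default values are illustrative assumptions, not measured data.
    """
    samples_per_impact = round(fs / fault_freq)   # impact period in samples
    signal = []
    for i in range(n_samples):
        # Time since the most recent impact (integer arithmetic avoids drift).
        dt = (i % samples_per_impact) / fs
        envelope = math.exp(-decay * dt)                         # low-frequency part
        carrier = math.cos(2 * math.pi * resonance_freq * dt)    # excited resonance
        signal.append(envelope * carrier)
    return signal

sig = simulate_fault_signal()
# Indices where a new impact starts (the burst is at full amplitude there).
impact_positions = [i for i, v in enumerate(sig) if v > 0.99]
print(impact_positions[:4])
```

The spacing between the printed impact positions equals the impact period in samples, which is the periodicity that envelope analysis looks for in a faulty bearing signal.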

Random samples of vibration signals under normal conditions, as well as outer-ring faults, inner-ring faults, and rolling-element faults, were extracted from the CWRU standard bearing fault dataset. Figure 1 and Figure 2, respectively, depict the vibration acceleration signals and signal envelope spectra of the bearings in these four states. Under normal conditions, the vibration signals of bearings exhibit random distribution with relatively small amplitude, showing periodic weak attenuation. When an outer-ring fault occurs, the rolling-element impact at the fault position causes shock vibration, resulting in periodic characteristics of the bearing’s vibration signals, with the amplitude of each impact remaining basically unchanged [2]. This periodicity is also clearly observable in the envelope spectrum. In the case of inner-ring faults, because the inner ring rotates continuously during bearing operation, the magnitude and direction of the load at the fault location change periodically with external loads. Compared to outer-ring fault signals, the amplitude significantly decreases, exhibiting periodic variations that modulate the bearing vibration signals. For rolling-element faults (as the rolling elements not only revolve, but also rotate on their own axis during rolling), the contact state with the inner and outer rings changes during operation, leading to variations in the magnitude and direction of the load. From the above analysis, it is evident that the amplitude of the vibration signals generated when faulty rolling elements contact the outer ring is greater than when contacting the inner ring, and they are more susceptible to interference, rendering the fault characteristics relatively ambiguous.

3. Literature Review

Fault diagnosis involves the comprehensive analysis of fault information by combining the status data of the faulty subject with the enterprise’s fault response knowledge base and historical experience. The discipline of fault diagnosis began to develop in the 1960s and received widespread attention globally. Numerous safety incidents caused by sudden failures in numerically controlled (NC) machine tools during production processes drew attention in the United States, leading to the establishment of a mechanical fault prevention team dedicated to in-depth research on equipment fault diagnosis technology [3]. The United Kingdom also established the Machinery Health Monitoring Group & Condition Monitoring Association (MHMG & CMA), advocating for the technical concept of integrated engineering maintenance, which facilitated the rapid progress of the fault diagnosis discipline.

Overall, the development of fault diagnosis for NC machine tool groups can be divided into three stages: The first is manual fault diagnosis, which relies on the practical experience of maintenance personnel to identify faults in the various components of NC machine tool groups. The second stage involves the use of dynamic testing technology to establish mathematical models based on the collected data from various components of NC machine tool groups for fault diagnosis, a phase that has led to the rapid development of signal processing technology. The third stage is the application of intelligent diagnostic technologies. With the rapid advancement of the Internet and artificial intelligence, traditional fault diagnosis methods based on simple knowledge reasoning have become insufficient. This has ushered in the era of intelligent fault diagnosis, which is characterized by technologies such as “expert systems”, “fuzzy theory”, “support vector machines”, “artificial neural networks”, and “deep learning”. This section discusses the commonly used intelligent fault diagnosis techniques [4].

Artificial neural networks (ANNs) simulate the structure and functions of the human brain, leveraging their powerful information processing capabilities to achieve self-learning. Due to their strong serial arithmetic abilities and parallel processing systems, ANNs have been widely applied in the fault diagnosis of NC machine tools. For example, Unal et al. used Hilbert transform and fast Fourier transform to extract features from collected vibration signals and input them into a backpropagation neural network (BPNN) for training, successfully identifying the fault conditions in NC machine tool rolling bearings [5]. Wu L. et al. employed Laplace transform to filter the signal features of NC machine tool rolling bearings, which were then input into a wavelet neural network (WNN) for diagnosis, with experiments demonstrating high accuracy [6].

ANNs have significant advantages in handling the large-scale data within NC machine tools, and they have strong learning abilities and good fault tolerance, leading to the rapid development of fault diagnosis technologies based on ANN algorithms. Deep learning, an extension of ANNs, is essentially a neural network with multiple hidden layers that, combined with a series of learning algorithms, forms a deeper network structure with superior performance compared to traditional ANNs. Deep learning, with its strong self-learning capabilities, has emerged as a promising field in fault diagnosis development, enabling comprehensive learning of NC machine tool signal characteristics and achieving end-to-end intelligent fault diagnosis.

For instance, Shao H. et al. applied a deep autoencoder to the fault diagnosis of NC machine tool gearboxes and bearings using an artificial fish swarm optimization algorithm to optimize network parameters, with multiple trials proving the algorithm’s effectiveness [7]. Compared to methods such as deep belief networks and deep autoencoders, convolutional neural networks (CNNs) offer new approaches in the field of NC machine tool fault diagnosis with their distinctive advantages such as local connectivity and weight sharing, making the network structure more similar to biological neural networks. By extracting data features and reducing dimensionality through convolutional and pooling layers, CNNs identify faults in the fully connected layer.

Due to CNNs’ superior image recognition capabilities, most researchers utilize 2D-CNNs for fault recognition. Zhao used time-frequency graphs of the vibration signals from NC machine tool rolling bearings obtained through compression transform as input for CNNs, and they also employed particle swarm optimization to achieve a convolutional neural network model with high fault recognition accuracy [8]. Zhang D. et al. converted the one-dimensional time-series signals of NC machine tool rolling bearings into grayscale images, extracted features using CNNs, and trained the model; they then combined it with transfer learning to transfer the parameters from the source model to DCNN-TL, achieving high classification accuracy. Deep learning models have gradually demonstrated their advantages in the context of NC machine tool intelligence development, and they are capable of capturing rich information in high-speed, large-capacity, and diverse sample data, leading to superior fault diagnosis outcomes [9]. Currently, deep learning holds unique advantages in the field of NC machine tool fault diagnosis and is developing toward a trend of integrating multiple optimization algorithms.

4. Issues with Classical CNN Models in Bearing Fault Diagnosis Applications

The traditional machine learning approaches applied to bearing fault diagnosis require extensive domain knowledge and involve cumbersome and challenging diagnostic processes, which do not adequately meet the demands of complex industrial production environments. Although classical convolutional neural network (CNN) models are widely used because they eliminate manual feature extraction, reduce human influence on features, and achieve high recognition accuracy compared to traditional machine learning methods, they still face several issues when directly applied to bearing fault diagnosis.

(1): The structure of classical CNN models is singular and the data structure at the input layer is relatively uniform; thus, they are unable to comprehensively analyze the features of complex data, resulting in insufficient speed and effectiveness in recognition.
(2): Classical CNN models leverage advantages such as local perception and weight sharing, and they are extensively applied in image recognition fields where they effectively represent image features. However, bearing fault signals typically manifest as one-dimensional vibration signals, posing a mismatch with the filter structure of classical CNNs.
(3): The CNN models widely used in image recognition predominantly address low-noise, multi-class classification problems, whereas bearing fault diagnosis often involves high-noise, few-class classification problems. Due to these inherent differences, classical CNN models fail to leverage their strengths effectively.
(4): Bearing fault signals often exhibit periodicity, rich frequency domain information, and significant interference. Using classical CNN models alone does not effectively extract features or explore the frequency domain information based on signal characteristics.

5. Dual-Stream CNN Model

Based on the characteristics of bearing fault signals and the challenges of classical CNN models, this paper constructed a dual-stream CNN model that integrates one-dimensional features and two-dimensional feature maps of bearing fault signals for multi-feature fusion.

Dual-stream networks were first applied to behavioral analysis in videos or images, mimicking human visual processing during object recognition. However, the strong local perception of convolutional neural networks means that a single network processes video and image information poorly. To better characterize the environmental information in videos or images, dual-stream networks split the behavioral classification task into two streams and process the spatial and temporal streams separately. The spatial stream network takes a single frame as input and extracts and parses the features in its static information. The temporal stream network takes optical flow images as input and parses the dynamic information between two frames. The two streams therefore extract different types of information from the input. After feature sequences are extracted from the spatial and temporal streams, the two parts are fused for subsequent analysis. To ensure the generalization ability of the network model, convolutional fusion, LSTM fusion, and other methods are often used. The structure of dual-stream neural networks in the field of computer vision is schematically shown in Figure 3 [10].

6. Optimizing the Dual-Stream CNN Fault Diagnosis Model

6.1. Time-Frequency Analysis Methods

This section focuses on optimizing the dual-stream CNN model, where the 2D-CNN requires converting the one-dimensional bearing vibration signals into two-dimensional time-frequency maps for input. Since bearing vibration signals are non-stationary, traditional signal processing methods, such as time-domain or frequency-domain analysis, cannot capture both types of features simultaneously. Therefore, this section compares commonly used time-frequency analysis methods to select the most suitable one.

In practical signal acquisition, the collected fault signals are non-stationary time-domain signals, making time-frequency analysis crucial in fault diagnosis. Time-frequency analysis transforms the amplitude relationships in the time domain into frequency distribution characteristics, allowing simultaneous analysis in both time and frequency domains. Additionally, it can represent the energy density of fault signals through color, enabling more effective and precise extraction of bearing fault features [11]. Common linear time-frequency analysis methods include Short-Time Fourier Transform (STFT), Continuous Wavelet Transform (CWT), and Generalized S Transform (GST). Non-linear methods include Wigner–Ville Distribution (WVD).

(1): Short-Time Fourier Transform (STFT).

The traditional Fourier Transform, when used to process bearing fault signals, provides frequency spectrum information without time resolution, limiting its ability to analyze signal characteristics in the time domain. To address this issue, British physicist Dennis Gabor proposed the Short-Time Fourier Transform in 1946, overcoming the limitations of the traditional Fourier Transform and finding extensive application in speech recognition [12].

The Short-Time Fourier Transform assumes that the signal is stationary over short periods. It segments the signal along the time axis in the time domain, applies a windowing function to each segment, and performs Fourier Transform on them, resulting in frequency spectrum functions. The size of the sliding window function determines the segment size and, consequently, the time resolution of the frequency signals, thereby addressing the limitation of the Fourier Transform, which only identifies frequency components without providing time information. The definition of STFT is given by

STFT_{z} (τ, f) = \int_{-\infty}^{\infty} z(t) g(t - τ) e^{-j 2 π f t} dt,

(1)

where t represents time; τ is the time shift that positions the window; f represents frequency; z(t) is the source signal; and g(t) is the window function, which is commonly chosen from Hanning, rectangular, or Hamming windows.

Ref. [13] indicates that selecting a window function involves a tradeoff: a shorter window improves time resolution but reduces frequency resolution, while a wider window does the opposite. For bearing fault signals, frequency resolution and interference resistance are crucial, making the Hamming window, with its low side-lobe leakage, a preferred choice. Using the Hamming window function, STFT was applied to two random samples from the CWRU bearing fault dataset, one from a normal bearing and one from a bearing with a 0.021-inch outer-ring fault. Figure 4 shows the time-frequency maps obtained using STFT for these samples.
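A discrete version of Equation (1) can be sketched as follows. This is a minimal illustration, not the implementation used in the study: the window length, hop size, and the two-tone test signal are assumptions chosen so the result is easy to check, and only the magnitude of each spectrum is kept.

```python
import cmath
import math

def hamming(n):
    """Hamming window of length n."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * k / (n - 1)) for k in range(n)]

def stft_magnitude(z, win_len, hop):
    """Window each segment of z, then take the magnitude of its DFT."""
    g = hamming(win_len)
    frames = []
    for start in range(0, len(z) - win_len + 1, hop):
        seg = [z[start + k] * g[k] for k in range(win_len)]
        spectrum = []
        for m in range(win_len // 2 + 1):          # non-negative frequencies
            acc = sum(seg[k] * cmath.exp(-2j * math.pi * m * k / win_len)
                      for k in range(win_len))
            spectrum.append(abs(acc))
        frames.append(spectrum)
    return frames

# Assumed test signal: 50 Hz for the first half, 200 Hz for the second half.
fs = 1000
z = [math.sin(2 * math.pi * (50 if i < 200 else 200) * i / fs) for i in range(400)]
frames = stft_magnitude(z, win_len=100, hop=50)

# With a 100-sample window at 1000 Hz, each bin is 10 Hz wide.
first_peak = max(range(len(frames[0])), key=lambda m: frames[0][m])
last_peak = max(range(len(frames[-1])), key=lambda m: frames[-1][m])
print(first_peak * fs / 100, last_peak * fs / 100)
```

The peak frequency moves from the first frame to the last, which a plain Fourier Transform of the whole signal could not show. Shortening `win_len` would localize the change in time more sharply but widen each frequency bin, which is the tradeoff discussed above.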

(2): Continuous Wavelet Transform (CWT).

Although the Short-Time Fourier Transform (STFT) addresses some of the limitations of Fourier Transform, its fixed sliding window function cannot balance time-frequency resolution globally, resulting in poor adaptability to different signals. The scale and translation properties of wavelet functions address these issues. While the window area in wavelet transforms is fixed, its length and width can adjust according to frequency changes, altering the time resolution to handle both the high and low-frequency components of the signal. This adaptability ensures precise and high-resolution signal processing, focusing on the specific features within fault signals.

Similar to the Fourier Transform, the wavelet coefficients in the Continuous Wavelet Transform (CWT) are represented through a series of functions. If the following conditions are met, a wavelet time-frequency map can be obtained and the inverse transform exists:

\int_{-\infty}^{\infty} \frac{\overline{ψ}(ω)}{|ω|} dω < +\infty,

(2)

\int_{- \infty}^{\infty} t^{k} ψ (t) d t = 0, k = 0, 1, 2, \dots, N - 1

(3)

In Equation (2), ψ denotes the parent wavelet function and \overline{ψ} represents the conjugate function of ψ.

The continuous wavelet function can be represented by ψ as

ψ_{a, b} (t) = \frac{1}{\sqrt{a}} ψ (\frac{t - b}{a}), a, b \in R, a \neq 0 .

(4)

In Equation (4), a denotes the scaling factor, which can be set larger to focus on the low-frequency features of the signal or smaller for high-frequency features; and b represents the translation factor, allowing the target to shift parallel along the time axis.

The Continuous Wavelet Transform can be expressed as

CWT_{f} (a, b) = \frac{1}{\sqrt{a}} \int_{-\infty}^{\infty} f(t) \bar{ψ} (\frac{t - b}{a}) dt .

(5)

The inverse transformation can be expressed as

f(t) = \frac{1}{C} \int_{-\infty}^{\infty} \int_{-\infty}^{\infty} ψ_{a, b}(t) CWT_{f} (a, b) \frac{1}{a^{2}} da db,

(6)

C = 2 π \int_{-\infty}^{\infty} \frac{\overline{ψ}(ω)}{|ω|} dω .

(7)

A key factor in wavelet transform is the selection of the wavelet basis. Common wavelet functions include Haar, Daubechies, Morlet, and complex Morlet (cmor) wavelets. Bearing fault signals are non-stationary with localized variations. The cmor wavelet, as a complex form of the Morlet wavelet, closely resembles the time-series waveform of bearing fault signals and accurately depicts the time-frequency changes of the signal energy with strong adaptive capabilities. Thus, the cmor wavelet was chosen as the wavelet basis function. Random samples from the CWRU bearing fault dataset, representing normal bearings and bearings with a 0.021-inch outer-ring fault, were selected. Figure 5 illustrates the time-frequency maps obtained using CWT for these samples.
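Equation (5) with a complex Morlet basis can be sketched by replacing the integral with a direct sum. This is a slow, illustrative version under stated assumptions: the bandwidth and center-frequency parameters (1.5 and 1.0) and the two-tone test signal are hypothetical choices, and the function names are not from the paper. The scale a is tied to an analysis frequency through a = center / frequency.

```python
import cmath
import math

def cmor(u, bandwidth=1.5, center=1.0):
    """Complex Morlet wavelet; bandwidth and center are assumed values."""
    gauss = math.exp(-u * u / bandwidth) / math.sqrt(math.pi * bandwidth)
    return gauss * cmath.exp(2j * math.pi * center * u)

def cwt_magnitude(f, fs, freqs, bandwidth=1.5, center=1.0):
    """|CWT(a, b)| from Eq. (5), with the integral replaced by a sum."""
    dt = 1.0 / fs
    rows = []
    for freq in freqs:
        a = center / freq                      # scale matched to this frequency
        row = []
        for ib in range(len(f)):               # translation b = ib * dt
            acc = 0j
            for it in range(len(f)):           # time t = it * dt
                u = (it - ib) * dt / a
                if abs(u) < 4.0:               # wavelet is negligible beyond this
                    acc += f[it] * cmor(u, bandwidth, center).conjugate()
            row.append(abs(acc) * dt / math.sqrt(a))
        rows.append(row)
    return rows

# Assumed test signal: 50 Hz in the first half, 200 Hz in the second half.
fs = 1000
signal = [math.sin(2 * math.pi * (50 if i < 100 else 200) * i / fs)
          for i in range(200)]
tf_map = cwt_magnitude(signal, fs, freqs=[50, 200])
print(tf_map[0][50] > tf_map[1][50], tf_map[1][150] > tf_map[0][150])
```

Row 0 (large scale, 50 Hz) responds in the first half of the signal and row 1 (small scale, 200 Hz) in the second half, which shows how the scaling factor selects low- or high-frequency content while the translation factor selects the time position.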

(3): Generalized S Transform (GST).

Wavelet Transform addresses the fixed window function limitation of Short-Time Fourier Transform (STFT) and represents an advancement over it. The S Transform, introduced by Stockwell R. G. et al. in 1996, further evolved these concepts using a Gaussian window, thus overcoming both the STFT’s deficiencies and the complexity of selecting a basis function in Wavelet Transform. However, the S Transform’s inability to adapt its scale factor with signal frequency changes restricts its time-frequency resolution enhancement when processing bearing fault signals. Therefore, the Generalized S Transform (GST) introduced a modulation factor, widening the Gaussian window for high-frequency components and narrowing it for low-frequency components, making it more suitable for extracting time-frequency domain information.

The S Transform is represented as follows [14]:

S(τ, f) = \int_{-\infty}^{\infty} x(t) \frac{|f|}{\sqrt{2 π}} e^{-\frac{(τ - t)^{2} f^{2}}{2}} e^{-j 2 π f t} dt .

(8)

Here, τ is the time-shift parameter and f indicates the frequency.

In order to enhance the flexibility of the time-frequency analysis of the S Transform, a moderating factor was introduced into the Gaussian window function, and the generalized S Transform can be expressed as follows:

GST(τ, f) = \int_{-\infty}^{\infty} x(t) \frac{λ |f|^{p}}{\sqrt{2 π}} e^{-\frac{λ^{2} f^{2 p} (τ - t)^{2}}{2}} e^{-j 2 π f t} dt .

(9)

When the adjusting factors are λ = 1 and p = 1, the generalized S Transform reduces to the standard S Transform.

Two of the vibration signals in the CWRU bearing fault dataset were randomly selected, namely the normal bearing and the bearing with a 0.021-inch outer-ring fault. Figure 6 is a time-frequency diagram that was obtained by the generalized S Transform.
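The effect of the adjusting factors can be checked numerically. The sketch below evaluates a discretised generalized S Transform at a single point (τ, f), assuming a decaying Gaussian window (negative exponent) whose standard deviation is 1 / (λ |f|^p); with λ = p = 1 it is the standard S Transform. The test signal and the value p = 0.8 are assumptions for illustration only.

```python
import cmath
import math

def gst_point(x, fs, tau, f, lam=1.0, p=1.0):
    """Discretised generalized S Transform at one (tau, f) point."""
    dt = 1.0 / fs
    width = lam * abs(f) ** p          # inverse of the Gaussian window's std
    acc = 0j
    for i, xi in enumerate(x):
        t = i * dt
        window = (width / math.sqrt(2 * math.pi)
                  * math.exp(-((width * (tau - t)) ** 2) / 2))
        acc += xi * window * cmath.exp(-2j * math.pi * f * t)
    return acc * dt

def window_std(f, lam=1.0, p=1.0):
    """Time-domain standard deviation of the Gaussian window, in seconds."""
    return 1.0 / (lam * abs(f) ** p)

# Assumed test signal: a 50 Hz sine sampled at 1000 Hz for one second.
fs = 1000
x = [math.sin(2 * math.pi * 50 * i / fs) for i in range(1000)]
s_val = gst_point(x, fs, tau=0.5, f=50.0)                  # standard S Transform
g_val = gst_point(x, fs, tau=0.5, f=50.0, lam=1.0, p=0.8)  # generalized form
print(abs(s_val), abs(g_val))
print(window_std(50.0), window_std(50.0, p=0.8))
```

Both calls give a magnitude close to half the sine amplitude, because the window has unit area. What changes with p is the window width: at 50 Hz, p = 0.8 gives a wider window than p = 1, trading time resolution for frequency resolution at that frequency.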

(4): Wigner–Ville distribution (WVD).

Unlike the three time-frequency analysis methods above, the Wigner–Ville distribution is a typical bilinear time-frequency distribution. It was applied by Wigner E. to quantum mechanics and later by Ville J. to signal analysis, where it has attracted extensive attention. In essence, it takes the Fourier transform of the autocorrelation function of the bearing fault signal. Because no window is applied in the transformation, it avoids the tradeoff between time resolution and frequency resolution.

For a continuous signal x(t), the Wigner–Ville distribution can be expressed from a time-domain perspective as

W(t, ω) = \int_{-\infty}^{\infty} x(t + \frac{τ}{2}) x^{*}(t - \frac{τ}{2}) e^{-j ω τ} dτ .

(10)

In Equation (10), x^{*}(t) is the complex conjugate of x(t).

From a frequency domain perspective, Wigner–Ville distribution can be expressed as

W_{X}(t, ω) = \frac{1}{2 π} \int_{-\infty}^{\infty} X(ω + \frac{θ}{2}) X^{*}(ω - \frac{θ}{2}) e^{j θ t} dθ .

(11)

Based on Equations (10) and (11), the Wigner–Ville distribution (WVD) is computed as the Fourier transform, over the lag τ, of the product of the signal values at equal offsets before and after time t, thereby revealing the time-frequency characteristics of bearing fault signals. WVD exhibits significant time-frequency focusing effects on single-component signals. However, due to its bilinear nature, it may introduce noticeable cross-term interference in the analysis of multi-component signals, thereby affecting the accurate identification of signal characteristics. Two randomly selected vibration signals from the CWRU bearing fault dataset were considered: one from a normal bearing and the other from a bearing with a 0.021-inch outer-ring fault. Figure 7 shows the WVD time-frequency representation of these signals.
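The cross-term problem can be reproduced with a two-component test signal. The sketch below evaluates a discrete form of Equation (10) at one time index, with a constant scale factor omitted. The signal (complex exponentials at 100 Hz and 300 Hz), the sampling rate, and the number of frequency bins are assumptions chosen so that every component falls exactly on a bin.

```python
import cmath
import math

def wvd_slice(x, n, num_bins):
    """Discrete Wigner-Ville distribution at time index n.

    Bin k corresponds to the frequency k * fs / (2 * num_bins).
    """
    max_lag = min(n, len(x) - 1 - n)
    out = []
    for k in range(num_bins):
        acc = 0j
        for m in range(-max_lag, max_lag + 1):
            # Product of samples at equal offsets before and after n.
            product = x[n + m] * x[n - m].conjugate()
            acc += product * cmath.exp(-2j * math.pi * k * m / num_bins)
        out.append(acc.real)   # the sum is real by symmetry in m
    return out

fs = 1000
x = [cmath.exp(2j * math.pi * 100 * i / fs) + cmath.exp(2j * math.pi * 300 * i / fs)
     for i in range(201)]
w = wvd_slice(x, n=100, num_bins=100)

bin_hz = fs / (2 * 100)        # 5 Hz per bin
peak_bin = max(range(len(w)), key=lambda k: w[k])
print(peak_bin * bin_hz, w[20], w[60])
```

At this time index the largest value sits at 200 Hz, midway between the two true components, even though the signal contains no energy there. The true components at 100 Hz (bin 20) and 300 Hz (bin 60) are present but smaller, which is the cross-term interference that complicates the analysis of multi-component bearing signals.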

The time-frequency representations obtained from the analysis methods vividly display variations in the vibration signals over time and frequency. From Figure 4, Figure 5, Figure 6 and Figure 7, it can be observed that the short-time Fourier transform spectrograms clearly distinguished between normal bearings and bearings with outer-ring faults, but with overall lower resolution. Both the wavelet transform and the generalized S Transform exhibited higher resolutions, with the wavelet transform proving superior. The Wigner–Ville distribution, affected by cross-term interference, showed weaker energy compared to other methods. Therefore, this study selected wavelet transform as the preferred method for time-frequency analysis.

6.2. Attention Mechanisms

The attention mechanism originates from computational neuroscience and fundamentally functions as a bio-inspired mechanism. For example, when humans observe an image, objects with distinct features or vivid colors typically draw more attention from the visual system. The deep neural networks in the brain allocate more attention resources to a few key parts, ignoring irrelevant areas, thus obtaining more useful information. Similarly, in deep learning, the attention mechanism selects key information from all available data. Deep neural networks extract data features through convolution operations, assuming that the resulting feature maps contribute equally. In reality, however, the contributions of these feature maps to the outcome vary significantly, and manually selecting relevant features is challenging. The attention mechanism helps by blurring non-essential information, reducing redundancy, preventing data overload, and focusing on critical information, thereby enhancing the overall efficiency and performance of the network.

In the context of bearing fault information, CNNs extract various features through convolution operations, such as noise, operating conditions, and mechanical equipment characteristics. Only features related to the bearing’s state are considered key. The attention mechanism can capture effective data for fault identification in noisy environments or different operating conditions, suppressing irrelevant information and improving resource utilization efficiency.

Current research categorizes attention mechanisms into three types: channel attention, spatial attention, and hybrid attention mechanisms. This study employed a combination of channel and spatial attention mechanisms, integrating them sequentially into the network. The convolutional attention mechanism, an attention module for feedforward convolutional neural networks, aggregates information from both channels and spatial dimensions [15]. Figure 8 illustrates the structure of this module, which is widely applied due to its strong applicability and flexibility [16].

The module combines the channel and spatial attention mechanisms shown in Figure 9 and Figure 10. The channel attention module first compresses the spatial dimensions of the input features by applying max pooling and average pooling in two separate branches, each of which aggregates the spatial information. The two pooled descriptors are then passed through a shared network, a multilayer perceptron (MLP) with hidden layers. The MLP outputs are summed and activated using the Sigmoid function to obtain the channel attention weights. These weights are then multiplied with the input feature F, as shown in Equation (12):

\begin{aligned} M_{c}(F) &= σ(MLP(AvgPool(F)) + MLP(MaxPool(F))) \\ &= σ(W_{1}(W_{0}(F_{avg}^{c})) + W_{1}(W_{0}(F_{max}^{c}))) \end{aligned} .

(12)

In Equation (12), F \in \mathbb{R}^{C \times H \times W} is the input feature; σ denotes the Sigmoid activation function; AvgPool stands for average pooling; MaxPool represents max pooling; and W_{0} and W_{1} are the weights of the shared MLP.
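Equation (12) can be traced step by step on a tiny feature map. The sketch below uses plain nested lists for a C × H × W feature, and the weights W0 and W1 are made-up values for illustration. A ReLU is placed between W0 and W1, as in the commonly used convolutional attention module; this is an assumption, since Equation (12) does not show the hidden activation.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def matvec(weights, vec):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]

def channel_attention(feature, w0, w1):
    """Channel weights M_c(F) for a feature given as C x H x W nested lists."""
    # Squeeze each channel to one number with average and max pooling.
    avg = [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in feature]
    mx = [max(max(row) for row in ch) for ch in feature]

    def shared_mlp(vec):
        hidden = [max(0.0, h) for h in matvec(w0, vec)]   # assumed ReLU
        return matvec(w1, hidden)

    return [sigmoid(a + b) for a, b in zip(shared_mlp(avg), shared_mlp(mx))]

# Two 2 x 2 channels: channel 0 is active, channel 1 is silent.
feature = [[[1.0, 1.0], [1.0, 1.0]],
           [[0.0, 0.0], [0.0, 0.0]]]
w0 = [[1.0, -1.0]]        # hypothetical weights, hidden size 1
w1 = [[1.0], [-1.0]]
weights = channel_attention(feature, w0, w1)

# Apply the weights: every value in channel c is scaled by weights[c].
refined = [[[v * weights[c] for v in row] for row in ch]
           for c, ch in enumerate(feature)]
print(weights)
```

With these hypothetical weights the active channel receives a weight above 0.5 and the silent channel a weight below 0.5, which is the reweighting that lets the network emphasize channels related to the bearing state.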

The spatial attention module pools the features along the channel axis with average pooling and max pooling, concatenates the two resulting maps, and applies a convolution. Kernel sizes of 3 × 3 and 7 × 7 are the usual choices, and the 7 × 7 kernel generally performs better and is widely used at present; this paper therefore selected a 7 × 7 convolution kernel for the attention mechanism, with a step size of 3. After convolution, the spatial attention weight is obtained by Sigmoid activation and multiplied by the input features, as shown in Equation (13).

\begin{aligned} M_{s}(F) &= σ(f^{7 \times 7}([AvgPool(F); MaxPool(F)])) \\ &= σ(f^{7 \times 7}([F_{avg}^{s}; F_{max}^{s}])) \end{aligned} .

(13)

In Equation (13), F \in \mathbb{R}^{C \times H \times W} is the input feature; σ denotes the Sigmoid activation function; AvgPool stands for average pooling; MaxPool represents max pooling; and f^{7 \times 7} represents a convolution with a kernel of 7 × 7 size.
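Equation (13) can be sketched in the same style. The example below pools along the channel axis, stacks the two maps, and applies a two-channel 7 × 7 convolution followed by a Sigmoid. Two choices here are assumptions made so that the output keeps the H × W size of the input: the convolution uses unit stride and zero padding of 3. The kernel values are hypothetical.

```python
import math

def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))

def spatial_attention(feature, kernel):
    """Spatial weights M_s(F); kernel has shape 2 x k x k (avg map, max map)."""
    C, H, W = len(feature), len(feature[0]), len(feature[0][0])
    # Pool across channels at every spatial position.
    avg = [[sum(feature[c][i][j] for c in range(C)) / C for j in range(W)]
           for i in range(H)]
    mx = [[max(feature[c][i][j] for c in range(C)) for j in range(W)]
          for i in range(H)]
    pooled = [avg, mx]
    k = len(kernel[0])
    pad = k // 2                       # assumed zero padding keeps H x W
    out = []
    for i in range(H):
        row = []
        for j in range(W):
            acc = 0.0
            for ch in range(2):
                for di in range(k):
                    for dj in range(k):
                        y, x = i + di - pad, j + dj - pad
                        if 0 <= y < H and 0 <= x < W:
                            acc += kernel[ch][di][dj] * pooled[ch][y][x]
            row.append(sigmoid(acc))
        out.append(row)
    return out

# Hypothetical 7 x 7 kernel that simply passes the center value through.
kernel = [[[1.0 if (di, dj) == (3, 3) else 0.0 for dj in range(7)]
           for di in range(7)] for _ in range(2)]
# Two 8 x 8 channels, silent except for one strong response at (4, 4).
feature = [[[0.0] * 8 for _ in range(8)] for _ in range(2)]
feature[0][4][4] = 4.0
attn = spatial_attention(feature, kernel)
print(attn[4][4], attn[0][0])
```

The position with the strong response gets a weight close to 1, while empty positions stay at the neutral value 0.5. A learned kernel would additionally mix information from the 7 × 7 neighbourhood.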

By introducing the attention mechanism, the importance of features was distinguished in terms of both channel and spatial positions. This differentiation and learning of feature variations enhanced the model’s performance.

6.3. Dropout

Given the high complexity of bearing fault signal structures, there may be significant differences between the training and test datasets. This discrepancy can lead to poor model performance on the test set and weak generalization capabilities. To address this, we employed the Dropout algorithm to prevent overfitting and to enhance the model’s generalization ability. The Dropout algorithm, introduced by Hinton et al. in 2012, tackles the overfitting problem in neural networks by randomly altering the network structure [17]. Figure 11 compares the network structure before and after applying the Dropout method.

During training, dropout randomly drops certain neurons and their connections with a probability p during forward propagation, removing the local features represented by these neurons. Consequently, the network structure changes with each iteration. Despite these changes, the parameters are globally shared, and the results from the multiple implicit sub-models generated during training are aggregated. This approach improves classification accuracy and reduces excessive dependency among network neurons, effectively preventing overfitting [18].

The effectiveness of the Dropout method hinges on the selection of the dropout probability p. When p = 1, all network neurons are dropped, so the model cannot acquire any useful information. Conversely, when p = 0, no dropout is performed and the method has no effect. In this study, we selected p = 0.5.
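The effect of the dropout probability p can be illustrated with a minimal NumPy sketch. It assumes the common "inverted dropout" variant, in which the surviving activations are rescaled by 1 / (1 − p) during training so that no rescaling is needed at test time; the paper does not state which variant was used.

```python
import numpy as np

def dropout(x, p=0.5, training=True, rng=None):
    """Randomly zero each element of x with probability p during training."""
    if not training or p == 0.0:
        return x                      # p = 0: nothing is dropped
    if p >= 1.0:
        return np.zeros_like(x)       # p = 1: every neuron is dropped
    rng = np.random.default_rng() if rng is None else rng
    keep = rng.random(x.shape) >= p   # keep each element with probability 1 - p
    return x * keep / (1.0 - p)       # inverted-dropout rescaling (assumption)

activations = np.ones(10_000)
dropped = dropout(activations, p=0.5, rng=np.random.default_rng(0))
```

With p = 0.5, roughly half of the activations are set to zero and the rest are doubled, so the expected value of each activation is unchanged.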

6.4. Batch Normalization

Generalization performance is one of the metrics for evaluating fault diagnosis models. Better generalization performance indicates that the model can more effectively classify new samples. However, during model training, as various parameters are continuously updated, the data distribution within each layer of the network also changes, affecting both the convergence speed and generalization performance of the network. To mitigate the issue of changing data distributions, it is crucial to ensure that the data samples adhere as closely as possible to the “independent and identically distributed” (i.i.d.) assumption. Therefore, this study incorporated batch normalization (BN) layers into the model.

BN layers first standardize each batch of training samples so that the outputs between layers have zero mean and unit variance. This process aims to reduce internal covariate shift during network training, thereby improving training efficiency. Additionally, BN introduces a scaling parameter γ and a shift parameter β, which adjust the normalized features through scaling and shifting. This prevents the input data from being constrained to a narrow linear region, thus enhancing the model’s ability to fit the training set [19]. The input to a BN layer is represented as B = {x_{1}, x_{2}, \dots, x_{m}}, and the output is calculated according to Equations (14)–(17):

μ_{B} = \frac{1}{m} \sum_{i = 1}^{m} x_{i},

(14)

σ_{B}^{2} = \frac{1}{m} \sum_{i = 1}^{m} {(x_{i} - μ_{B})}^{2},

(15)

{\hat{x}}_{i} = \frac{x_{i} - μ_{B}}{\sqrt{σ_{B}^{2} + ε}},

(16)

y_{i} = γ {\hat{x}}_{i} + β .

(17)

In Equations (14)–(17), μ_{B} represents the batch mean; σ_{B}^{2} denotes the batch variance; ε is a small constant, which defaults to 10⁻⁸, that keeps the computation numerically stable; {\hat{x}}_{i} is the intermediate (normalized) result of batch normalization; γ and β are the scaling and offset parameters, which are learned during training; and y_{i} denotes the output of the BN layer.

In this paper, the BN layer was added after each activation function to normalize the input data, which helps mitigate the problems of gradient vanishing and gradient explosion, thereby improving the generalization performance of the network.
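Equations (14)–(17) translate almost line by line into NumPy. The sketch below covers only the training-time forward pass on a single batch; the batch size, feature count, and input statistics are illustrative assumptions, and the running statistics used at inference time are omitted.

```python
import numpy as np

def batch_norm(x, gamma, beta, eps=1e-8):
    """Training-time batch normalization over the batch axis (axis 0)."""
    mu = x.mean(axis=0)                     # Eq. (14): batch mean
    var = ((x - mu) ** 2).mean(axis=0)      # Eq. (15): batch variance
    x_hat = (x - mu) / np.sqrt(var + eps)   # Eq. (16): normalization
    return gamma * x_hat + beta             # Eq. (17): scale and shift

rng = np.random.default_rng(0)
batch = rng.normal(loc=5.0, scale=3.0, size=(32, 4))  # m = 32 samples, 4 features
y = batch_norm(batch, gamma=np.ones(4), beta=np.zeros(4))
y_scaled = batch_norm(batch, gamma=2.0 * np.ones(4), beta=3.0 * np.ones(4))
```

With γ = 1 and β = 0 each feature of `y` has approximately zero mean and unit variance; other values of γ and β move the output to a standard deviation of γ and a mean of β.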

6.5. Construction of a Bearing Fault Diagnosis Model Based on an Optimized Dual-Stream CNN

To fully utilize the information in bearing vibration signals, comprehensively consider data features, and overcome the limitations of classical CNNs in bearing fault diagnosis, this paper proposes a multi-feature fusion CNN architecture. This approach mitigates the loss of fault information that can occur when bearing signals are processed in a single domain.

To address the limitations of the classical CNN model’s single architecture and the mismatch between one-dimensional bearing fault signals and the classical CNN filter structure, we replaced two-dimensional filters with one-dimensional filters while integrating two-dimensional time-frequency images to comprehensively analyze the bearing data features. Considering the characteristics of bearing fault diagnosis, such as few classifications, high noise, and complex data features, we incorporated attention mechanisms, dropout layers, and batch normalization (BN) layers into the network model to enhance its generalization ability and performance.

The dual-stream CNN model framework proposed in this paper, as depicted in Figure 12, extracts one-dimensional frequency domain features and two-dimensional time-frequency features to characterize the joint temporal and spectral characteristics of bearing fault signals. These features are fed into a CNN with multi-channel input. Each channel undergoes convolution and pooling operations and integrates SE attention mechanisms to capture and learn the relationships between channels. Batch normalization is then applied, followed by feature fusion at the aggregation layer, where the features from both dimensions are concatenated. To enhance model generalization and prevent overfitting, dropout is employed. Error backpropagation is then performed to iteratively update the parameters.

The process of constructing the multi-feature fusion CNN framework for bearing fault diagnosis is illustrated in Figure 13. The specific processing steps are outlined as follows:

(1): Bearing Vibration Signal Acquisition: Install an accelerometer at the measurement point to collect the original vibration signal of the bearing and record the relevant operational information.
(2): Signal Preprocessing: Fuse the collected vibration data from two directions to prevent the limitations of a single direction. The intelligent fault diagnosis model requires input data of equal length. To ensure a sufficient amount of data in the training set, perform overlapping sampling of the vibration signals. The overlapping sampling process is illustrated in Figure 14, where L represents the sample length and l represents the overlap length. Additionally, divide the dataset into training, validation, and test sets in a ratio of 7:2:1.
(3): Input Preparation: Perform Short-Time Fourier Transform (STFT) on each sample in the dataset to obtain one-dimensional frequency domain information as the one-dimensional input. Perform Continuous Wavelet Transform (CWT) using the cmor wavelet as the wavelet basis function to obtain two-dimensional time-frequency domain information as the two-dimensional input.
(4): Network Model Training: Construct the dual-stream CNN model and initialize the parameters. Perform forward propagation to extract data features and fuse them at the fully connected layer. Use backpropagation to update the parameters and calculate the loss function value. The model training is completed when the loss function value reaches its minimum, and the model is then saved.
(5): Model Testing: Input the test set into the trained dual-stream CNN model and calculate the model’s classification accuracy.
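The overlapping sampling in step (2) can be sketched as follows. The signal here is a synthetic sine wave, and the record length of 12,000 points and the overlap of l = 512 are assumptions chosen for the example; only the sample length L = 1024 and the 7:2:1 ratio come from the text. Splitting the raw record before windowing is one way (assumed here) to keep samples in the three subsets from sharing data points.

```python
import numpy as np

def overlapping_segments(signal, length, overlap):
    """Cut a 1D signal into windows of `length` points sharing `overlap` points."""
    step = length - overlap
    if step <= 0:
        raise ValueError("overlap must be smaller than the sample length")
    if len(signal) < length:
        raise ValueError("signal is shorter than one sample")
    starts = range(0, len(signal) - length + 1, step)
    return np.array([signal[s:s + length] for s in starts])

def split_7_2_1(signal):
    """Split the raw record into contiguous 70% / 20% / 10% parts."""
    n = len(signal)
    a, b = n * 7 // 10, n * 9 // 10
    return signal[:a], signal[a:b], signal[b:]

signal = np.sin(0.05 * np.arange(12_000))   # synthetic stand-in for a vibration record
L, l = 1024, 512                            # sample length and assumed overlap length

train_raw, val_raw, test_raw = split_7_2_1(signal)
train = overlapping_segments(train_raw, L, l)
val = overlapping_segments(val_raw, L, l)
test = overlapping_segments(test_raw, L, l)
```

Consecutive training samples share their last and first l points, which multiplies the number of samples obtainable from a record of fixed length.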

6.6. Parameters of the Bearing Fault Diagnosis Model Based on Optimized Dual-Stream CNN

The structure of the 1D-CNN network is outlined in Table 1. It consists of two convolutional layers that extract data features, with kernel sizes of 64 and 5 and channel numbers of 6 and 16, respectively. The use of a large convolutional kernel in the first layer is intended to effectively capture features as the 1D-CNN input is derived from the Short-Time Fourier Transform (STFT) of the one-dimensional input. A large convolutional kernel can autonomously learn diagnostic features based on FFT, providing better performance. Using only multiple small convolutional kernels instead may result in susceptibility to noise interference in industrial environments. After convolution, max pooling layers are applied to perform dimensionality reduction and decrease the number of parameters. There are two sets of pooling layers with sizes of 2 and 3 and strides of 2 and 3, with corresponding channel numbers of 6 and 16, respectively. To address the data shift and incremental issues, normalization was applied after each layer. The ReLU activation function was used, and, finally, the features were flattened into a one-dimensional feature vector.

The structure of the 2D-CNN network is shown in Table 2. It also contains two convolutional layers with 5 × 5 kernels and channel numbers of 6 and 16, respectively. After convolution, max pooling is used for dimensionality reduction, with pooling layer sizes of 2 × 2. Following normalization, the ReLU activation function is applied, and the features are then flattened into a one-dimensional feature vector, which is concatenated with the 1D-CNN features at the aggregation layer.
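The output sizes listed in Tables 1 and 2 follow from the standard relation for an unpadded convolution or pooling layer, output = ⌊(n − k) / s⌋ + 1, where n is the input length, k the kernel size, and s the stride. The short check below starts the 1D stream from the tabulated first-layer output length of 242, because the input length is not stated; the 64 × 64 input of the 2D stream is an assumption inferred from the 60 × 60 output of the first 5 × 5 convolution.

```python
def layer_out(n, kernel, stride=1):
    """Output length of an unpadded convolution or pooling layer."""
    return (n - kernel) // stride + 1

# 1D stream (Table 1), starting from the first convolution's output length.
n1 = 242
n1 = layer_out(n1, kernel=2, stride=2)   # pooling layer       -> 121
n1 = layer_out(n1, kernel=5, stride=1)   # convolutional layer -> 117
n1 = layer_out(n1, kernel=3, stride=3)   # pooling layer       -> 39
flat_1d = 16 * n1                        # 16 channels         -> 624

# 2D stream (Table 2); the 64 x 64 input size is an assumption.
n2 = 64
n2 = layer_out(n2, kernel=5, stride=1)   # convolutional layer -> 60
n2 = layer_out(n2, kernel=2, stride=2)   # pooling layer       -> 30
n2 = layer_out(n2, kernel=5, stride=1)   # convolutional layer -> 26
n2 = layer_out(n2, kernel=2, stride=2)   # pooling layer       -> 13
flat_2d = 16 * n2 * n2                   # 16 channels         -> 2704

fused = flat_1d + flat_2d                # concatenated feature length -> 3328
```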

The integrated dual-stream CNN network structure is illustrated in Table 3. It includes two fully connected layers designed to process the concatenated features from both streams. The first fully connected layer consists of 120 neurons, which concatenate the features from the dual streams. Dropout is applied between the two fully connected layers to randomly deactivate neurons, thereby preventing overfitting. The output from the fully connected layers, represented as a one-dimensional feature vector, is then connected to a Softmax classifier with ten outputs, corresponding to the ten fault labels in the bearing dataset.

During network training, an optimization algorithm is essential for backpropagation. Traditional stochastic gradient descent (SGD) can often become stuck in local optima when dealing with many parameters. Therefore, this study employed the Adam (adaptive moment estimation) algorithm, which dynamically adjusts the learning rate of each parameter through estimates of the first and second moments of the gradient. This keeps the step sizes within a reasonable range.
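A single Adam update can be written in a few lines of NumPy. The sketch uses the commonly cited default hyperparameters (β₁ = 0.9, β₂ = 0.999, ε = 10⁻⁸), which are assumptions here because the paper does not list them, and applies the update to a toy quadratic objective rather than to the diagnosis model.

```python
import numpy as np

def adam_step(theta, grad, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update; t is the 1-based step counter."""
    m = beta1 * m + (1 - beta1) * grad        # first-moment (mean) estimate
    v = beta2 * v + (1 - beta2) * grad ** 2   # second-moment estimate
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (np.sqrt(v_hat) + eps)
    return theta, m, v

# Toy objective f(theta) = sum((theta - 3)^2), used only for illustration.
theta, m, v = np.zeros(2), np.zeros(2), np.zeros(2)
losses = []
for t in range(1, 201):
    losses.append(float(np.sum((theta - 3.0) ** 2)))
    grad = 2.0 * (theta - 3.0)
    theta, m, v = adam_step(theta, grad, m, v, t)
```

Because the update divides the first-moment estimate by the square root of the second-moment estimate, the magnitude of each step stays close to the learning rate regardless of the raw gradient scale.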

Since the learning capability of the model is directly related to the number of iterations, a learning rate decay strategy was adopted to enhance the model’s learning efficiency. Initially, a higher learning rate of 0.01 was used to accelerate model convergence and reduce time costs. After 50 iterations, however, empirical observations showed that convergence slowed significantly. Consequently, a lower learning rate of 0.001 was employed from that point on to prevent overfitting and improve the model’s generalization performance.
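The decay strategy amounts to a simple step schedule. In the sketch below, the function name and the zero-based iteration index are illustrative choices; the two learning rates and the switch after 50 iterations are taken from the text.

```python
def learning_rate(iteration, initial=0.01, reduced=0.001, switch_at=50):
    """Step schedule: `initial` for the first `switch_at` iterations, then `reduced`."""
    return initial if iteration < switch_at else reduced

# Learning rate for each of 200 training iterations (zero-based index).
schedule = [learning_rate(i) for i in range(200)]
```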

7. Experimental Verification and Analysis

7.1. Fault Diagnosis Dataset

To verify the performance of the algorithm presented in this paper, the experimental dataset from Case Western Reserve University was utilized. This dataset is widely recognized for its robustness and accuracy in diagnosing rotating machinery faults across various operational conditions, ensuring consistency and comparability with existing studies. This experiment involves collecting vibration signals from the drive-end bearing using an accelerometer at a sampling frequency of 12 kHz [20]. We constructed four datasets (A, B, C, and D) under different load conditions of 0 HP, 1 HP, 2 HP, and 3 HP, respectively. The fault classifications of the bearings included four types: normal bearing, rolling-element fault, inner-ring fault, and outer-ring fault. Except for the normal bearing, the fault sizes were 0.007 inches, 0.014 inches, and 0.021 inches, resulting in ten classifications or fault labels in total.

For the training set, overlapping sampling was performed with a sample length of 1024, while ensuring no overlap between the training, validation, and test sets. Each classification contained 1000 labeled samples, resulting in a total of 10,000 samples per dataset. The datasets were divided into training, validation, and test sets in a 7:2:1 ratio. The specific parameters are shown in Table 4.

7.2. Analysis of Model Results

To verify the fault diagnosis accuracy of the multi-feature fusion CNN model proposed in this paper, Dataset A was used to train the dual-stream CNN model. Additionally, the dual-stream CNN model was compared against the 1D-CNN and 2D-CNN models. The effectiveness of the optimization methods was further validated by comparison with an unoptimized dual-stream CNN model. The configurations of the various models are detailed in Table 5.

The loss curves for the four models are depicted in Figure 15, while the accuracy curves are shown in Figure 16. In Figure 15, the blue curve represents the training set and the red curve represents the validation set. Model 1 exhibited significant fluctuations in both loss and accuracy during iterations. Model 2 showed pronounced fluctuations in loss and accuracy within the first 50 iterations but stabilized thereafter, although it achieved a relatively low accuracy after 200 iterations. Model 3 demonstrated rapid convergence in loss and achieved an accuracy of 99%. Model 4 converged fastest in terms of loss, aided by the inclusion of dropout, which mitigated the slight overfitting observed in Model 3. Notably, the optimized dual-stream CNN of Model 4 exhibited minimal fluctuation, with a steady decline in the loss function. By the 30th iteration, both accuracy and loss had stabilized, and, after 200 iterations, the loss function approached 0. To ensure consistency, each model was run in 10 repeated experiments, with Model 4 achieving the highest accuracy of 100%, demonstrating its superiority.

Based on the accuracy and loss of the four models, the dual-stream CNN model showed advantages over the 1D-CNN and 2D-CNN. To assess the effectiveness of fault diagnosis more intuitively, the recognition results for the 1000 test-set samples of Model 3 and Model 4 were summarized in confusion matrices, as shown in Figure 17. The diagonal of each matrix represents the correctly predicted samples, and darker colors indicate better precision in the predicted classifications. Comparing the two models, the optimized dual-stream CNN model achieved a recognition rate of up to 100% for most faults, accurately classifying each sample into its respective fault category. Similarly, Figure 18 visualizes the final hidden fully connected layers of both models in a two-dimensional space using t-SNE, which reduces the dimensionality of the extracted features. In the pre-optimized dual-stream CNN, the data classes formed clusters, but with some instances of misclassification. Conversely, in the optimized dual-stream CNN model, the classes were well separated with larger inter-class distances, indicating more distinguishable features.

7.3. Noise Environment Verification

The operation of CNC machines is often accompanied by a complex noise environment, which can affect the captured vibration signals with additional complex noises. In this paper, we intentionally added noise signals to the vibration signals in the dataset to further evaluate the model’s fault diagnosis capability under noisy conditions.

In real-world working environments, noise typically consists of a combination of multiple noises, which can be viewed as the sum of several random variables following different probability distributions. According to the Central Limit Theorem, such a noise composite tends toward a Gaussian distribution [21]. Additive white Gaussian noise (AWGN) is an idealized form of white noise that follows a Gaussian distribution, and it is widely used to simulate noise interference in industrial environments. This study selected various levels of noise, using the signal-to-noise ratio (SNR) as the metric for assessing the strength of the noise signal. A higher SNR indicates a lower energy level of noise in the signal; when the SNR is 0 dB, the energy of the original signal equals that of the noise. Specifically, we introduced AWGN with SNRs of −8 dB, −4 dB, 0 dB, 4 dB, 8 dB, and 10 dB into Dataset A to test the noise resistance of the model. For example, for an SNR of −4 dB, a randomly selected outer-race fault signal is shown together with the resulting noisy signal in Figure 19. The figure illustrates that the noise significantly disrupts the signal, making feature extraction challenging.
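Adding AWGN at a prescribed SNR follows from the definition SNR(dB) = 10 log₁₀(P_signal / P_noise). The sketch below is an illustration with a synthetic 100 Hz tone sampled at 12 kHz standing in for a bearing signal; the function name and the test signal are assumptions, not the data used in the experiments.

```python
import numpy as np

def add_awgn(signal, snr_db, rng):
    """Return (noisy_signal, noise) with white Gaussian noise at the given SNR in dB."""
    p_signal = np.mean(signal ** 2)                  # average signal power
    p_noise = p_signal / (10.0 ** (snr_db / 10.0))   # required noise power
    noise = rng.normal(0.0, np.sqrt(p_noise), size=signal.shape)
    return signal + noise, noise

rng = np.random.default_rng(0)
t = np.arange(100_000) / 12_000                      # 12 kHz sampling frequency
clean = np.sin(2 * np.pi * 100 * t)                  # hypothetical 100 Hz test tone
noisy, noise = add_awgn(clean, snr_db=-4, rng=rng)

# Empirical SNR of the generated noise, in dB.
measured = 10.0 * np.log10(np.mean(clean ** 2) / np.mean(noise ** 2))
```

At −4 dB the noise power is about 2.5 times the signal power, which is consistent with the strong distortion of the waveform described above.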

Figure 20, Figure 21 and Figure 22 demonstrate the performance of various models under different SNR levels of added Gaussian white noise, where the recognition rates, variance, and RMSE values were compared to assess noise resistance. As the intensity of the noise decreased, the overall diagnostic accuracy of the models improved. When the SNR reached 8 dB, the accuracy of all four models exceeded 80%, with the dual-stream CNN and the optimized dual-stream CNN models achieving accuracies above 97%. Both models exhibited similar performance, maintaining high recognition rates under varying levels of noise interference, thus indicating strong noise resistance. In contrast, the 1D-CNN and 2D-CNN models were less effective; at an SNR of −8 dB, their recognition rates were only 79.13% and 80.2%, respectively. Although the recognition accuracy of these models improved as the noise intensity decreased, it remained below the accuracy obtained on the original, noise-free signal.

Taking an SNR of 0 dB as an example, feature dimensionality reduction was applied to the final hidden fully connected layer of each model. The accuracies of the four models were 86.3%, 93.03%, 95.77%, and 98.66%, respectively. From the feature visualization of the test set, as shown in Figure 23, it was evident that the latter two models exhibited better classification performance. In particular, the optimized dual-stream CNN model showed well-separated cluster centroids with clear gaps, and only a few data points were misclassified due to spatial overlap. In contrast, the 1D-CNN model showed a more chaotic spatial division of fault samples with overlapping cluster centroids, indicating poorer model robustness.

7.4. Verification of Different Working Conditions

When CNC machines operate, the workload frequently changes, leading to different bearing load states and inconsistencies between source- and target-domain working conditions. To validate the robustness of the proposed optimized dual-stream CNN model under varying conditions, this paper employed domain adaptation to simulate new environments. According to the dataset division described above, Datasets B, C, and D correspond to load conditions of 1 HP, 2 HP, and 3 HP, respectively. One dataset was used for training, while the model’s performance was tested on fault diagnosis data from the other two load conditions. The specific setup is shown in Table 6.

The performance metrics of the model across six domain adaptation scenarios are presented in Table 7. From the table, it can be seen that the dual-stream CNN model exhibited weaker evaluation metrics and poorer performance in domain adaptation scenarios compared to the optimized dual-stream CNN model. The optimized dual-stream CNN model achieved an average accuracy of 94.17%, an average precision of 94.92%, and an average recall of 94.05%, indicating good adaptability. The harmonic mean of precision and recall (F1 score) in all scenarios exceeded 90%.
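The four metrics can be derived from a confusion matrix as sketched below. The 3 × 3 matrix is invented for illustration, macro-averaging over classes is an assumption (the averaging scheme is not stated in the text), and the F1 score is computed as the harmonic mean of the averaged precision and recall, as described above. The sketch also assumes every class is predicted at least once, since an empty column would cause a division by zero.

```python
import numpy as np

def macro_metrics(cm):
    """Accuracy, macro precision, macro recall, and their harmonic mean (F1).

    cm[i, j] counts samples of true class i predicted as class j.
    """
    cm = np.asarray(cm, dtype=float)
    tp = np.diag(cm)                            # correctly classified samples
    accuracy = tp.sum() / cm.sum()
    precision = np.mean(tp / cm.sum(axis=0))    # per predicted class, then averaged
    recall = np.mean(tp / cm.sum(axis=1))       # per true class, then averaged
    f1 = 2.0 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical three-class confusion matrix (not data from the paper).
cm = [[8, 2, 0],
      [1, 9, 0],
      [0, 0, 10]]
accuracy, precision, recall, f1 = macro_metrics(cm)

# Harmonic mean of the precision and recall reported for the dual-stream CNN, B → C.
f1_b_to_c = 2.0 * 95.09 * 94.07 / (95.09 + 94.07)
```

Applied to the precision (95.09%) and recall (94.07%) listed for the dual-stream CNN in the B → C scenario, the harmonic mean rounds to the tabulated F1 value of 94.58%.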

When the source domain and target domain spanned one load, the accuracy remained high, with minimal difference from the accuracy under a single working condition. However, when the source and target domains spanned two loads, as in the B → D and D → B scenarios, the performance metrics declined significantly. In these cases, the dual-stream CNN model’s metrics fell below 89%. After optimization, the metrics improved, with all indicators at 90% or above, though they were still not as high as in other scenarios. This decline was due to the increased disparity in vibration signals between the source and target domains, which reduced the model’s adaptability to the target domain.

To provide a more intuitive evaluation of the model performance, the receiver operating characteristic (ROC) curve was used to describe the relationship between the false positive rate and the true positive rate and thereby assess the fault feature extraction capability and fault recognition ability, as shown in Figure 24. In each subgraph, the four colored curves correspond to the macro-average and micro-average ROC curves of the model in the current domain-adaptation scenarios. It can be observed that each ROC curve tends toward the top left corner, approaching a right-angle shape. Quantifying the ROC curves, the area under the curve (AUC) values for the optimized dual-stream CNN were all above 0.95, with most AUCs reaching 1.00, indicating near-perfect separation of the fault classes. In contrast, the AUC values for the dual-stream CNN were relatively lower, with the macro-average AUC in the D → B scenario reaching only 0.94. This demonstrates the effectiveness of the optimization scheme, which enables more accurate identification of bearing fault features.

8. Summary

This paper introduces an optimized dual-stream CNN model for bearing fault diagnosis, integrating one-dimensional diagnostic signals with two-dimensional time-frequency images. The model uses separate convolutional streams for each data representation and merges them to incorporate multiple features. Several enhancements were made to optimize the dual-stream CNN for fault diagnosis. These include the addition of convolutional attention mechanisms to focus on key features, the application of dropout algorithms to prevent overfitting, and the inclusion of batch normalization layers to improve model generalization. The model also utilizes large convolutional kernels for automatic feature learning specific to diagnosis and employs learning rate adjustments based on iteration counts. Comparative analysis with 1D-CNN, 2D-CNN, and conventional dual-stream CNN validates the superior diagnostic capability of the optimized model. Experimental results demonstrate its robustness against noise and its ability to accurately identify fault features even in domain adaptation scenarios.

However, this study only considered single-fault scenarios for bearing fault diagnosis. As mechanical systems become increasingly sophisticated and complex, bearing faults may co-occur with multiple fault types in real-world applications. Future work will focus on studying compound bearing faults. Additionally, in this system, only the bearings with the highest fault frequency were selected for diagnosis within the CNC machine tool groups. Future research will also explore faults in other critical components and integrate multi-source data, such as temperature signals, to comprehensively validate the system’s effectiveness.

Author Contributions

Conceptualization, Y.T.; Data curation, Z.N. and Y.S.; Formal analysis, Z.N.; Investigation, Z.N.; Methodology, Y.T.; Software, Z.N. and Y.T.; Validation, Y.S. and R.W.; Writing—Original Draft, Y.T. and Y.S.; Writing—Review and Editing, Y.S. and R.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the MOE (Ministry of Education in China) Project of Humanities and Social Sciences (no. 17YJC630139).

Data Availability Statement

The data presented in this study are available in the Case Western Reserve University Bearing Data Center at https://engineering.case.edu/bearingdatacenter (accessed on 8 September 2024). These data were derived from resources available in the public domain.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Zhang, J.J. Rolling Bearing Fault Diagnosis Based on Variational Mode Decomposition. Master’s Thesis, Shijiazhuang Tiedao University, Shijiazhuang, China, 2018.
2. Fan, B.Q. Design and Implementation of Vibration Pulse Signal Detection System for Rolling Bearings. Master’s Thesis, Chongqing University, Chongqing, China, 2020.
3. Han, T. Research on CNC Machine Tool Spindle Bearing Fault Diagnosis Based on Deep Learning. Ph.D. Thesis, Southwest University of Science and Technology, Mianyang, China, 2021.
4. Li, G.; Zhu, H.; He, J.; Huo, Y.; Zhang, J. Reliability Modeling of NC Machine Tools Based on Artificial Intelligence. Proc. IOP Conf. Ser. Mater. Sci. Eng. 2018, 435, 012057.
5. Unal, M.; Onat, M.; Demetgul, M.; Kucuk, H. Fault Diagnosis of Rolling Bearings Using a Genetic Algorithm Optimized Neural Network. Measurement 2014, 58, 187–196.
6. Wu, L.; Yao, B.; Peng, Z.; Guan, Y. Fault Diagnosis of Roller Bearings Based on a Wavelet Neural Network and Manifold Learning. Appl. Sci. 2017, 7, 158.
7. Shao, H.; Jiang, H.; Zhao, H.; Wang, F. A Novel Deep Autoencoder Feature Learning Method for Rotating Machinery Fault Diagnosis. Mech. Syst. Signal Process. 2017, 95, 187–204.
8. Zhao, D.Z.; Wang, T.Y.; Chu, F.L. Deep Convolutional Neural Network Based Planet Bearing Fault Classification. Comput. Ind. 2019, 107, 59–66.
9. Zhang, D.Z.; Zhou, T.T. Deep Convolutional Neural Network Using Transfer Learning for Fault Diagnosis. IEEE Access 2021, 9, 43889–43897.
10. Ma, C.Y.; Chen, M.H.; Kira, Z.; AlRegib, G. TS-LSTM and temporal-inception: Exploiting spatiotemporal dynamics for activity recognition. Signal Process. Image Commun. 2019, 71, 76–87.
11. Zhu, Y.J.; Hu, J.Q.; Li, W.; Lin, Q.Y.; Yi, C.C. Short-time Fourier Transform Based on Frequency-domain Window Function and Its Application in Mechanical Impulse Feature Extraction. Mach. Tool Hydraul. 2021, 49, 177–182.
12. Mohd, H.R.; Isa, K.; Mohamad, S. A review on speaker recognition: Technology and challenges. Comput. Electr. Eng. 2021, 90, 107005.
13. Li, N.N. Research on Rolling Bearing Fault Diagnosis Method Based on Deep Learning. Master’s Thesis, Shenyang Jianzhu University, Shenyang, China, 2021.
14. Stockwell, R.G.; Mansinha, L.; Lowe, R.P. Localization of the complex spectrum: The S transform. IEEE Trans. Signal Process. 1996, 44, 998–1001.
15. Woo, S.; Park, J.; Lee, J.Y.; Kweon, I.S. CBAM: Convolutional Block Attention Module. In Proceedings of the ECCV 2018, Munich, Germany, 8–14 September 2018; pp. 3–19.
16. Bie, S.S. Fault Diagnosis Method of Rolling Bearing Based on Improved Convolutional Neural Network. Master’s Thesis, Hunan University of Technology, Zhuzhou, China, 2022.
17. Hinton, G.E.; Srivastava, N.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R.R. Improving neural networks by preventing co-adaptation of feature detectors. arXiv 2012, arXiv:1207.0580.
18. Jing, L.Y. Research on the Fault Diagnosis Method for Rotating Machinery Using Deep Convolutional Neural Network. Ph.D. Thesis, Tianjin University, Tianjin, China, 2017.
19. Luo, X.J. Research on Fault Diagnosis Method of Wind Turbine Transmission System Based on Deep Learning. Ph.D. Thesis, North China Electric Power University, Beijing, China, 2021.
20. Lou, X.; Loparo, K.A. Bearing fault diagnosis based on wavelet transform and fuzzy inference. Mech. Syst. Signal Process. 2004, 18, 1077–1095.
21. Xie, X.T. Research on Mechanical Fault Diagnosis Technology Based on Deep Learning Theory. Master’s Thesis, Guizhou University, Guiyang, China, 2019.

Figure 1. Time-frequency diagram of the bearing vibration signal. (a) Normal bearings. (b) Outer-ring failure. (c) Inner-ring failure. (d) Rolling-element failure.

Figure 2. Bearing vibration signal envelope spectrum. (a) Normal bearings. (b) Outer-ring failure. (c) Inner-ring failure. (d) Rolling-element failure.

Figure 3. Dual-stream neural network model.

Figure 4. STFT time-frequency diagram. (a) Normal. (b) 0.021-OuterRace.

Figure 5. CWT time-frequency chart. (a) Normal. (b) 0.021-OuterRace.

Figure 6. GST time-frequency chart. (a) Normal. (b) 0.021-OuterRace.

Figure 7. WVD time-frequency diagram. (a) Normal. (b) 0.021-OuterRace.

Figure 8. Structure diagram of convolutional attention mechanism.

Figure 9. Channel attention module.

Figure 10. Spatial attention module.

Figure 11. Comparison of the network structure of the Dropout method. (a) Dropout method is not used. (b) Dropout method is used.

Figure 12. Dual-stream CNN model framework.

Figure 13. Flowchart of the dual-stream CNN fault diagnosis.

Figure 14. Overlapping data sampling.

Figure 15. The convergence of the objective function value during the training of each model. (a) 1D-CNN. (b) 2D-CNN. (c) Dual-stream CNN. (d) Optimized dual-stream CNN.

Figure 16. Accuracy curves for each model.

Figure 17. Confusion matrix of the classification results in Models 3 and 4. (a) Dual-stream CNN. (b) Optimized dual-stream CNN.

Figure 18. t-SNE feature visualization of Models 3 and 4. (a) Dual-stream CNN. (b) Optimized dual-stream CNN.

Figure 19. The outer-ring fault signal, additive Gaussian white noise, and signal and noise superposition signal.

Figure 20. Comparison of the recognition rates of each model under different signal-to-noise ratios.

Figure 21. Comparison of the variances of each model under different signal-to-noise ratios.

Figure 22. Comparison of the RMSE values of each model under different signal-to-noise ratios.

Figure 23. t-SNE feature visualization of Models 1–4 in a noisy environment. (a) 1D-CNN. (b) 2D-CNN. (c) Dual-stream CNN. (d) Optimized dual-stream CNN.

Figure 24. The ROC curves of the results of the two-model processing in the domain-adaptive scenario. (a) Optimized dual-stream CNN (Dataset B). (b) Optimized dual-stream CNN (Dataset C). (c) Optimized dual-stream CNN (Dataset D). (d) Dual-stream CNN (Dataset B). (e) Dual-stream CNN (Dataset C). (f) Dual-stream CNN (Dataset D).

Table 1. The 1D-CNN network model parameters.

Layer Number	Network Layer	Kernel Size	Stride	Number of Channels	Output Size
1	Convolutional layer	1 × 64	5	6	6 × 1 × 242
2	Pooling layer	1 × 2	2	6	6 × 1 × 121
3	Convolutional layer	1 × 5	1	16	16 × 1 × 117
4	Pooling layer	1 × 3	3	16	16 × 1 × 39
5	Flattening				1 × 624

Table 2. The 2D-CNN network model parameters.

Layer Number	Network Layer	Kernel Size	Stride	Number of Channels	Output Size
1	Convolutional layer	5 × 5	1	6	6 × 60 × 60
2	Pooling layer	2 × 2	2	6	6 × 30 × 30
3	Convolutional layer	5 × 5	1	16	16 × 26 × 26
4	Pooling layer	2 × 2	2	16	16 × 13 × 13
5	Flattening				1 × 2704

Table 3. The parameters of dual-stream converged CNN network model.

Layer Number	Network Layer	Number of Neurons	Output Size
1	Fully connected layer	120	120 × 1
2	Fully connected layer	64	64 × 1
3	Output layer	10	10 × 1

Table 4. Dataset descriptions.

Fault Classification		Rolling Element Failure			Inner-Ring Failure			Outer-Ring Failure			Normal	Load (HP)
Fault Size (inch)		0.007	0.014	0.021	0.007	0.014	0.021	0.007	0.014	0.021	0
Fault Label		0	1	2	3	4	5	6	7	8	9
A	Number of samples	1000	1000	1000	1000	1000	1000	1000	1000	1000	1000	0
B	Number of samples	1000	1000	1000	1000	1000	1000	1000	1000	1000	1000	1
C	Number of samples	1000	1000	1000	1000	1000	1000	1000	1000	1000	1000	2
D	Number of samples	1000	1000	1000	1000	1000	1000	1000	1000	1000	1000	3

Table 5. Configuration of each model.

Serial Number	Model	Configuration
1	1D-CNN	Conv1d(1 × 5)−MaxPool1d(3)−Conv1d(1 × 3)−MaxPool1d(3)−FC
2	2D-CNN	Conv2d(2 × 2)−MaxPool2d(2)−Conv2d(5 × 5)−MaxPool2d(2)−FC
3	Dual-stream CNN	(1D−CNN + 2D−CNN)−FC−FC2
4	Optimized dual-stream CNN	Dual-stream CNN + CA + SA + Dropout + BN + large convolution kernel + learning rate decay

Table 6. Domain-adaptive scene settings.

Domain Type	Source Domain	Target Domain
Dataset	Training set B	Test set C	Test set D
	Training set C	Test set B	Test set D
	Training set D	Test set B	Test set C
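
The settings in Table 6 amount to training on one of the datasets B, C, or D and testing on each of the other two. A short sketch that enumerates the resulting transfer tasks follows; the variable names are hypothetical.

```python
# Datasets that take part in the domain-adaptive experiments (Table 6).
DOMAINS = ("B", "C", "D")

# Every ordered (source, target) pair with different source and target.
transfer_tasks = [(src, tgt) for src in DOMAINS for tgt in DOMAINS if src != tgt]

for src, tgt in transfer_tasks:
    print(f"train on {src}, test on {tgt}")
```

This gives six tasks in total, the same six source and target combinations that are evaluated in Table 7.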

Table 7. Robustness metrics of each model in the domain-adaptive scenarios.

Model	Source Domain → Target Domain	Accuracy (%)	Precision (%)	Recall (%)	F1 Score (%)
Dual-stream CNN	B → C	94.5	95.09	94.07	94.58
	B → D	87.5	88.12	85.74	87.39
	C → B	93.0	92.94	93.76	93.35
	C → D	88.0	91.21	88.79	89.98
	D → B	84.0	86.11	85.73	85.92
	D → C	90.0	91.16	89.97	90.56
Optimized dual-stream CNN	B → C	97.0	97.28	97.49	97.38
	B → D	90.0	91.75	90.10	90.61
	C → B	98.5	98.59	98.63	98.61
	C → D	95.0	96.71	95.28	95.99
	D → B	90.5	91.11	90.27	90.18
	D → C	97.0	97.09	97.15	97.12
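
To make the comparison in Table 7 easier to read, the sketch below computes the per-task accuracy gain of the optimized model over the plain dual-stream CNN and the mean accuracy of each model. The accuracy values are copied from the table; the dictionary and variable names are hypothetical.

```python
# Accuracy (%) per transfer task, copied from Table 7.
BASELINE = {"B->C": 94.5, "B->D": 87.5, "C->B": 93.0,
            "C->D": 88.0, "D->B": 84.0, "D->C": 90.0}
OPTIMIZED = {"B->C": 97.0, "B->D": 90.0, "C->B": 98.5,
             "C->D": 95.0, "D->B": 90.5, "D->C": 97.0}

# Accuracy gain in percentage points for each transfer task.
gains = {task: OPTIMIZED[task] - BASELINE[task] for task in BASELINE}

mean_baseline = sum(BASELINE.values()) / len(BASELINE)
mean_optimized = sum(OPTIMIZED.values()) / len(OPTIMIZED)
mean_gain = sum(gains.values()) / len(gains)

for task, gain in gains.items():
    print(f"{task}: +{gain:.1f} points")
print(f"mean accuracy: {mean_baseline:.2f} -> {mean_optimized:.2f}")
```

The gains range from 2.5 to 7.0 percentage points, and the mean accuracy rises from 89.5% to roughly 94.7% across the six tasks.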

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Ni, Z.; Tong, Y.; Song, Y.; Wang, R. Enhanced Bearing Fault Diagnosis in NC Machine Tools Using Dual-Stream CNN with Vibration Signal Analysis. Processes 2024, 12, 1951. https://doi.org/10.3390/pr12091951

