1. Introduction
Vibration monitoring for rotary machines tracks the condition of important components between planned maintenance intervals. This monitoring helps factories prevent severe damage to electrical machines, ensure product quality, and plan production proactively. Vibration signals emitted from electric motor systems carry health information about these systems that can be used to detect potential damage. According to a literature review, 40–50% of electric motor failures occur due to bearing faults [1]. Especially in heavy motor machines, it is critical to find broken components that need repair or replacement. Moreover, indicating the degradation level of a component helps in scheduling its repair.
Vibration-based bearing fault diagnosis was adopted early on, using signal representation methods independently in the time or frequency domain. For example, envelope analysis can be used to show the tendency of signals in the frequency domain [2]. From changes in the spectrum envelope, maintenance engineers can predict bearing faults through observation. With advances in extracting useful information from vibration signals, diagnosis of motor failures by maintenance engineers can gradually be replaced by an automatic process. Recent automatic diagnosis methods have focused on achieving high accuracy and flexibility. However, those methods require very complex fault diagnosis processes to analyze the acquired vibration signals.
When signals tend to be nonstationary, time-frequency approaches become appropriate and highly efficient. An improved Hilbert–Huang transform (HHT) was shown to be a potentially computation-efficient time-frequency method that, unlike the wavelet-based scalogram, is unaffected by frequency resolution and time resolution, resulting in high-accuracy detection of rolling bearing faults from vibration signals [3]. Empirical mode decomposition (EMD), which decomposes a signal into a number of intrinsic mode functions (IMFs), was also proposed to improve diagnosis accuracy [4]. A third-order statistic, the bi-spectrum, was applied to analyze and detect single bearing fault types. Moreover, intelligent algorithms and techniques such as support vector machines (SVMs) and artificial neural networks (ANNs) also enhance the prediction accuracy for bearing faults. With the discrete wavelet transform (DWT) as preprocessing for feature extraction, an SVM classifier fed with the selected statistical features performs better than an ANN in detecting bearing faults [5].
Over the last few years, the development of convolutional neural networks (CNNs) has significantly improved the accuracy of fault diagnosis, where various kinds of features affect the diagnosis process. A method that uses a CNN as a back-end classifier, trained on statistical features extracted from vibration signals of single faults, provides high accuracy for fault diagnosis [6]. More recent studies have also achieved high diagnosis accuracy by using CNNs for bearing faults. D. T. Hoang and H. J. Kang used vibration signals directly as input data for bearing fault diagnosis, adopting an automatic fault diagnosis system that does not require any feature extraction technique; it achieved high accuracy and robustness in noisy environments while considering variable shaft speeds [7].
In a real fault diagnosis process, there are situations in which a covariate is not directly measured, which confounds the diagnosis of machines. To improve the accuracy of fault diagnosis methods based on deep learning, especially CNNs, automatically constructing nonlinear representations under complex situations is very important. Accordingly, studies that apply CNNs after a preprocessing stage have achieved high prediction accuracy even under complex working conditions. When classifying bearing cracks of multiple scales and compound faults at variable shaft speeds, using bi-spectrum images to represent signals achieved high accuracy [8]. A CNN with the Adamax optimizer was used to extract features and classify types of compound bearing faults. The efficiency of combining a CNN with a time-frequency representation was also demonstrated by using the HHT to create time-frequency images as input data for a CNN [9].
Meanwhile, the spectrogram is a simple method for representing useful information from nonstationary signals [10]. To classify bearing faults under complicated conditions, a sufficiently deep CNN is required. Thus, our study applies a deep and capable CNN model to spectrograms obtained by preprocessing in order to achieve high accuracy even under complex working conditions. This paper proposes a new fault diagnosis method using a practical CNN to extract useful information from the time-frequency representation of signals. The proposed method can achieve high accuracy in diagnosing compound bearing faults under variable shaft speeds. In addition, the stability of the proposed method in noisy environments is demonstrated through various experiments. The remainder of this paper is organized as follows. Section 2 presents the details of our proposed bearing fault diagnosis method, and Section 3 presents the detailed experimental methods. Section 4 provides a discussion of various experimental results, and Section 5 presents the conclusions.
2. Proposed Bearing Fault Diagnosis Method
An overview of the proposed bearing fault diagnosis method is shown in Figure 1. First, the vibration signals are split into fixed-length 5-second segments. The Short-Time Fourier Transform (STFT) is then applied to each signal segment to produce a spectrogram. The spectrogram images generated from the vibration signals corresponding to the bearing faults are then used to train the CNN-VGG16 network architecture, and the trained CNN-VGG16 model is used to classify bearing faults automatically in the test stage.
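The segmentation step can be illustrated with a minimal sketch. The function name, the sampling rate, and the choice to discard a trailing partial segment are assumptions made for illustration only; the text does not state how the authors handle these details.

```python
def split_into_segments(signal, sample_rate_hz, segment_seconds=5.0):
    """Split a 1-D sequence into consecutive, non-overlapping fixed-length segments.

    A trailing remainder shorter than one full segment is discarded
    (an assumption; the paper does not say how the remainder is treated).
    """
    segment_len = int(round(sample_rate_hz * segment_seconds))
    if segment_len <= 0:
        raise ValueError("segment length must be positive")
    n_segments = len(signal) // segment_len
    return [signal[i * segment_len:(i + 1) * segment_len]
            for i in range(n_segments)]


# Hypothetical example: 12 s of samples at an assumed 1 kHz sampling rate.
fs = 1000
signal = [0.0] * (12 * fs)
segments = split_into_segments(signal, fs)
print(len(segments), len(segments[0]))
```

Each returned segment would then be passed to the STFT stage to produce one spectrogram image.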
2.1. Signal Preprocessing
The vibration signals acquired from electrical machines are nonstationary, as they contain bearing fault information modulated by variable rotational speeds and surrounding environmental noise. The frequency of the obtained signal is continuous, and its time-domain statistical features change with time. Nonstationary signals typically reveal little useful information in the time or frequency domain alone but may reveal useful information through a joint representation in the time-frequency space.
Under varying bearing working conditions (various types of single and compound faults and various degradation levels at variable rotational speeds), it is challenging to obtain meaningful information with traditional signal processing methods such as envelope spectrum analysis and the wavelet packet transform. Under these conditions, the signal’s spectrum varies greatly in amplitude and frequency. Therefore, it is difficult to recognize bearing defects under adverse working conditions using conventional signal processing methods. For appropriate analysis and synthesis of nonstationary signals, the Short-Time Fourier Transform (STFT) is a typical Fourier transform (FT) variant used to create spectrograms of signals. The spectrogram is a visual representation of a signal in both the time and frequency domains, using the color scale of the image to indicate the amplitude at each frequency.
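As a rough illustration of how spectral amplitudes can be mapped to image intensities, the sketch below converts a matrix of magnitudes to decibels and rescales it to 8-bit values. The 80 dB dynamic range, the grayscale output, and the function name are assumptions; the paper uses a color scale and does not specify its mapping, so a colormap would still have to be applied to these intensities.

```python
import math


def magnitudes_to_gray(mag, dynamic_range_db=80.0, eps=1e-12):
    """Map a 2-D list of non-negative magnitudes to 8-bit intensities (0-255).

    Values are converted to dB, clipped to `dynamic_range_db` below the
    maximum, and scaled linearly so the maximum maps to 255.
    """
    db = [[20.0 * math.log10(max(v, eps)) for v in row] for row in mag]
    top = max(max(row) for row in db)
    floor = top - dynamic_range_db
    return [
        [int(round(255.0 * (max(v, floor) - floor) / dynamic_range_db))
         for v in row]
        for row in db
    ]


# Hypothetical 2 x 2 magnitude matrix spanning several orders of magnitude.
pixels = magnitudes_to_gray([[1.0, 0.1], [0.01, 0.0]])
print(pixels)
```

The logarithmic scale keeps weak spectral components visible next to strong ones, which is one common reason spectrogram images are rendered in dB.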
2.1.1. Short-Time Fourier Transform
The STFT applies a Fourier transform localized in both the frequency and time domains to signals that are time-varying or nonstationary [11]. The framing step of the STFT can be represented as follows:

$$x_l(m) = x(m + lH)\, w(m),$$

where $x$ is the signal, $w$ is the analysis window, $m \in \{1, 2, \ldots, M\}$ is an index related to the beginning of the sliding extraction window (local time index), $M \in \mathbb{N}$ is the analysis window length, $l \in \{0, 1, \ldots, L-1\}$ is the frame index, $L \in \mathbb{N}$ is the number of frames, and $H \in \mathbb{N}$ is the hop size. Further, the following discrete Fourier transform (DFT) is performed on every frame $x_l(m)$, giving a localized two-sided spectrum [11]:

$$X(k, l) = \sum_{m=1}^{M} x_l(m)\, e^{-j 2\pi (k-1)(m-1)/K},$$

where $k \in \{1, 2, \ldots, K\}$ is the frequency bin index and $K \in \mathbb{N}$ is the DFT size.
The term $X(k, l)$ is called the STFT of $x$ and corresponds to the local time-frequency behavior of the signal around the time index $lH$ and frequency bin $k$.
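The framing and per-frame DFT described above can be sketched in plain Python. This is a naive, unoptimized illustration and not the MATLAB routine used in the paper; the Hann window, the 0-based indexing, and the function names are assumptions.

```python
import cmath
import math


def hann(M):
    """Periodic Hann window of length M (an assumed window choice)."""
    return [0.5 - 0.5 * math.cos(2.0 * math.pi * m / M) for m in range(M)]


def stft(x, M, H, K=None):
    """Naive STFT: frame the signal, window each frame, take a K-point DFT.

    x : sequence of samples
    M : analysis window length
    H : hop size
    K : DFT size (defaults to M; assumed to satisfy K >= M)
    Returns a list of L frames, each a list of K complex bins.
    Indices are 0-based here, unlike the 1-based indices in the text.
    """
    K = M if K is None else K
    if len(x) < M:
        return []
    w = hann(M)
    L = 1 + (len(x) - M) // H  # number of complete frames
    frames = []
    for l in range(L):
        frame = [x[l * H + m] * w[m] for m in range(M)]
        bins = [
            sum(frame[m] * cmath.exp(-2j * math.pi * k * m / K)
                for m in range(M))
            for k in range(K)
        ]
        frames.append(bins)
    return frames


# Test tone with exactly 4 cycles per 32-sample window.
x = [math.cos(2.0 * math.pi * 4 * n / 32) for n in range(128)]
S = stft(x, M=32, H=8)
# Strongest bin in the lower half of the two-sided spectrum, per frame.
peaks = [max(range(17), key=lambda k: abs(frame[k])) for frame in S]
print(len(S), set(peaks))
```

The magnitudes `abs(S[l][k])` form the spectrogram matrix. A practical implementation would replace the inner sums with an FFT, since the direct DFT costs on the order of M × K operations per frame.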
2.1.2. Implementing STFT Analysis for Spectrograms
With the advantages of frequency-domain analysis for nonstationary signals, the STFT can be a good approach for analyzing bearing signals under complex conditions or with background noise. To apply the STFT efficiently to bearing fault signals, the STFT matrix can be computed with a new MATLAB routine that achieves high accuracy and computational efficiency [12]. After analyzing over which segment duration the essential features of the signal (such as mean, RMS, standard deviation, variance, kurtosis, skewness, crest factor, and form factor) do not change with time, the window length ($M$) is set to 1024. Accuracy is prioritized in this choice, although shorter windows adversely affect the calculation volume. The hop size is set to $M/4$ based on various experiments [12].
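The statistical features named above can be computed as in the following sketch. The definitions are common textbook ones (population variance, non-excess kurtosis, form factor as RMS over mean absolute value) and are assumptions, since the paper does not state which variants it uses.

```python
import math


def time_domain_features(x):
    """Return common time-domain statistics of a non-constant 1-D sequence."""
    n = len(x)
    if n == 0:
        raise ValueError("empty signal")
    mean = sum(x) / n
    variance = sum((v - mean) ** 2 for v in x) / n  # population variance
    if variance == 0.0:
        raise ValueError("constant signal: skewness and kurtosis undefined")
    std = math.sqrt(variance)
    rms = math.sqrt(sum(v * v for v in x) / n)
    peak = max(abs(v) for v in x)
    mean_abs = sum(abs(v) for v in x) / n
    return {
        "mean": mean,
        "rms": rms,
        "std": std,
        "variance": variance,
        "skewness": sum((v - mean) ** 3 for v in x) / n / std ** 3,
        "kurtosis": sum((v - mean) ** 4 for v in x) / n / std ** 4,
        "crest_factor": peak / rms,
        "form_factor": rms / mean_abs,
    }


# Sanity check on one full period of a unit-amplitude sine wave.
sine = [math.sin(2.0 * math.pi * n / 1000) for n in range(1000)]
features = time_domain_features(sine)
print(features)
```

One plausible way to judge a candidate window length is to compute these features on consecutive windows of that length and check that they stay roughly constant from window to window.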
Figure 2 shows the spectrograms produced by applying the STFT to vibration signals at four different shaft speeds (1730, 1750, 1772, and 1797 RPM) with three kinds of faults (inner race fault, outer race fault, and roller fault).
2.2. CNN-VGG16 Model for Bearing Fault Classification
The CNN architecture affects both prediction accuracy and computational cost. A well-chosen CNN can achieve high prediction accuracy while keeping the computational cost low. Therefore, a practical model should be used for systems with limited hardware. Typically, the design and selection of a CNN architecture should focus on the structure of the network and its size (the resolution of the input data and the width and depth of the CNN). The complexity of the CNN architecture is chosen based on the characteristics of the input data and the number of output neurons (the number of classes to be distinguished). Many recent CNN architectures are highly effective in image classification. The choice of network architecture should ensure high prediction accuracy with efficient use of resources. For accurate bearing fault diagnosis, we examined various scenarios based on the VGG16 architecture [13].
A CNN model with a limited number of layers is economical in terms of storage and computing resources. However, for complex problems such as diagnosing various types of faults under changing rotational speeds, a deeper CNN needs to be applied. Previous studies applied generically structured CNNs, which consist of convolutional layers, pooling layers, and fully connected layers [14]. LeNet-5, which is built from five simple CNN layers, was used for bearing fault diagnosis [15,16]. Similarly, a simple self-designed CNN with two CNN layers was also proposed [8].
Although simply structured CNN architectures are easy to design and do not require many computational resources, more complex CNN architectures using efficient techniques can provide high fault diagnosis accuracy in complicated scenarios. VGG16 is a promising model that focuses on depth, whereas previous bearing fault diagnosis methods using the AlexNet CNN model focused on small window sizes and strides in particular convolutional layers [9,17]. The VGG16 model has been shown to benefit classification accuracy.
Figure 3 shows the detailed architecture of our VGG16 model fed by spectrogram images. The input size of the proposed VGG16 is fixed at 224 × 224 with 3 channels (RGB images). The input image is passed through a stack of convolutional layers whose filters have a very small receptive field: 3 × 3. In other configurations, an even smaller 1 × 1 filter is also used, which acts as a linear transformation of the input channels (followed by a nonlinear transformation). The convolution stride is fixed at 1 pixel. The input of each convolutional layer is padded so that the spatial resolution is preserved after convolution. Spatial pooling is performed by five max-pooling layers, which follow some of the convolutional layers and reduce the spatial dimensions. Max-pooling is performed over a 2 × 2 pixel window with a stride of 2. The convolutional layers are followed by three fully connected (FC) layers. The first and second FC layers have 4096 neurons each, and the number of neurons in the third FC layer depends on the number of bearing fault classes to be distinguished (8 or 22 classes, depending on the experiment). The final layer is the soft-max layer, which outputs a vector giving the predicted probability distribution over bearing faults. All hidden layers use the rectified linear unit (ReLU) as the nonlinear activation function [13].
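To make the layer arithmetic concrete, the sketch below walks through the standard VGG16 layout (13 convolutional layers with 3 × 3 filters, five 2 × 2 max-pooling layers, three FC layers) and counts trainable parameters for a given number of output classes. The channel widths are those of the standard VGG16 configuration in [13], assumed here to match the model in Figure 3; the function name is hypothetical.

```python
# Standard VGG16 layout: integers are output channels of 3x3 convolutions,
# "M" marks a 2x2 max-pooling layer with stride 2.
VGG16_CFG = [64, 64, "M", 128, 128, "M", 256, 256, 256, "M",
             512, 512, 512, "M", 512, 512, 512, "M"]


def vgg16_summary(num_classes, input_size=224, in_channels=3):
    """Return (final feature-map size, final channels, parameter count)."""
    size, channels, params = input_size, in_channels, 0
    for item in VGG16_CFG:
        if item == "M":
            size //= 2  # pooling halves each spatial dimension
        else:
            # 3x3 conv, stride 1, padding 1: spatial size is preserved.
            params += 3 * 3 * channels * item + item  # weights + biases
            channels = item
    in_features = size * size * channels  # flattened input to the first FC layer
    for out_features in (4096, 4096, num_classes):
        params += in_features * out_features + out_features
        in_features = out_features
    return size, channels, params


for n_classes in (8, 22):
    print(n_classes, vgg16_summary(n_classes))
```

Under these assumptions the 224 × 224 input is reduced to a 7 × 7 × 512 feature map before the FC layers, and changing the number of fault classes only changes the size of the last FC layer, which is a small fraction of the total parameter count.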
5. Conclusions
We proposed a new fault diagnosis method for rotary machine bearings that can identify and recognize faults under inconsistent working conditions, including non-steady shaft speeds, bearing cracks of different scales, compound faults, and noisy working environments. Moreover, the degradation level of every kind of bearing fault was evaluated to determine the diagnosis accuracy under complicated conditions. In the proposed method, spectrograms of vibration acceleration signals under inconsistent working conditions were computed in a preprocessing step using the Short-Time Fourier Transform (STFT), and the spectrograms were then provided as informative input data to a VGG16-based CNN model for fault identification and classification. The proposed method provides an average accuracy of almost 100% for combined bearing faults as well as single bearing faults. This indicates that representing nonstationary signals in the time-frequency domain can be a practical approach for bearing fault diagnosis using vibration signals. We also showed that the proposed method can predict bearing faults with very high accuracy even in noisy environments. Therefore, we expect that the proposed method can be applied in real industrial environments under complex working conditions.