1. Introduction
With the emergence of IoT, innovative applications in the fault diagnosis field have increased rapidly [1]. For instance, Kumar et al. [2] proposed a fault diagnosis method based on IoT and semi-supervised learning for a panel-level solar photovoltaic array. Tran et al. [3] studied a novel fault recognition method based on IoT and deep learning for induction motors. However, existing centralized cloud computing models struggle to cope with the massive number of IoT devices used to acquire fault data from rotating machinery and with the long-distance data transmission between these devices and the cloud. Consequently, it is very important to develop fast fault diagnosis methods for rotating machinery to avoid major safety accidents and economic losses in industrial production. Fortunately, edge computing, which processes data close to the IoT devices, offers a promising way to address the deficiencies of centralized cloud computing [4]. For example, Wang et al. [5] proposed a lightweight convolutional neural network for the intelligent fault diagnosis of bearings in the Industrial IoT context. Pan et al. [6] proposed a novel edge-IoT framework based on blockchain and smart contracts. Huang et al. [7] studied the development and application of multi-source sensing data fusion models and algorithms for IoT-based fault diagnosis and prediction of mechanical equipment using artificial intelligence and big data processing technology.
For rotating machinery, the gearbox is the most important power transmission component in mechanical equipment. It mainly consists of gears, shafts, and bearings, and its health status directly determines whether the equipment can work normally. Because different types of faults are coupled together and the signals are non-stationary and contain a large amount of noise, it is very difficult for existing methods to extract the most valuable fault characteristics from the raw data [8]. If the specific fault category can be accurately recognized and predicted in the edge-IoT context, the huge losses caused by faults can be effectively avoided [9]. Thus, it is highly meaningful to develop a high-accuracy fault diagnosis method for gearbox compound faults in a strong-noise environment.
It is known that feature extraction and fault pattern identification are the two main steps of fault diagnosis for rotating machinery [10]. Traditional feature extraction methods mainly consist of statistical feature extraction [11] and signal analysis techniques such as the Fourier transform [12], the wavelet transform [13], empirical mode decomposition [14], and more. Typical pattern recognition methods include support vector machines [15], extreme learning machines [16], artificial neural networks [17], and other improved approaches [18]. For example, Wang et al. [18] diagnosed gearbox compound faults by using a double extreme learning machine to perform clustering and classification, respectively.
Although traditional fault diagnosis methods have achieved some satisfactory results, they still have many shortcomings. First, they rely heavily on expert and prior knowledge to obtain high-quality features. Second, they typically exhibit poor generalization ability and limited diagnostic accuracy, because they are easily influenced by environmental factors such as strong noise interference [19].
In recent years, new intelligent diagnosis methods based on deep learning have been widely used in gearbox fault diagnosis [19]. These methods have a strong self-learning ability and can obtain distinguishable fault features from the raw data after multiple iterations of learning [20]. For example, autoencoders [21], convolutional neural networks [22], residual neural networks [23], recurrent neural networks [24], long short-term memory neural networks [25], and others have been used to identify the fault categories of rotating machinery. Notably, convolutional neural networks and graph attention networks have been widely applied in research fields with high computational and data requirements because of their powerful representation capability [26,27]. In addition, transfer learning has been used to build deep network models that can adaptively recognize various faults [28].
However, deep learning-based diagnostic methods depend heavily on hardware and incur high training costs, and the models often lack strong generalization and anti-noise ability. Some researchers have therefore combined signal processing methods with deep learning to develop more effective fault diagnosis methods that are more robust and cheaper to train [29]. For example, Bai et al. [30] used the Fourier transform to convert the sensor signal into an image and then applied an MCNN to mine the fault characteristics. Chen et al. [31] used the wavelet transform to decompose the raw data and then identified the internal features through an MCNN and a softmax classifier. Hong et al. [13] decoupled compound fault signals by balanced multiwavelets and maximum correlated kurtosis deconvolution and then extracted the fault frequencies by spectrum analysis. Nevertheless, with existing methods it remains very difficult to achieve high diagnosis accuracy when the specific fault type must be located within compound faults, and especially to maintain good robustness under strong noise.
To summarize, the difficulties of existing methods in gearbox compound fault diagnosis lie mainly in three aspects. First, compound faults are complex, highly non-stationary, and contaminated by a large amount of noise, which usually leads to low accuracy in locating the specific fault type. Second, traditional methods depend heavily on manual feature extraction and on complicated algorithms to select the most valuable features. Third, deep learning-based methods need complex model architectures and extensive training to accomplish compound fault diagnosis; in particular, the feature extraction ability of deep models is significantly degraded by strong noise.
In view of these problems, a novel high-accuracy fault diagnosis method for gearbox compound faults, LMWT-MCNN, is proposed in this work, where LMWT denotes the Legendre multiwavelet transform. The proposed method decomposes the raw data into a few low- and high-frequency components using LMWT. Feature images are then designed from these frequency components. Finally, the powerful feature learning ability of the MCNN model is exploited to extract more salient and valuable fault features from the feature images without manual feature selection.
Notably, LMWT has multiple basis functions and many excellent properties that match the complex fault characteristics of the gearbox. Therefore, the feature images obtained by LMWT can effectively represent the discriminative fault characteristics of the gearbox, and, owing to its orthogonality, they contain no redundancy or leakage. Furthermore, the amplitude of the noise in the feature images is usually smaller than that of the fault frequency components [32], so the max pooling layers of the MCNN model can effectively remove the noise frequency components, which underlies the strong anti-noise ability of the proposed method. In addition, the proposed method uses multiple labels to effectively identify the specific fault type within gearbox compound faults [33].
Finally, the effectiveness and robustness of the proposed method are verified on the PHM 2009 dataset from the 2009 Prognostics and Health Management Competition (https://phmsociety.org/data-analysis-competition/, accessed on 21 October 2023) [34] and on the Paderborn University bearing compound fault datasets (https://mb.uni-paderborn.de/en/kat/main-research/datacenter/bearing-datacenter/data-sets-and-download/, accessed on 21 October 2023) [35]. Both datasets have been studied with a number of fault diagnosis methods, but high diagnostic accuracy has been difficult to achieve [14,25,33,36,37,38,39,40,41,42,43].
However, the experimental results in this paper demonstrate that the proposed method achieves the highest diagnosis accuracy and stronger robustness than the other existing methods compared. The main contributions and advantages of this paper are as follows.
- (1)
This paper constructs two feature images from each data sample in the LMWT frequency domain, which can effectively match the complex fault characteristics of rotating machinery.
- (2)
This paper proposes an end-to-end compound fault diagnosis model based on edge-IoT. The proposed model not only avoids complex manual feature extraction but is also a lightweight network consisting of only three convolutional layers and three corresponding pooling layers.
- (3)
This paper provides an effective model for extracting multiple fault features in strong-noise environments, which makes it well suited to compound fault diagnosis in real applications.
- (4)
This work conducts comparative experiments on two rotating machinery datasets, which verify the effectiveness and robustness of the developed method. The recognition results indicate that the proposed model achieves the highest diagnosis accuracy and shows a powerful anti-noise ability.
The remainder of this paper is organized as follows: Section 2 introduces edge-IoT, LMWT, and the CNN model, and describes in detail the decomposition and reconstruction of a sample from gearbox fault case 2 to show how LMWT decomposes the raw data into different frequency components. In Section 3, the two feature images of a sample are constructed in detail from the frequency components, and the flowchart of the hybrid LMWT-MCNN fault diagnosis method based on edge-IoT is described. In Section 4, the proposed method is applied to identify different fault categories of rotating machinery, and the diagnosis results are compared with those of existing methods. Finally, Section 5 gives the conclusions of this research and prospects for future work.
2. Research Methodology
In this section, the edge-IoT framework is first described in detail. Second, the concept and properties of the LMW bases are introduced, and the decomposition and reconstruction of a sample are described. Third, the structure of the CNN model is described. Finally, the multi-label method for gearbox compound fault diagnosis is briefly introduced.
2.1. The Data Acquisition and Fault Diagnosis System Based on Edge-IoT
Intelligent fault diagnosis of mechanical equipment mainly consists of three processing procedures: signal acquisition, feature extraction, and classification. The data acquisition stage has a significant impact on the industrial application of mechanical fault diagnosis. Traditional fault diagnosis systems are mostly based on a centralized cloud computing structure [1]. However, when the terminals produce a large volume of data, centralized transmission to the cloud places high demands on network bandwidth and consumes huge bandwidth and computing resources [4]. Wu et al. [44] proposed an edge-cloud architecture for IoT devices with intelligent diagnosis functions for mechanical equipment, which can effectively handle the large data volumes of the terminals and meet the requirements of online fault diagnosis. Consequently, this paper adopts a data acquisition and intelligent fault diagnosis system based on edge-IoT for gearbox compound faults, as shown in Figure 1.
As shown in Figure 1, different sensor groups collect fault data from rotating components and equipment such as gearboxes, bearings, gears, and wind turbines in different environments. The end controller receives the large amount of data acquired by the devices, and the end server receives the proposed model from the master server. The collected data are transmitted to the edge computing node, which runs the proposed lightweight model for online fault diagnosis. The proposed end-to-end lightweight network is used on the edge computing nodes to obtain high-accuracy fault diagnosis results. Finally, the diagnosis results, which are small in volume, are transmitted to the centralized cloud platform, where the master controller analyzes and visualizes them.
To summarize, the IoT-based lightweight edge computing model can process sensor data directly at the edge of the network, which not only meets the growing computing demands of terminal devices but also solves the problem of long delays in accessing cloud data centers. Compared with centralized cloud computing, the proposed edge-IoT fault diagnosis method moves data analysis, communication, control, and storage closer to the sensing point, offering low latency, low energy consumption, and high accuracy.
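The data flow just described can be illustrated with a minimal sketch of one edge node step. It assumes a hypothetical `classify_window` function standing in for the lightweight model (it only thresholds the RMS value and is not LMWT-MCNN), 32-bit float samples, and a JSON result message; none of these details come from the system in Figure 1. The sketch only shows why sending diagnosis results instead of raw windows reduces the traffic to the cloud.

```python
import json
import math
import struct


def classify_window(window):
    """Hypothetical stand-in for the edge model: thresholds the RMS value.

    This is NOT the LMWT-MCNN model; it only produces a small result record.
    """
    rms = math.sqrt(sum(v * v for v in window) / len(window))
    return {"label": "fault" if rms > 1.0 else "normal", "rms": round(rms, 4)}


def edge_node_step(window, device_id):
    """Diagnose one window on the edge node and build the message for the cloud."""
    # Raw payload that would have to be uploaded without edge computing (float32 samples).
    raw_payload = struct.pack(f"<{len(window)}f", *window)
    # Compact diagnosis record that is actually sent to the cloud platform.
    message = json.dumps({"device": device_id, **classify_window(window)}).encode("utf-8")
    return raw_payload, message


# Synthetic 4096-point window (50 full sine cycles) standing in for sensor data.
window = [math.sin(2 * math.pi * 50 * t / 4096) for t in range(4096)]
raw_payload, message = edge_node_step(window, "gearbox-01")
print(len(raw_payload), len(message))  # 16384 bytes of raw data vs. a short JSON record
```

For a 4096-point window, the raw payload is 16,384 bytes, while the result record is only a few dozen bytes.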
2.2. Legendre Multiwavelet Bases
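Legendre polynomials of degree $k$, denoted by $P_k(x)$, are defined on $[-1,1]$ by

$$P_0(x)=1,\qquad P_1(x)=x,\qquad P_{k+1}(x)=\frac{(2k+1)\,x\,P_k(x)-k\,P_{k-1}(x)}{k+1},\quad k\ge 1,\tag{1}$$

where only $P_0,\ldots,P_{p-1}$ are used, and $p$ is the number of the adopted LMW bases. According to the literature [45], the Legendre scale basis functions $\phi_k(x)$, $k=0,1,\ldots,p-1$, are given by

$$\phi_k(x)=\begin{cases}\sqrt{2k+1}\,P_k(2x-1), & 0\le x<1,\\ 0, & \text{otherwise}.\end{cases}\tag{2}$$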
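A minimal sketch, in plain Python, of the Legendre recurrence and the scale functions $\phi_k(x)=\sqrt{2k+1}\,P_k(2x-1)$ on $[0,1)$. It evaluates $P_k$ by the three-term recurrence and checks orthonormality numerically with a composite midpoint rule; the function names, the quadrature choice, and $p=4$ are illustrative assumptions, not the authors' implementation.

```python
import math


def legendre(k, x):
    """P_k(x) on [-1, 1] from P_0 = 1, P_1 = x and the three-term recurrence."""
    if k == 0:
        return 1.0
    p_prev, p_curr = 1.0, x
    for n in range(1, k):
        # P_{n+1} = ((2n + 1) x P_n - n P_{n-1}) / (n + 1)
        p_prev, p_curr = p_curr, ((2 * n + 1) * x * p_curr - n * p_prev) / (n + 1)
    return p_curr


def phi(k, x):
    """Legendre scale function sqrt(2k + 1) * P_k(2x - 1) on [0, 1), zero elsewhere."""
    if 0.0 <= x < 1.0:
        return math.sqrt(2 * k + 1) * legendre(k, 2.0 * x - 1.0)
    return 0.0


def gram_matrix(p, n_points=4000):
    """Approximate <phi_i, phi_j> on [0, 1) with the composite midpoint rule."""
    h = 1.0 / n_points
    xs = [(i + 0.5) * h for i in range(n_points)]
    values = [[phi(k, x) for x in xs] for k in range(p)]
    return [[h * sum(a * b for a, b in zip(values[i], values[j])) for j in range(p)]
            for i in range(p)]


gram = gram_matrix(p=4)
for row in gram:
    print([round(v, 4) for v in row])  # rows of (approximately) the 4 x 4 identity
```

The Gram matrix is close to the identity, which is the orthonormality of $\{\phi_k\}_{k=0}^{p-1}$ on $[0,1)$.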
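Furthermore, a subspace $V_n^p$ of piecewise polynomials is defined as

$$V_n^p=\left\{f:\ f\big|_{I_{n,l}}\ \text{is a polynomial of degree less than } p,\ l=0,1,\ldots,2^n-1;\ f=0\ \text{outside}\ [0,1)\right\},\tag{3}$$

which constitutes a linear space, where $n=0,1,2,\ldots$ is the resolution level, $l$ is the translation parameter, and the corresponding interval is $I_{n,l}=\left[2^{-n}l,\,2^{-n}(l+1)\right)$. Clearly, the set $\{\phi_k\}_{k=0}^{p-1}$ forms an orthonormal basis of $V_0^p$. The subspace $V_n^p$ is likewise spanned by the $2^np$ functions obtained by dilation and translation,

$$\phi_{k,l}^{n}(x)=2^{n/2}\,\phi_k\!\left(2^n x-l\right),\qquad k=0,\ldots,p-1,\quad l=0,\ldots,2^n-1,\tag{4}$$

which form an orthonormal basis of $V_n^p$. If the vibration signals of rotating machinery with various faults are analyzed only in $V_n^p$, essentially only the low-frequency components at resolution level $n$ are obtained. Since many characteristics of rotating machinery are salient in the high-frequency components, the orthogonal complement of $V_n^p$ in $V_{n+1}^p$, i.e., the multiwavelet subspace $W_n^p$, is also needed:

$$V_n^p\oplus W_n^p=V_{n+1}^p,\qquad W_n^p\perp V_n^p.\tag{5}$$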
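Alpert [45] constructed this multiwavelet subspace and used it to compute integral and differential operators efficiently. The construction is expressed by the two-scale relations

$$\phi_k(x)=\sqrt{2}\sum_{j=0}^{p-1}\left(h^{(0)}_{kj}\,\phi_j(2x)+h^{(1)}_{kj}\,\phi_j(2x-1)\right),\tag{6}$$

$$\psi_k(x)=\sqrt{2}\sum_{j=0}^{p-1}\left(g^{(0)}_{kj}\,\phi_j(2x)+g^{(1)}_{kj}\,\phi_j(2x-1)\right),\tag{7}$$

where $\psi_k(x)$, $k=0,\ldots,p-1$, are the multiwavelet bases. In this work, the coefficient matrices $H^{(0)}=\big[h^{(0)}_{kj}\big]$, $H^{(1)}=\big[h^{(1)}_{kj}\big]$, $G^{(0)}=\big[g^{(0)}_{kj}\big]$, and $G^{(1)}=\big[g^{(1)}_{kj}\big]$ are convolved with the rotating machinery fault data to learn the fault characteristics and to facilitate a thorough extraction of comprehensive features.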
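To make the two-scale relation of the scale functions concrete, the sketch below recovers the first two coefficient matrices (written $H^{(0)}$ and $H^{(1)}$) by projection, using $h^{(0)}_{kj}=\sqrt{2}\int_0^{1/2}\phi_k(x)\,\phi_j(2x)\,dx$ and $h^{(1)}_{kj}=\sqrt{2}\int_{1/2}^{1}\phi_k(x)\,\phi_j(2x-1)\,dx$, which follow from the orthonormality of the half-interval functions $\sqrt{2}\phi_j(2x)$ and $\sqrt{2}\phi_j(2x-1)$. It assumes the standard Legendre scale functions $\phi_k(x)=\sqrt{2k+1}\,P_k(2x-1)$ on $[0,1)$ and a midpoint-rule quadrature; the wavelet filters $G^{(0)}$ and $G^{(1)}$ are not computed here because they require the multiwavelets themselves.

```python
import math


def legendre(k, x):
    """P_k(x) from P_0 = 1, P_1 = x and the three-term recurrence."""
    if k == 0:
        return 1.0
    p_prev, p_curr = 1.0, x
    for n in range(1, k):
        p_prev, p_curr = p_curr, ((2 * n + 1) * x * p_curr - n * p_prev) / (n + 1)
    return p_curr


def phi(k, x):
    """Legendre scale function sqrt(2k + 1) * P_k(2x - 1) on [0, 1), zero elsewhere."""
    return math.sqrt(2 * k + 1) * legendre(k, 2.0 * x - 1.0) if 0.0 <= x < 1.0 else 0.0


def refinement_matrices(p, n_points=4000):
    """Project phi_k onto sqrt(2) phi_j(2x) and sqrt(2) phi_j(2x - 1).

    Returns (H0, H1) with
      H0[k][j] = sqrt(2) * integral over [0, 1/2) of phi_k(x) phi_j(2x) dx,
      H1[k][j] = sqrt(2) * integral over [1/2, 1) of phi_k(x) phi_j(2x - 1) dx,
    both approximated with the composite midpoint rule.
    """
    step = 0.5 / n_points
    h0 = [[0.0] * p for _ in range(p)]
    h1 = [[0.0] * p for _ in range(p)]
    for i in range(n_points):
        x_left = (i + 0.5) * step   # midpoint in [0, 1/2)
        x_right = 0.5 + x_left      # midpoint in [1/2, 1)
        for k in range(p):
            for j in range(p):
                h0[k][j] += math.sqrt(2) * step * phi(k, x_left) * phi(j, 2 * x_left)
                h1[k][j] += math.sqrt(2) * step * phi(k, x_right) * phi(j, 2 * x_right - 1)
    return h0, h1


h0, h1 = refinement_matrices(p=2)
print("H0 =", [[round(v, 6) for v in row] for row in h0])
print("H1 =", [[round(v, 6) for v in row] for row in h1])
```

For $p=2$ the numerical result agrees with the closed form $H^{(0)}=\frac{1}{\sqrt2}\begin{pmatrix}1&0\\-\sqrt3/2&1/2\end{pmatrix}$ and $H^{(1)}=\frac{1}{\sqrt2}\begin{pmatrix}1&0\\\sqrt3/2&1/2\end{pmatrix}$, obtained by expanding $\phi_1(x)=\sqrt3(2x-1)$ on each half-interval.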
In addition, to give an intuitive understanding of the Legendre scale and wavelet basis functions, these bases are plotted in Figure 2 and Figure 3 for a fixed finest resolution level $n$ and order $p$.
Figure 2 and Figure 3 clearly show rich properties such as compact support, vanishing moments, orthogonality, and various regularities, so LMWT provides a powerful tool for comprehensively extracting the fault characteristics of rotating machinery data through a few Legendre scale and wavelet bases. These various regularities make the bases better suited to adaptively identifying complex fault characteristics than traditional fault diagnosis methods that rely on engineering experience.
2.3. The Decomposition and Reconstruction of LMWT
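LMWT can be regarded as a mathematical tool that converts a signal into a series of scale and wavelet coefficients. According to multiresolution analysis theory and the LMW bases introduced in the previous subsection, the decomposition from resolution level $j+1$ to resolution level $j$ is given by

$$s^{j}_{k,m}=\sum_{i=0}^{p-1}\left(h^{(0)}_{ki}\,s^{j+1}_{i,2m}+h^{(1)}_{ki}\,s^{j+1}_{i,2m+1}\right),\tag{8}$$

$$d^{j}_{k,m}=\sum_{i=0}^{p-1}\left(g^{(0)}_{ki}\,s^{j+1}_{i,2m}+g^{(1)}_{ki}\,s^{j+1}_{i,2m+1}\right),\tag{9}$$

where $s^{j}_{k,m}$ and $d^{j}_{k,m}$ are the low- and high-frequency components at resolution level $j$, i.e., the approximation and detail coefficients, respectively. The integer $m$ indexes the data obtained at resolution level $j$, and $k=0,1,\ldots,p-1$. Therefore, a signal given at the finest resolution level $n$ is decomposed into a hierarchical structure of approximations and details:

$$V_n^p=V_0^p\oplus W_0^p\oplus W_1^p\oplus\cdots\oplus W_{n-1}^p.\tag{10}$$

Correspondingly, the reconstruction from resolution level $j$ to resolution level $j+1$ is

$$s^{j+1}_{k,2m}=\sum_{i=0}^{p-1}\left(h^{(0)}_{ik}\,s^{j}_{i,m}+g^{(0)}_{ik}\,d^{j}_{i,m}\right),\tag{11}$$

$$s^{j+1}_{k,2m+1}=\sum_{i=0}^{p-1}\left(h^{(1)}_{ik}\,s^{j}_{i,m}+g^{(1)}_{ik}\,d^{j}_{i,m}\right).\tag{12}$$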
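The decomposition Equations (8) and (9) and the reconstruction Equations (11) and (12) can be sketched for $p=2$ in plain Python. The $H$ filters follow from the two-scale relation of the Legendre scale functions; the $G$ filters are one orthonormal completion derived for this sketch (published tables may differ in the sign or ordering of the wavelets), chosen so that the $4\times4$ matrix $\begin{pmatrix}H^{(0)}&H^{(1)}\\G^{(0)}&G^{(1)}\end{pmatrix}$ is orthogonal, which is what makes reconstruction exact. Copying the raw signal into both channels is a hypothetical initialisation loosely modelled on the "doubling" in Step 2 below, and the random signal stands in for real gearbox data.

```python
import math
import random

S2, S3 = math.sqrt(2.0), math.sqrt(3.0)
# p = 2 filters: H from the two-scale relation of the scale functions;
# G is one orthonormal completion so that [[H0, H1], [G0, G1]] is orthogonal.
H0 = [[1 / S2, 0.0], [-S3 / (2 * S2), 1 / (2 * S2)]]
H1 = [[1 / S2, 0.0], [S3 / (2 * S2), 1 / (2 * S2)]]
G0 = [[0.0, 1 / S2], [1 / (2 * S2), S3 / (2 * S2)]]
G1 = [[0.0, -1 / S2], [-1 / (2 * S2), S3 / (2 * S2)]]


def matvec(m, v, transpose=False):
    """Multiply a p x p matrix (or its transpose) by a length-p vector."""
    p = len(v)
    if transpose:
        return [sum(m[i][k] * v[i] for i in range(p)) for k in range(p)]
    return [sum(m[k][i] * v[i] for i in range(p)) for k in range(p)]


def decompose(s_fine):
    """One level of Eqs. (8)-(9): blocks s^{j+1}_m (p-vectors) -> (s^j, d^j)."""
    s_coarse, d_coarse = [], []
    for m in range(len(s_fine) // 2):
        even, odd = s_fine[2 * m], s_fine[2 * m + 1]
        s_coarse.append([a + b for a, b in zip(matvec(H0, even), matvec(H1, odd))])
        d_coarse.append([a + b for a, b in zip(matvec(G0, even), matvec(G1, odd))])
    return s_coarse, d_coarse


def reconstruct(s_coarse, d_coarse):
    """One level of Eqs. (11)-(12), using the transposed filters."""
    s_fine = []
    for s_m, d_m in zip(s_coarse, d_coarse):
        s_fine.append([a + b for a, b in zip(matvec(H0, s_m, True), matvec(G0, d_m, True))])
        s_fine.append([a + b for a, b in zip(matvec(H1, s_m, True), matvec(G1, d_m, True))])
    return s_fine


random.seed(0)
signal = [random.gauss(0.0, 1.0) for _ in range(4096)]
# Hypothetical initialisation: the raw signal is copied into both channels (k = 0, 1).
s_n = [[x, x] for x in signal]
s_1, d_1 = decompose(s_n)
restored = reconstruct(s_1, d_1)
max_err = max(abs(a - b) for blk_a, blk_b in zip(s_n, restored) for a, b in zip(blk_a, blk_b))
print(len(s_1), len(d_1), max_err)  # 2048 2048 and an error at round-off level
```

With 4096 input blocks, one level yields 2048 approximation blocks and 2048 detail blocks of length 2, i.e., four components of 2048 values under this initialisation, and the orthogonality of the filter matrix keeps the reconstruction error at round-off level.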
Furthermore, a specific sample from case 2 of the PHM 2009 dataset is used to demonstrate the effectiveness and stability of LMWT. The raw gearbox fault data of this sample, consisting of 4096 points, are shown in Figure 4.
Specifically, the LMWT convolution procedure with wavelet bases of order $p=2$ is described for this sample as follows.
- Step 1:
The finest-resolution data are taken to be the raw data.
- Step 2:
According to the decomposition Equations (8) and (9), the raw data are doubled because two wavelet bases are used. The doubled data are then split into two parts, which can easily be processed by the four filters.
- Step 3:
The processed data produce the corresponding low- and high-frequency components at resolution level 1 according to the decomposition Equations (8) and (9), using two Legendre scale bases and two Legendre wavelet bases.
- Step 4:
The resulting frequency components are shown in detail in Figure 5.
As illustrated in Figure 5, LMW decomposition (LMWD) at resolution level 1 generates a total of four frequency components without losing any frequency information, owing to orthogonality. Then, according to the reconstruction Formulas (11) and (12), the gearbox fault data for case 2 can be reconstructed with high accuracy and without Gibbs phenomena, as shown in Figure 6.
As shown in
Figure 6, the order of the magnitude of the reconstruction error is
, which demonstrates the effectiveness and stability of this transformation.
2.4. A Brief Introduction to CNN
As one of the most important deep learning models, the CNN has been widely and successfully applied in various fault recognition fields [46]. In this subsection, the structure of the CNN model is first explained in detail, and the loss function is then introduced.
The CNN model is a multi-layer network consisting of one input layer, alternating convolutional and pooling layers, fully connected layers, and one output layer. Each convolutional layer applies a number of convolutional kernels that act as local filters and slide over the input neurons of the previous layer to generate feature maps. The convolution between the input neurons and the learnable convolutional kernels can be described by

$$x_j^{l}=f\left(\sum_{i}x_i^{l-1}\ast k_{ij}^{l}+b_j^{l}\right),\tag{13}$$

where $x_j^{l}$ is the $j$th feature map at the $l$th layer, $x_i^{l-1}$ denotes the $i$th input feature map at the $(l-1)$th layer, $k_{ij}^{l}$ denotes the convolutional kernel connecting the $i$th input feature map with the $j$th feature map, $b_j^{l}$ denotes the bias, and $\ast$ denotes the convolution operation. $f(\cdot)$ is an activation function, such as the sigmoid function, the hyperbolic tangent function, or the rectified linear unit (ReLU). Compared with other activation functions, ReLU applies one-sided inhibition, which alleviates the vanishing gradient problem and accelerates convergence, so it is widely used in CNN models. The ReLU function is

$$f(x)=\max(0,x).\tag{14}$$
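The convolution-plus-ReLU computation described above can be sketched with nested lists: each output map sums the convolutions of all input maps with their kernels, adds a bias, and applies $\max(0,x)$. As in most deep learning libraries, the "convolution" is implemented as a valid cross-correlation (no kernel flip, no padding, stride 1); these choices, the kernel layout `kernels[i][j]`, and the toy input are assumptions for illustration, not the configuration of the MCNN used in this paper.

```python
def relu(x):
    """Rectified linear unit: max(0, x)."""
    return x if x > 0.0 else 0.0


def conv2d_valid(image, kernel):
    """'Valid' 2-D cross-correlation (stride 1, no padding, no kernel flip)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    return [[sum(image[r + u][c + v] * kernel[u][v] for u in range(kh) for v in range(kw))
             for c in range(out_w)]
            for r in range(out_h)]


def conv_layer(inputs, kernels, biases):
    """x_j^l = ReLU(sum_i x_i^{l-1} * k_ij^l + b_j^l) for every output map j.

    inputs:  list of input feature maps x_i^{l-1}
    kernels: kernels[i][j] connects input map i to output map j
    biases:  one bias b_j^l per output map
    """
    outputs = []
    for j, bias in enumerate(biases):
        acc = None
        for i, x in enumerate(inputs):
            fmap = conv2d_valid(x, kernels[i][j])
            acc = fmap if acc is None else [[a + b for a, b in zip(ra, rb)]
                                            for ra, rb in zip(acc, fmap)]
        outputs.append([[relu(v + bias) for v in row] for row in acc])
    return outputs


image = [[1, 2, 3, 0],
         [0, 1, 2, 3],
         [3, 0, 1, 2],
         [2, 3, 0, 1]]
kernels = [[[[1, 0], [0, 1]], [[1, 1], [1, 1]]]]  # one input map, two output maps
maps = conv_layer([image], kernels, biases=[-1.0, -5.0])
for fmap in maps:
    print(fmap)
```

The negative biases push some responses below zero, and ReLU clips them to 0.0, which is the unilateral inhibition mentioned above.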
Pooling layers, which usually follow the convolutional layers directly, reduce the number of neurons in the network and lower the resolution of the feature maps. Max pooling, average pooling, and stochastic pooling are the common pooling operations in CNN models. After multiple stages of convolutional and pooling layers, a fully connected layer is added to integrate the discriminative local information of each category; in the fully connected layer, dropout is often used as a regularization method to restrain overfitting.
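Max and average pooling with non-overlapping $2\times2$ windows can be sketched as follows; the window size and stride are assumptions, since the pooling configuration of the MCNN is not given here. The toy feature map shows that max pooling keeps only the largest value in each window, so small values that share a window with a larger one are discarded, whereas average pooling mixes them into the output.

```python
def pool2d(fmap, size=2, stride=2, mode="max"):
    """Max or average pooling over size x size windows moved by `stride`."""
    out = []
    for r in range(0, len(fmap) - size + 1, stride):
        row = []
        for c in range(0, len(fmap[0]) - size + 1, stride):
            window = [fmap[r + u][c + v] for u in range(size) for v in range(size)]
            row.append(max(window) if mode == "max" else sum(window) / len(window))
        out.append(row)
    return out


# Toy feature map: two large responses among small values.
fmap = [[0.1, 0.2, 0.0, 0.3],
        [2.0, 0.1, 0.2, 0.1],
        [0.0, 0.3, 0.1, 0.0],
        [0.2, 0.1, 1.5, 0.2]]
print(pool2d(fmap))              # [[2.0, 0.3], [0.3, 1.5]]
print(pool2d(fmap, mode="avg"))
```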
Furthermore, the features extracted by the convolutional layers are flattened and fed into the fully connected layers, which work in a similar manner to a traditional back-propagation neural network.
Finally, the output layer uses a classifier; here, the softmax function is adopted to classify the normal and fault data. Specifically, the estimated probability $p(y=c\mid x)$ is calculated as

$$p(y=c\mid x)=\frac{\exp(z_c)}{\sum_{j=1}^{C}\exp(z_j)},\tag{15}$$

where the observation $x$ belongs to the $c$th class, $z_c$ is the output for the $c$th fault class in the fully connected layer, and $C$ is the number of fault classes. In common classification tasks, the cross-entropy loss accelerates the updating of the weights and the convergence of the whole model compared with the squared-error loss. Therefore, this paper applies the cross-entropy loss function to diagnose the various fault categories of rotating machinery:

$$L=-\sum_{c=1}^{C}q(c\mid x)\log p(y=c\mid x),\tag{16}$$

which measures the distance between the output probability of the network and the real target, i.e., the real probability $q(c\mid x)$.
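A numerically stable softmax and the cross-entropy loss $L=-\sum_c q(c\mid x)\log p(y=c\mid x)$ can be sketched as follows. Subtracting the largest logit before exponentiation does not change the softmax output but avoids overflow; the small `eps` that guards against $\log 0$ and the three class scores are choices made for this sketch.

```python
import math


def softmax(logits):
    """p(y = c | x) = exp(z_c) / sum_j exp(z_j), shifted by max(z) for stability."""
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(target, probs, eps=1e-12):
    """L = -sum_c q(c | x) * log p(y = c | x); eps avoids log(0)."""
    return -sum(q * math.log(max(p, eps)) for q, p in zip(target, probs))


probs = softmax([2.0, 1.0, 0.1])  # three hypothetical class scores z_c
loss = cross_entropy([1.0, 0.0, 0.0], probs)
print([round(p, 4) for p in probs], round(loss, 4))
```

For example, `softmax([1000.0, 1000.0])` returns `[0.5, 0.5]` instead of overflowing, and a uniform prediction over four classes with a one-hot target gives a loss of $\log 4$.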
Unlike a traditional fully connected neural network, each neuron of a CNN is connected sparsely to only a small range of neurons in its local receptive field, and a weight sharing strategy reduces the number of parameters. Therefore, the CNN model significantly reduces the computational burden of the whole network and is easier to train.
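The parameter savings from sparse connections and weight sharing can be made concrete with a small calculation. The layer sizes (a single-channel $64\times64$ input mapped to 16 output maps of the same size, using $3\times3$ kernels with zero padding for the convolutional case) are hypothetical and are not the MCNN configuration.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Shared weights plus one bias per output map of a square-kernel convolutional layer."""
    return out_channels * (in_channels * kernel_size * kernel_size + 1)


def dense_params(in_features, out_features):
    """Weights plus biases of a fully connected layer (every input linked to every output)."""
    return out_features * (in_features + 1)


# Hypothetical sizes: 1-channel 64 x 64 input, 16 output maps of the same size, 3 x 3 kernels.
height = width = 64
conv = conv_params(1, 16, 3)
dense = dense_params(height * width, 16 * height * width)
print(conv, dense)  # 160 268500992
```

In this hypothetical setting, the convolutional layer needs roughly 1.7 million times fewer parameters than a fully connected layer producing outputs of the same size.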
2.5. Multi-Label Approach for Compound Fault Diagnosis
Compound fault vibration signals have typical nonlinear and nonstationary properties, and the coupled fault characteristics are immersed in strong noise. Thus, it is very difficult to effectively extract the coupled characteristics from the raw vibration signal.
The proposed model locates compound faults by a multi-label method. Specifically, the label vector of each health condition is a multi-hot vector with 1 at multiple indices rather than a single one-hot label; that is, each fault type that occurs is recorded as 1. A softmax layer serves as the output layer of the proposed architecture, and its output represents the probability of occurrence of each fault type. If the position with the highest probability in the network output is the same as the position of 1 in the multi-label vector, the diagnosis result is regarded as correct.
Finally, a cross-entropy loss function computes the loss between the output and the multi-label vector, and this loss is used to update the network parameters. This labelling method makes it possible to locate the specific fault types within rotating machinery compound faults, which are then distinguished by the trained network model.
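The multi-label encoding and the decision rule can be sketched as follows. The fault-type names are hypothetical, the multi-hot vector is plugged directly into the cross-entropy as the target $q$, and the correctness check reads the rule above as "the highest-probability position is one of the positions labelled 1"; all three are assumptions made for illustration.

```python
import math

FAULT_TYPES = ["normal", "gear", "bearing", "shaft"]  # hypothetical label set


def multi_hot(active):
    """Multi-hot vector: 1 at every fault type present in the (compound) fault."""
    return [1.0 if name in active else 0.0 for name in FAULT_TYPES]


def multi_label_loss(target, probs, eps=1e-12):
    """Cross-entropy -sum_c q_c log p_c with the multi-hot vector used directly as q."""
    return -sum(q * math.log(max(p, eps)) for q, p in zip(target, probs))


def is_correct(target, probs):
    """Assumed reading of the rule: the most probable position must carry a 1."""
    best = max(range(len(probs)), key=probs.__getitem__)
    return target[best] == 1.0


label = multi_hot({"gear", "bearing"})  # [0.0, 1.0, 1.0, 0.0]
probs = [0.05, 0.40, 0.45, 0.10]        # e.g. a softmax output of the network
print(label, is_correct(label, probs), round(multi_label_loss(label, probs), 4))
```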