1. Introduction
Unmanned platforms are robots that perform unmanned operations under remote monitoring. As emerging intelligent equipment, unmanned platforms have a wide range of applications in military, medical, energy, and other fields [1,2]. The joint module of an unmanned platform is a key transmission mechanism that integrates many components, including permanent magnet synchronous motors, planetary gear reducers, and encoders [3,4], within a limited space. Given the harsh working environment and the complexity of its structure, the mechanical components of unmanned platforms inevitably develop various faults, which can cause significant economic losses and even endanger personnel safety. In practice, when mechanical components of the joint module, such as bearings and gears, fail, the collected vibration signals are contaminated by strong noise, making it difficult to identify the fault types of the unmanned platform joint module [5]. Therefore, research on efficient feature extraction and health status recognition of unmanned platform joint modules from vibration signals has significant engineering value.
To date, non-stationary signal processing algorithms, such as the short-time Fourier transform (STFT) and the wavelet transform, have been widely applied in mechanical fault diagnosis. However, these time–frequency analysis methods require the selection of suitable window functions or wavelet bases [6,7]. To address this issue, empirical mode decomposition (EMD) [8,9] and a series of related adaptive signal decomposition methods have been proposed and widely extended, such as variational mode decomposition [10], local mean decomposition [11], symplectic geometric mode decomposition [12], and empirical mode decomposition with improved time scales [13]. These methods adaptively decompose complex signals into a series of intrinsic mode function (IMF) components. However, they often suffer from mode mixing when dealing with noisy non-stationary signals, which degrades the decomposition results. To overcome these shortcomings and improve fault diagnosis performance, Miao et al., inspired by deconvolution theory, proposed a new non-stationary signal decomposition algorithm called feature mode decomposition (FMD) [14]. FMD designs an adaptive finite impulse response (FIR) filter bank and iteratively updates the filter coefficients. During each iteration, the fault period of the measured signal is estimated, and the correlation coefficient (CC) between modes is used to select the modal components into which the non-stationary signal is decomposed. FMD considers both the periodicity and the impulsiveness of the signal and exhibits a degree of robustness to noise, resulting in a more thorough decomposition.
Although non-stationary signal processing methods can provide distinct fault features, they rely heavily on expert systems for health status recognition, which is not sufficiently intelligent for the big-data era of Industry 4.0. Therefore, many scholars have combined signal processing algorithms with intelligent classifiers, with the signal processing stage providing richer and more accurate fault features for the subsequent classifier. For example, Li et al. first used parameter-optimized variational mode decomposition (VMD) for signal decomposition, combined it with sample entropy to extract fault feature vectors, and finally used a support vector machine (SVM) for the fault diagnosis of rolling bearings [15]. Zhang et al. used an improved EMD for signal decomposition and used signal complexity to reconstruct effective intrinsic mode functions (EIMFs) [16]. They then extracted ten time-domain features of the EIMFs as inputs to deep belief networks (DBNs) for the diagnosis of rotating machinery faults. In addition, Kim et al. proposed a fault diagnosis method for the rotating components of high-speed trains, in which VMD optimized by multi-verse optimization decomposes and reconstructs the vibration signals, features are extracted from the reconstructed signals, and an adaptive mutation particle swarm optimization-random forest (AMPSO-RF) performs the diagnosis [17]. Tong et al. combined the second-generation wavelet packet transform and local characteristic-scale decomposition to decompose the vibration signal into multiple IMFs and applied an extreme learning machine (ELM) for the fault diagnosis of rolling bearings [18]. Tu et al. used ensemble empirical mode decomposition (EEMD) to decompose the original signal into several IMFs, then applied ensemble averaging checks and optimization to each IMF to obtain multiple sets of characteristic values, and finally input these values into a kernel extreme learning machine (KELM) for the fault diagnosis of an RV reducer [19]. However, these classifiers (SVM, DBN, RF, ELM, etc.) are relatively shallow models that lose effectiveness and robustness under the strong noise and limited training samples encountered in practice.
To overcome these challenges, intelligent diagnostic models based on deep learning have attracted increasing attention and achieved many results in fault diagnosis. Chen et al. combined complementary ensemble empirical mode decomposition and the STFT to generate time–frequency diagrams of denoised signals, and then used a convolutional neural network (CNN) to automatically extract fault information from these diagrams for the fault diagnosis of rolling bearings [20]. Likewise, Tran et al. built two-dimensional time–frequency representations of vibration signals with the continuous wavelet transform (CWT) and combined them with a CNN for the intelligent diagnosis of induction motors [21]. Zaman et al. used the S-transform and a Sobel filter to create scaleograms with higher time–frequency resolution, which were then fed to a CNN to classify centrifugal pump health conditions [22]. Tang et al. used the synchrosqueezed wavelet transform to convert the original signal into a two-dimensional image and then used a CNN for fault feature extraction and classification, validating the approach with vibration signals of a hydraulic piston pump as well as sound and pressure signals [23]. Huang et al. decomposed the original vibration signal into multi-scale components using wavelet packet decomposition and then used a CNN to extract fault features from these components for the fault diagnosis of a wind turbine gearbox [24]. Xiong et al. combined complementary ensemble empirical mode decomposition with multidimensional dimensionless indicators to extract complementary ensemble multidimensional indicators (CEMDIs) from vibration signals, which were then arranged into two-dimensional data as the CNN input for the fault diagnosis of rotating machinery [25]. Zhang et al. proposed an adaptive multidimensional variational mode decomposition to decompose the original signal and used a multi-scale CNN to extract fault features from the denoised signal for the fault-type recognition of rolling bearings [26]. Kim et al. incorporated a health-adaptive time-scale representation (HTSR) into a CNN to extract richer fault information for the intelligent diagnosis of gearboxes [27]. Zhang et al. used compressive sensing to compress and reconstruct vibration signals and then combined transfer learning with a CNN to recognize fault samples of wind turbine generators [28]. Gu et al. first decomposed the original signal using VMD and then used the CWT to convert the resulting IMFs into two-dimensional time–frequency images, which were used to train a CNN for the online fault diagnosis of rotating machinery [29]. Xie et al. first converted the time-domain signal into the frequency domain using the fractional Fourier transform, then converted the amplitude and phase spectra into Gramian angular field images, and finally used a CNN to extract information from these images for the fault diagnosis of rolling bearings [30].
Although these intelligent diagnosis models are effective to some extent, they still have drawbacks. On the one hand, the fault features extracted by existing CNN models are scalars, which cannot capture the relative positional relationships between fault features, leading to the loss of fault-related information. On the other hand, existing CNN models often stack a large number of feature extraction modules, which makes them prone to becoming trapped in local optima during training and can even degrade the final recognition capability. To address these challenges, this paper proposes a new method for the intelligent health status recognition of unmanned platform joint modules based on FMD and an enhanced capsule network. The method integrates a multi-level feature enhancement (MLFE) module and an efficient channel attention (ECA) module into a capsule network to construct the health status recognition model. The capsule network [31] represents features as vectors, which reduces the loss of fault feature information, while the MLFE module and the attention mechanism strengthen the model's ability to extract fault features.
In summary, this paper proposes a novel approach for the intelligent health status recognition of joint modules in unmanned platforms based on time–frequency representations and an enhanced capsule network. The method first employs FMD and the CWT to extract time–frequency features from the vibration signals. These time–frequency representations are then input into an improved capsule network for the fault diagnosis of joint modules in unmanned platforms. The main contributions of this paper are as follows:
- (1)
Introduces a hybrid model based on the FMD, CWT, and capsule network for the fault diagnosis of joint modules in unmanned platforms.
- (2)
Investigates the decomposition effectiveness of the FMD method on vibration signals. The signals processed by FMD are transformed into time–frequency representations using the CWT.
- (3)
Proposes the multi-level feature enhancement (MLFE) module for integrating multi-scale features, and uses the efficient channel attention (ECA) module to adaptively emphasize crucial channel features, thereby enhancing the feature extraction capability of the capsule network.
The remainder of this paper is organized as follows. Section 2 briefly outlines the basic theories of the continuous wavelet transform, FMD, and the capsule network, together with the evaluation metrics. Section 3 introduces the proposed MLFE module and efficient channel attention module, as well as the overall structure of the proposed health status recognition method. Section 4 describes the overall roadmap of the proposed intelligent health status recognition framework. Section 5 validates the effectiveness and robustness of the proposed method through experiments and comparative analyses on a test bench for unmanned platform joint modules. Section 6 concludes this work.
2. Basic Theory
2.1. Continuous Wavelet Transform
Studies have shown that one-dimensional time-domain signals are not the most effective representation for revealing fault information, and that two-dimensional images can represent more complex distributions, providing a clearer way to distinguish different fault distributions. Furthermore, the capsule network (CapsNet) was originally proposed for 2D image classification, which makes it better suited to processing 2D data. The CWT is a time–frequency transformation method well suited to analyzing non-stationary signals, because it localizes the frequency content of the signal at each moment in time [32]. Therefore, to improve diagnostic performance, the CWT is used to extract time–frequency features from the vibration signal as the input to the intelligent fault diagnosis model.
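For a continuous-time signal, $x(t)$, the continuous wavelet transform is defined as:

$$W_x(a,b)=\frac{1}{\sqrt{|a|}}\int_{-\infty}^{+\infty}x(t)\,\psi^{*}\!\left(\frac{t-b}{a}\right)\mathrm{d}t \qquad (1)$$

In Equation (1), $a$ and $b$ are the scale and translation factors of the wavelet function, respectively, which determine the time–frequency window of the wavelet in the frequency and time domains; $\psi(t)$ is the wavelet function used, and the superscript $*$ denotes complex conjugation.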
A wavelet function is localized in both the time and frequency domains. In this paper, the Morlet wavelet is used because of its concentrated frequency energy, narrow bandwidth, minimal frequency aliasing, time-domain symmetry, and linear phase, which help avoid phase distortion in the transform.
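Because the measured signal is discrete, consider a discrete time series, $x(j)$, and let $t=j\Delta t$ and $b=n\Delta t$, where $j,n=0,1,\dots,N-1$, $N$ is the number of sampling points, and $\Delta t$ is the sampling time interval. The continuous wavelet transform of the discrete time series, $x(j)$, is given by:

$$W_x(a,n)=\frac{\Delta t}{\sqrt{|a|}}\sum_{j=0}^{N-1}x(j)\,\psi^{*}\!\left(\frac{(j-n)\Delta t}{a}\right) \qquad (2)$$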
By varying the scale factor, $a$, and the translation factor, $b$ (through the time index $n$), a matrix of continuous wavelet transform coefficients is obtained, which describes how the amplitude of the coefficients varies with time and scale.
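To make Equation (2) concrete, the following NumPy sketch evaluates it directly and builds a scalogram. It assumes the complex Morlet wavelet $\psi(t)=\pi^{-1/4}e^{i\omega_0 t}e^{-t^2/2}$ with the commonly used $\omega_0=6$ (the paper does not state its Morlet parameters), and the names `morlet` and `cwt_morlet` are illustrative. The direct $O(N^2)$ sum is chosen for readability; in practice an FFT-based routine such as PyWavelets' `pywt.cwt` would normally be used.

```python
import numpy as np


def morlet(t, w0=6.0):
    """Complex Morlet wavelet (assumed form); the small admissibility
    correction term is neglected, which is common for w0 >= 5."""
    return np.pi ** -0.25 * np.exp(1j * w0 * t) * np.exp(-0.5 * t ** 2)


def cwt_morlet(x, scales, dt, w0=6.0):
    """Direct evaluation of Eq. (2):
    W(a, n) = dt / sqrt(|a|) * sum_j x(j) * conj(psi((j - n) * dt / a)).
    Returns a (len(scales), N) complex coefficient matrix."""
    x = np.asarray(x, dtype=float)
    idx = np.arange(x.size)
    # lag[n, j] = j - n: offset between sample j and translation index n
    lag = idx[None, :] - idx[:, None]
    coeffs = np.empty((len(scales), x.size), dtype=complex)
    for i, a in enumerate(scales):
        kernel = np.conj(morlet(lag * dt / a, w0))   # shape (N, N)
        coeffs[i] = dt / np.sqrt(abs(a)) * (kernel @ x)
    return coeffs


# Usage: scalogram of a 50 Hz tone sampled at 1 kHz
fs, f0 = 1000.0, 50.0
t = np.arange(512) / fs
x = np.sin(2 * np.pi * f0 * t)
freqs = np.arange(10.0, 200.0, 5.0)          # analysis frequencies in Hz
scales = 6.0 / (2 * np.pi * freqs)           # Morlet: f is roughly w0 / (2*pi*a)
scalogram = np.abs(cwt_morlet(x, scales, 1 / fs))
# Average over the central half of the record to limit edge effects
peak_freq = freqs[np.argmax(scalogram[:, 128:384].mean(axis=1))]
```

The rows of `scalogram` form the time–frequency image; with the scale-to-frequency mapping above, the row with the largest energy should sit at the tone frequency.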
2.2. Feature Mode Decomposition
Inspired by deconvolution theory, FMD is a non-recursive decomposition method that partitions the original signal into different modes by designing an FIR filter bank. It mainly involves adaptive FIR filter design, filter updating, period estimation, and mode decomposition. Because the decomposition results depend strongly on the filter coefficients, FMD is formulated as the following constrained optimization problem:

$$\arg\max_{\mathbf{f}_k}\left\{\mathrm{CK}_M(\mathbf{u}_k)=\frac{\sum_{n=1}^{N}\left(\prod_{m=0}^{M}u_k(n-mT_s)\right)^{2}}{\left(\sum_{n=1}^{N}u_k^{2}(n)\right)^{M+1}}\right\}\quad \text{s.t.}\quad u_k(n)=\sum_{l=1}^{L}f_k(l)\,x(n-l+1) \qquad (3)$$

where $\mathrm{CK}_M$ is the objective function (correlated kurtosis), which evaluates both the periodicity and the impulsiveness of the signal; $\mathbf{u}_k$ is the $k$-th decomposition mode; $\mathbf{f}_k$ is the $k$-th FIR filter, with a length of $L$; and $T_s$ and $M$ represent the input period and the shift order, respectively.
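The objective in Equation (3) can be evaluated directly, which makes it easy to see why CK favors impulses that repeat with the assumed period. In the sketch below, `correlated_kurtosis` is a hypothetical helper; it assumes that samples before the start of the record are zero and that the period $T_s$ is given as an integer number of samples.

```python
import numpy as np


def correlated_kurtosis(u, T, M=1):
    """CK_M(u) = sum_n (prod_{m=0}^{M} u(n - m*T))^2 / (sum_n u(n)^2)^(M+1).

    u : 1-D mode signal; T : period in samples; M : shift order.
    Samples with negative index are taken as zero (assumption)."""
    u = np.asarray(u, dtype=float)
    N = u.size
    if M * T >= N:
        raise ValueError("M * T must be shorter than the signal")
    prod = u.copy()
    for m in range(1, M + 1):
        shifted = np.zeros(N)
        shifted[m * T:] = u[: N - m * T]   # u(n - m*T), zero-padded at the start
        prod *= shifted
    return np.sum(prod ** 2) / np.sum(u ** 2) ** (M + 1)


# Usage: impulses every 100 samples plus weak noise
rng = np.random.default_rng(0)
u = 0.05 * rng.standard_normal(2000)
u[::100] += 1.0
ck_true = correlated_kurtosis(u, T=100, M=1)   # matching period
ck_wrong = correlated_kurtosis(u, T=73, M=1)   # mismatched period
```

Only when the assumed period matches the impulse spacing do the shifted copies line up, so the numerator, and hence CK, is much larger for `T=100` than for `T=73`.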
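To solve the constrained problem in (3), the iterative eigenvalue decomposition algorithm is used. First, the decomposition mode is rewritten in matrix form:

$$\mathbf{u}_k=\mathbf{X}_0\mathbf{f}_k,\qquad \mathbf{X}_0=\begin{bmatrix}x(1)&0&\cdots&0\\ x(2)&x(1)&\cdots&0\\ \vdots&\vdots&\ddots&\vdots\\ x(N)&x(N-1)&\cdots&x(N-L+1)\end{bmatrix} \qquad (4)$$

Then, the CK of the decomposition mode can be defined as:

$$\mathrm{CK}_M(\mathbf{u}_k)=\frac{\mathbf{u}_k^{H}\mathbf{W}_M\mathbf{u}_k}{\mathbf{u}_k^{H}\mathbf{u}_k} \qquad (5)$$

where the superscript $H$ denotes the conjugate transpose and $\mathbf{W}_M$ is an intermediate diagonal weighting matrix used to form the weighted correlation matrix. Its expression is:

$$\mathbf{W}_M=\operatorname{diag}\!\left(\frac{\prod_{m=1}^{M}u_k^{2}(n-mT_s)}{\left(\mathbf{u}_k^{H}\mathbf{u}_k\right)^{M}}\right),\qquad n=1,2,\dots,N \qquad (6)$$

Substituting Equation (4) into Equation (5), we obtain:

$$\mathrm{CK}_M(\mathbf{f}_k)=\frac{\mathbf{f}_k^{H}\mathbf{X}_0^{H}\mathbf{W}_M\mathbf{X}_0\mathbf{f}_k}{\mathbf{f}_k^{H}\mathbf{X}_0^{H}\mathbf{X}_0\mathbf{f}_k}=\frac{\mathbf{f}_k^{H}\mathbf{R}_{XWX}\mathbf{f}_k}{\mathbf{f}_k^{H}\mathbf{R}_{XX}\mathbf{f}_k} \qquad (7)$$

where $\mathbf{R}_{XWX}$ and $\mathbf{R}_{XX}$ are the weighted correlation and correlation matrices, respectively. Mathematically, maximizing Equation (7) with respect to the filter coefficients is equivalent to finding the eigenvector corresponding to the maximum eigenvalue, $\lambda$, in Equation (8):

$$\mathbf{R}_{XWX}\mathbf{f}_k=\lambda\,\mathbf{R}_{XX}\mathbf{f}_k \qquad (8)$$

During the iteration process, the coefficients of the $k$-th filter are updated by solving Equation (8), so that the filtered signal progressively approaches the one with the maximum CK.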
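One filter update of Equations (4)–(8) can be sketched in NumPy as follows. This is an illustrative reading of the equations rather than the reference FMD code: the zero-padded delay matrix, the Cholesky-based solution of the generalized eigenproblem, and the names `delay_matrix` and `update_filter` are assumptions. Because $\mathbf{W}_M$ is frozen at the current mode, the largest eigenvalue is always at least the CK of the current mode, i.e., the update never lowers the Rayleigh quotient for the current weights.

```python
import numpy as np


def delay_matrix(x, L):
    """X0[n, l] = x(n - l), zeros before the first sample, so u = X0 @ f (Eq. (4))."""
    x = np.asarray(x, dtype=float)
    N = x.size
    X0 = np.zeros((N, L))
    for l in range(L):
        X0[l:, l] = x[: N - l]
    return X0


def update_filter(x, f, T, M=1):
    """One eigenvector update of an FIR filter f (length L) for period T (samples).

    Returns (f_new, lam_max, ck_current), where ck_current is CK_M of u = X0 @ f."""
    L = len(f)
    X0 = delay_matrix(x, L)
    u = X0 @ f
    N = u.size
    # Diagonal of W_M (Eq. (6)): prod_{m=1}^{M} u^2(n - mT) / (u^T u)^M
    w = np.ones(N)
    for m in range(1, M + 1):
        shifted = np.zeros(N)
        shifted[m * T:] = u[: N - m * T]
        w *= shifted ** 2
    w /= (u @ u) ** M
    ck_current = (u @ (w * u)) / (u @ u)          # Eq. (5)
    R_xwx = X0.T @ (w[:, None] * X0)               # weighted correlation matrix
    R_xx = X0.T @ X0                               # correlation matrix
    # Solve R_xwx f = lam R_xx f (Eq. (8)) by whitening with R_xx = C C^T
    C_inv = np.linalg.inv(np.linalg.cholesky(R_xx))
    lam, vecs = np.linalg.eigh(C_inv @ R_xwx @ C_inv.T)
    f_new = C_inv.T @ vecs[:, -1]                  # eigenvector of the largest eigenvalue
    return f_new / np.linalg.norm(f_new), lam[-1], ck_current


# Usage: one update starting from a Hanning-window filter
rng = np.random.default_rng(1)
x = 0.1 * rng.standard_normal(3000)
x[::120] += 1.0
f0 = np.hanning(32)
f1, lam_max, ck0 = update_filter(x, f0, T=120, M=1)
```

Repeating `update_filter` with the returned `f1` (and a re-estimated period) gives the fixed-point iteration described in the text; `np.linalg.eigh` returns eigenvalues in ascending order, which is why the last column is taken.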
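Because the filter bank produces multiple modes, several modes can contain the same fault features. Therefore, FMD performs mode selection by computing the Pearson CC to assess the similarity between two modes. The CC is given by:

$$\mathrm{CC}_{pq}=\frac{\sum_{n=1}^{N}\left(u_p(n)-\bar{u}_p\right)\left(u_q(n)-\bar{u}_q\right)}{\sqrt{\sum_{n=1}^{N}\left(u_p(n)-\bar{u}_p\right)^{2}\sum_{n=1}^{N}\left(u_q(n)-\bar{u}_q\right)^{2}}} \qquad (9)$$

where $\bar{u}_p$ and $\bar{u}_q$ are the mean values of modes $\mathbf{u}_p$ and $\mathbf{u}_q$, respectively.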
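The Pearson CC and the search for the most similar pair of adjacent modes, as used in the mode-selection step of the FMD procedure, can be sketched as follows. The helper names are hypothetical, and the modes are assumed to be stored as the rows of a NumPy array ordered as in the filter bank.

```python
import numpy as np


def pearson_cc(u_p, u_q):
    """Pearson correlation coefficient between two modes."""
    dp = u_p - np.mean(u_p)
    dq = u_q - np.mean(u_q)
    return np.sum(dp * dq) / np.sqrt(np.sum(dp ** 2) * np.sum(dq ** 2))


def most_similar_adjacent_pair(modes):
    """Return (k, cc) for the adjacent pair (k, k + 1) with the largest CC.

    modes : array of shape (number_of_modes, N), one mode per row."""
    ccs = [pearson_cc(modes[k], modes[k + 1]) for k in range(len(modes) - 1)]
    k = int(np.argmax(ccs))
    return k, ccs[k]


# Usage: modes 1 and 2 are nearly identical, so the pair (1, 2) is selected
rng = np.random.default_rng(2)
base = rng.standard_normal(1000)
modes = np.vstack([rng.standard_normal(1000),
                   base,
                   base + 0.1 * rng.standard_normal(1000),
                   rng.standard_normal(1000)])
k, cc = most_similar_adjacent_pair(modes)
```

Of the two modes in the selected pair, FMD keeps the one with the larger CK, so redundant modes are merged away one at a time.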
The overall implementation process of the FMD is as follows:
- (1)
Load the original signal, x, and preset the FMD parameters, such as the number of decomposition modes, K; the filter length, L; and the maximum number of iterations, I.
- (2)
Initialize the FIR filter bank using M Hanning windows and start the iteration with i = 1. Typically, M is set to be within the range of 5–10.
- (3)
Use $\mathbf{u}_m^{i}=\mathbf{x}*\mathbf{f}_m^{i}$ to obtain the filtered signals, i.e., the decomposed mode components, where $m=1,2,\dots,M$ and $*$ represents the convolution operation.
- (4)
Update the filter coefficients and estimate the fault period based on the original input signal and the decomposed mode components. Here, the fault period, $T_s$, is the time delay corresponding to the local maximum of the autocorrelation of the mode after its first zero crossing.
- (5)
Check if the current iteration count has reached the maximum iteration count. If not, return to step (3); otherwise, proceed to step (6).
- (6)
Compute the CC between adjacent mode components and construct a correlation matrix. Select the pair of adjacent mode components with the highest CC and calculate their CK values based on the estimated fault period. Then, keep the mode component with the larger CK value as the FMD mode component and set M = M − 1.
- (7)
Check if the current mode count has reached the preset mode count, K. If not, return to step (3); otherwise, stop the iteration and output the final decomposition results.
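The period estimate in step (4) can be sketched as below. The sketch assumes the mode is mean-removed before computing the autocorrelation, the period is expressed in samples, and the search is limited to lags below half the record length; these choices and the name `estimate_period` are assumptions, not details given in the text.

```python
import numpy as np


def estimate_period(u, max_lag=None):
    """Lag (in samples) of the largest autocorrelation value after the first
    zero crossing, or None if the autocorrelation never becomes negative."""
    u = np.asarray(u, dtype=float)
    u = u - u.mean()
    N = u.size
    max_lag = N // 2 if max_lag is None else max_lag
    # np.correlate in "full" mode puts lag 0 at index N - 1
    r = np.correlate(u, u, mode="full")[N - 1: N - 1 + max_lag]
    negative = np.flatnonzero(r < 0)
    if negative.size == 0:
        return None
    start = negative[0]                           # first zero crossing
    return int(start + np.argmax(r[start:]))


# Usage: impulses every 40 samples with weak noise
rng = np.random.default_rng(3)
u = 0.01 * rng.standard_normal(2000)
u[::40] += 1.0
period = estimate_period(u)
```

Starting the search after the first zero crossing skips the trivial peak at lag 0, so the next dominant peak corresponds to the impulse repetition interval.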
2.3. Capsule Network
A basic capsule network model is shown in Figure 1. Unlike a typical convolutional neural network, the capsule network performs convolutional feature extraction only in the initial part of the network. In the latter part, it replaces the pooling and fully connected layers with a primary capsule layer and a digit capsule layer that are specific to capsule networks. In this process, steps ➀ and ➁ are convolutional layers that extract low-level convolutional features from the input image. The primary capsule layer generates capsule activation vectors of specific dimensions. Step ➂ is the dynamic routing algorithm that converts primary capsules into digit capsules. In the digit capsule layer, the length of each capsule vector represents the probability that the corresponding category is present, and these lengths give the final classification result.
Unlike a traditional artificial neuron, which outputs a scalar, a capsule outputs a vector, which can effectively handle different types of visual stimuli and encode information such as position, shape, and speed, reducing the loss of important information. When information propagates from lower-level to higher-level capsules, dynamic routing assigns weights to the lower-level capsules, enhancing the feature recognition capability, as illustrated in Figure 2.
This process can be divided into the following steps:
- (1)
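The input is a set of lower-level capsules, $\mathbf{u}_i\in\mathbb{R}^{d_{\mathrm{in}}}$ ($i=1,2,\dots,p$), where $p$ represents the number of capsules and $d_{\mathrm{in}}$ represents the number of neurons in each capsule (vector length). Using a transformation matrix, $\mathbf{W}_{ij}\in\mathbb{R}^{d_{\mathrm{out}}\times d_{\mathrm{in}}}$, where $d_{\mathrm{out}}$ represents the number of neurons in the output capsule, the input is transformed into the prediction vector:

$$\hat{\mathbf{u}}_{j|i}=\mathbf{W}_{ij}\mathbf{u}_i$$

where $\hat{\mathbf{u}}_{j|i}\in\mathbb{R}^{d_{\mathrm{out}}}$.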
- (2)
The weighted sum of all the prediction vectors is calculated as:

$$\mathbf{s}_j=\sum_{i}c_{ij}\,\hat{\mathbf{u}}_{j|i}$$

where $c_{ij}$ is the coupling coefficient and $\sum_{j}c_{ij}=1$.
- (3)
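The final vector, $\mathbf{v}_j$, is obtained through a non-linear mapping by the squashing function:

$$\mathbf{v}_j=\frac{\|\mathbf{s}_j\|^{2}}{1+\|\mathbf{s}_j\|^{2}}\,\frac{\mathbf{s}_j}{\|\mathbf{s}_j\|}$$

where $j$ denotes the $j$-th output capsule. Essentially, the squashing function is a normalization operation that maps the length of each vector into the interval between 0 and 1 (positively correlated with the original length), changing only its magnitude and not its direction. The coupling coefficients, $c_{ij}$, and their log prior probabilities, $b_{ij}$, are updated by the dynamic routing algorithm:

$$c_{ij}=\frac{\exp(b_{ij})}{\sum_{k}\exp(b_{ik})} \qquad (14)$$

$$b_{ij}\leftarrow b_{ij}+\hat{\mathbf{u}}_{j|i}\cdot\mathbf{v}_j \qquad (15)$$

In the forward propagation of the network, $b_{ij}$ is initialized to 0, $c_{ij}$ is calculated by Equation (14), and $\mathbf{v}_j$ is then calculated through forward propagation. Equation (15) is used to update $b_{ij}$ and hence $c_{ij}$, thereby further updating $\mathbf{s}_j$ and $\mathbf{v}_j$.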
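The squashing function and the routing loop of Equations (14) and (15) can be sketched in NumPy for a single sample. The sketch takes the prediction vectors as given, assumes three routing iterations (the value used in the original CapsNet work by Sabour et al.; the paper's setting is not stated here), and uses illustrative function names.

```python
import numpy as np


def squash(s, axis=-1, eps=1e-9):
    """v = (|s|^2 / (1 + |s|^2)) * s / |s|: keeps direction, maps length into [0, 1)."""
    sq_norm = np.sum(s ** 2, axis=axis, keepdims=True)
    return sq_norm / (1.0 + sq_norm) * s / np.sqrt(sq_norm + eps)


def dynamic_routing(u_hat, iterations=3):
    """Route prediction vectors u_hat[i, j, :] (input capsule i -> output capsule j).

    Returns the output capsules v, shape (num_out, d_out), and coupling coefficients c."""
    num_in, num_out, _ = u_hat.shape
    b = np.zeros((num_in, num_out))                  # b_ij initialized to 0
    for _ in range(iterations):
        e = np.exp(b - b.max(axis=1, keepdims=True))
        c = e / e.sum(axis=1, keepdims=True)         # softmax over j, Eq. (14)
        s = np.einsum("ij,ijd->jd", c, u_hat)        # s_j = sum_i c_ij u_hat_j|i
        v = squash(s)                                # v_j
        b = b + np.einsum("ijd,jd->ij", u_hat, v)    # agreement update, Eq. (15)
    return v, c


# Usage: 8 input capsules voting for 3 output (class) capsules of dimension 16
rng = np.random.default_rng(4)
u_hat = 0.05 * rng.standard_normal((8, 3, 16))
u_hat[:, 1, :] += 0.5            # all inputs agree on a common vector for class 1
v, c = dynamic_routing(u_hat)
lengths = np.linalg.norm(v, axis=-1)   # capsule length read as class probability
predicted = int(np.argmax(lengths))
```

Inputs whose predictions agree with an output capsule increase their logits for it, so the coupling concentrates on class 1 and its capsule becomes the longest.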
2.4. Evaluation Metrics
When evaluating a model, several metrics are usually needed, because most metrics reflect only certain aspects of the model's performance. Using evaluation metrics incorrectly can lead to wrong conclusions and hide problems with the model itself, so selecting them correctly and rationally is extremely important. In common binary classification problems, samples are divided into positive and negative classes, with the positive class being the class of interest. Based on the correctness of the predictions, the predicted samples fall into four types: negative samples incorrectly predicted as positive (false positive, FP), positive samples incorrectly predicted as negative (false negative, FN), positive samples correctly predicted as positive (true positive, TP), and negative samples correctly predicted as negative (true negative, TN). In practical fault classification, distinguishing only between positive and negative classes is not detailed enough to determine the state of a machine; multiple fault types must be differentiated to assess the mechanical fault state thoroughly, which requires evaluation metrics for multi-class problems. These metrics extend those for binary classification and include accuracy, loss, and the confusion matrix.
In binary classification, TP, FP, TN, and FN are scalars, whereas in a multi-class problem with n classes they become n-dimensional vectors, each element of which gives the count for one class treated as the positive class. A misclassified sample therefore counts as an FN for its true class and as an FP for the class it was predicted as.
- (1)
Accuracy is the proportion of correctly predicted samples among all samples; a higher value indicates better classification performance.
- (2)
For multi-class classification, the cross-entropy loss function is commonly used; a smaller value indicates better performance.
- (3)
The confusion matrix, also known as an error matrix, is widely used in pattern recognition to evaluate the performance of a classifier. It is an n × n matrix that describes the relationship between the true classes of the samples and the predicted classes: each row corresponds to a true class and each column to a predicted class. The larger the share of samples on the diagonal of the confusion matrix, the better the classification results.
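For reference, the metrics above, together with the per-class TP, FP, and FN vectors of the multi-class case, can be computed from integer labels and predicted class probabilities as in the following sketch. It uses plain NumPy and hypothetical function names rather than any particular library's API; rows of the confusion matrix index the true class and columns the predicted class, as described above.

```python
import numpy as np


def confusion_matrix(y_true, y_pred, n_classes):
    """cm[i, j] = number of samples of true class i predicted as class j."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    np.add.at(cm, (y_true, y_pred), 1)
    return cm


def per_class_counts(cm):
    """One-vs-rest TP, FP, FN vectors (one entry per class)."""
    tp = np.diag(cm)
    fp = cm.sum(axis=0) - tp    # predicted as class j but belonging elsewhere
    fn = cm.sum(axis=1) - tp    # belonging to class i but predicted elsewhere
    return tp, fp, fn


def cross_entropy(probs, y_true, eps=1e-12):
    """Mean categorical cross-entropy for predicted class probabilities."""
    p = np.clip(probs[np.arange(len(y_true)), y_true], eps, 1.0)
    return float(-np.mean(np.log(p)))


# Usage with 3 classes and 6 samples
y_true = np.array([0, 0, 1, 1, 2, 2])
probs = np.array([[0.8, 0.1, 0.1],
                  [0.6, 0.3, 0.1],
                  [0.2, 0.7, 0.1],
                  [0.1, 0.2, 0.7],   # class-1 sample misclassified as class 2
                  [0.1, 0.1, 0.8],
                  [0.2, 0.2, 0.6]])
y_pred = probs.argmax(axis=1)
cm = confusion_matrix(y_true, y_pred, 3)
accuracy = np.trace(cm) / cm.sum()
tp, fp, fn = per_class_counts(cm)
loss = cross_entropy(probs, y_true)
```

Here the single misclassified sample appears once as an FN for class 1 and once as an FP for class 2, and the accuracy is the diagonal sum divided by the total number of samples.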
6. Conclusions
This study proposed an intelligent diagnosis model, an enhanced capsule network with a multi-level feature enhancement (MLFE) module and an efficient channel attention (ECA) module, for efficiently identifying the health status of unmanned platform joint modules. First, the collected time-domain vibration signals were filtered using FMD and then transformed into two-dimensional time–frequency maps through the Morlet-based continuous wavelet transform, which served as the input to the neural network. In the fault diagnosis stage, the MLFE module fused feature maps of different scales in the feedforward process, and the ECA module determined an adaptive cross-channel interaction range through an adaptively sized convolution kernel, enhancing key channel features while suppressing irrelevant ones. The resulting feature maps were then input into the capsule network, which converts scalars into vectors to capture more detailed feature information, and finally the lengths of the vectors output by the digit capsule layer were transformed into the diagnosis result. This method not only improved the diagnostic performance of the model but also avoided problems such as excessively long training times and overfitting caused by overly complex network structures. The effectiveness of the proposed method was verified using vibration signals collected from a simulated test bench of an unmanned platform joint module. The experimental results show that the proposed feature-enhanced intelligent diagnosis framework achieves high recognition accuracy. The specific conclusions are as follows:
- (1)
The time–frequency representation obtained by applying the Morlet-based continuous wavelet transform to the FMD-filtered signal contains richer time-domain and frequency-domain information than the original time-domain signal, which benefits the performance of the diagnostic model.
- (2)
Compared with the original capsule network, both the MLFE and ECA modules in the proposed method bring improvements of different degrees, with the MLFE module providing the greatest improvement at the cost of more parameters and longer training time. Overall, the proposed method improves on the original capsule network.
- (3)
Compared with other advanced diagnostic networks, the proposed feature-enhanced diagnostic model performs well in terms of diagnostic accuracy and stability, which further demonstrates the effectiveness of the two proposed modules.
In practical engineering scenarios, acquiring large amounts of labeled fault data is extremely difficult, which remains a major challenge for health status recognition. Furthermore, the complex structure of actual engineering machinery makes the collected signals complex and heavily contaminated by noise. Therefore, effective signal processing methods for extracting key fault information are particularly important and should be a focus of future research in this field. Additionally, training neural network models requires significant computational resources; exploring how pre-training strategies can reduce the time cost of data processing while maintaining diagnostic accuracy is another issue for future studies.