1. Introduction
As fundamental components of rotating machinery, bearings, gears, couplings, and other related parts are essential in modern industry [1,2,3]. However, harsh factory conditions make them highly prone to wear and performance degradation under prolonged operation. In the era of big data, traditional signal processing methods are increasingly limited by insufficient accuracy, low efficiency, and dependence on expert knowledge and denoising [4]. At the same time, the rapid development of deep learning (DL)-based techniques, together with sensor technology and the Internet of Things (IoT), has brought new opportunities for fault diagnosis. Most importantly, intelligent fault diagnosis (IFD) methods, known for their efficiency and accuracy, have received great attention from academia and industry in recent years and have been widely applied to rotating machinery fault diagnosis [5].
As an efficient fault diagnosis tool, machine learning (ML)-based methods have received much attention from academia and have shown significant potential in rotating machinery fault diagnosis. Generally, their implementation consists of three key steps: data acquisition, feature engineering with expert knowledge, and health condition classification [6,7]. Chen et al. proposed a hierarchical ML-based framework with only two layers to enable fault diagnosis of rotating machinery [8]. Zhang et al. proposed a novel ML-based framework integrating recursive feature elimination (RFE) and an extreme learning machine (ELM) to achieve fault diagnosis of shield machines [9]. However, manual feature extraction in ML-based methods relies heavily on expert knowledge, which leads to significant time and labor costs. Moreover, these methods are commonly constrained by limited data, which restricts their generalization and self-learning capabilities in complex engineering environments [10]. In recent years, there has therefore been growing interest in developing advanced methods for feature extraction and health condition identification in rotating machinery using large-scale monitoring data.
In recent years, DL-based diagnostic models have been developed to automatically learn fault-related features from monitoring data through deep hierarchical architectures. Compared with ML-based methods, DL-based models directly establish the relationship between input samples and the health conditions of rotating machinery, using the learned features to assess fault types or severities. Several approaches, including deep belief networks (DBNs) [11], deep autoencoders (DAEs) [12], convolutional neural networks (CNNs) [13], and graph convolutional networks (GCNs) [14], have been explored for fault diagnosis. For example, Karimi et al. proposed a multi-source domain adaptation method for fault diagnosis, leveraging an attention mechanism, a domain attribute loss, and knowledge fusion to enhance cross-domain diagnostic performance [15]. Shi et al. designed improved multi-sensor DBNs (MSIDBNs) with redefined pretraining and fine-tuning stages to effectively extract features for rolling mill fault diagnosis [16]. Yu et al. introduced a novel DL-based approach by improving one-dimensional and two-dimensional CNNs (I1DCNNs and I2DCNNs) with group normalization (GN) and global average pooling (GAP) for enhanced intelligent fault diagnosis [17]. Han et al. built a novel multi-source heterogeneous information fusion framework that utilizes improved DL-based methods to enhance the effectiveness and robustness of intelligent fault diagnosis [18].
Although DL-based methods have made significant advances in intelligent fault diagnosis of rotating machinery, some drawbacks remain. On the one hand, most current DL-based methods extract features for only a single diagnostic task, ignoring the task-invariant features shared across related tasks. On the other hand, most existing DL-based feature extraction methods rely mainly on class labels while overlooking the underlying structural characteristics of the samples, which may lead to incomplete feature representations in deep networks.
Multi-task learning (MTL), a practical and powerful technique for mitigating the first limitation, has been used to learn task-shared features from multiple diagnostic tasks. Recently, MTL-based methods have received growing interest for handling multiple tasks jointly: they enhance model generalization by integrating useful information from multiple tasks, sharing features across tasks, and training within a unified framework. MTL-based techniques have been widely used in fields such as medical image classification, fault diagnosis, question answering, and object detection. Zhao et al. proposed a novel MTL-based self-supervised transfer learning paradigm that integrates multi-perspective feature transfer to achieve fault diagnosis [19]. Zhang et al. proposed an auxiliary prior knowledge-informed framework based on MTL for few-shot fault diagnosis [20]. Zheng et al. proposed an MTL-based framework with deep inter-task interactions to achieve breast tumor segmentation and classification [21]. Zhang et al. proposed a novel uncertainty bidirectional guidance MTL-based framework (UBGM) to enhance segmentation and classification in medical image analysis [22]. However, these methods do not adequately consider the interactions between different diagnostic tasks at different levels, so the design of MTL-based methods still has considerable room for improvement.
To address the limitations of existing DL-based methods, a novel multi-task graph-guided convolutional network with an attention mechanism (MTAGCN) is proposed for intelligent fault diagnosis of rotating machinery. In MTAGCN, multi-task diagnosis is achieved by jointly modeling the data structure, feature extraction, and attention mechanism (AM) in a unified deep network. Fault severity labels and fault type labels are modeled by two task-specific modules, respectively. Task-invariant features are first mined from the raw vibration signals using convolutional (Conv) blocks, and a graph block is then used to build instance graphs that capture the structural characteristics of these features. The task-shared module, composed of Conv and graph blocks, thus mines features at different levels. The AM connects the task-shared module with the task-specific modules: it captures the significance of the task-shared features and selects valuable, representative features for each particular diagnostic task. The outline of the proposed MTAGCN is shown in
Figure 1. The task-shared module extracts valuable features from input vibration signals, while two task-specific modules further process these features to output diagnostic information, including fault type and severity.
The main contributions of this article are listed as follows.
- (1)
The multi-task GCN is proposed to achieve end-to-end fault diagnosis by modeling the fault type diagnosis task and fault severity diagnosis task simultaneously in a unified DL-based framework.
- (2)
The instance graphs are constructed to mine structural characteristics in the task-shared module. By integrating the AM modules at multiple levels, the proposed MTAGCN can more effectively capture critical task-specific information in specific task modules.
- (3)
Extensive experiments are conducted to show the feasibility and effectiveness of our proposed method.
The remainder of this paper is organized as follows. Section 2 reviews related work on CNNs, GCNs, AM, and MTL. Section 3 describes the proposed MTAGCN. Section 4 validates the feasibility and effectiveness of the proposed MTAGCN for MTL fault diagnosis on three datasets. Section 5 presents a set of comparison experiments on MTAGCN. Section 6 summarizes the paper and outlines directions for future research.
3. Proposed Method
The proposed MTAGCN consists of a task-shared module and two task-specific modules: fault type diagnosis (FTD) and fault severity diagnosis (FSD). The task-shared module is responsible for extracting task-invariant, structural, and discriminative features from the input data. Specifically, shared features in the task-shared module are propagated through convolutional (Conv) blocks, allowing the model to extract deep features at multiple levels and capture fine-grained information. The task-shared module and the two task-specific modules are interconnected through the proposed attention mechanism (AM), which acts as a feature selector, learning and extracting task-specific features. The detailed structure of the proposed MTAGCN is shown in
Figure 4.
Figure 4 illustrates the MTL-based architecture, which comprises three parallel branches. The top and bottom branches are task-specific pathways, each consisting of attention modules (AM blocks), convolutional blocks, and pooling layers with skip connections to enhance information flow. The middle branch functions as a shared feature extraction backbone, processing input waveform signals through alternating convolutional blocks, graph blocks, and pooling layers. The overall architecture integrates the task-specific losses into a weighted sum, $\mathcal{L} = \lambda_1 \mathcal{L}_{\mathrm{FTD}} + \lambda_2 \mathcal{L}_{\mathrm{FSD}}$ (detailed in Section 3.4), and maps each branch to its respective classification output via global average pooling (GAP) and Softmax layers. This design demonstrates an effective multi-task learning paradigm that preserves task specificity while enabling efficient feature sharing.
3.1. Task-Shared Module
The raw vibration signals, without any preprocessing, are directly used as input to the MTAGCN, thereby enabling an end-to-end fault diagnosis framework. Each input sample is a 1024-point segment with no overlap, represented as $x \in \mathbb{R}^{1 \times 1024}$. To analyze the structural relationships within the input features, the extracted features are transformed into instance graphs. Initially, a convolutional layer (Conv) is applied to extract meaningful features from the raw vibration data. The detailed structure of the Conv block is illustrated in Figure 5a. This Conv block consists of a one-dimensional convolutional layer ($\mathrm{Conv1d}$), followed by batch normalization ($\mathrm{BN}$) and a rectified linear unit activation function ($\mathrm{ReLU}$).
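As a minimal illustration of how the non-overlapping 1024-point samples mentioned above can be prepared for the one-dimensional convolution, consider the following sketch; the array layout and helper name are assumptions for illustration, not part of the original method.

```python
import numpy as np

def segment_signal(signal, length=1024):
    """Split a raw vibration record into non-overlapping 1024-point samples.

    Returns an array of shape (N, 1, length), suitable as Conv1d input.
    """
    signal = np.asarray(signal, dtype=np.float32)
    n = len(signal) // length
    return signal[: n * length].reshape(n, 1, length)
```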
The raw vibration signals are fed into the Conv block, which can be mathematically expressed as follows:
$$F = \mathrm{ReLU}\big(\mathrm{BN}\big(\mathrm{Conv1d}(x)\big)\big),$$
$$\mathrm{BN}(x_i) = \gamma \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta,$$
where $B$ denotes the mini-batch. The parameters $\gamma$ and $\beta$ are learnable scaling and shifting factors in the batch normalization process. The mean $\mu_B$ and variance $\sigma_B^2$ of the mini-batch are computed as follows:
$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2,$$
where $m$ represents the size of mini-batch $B$.
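A minimal PyTorch sketch of such a Conv block is given below; the kernel size, stride, and channel counts are illustrative assumptions rather than values reported in the paper.

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """Conv1d -> BatchNorm1d -> ReLU, matching the block structure described above."""
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size,
                              stride=stride, padding=kernel_size // 2)
        self.bn = nn.BatchNorm1d(out_channels)  # learns the gamma/beta factors above
        self.act = nn.ReLU()

    def forward(self, x):  # x: (B, in_channels, L), e.g. (B, 1, 1024)
        return self.act(self.bn(self.conv(x)))
```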
Next, the graph block is employed after the Conv block to construct the adjacency matrix $A$ based on the extracted features $F$. To facilitate the generation of instance graphs, a top-$k$ sorting mechanism is introduced to identify the $k$ nearest neighbors, as illustrated in Figure 5b. The detailed formulation of the graph block is given as follows:
$$F_{\mathrm{MLP}} = \mathrm{MLP}(F), \qquad A = \mathrm{Softmax}\big(F_{\mathrm{MLP}} F_{\mathrm{MLP}}^{\mathsf{T}}\big), \qquad \hat{A} = \mathrm{top}\text{-}k(A),$$
where $A$ denotes the constructed adjacency matrix, and $F_{\mathrm{MLP}}$ represents the output of the multilayer perceptron (MLP). The function $\mathrm{Softmax}(\cdot)$ is used to normalize the product of $F_{\mathrm{MLP}}$ and its transpose. To enhance computational efficiency, a sparse adjacency matrix $\hat{A}$ is derived by applying the $\mathrm{top}\text{-}k$ operation, which retains only the $k$ largest values in each row of $A$. This approach effectively reduces the computational overhead while preserving the most significant relational information.
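The adjacency construction can be sketched as follows. The MLP width, the value of $k$, and the final propagation step that consumes the sparse adjacency are assumptions for illustration; the paper only specifies the MLP embedding, the normalized similarity matrix, and the top-$k$ sparsification.

```python
import torch
import torch.nn as nn

class GraphBlock(nn.Module):
    """Instance-graph construction: MLP embedding, Softmax-normalized similarity matrix A,
    and row-wise top-k sparsification, followed by one (assumed) graph-convolution step."""
    def __init__(self, in_dim, hidden_dim, k=5):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, hidden_dim))
        self.weight = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.k = k

    def forward(self, f):                                   # f: (N, in_dim), one node per instance
        h = self.mlp(f)                                     # F_MLP
        a = torch.softmax(h @ h.t(), dim=1)                 # A = Softmax(F_MLP F_MLP^T)
        top = torch.topk(a, min(self.k, a.size(1)), dim=1)
        a_sparse = torch.zeros_like(a).scatter_(1, top.indices, top.values)  # keep top-k per row
        return torch.relu(self.weight(a_sparse @ h))        # assumed propagation over the graph
```

Here each node of the instance graph corresponds to one sample in the mini-batch, so in practice the Conv features would be flattened or pooled per sample before being passed to the graph block.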
It is noteworthy that the task-shared module is composed of six Conv blocks and three Graph blocks arranged in an alternating sequence. This architecture design enables the network to simultaneously capture both local temporal features (through Conv blocks) and global structural relationships (through Graph blocks) from the input vibration signals. By integrating these complementary feature extraction mechanisms, the MTAGCN framework can effectively learn discriminative representations that enhance fault diagnosis accuracy across different operating conditions.
3.2. Attention Mechanism Module
The AM module is built on the squeeze-and-excitation network (SE-Net), whose detailed architecture is illustrated in
Figure 6.
Suppose the input data of the AM module is defined as $X \in \mathbb{R}^{C \times W \times H}$, where $C$, $W$, and $H$ denote the number of channels, width, and height, respectively. A convolutional operation is then applied to extract deeper features, which can be formulated as follows:
$$U = F_{\mathrm{conv}}(X),$$
where $U \in \mathbb{R}^{C' \times W \times H}$ represents the deeper features with channel dimension $C'$, width $W$, and height $H$, and $F_{\mathrm{conv}}(\cdot)$ denotes the convolution operation. Subsequently, global information is aggregated by a squeeze operation, resulting in a vector $z \in \mathbb{R}^{C'}$ that captures the global receptive field. The squeezing process is formulated as follows:
$$z_c = F_{\mathrm{GAP}}(u_c) = \frac{1}{W \times H}\sum_{i=1}^{W}\sum_{j=1}^{H} u_c(i, j),$$
where $F_{\mathrm{GAP}}(\cdot)$ denotes the global average pooling.
Then, two fully connected (FC) layers are employed to perform the excitation operation, enabling the model to capture inter-channel dependencies within the feature maps:
$$s = \sigma\big(W_2\,\delta(W_1 z)\big),$$
where $W_1$ and $W_2$ refer to the weight parameters of the two FC layers, respectively, $\delta(\cdot)$ denotes the non-linear activation function (ReLU), $\sigma(\cdot)$ denotes the sigmoid function, and $s$ represents the channel weights of the feature maps.
Finally, the scaling operation is applied to re-calibrate the features $U$, as expressed below:
$$\tilde{x}_c = F_{\mathrm{scale}}(u_c, s_c) = s_c \cdot u_c.$$
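A compact sketch of this squeeze-and-excitation style channel attention is shown below. Pooling over all non-channel dimensions makes it usable for either 1D or 2D feature maps, and the reduction ratio of 16 is a conventional assumption rather than a value taken from the paper.

```python
import torch
import torch.nn as nn

class SEAttention(nn.Module):
    """Squeeze (GAP) -> two FC layers (ReLU, then sigmoid) -> channel-wise rescaling."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.fc1 = nn.Linear(channels, hidden)
        self.fc2 = nn.Linear(hidden, channels)

    def forward(self, u):                                     # u: (B, C, ...) feature maps
        z = u.flatten(2).mean(dim=2)                          # squeeze: global average pooling -> (B, C)
        s = torch.sigmoid(self.fc2(torch.relu(self.fc1(z))))  # excitation -> (B, C)
        shape = [u.size(0), u.size(1)] + [1] * (u.dim() - 2)
        return u * s.view(*shape)                             # scale: re-calibrate each channel
```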
Most importantly, the AM module not only learns task-specific features from different layers of the task-shared module but also identifies important features to enhance diagnostic performance. It enables the proposed MTAGCN to dynamically learn task-specific features within the task-shared module for use in the two task-specific modules. We integrated AM modules at multiple layers within the proposed MTAGCN to capture multi-level features. Specifically, incorporating AM modules at different levels in the task-shared module allows the two task-specific modules to capture rich, representative features.
3.3. Task-Specific Module
As illustrated in
Figure 4, the task-specific modules include the fault type diagnosis (FTD) module and the fault severity diagnosis (FSD) module, both of which share the same architecture. Each task-specific module consists of three AM modules, three
Conv blocks, three max-pooling layers, and a classifier.
The task-specific modules facilitate the extraction and optimization of features tailored to specific fault diagnosis tasks (i.e., fault types or severities) by leveraging task-shared features through the AM module. The key advantage of these task-specific modules lies in their ability to mine the most discriminative and representative features for fault diagnosis, thereby enhancing the efficiency of multi-task learning (MTL) through the task-shared module.
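The sketch below illustrates one such task-specific head, reusing the ConvBlock and SEAttention sketches from earlier. For simplicity it feeds a single shared feature map through three (AM, Conv, max-pooling) stages, whereas MTAGCN gates shared features taken from several levels of the backbone; the channel width, pooling size, and class count are illustrative assumptions.

```python
import torch.nn as nn

class TaskSpecificHead(nn.Module):
    """Three (AM -> Conv block -> max-pool) stages, then GAP and a linear classifier."""
    def __init__(self, channels=64, num_classes=4):
        super().__init__()
        self.stages = nn.Sequential(*[
            nn.Sequential(SEAttention(channels), ConvBlock(channels, channels), nn.MaxPool1d(2))
            for _ in range(3)
        ])
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, shared_feat):               # shared_feat: (B, channels, L) from the backbone
        x = self.stages(shared_feat)
        return self.classifier(x.mean(dim=2))     # GAP over length; Softmax is applied in the loss
```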
3.4. Loss Function and Optimization
In general, the proposed MTAGCN framework is trained using a loss function that incorporates the two task-specific losses, $\mathcal{L}_{\mathrm{FTD}}$ and $\mathcal{L}_{\mathrm{FSD}}$, simultaneously. The objective function can be expressed as
$$\mathcal{L} = \lambda_1 \mathcal{L}_{\mathrm{FTD}} + \lambda_2 \mathcal{L}_{\mathrm{FSD}},$$
where $\lambda_1$ and $\lambda_2$ refer to the impact factors for balancing the two task-specific modules, respectively.
Herein, $\mathcal{L}_{\mathrm{FTD}}$ and $\mathcal{L}_{\mathrm{FSD}}$ are the cross-entropy losses of the two tasks, which can be expressed as
$$\mathcal{L}_{\mathrm{FTD/FSD}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} Y_{i,c}\,\log \hat{Y}_{i,c},$$
where $Y$ and $\hat{Y}$ refer to the true and predicted labels of the rotating machinery, respectively, $N$ is the number of samples, and $C$ is the number of classes for the corresponding task.
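Under the assumption (made explicit above) that both task losses are cross-entropy, the combined objective can be written directly; the function name and the default weights are placeholders.

```python
import torch.nn.functional as F

def mtagcn_loss(ftd_logits, ftd_labels, fsd_logits, fsd_labels, lambda1=0.5, lambda2=0.5):
    """Weighted sum of the fault-type and fault-severity cross-entropy losses.

    The lambda values are balancing factors; the paper does not fix their values here.
    """
    loss_ftd = F.cross_entropy(ftd_logits, ftd_labels)
    loss_fsd = F.cross_entropy(fsd_logits, fsd_labels)
    return lambda1 * loss_ftd + lambda2 * loss_fsd
```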
To further improve the fault diagnostic and identification performance of the proposed MTAGCN, the parameters can be fine-tuned and iteratively updated using the Adaptive Moment Estimation (Adam) optimization algorithm proposed by Kingma and Ba [31]. This enables optimal parameter tuning and thus improves the effectiveness and feasibility of the proposed MTAGCN in MTL-based diagnostic tasks. The following equations represent the update process:
$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\,g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^{2},$$
$$\eta_t = \eta_0\,\frac{\sqrt{1-\beta_2^{\,t}}}{1-\beta_1^{\,t}}, \qquad \theta_t = \theta_{t-1} - \eta_t\,\frac{m_t}{\sqrt{v_t} + \epsilon}, \qquad t = 1, \dots, T,$$
where the hyperparameters $\beta_1$, $\beta_2$, and $\epsilon$ are used to balance the optimization process, with typical values of $0.9$, $0.999$, and $10^{-8}$, respectively; $g_t$ denotes the gradient and $\theta_t$ the network parameters at the $t$-th iteration; and $T$ indicates the total number of training iterations. The variables $m_t$ and $v_t$ are initialized to 0 and correspond to the first and second moment estimates, respectively. Meanwhile, $\eta_t$ denotes the updated learning rate after the $t$-th iteration, and $\eta_0$ denotes the initial learning rate, which can be customized by the user.
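In practice, the update above corresponds to a standard Adam configuration. A minimal training step might look like the following sketch; `model`, `loader`, and the learning rate of 1e-3 are placeholders, with `model` assumed to return the two task logits and `loader` assumed to yield (signals, type labels, severity labels) mini-batches.

```python
import torch

# Hypothetical setup: `model` combines the shared backbone and the two task-specific heads.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8)

for signals, type_labels, severity_labels in loader:
    optimizer.zero_grad()
    ftd_logits, fsd_logits = model(signals)                   # forward pass through both branches
    loss = mtagcn_loss(ftd_logits, type_labels, fsd_logits, severity_labels)
    loss.backward()                                           # backpropagate the combined loss
    optimizer.step()                                          # Adam parameter update
```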