1. Introduction
As fundamental components of rotating machinery, bearings, gears, couplings, and other related parts are essential in modern industry [1,2,3]. However, harsh factory conditions make them highly prone to wear and performance degradation under prolonged operation. In the era of big data, traditional signal processing methods are increasingly limited by insufficient accuracy, low efficiency, and dependence on expert knowledge and denoising [4]. At the same time, the rapid development of deep learning (DL)-based techniques, together with sensor technology and the Internet of Things (IoT), has brought new opportunities for fault diagnosis. Most importantly, intelligent fault diagnosis (IFD) methods, known for their efficiency and accuracy, have received great attention from academia and industry in recent years and have been widely applied to rotating machinery fault diagnosis [5].
As an efficient fault diagnosis tool, machine learning (ML)-based methods have received much attention from academia and have shown significant potential in rotating machinery fault diagnosis. Generally, their implementation consists of three key steps: data acquisition, feature engineering with expert knowledge, and health condition classification [6,7]. Chen et al. proposed a hierarchical ML-based framework with only two layers to enable fault diagnosis of rotating machinery [8]. Zhang et al. proposed a novel ML-based framework integrating recursive feature elimination (RFE) and an extreme learning machine (ELM) to achieve fault diagnosis of shield machines [9]. However, manual feature extraction in ML-based methods relies heavily on expert knowledge, which leads to significant time and labor costs. Moreover, these methods are commonly constrained by limited data, which restricts their generalization and self-learning capabilities in complex engineering environments [10]. In recent years, there has therefore been growing interest in developing advanced methods for feature extraction and health condition identification in rotating machinery using large-scale monitoring data.
In recent years, DL-based diagnostic models have been developed to automatically learn fault-related features from monitoring data through deep hierarchical architectures. Compared with ML-based methods, DL-based models directly establish the relationship between input samples and the health conditions of rotating machinery, using the learned features to assess fault types or severities. Several approaches, including deep belief networks (DBNs) [11], deep autoencoders (DAEs) [12], convolutional neural networks (CNNs) [13], and graph convolutional networks (GCNs) [14], have been explored for fault diagnosis. For example, Karimi et al. proposed a multi-source domain adaptation method for fault diagnosis, leveraging an attention mechanism, a domain attribute loss, and knowledge fusion to enhance cross-domain diagnostic performance [15]. Shi et al. designed improved multi-sensor DBNs (MSIDBNs) with redefined pretraining and fine-tuning stages to effectively extract features for rolling mill fault diagnosis [16]. Yu et al. introduced a novel DL-based approach by improving one-dimensional and two-dimensional CNNs (I1DCNNs and I2DCNNs) with group normalization (GN) and global average pooling (GAP) for enhanced intelligent fault diagnosis [17]. Han et al. built a novel multi-source heterogeneous information fusion framework that utilizes improved DL-based methods to enhance the effectiveness and robustness of intelligent fault diagnosis [18].
Although DL-based methods have made significant advances in intelligent fault diagnosis of rotating machinery, some drawbacks remain. On the one hand, most current DL-based methods extract features for only a single diagnostic task, ignoring the task-invariant features shared across related tasks. On the other hand, most existing DL-based feature extraction methods rely mainly on class labels while overlooking the underlying structural characteristics of the samples, which may lead to incomplete feature representations in deep networks.
Multi-task learning (MTL), a practical and powerful technique for mitigating the first limitation, has been used to learn task-shared features from multiple diagnostic tasks. Recently, MTL-based methods have received growing interest for handling multiple tasks jointly: they enhance model generalization by integrating useful information from multiple tasks, sharing features across tasks, and training within a unified framework. MTL-based techniques have been widely used in fields such as medical image classification, fault diagnosis, question answering, and object detection. Zhao et al. proposed a novel MTL-based self-supervised transfer learning paradigm that integrates multi-perspective feature transfer to achieve fault diagnosis [19]. Zhang et al. proposed an auxiliary prior knowledge-informed framework based on MTL for few-shot fault diagnosis [20]. Zheng et al. proposed an MTL-based framework with deep inter-task interactions to achieve breast tumor segmentation and classification [21]. Zhang et al. proposed a novel uncertainty bidirectional guidance MTL-based framework (UBGM) to enhance segmentation and classification in medical image analysis [22]. However, these methods do not adequately consider the interactions between different diagnostic tasks at different levels, so the design of MTL-based methods still has considerable room for improvement.
To address the limitations of existing DL-based methods, a novel multi-task graph-guided convolutional network with an attention mechanism (MTAGCN) is proposed for intelligent fault diagnosis of rotating machinery. In MTAGCN, multi-task diagnosis is achieved by jointly modeling the data structure, feature extraction, and attention mechanism (AM) in a unified deep network. Fault severity labels and fault type labels are modeled by two task-specific modules, respectively. Task-invariant features are first mined from the raw vibration signals using convolutional (Conv) blocks, and a graph block is then used to build instance graphs that capture the structural characteristics of these features. The task-shared module, composed of Conv and graph blocks, thus mines features at different levels. The AM connects the task-shared module with the task-specific modules: it captures the significance of the task-shared features and selects valuable, representative features for each particular diagnostic task. The outline of the proposed MTAGCN is shown in
Figure 1. The task-shared module extracts valuable features from input vibration signals, while two task-specific modules further process these features to output diagnostic information, including fault type and severity.
The main contributions of this article are listed as follows.
- (1)
The multi-task GCN is proposed to achieve end-to-end fault diagnosis by modeling the fault type diagnosis task and fault severity diagnosis task simultaneously in a unified DL-based framework.
- (2)
The instance graphs are constructed to mine structural characteristics in the task-shared module. By integrating the AM modules at multiple levels, the proposed MTAGCN can more effectively capture critical task-specific information in specific task modules.
- (3)
Extensive experiments are conducted to show the feasibility and effectiveness of our proposed method.
The remainder of this paper is organized as follows. Section 2 reviews related work on CNNs, GCNs, AM, and MTL. Section 3 describes the proposed MTAGCN. Section 4 validates the feasibility and effectiveness of the proposed MTAGCN for MTL fault diagnosis on three datasets. Section 5 presents a set of comparison experiments on MTAGCN. Section 6 summarizes the paper and outlines directions for future research.
3. Proposed Method
The proposed MTAGCN consists of a task-shared module and two task-specific modules: fault type diagnosis (FTD) and fault severity diagnosis (FSD). The task-shared module is responsible for extracting task-invariant, structural, and discriminative features from the input data. Specifically, shared features in the task-shared module are propagated through convolutional (Conv) blocks, allowing the model to extract deep features at multiple levels and capture fine-grained information. The task-shared module and the two task-specific modules are interconnected through the proposed attention mechanism (AM), which acts as a feature selector, learning and extracting task-specific features. The detailed structure of the proposed MTAGCN is shown in
Figure 4.
Figure 4 illustrates the MTL-based architecture, which comprises three parallel branches. The top and bottom branches are task-specific pathways, each consisting of attention modules (AM blocks), convolutional blocks, and pooling layers with skip connections to enhance information flow. The middle branch functions as a shared feature extraction backbone, processing input waveform signals through alternating convolutional blocks, graph blocks, and pooling layers. The overall architecture integrates the task-specific losses into a weighted sum, $\mathcal{L} = \lambda_1 \mathcal{L}_{\mathrm{FTD}} + \lambda_2 \mathcal{L}_{\mathrm{FSD}}$ (detailed in Section 3.4), and maps each branch to its respective classification output via global average pooling (GAP) and Softmax layers. This design demonstrates an effective multi-task learning paradigm that preserves task specificity while enabling efficient feature sharing.
3.1. Task-Shared Module
The raw vibration signals, without any preprocessing, are directly used as input to the MTAGCN, thereby enabling an end-to-end fault diagnosis framework. Each input sample is a 1024-point segment with no overlap, represented as $x \in \mathbb{R}^{1 \times 1024}$. To analyze the structural relationships within the input features, the extracted features are transformed into instance graphs. Initially, a convolutional layer (Conv) is applied to extract meaningful features from the raw vibration data. The detailed structure of the Conv block is illustrated in Figure 5a. This Conv block consists of a one-dimensional convolutional layer ($\mathrm{Conv1d}$), followed by batch normalization ($\mathrm{BN}$) and a rectified linear unit activation function ($\mathrm{ReLU}$).
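As a minimal illustration of how the non-overlapping 1024-point samples mentioned above can be prepared for the one-dimensional convolution, consider the following sketch; the array layout and helper name are assumptions for illustration, not part of the original method.

```python
import numpy as np

def segment_signal(signal, length=1024):
    """Split a raw vibration record into non-overlapping 1024-point samples.

    Returns an array of shape (N, 1, length), suitable as Conv1d input.
    """
    signal = np.asarray(signal, dtype=np.float32)
    n = len(signal) // length
    return signal[: n * length].reshape(n, 1, length)
```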
The raw vibration signals are fed into the Conv block, which can be mathematically expressed as follows:
$$F = \mathrm{ReLU}\big(\mathrm{BN}\big(\mathrm{Conv1d}(x)\big)\big),$$
$$\mathrm{BN}(x_i) = \gamma \frac{x_i - \mu_B}{\sqrt{\sigma_B^2 + \epsilon}} + \beta,$$
where $B$ denotes the mini-batch. The parameters $\gamma$ and $\beta$ are learnable scaling and shifting factors in the batch normalization process. The mean $\mu_B$ and variance $\sigma_B^2$ of the mini-batch are computed as follows:
$$\mu_B = \frac{1}{m}\sum_{i=1}^{m} x_i, \qquad \sigma_B^2 = \frac{1}{m}\sum_{i=1}^{m} (x_i - \mu_B)^2,$$
where $m$ represents the size of mini-batch $B$.
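A minimal PyTorch sketch of such a Conv block is given below; the kernel size, stride, and channel counts are illustrative assumptions rather than values reported in the paper.

```python
import torch.nn as nn

class ConvBlock(nn.Module):
    """Conv1d -> BatchNorm1d -> ReLU, matching the block structure described above."""
    def __init__(self, in_channels, out_channels, kernel_size=3, stride=1):
        super().__init__()
        self.conv = nn.Conv1d(in_channels, out_channels, kernel_size,
                              stride=stride, padding=kernel_size // 2)
        self.bn = nn.BatchNorm1d(out_channels)  # learns the gamma/beta factors above
        self.act = nn.ReLU()

    def forward(self, x):  # x: (B, in_channels, L), e.g. (B, 1, 1024)
        return self.act(self.bn(self.conv(x)))
```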
Next, the graph block is employed after the Conv block to construct the adjacency matrix $A$ based on the extracted features $F$. To facilitate the generation of instance graphs, a top-$k$ sorting mechanism is introduced to identify the $k$ nearest neighbors, as illustrated in Figure 5b. The detailed formulation of the graph block is given as follows:
$$F_{\mathrm{MLP}} = \mathrm{MLP}(F), \qquad A = \mathrm{Softmax}\big(F_{\mathrm{MLP}} F_{\mathrm{MLP}}^{\mathsf{T}}\big), \qquad \hat{A} = \mathrm{top}\text{-}k(A),$$
where $A$ denotes the constructed adjacency matrix, and $F_{\mathrm{MLP}}$ represents the output of the multilayer perceptron (MLP). The function $\mathrm{Softmax}(\cdot)$ is used to normalize the product of $F_{\mathrm{MLP}}$ and its transpose. To enhance computational efficiency, a sparse adjacency matrix $\hat{A}$ is derived by applying the $\mathrm{top}\text{-}k$ operation, which retains only the $k$ largest values in each row of $A$. This approach effectively reduces the computational overhead while preserving the most significant relational information.
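The adjacency construction can be sketched as follows. The MLP width, the value of $k$, and the final propagation step that consumes the sparse adjacency are assumptions for illustration; the paper only specifies the MLP embedding, the normalized similarity matrix, and the top-$k$ sparsification.

```python
import torch
import torch.nn as nn

class GraphBlock(nn.Module):
    """Instance-graph construction: MLP embedding, Softmax-normalized similarity matrix A,
    and row-wise top-k sparsification, followed by one (assumed) graph-convolution step."""
    def __init__(self, in_dim, hidden_dim, k=5):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, hidden_dim))
        self.weight = nn.Linear(hidden_dim, hidden_dim, bias=False)
        self.k = k

    def forward(self, f):                                   # f: (N, in_dim), one node per instance
        h = self.mlp(f)                                     # F_MLP
        a = torch.softmax(h @ h.t(), dim=1)                 # A = Softmax(F_MLP F_MLP^T)
        top = torch.topk(a, min(self.k, a.size(1)), dim=1)
        a_sparse = torch.zeros_like(a).scatter_(1, top.indices, top.values)  # keep top-k per row
        return torch.relu(self.weight(a_sparse @ h))        # assumed propagation over the graph
```

Here each node of the instance graph corresponds to one sample in the mini-batch, so in practice the Conv features would be flattened or pooled per sample before being passed to the graph block.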
It is noteworthy that the task-shared module is composed of six Conv blocks and three Graph blocks arranged in an alternating sequence. This architecture design enables the network to simultaneously capture both local temporal features (through Conv blocks) and global structural relationships (through Graph blocks) from the input vibration signals. By integrating these complementary feature extraction mechanisms, the MTAGCN framework can effectively learn discriminative representations that enhance fault diagnosis accuracy across different operating conditions.
3.2. Attention Mechanism Module
The AM module is built on the squeeze-and-excitation network (SE-Net), whose detailed architecture is illustrated in
Figure 6.
Suppose the input data of the AM module is defined as $X \in \mathbb{R}^{C \times W \times H}$, where $C$, $W$, and $H$ denote the number of channels, width, and height, respectively. A convolutional operation is then applied to extract deeper features, which can be formulated as follows:
$$U = F_{\mathrm{conv}}(X),$$
where $U \in \mathbb{R}^{C' \times W \times H}$ represents the deeper features with channel dimension $C'$, width $W$, and height $H$, and $F_{\mathrm{conv}}(\cdot)$ denotes the convolution operation. Subsequently, global information is aggregated by a squeeze operation, resulting in a vector $z \in \mathbb{R}^{C'}$ that captures the global receptive field. The squeezing process is formulated as follows:
$$z_c = F_{\mathrm{GAP}}(u_c) = \frac{1}{W \times H}\sum_{i=1}^{W}\sum_{j=1}^{H} u_c(i, j),$$
where $F_{\mathrm{GAP}}(\cdot)$ denotes the global average pooling.
Then, two fully connected (FC) layers are employed to perform the excitation operation, enabling the model to capture inter-channel dependencies within the feature maps:
$$s = \sigma\big(W_2\,\delta(W_1 z)\big),$$
where $W_1$ and $W_2$ refer to the weight parameters of the two FC layers, respectively, $\delta(\cdot)$ denotes the non-linear activation function (ReLU), $\sigma(\cdot)$ denotes the sigmoid function, and $s$ represents the channel weights of the feature maps.
Finally, the scaling operation is applied to re-calibrate the features $U$, as expressed below:
$$\tilde{x}_c = F_{\mathrm{scale}}(u_c, s_c) = s_c \cdot u_c.$$
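A compact sketch of this squeeze-and-excitation style channel attention is shown below. Pooling over all non-channel dimensions makes it usable for either 1D or 2D feature maps, and the reduction ratio of 16 is a conventional assumption rather than a value taken from the paper.

```python
import torch
import torch.nn as nn

class SEAttention(nn.Module):
    """Squeeze (GAP) -> two FC layers (ReLU, then sigmoid) -> channel-wise rescaling."""
    def __init__(self, channels, reduction=16):
        super().__init__()
        hidden = max(channels // reduction, 1)
        self.fc1 = nn.Linear(channels, hidden)
        self.fc2 = nn.Linear(hidden, channels)

    def forward(self, u):                                     # u: (B, C, ...) feature maps
        z = u.flatten(2).mean(dim=2)                          # squeeze: global average pooling -> (B, C)
        s = torch.sigmoid(self.fc2(torch.relu(self.fc1(z))))  # excitation -> (B, C)
        shape = [u.size(0), u.size(1)] + [1] * (u.dim() - 2)
        return u * s.view(*shape)                             # scale: re-calibrate each channel
```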
Most importantly, the AM module not only learns task-specific features from different layers of the task-shared module but also identifies important features to enhance diagnostic performance. It enables the proposed MTAGCN to dynamically learn task-specific features within the task-shared module for use in the two task-specific modules. We integrated AM modules at multiple layers within the proposed MTAGCN to capture multi-level features. Specifically, incorporating AM modules at different levels in the task-shared module allows the two task-specific modules to capture rich, representative features.
3.3. Task-Specific Module
As illustrated in
Figure 4, the task-specific modules include the fault type diagnosis (FTD) module and the fault severity diagnosis (FSD) module, both of which share the same architecture. Each task-specific module consists of three AM modules, three
Conv blocks, three max-pooling layers, and a classifier.
The task-specific modules facilitate the extraction and optimization of features tailored to specific fault diagnosis tasks (i.e., fault types or severities) by leveraging task-shared features through the AM module. The key advantage of these task-specific modules lies in their ability to mine the most discriminative and representative features for fault diagnosis, thereby enhancing the efficiency of multi-task learning (MTL) through the task-shared module.
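The sketch below illustrates one such task-specific head, reusing the ConvBlock and SEAttention sketches from earlier. For simplicity it feeds a single shared feature map through three (AM, Conv, max-pooling) stages, whereas MTAGCN gates shared features taken from several levels of the backbone; the channel width, pooling size, and class count are illustrative assumptions.

```python
import torch.nn as nn

class TaskSpecificHead(nn.Module):
    """Three (AM -> Conv block -> max-pool) stages, then GAP and a linear classifier."""
    def __init__(self, channels=64, num_classes=4):
        super().__init__()
        self.stages = nn.Sequential(*[
            nn.Sequential(SEAttention(channels), ConvBlock(channels, channels), nn.MaxPool1d(2))
            for _ in range(3)
        ])
        self.classifier = nn.Linear(channels, num_classes)

    def forward(self, shared_feat):               # shared_feat: (B, channels, L) from the backbone
        x = self.stages(shared_feat)
        return self.classifier(x.mean(dim=2))     # GAP over length; Softmax is applied in the loss
```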
3.4. Loss Function and Optimization
In general, the proposed MTAGCN framework is trained using a loss function that incorporates the two task-specific losses, $\mathcal{L}_{\mathrm{FTD}}$ and $\mathcal{L}_{\mathrm{FSD}}$, simultaneously. The objective function can be expressed as
$$\mathcal{L} = \lambda_1 \mathcal{L}_{\mathrm{FTD}} + \lambda_2 \mathcal{L}_{\mathrm{FSD}},$$
where $\lambda_1$ and $\lambda_2$ refer to the impact factors for balancing the two task-specific modules, respectively.
Herein, $\mathcal{L}_{\mathrm{FTD}}$ and $\mathcal{L}_{\mathrm{FSD}}$ are the cross-entropy losses of the two tasks, which can be expressed as
$$\mathcal{L}_{\mathrm{FTD/FSD}} = -\frac{1}{N}\sum_{i=1}^{N}\sum_{c=1}^{C} Y_{i,c}\,\log \hat{Y}_{i,c},$$
where $Y$ and $\hat{Y}$ refer to the true and predicted labels of the rotating machinery, respectively, $N$ is the number of samples, and $C$ is the number of classes for the corresponding task.
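Under the assumption (made explicit above) that both task losses are cross-entropy, the combined objective can be written directly; the function name and the default weights are placeholders.

```python
import torch.nn.functional as F

def mtagcn_loss(ftd_logits, ftd_labels, fsd_logits, fsd_labels, lambda1=0.5, lambda2=0.5):
    """Weighted sum of the fault-type and fault-severity cross-entropy losses.

    The lambda values are balancing factors; the paper does not fix their values here.
    """
    loss_ftd = F.cross_entropy(ftd_logits, ftd_labels)
    loss_fsd = F.cross_entropy(fsd_logits, fsd_labels)
    return lambda1 * loss_ftd + lambda2 * loss_fsd
```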
To further improve the fault diagnostic and identification performance of the proposed MTAGCN, the parameters can be fine-tuned and iteratively updated using the Adaptive Moment Estimation (Adam) optimization algorithm proposed by Kingma and Ba [31]. This enables optimal parameter tuning and thus improves the effectiveness and feasibility of the proposed MTAGCN in MTL-based diagnostic tasks. The following equations represent the update process:
$$m_t = \beta_1 m_{t-1} + (1-\beta_1)\,g_t, \qquad v_t = \beta_2 v_{t-1} + (1-\beta_2)\,g_t^{2},$$
$$\eta_t = \eta_0\,\frac{\sqrt{1-\beta_2^{\,t}}}{1-\beta_1^{\,t}}, \qquad \theta_t = \theta_{t-1} - \eta_t\,\frac{m_t}{\sqrt{v_t} + \epsilon}, \qquad t = 1, \dots, T,$$
where the hyperparameters $\beta_1$, $\beta_2$, and $\epsilon$ are used to balance the optimization process, with typical values of $0.9$, $0.999$, and $10^{-8}$, respectively; $g_t$ denotes the gradient and $\theta_t$ the network parameters at the $t$-th iteration; and $T$ indicates the total number of training iterations. The variables $m_t$ and $v_t$ are initialized to 0 and correspond to the first and second moment estimates, respectively. Meanwhile, $\eta_t$ denotes the updated learning rate after the $t$-th iteration, and $\eta_0$ denotes the initial learning rate, which can be customized by the user.
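In practice, the update above corresponds to a standard Adam configuration. A minimal training step might look like the following sketch; `model`, `loader`, and the learning rate of 1e-3 are placeholders, with `model` assumed to return the two task logits and `loader` assumed to yield (signals, type labels, severity labels) mini-batches.

```python
import torch

# Hypothetical setup: `model` combines the shared backbone and the two task-specific heads.
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, betas=(0.9, 0.999), eps=1e-8)

for signals, type_labels, severity_labels in loader:
    optimizer.zero_grad()
    ftd_logits, fsd_logits = model(signals)                   # forward pass through both branches
    loss = mtagcn_loss(ftd_logits, type_labels, fsd_logits, severity_labels)
    loss.backward()                                           # backpropagate the combined loss
    optimizer.step()                                          # Adam parameter update
```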