1. Introduction
The rapid progression of industrial automation and informatization has significantly elevated the importance of fault-diagnosis technologies in maintaining the reliability and safety of industrial systems [1]. In complex settings such as the chemical, manufacturing and energy sectors, the capacity for precise and timely fault detection is crucial for optimizing production efficiency and averting potential accidents [2,3,4]. Recent advancements in big data and the Internet of Things (IoT) have led to widespread sensor integration across these systems, generating substantial data volumes that enhance monitoring and diagnostic processes [5,6]. This influx of operational data, while beneficial, often overshadows the sparse fault data, creating a significant imbalance that challenges traditional data-intensive machine learning and deep learning methodologies [7,8]. This disparity underscores the need for robust few-shot learning techniques that can function effectively in environments with limited fault data, marking a critical area of research in industrial intelligence [9,10].
Despite the significant achievements of Few-Shot Learning (FSL) in fields such as natural language processing and image classification [8,11,12,13], its application in the complex domain of fault diagnosis, particularly when handling intricate industrial data, continues to face numerous challenges. Existing FSL approaches can be broadly categorized into generative and discriminative methods, each with its strengths and limitations. Generative models, as described by Lu et al. [11], often enhance performance in data-scarce environments by synthesizing new samples, whereas discriminative models focus on learning distinguishing features from limited data. Zhang et al. [8] classified fault-diagnosis methods for small and imbalanced datasets into data augmentation, feature extraction and classifier design. Despite their merits, these methods commonly exhibit weaknesses such as poor generalization, sensitivity to adversarial examples and high algorithmic complexity, which can lead to suboptimal performance in industrial applications that demand rapid and accurate diagnostics. Recent studies have shifted focus to addressing FSL fault-diagnosis challenges through meta-learning [14,15,16], with a significant proportion of recent research in time-series signal fault diagnosis concentrating on this approach [17]. However, the practical utility of meta-learning is limited by its high dependency on data quality and distribution, which can degrade performance in industrial settings affected by noise and outliers [18]. Meta-learning algorithms such as MAML, although quick to adapt to new tasks, require multiple gradient updates, which increases the computational load and can lead to overfitting on sparse data, thus diminishing generalization [19]. Task-sequencing meta-learning introduces a novel method for optimizing the selection and ordering of learning tasks, but significant differences between tasks can lead to unstable knowledge transfer, affecting the learning outcomes [20].
In recent years, GAN-based data-augmentation algorithms have garnered considerable attention for addressing small-sample problems in fault diagnosis. These algorithms typically follow a sequence of steps: collecting data under various fault conditions, training a GAN model on the real signals and, finally, training classifiers with both real and synthetic signals [21]. This approach enables GAN-based data augmentation to generate a diverse set of samples from limited fault data, thereby enhancing the performance of diagnostic models. For instance, Zhang et al. developed a multi-module gradient-penalty GAN specifically for generating samples for mechanical fault diagnosis [22]. Li et al. utilized a WGAN-GP-based auxiliary classifier to generate high-quality spectra to overcome small-sample challenges [23]. While methods for generating one-dimensional spectra or two-dimensional images have been extensively explored, Pan et al. proposed a feature-generation network that creates one-dimensional feature sets rich in fault information, suitable for further fault identification [24]. Examples of GAN-based data augmentation for different signal types are shown in Table 1. Building on this foundation, our research incorporates adversarial sample augmentation techniques. By introducing targeted adversarial perturbations directly into the data, this strategy creates adversarial samples for training. Compared to traditional GAN-based generation, adversarial sample augmentation not only enhances the model's sensitivity to complex fault conditions but also improves its robustness against unknown or varying fault patterns. By simulating potential fault variations, the model is better equipped to adapt to the complex and dynamic conditions encountered in actual operations, thereby increasing the accuracy and reliability of fault diagnosis.
Building on the challenges and limitations of meta-learning in industrial fault diagnosis discussed above, the research framework provided by Wang et al. [12] offers a systematic way to think about few-shot learning, categorizing it from three perspectives: data, models and algorithms. Following this framework, our study introduces a comprehensive strategy combining data augmentation, model optimization and algorithmic improvements to overcome the existing limitations of meta-learning techniques, particularly when dealing with complex industrial fault-diagnosis data.
To address these challenges, our research integrates adversarial sample generation with serialized task learning to enhance the model's robustness and adaptability, leveraging modular design and advanced data-augmentation strategies to manage diverse fault-diagnosis scenarios effectively. Generating adversarial samples increases the dataset's diversity and sharpens the model's capability to recognize and respond to anomalies that are not typical of the training data, substantially improving diagnostic accuracy under operational conditions. The modular design affords structural flexibility, facilitating adjustments and optimizations tailored to specific fault-diagnosis tasks and making the model particularly suited to dynamic industrial environments. Serialized task learning structures the sequence of tasks from simple to complex, ensuring efficient use of limited sample data, enhancing the model's rapid adaptability to new fault types and maintaining performance stability across diverse industrial settings. Together, these elements provide a robust framework that significantly improves adaptability and accuracy compared to traditional methods; the integration of adversarial learning within the serialized task framework is particularly effective in preparing the model for the unpredictability of real-world operational conditions.
The Adversarial Task Augmented Sequential Meta-Learning (ATASML) framework introduced in this study enhances diagnostic models by integrating adversarial examples within a meta-learning architecture, significantly improving generalization. By embedding adversarial tasks during the training phase, ATASML prepares the model to handle unexpected or novel fault scenarios, boosting adaptability and accuracy in industrial settings with variable and complex sensor data.
The principal contributions of this work are encapsulated as follows:
Development of the ATASML framework that integrates adversarial tasks during training, enhancing the model’s adaptability and robustness. This dynamic integration is crucial for improving diagnostic accuracy and reliability under diverse industrial fault conditions.
Implementation of a comprehensive data-augmentation strategy in ATASML, incorporating Gaussian noise, temporal warping and adversarial example generation. These techniques broaden training data coverage, significantly improving the model’s anomaly detection and diagnostic capabilities in variable environments.
Application of a sequential task learning approach where tasks are prioritized based on complexity and informational value. This optimizes the learning process, improving training efficiency and diagnostic precision, and making the process computationally economical.
Validation of ATASML using industrial-relevant datasets, including the Tennessee Eastman Process (TEP) and Skoltech Anomaly Benchmark (SKAB), shows its superior performance over other well-established models, particularly in few-shot learning scenarios. The outcomes underscore ATASML’s improved accuracy, F1 scores and robust generalization capabilities across a range of complex fault conditions.
The remainder of this paper is organized as follows: Section 2 describes the proposed method in detail; Section 3 presents a case study that applies the method to specific datasets; Section 4 discusses the implications of the experimental results; and Section 5 concludes the paper with a summary of the findings and directions for future research.
2. Proposed Method
This section delineates the proposed Adversarial Task Augmented Sequential Meta-Learning (ATASML) framework, designed to enhance the efficacy of fault diagnosis in complex systems. We first review the foundational concepts of meta-learning, highlighting its role in enabling models to generalize from limited data. We then discuss task-oriented meta-learning algorithms, which aim to refine the adaptability and computational efficiency of meta-learning models. Building on these insights, we introduce the ATASML framework, focusing on its strategies for data augmentation and the generation of adversarial samples.
2.1. Fundamentals of Meta-Learning
Meta-learning, or learning to learn, is a significant paradigm in machine learning aimed at enabling models to generalize from limited experience to perform well on new tasks [18]. This approach leverages accumulated knowledge to adapt quickly to novel learning challenges with minimal data. Meta-learning operates at two levels: the meta-level, which concerns the learning strategy across tasks, and the base-level, where task-specific learning occurs.
The core objective in meta-learning is to tune the model parameters $\theta$ to optimize performance not just on a single task's training data but across a diverse set of tasks drawn from the distribution $p(\mathcal{T})$. This goal is pursued through episodic training, which involves sampling tasks and dividing each into a support set $S$ for learning and a query set $Q$ for evaluation. The model's parameters are updated based on the performance on $Q$, ensuring the model assimilates information from $S$.
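The episodic support/query split described above can be sketched as follows. This is an illustrative sketch, not the paper's implementation; the function and parameter names (`sample_episode`, `n_way`, `k_shot`, `q_queries`) are our own.

```python
import numpy as np

def sample_episode(data_by_class, n_way=3, k_shot=6, q_queries=4, rng=None):
    """Sample one N-way K-shot episode: a support set S for adaptation and a
    query set Q for evaluation. `data_by_class` maps class labels to arrays
    of samples (e.g., windowed sensor signals)."""
    if rng is None:
        rng = np.random.default_rng()
    classes = rng.choice(list(data_by_class), size=n_way, replace=False)
    support, query = [], []
    for episode_label, c in enumerate(classes):
        samples = data_by_class[c]
        idx = rng.choice(len(samples), size=k_shot + q_queries, replace=False)
        # first K samples per class go to S, the rest to Q
        support += [(samples[i], episode_label) for i in idx[:k_shot]]
        query += [(samples[i], episode_label) for i in idx[k_shot:]]
    return support, query

# Toy usage: 5 fault classes, each with 20 windowed signal samples of length 8
data = {c: np.random.randn(20, 8) for c in range(5)}
S, Q = sample_episode(data, n_way=3, k_shot=6, q_queries=4)
print(len(S), len(Q))  # 18 support samples, 12 query samples
```

In each episode the class labels are relabelled 0..N-1, so the meta-learner cannot rely on a fixed global label set and must learn from $S$ alone.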
Formally, the meta-learning objective for a given task $T$, with support set $S$ and query set $Q$, is to minimize the expected loss $\mathcal{L}$ on $Q$ after learning from $S$:

$$\min_{\theta} \; \mathbb{E}_{T \sim p(\mathcal{T})} \big[ \mathcal{L}\big(f_{\theta'}(x_Q), y_Q\big) \big],$$

where $\theta'$ represents the parameters updated by learning on $S$, and $(x_Q, y_Q)$ denote the samples and labels in $Q$. The ultimate aim is to identify an initialization $\theta$ that facilitates rapid adaptation to new tasks drawn from $p(\mathcal{T})$.
Various meta-learning models exist, including model-based, optimization-based and metric-based approaches. Each type aims to enhance the model’s ability to learn efficiently from minimal data, thereby broadening the scope of tasks it can handle—from few-shot learning challenges to rapid adaptation in reinforcement learning contexts.
Model-Agnostic Meta-Learning (MAML) [36,37] is a notable strategy that exemplifies meta-learning's essence by preparing the model to improve performance on new tasks with only a few gradient adjustments. MAML seeks an initial parameter set $\theta$ that is optimal for quick learning across tasks. After initial adaptation through gradient updates on $S$, yielding task-specific parameters $\theta_i' = \theta - \alpha \nabla_{\theta} \mathcal{L}_{T_i}(f_{\theta})$, the model's performance is evaluated on $Q$:

$$\min_{\theta} \sum_{T_i \sim p(\mathcal{T})} \mathcal{L}_{T_i}\big(f_{\theta_i'}\big).$$
This approach’s generality allows its application across various models and tasks, highlighting the versatility and potential of meta-learning.
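The two-level update structure can be illustrated with a minimal first-order MAML sketch on a toy family of 1-D regression tasks. This is our own didactic example under simplifying assumptions (a linear model, a first-order approximation of the meta-gradient), not the paper's implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_task():
    """Toy regression task y = w*x with a task-specific slope w."""
    w = rng.uniform(0.5, 1.5)
    x = rng.uniform(-1.0, 1.0, size=10)
    return x, w * x

def grad(theta, x, y):
    """Gradient of the squared-error loss for the model f(x) = theta*x."""
    return 2.0 * np.mean((theta * x - y) * x)

theta, alpha, beta = 0.0, 0.1, 0.01
for _ in range(500):                 # outer (meta) loop
    meta_grad = 0.0
    for _ in range(5):               # batch of sampled tasks
        x, y = sample_task()
        # inner update: adapt on the support half of the task (S)
        theta_i = theta - alpha * grad(theta, x[:5], y[:5])
        # first-order meta-gradient: query-half (Q) gradient at adapted params
        meta_grad += grad(theta_i, x[5:], y[5:])
    theta -= beta * meta_grad / 5    # outer update on aggregated query losses
# theta drifts toward an initialization that adapts quickly
# to any slope in the task family [0.5, 1.5]
```

The inner loop corresponds to computing $\theta_i'$ on $S$, and the outer update minimizes the summed query losses $\mathcal{L}_{T_i}(f_{\theta_i'})$; full MAML would additionally differentiate through the inner update.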
Despite meta-learning’s and MAML’s advancements, challenges such as high computational demands, assumptions of task homogeneity and sensitivity to hyperparameters remain areas for future research. Addressing these limitations is crucial for advancing meta-learning towards more practical and scalable applications.
2.2. Task-Oriented Meta-Learning Algorithms
Recent developments in task-oriented meta-learning have demonstrated significant advancements in optimizing learning processes through past experiences to enhance performance across a variety of tasks. Notably, task sequencing meta-learning techniques have been introduced to improve adaptability and performance in few-shot learning scenarios by optimizing the order of task presentation.
Task-Sequencing Meta-Learning (TSML): Introduced by Hu et al. [38], TSML organizes meta-training tasks from simple to complex, enhancing a model's adaptability in few-shot fault-diagnosis scenarios.
Task-Specific Pseudo-Labelling: Developed by Lee et al. [39], this method employs pseudo-labelling to enhance transductive meta-learning by generating synthetic labels for unannotated query sets, significantly improving model performance.
Task Weighting with Trajectory Optimization: Proposed by Do and Carneiro [40], this approach uses trajectory optimization to automate task weighting, showing improved performance over traditional methods on few-shot learning benchmarks.
Despite these innovative approaches, challenges remain, particularly with the variability between tasks which can lead to unstable knowledge transfer, impacting learning efficacy. This issue is exacerbated in real-world applications where task heterogeneity is prevalent and tasks often deviate significantly from the training distribution. The assumption of task uniformity by these algorithms often does not hold in complex scenarios, leading to a gap between theoretical efficiency and practical applicability. The fixed nature of task prioritization may also fail to accommodate the dynamic variability of real-world data, posing further challenges to model adaptability.
The exploration of these meta-learning algorithms underscores the necessity for a robust framework that intelligently sequences tasks and is resilient to the unpredictable nature of practical applications. The upcoming section introduces the Adversarial Task Augmented Sequential Meta-Learning (ATASML) Framework, which proposes innovative solutions to address these challenges. ATASML integrates data augmentation and adversarial examples within its learning process, significantly enhancing the model’s resilience and generalization capabilities. These features enable ATASML to perform effectively even under the challenging conditions presented by complex fault scenarios, thereby providing a substantial improvement over existing meta-learning frameworks.
2.3. Adversarial Task Augmented Sequential Meta-Learning (ATASML) Framework
The ATASML framework introduces a novel approach to enhance model adaptability and robustness, particularly in fault-diagnosis applications. By integrating data augmentation and adversarial training, ATASML aims to prepare models for a wide array of operational scenarios. This section elaborates on the framework’s methodology, emphasizing data augmentation, adversarial sample generation and the overall algorithmic procedure.
2.3.1. Data Augmentation in the ATASML Framework
Data augmentation is critical in the ATASML framework, ensuring that models are well-equipped for diverse operational scenarios. Techniques such as Gaussian noise addition and temporal warping are employed to enhance data variability and complexity. These techniques are depicted in Figure 1, where the transformation from the original dataset $\mathcal{D}$ to the augmented datasets $\mathcal{D}_{noise}$ and $\mathcal{D}_{warp}$ is illustrated.
The original dataset $\mathcal{D}$ consists of samples that capture the true operational dynamics. To enhance the dataset, a window of size $W$ is utilized to segment the continuous time-series data into discrete, overlapping segments. This sliding-window technique allows comprehensive capture of the temporal patterns within each segment. The augmentation methods applied to each segment are as follows:
Noise Addition (NoiseAdd): Gaussian noise $\epsilon \sim \mathcal{N}(0, \sigma^2)$ is added to the data within each window, creating a set of noise-augmented samples $\mathcal{D}_{noise}$:

$$x' = x + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2),$$

where $\sigma^2$ denotes the variance of the noise, modeling sensor noise and other environmental variations.

Temporal Warping (TimeWarp): The temporal spacing of data points within each segment is modified, resulting in a set of time-warped samples $\mathcal{D}_{warp}$:

$$x'(t) = x(a \cdot t),$$

where $a$ is a scaling factor that simulates the acceleration or deceleration of process dynamics.
Following augmentation, a selection process ensures that the resulting samples contribute positively to model training. This selection process, including quality assessment and anomaly detection, is critical to maintaining a high-quality augmented dataset $\mathcal{D}_{aug}$. The entire augmentation process, including the application of windowing and augmentation techniques, is schematically depicted in Figure 1.
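The windowing and the two augmentation operations can be sketched in a few lines of NumPy. This is an illustrative sketch; the window size, step, noise level $\sigma$ and warp factor $a$ below are arbitrary defaults, not values from the paper.

```python
import numpy as np

def sliding_windows(x, w, step):
    """Segment a 1-D time series into overlapping windows of size w."""
    return np.stack([x[i:i + w] for i in range(0, len(x) - w + 1, step)])

def noise_add(windows, sigma=0.05, rng=None):
    """NoiseAdd: add Gaussian noise eps ~ N(0, sigma^2) to each window."""
    if rng is None:
        rng = np.random.default_rng()
    return windows + rng.normal(0.0, sigma, size=windows.shape)

def time_warp(windows, a=1.2):
    """TimeWarp: resample each window at scaled time t' = a*t using linear
    interpolation, simulating faster (a > 1) or slower (a < 1) dynamics."""
    n = windows.shape[1]
    t = np.arange(n)
    src = np.clip(a * t, 0, n - 1)  # warped sampling positions
    return np.stack([np.interp(src, t, win) for win in windows])

# Toy usage on a synthetic periodic sensor signal
signal = np.sin(np.linspace(0, 10 * np.pi, 500))
W = sliding_windows(signal, w=64, step=32)
D_noise, D_warp = noise_add(W), time_warp(W)
print(W.shape, D_noise.shape, D_warp.shape)
```

A real pipeline would follow this with the quality-assessment and anomaly-detection filter described above before admitting samples into $\mathcal{D}_{aug}$.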
2.3.2. Design of Adversarial Samples
Adversarial samples within the ATASML framework are specifically designed to evaluate and enhance the model's resilience against operational conditions that mimic real-world disturbances. These samples are derived from both the augmented dataset $\mathcal{D}_{aug}$, which comprises samples modified by the standard data-augmentation techniques, and the original dataset $\mathcal{D}$.
The generation of adversarial samples, depicted in Figure 2, involves the following steps:

Dataset Composition: The dataset for generating adversarial samples, $\mathcal{D}_{adv}$, is formulated from the augmented samples $\mathcal{D}_{aug} = \mathcal{D}_{noise} \cup \mathcal{D}_{warp}$. These augmented samples, along with unaltered original samples from $\mathcal{D}$, are used to construct $\mathcal{D}_{adv}$.

Strategic Sample Selection ($S$): A subset from both $\mathcal{D}_{aug}$ (comprising $\mathcal{D}_{noise}$ and $\mathcal{D}_{warp}$) and $\mathcal{D}$ is selected to create instances that exhibit potential vulnerabilities under varied operational conditions. This targeted selection challenges the model realistically and robustly by:

Focusing on samples that represent critical transitional states or dynamic conditions.

Selecting samples that simulate rare operational disruptions or extreme conditions.

Choosing instances that might indicate potential system failures or significant performance anomalies.

Adversarial Optimization Problem: The adversarial samples $x_{adv}$ are generated by solving an optimization problem formulated to maximize the predictive error, thereby testing the model's robustness:

$$x_{adv} = x + \delta, \quad \delta = \arg\max_{\|\delta\| \le \epsilon} \mathcal{L}\big(f_{\theta}(x + \delta), y\big),$$

where $x \in S$, $\delta$ is the perturbation designed to maximally disrupt the model's predictions, and $y$ represents the labels corresponding to the strategically selected samples, denoting their operational states as normal or anomalous.

Gradient Sign Method: This method employs the gradient of the loss function with respect to the inputs from $S$ to determine the most effective direction for the perturbation, ensuring the adversarial samples are optimally challenging:

$$x_{adv} = x + \epsilon \cdot \mathrm{sign}\big(\nabla_{x} \mathcal{L}(f_{\theta}(x), y)\big).$$

This systematic approach of employing both $\mathcal{D}_{aug}$ (including $\mathcal{D}_{noise}$ and $\mathcal{D}_{warp}$) and $\mathcal{D}$ to generate $\mathcal{D}_{adv}$ ensures the model is comprehensively evaluated against synthetic distortions as well as baseline conditions. Such rigorous testing is essential for verifying the model's operational reliability across diverse and potentially disruptive conditions.
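The gradient-sign step can be illustrated on a toy model. The sketch below applies the fast gradient sign method to a logistic-regression classifier with fixed, hand-picked weights; the model and all numeric values are our own illustrative assumptions, not components of ATASML.

```python
import numpy as np

def fgsm(x, y, w, b, eps=0.1):
    """Fast gradient sign method for a logistic model p = sigmoid(w.x + b):
    perturb x along the sign of the input gradient of the cross-entropy
    loss, which maximally increases the loss within an L-inf ball."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    grad_x = (p - y) * w  # d(cross-entropy)/dx for this model
    return x + eps * np.sign(grad_x)

def xent_loss(x, y, w, b):
    """Binary cross-entropy of the logistic model at input x."""
    p = 1.0 / (1.0 + np.exp(-(x @ w + b)))
    return -(y * np.log(p) + (1 - y) * np.log(1 - p))

rng = np.random.default_rng(1)
w, b = np.array([1.0, -2.0, 0.5]), 0.0   # illustrative fixed weights
x, y = rng.normal(size=3), 1.0           # one "normal-state" sample
x_adv = fgsm(x, y, w, b, eps=0.1)
print(xent_loss(x, y, w, b), xent_loss(x_adv, y, w, b))  # adversarial loss is higher
```

Each component of `x_adv` differs from `x` by at most `eps`, so the perturbation stays small while the model's loss strictly increases, which is exactly the property that makes such samples useful as hard training tasks.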
2.4. ATASML Algorithmic Procedure
The ATASML framework is designed to enhance fault diagnosis through a systematic approach incorporating meta-learning and adversarial training, which improve model robustness and adaptability.
Refer to Figure 3 for a visual representation of the ATASML algorithm's architecture, which illustrates the critical components and their interactions within the framework.
Parameter Initialization: Begin with a random initialization of the model parameters $\theta$, which helps avoid poor local minima and promotes convergence towards good solutions.

Task Sampling: Sample a set of tasks $\{T_i\}$ from the task distribution $p(\mathcal{T})$. This step ensures that the model is trained across a diverse set of scenarios, enhancing its generalization capabilities.

Data Augmentation: Utilize the pre-augmented dataset $\mathcal{D}_{aug}$ to select augmented tasks $T_{aug}$. This dataset already includes transformations applied via Gaussian noise and temporal warping, directly enhancing the data's variability and complexity.

Adversarial Task Generation: Generate adversarial tasks $T_{adv}$ using the dataset $\mathcal{D}_{adv}$, which has been specifically prepared to challenge the model's robustness under simulated adversarial conditions.

Task Combination and Difficulty Assessment: Combine the original tasks $T_i$ from $p(\mathcal{T})$, the augmented tasks $T_{aug}$ and the adversarial tasks $T_{adv}$ into composite tasks $T_{comb}$. Assess the difficulty $d(T_{comb})$ of these composite tasks to ensure the model can handle varying levels of challenge and complexity; this assessment aids in tuning the model's sensitivity to subtle and extreme variations alike.

Wasserstein Distance Calculation: Compute the Wasserstein distance $W(T_{comb}, p(\mathcal{T}))$ to measure how closely the composite tasks align with the original task distribution, aiding in maintaining the integrity of the model's training process.

Task Ranking: Rank the tasks based on the assessed difficulty and the Wasserstein distance, prioritizing the tasks that will most effectively enhance the model's learning.

Support and Query Set Sampling: From each ranked task $T_{comb,i}$, extract the support and query sets $S_i$ and $Q_i$, respectively. These sets are used to tune the model parameters to each task.

Model Optimization: Update the model parameters on the support set, $\theta_i' = \theta - \alpha \nabla_{\theta} \mathcal{L}_{S_i}(f_{\theta})$, and evaluate the performance on the query set $Q_i$. This step is vital for iterative learning and adaptation of the model.

Global Parameter Update: Perform a global update of the model parameters using the aggregated losses from the query sets across all tasks, $\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_i \mathcal{L}_{Q_i}(f_{\theta_i'})$. This final step ensures the model is refined and ready for deployment.
The systematic methodology employed by the ATASML framework ensures improved diagnostic capabilities across a variety of operational conditions, establishing a robust model well-suited for dynamic environments.
Refer to Algorithm 1 for a complete overview of the procedural steps involved in the ATASML framework. This algorithm highlights the integrated approach to utilizing adversarial and meta-learning techniques to enhance the adaptability and accuracy of fault-diagnosis systems.
Algorithm 1 Adversarial Task Augmented Sequential Meta-Learning (ATASML)

Require: $p(\mathcal{T})$: distribution over tasks; $\alpha$, $\beta$: step-size hyperparameters for inner- and outer-loop optimization.
1: Initialize model parameters $\theta$ randomly.
2: while not converged do
3:   Sample a batch of tasks $\{T_i\}$ from $p(\mathcal{T})$.
4:   for each $T_i$ do
5:     Augment $T_i$ to $T_{aug,i}$ using samples from $\mathcal{D}_{aug}$.
6:     Generate adversarial tasks $T_{adv,i}$ using samples from $\mathcal{D}_{adv}$.
7:     Combine $T_i$, $T_{aug,i}$ and $T_{adv,i}$ into $T_{comb,i}$.
8:     Assess difficulty $d_i$ and compute the Wasserstein distance $W_i$ to the original task distribution.
9:   end for
10:  Rank all tasks $\{T_{comb,i}\}$ based on $d_i$ and $W_i$.
11:  for each ranked task $T_{comb,i}$ do
12:    Sample support set $S_i$ and query set $Q_i$ from $T_{comb,i}$.
13:    Optimize $\theta$ on $S_i$ using gradient descent: $\theta_i' = \theta - \alpha \nabla_{\theta} \mathcal{L}_{S_i}(f_{\theta})$.
14:    Calculate the loss on $Q_i$ with the updated parameters: $\mathcal{L}_{Q_i}(f_{\theta_i'})$.
15:  end for
16:  Update global parameters: $\theta \leftarrow \theta - \beta \nabla_{\theta} \sum_i \mathcal{L}_{Q_i}(f_{\theta_i'})$.
17: end while
Ensure: Optimized model parameters $\theta$ for generalized fault diagnosis.
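The difficulty assessment and Wasserstein-based task ranking in Algorithm 1 can be sketched as follows. The paper does not specify its difficulty measure or the weighting between the two criteria, so sample variance and an equal weighting are used here purely as illustrative stand-ins.

```python
import numpy as np

def wasserstein_1d(a, b):
    """Wasserstein-1 distance between two equal-size 1-D empirical samples:
    the mean absolute difference of their sorted values."""
    return np.mean(np.abs(np.sort(a) - np.sort(b)))

def rank_tasks(tasks, reference, w_difficulty=0.5):
    """Rank tasks from easy/near-distribution to hard/far by a weighted sum
    of a difficulty proxy (here: sample variance, an assumption) and the
    Wasserstein distance to a reference sample of the task distribution."""
    scores = []
    for t in tasks:
        d = np.var(t)                         # difficulty proxy (illustrative)
        wdist = wasserstein_1d(t, reference)  # closeness to original tasks
        scores.append(w_difficulty * d + (1.0 - w_difficulty) * wdist)
    return [tasks[i] for i in np.argsort(scores)]

# Toy usage: three synthetic task samples with increasing spread
rng = np.random.default_rng(0)
reference = rng.normal(0.0, 1.0, 256)
tasks = [rng.normal(0.0, s, 256) for s in (1.0, 3.0, 2.0)]
ranked = rank_tasks(tasks, reference)
# ranked proceeds from the task closest to the reference to the farthest
```

Presenting the ranked tasks in this easy-to-hard order is what realizes the sequential (curriculum-style) aspect of the training loop.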
4. Discussion
The Adversarial Task Augmented Sequential Meta-Learning (ATASML) framework has been rigorously tested across complex industrial datasets, such as the Tennessee Eastman Process (TEP) and Skoltech Anomaly Benchmark (SKAB), demonstrating substantial improvements in fault-diagnosis capabilities. ATASML is particularly effective in few-shot learning scenarios that often challenge traditional models due to limited data availability.
A key innovation of ATASML is its use of adversarial examples integrated within a meta-learning structure, which significantly enhances the model's ability to generalize from limited samples. Traditional approaches such as MAML and GOPML, while robust, do not specialize in rapid adaptation to new and complex tasks. ATASML's strategic task sequencing refines this capability by arranging learning tasks to optimize both the accuracy and speed of the learning process, while adversarially augmented data challenges the model effectively under diverse conditions.
To underscore the efficacy of the ATASML framework, this study compares it with renowned deep learning models such as VGG-11 and ResNet-18, which are benchmarked for their ability to generalize from extensively pretrained features to new, unseen fault conditions. These models are thoroughly adapted to specific fault-diagnosis tasks by fine-tuning on targeted datasets. Our experiments concentrate on demonstrating that the ATASML framework, when subjected to comprehensive network fine-tuning, achieves superior accuracy compared to merely fine-tuning the classifiers, particularly under few-shot learning conditions. Moreover, to ensure a fair comparison across all evaluated methods, an N-way K-shot setup was employed. This setup not only maintains the integrity of the experimental comparisons but also aligns with the research paradigms of few-shot learning, allowing for a precise assessment of the ATASML framework’s performance in complex fault-diagnosis tasks.
However, the framework’s computational intensity, particularly when processing large, complex datasets, highlights an area for potential improvement. Future work will focus on optimizing the computational efficiency of ATASML to enhance its scalability and practical applicability in industrial settings. This will involve refining the adversarial training components and exploring more efficient ways to implement task sequencing that reduces computational demands while maintaining high diagnostic accuracy.
5. Conclusions
The deployment of the Adversarial Task Augmented Sequential Meta-Learning (ATASML) framework marks a significant advancement in the field of intelligent fault-diagnosis systems. Through comprehensive testing across the Tennessee Eastman Process (TEP) and Skoltech Anomaly Benchmark (SKAB) datasets, ATASML has demonstrated notable improvements in diagnostic accuracy and model robustness, particularly under few-shot learning conditions.
Superior Generalization: ATASML achieves a diagnostic accuracy up to 98.47% in 3-way 6-shot scenarios on the TEP dataset and maintains strong performance in more complex 8-way settings with an accuracy of up to 90.13%. These results are significantly higher than those achieved by traditional models such as MAML and GOPML, illustrating ATASML’s robust adaptability to varied fault conditions.
Enhanced Diagnostic Precision: In SKAB dataset evaluations, ATASML consistently outperformed comparisons, achieving as high as 94.79% accuracy in 3-way 6-shot settings. This precision underlines the framework’s capability to effectively handle even the subtlest anomalies in industrial environments.
Efficient Learning Process: The strategic integration of adversarial learning and task sequencing in ATASML not only accelerates the learning process but also enhances the precision and reliability of fault diagnostics across diverse operational scenarios.
Conclusively, ATASML sets a new benchmark in fault diagnosis by effectively learning from limited data and adapting swiftly to new, intricate operational scenarios. These capabilities are critical in reducing operational downtime and maintenance costs, thereby enhancing safety by mitigating risks associated with delayed or missed fault detections.
Moving forward, the focus will be on further enhancing the computational efficiency of the ATASML framework to facilitate its scalability and broader industrial application. Additional research will explore the integration of more advanced adversarial techniques and the expansion of the framework to accommodate a broader spectrum of industrial conditions, potentially including real-time learning scenarios and the development of more generalized models that can perform across various industries with minimal adjustments.