MFGAN: Multimodal Fusion for Industrial Anomaly Detection Using Attention-Based Autoencoder and Generative Adversarial Network
Abstract
1. Introduction
1.1. Research Background
1.2. Motivations
- (1) Recently, with the emergence of transformers [1] and attention mechanisms, such techniques have also been explored for anomaly detection. However, work on integrating attention mechanisms with an autoencoder (AE) remains limited. Current studies predominantly apply a transformer or one of its variants directly [2,3,4], overlooking the potential of attention mechanisms within an AE. In particular, conventional AEs have difficulty capturing crucial contextual information when handling temporal data. Introducing attention mechanisms is a promising way to address this shortcoming in feature extraction: it enables the model to focus on the pivotal segments of a sequence, thereby enhancing its comprehension and modeling capabilities across diverse data modalities. Integrating attention mechanisms into an AE therefore holds promise for anomaly detection. Hence, one of the motivations behind this study is to explore how attention mechanisms can enhance the performance of AEs in anomaly detection by addressing their limitations in extracting essential sequence information.
- (2) Traditional methods for anomaly detection mainly use unimodal data from a single type of sensor or device. However, with the advent of Industry 4.0 and the rapid development of the Internet of Things (IoT), the data generated in modern industrial production have become increasingly diverse and complex. In the framework of Industry 4.0, the Industrial Internet provides real-time, high-throughput, multi-source time-series data. Such data come from a variety of sensors, devices, and production systems, and include multiple modalities such as temperature, pressure, vibration, current, acoustic signals, and images. Because of the sheer volume and complexity of multimodal data, traditional anomaly detection methods that rely on a single data source cannot fully utilize such data and are often unable to respond promptly. In recent years, with the rise of cross-domain data fusion, multimodal learning has been widely used to integrate information in multiple modalities from multiple sources to improve the performance and effectiveness of learning models for various tasks. For anomaly detection, however, multimodal learning has been considered mainly in the image domain, such as multimodal anomaly detection for 3D point clouds and RGB images [5]. Therefore, another motivation behind this study is to address the research gap in utilizing multimodal data for time-series anomaly detection in real industrial settings. Towards this goal, we propose a novel multimodal learning approach to anomaly detection for the core equipment of cement production, namely, the roller press.
1.3. Contributions
- (1) We propose a method that utilizes attention mechanisms in an AE, which extracts the sequence context information from the original temporal data and improves the model’s feature extraction for each modality.
- (2) We design a multimodal feature fusion module, which fuses multiple features such as temperature, vibration, and acoustic signals to improve the accuracy of anomaly detection. This module improves the model’s generalization ability, robustness, and applicability to a wide spectrum of practical scenarios.
- (3) We collect multimodal data in a real-life industrial environment and experimentally demonstrate the performance superiority of the proposed model over existing methods for anomaly detection.
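As a rough illustration of the idea behind contribution (1), the sketch below applies scaled dot-product self-attention [1] to a short window of sensor readings, so that each time step is re-expressed as a weighted mix of all steps in the window. It is a generic, parameter-free form in which queries, keys, and values are all the raw input; the function names and the toy window are assumptions made for illustration and do not reproduce the MFGAN encoder.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(seq):
    """Scaled dot-product self-attention with Q = K = V = seq.

    seq is a list of T time steps, each a list of d feature values.
    Returns (context, weights): context[t] is the attention-weighted
    average of all time steps; weights[t] holds the T attention weights.
    """
    d = len(seq[0])
    scale = math.sqrt(d)
    context, weights = [], []
    for q in seq:
        # Similarity of this time step to every time step in the window.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / scale for k in seq]
        w = softmax(scores)
        weights.append(w)
        context.append([sum(w[t] * seq[t][j] for t in range(len(seq)))
                        for j in range(d)])
    return context, weights

# Toy window: 4 time steps, 2 features (hypothetical values).
window = [[0.1, 0.2], [0.0, 0.3], [0.9, 0.8], [0.1, 0.1]]
context, weights = self_attention(window)
```

In a learned model, the queries, keys, and values would be produced by trainable projections; the sketch omits them to keep the weighting step visible.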
2. Related Work
2.1. Traditional Methods for Anomaly Detection
- (1) Density estimation-based methods. Local outlier factor (LOF) [6] is a widely used outlier detection algorithm, while connectivity-based outlier factor (COF) [7] introduces the concept of connectivity on top of LOF, which allows the algorithm to handle high-dimensional data and different types of data. DAGMM [8] combines a Gaussian mixture model (GMM) [9] with neural networks for anomaly detection.
- (2) Reconstruction-based methods. These methods use normal data to train a model and detect anomalies by comparing the original data with the reconstructed data. Park et al. [10] proposed an LSTM-VAE model, which uses long short-term memory (LSTM) [11] to extract features from the original temporal data and uses a variational autoencoder (VAE) [12] to reconstruct the hidden features. Su et al. [13] also proposed a VAE model, but they utilized a gated recurrent unit (GRU) [14] to extract latent features. TimesNet [15] employs a fast Fourier transform to convert time-series data from one-dimensional to two-dimensional, and then applies classical computer vision models for anomaly detection. In addition, generative adversarial networks (GANs) [16] have also been applied to anomaly detection. AnoGAN [17] was the first to use a GAN for anomaly detection, with a simple DCGAN [18] structure. It learns to generate normal images by feeding noise vectors to a generator built from transposed convolutional layers. However, AnoGAN still requires an iterative optimization for every sample during the testing phase. To address this shortcoming, Zenati et al. [19] proposed a BiGAN-based [20] approach, which greatly accelerates the processing speed of the model by adding an encoder that maps the input sample to the latent representation during training. f-AnoGAN [21], on the other hand, trains the model in two phases: the GAN is trained only in the first phase, and in the second phase, the GAN is frozen and only the encoder is trained. The AE structure is formed by the generator, acting as a decoder, together with the encoder. BeatGAN [22] reconstructs temporal data with a GAN whose generator is an AE, and uses adversarial regularization to further improve the model performance. DAEMON [23] uses the VAE as a generator and uses two discriminators as regularization terms to avoid overfitting. MAD-GAN [24] attempts to capture the temporal correlation of time-series data distributions using an LSTM-RNN as the underlying framework for the GAN. In TadGAN [25], the authors pointed out that the original formulation using the standard adversarial loss suffers from gradient instability and mode collapse, and introduced a cycle consistency loss that trains the generator by minimizing the L2 norm of the difference between the original and reconstructed samples.
- (3) Prediction-based methods. These methods use an autoregressive model to model time-series data and detect anomalies by comparing actual observations with model predictions. Autoregressive integrated moving average (ARIMA) [26] builds an autoregressive model to predict future observations and decides whether a time point is anomalous from the prediction error at that point. Hundman et al. [27] used an LSTM instead of an autoregressive model. DeepAnT [28] uses a convolutional neural network (CNN) structure for time-series prediction. MARINA [29] considers temporal correlation and spatial correlation for time-series prediction.
- (4) Transformer and its variants. These methods use the transformer structure [1] to efficiently capture long-term dependencies and complex patterns in temporal data for anomaly detection. Anomaly Transformer [2] introduces the anomaly-attention mechanism and utilizes association discrepancy for anomaly detection. Wu et al. [3] proposed Autoformer, whose auto-correlation mechanism captures time-series dependencies based on series periodicity. FEDformer [4] uses a frequency-enhanced approach to improve seasonal-trend decomposition for temporal data.
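To make the prediction-based family in item (3) concrete, the following minimal sketch fits a first-order autoregressive model on normal data, scores new points by their one-step-ahead prediction error, and flags points whose error exceeds a threshold derived from the training errors. The AR(1) model, the mean-plus-three-standard-deviations threshold, and all data values are illustrative assumptions; none of the cited methods is reproduced here.

```python
import statistics

def fit_ar1(series):
    """Least-squares fit of x[t] ~ a * x[t-1] + b on a normal series."""
    xs, ys = series[:-1], series[1:]
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    var = sum((x - mx) ** 2 for x in xs)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    a = cov / var
    b = my - a * mx
    return a, b

def prediction_errors(series, a, b):
    """Absolute one-step-ahead prediction errors for t = 1 .. len-1."""
    return [abs(series[t] - (a * series[t - 1] + b))
            for t in range(1, len(series))]

# Toy "normal" training data fluctuating around 1.0 (hypothetical values).
train = [1.0, 1.2, 0.9, 1.1, 1.0, 0.8, 1.1, 1.0, 0.9, 1.2, 1.0, 1.1]
a, b = fit_ar1(train)

# Threshold chosen from the spread of errors on normal data (an assumption).
train_err = prediction_errors(train, a, b)
threshold = statistics.mean(train_err) + 3 * statistics.pstdev(train_err)

# Test series with an injected spike at index 3.
test = [1.0, 1.1, 0.9, 3.0, 1.0, 1.1]
test_err = prediction_errors(test, a, b)
flagged = [t + 1 for t, e in enumerate(test_err) if e > threshold]
```

Because each prediction is conditioned on the previous observation, the point immediately after a spike may also be flagged; reconstruction-based methods score a whole window instead and avoid this particular side effect.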
2.2. Multimodal Methods for Anomaly Detection
3. Proposed Method
3.1. Network Structure
3.2. Data Preprocessing
3.2.1. Data Format
3.2.2. Data Normalization
3.3. Encoder Module
3.4. Feature Fusion (FF) Module
3.5. Decoder Module
3.6. Discriminator Module
Algorithm 1. Training algorithm
4. Performance Evaluation
4.1. Datasets
4.2. Evaluation Metrics
4.3. Baseline Methods
- AE [36]. The autoencoder is a classical reconstruction-based anomaly detection method with the same network structure as . The parameters used in the AE are the same as those in the MFGAN.
- VAE [12]. A variational autoencoder, which combines an autoencoder and variational inference, can learn the underlying distribution of the data and generate new samples that are similar to the original data. The parameters used in the VAE are the same as those in the MFGAN.
- BeatGAN [22]. BeatGAN is a reconstruction-based model that uses adversarial generation methods to reconstruct the data. The network structure of its generator and discriminator is similar to that of the MFGAN. The parameters used in BeatGAN are the same as those in the MFGAN.
- TimesNet [15]. It converts a one-dimensional time series into a set of multi-period, two-dimensional tensors by Fourier transform, and then extracts local features for anomaly detection using a classical method in the image field. Since the TimesNet model has a different architecture than AE, VAE, BeatGAN, and MFGAN, it uses default parameters.
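The one-dimensional to two-dimensional conversion mentioned for TimesNet can be sketched as follows: estimate a dominant period from the amplitude spectrum of the series, then fold the series into rows of one period each. This is only a simplified illustration that keeps a single period and truncates the tail, whereas TimesNet [15] works with several periods; the function names `dominant_period` and `fold` are hypothetical, and the naive DFT stands in for a fast Fourier transform.

```python
import cmath
import math

def dominant_period(series):
    """Return the period of the strongest non-constant DFT component."""
    n = len(series)
    best_k, best_amp = 1, -1.0
    for k in range(1, n // 2 + 1):
        # Naive DFT coefficient at frequency index k.
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(series))
        if abs(coeff) > best_amp:
            best_k, best_amp = k, abs(coeff)
    return n // best_k

def fold(series, period):
    """Reshape a 1D series into rows of one period each (tail truncated)."""
    rows = len(series) // period
    return [series[r * period:(r + 1) * period] for r in range(rows)]

# Toy signal: a sine wave with a period of 8 samples, 32 samples long.
signal = [math.sin(2 * math.pi * t / 8) for t in range(32)]
period = dominant_period(signal)
grid = fold(signal, period)
```

In the folded grid, each row covers one cycle and each column collects the same phase across cycles, which is what allows two-dimensional feature extractors from the image field to be applied.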
4.4. Comparative Experiments
4.4.1. Implementation Details
4.4.2. Experimental Results
4.4.3. Time Efficiency
4.5. Ablation Study
5. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
1. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. arXiv 2017, arXiv:1706.03762.
2. Xu, J.; Wu, H.; Wang, J.; Long, M. Anomaly transformer: Time series anomaly detection with association discrepancy. arXiv 2021, arXiv:2110.02642.
3. Wu, H.; Xu, J.; Wang, J.; Long, M. Autoformer: Decomposition transformers with auto-correlation for long-term series forecasting. Adv. Neural Inf. Process. Syst. 2021, 34, 22419–22430.
4. Zhou, T.; Ma, Z.; Wen, Q.; Wang, X.; Sun, L.; Jin, R. Fedformer: Frequency enhanced decomposed transformer for long-term series forecasting. In Proceedings of the International Conference on Machine Learning, PMLR, Baltimore, MD, USA, 17–23 July 2022; pp. 27268–27286.
5. Wang, Y.; Peng, J.; Zhang, J.; Yi, R.; Wang, Y.; Wang, C. Multimodal Industrial Anomaly Detection via Hybrid Fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 8032–8041.
6. Breunig, M.M.; Kriegel, H.P.; Ng, R.T.; Sander, J. LOF: Identifying density-based local outliers. In Proceedings of the 2000 ACM SIGMOD International Conference on Management of Data, Dallas, TX, USA, 14–18 May 2000; pp. 93–104.
7. Tang, J.; Chen, Z.; Fu, A.W.C.; Cheung, D.W. Enhancing effectiveness of outlier detections for low density patterns. In Proceedings of the Advances in Knowledge Discovery and Data Mining: 6th Pacific-Asia Conference, PAKDD 2002, Taipei, Taiwan, 6–8 May 2002; Proceedings 6. Springer: Berlin/Heidelberg, Germany, 2002; pp. 535–548.
8. Zong, B.; Song, Q.; Min, M.R.; Cheng, W.; Lumezanu, C.; Cho, D.; Chen, H. Deep autoencoding gaussian mixture model for unsupervised anomaly detection. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018.
9. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 1977, 39, 1–22.
10. Park, D.; Hoshi, Y.; Kemp, C.C. A multimodal anomaly detector for robot-assisted feeding using an lstm-based variational autoencoder. IEEE Robot. Autom. Lett. 2018, 3, 1544–1551.
11. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
12. Kingma, D.P.; Welling, M. Auto-encoding variational bayes. arXiv 2013, arXiv:1312.6114.
13. Su, Y.; Zhao, Y.; Niu, C.; Liu, R.; Sun, W.; Pei, D. Robust anomaly detection for multivariate time series through stochastic recurrent neural network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2828–2837.
14. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
15. Wu, H.; Hu, T.; Liu, Y.; Zhou, H.; Wang, J.; Long, M. TimesNet: Temporal 2D-Variation Modeling for General Time Series Analysis. arXiv 2022, arXiv:2210.02186.
16. Goodfellow, I.; Pouget-Abadie, J.; Mirza, M.; Xu, B.; Warde-Farley, D.; Ozair, S.; Courville, A.; Bengio, Y. Generative adversarial networks. Commun. ACM 2020, 63, 139–144.
17. Schlegl, T.; Seeböck, P.; Waldstein, S.M.; Schmidt-Erfurth, U.; Langs, G. Unsupervised anomaly detection with generative adversarial networks to guide marker discovery. In Proceedings of the International Conference on Information Processing in Medical Imaging, Boone, NC, USA, 25–30 June 2017; Springer: Cham, Switzerland, 2017; pp. 146–157.
18. Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. arXiv 2015, arXiv:1511.06434.
19. Zenati, H.; Foo, C.S.; Lecouat, B.; Manek, G.; Chandrasekhar, V.R. Efficient gan-based anomaly detection. arXiv 2018, arXiv:1802.06222.
20. Donahue, J.; Krähenbühl, P.; Darrell, T. Adversarial feature learning. arXiv 2016, arXiv:1605.09782.
21. Schlegl, T.; Seeböck, P.; Waldstein, S.M.; Langs, G.; Schmidt-Erfurth, U. f-AnoGAN: Fast unsupervised anomaly detection with generative adversarial networks. Med. Image Anal. 2019, 54, 30–44.
22. Zhou, B.; Liu, S.; Hooi, B.; Cheng, X.; Ye, J. BeatGAN: Anomalous Rhythm Detection using Adversarially Generated Time Series. In Proceedings of the IJCAI, Macao, China, 10–16 August 2019; Volume 2019, pp. 4433–4439.
23. Chen, X.; Deng, L.; Huang, F.; Zhang, C.; Zhang, Z.; Zhao, Y.; Zheng, K. Daemon: Unsupervised anomaly detection and interpretation for multivariate time series. In Proceedings of the 2021 IEEE 37th International Conference on Data Engineering (ICDE), Chania, Greece, 19–22 April 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 2225–2230.
24. Li, D.; Chen, D.; Jin, B.; Shi, L.; Goh, J.; Ng, S.K. MAD-GAN: Multivariate anomaly detection for time series data with generative adversarial networks. In Proceedings of the Artificial Neural Networks and Machine Learning–ICANN 2019: Text and Time Series: 28th International Conference on Artificial Neural Networks, Munich, Germany, 17–19 September 2019; Proceedings, Part IV. Springer: Cham, Switzerland, 2019; pp. 703–716.
25. Geiger, A.; Liu, D.; Alnegheimish, S.; Cuesta-Infante, A.; Veeramachaneni, K. Tadgan: Time series anomaly detection using generative adversarial networks. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 33–43.
26. Box, G.E.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken, NJ, USA, 2015.
27. Hundman, K.; Constantinou, V.; Laporte, C.; Colwell, I.; Soderstrom, T. Detecting spacecraft anomalies using lstms and nonparametric dynamic thresholding. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 387–395.
28. Munir, M.; Siddiqui, S.A.; Dengel, A.; Ahmed, S. DeepAnT: A deep learning approach for unsupervised anomaly detection in time series. IEEE Access 2018, 7, 1991–2005.
29. Xie, J.; Cui, Y.; Huang, F.; Liu, C.; Zheng, K. MARINA: An MLP-Attention Model for Multivariate Time-Series Analysis. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 2230–2239.
30. Baltrušaitis, T.; Ahuja, C.; Morency, L.P. Multimodal machine learning: A survey and taxonomy. IEEE Trans. Pattern Anal. Mach. Intell. 2018, 41, 423–443.
31. Wu, L.; Oviatt, S.L.; Cohen, P.R. Multimodal integration—A statistical view. IEEE Trans. Multimed. 1999, 1, 334–341.
32. Guo, W.; Wang, J.; Wang, S. Deep multimodal representation learning: A survey. IEEE Access 2019, 7, 63373–63394.
33. Zhao, Z.; Bai, H.; Zhang, J.; Zhang, Y.; Xu, S.; Lin, Z.; Timofte, R.; Van Gool, L. Cddfuse: Correlation-driven dual-branch feature decomposition for multi-modality image fusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 5906–5916.
34. Zhao, J.; Zhao, Y.; Li, J. M3tr: Multi-modal multi-label recognition with transformer. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual, 20–24 October 2021; pp. 469–477.
35. Ju, X.; Zhang, D.; Li, J.; Zhou, G. Transformer-based label set generation for multi-modal multi-label emotion detection. In Proceedings of the 28th ACM International Conference on Multimedia, Seattle, WA, USA, 12–16 October 2020; pp. 512–520.
36. Hinton, G.E.; Zemel, R. Autoencoders, minimum description length and Helmholtz free energy. Adv. Neural Inf. Process. Syst. 1993, 6, 3–10.
Features |
---|
Non-drive-side bearing temperature |
Drive-side bearing temperature |
Main motor non-drive-side bearing temperature |
Main motor drive-side bearing temperature |
Main motor stator temperature |
Reducer oil temperature |
Gearbox shaft temperature |
Vibration of reducer |
Model Parameters | Value |
---|---|
Training iterations | 50 |
Batch size | 64 |
Sequence length | 320 |
Step length | 160 |
Optimizer | Adam |
Momentums of Adam | 0.9 (β₁), 0.99 (β₂) |
Learning rate | 0.0002 |
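The sequence length and step length in the table above suggest that training samples are cut from the raw series with overlapping windows. A minimal sketch of such a segmentation is given below, under the assumption that "step length" means the offset between the starts of consecutive windows; the function name `sliding_windows` and the placeholder series are illustrative only.

```python
def sliding_windows(series, seq_len=320, step=160):
    """Cut a series into fixed-length windows taken every `step` samples.

    Windows that would run past the end of the series are dropped.
    """
    return [series[start:start + seq_len]
            for start in range(0, len(series) - seq_len + 1, step)]

# Placeholder for one sensor channel with 1000 samples.
readings = list(range(1000))
windows = sliding_windows(readings)
```

With a step of half the sequence length, consecutive windows overlap by 50%, so every sample away from the boundaries appears in two windows.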
Model | F1 | Precision | Recall | Accuracy |
---|---|---|---|---|
AE | 0.7061 ± 0.0138 | 0.6531 ± 0.0155 | 0.7962 ± 0.0154 | 0.7284 ± 0.0109 |
VAE | 0.7369 ± 0.0021 | 0.8421 ± 0.0026 | 0.6657 ± 0.0033 | 0.7685 ± 0.0018 |
BeatGAN | 0.7775 ± 0.0069 | 0.7895 ± 0.0065 | 0.7706 ± 0.0073 | 0.7993 ± 0.0054 |
TimesNet | 0.9148 ± 0.0044 | 0.8637 ± 0.0059 | 0.9806 ± 0.0048 | 0.9271 ± 0.0055 |
MFGAN | 0.9685 ± 0.0046 | 0.9832 ± 0.0051 | 0.9569 ± 0.0057 | 0.9824 ± 0.0047 |
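For reference, the four metrics reported in the table above have standard definitions in terms of true/false positives and negatives, with anomalies treated as the positive class. The sketch below computes them from binary labels; the labels are made-up toy values, and any additional adjustment the experiments may apply before scoring is not modeled here.

```python
def binary_metrics(y_true, y_pred):
    """Precision, recall, F1, and accuracy for binary labels (1 = anomaly)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    accuracy = (tp + tn) / len(y_true)
    return {"precision": precision, "recall": recall,
            "f1": f1, "accuracy": accuracy}

# Toy labels: 3 true positives, 1 false negative, 1 false positive, 3 true negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
metrics = binary_metrics(y_true, y_pred)
```

F1 is the harmonic mean of precision and recall, which is why a model with very high recall but moderate precision, such as TimesNet in the table, can still trail a more balanced model on F1.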
AM | FF Module | Discriminator | F1 | Precision | Recall |
---|---|---|---|---|---|
 | | | 0.7086 ± 0.0126 | 0.6578 ± 0.0152 | 0.7967 ± 0.0143 |
🗸 | 🗸 | 0.8329 ± 0.0053 | 0.7525 ± 0.0058 | 0.9347 ± 0.0061 | |
🗸 | 🗸 | 0.8513 ± 0.0059 | 0.8315 ± 0.0062 | 0.8639 ± 0.0067 | |
🗸 | 🗸 | 0.8729 ± 0.0086 | 0.7802 ± 0.0091 | 0.9836 ± 0.0088 | |
🗸 | 🗸 | 🗸 | 0.9685 ± 0.0046 | 0.9832 ± 0.0051 | 0.9569 ± 0.0057 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Qu, X.; Liu, Z.; Wu, C.Q.; Hou, A.; Yin, X.; Chen, Z. MFGAN: Multimodal Fusion for Industrial Anomaly Detection Using Attention-Based Autoencoder and Generative Adversarial Network. Sensors 2024, 24, 637. https://doi.org/10.3390/s24020637