Applied Sciences
  • Article
  • Open Access

20 February 2024

Pruning Quantized Unsupervised Meta-Learning DegradingNet Solution for Industrial Equipment and Semiconductor Process Anomaly Detection and Prediction

1 College of Electrical and Computer Engineering, National Yang Ming Chiao Tung University, Hsinchu City 300, Taiwan
2 Advance Tech Co., Taipei City 114, Taiwan
3 Department of Electronics Engineering, National Yang Ming Chiao Tung University, Hsinchu City 300, Taiwan
* Authors to whom correspondence should be addressed.

Abstract

Machine- and deep-learning methods are used for industrial applications in prognostics and health management (PHM) for semiconductor processing and equipment anomaly detection to achieve proactive equipment maintenance and prevent process interruptions or equipment downtime. This study proposes a Pruning Quantized Unsupervised Meta-learning DegradingNet Solution (PQUM-DNS) for the fast training and retraining of models for new equipment or processes with limited data, enabling anomaly detection and the prediction of various equipment and process conditions. Experiments are conducted on real data from a factory chiller host motor, the Paderborn current and vibration open dataset, and the SECOM semiconductor open dataset, and averaged results are reported. Compared to conventional deep autoencoders, PQUM-DNS reduces the average data volume required for rapid training and retraining by about 75% while achieving a similar AUC. The average RMSE of the predicted degradation degree is 0.037 for Holt–Winters, and the model size is reduced by about 60% through pruning and quantization, allowing deployment on edge devices such as a Raspberry Pi. This makes the proposed PQUM-DNS well suited for intelligent equipment management and maintenance in industrial applications.

1. Introduction

Research related to prognostics and health management (PHM) indicates that anomaly detection and prediction are important approaches to monitoring equipment faults and semiconductor process abnormalities. However, this type of detection often relies on subjective assessments performed by operators with prior experience. Automating the detection of equipment faults or semiconductor process anomalies is essential for reliable predictive maintenance and can potentially eliminate the need for manual monitoring. Moreover, interconnected intelligent monitoring systems play a crucial role in Industry 4.0, which focuses on artificial intelligence (AI)-driven factory automation.
In recent years, various deep-learning techniques have been introduced for anomaly detection and prediction in PHM, focusing on equipment faults (vibration and current anomalies) and semiconductor processes [1,2,3,4,5,6,7,8,9,10].
The related techniques include dense autoencoders (AEs) [11], convolutional AEs [12], and pretrained convolutional neural networks [13]. Although these deep-learning approaches exhibit excellent performance in anomaly detection, their widespread adoption in real factory settings remains limited. One major reason for the slow adoption is the high computational resource requirements of many deep-learning-driven anomaly detection methods. These methods often lack practical considerations and system integration, ignoring factors such as multi-machine or multi-production line deployment, model retraining, optimization, and systematic integration. Consequently, their applicability in real factory environments is hindered.
Many studies have been conducted in the field of PHM. Pradeep et al. proposed that machine-learning techniques could be used to predict wafer defects with a random forest classifier, achieving an accuracy of over 93.62% [14]. This predictive maintenance approach enhanced the semiconductor manufacturing productivity. Nuhu et al. introduced synthetic data generation techniques that combined two missing value imputation methods and feature selection techniques [15]. This approach achieved an accuracy ranging from 99.5% to 100% when paired with the proposed machine-learning (ML) methods. Mao et al. introduced a novel deep AE (DAE) method that fused discriminative information with a gradient descent optimization approach [16]. This technique enhanced the numerical stability of the model in cases with limited training data. Abbasi et al. presented a series of highly compact deep convolutional AE network architectures that reduced the model size while maintaining a detection accuracy comparable to that of structures with over four million parameters [17]. Givnan et al. proposed an ML method for modeling and detecting anomalies during the operation of rotating machinery. This ML approach learned and generalized based on the fault severity to generate threshold values for anomaly detection [18].
A DAE model specifically designed for factory scenarios involving chillers was introduced that effectively distinguished between normal and abnormal vibration signals based on reconstruction differences [19]. Additionally, meta-learning was employed to improve the accuracy of new sensor models trained with limited vibration data, increasing accuracy by about 33.50%. However, this method is mainly oriented toward anomaly detection; it does not consider model retraining, anomaly prediction, lightweight models, edge computing, or integration with a complete intelligent management system.
Considering the aforementioned issues, this study proposes a Pruning Quantized Unsupervised Meta-learning DegradingNet Solution (PQUM-DNS) to address the real-world conditions of practical factories. This approach integrates five key features based on the actual needs of factories, as illustrated in Figure 1.
Figure 1. Overview of the proposed Intelligent Equipment Management System with related techniques highlighted in this paper.
(1)
Intelligent Equipment Management System (Section 3.1 and Section 4.1)
This system includes automated methods for vibration signal sensing, data transmission, data preprocessing, model training and retraining, anomaly detection, and prediction. Visual results are presented through dashboards and alert notifications are sent to onsite personnel and managers to facilitate timely problem solutions.
(2)
Meta-learning for Rapid Training of Anomaly Detection and Prediction Models across Multiple Machines (Section 3.2 and Section 4.2)
This approach rapidly establishes models for new machines or production lines with limited data by leveraging meta-learning and unsupervised learning through AEs, thereby achieving anomaly detection and prediction objectives.
(3)
Meta-learning Adaptive Model Retraining (Section 3.3 and Section 4.3)
Machine-specific models are adaptively retrained by employing meta-learning, quickly adjusting to the slow-changing characteristics of the machine over time and enabling long-term anomaly detection and prediction.
(4)
Lightweight AI Models (Section 3.4 and Section 4.4)
The proposed pruning and quantization compression model significantly reduces model size and conserves computational resources.
(5)
Edge Device Computation (Section 3.5 and Section 4.5)
Replacing traditional AI inference engines (industrial PCs, IPCs) with embedded Raspberry Pi systems enables lightweight deployment, saves resources, reduces costs, and makes large-scale deployment feasible.

3. Proposed Method

3.1. Intelligent Equipment Management System

The proposed PQUM-DNS was integrated into an intelligent equipment management system for practical field applications. Signals were initially collected from the device sensors within the system before feature extraction transformation and data preprocessing, as shown in Figure 8. The feature extraction transformation converted raw vibration feature values into multiple key parameters related to machine health. Data preprocessing eliminated empty and abnormal values so that only the normal values required for unsupervised learning were retained. The processed data were stored in a database (PostgreSQL) until sufficient data had accumulated (e.g., 3000 records, adjustable). Subsequently, the AE and Holt–Winters algorithms were used to train the anomaly detection and prediction models, respectively. Inference was performed using these models to generate results indicating the degree of equipment or process degradation. The inference results were stored in the database, and abnormal detection and prediction outcomes were sent to the visual dashboard of the intelligent management system, providing users with insights into machine conditions. The system issued alert notifications to relevant personnel if the detection or prediction results exceeded the AI equipment or process degradation threshold. When retraining with a small amount of data, PQUM-DNS reuses the initially trained Pretrain and Metatrain models and fine-tunes them with a small amount of new data to obtain the updated anomaly detection model.
Figure 8. Flowchart of the proposed intelligent equipment management system.
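The accumulate-then-train gating and threshold-alert logic described above can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the function names, the alert threshold, and the validity checks are assumptions; only the 3000-record accumulation target comes from the text.

```python
import math

MIN_RECORDS = 3000      # adjustable accumulation target mentioned in the text
ALERT_THRESHOLD = 0.2   # hypothetical degradation-index alert level

def preprocess(readings):
    """Keep only valid readings for unsupervised training
    (drop empty and clearly abnormal values, as described in the text)."""
    return [r for r in readings
            if r is not None and not math.isnan(r) and r >= 0.0]

def ingest(buffer, new_readings):
    """Append cleaned readings; report whether enough data exist to (re)train."""
    buffer.extend(preprocess(new_readings))
    return len(buffer) >= MIN_RECORDS

def needs_alert(degradation_index):
    """Trigger a notification when the AI degradation index exceeds the threshold."""
    return degradation_index > ALERT_THRESHOLD
```

In this sketch, inference results would be written back to the database and only readings that pass `needs_alert` would be escalated to onsite personnel.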

3.2. Meta-Learning for Rapid Training of Multi-Machine Models for Anomaly Detection and Prediction

The rapid training of meta-learning multi-machine models for anomaly detection and prediction used in the PQUM-DNS was introduced in a previous study [19]. This method utilizes abundant data from numerous machines, trains the Metatrain model using the AE + meta-learning approach, and fine-tunes the Metatrain model with a small amount of data from new machines. This process yields a model adapted to a new machine, facilitating the inference for anomaly detection. In this study, the anomaly detection results were combined with prediction models to forecast future anomalies based on past anomaly detection outcomes. The following provides a brief introduction to the techniques employed.
(1)
Meta-learning
Meta-learning is a technique aimed at enabling machine-learning systems to swiftly adapt to new tasks or environments [4]. Traditional machine-learning algorithms often require large amounts of labeled data to train models. In addition, it is necessary to collect and label substantial data for retraining when faced with new tasks. In contrast, the goal of meta-learning is to train a “learner” that is capable of rapidly learning new tasks from a small amount of labeled data. This approach typically relies on prior experience with numerous similar tasks and applies this experience to new tasks. These tasks can be expressed as:
Learn $\theta$ such that $\phi_i = f_\theta(D_i^{tr})$ is good for $D_i^{ts}$:
$$\theta^* = \arg\max_\theta \sum_{i=1}^{n} \log P(\phi_i \mid D_i^{ts})$$
where
$$\phi_i = f_\theta(D_i^{tr})$$
$$D_{\text{meta-train}} = \{(D_1^{tr}, D_1^{ts}), \ldots, (D_n^{tr}, D_n^{ts})\}$$
$$T_i: \quad D_i^{tr} = \{(x_1^i, y_1^i), \ldots, (x_k^i, y_k^i)\}, \qquad D_i^{ts} = \{(x_1^i, y_1^i), \ldots, (x_l^i, y_l^i)\}$$
where $T_i$ is a (meta-learning) task.
An illustration of the meta-learning method is shown in Figure 9 [31].
Figure 9. Illustration of the meta-learning method.
Various meta-learning models have been proposed for deep learning; they are generally categorized as follows: learning good weight initializations, metamodels that generate the parameters of other models, and learning transferable optimizers. Model-agnostic meta-learning belongs to the first category: it learns a weight initialization that enables fast adaptation to new tasks, allowing rapid convergence and fine-tuning with small-scale training samples [32].
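The "learn a good initialization" idea can be illustrated with a first-order (Reptile-style) sketch on toy one-dimensional regression tasks; this is not the paper's model, and the task distribution, step sizes, and iteration count are all illustrative assumptions. Each task is a line with a different slope, and the meta-loop moves the shared initialization toward the weights adapted on each sampled task.

```python
import numpy as np

rng = np.random.default_rng(1)

def loss_grad(theta, x, y):
    # gradient of the mean squared error for the linear model y_hat = theta * x
    return 2.0 * np.mean((theta * x - y) * x)

theta = 0.0                    # meta-initialisation to be learned
inner_lr, outer_lr = 0.5, 0.5  # illustrative step sizes
for _ in range(300):
    a = rng.uniform(1.5, 2.5)          # sample a task: true slope a
    x = rng.normal(size=10)
    y = a * x
    # inner loop: one gradient step adapts theta to the sampled task
    theta_task = theta - inner_lr * loss_grad(theta, x, y)
    # outer loop: move the shared initialisation toward the adapted weights
    theta = theta + outer_lr * (theta_task - theta)

print(theta)  # settles near the mean task slope, a good starting point
```

Starting from such an initialization, a new task (a new machine) needs only a few gradient steps on a small amount of data, which is the behaviour PQUM-DNS exploits.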
(2)
Anomaly Detection
The AI degradation level index was used to detect anomalies and the model was established using an unsupervised AE algorithm, as shown in Figure 10. This method involves constructing a model with normal data and applying an AE to compute the root mean square error (RMSE) between the input and output data, referred to as the reconstruction error. Here, the reconstruction error was defined as the AI degradation level index. A smaller value indicates a closer alignment between the model input and output values, leading to a better data reconstruction capability and a higher likelihood of normal equipment or processes.
Figure 10. Structure of the autoencoder.
In practical applications, suitable threshold values are defined based on the conditions of the equipment or processes used for anomaly determination. These are expressed as:
$$h = g(W_1 X + b_1)$$
$$X' = f(W_2 h + b_2)$$
$$\text{Minimize } \mathrm{RMSE}(X, X')$$
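The reconstruction-error idea can be demonstrated with a linear autoencoder, which has a closed form via the principal subspace (SVD); this stands in for the paper's trained network, and the synthetic "normal" data and bottleneck size are assumptions for illustration. Inputs that resemble the normal training data reconstruct well (small RMSE, low degradation index); inputs off the learned manifold reconstruct poorly.

```python
import numpy as np

rng = np.random.default_rng(0)
# "normal" training data: points scattered near the line x2 = 2*x1
x1 = rng.normal(size=200)
normal = np.column_stack([x1, 2.0 * x1 + 0.05 * rng.normal(size=200)])

# Linear AE in closed form: encode onto the 1-D principal subspace, decode back
mean = normal.mean(axis=0)
_, _, vt = np.linalg.svd(normal - mean, full_matrices=False)
W = vt[:1]  # 1-D bottleneck

def degradation_index(x):
    """Reconstruction RMSE between input and output: the AI degradation level index."""
    recon = (x - mean) @ W.T @ W + mean
    return float(np.sqrt(np.mean((x - recon) ** 2)))

print(degradation_index(np.array([1.0, 2.0])))   # on-manifold: small error
print(degradation_index(np.array([2.0, -1.0])))  # off-manifold: large error
```

In the paper's setting a nonlinear AE is trained by gradient descent, but the thresholding logic on the resulting index is the same.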
(3)
Anomaly Prediction
Time-series algorithms use historical data to predict future trends. In this study, historical records of the health status of semiconductor manufacturing processes and machine equipment are used to predict future health status. The proposed anomaly detection models trained on the anomaly detection dataset can only detect anomalies in current and past data; an additional anomaly prediction model is therefore required for future anomaly states. The historical AI degradation level index obtained from anomaly detection was compared across various commonly used prediction models, and the Holt–Winters algorithm was ultimately applied as the anomaly prediction model [7].
Various anomaly prediction algorithms are introduced below; an overview is provided in Table 3.
Table 3. Overview of time-series algorithms.
(a)
Simple Exponential Smoothing (SES)
This algorithm is used when there is no clear trend or seasonal pattern in the predictive data [33,34]. The prediction is calculated using weighted averages, meaning that the largest and smallest weights are associated with the most recently and least recently observed values, respectively. This is expressed as:
$$\hat{y}_{T+1|T} = \alpha y_T + \alpha(1-\alpha) y_{T-1} + \alpha(1-\alpha)^2 y_{T-2} + \cdots$$
where $\hat{y}_{T+1|T}$ denotes the one-step-ahead forecast for time T + 1, $y_T$ denotes the most recent observation, and 0 ≤ α ≤ 1 denotes the smoothing parameter.
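The weighted-average forecast above can be computed recursively, which is how SES is implemented in practice. The following is a minimal sketch with illustrative data; initializing the level with the first observation is a common convention, not something the text specifies.

```python
def ses_forecast(y, alpha):
    """Simple exponential smoothing, computed recursively:
    l_t = alpha*y_t + (1 - alpha)*l_{t-1}; the forecast is the final level."""
    level = y[0]  # initialise with the first observation (common convention)
    for obs in y[1:]:
        level = alpha * obs + (1 - alpha) * level
    return level

print(ses_forecast([10.0, 12.0, 11.0, 13.0, 12.0], alpha=0.5))  # 12.0
```

Unrolling the recursion reproduces the weighted sum: the most recent observation gets weight α, the one before it α(1−α), and so on.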
(b)
Holt (Double Exponential Smoothing Method)
The Holt double exponential smoothing method is an extension of the simple exponential smoothing method that predicts trends in data [9]. This method is suitable for linear trending sequences without seasonal patterns and consists of one prediction equation and two smoothing equations representing the level and trend components ($l_t$, $b_t$), which are expressed as:
$$\hat{y}_{t+h|t} = l_t + h b_t$$
$$l_t = \alpha y_t + (1-\alpha)(l_{t-1} + b_{t-1})$$
$$b_t = \beta^*(l_t - l_{t-1}) + (1-\beta^*) b_{t-1}$$
where $y_t$ and $l_t$ denote the observed value and level at time t, respectively; $b_t$ denotes the trend at time t; h denotes the forecast horizon; and α (0 ≤ α ≤ 1) and β* (0 ≤ β* ≤ 1) denote the level and trend smoothing weights, respectively.
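Holt's two smoothing equations translate directly into a short recursion; a minimal sketch follows, with a common trend initialization (first difference of the series) assumed for illustration.

```python
def holt_forecast(y, alpha, beta, h):
    """Holt's linear method: recursive level/trend updates, then l_T + h*b_T."""
    level, trend = y[0], y[1] - y[0]  # common initialisation
    for obs in y[1:]:
        prev_level = level
        level = alpha * obs + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
    return level + h * trend

# a perfectly linear series is forecast exactly h steps ahead
print(holt_forecast([1.0, 2.0, 3.0, 4.0, 5.0], alpha=0.5, beta=0.5, h=2))  # 7.0
```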
(c)
Holt–Winters Forecasting (Triple Exponential Smoothing)
Holt–Winters forecasting, also known as triple exponential smoothing, is a method used to predict the behavior of time-series data that include trends and seasonality. This algorithm considers three factors: the level $l_t$, trend $b_t$, and seasonal component $s_t$. It is effective for forecasting time-series data with seasonal patterns. There are two variations of this method: the additive and multiplicative models.
In the additive model, the components are expressed as:
$$\hat{y}_{t+h|t} = l_t + h b_t + s_{t+h-m(k+1)}$$
$$l_t = \alpha(y_t - s_{t-m}) + (1-\alpha)(l_{t-1} + b_{t-1})$$
$$b_t = \beta^*(l_t - l_{t-1}) + (1-\beta^*) b_{t-1}$$
$$s_t = \gamma(y_t - l_{t-1} - b_{t-1}) + (1-\gamma) s_{t-m}$$
In the multiplicative model, the components are expressed as:
$$\hat{y}_{t+h|t} = (l_t + h b_t)\, s_{t+h-m(k+1)}$$
$$l_t = \alpha \frac{y_t}{s_{t-m}} + (1-\alpha)(l_{t-1} + b_{t-1})$$
$$b_t = \beta^*(l_t - l_{t-1}) + (1-\beta^*) b_{t-1}$$
$$s_t = \gamma \frac{y_t}{l_{t-1} + b_{t-1}} + (1-\gamma) s_{t-m}$$
where $s_t$ denotes the seasonal component at time t; k denotes the integer part of (h − 1)/m; m denotes the number of cycles/frequency of the seasonality (e.g., four for quarterly data); and α, β*, and γ (0 ≤ γ ≤ 1) denote the level, trend, and seasonal smoothing parameters, respectively.
In the additive model, the forecast value for each data element is the sum of the baseline, trend, and seasonality components. However, a multiplicative model is preferred when seasonal variations change proportionally to the level of the series.
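The additive model's three recursions can be sketched compactly as follows. The initialization scheme (level and trend from the first two seasons, seasonals as deviations from the initial level) is one common convention assumed for illustration; the test series is synthetic, with trend +1 per step and seasonal pattern (+2, −2) of period m = 2.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, h):
    """Additive Holt-Winters: level, trend, and m seasonal components."""
    level = sum(y[:m]) / m
    trend = (sum(y[m:2 * m]) - sum(y[:m])) / (m * m)
    season = [y[i] - level for i in range(m)]
    for t in range(m, len(y)):
        prev_level, prev_trend = level, trend
        level = alpha * (y[t] - season[t % m]) + (1 - alpha) * (level + trend)
        trend = beta * (level - prev_level) + (1 - beta) * trend
        season[t % m] = (gamma * (y[t] - prev_level - prev_trend)
                         + (1 - gamma) * season[t % m])
    # forecast h steps ahead, reusing the matching seasonal component
    return level + h * trend + season[(len(y) - 1 + h) % m]

y = [2.0, -1.0, 4.0, 1.0, 6.0, 3.0, 8.0, 5.0]
print(holt_winters_additive(y, m=2, alpha=0.5, beta=0.5, gamma=0.5, h=1))
```

The one-step forecast lands close to the true continuation of the series (10.0), showing how the seasonal component is carried forward by index modulo m.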
(d)
Autoregressive Model
The autoregressive (AR) model is a statistical method for analyzing time-series data that predicts the future value of a variable from its own historical values [35]. AR evolved from linear regression analysis: instead of relating a parameter x to a dependent variable y, it relates x to its own past values. This is expressed as:
$$y_t = C + \phi_1 y_{t-1} + \phi_2 y_{t-2} + \cdots + \phi_p y_{t-p} + \varepsilon_t$$
where $y_t$, C, p, $\phi_i$, and $\varepsilon_t$ denote the stationary time series, constant term, autoregressive order, non-zero autocorrelation coefficients, and independent error term, respectively.
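Because the AR(p) equation is linear in its coefficients, it can be fitted by ordinary least squares on lagged copies of the series. A minimal sketch, with an illustrative deterministic ramp as input:

```python
import numpy as np

def fit_ar(y, p):
    """Least-squares fit of y_t = c + phi_1*y_{t-1} + ... + phi_p*y_{t-p}."""
    y = np.asarray(y, dtype=float)
    lags = np.array([y[t - p:t][::-1] for t in range(p, len(y))])
    X = np.column_stack([np.ones(len(lags)), lags])
    coef, *_ = np.linalg.lstsq(X, y[p:], rcond=None)
    return coef  # [c, phi_1, ..., phi_p]

# a deterministic ramp satisfies y_t = 1 + 1*y_{t-1} exactly
print(fit_ar([1, 2, 3, 4, 5, 6], p=1))  # approximately [1.0, 1.0]
```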
(e)
Moving Average
Moving average (MA) is a simple smoothing prediction technique used for time-series data that calculates a moving average over a certain number of terms to reflect long-term trends [36]. However, it is difficult to discern the development trend when time-series data are influenced by periodic and random variations causing large fluctuations. Using MAs can eliminate these influences and reveal the direction and trend of the events, which is expressed as:
$$y_t = \mu + \varepsilon_t + \theta_1 \varepsilon_{t-1} + \theta_2 \varepsilon_{t-2} + \cdots + \theta_q \varepsilon_{t-q}$$
where $y_t$, μ, q, and $\theta_i$ denote the stationary time series, mean of the sequence, moving average order, and non-zero autocorrelation coefficients, respectively.
(f)
Autoregressive Integrated Moving Average
Autoregressive integrated moving average (ARIMA) is an evolution of the AR, MA, and autoregressive moving average models. This approach is used to analyze non-stationary time-series data by transforming them into stationary data through differencing [37]. This method is employed when dealing with non-stationary time-series data that exhibit a changing mean and variance over time. A new stationary time series can be obtained by using the differences in the data, and a suitable probabilistic model can be derived from historical data to represent the dependence between time and data. ARIMA can be expressed as ARIMA (p, d, q), where p, d, and q denote the autoregressive, differencing, and moving average orders, respectively. Furthermore:
$$y_t = c + \phi_1 y_{t-1} + \cdots + \phi_p y_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t$$
where $\phi_i$ and $\theta_i$ denote non-zero autocorrelation coefficients.
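The differencing step (the "I" in ARIMA, of order d) is what turns a non-stationary series into a stationary one before the AR and MA parts are fitted. A minimal illustration with an assumed quadratic-trend series:

```python
import numpy as np

# a series with a quadratic trend is non-stationary in mean
y = np.array([0.0, 1.0, 4.0, 9.0, 16.0, 25.0])  # y_t = t**2

d1 = np.diff(y)   # first difference:  [1, 3, 5, 7, 9]  (still trending)
d2 = np.diff(d1)  # second difference: [2, 2, 2, 2]     (constant -> stationary)
print(d2)
```

Here d = 2 removes the trend entirely; in practice d is chosen so that the differenced series passes a stationarity check.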
(g)
Seasonal Autoregressive Integrated Moving Average
The seasonal autoregressive integrated moving average (SARIMA) model incorporates seasonal factors into the ARIMA model [38]. Generally, the SARIMA model is denoted as SARIMA(p, d, q)(P, D, Q)s, where s denotes the seasonal period and P, D, and Q denote the seasonal autoregressive, seasonal differencing, and seasonal moving average orders, respectively. This is expressed as:
1 i = 1 p φ i B i 1 i = 1 p Φ i B i m 1 B d 1 B D m y t = 1 + i = 1 q θ i B i 1 + i = 1 q Θ i B i m ϵ t
where B denotes the lag operator, and Φ i and Θ i denote non-zero constants.

3.3. Meta-Learning Adaptive Model Retraining

Meta-learning was employed to develop an adaptive method for retraining machine-specific models [4]. This approach enables models to quickly adjust to the slow changes observed in each machine over time, thereby achieving prolonged anomaly detection. The concept of rapid training models for different machine devices was extended to different time segments by combining the background technique of AEs with meta-learning.
The data were segmented into three intervals based on chronological order: Pretrain (older and long-running machine data), Metatrain (newer and long-running machine data), and Fine-tune (latest operational data). Leveraging the principles of meta-learning, Pretrain and Metatrain data were used to train a generalized anomaly detection model, whereas Fine-tune adapted the model to the most recent machine conditions. This approach automatically trains models that adapt to data changes over time.
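The chronological three-way split described above can be sketched as a simple slicing helper; the 60/30/10 split fractions are illustrative assumptions, since the text does not specify exact ratios.

```python
def segment_chronologically(records, pre_frac=0.6, meta_frac=0.3):
    """Split time-ordered records into Pretrain / Metatrain / Fine-tune windows.

    Pretrain: older long-running data; Metatrain: newer long-running data;
    Fine-tune: the latest operational data (remainder)."""
    n = len(records)
    i = int(n * pre_frac)
    j = i + int(n * meta_frac)
    return records[:i], records[i:j], records[j:]

pre, meta, fine = segment_chronologically(list(range(10)))
print(len(pre), len(meta), len(fine))  # 6 3 1
```

Because the records are already time-ordered, no shuffling is done: the Fine-tune window always holds the most recent machine conditions.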
Model retraining was based on the operational conditions of the factory to maintain the effectiveness of the anomaly detection model. The historical data were also divided into Pretrain, Metatrain, and Fine-tune segments by utilizing seven days of equipment operation data, as shown in Figure 11. This process led to the training of an anomaly detection model that could assess the degree of equipment degradation, as shown in Figure 12. Additionally, predictive algorithms were applied to forecast equipment degradation over the next seven days. The ultimate goal was to achieve continuous automatic updates for anomaly detection and prediction.
Figure 11. Historical data of seven days segmented into Pretrain, Metatrain, and Fine-tune for training an anomaly detection model to detect equipment degradation.
Figure 12. Automatic model retraining based on time intervals, with the retrained model used for predicting anomalies in the next seven days.

3.4. Lightweight AI Model

Because certain neuron weights of the proposed model may become small or negligible during retraining, a pruning- and quantization-based meta-learning anomaly detection model was introduced. This approach significantly reduces the model size and enhances computational speed.
(1)
Model Pruning
Deep-learning neural network models often contain redundant parameters, with many neuron weights approaching zero. Model pruning involves removing these neurons while preserving the same model expressive capability. Model pruning retains the essential weights and parameters, reducing the number of connections between the neural network layers, as shown in Figure 13 [39]. This reduction helps to decrease the number of parameters involved in the calculations, thereby lowering the computation requirements. By maintaining the performance of the model, this approach reduces the storage space, lowers computational costs, and accelerates the training process.
Figure 13. Model pruning retains important weights and parameters while reducing the number of connections between neural network layers.
Pruning algorithms typically employ a three-stage pipeline: training, pruning, and fine-tuning. The weight adjustment process of this pipeline is shown in Figure 14. First, the model's weights are trained; pruning then removes neurons with weights approaching zero; finally, the model is fine-tuned to adjust the remaining weights. This process is iterated until the pruned model's performance approximates that of the original model.
Figure 14. Weight adjustment process in the three-step training pipeline for pruning.
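The pruning step itself can be sketched as magnitude-based weight masking: the smallest-magnitude fraction of weights is zeroed. This is a generic sketch of the technique, not the paper's exact procedure; the weight matrix and sparsity level are illustrative.

```python
import numpy as np

def prune_by_magnitude(weights, sparsity):
    """Zero out roughly the smallest-magnitude `sparsity` fraction of weights.
    (Ties at the threshold may remove slightly more; fine for a sketch.)"""
    flat = np.abs(weights).ravel()
    k = int(flat.size * sparsity)
    if k == 0:
        return weights.copy()
    threshold = np.partition(flat, k - 1)[k - 1]
    return weights * (np.abs(weights) > threshold)  # mask small weights to zero

W = np.array([[0.01, -0.8], [0.5, -0.02]])
P = prune_by_magnitude(W, 0.5)  # keeps only -0.8 and 0.5
print(P)
```

In the full pipeline, the surviving weights would then be fine-tuned so that the sparse model recovers the original model's accuracy.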
(2)
Model Quantization
The principle of quantization involves reducing the precision of the bits used to represent model parameters (typically 32-bit floating-point (float32) numbers) [40]. This approach results in smaller model sizes and faster computations. Model quantization involves approximating the continuous values (or a large number of possible discrete values) of the floating-point model weights with a limited set of discrete values (usually 8-bit integer (int8) numbers) at a lower inference accuracy loss, as shown in Figure 15 [40]. A lower-bit data type is used to approximate the finite-range floating-point data, which leads to a reduced model size, decreased memory consumption, and faster inference speed. The calculations are expressed as:
$$Q = \frac{R}{s} + z$$
$$s = \frac{R_{max} - R_{min}}{Q_{max} - Q_{min}}$$
where R denotes the real floating-point value, Q the fixed-point value after quantization, z the fixed-point value corresponding to the floating-point value 0 (the zero point), and s the smallest scale step that can be represented after fixed-point quantization.
Figure 15. Model quantization: conversion from float32 to int8.
The model-pruning approach is applied using weight sparsity, where weights close to zero are removed from the original model. Subsequently, the model is retrained to adjust its performance. Additionally, quantization techniques are combined to convert the weights from float32 to int8, thereby significantly reducing the model size and enhancing the computational speed.
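The float32-to-int8 conversion can be sketched directly from the affine mapping above. This is a minimal range-based illustration, not the paper's exact scheme: the symmetric int8 range and the example weight vector are assumptions.

```python
import numpy as np

def quantize_int8(r):
    """Affine quantization Q = round(R/s) + z with s from the min/max range."""
    rmin, rmax = float(r.min()), float(r.max())
    qmin, qmax = -128, 127
    s = (rmax - rmin) / (qmax - qmin)  # scale: one int8 step in real units
    z = int(round(qmin - rmin / s))    # zero point: Q corresponding to R = 0
    q = np.clip(np.round(r / s) + z, qmin, qmax).astype(np.int8)
    return q, s, z

def dequantize(q, s, z):
    return (q.astype(np.float32) - z) * s

w = np.array([-1.0, 0.0, 0.5, 1.0], dtype=np.float32)
q, s, z = quantize_int8(w)
error = np.max(np.abs(dequantize(q, s, z) - w))
print(q.dtype, error)  # int8; round-off error bounded by about s/2
```

The storage cost per weight drops from 32 bits to 8 bits, which is where most of the roughly 60% size reduction reported for PQUM-DNS comes from once combined with pruning.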

3.5. Edge Device Computing

The pruned and quantized models are deployed on embedded systems, such as a Raspberry Pi, replacing traditional IPCs. This lightweight approach conserves resources, reduces costs, and facilitates a large-scale deployment. Edge devices can analyze data and promptly alert onsite maintenance personnel in industrial scenarios where certain equipment is located in hard-to-reach locations [10,18]. This enables real-time responsiveness to the equipment conditions, allowing immediate intervention and preventing downtime.
Benefits of edge computing are as follows:
  • Provides rapid real-time reflection of situations, enabling onsite personnel to detect anomalies promptly and take immediate action.
  • Solves bandwidth issues in cloud and edge transmissions because edge devices only need to send inference results back to the control centers.
  • Addresses cybersecurity concerns, protecting against network attacks that could lead to factory shutdowns.
  • Reduces energy consumption because lightweight edge-computing models conserve power.

4. Results and Discussion

4.1. Intelligent Equipment Management System

Automated methods are used for vibration signal sensing, data transmission, data preprocessing, model training, and retraining. The results are visually presented through dashboards. The system utilizes AI degradation level values to detect anomalies and sends warning notifications for timely handling by managers and onsite personnel. A comparison between this system and traditional methods is presented in Table 4.
Table 4. Comparison of intelligent equipment management system and traditional approaches.

4.2. Meta-Learning for Rapid Training of Multi-Machine Models for Anomaly Detection and Prediction

To demonstrate versatility, meta-learning is applied to multi-machine anomaly detection and prediction models on different datasets, such as factory chiller vibration data and the publicly available Paderborn bearing vibration/current and SECOM semiconductor datasets.
(1)
Rapid Training of Multi-Machine Models for Anomaly Detection
The proposed PQUM-DNS method is compared with the general DAE method. When training models for new machines, PQUM-DNS achieves AUC values similar to those of the DAE method with far less data. This is because PQUM-DNS uses meta-learning to train a versatile model applicable to various conditions (the Metatrain model); training a model for a new machine therefore requires only a small amount of fine-tuning data, yielding a rapidly adaptable anomaly detection model. Test results for different data types, such as the chiller vibration, SECOM, and Paderborn current and vibration datasets, are presented in Table 5 and Figure 16. Compared with the DAE, PQUM-DNS reduces the data required for training new machine models by approximately 75% on average, with a decrease in AUC of only 0.35%.
Table 5. Comparison of PQUM-DNS and DAE training performance for different data types.
Figure 16. Comparison of PQUM-DNS and DAE training data.
(2)
Anomaly Prediction
PQUM-DNS detects machine degradation levels through anomaly detection and evaluates these levels using various prediction algorithms. Seven prediction algorithms are compared, and the best is selected based on the RMSE between predicted and actual values. The performances of these algorithms on different datasets are presented in Table 6 and Figure 17. The Holt–Winters algorithm demonstrates the best performance, with the lowest RMSE of approximately 0.037, making it the chosen anomaly prediction algorithm in PQUM-DNS.
Table 6. Comparison of PQUM-DNS prediction algorithm performance on different datasets.
Figure 17. Comparison of PQUM-DNS prediction algorithm performance on different datasets.

4.3. Meta-Learning Adaptive Model Retraining

An adaptive method is employed for retraining the machine model using meta-learning. This enables the model to adapt quickly to gradual changes over time, thereby facilitating long-term anomaly detection. PQUM-DNS chronologically segments data from the same machine and fine-tunes the model using the latest data, yielding a model suited to the machine's latest condition. Unlike the general DAE method, which requires retraining with all data, PQUM-DNS significantly reduces the amount of data needed for retraining, as shown in Table 7. This is because PQUM-DNS has already trained a versatile meta-learning model (a meta-trained model) using past data, enabling efficient fine-tuning with a small amount of new data.
Table 7. Comparison of PQUM-DNS and DAE retraining performance for different data types.

4.4. Lightweight AI Model

PQUM-DNS drastically reduces the model size while maintaining a similar AUC performance to that of the non-lightweight DAE model. This is achieved by removing the near-zero weights from the original DAE-trained model and compressing the model data format from float32 to int8. Consequently, the model size is significantly reduced. The application of PQUM-DNS to various datasets demonstrates that the lightweight model size is reduced by approximately 60% with AUC performance maintained at similar levels, as shown in Table 8 and Figure 18.
Table 8. Comparison of PQUM-DNS and DAE model sizes for different data types.
Figure 18. Comparison of PQUM-DNS and DAE model sizes for different data types.

4.5. Edge Device Computing

The PQUM-DNS, with its reduced and compressed model size, is well suited for lightweight edge-computing devices. Therefore, it is applied to replace traditional IPCs with embedded systems, such as a Raspberry Pi. This substitution reduces the size and weight, conserves resources, lowers costs, and supports large-scale deployments.

5. Conclusions

This study proposes a new PQUM-DNS model, which is an intelligent device management system that combines pruning, quantization, meta-learning, anomaly detection, prediction using AEs, adaptive model retraining, and edge inference. This system effectively reduces the manual labor, provides fault notifications, prevents downtime, decreases model computational resources, accelerates the model inference speed, and enables edge inference.
The system is suitable for various factory scenarios and types of machine equipment and process states. Compared with general DAEs, the system achieves a similar AUC while reducing the training data by approximately 75%. The average RMSE of the predictive degradation degree is 0.037 for Holt–Winters, retraining is conducted using 75% fewer data with similar AUC performance, and the model size is reduced by approximately 60% through pruning and quantization. The proposed system can be deployed on lightweight edge devices, such as a Raspberry Pi, enabling real-time anomaly detection and prediction. The system demonstrates superior performance, thereby realizing intelligent equipment management and maintenance.

Author Contributions

Conceptualization, Y.-C.Y.; methodology, Y.-C.Y. and S.-W.C.; software, Y.-C.Y.; validation, Y.-C.Y.; formal analysis, Y.-C.Y.; investigation, Y.-C.Y.; resources, C.-Y.L. and S.-W.C.; data curation, Y.-C.Y.; writing—original draft preparation, Y.-C.Y.; writing—review and editing, C.-Y.L. and S.-R.Y.; visualization, Y.-C.Y.; supervision, C.-Y.L., J.-T.C. and S.-W.C.; project administration, C.-Y.L., Y.-C.Y. and S.-R.Y.; funding acquisition, Y.-C.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

Publicly available datasets were analyzed in this study. This data can be found here: https://mb.uni-paderborn.de/kat/forschung/kat-datacenter/bearing-datacenter/data-sets-and-download, accessed on 1 June 2023 (Paderborn University Bearing Dataset) and https://archive.ics.uci.edu/dataset/179/secom, accessed on 1 June 2023 (SECOM Dataset).

Acknowledgments

The authors thank the Advantech Taipei headquarters for the data support and the SI2 Laboratory at National Yang Ming Chiao Tung University for the technical support.

Conflicts of Interest

The authors declare no conflict of interest.

Nomenclature

AE	Autoencoder
AI	Artificial Intelligence
AR	Autoregressive
ARIMA	Autoregressive Integrated Moving Average
AUC	Area under Curve
DAE	Deep AE
FFT	Fast Fourier Transform
IPC	Industrial Personal Computer
LoRaWAN	Long-Range Wide-Area Network
MA	Moving Average
MCM	Machine Condition Monitoring
MCS	Motor Current Signal
ML	Machine Learning
PHM	Prognostics and Health Management
PQUM-DNS	Pruning Quantized Unsupervised Meta-learning DegradingNet Solution
PSD	Power Spectral Density
RMS	Root Mean Square
RMSE	Root Mean Square Error
ROC	Receiver Operating Characteristics
SARIMA	Seasonal Autoregressive Integrated Moving Average
USB	Universal Serial Bus
WPD	Wavelet Packet Decomposition

