
Self-Adaptive Server Anomaly Detection Using Ensemble Meta-Reinforcement Learning

1 Department of Computer Science and Information Engineering, National University of Kaohsiung, Kaohsiung 81148, Taiwan
2 Department of Fragrance and Cosmetic Science, Kaohsiung Medical University, Kaohsiung 80708, Taiwan
* Author to whom correspondence should be addressed.
Electronics 2024, 13(12), 2348; https://doi.org/10.3390/electronics13122348
Submission received: 23 May 2024 / Revised: 9 June 2024 / Accepted: 13 June 2024 / Published: 15 June 2024

Abstract

Because user behavior changes at any time in cloud computing and network services, abnormal server resource utilization traffic can lead to severe service crashes and system downtime. A traditional single anomaly detection model cannot predict impending failures quickly enough. Therefore, this study proposed ensemble learning combined with model-agnostic meta-reinforcement learning, called ensemble meta-reinforcement learning (EMRL), to implement self-adaptive server anomaly detection rapidly and precisely according to the time series of server resource utilization. The proposed ensemble approach combines a hidden Markov model (HMM), a variational autoencoder (VAE), a temporal convolutional autoencoder (TCN-AE), and bidirectional long short-term memory (BLSTM). The EMRL algorithm trains this combination on several tasks to learn the implicit representation of various anomalous traffic, where each task executes trust region policy optimization (TRPO) to adapt quickly to the time-varying data distribution and make precise, rapid decisions for the agent's response. As a result, our proposed approach improves the precision of anomaly prediction by 2.4 times and shortens model deployment time by 5.8 times on average, because a meta-learner can immediately be applied to new tasks.

1. Introduction

In a semiconductor fab, the network server connects to the chip packaging and testing machines and updates the machine parameters online in real time. Cloud computing at the data center monitors the running status of several network servers around the clock, including the logs of processors, memories, disks, and network devices, and issues warnings when abnormal conditions occur. Among these monitoring tasks, time series anomaly detection is a significant challenge. First, outlier samples are infrequent, which often leads to class imbalance; second, continuous anomalies frequently occur, making abnormal patterns difficult to discern. In the resource usage sequence of a cloud server system, the data distribution shifts whenever user behavior changes, so the model cannot adapt immediately. If a high-risk application is running, a failure can have a significant impact.
In recent years, ensemble learning [1] has developed continuously in machine learning and has been applied to many fields. Integrating different neural networks, or the results of training the same model on various data sets, can improve a model's generalization ability. The advantage of this method is that it combines multiple models to overcome the limitations of a single model and improve overall prediction ability and stability, which gives it great practical value.
The rapid development of reinforcement learning (RL) [2] has greatly alleviated the difficulty of identifying abnormal patterns. Deep reinforcement learning can learn states through interaction with the environment and adjust model parameters through experience to form an optimal strategy. However, reinforcement learning depends strongly on the environment: when the environment changes, the previous optimal strategy becomes invalid, and adapting to the complex environmental changes in the cloud is challenging. To address this problem, we incorporate meta-learning. A meta-learning algorithm [3] rapidly learns from a small number of dedicated samples, using experience to understand a new, small amount of data quickly and accurately. Model-Agnostic Meta-Learning (MAML) [4] is one such meta-learning algorithm. It is independent of the model architecture and focuses on learning the common representation across various tasks. We study the application of this algorithm to adapt quickly to abnormal patterns and solve the above problems.
This study aims to implement anomaly detection and prediction for the cloud application service of a semiconductor company in Kaohsiung, Taiwan (abbreviated S-company). The cloud application service involves the joint operation and maintenance of multiple virtual machines. The company uses Zabbix Server [5] to monitor the values of various services and sets a warning to notify the administrator. However, this solution can only notify about devices with high usage rates and cannot reflect the overall failure or the actual cause of the abnormality. Therefore, this study first analyzes the Zabbix Server monitoring data, uses sliding windows to generate time series data and abnormal labels, and applies two Python packages, the time series feature extraction library (TSFEL) [6] and Python outlier detection (PyOD) [7], to extract transferable meta-features. Next, we use MAML with the reinforcement learning (RL) algorithm [8] (denoted MAML-RL) to generate different subtasks for training during the training phase and record the policy loss and network parameters. In the outer loop, we use Trust Region Policy Optimization (TRPO) [9] to find the optimal strategy that maximizes the decision reward and generates an initial model. Finally, we can quickly obtain the online prediction model [10,11] by training for a few steps on the target data set.
The rest of this paper is organized as follows. Section 2 describes related work on meta-learning models, time series prediction models, training information, and visualization tools. Section 3 presents the method used to implement the system. Section 4 reports the experimental results and discussion. Finally, Section 5 draws a brief conclusion.

2. Related Work

2.1. Literature Review

The increase in the scale of cluster systems leads to a sharp rise in the failure rate. If the manufacturing system crashes unexpectedly, it causes considerable losses in manufacturing costs. Conversely, operating costs could be reduced significantly if an intelligent system can give an alert before a machine fails [12].
Intelligent warning systems have been applied in the following areas: time series forecasting, anomaly detection, and log analysis [13,14]. Many of them use the server's resource usage, e.g., CPU, memory, and network traffic flow, as the primary target. The standard approach of training a model offline on historical data can fail in dynamic environments where the definition of normal behavior undergoes concept drift over time, invalidating the model. Therefore, how to adapt to complex time series effectively over the long term is an important topic.
Reinforcement learning (RL) has achieved good results in the control field, and some studies have integrated it into time series anomaly detection. Compared with general methods, reinforcement learning improves prediction accuracy considerably and has a certain ability to identify abnormalities in time series; however, it still has problems with online adaptability. Meta-learning offers few-shot learning, excellent adaptability, and strong potential in anomaly detection, so some studies have used the MAML method for anomaly detection with a small amount of data. In addition, few-shot learning using meta-policies is significantly better than other learning frameworks [15]. Meta-AAD [15] demonstrates the transferability of meta-policies and meta-features in anomaly detection and also adds an active learning method to improve the model's accuracy.

2.2. Time Series Anomaly Detection

A time series is a sequence of data recorded at fixed time intervals, usually with continuous values, and it can be divided into two categories: univariate and multivariate time series [16]. Time series anomaly detection identifies potential anomalies in a sequence. However, time series anomalies are complex and changeable, and sometimes a series of complex, continuous anomalies appears, which detection algorithms struggle to identify. Anomaly detection in a time series is therefore challenging, and inaccurate predictions can be very costly. Many methods can detect anomalies; we outline their advantages and disadvantages here. (1) Statistics-based anomaly detection is the earliest method used. It assumes that the target data follow a typical distribution, and if the data contain high-dimensional points, this method cannot identify mainly spatial anomalies. (2) Supervised learning must rely on a large amount of labeled data to train the model. However, obtaining enough data and labels in the real world is difficult, and the proportion of abnormal to normal samples is seriously unbalanced, leading to poor performance of the trained classifier. (3) Semi-supervised learning based on the autoencoder method [17,18] uses a symmetrical network structure and a hidden vector, as shown in Figure 1. Since anomaly detection is often limited by the scarcity of positive samples [19], autoencoders can learn the normal data distribution and flag inputs with significant reconstruction errors as anomalies. However, this approach fails in cloud facility maintenance because the environment changes rapidly.
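To make the reconstruction-error idea concrete, the following minimal PyTorch sketch trains a small autoencoder on normal windows only and scores new windows by their reconstruction error. The window size of 25 matches the sliding window used later in Section 4.2, while the layer sizes, the random placeholder data, and the 99th-percentile threshold are illustrative assumptions rather than the paper's configuration.

```python
import torch
import torch.nn as nn

class WindowAE(nn.Module):
    """Symmetric encoder/decoder over a fixed-width window."""
    def __init__(self, window=25, latent=8):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(window, 64), nn.ReLU(),
                                     nn.Linear(64, latent))
        self.decoder = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(),
                                     nn.Linear(64, window))

    def forward(self, x):
        return self.decoder(self.encoder(x))

def anomaly_scores(model, windows):
    # Higher reconstruction error -> more anomalous window.
    with torch.no_grad():
        recon = model(windows)
        return ((windows - recon) ** 2).mean(dim=1)

# Train on normal windows only (placeholder data), then flag windows whose
# score exceeds, e.g., the 99th percentile of the training scores.
model = WindowAE()
normal = torch.rand(1000, 25)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
for _ in range(50):
    opt.zero_grad()
    loss = nn.functional.mse_loss(model(normal), normal)
    loss.backward()
    opt.step()
threshold = anomaly_scores(model, normal).quantile(0.99)
```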

2.3. Reinforcement Learning

Reinforcement learning is an algorithm that modifies an agent's policy by interacting with the environment. Anomaly detection in a time series can be formulated as a Markov decision process (MDP), which consists of a loop of a policy network, feedback, and environment state, as shown in Figure 2. The agent must maximize reward by learning a control policy. Its self-improvement property addresses the problem that the sequence has no clear normal pattern. However, reinforcement learning's heavy dependence on the environment means the process must be re-simulated tens of thousands of times whenever the model is updated, which increases the difficulty.

2.4. Model-Agnostic Meta-Learning (MAML)

Model-agnostic meta-learning (MAML) is a classic meta-learning algorithm: it is independent of the model architecture, so we can regard it as a framework in which the model learns how to learn. For example, embedding another deep learning method, such as reinforcement learning, can improve prediction accuracy. MAML performs classification training through multiple small subtasks and generates an initial model that can quickly adapt to new tasks, as shown in Figure 3. Because of its fast adaptability and model-independent advantages, we apply it to the cloud anomaly detection system: in addition to the self-correction ability of reinforcement learning, it can also adapt quickly to rapid changes in the cloud environment.

2.5. Ensemble Learning

Ensemble learning combines several models or learners to improve overall prediction performance and accuracy. The individual models are typically weak learners, i.e., models whose predictions are only slightly better than random guessing. The main idea is to combine several weak learners into a powerful ensemble model that improves overall prediction accuracy and generalization ability. There are four ensemble learning types: bagging, boosting, blending, and stacking. The advantage of ensemble learning is that it effectively combines the strengths of several models, reduces the risk of overfitting, and improves the stability and accuracy of the overall prediction. Ensemble learning is widely used to solve machine-learning problems, including feature selection, anomaly detection, regression, and classification, and has been proven in practice to improve prediction performance significantly, gaining widespread attention in industry. This study adopts the blending method of ensemble learning, as shown in Figure 4.

2.6. Hidden Markov Model (HMM)

A Markov chain is a mathematical model used to infer the probability of a sequence of events (different states) occurring one after another, because the state that occurs earlier affects the state that occurs later. Figure 5 shows the flow chart of the hidden Markov model (HMM) [20], which finds hidden influencing factors that can predict the results. In Figure 5, x(t) is a hidden state or hidden variable. The observer cannot observe this variable directly; it represents some underlying factor that affects the results. y(t) is the observation state or observation variable, which is what we actually observe. For example, if we toss a coin five times in a row, the heads-or-tails results of the five tosses are the observed states, while in each toss, the strength and direction of the hand holding the coin, the wind speed of the air, and so on are hidden states.
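As a small illustration of how an HMM can score a usage series, the sketch below fits a two-state Gaussian HMM with the hmmlearn library (an assumption; the paper does not name its HMM implementation) and uses the model's log-likelihood as an anomaly indicator. The data, state count, and thresholding strategy are placeholders.

```python
import numpy as np
from hmmlearn.hmm import GaussianHMM

usage = np.random.rand(2000, 1)                 # placeholder CPU-usage series

model = GaussianHMM(n_components=2, covariance_type="diag", n_iter=100)
model.fit(usage)

hidden_states = model.predict(usage)            # most likely hidden-state path
avg_log_likelihood = model.score(usage) / len(usage)
# Windows whose average log-likelihood falls far below this baseline can be
# treated as candidate anomalies.
```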

2.7. Variational AutoEncoder (VAE)

The original goal of an autoencoder is to train a deep network through dimensionality reduction (the encoder) and dimensionality restoration (the decoder). The ultimate goal is to find the critical dimensions while keeping the input and output layers as close as possible. A simple autoencoder still has performance limitations and may not restore the dimensions well. Therefore, the VAE adds noise to the autoencoder through sampling from a normal distribution to improve the results, as shown in Figure 6.
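The following minimal PyTorch sketch shows this structure: the encoder outputs a mean and log-variance, a latent vector is sampled with normally distributed noise (the reparameterization trick), and the decoder reconstructs the window. The layer sizes and window length are illustrative assumptions.

```python
import torch
import torch.nn as nn

class VAE(nn.Module):
    def __init__(self, window=25, latent=8):
        super().__init__()
        self.enc = nn.Sequential(nn.Linear(window, 64), nn.ReLU())
        self.mu = nn.Linear(64, latent)
        self.logvar = nn.Linear(64, latent)
        self.dec = nn.Sequential(nn.Linear(latent, 64), nn.ReLU(),
                                 nn.Linear(64, window))

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization: sample the latent vector with Gaussian noise.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def vae_loss(x, recon, mu, logvar):
    # Reconstruction term + KL divergence to the standard normal prior.
    rec = nn.functional.mse_loss(recon, x, reduction="sum")
    kld = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kld

x = torch.rand(16, 25)
recon, mu, logvar = VAE()(x)
loss = vae_loss(x, recon, mu, logvar)
```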

2.8. Temporal Convolutional Autoencoder (TCN-AE)

In Figure 7, the temporal convolutional autoencoder (TCN-AE) [21] is primarily used for unsupervised anomaly detection [22] in machine health monitoring and is an architecture designed specifically for handling time series data. The temporal convolution network (TCN) layers enhance the extraction of temporal dependencies and improve feature representation through dimensionality reduction and non-linear processing. These refined features are then used to train a fully connected neural network, enabling precise classification and robust anomaly detection. The distinct advantage of the TCN-AE lies in its robust feature encoding and processing capabilities, which are essential for effective unsupervised anomaly detection. The model identifies subtle irregularities, enabling early diagnostics and interventions that are vital for maintaining optimal machine performance and extending equipment life.
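A rough sketch of the idea follows, using dilated 1-D convolutions as the temporal layers, a small bottleneck for dimensionality reduction, and a convolutional decoder for reconstruction. This is a simplified stand-in for the TCN-AE of [21], not its exact architecture.

```python
import torch
import torch.nn as nn

class TinyTCNAE(nn.Module):
    def __init__(self, channels=1):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Conv1d(channels, 16, kernel_size=3, padding=1, dilation=1), nn.ReLU(),
            nn.Conv1d(16, 16, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(16, 4, kernel_size=1),          # bottleneck (dim. reduction)
        )
        self.decoder = nn.Sequential(
            nn.Conv1d(4, 16, kernel_size=3, padding=2, dilation=2), nn.ReLU(),
            nn.Conv1d(16, channels, kernel_size=3, padding=1),
        )

    def forward(self, x):          # x: (batch, channels, time)
        return self.decoder(self.encoder(x))

x = torch.rand(8, 1, 25)           # eight windows of 25 time steps
recon = TinyTCNAE()(x)             # reconstruction error again scores anomalies
```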

2.9. Bidirectional LSTM (BLSTM)

Forward data represent the input in sequential order and backward data in reverse order. Two long short-term memory (LSTM) models are trained simultaneously on the forward and reverse data, and the final output y is obtained by averaging the predictions of the two models. In Figure 8, after data enter the model, c0 is the long-term memory carried over from the previous time step, h0 is the prediction result (hidden state) of the previous time step, and S1 is the input signal of the current time step. c0 first passes through a forget gate, represented by ⊗, where its forgetting proportion is determined by the value computed from h0 and S1 through the sigmoid activation function. After that, h0 and S1 pass through a memory gate, also represented by ⊗, where tanh and S1 determine which information to memorize. The information passing through the memory gate is added to c0 to become c1. c1 then passes through tanh to decide whether to contribute to the output of the current cell; if so, the value processed by the sigmoid function together with the current input S1 becomes the output of the current cell.
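A minimal PyTorch sketch of a bidirectional LSTM classifier over resource-usage windows is shown below; the feature count (CPU and memory), hidden size, and window length are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BLSTMClassifier(nn.Module):
    def __init__(self, n_features=2, hidden=32):
        super().__init__()
        # bidirectional=True runs one LSTM forward and one backward in time.
        self.lstm = nn.LSTM(n_features, hidden, batch_first=True,
                            bidirectional=True)
        self.head = nn.Linear(2 * hidden, 2)   # forward + backward states

    def forward(self, x):                      # x: (batch, time, features)
        out, _ = self.lstm(x)
        return self.head(out[:, -1, :])        # logits: normal (0) vs. abnormal (1)

windows = torch.rand(16, 25, 2)                # 16 windows, 25 steps, CPU + memory
logits = BLSTMClassifier()(windows)
```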

3. Methods

3.1. Anomalous Analysis of Computing Resource

The object of this study is the cluster that S-company uses for uploading, searching, and computing online packaging and testing production line data. The cluster uses the Zabbix Server tool to monitor its state. Zabbix Server is a monitoring tool that tracks the utilization rate of various services in the data center and exports the monitoring data to CSV files for real-time storage. Data collection first gathered the resource usage data of the S-company application service from 2 February 2021 to 10 March 2021. The Oplus application is an essential service on the packaging and testing production line, and its computing resources combine several virtual machines. Observing the operation of the application service, we found that the memory switches once its utilization rate reaches a peak, and CPU usage also peaks immediately after the switch. After the switch, memory usage maintains an upward trend for about 20 days, as shown in Figure 9. Figure 9a plots the CPU usage trend, and Figure 9b plots the memory usage trend.
The hard disk queue length is also an important indicator: when it exceeds 2, the hard disk has too many tasks waiting to be processed. CPU utilization fluctuates between high and low while the Oplus application runs, and the hard disk queue length grows, as shown in Figure 10. Therefore, it is hard to use a threshold on the hard disk queue length to determine anomalies in such a complex trend.

3.2. Abnormal Data Labeling

Data are collected every three minutes and include CPU, memory, disk queue, and other metrics from virtual machines denoted VM1, VM2, VM3, VM4, etc. We mainly observe the system CPU and memory usage as the main features, as shown in Figure 11. We label the actual anomalies with a fixed-width window and apply the local normalization method [23] to the data in the sliding window to expose sudden changes. We mark the label as 1 for data in an abnormal interval and 0 for the other, normal data; as long as there is an abnormal point in a window, the window may be judged abnormal, as shown in Figure 12. We want the system to issue a warning according to this binary label when the detection model encounters anomaly precursors.
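The sketch below illustrates this labeling scheme: each fixed-width window is locally normalized and labeled 1 if it contains at least one anomalous point. The window size, column names, and the per-sample anomaly_point flag are illustrative assumptions.

```python
import numpy as np
import pandas as pd

def make_windows(df, size=25, cols=("cpu", "memory")):
    """Return locally normalized windows and a 0/1 label per window."""
    windows, labels = [], []
    values = df[list(cols)].to_numpy()
    flags = df["anomaly_point"].to_numpy()        # per-sample 0/1 ground truth
    for start in range(0, len(df) - size + 1):
        w = values[start:start + size]
        # Local normalization within the window to expose sudden changes.
        w = (w - w.min(axis=0)) / (w.max(axis=0) - w.min(axis=0) + 1e-8)
        windows.append(w)
        labels.append(int(flags[start:start + size].any()))
    return np.stack(windows), np.array(labels)

# Hypothetical usage with synthetic data.
df = pd.DataFrame({"cpu": np.random.rand(200),
                   "memory": np.random.rand(200),
                   "anomaly_point": np.random.binomial(1, 0.02, 200)})
X, y = make_windows(df)
```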

3.3. Meta-Feature Extraction

This study performs further feature extraction on the time series. The original time series has a complex feature space, which leads to poor training results for reinforcement learning. We adopt the time series feature extraction library (TSFEL) to extract transferable meta-features of the time series. First, Python Pandas reads the virtual machine CPU and memory usage data with sequence numbers denoted VM1, VM2, VM3, VM4, etc. Then, TSFEL extracts statistical-domain features, e.g., maximum, minimum, and gradient. We can observe the various values and line charts, as shown in Figure 13.
Since the meta-learning process spans data sets with different attributes, we need to extract features with low dependence on any particular data set so that they do not become invalid when the data change. We integrated several unsupervised learning models as preprocessing modules to extract transferable meta-features from different data and evaluate the degree of abnormality more accurately.
This step uses the One-Class SVM (OCSVM) and Isolation Forest (iForest) unsupervised anomaly detection models integrated in the PyOD suite and takes the anomaly scores output by the models as a status map of the data set. In addition, we add the distance between each sample and the abnormal data as a feature. After meta-feature extraction, we save the result as a CSV file for meta-learning training. This work uses Anaconda [24] to establish a virtual environment and uses CUDA [25], cuDNN [26], and an RTX3080 to accelerate the training process. We use Python as the main deep learning development language and Visual Studio Code as the primary code editor.
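A sketch of this meta-feature extraction step is shown below, combining TSFEL statistical-domain features with iForest and OCSVM anomaly scores from PyOD. The synthetic usage data, window size, and output file name are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import tsfel
from pyod.models.iforest import IForest
from pyod.models.ocsvm import OCSVM

# Placeholder usage series standing in for a Zabbix CSV export.
df = pd.DataFrame({"cpu": np.random.rand(500), "memory": np.random.rand(500)})

# TSFEL statistical-domain features, extracted per sliding window.
cfg = tsfel.get_features_by_domain("statistical")
stat_features = tsfel.time_series_features_extractor(
    cfg, df[["cpu", "memory"]], window_size=25)
stat_features = stat_features.fillna(0)         # guard against empty features

# Unsupervised anomaly scores appended as additional meta-features.
iforest, ocsvm = IForest(), OCSVM()
iforest.fit(stat_features)
ocsvm.fit(stat_features)

meta = stat_features.copy()
meta["iforest_score"] = iforest.decision_scores_   # outlier scores on the fit data
meta["ocsvm_score"] = ocsvm.decision_scores_
meta.to_csv("meta_features_vm1.csv", index=False)  # hypothetical output file
```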

3.4. Meta-Reinforcement Learning

Meta-reinforcement learning uses the RLAD method [27] (time series anomaly detection through reinforcement learning and active learning) to establish an ensemble learning environment, as shown in Figure 14. First, we use a fixed sliding window to capture data as the environment state in the time series; second, we input the environment state into the policy network EvalN, which outputs the agent's decision over two discrete actions: 0 (indicating a normal state) and 1 (indicating an abnormal state), and a probability ε determines whether the agent follows the suggestion made by the policy network. The agent's output decision is used to calculate the reward value. Finally, a memory module (Replay Memory) stores the transition <s, a, r, s'>, as shown in Figure 15.
For ensemble meta-learning, we regard the environment as a small classification task and slightly modify the ensemble learning environment to serve as a subtask of meta-learning. Because the static ensemble learning environment changes only through single interactions, this study establishes multiple environments trained concurrently by numerous execution threads. The action is the agent's judgment of whether the data are abnormal. Ensemble meta-learning sets the reward so that when the meta-policy correctly selects abnormal data, it receives a reward of 1; a misjudgment receives a small negative reward of −0.1; and correctly predicted normal data receive a reward of 0, which encourages the system to find more true abnormal instances. The experiment uses the HMM, VAE, TCN-AE, and BLSTM models to establish the policy network. The policy network receives the environment state, outputs a probability distribution, and then samples an action from that distribution. After each task runs, meta-reinforcement learning stores the state, action, and reward in Replay Memory.
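A highly simplified sketch of such an environment, written against the classic OpenAI Gym interface, is shown below: the state is the current feature window, the action is 0 (normal) or 1 (abnormal), and the reward follows the scheme described above. The data shapes and episode layout are assumptions.

```python
import gym
import numpy as np
from gym import spaces

class AnomalyEnv(gym.Env):
    def __init__(self, windows, labels):
        super().__init__()
        self.windows, self.labels, self.idx = windows, labels, 0
        self.action_space = spaces.Discrete(2)          # 0 = normal, 1 = abnormal
        self.observation_space = spaces.Box(0.0, 1.0, shape=windows.shape[1:])

    def reset(self):
        self.idx = 0
        return self.windows[self.idx]

    def step(self, action):
        label = self.labels[self.idx]
        if action == 1 and label == 1:
            reward = 1.0            # correctly flagged anomaly
        elif action != label:
            reward = -0.1           # misjudgment
        else:
            reward = 0.0            # correctly predicted normal
        self.idx += 1
        done = self.idx >= len(self.windows)
        obs = self.windows[min(self.idx, len(self.windows) - 1)]
        return obs, reward, done, {}

# Hypothetical usage with synthetic windows and labels.
env = AnomalyEnv(np.random.rand(100, 25).astype(np.float32),
                 np.random.binomial(1, 0.05, 100))
obs = env.reset()
obs, reward, done, info = env.step(env.action_space.sample())
```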

3.5. MAML-RL Algorithms

Model-Agnostic Meta-Learning (MAML) is implemented with the learn2learn [28] suite in Python. First, the inner loop arranges multiple execution threads to run several reinforcement learning environments simultaneously, giving each environment a task and a policy network (θ_rl). Next, the state, action, reward, and loss generated in each environment, together with the updated policy network parameters (θ_rl), are stored in the Iteration Replay. In the outer loop, we choose trust region policy optimization (TRPO) [9] to find the best strategy. TRPO reuses trajectories from the current policy to increase sample efficiency and ensures that policy changes during training do not destabilize learning. The TRPO algorithm also delineates a trusted policy learning region to guarantee the stability and effectiveness of policy learning, as shown in Figure 16.
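The skeleton below sketches this inner/outer-loop structure with the learn2learn MAML wrapper, using the meta batch size of 10 from Table 1, the 5 adaptation steps and inner learning rate of 0.1 from Section 4.3, and two hidden layers of 100 neurons. For brevity it uses a supervised surrogate loss on a toy task sampler and a plain Adam outer step, whereas the paper's outer loop applies TRPO to policy rollouts.

```python
import torch
import learn2learn as l2l

def sample_task(n=64, window=25):
    # Toy task sampler: random windows with synthetic "anomaly" labels.
    x = torch.rand(n, window)
    y = (x.mean(dim=1) > 0.5).long()
    return (x[:32], y[:32]), (x[32:], y[32:])        # support / query split

policy = torch.nn.Sequential(torch.nn.Linear(25, 100), torch.nn.ReLU(),
                             torch.nn.Linear(100, 100), torch.nn.ReLU(),
                             torch.nn.Linear(100, 2))
maml = l2l.algorithms.MAML(policy, lr=0.1)           # inner-loop learning rate
meta_opt = torch.optim.Adam(maml.parameters(), lr=1e-3)
loss_fn = torch.nn.CrossEntropyLoss()

for iteration in range(100):
    meta_opt.zero_grad()
    for task in range(10):                           # meta batch size
        (xs, ys), (xq, yq) = sample_task()
        learner = maml.clone()                       # per-task copy of the policy
        for _ in range(5):                           # adaptation steps
            learner.adapt(loss_fn(learner(xs), ys))
        loss_fn(learner(xq), yq).backward()          # accumulate meta-gradients
    meta_opt.step()                                  # outer update (TRPO in the paper)
```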

3.6. Combining Ensemble Learning with MAML-RL Learning

This study selects only four base learners, namely HMM, VAE, TCN-AE, and BLSTM, to construct the ensemble meta-learning in the experiments. Although other applicable base learners exist, such as ARIMA, the sparse transformer, and XLNet, their results were not sound for the prediction task in this study. This study integrates the concept of ensemble learning into the MAML-RL framework and trains four different MAML models, one for each of HMM, VAE, TCN-AE, and BLSTM. The final result is obtained by weighted-average voting over the outputs of the four models according to their weights, as shown in Figure 17.
BLSTM does not perform well in some iterations (or at some time instants), where HMM or VAE might achieve better prediction accuracy. Therefore, the overall result achieves the best prediction accuracy through weighted-average voting over the outputs of the four models. In ensemble learning, Equation (1) calculates the weight ω_k of each learner, where k denotes the k-th learner, te_k is the training error of the k-th learner obtained from the MAML phase, and m is the number of all learners. Next, Equation (2) computes the weighted average out_ensemble of the learners' outputs, where out_k is the output of the k-th learner obtained from the MAML phase:
ω_k = 1 − te_k / Σ_{k=1}^{m} te_k,  where k = 1, 2, …, m;  Σ_{k=1}^{m} ω_k = 1   (1)

out_ensemble = Σ_{k=1}^{m} ω_k · out_k   (2)
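A minimal numpy sketch of this weighted blending follows; the training errors and per-window learner outputs are placeholders, and the raw weights are additionally normalized so that they sum to 1, consistent with the constraint stated in Equation (1).

```python
import numpy as np

# Placeholder training errors te_k for HMM, VAE, TCN-AE, BLSTM.
te = np.array([0.25, 0.18, 0.15, 0.10])
w = 1.0 - te / te.sum()                               # lower error -> larger weight
w = w / w.sum()                                       # normalize so the weights sum to 1

# out_k: each learner's anomaly probability for two example windows.
outputs = np.array([[0.9, 0.2],    # HMM
                    [0.8, 0.3],    # VAE
                    [0.7, 0.1],    # TCN-AE
                    [0.6, 0.2]])   # BLSTM
out_ensemble = (w[:, None] * outputs).sum(axis=0)     # Equation (2)
print(out_ensemble)                                   # blended anomaly scores
```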

3.7. Training the Meta-Reinforcement Model

This step first uses the OpenAI Gym suite to register the resource utilization environment (Section 3.5) as the experiment's environment and sets the inner loop's hyperparameters, as listed in Table 1. Each iteration writes a log file recording the reward, the accuracy, and the current model, and TensorBoard is used to observe the trend of the training process, as shown in Figure 18 and Figure 19.

3.8. Policy Network Evaluation

During the training process, we can observe the cumulative reward value of each iteration and adjust the reward and punishment mechanism of the environment by observing the trend, as shown in Figure 20. We use precision, recall, and F1-score to evaluate whether the model can be deployed to a cloud server at this stage, as shown in Equations (3)–(5). The initialized model first receives the stream data of the target device offline and then performs a few gradient-update steps; finally, we evaluate whether the test loss converges and deploy the model to the target device.
Precision = TP / (TP + FP)   (3)

Recall = TP / (TP + FN)   (4)

F1 = 2 × (Precision × Recall) / (Precision + Recall)   (5)
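These metrics can be computed directly, for example with scikit-learn; the window labels below are hypothetical.

```python
from sklearn.metrics import precision_score, recall_score, f1_score

y_true = [0, 0, 1, 1, 0, 1, 0, 1]   # hypothetical window labels
y_pred = [0, 1, 1, 1, 0, 0, 0, 1]   # hypothetical model decisions
print(precision_score(y_true, y_pred),   # TP / (TP + FP)
      recall_score(y_true, y_pred),      # TP / (TP + FN)
      f1_score(y_true, y_pred))          # harmonic mean of the two
```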

4. Experimental Results and Discussion

We conducted three experiments in this study: length of adaptation step pick-up, training time estimation, and performance evaluation. The length of adaptation step pick-up experiment trains with different adaptation step sizes, observes the total reward feedback in the process, and checks the feasibility of online adaptation. Training time estimation measures the time consumed during each training phase and examines which model has an advantage in each phase. Performance evaluation computes the overall anomaly detection performance of the models and identifies the best-performing semi-supervised model for anomaly detection.

4.1. Experimental Setting

Table 2 lists the hardware specifications of the GPU workstation, and Table 3 lists the software packages of the runtime environment. This study uses Conda and Python as the environment for deep learning model training, and CUDA with the CUDA Toolkit for GPU operations. In addition, several Python packages and tools support different needs in deep learning modeling. PyTorch is commonly used to build and train deep learning models. cuDNN is designed for efficient deep neural network operations and is compatible with deep learning frameworks such as PyTorch. TensorBoard tracks and visualizes various indicators in machine learning experiments to help analyze and understand the model training process. Matplotlib is a Python plotting library for creating static, animated, and interactive visualizations. Torchsummary shows the structure and number of parameters of a PyTorch model. PyOD supports multiple anomaly detection algorithms to facilitate anomaly analysis and processing.

4.2. Data Collection

The data set comes from the Oplus application, which S-company uses to store online packaging and testing production line data. Output is produced whenever new data generated by the production machines are logged or when staff query the data. At present, exceptions often occur in the ICT equipment, causing delays when users request data for manufacturing analysis. We collected training data on CPU resource usage from 1 May 2021 to 31 August 2021 and test data from 1 September 2021 to 18 September 2021, as shown in Figure 21.
We set the sliding window size to 25, normalized the data to the [0, 1] interval, and stored the result as a CSV file after meta-feature extraction and before training. During model training, the experiment divided the data set into a training set and a testing set at a ratio of 8:2, and we extracted 10% of the training set as the validation set.

4.3. Length of Adaptation Step Pick-Up

The experiment tests the effect of different adaptation step counts on the reward. We set the learning rate of the inner loop to 0.1 and the learning rate of the outer loop to 0.8. The network consists of two layers of 100 neurons each. We test the reward with 1, 3, and 5 adaptation steps, as shown in Figure 22. A longer adaptation step obtains a better reward, but the training and inference times become longer. This study uses 5 adaptation steps, for which the training and inference times remain acceptable.

4.4. Training Time Estimation

This experiment uses four semi-supervised deep learning algorithms, namely the hidden Markov model (HMM), variational autoencoder (VAE), temporal convolutional autoencoder (TCN-AE), and bidirectional long short-term memory (BLSTM) [29], and builds an ensemble meta-reinforcement learning (EMRL) model that integrates them. In the experiment, the adaptive time is the model training time, divided into the seconds spent in the first training phase on the initial task and the time spent in the second training phase re-adapting to the new task. The first training phase used 31,920 records of hardware utilization from the S-company Oplus application, from 1 May 2021 to 18 September 2021, as the prior knowledge for initializing meta-learning. The 8331 records from 1 September 2021 to 18 September 2021 were used as the test data for the second training phase. We stop counting the adaptation time when the F1-score of the model trained in the second phase reaches 0.65. Table 4 compares the time consumed in the first training phase (initial task) and the second training phase (adapting to the new task).

4.5. Performance Evaluation

For cloud services, the experiment evaluates the performance of time series anomaly detection using the precision, recall, and F1-score indices. This study chooses the hidden Markov model (HMM), variational autoencoder (VAE), temporal convolutional autoencoder (TCN-AE), and bidirectional long short-term memory (BLSTM), semi-supervised deep learning algorithms often applied to anomaly detection in time series. Compared with these four single deep learning methods, the proposed ensemble meta-reinforcement learning (EMRL) outperforms them all, as listed in Table 5. For the abnormal alarm threshold, we use the number of abnormal windows occurring in a fixed interval as the criterion: if the number of abnormal windows in a specific interval exceeds the threshold, we judge it as an anomaly and issue an alarm. In this way, we prevent overly sensitive false alarms triggered by individual abnormal windows.
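A small sketch of this alarm rule follows; the interval length and threshold are illustrative assumptions.

```python
def should_alarm(window_flags, interval=20, threshold=5):
    """Raise an alarm only when enough abnormal windows fall in the last interval."""
    recent = window_flags[-interval:]          # most recent fixed interval
    return sum(recent) > threshold

flags = [0, 0, 1, 0, 1, 1, 0, 1, 1, 1, 0, 1]   # 1 = window judged abnormal
print(should_alarm(flags, interval=10, threshold=5))
```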

4.6. Discussion

The adaptation step experiment shows that the adaptive step size can be increased during the training phase, but the reward does not always increase with larger step sizes. The experiment also shows that the model gradually converges at about the 40th iteration during the training phase, so the number of iterations can be reduced to speed up training. According to the adaptation times of the different models, the ensemble meta-reinforcement learning proposed in this study consumes the most time in the first training phase. Nevertheless, the ensemble meta-reinforcement learning model requires the least adaptation time when new tasks arrive or the model needs to be updated, which shows that our method can quickly adapt to a changed data set. Finally, the performance evaluation shows that our proposed approach outperforms the other methods. Compared with the hidden Markov model, the proposed ensemble meta-reinforcement learning improves the precision of anomaly prediction by around 2.4 times, and we do not need an overly complex neural network design to achieve anomaly detection. In Table 4, the proposed EMRL approach shortens the second-phase model deployment time by 5.8 times on average because the meta-learner can immediately be applied to new tasks. Like a pre-trained model, a meta-learner can quickly perform few-shot training to obtain an optimal model for a new task.
This experiment faces certain limitations. In the study of anomaly monitoring within server resources, the completeness of the data and an adequate timeframe are crucial for ensuring the reliability and accuracy of the experimental results. Insufficient data can undermine statistical analyses, fail to reveal meaningful trends, and impede accurate anomaly detection. Moreover, while sliding window techniques are highly effective for processing time series data, careful window size selection is essential to prevent excessive smoothing or overlooking significant variations in the data. Too small a window might result in too much noise, whereas too large a window could obscure vital anomaly signals. Thus, choosing an appropriate window size is critical for enhancing anomaly detection’s sensitivity and specificity. In other words, the precision in selecting the window size plays a pivotal role in the effectiveness of anomaly monitoring strategies.

5. Conclusions

A traditional single model, such as the hidden Markov model (HMM), variational autoencoder (VAE), temporal convolutional autoencoder (TCN-AE), or bidirectional long short-term memory (BLSTM), cannot detect the failure caused by anomalous traffic in cloud services in a timely manner, resulting in a high risk of unexpected system shutdown. In this study, the proposed ensemble meta-reinforcement learning (EMRL) achieves an online, real-time anomaly prediction model that rapidly adapts to the time-varying cloud environment and quickly and precisely detects exceptions in server resource utilization traffic. In summary, the proposed EMRL significantly improves anomaly prediction accuracy and considerably shortens model deployment for a new task.
In future work, we can try more advanced methods in data feature extraction and optimizer design to further improve the model's performance. The proposed algorithm can be combined with cluster computing technology to enhance system effectiveness and handle big data applications, accelerating the training process by mobilizing the idle resources of different nodes to execute a given algorithm. In the future, we will extend the analysis from a single CPU to multiple CPUs in a cluster computing system and try different anomaly-labeling methods for detection.

Author Contributions

B.R.C. and G.-R.C. conceived and designed the experiments; H.-F.T. collected the data set and proofread the manuscript; and B.R.C. wrote the paper. All authors have read and agreed to the published version of the manuscript.

Funding

This work is fully supported by the Ministry of Science and Technology, Taiwan, Republic of China, under grant numbers NSTC 112-2622-E-390-001 and NSTC 112-2221-E-390-017.

Data Availability Statement

The sample code (Sample Code.zip) used to support the findings of this study is available at https://drive.google.com/file/d/1E-5JedPC0IMhE8g_lOJWInA1P5ZqRoK3/view?usp=sharing (accessed on 19 May 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Kore, V.; Khadse, V. Progressive Heterogeneous Ensemble Learning for Cancer Gene Expression Classification. In Proceedings of the 2022 International Conference on Machine Learning, Computer Systems and Security (MLCSS), Bhubaneswar, India, 21–23 January 2022; pp. 149–153.
2. Wu, T.; Ortiz, J. RLAD: Time Series Anomaly Detection Through Reinforcement Learning and Active Learning. In Proceedings of the 2021 International Conference on Machine Learning and Data Mining (MLDM), New York, NY, USA, 15–18 July 2021; pp. 123–130.
3. Vanschoren, J. Meta-Learning. In Automated Machine Learning; Springer: Cham, Switzerland, 2019; pp. 35–61.
4. Zhang, S.; Ye, F.; Wang, B.; Habetler, T.G. Few-Shot Bearing Anomaly Detection via Model-Agnostic Meta-Learning. In Proceedings of the International Conference on Electrical Machines and Systems (ICEMS), Hamamatsu, Japan, 24–27 November 2020; pp. 1341–1346.
5. Olups, R. Zabbix 1.8 Network Monitoring; Packt Publishing Ltd.: Birmingham, UK, 2010.
6. Barandas, M.; Folgado, D.; Fernandes, L.; Santos, S.; Abreu, M.; Bota, P.; Liu, H.; Schultz, T.; Gamboa, H. TSFEL: Time Series Feature Extraction Library. SoftwareX 2020, 11, 100456.
7. Zhao, Y.; Nasrullah, Z.; Li, Z. PyOD: A Python Toolbox for Scalable Outlier Detection. J. Mach. Learn. Res. 2019, 20, 1–7.
8. Finn, C.; Abbeel, P.; Levine, S. Model-Agnostic Meta-Learning for Fast Adaptation of Deep Networks. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017.
9. Schulman, J.; Levine, S.; Abbeel, P.; Jordan, M.; Moritz, P. Trust Region Policy Optimization. In Proceedings of the 32nd International Conference on Machine Learning (ICML 2015), Lille, France, 6–11 July 2015; pp. 1889–1897.
10. Alonso, J.; Torres, J.; Gavaldà, R. Predicting Web Server Crashes: A Case Study in Comparing Prediction Algorithms. In Proceedings of the 2009 Fifth International Conference on Autonomic and Autonomous Systems, Valencia, Spain, 20–25 April 2009; pp. 264–269.
11. Zou, Z.; Ai, J. Online Prediction of Server Crash Based on Running Data. In Proceedings of the 2020 IEEE 20th International Conference on Software Quality, Reliability and Security Companion (QRS-C), Macau, China, 11–13 December 2020; pp. 7–14.
12. Xue, Z.; Dong, X.; Ma, S.; Dong, W. A Survey on Failure Prediction of Large-Scale Server Clusters. In Proceedings of the Eighth ACIS International Conference on Software Engineering, Artificial Intelligence, Networking, and Parallel/Distributed Computing (SNPD 2007), Qingdao, China, 30 July–1 August 2007; pp. 733–738.
13. Farshchi, M.; Schneider, J.; Weber, I.; Grundy, J. Metric Selection and Anomaly Detection for Cloud Operations Using Log and Metric Correlation Analysis. J. Syst. Softw. 2018, 137, 531–549.
14. Zhang, K.; Xu, J.; Min, M.R.; Jiang, G.; Pelechrinis, K.; Zhang, H. Automated IT System Failure Prediction: A Deep Learning Approach. In Proceedings of the 2016 IEEE International Conference on Big Data (Big Data), Washington, DC, USA, 5–8 December 2016; pp. 1291–1300.
15. Zha, D.; Lai, K.-H.; Wan, M.; Hu, X. Meta-AAD: Active Anomaly Detection with Deep Reinforcement Learning. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 17–20 November 2020.
16. Wu, H.-S. A Survey of Research on Anomaly Detection for Time Series. In Proceedings of the 2016 13th International Computer Conference on Wavelet Active Media Technology and Information Processing (ICCWAMTIP), Chengdu, China, 17–19 December 2016; pp. 426–431.
17. Aygun, R.C.; Yavuz, A.G. Network Anomaly Detection with Stochastically Improved Autoencoder Based Models. In Proceedings of the 2017 IEEE 4th International Conference on Cyber Security and Cloud Computing (CSCloud), New York, NY, USA, 26–28 June 2017; pp. 201–206.
18. Zhou, C.; Paffenroth, R.C. Anomaly Detection with Robust Deep Autoencoders. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2017), Halifax, NS, Canada, 13–17 August 2017; pp. 665–674.
19. Tien, C.-W.; Huang, T.-Y.; Chen, P.-C.; Wang, J.-H. Using Autoencoders for Anomaly Detection and Transfer Learning in IoT. Computers 2021, 10, 88.
20. Rabiner, L.; Juang, B. An Introduction to Hidden Markov Models. IEEE ASSP Mag. 1986, 3, 4–16.
21. Thill, M.; Konen, W.; Wang, H.; Bäck, T. Temporal Convolutional Autoencoder for Unsupervised Anomaly Detection in Time Series. Appl. Soft Comput. 2021, 112, 107751.
22. Xu, H.; Feng, Y.; Chen, J.; Wang, Z.; Qiao, H.; Chen, W.; Zhao, N.; Li, Z.; Bu, J.; Li, Z.; et al. Unsupervised Anomaly Detection via Variational Auto-Encoder for Seasonal KPIs in Web Applications. In Proceedings of the 2018 World Wide Web Conference (WWW '18), Lyon, France, 23–27 April 2018.
23. Saurav, S.; Malhotra, P.; TV, V.; Gugulothu, N.; Vig, L.; Agarwal, P.; Shroff, G. Online Anomaly Detection with Concept Drift Adaptation Using Recurrent Neural Networks. In Proceedings of the ACM India Joint International Conference on Data Science and Management of Data, Goa, India, 12–14 January 2018; pp. 78–87.
24. Anaconda Software Distribution. Anaconda Documentation; Anaconda Inc.: Austin, TX, USA, 2020.
25. Vingelmann, N.P.; Fitzek, F.H.P. CUDA, Release: 11.6; NVIDIA Corporation: Santa Clara, CA, USA, 2020.
26. Chetlur, S.; Woolley, C.; Vandermersch, P.; Cohen, J.; Tran, J.; Catanzaro, B.; Shelhamer, E. cuDNN: Efficient Primitives for Deep Learning. arXiv 2014, arXiv:1410.0759.
27. Bousdekis, A.; Lepenioti, K.; Apostolou, D.; Mentzas, G. A Review of Data-Driven Decision-Making Methods for Industry 4.0 Maintenance Applications. Electronics 2021, 10, 828.
28. Arnold, S.M.R.; Mahajan, P.; Datta, D.; Bunner, I.; Zarkias, K.S. learn2learn: A Library for Meta-Learning Research. arXiv 2020, arXiv:2008.12284.
29. Gupta, S.; Dinesh, D.A. Resource Usage Prediction of Cloud Workloads Using Deep Bidirectional Long Short Term Memory Networks. In Proceedings of the 2017 IEEE International Conference on Advanced Networks and Telecommunications Systems (ANTS), Bhubaneswar, India, 18–21 December 2017.
Figure 1. Autoencoder architecture where different colors represent different layers.
Figure 2. Reinforcement Learning where different colors represent different input signals.
Figure 3. MAML Learning.
Figure 4. Ensemble learning.
Figure 5. HMM flow.
Figure 6. VAE flow.
Figure 7. TCN-AE flow.
Figure 8. BLSTM architecture.
Figure 9. Resource usage trend.
Figure 10. Disk queue and usage plot.
Figure 11. Oplus application resource usage and exception time stamp.
Figure 12. Labeling method of exception window.
Figure 13. TSFEL statistical domain characteristics where the red area indicates that the system has crashed at least once.
Figure 14. Reinforcement learning.
Figure 15. Meta-reinforcement learning.
Figure 16. MAML-RL training flow chart.
Figure 17. Combining ensemble learning with the MAML-RL.
Figure 18. Training meta-reinforcement learning.
Figure 19. TensorBoard observing the training process.
Figure 20. Reward trend of reinforcement learning training process.
Figure 21. CPU resource usage line chart.
Figure 22. Reward curves under different adaptation steps.
Table 1. MAML hyperparameter settings.
Hyperparameter | Value
Hidden layer | [128, 128, 128]
Adapt learning rate | 0.5
Number of iterations | 100
Meta batch size | 10
Number of workers | 10
Cuda | 1
Table 2. Hardware specifications of GPU workstation.
Unit | Component
GPU | NVIDIA GeForce RTX3080
CPU | Intel® Xeon® Silver 4208
Memory | 32 GB DDR4
Storage | 256 GB × 1 (SSD), 1 TB × 1 (HDD)
Table 3. Open-source software packages.
Software Package | Version
Conda | 22.9.0
Python | 3.7.10
PyTorch | 1.8.1
CUDA | 11.6
CUDA Toolkit | 11.1
cuDNN | 8.1.0
TensorBoard | 2.4.1
Matplotlib | 3.4.2
Torchsummary | 1.5.1
PyOD | 1.0.7
Table 4. Time consumed in the training phases (unit: kiloseconds).
Method | First Training | Second Training
HMM | 0.965 | 0.0047
VAE | 0.815 | 0.0031
TCN-AE | 0.743 | 0.0027
BLSTM | 0.581 | 0.0011
EMRL | 5.781 | 0.0005
Table 5. Performance evaluation.
Method | Precision | Recall | F1-Score
HMM | 0.323 | 0.412 | 0.362
VAE | 0.401 | 0.508 | 0.448
TCN-AE | 0.511 | 0.632 | 0.565
BLSTM | 0.691 | 0.746 | 0.717
EMRL | 0.781 | 0.841 | 0.810