1. Introduction
Anomaly detection aims to find abnormal behavior in data and is widely studied in many fields, such as fault detection and predictive maintenance in industrial systems [1]. Anomaly detection is important because anomalies usually carry useful and critical information. To cope with the increasing volume of data collected by research institutions and industries through the Internet of Things (IoT), automated procedures that separate anomalies from normal data are essential.
However, anomaly detection is considered a hard problem [2]. The biggest difficulty is the extremely imbalanced data distribution: anomalies make up only a tiny fraction of the data. A detection algorithm that works very well on one benchmark may perform surprisingly poorly on another. Moreover, anomaly detection for time series is considerably more difficult because of characteristics inherent in time-series data. For these reasons, this paper seeks an effective and robust detection algorithm. Many scholars have studied methods that detect abnormal patterns by extracting data features. Anomaly-detection methods mainly fall into three types: statistical modeling [3,4,5,6], such as k-means clustering and random forest methods; temporal feature modeling [7,8,9,10], which is mainly based on LSTM; and spatial feature modeling [11,12,13], which takes advantage of CNNs. Traditionally, time-series anomaly detection has been tackled using distance-based methods, such as the dynamic time warping (DTW) algorithm [14]; with the growing availability of large amounts of data, artificial neural networks have become powerful tools for time-series anomaly detection.
Membrane computing (P systems) [15], a novel branch of natural computing, has gained popularity in recent years owing to promising features such as distribution, uncertainty and, especially, parallelism. P systems were inspired by the structure and function of biological cells and by communication in tissues, organs and cell populations [16,17]. Many variants of P systems have been proposed and combined with optimization approaches [18,19], showing strong convergence and robustness [20,21,22,23,24]. Furthermore, with the development of GPUs, and to make the most of the parallelism of membrane systems, P systems have recently been simulated on GPUs [25]. However, common P systems use simplified membrane structures for computational convenience; solving real applications therefore calls for more complex membrane structures.
Deep learning has become a popular machine learning approach due to its ability to learn high-level representations of data, such as the periodicity and seasonality of time series. These representations are learned automatically from the data, with little or no need for manual feature engineering or domain expertise [26]. For time-series data, LSTM has become the most widely used model because of its ability to learn long-range patterns. LSTM handles variable-length sequences well, but it lacks the ability to extract local contextual information; therefore, a CNN is integrated in this paper. With these considerations in mind, the main aim of this work is to combine P systems and deep-learning approaches to develop a novel framework for time-series anomaly detection. We propose a hybrid dynamic membrane system (HM-AL-CNN) that reduces computation time and takes advantage of ensemble learning in a deep P system. Within this membrane structure, multiple AL-CNNs are run for time-series anomaly detection, each predicting the label of the next timestamp from a window of the time series.
2. Main Contributions
The objective of HM-AL-CNN is to robustly detect point anomalies and discords in time series. To the best of our knowledge, this is the first attempt to solve temporal anomaly-detection tasks with a membrane-system-based approach. Profiting from its parallelism, the proposed P system can run several AL-CNN models with different initializations simultaneously to obtain effective features. For comparison, we evaluate our method on three well-known benchmarks that have been used by many previous approaches. Experimental results show that the proposed method is robust and outperforms state-of-the-art methods. The main contributions of this paper are as follows.
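Running several differently initialized AL-CNNs at the same time can be pictured as an ensemble evaluated in parallel. The Python sketch below is only an illustration under stated assumptions: `make_toy_model` is a hypothetical stand-in for a trained AL-CNN, a thread pool loosely plays the role of the parallel membranes, and averaging the members' anomaly probabilities is an assumed combination rule, since this section does not say how member outputs are merged.

```python
import math
import random
from concurrent.futures import ThreadPoolExecutor


def make_toy_model(seed):
    """Hypothetical stand-in for one trained AL-CNN.

    Each member gets slightly different parameters from its random
    initialization (seed), mimicking differently initialized models.
    """
    rng = random.Random(seed)
    weight = 1.0 + rng.uniform(-0.2, 0.2)
    bias = rng.uniform(-0.1, 0.1)

    def predict_proba(window):
        # Toy score: deviation of the last value from the mean of the history.
        history = window[:-1]
        mean = sum(history) / len(history)
        z = weight * abs(window[-1] - mean) + bias - 3.0
        return 1.0 / (1.0 + math.exp(-z))  # P(abnormal)

    return predict_proba


def ensemble_predict(models, window, threshold=0.5):
    """Evaluate all members concurrently and average their probabilities (assumed rule)."""
    with ThreadPoolExecutor(max_workers=len(models)) as pool:
        probs = list(pool.map(lambda model: model(window), models))
    mean_prob = sum(probs) / len(probs)
    return mean_prob, int(mean_prob >= threshold)


models = [make_toy_model(seed) for seed in range(5)]
print(ensemble_predict(models, [1.0, 1.1, 0.9, 1.0, 9.0]))   # spike: label 1
print(ensemble_predict(models, [1.0, 1.1, 0.9, 1.0, 1.05]))  # normal: label 0
```

Averaging probabilities is only one option; majority voting over the members' hard labels would be an equally plausible reading of the ensemble idea.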
A hybrid dynamic P system that integrates tree-based and graph-based P systems is proposed to solve complex tasks; two types of membrane evolutionary rules are also introduced.
To exploit the strong performance of both P systems and deep-learning methods, CNN and LSTM are integrated into the proposed membrane system for time-series anomaly-detection tasks.
The proposed method employs an LSTM with an attention mechanism; squeeze-and-excitation networks are extended and added to further improve performance.
The proposed approach is evaluated on three well-known benchmarks from different domains and outperforms other detection algorithms.
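To make the two building blocks named in the contributions more concrete, the following minimal pure-Python sketch shows a 1D squeeze-and-excitation block and a simple attention pooling over LSTM hidden states. It is not the authors' implementation: biases are omitted, the reduction ratio and weights are left to the caller, and the attention score is assumed to be a dot product between each hidden state and a learned query vector `u`, which is one common formulation among several.

```python
import math


def se_block_1d(x, w1, w2):
    """1D squeeze-and-excitation (biases omitted for brevity).

    x:  C channels, each a list of T time steps.
    w1: (C // r) x C weights of the bottleneck layer (r = reduction ratio).
    w2: C x (C // r) weights of the expansion layer.
    Returns the recalibrated feature map and the per-channel gates.
    """
    # Squeeze: global average pooling over time, one descriptor per channel.
    z = [sum(ch) / len(ch) for ch in x]
    # Excitation: FC -> ReLU -> FC -> sigmoid yields a gate in (0, 1) per channel.
    hidden = [max(0.0, sum(w * zc for w, zc in zip(row, z))) for row in w1]
    gates = [1.0 / (1.0 + math.exp(-sum(w * h for w, h in zip(row, hidden))))
             for row in w2]
    # Scale: multiply every time step of a channel by that channel's gate.
    return [[v * g for v in ch] for ch, g in zip(x, gates)], gates


def attention_pool(hidden_states, u):
    """Weighted sum of LSTM hidden states (T x D) using dot-product scores with query u."""
    scores = [sum(a * b for a, b in zip(h, u)) for h in hidden_states]
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    alphas = [e / total for e in exps]  # attention weights, summing to 1
    dim = len(hidden_states[0])
    context = [sum(a * h[d] for a, h in zip(alphas, hidden_states)) for d in range(dim)]
    return context, alphas
```

With all-zero excitation weights every gate equals 0.5, and with a zero query every time step receives the same attention weight, which makes both functions easy to sanity-check by hand.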
The rest of this paper is organized as follows. Section 3 gives an overview of the background work and presents the architecture of HM-AL-CNN. Section 4 presents the experimental settings and describes the datasets. Section 5 provides a detailed evaluation of the proposed algorithm on three well-known benchmarks, alongside other popular anomaly-detection methods. Finally, conclusions and directions for future work are laid out in Section 6.
4. Experiments
4.1. Experimental Settings
To evaluate the proposed method, HM-AL-CNN has been tested on the three benchmarks described in Section 4.4. The model was optimized using Adam with an initial learning rate of
and the convolution kernels were initialized with the He initialization scheme [37]; ReLU was used as the activation function for the hidden layers. The number of training epochs was chosen according to the length of the input: for the Yahoo Webscope S5, the model was trained for 500 epochs with a batch size of 128, whereas the models for the Classic Anomaly-Detection Datasets and the Space Shuttle Valve Dataset were trained for 700 epochs with a batch size of 256.
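The He initialization scheme [37] draws weights from a zero-mean normal distribution with variance 2/fan_in, where fan_in is the number of inputs feeding each output unit. The sketch below assumes a PyTorch-style 1D kernel layout of (out_channels, in_channels, kernel_size), so fan_in = in_channels × kernel_size; the dictionary keys and the function name `he_normal_conv1d` are illustrative, and only the epoch and batch-size values come from the text.

```python
import math
import random

# Epochs and batch sizes as reported in Section 4.1 (key names are illustrative).
TRAIN_CONFIG = {
    "yahoo_webscope_s5": {"epochs": 500, "batch_size": 128},
    "classic_anomaly_datasets": {"epochs": 700, "batch_size": 256},
    "space_shuttle_valve": {"epochs": 700, "batch_size": 256},
}


def he_normal_conv1d(out_channels, in_channels, kernel_size, seed=0):
    """He (Kaiming) normal init for a 1D conv kernel: N(0, 2 / fan_in).

    Returns nested lists of shape (out_channels, in_channels, kernel_size).
    """
    fan_in = in_channels * kernel_size
    std = math.sqrt(2.0 / fan_in)  # variance 2 / fan_in suits ReLU activations
    rng = random.Random(seed)
    return [[[rng.gauss(0.0, std) for _ in range(kernel_size)]
             for _ in range(in_channels)]
            for _ in range(out_channels)]


kernel = he_normal_conv1d(out_channels=64, in_channels=16, kernel_size=3)
```

For 16 input channels and a kernel size of 3, fan_in is 48, so the weights have a standard deviation of about 0.204.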
Time-series data need to be transformed into sequences of overlapping windows of size w before they can be fed to the model. For an element at time step t, its condition (normal or abnormal) is used as the label of the preceding w elements; w is the time window size, also called the history window. The data can then be expressed in the form (N, Q, M), where N is the number of samples in the time series, Q is the maximum number of time steps and M is the number of variables; M is set to 1 if the time series is univariate.
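The windowing and labeling rule above can be sketched in a few lines of Python. The function name `make_windows` is hypothetical; the sketch follows the text in using the condition at time step t as the label of the w preceding values, so a univariate series of length n yields n - w samples, each holding w time steps of one variable (M = 1).

```python
def make_windows(series, labels, w):
    """Split a univariate series into overlapping history windows.

    The label of the window series[t - w:t] is the condition at time step t,
    so a series of length n yields n - w samples, each of shape (w, 1).
    """
    if w <= 0 or w >= len(series):
        raise ValueError("window size must satisfy 0 < w < len(series)")
    samples, targets = [], []
    for t in range(w, len(series)):
        samples.append([[value] for value in series[t - w:t]])  # M = 1 variable
        targets.append(labels[t])
    return samples, targets


X, y = make_windows([0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [0, 0, 0, 0, 1, 0], w=3)
# X[1] == [[1.0], [2.0], [3.0]] and y == [0, 1, 0]
```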
In addition, both the training and test datasets are normalized using Equation (20), where x represents the value of the actual time-series data and is mapped to its normalized value. Moreover, we define fixed-size anomaly windows, each centered on an anomaly, and label all points inside an anomaly window as abnormal. For instance, if the anomaly window size is set to 10, the 5 points before and the 5 points after the anomaly are labeled abnormal. This operation is applied only to the training sets; this up-sampling relieves the extreme class imbalance and significantly improves performance, especially recall.
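The anomaly-window up-sampling can be sketched as follows. The helper `expand_anomaly_labels` is hypothetical; it follows the example in the text by marking `window_size // 2` points on each side of every labeled anomaly (a window size of 10 marks 5 points before and 5 after, plus the anomaly itself) and clips the window at the series boundaries, a detail the text does not specify. As described above, it would be applied to training labels only.

```python
def expand_anomaly_labels(labels, window_size=10):
    """Mark every point within window_size // 2 steps of a labeled anomaly as abnormal."""
    half = window_size // 2
    n = len(labels)
    expanded = list(labels)  # work on a copy; the original labels stay untouched
    for i, label in enumerate(labels):  # iterate over the ORIGINAL labels only
        if label == 1:
            # Clip at the series boundaries (an assumption not stated in the text).
            for j in range(max(0, i - half), min(n, i + half + 1)):
                expanded[j] = 1
    return expanded


train_labels = [0] * 20
train_labels[10] = 1
print(sum(expand_anomaly_labels(train_labels)))  # 11: 5 before, the anomaly, 5 after
```

Iterating over the original labels rather than the expanded copy keeps each window from growing further around points that were only marked by the expansion itself.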
4.2. Loss Function and Output
Cross-entropy loss, given in Equation (21), is employed to measure the difference between the actual and predicted values.
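Because Equation (21) is not reproduced in this section, the sketch below assumes the standard categorical form, L = -(1/N) Σ_i Σ_c y_{i,c} log ŷ_{i,c}, averaged over N samples, with one-hot targets y and predicted probabilities ŷ. The small `eps` clamp, which avoids taking log(0), is an implementation choice rather than part of the original formulation.

```python
import math


def cross_entropy(y_true, y_prob, eps=1e-12):
    """Mean categorical cross-entropy.

    y_true: one-hot rows, e.g. [1, 0] = normal, [0, 1] = abnormal.
    y_prob: predicted probability rows of the same shape.
    """
    total = 0.0
    for t_row, p_row in zip(y_true, y_prob):
        # Only the true class contributes, since the other targets are 0.
        total -= sum(t * math.log(max(p, eps)) for t, p in zip(t_row, p_row))
    return total / len(y_true)


print(cross_entropy([[1, 0]], [[0.5, 0.5]]))  # equals log(2)
```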
In our case, the softmax layer classifies the output into two classes, normal or abnormal, as described in Equation (22). Here, C indicates the class, d is the output of the fully connected layer, w is the weight, L denotes the last layer, and the total number of classes is two.
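A numerically stable two-class softmax might look as follows. Since Equation (22) itself is not reproduced here, the sketch assumes that each class score is the dot product of that class's last-layer weight vector w with the fully connected output d; the function name `softmax_classify` and the label convention (index 0 = normal, 1 = abnormal) are illustrative choices.

```python
import math


def softmax(logits):
    m = max(logits)  # shift by the max so exp() cannot overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def softmax_classify(d, weights):
    """d: output of the fully connected layer; weights: one row per class."""
    logits = [sum(w_i * d_i for w_i, d_i in zip(w_c, d)) for w_c in weights]
    probs = softmax(logits)
    label = max(range(len(probs)), key=probs.__getitem__)  # 0 = normal, 1 = abnormal
    return probs, label


probs, label = softmax_classify([1.0, 3.0], [[1.0, 0.0], [0.0, 1.0]])
```

Subtracting the largest logit leaves the probabilities unchanged but keeps `math.exp` from overflowing on large scores.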
4.3. Evaluation Metrics
The proposed approach is evaluated using precision, recall, F-score and AUC. If an abnormal case is classified as normal, the error is a false negative (FN); true positives (TP), true negatives (TN) and false positives (FP) are defined analogously, and each algorithm was evaluated through its TP, TN, FP and FN rates. AUC is also one of the most commonly used metrics for evaluating anomaly-detection methods.
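The metrics listed above follow directly from the four counts. In the sketch below, label 1 marks an abnormal (positive) point, the F-score is assumed to be the balanced F1 score, and AUC is computed as the probability that a randomly chosen abnormal point receives a higher anomaly score than a randomly chosen normal one (ties count one half), which equals the area under the ROC curve. The quadratic pairwise loop is chosen for clarity, not speed.

```python
def confusion_counts(y_true, y_pred):
    """Counts with label 1 = abnormal (positive) and 0 = normal (negative)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # abnormal classified as normal
    return tp, tn, fp, fn


def precision_recall_f1(y_true, y_pred):
    tp, _, fp, fn = confusion_counts(y_true, y_pred)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1


def roc_auc(y_true, scores):
    """Probability that a random abnormal point outscores a random normal one."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one abnormal and one normal point")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(precision_recall_f1([1, 0, 1, 0, 0], [1, 0, 0, 1, 0]))  # (0.5, 0.5, 0.5)
```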
4.4. Datasets Description
In this section, we describe three well-known benchmarks from different domains, comprising real-world and synthetic datasets that have been used in previous work on anomaly detection: the Yahoo Webscope S5, the Classic Anomaly-Detection Datasets and the Space Shuttle Valve Dataset.
Yahoo Webscope S5 consists of four classes. Class A1 contains real Yahoo membership login data, while A2, A3 and A4 contain synthetic anomaly data (https://research.yahoo.com). Table 1 shows the characteristics of each sub-benchmark. The dataset contains 367 time series, each consisting of approximately 1500 data points, of which 0.02% are abnormal.
Figure 4a,b show statistical plots for class A1. The two figures show that the data distribution differs significantly from file to file, so anomaly detection using statistical analysis techniques is not straightforward.
Figure 5a shows a real-world time series from class A1.
For the Classic Anomaly-Detection Datasets, six commonly used real-world datasets are adopted, which can be found in the UCI Repository [38] and OpenML, with anomalies already marked as ground truth: Pima, Covertype, Ionosphere, Mammography, Shuttle and Kddcup99. We removed all non-continuous attributes, as done in [39,40]. The properties of each dataset are shown in Table 2.
The Space Shuttle Valve Dataset collects measurements from the valves that control the flow of fuel on the space shuttle. Most subsequences are normal, and only a few are abnormal.
Figure 5b shows this time series, which is segmented into subsequences by orange dotted lines; the abnormal subsequences are also called discord subsequences.
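The notion of a discord subsequence can be made concrete with the classic brute-force definition: the top discord of length m is the subsequence whose nearest non-overlapping neighbor is farthest away. The sketch below illustrates that definition only; it is not the detection method proposed in this paper, it uses plain (non-normalized) Euclidean distance for simplicity, and the function name `brute_force_discord` is hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def brute_force_discord(series, m):
    """Return (start_index, distance) of the top length-m discord.

    For each subsequence, find its nearest neighbor among subsequences that do
    not overlap it; the discord is the one whose nearest neighbor is farthest.
    """
    n = len(series) - m + 1
    best_idx, best_dist = -1, -1.0
    for i in range(n):
        nearest = math.inf
        for j in range(n):
            if abs(i - j) >= m:  # skip trivial (overlapping) matches
                nearest = min(nearest, euclidean(series[i:i + m], series[j:j + m]))
        if nearest != math.inf and nearest > best_dist:
            best_idx, best_dist = i, nearest
    return best_idx, best_dist


valve_like = [0.0, 1.0, 2.0, 1.0] * 10  # a periodic "normal" signal
valve_like[21] = 5.0                      # one corrupted point
print(brute_force_discord(valve_like, 4))  # start index lies in 18..21
```

Every normal window has an identical, non-overlapping copy elsewhere in the periodic signal (distance 0), so only windows covering the corrupted point can be reported as the discord.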
4.5. Comparison to State-of-the-Art
On the Yahoo Webscope S5, HM-AL-CNN is compared with several deep-learning approaches, namely CNN, LSTM, CNN + LSTM and DeepAnT [41], and with two popular tools: Yahoo EGADS, released by Yahoo Labs to detect anomalies in large-scale time-series data, and Twitter's anomaly-detection method, which aims to detect anomalies in social network data [38]. Many previous works have also addressed the classic anomaly benchmarks described in Section 4.4; for brevity, we compare against popular anomaly-detection techniques, namely Isolation Forest (iForest), one-class SVM (OCSVM) and the local outlier factor (LOF) [42].
6. Conclusions
In this paper, we propose a novel hybrid dynamic membrane system that takes advantage of both tissue-like and cell-like P systems for time-series anomaly detection. To obtain more accurate detection results, CNN and LSTM with an attention mechanism are combined, and a 1D squeeze-and-excitation mechanism is introduced to better learn effective features. Two types of rules are introduced in the designed membrane system; profiting from the parallelism of P systems, HM-AL-CNN can process several AL-CNN models independently, which reduces computation time. Experiments show that the proposed method outperforms other time-series anomaly-detection algorithms on different benchmarks. However, many important parameters in our system still need to be chosen manually, which remains to be addressed; evolutionary algorithms such as particle swarm optimization could be used for this in the future. Moreover, designing more effective membrane systems to solve complex problems is also a meaningful direction.