Article

MDFULog: Multi-Feature Deep Fusion of Unstable Log Anomaly Detection Model

Shandong Computer Science Center (National Supercomputer Center in Jinan), Qilu University of Technology (Shandong Academy of Sciences), Jinan 250014, China
*
Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Appl. Sci. 2023, 13(4), 2237; https://doi.org/10.3390/app13042237
Submission received: 10 January 2023 / Revised: 3 February 2023 / Accepted: 7 February 2023 / Published: 9 February 2023
(This article belongs to the Special Issue Signal, Multimedia, and Text Processing in Cybersecurity Context)

Abstract

Effective log anomaly detection can help operators locate and solve problems quickly, ensure the rapid recovery of the system, and reduce economic losses. However, recent log anomaly detection studies have shown some drawbacks, such as concept drift, noise problems, and fuzzy feature relation extraction, which cause data instability and abnormal misjudgment, leading to significant performance degradation. This paper proposes a multi-feature deep fusion model for unstable log anomaly detection (MDFULog) to address these problems. The MDFULog model uses a novel log parsing method to eliminate the dynamic interference caused by noise. This paper proposes a feature enhancement mechanism that fully uses the correlation between semantic information, time information, and sequence features to detect various types of log exceptions. The introduced Bert-based semantic feature extraction model preserves the semantics of log messages and maps them to log vectors, effectively eliminating the worker randomness and noise injection caused by log template updates. An Informer-based anomaly detection classification model is proposed to extract practical information from a global perspective and predict outliers quickly and accurately. Experiments were conducted on HDFS, OpenStack, and unstable datasets, showing that the proposed anomaly detection method performs significantly better than existing algorithms.

1. Introduction

With the development of the Internet and computer technology, cloud computing, big data, Internet of Things, and other systems are expanding in size and complexity, system maintenance is becoming more difficult, and anomalies have become unavoidable [1]. A minor problem in the system can lead to performance degradation, data corruption, or even a significant loss of customers and revenue. Therefore, for the development and maintenance personnel of a system or software, ensuring its stable and reliable operation and reducing the number and scope of exceptions are primary objectives [2]. Modern systems often produce many log files during operation; systems running on medium-sized networks can easily generate terabytes of logs daily [3]. The logs reflect the running state of the system and record activity information for specific events, making them a valuable resource for understanding the state of the system. Therefore, system logs are an important data source for performance monitoring and anomaly detection.
The sheer volume and variety of log data make it more difficult for operational engineers to rely on simple keyword searches or regular matching for manual analysis. Although researchers have proposed many solutions for log anomaly detection, the following problems remain:
(1)
Standard log parsers frequently generate more noise when detecting log exceptions, resulting in ambiguous parsing. System logs are generally noisy, and interpreting them involves considerable operator subjectivity; normal but noisy logs are easily misinterpreted as abnormal. The uncertainty of log module updates makes it difficult for a detection model to adapt dynamically to log changes. Traditional classification-based deep learning methods can effectively solve the problem of static log exception detection; however, when dealing with dynamically unstable data, performance suffers significantly due to concept drift and noise.
(2)
The current anomaly detection method separately trains the corresponding learning models for the different features of the log (e.g., time anomalies, parameter anomalies, etc.) while ignoring the correlation between the different features of the log. Even some methods usually handle only a single exception type. To effectively identify log-related logical and time exceptions, they need to be better at mining complicated and changeable log feature information. As a result, detection accuracy is impacted when feature correlation analysis and feature fusion capability are neglected.
(3)
Current studies have primarily used variants of RNNs, led by LSTM, to detect anomalies in log data. However, capturing information recursively only obtains historical sequences, not long-series dependencies, and internal relationships cannot be learned globally. The recently popular Transformer can learn from a global perspective and effectively handle long-sequence problems, but its computational cost is high and its running speed is slow. A global, low-cost, and fast algorithm is urgently needed for log anomaly detection.
This paper proposes a novel multi-feature deep fusion of an unstable log anomaly detection model (MDFULog) to address the above problems. In order to have more sophisticated log parsing, the authors propose a novel token-based log parsing method to extract log templates without losing log information. The semantic information in the template is represented vectorially by Bert based on contrastive learning, which can ensure the robustness of the newly input template. Furthermore, the temporal information is projected into the high-dimensional embedding, considering various log anomalies. The suggested Informer-based classification model for anomaly detection reduces memory occupation and enhances attention perception. The proposal was evaluated using the real open-source datasets HDFS, OpenStack, and our synthetic unstable log dataset. The contributions of this article can be summarized as follows:
  • An innovative token-based log parsing method is proposed in this paper. It can retain the original numerical information of the log for subsequent processing and process logs of different lengths.
  • This paper presents a feature enhancement mechanism that combines semantic and temporal features, making full use of the correlation among semantic information, time information, and sequence features to detect various log anomalies.
  • This paper introduces a Bert-based semantic feature extraction model to preserve the semantics of log messages and map them to log vectors. When facing a noisy log, it will not be roughly treated as a new log template, but will be classified based on similarity. It effectively eliminates worker randomness and noise injection caused by log template updates and improves the robustness of anomaly detection.
  • This paper presents an Informer-based anomaly detection classification model, which accurately extracts effective information from a global perspective and quickly predicts anomalies. This model captures the global dependencies of complex log exceptions and focuses on critical information. It reduces the fitting time of the model and achieves efficient and flexible anomaly detection.
  • In this paper, many contrasts and robustness experiments were carried out on five datasets, including regular and unstable datasets. It is generally shown that the model proposed in this paper is better than the existing models in accuracy and speed under several evaluation indexes.
The rest of the article is as follows. Section 2 reviews the recent developments in log anomaly detection. Section 3 describes the MDFULog model in detail. Section 4 presents the experiments we performed and presents the results. Section 5 describes the conclusions and future work.

2. Related Work

The steps of log anomaly detection include log gathering, log parsing, feature extraction, and anomaly detection. Logs are usually generated automatically by large systems, and each log contains a timestamp and the message information for a specific event. Since log messages are unstructured text, the goal of log parsing is to extract an event template from a set of logs for subsequent structural analysis.
Presently, the log parsing methods can be classified into offline and online methods. In offline methods, Cheng et al. [4] proposed regular expressions to identify log templates. The clustering-based log parsing algorithm SLCT was proposed by Vaarandi et al. [5], which extracts log templates by clustering the same word sets that appear more frequently than the threshold in the log. Makanju et al. [6] proposed the IPLoM model based on hierarchical clustering, which iteratively clusters the message length, token position, and mapping relationship of the log layer by layer. Compared with SLCT, it does not require threshold setting. The limitation of the offline method is that it cannot perform real-time anomaly detection and needs to be retrained for novel log types. In order to meet the actual industrial needs, an online method that can be resolved in real-time for subsequent anomaly detection is proposed. Du et al. [7,8] proposed the Spell algorithm for online log clustering based on the longest common subsequence. He et al. [9] presented a Drain algorithm for log clustering based on the idea of a deep fixed tree. Zhang et al. [10] proposed the frequent template tree (FT-Tree) to obtain a log template by extracting the longest combination of words. Meng et al. [11] proposed the template word embedding model Template2Vec based on synonym and antonym sets, combining FT-Tree to generate a log template vector. Studiawan et al. [12,13] proposed the graph-based data structure to determine the similarity between logs, where log templates are extracted by clustering automation. However, as the quantity of log data enlarges, the effect of these methods will change significantly.
After parsing the log into a separate event, it is necessary to encode it into different feature vectors for subsequent anomaly detection. The features include sequence features, variable value features, window features, the co-occurrence matrix, etc.
The current anomaly detection methods are partitioned into supervised and unsupervised based on whether there are labeled training data. The accuracy of supervised anomaly detection models often relies on the number and accuracy of the training data. Supervised methods include logistic regression [14], decision trees [15], and SVM [16,17,18]. Unsupervised anomaly detection methods comprise various clustering methods [5], association rule mining [19], and PCA [20]. In recent years, some researchers have also applied deep learning to the field of anomaly detection, including CNNs [21], RNNs [22], LSTM [11,23,24,25,26,27], Bi-LSTM [28,29,30,31], Auto-Encoder [32,33], Transformer [32,34,35,36], the GRU [37,38], Bert [39], and so on. Among them, the work [5] proposed the LogCluster method to identify online anomalies by clustering logs. The work [23] proposed the DeepLog method, which uses a deep learning model based on LSTM to model the system log and perform anomaly detection; however, DeepLog ignores the semantic information of logs. The work [24] trained a stacked LSTM to model normal and abnormal sequential patterns. The work [22] proposed an attention-based RNN for anomaly detection, which enhanced the attention to anomaly detection in system logs and improved the interpretability of the model; however, it only concentrates on log events and overlooks the context relationship in the log sequence. The work [11] proposed the LogAnomaly model and realized end-to-end log group and sequence anomaly detection through an anomaly detection model based on LSTM. The work [28] demonstrated the problem of unstable log data and proposed the LogRobust model to solve it. LogRobust uses an attention-based Bi-LSTM to capture context information; however, with a large amount of log data, its calculation cost is high and its training speed is slow. The work [18] proposed the ROEAD framework, using the RFE to eliminate noise and the OES for anomaly detection.
The work [26] proposed DeepSyslog, an anomaly detection framework that combines constant text data and digital event metadata values of logs. However, DeepSyslog does not consider the potential danger caused by time interval anomalies.

3. Methods

3.1. Overview

To settle the problems in the existing log anomaly detection methods, this paper proposes a novel model, MDFULog. As shown in Figure 1, the model first parses the log and extracts valid information. Secondly, the semantic vector is obtained using the Bert model with contrastive learning embedded, and the time vector is obtained by calculating the time interval between two logs. The semantic and time vectors are then combined to obtain a vectorized representation of the log. Finally, anomaly detection is performed using an Informer-based classification model.

3.2. Log Parsing Method

The primary task of log anomaly detection is converting log contents into features the machine can recognize and understand. According to LogRobust [28], the processing noise in log data primarily comes from inaccurate log parsing. Incorrect log parsing may result in several misidentified log templates, which hinders the model's performance. Our proposed parsing model does not use existing log parsers, but directly processes the original logs to avoid the noise generated by log analysis.
The first step of log parsing is to split each log message $l_i$ by delimiters, breaking the log into words and numeric values. In order to segment the log more accurately, characters such as the space are regarded as separators. Given a dictionary
$$C = \{ d_1, d_2, \dots, d_n \},$$
where $d_i$ denotes a valid word, after splitting the log $l_i$, first check whether the words it contains are in dictionary $C$. The word set of $l_i$ can be expressed as
$$ld_i = \{ d_a, d_b, \dots, d_m \}, \quad d_i \in C.$$
Because some words appear multiple times in a log message, which would affect subsequent processing, we also count the word frequencies $lf_i$, i.e., the number of times each word $d_i$ appears in the word set $ld_i$.
The second step is to aggregate log l i with the same l d i and l f i into a novel log cluster L i .
The third step is to use the LCS to mask the log variables in L i . When using the LCS for masking, not only effective words, but also words outside the dictionary are considered. For example, a cluster contains “PacketResponder 1 for block blk_-1608999687919862906 terminating” and “PacketResponder 2 for block blk_-1608999687919862906 terminating”. The masking result of its cluster is “PacketResponder <*> for block <*> terminating”. “<*>” is a sign used to cover up the variables.
After completing the above three steps, the event template $lt_i$ of every log is obtained. Parsing logs this way reduces the noise introduced during parsing and can extract event templates even for rare logs.
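As an illustration, the three parsing steps above can be sketched as follows. The dictionary contents and helper names are our own illustrative choices, not the paper's implementation:

```python
from collections import Counter

# Illustrative dictionary of valid words (in practice this would be much larger).
DICTIONARY = {"PacketResponder", "for", "block", "terminating"}

def tokenize(log_line):
    """Step 1: split a raw log message on whitespace into tokens."""
    return log_line.strip().split()

def signature(tokens):
    """Word set ld_i and word frequencies lf_i, restricted to dictionary words."""
    words = [t for t in tokens if t in DICTIONARY]
    return frozenset(words), tuple(sorted(Counter(words).items()))

def mask_template(cluster):
    """Step 3: mask variable positions with '<*>'. A token is kept only when it
    is identical across the cluster AND is a dictionary word."""
    token_lists = [tokenize(line) for line in cluster]
    template = []
    for column in zip(*token_lists):
        stable = len(set(column)) == 1 and column[0] in DICTIONARY
        template.append(column[0] if stable else "<*>")
    return " ".join(template)

cluster = [
    "PacketResponder 1 for block blk_-1608999687919862906 terminating",
    "PacketResponder 2 for block blk_-1608999687919862906 terminating",
]
# Step 2 would group these two lines together: identical word set and counts.
assert signature(tokenize(cluster[0])) == signature(tokenize(cluster[1]))
print(mask_template(cluster))
# PacketResponder <*> for block <*> terminating
```

This reproduces the masking result given in the example above: the changing counter and the non-dictionary block identifier are both replaced by the "<*>" placeholder.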

3.3. Log Vectorization

The authors found that multiple anomalies could not be detected by log semantics alone. Therefore, time information needs to be introduced as another log feature to detect faults. After log parsing, the session is built through a sliding window. The log sequence is converted into the semantic feature F and time feature T. Next, the semantic and temporal features are encoded.
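The session-building step can be sketched with a fixed-size sliding window; the window size and step below are illustrative, since the paper does not state the values it uses:

```python
def sliding_windows(events, window_size=3, step=1):
    """Group a parsed event sequence into overlapping sessions."""
    if len(events) < window_size:
        return [events]
    return [events[i:i + window_size]
            for i in range(0, len(events) - window_size + 1, step)]

sessions = sliding_windows(["E1", "E2", "E3", "E4", "E5"])
print(sessions)
# [['E1', 'E2', 'E3'], ['E2', 'E3', 'E4'], ['E3', 'E4', 'E5']]
```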
Semantic features: When representing logs, normal logs should be close to each other in the vector space, and the semantic vectors should be distributed as evenly as possible on the hypersphere. The reason is that the uniform distribution has the highest information entropy: the more uniform the distribution, the more information is retained. The semantic vectors obtained by contrastive learning can meet these requirements. Therefore, the Bert model with contrastive learning embedded can mitigate the noise caused by updates of log statements and enhance the robustness of the model.
The cross-entropy function of contrastive learning was used as a loss function to adjust the parameters of Bert's pre-trained language model, which makes the model learn rich and accurate semantic features. The formula is
$$\mathrm{loss}_i = -\log \frac{e^{\cos(f_i, f_i^{+})/\rho}}{e^{\cos(f_i, f_i^{+})/\rho} + \sum_{j=1}^{a} e^{\cos(f_i, f_a)/\rho}} - \log \frac{e^{\cos(f_i, f_i^{*})/\rho}}{e^{\cos(f_i, f_i^{*})/\rho} + \sum_{j=1}^{a} e^{\cos(f_i, f_a)/\rho}},$$
$$\mathrm{loss}_i^{*} = -\log \frac{e^{\cos(f_i^{*}, f_i)/\rho}}{e^{\cos(f_i^{*}, f_i)/\rho} + \sum_{j=1}^{a} e^{\cos(f_i^{*}, f_a)/\rho}},$$
where $f_i$ denotes the semantic vector of the common log template $lt_i$, $f_i^{+}$ is the semantic vector of the common log template $lt_i^{+}$ that shares a context with $lt_i$, $f_i^{*}$ is the semantic vector of the noisy log template $lt_i^{*}$, $f_a$ is the semantic vector of the abnormal log template $lt_a$, $a$ is the number of abnormal log templates, and $\rho$ is an adjustable parameter.
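One way to read this loss is as an InfoNCE-style objective: each term pulls an anchor vector toward a positive and pushes it away from the abnormal templates. The sketch below assumes $\rho$ acts as a temperature dividing the cosine similarity, which is an interpretation of the printed equation, not a published implementation:

```python
import math

def cos_sim(u, v):
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def nce_term(anchor, positive, negatives, rho):
    """-log( e^{cos(anchor,pos)/rho} / (e^{cos(anchor,pos)/rho} + sum_a e^{cos(anchor,neg_a)/rho}) )"""
    pos = math.exp(cos_sim(anchor, positive) / rho)
    neg = sum(math.exp(cos_sim(anchor, n) / rho) for n in negatives)
    return -math.log(pos / (pos + neg))

def loss_i(f_i, f_pos, f_noisy, f_abnormal, rho=0.1):
    """loss_i: pull f_i toward its context template f_i+ and its noisy
    variant f_i*, and away from the abnormal templates f_a."""
    return (nce_term(f_i, f_pos, f_abnormal, rho)
            + nce_term(f_i, f_noisy, f_abnormal, rho))

print(round(nce_term([1.0, 0.0], [1.0, 0.0], [[0.0, 1.0]], rho=1.0), 4))
# 0.3133
```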
The Bert structure is shown in Figure 2. $e_1, e_2, \dots, e_n$ is the input sequence of the Bert model, TM is the Encoder of the Transformer, and $x_1, x_2, \dots, x_n$ is Bert's output word vector sequence. Bert's input sequence is shown in Figure 3: [CLS] marks the starting position of the log message, and each word (token) in the log template is sent to the token embedding layer, which transforms each word into a fixed-dimension vector; in Bert, each word becomes a 768-dimensional vector representation. Segment embedding is a vector representation that distinguishes two sentences when the input sequence is a sentence pair. Position embedding encodes the position information of each word, solving the problem that token embedding cannot encode positions in the input sequence, i.e., assigning different semantics to the same word at different positions. Then, the embedded vector $e_1, e_2, \dots, e_n$ is input into the Transformer encoder (TM), and finally, the log semantic vector $F$ is obtained.
Time characteristics: The time interval between two logs deserves attention when an abnormal event appears at runtime, since potential threats may cause unusual intervals. Each log message contains a timestamp, which can be used to calculate the time interval between two log messages. Consider the time interval sequence $\Delta T = \{ \Delta t_1, \Delta t_2, \dots, \Delta t_i \}$, where $\Delta t_1$ is set to $-1$ and $\Delta t_i$ is the timestamp of $l_i$ minus the timestamp of $l_{i-1}$. Let $W$ represent the weight matrix and $b$ denote the random deviation vector; the vector $k$ is
$$k = \mathrm{softmax}(\Delta t_i W + b),$$
$$T = kE,$$
where the identity matrix $E$ is weighted to obtain the time vector $T$. The final log vector $V$ consists of the semantic vector $F$ and the time vector $T$, i.e.,
$$V = \mathrm{concatenate}(F, T).$$
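The time-feature encoding above can be sketched as follows; the embedding dimension and the randomly initialized W and b stand in for learned parameters, and the semantic vector F is a placeholder for Bert's output:

```python
import numpy as np

d = 4                                # time-embedding dimension (illustrative)
rng = np.random.default_rng(0)
W = rng.normal(size=d)               # weight matrix (one row: dt is a scalar here)
b = rng.normal(size=d)               # deviation vector

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def time_vector(dt):
    """k = softmax(dt*W + b); T = k E with E the identity, so T equals k."""
    k = softmax(dt * W + b)
    return k @ np.eye(d)

F = rng.normal(size=8)               # semantic vector from Bert (placeholder)
T = time_vector(2.5)                 # e.g. a 2.5 s interval
V = np.concatenate([F, T])           # V = concatenate(F, T)
print(V.shape, round(float(T.sum()), 6))
# (12,) 1.0
```

Because the softmax output is weighted by the identity matrix, T is a probability-like vector (non-negative, summing to 1) that encodes the interval.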

3.4. Classification Based on Informer

In MDFULog, an Informer-based classification model is used for detection. Compared with the Transformer, the Informer reduces the computational and space complexity resulting from conventional self-attention; it offers higher speed, lower memory consumption, and stronger perception of long sequences. The classification module of the MDFULog model includes the Informer decoder layer, the pooling layer, and the MLP classification layer.
Informer decoder layer: The classification model stacks multiple Informer decoder layers to increase robustness. The structure of the Informer decoder is shown in Figure 4. It includes the ProbSparse self-attention and the fully connected layer. This layer receives $X = \{ V, X_0 \}$ as the input (the target part of the prediction is set to 0). The input is first handled by the ProbSparse self-attention layer; it then interacts with the initial vector $V$ through multi-head attention; finally, the fully connected layer connects the multiple outputs. The calculation process is as follows:
$$Q = X W^{Q},$$
$$K = X W^{K},$$
$$V = X W^{V},$$
$$H_i = \mathrm{Softmax}\left( \frac{\bar{Q}_i K_i^{T}}{\sqrt{d_k}} \right) V_i,$$
$$O = \mathrm{Concat}(H_1, H_2, \dots, H_{n-1}, H_n) W^{O},$$
where $W^{Q}$, $W^{K}$, $W^{V}$ are the learnable parameter matrices of the model and $d_k$ is the dimension of the key vector. $\mathrm{Concat}(\cdot)$ concatenates the outputs of each head using the parameter matrix $W^{O}$.
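For clarity, the dense form of this attention can be sketched as below; Informer's ProbSparse variant evaluates the same formula only for the most "active" queries, which is what lowers the cost. All shapes and weights here are illustrative:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def attention_head(X, W_Q, W_K, W_V):
    """H = Softmax(Q K^T / sqrt(d_k)) V, with Q = X W_Q, K = X W_K, V = X W_V."""
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V
    d_k = K.shape[-1]
    A = softmax(Q @ K.T / np.sqrt(d_k))   # attention weights: each row sums to 1
    return A @ V

rng = np.random.default_rng(1)
X = rng.normal(size=(6, 8))               # 6 log positions, model dimension 8
heads = [attention_head(X,
                        rng.normal(size=(8, 4)),
                        rng.normal(size=(8, 4)),
                        rng.normal(size=(8, 4))) for _ in range(2)]
W_O = rng.normal(size=(8, 8))             # 2 heads x d_k = 4 concatenated -> 8
O = np.concatenate(heads, axis=-1) @ W_O  # O = Concat(H_1, H_2) W^O
print(O.shape)
# (6, 8)
```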
Pooling layer: The classification model uses the pooling layer to prevent information redundancy and overfitting. Input the output matrix O of the Informer decoder layer into the pooling layer for pooling. The main feature vectors are extracted to lay the foundation for subsequent processing.
$$Z = \mathrm{pool}(O).$$
MLP classification layer: A multi-layer perceptron is used for classification. The Softmax function is used to calculate the probability of abnormality, which determines whether a log is abnormal.
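A sketch of the pooling and classification layers together, assuming max pooling and a single hidden layer (the paper fixes neither the pooling type nor the MLP depth):

```python
import numpy as np

def classify(O, W1, b1, W2, b2):
    """Pool the decoder output, then run a two-layer MLP with softmax output."""
    z = O.max(axis=0)                    # Z = pool(O): one vector per session
    h = np.maximum(z @ W1 + b1, 0.0)     # hidden layer with ReLU
    logits = h @ W2 + b2
    e = np.exp(logits - logits.max())
    return e / e.sum()                   # softmax: [P(normal), P(abnormal)]

rng = np.random.default_rng(2)
O = rng.normal(size=(6, 8))              # decoder output: 6 positions x dim 8
p = classify(O,
             rng.normal(size=(8, 16)), np.zeros(16),
             rng.normal(size=(16, 2)), np.zeros(2))
print(p.shape, int(p.argmax()))          # class with the higher probability wins
```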

3.5. Anomaly Detection

MDFULog trained an Informer-based model for log anomaly detection. When a new set of log messages arrives, the MDFULog model first parses and represents the log vectors. The semantic vector and time vector of the log are used to represent the log, and the obtained log vector is input into the pre-trained model. Finally, the Informer-based model can detect the log to determine whether it is abnormal.

4. Experiment and Analysis

4.1. Dataset

Our proposal was evaluated on HDFS, OpenStack, and unstable synthetic datasets. The HDFS dataset is a log dataset generated by Hadoop's map-reduce operations. It contains 11,175,629 logs collected from the Hadoop Distributed File System on the Amazon EC2 platform. Each piece of information contains a timestamp (year, month, day, minute, and second), PID, message content, and other fields. Program execution in the HDFS system usually involves log blocks, and each session corresponds to a log block. There are 575,061 log blocks in the dataset, of which 16,838 are marked as abnormal by Hadoop domain experts. The OpenStack dataset is a log dataset generated on CloudLab, containing 207,820 logs' information, mainly recording virtual-machine-related tasks. There are 2043 log instances in the dataset, of which 198 are treated as abnormal data. The unstable synthetic dataset was used to demonstrate the robustness of our processing of unstable log data: the HDFS dataset was injected with potentially unstable log data. Two kinds of unstable data were synthesized, namely logs generated in the process of log evolution and logs with abnormal time intervals. These synthesized unstable log sequences were injected into the original log data proportionally, so the synthetic log data can express the instability of actual logs.

Log Exception Characteristics

The anomalies in the log were determined by a combination of multiple factors, such as execution path anomalies caused by event sequence characteristics and time series anomalies caused by time series characteristics. A path anomaly, a common anomaly in the system, is a deviation from the normal event sequence at a certain location in the log event sequence. The specific exception is shown in Figure 5:
A delay sequence is a sequence corresponding to an event sequence based on the output time interval between two logs. Logs are the output of important events generated by a system program through a rigorous process, and there is a certain distribution of time intervals between the two events. The specific exception is shown in Figure 6:

4.2. Evaluation Metrics

The evaluation indices used in this paper include Precision, Recall, and F1-measure, three effective and common criteria for evaluating different models. The calculation of Precision is
$$\mathrm{Precision} = \frac{TP}{TP + FP},$$
and the calculation of Recall is
$$\mathrm{Recall} = \frac{TP}{TP + FN},$$
where $TP$ denotes the number of abnormal data correctly detected as abnormal, $TN$ denotes the number of normal data correctly detected as normal, $FP$ denotes the number of normal data incorrectly detected as abnormal, and $FN$ denotes the number of abnormal data incorrectly detected as normal.
F1-measure is the harmonic mean of Precision and Recall, which comprehensively evaluates the overall performance of anomaly detection. The calculation formula is
$$F1\text{-}measure = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}.$$
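These three metrics follow directly from the confusion counts, for example:

```python
def prf1(tp, fp, fn):
    """Precision, Recall, and F1 from the counts defined above."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# e.g. 90 true anomalies detected, 10 false alarms, 10 anomalies missed:
p, r, f1 = prf1(tp=90, fp=10, fn=10)
print(p, r, round(f1, 4))
# 0.9 0.9 0.9
```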

4.3. Result Analysis

MDFULog was compared with PCA [20], DeepLog [23], LogRobust [28], ROEAD [18], and DeepSyslog [26].

4.3.1. Original Dataset

The results of comparing MDFULog with the baseline models are listed in Table 1, with the highest score in bold. Compared with the traditional PCA and DeepLog, the Recall on the HDFS dataset increased by 30% and 5%, respectively, and on OpenStack by 28% and 15%, respectively. This indicates that the MDFULog model has a high Recall rate on the HDFS and OpenStack datasets and can find anomalies more accurately. The Precision shows that the model in this paper performs well in detecting abnormal log data: MDFULog achieved about 98% and 96% on HDFS and OpenStack, respectively. The F1-score shows that the proposed model obtained the best F1 value among the six methods, with average scores on HDFS and OpenStack of about 98% and 96%, respectively. Both PCA and DeepLog use the index of the log template to represent logs, ignoring the template's semantic information. DeepLog models and detects exceptions of the execution path type and parameter type separately, ignoring the correlation between multiple exceptions. LogRobust fully considers semantic information, but does not consider the impact of time parameters, so its detection cannot effectively cover time anomalies. DeepSyslog also does not consider the potential dangers of time interval anomalies. ROEAD uses natural language processing technology to decrease the number of log templates. The Recall rates of ROEAD and MDFULog on the HDFS dataset were very close, but the Precision of MDFULog was somewhat higher than that of ROEAD. On OpenStack, the Recall was slightly lower than for DeepSyslog: DeepSyslog achieved a higher Recall rate of 0.97, but its Precision was only 0.87, meaning that it raises many more false alarms.
MDFULog combines Bert and contrastive learning to better extract the semantic information of the log and introduces time information as a feature to capture more types of faults, hence increasing the Precision of anomaly detection. The Informer classifier is used to perform log anomaly detection on multiple features. Finally, a higher F1 value was obtained, showing the model's effectiveness for sequence anomaly detection. Our model showed good stability on the different datasets, and the MDFULog model identified virtually all the anomalies in the HDFS and OpenStack datasets.

4.3.2. Unstable Dataset

We trained the evaluated models on the original HDFS dataset and then tested the trained models on the synthetic test set (the injection ratio, i.e., the proportion of injected unstable logs, was set to 5%, 10%, 15%, and 20%). The average results of multiple experiments are shown in Table 2.
With the increase in the proportion of unstable logs, the F1-scores of PCA, DeepLog, LogRobust, ROEAD, and DeepSyslog decreased significantly, but the F1-score of MDFULog did not change noticeably. Even when the proportion of unstable logs was as high as 20%, the MDFULog model could obtain a high F1-measure (more than 0.95). MDFULog classifies the log messages of a log sequence by calculating semantic similarity, and the injected noise does not significantly change the semantics of log messages, so increasing the ratio of injected unstable logs has little impact on the F1-measure of MDFULog's anomaly detection. The traditional methods cannot handle the unstable log data created during the system update process or the noise generated during processing, whereas MDFULog maintains higher performance, which confirms its robustness.
The Precision of DeepLog on the unstable datasets declined from 0.86 to 0.7, because DeepLog treats certain regenerated normal logs as anomalies. The F1-measures of PCA, DeepLog, LogRobust, ROEAD, and DeepSyslog decreased by 25%, 29%, 2%, 11%, and 14%, respectively. After noise was injected into the stable dataset, models (such as PCA) trained on the initial dataset treated the changed events as novel templates, raising false alerts. Although LogRobust had a low F1-measure drop ratio, its overall F1-measure was low because of its coarse parsing of semantic information. On the contrary, MDFULog's Precision, Recall, and F1-measure did not drop significantly, because the MDFULog model uses the Bert model embedded with contrastive learning to learn the semantic vector of the log, which can better handle updated logs.
Experimental outcomes showed that the MDFULog model can be effectively applied not only to unstable log datasets, but also to stable log datasets.

4.3.3. Robustness and Efficiency

This article also evaluated the robustness and efficiency of different log parsing methods on various log volumes. In this experiment, the Spell [7,8], AEL [40], Drain [9], IPLoM [6], and MDFULog models were selected for comparison, and three datasets were selected, namely HDFS, BGL, and Android. Figure 7 shows the Accuracy of the log parsing methods on different log volumes. The horizontal axis denotes the number of logs, and the vertical axis denotes the resolution accuracy for different numbers of log statements. IPLoM uses an iterative partition strategy to divide log messages into groups according to message length, token location, and mapping relationship. The effect of IPLoM on HDFS data was poor. With the increase of the log data, the Accuracy was less than 0.8. On the BGL data, Spell also showed a significant fluctuation. Spell uses the longest common subsequence algorithm to parse logs in a stream. When the log data were complex and changeable, the effect of Spell decreased significantly. On the Android data, MDFULog also dropped slightly when the number of logs increased. Android contains many log templates, and the parsing is much more complicated. Drain applies a fixed-depth tree structure to represent log messages and effectively extracts common templates. Compared with the other algorithms on the Android data, it performed better. The experimental results showed that MDFULog has good robustness to different log volumes and types of log entries when changing the log volume.
Figure 8 shows the efficiency of the log resolution method on different log volumes. The horizontal axis represents the number of logs, and the vertical axis represents the time taken to parse a different number of log statements. The parsing time increases with the increase of the logs. With the linear growth of the log size, Drain, IPLoM and MDFULog had high efficiency on different datasets. AEL divides logs into multiple groups. When the BGL data volume is large, the reason for the low efficiency is that it needs to compare multiple logs. Other algorithms cannot scale effectively with the increase or decrease of the log volume. The MoLFI algorithm cannot parse 1 GB of complex log data in 5 h. The type of log also affects the efficiency of the log parsing algorithm. Log parsing is typically successful when the log data are straightforward and there are not many different event templates. Since HDFS logs, for instance, only include 30 event templates, all log resolution methods can resolve 1 GB of data each hour. However, the parsing procedure will be slow for logs containing many event templates (such as Android).

4.4. Model Evaluation and Advantages

We evaluated the method proposed in this paper. The MDFULog model offers the following advantages. First, it effectively eliminates the noise generated during log parsing and dynamically adapts to the randomness and noise injection caused by log updates. Second, its feature fusion mechanism strengthens the correlations among the semantic, time, and sequence features, allowing it to detect a wider range of log anomalies more comprehensively. Third, its Informer-based anomaly detection classifier captures the global dependencies of complex log anomalies while focusing on key information, which shortens the model's fitting time and enables efficient, flexible anomaly detection. Finally, MDFULog shows good stability, robustness, and generalization while improving detection efficiency. Experiments with multiple evaluation metrics on five datasets from different scenarios demonstrated that the proposed model outperforms existing models in both accuracy and speed and scales well.
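The paper does not publish its fusion code, but the feature enhancement idea described above, concatenating a semantic embedding with high-dimensional encodings of time and sequence position, can be sketched as follows. The sinusoidal projection, the dimension `d_time`, and the function names are our assumptions, chosen only to make the idea concrete.

```python
import numpy as np


def sinusoidal_encoding(value, dims):
    """Project a scalar (e.g., inter-arrival time) onto `dims` sinusoids."""
    freqs = 1.0 / (10000 ** (np.arange(dims) / dims))
    return np.sin(value * freqs)


def fuse_features(semantic_vec, time_delta, seq_index, d_time=4):
    """Concatenate semantic, time, and sequence-position features into one vector."""
    return np.concatenate([
        np.asarray(semantic_vec, dtype=float),   # e.g., a Bert sentence embedding
        sinusoidal_encoding(time_delta, d_time), # inter-arrival time feature
        sinusoidal_encoding(seq_index, d_time),  # position within the log sequence
    ])
```

The fused vector can then be fed to a downstream classifier, so that both a semantically unusual message and a normal message arriving at an unusual time can shift the prediction.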

5. Conclusions

This paper presented an unstable log anomaly detection model based on multi-feature deep fusion that can simultaneously detect semantic and time anomalies in system logs. The method uses a Bert model with embedded contrastive learning to extract the semantic information of log templates, thereby strengthening the connections between log messages. Time information is incorporated by projecting it onto a high-dimensional embedding, so that multiple features jointly represent each log. An Informer-based classification model then performs anomaly detection on the logs, improving both the efficiency and the accuracy of detection. The experimental results indicate that the proposed model achieves good anomaly detection performance. However, the model still falls short in explaining the causes of the anomalies it detects. In future work, root cause analysis will be added to the anomaly detection model to make it interpretable, and more log parameters will be encoded to examine their impact on system anomaly detection.

Author Contributions

Conceptualization, D.H. and M.Z.; methodology, M.L.; software, M.S.; validation, M.L. and M.S.; formal analysis, G.L.; investigation, D.H.; resources, G.L.; data curation, M.S.; writing—original draft preparation, M.S.; writing—review and editing, M.L.; visualization, M.S.; supervision, M.L.; project administration, M.Z.; funding acquisition, G.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Key R&D Program of Shandong Province, China (2022RZB02018, 2022CXGC020106).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Datasets can be accessed upon request to the corresponding author.

Acknowledgments

The authors would like to thank all the Reviewers for their insightful comments and constructive suggestions, which improved this paper’s quality.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
IoT      Internet of Things
TB       Terabyte
SVM      Support vector machine
PCA      Principal component analysis
CNN      Convolutional neural network
RNN      Recurrent neural network
LSTM     Long short-term memory
Bi-LSTM  Bidirectional long short-term memory
GRU      Gated recurrent unit
Bert     Bidirectional Encoder Representations from Transformers
ROEAD    Robust online evolving anomaly detection
RFE      Robust feature extractor
OES      Online evolving SVM
LCS      Longest common subsequence
MLP      Multi-layer perceptron

References

  1. Kaur, R.; Sharma, E.S. Various techniques to detect and predict faults in software system: Survey. Int. J. Future Revolut. Comput. Sci. Commun. Eng. (IJFRSCE) 2018, 4, 330–336. [Google Scholar]
  2. He, S.; Zhu, J.; He, P.; Lyu, M.R. Experience Report: System Log Analysis for Anomaly Detection. In Proceedings of the 2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE), Ottawa, ON, Canada, 23–27 October 2016; pp. 207–218. [Google Scholar] [CrossRef]
  3. Yuan, Y.; Srikant Adhatarao, S.; Lin, M.; Yuan, Y.; Liu, Z.; Fu, X. ADA: Adaptive Deep Log Anomaly Detector. In Proceedings of the IEEE INFOCOM 2020—IEEE Conference on Computer Communications, Toronto, ON, Canada, 6–9 July 2020; pp. 2449–2458. [Google Scholar] [CrossRef]
  4. Cheng, S.; Pei, D.; Wang, C. Error log clustering of internet software. J. Chin. Comput. Syst. 2018, 39, 865–870. [Google Scholar]
  5. Vaarandi, R. A data clustering algorithm for mining patterns from event logs. In Proceedings of the 3rd IEEE Workshop on IP Operations Management (IPOM 2003) (IEEE Cat. No. 03EX764), Kansas City, MO, USA, 3 October 2003; pp. 119–126. [Google Scholar]
  6. Makanju, A.; Zincir-Heywood, A.N.; Milios, E.E. A Lightweight Algorithm for Message Type Extraction in System Application Logs. IEEE Trans. Knowl. Data Eng. 2012, 24, 1921–1936. [Google Scholar] [CrossRef]
  7. Du, M.; Li, F. Spell: Streaming Parsing of System Event Logs. In Proceedings of the 2016 IEEE 16th International Conference on Data Mining (ICDM), Barcelona, Spain, 12–15 December 2016; pp. 859–864. [Google Scholar] [CrossRef]
  8. Du, M.; Li, F. Spell: Online Streaming Parsing of Large Unstructured System Logs. IEEE Trans. Knowl. Data Eng. 2019, 31, 2213–2227. [Google Scholar] [CrossRef]
  9. He, P.; Zhu, J.; Zheng, Z.; Lyu, M.R. Drain: An Online Log Parsing Approach with Fixed Depth Tree. In Proceedings of the 2017 IEEE International Conference on Web Services (ICWS), Honolulu, HI, USA, 25–30 June 2017; pp. 33–40. [Google Scholar] [CrossRef]
  10. Zhang, S.; Meng, W.; Bu, J.; Yang, S.; Liu, Y.; Pei, D.; Xu, J.; Chen, Y.; Dong, H.; Qu, X.; et al. Syslog processing for switch failure diagnosis and prediction in datacenter networks. In Proceedings of the 2017 IEEE/ACM 25th International Symposium on Quality of Service (IWQoS), Vilanova i la Geltrú, Spain, 14–16 June 2017; pp. 1–10. [Google Scholar] [CrossRef]
  11. Meng, W.; Liu, Y.; Zhu, Y.; Zhang, S.; Pei, D.; Liu, Y.; Chen, Y.; Zhang, R.; Tao, S.; Sun, P.; et al. LogAnomaly: Unsupervised Detection of Sequential and Quantitative Anomalies in Unstructured Logs. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, IJCAI-19. International Joint Conferences on Artificial Intelligence Organization, Macao, China, 10–16 August 2019; pp. 4739–4745. [Google Scholar] [CrossRef]
  12. Studiawan, H.; Sohel, F.; Payne, C.N. Automatic Event Log Abstraction to Support Forensic Investigation. In Proceedings of the Australasian Computer Science Week Multiconference, Melbourne, Australia, 4–6 February 2020. [Google Scholar]
  13. Studiawan, H.; Payne, C.; Sohel, F. Automatic Graph-Based Clustering for Security Logs. In Advanced Information Networking and Applications: Proceedings of the 33rd International Conference on Advanced Information Networking and Applications (AINA-2019); Barolli, L., Takizawa, M., Xhafa, F., Enokido, T., Eds.; Springer International Publishing: Cham, Switzerland, 2020; pp. 914–926. [Google Scholar]
  14. Bodík, P.; Goldszmidt, M.; Fox, A.; Woodard, D.B.; Andersen, H. Fingerprinting the datacenter: Automated classification of performance crises. In Proceedings of the 5th European Conference on Computer Systems EuroSys ’10, Paris, France, 13–16 April 2010. [Google Scholar]
  15. Chen, M.; Zheng, A.; Lloyd, J.; Jordan, M.; Brewer, E. Failure diagnosis using decision trees. In Proceedings of the International Conference on Autonomic Computing, New York, NY, USA, 17–18 May 2004; pp. 36–43. [Google Scholar] [CrossRef]
  16. Liang, Y.; Zhang, Y.; Xiong, H.; Sahoo, R. Failure prediction in ibm bluegene/l event logs. In Proceedings of the Seventh IEEE International Conference on Data Mining (ICDM 2007), Omaha, NE, USA, 28–31 October 2007; pp. 583–588. [Google Scholar]
  17. Fang, W.; Tan, X.; Wilbur, D. Application of intrusion detection technology in network safety based on machine learning. Saf. Sci. 2020, 124, 104604. [Google Scholar]
  18. Han, S.; Wu, Q.; Zhang, H.; Qin, B.; Hu, J.; Shi, X.; Liu, L.; Yin, X. Log-Based Anomaly Detection with Robust Feature Extraction and Online Learning. IEEE Trans. Inf. Forensics Secur. 2021, 16, 2300–2311. [Google Scholar] [CrossRef]
  19. Lou, J.G.; Fu, Q.; Yang, S.; Xu, Y.; Li, J. Mining Invariants from Console Logs for System Problem Detection. In Proceedings of the 2010 USENIX Conference on USENIX Annual Technical Conference, USENIXATC’10, Boston, MA, USA, 23–25 June 2010; USENIX Association: Boston, MA, USA, 2010; p. 24. [Google Scholar]
  20. Xu, W.; Huang, L.; Fox, A.; Patterson, D.; Jordan, M.I. Detecting large-scale system problems by mining console logs. In Proceedings of the ACM SIGOPS 22nd Symposium on Operating Systems Principles, Big Sky, MT, USA, 11–14 October 2009; pp. 117–132. [Google Scholar]
  21. Lu, S.; Wei, X.; Li, Y.; Wang, L. Detecting Anomaly in Big Data System Logs Using Convolutional Neural Network. In Proceedings of the 2018 IEEE 16th Intl Conf on Dependable, Autonomic and Secure Computing, 16th Intl Conf on Pervasive Intelligence and Computing, 4th Intl Conf on Big Data Intelligence and Computing and Cyber Science and Technology Congress(DASC/PiCom/DataCom/CyberSciTech), Athens, Greece, 12–15 August 2018; pp. 151–158. [Google Scholar] [CrossRef]
  22. Brown, A.; Tuor, A.; Hutchinson, B.; Nichols, N. Recurrent Neural Network Attention Mechanisms for Interpretable System Log Anomaly Detection. In Proceedings of the First Workshop on Machine Learning for Computing Systems, Tempe, AZ, USA, 12 June 2018. [Google Scholar]
  23. Du, M.; Li, F.; Zheng, G.; Srikumar, V. DeepLog: Anomaly detection and diagnosis from system logs through deep learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security, Dallas, TX, USA, 3 October–3 November 2017; pp. 1285–1298. [Google Scholar]
  24. Vinayakumar, R.; Soman, K.P.; Poornachandran, P. Long short-term memory based operation log anomaly detection. In Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi, India, 13–16 September 2017; pp. 236–242. [Google Scholar] [CrossRef]
  25. Farzad, A.; Gulliver, T.A. Two Class Pruned Log Message Anomaly Detection. SN Comput. Sci. 2021, 2, 391. [Google Scholar] [PubMed]
  26. Zhou, J.; Qian, Y.; Zou, Q.; Liu, P.; Xiang, J. DeepSyslog: Deep Anomaly Detection on Syslog Using Sentence Embedding and Metadata. IEEE Trans. Inf. Forensics Secur. 2022, 17, 3051–3061. [Google Scholar] [CrossRef]
  27. Chen, Y.; Luktarhan, N.; Lv, D. LogLS: Research on System Log Anomaly Detection Method Based on Dual LSTM. Symmetry 2022, 14, 454. [Google Scholar] [CrossRef]
  28. Zhang, X.; Xu, Y.; Lin, Q.; Qiao, B.; Zhang, H.; Dang, Y.; Xie, C.; Yang, X.; Cheng, Q.; Li, Z.; et al. Robust log-based anomaly detection on unstable log data. In Proceedings of the 2019 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, Tallinn, Estonia, 26–30 August 2019. [Google Scholar]
  29. Li, X.; Chen, P.; Jing, L.; He, Z.; Yu, G. SwissLog: Robust Anomaly Detection and Localization for Interleaved Unstructured Logs. IEEE Trans. Dependable Secur. Comput. 2022, 1. [Google Scholar] [CrossRef]
  30. Xiao, R.; Chen, H.; Lu, J.; Li, W.; Jin, S. AllInfoLog: Robust Diverse Anomalies Detection Based on All Log Features. IEEE Trans. Netw. Serv. Manag. 2022, 1. [Google Scholar] [CrossRef]
  31. Li, X.; Chen, P.; Jing, L.; He, Z.; Yu, G. SwissLog: Robust and Unified Deep Learning Based Log Anomaly Detection for Diverse Faults. In Proceedings of the 2020 IEEE 31st International Symposium on Software Reliability Engineering (ISSRE), Coimbra, Portugal, 12–15 October 2020; pp. 92–103. [Google Scholar] [CrossRef]
  32. Nedelkoski, S.; Bogatinovski, J.; Acker, A.; Cardoso, J.; Kao, O. Self-Attentive Classification-Based Anomaly Detection in Unstructured Logs. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 17–20 November 2020; pp. 1196–1201. [Google Scholar] [CrossRef]
  33. Savaridassan, P.; Maragatham, G. Integrated Deep Auto-Encoder and Q-Learning-Based Scheme to Detect Anomalies and Supporting Forensics in Cloud Computing Environments. Wirel. Pers. Commun. 2021, 127, 2247–2265. [Google Scholar]
  34. Huang, S.; Liu, Y.; Fung, C.; He, R.; Zhao, Y.; Yang, H.; Luan, Z. HitAnomaly: Hierarchical Transformers for Anomaly Detection in System Log. IEEE Trans. Netw. Serv. Manag. 2020, 17, 2064–2076. [Google Scholar] [CrossRef]
  35. Wang, Y.; Li, X. FastTransLog: A Log-based Anomaly Detection Method based on Fastformer. In Proceedings of the 2022 9th International Conference on Dependable Systems and Their Applications (DSA), Wulumuqi, China, 4–5 August 2022; pp. 446–453. [Google Scholar] [CrossRef]
  36. Zhou, J.; Qian, Y. AugLog: System Log Anomaly Detection Based on Contrastive Learning and Data Augmentation. In Proceedings of the 2022 5th International Conference on Data Science and Information Technology (DSIT), Shanghai, China, 22–24 July 2022; pp. 1–7. [Google Scholar] [CrossRef]
  37. Zhang, M.; Chen, J.; Liu, J.; Wang, J.; Shi, R.; Sheng, H. LogST: Log Semi-supervised Anomaly Detection Based on Sentence-BERT. In Proceedings of the 2022 7th International Conference on Signal and Image Processing (ICSIP), Suzhou, China, 20–22 July 2022; pp. 356–361. [Google Scholar] [CrossRef]
  38. Yang, L.; Chen, J.; Wang, Z.; Wang, W.; Jiang, J.; Dong, X.; Zhang, W. PLELog: Semi-Supervised Log-Based Anomaly Detection via Probabilistic Label Estimation. In Proceedings of the 2021 IEEE/ACM 43rd International Conference on Software Engineering: Companion Proceedings (ICSE-Companion), Madrid, Spain, 25–28 May 2021; pp. 230–231. [Google Scholar] [CrossRef]
  39. Guo, H.; Yuan, S.; Wu, X. LogBERT: Log Anomaly Detection via BERT. In Proceedings of the 2021 International Joint Conference on Neural Networks (IJCNN), Shenzhen, China, 18–22 July 2021; pp. 1–8. [Google Scholar] [CrossRef]
  40. Jiang, Z.M.; Hassan, A.E.; Flora, P.; Hamann, G. Abstracting Execution Logs to Execution Events for Enterprise Applications (Short Paper). In Proceedings of the 2008 the Eighth International Conference on Quality Software, Oxford, UK, 12–13 August 2008; pp. 181–186. [Google Scholar] [CrossRef]
Figure 1. Overview structure of MDFULog. Split, clustering, and mask are used to parse the log, extract its valid information, and capture the semantic and temporal features for vectorization, and an Informer-based classification model is used for anomaly detection.
Figure 2. Bert structure.
Figure 3. Bert input sequence diagram.
Figure 4. Informer.
Figure 5. Log sequence anomaly.
Figure 6. Time anomaly.
Figure 7. Accuracy of log parsing methods on various log volumes. (a) HDFS. (b) BGL. (c) Android.
Figure 8. Efficiency of log parsing methods on various log volumes. (a) HDFS. (b) BGL. (c) Android.
Table 1. Results of comparing MDFULog with the baseline methods on stable datasets.
| Model      | HDFS P   | HDFS R   | HDFS F1   | OpenStack P | OpenStack R | OpenStack F1 |
|------------|----------|----------|-----------|-------------|-------------|--------------|
| PCA        | 0.86     | 0.68     | 0.759     | 0.51        | 0.68        | 0.583        |
| DeepLog    | 0.89     | 0.93     | 0.910     | 0.78        | 0.81        | 0.795        |
| LogRobust  | 0.93     | 0.96     | 0.945     | 0.89        | 0.81        | 0.848        |
| ROEAD      | 0.96     | 0.98     | 0.970     | 0.94        | 0.94        | 0.940        |
| DeepSyslog | 0.96     | 0.95     | 0.955     | 0.87        | 0.97        | 0.917        |
| MDFULog    | **0.98** | **0.98** | **0.980** | **0.96**    | **0.96**    | **0.960**    |

Note: The best results are highlighted in bold.
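The F1 values reported in Tables 1 and 2 are the harmonic mean of precision and recall, which can be checked directly against the tabulated P and R columns:

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)
```

For example, PCA on HDFS gives `f1_score(0.86, 0.68)` ≈ 0.759, matching Table 1.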
Table 2. Results for comparing MDFULog versus the baseline methods on unstable datasets.
| Injection Ratio | Model      | P        | R        | F1        |
|-----------------|------------|----------|----------|-----------|
| 5%              | PCA        | 0.85     | 0.66     | 0.743     |
| 5%              | DeepLog    | 0.86     | 0.79     | 0.824     |
| 5%              | LogRobust  | 0.93     | 0.91     | 0.920     |
| 5%              | ROEAD      | 0.96     | 0.95     | 0.955     |
| 5%              | DeepSyslog | 0.92     | 0.90     | 0.910     |
| 5%              | MDFULog    | **0.98** | **0.97** | **0.975** |
| 10%             | PCA        | 0.78     | 0.64     | 0.703     |
| 10%             | DeepLog    | 0.75     | 0.58     | 0.654     |
| 10%             | LogRobust  | 0.89     | 0.90     | 0.895     |
| 10%             | ROEAD      | 0.95     | 0.92     | 0.935     |
| 10%             | DeepSyslog | 0.89     | 0.81     | 0.848     |
| 10%             | MDFULog    | **0.98** | **0.98** | **0.980** |
| 15%             | PCA        | 0.73     | 0.42     | 0.533     |
| 15%             | DeepLog    | 0.75     | 0.51     | 0.607     |
| 15%             | LogRobust  | 0.89     | 0.88     | 0.885     |
| 15%             | ROEAD      | 0.90     | 0.86     | 0.880     |
| 15%             | DeepSyslog | 0.88     | 0.72     | 0.792     |
| 15%             | MDFULog    | **0.96** | **0.96** | **0.960** |
| 20%             | PCA        | 0.72     | 0.37     | 0.489     |
| 20%             | DeepLog    | 0.70     | 0.42     | 0.525     |
| 20%             | LogRobust  | 0.90     | 0.89     | 0.895     |
| 20%             | ROEAD      | 0.85     | 0.83     | 0.840     |
| 20%             | DeepSyslog | 0.87     | 0.69     | 0.770     |
| 20%             | MDFULog    | **0.95** | **0.97** | **0.960** |

Note: The best results are highlighted in bold.

Share and Cite

Li, M.; Sun, M.; Li, G.; Han, D.; Zhou, M. MDFULog: Multi-Feature Deep Fusion of Unstable Log Anomaly Detection Model. Appl. Sci. 2023, 13, 2237. https://doi.org/10.3390/app13042237
