Article

AddAG-AE: Anomaly Detection in Dynamic Attributed Graph Based on Graph Attention Network and LSTM Autoencoder

1 School of Cyberspace, Hangzhou Dianzi University, Baiyang Street, Hangzhou 310018, China
2 Zhongfu Information Co., Ltd., Jingshi Road, Jinan 250100, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(13), 2763; https://doi.org/10.3390/electronics12132763
Submission received: 28 April 2023 / Revised: 13 June 2023 / Accepted: 18 June 2023 / Published: 21 June 2023

Abstract:
Recently, anomaly detection in dynamic networks has received increased attention due to the massive network-structured data arising in many fields, such as network security, intelligent transportation systems, and computational biology. However, many existing methods in this area fail to fully leverage all the information available in dynamic networks. Additionally, most of these methods are supervised or semi-supervised algorithms that require labeled data, which may not be feasible in real-world scenarios. In this paper, we propose AddAG-AE, a general dynamic graph anomaly-detection framework that fuses node attributes and spatiotemporal information to detect anomalies in an unsupervised manner. The framework consists of two main components. The first is a feature extractor composed of a dual autoencoder, which captures a joint representation of both the network structure and node attributes in a latent space. The second is an anomaly detector that combines a Long Short-Term Memory AutoEncoder (LSTM-AE) and a predictor, effectively identifying abnormal snapshots among mostly normal graph snapshots. Experimental results on three datasets with different sparsity levels show that, compared with the baselines, the proposed method has broader applicability and higher robustness.

1. Introduction

With the advancement of science and technology, numerous domains, such as network security, intelligent transportation systems, social media, and computational biology [1,2,3,4], are producing large amounts of network-structured data composed of many interdependent objects and time-varying components. These data often contain anomalies (unusual patterns or behaviors that deviate significantly from most of the data), which are typically associated with, for example, attacks in network security or fraud in social networks. Therefore, effectively detecting anomalies in network-structured data is of great significance for mitigating potential risks, monitoring system status, and ensuring system security.
Due to the high complexity of these data, fully and reasonably integrating information from various dimensions and effectively identifying anomalies has become a significant challenge for anomaly detection. To obtain as much information as possible from the network, attributed graphs are usually adopted in contemporary research [5]. However, because these graphs contain both structural and attribute features, graph anomaly detection (GAD) in attributed networks poses a more complex problem in non-Euclidean space. The authors in [6,7,8] detected anomalous nodes in static attributed graphs and achieved better performance than their baselines. However, such methods, based on static attributed graphs, usually ignore the dynamic evolution of graph structures and node attributes. Other papers focus on anomaly detection in dynamic graphs. Some literature [9,10] has explored traditional outlier-detection algorithms such as Robust Random Cut Forest (RRCF) [11] and Isolation Forest [12], while other works explore anomaly detection in dynamic attributed graphs using deep-learning approaches [13]. Graph Convolutional Network (GCN) [14], Graph Attention Network (GAT) [15], GraphSage [16], and AutoEncoder (AE) [17] are representative methods for dealing with graph data. However, these methods neglect the temporal information of graph data. Long Short-Term Memory (LSTM) [18], Gate Recurrent Unit (GRU) [19], and forecasting-based models are well designed for dealing with temporal information, but they neglect both the attribute information of the nodes and the structural information of the graph. To leverage all available information in the network effectively, attempts have been made [20,21,22] to combine both types of techniques in anomaly detection for specific application domains. These efforts have demonstrated the feasibility and efficiency of detecting anomalies in graph-structured data.
Among the abovementioned methods, each evidently offers a novel algorithmic solution with high performance. However, some of them do not fully integrate all the features of network data, including the structural, attribute, and temporal factors of dynamic networks. Additionally, certain methods rely on labeled data, which is sometimes impractical in real-world scenarios. Another significant limitation of existing works is that these methods apply only to specific types of network datasets. For instance, one method may perform well on networks with few nodes and dense connections between them but degrade severely when applied to sparse networks; correspondingly, a method that excels on sparse networks may perform worse on dense ones.
To alleviate the aforementioned problems, we propose AddAG-AE, a novel framework for anomaly detection in dynamic attributed networks. AddAG-AE can reasonably combine network structure, node attribute, and temporal information. The framework consists of two main components, namely the feature extractor and the anomaly detector. The feature extractor includes a structure autoencoder and an attribute autoencoder, which reconstruct the original node attributes and the network adjacency matrix to obtain a joint representation vector in the latent space. The anomaly detector, composed of an LSTM-AE and a forecasting-based model, detects anomalies by taking the node attribute matrix, the adjacency matrix, and the latent representation vectors as input. The main contributions of this work are summarized as follows:
  • We propose an anomaly-detection framework, named AddAG-AE, that effectively integrates spatiotemporal, structural, and attribute information to achieve higher accuracy in anomaly detection in a self-supervised manner. It addresses the issue of integrating different types of information and improves applicability to different graph structures (sparsity levels).
  • In the graph embedding phase, we design and implement a new encoding–decoding mechanism based on GAT, which makes full use of the horizontal and vertical dimensions of the latent matrix, mining the potential associations between different types of information and enhancing the effectiveness of node representations.
  • In the anomaly-detection phase, a joint optimization objective is introduced to effectively integrate the reconstruction model and the prediction model. The reconstruction loss in this joint optimization objective takes into account latent vector reconstruction, graph structure reconstruction, and attribute reconstruction, effectively improving the model’s robustness and performance in anomaly detection.
We conduct extensive experiments on three different kinds of datasets, including densely connected graphs, sparsely connected graphs, and graphs with frequent changes in connections or weights. The results show the broader applicability and superior performance of AddAG-AE for anomaly detection on different kinds of graphs.

2. Related Work

In this section, we first review the existing methods for anomaly detection, including some typical frameworks and deep-learning-based ones, and then summarize some approaches relevant to our work.

2.1. Anomaly Detection in Dynamic Networks

In anomaly detection for dynamic networks, real-world networks with evolving relationships between objects and their attributes usually need to be modeled as dynamic graphs. The benefits and limitations of existing work are shown in Table 1. Typical methods, such as StreamSpot [10], Spotlight [9], and Snapsketch [23], detect anomalies by mapping the graphs modeled from real network data to graph vectors in a sketch space through a special sketching method and then classifying them with typical one-class classification algorithms. These works take full advantage of network structural features but cannot maintain and process time-varying components.
Recently, an increasing number of deep-learning-based methods have been proposed to tackle this issue. The authors in [24] proposed a novel anomaly-detection framework, named GmapAD, which fully explores the structural and attribute information within and between graphs. The framework uses representative nodes in the graph to map it to a new feature space and employs traditional machine-learning classifiers for anomaly detection. Although this method extensively exploits structural and attribute information at the graph level, it overlooks the long- and short-term patterns of the nodes. To combine all available information in dynamic graph data reasonably, the authors in [22] proposed a semi-supervised model for anomalous-edge detection, named AddGraph, which employs an attention-based GRU to capture hidden information on long-term patterns and a temporal GCN that processes graph structural information on the short-term patterns of the nodes as input to the GRU at the current timestamp. However, this method assumes that the training data at the initial timestamps are ideal and contain no anomalous edges. The authors in [25] proposed an unsupervised model, DeepSphere, which first incorporates hypersphere learning into the LSTM-AE; by learning the boundary between normal and abnormal data, it can overcome problems that may seriously degrade the quality of the neural network during training. In practice, this method performs well on relatively stable dense graphs but loses the ability to detect anomalies when applied to unstable sparse graphs. Additionally, it does not consider the graph structure.

2.2. Graph Embedding Techniques and Deep Autoencoder

Graph Embedding aims at obtaining the embedding vector of a node or graph in a latent space that sufficiently preserves the valuable information in graph-structured data. Corresponding techniques [32] have been widely exploited. Most can be grouped into three kinds: (i) factorization-based methods, such as Locally Linear Embedding (LLE) [26], Laplacian Eigenmaps [27], and Graph Factorization (GF) [28]; (ii) random-walk-based methods, such as DeepWalk [29] and Node2vec [30]; and (iii) deep-learning-based methods, such as Structural Deep Network Embedding (SDNE) [31], GCN, GAT, and others.
Factorization-based and random-walk-based methods focus on preserving structural similarity only. Deep-learning-based methods, especially Graph Neural Network (GNN)-related techniques, with their strong capability to fuse structural and content features, have attracted extensive interest recently. GCN generalizes the idea of convolution to non-Euclidean space: by aggregating the representations of nodes themselves and their one-step neighbors, it can process graph structural features and node content features in general graphs rather than only regular ones. Building on this idea, GAT improves the aggregation by using a self-attention layer to learn weights for different neighbors, exploiting neighbor information more reasonably than GCN; it can further enhance its capability to capture information by applying multi-head attention.
General GNN methods are well designed for combining all available features of graph data, but they do not consider the temporal features that reflect the evolution of the graph structure in dynamic graphs. Naturally, an architecture that can capture temporal information needs to be added for dynamic-graph anomaly detection. In time-series anomaly detection, Recurrent Neural Networks (RNNs) [33], designed to capture temporal information, have been widely used. To overcome the gradient vanishing and exploding problems of the standard RNN, the authors in [18] proposed LSTM, which successfully solved the long-term dependency problem. Combining the idea of deep autoencoding, the LSTM-AE has become a reasonable choice for capturing temporal information in dynamic graphs and detecting anomalies by reconstructing the original input data. Additionally, inspired by the work of the authors in [34], we introduce a forecasting-based model that detects anomalies through the prediction errors for the next timestamp, as a complementary method to improve the overall performance of our model under the joint optimization strategy.

3. Problem Definition

In this section, we introduce notations and definitions for our framework. The notations used in this article are listed in Table 2.

3.1. Definition 1: Dynamic Attributed Graph

A dynamic attributed graph stream can be defined as $\mathcal{G} = \{\nu^t, \varepsilon^t, \chi^t\}_{t=1}^{T}$. We denote by $G(t) = (V^t, A^t, X^t)$ a snapshot in $\mathcal{G}$, where $t$ is the timestamp, $V^t = \{V_i^t\}_{i=1}^{N}$ is the node set, $V_i^t$ is the $i$-th node, and each row of the attribute matrix $X^t$ represents a node's attribute vector. $A^t$ is an adjacency matrix with $A_{i,j}^t = \omega$, where $\omega$ is the weight of the edge. We adopt unweighted graphs in this work, so $A_{i,j}^t \in \{0, 1\}$, and $A_{i,j}^t = 1$ when $i = j$, because GAT-AE considers each node's self-connection in Equations (1)–(4).
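To make the data model concrete, the following minimal sketch (ours, not from the paper) shows one way to hold such a snapshot stream; the use of PyTorch and the `Snapshot` class name are assumptions for illustration.

```python
import torch

class Snapshot:
    """One snapshot G(t) = (V^t, A^t, X^t) of a dynamic attributed graph."""
    def __init__(self, adj: torch.Tensor, attrs: torch.Tensor):
        assert adj.shape[0] == adj.shape[1] == attrs.shape[0]
        # Unweighted graph: binarize any weights, then set A_ii = 1, since
        # the GAT encoder also aggregates each node's own features.
        n = adj.shape[0]
        self.A = ((adj > 0).float() + torch.eye(n)).clamp(max=1.0)
        self.X = attrs  # N x d attribute matrix, one row per node

# A toy stream of T snapshots over N nodes with d-dimensional attributes.
N, d, T = 5, 8, 3
stream = [Snapshot(torch.randint(0, 2, (N, N)).float(), torch.randn(N, d))
          for _ in range(T)]
```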

3.2. Definition 2: Anomaly Detection

Given a graph stream $\{G(t)\}$, our goal is to detect anomalous snapshots (i.e., anomalous graphs) in which the state of the whole system at the current timestamp differs from that at most other timestamps. For each node $i$ at any timestamp $t$, we learn a function $f(V_i)$ that reflects the node's anomaly probability; this function also serves as the node's anomaly score. If the average anomaly score over all nodes in the graph at a specific timestamp $t$ exceeds a certain threshold, the graph is defined as anomalous.
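As a minimal sketch of this graph-level decision rule (the function name and the use of PyTorch are our assumptions), the check reduces to a thresholded mean:

```python
import torch

def is_anomalous_graph(node_scores: torch.Tensor, threshold: float) -> bool:
    """Graph-level rule of Definition 2: a snapshot is anomalous when the
    mean node anomaly score f(V_i) over all nodes exceeds a threshold."""
    return node_scores.mean().item() > threshold

print(is_anomalous_graph(torch.tensor([0.1, 0.2, 3.5]), threshold=1.0))  # True
```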

4. Proposed Method

In this section, we elaborate on the proposed AddAG-AE framework, an overview of which is illustrated in Figure 1a. AddAG-AE contains two parts, namely graph embedding and anomaly detection. The former can be viewed as a feature extractor that learns node-embedding vectors fusing graph structural and node attribute information, while the latter is an anomaly detector that detects anomalies with temporal information taken into account. We make corresponding improvements to the underlying algorithms in both phases.

4.1. Graph Embedding Based on GAT-AE

The details of GAT-AE are shown in Figure 1b. In a dynamic attributed graph stream, each snapshot contains graph structural and node attribute features. Therefore, we aggregate the information of a node with that of its neighbors in the graph at timestamp $t$ to obtain its latent representation $Z_i^t$ as follows:

$$Z_i^t = \mathrm{ReLU}\Big(\beta_{i,i} X_i^t W + \sum_{j \in N^t(i)} \beta_{i,j} X_j^t W\Big) \quad (1)$$

where $X_i^t \in \mathbb{R}^{1 \times d}$ is node $i$'s attribute feature, $W \in \mathbb{R}^{d \times d'}$ is the weight matrix ($d'$ denotes the embedding dimension of the GAT layer), whose parameters are learned by the encoder and shared among all nodes at all timestamps, $N^t(i) = \{j \mid A_{ij}^t > 0\}$ is the set of node $i$'s neighbors, and the attention score $\beta_{i,j}$ can be computed as
$$e_{ij} = \mathrm{LeakyReLU}\big(o^{\mathrm{Tr}} \big[ X_i^t W \oplus X_j^t W \big]\big) \quad (2)$$

$$\beta_{i,j} = \frac{\exp(e_{i,j})}{\sum_{k \in N^t(i) \cup \{i\}} \exp(e_{i,k})} \quad (3)$$

where $o \in \mathbb{R}^{2d'}$ is a vector of learned parameters for the attention mechanism, $\oplus$ denotes the concatenation of two vectors, and $\mathrm{Tr}$ denotes the transpose of a matrix. The attention score is computed by the $\mathrm{LeakyReLU}$ function and normalized by the SoftMax function. After obtaining the representations $Z_i^t$ of all nodes, we decode them to reconstruct the original graph $G(t)$ with a structure decoder and an attribute decoder. For graph structural decoding, we take the latent representations as the structure decoder's input and calculate their inner products to obtain the reconstructed adjacency matrix
$$\hat{A}^t = \mathrm{Sigmoid}\big(Z^t (Z^t)^{\mathrm{Tr}}\big) \quad (4)$$

where $Z^t \in \mathbb{R}^{N \times d'}$ is the matrix stacking the representations $Z_1^t, Z_2^t, \ldots, Z_N^t$ of all nodes at time step $t$, $(Z^t)^{\mathrm{Tr}}$ denotes the transpose of $Z^t$, and $\hat{A}^t \in \mathbb{R}^{N \times N}$ denotes the reconstructed adjacency matrix. For node attribute decoding, we improve the idiomatic decoding manner [35] by employing two fully connected layers along two different directions, which exploits more of the information in the nodes' latent representations, and reconstruct the attribute matrix as follows:
$$Z'^t = \mathrm{ReLU}\big((Z^t)^{\mathrm{Tr}} W_1 + B_1\big) \quad (5)$$

$$\hat{X}^t = (Z'^t)^{\mathrm{Tr}} W_2 + B_2 \quad (6)$$

where $W_1 \in \mathbb{R}^{N \times N}$ and $B_1 \in \mathbb{R}^{d' \times N}$ denote the weights and bias along one dimension, and $W_2 \in \mathbb{R}^{d' \times d}$ and $B_2 \in \mathbb{R}^{N \times d}$ denote the weights and bias along the other. These parameters are learned by the decoder and shared across all timestamps. $\hat{X}^t$ is the reconstructed attribute matrix at timestamp $t$. As shown above, our model introduces reconstruction models with two parts (i.e., graph structural and node attribute reconstruction). Correspondingly, the losses of both parts ought to be taken into account. Therefore, the loss function of the reconstruction errors can be formulated as follows:
$$L_1 = \alpha L_S + (1 - \alpha) L_X \quad (7)$$

where $\alpha$ is a tradeoff parameter that balances the graph structure and node attribute reconstruction errors. $L_S$ is a reconstruction loss derived from maximum likelihood estimation, which can be written as

$$L_S = -\frac{1}{N} \sum_{i=1}^{N} \sum_{j=1}^{N} \Big[ A_{ij}^t \log \hat{A}_{ij}^t + \big(1 - A_{ij}^t\big) \log \big(1 - \hat{A}_{ij}^t\big) \Big] \quad (8)$$
where $N$ is the number of nodes, $A_{ij}^t$ is the real element value of the adjacency matrix, and $\hat{A}_{ij}^t$ is the predicted value from Equation (4). For node attribute reconstruction, we take the mean square error between the original node attributes and the reconstructed vectors as the loss function:

$$L_X^t = \frac{1}{N} \sum_{i=1}^{N} \big\| X_i^t - \hat{X}_i^t \big\|^2 \quad (9)$$

where $X_i^t$ and $\hat{X}_i^t$ are node $i$'s attribute vector and the corresponding reconstructed vector, respectively. After training with the aforementioned method, we retain only the GAT encoding part (i.e., Equations (1)–(3), corresponding to the encoding layer). To illustrate the GAT-based graph embedding with an example: for a given graph at a specific time step, we have the node attribute matrix $X \in \mathbb{R}^{N \times d}$ and the adjacency matrix $A \in \mathbb{R}^{N \times N}$ (the adjacency matrix determines the neighbor sets $N^t(i)$ in Equation (1)). We input these matrices into the GAT layer and, after applying Equation (1), obtain the embedding matrix $Z \in \mathbb{R}^{N \times d'}$.
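The following PyTorch sketch illustrates a single-head reading of the attention encoding in Equations (1)–(3), the dual decoding of Equations (4)–(6), and the joint loss of Equations (7)–(9). It is our illustrative reconstruction, not the authors' released code; the class layout and parameter names are assumptions, and the binary cross-entropy term matches Equation (8) only up to normalization.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class GATAutoencoder(nn.Module):
    """Sketch of the GAT-AE of Section 4.1 (single attention head)."""
    def __init__(self, n_nodes, d_in, d_emb):
        super().__init__()
        self.W = nn.Linear(d_in, d_emb, bias=False)       # shared weight W, Eq. (1)
        self.o = nn.Parameter(torch.empty(2 * d_emb, 1))  # attention vector o, Eq. (2)
        nn.init.xavier_uniform_(self.o)
        # Attribute decoder acting along both dimensions of Z, Eqs. (5)-(6).
        self.W1 = nn.Linear(n_nodes, n_nodes)             # mixes the node dimension
        self.W2 = nn.Linear(d_emb, d_in)                  # maps back to attributes

    def encode(self, X, A):
        H = self.W(X)                                     # N x d'
        n = H.shape[0]
        pairs = torch.cat([H.repeat_interleave(n, dim=0), # row i*n+j is [h_i ; h_j]
                           H.repeat(n, 1)], dim=1)
        e = F.leaky_relu(pairs @ self.o).view(n, n)       # Eq. (2)
        e = e.masked_fill(A == 0, float('-inf'))          # attend to N(i) and self only
        beta = torch.softmax(e, dim=1)                    # Eq. (3)
        return F.relu(beta @ H)                           # Eq. (1)

    def decode(self, Z):
        A_hat = torch.sigmoid(Z @ Z.T)                    # Eq. (4)
        Zp = F.relu(self.W1(Z.T))                         # Eq. (5): d' x N
        X_hat = self.W2(Zp.T)                             # Eq. (6): N x d
        return A_hat, X_hat

def reconstruction_loss(A, A_hat, X, X_hat, alpha=0.5, eps=1e-7):
    L_S = F.binary_cross_entropy(A_hat.clamp(eps, 1 - eps), A)  # Eq. (8), up to normalization
    L_X = ((X - X_hat) ** 2).sum(dim=1).mean()                  # Eq. (9)
    return alpha * L_S + (1 - alpha) * L_X                      # Eq. (7)

N, d = 5, 8
A = (torch.rand(N, N) > 0.5).float().fill_diagonal_(1.0)  # self-loops included
X = torch.randn(N, d)
model = GATAutoencoder(n_nodes=N, d_in=d, d_emb=4)
Z = model.encode(X, A)
A_hat, X_hat = model.decode(Z)
loss = reconstruction_loss(A, A_hat, X, X_hat, alpha=0.3)
```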

4.2. Anomaly Detection

After the above processing by the GAT-AE, we obtain the embedding vectors of all nodes in all graphs, which represent the structure and attribute features in latent space. However, the time dependency of the same node across different time steps has not yet been taken into account for anomaly detection. Furthermore, a node with a high anomaly score in Section 4.1 only reflects its degree of deviation from the other, normal nodes in the same graph (at the same timestamp), not its deviation from its own historical features; i.e., we would rather score node $i$ by comparing its state at the current timestamp $t$ with its historical states at timestamps $t-1, t-2, \ldots$.
Thus, to incorporate temporal information into anomaly detection, the detector includes two parts: a reconstruction-based model aiming to capture the data distribution of the entire graph stream, and a forecasting-based model aiming to predict the values at the next timestamp. During training, the parameters of both models are updated simultaneously. The overall loss function is defined as follows:

$$L_2 = L_{rec} + L_{for} \quad (10)$$

where $L_{rec}$ and $L_{for}$ denote the reconstruction loss and forecasting loss, respectively.

4.2.1. Reconstruction-Based Model

We adopt an LSTM-AE as the reconstruction model to reconstruct the embedding vectors produced by the GAT-AE and thereby find anomalous nodes in the data. Note that we reshape the output tensor of the GAT-AE so that the embedding vectors of the same node $i$ at different timestamps form a time sequence $Z_i^1, Z_i^2, \ldots, Z_i^T$ (denoted $X_i$) fed into the LSTM-AE. In the encoding process, the encoder transforms $X_i$ into a hidden representation with a regular LSTM model, as follows:

$$h_{en}^t = \mathrm{LSTM}_{en}\big(Z_i^t, h_{en}^{t-1}\big) \quad (11)$$

where $h_{en}^t \in \mathbb{R}^{1 \times d''}$ denotes the hidden vector of node $i$ after processing by the LSTM encoder ($d''$ denotes the embedding dimension of the LSTM layer). In the decoding process, we take the last hidden vector $h_{en}^T$ as the decoder's initial state to reconstruct the sequence $X_i$:
$$\hat{Z}_i^{t-1} = \mathrm{LSTM}_{de}\big(\hat{Z}_i^t, h_{en}^{t-1}\big) \quad (12)$$

where $\hat{Z}_i^{t-1}$ is the reconstructed vector at timestamp $t-1$. As Equation (12) shows, we use the output at timestamp $t$ as the input at timestamp $t-1$ in the decoding phase. Finally, the training loss function of our model is computed as follows:

$$L_{rec} = \frac{1}{T} \sum_{i=1}^{N} \sum_{t=1}^{T} \big\| Z_i^t - \hat{Z}_i^t \big\|^2 \quad (13)$$

where $Z_i^t$ and $\hat{Z}_i^t$ are the feature vector and reconstructed vector, respectively, and $N$ is the number of nodes. The number of LSTM layers can be set to different values; we keep the same number of layers in the encoding and decoding LSTMs. Based on overall performance considerations, the number of layers in the LSTM encoder–decoder is set to two in this paper (for the relevant experiments, refer to Section 5.4.3).
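A minimal PyTorch sketch of this reconstruction model follows (our illustration, not the authors' code); seeding the decoder recursion of Equation (12) with the last observed vector is an assumption.

```python
import torch
import torch.nn as nn

class LSTMAutoencoder(nn.Module):
    """Sketch of the two-layer LSTM-AE of Section 4.2.1: encode each node's
    embedding sequence, then reconstruct it step by step from the final
    hidden state, feeding each output back in as the next input (Eq. (12))."""
    def __init__(self, d_emb, d_hidden, n_layers=2):
        super().__init__()
        self.encoder = nn.LSTM(d_emb, d_hidden, n_layers, batch_first=True)
        self.decoder = nn.LSTM(d_emb, d_hidden, n_layers, batch_first=True)
        self.out = nn.Linear(d_hidden, d_emb)

    def forward(self, seq):                    # seq: (n_nodes, T, d_emb)
        _, state = self.encoder(seq)           # final (h, c) summarizes X_i, Eq. (11)
        x, recon = seq[:, -1:, :], []
        for _ in range(seq.shape[1]):
            y, state = self.decoder(x, state)
            x = self.out(y)                    # next decoder input = current output
            recon.append(x)
        return torch.cat(recon[::-1], dim=1)   # reconstructed in original time order

def rec_loss(seq, recon):                      # Eq. (13)
    T = seq.shape[1]
    return ((seq - recon) ** 2).sum() / T

seqs = torch.randn(5, 7, 4)                    # 5 nodes, T = 7, d' = 4
model = LSTMAutoencoder(d_emb=4, d_hidden=6)
loss = rec_loss(seqs, model(seqs))
```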

4.2.2. Forecasting-Based Model

First, we stack fully connected layers that take the hidden vector $h_{en}^t$ of the LSTM-AE as input and use $\mathrm{ReLU}$ as the nonlinear activation function, to extract features from the hidden vectors that contain all the spatiotemporal information of a node:

$$\tilde{Z}_i^t = f_{\mathrm{ReLU}}\big(h_{en}^{t-1}\big) \quad (14)$$

Then, we stack $\tilde{Z}_1^t, \tilde{Z}_2^t, \ldots, \tilde{Z}_N^t$ at time step $t$ into the matrix $\tilde{Z}^t$ and use Equations (4)–(6) to decode $\tilde{Z}^t$, predicting the next timestamp's attribute matrix and adjacency matrix. The loss function of the predictor can be formulated as follows:

$$L_{for} = \frac{1}{T} \sum_{t=1}^{T} \sum_{i=1}^{N} \Big( \big\| X_i^t - \hat{X}_i^t \big\|^2 + \sum_{j \in N^t(i) \cup \{i\}} \big\| A_{i,j}^t - \hat{A}_{i,j}^t \big\|^2 \Big) \quad (15)$$
where $X_i^t$ and $\hat{X}_i^t$ are the original attribute vector of node $i$ and the vector predicted by our predictor, respectively, and $A_{i,j}^t$ and $\hat{A}_{i,j}^t$ denote the real edge indicator and the predicted probability of the edge, respectively. $N$ is the number of nodes, and $t$ ranges from 1 to $T$, denoting the timestamp. Finally, we adopt a node's LSTM reconstruction error plus its forecasting error as its anomaly score, which can be formulated as follows:

$$S^t(V_i) = \big\| Z_i^t - \hat{Z}_i^t \big\|^2 + \big\| X_i^{t+1} - \hat{X}_i^{t+1} \big\|^2 + \sum_{j \in N^{t+1}(i)} \big\| A_{i,j}^{t+1} - \hat{A}_{i,j}^{t+1} \big\|^2 \quad (16)$$

where $S^t(V_i)$ is the anomaly score of node $i$ at timestamp $t$. This shows that AddAG-AE can detect anomalous nodes in dynamic graphs. In practical applications, however, when working with a graph stream, the focus often shifts to assessing whether the whole system state at a specific timestamp (i.e., the statistical properties of a graph) is anomalous. To address this, we can evaluate the mean of the anomaly scores, $S(t) = \sum_{i=1}^{N} S^t(V_i) / N$, or count the nodes whose anomaly score exceeds a particular threshold, $S(t) = \big|\{\, i \mid S^t(V_i) > \lambda \,\}\big|$, as the metric for determining whether the current network is anomalous. We adopt the former in this article.
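The scoring of Equation (16) and the graph-level mean score can be sketched as follows (our illustration; the argument names are assumptions, and masking with the next adjacency matrix stands in for the neighbor sum over $N^{t+1}(i)$):

```python
import torch

def node_scores(Z, Z_hat, X_next, X_hat_next, A_next, A_hat_next):
    """Per-node anomaly score S^t(V_i) of Eq. (16): LSTM reconstruction
    error at t plus attribute and structure forecasting errors at t+1.
    All arguments are matrices with one row per node."""
    rec = ((Z - Z_hat) ** 2).sum(dim=1)
    attr = ((X_next - X_hat_next) ** 2).sum(dim=1)
    # Restrict the structural term to edges present at t+1 (j in N^{t+1}(i)).
    struct = (((A_next - A_hat_next) ** 2) * (A_next > 0).float()).sum(dim=1)
    return rec + attr + struct

def graph_score(scores: torch.Tensor) -> float:
    """Graph-level score S(t): the mean node score adopted in this article."""
    return scores.mean().item()
```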

5. Experiment

5.1. Datasets

In this paper, we evaluate AddAG-AE on three commonly used real-world datasets: Enron Mail (a graph whose connections change frequently, Table 3), NYC Taxi Trips (a dense graph, Table 4), and IDS 2017 (a sparse graph, Table 5). After processing, the numbers of nodes (#v) and edges (#ε) of the datasets are shown in Table 6, where the symbol "#" denotes "number of".
  • A. NYC Taxi Trips. This dataset covers NYC taxi trips from 2009 to 2018, including details such as times and coordinates. In the experiment, we extracted the data from October 2015 to January 2016 and converted them into one graph per day. The trips are divided into 56 zones using K-means clustering; the zones are the nodes of the graph, and the trips between zones are the edges. For each node, the trips from that zone are aggregated and normalized into a single attribute vector. Special days with unusual traffic patterns are flagged as anomalies for analysis.
  • B. IDS 2017. This dataset was captured on 5 July 2017 and includes source IP, target IP, attack type, timestamp, and other fields, comprising more than 640,000 edges and five attack types. After processing, the graph stream was modeled in 5-min intervals, and a graph was marked as anomalous if it contained at least 200 attack edges.
  • C. Enron Mail. This dataset consists of 352,550 emails exchanged between Enron employees from 1979 to 2002. We focused on the data from 1999 onward and selected the top 147 individuals with the highest numbers of sent or received emails as nodes. The dataset was segmented into a graph stream by day (a preprocessing sketch follows this list). Since there are no directly labeled anomalies, a graph was marked as anomalous if it related to a major scandal event.
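As a preprocessing sketch (ours; the record layout and the integer node indexing are assumptions for illustration), the daily segmentation of a timestamped edge stream can be done as follows:

```python
from collections import defaultdict
import datetime as dt

def edges_to_daily_snapshots(records, n_nodes):
    """Group timestamped (time, src, dst) records into one adjacency matrix
    per day, as for the Enron and taxi streams."""
    days = defaultdict(lambda: [[0] * n_nodes for _ in range(n_nodes)])
    for t, src, dst in records:
        days[t.date()][src][dst] = 1     # unweighted: mark the edge present
    return dict(sorted(days.items()))

records = [(dt.datetime(2000, 1, 1, 9, 30), 0, 1),
           (dt.datetime(2000, 1, 1, 10, 0), 1, 2),
           (dt.datetime(2000, 1, 2, 8, 15), 2, 0)]
snapshots = edges_to_daily_snapshots(records, n_nodes=3)  # two daily graphs
```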

5.2. Evaluation Metrics

To verify the performance of the model more comprehensively, this paper adopts the AUC score, precision, recall, and loss as evaluation metrics. Among them, the AUC score is the main evaluation metric and has been widely used in many past anomaly-detection methods. Specifically, the AUC score is the area under the ROC (receiver operating characteristic) curve, which reflects how well the model ranks the samples. The ROC curve plots the true positive rate against the false positive rate. We treat the labeled anomalous graphs as the positive class and sort all samples by the anomaly score provided by the model; a higher AUC score therefore reflects better anomaly-detection performance. In addition, to confirm the effectiveness of the model, we added two supplementary metrics, recall and precision.
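For example, with scikit-learn (an assumed tooling choice; the numbers and the 0.5 decision threshold are purely illustrative), these metrics can be computed from per-graph anomaly scores as follows:

```python
from sklearn.metrics import precision_score, recall_score, roc_auc_score

# Illustrative values only: label 1 marks a graph annotated as anomalous,
# and `scores` are the per-graph anomaly scores produced by a model.
labels = [0, 0, 1, 0, 1, 0]
scores = [0.10, 0.25, 0.90, 0.30, 0.45, 0.15]

auc = roc_auc_score(labels, scores)           # ranking quality of the scores
preds = [int(s > 0.5) for s in scores]        # the 0.5 threshold is an assumption
prec = precision_score(labels, preds)
rec = recall_score(labels, preds)
print(auc, prec, rec)                         # 1.0 1.0 0.5 here
```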

5.3. Experimental Setup

In the experiments, we train AddAG-AE with the Adam optimizer, using a learning rate of 0.001 and a weight decay of 0.001, on the three commonly used datasets. The GAT-AE embedding dimensions are $2d$, $d$, and $1.5d$ for NYC Taxi Trips, IDS 2017, and Enron Mail, respectively, where $d$ is the dimension of the dataset's node attribute vectors. $\alpha$ is set to 0.1, 0.5, and 0.3 for NYC Taxi Trips, IDS 2017, and Enron Mail, respectively. The numbers of GAT layers and LSTM layers are both set to two for all datasets.
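In PyTorch terms, the stated optimizer settings correspond to the following sketch (the `model` placeholder and the epoch count are assumptions):

```python
import torch

# Adam with the learning rate and weight decay stated above; `model` is a
# stand-in placeholder for the full AddAG-AE network.
model = torch.nn.Linear(8, 8)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3, weight_decay=1e-3)

for epoch in range(100):                            # epoch count assumed
    optimizer.zero_grad()
    loss = model(torch.randn(4, 8)).pow(2).mean()   # placeholder loss
    loss.backward()
    optimizer.step()
```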

5.4. Experimental Results

We compare AddAG-AE with the following six anomaly-detection methods.

5.4.1. Performance Evaluation

The anomaly-detection performance (AUC scores, precision, and recall in the testing phase) of all baseline methods and AddAG-AE on the three datasets is reported in Table 7, where the numbers in bold represent the optimal results for their corresponding indexes. The related analyses and evaluations are as follows:
  • Spotlight [9]: This identifies an anomalous graph by focusing on sudden changes in localized graph structures, which is effective for IDS 2017 because its graph structure changes drastically when a network attack occurs in an otherwise stable network. NYC Taxi Trips, however, has dense connections between nodes, and with Spotlight's way of acquiring features, the changes in the embedding vectors of certain anomalous graph structures are not obvious. For Enron Mail, many normal graphs have graph vectors similar to those of anomalous graphs owing to its special sketching method, which leads to its poor performance.
  • LSTM-AE [36]: This yields worse precision, recall, and AUC scores on all three datasets and is even worse than random guessing on IDS 2017. The main reason is that the method completely disregards the graph structure and its evolution over time. For IDS 2017, which has many nodes and sparse connections between them, the simple encode–decode strategy cannot generate effective node representations.
  • DeepSphere [25]: This combines an LSTM-AE with hypersphere learning and learns a compact boundary separating normal and abnormal data. For NYC Taxi Trips and Enron Mail, which have fewer nodes, it can effectively capture the structural differences between normal and abnormal graphs. DeepSphere's embedding manner expands the adjacency matrix of a graph into a high-dimensional vector simply by concatenating its rows. For IDS 2017, whose number of nodes exceeds 1000, this manner generates a sparse vector of more than one million dimensions per graph, which may be the main reason for the poor performance.
  • AnomalyDAE [7]: This performs better on NYC Taxi Trips and IDS 2017 but only moderately on Enron Mail. Notably, it achieves the best recall and competitive precision on NYC Taxi Trips. The method acquires node representations with a dual autoencoder, which is sensitive to both node attributes and graph structure. It therefore performs generally well across datasets, especially on dense graphs.
  • Dominant [37]: This is stable on the three datasets, although it ignores the dynamic evolution of the same node over time. It adopts a single graph encoder and neglects the complex interactions between graph structure and node attributes. This may suit Enron Mail, on which it achieves the best AUC score.
  • GmapAD [24]: GmapAD exhibits stable and better precision, recall, and AUC scores on the three datasets. It cleverly combines multiple algorithms, such as graph neural networks, differential evolution, and graph mapping, leading to a significant recognition effect on anomalous graphs. It performs better on dense graphs (NYC Taxi Trips) and slightly worse on sparse graphs.
  • AddAG-AE: This achieves the best AUC scores on NYC Taxi Trips and IDS 2017 and performs well on Enron Mail. In addition, it outperforms all baselines in precision on all three datasets and has the best recall on IDS 2017 and Enron Mail. We improved the GAT decoding process, enabling the model to capture a richer set of structural and attribute information; as a result, it produces more effective node representations, which can increase precision and AUC scores. Then, on top of the traditional LSTM-AE, we designed and incorporated a novel prediction module to aid anomaly detection, which ensures the stability of the model. On the NYC Taxi Trips dataset, the best recall is achieved not by AddAG-AE but by AnomalyDAE. The reason may be that, in the GAT decoding stage, AnomalyDAE jointly considers the hidden vectors generated by GAT and the embeddings of the attribute vectors to reconstruct the attributes, whereas AddAG-AE does not. For Enron Mail, whose graphs come from email-exchange statistics, more normal graphs resemble the anomalous graphs, which causes the AUC score to decline slightly.
The experimental results are illustrated in Figure 2, with the loss curves during the training phase for the three datasets depicted in Figure 2a and the AUC-score comparison of the different methods across the three datasets shown in Figure 2b. It can be observed that AddAG-AE improves the AUC score by 2.8% and 2.3% on NYC Taxi Trips and IDS 2017, respectively, compared with the corresponding second-best model, and its performance is approximately on par with the best model on Enron Mail. In addition, Table 7 shows that AddAG-AE achieves gains of 5.3%, 2.6%, and 1.5% in precision on NYC Taxi Trips, IDS 2017, and Enron Mail, respectively, and increases recall by 3.8% and 2.2% on IDS 2017 and Enron Mail. In summary, compared with all baselines, AddAG-AE delivers stable and better performance on the three metrics across the three datasets.

5.4.2. Ablation Experiment

In this section, we carry out a comparative study by contrasting AddAG-AE with the variant lacking the enhanced decoder (AddAG-AE without 2D-MLP) and the variant lacking the SA-decoding improvement (AddAG-AE without SA-decoding). Evaluating this control group by AUC score further verifies the necessity and influence of the improvements in AddAG-AE. The experimental results are shown in Table 8.
In graph embedding, we improve the decoding manner by considering two-dimensional information (2D-MLP), compared with the idiomatic manner in [35]. The experimental results show that this improves the AUC scores by 1.11%, 2.08%, and 4.24% on the three real-world datasets, respectively. In the anomaly-detection phase, we improve the forecasting model by introducing structure and attribute re-decoding of the latent tensor (SA-decoder), compared with [38]. The experimental results show that this improves the AUC scores by 0.96%, 1.46%, and 3.10% on the three datasets, respectively. This demonstrates that both improvements are beneficial to performance.

5.4.3. Parameter Sensitivity

In this section, we investigate the influence of the hyperparameters on model performance, including the embedding dimension $d'$ of GAT-AE, the tradeoff parameter $\alpha$, and the numbers of GAT and LSTM layers.
Figure 3 shows the AUC scores with different hyperparameters on the datasets. For $d'$, whose suitable values are related to the node attribute dimension and usually vary greatly across datasets, we use the ratio of the embedding dimension to the node attribute dimension instead of $d'$ as the x-axis (we take 1/2, 1, 3/2, ...). As shown in Figure 3a, the ratio that yields the highest performance differs across dataset types: it is larger for dense graphs and smaller for sparse ones, owing to differences in the information density of the graphs. The tradeoff parameter $\alpha$ shows a similar trend: the AUC score first increases, then stays flat, and finally decreases as $\alpha$ increases from 0 to 1.0, which is especially obvious on IDS 2017 and Enron Mail. Although the optimal points differ across the three datasets, this implies the importance of jointly considering the contributions of network structure and node attributes in attributed-graph anomaly detection. Regarding the number of network layers, which includes two parameters (the number of GAT encoder layers and the number of LSTM encoder layers), and taking the NYC Taxi Trips dataset as an example, the two-layer model performs best.

6. Discussion

In many practical application scenarios, anomalies in network-structured data from many domains often indicate that the system may be at risk. Accurately and effectively detecting anomalies in these data is therefore of great significance for preventing potential threats, monitoring system status, etc. In this paper, AddAG-AE is proposed to address these problems; it considers graph structure, node attributes, and temporal information to detect anomalous graphs (or nodes) by effectively combining a GAT dual autoencoder, an LSTM-AE, and a forecasting-based model. First, it solves the problem that different types of information are difficult to fuse in dynamic-graph anomaly detection. Second, we improve the GAT dual autoencoder by considering two-dimensional information in the graph embedding phase. Additionally, the dual self-decoding mechanism of GAT is introduced, which improves the model's performance in the anomaly-detection stage. The whole framework detects anomalies in a self-supervised manner without depending on labeled data, which makes it more applicable to real-world situations. Experiments on three commonly used real-world datasets show that AddAG-AE achieves superior AUC score, precision, and recall and applies broadly to different kinds of datasets. Compared with the second-best baseline, the AUC score increases by 2.8% and 2.3% on NYC Taxi Trips and IDS 2017, respectively, and performance is approximately on par with Dominant on Enron Mail; precision improves by 5.3%, 2.6%, and 1.5% on NYC Taxi Trips, IDS 2017, and Enron Mail, respectively; and recall increases by 3.8% and 2.2% on IDS 2017 and Enron Mail, respectively.
However, our exploration of graph attributes is not comprehensive. For example, this paper uses only the weight attribute of nodes; it does not use categorical attributes (such as the attack type in IDS 2017) or edge attributes, which are important in some special networks. In future work, we plan to explore more node and edge features in networks and will attempt to turn the model into an end-to-end algorithm without losing its current advantages.

Author Contributions

Conceptualization, G.M. and G.W.; methodology, G.M.; software, G.M.; validation, G.M. and Z.Z.; formal analysis, G.M.; investigation, Y.T.; resources, G.W.; data curation, B.L.; writing—original draft preparation, G.M.; writing—review and editing, Z.Z.; visualization, Y.T.; supervision, G.W.; project administration, G.W.; funding acquisition, G.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the “Pioneer” and “Leading Goose” R&D Program of Zhejiang (Grant Nos. 2023C03203, 2023C03180, 2022C03174).

Data Availability Statement

No new data were created or analyzed in this study. Data sharing is not applicable to this article.

Acknowledgments

Support for this project was provided by Guohua Wu. We would also like to thank the anonymous reviewers, whose comments contributed considerably to improving this paper.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Vincent, E.; Korki, M.; Seyedmahmoudian, M.; Stojcevski, A.; Mekhilef, S. Detection of false data injection attacks in cyber–physical systems using graph convolutional network. Electr. Power Syst. Res. 2023, 217, 109118.
  2. Kaytaz, U.; Sivrikaya, F.; Albayrak, S. Competitive Learning for Unsupervised Anomaly Detection in Intelligent Transportation Systems. In Proceedings of the ICC 2022-IEEE International Conference on Communications, Seoul, Republic of Korea, 16–20 May 2022; pp. 5433–5438.
  3. Bakkialakshmi, V.; Sudalaimuthu, T. Anomaly Detection in Social Media Using Text-Mining and Emotion Classification with Emotion Detection. In Proceedings of the Cognition and Recognition: 8th International Conference, ICCR 2021, Mandya, India, 30–31 December 2021; Revised Selected Papers; Springer: Berlin/Heidelberg, Germany, 2023; pp. 67–78.
  4. Li, M.M.; Huang, K.; Zitnik, M. Graph representation learning in biomedicine and healthcare. Nat. Biomed. Eng. 2022, 6, 1353–1369.
  5. Ma, X.; Wu, J.; Xue, S.; Yang, J.; Zhou, C.; Sheng, Q.Z.; Xiong, H.; Akoglu, L. A Comprehensive Survey on Graph Anomaly Detection with Deep Learning. IEEE Trans. Knowl. Data Eng. 2021, 99, 1.
  6. Liu, Y.; Li, Z.; Pan, S.; Gong, C.; Zhou, C.; Karypis, G. Anomaly detection on attributed networks via contrastive self-supervised learning. IEEE Trans. Neural Netw. Learn. Syst. 2021, 33, 2378–2392.
  7. Fan, H.; Zhang, F.; Li, Z. AnomalyDAE: Dual autoencoder for anomaly detection on attributed networks. In Proceedings of the ICASSP 2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Virtually, 4–8 May 2020; pp. 5685–5689.
  8. Liu, K.; Dou, Y.; Zhao, Y.; Ding, X.; Hu, X.; Zhang, R.; Ding, K.; Chen, C.; Peng, H.; Shu, K.; et al. Bond: Benchmarking unsupervised outlier node detection on static attributed graphs. Adv. Neural Inf. Process. Syst. 2022, 35, 27021–27035.
  9. Eswaran, D.; Faloutsos, C.; Guha, S.; Mishra, N. Spotlight: Detecting anomalies in streaming graphs. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1378–1386.
  10. Manzoor, E.; Milajerdi, S.M.; Akoglu, L. Fast memory-efficient anomaly detection in streaming heterogeneous graphs. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1035–1044.
  11. Guha, S.; Mishra, N.; Roy, G.; Schrijvers, O. Robust random cut forest based anomaly detection on streams. In Proceedings of the International Conference on Machine Learning, PMLR, New York, NY, USA, 20–22 June 2016; pp. 2712–2721.
  12. Liu, F.T.; Ting, K.M.; Zhou, Z.H. Isolation forest. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Pisa, Italy, 15–19 December 2008; pp. 413–422.
  13. Zhou, L.; Zeng, Q.; Li, B. Hybrid anomaly detection via multihead dynamic graph attention networks for multivariate time series. IEEE Access 2022, 10, 40967–40978.
  14. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
  15. Velickovic, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
  16. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. Adv. Neural Inf. Process. Syst. 2017, 30, 1025–1035.
  17. Park, D.; Hoshi, Y.; Kemp, C.C. A multimodal anomaly detector for robot-assisted feeding using an LSTM-based variational autoencoder. IEEE Robot. Autom. Lett. 2018, 3, 1544–1551.
  18. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780.
  19. Cho, K.; Van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning phrase representations using RNN encoder-decoder for statistical machine translation. arXiv 2014, arXiv:1406.1078.
  20. He, Z.; Chen, P.; Li, X.; Wang, Y.; Yu, G.; Chen, C.; Li, X.; Zheng, Z. A spatiotemporal deep learning approach for unsupervised anomaly detection in cloud systems. IEEE Trans. Neural Netw. Learn. Syst. 2020, 34, 1705–1719.
  21. Deng, L.; Lian, D.; Huang, Z.; Chen, E. Graph convolutional adversarial networks for spatiotemporal anomaly detection. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 2416–2428.
  22. Zheng, L.; Li, Z.; Li, J.; Li, Z.; Gao, J. AddGraph: Anomaly Detection in Dynamic Graph Using Attention-based Temporal GCN. In Proceedings of the IJCAI, Macao, China, 10–16 August 2019; Volume 3, p. 7.
  23. Paudel, R.; Eberle, W. Snapsketch: Graph representation approach for intrusion detection in a streaming graph. In Proceedings of the 16th International Workshop on Mining and Learning with Graphs (MLG), San Diego, CA, USA, 24 August 2020.
  24. Ma, X.; Yang, J.; Wu, J.; Sheng, Q.Z. Towards graph-level anomaly detection via deep evolutionary mapping. In Proceedings of the ICLR 2023, Kigali, Rwanda, 1–5 May 2023.
  25. Teng, X.; Yan, M.; Ertugrul, A.M.; Lin, Y.R. Deep into hypersphere: Robust and unsupervised anomaly discovery in dynamic networks. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018.
  26. Roweis, S.T.; Saul, L.K. Nonlinear dimensionality reduction by locally linear embedding. Science 2000, 290, 2323–2326.
  27. Belkin, M.; Niyogi, P. Laplacian eigenmaps and spectral techniques for embedding and clustering. Adv. Neural Inf. Process. Syst. 2001, 14, 585–591.
  28. Ahmed, A.; Shervashidze, N.; Narayanamurthy, S.; Josifovski, V.; Smola, A.J. Distributed large-scale natural graph factorization. In Proceedings of the 22nd International Conference on World Wide Web, Rio de Janeiro, Brazil, 13–17 May 2013; pp. 37–48.
  29. Perozzi, B.; Al-Rfou, R.; Skiena, S. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 701–710.
  30. Grover, A.; Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 855–864.
  31. Wang, D.; Cui, P.; Zhu, W. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1225–1234.
  32. Goyal, P.; Ferrara, E. Graph embedding techniques, applications, and performance: A survey. Knowl.-Based Syst. 2018, 151, 78–94.
  33. Medsker, L.R.; Jain, L.C. (Eds.) Recurrent Neural Networks: Design and Applications; CRC Press: Boca Raton, FL, USA, 2001.
  34. Deng, A.; Hooi, B. Graph neural network-based anomaly detection in multivariate time series. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; Volume 35, pp. 4027–4035.
  35. Peng, Z.; Luo, M.; Li, J.; Xue, L.; Zheng, Q. A deep multi-view framework for anomaly detection on attributed networks. IEEE Trans. Knowl. Data Eng. 2020, 34, 2539–2552.
  36. Hou, B.; Yang, J.; Wang, P.; Yan, R. LSTM-based auto-encoder model for ECG arrhythmias classification. IEEE Trans. Instrum. Meas. 2019, 69, 1232–1240.
  37. Ding, K.; Li, J.; Bhanushali, R.; Liu, H. Deep anomaly detection on attributed networks. In Proceedings of the 2019 SIAM International Conference on Data Mining, SIAM, Calgary, AB, Canada, 2–4 May 2019; pp. 594–602.
  38. Zhao, H.; Wang, Y.; Duan, J.; Huang, C.; Cao, D.; Tong, Y.; Xu, B.; Bai, J.; Tong, J.; Zhang, Q. Multivariate time-series anomaly detection via graph attention network. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 17–20 November 2020; pp. 841–850.
Figure 1. The framework of AddAG-AE. $\varphi(Z)$ denotes Equations (5) and (6). (a) Overview of AddAG-AE. (b) The details of GAT-AE.
Figure 2. Performance comparison of AddAG-AE. (a) Loss of AddAG-AE on three datasets. (b) AUC scores of different approaches on three datasets.
Figure 3. Parameter sensitivity. (a) GAT embedding dimension. (b) Tradeoff parameter $\alpha$. (c) AUC score on NYC Taxi Trips with different parameters.
Table 1. Benefits and limitations of related work.

| Methods | Benefits | Limitations |
| --- | --- | --- |
| StreamSpot [10] | Broad adaptability to graph types; fast processing speed | Easily affected by noise; disregards contextual relationships |
| Spotlight [9] | Simple, fast, and scalable | Only suitable for specific graph types |
| Snapsketch [23] | Strong feature representation; real-time anomaly detection | Only suitable for specific datasets; susceptible to experience |
| GmapAD [24] | Full structural and attribute information | Vulnerable to data size; high computational complexity |
| AddGraph [22] | Needs few labeled data; spatiotemporal information | Highly sensitive to noise |
| DeepSphere [25] | No manual labeling required | Not suitable for sparse graphs |
| Factorization [26,27,28] | High processing efficiency; capable of large-scale graphs | Sensitive to network topology structures |
| Random walk [29,30] | Suitable for few samples and missing data | Not suitable for large or highly heterogeneous graphs |
| Deep learning [31] | Robustness on sparse networks | High computational complexity; easily affected by hyperparameters |
Table 2. Notations.

| Notation | Description |
| --- | --- |
| $\mathcal{G}$ | Dynamic attributed graph stream |
| $\nu^t$ | A set of nodes ($t$ denotes timestamp $t$) |
| $\varepsilon^t$ | A set of edges |
| $\chi^t$ | A set of node attributes |
| $G(t)$ | A snapshot |
| $N$ | The number of nodes |
| $M$ | The number of edges |
| $d$ | The dimension of the node attributes |
| $V^t$ | A set of nodes |
| $A^t \in \mathbb{R}^{N \times N}$ | An adjacency matrix |
| $X^t \in \mathbb{R}^{N \times d}$ | An attribute matrix ($X_i^t$ is the attribute of the $i$-th node) |
| $Z_i^t \in \mathbb{R}^{1 \times d'}$ | Latent representation of node $i$ after the GAT layer |
| $\hat{A}^t \in \mathbb{R}^{N \times N}$ | Reconstructed adjacency matrix |
| $\hat{X}^t \in \mathbb{R}^{N \times d}$ | Reconstructed attribute matrix after GAT decoding |
| $h_{en}^t \in \mathbb{R}^{1 \times d''}$ | Latent representation of node $i$ after the LSTM layer |
| $L$ | Loss function |
| $S(V_i)$ | Anomaly score of node $i$ |
Table 3. Sample data of Enron Mail.

| Datetime | From | To |
| --- | --- | --- |
| 1-Jan-2000 | [email protected] | [email protected] |
| 1-Jan-2000 | [email protected] | [email protected] |
| 31-Dec-2002 | [email protected] | [email protected] |
Table 4. Sample data of NYC Taxi Trips.

| Pickup_Datetime | Dropoff_Datetime | Pickup_Longitude | Pickup_Latitude | Dropoff_Longitude | Dropoff_Latitude |
| --- | --- | --- | --- | --- | --- |
| 1-Oct-2015 17:24 | 1-Oct-2015 17:32 | −73.98215485 | 40.76793671 | −73.96463013 | 40.76560211 |
| 1-Oct-2015 0:43 | 1-Oct-2015 0:54 | −73.98041534 | 40.73856354 | −73.9994812 | 40.73115158 |
| 31-Jan-2016 18:14 | 31-Jan-2016 18:20 | −74.00785065 | 40.72357178 | −74.00765228 | 40.74083328 |
Table 5. Sample data of IDS 2017.

| Datetime | SIP | DIP |
| --- | --- | --- |
| 5-Jul-2017 8:42 | 192.168.10.50 | 192.168.10.3 |
| 5-Jul-2017 8:43 | 192.168.10.19 | 129.6.15.29 |
| 5-Jul-2017 14:43 | 192.168.10.14 | 192.168.10.19 |
Table 6. Statistics of the real-world datasets used.

| Dataset | #v | #ε |
| --- | --- | --- |
| NYC Taxi Trips | 56 | 32,675,244 |
| Enron Mail | 147 | 62,479 |
| IDS 2017 | 1323 | 620,652 |
Table 7. AddAG-AE evaluation results. Columns are grouped by dataset (NYC Taxi Trips, IDS 2017, Enron Mail); bold marks the best result per column.

| Method | NYC Prec. | NYC Rec. | NYC AUC | IDS Prec. | IDS Rec. | IDS AUC | Enron Prec. | Enron Rec. | Enron AUC |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| Spotlight | 0.6986 | 0.7007 | 0.6757 | 0.7954 | 0.6862 | 0.7353 | 0.3427 | 0.4885 | 0.3633 |
| LSTM-AE | 0.5802 | 0.5734 | 0.5743 | 0.3203 | 0.4322 | 0.3036 | 0.4003 | 0.4403 | 0.4927 |
| DeepSphere | 0.9233 | 0.7424 | 0.9008 | 0.4722 | 0.5383 | 0.4443 | 0.6074 | 0.5560 | 0.6228 |
| AnomalyDAE | 0.8763 | **0.7565** | 0.8876 | 0.7032 | 0.5716 | 0.6827 | 0.5058 | 0.5845 | 0.5610 |
| Dominant | 0.6934 | 0.6577 | 0.6956 | 0.6286 | 0.6373 | 0.6367 | 0.6292 | 0.6409 | **0.6544** |
| GmapAD | 0.9312 | 0.7532 | 0.9201 | 0.7136 | 0.6704 | 0.6803 | 0.6213 | 0.6081 | 0.6451 |
| AddAG-AE | **0.9807** | 0.7143 | **0.9459** | **0.8160** | **0.7125** | **0.7521** | **0.6385** | **0.6551** | 0.6521 |
Table 8. AUC scores of the ablation experiment on three datasets.

| Methods | NYC Taxi Trips | IDS 2017 | Enron Mail |
| --- | --- | --- | --- |
| AddAG-AE without 2D-MLP | 0.9355 | 0.7368 | 0.6256 |
| AddAG-AE without SA-decoding | 0.9369 | 0.7413 | 0.6325 |
| AddAG-AE | 0.9459 | 0.7521 | 0.6521 |