Article

AHD-SLE: Anomalous Hyperedge Detection on Hypergraph Symmetric Line Expansion

by Yingle Li 1,2, Hongtao Yu 2, Haitao Li 1,2, Fei Pan 2 and Shuxin Liu 2,*

1 Information Technology Research Institute, People’s Liberation Army Strategic Support Force Information Engineering University, Zhengzhou 450001, China
2 National Digital Switching System Engineering and Technological R&D Center, Zhengzhou 450001, China
* Author to whom correspondence should be addressed.
Axioms 2024, 13(6), 387; https://doi.org/10.3390/axioms13060387
Submission received: 29 March 2024 / Revised: 28 May 2024 / Accepted: 5 June 2024 / Published: 7 June 2024
(This article belongs to the Special Issue Mathematical Modelling of Complex Systems)

Abstract

Graph anomaly detection aims to identify unusual patterns or structures in graph-structured data. Most existing research focuses on anomalous nodes in ordinary graphs with pairwise relationships. However, complex real-world systems often involve relationships that go beyond pairwise ones, and insufficient attention has been paid to hypergraph anomaly detection, especially anomalous hyperedge detection. Some existing methods transform hypergraphs into ordinary graphs for learning, which can result in poor detection performance due to the loss of higher-order information. We propose a new method for Anomalous Hyperedge Detection on Symmetric Line Expansion (AHD-SLE). The SLE of a hypergraph is an ordinary graph with pairwise relationships that can be backmapped to the hypergraph, so the SLE is able to preserve the higher-order information of the hypergraph. AHD-SLE first maps the hypergraph to its SLE; then, information is aggregated by Graph Convolutional Networks (GCNs) on the SLE. After that, the hyperedge embedding representation is obtained through a backmapping operation. Finally, an anomaly function is designed to detect anomalous hyperedges using the hyperedge embedding representation. Experiments on five different types of real hypergraph datasets show that AHD-SLE outperforms the baseline algorithms in terms of the Area Under the receiver operating characteristic Curve (AUC) and Recall metrics.

1. Introduction

Graphs can effectively describe the interaction relationships in complex networks, and graph anomaly detection has been widely studied and applied in various fields, especially cyberspace security. Graph structures can effectively model network attacks [1], network fraud [2], network spam [3], and other malicious network behaviors, and these behaviors and the entities involved can be effectively identified using graph anomaly detection methods. However, as the exploration of the real world has deepened, it has been found that many real complex networks contain not only pairwise interactions between two individuals but also interactions beyond pairs, in which more than two individuals interact simultaneously (or in a specific order), i.e., higher-order interactions. For example, multiple researchers work together to publish academic articles in a coauthorship network [4], multiple metabolites work together to accomplish metabolic reactions in biological systems [5], multiple recipients can participate in a single email interaction in a mail network [6], and community activities in social networks are accomplished jointly through the interaction of multiple individuals [7]. Such higher-order interactions among multiple individuals are difficult to describe well with simple pairwise networks. Suppose A, B, and C collaborated on a paper; their coauthorship can be represented by a graph G with three nodes (A, B, and C) and three edges (A–B, A–C, and B–C). The graph G can describe the cooperation between A and B, A and C, and B and C, but it cannot represent the joint cooperation among A, B, and C. This means that information is lost when graphs are used to represent higher-order interactions. To better represent beyond-pairwise interactions, researchers have begun to model them with hypergraphs, which can better portray such complex systems. A hypergraph is composed of hyperedges, each of which can connect any number of nodes and can thus effectively describe the interaction relationships among multiple nodes [8,9,10,11,12].
Compared to an ordinary graph with pairwise relationships, the biggest difference of a hypergraph is that each hyperedge can contain an arbitrary number of nodes. The node feature aggregation methods commonly used in traditional graph anomaly detection can only deal with pairwise relations and cannot be directly used to aggregate the information of an arbitrary number of nodes in a hyperedge. Therefore, how to aggregate the features of any number of nodes in a hyperedge to learn the hyperedge representation is one of the biggest challenges in hypergraph learning [13]. In order to apply traditional graph learning methods to hypergraphs, some studies have developed hypergraph expansion methods that map hypergraphs into pairwise graphs on which features can be aggregated. There are three main hypergraph expansion methods: star expansion [14], clique expansion [15], and line expansion [16] (Figure 1). Star expansion converts each hyperedge into a hypernode, and all nodes in the hyperedge are connected to this hypernode to form a star structure centered on it; the resulting graph is bipartite, with heterogeneous node types. Clique expansion converts each hyperedge into a fully connected subgraph in which all nodes of the hyperedge are connected with each other; the clique expansion graph loses the higher-order information of the hypergraph. The line expansion graph, also known as the dual hypergraph, converts each hyperedge into a node; if two hyperedges share a node, there is an edge between the corresponding line graph nodes. Like clique expansion, line expansion loses higher-order information. The main reason for the higher-order information loss of hypergraph expansion is that ordinary graphs can only represent pairwise relationships, not relationships that go beyond pairs.
In order to overcome the shortcomings of existing hypergraph expansion methods, Yang et al. [17] proposed a new hypergraph expansion method based on the topological structure of hypergraphs. This expansion targets the special symmetric structure of hypergraphs, i.e., a hyperedge can contain multiple nodes, and a node can be contained in multiple hyperedges; it treats nodes and hyperedges as equal and is able to preserve the higher-order information. In this paper, we call this method Symmetric Line Expansion (SLE for short) to distinguish it from the previous line expansion. Unlike [17], which uses SLE for hypergraph node classification, we use SLE for anomalous hyperedge detection. We first construct a novel hyperedge representation learning framework, and we then propose an Anomalous Hyperedge Detection method on hypergraph Symmetric Line Expansion (AHD-SLE for short). AHD-SLE performs information aggregation in the expansion graph and learns hyperedge representations for anomalous hyperedge detection. Experiments on five different types of datasets show that AHD-SLE has clear performance advantages over the baseline algorithms in AUC and Recall metrics.
The main contributions of this study are summarized below:
(1)
We construct a novel hyperedge representation learning framework based on SLE graphs, which can preserve the higher-order information of hypergraphs.
(2)
We propose an Anomalous Hyperedge Detection method on hypergraph Symmetric Line Expansion (AHD-SLE). AHD-SLE uses the hyperedge embedding representations learned through SLE to detect anomalous hyperedges.
(3)
Experiments on five different types of real-world hypergraph datasets show that AHD-SLE has clear performance advantages over the baseline algorithms in AUC and Recall metrics.
The remainder of the paper is structured as follows: In Section 2, we introduce the related work, including anomaly (hyper)edge detection and hypergraph representation learning. In Section 3, we introduce some preliminaries. In Section 4, we describe our method for anomalous hyperedge detection. The experimental results are analyzed in Section 5. Finally, the conclusions are presented in Section 6.

2. Related Works

2.1. Anomaly Edge and Hyperedge Detection

Anomaly detection is a type of data mining that aims to identify anomalous patterns that deviate from the majority in a dataset. Graph anomaly detection is a specific type of anomaly detection for graph-structured data that aims to identify anomalous graph objects, such as nodes, edges, and subgraphs, by analyzing the structural and attribute information of the graph data. Most current research in graph anomaly detection focuses on anomalous node detection, including traditional methods (such as OddBall [18] and Fraudar [19]), network representation learning-based methods (such as NetWalk [20] and DeepSphere [21]), and GNN (Graph Neural Network)-based representation learning methods (such as DOMINANT [22] and ALARM [23]). Anomalous edge detection focuses on the relationships between entities, where the similarity between the nodes of an edge is typically used to measure the degree of edge anomaly: the lower the similarity, the more likely the edge is anomalous. Node similarity measurement methods mainly include heuristic methods and graph representation learning-based methods. Heuristic methods use expert experience or statistical laws to design node similarity metrics, such as CN, AA, RA, Katz, and Jaccard [16]; their performance is heavily influenced by the quality of the metric design, which can lead to poor generalization. Network representation learning-based approaches map nodes to low-dimensional vectors, and the vector distance is used to measure node similarity; these include random walk-based algorithms like DeepWalk [24] and Node2Vec [25], as well as neighborhood similarity-based algorithms like LINE [26]. These algorithms embed only the network topology into the node vectors and have limited effectiveness, as they do not use node attributes. In contrast, GNN-based representation learning methods such as GCN, GAT, and GraphSAGE [27] use both the node attributes and the network structure to learn node representations for computing node similarity, which yields better performance than topology-only methods. Beyond these node embedding-based methods, there are other approaches: ICANE [28] directly performs edge representation learning for anomalous edge detection, and BOURNE [29] constructs discrimination pairs for both target nodes and target edges and employs self-supervised learning to compare these pairs and derive anomaly scores, which can efficiently detect anomalous nodes and edges simultaneously.
However, current anomalous edge detection methods only consider pairwise interactions and cannot be directly applied to anomalous hyperedge detection. To address this issue, two main approaches have been proposed. The first extends the similarity metrics used in ordinary graphs to hypergraphs, such as HCN [30], HKatz [30], and HRA [31]. The other converts the hypergraph into an ordinary graph and then performs graph representation learning on that graph using node attributes or topology to obtain low-dimensional node embeddings; such methods include HGNN [32], HyperSAGE [33], Hyper-Atten [34], HyperGCN [35], and other hypergraph representation learning methods. However, the conversion may cause information loss, which can affect the final detection performance. Moreover, these hypergraph representation learning-based methods are not specifically designed for the anomalous hyperedge detection task. There are, however, some methods designed specifically for it. The authors of LSH [36] designed similarity metrics between hyperedges in a hyperedge stream using locality-sensitive hashing and regarded hyperedges significantly different from previous ones as anomalous. The authors of HashNWalk [37] designed similarity metrics based on a hash function and random walks for detecting two types of anomalies in hyperedge data streams: unexpected hyperedges consisting of unnatural combinations of nodes and bursty hyperedges that suddenly appear within a short period of time. The authors of ARCHER [38] developed random walk with restart (RWR) on hypergraphs for hypergraph embedded representation and then designed a hyperedge normality score based on the relevance scores provided by hypergraph RWR for anomalous hyperedge detection. The authors of AHIN [39] addressed the problem of abnormal event detection in attributed heterogeneous information networks (AHINs) by modeling the AHINs as hypergraphs, with the interactions (i.e., events) of an entity collection as hyperedges; a hypergraph contrastive learning method was then proposed to capture abnormal event patterns. A summary of these methods is provided in Table 1.

2.2. Hypergraph Representation Learning

Hypergraphs are an extension of traditional pairwise relational graphs, and a traditional pairwise graph can be regarded as a special case of a hypergraph. Some existing research methods convert hypergraphs into graphs to simplify problems on hypergraphs into traditional graph learning problems [40], and they then apply heuristic algorithms or GNNs to solve those problems. As mentioned above, hypergraph expansion methods include star expansion, clique expansion, line expansion, and some of their variants. A summary of hypergraph representation learning methods is provided in Table 2. Unfortunately, most of these methods learn node embeddings for downstream node classification tasks, and we have not found methods that learn hyperedge embeddings for anomaly detection tasks.
Star expansion is convenient for information propagation among the nodes and hyperedges of a hypergraph. HyperGT [41] implements node-to-node, node-to-hyperedge, and hyperedge-to-hyperedge information propagation on star expansion graphs, which effectively integrates global interactions while preserving the local connectivity pattern and realizes a more comprehensive learning of hypergraph representations. HyperSAGE [33] designs two levels of intra-hyperedge and inter-hyperedge aggregation functions to achieve accurate and effective hypergraph information propagation. The authors of UniGNN [42] improved the message passing process of HyperSAGE by defining the update process of the GNN as a two-stage aggregation process, which makes the GNN generalize to hypergraphs more naturally. FamilySet [43] develops a message passing framework on star expansion graphs that can synchronously update the embedded representations of the nodes and observed hyperedges. The authors of HyperSaR [44] designed a two-step hypergraph convolution operation that first propagates node information to the containing hyperedge to obtain the hyperedge embedding, then propagates the hyperedge embedding back to the nodes to obtain a new node embedding, and finally obtains a smooth node embedding based on the neighborhood information.
Clique expansion is convenient for information interaction between pairs of nodes in the hypergraph. The HGNN [32] defines a hypergraph convolution operation on a weighted clique expansion graph, successfully migrating the GNN to hypergraphs and implementing the aggregation and learning of the topology and node features of hypergraphs. To address the problem of an imperfect initial hypergraph structure, DHGNN [45] adds a hypergraph construction module on top of the HGNN to reconstruct the hypergraph structure after each hypergraph convolution operation. Hyper-Atten [34] introduces an attention mechanism based on the HGNN, which assigns different attention weights to different nodes in the information aggregation process and improves the performance of hypergraph learning. To overcome the computational complexity caused by too many edges in the clique expansion graph, HyperGCN [35] simplifies the clique expansion graph by sampling; the hyperedges degenerate into weighted edges, with weights proportional to the maximum distances among the hypergraph nodes, and graph convolution is finally performed on this simplified clique expansion graph to aggregate information. However, sampling leads to the loss of node and topology information and affects learning performance. The HI-GCN [46] proposes a graph convolution operation that fuses information from a hypergraph and its clique expansion graph, which can learn the structural information and higher-order interactions more efficiently.
Line expansion is convenient for information interaction between hyperedges in the hypergraph. The LHCN [16] first proposed the concept of a line graph for hypergraphs; it maps a hypergraph to a weighted, attributed line graph and performs a graph convolution operation on that line graph. The computational cost of the LHCN is high due to the explicit computation of line graphs. To reduce this cost, the authors of HAIN [47] proposed implicit line graph generation and introduced a self-attention mechanism to measure the vertex weights of the line graph, which improves its efficiency and expressiveness. The authors of LEGCN [17] fused the ideas of clique expansion and line expansion and proposed a new line expansion graph that is bijective with the original hypergraph, which allows it to preserve the higher-order information in the hypergraph well; it performs well in the node classification task.

3. Preliminaries

3.1. Problem Statement

Definition 1
(Hypergraph). A hypergraph is an extension of the graph consisting of a collection of hyperedges, each of which can contain any number of nodes. A hypergraph can be defined as $G = (V, E)$, where $V = \{v_1, v_2, \ldots, v_m\}$ denotes the set of hypergraph nodes, $E = \{e_1, e_2, \ldots, e_n\}$ denotes the set of hyperedges, and a hyperedge $e_i = \{v_{i_1}, v_{i_2}, \ldots, v_{i_k}\}$ can contain any number of nodes.
Definition 2
(Hypergraph node degree). The degree of a hypergraph node is defined as the number of hyperedges that contain the node and is usually denoted by $D_v$.
Definition 3
(Hyperedge degree). The hyperedge degree is defined as the number of nodes contained in the hyperedge and is usually denoted by $D_e = |e|$. If all hyperedges have the same degree $k$, the hypergraph is called a $k$-uniform hypergraph. It is easy to see that an ordinary graph is a 2-uniform hypergraph.
Definition 4
(Incidence matrix). Hypergraphs are usually represented by an incidence matrix $H \in \{0,1\}^{|V| \times |E|}$, where each row represents a node and each column represents a hyperedge. The elements of the matrix are defined as
$$h_{ij} = \begin{cases} 1 & v_i \in e_j, \\ 0 & v_i \notin e_j. \end{cases}$$
Definition 5
(Adjacency matrix). Similar to an ordinary graph, the adjacency matrix of a hypergraph represents the neighborhood relation between nodes. It is often defined as $A = HH^{T} - D_v \in \{0,1\}^{|V| \times |V|}$, where $D_v$ is the diagonal matrix of node degrees. The elements of the matrix are defined as
$$a_{ij} = \begin{cases} 1 & v_i \text{ and } v_j \text{ share a common hyperedge}, \\ 0 & \text{otherwise}. \end{cases}$$
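To make Definitions 4 and 5 concrete before moving on, the following is a minimal sketch, assuming NumPy and a toy hypergraph; the variable names are illustrative and not taken from the paper's released code.

```python
import numpy as np

# Toy hypergraph: 4 nodes, 2 hyperedges (e1 = {v1, v2, v3}, e2 = {v2, v4})
nodes = ["v1", "v2", "v3", "v4"]
hyperedges = [{"v1", "v2", "v3"}, {"v2", "v4"}]

# Incidence matrix H in {0,1}^{|V| x |E|} (Definition 4)
H = np.zeros((len(nodes), len(hyperedges)), dtype=int)
for j, e in enumerate(hyperedges):
    for i, v in enumerate(nodes):
        H[i, j] = int(v in e)

# Adjacency matrix A = H H^T - D_v (Definition 5): off-diagonal entries count
# shared hyperedges; binarize so a_ij = 1 iff v_i and v_j share a hyperedge.
D_v = np.diag(H.sum(axis=1))          # node degrees (Definition 2)
A = ((H @ H.T - D_v) > 0).astype(int)
```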
Definition 6
(Line Expansion). Assume $G^l = (V^l, E^l)$ is the line expansion of hypergraph $G = (V, E)$, in which the roles of nodes and hyperedges are exchanged. In line expansion, each hyperedge is mapped to a node, i.e., $e \in E \mapsto v^l \in V^l$, so $|V^l| = |E|$; two line expansion nodes are connected when the corresponding hyperedges share at least one node, i.e., $E^l = \{(v^l_{e_p}, v^l_{e_q}) \mid e_p, e_q \in E,\ e_p \cap e_q \neq \varnothing\}$. An example of line expansion is shown in Figure 1d.
Definition 7
(Anomalous Hyperedge Detection). For a given hypergraph $G = (V, X, E)$, $V$ represents the node set, $X \in \mathbb{R}^{m \times d}$ represents the feature matrix of the nodes, and $E$ represents the known (normal) hyperedge set. The goal of anomalous hyperedge detection is to learn an anomaly scoring function $I(e_i) = y_i$ such that the score of an anomalous hyperedge, $I(e \mid e \notin E)$, is greater than the score of a normal hyperedge, $I(e \mid e \in E)$.

3.2. Symmetric Line Expansion

The hypergraph nodes and hyperedges form a special symmetric structure: each node can be connected to multiple hyperedges, and each hyperedge can conversely be connected to multiple nodes. Motivated by this symmetric structure, the SLE graph treats nodes and hyperedges equally and maps them to a pairwise relationship graph. The mapping mainly consists of two parts (Figure 2):
  • Each incident node–hyperedge pair $(v_i, e_j)$, with $v_i \in e_j$, in the hypergraph is mapped to a “line node” in the SLE.
  • “Line nodes” are connected if they have the same node or hyperedge.
The formal description of SLE is as follows:
Let $G^l = (V^l, E^l)$ denote the SLE graph of hypergraph $G_H = (V, E)$. The node set $V^l$ of $G^l$ is composed of the incident node–hyperedge pairs $\{(v, e) \mid v \in e, v \in V, e \in E\}$ of $G_H$. The edge set $E^l$ and adjacency matrix $A^l$ are defined by the pairwise relation $A^l(v^l, u^l) = 1$ for $v^l = (v_a, e_b) \in V^l$ and $u^l = (v_c, e_d) \in V^l$ if either $v_a = v_c$ or $e_b = e_d$. A small construction sketch is given at the end of this subsection.
From the above description, it can be observed that SLE has the following characteristics:
  • The incident node–hyperedge pair is bijective with the “line node” of the SLE. As a result, the hypergraph can also be mapped back from its corresponding SLE without the loss of higher-order information. That is, the SLE can preserve the higher-order information of the hypergraph.
  • In the SLE, the nodes are isomorphic, while the edges are heterogeneous. There are two types of edges: edges formed by common nodes, and edges formed by common hyperedges. Due to the previously mentioned symmetric structure of the hypergraph, these two types of edges can be considered isomorphic for information aggregation.
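The following is a small sketch of the SLE construction just described, assuming NumPy; the function name and the representation of line nodes as (node, hyperedge-index) tuples are illustrative assumptions.

```python
import numpy as np

def build_sle(hyperedges):
    # Enumerate the incident node-hyperedge pairs: one "line node" each.
    line_nodes = [(v, j) for j, e in enumerate(hyperedges) for v in sorted(e)]
    n = len(line_nodes)
    A_v = np.zeros((n, n), dtype=int)  # edges formed by a common node
    A_e = np.zeros((n, n), dtype=int)  # edges formed by a common hyperedge
    for p in range(n):
        for q in range(p + 1, n):
            (v1, e1), (v2, e2) = line_nodes[p], line_nodes[q]
            if v1 == v2:               # same node part
                A_v[p, q] = A_v[q, p] = 1
            if e1 == e2:               # same hyperedge part
                A_e[p, q] = A_e[q, p] = 1
    return line_nodes, A_v, A_e

# The two edge types are kept separate so they can be weighted differently
# during information aggregation (Section 4.2).
line_nodes, A_v, A_e = build_sle([{"v1", "v2", "v3"}, {"v2", "v4"}])
```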

4. Method

We propose an Anomalous Hyperedge Detection method on hypergraph Symmetric Line Expansion (AHD-SLE). The AHD-SLE is composed of three parts: hypergraph expansion, hyperedge representation learning, and hyperedge anomaly scoring (Figure 3).
The basic process of AHD-SLE is as follows:
(1)
Map the hypergraph into its SLE graph and construct the feature mapping matrix; then, calculate the node feature matrix of SLE using the node feature mapping matrix;
(2)
Feed the SLE’s feature matrix and adjacency matrix into the GCN network for node feature aggregation, and then calculate the hyperedge embedding vector using the hyperedge backmapping matrix;
(3)
Design the anomaly scoring function to calculate the anomaly score for each hyperedge embedding vector; hyperedges with higher scores are considered anomalous.

4.1. Hypergraph Expansion

Because of the bijection between node–hyperedge pairs and the nodes of the SLE, their features can also be mapped to each other. There are four mapping operations between hypergraph nodes, hyperedges, and SLE nodes; specific definitions are given below, and an illustrative sketch follows the list.
  • Node map matrix: This maps the hypergraph node features to SLE nodes. If the node part of the SLE node $v^l$ is $v$, the features of $v$ are directly used as the features of $v^l$. The definition is as follows:
    $$P_V(v^l, v) = \begin{cases} 1 & \text{if the node part of } v^l \text{ is } v, \\ 0 & \text{otherwise}. \end{cases}$$
    Then, the SLE node feature matrix can be calculated from the hypergraph node features by $X^l = P_V X_V$.
  • Node backmap matrix: This maps the SLE node features back to hypergraph nodes. Since a hypergraph node $v$ may be converted to multiple SLE nodes, all SLE nodes whose node part is $v$ need to be aggregated in the backmapping. Considering that each such SLE node also contains the hyperedge part in which $v$ is located, the inverse of the hyperedge degree is used as the aggregation weight:
    $$\bar{P}_V(v, v^l) = \begin{cases} \dfrac{1/\delta(e)}{\sum_{(v, e') \in V^l} 1/\delta(e')} & \text{if the node part of } v^l = (v, e) \text{ is } v, \\ 0 & \text{otherwise}, \end{cases}$$
    where $\delta(e)$ is the degree of hyperedge $e$. The feature matrix of the hypergraph nodes can then be calculated by $X_V = \bar{P}_V X^l$.
  • Hyperedge map matrix: This maps the hyperedge features to SLE nodes. If the hyperedge part of the SLE node $v^l$ is $e$, the features of $e$ are directly used as the features of $v^l$:
    $$P_E(v^l, e) = \begin{cases} 1 & \text{if the hyperedge part of } v^l \text{ is } e, \\ 0 & \text{otherwise}. \end{cases}$$
    Then, the SLE node feature matrix can be calculated from the hyperedge features by $X^l = P_E X_E$.
  • Hyperedge backmap matrix: This maps the SLE node features back to hyperedges. Similar to the node backmapping, a hyperedge $e$ may be mapped to multiple SLE nodes, and the inverse of the hypergraph node degree is used as the feature aggregation weight:
    $$\bar{P}_E(e, v^l) = \begin{cases} \dfrac{1/\delta(v)}{\sum_{(v', e) \in V^l} 1/\delta(v')} & \text{if the hyperedge part of } v^l = (v, e) \text{ is } e, \\ 0 & \text{otherwise}, \end{cases}$$
    where $\delta(v)$ is the degree of hypergraph node $v$. The feature matrix of the hyperedges can then be calculated by $X_E = \bar{P}_E X^l$.
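As a sketch of the four mapping matrices above (assuming NumPy and the (node, hyperedge-index) line-node representation from the earlier SLE sketch; $P_V$, $P_E$ map features onto the SLE, and $\bar{P}_V$, $\bar{P}_E$ map them back with inverse-degree weights):

```python
import numpy as np

def mapping_matrices(line_nodes, nodes, hyperedges):
    n_l, n_v, n_e = len(line_nodes), len(nodes), len(hyperedges)
    node_idx = {v: i for i, v in enumerate(nodes)}

    P_V = np.zeros((n_l, n_v))   # hypergraph node features -> SLE nodes
    P_E = np.zeros((n_l, n_e))   # hyperedge features       -> SLE nodes
    for k, (v, j) in enumerate(line_nodes):
        P_V[k, node_idx[v]] = 1.0
        P_E[k, j] = 1.0

    delta_e = P_E.sum(axis=0)    # hyperedge degrees delta(e)
    delta_v = P_V.sum(axis=0)    # node degrees delta(v)

    # Backmaps: weight each line node by the inverse degree of its *other*
    # part, then normalize each row so the weights sum to one.
    w_e = np.array([1.0 / delta_e[j] for _, j in line_nodes])
    B_V = (P_V * w_e[:, None]).T
    B_V /= B_V.sum(axis=1, keepdims=True)

    w_v = np.array([1.0 / delta_v[node_idx[v]] for v, _ in line_nodes])
    B_E = (P_E * w_v[:, None]).T
    B_E /= B_E.sum(axis=1, keepdims=True)
    return P_V, P_E, B_V, B_E
```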

4.2. Hyperedge Representation Learning

The mapping matrices in the previous section realize the mapping and backmapping between the hypergraph nodes, the hyperedges, and the SLE nodes. Given a hypergraph $G = (V, X, E)$, where $X_V \in \mathbb{R}^{|V| \times d}$ represents the feature matrix of the nodes, let $G^l = (V^l, X^l, E^l)$ be the corresponding SLE with mapping matrices $P_V$, $\bar{P}_V$, $P_E$, and $\bar{P}_E$. Using these matrices, the hyperedge embedding representation can be obtained by the following steps:
(1)
Node feature mapping: Map the node features of the hypergraph to the node features of the SLE:
$$H^{(0)} = P_V X_V \in \mathbb{R}^{|V^l| \times d_i}$$
where $H^{(0)}$ represents the initial node features of the SLE, and $d_i$ represents the dimension of $H^{(0)}$.
(2)
Node information aggregation: In SLE, there are two types of neighbors: those based on common nodes and those based on common hyperedges. In order to better aggregate information, different aggregation weights are assigned to these two different types of neighbors. Graph convolution operations are then used to aggregate neighborhood features.
$$h_{(v,e)}^{(k+1)} = \sigma\Big( \sum_{e':\, (v, e') \in V^l} w_v\, h_{(v, e')}^{(k)} \theta^{(k)} + \sum_{v':\, (v', e) \in V^l} w_e\, h_{(v', e)}^{(k)} \theta^{(k)} \Big)$$
where $h_{(v,e)}^{(k)}$ represents the $k$th-layer feature vector of the SLE node $(v, e)$, $\sigma(\cdot)$ is a nonlinear activation function, such as ReLU or LeakyReLU, and $\theta^{(k)}$ is the weight parameter of the $k$th layer, which is learned during training. $w_v$ and $w_e$ are two hyperparameters representing the information aggregation weights of common node-based neighbors and common hyperedge-based neighbors, respectively. Folding $w_v$ and $w_e$ into the adjacency matrix gives
$$A^l = w_v A^l_v + w_e A^l_e = \begin{cases} w_v & \text{common node-based edges}, \\ w_e & \text{common hyperedge-based edges}, \\ 0 & \text{otherwise}. \end{cases}$$
By adding two orders of self-loops for the nodes in the information aggregation process, i.e., $\tilde{A}^l = 2I + A^l$, the matrix representation of the convolution is
$$H^{(k+1)} = \sigma\big( \tilde{D}_l^{-\frac{1}{2}} \tilde{A}^l \tilde{D}_l^{-\frac{1}{2}} H^{(k)} \Theta^{(k)} \big)$$
where $\sigma(\cdot)$ is a nonlinear activation function, and $\Theta^{(k)}$ is the weight parameter matrix to be learned for the $k$th layer.
(3)
Hyperedge feature backmapping: Backmap the line node features to the hyperedges using $\bar{P}_E$:
$$X_E = \bar{P}_E H^{(K)} \in \mathbb{R}^{|E| \times d_o}$$
where $X_E$ is the hyperedge feature matrix, $H^{(K)}$ is the feature matrix of the SLE nodes after $K$ layers, and $d_o$ denotes the dimension of $H^{(K)}$. A PyTorch sketch of one convolution layer of this pipeline follows.
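The sketch below is a minimal PyTorch rendering of such a layer, under stated assumptions: A_v and A_e are the two SLE adjacency matrices from the earlier sketches, dense tensors are used for clarity, and the class name is illustrative.

```python
import torch
import torch.nn as nn

class SLEConv(nn.Module):
    """One SLE graph convolution layer with type-weighted adjacency."""

    def __init__(self, in_dim, out_dim, w_v=0.5, w_e=1.5):
        super().__init__()
        self.theta = nn.Linear(in_dim, out_dim, bias=False)  # Theta^(k)
        self.w_v, self.w_e = w_v, w_e

    def forward(self, H, A_v, A_e):
        A = self.w_v * A_v + self.w_e * A_e          # weighted SLE adjacency
        A_tilde = A + 2.0 * torch.eye(A.size(0))     # two orders of self-loops
        d_inv_sqrt = torch.diag(A_tilde.sum(dim=1).pow(-0.5))
        return torch.relu(d_inv_sqrt @ A_tilde @ d_inv_sqrt @ self.theta(H))

# Pipeline sketch: H0 = P_V @ X_V, stack SLEConv layers, then X_E = B_E @ H_K.
```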

4.3. Hyperedge Anomaly Score

After obtaining the hyperedge embedding representation, anomalous hyperedge detection is treated as a binary classification problem, and an MLP is used to compute the anomaly score:
$$I_e = \sigma\Big( \sum_{i=1}^{d_o} w_i x_i + b \Big)$$
where $\sigma(\cdot)$ is a nonlinear activation function, such as ReLU or LeakyReLU, $d_o$ is the dimension of the hyperedge feature vector $x$, $w_i$ is the weight of the $i$th feature, and $b$ is the bias; $w_i$ and $b$ are learned during training.
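A sketch of this scoring head, assuming PyTorch and the hyperedge embeddings X_E from the previous step (the class name is illustrative):

```python
import torch
import torch.nn as nn

class HyperedgeScorer(nn.Module):
    """Anomaly score: a single linear layer plus a nonlinearity."""

    def __init__(self, d_o):
        super().__init__()
        self.linear = nn.Linear(d_o, 1)          # weights w_i and bias b

    def forward(self, X_E):                      # X_E: |E| x d_o embeddings
        return torch.relu(self.linear(X_E)).squeeze(-1)  # one score per hyperedge
```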

4.4. Loss Function

The model training objective is to make the anomalous hyperedge score higher and the normal hyperedge score lower, so the following loss function is defined:
$$Loss = \frac{1}{|F|} \sum_{f \in F} \sigma\Big( \frac{1}{|E|} \sum_{e \in E} I_e - I_f \Big)$$
where $E$ is the positive sample set (i.e., the normal hyperedges), $F$ is the negative sample set (i.e., the anomalous hyperedges), and $\sigma(\cdot) = \log(1 + \exp(\cdot))$ is the activation function.
The Adam algorithm is used to train the model efficiently. During the training stage, the network weights are obtained by minimizing the above loss function so that the anomalous hyperedges have a higher mean score than the normal hyperedges. In the testing stage, the anomaly scores of the test hyperedges can be calculated using the trained model.
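A minimal sketch of this loss, assuming PyTorch; note that F.softplus is exactly $\log(1 + \exp(\cdot))$, so minimizing it pushes the anomalous scores above the mean normal score:

```python
import torch
import torch.nn.functional as F

def ahd_sle_loss(scores_pos, scores_neg):
    # scores_pos: scores of normal hyperedges (E);
    # scores_neg: scores of injected anomalous hyperedges (F)
    mean_pos = scores_pos.mean()
    return F.softplus(mean_pos - scores_neg).mean()
```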

5. Experiments

5.1. Datasets

To evaluate the effectiveness of the proposed AHD-SLE method, we used five real-world hypergraph datasets from various fields, including a knowledge graph, a coauthorship network, and three chemical reaction networks (one organic and two metabolic). We used the same preprocessed hypergraphs as NHP-U [48]. The specific information of the datasets is shown in Table 3.

5.2. Anomaly Injection

Since the hypergraph datasets are not labeled with anomalies, we use a negative sampling method to construct anomalous hyperedges as negative samples and inject them into the normal hyperedges. In order to improve the discriminative ability of the trained model, negative sampling should select hyperedges that are similar to the positive samples, thus ensuring that the model can learn fine-grained differences between positive and negative samples. For this, we use a similarity-based negative sampling method [48] to generate a negative sample $f$ of the same degree $d$ for each positive hyperedge $e \in E$. First, we randomly select $\alpha d$ nodes from $e$ to join $f$; then, we randomly select the remaining $(1 - \alpha)d$ nodes from the neighboring nodes of hyperedge $e$ to join $f$. Here, $\alpha \in [0, 1]$ is a hyperparameter that controls the node similarity between negative and positive samples: the larger $\alpha$ is, the more nodes the negative sample shares with its corresponding positive sample, and the more similar they are. The negative sample set $F$ is obtained by constructing a corresponding negative hyperedge for each positive hyperedge in this way.
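The sampling procedure can be sketched as follows (plain Python; `neighbor_pool`, the set of nodes sharing a hyperedge with e but outside e, is an assumption about the preprocessing):

```python
import random

def sample_negative(e, neighbor_pool, alpha=0.5):
    """Build a negative hyperedge f with the same degree as positive e."""
    e = list(e)
    d = len(e)
    k = round(alpha * d)                     # nodes copied from the positive sample
    f = set(random.sample(e, k))
    # Fill the remaining d - k slots from e's neighboring nodes.
    candidates = [v for v in neighbor_pool if v not in f]
    f.update(random.sample(candidates, d - k))
    return f
```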

5.3. Baselines

We chose mainstream graph and hypergraph embedding methods as baselines:
  • DeepWalk (Code available at https://github.com/shenweichen/graphembedding, (accessed on 4 June 2024)) [24]: This method obtains a node sequence through n-step random walks and then uses word2vec’s skip-gram algorithm to obtain the embedded representation of the nodes.
  • LINE (Code available at https://github.com/shenweichen/graphembedding, (accessed on 4 June 2024)) [26]: This method defines the first-order and second-order node similarity and then concatenates them to obtain the embedded representation of the nodes.
  • Node2Vec (Code available at https://github.com/shenweichen/graphembedding, (accessed on 4 June 2024)) [25]: This method balances the random walk strategy with hyperparameters during the sampling process and inputs the obtained node sequence into the skip-gram to learn node representations.
  • HGNN (Code available at https://github.com/iMoonLab/HGNN, (accessed on 4 June 2024)) [32]: This method extends graph convolution operations to hypergraphs and designs hypergraph convolution operations to learn embedding representations of high-order data structures.
  • HyperGCN (Code available at https://github.com/malllabiisc/HyperGCN, (accessed on 4 June 2024)) [35]: This method uses a nonlinear Laplace operator to define the GCN on a hypergraph and then aggregates neighborhood information to obtain the embedded representation of nodes.
  • Hyper-SAGNN (Code available at https://github.com/ma-compbio/Hyper-SAGNN, (accessed on 4 June 2024)) [49]: This method combines two graph embedding approaches, random walk and encoder, to design a new self-attention mechanism and obtain an embedding representation of nonuniform hypergraphs.
  • UniGNN (Code available at https://github.com/OneForward/UniGNN, (accessed on 4 June 2024)) [42]: This method proposes a unified framework to characterize the message-passing process in the GNN and HyperGNN through defining the update process of the GNN as a two-stage aggregation process.
  • NHP-U (Code available at https://drive.google.com/file/d/1z7XwXo5ohTudUTjyS4KfQbqwze24Ry-o/view, (accessed on 4 June 2024)) [48]: The method obtains the hyperedge embedded representation through a hyperedge-aware GCN layer that aggregates information about the nodes in the hyperedge and a max–min-based scoring function.
Among the above baseline methods, DeepWalk, LINE, and Node2Vec cannot be directly applied to hypergraphs, so they are run on the clique expansion graph of the hypergraphs to obtain node embedding vectors (experimentally, we set the embedding dimensionality to 128). For these methods, we define a hyperedge anomaly score function based on the node embedding vectors:
$$I_e = \sigma\Big( \frac{1}{|e|} \sum_{v_i, v_j \in e,\, i \neq j} x_i^{T} x_j \Big)$$
where $|e|$ represents the hyperedge degree, and $x_i$ represents the embedding vector of node $v_i$.
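As a sketch (NumPy assumed; the sigmoid stands in for $\sigma$, and the embeddings are the 128-dimensional vectors produced by the baselines), this score can be computed as:

```python
import numpy as np

def baseline_edge_score(embeddings, hyperedge):
    # embeddings: dict mapping node id -> embedding vector
    vecs = [np.asarray(embeddings[v]) for v in hyperedge]
    pair_sum = sum(vecs[i] @ vecs[j]
                   for i in range(len(vecs))
                   for j in range(len(vecs)) if i != j)
    s = pair_sum / len(vecs)            # (1/|e|) * sum_{i != j} x_i^T x_j
    return 1.0 / (1.0 + np.exp(-s))     # sigmoid as the activation sigma
```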

5.4. Metrics

In order to comprehensively and accurately evaluate the proposed AHD-SLE method, we selected two common anomaly detection indicators: AUC and Recall. They are defined as follows:
  • The AUC metric is widely used to measure the accuracy of detection methods. Its value lies in the range [0.5, 1]. We repeatedly pick an anomalous hyperedge and a normal hyperedge at random and compare their scores; if, among $n$ independent comparisons, the anomalous hyperedge has a higher score $n'$ times and the two scores are equal $n''$ times, the AUC value is defined as follows:
    $$AUC = \frac{n' + 0.5\, n''}{n}$$
  • The Recall is an indicator of how comprehensive a detection algorithm is, i.e., how many anomalous hyperedges are correctly detected. Here, we only focus on the top-ranked detection results: we sort the anomaly scores from high to low and select the top $k$ detection results ($k$ is half the number of anomalous hyperedges) to calculate Recall@k (denoted R@k), which is defined as follows:
    $$R@k = \frac{m}{M}$$
    where $m$ is the number of correctly detected anomalous hyperedges among the top $k$ with the highest detection scores, and $M$ is the total number of actual anomalous hyperedges. A computational sketch of both metrics follows.
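Both metrics can be computed as in the following sketch (NumPy assumed; the sampling-based AUC mirrors the comparison-counting definition above):

```python
import numpy as np

def auc_score(scores_anom, scores_norm, n=10000, seed=0):
    rng = np.random.default_rng(seed)
    a = rng.choice(scores_anom, n)            # n independent random comparisons
    b = rng.choice(scores_norm, n)
    return (np.sum(a > b) + 0.5 * np.sum(a == b)) / n

def recall_at_k(scores, labels, k):
    # labels: 1 for anomalous hyperedges, 0 for normal ones
    top_k = np.argsort(scores)[::-1][:k]      # k highest anomaly scores
    return labels[top_k].sum() / labels.sum() # m / M
```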

5.5. Implementation Details

The hardware environment for the experiments was a 64-core Intel(R) Xeon(R) Gold 5218R CPU @ 2.10 GHz, an NVIDIA RTX A6000 GPU, 64 GB of RAM, and 48 GB of GPU memory; the software environment was Ubuntu 22.04 LTS, CUDA 12.2, Python 3.7, and PyTorch 1.13.1.
In the detection performance experiment, we randomly selected 20% of the hyperedges (both normal and anomalous) in each dataset to train the model, 10% for validation to optimize the hyperparameters, and the remaining 70% for testing to validate the effectiveness of the algorithm. The negative sample similarity parameter was $\alpha = 0.5$. During training, the Adam [50] algorithm was used to minimize the loss function, with the learning rate set to 0.01 and the aggregation weights set to $w_v = 0.5$ and $w_e = 1.5$. Dropout was used to avoid overfitting, with the rate set to 0.5. All parameters were tuned through a grid search strategy, and 200 independent experiments were averaged to eliminate the possible effects of random errors. The embedding size of DeepWalk, LINE, and Node2Vec was set to 128. A condensed sketch of this setup follows.
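The following condensed, self-contained sketch reflects this setup (Adam with lr = 0.01, dropout 0.5); the tiny random tensors stand in for a real dataset, and the single-layer model abbreviates the full architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
n_l, d_in, d_hid, n_e = 20, 16, 512, 6            # toy sizes
H0 = torch.randn(n_l, d_in)                       # mapped SLE node features
A = torch.bernoulli(torch.full((n_l, n_l), 0.2))
A = ((A + A.T) > 0).float()                       # symmetric toy adjacency
B_E = torch.rand(n_e, n_l)
B_E = B_E / B_E.sum(1, keepdim=True)              # toy hyperedge backmap

lin1 = nn.Linear(d_in, d_hid, bias=False)         # convolution weights
lin2 = nn.Linear(d_hid, 1)                        # scoring head
opt = torch.optim.Adam(list(lin1.parameters()) + list(lin2.parameters()), lr=0.01)
drop = nn.Dropout(0.5)

A_t = A + 2 * torch.eye(n_l)                      # self-loops
D = torch.diag(A_t.sum(1).pow(-0.5))              # symmetric normalization
pos, neg = torch.arange(0, 3), torch.arange(3, 6) # toy positive/negative split

for epoch in range(100):
    opt.zero_grad()
    Hk = drop(torch.relu(D @ A_t @ D @ lin1(H0))) # one SLE convolution
    scores = torch.relu(lin2(B_E @ Hk)).squeeze(-1)
    loss = F.softplus(scores[pos].mean() - scores[neg]).mean()
    loss.backward()
    opt.step()
```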

5.6. Results

5.6.1. Detection Performance

The AUC and R@k of the proposed model AHD-SLE and the baselines are shown in Table 4.
The experimental results show that the proposed method, AHD-SLE, achieved the best detection performance on the five different types of datasets. Compared to the second-best algorithm, the AUC increased by approximately 4–7%, with an average increase of 5%, and the R@k increased by approximately 2–10%, with an average increase of 6%. The experiments verify the superiority of the proposed model in anomalous hyperedge detection.
In the experiment, DeepWalk, LINE, and Node2Vec are traditional graph representation learning methods that learn topological structure information on the hypergraph clique expansion, which causes some information loss. In addition, they learn only topological features, which limits the amount of information they can capture and results in the lowest predictive performance. HGNN, HyperGCN, Hyper-SAGNN, UniGNN, and NHP-U use the neighborhood information aggregation ability of the GNN to learn the hypergraph topology and node attributes, thus resulting in improved performance. However, because they learn node embedding representations, their anomaly scoring functions are designed for nodes and not directly for hyperedges. In contrast, AHD-SLE preserves the higher-order information of the hypergraph using SLE graphs, fully utilizes node attributes and topological features in information aggregation, and directly learns hyperedge representations for anomaly detection. Therefore, the detection performance of AHD-SLE has advantages over the baselines.

5.6.2. Algorithmic Robustness

In order to verify the robustness of the proposed method, we tested the effect of different anomalous proportions in each dataset on the detection performance. The experiments used the datasets in Table 3; the anomalous proportion $p$ injected into each dataset was set to 0.1, 0.3, 0.5, 0.7, and 0.9, and the rest of the parameters were the same as in the detection performance experiment. The experimental results are shown in Figure 4.
In the AUC experiment, the USPTO dataset changed significantly, showing an overall upward trend, reaching its maximum at $p = 0.7$, and then decreasing slightly. The other four datasets showed little change: when $p = 0.5$, i.e., when the numbers of positive and negative samples were equal, the AUC reached its maximum, and in the other cases, the AUC was slightly smaller. In the R@k experiment, the R@k values of all five datasets showed a downward trend as the anomaly proportion increased. The experimental results show that the proposed AHD-SLE algorithm is relatively robust with respect to the AUC metric but less robust with respect to the R@k metric under different anomaly proportions: the larger the anomaly proportion, the lower the R@k.

5.6.3. Parameter Sensitivity

In order to analyze the importance of the two types of neighbors (common node-based and common hyperedge-based) for information aggregation in the SLE, the AHD-SLE model sets two hyperparameters, $w_v$ and $w_e$, satisfying $w_v + w_e = 2$. We conducted parameter sensitivity experiments on the five datasets. The results show that as $w_v$ decreased and $w_e$ increased, both the AUC and R@k first increased slightly and then gradually decreased. The model achieved optimal detection performance when $w_v = 0.5$ and $w_e = 1.5$. Compared to the common hyperedge-based neighbors, the common node-based neighbors had a greater impact on the detection performance (Figure 5).
We also conducted experiments on the impact of the GCN hidden layer size $h$ on the detection performance (Figure 6). The results show that as the hidden layer size increased, the AUC and R@k gradually increased and then stabilized, with the detection performance reaching its optimum at a hidden layer size of 512. It can be observed that a small hidden layer size degrades detection performance, while a larger size offers limited improvement at the cost of more computation time.

6. Conclusions

In this paper, we propose AHD-SLE, an Anomalous Hyperedge Detection method on hypergraph Symmetric Line Expansion. First, the hypergraph is mapped to an SLE graph. Then, a graph convolution operation is used to aggregate the network information in the expansion graph to obtain the hyperedge embedding representation. Finally, an anomaly scoring function is designed to measure the degree of hyperedge anomaly. Because the SLE mapping and its bijective backmapping preserve the higher-order information of the hypergraph, AHD-SLE achieved the best detection performance compared to the baselines in both AUC and R@k metrics in experiments on five different types of real-world hypergraph datasets. This indicates the superiority of SLE in hypergraph learning and also extends the research ideas of hypergraph learning.
In the future, it is worth extending the proposed method to temporal hypergraphs, directed hypergraphs, and heterogeneous hypergraphs, which are also important research topics in hypergraph representation learning.

Author Contributions

Conceptualization, Y.L. and S.L.; methodology, Y.L.; software, F.P.; formal analysis and writing—original draft preparation, Y.L.; validation, Y.L. and H.L.; writing—review and editing, Y.L. and S.L.; supervision, H.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded in full by the Major Scientific and Technological Special Project of Henan Province under Grants 221100210700 and 221100210100.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The AHD-SLE code and datasets are available at https://github.com/ylli0/AHD-SLE, (accessed on 4 June 2024).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Ding, Q.; Katenka, N.; Barford, P.; Kolaczyk, E.; Crovella, M. Intrusion as (anti) social communication: Characterization and detection. In Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Beijing, China, 12–16 August 2012; pp. 886–894. [Google Scholar]
  2. Ott, M.; Cardie, C.; Hancock, J. Estimating the prevalence of deception in online review communities. In Proceedings of the 21st international conference on World Wide Web, Lyon, France, 16–20 April 2012; pp. 201–210. [Google Scholar]
  3. Hu, X.; Li, B.; Zhang, Y.; Zhou, C.; Ma, H. Detecting compromised email accounts from the perspective of graph topology. In Proceedings of the 11th International Conference on Future Internet Technologies, Nanjing, China, 15–17 June 2016; pp. 76–82. [Google Scholar]
  4. Milojević, S. Principles of scientific research team formation and evolution. Proc. Natl. Acad. Sci. USA 2014, 111, 3984–3989. [Google Scholar] [CrossRef] [PubMed]
  5. Pearcy, N.; Crofts, J.J.; Chuzhanova, N. Hypergraph models of metabolism. Int. J. Biol. Vet. Agric. Food Eng. 2014, 8, 752–756. [Google Scholar]
  6. Wolf, M.M.; Klinvex, A.M.; Dunlavy, D.M. Advantages to modeling relational data using hypergraphs versus graphs. In Proceedings of the 2016 IEEE High Performance Extreme Computing Conference (HPEC), Waltham, MA, USA, 13–15 September 2016; pp. 1–7. [Google Scholar]
  7. Mayfield, M.M.; Stouffer, D.B. Higher-order interactions capture unexplained complexity in diverse communities. Nat. Ecol. Evol. 2017, 1, 0062. [Google Scholar] [CrossRef] [PubMed]
  8. Yoon, S.E.; Song, H.; Shin, K.; Yi, Y. How much and when do we need higher-order information in hypergraphs? A case study on hyperedge prediction. In Proceedings of the Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 2627–2633. [Google Scholar]
  9. Gao, Y.; Zhang, Z.; Lin, H.; Zhao, X.; Du, S.; Zou, C. Hypergraph learning: Methods and practices. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 44, 2548–2566. [Google Scholar] [CrossRef] [PubMed]
  10. Krishnagopal, S.; Bianconi, G. Topology and dynamics of higher-order multiplex networks. Chaos Solitons Fractals 2023, 177, 114296. [Google Scholar] [CrossRef]
  11. Bick, C.; Gross, E.; Harrington, H.A.; Schaub, M.T. What are higher-order networks? SIAM Rev. 2023, 65, 686–731. [Google Scholar] [CrossRef]
  12. Tian, H.; Zafarani, R. Higher-Order Networks Representation and Learning: A Survey. arXiv 2024, arXiv:2402.19414. [Google Scholar]
  13. Antelmi, A.; Cordasco, G.; Polato, M.; Scarano, V.; Spagnuolo, C.; Yang, D. A Survey on Hypergraph Representation Learning. ACM Comput. Surv. 2023, 56, 1–38. [Google Scholar] [CrossRef]
  14. Zien, J.Y.; Schlag, M.D.; Chan, P.K. Multilevel spectral hypergraph partitioning with arbitrary vertex sizes. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 1999, 18, 1389–1399. [Google Scholar] [CrossRef]
  15. Sun, L.; Ji, S.; Ye, J. Hypergraph spectral learning for multi-label classification. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008; pp. 668–676. [Google Scholar]
  16. Bandyopadhyay, S.; Das, K.; Murty, M.N. Line hypergraph convolution network: Applying graph convolution for hypergraphs. arXiv 2020, arXiv:2002.03392. [Google Scholar]
  17. Yang, C.; Wang, R.; Yao, S.; Abdelzaher, T. Semi-supervised hypergraph node classification on hypergraph line expansion. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 2352–2361. [Google Scholar]
  18. Akoglu, L.; McGlohon, M.; Faloutsos, C. Oddball: Spotting anomalies in weighted graphs. In Proceedings of the Advances in Knowledge Discovery and Data Mining: 14th Pacific-Asia Conference, PAKDD 2010, Hyderabad, India, 21–24 June 2010; pp. 410–421. [Google Scholar]
  19. Hooi, B.; Song, H.A.; Beutel, A.; Shah, N.; Shin, K.; Faloutsos, C. Fraudar: Bounding graph fraud in the face of camouflage. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 895–904. [Google Scholar]
  20. Yu, W.; Cheng, W.; Aggarwal, C.C.; Zhang, K.; Chen, H.; Wang, W. Netwalk: A flexible deep embedding approach for anomaly detection in dynamic networks. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 2672–2681. [Google Scholar]
  21. Teng, X.; Yan, M.; Ertugrul, A.M.; Lin, Y.R. Deep into hypersphere: Robust and unsupervised anomaly discovery in dynamic networks. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018. [Google Scholar]
  22. Ding, K.; Li, J.; Bhanushali, R.; Liu, H. Deep anomaly detection on attributed networks. In Proceedings of the 2019 SIAM International Conference on Data Mining, Calgary, AB, Canada, 2–4 May 2019; pp. 594–602. [Google Scholar]
  23. Peng, Z.; Luo, M.; Li, J.; Xue, L.; Zheng, Q. A deep multi-view framework for anomaly detection on attributed networks. IEEE Trans. Knowl. Data Eng. 2020, 34, 2539–2552. [Google Scholar] [CrossRef]
  24. Perozzi, B.; Al-Rfou, R.; Skiena, S. Deepwalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 701–710. [Google Scholar]
  25. Grover, A.; Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 855–864. [Google Scholar]
  26. Tang, J.; Qu, M.; Wang, M.; Zhang, M.; Yan, J.; Mei, Q. Line: Large-scale information network embedding. In Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, 18–22 May 2015; pp. 1067–1077. [Google Scholar]
  27. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81. [Google Scholar] [CrossRef]
  28. Xu, L.; Wei, X.; Cao, J.; Yu, P.S. ICANE: Interaction content-aware network embedding via co-embedding of nodes and edges. Int. J. Data Sci. Anal. 2020, 9, 401–414. [Google Scholar] [CrossRef]
  29. Lee, G.; Choe, M.; Shin, K. BOURNE: Bootstrapped Self-supervised Learning Framework for Unified Graph Anomaly Detection. In Proceedings of the 40th International Conference on Data Engineering, ICDE-24, Utrecht, The Netherlands, 13–17 May 2024. [Google Scholar]
  30. Zhang, M.; Cui, Z.; Jiang, S.; Chen, Y. Beyond link prediction: Predicting hyperlinks in adjacency space. In Proceedings of the AAAI Conference on Artificial Intelligence, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
  31. Kumar, T.; Darwin, K.; Parthasarathy, S.; Ravindran, B. HPRA: Hyperedge prediction using resource allocation. In Proceedings of the 12th ACM Conference on Web Science, Southampton, UK, 6–10 July 2020; pp. 135–143. [Google Scholar]
  32. Feng, Y.; You, H.; Zhang, Z.; Ji, R.; Gao, Y. Hypergraph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 3558–3565. [Google Scholar]
  33. Arya, D.; Gupta, D.K.; Rudinac, S.; Worring, M. Hypersage: Generalizing inductive representation learning on hypergraphs. arXiv 2020, arXiv:2010.04558. [Google Scholar]
  34. Bai, S.; Zhang, F.; Torr, P.H. Hypergraph convolution and hypergraph attention. Pattern Recognit. 2021, 110, 107637. [Google Scholar] [CrossRef]
  35. Yadati, N.; Nimishakavi, M.; Yadav, P.; Nitin, V.; Louis, A.; Talukdar, P. Hypergcn: A new method of training graph convolutional networks on hypergraphs. In Proceedings of the 33rd International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; pp. 1511–1522. [Google Scholar]
  36. Ranshous, S.; Chaudhary, M.; Samatova, N.F. Efficient outlier detection in hyperedge streams using minhash and locality-sensitive hashing. In Proceedings of the Complex Networks & Their Applications VI: Proceedings of Complex Networks 2017 (The Sixth International Conference on Complex Networks and Their Applications), Lyon, France, 29 November–1 December 2017; pp. 105–116. [Google Scholar]
  37. Lee, G.; Choe, M.; Shin, K. HashNWalk: Hash and Random Walk Based Anomaly Detection in Hyperedge Streams. In Proceedings of the Thirty-First International Conference on Neural Information Processing Systems, Vienna, Austria, 23–29 July 2022; pp. 2129–2137. [Google Scholar]
  38. Chun, J.; Lee, G.; Shin, K.; Jung, J. Random walk with restart on hypergraphs: Fast computation and an application to anomaly detection. Data Min. Knowl. Discov. 2023, 38, 1222–1257. [Google Scholar] [CrossRef]
  39. Su, H.; Yang, Y.; Zhang, X.; Zhao, C. Anomalous social network event detection based on Higher-order networks. In Proceedings of the 2022 8th International Conference on Big Data and Information Analytics (BigDIA), Guiyang, China, 24–25 August 2022; pp. 324–329. [Google Scholar]
  40. Agarwal, S.; Branson, K.; Belongie, S. Higher order learning with graphs. In Proceedings of the 23rd International Conference on Machine Learning, Pittsburgh, PA, USA, 25–29 June 2006; pp. 17–24. [Google Scholar]
  41. Liu, Z.; Tang, B.; Ye, Z.; Dong, X.; Chen, S.; Wang, Y. Hypergraph Transformer for Semi-Supervised Classification. arXiv 2023, arXiv:2312.11385. [Google Scholar]
  42. Huang, J.; Yang, J. UniGNN: A Unified Framework for Graph and Hypergraph Neural Networks. In Proceedings of the Thirtieth International Joint Conference on Artificial Intelligence, Virtual Event, 19–27 August 2021; pp. 2563–2569. [Google Scholar]
  43. Srinivasan, B.; Zheng, D.; Karypis, G. Learning over families of sets-hypergraph representation learning for higher order tasks. In Proceedings of the 2021 SIAM International Conference on Data Mining (SDM), Virtual Event, 29 April–1 May 2021; pp. 756–764. [Google Scholar]
  44. Thonet, T.; Renders, J.M.; Choi, M.; Kim, J. Joint Personalized Search and Recommendation with Hypergraph Convolutional Networks. In Proceedings of the European Conference on Information Retrieval, Stavanger, Norway, 10–14 April 2022; pp. 443–456. [Google Scholar]
  45. Jiang, J.; Wei, Y.; Feng, Y.; Cao, J.; Gao, Y. Dynamic Hypergraph Neural Networks. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 2635–2641. [Google Scholar]
  46. Tan, Y.; Ma, Z.; Zhan, Y.; Mao, Q. Hypergraph induced graph convolutional network for multi-label image recognition. In Proceedings of the 2020 International Conference on Internet of Things and Intelligent Applications (ITIA), Zhenjiang, China, 27–29 November 2020; pp. 1–5. [Google Scholar]
  47. Bandyopadhyay, S.; Das, K.; Murty, M.N. Hypergraph attention isomorphism network by learning line graph expansion. In Proceedings of the 2020 IEEE International Conference on Big Data (Big Data), Atlanta, GA, USA, 10–13 December 2020; pp. 669–678. [Google Scholar]
  48. Yadati, N.; Nitin, V.; Nimishakavi, M.; Yadav, P.; Louis, A.; Talukdar, P. NHP: Neural hypergraph link prediction. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual Event, 19–23 October 2020; pp. 1705–1714. [Google Scholar]
  49. Zhang, R.; Zou, Y.; Ma, J. Hyper-SAGNN: A self-attention based graph neural network for hypergraphs. In Proceedings of the 8th International Conference on Learning Representations, ICLR 2020, Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
  50. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
Figure 1. Hypergraph expansions. (a) A hypergraph with 7 nodes and 3 hyperedges. (b) Star expansion. (c) Clique expansion. (d) Line expansion. (e) Symmetric line expansion.
Figure 2. The bijection between node–hyperedge pairs and SLE graph nodes.
Figure 3. Anomalous Hyperedge Detection method on hypergraph Symmetric Line Expansion.
Figure 4. Impact of different anomalous proportions for anomaly detection. (a) is the impact on AUC, (b) is the impact on R@k.
Figure 5. Impact of different w v and w e for detection performance. The x-axis is the value of the w v . Accordingly, the w e = 2 w v . The y-axis in (a) is the AUC value under different w v , and the y-axis in (b) is the R@k value under different w v .
Figure 6. Impact of different GCN hidden sizes for detection performance. (a) is the impact on AUC, (b) is the impact on R@k.
Table 1. Summary of anomaly edge and hyperedge detection methods.
| Anomaly Type | Category | Method | Key Idea |
|---|---|---|---|
| Anomalous node detection | Traditional methods | OddBall [18], Fraudar [19] | Transform graph anomaly detection into traditional anomaly detection. |
| | Network representation learning | NetWalk [20], DeepSphere [21] | Encode the graph structure into an embedded vector space to identify anomalous nodes. |
| | GNN-based representation learning | DOMINANT [22], ALARM [23] | Encode the graph structure and attribute information into an embedded vector space to identify anomalous nodes. |
| Anomalous edge detection | Heuristic metrics | CN, AA, RA, Katz, Jaccard [16] | Design heuristic node similarity metrics to measure the degree of edge anomaly. |
| | Network representation learning | DeepWalk [24], Node2Vec [25], LINE [26] | Encode the graph structure into low-dimensional vectors whose distance is used to measure node similarity. |
| | GNN-based representation learning | GCN, GAT, GraphSAGE [27] | Encode node attributes and graph structure into low-dimensional vectors whose distance is used to measure node similarity. |
| | Direct methods | ICANE [28] | Directly performs edge representation learning for anomalous edge detection. |
| | | BOURNE [29] | Employs self-supervised learning to compare pairs of target nodes and edges to derive anomaly scores. |
| Anomalous hyperedge detection | Heuristic metrics | HCN [30], HKatz [30], HRA [31] | Extend similarity metrics used in ordinary graphs to hypergraphs. |
| | Hypergraph representation learning | HGNN [32], HyperSAGE [33], Hyper-Atten [34], HyperGCN [35] | Encode hypergraph node attributes and structure into low-dimensional vectors whose distance is used to measure node similarity. |
| | Direct methods | LSH [36] | Designs similarity metrics between hyperedges using locality-sensitive hashing. |
| | | HashNWalk [37] | Designs similarity metrics based on a hash function and random walks. |
| | | ARCHER [38] | Develops random walk with restart (RWR) on hypergraphs for embedded representation. |
| | | AHIN [39] | Proposes a hypergraph contrastive learning method to capture abnormal event patterns. |
Table 2. Summary of hypergraph representation learning.
| Category | Method | Task |
|---|---|---|
| Star expansion | HyperGT [41] | Node classification |
| | HyperSAGE [33] | Node classification |
| | UniGNN [42] | Node classification |
| | FamilySet [43] | Hyperedge classification, hyperedge expansion |
| | HyperSaR [44] | Search and recommendation tasks |
| Clique expansion | HGNN [32] | Node classification |
| | DHGNN [45] | Node classification, sentiment prediction |
| | Hyper-Atten [34] | Node classification |
| | HyperGCN [35] | Node classification |
| | HI-GCN [46] | Multilabel image recognition |
| Line expansion | LHCN [16] | Node classification |
| | HAIN [47] | Node classification |
| | LEGCN [17] | Node classification |
Table 3. Summary of the real-world hypergraph datasets used in the experiments.
| Dataset | Hypergraph Type | Number of Nodes | Number of Hyperedges | Feature Size |
|---|---|---|---|---|
| iAF1260b | Metabolic reactions | 1668 | 2084 | 26 |
| iJO1366 | Metabolic reactions | 1805 | 2253 | 26 |
| USPTO | Organic reactions | 16,293 | 11,433 | 298 |
| dblp | Coauthorship | 20,685 | 30,956 | 3763 |
| Reverb45k | Knowledge graph | 28,798 | 66,914 | 382 |
Table 4. Comparison of AUC and R@k.
| Model | iAF1260b AUC | iAF1260b R@k | iJO1366 AUC | iJO1366 R@k | USPTO AUC | USPTO R@k | DBLP AUC | DBLP R@k | Reverb45k AUC | Reverb45k R@k |
|---|---|---|---|---|---|---|---|---|---|---|
| DeepWalk | 0.50 | 0.22 | 0.62 | 0.32 | 0.57 | 0.29 | 0.57 | 0.30 | 0.68 | 0.40 |
| LINE | 0.54 | 0.26 | 0.53 | 0.24 | 0.56 | 0.30 | 0.53 | 0.28 | 0.59 | 0.37 |
| Node2Vec | 0.56 | 0.26 | 0.60 | 0.28 | 0.57 | 0.29 | 0.59 | 0.31 | 0.73 | 0.37 |
| HGNN | 0.63 | 0.28 | 0.61 | 0.31 | 0.68 | 0.27 | 0.66 | 0.35 | 0.64 | 0.33 |
| HyperGCN | 0.62 | 0.28 | 0.59 | 0.32 | 0.69 | 0.30 | 0.67 | 0.37 | 0.67 | 0.32 |
| Hyper-SAGNN | 0.61 | 0.26 | 0.57 | 0.31 | 0.66 | 0.28 | 0.64 | 0.34 | 0.66 | 0.33 |
| UniGNN | 0.60 | 0.30 | 0.58 | 0.30 | 0.54 | 0.24 | 0.63 | 0.32 | 0.56 | 0.30 |
| NHP-U | 0.64 | 0.31 | 0.63 | 0.32 | 0.74 | 0.37 | 0.69 | 0.38 | 0.75 | 0.44 |
| AHD-SLE | 0.69 | 0.36 | 0.67 | 0.35 | 0.77 | 0.38 | 0.69 | 0.39 | 0.78 | 0.45 |

