Article

Augmented Feature Diffusion on Sparsely Sampled Subgraph

1 College of Software, Northeastern University, Shenyang 110169, China
2 College of Engineering, Computing and Cybernetics, Australian National University, Canberra, ACT 2601, Australia
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Electronics 2024, 13(16), 3249; https://doi.org/10.3390/electronics13163249
Submission received: 18 July 2024 / Revised: 12 August 2024 / Accepted: 13 August 2024 / Published: 15 August 2024
(This article belongs to the Special Issue Motion-Centric Video Processing)

Abstract

Link prediction is a fundamental problem in graphs. Currently, SubGraph Representation Learning (SGRL) methods provide state-of-the-art solutions for link prediction by transforming the task into a graph classification problem. However, existing SGRL solutions suffer from high computational costs and lack scalability. In this paper, we propose a novel SGRL framework called Augmented Feature Diffusion on Sparsely Sampled Subgraph (AFD3S). The AFD3S first uses a conditional variational autoencoder to augment the local features of the input graph, effectively improving the expressive ability of downstream Graph Neural Networks. Then, based on a random walk strategy, sparsely sampled subgraphs are obtained from the target node pairs, reducing computational and storage overhead. Graph diffusion is then performed on the sampled subgraph to achieve specific weighting. Finally, the diffusion matrix of the subgraph and its augmented feature matrix are used for feature diffusion to obtain operator-level node representations as inputs for the SGRL-based link prediction. Feature diffusion effectively simulates the message-passing process, simplifying subgraph representation learning, thus accelerating the training and inference speed of subgraph learning. Our proposed AFD3S achieves optimal prediction performance on several benchmark datasets, with significantly reduced storage and computational costs.

1. Introduction

The application of complex networks is becoming increasingly widespread in various fields [1], such as social networks [2,3,4], biological networks [5,6], transportation networks [7], and video processing tasks [8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27]. Among them, link prediction is one of the significant research directions in complex networks, aiming to predict unobserved links between nodes, or the likelihood of future links, based on known nodes and network structures [16,20]. Research on link prediction not only helps us better understand the internal structure and evolution mechanisms of networks, but also has extensive applications in practical fields such as social network analysis [2], bioinformatics [6], skeletal action recognition [8,10,13,14,15,16,17,18,19,20], and recommendation systems [28], demonstrating substantial research and application value in the real world.
In recent years, researchers have proposed various methods and techniques for link prediction, ranging from early simple heuristic methods (e.g., Common Neighbors [29], Adamic Adar [2], Katz [30], etc.) to Graph Neural Networks (GNNs) [31,32,33,34,35]. Among these methods, GNNs have become widely accepted and successful solutions [14,15,16,17,18,19,20], primarily due to their ability to effectively capture complex relationships within graph-structured data. Building on the success of GNNs, attention mechanisms have recently gained popularity in link prediction tasks. These mechanisms excel at focusing on the most relevant parts of the input data, thereby improving model performance. This technique is well exemplified by models like Code2Vec [36] and Code2Seq [37], where attention is used to select the most pertinent paths in code representation learning. By weighing different paths or sequences based on their relevance, these models enhance prediction accuracy. Although initially applied in the domain of source code processing, the methodology behind attention mechanisms offers a valuable perspective on improving link prediction by emphasizing the most informative subgraphs or connections within a network.
In the context of link prediction, attention mechanisms can dynamically prioritize different parts of the graph during the learning process, leading to more accurate and efficient models. For example, the SSP-AA model [38] incorporates adaptive attention mechanisms to focus on sparse subgraph sampling, thereby improving prediction accuracy. These approaches demonstrate how attention can be leveraged to capture critical information in large, complex networks, enhancing the model’s ability to understand and predict link formation.
Building on these advancements, other researchers have further explored the integration of sophisticated techniques to improve link prediction. For instance, Zhang and Chen [39] introduced a method called SEAL, which improves upon the Weisfeiler–Lehman neural machine (WLNM) [40] by using a GNN to learn graph structure features from local enclosing subgraphs, thereby enhancing link prediction performance. Similarly, Keikha et al. [41] proposed the DeepLink framework, which utilizes deep learning to extract structural and content features from nodes in social networks, showing improved performance in link prediction tasks. Another noteworthy approach is the method proposed by Cai and Ji [42], which introduces a multi-scale node aggregation technique to transform enclosing subgraphs of different scales while preserving essential information, further improving the accuracy of link predictions. In addition to these deep learning-based approaches, traditional clustering methods such as Graph InfoClust (GIC) [43], proposed by Mavromatis and Karypis, continue to offer valuable insights by applying unsupervised learning algorithms like K-means to graph representation learning.
Early GNNs used shallow encoders to learn representations of the source and target nodes and then aggregated these independent node representations into link representations, neglecting the relative positions between nodes [44,45] and resulting in inferior link representations [46]. To address this issue, SubGraph Representation Learning (SGRL) methods [38,39,47,48,49,50] significantly enhance the expressive power of GNNs by learning the enclosing subgraphs around target node pairs instead of learning the embeddings of the two endpoints independently. This approach provides state-of-the-art solutions for link prediction. However, as the graph size increases and the subgraph hop count grows, the storage and computational costs of extracting, preprocessing, and learning enclosing subgraphs for any target node pair also grow exponentially, leading to high complexity and low computational efficiency [16,20].
To improve the computational efficiency of these models, Scaled [51] achieved better scalability by extracting sparsely sampled subgraphs, while WSEE [52] employed weighted sampling based on node features to reduce the overhead required for scaling to larger graphs while preserving the basic information of the original graph. SSP-AA [38] utilizes sparse subgraphs based on an adaptive attention mechanism for link prediction. Although these methods enable the processing of large-scale graphs through sparse subgraph sampling, they sometimes sacrifice some predictive performance as a trade-off.
We propose a Link Prediction Algorithm via Augmented Feature Diffusion on Sparsely Sampled Subgraph (AFD3S) to address the issues above. Firstly, we perform local feature augmentation on the original graph by a generative model to learn the feature distribution of neighbor nodes conditioned on the central node’s features. The generated features are then fused with the original features to obtain a feature augmentation matrix, which improves the expressive power of downstream GNNs. Next, we adopt a random walk approach between the target node pairs to extract sparsely sampled subgraphs, thereby reducing the storage and computational costs of the subgraphs. Subsequently, predefined graph diffusion operations are performed on these subgraphs to obtain graph diffusion matrices. Finally, we perform feature diffusion operations on the subgraph’s diffusion matrix and its corresponding feature augmentation matrix to obtain the operator-level node representations of the subgraph. This representation is then used as input for downstream link prediction tasks. Feature diffusion simulates the message-passing process between nodes within the subgraph, simplifying subgraph representation learning and accelerating its training and inference speed, ultimately reducing the overall model runtime. Extensive experiments on real-world datasets demonstrate that AFD3S outperforms all baseline models in link prediction, requiring less training time and memory and achieving significant speedups.

2. Preliminary

Notations. Let $G = (V, E)$ be an input graph, where $V = \{v_1, v_2, \ldots, v_N\}$ denotes the set of nodes of $G$, $N$ is the number of nodes, and $E \subseteq V \times V$ is the set of edges. The adjacency matrix is defined as $\mathbf{A} \in \{0, 1\}^{N \times N}$, where $A_{i,j} = 1$ if and only if $(v_i, v_j) \in E$. Let $\mathcal{N}_i = \{ v_j \mid A_{i,j} = 1 \}$ denote the set of neighbors (neighborhood) of a node $v_i$, and let $\mathbf{D}$ be the diagonal degree matrix with $D_{i,i} = \sum_{j=1}^{N} A_{i,j}$. The feature matrix is denoted as $\mathbf{X} \in \mathbb{R}^{N \times d}$, where each node $v$ is associated with a $d$-dimensional feature vector $\mathbf{X}_v$.
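As a quick illustration (not part of the paper's method), this notation can be instantiated on a toy graph; the values below are arbitrary example data.

```python
import numpy as np

# Toy instantiation of the notation on a 4-node undirected graph G = (V, E).
edges = [(0, 1), (1, 2), (2, 3), (0, 2)]
N, d = 4, 3
A = np.zeros((N, N))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0              # A_ij = 1 iff (v_i, v_j) in E
D = np.diag(A.sum(axis=1))               # diagonal degree matrix, D_ii = sum_j A_ij
X = np.random.rand(N, d)                 # feature matrix X in R^{N x d}
N_1 = np.flatnonzero(A[1])               # neighborhood of node v_1: {v_j | A_1j = 1}
```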
Definition 1 
(Enclosing Subgraph). Given a graph $G$ and a target node pair $T = \{u, v\}$, the $h$-hop enclosing subgraph of $T$ is a subgraph $G_{uv}^{h}$ induced from $G$ with node set $\{\, j \mid d(j, u) \le h \ \text{or}\ d(j, v) \le h \,\}$, where $d(i, j)$ denotes the shortest-path distance between nodes $i$ and $j$.
Definition 2 
(Sampled Subgraph). Given a graph $G$, the randomly sampled $h$-hop enclosing subgraph of a target node pair $T = \{u, v\}$ is a subgraph $G_{uv}^{h,k}$ induced from $G$ with node set $V_{uv}^{h,k} = S_u^{h,k} \cup S_v^{h,k}$, where $S_i^{h,k}$ denotes the set of nodes visited by performing $k$ random walks of length $h$ starting from node $i$.
Link Prediction. The goal is to infer the existence of edges between target node pairs $T = \{u, v\}$ based on the observed adjacency matrix $\mathbf{A}$ and features $\mathbf{X}$. The learning task is to find a likelihood (or scoring) function $f$ that assigns an interaction likelihood value (or score) to each target node pair $(u, v) \notin E$, where a higher value indicates a higher probability of the existence of a link.
Early link prediction methods mainly relied on network heuristic algorithms, such as common neighbors [29], Jaccard index [53], and Katz index [30]. While these methods are simple and direct, their generalization ability on different graph structures is limited. To address this challenge, researchers proposed various GNN methods, which can independently learn feature representations of node pairs and predict link probabilities by aggregating these representations [44,45]. However, GNNs still have limitations in capturing the automorphism of graphs and the nodes’ different roles in the link formation process [46]. To overcome this limitation, SEAL [39] innovatively transformed link prediction into a graph classification problem on enclosing subgraphs and enhanced the expressive power of node features by introducing structural labels. This led to the emergence of SGRLs, which have achieved significant progress in link prediction tasks and demonstrate state-of-the-art performance.
However, despite the breakthrough in SGRLs’ performance for link prediction tasks, they often face exponential growth in storage and computational costs as the size of graph data and the hop of subgraphs increase. This results in high temporal and spatial complexity, lacking scalability, which has become a crucial obstacle to their practical application and deployment. Therefore, improving the computational efficiency and processing capability of SGRLs has become an important challenge in current research.
Therefore, our work proposes a new SGRL framework to address the existing problems in subgraph representation learning. It uses local feature augmentation to enhance the expressive power of downstream GNNs and employs sparsely sampled subgraphs to effectively reduce the storage and computational requirements of subgraphs. In addition, introducing subgraph-level diffusion operators that are easy to pre-compute simplifies the subgraph representation learning process by using feature diffusion operations to replace traditional expensive message-passing schemes, further accelerating the training and inference processes of SGRL.

3. Our Model

3.1. Model Framework

The AFD3S process consists of four steps, as Figure 1 illustrates. Firstly, local feature augmentation is applied to the input graph to obtain a feature augmentation matrix. Secondly, sparsely sampled subgraphs are extracted using a random walk strategy, starting from the target node pair. Then, a special weighting operation is performed on the subgraphs, which involves applying a predefined graph diffusion operator to these subgraphs to obtain the diffusion matrix. Finally, the diffusion matrix performs feature diffusion with the previously obtained feature augmentation matrix, resulting in an operator-level node representation of the subgraph. This representation serves as input for downstream link prediction tasks.

3.2. Local Feature Augmentation

Existing GNNs [49] mainly focus on designing message-passing schemes that exploit local information in graphs to obtain node representations. Although GNNs have achieved remarkable performance in various graph-based tasks [8,10,13,14,15,16,17,18,19,20], for nodes with limited local neighborhood information, existing GNNs may not be able to aggregate enough information, which degrades the learning quality of the models. To address this issue, we propose a local augmentation strategy on graphs, which generates the feature distribution of neighbor nodes conditioned on the features of the central node and utilizes these generated features during training to enhance the expressive power of downstream GNNs.
To generate more features within the neighborhood $\mathcal{N}_v$ of a node $v$, it is first necessary to know the feature distribution of its neighbor nodes. Since this distribution is related to the central node $v$, a generative model is used to learn it conditioned on the features of the central node. In this paper, we use a Conditional Variational Autoencoder (CVAE) [54] to learn the conditional distribution of the features of a connected neighbor node $u$ ($u \in \mathcal{N}_v$) given the central node $v$. Since the feature distribution of neighbor node $u$ is related to $\mathbf{X}_v$, we condition on $\mathbf{X}_v$. The latent variable $z$ is generated from a prior distribution $p_\theta(z \mid \mathbf{X}_v)$, and the generated feature $\mathbf{X}_u$ is produced from a generative distribution conditioned on both $z$ and $\mathbf{X}_v$, i.e., $z \sim p_\theta(z \mid \mathbf{X}_v)$, $\mathbf{X}_u \sim p_\theta(\mathbf{X} \mid \mathbf{X}_v, z)$. Using $\phi$ to denote the variational parameters and $\theta$ the generative parameters, we have:
$$\log p_\theta(\mathbf{X}_u \mid \mathbf{X}_v) = \int q_\phi(z \mid \mathbf{X}_u, \mathbf{X}_v) \log \frac{p_\theta(\mathbf{X}_u, z \mid \mathbf{X}_v)}{q_\phi(z \mid \mathbf{X}_u, \mathbf{X}_v)}\, dz + KL\big(q_\phi(z \mid \mathbf{X}_u, \mathbf{X}_v) \,\|\, p_\theta(z \mid \mathbf{X}_u, \mathbf{X}_v)\big) \ge \int q_\phi(z \mid \mathbf{X}_u, \mathbf{X}_v) \log \frac{p_\theta(\mathbf{X}_u, z \mid \mathbf{X}_v)}{q_\phi(z \mid \mathbf{X}_u, \mathbf{X}_v)}\, dz \tag{1}$$
the corresponding Evidence Lower Bound (ELBO) [55] can be defined as:
$$\mathcal{L}(\mathbf{X}_u, \mathbf{X}_v; \theta, \phi) = -KL\big(q_\phi(z \mid \mathbf{X}_u, \mathbf{X}_v) \,\|\, p_\theta(z \mid \mathbf{X}_v)\big) + \frac{1}{L}\sum_{l=1}^{L} \log p_\theta(\mathbf{X}_u \mid \mathbf{X}_v, z^{(l)}) \tag{2}$$
where $z^{(l)} = g_\phi(\mathbf{X}_v, \mathbf{X}_u, \epsilon^{(l)})$, $\epsilon^{(l)} \sim \mathcal{N}(0, \mathbf{I})$, $L$ represents the number of neighbors of node $v$, and $KL$ refers to the Kullback–Leibler divergence [56], also known as relative entropy. In information theory and machine learning, the KL divergence measures the difference between two probability distributions; in this paper, it is used to measure the difference between the posterior distribution and the prior distribution.
A CVAE model is trained over all nodes during the experiments. The objective of the training phase is to maximize the ELBO in Equation (2), taking pairs of adjacent nodes $(\mathbf{X}_v, \mathbf{X}_u)$, $u \in \mathcal{N}_v$, as input. In the Variational Autoencoder (VAE) setting, the negative ELBO is typically used as the loss function, so maximizing the ELBO corresponds to minimizing that loss. Maximizing the ELBO is equivalent to minimizing the sum of the reconstruction error and the KL divergence, which helps the model learn latent representations that can generate the data while preserving the structural information of the latent space. In the generation phase, the node features $\mathbf{X}_v$ are used as the condition, and a latent variable $z \sim \mathcal{N}(0, \mathbf{I})$ is sampled as input to the decoder, yielding a generated feature vector $\bar{\mathbf{X}}_v$ associated with node $v$. Algorithm 1 describes the training process of the CVAE feature generation model.
Algorithm 1 CVAE model training
Input: Input graph G, adjacency matrix A, feature matrix X
Output: Feature generation model Q_φ
1:  Initialize Q_φ
2:  while not converged do
3:      for each v ∈ V do
4:          N_v = get_neighbors(A, v)
5:          z = encoder(X_v, Q_φ)
6:          X_u = generator(z, X_v, N_v, Q_φ)
7:          loss = compute_ELBO(X_u, X_v, z, Q_φ)
8:          loss.backward()
9:          optimizer.step()
10:     end for
11: end while
12: return Q_φ
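To make Algorithm 1 concrete, the following is a minimal PyTorch sketch of a CVAE feature generator conditioned on the central node's features. The class name, layer sizes, and the mean-squared-error reconstruction term are illustrative assumptions, not the authors' exact implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocalCVAE(nn.Module):
    """Minimal CVAE sketch: learns p(X_u | X_v) via a latent variable z (hypothetical layer sizes)."""
    def __init__(self, feat_dim, hidden_dim=64, latent_dim=16):
        super().__init__()
        # Encoder q_phi(z | X_u, X_v): takes neighbor and central features concatenated.
        self.enc = nn.Sequential(nn.Linear(2 * feat_dim, hidden_dim), nn.ReLU())
        self.mu = nn.Linear(hidden_dim, latent_dim)
        self.logvar = nn.Linear(hidden_dim, latent_dim)
        # Decoder p_theta(X_u | z, X_v): conditioned on z and the central features.
        self.dec = nn.Sequential(nn.Linear(latent_dim + feat_dim, hidden_dim),
                                 nn.ReLU(), nn.Linear(hidden_dim, feat_dim))

    def forward(self, x_u, x_v):
        h = self.enc(torch.cat([x_u, x_v], dim=-1))
        mu, logvar = self.mu(h), self.logvar(h)
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)  # reparameterization trick
        x_rec = self.dec(torch.cat([z, x_v], dim=-1))
        return x_rec, mu, logvar

def negative_elbo(x_rec, x_u, mu, logvar):
    # Negative ELBO: reconstruction error plus KL(q(z|X_u,X_v) || N(0, I)).
    rec = F.mse_loss(x_rec, x_u, reduction='mean')
    kl = -0.5 * torch.mean(1 + logvar - mu.pow(2) - logvar.exp())
    return rec + kl

def generate_features(model, x_v, latent_dim=16):
    # Generation phase: sample z ~ N(0, I) and decode conditioned on the central node's features.
    z = torch.randn(x_v.size(0), latent_dim)
    return model.dec(torch.cat([z, x_v], dim=-1))
```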
After training, the generative model is applied to the input graph, and the generated features $\bar{\mathbf{X}}_v$ are used as additional input, combined with the original features $\mathbf{X}$ to obtain the augmented node representations $\mathbf{H}$, thus improving the expressive power of downstream GNNs, as shown in Equation (3):
$$\mathbf{H} = \sigma(\mathbf{X}, \bar{\mathbf{X}}) \tag{3}$$
where $\sigma$ represents a specific fusion operation. We provide two ways of using the generated features: concatenation and averaging. Figure 2 illustrates local feature augmentation using concatenation.
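A minimal sketch of the fusion step in Equation (3), assuming $\sigma$ is either row-wise concatenation or element-wise averaging of the original and generated feature matrices (the function name is hypothetical):

```python
import torch

def augment_features(x, x_gen, mode="concat"):
    """Fuse original features X with generated features X_bar (Equation (3));
    'concat' and 'mean' correspond to the two fusion choices described above."""
    if mode == "concat":
        return torch.cat([x, x_gen], dim=-1)   # H in R^{N x 2d}
    elif mode == "mean":
        return 0.5 * (x + x_gen)               # H in R^{N x d}
    raise ValueError(f"unknown fusion mode: {mode}")
```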

3.3. Subgraph Sampling and Graph Diffusion

SEAL [39] and its variants (WESLP [57], WalkPool [58], etc.) lack scalability because the size of the enclosing subgraph grows exponentially as the hop $h$ increases; nodes with high degrees tend to have very large enclosing subgraphs even for small hops, resulting in high temporal and spatial complexity. Therefore, the proposed model uses sparsely sampled subgraphs (Definition 2) instead of enclosing subgraphs when extracting subgraphs for a target node pair. By introducing sparsely sampled subgraphs, the model can effectively reduce the size of the subgraphs while maintaining sufficient information, thus lowering its temporal and spatial complexity. Figure 3 illustrates the extraction of a sampled subgraph for the target node pair $(u, v)$, with $S_u^{h,k} = \{a, b, c, d, e\}$, $S_v^{h,k} = \{f, g, h, i, j\}$, and $V_{uv}^{h,k} = \{a, b, c, d, e, f, g, h, i, j, u, v\}$, where the walk length $h$ is 2 and the number of walks $k$ is 3.
Comparing the definitions of the enclosing subgraph (Definition 1) and the sparsely sampled subgraph leads to two important observations: (i) the sampled subgraph $G_{uv}^{h,k}$ is a subgraph of the enclosing subgraph $G_{uv}^{h}$, because random walks of length $h$ cannot reach nodes that are more than $h$ steps away from the starting node; (ii) the size of the sampled subgraph is bounded by $O(hk)$ and can be controlled linearly by adjusting the walk length $h$ and the number of walks $k$, in contrast to the exponential growth of the enclosing subgraph in Definition 1. By replacing dense enclosing subgraphs with their corresponding sparsely sampled subgraphs, AFD3S reduces the computational and storage overhead of subgraphs, providing scalability while retaining the flexibility to control the degree of sparsity through the sampling parameters $h$ and $k$.
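The following sketch illustrates Definition 2 and the sampling step described above, assuming the graph is given as an adjacency list; the function names and the defaults h = 2, k = 50 (taken from the experimental settings in Section 4.2) are illustrative.

```python
import random
import numpy as np

def random_walk_nodes(adj_list, start, h, k, rng=random):
    """Collect S_start^{h,k}: nodes visited by k random walks of length h from `start`."""
    visited = {start}
    for _ in range(k):
        node = start
        for _ in range(h):
            neighbors = adj_list[node]
            if not neighbors:
                break
            node = rng.choice(neighbors)
            visited.add(node)
    return visited

def sample_subgraph(adj_list, u, v, h=2, k=50):
    """Sparsely sampled subgraph of the target pair (u, v): union of the two walk node sets."""
    nodes = random_walk_nodes(adj_list, u, h, k) | random_walk_nodes(adj_list, v, h, k)
    nodes = sorted(nodes)
    index = {n: i for i, n in enumerate(nodes)}
    a_uv = np.zeros((len(nodes), len(nodes)))
    for n in nodes:
        for m in adj_list[n]:
            if m in index:
                a_uv[index[n], index[m]] = 1.0   # induced adjacency matrix A_uv
    return nodes, a_uv
```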
Given the sparsely sampled subgraph $G_{uv}^{h,k}$ of the target node pair $T = \{u, v\}$ and its corresponding adjacency matrix $\mathbf{A}_{uv}$, AFD3S applies predefined graph diffusion operators to perform a specific weighting operation on the sampled subgraph, further capturing the structural relationships and similarities between nodes while simulating the process of information diffusion, and obtains the corresponding diffusion matrix:
$$\mathbf{M}_{uv} = \psi(\mathbf{A}_{uv}) \tag{4}$$
where $\mathbf{M}_{uv}$ represents the diffusion matrix of the sampled subgraph and $\psi$ denotes the specific graph diffusion operator. $\psi$ can be varied by using different diffusion operators that capture different structural features of the graph, such as adjacency/Laplacian operators for connectivity, triangle- or motif-based operators [59] for inherent community structure, and Personalized PageRank (PPR)-based operators [60] for identifying important connections. Each operator and its powers can constitute different diffusion operators in AFD3S. The graph diffusion operators used in this model are powers of the adjacency matrix, which capture and represent the multi-hop neighborhood relationships of nodes in the graph, providing rich topological features for graph structure analysis and GNNs.
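A minimal sketch of Equation (4) with the adjacency-power operators used in this model; treating the zeroth operator as the identity matrix is an assumption for illustration.

```python
import numpy as np

def adjacency_power_operators(a_uv, r=3):
    """Graph diffusion via powers of the subgraph adjacency matrix:
    M^(i) = A_uv^i captures i-hop neighborhood relationships (Equation (4))."""
    operators = [np.eye(a_uv.shape[0])]      # M^(0): identity, keeps the raw features (assumption)
    m = np.eye(a_uv.shape[0])
    for _ in range(r):
        m = m @ a_uv                          # next power of the adjacency matrix
        operators.append(m.copy())
    return operators                          # [M^(0), M^(1), ..., M^(r)]
```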

3.4. Feature Diffusion

The diffusion matrix $\mathbf{M}_{uv}$ of the sampled subgraph is used to perform feature diffusion with the corresponding feature augmentation matrix $\mathbf{H}_{uv}$, yielding the operator-level node representation $\mathbf{Z}_{uv}$ of the subgraph:
$$\mathbf{Z}_{uv} = \mathbf{M}_{uv} \cdot \mathbf{H}_{uv} \tag{5}$$
Feature diffusion simulates the process of information diffusion between nodes, simplifying subgraph representation learning. The resulting operator-level node representation not only contains each node's own information but also integrates information from its neighbor nodes, thus capturing the structural characteristics within the subgraph. Specifically, feature diffusion helps with the following:
  • Feature smoothing: In deep GNNs, information may propagate excessively between nodes, leading to overly similar node representations and the issue of over-smoothing. Adjusting the diffusion matrix can somewhat alleviate this problem, maintaining the diversity of node representations.
  • Enhancing node representations: A node’s feature vector can integrate features from its direct and indirect neighbors through diffusion operations, making the node representation richer and more comprehensive.
  • Simulating graph structure: The diffusion matrix essentially reflects the structural information of the graph. Multiplying it with the feature augmentation matrix can simulate information transmission between nodes based on the graph structure, simplifying the subgraph representation learning process and accelerating training and inference speeds.
  • Improving prediction performance: In link prediction tasks, this node representation fused with structural information can improve the model’s accuracy in predicting potential links, as it can better capture the interdependencies between nodes.
  • Computational efficiency: Compared to performing complex graph neural network operations on the entire graph, this subgraph-level diffusion operation can significantly reduce the amount of computation, making the model more efficient for applications on large-scale graphs.
In experiments, one can apply a set of different graph diffusion operators to the same sampled subgraph to obtain a set of linear diffusion matrices $\mathbf{M}_{uv}^{(0)}, \ldots, \mathbf{M}_{uv}^{(r)}$. These diffusion matrices are then applied to the feature augmentation matrix $\mathbf{H}_{uv}$ of the subgraph to yield a set of operator-level node representations $\mathbf{Z}_{uv}^{(0)}, \ldots, \mathbf{Z}_{uv}^{(r)}$, where
$$\mathbf{Z}_{uv}^{(i)} = \mathbf{M}_{uv}^{(i)} \cdot \mathbf{H}_{uv} \tag{6}$$
and $\mathbf{M}_{uv}^{(i)}$ is the diffusion matrix obtained by applying the $i$-th diffusion operator to the adjacency matrix of the subgraph. The operator-level node representation matrices $\mathbf{Z}_{uv}^{(i)}$ of the sampled subgraph are then concatenated to form the final joint node representation:
$$\mathbf{Z}_{uv} = \bigoplus_{i=0}^{r} \mathbf{Z}_{uv}^{(i)} \tag{7}$$
where the ⨁ symbol represents the concatenation operation of a set of feature vectors. When concatenating node representation matrices with mismatched dimensions, it is necessary to ensure that the rows belonging to the same node are properly aligned. For any missing rows, zero-padding is used, similar to the zero-padding strategy in graph pooling, thus ensuring the uniformity of matrix dimensions and data integrity.
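A minimal sketch of Equations (5)-(7), assuming the diffusion matrices share the subgraph's node ordering so the operator-level representations can be concatenated row-wise:

```python
import numpy as np

def operator_level_representations(operators, h_uv):
    """Feature diffusion (Equations (5)-(7)): apply each precomputed diffusion matrix
    to the augmented feature matrix and concatenate the results per node."""
    z_list = [m @ h_uv for m in operators]        # Z^(i) = M^(i) . H_uv
    return np.concatenate(z_list, axis=-1)        # joint representation, one row per node
```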

3.5. Training and Prediction

After obtaining the final operator-level node feature matrix $\mathbf{Z}_{uv}$ of the sampled subgraph, the first step is to reduce the dimensionality of the node representation matrix. This is achieved through a fully connected layer consisting of a learnable weight matrix $\mathbf{W}$ and a nonlinear activation function $\delta$, which reduces the dimensionality of the node representation while preserving important information. Next, a pooling operation is performed on the reduced representation; this typically aggregates the representations of the target nodes and their common neighbors, using center pooling or center-common-neighbor pooling to further extract and integrate critical information. Finally, the pooled representation is fed into a learnable function $\zeta$, such as a Multi-Layer Perceptron (MLP), which transforms the node representation into the probability $p_{uv}$ that a link exists. This probability can then be used for link prediction. The above process is formulated as follows:
$$p_{uv} = \zeta\big(\mathrm{pool}(\delta(\mathbf{Z}_{uv}\mathbf{W}))\big) \tag{8}$$
During the training process, the model optimizes the weight matrix $\mathbf{W}$ and the parameters of the function $\zeta$ by minimizing the difference between the predicted link probability and the actual existence of the link, typically via backpropagation and gradient descent. The training loss is the binary cross-entropy loss:
$$\mathcal{L} = -\frac{1}{|E_{\mathrm{label}}|} \sum_{(u, v) \in E_{\mathrm{label}}} \Big[ y_{uv} \log p_{uv} + (1 - y_{uv}) \log(1 - p_{uv}) \Big] \tag{9}$$
where $E_{\mathrm{label}}$ represents the entire training set, $|E_{\mathrm{label}}|$ is the number of samples, $y_{uv}$ indicates whether an edge exists between nodes $u$ and $v$, and $p_{uv}$ is the predicted probability that the edge exists. Minimizing this loss minimizes the cross-entropy between the predicted results and the true labels. Algorithm 2 describes the link prediction training process of AFD3S.
Algorithm 2 Augmented Feature Diffusion on Sparsely Sampled Subgraph (AFD3S)
Input: Input graph G, adjacency matrix A, feature matrix X
Output: Link prediction model Ω
1:  Initialize Ω, W, δ
2:  H = local_augment(A, X, σ)
3:  while not converged do
4:      for each (u, v) ∈ E do
5:          G_uv^{h,k}, A_uv = Sampler(A, h, k, u, v)
6:          for i = 1 to r do
7:              M_uv^(i) = ψ^(i)(A_uv)
8:              Z_uv^(i) = M_uv^(i) · H_uv
9:          end for
10:         Z_uv = aggregate_Z(Z_uv^(0), …, Z_uv^(r))
11:         p_uv = ζ(pool(δ(Z_uv W)))
12:         loss = compute_loss(Ω, p_uv, y_uv)
13:         loss.backward()
14:         optimizer.step()
15:     end for
16: end while
17: return Ω
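As a concrete companion to Equations (8) and (9) and Algorithm 2, the following PyTorch sketch implements the prediction head; the class name is hypothetical, and center pooling is approximated here by averaging the rows of the two target nodes.

```python
import torch
import torch.nn as nn

class LinkPredictionHead(nn.Module):
    """Sketch of Equations (8)-(9): dimensionality reduction, pooling, and an MLP scorer."""
    def __init__(self, in_dim, hidden_dim=256):
        super().__init__()
        self.reduce = nn.Linear(in_dim, hidden_dim)       # learnable W
        self.act = nn.ReLU()                              # nonlinearity delta
        self.mlp = nn.Sequential(nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
                                 nn.Linear(hidden_dim, 1))  # learnable function zeta
        self.loss_fn = nn.BCEWithLogitsLoss()             # binary cross-entropy (Equation (9))

    def forward(self, z_uv, target_idx):
        h = self.act(self.reduce(z_uv))                   # delta(Z_uv W)
        pooled = h[target_idx].mean(dim=0, keepdim=True)  # center pooling over the target pair
        return self.mlp(pooled).squeeze(-1)               # logit of p_uv

    def loss(self, logit, label):
        return self.loss_fn(logit, label)
```

For example, given the joint representation of a sampled subgraph and the indices of u and v within it, torch.sigmoid(head(z_uv, [i_u, i_v])) yields the predicted link probability p_uv.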

4. Experiment

4.1. Datasets and Baselines

Datasets. We used nine real-world network datasets, including weighted and unweighted, undirected, attributed, and non-attributed graph data. The experiments divided these datasets into two categories: non-attributed and attributed datasets. For both attributed and non-attributed datasets, except for Cora, CiteSeer, and PubMed, which were divided into 70% training set, 10% validation set, and 20% test set according to specific experimental settings, the edges of the remaining datasets were randomly divided into 85% training set, 5% validation set, and 10% test set. The experimental datasets include NS [61], a collaboration network of network science researchers, Power [50], an electrical power grid of the western United States, Yeast [62], a protein-protein interaction network, PB [63], a political blog network, Cora [64], a citation network in the field of machine learning, CiteSeer [65], a scientific publication citation network, PubMed [65], a diabetes-related scientific publication citation network, and Texas and Wisconsin [66], web page datasets collected by computer science departments of different universities.
Table 1 details the statistical information of these datasets, with the first four being non-attributed networks and the last five being attributed networks. Node represents the number of nodes, Edge represents the number of edges, Avg Deg represents the average degree of the network, Feat represents the feature dimension of the nodes, and Type represents the network type.
Baselines. In this section, we experimentally analyze the proposed link prediction model AFD3S and compare it with nine existing advanced link prediction models on nine different real-world datasets. These include two message-passing graph neural network (MPGNN) models: GCN [67] and GIN [68]; three autoencoder (AE) models: GAE, VGAE [69], and GIC (Graph InfoClust) [43]; and four SGRLs: SEAL [39], WESLP [57], Scaled [51], and WalkPool [58].

4.2. Experimental Setup

Experimental Environment. The hardware environment consists of an AMD Ryzen 7 5800H CPU, 32 GB of memory, and an NVIDIA GeForce RTX 3070 Laptop GPU (8 GB of graphics memory), running the Windows 11 64-bit operating system. PyCharm 2023.2.1 is used as the development tool, Python 3.10.9 as the development language, and PyTorch 1.12.1 and PyTorch Geometric 2.0.9 as the development framework.
Experimental Settings. For SGRLs and the AFD3S method on non-attributed datasets, the hop of the enclosing subgraphs, h, is typically set to 2 (except for the WalkPool on the Power dataset, where h is set to 3). For sparsely sampled subgraphs, the walk length h is set to 2, and the number of walks k is set to 50. On attributed datasets, the hop of the enclosing subgraphs, h, is generally set to 3 (while the WalkPool sets it to 2). The settings for sparsely sampled subgraphs are the same as for non-attributed datasets. Additionally, in the AFD3S, the zero-one [46] labeling scheme is uniformly adopted to label all datasets, while models like SEAL and Scaled use DRNL [39] for labeling. The central common neighbor pooling readout function employs a simple mean aggregation approach. These settings and choices aim to ensure consistency and performance optimization of the models while accommodating the characteristics of different datasets and models. Moreover, for all datasets, the percentages of training, validation, and test sets across all models are uniformly set to 85%, 5%, and 10%, respectively, with a 1:1 sampling ratio for positive and negative samples.
In the AFD3S model, the neural network utilizes SIGN [70], and for the non-attributed datasets, Node2Vec is employed to generate 256-dimensional feature vectors for each node. In the process of feature augmentation, σ uniformly adopts concatenation as the augmentation method. For all datasets, the hidden dimension after pooling in Equation (8) is set to 256, and an MLP with a 256-dimensional hidden layer is adopted in the experiments. To maintain consistency, the dropout rate is set to 0.5 for all models, the learning rate is set to 0.0001, and the Adam optimizer is used for 50 training epochs. During the training process, except for the MPGNN model, which uses full-batch training on the input graph, the batch size for other models is set to 32. These settings ensure the experiments’ fairness and comparability while fully utilizing the potential of the AFD3S model.
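For reference, the reported settings can be collected into a single configuration dictionary; the key names below are illustrative, not taken from the authors' code.

```python
# Reported AFD3S experimental settings as a plain config dictionary (illustrative key names).
AFD3S_CONFIG = {
    "walk_length_h": 2,           # sparsely sampled subgraphs, all datasets
    "num_walks_k": 50,
    "node_labeling": "zero-one",
    "node2vec_dim": 256,          # non-attributed datasets only
    "fusion_sigma": "concat",     # local feature augmentation
    "hidden_dim": 256,            # pooling output and MLP hidden layer
    "dropout": 0.5,
    "learning_rate": 1e-4,
    "optimizer": "Adam",
    "epochs": 50,
    "batch_size": 32,             # MPGNN baselines use full-batch training instead
    "split": (0.85, 0.05, 0.10),  # train / validation / test
}
```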
Evaluation Metrics. This paper adopts AUC and AP as the evaluation standards for model performance, aiming to accurately assess the performance of the AFD3S in solving the link prediction problem. Additionally, to fully demonstrate the computational efficiency and scalability of the AFD3S, this study further compares the performance of the AFD3S with existing popular SGRLs in terms of average preprocessing time, average training time, average inference time, and total running time.

4.3. Results and Analysis

Link Prediction. For all models, on both attributed and non-attributed datasets, this study presents the average AUC and AP scores over 10 runs with different fixed random seeds on the test data (Figure 4 and Figure 5). Table 2 displays the AUC results for both non-attributed and attributed datasets, while Table 3 displays the corresponding AP results. The optimal values are marked in bold.
Based on the data in Table 2 and Table 3, it is evident that the proposed AFD3S demonstrates exceptional performance in terms of average AUC and AP results on both non-attributed and attributed datasets, achieving optimal levels. Specifically, on attributed datasets, compared to the advanced benchmark model WalkPool, the AUC results of the AFD3S show improvements of 6.44% on Cora, 9.32% on CiteSeer, 10.23% on Texas, and 14.57% on Wisconsin. Simultaneously, the AUC and AP results of the AFD3S on non-attributed datasets also exhibit a certain degree of improvement.
This significant advantage stems from the importance of node features: the node features of attributed datasets provide direct, rich, and semantically clear information, whose expressive power is often superior to that of node features generated by random walks. Directly utilizing the original node features helps improve the interpretability and stability of the model while reducing additional computational costs. Furthermore, the AFD3S incorporates the neighboring node features of the central node during the local feature augmentation process, enabling it to capture complex relationships between nodes. This approach fully utilizes the multi-source information of graph data, providing superior performance for downstream GNN tasks. Therefore, the superior performance of the AFD3S on various datasets demonstrates its effectiveness and practicality in link prediction tasks.
Computational Efficiency and Scalability. To further validate the computational efficiency and scalability of the AFD3S, this paper selects three currently popular and performance-advanced SGRLs—SEAL, GCN+DE (distance encoding) [66], and WalkPool, and conducts comparative experiments on all datasets. The comparative experiments mainly focus on four key indicators: preprocessing time, average training time (50 epochs), average inference time, and total runtime (50 epochs), aiming to comprehensively demonstrate the computational efficiency of the AFD3S in practical applications. Table 4 and Table 5 present the experimental results on non-attributed and attributed datasets. In these tables, “Train” represents the average training time for 50 epochs, “Inference” represents the average inference time, “Preproc.” represents the preprocessing time, and “Runtime” represents the average runtime for 50 epochs. The fastest values are bolded, and the maximum (minimum) speedup ratio in “Speed up” refers to the ratio of the time required by the slowest (fastest) SGRL methods to the AFD3S model.
Through experimental results, we can observe that the AFD3S model significantly reduces training time across all datasets compared to other SGRLs. Specifically, the training speed is improved by factors ranging from 3.34 to 17.95 times. This is particularly evident in larger datasets like PubMed and PB, where the AFD3S model not only outperforms in speed but also scales better as the dataset size increases. The AFD3S model shows even more impressive gains in inference speed, with accelerations ranging from 3 to 61.05 times. This indicates that the model is highly efficient in making predictions once trained, which is crucial for real-time or large-scale applications. When considering the total runtime (which includes training, inference, and preprocessing), the AFD3S model consistently achieves the shortest times across all datasets. The overall runtime is reduced by factors of 2.27 to 14.53 times, showcasing the model’s superior efficiency. The largest gains are seen in the Yeast dataset, where the model achieves a 14.53× speedup, and in the large-scale PubMed dataset, where the model is 11.64 times faster than the slowest competitor. Although the preprocessing time for the AFD3S model is relatively higher in some cases, such as the NS dataset, this is effectively offset by the significant reductions in training and inference times. As a result, the overall computational time is still sharply reduced, highlighting the model’s strength in scenarios where preprocessing might be time-consuming but is outweighed by the benefits during training and inference. For scalability, as the dataset size increases, the computational efficiency of AFD3S becomes increasingly apparent. The highest acceleration multiples are achieved on the three large PubMed, PB, and Yeast datasets, demonstrating the excellent performance of the AFD3S in computational efficiency and scalability.
This gain is primarily attributed to the innovative strategies employed by the AFD3S. Adopting a random walk-based strategy to sample sparse subgraphs instead of enclosing subgraphs significantly reduces subgraphs’ storage and computational overhead. As the graph size increases, the scale of extracted subgraphs decreases from exponential to linear, reducing computational complexity and improving model efficiency. Additionally, the randomness in random walks brings additional regularization benefits to the model, further enhancing its performance. Meanwhile, the AFD3S utilizes easily pre-computed subgraph-level diffusion operators to replace expensive message-passing schemes through feature diffusion, significantly improving training and inference speeds. These optimization measures collectively enable the AFD3S to demonstrate excellent computational efficiency and scalability in link prediction tasks.
We recognize that preprocessing time is a critical factor. To address this, we suggest the following strategies: (i) Parallel processing: Implementing parallel processing to handle different parts of the preprocessing pipeline concurrently. (ii) Efficient algorithms: Utilizing more efficient algorithms for subgraph extraction and feature computation. (iii) Sampling: Reducing the dataset size through intelligent sampling to maintain a representative subset without processing the entire dataset.
Ablation Study. We conducted an ablation study to further explore the key factors contributing to the performance gains of the AFD3S, specifically verifying the effectiveness of local feature augmentation and graph diffusion on its predictive performance. Three variants of the AFD3S were designed and compared with the original AFD3S in terms of AUC and AP for link prediction across all datasets: (1) a variant that performs neither local feature augmentation nor graph diffusion; (2) a variant that performs local feature augmentation but not graph diffusion; (3) a variant that performs graph diffusion but not local feature augmentation. The experimental results are presented in Figure 6 and Figure 7.
The experimental results show that, on most datasets, the variant using local feature augmentation alone and the variant using graph diffusion alone both improve link prediction performance to some degree compared to the variant without either augmentation. This demonstrates the effectiveness of the two augmentation strategies proposed in this paper for link prediction with AFD3S. Furthermore, when local feature augmentation and graph diffusion are used together (i.e., the original AFD3S), performance improves on all datasets, with significant gains on most of them. The main reason is that the subsequent graph diffusion operation amplifies the effect of local feature augmentation, significantly improving prediction performance.
Specifically, since the AFD3S extracts sparsely sampled subgraphs of target node pairs to reduce the storage and computational overhead of subgraphs, in most cases it cannot include all h-hop neighbor nodes of the target node pairs as enclosing subgraphs do. However, the local feature augmentation operation is based on the original input graph and fuses features from other neighbor nodes of the central node, thus compensating for the shortcomings of sparse subgraphs. This augmentation is further amplified by the graph diffusion operation, which captures the structural relationships and similarities between nodes. At the same time, AFD3S simulates the information diffusion process between nodes through feature diffusion, simplifying the message passing and aggregation in subgraph representation learning and accelerating the training and inference of downstream tasks.
In summary, AFD3S improves the link prediction performance and reduces subgraphs’ storage and computational overhead, enhancing the model’s computational efficiency and scalability. It provides an efficient and practical solution for graph analysis tasks.
Parameter Sensitivity. We also conducted a sensitivity analysis on the two hyperparameters of the extracted sparsely sampled subgraph, namely the walk length h and the number of walks k, to investigate the impact of the sampled subgraph size on the link prediction performance of the AFD3S. Owing to the similarities across datasets, experiments were performed on the non-attributed Power dataset and the attributed Cora dataset as examples. All other parameters remained unchanged except for the variable under test, to maintain fairness. The experimental results are shown in Figure 8 and Figure 9.
The experimental results show that the AFD3S achieves excellent prediction performance even when using smaller h and k values. Notably, when the extracted subgraph is too large, the prediction performance decreases compared to a smaller subgraph. This is likely because a larger subgraph may contain nodes and edges irrelevant to the target link prediction task; such noise and irrelevant information may interfere with the model's learning process, leading the model to capture incorrect patterns and thus reducing prediction accuracy. This finding further shows that the AFD3S can extract key information from sparse, small sampled subgraphs, significantly improving computational efficiency while maintaining prediction performance. It also demonstrates the superiority and scalability of AFD3S in handling large-scale graph data: by reducing computational and storage overhead, the AFD3S can cope with large-scale graph data more effectively, providing strong support for practical applications.
t-SNE Plots. We present both t-SNE plots and heatmaps that visualize the learned representations and the impact of hyperparameter variations on performance metrics. The t-SNE (t-Distributed Stochastic Neighbor Embedding) plots in Figure 10 illustrate the distribution of learned node embeddings in a two-dimensional space for three datasets: CiteSeer, Cora, and PubMed. Each plot compares the results with and without our AFD3S algorithm. In the left column (Figure 10a,c,e), where AFD3S is not applied, the clusters are less distinct, indicating a weaker ability to separate different node classes; this suggests that the model without AFD3S struggles to capture the underlying data structure effectively. In contrast, the right column (Figure 10b,d,f) shows more distinct clusters when AFD3S is applied, demonstrating that our method helps the model learn more discriminative and meaningful representations, leading to better separation between classes.
Heatmaps. The heatmaps in Figure 11 analyze how different hyperparameters and algorithm variants affect model performance across datasets. In Figure 11a, the AUC heatmap compares different variants of the AFD3S algorithm across multiple datasets; darker blue shades represent higher AUC scores, indicating better link prediction performance. This visualization helps identify which algorithm variant performs best on each dataset. Figure 11b,c further explore the relationship between the hyperparameters, namely the walk length (h) and the number of walks (k), and the performance metrics (AUC and AP) on the Cora and Power datasets. The heatmaps show that certain configurations yield better results, with darker colors signifying higher performance; for example, increasing the walk length or the number of walks generally improves performance, as shown by the deeper blue shades. These insights are useful for tuning the model to achieve optimal performance.

5. Conclusions

In this paper, we propose a novel SubGraph Representation Learning (SGRL) framework called Augmented Feature Diffusion on Sparsely Sampled Subgraph (AFD3S). AFD3S integrates neighborhood features for central nodes through local feature augmentation and utilizes a random walk strategy to sample sparse subgraphs, effectively reducing the storage and computational requirements of the subgraphs. Additionally, by introducing subgraph-level diffusion operators that can be easily precomputed, AFD3S employs feature diffusion operations to replace the traditional expensive message-passing schemes, simplifying the subgraph representation learning process and further accelerating the training and inference processes. Finally, experimental results on multiple real-world datasets show that compared to existing SGRLs, the proposed AFD3S significantly improves computational speed and exhibits higher link prediction performance. These results fully demonstrate the excellent performance, computational efficiency, and scalability of the AFD3S for link prediction tasks.
Our future work will focus on: (i) Integrating attention mechanisms: Exploring the use of attention mechanisms for path selection to potentially enhance model performance. (ii) Scalability improvements: Investigating methods to improve the scalability of AFD3S for larger datasets and more complex graphs. (iii) Domain-specific applications: Applying AFD3S to other graph-based tasks and domains to evaluate its versatility and effectiveness.

Author Contributions

Conceptualization, methodology, formal analysis, investigation, data curation, visualization, writing—original draft preparation, X.W. and H.C.; supervision, resources, project administration, funding acquisition, writing—review and editing, X.W. and H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article, further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Nie, M.; Chen, D.; Wang, D. Reinforcement learning on graphs: A survey. IEEE Trans. Emerg. Top. Comput. Intell. 2023, 7, 1065–1082. [Google Scholar] [CrossRef]
  2. Adamic, L.A.; Adar, E. Friends and neighbors on the web. Soc. Netw. 2003, 25, 211–230. [Google Scholar] [CrossRef]
  3. Chen, L.; Xie, Y.; Zheng, Z.; Zheng, H.; Xie, J. Friend recommendation based on multi-social graph convolutional network. IEEE Access 2020, 8, 43618–43629. [Google Scholar] [CrossRef]
  4. Huang, X.; Chen, D.; Ren, T.; Wang, D. A survey of community detection methods in multilayer networks. Data Min. Knowl. Discov. 2021, 35, 1–45. [Google Scholar] [CrossRef]
  5. Oyetunde, T.; Zhang, M.; Chen, Y.; Tang, Y.; Lo, C. BoostGAPFILL: Improving the fidelity of metabolic network reconstructions through integrated constraint and pattern-based methods. Bioinformatics 2017, 33, 608–611. [Google Scholar] [CrossRef] [PubMed]
  6. Zitnik, M.; Agrawal, M.; Leskovec, J. Modeling polypharmacy side effects with graph convolutional networks. Bioinformatics 2018, 34, i457–i466. [Google Scholar] [CrossRef]
  7. Zhang, W.; Xu, D. Evolving model for the complex traffic and transportation network considering self-growth situation. Discret. Dyn. Nat. Soc. 2012, 2012, 291965. [Google Scholar] [CrossRef]
  8. Wang, L. Analysis and Evaluation of Kinect-Based Action Recognition Algorithms. Master’s Thesis, School of the Computer Science and Software Engineering, The University of Western Australia, Perth, Australia, 2017. [Google Scholar]
  9. Wang, L.; Huynh, D.Q.; Mansour, M.R. Loss switching fusion with similarity search for video classification. In Proceedings of the 2019 IEEE International Conference on Image Processing (ICIP), Taipei, Taiwan, 22–25 September 2019; pp. 974–978. [Google Scholar]
  10. Wang, L.; Huynh, D.Q.; Koniusz, P. A comparative review of recent kinect-based action recognition algorithms. IEEE Trans. Image Process. 2019, 29, 15–28. [Google Scholar] [CrossRef]
  11. Wang, L.; Koniusz, P.; Huynh, D.Q. Hallucinating idt descriptors and i3d optical flow features for action recognition with cnns. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 8698–8708. [Google Scholar]
  12. Wang, L.; Koniusz, P. Self-supervising action recognition by statistical moment and subspace descriptors. In Proceedings of the 29th ACM International Conference on Multimedia, Virtual Event, China, 20–24 October 2021; pp. 4324–4333. [Google Scholar]
  13. Koniusz, P.; Wang, L.; Cherian, A. Tensor representations for action recognition. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 648–665. [Google Scholar] [CrossRef]
  14. Qin, Z.; Liu, Y.; Ji, P.; Kim, D.; Wang, L.; Anwar, S.; Gedeon, T. Fusing higher-order features in graph neural networks for skeleton-based action recognition. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 4783–4797. [Google Scholar] [CrossRef]
  15. Wang, L.; Koniusz, P. Uncertainty-dtw for time series and sequences. In Proceedings of the European Conference on Computer Vision, Tel Aviv, Israel, 23–27 October 2022; pp. 176–195. [Google Scholar]
  16. Wang, L.; Koniusz, P. 3mformer: Multi-order multi-mode transformer for skeletal action recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Vancouver, BC, Canada, 17–24 June 2023; pp. 5620–5631. [Google Scholar]
  17. Wang, L.; Koniusz, P. Temporal-viewpoint transportation plan for skeletal few-shot action recognition. In Proceedings of the Asian Conference on Computer Vision, Macau, China, 4–8 December 2022; pp. 4176–4193. [Google Scholar]
  18. Wang, L.; Liu, J.; Koniusz, P. 3D Skeleton-based Few-shot Action Recognition with JEANIE is not so Naïve. arXiv 2021, arXiv:2112.12668. [Google Scholar]
  19. Wang, L.; Liu, J.; Zheng, L.; Gedeon, T.; Koniusz, P. Meet JEANIE: A Similarity Measure for 3D Skeleton Sequences via Temporal-Viewpoint Alignment. Int. J. Comput. Vis. 2024, 1–32. [Google Scholar] [CrossRef]
  20. Wang, L. Robust Human Action Modelling. Ph.D. Thesis, The Australian National University, Canberra, Australia, 2023. [Google Scholar]
  21. Wang, L.; Koniusz, P. Flow dynamics correction for action recognition. In Proceedings of the ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; pp. 3795–3799. [Google Scholar]
  22. Wang, L.; Sun, K.; Koniusz, P. High-order tensor pooling with attention for action recognition. In Proceedings of the ICASSP 2024-2024 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Seoul, Republic of Korea, 14–19 April 2024; pp. 3885–3889. [Google Scholar]
  23. Chen, W.; Xiao, H.; Zhang, E.; Hu, L.; Wang, L.; Liu, M.; Chen, C. SATO: Stable Text-to-Motion Framework. arXiv 2024, arXiv:2405.01461. [Google Scholar]
  24. Fang, S.; Wang, L.; Zheng, C.; Tian, Y.; Chen, C. SignLLM: Sign Languages Production Large Language Models. arXiv 2024, arXiv:2405.10718. [Google Scholar]
  25. Chen, Q.; Wang, L.; Koniusz, P.; Gedeon, T. Motion meets Attention: Video Motion Prompts. arXiv 2024, arXiv:2407.03179. [Google Scholar]
  26. Wang, L.; Yuan, X.; Gedeon, T.; Zheng, L. Taylor videos for action recognition. arXiv 2024, arXiv:2402.03019. [Google Scholar]
  27. Zhu, L.; Wang, L.; Raj, A.; Gedeon, T.; Chen, C. Advancing Video Anomaly Detection: A Concise Review and a New Dataset. arXiv 2024, arXiv:2402.04857. [Google Scholar]
  28. Lü, L.; Medo, M.; Yeung, C.H.; Zhang, Y.C.; Zhang, Z.K.; Zhou, T. Recommender systems. Phys. Rep. 2012, 519, 1–49. [Google Scholar] [CrossRef]
  29. Newman, M.E. Clustering and preferential attachment in growing networks. Phys. Rev. E 2001, 64, 025102. [Google Scholar] [CrossRef]
  30. Katz, L. A new status index derived from sociometric analysis. Psychometrika 1953, 18, 39–43. [Google Scholar] [CrossRef]
  31. Chen, D.; Nie, M.; Xie, F.; Wang, D.; Chen, H. Link Prediction and Graph Structure Estimation for Community Detection. Mathematics 2024, 12, 1269. [Google Scholar] [CrossRef]
  32. Hamilton, W.L. Graph Representation Learning; Morgan & Claypool Publishers: San Rafael, CA, USA, 2020. [Google Scholar]
  33. Nie, M.; Chen, D.; Wang, D. Graph embedding method based on biased walking for link prediction. Mathematics 2022, 10, 3778. [Google Scholar] [CrossRef]
  34. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Philip, S.Y. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4–24. [Google Scholar] [CrossRef]
  35. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81. [Google Scholar] [CrossRef]
  36. Alon, U.; Zilberstein, M.; Levy, O.; Yahav, E. code2vec: Learning Distributed Representations of Code. Proc. Acm Program. Lang. 2018, 3, 1–29. [Google Scholar] [CrossRef]
  37. Alon, U.; Brody, S.; Levy, O.; Yahav, E. code2seq: Generating Sequences from Structured Representations of Code. arXiv 2019, arXiv:1808.01400. [Google Scholar]
  38. Li, W.; Gao, Y.; Li, A.; Zhang, X.; Gu, J.; Liu, J. Sparse Subgraph Prediction Based on Adaptive Attention. Appl. Sci. 2023, 13, 8166. [Google Scholar] [CrossRef]
  39. Zhang, M.; Chen, Y. Link prediction based on graph neural networks. Adv. Neural Inf. Process. Syst. 2018, 31, 5171–5181. [Google Scholar]
  40. Zhang, M.; Chen, Y. Weisfeiler–Lehman Neural Machine for Link Prediction. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; KDD ’17. pp. 575–583. [Google Scholar]
  41. Keikha, M.M.; Rahgozar, M.; Asadpour, M. DeepLink: A Novel Link Prediction Framework based on Deep Learning. J. Inf. Sci. 2021, 47, 642–657. [Google Scholar] [CrossRef]
  42. Cai, L.; Ji, S. A Multi-Scale Approach for Graph Link Prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 7–12 February 2020; Volume 34, pp. 3308–3315. [Google Scholar]
  43. Mavromatis, C.; Karypis, G. Graph infoclust: Maximizing coarse-grain mutual information in graphs. In Proceedings of the Pacific-Asia Conference on Knowledge Discovery and Data Mining, Delhi, India, 11–14 May 2021; pp. 541–553. [Google Scholar]
  44. Dai, H.; Dai, B.; Song, L. Discriminative embeddings of latent variable models for structured data. In Proceedings of the International Conference on Machine Learning, New York City, NY, USA, 19–24 June 2016; pp. 2702–2711. [Google Scholar]
  45. Li, Y.; Tarlow, D.; Brockschmidt, M.; Zemel, R. Gated graph sequence neural networks. arXiv 2015, arXiv:1511.05493. [Google Scholar]
  46. Zhang, M.; Li, P.; Xia, Y.; Wang, K.; Jin, L. Labeling trick: A theory of using graph neural networks for multi-node representation learning. Adv. Neural Inf. Process. Syst. 2021, 34, 9061–9073. [Google Scholar]
  47. Cai, L.; Li, J.; Wang, J.; Ji, S. Line graph neural networks for link prediction. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 5103–5113. [Google Scholar] [CrossRef] [PubMed]
  48. Chen, H.; Chen, J.; Liu, D.; Zhang, S.; Hu, S.; Cheng, Y.; Wu, X. Link Prediction Based on the Sub-graphs Learning with Fused Features. In Proceedings of the International Conference on Neural Information Processing, Changsha, China, 20–23 November 2023; pp. 253–264. [Google Scholar]
  49. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2008, 20, 61–80. [Google Scholar] [CrossRef] [PubMed]
  50. Watts, D.J.; Strogatz, S.H. Collective dynamics of ‘small-world’ networks. Nature 1998, 393, 440–442. [Google Scholar] [CrossRef]
  51. Louis, P.; Jacob, S.A.; Salehi-Abari, A. Sampling enclosing subgraphs for link prediction. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 4269–4273. [Google Scholar]
  52. Hu, G. Weighted Sampling based Large-scale Enclosing Subgraphs Embedding for Link Prediction. Authorea Prepr. 2023. [Google Scholar] [CrossRef]
  53. Jaccard, P. Distribution de la flore alpine dans le bassin des Dranses et dans quelques régions voisines. Bull. Soc. Vaudoise Sci. Nat. 1901, 37, 241–272. [Google Scholar]
  54. Sohn, K.; Lee, H.; Yan, X. Learning structured output representation using deep conditional generative models. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar] [CrossRef]
  55. Hoffman, M.D.; Johnson, M.J. Elbo surgery: Yet another way to carve up the variational evidence lower bound. In Workshop in Advances in Approximate Bayesian Inference; NIPS: San Diego, CA, USA, 2016; Volume 1. [Google Scholar]
  56. Kullback, S.; Leibler, R.A. On information and sufficiency. Ann. Math. Stat. 1951, 22, 79–86. [Google Scholar]
  57. Yuan, W.; Han, Y.; Guan, D.; Han, G.; Tian, Y.; Al-Dhelaan, A.; Al-Dhelaan, M. Weighted enclosing subgraph-based link prediction for complex network. EURASIP J. Wirel. Commun. Netw. 2022, 2022, 65. [Google Scholar] [CrossRef]
  58. Pan, L.; Shi, C.; Dokmanić, I. Neural link prediction with walk pooling. arXiv 2021, arXiv:2110.04375. [Google Scholar]
  59. Granovetter, M. The strength of weak ties: A network theory revisited. Sociol. Theory 1983, 1, 201–233. [Google Scholar] [CrossRef]
  60. Gasteiger, J.; Weißenberger, S.; Günnemann, S. Diffusion improves graph learning. Adv. Neural Inf. Process. Syst. 2019, 32. [Google Scholar] [CrossRef]
  61. Newman, M.E. Finding community structure in networks using the eigenvectors of matrices. Phys. Rev. E 2006, 74, 036104. [Google Scholar] [CrossRef]
  62. Von Mering, C.; Krause, R.; Snel, B.; Cornell, M.; Oliver, S.G.; Fields, S.; Bork, P. Comparative assessment of large-scale data sets of protein–protein interactions. Nature 2002, 417, 399–403. [Google Scholar] [CrossRef]
63. Ackland, R. Mapping the US political blogosphere: Are conservative bloggers more prominent? In Proceedings of the BlogTalk Downunder 2005 Conference, Sydney, Australia, 19–22 May 2005. [Google Scholar]
  64. Sen, P.; Namata, G.; Bilgic, M.; Getoor, L.; Galligher, B.; Eliassi-Rad, T. Collective classification in network data. AI Mag. 2008, 29, 93. [Google Scholar] [CrossRef]
  65. Pei, H.; Wei, B.; Chang, K.C.C.; Lei, Y.; Yang, B. Geom-gcn: Geometric graph convolutional networks. arXiv 2020, arXiv:2002.05287. [Google Scholar]
  66. Li, P.; Wang, Y.; Wang, H.; Leskovec, J. Distance encoding: Design provably more powerful gnns for structural representation learning. arXiv 2020, arXiv:2009.00142. [Google Scholar]
  67. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  68. Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How powerful are graph neural networks? arXiv 2018, arXiv:1810.00826. [Google Scholar]
  69. Kipf, T.N.; Welling, M. Variational graph auto-encoders. arXiv 2016, arXiv:1611.07308. [Google Scholar]
  70. Frasca, F.; Rossi, E.; Eynard, D.; Chamberlain, B.; Bronstein, M.; Monti, F. Sign: Scalable inception graph neural networks. arXiv 2020, arXiv:2004.11198. [Google Scholar]
Figure 1. Overview of the framework of model AFD3S. The colored nodes are for differentiation; each node corresponds to its row in the matrix (e.g., node 1 to the first row of X). Nodes i and j (1 and 6, marked in red) are the target nodes for link prediction, with the dashed line and red question mark indicating an uncertain edge connection. The goal is to predict the connection probability p_ij using our model. The matrix H enhances each node's features by fusing them with those of neighboring nodes. The matrix Z, shown in the red box, is generated through feature diffusion with the diffusion matrix M.
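For readers who prefer code to diagrams, the following minimal sketch (Python/NumPy, not the authors' implementation; all names and the read-out are illustrative assumptions) mirrors the step summarized in the caption: augmented features H are propagated by a subgraph diffusion matrix M to give Z = MH, and the two target-node rows are turned into a probability p_ij.

import numpy as np

def feature_diffusion(M: np.ndarray, H: np.ndarray) -> np.ndarray:
    # Z = M H: one application of the diffusion operator simulates message passing.
    return M @ H

def score_pair(Z: np.ndarray, i: int, j: int) -> float:
    # Toy read-out: sigmoid of the inner product of the two target-node rows.
    logit = float(Z[i] @ Z[j])
    return 1.0 / (1.0 + np.exp(-logit))

# Tiny example: a 6-node subgraph with 4-dimensional augmented features.
rng = np.random.default_rng(0)
A = rng.integers(0, 2, size=(6, 6))
A = np.triu(A, 1); A = A + A.T                        # random symmetric adjacency
M = A / np.clip(A.sum(1, keepdims=True), 1, None)     # row-normalized diffusion matrix
H = rng.normal(size=(6, 4))                           # stands in for the augmented features
Z = feature_diffusion(M, H)
print(score_pair(Z, 0, 5))                            # p_ij for target nodes 1 and 6 (0-indexed)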
Figure 2. Schematic diagram of concatenated local feature augmentation. The yellow circles correspond to neighboring nodes, from whose local neighborhood distributions the generated features are drawn. Nodes are colored to differentiate their roles in the process: the nodes to be augmented are shown in green or purple, the yellow nodes are their neighborhood nodes, and the remaining white nodes lie outside the neighborhood (irrelevant). The original and generated features are then concatenated as inputs for downstream GNNs.
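The sketch below illustrates only the concatenation pattern of Figure 2, H = [X || X_gen]; the generator here is a per-node Gaussian fitted to neighbor features, a deliberately simple stand-in for the paper's conditional variational autoencoder, and the helper name is our own.

import numpy as np

def augment_features(A: np.ndarray, X: np.ndarray, seed: int = 0) -> np.ndarray:
    # For each node, draw a pseudo-feature from its local neighborhood
    # distribution (a Gaussian stand-in for the CVAE generator) and
    # concatenate it with the original feature: H = [X || X_gen].
    rng = np.random.default_rng(seed)
    X = X.astype(float)
    X_gen = np.zeros_like(X)
    for v in range(X.shape[0]):
        nbrs = np.flatnonzero(A[v])
        if nbrs.size == 0:                    # isolated node: reuse its own feature
            X_gen[v] = X[v]
            continue
        mu, sigma = X[nbrs].mean(0), X[nbrs].std(0) + 1e-6
        X_gen[v] = rng.normal(mu, sigma)
    return np.concatenate([X, X_gen], axis=1)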
Figure 3. Extraction of the sampled subgraph for the target node pair (u, v). The dashed line and red question mark indicate an uncertain edge connection.
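As a rough illustration of the extraction step in Figure 3 (a simplified sketch: the walk count, walk length, and uniform transition rule are assumptions, not the paper's exact sampler), short random walks from both endpoints collect the nodes of the sampled subgraph:

import numpy as np

def sample_subgraph(A: np.ndarray, u: int, v: int,
                    num_walks: int = 5, walk_len: int = 3, seed: int = 0):
    # Nodes visited by short random walks started from u and v form the
    # sparsely sampled enclosing subgraph around the target pair.
    rng = np.random.default_rng(seed)
    visited = {u, v}
    for start in (u, v):
        for _ in range(num_walks):
            cur = start
            for _ in range(walk_len):
                nbrs = np.flatnonzero(A[cur])
                if nbrs.size == 0:
                    break
                cur = int(rng.choice(nbrs))
                visited.add(cur)
    nodes = np.array(sorted(visited))
    return nodes, A[np.ix_(nodes, nodes)]     # kept node ids and induced adjacency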
Figure 4. The average AUC of all models on attributed and non-attributed datasets (over 10 runs).
Figure 5. The average AP of all models on attributed and non-attributed datasets (over 10 runs).
Figure 6. Experimental results of link prediction AUC using AFD3S and its three variants.
Figure 7. Experimental results of link prediction AP using AFD3S and its three variants.
Figure 8. AUC and AP results of AFD3S on Power under different sampled subgraph sizes.
Figure 9. AUC and AP results of AFD3S on Cora under different sampled subgraph sizes.
Figure 10. t-SNE plots on three datasets, with and without the use of our AFD3S. (a) Citeseer without AFD3S. (b) Citeseer with AFD3S. (c) Cora without AFD3S. (d) Cora with AFD3S. (e) PubMed without AFD3S. (f) PubMed with AFD3S.
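A t-SNE panel like those in Figure 10 can be reproduced from any set of node embeddings along the following lines (a generic sketch; the embedding source, perplexity, and coloring are assumptions rather than the authors' exact settings):

import numpy as np
from sklearn.manifold import TSNE
import matplotlib.pyplot as plt

def plot_tsne(embeddings: np.ndarray, labels: np.ndarray, title: str) -> None:
    # Project node embeddings to 2-D and color points by class label.
    xy = TSNE(n_components=2, random_state=0).fit_transform(embeddings)
    plt.scatter(xy[:, 0], xy[:, 1], c=labels, s=5, cmap="tab10")
    plt.title(title)
    plt.show()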
Figure 11. Heatmap visualizations of the effects of the number of hidden layers and the number of walks. (a) AUC heatmap of results on AFD3S variants. (b) AUC and AP Heatmaps on Cora. (c) AUC and AP Heatmaps on Power.
Table 1. Statistics of network datasets. NA stands for Not Applicable.
Datasets | Node | Edge | Avg Deg | Feat | Type
NS | 1466 | 2742 | 3.75 | NA | Collaboration Network
Power | 4941 | 6594 | 2.67 | NA | Electricity Network
Yeast | 2375 | 11,693 | 9.85 | NA | Biological Network
PB | 1222 | 16,714 | 27.36 | NA | Blog Network
Cora | 2708 | 4488 | 3.31 | 1433 | Citation Network
CiteSeer | 3327 | 3870 | 2.33 | 3703 | Citation Network
PubMed | 19,717 | 37,676 | 3.82 | 500 | Citation Network
Texas | 183 | 143 | 1.56 | 1703 | Web Network
Wisconsin | 251 | 197 | 1.57 | 1703 | Web Network
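The Avg Deg column follows directly from the node and edge counts of an undirected graph, 2|E|/|V|; a quick check (counts copied from Table 1, last-digit rounding may differ):

# Average degree = 2|E| / |V| for an undirected graph.
stats = {"NS": (1466, 2742), "Power": (4941, 6594), "Yeast": (2375, 11693), "PB": (1222, 16714)}
for name, (n, m) in stats.items():
    print(f"{name}: {2 * m / n:.2f}")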
Table 2. Average AUC for attributed and non-attributed datasets (over 10 runs). The best value is marked in bold.
Model | NS | Power | PB | Yeast | Cora | CiteSeer | PubMed | Texas | Wisconsin
GCN 91.75 ± 1.68 69.41 ± 0.90 90.80 ± 0.43 91.29 ± 1.11 89.14 ± 1.20 87.89 ± 1.48 92.72 ± 0.24 67.42 ± 9.39 72.77 ± 6.96
GraphSAGE 91.39 ± 1.73 64.94 ± 2.10 88.47 ± 2.56 87.41 ± 1.64 85.96 ± 2.04 84.05 ± 1.72 81.60 ± 1.22 53.59 ± 9.37 61.81 ± 9.66
GIN 83.26 ± 3.81 58.28 ± 2.61 88.42 ± 2.09 84.00 ± 1.94 68.74 ± 2.74 69.63 ± 2.77 82.49 ± 2.89 63.46 ± 8.87 70.82 ± 8.25
GAE 92.50 ± 1.71 68.17 ± 1.64 91.52 ± 0.35 93.13 ± 0.57 90.21 ± 0.98 88.42 ± 1.13 94.53 ± 0.69 68.67 ± 6.95 75.10 ± 8.69
VGAE 91.83 ± 1.49 66.23 ± 0.94 91.19 ± 0.85 90.19 ± 1.38 92.17 ± 0.72 90.24 ± 1.10 92.14 ± 0.19 74.61 ± 8.61 74.39 ± 8.39
GIC 90.88 ± 1.85 62.01 ± 1.25 73.65 ± 1.36 88.78 ± 0.63 91.42 ± 1.24 92.99 ± 1.14 91.04 ± 0.61 65.16 ± 7.87 75.24 ± 8.45
SEAL 98.63 ± 0.67 85.28 ± 0.91 95.07 ± 0.35 97.56 ± 0.32 90.29 ± 1.89 88.12 ± 0.85 97.82 ± 0.28 71.68 ± 6.85 77.96 ± 10.37
GCN+DE 98.66 ± 0.66 80.65 ± 1.40 95.14 ± 0.35 96.75 ± 0.41 91.51 ± 1.10 88.88 ± 1.53 98.15 ± 0.11 76.60 ± 6.40 74.65 ± 9.56
WESLP 98.68 ± 0.12 85.31 ± 0.35 94.68 ± 0.41 97.41 ± 0.18 89.91 ± 1.33 89.01 ± 1.25 96.69 ± 0.53 71.15 ± 4.41 77.98 ± 8.73
Scaled 98.88 ± 0.50 83.99 ± 0.84 94.53 ± 0.57 97.68 ± 0.17 90.55 ± 0.18 87.69 ± 1.67 97.94 ± 0.43 70.12 ± 7.44 76.89 ± 9.98
WalkPool 98.92 ± 0.52 90.25 ± 0.64 95.50 ± 0.26 98.16 ± 0.20 92.24 ± 0.65 89.97 ± 1.01 98.36 ± 0.11 78.44 ± 9.83 79.57 ± 11.02
AFD3S 98.98 ± 0.28 90.38 ± 0.80 95.84 ± 0.29 98.42 ± 0.26 98.68 ± 0.13 99.29 ± 0.28 99.12 ± 0.11 88.67 ± 5.18 94.14 ± 3.95
Table 3. Average AP for attributed and non-attributed datasets (over 10 runs). The best value is marked in bold.
Model | NS | Power | PB | Yeast | Cora | CiteSeer | PubMed | Texas | Wisconsin
GCN 92.64 ± 1.78 71.26 ± 1.81 93.14 ± 0.29 93.02 ± 1.31 91.21 ± 1.22 89.99 ± 1.19 94.21 ± 0.31 69.71 ± 8.63 75.03 ± 7.48
GraphSAGE 92.31 ± 1.43 65.20 ± 1.92 89.66 ± 1.66 88.55 ± 1.64 86.86 ± 1.96 84.95 ± 1.34 82.10 ± 2.08 54.36 ± 8.22 62.90 ± 7.12
GIN 83.46 ± 2.91 59.77 ± 3.11 89.93 ± 2.31 86.12 ± 1.89 70.64 ± 2.34 71.88 ± 2.48 83.87 ± 2.17 65.62 ± 9.05 73.12 ± 8.75
GAE 93.60 ± 1.53 70.09 ± 1.72 93.03 ± 0.27 95.21 ± 0.48 92.28 ± 0.48 90.92 ± 1.08 96.16 ± 0.71 71.02 ± 7.31 77.31 ± 8.15
VGAE 92.51 ± 1.09 67.97 ± 0.84 92.71 ± 0.33 92.15 ± 1.19 93.48 ± 0.64 92.31 ± 1.60 93.93 ± 0.22 76.77 ± 9.31 76.27 ± 7.25
GIC 91.42 ± 1.37 64.12 ± 1.18 72.98 ± 1.06 90.07 ± 0.48 93.01 ± 1.02 94.13 ± 1.24 92.74 ± 0.46 66.33 ± 8.27 77.87 ± 7.98
SEAL 98.61 ± 0.32 86.96 ± 1.15 95.13 ± 0.26 98.64 ± 0.28 92.44 ± 2.01 90.41 ± 1.15 98.12 ± 0.41 73.02 ± 5.99 79.34 ± 11.03
GCN+DE 98.88 ± 0.42 81.23 ± 1.87 96.2 ± 0.65 96.91 ± 0.19 92.11 ± 1.90 89.05 ± 1.26 99.61 ± 0.36 76.92 ± 7.36 75.84 ± 8.03
WESLP 98.62 ± 0.09 87.01 ± 0.95 94.32 ± 0.37 97.73 ± 0.22 92.41 ± 2.03 91.02 ± 1.14 97.87 ± 0.23 72.94 ± 3.81 79.62 ± 9.73
Scaled 98.68 ± 0.33 85.01 ± 0.71 94.18 ± 0.37 98.43 ± 0.21 92.35 ± 0.21 89.72 ± 1.38 98.08 ± 0.33 73.01 ± 6.54 78.97 ± 9.15
WalkPool 98.72 ± 0.73 91.03 ± 0.42 95.22 ± 0.41 98.71 ± 0.15 94.12 ± 1.05 91.85 ± 1.42 98.14 ± 0.53 81.04 ± 9.52 79.98 ± 11.42
AFD3S 98.75 ± 0.27 91.17 ± 0.68 95.34 ± 0.37 98.76 ± 0.21 98.84 ± 0.18 99.37 ± 0.31 99.05 ± 0.12 91.14 ± 2.92 94.08 ± 3.80
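The "mean ± standard deviation over 10 runs" entries in Tables 2 and 3 can be produced from per-run predictions along these lines (a sketch of an assumed evaluation loop using scikit-learn, not the authors' script):

import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def summarize_runs(run_labels, run_scores):
    # run_labels / run_scores: one array per run with 0/1 link labels and
    # predicted probabilities for the held-out positive and negative edges.
    aucs = [100 * roc_auc_score(y, s) for y, s in zip(run_labels, run_scores)]
    aps = [100 * average_precision_score(y, s) for y, s in zip(run_labels, run_scores)]
    return (np.mean(aucs), np.std(aucs)), (np.mean(aps), np.std(aps))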
Table 4. Comparison of the computation time between SGRLs and AFD3S models on the non-attributed datasets. The optimal time is marked in bold.
Datasets | Phase | SEAL | GCN + DE | WalkPool | AFD3S | Speed Up
NS  Train 4.91 ± 0.23 3.58 ± 0.12 7.66 ± 0.09 2.21 ± 0.01 3.47 (1.62)
Inference 0.14 ± 0.01 0.10 ± 0.01 0.41 ± 0.02 0.06 ± 0.01 6.83 (1.67)
Preproc. 17.86 11.73 12.18 30.21 0.59 (0.39)
Runtime 275.28 198.98 427.03 187.84 2.27 (1.06)
Power  Train 11.73 ± 0.02 8.62 ± 0.27 18.46 ± 0.76 5.23 ± 0.31 3.53 (1.65)
Inference 0.33 ± 0.01 0.25 ± 0.01 0.87 ± 0.06 0.13 ± 0.01 6.69 (1.92)
Preproc. 44.48 28.59 33.51 65.12 0.68 (0.44)
Runtime 658.14 479.4 1024.55 403.55 2.54 (1.19)
Yeast  Train 24.03 ± 0.40 18.41 ± 0.71 174.80 ± 1.06 9.33 ± 0.01 18.74 (1.97)
Inference 0.54 ± 0.05 0.46 ± 0.06 8.05 ± 0.11 0.15 ± 0.01 53.67 (3.07)
Preproc. 115.02 82.19 90.75 166.30 0.69 (0.49)
Runtime 1362.85 1040.72 9443.17 649.90 14.53 (1.60)
PB  Train 64.62 ± 5.59 55.82 ± 1.59 133.30 ± 0.52 15.23 ± 0.21 8.75 (3.67)
Inference 2.43 ± 0.10 2.01 ± 0.09 6.48 ± 0.15 0.27 ± 0.01 24 (7.44)
Preproc. 531.79 398.81 136.29 310.53 1.71 (0.44)
Runtime 3947.45 3346.80 7291.50 1001.73 7.28 (3.34)
Table 5. Comparison of the computation time between SGRLs and AFD3S models on the attributed datasets. The optimal time is marked in bold.
Datasets | Phase | SEAL | GCN + DE | WalkPool | AFD3S | Speed Up
Cora  Train 18.37 ± 1.49 14.85 ± 0.53 18.53 ± 0.91 5.36 ± 0.16 3.46 (2.77)
Inference 0.73 ± 0.12 0.62 ± 0.08 1.00 ± 0.15 0.15 ± 0.02 6.67 (4.13)
Preproc. 113.32 80.48 27.43 36.36 3.17 (0.75)
Runtime 1090.94 872.68 1034.33 303.15 3.60 (2.88)
CiteSeer  Train 12.54 ± 0.69 11.43 ± 0.71 15.32 ± 0.34 4.59 ± 0.12 3.34 (2.49)
Inference 0.58 ± 0.10 0.52 ± 0.07 0.87 ± 0.05 0.17 ± 0.02 5.12 (3.06)
Preproc. 93.52 71.97 22.82 73.01 1.28 (0.31)
Runtime 768.72 685.98 859.27 331.26 2.59 (2.07)
PubMed  Train 533.19 ± 4.64 423.73 ± 2.67 150.27 ± 6.22 29.71 ± 3.61 17.95 (5.06)
Inference 38.46 ± 1.08 34.44 ± 1.21 8.10 ± 1.06 0.63 ± 0.15 61.05 (12.86)
Preproc. 141.76 106.00 341.12 543.27 0.63 (0.20)
Runtime 30150.31 24311.00 8474.72 2591.31 11.64 (3.27)
Texas  Train 0.32 ± 0.01 0.31 ± 0.01 0.55 ± 0.01 0.14 ± 0.01 3.93 (2.21)
Inference 0.01 ± 0.00 0.01 ± 0.00 0.03 ± 0.01 0.01 ± 0.00 3.00 (1.00)
Preproc. 2.55 1.87 0.92 1.24 2.06 (0.74)
Runtime 20.46 18.55 32.54 8.91 3.65 (2.08)
Wisconsin  Train 0.47 ± 0.01 0.43 ± 0.01 0.85 ± 0.04 0.21 ± 0.02 4.05 (2.05)
Inference 0.02 ± 0.00 0.02 ± 0.00 0.06 ± 0.00 0.01 ± 0.00 6.00 (2.00)
Preproc. 3.29 2.63 1.08 1.56 2.11 (0.69)
Runtime 29.27 25.19 49.38 12.89 3.83 (1.95)
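The Speed Up column in Tables 4 and 5 is consistent with reporting the slowest SGRL baseline divided by the AFD3S time, with the fastest-baseline ratio in parentheses (our reading of the numbers, e.g., for NS training 7.66/2.21 ≈ 3.47 and 3.58/2.21 ≈ 1.62). A small helper under that assumption:

def speed_up(baseline_times, afd3s_time):
    # Ratio against the slowest and the fastest baseline, respectively.
    return max(baseline_times) / afd3s_time, min(baseline_times) / afd3s_time

hi, lo = speed_up([4.91, 3.58, 7.66], 2.21)   # NS training times from Table 4
print(f"{hi:.2f} ({lo:.2f})")                 # 3.47 (1.62)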
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
