Article

Multi-Order-Content-Based Adaptive Graph Attention Network for Graph Node Classification

1 College of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China
2 Fujian Key Laboratory of Pattern Recognition and Image Understanding, Xiamen 361024, China
3 Institute of Intelligence Science and Engineering, Shenzhen Polytechnic, Shenzhen 518055, China
* Author to whom correspondence should be addressed.
Symmetry 2023, 15(5), 1036; https://doi.org/10.3390/sym15051036
Submission received: 3 April 2023 / Revised: 24 April 2023 / Accepted: 6 May 2023 / Published: 7 May 2023
(This article belongs to the Section Computer)

Abstract

In graph-structured data, the node content contains rich information. Therefore, how to effectively utilize the content is crucial to improving the performance of graph convolutional networks (GCNs) on various analytical tasks. However, current GCNs do not fully utilize the content, especially multi-order content. For example, graph attention networks (GATs) only focus on low-order content, while high-order content is completely ignored. To address this issue, we propose a novel adaptive graph attention network that can fully utilize the features of multi-order content. Its core idea has the following novelties: First, we constructed a high-order content attention mechanism that focuses on high-order content to evaluate attention weights. Second, we propose a multi-order content attention mechanism that can fully utilize multi-order content, i.e., it combines the attention mechanisms of high- and low-order content. Furthermore, the mechanism is adaptive, i.e., it can perform a good trade-off between high- and low-order content according to the task requirements. Lastly, we applied this mechanism to construct a graph attention network with structural symmetry. This mechanism can more reasonably evaluate the attention weights between nodes, thereby improving the convergence of the network. In addition, we conducted experiments on multiple datasets and compared the proposed model with state-of-the-art models in multiple dimensions. The results validate the feasibility and effectiveness of the proposed model.

1. Introduction

Networks are ubiquitous in the real world, such as social [1,2], biological [3,4], and citation [5,6] networks. If each entity in a network is regarded as a node, and the interactions between entities are regarded as edges, the relationships between entities can be visualized as graph-structured data [7,8]. In recent years, neural network models [9] have been developed for graph learning [10], and graph neural networks (GNNs) [11] have emerged as a result. Graph node classification [12,13] is an important application scenario of GNNs. In a graph, each node has its own content and label; however, some node labels are hidden, and it is important to predict the unknown labels using the known information. There are many practical applications in real life, for example, using traffic node information in a traffic graph to determine whether there is a traffic congestion node [14], using personal information in a social network to classify users with incomplete information and provide better services to them [15], identifying nodes closely related to diseases in biological networks to assist in disease treatment [16], and using citation information to annotate and classify papers [17].
Graph neural networks [18,19,20] are key to processing graph-structured data. They can model the dependency relationships between nodes by treating real-world problems as connection and message-propagation problems between nodes in a graph, enabling the effective analysis of graph-structured data. As more experts and scholars conduct in-depth research on graph neural networks, an increasing number of advanced models have been proposed. For example, Dynamic-METIS (D-METIS) [21] divides a graph into multiple subgraphs while taking into account both the balance of vertices and the balance of cumulative dynamic changes. Therefore, it can learn the dynamics of large graphs more efficiently and is more accurate in regression tasks. Introducing a compatibility matrix into graph neural networks (CPGNN) [22] allows an interpretable class compatibility matrix to be learned on top of a graph neural network to discover potential neighbor nodes. In addition, non-mixed updates are performed using the aggregated information of similar and dissimilar neighbors. These improvements render the model applicable to heterophilous graphs and graphs lacking node features. The generalized PageRank graph neural network (GPR-GNN) [23] optimizes network information extraction by combining generalized PageRank with GNNs and learning generalized weights, which solves the oversmoothing problem of existing GNNs in multi-layer network structures and ultimately improves model performance. The graph mixed random network based on PageRank (PMRGNN) [24] uses a PageRank-based random propagation strategy for data augmentation and combines two feature extractors to supplement the mutual information between features. These improvements alleviate oversmoothing and improve generalization, so that the model achieves better performance in semi-supervised node classification tasks. In particular, graph convolutional networks (GCNs) [25] integrate neighborhood information by using a re-normalized first-order adjacency matrix to obtain meaningful representations of nodes in the network. GCNs have achieved tremendous success in graph node classification tasks.
Although graph convolutional networks and their variants [26,27] have been successful, a key weakness is that messages in the network are uniformly propagated along edges without considering the relative importance of node attributes. This limits their performance on general network data. Specifically, most graph convolutional networks cannot distinguish the relative importance of messages in the network, resulting in contaminated representations. Thus, attention-based GNNs were proposed to evaluate how the contributions of neighbors to the central node vary. Graph attention networks (GATs) [28] are the first attempt to introduce an attention module into graph node classification, in which the weighting coefficient for each neighbor is calculated on the basis of a self-attention strategy. Despite their numerous successes, GATs compute attention weight coefficients mainly on the basis of low-order node content. These strategies may be suboptimal because they ignore the importance of multi-order content information in the graph. For example, Figure 1 depicts a graph with 10 nodes where different colors represent different node categories. Our goal is to predict the category of Node i, so we need to update its features, that is, aggregate the features of its neighboring nodes. The low-order content is in a complex state, and the amount of information for each category is roughly equivalent. Intuitively, it is difficult to judge the importance of neighbor nodes solely on the basis of low-order node content. On the basis of high-order node content, however, we can easily determine the likely category of Node i. Intuitively, the more information about a specific category that is contained in the higher-order content, the more likely it is for the central node to belong to that category. This kind of higher-order content information has significant guiding implications for category prediction.
The example mentioned above demonstrates that high-order content information has a significant impact on achieving more discriminative node representations. In order to fully utilize multi-order content information and solve the existing problems in graph neural networks, this paper proposes a novel attention-based algorithm for semi-supervised node classification. The algorithm includes a multi-order hybrid attention mechanism in which both high- and low-order content is considered to compute the attention weights. Specifically, the features of low-order nodes are employed to calculate the low-order content coefficient, while the content of high-order nodes is considered to compute the high-order content coefficient. Then, these two coefficients are combined to form the attention weights. We constructed a graph attention network with structural symmetry on the basis of this mechanism. The proposed graph attention network is adaptive and can effectively evaluate the relative importance of high- and low-order content. These weights are used to linearly combine the feature vectors of neighboring nodes, and the features of the central node are then updated. Moreover, this hybrid attention mechanism can better discover friendly neighboring nodes. Extensive experiments show that combining these two types of information leads to more effective semi-supervised node classification.

2. Related Works

2.1. Graph Neural Network

Gori et al. [11] first proposed the concept of graph neural networks (GNNs), which address the problems of lost structural information and heavy dependence on pre-processing operations in earlier graph-structured data processing. Therefore, GNNs have achieved better performance than traditional techniques, received widespread attention from academia and industry, and achieved significant breakthroughs in several fields. For example, Yao et al. [29] constructed a single text graph for the corpus on the basis of word co-occurrence and document-word relations, initialized the graph neural network with one-hot representations, and then performed label prediction for unknown texts by jointly learning word and document embeddings under supervised learning. Jun et al. [30] used the power of graph neural networks to explore complex pairwise similarities in imaging/non-imaging features between subjects to perform brain-network-based psychiatric diagnoses that provide reliable, concise explanations for clinical diagnosis. Jiang et al. [31] comprehensively surveyed the progress of GNNs in solving traffic prediction problems, such as road flow and speed prediction, passenger flow prediction, and traffic demand prediction, which fully demonstrates the enormous potential of GNNs in intelligent transportation systems. An et al. [32] used GNNs to learn the structural and semantic knowledge of sentence relationships for sentiment analysis. Liu et al. [33] monitored the state of machinery through spatiotemporal graph neural networks, thereby discovering abnormal operations and ensuring the long-term stable operation of the machinery. Deshpande et al. [34] proposed a graph-neural-network-based quantum optimization algorithm that uses performance prediction to choose between quantum and classical optimizers for the grid parameter optimization problem.

2.2. Graph Convolutional Network

Later, inspired by convolutional neural networks (CNNs) [35], Bruna et al. [36] were the first to apply the learnable convolution operation from regular grids to irregular graph-structured data. Kipf et al. [25] simplified the definition of frequency-domain graph convolution and proposed graph convolutional networks (GCNs), which greatly improved the computational efficiency of graph convolutional models. GCNs combine the node features and topological structure information of graphs for prediction, and have achieved remarkable results in multiple tasks related to graph data. In recent years, with the deepening of research on graph neural networks, an increasing number of models have been proposed. Deep graph infomax (DGI) [37] is an unsupervised graph representation learning method. It trains an encoder model to maximize mutual information in order to capture the global information content of the entire graph and obtain node representations. The temporal graph convolutional network (T-GCN) [38] introduces gated recurrent units into GCNs to simultaneously capture spatial and temporal dependencies, thereby learning the constraints of the topological structure of dynamic networks and their changes over time. Experiments have shown that it can fully utilize spatiotemporal correlations to solve prediction problems. GraphSAGE [39] uses mini-batch training and builds node representations from a node's own features and the features of its sampled neighborhood, enabling large-scale graph-structured data analysis. Feature-combination-based graph convolutional neural networks (FC-GCNs) [40] generate adjacency matrices to encode the structural information of sentences and consider prior knowledge to improve the performance of the model in relation extraction. Geometric graph convolutional networks (Geom-GCN) [41] use a permutation-invariant aggregation scheme consisting of three modules, namely node embedding, structural neighborhood, and bi-level aggregation, to overcome the loss of structural information between nodes and neighbors and the inability to capture long-range dependencies. Hybrid deep graph convolutional networks (HDGCNs) [42] adopt a convolutional strategy that combines the spectral and spatial methods and construct a deep network model to strengthen this strategy, so that the network can better exploit the characteristics of the two convolutional methods and thus improve model performance. Frequency adaptation graph convolutional networks (FAGCNs) [43] can integrate low- and high-frequency signals and weigh different signals during message passing, thus enhancing the adaptability of the neural network, effectively preventing oversmoothing, and ultimately improving the effectiveness of the node representations acquired through learning.

3. Proposed Algorithm

How to learn a new feature representation for each node over multiple layers is an important issue for graph node classification. In this section, we introduce the proposed incorporated attention mechanism, in which two primary components are involved in attention computation: high- and low-order content. In other words, the weight required for the information aggregated by a central node from its neighbors depends on these two components. Figure 2 depicts the architecture of the proposed incorporated attention mechanism. The upper part of Figure 2 illustrates the generation of the high-order content coefficient, while the lower part illustrates the generation of the low-order content coefficient. Then, these two types of coefficients are combined adaptively in the message-passing step to update the node features. The downstream node classification task is performed by a softmax classifier on the final representations.
For convenience, mathematical preliminaries and notations used in this paper are first illustrated. Given a graph $G(V, E)$ with a set of nodes $V$ and a set of edges $E$, each node $i$ has an original feature vector $x_i$ and a true label $y_i$. Matrix $X = [x_1, x_2, \ldots, x_n]^T$ stacks all node vectors on top of one another. Furthermore, let $A = [a_1, a_2, \ldots, a_n]^T$ denote the adjacency matrix. For node $i$, $N_i$ contains the set of indices for its 1-hop neighbors. We only know the labels of a subset of the nodes, and the proposed algorithm aims to learn a model $Z = f(X, A) \in \mathbb{R}^{n \times l}$ that predicts the unknown labels. For each message passing step $l$, let $h_i^{l-1} \in \mathbb{R}^{F}$ denote the input features for node $i$. Then, its output vector is $h_i^{l} \in \mathbb{R}^{F'}$. Naturally, the initial node representations are $X$. We elaborate our algorithm in the following.
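To make this notation concrete, the following minimal PyTorch sketch builds the objects defined above for a toy graph; the sizes, tensors, and variable names here are illustrative assumptions, not the datasets used in the paper.

```python
import torch

# Toy instantiation of the notation above (illustrative sizes only).
n, F = 5, 8                       # number of nodes |V|, input feature dimension
X = torch.randn(n, F)             # feature matrix X = [x_1, ..., x_n]^T
A = torch.tensor([[0, 1, 1, 0, 0],
                  [1, 0, 0, 1, 0],
                  [1, 0, 0, 1, 1],
                  [0, 1, 1, 0, 0],
                  [0, 0, 1, 0, 0]], dtype=torch.float)   # adjacency matrix A
y = torch.tensor([0, 1, 0, 1, 2])                        # labels y_i (only a subset known)
N_0 = A[0].nonzero().flatten()                           # N_i: 1-hop neighbors of node 0
```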

3.1. Low-Order Content Coefficient Generation

We follow the method introduced in GAT for low-order content coefficient generation. For an $L$-layer model, the low-order content coefficient $f_{ij}$ is calculated with Equation (1). $f_{ij}$ describes the relevance between nodes $i$ and $j$ and is normalized among the neighbor nodes of node $i$.
f_{ij} = \frac{\exp\left(\sigma\left(\alpha^{T} \cdot \left[W^{l} h_i \,\|\, W^{l} h_j\right]\right)\right)}{\sum_{k \in N_i} \exp\left(\sigma\left(\alpha^{T} \cdot \left[W^{l} h_i \,\|\, W^{l} h_k\right]\right)\right)},
where $\|$ is the concatenation operation, and $\sigma(\cdot)$ is a non-linear activation function such as LeakyReLU. $W^{l} \in \mathbb{R}^{F' \times F}$ and $\alpha \in \mathbb{R}^{2F'}$ are parameters that need to be tuned in training. The greater the value of $f_{ij}$, the more important the relevance between nodes $i$ and $j$ relative to other neighbors. We consider the content information of the node itself to be key to node classification. Therefore, we introduce self-connections of nodes into adjacency matrix $A$, denoted as $\hat{A}$. The calculation formula is as follows:
\hat{A} = A + I,
where I is the identity matrix that is used to add a self-connection to each node. In the experimental part, we prove the effectiveness of this operation.
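The following sketch shows one way Equations (1) and (2) can be computed in PyTorch; it follows the standard GAT formulation, and the function and variable names are our own illustrative choices rather than the authors' code.

```python
import torch
import torch.nn.functional as F_nn

def low_order_coefficients(H, A_hat, W, a, slope=0.2):
    """Eq. (1): low-order content coefficients f_ij, normalized over neighbors.
    H: (n, F) input features; W: (F, F') weight; a: (2F',) attention vector;
    A_hat: adjacency with self-connections from Eq. (2)."""
    Wh = H @ W                                             # W^l h_i for all nodes
    d = Wh.size(1)
    # a^T [W h_i || W h_j], computed pairwise by splitting a into two halves
    scores = (Wh @ a[:d]).unsqueeze(1) + (Wh @ a[d:]).unsqueeze(0)
    scores = F_nn.leaky_relu(scores, negative_slope=slope)        # sigma(.)
    scores = scores.masked_fill(A_hat == 0, float("-inf"))        # restrict to neighbors
    return torch.softmax(scores, dim=1)                           # rows sum to 1

# Eq. (2): add self-connections before computing the coefficients
# A_hat = A + torch.eye(A.size(0))
```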

3.2. High-Order Content Coefficient Generation

Considering adjacency matrix A, which describes the topological information, we generated high-order node connections from it. Initial weights are given to the connections through the topological information. The calculation formulas for this process are as follows:
\bar{A}_{ij} = \frac{\hat{A}_{ij}}{\sum_{k=1}^{|V|} \hat{A}_{ik}},
M_{ij} = \sum_{k=1}^{|V|} \bar{A}_{ik} \bar{A}_{jk}.
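A direct translation of Equations (3) and (4) into matrix operations is sketched below; the helper name is illustrative. Note that $M_{ij}$ reduces to the inner product of rows $i$ and $j$ of the row-normalized adjacency matrix.

```python
import torch

def high_order_connections(A):
    """Eqs. (3)-(4): high-order connection matrix M from the self-looped adjacency."""
    A_hat = A + torch.eye(A.size(0))                  # add self-connections (Eq. (2))
    A_bar = A_hat / A_hat.sum(dim=1, keepdim=True)    # Eq. (3): row normalization
    M = A_bar @ A_bar.T                               # Eq. (4): M_ij = sum_k A_bar_ik * A_bar_jk
    return M
```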
We used a graph convolution kernel encoder to learn the high-order content of nodes. Using a graph convolutional algorithm, we could integrate the information of neighbor nodes. The formula is as follows:
q^{h} = \sigma(M X W),
where W is a mapping matrix used to extract relevant information. We expected it to be able to extract important information from the node content after learning. Then, the global content is obtained through the following formula:
q^{g} = \frac{1}{|V|} \sum_{i=1}^{|V|} q_i^{h}.
Inspired by deep graph infomax, we optimized W by maximizing the mutual information. By maximizing the mutual information, the high-order content is more representative. The objective function in this process is as follows:
\mathcal{L} = \frac{1}{2|V|} \sum_{i=1}^{|V|} \left( \log \mathcal{D}\left(q_i^{h}, q^{g}\right) + \log\left(1 - \mathcal{D}\left(\tilde{q}_i^{h}, q^{g}\right)\right) \right),
where $(q_i^{h}, q^{g})$ are positive sample pairs, and $(\tilde{q}_i^{h}, q^{g})$ are negative sample pairs. $\tilde{q}_i^{h}$ is obtained by perturbing input matrix $X$ row-wise. The sample pairs are then scored via the discriminators $\mathcal{D}(q_i^{h}, q^{g}) = \sigma\left((q_i^{h})^{T} W q^{g}\right)$ and $\mathcal{D}(\tilde{q}_i^{h}, q^{g}) = \sigma\left((\tilde{q}_i^{h})^{T} W q^{g}\right)$. The objective is to maximize the mutual information so that the high-order content obtained through learning is more representative.
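A compact sketch of this DGI-style optimization is given below. The layer sizes, the use of nn.Bilinear for the discriminator, and the row-shuffling used to build negative samples are our assumptions for illustration; the paper only specifies the form of Equations (5)-(7).

```python
import torch
import torch.nn as nn

class HighOrderEncoder(nn.Module):
    """Learns the high-order content q^h by maximizing mutual information (Eqs. (5)-(7))."""
    def __init__(self, in_dim, hid_dim):
        super().__init__()
        self.W = nn.Linear(in_dim, hid_dim, bias=False)   # mapping matrix W of Eq. (5)
        self.disc = nn.Bilinear(hid_dim, hid_dim, 1)      # discriminator D(., .)

    def forward(self, M, X):
        q_h = torch.sigmoid(M @ self.W(X))                # Eq. (5): q^h = sigma(MXW)
        q_g = q_h.mean(dim=0)                             # Eq. (6): global content q^g
        return q_h, q_g

    def loss(self, M, X):
        q_h, q_g = self.forward(M, X)
        q_h_neg, _ = self.forward(M, X[torch.randperm(X.size(0))])  # row-perturbed negatives
        g = q_g.unsqueeze(0).expand(q_h.size(0), -1)
        pos = torch.sigmoid(self.disc(q_h, g))            # D(q_i^h, q^g)
        neg = torch.sigmoid(self.disc(q_h_neg, g))        # D(q~_i^h, q^g)
        eps = 1e-8
        # negative of Eq. (7), so minimizing this maximizes the mutual information
        return -0.5 * (torch.log(pos + eps) + torch.log(1 - neg + eps)).mean()
```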
After obtaining the high-order content of the node, the importance of different neighbor nodes is judged on the basis of the high-order content, that is, the high-order content weight coefficient is calculated. The measurement strategy is as follows:
s_{ij} = \mathrm{softmax}\left(\left(\left(\sum_{k=1}^{n} \left|q_{ik}^{h} - q_{jk}^{h}\right|^{2}\right)^{\frac{1}{2}} + 1\right)^{-1}\right),
where $s_{ij}$ represents the degree of recognition of the high-order content of nodes $i$ and $j$. The higher the value, the higher the similarity between the high-order content of the two nodes. In other words, the value of $s_{ij}$ indicates the importance of node $j$ to node $i$ in terms of high-order content.
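The similarity-based coefficient of Equation (8) can be computed for all node pairs at once, as in the sketch below; restricting the softmax normalization to each node's neighbors is our reading of the formula and should be treated as an assumption.

```python
import torch

def high_order_coefficients(q_h, A_hat):
    """Eq. (8): high-order content coefficients s_ij from the learned content q^h."""
    dist = torch.cdist(q_h, q_h, p=2)                      # Euclidean distance between q_i^h, q_j^h
    scores = 1.0 / (dist + 1.0)                            # closer high-order content -> larger score
    scores = scores.masked_fill(A_hat == 0, float("-inf"))
    return torch.softmax(scores, dim=1)                    # s_ij
```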

3.3. Adaptive Attention Mechanism

In order to incorporate the high- and low-order content coefficients, we introduce two learnable parameters, $g_l$ and $g_h$, to adjust the relative importance between $s_{ij}$ and $f_{ij}$. $r_l$ and $r_h$ are the normalized results of $g_l$ and $g_h$, respectively, as shown in Equations (9) and (10).
r_{l} = \frac{\exp(g_{l})}{\exp(g_{l}) + \exp(g_{h})}.
r_{h} = \frac{\exp(g_{h})}{\exp(g_{l}) + \exp(g_{h})}.
Then, the attention weights are computed as follows.
e_{ij} = \frac{\exp\left(r_{l} \cdot f_{ij} + r_{h} \cdot s_{ij}\right)}{\sum_{k \in N_i} \exp\left(r_{l} \cdot f_{ik} + r_{h} \cdot s_{ik}\right)}.
In Equation (11), $r_l$ and $r_h$ are used to control the proportions between $f_{ij}$ and $s_{ij}$. At the same time, they can be adjusted adaptively according to the task requirements, such as focusing on high- or low-order content. $e_{ij}$ measures the influence of node $j$ on node $i$. $e_{ij}$ is different from $e_{ji}$ because nodes $i$ and $j$ have different neighbors.
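Equations (9)-(11) amount to a softmax over two scalar gates followed by a neighbor-restricted softmax over the mixed coefficients. A minimal sketch, with $g_l$ and $g_h$ held as scalar learnable tensors, is shown below; the function name is illustrative.

```python
import torch

def adaptive_attention(f, s, g_l, g_h, A_hat):
    """Eqs. (9)-(11): combine low-order (f) and high-order (s) coefficients adaptively.
    g_l, g_h are learnable 0-dim tensors, e.g. nn.Parameter(torch.tensor(0.0))."""
    r = torch.softmax(torch.stack([g_l, g_h]), dim=0)       # Eqs. (9)-(10): r_l, r_h
    mixed = r[0] * f + r[1] * s                             # r_l * f_ij + r_h * s_ij
    mixed = mixed.masked_fill(A_hat == 0, float("-inf"))    # normalize over neighbors only
    return torch.softmax(mixed, dim=1)                      # Eq. (11): attention weights e_ij
```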

3.4. Feature Updating

After obtaining the attention weights over the central node's neighbors, its features can be updated according to the message-passing mechanism.
h_i^{l+1} = \sigma\left(\sum_{j \in N_i} e_{ij} W^{l} h_j\right).
To render the self-attention calculation more stable, we applied multi-head attention for feature transformation and concatenated the resulting features as the returned output:
h_i^{l+1} = \big\Vert_{k=1}^{K} \, \sigma\left(\sum_{j \in N_i} e_{ij}^{k} W^{k} h_j\right).
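In matrix form, Equations (12) and (13) are an attention-weighted aggregation per head followed by concatenation, as in the sketch below; the head count and weight shapes are illustrative assumptions.

```python
import torch
import torch.nn.functional as F_nn

def multi_head_update(H, e_heads, W_heads):
    """Eqs. (12)-(13): per-head attention-weighted aggregation, then concatenation.
    e_heads: list of (n, n) attention matrices; W_heads: list of (F, F') weight matrices."""
    outputs = [F_nn.elu(e @ (H @ W)) for e, W in zip(e_heads, W_heads)]  # Eq. (12) per head
    return torch.cat(outputs, dim=1)                                     # Eq. (13): concatenate K heads
```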
Because the attention mechanism is adaptive, different attention weights are calculated in different ways. We expect different attention weights to notice different information. For example, some focus on high-order content, while others focus on low-order content.
For node classification, the cross-entropy loss is adopted, in which the inconsistency between the ground-truth and predicted labels is minimized. In the training stage, we constructed a two-layer network structure for the proposed approach. The first layer utilizes multi-head attention. The output of the second layer is used for prediction, so multi-head attention is no longer sensible there. The network is trained for multiple epochs until convergence.
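The training objective can be sketched end to end as below. The attention matrix is replaced by a random placeholder and the layer sizes are made up, so this only illustrates the two-layer forward pass and the cross-entropy loss on labeled nodes, not the authors' implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F_nn

# Illustrative sizes and a placeholder attention matrix (not the real adaptive weights).
n, F_in, F_hid, C = 100, 16, 8, 3
X = torch.randn(n, F_in)
e = torch.softmax(torch.randn(n, n), dim=1)          # stand-in for the weights of Eq. (11)
y = torch.randint(0, C, (n,))
train_idx = torch.arange(20)                         # labeled subset

W1 = nn.Parameter(0.1 * torch.randn(F_in, F_hid))    # first (aggregation) layer
W2 = nn.Parameter(0.1 * torch.randn(F_hid, C))       # second (prediction) layer
opt = torch.optim.Adam([W1, W2], lr=0.005, weight_decay=5e-4)

for epoch in range(100):
    h = torch.relu(e @ (X @ W1))                     # layer 1: attention-weighted aggregation
    logits = e @ (h @ W2)                            # layer 2: logits fed to softmax/cross-entropy
    loss = F_nn.cross_entropy(logits[train_idx], y[train_idx])
    opt.zero_grad()
    loss.backward()
    opt.step()
```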

4. Experiments

We validated our model against a wide variety of state-of-the-art baselines and previous approaches, and it achieved the best performance across all of them. The datasets, comparison algorithms, results, and qualitative analysis are presented in this section.

4.1. Datasets

We conducted experiments on three widely used datasets: Cora, Cite, and PubMed [44,45]. These are standard citation networks in which nodes represent documents, and edges represent citations. The original node features correspond to elements of a bag-of-words representation of a document. Each node has a class label. All datasets were split into three parts: training, validation, and testing. Considering the huge number of nodes and edges in PubMed, a connected subgraph was selected for our experiments. Table 1 shows the statistics of these datasets.

4.2. Comparison Algorithms

We compared our method with eight typical algorithms: Deepwalk, DGI, the simplifying graph convolutional network (SGC), GCN, the edge-enhanced graph neural network (EGNN), the graph conjoint attention network (CAT), the heat kernel graph convolutional network (HKGCN), and GAT. For convenience, we denote the proposed algorithm as "Our". "Our-I" denotes the variant in which the self-connections of nodes were not considered, used to prove the effectiveness of the self-connection operation.
Deepwalk (https://github.com/shenweichen/GraphEmbedding) (accessed on 5 May 2023) [46] is a representative and successful work of early network representation learning. Deepwalk learns a social representation of a network from truncated random walks. It is also a method that only considers graph structure information when learning graph representation.
DGI (https://github.com/PetarV-/DGI) (accessed on 5 May 2023) [37] is an unsupervised graph embedding algorithm based on mutual information. The goal is to maximize the mutual information between the local representation and the corresponding graph overview representation.
SGC (https://github.com/Tiiiger/SGC) (accessed on 5 May 2023) [47] reduces the complexity of GCN by removing the nonlinearities between layers and collapsing the successive weight matrices into a single linear transformation.
GCN (https://github.com/tkipf/pygcn) (accessed on 5 May 2023) [25] is a convolutional neural network variant on graph data. It learns the representation of the graph data by stacking several first-order spectral filters in front of nonlinear functions.
EGNN (https://github.com/vietph34/Edge_GNN) (accessed on 5 May 2023) [48] is a novel graph convolution framework that uses edge features across the network layer to more effectively use one-dimensional edge features.
CAT (https://github.com/he-tiantian/cats) (accessed on 5 May 2023) [49] is a conjoint attention network that incorporates feature-based attention with structural interventions that may describe the global relation among nodes. It uses matrix factorization (MF) to solve the objective function approaching the adjacency matrix to obtain a structural correlation. The resulting coefficients are used to capture the weighted mean attention with feature-based correlation.
HKGCN (https://github.com/hazdzz/HKGCN) (accessed on 5 May 2023) [50] generalizes SGC into a linear model through the heat kernel and can aggregate local information, so it improves model performance while maintaining the efficiency advantage of SGC.
GAT (https://github.com/Diego999/pyGAT) (accessed on 5 May 2023) [28] is a network for node classification based on attention. Its core idea is to update the node representation via the weight value of the content of each low-order adjacent node relative to the content of the central node.

4.3. Parameter Settings

The proposed model is a two-layer framework. The first layer is multi-head attention, and the outputs of the first layer are concatenated. The second layer consists of a single attention head followed by softmax activation, and the element corresponding to the maximal value of its final output is the prediction result. Regularization is used in the model to prevent overfitting. During training, we used λ = 0.0005 for the Cora and Cite datasets, and λ = 0.001 for PubMed. Dropout [51] with p = 0.6 was also applied to the input of each layer. All the above measures effectively prevent overfitting and improve the generalization performance of the model. The training objective is to minimize the cross-entropy on the training nodes; the learning rate was 0.005 for all datasets except PubMed, where it was 0.01. This difference is due to the different proportions of training data in each dataset. During training, we applied early stopping on the cross-entropy loss of the validation nodes with a patience of 100 epochs.
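The hyperparameter choices above can be wired up as in the following snippet; the optimizer (Adam), the function names, and the early-stopping helper are our assumptions, while the learning rates, weight decay values, and patience follow the text.

```python
import torch

def make_optimizer(model, dataset):
    """Learning rate and weight decay (lambda) per dataset, as described above."""
    weight_decay = 0.001 if dataset == "PubMed" else 0.0005
    lr = 0.01 if dataset == "PubMed" else 0.005
    return torch.optim.Adam(model.parameters(), lr=lr, weight_decay=weight_decay)

def should_stop(val_losses, patience=100):
    """Early stopping: no improvement of validation cross-entropy for `patience` epochs."""
    if len(val_losses) <= patience:
        return False
    return min(val_losses[-patience:]) >= min(val_losses[:-patience])
```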

4.4. Evaluation

The experimental results of all methods were evaluated via Micro-F1, Macro-F1, and Weighted-F1. In those evaluations, precision P and recall R were used in the calculation process.
P = \frac{TP}{TP + FP}.
R = \frac{TP}{TP + FN}.
where $TP$ is the number of instances that are positive and were predicted correctly, $FP$ is the number of instances that are negative but were predicted to be positive, and $FN$ is the number of instances that are positive but were predicted to be negative.
Equations (14) and (15) are calculated over all tested instances, which means that they do not need to distinguish between categories. Micro-F1 directly uses the precision P and recall R to calculate the F1-score:
\text{Micro-F1} = \frac{2PR}{P + R}.
If Equations (14) and (15) are calculated for each category, Macro-F1 can be calculated on their basis. Specifically, $P_e$ and $R_e$ denote the precision and recall for class label $e$, respectively. Then, Macro-F1 can be calculated:
\text{Macro-F1} = \frac{\sum_{e \in C} \frac{2 P_e R_e}{P_e + R_e}}{|C|}.
Weighted-F1 considers the importance of different categories. The ratio $a_e$ of the number of instances in category $e$ to the total number of instances is used as the weight:
\text{Weighted-F1} = \sum_{e \in C} a_e \frac{2 P_e R_e}{P_e + R_e}.
Through the multiple evaluation indices above, we tested each model from multiple angles and dimensions, which is greatly significant to comprehensively evaluate the model performance. Below, we fully demonstrate the great advantages of the proposed model over the current state-of-the-art models.
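These three measures correspond to the "micro", "macro", and "weighted" averaging modes of scikit-learn's f1_score, which offers a quick way to reproduce them; the toy labels below are for illustration only.

```python
from sklearn.metrics import f1_score

y_true = [0, 1, 2, 2, 1, 0, 2]    # ground-truth labels (toy example)
y_pred = [0, 1, 2, 1, 1, 0, 0]    # predicted labels

micro = f1_score(y_true, y_pred, average="micro")        # Micro-F1, Eq. (16)
macro = f1_score(y_true, y_pred, average="macro")        # Macro-F1, Eq. (17)
weighted = f1_score(y_true, y_pred, average="weighted")  # Weighted-F1, Eq. (18)
print(micro, macro, weighted)
```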

4.5. Result Analysis

In the node classification task, we analyzed multiple indicators on three datasets. For each set of testing data, all approaches were run 10 times to achieve statistically steady performance. Table 2 summarizes the Micro-F1, Macro-F1, and Weighted-F1 results over the three datasets. For each measure, the best result on each dataset is highlighted in bold. Deepwalk, as an early unsupervised model, clearly achieved the worst performance. The fundamental reason for this is that Deepwalk ignores node feature information in the graph. GCN compensates for this deficiency by utilizing node features for information aggregation and feature updating, which allows it to perform much better than Deepwalk. DGI, as a variant of GCN, performed worse than the semi-supervised models because of its unsupervised learning strategy. It is interesting that SGC, as a simplified version of GCN, improved efficiency while maintaining performance. By introducing the heat kernel into SGC, HKGCN achieved better results than GCN in some cases while retaining the advantages of SGC. The weight allocation process of GCN is non-learnable, while that of GAT is learnable. Therefore, when faced with different tasks, GAT can continuously optimize its weight allocation strategy according to the task requirements, thus achieving better generalization performance. Although EGNN uses a different weighting strategy than GCN, its performance is still limited for the same reasons. GAT achieved good performance by relying on an attention mechanism based on low-order content; however, as mentioned earlier, this has great limitations. Our proposed method overcomes these limitations through an adaptive attention mechanism that evaluates both the low- and high-order content in the graph. These improvements allow our model to better identify meaningful neighboring nodes. Our model performed the best among all models on the three datasets across all nine experimental results. The results show that our model outperformed the other models in all cases with very stable performance, indicating good generalization ability. In the following section, we describe the ablation experiments performed on the model to better evaluate the effectiveness of each module. "Our-I", i.e., "Our" with self-connections removed, showed a clear decline in performance, which proves the effectiveness of adding self-connections. In conclusion, our model effectively addresses the limitations of existing graph neural network models by introducing an adaptive attention mechanism that considers multi-order content, and it achieved the best performance on the three datasets.

4.6. Ablation Analysis

To further explore the key components of our proposed hybrid attention mechanism, we designed two variants of our method for ablation analysis, "Our-high" and "Our-low". Our-low only utilizes low-order-content-based attention, while Our-high only utilizes high-order-content-based attention. For convenience, all variants were single-layered, with no dropout or multi-head attention. In this way, other factors that affect the experimental results were removed. Table 3 reports the results of the compared algorithms, clearly indicating that the hybrid attention mechanism achieved the best performance. Our-low achieved the worst performance in most cases, which indicates that the high-order-content-based attention was superior to the low-order-content-based attention to some extent. The attention mechanism based on multi-order content achieved the best performance in most experimental results. This fully demonstrates that the key components of the model (high- and low-order content) are indispensable to improving the performance of the model.

Analysis of the Adaptive Mechanism

As shown in Equations (9)-(11), $r_l$ and $r_h$ adjust the scores on the basis of low- and high-order content, leading to a standard weighted average. The two variables are tuned during training, and it is interesting to explore their importance in the proposed model. We visualized those two weight coefficients in our model over the Cora, Cite, and PubMed datasets, as shown in Figure 3. According to Figure 3, $r_l$ and $r_h$ were equally important on the PubMed dataset, while the low-order-content-based scores were more important on the Cora and Cite datasets. To sum up, both weighted scores played a role in constituting the attention of the model. However, their importance differed across application domains. This result proves that the adaptive attention mechanism can adjust itself according to the task requirements; in other words, it is very effective.

4.7. Example Display

As shown in Figure 4, we visualized several nodes to be predicted and their neighbor nodes on the Cora dataset. For nodes belonging to different categories, we used different colors to distinguish them. The categories corresponding to each color are shown in Figure 4d. Figure 4 shows that the composition of the neighboring nodes of the nodes to be predicted was complex, which was a great test for the classification performance of the graph attention network. The key to improving network performance is finding important neighboring nodes and assigning appropriate attention weights to them. We conducted comparative experiments between our proposed model and the state-of-the-art GAT model. Due to the limitations of the attention mechanism based on low-order content, GAT incorrectly predicted nodes 85299, 144330, and 34708 as "reinforcement learning", "case-based", and "theory", respectively. After comprehensively considering the low- and high-order content of the graph, our proposed model provided the correct predictions: "neural networks", "theory", and "neural networks", respectively. This shows that our model can deal with complex graph-structured data better than GAT.

5. Conclusions

The proposed multi-order content graph attention network is a novel graph neural network model that addresses the issue of existing graph neural network models being unable to fully utilize multi-order content. Specifically, we designed a multi-order content attention mechanism and then proposed a novel graph neural network architecture with symmetry to fully utilize multi-order content information in the graph. Our model considers the similarity of both low- and high-order content information. This improvement gives the network stronger expressive and generalization capabilities when performing tasks such as node classification and graph classification. Unlike traditional graph neural networks, the proposed network is adaptive and can self-adjust according to the task requirements. This adaptability enables the network to perform optimally in different task scenarios, further improving its performance. We conducted multi-dimensional evaluations of the proposed model on multiple datasets for graph node classification, and performed extensive experiments to validate its performance. The experimental results show that the proposed model consistently and significantly outperformed the state-of-the-art models on all datasets, demonstrating that the improvement indeed enhanced the representation learning ability of the graph neural networks. Detailed ablation experiments also confirmed the effectiveness of the independent components in the model. This provides new ideas and directions for future related research. We will continue to improve this method and strive to enhance its representational power for better application in practical scenarios.

Author Contributions

Conceptualization, Y.C. and X.-Z.X.; methodology, Y.C. and X.-Z.X.; software, Y.C.; validation, Y.C.; formal analysis, Y.C.; investigation, W.W.; resources, W.W.; data curation, Y.C.; writing—original draft preparation, Y.C. and X.-Z.X.; writing—review and editing, Y.C., X.-Z.X. and Y.-F.H.; visualization, Y.C.; supervision, X.-Z.X.; project administration, X.-Z.X.; funding acquisition, X.-Z.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Education and Scientific Research Project of Fujian Province (no. JAT210351), and the Natural Science Foundation of Xiamen (no. 3502Z20227067).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Rani, P.; Tayal, D.K.; Bhatia, M.P.S. Sociocentric SNA on fuzzy graph social network model. Soft Comput. 2022, 1–16. [Google Scholar] [CrossRef]
  2. Tao, Y.; Li, Y.; Zhang, S.; Hou, Z.; Wu, Z. Revisiting graph based social recommendation: A distillation enhanced social graph network. In Proceedings of the ACM Web Conference 2022, New York, NY, USA, 25–29 April 2022. [Google Scholar]
  3. Gürbüz, M.B.; Rekik, I. MGN-Net: A multi-view graph normalizer for integrating heterogeneous biological network populations. Med. Image Anal. 2021, 71, 102–119. [Google Scholar]
  4. Koutrouli, M.; Karatzas, E.; Paez, E.D.; Pavlopoulos, G.A. A guide to conquer the biological network era using graph theory. Front. Bioeng. Biotechnol. 2020, 8, 34. [Google Scholar] [CrossRef] [PubMed]
  5. Dai, T.; Zhao, J.; Li, D.; Zhao, X.; Pan, S. Heterogeneous deep graph convolutional network with citation relational BERT for COVID-19 inline citation recommendation. Expert. Syst. Appl. 2023, 213, 118841. [Google Scholar] [CrossRef]
  6. Hung, B.T. Link prediction in paper citation network based on deep graph convolutional neural network. Comput. Netw. Big Data IoT 2022, 897–907. [Google Scholar]
  7. Chen, J.; Xu, H.; Wang, J.; Xuan, Q.; Zhang, X. Adversarial detection on graph structured data. In Proceedings of the 2020 Workshop on Privacy-Preserving Machine Learning in Practice (PPMLP), New York, NY, USA, 26–30 November 2020. [Google Scholar]
  8. Nguyen, D.H.; Tsuda, K. On a linear fused Gromov-Wasserstein distance for graph structured data. Pattern Recognit. 2023, 138, 109351. [Google Scholar] [CrossRef]
  9. Yang, H.; Liu, Z. Image recognition technology of crop diseases based on neural network model fusion. J. Electron. Imaging 2023, 32, 112–122. [Google Scholar] [CrossRef]
  10. Jiang, B.; He, W.; Wu, X.; Xiang, J.; Hong, L.; Sheng, W. Semi-supervised feature selection with adaptive graph learning. Acta Electonica Sin. 2022, 50, 1643–1652. [Google Scholar] [CrossRef]
  11. Gori, M.; Monfardini, G.; Scarselli, F. A new model for learning in graph domains. In Proceedings of the 2005 IEEE International Joint Conference on Neural Networks (IJCNN), Montreal, QC, Canada, 31 July–4 August 2005. [Google Scholar]
  12. Zhou, C.; Chen, H.; Zhang, J.; Li, Q.; Hu, D.; Sheng, V.S. Multi-label graph node classification with label attentive neighborhood convolution. Expert Syst. Appl. 2021, 180, 115–123. [Google Scholar] [CrossRef]
  13. Huang, Z.; Tang, Y.; Chen, Y. A graph neural network-based node classification model on class-imbalanced graph data. Knowl.-Based Syst. 2022, 244, 108–121. [Google Scholar] [CrossRef]
  14. Xu, Y.; Cai, X.; Wang, E.; Liu, W.; Yang, Y.; Yang, F. Dynamic traffic correlations based spatio-temporal graph convolutional network for urban traffic prediction. Inform. Sci. 2023, 621, 580–595. [Google Scholar] [CrossRef]
  15. Ma, D.; Wang, Y.; Ma, J.; Jin, Q. SGNR: A social graph neural network based interactive recommendation scheme for e-commerce. Tsinghua Sci. Technol. 2023, 28, 786–798. [Google Scholar] [CrossRef]
  16. Liu, X.; Hong, Z.; Liu, J.; Lin, J.; Lin, Y.; Paton, A.R.; Zou, Q.; Zeng, X. Computational methods for identifying the critical nodes in biological networks. Brief Bioinform. 2020, 21, 486–497. [Google Scholar] [CrossRef]
  17. Lachaud, G.; Conde, C.P.; Trocan, M. Graph neural networks-based multilabel classification of citation network. In Proceedings of the Intelligent Information and Database Systems: 14th Asian Conference (ACIIDS), Ho Chi Minh City, Vietnam, 28–30 November 2022. [Google Scholar]
  18. Zhang, C.; Xue, S.; Li, J.; Wu, J.; Du, B.; Liu, D.; Chang, J. Multi-aspect enhanced graph neural networks for recommendation. Neural Netw. 2023, 157, 90–102. [Google Scholar] [CrossRef] [PubMed]
  19. Shlomi, J.; Battaglia, P.; Vlimant, J.R. Graph neural networks in particle physics. Mach. Learn. Sci. Technol. 2020, 2, 21–31. [Google Scholar] [CrossRef]
  20. Bi, W.; Xu, B.; Sun, X.; Wang, Z.; Shen, H.; Cheng, X. Company-as-tribe: Company financial risk assessment on tribe-style graph with hierarchical graph neural networks. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Washington, DC, USA, 14–18 August 2022. [Google Scholar]
  21. Huang, X.; Zhu, X.; Xu, X.; Zhang, Q.; Liang, A. Parallel Learning of Dynamics in Complex Systems. Systems 2022, 10, 259. [Google Scholar] [CrossRef]
  22. Zhu, J.; Rossi, R.A.; Rao, A.; Mai, T.; Lipka, N.; Ahmed, N.K.; Koutra, D. Graph neural networks with heterophily. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 19–21 May 2021. [Google Scholar]
  23. Chien, E.; Peng, J.; Li, P.; Milenkovic, O. Adaptive universal generalized pagerank graph neural network. In Proceedings of the 9th International Conference on Learning Representations (ICLR), Vienna, Austria, 4–8 May 2021. [Google Scholar]
  24. Ma, Q.; Fan, Z.; Wang, C.; Tan, H. Graph Mixed Random Network Based on PageRank. Symmetry 2022, 14, 1678. [Google Scholar] [CrossRef]
  25. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. In Proceedings of the 5th International Conference on Learning Representations (ICLR), Toulon, France, 24–26 April 2017. [Google Scholar]
  26. Yan, Y.; Hashemi, M.; Swersky, K.; Yang, Y.; Koutra, D. Two sides of the same coin: Heterophily and oversmoothing in graph convolutional neural networks. In Proceedings of the 2022 IEEE International Conference on Data Mining (ICDM), Orlando, FL, USA, 28 November–1 December 2022. [Google Scholar]
  27. Zhu, J.; Yan, Y.; Zhao, L.; Heimann, M.; Akoglu, M.; Koutra, D. Beyond homophily in graph neural networks: Current limitations and effective designs. In Proceedings of the Thirty-fourth Conference on Neural Information Processing Systems (NeurIPS), Online Event, 6–12 December 2020. [Google Scholar]
  28. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. In Proceedings of the 6th International Conference on Learning Representations (ICLR), Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  29. Yao, L.; Mao, C.; Luo, Y. Graph convolutional networks for text classification. In Proceedings of the AAAI conference on artificial intelligence (AAAI), Honolulu, HI, USA, 27 January –1 February 2019. [Google Scholar]
  30. Jun, E.; Na, K.S.; Kang, W.; Lee, J.; Suk, H.I.; Ham, B.J. Identifying resting-state effective connectivity abnormalities in drug-naïve major depressive disorder diagnosis via graph convolutional networks. Hum. Brain Mapp. 2020, 41, 4997–5014. [Google Scholar] [CrossRef]
  31. Jiang, W.; Luo, J. Graph neural network for traffic forecasting: A survey. Expert Syst. Appl. 2022, 207, 117921. [Google Scholar] [CrossRef]
  32. An, W.; Tian, F.; Chen, P.; Zheng, Q. Aspect-based sentiment analysis with heterogeneous graph neural network. IEEE Trans. Comput. Soc. Syst. 2022, 10, 403–412. [Google Scholar] [CrossRef]
  33. Liu, J.; Wang, X.; Xie, F.; Wu, S.; Li, D. Condition monitoring of wind turbines with the implementation of spatio-temporal graph neural network. Eng. Appl. Artif. Intell. 2023, 121, 760–768. [Google Scholar] [CrossRef]
  34. Deshpande, A.; Melnikov, A. Capturing Symmetries of Quantum Optimization Algorithms Using Graph Neural Networks. Symmetry 2022, 14, 2593. [Google Scholar] [CrossRef]
  35. Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G.; Cai, J.; et al. Recent advances in convolutional neural networks. Pattern Recogn. 2018, 77, 354–377. [Google Scholar] [CrossRef]
  36. Bruna, J.; Zaremba, W.; Szlam, A.; LeCun, Y. Spectral networks and locally connected networks on graphs. In Proceedings of the 2nd International Conference on Learning Representations (ICLR), Banff, AB, Canada, 14–16 April 2014. [Google Scholar]
  37. Veličković, P.; Fedus, W.; Hamilton, W.L.; Lio, P.; Bengio, Y.; Hjelm, R.D. Deep graph infomax. In Proceedings of the 7th International Conference on Learning Representations (ICLR), New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  38. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-gcn: A temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3848–3858. [Google Scholar] [CrossRef]
  39. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the 31st Neural information processing systems (NeurIPS), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  40. Xu, J.; Chen, Y.; Qin, Y.; Huang, R.; Zheng, Q. A Feature Combination-Based Graph Convolutional Neural Network Model for Relation Extraction. Symmetry 2021, 13, 1458. [Google Scholar] [CrossRef]
  41. Pei, H.; Wei, B.; Chang, K.C.C.; Lei, Y.; Yang, B. Geom-GCN: Geometric graph convolutional networks. In Proceedings of the 8th International Conference on Learning Representations (ICLR), Addis Ababa, Ethiopia, 26–30 April 2020. [Google Scholar]
  42. Yang, F.; Zhang, H.; Tao, S. Hybrid deep graph convolutional networks. Int. J. Mach. Learn. Cybern. 2022, 13, 2239–2255. [Google Scholar] [CrossRef]
  43. Bo, D.; Wang, X.; Shi, C.; Shen, H. Beyond low-frequency information in graph convolutional networks. In Proceedings of the AAAI Conference on Artificial Intelligence, New York, NY, USA, 19–21 May 2021. [Google Scholar]
  44. Getoor, L. Link-based classification. Adva. Meth. Knowl. Disc. Compl. Data 2015, 189–207. [Google Scholar]
  45. Sen, P.; Namata, G.; Bilgic, M.; Getoor, L.; Galligher, B.; Eliassi, T.R. Collective classification in network data. AI Mag. 2014, 29, 93–107. [Google Scholar] [CrossRef]
  46. Perozzi, B.; Al-Rfou, R.; Skiena, S. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014. [Google Scholar]
  47. Wu, F.; Souza, A.; Zhang, T.; Fifty, C.; Yu, T.; Weinberger, K. Simplifying graph convolutional networks. In Proceedings of the Machine Learning Research, Long Beach, CA, USA, 9–15 June 2019. [Google Scholar]
  48. Gong, L.; Cheng, Q. Exploiting edge features for graph neural networks. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019. [Google Scholar]
  49. He, T.; Ong, Y.S.; Bai, L. Learning conjoint attentions for graph neural nets. In Proceedings of the Thirty-fifth Conference on Neural Information Processing Systems (NeurIPS), Online Event, 6–14 December 2021. [Google Scholar]
  50. Zhao, J.; Dong, Y.; Tang, J.; Ding, M.; Wang, K. Generalizing graph convolutional networks via heat kernel. In Proceedings of the 9th International Conference on Learning Representations (ICLR), Virtual Event, Austria, 3–7 May 2021. [Google Scholar]
  51. Srivastava, N.; Hinton, G.; Krizhevsky, A.; Sutskever, I.; Salakhutdinov, R. Dropout: A simple way to prevent neural networks from overfitting. J. Mach. Learn. Res. 2014, 15, 1929–1958. [Google Scholar]
Figure 1. An example of graph-structured data.
Figure 2. Illustration of attention coefficient generation.
Figure 3. An illustration of the weight coefficients between low- and high-order-content-based scores.
Figure 4. Examples in the Cora dataset. (a–c) Central node to be classified and its connected neighbor nodes; (d) node category.
Table 1. Dataset statistics.

Datasets | Nodes | Edges | Features | Classes | Training | Validation | Testing
Cora     | 2708  | 5429  | 1433     | 7       | 140      | 500        | 1000
Cite     | 3327  | 4732  | 3703     | 6       | 120      | 500        | 1000
PubMed   | 3356  | 4278  | 500      | 3       | 60       | 500        | 1000
Table 2. Comparison with other methods in terms of F1 score.

Micro-F1
Method   | Cora          | Cite          | PubMed
Deepwalk | 60.6 ± 0.8%   | 40.9 ± 0.8%   | 53.2 ± 0.5%
DGI      | 75.3 ± 0.4%   | 71.7 ± 0.4%   | 71.1 ± 0.2%
SGC      | 80.0 ± 0.1%   | 69.7 ± 0.1%   | 73.9 ± 0.1%
GCN      | 82.1 ± 0.6%   | 71.8 ± 0.5%   | 74.2 ± 0.7%
EGNN     | 81.3 ± 0.4%   | 69.7 ± 0.3%   | 75.7 ± 0.3%
CAT      | 83.5 ± 0.5%   | 72.0 ± 0.3%   | 75.5 ± 0.7%
HKGCN    | 82.8 ± 0.1%   | 66.2 ± 0.1%   | 74.3 ± 0.1%
GAT      | 83.8 ± 0.2%   | 71.5 ± 0.2%   | 74.8 ± 0.1%
Our-I    | 83.2 ± 0.2%   | 70.6 ± 0.2%   | 74.2 ± 0.2%
Our      | 85.1 ± 0.2%   | 72.6 ± 0.2%   | 76.8 ± 0.2%

Macro-F1
Method   | Cora          | Cite          | PubMed
Deepwalk | 58.0 ± 0.6%   | 36.3 ± 0.9%   | 46.9 ± 0.6%
DGI      | 69.8 ± 0.7%   | 61.6 ± 0.5%   | 67.7 ± 0.3%
SGC      | 78.1 ± 0.1%   | 61.9 ± 0.1%   | 71.3 ± 0.1%
GCN      | 79.5 ± 0.7%   | 63.2 ± 0.4%   | 71.6 ± 0.8%
EGNN     | 78.3 ± 0.4%   | 61.2 ± 0.3%   | 74.0 ± 0.3%
CAT      | 82.3 ± 0.4%   | 65.0 ± 0.4%   | 74.1 ± 0.8%
HKGCN    | 80.2 ± 0.1%   | 60.2 ± 0.1%   | 73.1 ± 0.1%
GAT      | 82.4 ± 0.2%   | 64.4 ± 0.2%   | 73.6 ± 0.1%
Our-I    | 80.9 ± 0.2%   | 64.9 ± 0.2%   | 73.2 ± 0.2%
Our      | 83.6 ± 0.2%   | 65.9 ± 0.2%   | 75.9 ± 0.2%

Weighted-F1
Method   | Cora          | Cite          | PubMed
Deepwalk | 60.9 ± 0.7%   | 39.3 ± 0.8%   | 51.3 ± 0.5%
DGI      | 74.2 ± 0.5%   | 68.8 ± 0.5%   | 69.7 ± 0.2%
SGC      | 80.2 ± 0.1%   | 67.9 ± 0.1%   | 73.3 ± 0.1%
GCN      | 82.1 ± 0.6%   | 69.6 ± 0.6%   | 73.4 ± 0.8%
EGNN     | 81.4 ± 0.4%   | 67.4 ± 0.3%   | 75.2 ± 0.2%
CAT      | 83.4 ± 0.4%   | 70.6 ± 0.3%   | 75.0 ± 0.7%
HKGCN    | 83.0 ± 0.1%   | 65.1 ± 0.1%   | 74.0 ± 0.1%
GAT      | 83.8 ± 0.1%   | 70.0 ± 0.2%   | 74.5 ± 0.1%
Our-I    | 83.2 ± 0.2%   | 69.5 ± 0.2%   | 74.0 ± 0.2%
Our      | 85.1 ± 0.2%   | 71.2 ± 0.2%   | 76.6 ± 0.2%
Table 3. Experimental results for ablation analysis.

Micro-F1
Method   | Cora  | Cite  | PubMed
Our-low  | 54.1% | 63.1% | 59.8%
Our-high | 59.3% | 66.6% | 58.7%
Our      | 65.1% | 67.7% | 63.5%

Macro-F1
Method   | Cora  | Cite  | PubMed
Our-low  | 46.8% | 52.4% | 49.5%
Our-high | 49.5% | 56.0% | 44.0%
Our      | 55.7% | 57.4% | 48.5%

Weighted-F1
Method   | Cora  | Cite  | PubMed
Our-low  | 52.9% | 59.2% | 55.1%
Our-high | 56.6% | 63.1% | 51.6%
Our      | 63.2% | 64.5% | 56.7%
Share and Cite

Chen, Y.; Xie, X.-Z.; Weng, W.; He, Y.-F. Multi-Order-Content-Based Adaptive Graph Attention Network for Graph Node Classification. Symmetry 2023, 15, 1036. https://doi.org/10.3390/sym15051036