Article

Learning Data-Driven Propagation Mechanism for Graph Neural Network

1 School of Computer Science and Technology, Xidian University, Xi’an 710071, China
2 School of Electronic Engineering, Xidian University, Xi’an 710071, China
3 School of Artificial Intelligence, Xidian University, Xi’an 710071, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(1), 46; https://doi.org/10.3390/electronics12010046
Submission received: 21 November 2022 / Revised: 12 December 2022 / Accepted: 20 December 2022 / Published: 22 December 2022
(This article belongs to the Special Issue Applications of Computational Intelligence)

Abstract

A graph is a relational data structure suitable for representing non-Euclidean structured data. In recent years, graph neural networks (GNNs) and their subsequent variants, which utilize deep neural networks to perform graph analysis and representation, have shown excellent performance in various application fields. However, the propagation mechanism of existing methods relies on hand-designed GNN layer connection architectures, which are prone to information redundancy and over-smoothing problems. To alleviate these problems, we propose a data-driven propagation mechanism that adaptively propagates information between layers. Specifically, we construct a bi-level optimization objective and use the gradient descent algorithm to learn the forward propagation architecture, which improves the efficiency of learning different layer combinations in multilayer networks. Experimental results on seven benchmark datasets demonstrate the effectiveness of the proposed method. Furthermore, combining this data-driven propagation mechanism with models such as Graph Attention Networks consistently improves their performance.

1. Introduction

Graphs are data structures that model a set of objects (nodes) and their relationships (edges). Graphs can be irregular, have a variable number of unordered nodes, and nodes can have different numbers of neighbors. As a consequence, while some important operations (e.g., convolutions [1]) can be easily applied in the image domain [2], they are difficult to transfer to the graph domain. In addition, a key assumption of existing deep learning algorithms is that data samples are independent of each other. For graphs, however, there are edges between data samples (nodes) that capture the interdependencies between instances. Due to the powerful representational capacity of graph structures, the study of graph analysis using machine learning methods has received increasing attention. Researchers have defined and designed neural network architectures for processing graph data, giving rise to a new research hotspot, the graph neural network (GNN), which achieves excellent performance and interpretability on graph-structured data.
For example, papers in a citation network are linked to each other by citations, and GNNs can classify each paper into a different group [3,4,5,6]. In the fields of chemistry and medicine, molecules can be modeled as graphs, and their biological activities can be identified by GNNs for drug development [7,8,9,10]. In the field of computer vision, GNNs can identify objects depicted by 3D point clouds and explore their topology [11,12,13,14,15]. In the traffic system, GNNs can accurately predict the traffic speed and traffic flow in the traffic network for route planning and flow control [16,17].
GNNs are used to learn node representations (node embeddings), which can simultaneously model node features and graph topology information. In addition, GNNs utilize the relationships (edges) between nodes of a graph to propagate information rather than treating them as features of nodes. Among them, models such as Graph Convolutional Networks (GCN) [3] and Graph Attention Networks (GAT) [18] follow a neighborhood aggregation (message passing) scheme. These models learn to iteratively aggregate the hidden features of each node in the graph and its neighbors as its new hidden features, where the iterations are parameterized by neural network layers. Theoretically, the aggregation process of L iterations fuses the structural information of each node at each layer, which can simultaneously learn the topology and the distribution of node features in the neighborhood.
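As a rough illustration of this message-passing scheme (a minimal sketch, not the authors' implementation), one aggregation step can be written in a few lines of NumPy, where A is a dense adjacency matrix, H the node features, and W a trainable weight matrix:

import numpy as np

def aggregate_once(A, H, W):
    """One neighborhood-aggregation step: average each node's own and
    neighbors' hidden features, then apply a learnable map W and a ReLU."""
    A_hat = A + np.eye(A.shape[0])           # add self-loops
    deg = A_hat.sum(axis=1, keepdims=True)   # neighborhood sizes (with self-loop)
    H_agg = (A_hat @ H) / deg                # mean over each node's neighborhood
    return np.maximum(H_agg @ W, 0.0)        # linear transform + ReLU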
However, in practice, a deeper version of the model with access to more information is likely to perform worse. For example, the best performance of GCN and GAT on the Planetoid dataset [19] is achieved with a two-layer model, and increasing the number of layers reduces performance. A similar degradation of learning in computer vision problems is addressed by residual connections [20], which greatly aid the training of deep models. However, even with residual connections, GCNs with more layers do not perform as well as two-layer GCNs on datasets such as the citation network datasets PubMed [21], CiteSeer, and Cora [22].
We believe that the structure of different nodes and their neighborhoods (subgraphs) strongly influences the result of neighborhood aggregation. The rate of expansion, i.e., the growth rate of the radius of influence, is characterized by the mixing time of random walks and varies significantly across subgraphs of different structures. Therefore, the same number of iterations can result in very different local distributions. For example, consider random-walk expansions started from a node at the center of the graph and from a node at its periphery. After the same number of layers, the central node may already cover essentially all the information in the graph, so it only needs to aggregate a small amount of additional information from other layers; aggregating the information of every layer would then cause redundancy. The peripheral node, in contrast, may cover only a small amount of information and needs to aggregate more to perceive the structure of the graph.
To adaptively adjust the influence radius of each node and task, we propose a data-driven propagation mechanism that learns to selectively acquire information from various layers. Finally, each node can selectively obtain low-order local structural information and high-order neighborhood information, thereby effectively avoiding the problems of local structural information degradation and information redundancy and enhancing the representation ability of GNNs. Additionally, stacking too many layers and non-linear transformations can lead to over-smoothing issues, where node representations tend to converge to a fixed value, resulting in degraded model performance. To alleviate this problem, we add an identity map to the convolution operation to improve the network performance.
Since learning a combination of different layers in a multilayer network is computationally expensive, we adopt a differentiable approach to reduce the computational cost. The model achieves good results on the node classification task, demonstrating the effectiveness of the proposed data-driven propagation mechanism. In conclusion, we outline the main contributions of this paper as follows:
(1) We propose a data-driven propagation mechanism (GraphSAP), which adaptively learns the connections between different layers, enabling nodes to selectively fuse low-order local structural information while acquiring high-order neighborhood information.
(2) We add the identity map to the neighbor aggregation function of the GraphSAP model and use a differentiable algorithm during training to make the model more efficient while maintaining high performance.
(3) We provide a quantitative comparison of the node classification task under different datasets, showing the good performance of the model.

2. Related Work

2.1. Graph Neural Networks

The concept of graph neural networks was first proposed in [23] and further clarified by Scarselli et al. [24], and many variants [18,25] have been proposed over the past few years. Ref. [24] was the first work to propose a graph neural network model that applies neural networks to graph-structured data, and it describes the structure, computation, optimization algorithm, and implementation of the model in detail.
GNNs are a research hotspot that emerged, after the maturity of convolutional neural networks (CNNs) [1], to process non-Euclidean data. Some existing studies try to transfer techniques from CNNs to GNNs in order to exploit the strengths of CNNs. Existing deep GNN models either add extra operations to the convolution to alleviate over-smoothing or aggregate the outputs of different layers. Among the contributions to stacking more CNN layers, ResNet [20] and DenseNet [26] are excellent methods that appear in many deep networks today. JKNet [27] is inspired by ResNet, but it does not achieve good performance by stacking multiple layers as ResNet does and cannot fully realize the representational ability of GNNs. These methods are all hand-crafted networks. Therefore, CNN techniques cannot be applied to GNNs directly; they need to be adapted for GNNs to obtain better performance. The focus of our work is to better exploit the representational power of GNNs.

2.2. Data-Driven Methods

Hand-designed inter-layer connection structures have achieved great success in the past. The emergence of ResNet [20] and DenseNet [26] showed the importance of residual and dense connections for the design of deep networks and had a huge impact on the design of deep neural networks. With the continuous development of deep neural networks and the ongoing invention of new models and modules, people have gradually realized that developing a new neural network structure by hand is time-consuming and labor-intensive.
People have therefore begun to explore how to use existing machine learning knowledge to automatically build networks suited to their application scenarios. Automated Machine Learning (AutoML) is one of the hottest fields in machine learning and deep learning in recent years. Several recent works have demonstrated the feasibility of automated learning [28] and designed models that surpass hand-designed ones, such as [29,30]. Using the dataset as the basis for training, a large variety of network structures can be explored. For example, for a four-layer network there are, mathematically, 15 possible combinations of layer connections in total (see the sketch below). Ideally, given sufficient resources and time, data-driven learning methods can explore all connections between layers, which would cover all hand-designed network structures. A representative family of methods is Neural Architecture Search (NAS), such as [31]. In NAS, the network architecture is designed from three parts: the search space, the search strategy, and the evaluation strategy. The data-driven approach is also a method in the field of AutoML; it adaptively learns a network model suited to the data at hand, and it is the approach used in our work.
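One way to arrive at the count of 15 mentioned above is to treat a connection pattern as the non-empty subset of the four layers whose outputs feed the final aggregation. The following short Python sketch (our illustration, not the paper's enumeration) makes this explicit:

from itertools import combinations

layers = [1, 2, 3, 4]
patterns = [subset
            for k in range(1, len(layers) + 1)
            for subset in combinations(layers, k)]
print(len(patterns))  # 2^4 - 1 = 15 non-empty subsets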

3. Background

Given an undirected graph $G = (V, E)$ with node features $X \in \mathbb{R}^{n \times d_i}$, where $V$ and $E$ denote the node and edge sets, respectively, $n$ represents the number of nodes, and $d_i$ is the dimension of the node features. We use $\mathcal{N}(v)$ to represent the first-order neighbors of a node $v$ in $G$, i.e., $\mathcal{N}(v) = \{u \in V \mid (v, u) \in E\}$. In addition, we use the set $\tilde{\mathcal{N}}(v)$ to denote the set of neighbors including the node itself, i.e., $\tilde{\mathcal{N}}(v) = \{v\} \cup \{u \in V \mid (v, u) \in E\}$. Let $\tilde{G}$ be the graph obtained by adding a self-loop to every $v \in V$. The hidden feature of node $v$ learned by the $l$-th layer of the model is denoted by $h_v^{(l)} \in \mathbb{R}^{d_h}$, where $d_h$ denotes the dimension of the hidden features; for simplicity of illustration, we assume it is the same across layers. Let $A$ denote the adjacency matrix and $D$ the diagonal degree matrix. Consequently, the adjacency matrix and diagonal degree matrix of $\tilde{G}$ are defined as $\tilde{A} = A + I$ and $\tilde{D} = D + I$, respectively. The normalized graph Laplacian matrix is defined as $L = I_n - D^{-1/2} A D^{-1/2}$, which is a symmetric positive semidefinite matrix.
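As a concrete illustration, the following NumPy sketch computes these quantities for a dense adjacency matrix; the clipping of degrees to at least one is our own guard against isolated nodes, not something specified in the paper:

import numpy as np

def graph_matrices(A):
    """Self-loop adjacency/degree matrices and the normalized Laplacian defined above."""
    n = A.shape[0]
    A_tilde = A + np.eye(n)                      # adjacency matrix of the self-loop graph
    D_tilde = np.diag(A_tilde.sum(axis=1))       # diagonal degree matrix of the self-loop graph
    deg = np.clip(A.sum(axis=1), 1, None)        # guard against isolated nodes (our choice)
    D_inv_sqrt = np.diag(1.0 / np.sqrt(deg))
    L = np.eye(n) - D_inv_sqrt @ A @ D_inv_sqrt  # normalized Laplacian L = I - D^(-1/2) A D^(-1/2)
    return A_tilde, D_tilde, L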

3.1. Graph Convolutional Network

Kipf et al. [3] proposed the Graph Convolutional Network (GCN) model, which can be described as the “pioneering work” of GNN. GCN uses approximation techniques to derive a simple and efficient model that enables convolution operations in image processing to be easily used for graph-structured data processing. Inspired by GCNs, various new graph neural networks are emerging. The form of GCN can be expressed as:
$$h_i^{(l+1)} = \sigma\Big(b^{(l)} + \sum_{j \in \mathcal{N}(i)} \frac{1}{c_{ij}}\, h_j^{(l)} W^{(l)}\Big) \quad (1)$$
where $c_{ij} = \sqrt{|\mathcal{N}(i)|}\,\sqrt{|\mathcal{N}(j)|}$ is a normalization term, $W^{(l)}$ and $b^{(l)}$ are trainable parameters, and $\sigma$ is a non-linear activation function, e.g., ReLU.
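A dense PyTorch sketch of this layer is given below for illustration; it follows Equation (1) literally, whereas practical implementations use sparse message passing. The class and parameter names are our own, not part of any particular library.

import torch
import torch.nn as nn

class GCNLayer(nn.Module):
    """Dense sketch of Equation (1): h_i' = sigma(b + sum_j (1/c_ij) h_j W)."""
    def __init__(self, in_dim, out_dim):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(in_dim, out_dim))
        self.bias = nn.Parameter(torch.zeros(out_dim))
        nn.init.xavier_uniform_(self.weight)

    def forward(self, A, H):
        # A: [n, n] dense adjacency matrix, H: [n, in_dim] node features
        deg = A.sum(dim=1).clamp(min=1)
        inv_sqrt = deg.rsqrt()
        norm = inv_sqrt.unsqueeze(1) * inv_sqrt.unsqueeze(0)  # 1 / (sqrt(|N(i)|) sqrt(|N(j)|))
        msg = (A * norm) @ (H @ self.weight)                  # sum_j (1/c_ij) h_j^(l) W^(l)
        return torch.relu(msg + self.bias)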
In principle, deeper versions of the GCN model, which can capture more information, should perform better. We conduct node classification experiments on the Cora dataset using GCNs with 2, 4, 6, and 16 layers to analyze the performance of GCNs at different depths. The experimental results are shown in Figure 1. The best performance of GCN on the node classification task on Cora is achieved with a 2-layer model, and increasing the number of layers reduces performance. This is due to the over-smoothing problem: as the number of layers, and hence the number of iterations, increases, the hidden representations of the nodes tend to converge to the same value.

3.2. Deep GNNs

In order to better exploit information from neighborhoods of differing localities and mitigate the over-smoothing problem of deep GNN models, models such as Jumping Knowledge Networks [27] and GCNII [32] proposed network structures similar to that of ResNet [20]. These models can be roughly represented as follows:
$$h_v^{(l+1)} = \sigma\Big(W^{(l+1)} \cdot \mathrm{aggregate}\big(\{h_u^{(l)},\, u \in \tilde{\mathcal{N}}(v)\}\big)\Big) \quad (2)$$
$$h_v^{(\mathrm{final})} = \mathrm{layer\_aggregation}\big(h_v^{(1)}, h_v^{(2)}, \ldots, h_v^{(n)}\big) \quad (3)$$
where $\mathrm{aggregate}$ represents the aggregation operation between nodes and $\mathrm{layer\_aggregation}$ is the layer aggregation function, indicating that the representations of all intermediate layers are aggregated in the last layer. However, this hand-designed way of aggregating the features of all layers may result in information redundancy.
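To make the fixed scheme in Equation (3) concrete, the following sketch shows the concatenation variant of layer aggregation (one of the aggregators offered by JKNet; element-wise max-pooling and LSTM attention are alternatives). This is exactly the kind of hand-designed, all-layers aggregation that our method replaces with a learned, selective one.

import torch

def layer_aggregation(hidden_states):
    """Concatenate every intermediate representation h^(1)..h^(n),
    each of shape [num_nodes, d_h], as in JKNet's concatenation aggregator."""
    return torch.cat(hidden_states, dim=-1)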
Many GNN models [33,34,35] obtain node features via a message-passing paradigm [7,36,37], where the representation of each node is learned by iteratively aggregating the embeddings ("messages") of its neighbors. APGCN [33] equips each node with an additional unit that, during message passing, outputs a value controlling whether propagation should continue. This gives finer control over how far information propagates, allowing a node to combine information from more distant neighbors, but it cannot aggregate information from different layers. To address the above issues, we use GCN as a baseline and design an adaptive learning method for inter-layer aggregation. Compared with hand-designed networks, it automatically learns the aggregation architecture and thus fully exploits the representational capabilities of GNNs.

4. GraphSAP Network

4.1. Model Analysis

To improve the representation ability of the network model, we design a data-driven propagation mechanism that adaptively learns the connections of different layers, that is, the aggregation of different neighbors. We use GCN as the baseline network and alleviate the over-smoothing problem in deep networks by adding an identity map to the convolution operation. Formally, we define the $l$-th layer of our GraphSAP as:
$$H^{(l+1)} = \sigma\Big(\kappa_l\, \tilde{L} H^{(l)} \big((1-\beta_l) I_n + \beta_l W^{(l)}\big)\Big) \quad (4)$$
where $\kappa_l$ and $\beta_l$ are two hyperparameters, and $\tilde{L} = \tilde{D}^{-1/2} \tilde{A} \tilde{D}^{-1/2}$ is the graph convolution matrix with the renormalization trick. Compared with the vanilla GCN model (Equation (1)), we add the identity map $I_n$ to the $l$-th weight matrix $W^{(l)}$.
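A minimal PyTorch sketch of this layer is given below, assuming a dense renormalized propagation matrix $\tilde{L}$; the class name and initialization are our own choices, not the authors' code:

import torch
import torch.nn as nn

class GraphSAPLayer(nn.Module):
    """Sketch of Equation (4): H' = sigma(kappa_l * L~ H^(l) ((1 - beta_l) I_n + beta_l W^(l)))."""
    def __init__(self, dim, kappa, beta):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(dim, dim))
        nn.init.xavier_uniform_(self.weight)
        self.kappa, self.beta = kappa, beta

    def forward(self, L_tilde, H):
        support = self.kappa * (L_tilde @ H)                          # kappa_l * L~ * H^(l)
        identity = torch.eye(H.size(1), device=H.device)
        mix = (1.0 - self.beta) * identity + self.beta * self.weight  # identity map added to W^(l)
        return torch.relu(support @ mix)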
Each intermediate layer is computed from all its predecessors:
$$\mathrm{layer}^{(j)} = \sum_{i<j} o^{(i,j)}\big(\mathrm{layer}^{(i)}\big) \quad (5)$$
where $\mathrm{layer}^{(i)}$ can be obtained by Equation (4), and $o^{(i,j)}$ denotes the connection state between $\mathrm{layer}^{(i)}$ and $\mathrm{layer}^{(j)}$.
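In code, Equation (5) amounts to summing, over all predecessors, an edge operation applied to each earlier layer's output. The sketch below keeps $o^{(i,j)}$ abstract as a callable (Section 4.2 relaxes this discrete choice into the softmax mixture of Equation (6)); the function and argument names are illustrative.

def intermediate_layer(j, layer_outputs, edge_ops):
    """Equation (5): layer^(j) = sum over i < j of o^(i,j)(layer^(i)).
    layer_outputs[i] holds layer^(i); edge_ops[(i, j)] is a callable for o^(i,j)."""
    return sum(edge_ops[(i, j)](layer_outputs[i]) for i in range(j))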
The network architecture of our GraphSAP is shown in Figure 2. The main difference between our model and the existing models is that we design an adaptive learning network architecture based on a data-driven propagation mechanism instead of relying on hand-crafted designs. We incorporate identity maps into convolutions to guarantee model performance and then use a data-driven adaptive approach to learn the best-performing network aggregation structure. Our proposed network achieves good results in the node classification task, demonstrating the feasibility of our proposed method.
Identity maps play an important role in preventing performance degradation in deep models, so we add them to the model's convolution operation. Concretely, the identity map adds the identity matrix to the weight matrix, which alleviates the over-smoothing caused by increasing the number of network layers. Frequent interaction between different dimensions of the feature matrix [38] degrades the performance of the model in semi-supervised tasks, whereas directly mapping the smoothed representation $\tilde{L} H^{(l)}$ to the output reduces this interaction.

4.2. Data-Driven Propagation Mechanism

In this subsection, we present the proposed propagation mechanism. We begin with the differences and connections between our data-driven propagation mechanism and existing propagation strategies such as Learning to Propagate (L2P) [39]. Although both L2P and our GraphSAP perform adaptive propagation, the two methods differ: GraphSAP learns whether the neighbor features of nodes at different levels are aggregated, whereas L2P assumes that different nodes may require different numbers of propagation layers and therefore learns the neighborhood order for each node. We then introduce the method for making the inter-layer operations continuous and, finally, the optimization method used to reduce training time.
JKNet [27] aggregates the features of all layers to obtain the final feature representation, as shown in Equation (3). Our method instead defines a space of layer-connection operations and adaptively learns the aggregation between different layers, as shown in Equation (5), where each directed edge $(i, j)$ is associated with an edge state $o^{(i,j)}$. Our task is then to find a suitable connection pattern for each layer. The combination of these operations is discrete, and learning in a discrete space is very difficult.
To make the search space continuous, we relax the categorical choice of a specific operation to a softmax over all possible operations:
$$\bar{o}^{(i,j)}(\mathrm{layer}) = \sum_{o \in \mathcal{O}} \frac{\exp\big(\alpha_o^{(i,j)}\big)}{\sum_{o' \in \mathcal{O}} \exp\big(\alpha_{o'}^{(i,j)}\big)}\, o(\mathrm{layer}) \quad (6)$$
where $\mathcal{O}$ is the set of all candidate aggregation operations (e.g., identity, max-pooling, and zero), each operation represents some function $o(\cdot)$ applied to the layer, and the operation mixing weights for a pair of layers $(i, j)$ are parameterized by a vector $\alpha^{(i,j)}$ of dimension $|\mathcal{O}|$. Here, $\mathrm{layer}$ denotes the GNN layer, as in Equation (5). The layer aggregation operation of GraphSAP is given by Equation (4), and the node features of the last layer can be obtained as follows:
$$\mathrm{layer}^{(l)} = \big[\, o(H^{(1)}), \ldots, o(H^{(l-1)}) \,\big] \quad (7)$$
where $o$ is the selected operation, indicating whether a given layer participates in information transmission. The final features of the nodes can be expressed as
$$Z = \mathrm{softmax}\big(\mathrm{layer}^{(l)}\big) \quad (8)$$
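The continuous relaxation of Equation (6) can be sketched as a small mixed-operation module in PyTorch, in the style of differentiable architecture search; the candidate set and parameter names below are illustrative assumptions rather than the authors' implementation:

import torch
import torch.nn as nn

CANDIDATE_OPS = {
    "identity": lambda h: h,
    "zero":     lambda h: torch.zeros_like(h),
    # further candidates (e.g., a max-pooling aggregator) would be registered here
}

class MixedOp(nn.Module):
    """Relaxed edge operation of Equation (6) for one edge (i, j): a softmax over
    the architecture parameters alpha weights all candidate operations."""
    def __init__(self):
        super().__init__()
        self.alpha = nn.Parameter(torch.zeros(len(CANDIDATE_OPS)))

    def forward(self, layer_out):
        weights = torch.softmax(self.alpha, dim=0)
        return sum(w * op(layer_out) for w, op in zip(weights, CANDIDATE_OPS.values()))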
When training the network, we use $\mathcal{L}_{train}$ and $\mathcal{L}_{val}$ to denote the training and validation losses. Both losses depend not only on the architecture $\alpha$ but also on the weights $w$ of the network. The goal of our method is to find $\alpha^*$ that minimizes $\mathcal{L}_{val}(w^*, \alpha^*)$, where $w^*$ is the weight that minimizes $\mathcal{L}_{train}$. Thus, our model actually needs to solve a bi-level optimization [40] problem:
$$\min_{\alpha \in \mathcal{A}}\; \mathcal{L}_{val}\big(w^*(\alpha), \alpha\big) \quad (9)$$
$$\mathrm{s.t.}\quad w^*(\alpha) = \arg\min_{w}\, \mathcal{L}_{train}(w, \alpha) \quad (10)$$
where $\alpha$ denotes the network architecture, and $w^*(\alpha)$ denotes the weights of this architecture after training. In our experiments, we use the cross-entropy loss for the semi-supervised node classification task:
$$\mathcal{L} = -\sum_{i \in \mathcal{Y}_L} Y_i \log Z_i \quad (11)$$
where $\mathcal{Y}_L$ is the set of node indices with labels, $Y_i$ denotes the label of node $i$, and $Z_i$ is the final feature representation of node $i$ from Equation (8). This cross-entropy loss is used both for training the adaptively generated network structure and for training the task model.
We train the network using a one-shot differentiable method; the optimization details are given in Algorithm 1. In addition, we use the gradient-based approximation method [41,42,43] to update the operation parameter α to save training time, as follows:
$$\nabla_\alpha \mathcal{L}_{val}\big(w^*(\alpha), \alpha\big) \approx \nabla_\alpha \mathcal{L}_{val}\big(w - \gamma \nabla_w \mathcal{L}_{train}(w, \alpha),\, \alpha\big) \quad (12)$$
where $w$ denotes the current weights maintained by the algorithm, and $\gamma$ is the learning rate of a single step of the inner optimization. We use only a single training step to adjust $w$ so that it approximates $w^*(\alpha)$, rather than fully solving the inner optimization (Equation (10)) by training until convergence.
Algorithm 1: Data-Driven Propagation Mechanism (SAP)
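The published Algorithm 1 appears as an image in the article and is not reproduced here; the following PyTorch sketch is our reading of the described procedure, alternating an architecture step on the validation loss with a weight step on the training loss (Equations (9)-(12)). For simplicity, the current weights stand in for the one-step lookahead of Equation (12), and the data container and its fields (x, adj, masks, y) are assumed names.

import torch

def train_sap(model, alpha_params, weight_params, data, loss_fn,
              epochs=1000, lr=0.005, weight_decay=1e-3):
    opt_w = torch.optim.Adam(weight_params, lr=lr, weight_decay=weight_decay)
    opt_alpha = torch.optim.Adam(alpha_params, lr=lr)
    for _ in range(epochs):
        # (1) Architecture step: update alpha to reduce the validation loss (Equation (9)).
        opt_alpha.zero_grad()
        out = model(data.x, data.adj)
        loss_fn(out[data.val_mask], data.y[data.val_mask]).backward()
        opt_alpha.step()
        # (2) Weight step: update w on the training loss for the current alpha (Equation (10)).
        opt_w.zero_grad()
        out = model(data.x, data.adj)
        loss_fn(out[data.train_mask], data.y[data.train_mask]).backward()
        opt_w.step()
    return model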
After training, we keep the top-$k$ best-performing operations in each layer (we set $k = 1$ in our experiments), i.e., the operation with the maximum weight in Equation (6), to form our model. After the adaptive learning is complete, we retrain the best-performing model from scratch and tune it on the validation data to obtain the final parameters.

5. Experiment

5.1. Datasets

To verify the effectiveness of our proposed algorithm, we use seven benchmark datasets to perform the node classification task. Table 1 summarizes the statistics of the datasets. We conduct experiments on three citation network datasets: PubMed [21], CiteSeer, and Cora [22]. Each of their nodes represents a paper, and each edge represents a citation relationship between two papers. The datasets contain bag-of-words features for each paper (node). The task is to classify papers into different topics according to the citation network, i.e., node classification. We also introduce four additional datasets for the node classification task: Coauthor CS, Coauthor Physics, Amazon Computers, and Amazon Photo [44]. Descriptions of these datasets are given below. We split the nodes in all graphs into 60%, 20%, and 20% for training, validation, and testing, respectively.
Cora. The Cora dataset consists of machine learning papers divided into the following seven categories: Case Based; Genetic Algorithms; Neural Networks; Probabilistic Methods; Reinforcement Learning; Rule Learning; Theory.
CiteSeer. The Citeseer dataset is a portion of papers selected from the CiteSeer Digital Papers repository and is grouped into the following six categories: Agents; AI; DB; IR; ML; HCI.
PubMed. The PubMed dataset includes 19,717 scientific publications on diabetes from the Pubmed database, divided into three categories: Diabetes Mellitus, Experimental; Diabetes Mellitus Type 1; Diabetes Mellitus Type 2.
Coauthor CS and Coauthor Physics. They are coauthorship graphs based on the Microsoft Academic Graph from the KDD Cup 2016 challenge. Nodes in the dataset represent authors and are connected by an edge if two authors coauthored a paper. Node features represent keywords of each author’s papers, and category labels represent each author’s most active research area.
Amazon Computers and Amazon Photo. They are fragments of the Amazon co-purchase graph [44], where nodes represent items, edges represent two items that are frequently purchased together, node features are bag-of-words-encoded product reviews, and category labels represent product classifications.

5.2. Settings

Baselines. To compare our proposed mechanism with other existing methods, we consider the following baselines: Graph Convolutional Network (GCN) [3], Graph Attention Network (GAT) [18], Simplified Graph Convolution Network (SGC) [4], JKNet [27], Multilayer Perceptron (MLP) [45], Graph Sample and Aggregate (GraphSage) [46], DAGNN [47], GCNII [32], DenseGCN [48], and ResGCN [48].
Configurations. Our experiments are run on an NVIDIA RTX 3090 Ti graphics card using PyTorch (version 1.7). GCN [3] is used as the baseline model, identity mapping is added to the convolution, and the data-driven propagation mechanism is used to obtain the network model. In all experiments, we set the depth in {2, 4, 8, 16, 32, 64}. Throughout the experiments, we use the Adam optimizer [49] with a learning rate of 0.005 and a maximum of 1000 epochs. We set the dropout rate to 0.5, the dimension of the hidden features to 32, and the weight decay to 0.001, and we add L2 regularization to the model parameters. We set $\kappa = 1$ and $\beta_l = \log(\frac{\lambda}{l} + 1)$; the principle behind this setting is to ensure that the decay of the weight matrix adaptively increases as we stack more layers.
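For concreteness, a minimal sketch of this training configuration in PyTorch follows; the model is only a placeholder (1433 input features and 7 classes are the Cora values from Table 1, used purely for illustration), and only the optimizer and regularization settings mirror the text above:

import torch
import torch.nn as nn

model = nn.Sequential(                      # placeholder standing in for GraphSAP
    nn.Linear(1433, 32), nn.ReLU(), nn.Dropout(p=0.5), nn.Linear(32, 7)
)
optimizer = torch.optim.Adam(model.parameters(), lr=0.005, weight_decay=1e-3)
max_epochs = 1000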

5.3. Results

Network evaluation. We evaluate the training behavior of the model by observing both the selection of the network structure and the training of the corresponding weights. The training curves are shown in Figure 3a,b. We adaptively learn the network structure on the validation set, training and optimizing $\alpha$ in Equation (9) to obtain the best-performing structure; the jumps in the training loss in Figure 3a correspond to this process of optimizing the network architecture. After the adaptive learning of the network structure is complete, we train and optimize the network parameters $w$ in the usual way to obtain the final network model. Because differentiable methods are used for optimization, both the structure-selection training and the parameter training converge quickly. These results confirm that the differentiable method we use is feasible and effective.
Performance comparison. The quantitative comparison of node classification performance with other methods on the various datasets is shown in Table 2. All results used for comparison are the best results achievable with the respective models. Our network achieves good performance on all seven datasets and the highest classification accuracy on five of them. GAT shows good results on some datasets, such as Cora, but its performance on the Amazon Computers dataset is mediocre. Among the current deep models GCNII, ResGCN, and DenseGCN, GCNII performs best on the Cora and Citeseer datasets, but our network achieves the best results on all the other datasets. In general, our model can be applied to various datasets and achieves good results, which demonstrates its effectiveness. Our method also provides a feasible direction for better exploiting the representational power of GNNs.
To investigate performance trends at different depths, we further compare the representational capabilities of our proposed model and existing models at different depths. The detailed comparison of models with different depths is shown in Figure 4. From these experimental results, we make the following observations. The baseline model (GCN) struggles to maintain consistent performance as more layers are stacked. Residual and dense connections help improve model performance on most datasets, but not by much on the Amazon Computers and PubMed datasets. The Jumping Knowledge (JK) mechanism outperforms the baseline model (GCN) [3] in most cases; however, increasing depth also causes its performance to degrade. The GCNII model outperforms GCN and JKNet on multilayer networks, and its over-smoothing is alleviated with increasing depth; however, GCNII performs poorly on the four new datasets, so its generalizability is questionable. These experimental results further confirm that our proposed method is effective and feasible for training models with excellent representation ability.
Model Visualization. We visualize the network structure learned by the model for the node classification task on the Amazon Photo dataset, as shown in Figure 5. The network structure diagram shows that the final classification result aggregates neighbors from different layers. The neighbors that need to be aggregated are learned adaptively by our method, without relying on manual design, and the aggregation between layers is irregular. Our method is flexible and widely applicable and has excellent graph representation ability.

6. Conclusions

We propose a data-driven propagation mechanism that adaptively learns the connections between different layers, i.e., learns combinations of different neighbors. This mechanism can alleviate the information redundancy and over-smoothing problems caused by hand-designed GNN layer-connection architectures. Compared with other mainstream methods, the network architecture can be adapted to a variety of datasets. The proposed GraphSAP achieves good performance on all three public citation datasets, obtaining the best result on one of them, as well as the best results on the four new datasets tested. In addition, our method shows almost no performance degradation as the number of model layers increases. Further, training efficiency is improved by adopting a more efficient differentiable learning algorithm.
In the future, we will explore more automatic learning methods to further improve the performance of GraphSAP, including other layer aggregators and the impact of different combinations of layer and node aggregators on the graph structure. Furthermore, we can explore tasks beyond node classification, such as graph classification.

Author Contributions

Conceptualization, Y.W.; methodology, Y.W.; validation, X.F.; writing—original draft preparation, Q.G. and X.H.; writing—review and editing, Y.W. and X.H.; supervision, W.M.; funding acquisition, Y.W. All authors have read and agreed to the published version of the manuscript.

Funding

This work is supported by the National Natural Science Foundation of China (62276200, 62036006), the Natural Science Basic Research Plan in Shaanxi Province of China (2022JM-327) and the CAAI-Huawei MINDSPORE Academic Open Fund.

Data Availability Statement

The data used to support the findings of this study are available from the corresponding author upon request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. LeCun, Y.; Bengio, Y. Convolutional networks for images, speech, and time series. Handb. Brain Theory Neural Netw. 1995, 3361, 110–124. [Google Scholar]
  2. Bronstein, M.M.; Bruna, J.; LeCun, Y.; Szlam, A.; Vandergheynst, P. Geometric deep learning: Going beyond euclidean data. IEEE Signal Process. Mag. 2017, 34, 18–42. [Google Scholar] [CrossRef] [Green Version]
  3. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017; pp. 3031–3041. [Google Scholar]
  4. Wu, F.; Souza, A.; Zhang, T.; Fifty, C.; Yu, T.; Weinberger, K. Simplifying Graph Convolutional Networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 10–15 June 2019; pp. 6861–6871. [Google Scholar]
  5. Yao, L.; Mao, C.; Luo, Y. Graph Convolutional Networks for Text Classification. AAAI Conf. Artif. Intell. 2019, 33, 7370–7377. [Google Scholar] [CrossRef] [Green Version]
  6. Hang, M.; Neville, J.; Ribeiro, B. A Collective Learning Framework to Boost GNN Expressiveness for Node Classification. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 4040–4050. [Google Scholar]
  7. Gilmer, J.; Schoenholz, S.S.; Riley, P.F.; Vinyals, O.; Dahl, G.E. Neural message passing for quantum chemistry. In Proceedings of the International Conference on Machine Learning, Sydney, Australia, 6–11 August 2017; pp. 1263–1272. [Google Scholar]
  8. Zhang, J.; Dong, B.; Philip, S.Y. Fakedetector: Effective fake news detection with deep diffusive neural network. In Proceedings of the IEEE International Conference on Data Engineering, Virtual, 20–24 April 2020; pp. 1826–1829. [Google Scholar]
  9. Jin, W.; Yang, K.; Barzilay, R.; Jaakkola, T. Learning multimodal graph-to-graph translation for molecular optimization. In Proceedings of the International Conference on Learning Representations, Vancouver, Canada, 30 April–3 May 2018; pp. 1536–1547. [Google Scholar]
  10. Fout, A.; Byrd, J.; Shariat, B.; Ben-Hur, A. Protein interface prediction using graph convolutional networks. In Proceedings of the International Conference on Neural Information Processing Systems, Guangzhou, China, 14–18 October 2017; pp. 6533–6542. [Google Scholar]
  11. Fischer, K.; Simon, M.; Olsner, F.; Milz, S.; Gross, H.M.; Mader, P. StickyPillars: Robust and Efficient Feature Matching on Point Clouds Using Graph Neural Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2021; pp. 313–323. [Google Scholar]
  12. Qi, X.; Liao, R.; Jia, J.; Fidler, S.; Urtasun, R. 3d graph neural networks for rgbd semantic segmentation. In Proceedings of the IEEE International Conference on Computer Vision, Venice, Italy, 22–29 October 2017; pp. 5199–5208. [Google Scholar]
  13. Wu, Y.; Liu, Y.; Gong, M.; Gong, P.; Li, H.; Tang, Z.; Miao, Q.; Ma, W. Multi-View Point Cloud Registration Based on Evolutionary Multitasking with Bi-Channel Knowledge Sharing Mechanism. IEEE Trans. Emerg. Top. Comput. Intell. 2022, 61, 1–18. [Google Scholar] [CrossRef]
  14. Wu, Y.; Zhang, Y.; Fan, X.; Gong, M.; Miao, Q.; Ma, W. INENet: Inliers Estimation Network with Similarity Learning for Partial Overlapping Registration. IEEE Trans. Circuits Syst. Video Technol. 2022. [Google Scholar] [CrossRef]
  15. Wu, Y.; Ding, H.; Gong, M.; Qin, A.K.; Ma, W.; Miao, Q.; Tan, K.C. Evolutionary Multiform Optimization with Two-stage Bidirectional Knowledge Transfer Strategy for Point Cloud Registration. IEEE Trans. Evol. Comput. 2022. [Google Scholar] [CrossRef]
  16. Kosaraju, V.; Sadeghian, A.; Martín-Martín, R.; Reid, I.; Rezatofighi, S.H.; Savarese, S. Social-bigat: Multimodal trajectory forecasting using bicycle-gan and graph attention networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 1–14. [Google Scholar]
  17. Zheng, C.; Fan, X.; Wang, C.; Qi, J. GMAN: A graph multi-attention network for traffic prediction. AAAI Conf. Artif. Intell. 2020, 34, 1234–1241. [Google Scholar] [CrossRef]
  18. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. In Proceedings of the International Conference on Learning Representations, Vancouver, Canada, 30 April–3 May 2018; pp. 1420–1431. [Google Scholar]
  19. Kersting, K.; Kriege, N.M.; Morris, C.; Mutzel, P.; Neumann, M. Benchmark Data Sets for Graph Kernels. 2016. Available online: http://www.graphlearning.io/ (accessed on 6 February 2020).
  20. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 26 June–1 July 2016; pp. 770–778. [Google Scholar]
  21. Namata, G.; London, B.; Getoor, L.; Huang, B.; EDU, U. Query-driven active surveying for collective classification. Int. Workshop Min. Learn. Graphs 2012, 8, 1–8. [Google Scholar]
  22. Sen, P.; Namata, G.; Bilgic, M.; Getoor, L.; Galligher, B.; Eliassi-Rad, T. Collective classification in network data. Artif. Intell. Mag. 2008, 29, 93. [Google Scholar] [CrossRef] [Green Version]
  23. Gori, M.; Monfardini, G.; Scarselli, F. A new model for learning in graph domains. In Proceedings of the IEEE International Joint Conference on Neural Networks, Budapest, Hungary, 10–15 July 2005; pp. 729–734. [Google Scholar]
  24. Scarselli, F.; Gori, M.; Tsoi, A.C.; Hagenbuchner, M.; Monfardini, G. The graph neural network model. IEEE Trans. Neural Netw. 2008, 20, 61–80. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  25. Xu, D.; Ruan, C.; Korpeoglu, E.; Kumar, S.; Achan, K. Inductive representation learning on temporal graphs. In Proceedings of the International Conference on Learning Representations, Addis Ababa, Ethiopia, 26–30 April 2020; pp. 890–902. [Google Scholar]
  26. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708. [Google Scholar]
  27. Xu, K.; Li, C.; Tian, Y.; Sonobe, T.; Kawarabayashi, K.i.; Jegelka, S. Representation learning on graphs with jumping knowledge networks. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 5453–5462. [Google Scholar]
  28. He, X.; Zhao, K.; Chu, X. AutoML: A Survey of the State-of-the-Art. Knowl.-Based Syst. 2021, 212, 106622–106649. [Google Scholar] [CrossRef]
  29. Liang, J.; Meyerson, E.; Hodjat, B.; Fink, D.; Mutch, K.; Miikkulainen, R. Evolutionary Neural AutoML for Deep Learning. In Proceedings of the Genetic and Evolutionary Computation Conference, Prague, Czech Republic, 13–17 July 2019; pp. 401–409. [Google Scholar]
  30. Wu, Y.; Li, J.; Yuan, Y.; Qin, A.K.; Miao, Q.G.; Gong, M.G. Commonality Autoencoder: Learning Common Features for Change Detection From Heterogeneous Images. IEEE Trans. Neural Netw. Learn. Syst. 2022, 33, 4257–4270. [Google Scholar] [CrossRef] [PubMed]
  31. Zoph, B.; Le, Q.V. Neural architecture search with reinforcement learning. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017; pp. 3075–3088. [Google Scholar]
  32. Chen, M.; Wei, Z.; Huang, Z.; Ding, B.; Li, Y. Simple and deep graph convolutional networks. In Proceedings of the International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 1725–1735. [Google Scholar]
  33. Spinelli, I.; Scardapane, S.; Uncini, A. Adaptive Propagation Graph Convolutional Network. IEEE Trans. Neural Netw. Learn. Syst. 2021, 32, 4755–4760. [Google Scholar] [CrossRef] [PubMed]
  34. Sun, Y.; Yao, X.; Bi, X.; Huang, X.; Zhao, X.; Qiao, B. Time-Series Graph Network for Sea Surface Temperature Prediction. Big Data Res. 2021, 25, 100237. [Google Scholar] [CrossRef]
  35. Bi, X.; Liu, Z.; He, Y.; Zhao, X.; Sun, Y.; Liu, H. GNEA: A Graph Neural Network with ELM Aggregator for Brain Network Classification. Hindawi Publ. Corp. 2020, 2020, 8813738. [Google Scholar] [CrossRef]
  36. Fan, X.; Gong, M.; Tang, Z.; Wu, Y. Deep Neural Message Passing with Hierarchical Layer Aggregation and Neighbor Normalization. IEEE Trans. Neural Netw. Learn. Syst. 2021, 25, 540–554. [Google Scholar] [CrossRef] [PubMed]
  37. Fan, X.; Gong, M.; Wu, Y.; Qin, A.K.; Xie, Y. Propagation Enhanced Neural Message Passing for Graph Representation Learning. IEEE Trans. Knowl. Data Eng. 2021. [Google Scholar] [CrossRef]
  38. Klicpera, J.; Bojchevski, A.; Günnemann, S. Predict then propagate: Graph neural networks meet personalized pagerank. In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019; pp. 1314–1325. [Google Scholar]
  39. Xiao, T.; Chen, Z.; Wang, D.; Wang, S. Learning How to Propagate Messages in Graph Neural Networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Virtual, 14–18 August 2021; pp. 2235–2246. [Google Scholar]
  40. Colson, B.; Marcotte, P.; Savard, G. An overview of bilevel optimization. Ann. Oper. Res. 2007, 153, 235–256. [Google Scholar] [CrossRef]
  41. Maclaurin, D.; Duvenaud, D.; Adams, R. Gradient-based hyperparameter optimization through reversible learning. In Proceedings of the International Conference on Machine Learning, Lile, France, 6–11 July 2015; pp. 2113–2122. [Google Scholar]
  42. Pedregosa, F. Hyperparameter optimization with approximate gradient. In Proceedings of the International Conference on Machine Learning, New York, NY, USA, 19–24 June 2016; pp. 737–746. [Google Scholar]
  43. Franceschi, L.; Frasconi, P.; Salzo, S.; Grazzi, R.; Pontil, M. Bilevel programming for hyperparameter optimization and meta-learning. In Proceedings of the International Conference on Machine Learning, Stockholm, Sweden, 10–15 July 2018; pp. 1568–1577. [Google Scholar]
  44. McAuley, J.; Targett, C.; Shi, Q.; Van Den Hengel, A. Image-based recommendations on styles and substitutes. In Proceedings of the International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, 9–13 August 2015; pp. 43–52. [Google Scholar]
  45. Leshno, M.; Lin, V.Y.; Pinkus, A.; Schocken, S. Multilayer feedforward networks with a nonpolynomial activation function can approximate any function. Neural Netw. 1993, 6, 861–867. [Google Scholar] [CrossRef]
  46. Hamilton, W.L.; Ying, R.; Leskovec, J. Inductive representation learning on large graphs. Int. Conf. Neural Inf. Process. Syst. 2017, 152, 1025–1035. [Google Scholar]
  47. Liu, M.; Gao, H.; Ji, S. Towards Deeper Graph Neural Networks. In Proceedings of the ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 23–27 August 2020; pp. 5221–5233. [Google Scholar]
  48. Li, G.; Muller, M.; Thabet, A.; Ghanem, B. DeepGCNs: Can GCNs Go as Deep as CNNs? In Proceedings of the IEEE/CVF International Conference on Computer Vision, Long Beach, CA, USA, 15–20 June 2019; pp. 9267–9276. [Google Scholar]
  49. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. In Proceedings of the International Conference on Learning Representations (ICLR), San Diego, CA, USA, 7–9 May 2015; pp. 3612–3624. [Google Scholar]
Figure 1. Performance of GCNs with different numbers of layers on the node classification task on the Cora dataset.
Figure 2. Network architecture of GraphSAP. SAP is based on data-driven learning across different layers, where $l$ is the number of layers of the network, $h^{(l)}$ denotes the hidden features learned by a node at layer $l$, relax denotes the relaxation operation, mix denotes Equation (6), and $X$ denotes the initial node features.
Figure 3. The training state of our model on the Amazon Photo dataset [44] with a 64-layer network structure. (a) Validation and training loss while learning the network structure; (b) validation and training loss while training the parameters of the selected network structure.
Figure 4. Performance comparison of our network at different depths, evaluated on different datasets. As the figure shows, our data-driven layer-connection learning method maintains relatively good performance as the number of layers increases.
Figure 5. The 16-layer network structure learned by the model for the node classification task on the Amazon Photo dataset, where Z denotes the final representation of the node after softmax.
Table 1. Dataset statistics.

Dataset             Nodes     Edges      Features   Classes
Cora                2708      5278       1433       7
CiteSeer            3327      4552       3703       6
PubMed              19,717    44,324     500        3
Coauthor CS         18,333    81,894     6805       15
Coauthor Physics    34,493    247,962    8415       5
Amazon Computers    13,752    245,778    767        10
Amazon Photo        7487      119,043    745        8
Table 2. Comparison of GraphSAP with other models for node classification tasks on the Cora, Citeseer, PubMed, Coauthor CS, Coauthor Physics, Amazon Computers, and Amazon Photo datasets. Results are classification accuracy (%); "/" denotes a missing entry.

Model             Cora   Citeseer   PubMed   Coauthor CS   Coauthor Physics   Amazon Computers   Amazon Photo
GCN [3]           81.2   70.3       77.8     93.8          96.1               90.7               92.2
GAT [18]          81.5   71.4       78.7     90.5          92.5               78.0               85.7
SGC [4]           79     68.5       76       83            90.5               88.5               90.3
JKNet [27]        79.6   69.8       78.4     93.8          97.1               90.5               93.6
MLP [45]          58.2   59.1       70.0     88.3          88.9               44.9               69.6
GraphSage [46]    76.6   67.5       76.1     85.0          90.3               /                  90.4
DAGNN [47]        82.5   71.2       78.8     92.1          93.7               88.7               93.9
GCNII [32]        82.8   72.6       78.8     88.53         94.8               61.4               88.8
Dense-GCN [48]    80     /          72.6     93.6          96.1               91.1               94.1
Res-GCN [48]      80     /          78.4     93.8          96.0               91.7               94.3
GraphSAP (ours)   81.5   71.7       78.9     95.1          97.2               92.3               95.2

Share and Cite

MDPI and ACS Style

Wu, Y.; Hu, X.; Fan, X.; Ma, W.; Gao, Q. Learning Data-Driven Propagation Mechanism for Graph Neural Network. Electronics 2023, 12, 46. https://doi.org/10.3390/electronics12010046

