*2.1. Graph Representation Learning*

A growing number of applications generate data from non-Euclidean domains, which are then represented as graphs with complex relationships and inter-object dependencies. Traditional feature representation and extraction algorithms face substantial difficulties in handling the complexity of graph data. Over the past decade, many studies have extended traditional feature extraction approaches to graph data. Among them, GRL has evolved considerably and can be roughly divided into three generations: traditional graph embedding, modern graph embedding, and deep learning on graphs. The first generation comprises classic dimensionality reduction techniques, such as IsoMap [28] and LLE [29]. The second generation comprises modern graph embedding methods, such as DeepWalk [30] and LINE [31]. GNNs can be broadly regarded as the third (and latest) generation of GRL, following the traditional and modern graph embedding generations, and they are reported to achieve the most promising performance in a wide range of computational tasks on graphs [32].

A growing body of research has shown that GNNs are highly effective for both traditional GRL tasks (e.g., recommender systems and social network analysis) and newer research areas (e.g., healthcare, physics, and combinatorial optimization) [32]. A typical GNN consists of graph filters and/or graph pooling layers. The former take the node features and graph structure as inputs and output new node features; the latter take a graph as input and output a coarsened graph with fewer nodes. GNNs can be broadly classified into spatial and spectral approaches based on their graph filters. The former explicitly leverage the graph structure, for example, spatially close neighbors, whereas the latter analyze the graph using a graph Fourier transform and an inverse graph Fourier transform [33].
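To make the spectral view concrete, the graph Fourier transform can be sketched as an eigendecomposition of the normalized graph Laplacian: the eigenvectors play the role of Fourier modes, and the inverse transform projects back. The following NumPy sketch is our own illustration (function and variable names are ours, not from the cited works):

```python
import numpy as np

def graph_fourier_transform(A, x):
    """Sketch of the graph Fourier transform via the normalized Laplacian.

    A: (n, n) symmetric adjacency matrix; x: (n,) graph signal.
    Returns (x_hat, reconstruct), where x_hat = U^T x and reconstruct
    applies the inverse transform x = U x_hat.
    """
    d = A.sum(axis=1)
    d_safe = np.where(d > 0, d, 1.0)             # guard isolated nodes
    d_inv_sqrt = np.where(d > 0, d_safe ** -0.5, 0.0)
    # Normalized Laplacian: L = I - D^{-1/2} A D^{-1/2}
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    # L is symmetric, so eigh gives an orthonormal eigenbasis U
    w, U = np.linalg.eigh(L)
    x_hat = U.T @ x                               # forward transform
    reconstruct = lambda xh: U @ xh               # inverse transform
    return x_hat, reconstruct
```

Because the eigenbasis is orthonormal, applying the inverse transform to `x_hat` recovers the original signal, which is exactly why early spectral methods that filter in this basis are exact but expensive: the eigendecomposition costs O(n^3).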

Classical spatial-based GNNs include [9–11,34–37]. Ref. [9] is a very early GNN that uses a local transition function as its graph filter. The GraphSAGE filter [10] uses different aggregators (mean/LSTM/pooling) to aggregate information from the one-hop neighbors of each node. The GAT filter [11] relies on a self-attention mechanism to distinguish the importance of neighboring nodes during aggregation. The ECC filter [34] was proposed to handle graphs with different types of edges; similarly, the GGNN filter [36] was designed for graphs with different types of directed edges. By contrast, the Mo filter [35] is based on a Gaussian kernel. Finally, the MPNN [37] is a more general framework, with the GraphSAGE and GAT filters mentioned above being special cases. In general, spatial-based GNNs are more generalized and flexible.
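As a minimal sketch of the spatial view, a GraphSAGE-style layer with the mean aggregator can be written in a few lines of NumPy. This is our own illustrative simplification (the weight names and the sum-instead-of-concatenation form are ours; summing two linear maps is equivalent to concatenating self and neighbor features before one linear map):

```python
import numpy as np

def sage_mean_layer(A, H, W_self, W_neigh):
    """One GraphSAGE-style layer with a mean aggregator (illustrative sketch).

    A: (n, n) adjacency matrix; H: (n, d) node features;
    W_self, W_neigh: (d, k) weight matrices.
    Each node combines its own features with the mean of its one-hop
    neighbors' features, followed by a ReLU nonlinearity.
    """
    deg = A.sum(axis=1, keepdims=True).clip(min=1)  # avoid divide-by-zero
    H_neigh = (A @ H) / deg                         # mean over neighbors
    Z = H @ W_self + H_neigh @ W_neigh              # combine self + neighborhood
    return np.maximum(Z, 0.0)                       # ReLU
```

The GAT filter differs from this sketch only in the aggregation step: instead of a uniform mean, each neighbor's contribution is weighted by a learned attention score.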

Spectral-based graph filters draw on graph spectral theory to design filtering operations in the spectral domain. Early studies [33] required the eigendecomposition of the Laplacian matrix and multiplications between dense matrices and were thus computationally expensive. To overcome this problem, the Poly filter [38], based on a K-order truncated polynomial, was proposed. Because the polynomial basis of the Poly filter is not orthogonal, the Cheby filter [38], based on the Chebyshev polynomial, was subsequently introduced. The GCN filter [12] is a simplified version of the Cheby filter: the latter involves the K-hop neighborhood of a node during filtering, whereas the former sets K = 1. A GCN filter can therefore also be regarded as a spatial-based filter. Currently, GCNs are among the most widely used types of GNNs. In our model, we used a GCN as the key component.
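The K = 1 simplification is what makes the GCN filter cheap: it reduces to adding self-loops, symmetrically normalizing the adjacency matrix, and propagating features once per layer, with no eigendecomposition. A hedged NumPy sketch (our own names and shapes, following the standard formulation H' = sigma(D^{-1/2} (A + I) D^{-1/2} H W)):

```python
import numpy as np

def gcn_layer(A, H, W):
    """One GCN filtering step (illustrative sketch of [12]'s rule).

    A: (n, n) adjacency matrix; H: (n, d) node features; W: (d, k) weights.
    """
    A_hat = A + np.eye(len(A))                      # add self-loops
    d = A_hat.sum(axis=1)                           # degrees (all >= 1)
    d_inv_sqrt = d ** -0.5
    # Symmetric normalization: D^{-1/2} (A + I) D^{-1/2}
    A_norm = d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]
    return np.maximum(A_norm @ H @ W, 0.0)          # propagate, transform, ReLU
```

Note that `A_norm @ H` only mixes features of one-hop neighbors (plus the node itself), which is why the GCN filter can equally be read as a spatial filter.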

Feature extraction methods such as graph embedding and graph kernel techniques are closely related to the study of GNNs. Compared to GNNs, graph embedding methods only represent network nodes as low-dimensional vectors without targeting subsequent tasks, such as node classification and link prediction. Many graph embedding techniques, such as random walk-based [39] and matrix factorization-based [40] methods, are linear and are not trained end-to-end. Graph kernel techniques employ a kernel function to obtain vector representations of graphs and are an important class of approaches to the graph classification problem. However, compared to GNNs, they are not learnable and are far less efficient.
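To illustrate the random walk-based family, the corpus-generation step common to DeepWalk-style methods can be sketched as follows. This is our own minimal illustration (names are ours); in the full pipeline the resulting walks would be fed to a skip-gram model to learn node embeddings, which is the stage that is not end-to-end with any downstream task:

```python
import random

def random_walks(adj, num_walks, walk_len, seed=0):
    """Generate uniform random walks over a graph (DeepWalk-style sketch).

    adj: dict mapping node -> list of neighbors.
    Returns num_walks walks per node, each of length <= walk_len.
    """
    rng = random.Random(seed)
    walks = []
    for _ in range(num_walks):
        for start in adj:
            walk = [start]
            for _ in range(walk_len - 1):
                nbrs = adj[walk[-1]]
                if not nbrs:                 # dead end: stop the walk early
                    break
                walk.append(rng.choice(nbrs))
            walks.append(walk)
    return walks
```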

GNNs are designed for different graph-based tasks, such as node classification, link prediction, graph classification, and community detection. In particular, our task is semi-supervised: we must learn representations of all nodes from a few labeled nodes and the remaining unlabeled ones. GNNs are a rapidly growing field; for a more comprehensive and detailed introduction, we refer the reader to [32].
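In the semi-supervised setting described above, the training signal typically comes from a masked loss: the GNN propagates over the entire graph, but the loss is computed only on the labeled nodes. A hedged NumPy sketch of such a masked cross-entropy (our own illustrative formulation, not a specific model's loss):

```python
import numpy as np

def masked_cross_entropy(logits, labels, train_mask):
    """Softmax cross-entropy over labeled nodes only (illustrative sketch).

    logits: (n, c) class scores for all n nodes; labels: (n,) class indices;
    train_mask: (n,) 0/1 array marking the few labeled nodes. Message passing
    still uses every node; only the loss is restricted to the mask.
    """
    z = logits - logits.max(axis=1, keepdims=True)          # numerical stability
    log_probs = z - np.log(np.exp(z).sum(axis=1, keepdims=True))
    picked = log_probs[np.arange(len(labels)), labels]      # log p(true class)
    return -(picked * train_mask).sum() / max(train_mask.sum(), 1)
```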
