Because GAT attends only to the simple, direct first-order neighborhood of each node in the graph, it is prone to overfitting. In this paper, two new methods, namely MGAT and MGATv2, are proposed based on the GAT and GATv2 models, respectively. These methods introduce a hybrid information matrix based on motif structures to preserve higher-order structural information in the graph and capture hidden weak connections between nodes.
4.3.1. MGAT
Different graph attention models differ essentially in how they compute attention coefficients. For instance, the graph transformer network method [13] adopts the query, key, and value mechanism from the transformer model to compute attention coefficients on graph data. In this paper, a novel approach is proposed that redesigns the generation of attention coefficients in GAT and introduces a new graph attention model.
Current convolutional graph neural network models tend to focus on the low-order neighborhood structural features of graphs and ignore the high-order structural features of the network. Motifs, however, are important high-order topological structures in the field of complex networks; they are essentially frequently occurring subgraphs, and they can effectively help models capture the high-order structural information of networks. This study introduces the closed triadic network structure M3 and proposes a new graph neural network, the motif-based graph attention network (MGAT). To incorporate motif structural features while preserving the first-order neighborhood information of the nodes, MGAT introduces the motif-based adjacency matrix $A_M$ and the motif-based hybrid information matrix $H$. The entries of $A_M$ are calculated as follows:

$$(A_M)_{ij} = \left|\left\{\, m \in \mathcal{M} : (v_i, v_j) \in m \,\right\}\right|$$

where $\mathcal{M}$ denotes the set of motif instances in the graph.
That is, the value at the $i$-th row and $j$-th column of the motif-based adjacency matrix $A_M$ represents the number of closed triadic motifs in which the edge $(v_i, v_j)$ participates. Taking the M3 motif defined in Figure 1 as an example, Figure 4a shows the original network, while Figure 4b shows the adjacency matrix $A_M$ based on M3. From Figure 4a, it can be observed that the edge $(v_1, v_2)$ is included once in each of the M3 motifs formed by nodes (1,2,3) and (1,2,4), so the value of $(A_M)_{12}$ is 2. The construction mechanism of the motif-based adjacency matrix reveals that the more motifs an edge belongs to, the greater its information aggregation weight. Therefore, the motif-based adjacency matrix $A_M$ preserves the high-order structural information of the network to a certain extent.
Specifically, when the motif is limited to closed triadic motifs, $A_M$ can be calculated as follows:

$$A_M = B \odot A, \qquad B = A \cdot A$$

In the above formula, $A$ represents the adjacency matrix, $B$ represents the transition matrix, whose entry $B_{ij}$ counts the two-step paths (common neighbors) between $v_i$ and $v_j$, and $\odot$ denotes the Hadamard product (element-wise multiplication). In this case, the motif-based adjacency matrix $A_M$ can represent the high-order structural features of the graph. To incorporate these high-order structural features while preserving the low-order node–edge relationships, this paper introduces the motif-based hybrid information matrix $H$.
The calculation process of $H$ can be formulated as follows:

$$H = \lambda A + (1 - \lambda)\, A_M \qquad (8)$$
In the above formula, the hybrid information matrix $H$ is expressed as the weighted sum of the adjacency matrix $A$ and the motif-based adjacency matrix $A_M$. The hyperparameter $\lambda \in [0, 1]$ controls the proportions of high-order and low-order structural information in the hybrid information matrix $H$: a larger value of $\lambda$ emphasizes the low-order structural information, while a smaller value of $\lambda$ emphasizes the high-order structural information.
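To make this construction concrete, the following minimal NumPy sketch builds the triangle-based motif adjacency matrix and the hybrid information matrix of Equation (8). The function names and the toy graph (modeled on the example of Figure 4a) are illustrative, not part of the original implementation.

```python
import numpy as np

def motif_adjacency(A: np.ndarray) -> np.ndarray:
    # (A @ A)[i, j] counts common neighbors of i and j; the Hadamard
    # product with A keeps only existing edges, so entry (i, j) counts
    # the closed triads (M3 motifs) that contain edge (i, j).
    return (A @ A) * A

def hybrid_matrix(A: np.ndarray, lam: float) -> np.ndarray:
    # Hybrid information matrix of Equation (8): H = lam*A + (1-lam)*A_M.
    return lam * A + (1.0 - lam) * motif_adjacency(A)

# Toy graph of Figure 4a: edge (1, 2) lies in the triangles (1, 2, 3)
# and (1, 2, 4); node indices here are 0-based.
A = np.array([[0, 1, 1, 1],
              [1, 0, 1, 1],
              [1, 1, 0, 0],
              [1, 1, 0, 0]], dtype=float)

print(motif_adjacency(A)[0, 1])     # 2.0, matching the example above
print(hybrid_matrix(A, 0.5)[0, 1])  # 1.5 = 0.5*1 + 0.5*2
```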
MGAT, based on the prototype of GAT, introduces this motif structural information and redefines the attention calculation formula; essentially, it calculates attention scores weighted by the motif structural information:

$$e_{ij} = \sigma\!\left( H_{ij} \cdot \vec{a}^{\,\top} \left[ \mathbf{W}\vec{h}_i \,\Vert\, \mathbf{W}\vec{h}_j \right] \right) \qquad (9)$$

where $\sigma$ represents the non-linear activation function LeakyReLU.
As shown in Figure 5, the new attention coefficients in MGAT, compared with the original attention coefficients in GAT, incorporate both the attribute features of nodes and the structural features between nodes. This is achieved by introducing the motif-based hybrid information matrix $H$, which takes into account not only the low-order neighborhood structure but also the high-order motif structure.
After obtaining the new attention scores, MGAT normalizes $e$ (with a softmax over each node's neighborhood) to obtain the final attention coefficients, denoted as $\alpha_{ij}$. These attention coefficients are used as weights for aggregating the feature vectors of neighboring nodes: the weighted sum of the first-order neighbor features is passed through a non-linear activation function to obtain the new node feature vectors. As shown in Figure 6, similar to GAT, MGAT also utilizes the multi-head attention mechanism.
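A minimal single-head PyTorch sketch of the layer described above is given below. It assumes the $H$-weighted score of Equation (9), uses a dense $H$ for clarity, and is an illustration rather than the authors' reference implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MGATLayer(nn.Module):
    # Minimal single-head MGAT layer (illustrative sketch, dense H).
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.W = nn.Linear(in_dim, out_dim, bias=False)
        # a^T [Wh_i || Wh_j] decomposed into the two halves of a.
        self.a_src = nn.Linear(out_dim, 1, bias=False)
        self.a_dst = nn.Linear(out_dim, 1, bias=False)

    def forward(self, X: torch.Tensor, H: torch.Tensor) -> torch.Tensor:
        Wh = self.W(X)                                   # (N, out_dim)
        s = self.a_src(Wh).squeeze(-1)                   # (N,)
        d = self.a_dst(Wh).squeeze(-1)                   # (N,)
        # Equation (9): H-weighted pre-activation, then LeakyReLU.
        e = F.leaky_relu(H * (s[:, None] + d[None, :]))
        # Keep only neighbors; assumes every node has at least one
        # neighbor, otherwise the softmax row would be NaN.
        e = e.masked_fill(H == 0, float("-inf"))
        alpha = torch.softmax(e, dim=1)                  # Equation (3)
        return F.elu(alpha @ Wh)                         # Equation (4)
```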
A summary of MGAT is presented in Algorithm 1. Overall, the MGAT model combines node attribute features with high-order structural features, enriching the feature aggregation of the model and providing a more nuanced understanding of local attributes and global graph structure. Additionally, the hyperparameter $\lambda$ balances the influence of low-order neighborhood information and motif-based high-order structural information, ensuring that the model does not overly prioritize one aspect. These improvements make MGAT more adept at handling complex graph structures and enhance its ability to learn from both local and global graph features.
Algorithm 1: Summary of MGAT.
Input: Graph $G$; node features $X$; adjacency matrix $A$; motif-based adjacency matrix $A_M$; hyperparameter $\lambda$; number of epochs $E$.
Output: Enhanced node representations $Z$.
1: Initialize MGAT with graph $G$, features $X$, matrices $A$, $A_M$, and hyperparameter $\lambda$;
2: For $i = 0$ to $E - 1$ do:
3:   Compute hybrid information matrix $H$ using $A$, $A_M$, and $\lambda$ [Equation (8)];
4:   # $H$ combines motif and adjacency information for the attention calculation.
5:   Calculate attention scores $e$ using $H$ [Equation (9)];
6:   Normalize attention scores to obtain attention coefficients $\alpha$ [Equation (3)];
7:   Update node representations [Equation (4)];
8:   Optimize model parameters (e.g., using the negative log-likelihood loss);
9: End for
10: Return enhanced node representations $Z$.
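The following hypothetical training loop mirrors Algorithm 1, reusing the `hybrid_matrix` and `MGATLayer` sketches from above; the features, labels, and hyperparameter values are placeholders for a real node-classification setup.

```python
# Toy node-classification run mirroring Algorithm 1 (illustrative values).
torch.manual_seed(0)
N, F_IN, C, EPOCHS = 4, 8, 2, 100
X = torch.randn(N, F_IN)                       # node features
y = torch.tensor([0, 0, 1, 1])                 # toy labels
H = torch.tensor(hybrid_matrix(A, 0.5), dtype=torch.float32)  # step 3, Eq. (8)

model = MGATLayer(F_IN, C)                     # step 1: initialize
opt = torch.optim.Adam(model.parameters(), lr=0.005)
for epoch in range(EPOCHS):                    # steps 2-9
    opt.zero_grad()
    Z = model(X, H)                            # steps 5-7, Eqs. (9), (3), (4)
    loss = F.nll_loss(F.log_softmax(Z, dim=1), y)  # step 8: NLL loss
    loss.backward()
    opt.step()
Z = model(X, H).detach()                       # step 10: enhanced representations
```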
4.3.2. MGATv2
In further research on GAT, the work by Brody et al. [8] showed that the attention mechanism in GAT is a static attention mechanism: when aggregating neighbor node features, different query nodes produce the same relative ordering of attention over the key nodes.
As shown in Figure 7, the static nature of the attention mechanism in GAT can be observed as follows: if, for a specific query node $v_a$, the attention coefficients assigned to key nodes $v_b$ and $v_c$ are $\alpha_{ab}$ and $\alpha_{ac}$ with $\alpha_{ab} > \alpha_{ac}$, then for any query node $v_d$, it holds that $\alpha_{db} > \alpha_{dc}$. This phenomenon is visually apparent in the attention coefficient line graph: for a given sequence of key nodes, the trend of the attention coefficients remains the same regardless of which query node is considered. In contrast, with the dynamic attention mechanism in GATv2, the attention coefficients a query node assigns to any key node are independent of every other query node in the graph; the attention coefficient line graph provides visual evidence of this.
After discovering this phenomenon, the GATv2 work formally proved it and addressed the issue: in contrast with the static attention mechanism, it proposed a dynamic attention mechanism, in which the attention scores $e$ are calculated as follows:

$$e_{ij} = \vec{a}^{\,\top} \, \mathrm{LeakyReLU}\!\left( \mathbf{W} \left[ \vec{h}_i \,\Vert\, \vec{h}_j \right] \right)$$

where the variables have the same meanings as in the attention coefficient calculation formula of GAT, and the remaining steps are consistent with GAT. GATv2 demonstrated the dynamic nature of this attention calculation approach.
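The distinction lies in where the attention vector $\vec{a}$ is applied, as the following sketch of the two scoring rules for a single (query, key) pair shows; the weight shapes are illustrative (in GAT, $\mathbf{W}$ maps single node features and $\vec{a}$ has twice the output width, while in GATv2, $\mathbf{W}$ maps the concatenated pair).

```python
def gat_score(a, W, h_i, h_j):
    # GAT (static): a is applied before the LeakyReLU; the query only
    # adds a constant to every key's score, so the key ranking is fixed.
    return F.leaky_relu(a @ torch.cat([W @ h_i, W @ h_j]))

def gatv2_score(a, W, h_i, h_j):
    # GATv2 (dynamic): a is applied after the LeakyReLU, so the
    # ranking of keys can change with the query node.
    return a @ F.leaky_relu(W @ torch.cat([h_i, h_j]))
```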
As shown in Figure 8, following the same consideration as MGAT, MGATv2 introduces the motif-based hybrid information matrix $H$ to enhance the expressive power of high-order structural features in the graph attention mechanism. The attention scores $e$ can be formulated as follows:

$$e_{ij} = \vec{a}^{\,\top} \, \mathrm{LeakyReLU}\!\left( H_{ij} \cdot \mathbf{W} \left[ \vec{h}_i \,\Vert\, \vec{h}_j \right] \right)$$
Similarly, MGATv2 is also a dynamic attention mechanism.
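Under the same assumption used for Equation (9) above, namely that the hybrid weight $H_{ij}$ scales the pre-activation, a single-pair MGATv2 score could be sketched as follows:

```python
def mgatv2_score(a, W, H_ij, h_i, h_j):
    # Hypothetical MGATv2 score: H_ij scales the pre-activation, and a
    # is applied after the LeakyReLU (the dynamic form of GATv2).
    return a @ F.leaky_relu(H_ij * (W @ torch.cat([h_i, h_j])))
```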