4.5.1. Node Classification Task
The results of the node classification experiments comparing the proposed model with the ten baseline models on the four citation network datasets, i.e., Cora, Citeseer, Pubmed, and DBLP, are shown in Table 5, with the best result on each dataset denoted in bold.
To make the results easier to compare, this section also presents them as a bar chart in Figure 5. From Table 5, it can be observed that the proposed algorithm achieves the best results against all baseline algorithms. On the Cora dataset, it improves the Micro-F1 score by 3.3% over COSTAMV, the strongest baseline. On the Citeseer dataset, it improves the Micro-F1 score by 0.1% over COSTAMV. On the Pubmed dataset, its results are comparable to those of DGI and GCA and better than those of the remaining algorithms; however, both DGI and GCA require a large number of negative samples during training, whereas the proposed algorithm needs no negative samples at all. It therefore demands far less computational power, and from this point of view it is preferable to DGI and GCA. Finally, on the DBLP dataset, the algorithm improves by 0.9% over COSTAMV, the most effective baseline.
Although the algorithm only slightly outperforms the second-best method on the Citeseer, Pubmed, and DBLP datasets, it achieves a substantial improvement on the Cora dataset, and it performs well across all four datasets.
As can be seen in Figure 5, an objective function without a negative-sample loss can sometimes even outperform a negative-sample-based objective. This points to a promising direction and offers a more efficient, negative-sample-free solution for subsequent graph contrastive learning: unlike negative-sample-based objectives, the Barlow Twins loss avoids negative samples entirely and thus significantly reduces the computational burden.
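As a concrete illustration of why no negative samples are needed, the Barlow Twins objective can be computed from nothing but the cross-correlation matrix between two embedding views of the same nodes. The sketch below is a generic NumPy formulation under assumed shapes and a typical trade-off weight `lam`; it is not the paper's exact implementation.

```python
import numpy as np

def barlow_twins_loss(z1, z2, lam=5e-3):
    """Barlow Twins loss on two views z1, z2 of shape (N, D).

    No negative samples: only the D x D cross-correlation matrix of the
    standardized views is used. The diagonal is pushed toward 1
    (invariance) and the off-diagonal toward 0 (redundancy reduction).
    """
    n, _ = z1.shape
    z1 = (z1 - z1.mean(0)) / (z1.std(0) + 1e-8)   # standardize per dimension
    z2 = (z2 - z2.mean(0)) / (z2.std(0) + 1e-8)
    c = z1.T @ z2 / n                              # cross-correlation, (D, D)
    on_diag = ((np.diagonal(c) - 1.0) ** 2).sum()
    off_diag = (c ** 2).sum() - (np.diagonal(c) ** 2).sum()
    return on_diag + lam * off_diag

rng = np.random.default_rng(0)
z = rng.standard_normal((256, 32))
# Identical views: correlation matrix is close to identity, loss near zero.
print(float(barlow_twins_loss(z, z)))
```

Because the loss scales with the embedding dimension D rather than the number of node pairs, it sidesteps the O(N^2) comparisons that negative-sample objectives typically require.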
4.5.3. Parameter Sensitivity Analysis
This subsection examines the parameter sensitivity of the model. Specifically, it investigates how the node output dimension, the learning rate, the diffusion coefficient, and the parameters of the combined augmentation strategy affect node classification performance. Since the datasets exhibit similar trends, the Cora dataset is used as a representative example. For a fair comparison, all parameters other than the variable under test are held constant in each experiment.
Node output dimension: the node output dimension directly affects model performance. As shown in Figure 6, as the node representation dimension increases, performance first rises, then plateaus, and finally declines. As the output dimension grows from 128 to 512, node classification accuracy increases, peaking at 87.27%. At lower dimensions, the model may struggle to capture the complex features and structural information of all the nodes in the graph; as the dimension increases, it can learn richer node representations, which improves classification. When the output dimension is further increased to 1024 and 2048, classification accuracy drops instead. One likely cause is overfitting: an overly high dimension makes the model too sensitive to the training data, degrading generalization on the test data. Another is that a very high dimension raises the model's computational complexity, making training harder and ultimately hurting classification performance.
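To see where this parameter enters, note that in a GCN-style encoder the output dimension is simply the width of the final weight matrix. The toy sweep below (random weights, identity adjacency, Cora-sized 1433-dimensional features) only illustrates how such a sweep is organized with everything else held fixed; it is not the paper's model.

```python
import numpy as np

def gcn_layer(x, adj_norm, w):
    """One GCN-style propagation step: ReLU(A_hat @ X @ W)."""
    return np.maximum(adj_norm @ x @ w, 0.0)

rng = np.random.default_rng(0)
x = rng.standard_normal((100, 1433))   # toy node features, Cora-sized width
adj_norm = np.eye(100)                 # placeholder normalized adjacency

# Sweep: only the output dimension changes between runs.
for out_dim in (128, 256, 512, 1024, 2048):
    w = rng.standard_normal((1433, out_dim)) * 0.01
    z = gcn_layer(x, adj_norm, w)
    print(out_dim, z.shape)
```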
Learning rate: the learning rate also affects node classification performance. Figure 7 shows classification performance as a function of the learning rate. As the learning rate increases, accuracy first rises weakly and then declines. When the learning rate increases from 0.0001 to 0.0003, accuracy improves, peaking at 87.24%. This suggests that very low learning rates may prevent the model from converging to a good solution, since smaller parameter updates require more iterations; a slightly higher learning rate lets the model reach a suitable local optimum more efficiently. When the learning rate is further increased to 0.0005, accuracy drops slightly. This decrease can be attributed to oscillation in the optimization process: a learning rate that is too high prevents stable convergence to a suitable local optimum. As the learning rate rises to 0.0007, 0.0009, and 0.001, accuracy continues to fall, likely because the overly large updates leave the model too sensitive to the training data, reducing its generalization performance.
Diffusion coefficient: this experiment examines the effect of the graph diffusion scale factor on node classification performance, analyzing the diffusion coefficient when one view's input is a graph diffusion matrix computed at different scales. From Figure 8, it can be observed that as the coefficient gradually increases, classification performance first rises and then falls. At a diffusion coefficient of 0.2, classification accuracy reaches its highest value of 87.13%. When the coefficient increases from 0.1 to 0.2, the results improve, which suggests that a larger coefficient in this range helps the model capture more local and global information, improving node classification accuracy. However, when the coefficient continues to increase to 0.3 and beyond, accuracy gradually declines. This may be because an overly large coefficient introduces more noise into the information diffusion process, making it difficult for the model to capture effective features. In addition, too large a coefficient may spread information excessively, blurring local features and thus hurting classification performance.
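A common way to construct such a diffusion view, and an assumption here since the section does not spell out the diffusion kernel, is personalized PageRank (PPR) diffusion, where a teleport parameter `alpha` plays the role of the swept coefficient:

```python
import numpy as np

def ppr_diffusion(adj, alpha=0.2):
    """Personalized-PageRank graph diffusion (a common choice, assumed here).

    S = alpha * (I - (1 - alpha) * A_hat)^(-1), where A_hat is the
    symmetrically normalized adjacency with self-loops. Small alpha spreads
    information widely over the graph; large alpha keeps it local.
    """
    n = adj.shape[0]
    a = adj + np.eye(n)                       # add self-loops
    d_inv_sqrt = 1.0 / np.sqrt(a.sum(1))      # D^(-1/2)
    a_hat = d_inv_sqrt[:, None] * a * d_inv_sqrt[None, :]
    return alpha * np.linalg.inv(np.eye(n) - (1.0 - alpha) * a_hat)

# Tiny 3-node path graph: every node receives some mass from every other.
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
print(np.round(ppr_diffusion(adj, 0.2), 3))
```

This matches the trade-off described above: as `alpha` shrinks, the diffusion matrix mixes in ever more distant neighborhoods, which first adds useful global context and eventually washes out local structure.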
Data augmentation parameters: this section analyzes the sensitivity of the main hyperparameters of the data augmentation methods, focusing on the control parameters pe and pf of the two methods Feature Masking (FM) and Edge Removing (ER). The analysis uses the Cora citation dataset and examines the four parameters pe1, pe2, pf1, and pf2. Node classification experiments are conducted over the parameter range 0.1 to 0.7, so that the impact of these hyperparameters on accuracy can be assessed more fully. To control the augmentation jointly at the topology level and the node feature level, the parameters are tied as pe = pe1 = pe2 and pf = pf1 = pf2. During the sensitivity analysis only these four parameters are adjusted, with all other parameters held fixed. The results are shown in Figure 9, where the differently colored bars denote the different datasets, the horizontal axis represents the Edge Removing probability, and the vertical axis represents the Feature Masking probability.
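Under their usual formulations (an assumption; the paper's exact implementation may differ), Edge Removing drops each edge independently with probability pe, and Feature Masking zeroes out each feature column with probability pf. A minimal sketch of generating two tied views:

```python
import numpy as np

def edge_removing(edge_index, pe, rng):
    """Drop each edge independently with probability pe (topology-level view).

    edge_index: integer array of shape (2, E) listing source/target nodes.
    """
    keep = rng.random(edge_index.shape[1]) >= pe
    return edge_index[:, keep]

def feature_masking(x, pf, rng):
    """Zero out each feature dimension (whole column) with probability pf."""
    mask = (rng.random(x.shape[1]) >= pf).astype(x.dtype)
    return x * mask   # broadcasting zeroes the masked columns for every node

rng = np.random.default_rng(0)
edge_index = rng.integers(0, 100, size=(2, 400))   # toy edge list
x = rng.standard_normal((100, 64))                 # toy node features

# Two views with tied parameters pe = pe1 = pe2 and pf = pf1 = pf2,
# matching the coupled setting used in the sensitivity analysis.
view1 = (edge_removing(edge_index, 0.2, rng), feature_masking(x, 0.2, rng))
view2 = (edge_removing(edge_index, 0.2, rng), feature_masking(x, 0.2, rng))
print(view1[0].shape, view2[0].shape)
```

The sensitivity findings below follow directly from these definitions: larger pe deletes more of the edge list (damaging topology), and larger pf zeroes more feature columns (hiding attribute information).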
Edge Removing parameter sensitivity: as the Edge Removing probability increases, classification accuracy fluctuates under different Feature Masking settings. At low Feature Masking probabilities (e.g., 0.1–0.3), increasing the Edge Removing probability reduces classification accuracy. This indicates that the model is sensitive to the Edge Removing parameter, likely because removing more edges damages the graph's topology and makes it difficult for the model to capture the structural information of the original data.
Feature Masking parameter sensitivity: as the Feature Masking probability increases, classification accuracy changes under different Edge Removing settings. At low Edge Removing probabilities (e.g., 0.1–0.3), increasing the Feature Masking probability reduces classification accuracy. This indicates that the model is also sensitive to the Feature Masking parameter, likely because masking more features makes it difficult for the model to recover feature information from the original data.
Parameter combination sensitivity: when the Edge Removing and Feature Masking probabilities increase simultaneously, classification accuracy generally declines. The model is thus most sensitive when the two parameters act together, probably because the graph's topology and node features are damaged at the same time, making it difficult for the model to capture the effective information in the original data. In practical applications, a balance between the two parameters is therefore needed, so that the model can both learn the information in the original data and retain good generalization ability.