*4.4. Training Configurations*

Following the semi-supervised learning setting, we randomly sampled a portion of the labeled nodes in each dataset (the sampling ratio) for evaluation, and then randomly split the sampled nodes into 60%/20%/20% subsets for training, validation, and testing. As in [30], the sampling ratios for the BLOGCATALOG3, FLICKR, and YOUTUBE networks were set to 10%, 1%, and 1%, respectively. To balance the class sizes, we experimented with different numbers of synthetic unlabeled nodes and synthetic labeled nodes (i.e., different oversampling rates); the final values are those reported in Section 5.3. All models were implemented in PyTorch [54] (development environment: PyCharm 2020.2.1, Community Edition) and trained with the Adam optimizer [53]. Each reported result is the mean of ten replicated runs. All models were trained until convergence, typically within 200 epochs.
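The sampling and splitting procedure above can be sketched as follows. This is an illustrative reconstruction, not the authors' code: the function name, the random seed, and the use of NumPy are assumptions.

```python
import numpy as np

def sample_and_split(node_ids, sampling_ratio, seed=0):
    """Sketch of the paper's setup (assumed implementation details):
    randomly sample a fraction of the labeled nodes, then split the
    sample 60%/20%/20% into train/validation/test index sets."""
    rng = np.random.default_rng(seed)  # seed is an assumption, not from the paper
    n_sampled = int(round(len(node_ids) * sampling_ratio))
    # Sample without replacement, then shuffle before splitting
    sampled = rng.choice(node_ids, size=n_sampled, replace=False)
    rng.shuffle(sampled)
    n_train = int(round(0.6 * n_sampled))
    n_val = int(round(0.2 * n_sampled))
    train = sampled[:n_train]
    val = sampled[n_train:n_train + n_val]
    test = sampled[n_train + n_val:]
    return train, val, test

# Example: a BLOGCATALOG3-style setting with a 10% sampling ratio
# over a hypothetical graph of 10,000 labeled nodes.
train, val, test = sample_and_split(np.arange(10_000), 0.10)
```

With 10,000 labeled nodes and a 10% sampling ratio, this yields 600/200/200 disjoint train/validation/test nodes.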
