*5.2. Influence of Training Data*

Similar to [30], we increased the sampling ratio of the BLOGCATALOG3 network from 10% to 90% to observe the performance of **SORAG** on larger training sets. Because the FLICKR and YOUTUBE networks are considerably larger, we varied their sampling ratio from 1% to 10%, which is also consistent with [30]. For comparison, we also tested the state-of-the-art method **GraphSMOTE**.

Figure 4 shows the Micro-F1 and Macro-F1 of each analyzed model with respect to the sampling ratio on each dataset. With increasing training data, **SORAG***F* exhibits the most stable and promising performance, whereas the performances of **SORAG***L* and **SORAG***U* fluctuate considerably under different test conditions. This finding supports our argument that the most effective oversampling strategy for multi-label graphs is to flexibly combine the generation of labeled and unlabeled data. It is also worth mentioning that **GraphSMOTE** shows competitive performance, especially on the YOUTUBE dataset. On average, compared with **GraphSMOTE**, **SORAG***F* improves Micro-F1 by 3.9% (BLOGCATALOG3), 6.4% (FLICKR), and 0.4% (YOUTUBE), and Macro-F1 by 6.2% (BLOGCATALOG3), −0.1% (FLICKR), and 1.4% (YOUTUBE).
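As a reminder of how the two metrics above differ on multi-label data, the following is a minimal, self-contained sketch (not the evaluation code used in our experiments): Micro-F1 pools true/false positives and false negatives over all labels before computing F1, so frequent labels dominate, while Macro-F1 averages the per-label F1 scores, weighting every label equally.

```python
def f1(tp, fp, fn):
    """F1 from raw counts; defined as 0 when there are no positives."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

def micro_macro_f1(y_true, y_pred):
    """Micro- and Macro-F1 for binary indicator matrices.

    y_true, y_pred: lists of rows, one row per sample,
    each row a 0/1 vector over the label set.
    """
    n_labels = len(y_true[0])
    tp = [0] * n_labels  # per-label true positives
    fp = [0] * n_labels  # per-label false positives
    fn = [0] * n_labels  # per-label false negatives
    for t_row, p_row in zip(y_true, y_pred):
        for j, (t, p) in enumerate(zip(t_row, p_row)):
            tp[j] += int(t and p)
            fp[j] += int((not t) and p)
            fn[j] += int(t and (not p))
    # Micro-F1: pool counts across labels, then compute F1 once.
    micro = f1(sum(tp), sum(fp), sum(fn))
    # Macro-F1: compute F1 per label, then average.
    macro = sum(f1(tp[j], fp[j], fn[j]) for j in range(n_labels)) / n_labels
    return micro, macro
```

A label that appears in only a few samples therefore affects Macro-F1 as much as a frequent one, which is why Macro-F1 is the more sensitive measure of minority-label performance in imbalanced multi-label graphs.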

**Figure 4.** Performance of selected methods with respect to the sampling ratio: (**a**) BLOGCATALOG3, Micro-F1; (**b**) BLOGCATALOG3, Macro-F1; (**c**) FLICKR, Micro-F1; (**d**) FLICKR, Macro-F1; (**e**) YOUTUBE, Micro-F1; (**f**) YOUTUBE, Macro-F1.
