**6. Conclusions**

This study investigates a new research problem: imbalanced multilabel graph node classification. In contrast to existing over-sampling algorithms, which generate only new minority instances to balance the class distribution, we propose a novel data generation strategy called **SORAG**, which combines the synthesis of labeled instances at minority class centers with the synthesis of unlabeled instances at minority class borders. The labeled synthetics bring new supervision information, and the unlabeled synthetics block overpropagated majority features; together, these effects facilitate balanced learning between classes by taking advantage of the strong topological interdependence between nodes on a graph.

We conducted extensive comparative studies to evaluate the proposed framework on diverse, naturally imbalanced multilabel networks. The experimental results demonstrated the effectiveness and robustness of **SORAG** in handling imbalanced data. In future work, we will develop GNN models that are better adapted to the properties of real-world networks (e.g., scale-free and small-world structure).

**Author Contributions:** Methodology/system implementation/experiments/original draft preparation: Y.D.; Supervision: X.L., A.J., H.-t.Y., S.L., K.-S.K. and A.M. All authors have read and agreed to the published version of the manuscript.

**Funding:** This paper is based on results obtained from a project, JPNP20006, commissioned by the New Energy and Industrial Technology Development Organization (NEDO). We would also like to acknowledge partial support from JSPS Grant-in-Aid for Scientific Research (Grant Number 21K12042).

**Data Availability Statement:** The datasets used in this study are available at http://zhang18f.myweb.cs.uwindsor.ca/datasets/ (accessed on 25 June 2022).

**Acknowledgments:** We would like to thank the reviewers for the time and effort they devoted to reviewing the manuscript. We sincerely appreciate all the valuable comments and suggestions, which helped us improve the quality of the manuscript. This manuscript is an extension of the authors' earlier work, which is to be presented at the 2022 European Conference on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML-PKDD 2022).

**Conflicts of Interest:** The authors declare no conflict of interest.
