*3.3. Node Generator*

We denote the joint distribution of node feature $x$ and label $y$ in the SF region as $P_{\mathrm{SF}}(x, y)$, the marginal distribution of $y$ as $P_{\mathrm{SF}}(y)$, and the marginal distribution of $x$ in the BD region as $P_{\mathrm{BD}}(x)$. Generator $G_l$ is expected to generate labeled instances in the SF region, while generator $G_u$ should output unlabeled synthetic instances in the BD region. Let the data distributions produced by $G_l$ and $G_u$ be denoted as $P_l(x, y)$ and $P_u(x)$, respectively; then, we expect $P_{\mathrm{BD}}(x) \approx P_u(x)$ and $P_{\mathrm{SF}}(x, y) \approx P_l(x, y)$. A more flexible goal is to have $P_{\mathrm{BD}}(x) \approx \alpha \cdot P_u(x) + (1 - \alpha) \cdot P_l(x)$ and $P_{\mathrm{SF}}(x, y) \approx \beta \cdot P_l(x, y) + (1 - \beta) \cdot P_u(x, y)$ with $\alpha \approx 1$ and $\beta \approx 1$, where $\alpha$ and $\beta$ are parameters that control how $G_l$ and $G_u$ jointly fit the original data distributions. Here, $P_u(x, y)$ is the joint distribution obtained by pairing $P_u(x)$ with $P_{\mathrm{SF}}(y)$, and $P_l(x)$ is the marginal distribution of $x$ under $P_l(x, y)$.

To achieve the above goal, we propose a node generator, which is essentially an ensemble of a GAN [25] and a CGAN [27]. The GAN is responsible for generating unlabeled synthetic nodes; its generator and discriminator are denoted as $G_u$ and $D_u$, respectively. The CGAN is used for generating labeled synthetic instances; its generator and discriminator are denoted as $G_l$ and $D_l$, respectively. Our loss function for training the GAN is

$$\min_{G_u} \max_{D_u} \mathcal{L}_{\mathrm{GAN}} = \mathbb{E}_{\mathbf{x} \sim P_{\mathrm{BD}}(\mathbf{x})} \log D_u(\mathbf{x}) + \alpha \cdot \mathbb{E}_{\mathbf{x} \sim P_u(\mathbf{x})} \log(1 - D_u(\mathbf{x})) \tag{6}$$
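
For concreteness, the following is a minimal PyTorch-style sketch of the value in Eq. (6); the helper name `gan_value`, the `eps` clamp, and the mini-batch conventions are illustrative assumptions rather than fixed parts of the method.

```python
import torch

def gan_value(D_u, x_bd, x_fake_u, alpha=1.0, eps=1e-8):
    # Eq. (6): E_{x ~ P_BD} log D_u(x) + alpha * E_{x ~ P_u} log(1 - D_u(x)).
    # D_u is assumed to output a probability in (0, 1); eps guards the logs.
    real_term = torch.log(D_u(x_bd) + eps).mean()
    fake_term = torch.log(1.0 - D_u(x_fake_u) + eps).mean()
    return real_term + alpha * fake_term
```

In each iteration, $D_u$ takes a gradient ascent step on this value (with the synthetic batch `x_fake_u` detached from $G_u$), while $G_u$ takes a descent step on it.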

For the CGAN, our objective is given as

$$\min_{G_l} \max_{D_l} \mathcal{L}_{\mathrm{cGAN}} = \mathbb{E}_{(\mathbf{x}, \mathbf{y}) \sim P_{\mathrm{SF}}(\mathbf{x}, \mathbf{y})} \log D_l(\mathbf{x}, \mathbf{y}) + \beta \cdot \mathbb{E}_{(\mathbf{x}, \mathbf{y}) \sim P_l(\mathbf{x}, \mathbf{y})} \log(1 - D_l(\mathbf{x}, \mathbf{y})) \tag{7}$$
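
Equation (7) has the same form as Eq. (6), except that the conditional discriminator $D_l$ scores feature–label pairs. A minimal sketch, assuming the pair is formed by concatenating the feature vector with a one-hot label encoding (a common CGAN choice, not mandated by the formulation):

```python
import torch

def cgan_value(D_l, x_sf, y_sf, x_fake_l, y_fake_l, beta=1.0, eps=1e-8):
    # Eq. (7): E_{(x,y) ~ P_SF} log D_l(x, y) + beta * E_{(x,y) ~ P_l} log(1 - D_l(x, y)).
    # Labels are assumed to be one-hot vectors concatenated to the features.
    real_pair = torch.cat([x_sf, y_sf], dim=-1)
    fake_pair = torch.cat([x_fake_l, y_fake_l], dim=-1)
    return (torch.log(D_l(real_pair) + eps).mean()
            + beta * torch.log(1.0 - D_l(fake_pair) + eps).mean())
```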

To achieve flexible control over $G_l$ and $G_u$, we design the following loss function based on the interaction between the GAN and the CGAN:

$$\begin{split} \min_{G_u, G_l} \max_{D_u, D_l} \mathcal{L}_{\mathrm{GAN\text{-}cGAN}} &= (1 - \alpha) \cdot \mathbb{E}_{\mathbf{x} \sim P_l(\mathbf{x})} \log(1 - D_u(\mathbf{x})) \\ &\quad + (1 - \beta) \cdot \mathbb{E}_{(\mathbf{x}, \mathbf{y}) \sim P_u(\mathbf{x}, \mathbf{y})} \log(1 - D_l(\mathbf{x}, \mathbf{y})) \end{split} \tag{8}$$

Combining these objectives, our final loss for node generation, $\mathcal{L}_{\mathrm{node}}$, is

$$\mathcal{L}_{\mathrm{node}} = \min_{G_u, G_l} \max_{D_u, D_l} \left( \mathcal{L}_{\mathrm{GAN}} + \mathcal{L}_{\mathrm{cGAN}} + \mathcal{L}_{\mathrm{GAN\text{-}cGAN}} \right) \tag{9}$$
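
Putting Eqs. (6)–(8) together, the per-batch value of $\mathcal{L}_{\mathrm{node}}$ can be sketched as follows (again assuming PyTorch; the argument names and the weights $\alpha = \beta = 0.9$ are illustrative assumptions):

```python
import torch

def node_loss_value(D_u, D_l, x_bd, x_sf, y_sf, x_u, x_l, y_l, y_u,
                    alpha=0.9, beta=0.9, eps=1e-8):
    # Eq. (9): L_GAN + L_cGAN + L_GAN-cGAN on one mini-batch.
    # x_u ~ P_u(x) from G_u; (x_l, y_l) ~ P_l(x, y) from G_l;
    # y_u are labels drawn from P_SF(y) and paired with x_u.
    pair = lambda x, y: torch.cat([x, y], dim=-1)
    l_gan = (torch.log(D_u(x_bd) + eps).mean()                                    # Eq. (6)
             + alpha * torch.log(1.0 - D_u(x_u) + eps).mean())
    l_cgan = (torch.log(D_l(pair(x_sf, y_sf)) + eps).mean()                       # Eq. (7)
              + beta * torch.log(1.0 - D_l(pair(x_l, y_l)) + eps).mean())
    l_cross = ((1.0 - alpha) * torch.log(1.0 - D_u(x_l) + eps).mean()             # Eq. (8)
               + (1.0 - beta) * torch.log(1.0 - D_l(pair(x_u, y_u)) + eps).mean())
    return l_gan + l_cgan + l_cross
```

The discriminators ascend this value with the generator outputs detached, after which the generators descend it (or maximize the corresponding non-saturating objectives).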

We now present a theoretical analysis of the proposed generator.

**Proposition 1.** *For any fixed $G_u$ and $G_l$, the optimal discriminators $D_u$ and $D_l$ of the game defined by $\mathcal{L}_{\mathrm{node}}$ are*

$$D_u^*(\mathbf{x}) = \frac{P_{\mathrm{BD}}(\mathbf{x})}{P_{\mathrm{BD}}(\mathbf{x}) + P_\alpha(\mathbf{x})}, \qquad D_l^*(\mathbf{x}, \mathbf{y}) = \frac{P_{\mathrm{SF}}(\mathbf{x}, \mathbf{y})}{P_{\mathrm{SF}}(\mathbf{x}, \mathbf{y}) + P_\beta(\mathbf{x}, \mathbf{y})} \tag{10}$$

where $P_\alpha(\mathbf{x}) = \alpha \cdot P_u(\mathbf{x}) + (1 - \alpha) \cdot P_l(\mathbf{x})$ and $P_\beta(\mathbf{x}, \mathbf{y}) = \beta \cdot P_l(\mathbf{x}, \mathbf{y}) + (1 - \beta) \cdot P_u(\mathbf{x}, \mathbf{y})$.

**Proof.** We have

$$\begin{split} \mathcal{L}_{\mathrm{node}} &= \int_{\mathcal{X}} P_{\mathrm{BD}}(\mathbf{x}) \log D_u(\mathbf{x})\, d\mathbf{x} + \int_{\mathcal{X}, \mathcal{Y}} P_{\mathrm{SF}}(\mathbf{x}, \mathbf{y}) \log D_l(\mathbf{x}, \mathbf{y})\, d\mathbf{x}\, d\mathbf{y} \\ &\quad + \alpha \cdot \int_{\mathcal{X}} P_u(\mathbf{x}) \log(1 - D_u(\mathbf{x}))\, d\mathbf{x} + \beta \cdot \int_{\mathcal{X}, \mathcal{Y}} P_l(\mathbf{x}, \mathbf{y}) \log(1 - D_l(\mathbf{x}, \mathbf{y}))\, d\mathbf{x}\, d\mathbf{y} \\ &\quad + (1 - \alpha) \cdot \int_{\mathcal{X}} P_l(\mathbf{x}) \log(1 - D_u(\mathbf{x}))\, d\mathbf{x} + (1 - \beta) \cdot \int_{\mathcal{X}, \mathcal{Y}} P_u(\mathbf{x}, \mathbf{y}) \log(1 - D_l(\mathbf{x}, \mathbf{y}))\, d\mathbf{x}\, d\mathbf{y} \\ &= \int_{\mathcal{X}} \big[ P_{\mathrm{BD}}(\mathbf{x}) \log D_u(\mathbf{x}) + P_\alpha(\mathbf{x}) \log(1 - D_u(\mathbf{x})) \big]\, d\mathbf{x} \\ &\quad + \int_{\mathcal{X}, \mathcal{Y}} \big[ P_{\mathrm{SF}}(\mathbf{x}, \mathbf{y}) \log D_l(\mathbf{x}, \mathbf{y}) + P_\beta(\mathbf{x}, \mathbf{y}) \log(1 - D_l(\mathbf{x}, \mathbf{y})) \big]\, d\mathbf{x}\, d\mathbf{y} \end{split} \tag{11}$$

For any $(a, b) \in \mathbb{R}^2 \setminus \{(0, 0)\}$ with $a, b \geq 0$, the function $f(t) = a \log t + b \log(1 - t)$ achieves its maximum on $[0, 1]$ at $t = \frac{a}{a + b}$, since $f'(t) = \frac{a}{t} - \frac{b}{1 - t}$ vanishes only at this point. Applying this pointwise to the two integrands in Eq. (11) yields Eq. (10), which concludes the proof.

**Proposition 2.** *The equilibrium of $\mathcal{L}_{\mathrm{node}}$ is achieved if and only if $P_{\mathrm{BD}}(\mathbf{x}) = P_\alpha(\mathbf{x})$ and $P_{\mathrm{SF}}(\mathbf{x}, \mathbf{y}) = P_\beta(\mathbf{x}, \mathbf{y})$, in which case $D_u^*(\mathbf{x}) = D_l^*(\mathbf{x}, \mathbf{y}) = \frac{1}{2}$ and the optimal value of $\mathcal{L}_{\mathrm{node}}$ is $-4 \log 2$.*

**Proof.** When $D_u(\mathbf{x}) = D_u^*(\mathbf{x})$ and $D_l(\mathbf{x}, \mathbf{y}) = D_l^*(\mathbf{x}, \mathbf{y})$, we have

$$\begin{split} \mathcal{L}_{\mathrm{node}} &= \int_{\mathcal{X}} P_{\mathrm{BD}}(\mathbf{x}) \log \frac{P_{\mathrm{BD}}(\mathbf{x})}{P_{\mathrm{BD}}(\mathbf{x}) + P_\alpha(\mathbf{x})}\, d\mathbf{x} + \int_{\mathcal{X}, \mathcal{Y}} P_{\mathrm{SF}}(\mathbf{x}, \mathbf{y}) \log \frac{P_{\mathrm{SF}}(\mathbf{x}, \mathbf{y})}{P_{\mathrm{SF}}(\mathbf{x}, \mathbf{y}) + P_\beta(\mathbf{x}, \mathbf{y})}\, d\mathbf{x}\, d\mathbf{y} \\ &\quad + \int_{\mathcal{X}} P_\alpha(\mathbf{x}) \log \frac{P_\alpha(\mathbf{x})}{P_{\mathrm{BD}}(\mathbf{x}) + P_\alpha(\mathbf{x})}\, d\mathbf{x} + \int_{\mathcal{X}, \mathcal{Y}} P_\beta(\mathbf{x}, \mathbf{y}) \log \frac{P_\beta(\mathbf{x}, \mathbf{y})}{P_{\mathrm{SF}}(\mathbf{x}, \mathbf{y}) + P_\beta(\mathbf{x}, \mathbf{y})}\, d\mathbf{x}\, d\mathbf{y} \\ &= -4 \log 2 + 2 \cdot \mathrm{JSD}\big(P_{\mathrm{BD}}(\mathbf{x})\,\|\,P_\alpha(\mathbf{x})\big) + 2 \cdot \mathrm{JSD}\big(P_{\mathrm{SF}}(\mathbf{x}, \mathbf{y})\,\|\,P_\beta(\mathbf{x}, \mathbf{y})\big) \\ &\geq -4 \log 2 \end{split} \tag{12}$$

where the optimal value is achieved when the two Jensen–Shannon divergences equal 0, namely, when $P_{\mathrm{BD}}(\mathbf{x}) = P_\alpha(\mathbf{x})$ and $P_{\mathrm{SF}}(\mathbf{x}, \mathbf{y}) = P_\beta(\mathbf{x}, \mathbf{y})$. When $\alpha = \beta = 1$, we have $P_{\mathrm{BD}}(\mathbf{x}) = P_u(\mathbf{x})$ and $P_{\mathrm{SF}}(\mathbf{x}, \mathbf{y}) = P_l(\mathbf{x}, \mathbf{y})$.

In the implementation, both $G_u$ and $G_l$ are designed as three-layer feed-forward neural networks. In contrast, $D_u$ and $D_l$ are given a deliberately weaker structure, a one-layer feed-forward neural network, to facilitate training.
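
A minimal PyTorch sketch of these architectures is given below; the hidden width, the ReLU activation, and the one-hot label concatenation for the conditional inputs are illustrative assumptions.

```python
import torch.nn as nn

class NodeGenerator(nn.Module):
    # Three-layer feed-forward generator, used for both G_u and G_l.
    # For G_l, the input is the noise vector concatenated with a one-hot label.
    def __init__(self, in_dim, hidden_dim, feat_dim):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, feat_dim),
        )

    def forward(self, z):
        return self.net(z)

class NodeDiscriminator(nn.Module):
    # One-layer feed-forward discriminator, used for both D_u and D_l.
    # D_l receives the node feature concatenated with a one-hot label.
    def __init__(self, in_dim):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(in_dim, 1), nn.Sigmoid())

    def forward(self, x):
        return self.net(x)
```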
