*3.5. Node Classifier*

We now obtain an augmented balanced graph *G* = {*V*, *A*, *X*, *B*}, where *V* consists of both real nodes and synthetic labeled and unlabeled nodes; further, *A*, *X*, and *B* denote the edge, feature, and label information of the enlarged vertex set, respectively. A classic two-layer GCN structure [12] is adopted for node classification, given its high accuracy and efficiency. Its first and second layers are denoted as *L*<sup>1</sup> and *L*<sup>2</sup>, respectively, and their corresponding outputs {*O*<sup>1</sup>, *O*<sup>2</sup>} are

$$O^1 = \operatorname{ReLU}\left(\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}XW^1\right) \tag{16}$$

$$O^2 = \sigma\left(F\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}O^1W^2\right) \tag{17}$$

where $\tilde{A} = A + I$ and $I$ is an identity matrix of the same size as $A$. $\tilde{D}$ is a diagonal degree matrix with $\tilde{D}_{ii} = \sum_j \tilde{A}_{ij}$, so $\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}$ is the normalized adjacency matrix. Further, $W^1$ and $W^2$ are the learnable parameters in the first and second layers, respectively. ReLU and $\sigma$ are the respective activation functions of the first and second layers, where $\operatorname{ReLU}(Z)_i = \max(0, Z_i)$ and $\sigma(Z)_i = \operatorname{Sigmoid}(Z)_i = \frac{1}{1+\exp(-Z_i)}$. $O^2$ gives the posterior probabilities of the classes to which each node belongs. $F$ is the label correlation matrix, computed in the same way as in [5], which provides helpful label correlation and interaction information.
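As a concrete sketch, the forward pass of Equations (16) and (17) can be written in a few lines of NumPy. All names here (`normalize_adjacency`, `gcn_forward`) and shapes are illustrative assumptions rather than the authors' implementation; in particular, the label correlation matrix $F$ is simply applied on the left as it appears in Eq. (17).

```python
import numpy as np

def normalize_adjacency(A):
    """Compute D^{-1/2} (A + I) D^{-1/2}: add self-loops, then
    symmetrically normalize by the degree matrix."""
    A_tilde = A + np.eye(A.shape[0])
    d = A_tilde.sum(axis=1)                  # D_ii = sum_j A_tilde_ij
    D_inv_sqrt = np.diag(1.0 / np.sqrt(d))
    return D_inv_sqrt @ A_tilde @ D_inv_sqrt

def relu(Z):
    return np.maximum(0.0, Z)

def sigmoid(Z):
    return 1.0 / (1.0 + np.exp(-Z))

def gcn_forward(A, X, W1, W2, F):
    """Two-layer GCN of Eqs. (16)-(17); F multiplies on the left
    as written in Eq. (17)."""
    A_hat = normalize_adjacency(A)
    O1 = relu(A_hat @ X @ W1)                # Eq. (16)
    O2 = sigmoid(F @ A_hat @ O1 @ W2)        # Eq. (17)
    return O2
```

Because $\sigma$ is an element-wise sigmoid, each entry of $O^2$ lies in $(0, 1)$ and can be read as a per-label posterior probability.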

In Equations (16) and (17), the role of the normalized adjacency matrix $\tilde{D}^{-\frac{1}{2}}\tilde{A}\tilde{D}^{-\frac{1}{2}}$ is to enrich the feature vector of each node by linearly combining the feature vectors of its neighbors. This follows from the basic assumption of a GCN: neighboring nodes (and thus nodes with similar neighborhoods) are more likely to belong to the same class. The role of $W^1$ and $W^2$ is to transform the feature dimension of the nodes, turning sparse high-dimensional node features into dense low-dimensional representations. In addition, Equations (16) and (17) can be equivalently described as filtering the input signal (i.e., the node features $X$) through a graph Fourier transform in the graph spectral domain [32]; in this study, however, we work in the spatial domain.
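The aggregation can be seen on a toy example: for a three-node path graph with one-hot features, multiplying by the normalized adjacency replaces each node's feature with a degree-weighted mix of its own and its neighbors' features. This is only an illustration, not the authors' code.

```python
import numpy as np

# Path graph 0-1-2; one-hot features make the neighbor mixing visible.
A = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
A_tilde = A + np.eye(3)
d = A_tilde.sum(axis=1)                          # degrees with self-loops: [2, 3, 2]
A_hat = np.diag(d ** -0.5) @ A_tilde @ np.diag(d ** -0.5)

X = np.eye(3)                                    # one-hot node features
H = A_hat @ X                                    # row i mixes node i with its neighbors
# Row 0 has nonzero weight only on nodes 0 and 1 (itself and its sole neighbor),
# and zero weight on node 2, which is two hops away.
```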

Eventually, given the training labels $B^{train}$, we minimize the following cross-entropy error to learn the classifier, where $p$ is the number of training samples, $m$ is the size of the label set, and *nc* stands for node classifier. By minimizing $\mathcal{L}_{\text{nc}}$, we learn the parameters of the GCN such that it predicts the posterior probability of the class to which each unlabeled node belongs.

$$\mathcal{L}_{\text{nc}} = -\sum_{i=1}^{p} \sum_{j=1}^{m} B_{ij}^{train} \ln O_{ij}^{2} \tag{18}$$
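A minimal sketch of Eq. (18), assuming `O2` is restricted to the $p$ training rows; the function name and the small `eps` guard against $\ln 0$ are illustrative additions, not part of the paper's formulation.

```python
import numpy as np

def cross_entropy_loss(O2, B_train):
    """Eq. (18): cross-entropy summed over p training nodes and m labels.

    O2      -- (p, m) predicted posterior probabilities for the training nodes
    B_train -- (p, m) binary ground-truth label matrix
    """
    eps = 1e-12                          # numerical guard against log(0)
    return -np.sum(B_train * np.log(O2 + eps))
```

For a single node with true label 0 predicted with probability 0.8, the loss reduces to $-\ln 0.8 \approx 0.223$.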
