*3.4. Edge Generator*

The edge generator described in this section estimates the relation between virtual nodes and real nodes, which facilitates feature propagation, feature extraction, and node classification. The edge generator is trained on real nodes and existing edges. Following previous work [17], the inter-node relation is modeled as a weighted inner product of node features. Specifically, for two nodes *v<sub>i</sub>* and *v<sub>j</sub>*, let *E<sub>ij</sub>* denote the probability that an edge exists between them, which is computed as

$$E\_{ij} = \sigma(\mathbf{x}\_i \cdot \mathbf{W}^{\text{edge}} \cdot \mathbf{x}\_j^T) \tag{13}$$

where *x<sub>i</sub>* and *x<sub>j</sub>* are the feature vectors of *v<sub>i</sub>* and *v<sub>j</sub>*, respectively, *W*<sup>edge</sup> ∈ ℝ<sup>*k*×*k*</sup> is the weight matrix to be learned, and *σ* is the sigmoid function. Then, the extended adjacency matrix *A′* is defined as follows:

$$A'\_{ij} = \begin{cases} A\_{ij}, & \text{if } v\_i \text{ and } v\_j \text{ are real nodes} \\\ E\_{ij}, & \text{if } v\_i \text{ or } v\_j \text{ is a synthetic node} \end{cases} \tag{14}$$
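Equations (13) and (14) can be illustrated with a minimal NumPy sketch. It is not the authors' implementation: the function names, the toy graph, the random features, and the convention that real nodes occupy the first `n_real` rows of the feature matrix are all assumptions made for illustration.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def edge_probabilities(X, W_edge):
    """Eq. (13) for all node pairs at once: E[i, j] = sigmoid(x_i W x_j^T)."""
    return sigmoid(X @ W_edge @ X.T)

def extended_adjacency(A, E, n_real):
    """Eq. (14): keep A for real-real pairs, use E when a synthetic node is involved.

    Assumes (hypothetically) that the first n_real rows of X are real nodes.
    """
    A_ext = E.copy()
    A_ext[:n_real, :n_real] = A
    return A_ext

# Toy data: 4 real nodes, 2 synthetic nodes, k = 3 features (all made up).
rng = np.random.default_rng(0)
n_real, n_virtual, k = 4, 2, 3
X = rng.normal(size=(n_real + n_virtual, k))
W_edge = rng.normal(size=(k, k))
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

E = edge_probabilities(X, W_edge)
A_ext = extended_adjacency(A, E, n_real)
```

Note that, in this sketch, `E` is symmetric only if `W_edge` is symmetric, so the predicted edges are in general directed unless a symmetry constraint is added.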

Compared to *A*, *A′* contains new information about virtual nodes and edges, which is passed to the node classifier in Section 3.5. Because the edge generator is partially trained through the final node classifier (see Section 3.6), the predicted edges are kept continuous so that gradients can be computed and propagated back from the node classifier. Thus, *E<sub>ij</sub>* is not discretized to a value in {0, 1}. The edge generator should also be capable of accurately predicting real edges in order to generate realistic virtual nodes. Accordingly, the loss function for pre-training the edge generator is

$$\mathcal{L}\_{\text{edge}} = \left\| E - A \right\|^2 \tag{15}$$

where *E* refers to predicted edges between real nodes.
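The pre-training objective in Equation (15) can be sketched as follows, assuming the norm is the Frobenius norm (the text does not state this explicitly) and using plain gradient descent on a toy graph. The function names, learning rate, step count, and data are hypothetical. Because *E* stays continuous, the gradient with respect to the weight matrix has a closed form via the chain rule.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def edge_loss(X, W, A):
    """Eq. (15) with an assumed Frobenius norm: sum of squared (E - A) entries."""
    E = sigmoid(X @ W @ X.T)
    return np.sum((E - A) ** 2)

def edge_loss_grad(X, W, A):
    """Gradient of edge_loss with respect to W, derived by the chain rule."""
    E = sigmoid(X @ W @ X.T)
    G = 2.0 * (E - A) * E * (1.0 - E)  # dL/dZ, where Z = X W X^T
    return X.T @ G @ X                 # dL/dW

# Toy data: 4 real nodes with k = 3 random features (made up for illustration).
rng = np.random.default_rng(0)
X = rng.normal(size=(4, 3))
A = np.array([[0, 1, 0, 0],
              [1, 0, 1, 0],
              [0, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)

W = np.zeros((3, 3))
loss_before = edge_loss(X, W, A)
for _ in range(200):                   # a few plain gradient-descent steps
    W -= 0.005 * edge_loss_grad(X, W, A)
loss_after = edge_loss(X, W, A)
```

If the entries of *E* were thresholded to {0, 1} before computing the loss, the factor `E * (1 - E)` would have no counterpart and the gradient would vanish almost everywhere, which is the practical reason for keeping the predictions continuous.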
