Article

Unsupervised Graph Structure Learning Based on Optimal Graph Topology Modeling and Adaptive Data Augmentation

1 Shanghai Engineering Research Center of Intelligent Education and Bigdata, Shanghai Normal University, Shanghai 200234, China
2 Ant Group, Hangzhou 310023, China
3 College of Computer Science and Technology, Zhejiang University, Hangzhou 310058, China
4 Shanghai Key Laboratory of Trustworthy Computing, East China Normal University, Shanghai 200062, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(13), 1991; https://doi.org/10.3390/math12131991
Submission received: 27 May 2024 / Revised: 23 June 2024 / Accepted: 25 June 2024 / Published: 27 June 2024
(This article belongs to the Special Issue Advances in Data Mining, Neural Networks and Deep Graph Learning)

Abstract

Graph neural networks (GNNs) are effective for structured data analysis but suffer reduced learning accuracy due to noisy connections and the need for explicit graph structures and labels. This requirement constrains their usability in diverse graph-based applications. To address these issues, considerable research has been directed toward graph structure learning, which aims to concurrently denoise graph structures and refine GNN parameters. However, existing graph structure learning approaches encounter several challenges, including dependence on label information, underperformance of learning algorithms, insufficient data augmentation methods, and limitations in performing downstream tasks. We propose Uogtag, an unsupervised graph structure learning framework, to address these challenges. Uogtag optimizes graph topology by selecting suitable graph learners for the input data and incorporates contrastive learning with adaptive data augmentation, enhancing the learning and applicability of graph structures for downstream tasks. Comprehensive experiments on various real-world datasets demonstrate Uogtag's efficacy in managing noisy graphs and label scarcity.

1. Introduction

In real-world scenarios, a set of objects and their interrelations are often represented as graphs. Graph data, such as citation networks, social networks, and knowledge databases, have gained widespread attention for their applicability. With the advancement of deep learning, graph neural networks (GNNs) have become the leading paradigm for learning node representations on graphs. GNNs utilize message-passing mechanisms to aggregate neighboring information, thereby updating and refining node representations. This process encapsulates node attributes, adjacent data, and local graph topology, yielding notable outcomes in areas like social networks [1], traffic systems [2], recommendation systems [3], and biochemistry [4].
However, real-world graphs often contain noise. Training GNNs on graphs with noisy edges or scarce labels can significantly impair their effectiveness [5]. Because of message passing, GNNs are vulnerable to adversarial or noisy edges. For example, poisoning attacks that introduce edges between nodes with different labels or features can corrupt node neighborhoods, degrading node representations and the overall performance of GNNs [6].
To address the aforementioned issues, deep graph structure learning (GSL) has become a current research hotspot. The goal of graph structure learning methods is to learn a denoised graph structure while optimizing the parameters of graph neural networks (GNNs) [7], following the traditional paradigm illustrated in Figure 1a. However, these methods face the following challenges:
  • Dependence on label information. Supervised GSL heavily relies on manually annotated labels for structural enhancements, which is often impractical due to the high cost of data annotation [8].
  • Limitations of graph learners. As the core module in graph structure learning, the diversity and adaptability of graph learners are vital for handling different datasets. However, most current methods employ a single graph learner, which may lead to significant performance discrepancies across different datasets [5,9].
  • Inadequate data augmentation. Effective data augmentation methods are lacking in current graph learning strategies, restricting the potential for structural discovery and refinement [10].
  • Task-specific restrictions. The focus on specific tasks like node classification could result in overly specialized structures, hindering broad application and generalization in other tasks, such as link prediction or clustering.

To tackle the problems mentioned above, our paper proposes a novel unsupervised graph structure learning paradigm, as shown in Figure 1b. In our learning paradigm, structures are derived directly from data without any external labels, thus achieving a versatile and unbiased topology adaptable to various downstream tasks. We propose a contrastive learning module that employs adaptive data augmentation for unsupervised structure learning through self-supervised methods. We model the most accurate graph topology by selecting the most appropriate graph learning module based on the data's known topology and scale. Subsequently, an "anchor graph" is constructed to guide structural optimization. Here, contrastive views generated via adaptive augmentation are employed alongside a contrastive loss to maximize mutual information between the two views, unveiling potential connections and promoting a balanced edge distribution. Our contributions are threefold:
  • We propose an unsupervised learning paradigm that addresses the major challenges faced by most traditional graph structure learning (GSL) methods, namely reliance on labeled data, the constraints of single graph learners, the lack of effective data augmentation, and limitations in downstream tasks, enhancing practicality across various tasks and adapting to different graph data types and sizes.
  • Our novel GSL model, Uogtag, judiciously selects the optimal graph learner for differing input data, refining graph structures through an adaptive data augmentation-driven contrastive learning module.
  • Through exhaustive experimentation on real-world datasets, we validate Uogtag’s robustness across multiple GSL tasks under distinct adversarial conditions and across various parameter settings, showcasing its efficacy in managing noise and sparse labeling.

2. Related Work

2.1. Graph Structure Learning

Graph neural networks (GNNs) have attracted considerable attention due to their outstanding performance in various graph applications such as link prediction [11] and social recommendation [12]. However, most GNNs treat the observed graph structure as ground truth, ignoring issues such as noise and missing information, which severely affects the embedding quality of GNNs in defective graph structures [13]. To alleviate this problem, there has been recent work on graph structure learning, which is a machine learning method concerning graph data aimed at jointly optimizing the structure of the graph and the parameters of GNNs to achieve better performance in downstream tasks [14].
Graph structure learning typically follows a generic process: input a graph with node features (the topological information of the graph is not necessarily provided), continually optimize the graph structure through structural modeling, use the optimized graph structure for message passing to generate node representations for downstream tasks, and then iteratively update the graph structure and node representations. Graph structure modeling is a key module of graph structure learning, and existing methods can be broadly categorized into three types [15]:
Metric-based methods: These methods obtain edge weights for node pairs using some metric functions (taking representations of node pairs as input). Relevant works include Yu et al. [16], who describe the graph learning problem as a metric learning problem, use cosine similarity as the metric function, and employ adaptive graph regularization to control the quality of the learned graph. Metric-based methods rely heavily on predefined metrics, which may not generalize well across different datasets. These methods often fail to adapt to the varying scales and complexities of real-world graphs, limiting their applicability. Our method addresses this by using adaptive data augmentation techniques, which avoid over-reliance on predefined metrics and improve generalization across different datasets.
Neural network methods: Compared to metric learning-based methods, neural network methods use more complex deep neural networks to model node features and represent edge weights. Relevant works include Luo et al. [17], who use multi-layer perceptrons to generate intermediate graph adjacency matrices, followed by discrete sampling. Ling et al. [18] utilize attention mechanisms to model edge connections, as attention mechanisms can capture complex interactions between nodes. While powerful, neural network methods often require significant computational resources and may overfit to specific graph structures. They also tend to be sensitive to hyperparameter settings and may not effectively handle noisy or incomplete graphs. Our self-supervised learning scheme does not require external label information and incorporates adaptive data augmentation, enabling effective handling of noisy and incomplete graph data, reducing dependence on computational resources, and improving model robustness.
Direct approaches: This approach considers the adjacency matrix as a freely learnable matrix. Compared to the first two methods, direct approaches have greater flexibility as they do not rely on node representations to model edge connections. For example, Jin et al. [19] introduce a low-rank prior implemented using nuclear norm and use an alternating optimization scheme to update the adjacency matrix and GNN parameters iteratively. Zhang et al. [20] further enhance this approach by introducing virtual nodes and using the Gumbel-Softmax function to optimize the adjacency matrix through gradient descent, achieving improved performance and robustness in graph structure learning. While direct approaches offer flexibility, they often suffer from high computational costs and complexity in learning the adjacency matrix parameters. They may also struggle with scalability and robustness when applied to large and noisy datasets. Our method selectively uses different graph structure modeling modules according to different data inputs, reducing computational costs and improving the efficiency and robustness of learning adjacency matrix parameters.
In summary, while each method—metric-based, neural network-based, and direct approaches—has its strengths and weaknesses, our proposed method integrates adaptive data augmentation and self-supervised learning to address these limitations. By selectively using different graph structure modeling modules based on the input data, our approach enhances generalization, reduces computational costs, and improves robustness in handling noisy and incomplete graphs, providing a more flexible and efficient solution for graph structure learning.

2.2. Contrastive Learning on Graphs

Contrastive learning has achieved significant success in the fields of vision and language and is becoming increasingly popular in the domain of graph representation learning. A plethora of contrastive learning methods based on maximizing mutual information have emerged, among which self-supervised contrastive learning, due to its learning paradigm that does not rely on labeled data, has become a current research hotspot [21].
Self-supervised contrastive learning generates different data forms from different perspectives by adopting various augmentation methods for the original input data. It then learns by contrasting the differences between different views, with the commonality and difference signals between data pairs serving as the supervisory signal [22]. Graph contrastive learning models can be broadly divided into three types: node-level contrast, global contrast, and node-global contrast [23]. Peng et al. [24] proposed the node-level graph contrast method GMI, extending the mutual information between views to be calculated from both node features and topological structure. In GMI, each node acts as a central node, calculating the mutual information of the central node with its neighboring nodes in terms of feature dimensions and structural dimensions, and maximizing mutual information to achieve multi-view contrastive learning, extracting features and topological information of graph data. Velickovic et al. [25] proposed the DGI model, which achieves a node-global contrastive learning framework by contrasting the local neighborhood structure of nodes in a graph with their higher-order neighborhood structure, and constructs a contrastive loss for network optimization by maximizing contrastive mutual information. You et al. [26] proposed the global-scale graph contrast model GraphCL, which, through different data augmentation methods such as node and edge dropout and subgraph partitioning, obtains a wider variety of augmented views to fully mine the hidden patterns in graph data. Sun et al. [27] introduced the Adaptive Line Graph Contrastive Learning (ALGC) method, which converts edges in the graph to line graph nodes and uses an adaptive augmented strategy based on prediction feedback to adjust the edges for an augmented line graph, effectively balancing information in both views to enhance performance and robustness in biomedical interaction prediction.
Much of the innovation in self-supervised contrastive learning currently focuses on data augmentation, which can expand training data without collecting more labeled information, usually by modifying existing data or generating new data to achieve the effect of data augmentation [28]. Moreover, data augmentation can effectively reduce the risk of overfitting during training, a method that has been widely applied in computer vision, natural language processing, and other areas. Existing self-supervised graph contrastive learning frameworks, such as SUBLIME [8], adopt random edge deletion and feature deletion for augmentation, ignoring the intrinsic connections of the original graph structure. We believe that data augmentation methods should reflect the intrinsic connections of data; therefore, unlike the traditional random augmentation of edges and features, our model's contrastive learning data augmentation module adopts a scheme based on node and feature centrality to enhance edges and attributes adaptively.

3. Preliminaries

Before presenting our self-supervised graph structure learning framework, we first introduce some symbols and basic concepts. A graph with attributes can be represented as $G = (V, E, X) = (A, X)$, where $V$ is the set of nodes in the graph, $E$ is the set of edges, $X \in \mathbb{R}^{n \times d}$ is the feature matrix of nodes, $x_i \in \mathbb{R}^{1 \times d}$ is the feature vector of node $v_i$, and $A \in \{0, 1\}^{n \times n}$ is the adjacency matrix, where $a_{ij}$ represents the specific element of the adjacency matrix, with a value of 1 or 0 indicating whether there exists an edge between nodes $(i, j)$.
Our paper considers two unsupervised GSL tasks, namely structure inference and structure refinement. Structure inference is mainly used to infer the basic structure of the graph from the given data (at this time, the graph structure may be unknown), while structure refinement is mainly used to optimize and improve the existing graph structure, making the learned graph structure more accurate and suitable for specific tasks.
Definition 1.
Structure Inference: Given a feature matrix $X \in \mathbb{R}^{n \times d}$ (corresponding to the case where the adjacency matrix $A$ of the input data is unknown, as illustrated in Figure 2), the goal of structure inference is to automatically learn a graph topology $\tilde{S} \in [0, 1]^{n \times n}$ that reflects the correlations and organizational structure of the data. Here, $\tilde{S}_{ij} \in [0, 1]$ indicates whether there exists an edge between nodes $x_i$ and $x_j$.
Definition 2.
Structure Refinement: Given a noisy graph structure $G = (A, X)$ (corresponding to the case where the adjacency matrix $A$ of the input data is known, as illustrated in Figure 2), structure refinement optimizes $A$ by adding, deleting, or modifying edges, such that the newly generated adjacency matrix $\tilde{S} \in [0, 1]^{n \times n}$ more accurately reflects the latent relationships between nodes. The graph topology $\tilde{S}$ can either be automatically learned from data or optimized from an existing graph structure. Our hypothesis is that using $G_l = (\tilde{S}, X)$ as input for downstream tasks can fundamentally enhance the model's performance on these tasks.

4. The Proposed Model

This section provides a detailed description of our new model, Uogtag, as illustrated in Figure 2. Uogtag consists of two main components. The first is the graph structure learning module, which initially models and regularizes the graph topology based on the input data. The second is the graph contrastive learning module, which performs contrastive learning with an adaptive data augmentation approach to refine the graph structure by maximizing the mutual information between the anchor view and the learning view.
In the graph structure learning phase, Uogtag selects different graph learners based on whether the topological structure of the input graph is known. These graph learners parameterize an initial adjacency matrix. Subsequently, this parameterized adjacency matrix undergoes post-processing to generate a preliminarily learned adjacency matrix, which is then provided to the graph contrastive learning module.
The graph contrastive learning module, based on adaptive data augmentation, establishes two distinct contrastive views: the learner view obtained from the graph structure learning module and the anchor view that guides the learning process. Unlike traditional random data augmentation methods, the adaptive data augmentation method assigns different deletion probabilities to each edge to guide the model in ignoring potential edge noise. This augmentation approach better reflects the inherent patterns of the graph. After data augmentation, node-level contrastive learning is employed to maximize the consistency between the two views. Additionally, a structural bootstrapping mechanism is used to update the anchor view with the learned structure, continuously enhancing the learning process. The following sections will separately explain the key components of our framework.
Figure 2. The overall process of Uogtag. In the graph structure learning module, either Graph Learner I or Graph Learner II is chosen based on whether the input data contain a topological structure to model the optimal graph structure (refer to Figure 1b for the Optimal Learner Selection section). Then, the contrastive learning module based on adaptive data augmentation (refer to Figure 1b for the Adaptive Data Augmentation section) maximizes the consistency between the anchor view and the learning view to optimize the final learned adjacency matrix $\tilde{S}$.

4.1. Graph Learner

As a key component of graph structure learning, this section employs two types of graph learners to generate a sketch adjacency matrix $\tilde{S}$. For inputs without topological information, we utilize a fully graph parameterized (FGP) learner and two metric-learning-based learners, namely an attentive learner and a multi-layer perceptron (MLP) learner; graph learner I selects among them depending on the scale of the input dataset. In contrast, when the initial topological information of the graph data is known, we adopt a graph learner based on graph neural networks (GNNs), denoted as graph learner II, as it can utilize the additional topological information for better graph structure learning performance.

4.1.1. Graph Learner I

  • FGP learner
The FGP learner models each element of the adjacency matrix with learnable parameters and then ensures training stability through a non-linear activation function. Formally, the FGP learner is defined as
$S = p_{\omega}^{FGP} = \sigma(\Omega)$
where $\omega = \Omega \in \mathbb{R}^{n \times n}$ is a parameter matrix and $\sigma(\cdot)$ is a non-linear function that stabilizes the training. The FGP learner assumes that each edge in the graph exists independently.
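As an illustration, the following minimal PyTorch sketch shows how such an FGP learner could be parameterized; the class structure and the choice of ELU as the stabilizing activation are our own illustrative assumptions rather than the exact implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class FGPLearner(nn.Module):
    # Minimal sketch of the FGP learner: one free parameter per potential edge.
    def __init__(self, n_nodes, init_adj=None):
        super().__init__()
        omega = init_adj.clone().float() if init_adj is not None else torch.randn(n_nodes, n_nodes)
        self.omega = nn.Parameter(omega)

    def forward(self):
        # S = sigma(Omega); ELU is used here to stabilize training.
        # Non-negativity and symmetry are handled later by the post-processor.
        return F.elu(self.omega)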
In contrast to the FGP learner, metric-learning-based learners first obtain node embeddings $E \in \mathbb{R}^{n \times d}$ from the data and then model $S$ using the pairwise similarity of the embeddings:
$S = p_{\omega}^{ML}(X, A) = \phi(h_{\omega}(X, A)) = \phi(E)$
where $h_{\omega}(\cdot)$ is an embedding function based on neural networks with parameters $\omega$, and $\phi(\cdot)$ is a non-parametric metric function (such as cosine similarity) used to calculate pairwise similarity. For different choices of $h_{\omega}(\cdot)$, we provide two metric-learning-based learners: the attentive learner and the multi-layer perceptron learner.
  • Attentive learner
The attention learner uses an attention network similar to GAT as the embedding network, where each layer computes the element-wise product of input feature vectors and parameter vectors:
$E^{(l)} = h_{\omega}^{(l)}(E^{(l-1)}) = \sigma\left(\left[e_1^{(l-1)} \odot \omega^{(l)}, \ldots, e_n^{(l-1)} \odot \omega^{(l)}\right]^{T}\right)$
where $E^{(l)}$ is the embedding matrix produced by the $l$-th layer of the embedding network, $e_i^{(l-1)} \in \mathbb{R}^{d}$ is the transposed $i$-th row vector of $E^{(l-1)}$, $\omega^{(l)} \in \mathbb{R}^{d}$ is the parameter vector of the $l$-th layer, and $\odot$ denotes the element-wise product. Like GAT, this learner dynamically focuses on the importance of different neighboring nodes during information propagation by learning attention weights, thereby more effectively capturing local patterns in the graph structure.
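A minimal sketch of one such attentive layer is given below; the class name and the use of tanh as the non-linearity are illustrative assumptions.

import torch
import torch.nn as nn

class AttentiveLayer(nn.Module):
    # One attentive embedding layer: every row of E is rescaled element-wise
    # by a learnable weight vector omega, i.e., sigma([e_1 * omega, ..., e_n * omega]^T).
    def __init__(self, dim):
        super().__init__()
        self.omega = nn.Parameter(torch.ones(dim))

    def forward(self, E):
        # Broadcasting multiplies each row of E (n x d) by omega (d).
        return torch.tanh(E * self.omega)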
  • MLP learner
The multi-layer perceptron (MLP) learner uses a multi-layer perceptron as the embedding network. An MLP is a feedforward artificial neural network with multiple layers of nodes, where each layer is fully connected to the next layer. The expression for a single layer can be written as
$E^{(l)} = h_{\omega}^{(l)}(E^{(l-1)}) = \sigma\left(E^{(l-1)} \Omega^{(l)}\right)$
where $\Omega^{(l)} \in \mathbb{R}^{d \times d}$ is the parameter matrix of the $l$-th layer. Compared to the attentive learner, the multi-layer perceptron learner is simpler to implement. In certain tasks, especially when the graph is small or the task is relatively simple, the multi-layer perceptron learner may be a more direct and efficient choice.
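The following sketch illustrates a metric-learning-based learner built from an MLP embedding network followed by cosine similarity as $\phi(\cdot)$; the layer sizes and the ReLU non-linearity are illustrative assumptions.

import torch
import torch.nn as nn
import torch.nn.functional as F

class MLPLearner(nn.Module):
    # Sketch of a metric-learning-based learner: an MLP embeds the node features,
    # then pairwise cosine similarity of the embeddings gives the sketch adjacency S.
    def __init__(self, in_dim, hid_dim, n_layers=2):
        super().__init__()
        layers, d = [], in_dim
        for _ in range(n_layers):
            layers += [nn.Linear(d, hid_dim), nn.ReLU()]
            d = hid_dim
        self.mlp = nn.Sequential(*layers)

    def forward(self, x):
        e = self.mlp(x)                     # node embeddings E
        e = F.normalize(e, p=2, dim=1)      # unit-norm rows
        return e @ e.t()                    # cosine similarity matrix S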
  • Post-processor
The post-processor $p(\cdot)$ can optimize the rough sketch adjacency matrix $S$ learned by the full-graph parameterization (FGP) and the two metric-learning-based learners into a sparse, non-negative, symmetrically normalized adjacency matrix $\tilde{S}$. Therefore, this part involves four post-processing steps, namely sparsification $p_{sp}(\cdot)$, non-negative processing $p_{act}(\cdot)$, symmetrization $p_{sym}(\cdot)$, and normalization $p_{nor}(\cdot)$.
  • Sparsification
The rough sketch adjacency matrix $S$ learned by graph learner I is typically dense. However, such an adjacency matrix is not meaningful for most applications and leads to expensive computational costs. Therefore, we apply k-nearest-neighbor (KNN) sparsification to $S$: for each node, we retain the edges with the top-$k$ connection values and remove the rest. The sparsification process $p_{sp}(\cdot)$ is expressed as
$S_{ij}^{(sp)} = p_{sp}(S_{ij}) = \begin{cases} S_{ij}, & \text{if } S_{ij} \in \operatorname{top-}k(S_{i:}), \\ 0, & \text{otherwise}, \end{cases}$
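A possible implementation of this kNN sparsification step, assuming the dense sketch adjacency is held as a PyTorch tensor, is sketched below.

import torch

def knn_sparsify(S, k=30):
    # Keep only the top-k largest entries in each row of the dense sketch
    # adjacency S (an n x n tensor); all other entries are set to zero.
    values, indices = torch.topk(S, k=k, dim=-1)
    S_sp = torch.zeros_like(S)
    S_sp.scatter_(-1, indices, values)
    return S_sp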
  • Symmetrization and Activation
In the real world, there are many undirected graphs, so we consider optimizing and inferring graph structures under the premise of undirected graphs. Meanwhile, to meet the adjacency matrix’s definition, the edges’ weights should be non-negative. To satisfy these conditions, we perform symmetrization and activation processing on the adjacency matrix as follows:
$S^{sym} = p_{sym}(p_{act}(S^{sp})) = \left(\sigma_p(S^{sp}) + \sigma_p(S^{sp})^{T}\right) / 2$
where $\sigma_p(\cdot)$ is a non-linear activation function. For metric-based learners, we define $\sigma_p(\cdot)$ as the ReLU function, and for the FGP learner, we use the ELU function to prevent vanishing gradients.
  • Normalization
To ensure that the edge weights lie within the range [0, 1], we apply symmetric normalization to $S^{sym}$:
$\tilde{S} = p_{nor}(S^{sym}) = (D^{sym})^{-1/2}\, S^{sym}\, (D^{sym})^{-1/2}$
where $D^{sym}$ is the degree matrix of $S^{sym}$. The final adjacency matrix formed after post-processing is $\tilde{S}$.
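The three remaining post-processing steps can be sketched as follows; the small epsilon guarding empty rows is an implementation detail we add for numerical safety, and a +1 shift can be added after the ELU if strict non-negativity is required for the FGP learner.

import torch
import torch.nn.functional as F

def postprocess(S_sp, metric_based=True, eps=1e-8):
    # Activation (p_act): ReLU for metric-based learners, ELU for the FGP learner.
    S_act = F.relu(S_sp) if metric_based else F.elu(S_sp)
    # Symmetrization (p_sym): average the matrix with its transpose.
    S_sym = (S_act + S_act.t()) / 2
    # Symmetric normalization (p_nor): D^{-1/2} S D^{-1/2}.
    deg = S_sym.sum(dim=1)
    d_inv_sqrt = torch.pow(deg + eps, -0.5)
    return d_inv_sqrt.unsqueeze(1) * S_sym * d_inv_sqrt.unsqueeze(0)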

4.1.2. Graph Learner II

Compared to the learners mentioned previously, GNN learners can utilize additional known topological information, resulting in better learning outcomes. However, GNNs are highly susceptible to carefully designed perturbations; adversarial attacks can easily corrupt the learned adjacency matrix, affecting downstream task prediction performance. Real-world graphs are typically low-rank and sparse, and features of two adjacent nodes are often similar. Inspired by [19], we enhance the learning performance of GNN learners by searching for these properties. To investigate the low-rank and sparsity properties of the adjacency matrix, we frame the problem as a graph structure learning issue:
$\arg\min_{\tilde{S}} \; \mathcal{L}_0 = \| A - \tilde{S} \|_F^2 + R(\tilde{S}), \quad \text{s.t.} \; \tilde{S} = \tilde{S}^{T}$
where $R(\tilde{S})$ represents the constraints on $\tilde{S}$ that enforce its low-rank and sparse properties. Additionally, since adjacent nodes tend to have similar features, the smoothness of the features can be captured by the following term $\mathcal{L}_S$:
$\mathcal{L}_S = \frac{1}{2} \sum_{i,j=1}^{N} s_{ij} \, \| x_i - x_j \|^2$
So the final objective function can be expressed as
$\arg\min \; \mathcal{L} = \alpha \mathcal{L}_0 + \beta \mathcal{L}_S, \quad \text{s.t.} \; \tilde{S} = \tilde{S}^{T}$
where $\alpha$ and $\beta$ are predefined parameters: $\alpha$ controls the preservation of the low-rank and sparsity properties of the adjacency matrix, and $\beta$ controls the contribution of feature smoothness. Following the above principles, and taking the feature matrix $X$ and the original adjacency matrix $A$ as inputs to graph learner II, we obtain
$E^{(l)} = h_{\omega}^{(l)}(E^{(l-1)}, A) = \sigma\left( D^{-1/2} \tilde{A} D^{-1/2} E^{(l-1)} \Omega^{(l)} \right)$
where $\tilde{A} = A + I$ is the adjacency matrix with self-loops, and $D$ and $A$ represent the degree matrix and the adjacency matrix, respectively.
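The sketch below illustrates how such a regularized objective could be evaluated; realizing $R(\tilde{S})$ as a nuclear norm plus an L1 penalty, and the specific weights, are illustrative assumptions consistent with the stated low-rank and sparsity priors rather than the exact formulation.

import torch

def learner2_loss(S, A, X, alpha=1.0, beta=0.1, gamma_nuc=1.0, gamma_l1=1e-3):
    # Sketch of the graph learner II objective: reconstruct the noisy adjacency A,
    # add low-rank / sparsity priors R(S), and penalize feature non-smoothness.
    S_sym = (S + S.t()) / 2                               # softly enforce S = S^T
    l0 = torch.norm(A - S_sym, p='fro') ** 2
    l0 = l0 + gamma_nuc * torch.linalg.matrix_norm(S_sym, ord='nuc') \
            + gamma_l1 * S_sym.abs().sum()
    deg = torch.diag(S_sym.sum(dim=1))
    lap = deg - S_sym                                     # graph Laplacian
    ls = torch.trace(X.t() @ lap @ X)                     # 1/2 * sum_ij s_ij ||x_i - x_j||^2
    return alpha * l0 + beta * ls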

4.2. Multi-View Contrastive Learning Based on Adaptive Data Augmentation

After generating an initial adjacency matrix S ˜ using a graph learner based on real datasets, we perform multi-view contrastive learning. Specifically, we construct two views based on the learned structure and the original data, respectively. Then, we augment these two data graphs and utilize node-level contrastive learning to maximize the mutual information between the two augmented views. The following sections elaborate on this process.

4.2.1. Construction of Graph Views

Unlike the method of directly augmenting the original graph to form two views and performing contrastive learning on these two views, our framework defines the graph learned by the graph learner as one view, called the “learner view”. The graph initialized from the original input data is defined as another view, called the “anchor view”.
The learner view, denoted as $G_l = (\tilde{S}, X)$, consists of the adjacency matrix $\tilde{S}$ learned by the graph learner and the feature matrix $X$ of the original data. In subsequent iterative loops, the parameters modeling it are updated via gradient descent to optimize the graph structure.
The anchor view provides stable guidance for GSL. For structure refinement tasks where the original adjacency matrix $A$ is available, we define the anchor view as $G_a = (A_a, X) = (A, X)$. For structure inference tasks where the initial adjacency matrix is unknown, we set the anchor view as $G_a = (A_a, X) = (I, X)$, where $I$ is the identity matrix. To ensure stable guidance, the anchor view is not updated via gradient descent but instead utilizes a bootstrapping mechanism.

4.2.2. Graph Data Augmentation

With the development of graph contrastive learning, data augmentation schemes have proven to be a key component of the learning process. Traditional approaches to augmenting graph data typically involve random edge and feature dropout, which overlooks the differing impact of individual nodes and edges during data augmentation. For example, constructing views by uniformly removing edges can degrade the quality of embeddings by discarding influential edges. Since contrastive learning encourages the learned representations to be invariant to the applied augmentations, the data augmentation strategy should reflect the underlying characteristics of the input graph.
Taking edge removal as an example, during random edge dropout, we can assign a higher dropout probability to less important edges and a lower dropout probability to important edges. This approach guides the model to ignore less important edges that may be noise. Specifically, when randomly deleting edges or masking node features, the deletion probability is skewed towards unimportant edges or features, resulting in a higher probability of deletion for unimportant edges or features and a lower probability of deletion for important edges or features.
  • Topological-Level Enhancement
For topological-level enhancement, we modify the initial graph using random edge deletion. We sample a subset $\widetilde{E}$ from the original edge set $E$ with the following probability:
$P\{(u, v) \in \widetilde{E}\} = 1 - p_{uv}^{e}$
where $(u, v) \in E$ and $p_{uv}^{e}$ is the probability of removing the edge $(u, v)$. Thus, $\widetilde{E}$ represents the edge set of the graph generated after data augmentation, and $p_{uv}^{e}$ reflects the importance of the edge $(u, v)$. The enhancement function aims to discard unimportant edges with a high probability while preserving important connectivity structures.
Node centrality is a widely used measure for assessing the influence of nodes in a graph. We denote the centrality measure of a node as $\phi_c(\cdot)$. In an undirected graph, the centrality $\omega_{uv}^{e}$ of edge $(u, v)$ is defined as the average centrality score of the two connected nodes, i.e., $\omega_{uv}^{e} = (\phi_c(u) + \phi_c(v)) / 2$.
Next, we can calculate the probability of each edge being discarded based on its edge centrality value. Since nodes have varying degrees of connectivity strength, normalization is required. We use $\Omega_{uv}^{e} = \log(\omega_{uv}^{e})$ to mitigate the influence of nodes with strong connectivity density. The dropout probability of edge $(u, v)$ can be expressed as
$p_{uv}^{e} = \min\left( \frac{\Omega_{\max}^{e} - \Omega_{uv}^{e}}{\Omega_{\max}^{e} - \mu_{s}^{e}} \cdot p_e, \; p_\tau \right)$
where $p_e$ and $p_\tau$ are two hyperparameters used to prevent the excessive destruction of the graph due to large deletion probabilities: $p_e$ controls the overall probability of deleting edges, and $p_\tau < 1$ is the truncation probability. $\Omega_{\max}^{e}$ and $\mu_{s}^{e}$ are the maximum and mean values of $\Omega_{uv}^{e}$, respectively. For the selection of node centrality, we adopt three centrality measures: degree centrality [29], eigenvector centrality [30], and PageRank centrality [31].
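A possible realization of this centrality-guided edge dropout, using degree centrality and an edge list in COO format, is sketched below; the function names and default hyperparameters are illustrative assumptions.

import torch

def edge_drop_probs(edge_index, deg, p_e=0.3, p_tau=0.7):
    # edge_index: 2 x |E| LongTensor; deg: per-node degree centrality.
    # Edges between low-centrality nodes receive higher removal probabilities.
    u, v = edge_index
    w = (deg[u] + deg[v]) / 2                 # edge centrality: average of the endpoints
    s = torch.log(w + 1e-8)                   # damp the influence of high-degree nodes
    p = (s.max() - s) / (s.max() - s.mean() + 1e-8) * p_e
    return torch.clamp(p, max=p_tau)          # truncate so the graph is not destroyed

def drop_edges(edge_index, probs):
    # Sample the augmented edge set: keep edge (u, v) with probability 1 - p_uv.
    keep = torch.rand(probs.size(0)) >= probs
    return edge_index[:, keep]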
  • Attribute-Level Enhancement
To enhance node features, we add noise by randomly masking a portion of the node feature dimensions with zeros. Specifically, we independently draw a random masking vector $m$ whose entries follow a Bernoulli distribution, i.e., $m_i \sim \text{Bern}(1 - p_i^{f})$. The generated node features $\hat{X}$ are then obtained by the following computation:
$\hat{X} = \left[ x_1 \odot m;\; x_2 \odot m;\; \ldots;\; x_N \odot m \right]^{T}$
where ⊙ denotes element-wise multiplication. Similar to topological-level enhancement, the probability p i f should reflect the importance of the i-th feature dimension of node features. Our assumption is based on the idea that feature dimensions that frequently appear in important nodes should be important. The weight of feature dimension i is defined as follows for any node u with sparse one-hot encoded node features:
$\omega_i^{f} = \sum_{u \in V} x_{ui} \cdot \phi_c(u)$
where $\phi_c(u)$ is the centrality measure of node $u$, and $x_{ui} \in \{0, 1\}$ indicates whether dimension $i$ appears in the features of node $u$. Similar to the topological enhancement, we normalize the weights to obtain probabilities representing the importance of features:
$p_i^{f} = \min\left( \frac{\Omega_{\max}^{f} - \Omega_i^{f}}{\Omega_{\max}^{f} - \mu_{s}^{f}} \cdot p_f, \; p_\tau \right)$
where $\Omega_i^{f} = \log(\omega_i^{f})$, $\Omega_{\max}^{f}$ and $\mu_{s}^{f}$ are the maximum and average values of $\Omega_i^{f}$, respectively, and $p_f$ is a hyperparameter controlling the overall magnitude of feature enhancement.
After applying topological-level and attribute-level enhancements, enhanced views $\bar{G}_l$ and $\bar{G}_a$ are generated for the learning view and anchor view, respectively, for subsequent operations.
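Analogously, the attribute-level enhancement can be sketched as follows for binary node features; again, the function names and defaults are illustrative assumptions.

import torch

def feature_mask_probs(X, deg, p_f=0.3, p_tau=0.7):
    # Centrality-weighted masking probabilities for binary (one-hot style) features:
    # dimensions that frequently occur in important nodes are masked less often.
    w = (X * deg.unsqueeze(1)).sum(dim=0)     # w_i = sum_u x_ui * centrality(u)
    s = torch.log(w + 1e-8)
    p = (s.max() - s) / (s.max() - s.mean() + 1e-8) * p_f
    return torch.clamp(p, max=p_tau)

def mask_features(X, probs):
    # Draw one Bernoulli mask over feature dimensions and apply it to all nodes.
    m = (torch.rand(probs.size(0)) >= probs).float()
    return X * m.unsqueeze(0)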

4.2.3. Node-Level Contrastive Learning

After forming two enhanced graph views through graph data augmentation, we proceed with node-level graph contrastive learning to maximize their mutual information. We employ a contrastive learning framework inspired by SimCLR [32], which consists of the following components:
  • GCN-Based Encoder
The GCN-based encoder $f_\theta(\cdot)$ extracts node-level representations for the enhanced views $\bar{G}_l$ and $\bar{G}_a$:
$H_l = f_\theta(\bar{G}_l)$
$H_a = f_\theta(\bar{G}_a)$
where $\theta$ represents the parameters of the encoder $f_\theta(\cdot)$, and $H_l, H_a \in \mathbb{R}^{n \times d}$ are the representation matrices for the learning view and the anchor view, respectively. The number of GCN layers is set to 2.
  • Projection with MLP
After encoding with the GCN encoder, we use an MLP projector $d_\phi(\cdot)$ to map the node representations to another latent space. The projected node representation matrices are
$Z_l = d_\phi(H_l)$
$Z_a = d_\phi(H_a)$
where $\phi$ represents the parameters of the projector $d_\phi(\cdot)$, and $Z_l, Z_a \in \mathbb{R}^{n \times d}$ are the projected node representation matrices for the learning view and the anchor view, respectively.
  • Node-Level Contrastive Loss
The contrastive loss function is used to enforce consistency between the projections $z_{l,i}$ and $z_{a,i}$ of the same node $v_i$ in the two views. We employ a symmetric normalized temperature-scaled cross-entropy loss as our loss function:
$\mathcal{L} = \frac{1}{2n} \sum_{i=1}^{n} \left[ \ell\left(z_{l,i}, z_{a,i}\right) + \ell\left(z_{a,i}, z_{l,i}\right) \right]$
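A sketch of this symmetric loss is given below, under the standard NT-Xent assumption that, for node $v_i$, the same node in the other view is the positive sample while all other nodes act as negatives; the temperature value is illustrative.

import torch
import torch.nn.functional as F

def nt_xent(z_l, z_a, temperature=0.5):
    # Symmetric normalized temperature-scaled cross-entropy over two projected views (n x d').
    # Row i of z_l and row i of z_a form the positive pair for node v_i.
    z_l = F.normalize(z_l, dim=1)
    z_a = F.normalize(z_a, dim=1)
    logits = z_l @ z_a.t() / temperature          # n x n cosine-similarity matrix
    labels = torch.arange(z_l.size(0), device=z_l.device)
    loss_la = F.cross_entropy(logits, labels)     # l(z_{l,i}, z_{a,i}) averaged over nodes
    loss_al = F.cross_entropy(logits.t(), labels) # l(z_{a,i}, z_{l,i}) averaged over nodes
    return 0.5 * (loss_la + loss_al)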

4.3. Structural Bootstrapping Mechanism

Using the adjacency matrix $A$ or $I$ of the original data graph as a fixed anchor view to maximize mutual information between the two views may lead to several issues: (1) the original input graph carries some noise, and if this is not addressed during learning, the learned graph structure will eventually inherit it; (2) the information that a fixed anchor view provides to guide graph structure learning is limited and cannot offer stable and effective supervision throughout training; (3) while maximizing mutual information between the two views, the learned graph structure gradually fits the fixed anchor structure, leading to overfitting. Many recent works adopt a structure bootstrapping mechanism to update the anchor view [33,34]. By slowly updating the anchor graph structure $A_a$ with the learned graph structure $\tilde{S}$, the aforementioned issues can be effectively mitigated. Specifically, given a decay rate $\tau \in [0, 1]$, the anchor view $A_a$ is updated as follows:
$A_a = \tau A_a + (1 - \tau) \tilde{S}$
As the training progresses, some noisy edges gradually decrease in A a , reducing their negative impact on structural learning. Additionally, since the anchor view changes continuously during training, it can continuously provide more effective information for graph learning while also addressing the overfitting problem. To ensure training stability, τ is set to a value close to 1.
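In code, this slow update amounts to an exponential-moving-average style step, sketched below.

import torch

@torch.no_grad()
def bootstrap_anchor(A_a, S_learned, tau=0.999):
    # Slowly blend the learned structure into the anchor adjacency;
    # a tau close to 1 keeps the anchor stable between updates.
    return tau * A_a + (1.0 - tau) * S_learned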

4.4. Model Training Algorithm

The overall algorithm is presented in Algorithm 1. In lines 2 to 6, we initialize the adjacency matrix $A_a$ of the anchor view based on whether the topological structure of the input graph data is known. Starting from line 7, we iteratively optimize the learned adjacency matrix $\tilde{S}$ and the GNN parameters $\theta$. Specifically, in lines 8 to 15, we choose graph learner I or graph learner II based on whether the topological structure of the initial graph data is known, learn the initial adjacency matrix $\tilde{S}$ of the learned view, and construct the anchor and learned views based on the initial node attribute information. Subsequently, in lines 16 to 17, we enhance the anchor view $G_a$ and the learned view $G_l$ using the adaptive data augmentation function $f_{en}(\cdot)$ to obtain $\bar{G}_a$ and $\bar{G}_l$. In lines 18 to 23, we perform contrastive learning on the enhanced anchor view $\bar{G}_a$ and learned view $\bar{G}_l$ and update the parameters of the GNN and the MLP, $\theta$ and $\phi$, respectively. Finally, to provide continuous guidance for the learned view, in lines 24 to 26, we use the learned adjacency matrix $\tilde{S}$ to gradually update the adjacency matrix $A_a$ of the anchor view.
Algorithm 1 Uogtag Framework Training Algorithm
Input: 
Initial feature matrix $X$; initial adjacency matrix $A$ (or identity matrix $I$); structure bootstrapping decay rate and decay interval $\tau$, $c$; adaptive graph augmentation function $f_{en}(\cdot)$; augmentation hyperparameters for controlling feature and topology enhancement $p_f$, $p_e$; truncation probability $p_\tau$; temperature hyperparameter $t$; number of training epochs $E_p$.
Output: 
Learned adjacency matrix $\tilde{S}$
1: Initialize the parameter matrix $\omega$ of the graph learner, the parameters $\theta$ of the GNN, and the parameters $\phi$ of the MLP projector;
2: if $A$ is provided then
3:     Initialize the anchor view adjacency matrix: $A_a = A$;
4: else
5:     Initialize the anchor view adjacency matrix: $A_a = I$;
6: end if
7: for $e = 1, 2, \ldots, E_p$ do
8:     if $A_a = A$ then
9:         Learn the adjacency matrix $\tilde{S}$ using graph learner II;
10:    else
11:        Learn $S$ using graph learner I;
12:        Process $S$ with the post-processor to obtain $\tilde{S}$;
13:    end if
14:    Construct the anchor view $G_a$ with adjacency matrix $A_a$ and feature matrix $X$;
15:    Construct the learned view $G_l$ with the learned adjacency matrix $\tilde{S}$ and feature matrix $X$;
16:    Enhance the anchor view to obtain $\bar{G}_a$ using the adaptive augmentation function $f_{en}(\cdot)$;
17:    Enhance the learned view to obtain $\bar{G}_l$ using the adaptive augmentation function $f_{en}(\cdot)$;
18:    Learn node representations $H_a$ of $\bar{G}_a$ using the GNN $f_\theta(\cdot)$;
19:    Learn node representations $H_l$ of $\bar{G}_l$ using the GNN $f_\theta(\cdot)$;
20:    Obtain the projected representation $Z_a$ from $H_a$ using the MLP $d_\phi(\cdot)$;
21:    Obtain the projected representation $Z_l$ from $H_l$ using the MLP $d_\phi(\cdot)$;
22:    Compute the node-level loss based on $Z_a$ and $Z_l$;
23:    Update the parameter matrix $\omega$, the GNN parameters $\theta$, and the MLP projector parameters $\phi$;
24:    if $e \bmod c = 0$ then
25:        $A_a = \tau A_a + (1 - \tau)\tilde{S}$;
26:    end if
27: end for

5. Experiment

In this section, we conduct experiments to demonstrate the effectiveness of the proposed Uogtag framework. Our goal is to address the following four research questions:
  • RQ1: How effective is the graph structure learned by the Uogtag framework under the unsupervised setting?
  • RQ2: How does the data augmentation module affect the learning performance of Uogtag?
  • RQ3: What is the defense effectiveness of Uogtag against adversarial attacks?
  • RQ4: How do critical hyperparameters affect the performance of Uogtag?

5.1. Experimental Setup

To evaluate the performance of adjacency matrices after graph structure inference or graph structure refinement in downstream tasks, our downstream task utilizes the accuracy of node classification for evaluation. For node classification, we conduct experiments under two scenarios using the accuracy of node classification as a metric to measure the excellence of the learned topology: graph structure inference and graph structure refinement.

5.1.1. Datasets

To assess the performance of our Uogtag framework, we carefully select three datasets which are widely used in graph learning tasks: Cora, Citeseer, and PubMed. In these datasets, each node represents a publication, and each edge represents a citation relationship between two articles. For the overall description of datasets, please refer to Table 1.
  • Cora Dataset
The Cora dataset contains papers classified into seven computer science fields: case-based, genetic algorithms, neural networks, probabilistic methods, reinforcement learning, rule learning, and theory. It provides a diverse range of topics within the field of computer science.
  • Citeseer Dataset
The Citeseer dataset comprises academic papers from six different fields: agents, machine learning (ML), information retrieval (IR), databases (DB), human–computer interaction (HCI), and artificial intelligence (AI).
  • PubMed Dataset
The PubMed dataset focuses on medical literature and classifies papers into three categories: diabetes mellitus experimental, diabetes mellitus type 1, and diabetes mellitus type 2.

5.1.2. Baselines

In our experiments, we compared the performance of our proposed method against several established graph neural network models and graph structure learning methods. These baselines encompass a variety of approaches, from traditional GCN-based models to advanced methods that integrate self-supervision, attention mechanisms, and federated learning. Below, we briefly introduce each baseline model:
GCN [35]: A neural network model that captures node relationships in graph-structured data using convolution operations for tasks like representation learning and node classification.
SGC [36]: A simplified GCN version that removes non-linearities and weight matrices, directly computing powers of the adjacency matrix to reduce computational complexity while maintaining performance.
GAT [37]: Utilizes self-attention mechanisms to assign different weights to neighboring nodes, enhancing the model’s ability to learn from complex graph structures.
GraphSAGE [38]: A framework that generates embeddings for unseen nodes by sampling and aggregating features from their local neighborhood, efficiently handling evolving graphs and new subgraphs.
LDS [39]: A method that simultaneously learns graph structures and GNN parameters using a bilevel program. It models edges as Bernoulli random variables, optimizing graph generation for semi-supervised classification tasks, even when graph structures are noisy or missing.
GRCN [40]: A spatiotemporal traffic flow prediction model that integrates GCN for capturing spatial dependencies and Bi-GRU for modeling bidirectional temporal correlations, effectively predicting short-term and long-term traffic flow.
GEN [41]: A method that estimates optimal graph structures for GNNs using Bayesian inference, integrating structure and observation models to improve node classification performance on noisy or incomplete graphs.
RSGNN [5]: A method designed to enhance GNN performance on noisy graphs with sparse labels by learning a denoised and densified graph structure. It uses a link predictor to eliminate noisy edges and create new edges between similar nodes, improving message passing and semi-supervised classification accuracy.
IDGL [42]: An end-to-end framework that iteratively learns graph structures and node embeddings by optimizing both for downstream tasks. It enhances graph learning using adaptive graph regularization and is scalable via anchor-based approximation, performing well in both transductive and inductive settings.
SLAPS [43]: A method that improves GNN performance by learning graph structures through self-supervision, effectively handling cases where the graph structure is noisy or unavailable. It scales to large graphs and outperforms several established models on benchmark tasks.
FedGL [44]: A federated graph learning framework that uses global self-supervision to enhance local models by discovering and sharing global pseudo labels and graphs, addressing data heterogeneity and complementarity, and significantly improving performance on distributed graph datasets.
VN-GSL [20]: A method for robust graph structure learning that introduces virtual nodes to discover new connections, recalculates edge weights using Gumbel-Softmax for differentiability, and eliminates redundant edges, improving performance and robustness against adversarial attacks on noisy graph data.
RGSLA [45]: A robust GSL approach that aligns node features and graph structure, leveraging sparse dimensional reduction and feature alignment to improve node classification accuracy on noisy graphs.

5.2. Experimental Configurations

The Uogtag framework comprises two main modules: the “Graph Structure Learning Module” and the “Graph Contrastive Learning Module”.
In graph learner I, the model's layer count is set to 2. The number of nearest neighbors, $k$, was determined through multiple experiments as follows: $k = 30$ for the Cora dataset, $k = 20$ for the Citeseer dataset, and $k = 15$ for the PubMed dataset. The similarity function, sim_function, has two selectable options, 'cosine' and 'minkowski', with 'cosine' being the default choice. Graph learner II also uses a GNN with 2 layers. The number of nearest neighbors, $k$, is consistent with the settings in graph learner I.
In the graph contrastive learning module, we set two learning rates: 0.01 for the classifier and 0.001 for other model parameters. The hidden layer dimension is 512, and the projection layer dimension is 64. The neural network in the contrastive learning module is set to 2 layers. Network optimization is performed using the Adam optimizer.
All experiments were repeated 5 times, and the results were averaged to ensure robustness and reliability.

5.3. Performance Comparison (RQ1)

Table 2 shows the comparison of node classification accuracy between our method and other baseline methods after modeling the graph structure using the structure inference mode and performing contrastive learning. The first column of the table displays the data available during the graph modeling phase, where $X$ and $Y$ represent node features and label information, respectively, and $A_{knn}$ denotes the KNN graph (constructed via k-nearest neighbors). The symbol OMM indicates out of memory. For GNNs with a fixed structure, namely GCN, SGC, GAT, and SAGE, as well as some GNNs used in graph structure refinement scenarios, namely GRCN, GEN, RSGNN, VN-GSL, and RGSLA, we use the KNN graph as the input graph.
Observing the results in Table 2, it can be seen that in scenarios where input data lacks labels and topological information, our proposed Uogtag framework outperforms all baselines across the three datasets. Uogtag, through its contrastive learning module based on adaptive data augmentation, can automatically adjust and improve the learning objectives during the learning process. This enables the model to learn better and understand the data’s structure and features, thereby optimizing the graph structure and achieving superior learning performance. Additionally, it has been observed that the performance of structurally fixed GNNs using a KNN graph as input surpasses traditional feature-based classifiers. This indicates that constructing connections through a KNN graph by capturing the underlying relationships between samples provides deeper insights and better performance than traditional classifiers relying solely on individual features.
Moreover, Table 3 further demonstrates the performance of Uogtag in node classification tasks within the scenario of graph structure refinement. In this scenario, Uogtag not only performs excellently compared to self-supervised methods but also surpasses some supervised learning methods. This finding highlights Uogtag’s ability to leverage its self-supervisory signal to improve the original graph structure. To achieve such enhanced performance, Uogtag selects suitable graph learners to model the graph structure based on the characteristics of different input data. This means that for different types of input data, we adopt various graph construction strategies and parameter settings to ensure that the graph structure maximally reflects the intrinsic connectivity and dynamism of the data. Furthermore, Uogtag employs a contrastive learning approach based on adaptive data augmentation. Through this method, Uogtag not only learns the direct relationships between data points but also searches for deeper, potentially unobserved associations between the data. The adaptive data augmentation strategy allows the model to continually discover and reinforce these latent, valuable data relationships during training, thereby enhancing the model’s understanding of and adaptability to complex data structures. Utilizing these approaches has helped Uogtag learn more generalized node representations, thus achieving higher node classification accuracy in the scenario of structure refinement.

5.4. Ablation Study (RQ2)

To demonstrate the role of adaptive data augmentation in the graph contrastive learning module, we replaced the proposed adaptive topological and attribute-level augmentations with random augmentations for topology and attributes. This experiment was conducted under the graph structure refinement scenario to investigate the impact of this module on Uogtag. Specifically, Uogtag-UU utilizes uniform random deletion of edges and node attributes for graph data augmentation, with the probabilities of deleting edges and node attributes set to be uniform. Uogtag-UA employs random topological augmentation but uses adaptive attribute augmentation, while Uogtag-AU uses adaptive topological augmentation with random attribute augmentation. For this part, adaptive augmentations assess enhancements using degree centrality.
The specific results, as shown in Figure 3, indicate that, across all datasets, both topological and attribute-level adaptive augmentation schemes can improve the model’s predictive performance. Moreover, Uogtag, which employs both adaptive topological and attribute augmentation schemes, further enhances performance. It is observed that on the PubMed dataset, our adaptive data augmentation scheme improved performance by approximately 2.8% compared to the random topological and attribute augmentation approach. This result validates the effectiveness of our adaptive data augmentation scheme in enhancing both topology and attributes. Lastly, we find that for cases that only involve topological or attribute augmentation across the three datasets, topological augmentation alone yields superior performance. This suggests that adaptive topological augmentation has a more significant impact on the model’s predictive efficacy.

5.5. How Effective Is Uogtag against Adversarial Attacks? (RQ3)

To evaluate the resistance of Uogtag to adversarial attacks, we perturbed the structure of the Cora dataset using two types of attacks: random noise attack and non-targeted attack. We trained models on the perturbed graph structures and evaluated the accuracy of node classification. For the random noise attack, we randomly deleted existing edges or added previously nonexistent edges to the original graph structure. For the non-targeted attack, we employed the MetaAttack method [46] to perturb the graph structure by adding and deleting edges. We conducted model training on the perturbed graph structures and compared the results with three baseline methods: GCN, GNNGuard, and GRCN. The results are shown in Figure 4. As the perturbation probability increases, the prediction accuracy of all methods decreases. However, our method is less affected by adversarial attacks, indicating the robustness of Uogtag to structural attacks.
Figure 3. Performance of model variants on node classification.

5.6. How Do Key Hyperparameters Affect Uogtag’s Performance? (RQ4)

In this section, we conduct a sensitivity analysis on the key parameters of Uogtag, including the overall probability $p_e$ controlling the deletion of edges and $p_f$ controlling the probability of feature deletion. For the anchor view, the hyperparameters for topological and attribute augmentation are denoted as $p_{e1}$ and $p_{f1}$, respectively, while for the learning view they are $p_{e2}$ and $p_{f2}$. We explore the effects of these hyperparameters on Uogtag by controlling different combinations of $p_e$ and $p_f$. For ease of presentation, the values for the anchor view and the learning view under each augmentation are kept consistent, i.e., we set $p_e = p_{e1} = p_{e2}$ and $p_f = p_{f1} = p_{f2}$ to control the magnitude of topological and attribute augmentations, with other parameters held constant.
The performance after conducting experiments on Citeseer is shown in Figure 5. From the heatmap, we can observe that when the values of p e and p f are not too large, the accuracy of node classification remains relatively high and stable, indicating robustness to hyperparameter perturbations. As the values of p e and p f increase, leading to greater disruption to the graph, the majority of edges and features in both the anchor and learning views are removed when p e and p f reach their maximum value of 0.9, adversely affecting the final learning outcomes. This suggests that when the augmentation magnitude is small, the main structures and features in the graph view are preserved, with some capability to eliminate structural and feature noise. In such cases, the graph neural network can effectively use the neighborhood information of nodes to learn useful node embeddings, thereby achieving higher accuracy in node classification tasks. However, when the values of p e and p f are large, a significant number of edges are deleted, potentially resulting in many isolated nodes within the graph. Under these circumstances, it becomes challenging for the graph neural network to capture effective neighborhood structural information as connections between nodes become sparse or nonexistent. Furthermore, node features may be excessively perturbed, leading to the loss of original feature information. Such substantial structural and feature perturbations make it difficult for GNNs to learn distinctive node embeddings from graph views, impacting the model’s ability to optimize objectives through contrastive learning. Therefore, moderate perturbations to graph topology and node attributes can serve as effective data augmentation means, aiding in improving the model’s generalization ability and robustness. However, excessive perturbation can destroy the basic structure of the graph, negatively affecting model performance.
Another important parameter in our Uogtag framework is τ in the Structural Bootstrapping Mechanism. In the structural bootstrapping mechanism, after a fixed number of training epochs, the learned adjacency matrix S ˜ is used to update the anchor view’s adjacency matrix A a . This approach helps reduce the noise in the original graph structure and provides more effective supervision. The parameter τ controls the update rate of A a ; a smaller value represents a larger update magnitude, while a larger value represents a smaller update magnitude. We conducted experiments on the Cora dataset, as shown in Figure 6, and observed that a smaller τ results in greater disruption to A a , leading to lower node classification accuracy and suboptimal graph structure learning. As τ increases, the disruption to A a decreases, allowing the model to learn better graph structures. Additionally, more extensive experiments found that when τ is between 0.99 and 1, the node classification accuracy fluctuates and does not necessarily reach its maximum at τ = 1 . Therefore, in our experiments, we set τ to a value close to but not equal to 1: specifically, 0.999.

6. Conclusions

In this study, we introduced Uogtag, a novel unsupervised graph structure learning framework tailored to overcome the challenges that graph neural networks (GNNs) encounter with noisy or incomplete graph data. Our approach, which optimizes graph topology through contrastive learning and adaptive data augmentation, has shown noteworthy enhancements in learning efficacy in various graph scenarios, particularly under conditions of noise and scarce labeling.
Extensive experiments validate Uogtag's capability to refine graph structures and boost GNN performance, highlighting the pivotal role of unsupervised learning in advancing graph-based learning technologies. The findings demonstrate that Uogtag enables more accurate and robust graph learning, broadening the range of GNN applications on complex, real-world datasets.
Nevertheless, this work has limitations. Although the chosen graph learners and data augmentation strategies are effective, more versatile and adaptive methods that accommodate a wider range of graph configurations and learning settings deserve further investigation. Future work could also integrate Uogtag with other GNN architectures, broadening its applicability and tailoring it to a greater variety of tasks and data regimes.
In conclusion, Uogtag represents a notable advance in unsupervised graph structure learning, offering a robust solution to some of the most pressing issues in the field. It not only deepens the theoretical understanding of graph learning mechanisms but also provides practical tools for improving GNN deployments. We expect Uogtag to spur continued innovation, driving progress in both the methodology and the applications of graph neural networks.

Author Contributions

D.A.: Conceptualization, methodology, validation. Z.P.: Software, formal analysis, writing—original draft. Q.Z.: Supervision. W.L.: Writing—review and editing. J.L.: Project administration. All authors have read and agreed to the published version of the manuscript.

Funding

This work was sponsored in part by the National Natural Science Foundation Youth Fund under Grant 62302308; in part by the National Key Research and Development Program of China under Grant 2022YFC3302600, 2022YFB4501704; in part by the National Natural Science Foundation of China under Grant 61972150, Grant 62132014, Grant 62302308, Grant U2142206, Grant 62372300, and Grant 61702333; in part by the Shanghai Engineering Research Center of Intelligent Education and Big Data; and in part by the Research Base of Online Education for Shanghai Middle and Primary Schools.

Data Availability Statement

Data are contained within the article.

Acknowledgments

This work was supported by the Shanghai Engineering Research Center of Intelligent Education and Big Data.

Conflicts of Interest

Author Wenyan Liu was employed by the company Ant Group. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Li, X.; Sun, L.; Ling, M.; Peng, Y. A survey of graph neural network based recommendation in social networks. Neurocomputing 2023, 549, 126441. [Google Scholar] [CrossRef]
  2. Zhao, Y.; Luo, X.; Ju, W.; Chen, C.; Hua, X.S.; Zhang, M. Dynamic hypergraph structure learning for traffic flow forecasting. In Proceedings of the 2023 IEEE 39th International Conference on Data Engineering (ICDE), Anaheim, CA, USA, 3–7 April 2023; IEEE: New York, NY, USA, 2023; pp. 2303–2316. [Google Scholar]
  3. Gao, C.; Wang, X.; He, X.; Li, Y. Graph Neural Networks for Recommender System. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, New York, NY, USA, 21–25 February 2022; WSDM ’22. pp. 1623–1625. [Google Scholar] [CrossRef]
  4. Cai, H.; Zhang, H.; Zhao, D.; Wu, J.; Wang, L. FP-GNN: A versatile deep learning architecture for enhanced molecular property prediction. Briefings Bioinform. 2022, 23, bbac408. [Google Scholar] [CrossRef] [PubMed]
  5. Dai, E.; Jin, W.; Liu, H.; Wang, S. Towards Robust Graph Neural Networks for Noisy Graphs with Sparse Labels. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, New York, NY, USA, 21–25 February 2022; WSDM ’22. pp. 181–191. [Google Scholar] [CrossRef]
  6. Alrahis, L.; Patnaik, S.; Hanif, M.A.; Shafique, M.; Sinanoglu, O. PoisonedGNN: Backdoor attack on graph neural networks-based hardware security systems. IEEE Trans. Comput. 2023, 72, 2822–2834. [Google Scholar] [CrossRef]
  7. Wu, L.; Lin, H.; Liu, Z.; Liu, Z.; Huang, Y.; Li, S.Z. Homophily-Enhanced Self-Supervision for Graph Structure Learning: Insights and Directions. IEEE Trans. Neural Netw. Learn. Syst. 2023, 1–15. [Google Scholar] [CrossRef] [PubMed]
  8. Liu, Y.; Zheng, Y.; Zhang, D.; Chen, H.; Peng, H.; Pan, S. Towards Unsupervised Deep Graph Structure Learning. In Proceedings of the ACM Web Conference 2022, New York, NY, USA, 25–29 April 2022; WWW ’22. pp. 1392–1403. [Google Scholar] [CrossRef]
  9. Wang, Y.; Wang, Y.; Zhang, Z.; Yang, S.; Zhao, K.; Liu, J. USER: Unsupervised structural entropy-based robust graph neural network. In Proceedings of the Thirty-Seventh AAAI Conference on Artificial Intelligence and Thirty-Fifth Conference on Innovative Applications of Artificial Intelligence and Thirteenth Symposium on Educational Advances in Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; AAAI Press: Washington, DC, USA, 2023. AAAI’23/IAAI’23/EAAI’23. [Google Scholar] [CrossRef]
  10. Almarshdi, R.; Nassef, L.; Fadel, E.; Alowidi, N. Hybrid Deep Learning Based Attack Detection for Imbalanced Data Classification. Intell. Autom. Soft Comput. 2023, 35. [Google Scholar] [CrossRef]
  11. Zhang, P.; Chen, J.; Che, C.; Zhang, L.; Jin, B.; Zhu, Y. IEA-GNN: Anchor-aware graph neural network fused with information entropy for node classification and link prediction. Inf. Sci. 2023, 634, 665–676. [Google Scholar] [CrossRef]
  12. Li, M.; Li, J.; Yang, L.; Ding, Q. Self-Supervised Hypergraph Learning for Knowledge-Aware Social Recommendation. Electronics 2024, 13, 1306. [Google Scholar] [CrossRef]
  13. Zou, D.; Peng, H.; Huang, X.; Yang, R.; Li, J.; Wu, J.; Liu, C.; Yu, P.S. Se-gsl: A general and effective graph structure learning framework through structural entropy optimization. In Proceedings of the ACM Web Conference 2023, Austin, TX, USA, 1–5 May 2023; pp. 499–510. [Google Scholar]
  14. Kitson, N.K.; Constantinou, A.C.; Guo, Z.; Liu, Y.; Chobtham, K. A survey of Bayesian Network structure learning. Artif. Intell. Rev. 2023, 56, 8721–8814. [Google Scholar] [CrossRef]
  15. Zhu, Y.; Xu, W.; Zhang, J.; Du, Y.; Zhang, J.; Liu, Q.; Yang, C.; Wu, S. A survey on graph structure learning: Progress and opportunities. arXiv 2021, arXiv:2103.03036. [Google Scholar]
  16. Mo, Y.; Peng, L.; Xu, J.; Shi, X.; Zhu, X. Simple unsupervised graph representation learning. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtual, 22 February–1 March 2022; Volume 36, pp. 7797–7805. [Google Scholar]
  17. Luo, D.; Cheng, W.; Yu, W.; Zong, B.; Ni, J.; Chen, H.; Zhang, X. Learning to drop: Robust graph neural network via topological denoising. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, Virtual, 8–12 March 2021; pp. 779–787. [Google Scholar]
  18. Ling, Y.; Li, X.; Bin, D.; Yang, C.; Han, S.; Lu, J.; Ming, S.; Li, J. Graph Attention Mechanism-Based Method for Tracing APT Attacks in Power Systems. In Proceedings of the 2024 IEEE 4th International Conference on Power, Electronics and Computer Applications (ICPECA), Shenyang, China, 22–24 January 2024; IEEE: New York, NY, USA, 2024; pp. 23–27. [Google Scholar]
  19. Jin, W.; Ma, Y.; Liu, X.; Tang, X.; Wang, S.; Tang, J. Graph structure learning for robust graph neural networks. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 6–10 July 2020; pp. 66–74. [Google Scholar]
  20. Zhang, W.; Ou, W.; Li, W.; Gou, J.; Xiao, W.; Liu, B. Robust Graph Structure Learning with Virtual Nodes Construction. Mathematics 2023, 11, 1397. [Google Scholar] [CrossRef]
  21. Cheng, D.; Zhang, L.; Bu, C.; Wu, H.; Song, A. Learning hierarchical time series data augmentation invariances via contrastive supervision for human activity recognition. Knowl.-Based Syst. 2023, 276, 110789. [Google Scholar] [CrossRef]
  22. Wang, C.; Ma, W.; Chen, C.; Zhang, M.; Liu, Y.; Ma, S. Sequential recommendation with multiple contrast signals. ACM Trans. Inf. Syst. 2023, 41, 1–27. [Google Scholar] [CrossRef]
  23. Zhu, Y.; Xu, Y.; Yu, F.; Liu, Q.; Wu, S.; Wang, L. Graph contrastive learning with adaptive augmentation. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 2069–2080. [Google Scholar]
  24. Peng, Z.; Huang, W.; Luo, M.; Zheng, Q.; Rong, Y.; Xu, T.; Huang, J. Graph representation learning via graphical mutual information maximization. In Proceedings of the Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 259–270. [Google Scholar]
  25. Veličković, P.; Fedus, W.; Hamilton, W.L.; Liò, P.; Bengio, Y.; Hjelm, R.D. Deep graph infomax. arXiv 2018, arXiv:1809.10341. [Google Scholar]
  26. You, Y.; Chen, T.; Shen, Y.; Wang, Z. Graph contrastive learning automated. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 18–24 July 2021; pp. 12121–12132. [Google Scholar]
  27. Sun, S.; Tian, H.; Wang, R.; Zhang, Z. Biomedical Interaction Prediction with Adaptive Line Graph Contrastive Learning. Mathematics 2023, 11, 732. [Google Scholar] [CrossRef]
  28. Kumar, P.; Rawat, P.; Chauhan, S. Contrastive self-supervised learning: Review, progress, challenges and future research directions. Int. J. Multimed. Inf. Retr. 2022, 11, 461–488. [Google Scholar] [CrossRef]
  29. Zhang, J.; Luo, Y. Degree centrality, betweenness centrality, and closeness centrality in social network. In Proceedings of the 2017 2nd International Conference on Modelling, Simulation and Applied Mathematics (MSAM2017), Bangkok, Thailand, 26–27 March 2017; Atlantis Press: Beijing, China, 2017; pp. 300–303. [Google Scholar]
  30. Bonacich, P. Power and centrality: A family of measures. Am. J. Sociol. 1987, 92, 1170–1182. [Google Scholar] [CrossRef]
  31. Zhang, P.; Wang, T.; Yan, J. PageRank centrality and algorithms for weighted, directed networks. Phys. A Stat. Mech. Its Appl. 2022, 586, 126438. [Google Scholar] [CrossRef]
  32. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 1597–1607. [Google Scholar]
  33. Elbanna, G.; Scheidwasser-Clow, N.; Kegler, M.; Beckmann, P.; El Hajal, K.; Cernak, M. Byol-s: Learning self-supervised speech representations by bootstrapping. In Proceedings of the HEAR: Holistic Evaluation of Audio Representations, PMLR, Virtual, 13–14 December 2021; pp. 25–47. [Google Scholar]
  34. Tang, H.; Liang, X.; Wang, J.; Zhang, S. DualGAD: Dual-bootstrapped self-supervised learning for graph anomaly detection. Inf. Sci. 2024, 668, 120520. [Google Scholar] [CrossRef]
  35. Zhao, L.; Song, Y.; Zhang, C.; Liu, Y.; Wang, P.; Lin, T.; Deng, M.; Li, H. T-gcn: A temporal graph convolutional network for traffic prediction. IEEE Trans. Intell. Transp. Syst. 2019, 21, 3848–3858. [Google Scholar] [CrossRef]
  36. Wu, F.; Souza, A.; Zhang, T.; Fifty, C.; Yu, T.; Weinberger, K. Simplifying graph convolutional networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 6861–6871. [Google Scholar]
  37. Tang, W.; Sun, H.; Wang, J.; Liu, C.; Qi, Q.; Wang, J.; Liao, J. Identifying Users Across Social Media Networks for Interpretable Fine-Grained Neighborhood Matching by Adaptive GAT. IEEE Trans. Serv. Comput. 2023, 16, 3453–3466. [Google Scholar] [CrossRef]
  38. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  39. Franceschi, L.; Niepert, M.; Pontil, M.; He, X. Learning discrete structures for graph neural networks. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 1972–1982. [Google Scholar]
  40. Jiang, W.; Xiao, Y.; Liu, Y.; Liu, Q.; Li, Z. Bi-GRCN: A spatio-temporal traffic flow prediction model based on graph neural network. J. Adv. Transp. 2022, 2022. [Google Scholar] [CrossRef]
  41. Wang, R.; Mou, S.; Wang, X.; Xiao, W.; Ju, Q.; Shi, C.; Xie, X. Graph structure estimation neural networks. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 342–353. [Google Scholar]
  42. Chen, Y.; Wu, L.; Zaki, M. Iterative deep graph learning for graph neural networks: Better and robust node embeddings. Adv. Neural Inf. Process. Syst. 2020, 33, 19314–19326. [Google Scholar]
  43. Fatemi, B.; El Asri, L.; Kazemi, S.M. Slaps: Self-supervision improves structure learning for graph neural networks. Adv. Neural Inf. Process. Syst. 2021, 34, 22667–22681. [Google Scholar]
  44. Chen, C.; Xu, Z.; Hu, W.; Zheng, Z.; Zhang, J. Fedgl: Federated graph learning framework with global self-supervision. Inf. Sci. 2024, 657, 119976. [Google Scholar] [CrossRef]
  45. Lv, S.; Wen, G.; Liu, S.; Wei, L.; Li, M. Robust graph structure learning with the alignment of features and adjacency matrix. arXiv 2023, arXiv:2307.02126. [Google Scholar]
  46. Zügner, D.; Günnemann, S. Adversarial Attacks on Graph Neural Networks via Meta Learning. arXiv 2019, arXiv:1902.08412. [Google Scholar]
Figure 1. Comparison between the (a) traditional graph structure learning paradigm and (b) our graph structure learning paradigm.
Figure 4. The performance of the model under different attack methods.
Figure 5. The performance of the model on the Citeseer dataset under varying levels of topology or attribute enhancement strength.
Figure 6. Performance of the model under different bootstrapping decay rates.
Table 1. Dataset statistics.

Datasets | Nodes  | Edges  | Features | Classes | Label Rate
Cora     | 2708   | 5429   | 1433     | 7       | 0.052
Citeseer | 3327   | 4732   | 3703     | 6       | 0.036
PubMed   | 19,717 | 44,338 | 500      | 3       | 0.003
Table 2. Accuracy of node classification in the structure inference scenario.

Available Data | Method     | Cora       | Citeseer   | PubMed
-              | Linear SVM | 58.9 ± 0.0 | 58.3 ± 0.0 | 70.4 ± 0.4
-              | MLP        | 56.1 ± 1.6 | 56.1 ± 1.6 | 56.7 ± 1.7
-              | GCN        | 67.1 ± 0.2 | 68.1 ± 0.9 | 69.8 ± 0.2
-              | SGC        | 64.8 ± 0.6 | 67.2 ± 1.1 | 68.1 ± 0.9
-              | GAT        | 66.2 ± 0.5 | 70.0 ± 0.6 | 69.6 ± 0.5
-              | SAGE       | 66.1 ± 0.7 | 68.0 ± 1.6 | 68.7 ± 0.2
X, Y           | LDS        | 71.5 ± 0.8 | 71.5 ± 1.1 | OOM
X, Y, Aknn     | GRCN       | 69.6 ± 0.2 | 70.4 ± 0.3 | 70.6 ± 0.1
X, Y, Aknn     | GEN        | 69.1 ± 0.7 | 70.7 ± 1.1 | 70.7 ± 0.9
X, Y, Aknn     | RSGNN      | 70.8 ± 0.7 | 70.8 ± 0.7 | 68.8 ± 1.5
X, Y, Aknn     | VN-GSL     | 72.8 ± 1.1 | 71.7 ± 0.8 | 74.6 ± 0.4
X, Y, Aknn     | RGSLA      | 70.2 ± 0.6 | 71.5 ± 0.2 | 73.8 ± 0.3
X, Y           | SLPAS      | 73.4 ± 0.3 | 72.6 ± 0.6 | 74.4 ± 0.6
X, Y           | IDGL       | 70.9 ± 0.6 | 68.2 ± 1.2 | 70.1 ± 1.3
Aknn           | GDC        | 68.1 ± 1.2 | 68.8 ± 0.8 | 68.4 ± 0.3
X              | SLAPS-2s   | 72.1 ± 0.4 | 69.4 ± 1.4 | 71.1 ± 0.5
X              | FedGL      | 73.2 ± 0.6 | 68.1 ± 0.3 | 73.4 ± 0.3
X              | SUBLIME    | 73.0 ± 0.6 | 73.1 ± 0.3 | 73.8 ± 0.6
X              | Ours       | 74.2 ± 0.3 | 74.4 ± 0.0 | 75.1 ± 0.3
Table 3. Accuracy of node classification in the structure refinement scenario.

Available Data | Method  | Cora       | Citeseer   | PubMed
-              | GCN     | 81.5       | 70.3       | 79
-              | SGC     | 79.8 ± 0.2 | 69.2 ± 0.9 | 77.1 ± 1.1
-              | GAT     | 83.0 ± 0.7 | 72.5 ± 0.7 | 79.0 ± 0.3
-              | SAGE    | 77.4 ± 1.0 | 67.0 ± 1.0 | 76.6 ± 0.8
X, Y, A        | LDS     | 83.9 ± 0.6 | 74.8 ± 0.3 | OOM
X, Y, A        | GRCN    | 84.0 ± 0.2 | 73.0 ± 0.3 | 78.9 ± 0.1
X, Y, A        | GEN     | 82.3 ± 0.4 | 73.5 ± 1.5 | 80.9 ± 0.8
X, Y, A        | RSGNN   | 84.0 ± 0.7 | 70.8 ± 0.7 | OOM
X, Y, A        | VN-GSL  | 84.7 ± 0.5 | 71.6 ± 0.5 | 82.8 ± 0.7
X, Y, A        | RGSLA   | 79.1 ± 0.6 | 70.1 ± 0.4 | 81.6 ± 0.5
X, Y           | IDGL    | 84.0 ± 0.5 | 73.1 ± 0.7 | 83.0 ± 0.2
A              | GDC     | 68.1 ± 1.2 | 68.8 ± 0.8 | 68.4 ± 0.3
X, A           | SUBLIME | 84.2 ± 0.5 | 73.5 ± 0.6 | 81.0 ± 0.6
X, A           | FedGL   | 82.8 ± 0.2 | 73.7 ± 0.1 | 82.3 ± 0.2
X, A           | Ours    | 86.3 ± 0.9 | 75.1 ± 0.0 | 83.8 ± 1.1
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
