Article

Community-Enhanced Contrastive Learning for Graph Collaborative Filtering

School of Computer and Control Engineering, Yantai University, Yantai 264005, China
* Author to whom correspondence should be addressed.
Electronics 2023, 12(23), 4831; https://doi.org/10.3390/electronics12234831
Submission received: 26 October 2023 / Revised: 25 November 2023 / Accepted: 27 November 2023 / Published: 29 November 2023
(This article belongs to the Section Artificial Intelligence)

Abstract

In recent years, graph collaborative filtering has proven effective at uncovering the hidden interests of users for recommender systems. This method learns complex interactions between nodes in the graph, identifies user preferences, and provides satisfactory recommendations. However, recommender systems face the challenge of data sparsity. To address this, recent studies have applied contrastive learning to exploit the structure of unlabeled data. However, existing strategies for sampling positive and negative examples are not well founded: random- or data augmentation-based sampling fails to exploit useful latent information, and clustering-based sampling ignores the semantics of node features and the relationship between global and local information. To utilize the latent structures in the data, we introduce a novel Community-Enhanced Contrastive Learning method, called CECL, to support the main recommendation task. CECL uses a community detection algorithm to sample examples carrying semantic and global information, exploiting both known and hidden community connections in the bipartite interaction graph. Extensive experiments are conducted on two well-known datasets, and the results show 12% and 8% performance improvements over existing baseline methods.

1. Introduction

Numerous web applications have been created to make daily tasks easier. However, the sheer abundance of data has become a significant problem: the vast amount of information available leads to overload, making it difficult for people to find what they need. For example, short video platforms host an enormous number of videos, which makes it impossible for users to browse them one by one [1]. To address this issue, recommender systems have been developed. These systems analyze user preferences based on past interaction behaviors and then suggest content or items that match those preferences, providing personalized recommendations. In this way, users can easily find the content they are most likely to enjoy, which increases user engagement and promotes consumption [2].
Collaborative filtering is a highly effective recommendation algorithm. It employs a user’s known preferences to infer their unknown preferences [3]. The main idea behind it is that “birds of a feather flock together”. By utilizing preferences learned from historical behaviors, systems can provide useful recommendations: recommendations for a user are derived from other users with similar interests, or from items similar to those the user bought before [4,5]. Collaborative filtering is an easy and effective way to generate recommendations. However, data sparsity makes it difficult to identify similarities from the available information, which is known as the cold-start problem. To mitigate this issue, graph neural networks have been employed. These networks construct node features by harnessing higher-order neighbor information, which is particularly beneficial for nodes with few direct neighbors, as it allows user preference data to be assembled more effectively. The graph-based collaborative filtering approach combines graph neural networks with collaborative filtering methods. The assumption is that even users and items without direct interactions can reflect user preferences to some extent. Building on collaborative filtering algorithms, graph propagation has been introduced to address the issue of sparse data: information is propagated and aggregated across the user–item interaction graph, allowing additional insights to be extracted from higher-order neighbors and enhancing the recommendation process [6,7]. In a previous study, Han et al. considered the differences in user ratings to convert users’ rating bias on the social graph into a vector representation [8], while Aldayel et al. applied graph collaborative filtering to travel recommendation [9].
Despite the great success of graph collaborative filtering, it still faces the problem of sparse or noisy data leading to unreliable learned features [10]. Due to data sparsity, node representations can drift away from the true node characteristics during training, and these imperfect representations degrade recommendation performance. Recent research in self-supervised learning has been devoted to solving this problem. For instance, SGL uses data augmentation to generate positive and negative sample pairs for contrastive learning [11]. However, studies have found that data augmentation itself contributes little to the performance gains of contrastive learning; the decisive factor is the contrastive loss over negative samples [12]. The NCL method generates positive and negative pairs through two strategies [13]: the first constructs pairs by aggregating the embeddings of higher-order homogeneous neighbors, and the second constructs pairs by clustering the feature space. Existing contrastive learning methods construct positive–negative sample pairs in a way that ignores the relationship between the global and local aspects of the data, which can lead to an imbalance between commonality and individuality and harm the final feature representations.
Our approach is inspired by contrastive learning and prediction-based self-supervision techniques. We believe these methods can be applied to mine valid node features from sample data, which can then be used to correct node representations. By leveraging the graph’s structure, we can uncover latent community information associated with the nodes and apply it to self-supervised learning to improve the embeddings [14]. We preprocess the samples and use standard algorithms, such as SCD and spectral clustering, to identify the non-overlapping communities in which the samples are located. We then predict the community identity of each node to introduce global features. Our approach uses the center of the community in which a node is located as its positive sample, while the centers of other communities serve as negative samples, encouraging each node to cluster toward its own community center. The goal is to preserve the semantic attributes of the graph’s structure within the node features. We also make use of collaborative filtering to form positive samples for contrastive learning from the nodes’ two-hop homogeneous neighbors. This enables nodes to obtain more effective information from their neighborhood and alleviates the data sparsity problem. Our contributions can be summarized in three main facets as follows:
  • Our proposed model, CECL, is a graph collaborative filtering model incorporating self-supervised components, including self-supervised community classification, community-centered contrastive learning, and structured contrastive learning. Using these self-supervised methods improves the learning of node embeddings. Furthermore, the model can act as an independent component to be employed by other recommendation methods.
  • We explore node communities within graph structures and leverage the information from these communities to incorporate global characteristics and structural semantics with the aim of enhancing node representations.
  • We conduct extensive experiments on two public datasets and the experimental results show that our method outperforms traditional GNN-based recommendation methods.

2. Literature Review

2.1. Collaborative Filtering

Traditional recommendation approaches analyze interactions to learn latent representations of users and items. The most popular recommendation methods, matrix factorization techniques, decompose the user–item adjacency matrix into low-rank matrices to obtain hidden features [15,16]. Deep learning models represent identifiers as embeddings and predict user–item interactions using the inner product [5]. Furthermore, researchers have worked on embedding various user-related and item-related information into the representations to enrich their expressive capability [17,18]. Despite the success of collaborative filtering in recommendation systems, its expressive ability is limited to first-order interactions. It is also unable to express the similarity of indirectly interacting nodes in the feature space, which constrains the performance of traditional collaborative filtering methods. Moreover, these methods usually derive node vector representations from the history of user interactions, which can greatly hurt accuracy, particularly when dealing with sparse data.

2.2. Graph Collaborative Filtering

Various collaborative filtering techniques use different algorithms to learn node embeddings. Traditional collaborative filtering techniques use algorithms such as random walks [19]; however, the performance of such algorithms depends on tuning the decay factor of the random walk process. Later, many algorithms for creating node embeddings with Graph Neural Networks (GNNs) were proposed [6,7,20,21]. Generally, graph-based collaborative filtering techniques employ Bayesian Personalized Ranking as the loss function to learn node embeddings. For instance, Defferrard et al. implemented graph convolution by defining the convolution operation through the Laplacian matrix and its eigenvectors [22]. Wang et al. migrated the convolutional neural network structure directly to graphs, which introduced some unnecessary components [6]. Building on this, LightGCN simplified graph convolution by removing the per-layer feature transformations and nonlinear activation functions [20]. However, none of these methods address data sparsity, so the features learned from sparse data remain unreliable, which hurts recommendation performance.

2.3. Self-Supervised Learning

Recommendation systems often face the issue of sparse data, and unsupervised learning has emerged as a potential solution in recent years. Unsupervised learning techniques can extract more information from the interaction structures hidden in the graph, which helps mitigate data sparsity and yields effective recommendation improvements. Self-supervised learning methods, which fall under the umbrella of unsupervised learning, have become popular in recent research due to their ability to learn useful representations through auxiliary tasks. Self-supervised learning extracts its own supervision signal, such as classifications or labels, from unsupervised data via an auxiliary task, and then trains in a supervised manner. Two commonly used self-supervised learning approaches are prediction-based tasks and contrastive-based tasks. Prediction-based tasks extract predictable information from unsupervised data and use it to perform predictions. For example, Zhu et al. constructed link prediction auxiliary tasks that use self-supervised learning to optimize image representations, which then aid image classification [14].
Contrastive-based tasks create positive and negative sample pairs after data augmentation and clustering of the data. This brings the representations of positive samples closer together while keeping the representations of negative samples apart, making the data representations more distinct, which in turn aids downstream tasks. Some studies cluster the samples and then form positive pairs with the centers of their respective clusters and negative pairs with the centers of other clusters, introducing semantic neighbors that help learn node feature representations [23]. Similarly, other studies construct subgraphs through data augmentation, such as randomly deleting edges or nodes or performing random walks, and then pair the subgraph nodes with the original graph nodes as positive samples for contrastive learning, which reduces noise in the graph’s input data [11]. Graph data augmentation has also been combined with self-supervised learning to obtain node embeddings from multiple views, training the model with a multi-head attention network [21]. Contrastive learning can also be applied to knowledge graphs to reduce the noise of information aggregation [24].

2.4. Contrastive Learning

The main aim of contrastive learning is to learn a latent space in which node embeddings are close to their corresponding positive samples and far away from their negative ones [25,26]. To achieve this, contrastive learning uses a strategy of positive and negative sample pairing. For instance generation, data augmentation is usually employed to create multiple views of the data, which can then serve as positive samples. In computer vision, contrastive learning often applies augmentation techniques such as rotation, scaling, and cropping: views generated from the same instance are used as positive samples, while other images serve as negative samples [27,28].
In graph-based collaborative filtering, multiple views of nodes can be constructed via generated subgraphs to form positive samples, while other nodes are utilized as negative samples to remove noise from the original graph [11]. Another effective strategy for generating positive samples is neighborhood-based contrastive learning, which leverages the similarity of nearest neighbors: by taking a node’s neighbors as its positive samples, different strategies arise from different definitions of a neighbor [29]. In one definition, the data are divided into clusters based on their features, and nodes within the same cluster are considered neighbors. Nodes within the same cluster are closely related, while nodes in different clusters are relatively unrelated; essentially, the clustering indicates which nodes are grouped together and which are separated [13,23,30]. Alternatively, by propagating over the graph structure, positive samples for contrastive learning can be obtained from a node’s even-hop homogeneous neighbors [13].

3. Related Work

The method proposed in this study is related to two types of previous work: graph community detection methods and the lightweight graph convolutional network LightGCN.

3.1. Graph Community Detection

Community detection is a process that aims to identify hidden groups of similar nodes. Using a community detection algorithm, we can extract additional pseudo-label information from graph-structured data without supervision. Depending on whether nodes can belong to only one community or to multiple communities simultaneously, these algorithms fall into two types: non-overlapping and overlapping community detection [31,32]. This paper uses non-overlapping community detection algorithms for data preprocessing to extract the community labels of nodes. The algorithms commonly used for non-overlapping community detection are primarily based on graph theory and clustering techniques, such as modularity-based methods [33], spectral clustering [34], and K-means [35,36]. These algorithms construct community structures by maximizing intra-community connectivity and minimizing inter-community connectivity.

3.2. LightGCN

Based on the simplified graph convolution module of LightGCN, this paper uses the graph convolution formulas presented as follows:
$$U_u^{(k)} = f_{\mathrm{propagate}}\!\left(\left\{\, I_i^{(k-1)} \mid A_{u,i} = 1 \,\right\}\right),$$
$$U_u = f_{\mathrm{readout}}\!\left(\left[\, U_u^{(0)}, U_u^{(1)}, \ldots, U_u^{(k)} \,\right]\right),$$
$$I_i^{(k)} = f_{\mathrm{propagate}}\!\left(\left\{\, U_u^{(k-1)} \mid A_{u,i} = 1 \,\right\}\right),$$
$$I_i = f_{\mathrm{readout}}\!\left(\left[\, I_i^{(0)}, I_i^{(1)}, \ldots, I_i^{(k)} \,\right]\right),$$
where I_i^{(0)} ∈ I^{(0)} and U_u^{(0)} ∈ U^{(0)} are the initial features of items and users obtained by the initialization of E^{(0)}. Information is then propagated through all the neighbors of a user: f_propagate aggregates the embeddings of all item neighbors at the (k−1)-th layer to form the user features U_u^{(k)} at the k-th layer, and the embeddings of this user at each layer are combined by f_readout to yield the final user representation U_u. Similarly, items are propagated through all user neighbors: the embeddings of user neighbors at the (k−1)-th layer are aggregated by f_propagate to obtain I_i^{(k)}, and the item embeddings at each layer are aggregated by f_readout to obtain the final embedding I_i.
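To make the propagate–readout scheme concrete, the following is a minimal PyTorch sketch. It assumes the normalized adjacency of the user–item graph is already available as a sparse tensor; the function and variable names (light_gcn_propagate, adj_norm) are our own illustration, not part of LightGCN’s released code.
```python
import torch

def light_gcn_propagate(adj_norm: torch.Tensor, e0: torch.Tensor, num_layers: int) -> torch.Tensor:
    """Parameter-free propagation in the style of LightGCN.

    adj_norm: (n, n) sparse normalized adjacency of the user-item graph.
    e0: (n, d) initial embeddings E^(0) for all users and items.
    Returns f_readout over layers 0..k, realized here as the mean.
    """
    layer_embs = [e0]
    for _ in range(num_layers):
        # f_propagate: every node aggregates its neighbors' embeddings
        # from the previous layer; no transform, no nonlinearity
        layer_embs.append(torch.sparse.mm(adj_norm, layer_embs[-1]))
    # f_readout: combine the per-layer embeddings by averaging
    return torch.stack(layer_embs, dim=0).mean(dim=0)
```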

4. Proposed Method

This section provides the implementation details of the proposed CECL in five parts. First, the overall design of the algorithm is introduced in Section 4.1. In Section 4.2, the graph convolutional collaborative filtering method is described, which learns node features by aggregating information from neighbors and then produces the final node representations for recommendation. In Section 4.3, we describe in detail how unsupervised community information is obtained through community discovery and how community discovery is combined with self-supervised learning. After that, we introduce the structured contrastive learning strategy used by CECL in Section 4.4. This strategy is based on the neighborhood contrastive learning method, and the embeddings are corrected by incorporating additional heuristic constraints: structured neighborhood contrastive learning injects higher-order neighborhood information into the representations and constructs positive sample pairs from two-hop homogeneous neighbors. Finally, Section 4.5 describes how all the above methods are fused into one model for end-to-end training, which helps improve recommendation performance.

4.1. Design

The key idea of CECL is inspired by the NCL method, which is based on clusters for contrastive learning sampling [13]. First, CECL employs a community discovery algorithm to obtain the community distribution of the nodes, which is used to encode the node embeddings. Then, it applies GCN to aggregate the embeddings of neighbors to learn high-order representations for each node. The main objective function of our approach is BPR. Additionally, node embeddings are corrected based on Structured Contrastive Learning and Community Center Contrastive Learning, which serve as auxiliary tasks. Furthermore, an MLP model is applied to predict the community that the nodes belong to, correcting node embeddings. Figure 1 shows the details of the overall structure of our proposed CECL.

4.2. Graph Convolution Collaborative Filtering

Our proposed CECL method employs graph convolution to model the latent representations of users and items. First, the adjacency matrix is constructed from past interactions, and the latent representations are then aggregated and propagated throughout the graph. Our graph convolution follows LightGCN, which excludes nonlinear activations and feature transformations, simplifying the model and speeding up training. We construct the adjacency matrix A ∈ R^{n×n} from the interaction information between users and items, where n = |U| + |I| is the total number of users and items. The first |U| dimensions of the adjacency matrix A correspond to user nodes, and the remaining |I| dimensions correspond to item nodes. We use a symmetric normalized Laplacian matrix for graph convolution, computed from the graph’s degree matrix D and adjacency matrix A as follows:
$$L_{\mathrm{sym}} = I - D^{-\frac{1}{2}} A D^{-\frac{1}{2}}.$$
The embedding matrix E ∈ R^{n×d} contains the embeddings of all nodes, where d is the embedding dimension. E^{(i)} denotes the embeddings after the i-th convolution layer, and E^{(0)} holds the initial node features, initialized from a normal distribution. Multiplying the Laplacian matrix L_sym by the embedding matrix E yields the result of one further convolution, according to the following equation:
$$E^{(k+1)} = L_{\mathrm{sym}} E^{(k)}.$$
We obtain the final node embedding in graph convolution by averaging the embeddings from the previous k convolutions, as shown in the following equation:
$$E = \begin{bmatrix} U \\ I \end{bmatrix} = \frac{1}{k+1} \sum_{i=0}^{k} E^{(i)},$$
where U is the final representation of users and I denotes the final representation of items. The degree of user u’s preference for item i is represented by the inner product of their embeddings:
$$\hat{y}_{u,i} = U_u^{\top} I_i.$$
Finally, we use Bayesian Personalized Ranking (BPR) as the loss function to train CECL. This loss can effectively model the users’ preference by utilizing implicit feedback between users and items [15]. The formula is as follows:
$$L_{BPR} = -\sum_{(u,i,j) \in D} \log \sigma\!\left(\hat{y}_{u,i} - \hat{y}_{u,j}\right),$$
where σ denotes the sigmoid function and D = {(u, i, j) | A_{u,i} = 1, A_{u,j} = 0, 0 < u < |U|, |U| ≤ i < n, |U| ≤ j < n} denotes the set of positive and negative sample pairs, in which item i has interacted with user u and item j has not interacted with the user.
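As an illustration, the BPR objective above can be computed over a batch of (u, i, j) triples as in the following sketch; the function name and batch layout are assumptions of ours.
```python
import torch
import torch.nn.functional as F

def bpr_loss(user_emb: torch.Tensor,
             pos_item_emb: torch.Tensor,
             neg_item_emb: torch.Tensor) -> torch.Tensor:
    """-sum log sigmoid(y_hat_{u,i} - y_hat_{u,j}) over sampled triples."""
    pos_scores = (user_emb * pos_item_emb).sum(dim=-1)  # y_hat_{u,i}
    neg_scores = (user_emb * neg_item_emb).sum(dim=-1)  # y_hat_{u,j}
    return -F.logsigmoid(pos_scores - neg_scores).sum()
```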

4.3. Community Discovery for Self-Supervised Learning

Inspired by research on self-supervised link prediction tasks for image classification, we extract more valuable information from the user–item interaction matrix to improve recommendations [14]. The recommendation task falls under link prediction, but using classification as an auxiliary task helps extract potential community information from user–item interactions, improving the accuracy of link prediction. A specific example is described in Figure 2, which shows that existing community detection methods can assign each node a label indicating which community it belongs to [36]. In this study, users and items are clustered into separate communities. Since the number of communities depends on the training data, we control the number of classes produced by clustering through a hyperparameter. Through spectral clustering, we obtain functions f(u) and g(i) that return the community numbers of user node u and item node i, respectively.
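As a sketch of this preprocessing step, the pseudo-labels could be produced with off-the-shelf spectral clustering from scikit-learn, as below. The helper name community_labels and the precomputed node–node affinity matrix are illustrative assumptions; the SCD variant mentioned earlier would be substituted analogously.
```python
import numpy as np
from sklearn.cluster import SpectralClustering

def community_labels(affinity: np.ndarray, num_communities: int) -> np.ndarray:
    """Assign each node a community id from a precomputed node-node affinity.

    num_communities is the hyperparameter that controls how many
    pseudo-classes the clustering produces.
    """
    sc = SpectralClustering(n_clusters=num_communities,
                            affinity='precomputed',
                            assign_labels='discretize')
    return sc.fit_predict(affinity)  # label array plays the role of f(u) / g(i)
```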
To ensure that the node embeddings accurately reflect the potential community information, we use an MLP to predict the communities of nodes and optimize it with the cross-entropy loss function. This process refines the node embeddings, as formulated below:
$$\hat{c}^u = \sigma\!\left(W^u U_u + b^u\right),$$
$$L_c^U = -\sum_{j=1}^{|C^u|} f(u)_j \log \hat{c}^u_j,$$
$$\hat{c}^i = \sigma\!\left(W^i I_i + b^i\right),$$
$$L_c^I = -\sum_{j=1}^{|C^i|} g(i)_j \log \hat{c}^i_j,$$
$$L_c = L_c^U + \alpha L_c^I.$$
Here, σ is the softmax function; U_u and I_i denote the user and item embedding vectors, respectively; ĉ^u is the predicted probability distribution over the communities to which the user may be assigned; ĉ_j^u is the predicted probability, between 0 and 1, that user u belongs to community j; |C^u| and |C^i| denote the numbers of user and item communities, respectively; and α weights the relative impact of users and items on the final result.
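A minimal sketch of this classification head, assuming a single linear layer per node type (the class and variable names are ours):
```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommunityClassifier(nn.Module):
    """Predicts a distribution over communities from a node embedding."""

    def __init__(self, emb_dim: int, num_communities: int):
        super().__init__()
        self.linear = nn.Linear(emb_dim, num_communities)

    def forward(self, emb: torch.Tensor) -> torch.Tensor:
        return self.linear(emb)  # raw logits; softmax is applied in the loss

# Cross-entropy against the community pseudo-labels refines the embeddings;
# F.cross_entropy applies the softmax internally:
#   loss_u = F.cross_entropy(user_head(U_u), user_labels)
#   loss_i = F.cross_entropy(item_head(I_i), item_labels)
#   loss_c = loss_u + alpha * loss_i
```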
Motivated by the Prototypical Contrastive Method, we utilize contrastive learning together with clustering to pull node embeddings toward others in the same cluster and push them away from the centers of different clusters [23]. Although the node embeddings change constantly during training, the clustering structure among nodes remains stable. First, we use the community detection method to determine the community label of each node. Then, we integrate the node embeddings in the same cluster into one embedding that represents the community center. As shown in Figure 3, we create positive samples from a node’s own cluster and sample negative samples from the other clusters. We then employ neighbor-based contrastive learning to make the embedding of each node closer to its own center embedding and farther from the other center embeddings. We define a contrastive optimization objective to learn the community characteristics of users and items, following NCL, which optimizes the contrastive learning objective with the expectation-maximization algorithm [37]. According to neighborhood-enriched contrastive learning, we obtain the following equations [13]:
$$L_m^U = \sum_{u=0}^{|U|} -\log \frac{\exp\!\left(U_u \cdot C_u^U / \tau\right)}{\sum_{j=0}^{|C^U|} \exp\!\left(U_u \cdot C_j^U / \tau\right)},$$
$$L_m^I = \sum_{i=0}^{|I|} -\log \frac{\exp\!\left(I_i \cdot C_i^I / \tau\right)}{\sum_{j=0}^{|C^I|} \exp\!\left(I_i \cdot C_j^I / \tau\right)}.$$
where C_u^U denotes the embedding of the community center to which user node u belongs and, similarly, C_i^I denotes the embedding of the community center to which item node i belongs; τ is the temperature hyperparameter of the softmax. Finally, the total loss of community-centered contrastive learning is expressed as follows:
$$L_m = L_m^I + \alpha L_m^U.$$
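The community-centered objective reduces to an InfoNCE loss whose “classes” are the community centers, so it can be sketched compactly as below (assuming the centers are, e.g., the mean embedding of each community; the function name is ours):
```python
import torch
import torch.nn.functional as F

def community_center_loss(node_emb: torch.Tensor,
                          centers: torch.Tensor,
                          labels: torch.Tensor,
                          tau: float = 0.1) -> torch.Tensor:
    """Pull each node toward its own community center.

    node_emb: (n, d) user or item embeddings.
    centers:  (num_communities, d) community-center embeddings.
    labels:   (n,) community id of each node.
    """
    logits = node_emb @ centers.T / tau  # similarity of each node to every center
    # Cross-entropy with the node's own center as target reproduces
    # -log( exp(sim_own / tau) / sum_j exp(sim_j / tau) )
    return F.cross_entropy(logits, labels, reduction='sum')
```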

4.4. Contrastive Learning with Higher-Order Neighbors

The NCL method introduced the concept of contrastive learning with higher-order neighbors [13]. This method assumes that nodes are similar to their higher-order homogeneous neighbors, and encourages nodes to move closer to their higher-order homogeneous neighbor embeddings while moving further away from the higher-order homogeneous neighbor embeddings of other nodes. This helps in correcting the original embeddings of nodes. Based on experimental findings of the NCL method, we use the two-hop homogeneous neighbor embeddings of the nodes for contrastive learning. As shown in Figure 4, we consider the node embeddings and the embeddings of the nodes after two convolutions as a pair of positive samples for neighbor-based contrastive learning. This approach helps to correct the embeddings of the nodes by obtaining graph structural information from homogeneous second-order neighbors.
$$L_s^U = \sum_{u=0}^{|U|} -\log \frac{\exp\!\left(U_u \cdot U_u^{(2)} / \tau\right)}{\sum_{j=0}^{|U|} \exp\!\left(U_u \cdot U_j^{(2)} / \tau\right)},$$
$$L_s^I = \sum_{i=0}^{|I|} -\log \frac{\exp\!\left(I_i \cdot I_i^{(2)} / \tau\right)}{\sum_{j=0}^{|I|} \exp\!\left(I_i \cdot I_j^{(2)} / \tau\right)}.$$
where U_u^{(2)} represents the embedding of user node u after two graph convolutions and I_i^{(2)} denotes the embedding of item node i after two graph convolutions. The loss function of structured contrastive learning is shown in the following equation:
$$L_s = L_s^I + \alpha L_s^U.$$
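Since each node’s only positive is its own two-hop view, this structural objective is again an InfoNCE loss with the diagonal as targets; a minimal sketch (function and variable names are ours):
```python
import torch
import torch.nn.functional as F

def structural_contrastive_loss(e0: torch.Tensor,
                                e2: torch.Tensor,
                                tau: float = 0.1) -> torch.Tensor:
    """Contrast each node with its own two-hop view.

    e0: (n, d) node embeddings; e2: (n, d) embeddings after two graph
    convolutions, i.e. the two-hop homogeneous-neighbor view.
    """
    logits = e0 @ e2.T / tau  # pairwise similarities between the two views
    targets = torch.arange(e0.size(0), device=e0.device)
    return F.cross_entropy(logits, targets, reduction='sum')  # diagonal = positives
```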

4.5. Optimization

To implement our proposed method, which combines supervised and self-supervised learning as well as pre-training, we require an efficient way to manage the training process. First, we run graph community detection as a pre-training step to obtain the community classification of the nodes. After that, we calculate the losses of community classification, structured contrastive learning, and community-centered contrastive learning separately. Finally, we combine the losses of the tasks with appropriate weights to obtain the total loss and train the model using gradient-based optimization. The final loss is defined as follows:
$$L = L_{BPR} + \lambda_1 L_s + \lambda_2 L_c + \lambda_3 L_m.$$
To control the impact of different components of CECL, hyperparameters λ 1 , λ 2 , and λ 3 are used. λ 1 controls the effect of the structured contrastive learning technique, λ 2 controls the effect of community classification prediction, and λ 3 controls the effect of the community-centered contrastive learning method. These hyperparameters are essential for the model and must be adjusted based on the dataset to achieve optimal performance.
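One training step then backpropagates the weighted sum, as in this sketch (the function name and argument layout are ours):
```python
import torch

def training_step(optimizer: torch.optim.Optimizer,
                  l_bpr: torch.Tensor, l_s: torch.Tensor,
                  l_c: torch.Tensor, l_m: torch.Tensor,
                  lambda_1: float, lambda_2: float, lambda_3: float) -> float:
    """One gradient step on L = L_BPR + lambda_1*L_s + lambda_2*L_c + lambda_3*L_m."""
    loss = l_bpr + lambda_1 * l_s + lambda_2 * l_c + lambda_3 * l_m
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```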

5. Experiments

In this section, several experiments are conducted to verify the effectiveness of the method. First, we introduce the details of the datasets, the compared baseline models, the evaluation criteria, and the hyperparameter settings. After that, comparison experiments are conducted between CECL and the benchmark methods. Then, ablation experiments are conducted by ablating different components of CECL. A detailed analysis is provided along with each experiment.

5.1. Datasets

Our experiments are conducted on two well-known recommendation datasets. The first dataset is MovieLens-1M, which was released by the GroupLens research group in 2003 [38]. This dataset comprises 1,000,209 ratings provided by 6000 users for 4000 movies. The second dataset is Yelp, an extensive dataset containing information about merchants, users, reviews, ratings, and photos published by Yelp, Inc. (San Francisco, CA, USA). The dataset includes data from various cities, industries, and years, making it ideal for recommendation system research. For our study, we implement a recommendation system using the user–item interaction information from the Yelp dataset [39].
In this study, for each dataset, the proportions of the training, validation, and test sets are set to 80%, 10%, and 10%, respectively, and the test set is used to evaluate the final performance. To obtain data for subsequent ranking, we treat each interaction pair as a positive sample and uniformly sample several negative samples for each positive one. For the Yelp dataset, we retain only the interaction data between 2013 and 2015 to reduce its size and ensure a reasonable training speed. We also filter out users and items with too few interactions to ensure data validity. The remaining user and item interactions form the final experimental dataset.
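The uniform negative sampling described above can be sketched as follows; the helper name and the dictionary of interacted items are our own assumptions:
```python
import numpy as np

def sample_negatives(user: int, num_items: int,
                     interacted: dict[int, set], num_neg: int,
                     rng: np.random.Generator) -> list[int]:
    """Uniformly sample items the user has never interacted with."""
    negatives: list[int] = []
    while len(negatives) < num_neg:
        j = int(rng.integers(num_items))
        if j not in interacted[user]:  # reject observed interactions
            negatives.append(j)
    return negatives
```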

5.2. Baselines

In this study, seven baselines are compared with our proposed method, CECL:
  • BPR-MF: A Bayesian personalized ranking method that learns implicit features of users and items through matrix factorization, optimized with the BPR loss [15].
  • Neu-MF: Obtains user rating predictions for items by replacing the dot-product operation of matrix factorization with a multilayer perceptron [5].
  • NGCF: Improves CF performance by fusing higher-order neighborhood information into the user–item bipartite graph and aggregating it with graph neural networks [6].
  • DGCF: Splits the embedding into multiple parts, each representing a distinct user intent, to model real user behavior; an independence module ensures the parts remain disentangled [7].
  • LightGCN: Simplifies the GCN by removing unnecessary components, which improves training speed [20].
  • SGL: Constructs two subgraphs using data augmentation and performs self-supervised contrastive learning by comparing the GNN-aggregated node features of the subgraphs, correcting the node embeddings to help collaborative filtering [11].
  • NCL: Neighborhood-based contrastive learning that corrects node embeddings by constructing positive sample pairs from semantic and structural neighbors [13].
We note that the first two methods are traditional recommendation techniques and the other five methods are based on graph neural networks.

5.3. Metrics

To evaluate the recommendation performance, we employ two metrics that are widely used for recommendation algorithms: Recall and Normalized Discounted Cumulative Gain (NDCG). The first metric measures the proportion of a user’s relevant items that appear in the top-K recommendation list. The NDCG metric further incorporates the ranking and item relevance of the recommendation list, normalized by the ideal discounted cumulative gain (DCG). To ensure stable and reliable experimental results, we conduct a comparative analysis of Recall and NDCG under different recommendation list lengths: Top-10, Top-20, and Top-50.
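For reference, the two metrics can be computed per user as in the sketch below; binary relevance is assumed, which matches implicit-feedback evaluation, and the function names are ours.
```python
import numpy as np

def recall_at_k(ranked_items: list, ground_truth: set, k: int) -> float:
    """Fraction of the user's held-out items that appear in the top-K list."""
    hits = len(set(ranked_items[:k]) & ground_truth)
    return hits / len(ground_truth)

def ndcg_at_k(ranked_items: list, ground_truth: set, k: int) -> float:
    """DCG of the top-K list normalized by the ideal DCG."""
    dcg = sum(1.0 / np.log2(rank + 2)
              for rank, item in enumerate(ranked_items[:k])
              if item in ground_truth)
    ideal = sum(1.0 / np.log2(rank + 2)
                for rank in range(min(k, len(ground_truth))))
    return dcg / ideal
```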

5.4. Implementation Details

The comparison methods presented in this paper were implemented using the RecBole framework, a robust and efficient recommendation library that offers reproductions of published methods, dataset provisioning and preprocessing, and uniform metric calculation [40]. RecBole is used for the comparison experiments to ensure fair results, providing the same dataset, preprocessing, and optimization method for each model. Following NCL, we set the embedding size to 64 and the batch size to 4096 [13].
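For example, a RecBole baseline run with these settings might look like the following; the exact option names and dataset identifiers depend on the installed RecBole version and local data, so treat this as a hedged sketch rather than our exact experiment script:
```python
from recbole.quick_start import run_recbole

run_recbole(
    model='LightGCN',          # any of the compared baselines registered in RecBole
    dataset='ml-1m',           # assumed local dataset name
    config_dict={
        'embedding_size': 64,     # hyperparameters following NCL
        'train_batch_size': 4096,
    },
)
```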

5.5. Overall Comparison

Table 1 presents experimental data from our proposed method and seven baseline methods. Based on these data, we can draw several important conclusions as follows:
(1) The two traditional matrix factorization methods, BPR and Neu-MF, do not perform as well as the graph collaborative filtering methods on either dataset. This is likely because their models are too simple and can only use direct user–item interaction data, leaving higher-order information unexploited, which leads to poor recommendations especially on sparse data. DGCF and NGCF improve considerably over the two traditional methods on both datasets thanks to their ability to use more of the graph structure. However, DGCF performs worse than NGCF on Yelp (partial), which may suggest that DGCF is prone to undertraining on smaller datasets, reducing recommendation accuracy; this might be because DGCF requires more user interactions to model the impact of user purchase behavior. LightGCN, another graph collaborative filtering method, shows a significant performance improvement over NGCF and DGCF, indicating that its simplification of graph convolution is highly effective.
(2) Among the self-supervised methods, SGL improves over LightGCN, indicating that the data-augmentation-based contrastive learning over SGL’s subgraphs is indeed effective. NCL exhibits a significant performance boost over both SGL and LightGCN, showing that its semantic and structural neighbor contrastive learning corrects the node embeddings effectively and robustly. Our proposed method, CECL, improves most of the metrics on both datasets compared to NCL. Although a few metrics are merely comparable to NCL, our proposed innovations, such as community discovery-based classification and community-centered contrastive learning, demonstrate their effectiveness in improving performance.
(3) Finally, our proposed CECL achieves varying degrees of improvement over the baselines on most metrics, owing to its correction of node embeddings during training.

5.6. Ablation Experiment

In this section, several ablation experiments are conducted to verify the effectiveness of each component of the CECL approach. We create three variants of the CECL method.
  • M1: Removes the community-centered contrastive learning from CECL while keeping the self-supervised classification; serves as a control against M2, M3, and CECL.
  • M2: Removes both the self-supervised classification and the community-centered contrastive learning from CECL; only LightGCN + BPR + contrastive learning with higher-order neighbors is used for training.
  • M3: Removes the self-supervised classification from CECL while keeping the community-centered contrastive learning; serves as a control against M1 and CECL.
We conduct the ablation experiments only on the partial Yelp dataset; the experimental results are shown in Figure 5 and Figure 6.
Our ablation experiments yield important results. When comparing M1 with M2, we find a significant increase in data metrics after adding unsupervised classification. This finding indicates that our proposed unsupervised classification method is indeed effective.
Similarly, when comparing M1 with M3, we observe a significant improvement in metrics. This result suggests that our proposed unsupervised community-centered contrast learning method is helpful in achieving accurate prediction results.
Furthermore, our comparison of M1 with CECL reveals that CECL can achieve substantial improvement, underscoring its effectiveness in producing recommended results.
Finally, when comparing M2 and M3 with CECL, we observe that CECL outperforms M2 and M3. This result further confirms the effectiveness of fusing unsupervised classification methods and unsupervised community-centered contrastive learning methods.

5.7. Discussion

NCL uses the traditional K-means clustering method to cluster the nodes [13], and SGL uses data augmentation to obtain positive samples while treating other nodes as negatives [11]. The proposed method differs from other self-supervised learning approaches in that it incorporates existing unsupervised community discovery algorithms to extract hidden community information from the graph structure, which helps alleviate data sparsity and improves recommendation accuracy. By combining the unsupervised community discovery algorithm with contrastive learning and classification prediction, the approach becomes self-supervised: the community discovery algorithm assists self-supervised learning, which in turn helps generate effective recommendations.
In terms of time complexity, CECL relies on a specific community discovery algorithm and runs it only once per training run to obtain positive and negative samples, whereas NCL must run K-means once per epoch to obtain dynamic cluster information [13], and SGL must run data augmentation once per epoch to generate a new subgraph for contrastive learning [11]. Therefore, CECL achieves a considerable reduction in training time.

6. Conclusions

Our work proposes a novel method combining community discovery and self-supervised learning for graph collaborative filtering recommendation systems to alleviate data sparsity. Our approach enhances node representations by mining latent information, tuning the node embeddings both globally and locally. First, we detect communities based on the graph structure to obtain the potential community information of the nodes; this information is used as pseudo-labels for classification, aligning the node embeddings with a global distribution and introducing global features. Second, we cluster the embeddings within communities by comparing nodes with community centers; this separation between communities maintains the semantic characteristics of the graph structure. Finally, we use structured contrastive objectives in combination with graph collaborative filtering to aggregate nearest-neighbor features of nodes. Experimental results show that CECL outperforms the traditional method NGCF by 12% and 6%, and the more effective contrastive learning method SGL by 7% and 6%. Furthermore, our method outperforms NCL on both datasets, demonstrating the effectiveness of CECL.
In our future work, we plan to delve deeper into community detection by leveraging contextual information, such as social networks, to extract more stable and effective community structures, which we believe can further assist self-supervised learning and improve performance. Additionally, we intend to explore alternative definitions of node neighbors for contrastive learning.

Author Contributions

Conceptualization, W.M.; Methodology, X.X. and E.Z.; Software, X.X., J.Z. and E.Z.; Validation, X.X.; Investigation, X.X. and J.Z.; Resources, W.M.; Data curation, X.X.; Writing—original draft, X.X.; Writing—review & editing, X.X. and W.M.; Visualization, X.X.; Supervision, W.M., J.Z. and E.Z.; Project administration, W.M.; Funding acquisition, W.M. All authors have read and agreed to the published version of the manuscript.

Funding

This study is supported by the National Natural Science Foundation of China (No. 61602399), the Shandong Provincial Natural Science Foundation, China (ZR2020MF100), and the Youth Innovation Science and Technology Support Program of Shandong Province under Grant 2021KJ080.

Data Availability Statement

Data presented in this study are openly available in CECL_GNN at https://zenodo.org/records/10216156.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Ricci, F.; Rokach, L.; Shapira, B. Introduction to Recommender Systems Handbook. In Recommender Systems Handbook; Springer: Boston, MA, USA, 2011; pp. 1–35. [Google Scholar]
  2. Covington, P.; Adams, J.; Sargin, E. Deep Neural Networks for YouTube Recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; pp. 191–198. [Google Scholar]
  3. Su, X.; Khoshgoftaar, T.M. A Survey of Collaborative Filtering Techniques. Adv. Artif. Intell. 2009, 2009, 421425. [Google Scholar] [CrossRef]
  4. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-Based Collaborative Filtering Recommendation Algorithms. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, China, 1–5 May 2001; pp. 285–295. [Google Scholar]
  5. He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.S. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web, Geneva, Switzerland, 3–7 April 2017; pp. 173–182. [Google Scholar]
  6. Wang, X.; He, X.; Wang, M.; Feng, F.; Chua, T.S. Neural Graph Collaborative Filtering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 165–174. [Google Scholar]
  7. Wang, X.; Jin, H.; Zhang, A.; He, X.; Xu, T.; Chua, T.S. Disentangled Graph Collaborative Filtering. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, 25–30 July 2020; pp. 1001–1010. [Google Scholar]
  8. Han, L.; Qin, J.; Xia, B. Enhanced Social Recommendation Method Integrating Rating Bias Offsets. Electronics 2023, 12, 3926. [Google Scholar] [CrossRef]
  9. Aldayel, M.; Al-Nafjan, A.; Al-Nuwaiser, W.M.; Alrehaili, G.; Alyahya, G. Collaborative Filtering-Based Recommendation Systems for Touristic Businesses, Attractions, and Destinations. Electronics 2023, 12, 4047. [Google Scholar] [CrossRef]
  10. Wu, J.; Wang, X.; Feng, F.; He, X.; Chen, L.; Lian, J.; Xie, X. Self-Supervised Graph Learning for Recommendation. arXiv 2021, arXiv:2010.10783. [Google Scholar]
  11. Wu, L.; Lin, H.; Tan, C.; Gao, Z.; Li, S.Z. Self-Supervised Learning on Graphs: Contrastive, Generative, or Predictive. IEEE Trans. Knowl. Data Eng. 2023, 35, 4216–4235. [Google Scholar] [CrossRef]
  12. Yu, J.; Yin, H.; Xia, X.; Chen, T.; Cui, L.; Nguyen, Q.V.H. Are Graph Augmentations Necessary?: Simple Graph Contrastive Learning for Recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 1294–1303. [Google Scholar]
  13. Lin, Z.; Tian, C.; Hou, Y.; Zhao, W.X. Improving Graph Collaborative Filtering with Neighborhood-Enriched Contrastive Learning. In Proceedings of the ACM Web Conference 2022, Lyon, France, 25–29 April 2022; pp. 2320–2329. [Google Scholar]
  14. Zhu, Q.; Du, B.; Yan, P. Self-supervised Training of Graph Convolutional Networks. arXiv 2020, arXiv:2006.02380. [Google Scholar]
  15. Rendle, S.; Freudenthaler, C.; Gantner, Z.; Schmidt-Thieme, L. BPR: Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, 18–21 June 2009; pp. 452–461. [Google Scholar]
  16. Koren, Y.; Bell, R.; Volinsky, C. Matrix Factorization Techniques for Recommender Systems. Computer 2009, 42, 30–37. [Google Scholar] [CrossRef]
  17. Xin, X.; He, X.; Zhang, Y.; Zhang, Y.; Jose, J. Relational Collaborative Filtering: Modeling Multiple Item Relations for Recommendation. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 125–134. [Google Scholar]
  18. Wang, H.; Wang, N.; Yeung, D.Y. Collaborative Deep Learning for Recommender Systems. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 August 2015; pp. 1235–1244. [Google Scholar]
  19. He, X.; Gao, M.; Kan, M.Y.; Wang, D. Birank: Towards ranking on bipartite graphs. IEEE Trans. Knowl. Data Eng. 2016, 29, 57–71. [Google Scholar] [CrossRef]
  20. He, X.; Deng, K.; Wang, X.; Li, Y.; Zhang, Y.; Wang, M. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, 25–30 July 2020; pp. 639–648. [Google Scholar]
  21. Zhu, J.; Li, K.; Peng, J.; Qi, J. Self-Supervised Graph Attention Collaborative Filtering for Recommendation. Electronics 2023, 12, 793. [Google Scholar] [CrossRef]
  22. Defferrard, M.; Bresson, X.; Vandergheynst, P. Convolutional Neural Networks on Graphs with Fast Localized Spectral Filtering. arXiv 2016, arXiv:1606.09375. [Google Scholar]
  23. Li, J.; Zhou, P.; Xiong, C.; Hoi, S.C.H. Prototypical Contrastive Learning of Unsupervised Representations. arXiv 2021, arXiv:2005.04966. [Google Scholar]
  24. Jiang, L.; Yan, G.; Luo, H.; Chang, W. Improved Collaborative Recommendation Model: Integrating Knowledge Embedding and Graph Contrastive Learning. Electronics 2023, 12, 4238. [Google Scholar] [CrossRef]
  25. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9726–9735. [Google Scholar]
  26. Wu, Z.; Xiong, Y.; Yu, S.X.; Lin, D. Unsupervised Feature Learning via Non-parametric Instance Discrimination. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 3733–3742. [Google Scholar]
  27. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A Simple Framework for Contrastive Learning of Visual Representations. In Proceedings of the 37th International Conference on Machine Learning, Virtual Event, 13–18 July 2020; pp. 1597–1607. [Google Scholar]
  28. Ye, M.; Zhang, X.; Yuen, P.C.; Chang, S.F. Unsupervised embedding learning via invariant and spreading instance feature. In Proceedings of the 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Long Beach, CA, USA, 15–20 June 2019; pp. 6210–6219. [Google Scholar]
  29. Yèche, H.; Dresdner, G.; Locatello, F.; Hüser, M.; Rätsch, G. Neighborhood Contrastive Learning Applied to Online Patient Monitoring. In Proceedings of the 38th International Conference on Machine Learning, Online, 18–24 July 2021; pp. 11964–11974. [Google Scholar]
  30. Caron, M.; Misra, I.; Mairal, J.; Goyal, P.; Bojanowski, P.; Joulin, A. Unsupervised Learning of Visual Features by Contrasting Cluster Assignments. In Proceedings of the 34th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 6–12 December 2020; pp. 9912–9924. [Google Scholar]
  31. Epasto, A.; Lattanzi, S.; Paes Leme, R. Ego-Splitting Framework: From Non-Overlapping to Overlapping Clusters. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 145–154. [Google Scholar]
  32. Ye, F.; Chen, C.; Zheng, Z. Deep Autoencoder-like Nonnegative Matrix Factorization for Community Detection. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 22–26 October 2018; pp. 1393–1402. [Google Scholar]
  33. Newman, M.E. Modularity and community structure in networks. Proc. Natl. Acad. Sci. USA 2006, 103, 8577–8582. [Google Scholar] [CrossRef] [PubMed]
  34. Ng, A.Y.; Jordan, M.I.; Weiss, Y. On Spectral Clustering: Analysis and an Algorithm. In Proceedings of the 14th International Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 3–8 December 2001; pp. 849–856. [Google Scholar]
  35. Ma, X.; Gao, L.; Yong, X.; Fu, L. Semi-supervised clustering algorithm for community structure detection in complex networks. Phys. A Stat. Mech. Its Appl. 2010, 389, 187–197. [Google Scholar] [CrossRef]
  36. Prat-Pérez, A.; Dominguez-Sal, D.; Larriba-Pey, J.L. High Quality, Scalable and Parallel Community Detection for Large Real Graphs. In Proceedings of the 23rd International Conference on World Wide Web, Seoul, Republic of Korea, 7–11 April 2014; pp. 225–236. [Google Scholar]
  37. Dempster, A.P.; Laird, N.M.; Rubin, D.B. Maximum likelihood from incomplete data via the EM algorithm. J. R. Stat. Soc. Ser. B (Methodol.) 1977, 39, 1–22. [Google Scholar] [CrossRef]
  38. Harper, F.M.; Konstan, J.A. The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst. 2015, 5, 1–19. [Google Scholar] [CrossRef]
  39. Asghar, N. Yelp Dataset Challenge: Review Rating Prediction. arXiv 2016, arXiv:1605.05362. [Google Scholar]
  40. Zhao, W.X.; Mu, S.; Hou, Y.; Lin, Z.; Chen, Y.; Pan, X.; Li, K.; Lu, Y.; Wang, H.; Tian, C.; et al. RecBole: Towards a Unified, Comprehensive and Efficient Framework for Recommendation Algorithms. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual Event, 1–5 November 2021; pp. 4653–4664. [Google Scholar]
Figure 1. Overall structure of our proposed CECL method.
Figure 2. Graph community discovery. By clustering users and items into separate communities, each node obtains a label indicating the community it belongs to.
Figure 3. Community-center contrastive learning samples. Each node forms a positive sample with the center of the community in which it is located and negative samples with the centers of other homogeneous communities.
Figure 4. Structural contrastive samples with two-hop homogeneous neighbors. A node’s two-hop homogeneous neighbors are used as positive samples, while the two-hop homogeneous neighbors of other nodes are used as negative samples.
Figure 5. Ablation experiment results for Recall.
Figure 6. Ablation experiment results for NDCG.
Table 1. Performance comparison results with baseline methods.
| Datasets | Metrics | BPR | Neu-MF | NGCF | DGCF | LightGCN | SGL | NCL | CECL |
|---|---|---|---|---|---|---|---|---|---|
| ML-1M | Recall@10 | 0.1741 | 0.1604 | 0.1744 | 0.1797 | 0.1878 | 0.1881 | **0.2058** | 0.2054 |
| | Recall@20 | 0.2641 | 0.2520 | 0.2813 | 0.2727 | 0.2653 | 0.2846 | 0.3042 | **0.3068** |
| | Recall@50 | 0.4221 | 0.4120 | 0.4232 | 0.4344 | 0.4454 | 0.4499 | 0.4682 | **0.4725** |
| | NDCG@10 | 0.2394 | 0.2402 | 0.2387 | 0.2476 | 0.2531 | 0.2534 | 0.2714 | **0.2722** |
| | NDCG@20 | 0.2504 | 0.2446 | 0.2500 | 0.2586 | 0.2640 | 0.2664 | 0.2836 | **0.2850** |
| | NDCG@50 | 0.2941 | 0.2837 | 0.2942 | 0.3034 | 0.3102 | 0.3107 | 0.3297 | **0.3316** |
| Yelp (partial) | Recall@10 | 0.0537 | 0.0941 | 0.1445 | 0.1117 | 0.1564 | 0.1450 | **0.1930** | 0.1921 |
| | Recall@20 | 0.0853 | 0.1525 | 0.2249 | 0.2096 | 0.2310 | 0.2395 | 0.2588 | **0.2733** |
| | Recall@50 | 0.1629 | 0.3608 | 0.3750 | 0.3393 | 0.3824 | 0.4116 | 0.4167 | **0.4222** |
| | NDCG@10 | 0.0291 | 0.0615 | 0.1012 | 0.0673 | 0.0925 | 0.1082 | **0.1361** | 0.1331 |
| | NDCG@20 | 0.0397 | 0.0800 | 0.1259 | 0.0977 | 0.1147 | 0.1375 | 0.1568 | **0.1589** |
| | NDCG@50 | 0.0592 | 0.1289 | 0.1624 | 0.1298 | 0.1520 | 0.1785 | **0.1955** | 0.1941 |

Bolded numbers are the best results. The full name of the dataset ML-1M is MovieLens-1M.