Article

Improving Graph Collaborative Filtering from the Perspective of User–Item Interaction Directly Using Contrastive Learning

1 College of Computer Science and Technology (College of Data Science), Taiyuan University of Technology, Taiyuan 030024, China
2 Shan Xi Energy Internet Research Institute, Taiyuan 030000, China
3 College of Computer Science and Technology, Taiyuan Normal University, Taiyuan 030619, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(13), 2057; https://doi.org/10.3390/math12132057
Submission received: 23 May 2024 / Revised: 12 June 2024 / Accepted: 25 June 2024 / Published: 30 June 2024

Abstract

Graph contrastive learning has demonstrated significant superiority for collaborative filtering. These methods typically use augmentation techniques to generate contrastive views and then train graph neural networks with contrastive learning as an auxiliary task. Although these methods are effective, they do not consider applying contrastive learning from the perspective of user–item interaction, and therefore do not fully leverage its potential. It is well known that contrastive learning maximizes the consistency of positive pairs and minimizes the agreement of negative pairs. Collaborative filtering expects high consistency between users and the items they like and low consistency between users and the items they dislike. If we treat the items that users like as positive examples and the items they dislike as negative examples, contrastive learning aligns naturally with the goal of collaborative filtering. Based on this understanding, we propose a new objective function called DCL loss, which improves graph collaborative filtering from the perspective of user–item interaction by Directly using Contrastive Learning. Extensive experiments show that when a model adopts DCL loss as its objective function, both its recommendation performance and its training efficiency improve significantly.

1. Introduction

In the past few decades, the amount of available information has grown explosively, making it increasingly difficult for people to find what they are interested in [1]. Recommendation systems can filter out information that users are not interested in and recommend information that they are interested in, thereby alleviating the information overload problem [1,2,3,4]. As a result, recommender systems have become a hot topic in both industry and academia. Within recommendation systems, collaborative filtering is a classic algorithm: it uses historical user–item interactions to learn users' preferences and then recommends related items to users [1,5,6,7]. Due to its simplicity and ease of use, collaborative filtering has been widely adopted and studied by a large number of researchers.
Initially, the matrix factorization technique was applied to collaborative filtering and achieved quite good recommendation performance [8,9]. However, matrix factorization cannot leverage neighbor information, which limits its recommendation performance. Graph neural networks can obtain neighbor information through the message-passing mechanism [10]. As a result, some researchers have applied graph neural networks to collaborative filtering to further improve recommendation performance. Although graph collaborative filtering is very effective, it still has two main shortcomings. One is the over-smoothing problem of graph neural networks: as the number of layers increases, the node embeddings become more and more similar, which degrades performance [11]. At present, there is still no good solution to the over-smoothing problem, so graph neural networks are generally designed with 3–5 layers to achieve the best performance. The other is the data sparsity problem: in the entire dataset, a specific user has interacted with very few items and has not interacted with the vast majority. To alleviate data sparsity, researchers have introduced contrastive learning [12].
Graph contrastive learning integrates contrastive learning into graph neural networks to improve the performance of collaborative filtering [13]. It usually uses a graph neural network as the backbone and integrates contrastive learning as an auxiliary task to train the network. The overall framework of current graph contrastive learning models is shown in Figure 1. During the training stage, a joint loss function composed of BPR loss [14] and InfoNCE loss [15] is employed. The calculation of BPR loss is illustrated in Figure 1c. BPR loss is a typical pair-wise loss function: it increases the discrimination between positive and negative examples by computing the difference of their dot-product scores [16]. The auxiliary loss function $\mathcal{L}_{CL}$ is the InfoNCE loss commonly used in contrastive learning; its calculation is shown in Figure 1f. Augmentation methods are first employed to generate different views of users and items, which are then used to compute the InfoNCE loss. These methods include stochastic augmentation (e.g., node/edge perturbation) of the user–item interaction graph, heuristic-based augmentation techniques (e.g., user clustering), etc. [17].
While the existing methods are effective, they do not adequately exploit the negative examples in the data. Specifically, for a given user there is a substantial number of items that the user has not interacted with. Nonetheless, in the BPR loss function, for every user–item interaction only a single item that the user has not interacted with is chosen as a negative example. This gives rise to a major shortcoming of graph contrastive learning: it fails to harness the full potential of the items that a particular user has not interacted with in the recommendation task. Even after hundreds of training epochs, a positive pair is only contrasted against a few hundred negative items. This remains vastly insufficient in terms of leveraging negative examples, which not only diminishes the model's recommendation performance but also hampers its training efficiency.
We can observe that the current graph contrastive learning models perform the selection of positive and negative examples on two occasions. One is during the usage of BPR loss, and the other is during the application of contrastive learning. Actually, we can consider directly applying contrastive learning from the perspective of user–item interaction to enable the model to fully leverage negative examples. We regard the items that a user has interacted with as positive examples and the remaining items as negative examples. Then, we can directly use contrastive learning to train the model. This approach is feasible because contrastive learning can work very well with the goal of collaborative filtering. Collaborative filtering expects a high consistency between users and the items they like and a low consistency between users and the items they dislike. Contrastive learning can maximize the consistency of positive pairs and minimize the agreement of negative pairs [12,13].
In this paper, we investigate training graph collaborative filtering models directly with contrastive learning. The overall framework of the initially proposed scheme is illustrated in Figure 2. The objective function involved is $\mathcal{L}_{DCL\_N}$, whose calculation is illustrated in Figure 2d. Positive and negative examples are selected from both the user and the item perspectives. For a user, all items that they have interacted with are regarded as positive examples, and all items that they have not interacted with are regarded as negative examples. For an item, positive and negative examples are selected in the same way. By doing so, a model can fully leverage negative examples. We then conduct experiments on this scheme. The experimental results are decent, but there is still a noticeable gap compared to the current state-of-the-art graph contrastive learning models. On the one hand, this demonstrates the feasibility of training graph collaborative filtering models directly with contrastive learning; on the other hand, it compels us to improve the scheme. The slightly inferior performance of $\mathcal{L}_{DCL\_N}$ may be attributed to its insufficient utilization of positive examples. Therefore, inspired by BPR loss [14], we select positive and negative examples based on each user–item interaction. The computation of the improved objective function $\mathcal{L}_{DCL}$ is depicted in Figure 3. For each user–item interaction, the user and item involved are considered positive examples of each other. For the user, we randomly select n items that they have not interacted with as negative examples; for the item, we randomly select n users who have not interacted with it as negative examples. The improved objective function $\mathcal{L}_{DCL}$ is then constructed. This ensures the full utilization of positive examples while adequately leveraging negative examples. Finally, we perform extensive experiments. The results show that when a model uses $\mathcal{L}_{DCL}$ as the objective function, both its recommendation performance and its training efficiency are greatly improved.
In summary, the main contributions of this work are as follows:
  • We propose selecting positive and negative examples from the perspective of user–item interaction, which can enable contrastive learning to effectively achieve the goal of collaborative filtering.
  • We propose the objective function DCL loss, which significantly improves both the performance and training efficiency of the graph collaborative filtering models.
  • We conduct extensive experiments on four benchmark datasets to demonstrate the superiority of DCL loss.
The remainder of this paper is organized as follows: In Section 2, we first provide a review of the related literature. In Section 3, we introduce graph collaborative filtering and graph contrastive learning. Section 4 provides a detailed introduction to the original objective function DCL_N loss and the improved objective function DCL loss. Extensive experimental research is presented in Section 5. We conclude this work in Section 6.

2. Related Work

2.1. Collaborative Filtering

The principle of collaborative filtering is that people tend to cluster together based on similarities, and items are grouped based on common characteristics. Initially, collaborative filtering directly obtained user similarity through a co-occurrence matrix. However, this approach suffers from a significant head effect and weak generalization ability. Matrix factorization was introduced to alleviate these issues by incorporating latent vectors [9]. BiasMF [18] introduces user and item biases into matrix factorization to capture users' intrinsic preferences for items. ConvNCF [19] first computes the outer product of user and item vectors to obtain a two-dimensional matrix, which is then fed into a convolutional network to derive the user's preference score for the item. DMF [20] adopts row vectors of the user–item interaction matrix as user feature vectors and column vectors as item feature vectors, and then builds a deep learning model for training. ENMF [21] employs an efficient neural matrix factorization model without sampling, utilizing all negative instances in each training iteration. However, matrix factorization does not exploit high-order information, which limits its recommendation performance. Graph neural networks can aggregate information from neighboring nodes and are therefore increasingly applied in this field. GC-MC [22] combines graph convolution with matrix factorization to capture the local and global structure of the bipartite graph. PinSage [3] constructs item graphs based on item features and learns item representations through graph convolutional networks for web-scale recommendation. NGCF [8] applies a graph neural network to collaborative filtering and concatenates the output of each layer as the final node representation. Multi-GCCF [23] designs a graph-convolution-based decomposer to distinguish users' intentions to purchase items and fuses different types of node representations through a combiner. DGCF [24] decouples users' intentions to purchase products into different subgraphs and concatenates the node sub-embeddings learned from different subgraphs to obtain the final embeddings. LightGCN [10] removes the feature transformation and nonlinear activation operations commonly used in graph neural networks, which greatly simplifies graph collaborative filtering. DHCF [25] uses hypergraphs to enrich item features so that the model can learn better node representations. HGCF [26] embeds users and items in hyperbolic space and encodes high-order node information through skip-GCN connections. HICF [27] proposes a new method in hyperbolic space to alleviate the long-tail problem. In addition, some researchers improve performance by studying objective functions. UltraGCN [28] approximates the effect of infinite-layer graph convolutions by optimizing a proposed constraint loss. DirectAU [29] proposes a new loss function, DirectAU, for collaborative filtering from the perspective of the alignment and uniformity of node vector distributions. However, integrating these objective functions with graph neural networks does not lead to a significant improvement in recommendation performance.

2.2. Graph Contrastive Learning

Contrastive learning, an unsupervised learning methodology, has already demonstrated substantial potential in deep learning. In image processing, MoCo [30] and SimCLR [31] are notable contrastive learning models, while in natural language processing, ConSERT [32] and SimCSE [33] are distinguished models within this paradigm. Integrating contrastive learning into graph neural networks has also been shown to augment model capabilities. Currently, collaborative filtering is likewise witnessing numerous renowned graph contrastive learning models, which primarily focus on how to generate better contrastive views. SGL [13] generates views through three methods, namely node perturbation, edge perturbation, and random walk. NCL [34] argues that most graph contrastive learning models use random sampling to generate contrastive pairs, which cannot fully exploit the power of contrastive learning; it therefore constructs semantic neighbor nodes based on clustering to form new contrastive views. SimGCL [35] argues that generating different views through graph augmentation is troublesome, time-consuming, and unnecessary; instead, it directly adds noise to node representations to form different views, while its network structure remains the same as that of SGL [13]. XSimGCL [12] further demonstrates that graph augmentation is unnecessary and simplifies the structure of SimGCL [35], improving training efficiency by about three times. HCCF [36] uses a hypergraph cross-contrast learning structure to capture local and global collaborative information. LightGCL [17] generates contrastive views in the process of singular value decomposition to obtain global collaborative information. DCCF [37] proposes a disentangled contrastive learning method for recommendation that explores latent factors underlying implicit interaction intents. AutoCF [38] proposes a new method to automatically perform data augmentation. SimRec [39] proposes a simple and effective collaborative filtering model that marries the power of knowledge distillation and contrastive learning. Figure 1 illustrates the framework of previous works, which employ distinct augmentation techniques in the contrastive task. However, in the recommendation task, their reliance on BPR loss as the objective function hinders the full utilization of negative data. In the following, Section 3 examines several notable graph collaborative filtering models in depth, paving the way for the introduction of our study in Section 4.

3. Preliminary Examination

In this section, we first introduce the basic concepts, common notations, and objective function of graph collaborative filtering. Then, we clarify the specific methodology of the current graph contrastive learning by introducing the well-known model XSimGCL. Finally, we analyze the advantages and disadvantages of the current graph contrastive learning.

3.1. Graph Collaborative Filtering

Collaborative filtering is a very classic recommendation algorithm. Its principle is to obtain users' preferences based on historical user–item interactions and then recommend items to target users. Graph collaborative filtering regards users and items as nodes and treats known interactions as edges, yielding a user–item bipartite graph. Each user (item) is then assigned an initial embedding. Subsequently, users' preference scores for all items are obtained by training the graph neural network. Finally, the items with high scores are recommended to the corresponding users [34].
Specifically, define $\mathcal{U} = \{u\}$ as the user set, $\mathcal{I} = \{i\}$ as the item set, and $\mathcal{R} \in \{0,1\}^{|\mathcal{U}| \times |\mathcal{I}|}$ as the observed interaction matrix. If there is an interaction between user $u$ and item $i$, then $\mathcal{R}_{u,i} = 1$; if there is no interaction between user $u$ and item $i$, then $\mathcal{R}_{u,i} = 0$. Let $\mathcal{G} = \{\mathcal{V}, \mathcal{E}\}$ denote the user–item bipartite graph, where $\mathcal{V} = \mathcal{U} \cup \mathcal{I}$ is the set of nodes and $\mathcal{E} = \{(u,i) \mid u \in \mathcal{U}, i \in \mathcal{I}, \mathcal{R}_{u,i} = 1\}$ is the set of edges. Then, each user (item) is assigned a $d$-dimensional embedding [10]. For example, user $u$ is assigned the embedding $e_u^{(0)}$, and item $i$ is assigned the embedding $e_i^{(0)}$. Next, these embeddings are concatenated together to form an initial embedding matrix $E^{(0)} \in \mathbb{R}^{|N| \times d}$, where $|N|$ is the number of nodes. Namely, $E^{(0)}$ is denoted as
$E^{(0)} = \left[ e_{u_0}^{(0)}, e_{u_1}^{(0)}, \ldots, e_{i_0}^{(0)}, e_{i_1}^{(0)}, \ldots \right]^{T} \quad (1)$
Subsequently, the final embedding matrix $E \in \mathbb{R}^{|N| \times d}$ used for score prediction is obtained by training the graph neural network. Finally, users' preference scores for items are obtained through dot products. This can be summarized in the following paradigm:
$E = f(\mathcal{G}, E^{(0)}), \quad y_{u,i} = e_u^{T} e_i \quad (2)$
where $f(\cdot)$ represents the graph neural network model used [39], $e_u$ is the final embedding of user $u$, $e_i$ is the final embedding of item $i$, $e_u$ and $e_i$ are embeddings in $E$, and $y_{u,i}$ is user $u$'s preference score for item $i$.
This method generally processes the dataset in the format of triplets. Let $\mathcal{O} = \{(u,i,j) \mid \mathcal{R}_{u,i} = 1, \mathcal{R}_{u,j} = 0\}$ represent the training dataset, where $\mathcal{R}_{u,i}$ and $\mathcal{R}_{u,j}$ are entries of the observed implicit feedback matrix $\mathcal{R}$ (namely, user $u$ and item $i$ have an edge, while user $u$ and item $j$ have no edge, on the user–item bipartite graph). The commonly used pair-wise loss function is BPR loss [14], defined as
$\mathcal{L}_{BPR} = - \sum_{(u,i,j) \in \mathcal{B}} \log \sigma \left( e_u^{T} e_i - e_u^{T} e_j \right) \quad (3)$
where $\mathcal{B}$ is a subset of the training set $\mathcal{O}$, which denotes a mini-batch, and $\sigma$ represents the sigmoid function.
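For concreteness, the following PyTorch sketch computes the dot-product scores of Equation (2) and the mini-batch BPR loss of Equation (3). The tensor names, shapes, and batching conventions are our own illustrative assumptions, not the implementation used in the paper.

```python
import torch
import torch.nn.functional as F

def bpr_loss(e_u, e_i, e_j):
    """Mini-batch BPR loss of Equation (3).

    e_u: [B, d] final embeddings of the batch users
    e_i: [B, d] final embeddings of the interacted (positive) items
    e_j: [B, d] final embeddings of the sampled non-interacted (negative) items
    """
    pos_scores = (e_u * e_i).sum(dim=-1)  # y_{u,i} = e_u^T e_i  (Equation (2))
    neg_scores = (e_u * e_j).sum(dim=-1)  # y_{u,j} = e_u^T e_j
    return -F.logsigmoid(pos_scores - neg_scores).sum()
```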

3.2. Graph Contrastive Learning

Graph contrastive learning trains graph neural networks by integrating contrastive learning as an auxiliary task. The overall framework is shown in Figure 1. Here, we use XSimGCL [12] to illustrate the process. XSimGCL [12] uses LightGCN [10] as the backbone. LightGCN [10] obtains the final embedding matrix $E$ by iteratively performing the neighbor aggregation operation as follows:
$E = \frac{1}{L+1} \sum_{i=0}^{L} E^{(i)} = \frac{1}{L+1} \sum_{i=0}^{L} A^{i} E^{(0)} \quad (4)$
where $E^{(i)}$ represents the $i$-th-layer embedding matrix, $A \in \mathbb{R}^{|N| \times |N|}$ is the normalized adjacency matrix of the undirected graph without self-connections, and $L$ is the number of network layers.
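A minimal sketch of the layer-averaged propagation in Equation (4), assuming the normalized adjacency matrix is available as a sparse tensor `A_hat`; the function and variable names are illustrative, not the authors' code.

```python
import torch

def lightgcn_propagate(A_hat, E0, L):
    """Layer-averaged propagation of Equation (4).

    A_hat: [N, N] sparse normalized adjacency matrix of the bipartite graph
    E0:    [N, d] initial embedding matrix E^(0)
    L:     number of propagation layers
    """
    layer_embeddings = [E0]
    E = E0
    for _ in range(L):
        E = torch.sparse.mm(A_hat, E)          # E^(i) = A E^(i-1)
        layer_embeddings.append(E)
    # E = 1/(L+1) * sum_{i=0}^{L} E^(i)
    return torch.stack(layer_embeddings, dim=0).mean(dim=0)
```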
In contrast, XSimGCL [12] uses a representation-level noise perturbation technique to generate contrastive views. During training, the encoder used is defined as
$E = \frac{1}{L} \sum_{i=1}^{L} (E^{(i)})' = \frac{1}{L} \sum_{i=1}^{L} \left( A (E^{(i-1)})' + \Delta^{(i)} \right) \quad (5)$
where $E^{(i)} = A (E^{(i-1)})'$, $(E^{(i)})'$ is the noise-perturbed version of the embedding matrix $E^{(i)}$ (namely, $(E^{(i)})' = E^{(i)} + \Delta^{(i)}$), and $\Delta^{(i)}$ is random noise generated according to $E^{(i)}$, which must satisfy the following conditions:
$\Delta^{(i)} = X \odot \mathrm{sign}(E^{(i)}), \quad X \in \mathbb{R}^{d} \sim U(0,1); \quad \| \Delta^{(i)} \|_2 = \eta \quad (6)$
where $\eta$ is a constant that controls the amount of noise added.
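The noise-perturbed encoder of Equations (5) and (6) can be sketched as follows. This is an illustration of the scheme described above under our own naming assumptions, not the XSimGCL reference implementation.

```python
import torch
import torch.nn.functional as F

def add_noise(E_i, eta):
    """Generate and add the perturbation of Equation (6) to a layer embedding matrix."""
    X = torch.rand_like(E_i)                   # X ~ U(0, 1)
    delta = F.normalize(X, dim=-1) * eta       # row-wise ||Delta||_2 = eta
    return E_i + delta * torch.sign(E_i)       # Delta = X (.) sign(E^(i))

def perturbed_propagate(A_hat, E0, L, eta):
    """Noise-perturbed propagation of Equation (5).

    Returns the averaged final matrix and the per-layer perturbed matrices,
    from which the K-th-layer contrastive view can be taken.
    """
    E_prev, layers = E0, []
    for _ in range(L):
        E_cur = add_noise(torch.sparse.mm(A_hat, E_prev), eta)  # (E^(i))' = A (E^(i-1))' + Delta^(i)
        layers.append(E_cur)
        E_prev = E_cur
    return torch.stack(layers, dim=0).mean(dim=0), layers
```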
Then, the final embedding matrix $E$ and the $K$-th-layer noise-perturbed embedding matrix $(E^{(K)})'$ are selected as the two views for contrastive learning ($K \in [1, L]$). Next, the auxiliary loss $\mathcal{L}_{CL}$ is defined as
$\mathcal{L}_{CL} = - \sum_{i \in \mathcal{B}} \log \frac{\exp \left( z_i^{T} (z_i^{(K)})' / \tau_1 \right)}{\sum_{j \in \mathcal{B}} \exp \left( z_i^{T} (z_j^{(K)})' / \tau_1 \right)} \quad (7)$
where $\mathcal{B}$ is a mini-batch, $i$ and $j$ are the users/items included in $\mathcal{B}$, $z_i$ and $(z_i^{(K)})'$ are $L_2$-normalized representations (namely, $z_i = \frac{e_i}{\|e_i\|_2}$ and $(z_i^{(K)})' = \frac{(e_i^{(K)})'}{\|(e_i^{(K)})'\|_2}$), $e_i$ is the embedding in $E$, $(e_i^{(K)})'$ is the embedding in $(E^{(K)})'$, and $\tau_1 > 0$ is the temperature coefficient used to adjust the magnitude of penalties applied to hard negative samples. The joint loss function is defined as
$\mathcal{L} = \mathcal{L}_{BPR} + \lambda \mathcal{L}_{CL} \quad (8)$
where $\lambda$ is a hyperparameter that adjusts the weight of contrastive learning in the whole learning process.
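A sketch of the cross-layer InfoNCE term in Equation (7) and the joint objective in Equation (8), assuming the batch embeddings and their K-th-layer perturbed views are available as dense tensors; `bpr_loss` refers to the earlier sketch, and all names are illustrative rather than the authors' implementation.

```python
import torch
import torch.nn.functional as F

def infonce_loss(e_final, e_view_K, tau1):
    """Cross-layer InfoNCE loss of Equation (7).

    e_final:  [B, d] final embeddings of the batch nodes (rows of E)
    e_view_K: [B, d] K-th-layer noise-perturbed embeddings of the same nodes
    """
    z = F.normalize(e_final, dim=-1)     # z_i = e_i / ||e_i||_2
    z_k = F.normalize(e_view_K, dim=-1)  # (z_i^(K))'
    logits = z @ z_k.t() / tau1          # z_i^T (z_j^(K))' for all j in the batch
    pos = logits.diag()                  # z_i^T (z_i^(K))'
    return -(pos - torch.logsumexp(logits, dim=-1)).sum()

# Joint objective of Equation (8), with lam playing the role of lambda:
# loss = bpr_loss(e_u, e_i, e_j) + lam * infonce_loss(e_batch, e_batch_view_K, tau1)
```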

3.3. Further Analysis

The above paradigm of graph contrastive learning is very effective. In general, graph contrastive learning models achieve higher recommendation performance and training efficiency than pure graph neural network models. Some papers on graph contrastive learning explain that integrating contrastive learning increases the uniformity of the node embedding distribution, thereby alleviating the pervasive popularity bias and promoting long-tail items [12,35]. Popularity bias means that a recommendation system tends to recommend popular items with high probability while rarely recommending relatively unpopular items. Adding contrastive learning increases the spatial distance between users and popular items, thereby reducing the probability of recommending popular items and promoting long-tail items.
We believe that the superiority of graph contrastive learning models over graph neural network models lies in their ability to leverage the data more fully. The objective function of graph collaborative filtering models is BPR loss. They use the dataset in the format of triplets, which means that each triplet contains only one positive pair and one negative pair. However, in collaborative filtering data, for a specific user there is a large number of items that the user has not interacted with. Therefore, using BPR loss as the objective function does not effectively utilize the negative examples in the data. We already outlined a solution to this issue in Section 1. By integrating contrastive learning, a new objective function, InfoNCE loss [15], is introduced. This allows the model to leverage not only the user–item interaction data but also the relationships between users and between items. However, it still does not address the underutilization of negative examples in the recommendation task: in graph contrastive learning, the main task still uses BPR loss as the objective function, and negative examples remain underused.
Abandoning BPR loss is necessary to address the underutilization of negative examples, and it is also possible: InfoNCE loss and BPR loss serve a similar purpose, as both encourage increasing the discrimination between positive and negative examples. Using InfoNCE loss as the objective function allows the model to fully utilize negative examples and further alleviate data sparsity. Additionally, as mentioned in Section 1, selecting positive and negative examples from the perspective of user–item interactions aligns the purpose of contrastive learning with the goal of collaborative filtering. In the next section, we make appropriate adjustments to InfoNCE loss so that it can replace the BPR loss used in different models. By doing so, we can fully unleash the potential of contrastive learning, further improving recommendation performance and enhancing training efficiency.

4. Methodology

In this section, we provide a detailed introduction to the method of training graph collaborative filtering models directly using contrastive learning. We design two methods for selecting positive and negative examples from the perspective of user–item interactions. Based on these methods, we construct the original objective function DCL_N loss and the improved objective function DCL loss.

4.1. Encoder

We needed to obtain the final embeddings for both users and items through an encoder. The process is the same as traditional graph collaborative filtering models, and can be summarized by the following equation:
$E = \mathrm{Encoder}(\mathcal{G}, E^{(0)}) \quad (9)$
where $E^{(0)}$ is the initialized embedding matrix, whose specific expression is given by Equation (1), and $\mathcal{G}$ is the user–item bipartite graph. $\mathrm{Encoder}(\cdot,\cdot)$ uses the user–item bipartite graph and the randomly initialized embedding matrix $E^{(0)}$ to obtain the final embedding matrix $E$. The encoder can be the same as in the BPRMF [14] algorithm, directly setting $E = E^{(0)}$; it can be the LightGCN [10] model, as shown in Equation (4); or it can add noise at the node-representation level, as in XSimGCL [12] and Equation (5). In short, any effective model that has been proposed can be used as an encoder, whether it is a graph neural network model or a graph contrastive learning model.
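The interchangeability of encoders described above can be sketched as a single dispatch function with the signature of Equation (9). The attribute `graph.A_hat`, the variant names, and the default hyperparameters are our own assumptions, and the helpers are the sketches from Section 3.

```python
def encode(graph, E0, variant="lightgcn", L=3, eta=0.1):
    """Interchangeable encoders with the signature of Equation (9).

    graph is assumed to expose the normalized adjacency matrix as graph.A_hat;
    lightgcn_propagate and perturbed_propagate are the sketches from Section 3.
    """
    if variant == "bprmf":
        return E0                                               # E = E^(0)
    if variant == "lightgcn":
        return lightgcn_propagate(graph.A_hat, E0, L)           # Equation (4)
    if variant == "xsimgcl":
        return perturbed_propagate(graph.A_hat, E0, L, eta)[0]  # Equation (5)
    raise ValueError(f"unknown encoder variant: {variant}")
```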

4.2. DCL_N Loss

In Section 3.1, we mentioned that the dataset is in the form of triplets. However, as this format no longer met our requirements, we changed the format of the dataset to fit our method. We first attempted to construct the dataset from the nodes of the user–item bipartite graph and train the network directly with contrastive learning. The overall framework is illustrated in Figure 2. Specifically, let the dataset be $\mathcal{O} = \mathcal{V}$, where $\mathcal{V} = \mathcal{U} \cup \mathcal{I}$ is the set of nodes on the graph. In each training iteration, we randomly select some nodes to form a mini-batch $\mathcal{B} = \{v_1, v_2, \ldots, v_b\}$. The selection of positive and negative examples is shown in Figure 2b,c. We use $S_{v_i}^{+}$ to denote the positive example set of node $v_i$ and $S_{v_i}^{-}$ to denote its negative example set. If $v_i \in \mathcal{U}$, then $S_{v_i}^{+}$ is the set of all items that the user $v_i$ has interacted with, and $S_{v_i}^{-}$ is the set of all items that the user $v_i$ has not interacted with. If $v_i \in \mathcal{I}$, then $S_{v_i}^{+}$ is the set of all users who have interacted with the item $v_i$, and $S_{v_i}^{-}$ is the set of all users who have not interacted with the item $v_i$. Next, we define the DCL_N loss:
$\mathcal{L}_{DCL\_N} = - \frac{1}{|\mathcal{B}|} \sum_{v_i \in \mathcal{B}} \log \left( \frac{\sum_{p \in S_{v_i}^{+}} \exp(z_{v_i}^{T} z_p / \tau)}{\sum_{q \in S_{v_i}^{-}} \exp(z_{v_i}^{T} z_q / \tau)} \right) \quad (10)$
where $z_*$ is the $L_2$-normalized embedding of $e_*$ (namely, $z_* = \frac{e_*}{\|e_*\|_2}$), $e_*$ corresponds to the final embedding of node $*$, $*$ refers to $\{v_i, p, q\}$, and $\tau$ is the temperature coefficient. The design of this loss function follows [40].
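The following dense PyTorch sketch illustrates Formula (10) for a mini-batch of user nodes. In practice the full interaction matrix would not be materialized for large item sets, so this is a small-scale illustration under our own naming assumptions; the computation for item nodes is symmetric.

```python
import torch

def dcl_n_loss(z_batch, z_all, R_batch, tau):
    """DCL_N loss of Formula (10) for a mini-batch of user nodes (dense illustration).

    z_batch: [B, d] L2-normalized final embeddings of the batch users
    z_all:   [M, d] L2-normalized final embeddings of all items
    R_batch: [B, M] binary (float) rows of the interaction matrix for the batch users
    """
    sim = torch.exp(z_batch @ z_all.t() / tau)   # exp(z_{v_i}^T z_* / tau)
    pos = (sim * R_batch).sum(dim=-1)            # sum over the positive set S_{v_i}^+
    neg = (sim * (1.0 - R_batch)).sum(dim=-1)    # sum over the negative set S_{v_i}^-
    return -torch.log(pos / neg).mean()
```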
It is worth noting that our method is quite different from the existing graph contrastive learning models. They require some augmentation techniques to generate contrastive views and select positive and negative examples from these different views. The specific process is shown in Figure 1, which we already introduced in detail in Section 1, so we will not reiterate it here. For the objective function DCL_N loss, the selection of positive and negative examples is based on user–item interactions on the graph. Users and items that have interactions with each other are considered positive pairs, while users and items that do not have any interactions are considered negative pairs.
We conducted an initial experiment. We used the LightGCN model as the encoder, adopted DCL_N loss as the objective function, and refer to the resulting model as LDCL_N. The overall framework of LDCL_N is shown in Figure 4, and the experimental results are shown in Table 1. They indicate that LDCL_N achieves extremely high training efficiency and yields decent results. LDCL_N outperforms LightGCN on some recommendation metrics, but there is still a significant gap compared to XSimGCL.
Although we tried various ways to improve this scheme, the results were not satisfactory. This may be because the design of DCL_N loss is somewhat coarse. We analyze this issue based on Formula (10). For a node, the sum of the scores of all its positive pairs is placed in the numerator of Formula (10). As a result, the contribution of each individual positive pair cannot be fully exploited. Inspired by BPR loss, we therefore decided to select positive and negative examples based on each user–item interaction.

4.3. DCL Loss

We constructed our dataset by selecting positive and negative examples through the edges of the user–item bipartite graph, ultimately yielding the DCL loss function. When selecting positive examples, any interaction between a user and an item is deemed a positive pair. When choosing negative examples, we adopt a dual perspective, considering both users and items: for a user, we randomly select $n$ items from those they have not interacted with to serve as negative examples; similarly, for an item, we randomly pick $n$ users who have not interacted with it. More formally, let the dataset be $\mathcal{O} = \{((u,i), S_\epsilon^{u}, S_\epsilon^{i}) \mid \epsilon \in \mathcal{E}, \epsilon = (u,i)\}$, where $\mathcal{E}$ is the set of edges on the user–item bipartite graph and $\epsilon$ is an edge associated with user $u$ and item $i$. The selection of positive and negative examples is shown in Figure 3b. When selecting positive examples, we consider the user $u$ and item $i$ associated with edge $\epsilon$ to be positive examples of each other. When selecting negative examples, we employ random sampling. Define $S_u$ as the set of all items that user $u$ has not interacted with, and $S_i$ as the set of all users who have not interacted with item $i$. For the user $u$ associated with edge $\epsilon$, we randomly select $n$ items from $S_u$ to form the negative example set $S_\epsilon^{u}$. For the item $i$ associated with edge $\epsilon$, we randomly select $n$ users from $S_i$ to form the negative example set $S_\epsilon^{i}$. Here, $n$ is a hyperparameter that needs to be tuned. It is worth noting that $S_\epsilon^{u}$ is related not only to the user $u$ but also to the edge $\epsilon$ it belongs to; that is, the negative example set of a given user $u$ is not fixed, and the same holds for each item $i$.
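A simple per-edge sampling sketch of the procedure just described, under the assumption that interaction histories are stored as Python sets; it draws n non-interacted items for the user and n non-interacting users for the item. The data structures and names are illustrative, not the released code.

```python
import random

def sample_edge_negatives(u, i, user_items, item_users, num_users, num_items, n):
    """For one edge (u, i), draw the negative sets S_eps^u and S_eps^i by rejection sampling.

    user_items: dict mapping each user to the set of items they have interacted with
    item_users: dict mapping each item to the set of users who have interacted with it
    """
    neg_items, neg_users = [], []
    while len(neg_items) < n:
        j = random.randrange(num_items)
        if j not in user_items[u]:
            neg_items.append(j)   # j is in S_u: u has not interacted with j
    while len(neg_users) < n:
        v = random.randrange(num_users)
        if v not in item_users[i]:
            neg_users.append(v)   # v is in S_i: v has not interacted with i
    return neg_items, neg_users
```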
We first define DCL_U loss from the perspective of users:
$\mathcal{L}_{DCL\_U} = - \frac{1}{|\mathcal{B}|} \sum_{((u,i), S_\epsilon^{u}, S_\epsilon^{i}) \in \mathcal{B}} \log \frac{\exp(e_u^{T} e_i / \tau)}{\exp(e_u^{T} e_i / \tau) + \sum_{j \in S_\epsilon^{u}} \exp(e_u^{T} e_j / \tau)} \quad (11)$
Then, we define DCL_I loss from the perspective of items:
$\mathcal{L}_{DCL\_I} = - \frac{1}{|\mathcal{B}|} \sum_{((u,i), S_\epsilon^{u}, S_\epsilon^{i}) \in \mathcal{B}} \log \frac{\exp(e_i^{T} e_u / \tau)}{\exp(e_i^{T} e_u / \tau) + \sum_{v \in S_\epsilon^{i}} \exp(e_i^{T} e_v / \tau)} \quad (12)$
Finally, DCL loss is defined as the sum of $\mathcal{L}_{DCL\_U}$ and $\mathcal{L}_{DCL\_I}$:
$\mathcal{L}_{DCL} = \mathcal{L}_{DCL\_U} + \mathcal{L}_{DCL\_I} \quad (13)$
where $\mathcal{B}$ is a mini-batch, $e_*$ represents the final embedding of node $*$ on the graph, $*$ refers to $\{u, i, v, j\}$, and $\tau$ is the temperature coefficient.
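Putting the pieces together, the sketch below computes DCL loss for a mini-batch of interactions whose negatives have been drawn as above; the tensor names and shapes are our own assumptions rather than the released code.

```python
import torch

def dcl_loss(e_u, e_i, e_neg_items, e_neg_users, tau):
    """DCL loss of Equations (11)-(13) for a mini-batch of user-item interactions.

    e_u:         [B, d]    final embeddings of the users in the batch edges
    e_i:         [B, d]    final embeddings of the corresponding interacted items
    e_neg_items: [B, n, d] embeddings of the n sampled non-interacted items per edge
    e_neg_users: [B, n, d] embeddings of the n sampled non-interacting users per edge
    Note: no L2 normalization is applied, and the positive term appears in the denominator.
    """
    pos = torch.exp((e_u * e_i).sum(dim=-1) / tau)  # exp(e_u^T e_i / tau)
    neg_u = torch.exp(torch.einsum('bd,bnd->bn', e_u, e_neg_items) / tau).sum(dim=-1)
    neg_i = torch.exp(torch.einsum('bd,bnd->bn', e_i, e_neg_users) / tau).sum(dim=-1)
    loss_u = -torch.log(pos / (pos + neg_u)).mean()  # Equation (11)
    loss_i = -torch.log(pos / (pos + neg_i)).mean()  # Equation (12)
    return loss_u + loss_i                           # Equation (13)
```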
As can be seen from the definition of DCL loss, $L_2$ normalization has been removed. This is because users' preference scores for items are computed directly through dot products, as shown in Equation (2). Additionally, the positive pair term is included among the negative terms in the denominator in order to stabilize the value of DCL loss, as reflected in the denominators of Equations (11) and (12).
We were inspired by BPR loss to construct a new dataset and improve DCL_N loss according to the edges on the graph. Our primary aim within DCL loss was to optimize the exploitation of every positive pair, ensuring the comprehensive utilization of each user–item interaction. Consequently, we eliminated the procedure of summing the scores of positive pairs. A clear differentiation emerges between DCL_N loss and DCL loss, as illustrated in Equations (10) and (11). The former incorporates the summation of positive pair scores, which inadequately leverages positive data. Conversely, DCL loss omits this summation operation, instead focusing on augmenting the disparity between each positive pair and numerous negative pairs in every computational step. This adjustment rectifies the crude design element inherent in DCL_N loss. Furthermore, it ensures that the model harnesses both negative and positive data effectively.
DCL loss can be used as an alternative to BPR loss as the objective function for a model. For example, we utilized the structural frameworks of BPRMF and LightGCN as encoders, integrating DCL loss as the objective function to create new models named DCLMF and LDCL. The structures of these models are depicted in Figure 5. Moreover, DCL loss does not conflict with the current graph contrastive learning method, and can even be combined with the current graph contrastive learning models. They just apply contrastive learning from two different perspectives. For example, if we use XSimGCL as the encoder, then the joint loss function can be defined as
$\mathcal{L} = \mathcal{L}_{DCL} + \lambda \mathcal{L}_{CL} \quad (14)$
where $\mathcal{L}_{CL}$ refers to Formula (7) in Section 3.2. We refer to the new model as XDCL.
In the subsequent experiments, XDCL demonstrates superior performance. The overall structure of XDCL is depicted in Figure 6, consisting of two main components: one dedicated to the recommendation task, which employs DCL loss as its objective function, and another designed for the contrastive task, which utilizes InfoNCE loss. At its core, DCL loss is fundamentally akin to InfoNCE loss, yet their roles diverge. DCL loss serves as the primary loss function, derived from the original interaction data, with the purpose of amplifying the dot-product scores of interacted user–item pairs while suppressing those of non-interacted pairs. Conversely, InfoNCE loss operates as an auxiliary loss, requiring augmentation techniques to produce varied views of users (or items), thereby enhancing consistency within the same user's (or item's) different views while diminishing consistency across different users' views.
In DCL loss, two factors influence its performance. One is the size of $S_\epsilon^{u}$ ($S_\epsilon^{i}$), i.e., the number of negative instances $n$. This parameter determines how many negative pairs DCL loss differentiates each positive pair from during calculation. The other factor is the temperature coefficient $\tau$, which plays a pivotal role in modulating the scale of similarity measurement between samples.

5. Experiment

In this section, we will demonstrate the superiority of DCL loss through a large number of experiments.

5.1. Datasets

To evaluate the performance of the proposed DCL loss, we used four public datasets to conduct our experiments: Yelp (https://www.yelp.com/dataset (accessed on 12 January 2024)), Amazon Books [41], Gowalla [42], and Alibaba iFashion [43]. These datasets differ in terms of domains, magnitude, and density. They are all real-world datasets. Specifically, for the Yelp and Amazon Books datasets, we excluded users and items that had fewer than 15 interactions to uphold the data’s integrity. The statistics of the datasets are summarized in Table 2. For each dataset, we divided the data into a training set, a validation set, and a test set in an 8:1:1 ratio. The specific partitioning method is the same as that described in the NCL paper [34].

5.2. Baseline

We choose the following baseline models for performance comparison.
-
BPRMF [14] proposes pair-wise BPR loss for personalized ranking.
-
FISM [44] constitutes an item-oriented collaborative filtering model, wherein it consolidates the representations of a user’s historical interactions to embody their interests.
-
NGCF [8] preprocesses data into a bipartite graph structure and applies graph neural networks to collaborative filtering, thus capturing high-order information.
-
MultiGCCF [45] extends information propagation beyond the user–item bipartite graph, encompassing higher-order correlations among both users and items.
-
DGCF [24] decomposes user interests and uses graph neural networks on the subgraphs to obtain user sub-embeddings, which are concatenated to obtain the final user embedding.
-
LightGCN [10] removes feature transformation and nonlinear activation in graph neural networks, simplifying the graph neural networks while improving recommendation performance.
-
SGL [13] introduces contrastive learning into graph neural networks through graph augmentation techniques, further improving recommendation performance. We use SGL-ED as the comparative scheme.
-
NCL [34] employs structural neighbors and semantic neighbors, which are obtained by using a clustering algorithm for contrastive learning.
-
SimGCL [35] uses the noise perturbation technique to generate different views for contrastive learning, which not only improves recommendation performance but also increases training efficiency.
-
XSimGCL [12] adopts the noise perturbation technique for contrastive learning between cross-layers, further improving training efficiency.

5.3. Evaluation Metrics

We chose Recall@N and NDCG@N as evaluation metrics, where N was set to 10, 20, and 50. This is consistent with the NCL paper [34].

5.4. Implementation Details

To ensure a fair comparison, the implementation details were kept consistent with those in the NCL paper. We used the unified open-source framework for recommendation systems RecBole (https://github.com/RUCAIBox/RecBole (accessed on 25 January 2024)) [46] to implement the models that were not included in the NCL paper. The Adam optimizer was chosen as the optimizer for all models. All the parameters were initialized using the default Xavier distribution. We set the batch size to 4096 and the embedding size to 64. The patience of early stopping was set to 10 epochs to prevent overfitting. In DCL loss, only the number n of negative examples and the temperature coefficient τ needed to be adjusted. We tuned the hyperparameter n in [16, 256] and τ in [0.01, 5]. When different models used DCL loss as the objective function, we tuned all hyperparameters to achieve the best performance. In order to thoroughly substantiate the reliability of experimental outcomes, each model was trained on every dataset for five iterations, with the final Recall@N and NDCG@N metrics being calculated as the average of these runs.

5.5. Experiment

5.5.1. Overall Performance

The experimental results are shown in Table 3. Since the results for SimGCL and XSimGCL were not available in the NCL paper, we reproduced them using RecBole [46]. The results of the other models were taken from the NCL paper [34]. As shown in Table 3, NGCF, DGCF, and LightGCN are all models that directly use graph neural networks for recommendation. Among these models, LightGCN demonstrates the best overall recommendation performance. SGL, NCL, SimGCL, and XSimGCL are all graph contrastive learning models; they differ in the methods used to generate contrastive views. SGL is the earliest graph contrastive learning model, but its results are not satisfactory. Among the graph contrastive learning models, XSimGCL achieves the best overall performance.
As shown in Table 3, XDCL (its structure is illustrated in Figure 6) achieves the best recommendation performance on all datasets. Additionally, the last column in Table 3 displays the percentage improvement in XDCL’s performance compared to the best existing models. We can observe that the maximum improvement reaches up to +10.91%. This is enough to illustrate the superiority of DCL loss.

5.5.2. The Recommendation Performance of Other Models Using DCL Loss as the Objective Function

To further illustrate the superiority of DCL loss, we compared the performance of BPRMF and DCLMF, LightGCN and LDCL, and XSimGCL and XDCL. The structures of these models are shown in Figure 5. The encoder of BPRMF is straightforward, as it has no network structure: it directly trains the initialized embedding matrix $E^{(0)}$. LightGCN has the best overall recommendation performance among graph neural network models, and XSimGCL has the best overall recommendation performance among graph contrastive learning models. Therefore, we utilized these three baseline models as encoders to demonstrate the superiority of DCL loss. The experimental results are given in Table 4.
From Table 4, we can observe that when a model uses DCL loss as the objective function, its recommendation performance improves significantly. The models between the two vertical lines share the same structural framework but differ in their objective functions. We calculated the percentage improvement of the models using DCL loss over their counterparts; the maximum improvement reaches 64.88%. The main objective function of BPRMF, LightGCN, and XSimGCL is BPR loss, and there are progressive relationships among them: LightGCN introduces a graph neural network on top of BPRMF, and XSimGCL integrates contrastive learning into LightGCN. Accordingly, their recommendation performance shows an increasing trend. DCLMF, LDCL, and XDCL have analogous relationships, and their recommendation performance shows the same increasing trend. Thus, DCL loss is a good alternative to BPR loss.
The hyperparameter configurations for DCLMF, LDCL, and XDCL are presented in Table 5. Some hyperparameters are common to all models, while others are unique to certain models. For hyperparameters not applicable to a specific model, “-” is entered in the table.

5.5.3. Efficiency Comparison

We set the patience of early stopping to 10 during the training of each model. Therefore, we could record the number of training epochs and the total training time required for a model to reach its best recommendation performance on the validation set. We compare training efficiency from the following perspectives:
  • The training time required for one epoch.
  • The number of training epochs required for a model to achieve the best performance.
  • The total training time required for a model to achieve the best performance.
When counting the training time, we did not consider the sampling time and the evaluation time. Our experiments were run on an Intel(R) Xeon(R) Gold 6230R CPU @ 2.10 GHz machine with an NVIDIA GeForce RTX 3090 GPU.
We analyzed the training efficiency based on Figure 7. From part (a), it can be observed that when a model utilizes DCL loss, the training time required for one epoch slightly increases. However, from part (b), it is evident that the total number of required training epochs significantly decreases in most cases. As a result, the overall training time is significantly reduced, as indicated in part (c). This means that the training efficiency is greatly improved in most cases. The improvement in training efficiency is related to the design of DCL loss. When a model utilizes DCL loss, it can not only leverage a large number of negative examples, but also accelerate the rate of performance improvement by adjusting the temperature coefficient τ . As a result, the model requires significantly fewer epochs for training, reducing the overall training time.
Both DCL loss and BPR loss were constructed based on the edges of the user–item bipartite graph; hence, their time complexities are correlated with the number of edges in the graph. Assuming there are $S$ edges in the graph, the dimensionality of node embeddings is $d$, and DCL loss selects $n$ negative examples, the time complexity for computing DCL loss is $O(n \cdot S \cdot d)$, whereas the time complexity for BPR loss is $O(S \cdot d)$. From this analysis, it is evident that the time complexity of DCL loss is $n$ times that of BPR loss. However, in practical training scenarios, the computation time for DCL loss does not typically escalate to exactly $n$ times that of BPR loss. This is because the dot products for the $n$ negative pairs can be computed in parallel, significantly reducing the actual computation time for DCL loss. As clearly illustrated in Figure 7a, the time required for DCLMF to complete one epoch of training is merely two to three times that of BPRMF, far from being $n$ times longer.

5.5.4. Ablation Experiment

From Formula (13), we can see that DCL loss was obtained by summing DCL_U loss and DCL_I loss. To prevent different models from influencing the experiments, we chose the simplest encoder, $E = E^{(0)}$, for conducting the experiments. Since the experimental results demonstrate similar behavior across all datasets, we present the results on the Gowalla and Amazon Books datasets. The experimental results are shown in Figure 8, from which we can observe that DCL loss achieves the best performance. This proves the effectiveness of selecting positive and negative examples from both the user and item perspectives simultaneously. Additionally, we can observe that DCL_I loss is trainable, but it performs the worst. This is because DCL_I loss only selects positive and negative examples from the perspective of the item, which fails to obtain good user representations and consequently leads to poorer recommendation results.

5.5.5. The Impact of the Number n of Negative Examples

Similar to the ablation experiments, in order to mitigate the impact of different models on the experimental results, we investigate the impact of the number n of negative examples using the DCLMF model. In the experiment, we not only measured the performance but also tracked the number of epochs required for the model to achieve its best recommendation performance. We present the experimental results on the Gowalla and Amazon Books datasets. From Figure 9, it can be observed that n has a significant impact on recommendation performance: as n increases, the recommendation performance continues to improve. However, n has little impact on training efficiency; for different choices of n, the number of epochs required for training does not vary significantly. The reasonable range for n is [16, 256], and the model demonstrates superior recommendation performance when n is set to 128 or 256.

5.5.6. The Impact of Temperature Coefficient τ

We used the DCLMF model to study the impact of the temperature coefficient τ . Similarly, we present the experimental results on the Gowalla and Amazon Books datasets. The experimental results are shown in Figure 10. As τ increases, the recommendation performance exhibits a trend of initially increasing and then decreasing. Moreover, τ has a significant impact on training efficiency. When τ is small, the model only needs to be trained for a few epochs to achieve the best recommendation performance. However, when τ is large, the model needs to be trained for several tens of epochs to reach the optimal recommendation performance. The appropriate range for the value of τ is [0.01, 5]. The model exhibits optimal recommendation performance when τ falls within the interval [0.05, 2].

6. Conclusions

We propose that graph collaborative filtering models can be trained directly using contrastive learning. We select positive and negative examples from the perspective of user–item interactions and construct a novel objective function called DCL loss. DCL loss is an alternative to BPR loss that allows the model to fully utilize negative examples. When existing models use DCL loss as the objective function, both their recommendation performance and their training efficiency improve significantly. Based on our research, it is apparent that contrastive learning holds enormous potential in the field of graph collaborative filtering, so further in-depth research and analysis are warranted to fully explore its capabilities.

Author Contributions

Conceptualization, J.D. and Y.Z.; methodology, J.D., Y.Z. and S.H.; software, D.F.; validation, Y.Z. and H.Z.; formal analysis, Y.Z. and Z.X.; writing—original draft preparation, J.D.; writing—review and editing, J.D., Y.Z. and S.H.; supervision, Y.Z.; project administration, Y.Z.; funding acquisition, Y.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This work was partially supported by the National Natural Science Foundation of China (62002255), the Fundamental Research Programs of Shanxi Province (Grant No. 202203021222120), the Fundamental Research Programs of Shanxi Province (Grant No. 20210302124168), the Key Research and Development Projects in Shanxi Province (Grant No. 202102020101004), and the Fundamental Research Programs of Shanxi Province (Grant No. 20210302122305).

Data Availability Statement

Our code is accessible at https://github.com/D-JI-Feng/papercode (accessed on 23 May 2024). We provide the corresponding code and configuration files. The relevant datasets will be automatically downloaded and split via the third-party library RecBole. Running the associated code can validate our approach.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Wu, S.; Sun, F.; Zhang, W.; Xie, X.; Cui, B. Graph Neural Networks in Recommender Systems: A Survey. ACM Comput. Surv. 2022, 55, 1–37. [Google Scholar] [CrossRef]
  2. Covington, P.; Adams, J.; Sargin, E. Deep Neural Networks for YouTube Recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; pp. 191–198. [Google Scholar] [CrossRef]
  3. Ying, R.; He, R.; Chen, K.; Eksombatchai, P.; Hamilton, W.L.; Leskovec, J. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 974–983. [Google Scholar] [CrossRef]
  4. Yuan, F.; He, X.; Karatzoglou, A.; Zhang, L. Parameter-Efficient Transfer from Sequential Behaviors for User Modeling and Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, 25–30 July 2020; pp. 1469–1478. [Google Scholar] [CrossRef]
  5. Ebesu, T.; Shen, B.; Fang, Y. Collaborative Memory Network for Recommendation Systems. In Proceedings of the 41st International ACM SIGIR Conference on Research & Development in Information Retrieval, Ann Arbor, MI, USA, 8–12 July 2018; pp. 515–524. [Google Scholar] [CrossRef]
  6. He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.S. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 173–182. [Google Scholar] [CrossRef]
  7. Liang, D.; Krishnan, R.G.; Hoffman, M.D.; Jebara, T. Variational Autoencoders for Collaborative Filtering. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 689–698. [Google Scholar] [CrossRef]
  8. Wang, X.; He, X.; Wang, M.; Feng, F.; Chua, T.S. Neural Graph Collaborative Filtering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 165–174. [Google Scholar] [CrossRef]
  9. Wu, L.; He, X.; Wang, X.; Zhang, K.; Wang, M. A Survey on Accuracy-Oriented Neural Recommendation: From Collaborative Filtering to Information-Rich Recommendation. IEEE Trans. Knowl. Data Eng. 2023, 35, 4425–4445. [Google Scholar] [CrossRef]
  10. He, X.; Deng, K.; Wang, X.; Li, Y.; Zhang, Y.; Wang, M. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, 25–30 July 2020; pp. 639–648. [Google Scholar] [CrossRef]
  11. Zhang, J. Graph Neural Networks for Small Graph and Giant Network Representation Learning: An Overview. arXiv 2019, arXiv:1908.00187. [Google Scholar]
  12. Yu, J.; Xia, X.; Chen, T.; Cui, L.; Hung, N.Q.V.; Yin, H. XSimGCL: Towards Extremely Simple Graph Contrastive Learning for Recommendation. IEEE Trans. Knowl. Data Eng. 2023, 36, 913–926. [Google Scholar] [CrossRef]
  13. Wu, J.; Wang, X.; Feng, F.; He, X.; Chen, L.; Lian, J.; Xie, X. Self-supervised Graph Learning for Recommendation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, 11–15 July 2021; pp. 726–735. [Google Scholar] [CrossRef]
  14. Rendle, S.; Freudenthaler, C.; Gantner, Z.; Schmidt-Thieme, L. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, 18–21 June 2009; pp. 452–461. [Google Scholar]
  15. van den Oord, A.; Li, Y.; Vinyals, O. Representation Learning with Contrastive Predictive Coding. arXiv 2019, arXiv:1807.03748. [Google Scholar]
  16. Gao, C.; Wang, X.; He, X.; Li, Y. Graph Neural Networks for Recommender System. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, Virtual Event, 21–25 February 2022; pp. 1623–1625. [Google Scholar] [CrossRef]
  17. Cai, X.; Huang, C.; Xia, L.; Ren, X. LightGCL: Simple Yet Effective Graph Contrastive Learning for Recommendation. arXiv 2023, arXiv:2302.08191. [Google Scholar]
  18. Koren, Y.; Bell, R.; Volinsky, C. Matrix Factorization Techniques for Recommender Systems. Computer 2009, 42, 30–37. [Google Scholar] [CrossRef]
  19. He, X.; Du, X.; Wang, X.; Tian, F.; Tang, J.; Chua, T.S. Outer product-based neural collaborative filtering. In Proceedings of the 27th International Joint Conference on Artificial Intelligence, Stockholm, Sweden, 13–19 July 2018; pp. 2227–2233. [Google Scholar]
  20. Xue, H.J.; Dai, X.Y.; Zhang, J.; Huang, S.; Chen, J. Deep matrix factorization models for recommender systems. In Proceedings of the 26th International Joint Conference on Artificial Intelligence, Melbourne, Australia, 19–25 August 2017; pp. 3203–3209. [Google Scholar]
  21. Chen, C.; Zhang, M.; Zhang, Y.; Liu, Y.; Ma, S. Efficient Neural Matrix Factorization without Sampling for Recommendation. ACM Trans. Inf. Syst. 2020, 38, 1–28. [Google Scholar] [CrossRef]
  22. Su, C.; Chen, M.; Xie, X. Graph Convolutional Matrix Completion via Relation Reconstruction. In Proceedings of the 2021 10th International Conference on Software and Computer Applications, Kuala Lumpur, Malaysia, 23–26 February 2021; pp. 51–56. [Google Scholar] [CrossRef]
  23. Sun, J.; Zhang, Y.; Ma, C.; Coates, M.; Guo, H.; Tang, R.; He, X. Multi-graph Convolution Collaborative Filtering. In Proceedings of the 2019 IEEE International Conference on Data Mining (ICDM), Beijing, China, 8–11 November 2019; pp. 1306–1311. [Google Scholar] [CrossRef]
  24. Wang, X.; Jin, H.; Zhang, A.; He, X.; Xu, T.; Chua, T.S. Disentangled Graph Collaborative Filtering. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, 25–30 July 2020; pp. 1001–1010. [Google Scholar] [CrossRef]
  25. Ji, S.; Feng, Y.; Ji, R.; Zhao, X.; Tang, W.; Gao, Y. Dual Channel Hypergraph Collaborative Filtering. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; pp. 2020–2029. [Google Scholar] [CrossRef]
  26. Sun, J.; Cheng, Z.; Zuberi, S.; Perez, F.; Volkovs, M. HGCF: Hyperbolic Graph Convolution Networks for Collaborative Filtering. In Proceedings of the Web Conference, Ljubljana, Slovenia, 19–23 April 2021; pp. 593–601. [Google Scholar] [CrossRef]
  27. Yang, M.; Li, Z.; Zhou, M.; Liu, J.; King, I. HICF: Hyperbolic Informative Collaborative Filtering. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 2212–2221. [Google Scholar] [CrossRef]
  28. Mao, K.; Zhu, J.; Xiao, X.; Lu, B.; Wang, Z.; He, X. UltraGCN: Ultra Simplification of Graph Convolutional Networks for Recommendation. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual Event, 1–5 November 2021; pp. 1253–1262. [Google Scholar] [CrossRef]
  29. Wang, C.; Yu, Y.; Ma, W.; Zhang, M.; Chen, C.; Liu, Y.; Ma, S. Towards Representation Alignment and Uniformity in Collaborative Filtering. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 1816–1825. [Google Scholar] [CrossRef]
  30. He, K.; Fan, H.; Wu, Y.; Xie, S.; Girshick, R. Momentum Contrast for Unsupervised Visual Representation Learning. In Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Seattle, WA, USA, 13–19 June 2020; pp. 9726–9735. [Google Scholar] [CrossRef]
  31. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, Virtual Event, 13–18 July 2020; Available online: https://dl.acm.org/doi/proceedings/10.5555/3524938 (accessed on 12 January 2024).
  32. Yan, Y.; Li, R.; Wang, S.; Zhang, F.; Wu, W.; Xu, W. ConSERT: A Contrastive Framework for Self-Supervised Sentence Representation Transfer. arXiv 2021, arXiv:2105.11741. [Google Scholar]
  33. Gao, T.; Yao, X.; Chen, D. SimCSE: Simple Contrastive Learning of Sentence Embeddings. arXiv 2022, arXiv:2104.08821. [Google Scholar]
  34. Lin, Z.; Tian, C.; Hou, Y.; Zhao, W.X. Improving Graph Collaborative Filtering with Neighborhood-enriched Contrastive Learning. In Proceedings of the ACM Web Conference 2022, Lyon, France, 25–29 April 2022; pp. 2320–2329. [Google Scholar] [CrossRef]
  35. Yu, J.; Yin, H.; Xia, X.; Chen, T.; Cui, L.; Nguyen, Q.V.H. Are Graph Augmentations Necessary? Simple Graph Contrastive Learning for Recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 1294–1303. [Google Scholar] [CrossRef]
  36. Xia, L.; Huang, C.; Xu, Y.; Zhao, J.; Yin, D.; Huang, J. Hypergraph Contrastive Collaborative Filtering. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 70–79. [Google Scholar] [CrossRef]
  37. Ren, X.; Xia, L.; Zhao, J.; Yin, D.; Huang, C. Disentangled Contrastive Collaborative Filtering. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, Taiwan, 23–27 July 2023; pp. 1137–1146. [Google Scholar] [CrossRef]
  38. Huang, C.; Xia, L.; Wang, X.; He, X.; Yin, D. Self-Supervised Learning for Recommendation. In Proceedings of the 31st ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 5136–5139. [Google Scholar] [CrossRef]
  39. Xia, L.; Huang, C.; Shi, J.; Xu, Y. Graph-less Collaborative Filtering. In Proceedings of the ACM Web Conference 2023, Austin, TX, USA, 30 April–4 May 2023; pp. 17–27. [Google Scholar] [CrossRef]
  40. Zhong, H.; Wu, J.; Chen, C.; Huang, J.; Deng, M.; Nie, L.; Lin, Z.; Hua, X.S. Graph Contrastive Clustering. In Proceedings of the 2021 IEEE/CVF International Conference on Computer Vision (ICCV), Montreal, QC, Canada, 10–17 October 2021; pp. 9204–9213. [Google Scholar] [CrossRef]
  41. McAuley, J.; Targett, C.; Shi, Q.; van den Hengel, A. Image-Based Recommendations on Styles and Substitutes. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, 9–13 August 2015; pp. 43–52. [Google Scholar] [CrossRef]
  42. Cho, E.; Myers, S.A.; Leskovec, J. Friendship and mobility: User movement in location-based social networks. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Diego, CA, USA, 21–24 August 2011; pp. 1082–1090. [Google Scholar] [CrossRef]
  43. Chen, W.; Huang, P.; Xu, J.; Guo, X.; Guo, C.; Sun, F.; Li, C.; Pfadler, A.; Zhao, H.; Zhao, B. POG: Personalized Outfit Generation for Fashion Recommendation at Alibaba iFashion. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2662–2670. [Google Scholar] [CrossRef]
  44. Kabbur, S.; Ning, X.; Karypis, G. FISM: Factored item similarity models for top-N recommender systems. In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, 11–14 August 2013; KDD ’13. pp. 659–667. [Google Scholar] [CrossRef]
  45. Zhao, W.X.; Chen, J.; Wang, P.; Gu, Q.; Wen, J.R. Revisiting Alternative Experimental Settings for Evaluating Top-N Item Recommendation Algorithms. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual Event, 19–23 October 2020; pp. 2329–2332. [Google Scholar] [CrossRef]
  46. Zhao, W.X.; Mu, S.; Hou, Y.; Lin, Z.; Chen, Y.; Pan, X.; Li, K.; Lu, Y.; Wang, H.; Tian, C.; et al. RecBole: Towards a Unified, Comprehensive and Efficient Framework for Recommendation Algorithms. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual Event, 1–5 November 2021; pp. 4653–4664. [Google Scholar] [CrossRef]
Figure 1. The overall framework of current graph contrastive learning models. Part (a) is the user–item bipartite graph. Part (b) is the process of selecting positive and negative examples, where the yellow item is positive and the green item is negative. The red lines represent edges that originally exist in the user–item bipartite graph, connecting a user and an item that are positive examples of each other. Part (c) is the calculation process of the BPR loss. Part (d) shows the two views of the users. Part (e) shows the two views of the items. Part (f) is the calculation method of the InfoNCE loss, where yellow and brown form positive pairs, and yellow and green form negative pairs.
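For readers unfamiliar with the two objectives named in Figure 1, the sketch below shows one common way to implement them; it is not the authors' code, and the tensor shapes and temperature value are illustrative assumptions. It pairs a BPR loss over (user, positive item, negative item) triples with an InfoNCE loss between two augmented views of the same nodes, using in-batch negatives, which matches the "BPR as the main task, contrastive learning as the auxiliary task" pattern the figure depicts.

```python
# Minimal sketch (not the authors' code) of the BPR and InfoNCE objectives in Figure 1.
import torch
import torch.nn.functional as F

def bpr_loss(user_emb, pos_item_emb, neg_item_emb):
    # BPR: the interacted (positive) item should score higher than the sampled
    # (negative) item for the same user.
    pos_scores = (user_emb * pos_item_emb).sum(dim=-1)
    neg_scores = (user_emb * neg_item_emb).sum(dim=-1)
    return -F.logsigmoid(pos_scores - neg_scores).mean()

def info_nce_loss(view1, view2, temperature=0.2):
    # InfoNCE: the same node in two augmented views forms a positive pair;
    # every other node in the batch serves as a negative.
    z1 = F.normalize(view1, dim=-1)
    z2 = F.normalize(view2, dim=-1)
    logits = z1 @ z2.T / temperature                      # [B, B] similarity matrix
    labels = torch.arange(z1.size(0), device=z1.device)   # diagonal = positives
    return F.cross_entropy(logits, labels)
```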
Figure 2. The overall framework of the initially proposed scheme. Part (a) is the user–item bipartite graph. Part (b) is the selection of positive and negative examples from the user's point of view, where yellow items are positive and green items are negative. Part (c) is the selection of positive and negative examples from the item's point of view, where brown users are positive and green users are negative. The red lines represent edges that originally exist in the user–item bipartite graph, connecting a user and an item that are positive examples of each other. Part (d) is the calculation method of the DCL_N loss.
Figure 3. Part (a) is the user–item bipartite graph. Part (b) is the selection of positive and negative examples for the user and item on one edge, where yellow is the positive example and green is the negative example. The red lines represent edges that originally exist in the user–item bipartite graph, connecting a user and an item that are positive examples of each other. Part (c) is the calculation method of the DCL loss.
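Figure 3 describes the DCL loss only pictorially. The following is a minimal sketch of one plausible instantiation, assuming an InfoNCE-style objective applied directly to user–item pairs: for each observed interaction, the interacted item is the positive example and n sampled non-interacted items are the negatives. The exact DCL formulation in the paper may differ; the function and argument names here are illustrative only.

```python
# Hypothetical sketch of a DCL-style objective computed directly on user-item pairs.
import torch
import torch.nn.functional as F

def dcl_style_loss(user_emb, pos_item_emb, neg_item_emb, temperature=0.05):
    # user_emb, pos_item_emb: [B, d]; neg_item_emb: [B, n, d] (n sampled non-interacted items).
    u = F.normalize(user_emb, dim=-1)
    pos = F.normalize(pos_item_emb, dim=-1)
    neg = F.normalize(neg_item_emb, dim=-1)
    pos_logit = (u * pos).sum(dim=-1, keepdim=True) / temperature   # [B, 1]
    neg_logit = torch.einsum('bd,bnd->bn', u, neg) / temperature    # [B, n]
    logits = torch.cat([pos_logit, neg_logit], dim=1)               # [B, 1 + n]
    # Index 0 is the positive: the user is pulled toward the item they interacted with
    # and pushed away from the sampled items.
    labels = torch.zeros(logits.size(0), dtype=torch.long, device=logits.device)
    return F.cross_entropy(logits, labels)
```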
Figure 4. The overall framework of LDCL_N. Part (a) is the user–item bipartite graph. Part (b) is the selection of positive and negative examples from the user's point of view, where yellow items are positive and green items are negative. Part (c) is the selection of positive and negative examples from the item's point of view, where brown users are positive and green users are negative. The red lines represent edges that originally exist in the user–item bipartite graph, connecting a user and an item that are positive examples of each other. Part (d) is the calculation method of the DCL_N loss.
Figure 5. Part (a1) is the overall framework of BPRMF. Part (a2) is the overall framework of DCLMF. Part (b1) is the overall framework of LightGCN. Part (b2) is the overall framework of LDCL. Part (c1) is the overall framework of XSimGCL. Part (c2) is the overall framework of XDCL.
Figure 6. The overall framework of XDCL. Part (a) is the user–item bipartite graph. Part (b) is the process of selecting positive and negative examples, where the yellow item is positive and the green item is negative. The red lines represent edges that originally exist in the user–item bipartite graph, connecting a user and an item that are positive examples of each other. Part (c) is the calculation process of the DCL loss. Part (d) shows the two views of the users. Part (e) shows the two views of the items. Part (f) is the calculation method of the InfoNCE loss, where yellow and brown form positive pairs, and yellow and green form negative pairs.
Figure 7. A comparison of the training efficiency of different models on various datasets. Part (a) shows the training time each model requires for one epoch. Part (b) shows the total number of training epochs each model requires. Part (c) shows the total time required to train each model.
Figure 8. Ablation experiments on different datasets. Part (a) is the ablation experiment on the Gowalla dataset. Part (b) is the ablation experiment on the Amazon-Books dataset.
Figure 9. The impact of the number n of negative examples.
Figure 10. The impact of the temperature coefficient τ.
Table 1. Performance and training efficiency of different models.

| Dataset | Metric | LightGCN | XSimGCL | LDCL_N |
|---|---|---|---|---|
| Gowalla | Recall@10 | 0.1362 | 0.1504 | 0.124 |
| | NDCG@10 | 0.0876 | 0.1102 | 0.0894 |
| | #Epoch | 585 | 53 | 2 |
| | Time/Epoch | 5.55 s | 7.64 s | 10.24 s |
| | Total Time | 54 min 6.75 s | 6 min 44.91 s | 0 min 20.48 s |
| Yelp | Recall@10 | 0.0730 | 0.0926 | 0.0734 |
| | NDCG@10 | 0.0520 | 0.0681 | 0.0526 |
| | #Epoch | 753 | 49 | 5 |
| | Time/Epoch | 14.55 s | 16.30 s | 12.52 s |
| | Total Time | 182 min 36.14 s | 13 min 18.70 s | 1 min 1.4 s |

Bold font indicates the best results; min denotes minutes and s denotes seconds.
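Tables 1, 3 and 4 report Recall@K and NDCG@K. As a reference, the sketch below shows the standard per-user definitions of these top-K metrics, which is how RecBole-style evaluators typically compute them; it is illustrative rather than the exact evaluation code used for the paper.

```python
# Standard per-user top-K metrics (illustrative sketch).
import math

def recall_at_k(ranked_items, relevant_items, k):
    # Fraction of the user's held-out (relevant) items that appear in the top-K list.
    hits = len(set(ranked_items[:k]) & set(relevant_items))
    return hits / len(relevant_items) if relevant_items else 0.0

def ndcg_at_k(ranked_items, relevant_items, k):
    # DCG over the top-K list divided by the ideal DCG for this user.
    relevant = set(relevant_items)
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(ranked_items[:k]) if item in relevant)
    ideal_hits = min(len(relevant), k)
    idcg = sum(1.0 / math.log2(i + 2) for i in range(ideal_hits))
    return dcg / idcg if idcg > 0 else 0.0
```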
Table 2. Statistics of the datasets.

| Dataset | #Users | #Items | #Interactions | Density |
|---|---|---|---|---|
| Yelp | 45,478 | 30,709 | 1,777,765 | 0.00127 |
| Amazon | 58,145 | 58,052 | 2,517,437 | 0.00075 |
| Gowalla | 29,859 | 40,989 | 1,027,464 | 0.00084 |
| Alibaba | 300,000 | 81,614 | 1,607,813 | 0.00007 |
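The Density column of Table 2 is the ratio of observed interactions to all possible user–item pairs. The snippet below reproduces the reported values from the other three columns.

```python
# Density = #Interactions / (#Users * #Items), reproducing Table 2.
datasets = {
    "Yelp":    (45_478, 30_709, 1_777_765),
    "Amazon":  (58_145, 58_052, 2_517_437),
    "Gowalla": (29_859, 40_989, 1_027_464),
    "Alibaba": (300_000, 81_614, 1_607_813),
}
for name, (n_users, n_items, n_interactions) in datasets.items():
    density = n_interactions / (n_users * n_items)
    print(f"{name}: {density:.5f}")   # e.g. Yelp -> 0.00127
```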
Table 3. Performance comparison of different models.

| Dataset | Metric | BPRMF | FISM | NGCF | MultiGCCF | DGCF | LightGCN | SGL | NCL | SimGCL | XSimGCL | XDCL | Improv. |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Yelp | Recall@10 | 0.0643 | 0.0714 | 0.0630 | 0.0646 | 0.0723 | 0.0730 | 0.0833 | 0.0920 | 0.0906 | 0.0920 | 0.0984 | +6.95% |
| | NDCG@10 | 0.0458 | 0.0510 | 0.0446 | 0.0450 | 0.0514 | 0.0520 | 0.0601 | 0.0678 | 0.0663 | 0.0678 | 0.0752 | +10.91% |
| | Recall@20 | 0.1043 | 0.1119 | 0.1026 | 0.1053 | 0.1135 | 0.1163 | 0.1288 | 0.1377 | 0.1373 | 0.1402 | 0.1443 | +2.92% |
| | NDCG@20 | 0.0580 | 0.0636 | 0.0567 | 0.0575 | 0.0641 | 0.0652 | 0.0739 | 0.0817 | 0.0805 | 0.0825 | 0.0888 | +7.24% |
| | Recall@50 | 0.1862 | 0.1963 | 0.1864 | 0.1882 | 0.1989 | 0.2016 | 0.2140 | 0.2247 | 0.2273 | 0.2287 | 0.2318 | +1.35% |
| | NDCG@50 | 0.0793 | 0.0856 | 0.0784 | 0.0790 | 0.0862 | 0.0875 | 0.0964 | 0.1046 | 0.1041 | 0.1057 | 0.1116 | +5.58% |
| Amazon | Recall@10 | 0.0607 | 0.0721 | 0.0617 | 0.0625 | 0.0737 | 0.0797 | 0.0898 | 0.0933 | 0.1015 | 0.1070 | 0.1167 | +9.06% |
| | NDCG@10 | 0.0430 | 0.0504 | 0.0427 | 0.0433 | 0.0521 | 0.0565 | 0.0645 | 0.0679 | 0.0738 | 0.0792 | 0.0874 | +10.35% |
| | Recall@20 | 0.0956 | 0.1099 | 0.0978 | 0.0991 | 0.1128 | 0.1206 | 0.1331 | 0.1381 | 0.1477 | 0.1534 | 0.1647 | +7.36% |
| | NDCG@20 | 0.0537 | 0.0622 | 0.0537 | 0.0545 | 0.0640 | 0.0689 | 0.0777 | 0.0815 | 0.0880 | 0.0933 | 0.1020 | +9.32% |
| | Recall@50 | 0.1681 | 0.0183 | 0.1699 | 0.1688 | 0.1908 | 0.2012 | 0.2157 | 0.2175 | 0.2309 | 0.2330 | 0.2493 | +6.99% |
| | NDCG@50 | 0.0726 | 0.0815 | 0.0725 | 0.0727 | 0.0843 | 0.0899 | 0.0992 | 0.1024 | 0.1099 | 0.1144 | 0.1243 | +8.65% |
| Gowalla | Recall@10 | 0.1158 | 0.1081 | 0.1192 | 0.1108 | 0.1252 | 0.1362 | 0.1465 | 0.1500 | 0.1512 | 0.1504 | 0.1537 | +1.65% |
| | NDCG@10 | 0.0833 | 0.0755 | 0.0852 | 0.0791 | 0.0902 | 0.0876 | 0.1048 | 0.1082 | 0.1102 | 0.1102 | 0.1119 | +1.54% |
| | Recall@20 | 0.1695 | 0.1620 | 0.1755 | 0.1626 | 0.1829 | 0.1976 | 0.2084 | 0.2133 | 0.2146 | 0.2157 | 0.2200 | +1.99% |
| | NDCG@20 | 0.0988 | 0.0913 | 0.1013 | 0.0940 | 0.1066 | 0.1152 | 0.1225 | 0.1265 | 0.1282 | 0.1289 | 0.1309 | +1.55% |
| | Recall@50 | 0.2756 | 0.2673 | 0.2811 | 0.2631 | 0.2877 | 0.3044 | 0.3197 | 0.3259 | 0.3265 | 0.3266 | 0.3330 | +1.95% |
| | NDCG@50 | 0.1150 | 0.1169 | 0.1270 | 0.1184 | 0.1322 | 0.1414 | 0.1497 | 0.1542 | 0.1557 | 0.1560 | 0.1587 | +1.73% |
| Alibaba | Recall@10 | 0.0303 | 0.0357 | 0.0382 | 0.0401 | 0.0447 | 0.0457 | 0.0461 | 0.0477 | 0.0574 | 0.0575 | 0.0615 | +6.95% |
| | NDCG@10 | 0.0161 | 0.0190 | 0.0198 | 0.0207 | 0.0241 | 0.0248 | 0.0248 | 0.0259 | 0.0312 | 0.0313 | 0.0337 | +7.66% |
| | Recall@20 | 0.0467 | 0.0553 | 0.0615 | 0.0634 | 0.0677 | 0.0692 | 0.0692 | 0.0713 | 0.0849 | 0.0847 | 0.0898 | +5.77% |
| | NDCG@20 | 0.0203 | 0.0239 | 0.0257 | 0.0266 | 0.0299 | 0.0307 | 0.0307 | 0.0319 | 0.0382 | 0.0382 | 0.0409 | +7.06% |
| | Recall@50 | 0.0799 | 0.0943 | 0.1081 | 0.1107 | 0.1120 | 0.1144 | 0.1141 | 0.1165 | 0.1361 | 0.1357 | 0.1413 | +3.82% |
| | NDCG@50 | 0.0269 | 0.0317 | 0.0349 | 0.0360 | 0.0387 | 0.0396 | 0.0396 | 0.0409 | 0.0484 | 0.0484 | 0.0511 | +5.57% |

Bold font indicates the best results.
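The Improv. column of Table 3 is the relative gain of XDCL over the strongest baseline in the same row. The snippet below reproduces the Gowalla Recall@10 entry as an example.

```python
# Reproducing the Improv. column of Table 3: relative gain of XDCL over the best baseline.
def relative_improvement(xdcl_score, baseline_scores):
    best_baseline = max(baseline_scores)
    return (xdcl_score - best_baseline) / best_baseline * 100

# Gowalla, Recall@10: the best baseline is SimGCL at 0.1512, and XDCL reaches 0.1537.
gowalla_recall10_baselines = [0.1158, 0.1081, 0.1192, 0.1108, 0.1252,
                              0.1362, 0.1465, 0.1500, 0.1512, 0.1504]
print(f"+{relative_improvement(0.1537, gowalla_recall10_baselines):.2f}%")  # +1.65%
```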
Table 4. Performance comparison of other models using DCL loss as the objective function.

| Dataset | Metric | BPRMF | DCLMF | LightGCN | LDCL | XSimGCL | XDCL |
|---|---|---|---|---|---|---|---|
| Yelp | Recall@10 | 0.0643 | 0.0880 (+36.85%) | 0.0730 | 0.0955 (+30.82%) | 0.0920 | 0.0984 (+6.95%) |
| | NDCG@10 | 0.0458 | 0.0719 (+11.81%) | 0.0520 | 0.0744 (+43.07%) | 0.0678 | 0.0752 (+10.91%) |
| | Recall@20 | 0.1043 | 0.1241 (+18.98%) | 0.1163 | 0.1418 (+21.92%) | 0.1402 | 0.1443 (+2.92%) |
| | NDCG@20 | 0.0580 | 0.0823 (+41.89%) | 0.0652 | 0.0811 (+24.38%) | 0.0825 | 0.0888 (+7.24%) |
| | Recall@50 | 0.1862 | 0.1957 (+5.10%) | 0.2016 | 0.2298 (+13.98%) | 0.2287 | 0.2318 (+1.35%) |
| | NDCG@50 | 0.0793 | 0.1006 (+26.86%) | 0.0875 | 0.1108 (+26.62%) | 0.1057 | 0.1116 (+5.58%) |
| Amazon Books | Recall@10 | 0.0607 | 0.0972 (+60.13%) | 0.0797 | 0.1108 (+39.02%) | 0.1070 | 0.1167 (+9.06%) |
| | NDCG@10 | 0.0430 | 0.0709 (+64.88%) | 0.0565 | 0.0818 (+44.77%) | 0.0792 | 0.0874 (+10.35%) |
| | Recall@20 | 0.0956 | 0.1387 (+45.08%) | 0.1206 | 0.1581 (+31.09%) | 0.1534 | 0.1647 (+7.36%) |
| | NDCG@20 | 0.0537 | 0.0837 (+55.86%) | 0.0689 | 0.0961 (+39.47%) | 0.0933 | 0.1020 (+9.32%) |
| | Recall@50 | 0.1681 | 0.2144 (+27.54%) | 0.2012 | 0.2438 (+21.17%) | 0.2330 | 0.2493 (+6.99%) |
| | NDCG@50 | 0.0726 | 0.1036 (+42.69%) | 0.0899 | 0.1186 (+31.92%) | 0.1144 | 0.1243 (+8.65%) |
| Gowalla | Recall@10 | 0.1158 | 0.1335 (+15.28%) | 0.1362 | 0.1485 (+9.03%) | 0.1504 | 0.1537 (+2.19%) |
| | NDCG@10 | 0.0833 | 0.0965 (+14.76%) | 0.0876 | 0.1081 (+23.40%) | 0.1102 | 0.1119 (+1.54%) |
| | Recall@20 | 0.1695 | 0.1920 (+13.27%) | 0.1976 | 0.2121 (+7.33%) | 0.2157 | 0.2200 (+1.99%) |
| | NDCG@20 | 0.0988 | 0.1132 (+14.51%) | 0.1152 | 0.1263 (+9.63%) | 0.1289 | 0.1309 (+1.55%) |
| | Recall@50 | 0.2756 | 0.2988 (+8.41%) | 0.3044 | 0.3264 (+7.22%) | 0.3266 | 0.3330 (+1.95%) |
| | NDCG@50 | 0.1150 | 0.1392 (+21.04%) | 0.1414 | 0.1542 (+9.05%) | 0.1560 | 0.1587 (+1.73%) |
| Alibaba iFashion | Recall@10 | 0.0303 | 0.0346 (+14.19%) | 0.0457 | 0.0534 (+16.84%) | 0.0575 | 0.0615 (+6.95%) |
| | NDCG@10 | 0.0161 | 0.0189 (+17.39%) | 0.0248 | 0.0291 (+17.33%) | 0.0313 | 0.0337 (+7.66%) |
| | Recall@20 | 0.0467 | 0.0521 (+11.56%) | 0.0692 | 0.0790 (+14.16%) | 0.0847 | 0.0898 (+6.02%) |
| | NDCG@20 | 0.0203 | 0.0233 (+14.77%) | 0.0307 | 0.0356 (+15.96%) | 0.0382 | 0.0409 (+7.06%) |
| | Recall@50 | 0.0799 | 0.0854 (+6.88%) | 0.1144 | 0.1265 (+10.57%) | 0.1357 | 0.1413 (+4.12%) |
| | NDCG@50 | 0.0269 | 0.0300 (+11.52%) | 0.0396 | 0.0451 (+13.88%) | 0.0484 | 0.0511 (+5.57%) |

Bold font indicates the greatest magnitude of performance enhancement.
Table 5. Hyperparameter configurations for the models.

| Dataset | Model | L | n | τ | τ1 | λ | η | K |
|---|---|---|---|---|---|---|---|---|
| Yelp | DCLMF | - | 256 | 0.2 | - | - | - | - |
| | LDCL | 3 | 256 | 0.05 | - | - | - | - |
| | XDCL | 3 | 256 | 0.05 | 0.15 | 0.2 | 0.005 | 2 |
| Amazon | DCLMF | - | 256 | 0.1 | - | - | - | - |
| | LDCL | 3 | 256 | 0.05 | - | - | - | - |
| | XDCL | 3 | 256 | 0.05 | 0.15 | 0.3 | 0.005 | 2 |
| Gowalla | DCLMF | - | 256 | 0.2 | - | - | - | - |
| | LDCL | 3 | 256 | 0.05 | - | - | - | - |
| | XDCL | 3 | 256 | 0.05 | 0.15 | 0.2 | 0.005 | 2 |
| Alibaba | DCLMF | - | 128 | 1 | - | - | - | - |
| | LDCL | 3 | 128 | 2 | - | - | - | - |
| | XDCL | 3 | 128 | 2 | 0.15 | 0.03 | 0.005 | 2 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
