LoRA-NCL: Neighborhood-Enriched Contrastive Learning with Low-Rank Dimensionality Reduction for Graph Collaborative Filtering

Cao, Tianruo; Chen, Honghui; Hao, Zepeng; Hu, Tao

doi:10.3390/math11163577

Open AccessArticle

LoRA-NCL: Neighborhood-Enriched Contrastive Learning with Low-Rank Dimensionality Reduction for Graph Collaborative Filtering

Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410073, China

^*

Author to whom correspondence should be addressed.

Mathematics 2023, 11(16), 3577; https://doi.org/10.3390/math11163577

Submission received: 12 July 2023 / Revised: 9 August 2023 / Accepted: 17 August 2023 / Published: 18 August 2023

Download Versions Notes

Abstract

:

Graph Collaborative Filtering (GCF) methods have emerged as an effective recommendation approach, capturing users’ preferences over items by modeling user–item interaction graphs. However, these methods suffer from data sparsity in real scenarios, and their performance can be improved using contrastive learning. In this paper, we propose an optimized method, named LoRA-NCL, for GCF based on Neighborhood-enriched Contrastive Learning (NCL) and low-rank dimensionality reduction. We incorporate low-rank features obtained through matrix factorization into the NCL framework and employ LightGCN to extract high-dimensional representations. Extensive experiments on five public datasets demonstrate that the proposed method outperforms a competitive graph collaborative filtering base model, achieving 4.6% performance gains on the MovieLens dataset, respectively.

Keywords:

Contrastive Learning; low-rank dimensionlity reduction; Graph Collaborative Filtering

MSC:

68R10

1. Introduction

The advent of the digital age has led to an explosion of data, particularly in the realm of user–item interactions. This wealth of data has opened up new opportunities for recommendation systems [1], which aim to predict user preferences and recommend items that are most likely to be of interest. However, the sheer volume and complexity of the data present significant challenges. Traditional recommendation systems often struggle to effectively capture the intricate structure of user–item interactions, and fail to fully leverage the rich information embedded in these interactions.

In this paper, we propose a novel hybrid recommendation model that addresses these challenges by integrating Singular Value Decomposition [2] and an optimized version of Neighborhood-enriched Contrastive Learning [3]. Our method aims to capture both the global structure [4] and local neighborhood information [3] inherent in the user–item interaction graph [5], thereby enhancing the recommendation performance.

The primary contributions of this paper are as follows:

Novel Hybrid Recommendation Model: We propose a novel hybrid recommendation model that integrates Singular Value Decomposition (SVD) and an optimized version of neighborhood-enriched contrastive learning. This model is designed to capture both the global structure and local neighborhood information inherent in the user–item interaction graph, thereby enhancing the recommendation performance.
SVD-based Embedding Initialization: We introduce a novel approach to initializing user and item embeddings using SVD. This method captures the global structure of the user–item interaction graph and provides a robust starting point for the learning process. It also expedites the convergence of the training process, leading to improved efficiency.
Optimized Neighborhood-enriched Contrastive Learning: We present several key refinements to the NCL approach, including an adaptive neighborhood structure, unified optimization of contrastive objectives, and prototype regularization. These refinements allow our model to adapt to changing user–item interactions, balance the trade-off between different types of neighborhood information, and enhance the discriminative power of the prototypes.
Empirical Validation: We conduct extensive experiments on several benchmark datasets to validate the effectiveness of our proposed method. The results demonstrate that our method outperforms state-of-the-art recommendation models, thereby confirming its practical utility.
Insights into User–Item Interactions: Our work provides valuable insights into the structure of user–item interactions. By leveraging both global and local information, our model offers a more comprehensive understanding of user–item interactions, which can inform the design of future recommendation systems.

2. Background and Related Work

In this section, we present our proposed method, which is a hybrid recommendation model that combines Singular Value Decomposition (SVD) and Neighborhood-enriched Contrastive Learning (NCL). The goal of our method is to capture both the global structure and the local neighborhood information of the user–item interaction graph to improve the recommendation performance.

2.1. Singular Value Decomposition (SVD)

Singular Value Decomposition [2] is a renowned technique in linear algebra used for matrix factorization. It has found extensive applications in diverse domains, such as recommendation systems, data compression, and image processing. Given a matrix A of dimensions

m \times n

, SVD decomposes the matrix into three matrices: U,

Σ

, and

V^{T}

. Here, U is an

m \times m

orthogonal matrix,

Σ

is an

m \times n

diagonal matrix containing the singular values of A, and

V^{T}

is an

n \times n

orthogonal matrix. This decomposition can be mathematically represented as follows:

A = U Σ V^{T}

(1)

In the realm of recommendation systems, matrix A symbolizes the user–item interaction matrix, where users are represented in rows and items in columns. The entries of matrix A could be explicit ratings or implicit feedback, contingent on the data available. SVD is employed to decompose the user–item interaction matrix into latent factors, thereby capturing the underlying structure in the data and reducing the dimensionality of the original matrix.

Low-rank approximations of A can be achieved by retaining only the top N singular values in

Σ

and the corresponding columns in U and

V^{T}

. This truncated SVD can be represented as follows:

A \approx U_{N} Σ_{N} V_{N}^{T}

(2)

In this equation,

U_{N}

and

V_{N}^{T}

are the column-truncated versions of U and

V^{T}

, respectively, with only the first N columns retained, and

Σ_{N}

is the top

N \times N

diagonal submatrix of

Σ

.

The low-rank approximation of A, obtained through truncated SVD, is instrumental for dimensionality reduction in collaborative filtering. By retaining only the most significant singular values and corresponding latent factors, SVD can capture the essential structure and relations between users and items, while discarding the noise and less informative components in the data. This attribute of SVD allows for improved generalization and robustness in recommendation systems, while simultaneously reducing the complexity of the models involved.

2.2. Contrastive Learning

Contrastive learning [6,7] has been recently adopted in graph collaborative filtering [8,9] to enhance performance, especially in scenarios with data sparsity [10]. In this study, we present the Neighborhood-enriched Contrastive Learning (NCL) method [3]. This method distinctively integrates potential neighbors into contrastive pairs by drawing neighbors from both the graph structure and the semantic space for any given user (or item).

2.2.1. Contrastive Learning with Structural Neighbors

Current graph collaborative filtering models are predominantly trained using observed interactions, such as user–item pairings. However, these models often overlook possible relationships between users or items that are not evident in the observed data. To harness the full potential of contrastive learning, we suggest contrasting each user (or item) with their structural neighbors. These neighbors’ representations are collated through the layer-wise propagation in GNN. The initial user/item features or learnable embeddings in the graph collaborative filtering model are represented as

z^{(0)}

[11]. The final model output is essentially a fusion of embeddings from a subgraph, encompassing multiple neighbors at varied hops. Specifically, the l-th layer’s output

z^{(l)}

of the base GNN model is the weighted sum of l-hop structural neighbors of each node, assuming no transformations or self-loops during propagation [11].

Given that our interaction graph, denoted as

G

, is a bipartite graph, using a GNN-based model for even-numbered iterations facilitates the accumulation of data from similar structural neighbors. This is useful for identifying potential neighbors among users or items. By employing this method, we can extract representations of homogeneous neighborhoods from even layers (e.g., 2, 4, 6) of the GNN model. These representations enable more effective modeling of relationships between users/items and their consistent structural neighbors. In particular, we consider the user’s own embedding and the corresponding embedding from the even-layered GNN output as matched pairs. Building on InfoNCE [12], we propose the structure contrastive learning objective to minimize the distance between them, as follows:

L_{S}^{U} = \sum_{u \in U} - log \frac{exp ((z_{u}^{(k)} \cdot z_{u}^{(0)} / τ))}{\sum_{v \in U} exp ((z_{u}^{(k)} \cdot z_{v}^{(0)} / τ))},

(3)

Given that

z_{u}^{(k)}

represents the standardized output from the kth GNN layer, where k is an even integer. The temperature parameter for softmax is denoted by

τ

. Similarly, we can derive the structure contrastive loss for the item side, denoted as

L_{s t r u c}^{i t e m}

L_{S}^{I} = \sum_{i \in I} - log \frac{exp ((z_{i}^{(k)} \cdot z_{i}^{(0)} / τ))}{\sum_{j \in I} exp ((z_{i}^{(k)} \cdot z_{j}^{(0)} / τ))},

(4)

The total structure contrastive objective function is given by the combined weighted sum of the aforementioned losses:

L_{S} = L_{S}^{U} + α L_{S}^{I} .

(5)

where

α

acts as a hyperparameter to regulate the balance between the two losses in structure contrastive learning.

2.2.2. Contrastive Learning with Semantic Neighbors

The structure contrastive loss specifically delves into the neighbors as outlined by the interaction graph. Nonetheless, it perceives all neighbors of users/items in the same light, leading to the introduction of unnecessary noise in contrastive pairs. To counteract this noise from structural neighbors, we think about enhancing the contrastive pairs by integrating semantic neighbors. These are nodes that are not directly linked on the graph, yet they bear similar attributes (for items) or tastes (for users).

Drawing from prior studies [13], we discern these neighbors by determining the hidden prototype for every user and item. Building on this notion, we introduce the prototype contrastive goal, aiming to delve into potential semantic neighbors. This is then woven into contrastive learning, ensuring a more nuanced grasp of the semantic nuances of users and items in collaborative filtering. Specifically, users/items with similarities tend to cluster in neighboring embedding spaces, with prototypes serving as the focal points of these clusters, symbolizing a collection of semantic neighbors. Consequently, we employ a clustering technique on the user and item embeddings to pinpoint the prototypes for both. Given that this method is not conducive to end-to-end optimization, we harness the EM algorithm to achieve the suggested prototype contrastive goal. In a formal sense, the aim of the GNN model revolves around augmenting the subsequent log-likelihood function:

\sum_{u \in U} log p (e_{u} | Θ, R) = \sum_{u \in U} log \sum_{c_{i} \in C} p (e_{u}, c_{i} | Θ, R),

(6)

where

Θ

is a set of model parameters,

R

is the interaction matrix, and

c_{i}

is the latent prototype of user u. Similarly, we can define the optimization objective for items.

Subsequently, the aim of the suggested prototype contrastive learning approach is to reduce the given function derived from InfoNCE [12]:

L_{P}^{U} = \sum_{u \in U} - log \frac{exp (e_{u} \cdot c_{i} / τ)}{\sum_{c_{j} \in C} exp (e_{u} \cdot c_{j} / τ)} .

(7)

where

c_{i}

is the prototype of user u, which is obtained by clustering over all the user embeddings with the K-means algorithm, and there are k clusters over all the users. The objective on the item side is identical:

L_{P}^{I} = \sum_{i \in I} - log \frac{exp (e_{i} \cdot c_{j} / τ)}{\sum_{c_{t} \in C} exp (e_{i} \cdot c_{t} / τ)} .

(8)

where

c_{j}

is the prototype of item i. The final prototype contrastive objective is the weighted sum of user objective and item objective:

L_{P} = L_{P}^{U} + α L_{P}^{I} .

(9)

In this approach, we deliberately integrate the semantic associations of users/items into contrastive learning to address the issue of data scarcity.

2.3. Neighborhood-Enriched Methods in Graph Learning

Neighborhood-enriched methods in graph learning aim to exploit the local structure of graphs by incorporating information from the neighbors of a given node [13]. In the context of recommendation systems, this can refer to the relationships between users or items in a user–item interaction graph. Neighborhood-enriched methods can be highly beneficial in capturing the complex dependencies and patterns in the data, which can lead to more accurate and effective recommendations.

Graph neural networks [14] are a class of deep learning models specifically designed to handle graph-structured data [15]. GNNs operate on graph data by iteratively aggregating and transforming the features of neighboring nodes to generate node representations that capture both local and global information [16]. The aggregation function of a GNN can be represented as follows:

h_{v}^{(l + 1)} = σ (\sum_{u \in N (v)} \frac{1}{c_{v u}} W^{(l)} h_{u}^{(l)})

(10)

where

h_{v}^{(l + 1)}

is the feature vector of node v at layer

l + 1

,

σ

is a nonlinear activation function,

N (v)

is the set of neighbors of node v,

c_{v u}

is a normalization constant,

W^{(l)}

is a learnable weight matrix at layer l, and

h_{u}^{(l)}

is the feature vector of node u at layer l.

By incorporating neighborhood information, GNNs can learn powerful and expressive representations of users and items, which can be used to make personalized recommendations.

Neighborhood-enriched contrastive learning combines the strengths of contrastive learning and neighborhood-enriched methods to enhance recommendation performance. By incorporating neighbors into contrastive pairs, neighborhood-enriched contrastive learning methods can effectively exploit the potential information contained in the user–item interaction graphs, leading to more accurate and robust recommendations, even in the presence of data sparsity.

3. Proposed Method

In this section, we present our proposed method, a hybrid recommendation model that integrates Singular Value Decomposition (SVD) and an optimized version of Neighborhood-enriched Contrastive Learning (NCL). The objective of our method is to leverage both the global structure and local neighborhood information inherent in the user–item interaction graph, thereby enhancing the recommendation performance.

3.1. Embedding Initialization via Low-Rank Approximation

The traditional methods [11,17] of initializing user and item embeddings in recommendation systems often rely on random or heuristic techniques. These approaches, however, may not adequately capture the intrinsic structure of user–item interaction data, which can lead to less than optimal performance in the early stages of training and slower convergence.

To address this, we suggest an alternative method for initializing user and item embeddings using Singular Value Decomposition (SVD), a powerful tool from the field of linear algebra that provides a low-rank approximation of a matrix. The SVD of matrix A is given by

A = U Σ V^{T}

(11)

In this equation, U and V are orthogonal matrices that contain the left and right singular vectors, respectively, and

Σ

is a diagonal matrix that holds the singular values. The interaction matrix can be broken down into three matrices, where the columns of U and V (i.e.,

u_{k}

and

v_{k}

) and

s_{k}

are left and right singular vectors and a singular value, respectively;

s_{1} > s_{2} > \dots \geq 0

;

diag (\cdot)

is the diagonalization operation. Components with larger (smaller) singular values contribute more (less) to interactions, allowing us to approximate R with only the K-largest singular values.

In the realm of recommendation systems, matrix A represents the user–item interaction matrix, with each entry

A_{u i}

indicating the interaction between user u and item i. By applying SVD to A, we obtain a low-rank approximation that captures the most significant structure in the user–item interactions.

We use the first k columns of U and V as the initial embeddings for users and items, where k is the dimension of the embedding. This strategy offers two main advantages: First, the SVD-based initialization encapsulates the global structure [16] of the user–item interaction graph, providing a solid foundation for the learning process. Second, it can potentially speed up the convergence of the training process, as the initial embeddings are already a good approximation of the final embeddings.

Alternatively, we can dynamically learn low-rank representations [4] through matrix factorization [18]:

min \sum_{(u, i) \in A^{+}} {∥A_{u i} - e_{u}^{T} e_{i}∥}_{2}^{2} + λ ({∥e_{u}∥}_{2}^{2} + {∥e_{i}∥}_{2}^{2}),

(12)

where

λ

is the regularization strength. Each user/item is considered as a node on the graph and parameterized as an embedding vector

e_{u} / e_{i} \in R^{d}

with dimension

d \leq min (| U |, | I |)

, and

A^{+} = {(u, i) ∣ A_{u i} = 1}

. By optimizing this objective function, the model is expected to learn important features from interactions (e.g., components corresponding to the d-largest singular values).

In conclusion, the SVD-based initialization provides a systematic and effective way to initialize the user and item embeddings in recommendation systems, potentially leading to enhanced performance and quicker convergence.

3.2. Enhancing Collaborative Filtering with Contrastive Learning

As mentioned in Section 2.3, GNN-based methods produce user and item representations by applying the propagation and prediction function on the interaction graph

G

. In NCL, we utilize GNN to model the observed interactions between users and items. Specifically, following LightGCN [11], we discard the nonlinear activation and feature transformation in the propagation function as follows:

\begin{matrix} z_{u}^{(l + 1)} & = \sum_{i \in N_{u}} \frac{1}{\sqrt{|N_{u}| |N_{i}|}} z_{i}^{(l)}, \\ z_{i}^{(l + 1)} & = \sum_{u \in N_{i}} \frac{1}{\sqrt{|N_{i}| |N_{u}|}} z^{(l)}, \end{matrix}

(13)

After propagating with L layers, we adopt the weighted sum function as the readout function to combine the representations of all layers and obtain the final representations as follows:

z_{u} = \frac{1}{L + 1} \sum_{l = 0}^{L} z_{u}^{(k)}, z_{i} = \frac{1}{L + 1} \sum_{l = 0}^{L} z_{i}^{(k)},

(14)

With the final representations, we adopt inner product to predict how likely a user u would interact with items i:

{\hat{y}}_{u, i} = z_{u}^{⊤} z_{i},

(15)

We incorporate an optimized version of the NCL approach into the learning process to capture local neighborhood information. The NCL approach defines two types of neighbors for a user (or an item): structural neighbors and semantic neighbors.

Structural neighbors are those who have interacted with the same items (or users). We introduce a self-supervised learning loss, denoted as

L_{s s l}

, to capture the structural neighborhood information. This loss is defined as follows:

L_{s s l} = - log (\frac{exp (dot (u, v_{pos}) / T)}{Σ exp (dot (u, v_{neg}) / T)})

(16)

Semantic neighbors are those with similar representations. We use the K-means clustering algorithm to identify the semantic neighbors. Each user (or item) is assigned to a cluster, and the centroid of the cluster is used as the prototype to represent the semantic neighbors. We introduce a prototype contrastive loss, denoted as

L_{proto}

, to capture the semantic neighborhood information. This loss is defined as follows:

L_{proto} = - log (\frac{exp (dot (u, c_{pos}) / T)}{Σ exp (dot (u, c_{neg}) / T)})

(17)

To capture the information from interactions directly, we adopt Bayesian Personalized Ranking (BPR) loss [19], which is a well-designed ranking objective function for recommendation. Specifically, BPR loss enforces the prediction scores of the observed interactions to be higher than those of the sampled unobserved ones. Formally, the objective function of BPR loss is as follows:

L_{B P R} = \sum_{(u, i, j) \in O} - log σ ({\hat{y}}_{u, i} - {\hat{y}}_{u, j}) .

(18)

By optimizing the BPR loss

L_{B P R}

, the suggested contrastive learning method enriched with neighborhood elements can capture the interplay between users and items. Nonetheless, higher-order connections among users (or items) are equally important for making recommendations. For instance, users often purchase items that their neighbors have bought. Moving forward, we will introduce two contrastive learning goals to harness the inherent neighborhood connections of both users and items.

3.3. Optimization for NCL

Building upon the NCL method, we introduce several key optimizations to further enhance its effectiveness and efficiency in capturing local neighborhood information for recommendation systems.

3.3.1. Dynamic Neighborhood Structure

In the standard NCL approach, the neighborhood structure is often fixed and predefined. This static approach may not adapt well to the dynamic nature of user–item interactions. To address this, we propose an adaptive neighborhood structure [20] that evolves during the learning process.

Specifically, we use the K-means [21] clustering algorithm to dynamically identify the semantic neighbors. The clustering process can be represented as

C, I = KMeans (X)

(19)

where X is the set of embeddings, C is the set of cluster centroids, and I is the assignment of each embedding to a cluster. This adaptive neighborhood structure is updated in each iteration of the expectation–maximization algorithm [22], allowing the model to adapt to changing user–item interactions and capture more accurate neighborhood information.

3.3.2. Unified Optimization of Contrastive Objectives

The original NCL approach optimizes the contrastive objectives separately, which may not be efficient and can lead to suboptimal solutions. We propose a unified optimization framework that balances the trade-off between the structural and semantic neighborhood information.

The unified optimization objective can be represented as

L = λ_{ssl} * L_{ssl} + λ_{proto} * L_{proto}

(20)

where

L_{ssl}

is the self-supervised learning loss for structural neighbors,

L_{proto}

is the prototype contrastive loss for semantic neighbors, and

λ_{ssl}

and

λ_{proto}

are weight parameters. This unified optimization can lead to more efficient learning and better performance.

3.3.3. Regularization on Prototypes

To prevent the prototypes from being too close to each other, which can improve the discriminative power of the prototypes, we introduce a regularization term to the prototypes in the prototype contrastive objective. The regularized prototype contrastive loss can be represented as

L_{proto_reg} = L_{proto} + γ * | | C - C^{'} {| |}^{2}

(21)

where C and

C^{'}

are the current and previous cluster centroids,

γ

is a regularization parameter, and

| | . | |

denotes the Frobenius norm. This regularization encourages the prototypes to move in each iteration, which can enhance the performance of contrastive learning.

In summary, these refinements provide a more flexible and efficient way to capture the neighborhood information in recommendation systems. They offer a new perspective on how to leverage contrastive learning in recommendation systems, leading to improved performance and faster convergence. By integrating these refinements with the SVD-based initialization, our proposed method provides a comprehensive solution for enhancing the performance of recommendation systems.

4. Experiments and Evaluation

4.1. Datasets

We evaluate the performance of our proposed method on five public datasets: MovieLens-1M (ML-1M) [23], Yelp2018 [24], Amazon Books, Gowalla, and Alibaba-iFashion [1]. These datasets vary in domain, scale, and density. For Yelp2018 and Amazon Books, we filter out users and items with fewer than 15 interactions to ensure data quality. The statistics of the datasets are summarized in Table 1.

The selection of these datasets was driven by a few key considerations:

Variety in Domains: These datasets span multiple domains—movie ratings, restaurant reviews, book reviews, social networking check-ins, and fashion. This wide coverage helps ensure the robustness of the model across diverse domains, which is key to a good machine learning model.
Scale and Density: These datasets vary not only in terms of the number of records (scale), but also in terms of the density of interactions. Some datasets may have a high number of interactions per user/item (dense), whereas others might have fewer interactions per user/item (sparse). Both of these scenarios pose unique challenges in recommendation systems, and dealing with both in training helps ensure the model’s adaptability.
Data Quality: For Yelp2018 and Amazon Books, filters have been applied to exclude users and items with fewer than 15 interactions. This decision is to ensure data quality and reliable signal in the data. It helps avoid cases where the model might overfit to users/items with very few interactions, thus making the evaluation more reliable.

Overall, these datasets were chosen to ensure that the evaluation is both rigorous and representative of various real-world situations.

For each dataset, we randomly select 80% of interactions as training data, 10% of interactions as validation data, and the remaining 10% interactions for performance comparison. We uniformly sample one negative item for each positive instance to form the training set.

4.2. Experiment Setup

4.2.1. Compared Models

We compare our proposed method with the following several state-of-the-art models:

SGL [25]: Incorporates self-supervised learning to improve recommendation systems. Our chosen model for SGL is SGL-ED.
NGCF [8]: Leverages the user–item bipartite graph to include high-order connections and employs GNN to bolster CF techniques.
NCL [3]: Advances graph collaborative filtering using neighborhood-enhanced contrastive learning, which our approach is rooted in. We utilize RUCAIBox/NCL as the model representation of NCL.

4.2.2. Evaluation Metrics

To assess the efficacy of top-N recommendations [26], we employ the commonly utilized metrics of Recall@N and NDCG@N [27,28], with N values of 10, 20, and 50 to maintain uniformity. As per earlier studies, we use the full-ranking approach, ranking all potential items that the user has not engaged with.

4.3. Implementation Details

We utilize the RecBole [29] open-source framework to develop our model, as well as all baseline algorithms. For a balanced comparison, we employ the Adam optimizer across all methods and meticulously fine-tune the hyperparameters for each baseline. We designate a batch size of 4096 and use the standard Xavier distribution for initializing parameters. The embedding dimensions are configured to 64. To deter overfitting, we apply early stopping after 10 epochs without improvement, using NDCG@10 as the benchmark indicator. We tune the weight hyperparameters

λ_{ssl}

,

L_{proto}

in [

1 \times 10^{- 10}

,

1 \times 10^{- 6}

], temperature hyperparameter

τ

in [0.01, 1], and number of clusters k in [5, 10,000].

4.4. Overall Performance

Table 2 presents a comprehensive performance comparison of the SGL model, NCL model, NGCF model, and our proposed model LoRA-NCL across various datasets. The results are insightful and reveal several key observations.

Firstly, LoRA-NCL consistently outperforms NCL across most datasets. The superior performance of LoRA-NCL can be attributed to its ability to effectively capture both the global structure and local neighborhood information inherent in the user–item interaction graph. This is achieved through the integration of Singular Value Decomposition (SVD) and an optimized version of NCL, which allows LoRA-NCL to leverage both explicit and implicit feedback from users, thereby enhancing the recommendation performance.

Interestingly, there are instances where NCL outperforms LoRA-NCL. This can be attributed to the inherent differences in the learning mechanisms of the two models. NCL, with its focus on capturing local neighborhood information, might be more effective in scenarios where local patterns and dependencies play a more significant role in user–item interactions. On the other hand, LoRA-NCL, which aims to capture both global and local structures, might be less effective when the global structure is sparse or less informative.

In conclusion, while LoRA-NCL generally outperforms NCL, the choice between the two models should be guided by the specific characteristics of the dataset and the computational resources available.

Table 3 presents the performance comparison of LoRA-NCL with different embedding sizes. The results are insightful and reveal several key observations.

LoRA-NCL (256) and LoRA-NCL (128) outperform LoRA-NCL (64) in most of the metrics across all datasets. For instance, in the MovieLens-1M dataset, LoRA-NCL (256) outperforms LoRA-NCL (64) by approximately 1.2% in Recall@10, 1.1% in NDCG@10, 1.8% in Recall@20, 1.4% in NDCG@20, 1.8% in Recall@50, and 4.6% in NDCG@50. Similar trends can be observed in other datasets such as Yelp, Amazon Books, and Gowalla.

The performance difference between LoRA-NCL (64) and LoRA-NCL (256) can be attributed to the increased capacity of the model with a larger embedding size. A larger embedding size allows the model to capture more nuanced features of the user–item interactions, leading to better performance. However, it is important to note that the improvement comes at the cost of increased computational complexity and memory usage.

The possible reason that higher embedding size led to better performance is that a larger embedding size provides a more expressive representation space for the items and users. This allows the model to capture more complex and subtle patterns in the user–item interactions, which can lead to improved recommendation performance. However, it is important to note that the benefits of a larger embedding size should be weighed against the increased computational cost and the risk of overfitting.

4.5. Further Analysis

In this subsection, we delve deeper into the results of our experiments to gain more insights into the performance of our proposed method.

4.5.1. Performance across Different Datasets

Our method demonstrated varying performance across the different datasets. We used recall and Normalized Discounted Cumulative Gain (NDCG) as our primary performance metrics. Recall, i.e., the proportion of true positives to the combined total of true positives and false negatives, provides insight into the percentage of accurate positive predictions made by our model. NDCG, on the other hand, is a measure of ranking quality, summing up the graded relevance values of all results in the list.

4.5.2. Comparison with Other Methods

Compared to other leading-edge techniques, our suggested approach demonstrated enhanced effectiveness. The NDCG scores of our method were consistently higher than those of the other methods, and the recall scores of our method were always higher than those of other the methods, except for one data point. Our method is more effective at NDCG.

4.6. Impact of Parameter Choices

The performance of our model is influenced by several parameters, one of the most significant being the size of the embeddings. Embedding size refers to the dimensionality of the vectors used to represent items in the recommendation system.

In our experiments, we found that the choice of embedding size had a substantial impact on the performance of our model. Specifically, smaller embedding sizes tended to result in faster training times but at the cost of model accuracy. On the other hand, larger embedding sizes led to more accurate models, but with an increase in computational complexity and training time.

Interestingly, there appeared to be a ‘sweet spot’ for the embedding size. Beyond a certain point, increasing the embedding size did not lead to significant improvements in model performance, and in some cases, it even led to a decrease in performance. This could be due to the model overfitting to the training data when given too many parameters to learn.

Therefore, it is crucial to carefully choose the embedding size when implementing our model. We recommend conducting a thorough parameter tuning process, such as grid search or random search, to find the optimal embedding size for the specific dataset and problem at hand.

4.7. Limitations and Future Directions

While our proposed method shows promising results, it also has several limitations. One of the main limitations is the sensitivity of the model to the choice of embedding size. The performance of our model can vary significantly with different embedding sizes, and finding the optimal size can be a computationally intensive process. Moreover, the optimal embedding size may not be the same for all datasets, adding another layer of complexity to the problem.

Another limitation is related to parameter tuning. Our model has several hyperparameters that need to be carefully tuned to achieve the best performance. However, the optimal set of parameters can vary depending on the specific characteristics of the dataset and the problem at hand, making the tuning process challenging and time-consuming.

Despite these limitations, our research opens up several avenues for future work. One potential direction is to develop more efficient methods for determining the optimal embedding size and tuning the model parameters. This could involve using more advanced optimization techniques or incorporating additional prior knowledge about the problem into the tuning process. Another interesting direction would be to explore ways to make the model less sensitive to the choice of embedding size and other parameters, thereby making it more robust and easier to use. We believe that addressing these limitations and exploring these future directions can further improve the performance and applicability of our method.

5. Conclusions

While our method shows promising results, it has several limitations that provide directions for future work. First, our method assumes that the user–item interaction graph is static, which may not hold in real-world scenarios where user–item interactions are dynamic. Future work could explore how to incorporate temporal information into our method. Second, our method relies on the K-means algorithm to identify semantic neighbors, which may not be optimal for all datasets. Future work could investigate other clustering algorithms or learn the neighborhood structure in an end-to-end manner. Lastly, our method is designed for explicit feedback data. Adapting it to implicit feedback data, where only positive interactions are observed, is another interesting direction for future research.

Author Contributions

Conceptualization, T.C. and Z.H.; methodology, T.C.; software, T.C.; validation, T.C., H.C. and Z.H.; formal analysis, T.C.; resources, T.H.; data curation, T.C.; writing—original draft preparation, T.C.; writing—review and editing, H.C.; visualization, T.C.; supervision, H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data is unavailable due to privacy or ethical restrictions.

Conflicts of Interest

The authors declare no conflict of interest.

References

Chen, W.; Huang, P.; Xu, J.; Guo, X.; Guo, C.; Sun, F.; Li, C.; Pfadler, A.; Zhao, H.; Zhao, B. POG: Personalized Outfit Generation for Fashion Recommendation at Alibaba iFashion. arXiv 2019, arXiv:1905.01866. [Google Scholar]
Bermeitinger, B.; Hrycej, T.; Handschuh, S. Singular Value Decomposition and Neural Networks. In Artificial Neural Networks and Machine Learning—ICANN 2019: Deep Learning; Tetko, I.V., Kůrková, V., Karpov, P., Theis, F., Eds.; Springer: Cham, Switzerland, 2019; pp. 153–164. [Google Scholar]
Lin, Z.; Tian, C.; Hou, Y.; Zhao, W.X. Improving Graph Collaborative Filtering with Neighborhood-enriched Contrastive Learning. In Proceedings of the ACM Web Conference 2022, Virtual Event, 25–29 April 2022. [Google Scholar]
Peng, S.; Sugiyama, K.; Mine, T. SVD-GCN: A Simplified Graph Convolution Paradigm for Recommendation. arXiv 2022, arXiv:2208.12689. [Google Scholar]
Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-Based Collaborative Filtering Recommendation Algorithms. In Proceedings of the WWW’01: 10th International Conference on World Wide Web, New York, NY, USA, 1–5 May 2001; pp. 285–295. [Google Scholar]
Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G.E. A Simple Framework for Contrastive Learning of Visual Representations. arXiv 2020, arXiv:2002.05709. [Google Scholar]
Khosla, P.; Teterwak, P.; Wang, C.; Sarna, A.; Tian, Y.; Isola, P.; Maschinot, A.; Liu, C.; Krishnan, D. Supervised Contrastive Learning. arXiv 2021, arXiv:2004.11362. [Google Scholar]
Wang, X.; He, X.; Wang, M.; Feng, F.; Chua, T. Neural Graph Collaborative Filtering. arXiv 2019, arXiv:1905.08108. [Google Scholar]
He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T. Neural Collaborative Filtering. arXiv 2017, arXiv:1708.05031. [Google Scholar]
Strub, F.; Mary, J. Collaborative Filtering with Stacked Denoising AutoEncoders and Sparse Inputs. In Proceedings of the NIPS 2015, Montreal, QC, Canada, 7–12 December 2015. [Google Scholar]
He, X.; Deng, K.; Wang, X.; Li, Y.; Zhang, Y.; Wang, M. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. arXiv 2020, arXiv:2002.02126. [Google Scholar]
van den Oord, A.; Li, Y.; Vinyals, O. Representation Learning with Contrastive Predictive Coding. arXiv 2019, arXiv:1807.03748. [Google Scholar]
Lin, S.; Zhou, P.; Hu, Z.Y.; Wang, S.; Zhao, R.; Zheng, Y.; Lin, L.; Xing, E.; Liang, X. Prototypical Graph Contrastive Learning. arXiv 2022, arXiv:2106.09645. [Google Scholar] [CrossRef] [PubMed]
Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph Neural Networks: A Review of Methods and Applications. arXiv 2021, arXiv:1812.08434. [Google Scholar] [CrossRef]
Ward, I.R.; Joyner, J.; Lickfold, C.; Guo, Y.; Bennamoun, M. A Practical Tutorial on Graph Neural Networks. arXiv 2021, arXiv:2010.05234. [Google Scholar] [CrossRef]
Xu, M.; Wang, H.; Ni, B.; Guo, H.; Tang, J. Self-supervised Graph-level Representation Learning with Local and Global Structure. arXiv 2021, arXiv:2106.04113. [Google Scholar]
Baluja, S.; Seth, R.; Sivakumar, D.; Jing, Y.; Yagnik, J.; Kumar, S.; Ravichandran, D.; Aly, M. Video Suggestion and Discovery for Youtube: Taking Random Walks through the View Graph. In Proceedings of the WWW’08: 17th International Conference on World Wide Web, Beijing, China, 21–25 April 2008; pp. 895–904. [Google Scholar]
Koren, Y.; Bell, R.; Volinsky, C. Matrix Factorization Techniques for Recommender Systems. Computer 2009, 42, 30–37. [Google Scholar] [CrossRef]
Rendle, S.; Freudenthaler, C.; Gantner, Z.; Schmidt-Thieme, L. BPR: Bayesian Personalized Ranking from Implicit Feedback. arXiv 2012, arXiv:1205.2618. [Google Scholar]
Song, K.; Han, J.; Cheng, G.; Lu, J.; Nie, F. Adaptive Neighborhood Metric Learning. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 44, 4591–4604. [Google Scholar] [CrossRef] [PubMed]
Jin, X.; Han, J. K-Means Clustering. In Encyclopedia of Machine Learning; Sammut, C., Webb, G.I., Eds.; Springer: Boston, MA, USA, 2010; pp. 563–564. [Google Scholar]
Moon, T. The expectation-maximization algorithm. IEEE Signal Process. Mag. 1996, 13, 47–60. [Google Scholar] [CrossRef]
Harper, F.M.; Konstan, J.A. The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst. 2015, 5, 19. [Google Scholar] [CrossRef]
Kronmueller, M.; Chang, D.j.; Hu, H.; Desoky, A. A Graph Database of Yelp Dataset Challenge 2018 and Using Cypher for Basic Statistics and Graph Pattern Exploration. In Proceedings of the 2018 IEEE International Symposium on Signal Processing and Information Technology (ISSPIT), Louisville, KY, USA, 6–8 December 2018; pp. 135–140. [Google Scholar]
Wu, J.; Wang, X.; Feng, F.; He, X.; Chen, L.; Lian, J.; Xie, X. Self-supervised Graph Learning for Recommendation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, 11–15 July 2020. [Google Scholar]
Kabbur, S.; Ning, X.; Karypis, G. FISM: Factored Item Similarity Models for Top-N Recommender Systems. In Proceedings of the KDD’13: 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Chicago, IL, USA, 11–14 August 2013; pp. 659–667. [Google Scholar]
Herlocker, J.L.; Konstan, J.A.; Terveen, L.G.; Riedl, J.T. Evaluating Collaborative Filtering Recommender Systems. ACM Trans. Inf. Syst. 2004, 22, 5–53. [Google Scholar] [CrossRef]
Silveira, T.; Zhang, M.; Lin, X.; Liu, Y.; Ma, S. How good your recommender system is? A survey on evaluations in recommendation. Int. J. Mach. Learn. Cybern. 2019, 10, 813–831. [Google Scholar] [CrossRef]
Zhao, W.X.; Mu, S.; Hou, Y.; Lin, Z.; Chen, Y.; Pan, X.; Li, K.; Lu, Y.; Wang, H.; Tian, C.; et al. RecBole: Towards a Unified, Comprehensive and Efficient Framework for Recommendation Algorithms. arXiv 2021, arXiv:2011.01731. [Google Scholar]

Table 1. Statistics of the datasets.

Dataset	#Users	#Items	#Interactions	Density
MovieLens-1M	6040	3629	36,478	0.03816
Yelp2018	45,478	30,709	1,777,765	0.00127
Amazon Books	58,145	58,052	2,517,437	0.00075
Gowalla	29,859	40,989	1,027,464	0.00084
Alibaba-iFashion	300,000	816,148	1,607,813	0.00007

Table 2. Performance comparison of all datasets.

Dataset	Metric	SGL	NCL	NGCF	LoRA-NCL
MovieLens-1M	Recall@10	0.1888	0.2057	0.1846	0.2051
	NDCG@10	0.2526	0.2732	0.2528	0.2720
	Recall@20	0.2848	0.3037	0.2741	0.3062
	NDCG@20	0.2649	0.2843	0.2614	0.2860
	Recall@50	0.4487	0.4686	0.4341	0.4734
	NDCG@50	0.3111	0.3300	0.3055	0.3316
Yelp	Recall@10	0.0833	0.0920	0.0630	0.1070
	NDCG@10	0.0601	0.0678	0.0446	0.0895
	Recall@20	0.1288	0.1377	0.1026	0.1445
	NDCG@20	0.0739	0.0817	0.0567	0.1010
	Recall@50	0.2140	0.2247	0.1864	0.2116
	NDCG@50	0.0964	0.1046	0.0784	0.1192
Amazon Books	Recall@10	0.0898	0.0933	0.0617	0.1167
	NDCG@10	0.0645	0.0679	0.0427	0.0860
	Recall@20	0.1331	0.1381	0.0978	0.1665
	NDCG@20	0.0777	0.0815	0.0537	0.1011
	Recall@50	0.2157	0.2175	0.1699	0.2525
	NDCG@50	0.0992	0.1024	0.0725	0.1239
Gowalla	Recall@10	0.1465	0.1500	0.1192	0.1574
	NDCG@10	0.1048	0.1082	0.0852	0.1145
	Recall@20	0.2084	0.2133	0.1755	0.2240
	NDCG@20	0.1225	0.1265	0.1013	0.1337
	Recall@50	0.3197	0.3259	0.2811	0.3400
	NDCG@50	0.1497	0.1542	0.1270	0.1621
Dataset	Metric	SGL	NCL	NGCF	LoRA-NCL (128)
Alibaba-iFashion	Recall@10	0.0461	0.0477	0.0382	0.0547
	NDCG@10	0.0248	0.0259	0.0198	0.0299
	Recall@20	0.0692	0.0713	0.0615	0.0805
	NDCG@20	0.0307	0.0319	0.0257	0.0364
	Recall@50	0.1141	0.1165	0.1081	0.1288
	NDCG@50	0.0396	0.0409	0.0349	0.0460

The best result is in bold and the runner-up is underlined.

Table 3. Performance comparison of different embedding sizes of LoRA-NCL.

Dataset	Metric	LoRA-NCL (64)	LoRA-NCL (256)
MovieLens-1M	Recall@10	0.1939	0.2051
	NDCG@10	0.2613	0.2720
	Recall@20	0.2884	0.3062
	NDCG@20	0.2720	0.2860
	Recall@50	0.4555	0.4734
	NDCG@50	0.2851	0.3316
Yelp	Recall@10	0.0923	0.1070
	NDCG@10	0.0685	0.0895
	Recall@20	0.1382	0.1445
	NDCG@20	0.0827	0.1010
	Recall@50	0.2237	0.2116
	NDCG@50	0.1054	0.1192
Amazon Books	Recall@10	0.0993	0.1167
	NDCG@10	0.0717	0.0860
	Recall@20	0.1453	0.1665
	NDCG@20	0.0857	0.1011
	Recall@50	0.2295	0.2525
	NDCG@50	0.1080	0.1239
Gowalla	Recall@10	0.1493	0.1574
	NDCG@10	0.1071	0.1145
	Recall@20	0.2121	0.2240
	NDCG@20	0.1252	0.1337
	Recall@50	0.3267	0.3400
	NDCG@50	0.1532	0.1621
Dataset	Metric	LoRA-NCL (64)	LoRA-NCL (128)
Alibaba-iFashion	Recall@10	0.0513	0.0547
	NDCG@10	0.0278	0.0299
	Recall@20	0.0762	0.0805
	NDCG@20	0.0341	0.0364
	Recall@50	0.1228	0.1288
	NDCG@50	0.0434	0.0460

The best result is in bold.

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Cao, T.; Chen, H.; Hao, Z.; Hu, T. LoRA-NCL: Neighborhood-Enriched Contrastive Learning with Low-Rank Dimensionality Reduction for Graph Collaborative Filtering. Mathematics 2023, 11, 3577. https://doi.org/10.3390/math11163577

AMA Style

Cao T, Chen H, Hao Z, Hu T. LoRA-NCL: Neighborhood-Enriched Contrastive Learning with Low-Rank Dimensionality Reduction for Graph Collaborative Filtering. Mathematics. 2023; 11(16):3577. https://doi.org/10.3390/math11163577

Chicago/Turabian Style

Cao, Tianruo, Honghui Chen, Zepeng Hao, and Tao Hu. 2023. "LoRA-NCL: Neighborhood-Enriched Contrastive Learning with Low-Rank Dimensionality Reduction for Graph Collaborative Filtering" Mathematics 11, no. 16: 3577. https://doi.org/10.3390/math11163577

Note that from the first issue of 2016, this journal uses article numbers instead of page numbers. See further details here.

Article Menu

LoRA-NCL: Neighborhood-Enriched Contrastive Learning with Low-Rank Dimensionality Reduction for Graph Collaborative Filtering

Abstract

1. Introduction

2. Background and Related Work

2.1. Singular Value Decomposition (SVD)

2.2. Contrastive Learning

2.2.1. Contrastive Learning with Structural Neighbors

2.2.2. Contrastive Learning with Semantic Neighbors

2.3. Neighborhood-Enriched Methods in Graph Learning

3. Proposed Method

3.1. Embedding Initialization via Low-Rank Approximation

3.2. Enhancing Collaborative Filtering with Contrastive Learning

3.3. Optimization for NCL

3.3.1. Dynamic Neighborhood Structure

3.3.2. Unified Optimization of Contrastive Objectives

3.3.3. Regularization on Prototypes

4. Experiments and Evaluation

4.1. Datasets

4.2. Experiment Setup

4.2.1. Compared Models

4.2.2. Evaluation Metrics

4.3. Implementation Details

4.4. Overall Performance

4.5. Further Analysis

4.5.1. Performance across Different Datasets

4.5.2. Comparison with Other Methods

4.6. Impact of Parameter Choices

4.7. Limitations and Future Directions

5. Conclusions

Author Contributions

Funding

Data Availability Statement

Conflicts of Interest

References

Share and Cite

Article Metrics

Article Access Statistics

Further Information

Guidelines

MDPI Initiatives

Follow MDPI