Article

Bi-View Contrastive Learning with Hypergraph for Enhanced Session-Based Recommendation

by
Zijun Wang
and
Lai Wei
*
School of Information Engineering, Shanghai Maritime University, Shanghai 201306, China
*
Author to whom correspondence should be addressed.
Information 2025, 16(4), 267; https://doi.org/10.3390/info16040267
Submission received: 13 February 2025 / Revised: 12 March 2025 / Accepted: 24 March 2025 / Published: 26 March 2025

Abstract

Session-based recommendation (SBR) aims to predict a user’s next interests based on their actions in a single visit. Recent methods utilize graph neural networks to study the pairwise relationship of item transfers, yet these often overlook the complex high-order connections between items. Hypergraphs can naturally model many-to-many relationships and capture complex interactions, thereby improving the accuracy of SBR. However, the potential of hypergraphs in SBR remains underexplored. This paper models session data into two views: the hypergraph view, which employs hypergraph convolution, and the session view, which utilizes the intersection graph of the hypergraph with standard graph convolution to support the main recommendation task. By combining cross-view contrastive learning with view adversarial training as an auxiliary task, the two views recursively exploit different connectivity information to generate ground truth samples, thus enriching the session information. Extensive experiments on three benchmark datasets confirm the effectiveness of our hypergraph modeling approach and cross-view contrastive learning.

1. Introduction

Session-based recommendation (SBR) is a special form of recommendation system that predicts items or products users may be interested in based on their behavior during a session. Unlike other recommendation systems [1,2,3], SBR does not rely on long-term historical data, but focuses on the user’s interactive behavior in the current session, such as browsing, clicking, and purchasing.
Based on the characteristics of SBR, early research focused on strategies based on similarity [4,5] and Markov chain models [6,7,8,9]. Although these methods are simple and effective, the similarity-based approaches rely on the co-occurrence information of target session items and overlook the sequential pattern of the sequence. When the user-item interaction matrix is sparse, similarity calculations tend to be unreliable. Moreover, popular items are disproportionately recommended due to their high interaction frequencies, whereas long-tail items are frequently neglected. Meanwhile, Markov chain models are unable to capture higher-order dependencies (e.g., latent patterns across multiple interactions) and are not robust to session noise, such as accidental clicks.
With the advancement of deep learning technology, neural network-based SBR methods have progressively emerged. Recurrent Neural Networks (RNNs) and their variants are extensively utilized in SBR [10,11,12,13], effectively extracting rich session information by modeling the data as one-way sequences. Although these methods offer improved performance compared to traditional approaches, difficulties are encountered in learning dependencies from long-distance sequences, and limitations exist in effectively mining features from session data. Moreover, RNNs lack the ability to filter out noise in sequences (such as accidental clicks or random browsing), which may lead to recommendation bias.
To overcome these limitations, Graph Neural Networks (GNNs) [14,15,16,17] have been introduced into the field of SBR. These networks model session data as a graph structure to capture neighborhood information and inter-item interactions. The embedded representations of items in the target session are effectively fused, expressing user preferences by propagating and updating item embeddings within the graph structure.
Although GNN-based methods have made some progress in enhancing model performance, these approaches primarily focus on extracting user preferences within the current session, neglecting the potential value of cross-session information. The graph structure construction is typically based on simple co-occurrence relationships, neglecting the dynamic evolution of user interests. As the number of GNN layers increases, node features tend to become similar, resulting in recommendation outcomes that lack differentiation. As a result, they fail to fully explore users’ long-term preferences.
In response, hypergraph neural networks [18,19,20,21,22,23] have been proposed to address these limitations. Their ability to capture complex high-order relationships between items enhances the expressive power of the SBR model, further enriching its effectiveness.
Additionally, GNN-based methods typically consider only the item transition relationships within the target session and do not utilize collaborative information from other sessions, leading to suboptimal performance under data sparsity. Graph contrastive learning algorithms, which can alleviate data sparsity by enhancing data representation [24], are therefore increasingly applied to recommendation systems. Most of these methods generate positive and negative samples by randomly dropping nodes and edges [25,26,27,28,29]. However, despite achieving performance gains in recommendation tasks, these contrastive learning strategies exacerbate the sparsity of session data. The random dropout strategy disrupts the semantic integrity of sessions by potentially removing critical interest anchors and shortening sessions, leading to an incomplete and inaccurate capture of user interests, while its lack of behavioral constraints may introduce noise unrelated to the true distribution of user interests.
To address these issues, this paper proposes a cross-view contrastive learning SBR method based on a hypergraph structure, designed to accurately capture the complex relationships between items in session data. Firstly, differing from the traditional single hypergraph modeling paradigm, this paper designs a bi-view collaborative learning architecture: on one hand, it models the high-order relationships among items within sessions through a hypergraph view, and on the other hand, it innovatively introduces an intersection graph view to capture shared item interaction information across sessions, allowing sparse items to gain supplementary information through multi-session hyperedges, thereby overcoming the limitations of single-view models that are easily affected by data sparsity. Building on this, compared to existing contrastive learning recommendation methods that mostly rely on a single graph structure and struggle to fully utilize cross-session information, we propose a cross-view contrastive learning mechanism. This mechanism maximizes the mutual information between the hypergraph view and the intersection graph view, facilitating knowledge transfer between heterogeneous structures. Additionally, adversarial training is incorporated into the contrastive learning process to constrain the distribution differences in the bi-view embedding space, preventing mode collapse and enhancing the model’s generalization capabilities. Furthermore, to address the issue of hypergraph methods neglecting sequential information, we innovatively integrate Transformer positional encoding into the hypergraph neural network, strengthening the model’s ability to perceive item temporal relationships while maintaining the advantage of high-order relationship modeling. Finally, to further enhance the model’s generalization ability in data-sparse scenarios, a multi-task joint optimization framework is employed to collaboratively integrate multi-dimensional supervision signals such as contrastive learning loss, adversarial training loss, and sequential prediction loss, achieving synergistic enhancement of representation learning in data-sparse scenarios. Experiments demonstrate that this framework significantly improves the accuracy and robustness of session recommendations through the synergistic effects of view complementarity, contrastive enhancement, temporal awareness, and adversarial stability.
All in all, an approach termed Bi-view Contrastive Learning with Hypergraph for Enhanced Session-based Recommendation (BCHRec) is proposed, where the main contributions can be summarized as follows:
  • Breaking through the limitations of traditional hypergraph recommendations that rely on a single view, we propose a modeling architecture that integrates hypergraph views and intersection graph views. We innovatively design a cross-view contrastive learning mechanism combined with adversarial training to constrain the distribution differences in the bi-view embedding space, effectively alleviating data sparsity and mode collapse issues, and achieving information complementarity and model stability.
  • For the first time, Transformer positional encoding is introduced into hypergraph recommendation models, constructing a hybrid representation learning module that captures both high-order relationships and temporal awareness. This addresses the defect of existing hypergraph methods ignoring the sequential information of items within sessions, significantly improving the modeling accuracy of users’ dynamic preferences.
  • A multi-task learning framework that jointly optimizes contrastive learning, adversarial training, and recommendation prediction is proposed. This framework strengthens the accuracy and generalization ability of item representations in sparse scenarios, providing a new perspective on model optimization for complex session recommendation tasks.
  • Extensive experiments on three benchmark datasets demonstrate significant superiority of this model and achieve notable improvements in recommendation effectiveness.

2. Related Work

2.1. Session-Based Recommendation

In the field of SBR research, the primary objective is to learn user preferences and predict subsequent behaviors from intra-session interactions. These sessions typically exclude user-specific information and occur within short-term sequences. As a result, some classical works [30,31,32], which derive user preferences from long-term historical behaviors, are not suitable for making recommendations in these anonymous, short-term sessions. SBR methods can be broadly categorized into two types based on the underlying technology: traditional techniques and deep learning approaches. Among the traditional methods, POP [33] focuses on recommending the most popular items, while Item-KNN [34] calculates item similarities based on historical behaviors. Subsequent efforts have explored mining temporal information from session data using Markov chains [8,18,35]. The FPMC [6] model was among the first to combine Markov chains with matrix factorization to model sequential data and user preferences.
With the advent of deep learning, GRU4Rec [10] introduced the use of recurrent neural networks to track dynamic user behaviors for the first time. Building on this, NARM [11], and STAMP [12] each incorporated attention mechanisms to more precisely capture shifts in users’ immediate interests and overall preferences. As graph neural network (GNN) technology advanced, the SBR domain began to explore complex inter-item relationships. SR-GNN [36] employed a gated graph neural network to analyze interaction graphs, while GC-SAN [15] enhanced model comprehension of interactions through a self-attention mechanism. FGNN [37] further improved performance with weighted attention layers. GCE-GNN [38] applied graph convolution to global transformations, aggregating more relevant items for local interactions. Additionally, HADCG [39], MSGAT [40], and KMVG [41] focused on intra-item collaboration and inter-session relationships by constructing graphs with varied structures. Despite the significant advantages demonstrated by these GNN-based methods in SBR, they still struggle to fully capture the complex high-order connectivity between items.

2.2. Hypergraph Learning

Hypergraphs naturally capture higher-order structures, making them highly effective in various recommendation systems. HGNN [18] and HyperGCN [19] first apply graph convolution techniques to hypergraphs. In recent years, research has expanded the application of hypergraph neural networks to recommendation algorithms. For instance, in a general recommendation, the DHCF [21] framework utilizes a hypergraph collaborative filtering approach to learn hybrid high-order correlations by modeling separate hypergraphs for users and items. In sequence-based recommendation, HyperRec [19] employs hypergraphs based on user interaction sequences to capture dynamic relationships that evolve over time and reflect shifting user preferences, thereby extracting short-term item relevance. In the context of SBR, DHCN [14] leverages hyperedge aggregation of session items and incorporates self-supervised learning into a dual-channel GCN to enhance recommendation accuracy. Additionally, in social recommendation tasks, MHCN [22] enhances recommendations by utilizing high-order user relationships through multi-channel hypergraph convolutions. These studies underscore the significant potential and expanding application prospects of hypergraph-based methods in SBR and beyond.

2.3. Contrastive Learning

As a representative technique of self-supervised learning, contrastive learning seeks to maximize similarity between positive samples and minimize it between negative samples to effectively represent data. This approach has recently drawn significant attention, particularly in graph representation learning. For example, DGI [42] and InfoGraph [25] enhance mutual information between a graph and its substructures at varying granularities to capture high-quality representations. GraphCL [26] investigates the impacts of node dropout and edge removal across different graph augmentation scenarios. In the recommendation sector, SGL [27] enhances graph-based collaborative filtering through a multi-task framework, while MBSSL [28] addresses data sparsity and interaction noise by employing a comprehensive SSL paradigm across inter- and intra-behavioral levels. RGCL [29] incorporates node and edge discrimination tasks as additional self-supervised optimization targets. However, these contrastive learning strategies, despite improving recommendation performance, are less effective for SBR due to the increased sparsity caused by random dropout. Recently, efforts like VGCL [43], which employs variational graph reconstruction to generate contrastive views without using random dropout, and MHCPL [44], which captures dynamic user preferences across multiple relationship types from different perspectives, have emerged. RESTC [45] proposed a spatio-temporal contrastive learning framework aiming to align and refine the representations of spatial and temporal views in the latent space for the SBR task. ReCAFR [46] not only employs review data for augmentation to mitigate the sparsity problem, but also aligns the tripartite representations to improve robustness. MCGCL [47] adopts the view of holistic bipartite graph learning and homogeneous subgraph learning and constructs graphs with two augmentation methods, including adding edges and removing edges. DHCN [14] simplifies the process by generating negative samples through random shuffling of embeddings, although this can sometimes compromise sample quality.

3. Method

3.1. Problem Setup and Definitions

This paper captures the complex high-order relationships between items using hypergraphs and extracts cross-session information using the intersection graph of hypergraphs. Figure 1 presents the framework of our model. Figure 2 and Figure 3 provide a detailed illustration of the construction process of the hypergraph and its intersection graph.

3.1.1. Problem Setup

Let $I = \{i_1, i_2, i_3, \ldots, i_n\}$ represent the set of items, where n is the number of items in the set. Each session is represented as a chronologically ordered sequence $s = [i_{(s,1)}, i_{(s,2)}, i_{(s,3)}, \ldots, i_{(s,t)}]$, where $i_{(s,m)}$ denotes the m-th item clicked by the user in session s. Each item $i \in I$ is embedded into the same space, and $x_i^{(l)} \in \mathbb{R}^{d^{(l)}}$ represents the $d^{(l)}$-dimensional representation of item i in the l-th layer of the neural network. The representation of the entire item set is denoted $X^{(l)} \in \mathbb{R}^{N \times d^{(l)}}$. The SBR task is to predict the next item $i_{(s,t+1)}$ that may be clicked, based on the previous t items the user has interacted with; that is, given the item set I and the session sequence s, the SBR model outputs a ranked list $z = \{z_1, z_2, z_3, \ldots, z_n\}$, where $z_i$ is the predicted probability corresponding to item i. Based on the calculated scores, the top-K items with the highest predicted probabilities in z are recommended as the final suggestions.

3.1.2. Definition 1: Hypergraph View

In order to better capture the semantics of sessions, it is crucial to effectively model the complex interrelationships within them. Traditional graph models can only capture pairwise item relationships, whereas user behavior in real-world sessions often exhibits high-order correlations. Hypergraphs model such high-order relationships through hyperedges, each of which can connect an arbitrary number of nodes, naturally representing the complex interaction patterns within a session. Hypergraphs extend the concept of edges to connect more than two vertices [18,48]. Specifically, the construction process is as follows: all items in a session are treated as hypergraph nodes, and each session is directly mapped to a hyperedge that contains all items within the session, thereby explicitly modeling the high-order co-occurrence relationships among items within the session. Let $H = (V, E)$ represent a hypergraph, consisting of a vertex set V and a hyperedge set E. Each hyperedge $e \in E$ is assigned a positive weight, with all weights forming a diagonal matrix $W \in \mathbb{R}^{|E| \times |E|}$. The hypergraph is represented by an incidence matrix $H \in \mathbb{R}^{|V| \times |E|}$, where $H_{ve}$ indicates whether the vertex $v \in V$ is included in the hyperedge e:
$$H_{ve} = \begin{cases} 1, & \text{if } v \in e, \\ 0, & \text{if } v \notin e. \end{cases}$$
For each vertex $v \in V$, the vertex degree matrix D is defined by $D_{vv} = \sum_{e \in E} W_{ee} H_{ve}$; for each hyperedge $e \in E$, the hyperedge degree matrix B is defined by $B_{ee} = \sum_{v \in V} H_{ve}$. Both D and B are diagonal matrices.
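To make the construction concrete, the following is a minimal NumPy sketch (with illustrative helper names, not the authors' implementation) of building the incidence matrix H and the degree matrices D and B from a list of sessions, with every hyperedge weighted 1 as assumed above.

```python
# Minimal sketch of hypergraph construction from sessions (illustrative names).
import numpy as np

def build_hypergraph(sessions, n_items):
    """Each session becomes one hyperedge containing all of its (0-indexed) items."""
    n_edges = len(sessions)
    H = np.zeros((n_items, n_edges))               # |V| x |E| incidence matrix
    for e, session in enumerate(sessions):
        for item in set(session):
            H[item, e] = 1.0
    edge_w = np.ones(n_edges)                      # every hyperedge weighted 1 (W = I)
    D = np.diag((H * edge_w).sum(axis=1))          # vertex degrees D_vv = sum_e W_ee * H_ve
    B = np.diag(H.sum(axis=0))                     # hyperedge degrees B_ee = sum_v H_ve
    return H, D, B

# toy usage: three sessions over five items
H, D, B = build_hypergraph([[0, 1, 2], [1, 3], [2, 3, 4]], n_items=5)
```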

3.1.3. Definition 2: Session View

Define the session view as the intersection graph of the hypergraph. Given a hypergraph $H = (V, E)$, its intersection graph $L(H)$ is an ordinary graph in which each hyperedge of H becomes a node; two nodes of $L(H)$ are connected if the corresponding hyperedges share at least one common vertex [49]. Following [14], formally, $L(H) = (V_L, E_L)$, where $V_L = \{v_e : e \in E\}$ and $E_L = \{(v_{e_p}, v_{e_q}) : e_p, e_q \in E, |e_p \cap e_q| \geq 1\}$. Each edge $(v_{e_p}, v_{e_q})$ is assigned a weight $W_{p,q} = |e_p \cap e_q| / |e_p \cup e_q|$.
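The intersection graph can be derived directly from the incidence matrix. Below is a minimal NumPy sketch under the definition above, where two hyperedges are linked when they share at least one item and the edge weight is $|e_p \cap e_q| / |e_p \cup e_q|$; the function name is illustrative.

```python
# Minimal sketch of deriving the intersection ("session view") graph from H (illustrative names).
import numpy as np

def intersection_graph(H):
    n_edges = H.shape[1]
    A = np.zeros((n_edges, n_edges))
    for p in range(n_edges):
        for q in range(n_edges):
            if p == q:
                continue
            inter = np.sum((H[:, p] > 0) & (H[:, q] > 0))   # |e_p ∩ e_q|
            union = np.sum((H[:, p] > 0) | (H[:, q] > 0))   # |e_p ∪ e_q|
            if inter >= 1:                                  # connect hyperedges sharing an item
                A[p, q] = inter / union
    return A
```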

3.2. Method Framework

3.2.1. Hypergraph View and Hypergraph Convolution

This paper develops a hypergraph convolutional network to capture high-level relationships between items in session sequences. Following the spectral hypergraph convolution of [18,50], the hypergraph convolution is defined as follows:
$$x_i^{(l+1)} = \sum_{v \in V} \sum_{e \in E} H_{ie} H_{ve} x_v^{(l)} W^{(l)}$$
where $x_i^{(l)}$ is the embedding of the i-th vertex in the l-th layer, and $W^{(l)} \in \mathbb{R}^{d \times d}$ is the weight matrix between the l-th and (l+1)-th layers. Inspired by [14,51], the complexity of the convolution is reduced by removing the nonlinearity and folding the weight matrix. For the hyperedge weight matrix W, every hyperedge is considered equally important, i.e., each hyperedge is assigned the same weight of 1.
After that, rewrite the convolution operation into matrix form:
$$X^{(l+1)} = D^{-1} H B^{-1} H^{\top} X^{(l)} W^{(l)}$$
where $X^{(l)}$ and $X^{(l+1)}$ are the inputs of the l-th and (l+1)-th layers, respectively.
Hypergraph convolution transforms features within a hypergraph structure through a two-step refinement process, achieving a “node-hyperedge-node” feature transformation. Initially, the transposed incidence matrix $H^{\top}$ aggregates node features onto hyperedges, creating hyperedge features. These are then gathered back to the nodes through H to produce refined node features. Additionally, the node and hyperedge degree matrices are used for normalization to ensure smooth feature propagation. In essence, hypergraph convolution adjusts the feature transformation process by aggregating information first from nodes to hyperedges and then back to nodes.
After L layers of hypergraph convolution, the embeddings from all layers are averaged to derive the final item representation:
$$X_h = \frac{1}{L+1} \sum_{l=0}^{L} X_h^{(l)}$$
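As a minimal illustration of this simplified propagation rule (weight matrices folded away and nonlinearities removed, as assumed above), the following NumPy sketch applies the matrix-form convolution for L layers and averages the layer outputs; it is a sketch, not the authors' code.

```python
# Minimal sketch of the simplified matrix-form hypergraph convolution (illustrative names).
import numpy as np

def hypergraph_conv(X0, H, D, B, n_layers=3):
    # "node -> hyperedge -> node" propagation with degree normalisation
    prop = np.linalg.inv(D) @ H @ np.linalg.inv(B) @ H.T
    layers = [X0]
    for _ in range(n_layers):
        layers.append(prop @ layers[-1])
    return np.mean(layers, axis=0)                 # average the embeddings of all L+1 layers
```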

3.2.2. Transformer as Encoder

The user’s preferences are hidden in the sequence of items within the session, that is, in the order of the clicked items. Considering that user preferences may change over time, temporal information is added to item embeddings to encode these dynamics. To further enhance the model, a learnable position matrix is introduced:
$$P_r = [p_1, p_2, p_3, \ldots, p_m]$$
Reverse position embeddings [14,38] are used to represent temporal information and are integrated with the learned item embeddings:
$$x_t^{*} = \tanh(W_1 [x_t ; p_t] + b_1)$$
where $W_1 \in \mathbb{R}^{d \times 2d}$ and $b_1 \in \mathbb{R}^{d}$ are learnable parameters, $p_t \in \mathbb{R}^{d}$, and $x_t^{*}$ is the modified item embedding, which contains temporal information.
Inspired by the efficacy of the self-attention mechanism in modeling transfer relationships within sequences for semantic capture [15,52,53,54], the Transformer model is employed for the main recommendation task. Session embeddings are derived by aggregating clicked items within a session, utilizing a multi-head self-attention mechanism to discern item preferences effectively:
$$E_p = [x_1^{*}; x_2^{*}; x_3^{*}; \ldots; x_m^{*}]$$
$$\alpha_m = \frac{(E_p^{l} W_Q^{m})(E_p^{l} W_K^{m})^{\top}}{\sqrt{d/M}}$$
$$\hat{E}_p^{l} = \Big\Vert_{m=1}^{M} \alpha_m E_p^{l} W_V^{m}$$
$$E_p^{l+1} = \mathrm{ReLU}(\hat{E}_p^{l} W_2 + b_2) W_3 + b_3$$
$$\phi_H = \sum_{l=1}^{L} \hat{E}_p^{l}$$
where $\Vert$ denotes the concatenation operation, and L and M represent the total number of multi-head self-attention blocks and the number of attention heads, respectively. $\alpha_m$ is the attention map of the m-th head. $W_Q^{m}, W_K^{m}, W_V^{m} \in \mathbb{R}^{d \times (d/M)}$ are the m-th head's projection matrices for the query, key, and value in the attention mechanism. $W_2, W_3 \in \mathbb{R}^{d \times d}$ and $b_2, b_3 \in \mathbb{R}^{d}$ form a point-wise feed-forward network. Residual connections are used to obtain the final session embedding $\phi_H$.
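A minimal PyTorch sketch of this encoder is given below: reverse positional embeddings are fused into the item embeddings, and a multi-head self-attention block with a point-wise feed-forward network and a residual connection produces the session representation. Module names, dimensions, and the final mean aggregation are illustrative assumptions rather than the exact architecture.

```python
# Minimal PyTorch sketch of the position-aware self-attention session encoder (illustrative names).
import torch
import torch.nn as nn

class SessionEncoder(nn.Module):
    def __init__(self, d=100, max_len=50, heads=4):
        super().__init__()
        self.pos = nn.Embedding(max_len, d)                      # learnable position matrix P_r
        self.fuse = nn.Linear(2 * d, d)                          # W_1, b_1
        self.attn = nn.MultiheadAttention(d, heads, batch_first=True)
        self.ffn = nn.Sequential(nn.Linear(d, d), nn.ReLU(), nn.Linear(d, d))

    def forward(self, item_emb):                                 # item_emb: (batch, m, d)
        m = item_emb.size(1)
        rev = torch.arange(m - 1, -1, -1, device=item_emb.device)      # reverse position indices
        pos = self.pos(rev).expand_as(item_emb)
        x = torch.tanh(self.fuse(torch.cat([item_emb, pos], dim=-1)))  # x_t* = tanh(W_1[x_t; p_t] + b_1)
        h, _ = self.attn(x, x, x)                                # multi-head self-attention
        h = h + self.ffn(h)                                      # point-wise FFN with residual connection
        return h.mean(dim=1)                                     # aggregate into the session embedding
```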

3.2.3. Prediction

The inner product between the session embedding $\phi_H$, derived from the hypergraph, and each item embedding $x_i$ is used to compute the predicted preference scores for all items in the system:
$$z_i = \phi_H^{\top} x_i$$
Afterwards, softmax normalization is used to calculate the probability of all items becoming the next recommended item:
$$\hat{z} = \mathrm{softmax}(z)$$
The cross-entropy loss measures the difference between two probability distributions; in the context of recommendation systems, these are the predicted distribution and the actual (target) distribution. It is used as the optimization objective for the primary supervised task:
$$L_{rec} = -\sum_{i=1}^{|V|} \left[ z_i \log \hat{z}_i + (1 - z_i) \log(1 - \hat{z}_i) \right]$$
where $z_i$ is the i-th element of the one-hot encoding vector of the ground-truth item.
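A minimal PyTorch sketch of the prediction step is shown below. Since the target $z$ is one-hot, the written objective is commonly implemented as a standard multi-class cross-entropy over the softmax scores, which is the convention assumed here; function names are illustrative.

```python
# Minimal sketch of scoring and the recommendation loss (illustrative names).
import torch
import torch.nn.functional as F

def recommendation_loss(phi_H, item_table, target):
    # phi_H: (batch, d) session embeddings; item_table: (n_items, d); target: (batch,) item ids
    scores = phi_H @ item_table.t()                # z_i = phi_H^T x_i for every item i
    return F.cross_entropy(scores, target)         # softmax over all items + cross-entropy

def top_k_items(phi_H, item_table, k=20):
    scores = phi_H @ item_table.t()
    return torch.topk(scores, k, dim=-1).indices   # indices of the K highest-scoring items
```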

3.2.4. Session View and Graph Convolution

Session view convolution involves transforming and encoding the intersection graph of a hypergraph. The intersection graph contains cross-session information, capturing the complex structure of the hypergraph in a more intuitive way. Since items are not directly involved in the session view, a method is required to initialize the embeddings for each session. Specifically, for each session, the embeddings of all involved items are averaged to compute the initial embedding vector Φ for that session. Define graph convolution on the session view as follows:
$$\Phi^{(l+1)} = \hat{D}^{-1} \hat{A} \Phi^{(l)} W^{(l)}$$
where $\hat{A} = A + I$, I is the identity matrix, and A is the adjacency matrix of the intersection graph. According to Definition 2, $A_{p,q} = W_{p,q}$. $\hat{D} \in \mathbb{R}^{M \times M}$ is a diagonal matrix with $\hat{D}_{pp} = \sum_{q=1}^{M} \hat{A}_{p,q}$. $\Phi^{(l)}$ and $W^{(l)}$ represent the session embeddings and the parameter matrix of layer l, respectively.
Similar to hypergraph convolution, the initial session embedding undergoes processing through an L-layer graph convolutional network to capture session-level information. By averaging the embeddings from each of these L layers, the final session representation is obtained.
$$\Phi_s = \frac{1}{L+1} \sum_{l=0}^{L} \Phi_s^{(l)}$$
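The session-view propagation can be sketched analogously. The NumPy snippet below adds self-loops, row-normalizes the adjacency matrix, stacks L propagation layers, and averages their outputs; it assumes the weight matrices are folded away as in the hypergraph view.

```python
# Minimal sketch of the session-view graph convolution (illustrative names).
import numpy as np

def session_view_conv(Phi0, A, n_layers=3):
    A_hat = A + np.eye(A.shape[0])                    # add self-loops
    prop = np.diag(1.0 / A_hat.sum(axis=1)) @ A_hat   # D_hat^-1 A_hat
    layers = [Phi0]
    for _ in range(n_layers):
        layers.append(prop @ layers[-1])
    return np.mean(layers, axis=0)                    # average the L+1 layer outputs
```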

3.2.5. Cross-View Contrastive Learning

In the SBR task, it is assumed that the strongest correlation exists between the last clicked item in a session and the next item. Consequently, the contrastive learning component is designed to maximize the consistency between the representation of the last clicked item and those of its positive samples, while minimizing its consistency with negative samples.
Inspired by other works [55,56,57,58,59], if an item is considered to have a high probability of being the next candidate in one view, it should also be regarded as valuable in another view. According to this idea, consider the selection of positive and negative samples in the session view as an example. Given a session θ in this view, its representation, learned from the hypergraph view, is used to predict its next potential positive and negative item samples. Specifically, items with the highest probability scores—referred to as the Top-K—are chosen from the item sample set as the positive samples. Conversely, top-ranked items that do not make it into the Top-K are randomly selected from both views to serve as negative samples. This cross-view information exchange allows the model to gain a comprehensive understanding of the data. Items that are top-ranked but not within the Top-K serve as challenging negative samples, pushing the model to learn more detailed and complex feature representations, thus improving the model’s ability to make accurate judgments between similar samples. The selection of positive and negative samples for the hypergraph view follows the same methodology.
Based on the defined positive and negative samples, for a given session representation θ and a prediction target, the contrastive loss is formally defined using InfoNCE [59]:
$$L_{con} = -\log \frac{\sum_{i \in F_H^{pos}} e^{s(x_H, x_H^{i})/\tau}}{\sum_{i \in F_H^{pos}} e^{s(x_H, x_H^{i})/\tau} + \sum_{j \in F_H^{neg}} e^{s(x_H, x_H^{j})/\tau} + \sum_{j \in F_S^{neg}} e^{s(x_H, x_{(0)}^{j})/\tau}} - \log \frac{\sum_{i \in F_S^{pos}} e^{s(x_{(0)}, x_{(0)}^{i})/\tau}}{\sum_{i \in F_S^{pos}} e^{s(x_{(0)}, x_{(0)}^{i})/\tau} + \sum_{j \in F_S^{neg}} e^{s(x_{(0)}, x_{(0)}^{j})/\tau} + \sum_{j \in F_H^{neg}} e^{s(x_{(0)}, x_H^{j})/\tau}}$$
where $x_H$ and $x_{(0)}$ are the embeddings of the last clicked item in the given session (from the hypergraph view and the initial embedding table, respectively), $\tau$ is a temperature parameter, $s(\cdot)$ denotes cosine similarity, and $F^{pos}$ and $F^{neg}$ denote the positive and negative sample sets of each view. When selecting Top-K item samples for the session view, the initial embeddings $x_{(0)}$ are used, since the session encoder does not output item embeddings.
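For illustration, the snippet below sketches one direction of this loss in PyTorch (the anchor from one view against its positives and the in-view and cross-view hard negatives); the Top-K positive and hard negative samples are assumed to be selected beforehand, and the full loss sums both directions. Names and tensor shapes are assumptions.

```python
# Minimal sketch of one direction of the cross-view InfoNCE loss (illustrative names).
import torch
import torch.nn.functional as F

def cross_view_infonce(anchor, pos, neg_in_view, neg_cross_view, tau=0.2):
    # anchor: (d,); pos: (k, d) Top-K positives; neg_*: (n, d) hard negatives from each view
    def sim(a, b):
        return F.cosine_similarity(a.unsqueeze(0), b, dim=-1) / tau
    pos_term = torch.exp(sim(anchor, pos)).sum()
    neg_term = torch.exp(sim(anchor, neg_in_view)).sum() + torch.exp(sim(anchor, neg_cross_view)).sum()
    return -torch.log(pos_term / (pos_term + neg_term))

# The full L_con sums this term for the hypergraph-view anchor and the session-view anchor.
```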
In general, cross-view modeling enhances feature representation, where hypergraph theory [60] demonstrates that hyperedges connecting multiple nodes can effectively cover long-tail interactions, thereby compensating for information loss in sparse sessions. In contrast, contrastive learning methods based on random dropout may compromise the semantic integrity of sessions. Specifically, the removal of core interest nodes could lead to the breakage of critical behavioral sequences, while in sparse sessions, random dropout further reduces information density. In BCHRec, hyperedges encompassing multi-node interactions strengthen the representational capacity for sparse data through their ability to capture complex, high-order dependencies.
In terms of hard negative sample design, BCHRec enhances the discriminative power of the model. Contrastive learning approaches that rely on random negative sampling can lead the model to learn only coarse-grained classification boundaries, failing to distinguish similar interests in sparse data. By selecting “high-ranked but non-Top-K” candidates from both views as negative samples, which are semantically close to the positives yet subtly different, our strategy compels the model to learn finer-grained feature boundaries, thereby improving its ability to differentiate similar interests.

3.2.6. Adversarial Training and Constraints

In the proposed framework, both views originate from the same data source and capture rich information across multiple dimensions. However, after multiple iterations, the bi-view encoders may produce highly similar feature representations when processing identical session inputs. This homogenization leads to mutual interference between cross-session learning capabilities and sequential modeling capacities of the session view, ultimately failing to extract unique and meaningful information from their respective perspectives, thereby undermining the core advantage of multi-view modeling. Consequently, it becomes crucial to differentiate the encoders to some extent.
To address this challenge, inspired by [55], we introduce a cross-view adversarial training mechanism. The core idea is to break the encoders’ homogenization tendency by injecting perturbations, forcing them to focus on heterogeneous feature spaces and compelling the model to recalibrate feature representations under adversarial examples. Specifically, if two trained encoders can resist each other’s adversarial attacks while maintaining accurate predictions under these carefully crafted perturbations, this demonstrates their high robustness. To generate such adversarial examples, following [55], we employ the Fast Gradient Sign Method (FGSM) [55,61], which introduces adversarial perturbations into model parameters through rapid gradient computation. The FGSM method generates adversarial samples by leveraging the gradient of the loss function with respect to the input, controlling both the magnitude and direction of perturbations to deceive classifiers. In essence, it creates adversarial examples by adjusting the input values along the gradient sign direction, adding or subtracting a perturbation value to each dimension of the embedding vectors. The perturbation strength is constrained by a step size parameter to ensure adversarial samples remain semantically plausible while effectively misleading classifiers. The calculation and updating of the perturbation are described as follows:
$$x_{adv} = x + r_{adv}, \qquad r_{adv} = \epsilon \frac{g}{\Vert g \Vert}, \qquad g = \nabla_{r}\, l_{adv}(\hat{y} \mid x + r)$$
where $l_{adv}(\hat{y} \mid x + r)$ is the loss on the adversarial sample, and $\epsilon$ is a parameter controlling the perturbation magnitude.
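A minimal PyTorch sketch of this perturbation step, following the formula above (normalized gradient direction scaled by $\epsilon$), might look as follows; the loss callback and parameter names are illustrative.

```python
# Minimal sketch of the gradient-based perturbation r_adv = eps * g / ||g|| (illustrative names).
import torch

def adversarial_perturbation(embeddings, adv_loss_fn, eps=0.5):
    x = embeddings.clone().detach().requires_grad_(True)
    loss = adv_loss_fn(x)                          # l_adv evaluated on the perturbable embeddings
    g = torch.autograd.grad(loss, x)[0]            # gradient of the adversarial loss w.r.t. the input
    r_adv = eps * g / (g.norm() + 1e-12)           # scale the normalised gradient direction by eps
    return (x + r_adv).detach()
```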
At this point, each view has undergone adversarial perturbations, leading to a noticeable dissimilarity between the two encoders. To manage this divergence while preserving a degree of information consistency, the Kullback–Leibler divergence is employed to impose differential constraints on the encoders:
$$L_{KL} = KL\big(P_H(X_H) \,\Vert\, P_S(X_H + r_{adv}^{H})\big) + KL\big(P_S(X_H) \,\Vert\, P_H(X_H + r_{adv}^{S})\big)$$
where $P_H(X_H)$ and $P_S(X_H)$ are calculated as follows:
$$P_H(X_H) = \mathrm{softmax}(X_H \phi_H), \qquad P_S(X_H) = \mathrm{softmax}(X_H \phi_S)$$
where $P_H(\cdot)$ and $P_S(\cdot)$ denote the probabilities that an item is recommended within a given session, and $r_{adv}^{H}$ and $r_{adv}^{S}$ represent the adversarial perturbations applied to the item embeddings for $\phi_H$ and $\phi_S$, respectively. $KL(\cdot)$ denotes the Kullback–Leibler divergence. The term $P_S(X_H + r_{adv}^{H})$ refers to the probability distribution produced by the session encoder when $X_H$ is perturbed by $r_{adv}^{H}$. If the session encoder can effectively handle the adversarial perturbation $r_{adv}^{H}$ introduced by the hypergraph encoder, it will generate a probability distribution similar to $P_H(X_H)$, thus yielding a lower $L_{KL}$ value.
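This constraint can be sketched as follows in PyTorch, assuming the item embeddings, the two session embeddings, and the two perturbations are available; `F.kl_div` expects log-probabilities for its first argument, hence the `log_softmax` calls.

```python
# Minimal sketch of the KL constraint between the two encoders (illustrative names).
import torch
import torch.nn.functional as F

def kl_constraint(X_H, phi_H, phi_S, r_adv_H, r_adv_S):
    p_H = F.softmax(X_H @ phi_H, dim=-1)                          # P_H(X_H), clean hypergraph-view distribution
    p_S = F.softmax(X_H @ phi_S, dim=-1)                          # P_S(X_H), clean session-view distribution
    log_p_S_adv = F.log_softmax((X_H + r_adv_H) @ phi_S, dim=-1)  # P_S(X_H + r_adv^H)
    log_p_H_adv = F.log_softmax((X_H + r_adv_S) @ phi_H, dim=-1)  # P_H(X_H + r_adv^S)
    # F.kl_div(log_q, p) computes KL(p || q)
    return (F.kl_div(log_p_S_adv, p_H, reduction='sum')
            + F.kl_div(log_p_H_adv, p_S, reduction='sum'))
```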

3.2.7. Multi-Task Joint Learning

Finally, the main recommendation task, the contrastive learning task, and the diversity constraint task are integrated into a unified objective:
$$L = L_{rec} + \beta L_{con} + \lambda L_{KL} + \Vert \theta \Vert_F^{2}$$
where β and λ are hyperparameters that control the scale of the contrastive learning and diversity constraint tasks, respectively. The Adam optimizer is used to minimize L .
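A minimal training-step sketch of this joint objective is given below, assuming the model's forward pass returns the three task losses; handling the $\Vert\theta\Vert_F^2$ term through the optimizer's weight decay is an implementation assumption.

```python
# Minimal sketch of one optimisation step over the joint objective (illustrative names).
import torch

def train_step(model, batch, optimizer, beta=0.01, lam=0.005):
    optimizer.zero_grad()
    l_rec, l_con, l_kl = model(batch)              # the three task losses from the forward pass
    loss = l_rec + beta * l_con + lam * l_kl       # L = L_rec + beta*L_con + lambda*L_KL
    loss.backward()
    optimizer.step()
    return loss.item()

# The regularisation term is assumed to be applied via weight decay, e.g.:
# optimizer = torch.optim.Adam(model.parameters(), lr=0.001, weight_decay=1e-5)
```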

4. Experiments

4.1. Experimental Setup

4.1.1. Dataset

The model was evaluated on three real-world benchmark datasets.
Following [36], for all three datasets we retained sessions with at least two user interactions and removed items appearing fewer than five times. This is because recommendation systems rely on contextual information: sessions of length 1 provide no sequential pattern and are therefore ineffective for training sequence-based recommendation models, while infrequent items offer limited utility for training large-scale models. The sessions from the most recent week are used as test data, so each dataset is split into training and test sets along temporal slices. Each session is then divided into multiple shorter subsequences, with the last item of each subsequence labeled as the prediction target (see the sketch below). This preprocessing enhances the model’s learning capability and improves the precision of session-based recommendations. Data statistics are shown in Table 1. Figure 4 and Figure 5 show the distributions of session length and item popularity. From the table and figures, we observe that the average session length in all three datasets is relatively short, with most sessions being short sequences. Additionally, although the number of items is large, the number of interactions is limited, exhibiting a long-tail distribution.
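The sketch below illustrates this preprocessing in plain Python (thresholds follow the text; helper names are illustrative): sessions shorter than two interactions and items appearing fewer than five times are dropped, and each remaining session is expanded into prefix subsequences whose last item is the prediction target.

```python
# Minimal sketch of the session preprocessing (thresholds from the text; names illustrative).
from collections import Counter

def preprocess(sessions, min_len=2, min_item_freq=5):
    item_freq = Counter(item for s in sessions for item in s)
    samples = []
    for s in sessions:
        s = [i for i in s if item_freq[i] >= min_item_freq]   # drop items seen fewer than 5 times
        if len(s) < min_len:                                  # drop sessions with fewer than 2 interactions
            continue
        for t in range(1, len(s)):                            # every prefix predicts its next item
            samples.append((s[:t], s[t]))
    return samples
```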

4.1.2. Baselines

We select ten classic SBR algorithms as baseline models.
  • POP [33]: Recommends items based on their click frequency within sessions.
  • Item-KNN [34]: Recommends items similar to previously clicked ones in a session, based on the cosine similarity of session vectors.
  • FPMC [6]: Combines Markov chains and matrix factorization to recommend the next item based on sequential behavior.
  • GRU4REC [10]: Uses gated recurrent units to model user sequences and optimizes the model using a ranking-based loss function.
  • NARM [11]: Integrates gated recurrent units and attention mechanisms to capture latent user intents and infer user preferences.
  • STAMP [12]: Utilizes a self-attention mechanism, combining long-term interests with the most recent clicks to enhance SBR.
  • FGNN [37]: Frames the next-item recommendation within a session as a graph classification problem, transforming the target session into a directed weighted graph and learning session features using a weighted attention graph structure and readout function.
  • SR-GNN [36]: Applies gated recurrent units and graph convolutional layers to capture transitions between items.
  • DHCN [14]: Constructs hypergraphs to learn inter-session and intra-session information and employs self-distinguishing contrastive learning to enhance SBR.
  • Atten-Mixer [62]: A multi-level attention-mixing network, leveraging readings from both conceptual and instance views to enable hierarchical reasoning for item transitions.

4.1.3. Evaluation Metrics

The recommendation results are evaluated using P@K (Precision) and MRR@K (Mean Reciprocal Rank), where K is set to 10 or 20.
  • Precision@K
Precision@K measures the proportion of relevant items in the top K recommendations, quantifying how many items a user truly cares about appear in the recommended results. The formula is as follows:
$$Precision@K = \frac{R_K}{K}$$
where $R_K$ denotes the number of relevant items in the top K recommendations, and K is the length of the recommendation list.
  • MRR@K (Mean Reciprocal Rank@K)
Mean Reciprocal Rank (MRR) measures the position of the first correct recommendation (relevant item) in a ranked list. It focuses on the rank of the first correct item rather than merely the quantity of relevant items. The higher the MRR value (indicating the first relevant item appears earlier in the list), the better the recommendation quality. The formula is as follows:
$$MRR@K = \frac{1}{N} \sum_{i=1}^{N} \frac{1}{rank_i}$$
where N denotes the total number of test cases (users), and $rank_i$ denotes the rank position of the first relevant item for the i-th user in the recommendation list (if no relevant item appears in the top K, the reciprocal rank is set to 0).
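The two metrics can be computed as in the following sketch; since each test session in SBR has a single ground-truth next item, P@K here reduces to a hit indicator averaged over sessions, which is the convention assumed below.

```python
# Minimal sketch of P@K and MRR@K for single-target test sessions (illustrative names).
def precision_at_k(ranked_lists, targets, k=20):
    hits = sum(1 for ranked, t in zip(ranked_lists, targets) if t in ranked[:k])
    return hits / len(targets)

def mrr_at_k(ranked_lists, targets, k=20):
    total = 0.0
    for ranked, t in zip(ranked_lists, targets):
        if t in ranked[:k]:
            total += 1.0 / (ranked[:k].index(t) + 1)   # reciprocal rank of the first relevant item
    return total / len(targets)                        # 0 contribution when the target is outside top K
```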

4.1.4. Parameter Tuning

The embedding size is set to 100, and the mini-batch size is also set to 100. All parameters are initialized using a Gaussian distribution with a mean of 0 and a standard deviation of 0.1. The model is optimized using the Adam optimizer with a learning rate of 0.001. For baseline models, the best parameter settings reported in the original literature are referenced and their results are utilized.

4.2. Experimental Results

Table 2 reports the overall performance results, with the best result in each column highlighted in bold. BCHRec achieves better performance than the baselines, which demonstrates its effectiveness. In our model, both P@K and MRR@K (abbreviated M@K below) improve over the baseline models, which indicates the following:
  • P@10 ↑: A higher proportion of relevant items appears in the top 10 recommendations, increasing the likelihood that users see effective results during the first round of interaction.
  • P@20 ↑: Better overall quality of the recommendation list. Even when expanding the scope to the top 20 items, the model maintains high accuracy, demonstrating strong broad-coverage capability.
  • M@10 ↑: The average rank of the first relevant item is closer to the top, allowing users to discover points of interest faster and reducing interaction costs.
  • M@20 ↑: Even when extending to the top 20 items, the model efficiently identifies the first relevant item, proving the robustness of its ranking strategy: expanding the recommendation scope does not significantly sacrifice efficiency.
Additionally, from the results, several conclusions can be drawn:
  • Traditional SBR models such as POP and Item-KNN generally have lower accuracy compared to deep learning-based models like GRU4REC, NARM, STAMP, SR-GNN, and DHCN. This is due to the capability of deep learning technologies such as neural networks to capture deep features between items and sessions, while POP and Item-KNN rely on shallow statistics, or low-order sequential patterns, failing to capture long-range dependencies or multi-hop relationships.
  • Methods that incorporate positional information, such as GRU4REC, NARM, STAMP, and SR-GNN, significantly outperform traditional methods like POP and FPMC that do not consider positional information. This is because the inclusion of positional information provides richer contextual information for the SBR system, allowing it to more precisely understand users’ real-time needs and preferences, underscoring the crucial role of capturing the sequential dependencies between items for performance improvement.
  • Additionally, in methods based on RNNs, NARM and STAMP achieve better performance than GRU4REC. Although GRU4REC uses GRU neural networks to process sequence data, it lacks the use of current interest preferences of users. NARM and STAMP, by using attention mechanisms to learn the importance of each item, effectively capture the current interest preferences of users.
  • Graph-based baseline methods like FGNN, SR-GNN, and DHCN outperform RNN-based methods, demonstrating the powerful session feature-learning capability of graph neural networks, which can capture more complex relationships between items. Among them, DHCN achieves higher accuracy than SR-GNN, proving that capturing information at different session levels (inter-session and intra-session information) is beneficial for accurately predicting user intentions.
  • The BCHRec proposed in this paper outperforms all baseline methods on the three datasets, demonstrating the strong performance of hypergraph modeling and cross-view contrastive learning, as well as its effectiveness when applied to e-commerce data. BCHRec outperforms DHCN primarily because DHCN only conducts self-distinguishing contrastive learning and does not fully utilize the information interaction between the two views. Atten-Mixer also uses multiple views to model user intent and achieves the second-best performance among the baseline models, but its utilization of graph structure is insufficient and it also faces data sparsity issues, which limits its performance.

4.3. Ablation Study

To study the contribution of various components within the BCHRec model, three variants of BCHRec were developed: BCHRec-Single, BCHRec-NT, and BCHRec-ND. BCHRec-Single only uses the hypergraph view for data modeling, removing the session view and the cross-view contrastive learning component; BCHRec-NT removes the use of positional information; BCHRec-ND only employs adversarial training without applying diversity constraints. Experiments were conducted on these three variants and the complete BCHRec on the Diginetica and RetailRocket datasets, with results shown in Figure 6.
From Figure 6, it is evident that the performance of each component is consistent across the Diginetica and RetailRocket datasets. The diversity constraints play a significant role in training views, as removing these constraints nearly leads to model collapse. By enforcing the bi-view to learn complementary representations, diversity constraints prevent information redundancy between the views. When such constraints are eliminated, the two views tend to converge to similar feature spaces, causing the model to lose the advantages of multi-view learning (i.e., “model collapse”). This homogenization phenomenon undermines the effectiveness of cross-view contrastive learning, making it challenging for the recommendation system to capture multi-dimensional characteristics of user behavior.
The mutual support between the bi-view is also extremely apparent, demonstrating that intersecting auxiliary views from the same data source can exchange information with the hypergraph view through cross-view contrastive learning, enhancing the recommendation performance of the primary supervision task in the hypergraph view. The bi-view architecture enables knowledge transfer via contrastive learning: the hypergraph view provides global structural information, while the cross-auxiliary view supplements local interaction patterns. Removing this mutual support isolates the two views into independent information silos, preventing feature calibration and enhancement through contrastive loss.
Furthermore, the performance of BCHRec-NT highlights the effectiveness of positional embeddings and self-attention mechanisms, suggesting that considering positional information in SBR helps better connect contextual information, and learning the importance of different items using self-attention mechanisms is more beneficial for recommendation accuracy than merely using the average of item embeddings. Removing positional embeddings would degrade the model’s ability to capture temporal dependencies, weakening its capacity to identify recent user interests.

4.4. Performance Comparison of Different Data Sparsity

We investigated the effectiveness of BCHRec in handling sparse datasets. We downsampled 20% of the training data and evaluated the model on the same test set. The experimental results are shown in Table 3. Even as data sparsity increases, BCHRec still achieves significant performance improvements on both datasets.

4.5. Hyper-Parameter Sensitivities

4.5.1. Effect of Contrastive Loss Weight β

In BCHRec, a hyperparameter β is introduced to control the magnitude of contrastive learning. To investigate the impact of the cross-view contrastive learning task, β is varied within the set {0.005, 0.01, 0.05, 0.1, 0.2, 0.5}. According to the results shown in Figure 7, it is observed that for both datasets, smaller β values can simultaneously improve P@20 and M@20. As β increases, performance gradually declines. Moreover, with an increase in β , the performance of M@20 significantly decreases, suggesting that in some cases, it is important to balance between hit rate and ranking when choosing the value of β . The optimal β value obtained for both datasets is 0.01.

4.5.2. Effect of Diversity Constraint Weight λ

In BCHRec, a hyperparameter λ is introduced to control the magnitude of the diversity constraints. The value of λ is varied within the set {0, 0.001, 0.005, 0.01, 0.05, 0.1, 0.2, 0.5}. According to the results shown in Figure 8, smaller values of λ significantly enhance both P@20 and M@20. However, as λ increases, performance substantially decreases and may even lead to model collapse. This suggests that a large difference between the two view encoders hinders the model’s ability to learn complementary and effective feature representations from each view, resulting in the loss of information useful for the overall task. The optimal λ value obtained is 0.005 for the RetailRocket dataset and 0.001 for the Diginetica dataset.

4.5.3. Effect of Model Depth L

To examine the influence of the depth of the hypergraph convolutional network, the range of network layers was set to {1, 2, 3, 4, 5}. According to the results shown in Figure 9, setting the depth to three layers exhibited the best performance for both the RetailRocket and Diginetica datasets. As the number of layers increased, performance declined due to the oversmoothing problem.

4.5.4. Effect of Transformer Module Depth L T

In order to investigate the impact of the depth of the Transformer module, the range of network layers was set to {1, 2, 3}. According to the results shown in Figure 10, setting the depth of the Transformer module to one layer exhibited the best performance on the Diginetica and RetailRocket datasets. This suggests that for the current tasks, a complexity of one layer is sufficient to capture the required data patterns and relationships. As the number of layers increased, the model suffered from overfitting due to excessive capacity, resulting in decreased performance.

5. Conclusions

This paper primarily explores the enhancement of SBR using a hypergraph-guided cross-view contrastive learning paradigm. The proposed BCHRec consists mainly of a hypergraph structure encoder and a regular graph encoder. It models session data as a hypergraph to swiftly capture the high-order complexity of items, utilizes two views to model intra-session and inter-session relationships, and employs cross-view contrastive learning and diversity constraints as auxiliary tasks to enrich the information within sessions. Extensive experiments confirm the superiority of BCHRec compared to competitive baselines.
To demonstrate its broader impact, we plan to extend the framework in future work:
  • Integrating multi-modal data, such as item images or textual descriptions, to enhance recommendation accuracy;
  • Adapting the hypergraph-based approach to other domains, such as social recommendation or knowledge graph completion, where complex relationships are prevalent.
These extensions highlight the versatility and practical potential of our work.

Author Contributions

Conceptualization, Z.W.; methodology, Z.W.; software, Z.W.; validation, Z.W.; formal analysis, Z.W.; investigation, Z.W.; resources, Z.W.; data curation, Z.W.; writing—original draft preparation, Z.W.; writing—review and editing, Z.W. and L.W.; visualization, Z.W.; supervision, L.W. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The public datasets used in this study are listed in Section 4.1.1 of the manuscript. The code for this study is available in the following repository: https://github.com/violetingithub/paper/tree/main/BCH, accessed on 16 May 2024.

Acknowledgments

The authors thank the anonymous reviewers for their valuable suggestions and our universities for facilitating our time support in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Li, W.; Zheng, W.; Xiao, X.; Wang, S. STAN: Stage-Adaptive Network for Multi-Task Recommendation by Learning User Lifecycle-Based Representation. In Proceedings of the 17th ACM Conference on Recommender Systems, Singapore, 18–22 September 2023; pp. 602–612. [Google Scholar]
  2. Mao, K.; Zhu, J.; Xiao, X.; Lu, B.; Wang, Z.; He, X. UltraGCN: Ultra simplification of graph convolutional networks for recommendation. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Online, 1–15 November 2021; pp. 1253–1262. [Google Scholar]
  3. Wang, X.; He, X.; Wang, M.; Feng, F.; Chua, T.S. Neural graph collaborative filtering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Paris, France, 21–25 July 2019; pp. 165–174. [Google Scholar]
  4. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, China, 1–5 May 2001; pp. 285–295. [Google Scholar]
  5. Bonnin, G.; Jannach, D. Automated generation of music playlists: Survey and experiments. Acm Comput. Surv. (CSUR) 2014, 47, 1–35. [Google Scholar] [CrossRef]
  6. Rendle, S.; Freudenthaler, C.; Schmidt-Thieme, L. Factorizing personalized markov chains for next-basket recommendation. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010; pp. 811–820. [Google Scholar]
  7. Shani, G.; Heckerman, D.; Brafman, R.I.; Boutilier, C. An MDP-based recommender system. J. Mach. Learn. Res. 2005, 6, 1265–1295. [Google Scholar]
  8. Zimdars, A.; Chickering, D.M.; Meek, C. Using temporal data for making recommendations. arXiv 2013, arXiv:1301.2320. [Google Scholar]
  9. Tavakol, M.; Brefeld, U. Factored MDPs for detecting topics of user sessions. In Proceedings of the 8th ACM Conference on Recommender Systems, Foster City, CA, USA, 6–10 October 2014; pp. 33–40. [Google Scholar]
  10. Hidasi, B.; Karatzoglou, A.; Baltrunas, L.; Tikk, D. Session-based recommendations with recurrent neural networks. arXiv 2015, arXiv:1511.06939. [Google Scholar]
  11. Li, J.; Ren, P.; Chen, Z.; Ren, Z.; Lian, T.; Ma, J. Neural attentive session-based recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 1419–1428. [Google Scholar]
  12. Liu, Q.; Zeng, Y.; Mokhosi, R.; Zhang, H. STAMP: Short-term attention/memory priority model for session-based recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1831–1839. [Google Scholar]
  13. Yu, F.; Liu, Q.; Wu, S.; Wang, L.; Tan, T. A dynamic recurrent model for next basket recommendation. In Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval, Pisa, Italy, 17–21 July 2016; pp. 729–732. [Google Scholar]
  14. Xia, X.; Yin, H.; Yu, J.; Wang, Q.; Cui, L.; Zhang, X. Self-supervised hypergraph convolutional networks for session-based recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; pp. 4503–4511. [Google Scholar]
  15. Xu, C.; Zhao, P.; Liu, Y.; Sheng, V.S.; Xu, J.; Zhuang, F.; Fang, J.; Zhou, X. Graph contextualized self-attention network for session-based recommendation. In Proceedings of the International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; Volume 19, pp. 3940–3946. [Google Scholar]
  16. Chen, T.; Wong, R.C.W. Handling information loss of graph neural networks for session-based recommendation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; pp. 1172–1180. [Google Scholar]
  17. Pan, Z.; Cai, F.; Chen, W.; Chen, H.; De Rijke, M. Star graph neural networks for session-based recommendation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual, 19–23 October 2020; pp. 1195–1204. [Google Scholar]
  18. Feng, Y.; You, H.; Zhang, Z.; Ji, R.; Gao, Y. Hypergraph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 3558–3565. [Google Scholar]
  19. Yadati, N.; Nimishakavi, M.; Yadav, P.; Nitin, V.; Louis, A.; Talukdar, P. Hypergcn: A new method for training graph convolutional networks on hypergraphs. Adv. Neural Inf. Process. Syst. 2019, 32, 1511–1522. [Google Scholar]
  20. Yang, Y.; Huang, C.; Xia, L.; Liang, Y.; Yu, Y.; Li, C. Multi-behavior hypergraph-enhanced transformer for sequential recommendation. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 14–18 August 2022; pp. 2263–2274. [Google Scholar]
  21. Ji, S.; Feng, Y.; Ji, R.; Zhao, X.; Tang, W.; Gao, Y. Dual channel hypergraph collaborative filtering. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; pp. 2020–2029. [Google Scholar]
  22. Yu, J.; Yin, H.; Li, J.; Wang, Q.; Hung, N.Q.V.; Zhang, X. Self-supervised multi-channel hypergraph convolutional network for social recommendation. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 413–424. [Google Scholar]
  23. Tan, S.; Bu, J.; Chen, C.; Xu, B.; Wang, C.; He, X. Using rich social media information for music recommendation via hypergraph model. ACM Trans. Multimed. Comput. Commun. Appl. (TOMM) 2011, 7, 1–22. [Google Scholar]
  24. Jing, M.; Zhu, Y.; Zang, T.; Wang, K. Contrastive self-supervised learning in recommender systems: A survey. ACM Trans. Inf. Syst. 2023, 42, 1–39. [Google Scholar] [CrossRef]
  25. Sun, F.Y.; Hoffmann, J.; Verma, V.; Tang, J. Infograph: Unsupervised and semi-supervised graph-level representation learning via mutual information maximization. arXiv 2019, arXiv:1908.01000. [Google Scholar]
  26. You, Y.; Chen, T.; Sui, Y.; Chen, T.; Wang, Z.; Shen, Y. Graph contrastive learning with augmentations. Adv. Neural Inf. Process. Syst. 2020, 33, 5812–5823. [Google Scholar]
  27. Wu, J.; Wang, X.; Feng, F.; He, X.; Chen, L.; Lian, J.; Xie, X. Self-supervised graph learning for recommendation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, 11–15 July 2021; pp. 726–735. [Google Scholar]
  28. Xu, J.; Wang, C.; Wu, C.; Song, Y.; Zheng, K.; Wang, X.; Wang, C.; Zhou, G.; Gai, K. Multi-behavior self-supervised learning for recommendation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, China, 23–27 July 2023; pp. 496–505. [Google Scholar]
  29. Shuai, J.; Zhang, K.; Wu, L.; Sun, P.; Hong, R.; Wang, M.; Li, Y. A review-aware graph contrastive learning framework for recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 1283–1293. [Google Scholar]
  30. He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.S. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 173–182. [Google Scholar]
  31. Wang, J.; Caverlee, J. Recurrent recommendation with local coherence. In Proceedings of the 12th ACM International Conference on Web Search and Data Mining, Melbourne, Australia, 11–15 February 2019; pp. 564–572. [Google Scholar]
  32. Wang, J.; Louca, R.; Hu, D.; Cellier, C.; Caverlee, J.; Hong, L. Time to Shop for Valentine’s Day: Shopping Occasions and Sequential Recommendation in E-commerce. In Proceedings of the 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; pp. 645–653. [Google Scholar]
  33. Adomavicius, G.; Tuzhilin, A. Toward the next generation of recommender systems: A survey of the state-of-the-art and possible extensions. IEEE Trans. Knowl. Data Eng. 2005, 17, 734–749. [Google Scholar] [CrossRef]
  34. Davidson, J.; Liebald, B.; Liu, J.; Nandy, P.; Van Vleet, T.; Gargi, U.; Gupta, S.; He, Y.; Lambert, M.; Livingston, B.; et al. The YouTube video recommendation system. In Proceedings of the 4th ACM Conference on Recommender Systems, Barcelona, Spain, 26–30 September 2010; pp. 293–296. [Google Scholar]
  35. Yin, H.; Cui, B. Spatio-Temporal Recommendation in Social Media; Springer: Berlin/Heidelberg, Germany, 2016. [Google Scholar]
  36. Wu, S.; Tang, Y.; Zhu, Y.; Wang, L.; Xie, X.; Tan, T. Session-based recommendation with graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 346–353. [Google Scholar]
  37. Qiu, R.; Li, J.; Huang, Z.; Yin, H. Rethinking the item order in session-based recommendation with graph neural networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 579–588. [Google Scholar]
  38. Wang, Z.; Wei, W.; Cong, G.; Li, X.L.; Mao, X.L.; Qiu, M. Global context enhanced graph neural networks for session-based recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Xi’an, China, 25–30 July 2020; pp. 169–178. [Google Scholar]
  39. Su, J.; Chen, C.; Liu, W.; Wu, F.; Zheng, X.; Lyu, H. Enhancing hierarchy-aware graph networks with deep dual clustering for session-based recommendation. In Proceedings of the ACM Web Conference 2023, Austin, TX, USA, 30 April–4 May 2023; pp. 165–176. [Google Scholar]
  40. Qiao, S.; Zhou, W.; Wen, J.; Zhang, H.; Gao, M. Bi-channel Multiple Sparse Graph Attention Networks for Session-based Recommendation. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, Birmingham, UK, 21–25 October 2023; pp. 2075–2084. [Google Scholar]
  41. Chen, Q.; Guo, Z.; Li, J.; Li, G. Knowledge-enhanced multi-view graph neural networks for session-based recommendation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, China, 23–27 July 2023; pp. 352–361. [Google Scholar]
  42. Veličković, P.; Fedus, W.; Hamilton, W.L.; Liò, P.; Bengio, Y.; Hjelm, R.D. Deep graph infomax. arXiv 2018, arXiv:1809.10341. [Google Scholar]
  43. Yang, Y.; Wu, Z.; Wu, L.; Zhang, K.; Hong, R.; Zhang, Z.; Zhou, J.; Wang, M. Generative-contrastive graph learning for recommendation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, China, 23–27 July 2023; pp. 1117–1126. [Google Scholar]
  44. Zhao, S.; Wei, W.; Mao, X.L.; Zhu, S.; Yang, M.; Wen, Z.; Chen, D.; Zhu, F. Multi-view hypergraph contrastive policy learning for conversational recommendation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, China, 23–27 July 2023; pp. 654–664. [Google Scholar]
  45. Wan, Z.; Liu, X.; Wang, B.; Qiu, J.; Li, B.; Guo, T.; Chen, G.; Wang, Y. Spatio-temporal contrastive learning-enhanced GNNs for session-based recommendation. ACM Trans. Inf. Syst. 2023, 42, 1–26. [Google Scholar]
  46. Dong, H.V.; Fang, Y.; Lauw, H.W. A Contrastive Framework with User, Item and Review Alignment for Recommendation. arXiv 2025, arXiv:2501.11963. [Google Scholar]
  47. Wu, J.; Gan, W.; Lu, H.; Yu, P.S. Graph Contrastive Learning on Multi-label Classification for Recommendations. arXiv 2025, arXiv:2501.06985. [Google Scholar]
  48. Benson, A.R.; Gleich, D.F.; Leskovec, J. Higher-order organization of complex networks. Science 2016, 353, 163–166. [Google Scholar] [CrossRef] [PubMed]
  49. Whitney, H. Congruent graphs and the connectivity of graphs. In Hassler Whitney Collected Papers; Birkhäuser: Boston, MA, USA, 1992; pp. 61–79. [Google Scholar]
  50. Bai, S.; Zhang, F.; Torr, P.H. Hypergraph convolution and hypergraph attention. Pattern Recognit. 2021, 110, 107637. [Google Scholar] [CrossRef]
  51. Wu, F.; Souza, A.; Zhang, T.; Fifty, C.; Yu, T.; Weinberger, K. Simplifying graph convolutional networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6861–6871. [Google Scholar]
  52. Kang, W.C.; McAuley, J. Self-attentive sequential recommendation. In Proceedings of the 2018 IEEE International Conference on Data Mining (ICDM), IEEE, Singapore, 17–20 November 2018; pp. 197–206. [Google Scholar]
  53. Ye, Y.; Xia, L.; Huang, C. Graph masked autoencoder for sequential recommendation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, Taipei, China, 23–27 July 2023; pp. 321–330. [Google Scholar]
  54. Sun, F.; Liu, J.; Wu, J.; Pei, C.; Lin, X.; Ou, W.; Jiang, P. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 1441–1450. [Google Scholar]
  55. Xia, X.; Yin, H.; Yu, J.; Shao, Y.; Cui, L. Self-supervised graph co-training for session-based recommendation. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Gold Coast, Australia, 1–5 November 2021; pp. 2180–2190. [Google Scholar]
  56. Xia, L.; Huang, C.; Xu, Y.; Zhao, J.; Yin, D.; Huang, J. Hypergraph contrastive collaborative filtering. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 70–79. [Google Scholar]
  57. Zou, D.; Wei, W.; Mao, X.L.; Wang, Z.; Qiu, M.; Zhu, F.; Cao, X. Multi-level cross-view contrastive learning for knowledge-aware recommender system. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 1358–1368. [Google Scholar]
  58. Zhu, Y.; Xu, Y.; Yu, F.; Liu, Q.; Wu, S.; Wang, L. Graph contrastive learning with adaptive augmentation. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 2069–2080. [Google Scholar]
  59. Zhu, Y.; Xu, Y.; Yu, F.; Liu, Q.; Wu, S.; Wang, L. Deep graph contrastive representation learning. arXiv 2020, arXiv:2006.04131. [Google Scholar]
  60. Zhou, D.; Huang, J.; Schölkopf, B. Learning with hypergraphs: Clustering, classification, and embedding. Adv. Neural Inf. Process. Syst. 2006, 19, 1601–1608. [Google Scholar]
  61. Goodfellow, I.J.; Shlens, J.; Szegedy, C. Explaining and harnessing adversarial examples. arXiv 2014, arXiv:1412.6572. [Google Scholar]
  62. Zhang, P.; Guo, J.; Li, C.; Xie, Y.; Kim, J.B.; Zhang, Y.; Xie, X.; Wang, H.; Kim, S. Efficiently leveraging multi-level user intent for session-based recommendation via atten-mixer network. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, Virtual, 27 February–3 March 2023; pp. 168–176. [Google Scholar]
Figure 1. Overall framework of the proposed BCHRec model, which includes hypergraph convolution, session graph construction, Transformer modeling, cross-view contrastive learning, and adversarial training.
Figure 2. The hypergraph models item associations within sessions through hyperedges. For instance, items v 3 , v 5 , v 6 , and v 7 in session s 3 are connected by a green hyperedge to represent their co-occurrence. By encoding session interactions into a hypergraph structure, the model explicitly characterizes high-order item relationships: hyperedges capture latent semantic connections between items, thereby helping to infer users' potential interests.
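Since each session in Figure 2 is encoded as one hyperedge over the items it contains, the hypergraph can be stored compactly as an item-by-session incidence matrix. The snippet below is a minimal sketch of that construction; the toy sessions other than s 3 and the use of scipy are our own assumptions for illustration, not the paper's implementation.

```python
# Minimal sketch: build an item x hyperedge incidence matrix H from sessions,
# where H[i, e] = 1 if item v_(i+1) occurs in session s_(e+1).
import numpy as np
from scipy.sparse import csr_matrix

# Toy sessions in the spirit of Figure 2; only s3 = {v3, v5, v6, v7} is given
# in the caption, the remaining sessions are invented for illustration.
sessions = {
    "s1": [1, 2, 3],
    "s2": [2, 4],
    "s3": [3, 5, 6, 7],   # connected by one (green) hyperedge
    "s4": [6, 7, 8],
}
n_items = 8  # items v1 ... v8

rows, cols = [], []
for e, items in enumerate(sessions.values()):
    for v in set(items):      # co-occurrence only; repeated clicks count once
        rows.append(v - 1)    # 0-based item index
        cols.append(e)        # hyperedge index = session index
H = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(n_items, len(sessions)))

print(H.toarray())  # each column is one hyperedge linking all items of that session
```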
Figure 3. In session graph construction, hyperedges in the hypergraph are mapped to nodes in the session graph. For example, the green hyperedge representing session s 3 (which includes items v 3 , v 5 , v 6 , and v 7 ) is transformed into node e 3 in the session graph. Through this structural mapping, the session graph explicitly models high-order inter-session relationships. Nodes e 1 to e 4 correspond to sessions s 1 to s 4 in the original hypergraph.
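One standard way to realize this hyperedge-to-node mapping is to take the line (intersection) graph of the hypergraph: two session nodes are linked whenever their hyperedges share at least one item. The sketch below, which rebuilds the toy incidence matrix from the previous snippet so that it runs on its own, is our illustration of that reading rather than the authors' implementation.

```python
# Session-graph adjacency from an item x session incidence matrix H:
# S = H^T H counts the items shared by every pair of sessions, so zeroing the
# diagonal and thresholding yields a graph whose nodes e_k are the hyperedges.
import numpy as np
from scipy.sparse import csr_matrix

sessions = {"s1": [1, 2, 3], "s2": [2, 4], "s3": [3, 5, 6, 7], "s4": [6, 7, 8]}
rows = [v - 1 for items in sessions.values() for v in sorted(set(items))]
cols = [e for e, items in enumerate(sessions.values()) for _ in set(items)]
H = csr_matrix((np.ones(len(rows)), (rows, cols)), shape=(8, len(sessions)))

S = (H.T @ H).toarray()          # S[a, b] = number of items shared by sessions a and b
np.fill_diagonal(S, 0)           # drop self-loops
A_sess = (S > 0).astype(float)   # e3 links to e1 (shared v3) and to e4 (shared v6, v7)

print(A_sess)
```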
Figure 4. Distribution of session length.
Figure 5. Distribution of item popularity.
Figure 6. Ablation study.
Figure 7. Effect of contrastive learning weight β .
Figure 8. Effect of diversity constraint weight λ .
Figure 9. Effect of model depth L.
Figure 10. Effect of Transformer module depth L T .
Table 1. Statistics of the experimental datasets.
Dataset                | Tmall   | Diginetica | RetailRocket
# of training sessions | 351,268 | 719,470    | 433,643
# of test sessions     | 25,898  | 60,858     | 15,132
# of clicks            | 818,479 | 982,961    | 1,331,815
# of items             | 40,728  | 43,097     | 36,968
Sparsity               | 99.98%  | 99.99%     | 99.99%
Avg. length            | 6.69    | 5.12       | 5.43
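Most of the quantities in Table 1 (session counts, click counts, item counts, and average session length) can be recomputed directly from the preprocessed session lists; the sparsity figure depends on the exact definition used and is left aside here. A minimal sketch on toy data, with the input format being our own assumption:

```python
# Recompute Table-1-style statistics from lists of item-ID sessions (toy data).
from statistics import mean

train_sessions = [[1, 2, 3], [2, 4], [3, 5, 6, 7]]
test_sessions = [[6, 7, 8]]
all_sessions = train_sessions + test_sessions

stats = {
    "# of training sessions": len(train_sessions),
    "# of test sessions": len(test_sessions),
    "# of clicks": sum(len(s) for s in all_sessions),
    "# of items": len({v for s in all_sessions for v in s}),
    "avg. length": round(mean(len(s) for s in all_sessions), 2),
}
for name, value in stats.items():
    print(f"{name}: {value}")
```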
Table 2. Performance comparison with baselines on different datasets. The best result in each row is highlighted in bold; * denotes a statistically significant difference (p < 0.01) between BCHRec and the baseline methods.
Dataset      | Metric | POP  | Item-KNN | FPMC  | GRU4REC | NARM  | STAMP | FGNN  | SR-GNN | DHCN  | Atten-Mixer | BCHRec
Tmall        | P@10   | 1.67 | 6.65     | 13.10 | 9.47    | 19.17 | 22.63 | 20.67 | 23.41  | 25.14 | 31.79       | 32.02 *
             | M@10   | 0.88 | 3.11     | 7.12  | 5.78    | 10.42 | 13.12 | 10.07 | 13.45  | 13.91 | 18.15       | 18.20 *
             | P@20   | 2.00 | 9.15     | 16.06 | 10.93   | 23.30 | 26.47 | 25.24 | 27.57  | 30.43 | 37.43       | 38.53 *
             | M@20   | 0.90 | 3.31     | 7.32  | 5.89    | 10.70 | 13.36 | 10.39 | 13.72  | 14.26 | 18.44       | 18.62 *
Diginetica   | P@10   | 0.76 | 25.07    | 15.43 | 17.93   | 35.44 | 33.98 | 37.72 | 36.86  | 39.68 | 40.31       | 41.43 *
             | M@10   | 0.26 | 10.77    | 6.20  | 7.33    | 15.13 | 14.26 | 15.95 | 15.52  | 17.42 | 17.04       | 18.07 *
             | P@20   | 0.89 | 35.75    | 26.53 | 29.45   | 49.70 | 45.64 | 50.58 | 50.73  | 52.99 | 54.37       | 54.93 *
             | M@20   | 0.20 | 11.57    | 6.95  | 8.33    | 16.17 | 14.32 | 16.84 | 17.59  | 18.34 | 18.14       | 18.86 *
RetailRocket | P@10   | 1.72 | 20.68    | 25.99 | 38.35   | 42.07 | 42.95 | 43.75 | 43.21  | 48.33 | 48.63       | 49.62 *
             | M@10   | 0.69 | 4.29     | 13.38 | 23.27   | 24.88 | 24.61 | 26.11 | 26.07  | 28.59 | 27.95       | 29.67 *
             | P@20   | 1.97 | 10.23    | 32.37 | 44.01   | 50.22 | 50.96 | 50.99 | 50.32  | 56.00 | 56.66       | 57.73 *
             | M@20   | 0.75 | 4.56     | 13.82 | 23.67   | 24.59 | 25.17 | 26.21 | 26.57  | 29.11 | 28.52       | 30.26 *
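Tables 2 and 3 report P@K and M@K. Assuming these follow the usual session-based recommendation convention, with P@K the proportion of test sessions whose ground-truth next item appears in the top-K list and M@K the mean reciprocal rank truncated at K, the quantities being compared are:

```latex
% Assumed metric definitions over N test sessions, where rank_i is the position of the
% ground-truth next item of session i in the ranked item list.
P@K = \frac{1}{N}\sum_{i=1}^{N} \mathbf{1}\!\left(\mathrm{rank}_i \le K\right),
\qquad
M@K \;(\text{MRR@}K) = \frac{1}{N}\sum_{i=1}^{N} \frac{\mathbf{1}\!\left(\mathrm{rank}_i \le K\right)}{\mathrm{rank}_i}.
```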
Table 3. Performance comparison under different levels of data sparsity.
20% Tmall dataset
Method | P@10  | M@10  | P@20  | M@20
DHCN   | 23.75 | 13.04 | 29.35 | 12.77
BCHRec | 31.42 | 17.83 | 37.69 | 17.33

20% RetailRocket dataset
Method | P@10  | M@10  | P@20  | M@20
DHCN   | 47.06 | 27.56 | 55.23 | 28.31
BCHRec | 48.76 | 28.55 | 56.43 | 28.92
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
