Article

Multi-Granularity and Multi-Interest Contrast-Enhanced Hypergraph Convolutional Networks for Session Recommendation

Department of Computer Science and Engineering, Chongqing University of Technology, Chongqing 400000, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(18), 8293; https://doi.org/10.3390/app14188293
Submission received: 30 July 2024 / Revised: 5 September 2024 / Accepted: 11 September 2024 / Published: 14 September 2024

Abstract

Aiming at the problems of incomplete recommendations and the sparsity of session data in session recommendation, a new multi-granularity and multi-interest contrast-enhanced hypergraph convolutional network (MGMI-CEHCN) model for session recommendation is proposed. The session data are modeled as a heterogeneous hypergraph; information is embedded at two granularities, item price and category; the information from the different granularities is fused; and the final item embedding is obtained through multi-layer convolution. An interest perceptron then detects multiple potential interests for each item, a decentralized interest extraction network based on a gated recurrent unit (GRU) integrates the user's final interests, and a global session representation is obtained through a soft attention mechanism; a local session representation is generated with the help of a weighted line graph convolutional network. A joint contrast enhancement strategy is further used to maximize the mutual information between the global and local session representations, to improve the recommendation performance. Experiments on several real datasets showed that the recommendation performance of the MGMI-CEHCN model outperformed current mainstream models. On Cosmetics, P@20 reached 55.25% and M@20 reached 38.26%, improvements of 3.06% and 3.09%, respectively; on Diginetica-buy, P@20 reached 65.60% and M@20 reached 27.47%, improvements of 2.47% and 6.64%, respectively, which demonstrates the validity of the model.

1. Introduction

Session-based recommendation is a typical application of recommender systems. Early exploration of session recommendation systems mainly focused on sequence modeling, chiefly item-based neighborhood methods [1] and Markov-chain-based sequence methods [2]. Although the item-based approach produces recommendations from an item similarity matrix, it ignores the order between items. To address this, the Markov-chain-based approach uses the sequential behavior between two adjacent clicks to model a user's short-term preferences [3], but it cannot capture long-term preferences. Zhan et al. [4] combined the two to avoid the dependence on local sequence information. Deep-learning-based models mainly include recurrent neural networks (RNNs) and their variants, attention mechanisms, and graph neural networks. Hidasi et al. [5] proposed the gated recurrent unit for recommendation (GRU4Rec), using recurrent neural networks (RNNs) to model click-based sessions for the first time. Hidasi et al. [6] proposed a rich feature-oriented session recommendation model based on recurrent neural networks, the parallel RNN model (P-RNN), addressing the deficiency of GRU4Rec, which only models item identifiers and fails to make full use of rich item features (pictures, text, etc.). Bogina et al. [7] applied dwell time (the interaction interval between users and items) to RNN-based session recommendation, proposing a dwell-time-based RNN that takes the interaction time between the user and the item as the key feature for extracting the user's interest preferences; this is well suited to the currently popular field of short-video recommendation. To solve the same problem, Sun et al. [8] proposed a new model structure, a session recommendation model (TA4Rec) based on recurrent neural networks with a temporal attention factor.
TA4Rec calculates an attention factor based on the user's dwell time on an item and adds it to a gated recurrent unit (GRU) network to improve model performance. Ruocco et al. [9] proposed the II-RNN (Inter-Intra RNN) model, which combines modeling within a single session with modeling across historical sessions, to solve the problem that users' interest preferences cannot be obtained at the beginning of each session, when the item sequence is still absent. Quadrana et al. [10] proposed a hierarchical recurrent neural network (HRNN) for personalized session recommendation, addressing existing models that consider only item identifiers and characteristics while ignoring user identity, by adding user-level GRUs to GRU4Rec. Phuong et al. [11] proposed learning embedded representations of users and items from user-item interaction data, combining the global user embedding with the item sequence in the session, and generating the next recommendation through an attentive GRU (AGRU)-type structure. Song et al. [12] extended RNN-based session recommendation with the context preferences of higher-level users and proposed an augmented RNN (ARNN), whose core structure is a product-based neural network (PNN) [13] that captures the high-level interactions between the user context and the item to simulate the user's context preferences and can enhance the effect of existing RNN models. However, these models still ignore the user preferences contained in the sequence and do not make full use of the available information. Combining the RNN model with an attention mechanism further improves prediction accuracy. The NARM (neural attentive recommendation machine) model was designed as an encoder that blends an RNN and an attention mechanism to model the sequential behavior of users [14].
Session-based recommendation with graph neural network (SR-GNN) [15] was the first model to apply graph neural networks to session-based recommendation methods. SR-GNN constructs a sequence of sessions as a graph structure containing only one type of node, and uses a GNN to capture the complex transition relationships between items and to generate a vector of potential item representations with better expressive power.
With the successful application of the SR-GNN model, several GNN variants have also achieved good performance. Among them, the FGNN model (rethinking the item order in session-based recommendation with graph neural networks) proposed by Qiu et al. [16] compensates for SR-GNN by considering transformation patterns of specific items within the session. Although the above methods have achieved good performance in SBR, they still fail to capture complex high-order relationships between items. Hypergraphs can capture such higher-order item relationships to a certain extent [17]. A hypergraph neural network (HGNN) is used to learn the embedded representation of items to mine higher-order information. Xia et al. [18] constructed a set of sessions as a hypergraph and used a dual-channel hypergraph convolutional network (DHCN). The DHCN constructs hypergraphs to capture higher-order dependencies between items and is able to learn complex transformation relationships between items, thus improving the accuracy of recommendations. Although the above methods have achieved good performance, most of them focus on the complex structure and transformation relationships between items or higher-order dependencies among multiple items, neglecting the importance of the granular information of item price and category.
STAMP (short-term attention/memory priority) utilizes an MLP network and attention mechanism to capture users' general and current interests [19]. Guo et al. [20] first introduced time interval information into session-based recommendation and proposed a time-aware graph neural network (TA-GNN), which takes the time factor as a key feature for learning user interest preferences. These methods only model the user's primary interests in a session and ignore secondary interests. In general, however, each item in a session reflects one of the user's distinct interests. If two items share similar interests, they are usually considered relevant; conversely, if their interests differ, they may be less relevant to each other. Modeling the multiple potential interests contained in each session's items therefore captures the correlation between items more accurately. In addition, the problem of sample sparsity still hinders the modeling of user-item interactions. In recent years, contrastive learning has gradually been adopted as an emerging class of methods to address data sparsity in session recommendation [18,21,22].
A multi-granularity and multi-interest contrast-enhanced hypergraph convolutional network model (MGMI-CEHCN) for session recommendation is proposed. The main contributions of this work are as follows:
  • A heterogeneous hypergraph-based approach is proposed to embed the category and price information of an item as auxiliary information, enhance the item representation with multi-granularity information, and then better capture the complex dependencies between nodes through a hypergraph convolutional network.
  • In MGMI-CEHCN, an interest perceptron is used to detect multiple potential interests for each item, a decentralized interest extraction network is used to integrate the user’s final interests, and a global session representation is obtained through a soft attention mechanism.
  • To alleviate the sparsity of the data, the global session representation and the local session representation are jointly compared, and supervised signals are built for both, to optimize the training, achieve more effective mining of deeper associations between items, and increase the model robustness and interpretability.
The remainder of this paper is structured as follows: In Section 2, we present a literature review of session-based recommendation research. Section 3 presents a model of multi-granularity and multi-interest contrast-enhanced hypergraph convolutional networks for session-based recommendation. In Section 4, we offer an in-depth exploration of the experimental design, the dataset employed, the evaluation metrics, a comparative analysis of the experimental results with a baseline method, a discussion of the findings within the framework of existing studies, ablation experiments, and a hyperparameter analysis. Section 5 encapsulates the research presented in this paper, highlighting the main points and proposing recommendations for future endeavors.

2. Related Work

2.1. Traditional Session Recommendation Methods

Traditional recommendation algorithms in machine learning, such as methods based on matrix factorization (MF) [23] and collaborative filtering (CF) [24], have been widely used in many recommendation systems. However, for the task of session recommendation, traditional matrix factorization methods are not very suitable. Session recommendation aims to recommend appropriate information or content for a user based on his/her behavior during a specific session. Traditional matrix factorization methods are usually based on the interaction behaviors between users and items, whereas, in session recommendation scenarios, user behaviors are often transient and fragmented, and a complete interaction matrix cannot be constructed. Collaborative filtering algorithms make recommendations based on the similarity between users or items and are divided into two categories: user-based collaborative filtering and item-based collaborative filtering. The former recommends by calculating the similarity between users, while the latter recommends by calculating the similarity between items. These methods have been widely used in the recommendation systems of Netflix, Amazon, YouTube, and others [25]. However, collaborative filtering algorithms have some limitations in calculating the similarity between users, due to the lack of user information in session recommendation. In addition, collaborative filtering algorithms often fail to fully utilize item feature information, and item-based collaborative filtering only calculates similarity from the number of co-occurrences of items in a session, ignoring the sequential relationships between items. Before deep learning became widely used, Markov chain (MC)-based methods were the mainstream session recommendation algorithms. A Markov chain is a stochastic process that assumes the current state depends only on the previous state.
A Markov-chain-based approach predicts the user’s subsequent behavior by building a state transfer matrix of the user’s behavioral sequence, and thus recommends items with the highest probability to the user. Rendle et al. [26] combined a matrix-decomposition-based approach and a Markov-chain-based approach, and proposed the FPMC (factorizing personalized Markov chain) model, which achieved good results. Le et al. [27] developed a hidden Markov model that combines users’ dynamic preference tendencies and contextual bias transition information to improve the performance of recommendations. However, the Markov-chain-based approach has some problems. First, it is based on an independence assumption [15], i.e., that the user’s subsequent behavior is only related to the current state and not to previous behavior. This assumption is not fully valid in practical situations, because users’ behaviors usually have some contextual relevance. Second, Markov-chain-based methods tend to ignore the global information of the session sequence and focus only on the items clicked at the end of the session, resulting in poor recommendation results in long sequences.

2.2. Deep-Learning-Based Session Recommendation Method

With the development of deep learning, many recommendation methods utilize the powerful expressive ability of deep neural networks to model complex dependencies in users' historical interactions. Tang et al. [28], by using a convolutional neural network (CNN) for sequence feature extraction and an attention mechanism for sequence weighting, were able to capture a user's time-dependent and personalized preferences, improving the effectiveness of recommendations. The use of RNN-based models for modeling historical user interactions in session recommendation has gradually increased, due to the significant improvements of recurrent neural networks (RNNs) in modeling sequential data. Hidasi et al. [5] treated each session as a series of items arranged by interaction time and used a gated recurrent unit (GRU) for prediction, achieving significant results. Later, Tan et al. [29] improved the session recommendation performance of RNN models by using appropriate data augmentation techniques and taking into account temporal variations in user behavior. Jannach et al. [30] combined a domain-based K-nearest neighbors (KNN) method with RNN models to further enhance the performance of RNN-like models. However, RNN models have several inherent defects. When dealing with long sequences, RNN-like models are difficult to train, due to vanishing or exploding gradients during back propagation. Meanwhile, RNNs can only compute in sequence order and cannot compute in parallel, which restricts the computational efficiency and training speed of the model. Vaswani et al. [31] introduced a landmark new network structure, the Transformer, which uses an attention mechanism to model long-range dependencies of sequences and has achieved significant performance gains in various domains. As a result, many models incorporate an attention mechanism for recommendation [32]. Li et al.
[14] proposed a hybrid encoder with an attention mechanism to model a user's sequential behavior and capture the user's main purpose in the current session. Liu et al. [19] proposed a novel short-term attention/memory prioritization model that captures the user's general interests from the long-term memory of the session sequence, while focusing on the user's recent clicks to capture their current interests. Unlike the above approaches, GNN-based models represent sessions as graph-structured data and explore the use of GNNs to model complex transfer relationships between items within or between sessions. Wu et al. [15] first introduced GNNs into session recommendation by modeling sessions as graph-structured data and achieved excellent performance. Chen et al. [33] proposed a lossless encoding scheme to solve the problem of information loss when transforming a session into a graph, and designed a shortcut graph attention layer to propagate information through the shortcut connections of the graph, to efficiently capture the long-distance dependencies of items in the session. Qiu et al. [16] combined an attention mechanism and graph neural networks to propose a weighted attention graph layer to learn the embeddings of items and the session for next-item recommendation. Wang et al. [34] modeled users' behavioral patterns in a session, without disrupting the clicking order, and highlighted key user preferences during the modeling process. The model proposed by Xu et al. [35] captures the relationship between the whole and the local by constructing a graph structure of the session sequence. Huang et al. [36] worked on a method of item transitions in a single session and proposed an encoder for the graph structure. Pan et al. proposed a star graph neural network model [37] to shorten the information propagation distance of non-adjacent items by adding star nodes to the session graph.
These methods work well but only focus on the current session, ignoring the complex relationships and potentially beneficial information across sessions. Thus, Deng et al. [38] constructed a global graph and generated session representations through unsupervised learning using a GNN, with good results. After that, Wang et al. [39] built global graphs across all sessions and combined them with the current session to improve recommendation performance.

2.3. A GNN Recommendation Algorithm Incorporating Self-Supervised Comparative Learning

Self-supervised learning (SSL) is a method for learning useful representations from unlabeled data. It does not rely on external supervisory signals; instead, by designing pretext prediction tasks that allow the model to discover patterns and structures in the data itself, it can learn node representations in a more robust way. One of the most common and effective self-supervised learning methods is contrastive learning (CL) [40,41]. Zhou et al. [42] enhanced the learning of item representations for recommendation by maximizing mutual information through self-supervised learning. Yao et al. [43] proposed a multi-task self-supervised learning framework and devised a data enhancement method based on feature association. Wu et al. [44] generated multiple views of a node through three data enhancement approaches and maximized the inter-view consistency to assist in the learning of representations. Xia et al. [21] enhanced the recommendation performance of session and item views through self-supervised graph co-training. Following this line of research, in this paper, we integrate a joint contrastive learning strategy into a session recommendation model to contrastively learn global and local session representations, to alleviate the data sparsity problem.

3. Materials and Methods

3.1. Formulation of the Problem

Let $V = V^{id} \cup V^{p} \cup V^{c}$ denote the set of all nodes, where $V^{id} = \{v_1^{id}, v_2^{id}, v_3^{id}, \ldots, v_N^{id}\}$, $V^{p} = \{v_1^{p}, v_2^{p}, v_3^{p}, \ldots, v_Y^{p}\}$, and $V^{c} = \{v_1^{c}, v_2^{c}, v_3^{c}, \ldots, v_U^{c}\}$, and $N$, $Y$, and $U$ are the total numbers of item IDs, item prices, and item categories, respectively. Each item $v_i^{id} \in V^{id}$ is embedded into the same space, and $h_{\gamma_i}^{(l)}$ represents the vector representation of $v_i^{id}$ of dimension $d^{(l)}$ in the $l$-th layer of the deep neural network. Each session is represented as an ordered list of length $m$, $S_i = \{v_{i,1}^{id}, v_{i,2}^{id}, v_{i,3}^{id}, \ldots, v_{i,m}^{id}\}$, where $v_{i,k}^{id} \in V \ (1 \le k \le m)$ denotes the items interacted with by the anonymous user during session $S_i$, sorted by interaction time. The task of the MGMI-CEHCN model is to predict the most likely interaction item $v_{i,m+1}^{id}$ for a specific session at time $m+1$. The model takes a session sequence as input, outputs recommendation scores $\hat{y}$ for all candidate items, and finally takes the top $K$ items as the recommended candidates for session $S_i$.
Definition 1.
As mentioned in [45], the distribution of prices within a given category is more consistent with a logistic distribution than with the commonly used uniform distribution. The probability density function of a logistic distribution exhibits a higher preference in the middle zone and a lower preference at the ends. In price-dense intervals, users are confronted with more similarly priced goods and are thus more sensitive to price changes, requiring a more detailed division of price levels, while in sparsely priced intervals a looser division suffices. In addition, in order to balance the training data, the number of items in each price level should be kept approximately the same. Therefore, as shown in Figure 1, we discretize the prices into $\theta$ levels (e.g., $\theta = 5$), where each interval corresponds to an equal probability. Formally, for an item with price $x_p$ and a price range of $[min, max]$ for its category, we determine its price level as
$$p_i = \frac{\phi(x_p) - \phi(min)}{\phi(max) - \phi(min)} \times \theta$$
where $\phi(\cdot)$ is the cumulative distribution function of the logistic distribution, defined as
$$\phi(x) = P(X \le x) = \frac{1}{1 + e^{-\frac{\pi (x - \mu)}{\sqrt{3}\,\delta}}}$$
where $\mu$ and $\delta$ are the expected value and standard deviation, respectively.
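As a rough illustration of the price discretization above, here is a minimal NumPy sketch. The helper names `logistic_cdf` and `price_level` are hypothetical, and the ceiling plus clamping used to turn the fractional level into an integer bucket in $[1, \theta]$ are our assumptions, not stated in the paper.

```python
import numpy as np

def logistic_cdf(x, mu, delta):
    """CDF of a logistic distribution given its mean mu and standard
    deviation delta (scale parameter s = sqrt(3) * delta / pi)."""
    return 1.0 / (1.0 + np.exp(-np.pi * (x - mu) / (np.sqrt(3) * delta)))

def price_level(price, lo, hi, mu, delta, theta=5):
    """Map a raw price in [lo, hi] to one of theta equal-probability levels.
    The ceiling and the clamp to [1, theta] are assumptions made here to
    produce an integer bucket from the fractional level."""
    num = logistic_cdf(price, mu, delta) - logistic_cdf(lo, mu, delta)
    den = logistic_cdf(hi, mu, delta) - logistic_cdf(lo, mu, delta)
    return int(min(max(np.ceil(num / den * theta), 1), theta))
```

Because the CDF is steepest around the mean, levels are narrower in the price-dense middle zone and wider at the extremes, matching the equal-probability division described above.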
Definition 2.
Heterogeneous Hypergraph. We design a customized heterogeneous hypergraph $G = (V, E)$, as shown in Figure 2, where $V$ is the set of all nodes and $E$ is the set of all hyperedges. Encoded node types include item prices $v^{p}$, item IDs $v^{id}$, and item categories $v^{c}$. Nodes are uniformly represented as $v^{\tau}$, with nodes of the same type forming isomorphic nodes. Each hyperedge $e \in E$ connects two or more nodes of arbitrary type. To represent multi-type relationships between nodes, we define two types of hyperedges: (1) feature hyperedges $e^{f}$ connect all features of an item; (2) session hyperedges $e^{id}$ connect the nodes of all items in a session.
If two nodes are connected by a session hyperedge, they are adjacent. Each session hyperedge $e^{id} \in E$ containing two or more vertices is assigned a positive weight $W_e$, and all weights form a diagonal matrix $W \in \mathbb{R}^{M \times M}$. A hypergraph can be represented by an incidence matrix $H \in \mathbb{R}^{N \times M}$, where each element is defined as
$$H_{ie} = \begin{cases} 1, & \text{if } v_i^{id} \in e^{id} \\ 0, & \text{if } v_i^{id} \notin e^{id} \end{cases}$$
For each vertex and each hyperedge, the degrees $D_{ii}$ and $B_{ee}$ are defined as
$$D_{ii} = \sum_{e=1}^{M} H_{ie} W_e$$
$$B_{ee} = \sum_{i=1}^{N} H_{ie}$$
where $D$ and $B$ are diagonal matrices.
Definition 3.
Weighted Line Graph. The hypergraph is converted into a weighted line graph $L(G) = (V_L, E_L)$, as shown in Figure 2, where each node is a session hyperedge in $G$. Here, $V_L = \{v_e \mid e \in E\}$ and $E_L = \{(v_{e_p}, v_{e_q}) \mid e_p, e_q \in E,\ |e_p \cap e_q| \ge 1\}$. Each edge $(v_{e_p}, v_{e_q})$ is assigned a weight $W_{p,q}$, expressed as
$$W_{p,q} = |e_p \cap e_q| \,/\, |e_p \cup e_q|$$
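The line-graph construction in Definition 3 can be sketched directly: each session hyperedge becomes a node, and two nodes are linked with a Jaccard weight whenever their hyperedges share at least one item. The toy sessions below are illustrative only.

```python
import numpy as np

# Toy hypergraph: 6 item nodes, 3 session hyperedges given as vertex sets.
sessions = [{0, 1, 2}, {2, 3, 4}, {4, 5}]

def line_graph_weights(hyperedges):
    """Weighted line graph of Definition 3: one node per session hyperedge,
    edge weight = |intersection| / |union| for pairs that share items."""
    M = len(hyperedges)
    W = np.zeros((M, M))
    for p in range(M):
        for q in range(p + 1, M):
            inter = len(hyperedges[p] & hyperedges[q])
            if inter >= 1:  # connected only if the hyperedges overlap
                w = inter / len(hyperedges[p] | hyperedges[q])
                W[p, q] = W[q, p] = w
    return W

W = line_graph_weights(sessions)
```

Here sessions 0 and 1 share item 2 out of five distinct items, giving weight 0.2, while sessions 0 and 2 are disjoint and remain unconnected.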

3.2. Overview of the Proposed Model

The overall architecture of the MGMI-CEHCN model is shown in Figure 3. First, the session data are constructed into a heterogeneous hypergraph, and item representations are learned from the global relations: the M-G encoder aggregates multi-granularity information from both intra-granularity and inter-granularity perspectives, and the hypergraph convolutional network then produces the final item representations. Second, on the global relations, the M-I encoder integrates the user's multiple interests using the interest perceptron and the decentralized interest extraction network. After the information fusion module fuses the item embeddings with reverse position embeddings, a global session representation is obtained from the user's multiple interests through a soft attention mechanism; a local session representation is learned on the local relations using an average pooling layer and the weighted line graph convolutional network. The global and local session representations are then jointly contrasted to build supervised signals for optimized training. Finally, a list of K recommended items for the next click is output through the prediction layer.

3.3. M-G Encoder

The M-G encoder can integrate information within and across different granularities. In the constructed heterogeneous hypergraph, the target node can be a neighbor to nodes with different granularities. Obviously, nodes within the same granularity contain the same semantic information, while nodes between different granularities contain different semantic information.
Intra-granularity information aggregation. For target nodes, intra-granularity aggregation focuses on the importance of neighboring nodes. Let $v^{id} \in \mathbb{R}^d$ be the target node embedding for the item-id granularity, and let $N_{id}^{t}$ denote the neighboring nodes of the target node at a specific granularity $t$; for the price granularity, this set is $N_{id}^{p}$. As in Equations (7) and (8), $v_i^{p} \in N_{id}^{p}$ and $u_t \in \mathbb{R}^d$ is the attention vector. For each target node, a different attention vector $u_t$ is applied to the embeddings of each granularity, extracting granularity-specific information for better model learning. $\theta_{id}^{p} \in \mathbb{R}^d$ is the granularity embedding of $v^{id}$ with respect to the price granularity $p$; the category embedding $\theta_{id}^{c}$ is obtained analogously.
$$\theta_{id}^{p} = \sum_{i} \alpha_i v_i^{p}$$
$$\alpha_i = \frac{\exp(u_t^{\top} v_i^{p})}{\sum_{v_j^{p} \in N_{id}^{p}} \exp(u_t^{\top} v_j^{p})}$$
Inter-granularity information aggregation. For the target node, inter-granularity aggregation focuses on the impact of the different granularities. Since each granularity provides different semantic information to the target node, we merge the learned category information $\theta_{id}^{c}$ and price information $\theta_{id}^{p}$ and propagate them to the target node, as in Equations (9) and (10), where $W_{id} \in \mathbb{R}^{d \times 3d}$ and $W_{id}^{p}, W_{id}^{c} \in \mathbb{R}^{d \times d}$ are learnable parameters, $[\,;\,]$ denotes concatenation, and $\sigma$ is the sigmoid function. $A \in \mathbb{R}^d$ is the embedding of the target node, whose semantics are enriched by its neighboring nodes.
$$\omega = \sigma(W_{id} [v^{id}; \theta_{id}^{p}; \theta_{id}^{c}] + W_{id}^{p} \theta_{id}^{p} + W_{id}^{c} \theta_{id}^{c})$$
$$A = v^{id} + \omega \odot \theta_{id}^{p} + (1 - \omega) \odot \theta_{id}^{c}$$
By using the intra- and inter-granularity aggregation mechanisms, the embeddings of all item-id nodes are updated as $\gamma_i = f_b(v^{id}, \theta_{id}^{p}, \theta_{id}^{c}) + avg(N_{id}^{id})$, where $f_b$ is the aggregation mechanism, $N_{id}^{id}$ consists of the id nodes adjacent to $v^{id}$, and $avg(N_{id}^{id})$ computes the average id embedding in $N_{id}^{id}$.
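The two aggregation steps can be sketched in NumPy as follows. All tensors are randomly initialized stand-ins for learned parameters, and `intra_granularity`/`inter_granularity` are hypothetical helper names, not part of the released model.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 8  # embedding dimension

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def intra_granularity(u_t, neighbors):
    """Equations (7)-(8): attention-weighted sum of same-granularity
    neighbor embeddings, scored against the attention vector u_t."""
    scores = np.array([u_t @ v for v in neighbors])
    alpha = softmax(scores)
    return (alpha[:, None] * neighbors).sum(axis=0)

def inter_granularity(v_id, theta_p, theta_c, W_id, W_p, W_c):
    """Equations (9)-(10): gated fusion of price- and category-granularity
    information into the target item-id embedding."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    omega = sigmoid(W_id @ np.concatenate([v_id, theta_p, theta_c])
                    + W_p @ theta_p + W_c @ theta_c)
    return v_id + omega * theta_p + (1.0 - omega) * theta_c

v_id = rng.normal(size=d)
theta_p = intra_granularity(rng.normal(size=d), rng.normal(size=(4, d)))
theta_c = intra_granularity(rng.normal(size=d), rng.normal(size=(3, d)))
A = inter_granularity(v_id, theta_p, theta_c,
                      rng.normal(size=(d, 3 * d)),
                      rng.normal(size=(d, d)), rng.normal(size=(d, d)))
```

The gate $\omega$ is a $d$-dimensional vector, so the fusion in Equation (10) trades off price and category information per embedding dimension.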

3.4. Hypergraph Convolutional Neural Network

After obtaining the embeddings of all item id nodes through the M-G encoder, in order to generate a richer and more representative representation of the items, a hypergraph convolutional neural network layer is utilized to capture the higher-order relationships of the items in the session. The hypergraph convolutional network is defined as
$$h_{\gamma_i}^{(l+1)} = \sigma\left(D^{-1} H W B^{-1} H^{\top} h_{\gamma_i}^{(l)}\right)$$
The multiplication $H^{\top} h_{\gamma_i}^{(l)}$ describes the aggregation of information from nodes to hyperedges, while the pre-multiplication by the matrix $H$ aggregates information from the hyperedges back to the nodes.
For a given item $\gamma_i$, an embedding representation is obtained in each convolution layer, and the embeddings $h_{\gamma_i}^{(l)}$ from all layers are averaged to obtain the final global representation: $h_{\gamma_i}^{g} = \frac{1}{L+1} \sum_{l=0}^{L} h_{\gamma_i}^{(l)}$.
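A minimal NumPy sketch of this propagation rule follows. The nonlinearity $\sigma$ is omitted so the two-step node-to-hyperedge-to-node aggregation is easy to inspect; the toy incidence matrix and unit hyperedge weights are our own illustrative choices.

```python
import numpy as np

def hypergraph_conv(H, w_e, X, layers=2):
    """Sketch of the hypergraph convolution X' = D^{-1} H W B^{-1} H^T X,
    where D and B are the node and hyperedge degree matrices of
    Definition 2. Layer outputs (including layer 0) are averaged, as in
    the final global item representation. Nonlinearity omitted."""
    D_inv = np.diag(1.0 / (H @ w_e))      # D_ii = sum_e H_ie * W_e
    B_inv = np.diag(1.0 / H.sum(axis=0))  # B_ee = sum_i H_ie
    outs = [X]
    for _ in range(layers):
        X = D_inv @ H @ np.diag(w_e) @ B_inv @ H.T @ X
        outs.append(X)
    return np.mean(outs, axis=0)

# 4 items, 2 session hyperedges with unit weights; X starts as identity
H = np.array([[1, 0], [1, 0], [1, 1], [0, 1]], dtype=float)
h_global = hypergraph_conv(H, np.ones(2), np.eye(4))
```

With this normalization the propagation matrix is row-stochastic, so each item embedding remains a convex combination of its hyperedge neighbors' embeddings.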

3.5. M-I Encoder

The M-I encoder first utilizes an interest perceptron to detect the potential interests embedded in each item in the session, then designs a decentralized GRU-based interest extraction network that uses the hidden state of the last step as the representation of each interest, and finally utilizes additive attention to obtain the user's final multi-interest representation vector. The interest perceptron uses multiple self-attention head vectors to detect the interests implied by the items clicked at each time step. Each self-attention head vector represents one interest; e.g., $head_1$ represents the first interest. As in Equations (12)-(14), $W_i^{Q}$, $W_i^{K}$, and $W_i^{V} \in \mathbb{R}^{d \times d}$ are parameter matrices used to map inputs to queries, keys, and values, respectively, and $h_{\gamma_i}^{g}$ is the embedding of the item clicked at each time step.
$$S_i = [h_{\gamma_1}^{g}, h_{\gamma_2}^{g}, \ldots, h_{\gamma_m}^{g}]$$
$$head_k = \mathrm{Attention}(W_i^{Q} h_{\gamma_m}^{g}, W_i^{K} h_{\gamma_m}^{g}, W_i^{V} h_{\gamma_m}^{g})$$
$$E_{head} = [head_1, head_2, \ldots, head_i, \ldots, head_k]$$
The decentralized GRU-based interest extraction network consists of multiple recursive interest channels, where the number of interest channels equals the number of self-attention heads. Multiple potential interests are obtained for each item through the interest perceptron, and each channel then models the sequential dependency of the session items within one potential interest. For the $i$th channel, the interest representation $head_{i,t}$ at each step is taken as input, and the GRU state is updated as in Equations (15)-(18), where $p_t$ and $q_t$ are the reset and update gates of the GRU, respectively; $\sigma_g$ and $\sigma_z$ are the sigmoid and tanh activation functions, respectively; $head_{i,t}$ is the $i$th potential interest representation embedded in the item clicked at time $t$; and $x_{t-1}$ is the previous interest state in the $i$th channel.
$$p_t = \sigma_g(W_1 [head_{i,t}, x_{t-1}])$$
$$q_t = \sigma_g(W_2 [head_{i,t}, x_{t-1}])$$
$$\tilde{u} = \sigma_z(W_3 [head_{i,t}, x_{t-1} \odot p_t])$$
$$u_t = (1 - q_t) \odot x_{t-1} + q_t \odot \tilde{u}$$
After the last step $t$, the user's current $i$th interest representation $u_{t,i}$ is obtained. To further distinguish the different importance of the interests and generate more informative representations for the user, we utilize additive attention to aggregate the interest vectors, as in Equations (19) and (20), where $q \in \mathbb{R}^d$ is the query vector, $V$ and $b$ are trainable parameters, and $\beta_i$ is the attention weight of the $i$th interest among the user's potential interests.
$$\tilde{\beta}_i = q^{\top} \tanh(V u_{t,i} + b)$$
$$\beta_i = \frac{\exp(\tilde{\beta}_i)}{\sum_{p=1}^{k} \exp(\tilde{\beta}_p)}$$
In addition, the outputs $u_{t,i}$ of all $k$ channels are integrated to obtain the user's final composite interest representation: $T_s = \sum_{i=1}^{k} \beta_i u_{t,i}$.
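The per-channel GRU update (Equations (15)-(18)) and the additive attention over channels (Equations (19) and (20)) can be sketched as follows. All weights are random stand-ins for learned parameters, and `gru_channel` is a hypothetical helper name.

```python
import numpy as np

rng = np.random.default_rng(1)
d, k, m = 8, 3, 5  # embedding dim, interest channels, session length
sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))

def gru_channel(heads, W1, W2, W3):
    """One interest channel: run the gated update of Equations (15)-(18)
    over the per-step interest vectors; return the last hidden state."""
    x = np.zeros(d)
    for h in heads:                                    # h = head_{i,t}
        p = sigmoid(W1 @ np.concatenate([h, x]))       # reset gate p_t
        q = sigmoid(W2 @ np.concatenate([h, x]))       # update gate q_t
        u = np.tanh(W3 @ np.concatenate([h, x * p]))   # candidate state
        x = (1 - q) * x + q * u
    return x

W1, W2, W3 = (rng.normal(scale=0.1, size=(d, 2 * d)) for _ in range(3))
channels = [gru_channel(rng.normal(size=(m, d)), W1, W2, W3) for _ in range(k)]

# Equations (19)-(20): additive attention over the k channel outputs.
q_vec, V, b = rng.normal(size=d), rng.normal(size=(d, d)), rng.normal(size=d)
scores = np.array([q_vec @ np.tanh(V @ u + b) for u in channels])
beta = np.exp(scores - scores.max())
beta /= beta.sum()
T_s = sum(b_i * u for b_i, u in zip(beta, channels))  # composite interest
```

Each channel tracks one potential interest independently; the attention weights $\beta_i$ then decide how much each interest contributes to $T_s$.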

3.6. Global Session Embedding

In order to fully consider the item location information in the session sequence, reverse location coding information is introduced and a reverse location coding matrix P r = [ p 1 , p 1 , p 3 , , p m ] is set to fuse the final item representation and reverse location coding information through a cascade operation and nonlinear transformation, denoted as h γ i = t a n h ( W 1 [ h γ i g | | P m i + 1 ] + b ) , where W 1 R d 2 d and b R d are learnable parameters.
The session embedding S g l o b a l is represented based on the contribution of each clicked item in the session to the overall session and user interest representation, as in Equations (21) and (22), where T s is the final composite interest representation of the user represented by the entire session, and h γ i is the embedding of the ith item in session S i . f R d , W 2 R d d and W 3 R d d are the attention parameters used to learn the item weights a t .
a_i = f^T σ(W_2 h_{γi} + W_3 T_s + c)  (21)
S_global = Σ_{i=1}^{m} a_i h_{γi}  (22)
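Putting the position fusion and the interest-conditioned soft attention together, the global session embedding can be sketched as follows; matrix shapes are illustrative assumptions:

```python
import numpy as np

def global_session_embedding(H, P, T_s, W1, b1, f, W2, W3, c):
    """Fuse reverse position codes into the item embeddings and pool
    them with interest-conditioned soft attention (Eqs. 21-22).
    H: (m, d) item embeddings; P: (m, d) position table; T_s: (d,)."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    P_rev = P[::-1]                                   # p_{m-i+1} for item i
    H_pos = np.tanh(np.concatenate([H, P_rev], axis=1) @ W1.T + b1)
    # attention weight a_i for each item, conditioned on the interest T_s
    a = np.array([f @ sigmoid(W2 @ h + W3 @ T_s + c) for h in H_pos])
    return a @ H_pos                                  # S_global, shape (d,)
```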

3.7. Local Session Embedding

The local session embeddings are learned through the line graph of the hypergraph. Each session-specific embedding in θ^{(0)} is initialized by averaging the corresponding item embeddings in h^{(0)} with an average pooling layer. The adjacency matrix of L(G) is defined as A ∈ R^{M×M}, where M is the number of nodes in the line graph and A_{p,q} = W_{p,q}. Let Â = A + I, where I is the identity matrix, and let D̂ ∈ R^{M×M} be the diagonal degree matrix with D̂_{p,p} = Σ_{q=1}^{M} Â_{p,q}. The line graph convolution is then defined as θ^{(l+1)} = D̂^{−1} Â θ^{(l)}.
After L layers of graph convolution starting from θ^{(0)}, the session embeddings of all layers are averaged to obtain the final local session embedding: S_local = (1/(L+1)) Σ_{l=0}^{L} θ^{(l)}.
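The propagation rule and layer averaging above amount to a few matrix products; a minimal sketch:

```python
import numpy as np

def line_graph_conv(A, theta0, L=2):
    """theta0: (M, d) initial session embeddings; A: weighted adjacency
    of the line graph. Implements theta^{(l+1)} = D^-1 (A + I) theta^{(l)}
    and averages all L+1 layers to produce the local embeddings."""
    M = A.shape[0]
    A_hat = A + np.eye(M)                        # add self-loops
    D_inv = np.diag(1.0 / A_hat.sum(axis=1))     # inverse degree matrix
    layers = [theta0]
    for _ in range(L):
        layers.append(D_inv @ A_hat @ layers[-1])
    return sum(layers) / (L + 1)                 # S_local per session
```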

3.8. Predictive Layer

Based on the obtained global session embedding S_global and local session embedding S_local, this paper calculates, by means of a softmax function, the probability that the item-level representation h_{γi}^g of each candidate item v_i will be clicked next, as in Equation (23), where ŷ = (ŷ_1, ŷ_2, …, ŷ_n) denotes the probabilities that the user clicks each candidate item next in the current session. The cross-entropy loss, which is widely used in session recommendation, is selected as the training objective of the model, as in Equation (24), where y_i ∈ y denotes the one-hot encoding of the ground-truth item.
ŷ_i = softmax((S_global + S_local)^T h_{γi}^g)  (23)
l_r = − Σ_{i=1}^{n} [y_i log(ŷ_i) + (1 − y_i) log(1 − ŷ_i)]  (24)
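The scoring and loss of the predictive layer (Equations (23) and (24)) can be sketched directly:

```python
import numpy as np

def predict_and_loss(S_global, S_local, H_items, y_true):
    """Score every candidate item against the fused session vector
    (Eq. 23) and compute the binary cross-entropy loss (Eq. 24).
    H_items: (n, d) candidate item embeddings; y_true: (n,) one-hot."""
    logits = H_items @ (S_global + S_local)
    e = np.exp(logits - logits.max())            # stable softmax
    y_hat = e / e.sum()
    eps = 1e-12                                  # guard against log(0)
    loss = -np.sum(y_true * np.log(y_hat + eps)
                   + (1 - y_true) * np.log(1 - y_hat + eps))
    return y_hat, loss
```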

3.9. Joint Contrast Enhancement Strategy

To maximize the mutual information between the global and local levels, a noise-contrastive objective is used: the original session embeddings are treated as positive samples, while samples obtained by disrupting the order of the session embeddings through a row-wise (rank) transformation serve as negative samples. This helps denoise the model and improves item- and session-level feature extraction by exploiting cross-session information to optimize the embeddings.
The InfoNCE loss function is employed to maximize information sharing between session representations at different levels as the learning objective of this paper:
l_s = − log σ(f_D(S_i^global, S_i^local)) − log σ(1 − f_D(S̃_i^global, S_i^local))
where S̃_i^global is the negative sample obtained by disrupting the row order of S_i^global, f_D(·,·) is a discriminator function that takes two vectors as input and evaluates the similarity between them, and σ is the sigmoid activation function.
Finally, the recommendation task and the contrastive learning task are optimized jointly: L = l_r + κ l_s, where κ is the hyperparameter that scales the contrastive task.
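A batch-level sketch of the contrastive objective, assuming a simple dot-product discriminator f_D (the paper does not fix its exact form in this section):

```python
import numpy as np

def contrastive_loss(S_g, S_l, rng):
    """Noise-contrastive objective over a batch of sessions.
    S_g, S_l: (B, d) global and local session embeddings. Negatives
    are made by shuffling the rows of S_g; f_D is taken here to be a
    dot product, an illustrative assumption."""
    sigmoid = lambda x: 1.0 / (1.0 + np.exp(-x))
    S_neg = S_g[rng.permutation(len(S_g))]      # row-shuffled negatives
    pos = np.sum(S_g * S_l, axis=1)             # f_D(S_g, S_l)
    neg = np.sum(S_neg * S_l, axis=1)           # f_D(S~_g, S_l)
    return -np.mean(np.log(sigmoid(pos)) + np.log(sigmoid(1.0 - neg)))
```

The total objective L = l_r + κ l_s then simply adds this term, scaled by κ, to the recommendation loss.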

4. Experiments and Analysis of Results

4.1. Datasets and Pre-Processing

Three real-world public datasets, Cosmetics, Diginetica-buy, and Amazon, were used to evaluate the proposed method and the baseline models. Cosmetics is a Kaggle competition dataset consisting of user behaviors from a medium-sized online cosmetics shop. Diginetica-buy is an e-commerce dataset consisting of user purchasing behaviors on the website. Amazon is a recommendation benchmark dataset consisting of user purchasing behaviors on Amazon.com. For a fair comparison, the three benchmark datasets were preprocessed according to the experimental setup of CoHHN [46]. Table 1 contains detailed statistical information about these three datasets.

4.2. Evaluation Metrics

The performance of the MGMI-CEHCN and baseline models was evaluated using precision and mean reciprocal rank as the two evaluation metrics.
P@K is a metric for evaluating the accuracy of a recommender system and measures the number of correctly recommended items in the top K items in the recommendation ranking list as a percentage of the total sample size. The formula is calculated as
P@K = n_hit / n
where n h i t denotes the number of correctly recommended items in the top K items in the recommendation ranking list and n is the total sample size.
MRR@K is a metric used to evaluate the accuracy of a recommender system, which considers the ranking of the recommended items in the sample. A larger value of this metric indicates that the correctly recommended item was placed higher in the ranked list. The formula is calculated as
MRR@K = (1/n) Σ_{i∈M} 1/rank_i
where M denotes the sample set of correctly recommended items among the first K recommended items; r a n k i represents the rank of item i in the recommended list.
In this work, we report results for K = 10, 20.
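Both metrics can be computed directly from ranked recommendation lists; a minimal sketch:

```python
def precision_at_k(ranked_lists, targets, k):
    """Fraction of sessions whose target item appears in the top-k list."""
    hits = sum(t in r[:k] for r, t in zip(ranked_lists, targets))
    return hits / len(targets)

def mrr_at_k(ranked_lists, targets, k):
    """Mean reciprocal rank of the target within the top-k list
    (contributing 0 when the target is not in the top k)."""
    total = 0.0
    for r, t in zip(ranked_lists, targets):
        if t in r[:k]:
            total += 1.0 / (r[:k].index(t) + 1)
    return total / len(targets)
```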

4.3. Parameterization

For a fair comparison, the embedding size of all models was set to 128. During training, the model parameters were optimized using the mini-batch Adam algorithm, with the initial learning rate set to 0.001 and decayed by a factor of 0.1 every three iterations. The training batch size was 100, and all parameters were initialized from Gaussian distributions with a mean of 0 and a standard deviation of 0.1. These settings were kept consistent throughout this paper. Other hyperparameters, such as the number of channels in the decentralized interest extraction network and the contrastive learning hyperparameters, were tuned experimentally for each dataset.
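Reading the schedule above as multiplicative step decay (our interpretation of "decayed to 0.1 every three iterations"), the learning rate evolves as:

```python
def learning_rate(epoch, base_lr=0.001, decay=0.1, step=3):
    """Step schedule: the initial rate is multiplied by `decay`
    after every `step` training rounds (Sec. 4.3)."""
    return base_lr * decay ** (epoch // step)
```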

4.4. Model Performance Comparison and Analysis

To demonstrate the validity of the MGMI-CEHCN model, we compared it with the following methods: (1) S-POP recommends by giving priority to the items observed most frequently during the present interaction. (2) SKNN determines the relevance of candidate items by analyzing their frequency of occurrence in neighboring sessions. (3) GRU4Rec [5] recursively learns and captures patterns and dependencies in sequential data, in order to better understand and predict associations and transitions between the elements of a session. (4) NARM [14] uses an attention mechanism to capture the user's main purpose. (5) SR-GNN [15] applies graph neural networks to session data to exploit interaction relationships in graph structures. (6) LESSR [33] addresses the problem of information loss in graph neural network-based SBR models. (7) S2-DHCN [17] constructs hypergraphs to capture higher-order dependencies between items and mine complex relationships between them. (8) COHHN [46] constructs heterogeneous hypergraph networks to represent information and strengthens the relationship between price preferences and interest preferences through co-guided learning.
The experimental results for the overall performance are shown in Table 2, with the best results in each column highlighted by underlining. The following conclusions can be drawn from the analysis:
As can be seen in Table 2, the RNN-based recommendation methods outperformed traditional methods such as S-POP and SKNN, which confirms their key role in session recommendation modeling. NARM outperformed GRU4Rec by a large margin, which can be attributed to the attention mechanism introduced to capture the main intention in a session. The MGMI-CEHCN, the GNN-based models (SR-GNN, LESSR), and S2-DHCN outperformed the traditional approaches because they take into account the sequential information embedded in the session data and incorporate the user's long- and short-term interests to capture the dynamic variability of user interests, thus achieving efficient recommendation performance. The results also showed that session preference encoders augmented with graph neural networks yield remarkable results in learning more accurate session representations from session sequences.
The SR-GNN model, which also uses graph neural networks and attention mechanisms, was not as capable of extracting session-level representations as the MGMI-CEHCN model, which incorporates a contrast enhancement strategy. This further confirms the importance of filtering out irrelevant user behavior data for the MGMI-CEHCN: the auxiliary task based on contrastive learning and the supervised recommendation task form a complementary modeling mechanism that can, in principle, address the data-relevance problem inherent in implicit feedback and the weak-supervision problem, by learning robust representations suited to session-based recommendation.
The GNN-based approach (SR-GNN) relies on modeling pairwise relationships between nodes to achieve competitive performance, while S2-DHCN models higher-order relationships between items. In addition, COHHN models users' price preferences and interest preferences by extracting them through attention layers, and its performance improves greatly. Fundamentally, however, these methods can only parse a single level of information, so the resulting models cannot accurately capture the multiple potential interests hidden in the various item features. As a result, their performance was still inferior to that of the proposed MGMI-CEHCN model.
The MGMI-CEHCN model proposed in this paper showed a clear advantage on all datasets. On Cosmetics, its P@20 reached 55.25% and its M@20 reached 38.26%, improvements of 3.06% and 3.09%, respectively; on Diginetica-buy, P@20 reached 65.60% and M@20 reached 27.47%, improvements of 2.47% and 6.64%, respectively. This was mainly due to the MGMI-CEHCN's comprehensive consideration of global and local session representations. The MGMI-CEHCN model also achieved good performance on Amazon, which further demonstrates its effectiveness. We attribute the improvement to modeling the session data as a heterogeneous hypergraph, embedding information at the two granularities of item price and category through the hypergraph, fusing the information between the granularities, and obtaining the final item embeddings after multi-layer convolution. An interest perceptron then detects multiple potential interests for each item, a decentralized GRU-based interest extraction network integrates the user's final interests, a soft attention mechanism produces the global session representation, and a weighted line graph convolutional network generates the local session representation. In addition, the joint contrast enhancement strategy improved the performance by maximizing the mutual information between the two session representations.

4.5. Ablation Experiment

The MGMI-CEHCN model can be seen as a union of different modules. In order to evaluate the validity and robustness of the proposed modules, the five main parts of the MGMI-CEHCN were removed separately, yielding MGMI-CEHCN-C, MGMI-CEHCN-P, MGMI-CEHCN-MG, MGMI-CEHCN-MI, and MGMI-CEHCN-L. The five variants are described as follows:
MGMI-CEHCN-C: the category information is removed from the M-G encoder.
MGMI-CEHCN-P: the price information is removed from the M-G encoder.
MGMI-CEHCN-MG: the M-G encoder module is removed from the MGMI-CEHCN.
MGMI-CEHCN-MI: the M-I encoder module is removed from the MGMI-CEHCN.
MGMI-CEHCN-L: the joint contrast enhancement module is removed.
The experimental results are shown in Table 3. The MGMI-CEHCN model consistently outperformed the five variants on the Cosmetics, Diginetica-buy, and Amazon datasets, which means that each component played an active role in the recommendation process and validates the effectiveness of the methodology of this paper. Overall, MGMI-CEHCN-C outperformed MGMI-CEHCN-P, and MGMI-CEHCN-P outperformed MGMI-CEHCN-MG, suggesting that embedding the two types of granularity information, price and category, into the items does improve the accuracy of the recommender system. However, the price information of each item appeared to be more important than the category information. We believe this is because price information is usually more real-time and sensitive: users tend to check the current price before making a purchase, whereas category information has relatively less impact on the purchase decision. Price information is therefore more likely to attract users' attention and affect decisions in session recommendation. Removing the joint contrastive learning module produced a significant performance degradation in all cases. Removing the M-G and M-I encoder modules was also found to significantly degrade all metrics, except that on the Amazon dataset removing the M-I encoder did not have a significant impact on the performance. This is probably because the average session length of Amazon is shorter than that of the other two datasets; we believe long sessions may have a higher signal-to-noise ratio than short sessions, because they provide more data points to distinguish the user's real interest behavior from noise. By analyzing long sessions, short-term interests or incidental behaviors can be better filtered out, leading to more accurate modeling of user interests.

4.6. Analysis of Multi-Interest Channels

The multiple items of a session in the MGMI-CEHCN model imply multiple potential user interests, and the number of channels has a large impact on modeling these interests with the decentralized GRU-based interest extraction network. Because of the structure of the model, the embedding size must be an integer multiple of the number of channels. Therefore, representative values of 2, 4, 8, and 16 were selected to analyze the performance.
According to the observations in Figure 4, both P@20 and MRR@20 were best at a k of 8 on the Cosmetics and Diginetica-buy datasets, while for the Amazon dataset, the performance was best at a k of 4. To explain this discrepancy, we believe that shorter session average lengths may mean that users are involved in less information per session, and thus the model may require fewer channels to efficiently capture features and patterns in the session. Conversely, longer session average lengths may contain more information and interactions, and thus this may require more channels to capture richer and more complex features in the session. For all three datasets, the performance showed an increase and then a decrease as the number of channels increased. Too few channels may lead to underfitting of the model, and too many channels may lead to overfitting of the model.

4.7. Comparison and Analysis of Contrastive Learning Parameters

In this experiment, the other hyperparameters were fixed and the value of κ was varied, to observe how the performance changed as κ increased. A typical set of values, 0.001, 0.01, 0.02, 0.03, and 0.05, was selected to analyze its effect on the performance. According to the observations in Figure 5, the best performance on the Cosmetics and Diginetica-buy datasets occurred when κ was set to 0.001, while on the Amazon dataset the best performance occurred when κ was set to 0.01. Beyond these values, the performance of the MGMI-CEHCN model decreased as κ increased, due to gradient conflicts between the recommendation and contrastive tasks. Therefore, an appropriate value of κ should be chosen when training the model with the joint contrast enhancement task.

4.8. Analysis of Price Level Quantities

For the MGMI-CEHCN model, the number of price levels determines whether the price granularity information can be adequately embedded. We selected different θ values for the three datasets, to conduct experiments and investigate their impact on the model performance. As shown in Figure 6, the optimal number of price levels θ varied across datasets, from 10 on the Cosmetics dataset, to 100 on the Diginetica-buy dataset, to 50 on the Amazon dataset. We hypothesize that the diversity of user price preferences across datasets led to this variation. We can also observe that none of the three datasets obtained its best performance when the number of price levels was either very small or very large. We believe that when θ is too small, the model cannot fully learn the user's price preference and can only roughly distinguish high prices from low ones, whereas when θ is too large, goods with similar prices are assigned to different price levels, which in turn degrades the performance.
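For illustration, one common way to discretize prices into θ levels is equal-frequency (quantile) binning; the paper follows CoHHN's category-aware scheme (Figure 1), so this sketch is a simplification:

```python
import numpy as np

def price_levels(prices, theta):
    """Assign each absolute price to one of theta levels using
    equal-frequency (quantile) bin edges. Illustrative only: the
    actual scheme in the paper is applied per item category."""
    edges = np.quantile(prices, np.linspace(0, 1, theta + 1)[1:-1])
    return np.searchsorted(edges, prices)     # levels in 0..theta-1
```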

5. Conclusions

To address the shortcomings of session recommendation, a new multi-granularity and multi-interest contrast-enhanced hypergraph convolutional network model for session recommendation was proposed. The model not only embeds the information of the item itself from two granularities of item price and category through an M-I encoder, but it also extracts multiple potential interests of the user through an M-G encoder, which alleviates the incomplete recommendation problem in session recommendation. Meanwhile, auxiliary tasks are designed to generate self-supervised signals for comparison learning through global session embedding and local session embedding, which enhances the model’s ability to extract session data and effectively mitigates the influence of data sparsity on the recommendation performance.
In the next step, we could consider how to better fuse the information available in a session and embed multiple granularities of information, such as the brand, purchase frequency, and time preference of an item, into the item itself based on a heterogeneous hypergraph; we believe that reasonable utilization of this information could effectively improve the recommendations. The MGMI-CEHCN model also focuses on recommendation accuracy and pays less attention to recommendation efficiency. Therefore, we will integrate the knowledge distillation technique [47] into the MGMI-CEHCN model from the perspective of lightweighting, to achieve efficient recommendation and thus improve its real-time recommendation efficiency.

Author Contributions

Conceptualization, X.M.; Methodology, X.M.; Software, X.M.; Validation, X.M.; Formal analysis, X.M.; Investigation, X.M.; Resources, J.H.; Data curation, J.H.; Writing—original draft, X.M.; Writing—review and editing, X.M. and L.L.; Visualization, X.M.; Supervision, L.L. and J.H.; Project administration, L.L.; Funding acquisition, L.L. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data available in a publicly accessible repository. The original data presented in the study are openly available in [Cosmetics] at [https://www.kaggle.com/mkechinov/ecommerce-events-history-in-cosmetics-Shop, accessed on 29 July 2024]. The original data presented in the study are openly available from [Diginetica-buy] in [https://competitions.codalab.org/competitions/11161, accessed on 29 July 2024]. The original data presented in the study are openly available in [Amazon] at [http://jmcauley.ucsd.edu/data/amazon/, accessed on 29 July 2024].

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, China, 1–5 May 2001; pp. 285–295. [Google Scholar]
  2. Feng, S.; Li, X.; Zeng, Y.; Cong, G.; Chee, Y.M. Personalized ranking metric embedding for next new poi recommendation. In Proceedings of the Twenty-Fourth International Joint Conference on Artificial Intelligence, Buenos Aires, Argentina, 25–31 July 2015. [Google Scholar]
  3. Mobasher, B.; Dai, H.; Luo, T.; Nakagawa, M. Using sequential and non-sequential patterns in predictive web usage mining tasks. In Proceedings of the 2002 IEEE International Conference on Data Mining, Maebashi City, Japan, 9–12 December 2002; pp. 669–672. [Google Scholar]
  4. Zhan, Z.; Zhong, L.; Lin, J.; Pan, W.; Ming, Z. Sequence-aware similarity learning for next-item recommendation. J. Supercomput. 2021, 77, 7509–7534. [Google Scholar] [CrossRef]
  5. Hidasi, B.; Karatzoglou, A.; Baltrunas, L.; Tikk, D. Session-based recommendations with recurrent neural networks. arXiv 2015, arXiv:1511.06939. [Google Scholar]
  6. Hidasi, B.; Quadrana, M.; Karatzoglou, A.; Tikk, D. Parallel recurrent neural network architectures for feature rich session based recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; pp. 241–248. [Google Scholar]
  7. Bogina, V.; Kuflik, T. Incorporating dwell time in session-based recommenda-tions with recurrent neural networks. In Proceedings of the RecTemp @ RecSys, Como, Italy, 27–31 August 2017; pp. 57–59. [Google Scholar]
  8. Sun, Y.; Zhao, P.; Zhang, H. Ta4rec: Recurrent neural networks with time attention factors for session-based recommendations. In Proceedings of the International Joint Conference on Neural Networks, Rio de Janeiro, Brazil, 8–13 July 2018; pp. 1–7. [Google Scholar]
  9. Ruocco, M.; Skrede, O.S.L.; Langseth, H. Inter-session modeling for session-based recommendation. In Proceedings of the 2nd Workshop on Deep Learning for Recommender Systems, Como, Italy, 27 August 2017; pp. 24–31. [Google Scholar]
  10. Quadrana, M.; Karatzoglou, A.; Hidasi, B.; Cremonesi, P. Personalizing session-based recommendations with hierarchical recurrent neural networks. In Proceedings of the 11th ACM Conference on Recommender Systems, Como, Italy, 27–31 August 2017; pp. 130–137. [Google Scholar]
  11. Phuong, T.M.; Thanh, T.C.; Bach, N.X. Combining user-based and session-based recommendations with recurrent neural networks. In Proceedings of the International Conference on Neural Information Processing, Montréal, QC, Canada, 3–8 December 2018; Springer: Cham, Switzerland, 2018; pp. 487–498. [Google Scholar]
  12. Song, Y.; Lee, J.G. Augmenting recurrent neural networks with high-order user-contextual preference for session-based recommendation. arXiv 2018, arXiv:1805.02983. [Google Scholar]
  13. Qu, Y.; Cai, H.; Ren, K.; Zhang, W.; Yu, Y.; Wen, Y.; Wang, J. Product-based neural networks for user response prediction. In Proceedings of the IEEE 16th International Conference on Data Mining, Barcelona, Spain, 12–15 December 2016; pp. 1149–1154. [Google Scholar]
  14. Li, J.; Ren, P.; Chen, Z.; Ren, Z.; Lian, T.; Ma, J. Neural attentive session-based recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; ACM: New York, NY, USA, 2017; pp. 1419–1428. [Google Scholar]
  15. Wu, S.; Tang, Y.; Zhu, Y.; Wang, L.; Xie, X.; Tan, T. Session-based recommendation with graph neural networks. Aaai Conf. Artif. Intell. 2019, 33, 346–353. [Google Scholar] [CrossRef]
  16. Qiu, R.; Li, J.; Huang, Z.; Yin, H. Rethinking the item order in session-based recommendation with graph neural networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; ACM: New York, NY, USA, 2019; pp. 579–588. [Google Scholar]
  17. Liu, G.; Huang, S.; Long, A.; Wang, Y.; Zhu, X. Research on intelligent recommendation method of product design knowledge based on hypergraph network. Appl. Res. Comput. 2022, 39, 2962–2967. [Google Scholar]
  18. Xia, X.; Yin, H.; Yu, J.; Wang, Q.; Cui, L.; Zhang, X. Self-supervised hypergraph convolutional networks for session-based recommendation. Proc. Aaai Conf. Artif. Intell. 2021, 35, 4503–4511. [Google Scholar] [CrossRef]
  19. Liu, Q.; Zeng, Y.; Mokhosi, R.; Zhang, H. STAMP: Short-term attention/memory priority model for session-based recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; ACM: New York, NY, USA, 2018; pp. 1831–1839. [Google Scholar]
  20. Guo, Y.; Ling, Y.; Chen, H. A Time-aware graph neural network for session-based recommendation. IEEE Access 2020, 8, 167371–167382. [Google Scholar] [CrossRef]
  21. Xia, X.; Yin, H.; Yu, J.; Shao, Y.; Cui, L. Self-supervised graph co-training for session-based recommendation. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual, 1–5 November 2021; pp. 2180–2190. [Google Scholar]
  22. Guangbin, B.; Gangle, L.; Guoxiong, W. Bimodal interactive attention for Multimodal sentiment analysis. J. Front. Comput. Sci. Technol. 2022, 16, 909–916. [Google Scholar]
  23. Zhang, W. Research on Recommendation Algorithm Based on Deep Learning and Matrix Decomposition. Ph.D. Thesis, South China University of Technology, Guangzhou, China, 2020. [Google Scholar]
  24. Wu, M. Design and Implementation of Movie and TV Recommendation System Based on Collaborative Filtering. Master’s Thesis, East China Normal University (ECNU), Shanghai, China, 2022. [Google Scholar]
  25. Covington, P.; Adams, J.; Sargin, E. Deep neural networks for youtube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; pp. 191–198. [Google Scholar]
  26. Rendle, S.; Freudenthaler, C.; Schmidt-Thieme, L. Factorizing Personalized Markov Chains for Next-Basket Recommendation. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010; pp. 811–820. [Google Scholar]
  27. Le, D.-T.; Fang, Y.; Lauw, H.W. Modeling sequential preferences with dynamic user and context factors. In Proceedings of the Machine Learning and Knowledge Discovery in Databases: European Conference, Riva del Garda, Italy, 19–23 September 2016; pp. 145–161. [Google Scholar]
  28. Tang, J.; Wang, K. Personalized Top-N Sequential Recommendation via Convolutional Sequence Embedding. In Proceedings of the 11th ACM International Conference on Web Search and Data Mining, Marina Del Rey, CA, USA, 5–9 February 2018; pp. 565–573. [Google Scholar]
  29. Tan, Y.K.; Xu, X.; Liu, Y. Improved Recurrent Neural Networks for Session-Based Recommendations. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, Boston, MA, USA, 15 September 2016; pp. 17–22. [Google Scholar]
  30. Jannach, D.; Ludewig, M. When Recurrent Neural Networks meet the Neighborhood for Session-Based Recommendation. In Proceedings of the 11th ACM Conference on Recommender Systems, Como, Italy, 27–31 August 2017; pp. 306–310. [Google Scholar]
  31. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is All You Need. In Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 6000–6010. [Google Scholar]
  32. Kang, W.; McAuley, J.J. Self-Attentive Sequential Recommendation. In Proceedings of the 18th IEEE International Conference on Data Mining, Singapore, 17–20 November 2018; pp. 197–206. [Google Scholar]
  33. Chen, T.; Wong, R.C.-W. Handling Information Loss of Graph Neural Networks for Session-Based Recommendation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual, 6–10 July 2020; pp. 1172–1180. [Google Scholar]
  34. Wang, H.; Zeng, Y.; Chen, J.; Zhao, Z.; Chen, H. A Spatiotemporal Graph Neural Network for Session-Based Recommendation. Expert Syst. Appl. 2022, 202, 117114. [Google Scholar] [CrossRef]
  35. Xu, C.; Zhao, P.; Liu, Y.; Sheng, V.S.; Xu, J.; Zhuang, F.; Fang, J.; Zhou, X. Graph Contextualized Self-Attention Network for Session-Based Recommendation. In Proceedings of the 28th International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; pp. 3940–3946. [Google Scholar]
  36. Huang, C.; Chen, J.; Xia, L.; Xu, Y.; Dai, P.; Chen, Y.; Bo, L.; Zhao, J.; Huang, J.X. Graph-Enhanced Multi-Task Learning of Multi-Level Transition Dynamics for Session-based Recommendation. In Proceedings of the 35th AAAI Conference on Artificial Intelligence, Virtual, 2–9 February 2021; pp. 4123–4130. [Google Scholar]
  37. Pan, Z.; Cai, F.; Chen, W.; Chen, H.; De Rijke, M. Star Graph Neural Networks for Session-Based Recommendation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual, 19–23 October 2020; pp. 1195–1204. [Google Scholar]
  38. Deng, Z.H.; Wang, C.D.; Huang, L.; Lai, J.H.; Philip, S.Y. G3SR: Global Graph Guided Session-based Recommendation. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 9671–9684. [Google Scholar] [CrossRef] [PubMed]
  39. Wang, Z.; Wei, W.; Cong, G.; Li, X.L.; Mao, X.L.; Qiu, M. Global Context Enhanced Graph Neural Networks for SessionBased Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Xi’an, China, 25–30 July 2020; pp. 169–178. [Google Scholar]
  40. Hjelm, R.D.; Fedorov, A.; Lavoie-Marchildon, S.; Grewal, K.; Bachman, P.; Trischler, A.; Bengio, Y. Learning deep representations by mutual information estimation and maximization. In Proceedings of the 7th International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019; pp. 8574–8583. [Google Scholar]
  41. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, Virtual, 13–18 July 2020; pp. 1597–1607. [Google Scholar]
  42. Zhou, K.; Wang, H.; Zhao, W.X.; Zhu, Y.; Wang, S.; Zhang, F.; Wang, Z.; Wen, J.R. S3-Rec: Self-Supervised Learning for Sequential Recommendation with Mutual Information Maximization. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual, 19–23 October 2020; pp. 1893–1902. [Google Scholar]
  43. Yao, T.; Yi, X.; Cheng, D.Z.; Yu, F.; Chen, T.; Menon, A.; Hong, L.; Chi, E.H.; Tjoa, S.; Kang, J.; et al. Self-Supervised Learning for Large-Scale Item Recommendations. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Virtual, 1–5 November 2021; pp. 4321–4330. [Google Scholar]
  44. Wu, J.; Wang, X.; Feng, F.; He, X.; Chen, L.; Lian, J.; Xie, X. Self-Supervised Graph Learning for Recommendation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Montréal, QC, Canada, 11–15 July 2021; pp. 726–735. [Google Scholar]
  45. Greenstein-Messica, A.; Rokach, L. Personal price aware multi-seller recommender system: Evidence from eBay. Knowl. Based Syst. 2018, 150, 14–26. [Google Scholar] [CrossRef]
  46. Zhang, X.; Xu, B.; Yang, L.; Li, C.; Ma, F.; Liu, H.; Lin, H. Price DOES Matter! Modeling Price and Interest Preferences in Session-based Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; Volume 7, pp. 1684–1693. [Google Scholar]
  47. Wu, X.; He, R.; Hu, Y.; Sun, Z. Learning an evolutionary embedding via massive knowledge distillation. Int. J. Comput. Vis. 2020, 128, 2089–2106. [Google Scholar] [CrossRef]
Figure 1. Absolute prices are discretized into different price classes based on categories (adapted from ref. [46]).
Figure 2. Construction of heterogeneous hypergraphs and weighted line graphs.
Figure 3. Proposed MGMI-CEHCN model.
Figure 4. The effect of the number of channels of multiple interests.
Figure 5. The effect of parameter size on contrastive learning.
Figure 6. Performance of MGMI-CEHCN for different numbers of price levels.
Table 1. Dataset characteristics after preprocessing (adapted from ref. [46]).
Dataset | #Items | #Price Levels | #Categories | #Interactions | #Sessions | Avg. Length
Cosmetics | 23,194 | 10 | 301 | 1,058,263 | 156,922 | 6.74
Diginetica-buy | 24,889 | 100 | 721 | 855,070 | 187,540 | 4.56
Amazon | 9,114 | 50 | 613 | 487,701 | 204,036 | 2.39
Table 2. Performance comparison of different models on the public datasets (unit: %) (adapted from ref. [46]): (a) Cosmetics, (b) Diginetica-buy, (c) Amazon. The best result in each column is underlined.
Method | P@10 | MRR@10 | P@20 | MRR@20

(a) Cosmetics
S-POP | 32.83 | 26.63 | 38.43 | 27.32
SKNN | 40.22 | 30.40 | 47.63 | 30.80
GRU4Rec | 19.41 | 14.43 | 21.80 | 14.60
NARM | 42.63 | 34.17 | 46.29 | 34.52
SR-GNN | 44.11 | 34.59 | 48.01 | 34.96
LESSR | 38.80 | 24.45 | 46.32 | 24.97
S2-DHCN | 40.48 | 32.86 | 47.95 | 33.13
COHHN | 47.88 | 36.38 | 53.56 | 36.79
MGMI-CEHCN | 49.38 | 37.86 | 55.25 | 38.26
Improv. (%) | 3.13 | 4.07 | 3.06 | 3.99

(b) Diginetica-buy
S-POP | 25.51 | 18.82 | 25.91 | 19.84
SKNN | 45.68 | 20.24 | 55.76 | 21.10
GRU4Rec | 22.04 | 11.32 | 27.88 | 11.73
NARM | 46.56 | 21.76 | 57.34 | 23.27
SR-GNN | 45.74 | 21.32 | 56.80 | 22.87
LESSR | 47.88 | 20.82 | 61.35 | 22.64
S2-DHCN | 45.89 | 21.08 | 54.91 | 22.03
COHHN | 50.57 | 24.81 | 64.02 | 25.76
MGMI-CEHCN | 52.25 | 26.56 | 65.60 | 27.47
Improv. (%) | 3.32 | 7.05 | 2.47 | 6.64

(c) Amazon
S-POP | 34.60 | 31.96 | 38.03 | 32.19
SKNN | 61.55 | 46.07 | 64.23 | 46.30
GRU4Rec | 55.43 | 51.43 | 56.41 | 51.70
NARM | 63.21 | 57.07 | 65.38 | 57.23
SR-GNN | 65.32 | 57.46 | 65.83 | 57.89
LESSR | 62.48 | 56.53 | 64.18 | 56.69
S2-DHCN | 58.67 | 49.86 | 60.47 | 50.03
COHHN | 65.32 | 58.78 | 67.69 | 59.01
MGMI-CEHCN | 65.81 | 59.39 | 68.39 | 59.59
Improv. (%) | 0.75 | 1.03 | 1.03 | 0.98
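The Improv. rows report the relative gain of MGMI-CEHCN over the strongest baseline (COHHN), expressed as a percentage of the baseline's score. A minimal sketch of that calculation (the function name is illustrative, not from the paper):

```python
def relative_improvement(ours: float, baseline: float) -> float:
    """Relative gain of `ours` over `baseline`, in percent, rounded to 2 d.p."""
    return round((ours - baseline) / baseline * 100, 2)

# e.g., P@10 on Diginetica-buy: MGMI-CEHCN 52.25 vs. COHHN 50.57
print(relative_improvement(52.25, 50.57))  # 3.32
```

For instance, the 7.05% MRR@10 improvement on Diginetica-buy follows from (26.56 − 24.81) / 24.81.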
Table 3. Results of the MGMI-CEHCN model ablation experiments (unit: %).
Model | Cosmetics (P@20 / MRR@20) | Diginetica-buy (P@20 / MRR@20) | Amazon (P@20 / MRR@20)
MGMI-CEHCN-C | 54.85 / 37.86 | 65.01 / 26.96 | 68.12 / 59.43
MGMI-CEHCN-P | 54.12 / 37.32 | 64.88 / 26.38 | 67.95 / 59.26
MGMI-CEHCN-M-G | 53.88 / 37.05 | 64.42 / 25.96 | 67.89 / 59.11
MGMI-CEHCN-M-I | 53.72 / 37.02 | 64.28 / 26.05 | 68.32 / 59.48
MGMI-CEHCN-L | 54.35 / 37.55 | 64.95 / 26.85 | 68.01 / 59.22
MGMI-CEHCN | 55.25 / 38.26 | 65.60 / 27.47 | 68.39 / 59.59
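Both result tables use P@K (whether the target item appears among the top-K recommendations) and MRR@K (the reciprocal rank of the target item, zero if it falls outside the top K), averaged over all test sessions. A minimal per-session sketch (function names are illustrative, not from the paper):

```python
def precision_at_k(ranked: list, target, k: int = 20) -> float:
    # P@K for one session: 1 if the target item appears in the top-K list.
    return 1.0 if target in ranked[:k] else 0.0

def mrr_at_k(ranked: list, target, k: int = 20) -> float:
    # MRR@K for one session: reciprocal of the target's 1-based rank, else 0.
    top_k = ranked[:k]
    return 1.0 / (top_k.index(target) + 1) if target in top_k else 0.0

ranked = ["item_b", "item_a", "item_c"]   # a model's ranked recommendations
print(precision_at_k(ranked, "item_a"))   # 1.0 (target is in the top 20)
print(mrr_at_k(ranked, "item_a"))         # 0.5 (target ranked second)
```

The dataset-level scores in Tables 2 and 3 are these per-session values averaged over the test set and multiplied by 100.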
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Mao, X.; Li, L.; He, J. Multi-Granularity and Multi-Interest Contrast-Enhanced Hypergraph Convolutional Networks for Session Recommendation. Appl. Sci. 2024, 14, 8293. https://doi.org/10.3390/app14188293
