Article

MSD: Multi-Order Semantic Denoising Model for Session-Based Recommendations

Shulin Cheng, Wentao Huang, Zhenqiang Yu and Jianxing Zheng
1 School of Computer and Information, Anqing Normal University, Anqing 246133, China
2 The University Key Laboratory of Intelligent Perception and Computing of Anhui, Anqing 230093, China
3 School of Computer and Information Technology (School of Big Data), Shanxi University, Taiyuan 030006, China
* Author to whom correspondence should be addressed.
Electronics 2024, 13(16), 3118; https://doi.org/10.3390/electronics13163118
Submission received: 13 July 2024 / Revised: 4 August 2024 / Accepted: 5 August 2024 / Published: 7 August 2024

Abstract
Session-based recommendation, which aims to predict subsequent user–item interactions from the behavior recorded in an anonymous session, is a challenging task. Two main challenges need to be addressed: (1) how to analyze a session so that the user's preferences are captured accurately and completely, and (2) how to identify and eliminate the interference caused by noisy behavior. Existing methods have not adequately addressed these issues: they either neglect the valuable information carried by consecutive groups of items or fail to handle the noisy data in sessions properly, both of which prevent recommendation systems from capturing users' real intentions. To address these two problems, we designed a multi-order semantic denoising (MSD) model for session-based recommendations. Specifically, we grouped consecutive items of different lengths into multi-order semantic units to mine the user's primary intentions from multiple dimensions. Meanwhile, a novel denoising network was designed to alleviate the interference of noisy behavior and provide a more precise session representation. Extensive experiments on three real-world datasets demonstrated that the proposed MSD model outperforms existing state-of-the-art methods for session-based recommendation.

1. Introduction

A recommendation system is designed to help people filter information and make efficient choices. Collaborative filtering is the main method used in traditional recommendation systems, providing content for users with specific interests by analyzing their historical behavior. However, in many scenarios, user profiles are unavailable because users are not logged into the system or because of privacy concerns. Moreover, user preferences can change over time, making short-term user behavior more reflective of current user preferences [1]. In such cases, session-based recommendation has garnered growing attention owing to its considerable practical value. Session-based recommendation is one of the most important methods in modern recommendation systems; it capitalizes on the analysis of a user's immediate interaction behavior within a given session, such as the sequence of items clicked or viewed. This approach excels when user profiles are unavailable or incomplete by dynamically adapting recommendations to the real-time data observed during a user's active session, making it particularly valuable for delivering personalized content and suggestions that are closely aligned with the user's immediate preferences and needs.
Considering the sequential nature of sessions, various methods have been proposed for session-based recommendation, such as methods based on Markov chains [2,3], RNN-based models [4,5,6], and GNN-based models [7,8]. Despite the advancements brought about by these approaches, two challenges have not yet received enough attention. On the one hand, existing methods usually consider each item in isolation, without taking into account the semantic information that arises from combining multiple items. On the other hand, noisy behavior, such as that illustrated in Figure 1, is inevitable in recommendation systems and can cause models to deviate from users' primary intentions.
Unfortunately, current methods have not tackled these challenges effectively. For higher-order semantic information, recent work has mainly used hypergraphs [9,10] to represent relationships between items. Although hypergraph-based methods can facilitate the propagation of information along higher-order connections, the underlying node structure remains unchanged; we argue that nodes in session graphs should not be limited to representing single items. For noisy behavior, reinforcement learning [11] and attention mechanisms [12] are the most frequently used remedies. To handle the uncertainty in user behavior, an attention mechanism is typically employed to mitigate the influence of irrelevant items by assigning them lower weights. While this alleviates the noise issue to some extent, a main limitation remains: the item serving as the reference for the attention mechanism can itself be noisy data.
To tackle these two challenges, we designed an innovative model to accurately capture user intentions from two perspectives. Firstly, we proposed a new kind of multi-order semantic unit for analyzing user intentions. In previous methods, researchers used each individual item as a basic unit for this purpose. In our method, we combined consecutive items of different lengths as varying multi-order semantic units. Meanwhile, we improved the method for generating item embeddings, making it possible for information to propagate between semantic units of different lengths. Secondly, we developed a novel denoising module to explicitly differentiate and handle noisy items. Specifically, we proposed to mine the users’ main intentions in the session as the reference to calculate the importance of every semantic unit, and then assigned corresponding weights to each semantic unit based on their importance. Finally, we constructed a pure session embedding using the denoised session sequences.
The contributions of this study can be summarized as follows:
  • We propose a new kind of multi-order semantic unit with varying node lengths to mine user intentions from multiple dimensions and improve the information propagation strategy to fuse information from different semantic units.
  • We introduce a novel denoising module based on the varying multi-order semantic units to filter out noisy items and construct a pure session embedding by adjusting the weight of each semantic unit.
  • Experimental analysis conducted on three real-world benchmark datasets revealed that our proposed multi-order semantic denoising (MSD) model achieved better performance than other state-of-the-art models in terms of the two evaluation metrics.

2. Related Work

In this section, we review related work on session-based recommendation systems from two perspectives, namely traditional methods and deep-learning methods.

2.1. Traditional Methods

Conventional methods used in recommendation systems have widely adopted matrix factorization [13,14,15] to learn the characteristic vectors of both users and items from the user–item interaction matrix and to generate recommendation results based on vector similarity. To take advantage of the implicit feedback in user behavior, Bayesian personalized ranking matrix factorization (BPR-MF) [16] made use of a generic optimization criterion for personalized ranking, providing a generic learning algorithm for optimizing models. Inspired by the idea of collaborative filtering, that is, that similar users tend to have similar interests in similar items, several methods based on the nearest neighbor algorithm [17,18,19] have been proposed to improve the performance of recommendation systems. Item-KNN [17] could be used to calculate the item similarity in a session and provide recommendations based on the last item in the current session. Owing to the lack of knowledge regarding sequential information, Markov-chain-based methods [3,20] were proposed to capture sequential dependencies in a session. For example, factorizing personalized Markov chains (FPMC) [3] could capture long-term user interests and sequential user behavior to provide more accurate recommendations. The hierarchical representation model (HRM) [20] comprised a two-level hierarchical structure to fuse user preferences and sequence information. However, these traditional methods could not explore the complex transformation relationships between items within a session in depth; that is, these methods assumed that all items in a session were equally important for the analysis of user preferences, which was inconsistent with the actual situation. Given the aforementioned problems associated with traditional methods, researchers have begun to introduce deep-learning methods to session-based recommendation systems.

2.2. Deep-Learning Methods

In recent years, deep learning has been widely used in various research fields as well as in recommendation systems. The accuracy, precision, and impact of personalized recommendations based on deep learning have greatly improved, especially in session-based recommendation systems. Neural-based methods such as RNNs [21] have been widely adopted to capture sequential information in sessions. GRU4Rec [4] introduced an RNN into the field of session-based recommendations, using gated recurrent units [22] to model sequential user behavior. Subsequently, several methods [6,23] have been proposed to compensate for the deficiencies of GRU4Rec. Some methods improved the RNN model through data enhancement [6], while another method [23] proposed a hierarchical RNN model that used both intra- and cross-session information to describe the changes in user interests to generate personalized recommendations. The neural attentive recommendation machine (NARM) [5] introduced an attention mechanism for GRU4Rec that could capture the users’ main intentions. Inspired by the NARM, other methods have applied attention mechanisms to enhance models [9,24,25]. The short-term attention/memory priority model (STAMP) [24] focused on strengthening the impact of users’ recent behavior when modeling user click behavior over a long time series. BERT4Rec [26] used a bidirectional self-attention network to improve the representation ability of the model. Recently, researchers have found that graph structures can contribute to the exploration of the relationships between items, providing supplementary information that is not available in sequential structures. Consequently, methods based on session graphs have been widely adopted for session-based recommendation systems.
The GNN [27,28] was introduced to model complex dependencies between items by building a session graph. Session-based recommendation with graph neural network (SR-GNN) [7] was the first method to encode sequential session data into a graph structure and apply a gated graph neural network to extract transition information. To solve the problem of information loss in GNNs, the lossless edge-order preserving aggregation and shortcut graph attention for session-based recommendation method (LESSR) [29] adjusted the network structure to mine the users' consumption habits during a certain period of time. Star graph neural networks with highway networks (SGNN-HN) [25] used star graphs to propagate information from items without direct connections. Although these studies achieved promising results, there was still a problem to be solved; that is, there were always noisy clicks in the sessions, which meant that not all consecutive items had dependencies on one another. Consequently, several methods have been proposed to handle the problem of noisy clicks [11,12,30]. The reinforcement learning-based denoising network (RED) [11] transformed the sequential denoising problem into a Markov decision process (MDP), applying reinforcement learning to separate session sequences. The graph neural network with global noise filtering (GNN-GNF) [30] filtered noisy clicks at an item level and at a session level. The dynamic intent-aware iterative denoising network (DIDN) [12] included an iterative denoising module by learning dynamic item embedding. These methods reduced noisy-click interference on the model to some extent, but none of them could filter noisy clicks from multiple dimensions. In this study, our proposed model adopted semantic units as the basic nodes to construct a novel denoising network which could discriminate and remove noisy clicks based on both individual and combined expressions of items, providing more accurate and comprehensive recommendation results.

3. Methodology

3.1. Problem Definition

Session-based recommendations aim to predict a user's next click based on the sequence of items with which the user has interacted. In contrast to sequential recommendations, users are anonymous (without profile information) in session-based recommendations. Here, we let $I = \{v_1, v_2, v_3, \dots, v_n\}$ denote all items in the dataset and $V_s = \{v_{s,1}, v_{s,2}, v_{s,3}, \dots, v_{s,l}\}$ denote the current anonymous session $s$, where $n$ and $l$ denote the number of items in the dataset and in the session, respectively. The proposed MSD model produces a set of candidate items $C = \{v_{r,1}, v_{r,2}, v_{r,3}, \dots, v_{r,k}\}$, in which items with higher ranks are more likely to be clicked by the user. In the proposed model, we select the top $N$ items from set $C$ as the recommendation results.
To facilitate the description of the content, Table 1 lists the key notations used in this paper.

3.2. Model Framework

Our model design is tailored to the two challenges presented in the introduction. Firstly, the products a user browses on an e-commerce platform are not isolated data points; together they form the user's browsing path, which can reveal the user's underlying purchase intentions and preferences. Combining the browsed products into semantic units of varying lengths is an attempt to capture and utilize this supplemental intention information. Semantic units can be constructed based on the similarity of products (such as category, brand, or price) or defined based on the sequence and time intervals of user browsing. Semantic units of different lengths can reflect different aspects of user behavior: short sequences may represent immediate interests, while long sequences may reveal long-term preferences or shopping habits. Secondly, users navigating an e-commerce platform often explore products out of curiosity, without a clear intention to purchase, and such behavior does not align with their primary interests or needs at that moment. Attention mechanisms can help to identify noisy items in user sessions: by weighting the user's browsing behaviors, the model can concentrate attention on products that are relevant to the user's actual interests, reducing the impact of noisy items on the recommendation results.
To address these challenges and improve the performance of session-based recommendation, we designed the framework of our proposed MSD model, which contains three key modules: a multi-order semantic construction module, a denoising module, and a prediction module. As shown in Figure 2, the upper part is the multi-order semantic construction module. In this module, we combine items of different lengths into semantic units and convert them into a heterogeneous graph. Then, we use two types of fusion functions to generate the embedding of each semantic unit. For example, in Figure 2, we first divide and merge the items in the session into 3-order semantic units, and then employ the set-aware and sequence-aware fusion functions to generate the embeddings. The bottom left part is the denoising module. In this module, we employ an attention mechanism to judge the similarity between the different semantic units and the main intention in order to screen out noisy nodes. The biggest difference from traditional attention mechanisms, which use the last item of a session as the reference, is that we first mine the user's main intention from the multi-order semantic units. We argue that the last item in the session could itself be noisy data; moreover, using only the last item may not capture the user's evolving preferences and interests over the course of the session, leading to a bias towards recent items. Using the main intention as the reference avoids both issues. The bottom right part of Figure 2 is the prediction module. In this module, we employ a prediction layer to compare the correlation between the session embedding and the embedding of each item, selecting the top 20 items with the highest correlation as our recommendation results. By integrating results from different orders, we can take into account semantic information across various dimensions and generate more diverse recommendation outcomes.
Next, we will elaborate on the technical details of the three modules mentioned above in Section 3.3, Section 3.4 and Section 3.5.

3.3. Multi-Order Semantic Unit Construction Module

In this section, we demonstrate how to convert item nodes into semantic units and how to initialize and update their representations.

3.3.1. Multi-Order Semantic Unit Construction

Currently, most session-based recommendation methods consider each item separately, so high-level semantic information [31] is ignored. In our model, we not only consider the correlation between individual items but also focus on the combined information revealed by several consecutive items. We define $v_j^k$ to be a semantic unit, where $k$ represents its order and $j$ represents its position. For example, in Figure 2 the session is $S_1 = \{v_1, v_2, v_3, v_2, v_4, v_3, v_1\}$. From it we can obtain the first-order semantic units $v_1^1 = \{v_1\}, v_2^1 = \{v_2\}, v_3^1 = \{v_3\}, \dots, v_7^1 = \{v_1\}$. The combined session fragments $\{v_1, v_2\}, \{v_2, v_3\}, \dots, \{v_3, v_1\}$ are the second-order semantic units, denoted by $v_1^2, v_2^2, \dots, v_6^2$, and the third-order semantic units are $v_1^3 = \{v_1, v_2, v_3\}$, $v_2^3 = \{v_2, v_3, v_2\}$, $v_3^3 = \{v_3, v_2, v_4\}$, ..., $v_5^3 = \{v_4, v_3, v_1\}$.
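The grouping above is a sliding window over the session for each order. The following minimal sketch (not the authors' released code; the function name is illustrative) reproduces the example enumeration:

```python
# A minimal sketch of grouping consecutive items into k-order semantic units,
# following the example session in the text.

def build_semantic_units(session, max_order):
    """Return {order k: list of k-order semantic units (tuples of item ids)}."""
    units = {}
    for k in range(1, max_order + 1):
        # sliding window of length k over the session (positions start at 1 in the paper)
        units[k] = [tuple(session[j:j + k]) for j in range(len(session) - k + 1)]
    return units

session = ["v1", "v2", "v3", "v2", "v4", "v3", "v1"]
units = build_semantic_units(session, max_order=3)
print(units[1])  # 7 first-order units: ('v1',), ('v2',), ...
print(units[2])  # 6 second-order units: ('v1', 'v2'), ('v2', 'v3'), ...
print(units[3])  # 5 third-order units: ('v1', 'v2', 'v3'), ..., ('v4', 'v3', 'v1')
```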
During node initialization, we assign a learnable embedding vector $e_j^1$ to each first-order semantic unit. A high-order semantic unit is initialized by fusing its first-order semantic units. Specifically, we designed a two-part fusion function to aggregate the information revealed by the included items. The items in each high-order semantic unit are sequential; that is, two high-order semantic units may contain the same items while the positions of those items differ. We therefore integrate item information in two ways, using a Set-aware function and a Sequence-aware function. In the Set-aware function, we use the MEAN function to integrate the dimension information, whereas in the Sequence-aware function, we use a gated recurrent unit (GRU) to extract the position information. For a higher-order semantic unit, the final intention can be calculated as follows:
$e_j^k = F_{set}(e_j^1, e_{j+1}^1, e_{j+2}^1, \dots, e_{j+k-1}^1) + F_{seq}(e_j^1, e_{j+1}^1, e_{j+2}^1, \dots, e_{j+k-1}^1)$ (1)
where $e_j^k$ denotes the learnable embedding vector of the $k$-order semantic unit, and $F_{set}$ and $F_{seq}$ denote the Set-aware and Sequence-aware functions, respectively.
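A hedged PyTorch sketch of Equation (1) is given below: the embedding of a k-order unit is the sum of a Set-aware (mean) term and a Sequence-aware (GRU) term. The module and parameter names are illustrative, not taken from the released code.

```python
import torch
import torch.nn as nn

class SemanticUnitFusion(nn.Module):
    def __init__(self, dim):
        super().__init__()
        # F_seq: a GRU over the ordered first-order embeddings inside the unit
        self.gru = nn.GRU(dim, dim, batch_first=True)

    def forward(self, item_embs):
        # item_embs: (batch, k, dim) first-order embeddings e_j^1 ... e_{j+k-1}^1
        set_part = item_embs.mean(dim=1)     # F_set: order-insensitive MEAN
        _, h_n = self.gru(item_embs)         # F_seq: order-sensitive GRU
        seq_part = h_n.squeeze(0)            # last hidden state, (batch, dim)
        return set_part + seq_part           # e_j^k, Eq. (1)

fusion = SemanticUnitFusion(dim=256)
e = fusion(torch.randn(4, 3, 256))           # four 3-order units -> (4, 256)
```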

3.3.2. Semantic Unit Embedding Learning

Before learning the session representation, we constructed a heterogeneous session graph for each session. Each session can be modeled as a directed graph $G = (V_s, E_s)$, where $V_s$ and $E_s$ denote the semantic units and the edges between them, respectively. An example of a heterogeneous session graph is shown in Figure 2; there are two edge types in this graph, namely the intra-order edge and the inter-order edge. Intra-order edges connect semantic units of the same order, enabling direct interaction and information exchange within the same level of semantic granularity, whereas inter-order edges provide a channel for intention propagation between high-order semantic units and their adjacent first-order semantic units.
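The sketch below shows one plausible way to materialize the two edge types described above; the exact wiring used by the authors may differ, so treat this purely as an illustration (the function name and 0-based indexing are ours).

```python
# Build intra-order and inter-order edge lists over nodes identified as (order k, position j).

def build_hetero_edges(session_len, max_order):
    intra, inter = [], []
    for k in range(1, max_order + 1):
        n_units = session_len - k + 1
        # intra-order: directed edges between consecutive semantic units of the same order
        intra += [((k, j), (k, j + 1)) for j in range(n_units - 1)]
        if k > 1:
            # inter-order: each k-order unit exchanges intention with the
            # adjacent first-order units it covers
            for j in range(n_units):
                inter += [((1, j + t), (k, j)) for t in range(k)]
    return intra, inter

intra_edges, inter_edges = build_hetero_edges(session_len=7, max_order=3)
```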
As the network evolves, the model constantly encounters unseen nodes. Inspired by the GraphSAGE framework [32], we employed a heterogeneous graph convolution network to learn the representation of each semantic unit. Node embeddings are generated by aggregating information from the node and its neighbors. Specifically, for node $v$, we update its representation as follows: first, we aggregate the representations of $v$'s immediate neighbors into a single vector $A_{N(v)}^{k-1}$. After aggregating the neighbors' feature vectors, we concatenate the node's current representation $h_v^{k-1}$ with the aggregated neighborhood vector $A_{N(v)}^{k-1}$ and feed the result through a fully connected layer with a nonlinear activation function $\sigma$. Given a neighbor set $N(v)$, the aggregation mechanism for updating node $v$ can be expressed as follows:
$A_{N(v)}^{k-1} = \mathrm{aggregate}(\{h_j^{k-1}, j \in N(v)\})$ (2)
$h_v^k = \sigma\big(W^k \cdot \mathrm{concat}(A_{N(v)}^{k-1}, h_v^{k-1})\big)$ (3)
$h_v^k = \mathrm{Norm}(h_v^k)$ (4)
$h' = \dfrac{h - h_{min}}{h_{max} - h_{min}}$ (5)
where $W^k \in \mathbb{R}^{d \times d}$ denotes the learnable, layer-specific projection weights, $h_v^0 = e_v^0$ denotes the initial representation of the semantic unit $v$, and $\mathrm{Norm}$, which is used to accelerate model convergence, is the min-max normalization given in Equation (5).
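A minimal PyTorch sketch of Equations (2)-(5) follows. The choice of mean aggregation, the ReLU activation, per-node min-max normalization, and the layer/name conventions are our assumptions; GraphSAGE admits other aggregators and the paper does not pin these details down.

```python
import torch
import torch.nn as nn

class SAGELayer(nn.Module):
    def __init__(self, dim):
        super().__init__()
        self.proj = nn.Linear(2 * dim, dim)   # plays the role of W^k on concat(A_N(v), h_v)
        self.act = nn.ReLU()                  # nonlinear activation sigma

    def forward(self, h, adj):
        # h: (num_nodes, dim) node states; adj: (num_nodes, num_nodes) 0/1 adjacency
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        a_neigh = adj @ h / deg                                        # Eq. (2): mean of neighbours
        h_new = self.act(self.proj(torch.cat([a_neigh, h], dim=-1)))   # Eq. (3)
        h_min = h_new.min(dim=-1, keepdim=True).values                 # Eqs. (4)-(5): per-node
        h_max = h_new.max(dim=-1, keepdim=True).values                 # min-max normalization
        return (h_new - h_min) / (h_max - h_min + 1e-8)

layer = SAGELayer(dim=256)
h = layer(torch.randn(12, 256), (torch.rand(12, 12) > 0.6).float())
```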

3.4. Denoising Module

The denoising process was completed in three steps. First, we extracted the user’s main intentions from the session as a criterion to determine which items tended to be noisy clicks. We then used the soft-attention mechanism [33] to measure the correlation between each semantic unit and the main intention. Finally, we performed noise reduction based on the attention coefficients.

3.4.1. Main Intention

The first step of the denoising module is to find the approximate direction in which users are searching, which we call the main intention. Considering that user intentions may change in a session, much like the initialization of higher-order semantic units, both the items contained in the session and their relative positions need to be handled carefully. We used two methods to extract these two pieces of information separately. Given session S, the main intention can be expressed as follows:
$e_m^k = C_1 \cdot F_{set}(e_1^k, e_2^k, e_3^k, \dots, e_n^k) + C_2 \cdot F_{seq}(e_1^k, e_2^k, e_3^k, \dots, e_n^k)$ (6)
where $C_1$ and $C_2$, with $C_1 + C_2 = 1$, are weight coefficients learned by the network that determine the importance of the two types of information, $n$ denotes the number of semantic units in each order, $e_m^k$ denotes the primary intention of the $k$-order semantic units, and $F_{set}$ and $F_{seq}$ denote the Set-aware and Sequence-aware functions, respectively.

3.4.2. Attention Coefficient

To measure the importance of each semantic unit in a session, we employ an attention mechanism to calculate the relevance between each semantic unit and the main intention. For the $k$-th order, the attention coefficient of each semantic unit $v_j^k$ can be expressed as follows:
$a_j^k = W_0^k \, \sigma(W_1^k h_j^k + W_2^k e_m^k + b^k)$ (7)
where $W_0^k \in \mathbb{R}^d$, $W_1^k \in \mathbb{R}^{d \times d}$, and $W_2^k \in \mathbb{R}^{d \times d}$ denote learnable parameters that are not shared between different orders, $b^k \in \mathbb{R}^d$ denotes the bias, and $\sigma(\cdot)$ denotes the sigmoid function.

3.4.3. Denoising Coefficient

Based on the attention coefficients calculated using Equation (7), we can assign a normalized weight to each semantic unit to generate the final session representation. Before normalization, we first apply a threshold to remove semantic units whose attention coefficients fall far below the average level, thereby eliminating the interference of items that clearly do not represent the user's interest, as follows:
$a_j^k = \begin{cases} a_j^k, & a_{avg}^k - a_j^k < \delta \\ 0, & a_{avg}^k - a_j^k \geq \delta \end{cases}$ (8)
where $\delta$ denotes a parameter controlling the denoising depth, and the average attention coefficient is calculated as $a_{avg}^k = \frac{1}{n}\sum_{j=1}^{n} a_j^k$.
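The sketch below strings the three denoising steps together (Equations (6)-(8)): form the main intention from all units of one order, score each unit against it with soft attention, and zero out units whose score falls too far below the per-order average. The layer names, the softmax parameterization of $(C_1, C_2)$, and the default $\delta$ are assumptions made for illustration.

```python
import torch
import torch.nn as nn

class Denoiser(nn.Module):
    def __init__(self, dim, delta=0.001):
        super().__init__()
        self.delta = delta                              # denoising depth delta
        self.gru = nn.GRU(dim, dim, batch_first=True)   # F_seq for the main intention
        self.c = nn.Parameter(torch.zeros(2))           # logits for C1, C2 (softmax keeps C1+C2=1)
        self.w1 = nn.Linear(dim, dim, bias=False)
        self.w2 = nn.Linear(dim, dim, bias=True)        # its bias plays the role of b^k
        self.w0 = nn.Linear(dim, 1, bias=False)

    def forward(self, units):
        # units: (n, dim) embeddings h_j^k of the k-order semantic units of one session
        c1, c2 = torch.softmax(self.c, dim=0)
        set_part = units.mean(dim=0)                    # F_set over all units
        _, h_n = self.gru(units.unsqueeze(0))           # F_seq over all units
        intention = c1 * set_part + c2 * h_n.squeeze()                  # Eq. (6)
        scores = self.w0(torch.sigmoid(self.w1(units)
                                       + self.w2(intention))).squeeze(-1)  # Eq. (7)
        keep = (scores.mean() - scores) < self.delta                    # Eq. (8)
        return scores * keep.float()                    # denoised coefficients a_j^k

coeffs = Denoiser(dim=256)(torch.randn(5, 256))
```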

3.5. Prediction Module

This section comprises two parts; that is, we demonstrate how to generate session embedding, which describes user behavior preferences throughout the entire session, and how to train our model.

3.5.1. Session Embedding Learning

In general, session-based recommendation systems divide the session embedding into two parts: a local preference and a global preference. The local preference focuses on the immediate actions within the session, reflecting short-term behaviors and specific contextual influences. The global preference, on the other hand, captures broader and more enduring interests that are consistent across the whole session. We adopt the same approach in our model and take the last semantic unit $v_n^k$ of each order as the local embedding, as follows:
$S_{local}^k = h_n^k$ (9)
We can use denoised semantic units to represent the global embedding, calculated as follows:
$S_{global}^k = \sum_{j=1}^{n} \mathrm{Softmax}(a_j^k)\, h_j^k$ (10)
Finally, we can combine the local and global embeddings to generate session embeddings for each order of the semantic units, as follows:
$S^k = W_3^k\,[\,S_{local}^k \,;\, S_{global}^k\,]$ (11)
where $[\,\cdot\,;\,\cdot\,]$ denotes the concatenation operation and $W_3^k$ denotes the projection matrix that transforms $S^k$ from $\mathbb{R}^{2d}$ into $\mathbb{R}^d$.
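A short sketch of Equations (9)-(11): the local embedding is the state of the last semantic unit, the global embedding is the softmax-weighted sum of the denoised units, and the two are concatenated and projected back to $d$ dimensions. The helper name and the bias-free projection are illustrative choices.

```python
import torch
import torch.nn as nn

def session_embedding(unit_embs, coeffs, proj):
    # unit_embs: (n, d) node states h_j^k; coeffs: (n,) denoised a_j^k; proj: Linear(2d, d)
    s_local = unit_embs[-1]                                                # Eq. (9)
    weights = torch.softmax(coeffs, dim=0).unsqueeze(-1)
    s_global = (weights * unit_embs).sum(dim=0)                            # Eq. (10)
    return proj(torch.cat([s_local, s_global], dim=-1))                    # Eq. (11)

proj = nn.Linear(512, 256, bias=False)   # stands in for W_3^k
s_k = session_embedding(torch.randn(5, 256), torch.randn(5), proj)
```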

3.5.2. Training and Optimization

After generating the session embeddings of all orders, that is, $S^1, S^2, S^3, \dots, S^K$, where $K$ is the maximum order, we adopt the inner product to calculate the similarity between each candidate item and the $k$-order session embedding. Specifically, given a candidate set $C = \{v_1, v_2, v_3, \dots, v_n\}$, we calculate the score of each item $v_i \in C$ as follows:
$z_i^k = \langle S^k, e_i^1 \rangle$ (12)
where $\langle \cdot , \cdot \rangle$ denotes the inner product operation. We then apply a Softmax function to calculate the probability of each item becoming the next-clicked item, as follows:
$y^k = \mathrm{Softmax}(z_1^k, z_2^k, z_3^k, \dots, z_n^k)$ (13)
The final score of each item $v_i$ in the candidate set $C$ is obtained by summing over all $K$ orders, as follows:
$\mathrm{Score}_i = \sum_{k=1}^{K} y_i^k$ (14)
Finally, we can train the proposed model using the cross-entropy loss, the loss function being as follows:
$L(p, \mathrm{Score}) = -\sum_{i=1}^{|C|} \big[\, p_i \cdot \log(\mathrm{Score}_i) + (1 - p_i) \cdot \log(1 - \mathrm{Score}_i) \,\big]$ (15)
where $p_i$ denotes the ground truth (that is, whether the user will click on item $v_i$).
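The sketch below walks through the prediction step, Equations (12)-(14): score candidates by inner product with each k-order session embedding, apply a per-order softmax, and sum the per-order probabilities. For the training objective, the paper states a cross-entropy over the candidate set (Equation (15)); as a practical stand-in we show the common multiclass form, the negative log of the normalized fused score of the ground-truth item, which is an assumption rather than the authors' exact implementation.

```python
import torch

def predict_and_loss(session_embs, item_embs, target):
    # session_embs: (K, d) one embedding per order; item_embs: (n_items, d); target: item index
    logits = session_embs @ item_embs.t()    # Eq. (12): (K, n_items) inner-product scores
    probs = torch.softmax(logits, dim=-1)    # Eq. (13): per-order next-click distributions
    scores = probs.sum(dim=0)                # Eq. (14): fuse the K orders
    # Stand-in for Eq. (15): negative log-likelihood of the normalized fused score
    loss = -torch.log(scores[target] / scores.sum() + 1e-12)
    return scores, loss

scores, loss = predict_and_loss(torch.randn(3, 256), torch.randn(1000, 256),
                                torch.tensor(7))
```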

4. Experimental Results

In this section, we describe the details of our experimental settings, including the datasets, baselines, evaluation metrics, and experimental results with comparisons. Among these, we focus on analyzing the experimental results.

4.1. Datasets

We evaluated the performance of MSD on three real-world representative datasets commonly used in the field of session-based recommendations, that is, the Diginetica, Yoochoose 1/64, and Yoochoose 1/4 datasets.
  • Diginetica is a personalized e-commerce dataset released for the CIKM Cup 2016. It contains user interaction records from an online store; the sessions of the last week were used for testing.
  • Yoochoose is a public dataset obtained from the RecSys Challenge 2015. It contains a stream of user clicks on an e-commerce website collected over six months; the most recent 1/64 and 1/4 fractions of the training sessions were used to form the Yoochoose 1/64 and Yoochoose 1/4 subsets.
Following Wu et al. [7] and Chen et al. [29], we filtered out sessions with lengths less than two and items that appeared less than five times. We also filtered items from the test dataset that did not appear in the training dataset. Additionally, the data augmentation methods described by Li et al. [5] and Liu et al. [24] were used to process the data in both datasets. The statistics of the preprocessed data are presented in Table 2.
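The sketch below illustrates this standard preprocessing pipeline (item and session filtering plus the prefix-based augmentation of Li et al. [5] and Liu et al. [24]); the thresholds follow the text, while the function name and data layout are ours.

```python
from collections import Counter

def preprocess(sessions, min_len=2, min_item_count=5):
    counts = Counter(item for s in sessions for item in s)
    kept = []
    for s in sessions:
        s = [i for i in s if counts[i] >= min_item_count]   # drop items appearing < 5 times
        if len(s) >= min_len:                               # drop sessions shorter than 2
            kept.append(s)
    # augmentation: every prefix of a session yields an (input sequence, next item) pair
    pairs = [(s[:t], s[t]) for s in kept for t in range(1, len(s))]
    return pairs

# tiny demo with a relaxed item-count threshold
pairs = preprocess([[1, 2, 3, 2], [4, 4], [5]], min_item_count=1)
# -> [([1], 2), ([1, 2], 3), ([1, 2, 3], 2), ([4], 4)]
```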

4.2. Baselines

We carefully selected three kinds of methods, namely, nearest neighbor methods, RNN-based methods, and GNN-based methods, as our baselines, their descriptions being as follows:
  • The Item-KNN model [17] recommends items that are similar to the previously clicked items in the current session, and cosine similarity is adopted to calculate the similarity between the items.
  • The GRU4Rec model [4] adopts RNNs to model the sequential behavior of items in the current session.
  • The NARM model [5] improves the GRU4Rec model by adding an attention mechanism to the RNN to capture the main purposes of users.
  • STAMP [24] captures the general interests and current interests of users by replacing the RNN encoder with an attention layer.
  • The SR-GNN model [7] encodes session sequences into a graph structure and employs a GNN to capture the complex item transitions.
  • The SGNN-HN model [25] applies an SGNN to propagate information from items without direct connections and uses an HN to tackle overfitting problems.
  • The LESSR model [29] tackles the information loss and long-range dependency problems of GNN-based models by introducing two kinds of session graphs.
  • The GNN-GNF [30] model leverages graph neural networks and global noise filtering to enhance accuracy and personalization in session-based recommendation tasks.
  • The DIDN [12] model utilizes dynamic intention awareness and iterative denoising to filter noisy items based on user intention, continuously refining recommendation results during the process to enhance personalized recommendations.

4.3. Metrics

To compare the performance of the proposed MSD method with the baseline models, we adopted two widely used metrics, namely MRR@K and P@K. To maintain the same setup as the previous baselines, we evaluated the top 20 recommended items, that is, we report MRR@20 and P@20.
P@20 (precision) measures the prediction accuracy, namely the proportion of test cases in which the correct item appears among the top 20 items of the recommendation list. MRR@20 (mean reciprocal rank) is the average reciprocal rank of the correct item, where the reciprocal rank is set to 0 if the correct item is not among the top 20.
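The two metrics can be computed as in the following sketch (a generic implementation of the definitions above; the function name is illustrative):

```python
def evaluate(ranked_lists, targets, k=20):
    """P@k: hit rate of the target in the top-k list; MRR@k: mean reciprocal rank (0 if missed)."""
    hits, rr = 0, 0.0
    for ranking, target in zip(ranked_lists, targets):
        topk = ranking[:k]
        if target in topk:
            hits += 1
            rr += 1.0 / (topk.index(target) + 1)
    n = len(targets)
    return hits / n * 100, rr / n * 100      # reported as percentages

p20, mrr20 = evaluate([[3, 7, 1], [2, 5, 9]], targets=[7, 4], k=20)
# -> (50.0, 25.0): one hit at rank 2, one miss
```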

4.4. Experimental Setup

For the proposed MSD model and all baselines, we optimized the model hyperparameters via a grid search on the validation dataset, which was randomly split from the training dataset (using 10% of it). Specifically, we set the number of embedding dimensions to 256 and the batch size to 512. We used the Adam optimizer [34] to optimize our model, and the initial learning rate was set to 0.001 and decayed every three epochs at a rate of 0.1. It should be noted that we searched for the best semantic unit order from 1 to 6 and the best denoising depth from 0.0005 to 0.005. We compared our MSD model with the baseline models using the P@20 and MRR@20 metrics.
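For reference, the stated optimization setup (Adam, initial learning rate 0.001, decayed by a factor of 0.1 every three epochs) maps onto standard PyTorch components as sketched below; the model object is a placeholder, not the MSD implementation.

```python
import torch
import torch.nn as nn

model = nn.Linear(256, 256)          # stand-in for the MSD model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

for epoch in range(10):
    # ... one pass over the training batches, with optimizer.step() per batch ...
    scheduler.step()                 # decay the learning rate every 3 epochs
```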

4.5. Performance Comparison

To evaluate the overall performance of the proposed model, we compared it with the other representative session-based recommendation methods. A summary of the experimental results on the three datasets is presented in Table 3.
From the results, we can make the following important observations. First, conventional methods typically generate recommendations based on co-occurring or successive items. These methods mostly use simple models and are insufficient for extracting and representing complex information. Deep-learning methods greatly improve the representation ability of models through the use of neural networks, thus achieving better performance than conventional methods. Second, the GRU4Rec model achieves relatively poor results, whereas the NARM achieves competitive results. The reason is that the GRU4Rec model applies only sequential information, while the NARM employs both sequential information and global preferences. Moreover, the SR-GNN model outperforms all previous methods, demonstrating the effectiveness of the graph structure in representing the transition relationship of items in the session. The SGNN-HN model further improves the ability to propagate information and addresses the overfitting problem, indicating that there is still room for improving the ability to extract information. Third, the excellent performance of the DIDN model demonstrates the importance of session denoising. Finally, by extracting information from multiple dimensions and tackling the noise issue, the proposed MSD model achieves the best performance in terms of the P@20 and MRR@20 metrics over the three datasets.
We can elaborate on the improvement in the MSD model from two perspectives. Firstly, the MSD model has the capability to leverage user intention from multiple dimensions. Introducing multi-order semantic units as fundamental nodes within the session allows the MSD model to propagate and extract information more comprehensively. This means that the models are capable of capturing user behavior patterns across various levels of abstraction. In practical applications, user behavior is often not a straightforward linear relationship but rather consists of multiple layers of intentions and needs. For instance, a user may purchase a specific item in the short term, but this purchasing behavior could be related to longer-term needs and interests. By incorporating multi-order semantic units, the model can identify and analyze these complex patterns, leading to more accurate user profiling and demand prediction.
Secondly, the model includes a denoising module following the information propagation step. This denoising module plays a crucial role in enhancing the model’s performance. In practical scenarios, session data often contain irrelevant or noisy information, such as erroneous or non-representative user interactions. Denoising helps to filter out these noisy elements, allowing the model to focus on the most relevant and meaningful data. This process improves the quality of the learned embeddings and ultimately leads to more precise and personalized recommendations for users.

4.6. Ablation Study

In this section, we evaluate each key component of our model through ablation experiments. We successively removed the denoising and multi-order semantic modules in each experiment and recorded the change in performance, which indicated the effectiveness of the removed component.

4.6.1. Impact of the Denoising Module

To tackle the challenges posed by natural noise in session data, we introduced a novel denoising module designed to filter out noisy user behaviors in the session following the propagation of information. In order to demonstrate the effectiveness of this module, we conducted experiments by setting the denoising depth to 1 and removing the attention coefficient calculation. This configuration meant that the model could not differentiate between semantic units and all units were treated as equally important in the session. The results presented in Table 4 indicate a significant decrease in both P@20 and MRR@20 metrics.
These findings underscore the presence of natural noise in session data and emphasize the importance of identifying and mitigating its impact. Removing the denoising module leads to the proposed system being misled by noisy items in the session, resulting in a recommendation list that fails to align with users’ interests. Additionally, treating all items as equally important hindered the system’s ability to discern the main intentions of users, thereby further compromising the accuracy of the recommendation results.
This analysis highlights the critical role played by the denoising module in improving the quality of the recommendation system by effectively filtering out noisy data and enhancing the model’s ability to capture users’ true preferences and interests.

4.6.2. Impact of the Multi-Order Semantic Module

To address the issue of information loss in previous methods, we adopted semantic units as fundamental nodes to explore user preferences more comprehensively. To assess the effectiveness of the semantic unit approach, we conducted experiments with the order limited to 1. This configuration meant that the model could not combine items of different lengths into semantic units, instead treating each individual item as a basic node.
The results of our ablation study clearly indicate that removing the multi-order semantic module has a detrimental effect on both P@20 and MRR@20 metrics. This is because higher-order semantic units contain rich user intention information, and mining this from various orders of semantic units helps to mitigate the problem of information loss. Without the multi-order semantic module, the model loses the ability to capture the intricate relationships and dependencies between items across different lengths of semantic units. This limitation restricts the model’s context awareness and propagation path for long dependencies, ultimately leading to a decrease in recommendation accuracy.
In summary, the multi-order semantic module plays a crucial role in capturing and leveraging the diverse user intention information present in different semantic unit orders. Its removal not only harms the model’s ability to address information loss but also limits its capability to understand the complex patterns and contexts within user sessions, resulting in a decrease in the quality of recommendation results.

4.6.3. In-Depth Analysis

Additionally, we conducted experiments to evaluate the contributions of the intra-/inter-edge and main intention. Specifically, we trained two models without intra- or inter-edges. In the third model, we replaced the main intention with a representation of the last semantic unit to validate its impact. As shown in Table 5, all three models performed worse than the original models.
The model’s performance suffers when intra-edges are removed, as these edges are essential for propagating information between same-order semantic units, providing a crucial foundation for analyzing user preferences. Additionally, the inter-edges also benefit from this propagation of information, further enhancing the model’s understanding of user behavior and improving recommendation accuracy. Unlike previous methods that relied on the last item as a criterion for judging the importance of other items, we calculated the main intention of the session to identify noisy clicks. The results clearly demonstrate that the last item may not always represent the user’s current interest or could even be a noisy item. By calculating the main intention of the session, we can better filter out noisy clicks and focus on the items that truly reflect the user’s interests at a given moment. This approach results in more accurate and personalized recommendations compared to relying solely on the last item of the session, which may not always be indicative of the user’s true preferences.

4.7. Parameter Analysis

The two hyperparameters, the order number and the denoising depth, are important in the proposed model. Here, we evaluated their roles in the proposed MSD model in terms of the two metrics on the Diginetica and Yoochoose 1/64 datasets. The effects of these two hyperparameters are shown in Figure 3 and Figure 4, respectively.
As depicted in Figure 3, we present the experimental results for orders ranging from 1 to 6. Generally, as the order increases, the performance of the proposed model improves steadily. Notably, for both the Diginetica and Yoochoose 1/64 datasets, significant enhancements in model performance were observed when transitioning from order 1 to order 2. This indicates that the proposed high-order semantic unit possesses superior expressive capability compared with previous methods that analyze user preference based on individual items.
However, the rate of improvement becomes relatively modest as the order surpasses two. This phenomenon can be attributed to the fact that sessions in both datasets are typically not very long. Continuously increasing the order may lead to overfitting, resulting in diminishing returns in performance improvement.
The primary distinction between the two datasets lies in the magnitude of improvement observed when utilizing the Yoochoose 1/64 dataset compared to the Diginetica dataset. This discrepancy can be attributed to the average session length; the Yoochoose 1/64 dataset exhibited longer average session lengths than the Diginetica dataset. Consequently, a dataset with lengthier sessions necessitates a higher order to effectively capture and model user preferences, leading to more substantial improvements in performance.
We conducted experiments with varying denoising depths to assess their impact on the proposed model. Overall, when the denoising module was integrated with the multi-order semantic module, improvements were observed in both the P@20 and MRR@20 metrics. As illustrated in Figure 4, denoising depths that were either too large or too small did not yield optimal performance, and the optimal points differed notably between the two datasets. This indicates that each dataset contains a certain level of noisy data, with the degree of noise varying between them.
The results demonstrate the importance of appropriately tuning the denoising depth to achieve optimal performance for a specific dataset. Too high of a denoising depth may lead to excessive filtering, potentially removing relevant information and hindering the model’s ability to capture important patterns in the data. Conversely, too low of a denoising depth may fail to effectively filter out noise, resulting in suboptimal performance. The optimal denoising depth strikes a balance between reducing noise interference and preserving valuable information, leading to improved recommendation accuracy.

5. Conclusions

In this study, we introduced a novel Multi-Order Semantic Denoising (MSD) model aimed at addressing two long-standing challenges in session-based recommendation systems. In contrast to previous methods, our approach involves mining a user’s main intention from a higher dimension by combining items of different lengths as multi-order semantic units. These semantic units are then encoded into a heterogeneous graph, and a Graph Neural Network (GNN) is employed to propagate the information effectively.
Additionally, we developed a denoising module utilizing a soft attention mechanism to assess the relevance between different semantic units. This module helps to recognize and mitigate the impact of noisy behavior in the session data. An ablation study was conducted using three real-world datasets to assess the effectiveness of each component of our model. The results highlighted the importance of capturing user preferences from multiple dimensions and the significance of filtering out noisy data to accurately analyze user intentions.
In future work, we will aim to streamline the item representation and optimize the information propagation mechanism to make our model lighter and more efficient. This will not only improve the model’s computational efficiency but also its scalability and applicability in large-scale recommendation systems.

Author Contributions

S.C.: Conceptualization, Methodology, Formal analysis, Writing—review and editing, Supervision, Funding acquisition. W.H.: Methodology, Writing—original draft, Visualization. Z.Y.: Data curation, Validation. J.Z.: Writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by grants from the Natural Science Foundation of Anhui Province in China, No. 2008085MF193, and the Outstanding Young Talents Program of Anhui Province in China, No. gxyqZD2018060.

Data Availability Statement

The source code for all the experiments in this paper is available on GitHub at https://github.com/shulincheng/MSD_Recsys (accessed on 24 December 2023).

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Wang, S.; Cao, L.; Wang, Y.; Sheng, Q.Z.; Orgun, M.A.; Lian, D. A survey on session-based recommender systems. ACM Comput. Surv. (CSUR) 2021, 54, 154.
  2. Shani, G.; Heckerman, D.; Brafman, R.I.; Boutilier, C. An MDP-based recommender system. J. Mach. Learn. Res. 2005, 6, 9.
  3. Rendle, S.; Freudenthaler, C.; Schmidt-Thieme, L. Factorizing personalized markov chains for next-basket recommendation. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010; pp. 811–820.
  4. Hidasi, B.; Karatzoglou, A.; Baltrunas, L.; Tikk, D. Session-based recommendations with recurrent neural networks. arXiv 2015, arXiv:1511.06939.
  5. Li, J.; Ren, P.; Chen, Z.; Ren, Z.; Lian, T.; Ma, J. Neural attentive session-based recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 1419–1428.
  6. Tan, Y.K.; Xu, X.; Liu, Y. Improved recurrent neural networks for session-based recommendations. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, Boston, MA, USA, 15 September 2016; pp. 17–22.
  7. Wu, S.; Tang, Y.; Zhu, Y.; Wang, L.; Xie, X.; Tan, T. Session-based recommendation with graph neural networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 346–353.
  8. Wang, X.; He, X.; Chua, T.S. Learning and reasoning on graph for recommendation. In Proceedings of the 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; pp. 890–893.
  9. Wang, J.; Ding, K.; Zhu, Z.; Caverlee, J. Session-based recommendation with hypergraph attention networks. In Proceedings of the 2021 SIAM International Conference on Data Mining (SDM), Virtual Event, 29 April–1 May 2021; pp. 82–90.
  10. Xia, X.; Yin, H.; Yu, J.; Wang, Q.; Cui, L.; Zhang, X. Self-supervised hypergraph convolutional networks for session-based recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; pp. 4503–4511.
  11. Tong, X.; Wang, P.; Niu, S. Reinforcement learning-based denoising network for sequential recommendation. Appl. Intell. 2023, 53, 1324–1335.
  12. Zhang, X.; Lin, H.; Xu, B.; Li, C.; Lin, Y.; Liu, H.; Ma, F. Dynamic intent-aware iterative denoising network for session-based recommendation. Inf. Process. Manag. 2022, 59, 102936.
  13. Mnih, A.; Salakhutdinov, R.R. Probabilistic matrix factorization. Adv. Neural Inf. Process. Syst. 2007, 20. Available online: https://proceedings.neurips.cc/paper_files/paper/2007/hash/d7322ed717dedf1eb4e6e52a37ea7bcd-Abstract.html (accessed on 24 December 2023).
  14. Koren, Y.; Bell, R.; Volinsky, C. Matrix factorization techniques for recommender systems. Computer 2009, 42, 30–37.
  15. Khaledian, N.; Mardukhi, F. CFMT: A collaborative filtering approach based on the nonnegative matrix factorization technique and trust relationships. J. Ambient. Intell. Humaniz. Comput. 2022, 13, 2667–2683.
  16. Rendle, S.; Freudenthaler, C.; Gantner, Z.; Schmidt-Thieme, L. BPR: Bayesian personalized ranking from implicit feedback. arXiv 2012, arXiv:1205.2618.
  17. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, China, 1–5 May 2001; pp. 285–295.
  18. Davidson, J.; Liebald, B.; Liu, J.; Nandy, P.; Van Vleet, T.; Gargi, U.; Gupta, S.; He, Y.; Lambert, M.; Livingston, B.; et al. The YouTube video recommendation system. In Proceedings of the Fourth ACM Conference on Recommender Systems, Barcelona, Spain, 26–30 September 2010; pp. 293–296.
  19. Dias, R.; Fonseca, M.J. Improving music recommendation in session-based collaborative filtering by using temporal context. In Proceedings of the 2013 IEEE 25th International Conference on Tools with Artificial Intelligence, Herndon, VA, USA, 4–6 November 2013; pp. 783–788.
  20. Wang, P.; Guo, J.; Lan, Y.; Xu, J.; Wan, S.; Cheng, X. Learning hierarchical representation model for nextbasket recommendation. In Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, 9–13 August 2015; pp. 403–412.
  21. Duan, J.; Zhang, P.F.; Qiu, R.; Huang, Z. Long short-term enhanced memory for sequential recommendation. World Wide Web 2023, 26, 561–583.
  22. Cho, K.; Van Merriënboer, B.; Bahdanau, D.; Bengio, Y. On the properties of neural machine translation: Encoder-decoder approaches. arXiv 2014, arXiv:1409.1259.
  23. Quadrana, M.; Karatzoglou, A.; Hidasi, B.; Cremonesi, P. Personalizing session-based recommendations with hierarchical recurrent neural networks. In Proceedings of the Eleventh ACM Conference on Recommender Systems, Como, Italy, 27–31 August 2017; pp. 130–137.
  24. Liu, Q.; Zeng, Y.; Mokhosi, R.; Zhang, H. STAMP: Short-term attention/memory priority model for session-based recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, New York, NY, USA, 19–23 August 2018; pp. 1831–1839.
  25. Pan, Z.; Cai, F.; Chen, W.; Chen, H.; De Rijke, M. Star graph neural networks for session-based recommendation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management, Virtual Event, 19–23 October 2020; pp. 1195–1204.
  26. Sun, F.; Liu, J.; Wu, J.; Pei, C.; Lin, X.; Ou, W.; Jiang, P. BERT4Rec: Sequential recommendation with bidirectional encoder representations from transformer. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 1441–1450.
  27. Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Philip, S.Y. A comprehensive survey on graph neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2020, 32, 4–24.
  28. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81.
  29. Chen, T.; Wong, R.C.W. Handling information loss of graph neural networks for session-based recommendation. In Proceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Virtual Event, 6–10 July 2020; pp. 1172–1180.
  30. Feng, L.; Cai, Y.; Wei, E.; Li, J. Graph neural networks with global noise filtering for session-based recommendation. Neurocomputing 2022, 472, 113–123.
  31. Guo, J.; Yang, Y.; Song, X.; Zhang, Y.; Wang, Y.; Bai, J.; Zhang, Y. Learning multi-granularity consecutive user intent unit for session-based recommendation. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, Tempe, AZ, USA, 21–25 February 2022; pp. 343–352.
  32. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. arXiv 2017, arXiv:1706.02216.
  33. Yuan, J.; Song, Z.; Sun, M.; Wang, X.; Zhao, W.X. Dual sparse attention network for session-based recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Virtually, 2–9 February 2021; pp. 4635–4643.
  34. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
Figure 1. Instances of items clicked by mistake or out of curiosity, which negatively impact the generation of reliable recommendations.
Figure 2. The framework of the proposed MSD model.
Figure 3. The performance of different orders.
Figure 4. The performance of different denoising depths.
Table 1. Notations.

Symbol | Description
$v_j^k$ | the semantic units in each order $k$
$e_j^k$ | the learnable embedding vector of the $k$-order semantic unit
$h_v^k$ | the embedding vector of node $v$ in layer $k$
$e_m^k$ | the primary intention of the $k$-order semantic units
$a_j^k$ | the attention coefficient of semantic unit $v_j^k$
$S^k$ | the session embedding of the $k$-order semantic units
Table 2. Statistics of the preprocessed data.

Statistic | Diginetica | Yoochoose 1/64 | Yoochoose 1/4
#clicks | 982,961 | 557,248 | 8,326,407
#train sessions | 719,470 | 369,859 | 5,917,746
#test sessions | 60,858 | 55,898 | 55,898
#unique items | 43,097 | 16,766 | 29,618
Average length | 5.12 | 6.16 | 5.71
Table 3. Summary of the experimental results for the three datasets. The results produced by the best performer in each column are boldfaced in the original table.

Methods | Diginetica (P@20 / MRR@20) | Yoochoose 1/64 (P@20 / MRR@20) | Yoochoose 1/4 (P@20 / MRR@20)
Item-KNN | 35.75 / 11.57 | 51.60 / 21.81 | 52.31 / 21.70
GRU4Rec | 29.45 / 8.33 | 60.64 / 22.89 | 59.53 / 22.60
NARM | 49.70 / 16.17 | 68.32 / 28.63 | 69.73 / 29.23
STAMP | 45.64 / 14.32 | 68.74 / 29.67 | 70.44 / 30.00
SR-GNN | 50.73 / 17.59 | 70.57 / 30.94 | 71.36 / 31.89
SGNN-HN | 55.67 / 19.45 | 72.06 / 32.61 | 72.85 / 32.55
LESSR | 52.17 / 18.13 | 70.94 / 31.16 | 71.40 / 31.56
GNN-GNF | 51.61 / 17.77 | 71.50 / 31.35 | 72.11 / 31.67
DIDN | 56.22 / 20.30 | 72.12 / 31.69 | 72.65 / 32.59
MSD | 56.93 / 19.87 | 73.61 / 33.75 | 74.55 / 34.21
Table 4. Recommendation performance comparisons of ablation models on the three datasets.

Methods | Diginetica (P@20 / MRR@20) | Yoochoose 1/64 (P@20 / MRR@20) | Yoochoose 1/4 (P@20 / MRR@20)
MSD | 56.93 / 19.87 | 73.61 / 33.75 | 74.55 / 34.21
MSD-D (w/o denoising module) | 56.68 / 19.41 | 73.18 / 33.27 | 73.87 / 33.75
MSD-D&M (w/o denoising and multi-order modules) | 55.54 / 18.89 | 72.63 / 32.86 | 73.52 / 33.48
Table 5. Impact of the two kinds of edges and the main intention.

Methods | Yoochoose 1/64 (P@20 / MRR@20) | Diginetica (P@20 / MRR@20)
MSD | 73.61 / 33.75 | 56.93 / 19.87
w/o Inter-edge | 73.39 / 33.25 | 56.77 / 19.76
w/o Intra-edge | 73.26 / 33.21 | 56.71 / 19.58
w/o Main intention | 73.40 / 33.58 | 56.81 / 19.73

