Collaborative Co-Attention Network for Session-Based Recommendation

Chen, Wanyu; Chen, Honghui

doi:10.3390/math9121392

by Wanyu Chen * and Honghui Chen

Science and Technology on Information Systems Engineering Laboratory, National University of Defense Technology, Changsha 410073, China

* Author to whom correspondence should be addressed.

Mathematics 2021, 9(12), 1392; https://doi.org/10.3390/math9121392

Submission received: 19 May 2021 / Revised: 10 June 2021 / Accepted: 11 June 2021 / Published: 15 June 2021

Abstract: Session-based recommendation aims to model a user’s intent and predict an item that the user may interact with in the next step based on an ongoing session. Existing session-based recommender systems mainly aim to model either the sequential signals with Recurrent Neural Network (RNN) structures or the transition relations between items with Graph Neural Network (GNN) based frameworks to identify a user’s intent for recommendation. However, in real scenarios, there may be strong sequential signals in users’ adjacent behaviors or multi-step transition relations among different items. Thus, RNN- or GNN-based methods alone can only capture limited information for modeling complex user behavior patterns. RNNs pay attention to the sequential relations among consecutive items, while GNNs focus on structural information, i.e., how to enrich the item embedding with its adjacent items. In this paper, we propose a Collaborative Co-attention Network for Session-based Recommendation (CCN-SR) to incorporate both sequential and structural information, as well as capture the co-relations between them for obtaining an accurate session representation. To be specific, we first model the ongoing session with an RNN structure to capture the sequential information among items. Meanwhile, we also construct a session graph to learn the item representations with a GNN structure. Then, we design a co-attention network upon these two structures to capture the mutual information between them. The designed co-attention network can enrich the representation of each node in the session with both sequential and structural information, and thus generate a more comprehensive representation for each session. Extensive experiments are conducted on two public e-commerce datasets, and the results demonstrate that our proposed model outperforms state-of-the-art baseline models for session-based recommendation in terms of both Recall and MRR. We also investigate different combination strategies, and the experimental results verify the effectiveness of our proposed co-attention mechanism. Besides, our CCN-SR model achieves better performance than the baseline models on sessions of different lengths.

Keywords: session-based recommendation; recurrent neural network; graph neural network; co-attention mechanism

1. Introduction

As the volume of information on the Internet grows rapidly, recommender systems have been proposed to provide users with the information they need in an efficient way [1,2,3,4]. Many general recommendation approaches rely on users’ historical behaviors to make personalized recommendations. For example, collaborative filtering (CF) [5] builds the user-item interaction matrix and learns user as well as item representations so as to fill in the matrix and make recommendations. However, in some real scenarios, users’ personal information, e.g., user IDs, is not available. For instance, users may not log into the recommender system when using some online shopping service. In those cases, it is challenging to recommend satisfying items based only on the limited behaviors in a session. Session-based recommendation (SBRS) is then proposed to deal with this task and make recommendations based on an ongoing session [3].

Since the items in a session may be connected through sequential relations, early methods model the transition and co-occurrence relations between items with ItemKNN [6] and Markov Chains [7]. However, those models make a strong assumption about the independence of past interactions and mainly rely on the last behavior to make recommendations, which limits the recommendation accuracy for SBRS. Recently, Recurrent Neural Networks (RNNs) have played an important role in session-based recommendation tasks due to their ability to model the sequential relations among items. Hidasi et al. [3] first use a Gated Recurrent Unit (GRU) to model user behavior sequences in a session and propose the GRU4Rec model. After that, the attention mechanism has been adopted and has helped to boost the performance of session-based recommendation. Li et al. [8] apply the attention mechanism to distinguish the importance of different items and then combine the weighted hidden states and the last hidden state to make final recommendations. However, when users click some unrelated items in a session, RNN-based approaches may be misled by those noisy interactions, which results in an inaccurate session representation and unsatisfactory recommendations. As RNNs mainly focus on the sequential transitions among items in a single direction, Graph Neural Network (GNN)-based approaches have been proposed to enrich the item representation with its neighbors by propagating information between adjacent items. For example, Wu et al. [2] propose to construct a session graph to represent a session and then learn the item embeddings with graph neural networks, which achieves satisfactory results. However, GNN-based approaches often ignore the sequential information among user behaviors and cannot capture long-term context information for the next item recommendation.

Thus, in our paper, we propose a Collaborative Co-attention Network for Session-based Recommendation (CCN-SR), which takes the advantages from both RNN as well as GNN structures. More specifically, we first input the user behaviors in an ongoing session, i.e., user interacted items, into a GRU network to model the sequential relations among those behaviors. Meanwhile, we also construct a session graph for the ongoing session and use another GNN network to model the structural information in those behaviors. After that, we can get the hidden state for each item in the session from the GRU network, and the node embedding for each item in the session from the GNN network. Then, we propose a co-attention mechanism to incorporate the sequential as well as structural information to get an accurate representation for the session. The co-attention mechanism can capture the mutual relations between these two kinds of information, i.e., sequential as well as structural information, and thus generate a more comprehensive representation of each session. Specifically, we design two strategies to achieve our co-attention mechanism, i.e., parallel co-attention and alternating co-attention. We conduct experiments on two public e-commerce datasets to verify the effectiveness of our CCN-SR model and explore the differences between the performances of our proposed two kinds of co-attention mechanisms as well as the simple concatenation strategy. The results demonstrate that our CCN-SR model can achieve better performance than the state-of-the-art baseline model in terms of both Recall and MRR.

The main contributions in this paper can be summarized as follows:

To the best of our knowledge, we are the first to incorporate both structural as well as sequential information and capture their co-dependent relations for session-based recommendation;
We propose a Collaborative Co-attention Network for Session-based Recommendation (CCN-SR) model, which introduces the co-attention mechanism upon RNN and GNN networks to get the mutual information between them and enrich the session representations;
We conduct comprehensive experiments on two publicly available datasets by comparing with state-of-the-art baselines to validate the effectiveness of our proposal. Experimental results show that CCN-SR can beat the baselines in terms of both Recall and MRR.

We summarize related literature in Section 2. The details of our proposed model are described in Section 3. The experimental setups and datasets are introduced in Section 4. Finally, we give our analysis of the experimental results in Section 5 and the conclusion in Section 6.

2. Related Works

In this section, we give a summarization of the related literature for our work. We mainly divide them into three aspects: general recommendation approaches, session-based recommendation approaches and attention based approaches.

2.1. General Recommendation Approaches

General recommendation approaches have been applied widely in recommender systems, which predict users’ general preference based on their historical interactions. Most general recommenders are based on Collaborative Filtering (CF), which aims to factorize the user-item interaction matrix into two low rank matrices containing user latent vectors as well as item latent vectors [9,10,11,12,13]. Traditional methods such as Singular Value Decomposition (SVD) [14] generate a user’s preference towards an item with a linear product of the user’s latent vector and the item’s latent vector. However, the linear kernel often cannot model the users’ preference accurately, and many researchers have pointed out that nonlinearity has potential advantages for improving the performance of recommender systems with extensive experiments [15,16,17]. Thus, the deep learning-based recommendation approaches have been proposed and boost the performance of general recommendations.

Restricted Boltzmann Machines (RBM) [18,19,20] were proposed as an early neural recommender system. They apply a two-layer undirected graph to model tabular data, such as users’ explicit ratings of movies. For top-N recommendation, He et al. [5] propose to use multi-layer perceptrons to model the two-way interaction between users and items, which captures the non-linear relationship between users and items and achieves satisfactory results. Some other recommendation models [21,22,23] use a convolutional neural network (CNN) to integrate external information, e.g., the review text or contextual information, which helps to improve the recommendation performance.

However, those general recommendation models often ignore the changes in users’ preferences and always generate the same recommendations for a user. Thus, they are not suitable for session-based recommendation, where the recommended items should be adapted to a user’s current interest.

2.2. Session-Based Recommendation Approaches

Session-based recommendation aims to capture users’ dynamic preferences in an ongoing session. In early stages, Markov Chains were adopted to model the transition relations between adjacent items [7,24,25,26]. Recently, neural network based approaches, e.g., RNN-based models, have been widely adopted for session-based recommendation. Hidasi et al. [3] first introduce the Gated Recurrent Unit (GRU) to model the current session and propose the GRU4Rec model as well as a session-parallel mini-batch training process. Following [3], an improved RNN-based approach has been proposed in [27], which applies a data augmentation strategy and addresses distribution shifts in the input data. Hidasi and Karatzoglou [28] propose an improved loss function to optimize the training process of the GRU4Rec model and achieve good performance. Bogina and Kuflik [29] incorporate the dwell time into the RNN structure and boost the performance of session-based recommendation, which also indicates that the sequential relation among items in a session is important when making recommendations. There are also some memory-based approaches for session-based recommendation. For example, Chen et al. [30] propose a Recommendation with User Memory Network (RUM) model, which uses external memory to store and distinguish users’ interactions in a session.

RNNs and memory networks cannot capture some complex relations among items in a session, e.g., the structural information between those items. Thus, GNN-based approaches have been proposed [2,31]. Wu et al. [2] are the first to introduce graph neural networks into session-based recommendation and propose the SR-GNN model. They construct a session graph for each session and then apply the gated graph neural network (GGNN) to generate node representations in the graph, which finally help to make recommendations. Based on this work, some studies take the long-term dependencies among items into consideration and generate more accurate session representations [32,33]. Moreover, Yu et al. [34] propose a target-attention mechanism within a graph neural network, which also improves the recommendation performance over the SR-GNN model. In addition, Qiu et al. [35] adopt a weighted graph neural network to distinguish the different importance of the propagated information. However, those GNN-based approaches cannot model the sequential relations between user behaviors accurately and only propagate information between adjacent items, which limits their ability to capture context information in a session.

2.3. Attention-Based Recommendation Approaches

The attention mechanism helps to distinguish the importance of different items in a session, which can boost the performance of session-based recommendation [36,37]. Li et al. [8] propose a Neural Attentive Recommendation Machine (NARM) model, which regards the last hidden state modeled by a session-based RNN as the global encoder, and uses the other hidden states for calculating the attention weights of different items to capture users’ current intents. As for memory-based models, Liu et al. [36] propose the Short-Term Attention/Memory Priority (STAMP) model, where the attention weights for different interactions are calculated based on the session context and the final records in the current session. The SR-GNN model [2] also applies the attention mechanism to distinguish the importance of different items in the current session, in the same way as NARM [8].

The aforementioned methods often adopt the attention mechanism that is enhanced by the last hidden state, which is not suitable for capturing the mutual information between different structures, i.e., RNNs and GNNs. In contrast, we propose to use co-attention mechanism to generate co-dependent representations of each item in a session and thus can make more accurate recommendations.

3. Methods

The Collaborative Co-attention Network for Session-based Recommendation (CCN-SR) model we propose in this paper mainly contains four components: an RNN-based session encoder, a GNN-based session encoder, a co-attention network and a prediction layer. We show the main framework of our model in Figure 1, in which these components can be trained and optimized in an end-to-end way. In the following sections, we first describe the problem formulation as well as notations, and then we give detailed descriptions of each component in CCN-SR.

3.1. Problem Formulation and Notation

Given a user and their sequential interactions in a session, we aim to recommend their next interaction based on their short-term preferences learned from previous behaviors in the session.

We denote the current session as $S = \{v_{1}, v_{2}, \dots, v_{T}\}$, where $v_{i}$ is the i-th item interacted with by a user in the session and T denotes the number of events in the current session. In Figure 1, an embedding layer is built at the bottom of the network, which generates the item embeddings shared by both the RNN network and the GNN network. With a slight abuse of notation, we also use $v_{i}$ to indicate the embedding of item $v_{i}$. We summarize the notations we use in our paper in Table 1.

3.2. RNN-Based Session Encoder

Recurrent Neural Networks (RNNs) have been widely used to model sequential data. Given a sequence like $\{v_{1}, v_{2}, \dots, v_{T}\}$, an RNN calculates a hidden state $h_{t}$ for each step t in the sequence, which mainly contains summative information of the sequence up to step t. $h_{t}$ is computed based on the hidden state of the former step $h_{t-1}$ and the current input $v_{t}$:

$$h_{t} = f(h_{t-1}, v_{t}), \tag{1}$$

where f is the main function in RNN. Different RNN architectures, e.g., Long Short-Term Memory unit (LSTM) [38] and Gated Recurrent Unit (GRU), have different functions. In our paper, we use GRU as the RNN-based session encoder since it shows better performance than simple RNNs and LSTM [3].

The GRU contains a reset gate and an update gate, which are used to control the information propagated from former steps to the current step. The hidden state $h_{t}$ is calculated as a linear combination of the former hidden state $h_{t-1}$ and the candidate hidden state $\hat{h}_{t}$:

$$h_{t} = z_{t} h_{t-1} + (1 - z_{t}) \hat{h}_{t}, \tag{2}$$

where the update gate $z_{t}$ is given by:

$$z_{t} = \sigma(W_{z} v_{t} + U_{z} h_{t-1}), \tag{3}$$

where $W_{z}$ and $U_{z}$ are update parameters for $v_{t}$ and $h_{t-1}$, respectively. The candidate hidden state can be computed as:

$$\hat{h}_{t} = \tanh(W_{o} v_{t} + r_{t} \odot U_{o} h_{t-1}), \tag{4}$$

where $\odot$ denotes the Hadamard product, i.e., the element-wise product. The reset gate $r_{t}$ is calculated by:

$$r_{t} = \sigma(W_{r} v_{t} + U_{r} h_{t-1}), \tag{5}$$

where $W_{r}$ and $U_{r}$ are reset parameters for $v_{t}$ and $h_{t-1}$, respectively.

As the hidden state of each step contains the sequential information among the user’s previous behaviors up to this step as well as the user’s current intent, we collect the hidden states of all steps in the ongoing session modeled by the RNN structure as $S_{r} = \{h_{1}^{r}, h_{2}^{r}, \dots, h_{T}^{r}\}$ with $S_{r} \in \mathbb{R}^{D \times T}$, where D is the dimension of each hidden state in $S_{r}$. We then explore the structural information contained in the current session with a GNN structure.
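As an illustration, the GRU update in Equations (2) to (5) can be sketched in NumPy as follows. This is a minimal sketch and not the implementation used in the experiments: the parameter dictionary `params`, the zero initial hidden state, the toy sizes and the random inputs are all assumptions made for the example.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(v_t, h_prev, p):
    """One GRU step following Equations (2)-(5)."""
    z = sigmoid(p["W_z"] @ v_t + p["U_z"] @ h_prev)              # update gate, Eq. (3)
    r = sigmoid(p["W_r"] @ v_t + p["U_r"] @ h_prev)              # reset gate, Eq. (5)
    h_cand = np.tanh(p["W_o"] @ v_t + r * (p["U_o"] @ h_prev))   # candidate state, Eq. (4)
    return z * h_prev + (1.0 - z) * h_cand                        # new hidden state, Eq. (2)

def encode_session(item_embeddings, p):
    """Run the GRU over a (T, D) array of item embeddings and return S_r with shape (D, T)."""
    h = np.zeros(p["U_z"].shape[0])   # assumed zero initial hidden state
    states = []
    for v_t in item_embeddings:
        h = gru_step(v_t, h, p)
        states.append(h)
    return np.stack(states, axis=1)

# Toy example with hypothetical sizes and randomly drawn parameters.
rng = np.random.default_rng(0)
D, T = 4, 5
params = {name: rng.normal(0.0, 0.1, size=(D, D))
          for name in ("W_z", "U_z", "W_r", "U_r", "W_o", "U_o")}
S_r = encode_session(rng.normal(size=(T, D)), params)
print(S_r.shape)  # (4, 5)
```

Column t of the returned array plays the role of the hidden state of step t, so the last column corresponds to the hidden state of the final interaction.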

3.3. GNN-Based Session Encoder

In this section, we model the transition relations between items and generate accurate item embeddings in the current session with a graph neural network. Let $V_{S} = \{v_{1}, v_{2}, \dots, v_{m}\}$ denote the unique items in S. Note that m may be smaller than T since there usually exist repeated interactions with the same item in a session [35,39].

We first construct a directed session graph $G_{s} = (V_{s}, E_{s})$ for each session, where $V_{s}$ and $E_{s}$ denote the nodes and edges, respectively. Each node $v_{s,i}$ represents an item $v_{i}$ in the current session, where $v_{s,i} \in V_{S}$. Each edge $(v_{s,i-1}, v_{s,i})$ indicates that the user interacts with item $v_{i}$ after $v_{i-1}$ in the session. As some edges may appear several times in a session, we give those edges different weights to distinguish their importance. Specifically, the weight of an edge is calculated as the number of occurrences of the edge divided by the outdegree of its start node. We then build the adjacency matrices $A_{s}^{out}$ and $A_{s}^{in}$, which represent the connections between nodes in the session graph through outgoing edges and incoming edges, respectively. By concatenating these two matrices, we get the matrix $A_{s}$, which is then used in the learning process of the graph neural network.
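The construction of the session graph described above can be sketched as follows. This is a minimal sketch under stated assumptions: the helper name `build_session_graph` is hypothetical, the outdegree is counted with edge multiplicity, and the incoming matrix is assumed to be normalized analogously by the indegree of the end node, which the text does not spell out.

```python
from collections import Counter

def build_session_graph(session):
    """Return (nodes, A_out, A_in) for one session given as a list of item IDs."""
    nodes = list(dict.fromkeys(session))              # unique items in first-seen order
    index = {item: k for k, item in enumerate(nodes)}
    m = len(nodes)

    # Count how often each directed edge (previous item -> next item) occurs.
    edge_counts = Counter(zip(session[:-1], session[1:]))
    counts = [[0.0] * m for _ in range(m)]
    for (src, dst), c in edge_counts.items():
        counts[index[src]][index[dst]] = float(c)

    out_deg = [sum(row) for row in counts]
    in_deg = [sum(counts[i][j] for i in range(m)) for j in range(m)]

    # Outgoing weights: edge count divided by the outdegree of the start node.
    A_out = [[counts[i][j] / out_deg[i] if out_deg[i] else 0.0 for j in range(m)]
             for i in range(m)]
    # Incoming weights (assumption): edge count divided by the indegree of the end node.
    A_in = [[counts[j][i] / in_deg[i] if in_deg[i] else 0.0 for j in range(m)]
            for i in range(m)]
    return nodes, A_out, A_in

nodes, A_out, A_in = build_session_graph(["a", "b", "c", "b", "d"])
# Concatenate the two matrices row by row to obtain an m x 2m matrix.
A_s = [row_out + row_in for row_out, row_in in zip(A_out, A_in)]
print(nodes)     # ['a', 'b', 'c', 'd']
print(A_out[1])  # [0.0, 0.0, 0.5, 0.5]
```

In this toy session, item b is followed once by c and once by d, so each of its two outgoing edges receives a weight of 0.5.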

The graph neural network can incorporate a node’s neighbor features and update the node representation of $v_{s,i}$ as follows:

$$\begin{aligned} a_{s,i}^{t} &= A_{s,i:} [v_{1}^{t-1}, \dots, v_{T}^{t-1}]^{T} H + b, \\ z_{s,i}^{t} &= \sigma(W_{z}^{g} a_{s,i}^{t} + U_{z}^{g} v_{i}^{t-1}), \\ r_{s,i}^{t} &= \sigma(W_{r}^{g} a_{s,i}^{t} + U_{r}^{g} v_{i}^{t-1}), \\ \hat{v}_{i}^{t} &= \tanh(W_{o}^{g} a_{s,i}^{t} + U_{o}^{g} (r_{s,i}^{t} \odot v_{i}^{t-1})), \\ v_{i}^{t} &= z_{s,i}^{t} \odot \hat{v}_{i}^{t} + (1 - z_{s,i}^{t}) v_{i}^{t-1}, \end{aligned} \tag{6}$$

where $H \in \mathbb{R}^{D \times 2D}$ indicates the weights, $b$ is a bias term, and $A_{s,i:}$ is the column in $A_{s}$ corresponding to the node $v_{s,i}$. $a_{s,i}^{t}$ denotes the information propagated from the neighbors of the node $v_{s,i}$. $z_{s,i}^{t}$ and $r_{s,i}^{t}$ are the update gate and reset gate. $W_{z}^{g}$, $U_{z}^{g}$, $W_{r}^{g}$, $U_{r}^{g}$, $W_{o}^{g}$ and $U_{o}^{g}$ are trainable parameters. $\sigma(\cdot)$ is the sigmoid function. The final state of node $v_{s,i}$ is calculated as a combination of its previous state $v_{i}^{t-1}$ and the candidate state $\hat{v}_{i}^{t}$. By updating for several steps until convergence, we obtain the vectors of all nodes in the session. We denote the node vectors of items in the session as $S_{g} = \{v_{1}^{g}, v_{2}^{g}, \dots, v_{T}^{g}\}$ with $S_{g} \in \mathbb{R}^{D \times T}$, where D is the dimension of each hidden state in $S_{g}$.
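One propagation step of Equation (6) can be sketched in NumPy as follows. This is a sketch under stated assumptions rather than the exact implementation: the weight matrix H is split into two hypothetical D x D blocks `H_out` and `H_in` applied to the outgoing and incoming messages, the gate weights acting on the 2D-dimensional message are given shape (D, 2D), and a fixed number of steps replaces the convergence test.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def ggnn_step(V, A_out, A_in, p):
    """One gated propagation step. V has shape (m, D), one row per node."""
    # Messages from outgoing and incoming neighbours, concatenated to shape (m, 2D).
    a = np.concatenate([A_out @ V @ p["H_out"], A_in @ V @ p["H_in"]], axis=1) + p["b"]
    z = sigmoid(a @ p["W_z"].T + V @ p["U_z"].T)               # update gate
    r = sigmoid(a @ p["W_r"].T + V @ p["U_r"].T)               # reset gate
    V_cand = np.tanh(a @ p["W_o"].T + (r * V) @ p["U_o"].T)    # candidate state
    return z * V_cand + (1.0 - z) * V                           # gated combination

# Toy example: a 3-node chain graph with illustrative adjacency values.
rng = np.random.default_rng(0)
m, D = 3, 4
A_out = np.array([[0.0, 1.0, 0.0], [0.0, 0.0, 1.0], [0.0, 0.0, 0.0]])
A_in = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])
params = {"H_out": rng.normal(0.0, 0.1, (D, D)), "H_in": rng.normal(0.0, 0.1, (D, D)),
          "b": np.zeros(2 * D)}
for gate in ("z", "r", "o"):
    params["W_" + gate] = rng.normal(0.0, 0.1, (D, 2 * D))
    params["U_" + gate] = rng.normal(0.0, 0.1, (D, D))

V = rng.normal(size=(m, D))
for _ in range(2):   # fixed number of steps, an assumption for this sketch
    V = ggnn_step(V, A_out, A_in, params)
print(V.shape)  # (3, 4)
```

To obtain one vector per position of the session, the updated node vector of each item would then be looked up for every step in which that item occurs.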

3.4. Co-Attention Network

As we can see from Equation (6), the GNN-based encoder mainly captures the transition relations between adjacent items and models the structural information in the session. Meanwhile, the output of the RNN-based encoder contains the sequential information in the session. It is beneficial to incorporate these two kinds of information so as to generate a comprehensive session representation for making recommendations. Intuitively, concatenating the outputs of the RNN- and GNN-based session encoders, i.e., $S_{r}$ and $S_{g}$, would be one choice. However, we argue that the two kinds of information can provide context for each other and thus help to distinguish the importance of different items, while simple concatenation cannot capture this mutual relation.

Thus, in this section, we design a co-attention mechanism to explore the relations between $S_{r}$ and $S_{g}$ and produce an accurate session representation. Specifically, we design two strategies to achieve the co-attention mechanism, i.e., parallel co-attention and alternating co-attention. We give a detailed analysis in the following sections.

3.4.1. Parallel Co-Attention

After modeling the sequential as well as structural information in the session, we use $S_{r}$ and $S_{g}$ as inputs of our parallel co-attention mechanism. We show the detailed calculation process of the parallel co-attention mechanism in Figure 2. We first calculate the affinity matrix $C$:

$$C = \tanh(S_{r}^{T} W_{c} S_{g}), \tag{7}$$

where $W_{c} \in \mathbb{R}^{D \times D}$ is a transformation matrix. $C$ can be regarded as a co-relation matrix between $S_{r}$ and $S_{g}$. We then use it as context information and calculate the attention scores for the hidden state of each step in $S_{r}$:

$$H^{r} = \tanh(W_{sr} S_{r} + W_{t}^{r} h_{T}^{r} + (W_{sg} S_{g} + W_{t}^{g} v_{T}^{g}) C^{T}), \tag{8}$$

$$\alpha_{r} = \mathrm{softmax}(w_{hr}^{T} H^{r}), \tag{9}$$

and the attention scores for each node vector in $S_{g}$ with:

$$H^{g} = \tanh(W_{sg} S_{g} + W_{t}^{g} v_{T}^{g} + (W_{sr} S_{r} + W_{t}^{r} h_{T}^{r}) C), \tag{10}$$

$$\alpha_{g} = \mathrm{softmax}(w_{hg}^{T} H^{g}), \tag{11}$$

where $W_{sr}, W_{sg} \in \mathbb{R}^{K \times D}$ and $w_{hr}, w_{hg} \in \mathbb{R}^{K}$ are weight parameters for $S_{r}$ and $S_{g}$, respectively. Here, $\alpha_{r} \in \mathbb{R}^{T}$ and $\alpha_{g} \in \mathbb{R}^{T}$ are the co-attention scores of items in $S_{r}$ and $S_{g}$.

In both Equations (8) and (10), we emphasize the importance of the user’s last behavior, i.e., $h_{T}^{r}$ and $v_{T}^{g}$, which is marked with red lines in Figure 2. This is because the final interaction often plays an important role in predicting the user’s next behavior in session-based recommendation, especially in some e-commerce scenarios, as shown by many studies, e.g., NARM [8] and STAMP [36]. Here, we use $W_{t}^{r}, W_{t}^{g} \in \mathbb{R}^{K \times D}$ as the weight parameters for $h_{T}^{r}$ and $v_{T}^{g}$ so as to emphasize the last behavior adaptively.

After calculating the co-attention weights, we can generate co-dependent session representations modeled by the RNN-based session encoder and the GNN-based session encoder as follows:

$$\begin{aligned} U_{co-r} &= \sum_{n=1}^{T} \alpha_{r}^{n} h_{n}^{r}, \\ U_{co-g} &= \sum_{n=1}^{T} \alpha_{g}^{n} v_{n}^{g}. \end{aligned} \tag{12}$$

Combining $U_{co-r}$ with $h_{T}^{r}$ and $U_{co-g}$ with $v_{T}^{g}$, respectively, we can get:

$$\begin{aligned} U_{r} &= B_{r} [h_{T}^{r}; U_{co-r}], \\ U_{g} &= B_{g} [v_{T}^{g}; U_{co-g}], \end{aligned} \tag{13}$$

where $B_{r} \in \mathbb{R}^{D \times 2D}$ and $B_{g} \in \mathbb{R}^{D \times 2D}$ are used to compress the two vectors to get the hybrid representations.

Finally, we use a concatenation of $U_{r}$ and $U_{g}$ to generate the final session representation of S as:

$$U_{S} = B^{p} [U_{r}; U_{g}], \tag{14}$$

where $B^{p} \in \mathbb{R}^{D \times 2D}$.
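The parallel co-attention computation in Equations (7) to (14) can be sketched in NumPy as follows. This is a minimal sketch with hypothetical names and toy sizes, not the implementation used in the experiments: the encoder outputs are replaced by random matrices, and the last-step terms are broadcast over all T positions, which is how the sketch reads the sums in Equations (8) and (10).

```python
import numpy as np

def softmax(x):
    e = np.exp(x - np.max(x))
    return e / e.sum()

def parallel_co_attention(S_r, S_g, p):
    """S_r and S_g have shape (D, T). Returns the session vector and both score vectors."""
    h_T, v_T = S_r[:, -1], S_g[:, -1]                    # last hidden state and last node vector
    C = np.tanh(S_r.T @ p["W_c"] @ S_g)                  # affinity matrix, Eq. (7), shape (T, T)
    # Projected encoder outputs with the last-step term broadcast over positions, shape (K, T).
    F_r = p["W_sr"] @ S_r + (p["W_tr"] @ h_T)[:, None]
    F_g = p["W_sg"] @ S_g + (p["W_tg"] @ v_T)[:, None]
    H_r = np.tanh(F_r + F_g @ C.T)                       # Eq. (8)
    H_g = np.tanh(F_g + F_r @ C)                         # Eq. (10)
    alpha_r = softmax(p["w_hr"] @ H_r)                   # Eq. (9), shape (T,)
    alpha_g = softmax(p["w_hg"] @ H_g)                   # Eq. (11), shape (T,)
    U_co_r, U_co_g = S_r @ alpha_r, S_g @ alpha_g        # Eq. (12)
    U_r = p["B_r"] @ np.concatenate([h_T, U_co_r])       # Eq. (13)
    U_g = p["B_g"] @ np.concatenate([v_T, U_co_g])
    U_S = p["B_p"] @ np.concatenate([U_r, U_g])          # Eq. (14)
    return U_S, alpha_r, alpha_g

# Toy example with hypothetical sizes and randomly drawn parameters.
rng = np.random.default_rng(0)
D, T, K = 4, 5, 3
shapes = {"W_c": (D, D), "W_sr": (K, D), "W_sg": (K, D), "W_tr": (K, D), "W_tg": (K, D),
          "w_hr": (K,), "w_hg": (K,), "B_r": (D, 2 * D), "B_g": (D, 2 * D), "B_p": (D, 2 * D)}
params = {name: rng.normal(0.0, 0.1, size=shape) for name, shape in shapes.items()}
S_r, S_g = rng.normal(size=(D, T)), rng.normal(size=(D, T))
U_S, alpha_r, alpha_g = parallel_co_attention(S_r, S_g, params)
print(U_S.shape, alpha_r.shape)  # (4,) (5,)
```

Both score vectors sum to one, so the co-dependent representations in Equation (12) are convex combinations of the hidden states and of the node vectors, respectively.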

3.4.2. Alternating Co-Attention

In the aforementioned parallel co-attention strategy, we calculate the co-dependent representations $U_{co-r}$ and $U_{co-g}$ in parallel in a single pass. In this section, we introduce another co-attention strategy, i.e., alternating co-attention, which can also capture the mutual information between $S_{r}$ and $S_{g}$, as well as integrate the sequential information and structural information for session-based recommendation. We show the details of the alternating co-attention mechanism in Figure 3. As shown in Figure 3, we sequentially alternate between the initial outputs of the RNN-based session encoder and the GNN-based session encoder, as well as their attentive representations, which can thus take more information into consideration.

First, we calculate the affinity matrix $C$ using Equation (7). Then we normalize it row-wise to produce the attention weights $A^{g}$ across the outputs of the RNN-based session encoder, i.e., $S_{r}$, for each node vector in the outputs of the GNN-based session encoder, i.e., $S_{g}$. At the same time, $C$ is also normalized column-wise to produce the attention scores $A^{r}$ across $S_{g}$ for each hidden state in the outputs of the RNN-based session encoder, i.e., $S_{r}$:

$$\begin{aligned} A^{g} &= \mathrm{softmax}(C), \\ A^{r} &= \mathrm{softmax}(C^{T}). \end{aligned} \tag{15}$$

Next, we generate the attentive representation of $S_{r}$ as:

$$C^{r} = S_{g} A^{r}. \tag{16}$$

We can then incorporate $C^{r}$ with the initial representation $S_{r}$ to generate the candidate attentive representation of $S_{g}$ as:

$$\hat{C}^{g} = [S_{r}; C^{r}] A^{g}, \tag{17}$$

where $\hat{C}^{g} \in \mathbb{R}^{2D \times T}$. In this way, we can preserve the initial sequential information contained in $S_{r}$.

We also keep the initial structural information and integrate $S_{g}$ into $\hat{C}^{g}$ to generate the final attentive representation of $S_{g}$ as:

$$C^{g} = B_{cg} [S_{g}; \hat{C}^{g}], \tag{18}$$

where $B_{cg} \in \mathbb{R}^{D \times 3D}$. We finally concatenate $C^{g}$ and $C^{r}$ to generate the reformulated representations of behaviors in the session and adopt the same attention mechanism as NARM [8]:

$$C_{i} = [C_{i}^{g}; C_{i}^{r}], \quad \forall i = 1, 2, \dots, T. \tag{19}$$

The global and local representations of the session S can be denoted as $U_{global}^{S}$ and $U_{local}^{S}$:

$$\begin{aligned} U_{global}^{S} &= C_{T}, \\ U_{local}^{S} &= \sum_{i=1}^{T} \alpha_{i} C_{i}, \end{aligned} \tag{20}$$

where $\alpha$ is the weighted factor calculated by:

$$\alpha_{i} = m^{T} \sigma(A_{1} C_{i} + A_{2} C_{T}), \tag{21}$$

where $\sigma$ is an activation function, and $m$, $A_{1}$ and $A_{2}$ are learnable parameters. Then, by concatenating the global as well as local representations, we can generate the final session representation as:

$$U_{S} = B^{a} [U_{global}^{S}; U_{local}^{S}], \tag{22}$$

where $B^{a} \in \mathbb{R}^{D \times 2D}$.
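The alternating co-attention computation in Equations (15) to (22) can be sketched in NumPy as follows. This is a sketch under several assumptions, not the implementation used in the experiments. Both softmax operations are taken over axis 0, so that every column of the attention matrices sums to one and each attentive representation is a convex combination. The activation in Equation (21) is taken to be the sigmoid. Because each concatenated vector in Equation (19) has 2D entries, the sketch gives the final projection `B_a` the shape (D, 4D) so that the dimensions match. All names and sizes are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def column_softmax(M):
    """Softmax over axis 0, so that every column sums to 1."""
    e = np.exp(M - M.max(axis=0, keepdims=True))
    return e / e.sum(axis=0, keepdims=True)

def alternating_co_attention(S_r, S_g, p):
    """S_r and S_g have shape (D, T). Returns the session vector and both attention matrices."""
    C = np.tanh(S_r.T @ p["W_c"] @ S_g)                          # Eq. (7), shape (T, T)
    A_g = column_softmax(C)                                       # Eq. (15)
    A_r = column_softmax(C.T)
    C_r = S_g @ A_r                                               # Eq. (16), shape (D, T)
    C_g_hat = np.concatenate([S_r, C_r], axis=0) @ A_g            # Eq. (17), shape (2D, T)
    C_g = p["B_cg"] @ np.concatenate([S_g, C_g_hat], axis=0)      # Eq. (18), shape (D, T)
    C_all = np.concatenate([C_g, C_r], axis=0)                    # Eq. (19), column i is C_i
    C_T = C_all[:, -1]
    U_global = C_T                                                # Eq. (20)
    alpha = p["m"] @ sigmoid(p["A_1"] @ C_all + (p["A_2"] @ C_T)[:, None])  # Eq. (21), shape (T,)
    U_local = C_all @ alpha
    U_S = p["B_a"] @ np.concatenate([U_global, U_local])          # Eq. (22)
    return U_S, A_g, A_r

# Toy example with hypothetical sizes and randomly drawn parameters.
rng = np.random.default_rng(0)
D, T, K = 4, 5, 3
shapes = {"W_c": (D, D), "B_cg": (D, 3 * D), "A_1": (K, 2 * D), "A_2": (K, 2 * D),
          "m": (K,), "B_a": (D, 4 * D)}
params = {name: rng.normal(0.0, 0.1, size=shape) for name, shape in shapes.items()}
S_r, S_g = rng.normal(size=(D, T)), rng.normal(size=(D, T))
U_S, A_g, A_r = alternating_co_attention(S_r, S_g, params)
print(U_S.shape, A_g.shape)  # (4,) (5, 5)
```

Unlike the parallel variant, the weights in Equation (21) are not passed through a softmax in this sketch, which mirrors the equation as written.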

3.5. Prediction and Optimization

With the co-attention mechanism, we can generate the final session representation $U_{S}$, which integrates the sequential as well as structural information in the ongoing session. Then, we make predictions by taking the dot product of the session representation and the embedding of each candidate item:

$$y_{i} = U_{S}^{T} v_{i}, \quad \forall i = 1, 2, \dots, |V|. \tag{23}$$

As $y_{i}$ is an unnormalized value, we then apply softmax across all candidate items so as to get the prediction probability of item i. For training, we apply the widely used cross-entropy loss as our loss function:

$$L(p, q) = -\sum_{i=1}^{|V|} p_{i} \log(q_{i}), \tag{24}$$

where $p$ is the ground truth distribution while $q$ is the prediction probability distribution generated based on Equation (23). Then, we can optimize our model with Equation (24).
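The scoring, normalization and loss in Equations (23) and (24) can be sketched in plain Python as follows. This is a minimal sketch: the function names are hypothetical, the two-dimensional vectors are toy values, and the ground truth distribution is assumed to be one-hot.

```python
import math

def item_scores(U_S, item_embeddings):
    """Unnormalized scores y_i: dot product of the session vector with each item embedding."""
    return [sum(u * v for u, v in zip(U_S, emb)) for emb in item_embeddings]

def softmax(ys):
    top = max(ys)                                  # subtract the maximum for numerical stability
    exps = [math.exp(y - top) for y in ys]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(p, q):
    """Cross-entropy between ground truth p and prediction q; zero entries of p are skipped."""
    return -sum(p_i * math.log(q_i) for p_i, q_i in zip(p, q) if p_i > 0)

U_S = [1.0, 0.0]                                    # toy session representation
item_embeddings = [[2.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
y = item_scores(U_S, item_embeddings)               # [2.0, 0.0, 1.0]
q = softmax(y)
loss = cross_entropy([1.0, 0.0, 0.0], q)            # the first item is the ground truth
print(y, loss)
```

With a one-hot ground truth, the loss reduces to the negative log-probability that the model assigns to the item the user actually interacted with next.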

4. Experimental Setup

In order to investigate the effectiveness of our proposal, we compare the recommendation performance of CCN-SR and several baseline methods on two public e-commerce datasets. In this section, we introduce the baselines, datasets and experimental setups in detail.

4.1. Model Summary

We compare our model with several baselines including traditional methods, i.e., Item-pop and BPR-MF, and neural network based approaches, i.e., RNN-based and GNN-based models. Our baselines are as follows:

Item-pop: It recommends the most popular items, i.e., items with a large number of interactions [40].
BPR-MF: A factorization-based method, Bayesian personalized ranking (BPR-MF) uses a pairwise ranking loss to optimize the matrix factorization model and make recommendations [41].
GRU4Rec: It proposes an RNN model for session-based recommendation, which utilizes session-parallel mini-batches as well as a pair-wise loss function for training [3].
NARM: It is an RNN-based model, which also applies the attention mechanism to emphasize the importance of the last item in the ongoing session [8].
SR-GNN: It proposes to construct a session graph for each session and then adopts the Gated Graph Neural Network (GGNN) model as well as the attention mechanism to capture transition relations among items [2].

We also investigate the performance of different variants of our model proposed in this paper:

CCN-SR_pa: Our proposed CCN-SR model, which adopts the parallel co-attention strategy upon the RNN and GNN networks to get the mutual information between them.
CCN-SR_al: Our proposed CCN-SR model, which adopts the alternating co-attention strategy upon the RNN and GNN networks to get the mutual information between them.

4.2. Datasets and Experimental Setup

4.2.1. Datasets

We evaluate our model as well as the baselines on two public e-commerce datasets:

Tmall: Tmall is a dataset released by Taobao. It contains records of online transactions, with 884 users, 9531 brands and 182,880 interactions.
Tianchi: Tianchi is a dataset provided by Alibaba (https://tianchi.aliyun.com/getStart/information.htm?spm=5176.100067.5678.2.30a8b6d933N6Rr&raceId=231522, accessed on 11 June 2021), which is larger than the Tmall dataset. It is based on user-commodity behavior data of Alibaba’s Mobile-Commerce platforms. It contains 23,291,027 interactions of 20,000 customers on 4,758,484 items within a month.

Both datasets consist of lines that each record a user ID, an item ID, and the timestamp at which the user interacts with the item. Following [42], we preprocess the datasets as follows. For the Tmall dataset, we filter out users with fewer than 5 interactions and items that appear fewer than 5 times. For the Tianchi dataset, we filter out users with fewer than 20 interactions and items with fewer than 50 interactions. Sessions with fewer than 3 items or more than 200 items are also filtered out. The characteristics of the datasets after preprocessing are summarized in Table 2.
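The filtering steps can be sketched in plain Python as follows. This is only an illustration under assumptions: the function names are hypothetical, the counts are computed in a single pass over the raw records (the text does not say whether filtering is repeated until stable), and how interactions are grouped into sessions is left out.

```python
from collections import Counter

def filter_interactions(records, min_user=5, min_item=5):
    """Keep (user, item, timestamp) records whose user and item meet the minimum counts."""
    user_counts = Counter(user for user, _, _ in records)
    item_counts = Counter(item for _, item, _ in records)
    return [(user, item, ts) for user, item, ts in records
            if user_counts[user] >= min_user and item_counts[item] >= min_item]

def filter_sessions(sessions, min_len=3, max_len=200):
    """Drop sessions with fewer than min_len or more than max_len items."""
    return [s for s in sessions if min_len <= len(s) <= max_len]

# Toy records with small thresholds so that the effect is visible.
records = [("u1", "a", 1), ("u1", "b", 2), ("u1", "a", 3), ("u2", "a", 4)]
kept = filter_interactions(records, min_user=2, min_item=2)
print(kept)  # [('u1', 'a', 1), ('u1', 'a', 3)]
```

For the thresholds stated above, the Tmall setting would correspond to `min_user=5, min_item=5` and the Tianchi setting to `min_user=20, min_item=50`.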

4.2.2. Settings and Parameters

We divide each dataset into a training set and a test set for training and evaluation, respectively. For the Tmall dataset, we use the last 30 days of interactions as the test set, and the remaining days of interactions are regarded as the training set. For Tianchi, the training set consists of all but the last 7 days of interactions; the test set contains the last 7 days of interactions. We also remove items in the test sets that do not appear in the training set [2]. Following [2], we apply the data augmentation strategy for our model as well as all baseline models in our paper.
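The data augmentation strategy of [2] is not detailed here. A common form in session-based recommendation, which we assume is what is meant, splits each session into prefixes and uses the item that follows each prefix as its label. The helper name below is hypothetical.

```python
def augment_session(session):
    """Split one session into (prefix, next_item) training pairs."""
    return [(session[:i], session[i]) for i in range(1, len(session))]

pairs = augment_session([10, 20, 30, 40])
print(pairs)  # [([10], 20), ([10, 20], 30), ([10, 20, 30], 40)]
```

A session with T items therefore yields T - 1 training examples, and a session with a single item yields none.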

We adopt two widely used metrics, i.e., Recall and MRR, to evaluate the performance of all models [8,36]. Recall measures whether the ground truth item is contained in the recommendation list; MRR evaluates the ranking accuracy of the model, i.e., how close to the top of the recommendation list the ground truth item is ranked.
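For reference, the two metrics are commonly defined as follows. The notation is an assumption introduced here: $T$ denotes the set of test cases and $\mathrm{rank}_{t}$ the position of the ground truth item of test case $t$ in the ranked list. The descriptions above appear to follow these standard definitions.

```latex
\mathrm{Recall@}N = \frac{1}{|T|} \sum_{t \in T} \mathbb{1}\left[\mathrm{rank}_{t} \le N\right],
\qquad
\mathrm{MRR@}N = \frac{1}{|T|} \sum_{t \in T} \frac{\mathbb{1}\left[\mathrm{rank}_{t} \le N\right]}{\mathrm{rank}_{t}}
```

Under these definitions, a ground truth item ranked below position $N$ contributes zero to both metrics.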

We use Adam [43] as the optimizer when training the model; the learning rate is initialized as 0.001, the batch size is set to 100, and the dimension of the item embeddings is set to 100. We initialize all trainable parameters using a Gaussian distribution with a mean of 0 and a standard deviation of 0.1. Unless specified otherwise, we set the recommendation number N to 20.
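The settings above can be collected in a small configuration object, sketched here with the Python standard library only. The class name, field names and the `init_embeddings` helper are hypothetical; only the numeric values come from the text.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    optimizer: str = "adam"
    learning_rate: float = 0.001
    batch_size: int = 100
    embedding_dim: int = 100
    init_mean: float = 0.0   # Gaussian initialization, mean
    init_std: float = 0.1    # Gaussian initialization, standard deviation
    top_n: int = 20          # default recommendation number N


def init_embeddings(num_items, cfg, seed=0):
    """Draw one embedding vector per item from N(init_mean, init_std^2)."""
    rng = random.Random(seed)
    return [
        [rng.gauss(cfg.init_mean, cfg.init_std) for _ in range(cfg.embedding_dim)]
        for _ in range(num_items)
    ]
```

In a real implementation the same values would be passed to the optimizer and parameter initializers of the deep learning framework in use.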

5. Results

In this section, we conduct several experiments to evaluate the performance of our model as well as the baselines. We first analyze the overall performance of all models in Section 5.1. We then investigate the impact of different combination strategies of the sequential and structural information on our model, i.e., the parallel co-attention strategy, the alternating co-attention strategy, and the concatenation strategy, in Section 5.2. Finally, we analyze the performance of our model as well as the baseline models on sessions with different lengths in Section 5.3.

5.1. Overall Performance

As shown in Table 3, among the baselines, the neural approaches, i.e., GRU4Rec, NARM and SR-GNN, clearly outperform the traditional methods, i.e., Item-pop and BPR-MF. Among the RNN-based models, i.e., GRU4Rec and NARM, NARM outperforms GRU4Rec on every metric except MRR@20 on Tianchi. This can be attributed to the attention mechanism in NARM, which distinguishes the importance of different items and thus helps to filter out noisy interactions and capture the user’s main purpose in the current session. The GNN-based model, i.e., SR-GNN, achieves the best performance among all baseline models, which indicates that modeling the transition relations between adjacent items helps to improve the recommendation accuracy.

As for our proposed CCN-SR models, both CCN-SR$_{pa}$ and CCN-SR$_{al}$ outperform the best baseline model in terms of Recall@20 as well as MRR@20 on Tmall and Tianchi. This demonstrates the effectiveness of considering structural and sequential information simultaneously and capturing the co-attentive relations between them for session-based recommendation. Besides, the improvements of our model over the best baseline, i.e., SR-GNN, are statistically significant in terms of Recall@20 and MRR@20 on Tianchi. This may be because Tianchi is sparser than Tmall, which makes the recommendation task more difficult on Tianchi. Since CCN-SR takes both sequential and structural information into consideration, it can generate accurate session representations even on a sparse dataset.
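To make the size of these gains concrete, the relative improvements of the alternating variant over SR-GNN can be computed directly from the values in Table 3. The sketch below is illustrative only; the dictionary layout and function names are assumptions, and the relative gain is a different quantity from the statistical significance reported in the table.

```python
# Recall@20 and MRR@20 values copied from Table 3.
TABLE3 = {
    "SR-GNN": {"Tmall": (0.7370, 0.7072), "Tianchi": (0.4576, 0.2294)},
    "CCN-SR_al": {"Tmall": (0.7435, 0.7254), "Tianchi": (0.4774, 0.2376)},
}


def relative_gain(model, baseline):
    """Relative improvement of `model` over `baseline`, in percent."""
    return 100.0 * (model - baseline) / baseline


def gains_over_best_baseline(dataset):
    """(Recall@20 gain, MRR@20 gain) of CCN-SR_al over SR-GNN, rounded."""
    ours = TABLE3["CCN-SR_al"][dataset]
    base = TABLE3["SR-GNN"][dataset]
    return tuple(round(relative_gain(o, b), 1) for o, b in zip(ours, base))


for dataset in ("Tmall", "Tianchi"):
    recall_gain, mrr_gain = gains_over_best_baseline(dataset)
    print(f"{dataset}: Recall@20 +{recall_gain}%, MRR@20 +{mrr_gain}%")
```

The gains come out at roughly 0.9% and 2.6% on Tmall versus 4.3% and 3.6% on Tianchi, so the relative improvement is larger on the sparser dataset for both metrics.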

Comparing the two variants of CCN-SR, we find that CCN-SR$_{al}$ performs better than CCN-SR$_{pa}$ on all metrics on both datasets except Recall@20 on the Tmall dataset. This may be because the alternating co-attention strategy incorporates both the initial and the attentive information through Equations (17) and (18), which helps to preserve enough information for recommendation.

5.2. Impact of Different Combination Strategies

In this section, we explore how different strategies for combining the structural and sequential information affect the performance of CCN-SR. In Section 3.4, we propose two co-attention strategies to capture the relation between these two kinds of information and combine them for session-based recommendation, i.e., parallel co-attention and alternating co-attention. Besides, simply concatenating $S_{r}$ and $S_{g}$ is also an intuitive way to integrate the two kinds of information for session-based recommendation. Thus, in this section, we test the performance of CCN-SR with three different combination strategies, i.e., parallel co-attention, alternating co-attention and concatenation, on the top-N recommendation task, with $N = 10$, 20, 30, 40, 50. We denote the three models as CCN-SR$_{pa}$, CCN-SR$_{al}$ and CCN-SR$_{concat}$, respectively. Their results on the Tmall and Tianchi datasets are presented in Figure 4.
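Evaluating at several recommendation numbers only requires the rank of the ground truth item for each test case. The following sketch assumes 1-based ranks and a hypothetical helper `metrics_at`; it also shows why Recall and MRR cannot decrease as N grows, since a larger cutoff can only add hits.

```python
def metrics_at(ranks, cutoffs=(10, 20, 30, 40, 50)):
    """Return {N: (Recall@N, MRR@N)} from 1-based ground truth ranks."""
    total = len(ranks)
    results = {}
    for n in cutoffs:
        hits = [r for r in ranks if r <= n]  # test cases counted at cutoff n
        recall = len(hits) / total
        mrr = sum(1.0 / r for r in hits) / total
        results[n] = (recall, mrr)
    return results


# Hypothetical ranks for four test cases.
example = metrics_at([1, 15, 60, 25])
```

In this toy example the item ranked 60th is never counted, while the items ranked 15th and 25th enter the metrics at N = 20 and N = 30, respectively.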

On the Tmall dataset, as shown in Figure 4a,b, the performance of all models improves in terms of Recall and MRR as the recommendation number N increases from 10 to 50. Our proposed CCN-SR$_{pa}$ and CCN-SR$_{al}$ always perform better than CCN-SR$_{concat}$, which indicates that simple concatenation cannot take full advantage of the structural and sequential information. Specifically, our CCN-SR$_{al}$ model shows larger improvements over CCN-SR$_{concat}$ in terms of MRR when there are fewer recommendations. For example, when the recommendation number is $N = 10$ and $N = 50$, CCN-SR$_{al}$ outperforms CCN-SR$_{concat}$ by 2.4% and 1.3% in terms of MRR, respectively. This may be because our CCN-SR$_{al}$ model tends to rank the ground truth item near the top of the list, so that its ranking accuracy is not influenced much by the recommendation number. This also suggests that our model can be applied in scenarios where the number of recommendations is limited, e.g., mobile recommendations.

For the Tianchi dataset shown in Figure 4c,d, the results are similar to those on the Tmall dataset, except that CCN-SR$_{al}$ outperforms CCN-SR$_{pa}$ in terms of both Recall and MRR on Tianchi, while CCN-SR$_{al}$ beats CCN-SR$_{pa}$ only in terms of MRR on Tmall. This indicates that CCN-SR$_{al}$ is better at improving the ranking accuracy than CCN-SR$_{pa}$, since CCN-SR$_{al}$ takes more information (both initial and attentive information) into consideration through the alternating co-attention calculation process.

5.3. Impact of the Current Session Length

In this section, we evaluate the performance of our models as well as the baselines on sessions of different lengths. We group the sessions in the test sets into short, medium and long sessions. For Tmall, we regard sessions with no more than 5 interactions as short sessions, sessions with more than 10 interactions as long sessions, and the remaining sessions as medium sessions. For Tianchi, short sessions contain no more than 25 interactions, long sessions contain more than 50 interactions, and the remaining sessions are medium. We report the results for short, medium and long sessions on the test sets of Tmall and Tianchi in Figure 5.
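The grouping rule can be written as a small helper, sketched below. The names `LENGTH_BOUNDS` and `length_bucket` are hypothetical, and whether the target item counts towards the session length is not stated in the text, so the input is simply taken to be the number of interactions in the session.

```python
# (short_max, medium_max): short if length <= short_max,
# long if length > medium_max, medium otherwise.
LENGTH_BOUNDS = {"Tmall": (5, 10), "Tianchi": (25, 50)}


def length_bucket(dataset, num_interactions):
    """Assign a test session to the short, medium or long group."""
    short_max, medium_max = LENGTH_BOUNDS[dataset]
    if num_interactions <= short_max:
        return "short"
    if num_interactions > medium_max:
        return "long"
    return "medium"
```

For instance, a Tmall session with 10 interactions falls into the medium group, while one with 11 interactions counts as long.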

As shown in Figure 5a,b, on the Tmall dataset the performance of all models in terms of Recall@20 and MRR@20 improves as the session length increases, and our CCN-SR models always achieve better performance than the baseline models. Among the baselines, SR-GNN and NARM perform better than GRU4Rec. However, the improvements of SR-GNN over GRU4Rec decrease as the session length increases. For instance, SR-GNN beats GRU4Rec by 10.5% and 17.8% in terms of Recall@20 and MRR@20 on short sessions on Tmall, but only by 5.7% and 9.0% on long sessions. This indicates that the RNN model is better at dealing with long sessions than with short sessions. As for our CCN-SR models, their improvements over SR-GNN are larger in terms of MRR@20 than in terms of Recall@20. For instance, CCN-SR$_{al}$ beats SR-GNN by 1.6% and 3.2% in terms of Recall@20 and MRR@20 on long sessions on Tmall. This demonstrates that our model achieves better performance in terms of ranking accuracy.

On the Tianchi dataset, as reported in Figure 5c,d, the performance of all models decreases as the session length increases, and our proposal still achieves the best performance compared with the baseline models. Specifically, for Recall@20, SR-GNN outperforms GRU4Rec on short sessions. However, the gap between the two models becomes smaller when the session length changes from short to medium, and on long sessions GRU4Rec even performs better than SR-GNN. This phenomenon is more obvious for MRR@20. SR-GNN beats GRU4Rec by 9.5% and 3.7% in terms of Recall@20 and MRR@20 on short sessions, while the improvements of GRU4Rec over SR-GNN are 2.8% and 19.3% for Recall@20 and MRR@20 on long sessions. This demonstrates that the RNN and GNN models are good at dealing with sessions of different lengths, and thus integrating the information learned from both can help to boost the performance of session-based recommendation.

6. Conclusions

In this paper, we propose a collaborative co-attention network for session-based recommendation, i.e., CCN-SR. CCN-SR incorporates both sequential and structural information learned from an RNN-based session encoder and a GNN-based session encoder, and captures their co-dependent relations for session-based recommendation. Specifically, we propose two co-attention strategies, i.e., parallel co-attention and alternating co-attention, to generate co-dependent representations of the two kinds of information and then combine them for making recommendations. We conduct comprehensive experiments to verify the effectiveness of our model and explore the impact of different combination strategies. We show that our proposed co-attention mechanism is more competitive across different recommendation numbers than the simple concatenation strategy. As to sessions with different lengths, the experimental results demonstrate that our CCN-SR model outperforms the state-of-the-art baseline models across all session lengths. Our proposal is especially competitive when the recommendation number is small. Thus, it can be applied in mobile recommendation scenarios, where the length of the recommendation list is limited.

As to the limitations of this work, on the one hand, our proposal has higher computational complexity than the baseline models due to the co-attention network; on the other hand, the improvements of our model over the best baseline model are not significant in some cases, which may be due to the different sparsity of the datasets. The improvements of our model over the baselines are more obvious on the sparser dataset. Thus, we plan to evaluate our model on more datasets in the future.

As for future work, on the one hand, we plan to incorporate external knowledge, e.g., category information and content information, to capture the item relations more accurately [44,45,46,47,48]. For example, some items are complementary products and should be recommended together, especially in e-commerce scenarios. On the other hand, different user behaviors, e.g., “click”, “collect” and “buy”, can also provide important information for capturing the user’s preference. For instance, “collect” and “buy” indicate a stronger consumption motivation of a user for an item than “click”. Thus, we plan to extend our model with behavior information to generate more informative representations of sessions [49,50,51].

Author Contributions

Conceptualization, W.C. and H.C.; methodology, W.C.; validation, W.C.; formal analysis, W.C.; investigation, W.C.; resources, W.C.; data curation, W.C.; writing—original draft preparation, W.C.; writing—review and editing, H.C.; visualization, W.C.; supervision, H.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The Tianchi dataset can be found at https://tianchi.aliyun.com/getStart/information.htm?spm=5176.100067.5678.2.30a8b6d933N6Rr&raceId=231522 (accessed on 11 June 2021).

Acknowledgments

We would like to thank the editor-in-chief and reviewers for their helpful suggestions.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Perez, H.; Tah, J.H.M. Improving the Accuracy of Convolutional Neural Networks by Identifying and Removing Outlier Images in Datasets Using t-SNE. Mathematics 2020, 8, 662.
2. Wu, S.; Tang, Y.; Zhu, Y.; Wang, L.; Xie, X.; Tan, T. Session-Based Recommendation with Graph Neural Networks. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 346–353.
3. Hidasi, B.; Karatzoglou, A.; Baltrunas, L.; Tikk, D. Session-based Recommendations with Recurrent Neural Networks. arXiv 2015, arXiv:1511.06939.
4. Bhaskaran, S.; Marappan, R.; Santhi, B. Design and Analysis of a Cluster-Based Intelligent Hybrid Recommendation System for E-Learning Applications. Mathematics 2021, 9, 197.
5. He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 173–182.
6. Sarwar, B.M.; Karypis, G.; Konstan, J.A.; Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, 1–5 May 2001; pp. 285–295.
7. Rendle, S.; Freudenthaler, C.; Schmidt-Thieme, L. Factorizing personalized Markov chains for next-basket recommendation. In Proceedings of the 19th international conference on World wide web 2010, Raleigh, NC, USA, 26–30 April 2010; pp. 811–820.
8. Li, J.; Ren, P.; Chen, Z.; Ren, Z.; Lian, T.; Ma, J. Neural Attentive Session-based Recommendation. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 1419–1428.
9. Zhang, S.; Yao, L.; Sun, A.; Tay, Y. Deep Learning Based Recommender System: A Survey and New Perspectives. ACM Comput. Surv. 2019, 52, 1–38.
10. Su, X.; Khoshgoftaar, T.M. A Survey of Collaborative Filtering Techniques. Adv. Artif. Intell. 2009, 2009, 4.
11. Betru, B.T.; Onana, C.A.; Batchakui, B. Deep Learning Methods on Recommender System: A Survey of State-of-the-art. Int. J. Comput. Appl. 2017, 162, 17–22.
12. Liu, J.; Wu, C. Deep Learning Based Recommendation: A Survey. In Proceedings of the International Conference on Information Science and Applications; Springer: Singapore, 2017; pp. 451–458.
13. Sardianos, C.; Ballas Papadatos, G.; Varlamis, I. Optimizing Parallel Collaborative Filtering Approaches for Improving Recommendation Systems Performance. Information 2019, 10, 155.
14. Koren, Y.; Bell, R.; Volinsky, C. Matrix Factorization Techniques for Recommender Systems. Computer 2009, 42, 30–37.
15. Li, S.; Kawale, J.; Fu, Y. Deep Collaborative Filtering via Marginalized Denoising Auto-encoder. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, Melbourne, VIC, Australia, 19–23 October 2015; pp. 811–820.
16. Wu, Y.; DuBois, C.; Zheng, A.X.; Ester, M. Collaborative Denoising Auto-Encoders for Top-N Recommender Systems. In Proceedings of the Ninth ACM International Conference on Web Search and Data Mining, San Francisco, CA, USA, 22–25 February 2016; pp. 153–162.
17. Sedhain, S.; Menon, A.; Sanner, S.; Xie, L. AutoRec: Autoencoders Meet Collaborative Filtering. In Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, 18–22 May 2015; pp. 111–112.
18. Salakhutdinov, R.; Mnih, A.; Hinton, G. Restricted Boltzmann Machines for Collaborative Filtering. In Proceedings of the 24th International Conference on World Wide Web 2007, ICML ’07, Corvalis, OR, USA, 20–24 June 2007; pp. 791–798.
19. Truyen, T.T.; Phung, D.Q.; Venkatesh, S. Ordinal Boltzmann Machines for Collaborative Filtering. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, UAI ’09, Montreal, Canada, 18–21 June 2009; pp. 548–556.
20. Liu, X.; Ouyang, Y.; Rong, W.; Xiong, Z. Item Category Aware Conditional Restricted Boltzmann Machine Based Recommendation. In Proceedings of the 22nd International Conference, ICONIP 2015, Istanbul, Turkey, 9–12 November 2015; pp. 609–616.
21. Kim, D.; Park, C.; Oh, J.; Lee, S.; Yu, H. Convolutional Matrix Factorization for Document Context-Aware Recommendation. In Proceedings of the 10th ACM Conference on Recommender Systems; Association for Computing Machinery: New York, NY, USA, 2016; pp. 233–240.
22. Wang, C.; Blei, D.M. Collaborative Topic Modeling for Recommending Scientific Articles. In Proceedings of the 17th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining; Association for Computing Machinery: New York, NY, USA, 2011; pp. 448–456.
23. Van den Oord, A.; Dieleman, S.; Schrauwen, B. Deep Content-based Music Recommendation. In Proceedings of the 27th Conference on Neural Information Processing Systems (NIPS ’13), Lake Tahoe, NV, USA, 5–10 December 2013; pp. 2643–2651.
24. Shani, G.; Heckerman, D.; Brafman, R.I. An MDP-Based Recommender System. J. Mach. Learn. Res. 2005, 6, 1265–1295.
25. Wang, P.; Guo, J.; Lan, Y.; Xu, J.; Wan, S.; Cheng, X. Learning Hierarchical Representation Model for NextBasket Recommendation. In Proceedings of the 8th International ACM SIGIR Conference on Research and Development in Information Retrieval; Association for Computing Machinery: New York, NY, USA, 2015; pp. 403–412.
26. Wang, B.; Cai, W. Attention-Enhanced Graph Neural Networks for Session-Based Recommendation. Mathematics 2020, 8, 1607.
27. Tan, Y.K.; Xu, X.; Liu, Y. Improved Recurrent Neural Networks for Session-based Recommendations. In Proceedings of the 1st Workshop on Mobile Medical Applications; Association for Computing Machinery (ACM): New York, NY, USA, 2016; pp. 17–22.
28. Hidasi, B.; Karatzoglou, A. Recurrent Neural Networks with Top-k Gains for Session-based Recommendations. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management; Association for Computing Machinery: New York, NY, USA, 2018; pp. 843–852.
29. Bogina, V.; Kuflik, T. Incorporating Dwell Time in Session-Based Recommendations with Recurrent Neural Networks. In Proceedings of the Eleventh ACM Conference on Recommender Systems (RecSys’17), Como, Italy, 27–31 August 2017; pp. 57–59.
30. Chen, X.; Xu, H.; Zhang, Y.; Tang, J.; Cao, Y.; Qin, Z.; Zha, H. Sequential Recommendation with User Memory Networks. In Proceedings of the Eleventh ACM International Conference on Web Search and Data Mining; Association for Computing Machinery (ACM): New York, NY, USA, 2018; pp. 108–116.
31. Wang, B.; Cai, W. Knowledge-Enhanced Graph Neural Networks for Sequential Recommendation. Information 2020, 11, 388.
32. Xu, C.; Zhao, P.; Liu, Y.; Sheng, V.S.; Xu, J.; Zhuang, F.; Fang, J.; Zhou, X. Graph Contextualized Self-Attention Network for Session-based Recommendation. In Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence, Macao, 10–16 August 2019; pp. 3940–3946.
33. Pan, Z.; Cai, F.; Chen, W.; Chen, H.; de Rijke, M. Star Graph Neural Networks for Session-based Recommendation. In Proceedings of the 29th ACM International Conference on Information & Knowledge Management; Association for Computing Machinery (ACM): New York, NY, USA, 2020; pp. 1195–1204.
34. Yu, F.; Zhu, Y.; Liu, Q.; Wu, S.; Wang, L.; Tan, T. TAGNN: Target Attentive Graph Neural Networks for Session-based Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Xi’an, China, 25–30 July 2020; pp. 1921–1924.
35. Qiu, R.; Li, J.; Huang, Z.; Yin, H. Rethinking the Item Order in Session-based Recommendation with Graph Neural Networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, Beijing, China, 3–7 November 2019; pp. 579–588.
36. Liu, Q.; Zeng, Y.; Mokhosi, R.; Zhang, H. STAMP: Short-Term Attention/Memory Priority Model for Session-based Recommendation. In Proceedings of the KDD’18: The 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, London, UK, 19–23 August 2018; pp. 1831–1839.
37. Pan, Z.; Cai, F.; Ling, Y.; de Rijke, M. Rethinking Item Importance in Session-based Recommendation. In Proceedings of the SIGIR ’20: The 43rd International ACM SIGIR conference on research and development in Information Retrieval, Virtual Event, China, 25–30 July 2020; pp. 1837–1840.
38. Hochreiter, S.; Schmidhuber, J. Long Short-term Memory. Neural Comput. 1997, 9, 1735–1780.
39. Ren, P.; Chen, Z.; Li, J.; Ren, Z.; Ma, J.; de Rijke, M. RepeatNet: A Repeat Aware Neural Recommendation Machine for Session-Based Recommendation. Proc. Conf. AAAI Artif. Intell. 2019, 33, 4806–4813.
40. Adomavicius, G.; Tuzhilin, A. Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions. IEEE Trans. Knowl. Data Eng. 2005, 17, 734–749.
41. Rendle, S.; Freudenthaler, C.; Gantner, Z.; Schmidt-Thieme, L. BPR: Bayesian Personalized Ranking from Implicit Feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence; AUAI Press: Arlington, VA, USA, 2009; pp. 452–461.
42. Yu, F.; Liu, Q.; Wu, S.; Wang, L.; Tan, T. A Dynamic Recurrent Model for Next Basket Recommendation. In Proceedings of the SIGIR ’16: The 39th International ACM SIGIR conference on research and development in Information Retrieval, Pisa, Italy, 17–21 July 2016; pp. 729–732.
43. Kingma, D.; Ba, J. Adam: A Method for Stochastic Optimization. arXiv 2014, arXiv:1412.6980.
44. Zheng, L.; Noroozi, V.; Yu, P.S. Joint Deep Modeling of Users and Items Using Reviews for Recommendation. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining; ACM: New York, NY, USA, 2017; pp. 425–434.
45. Tay, Y.; Tuan, L.A.; Hui, S.C. Multi-Pointer Co-Attention Networks for Recommendation. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018.
46. Hao, J.; Zhao, T.; Li, J.; Dong, X.L.; Faloutsos, C.; Sun, Y.; Wang, W. P-Companion: A Principled Framework for Diversified Complementary Product Recommendation. In Proceedings of the 29th ACM International Conference on Information and Knowledge Management; ACM: New York, NY, USA, 2020; pp. 2517–2524.
47. Abbas, S.M.; Alam, K.A.; Shamshirband, S. A Soft-Rough Set Based Approach for Handling Contextual Sparsity in Context-Aware Video Recommender Systems. Mathematics 2019, 7, 740.
48. Bhaskaran, S.; Marappan, R.; Santhi, B. Design and Comparative Analysis of New Personalized Recommender Algorithms with Specific Features for Large Scale Datasets. Mathematics 2020, 8, 1106.
49. Wan, M.; McAuley, J. Item Recommendation on Monotonic Behavior Chains. In Proceedings of the 12th ACM Conference on Recommender Systems; ACM: New York, NY, USA, 2018.
50. Gehring, J.; Auli, M.; Grangier, D.; Yarats, D.; Dauphin, Y.N. Convolutional Sequence to Sequence Learning. arXiv 2017, arXiv:1705.03122.
51. Borisov, A.; Wardenaar, M.; Markov, I.; de Rijke, M. A Click Sequence Model for Web Search. In The 41st International ACM SIGIR Conference on Research & Development in Information Retrieval; ACM: New York, NY, USA, 2018; pp. 45–54.

Figure 1. Structure of the CCN-SR model.

Figure 2. Details of the parallel co-attention mechanism.

Figure 3. Details of the alternating co-attention mechanism.

Figure 4. Performance of CCN-SR with different strategies combining structural as well as sequential information on the top N recommendation task, tested on the Tmall and Tianchi datasets.

Figure 5. Effect on performance of five models in terms of Recall@20 and MRR@20 with different session lengths, tested on the Tmall and Tianchi datasets.

Table 1. Summary of the main notations used in the paper.

Notation	Description
S	the current session
$v_{i}$	the embedding of item $v_{i}$
$h_{t}$	hidden state at timestep t in RNN-based session encoder
$z_{t}$	the update gate in GRU
$r_{t}$	the reset gate in GRU
$W_{z}$ , $U_{z}$ , $W_{r}$ , $U_{r}$ , $W_{o}$ , $U_{o}$	trainable parameters in RNN-based session encoder
$S_{r}$	the output of the RNN-based session encoder
$V_{S}$	the unique items set of session S
$G_{s}$	the session graph for S
$E_{s}$	the edges in the session graph $G_{s}$
$V_{s}$	the nodes in the session graph $G_{s}$
$v_{s, i}$	item $v_{i}$ in the current session, where $v_{s, i} \in V_{S}$
$A_{s}^{o u t}$	the matrix containing the connection between nodes in the session graph with outgoing edges
$A_{s}^{i n}$	the matrix containing the connection between nodes in the session graph with incoming edges
$z_{s, i}^{t}$	the update gate in GNN
$r_{s, i}^{t}$	the reset gate in GNN
$W_{z}^{g}$ , $U_{z}^{g}$ , $W_{r}^{g}$ , $U_{r}^{g}$ , $W_{o}^{g}$ , $U_{o}^{g}$	trainable parameters in GNN-based session encoder
$v_{t}^{g}$	the node vector generated by the GNN-based session encoder for item $v_{t}$
$S_{g}$	the output of the GNN-based session encoder
$C$	the affinity matrix in the co-attention network
$W_{s r}$ , $W_{s g}$ , $W_{t}^{r}$ , $W_{t}^{g}$ , $w_{h r}$ , $w_{h g}$	the weight parameters in the parallel co-attention strategy
$B_{r}$ , $B_{g}$ , $B^{p}$	trainable parameters in the parallel co-attention strategy
$B_{c g}$ , $m$ , $A_{1}$ , $A_{2}$ , $B^{a}$	trainable parameters in the alternating co-attention strategy
$U_{S}$	the final session representation of S generated by the co-attention network
$y_{i}$	the prediction score of item $v_{i}$

Table 2. Dataset statistics.

Dataset	Tmall	Tianchi
# of items	3751	36,369
# of interactions	131,857	3,438,390
# interactions in training set	99,057	2,632,480
# interactions in test set	32,800	805,910

Table 3. Performance of recommendation models. The results produced by the best baseline and the best performer in each column are underlined and boldfaced, respectively. Statistical significance of pairwise differences of best model vs. the best baseline is determined by a paired t-test ($^{△}$ for p-value ≤ 0.05).

Model	Tmall Recall@20	Tmall MRR@20	Tianchi Recall@20	Tianchi MRR@20
Item-pop	0.1509	0.0432	0.0044	0.0011
BPR-MF	0.1659	0.0537	0.0014	0.0017
GRU4Rec	0.6815	0.6234	0.4213	0.2263
NARM	0.7314	0.6969	0.4393	0.2198
SR-GNN	0.7370	0.7072	0.4576	0.2294
CCN-SR$_{pa}$	0.7523	0.7226	0.4655	0.2361
CCN-SR$_{al}$	0.7435	0.7254	0.4774$^{△}$	0.2376$^{△}$

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

Share and Cite

MDPI and ACS Style

Chen, W.; Chen, H. Collaborative Co-Attention Network for Session-Based Recommendation. Mathematics 2021, 9, 1392. https://doi.org/10.3390/math9121392
