Article

MIGAN: Mutual-Interaction Graph Attention Network for Collaborative Filtering

Ahlem Drif 1,†,* and Hocine Cherifi 2,†
1 Faculty of Sciences, Ferhat Abbas University, Setif 1, Setif 19000, Algeria
2 Laboratoire d'Informatique de Bourgogne, University of Burgundy, 21078 Dijon, France
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Entropy 2022, 24(8), 1084; https://doi.org/10.3390/e24081084
Submission received: 9 June 2022 / Revised: 28 July 2022 / Accepted: 2 August 2022 / Published: 5 August 2022

Abstract

Many web platforms now include recommender systems. Network representation learning has been a successful approach for building these efficient recommender systems. However, learning the mutual influence of nodes in the network is challenging, as it carries the collaborative signals accounting for the effect of complex user-item interactions on user decisions. For this purpose, in this paper, we develop the Mutual-Interaction Graph Attention Network "MIGAN", a new algorithm based on self-supervised representation learning on a large-scale bipartite graph (BGNN). Experimental investigation with real-world data demonstrates that MIGAN compares favorably with the baselines in terms of prediction accuracy and recommendation efficiency.

1. Introduction

In the literature, there are numerous techniques for building recommendation systems. They can be classified as collaborative, content-based, or hybrid filtering approaches [1,2,3,4]. Collaborative filtering is the most influential. It relies on identifying users with similar tastes and leverages their feedback to make suggestions to the active user. Collaborative recommender systems have been implemented in multiple application areas [5,6,7].
Learning effective user/item representations from their interactions and side information is a challenging issue in recommender systems. Since most data have a graph structure and graph neural networks (GNNs) excel at representation learning, using GNNs in recommender systems is a flourishing field of research. Several works rely on GNNs to perform recommendation and other tasks, such as the graph convolutional network (GCN) [8] and the graph attention network (GAT) [9]. GAT computes the representation of nodes by adaptively combining their neighborhoods' vectors using a self-attention mechanism with trainable attention weights. Wang et al. [10] suggest a knowledge graph attention network (KGAT) for KG-based recommendations. To enrich the representation, the authors consider the implicit collaborative information of multi-hop neighbors. In [11], the authors propose a neighbor-aware graph attention network for recommendation tasks to model the implicit correlations of neighbors. Unlike previous attention networks, our current work determines the most relevant weights characterizing the mutual influence between items and users. In addition to learning the interaction between user interests and item embeddings, it integrates a new component accounting for the mutual influence of items carrying collaborative signals on user decisions. MIGAN learns the most relevant weights representing the users' mutual influence on an item. It exploits the complex relation between the user profile and the item attributes. Its main advantage lies in its ability to discriminate the relative influence of the various interactions between nodes. Our main contributions are summarized as follows:
  • Our approach is based on self-supervised representation learning on a large-scale bipartite graph (BGNN). We have adapted this powerful representation for the recommendation task because of its ability to model the dependencies between nodes at a large scale.
  • The collaborative filtering recommender based on interactive neural attention networks takes advantage of the encoding potential of interactive attention between users and items. It learns the most significant weights representing users’ mutual effect on the item. Consequently, exploiting this information improves the recommender systems’ accuracy.
  • The empirical evaluation on real-world data shows that MIGAN significantly outperforms the state-of-the-art baselines.
The rest of the paper is organized as follows. Section 2 describes the related literature. Section 3 presents in detail the proposed architecture. Section 4 discusses the experimental results. Section 5 summarizes the conclusions.

2. Related Work

Graph embedding is one of machine learning's newest and fastest-growing subfields. Its strength lies in its ability to take advantage of the intrinsic graph structure of the data types encountered in a wide range of applications. The graph format models a set of elements (represented by nodes) and their relationships (represented by edges) to capture structural information. Various graph embedding models have therefore been proposed in the literature [12,13]. Node2Vec is an embedding model for converting graphs into numerical representations, where each node in the network is used as a starting point to produce a corpus of random walks [14]. In a first-order random walk, each step is determined solely by the current state. The steps in a second-order random walk are determined by the current and prior states. The random walk corpus is fed through Word2Vec to build the node embeddings. Liang et al. [15] extend variational autoencoders (VAEs) to collaborative filtering for implicit feedback. They model the collaborative information with a multinomial likelihood (MultiVAE) over the data distribution, allowing predictions to be sampled for items on the long tail. Drif et al. [16] develop an ensemble variational autoencoder framework for recommendations (EnsVAE) that specifies a procedure to transform sub-recommenders' predicted utility matrices into interest probabilities, which allows the VAE to represent the variation in their aggregation. This architecture is based on two components: (1) a GloVe content-based filtering recommender (GloVe-CBF) that exploits the strengths of embedding-based representations and stacking ensemble learning techniques to extract features from the item-based side information, and (2) a variant of the neural collaborative filtering recommender, named the Gated Recurrent Unit-based Matrix Factorization (GRU-MF) recommender. It models a high level of non-linearities and exhibits interactions between users and items in latent embeddings, reducing user biases towards items that are rated frequently. Lo et al. [17] exploit the extensive structural information of graph flow data using graph neural networks (GNNs). The latter work with the message-passing concept, where a node collects its neighbors' features, which are sent to it as messages. In recent years, the graph attention network approach has developed rapidly. Wang et al. [18] introduced a multi-dimension interaction-based attentional knowledge graph neural network (MI-KGNN) to improve recommendations based on a knowledge graph (KG). MI-KGNN explores the interaction between users and the neighborhood during embedding propagation. In [10], the authors proposed a knowledge graph attention network (KGAT) modeling the high-order connectivity in knowledge graphs. It exploits the attention mechanism to determine important neighbors. In [19], the authors propose a multi-view graph attention network (MV-GAN) based on heterogeneous information networks for recommendation. They create attention networks at the node and path levels to learn user and product representations from every view. A view-level attention mechanism is developed to co-operatively integrate various relationship types in multiple views. Liu et al. [20] propose a contextualized graph attention network (CGAT) based on an entity's local and non-local context data in a knowledge graph. CGAT implements a graph attention method to record local context information while considering users' unique preferences for entities.
The non-local context of an entity is also extracted by CGAT using a biased random walk sampling method. Propagating information from nodes across the network requires numerous iterations, and aggregating information at each node (the node embedding) is a memory- and time-consuming task. Due to this fact, many GNNs for recommendation suffer from scalability limitations and unpredictable memory and computational resource requirements on large graphs. To overcome these drawbacks, our work considers node representation learning on large-scale bipartite graphs.

3. Mutual-Interaction Graph Attention Network Approach

The proposed architecture considers the mutual collaborative information of the users' preferences over the neighboring items to enrich the representation. It is based on self-supervised representation learning on a large-scale bipartite graph (BGNN) [21]. Figure 1 illustrates the model architecture in detail. We first formulate the recommendation task on the bipartite graph. Secondly, we introduce the embedding representation based on BGNN and then describe the mutual-interaction graph attention mechanism.

3.1. Problem Formulation

Let $U = \{u_1, u_2, \dots, u_n\}$ and $I = \{i_1, i_2, \dots, i_m\}$ be the sets of users and items, respectively, where $n$ is the number of users and $m$ is the number of items. We assume that $R \in \mathbb{R}^{n \times m}$ is the user-item rating matrix.
We formulate the recommendation task as a prediction problem with the following notations: $\hat{R}$ is the utility matrix; $\hat{r}_u$ is the predicted rating for each user $u \in U$; $U = \{u_1, u_2, \dots, u_n\}$ is the set of users, where $n$ is the number of users; $I = \{i_1, i_2, \dots, i_m\}$ is the set of items, where $m$ is the number of items; $R_{ui}$ is the ground-truth rating assigned by user $u$ to item $i$.
We define the utility matrix as:
$$\hat{R}_{ui} = P Q^{T} = \sum_{k=1}^{K} p_{uk} q_{ki}$$
where $K$ is the latent space's dimension, and $S_{(n \times m)}$ is the inner product of the user and item latent vectors. It is decomposed by the matrix factorization method into $P \in \mathbb{R}^{N \times K}$ and $Q \in \mathbb{R}^{M \times K}$.
To normalize $\hat{R}$, we apply min/max scaling:
$$\mathrm{minmax}(x) = \frac{x - \mathit{min}}{\mathit{max} - \mathit{min}}, \quad x \in \hat{r}_{ui}$$
where $\mathit{min} = \min(r_{ui})$ is the minimal rating and $\mathit{max} = \max(r_{ui})$ is the maximal rating. The normalization eliminates user bias. In other words, users have different ways of rating items. Some would only give high ratings to items they like, while others do the opposite. Normalizing users' ratings hides their bias by mapping them to values between 0 and 1. The lowest rating of each user is mapped to 0, while 1 represents their highest value. It assists in leveraging more accurate collaborative-based recommendations.
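As an illustration, here is a minimal NumPy sketch of this per-user normalization (assuming a dense rating matrix where 0 marks an unobserved entry; the handling of users with a single rating level is our choice):

```python
import numpy as np

def minmax_normalize(R):
    """Scale each user's observed ratings to [0, 1] to remove per-user rating bias.

    R: (n_users, n_items) rating matrix; 0 denotes an unobserved entry.
    """
    R_norm = np.zeros_like(R, dtype=float)
    for u in range(R.shape[0]):
        observed = R[u] > 0
        if not observed.any():
            continue
        r_min, r_max = R[u, observed].min(), R[u, observed].max()
        if r_max == r_min:
            # a user who always gives the same rating carries no ordering signal
            R_norm[u, observed] = 1.0
        else:
            R_norm[u, observed] = (R[u, observed] - r_min) / (r_max - r_min)
    return R_norm
```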
Table 1 reports the notations used in the rest of this paper.
A bipartite graph is composed of two independent sets of vertices, $U_1$ and $I_1$. Edges only connect a vertex in $U_1$ to a vertex in $I_1$.
We define bipartite graphs as follows: let $G = (U_1, I_1, E)$ be a bipartite graph, where $e_{ij}$ represents the edge between $u_i$ and $i_j$. $B_u \in \mathbb{R}^{M \times N}$ is the incidence matrix for $U_1$, and $B_i \in \mathbb{R}^{N \times M}$ is the incidence matrix for $I_1$, where
$$B_u(i,j) = \begin{cases} 1 & \text{if } e_{ij} \in E, \\ 0 & \text{if } e_{ij} \notin E. \end{cases}$$
$X_u \in \mathbb{R}^{M \times P}$ is defined as the feature matrix of node $u_i$ ($X_i$ is written similarly).
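For illustration, a possible construction of the two incidence matrices from the rating matrix (a sketch; the helper name get_bipartite_graph echoes Algorithm 1 below and is ours):

```python
import numpy as np

def get_bipartite_graph(R):
    """Build the incidence matrices B_u and B_i of the user-item bipartite graph.

    An edge e_ij exists whenever user u_i rated item i_j (R[i, j] > 0).
    """
    B_u = (R > 0).astype(float)   # user-side incidence: users x items
    B_i = B_u.T                   # item-side incidence: items x users
    return B_u, B_i
```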
Our work is based on the self-supervised node representation learning model [21], which can employ topology information as well as separate node attributes from two domains to increase the recommendation performance on large graphs.

3.2. Embedding Representation Based on Bipartite Graph Neural Networks (BGNN)

He et al. [21] proposed a self-supervised representation learning framework for large-scale bipartite graphs. In this section, we adapt the outputs of this BGNN architecture so that we can deploy it in our recommendation system. Let us define the following notions: $H_u \in \mathbb{R}^{P}$ ($H_i \in \mathbb{R}^{Q}$, respectively) is the node embedding representation for $U_1$ ($I_1$, respectively); $f_{emb}$ is an embedding model parameterized by $\theta$.
The embedding of the distinct node features $X_u$ and $X_i$ is written as:
$$H_u, H_i = f_{emb}(X_u, B_u, X_i, B_i, \theta)$$
The architecture of $f_{emb}$ is based on two functions: (i) Inter-Domain Message Passing (IDMP) and (ii) Intra-Domain Alignment (IDA). We briefly describe these functions and show how we prepare the outputs for the recommendation task (interested readers can refer to [21] for more details). IDMP enables one domain to aggregate information from the other domain, through the linked edges, as follows:
$$H_{i \to u} = f_u(X_i, B_u, \theta)$$
$$H_{u \to i} = f_i(X_u, B_i, \theta)$$
where $f_u$ (resp. $f_i$) represents the IDMP function for the corresponding domain, and $H_{i \to u}$ (resp. $H_{u \to i}$) is the flow of aggregated information from $I_1$ (resp. $U_1$) to $U_1$ (resp. $I_1$). After that, Intra-Domain Alignment (IDA) merges these two distinct features into a single representation. After the self-supervised training, the algorithm yields the domain representations $H_{U_1}$ and $H_{I_1}$. The adversarial loss $L_{adv}$ is used to compute the best results:
$$Loss_u = L_{adv}(H_{i \to u}, X_u)$$
$$Loss_i = L_{adv}(H_{u \to i}, X_i)$$
Thus, the Inter-Domain Message Passing (IDMP) is expressed as:
$$H_{i \to u}^{(K)} = \sigma(\tilde{B}_u H_i^{(K)} W_u^{K})$$
$$H_{u \to i}^{(K)} = \sigma(\tilde{B}_i H_u^{(K)} W_i^{K})$$
where $\tilde{B}_u = D_u^{-1} B_u$ is the normalization of $B_u$ ($D_u$ is the degree matrix of $B_u$). By normalizing the incidence matrix of the graph, the algorithm effectively reduces the computational cost. $\sigma$ is the ReLU activation function [22]. $K$ denotes the depth index of the hidden features of the nodes in set $U_1$ (resp. $I_1$).
Let $\sigma$ denote the Intra-Domain Alignment (IDA) discriminator and $\phi$ the IDMP generator. The discriminator loss function is expressed as follows:
$$L_D(\sigma \mid \phi) = -\frac{1}{M}\sum_{j=1}^{M} \log P_{\sigma,\phi}(source = 0 \mid h_u^{(j)}) - \frac{1}{N}\sum_{j=1}^{N} \log P_{\sigma,\phi}(source = 1 \mid h_{i \to u}^{(j)})$$
where $P_{\sigma,\phi}(source = 1 \mid h)$ is the probability that the input feature vector $h$ comes from the source domain $H_{i \to u}$.
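To make the message-passing step concrete, the following minimal NumPy sketch implements one IDMP layer as we read the equations above (the degree normalization and the ReLU follow the text; treating the weight matrix as a plain array, rather than an adversarially trained parameter, is a simplification):

```python
import numpy as np

def idmp(B_u, H_i, W_u):
    """One Inter-Domain Message Passing step: aggregate item features to users.

    B_u : (n_users, n_items) incidence matrix
    H_i : (n_items, d) item-side hidden features
    W_u : (d, d_out) weight matrix
    """
    deg = B_u.sum(axis=1, keepdims=True)   # D_u: degree of each user node
    deg[deg == 0] = 1.0                    # guard against isolated nodes
    B_tilde = B_u / deg                    # D_u^{-1} B_u
    return np.maximum(0.0, B_tilde @ H_i @ W_u)   # ReLU activation
```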
The implementation of the self-supervised representation learning on a large-scale bipartite graph (BGNN) for the recommendation task is summarized in Algorithm 1.
Algorithm 1 Bipartite Graph Neural Networks for the recommendation task.
Input:
    $X_u$: user features list
    $X_i$: item features list
    $R$: rating matrix
Output:
    $emb_u$: users' embedding representation based on BGNN
    $emb_i$: items' embedding representation based on BGNN
Begin
   phase 1                               ▹ Extract the graph from the rating matrix
       $B_u, B_i$ = GetBipartiteGraph($R$)
   phase 2                     ▹ Compute embeddings for each item and user
       $H_u^{0} = X_u$; $H_i^{0} = X_i$
       for $j = 0, 1, \dots, K-1$ do
           $H_{u \to i}^{j}$ = IDMP($B_i$, $H_u^{j}$)
           $H_{i \to u}^{j}$ = IDMP($B_u$, $H_i^{j}$)
           $H_u^{j+1}$ = IDA($H_{i \to u}^{j}$)
           $H_i^{j+1}$ = IDA($H_{u \to i}^{j}$)
       Endfor
   phase 3                                                         ▹ Embeddings preparation
       $emb_u$ = GetEmbeddings($H_u^{K}$)
       $emb_i$ = GetEmbeddings($H_i^{K}$)
return $emb_u$, $emb_i$
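Putting the pieces together, a compact sketch of Algorithm 1 could look as follows (it reuses the hypothetical get_bipartite_graph and idmp helpers sketched above; the per-layer weights and the IDA alignment functions are left as injectable arguments, since [21] trains them adversarially):

```python
def bgnn_embeddings(X_u, X_i, R, weights_u, weights_i, ida_u, ida_i, K=2):
    """Sketch of Algorithm 1: cascaded IDMP/IDA over K layers.

    weights_u / weights_i: per-layer IDMP weight matrices.
    ida_u / ida_i: intra-domain alignment functions (identity is the
    simplest stand-in for the trained alignment of [21]).
    """
    B_u, B_i = get_bipartite_graph(R)        # phase 1
    H_u, H_i = X_u, X_i                      # phase 2: initial node features
    for j in range(K):
        H_ui = idmp(B_i, H_u, weights_i[j])  # items aggregate user features
        H_iu = idmp(B_u, H_i, weights_u[j])  # users aggregate item features
        H_u, H_i = ida_u(H_iu), ida_i(H_ui)  # intra-domain alignment
    return H_u, H_i                          # phase 3: final embeddings
```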

3.3. The Interactive Attention Network Recommender

The interactive attention network recommendation system aims at identifying latent features that show the users' and items' mutual influence. The attention mechanism has proven useful in a variety of machine learning applications, including image/video captioning [23,24]. Our proposed interactive concept extracts each participant's contribution from its compressed representation. Thus, it allows the proposed recommendation framework to efficiently model the interaction characteristics. This attention network model figures out which weights best represent the users' mutual effect on the item. Figure 2 illustrates the mutual interactions between users and items.
Algorithm 2 shows the architecture of the proposed interactive attention network. In order to anticipate a distribution over the items, we create combined user and item interactive attention maps. As a result, the co-attention mechanism detects correlations between items and users and calculates the likelihood that an item will be of interest to indirectly comparable users.
The first embedding layers $e_u$ and $e_i$ capture the latent features of users $p_u$ and items $q_i$. They are followed by Long Short-Term Memory (LSTM) layers to learn long sequences with long time lags. Each LSTM state has two inputs: the current feature vector and the output vector $h_{t-1}$ of the previous state. Its output vector is $h_t$. We chose to apply the LSTM model as it exhibits interactions between users and items in latent embeddings. Each node embedding layer is chained with an LSTM layer that contains recurrent modules enabling long-range learning. Information from node neighbors gradually enhances the subsequent feature representation because the LSTM has an augmented hidden state with non-linear mechanisms. It allows propagating without modification, updating, or resetting states using simple learned gating functions. The LSTM representation is expressed as follows:
$$h_t^u = g_1(p)$$
$$h_t^i = g_2(q)$$
The learned representations are $H_p$ and $H_q$, with dimensions $d \times n$ for $H_p$ and $d \times m$ for $H_q$. The embedded user and item inputs are projected into a vector representation space using the attention technique. This representation models the high-order non-linear mutual relationship. For the interactive attention mechanism, we build attention maps in order to predict a distribution over the items. For this purpose, we compute a matrix $L = \tanh(H_p^{\top} W_{pq} H_q)$, where $L \in \mathbb{R}^{n \times m}$ and $W_{pq}$ is a $d \times d$ learnable parameter matrix. The feature co-attention maps are defined as:
$$\alpha_p^* = \tanh(W_p H_p + (W_q H_q) L^{\top}), \qquad \alpha_q^* = \tanh(W_q H_q + (W_p H_p) L)$$
The interactive attention model uses a tangent function to model the mutual interactions between users and items. Afterward, we compute the probability distribution over the embedding space. The softmax function is used to generate the attention weights:
$$\alpha_u = \mathrm{Softmax}(f(\alpha_p^*))$$
$$\alpha_i = \mathrm{Softmax}(f(\alpha_q^*))$$
where $f$ is a multi-layer neural network.
Then, the high-order interaction latent space of users and items is given by:
$$f_1 = [\beta_u \oplus \beta_i]$$
where $\beta_u$ and $\beta_i$ are the derived attention weights.
As a result, the predicted matrix $\hat{R}_{ui}$ is defined as:
$$\hat{R}_{ui} = f(f_1)$$
where $f$ is a dense layer using a sigmoid activation function.
Finally, we train the model to minimize the loss function, which is the Mean Absolute Error (MAE):
$$L(R_{ui}, \hat{R}_{ui}) = \frac{1}{|C|} \sum_{(u,i) \in C} | R_{ui} - \hat{R}_{ui} |$$
Algorithm 2 CoAttention: the interactive attention network recommender.
Input:
    lstmU: user's LSTM output, size $d \times n$
    lstmI: item's LSTM output, size $d \times m$
Output:
    $att_{ui}$: the interactive attention between users and items
Begin
   phase 1                                                            ▹ Initialization of weights
       $W_u$: size $n \times d$
       $W_i$: size $m \times d$
       $W_{ui}$: size $d \times d$
       $b_u$: size $d \times n$
       $b_i$: size $d \times m$
   phase 2                                                          ▹ tanh function application
       $S$ = lstmI
       $G$ = lstmU
       $F = \tanh(S^{t} W_{ui} G)$
       $\alpha_u^* = \tanh(W_u G + (W_i S) F^{t})$
       $\alpha_i^* = \tanh(W_i S + (W_u G) F)$
   phase 3                                                     ▹ Softmax function application
       $\alpha_u = \mathrm{softmax}(f(b_u + \alpha_u^*))$
       $\alpha_i = \mathrm{softmax}(f(b_i + \alpha_i^*))$
       $\beta_u = O(\alpha_u)$
       $\beta_i = O(\alpha_i)$
                     ▹ O is the batch_dot() function from the Keras backend, applied between the two tensors $(\alpha_p)^{t}$ and $(G)^{t}$, and $(\alpha_q)^{t}$ and $(S)^{t}$, respectively.
   phase 4            ▹ Each output of function O is transposed and then used as input to a product function. After that, the two results $\beta_u$ and $\beta_i$ are merged by a concatenation function.
       $att_{ui}$ = concatenation($\beta_u$, $\beta_i$)
return $att_{ui}$
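The following Keras-backend sketch mirrors phases 2 and 3 of Algorithm 2 (a minimal sketch, not the authors' code: the batch-dimension handling is ours, a sum over the feature dimension stands in for the multi-layer network f, the bias terms b_u and b_i are omitted, and the weight matrices are assumed to be pre-created trainable tensors; K.batch_dot is the function O referred to above):

```python
from tensorflow.keras import backend as K

def co_attention(lstm_u, lstm_i, W_u, W_i, W_ui):
    """Interactive (co-)attention between user and item representations.

    lstm_u: (batch, n, d) user features; lstm_i: (batch, m, d) item features.
    W_u, W_i: (d, d) projections; W_ui: (d, d) interaction matrix.
    """
    G, S = lstm_u, lstm_i
    # F = tanh(S W_ui G^T): pairwise item-user affinity map, shape (batch, m, n)
    F = K.tanh(K.batch_dot(K.dot(S, W_ui),
                           K.permute_dimensions(G, (0, 2, 1))))
    # co-attention features, each side projected through F (or its transpose)
    a_u = K.tanh(K.dot(G, W_u)
                 + K.batch_dot(K.permute_dimensions(F, (0, 2, 1)), K.dot(S, W_i)))
    a_i = K.tanh(K.dot(S, W_i) + K.batch_dot(F, K.dot(G, W_u)))
    # attention weights over positions (the sum stands in for the MLP f)
    alpha_u = K.softmax(K.sum(a_u, axis=-1))   # (batch, n)
    alpha_i = K.softmax(K.sum(a_i, axis=-1))   # (batch, m)
    # attended summaries beta_u, beta_i via batch_dot, then concatenation
    beta_u = K.batch_dot(K.expand_dims(alpha_u, 1), G)[:, 0, :]   # (batch, d)
    beta_i = K.batch_dot(K.expand_dims(alpha_i, 1), S)[:, 0, :]   # (batch, d)
    return K.concatenate([beta_u, beta_i], axis=-1)
```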
Algorithm 3 summarizes the overall mutual-interaction graph attention network approach. Network representation learning tackles the recommendation problem by embedding nodes into a low-dimensional space $\mathbb{R}^d$. Furthermore, this adapted BGNN representation for the recommendation task improves multi-hop relationship modeling and the training accuracy. Unlike previous research on graph neural networks for recommendation, which only learns the complex relationship between the target and its neighbors using an attention network, our work learns the most important weights representing the users' mutual influence on the item based on the interactive attention.
Algorithm 3 MIGAN: Mutual-Interaction Graph Attention Network.
Input:
    $X_u$: user features list
    $X_i$: item features list
    $U$: list of users, size = $n$
    $I$: list of items, size = $m$
    $R$: rating matrix
Output:
    $P_{ui}$: prediction matrix
Begin
   phase 1                                    ▹ Preparing data to be passed to the BGNN
       foreach $u \in U$, $i \in I$ do $R_j$ = minmax($R$)
   phase 2                                       ▹ Extracting embeddings with BGNN-Class()
       $emb_u$, $emb_i$ = BGNN($X_u$, $X_i$, $R_j$)
   phase 3              ▹ User and item embeddings are followed by LSTM layers
       $lstm_u$ = LSTM($emb_u$)
       $lstm_i$ = LSTM($emb_i$)
   phase 4                                                      ▹ Applying the attention mechanism
       $att_u$ = Attention($lstm_u$)
       $att_i$ = Attention($lstm_i$)
       $att_{ui}$ = CoAttention($lstm_u$, $lstm_i$)
   phase 5                                                            ▹ Concatenating the outputs
       ATT = concatenation($att_u$, $att_i$, $att_{ui}$)
       InteractiveAttention = BuildModel(ATT)
       InteractiveAttention.trainModel(D)
return $P_{ui}$ = InteractiveAttention.predict($\varphi$)
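To show how phases 3–5 could be wired together, here is a condensed Keras sketch (the helper name build_migan, the input shapes, and the use of the built-in layers.Attention are our assumptions, not the authors' exact implementation; the layer sizes follow the hyperparameter study in Section 4.1, and each BGNN embedding pair is treated as a length-1 sequence so it can pass through the LSTM encoders):

```python
from tensorflow.keras import layers, Model

def build_migan(emb_dim=75, n_dense=3, n_neurons=100,
                activation="elu", optimizer="adam"):
    """Sketch of MIGAN's prediction head (phases 3-5 of Algorithm 3)."""
    u_in = layers.Input(shape=(1, emb_dim), name="user_bgnn_embedding")
    i_in = layers.Input(shape=(1, emb_dim), name="item_bgnn_embedding")
    lstm_u = layers.LSTM(emb_dim, return_sequences=True)(u_in)
    lstm_i = layers.LSTM(emb_dim, return_sequences=True)(i_in)
    att_u = layers.Attention()([lstm_u, lstm_u])    # user-side attention
    att_i = layers.Attention()([lstm_i, lstm_i])    # item-side attention
    att_ui = layers.Attention()([lstm_u, lstm_i])   # user-item co-attention
    merged = layers.Concatenate()(
        [layers.Flatten()(t) for t in (att_u, att_i, att_ui)])
    x = merged
    for _ in range(n_dense):                        # dense stack after co-attention
        x = layers.Dense(n_neurons, activation=activation)(x)
    out = layers.Dense(1, activation="sigmoid")(x)  # normalized rating in [0, 1]
    model = Model([u_in, i_in], out)
    model.compile(optimizer=optimizer, loss="mae")  # MAE loss, as in Section 3.3
    return model
```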

4. Experiments and Discussion

We conduct our experiments on MovieLens 1M, which is commonly used for benchmarking recommendation frameworks. The MovieLens dataset [25] consists of real, timestamped 5-star ratings given by users of the MovieLens platform to various films. The selected dataset contains 1 million ratings from about 6000 users on about 4000 movies. Table 2 shows the dataset description. We divide the dataset into 75% training data and 25% testing data in a stratified way.
We evaluate MIGAN using Mean Average Precision and Normalized Discounted Cumulative Gain. The mean average precision (MAP) measures the accuracy of information retrieval. The results of recommender systems are frequently pruned to return the Top-k components. The value of k varies with the application: a system might display the top three trending items or the best ten items that match the current user's preferences. The Normalized Discounted Cumulative Gain (NDCG) calculates an item's normalized usefulness depending on its position in the final list. It is used to assess the ranking quality of the Top-k recommended items. Interested readers can refer to [16] for more details.
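For reference, standard per-user implementations of these two metrics, assuming binary relevance (averaging over users yields MAP@k and the mean NDCG@k):

```python
import numpy as np

def ap_at_k(recommended, relevant, k=10):
    """Average precision at k for one user; relevant is a set of item ids."""
    hits, score = 0, 0.0
    for rank, item in enumerate(recommended[:k], start=1):
        if item in relevant:
            hits += 1
            score += hits / rank
    return score / min(len(relevant), k) if relevant else 0.0

def ndcg_at_k(recommended, relevant, k=10):
    """Normalized discounted cumulative gain at k for one user."""
    dcg = sum(1.0 / np.log2(r + 1)
              for r, item in enumerate(recommended[:k], start=1)
              if item in relevant)
    idcg = sum(1.0 / np.log2(r + 1) for r in range(1, min(len(relevant), k) + 1))
    return dcg / idcg if idcg > 0 else 0.0
```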

4.1. Hyperparameter Analysis

Here, we report the hyperparameter analysis phase, which is performed separately for each MIGAN recommender variant. Evaluation is done with $MAP@k$ and $NDCG@k$.
The main idea of our model is to pass the results of the BGNN through a neural network and then apply the attention model to it. Consequently, we perform the analysis on six variants (numbered 1–6), focusing on the embedding output size and the type of recurrent network. We consider three embedding output sizes (50, 75, 100) and two kinds of recurrent networks (LSTM, GRU), which gives us six variants. The architecture of each variant is as follows:
  • variant 1: BGNN output size = 75 × 1 | Neural Network = LSTM.
  • variant 2: BGNN output size = 100 × 1 | Neural Network = LSTM.
  • variant 3: BGNN output size = 50 × 1 | Neural Network = LSTM.
  • variant 4: BGNN output size = 75 × 1 | Neural Network = GRU.
  • variant 5: BGNN output size = 100 × 1 | Neural Network = GRU.
  • variant 6: BGNN output size = 50 × 1 | Neural Network = GRU.
As shown in Table 3, Variant 1 scores better than the other variants. Thus, it is picked for further tweaking.
We generate the utility matrix based on the learned embeddings. Figure 3 illustrates the hyperparameter analysis.
We explore a range of values for each hyperparameter as reported below:
  • Dimension of the embedding $\alpha \in [30, 100]$;
  • Number of dense layers after the co-attention $\theta \in [2, 20]$;
  • Number of neurons per dense layer $\tau \in [30, 150]$;
  • Activation function used in the dense layers $\sigma \in \{selu, elu, relu\}$;
  • Optimizer $\lambda \in \{sgd, adam, adagrad\}$.
Results illustrate that we achieve the best performance with the following settings: $\alpha = 50$, $\theta = 3$, $\tau = 100$, $\sigma = elu$, $\lambda = adam$.
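A plain exhaustive search over these ranges could be sketched as follows (the builder and evaluation callbacks are hypothetical placeholders, e.g., the build_migan helper sketched after Algorithm 3 and a MAP@10 evaluator):

```python
import itertools

def grid_search(build_fn, train_inputs, train_targets, evaluate_fn, val_data):
    """Exhaustive search over the hyperparameter ranges listed above."""
    grid = {
        "emb_dim":    [30, 50, 75, 100],           # alpha
        "n_dense":    [2, 3, 5, 10, 20],           # theta
        "n_neurons":  [30, 50, 100, 150],          # tau
        "activation": ["selu", "elu", "relu"],     # sigma
        "optimizer":  ["sgd", "adam", "adagrad"],  # lambda
    }
    best_score, best_cfg = -1.0, None
    for values in itertools.product(*grid.values()):
        cfg = dict(zip(grid.keys(), values))
        model = build_fn(**cfg)
        model.fit(train_inputs, train_targets, epochs=5, verbose=0)
        score = evaluate_fn(model, val_data)       # e.g., MAP@10 on validation
        if score > best_score:
            best_score, best_cfg = score, cfg
    return best_cfg, best_score
```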

4.2. Performance Comparison with the Baselines

In this subsection, MIGAN's final benchmark results are compared to the outcomes of several baseline recommender systems. We execute the baselines in the same evaluation environment as MIGAN to guarantee that the comparisons are fair. Furthermore, we deploy the NeuRec library. It is released on GitHub as open-source software under an MIT license, and it implements 33 neural recommender systems [26].
We compare the proposed recommender framework with the following baselines:
  • The stacked content-based filtering recommender: the work in [16] developed a content-based recommender system based on stacking ensemble learning.
  • Neural collaborative filtering (NCF) [27]: this work developed a recommender framework that uses the multi-layer perceptron to exploit the user-item interaction.
  • Variational Autoencoders for Collaborative Filtering (MultiVAE) [15]: This approach investigates the collaborative information in a multinomial distribution to recommend items on the long tail.
  • Node2Vec embedding: We propose a variant of the MIGAN architecture, which deploys the Node2Vec embedding representation instead of BGNN.
Table 4 reports the performance of the various approaches under investigation using the MovieLens dataset. According to mean average precision, MIGAN outperforms the baselines. Indeed, it models the higher-order feature interactions. Figure 4 shows the MAP@k evaluation versus the k top items. As the MAP measure indicates the fraction of relevant articles in the top k suggestions averaged over all users, MIGAN creates a tailored recommendation. To put it differently, MIGAN outperforms the other models in recalling relevant items for the user. It retains the user-item interaction and generates a user-specific task recommendation. Furthermore, both the BGNN-based recommender and the Node2Vec-based recommender are quite competitive. For example, MIGAN and Node2Vec achieve MAP@10 = 0.85 and MAP@10 = 0.84, respectively, outperforming the other baselines. The training loss equals 0.23 with dimension d = 70 for BGNN and 0.32 with d = 50 for Node2Vec. Node2Vec is a random walk-based node embedding method producing high memory consumption for large graphs. In contrast, the cascaded training used in BGNN does not involve loading the entire graph into memory. Consequently, it reduces the memory cost and training time. The MultiVAE recommender scores poorly. Indeed, it does not use a sufficiently rich representation of the data semantics. The neural collaborative filtering approach (Neural CF) shows a good score with k = 10 (MAP@10 = 0.74) due to its appropriate representation of the interaction between users and items.
Figure 5 shows that MIGAN exhibits a high NDCG score on MovieLens. Its Top-k recommendation list is quite similar to the ground-truth list. MIGAN boosts the modeling of the user-item interaction. Indeed, the more interested users are in an item, the more likely it is to be recommended to users with similar preferences. Note that Node2Vec also presents good NDCG scores. The graph recommender effectively captures the user's overall interest built from the neighborhood representation.

5. Conclusions

We propose and investigate a graph recommender where each user's recommended content is accurate and personalized. MIGAN is a collaborative filtering (CF) system. A neural graph represents the items and users. It computes a representation of nodes by combining their neighborhoods' vectors according to their mutual-influence interaction, utilizing a co-attention mechanism with trainable attention weights. The attention weights are adjustable parameters computed by aggregating neighbor vectors. This neural graph architecture predicts the ratings assigned by the users to the items.
We perform a comparative evaluation of several configurations of the RS using the well-known MovieLens dataset. We use two metrics to quantify its accuracy: the mean average precision (MAP) and the normalized discounted cumulative gain (NDCG). The first concerns the accuracy of the recommendation; the second concerns the ranking of the recommended items. Comparing MIGAN's performance with the baselines shows that it outperforms all its alternatives in MAP and NDCG scores. However, it would be very interesting to focus on a specific domain for recommendation tasks, such as taking the knowledge graph to distill attribute-based collaborative signals, and to compare MIGAN's performance with knowledge graph attention network models. Thus, future work will investigate this framework for collaborative knowledge graphs and other recommendation tasks involving contextual and semantic data.

Author Contributions

A.D. carried out the study, designed the methodology, conceived the framework, and drafted and finalized the manuscript. H.C. supervised the study, participated in the analysis of the results, and reviewed and edited the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Data Availability Statement

The dataset is publicly available at https://dl.acm.org/doi/10.1145/2827872 (accessed on 1 March 2022).

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
RS         Recommender Systems
MIGAN      Mutual-Interaction Graph Attention Network
BGNN       Bipartite Graph Neural Networks
IDMP       Inter-Domain Message Passing
IDA        Intra-Domain Alignment
LSTM       Long Short-Term Memory
GRU        Gated Recurrent Unit
MAP        Mean Average Precision
NDCG       Normalized Discounted Cumulative Gain
Neural-CF  Neural Collaborative Filtering
GloVe-CBF  GloVe content-based filtering recommender
MultiVAE   Variational autoencoders recommender

References

  1. Kulkarni, S.; Rodd, S.F. Context Aware Recommendation Systems: A review of the state of the art techniques. Comput. Sci. Rev. 2020, 37, 100255.
  2. Fayyaz, Z.; Ebrahimian, M.; Nawara, D.; Ibrahim, A.; Kashef, R. Recommendation Systems: Algorithms, Challenges, Metrics, and Business Opportunities. Appl. Sci. 2020, 10, 7748.
  3. Chen, R.; Hua, Q.; Chang, Y.S.; Wang, B.; Zhang, L.; Kong, X. A Survey of Collaborative Filtering-Based Recommender Systems: From Traditional Methods to Hybrid Methods Based on Social Networks. IEEE Access 2018, 6, 64301–64320.
  4. Berkani, L.; Belkacem, S.; Ouafi, M.; Guessoum, A. Recommendation of users in social networks: A semantic and social based classification approach. Expert Syst. 2021, 38, e12634.
  5. Drif, A.; Zerrad, H.E.; Cherifi, H. Context-Awareness in Ensemble Recommender System Framework. In Proceedings of the 2021 International Conference on Electrical, Communication, and Computer Engineering (ICECCE), Kuala Lumpur, Malaysia, 12–13 June 2021; pp. 1–6.
  6. Koren, Y. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008; pp. 426–434.
  7. Drif, A.; Guembour, S.; Cherifi, H. A Sentiment Enhanced Deep Collaborative Filtering Recommender System. In Proceedings of the International Conference on Complex Networks and Their Applications, Madrid, Spain, 1–3 December 2020; pp. 66–78.
  8. Kipf, T.N.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907.
  9. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Lio, P.; Bengio, Y. Graph attention networks. arXiv 2017, arXiv:1710.10903.
  10. Wang, X.; He, X.; Cao, Y.; Liu, M.; Chua, T.S. KGAT: Knowledge graph attention network for recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 950–958.
  11. Song, J.; Chang, C.; Sun, F.; Song, X.; Jiang, P. NGAT4Rec: Neighbor-Aware Graph Attention Network For Recommendation. arXiv 2020, arXiv:2010.12256.
  12. Goyal, P.; Ferrara, E. Graph embedding techniques, applications, and performance: A survey. Knowl.-Based Syst. 2018, 151, 78–94.
  13. Cai, H.; Zheng, V.W.; Chang, K.C.C. A Comprehensive Survey of Graph Embedding: Problems, Techniques, and Applications. IEEE Trans. Knowl. Data Eng. 2018, 30, 1616–1637.
  14. Grover, A.; Leskovec, J. node2vec: Scalable feature learning for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 855–864.
  15. Liang, D.; Krishnan, R.G.; Hoffman, M.D.; Jebara, T. Variational autoencoders for collaborative filtering. In Proceedings of the 2018 World Wide Web Conference, Lyon, France, 23–27 April 2018; pp. 689–698.
  16. Drif, A.; Zerrad, H.E.; Cherifi, H. EnsVAE: Ensemble Variational Autoencoders for Recommendations. IEEE Access 2020, 8, 188335–188351.
  17. Lo, W.W.; Layeghy, S.; Sarhan, M.; Gallagher, M.; Portmann, M. E-GraphSAGE: A Graph Neural Network based Intrusion Detection System. arXiv 2021, arXiv:2103.16329.
  18. Wang, Z.; Wang, Z.; Li, X.; Yu, Z.; Guo, B.; Chen, L.; Zhou, X. Exploring Multi-dimension User-Item Interactions with Attentional Knowledge Graph Neural Networks for Recommendation. IEEE Trans. Big Data 2022, 155–170.
  19. Chen, L.; Cao, J.; Wang, Y.; Liang, W.; Zhu, G. Multi-view graph attention network for travel recommendation. Expert Syst. Appl. 2022, 191, 116234.
  20. Liu, Y.; Yang, S.; Xu, Y.; Miao, C.; Wu, M.; Zhang, J. Contextualized graph attention network for recommendation with item knowledge graph. IEEE Trans. Knowl. Data Eng. 2021, arXiv:2004.11529.
  21. He, C.; Xie, T.; Rong, Y.; Huang, W.; Huang, J.; Ren, X.; Shahabi, C. Cascade-BGNN: Toward Efficient Self-supervised Representation Learning on Large-scale Bipartite Graphs. arXiv 2019, arXiv:1906.11994.
  22. Agarap, A.F. Deep learning using rectified linear units (ReLU). arXiv 2018, arXiv:1803.08375.
  23. Xu, K.; Ba, J.; Kiros, R.; Cho, K.; Courville, A.; Salakhudinov, R.; Zemel, R.; Bengio, Y. Show, attend and tell: Neural image caption generation with visual attention. In Proceedings of the International Conference on Machine Learning, Lille, France, 6–11 July 2015; pp. 2048–2057.
  24. Bahdanau, D.; Cho, K.; Bengio, Y. Neural machine translation by jointly learning to align and translate. arXiv 2014, arXiv:1409.0473.
  25. Harper, F.M.; Konstan, J.A. The MovieLens datasets: History and context. ACM Trans. Interact. Intell. Syst. (TIIS) 2015, 5, 1–19.
  26. Wu, B.; Sun, Z.; He, X.; Wang, X.; Staniforth, J. NeuRec: Next RecSys Library; National Natural Science Foundation: Beijing, China, 2019.
  27. He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.S. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 173–182.
Figure 1. MIGAN Architecture.
Figure 2. The interactive attention network recommender. In this example, user 1 and user 2 rate three similar items. They have strong interactive attention. User 2 and user 3 rate an item not seen by user 1. Therefore, one can deduce that this new item can also attract user 1. It is a first-order interaction. Moreover, one can deduce a mutual influence based on entities' dependencies beyond the first-order interaction level. For example, user 1 influences user 4 (a similar user of user 3), generating a recommendation based on user 3's preferences.
Figure 3. Hyperparameter search for the MIGAN recommendation system.
Figure 4. Performance results of Top-K recommended lists, according to MAP. The ranking position K ranges from 1 to 50.
Figure 5. Performance results of Top-K recommended lists, according to NDCG. The ranking position K ranges from 1 to 50.
Table 1. Notation and descriptions.

Symbol            Definition and Description
$r_{ui}$          User u's rating for item i
$p_u$             The user u's embedding
$q_i$             The item i's embedding
$g$               Long Short-Term Memory function
$x_u$             The user embedding layer followed by an LSTM layer
$x_i$             The item embedding layer followed by an LSTM layer
$h$               The multi-layer perceptron application
$lstm_u$          The user LSTM layer followed by an MLP
$lstm_i$          The item LSTM layer followed by an MLP
$\alpha_u^*$      Attention network function for user u
$\alpha_i^*$      Attention network function for item i
$\alpha_u$        The last attention weights for user u
$\alpha_i$        The last attention weights for item i
$C_{ui}$          User-item space
$C_t$             Text space
$\oplus$          The concatenation operator
$\hat{r}_{ui}$    User u's expected rating for item i
$W, b$            The weights and biases in the neural network
$U, I$            Nodes of the bipartite graph
$X_u, X_i$        Lists of features
$B_u, B_i$        Incidence matrices
Table 2. MovieLens 1M description.

# Users      6040
# Movies     3883
# Ratings    1,000,209
Sparsity     95.5%
Item         Genomic tags
User         Demographics
Table 3. The best scoring MIGAN variant.

Variant      MAP@10  MAP@30  MAP@50  NDCG@10  NDCG@30  NDCG@50
Variant 1    0.85    0.83    0.81    0.71     0.78     0.79
Variant 2    0.82    0.78    0.76    0.65     0.72     0.76
Variant 3    0.80    0.77    0.76    0.66     0.73     0.76
Variant 4    0.79    0.74    0.773   0.62     0.71     0.73
Variant 5    0.79    0.73    0.70    0.60     0.71     0.72
Variant 6    0.77    0.76    0.73    0.63     0.73     0.75
Table 4. Recommendation performance (%) of the compared approaches on the MovieLens 1M dataset. We generate the Top 10, 30, and 50 items for each user. The best MAP@k and NDCG@k scores are highlighted in bold.

Rec sys      MAP@10  MAP@30  MAP@50  NDCG@10  NDCG@30  NDCG@50
Glove-Cbf    0.82    0.78    0.77    0.67     0.74     0.77
Node2Vec     0.84    0.82    0.81    0.55     0.65     0.69
MultiVAE     0.62    0.58    0.54    0.57     0.62     0.65
Neural CF    0.74    0.68    0.65    0.68     0.73     0.76
MIGAN        0.85    0.83    0.81    0.71     0.78     0.79
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
