1. Introduction
Personalized recommendation aims to provide users with suitable products according to their preferences, and accurately capturing user preferences from user behaviors is its core issue. Traditional recommendation models [1] usually rely on a single type of behavior, which makes them insufficient for extracting the complex collaborative signals contained in users' multi-type behaviors [2]. They also suffer from serious data sparsity [3,4] and cold-start problems [5,6], especially for high-cost, low-frequency behaviors. In the real world, users usually exhibit several types of interactive behaviors, and a major challenge in achieving more accurate recommendations is whether such heterogeneous behavior data can be processed at a finer granularity. Multi-behavior recommendation models jointly consider the semantics of different behavior types, which greatly helps predict the probability that a user will adopt the target behavior [7]. For example, on an e-commerce platform, users' page views, add-to-cart actions, and favorites on different items can serve as auxiliary information for predicting their purchase intent (the target behavior). Therefore, modeling the complex dependencies among multiple behaviors is crucial for accurately predicting user preferences.
To make full use of dynamic interaction information and better predict user preferences, several multi-behavior recommendation models have emerged in recent years [2,8]. LightGCN [9] learns user/commodity embeddings by propagating them linearly over the user–item interaction graph and uses the weighted sum of the embeddings learned at each layer as the final embedding. This simple, linear model is easy to implement and train, but it does not account for the differences between behavior types. To distinguish the semantics of different behavior types, KHGT [10] assigns different learnable weights to different edges of the user–goods heterogeneous graph, explicitly identifying which types of user–goods interactions are more important for predicting the target behavior. Graph neural network-based recommendation has since been deployed in many real-world scenarios. The NetEase Cloud Music app adopts a graph model architecture that treats different types of songs as nodes and constructs a graph relationship network from the multi-type behavior relationships between users and songs. JD Mall (Jingdong) likewise adopts a graph neural network-based model developed by the JD platform, and its more accurate recommendations have brought substantial benefits to the platform.
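To make the LightGCN propagation described above concrete, the following is a minimal pure-Python sketch, not code from any of the cited works. It assumes the symmetric normalization $1/\sqrt{|\mathcal{N}_u||\mathcal{N}_i|}$ and the uniform layer weights $1/(L+1)$ used in the original LightGCN paper; the function name `lightgcn` and the string node identifiers are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vec = List[float]


def lightgcn(edges: List[Tuple[str, str]], emb0: Dict[str, Vec], num_layers: int) -> Dict[str, Vec]:
    """LightGCN-style propagation: no feature transformation, no nonlinearity."""
    # Users and items share one namespace; each interaction is an undirected edge.
    nbrs: Dict[str, List[str]] = {n: [] for n in emb0}
    for u, i in edges:
        nbrs[u].append(i)
        nbrs[i].append(u)
    dim = len(next(iter(emb0.values())))
    layers = [emb0]
    for _ in range(num_layers):
        prev = layers[-1]
        nxt: Dict[str, Vec] = {}
        for n in emb0:
            acc = [0.0] * dim
            for m in nbrs[n]:
                # Symmetric degree normalization 1 / sqrt(|N_n| * |N_m|).
                w = 1.0 / math.sqrt(len(nbrs[n]) * len(nbrs[m]))
                for d in range(dim):
                    acc[d] += w * prev[m][d]
            nxt[n] = acc
        layers.append(nxt)
    # Final embedding: weighted sum of all layers with uniform weights 1 / (L + 1).
    alpha = 1.0 / (num_layers + 1)
    return {n: [alpha * sum(layer[n][d] for layer in layers) for d in range(dim)] for n in emb0}


print(lightgcn([("u0", "i0")], {"u0": [1.0, 0.0], "i0": [0.0, 1.0]}, 1)["u0"])  # [0.5, 0.5]
```

Because there are no weight matrices, the only learnable quantities in such a model are the layer-0 embeddings, which is what makes it easy to train.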
Despite the success of these approaches in multi-behavior recommendation tasks, there are some limitations:
- (1)
Different types of behaviors characterize user preferences from different dimensions and complement each other, enabling better learning of user preferences. User/commodity embedding is at the core of recommender systems, yet most current user/commodity embeddings are fusions of static features and lack an explicit encoding of the collaborative signal hidden in user–commodity interactions. Capturing behavioral diversity and the latent dependencies between behaviors is therefore both challenging and valuable. To address this challenge, existing work models behavioral dependencies by generating behavior-type-specific embeddings through different aggregation approaches to enhance the user/goods representations. For example, MATN [11] uses self-attention to encode pairwise correlations between different types of behaviors and then predicts the target behavior.
- (2)
Traditional multi-behavior recommendation models are built on sequential models, which tend to focus on the local perspective of users' sequential behaviors. In contrast, graph-based multi-behavior recommendation models take a global perspective over all user behaviors. In a heterogeneous graph constructed from multiple types of behavioral data, users and products are represented as nodes, and different types of behaviors are represented as edges. Owing to their powerful learning capabilities, graph neural networks are also used to explore the higher-order complexity of such heterogeneous behavior graphs. For example, the graph-based recommendation model NGCF [12] models higher-order connectivity in user–commodity interaction graphs by explicitly injecting collaborative signals into the embedding process of users and goods, so that user–commodity correlations are well represented in the embedding space.
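The NGCF-style injection of collaborative signals mentioned in item (2) can be sketched as follows for a single user. The sketch follows the message form reported for NGCF, $\mathbf{m}_{u \leftarrow i} = \frac{1}{\sqrt{|\mathcal{N}_u||\mathcal{N}_i|}}(\mathbf{W}_1\mathbf{e}_i + \mathbf{W}_2(\mathbf{e}_i \odot \mathbf{e}_u))$, with a self-connection message $\mathbf{W}_1\mathbf{e}_u$ and a LeakyReLU activation; the helper names and the LeakyReLU slope of 0.2 are assumptions for illustration.

```python
import math
from typing import List

Vec = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vec) -> Vec:
    return [sum(row[c] * x[c] for c in range(len(x))) for row in w]


def leaky_relu(x: float, slope: float = 0.2) -> float:
    return x if x > 0 else slope * x


def ngcf_user_update(e_u: Vec, item_embs: List[Vec], item_degrees: List[int],
                     w1: Matrix, w2: Matrix, slope: float = 0.2) -> Vec:
    """One NGCF-style embedding update for a user from the items it interacted with."""
    n_u = len(item_embs)
    total = matvec(w1, e_u)  # self-connection message W1 e_u
    for e_i, deg_i in zip(item_embs, item_degrees):
        norm = 1.0 / math.sqrt(n_u * deg_i)
        affinity = [a * b for a, b in zip(e_i, e_u)]  # e_i ⊙ e_u carries the collaborative signal
        msg = [norm * (p + q) for p, q in zip(matvec(w1, e_i), matvec(w2, affinity))]
        total = [t + m for t, m in zip(total, msg)]
    return [leaky_relu(x, slope) for x in total]
```

The element-wise term is what distinguishes this from plain neighbor averaging: items similar to the user contribute larger messages.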
In summary, this paper proposes a multi-behavior recommendation method based on a graph information transfer network. From the heterogeneous graph composed of users and commodities, the method first obtains the first-order neighborhood information of users/goods under each specific behavior type, and the graph information transfer network ensures that each type of interaction behavior retains its own semantic information. The method then learns higher-order neighborhood information in the graph for the user/product representations. In the target behavior prediction stage, the learned behavior-type-specific representations not only provide useful external knowledge but also serve as supervisory signals for model optimization.
2. Related Work
Most previous recommendation models [13,14,15,16] were designed for a single type of behavior, and in most cases the behavior most directly related to platform profits, such as purchases on e-commerce platforms, was selected for modeling. In practice, however, user behavior is inherently multi-typed (e.g., browsing, favoriting, and purchasing). Different types of user behavior may carry different semantic information that characterizes diverse user–goods interactions, so methods that encode only a single type of user–commodity interaction are insufficient for comprehensively learning complex user preferences. Moreover, relying on a single behavior may lead to severe cold-start or data sparsity problems. For example, on an e-commerce website, a recommendation model built on purchase behavior alone can hardly learn comprehensive preferences for users without purchase histories, since only users who have purchased before can be recommended to appropriately.
Although the importance of leveraging different types of user behavior simultaneously is recognized, encoding multiple types of behavioral patterns poses a significant challenge. These interaction behaviors may interrelate in complex ways, providing complementary information for learning user interests. In addition, although several multi-behavior user modeling techniques [8,11] have emerged for recommendation in recent years, they fail to capture higher-order information in the different user–goods relationships. Inspired by this, applying graph neural networks to recommendation [17,18] makes it possible to consider higher-order relationships between user–goods interactions in the embedding space.
Recently, graph neural networks have achieved promising results in learning dependencies from graph-structured data [17]. Typically, the core of a graph neural network is to aggregate the feature information of neighboring nodes on the graph through a message propagation mechanism [18]. By repeatedly propagating information between nodes, this mechanism aggregates the information of higher-order neighbors, which further captures higher-order interrelationships and enables effective representation learning. In other words, graph neural networks serve as interpretable models that are well suited to relationship inference problems. The most representative of these is the Graph Convolutional Network (GCN), which obtains the representation of the current node by combining the weighted features of its neighboring nodes, with weights determined by the node degrees. Inspired by the effectiveness of graph convolutional networks, recent studies such as PTGCN [19] and GraphSAGE [20] utilize graph convolutional networks to explore the user–item interaction graph and aggregate the embeddings of neighboring nodes. These works propagate information among nodes to mine the relationships between users and items. Graph convolutional networks subsequently became a popular research direction, and researchers have conducted a great deal of work on heterogeneous graphs. BiHGH [21] is a bidirectional heterogeneous graph hashing method: it first initializes the heterogeneous graph nodes, then applies an Ambigram convolution algorithm to transfer information sequentially, and finally combines a Bayesian personalized ranking (BPR) loss with dual similarity-preserving regularization to learn user preferences. PFCM [22] constructs a heterogeneous graph that unifies users, items, and attributes and designs a user embedding module based on multimodal content representations to learn user representations; heterogeneous graph learning is then performed under meta-path guidance.
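As a concrete reference for the degree-weighted neighbor aggregation attributed to GCN above, the following pure-Python sketch implements one layer of the widely used Kipf–Welling propagation rule $\mathbf{H}' = \mathrm{ReLU}(\tilde{\mathbf{D}}^{-1/2}\tilde{\mathbf{A}}\tilde{\mathbf{D}}^{-1/2}\mathbf{H}\mathbf{W})$ with $\tilde{\mathbf{A}} = \mathbf{A} + \mathbf{I}$ on an undirected graph. It illustrates the general technique rather than any specific cited model, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]


def matmul(a: Matrix, b: Matrix) -> Matrix:
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj: Matrix, h: Matrix, w: Matrix) -> Matrix:
    """One GCN layer: symmetric degree normalization, linear transform, ReLU."""
    n = len(adj)
    # Add self-loops so each node also keeps its own features: A~ = A + I.
    a_tilde = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)] for i in range(n)]
    deg = [sum(row) for row in a_tilde]
    # D~^{-1/2} A~ D~^{-1/2}: each edge weight is divided by sqrt(deg_i * deg_j).
    a_hat = [[a_tilde[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]
    z = matmul(matmul(a_hat, h), w)
    return [[max(0.0, x) for x in row] for row in z]


print(gcn_layer([[0.0, 1.0], [1.0, 0.0]], [[1.0], [3.0]], [[1.0]]))  # [[2.0], [2.0]]
```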
3. Methodology
3.1. Problem Statement
Let $U = \{u_1, \ldots, u_N\}$ and $V = \{v_1, \ldots, v_M\}$ denote the sets of users and goods, respectively, where $N$ and $M$ denote the numbers of users and goods. To account for multiple types of interactions (e.g., clicks, favorites, and add-to-cart actions), this paper defines a three-dimensional tensor $\mathcal{Y} \in \{0, 1\}^{N \times M \times K}$, where $K$ denotes the number of interaction behavior types. An element $y_{ij}^{k} = 1$ indicates that user $u_i$ has interacted with product $v_j$ under the $k$-th behavior type; otherwise, $y_{ij}^{k} = 0$. In a multi-behavior recommendation scenario, the interaction type most closely associated with platform benefits is treated as the target behavior (e.g., purchase). The other behaviors are treated as context behaviors (e.g., click, favorite, and add to cart) and provide knowledge that aids prediction of the target behavior. Based on these definitions, the problem studied in this paper is defined as follows:
Input: the multi-behavior interaction tensor $\mathcal{Y}$ between the user set $U$ and the item set $V$ under $K$ interaction behavior types.
Output: a prediction function that estimates the likelihood $\hat{y}_{ij}^{k}$ that user $u_i$ will interact with good $v_j$ under the target behavior $k$.
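A minimal sketch of how the interaction tensor could be built from raw behavior logs, assuming logs arrive as (user index, item index, behavior name) triples and that purchase (`"buy"`) is the target behavior; the behavior names and function names are hypothetical. A dense nested list is used only for readability; at the scale of real datasets a sparse per-behavior edge list would be used instead.

```python
from typing import Iterable, List, Tuple

Tensor3 = List[List[List[int]]]


def build_interaction_tensor(logs: Iterable[Tuple[int, int, str]], num_users: int,
                             num_items: int, behaviors: List[str]) -> Tensor3:
    """y[i][j][k] = 1 iff user i interacted with item j under behavior k."""
    k_index = {b: k for k, b in enumerate(behaviors)}
    y = [[[0] * len(behaviors) for _ in range(num_items)] for _ in range(num_users)]
    for i, j, b in logs:
        y[i][j][k_index[b]] = 1  # repeated logs of the same behavior stay binary
    return y


def split_behaviors(behaviors: List[str], target: str) -> Tuple[str, List[str]]:
    """Separate the target behavior from the context behaviors."""
    return target, [b for b in behaviors if b != target]


behaviors = ["pv", "cart", "fav", "buy"]
logs = [(0, 1, "pv"), (0, 1, "buy"), (1, 0, "cart"), (0, 1, "pv")]
y = build_interaction_tensor(logs, num_users=2, num_items=2, behaviors=behaviors)
target, context = split_behaviors(behaviors, "buy")
print(y[0][1], context)  # [1, 0, 0, 1] ['pv', 'cart', 'fav']
```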
3.2. Model Architecture
In realistic scenarios, users' behaviors are often complex and diverse. The model therefore first uses a meta-knowledge learner to encode behavioral embeddings that take users' personalized feature attributes into account. On this basis, graph convolution is combined with an attention mechanism to capture multiple behavioral patterns with high-order connectivity on the user–goods interaction graph. Finally, complex cross-type behavioral dependencies are captured by a prediction layer. The multiple types of user behavior are used not only to tune the parameters of the graph neural network but also to guide the prediction phase by injecting supervisory signals. The model architecture is shown in Figure 1.
3.3. Embedding Module Incorporating First-Order Neighborhood Information
In realistic scenarios, the behavioral habits of different users differ greatly. For example, User A habitually favorites most of the products he browses, whereas User B favorites only the products he is most interested in. The favorite behavior therefore has little reference value for User A but is highly informative about User B's preferences. Accordingly, this module is designed to capture the first-order neighborhood information of entities in the interaction graph under each behavior type and to inject the corresponding weights into the initial embeddings of goods and users, generating feature representations that incorporate first-order neighborhood information. In the bipartite graph composed of user entities and commodity entities, this module learns representations of commodity and user entities under each behavior type by aggregating the initialized ID embeddings $\mathbf{e}_i$ and $\mathbf{e}_j$ of user $u_i$ and commodity $v_j$ with their first-order neighborhood information, yielding fused contextual feature vectors.
Given the initialized ID embedding $\mathbf{e}_i$ of user $u_i$, the following formula is used to learn a personalized, behavior-specific embedding.
In this formula, $\mathcal{N}_i^{k}$ denotes the set of goods that user $u_i$ interacts with under behavior type $k$, $\mathcal{N}_j^{k}$ denotes the set of users that interact with good $v_j$ under behavior type $k$, and $\|$ denotes vector concatenation. The formula also involves a normalization factor; the interaction pattern of user $u_i$ under the specific behavior type $k$; a learned parameter matrix of user $u_i$; a learned personalized parameter matrix of user $u_i$, which injects the behavior-specific context into the representation of $u_i$; and two transformation parameters. Its output is the personalized representation of user $u_i$ that incorporates these contexts.
Given the initialized ID embedding $\mathbf{e}_j$ of good $v_j$, the personalized, context-fused representation of good $v_j$ is obtained with the same learning method as above. The specific formula is as follows:
In this formula, the aggregated neighborhood term holds information about the users who interact with commodity $v_j$ under a specific behavior type; a learned personalized parameter matrix of commodity $v_j$ injects the behavior-specific context into the representation of $v_j$; two transformation parameters are also learned; and the output is the personalized representation of commodity $v_j$ incorporating these contexts.
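Because the module's formulas are not reproduced above, the following is only a hypothetical sketch of the general idea for one user and one behavior type: the user's ID embedding is concatenated (the $\|$ operation) with a normalized aggregate of its first-order neighbors under behavior $k$, and the result is transformed by a parameter matrix. Mean aggregation (normalization factor $1/|\mathcal{N}_i^{k}|$) and a single linear transformation are assumptions, not the paper's exact equations; the same function applies to goods with the roles of users and goods swapped.

```python
from typing import Dict, List

Vec = List[float]
Matrix = List[List[float]]


def matvec(w: Matrix, x: Vec) -> Vec:
    return [sum(row[c] * x[c] for c in range(len(x))) for row in w]


def first_order_context(own: Vec, neighbor_embs: List[Vec], w: Matrix) -> Vec:
    """Fuse an entity's ID embedding with its first-order neighborhood (hypothetical form)."""
    dim = len(own)
    if neighbor_embs:
        # Assumed normalization factor 1 / |N_i^k|, i.e. mean aggregation.
        agg = [sum(e[d] for e in neighbor_embs) / len(neighbor_embs) for d in range(dim)]
    else:
        agg = [0.0] * dim  # no interactions of this behavior type
    z = own + agg  # list concatenation plays the role of the || operation
    return matvec(w, z)


def per_behavior_contexts(user: int, user_emb: Dict[int, Vec], item_emb: Dict[int, Vec],
                          nbrs_by_behavior: Dict[str, Dict[int, List[int]]],
                          w_by_behavior: Dict[str, Matrix]) -> Dict[str, Vec]:
    """One context-fused vector per behavior type k for the given user."""
    return {
        k: first_order_context(user_emb[user], [item_emb[j] for j in nbrs.get(user, [])],
                               w_by_behavior[k])
        for k, nbrs in nbrs_by_behavior.items()
    }
```

A user with many favorites and one with very few thus receive different favorite-specific vectors, which is the kind of per-user difference the User A / User B example motivates.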
3.4. Representation of Users and Products Based on Single Behavior
In a multi-behavior recommendation scenario, each interaction type has its own features and semantics. For example, on an e-commerce platform, browsing is more likely to occur than purchasing, and add-to-cart and purchase behaviors are likely to co-occur. The proposed module therefore aims to capture personalized behavioral semantic signals. Based on the representation of each user $u_i$ and of each good $v_j$ learned by the embedding module, this module designs a message-passing strategy over the user–goods interaction graph $\mathcal{G}^{k} = (U \cup V, \mathcal{E}^{k})$ under a single behavior, where $U \cup V$ denotes the set of user and goods nodes, $\mathcal{E}^{k}$ denotes the set of interaction edges in $\mathcal{G}^{k}$, and all interactions in $\mathcal{G}^{k}$ are of type $k$. The goal of this module is to learn behavior-specific embedding vectors. The specific formula is as follows:
where $\mathbf{e}_i^{k}$ and $\mathbf{e}_j^{k}$ are the embeddings of user $u_i$ and item $v_j$ under behavior type $k$, and the normalization factor is defined in terms of $\mathcal{N}_i^{k}$, the set of goods that user $u_i$ interacts with under behavior type $k$, and $\mathcal{N}_j^{k}$, the set of users interacting with item $v_j$ under behavior type $k$.
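The behavior-specific message passing on $\mathcal{G}^{k}$ can be illustrated with the sketch below, which performs one propagation step restricted to the edges of a single behavior type. The symmetric normalization $1/\sqrt{|\mathcal{N}_i^{k}||\mathcal{N}_j^{k}|}$ is an assumption (a common GCN choice consistent with the two neighbor sets named above), and the function name is hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, List, Tuple

Vec = List[float]


def single_behavior_step(interactions: List[Tuple[int, int, str]], user_emb: Dict[int, Vec],
                         item_emb: Dict[int, Vec], k: str) -> Tuple[Dict[int, Vec], Dict[int, Vec]]:
    """One message-passing step restricted to the edges of behavior type k."""
    n_i: Dict[int, List[int]] = defaultdict(list)  # user -> goods under behavior k
    n_j: Dict[int, List[int]] = defaultdict(list)  # good -> users under behavior k
    for i, j, b in interactions:
        if b == k:  # keep only the edge set E^k
            n_i[i].append(j)
            n_j[j].append(i)
    dim = len(next(iter(user_emb.values())))

    def aggregate(node: int, own: Dict[int, List[int]], other: Dict[int, List[int]],
                  other_emb: Dict[int, Vec]) -> Vec:
        out = [0.0] * dim
        for m in own.get(node, []):
            eta = 1.0 / math.sqrt(len(own[node]) * len(other[m]))  # assumed normalization
            for d in range(dim):
                out[d] += eta * other_emb[m][d]
        return out

    users_k = {i: aggregate(i, n_i, n_j, item_emb) for i in user_emb}
    items_k = {j: aggregate(j, n_j, n_i, user_emb) for j in item_emb}
    return users_k, items_k
```

Calling the function once per behavior type yields the separate embeddings $\mathbf{e}_i^{k}$ and $\mathbf{e}_j^{k}$, so a page view and a purchase of the same item contribute to different vectors.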
3.5. Representation of Users and Items Integrated with Multiple Behaviors
In e-commerce platforms, different types of interactions are intertwined and related to each other in complex ways, which makes modeling users' multi-behavior interaction patterns challenging. To model the latent relationships between different behavior types, this module designs a multi-behavior relationship learning function, which obtains a more accurate representation of each specific behavior type by injecting information about the interrelationships between behaviors. The relationship learning function is based on an attention network and is defined as follows:
The module projects the embeddings into multiple latent spaces, so that the degree of association between interaction behaviors $k$ and $k'$ is measured from different hidden dimensions. In these equations, one term denotes the global user representation that considers all behavior types; the refined embedding of a particular behavior type is obtained by concatenating the feature representations from the different learning subspaces, which encodes the influence of the other interaction behaviors on that behavior while accounting for the correlations between interaction behaviors; $\alpha_{k,k'}$ is the computed degree of correlation between interaction behaviors $k$ and $k'$; and the transformation matrices project the vectors into the projection space, realizing the dimension transformation required by the attention mechanism.
During training, to alleviate overfitting, the embedding is partitioned into $H$ slices of dimension $d/H$, one per attention head, and the multi-head attention mechanism processes these slices in parallel before concatenating the results; a superscript $h$ denotes the $h$-th slice of a vector.
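The multi-head relation learning over behavior types can be sketched as scaled dot-product self-attention among the $K$ behavior-specific embeddings of one user, with each embedding split into $H$ slices of size $d/H$ and the per-head outputs concatenated. For brevity the learned transformation matrices are replaced by identity projections, which is a simplifying assumption; the function names are hypothetical.

```python
import math
from typing import List

Vec = List[float]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the maximum for numerical stability
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def behavior_attention(behavior_embs: List[Vec], num_heads: int) -> List[Vec]:
    """Multi-head self-attention across the K behavior embeddings of one user.

    Identity query/key/value projections are assumed for brevity.
    """
    dim = len(behavior_embs[0])
    if dim % num_heads:
        raise ValueError("embedding size must be divisible by num_heads")
    hd = dim // num_heads  # slice size d / H
    out: List[Vec] = [[] for _ in behavior_embs]
    for h in range(num_heads):
        sl = [e[h * hd:(h + 1) * hd] for e in behavior_embs]  # h-th slice of every behavior
        for k, q in enumerate(sl):
            scores = [sum(a * b for a, b in zip(q, key)) / math.sqrt(hd) for key in sl]
            alpha = softmax(scores)  # correlation of behavior k with each behavior k'
            mixed = [sum(alpha[kp] * sl[kp][c] for kp in range(len(sl))) for c in range(hd)]
            out[k].extend(mixed)  # concatenate the heads
    return out
```

Each output vector is a correlation-weighted mixture of all behavior embeddings, so, for instance, a user's purchase embedding can absorb information from the add-to-cart embedding when the two are strongly correlated.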
3.6. User and Item Representations Infused with Higher-Order Neighborhood Information
To capture the higher-order complexity of the interaction graph and study higher-order interactions between user behaviors, this module takes the vector representations obtained from the behavioral semantic learning module as input and learns a higher-order embedding propagation paradigm. The higher-order information is injected into the embedding of user $u_i$ by the following equation:
The higher-order feature representation of commodity $v_j$ is computed with the same network as the user representation above, where GCN denotes the graph convolutional network that defines the behavioral semantic learner and Att denotes the inter-behavior relationship learning function. By stacking $L$ such operations, the model learns the connection relations between nodes up to $L$ hops away. To obtain a higher-order representation, the feature vectors produced by the $L$ layers are concatenated to obtain the final user and commodity representations,
where $\mathbf{e}_*$ denotes the final user embedding when $* = i$ and the final product embedding when $* = j$.
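The layer stacking and final concatenation described above can be sketched as follows. Here `layer` is a hypothetical stand-in for one application of Att(GCN(·)) to all node embeddings; whether the initial (layer-0) embeddings are also concatenated is not stated above, so it is exposed as an assumption flag.

```python
from typing import Callable, Dict, List

Vec = List[float]
Layer = Callable[[Dict[str, Vec]], Dict[str, Vec]]


def higher_order_embeddings(emb0: Dict[str, Vec], layer: Layer, num_layers: int,
                            include_initial: bool = False) -> Dict[str, Vec]:
    """Apply `layer` num_layers times and concatenate the per-layer vectors of every node."""
    layers = [emb0]
    for _ in range(num_layers):
        layers.append(layer(layers[-1]))  # one Att(GCN(.)) step reaches one hop further
    kept = layers if include_initial else layers[1:]
    return {n: [x for emb in kept for x in emb[n]] for n in emb0}


def doubling(embs: Dict[str, Vec]) -> Dict[str, Vec]:
    """Toy layer used only to show the shapes; not part of the model."""
    return {n: [2.0 * x for x in v] for n, v in embs.items()}


print(higher_order_embeddings({"u0": [1.0]}, doubling, num_layers=3))  # {'u0': [2.0, 4.0, 8.0]}
```

Concatenation keeps each hop's information in its own block of dimensions, so the final vector has length $L \cdot d$ (or $(L+1) \cdot d$ with the flag set).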
3.7. Target Behavior Prediction
Based on the representations learned above, the contextual behaviors (page view, favorite, add to cart) not only provide useful external knowledge in the target behavior (purchase) prediction phase but also serve as supervisory signals for model optimization. Using the learned feature representations of users and goods under specific behavior types, the prediction network proposed in this model uses the non-target behaviors as supervisory signals to obtain personalized meta-knowledge with respect to the target behavior $k$. This process is defined as follows:
Here, $\odot$ denotes element-wise multiplication of two vectors and $\|$ denotes concatenation. The resulting vector encodes the meta-knowledge between user $u_i$ and commodity $v_j$, that is, the dependency between the target behavior $k$ and the context behavior $k'$, and the accompanying vector is the projection vector under the corresponding behavior.
Based on the above learned dependencies between interaction behaviors, the parameters of the prediction network are learned by the following equation.
Ultimately, the model predicts the interaction between user $u_i$ and commodity $v_j$ under the target behavior $k$, using the feature vectors of the non-target behaviors $k'$ as supervisory signals. The specific formula is as follows:
where $\hat{y}_{ij}^{k}$ is the predicted likelihood that user $u_i$ interacts with good $v_j$ under target behavior $k$, and the remaining vector is the intermediate feature vector.
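Since the prediction equations are not reproduced above, the sketch below only illustrates the kind of computation involved, under explicit assumptions: the meta-knowledge vector for a context behavior $k'$ is taken to be $(\mathbf{e}_i^{k} \odot \mathbf{e}_j^{k}) \,\|\, (\mathbf{e}_i^{k'} \odot \mathbf{e}_j^{k'})$, each such vector is mapped to a score by a per-behavior projection vector, and the scores are averaged and passed through a sigmoid. All of these choices, and the function names, are hypothetical rather than the paper's exact formulation.

```python
import math
from typing import Dict, List, Tuple

Vec = List[float]


def hadamard(a: Vec, b: Vec) -> Vec:
    return [x * y for x, y in zip(a, b)]  # the ⊙ operation


def meta_knowledge(u_t: Vec, v_t: Vec, u_c: Vec, v_c: Vec) -> Vec:
    # Assumed form: target-behavior interaction || context-behavior interaction.
    return hadamard(u_t, v_t) + hadamard(u_c, v_c)


def predict_target(u_t: Vec, v_t: Vec, contexts: Dict[str, Tuple[Vec, Vec]],
                   proj: Dict[str, Vec]) -> float:
    """Likelihood that the user adopts the target behavior on the item (hypothetical form)."""
    logits = []
    for k_ctx, (u_c, v_c) in contexts.items():
        m = meta_knowledge(u_t, v_t, u_c, v_c)
        logits.append(sum(w * x for w, x in zip(proj[k_ctx], m)))  # projection under k'
    mean_logit = sum(logits) / len(logits)
    return 1.0 / (1.0 + math.exp(-mean_logit))
```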
3.8. Optimization Strategy
The model is optimized by making predictions for each pair of non-target and target behaviors. For user $u_i$ and target behavior $k$, the model samples $S$ positive samples and $S$ negative samples. The training objective is defined by the following equation and is optimized with the Adam algorithm [23]:
where $k'$ denotes the non-target behavior, $k$ denotes the target behavior, and the sampled items are the positive and negative samples, respectively.
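The objective itself is not reproduced above. As a hedged illustration, the sketch below computes one common pairwise ranking loss, a margin-based hinge $\sum_s \max(0, 1 - \hat{y}_{\text{pos}} + \hat{y}_{\text{neg}})$ over the sampled positive/negative pairs, plus an optional L2 penalty; whether the paper uses exactly this form (rather than, for example, a BPR loss) is an assumption, and the function name is hypothetical. The resulting scalar is what an optimizer such as Adam would minimize.

```python
from typing import Sequence


def pairwise_hinge_loss(pos_scores: Sequence[float], neg_scores: Sequence[float],
                        params: Sequence[float] = (), margin: float = 1.0,
                        l2: float = 0.0) -> float:
    """Margin-based pairwise loss over matched positive/negative samples (assumed form)."""
    if len(pos_scores) != len(neg_scores):
        raise ValueError("expected one negative sample per positive sample")
    # Each pair is penalized until the positive outscores the negative by `margin`.
    ranking = sum(max(0.0, margin - p + n) for p, n in zip(pos_scores, neg_scores))
    # Optional L2 regularization on a flattened parameter list.
    return ranking + l2 * sum(w * w for w in params)
```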
In multi-behavior pattern modeling, the model learns the personalized semantics of specific behaviors and establishes the dependencies between different types of behaviors, thereby improving recommendation accuracy. The model adopts a lightweight graph convolutional architecture, which costs only $O(L \times K \times d \times |\mathcal{E}|)$ across $L$ layers, $K$ behavior types, $d$ latent factors, and $|\mathcal{E}|$ edges. The behavior relation learning costs an extra $O(L \times K \times d \times (K + d) \times (N + M))$. Since $O(|\mathcal{E}|)$ is comparable with $O((K + d) \times (N + M))$ in our case, this extra cost does not increase the overall order of complexity. The prediction network costs $O(S \times \cdots)$ computations for each user. In conclusion, our model achieves time complexity comparable to that of other graph convolution-based models.
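The two cost terms above can be compared with a small back-of-the-envelope calculation. The sizes below are hypothetical and are not taken from the paper's datasets; they only show that the ratio of the extra relation-learning cost to the graph convolution cost reduces to $(K + d)(N + M)/|\mathcal{E}|$, so the extra term stays within a constant factor when the number of edges is on the order of $(K + d)(N + M)$.

```python
def gcn_cost(L: int, K: int, d: int, num_edges: int) -> int:
    """O(L * K * d * |E|) term of the lightweight graph convolution."""
    return L * K * d * num_edges


def relation_cost(L: int, K: int, d: int, N: int, M: int) -> int:
    """Extra O(L * K * d * (K + d) * (N + M)) term of behavior relation learning."""
    return L * K * d * (K + d) * (N + M)


# Hypothetical sizes: 2 layers, 4 behavior types, 16 latent factors,
# 10,000 users, 5,000 items, 300,000 edges.
L, K, d, N, M, E = 2, 4, 16, 10_000, 5_000, 300_000
g = gcn_cost(L, K, d, E)
r = relation_cost(L, K, d, N, M)
print(g, r, r / g)  # 38400000 38400000 1.0
```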
4. Experiments
4.1. Datasets
Taobao, one of the largest e-commerce platforms in China, contains four types of user interactions, namely, page view, add to cart, favorite, and purchase. Each row of the dataset represents a user behavior, consisting of user ID, product ID, product category ID, behavior type, and timestamp, and is separated by commas.
Beibei is one of the largest online retail websites for baby products in China, and it involves three types of user interaction behaviors, including page browsing, adding to cart, and purchasing.
The JDATA dataset is from JD.com, a famous e-commerce website in China, and contains two months of user behavior data from JD.com’s website. The types of actions are browse, order, follow, comment, and add to cart.
4.2. Evaluation Metrics
To verify the performance of the proposed model, we employ two widely used evaluation metrics: the Hit Ratio (HR@10) and the Normalized Discounted Cumulative Gain (NDCG@10).
$$\mathrm{HR}@k = \frac{\text{Number of Hits}@k}{|GT|}$$
where $GT$ is the set of all items in the test set, and the numerator is the total number of test items hit in the top-$k$ recommendation lists.
$$\mathrm{NDCG}@k = Z \sum_{i=1}^{k} \frac{2^{r_i} - 1}{\lg(1 + i)}$$
where $Z$ is a normalization factor ensuring that an ideal ranking list has a value of 1; $r_i \in \{0, 1\}$ indicates the predicted relevance of the $i$-th item; and $\lg(1 + i)$ is the position decay function. The larger the NDCG and HR values for the user to be recommended, the better the recommendation list matches the user's preferences and the better the recommendation performance of the algorithm. To compare the performance of different models fairly, NDCG was computed with the above method for all models in our experiments. Consequently, the absolute results obtained here differ from those reported in the references, but the trends are the same.
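The two metrics can be computed as in the sketch below, which follows the definitions above with binary relevance: HR@k divides the number of test items found in the top-k lists by $|GT|$, and NDCG@k divides the discounted gain of the list by that of an ideal list (the normalization factor). The logarithm base cancels in this ratio, so $\log_2$ is used; the function names and data layout are hypothetical.

```python
import math
from typing import Dict, List, Set


def hit_ratio_at_k(ranked: Dict[str, List[str]], ground_truth: Dict[str, Set[str]], k: int) -> float:
    """Hits in the top-k lists of all users divided by the size of the test set GT."""
    hits = sum(len(set(ranked.get(u, [])[:k]) & gt) for u, gt in ground_truth.items())
    total = sum(len(gt) for gt in ground_truth.values())
    return hits / total if total else 0.0


def ndcg_at_k(ranked_items: List[str], relevant: Set[str], k: int) -> float:
    """NDCG@k for one user with binary relevance r_i in {0, 1}."""
    dcg = sum((2 ** int(item in relevant) - 1) / math.log2(1 + pos)
              for pos, item in enumerate(ranked_items[:k], start=1))
    # Ideal list: all relevant items ranked first; the normalization factor is 1 / IDCG.
    idcg = sum(1.0 / math.log2(1 + pos) for pos in range(1, min(len(relevant), k) + 1))
    return dcg / idcg if idcg > 0 else 0.0


print(ndcg_at_k(["b", "a"], {"b"}, 2))  # 1.0
```

A relevant item at rank 2 instead of rank 1 would score $1/\log_2 3 \approx 0.63$, which is the position decay at work.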
4.3. Compared Methods and Implementation Details
4.3.1. Recommendation Model Based on Graph Neural Network
ST-GCN [24]: A convolution-based graph neural network model that generates user embeddings through an encoder–decoder framework.
SR-GNN [25]: A session-based graph neural network model that captures complex dependencies among the items within a session, which are difficult to capture with earlier sequential approaches.
NGCF [12]: A message-passing architecture that aggregates information over the user–commodity interaction graph, thereby exploiting higher-order relationships in the graph.
4.3.2. Recommendation Models for Multi-Behavioral Categories
NMTR [8]: This approach learns recommender systems from users' multi-behavior data. It considers the cascading relationships between different types of behaviors and performs cascaded predictions for the different behavior types within a multi-task learning framework.
MATN [11]: This method preserves cross-type behavioral synergy signals and type-specific behavioral contextual information by explicitly encoding multi-behavior relational structures. The model transforms each type of behavioral feature through a designed memory unit, generating a specific behavioral representation through this type-specific transformation process.
MBGCN [2]: A multi-behavior graph convolutional network model that learns behavior strength through the user–goods propagation layer and captures behavior semantics through the goods–goods propagation layer, addressing limitations of existing work.
4.4. Experimental Results and Analysis
We evaluate the performance of all baseline methods on the different datasets; the results are shown in Table 1, from which we draw the following observations. The proposed MK-GCN model substantially improves recommendation performance. This performance gap can be attributed to its effective personalized multi-behavior pattern modeling and to the rich contextual information in the user and item representations obtained under the meta-learning paradigm. Most studies ignore the different behavioral habits of different users and simply assign different weights to different behaviors, whereas this paper learns personalized user behavior representations from the interaction graph according to users' behavioral habits.
MK-GCN consistently outperforms the baseline models, each of which has limitations of varying degree. SR-GNN and ST-GCN do not consider the types of user behaviors and only model and extract features from the products that users interact with. NMTR only models the cascading relationships between multiple types of interaction behaviors and cannot explore high-order behavior dependencies in the interaction graph. MATN aggregates different types of behavior patterns by weighted summation, which cannot comprehensively capture the complex interdependencies between different types of interaction behaviors.
Furthermore, the comparison between MK-GCN and the multi-behavior graph neural model MBGCN demonstrates the advantages of the proposed method in multi-behavior dependency modeling. Among the baselines, the methods that inject multi-behavior information into the recommendation framework (i.e., NMTR, MATN, and MBGCN) perform better than the single-behavior recommendation methods that do not distinguish between interaction types. This result confirms the value of exploring multi-behavior patterns for improving recommendation.
4.5. Ablation Experiments
To explore the effect of each module in the model, the variant models shown in Table 2 were set up for the experiment. The results of the ablation experiments are shown in Table 3. Based on these results, we draw the following conclusions.
- (1)
Behavior relation learning plays an active role in capturing higher-order information during message passing in the graph neural network. This suggests that it is effective for the model to use attention layers in multiple representation subspaces to capture the pairwise correlations between the various interaction behaviors.
- (2)
The results demonstrate the necessity of learning the parameters of the prediction network from the dependencies between interaction behaviors. This suggests that behavioral relationships not only provide external knowledge during multi-behavior aggregation but also serve as a supervisory signal for model optimization.
- (3)
MK-GCN outperforms -metaEncoder and -metaPred, which do not incorporate the meta-knowledge learner; this indicates the importance of modeling user-specific behavior through the meta-learning paradigm.
5. Conclusions and Future Work
In this paper, a multi-behavior augmented recommendation framework based on graph neural networks is designed to address the heterogeneity and diversity of user interaction behaviors. The model first encodes user and product feature vectors that fuse contextual information under a customized meta-learning paradigm. It then explores the dependencies between multiple behavior types by learning the semantic features of different behaviors, and it uses graph convolutional networks and attention networks to obtain higher-order association information in the user–commodity interaction graph through multiple propagation layers. Finally, the feature vectors of non-target behaviors are used as supervisory signals to predict the likelihood of user $u_i$ interacting with product $v_j$ under target behavior $k$. Experiments on three large e-commerce datasets show that the model performs better than the baseline models. A drawback of this model is that it cannot handle real-time user behavior data streams and can only make recommendations from collected historical behavior data. Future work will investigate time-sensitive models that leverage newly arrived user behavior data to enable real-time recommendations.
This model can be widely applied to multi-behavior scenarios such as e-commerce, music, book, and movie recommendation. In a real scenario, the complex relationships are modeled as a heterogeneous graph containing multiple types of nodes and edges. The model then simulates users' behavior patterns by learning the dependencies between different types of behaviors to obtain more accurate recommendations, which helps the platform make informed decisions and adjust in a timely manner.