Article

MIMA: Multi-Feature Interaction Meta-Path Aggregation Heterogeneous Graph Neural Network for Recommendations

1 College of Computer Science and Technology, North China University of Technology, Shijingshan, Beijing 100144, China
2 Data Processing Center, Henan Provincial Bureau of Statistics, Zhengzhou 450016, China
* Author to whom correspondence should be addressed.
Future Internet 2024, 16(8), 270; https://doi.org/10.3390/fi16080270
Submission received: 19 June 2024 / Revised: 24 July 2024 / Accepted: 25 July 2024 / Published: 29 July 2024
(This article belongs to the Special Issue Deep Learning in Recommender Systems)

Abstract

Meta-path-based heterogeneous graph neural networks have received widespread attention for better mining the similarities between heterogeneous nodes and for discovering new recommendation rules. Most existing models depend solely on node IDs for learning node embeddings, failing to fully leverage attribute information and to clarify the reasons behind a user’s interest in specific items. To address these issues, a heterogeneous graph neural network for recommendation named MIMA (multi-feature interaction meta-path aggregation) is proposed. Firstly, heterogeneous graphs consisting of user nodes, item nodes, and their feature nodes are constructed, and meta-paths containing users, items, and their attribute information are used to capture the correlations among different types of nodes. Secondly, MIMA integrates attention-based feature interaction and meta-path information aggregation to uncover structural and semantic information. Then, the constructed meta-path information is subjected to neighborhood aggregation through graph convolution to acquire the correlations between different types of nodes and to further facilitate high-order feature fusion. Furthermore, user and item embedding vector representations are obtained through multiple iterations. Finally, the effectiveness and interpretability of the proposed approach are validated on three publicly available datasets in terms of NDCG, precision, and recall, in comparison with all baselines.

1. Introduction

Graph structure data [1] are widely used to represent a variety of practical application problems, including social networks [2,3], physical systems [4,5], traffic networks [6,7,8,9], citation networks [2,10], knowledge graphs [11,12], and link prediction [13]. Graph structure data are divided into homogeneous graphs and heterogeneous graphs [14]. A homogeneous graph has a single node type and a single edge type; therefore, one graph can embed only one relationship type [15]. However, many real-life scenarios involve not just one type of node and edge but multiple types with more complex interrelations, for which a heterogeneous graph is a more reasonable representation [14]. In recommendation systems [16,17], learning optimal node representations is the core problem; it can be achieved by aggregating the features of each node’s local neighbors and mining important high-order interaction information through feature extraction, i.e., with a graph neural network (GNN) [18,19]. Because of their powerful feature extraction capability, GNNs have been widely recognized and applied in academia and industry. The objective of our study is to continuously optimize the embedding representation of nodes for recommendation systems by mining valuable information and latent rules in graphs. Graph convolutional networks (GCNs) [20] learn node embeddings represented by dense low-dimensional vectors by aggregating the features of neighbor nodes. A graph attention network (GAT) [21] uses an attention strategy to perform aggregation operations on adjacent nodes to learn node feature representations. GCN performance has also been improved by aggregating neighbor features of different orders for each node [22]. Hyperbolic graph neural networks (HGNNs) [23] are optimized by leveraging a distance-based self-attention mechanism and weighted aggregation. The above algorithms realize neighbor feature aggregation in settings ranging from spectral to hyperbolic. However, these approaches only consider the embedded representation of neighbor nodes in homogeneous graphs and do not integrate node attribute information, which results in under-utilization of each node’s context information.
A heterogeneous graph attention network (HAN) [14] utilizes node-level and semantic-level attention mechanisms and meta-path aggregation to achieve node embedding in heterogeneous graphs. A heterogeneous graph neural network (HetGNN) [15] uses HAN to aggregate the attribute information of nodes to improve classification accuracy. Graph learning augmented heterogeneous graph neural networks (GL-HGNNs) [24] make full use of user–user relations, user–item interactions, and item–item similarities. In heterogeneous graph structure learning (HGSL) [25], the three graphs of feature similarity, feature propagation, and semantics are aggregated with the original heterogeneous graph, and the optimal graph is formed after iterative training. In semantic-aware heterogeneous network embedding (SHNE) [26], graph structure and attribute information are extracted, and node representations are trained on the attribute information graph. Meta-path-based, relation-based, and structure-attribute-enhanced heterogeneous graph learning algorithms have thus been proposed for learning node embeddings. Although some algorithms consider node attribute information during node aggregation, they do not consider the influence of attributes on meta-path aggregation. Introducing node attributes into meta-path aggregation operations facilitates the discovery of new recommendation rules and enhances the interpretability of recommendation results.
This paper introduces a heterogeneous graph neural network algorithm that integrates multi-feature interaction and meta-path aggregation to enhance node representations for recommendation systems. With the goal of utilizing the attribute information of nodes more fully, we construct an attribute-enhanced heterogeneous graph. A multi-head attention mechanism is used to integrate the attribute information of nodes into user and item aggregations and to explore the latent preferences of users and items in the embedding vector representation of multi-feature fusion. A graph convolution operation is applied to aggregate the semantic features of the constructed meta-path information, which consists of attribute information, users, and items; in subsequent steps, high-order feature fusion captures new recommendation rules between users and items by using attribute information. After the MIMA algorithm converges, the optimized embedding vectors of users and items are used for recommendations.
Motivated by studying the impact of attribute information on node aggregation and meta-path aggregation, a MIMA algorithm is proposed. The key contributions are as follows:
  • A multi-head attention mechanism is employed to acquire the preference representation of users and items for multi-attribute feature combinations, and the meta-path representations of users and items with their attribute information are modeled for exploring recommendation relations.
  • A multi-feature interactive meta-path aggregation heterogeneous graph neural network algorithm is proposed, which can more accurately uncover the underlying implicit correlations between users and items with multiple features and can aggregate meta-path semantic information to make the model more interpretable.
  • Compared with the other baselines on three commonly used datasets, our proposed model demonstrates significant advancements in NDCG, precision, and recall.
The organization of this paper is as follows. The related works are presented in Section 2. Section 3 details the methodology of our proposed recommendation algorithm. Section 4 experimentally demonstrates that the MIMA algorithm achieves the best results on three datasets compared to the other ten algorithms. Section 5 summarizes the research content of this article.

2. Related Work

2.1. Graph Neural Network

Graph neural networks (GNNs) [19,27,28,29,30] learn the embedding representation of nodes through aggregation propagated separately on each node, but node attribute information is not considered in the model learning process. Graph convolutional networks (GCNs) [20,21,31,32,33,34,35] optimize the embedding vector of a node by making full use of attribute information and leveraging inductive learning to generalize to new nodes. LightGT [36] predicts user–item interactions by using modal-specific embeddings, a layer-wise position encoder, and a light self-attention mechanism. Attention mechanisms [37,38] and feature interactions [39,40] are used for feature learning of node embedding representations. LightGCN [33] simplifies the GCN framework by removing nonlinear activations and feature transformations; neighbor aggregation and message propagation are used to update the embedded representation of nodes and the weight matrix. A message propagation strategy aggregates messages delivered from other directly linked node types (such as users or attributes) to enhance the representation learning of items and users by integrating relevant attributes, which also naturally solves the problem of missing attributes [41]. An adversarial GCN [42] leverages a GCN-based autoencoder that encodes high-order and complex link patterns to enhance interaction data and utilizes a GCN-based attention mechanism to obtain the social relations of each user. BLoG [43] uses an online encoder and a target encoder to encode user–item bipartite graphs to learn user–item representations and uses local and global regularization to facilitate information interaction between these two graph encoders. Although GNNs have achieved remarkable success in recommender systems in the past few years, most existing GNN-based algorithms are studied on homogeneous graphs with only one node type and one edge type. Meanwhile, most real-world graphs consist of different types of nodes and edges with related attributes in different feature spaces, and these nodes carry rich attribute information.

2.2. Heterogeneous Graph Neural Networks

It is generally considered that most real-world graphs are better represented by heterogeneous graphs or heterogeneous information networks (HINs) composed of different types of nodes and edges with related attribute information in different feature spaces. Heterogeneous graph neural networks (HetGNNs) [15,24,44,45,46,47] have powerful structural and semantic feature extraction capabilities by adopting methods that are meta-path-based [48], relation-based [49], structure-based [50], structure-attribute-based [51], and adversarial-based [52]. SeHGNN [48] adopts a lightweight average aggregator to avoid repeated neighbor aggregation and utilizes a transformer-based semantic fusion strategy for single-layer long meta-paths to extend the features of meta-paths, ultimately achieving high prediction precision and fast training speed. GraphENS [53] keeps K% of the most-necessary nodes and computes the remaining attributes by linear interpolation to avoid overfitting. HetEGCN [54] is a heterogeneous graph convolutional neural network model for text classification that effectively utilizes the relationships between different types of text and uses cross-modal information to improve classification performance. In [55], a pairwise proximity preservation module obtains the embedding representation based on attribute information, and a network scheme proximity module designs multi-task learning to achieve node prediction with the help of the node embedding representation. Although remarkable achievements have been made in current research on heterogeneous graphs, and meta-path and node aggregation have been studied extensively, the utilization of the attribute information of nodes and items is still insufficient. In existing studies, node attribute information and interaction information are used as the initial input [15,44,45], autoencoders are used to deeply learn high-order combinations of node attributes [26,55], relation-based neighbor aggregation uses meta-relation attention mechanisms [56,57], and so on, for node embedding representation. In the realm of heterogeneous graphs, “node context characteristics” play a pivotal role in capturing the rich interplay between different node types. These characteristics encapsulate not just the inherent attributes of a node, such as user demographics or item specifications, but also the relational context provided by their connections within the graph. For example, the context of a user node could be shaped by the user’s historical interactions, preferences, and semantic relations to other users or items. This understanding is crucial for the development of our multi-feature interaction meta-path aggregation heterogeneous graph neural network (MIMA), which seeks to integrate these context characteristics into the recommendation process. However, most studies do not utilize node context characteristics, and their models rarely perform well on heterogeneous graphs with rich node context characteristics (for example, metapath2vec [58], ESim [59], HIN2vec [60], and HERec [61]). Moreover, some studies consider only the two end nodes, discarding all intermediate nodes on the meta-path, which results in information loss (such as HAN [14]).
In summary, the integration of node attributes in GNN training, while promising, has been hindered by the intricacies of heterogeneous graphs. Existing models, such as the heterogeneous graph attention network (HAN) and the heterogeneous graph neural network (HetGNN), leverage attention mechanisms to navigate these complex structures. However, they often neglect the pivotal role that node attributes play in meta-path aggregation, which can significantly curtail the quality of recommendations. Traditional GNNs, including graph convolutional networks (GCNs) and graph attention networks (GATs), also fall short in this regard, as they do not fully exploit node attributes, leading to a loss of valuable contextual information. Meta-path-based methods, exemplified by the graph learning augmented heterogeneous graph neural network (GL-HGNN), utilize relationships between users and items but are in dire need of a more sophisticated approach to integrate node attributes, which would enhance the interpretability of recommendations. Our approach, as detailed in the subsequent sections, aims to bridge these gaps by fully utilizing node attributes in meta-path aggregation, thereby improving the accuracy and interpretability of recommendations in heterogeneous graphs.
In order to fully utilize the attribute information of nodes and items in high-order feature learning and meta-path aggregation, we model the attribute feature preferences of users and items while considering the historical interactions between them and construct meta-path embedding information that supports interpretability. We fully learn the attribute feature information of users and items in attribute-enhanced heterogeneous graphs via feature node aggregation and obtain the semantic features of meta-path aggregation with feature nodes by message passing. Our proposed model considers the historical preferences of users and the multiple attribute features of users and items, considers the meta-paths with attribute information between items and users, and fuses the two to achieve recommendations. Specifically, the user–item feature interaction layer is designed to capture implicit embedding information between users and items based on their feature information. Then, a graph convolutional neural network is designed to map the semantic information of meta-paths to learn representations containing meta-path information. Finally, the embedded representations of users and items are learned by leveraging multi-feature interactions and meta-path aggregation. The proposed model makes the learned user and item representations interpretable and greatly improves the effectiveness of recommendations.

3. Method

3.1. Heterogeneous Entity Path Definition

Firstly, a heterogeneous graph is defined with a different number of node types for each dataset; for example, a movie recommendation heterogeneous graph has user and item nodes that include node features. Each user and each movie is represented as a node, and node features include the movie ID, movie type attributes, user ID, and user attributes. Meanwhile, the attributes of movies can be represented as feature nodes, whose node features are vector-encoded representations of the corresponding movie attributes. In this way, there are three different types of nodes and two different types of edges, as in Figure 1a. The node types consist of users, items, and attributes. The edge between a user and a movie represents the association between the user and the movie, and the edge between a movie or user node and a feature node represents the association between the movie or user and its features.
A meta-path describes the semantic relationship between nodes in a heterogeneous graph and is composed of an ordered sequence of node types. It helps define semantic association information between different types of nodes. As shown in Figure 1b, based on the node information types, a meta-path can be constructed as user → item → user; this meta-path represents two users having the same preferences for movies. Alternatively, Figure 1c shows A → B → C → B → A, where C represents the feature node; this meta-path represents the connection between two users through a common movie type, where the two users have similar preferences for a movie type. Figure 1d illustrates the process flow of our methodology from meta-path construction to aggregation.
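To make this construction concrete, the following minimal Python sketch builds such an attribute-enhanced heterogeneous graph and declares the two meta-paths of Figure 1b,c. All names (`HeteroGraph`, `META_PATHS`) and node IDs are illustrative, not the authors’ implementation.

```python
from collections import defaultdict

class HeteroGraph:
    """Toy attribute-enhanced heterogeneous graph with typed nodes and edges."""
    def __init__(self):
        # adjacency sets keyed by the (source type, destination type) edge type
        self.adj = defaultdict(lambda: defaultdict(set))

    def add_edge(self, src_type, src, dst_type, dst):
        # store both directions so walks can traverse edges freely
        self.adj[(src_type, dst_type)][src].add(dst)
        self.adj[(dst_type, src_type)][dst].add(src)

g = HeteroGraph()
g.add_edge("user", 0, "item", 10)       # user 0 rated movie 10
g.add_edge("item", 10, "feature", 100)  # movie 10 carries genre feature 100
g.add_edge("user", 1, "item", 10)       # user 1 rated the same movie

# Meta-paths are ordered sequences of node types (Figure 1b,c).
META_PATHS = [
    ("user", "item", "user"),                     # shared movie preference
    ("user", "item", "feature", "item", "user"),  # shared genre preference
]
```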
The random walk method is used to sample the different types of nodes that are strongly correlated with each node. Starting from a given node, random walk sampling is carried out with a fixed length; at each step, one of the neighboring nodes is visited with equal probability, or the walk returns to the initial node. The number of samples for each node type is fixed, ensuring that every node type is sampled. For each node, strongly correlated neighboring nodes are sampled to obtain relevant path information. Then, each meta-path is encoded into a vector representation through meta-path encoding. The set of paths formed from these meta-paths is used to capture association information between different types of nodes. Meanwhile, an attention mechanism is introduced into the graph neural network to aggregate information between different types of nodes.
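Continuing the sketch above, a fixed-length typed random walk with restart might look as follows; the walk count and restart probability are illustrative parameters, not values from the paper.

```python
import random

def sample_metapath_instances(g, meta_path, start, walks=20, restart_p=0.15):
    """Sample meta-path instances rooted at `start` via typed random walks."""
    instances = []
    for _ in range(walks):
        path, node = [start], start
        for hop in range(1, len(meta_path)):
            if random.random() < restart_p:
                break  # return to the initial node and abandon this walk
            edge_type = (meta_path[hop - 1], meta_path[hop])
            neighbors = list(g.adj[edge_type][node])
            if not neighbors:
                break
            node = random.choice(neighbors)  # visit a neighbor with equal probability
            path.append(node)
        if len(path) == len(meta_path):  # keep only complete instances
            instances.append(tuple(path))
    return instances

# e.g., user -> item -> user instances rooted at user 0
instances = sample_metapath_instances(g, META_PATHS[0], start=0)
```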

3.2. Input Layer

Users, items, and their attributes are mapped to vector representations by referring to the mainstream recommendation model [62], and a high-dimensional feature representation of the corresponding input data is generated. The initial values of the parameter matrices to be learned are generated through random initialization. In our proposed model, the attribute features of users and items are represented as a high-dimensional vector and are embedded in the historical interaction graph. By mapping features to a higher-dimensional space, the deep semantic information of features can be learned more thoroughly.
$$\mathbf{x} = [x_1, x_2, \ldots, x_m] \tag{1}$$
where $\mathbf{x}$ is the overall input vector, $m$ represents the dimension of $\mathbf{x}$, and $x_m$ represents the user and item feature vector.

3.3. Embedding Layer

The user ID, item ID, and their attributes are mapped to the same low-dimensional feature space for efficient computation, allowing interaction between features and IDs; $\mathbf{e}$ denotes the resulting embedding vector.
$$\mathbf{e} = \mathbf{W}^{T}\mathbf{x} \tag{2}$$
where $\mathbf{W}$ is a node-type-specific weight parameter matrix. Before the node vectors are fed into the model, an attribute-specific linear transformation is applied to each type of node, projecting the feature vectors into the same feature space. This resolves the heterogeneity of the graph that arises from the node context attributes. All nodes then share the same dimensionality in the projected space, which facilitates the operation of the feature interaction layer.
Mapping to a low-dimensional feature space is a common practice in GNNs. This approach is chosen to facilitate the computation of interactions between features and IDs, as it allows for more efficient processing while capturing the essential relationships between nodes. The choice is justified by the need to balance computational efficiency with the preservation of meaningful node interactions.
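As a hedged illustration of this projection step (Equation (2)), the sketch below gives each node type its own linear map into a shared $d$-dimensional space; the class name and all dimensions are ours, not the paper’s.

```python
import torch
import torch.nn as nn

class TypedProjection(nn.Module):
    """Per-node-type linear projection into one shared d-dimensional space."""
    def __init__(self, in_dims, d=64):
        super().__init__()
        # one linear map per node type, implementing e = W^T x for that type
        self.proj = nn.ModuleDict(
            {t: nn.Linear(in_dim, d, bias=False) for t, in_dim in in_dims.items()}
        )

    def forward(self, node_type, x):
        return self.proj[node_type](x)

# illustrative raw-attribute dimensions per node type
proj = TypedProjection({"user": 23, "item": 18, "feature": 32}, d=64)
e_user = proj("user", torch.randn(4, 23))  # 4 user nodes -> (4, 64) embeddings
```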

3.4. Feature Interaction Layer

Based on all features of the nodes, the key issue in high-order combined feature modeling is to determine which combinations of features form high-value, high-order features. Through iterative training, the model learns users’ preference representations for different feature fusions and obtains semantic information from the fused high-order features. Therefore, given the user, item, and feature vectors of the same dimension, our feature extraction approach employs a multi-head attention mechanism to integrate the different features of items and users and discover which combinations are more consistent with user preferences. As an illustration, consider feature m as a specific instance: the association between features m and k is calculated, and multiple valuable high-order combinations of feature m are obtained within head h.
$$\alpha_{m,k}^{(h)} = \frac{\exp\big(\varphi^{(h)}(\mathbf{e}_m, \mathbf{e}_k)\big)}{\sum_{l=1}^{M} \exp\big(\varphi^{(h)}(\mathbf{e}_m, \mathbf{e}_l)\big)}, \quad \varphi^{(h)}(\mathbf{e}_m, \mathbf{e}_k) = \big\langle \mathbf{W}_q^{(h)}\mathbf{e}_m, \mathbf{W}_k^{(h)}\mathbf{e}_k \big\rangle, \quad \mathbf{e}_m^{(h)} = \sum_{k=1}^{M} \alpha_{m,k}^{(h)} \big(\mathbf{W}_v^{(h)}\mathbf{e}_k\big) \tag{3}$$
where $l$ is a variable that ranges from 1 to $M$, $\mathbf{e}_m^{(h)}$ is updated by combining all relevant features with coefficients $\alpha_{m,k}^{(h)}$ of head $h$, and $\varphi^{(h)}(\mathbf{e}_m, \mathbf{e}_k)$ is the attention function, which computes the similarity between features $m$ and $k$. The transformation matrices $\mathbf{W}_q^{(h)}$, $\mathbf{W}_k^{(h)}$, and $\mathbf{W}_v^{(h)}$ map the original embedding space into the new space.
$$\mathbf{e}_m^{H} = \mathbf{e}_m^{(1)} \oplus \mathbf{e}_m^{(2)} \oplus \cdots \oplus \mathbf{e}_m^{(H)}, \quad \mathbf{e}_m^{Res} = \mathbf{e}_m^{H} + \mathbf{W}^{Res}\mathbf{e}_m \tag{4}$$
where $\mathbf{e}_m^{H}$ is the concatenation of each head’s output, $\mathbf{W}^{Res}$ is the projection matrix used in the case of mismatched feature dimensions, and $\mathbf{e}_m^{Res}$ is the updated representation of high-order combination features. To preserve the original features, standard residual connections are added in the neural network topology, which means that $\mathbf{e}_m$ is added to the learned feature representation $\mathbf{e}_m^{H}$.
The use of a multi-head attention mechanism is motivated by the need to integrate different features of items and users in a way that reflects user preferences. This approach is supported by previous works, which have demonstrated the effectiveness of attention mechanisms in capturing the importance of various features in node representations. By employing multi-head attention, our model can learn multiple preference representations, enhancing its ability to identify and prioritize relevant features.
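A minimal PyTorch sketch of this interaction layer (Equations (3) and (4)) is given below, assuming AutoInt-style attention over the $M$ feature embeddings of a single user or item; the module and dimension choices are illustrative, not the authors’ implementation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class FeatureInteraction(nn.Module):
    """Multi-head attention over feature embeddings with a residual connection."""
    def __init__(self, d, heads=2, d_head=32):
        super().__init__()
        self.heads, self.d_head = heads, d_head
        self.W_q = nn.Linear(d, heads * d_head, bias=False)    # W_q^(h)
        self.W_k = nn.Linear(d, heads * d_head, bias=False)    # W_k^(h)
        self.W_v = nn.Linear(d, heads * d_head, bias=False)    # W_v^(h)
        self.W_res = nn.Linear(d, heads * d_head, bias=False)  # W^Res

    def forward(self, e):  # e: (M, d), one row per feature of a user/item
        M = e.size(0)
        q = self.W_q(e).view(M, self.heads, self.d_head)
        k = self.W_k(e).view(M, self.heads, self.d_head)
        v = self.W_v(e).view(M, self.heads, self.d_head)
        scores = torch.einsum("mhd,khd->hmk", q, k)     # phi^(h)(e_m, e_k)
        alpha = F.softmax(scores, dim=-1)               # alpha_{m,k}^(h), Eq. (3)
        out = torch.einsum("hmk,khd->mhd", alpha, v)    # sum_k alpha (W_v e_k)
        out = out.reshape(M, self.heads * self.d_head)  # concat heads: e_m^H
        return out + self.W_res(e)                      # residual, Eq. (4)

fi = FeatureInteraction(d=64, heads=2, d_head=32)
e_res = fi(torch.randn(8, 64))  # 8 feature fields -> (8, 64) fused representations
```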

3.5. Meta-Path Aggregation Layer

Based on meta-paths, the structural and semantic information among users, items, and their attributes is captured. Different types of nodes that are strongly correlated with the target node are sampled by utilizing random walks. At each step, the walker either visits a neighbor node of the target node with equal probability or returns to the initial node, and the number of samples for each node type is fixed to ensure that all node types are sampled. Correlation paths are obtained by sampling the strongly correlated neighbor nodes of each node. All node features along a meta-path are converted into a single vector by a linear mean encoder, and the vector representation of a meta-path is obtained by concatenating the vector representations of the nodes on the meta-path. A graph convolutional neural network is then used to aggregate the feature matrix to obtain the aggregated vector representation of each node. The meta-path aggregation process is shown in Figure 2.
By aggregating meta-path semantic information that includes node attributes, the model achieves node representation optimization for improved recommendation performance. The aggregation process consists of two main steps and is implemented as Equation (5):
$$\mathbf{h}_{p(v,u)} = f_{\theta}\big(p(v,u)\big) = f_{\theta}\big(\mathbf{h}_v, \mathbf{h}_u, \{\mathbf{h}_t, \forall t \in m(v,u)\}\big) \tag{5}$$
where $p(v,u)$ is a given meta-path instance from the target node $v$ to its meta-path neighbor $u$, and the intermediate nodes of $p(v,u)$ are $m(v,u) = p(v,u) \setminus \{v, u\}$, which includes all nodes along the meta-path except $v$ and $u$. The features of all nodes on $p(v,u)$ are transformed into an embedding vector by a linear mean encoder, and the relevant information is captured from the intermediate nodes. Overall, this approach allows us to aggregate and represent the features of the nodes along $p(v,u)$, enabling the incorporation of abundant structural and semantic information in the following tasks.
The term $\mathbf{h}_v$ represents the vector representation of $v$ after the feature interaction transformation. The model then uses an attention mechanism to assign a weight to each meta-path instance of $P$ related to the target node $v$ and to sum them. Different meta-paths promote the representation of target nodes to varying degrees. We calculate the normalized importance weight of each meta-path instance as $\alpha_{vu}^{P}$ and then perform a weighted sum over all instances that have $v$ as the target node, i.e., Equation (6).
$$e_{vu}^{P} = \mathrm{LeakyReLU}\big(\mathbf{w}_p^{T} \cdot [\mathbf{h}_v \,\|\, \mathbf{h}_{p(v,u)}]\big), \quad \alpha_{vu}^{P} = \frac{\exp(e_{vu}^{P})}{\sum_{s \in N_v^{P}} \exp(e_{vs}^{P})}, \quad \mathbf{h}_v^{P} = \sigma\Big(\sum_{u \in N_v^{P}} \alpha_{vu}^{P} \cdot \mathbf{h}_{p(v,u)}\Big) \tag{6}$$
where $\mathbf{w}_p^{T}$ is the parameterized attention vector of the meta-path $P$, and $N_v^{P}$ is the set of meta-path-based neighbors of $v$. Based on the attention coefficients, the vector representation of node $v$ is updated by aggregating its meta-path neighbors. An aggregation layer between meta-paths then merges the semantic information of all meta-paths containing $v$; once more, different meta-path information is aggregated by assigning distinct weights to diverse meta-paths through attention mechanisms.
$$\beta_{P_j} = \frac{\exp(e_{P_j})}{\sum_{P \in \mathcal{P}} \exp(e_{P})}, \quad \mathbf{h}_v = \sigma\Big(\mathbf{W}_o \sum_{P \in \mathcal{P}} \beta_{P} \cdot \mathbf{h}_v^{P}\Big) \tag{7}$$
The set of latent meta-paths passing through $v$ is $\mathcal{P} = \{P_1, P_2, \ldots, P_j, \ldots, P_J\}$, where $J$ is the number of meta-paths. Here, $e_{P_j}$ is the concatenated vector representation of all nodes on $P_j$, and $\beta_{P_j}$ denotes the weight coefficient of the meta-path $P_j$ containing $v$. Because different meta-paths have different levels of importance, the meta-paths containing $v$ are fused as in Equation (7) to further optimize the embedding representation. $\mathbf{W}_o$ is a parameter matrix, and the activation function $\sigma$ projects the embedding vector of node $v$ to a suitable output dimension. Figure 3 demonstrates the specific process of learning the target node embedding representation across the entire aggregation layer.
The use of meta-path aggregation is driven by the necessity to capture the complex relationships inherent in heterogeneous graphs. This method is chosen to incorporate both structural and semantic information into the node representations, which is essential for accurate recommendations. Meta-paths provide a structured framework for navigating the graph and extracting valuable relational insights, thereby improving the interpretability and precision of our recommendations.
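The two-step aggregation of Equations (5)–(7) can be sketched as follows. As a simplifying assumption of ours, the inter-meta-path score $e_{P_j}$ is approximated with a learnable semantic attention vector rather than the concatenation of all node vectors on $P_j$; all names and shapes are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class MetaPathAggregation(nn.Module):
    """Mean-encode each meta-path instance (Eq. 5), attend over instances per
    meta-path (Eq. 6), then attend over meta-paths (Eq. 7)."""
    def __init__(self, d, num_metapaths):
        super().__init__()
        self.w_p = nn.ParameterList(                       # w_p per meta-path
            [nn.Parameter(torch.randn(2 * d)) for _ in range(num_metapaths)]
        )
        self.q = nn.Parameter(torch.randn(d))              # inter-path attention vector
        self.W_o = nn.Linear(d, d)                         # W_o

    def forward(self, h_v, instances_per_path):
        # instances_per_path[j]: (n_j, L_j, d) node features along each instance
        h_per_path = []
        for j, inst in enumerate(instances_per_path):
            h_p = inst.mean(dim=1)                         # linear mean encoder, Eq. (5)
            e = F.leaky_relu(                              # e_{vu}^P
                torch.cat([h_v.expand_as(h_p), h_p], dim=-1) @ self.w_p[j]
            )
            alpha = F.softmax(e, dim=0)                    # alpha_{vu}^P
            h_per_path.append(torch.sigmoid(alpha @ h_p))  # h_v^P, Eq. (6)
        h_stack = torch.stack(h_per_path)                  # (J, d)
        beta = F.softmax(h_stack @ self.q, dim=0)          # beta_{P_j}, Eq. (7)
        return torch.sigmoid(self.W_o(beta @ h_stack))     # fused h_v

agg = MetaPathAggregation(d=64, num_metapaths=2)
h_v = agg(torch.randn(64), [torch.randn(5, 3, 64), torch.randn(4, 5, 64)])
```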

3.6. Prediction Layer

By propagating information through the meta-path convolutional aggregation layer, we have comprehensive knowledge about the user preferences for each item and can accurately deduce the fused attribute information of the user’s favorite items. Then, the user preference for each item is calculated by performing a dot product operation on the latest learned user and item embedding vectors. Finally, our algorithm generates a set of items that are potentially appealing to the user. The term σ denotes the activation function in Equation (8).
$$y_{ui} = \sigma(\mathbf{h}_u \cdot \mathbf{h}_i) \tag{8}$$

3.7. Model Training

The embedding vector representations of users and items, the users’ preference coefficients for different feature combinations, and the weight coefficients of combinations among different features are the parameters to be trained in the proposed MIMA model. The model weights are optimized by minimizing the loss function with negative sampling: the user’s preferences for positive samples and negative samples are computed separately, and the trainable parameters are updated based on the differences between negative and positive samples. The loss function is given in Equation (9).
$$L = -\sum_{(u,v) \in \Omega} \log \sigma\big(\mathbf{h}_u^{T} \cdot \mathbf{h}_v\big) - \sum_{(u,v') \in \Omega^{-}} \log \sigma\big(-\mathbf{h}_u^{T} \cdot \mathbf{h}_{v'}\big) \tag{9}$$
where $\Omega$ is the set of observed (positive) interaction records between nodes, and $\Omega^{-}$ is a set of negative node pairs randomly sampled from all unobserved node pairs. The Adam optimizer [63] is used for model training. Figure 4 illustrates the comprehensive framework of the proposed MIMA algorithm, including the input layer, embedding layer, feature interaction layer, meta-path aggregation layer, and prediction layer. Algorithm 1 demonstrates the pseudocode of the MIMA algorithm.
Algorithm 1 Multi-feature interaction meta-path aggregation heterogeneous graph neural network (MIMA).

Input: heterogeneous graph with attributes $G = (V, E)$; node features $\{e_i, i \in V\}$; meta-path set $\mathcal{P} = \{P_1, \ldots, P_J\}$; input vector $\mathbf{x}$; number of attention heads $H$; number of training rounds $L$; meta-path neighbor set $N_v^P$ of node $v$
Output: the embedding vector representations of users $U$ and items $V$
for each node feature $e_i, i \in V$ do
    node content transformation: $\mathbf{h}_i \leftarrow \mathbf{w}_i \cdot \mathbf{x}_i$, $\forall i \in V$
end for
for $l = 1$ to $L$ do
    for each node $e_i \in V$ do
        for each meta-path $P \in \mathcal{P}$ do
            calculate the embedding representation of each node feature: $\mathbf{e}_i^{Res} \leftarrow \sum_{k=1}^{|V|} \alpha_{i,k}^{(h)} (\mathbf{W}_v^{(h)} \mathbf{e}_k) + \mathbf{W}^{Res}\mathbf{e}_i$
            calculate the intra-meta-path aggregation of each target node and the aggregation between meta-paths: $\mathbf{h}_v \leftarrow \sigma\big(\mathbf{W}_o \cdot \sum_{P_j \in \mathcal{P}} \beta_{P_j} \cdot \sigma\big(\sum_{u \in N_v^P} \alpha_{vu}^{P} \cdot \mathbf{h}_{p(v,u)}\big)\big)$
        end for
    end for
end for
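Complementing the pseudocode above, the following is a minimal sketch of the Equation (9) objective and one optimization step, assuming batched user, positive-item, and negative-item embeddings produced by the model; the batch and dimension sizes are placeholders.

```python
import torch
import torch.nn.functional as F

def mima_loss(h_u, h_pos, h_neg):
    """Negative-sampling loss of Equation (9); inputs are (B, d) batches of
    user, positive-item, and negative-item embeddings."""
    pos = F.logsigmoid((h_u * h_pos).sum(dim=-1))   # observed pairs in Omega
    neg = F.logsigmoid(-(h_u * h_neg).sum(dim=-1))  # sampled pairs in Omega^-
    return -(pos + neg).mean()

# illustrative check with random embeddings; in practice h_* come from MIMA
h_u, h_pos, h_neg = (torch.randn(32, 64, requires_grad=True) for _ in range(3))
loss = mima_loss(h_u, h_pos, h_neg)
loss.backward()  # gradients flow back; an Adam optimizer then updates the weights
```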
MIMA algorithm complexity: The input vector $\mathbf{e}$ with dimension $d$ is transformed into an embedding representation with dimension $\tilde{d}$; the dimension of the transformation matrix is $d \times \tilde{d}$. The computational complexity of generating the embedding representations for the diverse node categories is $O(Nd\tilde{d})$, where $N$ is the total number of nodes. When determining the attention coefficients, the user–item graph must be evaluated repeatedly, with the number of evaluations matching the number of edges; the complexity of this operation is $O(h(E \times \tilde{d}))$, where $E$ is the number of edges in the heterogeneous graph, and $h$ is the number of heads in the multi-head attention module. The aggregation of node features involves information propagation and feature aggregation over neighbor nodes; considering the varying numbers of neighbor nodes and feature dimensions, the time complexity of the aggregation operations is generally $O(\tilde{d} \times N)$. The computational complexity of the aggregation between meta-paths is $O(p \times \tilde{d})$, where $p$ is the number of meta-paths. The upper limit of iterations for the MIMA algorithm is $L$. The total time complexity is therefore $O(L(Nd\tilde{d} + h(E \times \tilde{d}) + \tilde{d} \times N + p \times \tilde{d}))$.

4. Experiment

4.1. Experimental Design

Dataset: The effectiveness of the proposed MIMA algorithm is evaluated by measuring its performance on three publicly available large datasets, i.e., MovieLens-1m [64], Anime, and Amazon-Book [65]. These three datasets have different properties, including size, domain, and sparsity, which are shown in detail in Table 1.
MovieLens-1m: Anonymous users have a user ID, gender, age, occupation, and other attribute information; movies include comedy, romance, action, and other genres; and each user rates at least 20 movies. The training set, which consists of 80% of the interaction data, is 3.64 MB in size, and the test set, comprising the remaining 20%, is 940 KB. Anime: Each user can add anime to their completed list and rate them, and each anime has the following attribute information: anime ID, anime name, genre, type, episodes, rating, and members. The training set for this dataset is 4.8 MB, while the test set is 1.2 MB. Amazon-Book: Book reviews consist of the following properties: review ID, title, price, user ID, review helpfulness, review score, and so on. Each book contains the following feature information: title, description, authors, categories, rating, and so on. The training set size is 901 MB, and the test set is 225 MB.
From the above description, users and items in these datasets have attribute information sufficient for building a heterogeneous graph containing feature nodes. We then preprocess the three datasets by retaining only users with at least 20 item interactions. Next, 80% of the historical interaction data is randomly selected to form the training set, and the remaining interactions are used as the test set. Each transaction record in the training set is a positive instance, and negative samples are randomly selected from the items that the user has not interacted with.
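The preprocessing just described might be sketched as follows; whether the 80/20 split is drawn globally or per user is not specified, so the global variant below is an assumption of ours, as are all function names.

```python
import random
from collections import defaultdict

def preprocess(interactions, min_interactions=20, train_frac=0.8, seed=0):
    """Keep users with >= min_interactions, then randomly split 80/20."""
    rng = random.Random(seed)
    by_user = defaultdict(list)
    for user, item in interactions:
        by_user[user].append(item)
    kept = [(u, i) for u, items in by_user.items()
            if len(items) >= min_interactions for i in items]
    rng.shuffle(kept)
    cut = int(train_frac * len(kept))
    return kept[:cut], kept[cut:]  # training set, test set

def sample_negative(user, all_items, seen, rng=random):
    """Randomly pick an item the user has never interacted with."""
    while True:
        item = rng.choice(all_items)
        if item not in seen[user]:
            return item
```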
Evaluation metrics: The MIMA algorithm takes users, items, and their feature information as input to obtain the user preference scores for all items. In order to evaluate the top-K recommendation and preference ranking performance of the resulting MIMA model, we use the following widely used evaluation metrics: Recall@K, where K ∈ {20, 30, 50}; Precision@K, where K ∈ {10, 20}; and NDCG@K, where K = 20.
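For reference, these three metrics can be computed per user as in the sketch below, using their standard definitions; this is not the authors’ evaluation code.

```python
import numpy as np

def precision_recall_ndcg_at_k(ranked_items, relevant, k):
    """Top-K metrics for one user: `ranked_items` is the model's ranking,
    `relevant` is the set of held-out test items."""
    top_k = ranked_items[:k]
    hits = [1.0 if item in relevant else 0.0 for item in top_k]
    precision = sum(hits) / k
    recall = sum(hits) / max(len(relevant), 1)
    dcg = sum(h / np.log2(i + 2) for i, h in enumerate(hits))
    ideal = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    ndcg = dcg / ideal if ideal > 0 else 0.0
    return precision, recall, ndcg

p, r, n = precision_recall_ndcg_at_k([3, 7, 1, 9], relevant={7, 9, 2}, k=20)
```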
Baselines: The proposed algorithm is compared with several classic baselines to analyze and evaluate the effectiveness of the MIMA: ItemCF (item-based collaborative filtering algorithm) [66], MF (matrix factorization algorithm) [67], NCF (neural collaborative filtering algorithm) [68], NGCF (neural graph collaborative filtering algorithm) [34], PUP (price-aware GCN algorithm) [69], PinSAGE (random-walk-based GCN algorithm) [70], ATGCF (attention-based GCF) [71], ATGCN (attention-based GCN) [72], HetGNN [15], and HAN (heterogeneous graph attention network) [14].
Hyperparameters: The embedding size for all models is chosen from {32, 64, 128, 256, 512}. The learning rate is chosen from {0.0001, 0.0005, 0.001, 0.005}, the batch size is 4096, and the model has three layers. The dropout for each layer is 0.1, and the maximum number of epochs is set to 100. Note that during the training process, the user, item, and feature embeddings in the model are jointly trained. The dimension of the graph convolution operator is the sum of the number of users, the number of items, and the number of features. The generated graph treats users, items, and their features as a two-step interaction graph, and each value of the interaction matrix indicates whether an interaction exists: 1 if there is an interaction and 0 otherwise. The $L_2$ regularization coefficient is chosen from $\{10^{-7}, 10^{-6}, 10^{-5}, 10^{-4}, 10^{-3}\}$.
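A hedged sketch of this binary two-step interaction matrix is shown below, with user, item, and feature indices laid out in contiguous blocks; the block layout and function name are our assumptions.

```python
import numpy as np

def build_interaction_matrix(train, item_feats, user_feats, n_users, n_items, n_feats):
    """Binary two-step interaction matrix over users, items, and features:
    an entry is 1 if an interaction or attribute link exists, 0 otherwise."""
    n = n_users + n_items + n_feats
    A = np.zeros((n, n), dtype=np.float32)
    off_i, off_f = n_users, n_users + n_items  # index offsets per node type
    for u, i in train:        # user-item interactions
        A[u, off_i + i] = A[off_i + i, u] = 1.0
    for i, f in item_feats:   # item-feature links
        A[off_i + i, off_f + f] = A[off_f + f, off_i + i] = 1.0
    for u, f in user_feats:   # user-feature links
        A[u, off_f + f] = A[off_f + f, u] = 1.0
    return A
```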

4.2. Performance Evaluation

Table 2, Table 3 and Table 4 present the experimental results of the MIMA algorithm and its baselines across the three datasets. As shown in the tables, all evaluation metrics of the MIMA model, including Recall@K, Precision@K, and NDCG@K, are significantly better than those of all baseline models. Compared with the highest values of the other algorithms, the MIMA model improves the NDCG, precision, and recall indicators on the Anime dataset by an average of 4.72%, 3.15%, and 4.70%, respectively. On the MovieLens-1m dataset, NDCG, precision, and recall are improved by an average of 3.13%, 3.75%, and 4.71%, respectively. The average improvements in NDCG, precision, and recall reach 8.65%, 5.16%, and 8.60%, respectively, on the Amazon-Book dataset, and the highest performance improvement percentage exceeds 10%. These experimental results show that the approach proposed in this paper is feasible and effective.
Compared with the first six benchmark algorithms, which do not consider node attribute information, MIMA, ATGCF, and ATGCN, which not only use node IDs but also incorporate node attribute features, achieve better prediction effectiveness. This verifies that integrating node attribute information into feature extraction is a feasible and effective way to improve recommendations. Meanwhile, MIMA, HetGNN, and HAN integrate the meta-path semantic features of nodes in the aggregation layer for heterogeneous graphs and achieve better recommendation performance, and MIMA outperforms HetGNN and HAN by incorporating node attribute information into the meta-path semantic information aggregation process. The experimental results verify that integrating node attribute information into meta-path aggregation operations can effectively improve algorithm performance. Integrating heterogeneous graphs containing feature nodes, user and item embedded representations incorporating attribute features, and meta-path aggregation operations with feature information into MIMA model training can improve recommendation effectiveness, enhance the interpretability of recommendation results, and provide a better recommendation model. Meanwhile, a statistical significance analysis, i.e., a t-test, was conducted on the three datasets: the p-values for the Anime, MovieLens-1m, and Amazon-Book datasets were 0.002, $1.8 \times 10^{-17}$, and $6.2 \times 10^{-6}$, respectively. The results of the t-tests indicate that the improvements are statistically significant at p < 0.005. Overall, through extensive comparison on the three datasets, the proposed MIMA can effectively utilize node attributes to improve recommendation.
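For reproducibility, a paired t-test of this kind can be run with SciPy as below; the score arrays are placeholders, not the paper’s actual per-run results.

```python
from scipy import stats

# paired t-test over per-metric results of MIMA vs. the strongest baseline;
# the values below are illustrative placeholders only
mima = [0.312, 0.285, 0.198, 0.421, 0.377]
baseline = [0.298, 0.271, 0.182, 0.402, 0.359]
t, p = stats.ttest_rel(mima, baseline)
print(f"t = {t:.3f}, p = {p:.4g}")  # significant if p < 0.005
```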

4.3. Model Analysis

In order to verify the key roles played by the feature interaction layer and the meta-path aggregation layer in model optimization, we analyze the impact of (i) user and item embedded representations incorporating attribute features and (ii) meta-path aggregation operations with feature information in the recommendation model. We then design the following comparison baselines for model verification.
  • MIMA-F: user and item embedded representations incorporate attribute features, but meta-paths are not used in the aggregation operation.
  • MIMA-P: user and item embedded representations incorporate attribute features, but user and item attribute information is not used in the meta-path aggregation operations.
As shown in Figure 5, the experimental results compare the Precision@20, Recall@20, and average execution time per round of the three approaches, with MIMA achieving the best precision and recall. Specifically, MIMA-P obtains better results than MIMA-F, which indicates that integrating meta-path aggregation into model optimization leads to better results than not using meta-paths. MIMA obtains better results than MIMA-P, which demonstrates that meta-path aggregation integrating attribute information achieves better performance than meta-path aggregation without user and item features. Hence, the multi-feature interaction meta-path aggregation heterogeneous graph neural network achieves significant improvements in relevance ranking and recommendation accuracy. Meanwhile, its execution time is not much different from that of the other models. The higher density and greater number of nodes of the Anime dataset lead to more complex feature extraction and also increase the execution time due to feature interactions and meta-path aggregation over nodes and their attributes. The results show that our proposed algorithm can better utilize the attribute information of users and items and the meta-path semantic contexts with node attribute information, thereby improving the performance of the recommendation system.

4.4. The Influence of the L 2 Regularization Coefficient

Figure 6 shows the impact of the $L_2$ regularization coefficient $\lambda$ on the MIMA model. The experimental results in Figure 6 show that the optimal $\lambda$ is $1 \times 10^{-5}$ for all three datasets, i.e., MovieLens-1m, Anime, and Amazon-Book. When $\lambda$ exceeds $1 \times 10^{-5}$, the Precision@20 and Recall@20 performance of the MIMA model shows a gradually declining trend. This indicates that increasing the coefficient $\lambda$ beyond this point has a negative influence on MIMA model training, so the regularization coefficient $\lambda$ should be chosen carefully.

4.5. The Influence of the Feature Embedding Dimensions

We analyze the effect of the feature embedding dimension on the three datasets, and the experimental results are shown in Figure 7. The precision and recall indicators generally increase as the feature embedding dimension grows from 32 to 64. However, as the embedding dimension increases further, performance gradually decreases, possibly due to overfitting. The results indicate that larger embedding dimensions increase model complexity, resulting in a downward trend in MIMA’s performance indicators. Therefore, when choosing the feature embedding dimension, there is a trade-off between algorithm complexity and model performance that needs to be balanced.

4.6. The Convergence of the MIMA Algorithm

Figure 8 demonstrates that the loss function value gradually decreases and becomes stable as the number of training iterations increases. Through analyzing the results, we find that the value of the loss function drops rapidly in the initial stage, indicating the algorithm is learning the preference relationships between users and items. After a certain number of iterations, the loss function maintains small fluctuations. Meanwhile, the weight of the recommendation model reaches an optimal value and can accurately predict the user preferences. Moreover, the BPR and MF loss function values tend to stabilize after several iterations, and the MIMA model achieves convergence.

4.7. The Influence of the Meta-Path Length

Figure 9 illustrates the impact of the meta-path length on the MIMA algorithm’s performance. The precision and recall indicators first increase and then decrease as the meta-path length grows from 1 to 5. Shorter meta-paths may lose feature information and weaken generalization capability because they lack the refinement of complex relationships; in particular, a meta-path with only one node cannot capture deeper association information, resulting in poor recommendation performance. Conversely, too many nodes in a meta-path introduce noise and redundant information, making it difficult for the algorithm to extract useful features from the meta-path. Therefore, there is a trade-off between the number of nodes and model performance when choosing the meta-path length. In the MIMA algorithm, the optimal results are achieved when the meta-path includes three nodes.

4.8. Suitability Verification

Our multi-feature interaction meta-path aggregation heterogeneous graph neural network (MIMA) is designed to enhance the interpretability of recommendations by integrating node attributes and meta-path information. This approach allows for a more transparent understanding of the factors influencing each recommendation.
By incorporating attribute-enhanced embeddings and meta-path aggregation, MIMA not only improves recommendation accuracy but also provides a clear explanation for its suggestions. This ensures that our model’s recommendations are both effective and interpretable.
An example is the KuaiRec dataset, a full-exposure dataset for recommender systems. We also used this dataset in our experiments but did not include it in the main text because few baselines report results on it. As shown below, MIMA works well on this dataset as well.
Figure 10, which illustrates the precision and recall curves from our experiments on the KuaiRec dataset, underscores the interpretability of the MIMA model. The KuaiRec dataset, with its full-exposure nature and a density of up to 99.6%, provides a rich heterogeneous graph with 30 user features and 56 video features. On this dataset, MIMA yielded a recall rate exceeding 40%, demonstrating its effectiveness on general heterogeneous graphs and its ability to produce highly interpretable recommendation results.
Potential applications in various domains: The adaptability of the MIMA extends beyond entertainment recommendations. For example, in healthcare, the MIMA could be used to recommend treatments or medications by analyzing patient histories and the medical literature. In finance, it could suggest investment opportunities by considering an investor’s portfolio and market trends. Similarly, in social networks, the MIMA could recommend friends or content based on user interactions and preferences. These applications showcase the broader impact and adaptability of the MIMA algorithm.
In summary, the algorithm proposed in this paper can make full use of user and item attribute information to improve recommendation accuracy. Meanwhile, the experimental results also verify that attribute information has a great impact on recommendation results. Therefore, the multi-feature interaction meta-path aggregation heterogeneous graph neural network is feasible and efficient for recommendations.

5. Conclusions

This paper focuses on enhancing the effectiveness of recommendation algorithms for heterogeneous graphs. We propose the multi-feature interaction meta-path aggregation heterogeneous graph neural network (MIMA), which leverages feature interaction and meta-path aggregation to improve recommendation capabilities. In order to better learn the impact of attribute information on the MIMA, heterogeneous graphs consisting of user nodes, item nodes, and their attribute nodes are constructed. Firstly, a multi-head attention mechanism is employed to fuse the attribute features of users and items in the feature interaction layer and to explore the latent preferences of users and items in the embedding vector representation of multi-feature fusion. Then, a graph convolution operation is applied to aggregate the semantic features of the constructed meta-path information, which consists of attribute information; in subsequent steps, high-order feature fusion captures new recommendation rules between users and items by using attribute information. Moreover, the weights of the MIMA model and the optimal embedding vectors of users and items are generated for recommendations after multiple training iterations. Finally, the experimental results on three public datasets show that the MIMA algorithm proposed in this article significantly outperforms the baselines in terms of the NDCG, precision, and recall evaluation indicators.

Author Contributions

Conceptualization, Y.L. and S.Y.; methodology, Y.L. and F.Z.; software, S.Y. and F.Z.; validation, Y.L., S.Y. and F.Z.; formal analysis, Y.J.; investigation, Y.J., S.C. and L.W.; resources, Y.L.; data curation, F.Z.; writing—original draft preparation, Y.L. and S.Y.; writing—review and editing, Y.L., F.Z., L.W. and L.M.; visualization, S.Y.; supervision, L.M.; project administration, L.M.; funding acquisition, Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Beijing Natural Science Foundation (4234083), the R&D Program of the Beijing Municipal Education Commission (KM202410009003), and the National Key R&D Program of China (2023YFC3107804).

Data Availability Statement

No datasets were generated during the current study.

Conflicts of Interest

Lei Wang is an employee of the Henan Provincial Bureau of Statistics. The authors declare no conflicts of interest.

References

  1. Zhang, P.; Chartrand, G. Introduction to Graph Theory; Tata McGraw-Hill: New Delhi, India, 2006. [Google Scholar]
  2. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
  3. Wang, D.; Cui, P.; Zhu, W. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 1225–1234. [Google Scholar]
  4. Battaglia, P.; Pascanu, R.; Lai, M.; Jimenez Rezende, D. Interaction networks for learning about objects, relations and physics. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar] [CrossRef]
  5. Fout, A.; Byrd, J.; Shariat, B.; Ben-Hur, A. Protein interface prediction using graph convolutional networks. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar]
  6. Li, Y.; Yu, R.; Shahabi, C.; Liu, Y. Diffusion convolutional recurrent neural network: Data-driven traffic forecasting. arXiv 2017, arXiv:1707.01926. [Google Scholar]
  7. Zhang, J.; Shi, X.; Xie, J.; Ma, H.; King, I.; Yeung, D.Y. GaAN: Gated Attention Networks for Learning on Large and Spatiotemporal Graphs. In Proceedings of the 34th Conference on Uncertainty in Artificial Intelligence 2018, UAI 2018, Monterey, CA, USA, 6–10 August 2018. [Google Scholar]
  8. Jiang, W.; Luo, J. Graph neural network for traffic forecasting: A survey. Expert Syst. Appl. 2022, 207, 117921. [Google Scholar] [CrossRef]
  9. Ge, J.; Xu, G.; Zhang, Y.; Lu, J.; Chen, H.; Meng, X. Joint Optimization of Computation, Communication and Caching in D2D-Assisted Caching-Enhanced MEC System. Electronics 2023, 12, 3249. [Google Scholar] [CrossRef]
  10. Atwood, J.; Towsley, D. Diffusion-convolutional neural networks. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar]
  11. Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data. Adv. Neural Inf. Process. Syst. 2013, 26. [Google Scholar]
  12. Sun, Z.; Deng, Z.H.; Nie, J.Y.; Tang, J. RotatE: Knowledge Graph Embedding by Relational Rotation in Complex Space. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  13. Cai, L.; Ji, S. A Multi-Scale Approach for Graph Link Prediction. Proc. AAAI Conf. Artif. Intell. 2020, 34, 3308–3315. [Google Scholar] [CrossRef]
  14. Wang, X.; Ji, H.; Shi, C.; Wang, B.; Ye, Y.; Cui, P.; Yu, P.S. Heterogeneous Graph Attention Network. In Proceedings of the World Wide Web Conference, WWW ’19, San Francisco, CA, USA, 13–17 May 2019; pp. 2022–2032. [Google Scholar]
  15. Zhang, C.; Song, D.; Huang, C.; Swami, A.; Chawla, N.V. Heterogeneous Graph Neural Network. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’19, Anchorage, AK, USA, 4–8 August 2019; pp. 793–803. [Google Scholar]
  16. Berg, R.v.d.; Kipf, T.N.; Welling, M. Graph convolutional matrix completion. arXiv 2017, arXiv:1706.02263. [Google Scholar]
  17. Zhang, J.; Shi, X.; Zhao, S.; King, I. STAR-GCN: Stacked and Reconstructed Graph Convolutional Networks for Recommender Systems. In Proceedings of the IJCAI International Joint Conference on Artificial Intelligence, Macao, China, 10–16 August 2019; p. 4264. [Google Scholar]
  18. Bing, R.; Yuan, G.; Zhu, M.; Meng, F.; Ma, H.; Qiao, S. Heterogeneous graph neural networks analysis: A survey of techniques, evaluations and applications. Artif. Intell. Rev. 2023, 56, 8003–8042. [Google Scholar] [CrossRef]
  19. Wu, S.; Sun, F.; Zhang, W.; Xie, X.; Cui, B. Graph Neural Networks in Recommender Systems: A Survey. ACM Comput. Surv. 2022, 55, 1–37. [Google Scholar] [CrossRef]
  20. Kipf, T.N.; Welling, M. Semi-Supervised Classification with Graph Convolutional Networks. In Proceedings of the International Conference on Learning Representations, Toulon, France, 24–26 April 2017. [Google Scholar]
  21. Veličković, P.; Cucurull, G.; Casanova, A.; Romero, A.; Liò, P.; Bengio, Y. Graph Attention Networks. In Proceedings of the International Conference on Learning Representations, Vancouver, BC, Canada, 30 April–3 May 2018. [Google Scholar]
  22. Abu-El-Haija, S.; Perozzi, B.; Kapoor, A.; Alipourfard, N.; Lerman, K.; Harutyunyan, H.; Steeg, G.V.; Galstyan, A. MixHop: Higher-Order Graph Convolutional Architectures via Sparsified Neighborhood Mixing. In Proceedings of the 36th International Conference on Machine Learning, PMLR ’39, Long Beach, CA, USA, 9–15 June 2019; Volume 97, pp. 21–29. [Google Scholar]
  23. Zhang, Y.; Wang, X.; Shi, C.; Jiang, X.; Ye, Y. Hyperbolic Graph Attention Network. IEEE Trans. Big Data 2022, 8, 1690–1701. [Google Scholar] [CrossRef]
  24. Zhang, Y.; Wu, L.; Shen, Q.; Pang, Y.; Wei, Z.; Xu, F.; Chang, E.; Long, B. Graph Learning Augmented Heterogeneous Graph Neural Network for Social Recommendation. ACM Trans. Recomm. Syst. 2023, 1, 1–22. [Google Scholar] [CrossRef]
  25. Zhao, J.; Wang, X.; Shi, C.; Hu, B.; Song, G.; Ye, Y. Heterogeneous Graph Structure Learning for Graph Neural Networks. Proc. AAAI Conf. Artif. Intell. 2021, 35, 4697–4705. [Google Scholar] [CrossRef]
  26. Zhang, C.; Swami, A.; Chawla, N.V. SHNE: Representation Learning for Semantic-Associated Heterogeneous Networks. In Proceedings of the Twelfth ACM International Conference on Web Search and Data Mining, WSDM ’19, Melbourne, VIC, Australia, 11–15 February 2019; pp. 690–698. [Google Scholar]
  27. Zhou, J.; Cui, G.; Hu, S.; Zhang, Z.; Yang, C.; Liu, Z.; Wang, L.; Li, C.; Sun, M. Graph neural networks: A review of methods and applications. AI Open 2020, 1, 57–81. [Google Scholar] [CrossRef]
  28. Zhang, Z.; Cui, P.; Zhu, W. Deep Learning on Graphs: A Survey. IEEE Trans. Knowl. Data Eng. 2022, 34, 249–270. [Google Scholar] [CrossRef]
  29. Grover, A.; Leskovec, J. node2vec: Scalable Feature Learning for Networks. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’16, San Francisco, CA, USA, 24–27 August 2016; pp. 855–864. [Google Scholar]
  30. Perozzi, B.; Al-Rfou, R.; Skiena, S. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’14, New York, NY, USA, 24–27 August 2014; pp. 701–710. [Google Scholar]
  31. Chen, H.; Yeh, C.C.M.; Fan, Y.; Zheng, Y.; Wang, J.; Lai, V.; Das, M.; Yang, H. Sharpness-Aware Graph Collaborative Filtering. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’23, Taipei, Taiwan, 23–27 July 2023; pp. 2369–2373. [Google Scholar]
  32. Sun, W.; Chang, K.; Zhang, L.; Meng, K. INGCF: An Improved Recommendation Algorithm Based on NGCF. In Algorithms and Architectures for Parallel Processing; Lai, Y., Wang, T., Jiang, M., Xu, G., Liang, W., Castiglione, A., Eds.; Springer: Cham, Switzerland, 2022; pp. 116–129. [Google Scholar]
  33. He, X.; Deng, K.; Wang, X.; Li, Y.; Zhang, Y.; Wang, M. LightGCN: Simplifying and Powering Graph Convolution Network for Recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’20, Xi’an, China, 25–30 July 2020; pp. 639–648. [Google Scholar]
  34. Wang, X.; He, X.; Wang, M.; Feng, F.; Chua, T.S. Neural Graph Collaborative Filtering. In Proceedings of the 42nd International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR’19, Paris, France, 21–25 July 2019; pp. 165–174. [Google Scholar]
  35. Xu, K.; Hu, W.; Leskovec, J.; Jegelka, S. How Powerful are Graph Neural Networks? In Proceedings of the International Conference on Learning Representations, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
  36. Wei, Y.; Liu, W.; Liu, F.; Wang, X.; Nie, L.; Chua, T.S. LightGT: A Light Graph Transformer for Multimedia Recommendation. In Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR ’23, Taipei, Taiwan, 23–27 July 2023; pp. 1508–1517. [Google Scholar]
  37. Yang, Z.; Dong, S. HAGERec: Hierarchical Attention Graph Convolutional Network Incorporating Knowledge Graph for Explainable Recommendation. Knowl. Based Syst. 2020, 204, 106194. [Google Scholar] [CrossRef]
  38. Zhao, J.; Liu, Z.; Sun, Q.; Li, Q.; Jia, X.; Zhang, R. Attention-based dynamic spatial-temporal graph convolutional networks for traffic speed forecasting. Expert Syst. Appl. 2022, 204, 117511. [Google Scholar] [CrossRef]
  39. Su, Y.; Zhao, Y.; Erfani, S.; Gan, J.; Zhang, R. Detecting Arbitrary Order Beneficial Feature Interactions for Recommender Systems. In Proceedings of the 28th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’22, Washington, DC, USA, 14–18 August 2022; pp. 1676–1686. [Google Scholar]
  40. Kim, M.; Choi, H.-S.; Kim, J. Explicit Feature Interaction-Aware Graph Neural Network. IEEE Access 2024, 12, 15438–15446. [Google Scholar] [CrossRef]
  41. Liu, F.; Cheng, Z.; Zhu, L.; Liu, C.; Nie, L. An Attribute-Aware Attentive GCN Model for Attribute Missing in Recommendation. IEEE Trans. Knowl. Data Eng. 2022, 34, 4077–4088. [Google Scholar] [CrossRef]
  42. Yu, J.; Yin, H.; Li, J.; Gao, M.; Huang, Z.; Cui, L. Enhancing Social Recommendation with Adversarial Graph Convolutional Networks. IEEE Trans. Knowl. Data Eng. 2022, 34, 3727–3739. [Google Scholar] [CrossRef]
  43. Li, M.; Zhang, L.; Cui, L.; Bai, L.; Li, Z.; Wu, X. BLoG: Bootstrapped graph representation learning with local and global regularization for recommendation. Pattern Recogn. 2023, 144, 109874. [Google Scholar] [CrossRef]
  44. Fan, S.; Zhu, J.; Han, X.; Shi, C.; Hu, L.; Ma, B.; Li, Y. Metapath-guided heterogeneous graph neural network for intent recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2478–2486. [Google Scholar]
  45. Fu, X.; Zhang, J.; Meng, Z.; King, I. MAGNN: Metapath Aggregated Graph Neural Network for Heterogeneous Graph Embedding. In Proceedings of the Web Conference 2020, WWW ’20, Taipei, Taiwan, 20–24 April 2020; pp. 2331–2341. [Google Scholar]
  46. Ji, Z.; Wu, M.; Yang, H.; Íñigo, J.E.A. Temporal sensitive heterogeneous graph neural network for news recommendation. Future Gener. Comput. Syst. 2021, 125, 324–333. [Google Scholar] [CrossRef]
  47. Cai, D.; Qian, S.; Fang, Q.; Hu, J.; Xu, C. User cold-start recommendation via inductive heterogeneous graph neural network. ACM Trans. Inf. Syst. 2023, 41, 1–27. [Google Scholar] [CrossRef]
  48. Yang, X.; Yan, M.; Pan, S.; Ye, X.; Fan, D. Simple and Efficient Heterogeneous Graph Neural Network. Proc. AAAI Conf. Artif. Intell. 2023, 37, 10816–10824. [Google Scholar] [CrossRef]
  49. Jin, B.; Zhang, Y.; Zhu, Q.; Han, J. Heterformer: Transformer-based Deep Node Representation Learning on Heterogeneous Text-Rich Networks. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, KDD ’23, Long Beach, CA, USA, 6–10 August 2023; pp. 1020–1031. [Google Scholar]
  50. Gao, X.; Zhang, W.; Chen, T.; Yu, J.; Nguyen, H.Q.V.; Yin, H. Semantic-aware Node Synthesis for Imbalanced Heterogeneous Information Networks. In Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, CIKM ’23, Birmingham, UK, 21–25 October 2023; pp. 545–555. [Google Scholar]
  51. Zheng, S.; Guan, D.; Yuan, W. Semantic-aware heterogeneous information network embedding with incompatible meta-paths. World Wide Web 2022, 25, 1–21. [Google Scholar] [CrossRef]
  52. Chen, M.; Huang, C.; Xia, L.; Wei, W.; Xu, Y.; Luo, R. Heterogeneous Graph Contrastive Learning for Recommendation. In Proceedings of the Sixteenth ACM International Conference on Web Search and Data Mining, WSDM ’23, Singapore, 27 February–3 March 2023; pp. 544–552. [Google Scholar]
  53. Park, J.; Song, J.; Yang, E. Graphens: Neighbor-aware ego network synthesis for class-imbalanced node classification. In Proceedings of the Tenth International Conference on Learning Representations, ICLR, Virtual, 25–29 April 2022. [Google Scholar]
  54. Ragesh, R.; Sellamanickam, S.; Iyer, A.; Bairi, R.; Lingam, V. HeteGCN: Heterogeneous Graph Convolutional Networks for Text Classification. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, WSDM ’21, Virtual, 8–12 March 2021; pp. 860–868. [Google Scholar]
  55. Zhao, J.; Wang, X.; Shi, C.; Liu, Z.; Ye, Y. Network schema preserving heterogeneous information network embedding. In Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence, IJCAI’20, Yokohama, Japan, 7–15 January 2021. [Google Scholar]
  56. Hong, H.; Guo, H.; Lin, Y.; Yang, X.; Li, Z.; Ye, J. An Attention-Based Graph Neural Network for Heterogeneous Structural Learning. Proc. AAAI Conf. Artif. Intell. 2020, 34, 4132–4139. [Google Scholar] [CrossRef]
  57. Yang, Y.; Guan, Z.; Li, J.; Zhao, W.; Cui, J.; Wang, Q. Interpretable and Efficient Heterogeneous Graph Convolutional Network. IEEE Trans. Knowl. Data Eng. 2023, 35, 1637–1650. [Google Scholar] [CrossRef]
58. Dong, Y.; Chawla, N.V.; Swami, A. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 135–144. [Google Scholar]
  59. Shang, J.; Qu, M.; Liu, J.; Kaplan, L.M.; Han, J.; Peng, J. Meta-path guided embedding for similarity search in large-scale heterogeneous information networks. arXiv 2016, arXiv:1610.09769. [Google Scholar]
  60. Fu, T.Y.; Lee, W.C.; Lei, Z. Hin2vec: Explore meta-paths in heterogeneous information networks for representation learning. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 1797–1806. [Google Scholar]
61. Shi, C.; Hu, B.; Zhao, W.X.; Yu, P.S. Heterogeneous information network embedding for recommendation. IEEE Trans. Knowl. Data Eng. 2019, 31, 357–370. [Google Scholar] [CrossRef]
  62. Song, W.; Shi, C.; Xiao, Z.; Duan, Z.; Xu, Y.; Zhang, M.; Tang, J. AutoInt: Automatic Feature Interaction Learning via Self-Attentive Neural Networks. In Proceedings of the 28th ACM International Conference on Information and Knowledge Management, CIKM ’19, Beijing, China, 3–7 November 2019; pp. 1161–1170. [Google Scholar]
  63. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar]
  64. Harper, F.M.; Konstan, J.A. The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst. 2015, 5, 1–19. [Google Scholar] [CrossRef]
  65. He, R.; McAuley, J. Ups and Downs: Modeling the Visual Evolution of Fashion Trends with One-Class Collaborative Filtering. In Proceedings of the 25th International Conference on World Wide Web, Montreal, QC, Canada, 11–15 April 2016; pp. 507–517. [Google Scholar]
  66. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, WWW ’01, Hong Kong, China, 1–5 May 2001; pp. 285–295. [Google Scholar]
  67. Rendle, S.; Krichene, W.; Zhang, L.; Anderson, J. Neural Collaborative Filtering vs. Matrix Factorization Revisited. In Proceedings of the 14th ACM Conference on Recommender Systems, RecSys ’20, Virtual, 22–26 September 2020; pp. 240–248. [Google Scholar]
  68. He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.S. Neural Collaborative Filtering. In Proceedings of the 26th International Conference on World Wide Web, WWW ’17, Perth, Australia, 3–7 April 2017; pp. 173–182. [Google Scholar]
  69. Zheng, Y.; Gao, C.; He, X.; Li, Y.; Jin, D. Price-aware Recommendation with Graph Convolutional Networks. In Proceedings of the 2020 IEEE 36th International Conference on Data Engineering (ICDE), Dallas, TX, USA, 20–24 April 2020; pp. 133–144. [Google Scholar]
  70. Ying, R.; He, R.; Chen, K.; Eksombatchai, P.; Hamilton, W.L.; Leskovec, J. Graph Convolutional Neural Networks for Web-Scale Recommender Systems. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, KDD ’18, London, UK, 19–23 August 2018; pp. 974–983. [Google Scholar]
  71. Ma, L.; Chen, Z.; Fu, Y.; Li, Y. Heterogeneous Graph Neural Network for Multi-behavior Feature-Interaction Recommendation. In Proceedings of the Artificial Neural Networks and Machine Learning—ICANN 2022, Bristol, UK, 6–9 September 2022; Pimenidis, E., Angelov, P., Jayne, C., Papaleonidas, A., Aydin, M., Eds.; Springer: Cham, Switzerland, 2022; pp. 101–112. [Google Scholar]
  72. Li, Y.; Zhao, F.; Chen, Z.; Fu, Y.; Ma, L. Multi-Behavior Enhanced Heterogeneous Graph Convolutional Networks Recommendation Algorithm based on Feature-Interaction. Appl. Artif. Intell. 2023, 37, 2201144. [Google Scholar] [CrossRef]
Figure 1. The heterogeneous graph with attributes and meta-path construction.
Figure 2. Meta-path encoding and aggregation strategy.
Figure 3. The aggregation layer of the MIMA algorithm.
Figure 4. The comprehensive model architecture diagram illustrating the proposed MIMA algorithm.
Figure 5. The impact of different feature extraction methods and the presence or absence of meta-paths on the MIMA model.
Figure 6. The impact of different regularization coefficients on the results of the MIMA model.
Figure 7. The impact of key parameter feature embedding dimensions on the results of the MIMA model.
Figure 8. (a) The result of the BPR loss function changing with training time. (b) The result of the MF loss function value changing with training time.
Figure 9. The impact of the length of the meta-path on the results.
Figure 10. Interpretability analysis of the MIMA model on the KuaiRec dataset.
Table 1. Dataset statistics.

Dataset        #Users   #Items   #Interactions   Density
Anime          15,506   34,325   2,601,998       0.00489
MovieLens-1m   6040     5953     1,000,209       0.04189
Amazon-Book    51,639   84,355   2,648,963       0.00056
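Density in Table 1 is the fraction of the full user–item interaction matrix that is observed, i.e., interactions divided by (users × items). A minimal sketch of this calculation (the function name is illustrative, not from the paper):

```python
def interaction_density(num_users: int, num_items: int, num_interactions: int) -> float:
    """Fraction of all possible user-item pairs that are observed interactions."""
    return num_interactions / (num_users * num_items)

# Reproduces the Anime row of Table 1: 2,601,998 / (15,506 * 34,325)
print(round(interaction_density(15_506, 34_325, 2_601_998), 5))  # 0.00489
```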
Table 2. The performance evaluation on the Anime dataset and improvement compared to the optimal baseline.

Algorithm     Recall@20  Recall@30  Recall@50  Precision@10  Precision@20  NDCG@20
ItemCF        0.0286     0.0304     0.0412     0.0260        0.0201        0.0115
MF            0.0354     0.0446     0.0644     0.0311        0.0231        0.0214
NCF           0.0402     0.0542     0.0685     0.0342        0.0256        0.0256
NGCF          0.0434     0.0580     0.0772     0.0358        0.0294        0.0294
PUP           0.0482     0.0574     0.0784     0.0336        0.0296        0.0304
PinSage       0.0429     0.0553     0.0756     0.0329        0.0272        0.0295
ATGCF         0.0535     0.0616     0.0806     0.0372        0.0308        0.0320
ATGCN         0.0539     0.0621     0.0815     0.0377        0.0313        0.0322
HetGNN        0.0552     0.0607     0.0833     0.0476        0.0359        0.0341
HAN           0.0547     0.0598     0.0841     0.0452        0.0343        0.0335
MIMA          0.0566     0.0653     0.0910     0.0489        0.0372        0.0357
Improvement   2.5%       5.2%       8.2%       2.7%          3.6%          4.7%
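Recall@K, Precision@K, and NDCG@K in Tables 2–4 are standard top-K ranking metrics, averaged over all test users. A minimal per-user sketch under the common binary-relevance convention; since the paper does not publish its evaluation code, details such as tie handling are assumptions:

```python
import math

def recall_at_k(ranked: list, relevant: set, k: int) -> float:
    """Share of the user's relevant items retrieved in the top-k list."""
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant) if relevant else 0.0

def precision_at_k(ranked: list, relevant: set, k: int) -> float:
    """Share of the top-k recommendations that are relevant."""
    return sum(1 for item in ranked[:k] if item in relevant) / k

def ndcg_at_k(ranked: list, relevant: set, k: int) -> float:
    """Top-k DCG with binary relevance, normalized by the ideal DCG."""
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0
```

The reported scores are the means of these per-user values over the test set.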
Table 3. The performance evaluation on the MovieLens-1m dataset and improvement compared to the optimal baseline.

Algorithm     Recall@20  Recall@30  Recall@50  Precision@10  Precision@20  NDCG@20
ItemCF        0.0214     0.0238     0.0512     0.0332        0.0327        0.0396
MF            0.0384     0.0425     0.0748     0.0559        0.0445        0.0425
NCF           0.0401     0.0554     0.0864     0.0532        0.0496        0.0489
NGCF          0.0443     0.0647     0.1026     0.0604        0.0584        0.0504
PUP           0.0455     0.0665     0.1048     0.0588        0.0596        0.0548
PinSage       0.0431     0.0607     0.1015     0.0496        0.0478        0.0537
ATGCF         0.0482     0.0689     0.1070     0.0645        0.0627        0.0581
ATGCN         0.0491     0.0695     0.1093     0.0656        0.0635        0.0593
HetGNN        0.0533     0.0702     0.1033     0.0771        0.0747        0.0572
HAN           0.0529     0.0677     0.0995     0.0767        0.0729        0.0559
MIMA          0.0545     0.0712     0.1170     0.0820        0.0755        0.0621
Improvement   2.3%       1.4%       7.0%       6.4%          1.1%          4.7%
Table 4. The performance evaluation on the Amazon-Book dataset and improvement compared to the optimal baseline.

Algorithm     Recall@20  Recall@30  Recall@50  Precision@10  Precision@20  NDCG@20
ItemCF        0.0184     0.0196     0.0452     0.0186        0.0195        0.0356
MF            0.0250     0.0267     0.0624     0.0254        0.0201        0.0518
NCF           0.0265     0.0288     0.0665     0.0262        0.0225        0.0542
NGCF          0.0348     0.0387     0.0731     0.0328        0.0257        0.0630
PUP           0.0382     0.0415     0.0754     0.0316        0.0285        0.0624
PinSage       0.0283     0.0357     0.0710     0.0317        0.0229        0.0545
ATGCF         0.0408     0.0426     0.0775     0.0358        0.0298        0.0651
ATGCN         0.0416     0.0433     0.0783     0.0367        0.0309        0.0658
HetGNN        0.0412     0.0429     0.0780     0.0381        0.0371        0.0662
HAN           0.0432     0.0455     0.0729     0.0377        0.0359        0.0649
MIMA          0.0475     0.0503     0.0834     0.0410        0.0381        0.0719
Improvement   9.9%       10.5%      6.5%       7.6%          2.7%          8.6%
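Each "Improvement" row reads as MIMA's relative gain over the strongest baseline in that column, i.e., (MIMA − best baseline) / best baseline; the reported percentages are consistent with this reading. A minimal sketch of the calculation (the helper name is illustrative):

```python
def relative_improvement(proposed: float, baseline_scores: list[float]) -> float:
    """Relative gain of the proposed model over the strongest baseline."""
    best_baseline = max(baseline_scores)
    return (proposed - best_baseline) / best_baseline

# NDCG@20 column of Table 4: HetGNN (0.0662) is the best baseline; MIMA scores 0.0719.
ndcg20_baselines = [0.0356, 0.0518, 0.0542, 0.0630, 0.0624,
                    0.0545, 0.0651, 0.0658, 0.0662, 0.0649]
print(f"{relative_improvement(0.0719, ndcg20_baselines):.1%}")  # 8.6%
```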