Article

Hierarchical Self-Supervised Learning for Knowledge-Aware Recommendation

College of Intelligence Science and Technology, National University of Defense Technology, Changsha 410073, China
* Author to whom correspondence should be addressed.
† These authors contributed equally to this work.
Appl. Sci. 2024, 14(20), 9394; https://doi.org/10.3390/app14209394
Submission received: 24 July 2024 / Revised: 16 September 2024 / Accepted: 9 October 2024 / Published: 15 October 2024

Abstract

Knowledge-aware recommender systems have shown superior performance by connecting the user–item interaction graph (UIG) with the knowledge graph (KG) and enriching the semantic connections captured by the corresponding networks. Among existing methods, self-supervised learning has attracted the most attention for its effectiveness in extracting node self-discrimination auxiliary supervision, which can largely improve recommendation quality. However, existing methods usually adopt a single (either node or edge) perspective for representation learning, over-emphasizing the pair-wise topology structure in the graph and thus overlooking the important semantic information among neighborhood-wise connections, which limits recommendation performance. To solve this problem, we propose Hierarchical self-supervised learning for Knowledge-aware Recommendation (HKRec). The hierarchical property of the method shows in two respects. First, to better reveal the semantic relations in the knowledge graph, we design a Triple-Graph Masked Autoencoder (T-GMAE) that forces the network to estimate masked node features, node connections, and node degrees. Second, to better align the user–item recommendation knowledge with the common knowledge, we conduct contrastive learning in a hybrid way, i.e., both neighborhood-level and edge-level dropout are adopted in parallel to allow more comprehensive information distillation. We conduct an in-depth experimental evaluation on three real-world datasets, comparing the proposed HKRec with state-of-the-art baseline models to demonstrate its effectiveness and superiority. Recall@20 and NDCG@20 improve by 2.2% to 24.95% and 3.38% to 22.32%, respectively, on the Last-FM dataset, by 7.0% to 23.82% and 5.7% to 39.66% on the MIND dataset, and by 1.76% to 34.73% and 1.62% to 35.13% on the Alibaba-iFashion dataset.

1. Introduction

Recommender systems play a crucial role in helping users discover items of interest by analyzing their historical preferences and behaviors, addressing the challenge of information overload caused by the exponential growth of data on the Internet [1]. These systems are widely applied in various domains, including e-commerce platforms [2], social websites [3], and music streaming services [4]. Among contemporary approaches, collaborative filtering (CF) [5,6] is a typical framework that leverages users' behavioral data to measure user–item similarity, leading to effective recommendation services. However, CF-based methods rely heavily on historical user–item interactions, making them inadequate for addressing challenges such as data sparsity and cold start, thus restricting accurate prediction of user preferences. To overcome these limitations, we incorporate the structured knowledge graph (KG) into recommender systems. The KG offers additional item-wise connectivity information beyond the user–item interaction graph (UIG) and provides fine-grained attribute features, facilitating the capture of the true relatedness behind user behavior and interests [7,8].
The KG serves as a valuable resource, containing rich factual information about items and their relationships. By introducing meaningful inherent semantic relatedness into user behavior data, we generate high-quality user and item representations, mitigating data sparsity and ensuring better recommendation performance. The prevalent KG-aware methods can be divided into three categories: embedding-based, path-based, and GNN-based. Specifically, early embedding-based methods [8,9,10,11] leverage knowledge graph embedding algorithms, such as TransE [12] and TransR [13], to encode entities as vector representations, subsequently integrating them into the recommendation framework. Despite being effective, the aforementioned methods fall short in exploiting the high-order connectivity within the KG and fail to provide reasonable explanations of user preferences [14]. To capture multi-hop interactions between users and items, path-based methods enhance representations by considering the meaningful long-range structure within the KG [15,16,17]. This structure serves as propagation paths for user preferences, providing an intuitive demonstration of interpretability. Nevertheless, these methods rely on manually constructed meta-paths, demanding substantial human effort and exhibiting poor scalability. Recently, with the rise of graph neural networks (GNNs), there has been strong research interest in leveraging GNNs for recursive information propagation among KG nodes to capture high-order information [18,19,20,21,22].
Although these methods have shown strong performance, the integration of KG also introduces novel challenges, particularly concerning data sparsity and the long-tail issue inherent in the KG [23]. To address the scarcity of the supervision signal in sparse data scenarios, self-supervised learning (SSL) [24], which can be generally categorized into generative and contrastive, has garnered significant attention as an emerging learning paradigm capable of reducing dependency on manual labels. Several recent SSL-based methods [25,26] leverage different data augmentation strategies to construct contrasting views of the UIG, thereby exploiting the intrinsic structure and patterns within the UIG data to enhance recommendation performance. Furthermore, SSL-based methods that incorporate KG, such as KGCL [27], KGTN [28] and HCVCL [29], aim to create contrasting views within both the KG and UIG data, enabling items and users to learn from the rich semantic and contextual information embedded in the KG. KAUR [30] unifies UIG and KG as a collaborative knowledge graph and treats neighbor nodes as contrastive positive pairs. KGRec [31] proposes a unified framework that combines two self-supervised learning paradigms, namely generative and contrastive learning. These methods learn user and item representations from the available original data autonomously. As we can see, SSL-based methods for recommendation mitigate sparsity and cold-start issues by eliminating model dependence on large volumes of labeled data.
Even with the success of the methods mentioned above, most SSL-based efforts in knowledge-aware recommendation typically focus on uniform frameworks for learning representations, with limited exploration of hierarchical learning. They commonly employ contrastive learning paradigms, either by constructing multiple contrasting views between the KG and UIG (e.g., HCVCL [29]) or by introducing cross-knowledge graph structures to form multiple contrasting views (e.g., KGIN [23], MCCLK [32]). In the process of graph augmentation, they particularly emphasize edge-level disruption and learning, over-emphasizing the pair-wise topology structure, which refers to the one-to-one connections between nodes. As a result, they overlook the richness of the KG structure, especially the important semantic and structural information inherent in neighborhood-wise connections (the many-to-one connections where a node is linked to multiple neighboring nodes). This limitation restricts their ability to deeply explore the neighborhood-wise semantic connections between KG entities and UIG nodes. Notably, such coarse-grained semantic understanding can result in low-quality user–item interaction representations, negatively impacting recommendation performance.
To overcome this limitation, we emphasize a comprehensive strategy that processes KG facts and UIG data in a more effective and fine-grained manner. We capture the implicit relatedness between entities and items, obtaining high-quality representations for recommendation. Therefore, we propose a method named Hierarchical self-supervised learning for Knowledge-aware Recommendation (HKRec), which seeks to maximize the utilization of explicit hierarchical knowledge to acquire valuable implicit semantics. This enables the model to enhance user–item representations from two perspectives. First, we introduce the Triple-Graph Masked Autoencoder (T-GMAE), designed to better reveal the inherent semantic relations within the KG. T-GMAE achieves this by hierarchically reasoning over the masked node features, connections, and degrees. Second, we incorporate contrastive learning in a hybrid manner, employing both neighborhood-level and edge-level augmentation in parallel. This bridges the latent information mining of the KG with the user–item interaction modeling via the two augmentation operators, achieving more comprehensive information distillation. The parallel contrastive learning paradigm is intended to distill auxiliary supervision signals, ensuring that the model can effectively integrate information hierarchically for improved entity–item alignment.
In summary, we outline the contributions of HKRec as follows:
  • We highlight the significance of the hierarchical learning mechanism in both KG and UIG for knowledge graph-aware recommendation within a joint self-supervised learning paradigm. This is crucial as it generates more valuable self-supervised signals for item and user representations, enhancing the ability to mine useful knowledge in scenarios of data sparsity.
  • We present HKRec, a model that unifies the generative and contrastive learning paradigms in a hierarchical manner. HKRec captures implicit semantic information from the knowledge graph through a multi-perspective hierarchy and integrates the resulting representations into recommender systems through hierarchical entity–item alignment.
  • We perform extensive experiments on three real-world benchmark datasets to verify that HKRec achieves significant performance compared to recent state-of-the-art baselines.
The remainder of the paper is organized as follows. Section 2 summarizes the related work. Section 3 formally presents our studied task. Section 4 introduces the HKRec model, providing an in-depth introduction to its key components. Section 5 compares the recommendation performance of the proposed model with benchmark models on real-world datasets and includes ablation studies, analyses of the benefits of HKRec, and a visual case study. Finally, Section 6 concludes the paper.

2. Related Work

2.1. Knowledge-Aware Recommendation

Embedding-based. Embedding-based methods [9,10,11,33,34] leverage relations and entities in the KG to enhance semantic representations in recommender systems [14]. The earliest works trace back to CKE [11], which incorporates different types of side information into the collaborative filtering framework and encodes items' structural knowledge with TransR [13]. Another representative solution is KTUP [10], which employs a joint learning mechanism for KG completion and item recommendation. Specifically, TransH [35] is utilized to learn distributional representations of the KG, and a hyperplane-based translation component is designed to handle user preferences. Multi-Rec [34] proposes a multi-task learning framework that combines CF-based and embedding-based approaches to simultaneously learn user features in the UIG and item features in the KG. MFPRKG [8] proposes a multi-feature augmented model that not only embeds structured information from the KG but also fuses temporal and textual information. However, one problem shared by embedding-based methods lies in their incapacity to handle higher-order connectivity, coupled with a lack of interpretability for recommendation.
Path-based. Path-based methods [4,16,17,36,37,38] are devoted to exploring the high-order connectivity information within the KG to guide recommendation. They are interpretable, as the designed meta-paths guide the recommendation process by relating candidate items to users' historical interactions. For example, PER [38] pre-defines various types of meta-paths and subsequently extracts, for each user–item pair, all paths that conform to the defined meta-paths. MCRec [4] uses a CNN to explicitly embed user–item pairs and obtain meta-path representations. Additionally, KPRN [16] employs an LSTM to generate path representations by integrating the semantics of entities and relations. PGPRs [36] design a series of quantitative properties to explain reasoning paths from the user's perspective. Despite their interpretability, a deficiency of these methods is their heavy dependence on manually designed meta-paths. This makes them insufficiently scalable to other domain knowledge and overly reliant on human effort.
GNN-based. GNN-based methods [18,19,21,23,39,40,41] are emerging, owing to their compatibility with the graph structure inherent in recommendation data and their ability to recursively propagate embeddings to capture multi-hop neighbors. For instance, KGAT [19] leverages relation-aware aggregation mechanisms to model high-order relations effectively. KGCN [18] proposes a mini-batch training strategy and integrates neighbor information with bias to update KG entity representations, allowing for better exploration of the local proximity structure. KGNN-LS [39] learns item representations based on user-specific weights assigned to different relations. KGIN [23] is a state-of-the-art method that models user intents over relations and employs relational path-aware aggregation to effectively capture rich information in the KG.

2.2. Self-Supervised Learning

Self-supervised learning has gained widespread attention in the fields of CV [42,43,44], NLP [45,46], and graphs [47,48], as it can learn valuable feature representations from extensive unlabeled data. Several attempts have introduced self-supervised learning into recommendation, effectively addressing the challenge of data sparsity and achieving improvements in recommendation performance [25,30,31,32,49,50]. A typical exploration is SGL [25], which leverages three data augmentation operators to construct structured views for contrastive learning within the collaborative filtering framework. AdaGCL [26] designs an adaptive graph generation model and a graph denoising model to create contrastive views that enhance the CF paradigm. In recent explorations, the self-supervised paradigm for knowledge-aware recommendation has shown great potential for learning richer latent semantic information through data self-discrimination. KGCL [27] leverages a joint contrastive learning paradigm to bridge the knowledge graph and the user–item interaction graph, addressing noise and sparsity issues. KGTN [28] introduces a novel local–global contrast mechanism in its contrastive learning module, enhancing the robustness of user and item representation learning. KGRec [31] is the first attempt to unify generative and contrastive learning for extracting implicit semantics from the KG in a rationale-aware manner. HCVCL [29] designs a hierarchically coupled view-crossing contrastive learning paradigm to address noise issues. Motivated by the above self-supervised learning frameworks, we design a new SSL-based method for knowledge-aware recommendation by hierarchically refining node representations and aligning multi-view semantics across the KG and the UIG.

3. Preliminaries

This section introduces key notations used throughout the paper and formalizes our studied task.
User–item interaction graph. For the recommendation scenario, we define the user–item interaction graph as $\mathcal{G}_u = \{(u, y_{ui}, i) \mid u \in \mathcal{U}, i \in \mathcal{I}\}$. Herein, $\mathcal{U}$ represents the set of users, $\mathcal{I}$ represents the set of items, and $\mathcal{V} = \mathcal{U} \cup \mathcal{I}$ is the node set. The user–item interaction matrix is denoted as $\mathbf{Y}$, where $y_{ui} = 1$ indicates an interaction (e.g., a view or purchase) between user $u$ and item $i$, while $y_{ui} = 0$ signifies no interaction. Graph $\mathcal{G}_u$ contains information about the connections between users and items, forming the basis for recommendation models.
Knowledge graph. Knowledge graphs, serving as graphical representations of information, depict entities as nodes and their relations as edges. This structured format facilitates the organization and retrieval of information. A knowledge graph is formally represented as a set of triplets $\mathcal{G}_k = \{(h, r, t) \mid h, t \in \mathcal{E}, r \in \mathcal{R}\}$, where $\mathcal{E}$ denotes the set of entities, $\mathcal{R}$ denotes the set of relations, and each triplet $(h, r, t)$ signifies a relation $r$ connecting the head entity $h$ to the tail entity $t$.
Task Description. We formally describe the recommendation task addressed in this paper:
  • Input: a user–item interaction graph G u and a knowledge graph G k .
  • Output: a predictive function that estimates the probability $\hat{y}_{ui}$ that user $u$ would adopt item $i$.

4. Methodology

We present the proposed Hierarchical self-supervised learning for Knowledge-aware Recommendation (HKRec). HKRec aims to hierarchically leverage two self-supervised learning paradigms to unearth the valuable latent semantics of the KG, enabling the acquisition of high-quality user and item representations. Figure 1 shows the overall framework, which comprises two main components: the triple-graph masked autoencoder and the parallel cross-view contrastive learning.

4.1. Triple-Graph Masked Autoencoder

Inspired by the success of masked autoencoder techniques in natural language processing [51] and image processing [52], we propose the T-GMAE module of HKRec. T-GMAE selectively masks portions of the input signals, forcing the model to learn the underlying semantics and adapt to the masked structure, which enhances the learned node representations. In particular, T-GMAE employs dual masking strategies (i.e., node connection masking and node feature masking) and incorporates three reasoning strategies: (1) edge reconstruction; (2) degree restoration; (3) feature reconstruction. Among these, both edge reconstruction and degree restoration can be categorized as forms of node connection reconstruction.

4.1.1. Node Connection Reconstruction

We design the attention-based node connection masking strategy hierarchically and divide the reconstruction into two tasks: edge reconstruction and degree restoration, aiming to unveil the underlying semantic information.
Masking. We adopt a generative paradigm, wherein a proportion of edges with high weights in the KG are selectively masked. The main objective is to reconstruct these masked edges, allowing the model to learn crucial information. T-GMAE selects the top-n edges with the highest weights as the masked node connections. To this end, given a triplet ( i , r i j , j ) , we employ an attention mechanism [31] to calculate the weight of relation r i j between the head entity i and the tail entity j within the KG, evaluating the significance of r i j and its associated triplets. Hence, we formulate the attention-based weighting function as follows:
$$\alpha_{i,j} = \sigma\!\left(\frac{(\mathbf{e}_i \mathbf{W}_Q)\,(\mathbf{e}_j \mathbf{W}_K \odot \mathbf{r}_{ij})^{\top}}{\sqrt{d_k}}\right)$$
where $\mathbf{e}_i$, $\mathbf{e}_j$, and $\mathbf{r}_{ij}$ denote the embeddings of entity $i$, entity $j$, and the relation between them, respectively. $\mathbf{W}_Q$ and $\mathbf{W}_K$ denote the trainable transformation matrices, $d_k$ denotes the scale factor, and $\sigma$ denotes the softmax function. We denote the masked and remaining node connections as $\mathcal{R}_{mask}$ and $\mathcal{R}_{remain}$, respectively, where $\mathcal{R}_{mask} \cup \mathcal{R}_{remain} = \mathcal{R}$. By doing so, we obtain the masked knowledge graph $\mathcal{G}_m$ as the input data:
$$\mathcal{G}_m = (\mathcal{E}, \mathcal{R}_{remain}), \qquad \mathcal{R}_{remain} = \mathcal{R} \setminus \mathcal{R}_{mask}$$
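To make the masking step concrete, the following is a minimal PyTorch sketch of attention-based edge masking under the notation above; the tensor names, shapes, and the top-n selection helper are our illustrative assumptions, not the released implementation.

```python
import torch

def mask_top_edges(e_head, e_tail, e_rel, W_q, W_k, mask_ratio=0.25):
    """Score every KG triplet (i, r_ij, j) and mask the highest-weighted edges.

    e_head, e_tail: [num_edges, d] head/tail entity embeddings;
    e_rel: [num_edges, d_k] relation embeddings (assumed already in key space);
    W_q, W_k: [d, d_k] query/key projection matrices.
    """
    d_k = W_k.shape[1]
    q = e_head @ W_q                            # query projection of heads
    k = (e_tail @ W_k) * e_rel                  # relation-modulated keys
    scores = (q * k).sum(dim=-1) / d_k ** 0.5   # scaled dot-product weight
    alpha = torch.softmax(scores, dim=0)        # normalize over all triplets
    n_mask = int(mask_ratio * alpha.numel())
    mask_idx = torch.topk(alpha, n_mask).indices        # R_mask: top-n edges
    remain = torch.ones_like(alpha, dtype=torch.bool)
    remain[mask_idx] = False                            # R_remain: the rest
    return mask_idx, remain.nonzero(as_tuple=True)[0]
```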
Edge reconstruction. We utilize a traditional encoder–decoder architecture to reason out node connections that are as close as possible to the original structure and to encode essential information from the knowledge graph into low-dimensional representations. To be more specific, we use a GNN encoder, such as GCN [53], GraphSAGE [54], or HGCN [55], to encode $\mathcal{G}_m$.
During the neighbor aggregation process, we integrate the weights of the edges into the embedding learning of entities:
$$\mathbf{e}_i^{(l)} = \frac{1}{|\mathcal{N}_i|} \sum_{(i, r_{ij}, j) \in \mathcal{N}_i} \alpha_{i,j}\, \mathbf{r}_{ij} \odot \mathbf{e}_j^{(l-1)}$$
where $\mathcal{N}_i$ denotes the subgraph centered around node $i$ and containing all its first-order neighbors. The node representation can be updated by incorporating the embeddings of both itself and its neighboring nodes, as depicted by
$$\mathbf{e}_j^{(l+1)} = \mathrm{Com}\left(\mathbf{e}_j^{(l)},\ \mathrm{Agg}\{\mathbf{e}_i^{(l)} \mid i \in \mathcal{N}_j\}\right)$$
where the function $\mathrm{Agg}$ aggregates the embedding vectors of the neighboring nodes, and the combination function $\mathrm{Com}$ combines the aggregated neighbor information with the node's own embedding from the previous layer. It is worth pointing out that the final representation is obtained by summing the aggregated embeddings from different layers.
The decoder is responsible for mapping the node representations back to input embeddings for the reconstruction task. We aim to reconstruct both the edges and degrees of the masked node connections, and we utilize a GNN as the edge decoder. Note that the edge reconstruction loss function is devised by minimizing the dot-product logarithmic loss over KG triplets:
$$\mathcal{L}_{edge} = -\sum_{(i,r,j)\in\mathcal{G}_{mask}} \log \sigma\!\left(\mathbf{e}_i \cdot (\mathbf{e}_j \odot \mathbf{r}_{ij})\right)$$
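As a sketch, the dot-product logarithmic loss above can be computed per masked triplet as follows; the small epsilon is our assumption, added for numerical stability.

```python
import torch

def edge_recon_loss(e_head, e_rel, e_tail, eps=1e-10):
    """L_edge: dot-product log loss over masked triplets, e_i . (e_j * r_ij)."""
    score = (e_head * (e_tail * e_rel)).sum(dim=-1)
    return -torch.log(torch.sigmoid(score) + eps).mean()
```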
Degree restoration. To fully capture the KG's structure, we introduce a degree restoration task. While the edge decoder focuses on reconstructing pair-wise connections, the degree decoder is designed to capture the global structural properties of first-order neighborhood-wise connections. Specifically, based on the connectivity patterns, the degrees are divided into the out-degree $deg_{m,o}$ and in-degree $deg_{m,i}$ within the masked subgraph $\mathcal{G}_m$, as well as the degree $deg_k$ in the entire original knowledge graph $\mathcal{G}_k$, considering both local and global perspectives. The local-based degree restoration exploits the remaining data for training, focusing on recovering the node degree information by mining important features of the KG. The global-based degree restoration learns node representations from the original KG and attempts to preserve the KG structural information during the reconstruction process.
To measure how well the restored node degree aligns with the actual degree in both the masked KG and original KG, a multilayer perceptron (MLP) is then employed as the degree decoder, and we utilize the mean squared error (MSE) to measure the prediction loss:
$$\mathcal{L}_{deg} = \frac{1}{|\mathcal{E}|}\sum_{i\in\mathcal{E}} \left[\left(\widehat{deg}_{m,*}(i) - deg_{m,*}(i)\right)^2 + \left(\widehat{deg}_k(i) - deg_k(i)\right)^2\right]$$
where $\widehat{deg}$ denotes the restored degree and $deg_{m,*}$ denotes the node degree in the masked graph $\mathcal{G}_m$; the subscript $* \in \{i, o\}$ indexes the in-degree and out-degree properties. Therefore, the final connection objective to be minimized is
$$\mathcal{L}_{con} = \lambda_1 \mathcal{L}_{edge} + \lambda_2 \mathcal{L}_{deg}$$
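A hedged sketch of the degree decoder and its MSE objective follows; the three-way output head (masked in-degree, masked out-degree, and original-KG degree) and the hidden size are our assumptions about one reasonable realization.

```python
import torch
import torch.nn as nn

class DegreeDecoder(nn.Module):
    """MLP that restores node degrees from encoded entity embeddings."""
    def __init__(self, dim, hidden=64):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),  # [deg_m_in, deg_m_out, deg_k] per node
        )

    def forward(self, z):
        return self.mlp(z)

def degree_loss(decoder, z, deg_m_in, deg_m_out, deg_k):
    """MSE between restored and actual degrees, cf. L_deg above."""
    target = torch.stack([deg_m_in, deg_m_out, deg_k], dim=-1).float()
    return nn.functional.mse_loss(decoder(z), target)
```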

4.1.2. Node Feature Reconstruction

Distinct from the edge reconstruction and the degree restoration, which focus on structure-wise learning of the KG, the proposed node feature masking reconstruction focuses on enhancing entity representations from a node-wise perspective. Specifically, we randomly mask a certain proportion of the first-order neighborhood structural features of nodes, followed by reasoning and reconstruction.
Node feature extraction. Considering the prevalent absence of attribute information in real-world KG, we extract the first-order neighborhood structural features of the nodes as the initial features to enrich the structural information of node attributes. Inspired by the entity–entity adjacency matrix in the KG, we construct the entity–relation association matrix X as input features:
$$x_{ik} = \begin{cases} \mathrm{count}(r_k) & \text{if } (i, r_k, \cdot) \text{ exists in the KG} \\ 0 & \text{otherwise} \end{cases}$$
where $x_{ik}$ is the element in the $i$th row and $k$th column of $\mathbf{X}$, $k$ indexes the relation types, and $\mathrm{count}(r_k)$ is the number of triplets involving node $i$ and relation $r_k$ within the KG. If node $i$ participates in no triplet with relation $r_k$, the corresponding $x_{ik}$ is zero. This strategy avoids the dimension explosion associated with the traditional adjacency matrix.
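The association matrix can be assembled in a few lines; the sketch below assumes triplets are given as integer (head, relation, tail) tuples.

```python
import numpy as np

def build_feature_matrix(triplets, num_entities, num_relations):
    """Entity-relation association matrix X: x_ik = count of (i, r_k, .)."""
    X = np.zeros((num_entities, num_relations), dtype=np.float32)
    for head, rel, _tail in triplets:
        X[head, rel] += 1.0
    return X

# e.g., X = build_feature_matrix([(0, 2, 5), (0, 2, 7), (3, 1, 0)], 10, 4)
# gives X[0, 2] == 2.0 and X[3, 1] == 1.0.
```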
Masked feature reconstruction. At a predefined masking rate, we selectively disturb a subset of entities $\mathcal{E}_{mask}$. For each entity $i$ within this masked set, its initial feature $\mathbf{x}_i$ is replaced with a masking token [MASK], represented as $\mathbf{x}_{[M]}$. We feed the perturbed matrix $\tilde{\mathbf{X}}$ into the GNN encoder for information propagation and aggregation, where $\tilde{\mathbf{x}}_i$ for $i \in \mathcal{E}$ is defined as
$$\tilde{\mathbf{x}}_i = \begin{cases} \mathbf{x}_{[M]} & \text{if } i \in \mathcal{E}_{mask} \\ \mathbf{x}_i & \text{if } i \notin \mathcal{E}_{mask} \end{cases}$$
Then, we choose an MLP as the decoder. Motivated by the difficulty of MSE in handling values close to zero, we adopt the scaled cosine error (SCE) as the loss function. The training objective is to compute the loss between the predicted feature $\hat{\mathbf{x}}_i$ and the original feature $\mathbf{x}_i$ using SCE:
$$\mathcal{L}_f = \frac{1}{|\mathcal{E}_{mask}|}\sum_{i\in\mathcal{E}_{mask}} \left(1 - \frac{\mathbf{x}_i^{\top}\hat{\mathbf{x}}_i}{\|\mathbf{x}_i\| \, \|\hat{\mathbf{x}}_i\|}\right)^{\gamma}$$
where the scaling factor $\gamma \geq 1$.
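The SCE objective is straightforward to express; the sketch below averages over the masked entities and takes gamma as a tunable hyperparameter.

```python
import torch
import torch.nn.functional as F

def sce_loss(x, x_hat, gamma=2.0):
    """Scaled cosine error over masked entity features, cf. L_f above."""
    cos = F.cosine_similarity(x, x_hat, dim=-1)  # x_i . x_hat_i / (|x_i||x_hat_i|)
    return ((1.0 - cos) ** gamma).mean()
```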
Finally, the overall objective of T-GMAE to be minimized is
$$\mathcal{L}_m = \mathcal{L}_{con} + \lambda_3 \mathcal{L}_f$$
The designed T-GMAE, integrating three reconstruction strategies, achieves enhanced node embeddings, which effectively capture the global structural characteristics of the KG.
Algorithm 1 is the pseudo-code of the T-GMAE algorithm.
Algorithm 1 Triple-Graph Masked Autoencoder (T-GMAE)
  • Input: knowledge graph $\mathcal{G}_k = (\mathcal{E}, \mathcal{R})$, encoder, edge decoder, degree decoder, feature decoder, connection masking rate $\varphi$, feature masking rate $\mu$, weights $\lambda_1$, $\lambda_2$, $\lambda_3$;
1: while not converged do
2:     Mask $\mathcal{G}_k$ with rate $\varphi$ to obtain $\mathcal{R}_{mask}$ and $\mathcal{R}_{remain}$;
3:     Mask $\mathcal{G}_k$ with rate $\mu$ to obtain the masked entity set $\mathcal{E}_{mask}$;
4:     Obtain the perturbed feature matrix $\tilde{\mathbf{X}}$ of $\mathcal{E}_{mask}$ according to Equation (9);
5:     Perform GNN encoding on $\mathcal{R}_{remain}$ and $\tilde{\mathbf{X}}$ according to Equation (4);
6:     Calculate the loss $\mathcal{L}_{edge}$ according to Equation (5);
7:     Calculate the loss $\mathcal{L}_{deg}$ according to Equation (6);
8:     Calculate the loss $\mathcal{L}_{f}$ according to Equation (7);
9:     Calculate $\mathcal{L}_m = \lambda_1 \mathcal{L}_{edge} + \lambda_2 \mathcal{L}_{deg} + \lambda_3 \mathcal{L}_f$;
10:    Update the model by minimizing the reconstruction loss $\mathcal{L}_m$;
11: end while
12: Return the trained T-GMAE.

4.2. Parallel Cross-View Contrastive Learning

We apply dual graph-based augmentation strategies on both the KG and the UIG to construct multiple structural views. Subsequently, we perform parallel cross-view contrastive learning between the KG and the UIG, achieving entity–item alignment as shown in Figure 2.

4.2.1. Graph Augmentations

Graph augmentations enhance the learning of robust and generalized data representations by deriving new training data from the original graph [25]. We devise two attention-based dropout strategies, namely edge-level dropout and neighborhood-level dropout, to remove low-value information from the KG and construct multiple augmented views. Edge-level dropout selectively removes node connections, while neighborhood-level dropout deletes nodes along with their associated edges. Graph augmentations enable the recommender system to capture comprehensive patterns and information within the graph more effectively, improving the learning quality and robustness of graph representations.
Edge-level dropout. The weight of an edge in the KG reflects the significance of the associated entities and relation. We select the top-n edges with the lowest weights and remove them from the KG. This approach is based on the understanding that edges with lower weights carry less value in the KG. As a result, dropping these edges helps filter out redundant noise and capture the useful connectivity patterns of nodes, enhancing the overall quality of the KG. The edge-level dropout operators can be formalized as
$$T_1(\mathcal{G}_k) = (\mathcal{E}, \mathbf{M}_1 \odot \mathcal{R}), \qquad T_1(\mathcal{G}_u) = (\mathcal{V}, \mathbf{M}_2 \odot \mathbf{Y})$$
where $\mathbf{M}_1$ denotes the masking vector on the KG derived from the top-n lowest-weight edges and $\mathbf{M}_2 \in \{0,1\}^{|\mathbf{Y}|}$ denotes the masking vector on the UIG edge set, generated from a Bernoulli distribution. Following this operator, we generate the enhanced views $\mathcal{G}_k^1$ and $\mathcal{G}_u^1$ for the KG and the UIG, respectively.
Neighborhood-level dropout. The neighborhood-level dropout operator removes nodes and their associated connections from the original graph at a certain proportion. The neighborhood-level dropout operators can be formalized as
$$T_2(\mathcal{G}_k) = (\mathbf{M}_3 \odot \mathcal{E}, \mathbf{M}_3' \odot \mathcal{R}), \qquad T_2(\mathcal{G}_u) = (\mathbf{M}_4 \odot \mathcal{V}, \mathbf{M}_4' \odot \mathbf{Y})$$
where $\mathbf{M}_3$ and $\mathbf{M}_3'$ denote the masking vectors on the node set and its associated relation set, targeting the top-n low-weight nodes in the KG, and $\mathbf{M}_4$ and $\mathbf{M}_4'$ denote the masking vectors on the UIG node set and its associated edges, respectively. Afterwards, we create the augmented views $\mathcal{G}_k^2$ and $\mathcal{G}_u^2$ for the KG and the UIG.
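For illustration, the two operators can be sketched as follows over an edge list; the score tensors and drop ratios are our assumptions, with edge scores taken from the attention weights described earlier.

```python
import torch

def edge_level_dropout(edge_index, edge_scores, drop_ratio=0.1):
    """T1: drop the top-n lowest-weight edges. edge_index: [2, E]."""
    n_drop = int(drop_ratio * edge_scores.numel())
    drop_idx = torch.topk(edge_scores, n_drop, largest=False).indices
    keep = torch.ones_like(edge_scores, dtype=torch.bool)
    keep[drop_idx] = False
    return edge_index[:, keep]

def neighborhood_level_dropout(edge_index, node_scores, drop_ratio=0.1):
    """T2: drop low-score nodes together with every incident edge."""
    n_drop = int(drop_ratio * node_scores.numel())
    dropped = torch.topk(node_scores, n_drop, largest=False).indices
    is_dropped = torch.zeros_like(node_scores, dtype=torch.bool)
    is_dropped[dropped] = True
    keep = ~(is_dropped[edge_index[0]] | is_dropped[edge_index[1]])
    return edge_index[:, keep]
```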

4.2.2. Contrastive Learning

Once the augmented KG views and UIG views are generated, we employ the message propagation strategy of LightGCN [41] on the UIG views, owing to its effectiveness and lightweight architecture for learning user–item representations, which are updated as follows:
$$\mathbf{e}_u^{(l+1)} = \sum_{i\in\mathcal{N}_u} \frac{\mathbf{e}_i^{(l)}}{\sqrt{|\mathcal{N}_u|}\sqrt{|\mathcal{N}_i|}}, \qquad \mathbf{e}_i^{(l+1)} = \sum_{u\in\mathcal{N}_i} \frac{\mathbf{e}_u^{(l)}}{\sqrt{|\mathcal{N}_i|}\sqrt{|\mathcal{N}_u|}}$$
where $\mathbf{e}_u^{(l+1)}$ and $\mathbf{e}_i^{(l+1)}$ denote the $(l+1)$th-layer representations of user $u$ and item $i$, and $\mathcal{N}_u$ and $\mathcal{N}_i$ denote the set of items user $u$ has interacted with and the set of users connected to item $i$, respectively. To create a unified representation space for subsequent contrastive learning, facilitating the comparison of embeddings across views, we feed the representations into an MLP that maps them to a shared space where the contrastive loss is calculated:
$$\mathbf{z}_{u,i}^{*} = \sigma\!\left(\mathbf{e}_i^{\mathcal{G}_u^{*}} \mathbf{W}_1^{\mathcal{G}_u} + \mathbf{b}_1^{\mathcal{G}_u}\right)\mathbf{W}_2^{\mathcal{G}_u} + \mathbf{b}_2^{\mathcal{G}_u}$$
$$\mathbf{z}_{k,i}^{*} = \sigma\!\left(\mathbf{e}_i^{\mathcal{G}_k^{*}} \mathbf{W}_1^{\mathcal{G}_k} + \mathbf{b}_1^{\mathcal{G}_k}\right)\mathbf{W}_2^{\mathcal{G}_k} + \mathbf{b}_2^{\mathcal{G}_k}$$
where the superscript $* \in \{1, 2\}$ indexes the structural views constructed by edge-level dropout and neighborhood-level dropout, respectively, and $\mathbf{W}_{(\cdot)}$ and $\mathbf{b}_{(\cdot)}$ are trainable parameters.
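To ground the propagation step, here is a minimal sketch of a LightGCN layer as used on the UIG views above; the precomputed symmetrically normalized adjacency and the layer-averaging readout are our assumptions, consistent with common LightGCN implementations.

```python
import torch

def lightgcn_layer(adj_norm, emb):
    """One LightGCN layer: e^{(l+1)} = A_norm e^{(l)}.

    adj_norm: sparse [N, N] adjacency with entries 1/sqrt(|N_u| |N_i|);
    emb: [N, d] stacked user and item embeddings at layer l.
    """
    return torch.sparse.mm(adj_norm, emb)

def lightgcn_propagate(adj_norm, emb0, num_layers=3):
    """Average embeddings across layers, a common LightGCN readout."""
    out, emb = emb0, emb0
    for _ in range(num_layers):
        emb = lightgcn_layer(adj_norm, emb)
        out = out + emb
    return out / (num_layers + 1)
```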
Positive pairs (i.e., an item embedding $\mathbf{z}_{u,i}^{1}$ in UIG view $\mathcal{G}_u^1$ and its corresponding entity embedding $\mathbf{z}_{k,i}^{1}$ in KG view $\mathcal{G}_k^1$) promote consistency among different perspectives of the same node. On the other hand, negative pairs (i.e., any two unrelated node representations across views) emphasize distinctions between different nodes. In what follows, we design the contrastive loss based on the InfoNCE loss [42] to minimize the distance between positive pairs and maximize that between negative pairs:
$$\mathcal{L}_C = \sum_{*\in\{1,2\}} \sum_{i\in\mathcal{I}} -\log \frac{e^{s(\mathbf{z}_{u,i}^{*},\, \mathbf{z}_{k,i}^{*})/\tau}}{e^{s(\mathbf{z}_{u,i}^{*},\, \mathbf{z}_{k,i}^{*})/\tau} + \sum_{j\in\{i',\, i''\}} e^{s(\mathbf{z}_{u,j}^{*},\, \mathbf{z}_{k,i}^{*})/\tau}}$$
where $s(\cdot)$ denotes the cosine similarity function, $\tau$ is the temperature, and $i'$ and $i''$ denote the randomly sampled negative candidates for item $i$. Multiple cross-view contrastive learning creates augmented data from two perspectives, enhancing the robustness of our HKRec.
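A minimal in-batch variant of this cross-view InfoNCE objective is sketched below; treating all other items in the batch as negatives is our simplification of the sampled negatives $i'$ and $i''$.

```python
import torch
import torch.nn.functional as F

def cross_view_infonce(z_u, z_k, tau=0.2):
    """z_u, z_k: [batch, d] projected item embeddings from UIG and KG views."""
    z_u = F.normalize(z_u, dim=-1)
    z_k = F.normalize(z_k, dim=-1)
    logits = z_u @ z_k.t() / tau               # cosine similarity / temperature
    labels = torch.arange(z_u.size(0), device=z_u.device)
    return F.cross_entropy(logits, labels)     # positive pairs on the diagonal
```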

4.2.3. Joint Learning Strategy

To integrate the two self-supervised tasks into the recommendation objective, we combine the reconstruction loss $\mathcal{L}_m$, the contrastive loss $\mathcal{L}_C$, and the recommendation loss $\mathcal{L}_{rec}$. Using a multi-task joint learning paradigm, we optimize HKRec by incorporating the above losses as integral components of the training process. In particular, the prediction function for user–item interactions is defined as the inner product of the user and item representations:
$$\hat{y}_{ui} = \mathbf{e}_u^{\top}\mathbf{e}_i$$
HKRec is optimized with the Bayesian Personalized Ranking (BPR) [56] loss, which is designed for pairwise ranking tasks over the historical data:
$$\mathcal{L}_{rec} = \sum_{(u,i,j)\in O} -\ln \sigma\!\left(\hat{y}_{ui} - \hat{y}_{uj}\right)$$
where $O = \{(u,i,j) \mid (u,i)\in O^{+}, (u,j)\in O^{-}\}$ denotes the training instances, including the observed interactions $O^{+}$ and unobserved counterparts $O^{-}$. Combining the aforementioned three losses, the overall objective of HKRec is defined as
$$\mathcal{L} = \mathcal{L}_{rec} + \mathcal{L}_m + \lambda_4 \mathcal{L}_C$$
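A compact sketch of the BPR term and the joint objective follows; the embedding lookups and the two self-supervised losses are assumed to be computed elsewhere, and lambda4 is a tunable weight.

```python
import torch
import torch.nn.functional as F

def bpr_loss(e_u, e_pos, e_neg):
    """L_rec: pairwise ranking loss with y_hat_ui = e_u . e_i."""
    pos = (e_u * e_pos).sum(dim=-1)
    neg = (e_u * e_neg).sum(dim=-1)
    return -F.logsigmoid(pos - neg).mean()

def joint_loss(l_rec, l_m, l_c, lambda4=0.1):
    """L = L_rec + L_m + lambda_4 * L_C."""
    return l_rec + l_m + lambda4 * l_c
```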

5. Experiments

To demonstrate the superiority of our model and uncover the reasons behind its outstanding performance, we conducted an extensive series of experiments aimed at addressing the following questions.
  • RQ1: How does our model’s performance compare to state-of-the-art methods?
  • RQ2: What is the effectiveness of the key components in our model?
  • RQ3: How effective is our model when tackling cold start and long-tail issues?
  • RQ4: Does the self-supervised paradigm of our model lead to improved item representations?

5.1. Experimental Settings

5.1.1. Datasets

We conduct experiments on three benchmark datasets that differ in size and sparsity: Last-FM [19], MIND [31], and Alibaba-iFashion [23]. These datasets reflect different real-world applications.
  • Last-FM: Last-FM is a dataset commonly used in the field of music recommender systems that collects user music listening history and tagging data from the Last.fm platform.
  • MIND: The MIND dataset is collected from the Microsoft News platform for news recommendation tasks, with its knowledge graph built from Wikidata. It contains a considerable amount of user browsing behavior data and news topic information.
  • Alibaba-iFashion: The Alibaba-iFashion dataset comprises fashion outfits with various fashion items, categorized according to a fashion taxonomy, collected from the Alibaba-iFashion online shopping platform.
We summarize the statistics of three datasets in Table 1.

5.1.2. Evaluation Metrics

We consider all non-interacted items of the target user as negative samples to infer their preferences in the recommender system. For fair comparisons, we use the full-ranking strategy. To evaluate top-K recommendation results, we measure the performance of our proposed model using Recall@K and Normalized Discounted Cumulative Gain (NDCG)@K, where K is set as 20 by default. It is worth noting that NDCG is a metric that captures the quality and relevance of the recommended items by considering their positions in the recommendation list. It assigns higher scores to lists where the most relevant items appear higher up.
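For reference, per-user Recall@K and NDCG@K under full ranking can be computed as below; `ranked_items` is the model's ranking over all non-interacted items and `relevant` is the user's held-out test set.

```python
import math

def recall_at_k(ranked_items, relevant, k=20):
    hits = sum(1 for item in ranked_items[:k] if item in relevant)
    return hits / max(len(relevant), 1)

def ndcg_at_k(ranked_items, relevant, k=20):
    dcg = sum(1.0 / math.log2(pos + 2)
              for pos, item in enumerate(ranked_items[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(pos + 2) for pos in range(min(len(relevant), k)))
    return dcg / idcg if idcg > 0 else 0.0
```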

5.1.3. Baseline Models

To ensure a diverse and representative evaluation of our model, we compare our HKRec model with various baseline models: conventional collaborative filtering (BPR, LightGCN), embedding-based knowledge-aware recommendation (CKE), GNN-based knowledge-aware recommendation (KGAT, KGIN), self-supervised learning recommendation (SGL), knowledge-aware self-supervised learning recommendation (KGCL, KGRec).
  • BPR [56]: A widely used recommendation model that employs Bayesian analysis to rank items and generate user preference predictions.
  • LightGCN [41]: A simplified collaborative filtering method that integrates GCN.
  • CKE [11]: A typical method that integrates collaborative filtering with knowledge graph feature learning, incorporating three key components to learn representations including structural, textual, and visual information.
  • KGAT [19]: This model initially utilizes the TransR model for KG vectorization learning and a GNN for information propagation and aggregation.
  • KGIN [23]: KGIN explores the relational modeling of user intents and provides explainable semantics captured from knowledge graphs for recommendation tasks.
  • SGL [25]: This method introduces contrastive learning into recommendation data by generating multi-views through techniques such as node removing, edge removing, and random walk.
  • KGCL [27]: This model generates auxiliary self-supervised signals through data augmentation strategies to construct contrastive views on both KG and UIG.
  • KGRec [31]: This method generates rational scores for knowledge graph triplets and designs a self-supervised learning framework based on the scores.

5.2. Performance Comparison (RQ1)

The performance of our HKRec is first compared with the baselines in terms of Recall@20 and NDCG@20. The comparison results are presented in Table 2, where the best baselines are underlined and the best performances are shown in bold. We find the following:
  • The proposed HKRec consistently outperforms all baselines across all three datasets, demonstrating its superiority in recommendation performance. Particularly noteworthy is that HKRec surpasses even the best-performing baseline: Recall@20 and NDCG@20 improve by 2.2% to 24.95% and 3.38% to 22.32%, respectively, on the Last-FM dataset, by 7.0% to 23.82% and 5.7% to 39.66% on the MIND dataset, and by 1.76% to 34.73% and 1.62% to 35.13% on the Alibaba-iFashion dataset. We summarize the reasons for the effectiveness of HKRec as follows: (1) The design of T-GMAE effectively extracts the semantic relatedness among items that contributes to the recommendation task. (2) The incorporation of multiple contrastive cross-views assists HKRec in capturing deeper semantic relatedness and injecting entity representations into the user–item interaction modeling.
  • Examining Table 1 and Table 2, it can be seen that when the KG contains more diverse entities and relations, our HKRec model, through its T-GMAE and cross-view contrastive learning mechanisms, can effectively discover more useful information from the KG. This process enriches the model with semantic information for understanding and recommending items. Consequently, the model achieves the highest recall of 0.1192 on the Alibaba-iFashion dataset. In addition, compared with the strongest baseline, recall improves by 2.2%, 7.0%, and 1.76% on the three datasets, respectively. The model achieves the most notable performance improvement on the MIND dataset, which has the sparsest interactions. In such cases, the provision of more diverse information from the KG has the most significant impact.
  • The knowledge-aware methods outperform CF-based methods, underscoring the effective alleviation of data sparsity issues in recommender systems through the incorporation of KG.
  • The performance of the GNN-based methods (i.e., KGAT, KGIN) on the Last-FM and Alibaba-iFashion datasets surpasses that of the embedding-based approach, CKE. This outcome highlights the effectiveness of GNNs in capturing higher-order dependencies and attaining superior knowledge representations. However, such superiority is not observed on the MIND dataset. The rationale lies in its relatively small knowledge graph scale, where entities exhibit low connection density and lack high-order dependencies. Consequently, the efficacy of GNNs in this scenario is limited.
  • The two methods combining both contrastive and generative paradigms (i.e., KGRec, HKRec) demonstrate superior performance compared to those employing a single contrastive learning paradigm (i.e., SGL, KGCL). This observation indicates that the generative paradigm facilitates the distillation of richer semantic information from the KG. Hence, leveraging this paradigm effectively is crucial for enhancing recommendation performance.

5.3. Ablation Study (RQ2)

Impact of self-supervised learning strategies. To study the effects of the two modules, T-GMAE and cross-view contrastive learning, we compare the results of HKRec with those of two designed variants and KGRec, as follows:
  • w/o ND: We remove the neighborhood-level dropout components from the contrastive learning module.
  • w/o T-GMAE: We remove the T-GMAE module, which includes the degree decoder component and the node feature masking reconstruction component.
The performance comparison results are presented in Table 3. It is observed that, compared to the best-performing baseline, each of the self-supervised designs makes a clear contribution to the overall performance. HKRec, which utilizes T-GMAE and parallel cross-view contrastive learning simultaneously, demonstrates optimal performance. It is worth noting that, compared to solely removing the neighborhood-level dropout component in the contrastive learning module, the larger performance deterioration observed upon removing the degree and feature decoders from T-GMAE suggests a more significant contribution of T-GMAE to the performance enhancement. This can be attributed to T-GMAE's adoption of a hierarchical triple masking reconstruction strategy, encompassing both structure-level and node-level information, which facilitates a more comprehensive exploration of semantics within the KG. In general, these findings underscore the importance of hierarchical learning in enhancing model performance.
Hyperparameter sensitivity. We investigate the sensitivity of the performance to the key hyperparameters, namely the connection masking rate $\varphi$, the feature masking rate $\mu$, and the node dropout rate $\rho$, which control the intensity of self-supervised learning. For these three hyperparameters, we conduct experiments with varying ranges of values. Figure 3 shows the model's performance on the three datasets under different hyperparameter values.
First, we observe that the best performance is achieved with the connection masking rate $\varphi = 0.25$ across all three datasets. Subsequently, when gradually increasing the feature masking rate from $\mu = 0.1$ to $\mu = 0.5$, optimal performance is attained at $\mu = 0.5$ for MIND and Alibaba-iFashion, while the best setting is $\mu = 0.3$ for Last-FM. Furthermore, the optimal neighborhood dropout rates $\rho$ for Last-FM, MIND, and Alibaba-iFashion are $\rho = 0.3$, $\rho = 0.3$, and $\rho = 0.5$, respectively. In particular, we find that varying the hyperparameters within the specified ranges has a relatively small impact on performance. This robustness can be attributed to the three well-designed self-supervised learning strategies, which enhance recommendation performance at multiple hierarchies. It should be noted that, although changes in hyperparameter values do not greatly affect performance, we still recommend using the optimal settings to achieve the best results.

5.4. Benefits of HKRec (RQ3)

Effective utilization of the KG offers a solution to mitigate the challenges of the cold-start and long-tail issues for recommender systems. In this subsection, we verify the performance of HKRec in addressing these challenges.
Cold-start recommendation. We divide the users in the Alibaba-iFashion dataset into five groups based on their number of interactions: the lower the group number, the fewer interactions the user has. The experimental results in terms of Recall@20 and NDCG@20 are illustrated in Figure 4. Notably, even under cold-start scenarios where user interaction data are limited, the HKRec model consistently demonstrates superior recommendation performance compared to all baseline models. These results emphasize the effectiveness of the hierarchical self-supervised learning strategy in providing enhanced solutions for the cold-start challenge.
Long-tail item recommendation. To evaluate the model's efficacy in mitigating the long-tail issue, we divide the datasets into five groups with different item interaction densities. Each group maintains an equal item count while witnessing a progressive rise in interaction density. Figure 5 illustrates the empirical results for our HKRec and several state-of-the-art knowledge-aware recommendation baselines. By distilling the semantic relatedness under item contexts, HKRec can effectively handle the imbalances present in long-tail distributions. Overall, the experimental results empirically show that HKRec not only captures popular items but also understands items with fewer interactions, leading to improved performance on the long-tail issue.

5.5. Item Embedding Visualization (RQ4)

To evaluate whether the proposed HKRec can better distill implicit semantics from the KG and strengthen item representations, we conduct visual experiments on item similarity within the MIND news recommendation dataset by extracting embeddings of diverse item topics and subsequently computing a similarity matrix. Specifically, three items are selected from each of the topics (i.e., actor, institution, and sports). We calculate the cosine similarity between items, where the similarity values range over [−1, 1]. The visual comparison between our model and KGCL is illustrated in Figure 6. Briefly, the darker the color, the lower the similarity between items, and the lighter the color, the higher the similarity. From the results, we make the following observations:
  • We find that the lightest colors, i.e., the highest similarities, are observed among items of the same topic. Specifically, Figure 6a presents the experimental results of HKRec, showcasing a similarity range of [0.82, 1] for actor items, [0.95, 1] for institution items, and [0.79, 1] for sports items. This is because, in recommender systems, items of the same topic typically share similar features, exhibit similar contexts in the KG, and produce simultaneous interactions in the UIG. Yet, as illustrated in Figure 6b, when employing KGCL, the corresponding similarity ranges are [0.45, 1], [0.20, 1], and [0.58, 1]. Overall, HKRec effectively learns the correlations among items of the same topic, placing them close to each other in the embedding space and producing high-quality item embeddings. In contrast, KGCL exhibits limitations in its node representation learning ability, especially for sports-related items.
  • Compared to items of the same topic, the visualization matrix colors between items of different topics are darker, indicating lower similarity. Specifically, for institution items, the colors of the similarity matrix between institutions and both actor and sports items are the darkest. This is attributed to the fact that, in the UIG, only one or two users simultaneously click on both institutional and actor items or on both institutional and sports items. Moreover, there is no correlation between institutions and the other two types of items in the KG.
  • There is moderate similarity between actor and sports items, ranging from 0.55 to 0.81. This phenomenon arises from users clicking on both types of items concurrently in the UIG, while the correlation between these two categories in the KG further contributes to the observed distinctions in similarities. For example, in the case of HKRec, the similarity values between the actor Jennifer Lopez and the sports items World Series and Pittsburgh Steelers are 0.79 and 0.61, respectively. As shown in Figure 7, the rationale behind these findings is that only around 30 users concurrently click news related to Jennifer Lopez and World Series, as well as Jennifer Lopez and Pittsburgh Steelers. This limited interaction alone would yield only a weak correlation between Jennifer Lopez and each of the two sports items. However, the correlations between these items in the KG provide useful additional information and strengthen the learned correlations. Specifically, as shown in Figure 7a, the analysis of the KG reveals that there is one three-hop path between Jennifer Lopez and World Series, while no three-hop path exists between Jennifer Lopez and Pittsburgh Steelers. Additionally, as shown in Figure 7b, there are twenty-six four-hop paths between Jennifer Lopez and World Series, whereas only eleven such paths exist between Jennifer Lopez and Pittsburgh Steelers. Thus, the similarity between Jennifer Lopez and World Series (0.79) is higher than that between Jennifer Lopez and Pittsburgh Steelers (0.61).
  • Distinct from HKRec, the results of KGCL show corresponding values of only 0.40 and 0.34. This could be due to the inherent limitations of single graph augmentation or single cross-view contrastive learning, which are insufficient for extracting meaningful semantic information and fail to capture user interests effectively. These experimental findings indicate that HKRec exhibits superior node representation learning capabilities compared to KGCL. The resulting high-quality item embeddings effectively capture and reflect the complex correlations between items.

6. Conclusions

In this work, we propose HKRec for knowledge-aware recommendation. HKRec is a hierarchical method that deeply mines implicit semantics in knowledge graphs by utilizing both generative and contrastive learning paradigms. We design the T-GMAE module to efficiently extract latent semantic relations from the KG while reconstructing the masked knowledge across the triple hierarchy of node connections, node degrees, and node features. Furthermore, we adopt neighborhood-level and edge-level dropout strategies to construct self-supervised signals and perform hierarchical cross-view contrastive learning between the KG and the UIG for further information distillation, aiming to enhance the alignment between entity and item representations. HKRec facilitates a comprehensive understanding of user preferences to strengthen recommendation performance. Extensive experimental results on three real-world datasets indicate the strong performance of HKRec against state-of-the-art baselines. In future work, we plan to address the noise problem in the KG and explore methods for highlighting important information while filtering out irrelevant knowledge. In addition to the item KG used in this work, the exploration of user–item KG-aware recommendation is also worthwhile.

Author Contributions

Conceptualization, C.Z.; methodology, C.Z.; validation, C.Z. and S.Z.; investigation, C.Z.; writing—original draft preparation, C.Z. and S.Z.; visualization, D.W.; supervision, J.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Acknowledgments

The authors thank the support from the National University of Defense Technology.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Niu, Y.; Lin, R.; Xue, H. Research on learning resource recommendation based on knowledge graph and collaborative filtering. Appl. Sci. 2023, 19, 10933. [Google Scholar] [CrossRef]
  2. Lei, C.; Liu, Y.; Zhang, L.; Wang, G.; Tang, H.; Li, H.; Miao, C. Semi: A sequential multi-modal information transfer network for e-commerce micro-video recommendations. In Proceedings of the 27th ACM SIGKDD Conference on Knowledge Discovery & Data Mining, Singapore, 14–18 August 2021; pp. 3161–3171. [Google Scholar]
  3. Long, X.; Huang, C.; Xu, Y.; Xu, H.; Dai, P.; Xia, L.; Bo, L. Social recommendation with self-supervised metagraph informax network. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Gold Coast, QLD, Australia, 1–5 November 2021; pp. 1160–1169. [Google Scholar]
  4. Hu, B.; Shi, C.; Zhao, W.; Yu, P. Leveraging meta-path based context for top-n recommendation with a neural co-attention model. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1531–1540. [Google Scholar]
  5. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, China, 1–5 May 2011; pp. 285–295. [Google Scholar]
  6. He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, Perth, WA, Australia, 3–7 April 2017; pp. 173–182. [Google Scholar]
  7. Wang, Y.; Javari, A.; Balaji, J.; Shalaby, W.; Derr, T.; Cui, X. Knowledge graph-based session recommendation with session-adaptive propagation. In Proceedings of the The ACM Web Conference 2024, Singapore, 13–17 May 2024; pp. 264–273. [Google Scholar]
  8. Wang, L.; Du, W.; Chen, Z. Multi-feature-enhanced academic paper recommendation model with knowledge graph. Appl. Sci. 2024, 14, 5022. [Google Scholar] [CrossRef]
  9. Ai, Q.; Azizi, V.; Chen, X.; Zhang, Y. Learning heterogeneous knowledge base embeddings for explainable recommendation. Algorithms 2018, 11, 137. [Google Scholar] [CrossRef]
  10. Cao, Y.; Wang, X.; He, X.; Hu, Z.; Chua, T. Unifying knowledge graph learning and recommendation: Towards a better understanding of user preferences. In Proceedings of the WWW’19: The World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 151–161. [Google Scholar]
  11. Zhang, F.; Yuan, N.; Lian, D.; Xie, X.; Ma, W. Collaborative knowledge base embedding for recommender systems. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 353–362. [Google Scholar]
  12. Bordes, A.; Usunier, N.; Garcia-Duran, A.; Weston, J.; Yakhnenko, O. Translating embeddings for modeling multi-relational data. In Proceedings of the 26th International Conference on Neural Information Processing Systems, Lake Tahoe, NV, USA, 5–10 December 2013; Volume 26, pp. 1–9. [Google Scholar]
  13. Lin, Y.; Liu, Z.; Sun, M.; Liu, Y.; Zhu, X. Learning entity and relation embeddings for knowledge graph completion. In Proceedings of the 29-th AAAI Conference on Artificial Intelligence, Austin, TX, USA, 25–30 January 2015; pp. 2181–2187. [Google Scholar]
  14. Guo, Q.; Zhuang, F.; Qin, C.; Zhu, H.; Xie, X.; Xiong, H.; He, Q. A survey on knowledge graph-based recommender systems. J. Internet Technol. 2020, 34, 3549–3568. [Google Scholar] [CrossRef]
  15. Wang, H.; Zhang, F.; Wang, J. Ripplenet: Propagating user preferences on the knowledge graph for recommender systems. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 22–26 October 2018; pp. 417–426. [Google Scholar]
  16. Wang, X.; Wang, D.; Xu, C.; He, X.; Cao, Y.; Chua, T. Explainable reasoning over knowledge graphs for recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence, Honolulu, HI, USA, 27 January–1 February 2019; pp. 5329–5336. [Google Scholar]
  17. Zhao, H.; Yao, Q.; Li, J.; Song, Y.; Lee, D. Meta-graph based recommendation fusion over heterogeneous information networks. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Halifax, NS, Canada, 13–17 August 2017; pp. 635–644. [Google Scholar]
  18. Wang, H.; Zhao, M.; Xie, X.; Li, W.; Guo, M. Knowledge graph convolutional networks for recommender systems. In Proceedings of the WWW’19: The World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 3307–3313. [Google Scholar]
  19. Wang, X.; He, X.; Cao, Y.; Liu, M.; Chua, T.S. KGAT: Knowledge graph attention network for recommendation. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 950–958. [Google Scholar]
  20. Wu, S.; Sun, F.; Zhang, W.; Xie, X. Graph neural networks in recommender systems: A survey. ACM Comput. Surv. 2022, 55, 1–37. [Google Scholar] [CrossRef]
  21. Zou, D.; Wei, W.; Wang, Z.; Mao, X.; Zhu, F.; Fang, R.; Chen, D. Improving knowledge-aware recommendation with multi-level interactive contrastive learning. In Proceedings of the 31th ACM International Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 2817–2826. [Google Scholar]
  22. Duan, H.; Liang, X.; Zhu, Y.; Zhu, Z.; Liu, P. Reducing noise-triplets via differentiable sampling for knowledge-enhanced recommendation with collaborative signal guidance. Neurocomputing 2023, 558, 126771. [Google Scholar] [CrossRef]
  23. Wang, X.; Huang, T.; Wang, D.; Yuan, Y.; Liu, Z.; He, X.; Chua, T. Learning intents behind interactions with knowledge graph for recommendation. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 878–887. [Google Scholar]
  24. Wu, L.; Lin, H.; Tan, C.; Gao, Z.; Li, S.; Engineering, D. Self-supervised learning on graphs: Contrastive, generative, or predictive. IEEE Trans. Knowl. Data Eng. 2021, 35, 4216–4235. [Google Scholar] [CrossRef]
  25. Wu, J.; Wang, X.; Feng, F.; He, X.; Chen, L.; Lian, J.; Xie, X. Self-supervised graph learning for recommendation. In Proceedings of the 44th International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual, 11–15 July 2021; pp. 726–735. [Google Scholar]
  26. Jiang, Y.; Huang, C.; Huang, L. Adaptive graph contrastive learning for recommendation. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, USA, 6–10 August 2023; pp. 4252–4261. [Google Scholar]
  27. Yang, Y.; Huang, C.; Xia, L.; Li, C. Knowledge graph contrastive learning for recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 1434–1443. [Google Scholar]
28. Zou, D.; Wei, W.; Zhu, F. Knowledge enhanced multi-intent transformer network for recommendation. In Proceedings of The Web Conference 2024, Singapore, 13–17 May 2024; pp. 151–159. [Google Scholar]
29. Chen, S.; Li, Z. Hierarchically Coupled View-Crossing Contrastive Learning for Knowledge Enhanced Recommendation. IEEE Access 2024, 12, 75532–75541. [Google Scholar] [CrossRef]
  30. Ma, Y.; Zhang, X.; Gao, C.; Tang, Y.; Li, L.; Zhu, R.; Yin, C. Enhancing recommendations with contrastive learning from collaborative knowledge graph. Neurocomputing 2023, 523, 103–115. [Google Scholar] [CrossRef]
  31. Yang, Y.; Huang, C.; Xia, L.; Huang, C. Knowledge graph self-supervised rationalization for recommendation. In Proceedings of the 29th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Long Beach, CA, USA, 6–10 August 2023; pp. 3046–3056. [Google Scholar]
  32. Zou, D.; Wei, W.; Mao, X.; Wang, Z.; Qiu, M.; Zhu, F.; Cao, X. Multi-level cross-view contrastive learning for knowledge-aware recommender system. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 1358–1368. [Google Scholar]
  33. Wang, H.; Zhang, F.; Zhao, M.; Li, W.; Xie, X.; Guo, M. Multi-task feature learning for knowledge graph enhanced recommendation. In Proceedings of the WWW’19: The World Wide Web Conference, San Francisco, CA, USA, 13–17 May 2019; pp. 2000–2010. [Google Scholar]
  34. Shu, H.; Huang, J. Multi-task feature and structure learning for user-preference based knowledge-aware recommendation. Neurocomputing 2023, 532, 43–55. [Google Scholar] [CrossRef]
  35. Wang, Z.; Zhang, J.; Feng, J.; Chen, Z. Knowledge graph embedding by translating on hyperplanes. In Proceedings of the AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; pp. 1112–1119. [Google Scholar]
  36. Balloccu, G.; Boratto, L.; Fenu, G.; Marras, M. Reinforcement recommendation reasoning through knowledge graphs for explanation path quality. Knowl.-Based Syst. 2023, 260, 110098. [Google Scholar] [CrossRef]
  37. Catherine, R.; Cohen, W. Personalized recommendations using knowledge graphs: A probabilistic logic programming approach. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; pp. 325–332. [Google Scholar]
  38. Yu, X.; Ren, X.; Sun, Y.; Gu, Q.; Sturt, B.; Khandelwal, U.; Norick, B.; Han, J. Personalized entity recommendation: A heterogeneous information network approach. In Proceedings of the 7th ACM International Conference on Web Search and Data Mining, New York, NY, USA, 24–28 February 2014; pp. 283–292. [Google Scholar]
  39. Wang, H.; Zhang, F.; Zhang, M.; Leskovec, J.; Zhao, M.; Li, W.; Wang, Z. Knowledge-aware graph neural networks with label smoothness regularization for recommender systems. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 968–977. [Google Scholar]
  40. Du, Y.; Zhu, X.; Chen, L.; Zheng, B.; Gao, Y. Hakg: Hierarchy-aware knowledge gated network for recommendation. In Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval, Madrid, Spain, 11–15 July 2022; pp. 1390–1400. [Google Scholar]
  41. He, X.; Deng, K.; Wang, X.; Li, Y.; Zhang, Y.; Wang, M. Lightgcn: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in Information Retrieval, Virtual Event, 25–30 July 2020; pp. 639–648. [Google Scholar]
42. Chen, T.; Kornblith, S.; Norouzi, M.; Hinton, G. A simple framework for contrastive learning of visual representations. In Proceedings of the 37th International Conference on Machine Learning, PMLR, Vienna, Austria, 13–18 July 2020; pp. 1597–1607. [Google Scholar]
  43. Jing, L.; Tian, Y. Self-supervised visual feature learning with deep neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 43, 4037–4058. [Google Scholar] [CrossRef] [PubMed]
44. Misra, I.; van der Maaten, L. Self-supervised learning of pretext-invariant representations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 6707–6717. [Google Scholar]
  45. Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. Albert: A lite bert for self-supervised learning of language representations. arXiv 2019, arXiv:1909.11942. [Google Scholar]
46. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R. XLNet: Generalized autoregressive pretraining for language understanding. arXiv 2019, arXiv:1906.08237. [Google Scholar]
  47. Xia, J.; Wu, L.; Chen, J.; Hu, B.; Li, S. Simgrace: A simple framework for graph contrastive learning without data augmentation. In Proceedings of the ACM Web Conference 2022, Virtual Event, 25–29 April 2022; pp. 1070–1079. [Google Scholar]
  48. You, Y.; Chen, T.; Sui, Y.; Chen, T.; Wang, Z.; Shen, Y. Graph contrastive learning with augmentations. Adv. Neural Inf. Process. Syst. 2020, 33, 5812–5823. [Google Scholar]
49. Hao, B.; Zhang, J.; Yin, H.; Li, C.; Chen, H. Pre-training graph neural networks for cold-start users and items representation. In Proceedings of the 14th ACM International Conference on Web Search and Data Mining, Virtual Event, Israel, 8–12 March 2021; pp. 265–273. [Google Scholar]
  50. Yao, T.; Yi, X.; Cheng, D.; Yu, F.; Chen, T. Self-supervised learning for large-scale item recommendations. In Proceedings of the 30th ACM International Conference on Information & Knowledge Management, Gold Coast, QLD, Australia, 1–5 November 2021; pp. 4321–4330. [Google Scholar]
51. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
52. He, K.; Chen, X.; Xie, S.; Li, Y.; Dollar, P.; Girshick, R. Masked autoencoders are scalable vision learners. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, New Orleans, LA, USA, 18–24 June 2022; pp. 16000–16009. [Google Scholar]
  53. Kipf, T.; Welling, M. Semi-supervised classification with graph convolutional networks. arXiv 2016, arXiv:1609.02907. [Google Scholar]
  54. Hamilton, W.; Ying, Z.; Leskovec, J. Inductive representation learning on large graphs. In Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; pp. 1025–1035. [Google Scholar]
  55. Guo, K.; Hu, Y.; Sun, Y.; Qian, S.; Gao, J.; Yin, B. Hierarchical graph convolution network for traffic forecasting. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 2–9 February 2021; pp. 151–159. [Google Scholar]
56. Rendle, S.; Freudenthaler, C.; Gantner, Z.; Schmidt-Thieme, L. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the 25th Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, 18–21 June 2009; pp. 452–461. [Google Scholar]
Figure 1. The overall framework of our proposed HKRec. For the triple-graph masked autoencoder module, we introduce two masking strategies, namely pair-wise connection masking and node feature masking based on neighborhood-wise connections, to corrupt the inputs. The masked KG is then fed into the encoder–decoders and optimized through three training objectives: edge reconstruction, degree restoration, and node feature reconstruction. For the parallel cross-view contrastive learning module, we employ edge-level and neighborhood-level dropout strategies to perturb the original KG and UIG separately, creating multiple structural views; parallel contrastive learning is then performed between the KG views and the UIG views.
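To make the masking-and-reconstruction pipeline of Figure 1 concrete, below is a minimal PyTorch-style sketch of a triple-graph masked autoencoder. Every name here (`TGMAE`, the linear encoder/decoder stand-ins, the mask rates, the unweighted loss sum) is an illustrative assumption rather than the authors' released implementation; a faithful version would use a relation-aware GNN encoder, negative edge sampling, and tuned loss weights.

```python
# Hedged sketch of a triple-graph masked autoencoder (T-GMAE)-style objective:
# mask node features and pair-wise connections, then reconstruct features,
# degrees, and edges. Illustrative only; not the paper's exact architecture.
import torch
import torch.nn as nn
import torch.nn.functional as F

class TGMAE(nn.Module):
    def __init__(self, num_nodes: int, dim: int = 64):
        super().__init__()
        self.emb = nn.Embedding(num_nodes, dim)
        self.mask_token = nn.Parameter(torch.zeros(dim))
        self.encoder = nn.Linear(dim, dim)        # stand-in for a GNN encoder
        self.feat_decoder = nn.Linear(dim, dim)   # reconstructs masked features
        self.degree_decoder = nn.Linear(dim, 1)   # restores node degrees

    def forward(self, edge_index, feat_mask_rate=0.3, edge_mask_rate=0.3):
        x = self.emb.weight
        n = x.size(0)
        # 1) node feature masking: replace a random subset with a shared token
        masked_nodes = torch.rand(n) < feat_mask_rate
        x_corrupt = x.clone()
        x_corrupt[masked_nodes] = self.mask_token
        # 2) pair-wise connection masking: hide a random subset of edges
        keep = torch.rand(edge_index.size(1)) >= edge_mask_rate
        kept_edges, masked_edges = edge_index[:, keep], edge_index[:, ~keep]
        # 3) encode the corrupted graph (sum aggregation as a simple GNN proxy)
        agg = torch.zeros_like(x_corrupt)
        agg.index_add_(0, kept_edges[1], x_corrupt[kept_edges[0]])
        h = torch.relu(self.encoder(x_corrupt + agg))
        # 4) hierarchical reconstruction losses
        loss_feat = F.mse_loss(self.feat_decoder(h[masked_nodes]),
                               x[masked_nodes].detach())
        deg = torch.bincount(edge_index[1], minlength=n).float()
        loss_deg = F.mse_loss(self.degree_decoder(h).squeeze(-1), deg)
        src, dst = masked_edges          # positive-only edge reconstruction;
        loss_edge = -F.logsigmoid((h[src] * h[dst]).sum(-1)).mean()
        return loss_feat + loss_deg + loss_edge
```

As a usage sketch, `TGMAE(num_nodes, 64)(edge_index)` would return a scalar loss to be added to the recommendation objective during joint training.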
Figure 2. The overview of the parallel cross-view contrastive learning module.
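The module in Figure 2 contrasts node representations produced under two complementary augmentations. The sketch below shows one common way to realize this with an InfoNCE loss; the function names, temperature, and dropout rates are our assumptions for illustration, not the paper's exact configuration.

```python
# Hedged sketch of parallel cross-view contrastive learning: InfoNCE between
# an edge-dropout view and a neighborhood-dropout view of the same graph.
import torch
import torch.nn.functional as F

def info_nce(z1: torch.Tensor, z2: torch.Tensor, tau: float = 0.2) -> torch.Tensor:
    """z1, z2: (N, d) embeddings of the same nodes under two graph views."""
    z1, z2 = F.normalize(z1, dim=-1), F.normalize(z2, dim=-1)
    logits = z1 @ z2.t() / tau            # (N, N) cross-view similarities
    labels = torch.arange(z1.size(0))     # positives lie on the diagonal
    return F.cross_entropy(logits, labels)

def edge_dropout(edge_index: torch.Tensor, p: float = 0.2) -> torch.Tensor:
    # drop individual edges independently
    keep = torch.rand(edge_index.size(1)) >= p
    return edge_index[:, keep]

def neighborhood_dropout(edge_index: torch.Tensor, num_nodes: int,
                         p: float = 0.1) -> torch.Tensor:
    # drop all edges incident to a random node subset, removing whole
    # neighborhoods rather than isolated edges
    dropped = torch.rand(num_nodes) < p
    keep = ~(dropped[edge_index[0]] | dropped[edge_index[1]])
    return edge_index[:, keep]
```

In a parallel scheme of this kind, both dropout strategies would be applied to the KG and the UIG, and InfoNCE terms between the resulting view pairs would be summed into the overall objective.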
Figure 3. Hyperparameter sensitivity study of HKRec with respect to different connection masking rates, feature masking rates, and node dropout rates.
Figure 4. Comparison results across different user groups. A lower group number indicates scarcer historical interaction information and thus a more pronounced cold-start effect.
Figure 5. Comparison results at different item sparsity levels.
Figure 6. Visualization of item similarity on MIND.
Figure 7. Example subgraph of our selected item on MIND. Here, light green circles and light grey lines constitute the KG, while light green and light blue circles with orange lines form the UIG. Ellipses are used to indicate nodes and relations that are not displayed in the subgraphs.
Table 1. Statistics of the datasets.

| | | Last-FM | MIND | Alibaba-iFashion |
|---|---|---|---|---|
| User–item graph | #Users | 23,566 | 100,000 | 114,737 |
| | #Items | 48,123 | 30,577 | 30,040 |
| | #Interactions | 3,034,796 | 2,975,319 | 1,781,093 |
| | #Density | 2.7 × 10⁻³ | 9.7 × 10⁻⁴ | 5.2 × 10⁻⁴ |
| Knowledge graph | #Entities | 58,266 | 24,733 | 59,156 |
| | #Relations | 9 | 512 | 51 |
| | #Triplets | 464,567 | 148,568 | 279,155 |
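As a quick sanity check on the #Density row, assuming density is defined as #Interactions / (#Users × #Items) over the user–item interaction matrix, the short script below reproduces the reported values:

```python
# Recompute interaction density from the Table 1 counts, under the assumed
# definition density = interactions / (users * items).
stats = {
    "Last-FM":          (23_566, 48_123, 3_034_796),
    "MIND":             (100_000, 30_577, 2_975_319),
    "Alibaba-iFashion": (114_737, 30_040, 1_781_093),
}
for name, (users, items, interactions) in stats.items():
    print(f"{name}: {interactions / (users * items):.1e}")
# Prints 2.7e-03, 9.7e-04, and 5.2e-04, matching the table.
```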
Table 2. Performance comparisons of different methods on Last-FM, MIND, and Alibaba-iFashion. The best results are bolded and the second-best are underlined.

| Model | Last-FM Recall | Last-FM NDCG | MIND Recall | MIND NDCG | Alibaba-iFashion Recall | Alibaba-iFashion NDCG |
|---|---|---|---|---|---|---|
| BPR [56] | 0.0847 | 0.0720 | 0.0392 | 0.0264 | 0.0821 | 0.0506 |
| LightGCN [41] | 0.0716 | 0.0644 | 0.0403 | 0.0279 | 0.0999 | 0.0614 |
| CKE [11] | 0.0853 | 0.0707 | 0.0385 | 0.0276 | 0.0778 | 0.0482 |
| KGAT [19] | 0.0873 | 0.0743 | 0.0339 | 0.0294 | 0.0961 | 0.0582 |
| KGIN [23] | 0.0914 | 0.0786 | 0.0382 | 0.0245 | 0.1171 | 0.0730 |
| SGL [25] | 0.0768 | 0.0679 | 0.0339 | 0.0210 | 0.1118 | 0.0702 |
| KGCL [27] | 0.0876 | 0.0789 | 0.0352 | 0.0221 | 0.1097 | 0.0693 |
| KGRec [31] | 0.0933 | 0.0801 | 0.0414 | 0.0328 | 0.1170 | 0.0731 |
| **HKRec** | **0.0954** | **0.0829** | **0.0445** | **0.0348** | **0.1192** | **0.0743** |
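For reference, the Recall and NDCG columns in Tables 2 and 3 are standard top-K ranking metrics. The sketch below shows a typical per-user computation, assuming the common cutoff K = 20; the authors' exact evaluation code may differ in details such as tie-breaking or averaging.

```python
# Hedged sketch of per-user Recall@K and NDCG@K for top-K recommendation.
import numpy as np

def recall_ndcg_at_k(ranked_items: list, relevant: set, k: int = 20):
    """ranked_items: model's ranked item list; relevant: held-out positives."""
    topk = ranked_items[:k]
    hits = [1.0 if item in relevant else 0.0 for item in topk]
    recall = sum(hits) / max(len(relevant), 1)
    dcg = sum(h / np.log2(i + 2) for i, h in enumerate(hits))
    idcg = sum(1.0 / np.log2(i + 2) for i in range(min(len(relevant), k)))
    ndcg = dcg / idcg if idcg > 0 else 0.0
    return recall, ndcg

# e.g., recall_ndcg_at_k([5, 3, 9, 1], {3, 7}) returns (0.5, ~0.39);
# dataset-level scores average these values over all test users.
```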
Table 3. Ablation study of HKRec. "w/o ND" denotes the variant with the neighborhood-level dropout component removed, and "w/o T-GMAE" denotes the variant with the degree decoder and the masked feature decoder removed. The best results are bolded.

| Model | MIND Recall | MIND NDCG | Alibaba-iFashion Recall | Alibaba-iFashion NDCG |
|---|---|---|---|---|
| **HKRec** | **0.0445** | **0.0348** | **0.1192** | **0.0743** |
| w/o ND | 0.0438 | 0.0346 | 0.1185 | 0.0741 |
| w/o T-GMAE | 0.0432 | 0.0334 | 0.1180 | 0.0737 |
| KGRec | 0.0414 | 0.0328 | 0.1170 | 0.0731 |