This section presents the collected papers related to DKG-based RSs. Depending on how these papers integrate KGs or KGEs with DL models, we categorize them into four main groups: two-stage explainable learning methods, joint-stage explainable learning methods, path-embedding explainable learning methods, and propagation explainable methods. As stated earlier, our survey goes deeper to present the mathematical modules and algorithms employed in the collected papers. For each published paper, we highlight the drawbacks associated with the algorithm used, where necessary. A detailed survey of these papers is given in the subsequent sections. For ease of checking the literature, we summarize them in tabular form.
Table 3 presents a summary of the collected papers. In the table, under "Mode of Explainability", "WHUE" means "Written Human-Understandable Explanations", "IANM" means "Interpretability based on Attention Network Mechanism", "TSELM" stands for "Two-Stage Explainable Learning Method", "JSELM" means "Joint-Stage Explainable Learning Method", "PEELM" stands for "Path-Embedding Explainable Learning Method", and "PEM" means "Propagation Explainable Method".
3.1. The Two-Stage DKG-Based Learning Methods
In the two-stage explainable learning method, the KG or KGE modules and the recommendation module are trained separately. In the first instance, KG or KGE algorithms are integrated with DL methods and are used to learn how the entities and relations are represented. Then, along with other attributes, the information from the trained graphs is fed into the prediction or recommendation module for accurate predictions and explanations.
Table 4 provides a summary of the collected papers under this method.
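To make the two-stage idea concrete, the following sketch (an illustrative toy example, not taken from any of the surveyed papers; the function names and the simplified TransE-style update are our own assumptions) separates embedding learning from scoring:

```python
import numpy as np

# Schematic two-stage pipeline: stage 1 learns entity embeddings from
# KG triples with a toy TransE-style update (push E[h] + R[r] toward
# E[t]); stage 2 freezes the embeddings and reuses them as features
# for a separate dot-product recommender.
def train_kge(triples, n_entities, n_relations, dim=8, epochs=100, lr=0.1):
    rng = np.random.default_rng(0)
    E = rng.normal(scale=0.1, size=(n_entities, dim))   # entity embeddings
    R = rng.normal(scale=0.1, size=(n_relations, dim))  # relation embeddings
    for _ in range(epochs):
        for h, r, t in triples:
            grad = (E[h] + R[r]) - E[t]   # translation residual
            E[h] -= lr * grad
            R[r] -= lr * grad
            E[t] += lr * grad
    return E, R

def recommend_score(E, user_entity, item_entity):
    """Stage 2: frozen embeddings reused as extra features for scoring."""
    return float(E[user_entity] @ E[item_entity])

E, R = train_kge([(0, 0, 1), (1, 0, 2)], n_entities=3, n_relations=1)
score = recommend_score(E, 0, 2)
```

Note that stage 2 never touches the KG triples again, which is exactly why this family of methods scales well but cannot refine the embeddings using recommendation feedback.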
As mentioned earlier, accurate prediction and explainability are the two main factors to consider when evaluating a recommendation model, and they have become one of the basic trade-offs in XAI. In [61], Gao et al. proposed a Deep Explicit Attentive Multi-View Learning Model (DEAML) to mitigate the trade-off between accuracy and explainability by developing a deep explainable model for item recommendation (Toys and Games, and Digital Music). First, the authors leverage the Microsoft Concept KG to build a tree-like explicit feature hierarchy, where each node represents an explicit feature of an item. The model consists of two steps: the first step involves item prediction with attentive multi-view learning, while the second step generates personalized explanations. In the first instance, the embeddings of user $i$ and item $v$ are generated and used as inputs to the DEAML model. Following the method proposed in [62], the user-item feature embeddings are then trained to capture both semantic and hierarchical information so as to represent multiple user views. Given the user and item embeddings, the output of DEAML includes the predicted item ratings and personalized feature-level explanations. In mathematical detail, for user $i$ with embedding $\mathbf{u}_i^h$ for view $h$, and item $v$ with embedding $\mathbf{v}^h$ at level $h$, the authors obtain the per-view rating prediction $\hat{r}_{iv}^h$ by adding a rating bias $b$, a user bias $b_i$, and an item bias $b_v$ to the rating formula of [78]. Multiple views are then combined using the weighted sum of the predictions in each view,

$$\hat{r}_{iv} = \sum_{h} \pi_h\, \hat{r}_{iv}^{h},$$

where $\pi_h$ is the weight of view $h$. Finally, the Adam optimizer is employed to automatically adjust the learning rate by optimizing the objective function

$$\mathcal{L} = \sum_{h} \lambda_h \mathcal{L}_h + \lambda_c \mathcal{L}_{co} + \lambda_\Theta \|\Theta\|_2^2,$$

where $\mathcal{L}_h$ is the loss in view $h$, $\mathcal{L}_{co}$ is the co-regularization loss, $\lambda_h$, $\lambda_c$, and $\lambda_\Theta$ are weight parameters, and $\|\Theta\|_2^2$ is the L2 norm of all parameters in the model. The explanations presented to the user are of the form: "You might be interested in (features of ...) on which the item performs well".
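The per-view rating-plus-bias computation and the weighted multi-view combination can be sketched as follows (a minimal illustration: the dot-product base interaction and all shapes are our assumptions, not the authors' exact parameterization):

```python
import numpy as np

# Illustrative sketch of a multi-view rating: each view h contributes a
# dot-product interaction plus global, user, and item biases, and the
# per-view ratings are combined by a weighted sum with view weights pi.
def view_rating(u_h, v_h, b, b_i, b_v):
    """Per-view rating: interaction term plus bias terms."""
    return float(np.dot(u_h, v_h) + b + b_i + b_v)

def multi_view_rating(user_views, item_views, pi, b, b_i, b_v):
    """Combine per-view ratings with view weights pi."""
    scores = [view_rating(u, v, b, b_i, b_v)
              for u, v in zip(user_views, item_views)]
    return float(np.dot(pi, scores))

rng = np.random.default_rng(0)
U = [rng.normal(size=8) for _ in range(3)]   # user embeddings, one per view
V = [rng.normal(size=8) for _ in range(3)]   # item embeddings, one per view
pi = np.array([0.5, 0.3, 0.2])               # view weights
r = multi_view_rating(U, V, pi, b=3.0, b_i=0.2, b_v=-0.1)
```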
As stated in [79], typical explainable RSs include one or two layers of attention networks. Being aware of this, a novel multi-channel and word-entity-aligned algorithm, known as the Deep Knowledge-aware Network (DKN), was proposed in [63]. DKN combines word semantics, the TransD knowledge embedding method, and knowledge-level information from the Microsoft Satori KG to learn the representation of entities. Instead of concatenating the word and entity embeddings of news titles as proposed in [80], two additional Knowledge-aware CNN (KCNN) embeddings, namely "transformed entity embeddings" and "transformed context embeddings", are used together with the word embeddings. Here, the transformed embeddings help alleviate the heterogeneity between the word and entity spaces through self-learning, and the "transformed context embeddings" help identify the position of each entity in the KG. Similar to the Kim CNN [81], the three embedding matrices are aligned and stacked together as a multi-channel input. Multiple filters are then applied to the multi-channel input to extract specific local features, which are concatenated and used as the final representation of the input news title. To characterize the dynamic interests of the user, a DL attention-based network is used to model the impact of the user's clicked news titles on the candidate news. Mathematically, for a user $u$ with clicked-news history $\{t_1^u, \ldots, t_{N_u}^u\}$, the attention mechanism generates the user embedding $\mathbf{e}_u$ with respect to the candidate news $t_c$ as

$$\mathbf{e}_u = \sum_{k=1}^{N_u} s\big(t_k^u, t_c\big)\, \mathbf{e}\big(t_k^u\big),$$

where $\mathbf{e}(t_k^u)$ is the embedding of the $k$-th title in user $u$'s clicked history, and $s(\cdot,\cdot)$ is a softmax-normalized function that measures the degree of similarity between the candidate news $t_c$ and each clicked title. Finally, based on the calculated $\mathbf{e}_u$, another DL algorithm $\mathcal{G}$ is employed to predict user $u$'s preference as

$$\hat{y}_{u,c} = \mathcal{G}\big(\mathbf{e}_u, \mathbf{e}(t_c)\big),$$

where $\mathbf{e}(t_c)$ is the candidate news embedding. In summary, the superiority of DKN lies in its two main properties: (1) the use of the word-entity-aligned KCNN for sentence representation learning, which greatly preserves the association between words and entities; and (2) the use of an attention-based network to treat users' click history discriminatively, to better capture users' diverse reading interests.
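The attention step above can be sketched as follows (a simplified stand-in: we use a plain dot product where DKN uses a small neural network to score similarity, so the weights and embeddings are illustrative):

```python
import numpy as np

def softmax(x):
    z = x - np.max(x)
    e = np.exp(z)
    return e / e.sum()

# Simplified attention over clicked-news embeddings: the user embedding
# is a softmax-weighted sum of the clicked titles' embeddings, with
# weights given by similarity to the candidate news.
def user_embedding(clicked, candidate):
    sims = np.array([np.dot(t, candidate) for t in clicked])
    weights = softmax(sims)                  # attention weights sum to 1
    return weights @ np.stack(clicked)
```

Because the weights are softmax-normalized, the user embedding is always a convex combination of the clicked-history embeddings, biased toward titles most similar to the candidate.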
In [64], Huang et al. proposed the Knowledge-enhanced Sequential Recommender (KSR) model for sequential recommendation. KSR integrates a Recurrent Neural Network (RNN)-based algorithm structured as a Gated Recurrent Unit (GRU) with the TransE embedding and information from the Freebase KG for item recommendation and interpretability. The model starts with a GRU-based sequential recommender, which is then augmented with a Key-Value Memory Network (KV-MN) using entity attribute information from the Freebase KG. The KV-MN is an external memory consisting of a large array of slots that explicitly store and memorize information from the Freebase KG. The KV-MN splits the array slots into a key vector and a value vector, which are stacked together in a memory slot to capture fine-grained user preferences and improve interpretability. Here, the TransE embedding is used to learn the embeddings of both the entities and the relations of the Freebase KG, and the relation embedding is taken as the attribute key vector in the KV-MN model. In mathematical detail, given the interaction sequence of a user $u$ at time $t$, the GRU first computes the current hidden state vector $\mathbf{h}_t^u$ (also called the sequential preference representation of user $u$), conditioned on the previous hidden state $\mathbf{h}_{t-1}^u$, as

$$\mathbf{h}_t^u = \mathrm{GRU}\big(\mathbf{h}_{t-1}^u, \mathbf{q}_{i_t}\big),$$

where $\mathbf{q}_{i_t}$ is the item embedding vector pre-trained with the Bayesian Personalized Ranking (BPR) model [82]. The vector $\mathbf{h}_t^u$ is then taken as the query to the KV-MN model, where a Multi-Layer Perceptron (MLP) is adopted to implement a nonlinear transformation of the query. Using the transformed query to read the KV-MN, an attribute-based preference representation vector is then constructed with an attention mechanism over the vectors of item attributes. Vector concatenation is then used to combine the sequential and attribute-based preference representations into a single user vector $\mathbf{p}_t^u$, and likewise the item-level and attribute-level item embeddings into a single item vector $\mathbf{q}_i$. After transforming $\mathbf{p}_t^u$ and $\mathbf{q}_i$ to the same dimension, the user's preference for item $i$ is computed as the ranking score $s(u, i) = \mathbf{p}_t^{u\top} \mathbf{q}_i$. The KSR recommendation model is also highly interpretable: one can check the user's attention weights over explicit attributes. For instance, if the "actor" attribute dominates the attention weights for a recommended movie, the recommendation was produced based on that feature. Such feature-level attention weights thus reflect the user's explicit preferences.

In [65], Wang et al. proposed the joint KG and user preference model (JKP) for explainable recommendation. The authors employ an MLP for both the representation of the embeddings and the item recommendation. According to the data presented by the authors, JKP outperforms DKN in terms of AUC. Although this claim may be valid, the recommendations may not be trustworthy, because entity representation and item recommendation are jointly trained with the same MLP algorithm, and this "MLP joint-training" process can negatively influence the prediction of the user's preference for candidate items.
Overall, the two-stage DKG learning methods are straightforward to execute, as the KGEs can quickly be learned without necessarily interacting with the recommendation data. The KG embeddings are usually treated as extra attributes for the subsequent recommendation module. Thus, large-scale interaction datasets can be learned separately for the recommendation, thereby reducing the computational cost. In addition, it becomes unnecessary to repeat the learning process or update the embeddings once they are learned. In all, this method is easy to implement and highly scalable. However, it suffers from improper embeddings, is more applicable to in-graph applications such as link prediction than to recommendation tasks, and mostly lacks an end-to-end manner.
3.2. The Joint-Stage DKG-Based Learning Methods
In this trend, information from multiple sources, including side information, can be learned and aggregated to produce the final recommendation. Thus, this method leverages multi-modal information, including images, text, and ratings, for top-N recommendations. We provide a summary of the reviewed publications under this method in
Table 5.
In [66], Ai et al. designed an explainable RS (ECFKG) by integrating the traditional CF framework with the learning of a KG for Amazon product recommendations. First, the model is embedded with an automatically built graph containing entities, user behaviors, and a set of minimal features (e.g., produced_by, bought_together, also_viewed). Then the design philosophy of CF and the TransE KGE are employed to learn over the KG for personalized recommendation. Based on a fuzzy soft matching algorithm (Fuzzy-SMA), the authors then conduct fuzzy reasoning over the paths in the product KG to generate personalized explanations. In all, two main objectives are identified: for each user $u$, a set of items $i$ that user $u$ is most likely to purchase is identified; and for each retrieved user-item pair, a natural-language sentence is developed to explain why the user should buy the item. In mathematical detail, the product KG is first constructed as a set of triples $(h, r, t)$, where $h$ and $t$ are the head and tail entities, respectively, and $r$ is the relationship between them. Each entity is then projected into a low-dimensional latent space, and each relation is treated as a translational function for entity conversion. Latent vectors $\mathbf{h}$ and $\mathbf{t}$ are constructed for the head and tail entities, respectively, and the relation is modeled as a linear translation of the form $\mathbf{t} \approx \mathbf{h} + \mathbf{r}$. The entity embeddings are then learned by joining the entities in the product KG with the translational function in the latent space, while the translation model is learned by optimizing the generative probability of the observed triples, using the negative sampling model proposed in [83] and a log-likelihood approximation. The second stage conducts soft matching between the tail entities and the translation model to explain recommended items. This is done by constructing an explanation path between the user and the item in the latent space. The algorithm first conducts a breadth-first search (BFS) of maximum depth $z$ from both the user and the item entities to search for an explanation path that can potentially exist between them. Given an intermediary entity $e$, the paths between the user and the item through $e$ are memorized, and the probability of each path is computed with a soft matching equation over the sets of relations and tail entities associated with the user and the item. The best explanation can then be obtained by ranking all the path probabilities, and the top-ranked path is selected for the natural-language explanation.
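The explanation-path search can be illustrated with a plain BFS over a toy product graph (the graph, entity names, and edges are invented for the example; the actual model searches in the latent space with soft matching rather than over raw edges):

```python
from collections import deque

# Toy breadth-first search for an explanation path of bounded depth
# between a user entity and an item entity through intermediary
# entities, in the spirit of the path search described above.
def bfs_path(graph, start, goal, max_depth):
    queue = deque([(start, [start])])
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        if len(path) > max_depth:
            continue
        for nxt in graph.get(node, []):
            if nxt not in path:                 # avoid cycles
                queue.append((nxt, path + [nxt]))
    return None

graph = {
    "user_1": ["item_A", "brand_X"],   # e.g. bought, interested_in
    "brand_X": ["item_B"],             # e.g. produces
    "item_A": ["item_B"],              # e.g. also_viewed
}
path = bfs_path(graph, "user_1", "item_B", max_depth=3)
```

BFS returns the shortest such path first, which is convenient for explanations since shorter user-item paths tend to be easier to verbalize.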
In [58], the authors exploit the potential of the Semantic-Aware Autoencoder (SemAuto) [59] to develop content-based explainable recommendations (ESemAuto) for movies. In their algorithm, the structure of the DBpedia KG is first combined with an Autoencoder Neural Network (Auto-NN), whose structure is constructed to mimic the existing connections in the DBpedia KG. User ratings are then fed into the Auto-NN to extract the weights associated with the hidden neurons, which are employed to compute the recommendations. Mathematically, to exclude the item-feature connections that do not exist in the KG, the autoencoder is trained via feedforward and backpropagation using a masking multiplier matrix $M$, whose rows and columns represent items and features, respectively; the hidden and output layers are computed with the KG-derived mask applied element-wise to the weight matrices. A modified stochastic gradient descent (SGD) is then used to learn the hidden neuron weights with respect to the mean squared error $E$. As each hidden neuron corresponds to an entity in the DBpedia KG, the pre-trained weights are used as indicators representing the significance of the corresponding entity for the user. After training the autoencoder for each user, the semantic autoencoder algorithm originally proposed in [59] is adopted to provide explanations for the top-$N$ movie recommendations. To describe movies, the authors use a set of predicates of the form (dct:subject, dct:starring, dct:director, dct:writer); then, relying on the weights associated with the features in the user's profile, a human-understandable explanation is constructed. The explanations are presented in three forms. Popularity-based: "We suggest these items since they are very popular among people who like the same movies as you." Point-wise personalization: "We guess you would like to watch items X and Y since they are about F." Pair-wise personalization: "We guess you would like to watch X more than Y because you may prefer R."
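The masking idea can be sketched as a minimal forward pass (sigmoid activations, random weights, and the binary mask `M` standing in for the KG-derived connections are all our assumptions for illustration):

```python
import numpy as np

# Minimal forward pass with a KG-derived mask: M (items x features) is
# a binary matrix that zeroes out weights for item-feature pairs not
# linked in the graph, so each hidden "entity" neuron only receives
# input from the items connected to it in the KG.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def masked_forward(x, W_in, W_out, M):
    hidden = sigmoid(x @ (W_in * M))          # mask removes non-KG links
    output = sigmoid(hidden @ (W_out * M.T))  # mirrored mask on decode
    return hidden, output

rng = np.random.default_rng(1)
n_items, n_features = 5, 3
M = (rng.random((n_items, n_features)) > 0.5).astype(float)
x = rng.random(n_items)                       # one user's rating vector
h, y = masked_forward(x,
                      rng.normal(size=(n_items, n_features)),
                      rng.normal(size=(n_features, n_items)), M)
```

After training, the hidden activations `h` play the role of the per-user entity weights from which the explanations are built.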
Contrary to the two-stage methods, the joint-stage DKG-based methods can be trained in an end-to-end manner. In this way, the module used for recommendation can also guide the attribute-learning process of the graph-embedding module. However, these methods require effort to aggregate different models under one framework. For instance, weights from different objective functions of the KG are often combined, fine-tuned, and used to regularize the RS, which can introduce bias due to regularization.
3.3. The Path-Embedding DKG-Based Learning Methods
In this method, explicit embeddings of various patterns of user-item or item-item connections in the KG are explored to provide additional guidance for the recommendation. A summary of the reviewed publications under this method is given in
Table 6.
In [67], the authors integrated two rule-based modules (an induction-rule module and a recommendation-rule module) to generate an effective and explainable recommendation model (RuleRec). The induction rules, mined from the Freebase KG, utilize multi-hop relational patterns to infer associations among different items for model predictions and explanations. Mathematically, given a rule between an item pair $(a, b)$, the authors define a probabilistic term $P$ for the specific paths between $a$ and $b$: the probability of reaching node $b$ from an intermediate node $e$ with a one-hop random walk, given a relation, is accumulated along the rule's relation sequence over the set of nodes reachable from node $a$. Given the $i$-th entry of the resulting rule-feature vector and a target vector of a stochastic variable $A$, a Chi-square objective function is employed to choose the top-$n$ useful rules from the derived rules, where the $i$-th entry $w_i$ of the weight vector $\mathbf{w}$ represents the importance of each rule. After generating the correct paths, the induced rules are augmented with the recommendation module to provide better generalization capacity. In detail, each user/item is represented with a vector of latent features by separately using a modified BPR Matrix Factorization (BPRMF) model [82] and a refined version of the Neural Collaborative Filtering (NCF) algorithm [84] for item prediction. Here, the goal is to predict an item list from a set of items for user $u$ based on a preference score whose base predictive function can be replaced by either the BPRMF or the NCF predictive function, thus obtaining two separate predictive models for performance verification; the score additionally involves a parameterized function of the rule weights $\mathbf{w}$ and an indicator function over (at least a fraction of) the training item pairs. An objective function for the recommendation model is then defined accordingly. Finally, a joint optimization framework that combines the rule-learning module and the recommendation module is employed to conduct joint learning. The authors then show how the proposed model provides explainability for the recommended items.
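The one-hop random-walk probability underlying the rule-mining step can be illustrated as follows (a toy uniform random walk over an adjacency map; the graph and relation names are invented for the example):

```python
# Toy one-hop random-walk probability for rule mining: from entity e,
# a walker following relation r moves uniformly over e's r-neighbors,
# so the probability of reaching b is its share of that neighbor list.
def one_hop_prob(graph, e, r, b):
    neighbors = graph.get((e, r), [])
    if not neighbors:
        return 0.0
    return neighbors.count(b) / len(neighbors)

graph = {("a", "also_bought"): ["b", "c"],
         ("c", "produced_by"): ["brand_X"]}
p = one_hop_prob(graph, "a", "also_bought", "b")
```

Chaining such one-hop probabilities along a rule's relation sequence yields a path probability that can serve as a rule feature for item pairs.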
In [68], Weiping et al. proposed a knowledge-aware model (EKAR) for path-based explainable recommendation of movies and books. The authors utilize deep reinforcement learning and a Markov Decision Process (MDP) on the user-item-entity graph to generate explainable paths. Treating the sequence of visited nodes and edges in the user-item-entity graph as the state $s_t$, a policy network consisting of two fully connected layers is developed to produce a probability distribution over the possible action space $A_t$. A success reward is then defined to depict the agent's success in finding those items consumed by the target user $u$ in history. The rewards are finally augmented with a pre-trained KGE to stabilize training and explore diverse recommendation paths. In mathematical detail, choosing each action as the concatenation of relation and entity embeddings, the state is first encoded. Then, based on the parameterized state $\mathbf{s}_t$ and parameterized actions $\mathbf{a}_t$, the probability distribution over the possible action space is computed as

$$\pi(a_t \mid s_t) = \mathrm{softmax}\Big(\mathbf{a}_t^{\top}\, \sigma\big(\mathbf{W}_2\, \sigma(\mathbf{W}_1 \mathbf{s}_t + \mathbf{b}_1) + \mathbf{b}_2\big)\Big),$$

where $\mathbf{W}_1$, $\mathbf{W}_2$ and $\mathbf{b}_1$, $\mathbf{b}_2$ are the weight matrices and weight vectors of the neural network, $\sigma$ is a nonlinear activation function, and $\pi(a_t \mid s_t)$ is the probability of action $a_t$ given state $s_t$. To accelerate the training process, the success reward [85] is generated to encourage the agent to explore items that have not been purchased: a sigmoid function computes the correlation between the target user and the searched item under a special relation called "interact" in the user-item-entity graph. The policy gradient method [86] is finally employed to optimize the policy network. Explanations for recommended items are latently provided by the user-item paths, which the authors demonstrate in their paper.
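A two-layer policy network of this kind can be sketched as follows (the weight shapes, the ReLU activation, and the dot-product action scoring are our assumptions for illustration, not the paper's exact architecture):

```python
import numpy as np

# Illustrative two-layer policy network over candidate actions
# (relation-entity pairs encoded as vectors): the state passes through
# two dense ReLU layers, each action embedding is scored by a dot
# product, and the scores are softmax-normalized into a distribution.
def policy(state, actions, W1, b1, W2, b2):
    h = np.maximum(0.0, W1 @ state + b1)     # first fully connected layer
    scores = actions @ (W2 @ h + b2)         # score each candidate action
    z = scores - scores.max()                # numerically stable softmax
    e = np.exp(z)
    return e / e.sum()

rng = np.random.default_rng(2)
state = rng.normal(size=4)
actions = rng.normal(size=(5, 6))            # 5 candidate actions
probs = policy(state, actions,
               rng.normal(size=(3, 4)), rng.normal(size=3),
               rng.normal(size=(6, 3)), rng.normal(size=6))
```

Sampling the next edge from `probs` (rather than taking the argmax) is what lets the agent explore diverse paths during training.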
In [26], Xian et al. proposed the Policy-Guided Path Reasoning (PGPR) model for recommendation and interpretability through actual paths in the Freebase KG. PGPR combines a Markov Decision Process and Reinforcement Learning with a multi-hop path-searching agent that learns (through a policy-guided algorithm) to navigate from a user to potentially "good" items, conditioned on the starting user in the KG. In detail, given a state $s_t = (u, e_t, h_t)$, where $u$ is the starting user entity and $e_t$ is the entity reached by the agent with history $h_t$ prior to step $t$, an action space $A_t$ of the KG is first constructed. Based on $A_t$ and a scoring function $f((r, e) \mid u)$ that maps any edge $(r, e)$ to a real-valued score conditioned on the user, a user-conditional pruned action space is created as

$$\tilde{A}_t(u) = \big\{(r, e) \mid \mathrm{rank}\big(f((r, e) \mid u)\big) \le \alpha,\ (r, e) \in A_t\big\},$$

to effectively maintain potentially "good" paths conditioned on the starting user $u$, where $\alpha$ is a known upper bound on the size of $\tilde{A}_t(u)$. A soft probabilistic reward is then constructed, based on the terminal state, to encourage the agent to explore as many "good" paths as possible. Based on the above Markov Decision Process, optimal paths are obtained by learning a stochastic policy $\pi$ that maximizes the expected cumulative discounted reward for any starting user $u$,

$$J(\theta) = \mathbb{E}_{\pi}\left[\sum_{t=0}^{T-1} \gamma^{t} R_{t+1}\right],$$

where $\gamma$ is the discount factor. The authors solve the optimization problem through reinforcement learning by designing a policy network and a value network that share the same feature layers. A "beam search" algorithm guided by the $T$-hop generative probabilities and path rewards is finally employed to explore candidate paths and recommended items for each user. Interpretations of recommended items are based on the highest generative probabilities.
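The user-conditional pruning step can be illustrated as follows (the scores are precomputed stand-ins for the paper's user-conditioned scoring function):

```python
import numpy as np

# Sketch of user-conditional action pruning: keep only the alpha
# highest-scoring outgoing edges, preserving their original order.
def prune_actions(actions, scores, alpha):
    keep = np.argsort(np.asarray(scores))[::-1][:alpha]
    keep = sorted(keep)                      # preserve original ordering
    return [actions[i] for i in keep]

actions = [("directed_by", "e1"), ("starring", "e2"),
           ("genre", "e3"), ("also_viewed", "e4")]
kept = prune_actions(actions, [0.1, 0.9, 0.5, 0.3], alpha=2)
```

Bounding the action space this way keeps the policy network's output dimension fixed even when some KG entities have thousands of outgoing edges.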
In [27], Huang et al. proposed an Explainable Interaction-driven User Modeling (EIUM) algorithm for movie recommendation by extracting and encoding semantic paths between user-item pairs through the MovieLens (IMDB) dataset and the Freebase KG. To learn the KG, the authors developed a multi-modal fusion mechanism, wherein textual, visual, and structural features of items are extracted and incorporated into a network of multi-modal fusion constraints for user-item preference representation. A weighted pooling layer (WPL) is utilized to learn and integrate the different contributions of each path between user-item pairs. A sequence model is then constructed to learn the semantic representation of the paths between user-item pairs; it consists of a self-attention mechanism with a position-encoding module capable of capturing long-distance dependencies of sequences with various lengths. The two learning modules, i.e., the multi-modal fusion module and the sequential recommendation module, are then combined for joint learning. In mathematical detail, for the multi-modal fusion module, the textual features $\mathbf{f}^{text}$ and visual features $\mathbf{f}^{vis}$ are extracted using fastText and AlexNet, respectively, and fused as

$$\mathbf{m} = \sigma\big(\mathbf{W} [\mathbf{f}^{text} \oplus \mathbf{f}^{vis}] + \mathbf{b}\big),$$

where $\oplus$ denotes concatenation, $\mathbf{W}$ and $\mathbf{b}$ are learning parameters, and $\sigma$ is a nonlinear activation function. The structural features of entities and relations are learned by imposing a set of structural constraints on the TransE KGE [17]. For the sequence module, given a user's interaction sequence, the authors represent the sequential interaction of each entity, and compute the user preference interaction as the attention-weighted combination

$$\mathbf{p}_u = \sum_{k} \alpha_k\, \mathbf{h}_k,$$

where $\alpha_k$ is the attention score of the $k$-th interaction, computed with respect to a predefined relation between user $u$ and item $i$, and the output embedding $\mathbf{p}_u$ represents the user's dynamic preference. Finally, joint learning is performed by minimizing the combined objective function of the two modules, $\mathcal{L} = \mathcal{L}_{rec} + \lambda \mathcal{L}_{KG}$, where $\lambda$ is the regularization parameter, $\mathcal{L}_{rec}$ is the loss function of the recommendation module, which follows a cross-entropy function of a distance function with the ground-truth item $v$ scored higher than all other items, and $\mathcal{L}_{KG}$ is the loss function of the KG, formed by combining the loss functions of the multi-modal fusion constraints. EIUM predicts the user's item preference with diversified semantic paths to offer path-level explanations. Thus, explanations are based on user-item interactive paths carrying different semantic information, for instance: "The movie m5 is recommended since it is the sequel to movie m3 you have watched."
In [69], Wang et al. proposed the Knowledge-aware Path Recurrent Network (KPRN) to exploit user-item interactions for explainable movie and music recommendation, employing the MovieLens and KKBox datasets and the IMDb KG. Specifically, qualified paths between user-item pairs are extracted over the KGs, and a Long Short-Term Memory (LSTM) network is adopted to model the sequential dependencies among entities and relations. After that, a pooling operation, in the form of an attention mechanism, is employed to aggregate the path representations and capture the prediction signals for user-item pairs. Based on the attention mechanism, path-wise explanations for recommended items are then generated. In mathematical detail, given a user-item pair $(u, i)$ and the set $\mathcal{P}(u, i) = \{p_1, \ldots, p_K\}$ of all qualified paths connecting user $u$ and item $i$, the KPRN model takes the paths as input and outputs a score $\hat{y}_{ui} = f_{\Theta}(u, i \mid \mathcal{P}(u, i))$ signifying the degree of interaction between $u$ and $i$, where $f_{\Theta}$ denotes the KPRN model with parameters $\Theta$. To generate the KPRN model, the authors employ a three-step strategy. In the first instance, the sequential embeddings of entities and relations are generated along each path and concatenated, where each element represents an entity or a relation. An RNN in the form of an LSTM is then employed to explore the sequential embeddings and generate a unified representation encoding the holistic semantics of the path. In the last stage, a weighted pooling operation combines the scores of the multiple paths into the final prediction score

$$\hat{y}_{ui} = \sigma\!\left(\log \sum_{k=1}^{K} \exp\!\left(\frac{s_k}{\gamma}\right)\right),$$

where $s_k$ is the prediction score of the $k$-th of the $K$ paths, $\sigma$ is the sigmoid function, and $\gamma$ is a hyperparameter controlling the exponential weights. Treating the recommendation as a binary classification problem (i.e., an observed user-item interaction is assigned 1, otherwise 0), a negative log-likelihood loss function is employed to learn the model, with the objective

$$\mathcal{L} = -\sum_{(u,i) \in \mathcal{O}^{+}} \log \hat{y}_{ui} \;-\; \sum_{(u,j) \in \mathcal{O}^{-}} \log\big(1 - \hat{y}_{uj}\big),$$

where $\mathcal{O}^{+}$ and $\mathcal{O}^{-}$ are, respectively, the positive and negative interaction pairs. Path-wise explanations are given in the form: "Shakespeare in Love is recommended since you have watched Rush Hour acted by the same actor Tom Wilkinson".
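The weighted pooling of path scores can be sketched as follows (assuming the log-sum-exp form with hyperparameter gamma described above, followed by a sigmoid for the binary-classification view):

```python
import numpy as np

# KPRN-style weighted pooling sketch: per-path scores s_k are fused via
# log-sum-exp with temperature gamma; larger gamma smooths the pooling,
# smaller gamma lets the strongest path dominate. The final sigmoid
# maps the fused score to an interaction probability.
def weighted_pool(path_scores, gamma):
    s = np.asarray(path_scores, dtype=float)
    fused = np.log(np.sum(np.exp(s / gamma)))
    return 1.0 / (1.0 + np.exp(-fused))
```

With a single path and `gamma = 1`, the pooling reduces to a plain sigmoid of that path's score, which is a useful sanity check.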
In [
70], Sun et al. proposed the Recurrent Knowledge Graph Embedding (RKGE) model to learn the semantic representations of entities and paths for user-item preference. In brief, RKGE uses an RNN architecture to model the semantics of the paths that link the same entity pair in the KG for user-item recommendation. The characterization of user preference towards an item is performed via a pooling operator that discriminates the saliency of different paths, and SGD with a binary cross-entropy loss (BCELoss) is employed for model optimization. The main advantage of RKGE is that it can automatically mine the connection patterns between entities in the KG to improve recommendation performance. Nonetheless, RKGE uses only one type of neural network to encode the path embeddings, which cannot exhaustively extract path features, inhibiting further improvement of the recommendation performance.
To overcome the limitations of the RKGE model [70], Li et al. [71] proposed the Deep Hybrid Knowledge Graph Embedding (DHKGE) model for top-N recommendation. DHKGE encodes the path embeddings between users and items by integrating a CNN with an LSTM network. It then utilizes an attention mechanism to distinguish the significance of the multiple semantic paths between user-item pairs: the attention mechanism aggregates the encoded path representations into a hidden state vector, and a proximity score computes the "closeness" between the target user and candidate items to generate the top-N recommendation. DHKGE outperforms RKGE in terms of accuracy.
Overall, the path-embedding methods explicitly learn the embeddings of connection patterns. They encode the patterns of each user-item pair into latent vectors, thus making it possible to consider the effect of the target user's mutual relationships and provide richer semantics. Most of the models under the path-embedding methods can capture the connection patterns by automatically enumerating the paths. However, for large datasets with complex relations, it is impossible to mine all connection paths one by one without the help of pre-defined meta-structures.
3.4. The Propagation DKG-Based Learning Methods
Propagation-based methods exploit the information in a KG by combining the representation of entities and relations with embeddings and higher-order connection patterns for personalized explainable recommendations. The main idea is embedding propagation, with graph neural networks as the basic implementation technique. These methods refine the entity representations by combining the embeddings of multi-hop neighbors in the KG, allowing the user's preferences to be predicted from rich representations of users and items. We present the most recently published papers under this method, with a summary presented in
Table 7.
Recently, graph convolutional networks (GCNs) have been deemed a state-of-the-art method for graph representation learning. In [72], Yang et al. employed GCNs to develop a four-step hierarchical explainable attention GCN model called HAGERec, to explore users' preferences from the higher-order structural connections of a heterogeneous KG. HAGERec simultaneously learns user and item representations via a bi-directional information propagation strategy to exploit semantic information. In detail, the multi-directional connections among users $u$ and their entities, and among items $v$ and the item entities, are first formulated using a multi-hop algorithm and a "flatten" operation to compress higher-order relations and embed the entities and relations of the KG into a vector space. User and item representations are then aggregated into embedding vectors in the second step: the sampled neighbor entities of each user and item are combined with the central node through vector concatenation, weighted by an attention parameter $h$ that is learned during model training, and processed by an MLP layer with its own weights and biases. The third step employs the sampled neighbor entities to construct interaction signals that preserve the structure of entities and their neighborhood networks, presenting more complete representations of users and items. The predicted probability $\hat{y}_{uv}$ of a user $u$ engaging with an item $v$ is finally constructed in the fourth step from the aggregated user and item representations, and the model objective is constructed by optimizing the BPR loss function [82] to learn the parameters of the HAGERec model as

$$\mathcal{L} = \sum_{(u, v^{+}, v^{-})} -\ln \sigma\big(\hat{y}_{uv^{+}} - \hat{y}_{uv^{-}}\big) + \lambda \|\Theta\|_2^2,$$

where $v^{+}$ and $v^{-}$ denote positive and negative interactions, respectively, $\Theta$ denotes the parameters estimated by minimizing $\mathcal{L}$, and $\lambda \|\Theta\|_2^2$ is the regularization term. To explain user preferences, HAGERec leverages an attention-based relation propagation algorithm to build knowledge-aware connectivity that utilizes an attention score to infer the reason behind users' preferences.
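The BPR pairwise objective used here (and by several other models in this section) can be sketched per training triple as follows (the dot-product scores and single-sample form are illustrative):

```python
import numpy as np

# Per-triple BPR loss sketch: for a (user, positive item, negative
# item) triple, the loss is the negative log-sigmoid of the score
# margin, so it shrinks as the positive item outranks the negative one.
def bpr_loss(u, pos, neg):
    margin = float(u @ pos - u @ neg)
    return float(-np.log(1.0 / (1.0 + np.exp(-margin))))
```

Summing this quantity over sampled triples (plus an L2 term on the parameters) gives the full objective; a zero margin yields exactly ln 2, and the loss decays toward zero as the margin grows.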
To provide a better recommendation with user-item side information, it is important to explicitly model higher-order relations in a collaborative KG (CKG). In [
73], Wang et al. proposed the Knowledge Graph Attention Network (KGAT) to explicitly model higher-order user-item connectivities in a hybrid CKG for an explainable recommendation. The KGAT framework utilizes a three-step strategy to model higher-order user-item relations. In the first stage, given a triple
, the TransR embedding is employed to learn the embeddings of each entity and relation by optimizing the translation
through a plausibility score
given as
where
,
are the projections of the head and tail entity embeddings
, respectively in the relation’s
r space, and
denotes the transformation matrix of relation
r. A pairwise ranking loss is then used to train the TransR embeddings. The loss function is formulated as
In the second stage, using a model similar to the GCN architecture, the generated entity embeddings are recursively propagated and updated in the CKG based on the embeddings of neighboring entities to capture high-order connectivities. A neural attention mechanism is then employed to learn the weights associated with neighboring entities during the propagation. The degree of information propagation from $h$ to $t$ is formulated through a nonlinear activation equation given by
$$\pi(h, r, t) = (\mathbf{W}_{r}\mathbf{e}_{t})^{\top} \tanh\big(\mathbf{W}_{r}\mathbf{e}_{h} + \mathbf{e}_{r}\big),$$
where $\pi(h, r, t)$, softmax-normalized over the neighborhood of $h$, controls the decay of each propagation on the triple $(h, r, t)$. Multiple representations are obtained for user and item nodes $u$ and $i$ after $N$-layer propagations, which are then concatenated to form the final user and item representations $\mathbf{e}_{u}^{*}$ and $\mathbf{e}_{i}^{*}$, respectively. Finally, the user preference for an item is predicted through the matching score $\hat{y}(u, i)$ as
$$\hat{y}(u, i) = {\mathbf{e}_{u}^{*}}^{\top} \mathbf{e}_{i}^{*}.$$
Justification for item recommendation can be interpreted quantitatively by observing the attention weights.
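The attentive propagation step admits a similarly compact sketch. The NumPy fragment below illustrates the softmax-normalized attention scores $\pi(h, r, t)$ over one head entity's neighbors; the names and dimensions are ours, not KGAT's:

```python
import numpy as np

def attention_weights(e_h, e_r, W_r, neighbor_tails):
    """pi(h,r,t) = (W_r e_t)^T tanh(W_r e_h + e_r), normalized with a
    softmax over the neighborhood of h; the weights control how much
    information each neighbor t contributes during propagation."""
    query = np.tanh(W_r @ e_h + e_r)
    scores = np.array([(W_r @ e_t) @ query for e_t in neighbor_tails])
    exp = np.exp(scores - scores.max())   # numerically stable softmax
    return exp / exp.sum()

rng = np.random.default_rng(1)
d, k = 4, 3
W_r = rng.normal(size=(k, d))
e_h, e_r = rng.normal(size=d), rng.normal(size=k)
tails = [rng.normal(size=d) for _ in range(3)]  # 3 neighboring tail entities
w = attention_weights(e_h, e_r, W_r, tails)
```

Inspecting which neighbor receives the largest weight is precisely the quantitative interpretation mechanism referred to above.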
In [74], Shimizu et al. employed the KGAT model [73] to develop an improved model-intrinsic knowledge-based explainable recommendation framework (KGAT+) for real-world services. KGAT+ addresses the computational complexity of the conventional KGAT model while maintaining high accuracy and interpretability. To achieve this, the authors proposed a five-step learning algorithm. In the first step, the massive volume of KG side information consisting of one-to-many relationships is compressed by employing a latent class model and class membership probabilities. The latent class is treated as an entity, whereas the class membership probability is the relational strength between head and tail entities. In mathematical detail, for a set of targeted relations belonging to a compressed collaborative knowledge graph (CKG), a latent class $c$ is introduced between the head entity $h$ and the tail entity $t$. Based on these parameters, the membership probability is computed from the probabilities with which $c$ connects $h$ and $t$, respectively. The TransR KGE method is then used in the second stage to obtain the entity representations, and the parameters are updated based on the positive and negative triplets $(h, r, t)$ and $(h, r, t')$, respectively, by minimizing a pairwise ranking loss $\mathcal{L}_{KG}$ analogous to that of KGAT,
where $\sigma(\cdot)$ is a sigmoid function and the loss additionally carries the probabilistic class-membership term. In the third stage, the attention weight of each triplet is computed with a normalized nonlinear activation equation in which this probabilistic term multiplies the one proposed in the conventional KGAT. After learning the embedded representations, they are concatenated to form the final user and item representations $\mathbf{e}_{u}^{*}$ and $\mathbf{e}_{i}^{*}$, respectively, and the user preference for an item is computed as $\hat{y}(u, i) = {\mathbf{e}_{u}^{*}}^{\top} \mathbf{e}_{i}^{*}$. The BPR loss is then formulated as
$$\mathcal{L}_{CF} = \sum_{(u,i,j)} -\ln \sigma\big(\hat{y}(u, i) - \hat{y}(u, j)\big),$$
where $(u, i)$ and $(u, j)$ are positive and negative sample pairs, respectively. The entire model is finally optimized using the combined loss function
$$\mathcal{L} = \mathcal{L}_{KG} + \mathcal{L}_{CF} + \lambda \lVert \Theta \rVert_{2}^{2},$$
where $\Theta$ is the set of parameters estimated by minimizing $\mathcal{L}$, and $\lambda \lVert \Theta \rVert_{2}^{2}$ is the regularization term. In the fifth and final stage, the relations compressed into soft probabilities by the latent class model are restored to their original state by calculating the inner product of the normalized nonlinear activation functions. The restoration permits the model to interpret the same connection as the original data instead of an ambiguous interpretation of the connection with the latent class. In brief, each relation is assigned an attention weight indicating its significance in deciding on a user-item recommendation. Thus, by observing these attention weights, the justification for item recommendation can be interpreted quantitatively.
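The prediction and joint optimization steps can be sketched as follows. This is an illustrative NumPy fragment assuming, as in KGAT, that per-layer embeddings are simply concatenated before scoring; all names, toy vectors, and the placeholder value for $\mathcal{L}_{KG}$ are ours:

```python
import numpy as np

def final_representation(layer_embs):
    """Concatenate the N per-layer embeddings into e* = e(0) || ... || e(N)."""
    return np.concatenate(layer_embs)

def matching_score(u_layers, i_layers):
    """y(u, i) = e_u*^T e_i*."""
    return final_representation(u_layers) @ final_representation(i_layers)

def combined_loss(l_kg, bpr_pairs, params, lam=1e-5):
    """L = L_KG + L_CF + lam * ||Theta||_2^2, where L_CF sums
    -ln sigma(y(u,i) - y(u,j)) over (positive, negative) score pairs."""
    l_cf = sum(-np.log(1.0 / (1.0 + np.exp(-(yp - yn)))) for yp, yn in bpr_pairs)
    reg = lam * sum(np.sum(p ** 2) for p in params)
    return l_kg + l_cf + reg

# Toy example: two propagation layers per node.
u = [np.array([1.0, 0.0]), np.array([0.5, 0.5])]
i_pos = [np.array([1.0, 0.0]), np.array([0.5, 0.5])]
i_neg = [np.array([-1.0, 0.0]), np.array([0.0, -0.5])]
y_pos, y_neg = matching_score(u, i_pos), matching_score(u, i_neg)
total = combined_loss(0.3, [(y_pos, y_neg)], [np.ones(4)])
```

Concatenating the layer-wise embeddings lets the matching score draw on connectivity patterns of every propagation depth at once, which is the design rationale shared by KGAT and KGAT+.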
In [75], Wang et al. proposed RippleNet, an end-to-end framework for knowledge-aware recommendation. In brief, RippleNet propagates users’ preferences directly over edges in the KG via an attention mechanism and then interprets the recommendations according to the attention scores. However, RippleNet suffers from high computational costs. The model also relies on post-hoc analysis of the attention scores, which may not always be trustworthy. In addition, RippleNet treats the user’s historical interests as a “seed set,” which is then iteratively extended along KG links to discover the user’s hierarchical potential interests with respect to a candidate item. The cold-start problem can therefore persist, especially for fresh-start users who have no historical interests. Moreover, the model does not fully explore the semantics of the relations between entities, leading to possible information loss during message passing. HAGERec [72] resolves the latter drawback and the computational cost problem effectively.
In contrast to the RippleNet model, AKUPM [76] employs the TransR embedding method to assign the entities in the KG different representations under various relations. AKUPM is designed to predict the click-through rate for a user-item pair, wherein information about the user is enriched by entities related to the user’s clicked items. Specifically, the entities to be integrated into the model are initialized as the user’s clicked items and then propagated from near to far along relations in the KG to inject rich entities. In doing so, the model arrives at rich entities from which the user’s potential interests can be inferred. In addition, AKUPM utilizes the self-attention mechanism proposed in [87] to assign weights to entities during the propagation process, which provides attention-based interpretability for the user’s potential interests. However, similar to RippleNet, AKUPM does not resolve the cold-start issue, as it still requires the user’s historical data for the model to function.
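Assuming [87] denotes the scaled dot-product self-attention of the Transformer, the weighting AKUPM applies over propagated entities can be sketched as below. This is a minimal single-head version in which the learned query/key/value projections are omitted (treated as identity) for brevity:

```python
import numpy as np

def self_attention(X):
    """Scaled dot-product self-attention softmax(X X^T / sqrt(d)) X over
    an n x d matrix X of entity embeddings; returns the attention
    weights (one row per entity) and the re-weighted embeddings."""
    n, d = X.shape
    scores = X @ X.T / np.sqrt(d)
    scores -= scores.max(axis=1, keepdims=True)  # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)
    return weights, weights @ X

rng = np.random.default_rng(2)
X = rng.normal(size=(5, 8))        # 5 propagated entities, dimension 8
weights, attended = self_attention(X)
```

The rows of `weights` are what yield the attention-based interpretability: each entity's weight signals how strongly it contributes to the user's inferred potential interests.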
The main aim of the propagation methods is to aggregate the embeddings of both entities and relations, together with the higher-order connection patterns, for personalized recommendations. However, these models are usually computationally costly for large graphs and can fail to converge.