1. Introduction
Academic recommendation systems have developed rapidly in recent years; effective systems can alleviate information overload and help researchers quickly find relevant literature. Academic resource platforms such as Baidu Scholar and Google Scholar provide content-rich recommendation pages that list related papers.
Content-based, collaborative filtering, and graph-based methods are the most widely used approaches for paper recommendation; hybrid approaches combine two or more of them. Content-based approaches compute textual similarity to produce a recommendation list, typically using topic modeling [1], word embedding [2], word frequency analysis [3], or a combination of word- and sequence-modeling approaches [4]. Collaborative filtering-based approaches analyze a user's reading records and predict the user's preferences for unread papers using methods such as nearest-neighbor computation, matrix decomposition [5], and deep learning [6]. Graph-based approaches use homogeneous graphs, such as citation networks [7,8,9], or heterogeneous graphs [10,11], such as those constructed from author–conference–paper entities, to generate embeddings of the entities. They then create recommendation lists via meta-path methods [12] or graph neural networks [13].
The main focus of this study is implicit feedback-based collaborative filtering. Methods relying on explicit feedback, such as ratings, are not suitable for academic paper recommendation, since users do not typically score or rate papers on academic platforms; recommendations for academic papers must therefore be based on implicit feedback. Compared with explicit feedback such as ratings or reviews, implicit feedback is easier to collect but carries more uncertainty. Conventional implicit feedback-based recommendation techniques typically depend on subjective negative-sample assumptions, such as constructing negative samples by uniform random sampling [14] or from a priori information [15]. We contend that, under such assumptions, the negative samples for academic paper recommendation contain ambiguous outcomes. Since users only have time to read a limited number of papers, the primary cause of missing data is a lack of access to the corresponding articles rather than a lack of interest in their content; consequently, the subjective negative-sample assumption restricts the accuracy of recommender systems. Secondly, there is a significant imbalance between the quantities of positive and negative samples. Because of the large number of non-interacted papers in the system and the severe data sparsity, these papers are insufficiently trained and receive little exposure, which hinders both the advancement of science and technology and the communication of scholarly findings. Some studies introduce context for implicit feedback data in an attempt to reduce sparsity; for example, eALS contends that negative samples should be weighted by popularity [16]. However, we contend that this approach is inappropriate for the academic setting, where innovation is valued and an unbiased method of selecting negative samples is required.
In this work, we tackle the two aforementioned issues and make improvements from a metric-learning perspective. The following is a summary of this paper’s primary contributions:
1. We propose a context-aware metric learning strategy that effectively adapts the model to learn from implicit feedback.
2. The algorithm models the loss function separately for positive and negative samples. For the positive-sample pull-in operation, we introduce a content factor via the multiplication rule, which alleviates the data sparsity issue and accelerates the convergence of the objective function. For the negative-sample push-off operation, we adopt an unbiased global negative-sampling method and employ an intermediate-matrix caching technique to greatly reduce the computational complexity.
3. Experimental results on two real datasets show that our method outperforms other baselines in terms of recommendation accuracy and computational efficiency. Our results demonstrate the potential of metric learning in dealing with the problem of implicit feedback recommender systems with positive and negative sample imbalance.
The rest of this paper is organized as follows.
Section 2 focuses on the related work, including academic resource recommendations, an implicit feedback-based approach, and metric learning.
Section 3 introduces metric learning to model users’ implicit feedback and optimizes the computational process by applying the alternating element multiplier method to the negative sampling problems of sparse matrices.
Section 4 presents the dataset, experimental methodology, and metrics, introduces the comparison model, analyzes the experimental results, and discusses the contribution of different factors in the model. Finally,
Section 5 presents the conclusion of the study and the outlook for future research.
3. Context-Aware Element-Wise Alternating Least Squares
3.1. Problem Statement and Modeling Framework
This section defines the data structure, describes the research problem, and shows the modeling framework.
Table 1 lists the key notations of this paper.
In the following, we build the model from a metric learning perspective, discuss the optimization of the model, and finally describe a method for computing the context factor.
As shown in
Figure 1, our approach aims to combine multiple contexts to generate paper recommendations. First, we use user profiles to compute the relevance of the user’s visited and unvisited papers in the citation network. Then, the user profile is utilized to compute the topic similarity score between the user and the target paper. Finally, we combine the above a priori information to compute the context factor of the user’s unread papers.
3.2. Starting with Metric Learning
Referring to the simple idea of metric learning, we build the objective function based on the “push” and “pull” operations:
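As a rough, illustrative sketch of such a push–pull objective (a simplified stand-in, not the paper's exact equation; the uniform squared-error weighting, the names `P`, `Q`, `C`, and `W`, and the way the content factor multiplies the dot product are all assumptions here):

```python
import numpy as np

def push_pull_loss(P, Q, R, W, C):
    """Toy push-pull objective over user factors P (M x K) and paper
    factors Q (N x K).  Observed pairs (R > 0) are *pulled* so that the
    stretched dot product C * (p_u . q_i) approaches 1; unobserved pairs
    are *pushed* so that the plain dot product approaches 0, weighted by W."""
    S = P @ Q.T                                  # all pairwise scores
    pos = R > 0
    pull = ((1.0 - C[pos] * S[pos]) ** 2).sum()  # pull positives toward 1
    push = (W[~pos] * S[~pos] ** 2).sum()        # push negatives toward 0
    return pull + push
```

Note that a content factor below 1 forces a larger dot product to reach the same pull loss, which is one way to realize the "stretching" effect described in Section 3.3.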
3.3. User Preference Modeling and Algorithm Optimization
For the "pull" operation, we take the dot product as the distance measure: the closer the dot product is to 1, the closer the vectors are, subject to modulus constraints on the latent vectors. For positive feedback, the user is clearly influenced by contextual information. For papers that the user has not accessed, the predominant reason is that the user is unaware of the paper's existence, although some users will have seen a paper and simply not been interested. Given this uncertainty about the user's preference, it is not appropriate to use contextual information to infer the user's reading interest. Therefore, following eALS [16] and TGSC-PMF [31], we adopt different fusion strategies for contextual information in the two cases of already-read and unread papers. To accelerate convergence, a content relevance factor is introduced to perform a stretching operation on the dot product.
where the more relevant the content is, the closer the content relevance factor is to 1. When the factor is much smaller than 1, the model enables the user and paper latent vectors to converge to each other's neighborhood at a faster rate. This introduction of contextual parameters is intuitive: the fact that user u chooses item i despite little content relevance suggests that, in some way, this item particularly fits the user's needs. By quickly bringing such positive sample pairs closer together, the system can rapidly capture the user's particular preferences.
Similarly, given this uncertainty about the user's interest in unread papers, it is inappropriate to use contextual information to infer reading interest. The closer the dot product is to 0, the further apart the vectors are. Therefore, the content relevance factor is not introduced in the "push" operation.
The weights follow the weight-setting method of ALS [32]. Then, the objective function can be abbreviated as follows:
According to the Lagrange multiplier method, the objective function can be optimized in the following way:
We can optimize the objective function with a gradient-based method. Taking the derivative of the objective function L with respect to each latent factor, the following equation is obtained:
Setting this derivative to zero, we obtain Equation (8):
Observing this equation, we can see that the computational complexity mainly comes from the negatively sampled terms. Following eALS, using cache matrices in the optimization process can significantly reduce the computational complexity.
We define the cache matrix:
then we obtain Equation (11):
We substitute the above conclusion into Equation (9) to obtain Equation (12):
Similarly, we define the cache matrix:
we obtain Equation (14):
By reusing these cache matrices and optimizing parameters at the element level, i.e., optimizing one latent factor while keeping the others fixed, the context-fused model retains the computational complexity of eALS. Algorithm 1 summarizes our method, and the whole model optimization process is visualized in Figure 2.
Algorithm 1 Context-aware element-wise alternating least squares algorithm
Input: interaction matrix; weight matrix; user text preference matrix for papers; user citation preference matrix for papers; latent vector dimension K
Output: optimal latent vector matrix for users; latent vector matrix for papers
1:  Randomly initialize the user and paper latent vector matrices
2:  repeat
3:    compute the cache matrix for the user updates
4:    for each user u do
5:      for f from 1 to K do
6:        update the user latent factor
7:      end for
8:    end for
9:    compute the cache matrix for the paper updates
10:   for each paper i do
11:     for f from 1 to K do
12:       update the paper latent factor
13:     end for
14:   end for
15: until convergence
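As a concrete, simplified illustration of element-wise alternating least squares, the sketch below performs coordinate descent on a dense weighted matrix-factorization objective. It omits the paper's context factor and the cache-matrix trick of Equations (10) and (13), so it is a didactic baseline rather than the algorithm above; the names `R` (interaction matrix), `W` (weight matrix), and the regularization term `lam` are assumptions.

```python
import numpy as np

def ewals(R, W, K=8, lam=0.1, n_iters=20, seed=0):
    """Element-wise ALS (coordinate descent) for weighted matrix
    factorization: minimize sum_ui W[u,i] * (R[u,i] - p_u . q_i)^2
    plus lam * (||P||^2 + ||Q||^2).  Dense toy version, without the
    cache matrices that make the paper's variant efficient."""
    rng = np.random.default_rng(seed)
    M, N = R.shape
    P = rng.normal(scale=0.1, size=(M, K))
    Q = rng.normal(scale=0.1, size=(N, K))
    for _ in range(n_iters):
        E = R - P @ Q.T                          # full residual matrix
        for f in range(K):
            E += np.outer(P[:, f], Q[:, f])      # residual excluding factor f
            # closed-form coordinate update for all users' f-th factor ...
            P[:, f] = (W * E) @ Q[:, f] / ((W * Q[:, f] ** 2).sum(axis=1) + lam)
            # ... then for all papers' f-th factor, holding the rest fixed
            Q[:, f] = (W * E).T @ P[:, f] / ((W.T * P[:, f] ** 2).sum(axis=1) + lam)
            E -= np.outer(P[:, f], Q[:, f])      # restore full residual
    return P, Q
```

Because each coordinate update is an exact minimizer with the other factors fixed, the objective decreases monotonically; the cache matrices in Algorithm 1 avoid materializing the dense negative-sample terms that this toy version computes explicitly.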
3.4. Context Factor
For the context factor, which is specific to the paper recommendation setting, this paper proposes a feasible computation method validated through experiments. The text and the citation relationships of a paper contain a large amount of relevant a priori information. We therefore start from self-supervised modeling of text and citations to compute the relevance between users and papers in these two domains. Moreover, in a real engineering environment, this process is carried out offline and does not increase the complexity of the online recommendation process.
3.4.1. Self-Supervised Relevance Modeling Based on Citation Networks
To learn the vector representation of an article within a citation network, we utilize a generative model. Each article, represented as a node in this network, has a low-dimensional vector representation of itself, p, as well as a low-dimensional vector representation used when the article serves as a context; during training, the context representation should converge to p. The citation network can be thought of as a directed graph. The contexts of two article nodes are more closely related, and hence the nodes more relevant, when they share more neighbors. For every directed edge in the citation network, the conditional probability of generating the target node from the source node can be written as follows:
where the normalization runs over the neighbors of the source node. Intuitively, two nodes with more similar context distributions should themselves be more similar, so the context distributions should approximate their empirical distributions. The empirical distribution can be defined as follows:
where the weight of an edge is taken to be the degree of its source node. We use KL divergence as the objective function to measure the difference between the contextual and empirical distributions. Since the number of negative edges overwhelms the available computational power, random negative sampling is introduced into the model computation to reduce the effort: for each observed edge, several negative edges are sampled according to a noise distribution. In this paper, positive and negative samples are used to optimize the objective function, which can be simplified as follows:
where σ(·) is the sigmoid function and E denotes the mathematical expectation. Each time the computational model draws an edge from the citation network as a positive sample, it samples K nodes from the noise distribution to form the corresponding negative samples.
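A minimal sketch of one such negative-sampling update is given below, in the style of LINE/word2vec-type embedding training. The degree-based noise distribution with the 0.75 exponent, the exclusion of the positive target from the noise, and the learning rate are common choices assumed here, not details taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def edge_sgd_step(p, c, src, dst, deg, n_neg=5, lr=0.05):
    """One negative-sampling SGD step for an observed citation edge
    (src -> dst): raise sigma(c[dst] . p[src]) for the positive pair
    and lower sigma(c[n] . p[src]) for sampled noise nodes n."""
    noise = deg ** 0.75              # degree-based noise distribution
    noise[dst] = 0.0                 # never sample the positive node
    noise /= noise.sum()
    negs = rng.choice(len(deg), size=n_neg, p=noise)
    # positive edge: gradient of log sigma(c[dst] . p[src])
    g = 1.0 - sigmoid(c[dst] @ p[src])
    grad_p = g * c[dst]
    grad_c = g * p[src]
    # negative edges: gradient of log sigma(-c[n] . p[src])
    for n in negs:
        gn = sigmoid(c[n] @ p[src])
        grad_p -= gn * c[n]
        c[n] -= lr * gn * p[src]
    c[dst] += lr * grad_c
    p[src] += lr * grad_p
```

Repeated over the edges of the citation network, this pushes the score of observed citation pairs toward 1 while keeping randomly sampled non-edges low.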
Insufficient training of the related embedding vectors will impair the quality of recommendations because new papers are rarely cited. To address this issue, higher-order neighbors are introduced in this study: second-order neighbor sampling is utilized for less frequently cited papers. As illustrated in Figure 3, the empirical distribution for a two-hop path is then expanded as follows:
Finally, we model the citation-network relevance score through the above computational process, aggregating over the set of papers in the reading record of each user.
3.4.2. Topic-Based Model for Text Relevance
We use topic modeling to generate textual representations. First, we aggregate the bag of words for any paper. Similarly, we aggregate all papers in a given user's profile to form that user's bag of words. In this way, we obtain a bag of words for any user or paper.
Naturally, the topic distributions of papers and readers may be similar. As we can see in
Figure 4, the generation of topic distribution is as follows:
For any bag of words:
- (a)
Draw the topic proportions from the Dirichlet prior.
- (b)
For each word in the bag of words:
Draw a topic assignment from the topic proportions;
Draw the word from the word distribution of the assigned topic.
We use variational inference to estimate the topic–word distributions and the per-document topic distributions. We then use cosine similarity to measure the similarity between two topic distributions. The formula for cosine similarity is as follows:
The closer the article's topic distribution is to the user's, the closer the cosine similarity is to 1.
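For instance, with hypothetical four-topic proportions for a user profile and a candidate paper (the values below are illustrative, not from the paper's datasets), the similarity can be computed directly:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two topic-proportion vectors."""
    a, b = np.asarray(a, float), np.asarray(b, float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# hypothetical 4-topic distributions for a user profile and a paper
user_topics = [0.70, 0.20, 0.05, 0.05]
paper_topics = [0.60, 0.30, 0.05, 0.05]
score = cosine_similarity(user_topics, paper_topics)  # close to 1
```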
3.4.3. Contextual Fusion Methods Based on the Multiplication Rule
Based on the multiplication rule [46,47,48], we integrate these relevance scores into a unified preference score, which is defined as follows:
where the first factor is the text relevance between the user's topic distribution and the article's topic distribution, and the second factor is the citation relevance between the user and the article. The computation of contextual relevance is performed offline, thus not increasing the computational complexity of the online recommendation process.
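A minimal sketch of such a fusion, assuming a plain product of the two offline scores (the paper's precise formula, including any smoothing terms, is not reproduced here; the floor value is an illustrative assumption):

```python
def context_factor(text_rel, cite_rel, floor=1e-3):
    """Fuse the offline text-relevance and citation-relevance scores
    into one context factor via the multiplication rule.  The floor
    keeps the factor positive when one signal is missing; both the
    floor and the plain product are illustrative assumptions."""
    return max(text_rel, floor) * max(cite_rel, floor)

factor = context_factor(0.9, 0.8)  # strong agreement in both signals
```

Because the fused factor is a product, a pair that is relevant in only one domain receives a markedly smaller factor than a pair relevant in both.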
3.5. Complexity Analysis
Algorithm 1 shows the optimization of context-aware eALS. Line 3 precomputes the cache matrix for the user updates according to Equation (10), and lines 4–8 then compute the user latent feature matrix. Line 9 precomputes the cache matrix for the paper updates according to Equation (13), and lines 10–14 compute the paper latent feature matrix. The online computational complexity of the whole model is therefore proportional to the size of the dataset, the sizes of the user and paper sets, and the square of the latent dimension K. The optimization proposed in this paper thus has the same order of complexity as the eALS algorithm [16] without context: with the context factor preprocessed offline, the model's computational complexity remains consistent with that of eALS without fused content information, making it among the most computationally efficient algorithms of this kind. The complexities of these models are listed in Table 2.
5. Conclusions and Future Work
To address the uncertainty of implicit feedback and the positive–negative sample imbalance in academic paper recommendation, this paper proposes a context-sensitive recommendation method from a metric learning perspective. In addition to incorporating factors such as textual information and citation networks, the method improves model accuracy while greatly reducing computational complexity through intermediate-matrix caching. Experimental results on two real paper recommendation datasets demonstrate the effectiveness of introducing context, with the proposed method showing more than a 5% improvement over the alternating element multiplier method.
In recent years, knowledge graphs have shown great potential in recommender systems, and some researchers have combined knowledge graph techniques with traditional collaborative filtering to improve performance. Combining knowledge graph techniques with the context-sensitive paper recommendation algorithm proposed in this paper will be an interesting research direction.