
Recommendation Model Based on Probabilistic Matrix Factorization and Rated Item Relevance

1 School of Information Science and Technology, Northwest University, Xi’an 710127, China
2 Xi’an Mingde Institute of Technology, Xi’an 710124, China
3 School of Computer Science and Technology, Xidian University, Xi’an 710126, China
* Author to whom correspondence should be addressed.
Electronics 2022, 11(24), 4160; https://doi.org/10.3390/electronics11244160
Submission received: 9 November 2022 / Revised: 5 December 2022 / Accepted: 6 December 2022 / Published: 13 December 2022
(This article belongs to the Section Artificial Intelligence)

Abstract

Personalized recommendation has become indispensable in today’s information society and plays a significant role for both information producers and consumers. Studies have shown that probabilistic matrix factorization can improve personalized recommendation performance. However, most probabilistic matrix factorization models ignore the effect of item-implicit association and user-implicit similarity on recommendation performance. To overcome this limitation, we propose a recommendation model based on probabilistic matrix factorization that considers the correlation of users’ rated items. Our model uses resource allocation over bipartite graphs and random walks over meta-paths in heterogeneous networks to determine the implicit association of items and the implicit similarity of users, respectively, and thus obtains the final item association and user similarity. These final item and user similarity relationships are integrated into the probabilistic matrix factorization model to obtain the user’s predicted score for a specific item. Finally, we validated the model on the Delicious-2k, MovieLens-2k, and Last.fm-2k datasets. The results show that our proposed model has higher recommendation accuracy than other recommendation algorithms.

1. Introduction

Personalized recommendation systems have received strong support from both academia and industry since their inception. Such systems are now used by many internet providers across a range of services, such as entertainment (e.g., movies, music, games, personalized news, and web page recommendations), e-commerce, and social networking. Although personalized recommendations can improve information acquisition, providers require considerable technical support to attract more users and improve profits.
Personalized recommendation systems have widely employed the matrix factorization model because of its high accuracy and scalability [1,2,3,4,5,6,7]. The idea is to map users and items into a joint latent feature space, fit the model using the interactions between users and items, and then predict unknown ratings by the scalar product of the user and item feature vectors [4]. In 2007, the matrix factorization algorithm demonstrated its advantages, and since then the factorization model has been developed in both depth and breadth [8]. Typical matrix factorization models, such as singular value decomposition (SVD) [9], non-negative matrix factorization (NNMF) [10], probabilistic matrix factorization (PMF) [11], and Bayesian probabilistic matrix factorization (BPMF) [12], came into use and improved recommendation accuracy.
However, data sparsity has long been a problem for personalized recommendation. As with other collaborative filtering (CF) models, the matrix factorization algorithm is hampered by the sparsity of user data, especially for cold-start users with few rating records: the learned user feature vectors cannot fully reflect users’ preferences, resulting in poor recommendations. To improve the performance of the matrix factorization model, researchers have tried to integrate additional information into the model [13], so that similar users obtain higher feature-vector similarity, user relationships are captured more accurately, and item recommendations for target users improve. Koren et al. [14] added user bias, item bias, and users’ hidden feedback to the original matrix factorization model and proposed the SVD++ algorithm, significantly improving recommendation accuracy and alleviating the cold-start problem. In [14,15] and [16,17,18,19,20,21,22], item attributes and social networks were integrated into the model, which improved recommendation performance.
Although the models mentioned above improve recommendation accuracy and alleviate the cold-start problem for new users (i.e., cold-start users with usually no more than five rating records), external information is often difficult to find. For example, it is hard to obtain information about the friends of newly registered users. Under such circumstances, and without external information, we should fully exploit the available user and item information and integrate it into the traditional matrix factorization model for better recommendation performance. We propose a probabilistic matrix factorization model that combines item correlation and user similarity (IC-US-PMF). The contributions of our study are as follows.
  • To address the scarcity of explicit relationships between items obtained by association rules, this paper draws on the theory of item diffusion, takes into account the influence of item ratings on the diffusion process, and proposes an improved bipartite-graph diffusion method to obtain the implicit relationships between items. In this way, the relationships between items can be mined more fully.
  • To address the insufficiency of similarity relationships between users caused by sparse rating data, we decompose user similarity into an explicit similarity relationship and an implicit similarity relationship. The final user similarity is obtained by integrating the two.
  • When calculating the explicit similarity relationship of users, we account for the scarcity of items commonly rated by pairs of users and compute the explicit similarity from the similarity of users’ preferences for item tags.
  • Based on the user-item-tag tripartite graph structure, the implicit similarity relationship between users is obtained by a random walk algorithm.
  • By integrating the user similarity and item association relationships into the probabilistic matrix factorization model, the predicted score of the user for an item is obtained.

2. Related Work

2.1. Recommendation Model Based on Matrix Factorization

Model-based CF differs from memory-based CF: model-based CF builds a corresponding model from the available information; for example, user and item feature vectors are learned from the rating matrix to predict the target user’s rating of an item. Model-based CF approaches include the Bayesian model [23], cluster model [24,25], neural network [26,27,28], regression analysis [29,30,31], and latent language model [32,33,34]. This section introduces the recommendation model based on matrix factorization.
A classic latent factor model, matrix factorization, attracted researchers’ attention due to its simplicity and efficiency. The main idea of matrix factorization is that the product of the user and item latent feature matrices can approximate the rating matrix. Because the rating matrix is sparse, the original matrix is approximately represented as the product of the transpose of matrix $U$ and matrix $V$, as in (1):
$$R \approx U^T V \tag{1}$$
where $R$ is an $m \times n$ matrix, $U^T$ is an $m \times k$ matrix, and $V$ is a $k \times n$ matrix. The score $r_{uv}$ of item $v$ by user $u$ can be predicted by the product of the user and item latent feature vectors $p_u^T$ and $q_v$, as in (2). The gap between the true and predicted scores can be represented by the loss function defined in (3), where $D$ is the set of rated (user, item) pairs, i.e., the users and items in the training set, and $\lambda$ is the regularization parameter that prevents overfitting.
$$\hat{r}_{uv} = p_u^T q_v \tag{2}$$
and
$$\text{Loss} = \sum_{(u,v) \in D} \left( r_{uv} - \hat{r}_{uv} \right)^2 + \lambda \left( \| p_u \|^2 + \| q_v \|^2 \right) \tag{3}$$
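To make Equations (1)-(3) concrete, the following minimal Python sketch computes the predicted rating matrix and the regularized loss. The dimensions and the observed triples are illustrative assumptions, not the paper's data, and for simplicity the sketch regularizes all latent factors rather than only those of observed pairs.

import numpy as np

# Latent-factor approximation: R is approximated by U^T V (Equation (1)).
m, n, k = 4, 5, 3                        # users, items, latent dimension (assumed)
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(k, m))   # user latent matrix, one column per user
V = rng.normal(scale=0.1, size=(k, n))   # item latent matrix, one column per item
R_hat = U.T @ V                          # Equation (2) evaluated for every (u, v) pair

# Equation (3): regularized squared loss over observed (user, item, rating) triples D.
D = [(0, 1, 4.0), (2, 3, 2.0), (3, 0, 5.0)]   # made-up training triples
lam = 0.02                                    # regularization parameter lambda
loss = (sum((r - U[:, u] @ V[:, v]) ** 2 for u, v, r in D)
        + lam * (np.sum(U ** 2) + np.sum(V ** 2)))
print(f"loss = {loss:.4f}")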
PMF is a probabilistic realization of matrix factorization. Figure 1 shows that the user-item rating matrix can be decomposed into low-rank user and item latent factor matrices, and the missing scores can be predicted from these low-rank matrices. Assume that $R$ is the rating matrix of $m$ users for $n$ items, that $U$ and $V$ represent the user and item feature matrices, respectively, and that $U_i$ and $V_j$ represent the user and item feature vectors; $r_{ij}$ is the score of item $j$ by user $i$. The conditional distribution of the rating matrix $R$ can be defined as
$$p\left(R \mid U, V, \sigma_R^2\right) = \prod_{i=1}^{m} \prod_{j=1}^{n} \left[ \mathcal{N}\left( r_{ij} \mid g(U_i^T V_j), \sigma_R^2 \right) \right]^{l_{ij}^R} \tag{4}$$
$\mathcal{N}(x \mid \mu, \sigma^2)$ denotes that $x$ follows a Gaussian distribution with mean $\mu$ and variance $\sigma^2$; $l_{ij}^R$ is an indicator function whose value is 1 if user $U_i$ rated item $V_j$ and 0 otherwise. The function $g(x)$ is the logistic function, $g(x) = 1/(1 + e^{-x})$, which limits the value of $U_i^T V_j$ to [0,1]. Suppose that both the user and item features obey zero-mean spherical Gaussian priors, as shown in (5):
$$P\left(U \mid \sigma_U^2\right) = \prod_{u=1}^{N} \mathcal{N}\left(U_u \mid 0, \sigma_U^2 I\right), \qquad P\left(V \mid \sigma_V^2\right) = \prod_{i=1}^{M} \mathcal{N}\left(V_i \mid 0, \sigma_V^2 I\right) \tag{5}$$
From Bayesian inference, the posterior probability of the user and item features can be expressed as
$$p\left(U, V \mid R, \sigma_R^2, \sigma_U^2, \sigma_V^2\right) \propto p\left(R \mid U, V, \sigma_R^2\right) \, p\left(U \mid \sigma_U^2\right) \, p\left(V \mid \sigma_V^2\right)$$

2.2. Personalized Recommendation based on Association Rule Mining

Association rule mining is used to discover the interdependence and correlation between things. The classic association rule-mining algorithm is the Apriori algorithm proposed by Agrawal et al. [35], which is defined as follows.
Let $V = \{v_1, v_2, \ldots, v_n\}$ be a set of $n$ items, where $v_k$ represents the $k$th item; a set of $k$ items is usually called a $k$-itemset. $D = \{t_1, t_2, \ldots, t_n\}$ is a transaction set made up of subsets of the item set $V$, and each transaction $t_i$ has a unique transaction ID (TID). Association rule mining aims to find expressions of the form $X \Rightarrow Y$ in the database, where $X, Y \subseteq V$, $X$ represents a set of commodities, $Y$ represents a commodity, and $X \cap Y = \emptyset$.
To better describe the above definition, assume a supermarket item set $V = \{\text{milk}, \text{bread}, \text{butter}, \text{beer}, \text{diapers}\}$. The transaction set is shown in Table 1, with each row representing a transaction. If an item of the item set does not appear in a transaction, its entry is “0”; otherwise, it is “1”.
From Table 1, the association rule $\{\text{milk}, \text{bread}\} \Rightarrow \{\text{butter}\}$ can be obtained for the supermarket. This means that, if customers in the supermarket buy milk and bread, there is a high probability that they will buy butter as well. We now introduce two important concepts for association rules: support and confidence.
Definition 1 (Support). For an association rule $R: X \Rightarrow Y$ with $X \cap Y = \emptyset$, the support of rule $R$ is the probability that $X$ and $Y$ occur together in the transaction database:
$$\text{Support}(X, Y) = \frac{P(X \cup Y)}{|D|} \times 100\%$$
where $P(X \cup Y)$ denotes the number of transactions that contain both item sets $X$ and $Y$, and $|D|$ represents the total number of records in the transaction database.
Definition 2 (Confidence). For an association rule $R: X \Rightarrow Y$ with $X \cap Y = \emptyset$, the confidence of rule $R$ is the probability that item set $Y$ occurs in a transaction of the database given that item set $X$ occurs:
$$\text{Confidence}(X, Y) = \frac{P(X \cup Y)}{P(X)} \times 100\%$$
The Apriori algorithm is based on an iterative, layer-by-layer search. First, a candidate set is generated from the transaction database and pruned using the minimum support threshold to obtain new frequent itemsets. The frequent itemsets are then joined to form a new candidate set. This join-and-prune process repeats until the frequent itemset becomes empty, at which point the iteration ends.
The Apriori algorithm traverses the database several times and generates many candidate frequent itemsets. Han et al. [36] proposed the FP-growth algorithm based on the frequent pattern tree; compared with Apriori, its efficiency and performance are significantly improved. Zhang et al. [37] proposed an algorithm based on sampling and information granulation.
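As a concrete illustration, the short Python sketch below computes support and confidence (Definitions 1 and 2) over a toy transaction list in the spirit of Table 1. The transactions themselves are assumed for the example and are not the paper's data.

transactions = [
    {"milk", "bread", "butter"},
    {"milk", "bread"},
    {"beer", "diapers"},
    {"milk", "bread", "butter", "beer"},
]

def support(X, Y, db):
    # P(X u Y) / |D|: fraction of transactions containing every item of X and Y
    hits = sum(1 for t in db if X | Y <= t)
    return hits / len(db)

def confidence(X, Y, db):
    # P(X u Y) / P(X): fraction of X-containing transactions that also contain Y
    x_hits = sum(1 for t in db if X <= t)
    xy_hits = sum(1 for t in db if X | Y <= t)
    return xy_hits / x_hits if x_hits else 0.0

print(support({"milk", "bread"}, {"butter"}, transactions))     # 0.5
print(confidence({"milk", "bread"}, {"butter"}, transactions))  # 2/3, about 0.67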

2.3. Personalized Recommendation based on Bipartite Graph

As one of the most common network structures, the bipartite graph is quite general and widely used, including in personalized recommendation systems. Research on bipartite-graph-based personalized recommendation mainly covers recommendation algorithms based on mass diffusion and on network representation learning.
A bipartite graph $G(N, E)$ describes the association relationship between two types of entities: the node set $N$ consists of $N_u$ and $N_v$, and $E$ represents the set of weighted edges between them.

2.3.1. Bipartite Graph Recommendation Based on Mass Diffusion

Zhou et al. [38] first proposed a bipartite graph recommendation algorithm based on mass diffusion in 2007, drawing on the principle of mass diffusion in physics. When modeling personalized recommendation according to the user-item selection relationship, the algorithm assigns each item a particular initial resource value and transfers resources through the user-item network to redistribute them. Research shows that the mass-diffusion bipartite graph recommendation algorithm achieves higher accuracy than some classical recommendation algorithms. In [39], a bipartite graph recommendation algorithm based on heat conduction was proposed, imitating the principle of heat conduction in physics: when modeling the personalized recommendation, the item’s resource is treated as heat, which flows along the user-item bipartite graph of the recommendation system to realize its reallocation.
Assuming that there are $m$ users and $n$ items in the recommendation system, a bipartite graph with $m + n$ nodes can be established. Any item selected by user $u_i$ can then lead to other items being recommended to $u_i$. This resource-allocation abstraction is what drives mass diffusion in the bipartite graph, as shown in Figure 2.
Figure 2 describes the basic flow of mass diffusion in the user-item bipartite graph of a personalized recommendation system. The circular nodes represent items, with initial resource values $a$, $b$, and $c$; the square nodes represent users. In the first step of mass diffusion, each item distributes its resource evenly among all users connected to it. For example, the three user nodes connected to item node $v_1$ are $u_1$, $u_2$, and $u_4$, so each of them receives $a/3$ from item node $v_1$.
Similarly, item node $v_2$ allocates its resource to users $u_2$ and $u_3$, each receiving $b/2$, and item node $v_3$ allocates its resource to users $u_2$, $u_3$, and $u_4$, each receiving $c/3$. After this allocation, users $u_1$, $u_2$, $u_3$, and $u_4$ hold $a/3$, $a/3 + b/2 + c/3$, $b/2 + c/3$, and $a/3 + c/3$, respectively. In the second step, the resource held by each user node is reallocated to the item nodes according to the allocation method of the first step, so that the item nodes share the resources of the other nodes. The final resources of item nodes $v_1$, $v_2$, and $v_3$ are $\frac{11a}{18} + \frac{b}{6} + \frac{5c}{18}$, $\frac{a}{9} + \frac{5b}{12} + \frac{5c}{18}$, and $\frac{5a}{18} + \frac{5b}{12} + \frac{4c}{9}$. Extracting the coefficients yields the following matrix form.
$$\begin{pmatrix} \frac{11}{18} & \frac{1}{6} & \frac{5}{18} \\ \frac{1}{9} & \frac{5}{12} & \frac{5}{18} \\ \frac{5}{18} & \frac{5}{12} & \frac{4}{9} \end{pmatrix}$$
In this matrix, the entry in the $i$th row and $j$th column represents the share of the $j$th node’s resource that is redistributed to the $i$th node. For the bipartite graph $G(N, E)$, assume that $N$ consists of the user node set $U$ and the item node set $V$. Suppose that the initial resource of node $V_i$ in the item node set is $f(V_i) > 0$, and that the resource obtained by node $U_j$ in the user node set is $f'(U_j)$. We then have:
$$f'(U_j) = \sum_{i=1}^{n} \frac{a_{ij} f(V_i)}{k(V_i)}$$
where $k(V_i)$ is the degree of node $V_i$, that is, the number of user nodes connected to it, and $a_{ij}$ indicates whether user node $U_j$ is connected to item node $V_i$: its value is 1 if connected and 0 otherwise.
During the second resource allocation, each item node again receives resources from the user nodes that obtained resources in the first step. The final resource obtained by item node $V_i$ is denoted $f'(V_i)$ and can be expressed as:
$$f'(V_i) = \sum_{j=1}^{m} \frac{a_{ij} f'(U_j)}{k(U_j)} = \sum_{j=1}^{m} \frac{a_{ij}}{k(U_j)} \sum_{l=1}^{n} \frac{a_{lj} f(V_l)}{k(V_l)}$$
where $k(U_j)$ is the degree of node $U_j$, that is, the number of item nodes connected to it, and $a_{ij}$ indicates whether user node $U_j$ is connected to item node $V_i$: its value is 1 if connected and 0 otherwise.
After simplification, the relationship between $f'(V_i)$ and the initial resources $f(V_j)$ can be expressed as:
$$f'(V_i) = \sum_{j=1}^{n} w_{ij} f(V_j) \tag{11}$$
where the value of $w_{ij}$ can be expressed as
$$w_{ij} = \frac{1}{k(V_j)} \sum_{l=1}^{m} \frac{a_{il} a_{jl}}{k(U_l)} \tag{12}$$
The final matrix $W = \{w_{ij}\}_{n \times n}$ represents an association relationship between items: how likely it is that a user who selected item $j$ will be recommended item $i$.
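The two diffusion steps and the resulting weight matrix of (12) can be reproduced with a few lines of numpy. The sketch below uses the Figure 2 example; the first row of W recovers the coefficients 11/18, 1/6, and 5/18 derived above. This is an illustrative sketch, not the authors' implementation.

import numpy as np

# Item-user adjacency matrix A of the Figure 2 example
# (A[i, j] = 1 if user u_{j+1} selected item v_{i+1}).
A = np.array([[1, 1, 0, 1],    # v1 is connected to u1, u2, u4
              [0, 1, 1, 0],    # v2 is connected to u2, u3
              [0, 1, 1, 1]])   # v3 is connected to u2, u3, u4

k_item = A.sum(axis=1)         # k(V_i): item degrees
k_user = A.sum(axis=0)         # k(U_j): user degrees

# Equation (12): w_ij = (1 / k(V_j)) * sum_l a_il * a_jl / k(U_l)
W = (A / k_user) @ A.T / k_item[None, :]

f0 = np.array([1.0, 1.0, 1.0])     # initial item resources (a, b, c), here all 1
f2 = W @ f0                        # Equation (11): resources after both steps
print(np.round(W, 4))              # first row: 11/18, 1/6, 5/18 as in the text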

2.3.2. Bipartite Graph Recommendation Based on Network Representation Learning

Due to its complexity, network data poses challenges for representation, analysis, and processing. To address these challenges, researchers have paid increasing attention to network representation learning. Its purpose [40] is to represent the nodes of a network as low-dimensional vectors while retaining the original structural information as much as possible, so that the resulting vectors support representation and reasoning in vector space and can be applied to social networks and other typical applications.
The existing network representation learning methods are based mainly on shallow neural networks and deep learning. Representative methods based on shallow neural networks include DeepWalk [41], node2vec [42], and large-scale information network embedding (LINE) [43]. DeepWalk and node2vec adopt the random walk method to convert the network into a sequence of nodes and learn the low-dimensional vector representation of nodes based on the Word2Vec [44] model; LINE models the first- and second-order neighbor relationships in nodes.
Although these algorithms can effectively analyze the relevant network structures, they work only for homogeneous networks. In the real world, many heterogeneous network structures must be studied in addition to homogeneous ones. A heterogeneous network integrates more information than a homogeneous one and contains richer semantic information. For example, a personalized recommendation system involves many network relationships, such as user rating information and item tag information. Using such network data effectively has garnered strong interest in personalized recommendation research [45]. Scholars have conducted corresponding research on such networks, and algorithms such as metapath2vec and bipartite network embedding (BiNE) have been proposed [46,47,48].

3. Probabilistic Matrix Factorization Recommendation Model Based on Relevance of Users’ Rated Items

This section describes our personalized recommendation model in detail, including problem definition and notation, item correlation calculation, user similarity calculation, and model solving.

3.1. Problem Definition and Notation

In personalized recommendation, the ratings of $N$ items by $M$ users are usually defined as a rating matrix with $M$ rows and $N$ columns. For convenience, the main symbols in this paper and their meanings are listed in Table 2. The rating prediction task can be expressed as follows: given user $u \in U$ and item $v$, with the score of $v$ by $u$ unknown, the existing rating matrix is used to predict the score $\hat{r}_{uv}$ of $v$ by $u$.

3.2. IC-US-PMF Recommendation Model

As shown in Figure 3, our recommendation model has three parts.
  • First, based on the user rating information table, the explicit correlation between items is obtained. Then, the implicit correlation between items is obtained by modifying the mass-diffusion process of the traditional bipartite graph. The final item correlation relationship is obtained by weighting the explicit and implicit correlations.
  • The user’s preference for the item tag corresponding to the rated item is calculated to obtain the explicit similarity relationship between users. Next, the implicit similarity relationship between users is obtained through random walks of the meta-path in the heterogeneous network. The final similarity relationship between users is obtained by weighting the user’s explicit and implicit similarity relationships.
  • The final item correlation relationship and the final user similarity relationship are integrated into the PMF model to predict the score of the target item made by the user.

3.3. Item Correlation Relationship Acquisition

For users with few rated items, items related to their rated items should be fully mined to recommend items they are fond of. This idea stems from the YouTube recommendation system, which recommends videos related to those a user has played by exploiting video correlations. The relationship between items has become a significant factor in recommendation systems, affecting users’ decision-making. In acquiring item correlation relationships, we consider both the explicit and implicit relationships.

3.3.1. Item Explicit Correlation Acquisition

To obtain the explicit correlation relationship between items, we treat each user’s viewing history as a transaction record and count all the movies rated by the user with scores above a certain threshold. We can then obtain the user’s transaction record table, as shown in Table 3.
Therefore, we can obtain the association rule $\{v_1, v_2, v_3\} \Rightarrow \{v_4, v_5\}$ through the FP-growth algorithm. To calculate the degree of association between items, we must split the many-to-many association rule $\{v_1, v_2, v_3\} \Rightarrow \{v_4, v_5\}$ into several many-to-one association rules, such as $\{v_1, v_2, v_3\} \Rightarrow \{v_4\}$ and $\{v_1, v_2, v_3\} \Rightarrow \{v_5\}$. Suppose that a split association rule is $v_i \Rightarrow v_j$ whose support and confidence satisfy the corresponding thresholds $\alpha$ and $\beta$. The direct correlation degree between items $v_i$ and $v_j$ can then be defined as
$$S(v_i, v_j) = \frac{\text{Support}(v_i, v_j)}{\text{Support}(v_i, v_j) + \phi} \times \text{Confidence}(v_i \Rightarrow v_j)$$
where $S(v_i, v_j)$ represents the direct correlation degree between the items; the support factor adjusts the correlation degree between two items from a global perspective, while $\text{Confidence}(v_i \Rightarrow v_j)$ measures the correlation degree from a local perspective. $\phi$ is a hyperparameter: when $\phi = 0$, the correlation degree between two items equals their confidence.
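A small sketch of the direct correlation degree follows, using the same toy transaction format as in Section 2.2; the transactions and the value of phi are illustrative assumptions.

def direct_correlation(vi, vj, db, phi=0.5):
    # Direct correlation degree of this subsection, built from support and confidence.
    n_x  = sum(1 for t in db if vi in t)
    n_xy = sum(1 for t in db if vi in t and vj in t)
    support    = n_xy / len(db)
    confidence = n_xy / n_x if n_x else 0.0
    return support / (support + phi) * confidence  # phi = 0 gives pure confidence

db = [{"v1", "v2"}, {"v1", "v2", "v4"}, {"v3"}, {"v1", "v4"}]
print(direct_correlation("v1", "v4", db))          # 0.5 / 1.0 * 2/3 = 1/3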

3.3.2. Item-Implicit Correlation Acquisition

The bipartite graph resource-allocation method based on mass diffusion can be used to mine the implicit correlation between items. It can significantly alleviate the user cold-start problem by recommending items related to a user’s few rated items.
In the traditional mass-diffusion bipartite graph recommendation, the initial resource of an item selected by the user is generally set to 1, and that of an unselected item to 0. This setting ignores the user’s rating of the item and thus the user’s true preference. Therefore, we take the user’s rating of the item as the initial value for the bipartite graph and revise the mass-diffusion process accordingly. It should also be noted that users’ ratings may be affected by their rating habits: some users habitually give high scores regardless of their preferences, while others habitually give low scores. Such extreme rating behaviors cannot fully reflect users’ true preferences, so the initial scores must be normalized to reduce the calculation error caused by different rating scales.
Based on this analysis, the user ratings are revised as follows:
$$r'_{uv} = \left| \frac{r_{uv} - r_{\min}}{r_{\max} - r_{\min} + p} \right| + q$$
where $r'_{uv}$ is the revised score of item $v$ by target user $u$, $r_{uv}$ is the initial score of item $v$ by target user $u$, and $r_{\min}$ and $r_{\max}$ are the minimum and maximum scores the user has given across all items. To prevent the denominator from being 0, $p$ is set to 0.001; for the convenience of the experiment, $q$ is set to 0.01.
This removes the calculation error in $r'_{uv}$ that might be caused by a user’s rating habits, but the score can still be affected by other users’ ratings. Therefore, the users’ ratings are further revised according to
$$r''_{uv} = \frac{r'_{uv} - \text{mean}(r)}{s(r)}$$
where $r''_{uv}$ represents the final score of item $v$ by user $u$, $\text{mean}(r)$ represents the mean of the scores given by all users, and $s(r)$ represents their standard deviation. In this way, the negative impact of inaccurate ratings from other users in the rating system is eliminated, as is the effect of outlier scores.
Therefore, the initial resource $f(V_i)$ of node $V_i$ in the item node set $V$ can be expressed as:
$$f(V_i) = \frac{\sum_{u=1}^{m} r''_{u V_i}}{\left| \{ u : r_{u V_i} > 0 \} \right|}$$
where the numerator is the sum of all revised scores of node $V_i$ in the item node set $V$, and the denominator is the number of users who rated $V_i$.
The obtained $f(V_i)$ can then be substituted into the two mass-diffusion steps of Section 2.3.1. In the first step, the resource $f'(U_j)$ obtained by node $U_j$ in the user node set $U$ can be expressed as:
$$f'(U_j) = \sum_{i=1}^{n} \frac{a_{ij}}{k(V_i)} \cdot \frac{\sum_{u=1}^{m} r''_{u V_i}}{\left| \{ u : r_{u V_i} > 0 \} \right|}$$
In the second mass-diffusion step, the resource $f'(V_i)$ obtained by item node $V_i$ can be expressed as:
$$f'(V_i) = \sum_{j=1}^{m} \frac{a_{ij}}{k(U_j)} \sum_{l=1}^{n} \frac{a_{lj}}{k(V_l)} \cdot \frac{\sum_{u=1}^{m} r''_{u V_l}}{\left| \{ u : r_{u V_l} > 0 \} \right|}$$
The final relationship between $V_i$ and $V_j$ is as shown in (11), with $w_{ij}$ given by (12); this yields the implicit correlation between items $V_i$ and $V_j$. The final correlation between $V_i$ and $V_j$ can then be expressed as:
$$C_{ij} = \alpha S_{ij} + (1 - \alpha) w_{ij}$$
After obtaining the correlations among all items, we obtain the item correlation matrix $C$ and normalize it so that $\sum_{j \in B_i} C_{ij} = 1$, where $B_i$ represents the set of items associated with item $i$.
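The following sketch strings the steps of this subsection together on a tiny illustrative rating matrix: the per-user score revision, the global standardization, the initial item resources, and the weighted blend of explicit and implicit correlations. The matrices S_expl and W are placeholders standing in for the outputs of Section 3.3.1 and Equation (12), and all values are assumed for the example.

import numpy as np

R = np.array([[5.0, 3.0, 0.0],         # illustrative ratings, 0 = unrated
              [4.0, 0.0, 2.0],
              [1.0, 5.0, 4.0]])
rated = R > 0
p, q = 0.001, 0.01

# Per-user min-max revision of the rated entries (first revision formula)
r1 = np.zeros_like(R)
for u in range(R.shape[0]):
    r = R[u, rated[u]]
    r1[u, rated[u]] = np.abs((r - r.min()) / (r.max() - r.min() + p)) + q

# Global standardization against the mean/std of all revised scores
vals = r1[rated]
r2 = np.zeros_like(R)
r2[rated] = (r1[rated] - vals.mean()) / vals.std()

# Initial resource of each item: mean revised score over its raters
f0 = r2.sum(axis=0) / rated.sum(axis=0)

# Final correlation: weighted blend of explicit and implicit correlations
alpha = 0.5
S_expl = np.eye(3)                     # placeholder explicit correlations
W = np.full((3, 3), 1 / 3)             # placeholder diffusion weights from Eq. (12)
C = alpha * S_expl + (1 - alpha) * W
C = C / C.sum(axis=1, keepdims=True)   # row-normalize so each row sums to 1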

3.4. User Similarity Acquisition

The idea of CF suggests that a user is interested in the items preferred by similar users, so recommendation accuracy is closely tied to how accurately the user similarity relationships are computed; improving the accuracy of user similarity calculation is a prerequisite for improving recommendation accuracy. We solve the user similarity relationship in two steps. First, we calculate the similarity of users’ preferences for the item tags attached to the items they have rated. Second, we learn how users’ tag preferences propagate using network representation learning and calculate the implicit similarity of users who have no commonly rated items.

3.4.1. User Explicit Similarity Relationship Acquisition

Traditional CF recommendation often relies on many commonly rated items to measure user similarity with high quality. In most cases, however, given a large amount of sparse rating data, the proportion of items commonly rated by two users is small, so the recommendations are not sufficiently accurate. Note that even for users with few ratings, we can mine their interests through the tag information in their historical rating data. Therefore, by combining users’ item ratings with tag information, we can mine user interest vectors more accurately and improve the accuracy of the similarity calculation between users.
In calculating user-tag preference, in addition to the user’s marking frequency for a tag, we should also consider the user’s rating of items containing the tag.
Table 4 shows that the user has rated five items, each carrying tags. We can calculate the similarity degree between users through the following three steps:
STEP 1. Calculate the Tag Weight
Assume that the tag set in the system is $T = \{t_1, t_2, \ldots, t_n\}$, where $t_i$ represents a particular tag and $n$ is the total number of tags in the system, following the tf-idf idea. First, we obtain the term frequency $tf$ (i.e., the relative frequency with which a tag appears in the user’s tag set), as shown in (20):
$$tf_{t_i} = \frac{N_{t_i}}{\sum_{k=1}^{n} N_{t_k}} \tag{20}$$
where $N_{t_i}$ is the number of times that items marked with tag $t_i$ appear among all the items rated by the user, and the denominator represents the total number of occurrences of all tags marked by the user.
Second, we obtain the inverse document frequency $idf_{t_i}$ of tag $t_i$:
$$idf_{t_i} = \log\left( \frac{m}{d_{t_i} + 1} \right)$$
where $m$ is the number of users and $d_{t_i}$ represents the total number of users who have used tag $t_i$. To prevent the denominator from being 0, it is incremented by 1.
Finally, we obtain the $tfidf_{t_i}$ value of tag $t_i$, which can be expressed as:
$$tfidf_{t_i} = \frac{N_{t_i}}{\sum_{k=1}^{n} N_{t_k}} \cdot \log\left( \frac{m}{d_{t_i} + 1} \right)$$
STEP 2. Calculate the User’s Preference for Tag $t_i$
Users have different rating criteria (some prefer to rate items high, while others tend to rate them low regardless of their preferences), so the user ratings must first be normalized to a common range. Second, the user’s preference for tag $t_i$ can be calculated as the average normalized score of the items containing tag $t_i$ rated by the user. Finally, the tf-idf weighting is applied to limit the influence of popular tags on the recommendation results. The user’s preference for tag $t_i$ can be expressed as:
$$P_{u t_i} = \frac{\sum_{j \in I_U(t_i)} \frac{r_j - \min}{\max - \min}}{N_{t_i}} \times \frac{N_{t_i}}{\sum_{k=1}^{n} N_{t_k}} \times \log\left( \frac{m}{d_{t_i} + 1} \right) \tag{23}$$
where $P_{u t_i}$ represents the preference of user $u$ for tag $t_i$, and $\frac{r_j - \min}{\max - \min}$ is the normalization of user $u$’s scores. The numerator $\sum_{j \in I_U(t_i)} \frac{r_j - \min}{\max - \min}$ sums the normalized scores of all items containing tag $t_i$ rated by user $u$, and the denominator $N_{t_i}$ is the number of such items, so the first factor is the mean normalized score of the items containing tag $t_i$ rated by user $u$. The second part of (23) is the tf-idf value of tag $t_i$.
STEP 3. Calculate the User Explicit Similarity
By (23), we obtain the preference of user $u$ for each tag $t_i$; the user’s tag preferences can then be expressed as the preference vector $p_u = \{p_{t_1}, p_{t_2}, \ldots, p_{t_n}\}$. The tag preference similarity between any two users can be expressed as:
$$sim_{\text{explicit}}(u_i, u_j) = \frac{\sum_{t \in T_{u_i, u_j}} \left( p_{u_i, t} - \bar{p}_{u_i} \right) \left( p_{u_j, t} - \bar{p}_{u_j} \right)}{\sqrt{\sum_{t \in T_{u_i, u_j}} \left( p_{u_i, t} - \bar{p}_{u_i} \right)^2} \sqrt{\sum_{t \in T_{u_i, u_j}} \left( p_{u_j, t} - \bar{p}_{u_j} \right)^2}}$$
where $p_{u_i, t}$ and $p_{u_j, t}$ represent the preference values of users $u_i$ and $u_j$ for tag $t$, respectively, $T_{u_i, u_j}$ represents the set of tags common to both users, and $\bar{p}_{u_i}$ and $\bar{p}_{u_j}$ represent the mean preference values of users $u_i$ and $u_j$. The tag preference similarity of any two users is then standardized and normalized to [0,1] to obtain the user explicit similarity.
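A compact sketch of Steps 1-3 on illustrative data structures follows: it computes each user's tf-idf-weighted tag preference as in (23) and then the Pearson-style explicit similarity. The ratings, tags, and variable names are assumed for the example.

import numpy as np

ratings = {"u1": {"i1": 5.0, "i2": 3.0},       # user -> {item: score}
           "u2": {"i1": 4.0, "i3": 2.0}}
item_tags = {"i1": {"t1", "t2"}, "i2": {"t2"}, "i3": {"t1"}}
tags = ["t1", "t2"]
m = len(ratings)                               # number of users
d = {t: sum(any(t in item_tags[i] for i in r) for r in ratings.values())
     for t in tags}                            # d_t: users who touched tag t

def tag_preference(u):
    # Equation (23): mean normalized score of tag-t items, weighted by tf-idf.
    r = ratings[u]
    lo, hi = min(r.values()), max(r.values())
    norm = {i: (s - lo) / (hi - lo) if hi > lo else 0.5 for i, s in r.items()}
    counts = {t: sum(t in item_tags[i] for i in r) for t in tags}
    total = sum(counts.values())
    pref = {}
    for t in tags:
        if counts[t] == 0:
            pref[t] = 0.0
            continue
        mean_score = sum(norm[i] for i in r if t in item_tags[i]) / counts[t]
        pref[t] = mean_score * counts[t] / total * np.log(m / (d[t] + 1))
    return pref

def sim_explicit(u, v):
    # Pearson-style similarity over the tags both users have preferences for.
    pu, pv = tag_preference(u), tag_preference(v)
    common = [t for t in tags if pu[t] and pv[t]]
    a = np.array([pu[t] for t in common]) - np.mean(list(pu.values()))
    b = np.array([pv[t] for t in common]) - np.mean(list(pv.values()))
    den = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / den) if den else 0.0

print(sim_explicit("u1", "u2"))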

3.4.2. User-Implicit Similarity Relationship Acquisition

The above method of calculating user similarity with tags as the medium can alleviate, to a certain extent, the problem of users having few commonly rated items. Tags link users along different paths, so even users with no direct relationship can establish connections to a certain extent, as shown in Figure 4.
By (23), the preferences of all users for any tag can be obtained, yielding the user-tag matrix. The user-tag rating network can be expressed as $G = \{U, V, E\}$, where $U$ and $V$ represent the user and tag sets, respectively, and $E$ is the edge set whose weights equal the users’ preferences for the tags. Suppose that a meta-path is expressed as $P = T_1 \xrightarrow{(L_1, M_1)} T_2 \xrightarrow{(L_2, M_2)} \cdots \xrightarrow{(L_k, M_k)} T_{k+1}$, where $T_k$ represents the entity type, $L_k$ represents the relationship type between entities, and $M_k$ represents the weight between entities. For the heterogeneous bipartite network $G = \{U, V, E\}$ above, the meta-path $T_1 \leftarrow U \rightarrow T_2$ can represent one user’s different preferences for two tags. For example, $T_1 \xleftarrow{(a)} U_1 \xrightarrow{(b)} T_2$ means that the preference of user 1 for tag $T_1$ is $a$, and that for tag $T_2$ is $b$.
Meta-paths in information networks can be viewed as features in traditional datasets. Given a meta-path $P = T_1 \xrightarrow{(L_1, M_1)} T_2 \xrightarrow{(L_2, M_2)} \cdots \xrightarrow{(L_k, M_k)} T_{k+1}$, the similarity between any two entity objects can be obtained using random walks, path counting, and other measures. However, these tend to assign higher similarity to highly visible or concentrated objects and fail to capture the semantics of peer-object similarity. The similarity between two objects of the same type can instead be measured with the PathSim method:
$$sim_{\text{implicit}}(u_i, u_j) = \frac{2 \times \left| \{ p_{u_i \rightarrow u_j} : p_{u_i \rightarrow u_j} \in P \} \right|}{\left| \{ p_{u_i \rightarrow u_i} : p_{u_i \rightarrow u_i} \in P \} \right| + \left| \{ p_{u_j \rightarrow u_j} : p_{u_j \rightarrow u_j} \in P \} \right|}$$
This yields the implicit similarity between any two users. The similarity relationship between users $u_i$ and $u_j$ can then be expressed as:
$$S_{u_i u_j} = \beta \, sim_{\text{explicit}}(u_i, u_j) + (1 - \beta) \, sim_{\text{implicit}}(u_i, u_j)$$
After the preference similarities among all users are obtained, the user preference similarity matrix $S$ can be formed and normalized so that $\sum_{u' \in B_u} S_{u u'} = 1$, where $B_u = \{ u' \mid u' \in U, S_{u u'} > 0 \}$ represents the set of users who have similarities with user $u$.
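When the user-tag preference matrix is available, PathSim over the user-tag-user meta-path reduces to a commuting-matrix computation (here generalized to the weighted preference graph). The sketch below uses an illustrative preference matrix and combines the two similarities with the weighted blend above; none of the values are from the paper.

import numpy as np

# P[u, t]: preference of user u for tag t (illustrative values)
P = np.array([[0.9, 0.1, 0.0],
              [0.8, 0.2, 0.1],
              [0.0, 0.0, 1.0]])

M = P @ P.T                    # M[i, j]: weight of user_i -> tag -> user_j paths

def pathsim(i, j):
    # PathSim: symmetric normalization by the self-path weights
    return 2 * M[i, j] / (M[i, i] + M[j, j])

beta = 0.5
sim_expl = np.eye(3)           # placeholder explicit similarities (Section 3.4.1)
S = np.array([[beta * sim_expl[i, j] + (1 - beta) * pathsim(i, j)
               for j in range(3)] for i in range(3)])
S = S / S.sum(axis=1, keepdims=True)   # row-normalize as described above
print(np.round(S, 3))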

3.5. Probabilistic Matrix Factorization Recommendation Model Integrating Item Relevance and User Similarity

Figure 5 shows the probabilistic matrix factorization model based on item association proposed in this paper. The model integrates the item association and user similarity relationships: it factorizes three matrices simultaneously and correlates them through a shared latent feature space, so that during learning the model can minimize the objective function and learn the low-dimensional user and item latent feature matrices under the constraints of item association and user similarity. Finally, the target user’s predicted score for an item is obtained.
Assume a rating matrix $R$ containing the ratings of $M$ items by $N$ users. By factorizing the rating matrix $R$, we can learn the low-dimensional user and item latent feature matrices $U \in \mathbb{R}^{d \times N}$ and $V \in \mathbb{R}^{d \times M}$, where $d$ is the dimension of the feature vectors.
$U_u$ and $V_v$ represent the user and item latent feature vectors, respectively. The predicted score $R_{uv}$ of the item by the target user can be obtained through the inner product of $U_u$ and $V_v$. In addition, the user similarity matrix $S$ obtained in Section 3.4 can be factorized to learn the low-dimensional latent feature matrices $U$ and $Z$ of users and of similar users; $U$ and $Z$ are essentially the same, both representing the user latent feature matrix. The item correlation matrix $C$ is obtained as in Section 3.3.
Assume that $U$, $V$, and $Z$ all follow zero-mean Gaussian distributions. Following (5), the prior distributions of $U$, $V$, and $Z$ can be expressed as:
$$P\left(U \mid \sigma_U^2\right) = \prod_{u=1}^{N} \mathcal{N}\left(U_u \mid 0, \sigma_U^2 I\right), \quad P\left(V \mid \sigma_V^2\right) = \prod_{j=1}^{M} \mathcal{N}\left(V_j \mid 0, \sigma_V^2 I\right), \quad P\left(Z \mid \sigma_Z^2\right) = \prod_{k=1}^{N} \mathcal{N}\left(Z_k \mid 0, \sigma_Z^2 I\right)$$
where $\mathcal{N}(x \mid \mu, \sigma^2)$ means that variable $x$ follows a Gaussian distribution with mean $\mu$ and variance $\sigma^2$, and $I$ is the identity matrix.
Following (4), the conditional probabilities of the item correlation matrix $C$ and the user similarity matrix $S$ can be defined as:
$$p\left(C \mid V, \sigma_C^2\right) = \prod_{j=1}^{M} \prod_{t=1}^{M} \left[ \mathcal{N}\left( C_{jt} \mid g(V_j^T V_t), \sigma_C^2 \right) \right]^{l_{jt}^C}, \qquad p\left(S \mid U, Z, \sigma_S^2\right) = \prod_{i=1}^{N} \prod_{k=1}^{N} \left[ \mathcal{N}\left( S_{ik} \mid g(U_i^T Z_k), \sigma_S^2 \right) \right]^{l_{ik}^S}$$
where $S_{ik}$ represents the similarity between users, $C_{jt}$ represents the direct correlation degree between items, and $l_{jt}^C$ and $l_{ik}^S$ are indicator functions. According to Bayesian inference, the posterior probability of $U$, $V$, and $Z$ can be expressed as:
$$\begin{aligned} p\left(U, V, Z \mid R, S, C, \sigma_R^2, \sigma_S^2, \sigma_C^2, \sigma_U^2, \sigma_V^2, \sigma_Z^2\right) &\propto p\left(R \mid U, V, \sigma_R^2\right) p\left(S \mid U, Z, \sigma_S^2\right) p\left(C \mid V, \sigma_C^2\right) p\left(U \mid \sigma_U^2\right) p\left(V \mid \sigma_V^2\right) p\left(Z \mid \sigma_Z^2\right) \\ &= \prod_{i=1}^{N} \prod_{j=1}^{M} \left[ \mathcal{N}\left( R_{ij} \mid g(U_i^T V_j), \sigma_R^2 \right) \right]^{I_{ij}^R} \times \prod_{i=1}^{N} \prod_{k=1}^{N} \left[ \mathcal{N}\left( S_{ik} \mid g(U_i^T Z_k), \sigma_S^2 \right) \right]^{I_{ik}^S} \\ &\quad \times \prod_{j=1}^{M} \prod_{t=1}^{M} \left[ \mathcal{N}\left( C_{jt} \mid g(V_j^T V_t), \sigma_C^2 \right) \right]^{I_{jt}^C} \times \prod_{i=1}^{N} \mathcal{N}\left(U_i \mid 0, \sigma_U^2 I\right) \times \prod_{j=1}^{M} \mathcal{N}\left(V_j \mid 0, \sigma_V^2 I\right) \times \prod_{k=1}^{N} \mathcal{N}\left(Z_k \mid 0, \sigma_Z^2 I\right) \end{aligned}$$
Maximizing the above posterior is equivalent to minimizing the following objective function:
$$\begin{aligned} L(R, S, C, U, V, Z) &= \frac{1}{2} \sum_{i=1}^{N} \sum_{j=1}^{M} I_{ij}^R \left( R_{ij} - g(U_i^T V_j) \right)^2 + \frac{\lambda_S}{2} \sum_{i=1}^{N} \sum_{k=1}^{N} I_{ik}^S \left( S_{ik} - g(U_i^T Z_k) \right)^2 \\ &\quad + \frac{\lambda_C}{2} \sum_{j=1}^{M} \sum_{t=1}^{M} I_{jt}^C \left( C_{jt} - g(V_j^T V_t) \right)^2 + \frac{\lambda_U}{2} \| U \|_F^2 + \frac{\lambda_V}{2} \| V \|_F^2 + \frac{\lambda_Z}{2} \| Z \|_F^2 \end{aligned} \tag{29}$$
where $\lambda_S = \sigma_R^2 / \sigma_S^2$, $\lambda_C = \sigma_R^2 / \sigma_C^2$, $\lambda_U = \sigma_R^2 / \sigma_U^2$, $\lambda_V = \sigma_R^2 / \sigma_V^2$, and $\lambda_Z = \sigma_R^2 / \sigma_Z^2$. The $U$, $V$, and $Z$ matrices can be updated iteratively by gradient descent to minimize (29). The gradients are given in (30):
$$\begin{aligned} \frac{\partial L}{\partial U_i} &= \sum_{j=1}^{M} I_{ij}^R \, g'(U_i^T V_j) \left( g(U_i^T V_j) - R_{ij} \right) V_j + \lambda_S \sum_{k=1}^{N} I_{ik}^S \, g'(U_i^T Z_k) \left( g(U_i^T Z_k) - S_{ik} \right) Z_k + \lambda_U U_i \\ \frac{\partial L}{\partial V_j} &= \sum_{i=1}^{N} I_{ij}^R \, g'(U_i^T V_j) \left( g(U_i^T V_j) - R_{ij} \right) U_i + \lambda_C \sum_{k=1}^{M} I_{kj}^C \, g'(V_k^T V_j) \left( g(V_k^T V_j) - C_{kj} \right) V_k + \lambda_V V_j \\ \frac{\partial L}{\partial Z_k} &= \lambda_S \sum_{i=1}^{N} I_{ik}^S \, g'(U_i^T Z_k) \left( g(U_i^T Z_k) - S_{ik} \right) U_i + \lambda_Z Z_k \end{aligned} \tag{30}$$
where $g'(x)$ is the derivative of the logistic function; $\lambda_U$, $\lambda_V$, and $\lambda_Z$ control the regularization, and $\lambda_C$ and $\lambda_S$ control the proportions of item correlation and user similarity in the overall recommendation process. The complete procedure is given in Algorithm 1.
Algorithm 1. PMF recommendation model integrating the relevance of users’ rated items.
Input: rating matrix R; item correlation matrix C; user similarity matrix S; latent feature dimension d; weights λC and λS; learning rate α; maximum iteration number maxNum; convergence threshold ε
Output: rating prediction matrix R′
Initialize U, V, and Z with samples from a zero-mean Gaussian distribution with variance δ²
Initialize λC, λS, α, maxNum, and ε
Obtain the item correlation matrix C as in Section 3.3
Obtain the user similarity matrix S as in Section 3.4
while t < maxNum do
  for each r_{i,j} ∈ R do
    update U_i ← U_i − α ∂L/∂U_i according to Equation (30)
    update V_j ← V_j − α ∂L/∂V_j according to Equation (30)
    update Z_k ← Z_k − α ∂L/∂Z_k according to Equation (30)
  end for
  calculate the new objective value L_new according to Equation (29)
  if |L − L_new| < ε then
    break
  end if
  L ← L_new; t ← t + 1
end while
output U, V, Z
calculate the score prediction matrix R′ = UᵀV
return R′
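For readers who prefer code to pseudocode, the following compact numpy sketch of Algorithm 1 performs full-gradient updates according to Equation (30) and checks convergence of the objective (29). Dense matrices with zeros for missing entries and all hyperparameter values are illustrative simplifications, not the authors' implementation, which updates per rating via SGD.

import numpy as np

def g(x):  return 1.0 / (1.0 + np.exp(-x))     # logistic function
def gp(x): return g(x) * (1.0 - g(x))          # its derivative g'(x)

def train_ic_us_pmf(R, S, C, d=10, lr=0.01, lam_S=0.3, lam_C=0.3,
                    lam_U=0.01, lam_V=0.01, lam_Z=0.01,
                    max_iter=100, threshold=1e-4):
    N, M = R.shape
    rng = np.random.default_rng(0)
    U = rng.normal(0, 0.1, (d, N))             # user latent factors
    V = rng.normal(0, 0.1, (d, M))             # item latent factors
    Z = rng.normal(0, 0.1, (d, N))             # "similar user" latent factors
    IR, IS, IC = R > 0, S > 0, C > 0           # indicator matrices

    def objective():                           # Equation (29)
        eR, eS, eC = IR*(R - g(U.T@V)), IS*(S - g(U.T@Z)), IC*(C - g(V.T@V))
        return 0.5*((eR**2).sum() + lam_S*(eS**2).sum() + lam_C*(eC**2).sum()
                    + lam_U*(U**2).sum() + lam_V*(V**2).sum() + lam_Z*(Z**2).sum())

    prev = objective()
    for _ in range(max_iter):
        ER = IR * (g(U.T@V) - R) * gp(U.T@V)   # N x M rating residuals
        ES = IS * (g(U.T@Z) - S) * gp(U.T@Z)   # N x N similarity residuals
        EC = IC * (g(V.T@V) - C) * gp(V.T@V)   # M x M correlation residuals
        dU = V @ ER.T + lam_S * (Z @ ES.T) + lam_U * U   # Equation (30)
        dV = U @ ER   + lam_C * (V @ EC)   + lam_V * V
        dZ = lam_S * (U @ ES) + lam_Z * Z
        U, V, Z = U - lr*dU, V - lr*dV, Z - lr*dZ
        cur = objective()
        if abs(prev - cur) < threshold:        # convergence test of Algorithm 1
            break
        prev = cur
    return U.T @ V                             # prediction matrix R' = U^T V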

3.6. Algorithm Complexity Analysis

The time complexity of the proposed algorithm comprises the computation of the objective function $L$ and of the gradients. The complexity of computing $L$ is $O(d|R^{tr}| + d|C| + d|S|)$, where $|C|$ and $|S|$ are the numbers of item correlations and user similarities. The complexity of computing the gradients in one iteration is $O(d|R^{tr}|\bar{r} + d|C| + d|S|)$, where $\bar{r}$ is the average number of item ratings. Because $\bar{r} \ll |R^{tr}|, |C|, |S|$, the overall complexity of the algorithm is linear in the numbers of ratings, item correlations, and user similarity relationships.

4. Experiment Analysis

In this section, we first introduce the datasets, experimental environment, and evaluation metrics, then introduce the comparison algorithms and the parameters affecting the proposed algorithm, and finally compare and analyze the experimental results. The experiments are designed to answer the following two questions.
  • How does the matrix factorization model integrating item correlation and user similarity information perform in personalized recommendation scenarios?
  • How do the influencing factors of the matrix factorization model integrating item correlation and user similarity affect the performance of the recommendation system?

4.1. Experimental Setup

4.1.1. Dataset and Experimental Environment

To verify the influence of different information on the accuracy of rating prediction in matrix factorization, we selected three public datasets, Delicious-2k, hetrec2011-MovieLens-2k, and Last.fm-2k, for the experiments. The Delicious-2k dataset, released at HetRec 2011, comes from the Delicious social bookmarking system (accessed on May 2011, http://www.delicious.com) and contains social network, bookmarking, and tagging information. The MovieLens-2k dataset is an extension of the MovieLens10M dataset published by GroupLens (accessed on January 2009, http://www.grouplens.org), which links the original MovieLens dataset with the Internet Movie Database (IMDb) movie review system. Last.fm-2k (accessed on May 2011, http://www.lastfm.com) was released at the 2nd International Workshop on Information Heterogeneity and Fusion in Recommender Systems in 2011. Last.fm is one of the most famous social music platforms in the world; it uses collective wisdom to make personalized music recommendations for users.
The statistics related to the three datasets are shown in Table 5.
Our study divided each dataset into a training set (80%) and a test set (20%). The training and test sets do not intersect, i.e., $R_{test} \cap R_{train} = \emptyset$. The training set is used to learn the parameters of each recommendation algorithm, and the test set is used to evaluate recommendation performance.
The software environment of our experiments was Windows 11 (64-bit) with Anaconda3 and Python 3.7. The hardware environment was an Intel i7-10875H CPU @ 2.30 GHz with 16 GB of memory.

4.1.2. Evaluation Metrics

We choose the mean absolute error (MAE) and the root mean squared error (RMSE) as the metrics for evaluating recommendation performance. Both describe the degree of deviation between the predicted and actual scores: MAE measures the mean absolute deviation between the predicted and actual scores, and RMSE the root mean squared deviation. We divided the dataset into training and test sets and evaluated recommendation performance with five-fold cross-validation. The metrics are defined as
$$\text{MAE} = \frac{\sum_{(u,i) \in R_{test}} \left| r_{ui} - \hat{r}_{ui} \right|}{\left| R_{test} \right|} \quad \text{and} \quad \text{RMSE} = \sqrt{\frac{\sum_{(u,i) \in R_{test}} \left( r_{ui} - \hat{r}_{ui} \right)^2}{\left| R_{test} \right|}}$$
where $R_{test}$ is the test set and $|R_{test}|$ represents the number of elements in it; $r_{ui}$ represents the true score of item $i$ by user $u$, and $\hat{r}_{ui}$ the predicted score. The smaller the MAE and RMSE values, the smaller the error between the predicted and actual scores, and the higher the accuracy of the algorithm.
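Both metrics are one-liners in numpy; the sketch below evaluates them on made-up true/predicted rating pairs.

import numpy as np

true = np.array([4.0, 3.5, 5.0, 2.0])   # illustrative test-set ratings
pred = np.array([3.6, 3.9, 4.4, 2.5])   # illustrative model predictions

mae  = np.abs(true - pred).mean()                 # mean absolute error
rmse = np.sqrt(((true - pred) ** 2).mean())       # root mean squared error
print(f"MAE = {mae:.3f}, RMSE = {rmse:.3f}")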

4.2. Parameter Setting

In this paper, the stochastic gradient descent (SGD) algorithm is used to optimize the loss function. Throughout the optimization, the value of each parameter directly affects the accuracy of the score prediction, so the parameters of the objective function must be set carefully. Next, we discuss the effects of the learning rate $\alpha$, the item association weight $\lambda_C$, and the user similarity weight $\lambda_S$ on recommendation accuracy.

4.2.1. Influence of Learning Rate on Recommendation Performance

The proposed model solves the objective function through SGD. The learning rate $\alpha$ directly affects the update speeds of $\partial L / \partial U_i$, $\partial L / \partial V_j$, and $\partial L / \partial Z_k$. We set $\alpha$ to 0.00001, 0.0001, 0.001, 0.01, 0.1, and 0.3 to observe the impact of the learning rate on performance, as shown in Figure 6.
Figure 6 shows that, as the learning rate $\alpha$ increases above 0.00001, the RMSE trends downward. When $\alpha$ equals 0.01, the recommendation performance reaches its optimum on all three datasets. When $\alpha$ exceeds this value, the RMSE begins to rise again, the recommendation performance gradually declines, and the model’s learning ability is impaired. Therefore, we set $\alpha$ to 0.01 in our model.

4.2.2. Influence of the Item Correlation Weight $\lambda_C$ on Performance

The item correlation weight $\lambda_C$ controls the influence of the correlation between items on the entire recommendation algorithm. We set $\lambda_C$ to 0.001, 0.01, 0.1, 0.3, 0.5, and 1 to determine its impact on performance, as shown in Figure 7.
Figure 7 shows that, for $\lambda_C = 0.3$, the RMSE of our model is smallest, meaning that the recommendation performance is optimal. This demonstrates that the item correlation relationship significantly improves recommendation performance and affects system performance. Integrating the item correlation relationship into the probabilistic matrix factorization model helps the model better fit user preferences; meanwhile, problems such as the cold start caused by insufficient user rating information are alleviated.

4.2.3. Influence of the User Similarity Weight $\lambda_S$ on Performance

The user similarity weight $\lambda_S$ likewise influences the entire algorithm. We set $\lambda_S$ to 0.001, 0.01, 0.1, 0.3, 0.5, and 1 to observe its impact on recommendation performance, as shown in Figure 8.
Figure 8 shows that, when the user similarity weight $\lambda_S$ is 0.3, the RMSE of the model is smallest, indicating optimal recommendation performance. The user similarity relationship is crucial to improving recommendation performance and directly affects system performance; integrating it into the probabilistic matrix factorization model helps mine users’ interests more accurately through their similar friends.

4.3. Comparison Method

To verify the influence on rating prediction in matrix factorization of fusing the explicit item association relationship, the implicit item association relationship based on mass diffusion, the explicit user similarity based on user-tag preferences, and the implicit user similarity mediated by tags, we select five mainstream recommendation algorithms and compare them with our method. The five algorithms are introduced below.
  • PMF: the PMF model [11] factorizes large, sparse, and unbalanced rating matrices efficiently.
  • SVD++: the SVD++ algorithm, proposed by Koren et al. [14], considers the bias information of users and items and the implicit feedback information of a user in the rating prediction with high recommendation accuracy.
  • TrustMF: the TrustMF algorithm, proposed by Yang et al. [49], reasonably reflects the impact of the correlation relationship between users who trust each other on the recommendation performance.
  • UB-HUS: the UB-HUS algorithm, proposed by Wang et al. [50], is a user-based nearest neighbor recommendation method with higher prediction accuracy in the face of data sparsity.
  • JSR: the JSR algorithm, proposed by Ji et al. [51], proposes a three-factor matrix factorization method by combining tags marked by the users and keywords of the items with the social information.
Based on the experiments in the previous section, we know the parameter values at which recommendation performance is best: we finally set $\alpha = 0.01$ and $\lambda_C = \lambda_S = 0.3$, with the number of iterations set to 100 (the algorithm achieves its best performance at 100 iterations), and compare the algorithms on MAE and RMSE. To verify the performance of the algorithm in the cold-start scenario, we call the set of users with fewer than five ratings in the training set the cold-start user set. We then compare the recommendation performance of the algorithms on the full user set and on the cold-start user set. Table 6 reports the MAE and RMSE of each recommendation algorithm on the full user set, and Table 7 on the cold-start user set.
The experimental results show that SVD++ outperforms PMF on the full user set, indicating that the user bias factor, item bias factor, and users’ implicit feedback improve the rating prediction accuracy of the algorithm. On the full user set, TrustMF and JSR also improve recommendation accuracy considerably over PMF, showing that users’ social relations help improve the prediction accuracy of the traditional matrix factorization model and alleviate the user cold-start problem. Although our algorithm does not explicitly model the trust relationship between users, it fully mines latent user trust with the help of tag information and exploits the transitivity of nodes in heterogeneous networks; it therefore achieves higher accuracy than the other algorithms. On the cold-start user set, the recommendation accuracy of our algorithm improves significantly over the comparison algorithms. For cold-start users, the number of social relations in Delicious-2k, MovieLens-2k, and Last.fm-2k is often small, which limits how much social relations can improve the accuracy of the recommendation algorithm; however, when the target item is not cold-start, considering the relationships between items via the mass-diffusion principle improves the rating prediction accuracy. In addition, in the cold-start scenario, the algorithm fully mines user similarity with the help of the user-item heterogeneous network, which also plays a vital role in improving recommendation accuracy.
Beyond these performance advantages, our algorithm only needs to compute the item correlation degree and user similarity, not the user trust relationship, so the time complexity of evaluating the loss function is $O(d|R^{tr}| + d|C| + d|S|)$, where $|C|$ and $|S|$ are the numbers of item correlations and user similarities. This is much lower than that of the TrustMF and JSR algorithms, which must consider social trust relationships: computing the trust degree requires traversing the trust relationships between all pairs of users.

5. Conclusions

This paper proposes a method to measure the degree of item correlation. The association rule algorithm is adopted to obtain the item-explicit correlation, and bipartite graph resource allocation yields the item-implicit correlation; the final relationship between items is obtained by weighting the two. Next, the user-explicit similarity relationship is represented through the degree of user preference for tags, and the user-implicit similarity relationship is obtained by random walks over the meta-paths of the heterogeneous network; the final similarity relationship is again obtained by weighting. After obtaining the item correlation and user similarity relationships, we build the PMF recommendation model based on the relevance of rated items, which integrates both relationships into the PMF model and produces the predicted rating of the target item by the target user. We validated the model on the Delicious-2k, MovieLens-2k, and Last.fm-2k datasets. The results show that our model is more accurate than current recommendation algorithms. The proposed model is mainly applicable to recommendation scenarios with sparse data, especially cold-start scenarios with few user ratings.
In future research, we will focus on incorporating important information such as contextual information (time and location) and visual features of items into the PMF model. We will also introduce an attention mechanism to better predict the target user’s rating of an item. These will be the focus of our work and the direction of our model improvements.

Author Contributions

Conceptualization, L.H. and L.C.; methodology, L.H. and L.C.; software, X.S.; validation, X.S. Writing and original draft preparation, L.H.; writing and review/editing, L.C.; visualization, X.S.; supervision, L.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the Key R&D Program of Shaanxi Province, grant number 2019ZDLGY10-01.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

References

  1. Chiang, K.Y.; Hsieh, C.; Dhillon, I.S. Matrix completion with noisy side information. In Proceedings of the 28th International Conference on NIPS, Montreal, QC, Canada, 7–12 December 2015; pp. 3447–3455. [Google Scholar]
  2. Rao, N.; Yu, H.F.; Ravikumar, P. Collaborative filtering with graph information: Consistency and scalable methods. In Proceedings of the 28th International Conference on NIPS, Montreal, QC, Canada, 7–12 December 2015; pp. 2107–2115. [Google Scholar]
  3. Bhaskar, S.A. Probabilistic low-rank matrix completion from quantized measurements. J. Mach. Learn. Res. 2016, 17, 2131–3164. [Google Scholar]
  4. Koren, Y.; Bell, R.; Volinsky, C. Matrix factorization techniques for recommendation systems. Computer 2009, 42, 30–37. [Google Scholar] [CrossRef]
  5. Hu, Y.; Koren, Y.; Volinsky, C. Collaborative filtering for implicit feedback datasets. In Proceedings of the 2008 Eighth IEEE International Conference on Data Mining, Washington, DC, USA, 15–19 December 2008; pp. 263–272. [Google Scholar]
  6. Srebro, N.; Rennie, J.D.M.; Jaakkola, T.S. Maximum-margin matrix factorization. In Proceedings of the 17th International Conference on NIPS, Vancouver, BC, Canada, 1 December 2004; pp. 1329–1336. [Google Scholar]
  7. Liu, X.; Aggarwal, C.; Li, Y.; Kong, X.; Sun, X.; Sathe, S. Kernelized matrix factorization for collaborative filtering. In Proceedings of the 2016 SIAM International Conference on Data Mining, Miami, FL, USA, 5–7 May 2016; pp. 378–386. [Google Scholar]
  8. Sun, J.Z.; Parthasarathy, D.; Varshney, K.R. Collaborative kalman filtering for dynamic matrix factorization. Trans. Sig. Proc. 2014, 62, 3499–3509. [Google Scholar] [CrossRef]
9. Golub, G.; Kahan, W. Calculating the singular values and pseudo-inverse of a matrix. J. Soc. Ind. Appl. Math. Ser. B Numer. Anal. 1965, 2, 205–224. [Google Scholar] [CrossRef]
  10. Berry, M.W.; Browne, M.; Langville, A.N. Algorithms and applications for approximate nonnegative matrix factorization. Comput. Stat. Data Anal. 2007, 52, 155–173. [Google Scholar] [CrossRef] [Green Version]
  11. Salakhutdinov, R.; Mnih, A. Probabilistic matrix factorization. In Proceedings of the 20th International Conference on NIPS, Vancouver, BC, Canada, 3–6 December 2007; pp. 1257–1264. [Google Scholar]
  12. Salakhutdinov, R.; Mnih, A. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In Proceedings of the 25th International Conference on Machine learning, Helsinki, Finland, 5–9 July 2008; pp. 880–887. [Google Scholar]
13. Jamali, M.; Ester, M. A matrix factorization technique with trust propagation for recommendation in social networks. In Proceedings of the 4th ACM Conference on Recommender Systems, Barcelona, Spain, 26–30 September 2010; pp. 135–142. [Google Scholar]
14. Koren, Y. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on KDD, Las Vegas, NV, USA, 24–27 August 2008; pp. 426–434. [Google Scholar]
15. Ji, K.; Shen, H. Addressing cold-start: Scalable recommendation with tags and keywords. Knowl.-Based Syst. 2015, 83, 42–50. [Google Scholar] [CrossRef]
  16. Forsati, R.; Mahdavi, M.; Shamsfard, M.; Sarwat, M. Matrix factorization with explicit trust and distrust side information for improved social recommendation. ACM Trans. Inf. Syst. 2014, 14, 1–38. [Google Scholar] [CrossRef] [Green Version]
  17. Wu, L.; Chen, E.; Liu, Q.; Xu, L.L.; Bao, T.F.; Zhang, L. Leveraging tagging for neighborhood-aware probabilistic matrix factorization. In Proceedings of the 21st ACM International Conference on IKM, Maui, HI, USA, 25–29 November 2012; pp. 1854–1858. [Google Scholar]
  18. Yu, X.; Ren, X.; Gu, Q.; Sun, Y.; Han, J. Collaborative filtering with entity similarity regularization in heterogeneous information networks. In Proceedings of the IJCAI-13 HINA Workshop, Beijing, China, 3–9 August 2013. [Google Scholar]
  19. Luo, C.; Pang, W.; Wang, Z.; Lin, C.H. Hete-cf: Social-based collaborative filtering recommendation using heterogeneous relations. In Proceedings of the 2014 IEEE International Conference on Data Mining, Shenzhen, China, 14–17 December 2014; pp. 917–922. [Google Scholar]
  20. Wan, L.; Xia, F.; Kong, X.; Hsu, C.; Huang, R.; Ma, J. Deep Matrix Factorization for Trust-Aware Recommendation in Social Networks. IEEE Trans. Netw. Sci. Eng. 2021, 8, 511–528. [Google Scholar] [CrossRef]
  21. De Meo, P. Trust Prediction via Matrix Factorisation. ACM Trans. Internet Technol. 2019, 44, 20. [Google Scholar]
  22. Xu, S.; Zhuang, H.; Sun, F.; Wang, S.; Wu, T.; Dong, J. Recommendation algorithm of probabilistic matrix factorization based on directed trust. Comput. Electr. Eng. 2021, 93, 107206. [Google Scholar] [CrossRef]
23. Bobadilla, J.; Bojorque, R.; Esteban, A.H.; Hurtado, R. Recommender systems clustering using Bayesian non-negative matrix factorization. IEEE Access 2017, 6, 3549–3564. [Google Scholar] [CrossRef]
  24. Liu, J.; Jiang, Y.; Li, Z.; Zhang, X.; Lu, H.Q. Domain-sensitive recommendation with user-item subgroup analysis. IEEE Trans. Knowl. Data Eng. 2016, 28, 939–950. [Google Scholar] [CrossRef]
  25. Guo, H.; Liu, G.; Su, B.; Meng, K. Collaborative filtering recommendation algorithm combining community structure and interest clusters. J. Comput. Res. Dev. 2016, 28, 939–950. [Google Scholar] [CrossRef]
  26. Zhu, X.; Guo, J.; Li, S.; Hao, T. Facing cold-start: A live TV recommendation system based on neural networks. IEEE Access 2020, 8, 131286–131298. [Google Scholar] [CrossRef]
  27. Liu, S. Enhancing graph neural networks for recommendation systems. In Proceedings of the 43rd International ACM SIGIR Conference on Research and Development in IR, Xi’an, China, 25–30 July 2020; p. 2484. [Google Scholar]
28. Yin, R.; Li, K.; Zhang, G.; Lu, J. A deeper graph neural network for recommendation systems. Knowl.-Based Syst. 2019, 185, 105020. [Google Scholar] [CrossRef]
  29. Qian, M.; Hong, L.; Shi, Y.; Rajan, S. Structured sparse regression for recommendation systems. In Proceedings of the 24th ACM International Conference on IKM, Melbourne, Australia, 18–23 October 2015; pp. 1895–1898. [Google Scholar]
  30. Xu, Y.; Yang, Y.; Han, J.; Wang, E.; Ming, J.; Xiong, H. Slanderous user detection with modified recurrent neural networks in recommendation system. Inf. Sci. 2019, 505, 265–281. [Google Scholar] [CrossRef]
  31. Wang, H.; Li, W.J. Relational collaborative topic regression for recommendation systems. IEEE Trans. Knowl. Data Eng. 2015, 27, 1343–1355. [Google Scholar] [CrossRef] [Green Version]
  32. Hofmann, T.; Puzicha, J. Latent class models for collaborative filtering. In Proceedings of the 16th International Joint Conference on AI, Stockholm, Sweden, 31 July–6 August 1999; pp. 688–693. [Google Scholar]
  33. Li, X.; Lv, Q.; Huang, W. Learning similarity with probabilistic latent semantic analysis for image retrieval. KSII Trans. Internet Inf. Syst. 2015, 9, 1424–1440. [Google Scholar]
  34. Pliakos, K.; Kotropoulos, C. Building an image annotation and tourism recommendation system. Int. J. Artif. Intell. Tools 2015, 24, 1540021. [Google Scholar] [CrossRef]
  35. Agrawal, R.; Srikant, R. Fast algorithms for mining association rules. In Proceedings of the 20th International Conference on Very Large Data Bases, Santiago, Chile, 12–15 September 1994; pp. 487–499. [Google Scholar]
  36. Han, J.; Pei, J.; Yin, Y.; Mao, R. Mining frequent patterns without candidate generation. Data Min. Knowl. Discov. 2000, 8, 53–87. [Google Scholar] [CrossRef]
  37. Zhang, Z.; Pedrycz, W.; Huang, J. Efficient frequent itemsets mining through sampling and information granulation. Eng. Appl. Artif. Intell. 2017, 65, 119–136. [Google Scholar] [CrossRef]
  38. Zhou, T.; Ren, J.; Medo, M.; Zhang, Y.C. Bipartite network projection and personal recommendation. Phys. Rev. E. 2007, 76, 046115. [Google Scholar] [CrossRef] [PubMed] [Green Version]
  39. Zhang, Y.C.; Medo, M.; Ren, J.; Zhou, T.; Li, T.; Yang, F. Recommendation model based on opinion diffusion. Europhys. Lett. 2007, 80, 68003. [Google Scholar] [CrossRef]
  40. Ietswaart, R.; Gyori, B.M.; Bachman, J.A.; Sorger, P.K.; Churchman, L.S. GeneWalk identifies relevant gene functions for a biological context using network representation learning. Genome Biol. 2021, 22, 1–35. [Google Scholar] [CrossRef] [PubMed]
  41. Perozzi, B.; Al-Rfou, R.; Skiena, S. DeepWalk: Online learning of social representations. In Proceedings of the 20th ACM SIGKDD International Conference on KDD, New York, NY, USA, 24–27 August 2014; pp. 701–710. [Google Scholar]
  42. Grover, A.; Leskovec, J. node2vec: Scalable feature leaming for networks. In Proceedings of the 22nd ACM SIGKDD International Conference on KDD, San Francisco, CA, USA, 13–17 August 2016; pp. 855–864. [Google Scholar]
  43. Wang, D.; Cui, P.; Zhou, W. Structural deep network embedding. In Proceedings of the 22nd ACM SIGKDD International Conference on KDD, San Francisco, CA, USA, 13–17 August 2016; pp. 1225–1234. [Google Scholar]
  44. Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; Dean, J. Distributed representations of words and phrases and their compositionality. In Proceedings of the 26th International Conference on NIPS, Lake Tahoe, NV, USA, 5–10 December 2013; pp. 3111–3119. [Google Scholar]
  45. Tran, T.; Lee, K.; Liao, Y.; Lee, D. Regularizing matrix factorization with user and item embeddings for recommendation. In Proceedings of the 27th ACM International Conference on IKM, Torino, Italy, 22–26 October 2018; pp. 687–696. [Google Scholar]
  46. Dong, Y.; Chawla, N.V.; Swami, A. metapath2vec: Scalable representation learning for heterogeneous networks. In Proceedings of the 23rd ACM SIGKDD International Conference on KDD, Halifax, NS, Canada, 13–17 August 2017; pp. 135–144. [Google Scholar]
  47. Gao, M.; Chen, L.; He, X.; Zhou, A. BiNE: Bipartite network embedding. In Proceedings of the 41st International ACM SIGIR Conference on R & D in IR, Ann Arbor, MI, USA, 8–12 July 2018; pp. 715–724. [Google Scholar]
48. Sybrandt, J.; Safro, I. FOBE and HOBE: First- and high-order bipartite embeddings. In Proceedings of the 16th International Workshop on Mining and Learning with Graphs, Anchorage, AK, USA, 5 August 2019; pp. 1–8. [Google Scholar]
  49. Haydar, C.; Boyer, A.; Roussanaly, A. Hybridising collaborative filtering and trust-aware recommendation systems. In Proceedings of the 8th International Conference on WebIS and Technologies-WEBIST, Porto, Portugal, 18–21 April 2012; pp. 695–700. [Google Scholar]
  50. Wang, Y.; Deng, J.; Gao, J.; Zhang, P. A hybrid user similarity model for collaborative filtering. Inf. Sci. 2017, 418, 102–118. [Google Scholar] [CrossRef]
51. Ji, K.; Shen, H. Jointly modeling content, social network and ratings for explainable and cold-start recommendation. Neurocomputing 2016, 218, 1–12. [Google Scholar] [CrossRef]
Figure 1. Illustration of the probabilistic matrix factorization model.
Figure 2. Process of mass diffusion in the bipartite graph. (A) shows the initial state of resource allocation. The allocation proceeds in two stages: in the first, resources flow from the items v to the users u, as shown in (A,B); in the second, resources flow from the users u back to the items v, as shown in (B,C).
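To accompany Figure 2, the following sketch implements the classical two-stage mass diffusion of Zhou et al. [38] on a binary user–item matrix. Reading the resulting item-to-item transfer matrix as the implicit item correlation is illustrative; the function name and normalization details are assumptions.

```python
import numpy as np

def item_correlation_by_diffusion(A):
    """Two-stage resource diffusion on a user-item bipartite graph.
    A: binary (n_users, n_items) adjacency matrix (1 = user rated item).
    Stage 1: every item splits its unit resource evenly among its users.
    Stage 2: every user splits the collected resource evenly among their items.
    Returns W, where W[i, j] is the share of item j's resource ending on item i:
        W[i, j] = (1 / k(v_j)) * sum_l A[l, i] * A[l, j] / k(u_l)."""
    k_user = A.sum(axis=1)   # user degrees k(u_l)
    k_item = A.sum(axis=0)   # item degrees k(v_j)
    with np.errstate(divide="ignore", invalid="ignore"):
        W = (A / k_user[:, None]).T @ A / k_item[None, :]
    return np.nan_to_num(W)  # zero out entries for isolated users/items
```

Applied to the transactions of Table 3, for example, A would be a 6 × 8 binary matrix and W the resulting 8 × 8 implicit item correlation matrix.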
Figure 3. Framework of the IC-US-PMF recommendation model. The model has two stages: the first obtains the user similarity matrix and the item correlation matrix; the second predicts the target user's rating of the target item with the probabilistic factorization model.
Figure 4. Diagram of a heterogeneous network.
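Figure 4 underpins the user-implicit similarity. A minimal sketch of meta-path-guided random walks, in the spirit of metapath2vec [46], is given below; the User–Item–User meta-path, the co-visit counting, and the function names are assumptions for illustration, not our exact walk scheme.

```python
import random
from collections import defaultdict

def user_item_user_walks(user_items, item_users,
                         walk_len=20, walks_per_user=10, seed=0):
    """Random walks alternating User -> Item -> User, i.e., following the
    User-Item-User meta-path on the heterogeneous network.
    user_items: dict mapping each user to the items they rated;
    item_users: dict mapping each item to the users who rated it.
    Returns co_visits[u][u2] = how often u2 was reached on walks started at u."""
    rng = random.Random(seed)
    co_visits = defaultdict(lambda: defaultdict(int))
    for start in user_items:
        for _ in range(walks_per_user):
            u = start
            for _ in range(walk_len):
                if not user_items[u]:
                    break
                item = rng.choice(user_items[u])   # User -> Item step
                u = rng.choice(item_users[item])   # Item -> User step
                if u != start:
                    co_visits[start][u] += 1
    return co_visits
```

Normalizing each user's co-visit counts (or taking cosine similarity between count vectors) then yields an implicit user similarity matrix.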
Figure 5. Diagram of the probabilistic matrix factorization model combining the item correlation relationship with the user similarity relationship.
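One plausible form of the combined objective sketched in Figure 5 is the similarity-regularized PMF loss below. It is an assumption consistent with the weights studied in Figures 7 and 8 (λC is a hypothetical symbol for the item correlation weight), not a restatement of our exact equations; P_u and Q_i denote the latent feature vectors of user u and item i, and I_ui indicates whether r_ui is observed.

```latex
\min_{P,Q}\;
\frac{1}{2}\sum_{u,i} I_{ui}\left(r_{ui}-P_u^{\top}Q_i\right)^2
+\frac{\lambda}{2}\left(\lVert P\rVert_F^2+\lVert Q\rVert_F^2\right)
+\frac{\lambda_S}{2}\sum_{u}\Bigl\lVert P_u-\sum_{f} S_{uf}\,P_f\Bigr\rVert^2
+\frac{\lambda_C}{2}\sum_{i}\Bigl\lVert Q_i-\sum_{j} C_{ij}\,Q_j\Bigr\rVert^2
```

Here S and C are the user similarity and item correlation matrices of Table 2.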
Figure 6. Influence of the update rate α on recommendation performance.
Figure 7. Influence of the item correlation relationship weight on recommendation performance.
Figure 8. Influence of the user similarity relationship weight λS on recommendation performance.
Table 1. Transaction set in a supermarket.
TID   Milk   Bread   Butter   Beer   Diapers
t1    1      1       0        0      0
t2    0      0       1        0      0
t3    0      0       0        1      1
t4    1      1       1        0      0
t5    0      1       0        0      0
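From Table 1, the support and confidence of a candidate rule can be computed directly; the short sketch below does so for the rule Milk → Bread. The helper functions are hypothetical illustrations (our explicit item correlation relies on an association rule algorithm such as Apriori [35]).

```python
# Transactions from Table 1 represented as item sets.
transactions = [
    {"milk", "bread"},            # t1
    {"butter"},                   # t2
    {"beer", "diapers"},          # t3
    {"milk", "bread", "butter"},  # t4
    {"bread"},                    # t5
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    return support(antecedent | consequent) / support(antecedent)

print(support({"milk", "bread"}))       # 2/5 = 0.4
print(confidence({"milk"}, {"bread"}))  # 0.4 / 0.4 = 1.0
```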
Table 2. Symbols and meanings.
Symbol   Meaning
U        User set
V        Item set
R        User–item rating matrix
r_ui     Rating of item i by user u
C        Item correlation relationship matrix
S        User similarity relationship matrix
Table 3. User transaction records.
User ID   Item IDs
u1        v1, v2, v4, v6
u2        v1, v2, v5, v8
u3        v1, v3, v7
u4        v1, v7
u5        v2, v3, v5, v6, v8
u6        v1, v2, v4, v5, v7
Table 4. User–item–tag rating information.
User ID   Item   Score   Tags (Tag1 … Tagn)
u1        i1     5       1 0 0 1 0
u1        i2     3       1 1 1 0 0
u1        i3     2       1 1 0 0 0
u1        i4     4       0 0 1 1 0
u1        i5     3       1 1 0 0 1
Table 5. Dataset statistics.
Dataset   Delicious-2k   MovieLens-2k   last.fm-2k
Users     1867           2113           1892
Items     69,224         10,109         17,623
Ratings   104,799        855,600        92,000
Tags      53,388         13,222         11,946
Table 6. Comparison of recommendation performance of each algorithm on all user sets.
Dataset        Algorithm   RMSE    MAE
Delicious-2k   PMF         1.291   1.003
               SVD++       1.221   0.936
               TrustMF     1.047   0.786
               UB-HUS      1.473   1.223
               JSR         1.089   0.822
               IC-US-PMF   1.015   0.756
MovieLens-2k   PMF         1.306   1.032
               SVD++       1.224   0.961
               TrustMF     1.085   0.830
               UB-HUS      1.189   0.924
               JSR         1.032   0.759
               IC-US-PMF   0.998   0.734
last.fm-2k     PMF         1.297   1.029
               SVD++       1.224   0.967
               TrustMF     1.058   0.773
               UB-HUS      1.318   1.063
               JSR         1.051   0.793
               IC-US-PMF   1.006   0.717
Table 7. Comparison of recommendation performance of each algorithm on cold user sets.
Dataset        Algorithm   RMSE    MAE
Delicious-2k   PMF         1.035   0.751
               SVD++       0.953   0.695
               TrustMF     0.786   0.531
               UB-HUS      1.193   0.923
               JSR         0.815   0.537
               IC-US-PMF   0.732   0.482
MovieLens-2k   PMF         1.042   0.762
               SVD++       0.966   0.682
               TrustMF     0.796   0.543
               UB-HUS      0.938   0.656
               JSR         0.773   0.510
               IC-US-PMF   0.742   0.484
last.fm-2k     PMF         1.038   0.761
               SVD++       0.962   0.688
               TrustMF     0.789   0.525
               UB-HUS      1.049   0.794
               JSR         0.792   0.502
               IC-US-PMF   0.738   0.449
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
