1. Introduction
With the rapid development of the Internet, information overload has become a growing problem for online users. Recommendation systems help to solve this problem by suggesting information that meets the needs of online users [1,2,3]. Collaborative filtering (CF) recommends items to users according to their browsing or purchase history, and it has been widely used in various recommendation systems due to its simplicity and effectiveness.
Current collaborative filtering recommendation methods can be divided into two approaches: memory-based and model-based. The memory-based approach [4,5,6] recommends items based on the similarity of either users or items, while the model-based approach [7,8,9,10] uses machine learning methods to build a prediction model from rating data representing users’ preferences for items, leading to higher accuracy and better scalability. In particular, probabilistic matrix factorization [11] is one of the typical model-based collaborative filtering recommendation methods.
Although current recommendation methods have achieved great success in various applications, data sparsity [12,13] and cold-start [14,15] remain two key problems. Users usually do not rate all the items they have viewed, because of their viewing or usage habits or their awareness of privacy protection. This leads to sparse rating data [16,17]. When new users or new items are introduced into the system, they cannot be modeled due to a lack of historical records. This causes the cold-start problem [18,19]. Both problems lead to inaccurate modeling of users or items, which further reduces the recommendation accuracy [20,21].
Incorporating social information into the recommendation method is one of the effective ways to address the sparsity and cold-start problems [14,22]. Connected users tend to have similar behavioral preferences, while people with similar behavioral preferences tend to establish connections [22]. Thus, when users’ historical rating data are lacking, the history of their friends can be used to make recommendations for them. Due to their effectiveness in alleviating the sparsity and cold-start problems, social recommendation methods have attracted much attention in recent years. Social recommendation based on probabilistic matrix factorization, in particular, retains the scalability of matrix factorization while achieving higher accuracy by introducing social information [8,9].
Although current social recommendation can effectively alleviate the sparsity and cold-start problems, we identify two remaining problems. First, the trust networks used by current social recommendation methods are binary. These methods only use the local information of the network, namely the direct trust relationships between users, and ignore the overall structure of the network [7,23]. Second, when social recommendation methods model trust data and rating data, they assume that trust evaluation and rating share the same preference space. However, users normally consider different factors when rating items and when developing social relationships. For example, the rating of an item may be influenced by factors such as product quality, brand and appearance, whereas the decision to trust another user may depend on factors such as that user’s occupation, personality and background.
This paper proposes a novel social recommendation framework, named TPSR, to solve the two issues identified. First, instead of using the binary trust data, we use quantified trust data that encode the topology structure of the user trust network; we use random walk with restart (RWR) to mine the user credibility and reconstruct the quantified trust network. Then, we propose the concept of primary preference feature space to represent the original user preferences, which are then mapped to rating space, truster space and trustee space and combined with reconstructed trust data and rating data to model users and items. Compared with other methods, our method is easy to understand and has higher recommendation accuracy.
The main contributions of this paper are summarized as follows:
- (1) We propose a novel social recommendation framework based on trust and preference (TPSR). The TPSR framework consists of two parts: a user trust quantification method (TQ_RWR) and a social recommendation model (UPPS).
- (2) We propose a new trust quantification method based on random walk with restart. We mine the credibility hidden in the global trust network to quantify the trust, so as to represent the trust levels of different users. In addition, we show in Section 6.3.1 that using a quantified trust network for social recommendation effectively improves the recommendation performance.
- (3) Assuming that user preferences in different scenarios are derived from the user’s primary preference feature space, we use three projection matrices to map users’ primary preference feature vectors to three preference spaces: the interaction (rating) space, the truster space and the trustee space. We propose a social recommendation model based on the primary preference space, implemented with probabilistic matrix factorization.
- (4) Experiments on four real-world datasets show the superiority of our framework. The experiment on TQ_RWR shows that the trust quantification algorithm effectively improves the utilization of trust relationship data and the accuracy of recommendation results. The experiment on TPSR shows that it alleviates the sparsity and cold-start problems.
2. Related Work
In this section, we briefly review the related work of social recommendation methods.
The memory-based social recommendation method uses a trust network to supplement the rating data and predicts users’ ratings according to the ratings from their trusted friends or from similar users. Jamali et al. [4] proposed a random walk model that combines trust and collaborative filtering methods for recommendation. Massa et al. [24] used trust data instead of rating data to find neighboring users. Berkani [25] combined collaborative filtering methods with an optimized clustering method to cluster users by similarity and trust relationships. Although the methods above effectively alleviate the sparsity and cold-start problems, the added search strategy also increases the time complexity of the algorithm. Furthermore, these methods have low scalability and are not suitable for big data scenarios.
Model-based techniques “guess” to what extent a user will like a new item: they apply machine learning algorithms to the item vectors of a specific user and build a model that predicts the user’s rating for an item newly added to the system. As a typical model-based social recommendation method, probabilistic matrix factorization represents users’ preferences for items with low-dimensional vectors and trains prediction models on rating data and trust data [2,3,8,9]. Ma et al. [23] used probabilistic matrix factorization to model trust network information and rating information. Yang et al. [7] modeled users as trusters and trustees, respectively. Liu et al. [10] used network embedding rather than PMF (probabilistic matrix factorization), and did not use RWR (random walk with restart) to quantify users’ trust. Wu et al. [26] used a dual graph attention network and proposed a new policy-based fusion strategy based on a contextual multi-armed bandit to weigh the interactions of various social effects. Wu et al. [27] designed a special feature evolution unit that enables the embedding vectors for two tasks to exchange their features in a probabilistic manner, and further harnessed a meta-controller to globally explore proper settings for the feature evolution units. Although SREPS [10], DANSER [26] and TrustEV [27] have achieved good results to a certain extent, the time efficiency of the graph embedding method used by SREPS and of the graph neural network methods used by DANSER and TrustEV is very low.
The social recommendation algorithm based on probabilistic matrix factorization not only retains the scalability of the matrix factorization method [8,9], but also utilizes social information to achieve higher accuracy. However, the trust data used by these social recommendation methods are binary: they only reflect whether there is a trust relationship between a user and a friend, and cannot distinguish the strength of that relationship. When we only use binary data for recommendation, we can only randomly select friends with a trust relationship to assist in recommendation, and so cannot find the most similar friends (because their trust values are all 1). In contrast, when we quantify the trust data, we can easily refer to the most similar friend to make a more accurate recommendation. This motivates us to propose a new framework.
3. Social Recommendation Framework Based on Trust and Preference
In this section, we first define the important notations used in our paper, then briefly describe our proposed social recommendation framework, TPSR. The two parts of the framework are described in more detail in Section 4 and Section 5, respectively. The common notations are explained in Table 1.
3.1. Definition
First, we introduce the important objects used throughout our paper; the corresponding symbols are listed in Table 1. We consider a collection of users and a collection of items. The user-item rating matrix holds, in entry (i, j), the observed rating of item j given by user i. The adjacency matrix of the social network holds, in entry (i, k), the trust relation between user i and user k. A trust relationship is usually binary: an entry of 1 indicates that user i trusts user k, and an entry of 0 indicates no trust. Two further matrices denote the feature matrices of users and items, respectively. Three projection matrices map the user primary preference feature space to the rating space, the truster space and the trustee space. Social recommendation methods predict the missing ratings in the rating matrix by mining the user preference feature information implicit in the rating matrix and the trust matrix.
3.2. Overview of Our Framework
The TPSR framework consists of two parts: a user trust quantification method (TQ_RWR) and a social recommendation model (UPPS). TQ_RWR is a user trust quantification algorithm based on random walk with restart, which measures the trust value between users by mining the user reputation hidden in the trust network topology. UPPS is a social recommendation model based on the primary preference space, implemented with probabilistic matrix factorization.
In TQ_RWR, we use random walk with restart (RWR) to mine the credibility implied in the binary trust network and further quantify the trust between users.
Existing social recommendation methods use binary data in the social trust network, where an entry of 1 indicates that one user trusts another and an entry of 0 indicates no trust. In other words, existing recommendation methods only utilize the local trust relationships, ignoring the overall structure and global information of the trust network [22].
Figure 1 shows an example of a trust network; the nodes represent users and the edges indicate the trust relationships between users. There is a pronounced “star effect” on today’s shopping and social platforms. For example, there are “Big V” accounts on microblogs that attract the attention of hundreds of millions of users. These “Big V” accounts have a high credibility (also known as reputation) among their fans, and some of their behaviors cause many fans to follow suit. Credibility therefore has a great influence on the formation of trust relationships in social networks. The central user nodes 33 and 34 in Figure 1 are similar: they have trust relationships with a large number of other users and a higher credibility than others, yet the binary trust network cannot express this global network information. TQ_RWR considers that the central users represented by nodes 33 and 34 are trusted by a large number of other users and therefore have a high degree of credibility, which reflects the trustworthiness of users.
In the UPPS model, we assume that users with social relationships exhibit a certain similarity in their interest preferences, and that the higher the trust between users, the more similar their interest preferences are. UPPS assumes that users consider different factors when building trust relationships and when rating items, as shown in Figure 2.
Users may consider factors such as product quality, brand and appearance in the rating scene, and factors such as another user’s occupation, personality and background in the social scene. Each user has their own values, including acceptance, achievement, ambition, attractiveness, etc. Whether in the rating scene or in the social scene, the influential factors come from users’ own values. Therefore, the preference vectors of users in different scenarios are not always the same. That is, social networks and ratings exist in different preference spaces. UPPS proposes the concept of a primary preference feature space, and treats the feature vectors of users in different scenarios as projections of their primary preference vectors.
4. User Trust Quantification Based on Random Walk with Restart (TQ_RWR)
The random walk method was originally used to calculate the quality ranking of web pages [28]. In this paper, user nodes are treated as analogs of web pages, and a trust relationship between users is analogous to a jump between web pages. First, a binary adjacency matrix of the user trust network is constructed, where an entry of 1 in row i and column k indicates that user i trusts user k, and an entry of 0 indicates that user i does not trust user k. Because trust relationships are asymmetric, the adjacency matrix is asymmetric. To represent the jump probability between user nodes, the adjacency matrix is transposed and normalized by column to obtain a state transition matrix, in which each element represents the probability of one user trusting another. After a personalized random walk, we obtain a credibility vector. A high credibility implies that a user is reliable, with others placing a high level of trust in him or her.
We establish a credibility vector that gives the credibility rank, that is, the stationary visiting probability, of each user after a given number of random walk steps. The vector is defined as the solution of Equation (1), the plain random walk iteration in which the credibility vector at the next step equals the state transition matrix multiplied by the credibility vector at the current step. The initial credibility vector is uniform, which indicates that each user is selected with equal probability when the random walk starts.
In the user trust network, there are many user nodes similar to the two special nodes shown in Figure 3. A node with only in-degree and no out-degree is called a termination node, and a node with a self-loop is called a trap [29]. In the state transition matrix, all elements of the column corresponding to a termination node are 0, so after multiple random walks all elements of the credibility vector become 0. If there is a trap in the trust network, the main diagonal of the state transition matrix has at least one element equal to 1, with one such element for each trap in the network. Once the walk reaches a trap, it never jumps to another node; the element of the credibility vector representing the trap becomes 1, and the other elements become 0. Whether there is a termination node or a trap, we cannot use Equation (1) to obtain an accurate credibility vector, and hence we cannot quantify the trust between users.
To solve the problem of termination nodes and traps in the trust network, we use random walk with restart (RWR) to calculate the credibility vector. When the current state of the random walk is a termination node or a trap, the walk jumps back to the initial node with a certain probability to restart. The process of RWR is expressed by Equation (2), which combines two terms: with a fixed continuation probability the walk proceeds from the current user node, and with the complementary probability it jumps from the current node to the initial user node to restart the random walk. The continuation probability is inversely related to the convergence speed of the iteration in Equation (2). If it is too large, convergence is slow and the performance of the method suffers; if it is too small, the result does not reflect the effect of the walk [30]. We keep this probability fixed.
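The restart iteration can be sketched as a simple power iteration. The sketch below is illustrative only: the function name `rwr_credibility`, the symbol `alpha` for the continuation probability, its default value of 0.85 (the conventional PageRank damping factor, not a value taken from this paper), the stopping tolerance and the toy transition matrix are all assumptions.

```python
import numpy as np

def rwr_credibility(P, alpha=0.85, tol=1e-10, max_iter=1000):
    """Random walk with restart on a column-normalized transition matrix P.

    alpha is the (assumed) probability of continuing the walk; with
    probability 1 - alpha the walk restarts from the uniform initial vector.
    """
    P = np.asarray(P, dtype=float)
    n = P.shape[0]
    c0 = np.full(n, 1.0 / n)          # every user is equally likely at the start
    c = c0.copy()
    for _ in range(max_iter):
        c_next = alpha * (P @ c) + (1.0 - alpha) * c0
        if np.abs(c_next - c).sum() < tol:   # L1 change below tolerance
            return c_next
        c = c_next
    return c

# Toy transition matrix: user 0 trusts users 1 and 2, user 1 trusts user 2,
# and user 2 is a termination node (all-zero column).
P = np.array([[0.0, 0.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.5, 1.0, 0.0]])
c = rwr_credibility(P)
print(c)
```

Because of the restart term, the credibility vector does not collapse to zero even though user 2 has no outgoing trust links, and user 2, who is trusted by both other users, receives the largest entry.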
After a certain number of iterations, the credibility vector converges [31]. The larger a user’s entry in the credibility vector, the higher the credibility of that user and the higher the trust that other users place in that user. The quantified trust of user i in user k is given by Equation (3), which combines the initial trust of user i in user k with the credibilities of the two users taken from the converged credibility vector. The resulting values form the quantified adjacency matrix.
As shown in Figure 4, Figure 4a is the initial trust network, shown together with its adjacency matrix. However, Figure 4a only shows whether there is a trust relationship between users and cannot express the degree of trust. TQ_RWR calculates the user credibility ranking vector by random walk with restart and reconstructs the trust network as the network in Figure 4b, shown together with its quantified adjacency matrix. Figure 4b not only shows whether there is a trust relationship between users, but also quantifies the trust. Because C is a central user node, the other users’ trust in C is the highest.
This section uses TQ_RWR to explore the credibility of users implied in the trust network, uses Equation (3) to measure the trust between users, and then reconstructs the user trust network according to the new trust values. Section 6.3.1 shows that social recommendation using the reconstructed trust network effectively improves the recommendation results. Thus, the social recommendation in the subsequent sections uses the reconstructed trust network.
5. Social Recommendation Model Based on User Primary Preference Space (UPPS)
This section proposes a social recommendation model based on the primary preference space, implemented with probabilistic matrix factorization. In Section 5.1, we first introduce a classic model-based social recommendation method named SoRec [23], which we then improve to obtain the UPPS model. In Section 5.2 and Section 5.3, we introduce the modeling process and the parameter optimization process of UPPS. Finally, we analyze the time complexity of UPPS and SoRec.
5.1. The SoRec Model
SoRec, proposed by Ma et al. [23], is one of the classic social recommendation methods and is used as a baseline in our experiments. SoRec assumes that the rating system shares the same preference space with the social network and models both with probabilistic matrix factorization. SoRec uses three matrices to represent the user feature matrix, the item feature matrix and the trust feature matrix, respectively. The column vectors of the user feature matrix and the trust feature matrix represent user-specific and factor-specific latent feature vectors, respectively. Figure 5 shows the probabilistic graphical model of SoRec.
SoRec defines the conditional distributions over the observed rating matrix and the observed social network relationships as products of Gaussian densities, where each density is the probability density function of the Gaussian distribution with a given mean and variance. Two indicator functions mark which ratings and which trust relationships are observed. The logistic function bounds the inner products of the feature vectors within the range [0, 1]. The feature vectors involved are the preference feature vectors of user i, item j and trustee user k, respectively.
They also place zero-mean spherical Gaussian priors on the user, item and factor feature vectors.
According to Bayesian inference, the posterior distribution of the parameters is proportional to the product of the prior distribution of the parameters and the likelihood function of the data, which gives the posterior probability of the feature matrices in Equation (9).
Using stochastic gradient descent, they solve for the user, item and trust feature matrices that maximize the posterior probability in Equation (9), and finally predict a user’s score for an item from the learned feature vectors of that user and that item.
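For reference, the SoRec model as formulated by Ma et al. [23] can be written as follows. The symbols follow the original SoRec formulation and are an assumption with respect to this paper, whose own symbols and equation numbers may differ: $R$ is the $m \times n$ rating matrix, $C$ is the $m \times m$ trust matrix, $U$, $V$ and $Z$ are the user, item and factor feature matrices, $g(x) = 1/(1+e^{-x})$ is the logistic function, and $I^{R}_{ij}$ and $I^{C}_{ik}$ are indicators that equal 1 when the corresponding entry is observed.

```latex
\begin{align}
p(C \mid U, Z, \sigma_C^2) &= \prod_{i=1}^{m}\prod_{k=1}^{m}
  \left[\mathcal{N}\!\left(c_{ik} \mid g(U_i^{\top} Z_k),\, \sigma_C^2\right)\right]^{I^{C}_{ik}} \\
p(R \mid U, V, \sigma_R^2) &= \prod_{i=1}^{m}\prod_{j=1}^{n}
  \left[\mathcal{N}\!\left(r_{ij} \mid g(U_i^{\top} V_j),\, \sigma_R^2\right)\right]^{I^{R}_{ij}} \\
p(U \mid \sigma_U^2) &= \prod_{i=1}^{m} \mathcal{N}\!\left(U_i \mid 0,\, \sigma_U^2 \mathbf{I}\right), \quad
p(V \mid \sigma_V^2) = \prod_{j=1}^{n} \mathcal{N}\!\left(V_j \mid 0,\, \sigma_V^2 \mathbf{I}\right), \quad
p(Z \mid \sigma_Z^2) = \prod_{k=1}^{m} \mathcal{N}\!\left(Z_k \mid 0,\, \sigma_Z^2 \mathbf{I}\right) \\
p(U, V, Z \mid C, R, \sigma_C^2, \sigma_R^2, \sigma_U^2, \sigma_V^2, \sigma_Z^2)
  &\propto p(R \mid U, V, \sigma_R^2)\, p(C \mid U, Z, \sigma_C^2)\,
           p(U \mid \sigma_U^2)\, p(V \mid \sigma_V^2)\, p(Z \mid \sigma_Z^2)
\end{align}
```

The user feature matrix $U$ appears in both likelihood terms, which is how SoRec ties the rating data and the trust data to a single shared preference space.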
5.2. The UPPS Model
In UPPS, we have a rating matrix, a trust matrix, a user primary preference feature matrix, an item feature matrix and three space projection matrices that map user primary preference features into the rating space, the truster space and the trustee space, respectively. Three further matrices denote the user feature matrices of the rating space, the truster space and the trustee space, respectively. Figure 6 gives an overview of the UPPS model. The conditional probability distribution of the rating matrix is defined in terms of the space projection matrix that maps the primary preference space into the rating space, the primary feature vector of user i and the feature vector of item j. The priors of the user primary preference feature matrix and the item feature matrix are modeled as zero-mean spherical Gaussian distributions.
The preference feature vectors of users in the rating scene and the social scene are regarded as projections of the preference features in the primary space, obtained by multiplying with the corresponding space projection matrices. Each user i has a primary vector in the primary preference space, and multiplying it by the three projection matrices yields the user’s preference vectors in the rating space, the truster space and the trustee space, respectively. We model the rating data and trust data with matrix factorization. Then, we learn the user primary feature vectors and the space projection matrices. Finally, we use the user primary feature vector, the rating space projection matrix and the item feature vector to predict the missing rating.
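The projection step can be illustrated with a short Python sketch. Everything named here is hypothetical: `W_rating` stands for the rating space projection matrix, `u_i` for a user’s primary preference vector and `v_j` for an item feature vector, and passing the inner product through the logistic function is an assumption carried over from SoRec rather than a formula confirmed by the text.

```python
import numpy as np

def sigmoid(x):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + np.exp(-x))

def predict_rating(u_i, W_rating, v_j):
    """Project a primary preference vector into the rating space and score an item.

    Assumed form: sigmoid((W_rating @ u_i) . v_j). The score lies in (0, 1)
    and would still need to be mapped back to the original rating scale.
    """
    rating_vec = W_rating @ u_i            # user vector in the rating space
    return float(sigmoid(rating_vec @ v_j))

u_i = np.array([1.0, 0.0])                 # toy primary preference vector
W_rating = np.array([[0.0, 1.0],
                     [1.0, 0.0]])          # toy projection that swaps the two features
v_j = np.array([0.0, 2.0])                 # toy item feature vector
score = predict_rating(u_i, W_rating, v_j)
print(score)
```

The truster and trustee space vectors would be obtained in the same way, with their own projection matrices applied to the same primary vector.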
The space projection matrices are also given Gaussian priors, and the three projection matrices share the same Gaussian distribution with the same mean and variance. The conditional probability distribution of the trust matrix is defined on the trust values obtained after quantification by our method TQ_RWR.
According to Bayesian inference, the posterior probability distribution of the model parameters follows from these priors and conditional distributions. Figure 7 shows the functional dependency between variables and parameters. This completes the modeling process of UPPS. Next, we need to optimize the parameters. By learning the model parameters from the training data, we obtain the user primary preference feature matrix, the item feature matrix V and the space projection matrices. Finally, we can calculate the predicted score of user i for item j from these learned parameters.
5.3. The Optimization of the UPPS Model
Because the conditional distributions of the rating matrix and the trust matrix and the prior distributions of the feature matrices are Gaussian, we take the logarithm of the posterior probability, which makes it convenient to calculate the gradients needed to maximize the posterior. In the log-posterior, C is a constant that does not depend on the parameters. Maximizing the log-posterior with fixed hyper-parameters is equivalent to minimizing an objective function made up of squared prediction errors and quadratic regularization terms, whose weights are defined by the hyper-parameters. We use gradient descent to train the proposed model and minimize the objective function, using the gradients of the objective function with respect to each of the parameters.
The quantified trust lies in the interval (0, 1]. It is worth noting that only existing trust relationships are quantified, not pairs of users without trust. To train the parameters of the UPPS model conveniently, we use a function to map the rating data to the interval (0, 1]. After learning the parameters, we apply the corresponding function to predict the rating of a user for an item.
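One common way to map positive ratings into (0, 1] is to divide by the maximum rating, and to multiply by it again when converting a predicted score back to the rating scale. The sketch below assumes this particular mapping; the function names and the parameter `r_max` are illustrative, and the mapping actually used in the paper may differ.

```python
def to_unit_interval(rating, r_max):
    """Map a rating in (0, r_max] to (0, 1] (assumed mapping: rating / r_max)."""
    return rating / r_max

def from_unit_interval(score, r_max):
    """Map a predicted score in (0, 1] back to the original rating scale."""
    return score * r_max

r_max = 5.0
scaled = to_unit_interval(4.0, r_max)
restored = from_unit_interval(scaled, r_max)
print(scaled, restored)
```

With this choice the maximum rating maps exactly to 1 and no positive rating maps to 0, which matches the half-open interval (0, 1].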
5.4. Time Complexity Analysis
The time complexity of our model depends on the number of iterations, the dimensionality of the preference feature vectors, and the numbers of ratings and trust links. The main cost lies in computing the gradients of the objective function with respect to the user primary preference feature matrix, the item feature matrix and the space projection matrices. Therefore, the overall time complexity, like that of SoRec, scales linearly with the numbers of observed ratings and trust links.
6. Experiments and Validations
In Section 6.1, we introduce the datasets used in the experiments, and in Section 6.2, we introduce the evaluation metrics and baseline methods. Finally, in Section 6.3, the effectiveness of TQ_RWR and TPSR is verified.
6.1. Datasets
To avoid chance results and bias in the experiments, we selected four independent public datasets related to social recommendation: Epinions [24], FilmTrust [32], Douban [33] and Ciao [34]. These four datasets contain both rating data and social trust data. The trust networks of Epinions and FilmTrust are directed, while the trust network of Douban is undirected, because new friend requests on this website must be verified and approved by both parties. The statistics of these four datasets are shown in Table 2. For all of the datasets, a randomly selected 80% of the rating data is kept for training, and the rest is used for testing. The parameters of the baseline methods are determined by their performance on the validation set. The experiments are then conducted with five-fold cross validation 10 times, and the average performance is reported.
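A random 80/20 split of the rating records can be sketched as follows. The function name `split_ratings`, the `(user, item, rating)` tuple layout and the fixed seed are illustrative assumptions, and the sketch does not reproduce the repeated five-fold protocol.

```python
import random

def split_ratings(ratings, train_fraction=0.8, seed=0):
    """Shuffle rating records and split them into training and testing sets."""
    rng = random.Random(seed)          # local generator, so the split is repeatable
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]

# Toy data: ten (user, item, rating) records.
ratings = [(u, i, 3) for u in range(2) for i in range(5)]
train, test = split_ratings(ratings)
print(len(train), len(test))
```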
6.2. Evaluation Metrics and Comparison Methods
We adopt four representative metrics as evaluation criteria for recommendation performance: root mean square error (RMSE), precision, recall and F1 value. The RMSE is computed over the set of ratings in the testing set and is normalized by the size of that set. The Precision, Recall and F1 value are computed for each user from the set of items that the user likes and the set of items in the recommendation list that the user likes.
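The four metrics can be sketched in Python using their standard definitions. The function names and the input layouts are illustrative assumptions; in particular, the sketch scores a single user’s recommendation list and does not show how per-user values are aggregated in the paper.

```python
import math

def rmse(pairs):
    """Root mean square error over (true_rating, predicted_rating) pairs."""
    pairs = list(pairs)
    return math.sqrt(sum((r - p) ** 2 for r, p in pairs) / len(pairs))

def precision_recall_f1(recommended, liked):
    """Precision, recall and F1 for one user's recommendation list."""
    recommended, liked = set(recommended), set(liked)
    hits = len(recommended & liked)    # recommended items that the user likes
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(liked) if liked else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1

error = rmse([(3, 1), (1, 3)])
p, r, f1 = precision_recall_f1(["a", "b", "c", "d"], ["a", "c", "e"])
print(error, p, r, f1)
```

In the toy example two of the four recommended items are liked, and two of the three liked items are recommended, so precision is 1/2 and recall is 2/3.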
To comprehensively evaluate the recommendation performance of the TPSR framework proposed in this paper, we selected the following eight state-of-the-art recommendation methods as baselines:
- (a) Unifying user-based and item-based collaborative filtering approaches by similarity fusion (CF) [35], in which the ratings of neighboring users are used to predict the ratings of the target users.
- (b) Probabilistic matrix factorization (PMF) [8], in which users and items are mapped to a low-dimensional vector space using Bayesian probabilistic matrix factorization.
- (c) Social recommendation using probabilistic matrix factorization (SoRec) [23], in which Bayesian probabilistic matrix factorization is used to establish the relationship between user preferences and trusted friends.
- (d) Social collaborative filtering by trust (TrustPMF) [7], in which both trusting and being trusted are considered.
- (e) User rating prediction based on trust-driven probabilistic matrix factorization (TPMF) [36], in which trusting users indirectly affect user preferences and directly affect user ratings.
- (f) Social recommendation with an essential preference space (SREPS) [10], in which network embedding rather than PMF is used.
- (g) Dual graph attention networks for a deep latent representation of the multifaceted social effects in recommendation systems (DANSER) [26], in which dual graph attention networks are used and a new policy-based fusion strategy based on a contextual multi-armed bandit is proposed to weigh the interactions of various social effects.
- (h) Feature evolution-based multi-task learning for collaborative filtering with social trust (TrustEV) [27], in which a special feature evolution unit enables the embedding vectors for two tasks to exchange their features in a probabilistic manner, and a meta-controller globally explores proper settings for the feature evolution units.
The CF and PMF methods use only rating data for recommendation, while SoRec, TrustPMF, TPMF, SREPS, DANSER, TrustEV and our TPSR use both trust data and rating data. CF is based on the nearest neighbor method; SoRec, TrustPMF, PMF, TPMF and our TPSR are based on matrix factorization; and SREPS, DANSER and TrustEV are based on network embedding or neural network methods.
6.3. Experimental Results
6.3.1. Verification Experiment for Quantitative Trust
We test whether our TQ_RWR method can improve the recommendation results. In this section, we use the social recommendation method SoRec as the baseline method. The initial trust network and the quantified trust network are used for comparison experiments, and we use RMSE as the metric of these experiments.
The results of the comparative experiments on the four datasets are presented in Table 3. The SoRec column indicates recommendation without using TQ_RWR to quantify user trust, while the SoRec+TQ_RWR column indicates recommendation using TQ_RWR to quantify user trust.
The FilmTrust dataset has the highest RMSE reduction rate. Comparing the densities of the rating data of the four datasets presented in Table 2 indicates that the lower the density of the dataset, the more effective the user trust quantified by TQ_RWR is. The experimental results show that the user trust quantification method TQ_RWR can effectively improve the utilization value of the trust network. Moreover, using a quantified trust network for social recommendation can effectively improve the recommendation performance.
6.3.2. Experimental Results of the TPSR Framework
We verify whether TPSR can improve the recommendation results. As mentioned in Section 1, data sparsity and cold-start are major challenges faced by collaborative filtering methods. Hence, to verify the effect of TPSR in relieving these two problems, we conducted two experiments: the first is a global experiment on all users of the four datasets, and the second is a set of experiments on users with different sparsity levels. Users are grouped according to the number of items each user has rated, and each group of data is then used in a separate experiment.
- (1) Global experiments on all users.
This experiment verifies the recommendation performance of the TPSR framework on all users’ data. There are two goals in the recommendation area: obtaining the predicted ratings of items and recommending an item list. Hence, in this part, we also conduct two experiments: (a) an experiment to verify rating prediction accuracy, using RMSE as the indicator of accuracy, and (b) an experiment to measure the quality of the item recommendation list, using precision, recall and F1 value as indicators of the quality of the list. The TPSR framework is compared with eight other methods, and the results are shown in Table 4 and Table 5.
As shown in Table 4, compared with the traditional collaborative filtering methods CF and PMF (which use only interaction information, not trust information), the RMSE of the other seven social recommendation algorithms is reduced, which indicates that using trust information for social recommendation can effectively reduce the error of rating prediction. Among the seven social recommendation methods using trust data, the proposed TPSR framework has the lowest RMSE on the FilmTrust, Douban and Ciao datasets. Only on the Epinions dataset is the RMSE of TPSR slightly higher than that of the SREPS, DANSER and TrustEV algorithms. Although DANSER achieves the best result on the Epinions dataset, its result on the small FilmTrust dataset is not ideal, with an RMSE far higher than that of TPSR. When we ran DANSER on the Douban and Ciao datasets, we found its efficiency to be very low: we estimate that training the model may take several months, whereas TPSR takes approximately twenty minutes. Therefore, the results of these two sets of experiments are not shown in Table 4. As shown in Table 2, the density of rating data in the Epinions dataset is much lower than that of the other three datasets, and graph-embedding and neural network methods have inherent advantages for sparse data mining at the cost of time complexity. In short, in terms of prediction accuracy and time complexity, the proposed TPSR framework has certain advantages over existing recommendation algorithms.
In the recommendation list experiment, items with ratings greater than 4 are treated as items that the user truly likes. As shown in
Table 5, TPSR achieves the highest values among the compared methods in terms of
precision,
recall and
F1 value. The results show that TPSR can effectively improve the quality of the recommendation list.
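The list-quality indicators (precision, recall and F1 value) can be sketched for a single user as follows. Treating items with a true rating above 4 as relevant follows the text; forming the list from the top-`k` items by predicted rating, the cutoff `k`, and the function name `list_metrics` are assumptions of this sketch, since the list construction is not specified here.

```python
def list_metrics(true_ratings, predicted_ratings, k=2, threshold=4.0):
    """Precision, recall and F1 of a top-k recommendation list for one user.

    true_ratings / predicted_ratings: dicts mapping item id -> rating.
    Items whose true rating exceeds `threshold` count as relevant (from the text).
    Ranking by predicted rating and the cutoff `k` are assumptions of this sketch.
    """
    relevant = {item for item, r in true_ratings.items() if r > threshold}
    ranked = sorted(predicted_ratings, key=predicted_ratings.get, reverse=True)
    recommended = set(ranked[:k])
    hits = len(relevant & recommended)
    precision = hits / len(recommended) if recommended else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    f1 = 2 * precision * recall / (precision + recall) if hits else 0.0
    return precision, recall, f1


# Illustrative ratings only (not data from the paper).
truth = {"a": 5.0, "b": 3.0, "c": 4.5, "d": 5.0}
preds = {"a": 4.8, "b": 4.6, "c": 3.9, "d": 4.1}
precision, recall, f1 = list_metrics(truth, preds, k=2)
```

In this toy case the list contains one relevant item out of two recommended and three relevant overall, so precision exceeds recall; dataset-level values would typically be obtained by averaging over users.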
In summary, the experiments on the two main recommendation tasks, rating prediction and list recommendation, indicate that the primary preference space model proposed in TPSR reduces the rating prediction error and improves the quality of the recommendation list.
- (2)
Experiments on users with different sparsity levels.
The experimental results on user groups with different sparsity levels reflect how well each method mitigates the sparsity and cold-start problems mentioned in
Section 1. To verify the prediction ability of different models for users with different sparsity levels, we first divide the users of each of the four datasets into seven groups according to the number of ratings per user, so as to observe the degree of cold-start. Instead of calculating the average prediction accuracy over all users, this part of the experiment calculates the prediction accuracy of each user group, measuring the
RMSE on users with different cold-start levels. The fewer ratings a user has, the more severe the cold-start and sparsity problems.
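The grouping procedure described above can be sketched as follows. The text states only that users are split into seven groups by their number of ratings; the bin edges in `BIN_EDGES`, the function name `group_rmse`, and the input layout are hypothetical choices made for illustration.

```python
import math
from bisect import bisect_left
from collections import defaultdict

# Hypothetical upper bounds of the first six groups; the seventh group is "more than 200".
BIN_EDGES = [5, 10, 20, 50, 100, 200]


def group_rmse(rating_counts, test_records, edges=BIN_EDGES):
    """Compute one RMSE per user group instead of one RMSE over all users.

    rating_counts: dict mapping user -> number of ratings given by that user.
    test_records: iterable of (user, true_rating, predicted_rating) tuples.
    Returns a dict mapping group index -> RMSE; group 0 holds the sparsest users.
    """
    squared = defaultdict(list)
    for user, truth, pred in test_records:
        # bisect_left places a count equal to an edge in the lower group (e.g. 5 -> group 0).
        group = bisect_left(edges, rating_counts.get(user, 0))
        squared[group].append((truth - pred) ** 2)
    return {g: math.sqrt(sum(v) / len(v)) for g, v in sorted(squared.items())}


# Illustrative records only (not data from the paper).
counts = {"u1": 3, "u2": 250}
records = [("u1", 4.0, 3.0), ("u1", 2.0, 2.0), ("u2", 5.0, 4.5)]
per_group = group_rmse(counts, records)
```

Reporting the error per group makes it visible whether a method helps the sparsest users, which an average over all users can hide.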
Figure 8 shows the number of users in each range of the number of single-user ratings, which indicates the degree of sparsity and cold-start of the user ratings in each dataset. In FilmTrust, Epinions and Douban, the number of users in each group varies widely. In FilmTrust, more than 90% of users have rated fewer than 50 items, so FilmTrust suffers from data sparsity and cold-start. In Epinions, approximately 50% of users have rated fewer than 5 items, indicating that Epinions has the most severe sparsity and cold-start problems. In Douban, 40% of users have rated more than 200 items, indicating that sparsity and cold-start are mild and the user rating information is relatively sufficient.
In
Figure 8, the horizontal coordinate of the line graph is the range of the number of single-user ratings, and the vertical coordinate is the RMSE of the rating predictions for the users in that range. On each dataset, users with fewer ratings have poorer prediction accuracy. The more items a user has rated, the more historical records the user has, and the less severe the cold-start problem is.
The prediction results of the different methods show that the proposed TPSR framework reduces the RMSE in each group. The RMSE of the TPSR, TPMF, TrustPMF and SoRec models in each group is lower than that of the PMF model (which uses only the interaction information, without the trust information), which indicates that combining trust information with rating information can improve the recommendation performance, and TPSR reduces the RMSE most significantly. In the groups with fewer ratings, especially the group with between 0 and 5 single-user ratings, TPSR is especially effective in reducing the RMSE. The results show that TPSR can clearly alleviate the sparsity and cold-start problems.
7. Conclusions and Future Work
Social recommendation aims to alleviate the data sparsity problem and the cold-start problem in recommendation systems by incorporating social media information. The existing social recommendation methods have two deficiencies: (i) they use a binary trust network, which cannot reflect the trust levels of different users; and (ii) they do not consider that users may have different preferences in different scenarios, such as when purchasing goods and when establishing friendships.
To address the aforementioned issues, in this paper, we propose a novel social recommendation framework, TPSR, based on quantified trust and primary preference space. TPSR consists of two modules: TQ_RWR and UPPS. TQ_RWR is a trust quantification method based on random walk with restart. The quantified trust, generated by mining the credibility hidden in the global trust network, can represent the trust levels of different users. UPPS is a social recommendation model based on primary preference space, which is implemented by the probabilistic matrix factorization method. We map users’ primary preference feature vectors to different preference spaces using a user’s primary preference space model. The mapped preference features in different spaces embody users’ different preferences in different scenarios. We demonstrate the high performance of TPSR in terms of different metrics, including the root mean square error, precision, recall and F1 value, on four public datasets. One limitation of our framework is that the interpretability of probabilistic matrix factorization is limited; in the future, we will look for a way to address this problem. In addition, we intend to introduce the primary preference space into the graph model as a direction of our future work.