An Item–Item Collaborative Filtering Recommender System Using Trust and Genre to Address the Cold-Start Problem

Hasan, Mahamudul; Roy, Falguni

doi:10.3390/bdcc3030039

Open Access Article


by Mahamudul Hasan ¹,*,† and Falguni Roy ²,†

¹ Department of Computer Science and Engineering, East West University, Dhaka 1212, Bangladesh

² Institute of Information Technology, Noakhali Science and Technology University, Sonapur 3814, Noakhali, Bangladesh

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Big Data Cogn. Comput. 2019, 3(3), 39; https://doi.org/10.3390/bdcc3030039

Submission received: 10 June 2019 / Revised: 30 June 2019 / Accepted: 2 July 2019 / Published: 8 July 2019


Abstract

Item-based collaborative filtering is one of the most popular recommender system techniques; it retrieves useful items for users by finding correlations among items. Traditional item-based collaborative filtering works well when sufficient rating data exist, but it cannot calculate similarity for new items, which is known as the cold-start problem. Because rating data are lacking, identifying the similarity among cold-start items is difficult. As a result, existing techniques fail to produce accurate recommendations for cold-start items, which also degrades the recommender system's overall performance. In this paper, two item-based similarity measures are designed to overcome this problem by incorporating items' genre data. An item may be similar to other items because they share more than one genre. Thus, the first similarity measure determines the degree of direct asymmetric correlation between items by considering their association with common genres. The similarity is determined between a pair of items in which one item could be cold-start and the other could be any highly rated item; therefore, the measure is made asymmetric by taking the items' rating data into account. The second similarity measure defines the relative interconnection between items based on transitive inference. In addition, an enhanced prediction algorithm is proposed to compute better predictions for recommendation. The proposed approach was evaluated on two popular datasets, MovieLens and MovieTweets, and it performs better than traditional techniques in a collaborative filtering recommender system.
The proposed approach improved prediction accuracy on MovieLens and MovieTweets by approximately 3.42% and 8.58% in mean absolute error, 7.25% and 3.29% in precision, 7.20% and 7.55% in recall, 8.76% and 5.15% in F-measure, and 49.3% and 16.49% in mean reciprocal rank, respectively.

Keywords:

item-based collaborative filtering; cold-start problem; genre; asymmetric correlation; transitive inference

1. Introduction

With easy access to the internet through various smart devices, the World Wide Web (WWW) has become a vast information pool. Finding the required information in this pool is now an onerous job because searching it is complex. To resolve this problem, recommender systems (RSs) have been introduced on the World Wide Web to help internet users by suggesting the information, items, or services they need [1]. Recommender systems are widely used on e-commerce and entertainment websites; Amazon, Facebook, YouTube, and IMDb are real-life applications. Currently, e-learning and e-tourism websites are also adopting recommender systems to help users obtain the required information smoothly [2,3].

A recommender system (RS) is a web personalization system that suggests a set of items or services, drawn from the vast number available on the internet, to users by predicting their preferences [1]. Preference prediction is done by building a rating prediction model from an analysis of the user's rating patterns on similar items. Depending on how these rating patterns are analyzed, RSs are classified as content-based, collaborative filtering, or hybrid recommender systems [1]; collaborative filtering is more efficient than the others and is also easy to implement for discovering complex patterns [4]. In collaborative filtering (CF), items are selected and recommended by defining similarities between items or between users' tastes. Depending on the subject of the similarity, CF is categorized as either user-based or item-based. An item-based CF technique defines the similarity between items and then recommends a new item to the user based on the characteristics of similar items [5]. User-based CF does the same by defining the similarity of users' preferences. Traditional item-based CF performs excellently on dense datasets but poorly on sparse ones, and data sparsity also leads to the cold-start problem. An RS often contains many new users or items, as well as some old users or items with very little rating information, so a traditional RS cannot make any inferences about them. Such users and items are called cold-start users and cold-start items, and their preferences cannot be predicted accurately [6].
To improve recommendation accuracy, some researchers have proposed techniques that improve the prediction quality of simple item-based RSs by using side data to address the cold-start problem [7,8,9], while others have suggested integrating trust into existing CF, where trust is defined by the degree of similarity between subjects and is known as implicit trust [10,11].

According to trust theory, four main trust properties are utilized: asymmetry, transitivity, dynamicity, and context-dependence [12]. Under the asymmetry property, the trust value between two subjects need not be the same in both directions. Under the transitivity property, if there is a trust relationship between subject ‘a’ and subject ‘b’ and between subject ‘b’ and subject ‘c’, then there could be a relative trust relationship between subject ‘a’ and subject ‘c’. Under dynamicity and context-dependence, the trust relationship between two subjects is not constant; it may change over time and across objects [12].

1.1. Motivation

Users' behavior is complicated and varies from user to user, so it is very difficult to define the similarity between users with a single correlation method. No single correlation method works well in all circumstances; different methods work well in different scenarios. Moreover, traditional correlation or similarity methods perform poorly on sparse datasets and cannot predict recommendations for cold-start users or items. Since most real-life applications deal with sparse datasets, this problem should be handled efficiently. Thus, a new, fast, and efficient item-based CF method with better prediction accuracy is proposed here to overcome the cold-start item problem.

The novelty of the proposed method is as follows. First, the direct asymmetric similarity between items is calculated from their genres, based on the asymmetric characteristic of the trust relationship. Second, the relative similarity between items is calculated based on the transitive property of the items' trust values. Defining the similarity between items by using their genres is a new approach in this domain that has not been addressed earlier. When a new item enters the system, it may belong to any genre, and even an item that has hardly been rated must belong to at least one genre. Because genre information is available in advance, it can be used to define the similarity between items, which is the key to alleviating the cold-start item problem. However, a purely genre-based similarity may treat a new item as equally similar to a highly rated item because they share many common genres. Since one item is new while the other may be highly popular and carries users' rating information, the degree of similarity between them should not be the same in both directions. By taking the corresponding item's rating data into account, an asymmetric similarity relationship is incorporated into the genre-based item similarity. Furthermore, an enhanced prediction method is proposed to increase the accuracy of the recommender system.

1.2. Paper Contribution

The key purpose of this paper is to design a new item-based collaborative filtering model that calculates the similarity of items based on their genres and improves the prediction accuracy of recommendations. The contributions of this paper are fourfold:

An asymmetric similarity measure is proposed that combines items' genres with the reliability of their association, defining the direct asymmetric similarity between items.
A second similarity measure is defined that identifies correlations between items based on the transitive relations of reliability between them.
A prediction algorithm is proposed to increase recommendation accuracy.
A detailed experimental study shows that the proposed methodology outperforms existing methods.

The paper is organized as follows: Section 2 gives an overview of related work, explains the issues with previous work, and states the aim of the paper. Section 3 describes the proposed approach for addressing the cold-start problem. Section 4 presents the datasets on which the proposed approach is tested and the metrics used to validate its performance. Section 5 reports the experimental results and evaluation, and, finally, Section 6 concludes the work with future directions.

2. Related Works

Recommender systems (RSs) have become one of the most popular and interesting research areas, spanning a broad range of computer science. Because RS is a multifaceted field, it is studied from the perspectives of statistics [13], computational trust with or without social network association [14,15,16,17], machine learning [18], agent-based artificial networks [19], human–computer interaction [20], and more. At its core, an RS uses a filtering algorithm to select important and relevant information. The first filtering system, Tapestry, introduced in the early 1990s, allowed its users to add explicit feedback by annotating e-mail messages so that other users could find them by filtering with queries that match the annotations [21]. Many later improvements to the filtering approach made RSs more efficient and accurate. RSs can be classified in many ways according to their filtering approach, but collaborative filtering (CF) is the most efficient approach today.

Defining the similarity between subjects is the core of the collaborative filtering (CF) approach, and the performance of a CF-based RS depends closely on it. Based on how similarity is defined, CF is classified as model-based or memory-based. In model-based CF, a model for calculating similarity is constructed with machine learning methods such as a genetic algorithm [22], a Bayesian network [23], a neural network [24], or rule-based approaches [25], whereas memory-based CF uses well-known similarity measures such as the Pearson correlation coefficient (PCC), constrained Pearson correlation coefficient (CPCC), weighted Pearson correlation coefficient (WPCC), Jaccard similarity, mean squared distance (MSD), Jaccard similarity with mean squared distance (JMSD), and cosine (COS) similarity [26]. Memory-based CF is further divided into user-based and item-based CF, depending on whether the correlated subjects are users or items [12]. Item-based CF performs better than user-based CF when the RS contains a large number of items with few ratings [27].
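As a point of reference for the memory-based measures listed above (several of which serve as benchmarks in Section 5), the following Python sketch computes four of them between two items' ratings. It is a minimal illustration, not the implementation used in the paper. Assumptions made here: each item's ratings are stored as a `{user_id: rating}` dictionary, cosine and Pearson are computed over co-rating users only, and MSD is turned into a similarity as `1 - mean(((r_a - r_b) / (r_max - r_min))**2)`, which is one common variant among several.

```python
import math
from typing import Dict, List, Tuple

Ratings = Dict[int, float]  # user id -> rating given to one item


def co_rated(a: Ratings, b: Ratings) -> Tuple[List[float], List[float]]:
    """Ratings of users who rated both items, aligned by user id."""
    users = sorted(a.keys() & b.keys())
    return [a[u] for u in users], [b[u] for u in users]


def cosine(a: Ratings, b: Ratings) -> float:
    xs, ys = co_rated(a, b)
    num = sum(x * y for x, y in zip(xs, ys))
    den = math.sqrt(sum(x * x for x in xs)) * math.sqrt(sum(y * y for y in ys))
    return num / den if den else 0.0


def pearson(a: Ratings, b: Ratings) -> float:
    xs, ys = co_rated(a, b)
    if len(xs) < 2:
        return 0.0  # correlation is undefined for fewer than two co-ratings
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    num = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    den = math.sqrt(sum((x - mx) ** 2 for x in xs)) * math.sqrt(sum((y - my) ** 2 for y in ys))
    return num / den if den else 0.0


def jaccard(a: Ratings, b: Ratings) -> float:
    """Share of users who rated both items among users who rated either."""
    union = a.keys() | b.keys()
    return len(a.keys() & b.keys()) / len(union) if union else 0.0


def msd(a: Ratings, b: Ratings, r_min: float = 1.0, r_max: float = 5.0) -> float:
    """Similarity derived from the mean squared rating difference (one variant)."""
    xs, ys = co_rated(a, b)
    if not xs:
        return 0.0
    span = r_max - r_min
    return 1.0 - sum(((x - y) / span) ** 2 for x, y in zip(xs, ys)) / len(xs)


item_a = {1: 5, 2: 3, 3: 4}
item_b = {1: 4, 2: 2, 4: 5}
print(cosine(item_a, item_b), pearson(item_a, item_b), jaccard(item_a, item_b), msd(item_a, item_b))
```

Only users 1 and 2 rated both items, so Pearson is 1 (their deviations move together), Jaccard is 2/4 = 0.5, and MSD is 1 - (1/4)^2 = 0.9375. None of these measures has anything to work with for an item that has no ratings, which is the gap the genre-based measure in Section 3 targets.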

Item-based CF first appeared in Amazon's item recommendations and was afterward adopted by other service websites such as YouTube [28,29]. In item-based CF, the similarity of each pair of items is defined from the ratings given by users, and the system then suggests to a target user a new set of items that the user has not yet rated but that are correlated with the items the user has rated [28]. Similarity measures such as cosine-based similarity, Pearson correlation-based similarity, and adjusted cosine similarity are used to determine the degree of correlation between items [27]. Many researchers have proposed approaches to build better versions of item-based CF. Li et al. [30] proposed a privacy-preserving item-based CF that protects the privacy of users' ratings from others. They introduced an unsynchronized protocol called UnsyncSum to achieve secure multi-party computation, and then modified two popular similarity computation methods, without affecting the RS's performance, into PrivateCosine and a private Pearson correlation method that protect users' privacy. Dakhel et al. [31] defined a new method, named Item Asymmetric Correlation (IAC), that computes the degree of correlation between items by modifying the traditional cosine similarity; the asymmetric correlation is then used as additional information in matrix factorization combined with item-based CF. However, several problems remain, chief among them sparsity, scalability, and cold-start [6,27].

Trust was later introduced into RSs as a solution to the data sparsity and cold-start problems. Depending on how trust is defined, it is classified as explicit or implicit. In the explicit trust approach, users express trust in binary form, whereas implicit trust is calculated from the degree of similarity between subjects [12]. Implicit trust-based similarity between subjects is determined by applying weighted similarity measures [4,12,32,33], using a probabilistic technique [34], incorporating a social trust network [16,35], or integrating fuzzy logic [36,37] into the recommender system.

Other researchers have also studied the cold-start problem and proposed solutions. Ebesu et al. [38] used a deep neural network and implicit user feedback to infer semantic representations of items to solve the cold-start problem. Barjasteh et al. [39] proposed a matrix factorization approach that explicitly uses correlated item data together with existing rating information, applied as side data for cold-start items. In addition, Blerina et al. [40] introduced a classification algorithm combined with a traditional correlation metric to address the cold-start problem, but this is not sufficient.

In this paper, we aim to mitigate the cold-start item problem by defining the degree of direct and relative interconnectivity between items with the help of the items' genres and the trust value of the items' correlations. This is why the proposed system performs better even on sparse datasets.

3. Proposed Method

In this section, an item-based correlation metric is proposed on the basis of the items' genres, followed by a prediction algorithm that increases prediction accuracy. Existing CF-based methods cannot recommend items to users reliably when the dataset is sparse, and they often produce ambiguous results, especially for cold-start items. To resolve this problem, we propose defining the correlation between items from their genre data, which allows correlations to be determined for cold-start items even in a sparse environment.

3.1. Genre Based Correlation Determination

The output of this section is an item-based correlation metric calculated from the items' genres. Items are first classified according to their genres, and the similarity between items is then calculated from this genre classification. When a new item enters the system, it has no ratings, but it must belong to some genre. Likewise, if an item in the system has only a small number of user ratings, traditional item-based CF fails to predict recommendations for it. Thus, based on the items' membership in genres, we construct Table 1, which shows a sample of items with their genres. Here, items are movies, and “1” indicates that the movie belongs to that specific genre.

Table 1 shows that an item may belong to more than one genre. An item's membership in a particular genre indicates its affiliation with that genre. For example, the movie “Wrong Turn” in Table 1 belongs only to the “Horror” genre, so, under our assumption, the entire movie is affiliated with horror. The movie “Stardust” can be classified into the “Romantic” and “Adventure” genres based on its plot; again under our assumption, 50% of the movie is affiliated with romance and 50% with adventure. Similarly, from the plot of “Wolf Girl” we conclude that 33% of the movie is affiliated with romance, 33% with horror, and 33% with adventure. Based on this classification, a bipartite graph is built, shown in Figure 1, in which each edge denotes the degree of affiliation of a movie with a specific genre.
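The equal-split affiliation rule described above can be written down directly. The sketch below is illustrative only: the function name `genre_affiliation` and the list-of-tuples representation of the bipartite graph's weighted edges are assumptions made here, and the genre lists are the three examples from the text.

```python
from typing import Dict, Iterable, List, Tuple


def genre_affiliation(genres: Iterable[str]) -> Dict[str, float]:
    """Split an item's membership equally across its k genres (weight 1/k each)."""
    unique = sorted(set(genres))
    if not unique:
        return {}
    weight = 1.0 / len(unique)
    return {g: weight for g in unique}


movies = {
    "Wrong Turn": ["Horror"],
    "Stardust": ["Romantic", "Adventure"],
    "Wolf Girl": ["Romantic", "Horror", "Adventure"],
}

# Weighted edges (movie, genre, degree of affiliation) of the bipartite graph.
edges: List[Tuple[str, str, float]] = [
    (movie, genre, w)
    for movie, genres in movies.items()
    for genre, w in genre_affiliation(genres).items()
]
for edge in edges:
    print(edge)
```

The weights on each movie's edges sum to 1, matching the 100%, 50/50, and 33/33/33 splits in the text.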

In Figure 1, $I_{x}$ denotes the items and $C_{x}$ denotes the genres. Based on the bipartite graph, the degree of correlation between different items (movies) can be calculated from their genres. Two movies can be considered highly correlated in terms of genre when both follow the features of genre one ($C_{1}$) and genre two ($C_{2}$) concurrently. The correlations computed on this basis are shown in Table 2.

Two movies are therefore more likely to be highly correlated when they share more than one genre. For this reason, a sigmoidal activation function, whose shape is shown in Figure 2, is used to boost the correlation factor of the items. The correlation between items, denoted $\mathrm{Sim}(I_{x}, I_{y})$, is determined by Equation (1):

\mathrm{Sim}(I_{x}, I_{y}) = \frac{\sum_{a = 1}^{N} (I_{x, a} \times I_{y, a})}{N} \times \frac{1}{1 + e^{- N}} .

(1)

Here, $I_{x,a}$ and $I_{y,a}$ denote the probability (degree of affiliation) that items $I_{x}$ and $I_{y}$, respectively, belong to genre $a$, and $N$ is the number of genres common to $I_{x}$ and $I_{y}$.
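Equation (1) translates into a few lines of code. In this sketch (an illustration, not the authors' code), each item is a hypothetical `{genre: affiliation}` dictionary such as the equal-split weights of Section 3.1, the sum runs over the N shared genres, and the case N = 0, which Equation (1) leaves undefined, is assumed to give zero similarity.

```python
import math
from typing import Dict


def genre_sim(ix: Dict[str, float], iy: Dict[str, float]) -> float:
    """Sim(I_x, I_y) from Equation (1): mean genre overlap times sigmoid(N)."""
    common = ix.keys() & iy.keys()
    n = len(common)
    if n == 0:
        return 0.0  # assumption: no shared genre means no genre-based similarity
    overlap = sum(ix[g] * iy[g] for g in common)
    return (overlap / n) * (1.0 / (1.0 + math.exp(-n)))


wrong_turn = {"Horror": 1.0}
stardust = {"Romantic": 0.5, "Adventure": 0.5}
wolf_girl = {"Romantic": 1 / 3, "Horror": 1 / 3, "Adventure": 1 / 3}

print(round(genre_sim(stardust, wolf_girl), 4))    # two shared genres
print(round(genre_sim(wrong_turn, wolf_girl), 4))  # one shared genre
print(genre_sim(wrong_turn, stardust))             # no shared genre
```

For these three movies the printed values are about 0.1468, 0.2437, and 0.0. The function is symmetric in its arguments; the asymmetry of the proposed measure comes from the confidence factor defined next.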

3.2. Confidence with Laplace Correction

Confidence expresses the trustworthiness of the coupling between two items based on the number of users who rated them, and it changes when the number of co-ratings changes [32]; it is therefore not constant. The confidence of items is calculated with Equation (2):

\mathrm{Confidence}(I_{x}, I_{y}) = \frac{|I_{x} \cap I_{y}| + 1}{|I_{y}| + 1} .

(2)

Here, $\mathrm{Confidence}(I_{x}, I_{y})$ denotes the confidence value of the target item $I_{x}$ on the recommender item $I_{y}$. $|I_{x} \cap I_{y}|$ is the number of common ratings between items $x$ and $y$ (the number of users who rated both), and $|I_{y}|$ is the total number of ratings given to item $y$ by the system's users. Because the total number of ratings varies from item to item, the confidence value between a pair of items is not symmetric: $\mathrm{Confidence}(I_{x}, I_{y})$ and $\mathrm{Confidence}(I_{y}, I_{x})$ generally differ. Since confidence is multiplied into the final correlation measure, Laplace correction (adding 1 to the numerator and the denominator) is introduced to avoid zero values.
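A minimal sketch of Equation (2), assuming each item is represented by the set of ids of the users who rated it (a representation chosen here for illustration). Swapping the arguments changes the denominator, which is exactly where the asymmetry comes from.

```python
from typing import Set


def confidence(raters_x: Set[int], raters_y: Set[int]) -> float:
    """Confidence(I_x, I_y) with Laplace correction, Equation (2)."""
    return (len(raters_x & raters_y) + 1) / (len(raters_y) + 1)


x = {1, 2, 3}
y = {2, 3, 4, 5, 6}
cold: Set[int] = set()  # a cold-start item with no ratings yet

print(confidence(x, y), confidence(y, x))  # 0.5 0.75
print(confidence(cold, y), confidence(y, cold))
```

The second line prints 1/6 and 1.0. Without the +1 terms, the first value would be 0 and the second would be the undefined 0/0, which is why the correction matters once confidence is used as a multiplier.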

3.3. Direct Inter-Connectivity Detection: Genre Based Item–Item Asymmetric Similarity

In this section, the genre-based item–item similarity and the degree of reliability of the items' association are integrated so that both cold-start and highly rated items are taken into account. Considering only the genre-based similarity could produce a high similarity that is the same in both directions between a cold-start item and a highly rated item, without accounting for the users' rating information on the rated item, which may lead to unsatisfactory predictions. For example, a cold-start item might be n% similar to a highly rated item because they belong to the same genres. However, the highly rated item should not be equally n% similar to the cold-start item, because it not only belongs to the same genres but also carries users' preference data, namely their ratings. Thus, the direct interconnectivity between items is made asymmetric, because the confidence values are asymmetric and take the number of co-ratings into account. The integrated similarity, denoted $\mathrm{DiSim}(I_{x}, I_{y})$, is given in Equation (3), and sample degrees of direct asymmetric similarity between items are shown in Table 3:

\mathrm{DiSim}(I_{x}, I_{y}) = \frac{\sum_{a = 1}^{N} (I_{x, a} \times I_{y, a})}{N} \times \frac{1}{1 + e^{- N}} \times \mathrm{Confidence}(I_{x}, I_{y}) .

(3)
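Equation (3) is the product of Equations (1) and (2). The sketch below repeats compact versions of a genre-similarity function and a confidence function so that it runs on its own; the function names, the item representations (genre-affiliation dictionaries and rater-id sets), and the rater sets are hypothetical choices for illustration.

```python
import math
from typing import Dict, Set


def genre_sim(ix: Dict[str, float], iy: Dict[str, float]) -> float:
    """Equation (1); zero when the items share no genre (assumption)."""
    common = ix.keys() & iy.keys()
    n = len(common)
    if n == 0:
        return 0.0
    return sum(ix[g] * iy[g] for g in common) / n / (1.0 + math.exp(-n))


def confidence(raters_x: Set[int], raters_y: Set[int]) -> float:
    """Equation (2), with Laplace correction."""
    return (len(raters_x & raters_y) + 1) / (len(raters_y) + 1)


def di_sim(gx: Dict[str, float], gy: Dict[str, float],
           rx: Set[int], ry: Set[int]) -> float:
    """DiSim(I_x, I_y) = Sim(I_x, I_y) * Confidence(I_x, I_y), Equation (3)."""
    return genre_sim(gx, gy) * confidence(rx, ry)


stardust_genres = {"Romantic": 0.5, "Adventure": 0.5}
wolf_girl_genres = {"Romantic": 1 / 3, "Horror": 1 / 3, "Adventure": 1 / 3}
stardust_raters = {1, 2, 3}         # hypothetical: three users rated it
wolf_girl_raters: Set[int] = set()  # cold-start: no ratings yet

cold_to_rated = di_sim(wolf_girl_genres, stardust_genres, wolf_girl_raters, stardust_raters)
rated_to_cold = di_sim(stardust_genres, wolf_girl_genres, stardust_raters, wolf_girl_raters)
print(round(cold_to_rated, 4), round(rated_to_cold, 4))
```

The genre factor is identical in both directions, so the difference comes entirely from confidence: here DiSim(cold-start item, rated item) is one quarter of DiSim(rated item, cold-start item).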

3.4. Relative Inter-Connectivity Detection: Inferring Transitivity of Reliable Correlation of Items

Using the transitive property of trust, a relative similarity can be built between items that are not directly similar to the target item; this is known as trust propagation [32]. Trust propagation combines consecutive direct correlations through all in-between items. The propagation length is a variable that should be finite, since it defines how many consecutive direct associations are combined to infer relative interconnectivity [32]. In our system, the propagation length is set to 2, where a length of 1 denotes the direct association between the target and recommended items. Finally, the average composition is used to calculate the relative similarity, denoted $\mathrm{RelSim}(I_{x}, I_{y})$. A sample of relative similarities between items is shown in Table 4.

\mathrm{RelSim}(I_{x}, I_{y}) = \frac{1}{N} \sum_{i = 1}^{N} \frac{\mathrm{DiSim}(I_{x}, I_{z_{i}}) + \mathrm{DiSim}(I_{z_{i}}, I_{y})}{2} = \frac{\sum_{i = 1}^{N} \left( \mathrm{DiSim}(I_{x}, I_{z_{i}}) + \mathrm{DiSim}(I_{z_{i}}, I_{y}) \right)}{2N} .

(4)

Here, $N$ denotes the variable length of the trust propagation used between items $x$ and $y$, and $z$ denotes an intermediate item that is correlated with both $x$ and $y$.
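A sketch of Equation (4) under stated assumptions: DiSim values are held in a nested dictionary (a hypothetical layout), and the sum is read as running over every intermediate item z that links x to y through two positive direct similarities, so the result is the mean of (DiSim(x, z) + DiSim(z, y)) / 2 over those length-2 paths. Direction matters because DiSim is asymmetric.

```python
from typing import Dict, Optional

DiSimTable = Dict[str, Dict[str, float]]  # di[a][b] = DiSim(I_a, I_b)


def rel_sim(di: DiSimTable, x: str, y: str) -> Optional[float]:
    """Average composition over length-2 paths x -> z -> y, Equation (4)."""
    terms = []
    for z, s_xz in di.get(x, {}).items():
        if z in (x, y) or s_xz <= 0.0:
            continue
        s_zy = di.get(z, {}).get(y, 0.0)
        if s_zy > 0.0:
            terms.append((s_xz + s_zy) / 2)  # one path: mean of its two links
    if not terms:
        return None  # no intermediate item links x to y
    return sum(terms) / len(terms)


di = {
    "x": {"z1": 0.4, "z2": 0.6},
    "z1": {"y": 0.2},
    "z2": {"y": 0.4},
}
print(rel_sim(di, "x", "y"))  # mean of the path scores 0.3 and 0.5
print(rel_sim(di, "y", "x"))  # None: no path in this direction
```

Here the two paths through z1 and z2 score 0.3 and 0.5, so RelSim(x, y) is 0.4, while no path exists from y back to x.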

3.5. Proposed Prediction Algorithm

Here, an enhanced prediction algorithm is proposed. Algorithm 1 computes the predicted rating $p_{u,x}$ of item $x$ for user $u$. The direct similarity between item $x$ and item $y$ is used to calculate the predicted rating; the relative similarity is used when there is no direct similarity between item $x$ and item $y$.

Algorithm 1: Enhanced Prediction Algorithm

Input: a set of users $u \in U$, a set of items $x, y \in I$, and item similarity values $sim(x, y)$
Output: a list of predicted ratings $p_{u,x}$

Begin
  $U \leftarrow$ set of users $u$; $I \leftarrow$ set of items $x$ and $y$
  $dividend \leftarrow 0$; $divisor \leftarrow 0$; $temp \leftarrow 0$
  $Quotient_{x} \leftarrow 0$
  for each item $y \in I$ do
    if $r_{u,y} \neq 0$ then
      $temp \leftarrow r_{u,y} - \bar{r}_{y}$
      $dividend \leftarrow dividend + sim(x, y) \times temp$
      $divisor \leftarrow divisor + sim(x, y)$
    end if
  end for
  $Quotient_{x} \leftarrow dividend / divisor$
  $p_{u,x} \leftarrow \bar{r}_{x} + Quotient_{x}$
  if $p_{u,x} < 0.5$ then
    $p_{u,x} \leftarrow 1$
  else
    $p_{u,x} \leftarrow Rounding(p_{u,x})$
  end if
End

The predicted rating is rounded to improve the prediction. For example, 5 is taken as the final rating when the predicted rating is between 4.5 and 5, and 4 is taken when it is between 3.5 and 4.49. The same rule applies to the other cases.
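The following Python sketch is one possible reading of Algorithm 1 together with the rounding rule above; it is not the authors' code. Assumptions made here: the function names `predict_rating` and `round_half_up` are hypothetical; similarities are looked up in two dictionaries keyed by `(x, y)`, with the relative similarity used only when no positive direct similarity exists; the prediction is `None` when the divisor is zero (Algorithm 1 does not say what happens then); and rounding is half-up. Half-up is made explicit because Python's built-in `round` rounds half to even, so `round(4.5)` returns 4 rather than the 5 the text calls for.

```python
import math
from typing import Dict, Optional, Tuple

SimTable = Dict[Tuple[str, str], float]  # (x, y) -> similarity of x to y


def round_half_up(p: float) -> int:
    """4.5 -> 5 and 3.5 -> 4, as in the text (unlike Python's round)."""
    return math.floor(p + 0.5)


def predict_rating(user_ratings: Dict[str, float], item_means: Dict[str, float],
                   x: str, direct: SimTable, relative: SimTable) -> Optional[int]:
    """Predicted rating p_{u,x} of item x for one user u."""
    dividend = 0.0
    divisor = 0.0
    for y, r_uy in user_ratings.items():
        if r_uy == 0 or y == x:
            continue  # skip unrated entries and the target item itself
        s = direct.get((x, y), 0.0)
        if s <= 0.0:
            s = relative.get((x, y), 0.0)  # fall back to relative similarity
        dividend += s * (r_uy - item_means[y])  # mean-centred rating of y
        divisor += s
    if divisor == 0.0:
        return None  # assumption: no similar rated item, so no prediction
    p = item_means[x] + dividend / divisor
    if p < 0.5:
        return 1
    return round_half_up(p)


ratings_u = {"A": 5, "B": 3}            # hypothetical ratings of user u
means = {"A": 4.0, "B": 3.5, "X": 3.8}  # hypothetical item means
direct_sim = {("X", "A"): 0.6}          # no direct similarity between X and B
relative_sim = {("X", "B"): 0.2}
print(predict_rating(ratings_u, means, "X", direct_sim, relative_sim))
```

In this example the weighted deviation is (0.6 × 1 + 0.2 × (-0.5)) / 0.8 = 0.625, so the unrounded prediction is 3.8 + 0.625 = 4.425, which rounds to 4. Like Algorithm 1 as written, the sketch does not clamp predictions that exceed the top of the rating scale.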

4. Experimental Setup and Evaluations

4.1. Dataset

To demonstrate the effectiveness of the proposed approach, we chose offline analysis, which uses pre-compiled offline datasets. There are three types of offline datasets: true-offline datasets, user-offline datasets, and expert-offline datasets [41]. We used true-offline datasets for our experiments because they originate in collaborative filtering based RSs, where users express their likes and dislikes of items through ratings, and this kind of dataset is commonly used to evaluate collaborative filtering algorithms. Thus, we chose two popular true-offline datasets for evaluating the proposed methodology: MovieLens and MovieTweets.

The first dataset, MovieLens (ML-1M), is used in most collaborative filtering research to validate system performance. It contains 6040 users and 3952 movies with 1,000,209 ratings and 19 genres. Ratings range from 1 to 5, and the density of the user–item matrix is 4.10%.

The MovieTweets dataset also contains movie ratings and is constructed from well-structured tweets on Twitter. It is the outcome of research coordinated by Simon Dooms [42]. As of 2016, the dataset includes 43,357 users and 25,193 items, with ratings ranging from 1 to 10. The main parameters of the datasets used in the experiments are given in Table 5.

4.2. Evaluation Metrics

The most commonly used evaluation metrics for collaborative filtering based recommender systems are mean absolute error (MAE), precision, recall, F-measure, and mean reciprocal rank (MRR), and these metrics are used to evaluate the proposed method. During evaluation, items' ratings are predicted with the proposed methodology, and the method's performance is then measured by comparing the predicted and actual ratings in terms of MAE, precision, recall, F-measure, and MRR.

4.2.1. Mean Absolute Error

Mean absolute error (MAE) [43] measures the accuracy of the system by comparing the predicted ratings with the actual ratings of the items: it is the average absolute difference between the estimated rating and the user's true rating. The relationship between this metric and the system's performance is inverse: the lower the MAE, the better the system performs.

\mathrm{MAE} = \frac{\sum_{x = 1}^{N_{u}} | r_{u, x} - p_{u, x} |}{N_{u}} .

(5)

Here, $r_{u,x}$ is the actual rating that user $u$ gave to item $x$, $p_{u,x}$ is the predicted rating of item $x$ for user $u$ produced by the proposed methodology, and $N_{u}$ is the number of items rated by user $u$.
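A direct transcription of Equation (5) for one user's rated items, with made-up ratings as input. How per-user values are aggregated across users is not specified in this section, so the sketch stops at a single list of (actual, predicted) pairs.

```python
from typing import Sequence


def mae(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Equation (5): mean of |r_{u,x} - p_{u,x}| over the user's N_u rated items."""
    if not actual or len(actual) != len(predicted):
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(r - p) for r, p in zip(actual, predicted)) / len(actual)


print(mae([4, 3, 5], [4, 4, 3]))  # (0 + 1 + 2) / 3 = 1.0
```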

4.2.2. Precision

Precision is the fraction of recommended items that match the users' preferences in the test dataset [43]. It is also known as the positive predictive value and measures the relevance of the results. Assume that ratings of 3–5 are positive and ratings of 1 and 2 are negative. When an item's real rating is within 3–5 and its predicted rating is also within 3–5, the prediction is a true positive prediction ($TPP$); when the item's real rating is 1 or 2 but its predicted rating is within 3–5, it is a false positive prediction ($FPP$). Precision is then computed as follows:

\mathrm{Precision} = \frac{TPP}{TPP + FPP} .

(6)

4.2.3. Recall

The recall is the mean quantity of items of the testing dataset that exists among the ranked list from the training dataset. It also called sensitivity and measures the amount of truly relevant results that are identified. Based on the above precision’s hypothesis, an item’s real rating exists within 3–5 and the item’s predicted rating belongs to 3–5; then, we can call it a true positive prediction (TPP); on the other hand, when the item’s predicted rating belongs to 1–2, then it is classified as a false negative prediction

(F N P)

. Hence, the equation of the recall computation, on the basis of our assumption, is as follows:

R e c a l l = \frac{T P P}{T P P + F N P} .

(7)

4.2.4. F-Measures

It has been seen that, when the precision and recall cannot anticipate decent results, then f-measure is used as a weighted harmonic mean of the precision and recall to ensure better evaluation of the test system:

F_{M e a s u r e s} = \frac{2 \times P r e c i s i o n \times R e c a l l}{P r e c i s i o n + R e c a l l} .

(8)
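The counting rules for TPP, FPP, and FNP and Equations (6) to (8) can be sketched as below. The positivity threshold is a parameter: 3 corresponds to the MovieLens setting (ratings 3–5 positive) and 5 to the MovieTweets setting in Section 5.1.2. The function name, the choice to return 0 when a denominator is zero, and the example ratings are assumptions for illustration.

```python
from typing import Sequence, Tuple


def precision_recall_f(actual: Sequence[float], predicted: Sequence[float],
                       threshold: float) -> Tuple[float, float, float]:
    """Ratings >= threshold count as positive (3 for MovieLens, 5 for MovieTweets)."""
    tpp = fpp = fnp = 0
    for r, p in zip(actual, predicted):
        real_pos, pred_pos = r >= threshold, p >= threshold
        if real_pos and pred_pos:
            tpp += 1  # true positive prediction
        elif pred_pos:
            fpp += 1  # predicted positive, actually negative
        elif real_pos:
            fnp += 1  # predicted negative, actually positive
    precision = tpp / (tpp + fpp) if tpp + fpp else 0.0
    recall = tpp / (tpp + fnp) if tpp + fnp else 0.0
    f = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f


actual = [5, 4, 2, 1, 3]
predicted = [4, 2, 4, 1, 3]
print(precision_recall_f(actual, predicted, threshold=3))
```

In this example there are two true positives (5→4 and 3→3), one false positive (2→4), and one false negative (4→2), so precision, recall, and F-measure are all 2/3; the pair (1, 1) is a true negative and affects none of the three.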

4.2.5. Mean Reciprocal Rank

The mean reciprocal rank

(M R R)

is a popular evaluation metric that is used to calculate the average of reciprocal of the rank in which the first correct item was retrieved at each prediction [44]. It will match the recommended list with the predicted recommended list. It will give a higher score if the match is found at the top of the list. The MRR is computed as follows:

M R R = \frac{1}{| N |} \sum_{i = 1}^{| N |} \frac{1}{r a n k_{i}} .

(9)
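A minimal sketch of Equation (9). It assumes each prediction yields a ranked list of item ids and a set of relevant (correct) items; a list that contains no relevant item contributes 0, a common convention that the text does not state explicitly. The example lists are made up.

```python
from typing import Hashable, Sequence, Set


def mrr(ranked_lists: Sequence[Sequence[Hashable]],
        relevant_sets: Sequence[Set[Hashable]]) -> float:
    """Mean of 1/rank_i, where rank_i is the position of the first relevant item."""
    if not ranked_lists or len(ranked_lists) != len(relevant_sets):
        raise ValueError("need equally many non-empty ranked lists and relevant sets")
    total = 0.0
    for ranked, relevant in zip(ranked_lists, relevant_sets):
        for rank, item in enumerate(ranked, start=1):
            if item in relevant:
                total += 1.0 / rank
                break  # only the first correct item counts
    return total / len(ranked_lists)


lists = [["a", "b", "c"], ["d", "e", "f"], ["g", "h"]]
relevant = [{"b"}, {"d"}, {"z"}]
print(mrr(lists, relevant))  # (1/2 + 1/1 + 0) / 3 = 0.5
```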

5. Results and Discussion

5.1. Prediction Accuracy

In this section, we evaluate the proposed method's prediction accuracy for recommendation by comparing it with benchmark methods: the Pearson correlation coefficient (PCC), cosine similarity (COS), Jaccard similarity (Jaccard), and mean squared distance (MSD), chosen because they are the most popular similarity measures. We used 10-fold cross-validation to split the data for evaluation. Cross-validation verifies a method's performance by splitting the dataset into two subsets [45]: a training set used to train the method and a test set used to evaluate it. For 10-fold cross-validation, each dataset is randomly split into 10 equal subsets. Then, in each subset, 90% of the data are randomly selected as training data and the rest are used as test data. In the test phase, the proposed method generates predicted ratings for the test data, which are compared with the actual ratings in terms of MAE, precision, recall, F-measure, and MRR. The process is repeated 10 times, and the outputs of all folds are averaged to produce a single result.
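The splitting procedure above can be read in more than one way, so the sketch below implements two hypothetical versions side by side: `literal_splits` follows the literal description (10 random subsets, then a random 90/10 train/test split inside each subset), while `kfold_splits` is textbook 10-fold cross-validation, in which each of the 10 parts is held out once as the test set and the other nine form the training set. Which one matches the experiments is not confirmed by the text; both are illustrations only.

```python
import random
from typing import Any, List, Sequence, Tuple

Split = Tuple[List[Any], List[Any]]  # (train, test)


def literal_splits(data: Sequence[Any], k: int = 10, train_frac: float = 0.9,
                   seed: int = 0) -> List[Split]:
    """Literal reading: k random subsets, each split 90/10 internally."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    splits = []
    for i in range(k):
        part = shuffled[i::k]  # i-th of k near-equal random subsets
        cut = round(train_frac * len(part))
        splits.append((part[:cut], part[cut:]))
    return splits


def kfold_splits(data: Sequence[Any], k: int = 10, seed: int = 0) -> List[Split]:
    """Textbook k-fold: fold i is the test set, the other k-1 folds train."""
    rng = random.Random(seed)
    shuffled = list(data)
    rng.shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    return [([r for j, f in enumerate(folds) if j != i for r in f], folds[i])
            for i in range(k)]


ratings = list(range(1000))  # stand-in for (user, item, rating) records
lit = literal_splits(ratings)
kf = kfold_splits(ratings)
print(len(lit[0][0]), len(lit[0][1]))  # 90 10
print(len(kf[0][0]), len(kf[0][1]))    # 900 100
```

In either case the metrics are computed once per split and averaged over the 10 splits; the two readings differ mainly in how much data each run trains on.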

This section presents the results obtained on the datasets described in Table 5. Figures 3 and 4 show the results for the Movielens and MovieTweets datasets, respectively, comparing the proposed method with the traditional similarity measures in terms of MAE, precision, recall, f-measure, and MRR. The results show that the proposed method outperforms the existing traditional methods.

5.1.1. Performance Evaluation: Movielens Dataset

In the Movielens dataset, ratings range from 1 to 5. Thus, for calculating precision and recall, ratings of 3–5 are treated as positive and ratings of 1 and 2 as negative. The MAE, precision, recall, f-measure, and MRR obtained with the benchmark and proposed methods are shown in Figure 3, where the proposed similarity measure is denoted as PSM. In terms of prediction accuracy, PSM outperforms all benchmark methods at every neighborhood size. At the maximum neighborhood size, PSM achieves average improvements of 3.42% in MAE (Figure 3a), 7.25% in precision (Figure 3b), 7.20% in recall (Figure 3c), 8.76% in f-measure (Figure 3d), and 49.3% in MRR (Figure 3e). The average improved accuracy was calculated using the following equation:

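IA = \frac{1}{N} \sum_{i=1}^{N} \frac{\left| \mathrm{Output}_{\mathrm{BM}_i} - \mathrm{Output}_{\mathrm{PSM}} \right|}{\mathrm{Output}_{\mathrm{BM}_i}} .

(10)

Here, IA denotes the improved accuracy, Output_{BM_i} is the value of the evaluation metric for the i-th benchmark method (BM), Output_{PSM} is the corresponding value for the proposed method, and N is the number of benchmark methods used for evaluation.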
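For illustration, Equation (10) can be written as a short Python function. The function name and the example numbers are hypothetical (they are not results from the paper), and the sketch assumes every benchmark output is non-zero, since the equation divides by it.

```python
from typing import Sequence


def improved_accuracy(benchmark_outputs: Sequence[float], psm_output: float) -> float:
    """Average relative difference between PSM and each benchmark, Eq. (10).

    benchmark_outputs holds one value of the same metric (e.g. MAE)
    for each of the N benchmark methods; all must be non-zero.
    """
    if not benchmark_outputs:
        raise ValueError("need at least one benchmark output")
    total = sum(abs(bm - psm_output) / bm for bm in benchmark_outputs)
    return total / len(benchmark_outputs)


# Illustrative MAE values for two hypothetical benchmarks and for PSM.
print(f"{improved_accuracy([0.80, 0.75], 0.72):.2%}")  # 7.00%
```

Because Equation (10) takes the absolute difference, the same function applies to MAE (lower is better) and to precision, recall, f-measure, and MRR (higher is better); it measures the size of the relative change, not its direction.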

5.1.2. Performance Evaluation: MovieTweets Dataset

In the MovieTweets dataset, ratings range from 1 to 10. Thus, ratings of 5–10 are treated as positive and ratings of 1–4 as negative. Figure 4 shows the MAE, precision, recall, f-measure, and MRR obtained on the MovieTweets dataset with the same benchmark methods as in Section 5.1.1. Again, PSM outperforms the benchmark methods on this dataset. At the maximum neighborhood size, PSM achieves average improvements of 8.58% in MAE (Figure 4a), 3.29% in precision (Figure 4b), 7.55% in recall (Figure 4c), 5.15% in f-measure (Figure 4d), and 16.49% in MRR (Figure 4e). These improvements were also calculated using Equation (10).
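The positive/negative split used for both datasets can be sketched in Python as follows. The thresholds (3 for Movielens, 5 for MovieTweets) come from the text; everything else is an assumption: an item is taken as relevant if its actual rating is positive and as recommended if its predicted rating is positive, and continuous predictions use the same `>=` cut-off, since the text does not say how a prediction such as 2.7 is classified. The names `POSITIVE_THRESHOLD` and `precision_recall` are hypothetical.

```python
from typing import Sequence, Tuple

# Lowest rating counted as positive, per Sections 5.1.1 and 5.1.2.
POSITIVE_THRESHOLD = {"Movielens": 3, "MovieTweets": 5}


def precision_recall(
    actual: Sequence[float], predicted: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Precision and recall after binarizing ratings at `threshold`.

    A rating >= threshold is treated as positive. An item counts as
    relevant if its actual rating is positive and as recommended if
    its predicted rating is positive (assumed definitions).
    """
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    tp = fp = fn = 0
    for a, p in zip(actual, predicted):
        relevant, recommended = a >= threshold, p >= threshold
        if relevant and recommended:
            tp += 1
        elif recommended:
            fp += 1
        elif relevant:
            fn += 1
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


actual = [5, 4, 2, 1, 3]
predicted = [4.6, 2.4, 3.1, 1.2, 3.8]
print(precision_recall(actual, predicted, POSITIVE_THRESHOLD["Movielens"]))
# (0.6666666666666666, 0.6666666666666666)
```

Keeping the thresholds in one table makes it explicit that only the cut-off changes between the two rating scales.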

6. Conclusions

In this paper, a new approach has been proposed to integrate item genre data into item-based CF as a solution to the cold-start item problem. The main contribution of this paper is to compute item similarity from genre information, since every item belongs to at least one specific genre. To determine item similarities, two types of inter-connectivity between items have been calculated: the direct asymmetric inter-connectivity, based on item genres and the reliability of item similarity, and the relative inter-connectivity, based on the reliability of the direct asymmetric inter-connectivity between items. Furthermore, an enhanced prediction algorithm has been proposed to increase the prediction accuracy of the system. The performance of the proposed similarity measures was compared with four popular similarity measures, namely PCC, cosine, Jaccard, and MSD, and the proposed technique achieved better prediction accuracy than each of them. In future work, we plan to investigate how item genre can be integrated into user-based collaborative filtering to also overcome the cold-start user problem.

Author Contributions

Conceptualization, M.H.; Formal analysis, M.H. and F.R.; Investigation, M.H. and F.R.; Methodology, M.H.; Validation, M.H. and F.R.; Writing—original draft, F.R.; Writing—review and editing, M.H. and F.R.

Funding

This research received no external funding.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Melville, P.; Sindhwani, V. Recommender systems. In Encyclopedia of Machine Learning and Data Mining; Springer: Berlin, Germany, 2017; pp. 1056–1066.
2. Tarus, J.K.; Niu, Z.; Mustafa, G. Knowledge-based recommendation: A review of ontology-based recommender systems for e-learning. Artif. Intell. Rev. 2018, 50, 21–48.
3. Lu, J.; Wu, D.; Mao, M.; Wang, W.; Zhang, G. Recommender system application developments: A survey. Decis. Support Syst. 2015, 74, 12–32.
4. Nobahari, V.; Jalali, M.; Mahdavi, S.J.S. ISoTrustSeq: A social recommender system based on implicit interest, trust and sequential behaviors of users using matrix factorization. J. Intell. Inf. Syst. 2019, 52, 239–268.
5. Barkan, O.; Koenigstein, N. Item2vec: Neural item embedding for collaborative filtering. In Proceedings of the 2016 IEEE 26th International Workshop on Machine Learning for Signal Processing (MLSP), Vietri sul Mare, Italy, 13–16 September 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–6.
6. Wei, J.; He, J.; Chen, K.; Zhou, Y.; Tang, Z. Collaborative filtering and deep learning based recommendation system for cold start items. Expert Syst. Appl. 2017, 69, 29–39.
7. Koren, Y. Factorization meets the neighborhood: A multifaceted collaborative filtering model. In Proceedings of the 14th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Las Vegas, NV, USA, 24–27 August 2008; ACM: New York, NY, USA, 2008; pp. 426–434.
8. Gantner, Z.; Drumond, L.; Freudenthaler, C.; Rendle, S.; Schmidt-Thieme, L. Learning attribute-to-feature mappings for cold-start recommendations. In Proceedings of the 2010 IEEE International Conference on Data Mining, Sydney, Australia, 13–17 December 2010; IEEE: Piscataway, NJ, USA, 2010; pp. 176–185.
9. Menon, A.K.; Elkan, C. Predicting labels for dyadic data. Data Min. Knowl. Discov. 2010, 21, 327–343.
10. Hwang, C.S.; Chen, Y.P. Using trust in collaborative filtering recommendation. In Proceedings of the International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems, Kyoto, Japan, 26–29 June 2007; Springer: Berlin, Germany, 2007; pp. 1052–1060.
11. Gao, Q.; Gao, L.; Fan, J.; Ren, J. A preference elicitation method based on bipartite graphical correlation and implicit trust. Neurocomputing 2017, 237, 92–100.
12. Roy, F.; Sarwar, S.M.; Hasan, M. User similarity computation for collaborative filtering using dynamic implicit trust. In Proceedings of the International Conference on Analysis of Images, Social Networks and Texts, Yekaterinburg, Russia, 9–11 April 2015; Springer: Berlin, Germany, 2015; pp. 224–235.
13. Isinkaye, F.O.; Folajimi, Y.O.; Ojokoh, B.A. Recommendation systems: Principles, methods and evaluation. Egypt. Inf. J. 2015, 16, 261–273.
14. Duricic, T.; Lacic, E.; Kowald, D.; Lex, E. Trust-based collaborative filtering: Tackling the cold start problem using regular equivalence. In Proceedings of the 12th ACM Conference on Recommender Systems, Vancouver, BC, Canada, 2 October 2018; ACM: New York, NY, USA, 2018; pp. 446–450.
15. Logesh, R.; Subramaniyaswamy, V. A reliable point of interest recommendation based on trust relevancy between users. Wirel. Pers. Commun. 2017, 97, 2751–2780.
16. Ma, X.; Ma, J.; Li, H.; Jiang, Q.; Gao, S. ARMOR: A trust-based privacy-preserving framework for decentralized friend recommendation in online social networks. Future Gener. Comput. Syst. 2018, 79, 82–94.
17. Berkhim, P.; Xu, Z.; Mao, J.; Rose, D.E.; Taha, A.; Maghoul, F. Trust Propagation through Both Explicit and Implicit Social Networks. U.S. Patent 9,576,029, 21 February 2017.
18. Portugal, I.; Alencar, P.; Cowan, D. The use of machine learning algorithms in recommender systems: A systematic review. Expert Syst. Appl. 2018, 97, 205–227.
19. Dascalu, M.I.; Bodea, C.N.; Moldoveanu, A.; Mohora, A.; Lytras, M.; de Pablos, P.O. A recommender agent based on learning styles for better virtual collaborative learning experiences. Comput. Hum. Behav. 2015, 45, 243–253.
20. Pu, P.; Chen, L. Trust building with explanation interfaces. In Proceedings of the 11th International Conference on Intelligent User Interfaces, Sydney, Australia, 29 January–1 February 2006; ACM: New York, NY, USA, 2006; pp. 93–100.
21. Goldberg, D.; Nichols, D.; Oki, B.M.; Terry, D. Using collaborative filtering to weave an information tapestry. Commun. ACM 1992, 35, 61–70.
22. Parvin, H.; Moradi, P.; Esmaeili, S. A collaborative filtering method based on genetic algorithm and trust statements. In Proceedings of the 2018 6th Iranian Joint Congress on Fuzzy and Intelligent Systems (CFIS), Kerman, Iran, 28 February–2 March 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 13–16.
23. Cinicioglu, E.N.; Shenoy, P.P. A new heuristic for learning Bayesian networks from limited datasets: A real-time recommendation system application with RFID systems in grocery stores. Ann. Oper. Res. 2016, 244, 385–405.
24. Covington, P.; Adams, J.; Sargin, E. Deep neural networks for YouTube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; ACM: New York, NY, USA, 2016; pp. 191–198.
25. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Analysis of recommendation algorithms for e-commerce. In Proceedings of the 2nd ACM Conference on Electronic Commerce, Minneapolis, MN, USA, 17–20 October 2000; ACM: New York, NY, USA, 2000; pp. 158–167.
26. Liu, H.; Hu, Z.; Mian, A.; Tian, H.; Zhu, X. A new user similarity model to improve the accuracy of collaborative filtering. Knowl.-Based Syst. 2014, 56, 156–166.
27. Sarwar, B.; Karypis, G.; Konstan, J.; Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, Hong Kong, China, 1–5 May 2001; ACM: New York, NY, USA, 2001; pp. 285–295.
28. Linden, G.; Smith, B.; York, J. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Comput. 2003, 7, 76–80.
29. Davidson, J.; Liebald, B.; Liu, J.; Nandy, P.; Van Vleet, T.; Gargi, U.; Gupta, S.; He, Y.; Lambert, M.; Livingston, B.; et al. The YouTube video recommendation system. In Proceedings of the Fourth ACM Conference on Recommender Systems, Barcelona, Spain, 26–30 September 2010; ACM: New York, NY, USA, 2010; pp. 293–296.
30. Li, D.; Chen, C.; Lv, Q.; Shang, L.; Zhao, Y.; Lu, T.; Gu, N. An algorithm for efficient privacy-preserving item-based collaborative filtering. Future Gener. Comput. Syst. 2016, 55, 311–320.
31. Dakhel, A.M.; Malazi, H.T.; Mahdavi, M. A social recommender system using item asymmetric correlation. Appl. Intell. 2018, 48, 527–540.
32. Papagelis, M.; Plexousakis, D.; Kutsuras, T. Alleviating the sparsity problem of collaborative filtering using trust inferences. In Proceedings of the International Conference on Trust Management, Paris, France, 23–26 May 2005; Springer: Berlin, Germany, 2005; pp. 224–239.
33. Zhang, Z.; Liu, Y.; Jin, Z.; Zhang, R. A dynamic trust based two-layer neighbor selection scheme towards online recommender systems. Neurocomputing 2018, 285, 94–103.
34. Parvin, H.; Moradi, P.; Esmaeili, S. TCFACO: Trust-aware collaborative filtering method based on ant colony optimization. Expert Syst. Appl. 2019, 118, 152–168.
35. Yang, B.; Lei, Y.; Liu, J.; Li, W. Social collaborative filtering by trust. IEEE Trans. Pattern Anal. Mach. Intell. 2017, 39, 1633–1647.
36. Linda, S.; Bharadwaj, K.K. A fuzzy trust enhanced collaborative filtering for effective context-aware recommender systems. In Proceedings of the First International Conference on Information and Communication Technology for Intelligent Systems, Ahmedabad, India, 28–29 November 2015; Springer: Cham, Switzerland, 2016; Volume 2, pp. 227–237.
37. Yera, R.; Martinez, L. Fuzzy tools in recommender systems: A survey. Int. J. Comput. Intell. Syst. 2017, 10, 776–803.
38. Ebesu, T.; Fang, Y. Neural Semantic Personalized Ranking for item cold-start recommendation. Inf. Retr. J. 2017, 20, 109–131.
39. Barjasteh, I.; Forsati, R.; Masrour, F.; Esfahanian, A.H.; Radha, H. Cold-start item and user recommendation with decoupled completion and transduction. In Proceedings of the 9th ACM Conference on Recommender Systems, Vienna, Austria, 16–20 September 2015; ACM: New York, NY, USA, 2015; pp. 91–98.
40. Lika, B.; Kolomvatsos, K.; Hadjiefthymiades, S. Facing the cold start problem in recommender systems. Expert Syst. Appl. 2014, 41, 2065–2073.
41. Beel, J.; Genzmehr, M.; Langer, S.; Nürnberger, A.; Gipp, B. A comparative analysis of offline and online evaluations and discussion of research paper recommender system evaluation. In Proceedings of the International Workshop on Reproducibility and Replication in Recommender Systems Evaluation, Hong Kong, China, 12 October 2013; ACM: New York, NY, USA, 2013; pp. 7–14.
42. Dooms, S.; De Pessemier, T.; Martens, L. MovieTweetings: A movie rating dataset collected from Twitter. In Proceedings of the Workshop on Crowdsourcing and Human Computation for Recommender Systems, held in conjunction with the 7th ACM Conference on Recommender Systems, Hong Kong, China, 12 October 2013; ACM: New York, NY, USA, 2013; Volume 2013, p. 43. Available online: https://biblio.ugent.be/publication/4284240 (accessed on 10 December 2018).
43. Herlocker, J.L.; Konstan, J.A.; Terveen, L.G.; Riedl, J.T. Evaluating collaborative filtering recommender systems. ACM Trans. Inf. Syst. (TOIS) 2004, 22, 5–53.
44. Craswell, N. Mean Reciprocal Rank; Liu, L., Özsu, M.T., Eds.; Springer: Boston, MA, USA, 2009; p. 1703.
45. Roberts, D.R.; Bahn, V.; Ciuti, S.; Boyce, M.S.; Elith, J.; Guillera-Arroita, G.; Hauenstein, S.; Lahoz-Monfort, J.J.; Schröder, B.; Thuiller, W.; et al. Cross-validation strategies for data with temporal, spatial, hierarchical, or phylogenetic structure. Ecography 2017, 40, 913–929.

Figure 1. Bipartite graph based on items’ genre.

Figure 2. Graph of sigmoidal activation function.

Figure 3. Comparison of proposed and benchmark methods in reference to (a) MAE; (b) precision; (c) recall; (d) f-measure and (e) MRR for Movielens dataset.

Figure 4. Comparison of proposed and benchmark methods in reference to (a) MAE; (b) precision; (c) recall; (d) f-measure and (e) MRR for MovieTweets dataset.

Table 1. Classification of items by genre.

Movie/Genre	Action	Romantic	Horror	Adventure
Stardust	-	1	-	1
Mr. and Mrs. Smith	1	1	-	-
Wolf Girl	-	1	1	1
Constantine	1	-	1	-
Wrong Turn	-	-	1	-

Table 2. Item–item similarity matrix based on item genres.

	Stardust	Mr. and Mrs. Smith	Wolf Girl	Constantine	Wrong Turn
Stardust	-	0.250	0.165	0.000	0.000
Mr. and Mrs. Smith	-	-	0.165	0.250	0.000
Wolf Girl	-	-	-	0.165	0.330
Constantine	-	-	-	-	0.500

Table 3. Example of direct asymmetric similarities of items.

	Item1	Item2	Item3	Item4	Item5
Item1	-	0.52	0.00	0.33	0.25
Item2	0.40	-	0.65	0.00	0.00
Item3	0.00	0.25	-	0.16	0.33
Item4	0.81	0.00	0.30	-	0.50
Item5	0.10	0.00	0.41	0.65	-

Table 4. A list of relative similarities of items through transitive correlations.

	Item1	Item2	Item3	Item4	Item5
Item1	-	0.52	0.41	0.33	0.25
Item2	0.40	-	0.65	0.38	0.41
Item3	0.35	0.25	-	0.16	0.33
Item4	0.81	0.66	0.30	-	0.50
Item5	0.10	0.31	0.41	0.65	-

Table 5. Important statistics of datasets used for the experiments.

Dataset	Users	Items	Rating Range	Genres
Movielens	6040	3952	1–5	19
MovieTweets	43,357	25,193	1–10	28+

© 2019 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

