1. Introduction
Web 2.0 and e-commerce have triggered an explosion of online reviews. These reviews usually contain a large amount of sentiment and opinion information that is essential to many decision-making processes, such as personalized consumption decisions, product quality tracking, and public opinion mining. How to mine the sentiment and opinion information in reviews has become a fundamental problem in the natural language processing (NLP) and Web mining fields [1,2].
Sentiment polarity classification of online reviews has been widely studied in NLP, but it increasingly fails to meet the demand for fine-grained sentiment mining [3,4,5,6,7]. For example, polarity alone cannot help a consumer choose the best product when all the candidate products carry a positive sentiment polarity. Some studies have shown that consumers are willing to pay 20% to 99% more for a five-star rating than for a four-star rating [8]. This indicates that slight differences in product ratings may lead to dramatic changes in product sales. For opinion mining, the government needs to understand not only the polarity of positive and negative sentiments but also their intensity, in order to judge the urgency of public opinion events and take different measures. Therefore, researchers are increasingly concerned with review rating prediction (RRP). Existing RRP methods based on review text content mainly transform the review text into feature vectors and then employ a machine learning model to predict review ratings [9,10,11,12,13]. For example, RRP has been treated as a feature engineering problem, in which performance is improved by extracting different features, such as words, lexical patterns, syntactic structures, and semantic topics, from the review text content [10]. Zheng et al. extracted features from review text content through word embeddings and a convolutional neural network (CNN) and then predicted ratings through a fully connected network, which improved RRP performance [13].
RRP methods based on review text content implicitly assume that different users express the same sentiment magnitude when they use the same sentiment words, and that different sentiment words express different sentiment magnitudes. However, this assumption does not match the actual situation. For example, different users who write similar reviews of a product might rate it differently, or they might give it the same rating while writing very different reviews, depending on how strict or lenient they are and how they like to convey their opinions. Wang et al. argue that the rating is not entirely determined by the review text content, because a harsh user may comment on all products with strict words even when giving a product a high rating [14]. Different consumers use the same sentiment words to express different sentiment intensities, which reflects each consumer's personalized use of sentiment words. Based on the above analysis, we conclude that RRP depends not only on the review text content but also on the personalized information of the reviewer.
Review text content is an important source of personalized information about users. Wu et al. modeled the personalized information of micro-blog users in a personalized micro-blog sentiment classification method and achieved better sentiment classification performance [15]. The user-item rating matrix is another data source for obtaining personalized information about users. From the perspective of recommender systems, the personalized information of users can be mined from the historical ratings in the user-item rating matrix through collaborative filtering algorithms [16,17,18,19,20,21].
The main problem with existing RRP methods based on review text content is that the user-specific dependency of sentiment words cannot be fully exploited from the review text content alone. Personalized user information can be obtained not only from the review text content but also from the user-item rating matrix [22]. Therefore, we propose a user-personalized review rating prediction (UPRRP) method that integrates the review text content and the user-item rating matrix. Our method first models the commonality and the personality of users' sentiment expression based on the review text content and then models user personalization through the user-item rating matrix. Finally, UPRRP is realized by linearly integrating the review text content and the user-item rating matrix information.
The main contributions of this paper can be summarized as follows:
(1) We propose a novel method based on review text and the user-item rating matrix for personalized review rating prediction.
(2) We model users' personalized sentiment information by integrating review text and user-item rating matrix information.
(3) Comparative experiments on four datasets show that our model significantly outperforms previous approaches on review rating prediction.
The rest of the paper is organized as follows. Section 2 introduces related research on RRP. Section 3 describes the three UPRRP methods we propose. Experimental results on four review datasets are reported in Section 4. Finally, Section 5 concludes the paper and points out future research directions.
3. UPRRP Based on Review Text Content and a User-Item Rating Matrix
3.2. UPRRP Method Based on Review Text Content
Review text content is a very important information source for RRP. Current RRP methods based on review text content mainly use a vector space model (VSM) to represent the review text and then use a linear regression model to predict the review rating. Specifically, there are four steps. First, the online review text is preprocessed, which includes word segmentation, part-of-speech tagging, and frequency statistics. Second, treating words, phrases, and n-grams as candidate features, a feature selection method chooses the features that best express the review text content to compose the feature set. Third, each online review is represented as a multi-dimensional vector over this feature set. Finally, a linear regression model over these review vectors is adopted to predict the review rating.
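$$\hat{v}_{ui} = \mathbf{w}^{T}\mathbf{r}_{ui}$$
Here, $\hat{v}_{ui}$ is the predicted rating of user $u$ for item $i$; $\mathbf{w}$ is the parameter vector of the function; $\mathbf{r}_{ui}$ is the vector representation of the review text content.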
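As a rough illustration of the four steps above, the following minimal Python sketch builds a term-frequency vector for a review and applies a linear predictor of the form w · r_ui. The whitespace tokenizer, the function names, and the toy weights are illustrative assumptions; they are not the preprocessing or feature selection used in this paper.

```python
from collections import Counter

def build_vocabulary(reviews):
    """Map each distinct lowercase token to a column index (insertion order)."""
    vocab = {}
    for text in reviews:
        for token in text.lower().split():
            vocab.setdefault(token, len(vocab))
    return vocab

def vectorize(text, vocab):
    """Term-frequency vector r_ui for one review; unknown tokens are ignored."""
    counts = Counter(text.lower().split())
    return [float(counts.get(token, 0)) for token in vocab]

def predict_rating(w, r):
    """Linear model: predicted rating = w . r_ui."""
    return sum(wj * rj for wj, rj in zip(w, r))

# Toy corpus (hypothetical): columns become good, phone, bad, battery.
reviews = ["good good phone", "bad battery"]
vocab = build_vocabulary(reviews)
r = vectorize("good phone good", vocab)

# Hand-picked weights, standing in for a fitted linear regression model.
w = [1.0, 0.5, -1.0, -0.5]
rating = predict_rating(w, r)
```

In practice the weight vector would be fitted on training reviews rather than set by hand, and the feature set would come from the feature selection step.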
Because sentiment expression differs among users of product review sites, a general RRP model trained for all users cannot accurately capture the particular sentiment information of each user. The most intuitive remedy is to train a personalized RRP model for each user from the reviews that user has posted. However, the review text posted by a single user is generally very scarce. Therefore, it is very difficult to accurately train a UPRRP model for each user from that user's own review text alone.
Social science research shows that while online users express their sentiments in a personalized way, different users share many of the same sentiment expressions [41]. For example, “poor” and “bad” are often used by different users to express negative emotions. Therefore, taking full advantage of the sentiment information shared between different users can effectively alleviate the problem of insufficient data for individual users.
Based on the above analysis, a UPRRP model based on the review text content (UPRRP+RTC) is proposed. In order to model both the sentiment commonality of different users and the sentiment personality of a single user, the UPRRP model is decomposed into two parts: a public part and a user-specific part. The public part, shared by all users, describes the sentiment information common to different users; its parameters are trained using the data of all users. The user-specific part, unique to each user, describes that user's specific sentiment expression; its parameters are trained using only that user's data.
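Specifically, suppose user $u$ has published a review $\mathbf{r}_{ui}$ on item $i$. The UPRRP model based on the review text content is as follows:
$$\hat{v}_{ui} = (\mathbf{w} + \mathbf{w}_u)^{T}\mathbf{r}_{ui}$$
Here, $\hat{v}_{ui}$ is the predicted rating of user $u$ for item $i$; $\mathbf{w}$ and $\mathbf{w}_u$ are the public and user-specific parameters of the UPRRP model; $\mathbf{r}_{ui}$ is the vector representation of the review text content.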
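A minimal sketch of this decomposition, assuming the prediction takes the form (w + w_u) · r_ui with a shared vector w and a per-user offset w_u. The two-word vocabulary, the user names, and all weights below are hypothetical values chosen only to show how the same review text can yield different predicted ratings for a strict and a lenient user.

```python
def predict_personalized(w, w_user, r):
    """Predicted rating = (w + w_u) . r_ui (shared plus user-specific weights)."""
    return sum((wj + uj) * rj for wj, uj, rj in zip(w, w_user, r))

# Hypothetical two-term vocabulary: column 0 = "good", column 1 = "bad".
w = [1.0, -1.0]                       # public weights shared by all users
w_users = {
    "strict": [-0.5, -0.5],           # this user means less by "good"
    "lenient": [0.5, 0.5],            # this user means more by "good"
}

r = [2.0, 0.0]                        # a review containing "good" twice
strict_rating = predict_personalized(w, w_users["strict"], r)
lenient_rating = predict_personalized(w, w_users["lenient"], r)
```

With these toy weights the identical review text is scored 1.0 for the strict user and 3.0 for the lenient user, which is the effect the user-specific part is meant to capture.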
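To estimate the parameter vectors $\mathbf{w}$ and $\mathbf{w}_u$, given the review vectors $\mathbf{r}_{ui}$ and the observed ratings $v_{ui}$ in the training data set, we minimize the regularized least-squares error:
$$\min_{\mathbf{w},\,\mathbf{w}_u} \sum_{(u,i)} \left[ \left(v_{ui} - (\mathbf{w} + \mathbf{w}_u)^{T}\mathbf{r}_{ui}\right)^{2} + \lambda\left(\lVert\mathbf{w}\rVert^{2} + \lVert\mathbf{w}_u\rVert^{2}\right) \right]$$
Here, $\lVert\mathbf{w}\rVert^{2}$ and $\lVert\mathbf{w}_u\rVert^{2}$ are the regularization terms and $\lambda$ is the regularization coefficient. We solve this optimization problem by stochastic gradient descent, learning $\mathbf{w}$ and $\mathbf{w}_u$ with the following update rules for each training sample (the constant factor 2 of the gradient is absorbed into $\eta$):
$$\mathbf{w} \leftarrow \mathbf{w} + \eta\left(e_{ui}\,\mathbf{r}_{ui} - \lambda\,\mathbf{w}\right), \qquad \mathbf{w}_u \leftarrow \mathbf{w}_u + \eta\left(e_{ui}\,\mathbf{r}_{ui} - \lambda\,\mathbf{w}_u\right)$$
Here, $e_{ui} = v_{ui} - \hat{v}_{ui}$ is the prediction error and $\eta$ is the learning rate. After obtaining $\mathbf{w}$ and $\mathbf{w}_u$, given a review vector $\mathbf{r}_{ui}$, we predict the review rating by using $\hat{v}_{ui} = (\mathbf{w} + \mathbf{w}_u)^{T}\mathbf{r}_{ui}$.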
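The training procedure can be sketched as follows, assuming a squared-error loss with L2 regularization and per-sample updates of the form w ← w + η(e·r − λw), and likewise for w_u, where e is the prediction error. The function name, the hyperparameter values, and the toy data (a bias feature plus "good" and "bad" indicators) are assumptions made for illustration only.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_rtc(samples, dim, lam=0.01, eta=0.05, epochs=200, seed=0):
    """SGD for the shared vector w and one user-specific vector per user.

    samples: list of (user, r_ui, v_ui) with r_ui a list of `dim` floats.
    """
    rng = random.Random(seed)
    w = [0.0] * dim
    w_users = {}
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for user, r, v in data:
            wu = w_users.setdefault(user, [0.0] * dim)
            # Prediction error e_ui = v_ui - (w + w_u) . r_ui
            err = v - sum((w[j] + wu[j]) * r[j] for j in range(dim))
            for j in range(dim):
                w[j] += eta * (err * r[j] - lam * w[j])
                wu[j] += eta * (err * r[j] - lam * wu[j])
    return w, w_users

# Toy data: features are [bias, "good", "bad"]; the strict user rates lower.
samples = [
    ("strict", [1.0, 1.0, 0.0], 3.0),
    ("strict", [1.0, 0.0, 1.0], 1.0),
    ("lenient", [1.0, 1.0, 0.0], 5.0),
    ("lenient", [1.0, 0.0, 1.0], 3.0),
]
w, w_users = train_rtc(samples, dim=3)

good_review = [1.0, 1.0, 0.0]
strict_good = dot([a + b for a, b in zip(w, w_users["strict"])], good_review)
lenient_good = dot([a + b for a, b in zip(w, w_users["lenient"])], good_review)
```

Because every sample updates w while only a user's own samples update w_u, the shared vector pools evidence across users and the user-specific vector absorbs each user's systematic deviation.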
3.3. UPRRP Based on the User-Item Rating Matrix
In recommender systems (RS), the key to personalized modeling and recommendation is to predict the missing ratings in the user-item rating matrix (UIRM) from its historical ratings. The mainstream recommendation approach is collaborative filtering (CF), which mainly includes two types of methods: the K-nearest-neighbor (KNN) method based on user similarity or item similarity, and the matrix factorization (MF) method based on the latent factor model.
KNN-based RRP includes KNN based on user similarity and KNN based on item similarity. The ideas behind these two methods are basically the same. Since our goal is to achieve RRP by mining the user's personalized information, we adopt the KNN method based on user similarity.
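For readers unfamiliar with user-based KNN, the sketch below shows one common variant. The similarity measure is not specified at this point in the text, so cosine similarity over co-rated items is an assumption, as is the normalized similarity-weighted average used for the prediction; the function names and toy ratings are hypothetical.

```python
import math

def cosine_similarity(ratings_a, ratings_b):
    """Cosine similarity over items both users rated (0.0 if undefined)."""
    common = set(ratings_a) & set(ratings_b)
    if not common:
        return 0.0
    num = sum(ratings_a[i] * ratings_b[i] for i in common)
    den_a = math.sqrt(sum(ratings_a[i] ** 2 for i in common))
    den_b = math.sqrt(sum(ratings_b[i] ** 2 for i in common))
    if den_a == 0 or den_b == 0:
        return 0.0
    return num / (den_a * den_b)

def k_nearest_neighbors(user, item, ratings, k):
    """Top-k most similar users who rated `item`, as (similarity, user) pairs."""
    candidates = [
        (cosine_similarity(ratings[user], ratings[other]), other)
        for other in ratings
        if other != user and item in ratings[other]
    ]
    return sorted(candidates, reverse=True)[:k]

def knn_predict(user, item, ratings, k=2):
    """Similarity-weighted average of the neighbours' ratings of `item`."""
    neighbors = k_nearest_neighbors(user, item, ratings, k)
    total = sum(s for s, _ in neighbors)
    if total == 0:
        return None  # no usable neighbour
    return sum(s * ratings[v][item] for s, v in neighbors) / total

# Toy user-item ratings: u1 has not rated item "c".
ratings = {
    "u1": {"a": 5.0, "b": 1.0},
    "u2": {"a": 5.0, "b": 1.0, "c": 4.0},
    "u3": {"a": 1.0, "b": 5.0, "c": 2.0},
}
prediction = knn_predict("u1", "c", ratings, k=2)
```

Here u2 rates exactly like u1 and therefore dominates the prediction, while the dissimilar u3 pulls it down only slightly.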
RRP based on matrix factorization is the most popular method in RS. The core idea of the algorithm is to first find latent factors related to the user's personalized preferences and then associate the users with the items through these latent factors. By mining the user's personalized information in this way, the user's rating of the item is finally predicted.
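A minimal sketch of latent-factor matrix factorization trained by stochastic gradient descent, in the style commonly used in recommender systems. It assumes the rating is approximated by the inner product p_u · q_i with L2 regularization; the number of factors, learning rate, initialization range, and toy ratings are illustrative assumptions rather than the settings used in this paper.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_mf(ratings, n_factors=2, lam=0.02, eta=0.02, epochs=500, seed=0):
    """Learn latent vectors P[user], Q[item] so that P[u] . Q[i] ~ rating.

    ratings: list of (user, item, value) triples (the observed matrix cells).
    """
    rng = random.Random(seed)
    P, Q = {}, {}
    for u, i, _ in ratings:
        P.setdefault(u, [rng.uniform(0.5, 1.5) for _ in range(n_factors)])
        Q.setdefault(i, [rng.uniform(0.5, 1.5) for _ in range(n_factors)])
    data = list(ratings)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, v in data:
            p, q = P[u], Q[i]
            err = v - dot(p, q)
            for f in range(n_factors):
                pf, qf = p[f], q[f]  # keep old values for a simultaneous update
                p[f] += eta * (err * qf - lam * pf)
                q[f] += eta * (err * pf - lam * qf)
    return P, Q

# Toy observed ratings; (u1, c), (u2, b) and (u3, a) are missing cells.
observed = [
    ("u1", "a", 5.0), ("u1", "b", 3.0),
    ("u2", "a", 4.0), ("u2", "c", 2.0),
    ("u3", "b", 4.0), ("u3", "c", 3.0),
]
P, Q = train_mf(observed)
missing_prediction = dot(P["u1"], Q["c"])  # estimate for an unobserved cell
```

The learned vectors reproduce the observed cells approximately, and the same inner product supplies an estimate for any missing user-item pair.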
The KNN-based and MF-based methods approach RRP from different perspectives. Considering that the information they exploit is complementary, we propose a UPRRP model based on the user-item rating matrix that integrates the KNN and MF algorithms:
$$\hat{v}_{ui} = \beta \sum_{u' \in C} s_{uu'}\,v_{u'i} + (1-\beta)\,\mathbf{p}_u^{T}\mathbf{q}_i = \beta\,\mathbf{s}_u^{T}\mathbf{v}_i + (1-\beta)\,\mathbf{p}_u^{T}\mathbf{q}_i$$
Here, $\beta$ is the parameter that must be estimated, which adjusts the proportion of KNN and MF in our method. $\hat{v}_{ui}$ is the predicted rating of user $u$ for item $i$, $C$ is the set of $k$ nearest neighbors of user $u$, $s_{uu'}$ is the similarity between user $u$ and user $u'$, and $v_{u'i}$ is the rating of item $i$ by user $u'$. We define $\mathbf{s}_u$ as the $k$-dimensional vector composed of the $s_{uu'}$ and $\mathbf{v}_i$ as the $k$-dimensional vector composed of the $v_{u'i}$. $\mathbf{p}_u$ is the latent factor vector of user $u$, and $\mathbf{q}_i$ is the latent factor vector of item $i$.
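The blend can be sketched in a few lines, assuming that β weights the KNN term s_u · v_i and (1 − β) weights the MF term p_u · q_i. Which term carries β is an assumption of this sketch, and the numeric values are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict_uirm(beta, s_u, v_i, p_u, q_i):
    """Blend of a KNN term (s_u . v_i) and an MF term (p_u . q_i)."""
    knn_part = dot(s_u, v_i)   # similarity-weighted neighbour ratings
    mf_part = dot(p_u, q_i)    # latent-factor inner product
    return beta * knn_part + (1.0 - beta) * mf_part

s_u = [0.5, 0.5]   # similarities to the k = 2 nearest neighbours
v_i = [4.0, 2.0]   # those neighbours' ratings of item i
p_u = [1.0, 2.0]   # user latent factors
q_i = [2.0, 1.0]   # item latent factors

blended = predict_uirm(0.25, s_u, v_i, p_u, q_i)
```

Setting β to 0 or 1 recovers the pure MF or pure KNN prediction, so the blend can never do worse on the selection data than the better of the two when β is chosen by search.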
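To estimate the parameters $\beta$, $\mathbf{s}_u$, $\mathbf{p}_u$, and $\mathbf{q}_i$, given the neighbor rating vectors $\mathbf{v}_i$ and the observed ratings $v_{ui}$ in the training data set, we use the regularized least-squares error on the training data as the objective function:
$$\min_{\mathbf{s}_u,\,\mathbf{p}_u,\,\mathbf{q}_i} \sum_{(u,i)} \left[ \left(v_{ui} - \beta\,\mathbf{s}_u^{T}\mathbf{v}_i - (1-\beta)\,\mathbf{p}_u^{T}\mathbf{q}_i\right)^{2} + \lambda\left(\lVert\mathbf{s}_u\rVert^{2} + \lVert\mathbf{p}_u\rVert^{2} + \lVert\mathbf{q}_i\rVert^{2}\right) \right]$$
Here, $\lambda$ is the regularization coefficient, and $\lVert\mathbf{s}_u\rVert^{2}$, $\lVert\mathbf{p}_u\rVert^{2}$, and $\lVert\mathbf{q}_i\rVert^{2}$ are the regularization terms of the parameters. To estimate $\beta$, $\mathbf{s}_u$, $\mathbf{p}_u$, and $\mathbf{q}_i$, we first traverse $\beta$ from 0 to 1 in steps of 0.01 and then, for each fixed $\beta$, solve this optimization problem by applying a stochastic gradient descent algorithm to the training dataset. We learn the parameters $\mathbf{s}_u$, $\mathbf{p}_u$, and $\mathbf{q}_i$ with the following update rules (the constant factor 2 of the gradient is absorbed into $\eta$):
$$\mathbf{s}_u \leftarrow \mathbf{s}_u + \eta\left(\beta\,e_{ui}\,\mathbf{v}_i - \lambda\,\mathbf{s}_u\right)$$
$$\mathbf{p}_u \leftarrow \mathbf{p}_u + \eta\left((1-\beta)\,e_{ui}\,\mathbf{q}_i - \lambda\,\mathbf{p}_u\right)$$
$$\mathbf{q}_i \leftarrow \mathbf{q}_i + \eta\left((1-\beta)\,e_{ui}\,\mathbf{p}_u - \lambda\,\mathbf{q}_i\right)$$
Here, $e_{ui} = v_{ui} - \hat{v}_{ui}$ is the prediction error and $\eta$ is the learning rate. After obtaining $\beta$, $\mathbf{s}_u$, $\mathbf{p}_u$, and $\mathbf{q}_i$, given the neighbor rating vector $\mathbf{v}_i$, we can use $\hat{v}_{ui} = \beta\,\mathbf{s}_u^{T}\mathbf{v}_i + (1-\beta)\,\mathbf{p}_u^{T}\mathbf{q}_i$ to predict the review rating.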
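The search over β can be sketched as below. The sketch assumes the blended prediction β·(s_u · v_i) + (1 − β)·(p_u · q_i), retrains the remaining parameters by SGD for each candidate β, and keeps the β with the lowest RMSE on a selection set. The selection criterion, the hyperparameters, the function names, and the toy data are all assumptions of this sketch; the text above only specifies the 0-to-1 traversal in steps of 0.01.

```python
import math
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def train_uirm(samples, beta, k, n_factors=2, lam=0.02, eta=0.01,
               epochs=200, seed=0):
    """SGD for s_u, p_u, q_i at a fixed beta.

    samples: list of (user, item, v_i, v_ui); v_i holds k neighbour ratings.
    """
    rng = random.Random(seed)
    S, P, Q = {}, {}, {}
    for u, i, _, _ in samples:
        S.setdefault(u, [1.0 / k] * k)
        P.setdefault(u, [rng.uniform(0.5, 1.5) for _ in range(n_factors)])
        Q.setdefault(i, [rng.uniform(0.5, 1.5) for _ in range(n_factors)])
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)
        for u, i, v_i, v in data:
            s, p, q = S[u], P[u], Q[i]
            err = v - (beta * dot(s, v_i) + (1 - beta) * dot(p, q))
            for j in range(k):
                s[j] += eta * (beta * err * v_i[j] - lam * s[j])
            for f in range(n_factors):
                pf, qf = p[f], q[f]
                p[f] += eta * ((1 - beta) * err * qf - lam * pf)
                q[f] += eta * ((1 - beta) * err * pf - lam * qf)
    return S, P, Q

def rmse(samples, beta, model):
    S, P, Q = model
    sq = [(v - (beta * dot(S[u], v_i) + (1 - beta) * dot(P[u], Q[i]))) ** 2
          for u, i, v_i, v in samples]
    return math.sqrt(sum(sq) / len(sq))

def grid_search_beta(train, select, k, step=0.01):
    """Try beta = 0, step, 2*step, ..., 1 and keep the lowest-RMSE setting."""
    best = None
    n_steps = int(round(1 / step))
    for t in range(n_steps + 1):
        beta = t * step
        model = train_uirm(train, beta, k)
        score = rmse(select, beta, model)
        if best is None or score < best[0]:
            best = (score, beta, model)
    return best

# Toy data; a coarse step keeps the example fast.
train = [
    ("u1", "a", [5.0, 4.0], 5.0), ("u1", "b", [2.0, 3.0], 2.0),
    ("u2", "a", [4.0, 5.0], 4.0), ("u2", "b", [1.0, 2.0], 1.0),
]
best_score, best_beta, best_model = grid_search_beta(train, train, k=2, step=0.25)
```

With `step=0.01` the loop trains 101 models, so in a realistic setting the per-β training cost dominates and a held-out selection set would normally replace the training set used here for brevity.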
3.4. UPRRP Based on Review Text Content and the User-Item Rating Matrix
Existing RRP methods fall mainly into two types. The first type is based on review text content (RTC) and can be described as a function f1: (RTC) → (RR), where RR denotes the review rating; it ignores the relationship between the reviewers and the items. The second type is based on collaborative filtering and can be described as a function f2: (UIRM) → (RR); it exploits no information from the review text content. Review text content and the user-item rating matrix are two different information sources for obtaining users' personalized sentiment information. Based on Section 3.2 and Section 3.3, we propose a UPRRP method that integrates the review text content information and the user-item rating matrix information:
$$\hat{v}_{ui} = \alpha\,(\mathbf{w} + \mathbf{w}_u)^{T}\mathbf{r}_{ui} + (1-\alpha)\left[\beta \sum_{u' \in C} s_{uu'}\,v_{u'i} + (1-\beta)\,\mathbf{p}_u^{T}\mathbf{q}_i\right]$$
Here, $\beta$ is the parameter estimated in Section 3.3, and $\alpha$ is a parameter that needs to be estimated; it adjusts the proportion of UPRRP based on review text content and UPRRP based on the user-item rating matrix in our method. $\hat{v}_{ui}$ is the predicted rating of user $u$ for item $i$; $\mathbf{w}$ and $\mathbf{w}_u$ are the common and user-specific parameters in the UPRRP model; $\mathbf{r}_{ui}$ is the vector representation of the review text content. $C$ is the set of $k$ nearest neighbors of user $u$, $s_{uu'}$ is the similarity between user $u$ and user $u'$, and $v_{u'i}$ is the rating of item $i$ by user $u'$. $\mathbf{p}_u$ is the latent factor vector of user $u$, and $\mathbf{q}_i$ is the latent factor vector of item $i$.
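A minimal sketch of the integrated prediction, assuming that α weights the text-based term (w + w_u) · r_ui and (1 − α) weights the rating-matrix term β·(s_u · v_i) + (1 − β)·(p_u · q_i). The assignment of α and β to particular terms and all numeric values are assumptions made for illustration.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict_uprrp(alpha, beta, w, w_u, r_ui, s_u, v_i, p_u, q_i):
    """Linear blend of the text-based and rating-matrix-based predictions."""
    text_part = dot([a + b for a, b in zip(w, w_u)], r_ui)
    matrix_part = beta * dot(s_u, v_i) + (1.0 - beta) * dot(p_u, q_i)
    return alpha * text_part + (1.0 - alpha) * matrix_part

# Hypothetical parameters for one (user, item) pair.
w, w_u, r_ui = [1.0, 0.0], [0.0, 0.5], [1.0, 1.0]   # text part = 1.5
s_u, v_i = [0.5, 0.5], [4.0, 2.0]                   # KNN part  = 3.0
p_u, q_i = [1.0, 1.0], [1.0, 1.0]                   # MF part   = 2.0

combined = predict_uprrp(0.5, 0.5, w, w_u, r_ui, s_u, v_i, p_u, q_i)
```

With α = 1 the model reduces to the text-only predictor and with α = 0 to the rating-matrix predictor, so the two earlier models are special cases of the integrated one.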
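To obtain the optimal parameters $\alpha$, $\mathbf{w}$, $\mathbf{w}_u$, $\mathbf{s}_u$, $\mathbf{p}_u$, and $\mathbf{q}_i$, we minimize the regularized least-squares error on the training dataset:
$$\min \sum_{(u,i)} \left[ \left(v_{ui} - \hat{v}_{ui}\right)^{2} + \lambda\left(\lVert\mathbf{w}\rVert^{2} + \lVert\mathbf{w}_u\rVert^{2} + \lVert\mathbf{s}_u\rVert^{2} + \lVert\mathbf{p}_u\rVert^{2} + \lVert\mathbf{q}_i\rVert^{2}\right) \right], \qquad \hat{v}_{ui} = \alpha\,(\mathbf{w} + \mathbf{w}_u)^{T}\mathbf{r}_{ui} + (1-\alpha)\left[\beta\,\mathbf{s}_u^{T}\mathbf{v}_i + (1-\beta)\,\mathbf{p}_u^{T}\mathbf{q}_i\right]$$
Here, $\lambda$ is the regularization coefficient, and $\lVert\mathbf{w}\rVert^{2}$, $\lVert\mathbf{w}_u\rVert^{2}$, $\lVert\mathbf{s}_u\rVert^{2}$, $\lVert\mathbf{p}_u\rVert^{2}$, and $\lVert\mathbf{q}_i\rVert^{2}$ are the regularization terms of the parameters. To estimate $\alpha$, $\mathbf{w}$, $\mathbf{w}_u$, $\mathbf{s}_u$, $\mathbf{p}_u$, and $\mathbf{q}_i$, we first obtain the optimal parameter $\beta$ as in Section 3.3, then traverse $\alpha$ from 0 to 1 in steps of 0.01, and finally, for each fixed $\alpha$, solve this optimization problem on the training dataset with a stochastic gradient descent algorithm. We learn the parameters $\mathbf{w}$, $\mathbf{w}_u$, $\mathbf{s}_u$, $\mathbf{p}_u$, and $\mathbf{q}_i$ by applying the following update rules (the constant factor 2 of the gradient is absorbed into $\eta$):
$$\mathbf{w} \leftarrow \mathbf{w} + \eta\left(\alpha\,e_{ui}\,\mathbf{r}_{ui} - \lambda\,\mathbf{w}\right), \qquad \mathbf{w}_u \leftarrow \mathbf{w}_u + \eta\left(\alpha\,e_{ui}\,\mathbf{r}_{ui} - \lambda\,\mathbf{w}_u\right)$$
$$\mathbf{s}_u \leftarrow \mathbf{s}_u + \eta\left((1-\alpha)\,\beta\,e_{ui}\,\mathbf{v}_i - \lambda\,\mathbf{s}_u\right)$$
$$\mathbf{p}_u \leftarrow \mathbf{p}_u + \eta\left((1-\alpha)(1-\beta)\,e_{ui}\,\mathbf{q}_i - \lambda\,\mathbf{p}_u\right)$$
$$\mathbf{q}_i \leftarrow \mathbf{q}_i + \eta\left((1-\alpha)(1-\beta)\,e_{ui}\,\mathbf{p}_u - \lambda\,\mathbf{q}_i\right)$$
Here, $e_{ui} = v_{ui} - \hat{v}_{ui}$ is the prediction error and $\eta$ is the learning rate. After obtaining $\alpha$, $\mathbf{w}$, $\mathbf{w}_u$, $\mathbf{s}_u$, $\mathbf{p}_u$, and $\mathbf{q}_i$, given the review vector $\mathbf{r}_{ui}$ and the neighbor rating vector $\mathbf{v}_i$, we can use $\hat{v}_{ui}$ as defined above to predict the review rating.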
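A single stochastic gradient step for the integrated model can be sketched as follows. It assumes the blended prediction α·(w + w_u) · r_ui + (1 − α)·[β·(s_u · v_i) + (1 − β)·(p_u · q_i)] with squared-error loss and L2 regularization; the function names, default hyperparameters, and toy parameter values are hypothetical.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def predict(params, alpha, beta, r, v_i):
    """Blended prediction from the text part and the rating-matrix part."""
    w, w_u, s_u, p_u, q_i = params
    text = sum((a + b) * x for a, b, x in zip(w, w_u, r))
    matrix = beta * dot(s_u, v_i) + (1 - beta) * dot(p_u, q_i)
    return alpha * text + (1 - alpha) * matrix

def sgd_step(params, alpha, beta, r, v_i, v, eta=0.01, lam=0.01):
    """Update all parameter lists in place for one sample.

    Returns the prediction error measured before the update.
    """
    w, w_u, s_u, p_u, q_i = params
    err = v - predict(params, alpha, beta, r, v_i)
    for j in range(len(r)):
        w[j] += eta * (alpha * err * r[j] - lam * w[j])
        w_u[j] += eta * (alpha * err * r[j] - lam * w_u[j])
    for j in range(len(v_i)):
        s_u[j] += eta * ((1 - alpha) * beta * err * v_i[j] - lam * s_u[j])
    for f in range(len(p_u)):
        pf, qf = p_u[f], q_i[f]  # old values, so both updates use the same state
        p_u[f] += eta * ((1 - alpha) * (1 - beta) * err * qf - lam * pf)
        q_i[f] += eta * ((1 - alpha) * (1 - beta) * err * pf - lam * qf)
    return err

# Hypothetical starting parameters: w, w_u, s_u, p_u, q_i.
params = ([1.0, 0.0], [0.0, 0.5], [0.5, 0.5], [1.0, 1.0], [1.0, 1.0])
r, v_i, target = [1.0, 1.0], [4.0, 2.0], 4.0

err_before = sgd_step(params, 0.5, 0.5, r, v_i, target)
err_after = target - predict(params, 0.5, 0.5, r, v_i)
```

Repeating this step over shuffled training samples for each candidate α, and keeping the α with the lowest error, mirrors the traversal described above; how the best α is selected is not stated here, so that choice is left to the reader.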
Author Contributions
Conceptualization, B.W.; methodology, B.W.; software, B.C.; validation, B.C.; formal analysis, B.W.; investigation, B.W.; data curation, G.Z.; writing (original draft preparation), B.W.; writing (review and editing), B.W.; project administration, L.M.
Funding
This work was supported in part by the National Natural Science Foundation of China (61472092), the Foundation of He’nan Science Technology Committee (172102210428), the Foundation of He’nan Educational Committee (19A520032), and the Ph.D. Start-up Foundation of Pingdingshan University (PXY-BSQD-2018007).
Acknowledgments
The authors would like to thank all anonymous reviewers and editors for their helpful suggestions for the improvement of this paper.
Conflicts of Interest
The authors declare that there is no conflict of interest regarding the publication of this paper.
References
1. Piryani, R.; Madhavi, D.; Singh, V.K. Analytical mapping of opinion mining and sentiment analysis research during 2000–2015. Inf. Process. Manag. 2016, 53, 122–150.
2. Liu, B. Sentiment analysis and opinion mining. Synth. Lect. Hum. Lang. Technol. 2012, 5, 1–167.
3. Khan, F.H.; Qamar, U.; Bashir, S. A semi-supervised approach to sentiment analysis using revised sentiment strength based on SentiWordNet. Knowl. Inf. Syst. 2017, 51, 851–872.
4. Khan, F.H.; Qamar, U.; Bashir, S. eSAP: A decision support framework for enhanced sentiment analysis and polarity classification. Inf. Sci. 2016, 367, 862–873.
5. Khan, F.H.; Qamar, U.; Bashir, S. Multi-objective model selection (MOMS)-based semi-supervised framework for sentiment analysis. Cognit. Comput. 2016, 8, 614–628.
6. Khan, F.H.; Qamar, U.; Bashir, S. SWIMS: Semi-supervised subjective feature weighting and intelligent model selection for sentiment analysis. Knowl.-Based Syst. 2016, 100, 97–111.
7. Kiritchenko, S.; Zhu, X.; Mohammad, S.M. Sentiment analysis of short informal texts. J. Artif. Intell. Res. 2014, 50, 723–762.
8. Horrigan, J. “Online shopping,” Pew Internet and American Life Project Report. Pew Research Center. 2008. Available online: http://www.pewinternet.org/2008/02/13/online-shopping/.
9. Wu, Y.; Ester, M. FLAME: A probabilistic model combining aspect based opinion mining and collaborative filtering. In Proceedings of the Eighth ACM International Conference on Web Search and Data Mining, Shanghai, China, 2–6 February 2015; pp. 199–208.
10. Qu, L.; Ifrim, G.; Weikum, G. The bag-of-opinions method for review rating prediction from sparse text patterns. In Proceedings of the 23rd International Conference on Computational Linguistics, Beijing, China, 23–27 August 2010; pp. 913–921.
11. Li, F.; Liu, N.; Jin, H.; Zhao, K.; Yang, Q.; Zhu, X. Incorporating reviewer and item information for review rating prediction. In Proceedings of the Twenty-Second International Joint Conference on Artificial Intelligence, Barcelona, Spain, 16–22 July 2011; Volume 11, pp. 1820–1825.
12. Ganu, G.; Elhadad, N.; Marian, A. Beyond the Stars: Improving Rating Predictions using Review Text Content. In Proceedings of the Twelfth International Workshop on the Web and Databases, WebDB, Providence, RI, USA, 28 June 2009; Volume 9, pp. 1–6.
13. Zheng, L.; Noroozi, V.; Yu, P.S. Joint deep modeling of users and items using reviews for recommendation. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, Cambridge, UK, 6–10 February 2017; pp. 425–434.
14. Wang, H.; Lu, Y.; Zhai, C. Latent aspect rating analysis on review text data: A rating regression approach. In Proceedings of the 16th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Washington, DC, USA, 24–28 July 2010; pp. 783–792.
15. Wu, F.; Huang, Y. Personalized Microblog Sentiment Classification via Multi-Task Learning. In Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence (AAAI-16), Phoenix, AZ, USA, 12–17 February 2016; pp. 3059–3065.
16. Shi, Y.; Larson, M.; Hanjalic, A. Collaborative filtering beyond the user-item matrix: A survey of the state of the art and future challenges. ACM Comput. Surv. 2014, 47, 3.
17. Ma, H. An experimental study on implicit social recommendation. In Proceedings of the 36th International ACM SIGIR Conference on Research and Development in Information Retrieval, Dublin, Ireland, 28 July–1 August 2013; pp. 73–82.
18. Koren, Y.; Bell, R.; Volinsky, C. Matrix factorization techniques for recommender systems. Computer 2009, 42, 30–37.
19. Koren, Y. Collaborative filtering with temporal dynamics. Commun. ACM 2010, 53, 89–97.
20. Colace, F.; De Santo, M.; Greco, L.; Moscato, V.; Picariello, A. A collaborative user-centered framework for recommending items in Online Social Networks. Comput. Hum. Behav. 2015, 51, 694–704.
21. Yu, K.; Zhu, S.; Lafferty, J.; Gong, Y. Fast nonparametric matrix factorization for large-scale collaborative filtering. In Proceedings of the 32nd International ACM SIGIR Conference on Research and Development in Information Retrieval, Boston, MA, USA, 19–23 July 2009; pp. 211–218.
22. Li, P.; Wang, Z.; Ren, Z.; Bing, L.; Lam, W. Neural rating regression with abstractive tips generation for recommendation. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Tokyo, Japan, 7–11 August 2017; pp. 345–354.
23. Pang, B.; Lee, L. Seeing stars: Exploiting class relationships for sentiment categorization with respect to rating scales. In Proceedings of the 43rd Annual Meeting on Association for Computational Linguistics, Ann Arbor, MI, USA, 25–30 June 2005; pp. 115–124.
24. Liu, J.; Seneff, S. Review sentiment scoring via a parse-and-paraphrase paradigm. In Proceedings of the 2009 Conference on Empirical Methods in Natural Language Processing, Singapore, 6–7 August 2009; Volume 1, pp. 161–169.
25. Lee, H.C.; Lee, S.J.; Chung, Y.J. A study on the improved collaborative filtering algorithm for recommender system. In Proceedings of the 5th ACIS International Conference on Software Engineering Research, Management & Applications (SERA 2007), Busan, Korea, 20–22 August 2007; pp. 297–304.
26. Jeong, B.; Lee, J.; Cho, H. Improving memory-based collaborative filtering via similarity updating and prediction modulation. Inf. Sci. 2010, 180, 602–612.
27. Covington, P.; Adams, J.; Sargin, E. Deep neural networks for YouTube recommendations. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; pp. 191–198.
28. He, X.; Chua, T.S. Neural factorization machines for sparse predictive analytics. In Proceedings of the 40th International ACM SIGIR Conference on Research and Development in Information Retrieval, Tokyo, Japan, 7–11 August 2017; pp. 355–364.
29. Catherine, R.; Cohen, W. TransNets: Learning to Transform for Recommendation. arXiv, 2017; arXiv:1704.02298.
30. Kim, D.; Park, C.; Oh, J.; Lee, S.; Yu, H. Convolutional Matrix Factorization for Document Context-Aware Recommendation. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; ACM: New York, NY, USA, 2016.
31. Seo, S.; Huang, J.; Yang, H.; Liu, Y. Interpretable convolutional neural networks with dual local and global attention for review rating prediction. In Proceedings of the Eleventh ACM Conference on Recommender Systems, Como, Italy, 27–31 August 2017; ACM: New York, NY, USA, 2017; pp. 297–305.
32. He, X.; Chen, T.; Kan, M.Y.; Chen, X. Trirank: Review-aware explainable recommendation by modeling aspects. In Proceedings of the 24th ACM International on Conference on Information and Knowledge Management, Melbourne, Australia, 18–23 October 2015; ACM: New York, NY, USA, 2015; pp. 1661–1670.
33. Ling, G.; Lyu, M.R.; King, I. Ratings meet reviews, a combined approach to recommend. In Proceedings of the 8th ACM Conference on Recommender systems, Foster City, CA, USA, 6–10 October 2014; pp. 105–112.
34. McAuley, J.; Leskovec, J. Hidden factors and hidden topics: Understanding rating dimensions with review text. In Proceedings of the 7th ACM Conference on Recommender Systems, Hong Kong, China, 12–16 October 2013; pp. 165–172.
35. Ren, Z.; Liang, S.; Li, P.; Wang, S.; de Rijke, M. Social collaborative viewpoint regression with explainable recommendations. In Proceedings of the Tenth ACM International Conference on Web Search and Data Mining, Cambridge, UK, 6–10 February 2017.
36. Bao, Y.; Fang, H.; Zhang, J. Topicmf: Simultaneously exploiting ratings and reviews for recommendation. In Proceedings of the Twenty-Eighth AAAI Conference on Artificial Intelligence, Québec City, QC, Canada, 27–31 July 2014; pp. 2–8.
37. Diao, Q.; Qiu, M.; Wu, C.; Smola, A.J.; Jiang, J.; Wang, C. Jointly modeling aspects, ratings and sentiments for movie recommendation (JMARS). In Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, New York, NY, USA, 24–27 August 2014; pp. 193–202.
38. Jakob, N.; Weber, S.H.; Müller, M.C.; Gurevych, I. Beyond the stars: Exploiting free-text user reviews to improve the accuracy of movie recommendations. In Proceedings of the 1st International CIKM Workshop on Topic-Sentiment Analysis for Mass Opinion, Hong Kong, China, 6 November 2009; pp. 57–64.
39. Zhang, W.; Yuan, Q.; Han, J.; Wang, J. Collaborative Multi-Level Embedding Learning from Reviews for Rating Prediction. In Proceedings of the Twenty-Fifth International Joint Conference on Artificial Intelligence, New York, NY, USA, 9–15 July 2016; pp. 2986–2992.
40. Zhang, Y.; Ai, Q.; Chen, X.; Croft, W.B. Joint representation learning for top-n recommendation with heterogeneous information sources. In Proceedings of the 2017 ACM on Conference on Information and Knowledge Management, Singapore, 6–10 November 2017; pp. 1449–1458.
41. Gong, L.; Al Boni, M.; Wang, H. Modeling social norms evolution for personalized sentiment classification. In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics, Berlin, Germany, 7–12 August 2016; Volume 1, pp. 855–865.
© 2018 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).