**Preface to "Information Retrieval and Social Media Mining"**

Many of today's businesses are taking advantage of advances in information retrieval and social media mining methods to increase their profits. These techniques allow them to personalize the products or services they offer their customers as well as to extract information from social networks to know user behavior, opinions, and sentiments, which can be exploited for multiple purposes.

This book aims to provide insight into the progress made in the field of information retrieval and social media mining by presenting new contributions representative of the most recent research directions. These contributions are focused on three highly topical areas: recommender systems, social media analysis, and sentiment analysis.

Since the first recommender systems appeared in the 1990s, this area has attracted increasing research interest. Many methods have been proposed to provide users with personalized recommendations for products or services, although collaborative filtering (CF) is the most widespread approach. These techniques can be used alone or combined with other methods in hybrid approaches to tackle some problems that are specific to CF. A great deal of current work addresses the improvement of user recommendations in different ways, ranging from the development of context-aware recommender systems and the evaluation of different aspects of the items to be recommended to the application of deep learning techniques, among others. Recently, the exploitation of social information has received special attention, since social networks contain valuable data relating to user behavior, relations, interests, and preferences that can contribute to improving these systems. This book includes several proposals related to these topical issues.

Social networks have become a new source of virtually unlimited information that can be exploited through data analysis techniques in many domains beyond recommender systems. Every day, their users generate, consume, and share information about preferences, tastes, opinions, activities, locations, relationships with other users, and more. The structure of these networks, their dynamics, the behavior of their users, and the flow of information can be analyzed for diverse purposes, such as the creation of user profiles, the study of social influence, the detection of implicit communities and the analysis of their evolution, and the study of information diffusion, all of which are subjects of unquestionable interest in many fields. In this task, social media mining plays a key role as the process of representing, analyzing, and extracting patterns from social media data. Some articles in the book are dedicated to the application of these techniques to social network data in order to obtain benefits in different areas of application.

Sentiment analysis and opinion mining are further areas of intensive current research in the domains of information retrieval and social media mining, with a wide range of applications. Their objective is to extract subjective information, such as positive, negative, or neutral opinions, from user-generated content through natural language processing, computational linguistics, and text mining techniques. Recently, deep learning models such as convolutional neural networks (CNNs) and recurrent neural networks (RNNs) have been used to improve their results. These methods require the text to be cleaned and transformed into numerical vectors beforehand through a preprocessing pipeline that encompasses several tasks. In the last step of this pipeline, the most widely used techniques are term frequency–inverse document frequency (TF–IDF) and word embedding; the latter is gaining increasing interest since, unlike other methods, it provides vectors that capture word context. In this regard, the development of word embedding techniques based on deep learning is also the focus of recent work. The latest articles in this book include proposals related to these topics of interest.

More detailed information on all the articles in the book is provided in the first article, entitled "Information Retrieval and Social Media Mining", which gives an overview of each of them.

> **María N. Moreno García** *Editor*

### *Editorial* **Information Retrieval and Social Media Mining**

#### **María N. Moreno-García**

Department of Computer Science and Automation, University of Salamanca, 37008 Salamanca, Spain; mmg@usal.es

Received: 9 December 2020; Accepted: 10 December 2020; Published: 11 December 2020

The large amount of digital content available through web sites, social networks, streaming services, and other distribution media allows more and more people to access virtually unlimited sources of information, products, and services. This enormous availability makes it very difficult for users to find what they are really interested in; hence the great current interest in developing personalized information retrieval methods as well as reliable recommendation algorithms that help users filter and discover what fits their preferences.

Social networks are a vast source of data from which valuable information can be extracted by means of data mining algorithms. Social media mining allows us to explore a wide range of aspects regarding users, communities, network structures, information diffusion, and so on, to be further exploited in multiple domains.

This Special Issue includes important contributions to the field of information retrieval and social media mining. Specifically, the articles published focus on three areas of research of great interest at present: recommender systems, social media analysis, and sentiment analysis.

Collaborative Filtering (CF) is the approach most extensively used in recommender systems. It requires either explicit or implicit user ratings for items to be recommended. Then, recommendations provided to a user are based on the ratings of other users with similar preferences. Usually, each item is valued globally with a single rating; however, there are application domains in which different aspects of the items are rated. In these cases, multi-criteria recommendation models are required. Among them, one of the most recent and successful proposals is the utility-based multi-criteria recommendation approach, in which different utility functions can be used to model the value of an item from the perspective of a user. In this issue, an improvement of these models is presented in a proposal [1] that addresses user over-/under-expectations on items through penalty-enhanced models. These involve penalties in the range of [−1, 1] for over-expectations and under-expectations that are added to the utility score and are learned in conjunction with expectations in the same optimization process used to generate the top-N recommendations by maximizing the normalized discounted cumulative gain.

Sometimes, collaborative filtering methods are combined with content-based approaches to solve some problems of the former and obtain more reliable recommendations. This combination is used in a cascade hybrid proposal for document recommendation presented in this issue [2]. A content-based method that makes use of document processing techniques and document metadata is applied first to provide an initial list of recommendations. It also uses a function that involves term frequency (*tf*) and inverse document frequency (*idf*) weights for document ranking. In a second step, collaborative filtering is used to re-rank the previous list.
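The *tf*–*idf* weighting mentioned above can be illustrated with a short sketch that scores tokenized documents against a query. The function name and input format are illustrative assumptions, not details from the cited paper; the weighting is the textbook tf × log(N/df) formulation:

```python
import math
from collections import Counter

def tfidf_scores(query_terms, documents):
    """Score each tokenized document against a query with basic tf-idf.

    Illustrative sketch only: `documents` is a list of token lists, and
    the weight of a term is (term frequency) * log(N / document frequency).
    """
    n_docs = len(documents)
    df = Counter()                      # document frequency per term
    for doc in documents:
        for term in set(doc):
            df[term] += 1
    scores = []
    for doc in documents:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term in tf:
                idf = math.log(n_docs / df[term])
                score += (tf[term] / len(doc)) * idf
        scores.append(score)
    return scores
```

Documents with a higher density of rarer query terms rank first; in the cascade design described above, the collaborative step would then re-rank this initial list.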

Research on recommender systems also benefits from the intensive work currently being done in the field of deep-learning algorithms. Deep neural networks are being used to overcome some problems associated with matrix factorization methods since they are able to better represent complex relations between users and items. However, their use is justified if the complexity of the problem or the number of instances of the training set is high. This is the scenario of a paper in this Special Issue [3], in which a graph convolutional network (GCN) algorithm called PharmaSage is proposed for providing pharmacy product cross-selling recommendations based on product feature information and sales data. The model was trained with a huge amount of real pharmaceutical data including almost a million products with complex properties and approximately 100 million sales transactions. This information is represented in a graph where each node represents a unique pharmacy product which also contains a vector encoding its descriptive data. Cross-selling for each pair of products is represented by undirected weighted edges between nodes. The GCN algorithm learns product embeddings by convolutions on aggregate neighborhood vectors. Finally, cosine similarity is applied to the output vectors to obtain recommendation scores.

Recommender systems are also an area in which social data can be exploited to improve the reliability of recommendations, thanks to the incorporation of social functionalities into recommender platforms. In [4], the concepts of trust and homophily derived from the social structure are used to deal with the neighborhood bias of some CF recommendation methods, which limits the number of items that can be recommended. Trust is derived from friendship connections and is used to determine the degree of influence between users. Homophily is inferred from structural equivalence, a property often used to identify implicit communities in social networks; this captures the homophily concept because users belonging to the same community usually share interests and preferences. The similarities between users based on trust and homophily are used to extend the neighborhood of the active user and thus increase the number of potentially recommendable items.

Social media analysis is the focus of two articles in the Special Issue. One of them [5] presents a method for detecting significant events in social networks that can positively or negatively affect users. Changes in a user's followership network are used for event detection and form the basis of a further analysis of the network dynamics. An event for a given user is considered to take place if the user experiences a follow burst or an unfollow burst in a time interval. To detect bursts, new follow/unfollow events are modeled as independent time series. Then, a time function representing the difference between the actual new follows/unfollows and the expected value for a given time is computed. A Personal Important Event (PIE) happens when the value of the function is higher than a threshold. The work also analyzes the evolution of users' follower networks and how the bursts caused by PIEs affect this evolution.

The other paper focused on social media analysis presents a study of different aspects of the interrelationship between social media usage and perceived individual social capital [6]. A systematic procedure was applied to identify 80 scientific publications, which were analyzed in order to assess the measurement techniques used for evaluating social capital, and two operational techniques were identified. Additionally, the individual measurement items were explored to analyze future replication possibilities, revealing that an appreciable percentage of items could not be replicated. The work also detected both consistencies and heterogeneity in terms of operationalization, which can be useful for future studies.

In the research domains of information retrieval and social media mining, the application of language processing approaches to analyze sentiments is gaining increasing interest. In this context, the development of word embedding techniques based on deep learning has played an important role. In fact, word embedding is involved in a contribution to this issue [7], where sentiment analysis was performed for mining and summarizing opinions while taking the context into account. The proposal, focused on news opinions, allows relevance to be determined based not only on the text of the opinions, but also on the content of the news and its context. Topic detection from the opinion texts was performed by applying a hierarchical agglomerative clustering algorithm and using two different techniques to compute text similarity, with word embedding yielding the best results. The next steps are classifying the sentences according to sentiment polarity and mapping topics to sentences. Finally, summaries were constructed after topic contextualization and sentence ranking were applied to the news content, with the topic context obtained by measuring the semantic similarity between the vocabulary associated with the topic and the news content.

We end this editorial by discussing another work that also addresses sentiment analysis [8]. In this case, the targets were questionnaire responses in telemonitoring programs that assist telemedicine patients. The aim was to monitor patients' adherence to these programs from the sentiment polarity of their responses. The work presents the complete architecture of the system, including the collection and management of questionnaires. In addition, a new approach is introduced in the sentiment analysis that allows changes in a patient's opinion to be monitored over time through the repeated administration of a questionnaire. This is achieved by obtaining the polarity as a numerical value and modelling its sequence as a time series.

**Funding:** This work has been performed within the framework of a project funded by the Junta de Castilla y León, Spain, grant number SA064G19.

**Conflicts of Interest:** The author declares no conflict of interest. The funders had no role in the writing of the manuscript.

#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).

## *Article* **Penalty-Enhanced Utility-Based Multi-Criteria Recommendations**

#### **Yong Zheng**

Department of Information Technology and Management, College of Computing Illinois Institute of Technology, Chicago, IL 60616, USA; yzheng66@iit.edu

Received: 18 October 2020; Accepted: 20 November 2020; Published: 26 November 2020

**Abstract:** Recommender systems have been successfully applied to assist decision making in multiple domains and applications. Multi-criteria recommender systems try to take user preferences on multiple criteria into consideration in order to further improve the quality of the recommendations. Most recently, the utility-based multi-criteria recommendation approach has been proposed as an effective and promising solution. However, the issue of over-/under-expectations was ignored in that approach, which may bring risks to the recommendation model. In this paper, we propose a penalty-enhanced model to alleviate this issue. Our experimental results based on multiple real-world data sets demonstrate the effectiveness of the proposed solutions. In addition, the outcomes of the proposed solution can also help explain the characteristics of the applications by observing how the issue of over-/under-expectations is treated.

**Keywords:** recommender systems; utility; multi-criteria; penalty; over-expectation; under-expectation

#### **1. Introduction**

Information retrieval and recommender systems are two solutions that alleviate the problem of information overload [1]; recommender systems can deliver personalized recommendations to end users without explicit queries. Recommender systems are usually built by learning from different types of user preferences, such as explicit ratings or implicit feedback [2,3]. In the past decades, different types of recommender systems have been proposed and developed. Multi-criteria recommender systems (MCRSs) [4] are one such type, taking user preferences on different aspects of the items into account to improve the quality of the recommendations.

MCRSs have been implemented and deployed in real-world applications, such as hotel bookings at TripAdvisor.com, movie reviews at Yahoo!Movie, and restaurant feedback at OpenTable.com. An example from OpenTable.com is shown in Figure 1. The system allows users to reserve tables at a restaurant and leave ratings on their dining experiences. When reviewing user experiences of a restaurant, we can observe the overall rating and multiple ratings on different aspects of the restaurant in Figure 1b, such as food, service, ambiance, and noise level. This is possible because the system collects each user's overall rating and multi-criteria ratings, as shown in Figure 1a. Afterwards, MCRSs can be built by taking advantage of these multi-criteria ratings in order to deliver more effective restaurant recommendations.

An example of the data in MCRSs is shown in Table 1. The overall rating refers to a user's overall rating of an item. We also have the user's ratings on multiple criteria, such as food, service, and value.

The research problem in MCRSs is straightforward. Take the task of rating prediction as an example: MCRSs predict an overall rating for a user and an item by taking advantage of the user's multi-criteria ratings on the item. In Table 1, MCRSs try to predict *U*3's overall rating on *T*1, while we do not know *U*3's multi-criteria ratings on *T*1. Usually, we need to estimate a user's multi-criteria ratings on an item and then aggregate these ratings to finally predict the overall rating. The predicted overall rating can be used as a ranking score to sort and produce the list of recommendations delivered to the user.

**Figure 1.** Example of user preferences on multiple criteria: (**a**) page of rating entry; (**b**) page of restaurant information.

**Table 1.** Example of a rating matrix from OpenTable.

Most recently, a utility-based multi-criteria recommendation approach [5] was proposed and demonstrated to be one of the most effective methods. In this approach, we assume that each user has expectations of the items which can be represented by a list of ratings on multiple criteria. Given an item, we can also estimate a user's ratings on the different aspects of the item. In this case, the similarity between the user's expectations and the multi-criteria ratings on the item can be considered the utility of the item from the perspective of the user. A user may like an item more if the similarity between the user's expectations and the user's multi-criteria ratings on the item is higher. The similarity score can therefore be used to rank the items to produce the top-N recommendations. We proposed learning these user expectations by a learning-to-rank [6,7] method, and the experimental results were effective and promising.

However, there is a drawback in this approach: the issue of over-/under-expectations, which the current utility or similarity function is not able to capture. The issue refers to the situation in which a user's ratings on an item may be over or under the user's expectations. This could result in false positives in the recommendation list and false negatives among the recommendation candidates. Take Table 2 for example: the first three rows refer to user *u*'s rating vectors on three items, while the last row refers to *u*'s expectations when selecting a restaurant to dine in. It is clear that *u*'s ratings on *T*1 are under-expectations, while his or her ratings on *T*2 are over-expectations. However, some of *u*'s ratings on *T*3 are under-expectations while others are over-expectations, which makes it difficult to decide whether the user will like *T*3. The situation could be even more complicated in the proposed utility-based multi-criteria recommendation models. A filtering strategy [8] may help alleviate the issue, but it requires pre-defined filtering rules based on domain knowledge. The challenge, therefore, is to find a general solution for the utility-based multi-criteria recommendation model that does not require domain knowledge.


**Table 2.** Example of over-/under-expectation.

In this paper, we propose to learn and apply penalties for situations of over-/under-expectation. The proposed solution is general enough to be applied in any application, and we do not need any domain knowledge to define filtering rules. The experimental results based on multiple data sets demonstrate the effectiveness of our proposed solutions.

The remainder of this paper is organized as follows. Section 2 positions the related work. Section 3 presents the utility-based multi-criteria recommendation model. Section 4 discusses our proposed solution to alleviate the issue of over-/under-expectations. Section 5 presents the experimental results, followed by the conclusions and future work in Section 6.

#### **2. Related Work**

In this section, we discuss the related work in multi-criteria recommender systems, as well as the utility-based recommendation models.

#### *2.1. Multi-Criteria Recommendations*

As mentioned before, we have both the overall rating and multi-criteria ratings in the rating data. The task in MCRSs is to predict the overall rating for a user on an item by taking advantage of the multi-criteria ratings. Usually, we need to estimate a user's multi-criteria ratings on an item and then aggregate these ratings to finally predict the overall rating, as shown in Equation (1). We use *R*<sub>0</sub> to represent the overall rating and *R*<sub>1</sub>, *R*<sub>2</sub>, ..., *R*<sub>k</sub> as the multi-criteria ratings, while *f* denotes the aggregation function.

$$R_0 = f(R_1, R_2, \dots, R_k) \tag{1}$$

Several multi-criteria recommendation algorithms have been developed to take advantage of these multi-criteria ratings. One of these methods is the heuristic approach [4,9] which utilizes the multi-criteria ratings to better calculate user-user or item-item similarities in the collaborative filtering algorithms. Another one is the model-based approach [4,10,11] which constructs a predictive model to estimate a user's overall rating on one item from the observed multi-criteria ratings. The model-based methods are usually more effective than the heuristic approach, since they are machine learning based algorithms which can even alleviate sparsity issues in the rating data.

Adomavicius et al.'s [4] linear aggregation is one of the most basic and popular models and is usually utilized as a benchmark baseline. In this approach, a user's rating on each criterion is predicted independently by using any rating prediction function from traditional recommender systems. Afterwards, a linear regression can be used as the aggregation function to estimate the overall rating from these predicted multi-criteria ratings.
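As a rough sketch of the aggregation step, the following learns a linear *f* for Equation (1) by plain batch gradient descent on squared error; the training loop, function name, and defaults are didactic assumptions, not the solver used in the cited work:

```python
def fit_linear_aggregation(criteria_ratings, overall_ratings,
                           lr=0.01, epochs=2000):
    """Learn weights w and bias b so that
    overall ~= b + sum_t w[t] * criteria[t]  (Equation (1) with linear f).

    `criteria_ratings` is a list of per-item criterion-rating vectors and
    `overall_ratings` the corresponding overall ratings. Didactic sketch.
    """
    k = len(criteria_ratings[0])
    w = [0.0] * k
    b = 0.0
    n = len(overall_ratings)
    for _ in range(epochs):
        grad_w = [0.0] * k
        grad_b = 0.0
        for x, y in zip(criteria_ratings, overall_ratings):
            pred = b + sum(w[t] * x[t] for t in range(k))
            err = pred - y                 # gradient of squared error
            grad_b += err
            for t in range(k):
                grad_w[t] += err * x[t]
        b -= lr * grad_b / n
        for t in range(k):
            w[t] -= lr * grad_w[t] / n
    return w, b
```

Given predicted criterion ratings for a candidate item, the learned weights and bias then yield the estimated overall rating used for ranking.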

One drawback of the approach above is that it ignores the correlation among the different criteria. Take restaurant recommendation on OpenTable for example: a user may not give a high rating on the criterion "value" if the user does not like the "food" in the restaurant. Researchers have tried to build more effective models by taking the correlation among criteria into consideration. The flexible mixture model [10] is one such attempt. It is a mixture model-based collaborative filtering algorithm incorporating a discovered dependency structure, in which multiple criteria can be placed on the structure connected with a user and an item by using two latent variables. We made another attempt and proposed the approach of criteria chains [11], in which we predicted the multi-criteria ratings in a sequence. The predicted preference on one criterion can be considered as context to be used to predict the preference on the next criterion. In this way, we were able to consider the correlation among the criteria in the chain.

#### *2.2. Utility-Based Recommendation Models*

According to the classification of recommender systems by Burke [12], there are five categories—collaborative models [13,14], content-based recommenders [15,16], methods which utilize demographic information [17], knowledge-based algorithms [18,19], and utility-based models [5,20,21]. Utility-based recommenders make suggestions based on a computation of the utility of each item for the user. Utility can be used to indicate how valuable an item is from the perspective of a user. The utility function may vary from data set to data set, and there is no unified function that generalizes to different domains or applications. Guttman used different transformation functions (e.g., linear, square or universal functions) for different types of attributes (e.g., continuous or discrete) in the context of online shopping [20]. Li et al. [22] defined the utility of recommending a potential link in social networks by a linear aggregation of its value, cost, and linkage likelihood. Moreover, Zihayat et al. proposed to use the aggregation of article-driven (e.g., popularity, topic distributions) and user-driven measures (e.g., clickstream, dwell time) as the utility function for news recommendations [21]. The utility-based multi-criteria recommendation model [5] discussed in the next section is an example which designs the utility function to serve multi-criteria recommendations. Different optimization methods can be applied to find the optimal solution in a utility-based recommendation model. A multi-objective optimizer [23,24] could be useful if there are multiple objectives involved in the recommendation model.

Our previous work [5] proposed and developed the utility-based multi-criteria recommendation models, but ignored the over-/under-expectation issue. In this paper, we propose improved solutions which build upon the previous model and further alleviate the issue of over-/under-expectations.

#### **3. Preliminary: Utility-Based Multi-Criteria Recommendations**

In this section, we introduce the existing utility-based multi-criteria recommendation model [5].

#### *3.1. Utility-Based Model (UBM)*

The major contribution of our previous work [5] is the design of the utility function for multi-criteria recommender systems. More specifically, the utility of an item refers to how valuable the item is from the perspective of a user. It was defined as the similarity between the vector of user expectations and the vector of user ratings on the multiple criteria (i.e., the different aspects of the items).

Assume there are *N* criteria. We use *c<sub>u</sub>* to represent the vector of user expectations for a user *u*, and *r<sub>u,i</sub>* to denote *u*'s rating vector (i.e., multi-criteria ratings) on the item *i*, as shown in Equations (2) and (3). Note that the expectation vector describes a user's expectations of their favorite items, aligned to the same criteria used in the vector *r<sub>u,i</sub>*. More specifically, *r<sup>t</sup><sub>u,i</sub>* (*t* = 1, 2, ..., *N*) refers to user *u*'s rating on the item *i* in the *t*th criterion. Accordingly, *c<sup>t</sup><sub>u</sub>* gives user *u*'s expectation of the items in terms of the *t*th criterion. The two vectors must use the same rating scale for each criterion.

$$\overrightarrow{c_u} = \langle c_u^1, c_u^2, \cdots, c_u^N \rangle \tag{2}$$

$$\overrightarrow{r_{u,i}} = \langle r_{u,i}^1, r_{u,i}^2, \cdots, r_{u,i}^N \rangle \tag{3}$$

The value of the utility can be obtained by a similarity or distance measure between the two vectors, as shown in Equation (4). The larger the utility, the more the user may like the item. Note that distance measures represent dissimilarity, since the similarity is higher when the distance is smaller.

$$Utility(u, i) = similarity(\overrightarrow{c_u}, \overrightarrow{r_{u,i}}) \tag{4}$$

Theoretically, any similarity measure can be applied in Equation (4), such as Pearson correlation or cosine similarity, or distance measures (e.g., Manhattan distance, Euclidean distance, etc.) used as dissimilarity measures. Our research delivers more insights into these measures. First of all, Pearson correlation may not be a good choice, since its values may not be reliable when the number of dimensions in the vectors is limited; in MCRSs we usually have only three or four criteria, which raises concerns about Pearson correlation. In addition, angle-based measures such as cosine similarity are not appropriate, since they may produce 100% similarity for two vectors that are parallel but have different values. As a result, distance measures can be utilized to represent dissimilarity. Any distance measure can be applied; we tried both the Manhattan and Euclidean distances and obtained better results with the Euclidean distance, so we only present results based on the Euclidean distance in this paper. The distance values are normalized to the unit scale, and we then use 1 minus the normalized distance to represent the similarity between the two vectors.
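With these choices, Equation (4) reduces to 1 minus the normalized Euclidean distance between the two vectors. A minimal sketch, assuming every criterion shares one rating scale (the `rating_scale` default is an illustrative assumption):

```python
import math

def utility(expectation, ratings, rating_scale=(1, 5)):
    """Utility as 1 minus the normalized Euclidean distance between a
    user's expectation vector and an item's multi-criteria rating vector.

    Sketch of Equation (4) with the distance-based similarity described
    in the text; all criteria are assumed to share `rating_scale`.
    """
    lo, hi = rating_scale
    n = len(expectation)
    dist = math.sqrt(sum((c - r) ** 2 for c, r in zip(expectation, ratings)))
    max_dist = math.sqrt(n) * (hi - lo)   # largest possible distance
    return 1.0 - dist / max_dist
```

Identical vectors yield a utility of 1, and vectors at opposite extremes of the scale yield 0; items are then ranked by this score.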

Therefore, the workflow in the utility-based recommendation model can be summarized as follows. We use the data in Table 1 as an example, and our task is to produce the top-N recommendations for user *U*3.

First of all, we need to make predictions on the multi-criteria ratings in order to obtain the vector of user ratings on the items, i.e., *r<sub>u,i</sub>*. In other words, we need to predict how *U*3 will rate all candidate items on the three criteria {food, service, value} in Table 1. In our work, we apply a process of independent predictions. More specifically, to predict how *U*3 will rate an item on the criterion "service", we apply a traditional recommendation algorithm to the rating matrix <user, item, service>. Accordingly, we apply the same algorithm to the other rating matrices associated with the ratings on each criterion. We use biased matrix factorization (BiasedMF) [25] as the recommendation algorithm in this step, since it is usually considered a standard and effective baseline in traditional recommender systems.

The rating prediction function of BiasedMF [25] is shown in Equation (5).

$$\hat{r}_{ui} = \mu + b_u + b_i + p_u^T q_i \tag{5}$$

*μ* refers to the global average rating, while *b<sub>u</sub>* and *b<sub>i</sub>* are the user bias and item bias, respectively. *p<sub>u</sub>* and *q<sub>i</sub>* are the latent-factor vectors representing *u* and *i*, respectively. The model learns these parameters by minimizing the sum of squared errors using stochastic gradient descent as the optimizer. *L*<sup>2</sup> norms are usually added to the loss function as regularization terms in order to alleviate overfitting. The loss function is described in Equation (6), where *λ* is the regularization rate, and *r<sub>ui</sub>* and *r̂<sub>ui</sub>* are the real and predicted ratings for the entry (*u*, *i*). The model learns from each entry (*u*, *i*) in the training set *T*. We use *p*∗, *q*∗, *b*∗ to represent the user latent-factor vectors, item latent-factor vectors, and biases, respectively, which are the parameters to be learned during optimization.

$$\underset{p_*, q_*, b_*}{\text{Minimize}} \sum_{(u,i)\in T} \left( \left(r_{ui} - \hat{r}_{ui}\right)^2 + \lambda\left(||p_u||^2 + ||q_i||^2 + b_u^2 + b_i^2\right) \right) \tag{6}$$
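Equations (5) and (6) can be trained with stochastic gradient descent as sketched below; the hyperparameter defaults and plain-Python style are didactic assumptions rather than the configuration used in our experiments:

```python
import random

def train_biasedmf(ratings, n_users, n_items, k=8, lr=0.01,
                   reg=0.02, epochs=50, seed=0):
    """BiasedMF trained by SGD on the squared-error loss of Equation (6).

    `ratings` is a list of (u, i, r) triples with integer user/item ids.
    Didactic sketch, not the exact implementation used in the paper.
    """
    rnd = random.Random(seed)
    mu = sum(r for _, _, r in ratings) / len(ratings)   # global average
    bu = [0.0] * n_users
    bi = [0.0] * n_items
    p = [[rnd.gauss(0, 0.1) for _ in range(k)] for _ in range(n_users)]
    q = [[rnd.gauss(0, 0.1) for _ in range(k)] for _ in range(n_items)]
    for _ in range(epochs):
        for u, i, r in ratings:
            pred = mu + bu[u] + bi[i] + sum(p[u][f] * q[i][f] for f in range(k))
            err = r - pred
            # SGD updates with L2 regularization, as in Equation (6)
            bu[u] += lr * (err - reg * bu[u])
            bi[i] += lr * (err - reg * bi[i])
            for f in range(k):
                puf, qif = p[u][f], q[i][f]
                p[u][f] += lr * (err * qif - reg * puf)
                q[i][f] += lr * (err * puf - reg * qif)
    def predict(u, i):
        """Equation (5): mu + user bias + item bias + dot product."""
        return mu + bu[u] + bi[i] + sum(p[u][f] * q[i][f] for f in range(k))
    return predict
```

One such model is trained per criterion matrix, and its predictions fill in the unobserved entries of the rating vectors *r<sub>u,i</sub>*.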

Once we obtain the users' predicted multi-criteria ratings on the items, we randomly initialize the expectation vector for each user, and learn these vectors by using the optimization below.

#### *3.2. Optimization*

We initialize the expectation vector for each user at the beginning. We can then use Equation (4) to calculate the utility score, which is used to rank the items and produce the top-N recommendations. Our previous work [5] learns these user expectations by maximizing the normalized discounted cumulative gain (NDCG) [26], a metric used for listwise ranking in well-known learning-to-rank methods. Assuming each user *u* has a "gain" *g<sub>ui</sub>* from being recommended an item *i*, the average discounted cumulative gain (DCG) over *N* users for a list of *J* items is defined in Equation (7).

$$DCG = \frac{1}{N} \sum\_{u=1}^{N} \sum\_{j=1}^{J} \frac{g\_{uij}}{\max(1, \log\_b j)}\tag{7}$$

where the logarithm base is a free parameter, typically between 2 and 10. A logarithm with base 2 is commonly used to ensure all positions are discounted. NDCG is the normalized version of DCG given by Equation (8), where *DCG*∗ is the ideal DCG, i.e., the maximum possible DCG.

$$NDCG = \frac{DCG}{DCG^\*} \tag{8}$$
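As a concrete illustration of Equations (7) and (8), the NDCG for a single user's ranked list can be computed as below. This is a minimal sketch under our own assumptions (the function name and the truncation at the top-k positions are illustrative), not the exact implementation used in the experiments.

```python
import math

def ndcg_at_k(gains, k=10, base=2):
    """NDCG for one user's ranked list: `gains` are the g_ui values in the
    order the items were recommended (Equations (7) and (8))."""
    def dcg(vals):
        # positions j = 1..k, each discounted by max(1, log_b j) as in Eq. (7)
        return sum(g / max(1.0, math.log(j, base))
                   for j, g in enumerate(vals[:k], start=1))
    ideal = dcg(sorted(gains, reverse=True))  # DCG*: the maximum possible DCG
    return dcg(gains) / ideal if ideal > 0 else 0.0
```

A perfectly ordered list yields NDCG = 1, and any misordering lowers the score, which is what makes NDCG a suitable fitness function for listwise learning-to-rank.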

In terms of listwise ranking, LambdaRank [27] can be applied to optimize NDCG directly. In addition, genetic and evolutionary algorithms have also been demonstrated to be effective solutions for listwise ranking optimization [28], and they have been utilized as optimizers in the area of recommender systems before [29,30]. Our previous work found particle swarm optimization (PSO) [31] to be an effective optimizer, and it is easy to implement.

The basic workflow of PSO is described in Algorithm 1. In PSO, we initialize multiple particles to search for the optimal solution, using the NDCG in Equation (8) as the fitness function. The position of each particle encodes the parameters we need to learn; in our case, the position refers to all of the user expectation vectors. At the initialization stage, we need to define the number of particles as well as the initial positions and velocities. The velocity defines how much each particle moves (i.e., how much its position changes) in each step.

**Algorithm 1:** Workflow in PSO.

```
initialization;
while t <= MaxIteration do
   for each particle do
       Calculate fitness value;
       if fitness is better than pBest then
           update pBest and its position;
       end
       if fitness is better than gBest then
           update gBest and its position;
       end
   end
   for each particle do
       update particle velocity according to Equation (9);
       update particle position according to Equation (10);
   end
   t = t + 1;
end
```
*Information* **2020**, *11*, 551

Each particle runs the algorithm with its initialized position (i.e., user expectations) and velocity. The velocity is a vector of the same size as the position vector. For each run, we calculate the fitness value, which refers to the NDCG metric in our experiments. The learning process saves a cBest value (i.e., the best NDCG for each particle *c* over multiple runs; pBest in Algorithm 1) for each particle, and a gBest value (i.e., the best NDCG achieved by the whole group of particles) for the whole group, as well as their corresponding positions. In each iteration, the process updates the velocity of each particle, as shown in Equation (9). We use *V<sub>ij,t</sub>* to denote the velocity of the *j*th bit of the position of particle *i* in the *t*th learning iteration, and *X<sub>ij,t</sub>* the value of the *j*th bit of the position of particle *i* in the *t*th iteration. *P<sub>cBest</sub>* and *P<sub>gBest</sub>* are the position vectors associated with the individual best fitness (i.e., cBest) and the global best fitness (i.e., gBest). *w<sub>t</sub>*, *α*<sub>1</sub>, *α*<sub>2</sub>, *ϕ*<sub>1</sub> and *ϕ*<sub>2</sub> are arguments to be defined in advance. In this way, each particle learns from both its own best move and the best move of the whole group in each learning iteration.

$$V\_{ij,t} = w\_t \times V\_{ij,t} + \alpha\_1 \phi\_1 \times (P\_{cBest}^j - X\_{ij,t}) + \alpha\_2 \phi\_2 \times (P\_{gBest}^j - X\_{ij,t}) \tag{9}$$

Finally, the position of each particle can be updated by Equation (10) and be used in the next learning iteration.

$$X\_{ij,t+1} = X\_{ij,t} + V\_{ij,t} \tag{10}$$
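A minimal sketch of Algorithm 1 together with the updates in Equations (9) and (10) is given below. All names and hyperparameter values are illustrative assumptions; the experiments in Section 5 use an off-the-shelf PSO implementation rather than this sketch. Here the fitness function plays the role of NDCG, and each particle position plays the role of the concatenated user expectation vectors.

```python
import random

def pso(fitness, dim, n_particles=20, max_iter=100,
        w=0.7, alpha1=1.5, alpha2=1.5, seed=0):
    """Minimal PSO maximizing `fitness`, following Algorithm 1 with
    the velocity/position updates of Equations (9) and (10)."""
    rng = random.Random(seed)
    X = [[rng.uniform(1.0, 5.0) for _ in range(dim)] for _ in range(n_particles)]
    V = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(n_particles)]
    p_best = [x[:] for x in X]                  # best position per particle (cBest)
    p_fit = [fitness(x) for x in X]
    g_idx = max(range(n_particles), key=lambda i: p_fit[i])
    g_best, g_fit = p_best[g_idx][:], p_fit[g_idx]   # group best (gBest)

    for _ in range(max_iter):
        for i in range(n_particles):
            f = fitness(X[i])
            if f > p_fit[i]:                    # update cBest and its position
                p_fit[i], p_best[i] = f, X[i][:]
            if f > g_fit:                       # update gBest and its position
                g_fit, g_best = f, X[i][:]
        for i in range(n_particles):
            for j in range(dim):
                phi1, phi2 = rng.random(), rng.random()
                V[i][j] = (w * V[i][j]
                           + alpha1 * phi1 * (p_best[i][j] - X[i][j])   # Eq. (9)
                           + alpha2 * phi2 * (g_best[j] - X[i][j]))
                X[i][j] += V[i][j]                                      # Eq. (10)
    return g_best, g_fit
```

In our setting, `fitness` would evaluate Equation (8) over the recommendation lists induced by a candidate set of user expectation vectors.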

#### **4. Penalty-Enhanced Utility-Based Multi-Criteria Recommendation Model**

In this section, we point out the issue of over-/under-expectation in the approach above, and discuss our solution, which applies a penalty in the learning process.

#### *4.1. Issue of Over-/Under-Expectations*

To better explain the issue of over-/under-expectations, we use the example shown in Table 2. The first three rows present a user *u*'s predicted rating vectors $\vec{r}\_{u,i}$ on three items, *T*<sub>1</sub>, *T*<sub>2</sub> and *T*<sub>3</sub>. The last row gives the user expectation vector $\vec{c}\_u$.

For simplicity, we use the Manhattan distance to represent the dissimilarity between two vectors. In this case, the Manhattan distance is 4 for both items *T*<sub>1</sub> and *T*<sub>2</sub>. Apparently, the ratings on item *T*<sub>2</sub> are all above the user expectations, while the ratings on *T*<sub>1</sub> are all below them. Without solving the issue of over-/under-expectations, items *T*<sub>1</sub> and *T*<sub>2</sub> will be treated equally in the item ranking. The situation can be more complicated. Take item *T*<sub>3</sub> for example: its Manhattan distance is 6, but it falls in over-expectation on the criterion "Room" and under-expectation on the other criteria. *T*<sub>3</sub> will be ranked ahead of *T*<sub>1</sub> and *T*<sub>2</sub>, but the end user may prefer *T*<sub>2</sub> rather than *T*<sub>3</sub>. As a result, there could be false positives in the recommendation list and false negatives in the list of recommendation candidates.
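The indistinguishability described above can be seen with a few lines of code. The numbers below are illustrative only (they are not the actual Table 2 entries): one item misses the expectations on every criterion, the other exceeds them on every criterion, yet their Manhattan distances from the expectation vector are identical.

```python
def manhattan(a, b):
    """Manhattan (L1) distance between a rating vector and an expectation vector."""
    return sum(abs(x - y) for x, y in zip(a, b))

# Illustrative values only: a user expectation vector over three criteria,
# e.g. [Room, Service, Value].
c_u = [4, 4, 4]
t1 = [3, 3, 2]   # every rating below the expectations (under-expectation)
t2 = [5, 5, 6]   # every rating above the expectations (over-expectation)

# Both items are the same distance from the expectations, so a purely
# distance-based utility cannot tell them apart.
print(manhattan(c_u, t1), manhattan(c_u, t2))   # -> 4 4
```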

We recognized this issue, and proposed a filtering strategy to alleviate it [8]. More specifically, we can pre-define rules for over-/under-expectations. For example, if an item falls in the situation of over-expectation, we may exclude it from the list of candidate items to be recommended. However, it is difficult to pre-define these rules without domain knowledge, since we do not know whether a user will like an item that falls in the case of over-expectation or under-expectation. In this paper, we seek solutions which are general and independent of domain knowledge.

#### *4.2. Penalty-Enhanced Models (PEMs)*

Our solution is simple and straightforward: we learn a "penalty" for each situation. We define *P<sub>over</sub>* and *P<sub>under</sub>* as the penalties for the situations of over-expectation and under-expectation, respectively. Every time we produce the utility score, we add these penalties according to whether the actual situation is over- or under-expected. The scale of *P<sub>over</sub>* and *P<sub>under</sub>* is [−1, 1], since the utility score measured by similarity falls in [0, 1]. We learn *P<sub>over</sub>* and *P<sub>under</sub>* together with the user expectations in the learning-to-rank process.

Note that, although we name it a "penalty", the value can actually be positive or negative. It is a real penalty if the value is negative, since it penalizes the utility score. Otherwise, it is a bonus which adds value to the utility score: this implies that we still accept the item and that it provides extra value in the situation of over- or under-expectation.

The remaining challenge is how to detect the situation of over- and under-expectation. We use a sign which can be computed as $\sum\_{t=1}^{N} (\vec{c}\_u^{\,t} - \vec{r}\_{u,i}^{\,t})$. The item is under-expected if the sign is positive, and over-expected if the sign is negative. We do not apply any penalties if the sign is zero.
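Putting the sign detection together with the learned penalties, the penalty-enhanced utility can be sketched as below. The function name and the example values of *P<sub>over</sub>* and *P<sub>under</sub>* are illustrative assumptions, not learned values from our experiments.

```python
def penalized_utility(utility, c_u, r_ui, p_over, p_under):
    """Add the learned penalty/bonus to a similarity-based utility score
    depending on the over-/under-expectation sign (Section 4.2).
    `p_over` and `p_under` lie in [-1, 1]; `utility` lies in [0, 1]."""
    sign = sum(c - r for c, r in zip(c_u, r_ui))  # sum_t (c_u^t - r_ui^t)
    if sign > 0:          # under-expectation
        return utility + p_under
    if sign < 0:          # over-expectation
        return utility + p_over
    return utility        # no penalty when the sign is zero

# Hypothetical learned values: a bonus for over-expectation, a penalty
# for under-expectation.
score = penalized_utility(0.8, [4, 4, 4], [5, 5, 6], p_over=0.1, p_under=-0.2)
print(score)   # over-expected item: the bonus is added to the utility
```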

A finer-grained approach is to learn these penalties for each user or each group of users, since the penalties may vary from user to user. However, learning the penalties for each user may suffer from the sparsity problem. In this paper, we use PEM+ to denote the approach in which we learn *P<sub>over</sub>* and *P<sub>under</sub>* for each group of users in our experiments, where we create the user groups by using the K-Means clustering technique [32].

#### **5. Experiments and Results**

In this section, we present our data sets, evaluation strategies and the experimental results.

#### *5.1. Data Sets and Evaluations*

We use four real-world data sets with multi-criteria ratings:


We compare the proposed PEM and PEM+ approaches with the following baseline approaches:


We apply 5-fold cross validation on these data sets, and evaluate the performance based on the top-10 recommendations by using precision and NDCG. Furthermore, we use particle swarm optimization (PSO) [35] as introduced previously. In particular, we use OMOPSO [36] from the open-source library MOEA (http://moeaframework.org). OMOPSO has been demonstrated to be one of the top-performing PSO algorithms. MOEA is an open-source library for multi-objective learning, but it can also be used for single-objective learning; we simply set up NDCG as the only objective in the library. MOEA provides built-in optimal parameters for each learning algorithm, and we use these default parameters.

In addition to the PEM approach discussed in Section 4.2, we also examine PEM+, in which we put users into different clusters and learn the penalties for each cluster of users. More specifically, we apply the classical K-Means clustering to the user-item rating matrix. We tried different values for K (K = 2, 4, 6, 8, 10), and found that the optimal values of K were 8, 6, 4 and 4 for the TripAdvisor, Yahoo!Movie, SpeedDating and ITMLearning data respectively, according to the within-cluster sum of squared errors. Since we only wanted to examine whether PEM+ can offer further improvements, we tried only small K values. The performance could be better with larger values, although we would also have more parameters to learn. In PEM+, we learn *P<sub>over</sub>* and *P<sub>under</sub>* for each cluster of users.
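For illustration, the grouping step in PEM+ can be sketched with a tiny K-Means together with the within-cluster sum of squared errors (WCSS) used to pick K. This is a self-contained toy version under our own assumptions; the experiments use the classical K-Means algorithm [32], not this sketch.

```python
import random

def kmeans(points, k, n_iter=50, seed=0):
    """Tiny K-Means over dense user rating vectors; returns (centroids, labels)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(n_iter):
        for idx, p in enumerate(points):          # assignment step
            labels[idx] = min(range(k), key=lambda c: sum(
                (a - b) ** 2 for a, b in zip(p, centroids[c])))
        for c in range(k):                        # update step
            members = [points[i] for i in range(len(points)) if labels[i] == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return centroids, labels

def wcss(points, centroids, labels):
    """Within-cluster sum of squared errors, used to compare candidate K values."""
    return sum(sum((a - b) ** 2 for a, b in zip(p, centroids[l]))
               for p, l in zip(points, labels))
```

Running `kmeans` for each candidate K and comparing the resulting `wcss` values corresponds to the K selection described above; *P<sub>over</sub>* and *P<sub>under</sub>* are then learned per cluster.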

#### *5.2. Results and Findings*

First of all, we present the results based on precision and NDCG in Figure 2. Table 3 presents the NDCG results for the utility-based recommendation models, as well as the improvements of PEM and PEM+ in comparison with UBM. We performed a paired t-test as the significance test at the 95% confidence level. We use \* to mark significant results between the proposed approaches (i.e., PEM and PEM+) and the best performing baseline method, and ◦ to indicate significant results between PEM and PEM+. Significance results based on precision are depicted in Figure 2, while the results for NDCG are described in Table 3.

We first compared the results among the baseline methods (i.e., MF, LAM, FMM, CCM and UBM). We observed that the UBM approach generally outperformed the other baseline methods in terms of both precision and NDCG. UBM produced slightly better NDCG results than FMM on the TripAdvisor and Yahoo!Movie data.

Comparing the solutions proposed in this paper (i.e., PEM and PEM+) with the baseline methods, we observed that PEM offered improvements in both precision and NDCG on all data sets except the SpeedDating data. PEM+ was also able to beat all baselines except on the SpeedDating data. We believe this failure is caused by the characteristics of the data set, which will be discussed in the next paragraph. A further comparison between PEM and PEM+ reveals that PEM+ beat PEM in NDCG on all data sets except the dating data. However, PEM+ failed to outperform PEM in precision on the Yahoo!Movie and ITMLearning data. Recall that we used NDCG as the fitness function in PSO, so the results on precision are not directly optimized. Another potential reason could be that we did not try larger K values in K-Means for PEM+.

In summary, PEM and PEM+ offered improvements over the utility-based recommendation model. The only exception was the SpeedDating data set. We did have multi-criteria ratings in this data set; however, it is a data set for people-to-people recommendations, which fall into the category of reciprocal recommendations. The nature of this data is different from the other multi-criteria rating data, which may have resulted in smaller improvements here. We observed that the NDCG even decreased when using PEM. The underlying reasons may lie in the special characteristics of reciprocal recommendations. In the context of speed dating, a successful recommendation must consider a "match" between two users. In our recommendation approach, we only considered the preferences of the users who received the recommendations, but ignored whether the recommended people would like to date the target user. This may result in a drop or in smaller improvements. A reciprocal recommendation model which also considers the dating partners [37,38] may help improve the recommendation performance.

**Figure 2.** Experimental results.


**Table 3.** Results based on normalized discounted cumulative gain (NDCG).

Our previous research [8] proposed filtering strategies to alleviate the issue of over-/under-expectations for the ITMLearning data. We chose the best filtering strategy and ran the model. It achieved an NDCG of 0.1311, which was lower than the results obtained by both PEM and PEM+. This is not surprising, since the filtering operation may mistakenly remove items that a user may like. Our penalty-based solution actually provides a softer and finer-grained way to alleviate the issue of over-/under-expectations. These results demonstrate that our solution is much more effective than the filtering strategy, not to mention that the penalty-enhanced solution does not require any domain knowledge to define the rules for filtering.

Finally, we present the learned *P<sub>over</sub>* and *P<sub>under</sub>* obtained with the PEM approach, as shown in Table 4. We observed that the penalties learned by our models varied from case to case. The "penalty" was positive for over-expectations and negative for under-expectations on the TripAdvisor, Yahoo!Movie and ITMLearning data sets. This indicates that users still liked an item if it was over-expected, so a bonus was added to the predicted score used to rank the items. The penalty was negative in the case of under-expectation, so the predicted score was penalized accordingly. The pattern in the SpeedDating data was different from the others: the penalty for over-expectation was negative, while it was positive for under-expectations. This implies that a user may not accept a recommended partner if some characteristics of the partner are over-expected. By contrast, the penalty for under-expectation was positive but close to zero, which implies that a partner is still acceptable even if the partner slightly misses the expectations in some characteristics. These results are interesting and can also help us understand more about the characteristics of each data set or domain.


#### **6. Conclusions and Future Work**

In this paper, we point out the issue of over-/under-expectations in the existing utility-based multi-criteria recommendation approach, and propose to learn penalties to alleviate this issue. Our experimental results on four real-world data sets demonstrate the effectiveness of the proposed solutions. In particular, the penalty-enhanced approach works better than the filtering strategy, and it is general enough to be applied to any data set.

However, there are still some limitations in the current work, and we will consider solutions to these issues in our future work. First of all, we define the case of over-/under-expectation for each rating entry by a user on an item, and apply the corresponding penalties. We could instead exploit a finer-grained method which applies a penalty to each bit of the rating vector (i.e., case by case for the rating on each criterion). In this case, there are more penalties to be learned, but it may further improve the models. In addition, we did not try larger K values for the K-Means clustering in the PEM+ method; other K values may deliver better results. Using PSO as the optimizer may also result in efficiency issues for large-scale data. We could use a cloud service (such as Amazon Web Services) to learn the parameters, or seek other optimization methods in the future. Finally, the penalties may be affected by other information, such as contexts [39,40] or trust information [41,42]. For example, the issue of over-/under-expectations may be severe in some contexts but negligible in others, or it may be ignored if the item was recommended by a trusted person. We will explore these alternative improvements in our future work.

**Funding:** This research received no external funding.

**Conflicts of Interest:** The author declares no conflict of interest.

#### **Abbreviations**

The following abbreviations are used in this manuscript:


#### **References**


**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2020 by the author. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (http://creativecommons.org/licenses/by/4.0/).
