Article

A Biased Proportional-Integral-Derivative-Incorporated Latent Factor Analysis Model

Jialu Sui and Jian Yin
School of Mechanical and Information Engineering, Shandong University, Weihai 264209, China
* Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(12), 5724; https://doi.org/10.3390/app11125724
Submission received: 19 May 2021 / Revised: 6 June 2021 / Accepted: 15 June 2021 / Published: 20 June 2021
(This article belongs to the Section Mechanical Engineering)

Abstract

Nowadays, as the number of items keeps increasing while the number of items a user can access is limited, the user-item preference matrices in recommendation systems are typically sparse, which leads to the data sparsity problem. The latent factor analysis (LFA) model has been proposed as a solution to this problem. As the basis of the LFA model, the singular value decomposition (SVD) model, and especially the biased SVD model, achieves good recommendation results on high-dimensional sparse (HiDS) matrices. However, it has the disadvantage of requiring many iterations before convergence. The PID-incorporated SGD-based LFA (PSL) model introduces the principle of the discrete PID controller into stochastic gradient descent (SGD), the learning algorithm of the SVD model. It solves the problem of slow convergence, but its recommendation accuracy needs to be improved. To obtain a better solution, this paper fuses the PSL model with the biased SVD model, aiming at a better recommendation result by combining their advantages and reconciling their disadvantages. Experiments show that the resulting biased PSL model performs better than traditional matrix factorization algorithms on datasets of different sizes.

1. Introduction

With the development of the Internet and the arrival of the information age, people are exposed to more and more information. However, the number of items a user is actually interested in does not grow accordingly; it accounts for only a small proportion of the total. It has become difficult to satisfy most users by letting them filter items themselves or by filtering only with item labels. In this situation, it is essential to design a recommendation system that helps users find the things they might be enthusiastic about. A recommendation system extracts each user's preference information and recommends items that fit that user's interests. Introducing a recommendation system can significantly enhance the user's experience with the software, as users can effortlessly meet their own needs. Moreover, it has great commercial value in advertising promotion and commodity sales.
The main function of a recommendation system [1,2] is to predict users' ratings through a series of calculations based on the existing scores, and then fill the user-item matrix with the predicted scores. Most entries in the matrix are vacant because the amount of rating data is far smaller than the number of users multiplied by the number of items. The recommendation system needs to fill the matrix to acquire a complete user-item matrix, which amounts to obtaining every user's score on every item.
Because the amount of information keeps increasing while the number of items each user has access to is limited, the user-item preference matrix is usually a high-dimensional sparse (HiDS) matrix. For example, the MovieLens-10 M dataset contains about 72,000 users and 10,000 movies, yet the rating data show that each user watched and rated only about 140 movies on average, giving a density of only 1.31%. Compared with the total number of movies, the proportion of movies with user ratings is very low. In this situation, cold start [3], reduced coverage, neighbor transitivity, and a series of related problems need to be solved. Data sparsity [4] brings considerable difficulty to recommendation: the user provides little information for the recommendation system to reference, yet the number of items whose scores must be predicted is quite large. In previous studies, scholars have proposed many solutions to these problems, such as singular value decomposition (SVD) [5], principal component analysis (PCA) [6], the content-boosted CF algorithm (CBCF) [7], and tree augmented naïve Bayes optimized by extended logistic regression (TAN-ELR) [8]. All of them have their pros and cons.
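As a quick arithmetic check, both quoted figures follow directly from the exact counts given later in Table 1 (71,567 users, 10,681 movies, 10,000,054 ratings):
$$ \frac{10{,}000{,}054}{71{,}567} \approx 140 \ \text{ratings per user}, \qquad \frac{10{,}000{,}054}{71{,}567 \times 10{,}681} \approx 0.0131 = 1.31\%. $$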
Aiming to devise a better solution to the data sparsity problem, this paper describes a recommendation model based on the latent factor analysis (LFA) model, a model-based technique of the collaborative filtering (CF) recommendation system [9]. It uses the idea of the biased SVD model to improve the PID-incorporated SGD-based LFA (PSL) model, combining the advantages of both. The model fuses the prediction method of the biased SVD with the instantaneous error correction method of the PSL, which remedies the slow convergence of the biased SVD model and the limited recommendation accuracy of the PSL model. Experiments show that, compared with other recommendation models, it achieves high computational efficiency as well as high prediction accuracy on HiDS matrices. The contributions of this paper are as follows:
  • It introduces a biased PSL model combining the biased SVD and the PSL model;
  • Experiments on three large datasets demonstrate that the biased PSL model can achieve highly competitive prediction accuracy for the missing data of an HiDS matrix compared to the other models.
The rest of this paper is organized as follows: Section 2 reviews the related work; Section 3 gives the preliminary knowledge underlying the algorithm; Section 4 describes the specific implementation of the algorithm; Section 5 reports the recommendation results of the algorithm and compares them with previous recommendation methods; Section 6 discusses the performance of each method; finally, Section 7 summarizes the paper and puts forward directions for future research.

2. Related Work

Recommendation systems generally fall into three main categories: (1) content-based recommendation systems [10,11], which recommend items similar to a user's previous favorites based on item content and item labels; (2) CF recommendation systems [12,13,14], which give priority to items favored by people with similar preferences, basing the recommendation on the relevance of users and items; and (3) hybrid recommendation systems [15,16], which combine multiple techniques to make the final decision.
CF algorithms can in turn be divided into three techniques: (1) memory-based CF techniques [17], in which every user is part of a group of people with similar interests, so a new user's preference can be obtained by identifying his so-called neighbors; (2) model-based CF techniques [18], which apply models such as machine learning and data mining algorithms to resolve the limitations of memory-based CF; and (3) hybrid CF techniques [19,20], which, since each recommendation technology has its limitations, combine multiple recommendation techniques to make the final recommendation.
As a model-based CF recommendation technique and the basis of the model introduced in this paper, the LFA model [21,22,23,24,25] decomposes the user-item matrix into $P_{u \times k}$ and $Q_{i \times k}$ by the SVD, where $u$ and $i$ are the numbers of users and items, respectively, and $k$ is the number of latent factors. It then obtains the final prediction scoring matrix by multiplying the two matrices. The value of $k$, the number of latent factors, is set by hand. The item matrix can be understood as the degree to which each item has a certain series of attributes, while the user matrix can be read as the degree to which each user likes those attributes. However, saying that "each latent factor represents an attribute" is just a convenience for understanding: in fact, each latent factor has no definite meaning, which makes the latent factor model more flexible and not limited to the label attributes of the items.
According to previous studies [26,27], the SVD model, and especially the biased SVD model, has been proposed to address the problems caused by data sparsity, since it performs well on the HiDS matrix. The biased SVD model takes users' rating habits and the quality of items into account to give a more targeted prediction score in accordance with the characteristics of users and items. In the Netflix Prize competition, Yehuda Koren reduced the rating error by 32% by adding the bias portion alone, and by 42% when also adding the personalization portion; personalization thus contributed only a further 10%. This illustrates the importance of the bias, which has the larger effect in improving accuracy. However, the model has a serious drawback: it requires many iterations to converge on large datasets.
In addition to the biased SVD model, according to [28,29], an LFA model that builds on the SVD model and incorporates the principle of the discrete PID controller, named PSL, has also been proposed to address data sparsity, because it has the advantage of rapid convergence. The traditional SVD can be regarded as a discrete PID controller with $K_I = K_D = 0$, so this method adds these two parameters to improve the performance of the recommendation model. However, since it assumes that everyone has the same rating standard and all items have the same quality, its recommendation accuracy leaves room for improvement.
Since both models have advantages and disadvantages, this paper introduces the principle of the biased SVD model into the PSL model to improve the recommendation effect.

3. Preliminaries

The biased PSL model proposed in this paper incorporates the benefits of the biased SVD and PSL models while also addressing their drawbacks. The following subsections introduce the SVD model and the PSL model in detail.

3.1. SVD

3.1.1. Conventional Matrix Decomposition SVD Model

The SVD used in recommendation systems differs slightly from the mathematical singular value decomposition [30]. In linear algebra, the SVD decomposes a matrix into three matrices. As shown in (1), $W$ is the original matrix, $U$ is the left singular matrix, $V$ is the right singular matrix, and $\Sigma$ is the diagonal matrix of singular values.
$$ W = U \Sigma V^{T}. \tag{1} $$
In the recommendation algorithm, however, the SVD decomposes a matrix into two matrices, $P_{u \times k}$ and $Q_{i \times k}$, the user latent factor matrix and the item latent factor matrix, respectively. It then obtains a fully populated matrix of predicted user ratings for items through the product of the two matrices, as in (2) and (3). This SVD method is the basis of the LFA.
$$ \hat{R}_{u \times i} = P Q^{T}. \tag{2} $$
$$ \hat{r}_{u,i} = \sum_{k=1}^{K} p_{u,k}\, q_{i,k}. \tag{3} $$
The recommendation system aims to minimize the error between the real score and the predicted score, so the objective function of the traditional SVD model is (4). To avoid overfitting of $P$ and $Q$ during gradient descent, a regularization term is added, where $\lambda$ denotes the regularization parameter.
$$ \min_{P,Q} \sum_{(u,i) \in R} \left( r_{ui} - \sum_{k=1}^{K} p_{u,k}\, q_{i,k} \right)^{2} + \lambda \left( \| Q_i \|^{2} + \| P_u \|^{2} \right). \tag{4} $$
The partial derivatives (5) of the objective function with respect to $P$ and $Q$ are obtained as follows.
$$ \frac{\partial C}{\partial P_{uk}} = -2 \left( r_{ui} - \sum_{k=1}^{K} p_{u,k}\, q_{i,k} \right) Q_{ki} + 2 \lambda P_{uk}, \qquad \frac{\partial C}{\partial Q_{ki}} = -2 \left( r_{ui} - \sum_{k=1}^{K} p_{u,k}\, q_{i,k} \right) P_{uk} + 2 \lambda Q_{ki}. \tag{5} $$
Then the stochastic gradient descent (SGD) method (6) is used to acquire the final solution, where $\alpha$ represents the learning rate [31].
$$ P_{uk}^{t+1} = P_{uk}^{t} + \alpha \left( \left( r_{ui} - \sum_{k=1}^{K} p_{u,k}\, q_{i,k} \right) Q_{ki} - \lambda P_{uk}^{t} \right), \qquad Q_{ki}^{t+1} = Q_{ki}^{t} + \alpha \left( \left( r_{ui} - \sum_{k=1}^{K} p_{u,k}\, q_{i,k} \right) P_{uk} - \lambda Q_{ki}^{t} \right). \tag{6} $$
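To make the update rule (6) concrete, here is a minimal Python sketch of one SGD epoch over the observed ratings; all names (ratings, P, Q, lr, reg) are illustrative and not taken from the paper's code.

import numpy as np

def svd_sgd_epoch(ratings, P, Q, lr=0.01, reg=0.05):
    # ratings: iterable of (u, i, r) triples over the observed entries;
    # P: |U| x K user factor matrix, Q: |I| x K item factor matrix.
    for u, i, r in ratings:
        err = r - P[u] @ Q[i]                    # instantaneous error r_ui - p_u . q_i
        pu = P[u].copy()                         # cache p_u so both updates use old values
        P[u] += lr * (err * Q[i] - reg * P[u])   # update rule (6) for P
        Q[i] += lr * (err * pu - reg * Q[i])     # update rule (6) for Q
    return P, Q

Each epoch touches every observed rating once, which is the source of the per-iteration cost proportional to |R| and K noted later in Section 4.2.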

3.1.2. Biased SVD Model

Obviously, there are many subjective factors in the scoring process. Different people have different scoring habits: some like to give high scores, while others always give low scores. In addition, the quality of items varies, and high-quality items naturally receive higher scores than low-quality ones. Under these circumstances, it is necessary to introduce bias into the SVD model. The biased SVD model captures the characteristics of the user and the item by adding an item bias and a user bias. Compared with the traditional SVD algorithm, the biased SVD algorithm performs better because it takes personalization into account. The prediction formula of the biased SVD model is shown in (7), where $b_u$ and $b_i$ denote the user bias and item bias, representing the characteristic values of users and items, respectively, and $\mu$ denotes the average score over all users.
$$ \hat{r}_{u,i} = \sum_{k=1}^{K} p_{u,k}\, q_{i,k} + \mu + b_i + b_u. \tag{7} $$
Given the predicted score, the objective function can be written as (8). Since $b_i$ and $b_u$ are also variables that must be updated during iteration, their regularization terms need to be added as well.
$$ \min_{P,Q,b_u,b_i} \sum_{(u,i) \in R} \left( r_{ui} - \sum_{k=1}^{K} p_{u,k}\, q_{i,k} - \mu - b_i - b_u \right)^{2} + \lambda \left( \| Q_i \|^{2} + \| P_u \|^{2} + b_u^{2} + b_i^{2} \right). \tag{8} $$
In the biased SVD model, since $b_i$ and $b_u$ have no product relationship with $P$ and $Q$, the iterative method for $P$ and $Q$ is the same as in the traditional SVD. Taking the derivative of the objective function yields the iterative formulas (9) for $b_u$ and $b_i$, where $\alpha$ denotes the learning rate.
$$ b_{u}^{t+1} = b_{u}^{t} + \alpha \left( r_{ui} - \hat{r}_{u,i} - \lambda b_{u}^{t} \right), \qquad b_{i}^{t+1} = b_{i}^{t} + \alpha \left( r_{ui} - \hat{r}_{u,i} - \lambda b_{i}^{t} \right). \tag{9} $$
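The change relative to the plain SVD loop above is small. A minimal sketch of one biased SVD epoch, with (7) as the prediction rule and (9) as the bias updates (variable names again illustrative), might look as follows.

def biased_svd_sgd_epoch(ratings, P, Q, bu, bi, mu, lr=0.01, reg=0.05):
    # bu: per-user bias vector, bi: per-item bias vector, mu: global mean rating.
    for u, i, r in ratings:
        pred = P[u] @ Q[i] + mu + bu[u] + bi[i]    # prediction rule (7)
        err = r - pred
        bu[u] += lr * (err - reg * bu[u])          # bias updates, rule (9)
        bi[i] += lr * (err - reg * bi[i])
        pu = P[u].copy()
        P[u] += lr * (err * Q[i] - reg * P[u])     # factor updates as in plain SVD
        Q[i] += lr * (err * pu - reg * Q[i])
    return P, Q, bu, bi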

3.2. PSL Model

The PSL model incorporates the principle of the PID controller into the LFA model, which improves the convergence speed. Accordingly, before describing the PSL model, the principle of the PID controller is introduced first.

3.2.1. PID Controller

The principle of PID control [32,33] is to compute the instantaneous error, i.e., the difference between the true value and the predicted value, and then correct this error according to its proportional (P), integral (I), and derivative (D) terms. Since the update points in an HiDS matrix are discrete, the discrete PID controller is the appropriate form. A schematic diagram of the discrete PID controller is shown in Figure 1, where TV and PV denote the true and predicted values, respectively.
As shown in Figure 1, the controller first computes the error between TV and PV, i.e., the instantaneous error, and then feeds it into the three modules P, I, and D to perform the calculation in (10). This formula yields the adjusted error, which is finally used to update the predicted value in place of the instantaneous error.
$$ \tilde{E}_{t} = K_P E_t + K_I \sum_{n=0}^{t} E_n + K_D \left( E_t - E_{t-1} \right). \tag{10} $$
Here $K_P$, $K_I$, and $K_D$ are the control coefficients of the proportional, integral, and derivative terms, respectively, and $E_t$ is the instantaneous error at the $t$-th update point. The PV is reconstructed from the adjusted error and returned to the controller. This mechanism continues until the termination condition is met, i.e., the LFA converges.
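In code, the discrete controller of (10) reduces to a few lines of per-error state. A minimal sketch (the function name is illustrative; the default gains are the values used later in Section 5.3):

def pid_adjust(err, err_sum, prev_err, KP=1.44, KI=0.002, KD=0.001):
    # Formula (10): proportional term + integral of all past errors
    # + derivative (change of the error since the previous update point).
    err_sum += err                                  # running integral of errors
    adjusted = KP * err + KI * err_sum + KD * (err - prev_err)
    return adjusted, err_sum, err                   # current err becomes prev_err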

3.2.2. PSL Model

To converge, the SGD-based LFA model requires many iterations, and the overall time cost can be considerable, so the convergence process must be accelerated. At each update point of the SGD-based LFA model, the instantaneous error between the actual value $r_{ui}$ and the predicted value $\hat{r}_{ui}$ is measured and fed back to the algorithm. This procedure can therefore be regarded as a generalized discrete PID controller with $K_I = K_D = 0$, i.e., one that omits the integral and derivative terms. From this standpoint, restoring these two terms should help the model perform better and accelerate convergence. Based on this observation, the PSL model was introduced. Its main idea is to reconstruct the instantaneous error according to the PID principle and then feed the reconstructed error into the SGD algorithm to accelerate the convergence of the LFA model. In the PSL model, following the concept of the PID controller, the integral and derivative terms extend the calculation of the error, as shown in (11) and (12).
$$ \tau_{u,i} = r_{u,i} - \sum_{k=1}^{K} p_{u,k}\, q_{i,k}. \tag{11} $$
$$ \tilde{\tau}_{ui}^{t} = K_P \tau_{ui}^{t} + K_I \sum_{n=0}^{t} \tau_{ui}^{n} + K_D \left( \tau_{ui}^{t} - \tau_{ui}^{t-1} \right). \tag{12} $$
  • $\sum_{n=0}^{t} \tau_{ui}^{n}$ is the sum of the historical errors $\tau_{ui}$;
  • $\tau_{ui}^{t} - \tau_{ui}^{t-1}$ is the discrepancy between the current error and the previous one.
The objective function (13) of the PSL model is deduced according to the above formulas.
$$ \min_{P,Q} \sum_{(u,i) \in R} \tilde{\tau}_{ui}^{2} + \lambda \left( \| Q_i \|^{2} + \| P_u \|^{2} \right). \tag{13} $$
By taking the derivative of the objective function, the following iterative formulas (14) for $P$ and $Q$ are obtained.
$$ P_{uk}^{t+1} = P_{uk}^{t} + \alpha \left( \tilde{\tau}_{ui} Q_{ki} - \lambda P_{uk}^{t} \right), \qquad Q_{ki}^{t+1} = Q_{ki}^{t} + \alpha \left( \tilde{\tau}_{ui} P_{uk} - \lambda Q_{ki}^{t} \right). \tag{14} $$

4. Materials and Methods

This paper proposes a biased PSL model that combines the biased SVD model with the PSL model. It addresses the facts that the PSL model does not consider personalization and that the biased SVD model requires many iterations to converge. The inputs of the model are the user-item matrix; the coefficients $K_P$, $K_I$, and $K_D$ of the discrete PID controller; the average score $\mu$ of all users; and the numbers of users $U$ and items $I$. The outputs are the user latent factor matrix $P$ and the item latent factor matrix $Q$, and the final prediction scoring matrix is obtained by multiplying these two matrices.

4.1. Algorithm Description

According to [26,27,28,29], both the PSL model and the biased SVD model help improve the effectiveness of a recommendation system. Therefore, this paper presents a new model combining the two to test whether it provides better recommendation results. In this method, the instantaneous error, i.e., the difference between the true value and the predicted value, is derived from the biased SVD formula, and this error is then fed into the PSL model for an iterative solution.
The predicted value is computed exactly as in the biased SVD model (7). The instantaneous error is then passed to the PID controller to obtain the adjusted error used in the iteration toward the final predicted value. In this way, the rapid convergence and few iterations of the PSL model are combined with the high accuracy of the biased SVD model. Substituting the prediction formula into the PSL model yields (15).
$$ \tau_{u,i} = r_{u,i} - \sum_{k=1}^{K} p_{u,k}\, q_{i,k} - \mu - b_i - b_u. \tag{15} $$
Then $\tilde{\tau}_{u,i}$ is calculated by Formula (12) and substituted into the following objective function (16). The regularization term also changes because of the addition of $b_u$ and $b_i$.
$$ \min_{P,Q,b_u,b_i} \sum_{(u,i) \in R} \tilde{\tau}_{u,i}^{2} + \lambda \left( \| Q_i \|^{2} + \| P_u \|^{2} + b_u^{2} + b_i^{2} \right). \tag{16} $$
The formulas in (17) give the iterative updates of $b_u$ and $b_i$ derived from the objective function; the iterative formulas of $P$ and $Q$ are the same as those of the PSL model.
$$ b_{u}^{t+1} = b_{u}^{t} + \alpha \left( \tilde{\tau}_{u,i} - \lambda b_{u}^{t} \right), \qquad b_{i}^{t+1} = b_{i}^{t} + \alpha \left( \tilde{\tau}_{u,i} - \lambda b_{i}^{t} \right). \tag{17} $$
The pseudocode of the algorithm is shown in Algorithm 1.
Algorithm 1: Bias-PSLSVD
Require: U, I, R, K, λ, α, K_P, K_I, K_D, S, μ
Ensure: P, Q
init P_{|U|×K}, Q_{|I|×K} with random numbers in [−0.01, 0.01]
init b_u, b_i with 0
init Ψ, Y with size |R|
init n = 0
for each r_{u,i} in R do
      init Ψ_{u,i} = 0, Y_{u,i} = 0
end for
while not converged and n ≤ S do
      for each r_{u,i} in R do
            fetch Ψ_{u,i} from Ψ and Y_{u,i} from Y
            r̂_{u,i} = Σ_{k=1}^{K} p_{u,k} q_{i,k} + μ + b_i + b_u
            τ_{u,i} = r_{u,i} − r̂_{u,i}
            Ψ_{u,i} = Ψ_{u,i} + τ_{u,i}
            τ̃ = K_P · τ_{u,i} + K_I · Ψ_{u,i} + K_D · (τ_{u,i} − Y_{u,i})
            Y_{u,i} = τ_{u,i}
            b_u = b_u + α(τ̃ − λ b_u)
            b_i = b_i + α(τ̃ − λ b_i)
            for k = 1 to K do
                  p_{u,k} = p_{u,k} + α(τ̃ q_{i,k} − λ p_{u,k})
                  q_{i,k} = q_{i,k} + α(τ̃ p_{u,k} − λ q_{i,k})
            end for
      end for
      n = n + 1
end while
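For readers who prefer running code, the following is a compact Python sketch of Algorithm 1 under the same notation. It is a sketch rather than the authors' released implementation: the convergence test is reduced to a fixed iteration budget S, and the dictionaries psi and y play the roles of Ψ and Y.

import numpy as np

def bias_psl_train(ratings, n_users, n_items, K=50, lr=0.01, reg=0.05,
                   KP=1.44, KI=0.002, KD=0.001, S=100, seed=0):
    # ratings: list of (u, i, r) triples for the observed entries of R.
    rng = np.random.default_rng(seed)
    P = rng.uniform(-0.01, 0.01, (n_users, K))         # user latent factors
    Q = rng.uniform(-0.01, 0.01, (n_items, K))         # item latent factors
    bu, bi = np.zeros(n_users), np.zeros(n_items)      # user / item biases
    mu = sum(r for _, _, r in ratings) / len(ratings)  # global mean rating
    psi = {(u, i): 0.0 for u, i, _ in ratings}         # Ψ: integral of errors
    y = {(u, i): 0.0 for u, i, _ in ratings}           # Y: previous error
    for _ in range(S):
        for u, i, r in ratings:
            pred = P[u] @ Q[i] + mu + bu[u] + bi[i]    # biased prediction (7)
            tau = r - pred                             # instantaneous error (15)
            psi[u, i] += tau
            tau_adj = (KP * tau + KI * psi[u, i]
                       + KD * (tau - y[u, i]))         # PID-adjusted error (12)
            y[u, i] = tau
            bu[u] += lr * (tau_adj - reg * bu[u])      # bias updates (17)
            bi[i] += lr * (tau_adj - reg * bi[i])
            pu = P[u].copy()
            P[u] += lr * (tau_adj * Q[i] - reg * P[u]) # factor updates (14)
            Q[i] += lr * (tau_adj * pu - reg * Q[i])
    return P, Q, bu, bi, mu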

4.2. Algorithm Analysis

Since this algorithm is mainly built on the framework of the PSL model, its computational cost is quite similar to that of the PSL algorithm. In terms of storage, it keeps two more sequences ($b_u$ and $b_i$) than the PSL algorithm; apart from these, nothing changes. This storage scheme is suitable for practical development, as it keeps the algorithm concise and comprehensible. The storage complexity is given as $S_{BiasPSL}$ in (18), where $|U|$ and $|I|$ are the numbers of users and items, respectively, $K$ is the number of latent factors, and $|R|$ is the amount of rating data.
$$ S_{BiasPSL} = \left( |U| + |I| \right) \left( K + 1 \right) + 2 |R|. \tag{18} $$
The corresponding computational cost is given as $T_{BiasPSL}$ in (19), where $step$ is the number of iterations of the recommendation algorithm.
$$ T_{BiasPSL} = step \cdot |R| \cdot K. \tag{19} $$
It can be seen that the time complexity of the algorithm is mainly determined by the number of iterations, the amount of data and the number of latent factors.

5. Experiment Results

5.1. General Settings

The main function of the LFA model is to decompose the scoring matrix with missing data into two latent factor matrices $P$ and $Q$, the user latent factor matrix and the item latent factor matrix, and finally to compute their product $PQ^{T}$ to obtain a complete scoring matrix. Given the complete scoring matrix, we remove the items each user has already rated and rank the remaining items by predicted score; naturally, the items with high scores are recommended first. Accurate item ratings are therefore the key to the recommendation system. Since the ultimate purpose of the LFA model is to recover the missing data in the user-item matrix, the main way to judge a model's recommendation effect is to evaluate the difference between its predicted values and the real values. The commonly used measures are the root mean square error (RMSE) [34] and the mean absolute error (MAE) [35]. In Formula (20), $r_{ui}$ and $\hat{r}_{ui}$ denote the real score and the predicted score, respectively, and $|R|$ denotes the amount of rating data.
$$ RMSE = \sqrt{ \sum_{(u,i) \in R} \left( r_{ui} - \hat{r}_{ui} \right)^{2} / |R| }, \qquad MAE = \sum_{(u,i) \in R} \left| r_{ui} - \hat{r}_{ui} \right| / |R|. \tag{20} $$
From these formulas, it follows that the smaller a model's RMSE and MAE values, the better its recommendation effect.
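A direct NumPy transcription of (20), assuming two aligned arrays of real and predicted scores over the rated entries:

import numpy as np

def rmse_mae(r_true, r_pred):
    # Formula (20): RMSE penalizes large errors more heavily than MAE,
    # which weighs all errors linearly; both average over the |R| rated entries.
    r_true, r_pred = np.asarray(r_true), np.asarray(r_pred)
    rmse = np.sqrt(np.mean((r_true - r_pred) ** 2))
    mae = np.mean(np.abs(r_true - r_pred))
    return rmse, mae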

5.2. Dataset

The experiments use datasets from the MovieLens [36] series, collected from the MovieLens system maintained by the GroupLens research team, with scores on an integer scale of 1 to 5. To explore the recommendation effect of the biased PSL model on datasets of different sizes and densities, the experiments select three subsets of the data of different sizes; their details are provided in Table 1.

5.3. Model Comparison Test

In this paper, the biased PSL model is compared with the traditional SVD model, the biased SVD model and the PSL model.
The learning rate of all algorithms is fixed at 0.01, the regularization parameter at 0.05, $K_P$ at 1.44, $K_I$ at 0.002, and $K_D$ at 0.001. The emphasis is on comparing the recommendation effects of the different algorithms on datasets of different sizes with different numbers of latent factors (50 and 100).
All models are implemented on top of Surprise [37], a Python scikit for building and analyzing recommender systems that deal with explicit rating data.
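For reference, a minimal Surprise run of the SVD baselines under the stated hyperparameters might look as follows; note that the PSL variants are custom models built on top of Surprise rather than part of its built-in algorithms.

from surprise import SVD, Dataset
from surprise.model_selection import cross_validate

# MovieLens-100K is available as a Surprise built-in dataset.
data = Dataset.load_builtin("ml-100k")

# biased=False gives the traditional SVD baseline; biased=True the biased SVD.
algo = SVD(n_factors=50, n_epochs=20, biased=True, lr_all=0.01, reg_all=0.05)

# Five-fold cross-validation reporting RMSE and MAE, as in Section 5.4.
cross_validate(algo, data, measures=["RMSE", "MAE"], cv=5, verbose=True)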
The models involved in the experiments are as follows:
  • SVD model (the traditional SVD model): It decomposes the user-item matrix directly;
  • Biased SVD model: It introduces the user and item bias into the SVD model, taking the individuation into account;
  • PSL model: It introduces the principle of the discrete PID controller into the SGD and applies the PID parameters to the traditional SVD model;
  • Biased PSL model: It integrates the biased SVD model and the PSL model.

5.4. Results

The experiments adopt an 80-20% train-test split and apply five-fold cross-validation to obtain objective results. The average training processes of the compared models are shown in Figure 2, Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7.
The performance of the four models on the datasets is shown below in Table 2, Table 3 and Table 4.

6. Discussion

According to Figure 2, Figure 3, Figure 4, Figure 5, Figure 6 and Figure 7, the RMSE and MAE initially decrease as the number of iterations increases; once over-fitting occurs, however, they rise again. In this experiment, the lowest RMSE and MAE values before over-fitting are chosen for comparison. From these results, we have the following findings:
The biased PSL model proposed in this paper can combine the advantages of the biased SVD model and the PSL model.
According to Table 2, Table 3 and Table 4, the biased PSL model outperforms the SVD model, the biased SVD model, and the PSL model on each dataset and for each number of latent factors. Among the three datasets, the smallest (100 K) best reflects the advantages of the biased PSL model. As shown in Table 2, the RMSE of the biased PSL model decreased by 0.0227 compared with the traditional SVD model, the model with the worst recommendation effect. Even where the difference is not extraordinarily large, its result is still superior to those of the biased SVD model and the PSL model. The data in Table 2, Table 3 and Table 4 demonstrate that combining the biased SVD model with the PSL model results in improved performance.

7. Conclusions

This paper combines the prediction formula of the biased SVD model with the PSL model, designing a biased PSL model that achieves significantly higher computational efficiency as well as highly competitive prediction accuracy. The method fuses the prediction method of the biased SVD with the instantaneous error correction method of the PSL. The biased SVD formula takes into account the scoring habits of different people and the differing quality of items, i.e., personalization, to give the most appropriate score for each user and item. The PSL model integrates the principle of the discrete PID controller into the SGD-based LFA model, correcting the error between the real and predicted scores according to its proportional (P), integral (I), and derivative (D) terms. The biased PSL model blends the advantages of the two models to obtain a better recommendation effect. In the experiments, we use three datasets of different sizes and densities to test the recommendation performance of the biased PSL model against other recommendation models (the SVD, biased SVD, and PSL models), evaluating the prediction effect in detail through RMSE and MAE. The experimental results show that the biased PSL model performs better than the SVD and PSL models on all three HiDS matrices.
Regarding future work, we hope to determine the best parameters for a specific dataset. This model is mainly based on the PSL model, whose principle is the discrete PID controller, so its prediction results are strongly affected by the controller's parameters. Owing to the wide ranges of $K_P$, $K_I$, and $K_D$, it is difficult to know whether a given setting is the most suitable for a dataset; a system that could find the best parameters for a specific dataset would yield the best recommendation effect. Furthermore, despite needing fewer iterations, the PSL algorithm takes longer per iteration than the SVD algorithm because of the extra vector updates in each pass. Its capacity to run on huge datasets is therefore restricted, as a single training session takes longer. Since new users and items are introduced into a recommendation system regularly, a lengthy training period is not conducive to deploying the model; it is difficult to call a recommendation system outstanding if it cannot retrain easily and make initial suggestions as new users and items arrive. Besides, this paper only attempts to combine the PSL model with the biased SVD model in the hope of a better recommendation effect; no attempt has yet been made to combine the PSL model with other recommendation models. In the future, we plan to integrate the PSL model with other recommendation models and test their recommendation performance.

Author Contributions

Conceptualization, J.S.; methodology, J.S.; software, J.S.; validation, J.S.; formal analysis, J.S.; writing—original draft preparation, J.S.; writing—review and editing, J.Y.; supervision, J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the National Natural Science Foundation of China under Grant 61971268.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available online at https://github.com/littlebeen/Bias-PSL (accessed on 15 June 2021), and the datasets are available online at https://grouplens.org/datasets/movielens/ (accessed on 15 June 2021).

Acknowledgments

We thank the National Natural Science Foundation of China for funding our work, grant number 61971268.

Conflicts of Interest

The authors declare no conflict of interest.

Abbreviations

The following abbreviations are used in this manuscript:
LFA      Latent Factor Analysis
SVD      Singular Value Decomposition
PSL      PID-incorporated SGD-based LFA
SGD      Stochastic Gradient Descent
CF       Collaborative Filtering
PCA      Principal Component Analysis
CBCF     Content-Boosted CF Algorithm
TAN-ELR  Tree Augmented Naïve Bayes optimized by Extended Logistic Regression
PID      Proportional-Integral-Derivative
RMSE     Root Mean Square Error
MAE      Mean Absolute Error

References

  1. Abdollahpouri, H.; Malthouse, E.C.; Konstan, J.A.; Mobasher, B.; Gilbert, J. Toward the Next Generation of News Recommender Systems. arXiv 2021, arXiv:2103.06909.
  2. Li, Z. Towards the next generation of multi-criteria recommender systems. In Proceedings of the 12th ACM Conference on Recommender Systems, RecSys 2018, Vancouver, BC, Canada, 2–7 October 2018; Pera, S., Ekstrand, M.D., Amatriain, X., O’Donovan, J., Eds.; ACM: New York, NY, USA, 2018; pp. 553–557.
  3. Shao, Y.; Xie, Y. Research on Cold-Start Problem of Collaborative Filtering Algorithm. In Proceedings of the 3rd International Conference on Big Data Research, ICBDR 2019, Cergy-Pontoise, France, 20–22 November 2019; ACM: New York, NY, USA, 2019; pp. 67–71.
  4. Yin, H.; Wang, Q.; Zheng, K.; Li, Z.; Zhou, X. Overcoming Data Sparsity in Group Recommendation. arXiv 2020, arXiv:2010.00813.
  5. Takács, G.; Pilászy, I.; Németh, B.; Tikk, D. Scalable Collaborative Filtering Approaches for Large Recommender Systems. J. Mach. Learn. Res. 2009, 10, 623–656.
  6. Yazdani, S.; Shanbehzadeh, J.; Shalmani, M.T.M. RPCA: A Novel Preprocessing Method for PCA. Adv. Artif. Intell. 2012, 2012, 484595:1–484595:7.
  7. Lin, H.; Yang, X.; Wang, W. A Content-Boosted Collaborative Filtering Algorithm for Personalized Training in Interpretation of Radiological Imaging. J. Digit. Imaging 2014, 27, 449–456.
  8. Greiner, R.; Su, X.; Shen, B.; Zhou, W. Structural Extension to Logistic Regression: Discriminative Parameter Learning of Belief Net Classifiers. Mach. Learn. 2005, 59, 297–322.
  9. Su, X.; Khoshgoftaar, T.M. A Survey of Collaborative Filtering Techniques. Adv. Artif. Intell. 2009, 2009, 421425:1–421425:19.
  10. Balabanović, M.; Shoham, Y. Fab: Content-Based, Collaborative Recommendation. Commun. ACM 1997, 40, 66–72.
  11. Liu, H.; Kong, X.; Bai, X.; Wang, W.; Bekele, T.M.; Xia, F. Context-Based Collaborative Filtering for Citation Recommendation. IEEE Access 2015, 3, 1695–1703.
  12. Gogna, A.; Majumdar, A. A Comprehensive Recommender System Model: Improving Accuracy for Both Warm and Cold Start Users. IEEE Access 2015, 3, 2803–2813.
  13. Sarwar, B.M.; Karypis, G.; Konstan, J.A.; Riedl, J. Item-based collaborative filtering recommendation algorithms. In Proceedings of the Tenth International World Wide Web Conference, WWW 10, Hong Kong, China, 1–5 May 2001; Shen, V.Y., Saito, N., Lyu, M.R., Zurko, M.E., Eds.; ACM: New York, NY, USA, 2001; pp. 285–295.
  14. Wu, Y.; Wei, J.; Yin, J.; Liu, X.; Zhang, J. Deep Collaborative Filtering Based on Outer Product. IEEE Access 2020, 8, 85567–85574.
  15. Nguyen, S.; Kwak, H.; Lee, S.; Gim, G.Y. Featured Hybrid Recommendation System Using Stochastic Gradient Descent. Int. J. Netw. Distrib. Comput. 2021, 9, 25–32.
  16. Cai, X.; Hu, Z.; Zhao, P.; Zhang, W.; Chen, J. A hybrid recommendation system with many-objective evolutionary algorithm. Expert Syst. Appl. 2020, 159, 113648.
  17. Petrusel, M. A Comparative Analysis of Similarity Measures in Memory-Based Collaborative Filtering. In Artificial Intelligence and Soft Computing—19th International Conference, ICAISC 2020, Zakopane, Poland, 12–14 October 2020, Proceedings, Part II; Rutkowski, L., Scherer, R., Korytkowski, M., Pedrycz, W., Tadeusiewicz, R., Zurada, J.M., Eds.; Lecture Notes in Computer Science; Springer: Berlin/Heidelberg, Germany, 2020; Volume 12416, pp. 140–151.
  18. Yang, F.; Liu, F.; Liu, S. Collaborative Filtering Based on a Variational Gaussian Mixture Model. Future Internet 2021, 13, 37.
  19. Yue, W.; Wang, Z.; Tian, B.; Pook, M.; Liu, X. A Hybrid Model- and Memory-Based Collaborative Filtering Algorithm for Baseline Data Prediction of Friedreich’s Ataxia Patients. IEEE Trans. Ind. Informat. 2021, 17, 1428–1437.
  20. Wang, X.; Dai, Z.; Li, H.; Yang, J. Research on Hybrid Collaborative Filtering Recommendation Algorithm Based on the Time Effect and Sentiment Analysis. Complexity 2021, 2021, 6635202:1–6635202:11.
  21. van den Oord, A.; Dieleman, S.; Schrauwen, B. Deep content-based music recommendation. In Advances in Neural Information Processing Systems 26: 27th Annual Conference on Neural Information Processing Systems 2013, Lake Tahoe, NV, USA, 5–8 December 2013; Burges, C.J.C., Bottou, L., Ghahramani, Z., Weinberger, K.Q., Eds.; 2013; pp. 2643–2651.
  22. Luo, X.; Zhou, M.; Li, S.; You, Z.; Xia, Y.; Zhu, Q. A Nonnegative Latent Factor Model for Large-Scale Sparse Matrices in Recommender Systems via Alternating Direction Method. IEEE Trans. Neural Netw. Learn. Syst. 2016, 27, 579–592.
  23. Costa, F.S.D.; Dolog, P. Convolutional Adversarial Latent Factor Model for Recommender System. In Proceedings of the Thirty-Second International Florida Artificial Intelligence Research Society Conference, Sarasota, FL, USA, 19–22 May 2019; Barták, R., Brawner, K.W., Eds.; AAAI Press: Palo Alto, CA, USA, 2019; pp. 419–424.
  24. Mongia, A.; Jhamb, N.; Chouzenoux, É.; Majumdar, A. Deep latent factor model for collaborative filtering. Signal Process. 2020, 169, 107366.
  25. Shang, M.; Luo, X.; Liu, Z.; Chen, J.; Yuan, Y.; Zhou, M. Randomized latent factor model for high-dimensional and sparse matrices from industrial applications. IEEE CAA J. Autom. Sinica 2019, 6, 131–141.
  26. Luo, X.; Zhou, M.; Shang, M.; Li, S.; Xia, Y. A Novel Approach to Extracting Non-Negative Latent Factors From Non-Negative Big Sparse Matrices. IEEE Access 2016, 4, 2649–2655.
  27. Gogna, A.; Majumdar, A. SVD free matrix completion with online bias correction for Recommender systems. In Proceedings of the 2015 Eighth International Conference on Advances in Pattern Recognition (ICAPR), Kolkata, India, 4–7 January 2015; pp. 1–5.
  28. Li, J.; Yuan, Y.; Ruan, T.; Chen, J.; Luo, X. A proportional-integral-derivative-incorporated stochastic gradient descent-based latent factor analysis model. Neurocomputing 2021, 427, 29–39.
  29. Li, J.; Yuan, Y. A Nonlinear Proportional Integral Derivative-Incorporated Stochastic Gradient Descent-based Latent Factor Model. In Proceedings of the 2020 IEEE International Conference on Systems, Man, and Cybernetics, SMC 2020, Toronto, ON, Canada, 11–14 October 2020; pp. 2371–2376.
  30. Kalman, D. A Singularly Valuable Decomposition: The SVD of a Matrix. Coll. Math. J. 1996, 27, 2–23.
  31. Huang, K.; Sidiropoulos, N.D.; Swami, A. Non-Negative Matrix Factorization Revisited: Uniqueness and Algorithm for Symmetric Decomposition. IEEE Trans. Signal Process. 2014, 62, 211–224.
  32. Najariyan, M.; Zhao, Y. Granular fuzzy PID controller. Expert Syst. Appl. 2021, 167, 114182.
  33. Yu, H.; Guan, Z.; Chen, T.; Yamamoto, T. Design of data-driven PID controllers with adaptive updating rules. Automatica 2020, 121, 109185.
  34. Shekhar, S.; Xiong, H. Root-Mean-Square Error. In Encyclopedia of GIS; Shekhar, S., Xiong, H., Eds.; Springer US: Boston, MA, USA, 2008; p. 979.
  35. Sammut, C.; Webb, G.I. (Eds.) Mean Absolute Error. In Encyclopedia of Machine Learning and Data Mining; Springer US: Boston, MA, USA, 2017; p. 806.
  36. Harper, F.M.; Konstan, J.A. The MovieLens Datasets: History and Context. ACM Trans. Interact. Intell. Syst. 2015, 5.
  37. Hug, N. Surprise: A Python library for recommender systems. J. Open Source Softw. 2020, 5, 2174.
Figure 1. Flowchart of the PID controller.
Figure 2. Training process of compared models on MovieLens-100 K with 50 latent factors.
Figure 3. Training process of compared models on MovieLens-100 K with 100 latent factors.
Figure 4. Training process of compared models on MovieLens-1 M with 50 latent factors.
Figure 5. Training process of compared models on MovieLens-1 M with 100 latent factors.
Figure 6. Training process of compared models on MovieLens-10 M with 50 latent factors.
Figure 7. Training process of compared models on MovieLens-10 M with 100 latent factors.
Table 1. Datasets of different sizes.

Dataset   Users    Items     Ratings      Density
100 K     943      1682      100,000      6.30%
1 M       6040     3952      1,000,209    4.19%
10 M      71,567   10,681    10,000,054   1.31%

Table 2. The prediction results on MovieLens-100 K.

Method     K = 50            K = 100
           RMSE     MAE      RMSE     MAE
SVD        0.9368   0.7386   0.9340   0.7365
BiasSVD    0.9272   0.7298   0.9198   0.7215
PSL        0.9299   0.7307   0.9189   0.7214
BiasPSL    0.9157   0.7186   0.9035   0.7065

Table 3. The prediction results on MovieLens-1 M.

Method     K = 50            K = 100
           RMSE     MAE      RMSE     MAE
SVD        0.8625   0.6817   0.8603   0.6805
BiasSVD    0.8566   0.6733   0.8540   0.6715
PSL        0.8635   0.6796   0.8589   0.6758
BiasPSL    0.8506   0.6666   0.8459   0.6627

Table 4. The prediction results on MovieLens-10 M.

Method     K = 50            K = 100
           RMSE     MAE      RMSE     MAE
SVD        0.7973   0.6174   0.7943   0.6152
BiasSVD    0.7916   0.6089   0.7906   0.6084
PSL        0.7993   0.6134   0.7957   0.6108
BiasPSL    0.7885   0.6040   0.7862   0.6024
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
