Article

Sequential Movie Genre Prediction Using Average Transition Probability with Clustering

School of Computing, Gachon University, 1342 Seongnamdaero, Sujeong-gu, Seongnam-si 13120, Korea
*
Author to whom correspondence should be addressed.
Appl. Sci. 2021, 11(24), 11841; https://doi.org/10.3390/app112411841
Submission received: 2 November 2021 / Revised: 8 December 2021 / Accepted: 10 December 2021 / Published: 13 December 2021

Abstract

In recent movie recommendation, one of the most important issues is predicting the user's sequential behavior so as to suggest the next movie to watch. However, capturing such sequential behavior is not easy because each user's short-term and long-term behaviors must be taken into account. For this reason, many studies report poor performance when recommending a specific movie in a sequential setting. In this paper, we propose a cluster-based method for grouping users with similar movie purchase patterns, together with an algorithm that predicts the genre of the next movie rather than the movie itself, considering users' short-term and long-term behaviors. Genre prediction does not recommend a specific movie; instead, it predicts the genre of the next movie to watch, based on each user's preference for the genres included in the movies they have seen. Using this, it becomes possible to provide appropriate guidelines for recommending movies that include these genres to users who prefer specific genres. In particular, in this study, users with similar genre preferences are organized into clusters to recommend genres. For clusters that do not show clear tendencies, genre prediction is performed after appropriately trimming genres that are unnecessary for recommendation, in order to improve performance. We evaluate our method on well-known movie data sets and qualitatively determine that it captures personalized dynamics and is able to make meaningful recommendations.


1. Introduction

One of the most important parts of a recommendation system is to model the interactions between users and items as well as the relationships amongst the items themselves [1]. The former is usually called item-to-user recommendation and the latter is called item-to-item recommendation. In a sequential recommendation, user preferences and sequential patterns can be extracted based on these two kinds of interactions. However, learning the personalized sequential behavior from collaborative data is not an easy task since the long-term and short-term dynamics of the users have to be carefully considered for both personalization and sequential transitions. In particular, if data are sparse, estimating the parameters of learning models becomes difficult. For this reason, the main objective is to design personalized models of user behavior using user purchase histories for sequential recommendation systems [2]. In determining the short-term dynamics of a user’s behavior, approaches based on the Markov chain (MC) assume that the next action depends on only the previous action. Since the last item is, in general, the key factor affecting the user’s next action, first-order MC-based methods show strong performance, especially on sparse data sets [3]. Recurrent neural networks (RNNs) have been used for the long-term dynamics of user behavior. RNNs model all previous actions with a hidden state, which is then used to predict the next action [4]. Both approaches, while strong in specific cases, are limited to certain types of data. MC-based approaches work well with high sparsity settings, but are not as successful in determining the intricate dynamics of long-term complex scenarios. In contrast, RNNs work well in long-term scenarios, which require large amounts of data.
Recently, session-based recommendation systems (SBRSs) have been investigated [5] as a new approach in recommendation systems. Distinct from other collaborative-filtering (CF)-based recommendation systems, SBRSs usually model long-term, static user preferences. SBRSs try to capture the short-term, dynamic user preferences for more sensitive and accurate recommendations during the evolution of session contexts. Unlike purchasing one specific item, the authors in [6] considered a sequential recommendation scenario in which users purchase similar items simultaneously. In this scenario, the main objective of the recommendation is to propose a personalized “list of items” to each user. Similar to this approach, attribute prediction for movie data can also be considered. For instance, the genre/category of a movie can be an important attribute of user/item similarity in recommendation systems. This information is provided when new content is created. Based on this, the authors in [7] consider a movie recommendation system, based on genre correlations, to improve the existing genre correlation algorithm, and compare the results obtained with those of the previous algorithm. In [8], the authors introduce a recommendation system using movie genre similarity and preferred genres. However, in these studies, only a static situation is considered, not a sequentially changing recommendation system.
In this paper, we consider the prediction of the movie genres included in preferred movies before recommending movies (see Figure 1). The genre is one of the important features of a movie, as it indicates which movies each user prefers. In our sequential setting, we extract the genres included in the movies each user watched, study the genre preference, and predict which movie genres the user will watch next. Our genre prediction algorithm does not predict just one genre preferred by each user but multiple genres simultaneously. Hence, it will be possible to recommend movies containing several genres, so the method can be combined appropriately with existing movie recommendation techniques. In addition, if the recommendation system recommends movies using these genres, it can immediately recommend popular or new movies corresponding to the genres that users prefer. This method can be an alternative for solving the cold-start problem that appears in common collaborative filtering movie recommendation systems.
Our main contributions are described as follows:
  • First, unlike sequential movie recommendation, which recommends the movie itself, we study which genre the user would prefer based on the user's past sequential movie selection pattern. Although it cannot be used directly in sequential movie recommendation systems, it can show how well genre-based prediction works in learning user preferences.
  • Second, to analyze the sequential pattern of the genres, one of the main attributes, given data on the sequential pattern of the movies themselves, we introduce an average transition probability vector (ATV) between genres, based on a Markov chain, to reflect the short-term behavior of the user's preference in RNN-based models. To see the effect of the ATV, we consider four kinds of training data that combine the genre vectors with the ATV.
  • Third, a proper clustering approach is adopted based on k-means clustering to group similar preferences for movie genres. To improve the prediction performance, a method is proposed to properly trim genres that perform poorly using the results of the RNN-based sequential learning models presented above.
  • Finally, we evaluate our method on well-known movie data sets, and qualitatively determine that it captures personalized dynamics and is able to make meaningful recommendations for movie genres. The results show that clustering with trimming improves the prediction performance, whereas applying ATV provides negligible performance improvement.
The remainder of this paper is organized as follows. In Section 2, we discuss related studies. In Section 3, our clustering and training methods are presented. In Section 4, the experimental results of our proposed methods are presented, and some limitations and future work are discussed in Section 5. In Section 6, we conclude the paper.

2. Related Works

In recent sequential recommendation systems, most studies have focused on predicting the short-term and long-term preference dynamics of users. The Markov chain approach has been studied as a short-term dynamic approach. Zimdars et al. [9] described a sequential recommender based on Markov chains. They studied how to capture sequential patterns to predict the next state with a standard predictor such as a decision tree. Rendle et al. [6] proposed a factorizing personalized Markov chain (FPMC) to model the history of a basket based on user-specific transitions of a Markov chain. FPMC propagates information among users, items, and transitions favoring similar patterns to extract the sequential pattern. Shani et al. [10] considered a recommendation system based on Markov decision processes (MDPs). For this, they used maximum likelihood estimates (MLE) of MC transition graphs and suggested several heuristic approaches such as clustering and skipping. Mobasher et al. [11] adopted pattern mining methods to extract sequential patterns for generating recommendations. He et al. [3] proposed a translation-based recommendation (TransRec) for sequential data. Their approach considers items (movies) as translation vectors. Khorasani et al. [12] used MC to recommend courses taken by students. They estimated the transition probability of the MC from the record of courses students take based on MLE and enhanced MLE with skip-gram modeling. Koren [13] considered the temporal dynamics of the evolution of users and items over time, based on Netflix data.
For the long-term dynamics, most recommendations rely on matrix factorization (MF) or other similarity-based approaches. In previous work [14], the authors used MF to formulate the recommendation problem as a problem that infers missing values from a partially observed user–item matrix. Srebro et al. [15] proposed a maximum margin MF, which uses low-norm instead of low-rank factorization. Salakhutdinov et al. [16] considered a probabilistic MF (PMF) model that expresses the user preference matrix as the product of two lower-rank user and item matrices. The PMF approach is especially effective in making better predictions from sparse user rating data. He et al. [1] suggested an extended FPMC, called Fossil, to present the information of sequential patterns by considering high-order Markov chains and similarity models. Factorization machine-based sequential recommendation systems usually utilize matrix or tensor factorization to factorize the observed user–item-related data into latent factors of users and items for recommendations [4]. Specifically, some works [17,18] have used the estimated latent representations as input of a network to further calculate an interaction score between users and items or successive users’ actions.
Recently, deep learning technologies, such as RNNs [4,19,20], long short-term memory (LSTM) [21,22], and the Gated Recurrent Unit (GRU) [23], have been applied to the sequential recommendation problem. These deep-learning-based recommendation systems have performed particularly well in sequential recommendations. In [4], the authors suggested new ranking loss functions corresponding to RNNs in the recommendation model. In [19], the authors designed a novel recommendation model, based on recurrent collaborative filtering (RCF), which combines RNN and CF. In [20], the authors introduced an algorithm called the recurrent translation-based network (RTN). Their model reflects both short-term and long-term user preferences. In [21], the authors considered LSTM to extract the dependencies of both users and movies. Unlike prior recommendation models, they considered a method of updating the state with recent operations as input. In [22], the authors introduced the LSIC model, leveraging long- and short-term information in content-aware movie recommendation via adversarial training, which combines the global behaviors from MF with an RNN for the top-N movies.
In [23], the authors considered a GRU-based RNN for session-based recommendations. Yuan et al. [24] suggested a convolutional neural network (CNN) that models a sequence of user–item interactions. In their model, a CNN initially stores the user–item interactions in a matrix, regarding the matrix as an “image” in the time and latent spaces. Wu et al. [25] proposed a GNN to capture the sequential behavior of complex transitions over user–item interactions. Zhang et al. [26] adopted a self-attention mechanism to extract the item–item interactions from the user's historical interactions. Sachdeva et al. [27] considered a variational autoencoder to model user preferences based on historical sequential data, combining latent variables with temporal dependencies for preference modeling. Similar to our work, Choi et al. [7] designed a movie recommendation algorithm based on genre correlations. For this, they assumed that movie genres are defined by experts, such as directors or producers, to guarantee reliability. They then computed genre correlations and used them in a movie recommendation system. In [8], the authors also considered movie genre similarity to provide related services in a mobile experimental environment.

3. Genre Prediction Algorithm

In this section, we will propose a movie genre prediction algorithm. To do this, we first classify the genres included in the movie data watched by each user as shown in Figure 2. Next, we cluster the users into similar groups based on the ratings of the movies. Then, the average transition probability is estimated from genre to genre for each cluster, which is subsequently used to train deep learning models. Because sparse data of genres may cause poor performance in predicting the genres, the genres are appropriately trimmed after model training. Finally, a preferred genre is predicted from the group closest to the user. Based on the predicted genre, suitable movies that contain the genre can be recommended. The above steps are detailed in the following subsections.

3.1. Data Preprocessing

For sequential movie genre prediction, we assume data in which the genres included in each movie are given. (If genre information is missing from the movie data, we consider the case where the genre can be estimated by applying existing genre classification techniques [28,29].) Based on this, we sort the movie data by user ID and timestamp to extract each user's movie sequence in chronological order (left part of Figure 3). In preprocessing, we drop users with fewer than five movies in their viewing sequences and keep users with five or more; the five most recent movies per user are then arranged in chronological order. Next, the information on the genres included in the movies the user has watched is organized (middle part of Figure 3). A single movie can contain several genres simultaneously. We extract all genres that each film contains and convert each element of the sequence to an $n$-dimensional ($n > 0$) binary genre vector, with one dimension per genre (right part of Figure 3). We denote by $G$ the set of genres in this paper.
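For concreteness, the following is a minimal sketch of this preprocessing step in Python, assuming MovieLens-style ratings.csv and movies.csv files; the file names, column layout, and helper names are illustrative, not part of the original method:

```python
import pandas as pd

# Minimal sketch of the preprocessing in Section 3.1; file names and the
# MovieLens column layout (userId, movieId, rating, timestamp; genres as
# "A|B|C") are assumptions for illustration.
ratings = pd.read_csv("ratings.csv")
movies = pd.read_csv("movies.csv")

GENRES = sorted({g for gs in movies["genres"] for g in gs.split("|")})

def genre_vector(genres_str):
    """Map a pipe-separated genre string to an n-dimensional 0/1 vector."""
    present = set(genres_str.split("|"))
    return [1 if g in present else 0 for g in GENRES]

movies["gvec"] = movies["genres"].apply(genre_vector)

# Sort each user's history chronologically, keep users with at least five
# movies, and retain only the five most recent movies per user.
ratings = ratings.sort_values(["userId", "timestamp"])
counts = ratings.groupby("userId")["movieId"].transform("count")
recent = ratings[counts >= 5].groupby("userId").tail(5)

# Per-user chronological sequence of genre vectors.
sequences = (recent.merge(movies[["movieId", "gvec"]], on="movieId")
                   .sort_values(["userId", "timestamp"])
                   .groupby("userId")["gvec"]
                   .apply(list))
```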

3.2. Clustering

To reflect user similarity, we adopt a clustering approach. We consider that each user assigns a rating between 0.5 and 5 to each of the five most recent movies they watched. Based on the rating sequences of each user, we obtain the average rating of each genre, as shown in Figure 4. Let $U$ be the set of users; since one average-rating vector is generated per user, we obtain a $|U| \times n$ rating matrix. Using this matrix, we apply k-means clustering to obtain the clusters. We let $C := \{C_1, \ldots, C_k\}$ be the set of clusters $C_l$, $1 \le l \le k$, after performing the clustering.
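A minimal sketch of this clustering step, using scikit-learn's KMeans, might look as follows; the toy user_ratings structure, mapping each user to (rating, genre vector) pairs, is an assumption for illustration (the experiments in Section 4.3 use seven clusters on 19 genres):

```python
import numpy as np
from sklearn.cluster import KMeans

# Sketch of Section 3.2: build the |U| x n matrix of average ratings per
# genre and cluster it with k-means.
def avg_genre_ratings(user_ratings, n_genres):
    X = np.zeros((len(user_ratings), n_genres))
    for row, pairs in enumerate(user_ratings.values()):
        sums, cnts = np.zeros(n_genres), np.zeros(n_genres)
        for rating, gvec in pairs:
            gvec = np.asarray(gvec, dtype=float)
            sums += rating * gvec   # accumulate ratings per genre
            cnts += gvec            # count how often each genre occurs
        X[row] = np.divide(sums, cnts, out=np.zeros(n_genres), where=cnts > 0)
    return X

# Toy example with n = 3 genres and two users; the paper uses k = 7.
user_ratings = {
    "u1": [(5.0, [1, 0, 1]), (3.0, [0, 1, 0])],
    "u2": [(4.0, [1, 1, 0])],
}
X = avg_genre_ratings(user_ratings, n_genres=3)
labels = KMeans(n_clusters=2, random_state=0).fit_predict(X)
```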

3.3. Average Transition Probability of Genres

3.3.1. Markov Chain for Set of Genres

In our genre prediction system, we use transition probabilities from genre to genre. In many approaches to sequential recommendation, an MC is used to reflect the short-term sequential behavior of a user; the MC assumes that the next choice of item depends only on the current choice. Formally, this is described as follows. A transition probability matrix is generated for each cluster. To do this, we first consider the sequence of selected movies for each user in a cluster. However, as described before, a movie may contain multiple genres, such as romance, action, and comedy, simultaneously. We consider all genres included in the current and next movies and count them in an $n \times n$ matrix, with a separate transition matrix for each cluster. Then, for each $C_l \in C$, we sum these counts over all the users' chosen movies and normalize them to obtain the transition probability from genre to genre, as shown in Figure 5.
To describe this, we let $M_t^l$ and $M_{t-1}^l$ be the sets of movies selected by users $u \in C_l$ at times $t$ and $t-1$, respectively. Then, the transition probability of the first-order Markov chain for the movie selection in cluster $l$ is given by:

$$ p(M_t^l \mid M_{t-1}^l). \qquad (1) $$
However, in genre prediction, we focus on the transition from genre (included in a movie) to genre. For this, we let $G_t^l \subseteq G$ be the set of genres contained in the movies $M_t^l$ for users $u \in C_l$ at time $t$. Considering two genres $i, j \in G$, we model the genre transition probability in cluster $C_l$ as:

$$ p_{ij} := p(j \in G_t^l \mid i \in G_{t-1}^l). \qquad (2) $$

3.3.2. Estimation of Transition Probabilities

To make predictions using (2), the transition probability needs to be estimated. To achieve this, consider the following ratio:

$$ \hat{p}_{ij} := \hat{p}(j \in G_t^l \mid i \in G_{t-1}^l) = \frac{\hat{p}(j \in G_t^l \wedge i \in G_{t-1}^l)}{\hat{p}(i \in G_{t-1}^l)} \qquad (3) $$

$$ = \frac{\left| \{ (G_t^l, G_{t-1}^l) : j \in G_t^l \wedge i \in G_{t-1}^l \} \right|}{\left| \{ (G_t^l, G_{t-1}^l) : i \in G_{t-1}^l \} \right|}, \qquad (4) $$
where the denominator in (4) is the number of occurrences of genre $i$ at time $t-1$, and the numerator is the number of occurrences of genre $i$ at time $t-1$ followed by genre $j$ at the next time point $t$. Hence, the estimated transition probability indicates the fraction of transitions in which genre $j$ is selected at time $t$ among those with genre $i$ at time $t-1$. However, since the user does not select a specific genre but sequentially selects movies that each include several genres, the transition probability between genres cannot be used by itself. Hence, using the movie data including several genres, we count the number of transitions to each genre. For example, if a movie contains three genres (romance, action, and comedy) as in Figure 5, we count all genres included in the next selected movie. Then, we compute the ratio in (4), which gives the estimated transition probabilities from romance to each genre, from action to each genre, and from comedy to each genre, respectively. Next, we take the average of these three transition probability vectors and call it the average transition probability vector (ATV) for the selected movie. The reason we use the ATV is that actual data contain no information about transitions from one specific genre to another; only information about transitions from a movie including these genres to another movie is given. Formally, the ATV can be presented by:
$$ \hat{p}_j^{\mathrm{ATV}} := p(j \in G_t^l \mid G_{t-1}^l) = \frac{1}{|G_{t-1}^l|} \sum_{i \in G_{t-1}^l} p(j \in G_t^l \mid i \in G_{t-1}^l), \qquad (5) $$

for all $j \in G$. We will use this ATV for training with each user's selected movie sequence.
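The following sketch illustrates how the estimates in (3)–(5) could be computed for one cluster; the sequences input (per-user genre-vector sequences) and the function names are assumptions for illustration:

```python
import numpy as np

# Sketch of the estimators in (3)-(5) for one cluster. `sequences` is a
# list of per-user genre-vector sequences (0/1 arrays), as produced by the
# preprocessing step.
def transition_matrix(sequences, n_genres):
    num = np.zeros((n_genres, n_genres))  # |{t : j in G_t and i in G_{t-1}}|
    den = np.zeros(n_genres)              # |{t : i in G_{t-1}}|
    for seq in sequences:
        for prev, curr in zip(seq[:-1], seq[1:]):
            prev, curr = np.asarray(prev), np.asarray(curr)
            num += np.outer(prev, curr)   # count every (i, j) genre pair
            den += prev
    # Row i holds the estimated probabilities p_hat(i -> .) from (4).
    return np.divide(num, den[:, None], out=np.zeros_like(num),
                     where=den[:, None] > 0)

def atv(P, prev_genre_vec):
    """Average the rows of P for the genres in the previous movie, as in (5)."""
    idx = np.flatnonzero(prev_genre_vec)
    return P[idx].mean(axis=0) if len(idx) else np.zeros(P.shape[1])
```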

3.4. Model Training

3.4.1. Training Data Types

As training data, we consider the following four types during model training: (1) the sum of the transition vector and the movie genre embedding, (2) the componentwise multiplication of the transition vector and the movie genre embedding, (3) the concatenation (successive arrangement) of the transition vector and the movie genre embedding, and (4) the movie genre embedding alone.
First, the sum of the ATV and the movie genre embedding is simply the summation of the movie genre vector and the ATV, as shown in Figure 6. Second, the multiplication of the ATV and the movie genre embedding is obtained by componentwise multiplication of the two vectors, which results in a new vector. Third, the successive ATV and movie genre embedding refers to the data obtained by attaching the ATV to the end of the movie genre vector for model training. Finally, the movie-genre-only type considers only the movie genre vector. These training data types were selected to check how much the ATV, which captures short-term dynamics, improves model performance. The results for these four training data types are presented in the section on experimental results.
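A small sketch of these four combinations (the function and mode names are illustrative):

```python
import numpy as np

# Sketch of the four training-data types of Section 3.4.1, combining a
# movie's 0/1 genre vector g with its ATV p.
def make_input(g, p, mode):
    g, p = np.asarray(g, float), np.asarray(p, float)
    if mode == "sum":      # (1) Genre + ATV
        return g + p
    if mode == "mul":      # (2) Genre * ATV (componentwise)
        return g * p
    if mode == "concat":   # (3) [Genre, ATV] (successive)
        return np.concatenate([g, p])
    if mode == "genre":    # (4) Genre only
        return g
    raise ValueError(f"unknown mode: {mode}")
```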

3.4.2. Training Models

In our approach, we use RNN-based models, namely the vanilla RNN, LSTM, and GRU, to capture the long-term dynamics of sequential movie genre data. These methods are described in detail as follows:
(1) RNN. First, RNN [30] is a deep learning model that was designed to be useful for sequential data processing. RNN is a recursive model that performs the same function on all input data, and the output for the current input depends on past calculations. When the output data are generated, they are copied and sent back into the recurrent network. Based on the current input and output generated from the previous input, the RNN learns certain sequential data and makes a decision.
To formally describe the method, let $x_t$ be the input vector and $y_t$ the output vector at time $t$, as shown in Figure 7. Then, the state of the hidden layer $h_t$ at time $t$ is given by:

$$ h_t = \tanh(U x_t + W h_{t-1} + b), \qquad (6) $$

where $U$ and $W$ are model parameter matrices and $b$ is a bias vector; for the hidden activation we use the hyperbolic tangent $\tanh(\cdot)$. The output vector $y_t$ is given by:

$$ y_t = f(V h_t + b), \qquad (7) $$
where $V$ is a model parameter matrix and $f$ is an activation function. The RNN is optimized to approximate the target function by capturing sequential patterns. However, if the input sequence is long, the influence of the elements at the beginning of the sequence gradually decreases as the time steps progress, and it disappears after a certain period, because the same weight matrix is multiplied at every step. This is called the long-term dependency problem; accordingly, the RNN is useful for short sequences of data.
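As a concrete illustration of (6) and (7), a single RNN step might be sketched as follows; the shapes and the choice of a sigmoid output activation $f$ (natural for multi-label genre prediction) are assumptions:

```python
import numpy as np

# Sketch of one vanilla RNN step, Equations (6)-(7); c is the output bias.
def rnn_step(x_t, h_prev, U, W, V, b, c):
    h_t = np.tanh(U @ x_t + W @ h_prev + b)       # hidden state update (6)
    y_t = 1.0 / (1.0 + np.exp(-(V @ h_t + c)))    # output y_t = f(V h_t + c) (7)
    return h_t, y_t
```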
(2) LSTM. To overcome this main disadvantage of the RNN, LSTM [31] was introduced as an improved method. LSTM is a kind of RNN capable of selectively remembering sequences over a long period of time. The main difference from the RNN is that LSTM introduces a “cell state” for each time $t$, which allows information to flow unaltered. In LSTM, the cell state is regarded as a long-term memory because previous information is stored in it through the recursive nature of the cells. The forget gate is used to update the cell state: it determines which information to discard by outputting values close to 0 at the corresponding positions; if the output of the forget gate is 1, the information is kept in the cell. The input gate determines which information should enter the cell state. Finally, the output gate determines which information should be passed on to the next hidden state. Based on this, LSTM addresses the long-term dependency problem of the RNN. In general, LSTM consists of the following four parts, as shown in Figure 8:
(i)
Forget Gate Layer. As a first part, the forget gate layer decides which information to filter from the cell state using a sigmoid function. It takes $h_{t-1}$ and $x_t$ and outputs a number between 0 and 1 for each entry of the cell state $c_{t-1}$; 1 implies “completely keep this,” while 0 represents “completely drop this.” The forget gate vector $f_t$ is given by:

$$ f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f), \qquad (8) $$

where $\sigma$ is the sigmoid function and $W_f$ and $b_f$ are the weight matrix and bias vector parameters.
(ii)
Input Gate Layer. In the next step, LSTM decides which new information to store in the cell state. For this, an “input gate layer” decides which values to update via a sigmoid gate. Next, a tanh gate generates a vector of new candidate values, $\tilde{c}_t$, that could be added to the state. These two layers are then combined to create an update to the state. The input gate vector $i_t$ is given by:

$$ i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i), \qquad (9) $$

where $W_i$ and $b_i$ are the weight matrix and bias vector parameters. The cell input activation vector $\tilde{c}_t$ is computed by:

$$ \tilde{c}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c), \qquad (10) $$

where $W_c$ and $b_c$ are the weight matrix and bias vector parameters and $\tanh(\cdot)$ is the hyperbolic tangent function.
(iii)
Cell State Update. Next, LSTM updates the old cell state $c_{t-1}$ into the new cell state $c_t$. The previous steps have already decided what to do; this step carries it out. The old state is multiplied by $f_t$, forgetting the things decided earlier, and then $i_t * \tilde{c}_t$, the new candidate values scaled by how much each state value is to be updated, is added. The cell state $c_t$ is computed by:

$$ c_t = f_t * c_{t-1} + i_t * \tilde{c}_t. \qquad (11) $$
(iv)
Output Gate Layer. Finally, in the output gate layer, LSTM decides what information is going to be output. This output is based on the cell state, but in a filtered version. First, a sigmoid layer decides which parts of the cell state to output. Then, the cell state is put through tanh and multiplied by the output of the sigmoid gate, so that only the selected parts are output. The output gate vector $o_t$ is given by:

$$ o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o), \qquad (12) $$

where $W_o$ and $b_o$ are the weight matrix and bias vector parameters of the output gate layer. Here, $h_t$ is computed by:

$$ h_t = o_t * \tanh(c_t). \qquad (13) $$
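A minimal sketch of one LSTM step combining (8)–(13); parameter shapes and names are assumptions:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Sketch of one LSTM step, Equations (8)-(13). Each weight matrix acts on
# the concatenation [h_{t-1}, x_t].
def lstm_step(x_t, h_prev, c_prev, Wf, bf, Wi, bi, Wc, bc, Wo, bo):
    z = np.concatenate([h_prev, x_t])
    f_t = sigmoid(Wf @ z + bf)              # forget gate (8)
    i_t = sigmoid(Wi @ z + bi)              # input gate (9)
    c_tilde = np.tanh(Wc @ z + bc)          # candidate cell state (10)
    c_t = f_t * c_prev + i_t * c_tilde      # cell state update (11)
    o_t = sigmoid(Wo @ z + bo)              # output gate (12)
    h_t = o_t * np.tanh(c_t)                # hidden state (13)
    return h_t, c_t
```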
(3) GRU. Cho et al. [33] first introduced a slight variation on the LSTM named the GRU. It merges the forget and input gates into a single update gate, and it also combines the cell state and hidden state; the GRU is thus simpler than the LSTM model.
The detailed structure of the GRU, shown in Figure 9, is as follows:
(i)
Update Gate. The GRU first computes the update gate $z_t$ for time step $t$ by:

$$ z_t = \sigma(W_z \cdot [h_{t-1}, x_t]), \qquad (14) $$

where $W_z$ is a weight matrix. The input $x_t$ and the previous hidden state $h_{t-1}$ are concatenated and multiplied by the weight $W_z$; the sigmoid is commonly used as the activation function. The update gate determines how much of the past information (from previous time steps) needs to be passed along to the next step. Most usefully, the model can choose to copy all the information from the past, which eliminates the risk of the vanishing gradient problem.
(ii)
Reset Gate. Next, a reset gate is applied to decide how much of the past information to forget:

$$ r_t = \sigma(W_r \cdot [h_{t-1}, x_t]). \qquad (15) $$

It differs from the update gate in its weights and in how the gate is used. As in the update gate, $h_{t-1}$ and $x_t$ are multiplied by their corresponding weights, the results are summed, and the sigmoid function is applied.
(iii)
Current memory content. The reset gate is then used to form the current memory content, which stores the relevant information from the past. It is computed as:

$$ \tilde{h}_t = \tanh(W \cdot [r_t * h_{t-1}, x_t]), \qquad (16) $$

where $W$ is a weight matrix and the operator $*$ denotes the Hadamard (elementwise) product. The reset gate thus determines what to remove from the previous time steps; tanh is used as the nonlinear activation function.
(iv)
Final memory at current time step. As the last step, the network calculates $h_t$, the vector that holds the information for the current unit and passes it down to the network. For this, the update gate is needed: it determines what to collect from the current memory content $\tilde{h}_t$ and what to collect from the previous step $h_{t-1}$, weighting by the update gate value:

$$ h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t. \qquad (17) $$
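Analogously, one GRU step combining (14)–(17) can be sketched as follows (parameter shapes are assumptions):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Sketch of one GRU step, Equations (14)-(17).
def gru_step(x_t, h_prev, Wz, Wr, W):
    z = np.concatenate([h_prev, x_t])
    z_t = sigmoid(Wz @ z)                                       # update gate (14)
    r_t = sigmoid(Wr @ z)                                       # reset gate (15)
    h_tilde = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]))  # memory content (16)
    return (1 - z_t) * h_prev + z_t * h_tilde                   # final memory (17)
```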

3.4.3. Subgenre Trimming

Finally, using the evaluation results of the trained models, we perform a subgenre trimming process based on a predefined threshold of the evaluation metric scores for each cluster. The detailed process is described as pseudocode in Algorithm 1. For this, we first select the clusters that do not satisfy the criteria of the evaluation metrics.
Algorithm 1: Subgenre trimming.
Input: Set of movie genre matrices $M := \{M_i\}_{i=1}^{k}$ for each cluster $C_i$ with each evaluated value $P_e^i$, threshold parameters $\eta$ and $\theta$.
Output: Subgenre-trimmed matrix set $M' := \{M'_1, \ldots, M'_k\}$.
Set $M'_i = \emptyset$ for all $1 \le i \le k$;
for $1 \le i \le k$ do
    Set $|C_i|$ as the length (number of movies) of cluster $C_i$ and set $P_{min}^i := \min_{e \in E} P_e^i$;
    if $P_{min}^i < \eta$ then
        for $1 \le j \le n$ do
            Count the number of occurrences of genre $j$ in column $j$ of the movie genre matrix $M_i$ and set it to $g_j = \sum_{u \in C_i} c_{uj}$. If $g_j < \theta_i |C_i|$, replace all values of column $j$ by zero;
        end for
    end if
    $M'_i \leftarrow M_i$;
end for
Return $M' := \{M'_1, \ldots, M'_k\}$;
More precisely, let $P_e^i$ be the evaluated value of a performance metric $e \in E$ for cluster $i$, where $E$ is a set of performance metrics such as $E = \{\mathrm{Precision}, \mathrm{Recall}, \mathrm{Accuracy}\}$.
Next, we let $P_{min}^i := \min_{e \in E} P_e^i$ be the minimum evaluated value of $P_e^i$ over all $e \in E$. Then, we check the value $P_{min}^i$ for each cluster, and if there exists an evaluation value less than a predefined threshold $\eta > 0$, i.e., $P_{min}^i < \eta$, we choose the cluster as a target cluster for subgenre trimming. For example, if we consider precision and recall as the evaluation metrics and the threshold $\eta$ is 0.5, cluster 3 in Figure 10 does not satisfy the criterion and is therefore regarded as a target cluster. After selecting the target clusters, the subgenres that account for less than a fraction $\theta_i$ of the total length (number of movies) of each target cluster $i$ are trimmed. To do this, let $M_i = [c_{uj}]_{u \in C_i,\, 1 \le j \le n}$ be the movie genre matrix for cluster $i$ and let $M := \{M_i\}_{i=1}^k$. Using this matrix, we find the genres whose total count is less than $\theta_i |C_i|$, i.e., $\sum_{u \in C_i} c_{uj} < \theta_i |C_i|$. To minimize data loss, we replace these values with zero rather than deleting them; the reason for this is to increase the evaluation accuracy for clusters that do not have explicit preferences. In the example of Figure 10, the length of the cluster is 100 and $\theta = 0.1$, so the threshold is $100 \times 0.1 = 10$, and we trim genres with fewer than 10 data points in the cluster, such as documentary and war. After these procedures, we finally obtain the subgenre-trimmed matrix set $M' := \{M'_1, \ldots, M'_k\}$. These matrices are then used to obtain the performance results.

4. Experimental Results

In this section, the experimental results are presented. For this, well-known movie data sets are used; the data and the performance metrics used for the evaluation are described below.

4.1. Data

For the experiment, we consider two data sets: (1) the MovieLens data set (ml-25m), with 25 million ratings and one million tag applications applied to 62,000 movies by 12,429 users [35]. We drop users with fewer than five movies in their viewing sequences and keep users with five or more to configure the data set for the experiment. After this process, 11,552 users with 6166 movies remain; approximately 7% of users were removed. (2) The MovieLens data set (latest), with 100,000 ratings and 3600 tag applications applied to 9000 movies by 600 users [36]. For this data set, we applied the same filtering step. In our setting, the experiment was conducted by dividing the movie sequence data set of 11,552 users into 80% training and 20% test data. For the sequential recommendation, we sort the data by ‘userId’ and ‘timestamp’ (to extract each user's movie sequence in chronological order), as shown in Figure 3. The five most recent movies per user are arranged in chronological order. To train the model, we convert each element of the sequence to a 19-dimensional binary vector, one dimension for each of the 19 genres: {Action, Adventure, Animation, Children, Comedy, Crime, Documentary, Drama, Fantasy, Film-Noir, Horror, IMAX, Musical, Mystery, Romance, Sci-Fi, Thriller, War, Western}.

4.2. Performance Metrics

As performance metrics, we consider (i) precision, (ii) recall, (iii) accuracy, and (iv) F1 score. To formally explain these metrics: true positive (TP) is the number of correctly predicted positive values, that is, the actual value is yes and the predicted value is also yes; true negative (TN) is the number of correctly predicted negative values, that is, the actual value is no and the predicted value is also no; false positive (FP) is when the actual value is no but the predicted value is yes; false negative (FN) is when the actual value is yes but the predicted value is no. The four metrics are described as follows:
(i)
Precision: Precision is the ratio of correctly predicted positive answers to the total predicted positive answers.
$$ \mathrm{Precision} := \frac{TP}{TP + FP}. \qquad (18) $$
(ii)
Recall: Recall is the ratio of correctly predicted positive answers to all answers in the actual class of answers.
$$ \mathrm{Recall} := \frac{TP}{TP + FN}. \qquad (19) $$
(iii)
Accuracy: Accuracy is the ratio of correctly predicted answers to the total number of answers.
$$ \mathrm{Accuracy} := \frac{TP + TN}{TP + FP + FN + TN}. \qquad (20) $$
Accuracy is a good measure when the values of false positives and false negatives of the data sets are almost the same.
(iv)
F1 Score: This metric is a weighted average of precision and recall. Therefore, this score considers both false positives and false negatives as follows:
$$ \mathrm{F1\ score} := \frac{2 \,(\mathrm{Recall} * \mathrm{Precision})}{\mathrm{Recall} + \mathrm{Precision}}. \qquad (21) $$
The F1 score is usually more useful than accuracy when the values of false positives and false negatives of the data sets are quite different.
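These four metrics can be computed directly from the binary predictions, as in the following sketch (it assumes 0/1 arrays of equal shape and nonzero denominators):

```python
import numpy as np

# Sketch of the four metrics in (18)-(21) for multi-label genre predictions.
def evaluate(y_true, y_pred):
    tp = np.sum((y_true == 1) & (y_pred == 1))
    tn = np.sum((y_true == 0) & (y_pred == 0))
    fp = np.sum((y_true == 0) & (y_pred == 1))
    fn = np.sum((y_true == 1) & (y_pred == 0))
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1 = 2 * recall * precision / (recall + precision)
    return precision, recall, accuracy, f1
```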
Using the previously described data and performance metrics, we obtain various experimental results for movie genre prediction in the following subsection.

4.3. Results

The results show the extent to which the prediction performance is affected by (1) clustering, (2) subgenre trimming, and (3) the ATV. To observe the clustering effect, the results before and after clustering are presented. Seven clusters are obtained by applying k-means clustering (Section 3.2) during the clustering step. The results show the mean performance over all clusters as well as the best and worst performance among them. The settings used to select the trimming clusters are $\eta = 0.5$ and $\theta_i = 0.1$ for all clusters $i$.

4.3.1. Effects of Clustering

As a first result, the performance of each model before and after clustering is displayed in Figure 11. Without clustering (Figure 11a,d,g), the performance is measured over all users without distinguishing them by any criterion. Because the data are not grouped, it is difficult for the models (RNN, LSTM, and GRU) to grasp the data and extract useful information; therefore, all three models exhibit relatively poor performance. To improve the performance, clustering was applied to group users with similar preferences. After clustering (Figure 11b,e,h), the performance is significantly improved in all stages of the experiment. Users with similar preferences are grouped together, so that, for a group in which preferences are well expressed, the gap between preferred and nonpreferred genres is large; in other words, the amount of data from genres with clear preferences is overwhelmingly large, which helps the process of recommending movies. However, there are occasionally clusters where the preferences are not evident (Figure 11c,f,i). In Table 1, Table 2 and Table 3, the results of the four performance metrics (recall, precision, accuracy, and F1 score) are displayed with respect to the three training models. In the experiment, we used the two MovieLens data sets described at the beginning of this section. In the parentheses of the performance metrics in Table 1, Table 2, Table 3, Table 4, Table 5 and Table 6, 1 denotes data set (1) and 2 denotes data set (2). Furthermore, the training data [Genre*ATV] are considered as representative data. Here, the abbreviation BC means before clustering and AC means after clustering; AC (best) and AC (worst) indicate the best and worst results among the clusters, and AC (mean) represents the mean result over all clusters. The AC performance on all four metrics improved compared to BC, except for the worst case. The accuracy was also the highest for these data sets because the false positives and false negatives of the data sets were not that different.

4.3.2. Effects of Subgenre Trimming

In order to maximize the advantages of clustering, we propose trimming the subgenres that are not preferred within a cluster. For this, we set the threshold $\eta$ to 0.5: if a cluster does not exceed 0.5 in any one of recall, precision, accuracy, or F1 score, it is subject to trimming. We selected this threshold empirically after the training process, conducting several tests to derive an appropriate value. During the experiments, we found that the metrics of the worst-performing clusters subject to trimming were approximately 0.5 for the data used. If the threshold is set lower than this value, even clusters requiring trimming are not selected, so no significant change in performance can be confirmed; if the threshold is increased, clusters that do not require trimming (those with a clear preference) are also included as trimming targets. For the results, three cases were considered: before clustering, after clustering, and after trimming, for two types of training data, namely [Genre, ATV] and [Genre*ATV], for each model. As shown in Figure 12, most metrics for the after-trimming case have larger values than the others, except for precision.
This is because precision considers the ratio of correctly predicted positive observations to the total predicted positive observations. Furthermore, the accuracy is the highest for all three models. In Table 4, Table 5 and Table 6, the results of the four performance metrics (recall, precision, accuracy, and F1 score) are displayed with respect to the three training models. In the experiment, the training data set [Genre*ATV] was considered the representative data set. Here, the abbreviation BT means before trimming and AT means after trimming; AT (best) and AT (worst) indicate the best and worst results among the clusters, and AT (mean) represents the mean over all clusters. The AT performance on all four metrics also improved.

4.3.3. Effects of Average Transition Probability

Finally, to determine the effect of the ATV, the prediction performance for the four training data types described above was examined. The average evaluation values over all clusters are shown in Figure 13. There are no significant differences in performance before and after clustering, regardless of whether the transition probability is applied. Before clustering, there are no distinct characteristics to consider for the users, so the environment is not conducive to generating a transition probability that reflects user preference. After clustering, users with similar preferences are grouped together, and the ATV is expected to have a discernible effect; contrary to expectations, however, the effect is insignificant. Rather, except for the RNN, the performance is slightly higher when the transition probability is excluded. Because the RNN places more weight on recent information, as dictated by the characteristics of the model, it is expected to represent the effect of the transition probability better than the other models. In general, the results show that the effects of clustering and trimming are substantial.

5. Discussion

In the results, we see that state-of-the-art studies of sequential recommendation systems mainly report performance when the top N or k movies [2,20,22] are recommended as a set, with a recommendation counted as correct when the watched movie is included in the set. Since this approach recommends a set rather than a single movie, its setting is similar to that of our method, which predicts a set of genres. A method for genre prediction was examined here as a pre-step for sequential movie recommendation; however, how to make sequential movie recommendations was not specifically addressed. To solve problems such as cold start, caused by the limitation of movie data in movie recommendations, a recommendation system based on correlation information on movie genres has also been proposed [7]. However, that study does not suggest a recommendation method for movies with sequential dynamics. In a sequential movie recommendation system, it is necessary to study the information on the genre of a movie for prediction performance. In addition, it is necessary to design a method to select and recommend movies that include the predicted genres, based on the results of the sequential movie genre prediction discussed in our study.
The internal threat to the validity of this study is that, because we predict genres rather than recommend movies, it is difficult to estimate the transition probability from one genre to another from the transition data of the movie sequences the user watched. Therefore, we introduced the average transition probability, a new concept that averages over the multiple genres included in each movie, using the transition data from one movie to another. However, this assumes that all genres in a movie contribute in the same ratio; as a next step, we will consider how to assign a weight to each genre. The external threat is that it is difficult to directly compare the algorithm proposed in this study with other existing studies, since most current studies focus on recommending movies rather than movie genres. Furthermore, regarding the ATV based on a Markov chain as a learning method for short-term dynamics, the reason this method did not have much effect on performance is that the RNN-based deep learning models already learn some short-term dynamics. Therefore, whether these short-term dynamics can be better estimated with a higher-order MC, which uses more of the past than the first-order MC when estimating the next step, needs to be examined. These issues will be addressed in future work.

6. Conclusions

In this paper, a sequential movie genre prediction algorithm was proposed, based on an MC for the short-term behavior and RNNs for the long-term behavior of user preferences. Movie genre prediction does not recommend a specific movie but predicts the genre of the next movie to watch, taking into consideration each user's preference for movie genres. For this, users with similar genre preferences are organized into clusters to recommend genres. For clusters that do not display specific tendencies, genre prediction is performed by appropriately trimming genres that are not necessary for recommendation in order to improve performance. Various experiments were performed with the proposed method on well-known movie data sets. The results showed that clustering and subgenre trimming are effective, whereas the effect of the ATV is not significant.

Author Contributions

Conceptualization, J.C. and J.K. (Jihyeon Kim); methodology, J.C.; software, J.K. (Jihyeon Kim); validation, J.K. (Jihyeon Kim) and J.K. (Jinkyung Kim); formal analysis, J.C.; investigation, J.C.; resources, J.K. (Jinkyung Kim); data curation, J.K. (Jihyeon Kim); writing—original draft preparation, J.C.; writing—review and editing, J.C.; visualization, J.C.; supervision, J.C.; project administration, J.C.; funding acquisition, J.C. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by the National Research Foundation of Korea (NRF) grant funded by the Korean government (MSIT) (No. 2019R1G1A1099466).

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. He, R.; McAuley, J. Fusing similarity models with Markov chains for sparse sequential recommendation. In Proceedings of the ICDM, Barcelona, Spain, 12–15 December 2016. [Google Scholar]
  2. Kang, W.; McAuley, J. Self-Attentive Sequential Recommendation. In Proceedings of the ICDM, Singapore, 17–20 November 2018. [Google Scholar]
  3. He, R.; Kang, W.; McAuley, J. Translation-based Recommendation: A Scalable Method for Modeling Sequential Behavior. In Proceedings of the IJCAI, Stockholm, Sweden, 13–19 July 2018. [Google Scholar]
  4. Hidasi, B.; Karatzoglou, A. Recurrent neural networks with Top-k gains for session-based recommendations. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Turin, Italy, 22–26 October 2018. [Google Scholar]
  5. Wang, S.; Cao, L.; Wang, Y.; Sheng, Q.Z.; Orgun, M.A.; Lian, D. A Survey on Session-based Recommender Systems. ACM Comput. Surv. 2021, 9, 1–38. [Google Scholar] [CrossRef]
  6. Rendle, S.; Freudenthaler, C.; Thieme, L.S. Factorizing personalized Markov chains for next-basket recommendation. In Proceedings of the 19th International Conference on World Wide Web, Raleigh, NC, USA, 26–30 April 2010. [Google Scholar]
  7. Choi, S.; Ko, S.; Han, Y. A movie recommendation algorithm based on genre correlations. Expert Syst. Appl. 2012, 39, 8079–8085. [Google Scholar] [CrossRef]
  8. Kim, K.; Moon, N. Recommender system design using movie genre similarity and preferred genres in SmartPhone. Multimed. Tools Appl. 2012, 61, 87–104. [Google Scholar] [CrossRef]
  9. Zimdars, A.; Chickering, D.M.; Meek, C. Using temporal data for making recommendations. In Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence, Seattle, WA, USA, 2–5 August 2001. [Google Scholar]
  10. Shani, G.; Heckerman, D.; Brafman, R.I. An mdp-based recommender system. J. Mach. Learn. Res. 2005, 6, 1265–1295. [Google Scholar]
  11. Mobasher, B.; Dai, H.; Luo, T.; Nakagawa, M. Using sequential and non-sequential patterns in predictive web usage mining tasks. In Proceedings of the ICDM, Maebashi City, Japan, 9–12 December 2002. [Google Scholar]
  12. Khorasani, E.S.; Zhenge, Z.; Champaign, J. A Markov chain collaborative filtering model for course enrollment recommendations. In Proceedings of the Big Data, Washington, DC, USA, 5–8 December 2016. [Google Scholar]
  13. Koren, Y. Collaborative filtering with temporal dynamics. Commun. ACM 2010, 53, 89–97. [Google Scholar] [CrossRef]
  14. Koren, Y.; Bell, R.; Volinsky, C. Matrix factorization techniques for recommender systems. Computer 2009, 42, 30–37. [Google Scholar] [CrossRef]
  15. Srebro, N.; Rennie, J.D.M.; Jaakkola, T.S. Maximum-margin matrix factorization. In Proceedings of the Neural Information Processing Systems, Vancouver, BC, Canada, 5–8 December 2005. [Google Scholar]
  16. Salakhutdinov, R.; Mnih, A. Probabilistic matrix factorization. In Proceedings of the Neural Information Processing Systems, Vancouver, BC, Canada, 8–10 December 2008. [Google Scholar]
  17. Wang, P.; Guo, J.; Lan, Y.; Xu, J.; Wan, S.; Cheng, X. Learning hierarchical representation model for next basket recommendation. In Proceedings of the ACM SIGIR Conference on Research and Development in Information Retrieval, Santiago, Chile, 9–13 August 2015. [Google Scholar]
  18. Wang, S.; Hu, L.; Cao, L.; Huang, X.; Lian, D.; Liu, W. Attention-based transactional context embedding for next-item recommendation. In Proceedings of the AAAI, New Orleans, LA, USA, 2–7 February 2018. [Google Scholar]
  19. Dong, D.; Zheng, X.; Zhang, R.; Wang, Y. Recurrent collaborative filtering for unifying general and sequential recommender. In Proceedings of the Twenty-Seventh International Joint Conference on Artificial Intelligence (IJCAI), Stockholm, Sweden, 13–19 July 2018. [Google Scholar]
  20. Chairatanakul, N.; Murata, T.; Liu, X. Recurrent translation-based network for Top-N sparse sequential recommendation. IEEE Access 2019, 7, 131567–131576. [Google Scholar] [CrossRef]
  21. Wu, C.; Ahmed, A.; Beutel, A.; Smola, A.J.; Jing, H. Recurrent recommender networks. In Proceedings of the Eleventh International Conference on Web Search and Data Mining, Los Angeles, CA, USA, 11 August 2017. [Google Scholar]
  22. Zhao, W.; Wang, B.; Yang, M.; Ye, J.; Zhao, Z.; Chen, X.; Shen, Y. Leveraging Long and Short-Term Information in Content-Aware Movie Recommendation via Adversarial Training. IEEE Trans. Cybern. 2020, 50, 11. [Google Scholar] [CrossRef] [PubMed]
  23. Hidasi, B.; Karatzoglou, A. Session-based recommendations with recurrent neural networks. In Proceedings of the Learning Representations, San Juan, Puerto Rico, 2–4 May 2016. [Google Scholar]
  24. Yuan, F.; Karatzoglou, A.; Arapakis, I.; Jose, J.M.; He, X. A simple convolutional generative network for next item recommendation. In Proceedings of the Eleventh International Conference on Web Search and Data Mining, Houston, TX, USA, 16 August 2019. [Google Scholar]
  25. Wu, S.; Tang, Y. Session-based recommendation with graph neural networks. In Proceedings of the AAAI, Honolulu, HI, USA, 27–28 January 2019. [Google Scholar]
  26. Zhang, S.; Tay, Y.; Yao, L.; Sun, A.; An, J. Next item recommendation with self-attentive metric learning. In Proceedings of the AAAI, Honolulu, HI, USA, 27–28 January 2019. [Google Scholar]
  27. Sachdeva, N.; Manco, G.; Ritacco, E.; Pudi, V. Sequential Variational Autoencoders for Collaborative Filtering. In Proceedings of the WSDM, Melbourne, Australia, 11–15 February 2019. [Google Scholar]
  28. Wehrmann, J.; Barros, R. Movie genre classification: A multi-label approach based on convolutions through time. Appl. Soft Comput. 2017, 61, 973–982. [Google Scholar] [CrossRef]
  29. Rasheed, Z.; Sha, M. Movie genre classification by exploiting audio-visual features of previews. In Proceedings of the International Conference on Pattern Recognition, Quebec City, QC, Canada, 11–15 August 2002. [Google Scholar]
  30. Graves, A.; Mohamed, A.; Hinton, G. Speech Recognition with Deep Recurrent Neural Networks. In Proceedings of the International Conference on Acoustics, Speech and Signal Processing (ICASSP), Vancouver, BC, Canada, 26–31 May 2013. [Google Scholar]
  31. Sak, H.; Senior, A.; Beaufays, F. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In Proceedings of the Fifteenth Annual Conference of the International Speech Communication Association, Singapore, 14–18 September 2014. [Google Scholar]
  32. LSTM. Available online: https://medium.com/nerd-for-tech/what-is-lstm-peephole-lstm-and-gru-77470d84954b (accessed on 30 October 2021).
  33. Cho, K.; van Merriënboer, B.; Gulcehre, C.; Bahdanau, D.; Bougares, F.; Schwenk, H.; Bengio, Y. Learning Phrase Representations using RNN Encoder–Decoder for Statistical Machine Translation. In Proceedings of the EMNLP, Doha, Qatar, 25–29 October 2014. [Google Scholar]
  34. GRU. Available online: https://primo.ai/index.php?title=Gated_Recurrent_Unit_(GRU) (accessed on 30 October 2021).
  35. Movielens (ml-25 m). Available online: https://grouplens.org/datasets/movielens/25m (accessed on 30 October 2021).
  36. Movielens Latest Datasets. Available online: https://grouplens.org/datasets/movielens/latest/m (accessed on 30 October 2021).
Figure 1. Movie genre prediction: Based on the genre included in the user's movie sequence, we study a prediction on the next genre.
Figure 2. Overall view of our proposed sequential movie genre prediction.
Figure 3. Example of data preprocessing for movie genre prediction (n = 19).
Figure 4. Illustration of example for clustering.
Figure 5. Transition probability matrix and average transition probability vector.
Figure 6. Four types of training data.
Figure 7. Recurrent neural network.
Figure 8. Long Short-Term Memory [32].
Figure 9. Gated Recurrent Unit [34].
Figure 10. Subgenre trimming. In this example, we set $\eta = 0.5$ and there are two types of performance metrics: precision and recall. We see that the minimum values of the evaluation metrics are $P_{min}^2 = 0.6$ and $P_{min}^7 = 0.55$ for cluster 2 and cluster 7, respectively. Hence, these clusters are not regarded as trimming clusters. However, $P_{min}^3 = 0.45 < 0.5$ for cluster 3, which is regarded as a trimming cluster.
Figure 11. Result of clustering for RNN (a–c), LSTM (d–f), and GRU (g–i), respectively.
Figure 12. Result of subgenre trimming for RNN (a,b), LSTM (c,d), and GRU (e,f), respectively.
Figure 13. Result of ATV for RNN (a–c), LSTM (d–f), and GRU (g–i), respectively.
Table 1. Four performance results of RNN with training data type [Genre*ATV]. (In parentheses of the performance metrics, 1 means data set (1) and 2 means data set (2)).

Clustering | Recall (1, 2) | Precision (1, 2) | Accuracy (1, 2) | F1 Score (1, 2)
BC         | 0.44, 0.40    | 0.77, 0.74       | 0.85, 0.84      | 0.56, 0.52
AC (best)  | 0.73, 0.82    | 0.80, 0.71       | 0.91, 0.80      | 0.76, 0.76
AC (worst) | 0.29, 0.29    | 0.87, 0.67       | 0.85, 0.72      | 0.43, 0.40
AC (mean)  | 0.52, 0.47    | 0.77, 0.71       | 0.86, 0.83      | 0.58, 0.55
Table 2. Four performance results of LSTM with training data type [Genre*ATV].

Clustering | Recall (1, 2) | Precision (1, 2) | Accuracy (1, 2) | F1 Score (1, 2)
BC         | 0.45, 0.46    | 0.76, 0.66       | 0.85, 0.82      | 0.57, 0.54
AC (best)  | 0.75, 0.92    | 0.79, 0.84       | 0.91, 0.94      | 0.77, 0.88
AC (worst) | 0.27, 0.40    | 0.84, 0.61       | 0.84, 0.77      | 0.41, 0.61
AC (mean)  | 0.51, 0.56    | 0.75, 0.68       | 0.86, 0.84      | 0.59, 0.61
Table 3. Four performance results of GRU with training data type [Genre*ATV].

Clustering | Recall (1, 2) | Precision (1, 2) | Accuracy (1, 2) | F1 Score (1, 2)
BC         | 0.49, 0.45    | 0.70, 0.69       | 0.83, 0.82      | 0.58, 0.54
AC (best)  | 0.73, 0.89    | 0.75, 0.87       | 0.89, 0.93      | 0.74, 0.88
AC (worst) | 0.35, 0.36    | 0.69, 0.60       | 0.79, 0.77      | 0.46, 0.45
AC (mean)  | 0.53, 0.48    | 0.69, 0.71       | 0.84, 0.82      | 0.59, 0.56
Table 4. Four performance results of RNN with training data type [Genre*ATV]. (In parentheses of the performance metrics, 1 means data set (1) and 2 means data set (2)).

Trimming   | Recall (1, 2) | Precision (1, 2) | Accuracy (1, 2) | F1 Score (1, 2)
BT (mean)  | 0.52, 0.47    | 0.77, 0.71       | 0.86, 0.83      | 0.60, 0.55
BT (worst) | 0.29, 0.29    | 0.87, 0.67       | 0.85, 0.72      | 0.43, 0.40
AT (worst) | 0.37, 0.47    | 0.87, 0.97       | 0.85, 0.84      | 0.52, 0.63
AT (mean)  | 0.57, 0.53    | 0.75, 0.73       | 0.87, 0.85      | 0.64, 0.61
Table 5. Four performance results of LSTM with training data type [Genre*ATV].

Trimming   | Recall (1, 2) | Precision (1, 2) | Accuracy (1, 2) | F1 Score (1, 2)
BT (mean)  | 0.51, 0.56    | 0.75, 0.68       | 0.86, 0.84      | 0.59, 0.61
BT (worst) | 0.27, 0.40    | 0.84, 0.61       | 0.84, 0.77      | 0.41, 0.48
AT (worst) | 0.35, 0.42    | 0.84, 0.66       | 0.86, 0.81      | 0.49, 0.51
AT (mean)  | 0.58, 0.62    | 0.73, 0.70       | 0.87, 0.87      | 0.63, 0.65
Table 6. Four performance results of GRU with training data type [Genre*ATV].

Trimming   | Recall (1, 2) | Precision (1, 2) | Accuracy (1, 2) | F1 Score (1, 2)
BT (mean)  | 0.53, 0.48    | 0.69, 0.71       | 0.84, 0.82      | 0.59, 0.56
BT (worst) | 0.35, 0.36    | 0.69, 0.60       | 0.79, 0.77      | 0.46, 0.45
AT (worst) | 0.43, 0.37    | 0.70, 0.63       | 0.82, 0.80      | 0.53, 0.47
AT (mean)  | 0.58, 0.49    | 0.69, 0.72       | 0.85, 0.84      | 0.63, 0.58
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

