Integration of Deep Reinforcement Learning with Collaborative Filtering for Movie Recommendation Systems
Abstract
1. Introduction
- Singular value decomposition (SVD) is used for matrix factorization in CF, extracting informative embeddings that capture latent user and movie features, which are essential for improving recommendation accuracy and personalization (a minimal sketch of this embedding-extraction step follows this list).
- The work emphasizes the actor–critic method within DRL, a strategy that balances policy-based and value-based methods and can enhance the recommendation process. Incorporating the Deep Deterministic Policy Gradient (DDPG) algorithm into the reinforcement learning framework allows the recommendation system to be trained through continuous interactions between users and the environment, yielding a flexible and adaptable approach to Top-N recommendation. The system updates its internal state to reflect the most recent user interactions, keeping the set of suggestions up to date. To evaluate the recommender models, several ranking metrics are chosen that measure the proportion of recommended items relevant to the target users. Additionally, several benchmark models are compared against ours on the MovieLens dataset [7].
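As a rough illustration of the SVD embedding step mentioned above, the sketch below factorizes a user–item rating matrix with a truncated SVD and treats the resulting factors as user and movie embeddings. The function name, the per-user de-meaning step, and the embedding dimension k are illustrative assumptions, not the exact implementation used in this work.

```python
# Illustrative sketch (assumed names): extract latent user/movie embeddings
# from a user-item rating matrix via truncated SVD.
import numpy as np
from scipy.sparse.linalg import svds

def extract_embeddings(R, k=50):
    """Return k-dimensional user and movie embeddings from rating matrix R."""
    R = np.asarray(R, dtype=np.float64)
    R_centered = R - R.mean(axis=1, keepdims=True)   # de-mean each user's ratings
    U, sigma, Vt = svds(R_centered, k=k)             # truncated SVD, k < min(R.shape)
    user_emb = U * sigma                             # shape: (num_users, k)
    movie_emb = Vt.T                                 # shape: (num_movies, k)
    return user_emb, movie_emb
```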
2. Related Work
2.1. Traditional-Based Recommendation Systems
2.2. Deep-Learning-Based Recommendation Systems
2.3. Reinforcement-Learning-Based Recommendation Systems
3. Proposed System
3.1. Problem Scenarios
3.2. System Architecture
3.3. Detailed Implementation Flow
3.3.1. Phase One
- Data Loading and Preprocessing
- Embedding Generation
- Train–Test Split and Data Preparation
3.3.2. Phase Two
- State Representation (AdaptiveMaxPool)
- Actor–Critic Initialization
Algorithm 1 Actor–Critic Model Initialization
Define the Actor class (inherits from nn.Module):
    Initialize neural network layers:
        Linear(input_dim, hidden_dim) -> LeakyReLU -> Dropout(optimal_prob)
        Linear(hidden_dim, hidden_dim) -> LeakyReLU -> Dropout(optimal_prob)
        Linear(hidden_dim, output_dim)
    Define forward(state):
        state -> Linear1 -> LeakyReLU -> Dropout -> Linear2 -> LeakyReLU -> Dropout -> Linear3
        return output
End Actor class
Define the Critic class (inherits from nn.Module):
    Initialize neural network layers:
        Linear(input_dim + output_dim, hidden_dim) -> LeakyReLU -> Dropout(optimal_prob)
        Linear(hidden_dim, hidden_dim) -> LeakyReLU -> Dropout(optimal_prob)
        Linear(hidden_dim, 1)
        Initialize weights of the last layer uniformly within a specified range
    Define forward(state, action):
        Concatenate(state, action) -> Linear1 -> LeakyReLU -> Dropout -> Linear2 -> LeakyReLU -> Dropout -> Linear3
        return value_estimate
End Critic class
Model initialization:
    Set input_dim, output_dim, hidden_dim according to dataset characteristics
    actor_model = Actor(input_dim, hidden_dim, output_dim, dropout_probability)
    critic_model = Critic(input_dim, output_dim, hidden_dim, dropout_probability)
    actor_target = copy of actor_model
    critic_target = copy of critic_model
Define optimizers:
    actor_optimizer = Optimizer(actor_model.parameters(), learning_rate)
    critic_optimizer = Optimizer(critic_model.parameters(), learning_rate)
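A minimal PyTorch sketch consistent with Algorithm 1 is shown below. The hidden sizes, dropout probability, learning rates, choice of Adam, and the uniform initialization range for the critic's output layer are illustrative assumptions, not the exact settings used in our experiments.

```python
import copy
import torch
import torch.nn as nn

class Actor(nn.Module):
    """Maps a state vector to an action (item-scoring) vector."""
    def __init__(self, input_dim, hidden_dim, output_dim, dropout_prob=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim, hidden_dim), nn.LeakyReLU(), nn.Dropout(dropout_prob),
            nn.Linear(hidden_dim, hidden_dim), nn.LeakyReLU(), nn.Dropout(dropout_prob),
            nn.Linear(hidden_dim, output_dim),
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Estimates Q(state, action) for a state-action pair."""
    def __init__(self, input_dim, output_dim, hidden_dim, dropout_prob=0.2):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(input_dim + output_dim, hidden_dim), nn.LeakyReLU(), nn.Dropout(dropout_prob),
            nn.Linear(hidden_dim, hidden_dim), nn.LeakyReLU(), nn.Dropout(dropout_prob),
            nn.Linear(hidden_dim, 1),
        )
        # Small uniform init for the output layer (the range is an assumed value).
        nn.init.uniform_(self.net[-1].weight, -3e-3, 3e-3)

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

# Example initialization (dimensions are illustrative, not the paper's settings).
actor_model = Actor(input_dim=100, hidden_dim=256, output_dim=100)
critic_model = Critic(input_dim=100, output_dim=100, hidden_dim=256)
actor_target = copy.deepcopy(actor_model)
critic_target = copy.deepcopy(critic_model)
actor_optimizer = torch.optim.Adam(actor_model.parameters(), lr=1e-4)
critic_optimizer = torch.optim.Adam(critic_model.parameters(), lr=1e-3)
```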
- Training using the DDPG
Algorithm 2 DDPG Training
Initialize networks and parameters:
    actor_model = initialize_actor_network(input_dim, hidden_dim, output_dim)
    critic_model = initialize_critic_network(input_dim, output_dim, hidden_dim)
    actor_target = copy of actor_model
    critic_target = copy of critic_model
    replay_buffer = create_replay_buffer()
    Set exploration_noise, discount_factor, tau (soft-update parameter)
Training loop:
for episode = 1 to max_episodes do
    initialize exploration_noise
    state = observe_initial_state()
    for t = 1 to max_timesteps do
        action = actor_model(state) + exploration_noise
        next_state, reward, done = execute_action(action)
        replay_buffer.store(state, action, reward, next_state, done)
        batch = replay_buffer.sample()
        target_Q = compute_target_Q(batch, critic_target, actor_target, discount_factor)
        update_critic(critic_model, batch, target_Q)
        update_actor(actor_model, critic_model, batch)
        soft_update(actor_target, actor_model, tau)
        soft_update(critic_target, critic_model, tau)
        update exploration_noise
        if done then
            break
        end if
    end for
end for
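The update step invoked inside the loop of Algorithm 2 can be sketched as follows. This follows the standard DDPG formulation; the batch layout, helper names, and hyperparameter values are assumptions rather than the paper's exact configuration.

```python
import torch
import torch.nn.functional as F

def ddpg_update(batch, actor_model, critic_model, actor_target, critic_target,
                actor_optimizer, critic_optimizer, discount_factor=0.99, tau=0.005):
    """One DDPG update on a sampled mini-batch (state, action, reward, next_state, done)."""
    state, action, reward, next_state, done = batch

    # Target Q-value from the target networks (no gradient flows through them).
    with torch.no_grad():
        next_action = actor_target(next_state)
        target_Q = reward + discount_factor * (1.0 - done) * critic_target(next_state, next_action)

    # Critic update: regress Q(s, a) toward the bootstrapped target.
    critic_loss = F.mse_loss(critic_model(state, action), target_Q)
    critic_optimizer.zero_grad()
    critic_loss.backward()
    critic_optimizer.step()

    # Actor update: maximize the critic's value of the actor's action.
    actor_loss = -critic_model(state, actor_model(state)).mean()
    actor_optimizer.zero_grad()
    actor_loss.backward()
    actor_optimizer.step()

    # Soft (Polyak) update of the target networks.
    for target, source in ((actor_target, actor_model), (critic_target, critic_model)):
        for t_param, s_param in zip(target.parameters(), source.parameters()):
            t_param.data.mul_(1.0 - tau).add_(tau * s_param.data)
```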
3.3.3. Phase Three
- Testing and Recommendation Generation
3.4. Algorithms for the Proposed System
Algorithm 3 The Proposed Movie Recommendation System
Input: ratings_df, movies_df, users_df
Output: evaluation_metrics, user-specific recommendations, top-N recommendations
Procedure:
1: Load and preprocess data:
    ratings_df, movies_df, users_df = load_movie_data()
    R_df = create_user_item_matrix(ratings_df)
2: Split data into training and testing sets:
    train_users, test_users = split_users_based_on_ratings(R_df)
3: Prepare data for deep reinforcement learning:
    train_dataloader, test_dataloader = create_data_loaders(train_users, test_users)
4: Initialize reinforcement learning models:
    actor_model = initialize_actor_model()
    critic_model = initialize_critic_model()
    target_actor_model = copy of actor_model
    target_critic_model = copy of critic_model
    replay_buffer = initialize_replay_buffer()
5: Define state representation:
    define_state_representation_functions()
6: Train the models (for details, refer to Algorithm 2):
    for episode = 1 to num_episodes do
        for batch in train_dataloader do
            state = compute_state_representation(batch)
            action = actor_model(state)
            reward = calculate_reward(batch)
            next_state = compute_state_representation(batch)
            replay_buffer.push(state, action, reward, next_state)
            if replay_buffer.size() > batch_size then
                update_actor_critic_models(replay_buffer)
            end if
        end for
    end for
7: Test the models:
    for batch in test_dataloader do
        state = compute_state_representation(batch)
        recommendations = generate_recommendations(actor_model, state)
        evaluate_recommendations(recommendations, batch)
    end for
8: Compute evaluation metrics:
    evaluation_metrics = calculate_evaluation_metrics()
9: Generate user-specific recommendations:
    selected_user_id = choose_user_id(test_users)
    user_recommendations = generate_user_specific_recommendations(selected_user_id)
10: Analyze recommendations using cosine similarity:
    cosine_similarity_matrix = compute_cosine_similarity(user_recommendations)
11: Return output
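To illustrate the testing and analysis steps (7, 9, and 10) of Algorithm 3, the sketch below scores candidate movies with the trained actor and inspects the pairwise cosine similarity of the recommended items. The dot-product scoring rule and all helper names are hypothetical choices, not the authors' exact code.

```python
import torch
from sklearn.metrics.pairwise import cosine_similarity

def generate_top_n(actor_model, state, movie_embeddings, n=10):
    """Score all movies against the actor's action vector and return the top-N indices.

    Assumes the action lives in the same latent space as the movie embeddings;
    this scoring rule is an illustrative choice.
    """
    actor_model.eval()
    with torch.no_grad():
        action = actor_model(state)            # shape: (embedding_dim,)
        scores = movie_embeddings @ action     # shape: (num_movies,)
        top_n = torch.topk(scores, n).indices.tolist()
    return top_n

def analyze_recommendations(recommended_embeddings):
    """Pairwise cosine similarity among recommended movies (step 10 of Algorithm 3)."""
    return cosine_similarity(recommended_embeddings)
```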
4. Experiments and Results
4.1. Experiment Setting
4.2. Evaluation Metrics
4.3. Benchmark Models
- Singular Value Decomposition (SVD): A matrix factorization technique that decomposes a user–item interaction matrix into three matrices, capturing latent factors. It is fundamental in recommender systems, especially for predicting missing ratings in collaborative filtering. (An illustrative snippet showing how such baselines can be evaluated with the Surprise library [32] follows this list.)
- KNNBasic: K-Nearest Neighbors (KNN) is a memory-based collaborative-filtering approach. It generates recommendations by assessing the similarity between items or users, often using similarity measures such as cosine similarity or Pearson correlation.
- KNNWithZScore: This model extends KNNBasic by normalizing user ratings using each user's mean and standard deviation. This normalization helps with users who consistently rate items higher or lower than average.
- Collaborative Filtering (user-based): This technique recommends items by analyzing the preferences and behaviors of similar users. A classic approach in recommender systems, it predicts user interests by leveraging user similarity.
- SVD++: An advanced variant of SVD that enhances the model by incorporating implicit feedback, such as clicks, views, or purchase history, alongside explicit ratings. This enhancement enables the model to capture a wider spectrum of user preferences.
- Generalized Matrix Factorization (GMF): This neural-network-based approach generalizes matrix factorization and typically employs a linear kernel to model latent feature interactions, making it a more flexible version of matrix factorization.
- Multi-Layer Perceptron (MLP): A type of neural network adept at modeling complex, nonlinear relationships between users and items and at capturing high-level abstractions within the data.
- Maximum Margin Matrix Factorization (MMMF): A matrix-factorization technique that uses a margin-based loss function, aiming to widen the gap between predictions for positive and negative interactions. This approach sharpens the distinction between relevant and irrelevant items.
- Neural Matrix Factorization (NeuMF): This technique merges GMF and MLP to capture both the linearity of matrix factorization and the non-linearity of neural networks. The combination is particularly adept at capturing complex user–item interaction patterns.
- Bilateral VAE for Collaborative Filtering (VAECF): This model employs variational autoencoders for collaborative filtering. It is particularly effective for sparse, high-dimensional data, which are prevalent in recommendation scenarios.
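As a point of reference for how the classic baselines above can be reproduced, the snippet below runs SVD and KNNBasic from the Surprise library [32] on its built-in MovieLens 100K data. It reports RMSE for brevity, whereas the paper evaluates Precision@k, so this is only an assumed, minimal setup and not our exact experimental configuration.

```python
from surprise import SVD, KNNBasic, Dataset, accuracy
from surprise.model_selection import train_test_split

# Built-in MovieLens 100K dataset shipped with Surprise (downloaded on first use).
data = Dataset.load_builtin("ml-100k")
trainset, testset = train_test_split(data, test_size=0.2, random_state=42)

baselines = {
    "SVD": SVD(),
    "KNNBasic": KNNBasic(sim_options={"name": "cosine", "user_based": True}),
}

for name, algo in baselines.items():
    algo.fit(trainset)
    predictions = algo.test(testset)
    print(name, "RMSE:", accuracy.rmse(predictions, verbose=False))
```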
4.4. Recommendation Results
5. Ablation Study
- Ablation—Alters the method of reward calculation.
- Potential Outcomes and Interpretation:
- Decreased Performance: The new reward scheme does not outperform our current models, suggesting that it is either too simplistic to capture the nuances of user preferences or too coarse for the learning process. Because the ablated model reduces each rating to a binary value of 1 or 0, the recommendation metrics degrade. This binary reward structure aligns better with the Adam-optimized variant of our model than with the SGD variant, but it still performs poorly, especially as k becomes larger, as shown in Figure 6. A sketch of the two reward formulations follows.
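To make the ablated reward concrete, the sketch below contrasts a rating-proportional reward with the binary reward used in the ablation. The threshold and scaling are assumed values, not necessarily those of our implementation.

```python
def rating_based_reward(rating, max_rating=5.0):
    """Baseline reward (assumed form): scale the explicit rating into [0, 1]."""
    return rating / max_rating

def binary_reward(rating, threshold=4.0):
    """Ablated reward: 1 if the user liked the item (rating >= threshold), else 0."""
    return 1.0 if rating >= threshold else 0.0

# Example: a 3-star rating keeps some signal under the baseline reward (0.6)
# but collapses to 0 under the binary reward, which coarsens the feedback.
print(rating_based_reward(3.0), binary_reward(3.0))
```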
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Kim, H.M.; Ghiasi, B.; Spear, M.; Laskowski, M.; Li, J. Online serendipity: The case for curated recommender systems. Bus. Horiz. 2017, 60, 613–620.
2. Thorat, P.B.; Goudar, R.M.; Barve, S. Survey on collaborative filtering, content-based filtering and hybrid recommendation system. Int. J. Comput. Appl. 2015, 110, 31–36.
3. Ferreira, D.; Silva, S.; Abelha, A.; Machado, J. Recommendation system using autoencoders. Appl. Sci. 2020, 10, 5510.
4. Elguea, Í.; Arana-Arexolaleiba, N.; Serrano Muñoz, A. A review on reinforcement learning for contact-rich robotic manipulation tasks. Robot. Comput.-Integr. Manuf. 2023, 81, 102517.
5. Li, M.; Wang, Z. Deep learning for high-dimensional reliability analysis. Mech. Syst. Signal Process. 2020, 139, 106399.
6. Kulkarni, T.D.; Narasimhan, K.; Saeedi, A.; Tenenbaum, J. Hierarchical deep reinforcement learning: Integrating temporal abstraction and intrinsic motivation. Adv. Neural Inf. Process. Syst. 2016, 29, 3682–3690.
7. Harper, F.M.; Konstan, J.A. The MovieLens datasets: History and context. ACM Trans. Interact. Intell. Syst. 2015, 5, 1–19.
8. Vilakone, P.; Xinchang, K.; Park, D.S. Personalized movie recommendation system combining data mining with the k-clique method. J. Inf. Process. Syst. 2019, 15, 1141–1155.
9. Peng, S.; Park, D.S.; Kim, D.Y.; Yang, Y.; Siet, S.; Ugli SI, R.; Lee, H. A modern recommendation system survey in the big data era. In International Conference on Computer Science and Its Applications and the International Conference on Ubiquitous Information Technologies and Applications; Springer Nature Singapore: Singapore, 2022; pp. 577–582.
10. Koren, Y.; Rendle, S.; Bell, R. Advances in collaborative filtering. In Recommender Systems Handbook; Springer: New York, NY, USA, 2021; pp. 91–142.
11. Goldberg, D.; Nichols, D.; Oki, B.M.; Terry, D. Using collaborative filtering to weave an information tapestry. Commun. ACM 1992, 35, 61–70.
12. Koren, Y.; Bell, R.; Volinsky, C. Matrix factorization techniques for recommender systems. Computer 2009, 42, 30–37.
13. Liang, D.; Altosaar, J.; Charlin, L.; Blei, D.M. Factorization meets the item embedding: Regularizing matrix factorization with item co-occurrence. In Proceedings of the 10th ACM Conference on Recommender Systems, Boston, MA, USA, 15–19 September 2016; pp. 59–66.
14. Tran, T.; Lee, K.; Liao, Y.; Lee, D. Regularizing matrix factorization with user and item embeddings for recommendation. In Proceedings of the 27th ACM International Conference on Information and Knowledge Management, Torino, Italy, 22–26 October 2018; pp. 687–696.
15. Deldjoo, Y.; Dacrema, M.F.; Constantin, M.G.; Eghbal-Zadeh, H.; Cereda, S.; Schedl, M.; Ionescu, B.; Cremonesi, P. Movie genome: Alleviating new item cold start in movie recommendation. User Model. User-Adapt. Interact. 2019, 29, 291–343.
16. Xinchang, K.; Vilakone, P.; Park, D.S. Movie recommendation algorithm using social network analysis to alleviate cold-start problem. J. Inf. Process. Syst. 2019, 15, 616–631.
17. Vilakone, P.; Park, D.S.; Xinchang, K.; Hao, F. An efficient movie recommendation algorithm based on improved k-clique. Hum.-Centric Comput. Inf. Sci. 2018, 8, 38.
18. Van Meteren, R.; Van Someren, M. Using content-based filtering for recommendation. In Proceedings of the Machine Learning in the New Information Age: MLnet/ECML2000 Workshop, Barcelona, Spain, 30 May 2000; Volume 30, pp. 47–56.
19. Bogdanov, D.; Haro, M.; Fuhrmann, F.; Xambó, A.; Gómez, E.; Herrera, P. Semantic audio content-based music recommendation and visualization based on user preference examples. Inf. Process. Manag. 2013, 49, 13–33.
20. Li, L.; Wang, D.; Li, T.; Knox, D.; Padmanabhan, B. SCENE: A scalable two-stage personalized news recommendation system. In Proceedings of the 34th International ACM SIGIR Conference on Research and Development in Information Retrieval, Beijing, China, 24–28 July 2011; pp. 125–134.
21. Tian, Y.; Zheng, B.; Wang, Y.; Zhang, Y.; Wu, Q. College library personalized recommendation system based on hybrid recommendation algorithm. Procedia CIRP 2019, 83, 490–494.
22. Wang, H.; Wang, N.; Yeung, D.Y. Collaborative deep learning for recommender systems. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, Sydney, NSW, Australia, 10–13 August 2015; pp. 1235–1244.
23. Zhang, S.; Yao, L.; Sun, A.; Tay, Y. Deep learning based recommender system: A survey and new perspectives. ACM Comput. Surv. 2019, 52, 1–38.
24. Cheng, H.T.; Koc, L.; Harmsen, J.; Shaked, T.; Chandra, T.; Aradhye, H.; Anderson, G.; Corrado, G.; Chai, W.; Ispir, M.; et al. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, Boston, MA, USA, 15 September 2016; pp. 7–10.
25. Naumov, M.; Mudigere, D.; Shi, H.J.M.; Huang, J.; Sundaraman, N.; Park, J.; Wang, X.; Gupta, U.; Wu, C.-J.; Azzolini, A.G.; et al. Deep learning recommendation model for personalization and recommendation systems. arXiv 2019, arXiv:1901.02103.
26. Li, Z.; Shi, L.; Cristea, A.I.; Zhou, Y. A survey of collaborative reinforcement learning: Interactive methods and design patterns. In Proceedings of the 2021 ACM Designing Interactive Systems Conference, Virtual, 28 June–2 July 2021; pp. 1579–1590.
27. Zhao, X.; Xia, L.; Zou, L.; Yin, D.; Tang, J. Toward simulating environments in reinforcement learning based recommendations. arXiv 2019, arXiv:1906.11462.
28. Deliu, N. Reinforcement learning for sequential decision making in population research. Qual. Quant. 2023, 1–24.
29. Zou, L.; Xia, L.; Ding, Z.; Song, J.; Liu, W.; Yin, D. Reinforcement learning to optimize long-term user engagement in recommender systems. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2810–2818.
30. Mlika, F.; Karoui, W. Proposed model to intelligent recommendation system based on Markov chains and grouping of genres. Procedia Comput. Sci. 2020, 176, 868–877.
31. Arulkumaran, K.; Deisenroth, M.P.; Brundage, M.; Bharath, A.A. Deep reinforcement learning: A brief survey. IEEE Signal Process. Mag. 2017, 34, 26–38.
32. Hug, N. Surprise: A Python library for recommender systems. J. Open Source Softw. 2020, 5, 2174.
33. Salah, A.; Truong, Q.T.; Lauw, H.W. Cornac: A comparative framework for multimodal recommender systems. J. Mach. Learn. Res. 2020, 21, 3803–3807.
Models | P@5 | P@10 | P@15 | P@20 |
---|---|---|---|---|
SVD | 0.7756 | 0.7651 | 0.7608 | 0.7578 |
KNNBasic | 0.7948 | 0.7831 | 0.7792 | 0.7755 |
KNNWithZScore | 0.6615 | 0.6578 | 0.6505 | 0.6493 |
KNNBaseline | 0.7513 | 0.7420 | 0.7391 | 0.7347 |
SVDpp | 0.7922 | 0.7802 | 0.7733 | 0.7682 |
GMF | 0.3935 | 0.3560 | 0.3286 | 0.3088 |
MLP | 0.4150 | 0.3674 | 0.3350 | 0.3104 |
MMMF | 0.1162 | 0.1133 | 0.1030 | 0.0922 |
NeuMF | 0.4240 | 0.3728 | 0.3412 | 0.3166 |
VAECF | 0.2195 | 0.1900 | 0.1712 | 0.1570 |
OURS (Adadelta) | 0.6933 | 0.6907 | 0.6948 | 0.6982 |
OURS (Adam) | 0.7445 | 0.7344 | 0.7241 | 0.7258 |
OURS (SGD) | 0.7391 | 0.7445 | 0.7464 | 0.7448 |
OURS (RMSprop) | 0.6880 | 0.6771 | 0.6752 | 0.6838 |
OURS (Adamax) | 0.6908 | 0.6792 | 0.6832 | 0.6868 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).