DDRCN: Deep Deterministic Policy Gradient Recommendation Framework Fused with Deep Cross Networks
Abstract
1. Introduction
- We propose DDRCN, a deep deterministic policy gradient recommendation framework fused with deep cross networks. The framework adopts the Actor-Critic approach and maximizes the cumulative reward of recommendations through continuous exploration and exploitation by the Actor and Critic networks, combined with a greedy exploration strategy.
- Within this framework, we model the feature interactions between users and items with a deep cross network. The deep cross network consists of a Deep network and a Cross network, which work together to compute the cross relationships between features (see the sketch after this list).
- We conducted recommendation experiments on real-world movie and music datasets, and the results show that the proposed model outperforms its competitors in terms of recall and ranking quality.
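To make the cross-feature computation concrete, the sketch below implements the cross layer commonly used in deep & cross networks, x_{l+1} = x_0 (w_l^T x_l) + b_l + x_l, alongside a small deep branch whose output is concatenated with the cross branch before scoring. The layer sizes and the way the two branches are combined here are illustrative assumptions, not the exact configuration used in DDRCN.

```python
import torch
import torch.nn as nn

class CrossLayer(nn.Module):
    """One cross layer: x_{l+1} = x_0 * (x_l . w) + b + x_l."""
    def __init__(self, dim):
        super().__init__()
        self.w = nn.Parameter(torch.randn(dim) * 0.01)
        self.b = nn.Parameter(torch.zeros(dim))

    def forward(self, x0, xl):
        # (batch, dim) * (dim,) summed over features -> (batch, 1), then broadcast against x0.
        xw = (xl * self.w).sum(dim=1, keepdim=True)
        return x0 * xw + self.b + xl

class DeepCrossNet(nn.Module):
    """Cross branch and deep branch over the same input, outputs concatenated."""
    def __init__(self, dim, n_cross=2, hidden=64):
        super().__init__()
        self.cross = nn.ModuleList([CrossLayer(dim) for _ in range(n_cross)])
        self.deep = nn.Sequential(
            nn.Linear(dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
        )
        self.out = nn.Linear(dim + hidden, 1)

    def forward(self, x):
        x0, xl = x, x
        for layer in self.cross:
            xl = layer(x0, xl)
        return self.out(torch.cat([xl, self.deep(x)], dim=1))

# Example: score a batch of 8 concatenated user-item feature vectors of size 32.
scores = DeepCrossNet(dim=32)(torch.randn(8, 32))
```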
2. Related Work
2.1. Traditional Recommendation Methods
2.2. Recommendation Methods with Deep Learning
2.3. Recommendation Methods with Reinforcement Learning
3. Preliminaries
- State space S: S is the set of environment states; s_t ∈ S denotes the state of the agent at time t, i.e., the interaction between the user and the recommendation system at time t.
- Action space A: A is the set of actions that the agent can take; a_t ∈ A denotes the action taken by the agent at time t. In particular, actions here refer to action vectors.
- Reward R: the recommendation system recommends actions based on the user's state and behavior, and the user provides feedback (click, rating, retweet, etc.). The recommendation system receives an immediate reward r(s_t, a_t) based on this feedback.
- State transition P: when the recommendation system takes action a_t at time step t, the user's state is transferred from s_t to s_{t+1}.
- Discount factor γ: γ ∈ [0, 1] indicates the importance of future rewards; a γ close to 1 emphasizes long-term rewards, while a γ close to 0 emphasizes immediate rewards. The resulting objective is written out after this list.
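With the components above, the agent's goal can be stated in the standard way as maximizing the expected discounted return; the formula below is a minimal statement of that objective using the symbols just defined, not a result specific to this paper.

```latex
J(\pi) = \mathbb{E}_{\pi}\!\left[ \sum_{t=0}^{T} \gamma^{t}\, r(s_t, a_t) \right],
\qquad 0 \le \gamma \le 1,
```

where the policy π maps the current state s_t to a recommendation action a_t, and T is the length of the interaction session.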
4. The Proposed Framework
4.1. The Architecture
4.1.1. Actor Policy Neural Networks
4.1.2. Critic Value Neural Networks
4.1.3. State Representation Module
4.2. Training Process
4.3. Evaluation Process
Algorithm 1: Training Process
Algorithm 2: Offline Evaluation Process
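Algorithm 1 is summarized above only by its caption; as a point of reference, the sketch below shows one generic DDPG-style update step (critic regression toward a bootstrapped target, deterministic policy gradient for the actor, and soft target-network updates, following Lillicrap et al.). The network interfaces, discount factor, and soft-update rate are illustrative assumptions, not the authors' exact Algorithm 1.

```python
import torch
import torch.nn.functional as F

def ddpg_update(actor, critic, target_actor, target_critic,
                actor_opt, critic_opt, batch, gamma=0.9, tau=0.001):
    """One generic DDPG update on a replay-buffer batch.

    batch: (state, action, reward, next_state) tensors; reward has shape (B, 1).
    Target networks are typically initialized as deep copies of the online networks.
    """
    state, action, reward, next_state = batch

    # Critic: regress Q(s, a) toward r + gamma * Q'(s', mu'(s')).
    with torch.no_grad():
        target_q = reward + gamma * target_critic(next_state, target_actor(next_state))
    critic_loss = F.mse_loss(critic(state, action), target_q)
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: follow the deterministic policy gradient, i.e., maximize Q(s, mu(s)).
    actor_loss = -critic(state, actor(state)).mean()
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

    # Soft-update the target networks toward the online networks.
    for target, online in ((target_actor, actor), (target_critic, critic)):
        for tp, p in zip(target.parameters(), online.parameters()):
            tp.data.mul_(1.0 - tau).add_(tau * p.data)
```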
5. Experiment
5.1. Dataset and Evaluation Metrics
- MovieLens (100k): A stable benchmark dataset of 100,000 ratings given to 1700 movies by 1000 users.
- MovieLens (1M): A stable benchmark dataset consisting of 1 million ratings from 6000 users for 4000 movies.
- Lastfm: A stable benchmark dataset of approximately 93,000 listening records covering about 17,600 songs from roughly 1900 users.
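The results in Section 5.4 are reported as Precision@K and NDCG@K. The sketch below is a minimal reference implementation of these two metrics; treating an item as relevant when its held-out rating clears a threshold is an assumption about the evaluation protocol, not a detail taken from the paper.

```python
import math

def precision_at_k(recommended, relevant, k):
    """Fraction of the top-k recommended items that are relevant."""
    top_k = recommended[:k]
    return sum(item in relevant for item in top_k) / k

def ndcg_at_k(recommended, relevant, k):
    """Normalized discounted cumulative gain over the top-k list (binary relevance)."""
    dcg = sum(1.0 / math.log2(i + 2)
              for i, item in enumerate(recommended[:k]) if item in relevant)
    ideal = sum(1.0 / math.log2(i + 2) for i in range(min(k, len(relevant))))
    return dcg / ideal if ideal > 0 else 0.0

# Example: items 10 and 7 are relevant; the model ranked item 10 first and item 7 third.
print(precision_at_k([10, 3, 7, 5, 2], {10, 7}, k=5))   # 0.4
print(ndcg_at_k([10, 3, 7, 5, 2], {10, 7}, k=5))        # ~0.92
```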
5.2. Baseline Algorithms
- BPR: BPR is a personalized ranking framework, a general learning algorithm that obtains the maximum a posteriori estimator via Bayesian analysis of pairwise preference data.
- NCF: NCF is a neural collaborative filtering recommendation framework that combines matrix factorization with neural networks to learn the user–item interaction function.
- DRR-p: DRR is a deep reinforcement learning based recommendation method that models the dynamic interaction between the user and the recommendation system and provides three state representation structures (a sketch of the three variants follows this list). DRR-p uses the product between items to capture local relationships between features.
- DRR-u: DRR-u is the second state representation structure in the DRR framework; it uses the product between item and user embeddings to capture the relationships between their features.
- DRR-ave: DRR-ave is the third state representation structure in the DRR framework; it applies average pooling to the item embeddings and combines the pooled vector with the user embedding via an inner product to obtain the user state representation.
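As a toy illustration of the three DRR state representations just described, the sketch below builds DRR-p, DRR-u, and DRR-ave style states from a user embedding and the embeddings of the user's most recent items. The embedding size, history length, concatenation order, and the use of element-wise products are illustrative assumptions; the original DRR paper's exact operators may differ in detail.

```python
import torch

def drr_states(user_emb, item_embs):
    """Toy versions of the three DRR state representations.

    user_emb:  (d,)   embedding of the current user
    item_embs: (n, d) embeddings of the user's n most recent positively rated items (n >= 2)
    """
    # DRR-p: pairwise element-wise products between history items (local item-item structure).
    n = item_embs.shape[0]
    pairs = [item_embs[i] * item_embs[j] for i in range(n) for j in range(i + 1, n)]
    drr_p = torch.cat([item_embs.flatten(), torch.stack(pairs).flatten()])

    # DRR-u: element-wise products between the user and each history item.
    drr_u = torch.cat([user_emb, (user_emb * item_embs).flatten(), item_embs.flatten()])

    # DRR-ave: average-pool the history items, then interact the result with the user embedding.
    ave = item_embs.mean(dim=0)
    drr_ave = torch.cat([user_emb, user_emb * ave, ave])

    return drr_p, drr_u, drr_ave

# Example with 16-dimensional embeddings and a history of 5 items.
p, u, a = drr_states(torch.randn(16), torch.randn(5, 16))
```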
5.3. Experimental Setup
5.4. Experimental Results and Analysis
5.5. Parameter Study
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Ge, M.; Persia, F. A survey of multimedia recommendation systems: Challenges and opportunities. Int. J. Semant. Comput. 2017, 11, 411–428.
- Gomez-Uribe, C.; Hunt, N. The Netflix recommender system: Algorithms, business value, and innovation. ACM Trans. Manag. Inf. Syst. 2015, 6, 1–19.
- Seide, F.; Li, G.; Chen, X.; Yu, D. Feature engineering in context-dependent deep neural networks for conversational speech transcription. In Proceedings of the IEEE Workshop on Automatic Speech Recognition & Understanding, Waikoloa, HI, USA, 11–15 December 2011; pp. 24–29.
- Wang, R.; Fu, B.; Fu, G.; Wang, M. Deep & cross network for ad click predictions. In Proceedings of the ADKDD'17, Halifax, NS, Canada, 14 August 2017; pp. 1–7.
- Wang, R.; Shivanna, R.; Cheng, D.; Jain, S.; Lin, D.; Hong, L.; Chi, E. DCN V2: Improved deep & cross network and practical lessons for web-scale learning to rank systems. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 1785–1797.
- Xie, R.; Wang, Y.; Wang, R.; Lu, Y.; Zou, Y.; Xia, F.; Lin, L. Long short-term temporal meta-learning in online recommendation. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, Virtual Event, AZ, USA, 21–25 February 2022; pp. 1168–1176.
- Shivaram, K.; Liu, P.; Shapiro, M.; Bilgic, M.; Culotta, A. Reducing cross-topic political homogenization in content-based news recommendation. In Proceedings of the 16th ACM Conference on Recommender Systems, Seattle, WA, USA, 18–23 September 2022; pp. 220–228.
- Rendle, S. Factorization machines. In Proceedings of the 2010 IEEE International Conference on Data Mining, Sydney, Australia, 13–17 December 2010; pp. 995–1000.
- Purushotham, S.; Liu, Y.; Kuo, J. Collaborative topic regression with social matrix factorization for recommendation systems. In Proceedings of the International Conference on Machine Learning, Edinburgh, UK, 26 June–1 July 2012; pp. 691–698.
- Montanés, E.; Quevedo, J.R.; Díaz, I.; Ranilla, J. Collaborative tag recommendation system based on logistic regression. In Proceedings of the European Conference on Machine Learning / Principles and Practice of Knowledge Discovery in Databases, Bled, Slovenia, 7–11 September 2009; pp. 173–188.
- Sun, Y.; Pan, J.; Zhang, A.; Flores, A. FM2: Field-matrixed factorization machines for recommender systems. In Proceedings of the Web Conference 2021, Ljubljana, Slovenia, 19–23 April 2021; pp. 2828–2837.
- Sedhain, S.; Menon, A.K.; Sanner, S.; Xie, L. AutoRec: Autoencoders meet collaborative filtering. In Proceedings of the 24th International Conference on World Wide Web, Florence, Italy, 18–22 May 2015; pp. 111–112.
- Cheng, H.T.; Koc, L.; Harmsen, J.; Shaked, T.; Chandra, T.; Aradhye, H.; Anderson, G.; Corrado, G.; Chai, W.; Ispir, M.; et al. Wide & deep learning for recommender systems. In Proceedings of the 1st Workshop on Deep Learning for Recommender Systems, Boston, MA, USA, 15 September 2016; pp. 7–10.
- Zhou, G.; Mou, N.; Fan, Y.; Pi, Q.; Bian, W.; Zhou, C.; Zhu, X.; Gai, K. Deep interest evolution network for click-through rate prediction. In Proceedings of the AAAI Conference on Artificial Intelligence; Association for the Advancement of Artificial Intelligence: Palo Alto, CA, USA, 2019; pp. 5941–5948.
- Liu, H.; Wei, Y.; Yin, J.; Nie, L. HS-GCN: Hamming spatial graph convolutional networks for recommendation. IEEE Trans. Knowl. Data Eng. 2022.
- Xu, C.; Guan, Z.; Zhao, W.; Wu, Q.; Yan, M.; Chen, L.; Miao, Q. Recommendation by users' multimodal preferences for smart city applications. IEEE Trans. Ind. Inform. 2021, 17, 4197–4205.
- Wang, K.; Zou, Z.; Deng, Q.; Tao, J.; Wu, R.; Fan, C.; Chen, L.; Cui, P. Reinforcement learning with a disentangled universal value function for item recommendation. In Proceedings of the AAAI Conference on Artificial Intelligence; Association for the Advancement of Artificial Intelligence: Palo Alto, CA, USA, 2021; Volume 35, pp. 4427–4435.
- Zou, L.; Xia, L.; Ding, Z.; Song, J.; Liu, W.; Yin, D. Reinforcement learning to optimize long-term user engagement in recommender systems. In Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, Anchorage, AK, USA, 4–8 August 2019; pp. 2810–2818.
- Zou, L.; Xia, L.; Du, P.; Zhang, Z.; Bai, T.; Liu, W.; Nie, J.Y.; Yin, D. Pseudo Dyna-Q: A reinforcement learning framework for interactive recommendation. In Proceedings of the 13th International Conference on Web Search and Data Mining, Houston, TX, USA, 3–7 February 2020; pp. 816–824.
- Ma, J.; Zhao, Z.; Yi, X.; Yang, J.; Chen, M.; Tang, J.; Hong, L.; Chi, E.H. Off-policy learning in two-stage recommender systems. In Proceedings of The Web Conference 2020, Taipei, Taiwan, 20–24 April 2020; pp. 463–473.
- Xin, X.; Karatzoglou, A.; Arapakis, I.; Jose, J.M. Supervised advantage actor-critic for recommender systems. In Proceedings of the Fifteenth ACM International Conference on Web Search and Data Mining, Virtual Event, AZ, USA, 21–25 February 2022; pp. 1186–1196.
- Liu, F.; Tang, R.; Li, X.; Zhang, W.; Ye, Y.; Chen, H.; Guo, H.; Zhang, Y. Deep reinforcement learning based recommendation with explicit user-item interactions modeling. arXiv 2018, arXiv:1810.12027.
- Silver, D.; Lever, G.; Heess, N.; Degris, T.; Wierstra, D.; Riedmiller, M. Deterministic policy gradient algorithms. Int. Conf. Mach. Learn. 2014, 32, 387–395.
- Lillicrap, T.P.; Hunt, J.J.; Pritzel, A.; Heess, N.; Erez, T.; Tassa, Y.; Silver, D.; Wierstra, D. Continuous control with deep reinforcement learning. arXiv 2015, arXiv:1509.02971.
- Li, Z.; Zhao, H.; Liu, Q.; Huang, Z.; Mei, T.; Chen, E. Learning from history and present: Next-item recommendation via discriminatively exploiting user behaviors. In Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining, London, UK, 19–23 August 2018; pp. 1734–1743.
- Rendle, S.; Freudenthaler, C.; Gantner, Z.; Schmidt-Thieme, L. BPR: Bayesian personalized ranking from implicit feedback. In Proceedings of the Twenty-Fifth Conference on Uncertainty in Artificial Intelligence, Montreal, QC, Canada, 18–21 June 2009; pp. 452–461.
- He, X.; Liao, L.; Zhang, H.; Nie, L.; Hu, X.; Chua, T.S. Neural collaborative filtering. In Proceedings of the 26th International Conference on World Wide Web, Perth, Australia, 3–7 April 2017; pp. 173–182.
| Dataset | Users | Items | Ratings |
| --- | --- | --- | --- |
| MovieLens (100k) | 943 | 1682 | 100,000 |
| MovieLens (1M) | 6040 | 3952 | 1,000,209 |
| Lastfm | 1892 | 17,632 | 92,834 |
| Model | Precision@5 | Precision@10 | NDCG@5 | NDCG@10 |
| --- | --- | --- | --- | --- |
| BPR | 0.4814 | 0.6532 | 0.3284 | 0.3838 |
| NCF | 0.5217 | 0.6066 | 0.3501 | 0.4164 |
| DRR-p | 0.6866 | 0.6679 | 0.9318 | 0.9314 |
| DRR-u | 0.6991 | 0.6670 | 0.9429 | 0.9247 |
| DRR-ave | 0.6848 | 0.6562 | 0.9414 | 0.9278 |
| DDRCN | 0.7312 | 0.7121 | 0.9489 | 0.9341 |

| Model | Precision@5 | Precision@10 | NDCG@5 | NDCG@10 |
| --- | --- | --- | --- | --- |
| BPR | 0.5200 | 0.6937 | 0.3528 | 0.4046 |
| NCF | 0.4478 | 0.6308 | 0.2985 | 0.2564 |
| DRR-p | 0.7718 | 0.7483 | 0.9490 | 0.9412 |
| DRR-u | 0.7712 | 0.7393 | 0.9534 | 0.9341 |
| DRR-ave | 0.7839 | 0.7415 | 0.9508 | 0.9389 |
| DDRCN | 0.7970 | 0.7516 | 0.9546 | 0.9413 |

| Model | Precision@5 | Precision@10 | NDCG@5 | NDCG@10 |
| --- | --- | --- | --- | --- |
| BPR | 0.6173 | 0.6332 | 0.4929 | 0.4359 |
| NCF | 0.6081 | 0.6390 | 0.4928 | 0.4214 |
| DRR-p | 0.6266 | 0.5902 | 0.9355 | 0.9152 |
| DRR-u | 0.6312 | 0.6000 | 0.9343 | 0.9207 |
| DRR-ave | 0.6492 | 0.6191 | 0.9376 | 0.9238 |
| DDRCN | 0.6516 | 0.6402 | 0.9419 | 0.9254 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).