Clothing Recommendation with Multimodal Feature Fusion: Price Sensitivity and Personalization Optimization
Abstract
1. Introduction
- From the perspective of price sensitivity, a price feature extraction module was designed to capture price-related characteristics of clothing brands and categories, as well as users’ historical behavior, enabling the model to provide recommendations aligned with users’ consumption preferences.
- By leveraging transfer learning, pre-trained models were introduced into clothing outfit recommendation, enriching the feature representations that form the model’s initial input vectors.
- For multimodal recommendation, we propose a framework that integrates a multi-head attention mechanism into the multimodal recommendation algorithm. The framework explores improved ways for features from different modalities to interact and be fused, thereby strengthening the model’s ability to learn from diverse modal features.
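The third contribution fuses modalities with multi-head attention, but the architecture details of Section 3.2 are not reproduced in this text. As a rough illustration only, the sketch below applies standard scaled dot-product multi-head self-attention to one embedding per modality (image, text, price) and mean-pools the outputs into a single fused vector. The embedding size, number of heads, random weights, mean-pooling step and the function name `multi_head_fusion` are all assumptions made for the example, not the authors' implementation.

```python
import math
import random


def matmul(a, b):
    """Multiply an (n x k) matrix by a (k x m) matrix, both given as lists of rows."""
    return [[sum(a[i][t] * b[t][j] for t in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def random_matrix(rows, cols, rng):
    """Assumed initialization: Gaussian weights scaled by 1/sqrt(rows)."""
    return [[rng.gauss(0.0, 1.0 / math.sqrt(rows)) for _ in range(cols)] for _ in range(rows)]


def multi_head_fusion(tokens, num_heads, rng):
    """Hypothetical fusion: multi-head self-attention over modality tokens, then mean-pooling.

    tokens: one embedding per modality (e.g. image, text, price), all of length d.
    Returns (fused_vector, per_token_outputs).
    """
    n, d = len(tokens), len(tokens[0])
    if d % num_heads != 0:
        raise ValueError("embedding size must be divisible by num_heads")
    d_head = d // num_heads
    w_q, w_k, w_v, w_o = (random_matrix(d, d, rng) for _ in range(4))
    q, k, v = matmul(tokens, w_q), matmul(tokens, w_k), matmul(tokens, w_v)

    heads_concat = [[] for _ in range(n)]
    for h in range(num_heads):
        cols = range(h * d_head, (h + 1) * d_head)  # this head's slice of Q, K and V
        for i in range(n):
            # Scaled dot-product scores of modality i against every modality j.
            scores = [sum(q[i][c] * k[j][c] for c in cols) / math.sqrt(d_head) for j in range(n)]
            weights = softmax(scores)
            # Attention-weighted sum of value vectors, restricted to this head's columns.
            heads_concat[i].extend(sum(weights[j] * v[j][c] for j in range(n)) for c in cols)

    out = matmul(heads_concat, w_o)  # output projection after concatenating the heads
    fused = [sum(out[i][c] for i in range(n)) / n for c in range(d)]  # assumed mean-pooling
    return fused, out


# Toy usage with made-up 4-dimensional embeddings for the three modalities.
image_emb = [0.2, -0.1, 0.5, 0.3]
text_emb = [0.1, 0.4, -0.2, 0.0]
price_emb = [0.0, 0.0, 1.0, 0.0]
fused, per_token = multi_head_fusion([image_emb, text_emb, price_emb], num_heads=2,
                                     rng=random.Random(0))
print(len(fused), len(per_token))  # 4 3
```

Treating each modality as a token lets every head learn its own weighting of image, text and price information; a real model would learn the projection matrices instead of sampling them.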
2. Related Work
3. The Proposed Model
3.1. Feature Extraction and Embedding
3.1.1. Image Feature Extraction and Embedding
3.1.2. Text Feature Extraction and Embedding
3.1.3. Price Feature Extraction and Embedding
- (1) Single Item Price Feature
- (2) Combination Price Feature
- (3) Price Statistical Features from Historical Outfit Records
- (4) Category Price Mean and Standard Deviation
- (5) Brand Price Mean and Standard Deviation
- (6) User Brand Preference Price Features
- (7) Discretization of Price Features
- (8) One-Hot Encoding
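The eight price features are only named in this text, so their exact formulas are unknown. The pure-Python sketch below shows one plausible reading of each item under explicit assumptions: the toy catalogue and user history are invented, the combination price is taken as the sum of item prices, standard deviations are population standard deviations, the user brand preference price is the mean price within the user's most frequent brand, and discretization uses equal-frequency (quantile) bins. All function names are hypothetical.

```python
import bisect
import statistics
from collections import Counter, defaultdict

# Hypothetical toy catalogue: item_id -> (category, brand, price).
ITEMS = {
    "t1": ("tops", "brandA", 120.0),
    "t2": ("tops", "brandB", 60.0),
    "b1": ("bottoms", "brandA", 200.0),
    "b2": ("bottoms", "brandC", 80.0),
    "s1": ("shoes", "brandB", 150.0),
}
CATEGORY, BRAND, PRICE = 0, 1, 2


def item_price(item_id):
    """(1) Single item price feature: the item's listed price."""
    return ITEMS[item_id][PRICE]


def outfit_price(outfit):
    """(2) Combination price feature, assumed here to be the sum of item prices."""
    return sum(item_price(i) for i in outfit)


def history_stats(history):
    """(3) Statistics of combination prices over a user's past outfits."""
    prices = [outfit_price(o) for o in history]
    return {"mean": statistics.mean(prices), "std": statistics.pstdev(prices),
            "min": min(prices), "max": max(prices)}


def group_price_stats(field):
    """(4)/(5) Price mean and population std per category (CATEGORY) or brand (BRAND)."""
    groups = defaultdict(list)
    for record in ITEMS.values():
        groups[record[field]].append(record[PRICE])
    return {key: (statistics.mean(p), statistics.pstdev(p)) for key, p in groups.items()}


def preferred_brand_price(history):
    """(6) Assumed definition: the user's most frequent brand and its mean item price."""
    worn = [i for outfit in history for i in outfit]
    top_brand, _ = Counter(ITEMS[i][BRAND] for i in worn).most_common(1)[0]
    return top_brand, statistics.mean([item_price(i) for i in worn if ITEMS[i][BRAND] == top_brand])


def quantile_edges(values, num_bins):
    """(7) Equal-frequency cut points, assumed to be computed on training prices."""
    ordered = sorted(values)
    return [ordered[len(ordered) * b // num_bins] for b in range(1, num_bins)]


def one_hot_bucket(value, edges):
    """(7)+(8) Discretize a price into its bucket, then one-hot encode the bucket index."""
    bucket = bisect.bisect_right(edges, value)
    vector = [0] * (len(edges) + 1)
    vector[bucket] = 1
    return vector


history = [["t1", "b1"], ["t1", "s1"]]  # hypothetical user history: two outfits
edges = quantile_edges([r[PRICE] for r in ITEMS.values()], num_bins=3)
print(history_stats(history))  # {'mean': 295.0, 'std': 25.0, 'min': 270.0, 'max': 320.0}
print(group_price_stats(CATEGORY)["tops"])  # (90.0, 30.0)
print(preferred_brand_price(history))  # brandA and its mean price within this history
print(edges, one_hot_bucket(120.0, edges))  # [80.0, 150.0] [0, 1, 0]
```

Discretizing before one-hot encoding turns a continuous price into a small categorical vocabulary, which is the form that factorization-machine style models such as DeepFM usually consume; the bin edges should be fitted on training data only.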
3.2. Recommendation Model
4. Results and Discussions
4.1. Experimental Settings
4.1.1. Dataset
4.1.2. Evaluation Metrics
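The result tables report AUC, Recall@K and NDCG@K, but their definitions are not included in this text. The sketch below uses common textbook definitions, assuming binary relevance, one ranked candidate list per user, Recall@K normalized by the total number of relevant items, and pairwise AUC with ties counted as one half. The paper's exact protocol (for example, how candidate lists are sampled) may differ, and the function names are illustrative.

```python
import math


def auc(scores, labels):
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative example")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def recall_at_k(ranked, relevant, k):
    """Fraction of the relevant items that appear in the top-k of the ranked list."""
    if not relevant:
        raise ValueError("relevant set must be non-empty")
    hits = sum(1 for item in ranked[:k] if item in relevant)
    return hits / len(relevant)


def ndcg_at_k(ranked, relevant, k):
    """Binary-relevance NDCG@k: DCG with a 1/log2(position + 1) discount over the ideal DCG."""
    if not relevant:
        raise ValueError("relevant set must be non-empty")
    # enumerate() is 0-based, so rank + 2 equals the 1-based position + 1.
    dcg = sum(1.0 / math.log2(rank + 2) for rank, item in enumerate(ranked[:k]) if item in relevant)
    idcg = sum(1.0 / math.log2(rank + 2) for rank in range(min(len(relevant), k)))
    return dcg / idcg


ranked = ["a", "b", "c", "d", "e", "f"]  # hypothetical ranking for one user
relevant = {"b", "e"}
print(recall_at_k(ranked, relevant, 5), round(ndcg_at_k(ranked, relevant, 5), 3))
print(auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0]))  # 0.75
```

Recall@K only checks whether relevant items reach the top K, while NDCG@K also rewards placing them near the top, which is why the two metrics can rank models differently.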
4.2. Result Analysis and Performance Comparison
4.2.1. Baseline
4.2.2. Comprehensive Performance of Comparative Experiments
4.2.3. Comparative Test Case Study
4.2.4. Comprehensive Performance of Ablation Experiments
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Item Category | Quantity |
---|---|
Outerwear | 35,765 |
Tops | 119,895 |
Bottoms | 77,813 |
Dresses | 25,816 |
Shoes | 106,598 |
Accessories | 306,448 |
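A quick calculation on the category counts in the dataset table shows how the items are distributed. The counts are copied from the table; only the arithmetic is new.

```python
# Item counts per category, copied from the dataset table.
counts = {
    "Outerwear": 35_765,
    "Tops": 119_895,
    "Bottoms": 77_813,
    "Dresses": 25_816,
    "Shoes": 106_598,
    "Accessories": 306_448,
}
total = sum(counts.values())
shares = {name: n / total for name, n in counts.items()}

print(f"Total items: {total:,}")  # Total items: 672,335
for name, share in sorted(shares.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{name:<12}{share:7.1%}")
```

Accessories account for roughly 45.6% of all items and dresses for about 3.8%, so the category distribution is uneven.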
Dataset | Model | AUC | Recall@5 | Recall@10 | NDCG@5 | NDCG@10 |
---|---|---|---|---|---|---|
Image, text | LR | 0.680 | 0.231 | 0.267 | 0.298 | 0.419 |
Image, text | FM | 0.759 | 0.415 | 0.516 | 0.388 | 0.483 |
Image, text | Wide&Deep | 0.763 | 0.461 | 0.549 | 0.405 | 0.501 |
Image, text | GP-BPR | 0.779 | 0.486 | 0.586 | 0.438 | 0.530 |
Image, text | DeepFM | 0.786 | 0.519 | 0.601 | 0.455 | 0.553 |
Image, text | DeepFMP | 0.802 | 0.536 | 0.622 | 0.476 | 0.585 |
Image, text, price | LR | 0.709 | 0.239 | 0.278 | 0.311 | 0.436 |
Image, text, price | FM | 0.786 | 0.432 | 0.538 | 0.405 | 0.503 |
Image, text, price | Wide&Deep | 0.794 | 0.479 | 0.572 | 0.422 | 0.526 |
Image, text, price | GP-BPR | 0.811 | 0.506 | 0.610 | 0.456 | 0.552 |
Image, text, price | DeepFM | 0.816 | 0.545 | 0.635 | 0.483 | 0.596 |
Image, text, price | DeepFMP | 0.833 | 0.558 | 0.648 | 0.495 | 0.609 |
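The effect of adding price features can be read off the comparison table by differencing each model's AUC across the two feature settings. The snippet below only redoes that arithmetic on the published values.

```python
# AUC per model from the comparison table: (image + text, image + text + price).
auc_by_model = {
    "LR": (0.680, 0.709),
    "FM": (0.759, 0.786),
    "Wide&Deep": (0.763, 0.794),
    "GP-BPR": (0.779, 0.811),
    "DeepFM": (0.786, 0.816),
    "DeepFMP": (0.802, 0.833),
}

# Absolute gain rounded to the table's three decimals, printed with the relative gain.
price_gain = {m: round(with_p - without_p, 3) for m, (without_p, with_p) in auc_by_model.items()}
for model, (without_p, with_p) in auc_by_model.items():
    print(f"{model:<10} +{price_gain[model]:.3f} AUC ({(with_p - without_p) / without_p:+.1%})")
```

Every model gains AUC when price features are added, from +0.027 (FM) to +0.032 (GP-BPR).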
Dataset | Model | AUC | Recall@5 | Recall@10 | NDCG@5 | NDCG@10 |
---|---|---|---|---|---|---|
Image, text | Base | 0.786 | 0.519 | 0.601 | 0.455 | 0.553 |
Image, text | Base + attention | 0.802 | 0.536 | 0.622 | 0.476 | 0.585 |
Image, text, price | Base | 0.816 | 0.545 | 0.635 | 0.483 | 0.596 |
Image, text, price | Base + attention | 0.833 | 0.558 | 0.648 | 0.495 | 0.609 |
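In the ablation table, the Base rows coincide with DeepFM and the Base + attention rows with DeepFMP in the comparison table. Expressing the attention gains as relative improvements makes the two feature settings easier to compare; the snippet only recomputes ratios from the published values.

```python
METRICS = ["AUC", "Recall@5", "Recall@10", "NDCG@5", "NDCG@10"]

# (Base, Base + attention) rows from the ablation table.
ablation = {
    "image, text": ([0.786, 0.519, 0.601, 0.455, 0.553],
                    [0.802, 0.536, 0.622, 0.476, 0.585]),
    "image, text, price": ([0.816, 0.545, 0.635, 0.483, 0.596],
                           [0.833, 0.558, 0.648, 0.495, 0.609]),
}


def relative_gain(base, with_attention):
    """Relative improvement (after - before) / before for each metric."""
    return {m: (a - b) / b for m, b, a in zip(METRICS, base, with_attention)}


gains = {setting: relative_gain(*rows) for setting, rows in ablation.items()}
for setting, per_metric in gains.items():
    print(setting, {m: f"{g:+.1%}" for m, g in per_metric.items()})
```

Attention improves every metric in both settings. The relative gains on the ranking metrics are larger without price features (for example about +5.8% versus +2.2% on NDCG@10), while the AUC gain is about 2% in both cases.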
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Zhang, C.; Ji, X.; Cai, L. Clothing Recommendation with Multimodal Feature Fusion: Price Sensitivity and Personalization Optimization. Appl. Sci. 2025, 15, 4591. https://doi.org/10.3390/app15084591
Chicago/Turabian Style: Zhang, Chunhui, Xiaofen Ji, and Liling Cai. 2025. "Clothing Recommendation with Multimodal Feature Fusion: Price Sensitivity and Personalization Optimization" Applied Sciences 15, no. 8: 4591. https://doi.org/10.3390/app15084591