Large Language Models for Structured and Semi-Structured Data, Recommender Systems and Knowledge Base Engineering: A Survey of Recent Techniques and Architectures
Abstract
1. Introduction
1.1. Large Language Models: Definition and Emergence
1.2. Global Adoption Trends of LLMs
1.3. Application of LLMs in Recommendation Systems
1.4. Objectives and Structure
2. Materials and Methods
2.1. Search Strategy
- Data category: Queries aimed at capturing studies on LLMs processing structured/semi-structured data, using combinations like the following:
- (LLM OR "large language model") AND ("structured data" OR "semi-structured data") AND ("vectorization" OR "knowledge graph" OR "embedding" OR "data transformation")
- Recommendation category: Queries focused on systems integrating LLMs with auxiliary knowledge components for recommendation purposes, such as the following:
- (LLM OR "large language model") AND ("recommendation system" OR "recommender system" OR "retrieval augmented generation" OR RAG) AND ("vector store" OR "vector database" OR "graph database" OR "knowledge base")
- Technical identification category: Searches targeting architectural and methodological studies on knowledge base construction and LLM integration, such as the following:
- LLM AND "knowledge base"
- (LLM OR "language model") AND ("knowledge base" OR "knowledge graph") AND (integration OR architecture OR evaluation OR design)

An illustrative sketch of how such query strings can be assembled programmatically follows this list.
2.2. Inclusion and Exclusion Criteria
Studies were included if they:
- Were published between 1 January 2023 and 15 July 2025;
- Were written in English;
- Addressed at least one of the review’s three thematic areas:
- the use of LLMs with structured or semi-structured data;
- the integration of LLMs into recommendation systems, incorporating auxiliary knowledge structures; or
- the design, evaluation or integration techniques for knowledge bases tailored for LLMs;
- Provided original empirical results, novel system implementations, architectural frameworks, or quantitative evaluations;
- Were available as peer-reviewed journal articles, conference papers, or technical reports with substantive technical contributions (including preprints from trusted repositories such as arXiv or OpenReview).
Studies were excluded if they:
- Focused solely on theoretical discussions, conceptual frameworks or opinion pieces without accompanying technical implementation or empirical validation;
- Addressed general-purpose natural language processing (NLP) or LLM capabilities without specific relevance to structured/semi-structured data processing, knowledge integration, or recommendation systems as defined by the thematic areas;
- Were published before 1 January 2023;
- Were not written in English;
- Were duplicates of already identified records;
- Were inaccessible in full-text format after reasonable attempts (e.g., checking institutional access, public repositories); or
- Were non-academic sources such as blog posts, editorials, news articles, or promotional materials.
2.3. Screening and Eligibility—PRISMA
- Automated Filtering: An initial batch of records was retrieved across all databases. The 61,451 records referenced in Figure 1 were not manually reviewed at the title, abstract, or full-text screening level; they were excluded via automated filtering based on metadata issues (e.g., missing abstracts or titles), duplicate entries, non-English language, or thematic irrelevance (e.g., unrelated domains such as biology or pure linguistics). Only entries with valid metadata that matched the initial inclusion keywords were retained (a minimal sketch of such a filter follows this list).
- Keyword-Based Thematic Filtering: Remaining records were filtered using structured Boolean queries aligned with the three review categories (structured/semi-structured data processing, LLM-based recommendation systems, and technical architectures for knowledge augmentation). This step reduced the pool to 227 potentially relevant studies.
- Title and Abstract Screening: Two independent reviewers (Reviewer A and Reviewer B) evaluated the titles and abstracts of the 227 remaining records. Inclusion required explicit alignment with at least one of the review’s three thematic categories. Studies containing relevant keywords but lacking substantive focus on system-level implementation or integration (e.g., generic NLP tasks) were excluded. Disagreements were resolved through discussion; a third reviewer (Reviewer C) was designated as arbiter but was not needed in practice.
- Full-Text Review: A full-text evaluation was conducted on 102 records. Three could not be retrieved due to access limitations. From the remaining 99, 11 studies were excluded—7 were not primary research (e.g., opinion pieces or conceptual overviews), and 4 were duplicate versions of previously assessed work. This led to a final inclusion of 88 studies.
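To make the automated pre-screening stage concrete, the sketch below shows, under stated assumptions, how records with missing metadata, non-English entries, and duplicates could be removed. The record field names (title, abstract, language, doi) and the deduplication key are illustrative assumptions; the exact implementation of the filtering scripts is not reported here.

```python
# Illustrative sketch of the automated pre-screening stage. Field names
# and the deduplication key are assumptions; the thematic-relevance check
# (matching the Section 2.1 keywords) is omitted for brevity.

def automated_filter(records: list[dict]) -> list[dict]:
    seen: set[str] = set()
    kept: list[dict] = []
    for rec in records:
        title = (rec.get("title") or "").strip()
        abstract = (rec.get("abstract") or "").strip()
        if not title or not abstract:            # missing metadata
            continue
        if rec.get("language", "en") != "en":    # non-English record
            continue
        key = rec.get("doi") or title.lower()    # crude duplicate key
        if key in seen:                          # duplicate entry
            continue
        seen.add(key)
        kept.append(rec)
    return kept

records = [
    {"title": "FastRAG", "abstract": "RAG for semi-structured data.", "doi": "10.0/x", "language": "en"},
    {"title": "FastRAG", "abstract": "RAG for semi-structured data.", "doi": "10.0/x", "language": "en"},
    {"title": "No abstract here", "abstract": "", "language": "en"},
]
print(len(automated_filter(records)))  # -> 1
```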
2.4. Data Collection and Thematic Categorization
2.5. Quality Considerations and Risk of Bias
3. Results
3.1. Characteristics of the Identified Literature
3.2. Temporal and Thematic Trends
3.3. Publication Venues and Thematic Focus
4. Discussion and Limitations
- First, although notable advancements have been achieved, handling structured and semi-structured data remains a persistent challenge. Studies reveal that LLMs are highly sensitive to input formatting and schema variability, and that they lack robustness in symbolic reasoning tasks. While approaches such as reinforcement learning-based context reduction and the construction of vectorized knowledge bases show promise, they introduce considerable complexity and demand careful domain-specific adaptation.
- Second, within technical integration, methods like RAG and KG enhancement significantly improve factual accuracy and reduce hallucinations (a minimal sketch of the retrieval step appears after this list). However, critical issues such as knowledge drift, retrieval noise, and representational complexity remain unresolved. Although domain-specific fine-tuning and human-in-the-loop validation strategies improve reliability, they often require substantial resource investments, raising questions about scalability and broader industrial applicability.
- Third, in the domain of LLM-based recommendation systems, research has made strides in enhancing personalization, fairness, and robustness. Nevertheless, challenges persist, including item popularity bias, vulnerability to adversarial inputs, and the efficiency of prompting strategies. Moreover, there is a delicate trade-off between increasing model expressiveness and maintaining strict alignment with explicit user preferences, particularly when integrating user histories or handling ambiguous queries.
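As a minimal, self-contained sketch of the retrieval step referenced above: a query is embedded, the most similar knowledge snippets are retrieved, and they are prepended to the prompt so the model can ground its answer. The bag-of-words embedding and the sample snippets are toy assumptions for illustration; production RAG systems use learned dense embeddings, a vector database, and an actual LLM call.

```python
# Toy RAG retrieval sketch. The embed() function and knowledge_base
# contents are illustrative assumptions, not any surveyed system's design.
import math
from collections import Counter

def embed(text: str) -> Counter:
    return Counter(text.lower().split())          # bag-of-words "embedding"

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

knowledge_base = [
    "Dubrovnik's Old Town is a UNESCO World Heritage Site.",
    "The city walls of Dubrovnik are about 1,940 meters long.",
    "Retrieval noise degrades answer quality when irrelevant text is injected.",
]

def retrieve(query: str, k: int = 2) -> list[str]:
    q = embed(query)
    ranked = sorted(knowledge_base, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

query = "How long are the city walls of Dubrovnik?"
context = "\n".join(retrieve(query))
prompt = f"Answer using only the context below.\nContext:\n{context}\n\nQuestion: {query}"
print(prompt)  # this grounded prompt would then be sent to the LLM
```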
4.1. Limitations of Current Research
- Bias and Fairness: Despite advances such as IFairLRS, most recommender systems continue to display biases towards popular or semantically favored items. Comprehensive strategies for systematic bias mitigation are still underdeveloped.
- Robustness to Adversarial Attacks: Few studies adequately address the vulnerability of LLM-augmented systems to adversarial perturbations, though methods like RETURN highlight promising directions.
- Data Dependency and Limited Generalization: Many proposed solutions heavily rely on curated datasets or domain-specific knowledge bases, which restricts their generalizability across diverse application areas.
- Computational Efficiency: Several frameworks introduce significant computational overhead, with trade-offs between model accuracy, interpretability, and scalability often insufficiently explored.
- Evaluation Metrics: A lack of standardized, domain-specific evaluation protocols persists, particularly for multi-modal and conversational recommender systems.
4.2. Broader Implications and Future Research Directions
- Scalable hybrid architectures: There is a need for modular systems that integrate Retrieval-Augmented Generation (RAG) with knowledge graphs (KGs) and domain-specific ontologies. While models like K-RAGRec [10] and FastRAG [11] demonstrate initial success, future architectures should emphasize transferability across domains without sacrificing retrieval precision or inflating inference latency.
- Fairness metrics tailored to recommendation settings: Studies such as Jiang et al. (2024) [12] introduced item-side fairness via the IFairLRS framework, but broader adoption of such metrics is lacking. Future work should develop fairness benchmarks that explicitly account for long-tail item exposure, semantic group fairness, and user-group fairness trade-offs.
- Robustness under adversarial and noisy conditions: Despite promising frameworks like RETURN (Ning et al., 2025) [13], few works systematically test LLM-based recommendation systems under perturbed or adversarial conditions. Simulation-based adversarial testing environments should be adopted to evaluate real-world system resilience.
- Cross-domain generalization: Many current solutions rely on heavily curated or domain-specific datasets. Techniques such as iterative reasoning (StructGPT) and schema linking show promise but remain underexplored outside of narrow contexts. Research should investigate prompt adaptation, meta-learning, and modular representation learning to reduce overfitting to specific domains.
- Evaluation frameworks: There is a lack of unified, multimodal evaluation pipelines. Benchmarks such as RecBench+ (Huang et al., 2025) [14] offer a start but are limited in scope. A comprehensive framework should combine trustworthiness, transparency, hallucination resistance, response latency, and alignment with user intent.
5. LLMs for Structured and Semi-Structured Data
6. Architectures and Trends in LLM Recommenders
7. Enhancing LLMs with RAG, Knowledge Graphs, and Prompts
8. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
AI | Artificial intelligence
GPT | Generative Pre-trained Transformer
KG | Knowledge Graph
LLM | Large Language Model
NLP | Natural Language Processing
PRISMA | Preferred Reporting Items for Systematic Reviews and Meta-Analyses
RAG | Retrieval-Augmented Generation
RL | Reinforcement Learning
RS, RecSys | Recommendation System
SAR | Sustainability Augmented Reranking
References
- Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arora, S.; von Arx, M.; Bernstein, M.S.; Bohg, J.; Bosselut, A.; Brunskill, E.; et al. On the Opportunities and Risks of Foundation Models. arXiv 2022, arXiv:2108.07258. [Google Scholar]
- Brown, T.B.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. In Proceedings of the Annual Conference on Neural Information Processing Systems 2020, Virtual, 6–12 December 2020. [Google Scholar]
- Deloitte. Deloitte’s State of Generative AI in the Enterprise Quarter Four Report. 2024. Available online: https://www2.deloitte.com/content/dam/Deloitte/us/Documents/consulting/us-state-of-gen-ai-q4.pdf (accessed on 2 May 2025).
- McKinsey. The Economic Potential of Generative AI: The Next Productivity Frontier. 2023. Available online: https://www.mckinsey.com/~/media/mckinsey/business%20functions/mckinsey%20digital/our%20insights/the%20economic%20potential%20of%20generative%20ai%20the%20next%20productivity%20frontier/the-economic-potential-of-generative-ai-the-next-productivity-frontier.pdf (accessed on 2 May 2025).
- Li, H.; Xi, J.; Hsu, C.H.; Yu, B.X.; Zheng, X.K. Generative artificial intelligence in tourism management: An integrative review and roadmap for future research. Tour. Manag. 2025, 110, 105179. [Google Scholar] [CrossRef]
- Xiao, X. MMAgentRec, a personalized multi-modal recommendation agent with large language model. Sci. Rep. 2025, 15, 12062. [Google Scholar] [CrossRef]
- Zakarija, I.; Škopljanac Mačina, F.; Marušić, H.; Blašković, B. A Sentiment Analysis Model Based on User Experiences of Dubrovnik on the Tripadvisor Platform. Appl. Sci. 2024, 14, 8304. [Google Scholar] [CrossRef]
- Brynjolfsson, E.; Li, D.; Raymond, L. Generative AI at work. Q. J. Econ. 2025, 140, 889–942. [Google Scholar] [CrossRef]
- Acemoglu, D.; Restrepo, P. Artificial intelligence, automation, and work. In The Economics of Artificial Intelligence: An Agenda; University of Chicago Press: Chicago, IL, USA, 2018; pp. 197–236. [Google Scholar]
- Wang, S.; Fan, W.; Feng, Y.; Ma, X.; Wang, S.; Yin, D. Knowledge Graph Retrieval-Augmented Generation for LLM-based Recommendation. arXiv 2025, arXiv:2501.02226. [Google Scholar]
- Abane, A.; Bekri, A.; Battou, A. FastRAG: Retrieval Augmented Generation for Semi-structured Data. arXiv 2024, arXiv:2411.13773. [Google Scholar]
- Jiang, M.; Bao, K.; Zhang, J.; Wang, W.; Yang, Z.; Feng, F.; He, X. Item-side Fairness of Large Language Model-based Recommendation System. In Proceedings of the ACM Web Conference 2024, Singapore, 13–17 May 2024. [Google Scholar]
- Ning, L.-b.; Fan, W.; Li, Q. Retrieval-Augmented Purifier for Robust LLM-Empowered Recommendation. arXiv 2025, arXiv:2504.02458. [Google Scholar]
- Huang, J.; Wang, S.; Ning, L.-b.; Fan, W.; Wang, S.; Yin, D.; Li, Q. Towards Next-Generation Recommender Systems: A Benchmark for Personalized Recommendation Assistant with LLMs. arXiv 2025, arXiv:2503.09382. [Google Scholar]
- Li, B.; Jiang, G.; Li, N.; Song, C. Research on large-scale structured and unstructured data processing based on large language model. In Proceedings of the International Conference on Machine Learning, Pattern Recognition and Automation Engineering, Singapore, 7–9 August 2024; pp. 111–116. [Google Scholar]
- Ko, H.; Yang, H.; Han, S.; Kim, S.; Lim, S.; Hormazabal, R. Filling in the Gaps: LLM-Based Structured Data Generation from Semi-Structured Scientific Data. In Proceedings of the ICML 2024 AI for Science Workshop, Vienna, Austria, 21–27 July 2024. [Google Scholar]
- Moundas, M.; White, J.; Schmidt, D.C. Prompt patterns for structured data extraction from unstructured text. In Proceedings of the 31st Pattern Languages of Programming (PLoP) Conference, Columbia River Gorge, WA, USA, 13–16 October 2024. [Google Scholar]
- Wu, X.; Tsioutsiouliklis, K. Thinking with Knowledge Graphs: Enhancing LLM Reasoning Through Structured Data. arXiv 2024, arXiv:2412.10654. [Google Scholar] [CrossRef]
- Zhong, Y.; Deng, Y.; Chai, C.; Gu, R.; Yuan, Y.; Wang, G.; Cao, L. Doctopus: A System for Budget-aware Structural Data Extraction from Unstructured Documents. In Proceedings of the Companion of the 2025 International Conference on Management of Data, Berlin, Germany, 22–27 June 2025; pp. 275–278. [Google Scholar]
- Huang, X.; Surve, M.; Liu, Y.; Luo, T.; Wiest, O.; Zhang, X.; Chawla, N.V. Application of Large Language Models in Chemistry Reaction Data Extraction and Cleaning. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA, 21–25 October 2024; pp. 3797–3801. [Google Scholar]
- Sui, Y.; Zhou, M.; Zhou, M.; Han, S.; Zhang, D. Table Meets LLM: Can Large Language Models Understand Structured Table Data? A Benchmark and Empirical Study. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, Merida, Mexico, 4–8 March 2024. [Google Scholar]
- Lee, Y.; Kim, S.; Rossi, R.A.; Yu, T.; Chen, X. Learning to reduce: Towards improving performance of large language models on structured data. arXiv 2024, arXiv:2407.02750. [Google Scholar] [CrossRef]
- Fang, X.; Xu, W.; Tan, F.A.; Zhang, J.; Hu, Z.; Qi, Y.; Nickleach, S.; Socolinsky, D.; Sengamedu, S.; Faloutsos, C. Large Language Models (LLMs) on Tabular Data: Prediction, Generation, and Understanding—A Survey. arXiv 2024, arXiv:2402.17944. [Google Scholar]
- Zhang, Y.; Zhong, M.; Ouyang, S.; Jiao, Y.; Zhou, S.; Ding, L.; Han, J. Automated Mining of Structured Knowledge from Text in the Era of Large Language Models. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; pp. 6644–6654. [Google Scholar]
- Zou, Y.; Shi, M.; Chen, Z.; Deng, Z.; Lei, Z.; Zeng, Z.; Yang, S.; Tong, H.; Xiao, L.; Zhou, W. ESGReveal: An LLM-based approach for extracting structured data from ESG reports. J. Clean. Prod. 2025, 489, 144572. [Google Scholar] [CrossRef]
- Jiang, J.; Zhou, K.; Dong, Z.; Ye, K.; Zhao, W.X.; Wen, J.R. StructGPT: A general framework for large language model to reason over structured data. arXiv 2023, arXiv:2305.09645. [Google Scholar] [CrossRef]
- Chen, H.; Shen, X.; Wang, J.; Wang, Z.; Lv, Q.; He, J.; Wu, R.; Wu, F.; Ye, J. Knowledge Graph Finetuning Enhances Knowledge Manipulation in Large Language Models. In Proceedings of the Thirteenth International Conference on Learning Representations, Singapore, 24–28 April 2025. [Google Scholar]
- Tan, Z.; Li, D.; Wang, S.; Beigi, A.; Jiang, B.; Bhattacharjee, A.; Karami, M.; Li, J.; Cheng, L.; Liu, H. Large Language Models for Data Annotation and Synthesis: A Survey. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; pp. 930–957. [Google Scholar]
- Ghiani, G.; Solazzo, G.; Elia, G. Integrating Large Language Models and Optimization in Semi-Structured Decision Making: Methodology and a Case Study. Algorithms 2024, 17, 582. [Google Scholar] [CrossRef]
- Paoli, S.D. Performing an Inductive Thematic Analysis of Semi-Structured Interviews with a Large Language Model: An Exploration and Provocation on the Limits of the Approach. Soc. Sci. Comput. Rev. 2024, 42, 997–1019. [Google Scholar] [CrossRef]
- Wang, L.; Ma, C.; Feng, X.; Zhang, Z.; Yang, H.; Zhang, J.; Chen, Z.; Tang, J.; Chen, X.; Lin, Y.; et al. A survey on large language model based autonomous agents. Front. Comput. Sci. 2024, 18, 186345. [Google Scholar] [CrossRef]
- Loureiro, S.M.C.; Guerreiro, J.; Friedmann, E.; Lee, M.J.; Han, H. Tourists and artificial intelligence-LLM interaction: The power of forgiveness. Curr. Issues Tour. 2024, 28, 1172–1190. [Google Scholar] [CrossRef]
- Hou, Y.; Zhang, A.; Sheng, L.; Yang, Z.; Wang, X.; Chua, T.S.; McAuley, J. Generative Recommendation Models: Progress and Directions. In Proceedings of the Companion Proceedings of the ACM on Web Conference 2025, Sydney, Australia, 28 April–2 May 2025; pp. 13–16. [Google Scholar]
- Lin, J.; Dai, X.; Xi, Y.; Liu, W.; Chen, B.; Zhang, H.; Liu, Y.; Wu, C.; Li, X.; Zhu, C.; et al. How Can Recommender Systems Benefit from Large Language Models: A Survey. ACM Trans. Inf. Syst. 2025, 43, 28. [Google Scholar] [CrossRef]
- Zhao, Z.; Fan, W.; Li, J.; Liu, Y.; Mei, X.; Wang, Y.; Wen, Z.; Wang, F.; Zhao, X.; Tang, J.; et al. Recommender Systems in the Era of Large Language Models (LLMs). IEEE Trans. Knowl. Data Eng. 2024, 36, 6889–6907. [Google Scholar] [CrossRef]
- Chen, J.; Liu, Z.; Huang, X.; Wu, C.; Liu, Q.; Jiang, G.; Pu, Y.; Lei, Y.; Chen, X.; Wang, X.; et al. When large language models meet personalization: Perspectives of challenges and opportunities. World Wide Web 2024, 27, 42. [Google Scholar] [CrossRef]
- Zhang, J.; Xie, R.; Hou, Y.; Zhao, X.; Lin, L.; Wen, J.R. Recommendation as Instruction Following: A Large Language Model Empowered Recommendation Approach. ACM Trans. Inf. Syst. 2025, 43, 114. [Google Scholar] [CrossRef]
- Xu, L.; Zhang, J.; Li, B.; Wang, J.; Chen, S.; Zhao, W.X.; Wen, J.R. Tapping the Potential of Large Language Models as Recommender Systems: A Comprehensive Framework and Empirical Analysis. ACM Trans. Knowl. Discov. Data 2024, 19, 105. [Google Scholar] [CrossRef]
- Lian, J.; Lei, Y.; Huang, X.; Yao, J.; Xu, W.; Xie, X. RecAI: Leveraging Large Language Models for Next-Generation Recommender Systems. In Proceedings of the Companion Proceedings of the ACM Web Conference 2024, Singapore, 13–17 May 2024; pp. 1031–1034. [Google Scholar]
- Ren, Y.; Chen, Z.; Yang, X.; Li, L.; Jiang, C.; Cheng, L.; Zhang, B.; Mo, L.; Zhou, J. Enhancing Sequential Recommenders with Augmented Knowledge from Aligned Large Language Models. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, Washington, DC, USA, 14–18 July 2024; pp. 345–354. [Google Scholar]
- Chu, Z.; Wang, Z.; Zhang, R.; Ji, Y.; Wang, H.; Sun, T. Improve Temporal Awareness of LLMs for Domain-general Sequential Recommendation. In Proceedings of the ICML 2024 Workshop on In-Context Learning, Vienna, Austria, 27 July 2024. [Google Scholar]
- Na, H.; Gang, M.; Ko, Y.; Seol, J.; Lee, S.g. Enhancing Large Language Model Based Sequential Recommender Systems with Pseudo Labels Reconstruction. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, FL, USA, 12–16 November 2024; pp. 7213–7222. [Google Scholar]
- Yang, T.; Chen, L. Unleashing the Retrieval Potential of Large Language Models in Conversational Recommender Systems. In Proceedings of the 18th ACM Conference on Recommender Systems, Bari, Italy, 14–18 October 2024; pp. 43–52. [Google Scholar]
- Zhu, Y.; Wan, C.; Steck, H.; Liang, D.; Feng, Y.; Kallus, N.; Li, J. Collaborative Retrieval for Large Language Model-based Conversational Recommender Systems. In Proceedings of the ACM on Web Conference 2025, Sydney, Australia, 28 April–2 May 2025; pp. 3323–3334. [Google Scholar]
- Kim, S.; Kang, H.; Choi, S.; Kim, D.; Yang, M.; Park, C. Large Language Models meet Collaborative Filtering: An Efficient All-round LLM-based Recommender System. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; pp. 1395–1406. [Google Scholar]
- Zhu, Y.; Wu, L.; Guo, Q.; Hong, L.; Li, J. Collaborative Large Language Model for Recommender Systems. In Proceedings of the ACM Web Conference 2024, Singapore, 13–17 May 2024; pp. 3162–3172. [Google Scholar]
- Qu, H.; Fan, W.; Lin, S. Generative Recommendation with Continuous-Token Diffusion. arXiv 2025, arXiv:2504.12007. [Google Scholar] [CrossRef]
- Wei, C.; Duan, K.; Zhuo, S.; Wang, H.; Huang, S.; Liu, J. Enhanced Recommendation Systems with Retrieval-Augmented Large Language Model. J. Artif. Int. Res. 2025, 28, 1147–1173. [Google Scholar] [CrossRef]
- Jeong, C.; Kang, Y.; Cho, Y.S. Leveraging Refined Negative Feedback with LLM for Recommender Systems. In Proceedings of the Companion Proceedings of the ACM on Web Conference 2025, Sydney, Australia, 28 April–2 May 2025; pp. 1028–1032. [Google Scholar]
- Fayyazi, A.; Kamal, M.; Pedram, M. FACTER: Fairness-Aware Conformal Thresholding and Prompt Engineering for Enabling Fair LLM-Based Recommender Systems. In Proceedings of the Forty-Second International Conference on Machine Learning, Vancouver, BC, Canada, 13–19 July 2025. [Google Scholar]
- Zhao, J.; Wang, W.; Xu, C.; Ng, S.K.; Chua, T.S. A Federated Framework for LLM-based Recommendation. In Proceedings of the Findings of the Association for Computational Linguistics: NAACL 2025, Albuquerque, NM, USA, 30 April 2025; pp. 2852–2865. [Google Scholar]
- Wang, J.; Karatzoglou, A.; Arapakis, I.; Jose, J.M. Large Language Model driven Policy Exploration for Recommender Systems. In Proceedings of the Eighteenth ACM International Conference on Web Search and Data Mining, Hannover, Germany, 10–14 March 2025; pp. 107–116. [Google Scholar]
- Tan, Z.; Zeng, Q.; Tian, Y.; Liu, Z.; Yin, B.; Jiang, M. Democratizing Large Language Models via Personalized Parameter-Efficient Fine-tuning. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; pp. 6476–6491. [Google Scholar]
- Zhao, S.; Hong, M.; Liu, Y.; Hazarika, D.; Lin, K. Do LLMs Recognize Your Preferences? Evaluating Personalized Preference Following in LLMs. In Proceedings of the Thirteenth International Conference on Learning Representations, Singapore, 24–28 April 2025. [Google Scholar]
- Wu, Z.; Jia, Q.; Wu, C.; Du, Z.; Wang, S.; Wang, Z.; Dong, Z. RecSys Arena: Pair-wise Recommender System Evaluation with Large Language Models. arXiv 2024, arXiv:2412.11068. [Google Scholar]
- Rajput, S.; Mehta, N.; Singh, A.; Keshavan, R.H.; Vu, T.; Heldt, L.; Hong, L.; Tay, Y.; Tran, V.Q.; Samost, J.; et al. Recommender Systems with Generative Retrieval. In Proceedings of the Thirty-Seventh Conference on Neural Information Processing Systems, New Orleans, LA, USA, 10–16 December 2023. [Google Scholar]
- Yu, Y.; Qi, S.a.; Li, B.; Niu, D. PepRec: Progressive Enhancement of Prompting for Recommendation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; pp. 17941–17953. [Google Scholar]
- Wang, H.; Wu, C.; Huang, Y.; Qi, T. Learning Human Feedback from Large Language Models for Content Quality-aware Recommendation. ACM Trans. Inf. Syst. 2025, 43, 86. [Google Scholar] [CrossRef]
- Shang, F.; Zhao, F.; Zhang, M.; Sun, J.; Shi, J. Personalized recommendation systems powered by large language models: Integrating semantic understanding and user preferences. Int. J. Innov. Res. Eng. Manag. 2024, 11, 39–49. [Google Scholar] [CrossRef]
- Liu, H.; Tang, X.; Chen, T.; Liu, J.; Indu, I.; Zou, H.P.; Dai, P.; Galan, R.F.; Porter, M.D.; Jia, D.; et al. Sequential LLM Framework for Fashion Recommendation. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing: Industry Track, Miami, FL, USA, 12–16 November 2024; pp. 1276–1285. [Google Scholar]
- Liu, X.; Wang, R.; Sun, D.; Hakkani Tur, D.; Abdelzaher, T. Uncovering Cross-Domain Recommendation Ability of Large Language Models. In Proceedings of the Companion Proceedings of the ACM on Web Conference 2025, Sydney, Australia, 28 April–2 May 2025; pp. 2736–2743. [Google Scholar]
- Weber, I. Large Language Models are Pattern Matchers: Editing Semi-Structured and Structured Documents with ChatGPT. arXiv 2024, arXiv:2409.07732. [Google Scholar] [CrossRef]
- Huang, C.; Wu, J.; Xia, Y.; Yu, Z.; Wang, R.; Yu, T.; Zhang, R.; Rossi, R.A.; Kveton, B.; Zhou, D.; et al. Towards Agentic Recommender Systems in the Era of Multimodal Large Language Models. arXiv 2025, arXiv:2503.16734. [Google Scholar]
- Zhang, J.; Liu, Z.; Lian, D.; Chen, E. Generalization Error Bounds for Two-stage Recommender Systems with Tree Structure. In Proceedings of the Thirty-Eighth Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024. [Google Scholar]
- Baumann, J.; Mendler-Dünner, C. Algorithmic Collective Action in Recommender Systems: Promoting Songs by Reordering Playlists. In Proceedings of the Thirty-Eighth Annual Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 10–15 December 2024. [Google Scholar]
- Tian, J.; Wang, Z.; Zhao, J.; Ding, Z. MMREC: LLM Based Multi-Modal Recommender System. In Proceedings of the 2024 19th International Workshop on Semantic and Social Media Adaptation & Personalization (SMAP), Athens, Greece, 21–22 November 2024; pp. 105–110. [Google Scholar]
- Prahlad, D.; Lee, C.; Kim, D.; Kim, H. Personalizing Large Language Models using Retrieval Augmented Generation and Knowledge Graph. In Proceedings of the Companion Proceedings of the ACM on Web Conference 2025, Sydney, Australia, 28 April–2 May 2025; pp. 1259–1263. [Google Scholar]
- Song, S.; Yang, C.; Xu, L.; Shang, H.; Li, Z.; Chang, Y. TravelRAG: A Tourist Attraction Retrieval Framework Based on Multi-Layer Knowledge Graph. ISPRS Int. J. Geo-Inf. 2024, 13, 414. [Google Scholar] [CrossRef]
- Banerjee, A.; Satish, A.; Wörndl, W. Enhancing Tourism Recommender Systems for Sustainable City Trips Using Retrieval-Augmented Generation. In Proceedings of the Recommender Systems for Sustainability and Social Good, Prague, Czech Republic, 22–26 September 2025; pp. 19–34. [Google Scholar]
- Yang, H.; Guo, J.; Qi, J.Q.; Xie, J.; Zhang, S.; Yang, S.; Li, N.; Xu, M. A Method for Parsing and Vectorization of Semi-structured Data used in Retrieval Augmented Generation. arXiv 2024, arXiv:2405.03989. [Google Scholar] [CrossRef]
- Li, J.; Yuan, Y.; Zhang, Z. Enhancing LLM Factual Accuracy with RAG to Counter Hallucinations: A Case Study on Domain-Specific Queries in Private Knowledge-Bases. arXiv 2024, arXiv:2403.10446. [Google Scholar] [CrossRef]
- Wu, Z.; Lin, X.; Dai, Z.; Hu, W.; Shu, Y.; Ng, S.K.; Jaillet, P.; Low, B.K.H. Prompt Optimization with EASE? Efficient Ordering-aware Automated Selection of Exemplars. In Proceedings of the Annual Conference on Neural Information Processing Systems 2024, Vancouver, BC, Canada, 10–15 December 2024; pp. 122706–122740. [Google Scholar]
- Ding, Y.; Fan, W.; Huang, X.; Li, Q. Large Language Models for Graph Learning. In Proceedings of the Companion Proceedings of the ACM Web Conference 2024, Singapore, 13–17 May 2024; pp. 1643–1646. [Google Scholar]
- Wang, X.; Cui, J.; Fukumoto, F.; Suzuki, Y. Enhancing High-order Interaction Awareness in LLM-based Recommender Model. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, Miami, FL, USA, 12–16 November 2024; pp. 11696–11711. [Google Scholar]
- Zhang, Q.; Dong, J.; Chen, H.; Zha, D.; Yu, Z.; Huang, X. KnowGPT: Knowledge Graph based Prompting for Large Language Models. In Proceedings of the Annual Conference on Neural Information Processing Systems 2024, Vancouver, BC, Canada, 10–15 December 2024; pp. 6052–6080. [Google Scholar]
- Qiu, Z.; Luo, L.; Zhao, Z.; Pan, S.; Liew, A.W.C. Graph Retrieval-Augmented LLM for Conversational Recommendation Systems. In Proceedings of the Advances in Knowledge Discovery and Data Mining, Sydney, Australia, 10–13 June 2025; pp. 344–355. [Google Scholar]
- Yang, W.; Some, L.; Bain, M.; Kang, B. A comprehensive survey on integrating large language models with knowledge-based methods. Knowl.-Based Syst. 2025, 318, 113503. [Google Scholar] [CrossRef]
- Benjira, W.; Atigui, F.; Bucher, B.; Grim-Yefsah, M.; Travers, N. Automated mapping between SDG indicators and open data: An LLM-augmented knowledge graph approach. Data Knowl. Eng. 2025, 156, 102405. [Google Scholar] [CrossRef]
- Choi, S.; Jung, Y. Knowledge Graph Construction: Extraction, Learning, and Evaluation. Appl. Sci. 2025, 15, 3727. [Google Scholar] [CrossRef]
- Tsaneva, S.; Dessì, D.; Osborne, F.; Sabou, M. Knowledge graph validation by integrating LLMs and human-in-the-loop. Inf. Process. Manag. 2025, 62, 104145. [Google Scholar] [CrossRef]
- Yao, L.; Peng, J.; Mao, C.; Luo, Y. Exploring Large Language Models for Knowledge Graph Completion. In Proceedings of the ICASSP 2025—2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 6–11 April 2025; pp. 1–5. [Google Scholar]
- Chao, W.S.; Zheng, Z.; Zhu, H.; Liu, H. Make Large Language Model a Better Ranker. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, FL, USA, 12–16 November 2024; pp. 918–929. [Google Scholar]
- Hsu, S.; Khattab, O.; Finn, C.; Sharma, A. Grounding by Trying: LLMs with Reinforcement Learning-Enhanced Retrieval. In Proceedings of the Thirteenth International Conference on Learning Representations, Singapore, 24–28 April 2025. [Google Scholar]
- Bansal, H.; Hosseini, A.; Agarwal, R.; Tran, V.Q.; Kazemi, M. Smaller, Weaker, Yet Better: Training LLM Reasoners via Compute-Optimal Sampling. In Proceedings of the Thirteenth International Conference on Learning Representations, Singapore, 24–28 April 2025. [Google Scholar]
- Kumar, K.; Ashraf, T.; Thawakar, O.; Anwer, R.M.; Cholakkal, H.; Shah, M.; Yang, M.H.; Torr, P.H.; Khan, S.H.; Khan, F.S. LLM Post-Training: A Deep Dive into Reasoning Large Language Models. arXiv 2025, arXiv:2502.21321. [Google Scholar] [CrossRef]
- Zhang, J. Guided Profile Generation Improves Personalization with Large Language Models. In Proceedings of the Findings of the Association for Computational Linguistics: EMNLP 2024, Miami, FL, USA, 12–16 November 2024; pp. 4005–4016. [Google Scholar]
- Huang, T.J.; Yang, J.Q.; Shen, C.; Liu, K.Q.; Zhan, D.C.; Ye, H.J. Improving LLMs for Recommendation with Out-of-Vocabulary Tokens. In Proceedings of the Forty-Second International Conference on Machine Learning, Vancouver, BC, Canada, 13–19 July 2025. [Google Scholar]
- Thomas, V.; Ma, J.; Hosseinzadeh, R.; Golestan, K.; Yu, G.; Volkovs, M.; Caterini, A. Retrieval & Fine-Tuning for In-Context Tabular Models. In Proceedings of the Annual Conference on Neural Information Processing Systems 2024, Vancouver, BC, Canada, 10–15 December 2024; pp. 108439–108467. [Google Scholar]
- Liao, M.; Chen, W.; Shen, J.; Guo, S.; Wan, H. HMoRA: Making LLMs More Effective with Hierarchical Mixture of LoRA Experts. In Proceedings of the Thirteenth International Conference on Learning Representations, Singapore, 24–28 April 2025. [Google Scholar]
- Wei, Q.; Yang, M.; Wang, J.; Mao, W.; Xu, J.; Ning, H. TourLLM: Enhancing LLMs with Tourism Knowledge. arXiv 2024, arXiv:2407.12791. [Google Scholar]
- Jeong, C. Domain-specialized LLM: Financial fine-tuning and utilization method using Mistral 7B. J. Intell. Inf. Syst. 2024, 30, 93–120. [Google Scholar] [CrossRef]
- Sordoni, A.; Yuan, E.; Côté, M.A.; Pereira, M.; Trischler, A.; Xiao, Z.; Hosseini, A.; Niedtner, F.; Le Roux, N. Joint Prompt Optimization of Stacked LLMs using Variational Inference. In Proceedings of the Annual Conference on Neural Information Processing Systems 2023, New Orleans, LA, USA, 10–16 December 2023; pp. 58128–58151. [Google Scholar]
- Xu, X.; Ye, Q.; Ren, X. Stress-Testing Long-Context Language Models with Lifelong ICL and Task Haystack. In Proceedings of the Annual Conference on Neural Information Processing Systems 2024, Vancouver, BC, Canada, 10–15 December 2024; pp. 15801–15840. [Google Scholar]
- Jesson, A.; Beltran-Velez, N.; Chu, Q.; Karlekar, S.; Kossen, J.; Gal, Y.; Cunningham, J.P.; Blei, D. Estimating the Hallucination Rate of Generative AI. In Proceedings of the Annual Conference on Neural Information Processing Systems 2024, Vancouver, BC, Canada, 10–15 December 2024; pp. 31154–31201. [Google Scholar]
- Ahmed, B.S.; Baader, L.O.; Bayram, F.; Jagstedt, S.; Magnusson, P. Quality Assurance for LLM-RAG Systems: Empirical Insights from Tourism Application Testing. In Proceedings of the 2025 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), Naples, Italy, 31 March–4 April 2025; pp. 200–207. [Google Scholar]
- Qian, Y.; Ye, H.; Fauconnier, J.P.; Grasch, P.; Yang, Y.; Gan, Z. MIA-Bench: Towards Better Instruction Following Evaluation of Multimodal LLMs. In Proceedings of the Thirteenth International Conference on Learning Representations, Singapore, 24–28 April 2025. [Google Scholar]
- Zhang, X.; Wang, D.; Dou, L.; Zhu, Q.; Che, W. A Survey of Table Reasoning with Large Language Models. arXiv 2024, arXiv:2402.08259. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).