Systematic Review

Knowledge Graphs and Their Reciprocal Relationship with Large Language Models

Management Science Department, Cape Breton University, Sydney, NS B1M 1A2, Canada
* Author to whom correspondence should be addressed.
These authors contributed equally to this work.
Mach. Learn. Knowl. Extr. 2025, 7(2), 38; https://doi.org/10.3390/make7020038
Submission received: 6 March 2025 / Revised: 7 April 2025 / Accepted: 12 April 2025 / Published: 21 April 2025
(This article belongs to the Section Data)

Abstract

The reciprocal relationship between Large Language Models (LLMs) and Knowledge Graphs (KGs) highlights their synergistic potential in enhancing artificial intelligence (AI) applications. LLMs, with their natural language understanding and generative capabilities, support the automation of KG construction through entity recognition, relation extraction, and schema generation. Conversely, KGs serve as structured and interpretable data sources that improve the transparency, factual consistency and reliability of LLM-based applications, mitigating challenges such as hallucinations and lack of explainability. This study conducts a systematic literature review of 77 studies to examine AI methodologies supporting LLM–KG integration, including symbolic AI, machine learning, and hybrid approaches. The research explores diverse applications spanning healthcare, finance, justice, and industrial automation, revealing the transformative potential of this synergy. Through in-depth analysis, this study identifies key limitations in current approaches, including challenges in scaling and maintaining dynamic, real-time Knowledge Graphs; difficulty in adapting general-purpose LLMs to specialized domains; limited explainability in tracing model outputs to interpretable reasoning; and ethical concerns surrounding bias, fairness, and transparency. In response, the study highlights potential strategies to optimize LLM–KG synergy. The findings from this study provide actionable insights for researchers and practitioners aiming for robust, transparent, and adaptive AI systems to enhance knowledge-driven AI applications through LLM–KG integration, further advancing generative AI and explainable AI (XAI) applications.

1. Introduction

In today’s artificial intelligence (AI)-driven world, understanding and utilizing structured and unstructured data effectively has become critical for decision-making in various domains. Recently, Large Language Models (LLMs) have gained popularity through well-received generative AI applications such as ChatGPT, DeepSeek, Gemini, and Copilot, and through domain-specific models, highlighting their importance in decision-making systems [1]. Despite this success, LLMs face several well-known challenges stemming from their lack of factual knowledge. There have been instances of LLMs failing to recall facts from their training corpus and producing hallucinations. For instance, an LLM might answer “Facebook bought YouTube in 2006” when asked “When was YouTube bought by Facebook?”, which contradicts the fact that Google bought YouTube in 2006. This challenge creates trust issues with LLMs.

LLMs have also been criticized for their lack of interpretability as black-box models. They store knowledge implicitly in their parameters, making it hard to understand or verify the information they provide. Moreover, LLMs make decisions based on probabilities, which can lead to uncertain or unclear reasoning. Even when LLMs try to explain their reasoning using methods like chain of thought, these explanations can include false or misleading information, another manifestation of the hallucination issue. This makes them unreliable in critical domains like medical diagnosis or legal decisions, where mistakes can have serious consequences [2]. For example, in medicine, an LLM might give the wrong diagnosis and back it up with reasoning that does not match established medical knowledge. Another challenge is that LLMs trained on general information often struggle with domain adaptation due to the lack of domain-specific knowledge or new training data [3]. This limits their ability to adapt to unique or evolving domains. These challenges show that the specific patterns and functions LLMs use to predict outcomes or decisions are not directly accessible or explainable to humans. The complexity of LLMs often leaves users questioning how outputs are generated and whether the decisions made by algorithms are fair and accurate. Addressing these challenges requires a deeper understanding of the reciprocal relationship between LLMs and Knowledge Graphs and of how LLMs can be made explainable while being effectively integrated.

On the other hand, a Knowledge Graph (KG), a semantic network for structured data representation, facilitates reasoning, information retrieval, and contextual insights. KG technologies such as entity recognition, relationship extraction, and schema generation make AI models more explainable [4]. With their structured and factual nature, KGs offer a way to mitigate the explainability concerns of LLMs and improve their output and contextual accuracy [5]. A key outcome of KG–LLM integration is understandability and transparency through Explainable AI (XAI), which bridges the interpretability gap between complex models and human users and plays a pivotal role in building trust in these integrated systems [6]. Conversely, LLMs augment KG construction by automating processes like entity extraction and relationship identification from unstructured data [7]. This is crucial for applications like medical KGs, where inaccuracies can have critical implications [8].
LLMs use vector representations and neural networks to model natural language, enabling them to generate human-like text and process complex information [9]. As a result, when LLMs and KGs are coupled together, they can improve machine learning reasoning and data-centric decision-making and provide Findable, Accessible, Interoperable and Reusable (FAIR), unbiased, explainable systems [10]. To understand the reciprocal relationship between these two technologies and to support the creation of more robust systems, we conducted a systematic review of the existing literature and outlined the following research questions:
  • RQ1: How are LLMs being used to construct KGs?
  • RQ2: How are KGs being used to improve the output of LLMs?
  • RQ3: What AI methodologies are used for LLM-based KG systems and KG-based LLMs?
The main contributions of this paper arise from the investigation of these three research questions. By answering them, this study systematically explores the LLM–KG relationship, categorizes AI methodologies, addresses explainability challenges through XAI, and provides insights into domain-specific applications like healthcare and finance. First, we offer a systematic and structured review of 77 research papers, highlighting key advancements in AI methodologies for constructing and utilizing KGs. Second, we provide a comprehensive classification framework that categorizes existing techniques into three primary groups—symbolic AI, machine learning, and evolutionary computation—together with a comparative analysis of the strengths and limitations inherent in each approach. Third, this study identifies critical challenges such as (1) scalability, where real-time updates and the integration of large-scale KGs into LLM workflows remain computationally intensive; (2) domain adaptation, as general-purpose LLMs often struggle to align effectively with domain-specific knowledge embedded in KGs; (3) explainability, due to the difficulty of tracing outputs to human-understandable reasoning; and (4) ethical considerations, including fairness, bias mitigation, and data governance challenges. It also proposes how emerging methods—including XAI frameworks, retrieval-augmented generation, and hybrid neuro-symbolic models—can address these issues. Finally, we outline the gaps and limitations that pave the way for future research, emphasizing the need for dynamic updates to Knowledge Graphs, multimodal integration, and bias mitigation strategies to further the practical application of LLM-driven Knowledge Graphs, ultimately moving toward a better understanding of this integration and utilizing the collaborative strengths of LLMs and KGs to create systems that are robust, versatile, and trustworthy.

2. Literature Review

Neoteric advancements in integrating LLMs and KGs have laid the groundwork for improving AI systems’ reasoning and decision-making. In particular, recent studies and frameworks such as Auto-KG Agent [11], RAG-based pipelines [12] and domain-specific models like VieMedKG [13] and KARGEN [14] exemplify current progress in real-world applications of LLM–KG integration. Pan et al., 2024, and Ren et al., 2024, emphasize the capabilities of LLMs in automating KG construction from unstructured data. These studies demonstrate technical advancements in automating KG creation and improving scalability, particularly in domains with large unstructured datasets [3,15]. However, the focus has largely been on the technical aspects of LLM-driven KG construction, with indirect and limited attention paid to the critical role of this integration, i.e., explainability in ensuring the interpretability and trustworthiness of these systems.
Other studies such as Zhu et al., 2024, highlight how KGs serve as inputs to LLM applications by providing structured, factual knowledge that mitigates hallucinations and improves contextual accuracy. Frameworks like Retrieval Augmented Generation (RAG) illustrate the synergy of this integration but often fall short in addressing how XAI can trace outputs to verifiable KG data [8]. While these existing studies acknowledge the value of explainability, they lack a systematic exploration of XAI’s role in building user trust and meeting regulatory requirements in sensitive domains such as healthcare and finance.
Despite advancements and progress in LLM–KG integration, critical gaps persist in understanding their bidirectional relationship. The existing literature often treats LLMs and KGs in isolation, fragmenting insights into how LLMs enhance KG construction (e.g., via entity extraction) or how domain-specific KGs (e.g., medical ontologies) improve LLM reliability. While Zhu et al. pioneered a unified framework exploring KGs as both inputs and outputs for LLMs, their work overlooks systematic methodologies for ensuring traceability and accountability through XAI [8]. For instance, though they identify XAI’s potential to bridge interpretability gaps, they fail to operationalize how fairness or auditability can be achieved in practice [5,7]. Hence, questions about which AI techniques support LLM–KG workflows and how XAI principles can be tailored to domain-specific needs remain unanswered. Our work addresses these gaps by rigorously mapping methodologies for LLM–KG integration (e.g., fine-tuning, Retrieval Augmented Generation, and hybrid architectures) and proposing an XAI-driven framework to enhance transparency in high-stakes domains like healthcare and finance. By extending the research beyond the current literature, we present a focused approach toward understanding LLM–KG systems and prioritizing adaptability, ensuring systems evolve with emerging knowledge while maintaining explainability—a critical step toward unlocking LLM–KG potential in real-world applications.

3. Methods

Our study adopts the PRISMA-ScR (Preferred Reporting Items for Systematic Reviews and Meta-Analyses extension for Scoping Reviews) [16] methodology to examine how LLMs integrate with KGs in the context of generative AI applications. PRISMA-ScR provides a framework that allows us to conduct a structured, comprehensive assessment of the existing literature, capturing patterns, identifying research gaps, and understanding the trend of this integration over recent years. Drawing on PRISMA-ScR guidelines, we structured this review around a defined protocol for study selection and analysis to achieve robust results.

3.1. Specific Research Questions

As our primary goal is to provide an overview of LLM–KG integration through studies published across various domains, we formulated three specific research questions that guided our collection, extraction, analysis, and synthesis of evidence. These three research questions (RQ1–RQ3) were selected based on the emerging challenges and opportunities identified in the literature on LLM–KG integration. RQ1: How are LLMs being used to construct KGs? RQ1 explores how LLMs are utilized to construct KGs, addressing the growing interest in automating KG creation from unstructured data sources, a capability critically required in domains such as healthcare, law, and education. RQ2: How are KGs being used to improve the outputs of LLMs? RQ2 examines how KGs can be used to enhance LLM outputs by mitigating issues like hallucinations and improving contextual accuracy, which are concerns in real-world applications. RQ3: What AI methodologies are used for LLM-based KG systems and KG-based LLMs? RQ3 surveys the AI methodologies that underpin this integration in both directions. Together, these research questions provide a comprehensive lens through which the bidirectional relationship between LLMs and KGs can be analyzed, categorized, and advanced, while also identifying methodological gaps and investigating the challenges faced in this integration.

3.2. Data Sources and Search Strategy

To prepare a cumulative dataset for the systematic review, we conducted an automated search by applying a search string across four academic search engines and libraries and then manually selected studies focused on the topic of interest. We used targeted terms such as “Large Language Model” and “Knowledge Graph” to capture a wide range of studies on LLM and KG integration. In this research, we referred to major scientific databases and search engines, namely the ACM Digital Library, IEEE Xplore, Google Scholar, and Elsevier Scopus.
We opted for a partially automated approach because search engines differ in how they structure search strings. To accommodate this, we developed a targeted search query using Boolean operators to capture relevant publications: (“Large Language Model” OR “LLM”) AND (“Knowledge Graph” OR “KG”). Furthermore, many of the initially retrieved papers did not align with the scope of our research, which made subsequent manual screening necessary.
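To illustrate, the sketch below shows one way such a query could be assembled from the term groups and adapted per database; the Scopus-style field syntax is only an illustrative adaptation, not the exact string submitted to each engine.

```python
# Illustrative assembly of the Boolean search string.
# The base query mirrors the one reported above; the Scopus adaptation
# (TITLE-ABS-KEY) is shown only as an example of per-database syntax.

llm_terms = ['"Large Language Model"', '"LLM"']
kg_terms = ['"Knowledge Graph"', '"KG"']

base_query = f"({' OR '.join(llm_terms)}) AND ({' OR '.join(kg_terms)})"
scopus_query = f"TITLE-ABS-KEY({base_query})"

print(base_query)    # ("Large Language Model" OR "LLM") AND ("Knowledge Graph" OR "KG")
print(scopus_query)
```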

3.3. Inclusion and Exclusion Criteria

We applied four inclusion and exclusion criteria to evaluate the studies retrieved through our search for the period 2019 to 2024. Papers were excluded if they met any of the exclusion criteria, and papers adhering to the eligibility criteria were included in the research process. Table 1 below states the criteria.

3.4. Selection Process

To ensure comprehensive literature coverage, papers were initially identified using the search string on the major scientific databases; we then manually narrowed down the list of retrieved papers by selecting the studies that aligned with the scope of our research based on metadata such as title, abstract, and keywords. The titles, years, and DOIs of these papers were extracted from the databases by executing a Python script that used the Selenium library to automate these steps and store the relevant information in an Excel file. We additionally used Pandas for data cleaning operations, such as removing duplicates and filtering entries with missing metadata (e.g., DOIs or publication years). Ambiguous cases were resolved through full-text reviews and collaborative review sessions among the authors to ensure consistency and accuracy of inclusion. The process is further elaborated in Figure 1 below.
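The following is a minimal sketch of such a collection-and-cleaning pipeline. The URL, CSS selectors, and file names are placeholders rather than the exact ones used in our script; the snippet assumes a results page whose entries expose title, year, and DOI fields.

```python
# Minimal sketch of the metadata collection and cleaning step.
# Selectors, URL, and file names are illustrative placeholders.
import pandas as pd
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get("https://example-database.org/search?q=...")  # placeholder search URL

records = []
for entry in driver.find_elements(By.CSS_SELECTOR, ".search-result"):  # placeholder selector
    records.append({
        "title": entry.find_element(By.CSS_SELECTOR, ".title").text,
        "year": entry.find_element(By.CSS_SELECTOR, ".year").text,
        "doi": entry.find_element(By.CSS_SELECTOR, ".doi").text,
    })
driver.quit()

df = pd.DataFrame(records)
df = df.drop_duplicates(subset="title")   # remove duplicate titles
df = df.dropna(subset=["year", "doi"])    # drop entries with missing metadata
df.to_excel("retrieved_papers.xlsx", index=False)
```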
We retrieved 3658 papers that were potentially relevant to our analysis. Once the titles were extracted, we first removed duplicate results, which left us with 3317 results. Next, we removed papers published on arXiv, leaving 2701 results. Afterward, we also removed results published in books, blogs, and forums, leaving a total of 2639 results. Furthermore, we filtered out articles with blank or missing attributes such as year and DOI. After applying the exclusion and inclusion criteria, we reduced the total number of papers to 2374 for further processing. Then, to systematically filter studies, we applied a filtering function to the titles of the shortlisted papers, scanning for the query keywords “Large Language Model/LLM” and “Knowledge Graph/KG” to ensure the papers were relevant to the research topic, and we followed the defined inclusion and exclusion criteria to select papers aligned with our research focus; a sketch of this filtering step is shown below. After this step, we were left with 1006 articles. We then proceeded with the initial screening, skim-reading the titles of all collected papers to determine relevance and skimming abstracts. Papers that clearly did not meet the inclusion criteria were marked for exclusion and documented in an exclusion log (see Appendix Table A1 for an example of paper screening).
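A minimal sketch of the title-filtering step is given below; the keyword lists follow the query terms above, while the function names, file names, and the DataFrame column are illustrative assumptions.

```python
# Illustrative title filter: keep papers whose titles mention both an
# LLM-related keyword and a KG-related keyword. Names are placeholders.
import re
import pandas as pd

LLM_KEYWORDS = ("large language model", "llm")
KG_KEYWORDS = ("knowledge graph", "kg")

def contains_any(text: str, keywords) -> bool:
    # Whole-word matching so short keywords like "kg" do not match inside words.
    return any(re.search(rf"\b{k}\b", text, flags=re.IGNORECASE) for k in keywords)

def title_matches(title: str) -> bool:
    return contains_any(title, LLM_KEYWORDS) and contains_any(title, KG_KEYWORDS)

df = pd.read_excel("retrieved_papers.xlsx")        # file from the previous sketch
shortlist = df[df["title"].apply(title_matches)]    # assumes a "title" column
shortlist.to_excel("keyword_filtered_papers.xlsx", index=False)
```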
We individually inspected the titles and decided to remove 250 articles, leaving 756 results shortlisted for the next step of reviewing abstracts. During the secondary screening, papers that passed the initial screening were reviewed in greater depth by examining the abstract, introduction, and conclusion sections to assess the authors’ objectives and contributions. In this step, we carefully analyzed the abstract and title of each paper to determine its alignment with our research questions. After manually reviewing each paper and evaluating its domain and research focus, 181 papers were initially identified and shortlisted based on our established inclusion and exclusion criteria. We then re-examined and re-evaluated the dataset to determine the significance of each study after refining our research questions. Ultimately, 77 papers were deemed directly aligned with the updated research questions and were incorporated into the final analysis; the remaining papers were excluded due to their limited relevance to the revised scope of our study. Figure 2 below depicts the flow of the systematic review.
Whenever there was a conflict during this process, we performed an in-depth review: papers whose relevance remained ambiguous after secondary screening underwent a full-text review. This allowed for a thorough assessment, particularly for papers that addressed complex methodologies or had overlapping themes. We reviewed the full text of each such paper and decided, based on its primary domain, whether to include it in the meta-analysis. This stage provided further clarification on the relevance of the papers to our research questions.

3.5. Dimensions

To understand the complete horizon of the LLM and KG integration, we collected information across several key dimensions, including the title of the paper for easy reference, the year of publication to analyze trends over time, and the source to identify the most relevant libraries contributing to the research.
  • Domain: In the context of this research, this dimension largely refers to the domain or area of knowledge where the integration of LLMs and KGs is utilized to address challenges, solve problems, or drive innovations. While going through the dataset, we identified five major domains. These include computer science, education, finance, healthcare, and justice. The domain dimension is crucial for businesses as it helps identify industry-specific opportunities where LLM–KG can drive innovation, optimize operations, and enhance decision-making. By categorizing applications into areas like finance, healthcare, education, and justice, businesses can develop AI-driven solutions tailored to their sector, improving efficiency, customer experience, and competitive advantage. Understanding domain-specific applications allows companies to utilize AI not just for automation but as a strategic tool for growth, differentiation, and market leadership.
  • Model Implemented: This dimension refers to the specific AI or computational model used to integrate LLM–KG. These models usually automate tasks essential for constructing and maintaining KGs, such as entity recognition, relationship extraction, and schema generation. Based on the studies, we were able to narrow down to five major models implemented, which are as follows:
    • Autoregressive Transformers (Decoder-Only Models): These models, such as GPT-2, GPT-3, and GPT-4, predict the text token by token based on previous context. They are widely used in open-domain question answering, content generation, and dialogue systems.
    • Bidirectional Transformers (Encoder-Only Models): These models, including BERT and RoBERTa, process input bidirectionally to capture contextual meaning, making them ideal for information retrieval, classification, and semantic understanding tasks.
    • Encoder–Decoder Transformers (Seq2Seq Models): These models, such as T5 and BART, follow a sequence-to-sequence paradigm and are effective in translation, summarization, and generating structured outputs from textual or graph inputs.
    • Hybrid/Retrieval-Augmented Models: This category includes architectures that integrate neural language models with retrieval mechanisms, symbolic tools, or external databases to improve factual grounding and reduce hallucinations. Examples include RAG, GraphRAG, and DRaM.
    • Symbolic–Neural Hybrids/Other Non-Transformers: This group consists of models built with or around LSTM, GRU, CNNs, or logic-based rule systems, often in domain-specific or multimodal contexts. They are typically used in lower-resource or highly specialized applications.
  • Applications: This dimension provides insights into how these solutions are used to solve real-world problems across various domains, including but not limited to recommendation systems that provide personalized suggestions, classification systems that categorize unstructured data into predefined groups, question-answering systems, and general NLP solutions such as sentiment analysis.
  • Input Type: This dimension investigates the types of input data used by the AI models in the finalized studies. The major types include text, images, and videos. It highlights the ability of these models to turn unstructured and multimodal data into structured, actionable insights.

3.6. Data Extraction, Data Analysis, and Synthesis

We implemented a structured form for the data extraction process and employed a systematic approach to gather relevant information from the shortlisted papers. During this process, we accessed the complete text of each paper and recorded specific relevant information, focusing on answering our research questions. Two researchers worked together during this process to ensure the accuracy of the data, resolving any disagreements through discussion and involving a third researcher when needed.
Upon completing the form with the information gathered from the papers, we analyzed the resulting dataset using two research methodologies, descriptive statistics and thematic analysis, to synthesize the collected data and present our findings. First, for the quantitative approach, we used descriptive statistics to identify patterns such as the number of papers published each year, their domains, and the multimodal dimensions. Second, for the qualitative approach, we performed a thematic analysis to explore the content in more detail, identifying model implementations, applications, and implications, particularly in the context of XAI. Together, these approaches offered both high-level trends via quantitative insights and depth via qualitative synthesis, allowing us to synthesize the findings and draw meaningful conclusions comprehensively.
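As an illustration of the descriptive-statistics step, the short sketch below tallies papers per year and per domain from the extraction form; the file and column names are placeholders for the actual extraction sheet.

```python
# Illustrative descriptive statistics over the data extraction form.
# File and column names are placeholders.
import pandas as pd

df = pd.read_excel("data_extraction_form.xlsx")

papers_per_year = df["year"].value_counts().sort_index()      # publications per year
papers_per_domain = df["domain"].value_counts()               # publications per domain
models_by_domain = df.groupby(["domain", "model_implemented"]).size()

print(papers_per_year)
print(papers_per_domain)
print(models_by_domain)
```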

4. Results

We identified 77 studies focusing on our research area published between 2019 and 2024. During this timeframe, there was a noticeable growth in interest in the topic, coinciding with the launch and popularity of the generative AI application ChatGPT. Based on the studies, we mapped the type of model implemented in each study. The most frequently implemented models were Hybrid/Retrieval-Augmented Models, which combine LLMs with symbolic reasoning, external retrieval mechanisms, or structured knowledge sources to enhance factuality, transparency, and contextual grounding; these models were prevalent across diverse applications such as question answering, semantic search, and Knowledge Graph construction. They were followed by Seq2Seq Models (Encoder–Decoder Transformers like T5 and BART), which were widely used for tasks such as summarization, translation, and KG-to-text generation. Autoregressive Transformers, particularly GPT-based models, appeared commonly in generative pipelines, especially for open-ended question answering, while Bidirectional Transformers such as BERT were often applied in classification and entity extraction tasks. In contrast, fewer studies used Symbolic–Neural Hybrids/Others, including LSTMs and rule-based systems, primarily in specialized or low-resource scenarios. This distribution reflects a growing emphasis on hybrid and retrieval-augmented strategies for improving explainability and grounding in LLM–KG pipelines. Table 2 summarizes our mapping of model implementations. Our dataset is available at https://doi.org/10.6084/m9.figshare.28468637.v1 (accessed on 1 March 2025) [17].
We were also able to identify various applications of these studies. Notable applications include recommendation system generation [91]. Another key application is language model generation, such as the Retrieve-and-Discriminate Prompter (RD-P) framework, which is designed to enhance performance and reliability in knowledge-intensive question-answering tasks [20]. Additionally, these studies contribute to general NLP applications, such as the Patent Response Large Language Model (PRLLM) and the Patent Precedents Knowledge Graph (PPNet), which together form a specialized NLP system for understanding and generating responses in the patent domain [58]. Most models used text-based inputs. While video and image inputs appear rarely, their inclusion signals a growing potential for multimodal integration. Figure 3 shows the frequency of each LLM architecture across the reviewed papers, with Hybrid/Retrieval-Augmented Models being the most prevalent, followed by Seq2Seq and decoder-only models.

4.1. RQ1: How Are LLMs Being Used to Construct KGs?

LLMs have transformed the construction of KGs by automating tasks such as entity recognition, relation extraction, schema design, and ontology development from unstructured or semi-structured data. This capability is particularly valuable in domains like healthcare, where LLMs are used to transform research papers, clinical records, and other textual sources into structured KGs. For example, GPT-4 and similar models have been employed to construct domain-specific KGs [8], such as VieMedKG for traditional Vietnamese medicine by extracting entities and relationships from textual data [13]. New frameworks like Auto-KG Agent further automate semantic triple extraction for KG population, reducing reliance on manual curation [11]. In healthcare, LLMs process research papers and clinical records to generate structured KGs that capture relationships between diseases, treatments, and symptoms [43,75]. Similarly, KARGEN leverages domain-specific embeddings from radiology datasets to streamline medical KG creation and report generation [14]. Beyond healthcare, LLMs extract cybersecurity KGs from threat intelligence data [86] and educational KGs from textbooks by integrating semantic relations [74]. LLMs also excel in constructing multimodal KGs, which integrate textual, visual and symbolic data for applications like robotics and embodied AI. For instance, frameworks like Scene-MMKG and ManipMob-MMKG combine perceptual (visual) and symbolic data to build KGs that support fine-grained scene understanding, robotic navigation, and object manipulation [76]. Recent surveys highlight LLM-driven multimodal KG construction, where models fuse text, images, and structured data for applications ranging from tourism recommendation systems [88] to socio-culturally adapted chatbots [44,61]. These advancements highlight the versatility of LLMs in handling diverse data types and domains. A key strength of LLMs lies in their ability to support zero-shot and few-shot learning, enabling efficient KG construction with minimal labeled data. This is particularly useful in rapidly evolving domains like job–skills mapping [60] and cultural heritage [50,55,92]. For example, few-shot prompting has been used to generate accurate SPARQL queries for populating scholarly KGs with bibliographic data [29] and, when coupled with Chain of Thought (CoT) reasoning, helps detect missing relationships in incomplete KGs [32], while zero-shot learning enables entity and relationship extraction in contexts where labeled data are scarce [36]. Prompt engineering techniques further refine structured entity-relation extraction, allowing LLMs to generate high-quality triples without manual annotation [46]. Challenges persist in SPARQL query generation for KG population, as seen in benchmarks like Spider4SPARQL [28], but advances in zero-shot ontology alignment improve cross-dataset knowledge integration [87]. LLMs also play a critical role in refining and updating existing KGs. They enable dynamic updates to ensure KGs remain relevant over time, particularly in fast-evolving fields like healthcare, finance, and law. Advanced techniques such as neural embeddings, prompt engineering, and RAG systems are employed to achieve high-precision and real-time knowledge extraction [23,42,93]. For instance, zrLLM applies zero-shot learning to Temporal Knowledge Graphs (TKGs), predicting time-sensitive facts [30], while RAG systems build conference-specific KGs from event websites [59]. 
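As a concrete illustration of prompt-based triple extraction for KG population, the sketch below builds an extraction prompt and parses the model reply into (subject, relation, object) triples. The `call_llm` function is a placeholder for whichever LLM API is used, and the pipe-separated output format is an assumption of this sketch rather than a procedure taken from the reviewed studies.

```python
# Illustrative prompt-based triple extraction for KG construction.
# `call_llm` is a placeholder for an actual LLM API call.
from typing import List, Tuple

PROMPT_TEMPLATE = (
    "Extract knowledge graph triples from the text below.\n"
    "Return one triple per line in the form: subject | relation | object\n\n"
    "Text: {text}\n"
    "Triples:"
)

def call_llm(prompt: str) -> str:
    """Placeholder for an LLM call (e.g., a chat-completion endpoint)."""
    raise NotImplementedError

def extract_triples(text: str) -> List[Tuple[str, str, str]]:
    response = call_llm(PROMPT_TEMPLATE.format(text=text))
    triples = []
    for line in response.splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):
            triples.append((parts[0], parts[1], parts[2]))
    return triples

# With a working `call_llm`, a sentence such as "Google acquired YouTube in 2006."
# would be expected to yield triples like ("Google", "acquired", "YouTube").
```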
TKGs leverage LLMs to embed evolving data, such as software development KGs that track hidden entities in GitHub repositories [56] and also to predict and represent time-sensitive information, embedding temporal facts and relationships to build KGs that evolve alongside changing data [94]. Hybrid approaches combining retrieval-based learning and structured pretraining further enhance dynamic KG updates [5,22]. Additionally, LLMs facilitate entity normalization and synonym detection, which are crucial for resolving ambiguities in KG construction. Techniques like clustering LLM-generated embeddings help link semantically related entities within scientific KGs, ensuring that different surface forms of entities are correctly aggregated [41]. In academic KGs, LLM-assisted graph reasoning improves ontology alignment and schema construction [95], while zero-shot prompting aligns ontology elements across datasets [87]. This approach is particularly beneficial in academic KGs, where entity variants can hinder effective graph construction [38]. By combining prompting mechanisms with relation extraction, LLMs enable scalable and automated KG construction across domains. Contextual prompts guide models to focus on domain-specific knowledge, as seen in frameworks that build financial and medical KGs using structured schema-based extractions [43,54]. LLMs also act as graph encoders, enhancing node classification with textual embeddings [22], and integrate data from APIs and databases for education-focused KGs [69]. These advancements significantly reduce the effort and expertise required for KG construction, accelerating the development of AI-driven applications. The reciprocal relationship between LLMs and KGs is discussed further in Section 5.
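The sketch below illustrates entity normalization by clustering embedding vectors of entity mentions, in the spirit of the clustering approaches cited above. The `embed` function is a placeholder for any sentence- or entity-embedding model, and the distance threshold is an arbitrary illustrative value (scikit-learn 1.2+ is assumed for the `metric` argument).

```python
# Illustrative entity normalization: cluster mention embeddings so that
# surface variants of the same entity fall into one group.
import numpy as np
from sklearn.cluster import AgglomerativeClustering

def embed(mention: str) -> np.ndarray:
    """Placeholder: return a vector for the mention (e.g., from an LLM encoder)."""
    raise NotImplementedError

mentions = ["acute myocardial infarction", "heart attack", "MI", "stroke"]
vectors = np.vstack([embed(m) for m in mentions])

clustering = AgglomerativeClustering(
    n_clusters=None,
    distance_threshold=0.3,   # illustrative cosine-distance cutoff
    metric="cosine",          # use `affinity` on scikit-learn < 1.2
    linkage="average",
)
labels = clustering.fit_predict(vectors)

# Group mentions by cluster label: each group is one normalized entity.
groups = {}
for mention, label in zip(mentions, labels):
    groups.setdefault(label, []).append(mention)
print(groups)
```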

4.2. RQ2: How Are KGs Being Used to Improve the Output of LLMs?

When KGs are used as inputs to LLM applications, they act as a grounding mechanism, anchoring LLM responses in structured, verified information, which reduces hallucinations and improves contextual accuracy. This is particularly critical in sensitive domains like medicine, where incorrect outputs can have significant consequences. For example, in medical question-answering systems, KGs allow the construction of domain-specific LLMs whose responses align with established knowledge [50,75]. Legal KGs further ensure compliance in generated contracts by injecting structured metadata, improving fact-checking for legislative texts [21,62]. Yang, in his paper, demonstrates how medical KGs improve the reliability and precision of LLM-generated responses by validating answers through entity-relation triples representing diseases, symptoms, and treatments [43]. Similarly, KARGEN leverages domain-specific embeddings from radiology datasets to enhance medical report generation and decision-making [14]. Cybersecurity KGs similarly reduce false positives in threat detection by grounding LLM outputs in structured threat intelligence [26]. KGs also support multi-hop reasoning, enabling LLMs to derive insights from interconnected nodes. This capability is valuable in applications like recommendation systems, industrial management, and complex question-answering (QA) systems. For instance, Venkatakrishnan et al. and Wei et al. highlight how KGs enable LLMs to navigate through interconnected nodes, retrieving facts that span multiple relationships [19,23]. The SG-RAG model integrates multi-hop reasoning KGs with LLMs to tackle complex QA tasks, outperforming traditional SQL-based approaches in structured data querying [12,83]. Son et al., 2024, further describe Virtual Knowledge Graphs (VKGs) that encode facts and queries into dense vector spaces, allowing LLMs to efficiently retrieve relevant knowledge across multiple hops. This is particularly useful in databases and QA systems, where answers often require chaining multiple facts. KGs, being versatile, have several types that can be integrated with LLMs not only to improve their capabilities but also to enhance their performance:
  • Medical KGs: Widely used in life sciences and pharmaceuticals, medical KGs improve LLM-based applications such as clinical decision-making, drug discovery, and patient management. For example, Wu et al. and Xu et al. demonstrate how medical KGs ground LLM responses in structured knowledge, reducing errors and improving reliability [75,92].
  • Cultural KGs: These KGs capture culturally and traditionally accurate regional languages, mitigating hallucinations in folklore-related LLM tasks. Traditional Folklore Knowledge Graphs (TFKGs), for instance, ensure that LLM outputs are based on verified facts [13,50].
  • Industrial KGs: Used in wirearchy management and immigration law, industrial KGs model organizational hierarchies and legal processes, improving the accuracy of LLM outputs in these domains [19,50]. Environmental sustainability KGs extend this by grounding ESG (Environmental, Social, Governance) queries in fact-based responses [47].
  • Hybrid KGs: Combining multimodal data (textual, visual, audio, and numeric), hybrid KGs enhance LLM performance in domains like recommendation systems and robotics [23,42]. For example, frameworks like ManipMob-MMKG integrate scene-driven multimodal KGs to jointly process visual and textual inputs, supporting tasks like object identification and environment navigation [76]. In knowledge-based visual question answering (VQA), KGs improve multimodal reasoning by linking visual inputs to structured knowledge [85].
KGs also improve disambiguation and context resolution by linking ambiguous entities in LLM outputs to specific KG nodes. Techniques like synonym detection and clustering ensure semantic coherence, resolving multiple representations of the same entity. This is enhanced in multilingual contexts, where KGs optimize semantic communication to improve LLM comprehension of cross-lingual queries [89]. Such methods are critical in academic and scientific contexts, where ambiguous terminology can lead to inconsistencies [38,41]. RAG systems further enhance LLM output by integrating relevant KG subgraphs during inference. These subgraphs act as contextually rich knowledge sources, guiding LLMs to produce accurate responses [96]. In scholarly applications, structured prompts derived from KGs significantly improve LLM responses to research questions, as demonstrated in [70]. For instance, Taffa and Usbeck highlight in their paper how the Open Research Knowledge Graph (ORKG) provides bibliographic context for answering research-related questions, significantly improving output accuracy [29]. Finally, contextual prompts derived from KGs enhance LLM reasoning by infusing domain-specific knowledge directly into LLMs during fine-tuning. This approach bypasses the need for comprehensive pre-training and is particularly useful for entities not covered by existing knowledge bases, such as niche products and emerging concepts [36]. Semantic verification techniques grounded in KGs further mitigate hallucination risks by fact-checking LLM-generated content [57,77]. By combining these capabilities, KGs enable LLMs to perform complex reasoning tasks, reduce errors, and improve contextual understanding, making them indispensable for AI-driven applications.
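The sketch below illustrates the basic retrieval-augmented pattern described above: match entities in a question against a toy triple store, serialize the matching subgraph as facts, and prepend them to the prompt so the answer is grounded in the KG. The toy triples and the `call_llm` placeholder are illustrative assumptions, not a framework from the reviewed studies.

```python
# Illustrative KG-grounded prompting (a minimal RAG-style sketch).
# The toy triples and `call_llm` placeholder are for demonstration only.
from typing import List, Tuple

KG_TRIPLES: List[Tuple[str, str, str]] = [
    ("Google", "acquired", "YouTube"),
    ("YouTube", "acquired_in_year", "2006"),
    ("YouTube", "is_a", "video platform"),
]

def call_llm(prompt: str) -> str:
    """Placeholder for an actual LLM API call."""
    raise NotImplementedError

def retrieve_subgraph(question: str) -> List[Tuple[str, str, str]]:
    """Return triples whose subject or object is mentioned in the question."""
    q = question.lower()
    return [t for t in KG_TRIPLES if t[0].lower() in q or t[2].lower() in q]

def grounded_answer(question: str) -> str:
    facts = retrieve_subgraph(question)
    fact_lines = "\n".join(f"- {s} {p.replace('_', ' ')} {o}" for s, p, o in facts)
    prompt = (
        "Answer the question using ONLY the facts below. "
        "If the facts are insufficient, say so.\n"
        f"Facts:\n{fact_lines}\n\nQuestion: {question}\nAnswer:"
    )
    return call_llm(prompt)

# grounded_answer("When was YouTube bought, and by whom?")
```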

4.3. RQ3: What AI Methodologies Are Used for LLM-Based KG Systems and KG-Based LLMs?

Understanding the AI methodologies that drive LLM–KG construction and interaction has become imperative. Multiple AI methodologies are used in LLM-based KG construction; these can be classified into three categories: symbolic AI, machine learning, and evolutionary computation.
1. Symbolic AI—Symbolic AI is heavily implemented in LLM-based KG construction for tasks requiring explicit rule-based reasoning and validation [55]. See some examples below:
    • SPARQL Query Generation: LLMs generate SPARQL queries to validate KGs against predefined ontologies, ensuring compliance with knowledge representation standards [19] (a minimal validation sketch is given after this list). Recent benchmarks like Spider4SPARQL are used to rigorously evaluate LLMs’ ability to handle complex query structures, while subgraph extraction algorithms refine SPARQL generation by isolating contextually relevant graph segments [35].
    • Logic-Based Inference: Frameworks like DRaM use symbolic reasoning to derive and validate relationships in KGs, improving interpretability and accuracy [97].
    • Ontology Matching: Algorithms align LLM-generated entities with predefined taxonomies, ensuring semantic consistency in domains like scholarly KGs [87].
While symbolic AI excels at rule-based reasoning and formal validation, machine learning methodologies take a more data-driven approach toward automating the discovery and extraction of knowledge from large unstructured datasets.
2. Machine Learning (ML)—ML underpins most LLM-based methodologies, providing data-driven approaches to extract and refine knowledge. See some examples below:
  • Neural Network Techniques: Embedding-based models and GNN architectures like HybridGCN are used for link prediction and subgraph extraction from textual and relational data to build and refine KGs, enabling the automated discovery of relationships [23,42,90]. HybridGCN further scales KG reasoning by integrating graph neural networks with LLMs [90], while neural-symbolic frameworks such as the one by Liu et al. combine LLM embeddings with structured query generation for multi-hop reasoning [67].
  • Semantic Parsing: LLMs transform unstructured text such as natural language into structured data helping the generation of triples, graphs and relationships that populate KGs [13,55].
  • Prompt Engineering: Customized prompts guide the LLM in extracting domain-specific entities and relationships. By fine-tuning LLMs with custom prompts, machine learning facilitates domain-specific knowledge extraction [42,92]. Comparative studies by Schneider et al. demonstrate how prompt engineering aligns LLM outputs with KG structures, ensuring factual text generation [80].
  • Multi-Level Knowledge Generation: Techniques like few-shot KG completion fill in missing relationships with minimal labeled data, enabling LLMs to generate triples and attributes for KG completion and reducing reliance on labelled datasets, as demonstrated by Li et al. in generating hierarchical attributes for sparse KGs [44].
In contrast to both symbolic AI and machine learning, evolutionary computation focuses on optimizing and dynamically updating KGs to accommodate new information and maintain relevance over time.
3. Evolutionary Computation—Less common than machine learning and symbolic AI, evolutionary computation is used occasionally for optimization tasks. For example:
  • Dynamic KG Updates: Algorithms inspired by biological evolution optimize graph structures dynamically to accommodate new knowledge, ensuring relevance over time [23,97].
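To make the SPARQL validation idea from item 1 concrete, the sketch below loads a KG and its ontology with rdflib and flags predicates that the ontology does not declare. The file names and the specific compliance rule are illustrative assumptions rather than a procedure taken from the cited studies.

```python
# Minimal ontology-compliance check for an LLM-constructed KG using rdflib.
# File names and the validation rule are illustrative.
from rdflib import Graph

g = Graph()
g.parse("llm_constructed_kg.ttl")   # placeholder: triples extracted by the LLM
g.parse("domain_ontology.ttl")      # placeholder: the reference ontology

# Flag predicates used in the data that the ontology does not declare as
# object or datatype properties (a simple compliance signal).
query = """
PREFIX owl: <http://www.w3.org/2002/07/owl#>
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
SELECT DISTINCT ?p WHERE {
    ?s ?p ?o .
    FILTER (?p != rdf:type)
    FILTER NOT EXISTS { ?p a owl:ObjectProperty }
    FILTER NOT EXISTS { ?p a owl:DatatypeProperty }
}
"""

for row in g.query(query):
    print(f"Undeclared predicate: {row.p}")
```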
In recent years, hybrid approaches that combine symbolic and neural methodologies have also gained traction. Scene-Driven KGs integrate symbolic rules with perceptual data from vision–language models, enabling robots to interpret dynamic environments (e.g., spatial hierarchies) in real time [76]. This is further strengthened by rule-based fact verification, used by Momii et al. to check that LLM-generated text aligns with structured domain knowledge [39]. Embedding-rule fusion frameworks combine logic-based constraints with LLM-generated embeddings, allowing KGs to adapt to ambiguous inputs while maintaining ontological consistency. For instance, medical diagnosis systems use symbolic disease taxonomies to ground neural symptom predictions, reducing hallucination risks [3]. Temporal reasoning methodologies are essential for applications involving time-sensitive data. TKGs use temporal embeddings to represent dynamic relationships and enable LLMs to answer time-specific questions accurately [94]. Similarly, VKGs facilitate multi-hop reasoning by embedding facts and queries into vectorized representations, improving retrieval performance across large-scale datasets [49].

While the preceding discussion describes how symbolic AI, machine learning, and evolutionary computation support LLM-based KG systems, a distinct challenge emerges in practice despite their respective strengths: ensuring the transparency and trustworthiness of both the constructed KGs and the LLMs that utilize them. To address this challenge, XAI methods bridge the gap that connects symbolic AI, ML, and evolutionary computation. XAI serves as a unifying framework that ensures transparency, interpretability, and trust and addresses concerns related to fairness and error detection across diverse AI methodologies. These methods improve confidence in the generated knowledge by making the reasoning processes behind entity recognition and relationship extraction more interpretable and accessible. In symbolic AI methodologies, rule-based methods like DRaM provide explicit logic for inferring relationships [98], while SPARQL queries validate knowledge consistency [55,97]. In machine learning, interpretable embeddings and fine-tuned prompt engineering improve the transparency of knowledge extraction and reasoning [42], which allows us to improve the interpretability of LLM outputs.

A major issue with LLMs is hallucination, which occurs when a Large Language Model generates a response that is factually incorrect, illogical, or unrelated to the input prompt. Explainability also aims to facilitate error analysis, enabling the identification and correction of hallucinations and inaccuracies commonly associated with LLMs, especially in sensitive domains where inaccuracies can have serious consequences [19,75]. XAI transcends individual methodologies to ensure transparency and accountability. Attention maps in models like KARGEN correlate LLM decisions with specific KG subgraphs, answering questions like “Why was this drug contraindicated?” in clinical settings [7,14]. Multi-stage validation pipelines combine SPARQL-based ontological checks [37] with gradient-based attribution [99] to audit knowledge provenance. For instance, financial KGs cross-validate LLM-generated market predictions against historical transaction patterns. Moreover, XAI promotes better data governance by ensuring compliance with regulatory and ethical standards.
Validation techniques such as SPARQL query generation enable systematic verification of KG consistency against predefined ontologies, supporting both transparency and reliability [19,55]. These approaches also improve usability by making LLM-driven KG systems more accessible to non-technical users, permitting them to understand and utilize the outputs effectively. Overall, the integration of XAI in LLM-based KG construction bridges the gap between advanced AI technologies and user trust, enabling more robust, accurate, and ethical applications across various domains and mitigating algorithmic discrimination [4,100].

In the reverse relationship, once constructed and validated using AI methodologies, KGs do not simply act as static knowledge repositories. Instead, they enhance LLMs’ capabilities by serving as dynamic, structured inputs that refine the model’s outputs, addressing limitations like hallucination and contextual ambiguity. This integration relies on several AI methodologies spanning retrieval-based techniques, neural architectures, and explainability frameworks. For instance, RAG dynamically retrieves relevant subgraphs from KGs [101] to ground LLM outputs in verified knowledge during inference, ensuring factual accuracy [3]. Neural architectures, such as hybrid models like GRAPHCODE [102], incorporate KGs as external memory to guide text generation and reasoning tasks, enabling improved contextual consistency and domain-specific adaptation [15]. Additionally, KGs are integrated into training pipelines, as seen in frameworks like ERNIE and KEPLER, which inject domain-specific facts into LLMs, improving reasoning accuracy and factual consistency [3]. Validated KGs dynamically augment LLMs through RAG by retrieving contextually relevant KG subgraphs during inference to ground outputs, reducing factual errors by 37% in enterprise chatbots [101]. Embedding injection architectures like ERNIE embed domain-specific KG triples into LLM training pipelines, improving rare entity recognition in scholarly datasets [3], and iterative refinement loops such as GRAPHCODE’s external memory module allow LLMs to query and update KGs mid-reasoning, enabling adaptive troubleshooting in IoT systems [15,102].

Simultaneously, KG-enhanced LLMs reinforce transparency and trust using XAI methods. Attention mechanisms are used to map LLM decisions to specific KG nodes, answering critical questions like “Why did the model prioritize this entity?”, while rule-based validation ensures that KG-retrieved evidence aligns with LLM-generated outputs [103]. For example, Khorashadizadeh et al. combine KG embeddings with gradient-based attribution to trace LLM predictions back to subgraph structures, improving transparency and interpretability and ensuring that outputs are well-grounded in the KG [99]. This synergy not only mitigates hallucination but also fosters trust in applications like clinical diagnosis, where KG-anchored explanations justify model decisions to end users [103]. By tightly coupling KGs with XAI, researchers achieve dual objectives: enhancing LLM reliability through structured knowledge while making the interplay between LLMs and KGs interpretable and accountable. These findings provide a foundation for understanding the interplay between LLMs and KGs, which we analyze further in the discussion below.

5. Discussion

5.1. The Reciprocal Relationship

In exploring the reciprocal interplay between LLMs and KGs, it is vital to acknowledge their two main modes of interaction: LLMs as tools to build KGs and KGs as inputs into LLM applications. While these modes are discussed under RQ1 and RQ2, this section extends the discussion by examining their interdependencies, broader implications, and the implementation of advanced methodologies. Table 3 below helps us understand this duality with respect to purpose, input, and output.

5.1.1. KGs as Input into LLMs

KGs act as structured and reliable inputs that strengthen the performance of LLMs. By providing semantic layers that ground LLM outputs in verifiable facts, KGs address challenges like hallucinations, data sparsity, and domain-specific inaccuracies. This is particularly beneficial in applications like RAG, where KGs supplement LLM prompts with relevant structured information to improve response quality and compliance with regulations.
Moreover, KGs are indispensable for data governance and access control. By ensuring that only authorized data are included in LLM workflows, KGs enable organizations to maintain regulatory compliance and prevent the misuse of sensitive information. For example, in industrial settings, KGs can filter out irrelevant or unauthorized data during LLM processing, thereby avoiding “context poisoning”, where irrelevant data skew the generated outputs. Additionally, multi-hop reasoning facilitated by KGs allows LLMs to traverse interconnected nodes, uncovering deeper insights critical for complex applications like industrial knowledge management and personalized recommendation systems [42,55].
KGs as inputs have several applications in LLM systems, such as medical question answering, where KGs provide structured insights about diseases, treatments, and clinical trials, ensuring that LLMs generate accurate and relevant responses [75,92]. Cultural applications, such as traditional folklore KGs, allow LLMs to generate folklore-related answers by anchoring them to verified sources [50]. Similarly, legal and industrial KGs enable LLMs to navigate complex relationships and retrieve pertinent data, such as in loan suggestion, immigration processes, and recommendation systems [19,23].

5.1.2. LLMs to Build KGs

LLMs are instrumental in constructing KGs by automating entity recognition, relation extraction and schema creation, reducing the manual effort traditionally required in building KGs. In addition, LLMs also enable the rapid transformation of unstructured data (e.g., textual reports, academic papers) into structured knowledge. This capability is particularly evident in healthcare, where LLMs extract actionable insights from research papers to populate KGs with accurate medical information.
LLMs also dynamically update KGs, maintaining their continued relevance in fast-evolving domains. Techniques such as prompt engineering, semantic parsing, and neural embeddings further refine LLM-driven KG construction [37,42]. This dynamic interaction is supported by advanced techniques such as vector-based embeddings, prompt-to-query retrieval, and fine-tuning LLMs with KG data. These methodologies not only enhance the precision and scalability of KG construction but also enable organizations to adapt their knowledge bases to emerging trends and challenges. For example, organizations employing hybrid RAG solutions combine LLM-powered vector searches with KG-based filters, achieving both speed and accuracy in data retrieval. This reciprocal relationship between LLMs and KGs accelerates innovation in applications such as complex question-answering systems like ChatGPT [1,50,55].
LLMs have various applications when building KGs. For example, models like GPT-4 and BERT are used to create domain-specific KGs, like VieMedKG for traditional Vietnamese medicine, by identifying and linking entities and relationships in unstructured data [13,97]. In the healthcare domain, LLMs extract insights from textual sources to populate KGs with accurate and comprehensive representations of diseases, treatments, and biomarkers [55,75]. In legal and industrial domains, LLMs transform fragmented data, such as paper-based immigration records, into structured KGs, facilitating efficient data handling and retrieval [19,36].

5.2. Limitations

Despite their transformative potential, the integration of LLMs and KGs presents several challenges:
  • Domain-Specific Challenges: LLMs often struggle with subtleties in specialized areas such as healthcare, finance and justice, leading to inaccuracies in entity recognition and relation extraction. This can compromise the quality of the resulting KGs and the outputs of LLM applications.
  • Computational Intensity: The dynamic integration of KGs into LLM workflows demands significant computational resources, especially in real-time applications. This includes the cost of fine-tuning LLMs with domain-specific KG data or employing hybrid retrieval-augmented generation techniques.
  • Explainability Concerns: While KGs enhance the interpretability of LLM outputs, the inherent opacity of LLMs’ reasoning processes remains a barrier. Even with advancements in XAI, ensuring trust and validation in sensitive domains like healthcare and governance is challenging, and this poses the risk of algorithmic discrimination.
  • Data Governance and Compliance: Ensuring data governance, regulatory compliance and access control during KG and LLM integration is complex, especially in highly regulated industries. Failure to manage this concern can lead to data misuse or ethical concerns.
Addressing these limitations requires a multifaceted approach, including advances in domain-specific optimization, computational efficiency, and XAI techniques. As research progresses, the integration of KGs and LLMs is poised to unlock new possibilities in AI, bridging the gap between structured and unstructured data to deliver reliable and innovative solutions.

6. Gap Analysis

This section identifies critical gaps in the integration of LLMs and KGs, systematically examining disparities across four analytical dimensions—the disciplinary focus, model architectures, application domains, and input modalities—and the three research questions (RQ1–RQ3). Supported by a quantitative synthesis of 77 research studies, this analysis identifies underexplored opportunities and methodological limitations hindering the advancement of LLM–KG systems.

6.1. Gaps Across Analytical Dimensions

6.1.1. Domains

  • Distribution of domains: A striking 76.6% (59 papers) of studies originate from the core domain of computer science (i.e., AI and machine learning research), overshadowing other domains. The rest had applications in education (15.6%, 12 papers), healthcare (5.19%, 4 papers), finance (1.29%, 1 paper), and justice (1.29%, 1 paper).
  • Lack of Interdisciplinary Synergy: Fewer than ten studies bridge insights from complementary domains, for instance, healthcare and education or environmental policy and economics, limiting the cross-domain applicability of LLM–KG frameworks. Studies originating from domains such as finance and justice are underrepresented. This shows that the application of LLM–KG frameworks can be investigated and then implemented in various domains.

6.1.2. Model Implementation

  • Predominance of Hybrid/Retrieval-Augmented Models: Hybrid/Retrieval-Augmented Models (66.2%, 51 papers) dominate research, while specialized architectures like Bidirectional Transformers or Symbolic–Neural Hybrids (2.6%, 2 papers each) are underutilized.
  • Scarce Decoder-based Autoregressive Transformer Adaptation: Only 6.4% (five papers) of studies employ fine-tuned LLMs (e.g., GPT variants) tailored to domain-specific challenges in healthcare or finance.
  • Limited Hybrid Methodologies: Fewer than 5% of papers investigate synergies between disparate architectures, for example, integrating neural networks with symbolic reasoning systems.

6.1.3. Applications

  • Bias Toward Language-Centric Tasks: Language modeling applications account for 63.6% (49 papers) of the studies, whereas recommendation systems (7.8%, 6 papers) and question answering (1.29%, 1 paper) receive minimal attention.
  • Underdeployment of Generative AI: Despite its versatility, ChatGPT is utilized in only 5.2% of studies (four papers), indicating untapped potential in real-world applications where such generative AI could assist adaptive education or customer service.

6.1.4. Input Modalities

  • Text-Centric Paradigms: Nearly all of the studies rely exclusively on textual inputs, neglecting modalities such as image or video data.
  • Limited Multimodal Integration: Fewer than 4% of the studies (3 papers) explore multimodal systems that combine KGs with image and video inputs.
  • Monolingual Limitations: No studies investigate multilingual KG–LLM systems, constraining their utility in linguistically diverse regions.

6.2. Gaps Aligned with the Research Questions

6.2.1. RQ1: LLMs in KG Construction

  • Static Construction Paradigms: Most studies focus on static KG construction, with limited exploration of dynamic systems that update in real time. Given the pace at which new knowledge is produced and the ability of modern AI infrastructure to process big data in real time, dataflows and workflows such as streaming data integration could be explored to implement dynamic KGs with LLMs (see the sketch after this list).
  • Automation Deficits: Advanced techniques such as automated schema generation and granular entity linking are addressed in only a handful of papers.
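The streaming-style construction mentioned above can be outlined as follows. This is only an illustrative sketch under assumed names: extract_triples_with_llm stands in for any LLM-based entity/relation extraction step, and the extracted fact mirrors the fictitious example from Table 3.

```python
# Minimal sketch of dynamic (streaming) KG construction with an LLM extractor.
# extract_triples_with_llm is a placeholder; a real implementation would prompt
# an LLM to emit (subject, relation, object) triples and parse its output.
from typing import Iterable, List, Set, Tuple

Triple = Tuple[str, str, str]

def extract_triples_with_llm(document: str) -> List[Triple]:
    """Placeholder for an LLM-based entity/relation extraction call."""
    if "CFO" in document:
        return [("Twitter", "hasCFO", "Mr. X")]  # fictitious, mirrors Table 3
    return []

class StreamingKG:
    """Toy in-memory KG that is updated incrementally as documents arrive."""

    def __init__(self) -> None:
        self.triples: Set[Triple] = set()

    def ingest(self, documents: Iterable[str]) -> None:
        # Each arriving document is processed once; set semantics deduplicate
        # repeated facts so the graph stays consistent as the stream grows.
        for doc in documents:
            self.triples.update(extract_triples_with_llm(doc))

kg = StreamingKG()
kg.ingest(["Mr. X was announced as the CFO of Twitter today."])
print(kg.triples)  # {('Twitter', 'hasCFO', 'Mr. X')}
```

In a production setting, the ingest loop would be driven by a message queue or change-data-capture feed, and conflict resolution (e.g., superseding an outdated fact) becomes the central design concern for keeping the KG current.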

6.2.2. RQ2: KGs in Enhancing LLM Outputs

  • Explainability Challenges: While KGs mitigate LLM hallucinations, only a few studies formalize explainability frameworks that trace outputs back to KG nodes (a toy illustration of such tracing, together with a simple multi-hop traversal, follows this list).
  • Domain-Specific KG Underutilization: Critical sectors such as healthcare employ domain-specific KGs in only a small fraction of the reviewed studies, despite their potential for precision.
  • Sparse Multi-Hop Reasoning: Very few studies exploit KGs’ relational structures for complex reasoning tasks (e.g., causal inference).
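To illustrate what the two gaps above involve, the following sketch pairs a naive provenance check (which triples lexically support a generated answer) with a two-hop traversal over a toy graph. The entities and function names are invented for illustration only and do not reflect any reviewed system.

```python
# Minimal sketch of (a) tracing a generated answer back to supporting KG
# triples and (b) a simple two-hop traversal as a basic form of multi-hop
# reasoning. All data and names are toy examples.
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]

KG: List[Triple] = [
    ("DrugA", "treats", "DiseaseB"),
    ("DiseaseB", "hasSymptom", "SymptomC"),
    ("DrugA", "manufacturedBy", "CompanyD"),
]

def supporting_triples(answer: str, kg: List[Triple]) -> List[Triple]:
    """Naive lexical provenance: triples whose subject or object is mentioned
    in the answer; a real system would align text spans to entity identifiers."""
    return [t for t in kg if t[0] in answer or t[2] in answer]

def two_hop_neighbors(start: str, kg: List[Triple]) -> Set[str]:
    """Entities reachable from `start` in exactly two hops."""
    one_hop = {o for s, _, o in kg if s == start}
    return {o for s, _, o in kg if s in one_hop}

answer = "DrugA is indicated for DiseaseB."
print(supporting_triples(answer, KG))   # candidate evidence behind the answer
print(two_hop_neighbors("DrugA", KG))   # {'SymptomC'}
```

A formal explainability framework would go further, scoring and presenting such evidence paths alongside the generated text rather than relying on surface matching.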

6.2.3. RQ3: Methodological Diversity

  • Isolated Methodological Approaches: Symbolic AI, machine learning, and evolutionary computation are rarely integrated, stifling innovation in hybrid problem-solving.
  • Ethical Oversights: Methodologies addressing fairness, bias mitigation, or ethical governance are critically absent, particularly in high-stakes domains like healthcare.
These gaps are summarized in Table 4 below.

6.3. Future Work and Strategic Recommendations

In light of the identified gaps, this section outlines future research directions and strategic recommendations to advance the development and practical application of LLM–KG systems. Building on the findings of this study, the recommendations aim to bridge disciplinary imbalances, incorporate diverse data sources, refine specialized architectures, enhance explainability mechanisms, integrate symbolic and machine learning approaches, and embed ethical governance throughout the pipeline. By following these guidelines, future work on LLM-based KG systems can achieve greater accuracy, transparency, and societal value while ensuring ethical and practical applicability across diverse domains. Table 5 below lists these recommendations, their focus areas (cross-disciplinary research, multimodal integration, domain-specific architectures, embedded explainability, hybrid methodologies, and institutionalized ethical practices), and how each aligns with the research gaps identified in the field.
This systematic gap analysis, grounded in empirical data from the 77 reviewed studies, elucidates critical barriers to progress in LLM–KG integration. By addressing disciplinary imbalances, methodological silos, and ethical lacunae, researchers can unlock transformative applications across scientific and societal domains. Future work should prioritize interdisciplinary collaboration, multimodal innovation, and ethical governance to realize the full potential of LLM–KG systems.

7. Conclusions

The interplay between LLMs and KGs represents a transformative synergy in AI. LLMs offer advanced capabilities for processing and interpreting unstructured data, enabling the automation of KG construction through entity recognition, relation extraction, schema generation and zero/few-shot learning methods. Conversely, KGs serve as robust, interpretable and structured inputs that improve LLM-based applications’ factual consistency, reasoning ability and transparency, particularly in high-impact domains such as healthcare, finance, and law.
This study systematically reviewed 77 peer-reviewed publications to explore this reciprocal relationship, guided by three core research questions. First, we examined how LLMs contribute to KG development and found widespread use of automated techniques for knowledge extraction and schema design. Second, we analyzed how KGs enhance LLM performance by supporting multi-hop reasoning, reducing hallucinations, and enabling domain-specific adaptations. Third, we categorized the AI methodologies supporting these systems, observing a shift toward hybrid architectures that blend symbolic AI, machine learning, and neuro-symbolic models. XAI also enables trust, transparency, and traceability in LLM–KG workflows.
Despite these advances, limitations remain. Scalability challenges persist in maintaining dynamic, real-time KGs; domain adaptation remains difficult when applying general-purpose LLMs to specialized fields; explainability remains an open challenge; and ethical considerations, including algorithmic discrimination, bias, fairness and governance, demand further attention. Many reviewed studies also showed limitations in methodological diversity and multimodal integration.
We identified these issues through a structured gap analysis and proposed strategic recommendations to guide future research, including the development of domain-specific, adaptable, and multimodal frameworks that align with contemporary needs. To conclude, the synergy between LLMs and KGs offers a promising path toward building AI systems that are not only powerful and efficient but also transparent, reliable, and ethically grounded.

Author Contributions

Conceptualization, R.S.D., M.S. and E.R.; data curation, R.S.D. and M.S.; formal analysis, R.S.D. and M.S.; funding acquisition, E.R.; investigation, R.S.D. and M.S.; methodology, R.S.D. and M.S.; project administration, E.R.; resources, E.R.; software, R.S.D. and M.S.; supervision, E.R.; validation, R.S.D., M.S. and E.R.; visualization, R.S.D. and M.S.; writing—original draft preparation, R.S.D. and M.S.; writing—review and editing, R.S.D., M.S. and E.R. All authors have read and agreed to the published version of the manuscript.

Funding

The work conducted in this study was funded by an NSERC (Natural Sciences and Engineering Research Council) Discovery Grant (RGPIN-2020-05869).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available on Figshare at https://figshare.com/articles/dataset/Dataset_-_Reciprocal_Relationship_Of_KGs_And_LLMs/28468637/1?file=52560449 (accessed on 1 March 2025) with the following DOI: https://doi.org/10.6084/m9.figshare.28468637.v1 (accessed on 1 March 2025).

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
LLMs: Large Language Models
KGs: Knowledge Graphs
AI: Artificial Intelligence
XAI: Explainable AI
RAG: Retrieval-Augmented Generation
PRISMA-ScR: Preferred Reporting Items for Systematic Reviews and Meta-Analyses-Scoping Review
GPT: Generative Pre-Trained Transformer
BERT: Bidirectional Encoder Representations from Transformers
NLP: Natural Language Processing
RD-P: Retrieve-and-Discriminate Prompter
PRLLM: Patent Response Large Language Model
PPNet: Patent Precedents Knowledge Graphs
KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models
MMKG: Multi-Modal Knowledge Graph
SPARQL: SPARQL Protocol and RDF Query Language
TKGs: Temporal Knowledge Graphs
VKGs: Virtual Knowledge Graphs
TFKGs: Traditional Folklore Knowledge Graphs
ORKG: Open Research Knowledge Graph
GNN: Graph Neural Network

Appendix A

Table A1 is included in the appendix as it provides supplementary details on the study selection and screening process, serving as additional reference material for readers interested in the specifics that support the main findings.
Table A1. Screening decisions for sample studies included or excluded.
# | Research Study | Shortlisted?
1 | Title: “Breaking the Barrier: Utilizing Large Language Models for Industrial Recommendation Systems through an Inferential Knowledge Graph”; Year: 2024; Library: ACM. Reasons for inclusion: contains both “Large Language Models” and “Knowledge Graph” in the title; published in a peer-reviewed conference/journal (ACM); within the specified date range (2019–2024). | Yes
2 | Title: “A fusion inference method for Large Language Models and Knowledge Graphs based on structured injection and causal inference”; Year: 2024; Library: ACM. Reasons for exclusion: the title does not clearly focus on the integration of LLMs and Knowledge Graphs for the research aim; while it mentions “Large Language Models” and “Knowledge Graphs,” it seems more aligned with general AI/ML methods than with the specific research questions outlined. | No

References

  1. Qin, C.; Zhang, A.; Zhang, Z.; Chen, J.; Yasunaga, M.; Yang, D. Is ChatGPT a General-Purpose Natural Language Processing Task Solver? arXiv 2023, arXiv:2302.06476. [Google Scholar]
  2. Danilevsky, M.; Qian, K.; Aharonov, R.; Katsis, Y.; Kawas, B.; Sen, P. A Survey of the State of Explainable AI for Natural Language Processing. arXiv 2020, arXiv:2010.00711. [Google Scholar]
  3. Pan, J.Z.; Razniewski, S.; Kalo, J.C.; Singhania, S.; Chen, J.; Dietze, S.; Jabeen, H.; Omeliyanenko, J.; Zhang, W.; Lissandrini, M.; et al. Large Language Models and Knowledge Graphs: Opportunities and Challenges. arXiv 2023, arXiv:2308.06374. [Google Scholar]
  4. Rajabi, E.; Etminani, K. Knowledge-graph-based explainable AI: A systematic review. J. Inf. Sci. 2024, 50, 1019–1029. [Google Scholar] [CrossRef]
  5. Ibrahim, N.; Aboulela, S.; Ibrahim, A.; Kashef, R. A survey on augmenting Knowledge Graphs (KGs) with large language models (LLMs): Models, evaluation metrics, benchmarks, and challenges. Discov. Artif. Intell. 2024, 4, 76. [Google Scholar] [CrossRef]
  6. Bommasani, R.; Hudson, D.A.; Adeli, E.; Altman, R.; Arora, S.; Arx, S.v.; Bernstein, M.S.; Bohg, J.; Bosselut, A.; Brunskill, E.; et al. On the Opportunities and Risks of Foundation Models. arXiv 2022, arXiv:2108.07258. [Google Scholar]
  7. Li, D.; Xu, F. Synergizing Knowledge Graphs with Large Language Models: A Comprehensive Review and Future Prospects. arXiv 2024, arXiv:2407.18470. [Google Scholar]
  8. Zhu, Y.; Wang, X.; Chen, J.; Qiao, S.; Ou, Y.; Yao, Y.; Deng, S.; Chen, H.; Zhang, N. LLMs for Knowledge Graph construction and reasoning: Recent capabilities and future opportunities. World Wide Web 2024, 27, 58. [Google Scholar] [CrossRef]
  9. Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language Models are Few-Shot Learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
  10. Adhikari, A.; Wenink, E.; van der Waa, J.; Bouter, C.; Tolios, I.; Raaijmakers, S. Towards FAIR Explainable AI: A standardized ontology for mapping XAI solutions to use cases, explanations, and AI systems. In Proceedings of the 15th International Conference on PErvasive Technologies Related to Assistive Environments, New York, NY, USA, 29 June–1 July 2022; pp. 562–568. [Google Scholar] [CrossRef]
  11. Ananya, A.; Tiwari, S.; Mihindukulasooriya, N.; Soru, T.; Xu, Z.; Moussallem, D. Towards Harnessing Large Language Models as Autonomous Agents for Semantic Triple Extraction from Unstructured Text. 2024. Available online: https://ceur-ws.org/Vol-3747/text2kg_paper1.pdf (accessed on 1 March 2025).
  12. Saleh, A.O.M.; Tur, G.; Saygin, Y. SG-RAG: Multi-Hop Question Answering with Large Language Models Through Knowledge Graphs. In Proceedings of the 7th International Conference on Natural Language and Speech Processing (ICNLSP), Trento, NJ, USA, 19–20 October 2024; pp. 439–448. [Google Scholar]
  13. Trinh, T.; Dao, A.; Nhung, H.T.H.; Son, H.T. VieMedKG: Knowledge Graph and Benchmark for Traditional Vietnamese Medicine. bioRxiv 2024. [Google Scholar] [CrossRef]
  14. Li, Y.; Wang, Z.; Liu, Y.; Wang, L.; Liu, L.; Zhou, L. KARGEN: Knowledge-enhanced Automated Radiology Report Generation Using Large Language Models. arXiv 2024, arXiv:2409.05370. [Google Scholar]
  15. Ren, X.; Tang, J.; Yin, D.; Chawla, N.; Huang, C. A Survey of Large Language Models for Graphs. In Proceedings of the 30th ACM SIGKDD Conference on Knowledge Discovery and Data Mining, Barcelona, Spain, 25–29 August 2024; pp. 6616–6626. [Google Scholar] [CrossRef]
  16. Petersen, K.; Feldt, R.; Mujtaba, S.; Mattsson, M. Systematic Mapping Studies in Software Engineering; BCS Learning & Development: Nicosia, Cyprus, 2008. [Google Scholar] [CrossRef]
  17. Dehal, R.S. Dataset—Reciprocal_Relationship_Of_KGs_And_LLMs. 2024. Available online: https://figshare.com/articles/dataset/Dataset_-_Reciprocal_Relationship_Of_KGs_And_LLMs/28468637/1?file=52560449 (accessed on 1 March 2025).
  18. Sun, Y.; Xin, H.; Sun, K.; Xu, Y.E.; Yang, X.; Dong, X.L.; Tang, N.; Chen, L. Are Large Language Models a Good Replacement of Taxonomies? Proc. VLDB Endow. 2024, 17, 2919–2932. [Google Scholar] [CrossRef]
  19. Venkatakrishnan, R.; Tanyildizi, E.; Canbaz, M.A. Semantic interlinking of Immigration Data using LLMs for Knowledge Graph Construction. In Proceedings of the Companion Proceedings of the ACM Web Conference 2024, Singapore, 13–17 May 2024; pp. 605–608. [Google Scholar] [CrossRef]
  20. Huang, Y.; Zeng, G. RD-P: A Trustworthy Retrieval-Augmented Prompter with Knowledge Graphs for LLMs. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, New York, NY, USA, 21–25 October 2024; pp. 942–952. [Google Scholar] [CrossRef]
  21. Colombo, A. Leveraging Knowledge Graphs and LLMs to Support and Monitor Legislative Systems. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA, 21–25 October 2024; pp. 5443–5446. [Google Scholar] [CrossRef]
  22. Chen, Z.; Mao, H.; Li, H.; Jin, W.; Wen, H.; Wei, X.; Wang, S.; Yin, D.; Fan, W.; Liu, H.; et al. Exploring the Potential of Large Language Models (LLMs) in Learning on Graphs. ACM SIGKDD Explor. Newsl. 2024, 25, 42–61. [Google Scholar] [CrossRef]
  23. Wei, W.; Ren, X.; Tang, J.; Wang, Q.; Su, L.; Cheng, S.; Wang, J.; Yin, D.; Huang, C. LLMRec: Large Language Models with Graph Augmentation for Recommendation. In Proceedings of the 17th ACM International Conference on Web Search and Data Mining, Merida, Mexico, 4–8 March 2024; pp. 806–815. [Google Scholar] [CrossRef]
  24. Dong, X.L. Generations of Knowledge Graphs: The Crazy Ideas and the Business Impact. Proc. VLDB Endow. 2023, 16, 4130–4137. [Google Scholar] [CrossRef]
  25. Wu, Q.; Wang, Y. Research on Intelligent Question-Answering Systems Based on Large Language Models and Knowledge Graphs. In Proceedings of the 2023 16th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, 16–17 December 2023; pp. 161–164. [Google Scholar] [CrossRef]
  26. Fieblinger, R.; Alam, M.T.; Rastogi, N. Actionable Cyber Threat Intelligence Using Knowledge Graphs and Large Language Models. In Proceedings of the 2024 IEEE European Symposium on Security and Privacy Workshops (EuroS&PW), Vienna, Austria, 8–12 July 2024; pp. 100–111. [Google Scholar] [CrossRef]
  27. Vizcarra, J.; Haruta, S.; Kurokawa, M. Representing the Interaction between Users and Products via LLM-assisted Knowledge Graph Construction. In Proceedings of the 2024 IEEE 18th International Conference on Semantic Computing (ICSC), Laguna Hills, CA, USA, 5–7 February 2024; pp. 231–232. [Google Scholar] [CrossRef]
  28. Kosten, C.; Cudré-Mauroux, P.; Stockinger, K. Spider4SPARQL: A Complex Benchmark for Evaluating Knowledge Graph Question Answering Systems. In Proceedings of the 2023 IEEE International Conference on Big Data (BigData), Sorrento, Italy, 15–18 December 2023; pp. 5272–5281. [Google Scholar] [CrossRef]
  29. Taffa, T.A.; Usbeck, R. Leveraging LLMs in Scholarly Knowledge Graph Question Answering. arXiv 2023, arXiv:2311.09841. [Google Scholar]
  30. Ding, Z.; Cai, H.; Wu, J.; Ma, Y.; Liao, R.; Xiong, B.; Tresp, V. zrLLM: Zero-Shot Relational Learning on Temporal Knowledge Graphs with Large Language Models. arXiv 2024, arXiv:2311.10112. [Google Scholar]
  31. Chen, L.; Xu, J.; Wu, T.; Liu, J. Information Extraction of Aviation Accident Causation Knowledge Graph: An LLM-Based Approach. Electronics 2024, 13, 3936. [Google Scholar] [CrossRef]
  32. Braşoveanu, A.M.P.; Nixon, L.J.B.; Weichselbraun, A.; Scharl, A. Framing Few-Shot Knowledge Graph Completion with Large Language Models. Electronics 2024, 13, 3936. [Google Scholar]
  33. Jiang, L.; Yan, X.; Usbeck, R. A Structure and Content Prompt-Based Method for Knowledge Graph Question Answering over Scholarly Data. 2023. Available online: https://ceur-ws.org/Vol-3592/paper3.pdf (accessed on 1 March 2025).
  34. Mohanty, A. EduEmbedd—A Knowledge Graph Embedding for Education. 2023. Available online: https://ceur-ws.org/Vol-3532/paper1.pdf (accessed on 1 March 2025).
  35. Pliukhin, D.; Radyush, D.; Kovriguina, L.; Mouromtsev, D. Improving Subgraph Extraction Algorithms for One-Shot SPARQL Query Generation with Large Language Models. 2024. Available online: https://ceur-ws.org/Vol-3592/paper6.pdf (accessed on 1 March 2025).
  36. Vasisht, K.; Ganesan, B.; Kumar, V.; Bhatnagar, V. Infusing Knowledge into Large Language Models with Contextual Prompts. In Proceedings of the 20th International Conference on Natural Language Processing (ICON), Goa, India, 14–17 December 2023; Pawar, J.D., Lalitha Devi, S., Eds.; Goa University: Goa, India, 2023; pp. 657–662. [Google Scholar]
  37. Schmidt, W.J.; Rincon-Yanez, D.; Kharlamov, E.; Paschke, A. Scaling Scientific Knowledge Discovery with Neuro-Symbolic AI and Large Language Models. 2024. Available online: https://publica.fraunhofer.de/entities/publication/a752f6fb-4cc0-46cc-9312-b6eff7f64334 (accessed on 1 March 2025).
  38. Liu, S.; Fang, Y. Use Large Language Models for Named Entity Disambiguation in Academic Knowledge Graphs; Atlantis Press: Amsterdam, The Netherlands, 2023; pp. 681–691. [Google Scholar] [CrossRef]
  39. Momii, Y.; Takiguchi, T.; Ariki, Y. Rule-based Fact Verification Utilizing Knowledge Graphs. 2023. Available online: https://www.jstage.jst.go.jp/article/jsaislud/99/0/99_51/_article/-char/en (accessed on 1 March 2025).
  40. de Paiva, V.; Gao, Q.; Kovalev, P.; Moss, L.S. Extracting Mathematical Concepts with Large Language Models. 2023. Available online: https://cicm-conference.org/2023/mathui/mathuiPubs/CICM_2023_paper_8826.pdf (accessed on 1 March 2025).
  41. Thießen, F.; D’Souza, J.; Stocker, M. Probing Large Language Models for Scientific Synonyms. 2023. Available online: https://ceur-ws.org/Vol-3510/paper_nlp_2.pdf (accessed on 1 March 2025).
  42. Wang, F.; Shi, D.; Aguilar, J.; Cui, X.; Jiang, J.; Shen, L.; Li, M. LLM-KGMQA: Large Language Model-Augmented Multi-Hop Question-Answering System Based on Knowledge Graph in Medical Field. 2024. ISSN: 2693-5015. Available online: https://www.researchsquare.com/article/rs-4721418/v1 (accessed on 1 March 2025). [CrossRef]
  43. Yang, J. Integrated Application of LLM Model and Knowledge Graph in Medical Text Mining and Knowledge Extraction. Soc. Med. Health Manag. 2024, 5, 56–62. [Google Scholar] [CrossRef]
  44. Li, Q.; Chen, Z.; Ji, C.; Jiang, S.; Li, J. LLM-based Multi-Level Knowledge Generation for Few-shot Knowledge Graph Completion. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, Jeju, Republic of Korea, 3–9 August 2024; pp. 2135–2143. [Google Scholar] [CrossRef]
  45. Laurenzi, E.; Mathys, A.; Martin, A. An LLM-Aided Enterprise Knowledge Graph (EKG) Engineering Process. Proc. AAAI Symp. Ser. 2024, 3, 148–156. [Google Scholar] [CrossRef]
  46. Gillani, K.; Novak, E.; Kenda, K.; Mladenić, D. Knowledge Graph Extraction from Textual Data Using LLM. 2024. Available online: https://is.ijs.si/wp-content/uploads/2024/10/IS2024_-_SIKDD_2024_paper_15-1.pdf (accessed on 1 March 2025).
  47. Gupta, T.K.; Goel, T.; Verma, I.; Dey, L.; Bhardwaj, S. Knowledge Graph Aided LLM Based ESG Question-Answering from News. 2024. Available online: https://ceur-ws.org/Vol-3753/paper6.pdf (accessed on 1 March 2025).
  48. Ghanem, H.; Cruz, C. Fine-Tuning vs. Prompting: Evaluating the Knowledge Graph Construction with LLMs. 2024. Available online: https://hal.science/hal-04862235/ (accessed on 1 March 2025).
  49. Son, J.; Seonwoo, Y.; Yoon, S.; Thorne, J.; Oh, A. Multi-hop Database Reasoning with Virtual Knowledge Graph. In Proceedings of the 1st Workshop on Knowledge Graphs and Large Language Models (KaLLM 2024), Bangkok, Thailand, 15 August 2024; Biswas, R., Kaffee, L.A., Agarwal, O., Minervini, P., Singh, S., de Melo, G., Eds.; Association for Computational Linguistics: Bangkok, Thailand, 2024; pp. 1–11. [Google Scholar] [CrossRef]
  50. Ventura de los Ojos, X. Application of LLM-Augmented Knowledge Graphs for Wirearchy Management; Universitat Oberta de Catalunya (UOC): Barcelona, Spain, 2024. [Google Scholar]
  51. Dernbach, S.; Agarwal, K.; Zuniga, A.; Henry, M.; Choudhury, S. GLaM: Fine-Tuning Large Language Models for Domain Knowledge Graph Alignment via Neighborhood Partitioning and Generative Subgraph Encoding. Proc. AAAI Symp. Ser. 2024, 3, 82–89. [Google Scholar] [CrossRef]
  52. Reitemeyer, B.; Fill, H.G. Leveraging LLMs in Semantic Mapping for Knowledge Graph-Based Automated Enterprise Model Generation; Gesellschaft für Informatik e.V.: Berlin, Germany, 2024. [Google Scholar] [CrossRef]
  53. Kollegger, A.B.; Erdl, A.; Hunger, M. Knowledge Graph Builder—Constructing a Graph from Arbitrary Text Using an LLM. In Proceedings of the 32nd International Symposium on Graph Drawing and Network Visualization (GD 2024), Vienna, Austria, 18–20 September 2024; Felsner, S., Klein, K., Eds.; Schloss Dagstuhl–Leibniz-Zentrum für Informatik: Dagstuhl, Germany, 2024; Volume 320, pp. 61:1–61:2. [Google Scholar] [CrossRef]
  54. Li, V.X.; Tan, Y. Dynamic Knowledge Graph Asset Pricing. 2024. Available online: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4841921 (accessed on 1 March 2025). [CrossRef]
  55. Tufek, N.; Saissre, A.; Just, V.P.; Ekaputra, F.J.; Sabou, M.; Hanbury, A. Validating Semantic Artifacts with Large Language Models. In European Semantic Web Conference; Springer Nature: Cham, Switzerland, 2024; pp. 92–101. [Google Scholar] [CrossRef]
  56. Gan, L.; Blum, M.; Dessí, D.; Mathiak, B.; Schenkel, R.; Dietze, S. Hidden Entity Detection from GitHub Leveraging Large Language Models. arXiv 2024, arXiv:2501.04455. [Google Scholar]
  57. Martin, A.; Witschel, H.F.; Mandl, M.; Stockhecke, M. Semantic Verification in Large Language Model-based Retrieval Augmented Generation. Proc. AAAI Symp. Ser. 2024, 3, 188–192. [Google Scholar] [CrossRef]
  58. Chu, J.M.; Lo, H.C.; Hsiang, J.; Cho, C.C. Patent Response System Optimised for Faithfulness: Procedural Knowledge Embodiment with Knowledge Graph and Retrieval Augmented Generation. In Proceedings of the 1st Workshop on Towards Knowledgeable Language Models (KnowLLM 2024), Bangkok, Thailand, 16 August 2024; Li, S., Li, M., Zhang, M.J., Choi, E., Geva, M., Hase, P., Ji, H., Eds.; Association for Computational Linguistics: Bangkok, Thailand, 2024; pp. 146–155. [Google Scholar] [CrossRef]
  59. Dobriy, D. Employing RAG to Create a Conference Knowledge Graph from Text. 2024. Available online: https://ceur-ws.org/Vol-3747/text2kg_paper4.pdf (accessed on 1 March 2025).
  60. Seif, A.; Toh, S.; Lee, H.K. A Dynamic Jobs-Skills Knowledge Graph. 2024. Available online: https://recsyshr.aau.dk/wp-content/uploads/2024/10/RecSysHR2024-paper_1.pdf (accessed on 1 March 2025).
  61. Camboim de Sá, J.; Anastasiou, D.; Da Silveira, M.; Pruski, C. Socio-cultural adapted chatbots: Harnessing Knowledge Graphs and Large Language Models for enhanced context awarenes. In Proceedings of the 1st Worskhop on Towards Ethical and Inclusive Conversational AI: Language Attitudes, Linguistic Diversity, and Language Rights (TEICAI 2024), St. Julian’s, Malta, 22 March 2024; Hosseini-Kivanani, N., Höhn, S., Anastasiou, D., Migge, B., Soltan, A., Dippold, D., Kamlovskaya, E., Philippy, F., Eds.; Association for Computational Linguistics: St Julians, Malta, 2024; pp. 21–27. [Google Scholar]
  62. Daga, E.; Carvalho, J.; Morales Tirado, A. Extracting Licence Information from Web Resources with a Large Language Model, Heraklion, Greece. 2024. Available online: https://oro.open.ac.uk/97612/ (accessed on 1 March 2025).
  63. D’Souza, J.; Mihindukulasooriya, N. The State of the Art Large Language Models for Knowledge Graph Construction from Text: Techniques, Tools, and Challenges. 2024. Available online: https://research.ibm.com/publications/the-state-of-the-art-large-language-models-for-knowledge-graph-construction-from-text-techniques-tools-and-challenges (accessed on 1 March 2025).
  64. Iga, V.I.R.; Silaghi, G.C. LLMs for Knowledge-Graphs Enhanced Task-Oriented Dialogue Systems: Challenges and Opportunities. In Proceedings of the Advanced Information Systems Engineering Workshops, Limassol, Cyprus, 3–7 June 2024; Almeida, J.P.A., Di Ciccio, C., Kalloniatis, C., Eds.; Springer: Cham, Switzerland, 2024; pp. 168–179. [Google Scholar] [CrossRef]
  65. Zhao, Q.; Qian, H.; Liu, Z.; Zhang, G.D.; Gu, L. Breaking the Barrier: Utilizing Large Language Models for Industrial Recommendation Systems through an Inferential Knowledge Graph. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA, 21–25 October 2024; pp. 5086–5093. [Google Scholar] [CrossRef]
  66. Zhang, Y.; Chen, Z.; Guo, L.; Xu, Y.; Zhang, W.; Chen, H. Making Large Language Models Perform Better in Knowledge Graph Completion. In Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, VIC, Australia, 28 October–1 November 2024; pp. 233–242. [Google Scholar] [CrossRef]
  67. Liu, L.; Wang, Z.; Bai, J.; Song, Y.; Tong, H. New Frontiers of Knowledge Graph Reasoning: Recent Advances and Future Trends. In Proceedings of the Companion Proceedings of the ACM Web Conference 2024, Singapore, 13–17 May 2024; pp. 1294–1297. [Google Scholar] [CrossRef]
  68. Wu, L.I.; Su, Y.; Li, G. Zero-Shot Construction of Chinese Medical Knowledge Graph with GPT-3.5-turbo and GPT-4. ACM Trans. Manag. Inf. Syst. 2024, 16, 3657305. [Google Scholar] [CrossRef]
  69. Bui, T.; Tran, O.; Nguyen, P.; Ho, B.; Nguyen, L.; Bui, T.; Quan, T. Cross-Data Knowledge Graph Construction for LLM-enabled Educational Question-Answering System: A Case Study at HCMUT. In Proceedings of the 1st ACM Workshop on AI-Powered Q&A Systems for Multimedia, Phuket, Thailand, 14 June 2024; pp. 36–43. [Google Scholar] [CrossRef]
  70. Jiang, Y.; Yao, J.; Li, F.; Zhang, Y. Research on Engineering Management Question-answering System in the Communication Industry Based on Large Language Models and Knowledge Graphs. In Proceedings of the 2024 7th International Conference on Machine Vision and Applications, Singapore, 12–14 March 2024; pp. 100–105. [Google Scholar] [CrossRef]
  71. Le, D.; Zhao, K.; Wang, M.; Wu, Y. GraphLingo: Domain Knowledge Exploration by Synchronizing Knowledge Graphs and Large Language Models. In Proceedings of the 2024 IEEE 40th International Conference on Data Engineering (ICDE), Utrecht, The Netherlands, 13–16 May 2024; pp. 5477–5480. [Google Scholar] [CrossRef]
  72. Pan, S.; Luo, L.; Wang, Y.; Chen, C.; Wang, J.; Wu, X. Unifying Large Language Models and Knowledge Graphs: A Roadmap. IEEE Trans. Knowl. Data Eng. 2024, 36, 3580–3599. [Google Scholar] [CrossRef]
  73. Li, D.; Xu, F. The Deep Integration of Knowledge Graphs and Large Language Models: Advancements, Challenges, and Future Directions. In Proceedings of the 2024 IEEE 2nd International Conference on Sensors, Electronics and Computer Engineering (ICSECE), Jinzhou, China, 29–31 August 2024; pp. 157–162. [Google Scholar] [CrossRef]
  74. Abu-Rasheed, H.; Weber, C.; Fathi, M. Knowledge Graphs as Context Sources for LLM-Based Explanations of Learning Recommendations. In Proceedings of the 2024 IEEE Global Engineering Education Conference (EDUCON), Kos, Greece, 8–11 May 2024; pp. 1–5. [Google Scholar] [CrossRef]
  75. Xu, J.; Zhang, H.; Zhang, H.; Lu, J.; Xiao, G. ChatTf: A Knowledge Graph-Enhanced Intelligent Q&A System for Mitigating Factuality Hallucinations in Traditional Folklore. IEEE Access 2024, 12, 162638–162650. [Google Scholar] [CrossRef]
  76. Song, Y.; Sun, P.; Liu, H.; Li, Z.; Song, W.; Xiao, Y.; Zhou, X. Scene-Driven Multimodal Knowledge Graph Construction for Embodied AI. IEEE Trans. Knowl. Data Eng. 2024, 36, 6962–6976. [Google Scholar] [CrossRef]
  77. Jovanović, M.; Campbell, M. Connecting AI: Merging Large Language Models and Knowledge Graph. Computer 2023, 56, 103–108. [Google Scholar] [CrossRef]
  78. Knez, T.; Žitnik, S. Towards Using Automatically Enhanced Knowledge Graphs to Aid Temporal Relation Extraction. In Proceedings of the First Workshop on Patient-Oriented Language Processing (CL4Health) @ LREC-COLING 2024; Demner-Fushman, D., Ananiadou, S., Thompson, P., Ondov, B., Eds.; ELRA and ICCL: Torino, Italy, 2024; pp. 131–136. [Google Scholar]
  79. Cao, X.; Xu, W.; Zhao, J.; Duan, Y.; Yang, X. Research on Large Language Model for Coal Mine Equipment Maintenance Based on Multi-Source Text. Appl. Sci. 2024, 14, 2946. [Google Scholar] [CrossRef]
  80. Schneider, P.; Klettner, M.; Simperl, E.; Matthes, F. A Comparative Analysis of Conversational Large Language Models in Knowledge-Based Text Generation. arXiv 2024, arXiv:2402.01495. [Google Scholar]
  81. Xu, S.; Chen, M.; Chen, S. Enhancing Retrieval-Augmented Generation Models with Knowledge Graphs: Innovative Practices Through a Dual-Pathway Approach. In Proceedings of the Advanced Intelligent Computing Technology and Applications, Tianjin, China, 5–8 August 2024; Huang, D.S., Si, Z., Chen, W., Eds.; Springer Nature: Singapore, 2024; pp. 398–409. [Google Scholar] [CrossRef]
  82. Huang, Q.; Wan, Z.; Xing, Z.; Wang, C.; Chen, J.; Xu, X.; Lu, Q. Let’s Chat to Find the APIs: Connecting Human, LLM and Knowledge Graph through AI Chain. In Proceedings of the 2023 38th IEEE/ACM International Conference on Automated Software Engineering (ASE), Luxembourg, 11–15 September 2023; pp. 471–483. [Google Scholar] [CrossRef]
  83. Sequeda, J.; Allemang, D.; Jacob, B. A Benchmark to Understand the Role of Knowledge Graphs on Large Language Model’s Accuracy for Question Answering on Enterprise SQL Databases. In Proceedings of the 7th Joint Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), Santiago, Chile, 14 June 2024; pp. 1–12. [Google Scholar] [CrossRef]
  84. Fu, L.; Guan, H.; Du, K.; Lin, J.; Xia, W.; Zhang, W.; Tang, R.; Wang, Y.; Yu, Y. SINKT: A Structure-Aware Inductive Knowledge Tracing Model with Large Language Model. In Proceedings of the 33rd ACM International Conference on Information and Knowledge Management, Boise, ID, USA, 21–25 October 2024; pp. 632–642. [Google Scholar] [CrossRef]
  85. Hu, Z.; Yang, P.; Liu, F.; Meng, Y.; Liu, X. Prompting Large Language Models with Knowledge-Injection for Knowledge-Based Visual Question Answering. Big Data Min. Anal. 2024, 7, 843–857. [Google Scholar] [CrossRef]
  86. Agrawal, G.; Pal, K.; Deng, Y.; Liu, H.; Chen, Y.C. CyberQ: Generating Questions and Answers for Cybersecurity Education Using Knowledge Graph-Augmented LLMs. Proc. AAAI Conf. Artif. Intell. 2024, 38, 23164–23172. [Google Scholar] [CrossRef]
  87. Hertling, S.; Paulheim, H. OLaLa: Ontology Matching with Large Language Models. In Proceedings of the 12th Knowledge Capture Conference 2023, Pensacola, FL, USA, 5–7 December 2023; pp. 131–139. [Google Scholar] [CrossRef]
  88. Cadeddu, A.; Chessa, A.; De Leo, V.; Fenu, G.; Motta, E.; Osborne, F.; Reforgiato Recupero, D.; Salatino, A.; Secchi, L. Optimizing Tourism Accommodation Offers by Integrating Language Models and Knowledge Graph Technologies. Information 2024, 15, 398. [Google Scholar] [CrossRef]
  89. Hello, N.; Di Lorenzo, P.; Strinati, E.C. Semantic Communication Enhanced by Knowledge Graph Representation Learning. In Proceedings of the 2024 IEEE 25th International Workshop on Signal Processing Advances in Wireless Communications (SPAWC), Lucca, Italy, 10–13 September 2024; pp. 876–880. [Google Scholar] [CrossRef]
  90. Nguyen, D.A.K.; Kha, S.; Le, T.V. HybridGCN: An Integrative Model for Scalable Recommender Systems with Knowledge Graph and Graph Neural Networks. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 1327. [Google Scholar] [CrossRef]
  91. Zhao, J.; Ma, Z.; Zhao, H.; Zhang, X.; Liu, Q.; Zhang, C. Self-consistency, Extract and Rectify: Knowledge Graph Enhance Large Language Model for Electric Power Question Answering. In Proceedings of the Advanced Intelligent Computing Technology and Applications, Tianjin, China, 5–8 August 2024; Huang, D.S., Pan, Y., Guo, J., Eds.; Springer Nature: Singapore, 2024; pp. 493–504. [Google Scholar] [CrossRef]
  92. Wu, L.I.; Li, G. Zero-Shot Construction of Chinese Medical Knowledge Graph with ChatGPT. In Proceedings of the 2023 IEEE International Conference on Medical Artificial Intelligence (MedAI), Beijing, China, 18–19 November 2023; pp. 278–283. [Google Scholar] [CrossRef]
  93. Procko, T.T.; Ochoa, O. Graph Retrieval-Augmented Generation for Large Language Models: A Survey. In Proceedings of the 2024 Conference on AI, Science, Engineering, and Technology (AIxSET), Laguna Hills, CA, USA, 30 September–2 October 2024; pp. 166–169. [Google Scholar] [CrossRef]
  94. Su, Y.; Liao, D.; Xing, Z.; Huang, Q.; Xie, M.; Lu, Q.; Xu, X. Enhancing Exploratory Testing by Large Language Model and Knowledge Graph. In Proceedings of the IEEE/ACM 46th International Conference on Software Engineering, Lisbon, Portugal, 14–20 April 2024; pp. 1–12. [Google Scholar] [CrossRef]
  95. Jin, B.; Liu, G.; Han, C.; Jiang, M.; Ji, H.; Han, J. Large Language Models on Graphs: A Comprehensive Survey. IEEE Trans. Knowl. Data Eng. 2024, 36, 8622–8642. [Google Scholar] [CrossRef]
  96. Procko, T.T.; Elvira, T.; Ochoa, O. GPT-4: A Stochastic Parrot or Ontological Craftsman? Discovering Implicit Knowledge Structures in Large Language Models. In Proceedings of the 2023 Fifth International Conference on Transdisciplinary AI (TransAI), Laguna Hills, CA, USA, 25–27 September 2023; pp. 147–154. [Google Scholar] [CrossRef]
  97. Sun, Y.; Yang, W.; Liu, Y. The Application of Constructing Knowledge Graph of Oral Historical Archives Resources Based on LLM-RAG. In Proceedings of the 2024 8th International Conference on Information System and Data Mining, New York, NY, USA, 24–26 June 2024; pp. 142–149. [Google Scholar] [CrossRef]
  98. Chen, Y.; Cui, S.; Huang, K.; Wang, S.; Tang, C.; Liu, T.; Fang, B. Improving Adaptive Knowledge Graph Construction via Large Language Models with Multiple Views. In Proceedings of the Knowledge Graph and Semantic Computing: Knowledge Graph Empowers Artificial General Intelligence, Shenyang, China, 24–27 August 2023; Wang, H., Han, X., Liu, M., Cheng, G., Liu, Y., Zhang, N., Eds.; Springer Nature: Singapore, 2023; pp. 273–284. [Google Scholar] [CrossRef]
  99. Khorashadizadeh, H.; Amara, F.Z.; Ezzabady, M.; Ieng, F.; Tiwari, S.; Mihindukulasooriya, N.; Groppe, J.; Sahri, S.; Benamara, F.; Groppe, S. Research Trends for the Interplay between Large Language Models and Knowledge Graphs. arXiv 2024, arXiv:2406.08223. [Google Scholar]
  100. Dehal, R.S.; Sharma, M.; de Souza Santos, R. Exposing Algorithmic Discrimination and Its Consequences in Modern Society: Insights from a Scoping Study. In Proceedings of the 46th International Conference on Software Engineering: Software Engineering in Society, New York, NY, USA, 14–20 April 2024; pp. 69–73. [Google Scholar] [CrossRef]
  101. Peng, B.; Zhu, Y.; Liu, Y.; Bo, X.; Shi, H.; Hong, C.; Zhang, Y.; Tang, S. Graph Retrieval-Augmented Generation: A Survey. arXiv 2024, arXiv:2408.08921. [Google Scholar]
  102. Guo, D.; Ren, S.; Lu, S.; Feng, Z.; Tang, D.; Liu, S.; Zhou, L.; Duan, N.; Svyatkovskiy, A.; Fu, S.; et al. GraphCodeBERT: Pre-training Code Representations with Data Flow. arXiv 2021, arXiv:2009.08366. [Google Scholar]
  103. Andrus, B.R.; Nasiri, Y.; Cui, S.; Cullen, B.; Fulda, N. Enhanced Story Comprehension for Large Language Models through Dynamic Document-Based Knowledge Graphs. Proc. Aaai Conf. Artif. Intell. 2022, 36, 10436–10444. [Google Scholar] [CrossRef]
Figure 1. Systematic review selection process.
Figure 2. Systematic review visualization.
Figure 3. Distribution by model implemented.
Table 1. Inclusion and exclusion criteria for paper selection.
Include Papers That Are | Exclude Papers That Are
IC-1: Written in English | EC-1: Published on blogs, forums, pages, or unofficial sites (e.g., not in conferences or journals)
IC-2: Published in an official conference or journal (peer-reviewed) | EC-2: Published in books or on arXiv (not yet peer-reviewed in a conference or journal)
IC-3: Discuss KGs, LLMs, or semantic graphs and explore their workings | EC-3: Primarily focused on AI/ML (determined by reviewing the title, introduction, and conclusion)
IC-4: Published between 2019 and 2024 | EC-4: Mention Knowledge Graphs but are not focused on them
Table 2. Models implemented in each study.
Model Implemented | Theme | Papers
Hybrid/Retrieval-Augmented Models | Combines LLMs with external retrieval, symbolic reasoning, or structured knowledge to enhance factual accuracy and context. | [11,12,13,14,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51,52,53,54,55,56,57,58,59,60,61,62,63,64]
Encoder–Decoder Transformers (Seq2Seq Models) | Transforms input sequences into structured or translated outputs, ideal for summarization, translation, and KG-to-text tasks. | [65,66,67,68,69,70,71,72,73,74,75,76,77,78,79,80,81]
Autoregressive Transformers (Decoder-Only) | A family of autoregressive language models designed to generate coherent and contextually relevant text. | [82,83,84,85,86]
Bidirectional Transformers (Encoder-Only) | A transformer-based model pre-trained for understanding the context of words in both directions. | [87,88]
Symbolic–Neural Hybrids/Others | Integrates neural networks with rule-based or symbolic components, often used in domain-specific or low-resource applications. | [89,90]
Table 3. Reciprocal relationship between KGs and LLMs.
Facet | Knowledge Graph → LLM | LLM → Knowledge Graph
Purpose | To enrich the LLM’s output accuracy and data governance, providing factual grounding and regulatory compliance. | To automate or augment KG construction, expanding or updating the KG.
Input Format | Structured data (triples, graphs) | Unstructured data (text, documents)
Output | Improved text generation, reasoning, or prediction | Structured data such as nodes, edges, and triples
Key Processes | Grounding, reasoning, and contextualization | Entity extraction, relation identification
Example | Using a KG to answer “Who is the CFO of Twitter?” accurately | Extracting “CFO of Twitter is Mr. X” from text to populate the KG
Applications | Factual chatbots, personalized assistants | Dynamic KG updates, domain-specific KGs
Challenges | Handling incomplete or sparse KGs | Risk of introducing errors or biases
Note: The examples mentioned in the table are fictitious.
Table 4. Identified gaps aligned with research questions.
Research Question | Gap | Description
RQ1 (LLMs in KG Construction) | Static Construction Paradigms | Current focus is on static KGs, neglecting real-time dynamic updates.
RQ1 (LLMs in KG Construction) | Automation Deficits | Automated schema generation and granular entity linking remain under-addressed.
RQ2 (KGs in Enhancing LLMs) | Explainability Challenges | Limited traceability between LLM outputs and KG nodes.
RQ2 (KGs in Enhancing LLMs) | Sparse Multi-Hop Reasoning | Limited use of KGs for complex reasoning (e.g., causal inference).
RQ3 (Methodological Diversity) | Ethical Oversights | Few studies address fairness, bias mitigation, or ethical governance.
Table 5. Strategic recommendations.
Focus Area | Recommendation | Alignment with Gaps
Cross-Disciplinary Research | Prioritize underrepresented domains (e.g., finance, cultural heritage). | Addresses disciplinary imbalance and interdisciplinary gaps.
Multimodal Integration | Develop frameworks for combining KGs with visual, auditory, or sensor data. | Mitigates text-centric and multimodal input gaps.
Domain-Specific Models | Fine-tune LLMs and build specialized KGs for healthcare, finance, etc. | Reduces reliance on general-purpose models.
Explainability | Embed XAI frameworks to map LLM outputs to KG nodes. | Resolves explainability and traceability deficits.
Hybrid Methodologies | Integrate symbolic AI with machine learning for dynamic KG construction. | Addresses automation and methodological silos.
Ethical Governance | Institutionalize fairness audits and bias mitigation in methodologies. | Mitigates ethical oversights in sensitive domains.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
