Quantifying Relational Exploration in Cultural Heritage Knowledge Graphs with LLMs: A Neuro-Symbolic Approach for Enhanced Knowledge Discovery
Abstract
1. Introduction
- Generate contextually relevant and diverse explanations: LLMs are trained on massive text corpora, giving them a much richer understanding of language and context than traditional NLP models; they can grasp subtle relationships between concepts and generate nuanced, insightful explanations. We leveraged this to produce diverse, flexible explanations that go beyond predefined templates, adapt to the specific semantic context of each relationship, and provide deep insights about the discovered connections rather than simple information.
- Personalize explanations by considering user context: LLM-generated explanations are much easier for users to understand than graph paths or complex SPARQL queries. We achieved this by prompting the LLM with structured text containing the connection information and its context, and letting the LLM generate a response from that text. The system personalizes explanations by considering the user's current context, preferences, knowledge, and past interactions, ensuring that the presented explanations are both relevant and engaging. Personalization is achieved explicitly, by allowing the user to define specific parameters or interests, and implicitly, by taking past interactions and expertise into account.
- Quantify relationship interestingness using a mathematical model: We introduced a novel mathematical model to assess the interestingness of relationships. By combining a measure of semantic relatedness within the graph with a measure of contextual relevance to a specific user, this model provides an objective, data-driven way to guide the exploration process, moving beyond simple relevance and revealing non-trivial, non-obvious connections.
2. Materials and Methods
2.1. Knowledge-Agnostic Graph Traversal
2.2. Knowledge-Based Approaches
2.3. Explainable AI in Knowledge Graphs
2.4. Large Language Models in Knowledge Graph Reasoning
2.5. Connection Discovery
Algorithm 1. Connection Discovery.
- "Person X was born in place Y";
- "Person X works in place Y";
- "Person X wrote a book about place Y";
- "Person X created a painting that depicts place Y".
- entity1_id: The identifier of the first connected entity;
- entity2_id: The identifier of the second connected entity;
- relationship_type: The type of relationship connecting the entities;
- relevant_metadata: Relevant metadata, such as timestamps or additional information (when available);
- explanation_text: A simple textual label describing the relationship; this is later used by the LLM to generate explanations.
The interestingness score is computed as a weighted combination of semantic relatedness and contextual relevance:

I(e1, r, e2) = α · SR(e1, e2) + (1 − α) · CR(r, U)

where:

- I(e1, r, e2): The overall interestingness score of the relationship instance r between entities e1 and e2;
- SR(e1, e2): The semantic relatedness score between entities e1 and e2; this quantifies how closely related the two entities are in the context of the graph structure;
- CR(r, U): The contextual relevance score of the relationship r to the user's context U; this takes into consideration the user's preferences, history, and expertise;
- α: A weight parameter (0 ≤ α ≤ 1) that balances the influence of semantic relatedness and contextual relevance.
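The weighted combination defined above can be sketched in a few lines of Python. The function name and the input normalization are illustrative, not the paper's implementation; the SR and CR values are assumed to be precomputed.

```python
def interestingness(sr: float, cr: float, alpha: float = 0.5) -> float:
    """Combine semantic relatedness SR(e1, e2) and contextual relevance
    CR(r, U) into a single interestingness score, as a convex
    combination weighted by alpha (0 <= alpha <= 1)."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return alpha * sr + (1.0 - alpha) * cr

# alpha = 1 relies only on graph structure; alpha = 0 only on user context.
```

Exposing alpha directly (as the faceted-search slider in Section 3.2 does) lets a user smoothly trade structural relatedness against personal relevance.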
2.5.1. Semantic Relatedness (SR)
- P(e1, e2): The set of all simple paths connecting entities e1 and e2 within the KG; a simple path is a path that does not visit a node twice;
- N = |P(e1, e2)|: The total number of simple paths connecting e1 and e2;
- |pi|: The length of the path pi, measured as the number of edges traversed.
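A minimal sketch of the quantities above, using a plain adjacency-dict graph. The exact aggregation the paper applies over path lengths is not reproduced here; this sketch uses one plausible form, the average inverse path length, so that many short paths between two entities push SR toward 1. Enumerating all simple paths is exponential in the worst case, so real systems typically bound the path length.

```python
def simple_paths(graph, start, goal, path=None):
    """Enumerate all simple paths (no node visited twice) between two
    entities in an adjacency-dict representation of the KG."""
    path = (path or []) + [start]
    if start == goal:
        yield path
        return
    for neighbor in graph.get(start, ()):
        if neighbor not in path:
            yield from simple_paths(graph, neighbor, goal, path)

def semantic_relatedness(graph, e1, e2):
    """Illustrative SR: average of 1/|p_i| over all simple paths p_i,
    where |p_i| is the number of edges traversed."""
    if e1 == e2:
        return 1.0  # an entity is maximally related to itself
    paths = list(simple_paths(graph, e1, e2))
    if not paths:
        return 0.0  # disconnected entities
    return sum(1.0 / (len(p) - 1) for p in paths) / len(paths)
```

For example, with edges A→B, B→C, and A→C, the paths A→C (one edge) and A→B→C (two edges) give SR = (1 + 0.5) / 2 = 0.75.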
2.5.2. Contextual Relevance (CR)
- v_r: The vector embedding of the textual description of the relationship r, generated by a Large Language Model (LLM);
- v_U: The vector embedding of the user's context U, also generated by an LLM. The user context can be a concatenation of different types of information: search history, domain expertise, specified user interests, or other available information. The LLM takes this combined context information as input when generating the embedding;
- CR(r, U) = cos(v_r, v_U): The cosine similarity between the embeddings of r and U. The score ranges from −1 to 1, with values closer to 1 indicating a higher degree of contextual relevance.
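The cosine similarity itself is straightforward; a self-contained version is shown below with plain lists standing in for the LLM-generated embeddings (in the actual pipeline, v_r and v_U would come from the sentence-embedding model described in Section 3.2).

```python
import math

def cosine_similarity(v1, v2):
    """CR(r, U): cosine of the angle between the relationship embedding
    v_r and the user-context embedding v_U; ranges from -1 to 1."""
    dot = sum(a * b for a, b in zip(v1, v2))
    norm1 = math.sqrt(sum(a * a for a in v1))
    norm2 = math.sqrt(sum(b * b for b in v2))
    if norm1 == 0.0 or norm2 == 0.0:
        return 0.0  # degenerate zero vector: treat as unrelated
    return dot / (norm1 * norm2)
```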
2.6. Explanation Generation with LLMs
- entity1_description: A concise description of the first entity;
- entity2_description: A concise description of the second entity;
- relationship_type: The type of relationship connecting the two entities;
- interestingness_score: The value of I(e1, r, e2);
- user_context_description: A textual representation of the user's context U.
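The five prompt fields above can be assembled into a single LLM prompt as sketched below. The template wording is a hypothetical illustration, not the exact prompt used in the paper.

```python
def build_explanation_prompt(entity1_description, entity2_description,
                             relationship_type, interestingness_score,
                             user_context_description):
    """Assemble the prompt fields into one natural-language instruction
    for the explanation-generating LLM."""
    return (
        f"Explain the connection between {entity1_description} and "
        f"{entity2_description}, which are linked by the relationship "
        f"'{relationship_type}' (interestingness score: "
        f"{interestingness_score:.2f}). Tailor the explanation to this "
        f"user context: {user_context_description}."
    )
```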
Algorithm 2. Explanation Generation.
3. Results
3.1. Dataset
- Triple Filtering: Triples that were not directly or indirectly related to the selected entity types were excluded, ensuring a cleaner and more relevant subset for evaluation.
- Normalization: We normalized all entities and relationships by removing blank nodes and resolving duplicate entities, avoiding errors due to inconsistent identifiers and ensuring consistent results.
- Missing Value Handling: Incomplete entities and triples that did not include all required attributes (such as entity ID and relation type) were removed.
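The three preprocessing steps can be sketched as a single filtering pass over the triples. The dict-based triple representation and the relevance test (membership of either endpoint in a set of relevant entities) are simplifications of what the paper describes.

```python
def preprocess_triples(triples, relevant_entities):
    """Apply triple filtering, blank-node removal, missing-value
    handling, and de-duplication to a list of {"s", "p", "o"} dicts."""
    cleaned, seen = [], set()
    for triple in triples:
        s, p, o = triple.get("s"), triple.get("p"), triple.get("o")
        if not (s and p and o):                                  # missing values
            continue
        if str(s).startswith("_:") or str(o).startswith("_:"):  # blank nodes
            continue
        if s not in relevant_entities and o not in relevant_entities:
            continue                                             # triple filtering
        key = (s, p, o)
        if key in seen:                                          # duplicates
            continue
        seen.add(key)
        cleaned.append(triple)
    return cleaned
```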
3.2. Implementation Details
- LLM: We used the Llama-2-7B model, specifically the version released on 18 July 2023, chosen for its balance between performance and open-source availability. The model was fine-tuned on a combined dataset consisting of 70% domain-specific data from the WCH-LOD dataset and 30% synthetic data generated with our mathematical model by creating varied combinations of entities and relationships. Fine-tuning ran for 10 epochs with a batch size of 16, a learning rate of 2 × 10⁻⁵, and a learning-rate decay of 0.01, using the AdamW optimizer, which provides a good balance between quality and computational efficiency.
- Embedding Model: We used the all-mpnet-base-v2 model from the Sentence Transformers library (version 2.2.2). This model was selected for its proven ability to capture semantic similarity in textual data, which is critical for accurately measuring the relevance between the user context and the relationship. The Sentence Transformers library provides a convenient way to generate embeddings, and this model is well suited to many types of text.
- Graph Database: Neo4j v4.4.14 was used as our graph database, chosen for its scalability, expressiveness, and powerful Cypher query language. We loaded the extracted RDF data into Neo4j by transforming the RDF triples into Neo4j nodes and relationships, and created indexes on the entity IDs for faster lookup. All queries were implemented in Cypher.
- SPARQL Engine: The Apache Jena Fuseki SPARQL server (version 4.10.0) was used as our SPARQL engine, chosen for its stability, its open-source nature, and the ease of creating and executing SPARQL queries with it.
- Faceted Search: We implemented a custom faceted search interface using Python Flask (version 2.3.3) and JavaScript. It allows users to interactively explore the results and adjust the alpha parameter with a slider, which dynamically recomputes the interestingness score to expose different perspectives on the data. The interface also shows summaries of the results, supporting a more exploratory search.
3.3. Experimental Procedures
- Baseline Comparisons: Our system was compared to two baseline methods:
- Graph Traversal Baseline: We implemented a standard Breadth-First Search (BFS) using Neo4j's graph database library, together with Neo4j's default implementation of Dijkstra's algorithm for computing the shortest path (by number of edges) between two entities. For the BFS, the explanation was a simple textual description of the traversed path, which includes no domain knowledge and does not take user context into account.
- Knowledge-Based Baseline: We created a knowledge-based baseline by manually constructing 50 SPARQL queries and 50 simple text-based templates for generating explanations. The SPARQL queries were based on the most frequent relationship types found in the dataset. The templates used a simple keyword-based approach and did not include a mathematical model for guiding the search. Examples of the manual SPARQL queries are shown in Box 2. Box 2. A sample of the manual SPARQL queries used for the knowledge-based baseline.
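The graph traversal baseline described above can be sketched in pure Python. This is a generic BFS over an adjacency dict, not Neo4j's implementation, and the path description mirrors the baseline's context-free textual explanations.

```python
from collections import deque

def bfs_shortest_path(graph, start, goal):
    """Breadth-First Search: return the shortest path (by edge count)
    between two entities, or None if they are disconnected."""
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for neighbor in graph.get(node, ()):
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(path + [neighbor])
    return None

def describe_path(path):
    """Plain textual description of the traversed path, with no domain
    knowledge and no user context."""
    return " -> ".join(path)
```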
- Quantitative Evaluation: To assess the quality and performance of the system, we computed the following metrics:
- Precision and Recall: We measured precision and recall against a manually created gold standard dataset consisting of 200 manually created relationships and their correct, human-generated explanations. Two domain experts with cultural heritage knowledge validated each result and produced the correct explanation. A result was considered correct if it either matched a result from the gold standard dataset or was independently validated by both experts. The inter-rater agreement between the experts was 0.85, indicating a high level of consistency in the annotation process; any discrepancies in results or explanations were discussed and resolved in a meeting of the experts.
- Text Quality: Text quality was measured using BLEU (Bilingual Evaluation Understudy), ROUGE-L (Recall-Oriented Understudy for Gisting Evaluation), and METEOR (Metric for Evaluation of Translation with Explicit Ordering), all widely used in NLP for evaluating generated text. BLEU measures n-gram overlap between the generated text and reference texts; ROUGE-L computes the longest common subsequence, capturing longer matching patterns; METEOR is an alignment-based metric that includes synonyms and stems, measuring the degree of exact and near-matches between two texts. While these metrics are good at measuring the similarity of two texts, they may not capture broader coherence or overall quality.
- Interestingness Correlation: We used Spearman's rank correlation, a measure of the monotonic relationship between two variables, to compare the scores produced by our mathematical model against a set of randomly generated scores. The random ratings were produced by assigning each relationship a random score between 0 and 1, serving as a baseline for assessing the quality of our interestingness score. A value close to 1 indicates that the proposed method departs from random scoring and captures a specific type of relation in the data.
- Diversity: The diversity of the generated explanations was manually analyzed by two domain experts, who classified results by the variety of topics and the novelty of the discovered relationships. Novel results were defined as those containing interesting insights that are not commonly found using standard keyword searches or basic graph traversal methods.
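The set-based precision/recall/F1 and the Spearman rank correlation used in this evaluation can be computed as sketched below. The tuple-based result representation is illustrative, and the Spearman implementation uses the classic no-ties formula rather than a tie-corrected variant.

```python
def precision_recall_f1(retrieved, gold):
    """Set-based precision, recall, and F1 of retrieved relationships
    against a gold-standard set."""
    retrieved, gold = set(retrieved), set(gold)
    tp = len(retrieved & gold)
    precision = tp / len(retrieved) if retrieved else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

def spearman_rho(xs, ys):
    """Spearman rank correlation via
    rho = 1 - 6 * sum(d_i^2) / (n * (n^2 - 1)).
    Assumes no tied values; use average ranks if ties can occur."""
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))
    return 1 - 6 * d2 / (n * (n ** 2 - 1))
```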
3.4. Quantitative Results
- Precision, Recall, and F1-Score: The LLM-enhanced approach achieved a precision of 0.70, a recall of 0.68, and an F1-score of 0.69, significantly outperforming both the graph traversal baseline (precision: 0.28, recall: 0.25, F1-score: 0.26) and the knowledge-based baseline (precision: 0.45, recall: 0.42, F1-score: 0.43). These results demonstrate the efficacy of our approach in discovering relevant relationships while minimizing irrelevant ones. The graph traversal baseline performed particularly poorly because it incorporates no domain knowledge and generated paths with little meaning; the knowledge-based method did much better by relying on human-engineered SPARQL queries, but the LLM-enhanced approach improved on it further by combining the LLM with the interestingness metric.
- Text Quality Metrics: The text generated by our method was evaluated using BLEU, ROUGE-L, and METEOR. Our method scored best on all metrics, with BLEU = 0.52, ROUGE-L = 0.58, and METEOR = 0.63, indicating that the LLM produces more human-like text than the baselines. The knowledge-based baseline uses only simple templates, resulting in poor text quality, while the graph baseline produces no natural-language output at all, relying on a simple textual representation of the path.
- Interestingness Correlation: The Spearman correlation between the interestingness scores produced by our method and the random ratings was 0.65, demonstrating that our mathematical model is significantly more correlated with relevance than random values and is therefore a useful tool for guiding the search. Neither the graph-based nor the knowledge-based baseline produces a correlation score, meaning their results are not aligned with a relevant, meaningful measure of interestingness.
- The experiments clearly demonstrate the advantage of the proposed neuro-symbolic method over traditional techniques.
- Effectiveness of LLMs: Incorporating an LLM allowed us to generate more relevant and human-interpretable explanations than either the template-based baseline or the graph traversal baseline, demonstrating the effectiveness of LLMs.
- Importance of the Mathematical Model: The novel mathematical model, based on computing both semantic relatedness and contextual relevance, plays a key role in discovering relevant and non-trivial relationships, as highlighted by the interestingness correlation, which is significantly higher than for both baselines. This shows the importance of using a mathematical model to guide the system.
4. Conclusions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
Metric | Graph Baseline | Knowledge Baseline | Our Approach (LLM) |
---|---|---|---|
Precision (P) | 0.28 | 0.45 | 0.70 |
Recall (R) | 0.25 | 0.42 | 0.68 |
F1-score | 0.26 | 0.43 | 0.69 |
BLEU Score | 0.18 | 0.30 | 0.52 |
ROUGE-L Score | 0.24 | 0.32 | 0.58 |
METEOR Score | 0.28 | 0.40 | 0.63 |
Interestingness Correlation | N/A | N/A | 0.65 |
Share and Cite
Maree, M. Quantifying Relational Exploration in Cultural Heritage Knowledge Graphs with LLMs: A Neuro-Symbolic Approach for Enhanced Knowledge Discovery. Data 2025, 10, 52. https://doi.org/10.3390/data10040052