5. Solution Architecture for Privacy-Aware and Trustworthy Conversational AI
With the advancement of conversational AI platforms powered by LLMs, there arises a need to address issues of explainability, data privacy, and the ethical use of data to ensure their safe use in industry and practical applications. LLMs, despite their significant natural language understanding capabilities, often operate as black-box models, making their decisions difficult to interpret. Integrating knowledge graphs (KGs) [46] with LLMs offers a solution to these challenges by coupling the structured knowledge representation of KGs with the linguistic proficiency of LLMs. Furthermore, the incorporation of role-based access control (RBAC) into the architecture ensures that access to the AI system aligns with organizational policies and is granted only to authorized roles.
The integration of key components such as LLMs, KGs, and RBAC systems provides distinct advantages in the development and application of conversational AI platforms.
LLMs are foundational to conversational platforms, offering deep linguistic understanding that facilitates nuanced interactions. These models are capable of continuous adaptation, refining their responses based on ongoing user interactions, which enhances the relevance and accuracy of their output over time.
KGs deliver structured and validated domain-specific knowledge, significantly enhancing the system’s explainability. They do this by tracing the origins of information, which not only clarifies the decision-making process but also complements LLMs by filling gaps in domain-specific expertise. This synergy between LLMs and KGs is crucial for applications requiring a high degree of accuracy and reliability in specialized fields.
RBAC ensures data privacy and aligns AI system usage with organizational policies and compliance mandates through controlled access mechanisms. This methodical approach to managing data access significantly advances the security of AI systems, providing a structured framework that supports secure and efficient operations across various levels of user engagement.
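To make the access-control notion concrete, the following is a minimal Python sketch of a role-to-permission check; the role names, permission strings, and mapping are illustrative assumptions rather than the policy model of any particular deployment.

# Minimal RBAC sketch. Roles and permission names below are hypothetical
# examples; a real deployment would load these from organizational policy.
ROLE_PERMISSIONS = {
    "editor":   {"read_articles", "read_sentiment", "run_topic_prediction"},
    "reporter": {"read_articles", "read_sentiment"},
    "guest":    {"read_articles"},
}

def is_authorized(role: str, required_permission: str) -> bool:
    """Return True only if the user's role grants the requested capability."""
    return required_permission in ROLE_PERMISSIONS.get(role, set())

# Example: a reporter may read sentiment but not trigger topic prediction.
assert is_authorized("reporter", "read_sentiment")
assert not is_authorized("reporter", "run_topic_prediction")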
Several architectural paradigms can merge KGs and LLMs. As detailed by Shirui Pan et al. in [47], the primary techniques include KG-enhanced LLMs, LLM-augmented KGs, and a synergistic LLMs+KGs approach. Our proposition of a closed-loop architecture is in line with this synergistic design, underscoring the advantages of combining LLMs and KGs.
5.1. System Components and Design Choices
To elucidate the practical implementation of our architectural design, we anchor our exposition in the media and journalism domain, specifically leveraging news data aggregated from the AI NewsHub platform (https://ainewshub.ie, accessed on 2 June 2024).
5.1.1. AI NewsHub Dataset Description
The AI NewsHub dataset is a systematically aggregated collection of news articles, procured daily from web crawling and search engines, and stored within a relational database system. Each record within this dataset represents an article and encapsulates the following attributes: identifier (ID), title, article content, published date, publisher name, and associated country of origin. Upon acquisition, articles undergo an analytical phase wherein they are classified based on their relevant topics, affiliated industry sectors, and originating publishers’ categories.
5.1.2. Construction of the AI NewsHub-Based KG
The KGs derived from the AI NewsHub dataset comprise two primary node classes: articles and topics. The “article” node includes attributes such as ID, content, sentiment analysis results, and a multi-dimensional vector array. The ID and content are extracted directly from the dataset, sentiment is derived using the VADER sentiment analysis tool [48], and the vector array is generated using the FastText library [49].
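As an illustration of how these node properties can be derived, the following Python sketch combines the VADER analyzer with a pre-trained FastText model; the model file name, sentiment cut-offs, and property names are assumptions made for demonstration, not the exact implementation used in our pipeline.

# Hedged sketch: deriving the 'sentiment' and 'vector' properties of an article node.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import fasttext

analyzer = SentimentIntensityAnalyzer()
embedder = fasttext.load_model("cc.en.300.bin")  # assumed pre-trained English FastText model file

def article_properties(article_id: int, content: str) -> dict:
    # Map VADER's compound score to a coarse label (illustrative cut-offs).
    compound = analyzer.polarity_scores(content)["compound"]
    if compound >= 0.05:
        sentiment = "positive"
    elif compound <= -0.05:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    # FastText sentence vector over the article content; newlines are removed
    # because get_sentence_vector expects a single line of text.
    vector = embedder.get_sentence_vector(content.replace("\n", " ")).tolist()
    return {"id": article_id, "content": content, "sentiment": sentiment, "vector": vector}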
Each “article” node establishes a directed relationship with a “topic” node if it is categorized under a specific topic. The “topic” node contains the topic ID and name, extracted from the topic table in the dataset. The KG is managed using Neo4j [50], with queries structured in Cypher [51].
The data processing involved several key steps:
Extraction: Articles were parsed to extract relevant content using advanced natural language processing (NLP) techniques. The dataset included topic tags and industry sectors for AI news articles.
Transformation: The extracted content was transformed into a structured format suitable for graph representation, involving tagging and categorizing key entities and topics.
Data cleaning: Ensuring the quality and accuracy of the KG was crucial. This included the following:
– Deduplication: Identifying and merging duplicate nodes and relationships to avoid redundancy and ensure uniqueness.
– Normalization: Standardizing entity names and topics for consistency. Topics and industry sectors were mapped to each article, with relevant tags available on the AI NewsHub web portal at www.ainewshub.ie (accessed on 2 June 2024).
– Validation: Cross-referencing data with reliable sources to verify accuracy before adding them to the KG, ensuring reliability.
Loading: The structured data were loaded into the Neo4j database, creating nodes and relationships according to the defined schema.
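A minimal loading sketch using the official Neo4j Python driver is shown below; the connection details and the shape of each input record are illustrative assumptions, while the Article and Topic labels and the HAS_TOPIC relationship follow the schema described above.

# Hedged loading sketch with the Neo4j Python driver (placeholder credentials).
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# MERGE keeps the load idempotent, which supports the deduplication step above.
LOAD_ARTICLE = """
MERGE (a:Article {id: $id})
SET a.content = $content, a.sentiment = $sentiment, a.vector = $vector
MERGE (t:Topic {id: $topic_id})
SET t.name = $topic_name
MERGE (a)-[:HAS_TOPIC]->(t)
"""

def load_article(record: dict) -> None:
    # 'record' is assumed to carry id, content, sentiment, vector, topic_id, topic_name.
    with driver.session() as session:
        session.run(LOAD_ARTICLE, **record)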
By the end of the development phase, the KG comprised approximately 13,000 nodes and 26,000 relationships, categorized into articles, topics, entities, and relationships such as “HAS_TOPIC”. A visual representation of a KG node is also shown in Figure A3.
5.1.3. Llama-2 LLM
In July 2023, Meta introduced the Llama-2 series [52], an advancement in its large language model lineup. The models in this series range from 7 billion to 70 billion parameters. Compared to the preceding Llama-1 models, Llama-2 variants were trained on an expanded dataset with a 40% increase in token count and an extended context length of 4096 tokens. The 70B model incorporates the grouped-query attention mechanism [53], which is designed for efficient inference.
A specialized variant within the Llama-2 series is Llama-2-Chat, which is fine-tuned for dialogue applications using reinforcement learning from human feedback (RLHF). Benchmark evaluations indicate that the Llama-2-Chat model offers improvements in areas such as user assistance and safety. Performance metrics suggest that this model’s capabilities align closely with those of ChatGPT, as judged by human evaluators. Additionally, the 70B version of Llama-2 has been recognized on Hugging Face’s Open LLM Leaderboard, exhibiting strong performance on several benchmarks, including but not limited to ARC [54], HellaSwag [55], MMLU [56], and TruthfulQA (MC) [57].
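For illustration, a Llama-2-Chat checkpoint can be loaded through the Hugging Face Transformers library roughly as sketched below; the checkpoint name, prompt, and generation settings are assumptions, and access to the weights requires accepting Meta’s license terms.

# Hedged sketch: loading a Llama-2-Chat checkpoint for dialogue-style generation.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-chat-hf"  # assumed gated checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "Summarise the key points of the following news article: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))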
With the details of the three main components provided, we will now describe the proposed system architecture in the next section.
5.2. Architecture Workflow
Figure 10 and Figure 11 present the architectural workflow of our proposed system. The former offers a general overview, elucidating the synergistic connection between the three components; the latter provides a detailed view of the intricate interactions between the KG, the LLM, and the RBAC service.
The detailed sequence of operations in the system is described as follows:
Step 1: The user (U) communicates a specific request to the RBAC service (S).
Step 2: The RBAC service (S) evaluates the permissions associated with the user and forwards the request to the access control (AC) component, which determines the user’s data access rights. If access is granted, the process proceeds to Step 3; if denied, it transitions to Step 2.2.
Step 3: The prompt analysis module refines and refactors the user’s prompt (if necessary) and identifies the key capabilities required to compose an appropriate response. In our journalism use case, these capabilities include natural language understanding with generic response generation, as well as specialized capabilities such as similar article finding, sentiment analysis, fact-checking, and prediction of article topics and relevant industry sectors.
Step 4: The Llama-2 LLM processes the user’s request based on the identified capabilities. If a generic response is required, the LLM responds to the user directly (Step 4.1). If specialized features from the KG are required, the process moves to Step 4.2.
Step 5: Llama-2 generates or invokes relevant Cypher instructions for Neo4j based on the required capabilities.
Step 6: A Cypher validation layer (CVL) ensures the integrity and safety of these instructions. Validated queries are executed on the Neo4j knowledge graph.
Step 7: KG processes the queries, extracting the pertinent data. An error handling (EH) mechanism inspects the extracted insights for anomalies.
Step 8: Error-free insights are compiled and returned to the LLM.
Steps 9 and 10: LLM formats the insights for user-friendly presentation and provides a response to the user. The user (U) receives the curated data, ensuring only permitted information is accessed. Users have the option to offer feedback through a feedback loop (FB), which might guide the LLM’s subsequent interactions.
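The control flow of Steps 1–10 can be summarized by the following Python sketch; the helper callables (rbac_check, analyse_prompt, llm_generate, validate_cypher, run_on_kg) are hypothetical names introduced purely to make the branching explicit and do not correspond to the APIs of the actual components.

# Condensed control-flow sketch of the workflow; all helpers are passed in
# as callables so the snippet stays self-contained and purely illustrative.
def handle_request(user, prompt, *, rbac_check, analyse_prompt, llm_generate,
                   validate_cypher, run_on_kg):
    if not rbac_check(user, prompt):                      # Steps 1-2: RBAC and access control
        return "Access denied for your role."             # Step 2.2: request rejected
    capabilities = analyse_prompt(prompt)                 # Step 3: prompt analysis module
    if "kg" not in capabilities:                          # Step 4.1: generic LLM answer
        return llm_generate(prompt)
    cypher = llm_generate(prompt, mode="cypher")          # Steps 4.2-5: Cypher generation
    if not validate_cypher(cypher):                       # Step 6: Cypher validation layer (CVL)
        return "Query rejected by the validation layer."
    insights, error = run_on_kg(cypher)                   # Steps 7-8: KG query and error handling (EH)
    if error:
        return "The knowledge graph could not answer this request."
    return llm_generate(prompt, context=insights)         # Steps 9-10: format insights and respond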
Based on the presented workflow, we now elucidate the decision-making process of the large language model (LLM) in two primary scenarios: one where the LLM interacts with the knowledge graph (KG) to deduce answers, and another where the LLM operates autonomously.
5.2.1. Interactions between LLM and KG
The LLM’s reliance on the KG is made manifest in the following application scenarios:
1. Similar article finder
For implementing recommendation systems in media and journalism services, it is imperative to identify articles akin to a given one. Using the cosine similarity metric, we measure the resemblance between the focal article’s vector and those of all other articles. The articles with the highest similarity scores are then selected, as represented in Algorithm 1.
Algorithm 1: Identify Top 5 Articles Resembling Article 100
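The following sketch shows one way Algorithm 1 could be expressed with the Neo4j Python driver; it assumes the Graph Data Science (GDS) library is installed so that the gds.similarity.cosine() function is available, and the connection details are placeholders.

# Hedged sketch of a top-5 similar-article query (assumes the GDS library).
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

TOP5_SIMILAR = """
MATCH (target:Article {id: $article_id}), (other:Article)
WHERE other.id <> target.id
WITH other, gds.similarity.cosine(target.vector, other.vector) AS similarity
RETURN other.id AS article_id, similarity
ORDER BY similarity DESC
LIMIT 5
"""

with driver.session() as session:
    for record in session.run(TOP5_SIMILAR, article_id=100):
        print(record["article_id"], round(record["similarity"], 3))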
Computing the cosine similarity between two vectors typically has a complexity of O(n), where n is the dimensionality of the vectors. Given that we are comparing a given article to m articles, the complexity becomes O(m·n). Sorting the similarity scores to retrieve the top five has a complexity of O(m log m). Therefore, the overall complexity is O(m·n + m log m).
2. Article sentiment analysis
In situations necessitating sentiment extraction for a specific article, or retrieval of articles sharing the same sentiment, the model retrieves the sentiment pre-recorded in the article node within the KG. This retrieval is achieved by executing a MATCH Cypher query based on the article’s unique identifier, as delineated in Algorithm 2.
Algorithm 2: Sentiment Extraction for Article with article_id 100
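A sketch of such a MATCH query, wrapped in the Neo4j Python driver, might look as follows; the connection details are placeholders, and the sentiment property follows the node schema described in Section 5.1.2.

# Hedged sketch: read back the pre-computed sentiment of one article node.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

SENTIMENT_QUERY = """
MATCH (a:Article {id: $article_id})
RETURN a.id AS article_id, a.sentiment AS sentiment
"""

with driver.session() as session:
    result = session.run(SENTIMENT_QUERY, article_id=100).single()
    print(result["article_id"], result["sentiment"])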
The complexity here is dominated by the search operation. Finding a node by its unique identifier in a well-indexed graph database like Neo4j is effectively a constant-time, O(1), operation.
3. Article topic prediction
Articles devoid of any designated topic can be assigned one based on the topics of closely resembling articles. This is realized by comparing the given article with all others and assigning the topic of the most similar one. Algorithm 3 elucidates the related Cypher query.
Algorithm 3: Topic Inference for Article 100 from Its Analogous Article
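An illustrative version of this inference, again assuming the GDS cosine similarity function and the 0.97 similarity threshold discussed in the complexity analysis below, is sketched here with placeholder connection details.

# Hedged sketch: inherit the topic of the most similar article above the threshold.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

TOPIC_INFERENCE = """
MATCH (target:Article {id: $article_id}), (other:Article)-[:HAS_TOPIC]->(t:Topic)
WHERE other.id <> target.id
WITH t, gds.similarity.cosine(target.vector, other.vector) AS similarity
WHERE similarity > 0.97
RETURN t.name AS inferred_topic, similarity
ORDER BY similarity DESC
LIMIT 1
"""

with driver.session() as session:
    record = session.run(TOPIC_INFERENCE, article_id=100).single()
    print(record["inferred_topic"] if record else "no sufficiently similar article")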
Similar to the “Similar Article Finder”, the complexity of computing the cosine similarity across all articles is O(m·n). However, in this case, we also have a filtering operation, which selects articles with a similarity score greater than 0.97; the complexity of this filtering is O(m). Returning only one result (using LIMIT 1) does not substantially change the computational complexity, but it significantly improves the real-world performance of the query since it halts once a match is found. Therefore, the overall complexity in this case is O(m·n + m), which reduces to O(m·n).
5.2.2. Autonomous LLM Operations
LLMs extensively trained on diverse datasets demonstrate the ability to autonomously handle certain user queries without relying on external databases such as KGs. This autonomous capability is particularly evident in scenarios that require a deep understanding of text, summarization abilities, or responses that do not demand explicit verification against structured knowledge sources. For instance, in text summarization, the LLM applies its extensive training to distill and encapsulate essential points from articles independently of KG interactions. Additionally, the LLM efficiently addresses generic journalism queries, drawing on its comprehensive training across journalism-related content to provide insights on journalistic standards, writing styles, and media ethics. Moreover, the model excels in contextual interpretations, where it leverages its intrinsic understanding of relational data to interpret and respond to journalism-related inquiries based on the surrounding context without the need to extract explicit facts.
5.3. Architecture Evaluation for Selective Tasks
We performed an experimental evaluation of the proposed architecture on the AI NewsHub dataset, benchmarking our solution against two mainstream approaches: Approach A, a baseline Llama-2 without a KG, and Approach B, an LLM combined with a vector database, implemented using the proprietary Microsoft Azure GPT-3.5 service with an off-the-shelf retrieval-augmented generation (RAG) pipeline employing hybrid search.
The accuracy of each approach was evaluated on two applications: contextual information retrieval and query-based knowledge extraction. Accuracy was calculated by comparing the retrieved information against a manually curated ground truth dataset covering 50 news articles.
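As a simple illustration of this metric, the following Python function scores a set of retrieved answers against the curated ground truth; exact string matching is an assumption made here for brevity, and the manual evaluation may have applied a more nuanced matching criterion.

# Illustrative accuracy computation: fraction of queries whose retrieved
# answer matches the curated ground truth (exact match is an assumption).
def retrieval_accuracy(predictions: list[str], ground_truth: list[str]) -> float:
    correct = sum(p.strip().lower() == g.strip().lower()
                  for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)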
Table 3 demonstrates the superior performance of the proposed architecture over existing approaches in terms of the accuracy of information retrieval and contextual understanding.
6. Discussions
Conversational AI, while advancing rapidly, faces hurdles in balancing linguistic depth with accurate information representation. As highlighted in our comprehensive review through the Large Language Model Explorer (LLMXplorer), which provides a systematic overview of numerous LLMs, linguistic proficiency often comes at the expense of transparency. Knowledge graphs, while offering factual precision, might fall short of mimicking human conversational fluidity.
The evolution and integration of LLMs in the digital environment necessitate a comprehensive understanding of their multifaceted implications. Open-source models promote AI democratization, paving the way for innovation. Concurrently, the challenges presented by algorithmic bias and the potential widening of digital divides underscore the importance of interdisciplinary collaboration among technologists, policy-makers, and end-users. The clarification of legal frameworks, especially in data protection and intellectual property, becomes essential given LLMs’ capabilities. Additionally, the anthropomorphic attributes of these systems elevate concerns about user over-reliance and misplaced trust. Achieving the optimal utility of LLMs requires a concerted effort to advance the technology, which underscores their vast potential; it also highlights the need for architectures that can bridge the aforementioned gaps and, in response, address ethical, legal, and psychological dimensions.
Through our exhaustive applied analysis of the practical use cases, challenges, and limitations of LLMs across industries, our work introduces a functional solution architecture that uniquely integrates KGs and LLMs. This system not only stands out in its linguistic capabilities but also ensures factual consistency. With the incorporation of RBAC, we further the cause of data security by restricting users to role-specific information and thereby fostering trust for real-world and industry-wide use cases.
The application domain of media and journalism, exemplified using rich data from the real-world product (AI NewsHub) platform, serves as both a case study and a validation point. It is a testament to the architecture’s robustness and efficiency. Notably, the adaptability of the proposed architecture means it can seamlessly cater to a myriad of use cases and applications, emphasizing its cross-industry relevance.
The proposed architecture underscores its significance through several pivotal design choices. It employs a specialized large language model (LLM), Llama-2-Chat, which is fine-tuned for dialogue, ensuring enhanced conversational accuracy and quality. The use of Neo4j for knowledge graph navigation leverages its open-source nature, widespread industry adoption, and proficiency in navigating knowledge structures, facilitated by the efficient Cypher query language. A Cypher validation layer is strategically positioned between the LLM and Neo4j to validate queries, mitigating potential unintended data access and enhancing system security. Additionally, the architecture incorporates a feedback mechanism that allows for the iterative refinement of system responses and continuous learning of the LLM. Moreover, the system adopts specific algorithms for tasks such as article similarity detection, sentiment analysis, and topic prediction, which contribute to its transparency and maintainability.
Our solution architecture presents critical advancements tailored to address contemporary challenges in conversational AI. Through the seamless amalgamation of KGs and LLMs, the design strikes an equilibrium between linguistic sophistication and factual veracity, mitigating the shortcomings inherent in each model on its own. The employment of the Llama-2-Chat LLM enhances conversational fidelity, elevating the user interaction experience. By harnessing Neo4j for knowledge graph navigation, the architecture not only optimizes data retrieval but also exhibits adaptability to evolving data paradigms. The use of RBAC provides further guardrails for enhanced data security. Additionally, an iterative feedback mechanism ensures the system’s alignment with dynamic user needs. Our methodological stance, demonstrated through a tailored algorithmic approach, enhances transparency and mitigates the “black box” dilemma associated with deep learning-based models, including LLMs. In essence, our architecture epitomizes a forward-thinking approach, synergizing linguistic depth with factual accuracy, all while underscoring user trust and systemic resilience.
6.1. Research Limitations
While our architecture has been rigorously tested and validated within the media and journalism context using the AI NewsHub platform, its performance and applicability in other industries remain to be explored. This could potentially limit the generalizability of our findings.
The employment of Llama-2-Chat demonstrates promising conversational capabilities. However, its adaptability and performance across diverse linguistic and cultural scenarios are yet to be thoroughly evaluated.
Furthermore, the iterative refinement enabled by our feedback loop is heavily dependent on user engagement and the quality of their input. This introduces the potential risk of bias or skewed learning if not carefully curated.
Lastly, our study focuses predominantly on the Neo4j database for navigating the KG. The exploration of benefits or challenges associated with alternative knowledge graph databases is beyond this research’s scope but warrants attention in future investigations.
6.2. Future Implications
The converging point of linguistic depth and factual accuracy, as epitomized by our system, is a promising indication of where conversational AI is headed. It sets a precedent for building architectures that are not only sophisticated in their response generation but also trustworthy in the information they convey. By addressing key challenges and providing tangible solutions, this work paves the way for future technologies that prioritize both efficiency and trustworthiness. Future endeavors could focus on expanding the architecture’s cross-domain adaptability, evaluating diverse LLMs, and refining feedback mechanisms in tandem with expanding knowledge graph databases.