5. Solution Architecture for Privacy-Aware and Trustworthy Conversational AI
With the advancement of conversational AI platforms powered by LLMs, there arises a need to address issues of explainability, data privacy, and the ethical use of data to ensure their safe use in industry and practical applications. LLMs, despite their significant natural language understanding capabilities, often operate as black-box models, making their decisions difficult to interpret. Integrating knowledge graphs (KGs) [46] with LLMs offers a solution to these challenges by coupling the structured knowledge representation of KGs with the linguistic proficiency of LLMs. Furthermore, the incorporation of role-based access control (RBAC) into the architecture ensures that access to the AI system aligns with organizational policies and is granted only to authorized roles.
The integration of key components such as LLMs, KGs, and RBAC systems provides distinct advantages in the development and application of conversational AI platforms.
LLMs are foundational to conversational platforms, offering deep linguistic understanding that facilitates nuanced interactions. These models are capable of continuous adaptation, refining their responses based on ongoing user interactions, which enhances the relevance and accuracy of their output over time.
KGs deliver structured and validated domain-specific knowledge, significantly enhancing the system’s explainability. They do this by tracing the origins of information, which not only clarifies the decision-making process but also complements LLMs by filling gaps in domain-specific expertise. This synergy between LLMs and KGs is crucial for applications requiring a high degree of accuracy and reliability in specialized fields.
RBAC ensures data privacy and aligns AI system usage with organizational policies and compliance mandates through controlled access mechanisms. This methodical approach to managing data access significantly advances the security of AI systems, providing a structured framework that supports secure and efficient operations across various levels of user engagement.
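To make the access-control notion concrete, the following is a minimal Python sketch of a role-to-permission check; the role names, permission strings, and mapping are illustrative assumptions rather than the policy model of any particular deployment.

# Minimal RBAC sketch. Roles and permission names below are hypothetical
# examples; a real deployment would load these from organizational policy.
ROLE_PERMISSIONS = {
    "editor":   {"read_articles", "read_sentiment", "run_topic_prediction"},
    "reporter": {"read_articles", "read_sentiment"},
    "guest":    {"read_articles"},
}

def is_authorized(role: str, required_permission: str) -> bool:
    """Return True only if the user's role grants the requested capability."""
    return required_permission in ROLE_PERMISSIONS.get(role, set())

# Example: a reporter may read sentiment but not trigger topic prediction.
assert is_authorized("reporter", "read_sentiment")
assert not is_authorized("reporter", "run_topic_prediction")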
Several architectural paradigms can merge KGs and LLMs. As detailed by Shirui Pan et al. in [47], the primary techniques include KG-enhanced LLMs, LLM-augmented KGs, and a synergistic LLMs+KGs approach. Our proposition of a closed-loop architecture is in line with this synergistic design, underscoring the advantages of combining LLMs and KGs.
5.1. System Components and Design Choices
To elucidate the practical implementation of our architectural design, we anchor our exposition in the media and journalism domain, specifically leveraging news data aggregated from the AI NewsHub platform (https://ainewshub.ie, accessed on 2 June 2024).
5.1.1. AI NewsHub Dataset Description
The AI NewsHub dataset is a systematically aggregated collection of news articles, procured daily from web crawling and search engines, and stored within a relational database system. Each record within this dataset represents an article and encapsulates the following attributes: identifier (ID), title, article content, published date, publisher name, and associated country of origin. Upon acquisition, articles undergo an analytical phase wherein they are classified based on their relevant topics, affiliated industry sectors, and originating publishers’ categories.
5.1.2. Construction of the AI NewsHub-Based KG
The KGs derived from the AI NewsHub dataset comprise two primary node classes: articles and topics. The “article” node includes attributes such as ID, content, sentiment analysis results, and a multi-dimensional vector array. The ID and content are extracted directly from the dataset, sentiment is derived using the VADER sentiment analysis tool [48], and the vector array is generated using the FastText library [49].
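As an illustration of how these node properties can be derived, the following Python sketch combines the VADER analyzer with a pre-trained FastText model; the model file name, sentiment cut-offs, and property names are assumptions made for demonstration, not the exact implementation used in our pipeline.

# Hedged sketch: deriving the 'sentiment' and 'vector' properties of an article node.
from vaderSentiment.vaderSentiment import SentimentIntensityAnalyzer
import fasttext

analyzer = SentimentIntensityAnalyzer()
embedder = fasttext.load_model("cc.en.300.bin")  # assumed pre-trained English FastText model file

def article_properties(article_id: int, content: str) -> dict:
    # Map VADER's compound score to a coarse label (illustrative cut-offs).
    compound = analyzer.polarity_scores(content)["compound"]
    if compound >= 0.05:
        sentiment = "positive"
    elif compound <= -0.05:
        sentiment = "negative"
    else:
        sentiment = "neutral"
    # FastText sentence vector over the article content; newlines are removed
    # because get_sentence_vector expects a single line of text.
    vector = embedder.get_sentence_vector(content.replace("\n", " ")).tolist()
    return {"id": article_id, "content": content, "sentiment": sentiment, "vector": vector}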
Each “article” node establishes a directed relationship with a “topic” node if it is categorized under a specific topic. The “topic” node contains the topic ID and name, extracted from the topic table in the dataset. The KG is managed using Neo4j [50], with queries structured in Cypher [51].
The data processing involved several key steps:
Extraction: Articles were parsed to extract relevant content using advanced natural language processing (NLP) techniques. The dataset included topic tags and industry sectors for AI news articles.
Transformation: The extracted content was transformed into a structured format suitable for graph representation, involving tagging and categorizing key entities and topics.
Data cleaning: Ensuring the quality and accuracy of the KG was crucial. This included the following:
– Deduplication: Identifying and merging duplicate nodes and relationships to avoid redundancy and ensure uniqueness.
– Normalization: Standardizing entity names and topics for consistency. Topics and industry sectors were mapped to each article, with relevant tags available on the AI NewsHub web portal at www.ainewshub.ie (accessed on 2 June 2024).
– Validation: Cross-referencing data with reliable sources to verify accuracy before adding them to the KG, ensuring reliability.
Loading: The structured data were loaded into the Neo4j database, creating nodes and relationships according to the defined schema.
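A minimal loading sketch using the official Neo4j Python driver is shown below; the connection details and the shape of each input record are illustrative assumptions, while the Article and Topic labels and the HAS_TOPIC relationship follow the schema described above.

# Hedged loading sketch with the Neo4j Python driver (placeholder credentials).
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

# MERGE keeps the load idempotent, which supports the deduplication step above.
LOAD_ARTICLE = """
MERGE (a:Article {id: $id})
SET a.content = $content, a.sentiment = $sentiment, a.vector = $vector
MERGE (t:Topic {id: $topic_id})
SET t.name = $topic_name
MERGE (a)-[:HAS_TOPIC]->(t)
"""

def load_article(record: dict) -> None:
    # 'record' is assumed to carry id, content, sentiment, vector, topic_id, topic_name.
    with driver.session() as session:
        session.run(LOAD_ARTICLE, **record)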
By the end of the development phase, the KG comprised approximately 13,000 nodes and 26,000 relationships, categorized into articles, topics, entities, and relationships such as “HAS_TOPIC”. A visual representation of a KG node is also shown in Figure A3.
5.1.3. Llama-2 LLM
In July 2023, Meta introduced the Llama-2 series [52], an advancement in its large language model lineup. The models in this series range from 7 billion to 70 billion parameters. Compared to the preceding Llama-1 models, Llama-2 variants were trained on an expanded dataset with a 40% increase in token count and an extended context length of 4096 tokens. The 70B model incorporates the grouped-query attention mechanism [53], which is designed for efficient inference.
A specialized variant within the Llama-2 series is Llama-2-Chat, which is fine-tuned for dialogue applications using reinforcement learning from human feedback (RLHF). Benchmark evaluations indicate that the Llama-2-Chat model offers improvements in areas such as user assistance and safety. Performance metrics suggest that this model’s capabilities align closely with those of ChatGPT, as judged by human evaluators. Additionally, the 70B version of Llama-2 has been recognized on Hugging Face’s Open LLM Leaderboard, exhibiting strong performance on several benchmarks, including but not limited to ARC [54], HellaSwag [55], MMLU [56], and TruthfulQA (MC) [57].
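For illustration, a Llama-2-Chat checkpoint can be loaded through the Hugging Face Transformers library roughly as sketched below; the checkpoint name, prompt, and generation settings are assumptions, and access to the weights requires accepting Meta’s license terms.

# Hedged sketch: loading a Llama-2-Chat checkpoint for dialogue-style generation.
from transformers import AutoTokenizer, AutoModelForCausalLM

model_name = "meta-llama/Llama-2-7b-chat-hf"  # assumed gated checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

prompt = "Summarise the key points of the following news article: ..."
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))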
With the details of the three main components provided, we will now describe the proposed system architecture in the next section.
5.2. Architecture Workflow
Figure 10 and Figure 11 present the architectural workflow of our proposed system. The former offers a general overview, elucidating the synergistic connection between the three components; the latter provides a detailed view of the intricate interactions between the KG, the LLM, and the RBAC service.
The detailed sequence of operations in the system is described as follows:
Step 1: The user (U) communicates a specific request to the RBAC service (S).
Step 2: The RBAC service (S) evaluates the permissions associated with the user and forwards the request to the access control (AC) component, which determines the user’s data access rights. If access is granted, the process proceeds to Step 3; if denied, it transitions to Step 2.2.
Step 3: The prompt analysis module refines and refactors the user’s prompt (if necessary) and identifies the key capabilities required to compose an appropriate response. In our journalism use case, these capabilities include natural language understanding with generic response generation, as well as specialized capabilities such as similar article finding, sentiment analysis, fact-checking, and prediction of article topics and relevant industry sectors.
Step 4: The Llama-2 LLM processes the user’s request based on the identified capabilities. If a generic response is required, the LLM responds to the user directly (Step 4.1). If specialized features from the KG are required, the process moves to Step 4.2.
Step 5: Llama-2 generates or invokes relevant Cypher instructions for Neo4j based on the required capabilities.
Step 6: A Cypher validation layer (CVL) ensures the integrity and safety of these instructions. Validated queries are executed on the Neo4j knowledge graph.
Step 7: KG processes the queries, extracting the pertinent data. An error handling (EH) mechanism inspects the extracted insights for anomalies.
Step 8: Error-free insights are compiled and returned to the LLM.
Steps 9 and 10: LLM formats the insights for user-friendly presentation and provides a response to the user. The user (U) receives the curated data, ensuring only permitted information is accessed. Users have the option to offer feedback through a feedback loop (FB), which might guide the LLM’s subsequent interactions.
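The control flow of Steps 1–10 can be summarized by the following Python sketch; the helper callables (rbac_check, analyse_prompt, llm_generate, validate_cypher, run_on_kg) are hypothetical names introduced purely to make the branching explicit and do not correspond to the APIs of the actual components.

# Condensed control-flow sketch of the workflow; all helpers are passed in
# as callables so the snippet stays self-contained and purely illustrative.
def handle_request(user, prompt, *, rbac_check, analyse_prompt, llm_generate,
                   validate_cypher, run_on_kg):
    if not rbac_check(user, prompt):                      # Steps 1-2: RBAC and access control
        return "Access denied for your role."             # Step 2.2: request rejected
    capabilities = analyse_prompt(prompt)                 # Step 3: prompt analysis module
    if "kg" not in capabilities:                          # Step 4.1: generic LLM answer
        return llm_generate(prompt)
    cypher = llm_generate(prompt, mode="cypher")          # Steps 4.2-5: Cypher generation
    if not validate_cypher(cypher):                       # Step 6: Cypher validation layer (CVL)
        return "Query rejected by the validation layer."
    insights, error = run_on_kg(cypher)                   # Steps 7-8: KG query and error handling (EH)
    if error:
        return "The knowledge graph could not answer this request."
    return llm_generate(prompt, context=insights)         # Steps 9-10: format insights and respond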
Based on the presented workflow, we now elucidate the decision-making process of the large language model (LLM) in two primary scenarios: one where the LLM interacts with the knowledge graph (KG) to deduce answers, and another where the LLM operates autonomously.
5.2.1. Interactions between LLM and KG
The LLM’s reliance on the KG is made manifest in the following application scenarios:
1. Similar article finder
For implementing recommendation systems in media and journalism services, it is imperative to identify articles akin to a given one. Using the cosine similarity metric, we measure the resemblance between the focal article’s vector and those of all other articles. The articles with the highest similarity scores are then selected, as represented in Algorithm 1.
Algorithm 1: Identify Top 5 Articles Resembling Article 100
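The following sketch shows one way Algorithm 1 could be expressed with the Neo4j Python driver; it assumes the Graph Data Science (GDS) library is installed so that the gds.similarity.cosine() function is available, and the connection details are placeholders.

# Hedged sketch of a top-5 similar-article query (assumes the GDS library).
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

TOP5_SIMILAR = """
MATCH (target:Article {id: $article_id}), (other:Article)
WHERE other.id <> target.id
WITH other, gds.similarity.cosine(target.vector, other.vector) AS similarity
RETURN other.id AS article_id, similarity
ORDER BY similarity DESC
LIMIT 5
"""

with driver.session() as session:
    for record in session.run(TOP5_SIMILAR, article_id=100):
        print(record["article_id"], round(record["similarity"], 3))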
Computing the cosine similarity between two vectors typically has a complexity of O(n), where n is the dimensionality of the vectors. Given that we are comparing a given article to m articles, the complexity becomes O(m·n). Sorting the similarity scores to retrieve the top five has a complexity of O(m log m). Therefore, the overall complexity is O(m·n + m log m).
2. Article sentiment analysis
In situations necessitating sentiment extraction for a specific article, or retrieval of articles sharing the same sentiment, the model retrieves the sentiment pre-recorded in the article node within the KG. This retrieval is achieved by executing a MATCH Cypher query based on the article’s unique identifier, as delineated in Algorithm 2.
Algorithm 2: Sentiment Extraction for Article with article_id 100
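A sketch of such a MATCH query, wrapped in the Neo4j Python driver, might look as follows; the connection details are placeholders, and the sentiment property follows the node schema described in Section 5.1.2.

# Hedged sketch: read back the pre-computed sentiment of one article node.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

SENTIMENT_QUERY = """
MATCH (a:Article {id: $article_id})
RETURN a.id AS article_id, a.sentiment AS sentiment
"""

with driver.session() as session:
    result = session.run(SENTIMENT_QUERY, article_id=100).single()
    print(result["article_id"], result["sentiment"])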
The complexity here is dominated by the search operation. Finding a node by its unique identifier in a well-indexed graph database like Neo4j is effectively a constant-time, O(1), operation.
3. Article topic prediction
Articles devoid of any designated topic can be assigned one based on the topics of closely resembling articles. This is realized by comparing the given article with all others and assigning the topic of the most similar one. Algorithm 3 elucidates the related Cypher query.
Algorithm 3: Topic Inference for Article 100 from Its Analogous Article
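An illustrative version of this inference, again assuming the GDS cosine similarity function and the 0.97 similarity threshold discussed in the complexity analysis below, is sketched here with placeholder connection details.

# Hedged sketch: inherit the topic of the most similar article above the threshold.
from neo4j import GraphDatabase

driver = GraphDatabase.driver("bolt://localhost:7687", auth=("neo4j", "password"))

TOPIC_INFERENCE = """
MATCH (target:Article {id: $article_id}), (other:Article)-[:HAS_TOPIC]->(t:Topic)
WHERE other.id <> target.id
WITH t, gds.similarity.cosine(target.vector, other.vector) AS similarity
WHERE similarity > 0.97
RETURN t.name AS inferred_topic, similarity
ORDER BY similarity DESC
LIMIT 1
"""

with driver.session() as session:
    record = session.run(TOPIC_INFERENCE, article_id=100).single()
    print(record["inferred_topic"] if record else "no sufficiently similar article")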
Similar to the “Similar Article Finder”, the complexity of computing the cosine similarity across all articles is O(m·n). However, in this case, we also have a filtering operation, which selects articles with a similarity score greater than 0.97; the complexity of this filtering is O(m). Returning only one result (using LIMIT 1) does not substantially change the computational complexity, but it significantly improves the real-world performance of the query since it halts once a match is found. Therefore, the overall complexity in this case is O(m·n + m), which reduces to O(m·n).
5.2.2. Autonomous LLM Operations
LLMs extensively trained on diverse datasets demonstrate the ability to autonomously handle certain user queries without relying on external databases such as KGs. This autonomous capability is particularly evident in scenarios that require a deep understanding of text, summarization abilities, or responses that do not demand explicit verification against structured knowledge sources. For instance, in text summarization, the LLM applies its extensive training to distill and encapsulate essential points from articles independently of KG interactions. Additionally, the LLM efficiently addresses generic journalism queries, drawing on its comprehensive training across journalism-related content to provide insights on journalistic standards, writing styles, and media ethics. Moreover, the model excels in contextual interpretations, where it leverages its intrinsic understanding of relational data to interpret and respond to journalism-related inquiries based on the surrounding context without the need to extract explicit facts.
5.3. Architecture Evaluation for Selective Tasks
We performed an experimental evaluation of the proposed architecture on the AI NewsHub dataset, benchmarking our solution against two mainstream approaches: Approach A, a baseline Llama-2 without a KG, and Approach B, an LLM combined with a vector database, implemented using the proprietary Microsoft Azure GPT-3.5 service with an off-the-shelf retrieval-augmented generation (RAG) pipeline employing hybrid search.
The accuracy of each approach was evaluated on two applications: contextual information retrieval and query-based knowledge extraction. Accuracy was calculated by comparing the retrieved information against a manually curated ground truth dataset covering 50 news articles.
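As a simple illustration of this metric, the following Python function scores a set of retrieved answers against the curated ground truth; exact string matching is an assumption made here for brevity, and the manual evaluation may have applied a more nuanced matching criterion.

# Illustrative accuracy computation: fraction of queries whose retrieved
# answer matches the curated ground truth (exact match is an assumption).
def retrieval_accuracy(predictions: list[str], ground_truth: list[str]) -> float:
    correct = sum(p.strip().lower() == g.strip().lower()
                  for p, g in zip(predictions, ground_truth))
    return correct / len(ground_truth)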
Table 3 demonstrates the superior performance of the proposed architecture over existing approaches in terms of the accuracy of information retrieval and contextual understanding.
6. Discussions
Conversational AI, while advancing rapidly, faces hurdles in balancing linguistic depth with accurate information representation. As highlighted in our comprehensive review through the Large Language Model Explorer (LLMXplorer), which provides a systematic overview of numerous LLMs, linguistic proficiency often comes at the expense of transparency. Knowledge graphs, while offering factual precision, might fall short of mimicking human conversational fluidity.
The evolution and integration of LLMs in the digital environment necessitate a comprehensive understanding of their multifaceted implications. Open-source models promote AI democratization, paving the way for innovation. Concurrently, the challenges presented by algorithmic bias and the potential widening of digital divides underscore the importance of interdisciplinary collaboration among technologists, policy-makers, and end-users. The clarification of legal frameworks, especially in data protection and intellectual property, becomes essential given LLMs’ capabilities. Additionally, the anthropomorphic attributes of these systems elevate concerns about user over-reliance and misplaced trust. Achieving the optimal utility of LLMs requires a concerted effort to advance the technology, which underscores their vast potential; it also highlights the need for architectures that can bridge the aforementioned gaps and, in response, address ethical, legal, and psychological dimensions.
Through our exhaustive applied analysis of the practical use cases, challenges, and limitations of LLMs across industries, our work introduces a functional solution architecture that uniquely integrates KGs and LLMs. This system not only stands out in its linguistic capabilities but also ensures factual consistency. With the incorporation of RBAC, we further the cause of data security by restricting users to role-specific information and thereby fostering trust for real-world and industry-wide use cases.
The application domain of media and journalism, exemplified using rich data from the real-world product (AI NewsHub) platform, serves as both a case study and a validation point. It is a testament to the architecture’s robustness and efficiency. Notably, the adaptability of the proposed architecture means it can seamlessly cater to a myriad of use cases and applications, emphasizing its cross-industry relevance.
The proposed architecture underscores its significance through several pivotal design choices. It employs a specialized large language model (LLM), Llama-2-Chat, which is fine-tuned for dialogue, ensuring enhanced conversational accuracy and quality. The use of Neo4j for knowledge graph navigation leverages its open-source nature, widespread industry adoption, and proficiency in navigating knowledge structures, facilitated by the efficient Cypher query language. A Cypher validation layer is strategically positioned between the LLM and Neo4j to validate queries, mitigating potential unintended data access and enhancing system security. Additionally, the architecture incorporates a feedback mechanism that allows for the iterative refinement of system responses and continuous learning of the LLM. Moreover, the system adopts specific algorithms for tasks such as article similarity detection, sentiment analysis, and topic prediction, which contribute to its transparency and maintainability.
Our solution architecture presents critical advancements tailored to address contemporary challenges in conversational AI. Through the seamless amalgamation of KGs and LLMs, the design strikes an equilibrium between linguistic sophistication and factual veracity, mitigating the shortcomings inherent in each model on its own. The employment of the Llama-2-Chat LLM enhances conversational fidelity, elevating the user interaction experience. By harnessing Neo4j for knowledge graph navigation, the architecture not only optimizes data retrieval but also exhibits adaptability to evolving data paradigms. The use of RBAC provides further guardrails for enhanced data security. Additionally, an iterative feedback mechanism ensures the system’s alignment with dynamic user needs. Our methodological stance, demonstrated through a tailored algorithmic approach, enhances transparency and mitigates the “black box” dilemma associated with deep learning-based models, including LLMs. In essence, our architecture epitomizes a forward-thinking approach, synergizing linguistic depth with factual accuracy, all while underscoring user trust and systemic resilience.
6.1. Research Limitations
While our architecture has been rigorously tested and validated within the media and journalism context using the AI NewsHub platform, its performance and applicability in other industries remain to be explored. This could potentially limit the generalizability of our findings.
The employment of Llama-2-Chat demonstrates promising conversational capabilities. However, its adaptability and performance across diverse linguistic and cultural scenarios are yet to be thoroughly evaluated.
Furthermore, the iterative refinement enabled by our feedback loop is heavily dependent on user engagement and the quality of their input. This introduces the potential risk of bias or skewed learning if not carefully curated.
Lastly, our study focuses predominantly on the Neo4j database for navigating the KG. The exploration of benefits or challenges associated with alternative knowledge graph databases is beyond this research’s scope but warrants attention in future investigations.
6.2. Future Implications
The converging point of linguistic depth and factual accuracy, as epitomized by our system, is a promising indication of where conversational AI is headed. It sets a precedent for building architectures that are not only sophisticated in their response generation but also trustworthy in the information they convey. By addressing key challenges and providing tangible solutions, this work paves the way for future technologies that prioritize both efficiency and trustworthiness. Future endeavors could focus on expanding the architecture’s cross-domain adaptability, evaluating diverse LLMs, and refining feedback mechanisms in tandem with expanding knowledge graph databases.