Digital Sentinels and Antagonists: The Dual Nature of Chatbots in Cybersecurity
Abstract
1. Introduction
1.1. What Are Chatbots?
1.2. The Turing Test and the Inception of Chatbots
1.3. Modern Chatbots and Their Uses
1.4. Contributions of This Article
- We present a comprehensive history of LLMs and chatbots, with a focus on how key developments in LLMs play a significant part in the functionality of chatbots. It is worth noting that most previous survey works (e.g., [9,23]) mainly presented a history of chatbots but neither covered cybersecurity aspects in sufficient depth nor properly explored how LLMs and chatbots are interconnected in the technology and application domains.
- We provide a comprehensive literature review exploring attacks on chatbots, attacks using chatbots, defenses for chatbots, and defenses using chatbots.
- We offer experimental analyses of several offensive applications of chatbots, such as malware generation, phishing attacks, and buffer overflow attacks.
- We provide some suggestions for enhancing prior research efforts and advancing the technologies addressed in this article.
- We discuss open issues and potential future research directions on chatbots, LLMs, and the duality of chatbots.
- This document is intended to supplement prior survey papers by incorporating the latest advancements, fostering a comprehensive grasp of the chatbot and LLM domains for the reader. Moreover, it serves as a foundation for the development of innovative approaches to address the duality of chatbots through both offensive and defensive strategies.
2. History of Chatbots and Large Language Models
2.1. 1940s: First Mention of n-Gram Models
2.2. 1950s: Origins of and Early Developments in Natural Language Processing
2.3. 1960s: ELIZA
2.4. 1970s: PARRY
2.5. 1980s: Jabberwacky, Increase in Large Language Model Computational Power, and Small Language Models
2.6. 1990s: Statistical Language Models, Dr. Sbaitso, and ALICE
2.7. 2000s: Neural Language Models
2.8. 2010s: Word Embeddings, Neural Language Models Advances, Transformer Model, Pretrained Models, Watson, and the GPT Model Family
2.9. 2020s: Microsoft Copilot, ChatGPT, LLaMA, Gemini, Claude, Ernie, Grok, and General-Purpose Models
3. Functionality of Large Language Models and Chatbots
3.1. Functionality of Large Language Models
3.1.1. General Large Language Model Architectures
LLM | Year | Architecture | Parameters | Strengths | Weaknesses |
---|---|---|---|---|---|
GPT | 2018 | Transformer decoder | 110 M | Scalability, transfer learning. | Limited understanding of context, vulnerable to bias. |
GPT-2 | 2019 | Transformer decoder | 1.5 B | Enhanced text generation, enhanced context understanding, enhanced transfer learning. | Limited understanding of context, vulnerable to bias. |
GPT-3 | 2020 | Transformer decoder | 175 B | Zero-shot and few-shot learning, fine-tuning flexibility. | Resource intensive, ethical issues. |
GPT-3.5 | 2022 | Transformer decoder | 175 B | Zero-shot and few-shot learning, fine-tuning flexibility. | Resource intensive, ethical issues. |
GPT-4 | 2023 | Transformer decoder | 1.7 T | Cost-effective, scalable, easy to personalize, multilingual. | Very biased, cannot check accuracy of statements, ethical issues. |
LaMDA | 2020 | Basic transformer | 137 B | Large-scale knowledge, text generation, natural language understanding. | Can generate biased answers, can generate inaccurate answers, cannot understand common sense well. |
BERT (Base) | 2018 | Transformer encoder | 110 M | Bidirectional context, transfer learning, contextual embeddings. | Computational resources, token limitations, fine-tuning requirements. |
LLaMA | 2023 | Basic transformer | 7 B, 13 B, 33 B, 65 B | Cost-effective, strong performance, safe. | Limited ability to produce code, can generate biased answers, limited availability. |
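Every model in the table above builds on the transformer’s scaled dot-product self-attention (Section 3.1.2). As a minimal sketch of that mechanism, the NumPy snippet below computes single-head attention over random toy inputs; it deliberately omits the multi-head projections, causal masking (used by decoder-only models such as the GPT family), residual connections, and layer normalization that production LLMs add.

```python
# Minimal single-head scaled dot-product self-attention (illustrative only).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) learned projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # pairwise token affinities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V                       # context-mixed token representations

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                           # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)            # (4, 8)
```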
3.1.2. Transformer Architecture
3.1.3. Large Language Model Training Process
3.1.4. Natural Language Processing (NLP) and Natural Language Understanding (NLU)
3.2. Functionality of Chatbots
3.2.1. Pattern-Matching Algorithm
3.2.2. Rule-Based Systems
3.2.3. General Chatbot Architecture
3.3. Technical Discussion of Well-Known LLM Families and Chatbots
3.3.1. Well-Known LLM Families
3.3.2. ELIZA
3.3.3. PARRY
3.3.4. Jabberwacky
3.3.5. Dr. Sbaitso
3.3.6. ALICE
3.3.7. Watson
3.3.8. Microsoft Copilot
3.3.9. ChatGPT
3.3.10. LLaMA
3.3.11. Gemini
3.3.12. Ernie
3.3.13. Grok
4. Applications and Societal Effects of Chatbots
4.1. Positive Applications and Societal Effects
4.2. Negative Applications and Societal Effects
5. Attacks on Chatbots and Attacks Using Chatbots
5.1. Attacks on Chatbots
Attack Name | Description | Related Work(s) |
---|---|---|
Homoglyph attack | Homoglyph attacks replace character(s) with visually similar characters to create functional, malicious links (see the detection sketch after this table). | PhishGAN is conditioned on non-homoglyph input text images to generate images of homoglyphs [68]. |
Jailbreaking attack(s) | Malicious prompts created by the adversary are given to a chatbot to instruct it to behave in a way its developer did not intend. | Malicious prompts are used to generate harmful content, such as phishing attacks and malware [22]. |
Prompt-injection attack | Prompt injection attacks are structurally similar to SQL injection attacks and use carefully crafted prompts to manipulate an LLM into performing the desired task. | A prompt containing a target and an injected task is given to an LLM. This prompt is manipulated so the LLM will perform the injected task [69]. |
Audio deepfake attack | Deepfake audio clips are created using machine learning to replicate voices for malicious purposes. | Seemingly benign audio files are used to synthesize voice samples to feed to voice assistants to execute privileged commands [70]. |
Adversarial voice samples attack | Malicious voice samples are crafted using tuning and reconstructed audio signals. | Extraction parameters are tuned until the voice recognition system cannot identify them, then converted back to the waveform of human speech. Such samples are used to fool voice recognition systems [71]. |
Automated social engineering | Automation is introduced to reduce human intervention in social engineering attacks, reducing costs and increasing effectiveness. | The adversary gives a bot parameters for the social engineering attack, and the bot executes the desired attack [72]. |
Adversarial examples | Inputs are given to the target model to cause it to deviate from normal behavior. | Adversarial examples can be either targeted or untargeted, leading to malicious outputs [73,74]. |
Adversarial reprogramming feedback attack | The adversary can reprogram a model to perform a desired task. | A single adversarial perturbation is added to all inputs to force a model to complete a desired task, even if it was not trained to do so [73]. |
Data poisoning attack | Occurs when an adversary injects malicious data into the training dataset. | Malicious data are injected into the training set and can cause a variety of model failures [75,76]. |
Backdoor attack | An adversary alters the training data and model processing. | These attacks manipulate training data, resulting in the adversary being able to embed a hidden backdoor [77]. |
Extraction attack | To reconstruct training data, a model is prompted with prefixes. | An adversary can prompt or query a language model with prefixes to extract individual training examples [78]. |
Membership inference attack | An adversary attempts to identify if a specific piece of data belongs to a model’s training dataset. | These attacks target a model’s training dataset, using inference to deduce its members [79]. |
Remote code execution | Arbitrary code on an app’s server is executed remotely via a prompt or series of prompts. | These attacks target LLMs that are integrated into web services and can compromise the LLM’s environment [73]. |
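To make the homoglyph row above concrete, the Python sketch below shows the defensive counterpart of the attack: mapping visually confusable characters to an ASCII “skeleton” and flagging look-alike domains. The CONFUSABLES map is a tiny assumed subset chosen for illustration; real detectors draw on the full Unicode confusables data.

```python
# Hypothetical homoglyph-spoof detector (toy confusables subset).
CONFUSABLES = {
    "а": "a",  # Cyrillic a (U+0430)
    "е": "e",  # Cyrillic ie (U+0435)
    "о": "o",  # Cyrillic o (U+043E)
    "р": "p",  # Cyrillic er (U+0440)
    "ѕ": "s",  # Cyrillic dze (U+0455)
    "1": "l",
    "0": "o",
}

def skeleton(domain: str) -> str:
    """Reduce a domain to its ASCII look-alike form."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in domain.lower())

def is_homoglyph_spoof(domain: str, trusted: set[str]) -> bool:
    """Flag domains that collide with a trusted domain only via confusables."""
    return domain not in trusted and skeleton(domain) in trusted

trusted = {"google.com", "paypal.com"}            # assumed allowlist
print(is_homoglyph_spoof("gооgle.com", trusted))  # True: Cyrillic "о" spoofs "o"
print(is_homoglyph_spoof("google.com", trusted))  # False: exact trusted match
```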
5.2. Attacks Using Chatbots
5.3. Some Representative Research Works on Attacks on Chatbots and Attacks Using Chatbots
6. Defenses for Chatbots and Defenses Using Chatbots
6.1. Defenses Using Chatbots
6.2. Defenses for Chatbots
6.3. Some Representative Research Works on Defenses for Chatbots and Defenses Using Chatbots
7. Limitations of LLMs (Chatbots) and Surveyed Case Studies (Works) in This Article
7.1. Limitations of Large Language Models and Chatbots
7.2. Some Comments on (Limitations of) the Surveyed Works in This Paper
8. Experimental Analysis
8.1. Malware Generation
8.2. Phishing Email Generation
8.3. Buffer Overflow Attack
8.4. Discussion of Results
9. Open Issues and Future Research Directions
9.1. Alignment Science in LLMs/Chatbots
9.2. Computational Issues of Jailbreaking
9.3. Hallucination Challenges in Chatbots and LLMs
9.4. Versatile Defenses: Perturbing Input Prompts
9.5. Advancing Moving Target Defense Strategies for LLMs and Chatbots
9.6. Erase-and-Check-like Frameworks
9.7. Next Generation of Defenses against Attacks
9.8. Large-Scale Evaluation of Chatbots and Large Language Models
10. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Term | Acronym |
---|---|
Artificial intelligence | AI |
Machine learning | ML |
Natural language processing | NLP |
Large language models | LLMs |
Language models | LMs |
Multilayer perceptrons | MLPs |
Pretrained language models | PLMs |
Reinforcement learning from human feedback | RLHF |
Natural language understanding | NLU |
Language model for dialogue applications | LaMDA |
Sequence-to-sequence model | Seq2Seq |
Artificial intelligence markup language | AIML |
In-context learning | ICL |
Man in the middle | MitM |
Do anything now | DAN |
Living-off-the-land binaries | LOLBins
Multistep jailbreaking prompt | MJP |
Attack success rate | ASR |
Blackbox generative model-based attack method | BGMAttack |
Clean accuracy | CACC |
Sentence perplexity | PPL |
Grammatical error numbers | GEN
Executable and linking format | ELF
Safety filter | SF |
SaFeRDialogues | SD |
Knowledge distillation | KD |
BlenderBot-small | BBs |
TwitterBot | TB |
Non-toxic to toxic | NT2T |
Non-toxic to non-toxic | NT2NT |
Toxic to toxic | T2T |
Non-toxic query | NTQ |
BlenderBot-large | BBl |
BlenderBot-medium | BBm |
Dialogue safety classifier | DSC |
Out-of-distribution | OOD |
Adversarial natural language inference | ANLI |
Natural language inference | NLI |
Personally identifying information | PII |
Higher-order spectral analysis | HOSA |
Toxic to non-toxic | T2NT |
Host-based intrusion detection system | HIDS |
Retrieval-augmented generation | RAG |
Terminal learning objectives | TLOs
References
- Littman, M.L.; Ajunwa, I.; Berger, G.; Boutilier, C.; Currie, M.; Doshi-Velez, F.; Hadfield, G.; Horowitz, M.C.; Isbell, C.; Kitano, H.; et al. Gathering Strength, Gathering Storms: The One Hundred Year Study on Artificial Intelligence (AI100); 2021 Study Panel Report; Stanford University: Stanford, CA, USA, 2021. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Meta, Multimodal Generative AI systems. Available online: https://ai.meta.com/tools/system-cards/multimodal-generative-ai-systems/ (accessed on 31 May 2024).
- Müller, V.C.; Bostrom, N. Future progress in artificial intelligence: A poll among experts. AI Matters 2014, 1, 9–11. [Google Scholar] [CrossRef]
- Questionnaire Experts Results: Future Progress in Artificial Intelligence. Available online: https://www.pt-ai.org/polls/experts (accessed on 31 May 2024).
- Korteling, J.H.; van de Boer-Visschedijk, G.C.; Blankendaal, R.A.; Boonekamp, R.C.; Eikelboom, A.R. Human-versus artificial intelligence. Front. Artif. Intell. 2021, 4, 622364. [Google Scholar] [CrossRef] [PubMed]
- Grace, K.; Salvatier, J.; Dafoe, A.; Zhang, B.; Evans, O. When will AI exceed human performance? Evidence from AI experts. J. Artif. Intell. Res. 2018, 62, 729–754. [Google Scholar] [CrossRef]
- Oracle. What Is a Chatbot? Available online: https://www.oracle.com/chatbots/what-is-a-chatbot/ (accessed on 31 May 2024).
- Adamopoulou, E.; Moussiades, L. Chatbots: History, technology, and applications. Mach. Learn. Appl. 2020, 2, 100006. [Google Scholar] [CrossRef]
- Turing, A. Computing Machinery and Intelligence (1950); Oxford University Press eBooks: Oxford, UK, 2004. [Google Scholar]
- Qammar, A.; Wang, H.; Ding, J.; Naouri, A.; Daneshmand, M.; Ning, H. Chatbots to ChatGPT in a cybersecurity space: Evolution, vulnerabilities, attacks, challenges, and future recommendations. arXiv 2023, arXiv:2306.09255. [Google Scholar]
- Weizenbaum, J. ELIZA—A computer program for the study of natural language communication between man and machine. Commun. ACM 1966, 9, 36–45. [Google Scholar] [CrossRef]
- GitHub Copilot Documentation, About GitHub Copilot—GitHub Docs. Available online: https://docs.github.com/en/copilot/about-github-copilot (accessed on 31 May 2024).
- Spataro, J. Introducing Microsoft 365 Copilot—Your Copilot for Work 2023. Available online: https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/ (accessed on 28 July 2024).
- Introducing ChatGPT. Available online: https://openai.com/blog/chatgpt (accessed on 31 May 2024).
- Pichai, S. An Important Next Step on our AI Journey 2023. Available online: https://blog.google/technology/ai/bard-google-ai-search-updates (accessed on 31 May 2024).
- Anthropic. Introducing Claude. Available online: www.anthropic.com/news/introducing-claude (accessed on 31 May 2024).
- Følstad, A.; Brandtzaeg, P.B.; Feltwell, T.; Law, E.L.; Tscheligi, M.; Luger, E.A. SIG: Chatbots for social good. In Proceedings of the Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems, Montreal, QC, Canada, 21–26 April 2018; pp. 1–4. [Google Scholar]
- Misischia, C.V.; Poecze, F.; Strauss, C. Chatbots in customer service: Their relevance and impact on service quality. Procedia Comput. Sci. 2022, 201, 421–428. [Google Scholar] [CrossRef]
- Zawacki-Richter, O.; Marín, V.I.; Bond, M.; Gouverneur, F. Systematic review of research on artificial intelligence applications in higher education–where are the educators? Int. J. Educ. Technol. High. Educ. 2019, 16, 1–27. [Google Scholar] [CrossRef]
- Reis, L.; Maier, C.; Mattke, J.; Weitzel, T. Chatbots in healthcare: Status quo, application scenarios for physicians and patients and future directions. Eur. Conf. Inf. Syst. 2020, 163, 1–13. [Google Scholar]
- Gupta, M.; Akiri, C.; Aryal, K.; Parker, E.; Praharaj, L. From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy. IEEE Access 2023, 11, 80218–80245. [Google Scholar]
- Zemčík, M.T. A brief history of chatbots. Destech Trans. Comput. Sci. Eng. 2019, 10, 1–5. [Google Scholar] [CrossRef] [PubMed]
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
- Garvin, P.L. The Georgetown-IBM Experiment of 1954: An Evaluation in Retrospect; Mouton: Berlin, Germany, 1967; pp. 46–56. [Google Scholar]
- Chomsky, N. Syntactic Structures; Mouton de Gruyter: Berlin, Germany, 2002. [Google Scholar]
- Foote, K.D. A Brief History of Natural Language Processing, DATAVERSITY. 2023. Available online: https://www.dataversity.net/a-brief-history-of-natural-language-processing-nlp/ (accessed on 31 May 2024).
- Fryer, L.; Carpenter, R. Bots as language learning tools. Lang. Learn. Technol. 2006, 10, 8–14. [Google Scholar]
- Foote, K.D. A Brief History of Large Language Models, DATAVERSITY. 2023. Available online: https://www.dataversity.net/a-brief-history-of-large-language-models (accessed on 31 May 2024).
- Mashette, N. Small Language Models (SLMS). Medium. 12 December 2023. Available online: https://medium.com/@nageshmashette32/small-language-models-slms-305597c9edf2 (accessed on 31 May 2024).
- Peng, Z.; Ma, X. A survey on construction and enhancement methods in service chatbots design. Ccf Trans. Pervasive Comput. Interact. 2019, 1, 204–223. [Google Scholar] [CrossRef]
- Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A survey of large language models. arXiv 2023, arXiv:2303.18223. [Google Scholar]
- Engati. Statistical Language Modeling. Available online: www.engati.com/glossary/statistical-language-modeling (accessed on 31 May 2024).
- Minaee, S.; Mikolov, T.; Nikzad, N.; Chenaghlu, M.; Socher, R.; Amatriain, X.; Gao, J. Large language models: A survey. arXiv 2024, arXiv:2402.06196. [Google Scholar]
- IBM Watson. Available online: https://www.ibm.com/watson (accessed on 31 May 2024).
- Bakarov, A. A survey of word embeddings evaluation methods. arXiv 2018, arXiv:1801.09536. [Google Scholar]
- Improving Language Understanding with Unsupervised Learning. Available online: https://openai.com/research/language-unsupervised (accessed on 31 May 2024).
- Better Language Models and Their Implications. Available online: https://openai.com/research/better-language-models (accessed on 31 May 2024).
- Manyika, J.; Hsiao, S. An overview of Bard: An early experiment with generative AI. Google Static Doc. 2023, 1–9. Available online: https://gemini.google/overview-gemini-app.pdf (accessed on 31 January 2024).
- Naveed, H.; Khan, A.U.; Qiu, S.; Saqib, M.; Anwar, S.; Usman, M.; Barnes, N.; Mian, A. A comprehensive overview of large language models. arXiv 2023, arXiv:2307.06435. [Google Scholar]
- Hadi, M.U.; Qureshi, R.; Shah, A.; Irfan, M.; Zafar, A.; Shaikh, M.B.; Akhtar, N.; Wu, J.; Mirjalili, S. Large language models: A comprehensive survey of its applications, challenges, limitations, and future prospects. Authorea Prepr. 2023, 1–45. [Google Scholar] [CrossRef]
- Ambika. Large Language Models (LLMs): A Brief History, Applications and Challenges. Available online: https://blog.gopenai.com/large-language-models-llms-a-brief-history-applications-challenges-c2fab10fa2e7 (accessed on 31 May 2024).
- Verma, A. Self-Attention Mechanism Transformers. Medium. 2023. Available online: https://medium.com/@averma9838/self-attention-mechanism-transformers-41d1afea46cf (accessed on 31 May 2024).
- Vaniukov, S. NLP vs LLM: A Comprehensive Guide to Understanding Key Differences. Medium. 2024. Available online: https://medium.com/@vaniukov.s/nlp-vs-llm-a-comprehensive-guide-to-understanding-key-differences-0358f6571910 (accessed on 31 May 2024).
- Bates, M. Models of natural language understanding. Proc. Natl. Acad. Sci. USA 1995, 92, 9977–9982. [Google Scholar] [CrossRef]
- Wu, T.; He, S.; Liu, J.; Sun, S.; Liu, K.; Han, Q.L.; Tang, Y. A brief overview of ChatGPT: The history, status quo and potential future development. IEEE Caa J. Autom. Sin. 2023, 10, 1122–1136. [Google Scholar] [CrossRef]
- Ahmad, N.A.; Che, M.H.; Zainal, A.; Abd Rauf, M.F.; Adnan, Z. Review of chatbots design techniques. Int. J. Comput. Appl. 2018, 181, 7–10. [Google Scholar]
- Devakunchari, R.; Agarwal, R.; Agarwal, E. A survey of Chatbot design techniques. Int. J. Eng. Adv. Technol. (IJEAT) 2019, 8, 35–39. [Google Scholar]
- Shawar, B.A.; Atwell, E. A Comparison between Alice and Elizabeth Chatbot Systems; University of Leeds, School of Computing Research Report: Leeds, UK, 2002; pp. 1–13. [Google Scholar]
- Mittal, M.; Battineni, G.; Singh, D.; Nagarwal, T.; Yadav, P. Web-based chatbot for frequently asked queries (FAQ) in hospitals. J. Taibah Univ. Med. Sci. 2021, 16, 740–746. [Google Scholar] [CrossRef] [PubMed]
- Thoppilan, R.; De Freitas, D.; Hall, J.; Shazeer, N.; Kulshreshtha, A.; Cheng, H.T.; Jin, A.; Bos, T.; Baker, L.; Du, Y.; et al. Lamda: Language models for dialog applications. arXiv 2022, arXiv:2201.08239. [Google Scholar]
- Zheng, X.; Zhang, C.; Woodland, P.C. Adapting GPT, GPT-2 and BERT language models for speech recognition. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Cartagena, Colombia, 13–17 December 2021; pp. 1–7. [Google Scholar]
- Ferrucci, D.; Brown, E.; Chu-Carroll, J.; Fan, J.; Gondek, D.; Kalyanpur, A.A.; Lally, A.; Murdock, J.W.; Nyberg, E.; Prager, J.; et al. Building Watson: An overview of the DeepQA project. AI Mag. 2010, 31, 59–79. [Google Scholar]
- Sharma, V.; Goyal, M.; Malik, D. An intelligent behaviour shown by Chatbot system. Int. J. New Technol. Res. 2017, 3, 263312. [Google Scholar]
- Pereira, M.J.; Coheur, L.; Fialho, P.; Ribeiro, R. Chatbots’ greetings to human-computer communication. arXiv 2016, arXiv:1609.06479. [Google Scholar]
- Carpenter, R.; Freeman, J. Computing machinery and the individual: The personal Turing test. Computing 2005, 22, 1–4. [Google Scholar]
- Giacaglia, G. How IBM Watson Works. Medium. 2021. Available online: https://medium.com/@giacaglia/how-ibm-watson-works-40d8d5185ac8 (accessed on 31 May 2024).
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. Llama: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971. [Google Scholar]
- Ahmed, I.; Roy, A.; Kajol, M.; Hasan, U.; Datta, P.P.; Reza, M.R. ChatGPT vs. Bard: A comparative study. Authorea Prepr. 2023, 1–18. [Google Scholar] [CrossRef]
- Yu, C. PaddlePaddle/ERNIE. 2024. Available online: https://github.com/hotpads/ERNIE-for-the-Rest-of-Us (accessed on 19 June 2024).
- Rudolph, J.; Tan, S.; Tan, S. War of the chatbots: Bard, Bing Chat, ChatGPT, Ernie and beyond. The new AI gold rush and its impact on higher education. J. Appl. Learn. Teach. 2023, 6, 364–389. [Google Scholar] [CrossRef]
- XAI. Open Release of Grok-1. 2024. Available online: https://x.ai/blog/grok-os (accessed on 12 July 2024).
- Nguyen, T.T.; Nguyen, Q.V.H.; Nguyen, D.T.; Nguyen, D.T.; Huynh-The, T.; Nahavandi, S.; Nguyen, T.T.; Pham, Q.V.; Nguyen, C.M. Deep learning for deepfakes creation and detection: A survey. Comput. Vis. Image Underst. 2023, 103525. [Google Scholar]
- Kalla, D.; Kuraku, S. Advantages, disadvantages and risks associated with chatgpt and ai on cybersecurity. J. Emerg. Technol. Innov. Res. 2023, 10, 85–94. [Google Scholar]
- TheStreet Guest Contributor. We Asked a Chatbot Why It’s So Dangerous. 2023. Available online: https://www.thestreet.com/technology/we-asked-a-chatbot-why-its-so-dangerous (accessed on 31 May 2024).
- Liu, B.; Xiao, B.; Jiang, X.; Cen, S.; He, X.; Dou, W. Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A case study on ChatGPT. Secur. Commun. Netw. 2023, 8691095, 1–10. [Google Scholar] [CrossRef]
- Zhu, K.; Wang, J.; Zhou, J.; Wang, Z.; Chen, H.; Wang, Y.; Yang, L.; Ye, W.; Gong, N.Z.; Zhang, Y.; et al. Promptbench: Towards evaluating the robustness of large language models on adversarial prompts. arXiv 2023, arXiv:2306.04528. [Google Scholar]
- Sern, L.J.; David, Y.G.P.; Hao, C.J. PhishGAN: Data augmentation and identification of homoglyph attacks. In Proceedings of the IEEE International Conference on Communications, Computing, Cybersecurity, and Informatics (CCCI), Sharjah, United Arab Emirates, 3–5 November 2020; pp. 1–6. [Google Scholar]
- Liu, Y.; Jia, Y.; Geng, R.; Jia, J.; Gong, N.Z. Prompt injection attacks and defenses in llm-integrated applications. arXiv 2023, arXiv:2310.12815. [Google Scholar]
- Bilika, D.; Michopoulou, N.; Alepis, E.; Patsakis, C. Hello me, meet the real me: Audio deepfake attacks on voice assistants. arXiv 2023, arXiv:2302.10328. [Google Scholar]
- Vaidya, T.; Zhang, Y.; Sherr, M.; Shields, C. Cocaine noodles: Exploiting the gap between human and machine speech recognition. In Proceedings of the 9th USENIX Workshop on Offensive Technologies (WOOT), Washington, DC, USA, 10–11 August 2015; pp. 1–14. [Google Scholar]
- Huber, M.; Kowalski, S.; Nohlberg, M.; Tjoa, S. Towards automating social engineering using social networking sites. In Proceedings of the IEEE International Conference on Computational Science and Engineering, Vancouver, BC, Canada, 29–31 August 2009; Volume 3, pp. 117–124. [Google Scholar]
- Elsayed, G.F.; Goodfellow, I.; Sohl-Dickstein, J. Adversarial reprogramming of neural networks. arXiv 2018, arXiv:1806.11146. [Google Scholar]
- Wang, J.; Hu, X.; Hou, W.; Chen, H.; Zheng, R.; Wang, Y.; Yang, L.; Huang, H.; Ye, W.; Geng, X.; et al. On the robustness of chatgpt: An adversarial and out-of-distribution perspective. arXiv 2023, arXiv:2302.12095. [Google Scholar]
- Shokri, R.; Stronati, M.; Song, C.; Shmatikov, V. Membership inference attacks against machine learning models. In Proceedings of the IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–24 May 2017; pp. 3–18. [Google Scholar]
- Wan, A.; Wallace, E.; Shen, S.; Klein, D. Poisoning language models during instruction tuning. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 35413–35425. [Google Scholar]
- Li, J.; Yang, Y.; Wu, Z.; Vydiswaran, V.G.; Xiao, C. Chatgpt as an attack tool: Stealthy textual backdoor attack via blackbox generative model trigger. arXiv 2023, arXiv:2304.14475. [Google Scholar]
- Liu, T.; Deng, Z.; Meng, G.; Li, Y.; Chen, K. Demystifying rce vulnerabilities in llm-integrated apps. arXiv 2023, arXiv:2309.02926. [Google Scholar]
- Carlini, N.; Tramer, F.; Wallace, E.; Jagielski, M.; Herbert-Voss, A.; Lee, K.; Roberts, A.; Brown, T.; Song, D.; Erlingsson, U.; et al. Extracting training data from large language models. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Online, 11–13 August 2021; pp. 2633–2650. [Google Scholar]
- ONeal, A.J. Chat GPT “DAN” (and other “Jailbreaks”). Available online: https://gist.github.com/coolaj86/6f4f7b30129b0251f61fa7baaa881516 (accessed on 31 May 2024).
- Ye, W.; Li, Q. Chatbot security and privacy in the age of personal assistants. In Proceedings of the IEEE/ACM Symposium on Edge Computing (SEC), San Jose, CA, USA, 12–14 November 2020; pp. 388–393. [Google Scholar]
- Yao, Y.; Duan, J.; Xu, K.; Cai, Y.; Sun, Z.; Zhang, Y. A survey on large language model (llm) security and privacy: The good, the bad, and the ugly. High-Confid. Comput. 2024, 4, 1–24. [Google Scholar] [CrossRef]
- Li, H.; Guo, D.; Fan, W.; Xu, M.; Huang, J.; Meng, F.; Song, Y. Multi-step jailbreaking privacy attacks on chatgpt. arXiv 2023, arXiv:2304.05197. [Google Scholar]
- Yu, J.; Lin, X.; Xing, X. Gptfuzzer: Red teaming large language models with auto-generated jailbreak prompts. arXiv 2023, arXiv:2309.10253. [Google Scholar]
- Pa Pa, Y.M.; Tanizaki, S.; Kou, T.; Van Eeten, M.; Yoshioka, K.; Matsumoto, T. An attacker’s dream? Exploring the capabilities of chatgpt for developing malware. In Proceedings of the 16th Cyber Security Experimentation and Test Workshop, Marina del Rey, CA, USA, 7–8 August 2023; pp. 10–18. [Google Scholar]
- Alawida, M.; Abu Shawar, B.; Abiodun, O.I.; Mehmood, A.; Omolara, A.E.; Al Hwaitat, A.K. Unveiling the dark side of chatgpt: Exploring cyberattacks and enhancing user awareness. Information 2024, 15, 27. [Google Scholar] [CrossRef]
- Happe, A.; Cito, J. Getting pwn’d by ai: Penetration testing with large language models. In Proceedings of the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, CA, USA, 3–9 December 2023; pp. 2082–2086. [Google Scholar]
- Roy, S.S.; Thota, P.; Naragam, K.V.; Nilizadeh, S. From Chatbots to PhishBots?–Preventing Phishing scams created using ChatGPT, Google Bard and Claude. arXiv 2023, arXiv:2310.19181. [Google Scholar]
- Beckerich, M.; Plein, L.; Coronado, S. Ratgpt: Turning online llms into proxies for malware attacks. arXiv 2023, arXiv:2308.09183. [Google Scholar]
- Liu, Y.; Deng, G.; Xu, Z.; Li, Y.; Zheng, Y.; Zhang, Y.; Zhao, L.; Zhang, T.; Wang, K.; Liu, Y. Jailbreaking chatgpt via prompt engineering: An empirical study. arXiv 2023, arXiv:2305.13860. [Google Scholar]
- Bai, Y.; Jones, A.; Ndousse, K.; Askell, A.; Chen, A.; DasSarma, N.; Drain, D.; Fort, S.; Ganguli, D.; Henighan, T.; et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv 2022, arXiv:2204.05862. [Google Scholar]
- Roy, S.S.; Naragam, K.V.; Nilizadeh, S. Generating phishing attacks using chatgpt. arXiv 2023, arXiv:2305.05133. [Google Scholar]
- Si, W.M.; Backes, M.; Blackburn, J.; De Cristofaro, E.; Stringhini, G.; Zannettou, S.; Zhang, Y. Why so toxic? Measuring and triggering toxic behavior in open-domain chatbots. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Los Angeles, CA, USA, 7–11 November 2022; pp. 2659–2673. [Google Scholar]
- Liu, Y.; Deng, G.; Li, Y.; Wang, K.; Zhang, T.; Liu, Y.; Wang, H.; Zheng, Y.; Liu, Y. Prompt Injection attack against LLM-integrated Applications. arXiv 2023, arXiv:2306.05499. [Google Scholar]
- Ba, Z.; Zhong, J.; Lei, J.; Cheng, P.; Wang, Q.; Qin, Z.; Wang, Z.; Ren, K. SurrogatePrompt: Bypassing the Safety Filter of Text-To-Image Models via Substitution. arXiv 2023, arXiv:2309.14122. [Google Scholar]
- Charfeddine, M.; Kammoun, H.M.; Hamdaoui, B.; Guizani, M. ChatGPT’s Security Risks and Benefits: Offensive and Defensive Use-Cases, Mitigation Measures, and Future Implications. IEEE Access 2024, 12, 30263–30310. [Google Scholar] [CrossRef]
- Chen, B.; Paliwal, A.; Yan, Q. Jailbreaker in jail: Moving target defense for large language models. In Proceedings of the 10th ACM Workshop on Moving Target Defense, Copenhagen, Denmark, 26 November 2023; pp. 29–32. [Google Scholar]
- Robey, A.; Wong, E.; Hassani, H.; Pappas, G.J. Smoothllm: Defending large language models against jailbreaking attacks. arXiv 2023, arXiv:2310.03684. [Google Scholar]
- Kumar, A.; Agarwal, C.; Srinivas, S.; Feizi, S.; Lakkaraju, H. Certifying llm safety against adversarial prompting. arXiv 2023, arXiv:2309.02705. [Google Scholar]
- Baudart, G.; Dolby, J.; Duesterwald, E.; Hirzel, M.; Shinnar, A. Protecting chatbots from toxic content. In Proceedings of the ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, Boston, MA, USA, 7–8 November 2018; pp. 99–110. [Google Scholar]
- Arora, A.; Arora, A.; McIntyre, J. Developing chatbots for cyber security: Assessing threats through sentiment analysis on social media. Sustainability 2023, 15, 13178. [Google Scholar] [CrossRef]
- Edu, J.; Mulligan, C.; Pierazzi, F.; Polakis, J.; Suarez-Tangil, G.; Such, J. Exploring the security and privacy risks of chatbots in messaging services. In Proceedings of the ACM internet Measurement Conference, Nice, France, 25–27 October 2022; pp. 581–588. [Google Scholar]
- Malik, K.M.; Malik, H.; Baumann, R. Towards vulnerability analysis of voice-driven interfaces and countermeasures for replay attacks. In Proceedings of the IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), San Jose, CA, USA, 28–30 March 2019; pp. 523–528. [Google Scholar]
- Lempinen, M.; Juntunen, A.; Pyyny, E. Chatbot for Assessing System Security with OpenAI GPT-3.5. Bachelor’s Thesis, University of Oulu Repository, Oulu, Finland, 2023; pp. 1–34. Available online: https://oulurepo.oulu.fi/handle/10024/42952 (accessed on 31 May 2024).
- Yamin, M.M.; Hashmi, E.; Ullah, M.; Katt, B. Applications of LLMs for Generating Cyber Security Exercise Scenarios. Res. Sq. 2024, 1–17. [Google Scholar]
- Franco, M.F.; Rodrigues, B.; Scheid, E.J.; Jacobs, A.; Killer, C.; Granville, L.Z.; Stiller, B. SecBot: A Business-Driven Conversational Agent for Cybersecurity Planning and Management. In Proceedings of the 16th International Conference on Network and Service Management (CNSM), Izmir, Turkey, 2–6 November 2020; pp. 1–7. [Google Scholar]
- Liu, Y.; Yao, Y.; Ton, J.F.; Zhang, X.; Cheng, R.G.H.; Klochkov, Y.; Taufiq, M.F.; Li, H. Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models’ Alignment. arXiv 2023, arXiv:2308.05374. [Google Scholar]
- Hadi, M.U.; Qureshi, R.; Shah, A.; Irfan, M.; Zafar, A.; Shaikh, M.B.; Akhtar, N.; Wu, J.; Mirjalili, S. A survey on large language models: Applications, challenges, limitations, and practical usage. Authorea Prepr. 2023, 1–31. [Google Scholar]
- Wolf, Y.; Wies, N.; Avnery, O.; Levine, Y.; Shashua, A. Fundamental limitations of alignment in large language models. arXiv 2023, arXiv:2304.11082. [Google Scholar]
- Kadavath, S.; Conerly, T.; Askell, A.; Henighan, T.; Drain, D.; Perez, E.; Schiefer, N.; Hatfield-Dodds, Z.; DasSarma, N.; Tran-Johnson, E.; et al. Language models (mostly) know what they know. arXiv 2022, arXiv:2207.05221. [Google Scholar]
- Lin, S.; Hilton, J.; Evans, O. Teaching models to express their uncertainty in words. arXiv 2022, arXiv:2205.14334. [Google Scholar]
- Montalbano, E. ChatGPT Hallucinations Open Developers to Supply Chain Malware Attacks. 2023. Available online: https://www.darkreading.com/application-security/chatgpt-hallucinations-developers-supply-chain-malware-attacks (accessed on 31 May 2024).
- Huang, L.; Yu, W.; Ma, W.; Zhong, W.; Feng, Z.; Wang, H.; Chen, Q.; Peng, W.; Feng, X.; Qin, B.; et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. arXiv 2023, arXiv:2311.05232. [Google Scholar]
- Qiu, H.; Zhang, S.; Li, A.; He, H.; Lan, Z. Latent jailbreak: A benchmark for evaluating text safety and output robustness of large language models. arXiv 2023, arXiv:2307.08487. [Google Scholar]
- Li, Z.; Peng, B.; He, P.; Yan, X. Evaluating the instruction-following robustness of large language models to prompt injection. arXiv 2023, arXiv:2308.10819. [Google Scholar]
Event | Year | Significance |
---|---|---|
Development of n-gram models | 1940s | Used for intent detection in chatbots and LLMs (a toy bigram example follows this table) |
Origins of NLP | 1950s | NLP allows chatbots to understand and respond to human speech |
Growth of NLP computational power | 1980s | NLP starts to use machine learning algorithms |
Development of statistical LMs | 1990s | Useful for validation and alignment of models |
Development of neural LMs | 2000s | Helpful for completion of machine translation tasks |
Development of word embeddings | 2010s | Useful for response generation in chatbots and LLMs |
Development of transformer architecture | 2017 | Can be used to improve a chatbot’s or LLM’s contextual understanding abilities. |
Development of pretrained LMs | 2018 | Helpful for completion of NLP tasks. |
Release of the first GPT model | 2018 | The model did not use task-specific training. GPT family of models used for ChatGPT. |
Development of general-purpose models | 2020s | Skilled at a variety of tasks. |
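As a toy illustration of the table’s first row, the snippet below estimates bigram (n = 2) next-word probabilities from raw counts. The two-sentence corpus is invented for the example; real n-gram systems add smoothing and larger n.

```python
# Toy bigram language model: P(next | prev) from adjacent-word counts.
from collections import Counter, defaultdict

corpus = "how can i help you today . how can i reset my password .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1  # tally each observed (prev, next) pair

def p_next(prev: str, nxt: str) -> float:
    """Maximum-likelihood estimate of P(next | prev)."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

print(p_next("can", "i"))   # 1.0 - "can" is always followed by "i" here
print(p_next("i", "help"))  # 0.5 - "i" is followed by "help" or "reset"
```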
Chatbot | Year | Significance |
---|---|---|
ELIZA | 1966 | Widely considered to be the first chatbot. |
PARRY | 1972 | Brought awareness to ethical considerations when using chatbots for healthcare purposes. |
Jabberwacky | 1980s | Employed AI in a more advanced form. Combined AI and pattern matching to achieve its functionality. |
Dr. Sbaitso | 1991 | Demonstrated ability of sound cards developed by Creative Labs. Communicated using speech. |
ALICE | 1995 | First online chatbot with web discussion ability. |
Watson | 2007–Present | There have been many iterations of Watson, including a chatbot and a platform for management of AI models. |
Microsoft Copilot | 2023 | Boosts productivity in Microsoft 365 by automating tasks and providing smart insights. |
ChatGPT | 2022 | One of the most widely used chatbots and can assist with a variety of tasks. |
LLaMA | 2023 | Has open-source accessibility, scalability, efficient performance, and contribution to ethical AI development and research. |
Gemini | 2023 | Has a web-searching ability and can provide up-to-date information. |
Claude | 2023 | Can express tone, personality, and behavior based on instructions. |
Ernie | 2023 | Has multilingual support, integration of extensive knowledge, applicability across various industries, and enhancement of user experience. |
Grok | 2023 | Delivers real-time knowledge through the X social media platform, elevates user experiences, and is willing to respond to spicy questions typically rejected by most other chatbots. |
Chatbot | Year | Approach | Functionalities | Strengths | Weaknesses |
---|---|---|---|---|---|
ELIZA | 1966 | Pattern matching, Early AI & ML | ELIZA simulated human speech with a pattern-matching algorithm and substitution technique, following scripted instructions to generate responses. It matched input against a set of rules to select the right reply (illustrated in the sketch after this table). | Simple implementation, engaged users. | Only had information on a singular topic, not flexible/adaptable, limited conversational patterns. |
PARRY | 1972 | Pattern matching | PARRY analyzed user input for key phrases alongside pattern matching to understand intent, tailoring responses based on the importance of words in the conversation. | Strong controlling structure, could adapt responses based on weight variance in user’s prompts. | Low language understanding, incapable of learning from conversation, computationally slow. |
Jabberwacky | 1981 | ML and NLP | Jabberwacky uses a contextual pattern-matching algorithm that matches the user’s input with the appropriate response. Its use of ML allows it to retain knowledge, which increases the number of potential responses. | Improved conversational flow, provided a semi-personal experience by remembering some data. | Computationally slow, cannot handle many users at one time. |
Dr. Sbaitso | 1991 | Text-to-speech synthesis | Dr. Sbaitso uses several sound cards to generate speech. | Its verbal communication made it more “human” and engaging than its predecessors. | Limited conversational abilities due to small set of possible responses. |
ALICE | 1995 | NLP, pattern matching, rule-based systems | ALICE understands the user input. Pattern matching is used to find keywords in the user input. ALICE also uses a ruleset defined by the developer to dictate its responses. | Did not need to know the context of whole conversation. Could discuss a variety of topics. | Limited natural language understanding, inflexible. |
Watson | 2007 | DeepQA | Watson uses the DeepQA approach. Uses content acquisition, question analysis, hypothesis generation, soft filtering, hypothesis scoring, final merging, answer merging, and ranking estimation. | Natural language processing, parallel computing, scalability. | Lack of common sense, resource intensive, dependency on training data. |
ChatGPT | 2022 | Transformer architecture | First, a model is trained with supervised fine-tuning. Then, a reward model is trained with comparison data from sampled and labeled outputs. Finally, a policy is optimized against the reward model using reinforcement learning. | Large knowledge base, adaptable, availability, natural language understanding, text generation. | Vulnerable to bias, unable to verify information. |
Gemini | 2023 | LaMDA | Gemini is pretrained on large publicly available data, enabling it to learn language patterns. Upon receiving context, Gemini creates drafts of potential responses, which are checked against safety parameters. Uses RLHF to improve itself. | Text generation, natural language understanding, access to up-to-date information. | Vulnerable to bias, inconsistent responses, does not provide sources. |
LLaMA | 2023 | Transformer architecture | It is trained on over 15 trillion tokens with quality control, including substantial code and non-English data from 30 languages. Excels in chat, code generation, creative composition, logical reasoning, and multimodal abilities. | Highly efficient and scalable in various tasks like text generation, translation, and summarization. | Vulnerable to data bias, not for calculus/statistics, limited real-time access, resource-intensive. |
Ernie | 2023 | Transformer architecture | Uses a masking pretraining strategy with knowledge graphs and multitasking. Analyzes sentiment, identifies entities, and classifies or generates text. Answers questions on context, measures textual similarity, retrieves information, and translates languages. | Enhanced contextual understanding, strong performance, multilingual capabilities, designed for various applications. | Data bias, resource-intensive, stronger performance in Chinese language, limited access. |
Grok | 2023 | Transformer architecture | It was trained on a huge amount of text data with unsupervised learning. It was designed to handle a wide range of questions, generate text, assist with code, and combine text and images in queries. Uses RLHF to improve itself. | Real-time information, individual user preferences, multilingual support, personalized recommendations, humor, excels in math and reasoning, open-source. | Contextual limitations, potential for bias, misunderstanding complex queries, resource intensive, lack of emotional intelligence, safety and ethical concerns. |
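The pattern matching described in the ELIZA row above (and in Section 3.2.1) can be sketched in a few lines of Python. The two rules below are invented stand-ins rather than Weizenbaum’s original DOCTOR script: input is checked against ordered regular-expression rules, captured text is substituted into a canned reply, and a fallback covers the no-match case.

```python
# ELIZA-style pattern matching with invented example rules.
import re

RULES = [
    (re.compile(r"\bi am (.+)", re.I), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.+)", re.I), "How long have you felt {0}?"),
]
FALLBACK = "Please tell me more."

def reply(user_input: str) -> str:
    for pattern, template in RULES:      # first matching rule wins
        m = pattern.search(user_input)
        if m:
            return template.format(m.group(1).rstrip(".!?"))
    return FALLBACK

print(reply("I am worried about exams."))  # Why do you say you are worried about exams?
print(reply("Nice weather today."))        # Please tell me more.
```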
Positive Effect | Description | Negative Effect | Description
---|---|---|---
Increased autonomy and independence | Instead of relying on others to complete tasks or obtain knowledge, an individual can converse with a chatbot. | Improper handling of personal information | Chatbots may not effectively protect personal information if they do not have mechanisms that prioritize the protection of personal information. |
Increase in available knowledge | Chatbots make the process of gaining new knowledge easier, as they have interfaces that make interaction accessible for users of all skill levels. | Generation of incorrect information | There is a risk chatbots may generate incorrect information. |
Human connection | There are chatbot-powered platforms that can connect like-minded individuals. | Ethical concerns in academia | Chatbots can be used to plagiarize papers and exams. If an AI writing checker is used, papers could also be incorrectly flagged. |
Cyber defense | Chatbots can be used for a variety of cyber defenses. Some of these defensive tools are augmented versions of existing tools, while others are novel tools that leverage the unique abilities of chatbots and LLMs. | Cyber offense | Chatbots can be used for a variety of cyber offenses. Some of these offensive tools are augmented versions of existing tools, while others are novel tools that leverage the unique abilities of chatbots and LLMs.
Applications in customer service industry | Customer service chatbots produce high-quality, easy to understand, and typically accurate answers to a customer’s questions. | Inadequate at providing emotional support | Not all chatbots are adequate at providing emotional support. Relying on the wrong chatbot could have serious consequences. |
Educational benefits | Chatbots can help answer questions, review papers, and provide information at the user’s own pace. | Dependency on technology | Over-reliance on chatbots may diminish human skills. |
Applications in healthcare | Chatbots are frequently used for tasks such as diagnosis, collecting information on a patient, interpreting medical images, and documentation. | Potential to produce biased answers | Depending on the data used during the training phase, chatbots may produce biased responses. Chatbots trained on data containing biased content are more likely to produce biased responses. |
Attack Name | Description | Related Work(s) |
---|---|---|
Social engineering attack | Chatbots are being used to generate social engineering attempts. | Leverages a chatbot’s ability to generate convincing, human-like text [22]. |
Phishing attack | Exploits a chatbot’s ability to generate human-like speech to mimic a legitimate line of communication. | MJP is a multistep jailbreaking prompt attack that uses multiple prompts to bypass a chatbot’s content filters [83]. GPTFUZZER automates generation of jailbreaking templates [84]. |
Ransomware and malware generation attacks | Chatbots are being used to generate malware and ransomware. Adversaries achieve this by manipulating the prompts given to chatbots, deceiving them into generating malicious code. | The GPT family of models can generate different types of malware, varying in complexity [85]. |
Macros and living off the land binary attacks | Occur when a victim downloads a chatbot-generated spreadsheet that executes a macro, allowing the adversary to gain access to their machine. | Chatbots are capable of writing macros that launch malware using trusted system tools [86]. |
SQL injection attack | Occurs when a chatbot is used to generate code with an injected payload. | Chatbots can generate code to be injected into an application to access sensitive data [86]. |
Low-level privilege access attack | A chatbot is asked to generate commands for the desired privilege escalation attack. | Chatbots are capable of generating vulnerable commands that are fed back to the victim machine for privilege escalation [87]. |
Work | Year | Dataset/Benchmark | Contributions | Strengths/Advantages | Weaknesses/Limitations |
---|---|---|---|---|---|
[89] | 2023 | N/A | Uses plugins with ChatGPT to establish a proxy between an adversary and the victim. | Proof of concept shows significant concerns regarding LLM security. | No practical experiments to support claims. |
[22] | 2023 | N/A | Explores attacks on and using ChatGPT, as well as defenses. | Provides an in-depth discussion of the security concerns of ChatGPT, and supports claims with screenshots of adversarial prompting experiments. | Only discusses ChatGPT, so findings may not be applicable to other LLMs. |
[83] | 2023 | Enron Email, PII from institutional pages | A novel multistep jailbreaking prompt that bypasses ChatGPT’s content filters. | The jailbreaking prompt is highly effective in some use cases and can be used in combination with other adversarial methods to achieve even better performance. | Experiences low accuracy in some cases. Free-form extraction method for the New Bing can lead to repeated or incorrect patterns. |
[84] | 2023 | Human-written jailbreaking prompts gathered from the Internet for initial seed, datasets from [90,91] | Introduces a novel jailbreak fuzzing framework to automate generation of jailbreaking templates. | Efficiently produces jailbreaking templates with high ASR. Generated templates consistently outperform human-generated jailbreaking templates. | Requires human intervention for the initial seed. Ignores question transformation, so prompts risk rejection by keyword matching. |
[77] | 2023 | SST-2, AGNews, Amazon, Yelp, IMDB | BGMAttack, a textual backdoor attack method that uses a text generative model as a trigger. | Maintains comparable performance to other similar attack methods but is stealthier. | Requires human cognition evaluations to verify efficacy. The ChatGPT API’s instability may cause performance issues. |
[92] | 2023 | N/A | Discusses and demonstrates the use of ChatGPT for a variety of phishing attacks. | Thorough descriptions of attacks, figures and screenshots to support claims. | The work needs more practical experiments, like testing the phishing components in real-world testing conditions. |
[85] | 2023 | Collection of jailbreaking prompts | Investigates the use of chatbots for development of malware. | Provides in-depth discussion and analysis of malware generation capabilities of chatbots. | Limited scope. |
[93] | 2022 | 4chan and Reddit datasets | Explores how chatbots can potentially generate toxic answers when given non-toxic prompts. | The ToxicBuddy attack is a novel method that focuses on the generation of non-toxic prompts to generate toxic answers. It highlights that any prompt given to a chatbot has toxic potential. | Relies on tools that can be biased. The definition of toxic is subjective, making the results and thresholds used in this work subjective as well. |
[74] | 2023 | AdvGLUE, ANLI, Flipkart Review, DDXPlus | Evaluates robustness of ChatGPT from adversarial and out-of-distribution perspectives. | Provides strong, detailed evaluation of several commercially available LLMs. | Does not evaluate all chatbots’ capabilities and may be invalid due to the small dataset size. |
[94] | 2024 | N/A | Proposes HouYi, a black-box prompt injection attack inspired by web injection attacks. | Proposed attack method is very effective and has identified vulnerabilities confirmed by vendors. | Lacks qualitative data. |
[95] | 2023 | N/A | Proposes a framework to generate attack prompts to bypass Midjourney’s safety filter. | 88% ASR when attacking Midjourney, multimodal approach. | Struggles to pass filters when asking for images of explicit content. |
Defense Name | Description | Related Work(s) |
---|---|---|
Defense automation | Chatbots can be used to automate a variety of defenses, relieving the burden on professionals | Chatbots can be used for advice on how to avoid dangerous scripts, education of analysts, and detecting attacks [22] |
Security bug detection | Chatbots can be used to review code/applications to detect security bugs | ChatGPT can be used to perform in-depth code reviews, and is effective at doing so due to its in-depth security knowledge [22] |
Secure code generation | Chatbots can be used to either develop secure code or analyze existing code | ChatGPT is skilled at developing secured code due to its in depth security knowledge [22] |
Policy development | Chatbots can be used to develop security policies for an organization | ChatGPT is skilled at policy writing due to its in-depth security knowledge [96] |
Security questionnaires | Chatbots can be used to speed up preparing security questionnaires | ChatGPT can speed up generating security questionnaires [96] |
Incident response and recovery | For incident response and recovery, chatbots can be used to analyze data obtained during the incident, notify the appropriate parties of an incident, review and revise the incident response plan, and provide a summary of the incident | Chatbots expedite and improve the incident response process [96] |
Corpora cleaning | Training data are sent through a pipeline to remove undesirable content from the training corpora (a simplified sketch follows this table) | Cleaning an LLM’s training corpora is important to remove flawed or toxic information [82]
Use of a robust training method | The use of a robust training method makes chatbots/LLMs less susceptible to attacks, as well as improves their safety and alignment | Robust training increases LLM resilience against certain text-based attacks [82] |
Proper instruction preprocessing practices | Proper instruction preprocessing practices reduce the chance that a model will be maliciously used or receive suspicious instructions | This practice leads to higher-quality data, reduces the chance a model will be used maliciously, and increases the algorithm’s readability [82]
Education and awareness practices | It is important to educate the appropriate parties on proper use cases and protocols when using chatbots/LLMs | It is important that security personnel are well trained, as they are the first line of defense against attacks [82]
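As a deliberately simplified illustration of the corpora-cleaning row above, the sketch below pipes training documents through PII masking, deduplication, and a blocklist filter. The single email regex and one-word blocklist are assumptions made for brevity; production pipelines rely on trained toxicity classifiers, near-duplicate detection, and far more thorough PII scrubbing.

```python
# Toy corpora-cleaning pipeline: mask PII, deduplicate, drop blocked content.
import re

BLOCKLIST = {"badword"}  # assumed toy blocklist

def strip_pii(text: str) -> str:
    """Mask email-address-like strings (one simple PII rule for illustration)."""
    return re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "[EMAIL]", text)

def is_clean(text: str) -> bool:
    words = set(re.findall(r"[a-z']+", text.lower()))
    return not (words & BLOCKLIST)

def clean_corpus(docs: list[str]) -> list[str]:
    seen, out = set(), []
    for doc in docs:
        doc = strip_pii(doc.strip())
        if doc and doc not in seen and is_clean(doc):  # dedupe + filter
            seen.add(doc)
            out.append(doc)
    return out

docs = ["Contact me at alice@example.com", "Contact me at alice@example.com",
        "this contains badword", "perfectly fine sentence"]
print(clean_corpus(docs))  # ['Contact me at [EMAIL]', 'perfectly fine sentence']
```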
Work | Year | Dataset/Benchmark | Contributions | Strengths/Advantages | Weaknesses/Limitations |
---|---|---|---|---|---|
[97] | 2023 | Set of 4 adversarial prompts | A moving target defense LLM system that protects LLMs against adversarial prompting attacks with a 100% success rate | Introduces a highly successful defense mechanism. | Inadequate experimental evaluation and analysis
[98] | 2023 | AdvBench | First-of-its-kind algorithm for mitigating jailbreaking attacks on LLMs | SmoothLLM: efficient, versatile, and defends against unforeseen prompt attacks. | Smoothing process may introduce noise, may not be applicable to other types of attacks |
[99] | 2024 | AdvBench | Erase-and-check framework, the first to defend against adversarial prompts with safety guarantees (see the sketch after this table) | Easily adaptable and safe. A promising direction for defenses against adversarial prompts. | Computationally expensive, relies heavily on accurate safety filters
[88] | 2024 | Collections of phishing prompts obtained using several different techniques | A BERT-based tool that detects malicious prompts to reduce the likelihood LLMs will generate phishing content | The BERT-based approach is highly effective against several prompting scenarios that may provoke LLMs into generating phishing. | This defense targets text-based phishing and potential phishing, but not attacks requiring user interaction, e.g., reCAPTCHA or browser-in-browser attacks
[100] | 2018 | N/A | The BotShield framework is designed to defend chatbots against toxic content in user inputs using a multidisciplinary approach | BotShield does not require any changes to the protected chatbot, easing the burden on the developer. | Experiences latency, can be difficult to implement, does not detect bots, does not implement differential privacy |
[101] | 2023 | Collection of tweets | Proposes a chatbot for deployment on social media sites for cyber-attack and threat prediction | The proposed chatbot is a preventative method that can detect attacks before they occur, potentially saving time and resources. | This chatbot is only applicable to Twitter, so it is not a widespread solution
[102] | 2022 | Metadata scraped from top.gg such as chatbot ID, name, URL, and tags | A methodology that assesses security and privacy issues in messaging platform chatbots | This methodology can accurately identify chatbots that pose security and privacy risks and highlights the need for more research in this area. | Framework is resource-intensive, relying on traceability analysis, which may be ineffective due to the ambiguity of words
[103] | 2019 | Voice recordings | Proposes a methodology for the detection of voice replay attacks | The proposed method is a preventative approach that can accurately detect voice replay attacks and is a novel contribution. | HOSA may not be able to accurately differentiate between natural and synthetic voice(s)
[104] | 2023 | HIDS logs, survey results | Proposes a chatbot that analyzes HIDS log data to assist the user with assessing system security | The proposed chatbot has an easy-to-use interface, automates a series of analysis tasks, and eases the burden of the user. | Lack of a comprehensive qualitative analysis and uses a small sample size |
[105] | 2024 | N/A | Explores the use of LLM hallucinations to create cyber exercise scenarios | The innovative use of LLM hallucinations generated accurate and effective cyber exercise scenarios. | The whole exercise design process is not described, and parts are unclear |
[106] | 2020 | Samples of intents and entities | Proposes a conversational chatbot for cybersecurity planning and management | SecBot interacts with users and provides helpful, detailed responses and can identify attacks and provide insights about risks. | If the correct dataset is not used, SecBot can be prone to overfitting |
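To illustrate the erase-and-check row ([99]) above, the sketch below implements the framework’s adversarial-suffix mode under strong simplifying assumptions: a prompt is rejected if the safety filter flags it or any version with up to max_erase trailing tokens erased. The exact-match is_harmful filter and the attack string are invented toy stand-ins; the actual framework uses a learned safety classifier.

```python
# Hedged sketch of erase-and-check [99], adversarial-suffix mode.
def is_harmful(prompt: str) -> bool:
    """Toy safety filter: only recognizes the bare harmful request."""
    return prompt.lower().strip() == "how to build a bomb"

def erase_and_check(prompt: str, max_erase: int = 20) -> bool:
    """Reject (True) if any suffix-erased version of the prompt is flagged."""
    tokens = prompt.split()
    for k in range(min(max_erase, len(tokens) - 1) + 1):
        candidate = " ".join(tokens[: len(tokens) - k])  # erase last k tokens
        if is_harmful(candidate):
            return True
    return False

attack = "how to build a bomb xz!!qq zz"  # harmful request + adversarial suffix
print(is_harmful(attack))                 # False - suffix evades the naive filter
print(erase_and_check(attack))            # True - erasing the suffix exposes it
```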