Digital Sentinels and Antagonists: The Dual Nature of Chatbots in Cybersecurity
Abstract
1. Introduction
1.1. What Are Chatbots?
1.2. The Turing Test and the Inception of Chatbots
1.3. Modern Chatbots and Their Uses
1.4. Contributions of This Article
- We present a comprehensive history of LLMs and chatbots, with a focus on how key developments in LLMs play a significant part in the functionality of chatbots. It is worth noting that most previous survey works (e.g., [9,23]) mainly presented a history of chatbots but neither covered cybersecurity aspects in sufficient depth nor properly explored how LLMs and chatbots are interconnected in the technology and application domains.
- We provide a comprehensive literature review exploring attacks on chatbots, attacks using chatbots, defenses for chatbots, and defenses using chatbots.
- We offer experimental analyses of several offensive applications of chatbots, such as malware generation, phishing attacks, and buffer overflow attacks.
- We provide some suggestions for enhancing prior research efforts and advancing the technologies addressed in this article.
- We discuss open issues and potential future research directions on chatbots, LLMs, and the duality of chatbots.
- This document is intended to supplement prior survey papers by incorporating the latest advancements, fostering a comprehensive grasp of the chatbot and LLM domains for the reader. Moreover, it serves as a foundation for the development of innovative approaches to address the duality of chatbots through both offensive and defensive strategies.
2. History of Chatbots and Large Language Models
2.1. 1940s: First Mention of n-Gram Models
2.2. 1950s: Origins of and Early Developments in Natural Language Processing
2.3. 1960s: ELIZA
2.4. 1970s: PARRY
2.5. 1980s: Jabberwacky, Increase in Large Language Model Computational Power, and Small Language Models
2.6. 1990s: Statistical Language Models, Dr. Sbaitso, and ALICE
2.7. 2000s: Neural Language Models
2.8. 2010s: Word Embeddings, Neural Language Models Advances, Transformer Model, Pretrained Models, Watson, and the GPT Model Family
2.9. 2020s: Microsoft Copilot, ChatGPT, LLaMA, Gemini, Claude, Ernie, Grok, and General-Purpose Models
3. Functionality of Large Language Models and Chatbots
3.1. Functionality of Large Language Models
3.1.1. General Large Language Model Architectures
LLM | Year | Architecture | Parameters | Strengths | Weaknesses |
---|---|---|---|---|---|
GPT | 2018 | Transformer decoder | 110 M | Scalability, transfer learning. | Limited understanding of context, vulnerable to bias. |
GPT-2 | 2019 | Transformer decoder | 1.5 B | Enhanced text generation, enhanced context understanding, enhanced transfer learning. | Limited understanding of context, vulnerable to bias. |
GPT-3 | 2020 | Transformer decoder | 175 B | Zero-shot and few-shot learning, fine-tuning flexibility. | Resource intensive, ethical issues. |
GPT-3.5 | 2022 | Transformer decoder | 175 B | Zero-shot and few-shot learning, fine-tuning flexibility. | Resource intensive, ethical issues. |
GPT-4 | 2023 | Transformer decoder | 1.7 T | Cost-effective, scalable, easy to personalize, multilingual. | Very biased, cannot check accuracy of statements, ethical issues. |
LaMDA | 2020 | Basic transformer | 137 B | Large-scale knowledge, text generation, natural language understanding. | Can generate biased answers, can generate inaccurate answers, cannot understand common sense well. |
BERT (Base) | 2018 | Transformer encoder | 110 M | Bidirectional context, transfer learning, contextual embeddings. | Computational resources, token limitations, fine-tuning requirements. |
LLaMA | 2023 | Basic transformer | 7 B, 13 B, 33 B, 65 B | Cost-effective, strong performance, safe. | Limited ability to produce code, can generate biased answers, limited availability. |
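Every model in the table above builds on the transformer’s scaled dot-product self-attention (Section 3.1.2). As a minimal sketch of that mechanism, the NumPy snippet below computes single-head attention over random toy inputs; it deliberately omits the multi-head projections, causal masking (used by decoder-only models such as the GPT family), residual connections, and layer normalization that production LLMs add.

```python
# Minimal single-head scaled dot-product self-attention (illustrative only).
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))  # subtract max for stability
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) learned projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])  # pairwise token affinities
    weights = softmax(scores, axis=-1)       # each row sums to 1
    return weights @ V                       # context-mixed token representations

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                           # 4 tokens, d_model = 8
Wq, Wk, Wv = (rng.normal(size=(8, 8)) for _ in range(3))
print(self_attention(X, Wq, Wk, Wv).shape)            # (4, 8)
```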
3.1.2. Transformer Architecture
3.1.3. Large Language Model Training Process
3.1.4. Natural Language Processing (NLP) and Natural Language Understanding (NLU)
3.2. Functionality of Chatbots
3.2.1. Pattern-Matching Algorithm
3.2.2. Rule-Based Systems
3.2.3. General Chatbot Architecture
3.3. Technical Discussion of Well-Known LLM Families and Chatbots
3.3.1. Well-Known LLM Families
3.3.2. ELIZA
3.3.3. PARRY
3.3.4. Jabberwacky
3.3.5. Dr. Sbaitso
3.3.6. ALICE
3.3.7. Watson
3.3.8. Microsoft Copilot
3.3.9. ChatGPT
3.3.10. LLaMA
3.3.11. Gemini
3.3.12. Ernie
3.3.13. Grok
4. Applications and Societal Effects of Chatbots
4.1. Positive Applications and Societal Effects
4.2. Negative Applications and Societal Effects
5. Attacks on Chatbots and Attacks Using Chatbots
5.1. Attacks on Chatbots
Attack Name | Description | Related Work(s) |
---|---|---|
Homoglyph attack | Homoglyph attacks replace character(s) with visually similar characters to create functional, malicious links (see the detection sketch after this table). | PhishGAN is conditioned on non-homoglyph input text images to generate images of homoglyphs [68]. |
Jailbreaking attack(s) | Malicious prompts created by the adversary are given to a chatbot to instruct it to behave in a way its developer did not intend. | Malicious prompts are used to generate harmful content, such as phishing attacks and malware [22]. |
Prompt-injection attack | Prompt injection attacks are structurally similar to SQL injection attacks and use carefully crafted prompts to manipulate an LLM into performing the desired task. | A prompt containing a target and an injected task is given to an LLM. This prompt is manipulated so the LLM will perform the injected task [69]. |
Audio deepfake attack | Deepfake audio clips are created using machine learning to replicate voices for malicious purposes. | Seemingly benign audio files are used to synthesize voice samples to feed to voice assistants to execute privileged commands [70]. |
Adversarial voice samples attack | Malicious voice samples are crafted using tuning and reconstructed audio signals. | Extraction parameters are tuned until the voice recognition system cannot identify them, then converted back to the waveform of human speech. Such samples are used to fool voice recognition systems [71]. |
Automated social engineering | Automation is introduced to reduce human intervention in social engineering attacks, reducing costs and increasing effectiveness. | The adversary gives a bot parameters for the social engineering attack, and the bot executes the desired attack [72]. |
Adversarial examples | Inputs are given to the target model to cause it to deviate from normal behavior. | Adversarial examples can be either targeted or untargeted, leading to malicious outputs [73,74]. |
Adversarial reprogramming feedback attack | The adversary can reprogram a model to perform a desired task. | A single adversarial perturbation is added to all inputs to force a model to complete a desired task, even if it was not trained to do so [73]. |
Data poisoning attack | Occurs when an adversary injects malicious data into the training dataset. | Malicious data are injected into the training set and can cause a variety of model failures [75,76]. |
Backdoor attack | An adversary alters the training data and model processing. | These attacks manipulate training data, resulting in the adversary being able to embed a hidden backdoor [77]. |
Extraction attack | To reconstruct training data, a model is prompted with prefixes. | An adversary can prompt or query a language model with prefixes to extract individual training examples [78]. |
Membership inference attack | An adversary attempts to identify if a specific piece of data belongs to a model’s training dataset. | These attacks target a model’s training dataset, using inference to deduce its members [79]. |
Remote code execution | Arbitrary code on an app’s server is executed remotely via a prompt or series of prompts. | These attacks target LLMs that are integrated into web services and can compromise the LLM’s environment [73]. |
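To make the homoglyph row above concrete, the Python sketch below shows the defensive counterpart of the attack: mapping visually confusable characters to an ASCII “skeleton” and flagging look-alike domains. The CONFUSABLES map is a tiny assumed subset chosen for illustration; real detectors draw on the full Unicode confusables data.

```python
# Hypothetical homoglyph-spoof detector (toy confusables subset).
CONFUSABLES = {
    "а": "a",  # Cyrillic a (U+0430)
    "е": "e",  # Cyrillic ie (U+0435)
    "о": "o",  # Cyrillic o (U+043E)
    "р": "p",  # Cyrillic er (U+0440)
    "ѕ": "s",  # Cyrillic dze (U+0455)
    "1": "l",
    "0": "o",
}

def skeleton(domain: str) -> str:
    """Reduce a domain to its ASCII look-alike form."""
    return "".join(CONFUSABLES.get(ch, ch) for ch in domain.lower())

def is_homoglyph_spoof(domain: str, trusted: set[str]) -> bool:
    """Flag domains that collide with a trusted domain only via confusables."""
    return domain not in trusted and skeleton(domain) in trusted

trusted = {"google.com", "paypal.com"}            # assumed allowlist
print(is_homoglyph_spoof("gооgle.com", trusted))  # True: Cyrillic "о" spoofs "o"
print(is_homoglyph_spoof("google.com", trusted))  # False: exact trusted match
```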
5.2. Attacks Using Chatbots
5.3. Some Representative Research Works on Attacks on Chatbots and Attacks Using Chatbots
6. Defenses for Chatbots and Defenses Using Chatbots
6.1. Defenses Using Chatbots
6.2. Defenses for Chatbots
6.3. Some Representative Research Works on Defenses for Chatbots and Defenses Using Chatbots
7. Limitations of LLMs (Chatbots) and Surveyed Case Studies (Works) in This Article
7.1. Limitations of Large Language Models and Chatbots
7.2. Some Comments on (Limitations of) the Surveyed Works in This Paper
8. Experimental Analysis
8.1. Malware Generation
8.2. Phishing Email Generation
8.3. Buffer Overflow Attack
8.4. Discussion of Results
9. Open Issues and Future Research Directions
9.1. Alignment Science in LLMs/Chatbots
9.2. Computational Issues of Jailbreaking
9.3. Hallucination Challenges in Chatbots and LLMs
9.4. Versatile Defenses: Perturbing Input Prompts
9.5. Advancing Moving Target Defense Strategies for LLMs and Chatbots
9.6. Erase-and-Check-like Frameworks
9.7. Next Generation of Defenses against Attacks
9.8. Large-Scale Evaluation of Chatbots and Large Language Models
10. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Term | Acronym |
---|---|
Artificial intelligence | AI |
Machine learning | ML |
Natural language processing | NLP |
Large language models | LLMs |
Language models | LMs |
Multilayer perceptrons | MLPs |
Pretrained language models | PLMs |
Reinforcement learning from human feedback | RLHF |
Natural language understanding | NLU |
Language model for dialogue applications | LaMDA |
Sequence-to-sequence model | Seq2Seq |
Artificial intelligence markup language | AIML |
In-context learning | ICL |
Man in the middle | MitM |
Do anything now | DAN |
Living-off-the-land binaries | LOLBins
Multistep jailbreaking prompt | MJP |
Attack success rate | ASR |
Blackbox generative model-based attack method | BGMAttack |
Clean accuracy | CACC |
Sentence perplexity | PPL |
Grammatical error numbers | GEN
Executable and linking format | ELF
Safety filter | SF |
SaFeRDialogues | SD |
Knowledge distillation | KD |
BlenderBot-small | BBs |
TwitterBot | TB |
Non-toxic to toxic | NT2T |
Non-toxic to non-toxic | NT2NT |
Toxic to toxic | T2T |
Non-toxic query | NTQ |
BlenderBot-large | BBl |
BlenderBot-medium | BBm |
Dialogue safety classifier | DSC |
Out-of-distribution | OOD |
Adversarial natural language inference | ANLI |
Natural language inference | NLI |
Personally identifying information | PII |
Higher-order spectral analysis | HOSA |
Toxic to non-toxic | T2NT |
Host-based intrusion detection system | HIDS |
Retrieval-augmented generation | RAG |
Terminal learning objectives | TLOs
References
- Littman, M.L.; Ajunwa, I.; Berger, G.; Boutilier, C.; Currie, M.; Doshi-Velez, F.; Hadfield, G.; Horowitz, M.C.; Isbell, C.; Kitano, H.; et al. Gathering Strength, Gathering Storms: The One Hundred Year Study on Artificial Intelligence (AI100); 2021 Study Panel Report; Stanford University: Stanford, CA, USA, 2021. [Google Scholar]
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. In Proceedings of the Advances in Neural Information Processing Systems 30: Annual Conference on Neural Information Processing Systems 2017, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Meta, Multimodal Generative AI systems. Available online: https://ai.meta.com/tools/system-cards/multimodal-generative-ai-systems/ (accessed on 31 May 2024).
- Müller, V.C.; Bostrom, N. Future progress in artificial intelligence: A poll among experts. AI Matters 2014, 1, 9–11. [Google Scholar] [CrossRef]
- Questionnaire Experts Results: Future Progress in Artificial Intelligence. Available online: https://www.pt-ai.org/polls/experts (accessed on 31 May 2024).
- Korteling, J.H.; van de Boer-Visschedijk, G.C.; Blankendaal, R.A.; Boonekamp, R.C.; Eikelboom, A.R. Human-versus artificial intelligence. Front. Artif. Intell. 2021, 4, 622364. [Google Scholar] [CrossRef] [PubMed]
- Grace, K.; Salvatier, J.; Dafoe, A.; Zhang, B.; Evans, O. When will AI exceed human performance? Evidence from AI experts. J. Artif. Intell. Res. 2018, 62, 729–754. [Google Scholar] [CrossRef]
- Oracle. What Is a Chatbot? Available online: https://www.oracle.com/chatbots/what-is-a-chatbot/ (accessed on 31 May 2024).
- Adamopoulou, E.; Moussiades, L. Chatbots: History, technology, and applications. Mach. Learn. Appl. 2020, 2, 100006. [Google Scholar] [CrossRef]
- Turing, A. Computing Machinery and Intelligence (1950); Oxford University Press eBooks: Oxford, UK, 2004. [Google Scholar]
- Qammar, A.; Wang, H.; Ding, J.; Naouri, A.; Daneshmand, M.; Ning, H. Chatbots to ChatGPT in a cybersecurity space: Evolution, vulnerabilities, attacks, challenges, and future recommendations. arXiv 2023, arXiv:2306.09255. [Google Scholar]
- Weizenbaum, J. ELIZA—A computer program for the study of natural language communication between man and machine. Commun. ACM 1966, 9, 36–45. [Google Scholar] [CrossRef]
- GitHub Copilot Documentation, About GitHub Copilot—GitHub Docs. Available online: https://docs.github.com/en/copilot/about-github-copilot (accessed on 31 May 2024).
- Spataro, J. Introducing Microsoft 365 Copilot—Your Copilot for Work 2023. Available online: https://blogs.microsoft.com/blog/2023/03/16/introducing-microsoft-365-copilot-your-copilot-for-work/ (accessed on 28 July 2024).
- Introducing ChatGPT. Available online: https://openai.com/blog/chatgpt (accessed on 31 May 2024).
- Pichai, S. An Important Next Step on our AI Journey 2023. Available online: https://blog.google/technology/ai/bard-google-ai-search-updates (accessed on 31 May 2024).
- Anthropic. Introducing Claude. Available online: www.anthropic.com/news/introducing-claude (accessed on 31 May 2024).
- Følstad, A.; Brandtzaeg, P.B.; Feltwell, T.; Law, E.L.; Tscheligi, M.; Luger, E.A. SIG: Chatbots for social good. In Proceedings of the Extended Abstracts of the 2018 CHI Conference on Human Factors in Computing Systems, Montreal, QC, Canada, 21–26 April 2018; pp. 1–4. [Google Scholar]
- Misischia, C.V.; Poecze, F.; Strauss, C. Chatbots in customer service: Their relevance and impact on service quality. Procedia Comput. Sci. 2022, 201, 421–428. [Google Scholar] [CrossRef]
- Zawacki-Richter, O.; Marín, V.I.; Bond, M.; Gouverneur, F. Systematic review of research on artificial intelligence applications in higher education–where are the educators? Int. J. Educ. Technol. High. Educ. 2019, 16, 1–27. [Google Scholar] [CrossRef]
- Reis, L.; Maier, C.; Mattke, J.; Weitzel, T. Chatbots in healthcare: Status quo, application scenarios for physicians and patients and future directions. Eur. Conf. Inf. Syst. 2020, 163, 1–13. [Google Scholar]
- Gupta, M.; Akiri, C.; Aryal, K.; Parker, E.; Praharaj, L. From ChatGPT to ThreatGPT: Impact of Generative AI in Cybersecurity and Privacy. IEEE Access 2023, 11, 80218–80245. [Google Scholar]
- Zemčík, M.T. A brief history of chatbots. Destech Trans. Comput. Sci. Eng. 2019, 10, 1–5. [Google Scholar] [CrossRef] [PubMed]
- Shannon, C.E. A mathematical theory of communication. Bell Syst. Tech. J. 1948, 27, 379–423. [Google Scholar] [CrossRef]
- Garvin, P.L. The Georgetown-IBM Experiment of 1954: An Evaluation in Retrospect; Mouton: Berlin, Germany, 1967; pp. 46–56. [Google Scholar]
- Chomsky, N. Syntactic Structures; Mouton de Gruyter: Berlin, Germany, 2002. [Google Scholar]
- Foote, K.D. A Brief History of Natural Language Processing, DATAVERSITY. 2023. Available online: https://www.dataversity.net/a-brief-history-of-natural-language-processing-nlp/ (accessed on 31 May 2024).
- Fryer, L.; Carpenter, R. Bots as language learning tools. Lang. Learn. Technol. 2006, 10, 8–14. [Google Scholar]
- Foote, K.D. A Brief History of Large Language Models, DATAVERSITY. 2023. Available online: https://www.dataversity.net/a-brief-history-of-large-language-models (accessed on 31 May 2024).
- Mashette, N. Small Language Models (SLMS). Medium. 12 December 2023. Available online: https://medium.com/@nageshmashette32/small-language-models-slms-305597c9edf2 (accessed on 31 May 2024).
- Peng, Z.; Ma, X. A survey on construction and enhancement methods in service chatbots design. Ccf Trans. Pervasive Comput. Interact. 2019, 1, 204–223. [Google Scholar] [CrossRef]
- Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A survey of large language models. arXiv 2023, arXiv:2303.18223. [Google Scholar]
- Engati. Statistical Language Modeling. Available online: www.engati.com/glossary/statistical-language-modeling (accessed on 31 May 2024).
- Minaee, S.; Mikolov, T.; Nikzad, N.; Chenaghlu, M.; Socher, R.; Amatriain, X.; Gao, J. Large language models: A survey. arXiv 2024, arXiv:2402.06196. [Google Scholar]
- IBM Watson. Available online: https://www.ibm.com/watson (accessed on 31 May 2024).
- Bakarov, A. A survey of word embeddings evaluation methods. arXiv 2018, arXiv:1801.09536. [Google Scholar]
- Improving Language Understanding with Unsupervised Learning. Available online: https://openai.com/research/language-unsupervised (accessed on 31 May 2024).
- Better Language Models and Their Implications. Available online: https://openai.com/research/better-language-models (accessed on 31 May 2024).
- Manyika, J.; Hsiao, S. An overview of Bard: An early experiment with generative AI. Google Static Doc. 2023, 1–9. Available online: https://gemini.google/overview-gemini-app.pdf (accessed on 31 January 2024).
- Naveed, H.; Khan, A.U.; Qiu, S.; Saqib, M.; Anwar, S.; Usman, M.; Barnes, N.; Mian, A. A comprehensive overview of large language models. arXiv 2023, arXiv:2307.06435. [Google Scholar]
- Hadi, M.U.; Qureshi, R.; Shah, A.; Irfan, M.; Zafar, A.; Shaikh, M.B.; Akhtar, N.; Wu, J.; Mirjalili, S. Large language models: A comprehensive survey of its applications, challenges, limitations, and future prospects. Authorea Prepr. 2023, 1–45. [Google Scholar] [CrossRef]
- Ambika. Large Language Models (LLMs): A Brief History, Applications and Challenges. Available online: https://blog.gopenai.com/large-language-models-llms-a-brief-history-applications-challenges-c2fab10fa2e7 (accessed on 31 May 2024).
- Verma, A. Self-Attention Mechanism Transformers. Medium. 2023. Available online: https://medium.com/@averma9838/self-attention-mechanism-transformers-41d1afea46cf (accessed on 31 May 2024).
- Vaniukov, S. NLP vs LLM: A Comprehensive Guide to Understanding Key Differences. Medium. 2024. Available online: https://medium.com/@vaniukov.s/nlp-vs-llm-a-comprehensive-guide-to-understanding-key-differences-0358f6571910 (accessed on 31 May 2024).
- Bates, M. Models of natural language understanding. Proc. Natl. Acad. Sci. USA 1995, 92, 9977–9982. [Google Scholar] [CrossRef]
- Wu, T.; He, S.; Liu, J.; Sun, S.; Liu, K.; Han, Q.L.; Tang, Y. A brief overview of ChatGPT: The history, status quo and potential future development. IEEE Caa J. Autom. Sin. 2023, 10, 1122–1136. [Google Scholar] [CrossRef]
- Ahmad, N.A.; Che, M.H.; Zainal, A.; Abd Rauf, M.F.; Adnan, Z. Review of chatbots design techniques. Int. J. Comput. Appl. 2018, 181, 7–10. [Google Scholar]
- Devakunchari, R.; Agarwal, R.; Agarwal, E. A survey of Chatbot design techniques. Int. J. Eng. Adv. Technol. (IJEAT) 2019, 8, 35–39. [Google Scholar]
- Shawar, B.A.; Atwell, E. A Comparison between Alice and Elizabeth Chatbot Systems; University of Leeds, School of Computing Research Report: Leeds, UK, 2002; pp. 1–13. [Google Scholar]
- Mittal, M.; Battineni, G.; Singh, D.; Nagarwal, T.; Yadav, P. Web-based chatbot for frequently asked queries (FAQ) in hospitals. J. Taibah Univ. Med. Sci. 2021, 16, 740–746. [Google Scholar] [CrossRef] [PubMed]
- Thoppilan, R.; De Freitas, D.; Hall, J.; Shazeer, N.; Kulshreshtha, A.; Cheng, H.T.; Jin, A.; Bos, T.; Baker, L.; Du, Y.; et al. Lamda: Language models for dialog applications. arXiv 2022, arXiv:2201.08239. [Google Scholar]
- Zheng, X.; Zhang, C.; Woodland, P.C. Adapting GPT, GPT-2 and BERT language models for speech recognition. In Proceedings of the IEEE Automatic Speech Recognition and Understanding Workshop (ASRU), Cartagena, Colombia, 13–17 December 2021; pp. 1–7. [Google Scholar]
- Ferrucci, D.; Brown, E.; Chu-Carroll, J.; Fan, J.; Gondek, D.; Kalyanpur, A.A.; Lally, A.; Murdock, J.W.; Nyberg, E.; Prager, J.; et al. Building Watson: An overview of the DeepQA project. AI Mag. 2010, 31, 59–79. [Google Scholar]
- Sharma, V.; Goyal, M.; Malik, D. An intelligent behaviour shown by Chatbot system. Int. J. New Technol. Res. 2017, 3, 263312. [Google Scholar]
- Pereira, M.J.; Coheur, L.; Fialho, P.; Ribeiro, R. Chatbots’ greetings to human-computer communication. arXiv 2016, arXiv:1609.06479. [Google Scholar]
- Carpenter, R.; Freeman, J. Computing machinery and the individual: The personal Turing test. Computing 2005, 22, 1–4. [Google Scholar]
- Giacaglia, G. How IBM Watson Works. Medium. 2021. Available online: https://medium.com/@giacaglia/how-ibm-watson-works-40d8d5185ac8 (accessed on 31 May 2024).
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. Llama: Open and efficient foundation language models. arXiv 2023, arXiv:2302.13971. [Google Scholar]
- Ahmed, I.; Roy, A.; Kajol, M.; Hasan, U.; Datta, P.P.; Reza, M.R. ChatGPT vs. Bard: A comparative study. Authorea Prepr. 2023, 1–18. [Google Scholar] [CrossRef]
- Yu, C. PaddlePaddle/ERNIE. 2024. Available online: https://github.com/hotpads/ERNIE-for-the-Rest-of-Us (accessed on 19 June 2024).
- Rudolph, J.; Tan, S.; Tan, S. War of the chatbots: Bard, Bing Chat, ChatGPT, Ernie and beyond. The new AI gold rush and its impact on higher education. J. Appl. Learn. Teach. 2023, 6, 364–389. [Google Scholar] [CrossRef]
- XAI. Open Release of Grok-1. 2024. Available online: https://x.ai/blog/grok-os (accessed on 12 July 2024).
- Nguyen, T.T.; Nguyen, Q.V.H.; Nguyen, D.T.; Nguyen, D.T.; Huynh-The, T.; Nahavandi, S.; Nguyen, T.T.; Pham, Q.V.; Nguyen, C.M. Deep learning for deepfakes creation and detection: A survey. Comput. Vis. Image Underst. 2023, 103525. [Google Scholar]
- Kalla, D.; Kuraku, S. Advantages, disadvantages and risks associated with chatgpt and ai on cybersecurity. J. Emerg. Technol. Innov. Res. 2023, 10, 85–94. [Google Scholar]
- TheStreet Guest Contributor. We Asked a Chatbot Why It’s So Dangerous. 2023. Available online: https://www.thestreet.com/technology/we-asked-a-chatbot-why-its-so-dangerous (accessed on 31 May 2024).
- Liu, B.; Xiao, B.; Jiang, X.; Cen, S.; He, X.; Dou, W. Adversarial Attacks on Large Language Model-Based System and Mitigating Strategies: A case study on ChatGPT. Secur. Commun. Netw. 2023, 8691095, 1–10. [Google Scholar] [CrossRef]
- Zhu, K.; Wang, J.; Zhou, J.; Wang, Z.; Chen, H.; Wang, Y.; Yang, L.; Ye, W.; Gong, N.Z.; Zhang, Y.; et al. Promptbench: Towards evaluating the robustness of large language models on adversarial prompts. arXiv 2023, arXiv:2306.04528. [Google Scholar]
- Sern, L.J.; David, Y.G.P.; Hao, C.J. PhishGAN: Data augmentation and identification of homoglyph attacks. In Proceedings of the IEEE International Conference on Communications, Computing, Cybersecurity, and Informatics (CCCI), Sharjah, United Arab Emirates, 3–5 November 2020; pp. 1–6. [Google Scholar]
- Liu, Y.; Jia, Y.; Geng, R.; Jia, J.; Gong, N.Z. Prompt injection attacks and defenses in llm-integrated applications. arXiv 2023, arXiv:2310.12815. [Google Scholar]
- Bilika, D.; Michopoulou, N.; Alepis, E.; Patsakis, C. Hello me, meet the real me: Audio deepfake attacks on voice assistants. arXiv 2023, arXiv:2302.10328. [Google Scholar]
- Vaidya, T.; Zhang, Y.; Sherr, M.; Shields, C. Cocaine noodles: Exploiting the gap between human and machine speech recognition. In Proceedings of the 9th USENIX Workshop on Offensive Technologies (WOOT), Washington, DC, USA, 10–11 August 2015; pp. 1–14. [Google Scholar]
- Huber, M.; Kowalski, S.; Nohlberg, M.; Tjoa, S. Towards automating social engineering using social networking sites. In Proceedings of the IEEE International Conference on Computational Science and Engineering, Vancouver, BC, Canada, 29–31 August 2009; Volume 3, pp. 117–124. [Google Scholar]
- Elsayed, G.F.; Goodfellow, I.; Sohl-Dickstein, J. Adversarial reprogramming of neural networks. arXiv 2018, arXiv:1806.11146. [Google Scholar]
- Wang, J.; Hu, X.; Hou, W.; Chen, H.; Zheng, R.; Wang, Y.; Yang, L.; Huang, H.; Ye, W.; Geng, X.; et al. On the robustness of chatgpt: An adversarial and out-of-distribution perspective. arXiv 2023, arXiv:2302.12095. [Google Scholar]
- Shokri, R.; Stronati, M.; Song, C.; Shmatikov, V. Membership inference attacks against machine learning models. In Proceedings of the IEEE Symposium on Security and Privacy (SP), San Jose, CA, USA, 22–24 May 2017; pp. 3–18. [Google Scholar]
- Wan, A.; Wallace, E.; Shen, S.; Klein, D. Poisoning language models during instruction tuning. In Proceedings of the International Conference on Machine Learning, Honolulu, HI, USA, 23–29 July 2023; pp. 35413–35425. [Google Scholar]
- Li, J.; Yang, Y.; Wu, Z.; Vydiswaran, V.G.; Xiao, C. Chatgpt as an attack tool: Stealthy textual backdoor attack via blackbox generative model trigger. arXiv 2023, arXiv:2304.14475. [Google Scholar]
- Liu, T.; Deng, Z.; Meng, G.; Li, Y.; Chen, K. Demystifying rce vulnerabilities in llm-integrated apps. arXiv 2023, arXiv:2309.02926. [Google Scholar]
- Carlini, N.; Tramer, F.; Wallace, E.; Jagielski, M.; Herbert-Voss, A.; Lee, K.; Roberts, A.; Brown, T.; Song, D.; Erlingsson, U.; et al. Extracting training data from large language models. In Proceedings of the 30th USENIX Security Symposium (USENIX Security 21), Online, 11–13 August 2021; pp. 2633–2650. [Google Scholar]
- ONeal, A.J. Chat GPT “DAN” (and other “Jailbreaks”). Available online: https://gist.github.com/coolaj86/6f4f7b30129b0251f61fa7baaa881516 (accessed on 31 May 2024).
- Ye, W.; Li, Q. Chatbot security and privacy in the age of personal assistants. In Proceedings of the IEEE/ACM Symposium on Edge Computing (SEC), San Jose, CA, USA, 12–14 November 2020; pp. 388–393. [Google Scholar]
- Yao, Y.; Duan, J.; Xu, K.; Cai, Y.; Sun, Z.; Zhang, Y. A survey on large language model (llm) security and privacy: The good, the bad, and the ugly. High-Confid. Comput. 2024, 4, 1–24. [Google Scholar] [CrossRef]
- Li, H.; Guo, D.; Fan, W.; Xu, M.; Huang, J.; Meng, F.; Song, Y. Multi-step jailbreaking privacy attacks on chatgpt. arXiv 2023, arXiv:2304.05197. [Google Scholar]
- Yu, J.; Lin, X.; Xing, X. Gptfuzzer: Red teaming large language models with auto-generated jailbreak prompts. arXiv 2023, arXiv:2309.10253. [Google Scholar]
- Pa Pa, Y.M.; Tanizaki, S.; Kou, T.; Van Eeten, M.; Yoshioka, K.; Matsumoto, T. An attacker’s dream? Exploring the capabilities of chatgpt for developing malware. In Proceedings of the 16th Cyber Security Experimentation and Test Workshop, Marina del Rey, CA, USA, 7–8 August 2023; pp. 10–18. [Google Scholar]
- Alawida, M.; Abu Shawar, B.; Abiodun, O.I.; Mehmood, A.; Omolara, A.E.; Al Hwaitat, A.K. Unveiling the dark side of chatgpt: Exploring cyberattacks and enhancing user awareness. Information 2024, 15, 27. [Google Scholar] [CrossRef]
- Happe, A.; Cito, J. Getting pwn’d by ai: Penetration testing with large language models. In Proceedings of the ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, San Francisco, CA, USA, 3–9 December 2023; pp. 2082–2086. [Google Scholar]
- Roy, S.S.; Thota, P.; Naragam, K.V.; Nilizadeh, S. From Chatbots to PhishBots?–Preventing Phishing scams created using ChatGPT, Google Bard and Claude. arXiv 2023, arXiv:2310.19181. [Google Scholar]
- Beckerich, M.; Plein, L.; Coronado, S. Ratgpt: Turning online llms into proxies for malware attacks. arXiv 2023, arXiv:2308.09183. [Google Scholar]
- Liu, Y.; Deng, G.; Xu, Z.; Li, Y.; Zheng, Y.; Zhang, Y.; Zhao, L.; Zhang, T.; Wang, K.; Liu, Y. Jailbreaking chatgpt via prompt engineering: An empirical study. arXiv 2023, arXiv:2305.13860. [Google Scholar]
- Bai, Y.; Jones, A.; Ndousse, K.; Askell, A.; Chen, A.; DasSarma, N.; Drain, D.; Fort, S.; Ganguli, D.; Henighan, T.; et al. Training a helpful and harmless assistant with reinforcement learning from human feedback. arXiv 2022, arXiv:2204.05862. [Google Scholar]
- Roy, S.S.; Naragam, K.V.; Nilizadeh, S. Generating phishing attacks using chatgpt. arXiv 2023, arXiv:2305.05133. [Google Scholar]
- Si, W.M.; Backes, M.; Blackburn, J.; De Cristofaro, E.; Stringhini, G.; Zannettou, S.; Zhang, Y. Why so toxic? Measuring and triggering toxic behavior in open-domain chatbots. In Proceedings of the ACM SIGSAC Conference on Computer and Communications Security, Los Angeles, CA, USA, 7–11 November 2022; pp. 2659–2673. [Google Scholar]
- Liu, Y.; Deng, G.; Li, Y.; Wang, K.; Zhang, T.; Liu, Y.; Wang, H.; Zheng, Y.; Liu, Y. Prompt Injection attack against LLM-integrated Applications. arXiv 2023, arXiv:2306.05499. [Google Scholar]
- Ba, Z.; Zhong, J.; Lei, J.; Cheng, P.; Wang, Q.; Qin, Z.; Wang, Z.; Ren, K. SurrogatePrompt: Bypassing the Safety Filter of Text-To-Image Models via Substitution. arXiv 2023, arXiv:2309.14122. [Google Scholar]
- Charfeddine, M.; Kammoun, H.M.; Hamdaoui, B.; Guizani, M. ChatGPT’s Security Risks and Benefits: Offensive and Defensive Use-Cases, Mitigation Measures, and Future Implications. IEEE Access 2024, 12, 30263–30310. [Google Scholar] [CrossRef]
- Chen, B.; Paliwal, A.; Yan, Q. Jailbreaker in jail: Moving target defense for large language models. In Proceedings of the 10th ACM Workshop on Moving Target Defense, Copenhagen, Denmark, 26 November 2023; pp. 29–32. [Google Scholar]
- Robey, A.; Wong, E.; Hassani, H.; Pappas, G.J. Smoothllm: Defending large language models against jailbreaking attacks. arXiv 2023, arXiv:2310.03684. [Google Scholar]
- Kumar, A.; Agarwal, C.; Srinivas, S.; Feizi, S.; Lakkaraju, H. Certifying llm safety against adversarial prompting. arXiv 2023, arXiv:2309.02705. [Google Scholar]
- Baudart, G.; Dolby, J.; Duesterwald, E.; Hirzel, M.; Shinnar, A. Protecting chatbots from toxic content. In Proceedings of the ACM SIGPLAN International Symposium on New Ideas, New Paradigms, and Reflections on Programming and Software, Boston, MA, USA, 7–8 November 2018; pp. 99–110. [Google Scholar]
- Arora, A.; Arora, A.; McIntyre, J. Developing chatbots for cyber security: Assessing threats through sentiment analysis on social media. Sustainability 2023, 15, 13178. [Google Scholar] [CrossRef]
- Edu, J.; Mulligan, C.; Pierazzi, F.; Polakis, J.; Suarez-Tangil, G.; Such, J. Exploring the security and privacy risks of chatbots in messaging services. In Proceedings of the ACM internet Measurement Conference, Nice, France, 25–27 October 2022; pp. 581–588. [Google Scholar]
- Malik, K.M.; Malik, H.; Baumann, R. Towards vulnerability analysis of voice-driven interfaces and countermeasures for replay attacks. In Proceedings of the IEEE Conference on Multimedia Information Processing and Retrieval (MIPR), San Jose, CA, USA, 28–30 March 2019; pp. 523–528. [Google Scholar]
- Lempinen, M.; Juntunen, A.; Pyyny, E. Chatbot for Assessing System Security with OpenAI GPT-3.5. Bachelor’s Thesis, University of Oulu Repository, Oulu, Finland, 2023; pp. 1–34. Available online: https://oulurepo.oulu.fi/handle/10024/42952 (accessed on 31 May 2024).
- Yamin, M.M.; Hashmi, E.; Ullah, M.; Katt, B. Applications of LLMs for Generating Cyber Security Exercise Scenarios. Res. Sq. 2024, 1–17. [Google Scholar]
- Franco, M.F.; Rodrigues, B.; Scheid, E.J.; Jacobs, A.; Killer, C.; Granville, L.Z.; Stiller, B. SecBot: A Business-Driven Conversational Agent for Cybersecurity Planning and Management. In Proceedings of the 16th International Conference on Network and Service Management (CNSM), Izmir, Turkey, 2–6 November 2020; pp. 1–7. [Google Scholar]
- Liu, Y.; Yao, Y.; Ton, J.F.; Zhang, X.; Cheng, R.G.H.; Klochkov, Y.; Taufiq, M.F.; Li, H. Trustworthy LLMs: A Survey and Guideline for Evaluating Large Language Models’ Alignment. arXiv 2023, arXiv:2308.05374. [Google Scholar]
- Hadi, M.U.; Qureshi, R.; Shah, A.; Irfan, M.; Zafar, A.; Shaikh, M.B.; Akhtar, N.; Wu, J.; Mirjalili, S. A survey on large language models: Applications, challenges, limitations, and practical usage. Authorea Prepr. 2023, 1–31. [Google Scholar]
- Wolf, Y.; Wies, N.; Avnery, O.; Levine, Y.; Shashua, A. Fundamental limitations of alignment in large language models. arXiv 2023, arXiv:2304.11082. [Google Scholar]
- Kadavath, S.; Conerly, T.; Askell, A.; Henighan, T.; Drain, D.; Perez, E.; Schiefer, N.; Hatfield-Dodds, Z.; DasSarma, N.; Tran-Johnson, E.; et al. Language models (mostly) know what they know. arXiv 2022, arXiv:2207.05221. [Google Scholar]
- Lin, S.; Hilton, J.; Evans, O. Teaching models to express their uncertainty in words. arXiv 2022, arXiv:2205.14334. [Google Scholar]
- Montalbano, E. ChatGPT Hallucinations Open Developers to Supply Chain Malware Attacks. 2023. Available online: https://www.darkreading.com/application-security/chatgpt-hallucinations-developers-supply-chain-malware-attacks (accessed on 31 May 2024).
- Huang, L.; Yu, W.; Ma, W.; Zhong, W.; Feng, Z.; Wang, H.; Chen, Q.; Peng, W.; Feng, X.; Qin, B.; et al. A survey on hallucination in large language models: Principles, taxonomy, challenges, and open questions. arXiv 2023, arXiv:2311.05232. [Google Scholar]
- Qiu, H.; Zhang, S.; Li, A.; He, H.; Lan, Z. Latent jailbreak: A benchmark for evaluating text safety and output robustness of large language models. arXiv 2023, arXiv:2307.08487. [Google Scholar]
- Li, Z.; Peng, B.; He, P.; Yan, X. Evaluating the instruction-following robustness of large language models to prompt injection. arXiv 2023, arXiv:2308.10819. [Google Scholar]
Event | Year | Significance |
---|---|---|
Development of n-gram models | 1940s | Used for intent detection in chatbots and LLMs (a toy bigram example follows this table) |
Origins of NLP | 1950s | NLP allows chatbots to understand and respond to human speech |
Growth of NLP computational power | 1980s | NLP starts to use machine learning algorithms |
Development of statistical LMs | 1990s | Useful for validation and alignment of models |
Development of neural LMs | 2000s | Helpful for completion of machine translation tasks |
Development of word embeddings | 2010s | Useful for response generation in chatbots and LLMs |
Development of transformer architecture | 2017 | Can be used to improve a chatbot’s or LLM’s contextual understanding abilities. |
Development of pretrained LMs | 2018 | Helpful for completion of NLP tasks. |
Release of the first GPT model | 2018 | The model did not use task-specific training. GPT family of models used for ChatGPT. |
Development of general-purpose models | 2020s | Skilled at a variety of tasks. |
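As a toy illustration of the table’s first row, the snippet below estimates bigram (n = 2) next-word probabilities from raw counts. The two-sentence corpus is invented for the example; real n-gram systems add smoothing and larger n.

```python
# Toy bigram language model: P(next | prev) from adjacent-word counts.
from collections import Counter, defaultdict

corpus = "how can i help you today . how can i reset my password .".split()

counts = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    counts[prev][nxt] += 1  # tally each observed (prev, next) pair

def p_next(prev: str, nxt: str) -> float:
    """Maximum-likelihood estimate of P(next | prev)."""
    total = sum(counts[prev].values())
    return counts[prev][nxt] / total if total else 0.0

print(p_next("can", "i"))   # 1.0 - "can" is always followed by "i" here
print(p_next("i", "help"))  # 0.5 - "i" is followed by "help" or "reset"
```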
Chatbot | Year | Significance |
---|---|---|
ELIZA | 1966 | Widely considered to be the first chatbot. |
PARRY | 1972 | Brought awareness to ethical considerations when using chatbots for healthcare purposes. |
Jabberwacky | 1980s | Employed AI in a more advanced form. Combined AI and pattern matching to achieve its functionality. |
Dr. Sbaitso | 1991 | Demonstrated ability of sound cards developed by Creative Labs. Communicated using speech. |
ALICE | 1995 | First online chatbot with web discussion ability. |
Watson | 2007–Present | There have been many iterations of Watson, including a chatbot and a platform for management of AI models. |
Microsoft Copilot | 2023 | Boosts productivity in Microsoft 365 by automating tasks and providing smart insights. |
ChatGPT | 2022 | One of the most widely used chatbots and can assist with a variety of tasks. |
LLaMA | 2023 | Has open-source accessibility, scalability, efficient performance, and contribution to ethical AI development and research. |
Gemini | 2023 | Has a web-searching ability and can provide up-to-date information. |
Claude | 2023 | Can express tone, personality, and behavior based on instructions. |
Ernie | 2023 | Has multilingual support, integration of extensive knowledge, applicability across various industries, and enhancement of user experience. |
Grok | 2023 | Delivers real-time knowledge through the X social media platform, elevates user experiences, and is willing to respond to spicy questions typically rejected by most other chatbots. |
Chatbot | Year | Approach | Functionalities | Strengths | Weaknesses |
---|---|---|---|---|---|
ELIZA | 1966 | Pattern matching, Early AI & ML | ELIZA simulated human speech with a pattern-matching algorithm and substitution technique, following scripted instructions to generate responses. It matched input against a set of rules to select the right reply (illustrated in the sketch after this table). | Simple implementation, engaged users. | Only had information on a singular topic, not flexible/adaptable, limited conversational patterns. |
PARRY | 1972 | Pattern matching | PARRY analyzed user input for key phrases alongside pattern matching to understand intent, tailoring responses based on the importance of words in the conversation. | Strong controlling structure, could adapt responses based on weight variance in user’s prompts. | Low language understanding, incapable of learning from conversation, computationally slow. |
Jabberwacky | 1981 | ML and NLP | Jabberwacky uses a contextual pattern-matching algorithm that matches the user’s input with the appropriate response. Its use of ML allows it to retain knowledge, which increases the number of potential responses. | Improved conversational flow, provided a semi-personal experience by remembering some data. | Computationally slow, cannot handle many users at one time. |
Dr. Sbaitso | 1991 | Text-to-speech synthesis | Dr. Sbaitso uses several sound cards to generate speech. | Its verbal communication made it more “human” and engaging than its predecessors. | Limited conversational abilities due to small set of possible responses. |
ALICE | 1995 | NLP, pattern matching, rule-based systems | ALICE understands the user input. Pattern matching is used to find keywords in the user input. ALICE also uses a ruleset defined by the developer to dictate its responses. | Did not need to know the context of whole conversation. Could discuss a variety of topics. | Limited natural language understanding, inflexible. |
Watson | 2007 | DeepQA | Watson uses the DeepQA approach. Uses content acquisition, question analysis, hypothesis generation, soft filtering, hypothesis scoring, final merging, answer merging, and ranking estimation. | Natural language processing, parallel computing, scalability. | Lack of common sense, resource intensive, dependency on training data. |
ChatGPT | 2022 | Transformer architecture | First, a model is trained with supervised fine-tuning. Then, a reward model is trained with comparison data from sampled and labeled outputs. Finally, a policy is optimized against the reward model using reinforcement learning. | Large knowledge base, adaptable, availability, natural language understanding, text generation. | Vulnerable to bias, unable to verify information. |
Gemini | 2023 | LaMDA | Gemini is pretrained on large publicly available data, enabling it to learn language patterns. Upon receiving context, Gemini creates drafts of potential responses, which are checked against safety parameters. Uses RLHF to improve itself. | Text generation, natural language understanding, access to up-to-date information. | Vulnerable to bias, inconsistent responses, does not provide sources. |
LLaMA | 2023 | Transformer architecture | It is trained on over 15 trillion tokens with quality control, including substantial code and non-English data from 30 languages. Excels in chat, code generation, creative composition, logical reasoning, and multimodal abilities. | Highly efficient and scalable in various tasks like text generation, translation, and summarization. | Vulnerable to data bias, not for calculus/statistics, limited real-time access, resource-intensive. |
Ernie | 2023 | Transformer architecture | Uses a masking pretraining strategy with knowledge graphs and multitasking. Analyzes sentiment, identifies entities, and classifies or generates text. Answers questions on context, measures textual similarity, retrieves information, and translates languages. | Enhanced contextual understanding, strong performance, multilingual capabilities, designed for various applications. | Data bias, resource-intensive, stronger performance in Chinese language, limited access. |
Grok | 2023 | Transformer architecture | It was trained on a huge amount of text data with unsupervised learning. It was designed to handle a wide range of questions, generate text, assist with code, and combine text and images in queries. Uses RLHF to improve itself. | Real-time information, individual user preferences, multilingual support, personalized recommendations, humor, excels in math and reasoning, open-source. | Contextual limitations, potential for bias, misunderstanding complex queries, resource intensive, lack of emotional intelligence, safety and ethical concerns. |
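The pattern matching described in the ELIZA row above (and in Section 3.2.1) can be sketched in a few lines of Python. The two rules below are invented stand-ins rather than Weizenbaum’s original DOCTOR script: input is checked against ordered regular-expression rules, captured text is substituted into a canned reply, and a fallback covers the no-match case.

```python
# ELIZA-style pattern matching with invented example rules.
import re

RULES = [
    (re.compile(r"\bi am (.+)", re.I), "Why do you say you are {0}?"),
    (re.compile(r"\bi feel (.+)", re.I), "How long have you felt {0}?"),
]
FALLBACK = "Please tell me more."

def reply(user_input: str) -> str:
    for pattern, template in RULES:      # first matching rule wins
        m = pattern.search(user_input)
        if m:
            return template.format(m.group(1).rstrip(".!?"))
    return FALLBACK

print(reply("I am worried about exams."))  # Why do you say you are worried about exams?
print(reply("Nice weather today."))        # Please tell me more.
```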
Positive Effect | Description | Negative Effect | Description
---|---|---|---
Increased autonomy and independence | Instead of relying on others to complete tasks or obtain knowledge, an individual can converse with a chatbot. | Improper handling of personal information | Chatbots may not effectively protect personal information if they do not have mechanisms that prioritize the protection of personal information. |
Increase in available knowledge | Chatbots make the process of gaining new knowledge easier, as they have interfaces that make interaction accessible for users of all skill levels. | Generation of incorrect information | There is a risk chatbots may generate incorrect information. |
Human connection | There are chatbot-powered platforms that can connect like-minded individuals. | Ethical concerns in academia | Chatbots can be used to plagiarize papers and exams. If an AI writing checker is used, papers could also be incorrectly flagged. |
Cyber defense | Chatbots can be used for a variety of cyber defenses. Some of these defensive tools are augmented versions of existing tools, while others are novel tools that leverage the unique abilities of chatbots and LLMs. | Cyber offense | Chatbots can be used for a variety of cyber offenses. Some of these offensive tools are augmented versions of existing tools, while others are novel tools that leverage the unique abilities of chatbots and LLMs.
Applications in customer service industry | Customer service chatbots produce high-quality, easy to understand, and typically accurate answers to a customer’s questions. | Inadequate at providing emotional support | Not all chatbots are adequate at providing emotional support. Relying on the wrong chatbot could have serious consequences. |
Educational benefits | Chatbots can help answer questions, review papers, and provide information at the user’s own pace. | Dependency on technology | Over-reliance on chatbots may diminish human skills. |
Applications in healthcare | Chatbots are frequently used for tasks such as diagnosis, collecting information on a patient, interpreting medical images, and documentation. | Potential to produce biased answers | Depending on the data used during the training phase, chatbots may produce biased responses. Chatbots trained on data containing biased content are more likely to produce biased responses. |
Attack Name | Description | Related Work(s) |
---|---|---|
Social engineering attack | Chatbots are being used to generate social engineering attempts. | Leverages a chatbot’s ability to generate convincing, human-like text [22]. |
Phishing attack | Exploits a chatbot’s ability to generate human-like speech to mimic a legitimate line of communication. | MJP is a multistep jailbreaking prompt attack that uses multiple prompts to bypass a chatbot’s content filters [83]. GPTFUZZER automates generation of jailbreaking templates [84]. |
Ransomware and malware generation attacks | Chatbots are being used to generate malware and ransomware. Adversaries achieve this by manipulating the prompts given to chatbots, deceiving them into generating malicious code. | The GPT family of models can generate different types of malware, varying in complexity [85]. |
Macros and living off the land binary attacks | Occur when a victim downloads a chatbot-generated spreadsheet that executes a macro, allowing the adversary to gain access to their machine. | Chatbots are capable of writing macros that launch malware using trusted system tools [86]. |
SQL injection attack | Occurs when a chatbot is used to generate code with an injected payload. | Chatbots can generate code to be injected into an application to access sensitive data [86]. |
Low-level privilege access attack | A chatbot is asked to generate commands for the desired privilege escalation attack. | Chatbots are capable of generating vulnerable commands that are fed back to the victim machine for privilege escalation [87]. |
Work | Year | Dataset/Benchmark | Contributions | Strengths/Advantages | Weaknesses/Limitations |
---|---|---|---|---|---|
[89] | 2023 | N/A | Uses plugins with ChatGPT to establish a proxy between an adversary and the victim. | Proof of concept shows significant concerns regarding LLM security. | No practical experiments to support claims. |
[22] | 2023 | N/A | Explores attacks on and using ChatGPT, as well as defenses. | Provides an in-depth discussion of the security concerns of ChatGPT, and supports claims with screenshots of adversarial prompting experiments. | Only discusses ChatGPT, so findings may not be applicable to other LLMs. |
[83] | 2023 | Enron Email, PII from institutional pages | A novel multistep jailbreaking prompt that bypasses ChatGPT’s content filters. | The jailbreaking prompt is highly effective in some use cases and can be used in combination with other adversarial methods to achieve even better performance. | Experiences low accuracy in some cases. Free-form extraction method for the New Bing can lead to repeated or incorrect patterns. |
[84] | 2023 | Human-written jailbreaking prompts gathered from the Internet for initial seed, datasets from [90,91] | Introduces a novel jailbreak fuzzing framework to automate generation of jailbreaking templates. | Efficiently produces jailbreaking templates with high ASR. Generated templates consistently outperform human-generated jailbreaking templates. | Requires human intervention for the initial seed. Ignores question transformation, so prompts risk rejection by keyword matching. |
[77] | 2023 | SST-2, AGNews, Amazon, Yelp, IMDB | BGMAttack, a textual backdoor attack method that uses a text generative model as a trigger. | Maintains comparable performance to other similar attack methods but is stealthier. | Requires human cognition evaluations to verify efficacy. The ChatGPT API’s instability may cause performance issues. |
[92] | 2023 | N/A | Discusses and demonstrates the use of ChatGPT for a variety of phishing attacks. | Thorough descriptions of attacks, figures and screenshots to support claims. | The work needs more practical experiments, like testing the phishing components in real-world testing conditions. |
[85] | 2023 | Collection of jailbreaking prompts | Investigates the use of chatbots for development of malware. | Provides in-depth discussion and analysis of malware generation capabilities of chatbots. | Limited scope. |
[93] | 2022 | 4chan and Reddit datasets | Explores how chatbots can potentially generate toxic answers when given non-toxic prompts. | The ToxicBuddy attack is a novel method that focuses on the generation of non-toxic prompts to generate toxic answers. It highlights that any prompt given to a chatbot has toxic potential. | Relies on tools that can be biased. The definition of toxic is subjective, making the results and thresholds used in this work subjective as well. |
[74] | 2023 | AdvGLUE, ANLI, Flipkart Review, DDXPlus | Evaluates robustness of ChatGPT from adversarial and out-of-distribution perspectives. | Provides strong, detailed evaluation of several commercially available LLMs. | Does not evaluate all chatbots’ capabilities and may be invalid due to the small dataset size. |
[94] | 2024 | N/A | Proposes HouYi, a black-box prompt injection attack inspired by web injection attacks. | Proposed attack method is very effective and has identified vulnerabilities confirmed by vendors. | Lacks qualitative data. |
[95] | 2023 | N/A | Proposes a framework to generate attack prompts to bypass Midjourney’s safety filter. | 88% ASR when attacking Midjourney, multimodal approach. | Struggles to pass filters when asking for images of explicit content. |
Defense Name | Description | Related Work(s) |
---|---|---|
Defense automation | Chatbots can be used to automate a variety of defenses, relieving the burden on professionals | Chatbots can be used for advice on how to avoid dangerous scripts, education of analysts, and detecting attacks [22] |
Security bug detection | Chatbots can be used to review code/applications to detect security bugs | ChatGPT can be used to perform in-depth code reviews, and is effective at doing so due to its in-depth security knowledge [22] |
Secure code generation | Chatbots can be used to either develop secure code or analyze existing code | ChatGPT is skilled at developing secured code due to its in depth security knowledge [22] |
Policy development | Chatbots can be used to develop security policies for an organization | ChatGPT is skilled at policy writing due to its in-depth security knowledge [96] |
Security questionnaires | Chatbots can be used to speed up preparing security questionnaires | ChatGPT can speed up generating security questionnaires [96] |
Incident response and recovery | For incident response and recovery, chatbots can be used to analyze data obtained during the incident, notify the appropriate parties of an incident, review and revise the incident response plan, and provide a summary of the incident | Chatbots expedite and improve the incident response process [96] |
Corpora cleaning | Training data are sent through a pipeline to remove undesirable content from the training corpora (a simplified sketch follows this table) | Cleaning an LLM’s training corpora is important to remove flawed or toxic information [82]
Use of a robust training method | The use of a robust training method makes chatbots/LLMs less susceptible to attacks, as well as improves their safety and alignment | Robust training increases LLM resilience against certain text-based attacks [82] |
Proper instruction preprocessing practices | Proper instruction preprocessing practices reduce the chance that a model will be maliciously used or receive suspicious instructions | This practice leads to higher-quality data, reduces the chance a model will be used maliciously, and increases the algorithm’s readability [82]
Education and awareness practices | It is important to educate the appropriate parties on proper use cases and protocols when using chatbots/LLMs | It is important that security personnel are well trained, as they are the first line of defense against attacks [82]
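As a deliberately simplified illustration of the corpora-cleaning row above, the sketch below pipes training documents through PII masking, deduplication, and a blocklist filter. The single email regex and one-word blocklist are assumptions made for brevity; production pipelines rely on trained toxicity classifiers, near-duplicate detection, and far more thorough PII scrubbing.

```python
# Toy corpora-cleaning pipeline: mask PII, deduplicate, drop blocked content.
import re

BLOCKLIST = {"badword"}  # assumed toy blocklist

def strip_pii(text: str) -> str:
    """Mask email-address-like strings (one simple PII rule for illustration)."""
    return re.sub(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b", "[EMAIL]", text)

def is_clean(text: str) -> bool:
    words = set(re.findall(r"[a-z']+", text.lower()))
    return not (words & BLOCKLIST)

def clean_corpus(docs: list[str]) -> list[str]:
    seen, out = set(), []
    for doc in docs:
        doc = strip_pii(doc.strip())
        if doc and doc not in seen and is_clean(doc):  # dedupe + filter
            seen.add(doc)
            out.append(doc)
    return out

docs = ["Contact me at alice@example.com", "Contact me at alice@example.com",
        "this contains badword", "perfectly fine sentence"]
print(clean_corpus(docs))  # ['Contact me at [EMAIL]', 'perfectly fine sentence']
```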
Work | Year | Dataset/Benchmark | Contributions | Strengths/Advantages | Weaknesses/Limitations |
---|---|---|---|---|---|
[97] | 2023 | Set of 4 adversarial prompts | A moving target defense LLM system that protects LLMs against adversarial prompting attacks with a 100% success rate | Introduces a highly successful defense mechanism. | Inadequate experimental evaluation and analysis
[98] | 2023 | AdvBench | First-of-its-kind algorithm for mitigating jailbreaking attacks on LLMs | SmoothLLM: efficient, versatile, and defends against unforeseen prompt attacks. | Smoothing process may introduce noise, may not be applicable to other types of attacks |
[99] | 2024 | AdvBench | Erase-and-check framework, the first to defend against adversarial prompts with safety guarantees (see the sketch after this table) | Easily adaptable and safe. A promising direction for defenses against adversarial prompts. | Computationally expensive, relies heavily on accurate safety filters
[88] | 2024 | Collections of phishing prompts obtained using several different techniques | A BERT-based tool that detects malicious prompts to reduce the likelihood LLMs will generate phishing content | The BERT-based approach is highly effective against several prompting scenarios that may provoke LLMs into generating phishing. | This defense targets text-based phishing and potential phishing, but not attacks requiring user interaction, e.g., reCAPTCHA or browser-in-browser attacks
[100] | 2018 | N/A | The BotShield framework is designed to defend chatbots against toxic content in user inputs using a multidisciplinary approach | BotShield does not require any changes to the protected chatbot, easing the burden on the developer. | Experiences latency, can be difficult to implement, does not detect bots, does not implement differential privacy |
[101] | 2023 | Collection of tweets | Proposes a chatbot for deployment on social media sites for cyber-attack and threat prediction | The proposed chatbot is a preventative method that can detect attacks before they occur, potentially saving time and resources. | This chatbot is only applicable to Twitter, so it is not a widespread solution
[102] | 2022 | Metadata scraped from top.gg such as chatbot ID, name, URL, and tags | A methodology that assesses security and privacy issues in messaging platform chatbots | This methodology can accurately identify chatbots that pose security and privacy risks and highlights the need for more research in this area. | Framework is resource-intensive, relying on traceability analysis, which may be ineffective due to the ambiguity of words
[103] | 2019 | Voice recordings | Proposes a methodology for the detection of voice replay attacks | The proposed method is a preventative approach that can accurately detect voice replay attacks and is a novel contribution. | HOSA may not be able to accurately differentiate between natural and synthetic voice(s)
[104] | 2023 | HIDS logs, survey results | Proposes a chatbot that analyzes HIDS log data to assist the user with assessing system security | The proposed chatbot has an easy-to-use interface, automates a series of analysis tasks, and eases the burden of the user. | Lack of a comprehensive qualitative analysis and uses a small sample size |
[105] | 2024 | N/A | Explores the use of LLM hallucinations to create cyber exercise scenarios | The innovative use of LLM hallucinations generated accurate and effective cyber exercise scenarios. | The whole exercise design process is not described, and parts are unclear |
[106] | 2020 | Samples of intents and entities | Proposes a conversational chatbot for cybersecurity planning and management | SecBot interacts with users and provides helpful, detailed responses and can identify attacks and provide insights about risks. | If the correct dataset is not used, SecBot can be prone to overfitting |
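To illustrate the erase-and-check row ([99]) above, the sketch below implements the framework’s adversarial-suffix mode under strong simplifying assumptions: a prompt is rejected if the safety filter flags it or any version with up to max_erase trailing tokens erased. The exact-match is_harmful filter and the attack string are invented toy stand-ins; the actual framework uses a learned safety classifier.

```python
# Hedged sketch of erase-and-check [99], adversarial-suffix mode.
def is_harmful(prompt: str) -> bool:
    """Toy safety filter: only recognizes the bare harmful request."""
    return prompt.lower().strip() == "how to build a bomb"

def erase_and_check(prompt: str, max_erase: int = 20) -> bool:
    """Reject (True) if any suffix-erased version of the prompt is flagged."""
    tokens = prompt.split()
    for k in range(min(max_erase, len(tokens) - 1) + 1):
        candidate = " ".join(tokens[: len(tokens) - k])  # erase last k tokens
        if is_harmful(candidate):
            return True
    return False

attack = "how to build a bomb xz!!qq zz"  # harmful request + adversarial suffix
print(is_harmful(attack))                 # False - suffix evades the naive filter
print(erase_and_check(attack))            # True - erasing the suffix exposes it
```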