Search Results (25)

Search Parameters:
Keywords = black-box Large Language Models

20 pages, 21076 KiB  
Article
Domain-Aware Reinforcement Learning for Prompt Optimization
by Mengqi Gao, Bowen Sun, Tong Wang, Ziyu Fan, Tongpo Zhang and Zijun Zheng
Mathematics 2025, 13(16), 2552; https://doi.org/10.3390/math13162552 - 9 Aug 2025
Viewed by 453
Abstract
Prompt engineering provides an efficient way to adapt large language models (LLMs) to downstream tasks without retraining model parameters. However, designing effective prompts can be challenging, especially when model gradients are unavailable and human expertise is required. Existing automated methods based on gradient optimization or heuristic search exhibit inherent limitations under black-box or limited-query conditions. We propose Domain-Aware Reinforcement Learning for Prompt Optimization (DA-RLPO), which treats prompt editing as a sequential decision process and leverages structured domain knowledge to constrain candidate edits. Our experimental results show that DA-RLPO achieves higher accuracy than baselines on text classification tasks and maintains robust performance with limited API calls, while also demonstrating effectiveness on text-to-image and reasoning tasks. Full article
(This article belongs to the Special Issue Multi-Criteria Decision Making Under Uncertainty)
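
As a rough illustration of the idea in the abstract above, the sketch below treats prompt editing as a sequential decision process explored with a simple epsilon-greedy policy under a query budget. The edit operators, the domain lexicon, and the evaluate_prompt() scoring call (a black-box request returning validation accuracy) are hypothetical placeholders, not the authors' DA-RLPO implementation.

```python
import random

# Assumed domain lexicon used to constrain candidate edits (illustrative only).
DOMAIN_LEXICON = ["Answer precisely.", "Reason step by step.", "Respond as a domain expert."]

def candidate_edits(prompt: str) -> list[str]:
    """Enumerate a small set of domain-constrained edits of the current prompt."""
    return [f"{prompt} {phrase}" for phrase in DOMAIN_LEXICON] + [prompt]

def optimize_prompt(seed_prompt, evaluate_prompt, episodes=10, epsilon=0.2):
    """Epsilon-greedy search over edit sequences; each evaluation costs API queries."""
    best_prompt, best_reward = seed_prompt, evaluate_prompt(seed_prompt)
    current = seed_prompt
    for _ in range(episodes):
        actions = candidate_edits(current)
        if random.random() < epsilon:            # explore a random edit
            chosen = random.choice(actions)
            reward = evaluate_prompt(chosen)
        else:                                    # exploit: pick the best-scoring edit
            scored = {a: evaluate_prompt(a) for a in actions}
            chosen, reward = max(scored.items(), key=lambda kv: kv[1])
        if reward > best_reward:
            best_prompt, best_reward = chosen, reward
        current = chosen
    return best_prompt, best_reward
```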

19 pages, 313 KiB  
Article
Survey on the Role of Mechanistic Interpretability in Generative AI
by Leonardo Ranaldi
Big Data Cogn. Comput. 2025, 9(8), 193; https://doi.org/10.3390/bdcc9080193 - 23 Jul 2025
Viewed by 1598
Abstract
The rapid advancement of artificial intelligence (AI) and machine learning has revolutionised how systems process information, make decisions, and adapt to dynamic environments. AI-driven approaches have significantly enhanced efficiency and problem-solving capabilities across various domains, from automated decision-making to knowledge representation and predictive modelling. These developments have led to the emergence of increasingly sophisticated models capable of learning patterns, reasoning over complex data structures, and generalising across tasks. As AI systems become more deeply integrated into networked infrastructures and the Internet of Things (IoT), their ability to process and interpret data in real time is essential for optimising intelligent communication networks, distributed decision-making, and autonomous IoT systems. However, despite these achievements, the internal mechanisms that drive large language models’ (LLMs’) reasoning and generalisation capabilities remain largely unexplored. This lack of transparency, compounded by challenges such as hallucinations, adversarial perturbations, and misaligned human expectations, raises concerns about their safe and beneficial deployment. Understanding the underlying principles governing AI models is crucial for their integration into intelligent network systems, automated decision-making processes, and secure digital infrastructures. This paper provides a comprehensive analysis of explainability approaches aimed at uncovering the fundamental mechanisms of LLMs. We investigate the strategic components contributing to their generalisation abilities, focusing on methods to quantify acquired knowledge and assess its representation within model parameters. Specifically, we examine mechanistic interpretability, probing techniques, and representation engineering as tools to decipher how knowledge is structured, encoded, and retrieved in AI systems. Furthermore, by adopting a mechanistic perspective, we analyse emergent phenomena within training dynamics, particularly memorisation and generalisation, which also play a crucial role in broader AI-driven systems, including adaptive network intelligence, edge computing, and real-time decision-making architectures. Understanding these principles is crucial for bridging the gap between black-box AI models and practical, explainable AI applications, thereby ensuring trust, robustness, and efficiency in language-based and general AI systems. Full article
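
Among the techniques this survey covers, linear probing is the simplest to demonstrate. The sketch below, a minimal example rather than anything from the paper, fits a logistic-regression probe on frozen hidden states to test whether a property is linearly decodable from a given layer; hidden_states and labels are assumed to have been extracted beforehand.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

def probe_layer(hidden_states: np.ndarray, labels: np.ndarray) -> float:
    """Fit a linear probe on frozen activations and return held-out accuracy."""
    X_train, X_test, y_train, y_test = train_test_split(
        hidden_states, labels, test_size=0.2, random_state=0)
    probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)
    return probe.score(X_test, y_test)

# A layer whose probe accuracy sits far above the majority-class baseline is commonly
# read as evidence that the property is explicitly represented at that depth.
```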

25 pages, 1429 KiB  
Article
A Contrastive Semantic Watermarking Framework for Large Language Models
by Jianxin Wang, Xiangze Chang, Chaoen Xiao and Lei Zhang
Symmetry 2025, 17(7), 1124; https://doi.org/10.3390/sym17071124 - 14 Jul 2025
Viewed by 742
Abstract
The widespread deployment of large language models (LLMs) has raised urgent demands for verifiable content attribution and misuse mitigation. Existing text watermarking techniques often struggle in black-box or sampling-based scenarios due to limitations in robustness, imperceptibility, and detection generality. These challenges are particularly critical in open-access settings, where model internals and generation logits are unavailable for attribution. To address these limitations, we propose CWS (Contrastive Watermarking with Semantic Modeling)—a novel keyless watermarking framework that integrates contrastive semantic token selection and shared embedding space alignment. CWS enables context-aware, fluent watermark embedding while supporting robust detection via a dual-branch mechanism: a lightweight z-score statistical test for public verification and a GRU-based semantic decoder for black-box adversarial robustness. Experiments on GPT-2, OPT-1.3B, and LLaMA-7B over C4 and DBpedia datasets demonstrate that CWS achieves F1 scores up to 99.9% and maintains F1 ≥ 93% under semantic rewriting, token substitution, and lossy compression (ε ≤ 0.25, δ ≤ 0.2). The GRU-based detector offers a superior speed–accuracy trade-off (0.42 s/sample) over LSTM and Transformer baselines. These results highlight CWS as a lightweight, black-box-compatible, and semantically robust watermarking method suitable for practical content attribution across LLM architectures and decoding strategies. Furthermore, CWS maintains a symmetrical architecture between embedding and detection stages via shared semantic representations, ensuring structural consistency and robustness. This semantic symmetry helps preserve detection reliability across diverse decoding strategies and adversarial conditions. Full article
(This article belongs to the Section Computer)
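
The abstract mentions a lightweight z-score statistical test for public verification. The sketch below shows how such a test works in general for LLM watermarking; the pseudorandom "green list" construction and the seed are assumptions borrowed from common watermarking schemes, not the keyless semantic statistic CWS actually uses.

```python
import hashlib
import math

def is_green(token: str, seed: str = "demo-seed", gamma: float = 0.5) -> bool:
    """Assign each token to the 'green' set with probability gamma via hashing."""
    h = int(hashlib.sha256((seed + token).encode()).hexdigest(), 16)
    return (h % 1000) / 1000.0 < gamma

def watermark_z_score(tokens: list[str], gamma: float = 0.5) -> float:
    """One-proportion z-test: how far the green-token count exceeds its expectation."""
    n = len(tokens)
    green = sum(is_green(t, gamma=gamma) for t in tokens)
    return (green - gamma * n) / math.sqrt(n * gamma * (1.0 - gamma))

# A z-score above a chosen threshold (e.g. 4) would be read as evidence of watermarking.
```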

22 pages, 969 KiB  
Article
A Spectral Interpretable Bearing Fault Diagnosis Framework Powered by Large Language Models
by Panfeng Bao, Wenjun Yi, Yue Zhu, Yufeng Shen and Haotian Peng
Sensors 2025, 25(12), 3822; https://doi.org/10.3390/s25123822 - 19 Jun 2025
Viewed by 871
Abstract
Most existing fault diagnosis methods, although capable of extracting interpretable features such as attention-weighted fault-related frequencies, remain essentially black-box models that provide only classification results without transparent reasoning or diagnostic justification, limiting users’ ability to understand and trust diagnostic outcomes. In this work, we present a novel, interpretable fault diagnosis framework that integrates spectral feature extraction with large language models (LLMs). Vibration signals are first transformed into spectral representations using Hilbert- and Fourier-based encoders to highlight key frequencies and amplitudes. A channel attention-augmented convolutional neural network provides an initial fault type prediction. Subsequently, structured information—including operating conditions, spectral features, and CNN outputs—is fed into a fine-tuned enhanced LLM, which delivers both an accurate diagnosis and a transparent reasoning process. Experiments demonstrate that our framework achieves high diagnostic performance while substantially improving interpretability, making advanced fault diagnosis accessible to non-expert users in industrial settings. Full article
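
To make the pipeline concrete, the sketch below covers two of the stages described: summarising a vibration signal by its Hilbert-envelope spectrum and packing those peaks, together with the CNN's preliminary label, into a structured prompt for an LLM. The sampling rate, prompt wording, and downstream LLM call are illustrative assumptions, not the paper's implementation.

```python
import numpy as np
from scipy.signal import hilbert

def spectral_summary(signal: np.ndarray, fs: float, top_k: int = 5):
    """Envelope spectrum via the Hilbert transform; return the top_k peak frequencies."""
    envelope = np.abs(hilbert(signal))
    spectrum = np.abs(np.fft.rfft(envelope - envelope.mean()))
    freqs = np.fft.rfftfreq(len(envelope), d=1.0 / fs)
    peaks = np.argsort(spectrum)[-top_k:][::-1]
    return [(float(freqs[i]), float(spectrum[i])) for i in peaks]

def build_prompt(features, cnn_label: str, rpm: float) -> str:
    """Assemble operating conditions, spectral peaks, and the CNN output into one prompt."""
    lines = [f"{f:.1f} Hz (amplitude {a:.2f})" for f, a in features]
    return (
        f"Operating speed: {rpm} RPM. Dominant envelope-spectrum peaks: {'; '.join(lines)}. "
        f"Preliminary CNN prediction: {cnn_label}. "
        "Explain which characteristic fault frequencies these peaks match and give a diagnosis."
    )
```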

30 pages, 4246 KiB  
Article
Enhancing Online Learning Through Multi-Agent Debates for CS University Students
by Jing Du, Guangtao Xu, Wenhao Liu, Dibin Zhou and Fuchang Liu
Appl. Sci. 2025, 15(11), 5877; https://doi.org/10.3390/app15115877 - 23 May 2025
Viewed by 1034
Abstract
As recent advancements in large language models enhance reasoning across various domains, educators are increasingly exploring their use in conversation-based tutoring systems. However, since LLMs are black-box models to users and lack human-like problem-solving strategies, users are often not convinced by the answers LLMs provide. This lack of trust can potentially undermine the effectiveness of learning in educational scenarios. To address these issues, we introduce a novel approach that integrates multi-agent debates into a lecture video Q&A system, aiming to assist computer science (CS) university students in self-learning: LLMs simulate a debate between affirmative and negative debaters and a judge to reach a final answer, and the entire process is presented to users for review. This approach is expected to lead to better learning outcomes and improved critical thinking among students. To validate the effectiveness of this approach, we carried out a user study through a prototype system and conducted preliminary experiments based on video lecture learning involving 90 CS students from three universities. The study compared different conditions and demonstrated that students who had access to a combination of video-based Q&A and multi-agent debates performed significantly better on quizzes than those who only had access to the video or video-based Q&A. These findings indicate that integrating multi-agent debates with lecture videos can substantially enhance the learning experience and may also foster students’ higher-order thinking abilities. Full article
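
The debate-and-judge pattern the study builds on can be sketched in a few lines. In the snippet below, chat() stands in for any chat-completion API, and the single affirmative/negative/judge prompt set is an assumed simplification of the study's own turn structure.

```python
def debate(question: str, draft_answer: str, chat, rounds: int = 2) -> str:
    """Run a small affirmative-vs-negative debate and let a judge issue the verdict."""
    transcript = [f"Question: {question}", f"Proposed answer: {draft_answer}"]
    for _ in range(rounds):
        pro = chat("You argue that the proposed answer is correct.", "\n".join(transcript))
        con = chat("You argue that the proposed answer is flawed.", "\n".join(transcript))
        transcript += [f"Affirmative: {pro}", f"Negative: {con}"]
    verdict = chat("You are the judge. Weigh both sides and state the final answer.",
                   "\n".join(transcript))
    transcript.append(f"Judge: {verdict}")
    return "\n".join(transcript)   # the full transcript is what the learner reviews
```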

20 pages, 266 KiB  
Article
Code Word Cloud in Franz Kafka’s “Beim Bau der Chinesischen Mauer” [“The Great Wall of China”]
by Alex Mentzel
Humanities 2025, 14(4), 73; https://doi.org/10.3390/h14040073 - 25 Mar 2025
Viewed by 535
Abstract
Amidst the centenary reflections on Franz Kafka’s legacy, this article explores his work’s ongoing resonance with the digital age, particularly through the lens of generative AI and cloud computation. Anchored in a close reading of Kafka’s “Beim Bau der chinesischen Mauer”, this study interrogates how the spatial and temporal codes embedded in the narrative parallel the architectures of contemporary diffusion systems at the heart of AI models. Engaging with critical theory, media archaeology, and AI discourse, this article argues that the rise of large language models not only commodifies language but also recasts Kafka’s allegorical critiques of bureaucratic opacity and imperial command structures within a digital framework. The analysis leverages concepts like Kittler’s code, Benjamin’s figural cloud, and Hamacher’s linguistic dissemblance to position Kafka’s parables as proto-critical tools for examining AI’s black-box nature. Ultimately, the piece contends that Kafka’s text is less a metaphor for our technological present than a mirror reflecting the epistemological crises engendered by the collapse of semantic transparency in the era of algorithmic communication. This reframing invites a rethinking of how narrative, code, and digital architectures intersect, complicating our assumptions about clarity, control, and the digital regimes shaping contemporary culture. Full article
(This article belongs to the Special Issue Franz Kafka in the Age of Artificial Intelligence)

25 pages, 6609 KiB  
Article
MultiDiffEditAttack: A Multi-Modal Black-Box Jailbreak Attack on Image Editing Models
by Peihong Chen, Feng Chen and Lei Guo
Electronics 2025, 14(5), 899; https://doi.org/10.3390/electronics14050899 - 24 Feb 2025
Viewed by 1116
Abstract
In recent years, image editing models have made notable advancements and gained widespread use. However, these technologies also present significant security risks by enabling the creation of Not Safe For Work (NSFW) content. This study introduces MDEA (MultiDiffEditAttack), an innovative multi-modal black-box jailbreak attack framework designed to evaluate and challenge the security of image editing models. MDEA leverages large language models and genetic algorithms to generate adversarial prompts that modify sensitive vocabulary structures, thereby bypassing prompt filters. Additionally, MDEA employs transfer learning to optimize input image features, effectively bypassing post-hoc safety checks. By integrating prompt attacks and safety checker attacks, MDEA utilizes a multimodal attack strategy to target image editing models in a black-box setting. Experimental results demonstrate that MDEA significantly improves the attack efficiency against image editing models compared to current black-box methods. These results confirm the effectiveness of MDEA in multi-modal attacks and reveal numerous vulnerabilities in current defense mechanisms. Full article
(This article belongs to the Special Issue Security and Privacy for AI)

20 pages, 520 KiB  
Article
A Green AI Methodology Based on Persistent Homology for Compressing BERT
by Luis Balderas, Miguel Lastra and José M. Benítez
Appl. Sci. 2025, 15(1), 390; https://doi.org/10.3390/app15010390 - 3 Jan 2025
Viewed by 1980
Abstract
Large Language Models (LLMs) like BERT have gained significant prominence due to their remarkable performance in various natural language processing tasks. However, they come with substantial computational and memory costs. Additionally, they are essentially black-box models that are challenging to explain and interpret. In this article, Persistent BERT Compression and Explainability (PBCE) is proposed, a Green AI methodology to prune BERT models using persistent homology, aiming to measure the importance of each neuron by studying the topological characteristics of their outputs. As a result, PBCE can compress BERT significantly by reducing the number of parameters (47% of the original parameters for BERT Base, 42% for BERT Large). The proposed methodology has been evaluated on the standard GLUE Benchmark and compared with state-of-the-art techniques, achieving outstanding results. Consequently, PBCE can simplify the BERT model by providing explainability to its neurons and reducing the model’s size, making it more suitable for deployment on resource-constrained devices. Full article
(This article belongs to the Section Computing and Artificial Intelligence)
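
As a toy illustration of persistent-homology-based neuron scoring in the spirit of PBCE, the sketch below treats each neuron's activations over a sample of inputs as a one-dimensional point cloud, computes its 0-dimensional persistence diagram with ripser, and uses total finite persistence as an importance proxy. The scoring rule and the pruning threshold are simplified assumptions, not the paper's actual procedure.

```python
import numpy as np
from ripser import ripser

def neuron_importance(activations: np.ndarray) -> np.ndarray:
    """activations: (n_samples, n_neurons). Return one topological score per neuron."""
    scores = []
    for j in range(activations.shape[1]):
        cloud = activations[:, j].reshape(-1, 1)           # 1-D point cloud for neuron j
        dgm0 = ripser(cloud, maxdim=0)["dgms"][0]          # 0-dimensional persistence diagram
        finite = dgm0[np.isfinite(dgm0[:, 1])]
        scores.append(float((finite[:, 1] - finite[:, 0]).sum()))  # total finite persistence
    return np.array(scores)

def neurons_to_prune(activations: np.ndarray, keep_ratio: float = 0.5) -> np.ndarray:
    """Return indices of the least topologically important neurons."""
    scores = neuron_importance(activations)
    cutoff = np.quantile(scores, 1.0 - keep_ratio)
    return np.where(scores < cutoff)[0]
```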

13 pages, 1171 KiB  
Article
Bayesian Optimization for Instruction Generation
by Antonio Sabbatella, Francesco Archetti, Andrea Ponti, Ilaria Giordani and Antonio Candelieri
Appl. Sci. 2024, 14(24), 11865; https://doi.org/10.3390/app142411865 - 19 Dec 2024
Viewed by 2174
Abstract
The performance of Large Language Models (LLMs) strongly depends on the selection of the best instructions for different downstream tasks, especially in the case of black-box LLMs. This study introduces BOInG (Bayesian Optimization for Instruction Generation), a method leveraging Bayesian Optimization (BO) to efficiently generate instructions while addressing the combinatorial nature of instruction search. Over the last decade, BO has emerged as a highly effective optimization method in various domains due to its flexibility and sample efficiency. At its core, BOInG employs Bayesian search in a low-dimensional continuous space, projecting solutions into a high-dimensional token embedding space to retrieve discrete tokens. These tokens act as seeds for the generation of human-readable, task-relevant instructions. Experimental results demonstrate that BOInG achieves comparable or superior performance to state-of-the-art methods, such as InstructZero and Instinct, with substantially lower resource requirements while also enabling the use of both white-box and black-box models. This approach offers both theoretical and practical benefits without requiring specialized hardware. Full article
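
A minimal version of the BO-over-soft-prompts loop described above is sketched below: a Gaussian-process surrogate searches a low-dimensional continuous space, each candidate point is projected onto its nearest token embeddings to obtain discrete seed tokens, and a black-box score() drives the search. token_embeddings, score(), and the plain UCB acquisition rule are assumptions for illustration, not BOInG's actual components.

```python
import numpy as np
from sklearn.gaussian_process import GaussianProcessRegressor

def nearest_tokens(z: np.ndarray, token_embeddings: np.ndarray, k: int = 5) -> np.ndarray:
    """Map a low-dim point (tiled to embedding width) to its k closest token ids."""
    query = np.resize(z, token_embeddings.shape[1])
    dists = np.linalg.norm(token_embeddings - query, axis=1)
    return np.argsort(dists)[:k]

def bo_instruction_search(score, token_embeddings, dim=10, budget=30, pool=256, seed=0):
    """GP surrogate + UCB acquisition over the low-dimensional soft-prompt space."""
    rng = np.random.default_rng(seed)
    Z = rng.uniform(-1, 1, size=(5, dim))                       # initial design
    y = np.array([score(nearest_tokens(z, token_embeddings)) for z in Z])
    gp = GaussianProcessRegressor(normalize_y=True)
    for _ in range(budget - len(Z)):
        gp.fit(Z, y)
        cand = rng.uniform(-1, 1, size=(pool, dim))
        mu, sigma = gp.predict(cand, return_std=True)
        z_next = cand[np.argmax(mu + 1.0 * sigma)]              # simple UCB rule
        y_next = score(nearest_tokens(z_next, token_embeddings))
        Z, y = np.vstack([Z, z_next]), np.append(y, y_next)
    best = int(np.argmax(y))
    return nearest_tokens(Z[best], token_embeddings), float(y[best])
```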

32 pages, 686 KiB  
Article
Opening the AI Black Box: Distilling Machine-Learned Algorithms into Code
by Eric J. Michaud, Isaac Liao, Vedang Lad, Ziming Liu, Anish Mudide, Chloe Loughridge, Zifan Carl Guo, Tara Rezaei Kheirkhah, Mateja Vukelić and Max Tegmark
Entropy 2024, 26(12), 1046; https://doi.org/10.3390/e26121046 - 2 Dec 2024
Cited by 1 | Viewed by 2664
Abstract
Can we turn AI black boxes into code? Although this mission sounds extremely challenging, we show that it is not entirely impossible by presenting a proof-of-concept method, MIPS, that can synthesize programs based on the automated mechanistic interpretability of neural networks trained to perform the desired task, auto-distilling the learned algorithm into Python code. We test MIPS on a benchmark of 62 algorithmic tasks that can be learned by an RNN and find it highly complementary to GPT-4: MIPS solves 32 of them, including 13 that are not solved by GPT-4 (which also solves 30). MIPS uses an integer autoencoder to convert the RNN into a finite state machine, then applies Boolean or integer symbolic regression to capture the learned algorithm. As opposed to large language models, this program synthesis technique makes no use of (and is therefore not limited by) human training data such as algorithms and code from GitHub. We discuss opportunities and challenges for scaling up this approach to make machine-learned models more interpretable and trustworthy. Full article
(This article belongs to the Section Information Theory, Probability and Statistics)
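
The first MIPS stage, turning an RNN into a finite state machine, can be approximated very roughly as below: continuous hidden states are collapsed into a small set of integer states and a transition table is read off, which could then be handed to symbolic regression. KMeans is a stand-in for the paper's integer autoencoder, and the (hidden state, input symbol, next hidden state) triples are assumed to come from running the trained RNN.

```python
import numpy as np
from sklearn.cluster import KMeans

def extract_fsm(transitions, n_states: int = 8):
    """transitions: list of (h_t, input_symbol, h_next) with h_* as numpy vectors."""
    H = np.array([h for h, _, _ in transitions] + [transitions[-1][2]])
    km = KMeans(n_clusters=n_states, n_init=10, random_state=0).fit(H)
    table = {}
    for h, sym, h_next in transitions:
        s = int(km.predict(h.reshape(1, -1))[0])
        s_next = int(km.predict(h_next.reshape(1, -1))[0])
        table[(s, sym)] = s_next      # deterministic FSM if the mapping is consistent
    return table
```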

15 pages, 438 KiB  
Article
Using Generative AI Models to Support Cybersecurity Analysts
by Štefan Balogh, Marek Mlynček, Oliver Vraňák and Pavol Zajac
Electronics 2024, 13(23), 4718; https://doi.org/10.3390/electronics13234718 - 28 Nov 2024
Cited by 1 | Viewed by 3213
Abstract
One of the tasks of security analysts is to detect security vulnerabilities and ongoing attacks. There is already a large number of software tools that can help to collect security-relevant data, such as event logs, security settings, application manifests, and even the (decompiled) source code of potentially malicious applications. The analyst must study these data, evaluate them, and properly identify and classify suspicious activities and applications. Fast advances in the area of Artificial Intelligence have produced large language models that can perform a variety of tasks, including generating text summaries and reports. In this article, we study the potential black-box use of LLM chatbots as a support tool for security analysts. We provide two case studies: the first is concerned with the identification of vulnerabilities in Android applications, and the second one is concerned with the analysis of security logs. We show how LLM chatbots can help security analysts in their work, but point out specific limitations and security concerns related to this approach. Full article
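
The black-box workflow discussed here amounts to packaging evidence into a prompt and asking a chatbot for a triage summary. The sketch below illustrates that pattern for event logs; ask_llm() is a placeholder for whatever chat API the analyst can access, the redaction rule is a minimal example, and any output would still require human review, in line with the limitations the authors note.

```python
import re

def triage_prompt(log_lines: list[str], max_lines: int = 50) -> str:
    """Bundle a truncated log excerpt into a structured analysis request."""
    excerpt = "\n".join(log_lines[:max_lines])
    return (
        "You are assisting a security analyst. Review the following event log excerpt, "
        "list suspicious entries with line references, and suggest follow-up checks.\n\n"
        f"{excerpt}"
    )

def analyze_logs(log_lines: list[str], ask_llm) -> str:
    """Redact obvious secrets before sending anything to an external chatbot."""
    redacted = [re.sub(r"password=\S+", "password=<redacted>", line) for line in log_lines]
    return ask_llm(triage_prompt(redacted))
```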

23 pages, 4829 KiB  
Review
The Evolution of Artificial Intelligence in Medical Imaging: From Computer Science to Machine and Deep Learning
by Michele Avanzo, Joseph Stancanello, Giovanni Pirrone, Annalisa Drigo and Alessandra Retico
Cancers 2024, 16(21), 3702; https://doi.org/10.3390/cancers16213702 - 1 Nov 2024
Cited by 13 | Viewed by 9637
Abstract
Artificial intelligence (AI), the wide spectrum of technologies aiming to give machines or computers the ability to perform human-like cognitive functions, began in the 1940s with the first abstract models of intelligent machines. Soon after, in the 1950s and 1960s, machine learning algorithms such as neural networks and decision trees ignited significant enthusiasm. More recent advancements include the refinement of learning algorithms, the development of convolutional neural networks to efficiently analyze images, and methods to synthesize new images. This renewed enthusiasm was also due to the increase in computational power with graphics processing units and the availability of large digital databases to be mined by neural networks. AI soon began to be applied in medicine, first through expert systems designed to support the clinician’s decision and later with neural networks for the detection, classification, or segmentation of malignant lesions in medical images. A recent prospective clinical trial demonstrated the non-inferiority of AI alone compared with a double reading by two radiologists on screening mammography. Natural language processing, recurrent neural networks, transformers, and generative models have both improved the automated reading of medical images and moved AI to new domains, including the text analysis of electronic health records, image self-labeling, and self-reporting. The availability of open-source and free libraries, as well as powerful computing resources, has greatly facilitated the adoption of deep learning by researchers and clinicians. Key concerns surrounding AI in healthcare include the need for clinical trials to demonstrate efficacy, the perception of AI tools as ‘black boxes’ that require greater interpretability and explainability, and ethical issues related to ensuring fairness and trustworthiness in AI systems. Thanks to its versatility and impressive results, AI is one of the most promising resources for frontier research and applications in medicine, in particular for oncological applications. Full article
(This article belongs to the Section Cancer Informatics and Big Data)

30 pages, 5419 KiB  
Article
Explainable Aspect-Based Sentiment Analysis Using Transformer Models
by Isidoros Perikos and Athanasios Diamantopoulos
Big Data Cogn. Comput. 2024, 8(11), 141; https://doi.org/10.3390/bdcc8110141 - 24 Oct 2024
Cited by 6 | Viewed by 7257
Abstract
Aspect-based sentiment analysis (ABSA) aims to perform a fine-grained analysis of text to identify sentiments and opinions associated with specific aspects. Recently, transformers and large language models have demonstrated exceptional performance in detecting aspects and determining their associated sentiments within text. However, understanding the decision-making processes of transformers remains a significant challenge, as they often operate as black-box models, making it difficult to interpret how they arrive at specific predictions. In this article, we examine the performance of various transformers on ABSA and employ explainability techniques to illustrate their inner decision-making processes. Firstly, we fine-tune several pre-trained transformers, including BERT, RoBERTa, DistilBERT, and XLNet, on an extensive set of data composed of the MAMS, SemEval, and Naver datasets. These datasets consist of over 16,100 complex sentences, each containing multiple aspects with corresponding polarities. The models were fine-tuned using optimal hyperparameters, and RoBERTa achieved the highest performance, reporting 89.16% accuracy on MAMS and SemEval and 97.62% on Naver. We implemented five explainability techniques, LIME, SHAP, attention weight visualization, integrated gradients, and Grad-CAM, to illustrate how transformers make predictions and highlight influential words. These techniques can reveal how models use specific words and contextual information to make sentiment predictions, which can improve performance, address biases, and enhance model efficiency and robustness. They also point to further analysis of model bias in combination with explainability methods, ensuring that explainability highlights potential biases in predictions. Full article
(This article belongs to the Special Issue Advances in Natural Language Processing and Text Mining)
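
Of the explainability techniques listed, LIME is the easiest to show end to end. The sketch below wraps a sentiment classifier so that LIME can perturb the input and weight the most influential words; the public SST-2 DistilBERT checkpoint is a generic stand-in for the authors' fine-tuned models, and this is sentence-level rather than full aspect-level explanation.

```python
import numpy as np
from lime.lime_text import LimeTextExplainer
from transformers import pipeline

# top_k=None asks the pipeline to return scores for every class (recent transformers versions).
clf = pipeline("text-classification",
               model="distilbert-base-uncased-finetuned-sst-2-english", top_k=None)

def predict_proba(texts):
    """Return class probabilities in a fixed label order, as LIME expects."""
    outputs = clf(list(texts))
    labels = sorted(d["label"] for d in outputs[0])
    return np.array([[next(d["score"] for d in out if d["label"] == lab)
                      for lab in labels] for out in outputs])

explainer = LimeTextExplainer(class_names=["NEGATIVE", "POSITIVE"])
exp = explainer.explain_instance(
    "The battery life is great but the screen is disappointing.",
    predict_proba, num_features=6)
print(exp.as_list())   # (word, weight) pairs highlighting the most influential tokens
```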

23 pages, 5336 KiB  
Article
Enhancing the Interpretability of Malaria and Typhoid Diagnosis with Explainable AI and Large Language Models
by Kingsley Attai, Moses Ekpenyong, Constance Amannah, Daniel Asuquo, Peterben Ajuga, Okure Obot, Ekemini Johnson, Anietie John, Omosivie Maduka, Christie Akwaowo and Faith-Michael Uzoka
Trop. Med. Infect. Dis. 2024, 9(9), 216; https://doi.org/10.3390/tropicalmed9090216 - 16 Sep 2024
Cited by 3 | Viewed by 4044
Abstract
Malaria and Typhoid fever are prevalent diseases in tropical regions, and both are exacerbated by unclear protocols, drug resistance, and environmental factors. Prompt and accurate diagnosis is crucial to improve accessibility and reduce mortality rates. Traditional diagnosis methods cannot effectively capture the complexities of these diseases due to the presence of similar symptoms. Although machine learning (ML) models offer accurate predictions, they operate as “black boxes” with non-interpretable decision-making processes, making it challenging for healthcare providers to comprehend how the conclusions are reached. This study employs explainable AI (XAI) models such as Local Interpretable Model-agnostic Explanations (LIME) and Large Language Models (LLMs) like GPT to clarify diagnostic results for healthcare workers, building trust and transparency in medical diagnostics by describing which symptoms had the greatest impact on the model’s decisions and providing clear, understandable explanations. The models were implemented on Google Colab and Visual Studio Code because of their rich libraries and extensions. Results showed that the Random Forest (RF) model outperformed the other tested models; in addition, important features were identified with the LIME plots, while ChatGPT 3.5 had a comparative advantage over other LLMs. The study integrates RF, LIME, and GPT into a mobile app to enhance interpretability and transparency in the malaria and typhoid diagnosis system. Despite its promising results, the system’s performance is constrained by the quality of the dataset. Additionally, while LIME and GPT improve transparency, they may introduce complexities in real-time deployment due to computational demands and the need for internet service to maintain relevance and accuracy. The findings suggest that AI-driven diagnostic systems can significantly enhance healthcare delivery in environments with limited resources, and future works can explore the applicability of this framework to other medical conditions and datasets. Full article
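
A condensed version of the RF + LIME + LLM chain described above is sketched below: a random forest is fitted on tabular symptom data, one prediction is explained with LIME, and the top-weighted symptoms are rewritten as a plain-language prompt for an LLM. X, y, feature_names, class_names, and ask_llm() are assumed placeholders; the study's dataset, labels, and prompts are not reproduced here.

```python
from lime.lime_tabular import LimeTabularExplainer
from sklearn.ensemble import RandomForestClassifier

def explain_case(X, y, feature_names, class_names, case_row, ask_llm):
    """Fit RF on symptom data, explain one case with LIME, and draft an LLM explanation."""
    rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)
    explainer = LimeTabularExplainer(X, feature_names=feature_names,
                                     class_names=class_names, mode="classification")
    exp = explainer.explain_instance(case_row, rf.predict_proba, num_features=5)
    # y is assumed integer-encoded so the prediction indexes class_names directly.
    prediction = class_names[int(rf.predict(case_row.reshape(1, -1))[0])]
    findings = "; ".join(f"{feat} (weight {w:+.2f})" for feat, w in exp.as_list())
    prompt = (f"The model predicts {prediction}. The most influential symptoms were: "
              f"{findings}. Explain this result to a community health worker in plain terms.")
    return ask_llm(prompt)
```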

18 pages, 590 KiB  
Article
Open Sesame! Universal Black-Box Jailbreaking of Large Language Models
by Raz Lapid, Ron Langberg and Moshe Sipper
Appl. Sci. 2024, 14(16), 7150; https://doi.org/10.3390/app14167150 - 14 Aug 2024
Cited by 17 | Viewed by 5912
Abstract
Large language models (LLMs), designed to provide helpful and safe responses, often rely on alignment techniques to align with user intent and social guidelines. Unfortunately, this alignment can be exploited by malicious actors seeking to manipulate an LLM’s outputs for unintended purposes. In this paper, we introduce a novel approach that employs a genetic algorithm (GA) to manipulate LLMs when model architecture and parameters are inaccessible. The GA attack works by optimizing a universal adversarial prompt that—when combined with a user’s query—disrupts the attacked model’s alignment, resulting in unintended and potentially harmful outputs. Our novel approach systematically reveals a model’s limitations and vulnerabilities by uncovering instances where its responses deviate from expected behavior. Through extensive experiments, we demonstrate the efficacy of our technique, thus contributing to the ongoing discussion on responsible AI development by providing a diagnostic tool for evaluating and enhancing alignment of LLMs with human intent. To our knowledge, this is the first automated universal black-box jailbreak attack. Full article
