Search Results (4)

Search Parameters:
Keywords = tree-of-thought prompt

17 pages, 3359 KB  
Article
Automated Generation of Test Scenarios for Autonomous Driving Using LLMs
by Aaron Agyapong Danso and Ulrich Büker
Electronics 2025, 14(16), 3177; https://doi.org/10.3390/electronics14163177 - 10 Aug 2025
Viewed by 1205
Abstract
This paper introduces an approach that leverages large language models (LLMs) to convert detailed descriptions of an Operational Design Domain (ODD) into realistic, executable simulation scenarios for testing autonomous vehicles. The method combines model-based and data-driven techniques to decompose ODDs into three key components: environmental, scenery, and dynamic elements. It then applies prompt engineering to generate ScenarioRunner scripts compatible with CARLA. The model-based component guides the LLM using structured prompts and a “Tree of Thoughts” strategy to outline the scenario, while a data-driven refinement process, drawing inspiration from red teaming, enhances the accuracy and robustness of the generated scripts over time. Experimental results show that while static components, such as weather and road layouts, are well captured, dynamic elements like vehicle and pedestrian behavior require further refinement. Overall, this approach not only reduces the manual effort involved in creating simulation scenarios but also identifies key challenges and opportunities for advancing safer and more adaptive autonomous driving systems.
(This article belongs to the Special Issue Autonomous and Connected Vehicles)
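For a concrete picture of the “Tree of Thoughts” prompting step the abstract describes, the sketch below shows one way such a loop could be organized in Python. It is only an illustration under assumptions: the ODD text, the prompt wording, the scoring rule, and the ask_llm placeholder are not taken from the paper.

```python
# Sketch of a "Tree of Thoughts"-style loop for turning an ODD description into a
# CARLA ScenarioRunner outline. All prompts, the scoring scheme, and ask_llm are
# illustrative assumptions, not the authors' implementation.

def ask_llm(prompt: str) -> str:
    """Placeholder for any chat-completion client."""
    raise NotImplementedError("plug in an LLM client here")

ODD = ("Urban intersection, light rain, daytime, two-lane road, "
       "one crossing pedestrian, one oncoming vehicle.")

# The three ODD components named in the abstract.
COMPONENTS = ["environmental elements", "scenery elements", "dynamic elements"]

def expand(component: str, n_branches: int = 3) -> list[str]:
    """Generate several candidate 'thoughts' (partial outlines) for one component."""
    prompt = (f"ODD: {ODD}\n"
              f"Propose {n_branches} alternative ways to model the {component} of this ODD "
              "as parameters of a CARLA ScenarioRunner scenario, one per line.")
    return ask_llm(prompt).splitlines()[:n_branches]

def score(thought: str) -> float:
    """Have the model rate a candidate thought for realism and executability (0-10)."""
    return float(ask_llm(f"Rate 0-10, number only, how realistic and executable this is:\n{thought}"))

def build_scenario_prompt() -> str:
    """Keep the best-scored thought per component, then ask for the final script."""
    outline = [f"{c}: {max(expand(c), key=score)}" for c in COMPONENTS]
    return ("Using this outline, write a ScenarioRunner Python scenario for CARLA:\n"
            + "\n".join(outline))
```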

30 pages, 452 KB  
Article
Advancing Multimodal Large Language Models: Optimizing Prompt Engineering Strategies for Enhanced Performance
by Minjun Son and Sungjin Lee
Appl. Sci. 2025, 15(7), 3992; https://doi.org/10.3390/app15073992 - 4 Apr 2025
Cited by 2 | Viewed by 4217
Abstract
This study investigates prompt engineering (PE) strategies to mitigate hallucination, a key limitation of multimodal large language models (MLLMs). To address this issue, we explore five prominent multimodal PE techniques: in-context learning (ICL), chain of thought (CoT), step-by-step reasoning (SSR), tree of thought (ToT), and retrieval-augmented generation (RAG). These techniques are systematically applied across multiple datasets with distinct domains and characteristics. Based on the empirical findings, we propose the greedy prompt engineering strategy (Greedy PES), a methodology for optimizing PE application across different datasets and MLLMs. To evaluate user satisfaction with MLLM-generated responses, we adopt a comprehensive set of evaluation metrics, including BLEU, ROUGE, METEOR, S-BERT, MoverScore, and CIDEr. A weighted aggregate evaluation score is introduced to provide a holistic assessment of model performance under varying conditions. Experimental results demonstrate that the optimal prompt engineering strategy varies significantly depending on both dataset properties and the MLLM used. Specifically, datasets categorized as general benefit the most from ICL, ToT, and RAG, whereas mathematical datasets perform optimally with ICL, SSR, and ToT. In scientific reasoning tasks, RAG and SSR emerge as the most effective strategies. Applying Greedy PES leads to a substantial improvement in performance across different multimodal tasks, achieving an average evaluation score enhancement of 184.3% for general image captioning, 90.3% for mathematical visual question answering (VQA), and 49.1% for science VQA compared to conventional approaches. These findings highlight the effectiveness of structured PE strategies in optimizing MLLM performance and provide a robust framework for PE-driven model enhancement across diverse multimodal applications.
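The abstract names two concrete mechanisms: a weighted aggregate of the six metrics and a greedy search over PE techniques (Greedy PES). The sketch below shows one plausible reading of both; the weights, the evaluate placeholder, and the greedy stopping rule are assumptions rather than the authors' published procedure.

```python
# One possible reading of a weighted aggregate score plus a greedy search over PE
# techniques. Weights and the evaluate() placeholder are assumptions, not the paper's.

PE_TECHNIQUES = ["ICL", "CoT", "SSR", "ToT", "RAG"]

# Hypothetical weights; the paper's actual weighting is not given in the abstract.
WEIGHTS = {"BLEU": 0.15, "ROUGE": 0.15, "METEOR": 0.15,
           "S-BERT": 0.20, "MoverScore": 0.20, "CIDEr": 0.15}

def aggregate(metrics: dict[str, float]) -> float:
    """Weighted sum of the six individual metric scores."""
    return sum(WEIGHTS[name] * value for name, value in metrics.items())

def evaluate(techniques: list[str], dataset: str) -> dict[str, float]:
    """Placeholder: run the MLLM on `dataset` with these PE techniques applied and
    return the six metric scores."""
    raise NotImplementedError

def greedy_pes(dataset: str, budget: int = 3) -> list[str]:
    """Greedily add whichever PE technique raises the aggregate score the most,
    stopping when nothing helps or the budget is used up."""
    chosen: list[str] = []
    best = float("-inf")
    for _ in range(budget):
        remaining = [t for t in PE_TECHNIQUES if t not in chosen]
        if not remaining:
            break
        scored = {t: aggregate(evaluate(chosen + [t], dataset)) for t in remaining}
        top, top_score = max(scored.items(), key=lambda kv: kv[1])
        if top_score <= best:
            break
        chosen.append(top)
        best = top_score
    return chosen
```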

34 pages, 315 KB  
Article
Optimizing Large Language Models: A Deep Dive into Effective Prompt Engineering Techniques
by Minjun Son, Yun-Jae Won and Sungjin Lee
Appl. Sci. 2025, 15(3), 1430; https://doi.org/10.3390/app15031430 - 30 Jan 2025
Cited by 6 | Viewed by 6238
Abstract
Recent advancements in Natural Language Processing (NLP) technologies have been driven at an unprecedented pace by the development of Large Language Models (LLMs). However, challenges remain, such as generating responses that are misaligned with the intent of the question or producing incorrect answers. This paper analyzes various Prompt Engineering techniques for large-scale language models and identifies methods that can optimize response performance across different datasets without the need for extensive retraining or fine-tuning. In particular, we examine prominent Prompt Engineering techniques including In-Context Learning (ICL), Chain of Thought (CoT), Retrieval-Augmented Generation (RAG), Step-by-Step Reasoning (SSR), and Tree of Thought (ToT), and we apply these techniques to leading LLMs such as Gemma2, LlaMA3, and Mistral. The performance of these models was evaluated using the AI2 Reasoning Challenge (ARC), HellaSwag, Massive Multitask Language Understanding (MMLU), TruthfulQA, Winogrande, and Grade School Math (GSM8k) datasets across metrics such as BLEU, ROUGE, METEOR, BLEURT, and BERTScore. The experimental results indicate that the most suitable Prompt Engineering technique can vary depending on the characteristics of each dataset. Specifically, for datasets emphasizing mathematical and logical reasoning, Prompt Engineering strategies centered around CoT, SSR, and ToT were found to be advantageous. For datasets focusing on natural language understanding, ICL-centric strategies were more effective, while RAG-based strategies were beneficial for datasets where factual accuracy is crucial. However, it was also observed that the optimal combination of Prompt Engineering techniques could differ depending on the specific LLM, indicating that fine-tuning the Prompt Engineering approach to the model and dataset is essential for achieving the best performance. The findings indicate that as LLMs become more advanced, their reliance on Prompt Engineering (PE) techniques diminishes, yet the magnitude of their performance improvement when PE strategies are applied increases. Furthermore, these advanced models tend to depend less on ICL techniques while exhibiting a greater reliance on RAG strategies. It is also evident that implementing RAG with PE-based preprocessing yields superior performance enhancements compared to the mere application of RAG on raw data.
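One finding worth unpacking is that RAG performs better when the retrieved text is itself preprocessed with a prompt rather than passed in raw. The sketch below contrasts the two setups; retrieve, ask_llm, and the condensation prompt are placeholders, not the paper's pipeline.

```python
# Contrast between plain RAG and RAG with a prompt-engineering preprocessing pass.
# Both the retriever and the LLM client are placeholders.

def ask_llm(prompt: str) -> str:
    """Placeholder for any LLM client."""
    raise NotImplementedError

def retrieve(query: str, k: int = 5) -> list[str]:
    """Placeholder for any retriever (e.g. BM25 or a vector index)."""
    raise NotImplementedError

def rag_raw(question: str) -> str:
    """Plain RAG: retrieved passages go into the prompt unmodified."""
    context = "\n".join(retrieve(question))
    return ask_llm("Context:\n" + context + f"\n\nQuestion: {question}\nAnswer:")

def rag_with_pe_preprocessing(question: str) -> str:
    """RAG with a PE preprocessing pass: each passage is first condensed to the
    facts relevant to the question, then used as context."""
    cleaned = [ask_llm(f"Extract only the facts relevant to '{question}':\n{p}")
               for p in retrieve(question)]
    return ask_llm("Context:\n" + "\n".join(cleaned) + f"\n\nQuestion: {question}\nAnswer:")
```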
15 pages, 3024 KB  
Article
Research on Intelligent Grading of Physics Problems Based on Large Language Models
by Yuhao Wei, Rui Zhang, Jianwei Zhang, Dizhi Qi and Wenqian Cui
Educ. Sci. 2025, 15(2), 116; https://doi.org/10.3390/educsci15020116 - 21 Jan 2025
Cited by 2 | Viewed by 2305
Abstract
The automation of educational and instructional assessment plays a crucial role in enhancing the quality of teaching management. In physics education, calculation problems with intricate problem-solving ideas pose challenges to the intelligent grading of tests. This study explores the automatic grading of physics problems through a combination of large language models and prompt engineering. By comparing the performance of four prompt strategies (one-shot, few-shot, chain of thought, tree of thought) within two large model frameworks, namely ERNIEBot-4-turbo and GPT-4o, this study finds that the tree of thought prompt can better assess calculation problems with complex ideas (N = 100, ACC ≥ 0.9, kappa > 0.8) and reduce the performance gap between different models. This research provides valuable insights for the automation of assessments in physics education.
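To make the tree-of-thought grading idea and the two reported metrics (accuracy and Cohen's kappa) concrete, the sketch below shows one possible prompt-and-scoring setup. The prompt wording and the grading rubric are illustrative assumptions; the paper's actual prompts are not reproduced here.

```python
# Illustrative tree-of-thought grading prompt plus the two agreement metrics the
# abstract reports. Prompt text and verdict parsing are assumptions.

from sklearn.metrics import accuracy_score, cohen_kappa_score

def ask_llm(prompt: str) -> str:
    """Placeholder for an ERNIEBot-4-turbo or GPT-4o client."""
    raise NotImplementedError

TOT_GRADING_PROMPT = """You are grading a physics calculation problem.
Problem: {problem}
Reference solution: {reference}
Student answer: {answer}

Explore three independent grading paths:
1. Check the physical model and the chosen equations.
2. Check the algebraic manipulation step by step.
3. Check the numerical substitution and the units.
State for each path whether the student is correct, then combine the paths
and end with a single line: FINAL: CORRECT or FINAL: INCORRECT."""

def grade(problem: str, reference: str, answer: str) -> int:
    """Return 1 if the model's final verdict is CORRECT, else 0."""
    verdict = ask_llm(TOT_GRADING_PROMPT.format(problem=problem,
                                                reference=reference,
                                                answer=answer))
    final_line = verdict.strip().splitlines()[-1].upper()
    return 0 if "INCORRECT" in final_line else 1

def agreement(model_grades: list[int], human_grades: list[int]) -> tuple[float, float]:
    """Accuracy and Cohen's kappa against human grading."""
    return (accuracy_score(human_grades, model_grades),
            cohen_kappa_score(human_grades, model_grades))
```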
