Search Results (4)

Search Parameters:
Keywords = tree-of-thought prompt

17 pages, 3359 KB  
Article
Automated Generation of Test Scenarios for Autonomous Driving Using LLMs
by Aaron Agyapong Danso and Ulrich Büker
Electronics 2025, 14(16), 3177; https://doi.org/10.3390/electronics14163177 - 10 Aug 2025
Viewed by 1205
Abstract
This paper introduces an approach that leverages large language models (LLMs) to convert detailed descriptions of an Operational Design Domain (ODD) into realistic, executable simulation scenarios for testing autonomous vehicles. The method combines model-based and data-driven techniques to decompose ODDs into three key components: environmental, scenery, and dynamic elements. It then applies prompt engineering to generate ScenarioRunner scripts compatible with CARLA. The model-based component guides the LLM using structured prompts and a “Tree of Thoughts” strategy to outline the scenario, while a data-driven refinement process, drawing inspiration from red teaming, enhances the accuracy and robustness of the generated scripts over time. Experimental results show that while static components, such as weather and road layouts, are well captured, dynamic elements like vehicle and pedestrian behavior require further refinement. Overall, this approach not only reduces the manual effort involved in creating simulation scenarios but also identifies key challenges and opportunities for advancing safer and more adaptive autonomous driving systems.
(This article belongs to the Special Issue Autonomous and Connected Vehicles)
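For a concrete picture of the “Tree of Thoughts” prompting step the abstract describes, the sketch below shows one way such a loop could be organized in Python. It is only an illustration under assumptions: the ODD text, the prompt wording, the scoring rule, and the ask_llm placeholder are not taken from the paper.

```python
# Sketch of a "Tree of Thoughts"-style loop for turning an ODD description into a
# CARLA ScenarioRunner outline. All prompts, the scoring scheme, and ask_llm are
# illustrative assumptions, not the authors' implementation.

def ask_llm(prompt: str) -> str:
    """Placeholder for any chat-completion client."""
    raise NotImplementedError("plug in an LLM client here")

ODD = ("Urban intersection, light rain, daytime, two-lane road, "
       "one crossing pedestrian, one oncoming vehicle.")

# The three ODD components named in the abstract.
COMPONENTS = ["environmental elements", "scenery elements", "dynamic elements"]

def expand(component: str, n_branches: int = 3) -> list[str]:
    """Generate several candidate 'thoughts' (partial outlines) for one component."""
    prompt = (f"ODD: {ODD}\n"
              f"Propose {n_branches} alternative ways to model the {component} of this ODD "
              "as parameters of a CARLA ScenarioRunner scenario, one per line.")
    return ask_llm(prompt).splitlines()[:n_branches]

def score(thought: str) -> float:
    """Have the model rate a candidate thought for realism and executability (0-10)."""
    return float(ask_llm(f"Rate 0-10, number only, how realistic and executable this is:\n{thought}"))

def build_scenario_prompt() -> str:
    """Keep the best-scored thought per component, then ask for the final script."""
    outline = [f"{c}: {max(expand(c), key=score)}" for c in COMPONENTS]
    return ("Using this outline, write a ScenarioRunner Python scenario for CARLA:\n"
            + "\n".join(outline))
```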

30 pages, 452 KB  
Article
Advancing Multimodal Large Language Models: Optimizing Prompt Engineering Strategies for Enhanced Performance
by Minjun Son and Sungjin Lee
Appl. Sci. 2025, 15(7), 3992; https://doi.org/10.3390/app15073992 - 4 Apr 2025
Cited by 2 | Viewed by 4217
Abstract
This study investigates prompt engineering (PE) strategies to mitigate hallucination, a key limitation of multimodal large language models (MLLMs). To address this issue, we explore five prominent multimodal PE techniques: in-context learning (ICL), chain of thought (CoT), step-by-step reasoning (SSR), tree of thought (ToT), and retrieval-augmented generation (RAG). These techniques are systematically applied across multiple datasets with distinct domains and characteristics. Based on the empirical findings, we propose the greedy prompt engineering strategy (Greedy PES), a methodology for optimizing PE application across different datasets and MLLMs. To evaluate user satisfaction with MLLM-generated responses, we adopt a comprehensive set of evaluation metrics, including BLEU, ROUGE, METEOR, S-BERT, MoverScore, and CIDEr. A weighted aggregate evaluation score is introduced to provide a holistic assessment of model performance under varying conditions. Experimental results demonstrate that the optimal prompt engineering strategy varies significantly depending on both dataset properties and the MLLM used. Specifically, datasets categorized as general benefit the most from ICL, ToT, and RAG, whereas mathematical datasets perform optimally with ICL, SSR, and ToT. In scientific reasoning tasks, RAG and SSR emerge as the most effective strategies. Applying Greedy PES leads to a substantial improvement in performance across different multimodal tasks, achieving an average evaluation score enhancement of 184.3% for general image captioning, 90.3% for mathematical visual question answering (VQA), and 49.1% for science VQA compared to conventional approaches. These findings highlight the effectiveness of structured PE strategies in optimizing MLLM performance and provide a robust framework for PE-driven model enhancement across diverse multimodal applications.
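The abstract names two concrete mechanisms: a weighted aggregate of the six metrics and a greedy search over PE techniques (Greedy PES). The sketch below shows one plausible reading of both; the weights, the evaluate placeholder, and the greedy stopping rule are assumptions rather than the authors' published procedure.

```python
# One possible reading of a weighted aggregate score plus a greedy search over PE
# techniques. Weights and the evaluate() placeholder are assumptions, not the paper's.

PE_TECHNIQUES = ["ICL", "CoT", "SSR", "ToT", "RAG"]

# Hypothetical weights; the paper's actual weighting is not given in the abstract.
WEIGHTS = {"BLEU": 0.15, "ROUGE": 0.15, "METEOR": 0.15,
           "S-BERT": 0.20, "MoverScore": 0.20, "CIDEr": 0.15}

def aggregate(metrics: dict[str, float]) -> float:
    """Weighted sum of the six individual metric scores."""
    return sum(WEIGHTS[name] * value for name, value in metrics.items())

def evaluate(techniques: list[str], dataset: str) -> dict[str, float]:
    """Placeholder: run the MLLM on `dataset` with these PE techniques applied and
    return the six metric scores."""
    raise NotImplementedError

def greedy_pes(dataset: str, budget: int = 3) -> list[str]:
    """Greedily add whichever PE technique raises the aggregate score the most,
    stopping when nothing helps or the budget is used up."""
    chosen: list[str] = []
    best = float("-inf")
    for _ in range(budget):
        remaining = [t for t in PE_TECHNIQUES if t not in chosen]
        if not remaining:
            break
        scored = {t: aggregate(evaluate(chosen + [t], dataset)) for t in remaining}
        top, top_score = max(scored.items(), key=lambda kv: kv[1])
        if top_score <= best:
            break
        chosen.append(top)
        best = top_score
    return chosen
```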

34 pages, 315 KB  
Article
Optimizing Large Language Models: A Deep Dive into Effective Prompt Engineering Techniques
by Minjun Son, Yun-Jae Won and Sungjin Lee
Appl. Sci. 2025, 15(3), 1430; https://doi.org/10.3390/app15031430 - 30 Jan 2025
Cited by 6 | Viewed by 6238
Abstract
Recent advancements in Natural Language Processing (NLP) technologies have been driven at an unprecedented pace by the development of Large Language Models (LLMs). However, challenges remain, such as generating responses that are misaligned with the intent of the question or producing incorrect answers. This paper analyzes various Prompt Engineering techniques for large-scale language models and identifies methods that can optimize response performance across different datasets without the need for extensive retraining or fine-tuning. In particular, we examine prominent Prompt Engineering techniques including In-Context Learning (ICL), Chain of Thought (CoT), Retrieval-Augmented Generation (RAG), Step-by-Step Reasoning (SSR), and Tree of Thought (ToT), and we apply these techniques to leading LLMs such as Gemma2, LlaMA3, and Mistral. The performance of these models was evaluated using the AI2 Reasoning Challenge (ARC), HellaSwag, Massive Multitask Language Understanding (MMLU), TruthfulQA, Winogrande, and Grade School Math (GSM8k) datasets across metrics such as BLEU, ROUGE, METEOR, BLEURT, and BERTScore. The experimental results indicate that the most suitable Prompt Engineering technique can vary depending on the characteristics of each dataset. Specifically, for datasets emphasizing mathematical and logical reasoning, Prompt Engineering strategies centered around CoT, SSR, and ToT were found to be advantageous. For datasets focusing on natural language understanding, ICL-centric strategies were more effective, while RAG-based strategies were beneficial for datasets where factual accuracy is crucial. However, it was also observed that the optimal combination of Prompt Engineering techniques could differ depending on the specific LLM, indicating that fine-tuning the Prompt Engineering approach to the model and dataset is essential for achieving the best performance. The findings indicate that as LLMs become more advanced, their reliance on Prompt Engineering (PE) techniques diminishes, yet the magnitude of their performance improvement when PE strategies are applied increases. Furthermore, these advanced models tend to depend less on ICL techniques while exhibiting a greater reliance on RAG strategies. It is also evident that implementing RAG with PE-based preprocessing yields superior performance enhancements compared to the mere application of RAG on raw data.
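One finding worth unpacking is that RAG performs better when the retrieved text is itself preprocessed with a prompt rather than passed in raw. The sketch below contrasts the two setups; retrieve, ask_llm, and the condensation prompt are placeholders, not the paper's pipeline.

```python
# Contrast between plain RAG and RAG with a prompt-engineering preprocessing pass.
# Both the retriever and the LLM client are placeholders.

def ask_llm(prompt: str) -> str:
    """Placeholder for any LLM client."""
    raise NotImplementedError

def retrieve(query: str, k: int = 5) -> list[str]:
    """Placeholder for any retriever (e.g. BM25 or a vector index)."""
    raise NotImplementedError

def rag_raw(question: str) -> str:
    """Plain RAG: retrieved passages go into the prompt unmodified."""
    context = "\n".join(retrieve(question))
    return ask_llm("Context:\n" + context + f"\n\nQuestion: {question}\nAnswer:")

def rag_with_pe_preprocessing(question: str) -> str:
    """RAG with a PE preprocessing pass: each passage is first condensed to the
    facts relevant to the question, then used as context."""
    cleaned = [ask_llm(f"Extract only the facts relevant to '{question}':\n{p}")
               for p in retrieve(question)]
    return ask_llm("Context:\n" + "\n".join(cleaned) + f"\n\nQuestion: {question}\nAnswer:")
```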
15 pages, 3024 KB  
Article
Research on Intelligent Grading of Physics Problems Based on Large Language Models
by Yuhao Wei, Rui Zhang, Jianwei Zhang, Dizhi Qi and Wenqian Cui
Educ. Sci. 2025, 15(2), 116; https://doi.org/10.3390/educsci15020116 - 21 Jan 2025
Cited by 2 | Viewed by 2305
Abstract
The automation of educational and instructional assessment plays a crucial role in enhancing the quality of teaching management. In physics education, calculation problems with intricate problem-solving ideas pose challenges to the intelligent grading of tests. This study explores the automatic grading of physics problems through a combination of large language models and prompt engineering. By comparing the performance of four prompt strategies (one-shot, few-shot, chain of thought, tree of thought) within two large model frameworks, namely ERNIEBot-4-turbo and GPT-4o, this study finds that the tree of thought prompt can better assess calculation problems with complex ideas (N = 100, ACC ≥ 0.9, kappa > 0.8) and reduce the performance gap between different models. This research provides valuable insights for the automation of assessments in physics education.
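To make the tree-of-thought grading idea and the two reported metrics (accuracy and Cohen's kappa) concrete, the sketch below shows one possible prompt-and-scoring setup. The prompt wording and the grading rubric are illustrative assumptions; the paper's actual prompts are not reproduced here.

```python
# Illustrative tree-of-thought grading prompt plus the two agreement metrics the
# abstract reports. Prompt text and verdict parsing are assumptions.

from sklearn.metrics import accuracy_score, cohen_kappa_score

def ask_llm(prompt: str) -> str:
    """Placeholder for an ERNIEBot-4-turbo or GPT-4o client."""
    raise NotImplementedError

TOT_GRADING_PROMPT = """You are grading a physics calculation problem.
Problem: {problem}
Reference solution: {reference}
Student answer: {answer}

Explore three independent grading paths:
1. Check the physical model and the chosen equations.
2. Check the algebraic manipulation step by step.
3. Check the numerical substitution and the units.
State for each path whether the student is correct, then combine the paths
and end with a single line: FINAL: CORRECT or FINAL: INCORRECT."""

def grade(problem: str, reference: str, answer: str) -> int:
    """Return 1 if the model's final verdict is CORRECT, else 0."""
    verdict = ask_llm(TOT_GRADING_PROMPT.format(problem=problem,
                                                reference=reference,
                                                answer=answer))
    final_line = verdict.strip().splitlines()[-1].upper()
    return 0 if "INCORRECT" in final_line else 1

def agreement(model_grades: list[int], human_grades: list[int]) -> tuple[float, float]:
    """Accuracy and Cohen's kappa against human grading."""
    return (accuracy_score(human_grades, model_grades),
            cohen_kappa_score(human_grades, model_grades))
```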
