LLM Fine-Tuning: Concepts, Opportunities, and Challenges
Abstract
1. Introduction
2. Core Concepts of LLM Fine-Tuning
2.1. Key Techniques of Fine-Tuning in LLMs
2.2. Evolution of LLM Fine-Tuning
- The Foundations of Fine-Tuning (2017–2018): From Pre-training to Initial Comprehension
- Advancing Fine-Tuning (2019–2024): Task-Specific Adaptation and Enhanced Comprehension
- Breakthrough Phase (December 2024–): TFT and the Self-Evolved Comprehension
3. Applications and Opportunities
3.1. Applications
3.2. Opportunities
4. Challenges and Future Directions
4.1. Challenges
- Challenges in Model Behavior and Task Adaptation
- Challenges of Resource Efficiency and Scalability
- Challenges of Interpretability and Trustworthiness
- Ethical Risks and Implications of Advanced Fine-Tuning Techniques
4.2. Future Directions
5. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Turing, A.M. Computing machinery and intelligence. Mind 1950, 59, 433–460. [Google Scholar]
- Russell, S.J.; Norvig, P. Artificial Intelligence: A Modern Approach; Pearson: Sydney, Australia, 2016. [Google Scholar]
- Sutton, R.S.; Barto, A.G. Reinforcement Learning: An Introduction; MIT Press: Cambridge, MA, USA, 1998; Volume 1. [Google Scholar]
- Schmidhuber, J. Formal theory of creativity, fun, and intrinsic motivation (1990–2010). IEEE Trans. Auton. Ment. Dev. 2010, 2, 230–247. [Google Scholar]
- Newell, A.; Simon, H.A. Computer science as empirical inquiry: Symbols and search. In ACM Turing Award Lectures; Association for Computing Machinery: New York, NY, USA, 2007; p. 1975. Available online: https://dl.acm.org/doi/10.1145/360018.360022 (accessed on 30 March 2025).
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [PubMed]
- Schaeffer, R.; Miranda, B.; Koyejo, S. Are emergent abilities of large language models a mirage? Adv. Neural Inf. Process. Syst. 2023, 36, 55565–55581. [Google Scholar]
- Buttazzo, G. Artificial consciousness: Utopia or real possibility? Computer 2001, 34, 24–30. [Google Scholar]
- Brown, T.; Mann, B.; Ryder, N.; Subbiah, M.; Kaplan, J.D.; Dhariwal, P.; Neelakantan, A.; Shyam, P.; Sastry, G.; Askell, A.; et al. Language models are few-shot learners. Adv. Neural Inf. Process. Syst. 2020, 33, 1877–1901. [Google Scholar]
- Kahneman, D. Thinking, Fast and Slow; Macmillan: London, UK, 2011. [Google Scholar]
- Gadamer, H.G. Aesthetics and hermeneutics. In The Continental Aesthetics Reader; Routledge: Oxfordshire, UK, 1960; pp. 181–186. [Google Scholar]
- Clark, A.; Karmiloff-Smith, A. The cognizer’s innards: A psychological and philosophical perspective on the development of thought. Mind Lang. 1993, 8, 487–519. [Google Scholar]
- Gadamer, H.G. Philosophical Hermeneutics; University of California Press: Berkeley, CA, USA, 1977. [Google Scholar]
- Heidegger, M. Being and Time; Macquarrie, J.; Robinson, E., Translators; Harper and Row: New York, NY, USA, 1962. Available online: http://pdf-objects.com/files/Heidegger-Martin-Being-and-Time-trans.-Macquarrie-Robinson-Blackwell-1962.pdf (accessed on 6 March 2025).
- Kintsch, W. Comprehension: A Paradigm for Cognition; Cambridge University Press: Cambridge, UK, 1998. [Google Scholar]
- Ricoeur, P. Interpretation Theory: Discourse and the Surplus of Meaning; TCU Press: Fort Worth, TX, USA, 1976. [Google Scholar]
- Bender, E.M.; Gebru, T.; McMillan-Major, A.; Shmitchell, S. On the dangers of stochastic parrots: Can language models be too big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, Online, 3–10 March 2021; pp. 610–623. [Google Scholar]
- Howard, J.; Ruder, S. Universal language model fine-tuning for text classification. arXiv 2018, arXiv:1801.06146. [Google Scholar]
- Vaswani, A. Attention is all you need. In Proceedings of the 31st Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
- Devlin, J. Bert: Pre-training of deep bidirectional transformers for language understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Lake, B.; Baroni, M. Generalization without systematicity: On the compositional skills of sequence-to-sequence recurrent networks. In Proceedings of the International Conference on Machine Learning, PMLR, Stockholm, Sweden, 10–15 July 2018; pp. 2873–2882. [Google Scholar]
- Petroni, F.; Rocktäschel, T.; Lewis, P.; Bakhtin, A.; Wu, Y.; Miller, A.H.; Riedel, S. Language models as knowledge bases? arXiv 2019, arXiv:1909.01066. [Google Scholar]
- Radford, A.; Wu, J.; Child, R.; Luan, D.; Amodei, D.; Sutskever, I. Language models are unsupervised multitask learners. OpenAI Blog 2019, 1, 9. [Google Scholar]
- Bender, E.M.; Koller, A. Climbing towards NLU: On meaning, form, and understanding in the age of data. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, Online, 5–7 July 2020; pp. 5185–5198. [Google Scholar]
- Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the limits of transfer learning with a unified text-to-text transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
- Zhao, W.X.; Zhou, K.; Li, J.; Tang, T.; Wang, X.; Hou, Y.; Min, Y.; Zhang, B.; Zhang, J.; Dong, Z.; et al. A survey of large language models. arXiv 2023, arXiv:2303.18223. [Google Scholar]
- Susnjak, T.; Hwang, P.; Reyes, N.H.; Barczak, A.L.; McIntosh, T.R.; Ranathunga, S. Automating research synthesis with domain-specific large language model fine-tuning. ACM Trans. Knowl. Discov. Data 2024, 19, 1–39. [Google Scholar] [CrossRef]
- Mallery, J.C.; Hurwitz, R.; Duffy, G. Hermeneutics: From Textual Explication to Computer Understanding? 1986. Available online: https://www.researchgate.net/publication/2769192_Hermeneutics_From_Textual_Explication_to_Computer_Understanding (accessed on 4 March 2025).
- Chen, M.; Herrera, F.; Hwang, K. Cognitive computing: Architecture, technologies and intelligent applications. IEEE Access 2018, 6, 19774–19783. [Google Scholar] [CrossRef]
- Liu, J.; Hao, Y.; He, Z.; Chen, M.; Hu, L.; Wei, G. BigFiberNet: LLMs and Fabric Computing Empowered Large-scale Non-disturbance Mobile Sensing Networks. IEEE Netw. 2024. Available online: https://ieeexplore.ieee.org/document/10804831 (accessed on 2 March 2025).
- Liu, J.; Chen, M. FaGeL: Fabric LLMs Agent empowered Embodied Intelligence Evolution with Autonomous Human-Machine Collaboration. arXiv 2024, arXiv:2412.20297. [Google Scholar]
- Qu, G.; Chen, Q.; Wei, W.; Lin, Z.; Chen, X.; Huang, K. Mobile edge intelligence for large language models: A contemporary survey. IEEE Commun. Surv. Tutor. 2025. Available online: https://arxiv.org/html/2407.18921v2 (accessed on 2 March 2025).
- Wang, F.Y.; Miao, Q.; Li, X.; Wang, X.; Lin, Y. What does ChatGPT say: The DAO from algorithmic intelligence to linguistic intelligence. IEEE/CAA J. Autom. Sin. 2023, 10, 575–579. [Google Scholar] [CrossRef]
- Liu, Y.; Wu, F.; Liu, Z.; Wang, K.; Wang, F.; Qu, X. Can language models be used for real-world urban-delivery route optimization? Innovation 2023, 4, 100520. [Google Scholar]
- Wu, H.; Chen, X.; Huang, K. Resource management for low-latency cooperative fine-tuning of foundation models at the network edge. arXiv 2024, arXiv:2407.09873. [Google Scholar]
- White, J.; Fu, Q.; Hays, S.; Sandborn, M.; Olea, C.; Gilbert, H.; Elnashar, A.; Spencer-Smith, J.; Schmidt, D.C. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv 2023, arXiv:2302.11382. [Google Scholar]
- Lewis, P.; Perez, E.; Piktus, A.; Petroni, F.; Karpukhin, V.; Goyal, N.; Küttler, H.; Lewis, M.; Yih, W.t.; Rocktäschel, T.; et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Adv. Neural Inf. Process. Syst. 2020, 33, 9459–9474. [Google Scholar]
- Lv, K.; Yang, Y.; Liu, T.; Gao, Q.; Guo, Q.; Qiu, X. Full parameter fine-tuning for large language models with limited resources. arXiv 2023, arXiv:2306.09782. [Google Scholar]
- Houlsby, N.; Giurgiu, A.; Jastrzebski, S.; Morrone, B.; De Laroussilhe, Q.; Gesmundo, A.; Attariyan, M.; Gelly, S. Parameter-efficient transfer learning for NLP. In Proceedings of the International Conference on Machine Learning, PMLR, Long Beach, CA, USA, 9–15 June 2019; pp. 2790–2799. [Google Scholar]
- Han, Z.; Gao, C.; Liu, J.; Zhang, J.; Zhang, S.Q. Parameter-efficient fine-tuning for large models: A comprehensive survey. arXiv 2024, arXiv:2403.14608. [Google Scholar]
- Valipour, M.; Rezagholizadeh, M.; Kobyzev, I.; Ghodsi, A. Dylora: Parameter efficient tuning of pre-trained models using dynamic search-free low-rank adaptation. arXiv 2022, arXiv:2210.07558. [Google Scholar]
- Zhang, S.; Dong, L.; Li, X.; Zhang, S.; Sun, X.; Wang, S.; Li, J.; Hu, R.; Zhang, T.; Wu, F.; et al. Instruction tuning for large language models: A survey. arXiv 2023, arXiv:2308.10792. [Google Scholar]
- Christiano, P.F.; Leike, J.; Brown, T.; Martic, M.; Legg, S.; Amodei, D. Deep reinforcement learning from human preferences. Adv. Neural Inf. Process. Syst. 2017, 30, 4302–4310. Available online: https://dl.acm.org/doi/pdf/10.5555/3294996.3295184 (accessed on 4 March 2025).
- Stiennon, N.; Ouyang, L.; Wu, J.; Ziegler, D.; Lowe, R.; Voss, C.; Radford, A.; Amodei, D.; Christiano, P.F. Learning to summarize with human feedback. Adv. Neural Inf. Process. Syst. 2020, 33, 3008–3021. [Google Scholar]
- Radford, A. Improving Language Understanding by Generative Pre-Training. 2018. Available online: https://cdn.openai.com/research-covers/language-unsupervised/language_understanding_paper.pdf (accessed on 6 March 2025).
- Ding, N.; Lv, X.; Wang, Q.; Chen, Y.; Zhou, B.; Liu, Z.; Sun, M. Sparse low-rank adaptation of pre-trained language models. arXiv 2023, arXiv:2311.11696. [Google Scholar]
- Zadouri, T.; Üstün, A.; Ahmadian, A.; Ermiş, B.; Locatelli, A.; Hooker, S. Pushing mixture of experts to the limit: Extremely parameter efficient moe for instruction tuning. arXiv 2023, arXiv:2309.05444. [Google Scholar]
- Lin, Y.; Ma, X.; Chu, X.; Jin, Y.; Yang, Z.; Wang, Y.; Mei, H. Lora dropout as a sparsity regularizer for overfitting control. arXiv 2024, arXiv:2404.09610. [Google Scholar]
- Pfeiffer, J.; Kamath, A.; Rücklé, A.; Cho, K.; Gurevych, I. Adapterfusion: Non-destructive task composition for transfer learning. arXiv 2020, arXiv:2005.00247. [Google Scholar]
- Liu, X.; Ji, K.; Fu, Y.; Tam, W.L.; Du, Z.; Yang, Z.; Tang, J. P-tuning v2: Prompt tuning can be comparable to fine-tuning universally across scales and tasks. arXiv 2021, arXiv:2110.07602. [Google Scholar]
- Li, X.L.; Liang, P. Prefix-tuning: Optimizing continuous prompts for generation. arXiv 2021, arXiv:2101.00190. [Google Scholar]
- Ma, F.; Zhang, C.; Ren, L.; Wang, J.; Wang, Q.; Wu, W.; Quan, X.; Song, D. Xprompt: Exploring the extreme of prompt tuning. arXiv 2022, arXiv:2210.04457. [Google Scholar]
- Li, J.; Aitken, W.; Bhambhoria, R.; Zhu, X. Prefix propagation: Parameter-efficient tuning for long sequences. arXiv 2023, arXiv:2305.12086. [Google Scholar]
- Huang, J.; Lin, F.; Yang, J.; Wang, X.; Ni, Q.; Wang, Y.; Tian, Y.; Li, J.; Wang, F. From prompt engineering to generative artificial intelligence for large models: The state of the art and perspective. Chin. J. Intell. Sci. Technol. 2024, 6, 115–133. [Google Scholar]
- Wei, J.; Bosma, M.; Zhao, V.Y.; Guu, K.; Yu, A.W.; Lester, B.; Du, N.; Dai, A.M.; Le, Q.V. Finetuned language models are zero-shot learners. arXiv 2021, arXiv:2109.01652. [Google Scholar]
- Chung, H.W.; Hou, L.; Longpre, S.; Zoph, B.; Tay, Y.; Fedus, W.; Li, Y.; Wang, X.; Dehghani, M.; Brahma, S.; et al. Scaling instruction-finetuned language models. J. Mach. Learn. Res. 2024, 25, 1–53. [Google Scholar]
- Wu, T.; Zhu, B.; Zhang, R.; Wen, Z.; Ramchandran, K.; Jiao, J. Pairwise proximal policy optimization: Harnessing relative feedback for llm alignment. arXiv 2023, arXiv:2310.00212. [Google Scholar]
- Rafailov, R.; Sharma, A.; Mitchell, E.; Manning, C.D.; Ermon, S.; Finn, C. Direct preference optimization: Your language model is secretly a reward model. Adv. Neural Inf. Process. Syst. 2024, 36, 53728–53741. [Google Scholar]
- Gulcehre, C.; Le Paine, T.; Srinivasan, S.; Konyushkova, K.; Weerts, L.; Sharma, A.; Siddhant, A.; Ahern, A.; Wang, M.; Gu, C.; et al. Reinforced self-training (rest) for language modeling. arXiv 2023, arXiv:2308.08998. [Google Scholar]
- Yuan, Z.; Yuan, H.; Tan, C.; Wang, W.; Huang, S.; Huang, F. Rrhf: Rank responses to align language models with human feedback without tears. arXiv 2023, arXiv:2304.05302. [Google Scholar]
- Luong, T.Q.; Zhang, X.; Jie, Z.; Sun, P.; Jin, X.; Li, H. Reft: Reasoning with reinforced fine-tuning. arXiv 2024, arXiv:2401.08967. [Google Scholar]
- Liu, J.; Wang, Y.; Lin, Z.; Chen, M.; Hao, Y.; Hu, L. Natural Language Fine-Tuning. arXiv 2024, arXiv:2412.20382. [Google Scholar]
- Guo, D.; Yang, D.; Zhang, H.; Song, J.; Zhang, R.; Xu, R.; Zhu, Q.; Ma, S.; Wang, P.; Bi, X.; et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv 2025, arXiv:2501.12948. [Google Scholar]
- Muennighoff, N.; Yang, Z.; Shi, W.; Li, X.L.; Fei-Fei, L.; Hajishirzi, H.; Zettlemoyer, L.; Liang, P.; Candès, E.; Hashimoto, T. s1: Simple test-time scaling. arXiv 2025, arXiv:2501.19393. [Google Scholar]
- Yuan, J.; Gao, H.; Dai, D.; Luo, J.; Zhao, L.; Zhang, Z.; Xie, Z.; Wei, Y.X.; Wang, L.; Xiao, Z.; et al. Native Sparse Attention: Hardware-Aligned and Natively Trainable Sparse Attention. arXiv 2025, arXiv:2502.11089. [Google Scholar]
- Sriram, A.; Jun, H.; Satheesh, S.; Coates, A. Cold fusion: Training seq2seq models together with language models. arXiv 2017, arXiv:1708.06426. [Google Scholar]
- Lewis, M. Bart: Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension. arXiv 2019, arXiv:1910.13461. [Google Scholar]
- Guo, D.; Zhu, Q.; Yang, D.; Xie, Z.; Dong, K.; Zhang, W.; Chen, G.; Bi, X.; Wu, Y.; Li, Y.; et al. DeepSeek-Coder: When the Large Language Model Meets Programming–The Rise of Code Intelligence. arXiv 2024, arXiv:2401.14196. [Google Scholar]
- DeepSeek vs. OpenAI, xAI, and Anthropic: A Comparative Evaluation by FlagEval. 2025. Available online: https://hub.baai.ac.cn/view/43898 (accessed on 6 March 2025).
- Broughel, J. The Tradeoffs Between Energy Efficiency, Consumer Preferences, and Economic Growth; The Center for Growth and Opportunity: Logan, UT, USA, 2025. [Google Scholar]
- Liu, R.; Gao, J.; Zhao, J.; Zhang, K.; Li, X.; Qi, B.; Ouyang, W.; Zhou, B. Can 1B LLM Surpass 405B LLM? Rethinking Compute-Optimal Test-Time Scaling. arXiv 2025, arXiv:2502.06703. [Google Scholar]
- George, A.; Jose, A.; Ashik, F.; Prabhakar, P.; Pati, P.B.; Parida, S. Enhancing Legal Decision Making: WRIT Case Outcome Prediction with LegalBERT Embeddings and AdaBoost Classifier. In Proceedings of the 2024 IEEE International Conference on Contemporary Computing and Communications (InC4), Bangalore, India, 15–16 March 2024; IEEE: New York, NY, USA, 2024; Volume 1, pp. 1–6. [Google Scholar]
- Tian, Y.; Lin, F.; Li, Y.; Zhang, T.; Zhang, Q.; Fu, X.; Huang, J.; Dai, X.; Wang, Y.; Tian, C.; et al. UAVs Meet LLMs: Overviews and Perspectives Toward Agentic Low-Altitude Mobility. arXiv 2025, arXiv:2501.02341. [Google Scholar]
- Tang, Y.; Han, X.; Li, X.; Yu, Q.; Hao, Y.; Hu, L.; Chen, M. Minigpt-3d: Efficiently aligning 3d point clouds with large language models using 2d priors. In Proceedings of the 32nd ACM International Conference on Multimedia, Melbourne, Australia, 28 October–1 November 2024; pp. 6617–6626. [Google Scholar]
- Elshin, D.; Karpachev, N.; Gruzdev, B.; Golovanov, I.; Ivanov, G.; Antonov, A.; Skachkov, N.; Latypova, E.; Layner, V.; Enikeeva, E.; et al. From general LLM to translation: How we dramatically improve translation quality using human evaluation data for LLM finetuning. In Proceedings of the Ninth Conference on Machine Translation, Miami, FL, USA, 12–13 November 2024; pp. 247–252. [Google Scholar]
- Wang, X.; Zhou, W.; Zu, C.; Xia, H.; Chen, T.; Zhang, Y.; Zheng, R.; Ye, J.; Zhang, Q.; Gui, T.; et al. Instructuie: Multi-task instruction tuning for unified information extraction. arXiv 2023, arXiv:2304.08085. [Google Scholar]
- Gupta, P.; Jiao, C.; Yeh, Y.T.; Mehri, S.; Eskenazi, M.; Bigham, J.P. InstructDial: Improving zero and few-shot generalization in dialogue through instruction tuning. arXiv 2022, arXiv:2205.12673. [Google Scholar]
- Bražinskas, A.; Nallapati, R.; Bansal, M.; Dreyer, M. Efficient few-shot fine-tuning for opinion summarization. arXiv 2022, arXiv:2205.02170. [Google Scholar]
- Varia, S.; Wang, S.; Halder, K.; Vacareanu, R.; Ballesteros, M.; Benajiba, Y.; John, N.A.; Anubhai, R.; Muresan, S.; Roth, D. Instruction tuning for few-shot aspect-based sentiment analysis. arXiv 2022, arXiv:2210.06629. [Google Scholar]
- Bill, D.; Eriksson, T. Fine-Tuning an LLM Using Reinforcement Learning from Human Feedback for a Therapy Chatbot Application. 2023. Available online: https://www.diva-portal.org/smash/get/diva2:1782678/FULLTEXT01.pdf (accessed on 3 March 2025).
- Maharjan, J.; Garikipati, A.; Singh, N.P.; Cyrus, L.; Sharma, M.; Ciobanu, M.; Barnes, G.; Thapa, R.; Mao, Q.; Das, R. OpenMedLM: Prompt engineering can out-perform fine-tuning in medical question-answering with open-source large language models. Sci. Rep. 2024, 14, 14156. [Google Scholar] [CrossRef]
- Lehman, E.; Johnson, A. Clinical-T5: Large language models built using MIMIC clinical text. PhysioNet 2023, 101, 215–220. [Google Scholar]
- Ross, E.; Kansal, Y.; Renzella, J.; Vassar, A.; Taylor, A. Supervised Fine-Tuning LLMs to Behave as Pedagogical Agents in Programming Education. arXiv 2025, arXiv:2502.20527. [Google Scholar]
- Gao, L.; Lu, J.; Shao, Z.; Lin, Z.; Yue, S.; Ieong, C.; Sun, Y.; Zauner, R.J.; Wei, Z.; Chen, S. Fine-tuned large language model for visualization system: A study on self-regulated learning in education. IEEE Trans. Vis. Comput. Graph. 2025, 31, 514–524. [Google Scholar] [CrossRef]
- Latif, E.; Zhai, X. Fine-tuning ChatGPT for automatic scoring. Comput. Educ. Artif. Intell. 2024, 6, 100210. [Google Scholar]
- Iacovides, G.; Konstantinidis, T.; Xu, M.; Mandic, D. FinLlama: LLM-Based Financial Sentiment Analysis for Algorithmic Trading. In Proceedings of the 5th ACM International Conference on AI in Finance, Brooklyn, NY, USA, 14–17 November 2024; pp. 134–141. [Google Scholar]
- Chen, W.; Wang, Q.; Long, Z.; Zhang, X.; Lu, Z.; Li, B.; Wang, S.; Xu, J.; Bai, X.; Huang, X.; et al. Disc-finllm: A chinese financial large language model based on multiple experts fine-tuning. arXiv 2023, arXiv:2310.15205. [Google Scholar]
- Ni, S.; Cheng, H.; Yang, M. Pre-training, Fine-tuning and Re-ranking: A Three-Stage Framework for Legal Question Answering. In Proceedings of the ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Hyderabad, India, 6–11 April 2025; IEEE: New York, NY, USA, 2025; pp. 1–5. [Google Scholar]
- Satterfield, N.; Holbrook, P.; Wilcox, T. Fine-tuning llama with case law data to improve legal domain performance. OSF 2024. [Google Scholar] [CrossRef]
- Guan, C.; Chin, A.; Vahabi, P. Enhancing news summarization with elearnfit through efficient in-context learning and efficient fine-tuning. arXiv 2024, arXiv:2405.02710. [Google Scholar]
- Wang, Z.; Cheng, J.; Cui, C.; Yu, C. Implementing BERT and fine-tuned RobertA to detect AI generated news by ChatGPT. arXiv 2023, arXiv:2306.07401. [Google Scholar]
- Wang, Y.; Zhang, Z.; Wang, J.; Fan, D.; Xu, Z.; Liu, L.; Hao, X.; Bhat, V.; Li, X. GEXIA: Granularity Expansion and Iterative Approximation for Scalable Multi-grained Video-language Learning. arXiv 2024, arXiv:2412.07704. [Google Scholar]
- Lee, S.H.; Wang, J.; Fan, D.; Zhang, Z.; Liu, L.; Hao, X.; Bhat, V.; Li, X. NowYouSee Me: Context-Aware Automatic Audio Description. arXiv 2024, arXiv:2412.10002. [Google Scholar]
- Li, Z.; Chen, C.; Xu, T.; Qin, Z.; Xiao, J.; Sun, R.; Luo, Z.Q. Entropic distribution matching in supervised fine-tuning of LLMs: Less overfitting and better diversity. arXiv 2024, arXiv:2408.16673. [Google Scholar]
- Lin, H.; Huang, B.; Ye, H.; Chen, Q.; Wang, Z.; Li, S.; Ma, J.; Wan, X.; Zou, J.; Liang, Y. Selecting large language model to fine-tune via rectified scaling law. arXiv 2024, arXiv:2402.02314. [Google Scholar]
- Ash, M.G. Gestalt Psychology in German Culture, 1890–1967: Holism and the Quest for Objectivity; Cambridge University Press: Cambridge, UK, 1998. [Google Scholar]
- Chen, J.; Zhang, A.; Shi, X.; Li, M.; Smola, A.; Yang, D. Parameter-efficient fine-tuning design spaces. arXiv 2023, arXiv:2301.01821. [Google Scholar]
- Hu, Z.; Wang, L.; Lan, Y.; Xu, W.; Lim, E.P.; Bing, L.; Xu, X.; Poria, S.; Lee, R.K.W. Llm-adapters: An adapter family for parameter-efficient fine-tuning of large language models. arXiv 2023, arXiv:2304.01933. [Google Scholar]
- Zhou, H.; Wan, X.; Vulić, I.; Korhonen, A. Autopeft: Automatic configuration search for parameter-efficient fine-tuning. Trans. Assoc. Comput. Linguist. 2024, 12, 525–542. [Google Scholar] [CrossRef]
- Zhu, K.; Zhao, Q.; Chen, H.; Wang, J.; Xie, X. Promptbench: A unified library for evaluation of large language models. J. Mach. Learn. Res. 2024, 25, 1–22. [Google Scholar]
- Jebali, M.S.; Valanzano, A.; Murugesan, M.; Veneri, G.; De Magistris, G. Leveraging the Regularizing Effect of Mixing Industrial and Open Source Data to Prevent Overfitting of LLM Fine Tuning. In Proceedings of the International Joint Conference on Artificial Intelligence 2024 Workshop on AI Governance: Alignment, Morality, and Law, Jeju, Republic of Korea, 3–9 August 2024. [Google Scholar]
- Liu, T.; Dong, Z.; Zhang, L.; Wang, H.; Gao, J. Mitigating Heterogeneous Token Overfitting in LLM Knowledge Editing. arXiv 2025, arXiv:2502.00602. [Google Scholar]
- Schramowski, P.; Turan, C.; Andersen, N.; Rothkopf, C.A.; Kersting, K. Large pre-trained language models contain human-like biases of what is right and wrong to do. Nat. Mach. Intell. 2022, 4, 258–268. [Google Scholar] [CrossRef]
- Luo, W.; Keung, J.W.; Yang, B.; Ye, H.; Goues, C.L.; Bissyande, T.F.; Tian, H.; Le, B. When Fine-Tuning LLMs Meets Data Privacy: An Empirical Study of Federated Learning in LLM-Based Program Repair. arXiv 2024, arXiv:2412.01072. [Google Scholar]
- Giannini, F.; Franzè, G.; Pupo, F.; Fortino, G. A sustainable multi-agent routing algorithm for vehicle platoons in urban networks. IEEE Trans. Intell. Transp. Syst. 2023, 24, 14830–14840. [Google Scholar] [CrossRef]
- Tran, K.T.; Dao, D.; Nguyen, M.D.; Pham, Q.V.; O’Sullivan, B.; Nguyen, H.D. Multi-Agent Collaboration Mechanisms: A Survey of LLMs. arXiv 2025, arXiv:2501.06322. [Google Scholar]
Data Usage Method | Key Techniques/Models | Release Time | Technical Focus | Developer
---|---|---|---|---
SFT | Traditional SFT: BERT [20], GPT [45] | 2018 | Directly fine-tunes models on annotated datasets | Google, OpenAI
SFT | Full Fine-tuning [25] | 2020 | Data quality strongly affects performance; risk of overfitting | Raffel et al. (Google)
SFT | LoRA: DyLoRA [41], AdaLoRA [46], MoLORA [47], LoRA Dropout [48] | 2022, 2023, 2023, 2024 | Approaches full fine-tuning performance through low-rank decomposition | Valipour et al.
SFT | Adapter-based: Serial Adapter [39], AdapterFusion [49] | 2019, 2020 | Small adapter parameter budgets, well suited to multi-task scenarios | Houlsby et al.; Pfeiffer et al.
SFT | Prompt Tuning: P-Tuning v2 [50], Prefix-Tuning [51], XPrompt [52], Prefix Propagation [53], Prompt Engineering [54] | 2021, 2021, 2022, 2023, 2024 | Suited to few-shot scenarios; sensitive to task complexity | Liu et al.
SFT | FLAN (Fine-tuned Language Net): FLAN [55], FLAN-T5 [56] | 2021, 2024 | Enhances model generalization across tasks | Wei et al. (Google)
RLHF | PPO (Proximal Policy Optimization) [39,57] | 2023 | Introduces trust-region constraints to prevent excessive policy updates | Wu et al.
RLHF | DPO (Direct Preference Optimization) [58] | 2024 | Optimizes the policy directly on preference data, bypassing an explicit reward model | Rafailov et al.
RLHF | ReST (Reinforced Self-Training) [59] | 2023 | Iteratively generates high-quality samples to fine-tune the model | Gulcehre et al.
RLHF | RRHF (Rank Responses to align Human Feedback) [60] | 2023 | Aligns with human preferences via a ranking loss, without reinforcement learning | Yuan et al.
RFT | ReFT [61] | 2024 | Explores multiple CoT paths to optimize non-differentiable objectives | Luong et al. (ByteDance)
RFT | RFT | 2024 | Uses reinforcement fine-tuning to build expert models from high-quality tasks | OpenAI
RFT | Grok-3 | 2025 | Self-correction mechanism and reinforcement learning capability | xAI
TFT | NLFT [62] | 2024 (Dec.) | Compares token probabilities under different prompts, using natural language as the supervisory signal | Liu et al. (EPIC Lab)
TFT | DeepSeek-R1 [63] | 2025 (Jan.) | Releases distilled models for low-cost, high-performance inference | DeepSeek-AI
TFT | TTS [64] | 2025 (Feb.) | Introduces budget forcing for effective scaling at test time (details undisclosed) | Stanford (Fei-Fei Li’s team)
TFT | NSA (Native Sparse Attention) [65] | 2025 (Feb.) | Hardware-aligned sparse attention improves training and inference efficiency | DeepSeek-AI
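Several of the PEFT rows above (DyLoRA, AdaLoRA, MoLORA, LoRA Dropout) build on the same low-rank decomposition idea: the pre-trained weight matrix stays frozen while a trainable update ΔW = BA of small rank r is learned. The following PyTorch sketch illustrates that mechanism only; the class name, the rank r = 8, the scaling α = 16, and the 768-dimensional layer are illustrative assumptions, not values taken from the cited papers.

```python
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    """Wraps a frozen linear layer with a trainable low-rank update:
    y = base(x) + (alpha / r) * x @ A^T @ B^T, where A is (r, d_in) and B is (d_out, r)."""
    def __init__(self, base: nn.Linear, r: int = 8, alpha: int = 16):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # pre-trained weights stay frozen
        self.lora_A = nn.Parameter(torch.randn(r, base.in_features) * 0.01)
        self.lora_B = nn.Parameter(torch.zeros(base.out_features, r))  # zero init: no change at step 0
        self.scaling = alpha / r

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.base(x) + (x @ self.lora_A.T @ self.lora_B.T) * self.scaling

# Usage: wrap a projection layer; only lora_A and lora_B receive gradients.
layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 10, 768))
```

Because B is zero-initialized, the wrapped layer reproduces the frozen model exactly at the start of fine-tuning, and only the r(d_in + d_out) adapter parameters are trained, which is what lets these methods approach full fine-tuning quality at a fraction of the cost.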
Domain Type | Domain Target | Domain-Specific Instruction Fine-Tuned LLMs | Fine-Tuning Techniques | Base Model
---|---|---|---|---
General | Machine translation | YandexMT-GPT [75] | SFT | YandexGPT
General | Information extraction | InstructUIE [76] | Multi-task Instruction Tuning | FlanT5
General | Dialogue | InstructDial [77] | Instruction Tuning | T0
General | Text summarization | FewShotSummarizer [78] | Adapter | BART
General | Sentiment analysis | bangla-bert [79] | SFT | BERT
General | Chatbot | PsychRLHF-Chatbot [80] | RLHF | Llama-7b
Medical | Medical Q&A | OpenMedLM [81] | Prompt tuning | OS Yi 34B
Medical | Clinical decision support | Clinical-T5 [82] | SFT | T5
Education | Pedagogical agents | GuideLM [83] | SFT | ChatGPT-4o
Education | Educational question generation | EduQG [84] | RFT | Google FLAN-T5
Education | Automatic scoring | EduScoreGPT [85] | Task Fine-tuning | GPT-3.5
Finance | Risk assessment | FinLlama [86] | LoRA | Llama2 7B
Finance | Automated customer support | DISC-FinLLM [87] | RLHF | Baichuan-13B
Law | Legal consultation | RoBERTa-LQA [88] | Task Fine-tuning | RoBERTa
Law | Legal document analysis | FTLlama3-Legal [89] | SFT | Llama3
Journalism | News summarization | LLaMa2-EFit [90] | LoRA | LLaMa2
Journalism | Fake news detection | FT-RoBERTa-FND [91] | SFT | RoBERTa, BERT
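Most of the SFT-based systems in this table follow the same supervised recipe: concatenate a domain instruction with its reference answer and minimize next-token cross-entropy on the answer span only. The sketch below shows that loss masking in PyTorch; the function name, the -100 ignore-index convention (borrowed from common Hugging Face practice), and the toy tensor shapes are assumptions for illustration, not details from the cited works.

```python
import torch
import torch.nn.functional as F

def sft_loss(logits: torch.Tensor, input_ids: torch.Tensor, prompt_len: int) -> torch.Tensor:
    """Next-token cross-entropy over the response span only; prompt tokens are masked out."""
    labels = input_ids.clone()
    labels[:, :prompt_len] = -100             # ignore instruction/prompt tokens in the loss
    shift_logits = logits[:, :-1, :]          # position t predicts token t+1
    shift_labels = labels[:, 1:]
    return F.cross_entropy(
        shift_logits.reshape(-1, shift_logits.size(-1)),
        shift_labels.reshape(-1),
        ignore_index=-100,
    )

# Usage with any causal LM that produces [batch, seq, vocab] logits (toy values here):
logits = torch.randn(2, 12, 32000, requires_grad=True)
input_ids = torch.randint(0, 32000, (2, 12))
loss = sft_loss(logits, input_ids, prompt_len=5)
loss.backward()  # an optimizer step on the model's trainable parameters would follow
```

Masking the prompt keeps the model from being trained to regenerate its own instructions, so gradient signal concentrates on the domain-specific responses that distinguish, say, a legal or clinical assistant from the base model.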
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).