Length Instruction Fine-Tuning with Chain-of-Thought (LIFT-COT): Enhancing Length Control and Reasoning in Edge-Deployed Large Language Models
Abstract
1. Introduction
2. Related Work
2.1. Length Bias and Model Performance
2.2. Length-Aware Model Training Methods
3. AlpacaEval-LI & MT-Bench-LI: New Length-Instructed Benchmarks
3.1. Enhancing General Instructions with Length Constraints
3.1.1. Target Length
3.1.2. Baseline for Adhering to Length Instructions
3.1.3. Metrics
3.2. Length-Instructed AlpacaEval & Length-Instructed MT-Bench
4. The Dynamic Reasoning Method with COT
4.1. Introduction and Principle of COT
4.2. Design of the Dynamic Reasoning Framework
4.2.1. Components of the Framework
4.2.2. Workflow
5. Optimization of Problem Design and Algorithms
5.1. Optimization of Problem Design
5.1.1. Optimization Objectives
5.1.2. Constraint Conditions
- Constraints on CPU Usage and Memory Occupancy:
- Constraints on Bandwidth Occupancy and Data Transmission Delay:
- Accuracy Constraint Based on the Reference Text:
- Accuracy Constraint on Length-Instruction Adherence:
- Constraint on the Number of Generated Tokens:
- Constraint on the Strength of Chain-of-Thought Guidance:
- Constraint on the Effectiveness of the Length Control Strategy:
5.1.3. Decision Variables
5.1.4. Formulation of the Optimization Problem
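Only the outline of this subsection is reproduced here, but the constraint names above together with the parameters of Algorithm A2 (Appendix A) already suggest the overall shape of the program. The following is a hedged reconstruction in that notation, not the paper's exact formulation; the bounds N_max, s_min, and s_max are placeholders:

```latex
\begin{aligned}
\max_{x}\quad & P(x) \\
\text{s.t.}\quad
  & U_{\mathrm{CPU}}(x) \le U_{\mathrm{CPU\_max}}, \qquad M(x) \le M_{\mathrm{max}}, \\
  & B(x) \le B_{\mathrm{max}}, \qquad\qquad\quad\;\, D(x) \le D_{\mathrm{max}}, \\
  & \mathrm{Accuracy}(x) \ge \mathrm{Accuracy\_min}, \\
  & N_{\mathrm{tokens}}(x) \le N_{\mathrm{max}}, \qquad\; s_{\min} \le s \le s_{\max},
\end{aligned}
```

where x collects the decision variables (the COT guidance intensity s and the other length-control parameters) and P is the overall performance metric maximized in Algorithm A2.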
5.2. Optimization Algorithm Design
5.2.1. Algorithm Overview
5.2.2. Algorithm Flow
1. Initialization Phase:
2. Iterative Loop Phase:
   - (a) Generation Stage:
   - (b) Evaluation Stage:
     - Semantic Understanding Accuracy:
     - Accuracy of Precise Length Instruction Following:
   - (c) Comprehensive Resource Consumption Indicators:
     - CPU Usage Fluctuation Evaluation:
     - Peak Memory Occupancy Evaluation:
     - Dynamic Bandwidth Occupancy Evaluation:
     - Data Transmission Delay Jitter Evaluation:
     - Model Inference Stability Evaluation:
3. Feedback Adjustment Phase:
5.2.3. Length Control Strategies
5.2.4. Algorithm Termination Conditions
6. Experimental Setup and Results
6.1. Experimental Datasets and Benchmarks
6.2. Experimental Results and Analysis
6.3. Analysis of Comprehensive Performance Indicators
7. Conclusions
7.1. Limitations
7.2. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Appendix A
Algorithm A1. A Word Count Function We Use
```
Input:  text — a string
Output: word_count — the number of words in text, excluding punctuation

Procedure CountWords
BEGIN
    // Initialize an empty list to store words
    words ← []
    // Retrieve the set of all punctuation characters
    punctuation ← GetPunctuationSet()
    // Tokenize the input text into individual tokens
    tokenized_words ← WordTokenize(text)
    // Keep every token that is not a punctuation character
    FOR each word IN tokenized_words DO
        IF word NOT IN punctuation THEN
            words.APPEND(word)
        END IF
    END FOR
    // The word count is the number of retained tokens
    word_count ← LENGTH(words)
    RETURN word_count
END
```
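For concreteness, a minimal Python rendering of Algorithm A1. The tokenizer and punctuation set are our assumptions (NLTK's word_tokenize and Python's string.punctuation), since the pseudocode does not pin them down:

```python
import string

from nltk.tokenize import word_tokenize  # assumed tokenizer; requires nltk and its "punkt" model


def count_words(text: str) -> int:
    """Return the number of words in `text`, excluding pure punctuation tokens."""
    punctuation = set(string.punctuation)  # assumed punctuation set
    tokens = word_tokenize(text)
    # Keep only tokens that are not punctuation characters.
    words = [tok for tok in tokens if tok not in punctuation]
    return len(words)


print(count_words("Yes, practice regularly with a good pen and paper."))  # -> 9
```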
Algorithm A2. Optimize_LLM_Deployment
```
Procedure Optimize_LLM_Deployment
BEGIN
    // Initialize parameters
    U_CPU_max     ← maximum CPU usage
    M_max         ← maximum memory occupancy
    B_max         ← maximum bandwidth occupancy
    D_max         ← maximum data transmission delay
    Accuracy_min  ← minimum acceptable accuracy
    epsilon       ← convergence threshold
    T_max         ← maximum number of iterations
    x             ← decision variables (COT guidance intensity s and other length-control parameters)
    P_prev        ← 0        // previous performance metric
    t             ← 0        // iteration counter

    // Load the pre-trained Large Language Model and the datasets
    model ← load_pretrained_LLM()
    train_data, test_data ← load_data()

    // Main loop
    WHILE True DO
        // Generation phase: generate outputs with the current parameters
        generated_outputs ← model.generate(test_data, x)

        // Evaluation phase: count outputs that satisfy all constraints
        N_new ← evaluate_outputs(generated_outputs, test_data,
                                 U_CPU_max, M_max, B_max, D_max)

        // Calculate the overall performance metric P
        P ← calculate_performance_metric(N_new, generated_outputs)

        // Terminate on convergence or when the iteration budget is reached
        IF check_termination(P, P_prev, epsilon, t, T_max) THEN
            BREAK
        END IF

        // Feedback and adjustment phase: update parameters along the performance gradient
        gradient ← calculate_gradient(P, x)
        x ← update_parameters(x, gradient, t)
        t ← t + 1
        P_prev ← P
    END WHILE

    // Output the optimized parameters and the final performance
    RETURN x, N_new, P
END
```
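A runnable Python skeleton of this loop is sketched below. Everything model-specific is replaced by toy stand-ins: the quality surrogate, the learning rate, and the finite-difference gradient are our assumptions, since Algorithm A2 leaves calculate_gradient and update_parameters abstract:

```python
import numpy as np

EPSILON, T_MAX = 1e-4, 100  # convergence threshold and iteration budget (illustrative values)


def performance(x: np.ndarray) -> float:
    """Toy stand-in for generation + evaluation: a smooth surrogate for the metric P.

    The surrogate peaks when the CoT guidance intensity x[0] is near 0.5.
    """
    return float(np.mean(np.clip(1.0 - (x - 0.5) ** 2, 0.0, 1.0)))


def optimize_llm_deployment(x0: np.ndarray, lr: float = 0.1, h: float = 1e-3):
    """Generation -> evaluation -> feedback loop from Algorithm A2 (sketch)."""
    x, p_prev, t = x0.astype(float), 0.0, 0
    while True:
        p = performance(x)
        # Terminate on convergence or when the iteration budget is spent.
        if abs(p - p_prev) < EPSILON or t >= T_MAX:
            return x, p
        # Finite-difference estimate of dP/dx (the paper does not fix a method).
        grad = np.array([(performance(x + h * e) - p) / h for e in np.eye(len(x))])
        x = x + lr * grad  # gradient ascent: P is maximized
        p_prev, t = p, t + 1


x_opt, p_opt = optimize_llm_deployment(np.array([0.1, 0.3]))
print(f"optimized decision variables: {x_opt}, performance P: {p_opt:.4f}")
```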
Appendix B
- Scatter plots of AlpacaEval-LI.
- Scatter plots of MT-Bench-LI.
| Description | Instruction | <MAX_LEN> | Generation | Length |
|---|---|---|---|---|
| Just length instruction added | Answer the following instruction using <MAX_LEN> words or less. I'm trying to teach myself to have nicer handwriting. Can you help? | 20 | Yes, practice regularly with a good pen and paper, and focus on making each letter clear and consistent. | 18 |
| | | 40 | Yes, I can help. Start by selecting a pen that feels comfortable in your hand. Then, practice writing letters and words in a clear and consistent manner. Use a guide or template if needed to help improve... | 49 |
| | | 80 | Sure, I can help. Start by analyzing your current handwriting and identifying areas for improvement. Use a pen with a smooth flow and comfortable grip. Practice writing each letter and word slowly and deliberately... | 79 |
| Length instruction with COT introduced | Answer the following instruction using <MAX_LEN> words or less. I'm trying to teach myself to have nicer handwriting. Can you help? Let's think step by step. | 20 | Practice slowly, focus on clarity & consistency. Use a guide if needed. Be patient. | 14 |
| | | 40 | Yes, start with basic strokes and letters. Practice regularly, focusing on consistency and neatness. Use a pen that's comfortable to hold. Be patient and don't give up. | 27 |
| | | 80 | Certainly! Start by identifying the areas you want to improve, such as letter spacing, size, or shape. Practice writing slowly and deliberately, focusing on the individual strokes and curves of each letter. Use a guide or template to help you maintain consistency. Gradually increase your speed and confidence as you become more comfortable with the new habits. Remember, it takes time and practice to develop a pleasing handwriting style, so be patient and keep at it! | 77 |
| Original input without any length guidance | I'm trying to teach myself to have nicer handwriting. Can you help? | no word limit | Yes, I can definitely help you with improving your handwriting. Here are some tips to get you started: 1. Start with basic strokes: Practice writing straight lines, curves, and circles to improve your control and consistency. 2. Use a guide: A handwriting workbook or guide can help you learn proper letter formation and spacing... | 156 |
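The prompts behind this table are simple string wrappers around the original instruction. A minimal sketch: the wording is taken verbatim from the table, while the function names are ours:

```python
def length_instructed_prompt(instruction: str, max_len: int) -> str:
    """Prepend a plain length instruction (rows 1-3 of the table)."""
    return f"Answer the following instruction using {max_len} words or less. {instruction}"


def lift_cot_prompt(instruction: str, max_len: int) -> str:
    """Length instruction plus the chain-of-thought trigger (rows 4-6 of the table)."""
    return f"{length_instructed_prompt(instruction, max_len)} Let's think step by step."


instruction = "I'm trying to teach myself to have nicer handwriting. Can you help?"
print(lift_cot_prompt(instruction, 20))
```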
AlpacaEval-LI results. Vlt (%) is the length-violation rate (responses exceeding the target length); Acc (%) = 100 − Vlt is the length-following accuracy; Words is the average response length in words.

| Standard Models | Vlt (%) | Acc (%) | Words |
|---|---|---|---|
| ERNIE Bot 3.5 + LIFT-COT | 35.7 | 64.3 | 163 |
| ERNIE Bot 4.0 + LIFT-COT | 5.7 | 94.3 | 105 |
| ERNIE Bot 4.0 Turbo + LIFT-COT | 4.7 | 95.3 | 100 |
| Doubao + LIFT-COT | 0.9 | 99.1 | 64 |
| Mistral Large2 + LIFT-COT | 15.6 | 84.4 | 168 |
| Codestral + LIFT-COT | 5.9 | 94.1 | 104 |
| Mistral Nemo + LIFT-COT | 2.9 | 97.1 | 66 |
| Kimi + LIFT-COT | 9.6 | 90.4 | 137 |
| Tongyi Qianwen 2.5 + LIFT-COT | 5.7 | 94.3 | 131 |
| iFlytek Spark + LIFT-COT | 10.3 | 89.7 | 137 |
| AlpacaEval-LI baseline model (without COT) | 6.2 | 93.8 | 180 |
MT-Bench-LI results (same metrics as above).

| Standard Models | Vlt (%) | Acc (%) | Words |
|---|---|---|---|
| ERNIE Bot 3.5 + LIFT-COT | 76.2 | 23.8 | 265 |
| ERNIE Bot 4.0 + LIFT-COT | 15.4 | 84.6 | 131 |
| ERNIE Bot 4.0 Turbo + LIFT-COT | 64.6 | 35.4 | 217 |
| Doubao + LIFT-COT | 7.9 | 92.1 | 79 |
| Mistral Large2 + LIFT-COT | 59.2 | 40.8 | 234 |
| Codestral + LIFT-COT | 49.6 | 50.4 | 193 |
| Mistral Nemo + LIFT-COT | 34.2 | 65.8 | 176 |
| Kimi + LIFT-COT | 66.2 | 33.8 | 207 |
| Tongyi Qianwen 2.5 + LIFT-COT | 30.4 | 69.6 | 160 |
| iFlytek Spark + LIFT-COT | 23.7 | 76.3 | 145 |
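Given per-response generations and target lengths, the three columns can be recomputed directly. A sketch reusing the count_words helper from the Algorithm A1 snippet above, on hypothetical (response, max_len) pairs:

```python
def length_metrics(pairs: list[tuple[str, int]]) -> dict[str, float]:
    """Compute Vlt (%), Acc (%), and mean Words over (response, max_len) pairs."""
    counts = [count_words(response) for response, _ in pairs]
    # A violation is a response whose word count exceeds its target length.
    violations = sum(c > limit for c, (_, limit) in zip(counts, pairs))
    vlt = 100.0 * violations / len(pairs)
    return {"Vlt": vlt, "Acc": 100.0 - vlt, "Words": sum(counts) / len(counts)}


pairs = [("Yes, practice regularly with a good pen and paper.", 20)]
print(length_metrics(pairs))  # -> {'Vlt': 0.0, 'Acc': 100.0, 'Words': 9.0}
```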