Tiny Language Models for Automation and Control: Overview, Potential Applications, and Future Research Directions
Abstract
1. Introduction
1.1. Background
1.2. Paper Motivation
1.3. Main Contributions
- Comprehensive Survey on Tiny Language Models: This paper presents the first comprehensive survey focused on TLMs designed specifically for NLP applications on resource-limited devices.
- Comparison of TLM Architectures and Optimization Techniques: This survey provides an in-depth comparison of TLM architectures, covering model sizes, structures, and performance across applications, together with a detailed analysis of how TLMs are adapted to resource constraints through techniques such as knowledge distillation, quantization, and pruning.
- Discussion of Potential TLM Applications in Real-World Scenarios: This survey explores potential applications of TLMs across real-world scenarios, with a particular focus on automation and control.
- Exploration of Challenges and Future Research Directions: This survey addresses current limitations of TLMs, offering a discussion on emerging solutions and future research directions, including hybrid compression strategies, development of domain-specific TLMs, and context-aware adaptations for varied hardware and deployment settings.
1.4. Survey Structure
2. Techniques for Reducing Model Size
2.1. Knowledge Distillation
Algorithm 1: Knowledge Distillation
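The knowledge distillation procedure summarized by Algorithm 1 trains a compact student to match both the ground-truth labels and the teacher's softened output distribution. The snippet below is a minimal PyTorch sketch of logit-level distillation under assumed hyperparameters (temperature `T`, mixing weight `alpha`); it is an illustrative example rather than the survey's exact algorithm.

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend the soft-target KL loss (teacher -> student) with the hard-label cross-entropy."""
    # Soften both distributions with temperature T; scale by T^2 so gradient magnitudes stay comparable.
    soft_loss = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    hard_loss = F.cross_entropy(student_logits, labels)
    return alpha * soft_loss + (1.0 - alpha) * hard_loss

# Typical training step (teacher frozen, student trainable):
# with torch.no_grad():
#     teacher_logits = teacher(inputs)
# student_logits = student(inputs)
# loss = distillation_loss(student_logits, teacher_logits, labels)
# loss.backward(); optimizer.step()
```

In practice, the temperature is usually set between 1 and 4, and `alpha` controls how strongly the student relies on the teacher's soft targets versus the hard labels.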
2.2. Quantization
- Low-Precision Floating-Point Formats: This approach uses lower-precision floating-point numbers instead of the typical 32-bit floating-point (FP32) representation, thus reducing memory usage while preserving a wide range of representable values. Common formats include FP16 and Bfloat16, each utilizing 16 bits [57]. A floating-point number can be expressed as $x = (-1)^{s} \times 1.m \times 2^{e - \text{bias}}$, where $s$ is the sign bit, $m$ the mantissa (fraction), and $e$ the stored exponent.
- Fixed-Point Representation: Here, numbers are stored with a fixed number of digits before and after a binary point [58], leading to faster, simpler calculations. Fixed-point representation, as opposed to floating-point, fixes the position of the binary point, making it ideal for hardware applications. In general, a fixed-point number is defined by $x = M \times 2^{-f}$, where $M$ is a signed integer and $f$ is the number of fractional bits.
- Binarization and Ternarization: These extreme quantization methods limit parameters to two or three discrete values, respectively. Binarization [59] represents weights as either −1 or +1, computed by $w_b = \operatorname{sign}(w)$, i.e., $+1$ if $w \geq 0$ and $-1$ otherwise. Ternarization [60] maps weights to values of −1, 0, or +1, and is defined by $w_t = +1$ if $w > \Delta$, $0$ if $|w| \leq \Delta$, and $-1$ if $w < -\Delta$, for a small positive threshold $\Delta$.
- Logarithmic Quantization: This technique restricts values to powers of two, which allows for efficient storage and computation by employing simple bit-shift operations [61]. Logarithmic quantization is expressed as $w_q = \operatorname{sign}(w) \times 2^{\operatorname{round}(\log_2 |w|)}$. The quantized value is scaled afterward to approximate the original parameter, providing an effective solution for energy-sensitive applications.
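As a concrete illustration of the quantization schemes above, the following PyTorch sketch implements symmetric fixed-point (INT8) quantization, binarization, ternarization, and logarithmic quantization as standalone functions. It uses simple per-tensor scaling and no calibration or quantization-aware training, so it should be read as a didactic sketch rather than a production quantizer.

```python
import torch

def quantize_int8_symmetric(w):
    """Symmetric fixed-point quantization: w ~= scale * q with q an 8-bit integer in [-127, 127]."""
    scale = w.abs().max().clamp(min=1e-8) / 127.0
    q = torch.clamp(torch.round(w / scale), -127, 127).to(torch.int8)
    return q, scale  # dequantize with q.float() * scale

def binarize(w):
    """Binarization: keep only the sign of each weight, rescaled by alpha = mean(|w|)."""
    alpha = w.abs().mean()
    return alpha * torch.sign(w)

def ternarize(w, delta_ratio=0.7):
    """Ternarization: map weights to {-1, 0, +1} using a threshold delta on |w|."""
    delta = delta_ratio * w.abs().mean()
    return torch.sign(w) * (w.abs() > delta).float()

def log_quantize(w):
    """Logarithmic quantization: round |w| to the nearest power of two and keep the sign."""
    exponent = torch.round(torch.log2(w.abs().clamp(min=1e-12)))
    return torch.sign(w) * torch.pow(2.0, exponent)
```

Applying any of these to a model's weight tensors reduces storage, while the retained scaling factor (or dequantization step) approximates the original parameter values at inference time.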
2.3. Pruning and Sparsification
- Structured Pruning: Removes entire channels, neurons, or layers.
- Unstructured Pruning: Removes individual weights based on their magnitude.
- Pruning function: in magnitude-based pruning, each weight is kept or zeroed according to a threshold $\tau$: $\hat{w}_i = w_i$ if $|w_i| \geq \tau$, and $\hat{w}_i = 0$ otherwise.
Algorithm 2: Pruning Process
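A minimal sketch of unstructured magnitude pruning, in the spirit of the pruning function and Algorithm 2 above, is given below (PyTorch assumed). The target sparsity and the prune-then-fine-tune schedule noted in the comment are illustrative choices, not prescriptions from the survey.

```python
import torch

def magnitude_prune(weight, sparsity=0.5):
    """Unstructured magnitude pruning: zero out the smallest-magnitude weights until the target sparsity is reached."""
    k = int(sparsity * weight.numel())
    if k == 0:
        return weight, torch.ones_like(weight)
    threshold = weight.abs().flatten().kthvalue(k).values  # k-th smallest magnitude
    mask = (weight.abs() > threshold).float()
    return weight * mask, mask

# Typical workflow: prune -> fine-tune with the mask held fixed -> optionally prune again
# (iterative pruning), which usually recovers most of the accuracy lost at each step.
```

Structured pruning follows the same principle but scores and removes whole rows, columns, attention heads, or layers instead of individual weights, which maps more directly onto speedups on commodity hardware.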
2.4. Efficient Architectures
Algorithm 3: Efficient Architecture Design
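Many of the efficient architectures surveyed later replace full multi-head attention with grouped-query attention (GQA), in which several query heads share a single key/value head, shrinking the key/value projections and cache. The sketch below is an illustrative PyTorch implementation of that idea (causal masking and rotary embeddings omitted for brevity); it is not a reproduction of Algorithm 3, and the dimensions are assumed values.

```python
import torch
import torch.nn as nn

class GroupedQueryAttention(nn.Module):
    """Grouped-query attention: n_heads query heads share n_kv_heads key/value heads."""
    def __init__(self, d_model=2048, n_heads=32, n_kv_heads=8):
        super().__init__()
        assert n_heads % n_kv_heads == 0
        self.n_heads, self.n_kv_heads = n_heads, n_kv_heads
        self.head_dim = d_model // n_heads
        self.q_proj = nn.Linear(d_model, n_heads * self.head_dim, bias=False)
        self.k_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)  # smaller KV projections
        self.v_proj = nn.Linear(d_model, n_kv_heads * self.head_dim, bias=False)
        self.o_proj = nn.Linear(n_heads * self.head_dim, d_model, bias=False)

    def forward(self, x):
        b, t, _ = x.shape
        q = self.q_proj(x).view(b, t, self.n_heads, self.head_dim).transpose(1, 2)
        k = self.k_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        v = self.v_proj(x).view(b, t, self.n_kv_heads, self.head_dim).transpose(1, 2)
        # Repeat each KV head so it is shared by a group of query heads.
        groups = self.n_heads // self.n_kv_heads
        k = k.repeat_interleave(groups, dim=1)
        v = v.repeat_interleave(groups, dim=1)
        attn = torch.softmax(q @ k.transpose(-2, -1) / self.head_dim ** 0.5, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(b, t, -1)
        return self.o_proj(out)
```

Multi-query attention (MQA) is the extreme case with a single key/value head, while setting n_kv_heads equal to n_heads recovers standard multi-head attention; the model comparison table later in the paper indicates which variant each TLM adopts.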
3. Overview of Tiny Language Models
3.1. Definition and Importance of TLMs
3.2. Key Characteristics of Tiny Language Models
3.2.1. Size and Complexity
3.2.2. Efficiency
3.2.3. Flexibility
3.3. Benefits of Tiny Language Models
3.4. Architecture of TLMs
Transformer for TLMs
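Compact decoder-only TLMs commonly combine pre-normalization with RMSNorm [88] and SiLU-gated (SwiGLU-style) feed-forward blocks [84], as reflected in the activation column of the model comparison table. The sketch below gives minimal PyTorch versions of these two building blocks under assumed dimensions; the exact wiring, hidden sizes, and positional encoding differ across the TLMs surveyed here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RMSNorm(nn.Module):
    """Root-mean-square normalization: a cheaper alternative to LayerNorm (no mean subtraction, no bias)."""
    def __init__(self, dim, eps=1e-6):
        super().__init__()
        self.weight = nn.Parameter(torch.ones(dim))
        self.eps = eps

    def forward(self, x):
        rms = torch.rsqrt(x.pow(2).mean(dim=-1, keepdim=True) + self.eps)
        return self.weight * x * rms

class SwiGLUFeedForward(nn.Module):
    """SiLU-gated feed-forward block: down(silu(gate(x)) * up(x)), widely used in small decoder-only models."""
    def __init__(self, dim=2048, hidden=5632):
        super().__init__()
        self.gate = nn.Linear(dim, hidden, bias=False)
        self.up = nn.Linear(dim, hidden, bias=False)
        self.down = nn.Linear(hidden, dim, bias=False)

    def forward(self, x):
        return self.down(F.silu(self.gate(x)) * self.up(x))
```

A full TLM decoder layer would typically wrap an attention block and this feed-forward block, each preceded by RMSNorm and followed by a residual connection.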
3.5. Common Training Datasets for TLMs
- RefinedWeb [89]: A high-quality dataset sourced from CommonCrawl, carefully filtered to ensure the retention of valuable web content.
- CulturaX [90]: A comprehensive multilingual dataset spanning 167 languages, designed for cross-cultural and multilingual model training.
- FineWeb-Edu [91]: An educationally focused dataset derived from the broader FineWeb corpus, specifically curated for instructional content.
- The Pile [92]: A diverse mixture of smaller datasets encompassing multiple domains, making it a foundational resource for pretraining.
- RedPajama [93]: A dataset comprising over 100 billion text documents, extracted from 84 CommonCrawl snapshots and processed via the CCNet pipeline.
- Cosmopedia [94]: A synthetic dataset featuring textbooks, stories, blogs, WikiHow articles, and posts, generated using the Mixtral-8x7B-Instruct-v0.1 model.
- CCNewsV2 (RoBERTa [95]): An updated version of the English text from the CommonCrawl News corpus, curated for training robust models such as RoBERTa.
- WuDaoCorpora [96]: A massive Chinese corpus with approximately 3 trillion tokens and 1.08 trillion Chinese characters, designed for large-scale language modeling.
- DCLM-baseline [97]: A pretraining corpus built from Common Crawl data and released with effective pretraining recipes based on the OpenLM framework, together with evaluations across 53 downstream tasks.
- Dolma [98]: An English-language corpus that employs MinHash algorithms for deduplication both within and across datasets.
- StarCoder [99]: A domain-specific dataset focused on Python programming language tokens, suitable for code-related language modeling tasks.
- PushShift.io Reddit [100]: A social media archive containing Reddit data collected since 2015, aimed at enabling social media analysis and research.
3.6. Popular Tiny Language Models
3.6.1. Transformer-Based Encoder-Only Models
3.6.2. Transformer-Based Decoder-Only Models
3.6.3. Transformer-Based Encoder-Decoder Models
3.6.4. Multimodal Models
3.6.5. Specialized Architectures
3.7. Discussion
3.7.1. Optimization Strategies and Architectural Innovations
3.7.2. Deployment and Adaptability Across Diverse Environments
- MobileBERT achieves a 4× speedup over BERT on mobile CPUs by incorporating bottleneck transformers and low-rank matrix decomposition (a low-rank factorization sketch follows this list).
- TeenyTinyLlama demonstrates the potential of ultra-compact models for on-device AI assistants, reducing latency by over 60% compared to standard lightweight models.
- MobileVLM, a multimodal variant of TLMs, extends small-scale transformer models to handle vision-language tasks, making it particularly useful for augmented reality (AR) and real-time visual assistance.
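The MobileBERT entry above references low-rank matrix decomposition; the sketch below illustrates the general technique of factorizing a dense linear layer via truncated SVD into two thinner layers (PyTorch assumed). The rank and layer sizes are illustrative and are not taken from MobileBERT itself.

```python
import torch
import torch.nn as nn

def low_rank_factorize(linear: nn.Linear, rank: int) -> nn.Sequential:
    """Replace a dense Linear layer with two thinner ones using truncated SVD: W ~= U_r S_r V_r^T."""
    W = linear.weight.data                                   # shape: (out_features, in_features)
    U, S, Vh = torch.linalg.svd(W, full_matrices=False)
    first = nn.Linear(linear.in_features, rank, bias=False)
    second = nn.Linear(rank, linear.out_features, bias=linear.bias is not None)
    first.weight.data = (torch.diag(S[:rank]) @ Vh[:rank]).contiguous()   # (rank, in_features)
    second.weight.data = U[:, :rank].contiguous()                         # (out_features, rank)
    if linear.bias is not None:
        second.bias.data = linear.bias.data.clone()
    return nn.Sequential(first, second)

# Example: a 768x768 projection (~590k parameters) factorized at rank 128 becomes
# a 128x768 layer followed by a 768x128 layer (~197k parameters); the approximation
# error is usually recovered with a short fine-tuning pass.
```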
Part 1
Model Name | Date | Hidden Size | Size | Layer Number | Head Num | Attention | Activation | Vocab. Size | Max Context Window | Open Training Datasets
---|---|---|---|---|---|---|---|---|---|---
Pythia [27] | 2023 | 768 | 160 M | 12 | 12 | MHA | GELU | 50k | 2k | ✔ |
1024 | 410 M | 24 | 16 | |||||||
2048 | 1 B | 16 | 8 | |||||||
2048 | 1.4 B | 24 | 16 | |||||||
2560 | 2.8 B | 32 | 32 | |||||||
Bloomz [120] | 2022 | 1536 | 1.1 B | MHA | ||||||
1024 | 560 M | 16 | GELU, tanh | 251k | 2k | ✔ | ||||
Bloom [121] | 2022 | 1024 | 560 M | 24 | 16 | MHA | GELU, tanh | 251k | 2k | ✔ |
1536 | 1.1 B | 24 | ||||||||
OPT [122] | 2022 | 768 | 125 M | 12 | 12 | MHA | ReLU | 50k | 2k | ✔ |
1024 | 350 M | 24 | 16 | |||||||
2048 | 1.3 B | 24 | 32 | |||||||
2560 | 2.7 B | 32 | 32 | |||||||
Cerebras-GPT [114] | 2023 | 768 | 111 M | 10 | 12 | MHA | GELU | 50k | 2k | ✔ |
1088 | 256 M | 14 | 17 | |||||||
1536 | 590 M | 18 | 12 | |||||||
2048 | 1.3 B | 24 | 16 | |||||||
2560 | 2.7 B | 32 | 32 | |||||||
Galactica [123] | 2022 | 768 | 125 M | 12 | 12 | MHA | GELU | 50k | 2k | |
2048 | 1.3 B | 24 | 32 | |||||||
Phi-3.5-mini | 2024 | 3072 | 2.7 B | 32 | 32 | MHA | SiLU | 32k | 4k | |
Phi-3-mini [124] | 2024 | 3072 | 3.8 B | 32 | 32 | MHA | SiLU | 32k | 4k | |
Phi-2 [125] | 2023 | 2560 | 2.7 B | 32 | 32 | MHA | GELU, tanh | 51k | 2k | |
Phi-1.5 [126] | 2023 | 2048 | 1.3 B | 24 | 32 | MHA | GELU, tanh | 51k | ||
Phi-1 [127] | 2023 | 2048 | 1.3 B | 24 | 32 | MHA | GELU, tanh | 51k | 2k | |
StableLM-2-zephyr [29] | 2024 | 2048 | 1.6 B | 24 | 32 | MHA | SiLU | 100k | 4k | ✔ |
StableLM-zephyr [128] | 2023 | 2560 | 3 B | 32 | 32 | MHA | SiLU | 50k | 1k | ✔ |
MobileLLaMA [129] | 2023 | 2048 | 1.4 B | 24 | 16 | GQA | SiLU | 32k | 2k | ✔
TinyLlama [34] | 2023 | 2048 | 1.1 B | 22 | 32 | GQA | SiLU | 32k | 2k | ✔ |
MobiLlama [106] | 2024 | 2048 | 0.5 B | 22 | 32 | GQA | SiLU | 32k | 2k | ✔ |
1 B | ||||||||||
Gemma [36] | 2024 | 2048 | 2 B | 18 | 8 | MQA | GELU | 256k | 8k | |
recurrentGemma [130] | 2024 | 2560 | 2 B | 26 | 10 | MQA | GELU, tanh | 256k | 8k | |
Gemma-2 [131] | 2024 | 2304 | 2 B | 26 | 8 | GQA | GELU, tanh | 256k | 8k | |
LaMini-GPT [132] | 2023 | 1280 | 774 M | 36 | 20 | MHA | GELU, tanh | 50k | 1k | |
1600 | 1.5 B | 48 | 25 | |||||||
MiniCPM3 [133] | 2024 | 2560 | 4 B | 62 | 40 | MLA | SiLU | 73k | ||
MiniCPM [41] | 2024 | 1536 | 1 B | 52 | 24 | GQA | SiLU | 73k | 128k | |
2304 | 2 B | 40 | 36 | 123k | 131k | |||||
Part 2
Model Name | Date | Layer Number | Hidden Size | Size | Head Num | Attention | Activation | Vocab. Size | Max Context Window | Open Training Datasets
---|---|---|---|---|---|---|---|---|---|---
Toyota DCLM [134] | 2024 | 24 | 2048 | 1.4 B | 16 | MHA | SiLU | 50k | 50k | ✔ |
SmolLM [104] | 2024 | 30 | 576 | 135 M | 9 | GQA | SiLU | 49k | 2k | ✔ |
32 | 960 | 360 M | 15 | GQA | 49k | 2k | ✔ | |||
24 | 2048 | 1.7 B | 32 | MHA | ||||||
AllenAI OLMo [35] | 2024 | 16 | 2048 | 1.18 B | 16 | MHA | SiLU | 50k | 50k | ✔ |
OpenELM [40] | 2024 | 16 | 1280 | 270 M | 12–20 | GQA | SiLU | 32k | 2k | ✔ |
20 | 1536 | 450 M | 12–24 | 32k | 2k | ✔ | ||||
28 | 2048 | 1.1 B | 16–32 | 32k | 2k | ✔ | ||||
36 | 3072 | 3 B | 12–24 | 32k | 2k | ✔ | ||||
DataBricks Dolly-v2 [28] | 2023 | 32 | 2560 | 3 B | 32 | MHA | GELU | 50k | 2k | |
Danube3 [135] | 2024 | 16 | 1536 | 0.5 B | 16 | GQA | SiLU | 32k | 8k | |
24 | 3840 | 4 B | 32 | SiLU | 32k | 8k | ||||
Qwen 2.5 [136] | 2024 | 24 | 896 | 0.5 B | 14 | GQA | SiLU | 152k | 32k | |
28 | 1536 | 1.5 B | 12 | 152k | 32k | |||||
36 | 2048 | 3 B | 16 | 152k | 32k | |||||
Qwen 2 [137] | 2024 | 24 | 2048 | 1.8 B | 16 | MHA | SiLU | 152k | 32k | |
40 | 2560 | 4 B | 20 | 32k | ||||||
Qwen 1.5 [33] | 2024 | 24 | 1024 | 0.5 B | 16 | MHA | SiLU | 152k | 32k | |
Qwen 1 [138] | 2023 | 24 | 2048 | 1.8 B | 16 | MHA | SiLU | 152k | 8k | |
Fox [139] | 2024 | 32 | 2048 | 1.6 B | 16 | GQA | SiLU | 32k | 8k |
3.7.3. TLMs for Linguistic and Regional Accessibility
- Chinese Tiny LLM leverages multi-stage distillation to retain Chinese linguistic richness while reducing parameter count by 80%, making it viable for government and business applications.
- IndicTLM, a small-scale NLP model trained on Indian languages, employs subword tokenization and corpus filtering to optimize performance across more than 10 Indian languages, achieving state-of-the-art results on low-resource benchmarks.
3.7.4. TLMs in Domain-Specific Applications
- Architext GPT-J-162M: Trained on architectural datasets, enabling automated floor plan generation based on textual inputs.
- CodeT5+: A lightweight model designed for code generation, debugging, and auto-completion, with quantized variants achieving a 2.5× acceleration in inference speed.
- SciTinyBERT: Fine-tuned on scientific literature, optimized for research paper summarization and question answering.
4. Potential and Emerging Applications of TLMs in Automation and Control
4.1. Edge Computing and IoT
4.2. Natural Language Interfaces in Industrial Systems
4.3. Diagnostics and Predictive Maintenance
5. Challenges and Limitations of Tiny Language Models
5.1. Trade-Off Between Size and Accuracy
5.2. Limited Generalization Capabilities
5.3. Data Efficiency and Training Constraints
5.4. Ethical and Security Concerns
5.5. Reliability in Real-Time Applications
6. Future Directions for TLMs in Automation
6.1. Hybrid and Adaptive Compression Techniques
6.2. Application-Specific TLMs
6.3. Context-Aware and Hardware-Specific Models
7. Conclusions and Future Directions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Zhao, W.X.; Qin, J.; Wang, Y.; Liu, Y.; Yang, Z.; Gao, Z. A Survey of Large Language Models. arXiv 2023, arXiv:2303.18223. [Google Scholar]
- Naveed, H.; Ali, A.; Rehman, K.; Hashmi, M.A. A Comprehensive Overview of Large Language Models. arXiv 2023, arXiv:2307.06435. [Google Scholar]
- Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar]
- Kalyan, K.S. A Survey of GPT-3 Family Large Language Models Including ChatGPT and GPT-4. Nat. Lang. Process. J. 2023, 6, 100048. [Google Scholar] [CrossRef]
- Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Zhou, Y.; Li, W.; Liu, P.J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 2020, 21, 1–67. [Google Scholar]
- Touvron, H.; Lavril, T.; Izacard, G.; Martinet, X.; Lachaux, M.A.; Lacroix, T.; Rozière, B.; Goyal, N.; Hambro, E.; Azhar, F.; et al. Llama: Open and Efficient Foundation Language Models. arXiv 2023, arXiv:2302.13971. [Google Scholar]
- Chowdhery, A.; Narang, S.; Devlin, J.; Bosma, M.; Mishra, G.; Roberts, A.; Barham, P.; Chung, H.W.; Sutton, C.; Gehrmann, S.; et al. Palm: Scaling Language Modeling with Pathways. J. Mach. Learn. Res. 2023, 24, 1–113. [Google Scholar]
- Li, J.; Tang, T.; Zhao, W.X.; Nie, J.Y.; Wen, J.R. Pre-Trained Language Models for Text Generation: A Survey. ACM Comput. Surv. 2024, 56, 1–39. [Google Scholar] [CrossRef]
- Bhat, M.M.; Meng, R.; Liu, Y.; Zhou, Y.; Yavuz, S. Investigating Answerability of LLMs for Long-Form Question Answering. arXiv 2023, arXiv:2309.08210. [Google Scholar]
- Jin, H.; Zhang, Y.; Meng, D.; Wang, J.; Tan, J. A Comprehensive Survey on Process-Oriented Automatic Text Summarization with Exploration of LLM-Based Methods. arXiv 2024, arXiv:2403.02901. [Google Scholar]
- Huang, H.; Wu, S.; Liang, X.; Wang, B.; Shi, Y.; Wu, P.; Zhao, T. Towards Making the Most of LLM for Translation Quality Estimation. In Proceedings of the CCF International Conference on Natural Language Processing and Chinese Computing, Foshan, China, 12–15 October 2023; Springer: Cham, Switzerland, 2023; pp. 375–386. [Google Scholar]
- Moujahid, H.; Boutahar, K.; El Gannour, O.; Saleh, S.; Cherradi, B.; El Abbassi, A. A Scoping Review of Large Language Models: Architecture and Applications. In Proceedings of the 2024 4th International Conference on Innovative Research in Applied Science, Engineering and Technology (IRASET), Fez, Morocco, 16–17 May 2024; pp. 1–7. [Google Scholar]
- Zhang, D.; Yu, Y.; Dong, J.; Li, C.; Su, D.; Chu, C.; Yu, D. MM-LLMs: Recent Advances in Multimodal Large Language Models. arXiv 2024, arXiv:2401.13601. [Google Scholar]
- Patil, R.; Gudivada, V. A Review of Current Trends, Techniques, and Challenges in Large Language Models (LLMs). Appl. Sci. 2024, 14, 2074. [Google Scholar] [CrossRef]
- Bai, G.; Chai, Z.; Ling, C.; Wang, S.; Lu, J.; Zhang, N.; Shi, T.; Yu, Z.; Zhu, M.; Zhang, Y.; et al. Beyond Efficiency: A Systematic Survey of Resource-Efficient Large Language Models. arXiv 2024, arXiv:2401.00625. [Google Scholar]
- Bhardwaj, S.; Singh, P.; Pandit, M.K. A Survey on the Integration and Optimization of Large Language Models in Edge Computing Environments. In Proceedings of the 2024 16th International Conference on Computer and Automation Engineering (ICCAE), Melbourne, Australia, 14–16 March 2024; pp. 168–172. [Google Scholar]
- Han, S.; Mao, H.; Dally, W.J. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv 2015, arXiv:1510.00149. [Google Scholar]
- Zhu, X.; Li, J.; Liu, Y.; Ma, C.; Wang, W. A Survey on Model Compression for Large Language Models. arXiv 2023, arXiv:2308.07633. [Google Scholar] [CrossRef]
- Xu, C.; McAuley, J. A Survey on Model Compression and Acceleration for Pretrained Language Models. In Proceedings of the AAAI Conference on Artificial Intelligence, Washington, DC, USA, 7–14 February 2023; Volume 37, pp. 10566–10575. [Google Scholar]
- Lamaakal, I.; Ouahbi, I.; El Makkaoui, K.; Maleh, Y.; Pławiak, P.; Alblehai, F. A TinyDL Model for Gesture-Based Air Handwriting Arabic Numbers and Simple Arabic Letters Recognition. IEEE Access 2024, 12, 76589–76605. [Google Scholar] [CrossRef]
- Lamaakal, I.; Maleh, Y.; Ouahbi, I.; El Makkaoui, K.; Abd El-Latif, A.A. A Deep Learning-Powered TinyML Model for Gesture-Based Air Handwriting Simple Arabic Letters Recognition. In Proceedings of the International Conference on Digital Technologies and Applications, Ningbo, China, 30–31 May 2024; Springer: Cham, Switzerland, 2024; pp. 32–42. [Google Scholar]
- Lamaakal, I.; El Mourabit, N.; El Makkaoui, K.; Ouahbi, I.; Maleh, Y. Efficient Gesture-Based Recognition of Tifinagh Characters in Air Handwriting with a TinyDL Model. In Proceedings of the 2024 Sixth International Conference on Intelligent Computing in Data Sciences (ICDS), Marrakech, Morocco, 23–24 October 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–8. [Google Scholar]
- Lamaakal, I.; El Makkaoui, K.; Ouahbi, I.; Maleh, Y. A TinyML Model for Gesture-Based Air Handwriting Arabic Numbers Recognition. Procedia Comput. Sci. 2024, 236, 589–596. [Google Scholar] [CrossRef]
- Tang, Y.; Liu, F.; Ni, Y.; Tian, Y.; Bai, Z.; Hu, Y.Q.; Liu, S.; Jui, S.; Han, K.; Wang, Y. Rethinking Optimization and Architecture for Tiny Language Models. arXiv 2024, arXiv:2402.02791. [Google Scholar]
- Samie, F.; Bauer, L.; Henkel, J. IoT Technologies for Embedded Computing: A Survey. In Proceedings of the Eleventh IEEE/ACM/IFIP International Conference on Hardware/Software Codesign and System Synthesis, Pittsburgh, PA, USA, 1–7 October 2016; pp. 1–10. [Google Scholar]
- Dey, N.; Gosal, G.; Khachane, H.; Marshall, W.; Pathria, R.; Tom, M.; Hestness, J. Cerebras-GPT: Open Compute-Optimal Language Models Trained on the Cerebras Wafer-Scale Cluster. arXiv 2023, arXiv:2304.03208. Available online: https://arxiv.org/abs/2304.03208 (accessed on 22 January 2025).
- Biderman, S.; Schoelkopf, H.; Anthony, Q.G.; Bradley, H.; O’Brien, K.; Hallahan, E.; Khan, M.A.; Purohit, S.; Prashanth, U.S.; Raff, E.; et al. Pythia: A Suite for Analyzing Large Language Models Across Training and Scaling. In Proceedings of the International Conference on Machine Learning, Zhuhai, China, 17–20 February 2023; pp. 2397–2430. [Google Scholar]
- DataBricks. Databricks/Dolly-v2-3b. Available online: https://huggingface.co/databricks/dolly-v2-3b (accessed on 25 January 2025).
- StabilityAI. Stabilityai/Stablelm-2-zephyr. Available online: https://huggingface.co/stabilityai/stablelm-2-zephyr-1_6b (accessed on 22 January 2025).
- Yang, K.; Zhang, T.; Kuang, Z.; Xie, Q.; Huang, J.; Ananiadou, S. MentaLLaMA: Interpretable Mental Health Analysis on Social Media with Large Language Models. In Proceedings of the ACM on Web Conference 2024, Singapore, 13–17 May 2024; pp. 4489–4500. [Google Scholar]
- Nguyen, T.D.; Ting, Y.S.; Ciucă, I.; O’Neill, C.; Sun, Z.C.; Jabłońska, M.; Schawinski, K. AstroLLaMA: Towards Specialized Foundation Models in Astronomy. arXiv 2023, arXiv:2309.06126. [Google Scholar]
- Bi, Z.; Zhang, N.; Xue, Y.; Ou, Y.; Ji, D.; Zheng, G.; Chen, H. OceanGPT: A Large Language Model for Ocean Science Tasks. arXiv 2023, arXiv:2310.02031. [Google Scholar]
- Alibaba. Qwen 1.5. Available online: https://huggingface.co/collections/Qwen/qwen15-65c0a2f577b1ecb76d786524 (accessed on 24 January 2025).
- Zhang, P.; Zeng, G.; Wang, T.; Lu, W. TinyLLama: An Open-Source Small Language Model. arXiv 2024, arXiv:2401.02385. Available online: https://arxiv.org/abs/2401.02385 (accessed on 22 January 2025).
- AllenAI. Allenai/OLMo-1B-hf. Available online: https://huggingface.co/allenai/OLMo-1B-hf (accessed on 25 January 2025).
- Mesnard, T.; Hardin, C.; Dadashi, R.; Bhupatiraju, S.; Pathak, S.; Sifre, L.; Rivière, M.; Kale, M.S.; Love, J.; Tafti, P.; et al. Gemma: Open Models Based on Gemini Research and Technology. arXiv 2024, arXiv:2403.08295. Available online: https://arxiv.org/abs/2403.08295 (accessed on 22 January 2025).
- Zhang, D.; Hu, Z.; Zhoubian, S.; Du, Z.; Yang, K.; Wang, Z.; Tang, J. SciInstruct: A Self-Reflective Instruction Annotated Dataset for Training Scientific Language Models. In Proceedings of the Thirty-Eighth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, Vancouver, BC, Canada, 10–15 December 2024. [Google Scholar]
- Zhang, D.; Liu, W.; Tan, Q.; Chen, J.; Yan, H.; Yan, Y.; Ouyang, W. ChemLLM: A Chemical Large Language Model. arXiv 2024, arXiv:2402.06852. [Google Scholar]
- Abdin, M.; Aneja, J.; Awadalla, H.; Awadallah, A.; Awan, A.A.; Bach, N.; Bahree, A.; Bakhtiari, A.; Bao, J.; Behl, H.; et al. Phi-3 Technical Report: A Highly Capable Language Model Locally on Your Phone. arXiv 2024, arXiv:2404.14219. Available online: https://arxiv.org/abs/2404.14219 (accessed on 22 January 2025).
- Mehta, S.; Sekhavat, M.H.; Cao, Q.; Horton, M.; Jin, Y.; Sun, C.; Mirzadeh, S.I.; Najibi, M.; Belenko, D.; Zatloukal, P.; et al. OpenELM: An Efficient Language Model Family with Open Training and Inference Framework. In Proceedings of the Workshop on Efficient Systems for Foundation Models II@ICML2024, Vienna, Austria, 26 July 2024. [Google Scholar]
- OpenBMB. MiniCPM. Available online: https://huggingface.co/openbmb/MiniCPM-V (accessed on 25 January 2025).
- Bangemann, T.; Karnouskos, S.; Camp, R.; Carlsson, O.; Riedl, M.; McLeod, S.; Harrison, R.; Colombo, A.W.; Stluka, P. State of the Art in Industrial Automation. In Industrial Cloud-Based Cyber-Physical Systems: The IMC-AESOP Approach; Springer: Cham, Switzerland, 2014; pp. 23–47. [Google Scholar]
- Wang, X.; Dang, T.; Kostakos, V.; Jia, H. Efficient and Personalized Mobile Health Event Prediction via Small Language Models. arXiv 2024, arXiv:2409.18987. [Google Scholar]
- Wu, C.K.; Cheng, C.T.; Uwate, Y.; Chen, G.; Mumtaz, S.; Tsang, K.F. State-of-the-Art and Research Opportunities for Next-Generation Consumer Electronics. IEEE Trans. Consum. Electron. 2022, 69, 937–948. [Google Scholar] [CrossRef]
- Fadhil, J.A.; Omar, O.A.; Sarhan, Q.I. A Survey on the Applications of Smart Home Systems. In Proceedings of the 2020 International Conference on Computer Science and Software Engineering (CSASE), Duhok, Iraq, 16–18 April 2020; pp. 168–173. [Google Scholar]
- McGuire, H.; Weigl, B.H. Medical Devices and Diagnostics for Cardiovascular Diseases in Low-Resource Settings. J. Cardiovasc. Transl. Res. 2014, 7, 737–748. [Google Scholar] [CrossRef]
- Wang, X.; Li, Y.; Kwok, K.W. A Survey for Machine Learning-Based Control of Continuum Robots. Front. Robot. AI 2021, 8, 730330. [Google Scholar] [CrossRef]
- Wang, F.; Zhang, Z.; Zhang, X.; Wu, Z.; Mo, T.; Lu, Q.; Wang, S. A Comprehensive Survey of Small Language Models in the Era of Large Language Models: Techniques, Enhancements, Applications, Collaboration with LLMs, and Trustworthiness. arXiv 2024, arXiv:2411.03350. [Google Scholar]
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
- Gu, Y.; Dong, L.; Wei, F.; Huang, M. Knowledge Distillation of Large Language Models. arXiv 2023, arXiv:2306.08543. [Google Scholar]
- Gu, Y.; Dong, L.; Wei, F.; Huang, M. MiniLLM: Knowledge Distillation of Large Language Models. In Proceedings of the Twelfth International Conference on Learning Representations, Vienna, Austria, 7–11 May 2024. [Google Scholar]
- Cho, J.H.; Hariharan, B. On the Efficacy of Knowledge Distillation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27 October–2 November 2019; pp. 4794–4802. [Google Scholar]
- Yao, Z.; Dong, Z.; Zheng, Z.; Gholami, A.; Yu, J.; Tan, E.; Wang, L.; Huang, Q.; Wang, Y.; Mahoney, M.; et al. Hawq-v3: Dyadic Neural Network Quantization. In Proceedings of the International Conference on Machine Learning, Virtual, 18–24 July 2021; pp. 11875–11886. [Google Scholar]
- Hubara, I.; Courbariaux, M.; Soudry, D.; El-Yaniv, R.; Bengio, Y. Binarized Neural Networks. In Advances in Neural Information Processing Systems, Proceedings of the Annual Conference on Neural Information Processing Systems 2016, Barcelona, Spain, 5–10 December 2016; Neural Information Processing Systems Foundation, Inc. (NeurIPS): La Jolla, CA, USA, 2016; Volume 29. [Google Scholar]
- Jin, R.; Du, J.; Huang, W.; Liu, W.; Luan, J.; Wang, B.; Xiong, D. A Comprehensive Evaluation of Quantization Strategies for Large Language Models. arXiv 2024, arXiv:2402.16775. [Google Scholar]
- Liu, P.; Liu, Z.; Gao, Z.F.; Gao, D.; Zhao, W.X.; Li, Y.; Ding, B.; Wen, J.R. Do Emergent Abilities Exist in Quantized Large Language Models: An Empirical Study. arXiv 2023, arXiv:2307.08072. [Google Scholar]
- Wang, S.; Kanwar, P. BFloat16: The Secret to High Performance on Cloud TPUs. Google Cloud Blog. 2021. Available online: https://cloud.google.com/blog/products/ai-machine-learning/bfloat16-the-secret-to-high-performance-on-cloud-tpus (accessed on 28 October 2024).
- Goyal, R.; Vanschoren, J.; van Acht, V.; Nijssen, S. Fixed-Point Quantization of Convolutional Neural Networks for Quantized Inference on Embedded Platforms. In Proceedings of the 33rd International Conference on Machine Learning, New York, NY, USA, 19–24 June 2021. [Google Scholar]
- Yuan, C.; Agaian, S.S. A Comprehensive Review of Binary Neural Network. Artif. Intell. Rev. 2023, 56, 12949–13013. [Google Scholar] [CrossRef]
- Liu, B.; Li, F.; Wang, X.; Zhang, B.; Yan, J. Ternary Weight Networks. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar] [CrossRef]
- Lee, E.H.; Miyashita, D.; Chai, E.; Murmann, B.; Wong, S.S. LogNet: Energy-Efficient Neural Networks Using Logarithmic Computation. In Proceedings of the ICASSP, IEEE International Conference on Acoustics, Speech and Signal Processing, New Orleans, LA, USA, 5–9 March 2017; pp. 5900–5904. [Google Scholar] [CrossRef]
- Lee, N.; Ajanthan, T.; Torr, P.H. Snip: Single-Shot Network Pruning Based on Connection Sensitivity. arXiv 2018, arXiv:1810.02340. [Google Scholar]
- Liu, Z.; Mu, H.; Zhang, X.; Guo, Z.; Yang, X.; Cheng, K.T.; Sun, J. Metapruning: Meta Learning for Automatic Neural Network Channel Pruning. In Proceedings of the IEEE/CVF International Conference on Computer Vision, Seoul, Republic of Korea, 27–28 October 2019; pp. 3296–3305. [Google Scholar]
- Dery, L.; Kolawole, S.; Kagy, J.F.; Smith, V.; Neubig, G.; Talwalkar, A. Everybody Prune Now: Structured Pruning of LLMs with Only Forward Passes. arXiv 2024, arXiv:2402.05406. [Google Scholar]
- Ma, X.; Fang, G.; Wang, X. LLM-Pruner: On the Structural Pruning of Large Language Models. Adv. Neural Inf. Process. Syst. 2023, 36, 21702–21720. [Google Scholar]
- Hillier, D.; Guertler, L.; Tan, C.; Agrawal, P.; Ruirui, R.C.; Cheng, B. Super Tiny Language Models. arXiv 2024, arXiv:2405.14159. [Google Scholar]
- Jiao, X.; Yin, Y.; Shang, L.; Jiang, X.; Chen, X.; Li, L.; Wang, F.; Liu, Q. TinyBERT: Distilling BERT for Natural Language Understanding. arXiv 2019, arXiv:1909.10351. [Google Scholar]
- Sun, Z.; Yu, H.; Song, X.; Liu, R.; Yang, Y.; Zhou, D. MobileBERT: A Compact Task-Agnostic BERT for Resource-Limited Devices. arXiv 2020, arXiv:2004.02984. [Google Scholar]
- Magister, L.C.; Mallinson, J.; Adamek, J.; Malmi, E.; Severyn, A. Teaching Small Language Models to Reason. arXiv 2022, arXiv:2212.08410. [Google Scholar]
- Scherer, M.; Macan, L.; Jung, V.J.; Wiese, P.; Bompani, L.; Burrello, A.; Conti, F.; Benini, L. Deeploy: Enabling Energy-Efficient Deployment of Small Language Models on Heterogeneous Microcontrollers. arXiv 2024, arXiv:2408.04413. [Google Scholar] [CrossRef]
- Mitchell, E.; Rafailov, R.; Sharma, A.; Finn, C.; Manning, C.D. An Emulator for Fine-Tuning Large Language Models Using Small Language Models. arXiv 2023, arXiv:2310.12962. [Google Scholar]
- Lu, Z.; Li, X.; Cai, D.; Yi, R.; Liu, F.; Zhang, X.; Xu, M. Small Language Models: Survey, Measurements, and Insights. arXiv 2024, arXiv:2409.15790. [Google Scholar]
- Jiang, A.Q.; Sablayrolles, A.; Roux, A.; Mensch, A.; Savary, B.; Bamford, C.; Chaplot, D.S.; Casas, D.D.L.; Hanna, E.B.; Bress, F.; et al. Mixtral of Experts. arXiv 2024, arXiv:2401.04088. [Google Scholar]
- Das, B.C.; Amini, M.H.; Wu, Y. Security and Privacy Challenges of Large Language Models: A Survey. ACM Comput. Surv. 2024, 57, 152. [Google Scholar] [CrossRef]
- Vaswani, A. Attention is All You Need. In Advances in Neural Information Processing Systems, Proceedings of the 31st Annual Conference on Neural Information Processing Systems (NIPS 2017), Long Beach, CA, USA, 4–9 December 2017; Neural Information Processing Systems Foundation, Inc. (NeurIPS): La Jolla, CA, USA, 2017. [Google Scholar]
- Shazeer, N. Fast Transformer Decoding: One Write-Head Is All You Need. arXiv 2019, arXiv:1911.02150. [Google Scholar]
- Ainslie, J.; Lee-Thorp, J.; de Jong, M.; Zemlyanskiy, Y.; Lebrón, F.; Sanghai, S. GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints. arXiv 2023, arXiv:2305.13245. [Google Scholar]
- Liu, A.; Feng, B.; Wang, B.; Wang, B.; Liu, B.; Zhao, C.; Deng, C.; Ruan, C.; Dai, D.; Guo, D.; et al. DeepSeek-v2: A Strong, Economical, and Efficient Mixture-of-Experts Language Model. arXiv 2024, arXiv:2405.04434. [Google Scholar]
- Dao, T. FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning. arXiv 2023, arXiv:2307.08691. [Google Scholar]
- Dao, T.; Fu, D.; Ermon, S.; Rudra, A.; Ré, C. FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness. Adv. Neural Inf. Process. Syst. 2022, 35, 16344–16359. [Google Scholar]
- Csordás, R.; Irie, K.; Schmidhuber, J. Approximating Two-Layer Feedforward Networks for Efficient Transformers. arXiv 2023, arXiv:2310.10837. [Google Scholar]
- Agarap, A.F. Deep Learning Using Rectified Linear Units (ReLU). arXiv 2018, arXiv:1803.08375. [Google Scholar]
- Hendrycks, D.; Gimpel, K. Gaussian Error Linear Units (GELUs). arXiv 2016, arXiv:1606.08415. [Google Scholar]
- Elfwing, S.; Uchibe, E.; Doya, K. Sigmoid-Weighted Linear Units for Neural Network Function Approximation in Reinforcement Learning. Neural Netw. 2018, 107, 3–11. [Google Scholar] [CrossRef]
- Su, J.; Ahmed, M.; Lu, Y.; Pan, S.; Bo, W.; Liu, Y. RoFormer: Enhanced Transformer with Rotary Position Embedding. Neurocomputing 2024, 568, 127063. [Google Scholar] [CrossRef]
- Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer Normalization. arXiv 2016, arXiv:1607.06450. [Google Scholar]
- Xiong, R.; Yang, Y.; He, D.; Zheng, K.; Zheng, S.; Xing, C.; Liu, T. On Layer Normalization in the Transformer Architecture. In Proceedings of the 37th International Conference on Machine Learning, PMLR, Virtual, 13–18 July 2020; pp. 10524–10533. [Google Scholar]
- Zhang, B.; Sennrich, R. Root Mean Square Layer Normalization. In Advances in Neural Information Processing Systems, Proceedings of the 2019 Conference on Neural Information Processing Systems, Vancouver, BC, Canada, 8–14 December 2019; Neural Information Processing Systems Foundation, Inc. (NeurIPS): La Jolla, CA, USA, 2019; Volume 32. [Google Scholar]
- Penedo, G.; Malartic, Q.; Hesslow, D.; Cojocaru, R.; Cappelli, A.; Alobeidli, H.; Launay, J. The RefinedWeb Dataset for Falcon LLM: Outperforming Curated Corpora with Web Data, and Web Data Only. arXiv 2023, arXiv:2306.01116. [Google Scholar]
- Nguyen, T.; Van Nguyen, C.; Lai, V.D.; Man, H.; Ngo, N.T.; Dernoncourt, F.; Nguyen, T.H. CulturaX: A Cleaned, Enormous, and Multilingual Dataset for Large Language Models in 167 Languages. arXiv 2023, arXiv:2309.09400. [Google Scholar]
- Penedo, G.; Kydlíček, H.; Lozhkov, A.; Mitchell, M.; Raffel, C.; Von Werra, L.; Wolf, T. The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale. arXiv 2024, arXiv:2406.17557. [Google Scholar]
- Gao, L.; Biderman, S.; Black, S.; Golding, L.; Hoppe, T.; Foster, C.; Phang, J.; He, H.; Thite, A.; Nabeshima, N.; et al. The Pile: An 800GB Dataset of Diverse Text for Language Modeling. arXiv 2020, arXiv:2101.00027. [Google Scholar]
- Weber, M.; Fu, D.; Anthony, Q.; Oren, Y.; Adams, S.; Alexandrov, A.; Zhang, C. RedPajama: An Open Dataset for Training Large Language Models. arXiv 2024, arXiv:2411.12372. [Google Scholar]
- Ben Allal, L.; Lozhkov, A.; Penedo, G.; Wolf, T.; Von Werra, L. Cosmopedia. 2024. Available online: https://huggingface.co/datasets/HuggingFaceTB/cosmopedia (accessed on 21 January 2025).
- Liu, Y. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
- Yuan, S.; Zhao, H.; Du, Z.; Ding, M.; Liu, X.; Cen, Y.; Tang, J. WuDaoCorpora: A Super Large-Scale Chinese Corpora for Pre-Training Language Models. AI Open 2021, 2, 65–68. [Google Scholar] [CrossRef]
- Li, J.; Fang, A.; Smyrnis, G.; Ivgi, M.; Jordan, M.; Gadre, S.; Shankar, V. DataComp-LM: In Search of the Next Generation of Training Sets for Language Models. arXiv 2024, arXiv:2406.11794. [Google Scholar]
- Soldaini, L.; Kinney, R.; Bhagia, A.; Schwenk, D.; Atkinson, D.; Authur, R.; Lo, K. DOLMA: An Open Corpus of Three Trillion Tokens for Language Model Pretraining Research. arXiv 2024, arXiv:2402.00159. [Google Scholar]
- Li, R.; Ben Allal, L.; Zi, Y.; Muennighoff, N.; Kocetkov, D.; Mou, C.; de Vries, H. StarCoder: May the Source Be With You! arXiv 2023, arXiv:2305.06161. [Google Scholar]
- Baumgartner, J.; Zannettou, S.; Keegan, B.; Squire, M.; Blackburn, J. The Pushshift Reddit Dataset. In Proceedings of the International AAAI Conference on Web and Social Media, Virtual, 8–11 June 2020; Volume 14, pp. 830–839. [Google Scholar]
- Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2019, arXiv:1910.01108. [Google Scholar]
- Wang, W.; Wei, F.; Dong, L.; Bao, H.; Yang, N.; Zhou, M. MiniLM: Deep Self-Attention Distillation for Task-Agnostic Compression of Pre-Trained Transformers. In Advances in Neural Information Processing Systems, Proceedings of the Annual Conference on Neural Information Processing Systems 2020, Virtual, 6–12 December 2020; Neural Information Processing Systems Foundation, Inc. (NeurIPS): La Jolla, CA, USA, 2020; Volume 33, pp. 5776–5788. [Google Scholar]
- Iandola, F.N.; Shaw, A.E.; Krishna, R.; Keutzer, K.W. SqueezeBERT: What Can Computer Vision Teach NLP About Efficient Neural Networks? arXiv 2020, arXiv:2006.11316. [Google Scholar]
- Allal, L.B.; Lozhkov, A.; Bakouch, E.; von Werra, L.; Wolf, T. SmollM-Blazingly Fast and Remarkably Powerful. Hugging Face Blog. 2024. Available online: https://huggingface.co/blog/smollm (accessed on 26 October 2024).
- Meta, A.I. Introducing Meta Llama 3: The Most Capable Openly Available LLM to Date. 2024. Available online: https://ai.meta.com/blog/meta-llama-3/ (accessed on 26 October 2024).
- Thawakar, O.; Vayani, A.; Khan, S.; Cholakal, H.; Anwer, R.M.; Felsberg, M.; Baldwin, T.; Xing, E.P.; Khan, F.S. MobiLlama: Towards Accurate and Lightweight Fully Transparent GPT. arXiv 2024, arXiv:2402.16840. [Google Scholar]
- Bellagente, M.; Tow, J.; Mahan, D.; Phung, D.; Zhuravinskyi, M.; Adithyan, R.; Baicoianu, J.; Brooks, B.; Cooper, N.; Datta, A.; et al. Stable LM 2 1.6B Technical Report. arXiv 2024, arXiv:2402.17834. [Google Scholar]
- Mitra, A.; Del Corro, L.; Mahajan, S.; Codas, A.; Simoes, C.; Agarwal, S.; Chen, X.; Razdaibiedina, A.; Jones, E.; Aggarwal, K.; et al. Orca 2: Teaching Small Language Models How to Reason. arXiv 2023, arXiv:2311.11045. [Google Scholar]
- Galanos, T.; Liapis, A.; Yannakakis, G.N. Architext: Language-Driven Generative Architecture Design. arXiv 2023, arXiv:2303.07519. [Google Scholar]
- Allal, L.B.; Li, R.; Kocetkov, D.; Mou, C.; Akiki, C.; Ferrandis, C.M.; Muennighoff, N.; Mishra, M.; Gu, A.; Dey, M.; et al. SantaCoder: Don’t Reach for the Stars! arXiv 2023, arXiv:2301.03988. [Google Scholar]
- Corrêa, N.K.; Falk, S.; Fatimah, S.; Sen, A.; De Oliveira, N. TeenyTinyLlama: Open-Source Tiny Language Models Trained in Brazilian Portuguese. Mach. Learn. Appl. 2024, 16, 100558. [Google Scholar] [CrossRef]
- Du, X.; Yu, Z.; Gao, S.; Pan, D.; Cheng, Y.; Ma, Z.; Yuan, R.; Qu, X.; Liu, J.; Zheng, T.; et al. Chinese Tiny LLM: Pretraining a Chinese-Centric Large Language Model. arXiv 2024, arXiv:2404.04167. [Google Scholar]
- Wang, Y.; Le, H.; Gotmare, A.D.; Bui, N.D.Q.; Li, J.; Hoi, S.C.H. Codet5+: Open Code Large Language Models for Code Understanding and Generation. arXiv 2023, arXiv:2305.07922. [Google Scholar]
- Chu, X.; Qiao, L.; Lin, X.; Xu, S.; Yang, Y.; Hu, Y.; Wei, F.; Zhang, X.; Zhang, B.; Wei, X.; et al. MobileVLM: A Fast, Strong, and Open Vision Language Assistant for Mobile Devices. arXiv 2023, arXiv:2312.16886. [Google Scholar]
- Yuan, Z.; Li, Z.; Huang, W.; Ye, Y.; Sun, L. TinyGPT-V: Efficient Multimodal Large Language Model via Small Backbones. arXiv 2023, arXiv:2312.16862. [Google Scholar]
- HuggingFace. DistilGPT2. 2019. Available online: https://huggingface.co/distilgpt2 (accessed on 26 October 2024).
- Black, S.; Gao, L.; Wang, P.; Leahy, C.; Biderman, S. GPT-Neo: Large Scale Autoregressive Language Modeling with Mesh-Tensorflow. Zenodo 2021, 58, 5297715. [Google Scholar] [CrossRef]
- Yang, A.; Yang, B.; Hui, B.; Zheng, B.; Yu, B.; Zhou, C.; Li, C.; Li, C.; Liu, D.; Huang, F.; et al. Qwen2 Technical Report. arXiv 2024, arXiv:2407.10671. [Google Scholar]
- OpenAI. GPT-4o-mini. 2024. Available online: https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/ (accessed on 26 October 2024).
- BigScience. Bigscience/Bloomz-1b1. Available online: https://huggingface.co/bigscience/bloomz-1b1 (accessed on 21 January 2025).
- BigScience. Bigscience/Bloom-560m. Available online: https://huggingface.co/bigscience/bloom-560m (accessed on 21 January 2025).
- Facebook. Facebook/Opt-125m. Available online: https://huggingface.co/facebook/opt-125m (accessed on 21 January 2025).
- Facebook. Facebook/Galactica-125m. Available online: https://huggingface.co/facebook/galactica-125m (accessed on 21 January 2025).
- Microsoft. Microsoft/Phi-3-mini. Available online: https://huggingface.co/microsoft/Phi-3.5-mini-instruct (accessed on 21 January 2025).
- Microsoft. Microsoft/Phi-2. Available online: https://huggingface.co/microsoft/phi-2 (accessed on 21 January 2025).
- Microsoft. Microsoft/Phi-1_5. Available online: https://huggingface.co/microsoft/phi-1_5 (accessed on 21 January 2025).
- Microsoft. Microsoft/phi-1. Available online: https://huggingface.co/microsoft/phi-1 (accessed on 22 January 2025).
- StabilityAI. Stabilityai/Stablelm-Zephyr-3b. Available online: https://huggingface.co/stabilityai/stablelm-zephyr-3b (accessed on 22 January 2025).
- Meituan. MobileLLaMA. Available online: https://huggingface.co/mtgv/MobileLLaMA-1.4B-Base (accessed on 24 January 2025).
- Google. RecurrentGemma. Available online: https://huggingface.co/mtgv/MobileLLaMA-2.7B-Base (accessed on 24 January 2025).
- Google. gemma2. Available online: https://huggingface.co/google/gemma-2-2b-it (accessed on 24 January 2025).
- MBZUAI. MBZUAI/LaMini-GPT-774M. Available online: https://huggingface.co/MBZUAI/LaMini-GPT-774M (accessed on 24 January 2025).
- OpenBMB. MiniCPM3. Available online: https://huggingface.co/openbmb/MiniCPM3-4B (accessed on 25 January 2025).
- Toyota. DCLM. Available online: https://huggingface.co/TRI-ML/DCLM-1B (accessed on 25 January 2025).
- H2O.ai. h2o-danube3-4b-base. Available online: https://huggingface.co/h2oai/h2o-danube3-4b-base (accessed on 24 January 2025).
- Alibaba. Qwen 2.5. Available online: https://qwenlm.github.io/blog/qwen2.5/ (accessed on 24 January 2025).
- Alibaba. Qwen 2. Available online: https://huggingface.co/docs/transformers/model_doc/qwen2 (accessed on 24 January 2025).
- Alibaba. Qwen 1. Available online: https://huggingface.co/Qwen/Qwen-1_8B (accessed on 24 January 2025).
- TensorOpera. Fox-1-1.6B. Available online: https://huggingface.co/tensoropera/Fox-1-1.6B (accessed on 24 January 2025).
- Cao, K.; Liu, Y.; Meng, G.; Sun, Q. An Overview on Edge Computing Research. IEEE Access 2020, 8, 85714–85728. [Google Scholar] [CrossRef]
- Al-Fuqaha, A.; Guizani, M.; Mohammadi, M.; Aledhari, M.; Ayyash, M. Internet of Things: A Survey on Enabling Technologies, Protocols, and Applications. IEEE Commun. Surv. Tutor. 2015, 17, 2347–2376. [Google Scholar] [CrossRef]
- Coskun-Setirek, A.; Mardikyan, S. Understanding the Adoption of Voice Activated Personal Assistants. Int. J. E-Serv. Mob. Appl. (IJESMA) 2017, 9, 1–21. [Google Scholar] [CrossRef]
- Sah, A.; Saurav, S.; Meena, A.; Mishra, S.; Kumar, N. Smartwatch as a Pervasive Computing Application in Health Metrics Tracking. In Proceedings of the International Conference on Innovative Computing and Communication, Delhi, India, 16–17 February 2024; Springer: Singapore, 2024; pp. 145–157. [Google Scholar]
- Rawassizadeh, R.; Rong, Y. ODSearch: Fast and Resource Efficient On-Device Natural Language Search for Fitness Trackers’ Data. In Proceedings of the ACM on Interactive, Mobile, Wearable and Ubiquitous Technologies; Association for Computing Machinery: New York, NY, USA, 2023; Volume 6, pp. 1–25. [Google Scholar]
- Beniwal, G.; Singhrova, A. A Systematic Literature Review on IoT Gateways. J. King Saud-Univ.-Comput. Inf. Sci. 2022, 34, 9541–9563. [Google Scholar] [CrossRef]
- Vereecken, H.; Bogena, H.; Huisman, J.A.; Vanderborght, J.; Herbst, M.; Brüggemann, N.; Vereecken, J. On the Spatio-Temporal Dynamics of Soil Moisture at the Field Scale. J. Hydrol. 2014, 516, 76–96. [Google Scholar] [CrossRef]
- Franch, G.; Tomasi, E.; Wanjari, R.; Poli, V.; Cardinali, C.; Alberoni, P.P.; Cristoforetti, M. GPTCast: A Weather Language Model for Precipitation Nowcasting. arXiv 2024, arXiv:2407.02089. [Google Scholar]
- Norda, M.; Engel, C.; Rennies, J.; Appell, J.E.; Lange, S.C.; Hahn, A. Evaluating the Efficiency of Voice Control as Human Machine Interface in Production. IEEE Trans. Autom. Sci. Eng. 2023, 21, 4817–4828. [Google Scholar] [CrossRef]
- Xia, Y.; Jazdi, N.; Zhang, J.; Shah, C.; Weyrich, M. Control Industrial Automation System with Large Language Models. arXiv 2024, arXiv:2409.18009. [Google Scholar]
- Dhillon, A.S.; Torresin, A. Advancing Vehicle Diagnostics: Exploring the Application of Large Language Models in the Automotive Industry. Master’s Thesis, Chalmers University of Technology, Gothenburg, Sweden, 2024. [Google Scholar]
- Vakulenko, S.; Longpre, S.; Tu, Z.; Anantha, R. A Wrong Answer or a Wrong Question? An Intricate Relationship Between Question Reformulation and Answer Selection in Conversational Question Answering. arXiv 2020, arXiv:2010.06835. [Google Scholar]
- Klongdee, S.; Singthongchai, J. Enhancing Sentiment Analysis with Term Sentiment Entropy: Capturing Nuanced Sentiment in Text Classification. Artif. Intell. Mach. Learn. 2024; preprint. [Google Scholar]
- Liu, C.; Tao, C.; Liang, J.; Shen, T.; Feng, J.; Huang, Q.; Zhao, D. Rethinking Task-Specific Knowledge Distillation: Contextualized Corpus as Better Textbook. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, Abu Dhabi, United Arab Emirates, 7–11 December 2022; pp. 10652–10658. [Google Scholar]
- Li, X.; Wang, W.; Hu, X.; Yang, J. Selective Kernel Networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Long Beach, CA, USA, 15–20 June 2019; pp. 510–519. [Google Scholar]
- Vrbančič, G.; Podgorelec, V. Transfer Learning with Adaptive Fine-Tuning. IEEE Access 2020, 8, 196197–196211. [Google Scholar] [CrossRef]
- Hernández, A.; Amigó, J.M. Attention Mechanisms and Their Applications to Complex Systems. Entropy 2021, 23, 283. [Google Scholar] [CrossRef] [PubMed]
- Gururangan, S.; Marasović, A.; Swayamdipta, S.; Lo, K.; Beltagy, I.; Downey, D.; Smith, N.A. Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks. arXiv 2020, arXiv:2004.10964. [Google Scholar]
- Lee, H.Y.; Li, S.W.; Vu, N.T. Meta Learning for Natural Language Processing: A Survey. arXiv 2022, arXiv:2205.01500. [Google Scholar]
- Zhang, Y.; Yang, Q. A Survey on Multi-Task Learning. IEEE Trans. Knowl. Data Eng. 2022, 34, 5586–5609. [Google Scholar] [CrossRef]
- Kadam, S.; Vaidya, V. Review and Analysis of Zero, One and Few Shot Learning Approaches. In Intelligent Systems Design and Applications, Proceedings of the 18th International Conference on Intelligent Systems Design and Applications (ISDA 2018), Vellore, India, 6–8 December 2018; Springer International Publishing: Cham, Switzerland, 2018; Volume 1, pp. 100–112. [Google Scholar]
- Kiela, D.; Wang, C.; Cho, K. Dynamic Meta-Embeddings for Improved Sentence Representations. arXiv 2018, arXiv:1804.07983. [Google Scholar]
- Ouali, Y.; Hudelot, C.; Tami, M. An Overview of Deep Semi-Supervised Learning. arXiv 2020, arXiv:2006.05278. [Google Scholar]
- Figueira, A.; Vaz, B. Survey on Synthetic Data Generation, Evaluation Methods and GANs. Mathematics 2022, 10, 2733. [Google Scholar] [CrossRef]
- Predd, J.B.; Kulkarni, S.R.; Poor, H.V. A Collaborative Training Algorithm for Distributed Learning. IEEE Trans. Inf. Theory 2009, 55, 1856–1871. [Google Scholar] [CrossRef]
- Plantin, J.C.; Lagoze, C.; Edwards, P.N. Re-Integrating Scholarly Infrastructure: The Ambiguous Role of Data Sharing Platforms. Big Data Soc. 2018, 5, 2053951718756683. [Google Scholar] [CrossRef]
- Zhang, C.; Xie, Y.; Bai, H.; Yu, B.; Li, W.; Gao, Y. A Survey on Federated Learning. Knowl. Based Syst. 2021, 216, 106775. [Google Scholar] [CrossRef]
- Pan, S.J.; Yang, Q. A Survey on Transfer Learning. IEEE Trans. Knowl. Data Eng. 2009, 22, 1345–1359. [Google Scholar] [CrossRef]
- Li, B.; Hou, Y.; Che, W. Data Augmentation Approaches in Natural Language Processing: A Survey. AI Open 2022, 3, 71–90. [Google Scholar] [CrossRef]
- Singh, I.; Singh, B. Access Management of IoT Devices Using Access Control Mechanism and Decentralized Authentication: A Review. Meas. Sens. 2023, 25, 100591. [Google Scholar] [CrossRef]
- Bai, T.; Luo, J.; Zhao, J.; Wen, B.; Wang, Q. Recent Advances in Adversarial Training for Adversarial Robustness. arXiv 2021, arXiv:2102.01356. [Google Scholar]
- Moradi, R.; Berangi, R.; Minaei, B. A Survey of Regularization Strategies for Deep Models. Artif. Intell. Rev. 2020, 53, 3947–3986. [Google Scholar] [CrossRef]
- Yang, M.; Guo, T.; Zhu, T.; Tjuawinata, I.; Zhao, J.; Lam, K.Y. Local Differential Privacy and Its Applications: A Comprehensive Survey. Comput. Stand. Interfaces 2023, 89, 103827. [Google Scholar] [CrossRef]
- Zhao, C.; Zhao, S.; Zhao, M.; Chen, Z.; Gao, C.Z.; Li, H.; Tan, Y.A. Secure Multi-Party Computation: Theory, Practice and Applications. Inf. Sci. 2019, 476, 357–372. [Google Scholar] [CrossRef]
- Myers, D.; Suriadi, S.; Radke, K.; Foo, E. Anomaly Detection for Industrial Control Systems Using Process Mining. Comput. Secur. 2018, 78, 103–125. [Google Scholar] [CrossRef]
- Srinivasan, M.; Parmar, S.; Crowley, M.; Salnikov, S. Infrared Fallback Mechanism for Remote Control Devices. 2022. Available online: https://www.tdcommons.org/cgi/viewcontent.cgi?article=6659&context=dpubs_series (accessed on 25 January 2025).
- Poczeta, K.; Płaza, M.; Zawadzki, M.; Michno, T.; Krechowicz, M. Analysis of the Retraining Strategies for Multi-Label Text Message Classification in Call/Contact Center Systems. Sci. Rep. 2024, 14, 10093. [Google Scholar] [CrossRef] [PubMed]
- Ma, Y.; Wang, Z.; Yang, H.; Yang, L. Artificial Intelligence Applications in the Development of Autonomous Vehicles: A Survey. IEEE/CAA J. Autom. Sin. 2020, 7, 315–329. [Google Scholar] [CrossRef]
- Babaei, P.; Riahinia, N.; Ebadati, O.M.; Azimi, A. Autonomous Vehicles’ Object Detection Architectures Ranking Based on Multi-Criteria Decision-Making Techniques. Int. J. Inf. Technol. 2024, 16, 2343–2352. [Google Scholar] [CrossRef]
- Azamfirei, V.; Psarommatis, F.; Lagrosen, Y. Application of Automation for In-Line Quality Inspection: A Zero-Defect Manufacturing Approach. J. Manuf. Syst. 2023, 67, 1–22. [Google Scholar] [CrossRef]
- Licardo, J.T.; Domjan, M.; Orehovački, T. Intelligent Robotics—A Systematic Review of Emerging Technologies and Trends. Electronics 2024, 13, 542. [Google Scholar] [CrossRef]
- Sánchez-Corcuera, R.; Nuñez-Marcos, A.; Sesma-Solance, J.; Bilbao-Jayo, A.; Mulero, R.; Zulaika, U.; Azkune, G.; Almeida, A. Smart Cities Survey: Technologies, Application Domains and Challenges for the Cities of the Future. Int. J. Distrib. Sens. Netw. 2019, 15, 1550147719853984. [Google Scholar] [CrossRef]
- Wei, H.; Zheng, G.; Gayah, V.; Li, Z. A Survey on Traffic Signal Control Methods. arXiv 2019, arXiv:1904.08117. [Google Scholar]
- Seneviratne, S.; Hu, Y.; Nguyen, T.; Lan, G.; Khalifa, S.; Thilakarathna, K.; Seneviratne, A. A Survey of Wearable Devices and Challenges. IEEE Commun. Surv. Tutor. 2017, 19, 2573–2620. [Google Scholar] [CrossRef]
- Elshakhs, Y.S.; Deliparaschos, K.M.; Charalambous, T.; Oliva, G.; Zolotas, A. A Comprehensive Survey on Delaunay Triangulation: Applications, Algorithms, and Implementations over CPUs, GPUs, and FPGAs. IEEE Access 2024, 12, 12562–12585. [Google Scholar] [CrossRef]
Technique | Description | Model Size Reduction | Latency Improvement | Accuracy Trade-Off |
---|---|---|---|---|
Knowledge Distillation | Teacher-student model transfer | High (>50%) | Moderate (1.2×–2×) | Minimal (<1%) |
Quantization | Lower precision (e.g., INT8) | High (>50%) | High (>2×) | Minor (1–3%) for INT8 |
Pruning | Remove unimportant weights | Moderate (20–50%) | Moderate (1.2×–2×) | Variable |
Efficient Architectures | Model design for efficiency | High (>50%) | High (>2×) | Minimal (<1%) if optimized |
Aspect | LLMs | TLMs |
---|---|---|
Model Size | Billions of parameters (e.g., GPT-3: 175 B; PaLM: 540 B) | Tens to hundreds of millions (e.g., DistilBERT: 66 M; TinyBERT: 4.4 M) |
Computational Requirements | Extremely high: Requires multiple GPUs or TPUs, often in cloud environments | Moderate to low: Operates efficiently on single GPUs or CPUs |
Latency | High (e.g., GPT-3 inference latency 100–200 ms for cloud-based tasks) | Low (e.g., optimized for <50 ms response times in edge environments) |
Energy Efficiency | High energy consumption (e.g., training GPT-3 requires hundreds of MWh) | Optimized for low-power settings (e.g., inference on mobile devices) |
Memory Footprint | Very large (requires distributed storage or advanced caching techniques) | Compact (fits within edge device memory, typically < 1 GB) |
Accuracy and Performance | High on complex tasks (e.g., SOTA on NLP benchmarks) | Moderate to high; some accuracy trade-off for efficiency |
Training Techniques | Pre-training on massive datasets, extensive fine-tuning | Model compression (e.g., distillation, pruning, quantization) |
Deployment Scope | Cloud-based or server environments | On-device, edge, IoT, or mobile platforms |
Inference Speed | Variable; often slow for large-scale models due to size | Fast; optimized for low-latency environments |
Suitability for Real-Time Applications | Limited due to latency and resource constraints | Ideal for real-time applications, especially on edge devices |
Common Use Cases | Large-scale NLP (e.g., machine translation, chatbots, summarization) | Resource-efficient NLP (e.g., voice assistants, mobile diagnostics) |
Relevance in Automation | High in centralized processing pipelines | High for real-time, localized automation |
Hardware Compatibility | Requires specialized hardware (e.g., TPU, GPU clusters) | Compatible with standard hardware (e.g., CPUs, low-power GPUs) |
Cost of Deployment | Very high: Significant investment in infrastructure and cloud usage | Low: Affordable for small-scale or local deployments |
Application Domain | Function of TLMs | Expected Benefits |
---|---|---|
Smart Home Assistants | On-device NLP for automation | Low-latency, privacy-friendly voice interaction |
Wearable Health Monitors | Real-time health insights processing | Improved accessibility and energy efficiency |
Industrial IoT Monitoring | Sensor data analysis and anomaly detection | Enhanced predictive maintenance and uptime |
Smart Agriculture Systems | AI-powered environmental monitoring | Increased resource efficiency and automation |
Voice-Controlled Manufacturing | Hands-free machine operation | Simplified workflow and human-machine interaction |
Automated Industrial Reporting | NLP-based log analysis and summarization | Reduced manual reporting efforts and faster insights |
Multilingual NLP for Factories | Real-time translation of operator commands | Enhanced collaboration in global workplaces |
Error Code Interpretation | AI-based fault diagnosis | Faster troubleshooting and maintenance resolution |