A Review of Artificial Intelligence in Embedded Systems
Abstract
1. Introduction
2. Hardware Acceleration Methods for Embedded AI
2.1. FPGA
2.2. ASIC
2.3. GPU
2.4. Other Acceleration Hardware
2.5. Summary
3. The Key Technologies of Embedded AI
3.1. Model Compression of Neural Network
3.1.1. Network Structure Redesign
3.1.2. Quantization
3.1.3. Pruning
3.2. Binary Neural Networks and Optimization Techniques
3.3. CPU/GPU Acceleration Algorithm
3.4. Summary
4. Application Modes of Embedded Artificial Intelligence
4.1. Post-Training Deployment
4.2. Training on Embedded Devices
4.3. Partial Training
4.4. Summary
5. The Outlook of Embedded Artificial Intelligence
- Efficient algorithms and lightweight models: Users increasingly switch between different work scenarios, which raises the requirements on device portability, including weight, volume, and energy consumption. To keep intelligent devices portable, research must deliver more efficient algorithms and lighter network models that reduce model complexity while maintaining accuracy.
- Hardware acceleration methods: Beyond algorithm- and model-level optimization, performance can also be improved at the hardware level. Current research on hardware acceleration is largely limited to neural network accelerators with a single architecture; applying one accelerator across multiple platforms, or combining several hardware devices, may resolve this limitation in the future.
- Deployment optimization: Embedded AI deployment divides into post-training deployment, training on embedded devices, and running part of the training task on embedded devices (a minimal post-training deployment sketch follows this list). Post-training deployment places high demands on the training speed achievable on other platforms, which can be met by improving model training speed. Training on embedded devices echoes the first point above: it requires more efficient algorithms and lighter network models to make on-device training tractable. When only part of the training runs on the embedded device, the partially trained models must afterwards be integrated in a way that preserves model integrity.
- Compatibility: According to reference [60], embedded intelligence in industry still faces open problems. In legacy automation systems, for example, some dedicated functions lack interoperability with current automation systems. There is also no standard method for managing edge computing nodes and data collection, and how to exploit, via machine learning, the large volume of data generated by edge computing and the industrial cloud working together remains an open issue.
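To make the post-training deployment mode concrete, the following is a minimal sketch of converting a workstation-trained model into an inference-only artifact using TensorFlow Lite's post-training quantization, one common toolchain for this mode. The SavedModel path, calibration data, and output filename are illustrative assumptions, not artifacts from this review.

```python
import numpy as np
import tensorflow as tf

# Hypothetical stand-in for real calibration data; the shape must match
# the trained model's input signature (here, 224x224 RGB images).
calibration_samples = [
    np.random.rand(1, 224, 224, 3).astype(np.float32) for _ in range(100)
]

def representative_data_gen():
    # The converter runs these samples through the model to choose
    # activation quantization ranges.
    for sample in calibration_samples:
        yield [sample]

# "trained_model/" is an assumed path to a TensorFlow SavedModel exported
# on the training platform, e.g. via tf.saved_model.save(model, path).
converter = tf.lite.TFLiteConverter.from_saved_model("trained_model/")
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable post-training quantization
converter.representative_dataset = representative_data_gen

tflite_model = converter.convert()  # serialized, quantized model (bytes)
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

On the target device, the resulting file is loaded by an inference-only runtime such as `tf.lite.Interpreter` (or TensorFlow Lite for Microcontrollers on MCU-class hardware), so no training framework needs to ship with the product.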
6. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Ang, K.L.-M.; Seng, J.K.P. Embedded Intelligence: Platform Technologies, Device Analytics, and Smart City Applications. IEEE Internet Things J. 2021, 8, 13165–13182. [Google Scholar] [CrossRef]
- Dick, R.P.; Shang, L.; Wolf, M.; Yang, S.-W. Embedded Intelligence in the Internet-of-Things. IEEE Des. Test 2019, 37, 7–27. [Google Scholar] [CrossRef]
- Guo, B.; Zhang, D.; Yu, Z.; Liang, Y.; Wang, Z.; Zhou, X. From the internet of things to embedded intelligence. World Wide Web 2012, 16, 399–420. [Google Scholar] [CrossRef]
- Ardakani, A.; Condo, C.; Gross, W.J. Fast and Efficient Convolutional Accelerator for Edge Computing. IEEE Trans. Comput. 2019, 69, 138–152. [Google Scholar] [CrossRef]
- Li, H.; Ota, K.; Dong, M. Learning IoT in Edge: Deep Learning for the Internet of Things with Edge Computing. IEEE Netw. 2018, 32, 96–101. [Google Scholar] [CrossRef]
- Manavalan, E.; Jayakrishna, K. A review of Internet of Things (IoT) embedded sustainable supply chain for industry 4.0 requirements. Comput. Ind. Eng. 2018, 127, 925–953. [Google Scholar] [CrossRef]
- Xu, D.; Li, T.; Li, Y.; Su, X.; Tarkoma, S.; Jiang, T.; Crowcroft, J.; Hui, P. Edge Intelligence: Empowering Intelligence to the Edge of Network. Proc. IEEE 2021, 109, 1778–1837. [Google Scholar] [CrossRef]
- Poniszewska-Maranda, A.; Kaczmarek, D.; Kryvinska, N.; Xhafa, F. Studying usability of AI in the IoT systems/paradigm through embedding NN techniques into mobile smart service system. Computing 2018, 101, 1661–1685. [Google Scholar] [CrossRef]
- Howard, A.G.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient convolutional neural networks for mobile vision applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
- Deng, B.L.; Li, G.; Han, S.; Shi, L.; Xie, Y. Model Compression and Hardware Acceleration for Neural Networks: A Comprehensive Survey. Proc. IEEE 2020, 108, 485–532. [Google Scholar] [CrossRef]
- Krishnamoorthi, R. Quantizing deep convolutional networks for efficient inference: A whitepaper. arXiv 2018, arXiv:1806.08342. [Google Scholar]
- Kwadjo, D.T.; Tchinda, E.N.; Mbongue, J.M.; Bobda, C. Towards a component-based acceleration of convolutional neural networks on FPGAs. J. Parallel Distrib. Comput. 2022, 167, 123–135. [Google Scholar] [CrossRef]
- Hwang, D.H.; Han, C.Y.; Oh, H.W.; Lee, S.E. ASimOV: A Framework for Simulation and Optimization of an Embedded AI Accelerator. Micromachines 2021, 12, 838. [Google Scholar] [CrossRef]
- Li, Z.; Lemaire, E.; Abderrahmane, N.; Bilavarn, S.; Miramond, B. Efficiency analysis of artificial vs. Spiking Neural Networks on FPGAs. J. Syst. Arch. 2022, 133, 102765. [Google Scholar] [CrossRef]
- Venieris, S.I.; Bouganis, C.S. fpgaConvNet: A toolflow for mapping diverse convolutional neural networks on embedded FPGAs. arXiv 2017, arXiv:1711.08740. [Google Scholar]
- Venieris, S.I.; Bouganis, C.-S. fpgaConvNet: Mapping Regular and Irregular Convolutional Neural Networks on FPGAs. IEEE Trans. Neural Networks Learn. Syst. 2018, 30, 326–342. [Google Scholar] [CrossRef]
- Andri, R.; Cavigelli, L.; Rossi, D.; Benini, L. YodaNN: An Architecture for Ultralow Power Binary-Weight CNN Acceleration. IEEE Trans. Comput.-Aided Des. Integr. Circuits Syst. 2017, 37, 48–60. [Google Scholar] [CrossRef]
- Hegde, K.; Yu, J.; Agrawal, R.; Yan, M.; Pellauer, M.; Fletcher, C. UCNN: Exploiting Computational Reuse in Deep Neural Networks via Weight Repetition. In Proceedings of the 2018 ACM/IEEE 45th Annual International Symposium on Computer Architecture (ISCA), Los Angeles, CA, USA, 1–6 June 2018; pp. 674–687. [Google Scholar] [CrossRef]
- Shin, D.; Yoo, H.-J. The Heterogeneous Deep Neural Network Processor With a Non-von Neumann Architecture. Proc. IEEE 2019, 108, 1245–1260. [Google Scholar] [CrossRef]
- Wang, M.; Yang, T.; Flechas, M.A.; Harris, P.; Hawks, B.; Holzman, B.; Knoepfel, K.; Krupa, J.; Pedro, K.; Tran, N. GPU-Accelerated Machine Learning Inference as a Service for Computing in Neutrino Experiments. Front. Big Data 2021, 3. [Google Scholar] [CrossRef]
- Zhang, F.; Chen, Z.; Zhang, C.; Zhou, A.C.; Zhai, J.; Du, X. An Efficient Parallel Secure Machine Learning Framework on GPUs. IEEE Trans. Parallel Distrib. Syst. 2021, 32, 2262–2276. [Google Scholar] [CrossRef]
- Kang, M.; Lee, Y.; Park, M. Energy Efficiency of Machine Learning in Embedded Systems Using Neuromorphic Hardware. Electronics 2020, 9, 1069. [Google Scholar] [CrossRef]
- Mittal, S. A Survey on optimized implementation of deep learning models on the NVIDIA Jetson platform. J. Syst. Arch. 2019, 97, 428–442. [Google Scholar] [CrossRef]
- Liu, X.; Ounifi, H.-A.; Gherbi, A.; Li, W.; Cheriet, M. A hybrid GPU-FPGA based design methodology for enhancing machine learning applications performance. J. Ambient. Intell. Humaniz. Comput. 2019, 11, 2309–2323. [Google Scholar] [CrossRef]
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-level accuracy with 50× fewer parameters and <0.5 MB model size. arXiv 2016, arXiv:1602.07360. [Google Scholar]
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. arXiv 2017, arXiv:1707.01083v2. [Google Scholar]
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258. [Google Scholar]
- Xie, S.; Girshick, R.; Dollár, P.; Tu, Z.; He, K. Aggregated residual transformations for deep neural networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition 2017, Honolulu, HI, USA, 21–26 July 2017; pp. 1492–1500. [Google Scholar]
- Cai, H.; Gan, C.; Wang, T.; Zhang, Z.; Han, S. Once-for-all: Train one network and specialize it for efficient deployment. arXiv 2019, arXiv:1908.09791. [Google Scholar]
- Dong, X.; Yan, S.; Duan, C. A lightweight vehicles detection network model based on YOLOv5. Eng. Appl. Artif. Intell. 2022, 113, 104914. [Google Scholar] [CrossRef]
- Li, Y.; Gong, R.; Tan, X.; Yang, Y.; Hu, P.; Zhang, Q.; Yu, F.; Wang, W.; Gu, S. Brecq: Pushing the limit of post-training quantization by block reconstruction. arXiv 2021, arXiv:2102.05426. [Google Scholar]
- Nagel, M.; Van Baalen, M.; Blankevoort, T.; Welling, M. Data-Free Quantization Through Weight Equalization and Bias Correction. In Proceedings of the IEEE/CVF International Conference on Computer Vision 2019, Seoul, Republic of Korea, 27 October–2 November 2019. [Google Scholar]
- Nagel, M.; Amjad, R.A.; Van Baalen, M.; Louizos, C.; Blankevoort, T. Up or down? Adaptive rounding for post-training quantization. In Proceedings of the International Conference on Machine Learning 2020, Virtual, 13–18 July 2020; pp. 7197–7206. [Google Scholar]
- Han, S.; Mao, H.; Dally, W.J. Deep compression: Compressing deep neural networks with pruning, trained quantization and huffman coding. arXiv 2015, arXiv:1510.00149. [Google Scholar]
- Han, S.; Liu, X.; Mao, H.; Pu, J.; Pedram, A.; Horowitz, M.A.; Dally, W.J. EIE: Efficient inference engine on compressed deep neural network. ACM SIGARCH Comput. Archit. News 2016, 44, 243–254. [Google Scholar] [CrossRef]
- Han, S.; Pool, J.; Tran, J.; Dally, W. Learning both weights and connections for efficient neural network. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar]
- Zhou, X.; Zhang, W.; Xu, H.; Zhang, T. Effective sparsification of neural networks with global sparsity constraint. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, Virtual, 19–25 June 2021; pp. 3599–3608. [Google Scholar]
- Tang, Y.; Wang, Y.; Xu, Y.; Deng, Y.; Xu, C.; Tao, D.; Xu, C. Manifold regularized dynamic network pruning. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2021, Virtual, 19–25 June 2021; pp. 5018–5028. [Google Scholar]
- Hou, Z.; Qin, M.; Sun, F.; Ma, X.; Yuan, K.; Xu, Y.; Chen, Y.-K.; Jin, R.; Xie, Y.; Kung, S.-Y. Chex: Channel exploration for CNN model compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, New Orleans, LA, USA, 18–24 June 2022; pp. 12287–12298. [Google Scholar]
- Li, Y.; Adamczewski, K.; Li, W.; Gu, S.; Timofte, R.; Van Gool, L. Revisiting random channel pruning for neural network compression. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, New Orleans, LA, USA, 18–24 June 2022; pp. 191–201. [Google Scholar]
- Courbariaux, M.; Hubara, I.; Soudry, D.; El-Yaniv, R.; Bengio, Y. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or −1. arXiv 2016, arXiv:1602.02830. [Google Scholar]
- Courbariaux, M.; Bengio, Y.; David, J.P. BinaryConnect: Training deep neural networks with binary weights during propagations. Adv. Neural Inf. Process. Syst. 2015, 28, 3123–3131. [Google Scholar]
- Rastegari, M.; Ordonez, V.; Redmon, J.; Farhadi, A. XNOR-Net: ImageNet classification using binary convolutional neural networks. In Proceedings of the Computer Vision–ECCV 2016: 14th European Conference, Amsterdam, The Netherlands, 11–14 October 2016; Springer: Berlin/Heidelberg, Germany, 2016; pp. 525–542. [Google Scholar]
- Hu, Q.; Wang, P.; Cheng, J. From Hashing to CNNs: Training Binary Weight Networks via Hashing. Proc. Conf. AAAI Artif. Intell. 2018, 32. [Google Scholar] [CrossRef]
- Al-Wajih, E.; Ghazali, R. Threshold center-symmetric local binary convolutional neural networks for bilingual handwritten digit recognition. Knowledge-Based Syst. 2023, 259. [Google Scholar] [CrossRef]
- Tu, Z.; Chen, X.; Ren, P.; Wang, Y. AdaBin: Improving Binary Neural Networks with Adaptive Binary Sets. In Proceedings of the Computer Vision–ECCV 2022: 17th European Conference, Tel Aviv, Israel, 23–27 October 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 379–395. [Google Scholar]
- Fang, J.; Fu, H.; Yang, G.; Hsieh, C.-J. RedSync: Reducing synchronization bandwidth for distributed deep learning training system. J. Parallel Distrib. Comput. 2019, 133, 30–39. [Google Scholar] [CrossRef]
- Khalid, Y.N.; Aleem, M.; Ahmed, U.; Islam, M.A.; Iqbal, M.A. Troodon: A machine-learning based load-balancing application scheduler for CPU–GPU system. J. Parallel Distrib. Comput. 2019, 132, 79–94. [Google Scholar] [CrossRef]
- Li, S.; Niu, X.; Dou, Y.; Lv, Q.; Wang, Y. Heterogeneous blocked CPU-GPU accelerate scheme for large scale extreme learning machine. Neurocomputing 2017, 261, 153–163. [Google Scholar] [CrossRef]
- Cai, P.; Luo, Y.; Hsu, D.; Lee, W.S. HyP-DESPOT: A hybrid parallel algorithm for online planning under uncertainty. Int. J. Robot. Res. 2021, 40, 558–573. [Google Scholar] [CrossRef]
- Chang, K.-W.; Chang, T.-S. VWA: Hardware Efficient Vectorwise Accelerator for Convolutional Neural Network. IEEE Trans. Circuits Syst. I Regul. Pap. 2019, 67, 145–154. [Google Scholar] [CrossRef]
- Ahmed, U.; Lin, J.C.-W.; Srivastava, G. A ML-based resource utilization OpenCL GPU-kernel fusion model. Sustain. Comput. Inform. Syst. 2022, 35, 100683. [Google Scholar] [CrossRef]
- Manogaran, G.; Shakeel, P.M.; Fouad, H.; Nam, Y.; Baskar, S.; Chilamkurti, N.; Sundarasekar, R. Wearable IoT Smart-Log Patch: An Edge Computing-Based Bayesian Deep Learning Network System for Multi Access Physical Monitoring System. Sensors 2019, 19, 3030. [Google Scholar] [CrossRef] [PubMed]
- Ramasamy, L.K.; Khan, F.; Shah, M.; Prasad, B.V.V.S.; Iwendi, C.; Biamba, C. Secure Smart Wearable Computing through Artificial Intelligence-Enabled Internet of Things and Cyber-Physical Systems for Health Monitoring. Sensors 2022, 22, 1076. [Google Scholar] [CrossRef] [PubMed]
- Martinez-Alpiste, I.; Casaseca-de-la-Higuera, P.; Alcaraz-Calero, J.M.; Grecos, C.; Wang, Q. Smartphone-based object recognition with embedded machine learning intelligence for unmanned aerial vehicles. J. Field Robot. 2020, 37, 404–420. [Google Scholar] [CrossRef]
- Zhou, Q.; Wang, J.; Wu, P.; Qi, Y. Application Development of Dance Pose Recognition Based on Embedded Artificial Intelligence Equipment. J. Physics Conf. Ser. 2021, 1757, 012011. [Google Scholar] [CrossRef]
- Ma, Q.; Wang, Y. RETRACTED ARTICLE: Application of embedded system and artificial intelligence platform in Taekwondo image feature recognition. J. Ambient. Intell. Humaniz. Comput. 2021, 1–12. [Google Scholar] [CrossRef]
- Sharma, A.; Georgi, M.; Tregubenko, M.; Tselykh, A.; Tselykh, A. Enabling smart agriculture by implementing artificial intelligence and embedded sensing. Comput. Ind. Eng. 2022, 165, 107936. [Google Scholar] [CrossRef]
- Haque, W.A.; Arefin, S.; Shihavuddin, A.; Hasan, M.A. DeepThin: A novel lightweight CNN architecture for traffic sign recognition without GPU requirements. Expert Syst. Appl. 2020, 168, 114481. [Google Scholar] [CrossRef]
- Dai, W.; Nishi, H.; Vyatkin, V.; Huang, V.; Shi, Y.; Guan, X. Industrial Edge Computing: Enabling Embedded Intelligence. IEEE Ind. Electron. Mag. 2019, 13, 48–56. [Google Scholar] [CrossRef]
| Classification | Reference | Proposed Method | Advantage | Classification Advantage |
|---|---|---|---|---|
| FPGA | [12] | Pre-implemented CNN accelerator | Lower resource use and high energy efficiency | Flexibility and scalability |
| | [13] | ASimOV framework | Lower memory use and high performance | |
| | [14] | Comparison of ANNs and SNNs | SNNs outperform ANNs in energy efficiency and inference speed | |
| | [15] | fpgaConvNet accelerator | Higher energy efficiency and raw performance | |
| | [16] | Synchronous dataflow | Higher performance and performance density | |
| ASIC | [17] | YodaNN CNN accelerator | Higher performance and energy efficiency | Performance and energy efficiency |
| | [18] | UCNN accelerator | Lower computation cost | |
| | [19] | MSICs | Higher energy efficiency | |
| GPU | [20] | CRYPTGPU | Faster than CPU and more private | Parallel computing capabilities |
| | [21] | ParSecureML | Faster than other SecureML frameworks | |
| Other | [22] | NPU application system | More efficient and lower energy consumption | Customization |
| | [23] | NVIDIA Jetson | Powerful parallel computation capability | |
| Classification | Reference | Proposed Method | Advantage |
|---|---|---|---|
| Model compression: network structure redesign | [25] | SqueezeNet | Fewer parameters |
| | [9] | MobileNet | Compatible with resource-scarce embedded devices |
| | [26] | ShuffleNet | Compatible with resource-scarce embedded devices |
| | [29] | Once-for-all network | Lower energy consumption |
| | [30] | Improvement of YOLOv5 | Faster detection speed |
| Model compression: quantization | [31] | BRECQ | Faster production |
| | [32] | Data-free quantization | Less precision loss |
| | [33] | AdaRound | Less precision loss |
| | [34] | Deep Compression | Lower storage requirements with no loss of precision |
| | [35] | Efficient Inference Engine (EIE) | Lower energy consumption |
| Model compression: pruning | [36] | Important-connection pruning | Fewer parameters |
| | [37] | ProbMask | Higher precision |
| | [38] | ManiDP | Fewer FLOPs |
| | [39] | CHEX | Fewer FLOPs and less precision loss |
| | [40] | Channel pruning | Less loss of performance |
| Binary neural networks | [42] | Binary neural network | Significant reduction in the number of parameters |
| | [43] | XNOR-Net | Less memory consumption |
| | [44] | From Hashing to CNNs | Improved accuracy |
| | [45] | CS-LBCNN and TCS-LBCNN | Higher precision |
| | [46] | AdaBin | Less loss of precision |
| CPU/GPU acceleration | [47] | RedSync | Faster training speed |
| | [48] | Troodon | Less processing time |
| | [49] | Local receptive field-based Extreme Learning Machine | Higher performance and faster decomposition speed |
| | [50] | HyP-DESPOT | Faster execution speed |
| | [51] | Vector-wise accelerator (VWA) | Lower energy consumption and higher hardware utilization |
| | [52] | GPU-kernel fusion model | Higher F-measure |
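Two of the compression ideas summarized above lend themselves to a compact illustration: magnitude-based connection pruning in the spirit of [36] and XNOR-Net-style weight binarization [43]. The NumPy sketch below shows only the core arithmetic under simplifying assumptions (a single weight matrix and a global threshold); it is not the cited authors' implementation, and the function names and sparsity value are illustrative.

```python
import numpy as np

def magnitude_prune(weights: np.ndarray, sparsity: float) -> np.ndarray:
    """Zero out the smallest-magnitude fraction of weights (cf. [36])."""
    k = int(weights.size * sparsity)  # number of connections to remove
    if k == 0:
        return weights.copy()
    threshold = np.sort(np.abs(weights).ravel())[k - 1]
    pruned = weights.copy()
    pruned[np.abs(pruned) <= threshold] = 0.0
    return pruned

def binarize(weights: np.ndarray):
    """XNOR-Net-style binarization (cf. [43]): W ≈ alpha * sign(W),
    where alpha is the mean absolute value of the weights."""
    alpha = float(np.abs(weights).mean())
    return alpha, np.sign(weights)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 64)).astype(np.float32)

    w_pruned = magnitude_prune(w, sparsity=0.9)  # keep only the largest 10%
    alpha, w_bin = binarize(w)

    print("achieved sparsity:", float(np.mean(w_pruned == 0.0)))            # ~0.9
    print("mean binarization error:", float(np.abs(w - alpha * w_bin).mean()))
```

In a full compression pipeline, pruning is typically followed by fine-tuning to recover accuracy, and the binary weights are packed into bitfields so that multiply-accumulate operations reduce to XNOR and popcount, which is the source of the memory and speed gains reported in the table.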