A Low-Power Hardware Architecture for Real-Time CNN Computing
Abstract
1. Introduction
1. We apply a pipeline structure for real-time CNN inference, in which the input data is streamed and processed column by column, so that only a minimal amount of computation remains once the complete input image becomes available (see the sketch after this list).
2. We propose a multi-cycle scheme, in which every column of a convolutional kernel except the last is processed over multiple cycles, reducing hardware resources and power consumption.
3. Based on the proposed multi-cycle scheme, we design hardware structures for the typical CNN layers and connect them in a pipeline according to the given CNN architecture to perform the column-wise computations.
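To make the column-wise dataflow concrete, the sketch below models it in plain Python: a convolutional layer consumes the image one column at a time and emits each output column as soon as the last input column it depends on has arrived. This is only a high-level software analogue under our own assumptions (single channel, stride 1, no padding, a square kernel; the name `conv_column_stream` is ours); it is not the authors' RTL, and it does not model the multi-cycle scheduling of contribution 2.

```python
# A minimal software model of column-wise (streamed) convolution.
# Assumptions (ours, for illustration): one input/output channel,
# stride 1, no padding, square k x k kernel.

def conv_column_stream(input_columns, kernel):
    """Consume the image column by column; yield each output column
    as soon as the last input column it depends on has arrived."""
    k = len(kernel)            # kernel is a k x k list of lists
    window = []                # the k most recent input columns
    for col in input_columns:  # col is one image column (list of pixels)
        window.append(col)
        if len(window) < k:
            continue           # not enough columns for an output yet
        rows = len(col)
        out_col = []
        for r in range(rows - k + 1):
            acc = 0.0
            for i in range(k):        # kernel row
                for j in range(k):    # kernel column
                    acc += kernel[i][j] * window[j][r + i]
            out_col.append(acc)
        yield out_col
        window.pop(0)          # slide the column window by one


# Example: a 5x5 image streamed through a 3x3 averaging kernel.
image = [[float(r + c) for r in range(5)] for c in range(5)]  # 5 columns
kernel = [[1 / 9.0] * 3 for _ in range(3)]
for out_col in conv_column_stream(image, kernel):
    print(out_col)
```

In this model, only one output column remains to be produced after the final input column arrives, which is the property the pipeline structure exploits for real-time inference.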
2. Background
2.1. Convolutional Neural Network
2.2. State-of-the-Art CNN Accelerators
3. Proposed Architecture for Real-Time CNN
3.1. CNN Inference in RTC Systems
3.2. Multi-Cycle Pipelining for Computational Resource Minimization
3.3. Hardware Architecture
4. Results
4.1. Setup
4.2. Overall Power Reduction
4.3. Impact of Layer Configurations on Power Reduction
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
1. Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Rich feature hierarchies for accurate object detection and semantic segmentation. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 23–28 June 2014; pp. 580–587.
2. Tomè, D.; Monti, F.; Baroffio, L.; Bondi, L.; Tagliasacchi, M.; Tubaro, S. Deep convolutional neural networks for pedestrian detection. Signal Process. Image Commun. 2016, 47, 482–489.
3. Simonyan, K.; Zisserman, A. Very deep convolutional networks for large-scale image recognition. arXiv 2014, arXiv:1409.1556.
4. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778.
5. Sainath, T.N.; Vinyals, O.; Senior, A.; Sak, H. Convolutional, long short-term memory, fully connected deep neural networks. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, QLD, Australia, 19–24 April 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 4580–4584.
6. Palaz, D.; Collobert, R. Analysis of CNN-Based Speech Recognition System Using Raw Speech as Input; Technical Report; Idiap: Martigny, Switzerland, 2015.
7. Palaz, D.; Doss, M.M.; Collobert, R. Convolutional neural networks-based continuous speech recognition using raw speech signal. In Proceedings of the 2015 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), South Brisbane, QLD, Australia, 19–24 April 2015; IEEE: Piscataway, NJ, USA, 2015; pp. 4295–4299.
8. Mohammad-Razdari, A.; Rousseau, D.; Bakhshipour, A.; Taylor, S.; Poveda, J.; Kiani, H. Recent advances in E-monitoring of plant diseases. Biosens. Bioelectron. 2022, 201, 113953.
9. Gholamalinezhad, H.; Khosravi, H. Pooling methods in deep neural networks, a review. arXiv 2020, arXiv:2009.07485.
10. Jouppi, N.P.; Young, C.; Patil, N.; Patterson, D.; Agrawal, G.; Bajwa, R.; Bates, S.; Bhatia, S.; Boden, N.; Borchers, A.; et al. In-datacenter performance analysis of a tensor processing unit. In Proceedings of the 44th Annual International Symposium on Computer Architecture, Toronto, ON, Canada, 24–28 June 2017; pp. 1–12.
11. Srinivas, A.; Lin, T.Y.; Parmar, N.; Shlens, J.; Abbeel, P.; Vaswani, A. Bottleneck transformers for visual recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Nashville, TN, USA, 20–25 June 2021; pp. 16519–16529.
12. Boroumand, A.; Ghose, S.; Akin, B.; Narayanaswami, R.; Oliveira, G.F.; Ma, X.; Shiu, E.; Mutlu, O. Google Neural Network Models for Edge Devices: Analyzing and Mitigating Machine Learning Inference Bottlenecks. In Proceedings of the 2021 30th International Conference on Parallel Architectures and Compilation Techniques (PACT), Atlanta, GA, USA, 26–29 September 2021; pp. 159–172.
13. Shen, Y.; Ferdman, M.; Milder, P. Maximizing CNN accelerator efficiency through resource partitioning. In Proceedings of the 2017 ACM/IEEE 44th Annual International Symposium on Computer Architecture (ISCA), Toronto, ON, Canada, 24–28 June 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 535–547.
14. Alwani, M.; Chen, H.; Ferdman, M.; Milder, P. Fused-layer CNN accelerators. In Proceedings of the 2016 49th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO), Taipei, Taiwan, 15–19 October 2016; IEEE: Piscataway, NJ, USA, 2016; pp. 1–12.
15. Wang, J.; Lin, J.; Wang, Z. Efficient hardware architectures for deep convolutional neural network. IEEE Trans. Circuits Syst. I Regul. Pap. 2017, 65, 1941–1953.
16. Kim, J.; Kang, J.K.; Kim, Y. A Resource Efficient Integer-Arithmetic-Only FPGA-Based CNN Accelerator for Real-Time Facial Emotion Recognition. IEEE Access 2021, 9, 104367–104381.
17. Kim, J.; Kang, J.K.; Kim, Y. A Low-Cost Fully Integer-Based CNN Accelerator on FPGA for Real-Time Traffic Sign Recognition. IEEE Access 2022, 10, 84626–84634.
18. Lin, K.T.; Chiu, C.T.; Chang, J.Y.; Hsiao, S.C. High utilization energy-aware real-time inference deep convolutional neural network accelerator. In Proceedings of the 2021 IEEE International Symposium on Circuits and Systems (ISCAS), Daegu, Republic of Korea, 22–28 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–5.
19. Gonzalez, H.A.; Muzaffar, S.; Yoo, J.; Elfadel, I.A.M. An inference hardware accelerator for EEG-based emotion detection. In Proceedings of the 2020 IEEE International Symposium on Circuits and Systems (ISCAS), Seville, Spain, 12–14 October 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–5.
20. Xu, K.; Wang, X.; Liu, X.; Cao, C.; Li, H.; Peng, H.; Wang, D. A dedicated hardware accelerator for real-time acceleration of YOLOv2. J. Real-Time Image Process. 2021, 18, 481–492.
21. Kyriakos, A.; Papatheofanous, E.A.; Bezaitis, C.; Reisis, D. Resources and Power Efficient FPGA Accelerators for Real-Time Image Classification. J. Imaging 2022, 8, 114.
22. Sanchez, J.; Sawant, A.; Neff, C.; Tabkhi, H. AWARE-CNN: Automated workflow for application-aware real-time edge acceleration of CNNs. IEEE Internet Things J. 2020, 7, 9318–9329.
23. Zhang, J.; Cheng, L.; Li, C.; Li, Y.; He, G.; Xu, N.; Lian, Y. A low-latency FPGA implementation for real-time object detection. In Proceedings of the 2021 IEEE International Symposium on Circuits and Systems (ISCAS), Daegu, Republic of Korea, 22–28 May 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–5.
24. LeCun, Y.; Boser, B.; Denker, J.; Henderson, D.; Howard, R.; Hubbard, W.; Jackel, L. Handwritten digit recognition with a back-propagation network. Adv. Neural Inf. Process. Syst. 1989, 2, 396–404.
25. Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet classification with deep convolutional neural networks. Commun. ACM 2017, 60, 84–90.
Network | Size of Input Image | No. of Classes | No. of Parameters
---|---|---|---
LeNet | | 10 | 0.05 M
AlexNet | | 21 | 41.59 M
VGG16 | | 21 | 117.57 M
Layer | LeNet | AlexNet | VGG16
---|---|---|---
conv. | in = 1, out = 6, k = 5, s = 1 | in = 3, out = 96, k = 11, s = 4 | in = 3, out = 64, k = 3, s = 1
conv. | | | in = 64, out = 64, k = 3, s = 1
pooling | k = 2, s = 2 | k = 3, s = 2 | k = 2, s = 2
conv. | in = 6, out = 16, k = 5, s = 1 | in = 96, out = 256, k = 5, s = 1 | in = 64, out = 128, k = 3, s = 1
conv. | | | in = 128, out = 128, k = 3, s = 1
pooling | k = 2, s = 2 | k = 3, s = 2 | k = 2, s = 2
conv. | | in = 256, out = 384, k = 3, s = 1 | in = 128, out = 256, k = 3, s = 1
conv. | | in = 384, out = 384, k = 3, s = 1 | in = 256, out = 256, k = 3, s = 1
conv. | | in = 384, out = 256, k = 3, s = 1 | in = 256, out = 256, k = 3, s = 1
pooling | | k = 3, s = 2 | k = 2, s = 2
conv. | | | in = 256, out = 512, k = 3, s = 1
conv. | | | in = 512, out = 512, k = 3, s = 1
conv. | | | in = 512, out = 512, k = 3, s = 1
pooling | | | k = 2, s = 2
conv. | | | in = 512, out = 512, k = 3, s = 1
conv. | | | in = 512, out = 512, k = 3, s = 1
conv. | | | in = 512, out = 512, k = 3, s = 1
pooling | | | k = 2, s = 2
fc | in = 400, out = 120 | in = 9216, out = 4096 | in = 25088, out = 4096
fc | in = 120, out = 84 | in = 4096, out = 4096 | in = 4096, out = 4096
fc | in = 84, out = 10 | in = 4096, out = 21 | in = 4096, out = 21
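As a sanity check on the layer configurations above, the sketch below shows how output feature-map size and parameter count follow from the listed in, out, k, and s values. The helper names are ours, and it assumes unpadded ("valid") convolutions and one bias per output channel; totals may differ slightly from the summary table depending on the counting convention used in the paper.

```python
# Relations behind the layer tables (a rough sketch, under our assumptions:
# "valid" convolutions without padding, one bias per output channel).

def conv_output_size(in_size, k, s):
    """Spatial output size of a convolution or pooling layer."""
    return (in_size - k) // s + 1

def conv_params(in_ch, out_ch, k):
    """Weights plus biases of one k x k convolutional layer."""
    return out_ch * (in_ch * k * k + 1)

def fc_params(in_dim, out_dim):
    """Weights plus biases of one fully connected layer."""
    return out_dim * (in_dim + 1)

# Example: LeNet as configured above, assuming the classic 32x32 input.
size = 32
size = conv_output_size(size, 5, 1)   # conv:    32 -> 28
size = conv_output_size(size, 2, 2)   # pooling: 28 -> 14
size = conv_output_size(size, 5, 1)   # conv:    14 -> 10
size = conv_output_size(size, 2, 2)   # pooling: 10 -> 5
print(16 * size * size)               # 400, matching the first fc input

total = (conv_params(1, 6, 5) + conv_params(6, 16, 5)
         + fc_params(400, 120) + fc_params(120, 84) + fc_params(84, 10))
print(total)  # about 61.7 k with biases; the summary table lists ~0.05 M,
              # which may follow a different counting or rounding convention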
Design | AWARE-CNN [22] | ISCAS21 [23] | This Work |
---|---|---|---|
Implementation | FPGA | FPGA | ASIC |
Technology node (nm) | 16 | 28 | 65 |
Quantization (bit) | 16 | 8 | 8 |
Frequency (MHz) | 30–40 | 200 | 62.5 |
Power (mW) | 15,000 | NA | 65.05–147.95 |
Power efficiency (TOPs/W) | 0.25–0.49 | 0.045 | 3.19–4.08 |
Latency (ms) | 30.3–45.2 | 27.78 | 7.45–22.36 |
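The power-efficiency entries in the comparison above are throughput per watt. The short helper below, written by us for illustration, shows the arithmetic: an operation count divided by latency gives throughput in TOPs/s, and dividing by power gives TOPs/W. The operation count, latency, and power in the example are hypothetical placeholder values, not figures taken from the paper.

```python
# How a TOPs/W figure is derived from an op count, latency, and power.

def tops_per_watt(total_ops, latency_s, power_w):
    """Throughput (tera-operations per second) divided by power (W)."""
    throughput_tops = total_ops / latency_s / 1e12
    return throughput_tops / power_w

# Hypothetical example (not from the paper): 2 GOP of work finished
# in 10 ms while drawing 100 mW.
print(tops_per_watt(2e9, 10e-3, 0.1))  # 2.0 TOPs/W
```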