Implementation of Lightweight Convolutional Neural Networks with an Early Exit Mechanism Utilizing 40 nm CMOS Process for Fire Detection in Unmanned Aerial Vehicles
Abstract
1. Introduction
- Development of a lightweight CNN model for wildfire recognition, incorporating multiple exit points (a confidence-gated inference sketch follows this list);
- The proposed model substantially reduces neural network computations while maintaining an 83% accuracy rate in predictions;
- Significant reduction in the memory requirements and energy consumption of the hardware circuit, with only a slight decrease in CNN accuracy;
- Implementation of the hardware circuit using TSMC 40 nm CMOS technology, enhanced with power gating techniques to further reduce power consumption;
- The implemented ASIC offers multiple usage modes for users to select according to various usage scenarios.
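To make the early exit mechanism concrete, the following is a minimal sketch of confidence-gated inference in the spirit of BranchyNet (Teerapittayanon et al.). The `stages`/`exit_heads` structure and the 0.9 confidence threshold are illustrative assumptions, not the exact rule used in this work.

```python
# Hypothetical early-exit inference loop: stop at the first exit whose
# fire/no-fire probability is decisive enough; otherwise fall through
# to the final exit. Structure and threshold are assumptions.
import torch

def infer_with_early_exit(stages, exit_heads, x, threshold=0.9):
    for i, (stage, head) in enumerate(zip(stages, exit_heads)):
        x = stage(x)                        # run the next backbone stage
        p = torch.sigmoid(head(x))          # fire probability at this exit
        last = i == len(stages) - 1
        if last or bool((torch.maximum(p, 1 - p) >= threshold).all()):
            return p, i + 1                 # probability and the exit taken
```

When an early exit fires, the later stages are skipped entirely, which is what saves computation and, in the ASIC, lets power gating shut down the unused stages.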
2. Related Work
3. The Proposed Hardware Architecture
3.1. CNN Architecture Overview
3.2. Weight Quantization Method
3.3. Software Results
4. Hardware Implementation
4.1. Fixed-Point of Activation Values and Parameters
4.2. CNN Hardware Accelerator Architecture
5. Experimental Results
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Yuan, C.; Liu, Z.; Zhang, Y. UAV-based forest fire detection and tracking using image processing techniques. In Proceedings of the International Conference on Unmanned Aircraft Systems (ICUAS), Denver, CO, USA, 9–12 June 2015; pp. 639–643.
- Dampage, U.; Bandaranayake, L.; Kottahachchi, K.; Jayasanka, B. Forest fire detection system using wireless sensor networks and machine learning. Sci. Rep. 2022, 12, 46.
- Sathishkumar, V.E.; Cho, J.; Subramanian, M.; Naren, O.S. Forest fire and smoke detection using deep learning-based learning without forgetting. Fire Ecol. 2023, 19, 9.
- Shamsoshoara, A.; Afghah, F.; Razi, A.; Zheng, L.; Fulé, P.Z.; Blasch, E. Aerial imagery pile burn detection using deep learning: The FLAME dataset. Comput. Netw. 2021, 193, 108001.
- Chollet, F. Xception: Deep learning with depthwise separable convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1251–1258.
- Ghali, R.; Akhloufi, M.A.; Mseddi, W.S. Deep learning and transformer approaches for UAV-based wildfire detection and segmentation. Sensors 2022, 22, 1977.
- Tan, M.; Le, Q. EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the International Conference on Machine Learning, Long Beach, CA, USA, 9–15 June 2019; pp. 6105–6114.
- Huang, G.; Liu, Z.; van der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 4700–4708.
- Zhang, L.; Wang, M.; Fu, Y.; Ding, Y. A forest fire recognition method using UAV images based on transfer learning. Forests 2022, 13, 975.
- Zulberti, L.; Monopoli, M.; Nannipieri, P.; Fanucci, L.; Moranti, S. Highly parameterised CGRA architecture for design space exploration of machine learning applications onboard satellites. In Proceedings of the 2023 European Data Handling & Data Processing Conference (EDHPC), Juan Les Pins, France, 2–6 October 2023.
- Pacini, T.; Rapuano, E.; Tuttobene, L.; Nannipieri, P.; Fanucci, L.; Moranti, S. Towards the extension of FPG-AI toolflow to RNN deployment on FPGAs for onboard satellite applications. In Proceedings of the 2023 European Data Handling & Data Processing Conference (EDHPC), Juan Les Pins, France, 2–6 October 2023.
- Cheng, Y.; Wang, D.; Zhou, P.; Zhang, T. A survey of model compression and acceleration for deep neural networks. arXiv 2020, arXiv:1710.09282v9.
- Han, S.; Pool, J.; Tran, J.; Dally, W. Learning both weights and connections for efficient neural network. In Proceedings of the Advances in Neural Information Processing Systems 28 (NIPS 2015), Montreal, QC, Canada, 7–12 December 2015.
- Han, S.; Mao, H.; Dally, W.J. Deep compression: Compressing deep neural networks with pruning, trained quantization and Huffman coding. arXiv 2016, arXiv:1510.00149v5.
- Ba, J.; Caruana, R. Do deep nets really need to be deep? In Proceedings of the Advances in Neural Information Processing Systems 27 (NIPS 2014), Montreal, QC, Canada, 8–13 December 2014.
- Teerapittayanon, S.; McDanel, B.; Kung, H.T. BranchyNet: Fast inference via early exiting from deep neural networks. In Proceedings of the International Conference on Pattern Recognition (ICPR), Cancun, Mexico, 4–8 December 2016; pp. 2464–2469.
- Courbariaux, M.; Hubara, I.; Soudry, D.; El-Yaniv, R.; Bengio, Y. Binarized neural networks: Training deep neural networks with weights and activations constrained to +1 or −1. arXiv 2016, arXiv:1602.02830v3.
- Li, F.; Liu, B.; Wang, X.; Zhang, B.; Yan, J. Ternary weight networks. arXiv 2022, arXiv:1605.04711v3.
- Micikevicius, P.; Narang, S.; Alben, J.; Diamos, G.; Elsen, E.; Ginsburg, B.; Houston, M.; Kuchaiev, O.; Venkatesh, G.; Wu, H. Mixed precision training. arXiv 2018, arXiv:1710.03740v3.
- Zhou, S.; Wu, Y.; Ni, Z.; Zhou, X.; Wen, H.; Zou, Y. DoReFa-Net: Training low bitwidth convolutional neural networks with low bitwidth gradients. arXiv 2018, arXiv:1606.06160v3.
Operation | Input Data Size (Width × Height × in_channel) | Output Data Size (Width × Height × out_channel) | Stride
---|---|---|---
Convolution 1 | 64 × 64 × 3 | 64 × 64 × 8 | 1
Max-pooling 1 | 64 × 64 × 8 | 32 × 32 × 8 | 4
Convolution 2 | 32 × 32 × 8 | 32 × 32 × 8 | 1
Max-pooling 2 | 32 × 32 × 8 | 16 × 16 × 8 | 4
Global average pooling | 16 × 16 × 8 | 1 × 1 × 8 | -
FC 1 | 1 × 1 × 8 | 1 × 1 × 30 | -
FC 2 | 1 × 1 × 30 | 1 × 1 × 30 | -
FC 3 | 1 × 1 × 30 | 1 × 1 × 1 | -
Convolution 3 | 16 × 16 × 8 | 8 × 8 × 16 | 1
Max-pooling 3 | 8 × 8 × 16 | 4 × 4 × 16 | 4
Global average pooling | 4 × 4 × 16 | 1 × 1 × 16 | -
FC 4 | 1 × 1 × 16 | 1 × 1 × 30 | -
FC 5 | 1 × 1 × 30 | 1 × 1 × 30 | -
FC 6 | 1 × 1 × 30 | 1 × 1 × 1 | -
Convolution 4 | 4 × 4 × 16 | 4 × 4 × 32 | 1
Max-pooling 4 | 4 × 4 × 32 | 2 × 2 × 32 | 4
Global average pooling | 2 × 2 × 32 | 1 × 1 × 32 | -
FC 7 | 1 × 1 × 32 | 1 × 1 × 30 | -
FC 8 | 1 × 1 × 30 | 1 × 1 × 30 | -
FC 9 | 1 × 1 × 30 | 1 × 1 × 1 | -
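For readers who want to reproduce the software model, the following PyTorch sketch instantiates the layer sizes above. It is an illustrative reconstruction, not the authors' code: bias-free conv/FC layers and ReLU activations are assumed (consistent with the parameter counts in the next table), pooling is taken as 2 × 2 with stride 2 so the listed feature-map sizes work out, and Convolution 3 is given stride 2 to map 16 × 16 × 8 to 8 × 8 × 16.

```python
# Illustrative reconstruction of the early-exit CNN in Table 1 (assumptions above).
import torch
import torch.nn as nn

def exit_head(channels: int) -> nn.Sequential:
    # Global average pooling followed by each exit's FC x-30-30-1 classifier
    return nn.Sequential(
        nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        nn.Linear(channels, 30, bias=False), nn.ReLU(),
        nn.Linear(30, 30, bias=False), nn.ReLU(),
        nn.Linear(30, 1, bias=False))

class EarlyExitFireCNN(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(  # Conv1 / Pool1 / Conv2 / Pool2
            nn.Conv2d(3, 8, 3, padding=1, bias=False), nn.BatchNorm2d(8), nn.ReLU(),
            nn.MaxPool2d(2),          # 64x64 -> 32x32
            nn.Conv2d(8, 8, 3, padding=1, bias=False), nn.BatchNorm2d(8), nn.ReLU(),
            nn.MaxPool2d(2))          # 32x32 -> 16x16
        self.stage2 = nn.Sequential(  # Conv3 / Pool3
            nn.Conv2d(8, 16, 3, stride=2, padding=1, bias=False),
            nn.BatchNorm2d(16), nn.ReLU(),
            nn.MaxPool2d(2))          # 8x8 -> 4x4
        self.stage3 = nn.Sequential(  # Conv4 / Pool4
            nn.Conv2d(16, 32, 3, padding=1, bias=False),
            nn.BatchNorm2d(32), nn.ReLU(),
            nn.MaxPool2d(2))          # 4x4 -> 2x2
        self.exits = nn.ModuleList([exit_head(8), exit_head(16), exit_head(32)])

    def forward(self, x):
        f1 = self.stage1(x)
        f2 = self.stage2(f1)
        f3 = self.stage3(f2)
        # One fire/no-fire logit per exit; training can weight the three losses
        return [self.exits[0](f1), self.exits[1](f2), self.exits[2](f3)]
```

`EarlyExitFireCNN()(torch.randn(1, 3, 64, 64))` returns three logits, matching the 1 × 1 × 1 exit outputs listed above.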
Operation | Parameter Shape | Number of Parameters
---|---|---
Convolution 1 | 3 × 3 × 3 × 8 | 216
Batch normalization 1 | 4 × 8 | 32
Convolution 2 | 8 × 3 × 3 × 8 | 576
Batch normalization 2 | 4 × 8 | 32
FC 1 | 8 × 30 | 240
FC 2 | 30 × 30 | 900
FC 3 | 30 × 1 | 30
Convolution 3 | 8 × 3 × 3 × 16 | 1152
Batch normalization 3 | 4 × 16 | 64
FC 4 | 16 × 30 | 480
FC 5 | 30 × 30 | 900
FC 6 | 30 × 1 | 30
Convolution 4 | 16 × 3 × 3 × 32 | 4608
Batch normalization 4 | 4 × 32 | 128
FC 7 | 32 × 30 | 960
FC 8 | 30 × 30 | 900
FC 9 | 30 × 1 | 30
Total parameters | | 11,278
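As a quick, hypothetical cross-check of the 11,278 total, the reconstruction sketched after the architecture table reproduces it exactly when batch normalization contributes 4 values per channel (weight, bias, running mean, running variance), as the table counts:

```python
# Sanity check against the parameter table, reusing the EarlyExitFireCNN sketch.
model = EarlyExitFireCNN()
learnable = sum(p.numel() for p in model.parameters())           # conv + FC + BN affine
running = sum(b.numel() for b in model.buffers() if b.ndim > 0)  # BN running mean/var
print(learnable + running)  # 11278 under the bias-free assumption
```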
Image Size | Nearest Neighbor Interpolation Test Accuracy | Bilinear Interpolation Test Accuracy | Bicubic Interpolation Test Accuracy |
---|---|---|---|
128 × 128 | 74.53% | 84.70% | 84.64% |
64 × 64 | 65.32% | 83.29% | 83.11% |
32 × 32 | 54.62% | 75.66% | 72.42% |
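The table favors bilinear filtering at 64 × 64, the setting used in the rest of the paper. A minimal preprocessing sketch follows; the file name and Pillow-based pipeline are assumptions, not the authors' tooling:

```python
# Downscale a FLAME frame before inference; bilinear filtering per the table above.
from PIL import Image

frame = Image.open("flame_frame.jpg")  # placeholder file name
frame_64 = frame.resize((64, 64), Image.Resampling.BILINEAR)
```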
Method | Conv. Layer 1 (in_channel/out_channel) | Conv. Layer 2 (in/out) | Conv. Layer 3 (in/out) | Conv. Layer 4 (in/out) | Sum of Conv. Parameters | Test Accuracy
---|---|---|---|---|---|---
1 | 3/4 | 4/8 | 8/16 | 16/32 | 6156 | 73.29% |
2 | 3/8 | 8/8 | 8/16 | 16/32 | 6552 | 83.11% |
3 | 3/16 | 16/8 | 8/16 | 16/32 | 7344 | 83.57% |
4 | 3/32 | 32/16 | 16/16 | 16/32 | 12,384 | 83.64% |
Image Size | Exit1 Accuracy | Exit2 Accuracy | Exit3 Accuracy |
---|---|---|---|
64 × 64 | 79.02% | 83.04% | 83.11% |
Weight Bits | Exit1 Accuracy | Exit2 Accuracy | Exit3 Accuracy
---|---|---|---|
9 | 78.86% | 82.71% | 82.81% |
8 | 78.62% | 82.57% | 82.79% |
7 | 78.35% | 82.21% | 82.69% |
6 | 77.21% | 80.36% | 80.39% |
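The bit-width sweep above uses DoReFa-Net style weight quantization (Zhou et al.). A minimal sketch of that scheme, written independently of the authors' training code, is:

```python
import torch

def quantize_k(x: torch.Tensor, k: int) -> torch.Tensor:
    # Round to 2^k - 1 uniform levels in [0, 1] (straight-through during training)
    n = 2 ** k - 1
    return torch.round(x * n) / n

def dorefa_quantize_weights(w: torch.Tensor, k: int) -> torch.Tensor:
    # DoReFa-Net weight quantization: squash with tanh, normalize into [0, 1],
    # quantize to k bits, then map back to [-1, 1]
    t = torch.tanh(w)
    return 2 * quantize_k(t / (2 * t.abs().max()) + 0.5, k) - 1
```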
 | Predicted Fire | Predicted No Fire | Accuracy
---|---|---|---
Actual Fire | 4283 | 854 | 83.38%
Actual No Fire | 638 | 2842 | 81.67%
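The per-class accuracies follow directly from the matrix: 4283/(4283 + 854) ≈ 83.38% for fire frames and 2842/(638 + 2842) ≈ 81.67% for no-fire frames.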
Integer Bits | Decimal Bits | Exit1 Accuracy | Exit2 Accuracy | Exit3 Accuracy |
---|---|---|---|---|
3 | 7 | 78.35% | 82.16% | 82.60% |
3 | 6 | 78.31% | 82.17% | 82.57% |
3 | 5 | 78.29% | 81.85% | 81.94% |
3 | 4 | 75.37% | 78.67% | 78.97% |
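A minimal sketch of the activation fixed-point conversion is shown below. It assumes a separate sign bit on top of the listed integer and decimal (fractional) bits, with round-to-nearest and saturation; the exact rounding and saturation rules of the hardware are not spelled out here.

```python
import torch

def to_fixed_point(x: torch.Tensor, int_bits: int, dec_bits: int) -> torch.Tensor:
    # Quantize to a signed fixed-point grid: 1 sign bit + int_bits + dec_bits (assumed)
    scale = 2.0 ** dec_bits
    max_q = 2 ** (int_bits + dec_bits) - 1          # largest positive code
    q = torch.clamp(torch.round(x * scale), -max_q - 1, max_q)
    return q / scale
```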
Fixed Point Comparison Table

Integer Bits | Decimal Bits | Exit1 Accuracy | Exit2 Accuracy | Exit3 Accuracy
---|---|---|---|---
3 | 11 | 78.32% | 81.77% | 81.92% |
3 | 10 | 78.35% | 82.02% | 81.89% |
3 | 9 | 78.29% | 81.83% | 81.85% |
3 | 8 | 78.26% | 81.76% | 81.88% |
3 | 7 | 59.72% | 60.40% | 60.76% |
Fixed Point Comparison Table

Integer Bits | Decimal Bits | Exit1 Accuracy | Exit2 Accuracy | Exit3 Accuracy
---|---|---|---|---
3 | 7 | 78.26% | 81.76% | 81.92% |
3 | 6 | 78.24% | 81.85% | 81.88% |
3 | 5 | 78.18% | 81.77% | 81.84% |
3 | 4 | 78.16% | 81.73% | 81.82% |
3 | 3 | 77.47% | 79.06% | 79.04% |
Comparison for Bit-Width of Weight

Integer Bits | Decimal Bits | Exit1 Accuracy | Exit2 Accuracy | Exit3 Accuracy
---|---|---|---|---
2 | 9 | 78.08% | 81.68% | 81.76% |
2 | 8 | 78.06% | 81.63% | 81.73% |
2 | 7 | 76.12% | 78.77% | 79.25% |
Store Data | Data Size (Height × Width × Channel × Bit-Width) | Total Size (bits) | Data Management |
---|---|---|---|
Input data | 64 × 64 × 3 × 8 | 98,306 | Sr1-1, Sr1-2 |
Psum of layer 1 | 64 × 64 × 1 × 19 | 77,824 | Psum1, Psum2, Psum3, Psum4 |
Feature map of layer 1 | 32 × 32 × 8 × 8 | 65,536 | rf1-1, rf1-2, rf1-3, rf1-4 |
Psum of layer 2 | 32 × 32 × 1 × 19 | 19,456 | Psum1 |
Feature map of layer 2 | 16 × 16 × 8 × 8 | 16,384 | Sr1-2 |
Psum of layer 3 | 16 × 16 × 1 × 19 | 4864 | Psum1 |
Feature map of layer 3 | 8 × 8 × 16 × 8 | 8192 | rf1-1 |
Psum of layer 4 | 8 × 8 × 1 × 19 | 1216 | Psum1 |
Feature map of layer 4 | 4 × 4 × 32 × 8 | 4096 | rf1-2 |
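As a cross-check on the buffer sizing, the layer-1 partial sums occupy 64 × 64 × 1 × 19 = 77,824 bits, which is exactly the Psum1–Psum4 register budget listed in the next table.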
Memory Type | Memory | Total Bits before the Fixed-Point | Total Bits after the Fixed-Point | Reduction Ratio
---|---|---|---|---
RAM | Sr1-1, Sr1-2 | 393,224 | 98,306 | 75%
Register | Psum1~4 | 131,072 | 77,824 | 40.625%
Register | rf1-1, rf1-2, rf1-3, rf1-4 | 262,144 | 65,536 | 75%
ROM | rom_c1c2, rom_c3, rom_c4 | 209,920 | 45,920 | 78.125%
ROM | rom_exit1, rom_exit2, rom_exit3 | 145,408 | 31,808 | 78.125%
The sum of all bits | | 1,141,768 | 319,394 | 72.03%
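The bottom-row ratio is simply 1 − 319,394/1,141,768 ≈ 72.03%: fixed-point conversion cuts total on-chip storage by almost three-quarters.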
Operation of Each Stage | Exit1 Test Accuracy | Exit2 Test Accuracy | Exit3 Test Accuracy
---|---|---|---
Build CNN model | 79.02% | 83.04% | 83.11%
Weight quantization to 7-bit by DoReFa-Net | 78.35% | 82.21% | 82.69%
Convert activations to 8-bit | 78.29% | 81.85% | 81.94%
Convert BN parameters to 11 bits | 78.26% | 81.76% | 81.88%
Convert BN parameters to 7 bits | 78.16% | 81.73% | 81.82%
Verilog RTL implementation | 78.02% | 81.47% | 81.49%
 | [6] Computer Networks’21 | [8] Sensors’22 | [13] Forests’22 | Proposed Work | Proposed Work
---|---|---|---|---|---
Architecture | Xception | DenseNet201 + EfficientNetB5 | FT-ResNet50 | 2-D CNN | 2-D CNN
Hardware information | NVIDIA GeForce RTX 2080 Ti | NVIDIA GeForce RTX 2080 Ti | NVIDIA GeForce RTX 2080 Ti | Raspberry Pi 3 Model B | 40 nm ASIC
Inference time (s) | N/A | 0.018 | 0.055 | 3.809 | 0.077 (Exit1), 0.105 (Exit3) @300 MHz
Dataset | The FLAME dataset | The FLAME dataset | The FLAME dataset | The FLAME dataset | The FLAME dataset
Input size | 254 × 254 | 254 × 254 | 254 × 254 | 64 × 64 | 64 × 64
Parameters | 22.9 M | 50.7 M | 25.6 M | 11.2 k | 11.2 k
Quantization method | No | No | No | DoReFa-Net + fixed-point | DoReFa-Net + fixed-point
Accuracy | 76.23% | 85.12% | 79.48% | 81.82% | 81.49%
Power consumption | 600 W | 600 W | 600 W | ≈2.5 W | 117 mW @300 MHz
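Put roughly in energy terms using the table's own figures, the ASIC spends about 117 mW × 0.105 s ≈ 12.3 mJ per worst-case (Exit3) inference, versus about 600 W × 0.018 s ≈ 10.8 J per frame for the GPU baseline of [8]: nearly three orders of magnitude less energy per decision.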