Optimizing Convolutional Neural Networks for Image Classification on Resource-Constrained Microcontroller Units
Abstract
1. Introduction
- ≤250 KB RAM
- ≤250 KB ROM
- Inference cost ≤60 M multiply–accumulate operations (MACs)
- We investigated and developed a strategy to optimize existing CNN architectures for given resource constraints.
- We created the CNN Analyzer to inspect the metrics of each layer in a CNN.
- Our model implementations use the TensorFlow Lite for Microcontrollers inference library, as it can run on almost all available MCUs [23].
- We introduced new model architecture scaling factors to optimize MobileNet v1, MobileNet v2, MobileNet v3, and ShuffleNet v2.
- We have published the CNN Analyzer and its related code on our GitHub repository: https://github.com/subrockmann/tiny_cnn (accessed on 3 June 2024)
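The resource budget above (≤250 KB RAM, ≤250 KB ROM, ≤60 M MACs) can be checked layer by layer with standard MAC-counting formulas for convolutions. The following sketch is our own illustration, not part of the published tooling; the function names are hypothetical.

```python
def conv2d_macs(out_h: int, out_w: int, in_ch: int, out_ch: int, kernel: int) -> int:
    """MACs for a standard 2D convolution: each output element costs
    kernel * kernel * in_ch multiply-accumulate operations."""
    return out_h * out_w * out_ch * kernel * kernel * in_ch

def depthwise_separable_macs(out_h: int, out_w: int, in_ch: int,
                             out_ch: int, kernel: int = 3) -> int:
    """MACs for a depthwise-separable block (depthwise conv + 1x1 pointwise
    conv), the building block of MobileNet-style architectures."""
    depthwise = out_h * out_w * in_ch * kernel * kernel
    pointwise = out_h * out_w * in_ch * out_ch
    return depthwise + pointwise

# Example: the first layer of a 96x96x3 input with stride 2 and
# 8 output channels stays far below the 60 M MAC budget.
MAC_BUDGET = 60_000_000
first_layer = conv2d_macs(48, 48, 3, 8, 3)
assert first_layer < MAC_BUDGET
```

Summing these per-layer counts over a whole model variation gives the inference-cost figure that is compared against the 60 M MAC constraint.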
2. Resource Constraints of Microcontroller Units
3. Related Work
3.1. Techniques for Reducing the Size of Neural Networks
3.2. Techniques for Designing Efficient Convolutional Neural Networks
3.3. Efficient Convolutional Neural Networks
3.3.1. Efficient Convolutional Neural Network Architectures for Mobile Devices
3.3.2. Efficient Convolutional Neural Networks for Microcontrollers
3.3.3. Comparison to Our Work
4. Our Strategy for Optimizing CNNs on MCUs
4.1. Step 1: Create Model Variations
4.2. Step 2: Analyze Model Variations with CNN Analyzer
4.3. Step 3: Train and Evaluate Model Variations
4.4. Step 4: Evaluate Model Variations on MCU
5. Experimental Setup
5.1. Dataset
5.2. Running CNNs on MCUs
- Build a TensorFlow model representation: To build the model variation with our model architecture scaling factors, we used the TensorFlow framework.
- Convert to TensorFlow Lite model representation: This conversion optimizes the model for inference on mobile devices.
- Convert to TensorFlow Lite for Microcontrollers model representation: The optimized TensorFlow Lite model is compiled into a C byte array, which is required to run it on MCUs.
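The first two steps are handled by TensorFlow's converter tooling; the final step is conventionally done with `xxd -i`. As an illustration of what the resulting C byte array looks like, here is a minimal Python sketch that reproduces the `xxd -i` output format (the function and variable names are ours):

```python
def tflite_to_c_array(model_bytes: bytes, var_name: str = "g_model") -> str:
    """Render a .tflite flatbuffer as C source defining a byte array,
    the representation TensorFlow Lite for Microcontrollers links into
    the firmware (equivalent to `xxd -i model.tflite`)."""
    lines = [f"const unsigned char {var_name}[] = {{"]
    # Emit 12 bytes per line, formatted as hex literals.
    for i in range(0, len(model_bytes), 12):
        chunk = model_bytes[i:i + 12]
        lines.append("  " + ", ".join(f"0x{b:02x}" for b in chunk) + ",")
    lines.append("};")
    lines.append(f"const unsigned int {var_name}_len = {len(model_bytes)};")
    return "\n".join(lines)
```

In practice the input would be the serialized output of `tf.lite.TFLiteConverter`; the generated source file is then compiled into the MCU firmware image.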
5.3. CNN Analyzer: A Dashboard-Based Tool for Determining Optimal CNN Model Architecture Scaling Factors
5.3.1. Model Scorecard
5.3.2. Implementation
5.3.3. Naming Conventions for the Analyzed Models
6. Experiments and Results
6.1. Benchmark Model
6.2. Optimization of MobileNet v1 in Detail
6.2.1. Optimization with Default Model Architecture Scaling Factors and l
6.2.2. Layer-Wise Optimization with New Model Architecture Scaling Factors pl and ll
6.2.3. Layer-Wise Optimization with New Model Architecture Scaling Factor
6.3. Summary of Benchmark MobileNet v1 Optimization
6.4. Leveraging Visualizations to Find Optimal Model Architecture Scaling Factors
6.5. Optimizations of Further Models
6.5.1. MobileNet v2
6.5.2. MobileNet v3
6.5.3. ShuffleNet v1
6.5.4. ShuffleNet v2
6.5.5. Summary of Model Optimizations
- Build model variations with different width multipliers and check the model size and peak memory. Find a model variation where only one of those constraints is not met.
- If the peak memory constraint is not met, choose a smaller width multiplier.
- If the model size requirement is not met, create a layer-wise visualization of the model parameters and identify the layers with the most model parameters.
- Reduce the number of channels in the layers that have the most model parameters.
- Finally, try to increase the width multiplier as much as possible while keeping the model variation within the constraints.
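The procedure above amounts to a feasibility search over width multipliers. A minimal sketch, with hypothetical stand-in estimators in place of actually building and profiling each model variation:

```python
def find_width_multiplier(candidates, model_size_kb, peak_memory_kb,
                          size_limit_kb=250.0, memory_limit_kb=250.0):
    """Return the largest width multiplier whose model variation satisfies
    both the model-size and peak-memory constraints, or None if none does.
    The estimator callables are assumptions: in practice the values come
    from building each variation and measuring it (e.g. with CNN Analyzer).
    """
    feasible = [a for a in candidates
                if model_size_kb(a) <= size_limit_kb
                and peak_memory_kb(a) <= memory_limit_kb]
    return max(feasible) if feasible else None

# Stand-in estimators for illustration only: parameter count grows roughly
# quadratically with the width multiplier, activation memory roughly linearly.
alpha = find_width_multiplier(
    [0.1, 0.25, 0.35, 0.5, 0.7, 1.0],
    model_size_kb=lambda a: 500 * a * a,
    peak_memory_kb=lambda a: 210 * a,
)
```

If the largest feasible multiplier still violates the model-size constraint, the layer-wise steps above (identifying and shrinking the parameter-heavy layers) apply before retrying with a larger multiplier.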
7. Conclusions and Future Work
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
References
- Krizhevsky, A.; Sutskever, I.; Hinton, G.E. ImageNet Classification with Deep Convolutional Neural Networks. In Proceedings of the Advances in Neural Information Processing Systems, Lake Tahoe, NV, USA, 3–6 December 2012; Curran Associates, Inc.: Glasgow, UK, 2012; Volume 25, pp. 1097–1105. [Google Scholar]
- Girshick, R.; Donahue, J.; Darrell, T.; Malik, J. Region-Based Convolutional Networks for Accurate Object Detection and Segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 2016, 38, 142–158. [Google Scholar] [CrossRef] [PubMed]
- He, K.; Gkioxari, G.; Dollár, P.; Girshick, R. Mask R-CNN. In Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), Venice, Italy, 22–29 October 2017; pp. 2980–2988, ISSN 2380-7504. [Google Scholar] [CrossRef]
- He, K.; Zhang, X.; Ren, S.; Sun, J. Deep Residual Learning for Image Recognition. In Proceedings of the 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778, ISSN 1063-6919. [Google Scholar] [CrossRef]
- Zagoruyko, S.; Komodakis, N. Wide Residual Networks. In Proceedings of the British Machine Vision Conference, BMVC 2016, York, UK, 19–22 September 2016; Wilson, R.C., Hancock, E.R., Smith, W.A.P., Eds.; BMVA Press: Durham, UK, 2016. [Google Scholar]
- Simonyan, K.; Zisserman, A. Very Deep Convolutional Networks for Large-Scale Image Recognition. In Proceedings of the 3rd International Conference on Learning Representations, ICLR 2015, San Diego, CA, USA, 7–9 May 2015. [Google Scholar]
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going Deeper with Convolutions. In Proceedings of the 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Boston, MA, USA, 7–12 June 2015; pp. 1–9, ISSN 1063-6919. [Google Scholar] [CrossRef]
- Alyamkin, S.; Ardi, M.; Berg, A.C.; Brighton, A.; Chen, B.; Chen, Y.; Cheng, H.P.; Fan, Z.; Feng, C.; Fu, B.; et al. Low-Power Computer Vision: Status, Challenges, Opportunities. arXiv 2019, arXiv:1904.07714. [Google Scholar] [CrossRef]
- Banbury, C.; Zhou, C.; Fedorov, I.; Navarro, R.M.; Thakker, U.; Gope, D.; Reddi, V.J.; Mattina, M.; Whatmough, P.N. MicroNets: Neural Network Architectures for Deploying TinyML Applications on Commodity Microcontrollers. Proc. Mach. Learn. Syst. 2021, 3, 517–532. [Google Scholar]
- Iandola, F.N.; Han, S.; Moskewicz, M.W.; Ashraf, K.; Dally, W.J.; Keutzer, K. SqueezeNet: AlexNet-Level Accuracy with 50× Fewer Parameters and <0.5 MB Model Size. arXiv 2016, arXiv:1602.07360. [Google Scholar]
- Howard, A.; Zhu, M.; Chen, B.; Kalenichenko, D.; Wang, W.; Weyand, T.; Andreetto, M.; Adam, H. MobileNets: Efficient Convolutional Neural Networks for Mobile Vision Applications. arXiv 2017, arXiv:1704.04861. [Google Scholar]
- Sandler, M.; Howard, A.; Zhu, M.; Zhmoginov, A.; Chen, L.C. MobileNetV2: Inverted Residuals and Linear Bottlenecks. arXiv 2019, arXiv:1801.04381. [Google Scholar] [CrossRef]
- Howard, A.; Sandler, M.; Chu, G.; Chen, L.C.; Chen, B.; Tan, M.; Wang, W.; Zhu, Y.; Pang, R.; Vasudevan, V.; et al. Searching for MobileNetV3. arXiv 2019, arXiv:1905.02244. [Google Scholar] [CrossRef]
- Zhang, X.; Zhou, X.; Lin, M.; Sun, J. ShuffleNet: An Extremely Efficient Convolutional Neural Network for Mobile Devices. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 6848–6856. [Google Scholar]
- Ma, N.; Zhang, X.; Zheng, H.T.; Sun, J. ShuffleNet V2: Practical Guidelines for Efficient CNN Architecture Design. In Proceedings of the ECCV 2018. Lecture Notes in Computer Science, Cham, Switzerland, 8–14 September 2018; Volume 11218. [Google Scholar] [CrossRef]
- Situnayake, D.; Plunkett, J. AI at the Edge: Solving Real-World Problems with Embedded Machine Learning, 1st ed.; Machine Learning; O’Reilly: Beijing, China; Boston, MA, USA; Farnham, UK; Sebastopol, CA, USA; Tokyo, Japan, 2023. [Google Scholar]
- Hussein, D.; Ibrahim, D.; Alajlan, N. TinyML: Enabling of Inference Deep Learning Models on Ultra-Low-Power IoT Edge Devices for AI Applications. Micromachines 2022, 13, 851. [Google Scholar] [CrossRef] [PubMed]
- Chowdhery, A.; Warden, P.; Shlens, J.; Howard, A.; Rhodes, R. Visual Wake Words Dataset. arXiv 2019, arXiv:1906.05721. [Google Scholar] [CrossRef]
- Banbury, C.; Reddi, V.J.; Torelli, P.; Holleman, J.; Jeffries, N.; Kiraly, C.; Montino, P.; Kanter, D.; Ahmed, S.; Pau, D.; et al. MLPerf Tiny Benchmark. arXiv 2021, arXiv:2106.07597. [Google Scholar]
- Zoph, B.; Vasudevan, V.; Shlens, J.; Le, Q.V. Learning Transferable Architectures for Scalable Image Recognition. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 8697–8710, ISSN: 2575-7075. [Google Scholar] [CrossRef]
- Fedorov, I.; Adams, R.P.; Mattina, M.; Whatmough, P.N. SpArSe: Sparse Architecture Search for CNNs on Resource-Constrained Microcontrollers. arXiv 2019, arXiv:1905.12107. [Google Scholar] [CrossRef]
- Lin, J.; Chen, W.M.; Lin, Y.; Cohn, J.; Gan, C.; Han, S. MCUNet: Tiny Deep Learning on IoT Devices—Technical Report. arXiv 2020, arXiv:2007.10319. [Google Scholar] [CrossRef]
- David, R.; Duke, J.; Jain, A.; Reddi, V.J.; Jeffries, N.; Li, J.; Kreeger, N.; Nappier, I.; Natraj, M.; Wang, T.; et al. TensorFlow Lite Micro: Embedded Machine Learning for TinyML Systems. Proc. Mach. Learn. Syst. 2021, 3, 800–811. [Google Scholar]
- Liberis, E.; Lane, N.D. Neural Networks on Microcontrollers: Saving Memory at Inference via Operator Reordering. arXiv 2020, arXiv:1910.05110. [Google Scholar] [CrossRef]
- Han, S.; Mao, H.; Dally, W.J. Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding. arXiv 2016, arXiv:1510.00149. [Google Scholar]
- LeCun, Y.; Denker, J.S.; Solla, S.A. Optimal Brain Damage. In Proceedings of the Advances in Neural Information Processing Systems 2, Denver, CO, USA, 12 December 1990; pp. 598–605. [Google Scholar]
- Hassibi, B.; Stork, D.; Wolff, G. Optimal Brain Surgeon and general network pruning. In Proceedings of the IEEE International Conference on Neural Networks, San Francisco, CA, USA, 28 March–1 April 1993; Volume 1, pp. 293–299. [Google Scholar] [CrossRef]
- Frankle, J.; Carbin, M. The Lottery Ticket Hypothesis: Finding Sparse, Trainable Neural Networks. In Proceedings of the 7th International Conference on Learning Representations, ICLR 2019, New Orleans, LA, USA, 6–9 May 2019. [Google Scholar]
- Heim, L.; Biri, A.; Qu, Z.; Thiele, L. Measuring what Really Matters: Optimizing Neural Networks for TinyML. arXiv 2021, arXiv:2104.10645. [Google Scholar] [CrossRef]
- Jacob, B.; Kligys, S.; Chen, B.; Zhu, M.; Tang, M.; Howard, A.; Adam, H.; Kalenichenko, D. Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 2704–2713. [Google Scholar]
- Hinton, G.; Vinyals, O.; Dean, J. Distilling the Knowledge in a Neural Network. arXiv 2015, arXiv:1503.02531. [Google Scholar]
- Chollet, F. Xception: Deep Learning with Depthwise Separable Convolutions. In Proceedings of the 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 21–26 July 2017; pp. 1800–1807, ISSN 1063-6919. [Google Scholar] [CrossRef]
- Freeman, I.; Roese-Koerner, L.; Kummert, A. EffNet: An Efficient Structure for Convolutional Neural Networks. arXiv 2018, arXiv:1801.06434. [Google Scholar]
- Lawrence, T.; Zhang, L. IoTNet: An Efficient and Accurate Convolutional Neural Network for IoT Devices. Sensors 2019, 19, 5541. [Google Scholar] [CrossRef]
- Tan, M.; Le, Q.V. EfficientNet: Improving Accuracy and Efficiency through AutoML and Model Scaling. 2019. Available online: https://research.google/blog/efficientnet-improving-accuracy-and-efficiency-through-automl-and-model-scaling/ (accessed on 1 July 2024).
- Gholami, A.; Kwon, K.; Wu, B.; Tai, Z.; Yue, X.; Jin, P.; Zhao, S.; Keutzer, K. SqueezeNext: Hardware-Aware Neural Network Design. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Salt Lake City, UT, USA, 18–22 June 2018; pp. 1638–1647, ISSN 2160-7516. [Google Scholar] [CrossRef]
- Huang, G.; Liu, S.; Maaten, L.V.D.; Weinberger, K.Q. CondenseNet: An Efficient DenseNet Using Learned Group Convolutions. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–22 June 2018; pp. 2752–2761. [Google Scholar] [CrossRef]
- Liu, C.; Zoph, B.; Neumann, M.; Shlens, J.; Hua, W.; Li, L.J.; Fei-Fei, L.; Yuille, A.; Huang, J.; Murphy, K. Progressive Neural Architecture Search. arXiv 2018, arXiv:1712.00559. [Google Scholar] [CrossRef]
- Tan, M.; Chen, B.; Pang, R.; Vasudevan, V.; Sandler, M.; Howard, A.; Le, Q.V. MnasNet: Platform-Aware Neural Architecture Search for Mobile. arXiv 2019, arXiv:1807.11626. [Google Scholar] [CrossRef]
- Real, E.; Aggarwal, A.; Huang, Y.; Le, Q.V. Regularized Evolution for Image Classifier Architecture Search. arXiv 2019, arXiv:1802.01548. [Google Scholar] [CrossRef]
- Liu, H.; Simonyan, K.; Yang, Y. DARTS: Differentiable Architecture Search. arXiv 2019, arXiv:1806.09055. [Google Scholar]
- Wu, B.; Dai, X.; Zhang, P.; Wang, Y.; Sun, F.; Wu, Y.; Tian, Y.; Vajda, P.; Jia, Y.; Keutzer, K. FBNet: Hardware-Aware Efficient ConvNet Design via Differentiable Neural Architecture Search. arXiv 2019, arXiv:1812.03443. [Google Scholar] [CrossRef]
- Han, K.; Wang, Y.; Tian, Q.; Guo, J.; Xu, C.; Xu, C. GhostNet: More Features from Cheap Operations. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Seattle, WA, USA, 13–19 June 2020; pp. 1580–1589. [Google Scholar]
- Mehta, S.; Rastegari, M. MobileViT: Light-weight, General-purpose, and Mobile-friendly Vision Transformer. arXiv 2022, arXiv:2110.02178. [Google Scholar] [CrossRef]
- Krishnamoorthi, R. Quantizing Deep Convolutional Networks for Efficient Inference: A Whitepaper. arXiv 2018, arXiv:1806.08342. [Google Scholar] [CrossRef]
- Lin, J.; Chen, W.M.; Cai, H.; Gan, C.; Han, S. MCUNetV2: Memory-Efficient Patch-based Inference for Tiny Deep Learning. arXiv 2021, arXiv:2110.15352. [Google Scholar] [CrossRef]
- Liberis, E.; Dudziak, Ł.; Lane, N.D. μNAS: Constrained Neural Architecture Search for Microcontrollers. In Proceedings of the 1st Workshop on Machine Learning and Systems, New York, NY, USA, 26 April 2021; EuroMLSys ’21. pp. 70–79. [Google Scholar] [CrossRef]
- Deng, J.; Dong, W.; Socher, R.; Li, L.J.; Li, K.; Fei-Fei, L. ImageNet: A large-scale hierarchical image database. In Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, Miami, FL, USA, 20–25 June 2009; pp. 248–255, ISSN 1063-6919. [Google Scholar] [CrossRef]
- Krizhevsky, A.; Hinton, G. Learning Multiple Layers of Features from Tiny Images; Technical Report 0; University of Toronto: Toronto, ON, Canada, 2009. [Google Scholar]
- Hu, J.; Shen, L.; Sun, G. Squeeze-and-Excitation Networks. In Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141, ISSN 2575-7075. [Google Scholar] [CrossRef]
| Input (VWW, Our Use Case) | Operator | Scaling Factors | Benchmark Channels | Optim. 1 Channels | Optim. 2 Channels | Optim. 3 Channels |
|---|---|---|---|---|---|---|
| 96 × 96 × 3 | conv2d 3 × 3 | | 8 | 8 | 9 | 22 |
| 48 × 48 × 32 | mobilenet/s1 | | 16 | 16 | 19 | 44 |
| 48 × 48 × 64 | mobilenet/s2 | | 32 | 32 | 38 | 89 |
| 24 × 24 × 128 | mobilenet/s1 | | 32 | 32 | 38 | 89 |
| 24 × 24 × 128 | mobilenet/s2 | | 64 | 64 | 76 | 179 |
| 12 × 12 × 256 | mobilenet/s1 | | 64 | 64 | 76 | 179 |
| 12 × 12 × 256 | mobilenet/s2 | × | 128 | 128 | 153 | 107 |
| 6 × 6 × 512 | mobilenet/s1 | × | 128 | 128 | 153 | 107 |
| 6 × 6 × 512 | mobilenet/s1 | × | 128 | 128 | 153 | 107 |
| 6 × 6 × 512 | mobilenet/s1 | × | 128 | 128 | 153 | 107 |
| 6 × 6 × 512 | mobilenet/s1 | × | 128 | — | 153 | 107 |
| 6 × 6 × 512 | mobilenet/s1 | × | 128 | — | 153 | 107 |
| 6 × 6 × 512 | mobilenet/s2 | pl | 256 | 256 | 256 | 64 |
| 3 × 3 × 1024 | mobilenet/s1 | ll | 256 | 256 | 32 | 32 |
| 3 × 3 × 1024 | global avgpool | | 256 | 256 | 32 | 32 |
| 1024 | dense (k) | | 1 | 1 | 1 | 1 |
| k (k = 2) | softmax | | — | — | — | — |
| **Acc (%)** | | | 85.4 | 85.1 | 86.1 | 88.8 |
| **Model Size** | | | 293.8 KB | 244.6 KB | 243.4 KB | 243.9 KB |
| **<250 KB** | | | ✗ | ✓ | ✓ | ✓ |
| Model | Acc. (%) | Model Size (KB) | Peak Memory (KB) | Inference on MCU (ms) | MACs | Params | Bytes/Param |
|---|---|---|---|---|---|---|---|
| mobilenetv1_0.7_96_c3_o2_l5ll32pl64b0.3 | 88.8 | 243.9 | 148.5 | 181.7 | 21,893,563 | 171,743 | 1.454 |
| mobilenetv3smallNSQ_0.3_96_c3_o2_l32pl128 | 86.1 | 172.8 | 110.6 | 118.8 | 6,191,720 | 78,664 | 2.249 |
| mobilenetv1_0.25_96_c3_o2 (benchmark) | 85.4 | 293.8 | 54.0 | 66.4 | 7,489,664 | 221,794 | 1.356 |
| shufflenetv1_0.25_96_c3_o2_g1 | 85.1 | 175.2 | 81.0 | 69.6 | 3,184,560 | 71,030 | 2.526 |
| mobilenetv2_0.25_96_c3_o2_t5l256 | 84.1 | 248.0 | 56.3 | 59.5 | 3,886,352 | 138,366 | 1.835 |
| shufflenetv2_0.1_96_c3_o2_l128 | 83.3 | 167.8 | 78.8 | 57.4 | 2,741,080 | 56,058 | 3.065 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Brockmann, S.; Schlippe, T. Optimizing Convolutional Neural Networks for Image Classification on Resource-Constrained Microcontroller Units. Computers 2024, 13, 173. https://doi.org/10.3390/computers13070173