Optimizing Artificial Neural Networks to Minimize Arithmetic Errors in Stochastic Computing Implementations
Abstract
1. Introduction
2. Stochastic Computing
3. Multiplication Error
- Uniform–uniform: Weights are uniformly distributed between −1 and +1, i.e., w ∼ U(−1, +1), which results in the AMSE given by (9).
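Although Equation (9) is not reproduced here, the quantity it describes can be checked empirically. The Python sketch below estimates the average MSE of bipolar SC multiplication over weights and activations drawn uniformly from [−1, +1]; the stream length N = 256 and the trial count are illustrative assumptions, not values taken from the paper.

```python
import random

def bipolar_sc_product_sq_error(x, w, n_bits, rng):
    """One trial: encode x and w as independent n_bits-long bipolar
    streams, multiply with a per-bit XNOR, and return the squared
    error of the decoded result versus the exact product x*w."""
    px, pw = (x + 1) / 2, (w + 1) / 2        # bipolar value -> bit probability
    xs = [rng.random() < px for _ in range(n_bits)]
    ws = [rng.random() < pw for _ in range(n_bits)]
    ones = sum(a == b for a, b in zip(xs, ws))   # XNOR output stream
    est = 2 * ones / n_bits - 1                  # decode back to bipolar
    return (est - x * w) ** 2

rng = random.Random(0)
n_bits, trials = 256, 2000
amse = sum(
    bipolar_sc_product_sq_error(rng.uniform(-1, 1), rng.uniform(-1, 1),
                                n_bits, rng)
    for _ in range(trials)
) / trials
print(f"Empirical AMSE for uniform weights, N={n_bits}: {amse:.5f}")
```

Because the XNOR output is an unbiased estimate of the product, the remaining error is pure bitstream variance, which shrinks as 1/N.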
4. LFSR Seed Selection Error
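To make the seed dependence concrete, here is a minimal Python sketch of an LFSR-based stochastic number generator. The 8-bit maximal-length polynomial x^8 + x^6 + x^5 + x^4 + 1, the comparator convention, and the seed values are assumptions for illustration, not the configuration used in this work.

```python
def lfsr8(seed):
    """8-bit Fibonacci LFSR for the primitive polynomial
    x^8 + x^6 + x^5 + x^4 + 1 (period 255 for any nonzero seed)."""
    state = seed & 0xFF
    assert state != 0, "the all-zero seed locks the LFSR"
    while True:
        yield state
        fb = ((state >> 7) ^ (state >> 5) ^ (state >> 4) ^ (state >> 3)) & 1
        state = ((state << 1) | fb) & 0xFF

def sc_stream(value_q, seed, length=255):
    """Stochastic number generator: emit 1 whenever the LFSR state is
    at or below the quantized value (an 8-bit integer in [1, 255])."""
    gen = lfsr8(seed)
    return [int(next(gen) <= value_q) for _ in range(length)]

# Over one full period every stream for value_q carries exactly value_q
# ones, so the decoded *mean* is seed-independent; what changes with the
# seed is the temporal alignment of those ones, and hence the correlation
# between interacting streams -- the source of seed-selection error.
s1 = sc_stream(128, seed=1)
s2 = sc_stream(128, seed=77)
```

Both streams decode to the same value, yet their bit patterns differ, which is why pairing streams with poorly chosen seeds can bias downstream XNOR multiplications.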
5. Stochastic Computing Aware Training
- Weight clamping: Analogous to fixed-point representation, the technique of clamping weights within a specific range is employed. This method prevents the emergence of extreme weight values in the distribution, which can otherwise escalate linear quantization errors. By restricting the variance of the weight distribution, smaller quantization errors are achieved.
- Weight distribution uniformization: Adjusting the weight distribution to be more uniform can mitigate the relative errors associated with smaller quantized values. Additionally, dispersing weights that are proximate to zero can diminish the incidence of large relative errors during bipolar SC multiplications.
- Weight distribution binarization: To enhance the accuracy of multiplication outcomes in stochastic circuits, increasing the proportion of weights represented by bipolar quantities set to −1 and +1 is beneficial. In the bipolar encoding framework, these values correspond to activation rates of 0 and 1, respectively, and therefore do not contribute to the arithmetic error.
Algorithm 1 Stochastic computing aware weight distribution modification and quantization steps
Require: Weights, width factor, rounding rate, number of integer values
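A plain NumPy sketch of how the modifications above might combine into a pipeline with Algorithm 1's parameters. The interpretations here are our reading of the parameter names, not the authors' exact procedure: the width factor is taken as a clamp bound in multiples of the weight standard deviation, the rounding rate as the fraction of weights snapped to ±1, and the number of integer values as the linear quantization level count.

```python
import numpy as np

def sc_aware_modify_and_quantize(w, width_factor=1.0, rounding_rate=0.3,
                                 n_levels=64):
    """Illustrative pipeline: clamp, disperse near-zero weights,
    snap the largest weights to +/-1, then linearly quantize."""
    w = np.asarray(w, dtype=float).copy()
    step = 2.0 / (n_levels - 1)              # linear quantization step
    # 1. Clamp to a symmetric bound set by the width factor, and to the
    #    bipolar SC range [-1, +1].
    bound = min(width_factor * float(np.std(w)), 1.0)
    w = np.clip(w, -bound, bound)
    # 2. Disperse weights closer to zero than one quantization step,
    #    reducing large relative errors on small products.
    w = np.where(np.abs(w) < step, step * np.sign(w), w)
    # 3. Snap the rounding_rate fraction of largest-magnitude weights
    #    to -1/+1; these multiply error-free in bipolar SC.
    k = int(rounding_rate * w.size)
    if k:
        idx = np.argsort(-np.abs(w.ravel()))[:k]
        w.ravel()[idx] = np.sign(w.ravel()[idx])
    # 4. Linear quantization onto n_levels values spanning [-1, +1].
    return np.round((w + 1.0) / step) * step - 1.0

wq = sc_aware_modify_and_quantize(
    np.random.default_rng(1).normal(0.0, 0.3, size=100))
```

In practice such a modification step would be applied between training epochs so the network can adapt to the modified distribution, but that retraining loop is omitted here.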
6. Experimental Results
Metric | TCFPGA17 [23] | BNFPGA18 [20] | SCFPGA20 [24] | SCFPGA24 [21] | This Work |
---|---|---|---|---|---|
Year | 2017 | 2018 | 2020 | 2024 | 2024 |
Architecture parallelism | Sequential | Sequential | Semi-Parallel | Parallel | Parallel |
Computing paradigm | TC | BNN | SC | SC | SC |
Activation/Weight bits | 16/8 | 8/1 | 9/9 | 6/6 | 6/6 |
FPGA family | Virtex7 | Stratix V | Zynq | Kintex7 | Arria 10 |
FPGA name | VX690T | 5SFSD8 | XC7Z020 | XC7K325T | GX1150 |
Frequency (MHz) | 100 | 150 | 60 | 110 | 150 |
Software Acc (%) | 99.17 | 98.70 | 98.67 | 98.36 | 98.98 |
Hardware Acc (%) | 98.16 | 98.24 | 98.13 | 97.63 | 97.64 |
Acc Degradation (%) | 1.01 | 0.46 | 0.54 | 0.73 | 1.34 |
Throughput (Images/s) | 10,617 | 294,118 | 170 | 1,718,800 | 1,190,476 |
Performance (Images/s/MHz) | 106 | 1961 | 3 | 15,626 | 7937 |
Power (W) | 25.2 | 26.2 | 3.7 | 6.8 | 4.9 |
Energy efficiency (Images/J) | 421 | 11,226 | 46 | 254,373 | 243,171 |
Logic used (K LUTs or ALMs) | 233 | 0.182 | 28 | 153 | 318 |
DSP (blocks) | 2907 | 20 | 0 | 0 | 0 |
Memory (Mbits) | 17.2 | 44.2 | 1.7 | 0.0 | 0.0 |
Work | Year | Software Acc (%) | Hardware Acc (%) | Acc Degradation (%) | Test Platform |
---|---|---|---|---|---|
SCCNN19 [25] | 2019 | 98.47 | 97.94 | −0.53 | Sim |
SCFPGA20 [24] | 2020 | 98.67 | 98.13 | −0.54 | FPGA |
SCCNN21 [26] | 2021 | 98.75 | 97.50 | −1.25 | Sim |
SCFPGA22 [12] | 2022 | 98.60 | 97.58 | −1.02 | FPGA |
SCFPGA24 [21] (8-bit) | 2024 | 98.36 | 98.22 | −0.14 | FPGA |
This work (8-bit) | 2024 | 98.98 | 98.97 | −0.01 | FPGA |
7. Conclusions
Author Contributions
Funding
Data Availability Statement
Conflicts of Interest
Abbreviations
Abbreviation | Meaning |
---|---|
AD | Accuracy degradation |
AMSE | Average mean squared error |
BNN | Binarized neural network |
CNN | Convolutional neural network |
FPGA | Field-programmable gate array |
LFSR | Linear feedback shift register |
ML | Machine learning |
MSE | Mean squared error |
NN | Neural network |
SC | Stochastic computing |
References
- Chen, J.; Ran, X. Deep learning with edge computing: A review. Proc. IEEE 2019, 107, 1655–1674.
- Suda, N.; Loh, D. Machine Learning on Arm Cortex-M Microcontrollers; Arm Ltd.: Cambridge, UK, 2019.
- Google. Google Edge TPU. Available online: https://cloud.google.com/edge-tpu (accessed on 16 July 2024).
- Abadi, M.; Barham, P.; Chen, J.; Chen, Z.; Davis, A.; Dean, J.; Devin, M.; Ghemawat, S.; Irving, G.; Isard, M.; et al. TensorFlow: A system for large-scale machine learning. In Proceedings of the 12th USENIX Conference on Operating Systems Design and Implementation, Savannah, GA, USA, 2–4 November 2016; Volume 16, pp. 265–283.
- Paszke, A.; Gross, S.; Chintala, S.; Chanan, G.; Yang, E.; DeVito, Z.; Lin, Z.; Desmaison, A.; Antiga, L.; Lerer, A. Automatic differentiation in PyTorch. In Proceedings of the 31st Conference on Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017; Available online: https://openreview.net/forum?id=BJJsrmfCZ (accessed on 16 July 2024).
- Akopyan, F.; Sawada, J.; Cassidy, A.; Alvarez-Icaza, R.; Arthur, J.; Merolla, P.; Imam, N.; Nakamura, Y.; Datta, P.; Nam, G.J.; et al. TrueNorth: Design and tool flow of a 65 mW 1 million neuron programmable neurosynaptic chip. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2015, 34, 1537–1557.
- Davies, M.; Srinivasa, N.; Lin, T.H.; Chinya, G.; Cao, Y.; Choday, S.H.; Dimou, G.; Joshi, P.; Imam, N.; Jain, S.; et al. Loihi: A neuromorphic manycore processor with on-chip learning. IEEE Micro 2018, 38, 82–99.
- Gaines, B.R. Stochastic computing systems. Adv. Inf. Syst. Sci. 1969, 2, 37–172.
- Frasser, C.F.; Roca, M.; Rosselló, J.L. Optimal stochastic computing randomization. Electronics 2021, 10, 2985.
- LeCun, Y.; Bottou, L.; Bengio, Y.; Haffner, P. Gradient-based learning applied to document recognition. Proc. IEEE 1998, 86, 2278–2324.
- LeCun, Y. The MNIST Database of Handwritten Digits. Available online: http://yann.lecun.com/exdb/mnist/ (accessed on 16 July 2024).
- Frasser, C.F.; Linares-Serrano, P.; de los Ríos, I.D.; Morán, A.; Skibinsky-Gitlin, E.S.; Font-Rosselló, J.; Canals, V.; Roca, M.; Serrano-Gotarredona, T.; Rosselló, J.L. Fully Parallel Stochastic Computing Hardware Implementation of Convolutional Neural Networks for Edge Computing Applications. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 10408–10418.
- Li, Z.; Chen, Z.; Zhang, Y.; Huang, Z.; Qian, W. Simultaneous area and latency optimization for stochastic circuits by D flip-flop insertion. IEEE Trans. Comput. Aided Des. Integr. Circuits Syst. 2018, 38, 1251–1264.
- Neugebauer, F.; Polian, I.; Hayes, J.P. Building a better random number generator for stochastic computing. In Proceedings of the 2017 Euromicro Conference on Digital System Design (DSD), Vienna, Austria, 30 August–1 September 2017; pp. 1–8.
- Morán, A.; Parrilla, L.; Roca, M.; Font-Rossello, J.; Isern, E.; Canals, V. Digital Implementation of Radial Basis Function Neural Networks Based on Stochastic Computing. IEEE J. Emerg. Sel. Top. Circuits Syst. 2023, 13, 257–269.
- Anderson, J.H.; Hara-Azumi, Y.; Yamashita, S. Effect of LFSR seeding, scrambling and feedback polynomial on stochastic computing accuracy. In Proceedings of the 2016 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany, 14–18 March 2016; pp. 1550–1555.
- LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444.
- Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980.
- Gidel Company. PROC10A Board Image. Available online: https://gidel.com/product/proc10a/ (accessed on 10 June 2020).
- Liang, S.; Yin, S.; Liu, L.; Luk, W.; Wei, S. FP-BNN: Binarized neural network on FPGA. Neurocomputing 2018, 275, 1072–1086.
- Lee, Y.Y.; Halim, Z.A.; Ab Wahab, M.N.; Almohamad, T.A. Stochastic Computing Convolutional Neural Network Architecture Reinvented for Highly Efficient Artificial Intelligence Workload on Field-Programmable Gate Array. Research 2024, 7, 0307.
- Costoya, A.M. Compact Machine Learning Systems with Reconfigurable Computing. Ph.D. Thesis, Universitat de les Illes Balears, Palma, Spain, 2022.
- Li, Z.; Wang, L.; Guo, S.; Deng, Y.; Dou, Q.; Zhou, H.; Lu, W. Laius: An 8-Bit Fixed-Point CNN Hardware Inference Engine. In Proceedings of the 2017 IEEE International Symposium on Parallel and Distributed Processing with Applications and 2017 IEEE International Conference on Ubiquitous Computing and Communications (ISPA/IUCC), Guangzhou, China, 12–15 December 2017; pp. 143–150.
- Muthappa, P.K.; Neugebauer, F.; Polian, I.; Hayes, J.P. Hardware-Based Fast Real-Time Image Classification with Stochastic Computing. In Proceedings of the 2020 IEEE 38th International Conference on Computer Design (ICCD), Hartford, CT, USA, 18–21 October 2020; pp. 340–347.
- Zhang, Y.; Zhang, X.; Song, J.; Wang, Y.; Huang, R.; Wang, R. Parallel Convolutional Neural Network (CNN) Accelerators Based on Stochastic Computing. In Proceedings of the 2019 IEEE International Workshop on Signal Processing Systems (SiPS), Nanjing, China, 20–23 October 2019; pp. 19–24.
- Sadi, M.H.; Mahani, A. Accelerating Deep Convolutional Neural Network base on stochastic computing. Integration 2021, 76, 113–121.
- Frasser, C.F. Hardware Implementation of Machine Learning and Deep-Learning Systems oriented to Image Processing. Ph.D. Thesis, Universitat de les Illes Balears, Palma, Spain, 2022.
Pulses | Unipolar | Bipolar |
---|---|---|
0/4 | 0 | −1 |
1/4 | 0.25 | −0.5 |
2/4 | 0.50 | 0 |
3/4 | 0.75 | 0.5 |
4/4 | 1 | 1 |
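The encodings in the table above translate directly into code. This minimal Python sketch (function names are ours) maps pulse counts to unipolar and bipolar values and shows why bipolar SC multiplication reduces to a per-bit XNOR.

```python
def to_unipolar(ones, length):
    """Unipolar decode: the value is simply the fraction of 1 pulses."""
    return ones / length

def to_bipolar(ones, length):
    """Bipolar decode: map the pulse fraction p in [0, 1] to 2p - 1."""
    return 2 * ones / length - 1

# The 4-pulse rows of the table above:
to_unipolar(1, 4)   # -> 0.25
to_bipolar(0, 4)    # -> -1.0
to_bipolar(2, 4)    # -> 0.0

def bipolar_multiply(a_bits, b_bits):
    """Bipolar SC multiplication: XNOR each pair of bits, then decode."""
    out = [1 - (a ^ b) for a, b in zip(a_bits, b_bits)]
    return to_bipolar(sum(out), len(out))

# +1 (all ones) times -1 (all zeros) gives -1: every XNOR output is 0.
bipolar_multiply([1, 1, 1, 1], [0, 0, 0, 0])   # -> -1.0
```

The XNOR identity holds exactly in expectation only when the two input streams are uncorrelated, which is why stream generation (and LFSR seeding) matters.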
Metric | SCFPGA22 [12] | This Work |
---|---|---|
Year | 2022 | 2024 |
Architecture parallelism | Parallel | Parallel |
Computing paradigm | SC | SC |
Activation/Weight bits | 8/8 | 6/6 |
FPGA platform | Arria10 GX1150 | Arria10 GX1150 |
Frequency (MHz) | 150 | 150 |
Software Acc (%) | 98.60 | 98.98 |
Hardware Acc (%) | 97.58 | 97.64 |
Acc Degradation (%) | 1.02 | 1.34 |
Throughput (Images/s) | 294,118 | 1,190,476 |
Performance (Images/s/MHz) | 1961 | 7937 |
Power (W) | 21.0 | 4.9 |
Energy efficiency (Images/J) | 14,006 | 243,171 |
Logic used (K LUTs or ALMs) | 343 | 318 |
DSP (blocks) | 0 | 0 |
Memory (Mbits) | 0.00 | 0.00 |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Frasser, C.F.; Morán, A.; Canals, V.; Font, J.; Isern, E.; Roca, M.; Rosselló, J.L. Optimizing Artificial Neural Networks to Minimize Arithmetic Errors in Stochastic Computing Implementations. Electronics 2024, 13, 2846. https://doi.org/10.3390/electronics13142846