MDCIM: MRAM-Based Digital Computing-in-Memory Macro for Floating-Point Computation with High Energy Efficiency and Low Area Overhead
Abstract
1. Introduction
- (1) Design a digital computing circuit based on SOT-MRAM to reduce the area overhead of MAC units.
- (2) Implement FP64 MAC operations based on FMA instructions to reduce the latency of FP operations.
- (3) Achieve higher calculation frequency based on digital CIM to reduce the energy consumption of the macro.
2. Background
2.1. Working Modes of DCIM
2.2. Principle of FP64 Computation
- ① Unpack the operands into the FP multiplier. The sign, exponent, and mantissa of each operand are separated. The mantissa is then converted to the significand by prepending the hidden bit 1 as the most significant bit.
- ② XOR the sign bits of the two operands to obtain the sign bit of the product.
- ③ Multiply the significands of the two operands to obtain the product. This is the key step that limits the overall computational speed.
- ④ Add the exponents of the two operands. The sum is adjusted when the product is normalized.
- ⑤ Pack the parts into the product. Remove the hidden bit, and reassemble the sign, exponent, and mantissa of the result.
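The five steps above can be sketched in software as follows. This is a minimal illustration rather than the macro's circuit: the function names `unpack_fp64` and `fp64_multiply` are hypothetical, only normal (non-zero, finite, non-subnormal) operands are handled, and round-to-nearest-even is assumed for normalization because the step list does not name a rounding mode.

```python
import struct

BIAS = 1023       # FP64 exponent bias
FRAC_BITS = 52    # stored mantissa width


def unpack_fp64(value):
    """Split a Python float into (sign, biased exponent, 52-bit mantissa)."""
    bits = struct.unpack(">Q", struct.pack(">d", value))[0]
    return bits >> 63, (bits >> FRAC_BITS) & 0x7FF, bits & ((1 << FRAC_BITS) - 1)


def fp64_multiply(a, b):
    """Multiply two normal FP64 values by following steps 1 to 5."""
    sign_a, exp_a, man_a = unpack_fp64(a)
    sign_b, exp_b, man_b = unpack_fp64(b)
    if exp_a in (0, 0x7FF) or exp_b in (0, 0x7FF):
        raise ValueError("sketch handles normal operands only")

    # Step 1: restore the hidden bit to form the 53-bit significands.
    sig_a = (1 << FRAC_BITS) | man_a
    sig_b = (1 << FRAC_BITS) | man_b

    # Step 2: the sign of the product is the XOR of the operand signs.
    sign = sign_a ^ sign_b

    # Step 3: integer multiplication of the significands (105 or 106 bits).
    product = sig_a * sig_b

    # Step 4: add the exponents (removing one bias) and normalize.
    shift = product.bit_length() - (FRAC_BITS + 1)   # 52 or 53
    exponent = exp_a + exp_b - BIAS + (shift - FRAC_BITS)
    kept = product >> shift
    remainder = product & ((1 << shift) - 1)
    half = 1 << (shift - 1)
    # Assumed rounding mode: round to nearest, ties to even.
    if remainder > half or (remainder == half and kept & 1):
        kept += 1
        if kept >> (FRAC_BITS + 1):   # rounding carried into a new bit
            kept >>= 1
            exponent += 1
    if not 1 <= exponent <= 2046:
        raise ValueError("result outside the normal range")

    # Step 5: drop the hidden bit and reassemble sign, exponent, mantissa.
    mantissa = kept & ((1 << FRAC_BITS) - 1)
    bits = (sign << 63) | (exponent << FRAC_BITS) | mantissa
    return struct.unpack(">d", struct.pack(">Q", bits))[0]
```

On a platform with IEEE 754 double arithmetic, `fp64_multiply(a, b)` should agree with the native `a * b` for operands and results in the normal range. The integer multiplication in step 3 is where nearly all of the work sits, which matches the remark that this step limits the overall speed.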
2.3. Process of MAC Operation
3. Our MDCIM Work
3.1. Algorithm and Architecture of FP64 Computation
3.2. Array of SOT-MRAM
4. Simulation and Discussion
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- [1] Chen, J.; Li, J.; Li, Y.; Miao, X.-S. Multiply accumulate operations in memristor crossbar arrays for analog computing. J. Semicond. 2021, 42, 013104.
- [2] Ahn, J.; Yoo, S.; Mutlu, O.; Choi, K. PIM-enabled instructions. In Proceedings of the 42nd Annual International Symposium on Computer Architecture, Portland, OR, USA, 13–17 June 2015; pp. 336–348.
- [3] Chiu, Y.-C.; Yang, C.-S.; Teng, S.-H.; Huang, H.-Y.; Chang, F.-C.; Wu, Y.; Chien, Y.-A.; Hsieh, F.-L.; Li, C.-Y.; Lin, G.-Y.; et al. A 22nm 4Mb STT-MRAM Data-Encrypted Near-Memory Computation Macro with a 192GB/s Read-and-Decryption Bandwidth and 25.1-55.1TOPS/W 8b MAC for AI Operations. In Proceedings of the 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 20–26 February 2022; pp. 178–180.
- [4] Deaville, P.; Zhang, B.; Verma, N. A 22nm 128-kb MRAM Row/Column-Parallel In-Memory Computing Macro with Memory-Resistance Boosting and Multi-Column ADC Readout. In Proceedings of the 2022 IEEE Symposium on VLSI Technology and Circuits (VLSI Technology and Circuits), Honolulu, HI, USA, 12–17 June 2022; pp. 268–269.
- [5] Jung, S.; Lee, H.; Myung, S.; Kim, H.; Yoon, S.K.; Kwon, S.W.; Ju, Y.; Kim, M.; Yi, W.; Han, S.; et al. A crossbar array of magnetoresistive memory devices for in-memory computing. Nature 2022, 601, 211–216.
- [6] Fujiwara, H.; Mori, H.; Zhao, W.-C.; Chuang, M.-C.; Naous, R.; Chuang, C.-K.; Hashizume, T.; Sun, D.; Lee, C.-F.; Akarvardar, K.; et al. A 5-nm 254-TOPS/W 221-TOPS/mm2 Fully-Digital Computing-in-Memory Macro Supporting Wide-Range Dynamic-Voltage-Frequency Scaling and Simultaneous MAC and Write Operations. In Proceedings of the 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 20–26 February 2022; pp. 1–3.
- [7] Chi, P.; Li, S.; Xu, C.; Zhang, T.; Zhao, J.; Liu, Y.; Wang, Y.; Xie, Y. PRIME: A Novel Processing-in-Memory Architecture for Neural Network Computation in ReRAM-Based Main Memory. In Proceedings of the 2016 ACM/IEEE 43rd Annual International Symposium on Computer Architecture (ISCA), Seoul, Republic of Korea, 18–22 June 2016; pp. 27–39.
- [8] Li, S.; Xu, C.; Zou, Q.; Zhao, J.; Lu, Y.; Xie, Y. Pinatubo: A Processing-in-Memory Architecture for Bulk Bitwise Operations in Emerging Non-volatile Memories. In Proceedings of the 53rd Annual Design Automation Conference, Austin, TX, USA, 5–9 June 2016; pp. 1–6.
- [9] Wang, J.; Gu, Z.; Wang, H.; Hao, Z. TAM: A Computing in Memory based on Tandem Array within STT-MRAM for Energy-Efficient Analog MAC Operation. In Proceedings of the 2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), Antwerp, Belgium, 17–19 April 2023; pp. 1–6.
- [10] Doevenspeck, J.; Garello, K.; Verhoef, B.; Degraeve, R.; Van Beek, S.; Crotti, D.; Yasin, F.; Couet, S.; Jayakumar, G.; Papistas, I.A.; et al. SOT-MRAM Based Analog in-Memory Computing for DNN Inference. In Proceedings of the 2020 IEEE Symposium on VLSI Technology, Honolulu, HI, USA, 16–19 June 2020; pp. 1–2.
- [11] Tu, F.; Wang, Y.; Wu, Z.; Liang, L.; Ding, Y.; Kim, B.; Liu, L.; Wei, S.; Xie, Y.; Yin, S. A 28 nm 29.2TFLOPS/W BF16 and 36.5TOPS/W INT8 Reconfigurable Digital CIM Processor with Unified FP/INT Pipeline and Bitwise In-Memory Booth Multiplication for Cloud Deep Learning Acceleration. In Proceedings of the 2022 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 20–26 February 2022; pp. 1–3.
- [12] Wang, Z.; Zhou, H.; Wang, M.; Cai, W.; Zhu, D.; Klein, J.-O.; Zhao, W. Proposal of Toggle Spin Torques Magnetic RAM for Ultrafast Computing. IEEE Electron Device Lett. 2019, 40, 726–729.
- [13] Wang, Z.; Zhang, L.; Wang, M.; Wang, Z.; Zhu, D.; Zhang, Y.; Zhao, W. High-Density NAND-Like Spin Transfer Torque Memory with Spin Orbit Torque Erase Operation. IEEE Electron Device Lett. 2018, 39, 343–346.
- [14] Wang, M.; Cai, W.; Zhu, D.; Wang, Z.; Kan, J.; Zhao, Z.; Cao, K.; Wang, Z.; Zhang, Y.; Zhang, T.; et al. Field-free switching of a perpendicular magnetic tunnel junction through the interplay of spin–orbit and spin-transfer torques. Nat. Electron. 2018, 1, 582–588.
- [15] Wang, M.; Cai, W.; Cao, K.; Zhou, J.; Wrona, J.; Peng, S.; Yang, H.; Wei, J.; Kang, W.; Zhang, Y.; et al. Current-induced magnetization switching in atom-thick tungsten engineered perpendicular magnetic tunnel junctions with large tunnel magnetoresistance. Nat. Commun. 2018, 9, 671.
- [16] Peng, S.; Zhu, D.; Zhou, J.; Zhang, B.; Cao, A.; Wang, M.; Cai, W.; Cao, K.; Zhao, W. Modulation of Heavy Metal/Ferromagnetic Metal Interface for High-Performance Spintronic Devices. Adv. Electron. Mater. 2019, 5, 1900134.
- [17] Peng, S.; Zhao, W.; Qiao, J.; Su, L.; Zhou, J.; Yang, H.; Zhang, Q.; Zhang, Y.; Grezes, C.; Amiri, P.K.; et al. Giant interfacial perpendicular magnetic anisotropy in MgO/CoFe/capping layer structures. Appl. Phys. Lett. 2017, 110, 072403.
- [18] Whitehead, N.; Fit-Florea, A. Precision & performance: Floating point and IEEE 754 compliance for NVIDIA GPUs. NVIDIA White Paper, 2011.
- [19] Szydzik, T.; Moloney, D. Precision refinement for media-processor SoCs: fp32-> fp64 on myriad. In Proceedings of the 2014 IEEE Hot Chips 26 Symposium (HCS), Las Palmas, Spain, 10–12 August 2014.
- [20] Zhang, H.; Chen, D.; Ko, S.B. Efficient multiple-precision floating-point fused multiply-add with mixed-precision support. IEEE Trans. Comput. 2019, 68, 1035–1048.
- [21] Park, J.; Lee, S.; Jeon, D. A neural network training processor with 8-bit shared exponent bias floating point and multiple-way fused multiply-add trees. IEEE J. Solid-State Circuits 2021, 57, 965–977.
- [22] Rawat, R.M.; Kumar, V. A Comparative Study of 6T and 8T SRAM Cell With Improved Read and Write Margins in 130 nm CMOS Technology. Wseas Trans. Circuits Syst. 2020, 19, 13–18.
- [23] Tohoku University. Researchers Develop 128Mb STT-MRAM with World’s Fastest Write Speed for Embedded Memory. ScienceDaily. 2023. Available online: www.sciencedaily.com/releases/2018/12/181228164841.htm (accessed on 1 September 2023).
- [24] Jeong, S.; Park, J.; Jeon, D. A 28nm 1.644TFLOPS/W Floating-Point Computation SRAM Macro with Variable Precision for Deep Neural Network Inference and Training. In Proceedings of the ESSCIRC 2022-IEEE 48th European Solid State Circuits Conference (ESSCIRC), Milan, Italy, 19–22 September 2022; pp. 145–148.
- [25] Lee, J.; Kim, J.; Jo, W.; Kim, S.; Kim, S.; Lee, J.; Yoo, H.-J. A 13.7 TFLOPS/W Floating-point DNN Processor using Heterogeneous Computing Architecture with Exponent-Computing-in-Memory. In Proceedings of the 2021 Symposium on VLSI Circuits, Kyoto, Japan, 13–19 June 2021.
- [26] Wang, J.; Wang, X.; Eckert, C.; Subramaniyan, A.; Das, R.; Blaauw, D.; Sylvester, D. A Compute SRAM with Bit-Serial Integer/Floating-Point Operations for Programmable In-Memory Vector Acceleration. In Proceedings of the 2019 IEEE International Solid-State Circuits Conference (ISSCC), San Francisco, CA, USA, 17–21 February 2019.
- [27] Wang, J.; Wang, X.; Eckert, C.; Subramaniyan, A.; Das, R.; Blaauw, D.; Sylvester, D. A 28-nm Compute SRAM With Bit-Serial Logic/Arithmetic Operations for Programmable In-Memory Vector Computing. IEEE J. Solid-State Circuits 2020, 55, 76–86.
- [28] Leon, V.; Paparouni, T.; Petrongonas, E.; Soudris, D.; Pekmestzi, K. Improving Power of DSP and CNN Hardware Accelerators Using Approximate Floating-point Multipliers. ACM Trans. Embed. Comput. Syst. 2021, 20, 1–21.
- [29] Gustafsson, O.; Hellman, N. Approximate Floating-Point Operations with Integer Units by Processing in the Logarithmic Domain. In Proceedings of the 2021 IEEE 28th Symposium on Computer Arithmetic (ARITH), Lyngby, Denmark, 14–16 June 2021.
a (input) | x (input) | b (input) | d (output) |
---|---|---|---|
0x40454d4a9a95352a (42.60) | 0x40425f44be897d13 (36.74) | 0x408073cc6798cf32 (526.47) | 0x40a057d8496a0647 (2091.92) |
0x402b9cf739ee73dd (13.81) | 0x4045124e249c4939 (42.14) | 0x4081a05840b08161 (564.04) | 0x4091e7931bfbd1a3 (1145.89) |
0x4019d4b3a96752cf (6.46) | 0x404265e8cbd197a3 (36.80) | 0x407edc24b8497093 (493.76) | 0x4086db06849fa665 (731.38) |
0x405676f4ede9dbd4 (89.86) | 0x40340aa015402a80 (20.04) | 0x407726f04de09bc1 (370.43) | 0x40a0f6acacac57ff (2171.34) |
0x40580ec81d903b20 (96.23) | 0x40585af4b5e96bd3 (97.42) | 0x4077c61f8c3f187e (380.38) | 0x40c30da89eda44c6 (9755.32) |
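The rows above can be cross-checked in software. The sketch below decodes each 64-bit pattern from the table and compares the listed output with a·x + b. That relation is an assumption inferred from the FMA-based MAC named in the contributions and from the decimal annotations, not a formula stated alongside the table. The helper names `hex_to_fp64` and `relative_errors` are hypothetical.

```python
import struct

# Hex patterns copied from the table: (a, x, b, d) per row.
ROWS = [
    ("0x40454d4a9a95352a", "0x40425f44be897d13", "0x408073cc6798cf32", "0x40a057d8496a0647"),
    ("0x402b9cf739ee73dd", "0x4045124e249c4939", "0x4081a05840b08161", "0x4091e7931bfbd1a3"),
    ("0x4019d4b3a96752cf", "0x404265e8cbd197a3", "0x407edc24b8497093", "0x4086db06849fa665"),
    ("0x405676f4ede9dbd4", "0x40340aa015402a80", "0x407726f04de09bc1", "0x40a0f6acacac57ff"),
    ("0x40580ec81d903b20", "0x40585af4b5e96bd3", "0x4077c61f8c3f187e", "0x40c30da89eda44c6"),
]


def hex_to_fp64(pattern):
    """Interpret a 16-digit hex string as a big-endian IEEE 754 double."""
    return struct.unpack(">d", int(pattern, 16).to_bytes(8, "big"))[0]


def relative_errors(rows):
    """Relative gap between the listed d and a*x + b for each row."""
    errors = []
    for a_hex, x_hex, b_hex, d_hex in rows:
        a, x, b, d = (hex_to_fp64(h) for h in (a_hex, x_hex, b_hex, d_hex))
        # Python rounds the product and the sum separately; a fused
        # multiply-add rounds once, so the last bits may differ.
        reference = a * x + b
        errors.append(abs(d - reference) / abs(reference))
    return errors


if __name__ == "__main__":
    for row, err in zip(ROWS, relative_errors(ROWS)):
        print(row[3], f"relative gap vs a*x+b: {err:.3e}")
```

A small non-zero gap would not by itself indicate an error in the macro, since a fused operation and a separately rounded multiply and add can legitimately differ in the least significant bits.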
Parameter | ISSCC’19 [26] | JSSC’20 [27] | VLSI’21 [25] | ESSCIRC’22 [24] | This Work |
---|---|---|---|---|---|
Memory array | 28 nm SRAM | 28 nm SRAM | 28 nm SRAM | 28 nm SRAM | SOT-MRAM |
Supply voltage (V) | 0.6–1.1 | 0.6–1.1 | 0.76–1.1 | 0.5–0.9 | 0.9–1.32 |
Supported floating-point format | FP8 | FP8 | BF16 | FP32 | FP64 |
Macro size (mm²) | 2.55 (Chip) | 2.7 (Die) | 5.832 (Die) | 0.0439 | 0.051814 |
Frequency (MHz) | 475 | 114 | 250 | 400 | 150 |
GOPS/W/FP64 | 8.59 | 4.86 | 47.5 | 24.75 | 26.9 |
GOPS/mm²/FP64 | 0.354 | 0.426 | 1.28 | 1.21 | 0.322 |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Liu, L.; Tan, L.; Gan, J.; Pan, B.; Zhou, J.; Li, Z. MDCIM: MRAM-Based Digital Computing-in-Memory Macro for Floating-Point Computation with High Energy Efficiency and Low Area Overhead. Appl. Sci. 2023, 13, 11914. https://doi.org/10.3390/app132111914