A Low-Power Area-Efficient Precision Scalable Multiplier with an Input Vector Systolic Structure
Abstract
1. Introduction
- A novel input vector systolic (IVS) structure for a 64-bit integer multiplier was designed using a modular design method. During computation, input-A of each 16-bit multiplier is stationary, while input-B is the systolic input. Because the input data move as in a systolic array, the structure is named the “input vector systolic” architecture. The comparison results show that the IVS multiplier reduces area by at least 61.9% and power by at least 45.18% relative to its counterparts.
- For the inner 16-bit sub-multiplier, a transverse carry array (TCA) structure for partial products accumulation (PPA) was designed to increase the utilization of hardware resources. The original full 32-bit adders were replaced with separate 3/17-bit adders to reduce the adder area and eliminate redundant bits. An FPGA experiment showed a 39.8% reduction in the number of LUTs.
- The precision scalability of the IVS multiplier is discussed; it supports sixteen combinations of [16b, 32b, 48b, 64b] × [16b, 32b, 48b, 64b] multiplications.
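The decomposition behind the first and third contributions can be sketched as a behavioural model. This is only an illustration under stated assumptions, not the authors' hardware dataflow: operands are treated as unsigned, one 16-bit limb of input-B is assumed to enter per clock, and the names `split_limbs` and `ivs_multiply` are hypothetical.

```python
LIMB_BITS = 16
LIMB_MASK = (1 << LIMB_BITS) - 1


def split_limbs(value, n_limbs):
    """Split an unsigned integer into n_limbs 16-bit limbs, least significant first."""
    return [(value >> (LIMB_BITS * i)) & LIMB_MASK for i in range(n_limbs)]


def ivs_multiply(a, b, a_bits=64, b_bits=64):
    """Behavioural model of an unsigned a_bits x b_bits multiply built from 16-bit products.

    Assumption: one 16-bit limb of B enters per clock, and every active
    16-bit multiplier multiplies it by its own stationary limb of A.
    Operands wider than the selected width are truncated.
    """
    if a_bits not in (16, 32, 48, 64) or b_bits not in (16, 32, 48, 64):
        raise ValueError("widths must be 16, 32, 48 or 64 bits")
    a_limbs = split_limbs(a, a_bits // LIMB_BITS)   # stationary operands
    b_limbs = split_limbs(b, b_bits // LIMB_BITS)   # systolic operands
    acc = 0
    clocks = 0
    for j, b_limb in enumerate(b_limbs):            # one B limb per clock
        for i, a_limb in enumerate(a_limbs):        # multipliers working in parallel
            acc += (a_limb * b_limb) << (LIMB_BITS * (i + j))
        clocks += 1
    return acc, clocks
```

Under these assumptions a 64b × 64b product takes four clocks, which agrees with the clock count reported for the IVS design in the comparison table, and choosing `a_bits` and `b_bits` independently from the four widths gives the sixteen supported combinations.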
2. Methods
2.1. The Design of the Proposed 64-Bit IVS Multiplier
2.1.1. Algorithm Description
2.1.2. The Iterative PPA Process
2.2. The Design of MUL16
2.2.1. Modified Radix-8 Booth Encoding
2.2.2. Transverse Carry Array Structure for PPA
2.3. Configurability of the IVS Multiplier
3. Results and Discussions
3.1. Comparison of MUL16
3.2. Comparison of the 64-Bit IVS Multiplier
3.2.1. Compared with the Traditional Multipliers
3.2.2. Comparison with Some Low-Power Multipliers in the Literature
4. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
IVS | Input Vector Systolic |
CNN | Convolutional Neural Network |
TCA | Transverse Carry Array |
PPA | Partial Products Accumulation |
BZ-FAD | Bypass Zero Feed A Directly |
PPs | Partial Products |
MP | Multi-Precision |
MUL16 | 16-bit Integer Multiplier |
CSA | Carry-Save Adder |
RCA | Ripple-Carry Adder |
IC | Integrated Circuits |
VCS | Verilog Compile Simulator |
STA | Static Timing Analysis |
References
- Dhem, J.F. Design of an Efficient Public-Key Cryptographic Library for RISC-Based Smart Cards. Ph.D. Thesis, UCL-Université Catholique de Louvain, Ottignies-Louvain-la-Neuve, Belgium, 1998.
- Newell, D.; Duffy, M. Review of Power Conversion and Energy Management for Low-Power, Low-Voltage Energy Harvesting Powered Wireless Sensors. IEEE Trans. Power Electron. 2019, 34, 9794–9805.
- Chaithra, T.; PradeepKumar, S.; Nandini, K.; Harshitha, S.; Ankitha, K. ASIC realization and performance evaluation of 64 × 64 bit high speed multiplier in CMOS 45 nm using Wallace Tree. In Proceedings of the 2017 2nd IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), Bangalore, India, 19–20 May 2017; pp. 1115–1119.
- Mottaghi, M.D.; Ali, A.K.; Massoud, P. BZ-FAD: A low-power low-area multiplier based on shift-and-add architecture. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2009, 17, 302–306.
- Zhang, X.; Boussaid, F.; Bermak, A. 32 Bit × 32 Bit Multiprecision Razor-Based Dynamic Voltage Scaling Multiplier With Operands Scheduler. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2014, 22, 759–770.
- You, H.; Hei, Y.; Yuan, J.; Tang, W.; Bai, X.; Qiao, S. Design of low-power low-area asynchronous iterative multiplier. IEICE Electron. Express 2019, 16, 20190212.
- Nandan, D.; Kanungo, J.; Mahajan, A. An efficient VLSI architecture for iterative logarithmic multiplier. In Proceedings of the 2017 4th International Conference on Signal Processing and Integrated Networks (SPIN), Noida, India, 2–3 February 2017; pp. 419–423.
- Lee, C.Y.; Horng, J.S.; Jou, C.I.; Lu, E. Low-complexity bit-parallel systolic Montgomery multipliers for special classes of GF(2^m). IEEE Trans. Comput. 2005, 54, 1061–1070.
- Yong, K.M.; Hussin, R.; Kamarudin, A.; Ismail, R.C.; Isa, M.N.M.; Naziri, S.Z.M. Design and Analysis of 32-Bit Signed and Unsigned Multiplier Using Booth, Vedic and Wallace Architecture. J. Phys. Conf. Ser. 2021, 1755, 012008.
- Booth, A.D. A signed binary multiplication technique. Q. J. Mech. Appl. Math. 1951, 4, 236.
- Chang, W.Y.; Jen, C.W. High-speed Booth encoded parallel multiplier design. IEEE Trans. Comput. 2000, 49, 692–701.
- Huang, Z.; Ercegovac, M. High-performance low-power left-to-right array multiplier design. IEEE Trans. Comput. 2005, 54, 272–283.
- Wey, C.; Li, J. Design of reconfigurable array multipliers and multiplier-accumulators. In Proceedings of the 2004 IEEE Asia-Pacific Conference on Circuits and Systems, Tainan, Taiwan, 6–9 December 2004; Volume 1, pp. 37–40.
- Praveen Kumar, M.; Sivanantham, S.; Balamurugan, S.; Mallick, P. Low power reconfigurable multiplier with reordering of partial products. In Proceedings of the 2011 International Conference on Signal Processing, Communication, Computing and Networking Technologies, Thuckalay, India, 21–22 July 2011; pp. 532–536.
- Tu, J.; Van, L. Power-efficient pipelined reconfigurable fixed-width Baugh-Wooley multipliers. IEEE Trans. Comput. 2009, 58, 1346–1355.
- Kuang, S.; Wang, J. Design of Power-Efficient Configurable Booth Multiplier. IEEE Trans. Circuits Syst. I Regul. Pap. 2010, 57, 568–580.
- Sjalander, M.; Drazdziulis, M.; Larsson-Edefors, P.; Eriksson, H. A low-leakage twin-precision multiplier using reconfigurable power gating. In Proceedings of the 2005 IEEE International Symposium on Circuits and Systems (ISCAS), Kobe, Japan, 23–26 May 2005; Volume 2, pp. 1654–1657.
- Li, K.; Mao, W.; Zhou, J.; Li, B.; Yang, Z.; Yang, S.; Du, L.; Huang, S.; Yu, H. A Vector Systolic Accelerator for Multi-Precision Floating-Point High-Performance Computing. IEEE Trans. Circuits Syst. II Express Briefs 2022, 99, 1.
- Christoph, N.; Michael, M.; Frank, K. Evaluation of the back-end design overhead for ASIC implementations of large-operand multipliers targeting resource-constrained environments. In Proceedings of the 22nd Austrian Workshop on Microelectronics (Austrochip), Graz, Austria, 9 October 2014; pp. 1–6.
- Fried, R. Minimizing energy dissipation in high-speed multipliers. In Proceedings of the 1997 International Symposium on Low Power Electronics and Design, Monterey, CA, USA, 18–20 August 1997; p. 214.
- Pallavi, C.; Rajani, C. Comparative Analysis of 16-Bit Booth Multiplier Using Radix-4 and Radix-8 Encoding Technique. Int. J. Adv. Sci. Technol. (IJAST) 2020, 29, 62–75.
- Mikhaylov, K.; Tervonen, J. Experimental Evaluation of Alkaline Batteries’s Capacity for Low Power Consuming Applications. In Proceedings of the 2012 IEEE 26th International Conference on Advanced Information Networking and Applications, Fukuoka, Japan, 26–29 March 2012; pp. 331–337.
- Lin, Y.W.; Liu, H.Y.; Lee, C.Y. A dynamic scaling FFT processor for DVB-T applications. IEEE J. Solid-State Circuits 2004, 39, 2005–2013.
- Chen, Y.; Lin, Y.; Tsao, Y.; Lee, C. A 2.4-Gsample/s DVFS FFT processor for MIMO OFDM communication systems. IEEE J. Solid-State Circuits 2008, 43, 1260–1273.
- Chang, Y.J.; Cheng, Y.C.; Liao, S.C.; Hsiao, C.H. A low power radix-4 Booth multiplier with pre-encoded mechanism. IEEE Access 2020, 8, 114842–114853.
- Kim, H.; Kim, M.S.; Barrio, A.A.D.; Bagherzadeh, N. A Cost-Efficient Iterative Truncated Logarithmic Multiplication for Convolutional Neural Networks. In Proceedings of the 2019 IEEE 26th Symposium on Computer Arithmetic (ARITH), Kyoto, Japan, 10–12 June 2019; pp. 108–111.
Group Bits | Multiple of a | Negate | Group Bits | Multiple of a | Negate |
---|---|---|---|---|---|
0000 | 0 | 0 | 1000 | 4a | 1 |
0001 | a | 0 | 1001 | 3a | 1 |
0010 | a | 0 | 1010 | 3a | 1 |
0011 | 2a | 0 | 1011 | 2a | 1 |
0100 | 2a | 0 | 1100 | 2a | 1 |
0101 | 3a | 0 | 1101 | a | 1 |
0110 | 3a | 0 | 1110 | a | 1 |
0111 | 4a | 0 | 1111 | 0 | 0 |
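The encoding table above can be exercised with a short sketch. It assumes that the second column of each half gives the selected multiple of the multiplicand a and that the third column is a negate flag, which matches standard radix-8 Booth recoding; it does not attempt to reproduce whatever modification Section 2.2.1 applies. The names `BOOTH8`, `booth8_digits` and `booth8_multiply` are hypothetical.

```python
# Radix-8 Booth table as listed above: 4-bit group -> (multiple of a, negate flag).
BOOTH8 = {
    "0000": (0, 0), "0001": (1, 0), "0010": (1, 0), "0011": (2, 0),
    "0100": (2, 0), "0101": (3, 0), "0110": (3, 0), "0111": (4, 0),
    "1000": (4, 1), "1001": (3, 1), "1010": (3, 1), "1011": (2, 1),
    "1100": (2, 1), "1101": (1, 1), "1110": (1, 1), "1111": (0, 0),
}


def booth8_digits(b, width=16):
    """Recode a signed width-bit integer b into radix-8 Booth digits, least significant first."""
    n_groups = -(-width // 3)          # ceil(width / 3); 6 groups for 16 bits
    shifted = b << 1                   # appends the implicit bit b[-1] = 0
    digits = []
    for i in range(n_groups):
        # Overlapping group: bits b[3i+2], b[3i+1], b[3i], b[3i-1].
        group = (shifted >> (3 * i)) & 0xF
        multiple, negate = BOOTH8[format(group, "04b")]
        digits.append(-multiple if negate else multiple)
    return digits


def booth8_multiply(a, b, width=16):
    """Multiply a by a signed width-bit b by summing the Booth partial products."""
    return sum((d * a) << (3 * i) for i, d in enumerate(booth8_digits(b, width)))
```

For a 16-bit multiplier this yields six partial products, each a multiple of a in the range −4a to 4a. The 3a multiple is the only one that cannot be formed by shifting alone, which is the usual cost of radix-8 recoding.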
Adders | LUT | IO Buffer | Logic Delay (ns) | Route Delay (ns) | Power (W) |
---|---|---|---|---|---|
full-32-bit | 349 | 224 | 2.426 | 11.234 | 0.323 |
3/17-bit TCA | 210 | 148 | 2.62 | 14.384 | 0.319 |
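The trade-off in the adder table can be made explicit with a small calculation. This is only arithmetic on the values listed above; the helper name `relative_change` is hypothetical.

```python
# Values copied from the adder comparison table above.
full_32bit = {"LUT": 349, "IO Buffer": 224, "Logic Delay (ns)": 2.426,
              "Route Delay (ns)": 11.234, "Power (W)": 0.323}
tca_3_17bit = {"LUT": 210, "IO Buffer": 148, "Logic Delay (ns)": 2.62,
               "Route Delay (ns)": 14.384, "Power (W)": 0.319}


def relative_change(baseline, candidate):
    """Percentage change of each metric relative to the baseline (negative = reduction)."""
    return {k: round(100.0 * (candidate[k] - baseline[k]) / baseline[k], 1)
            for k in baseline}


changes = relative_change(full_32bit, tca_3_17bit)
for metric, pct in changes.items():
    print(f"{metric}: {pct:+.1f}%")
```

The LUT figure reproduces the 39.8% reduction quoted in the contributions, while the same rows show that the 3/17-bit TCA adders have longer logic and route delays than the full 32-bit adders.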
Types of MUL16 | Delay (ns) | Area (um2) | Power (uW) |
---|---|---|---|
Standard radix-8 with array-adder | 12.33 | 2530.28 | 85.4 |
Standard radix-8 with Wallace-tree | 9.22 | 2471.01 | 93.6 |
radix-8 with TCA | 12.99 | 2133.58 | 80.22 |
Multiplier | Delay (ns) | Area (um2) | Power (mW) | Number of Clocks | Energy (pJ/Mul.) |
---|---|---|---|---|---|
Booth multiplier | 13.36 | 55,007.52 | 0.448 | 1 | 34.49 |
Array multiplier | 49.21 | 28,914.95 | 0.155 | 1 | 11.96 |
Full-parallel multiplier [18] | 25.35 | 35,402.95 | 0.0903 | 1 | 6.953 |
IVS | 14.52 | 11,007.18 | 0.0495 | 4 | 15.58 |
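The headline savings quoted in the contributions (at least 61.9% area and 45.18% power) can be checked against this table. The sketch below assumes that "at least" means the smallest saving over the three baseline multipliers; the helper name `reduction` is hypothetical.

```python
# (area in um2, power in mW) copied from the 64-bit comparison table above.
designs = {
    "Booth multiplier": (55007.52, 0.448),
    "Array multiplier": (28914.95, 0.155),
    "Full-parallel multiplier [18]": (35402.95, 0.0903),
}
ivs_area, ivs_power = 11007.18, 0.0495


def reduction(baseline, value):
    """Percentage by which value is smaller than baseline."""
    return 100.0 * (baseline - value) / baseline


area_savings = {name: reduction(area, ivs_area) for name, (area, _) in designs.items()}
power_savings = {name: reduction(power, ivs_power) for name, (_, power) in designs.items()}

min_area_saving = min(area_savings.values())    # smallest saving across the baselines
min_power_saving = min(power_savings.values())
print(f"area: at least {min_area_saving:.2f}%, power: at least {min_power_saving:.2f}%")
```

The smallest area saving comes from the array multiplier and the smallest power saving from the full-parallel multiplier [18]. The energy column tells a different story: because the IVS design needs four clocks per multiplication, its 15.58 pJ per multiplication is higher than that of the array and full-parallel designs.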
Design | [3] | [25] | [4] | [26] | [6] | This Work |
---|---|---|---|---|---|---|
Tech. | 45 nm | 40 nm | 130 nm | 32 nm | 55 nm | 40 nm |
Feature | tree-based | array-based | shift-add | logar. | asyn. | IVS |
Type | paral. | paral. | itera. | itera. | itera. | paral.-itera. |
Width | 64b | 16b | 16b | 32b | 16b | 64b |
Area (um2) | 77,841 | 6436(tr) | 3903 | 3102 | 1618 | 11,007 |
Energy (pJ/Mul.) | 26.9 | 1.08 | 48.5 | 13.7 | 3.7 | 15.58 |
Norm. energy | 2.11 | 1.53 | 21.148 | 6.06 | 2.59 | 1 |
Norm. area | 5.38 | N.A | 0.54 | 1.42 | 1.243 | 1 |
© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Tang, X.; Li, Y.; Lin, C.; Shang, D. A Low-Power Area-Efficient Precision Scalable Multiplier with an Input Vector Systolic Structure. Electronics 2022, 11, 2685. https://doi.org/10.3390/electronics11172685