High-Speed Grouping and Decomposition Multiplier for Binary Multiplication

Padmanabhan, Khamalesh Kumar; Seerengasamy, Umadevi; Ponraj, Abraham Sudharson

doi:10.3390/electronics11244202

by Khamalesh Kumar Padmanabhan ¹, Umadevi Seerengasamy ²,* and Abraham Sudharson Ponraj ¹

¹ School of Electronics Engineering, Chennai Campus, VIT University, Chennai 600127, India

² Centre for Nanoelectronics and VLSI Design, School of Electronics Engineering, Chennai Campus, VIT University, Chennai 600127, India

* Author to whom correspondence should be addressed.

Electronics 2022, 11(24), 4202; https://doi.org/10.3390/electronics11244202

Submission received: 29 September 2022 / Revised: 7 December 2022 / Accepted: 8 December 2022 / Published: 16 December 2022

(This article belongs to the Special Issue VLSI Design, Testing, and Applications)

Abstract

In the computation systems that are frequently utilized in Digital Signal Processing (DSP)- and Fast Fourier Transform (FFT)-based applications, binary multipliers play a crucial role. Multipliers are one of the basic arithmetic components used, and they demand considerable hardware resources and computational time. Due to this, numerous studies have been performed to decrease the computational time and hardware requirements. In this research study on reducing the necessary computational time, a high-speed binary multiplier known as the Grouping and Decomposition (GD) multiplier is proposed. The proposed multiplier aims to outperform existing multiplier architectures through a combination of the parallel grouping of partial products of the same size and the decomposition of each grouped partial-product bit, with the final summation performed using a 5:2 logic adder (5LA). The use of parallel processing and decomposition logic reduces the number of computation steps and hence achieves a higher speed in multiplication. The front-end and physical design implementation of the proposed GD multiplier have been executed in the 180 nm technology library using the Cadence® Virtuoso and Cadence® Virtuoso Assura tools. From the front-end design of the proposed 8 × 8 GD multiplier, it was observed that the GD multiplier achieves a reduction of approximately 56% in computation time and a reduction of 53% in power–delay product when compared to existing multiplier architectures. A further reduction in the power–delay product is achieved by the physical design implementation of the proposed multiplier due to the internal routing of subsystems with the shortest-path algorithm. The proposed multiplier works better with higher-order multiplication and is suitable for high-end applications.

Keywords: digital signal processing; fast Fourier transform; grouping and decomposition multiplier; 5:2 logic adder

1. Introduction

Applications for multimedia, image processing, and the Internet of Things not only require extensive computing, but they also require a quick response while using minimal power. Digital logic circuits are the backbone of most computer arithmetic applications, making them highly reliable and accurate. The multiplier [1,2] is one of the most important arithmetic building blocks, and it is frequently employed in many applications, particularly those related to signal processing. Fast multipliers [3] come in a variety of forms, and each form has its advantages and disadvantages. Most researchers are currently concentrating mainly on improving performance metrics such as power, area consumption, and computational time [4]. The multipliers have sequential and parallel architectures as their two main design options. Sequential designs have a minimal power requirement, but a relatively high delay. Conversely, Wallace tree and Dadda are fast parallel designs with significant levels of power consumption [5,6].

Because both power consumption and processing speed are essential considerations in the layout of digital circuits, optimizing these aspects of multipliers is of the utmost significance. It is a quite common practice to optimize one parameter while simultaneously taking into account a constraint imposed on another parameter. Specifically, the task is difficult because of the minimal power capacity of portable devices, which must be considered when trying to achieve optimum performance. Having a certain level of reliability may also hinder the system’s desired level of performance. There are numerous techniques at various design abstraction levels that can be used to achieve both the power and speed requirements. A common type of multiplier is an array multiplier, which implements multiplication as a shift-and-add operation. However, because it performs more computations, it uses more energy, takes up more space, and takes more computational time [7]. The Vedic multiplier is the main focus of current research since it has advantages over array multipliers, such as having a high speed and requiring less area [8,9,10]. Inexact computing techniques seek to meet objective parameters while sacrificing computation accuracy [11]. The basic foundation for inexact arithmetic computation is the minimization of the arithmetic unit circuits [12]. The strategy can be employed when there is no unique solution and/or a set of near-accurate results is acceptable [13].

The 16-bit Dadda multiplier has been approached in a novel way, employing new compressor designs, by Sebastian, Alen, et al. These brand-new compressors include two 4–2 compressors [14] along with an updated higher-order compressor. Three proposed multiplier designs were compared with existing multiplier designs. In terms of delay and area, the proposed design was determined to be closer to ideal than the previous designs, and it can be applied to exact multiplier applications. This approach produces a minimum delay of 20.612 ns and an area of 84 slices. However, the power reduction was not considered.

A unique Wallace tree multiplier structure designed by Devi Ykuntam, et al. has been suggested as a method for decreasing the delay time of the Wallace tree multiplier without compromising its area parameter. In the suggested arrangement, parallel prefix adders (PPAs) [15] complete the addition of the partial products. Using the Kogge–Stone adder, Sklansky adder, Brent–Kung adder, Ladner–Fischer adder, and Han–Carlson adder, five Wallace tree multiplier architectures were presented. The areas (number of LUTs) and delays (ns) of the suggested multiplier designs were compared with those of the conventional multiplier design. Among them, a 16-bit Wallace tree multiplier using the Kogge–Stone adder had a delay of 29.44 ns and an area of 634 LUTs.

Approximation methods [16] have the advantage of reducing device usage for high-computation applications, which eventually results in a reduction in both power consumption and delay. Both the redundant bits and the amount of storage required to keep the data are reduced as a result. The computational complexity of arithmetic circuits can be reduced through approximation without significantly affecting the coding efficiency. The suggested approximate Dadda multiplier, based on an almost-full adder, increases speed and requires less energy and fewer devices. That work analyzes several multipliers using an approximation of a nearly full adder. An 8-bit Dadda multiplier with an almost-full adder consumes 11.409 W of power and 0.20 LUTs of area.

In order to reduce multiplier power consumption, a redesigned full adder [17] with a multiplexer was developed by Jaiswal, Kokila Bharti, et al. The traditional Wallace tree multiplier design was used to evaluate the effectiveness of the suggested structure. Comparing the suggested multiplier to a traditional structure, the ASIC synthesis results revealed an average reduction of 37.45% in power requirements, 45.75% in area, and 17.65% in delay. Using the proposed full adder to make a 16-bit Wallace multiplier takes 6.5534 mW of power and 8.81 ns of time.

Wallace and Dadda multipliers [18] were created using a brand-new hybrid 3–2 counter by Devnath, et al. The suggested method used several AND gates to make partial products, and a good model of an AND gate only needs two transistors. The multipliers were developed using the 65 nm Predictive Technology Model (PTM) transistor model. A thorough evaluation and comparison of both multipliers’ system results with other models already in use were conducted. The delay of a 4-bit Dadda multiplier with a hybrid full adder was 220.9 ps, and the power consumption was 20.34 μW.

Ram, et al. examined the Wallace multiplier with the conventional array multiplier and the Dadda multiplier regarding the delay. Additionally, the Carry-select adder (CSLA) [19] and binary to excess-1 converter (BEC) adders were used to construct the 16-bit Wallace multiplier that was suggested. The Wallace multiplier with a CSLA requires less computational time than the Wallace multiplier with a BEC. The 16 × 16-bit Wallace multiplier for the binary to excess code converter has a delay of 24.948 ns and consumes 86.48 mW of power.

Momeni, Amir, et al. examined and designed two new approximate 4–2 compressors [20] to employ them in a multiplier. These designs utilize various compression aspects, and as a result, the inaccuracy in computation can be accommodated with regard to the circuit-based figures of merit of a design. They suggested and examined four distinct approaches to implementing the approximation compressors with a Dadda multiplier. The delay of an 8-bit Dadda multiplier with an approximate 4–2 compressor is 44.35 ps, and the power consumption is 1.14 μW.

Akbari, Omid, et al. proposed four 4:2 compressors that can toggle between accurate and approximate modes of operation [21]. These dual-quality compressors offer increased speed and decreased power consumption in the approximate mode at the expense of increased error. In addition to differing in delay and power consumption between the approximate and accurate modes, each of these compressors also differs in how accurate it is when operating in the approximate mode. A 16-bit Dadda multiplier with dual-quality 4:2 mixed compressors has a delay of 1.19 ns, and the power consumption is 2339 μW.

The performance parameters of the existing binary multipliers based on the Wallace/Dadda/parallel processing algorithms are listed in Table 1. All of the existing architectures have either a lower speed or a higher power consumption, and hence, a higher power–delay product. The present research aimed to propose an architecture for binary multiplication that achieves an improved power–delay product without significantly affecting the other parameters. This has been achieved by grouping partial products of the same size and decomposing each grouped partial product in parallel. The parallel processing and decomposition logic introduced in this novel architecture reduces the number of computation steps and hence achieves a high computation speed. The GD multiplier uses conventional 4 × 4 Wallace and Dadda binary multipliers for the decomposition logic.

2. Methodology

2.1. Conventional Wallace Multiplier (4 × 4)

The Wallace multiplier is a binary multiplier in which the final result is obtained by reducing the partial products [22,23] produced by the partial product generator (shown in Figure 1) using single-bit full adders [2,24] and half adders, as shown in Figure 2a,b, respectively.

After generating the partial products, each column is considered in groups of at most h = 3 entries, and the reduction of layers [25] takes place. The schematic design of the conventional 4 × 4 Wallace multiplier is shown in Figure 3. Depending on the number of entries in the column, either single-bit half adders [26] or full adders are used to process the layers, i.e., a single entry in a column is passed on to the next layer unchanged, whereas two or three entries in a column are passed through a half adder or a full adder, respectively. The sum and carry generated in one layer are passed on to the next layer as entries. This process is repeated until it reaches the terminal layer, which consists of at most a pair of entries in each column.
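The layer-by-layer reduction described above can be sketched as a small behavioral model in Python. This is only an illustration, not the authors' transistor-level design: the function names (`half_adder`, `full_adder`, `partial_product_columns`, `wallace_reduce`, `wallace_multiply`) are hypothetical, and the final two-row addition is replaced by a plain weighted sum.

```python
def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b


def full_adder(a, b, c):
    """Return (sum, carry) for three input bits."""
    return a ^ b ^ c, (a & b) | (b & c) | (a & c)


def partial_product_columns(x, y, n=4):
    """AND every bit of x with every bit of y; group the results by bit weight."""
    cols = [[] for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            cols[i + j].append(((x >> i) & 1) & ((y >> j) & 1))
    return cols


def wallace_reduce(cols):
    """Reduce all columns layer by layer until no column holds more than two bits."""
    layers = 0
    while max(len(col) for col in cols) > 2:
        nxt = [[] for _ in range(len(cols) + 1)]
        for w, col in enumerate(cols):
            k = 0
            while len(col) - k >= 3:          # three entries -> full adder
                s, c = full_adder(*col[k:k + 3])
                nxt[w].append(s)
                nxt[w + 1].append(c)
                k += 3
            if len(col) - k == 2:             # two entries -> half adder
                s, c = half_adder(*col[k:k + 2])
                nxt[w].append(s)
                nxt[w + 1].append(c)
            elif len(col) - k == 1:           # single entry -> passed on unchanged
                nxt[w].append(col[k])
        cols = nxt
        layers += 1
    return cols, layers


def wallace_multiply(x, y, n=4):
    """Return (product, number of reduction layers) for two n-bit operands."""
    cols, layers = wallace_reduce(partial_product_columns(x, y, n))
    # Stand-in for the final carry-propagate adder: weighted sum of the remaining bits.
    product = sum(sum(col) << w for w, col in enumerate(cols))
    return product, layers
```

Because every half adder and full adder preserves the weighted sum of its inputs, the model returns the same product as ordinary integer multiplication for any pair of 4-bit operands.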

In the Wallace multiplier, the computational time is reduced due to the parallel reduction of bits in layers by the single-bit half adders and full adders, but the number of logic levels (i.e., layers) required to perform the summation can be reduced further. The physical design implementation of the Wallace multiplier becomes complex for higher-order multiplication.

2.2. Conventional Dadda Multiplier (4 × 4)

The Dadda multiplier is a binary multiplier design invented by Luigi Dadda. This multiplier design uses single-bit full adders and half adders, as shown in Figure 2a,b, to process the partial products and give the final product.

Dadda multiplication starts by arranging the partial products into a tree structure whose column heights vary with the number of partial products. The progression of the reduction layers is controlled by the maximum height sequence d_j, where d_j = 2, 3, 4, 6, 9, 13, etc. The value of d_j is chosen as the largest term of the sequence that satisfies the condition d_j < min(n, m), where n and m are the sizes of the multiplier and multiplicand. The 4 × 4 schematic of the Dadda multiplier is shown in Figure 4. Here, n and m are both 4, so min(n, m) = 4 and the maximum height is determined to be 3 (since d_j = 3 < 4). With d_j determined, the reduction of layers proceeds. Tree columns formed from the partial products are reduced if h > 3. Summation is achieved by feeding the partial products as inputs to half and full adders. The reduction layers are applied progressively, i.e., h = 4 → 3 → 2, until h = 2, at which point the final product of the Dadda multiplier is produced.
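The choice of the maximum height sequence can be illustrated with a short Python sketch. It assumes the standard Dadda recurrence, in which each term is the floor of 1.5 times the previous one starting from 2, which reproduces the sequence 2, 3, 4, 6, 9, 13 quoted above. The function name `dadda_heights` is hypothetical, and the sketch assumes min(n, m) is at least 3.

```python
def dadda_heights(n, m):
    """Return the target column heights for an n x m Dadda multiplier, largest first.

    Only terms of the sequence 2, 3, 4, 6, 9, 13, ... that are strictly
    smaller than min(n, m) are kept, matching the condition d_j < min(n, m).
    Assumes min(n, m) >= 3.
    """
    limit = min(n, m)
    seq = [2]
    while True:
        nxt = (3 * seq[-1]) // 2   # floor(1.5 * previous term)
        if nxt >= limit:
            break
        seq.append(nxt)
    return seq[::-1]


print(dadda_heights(4, 4))  # [3, 2]
print(dadda_heights(8, 8))  # [6, 4, 3, 2]
```

For the 4 × 4 case, the result corresponds to the reduction h = 4 → 3 → 2 described above.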

The Dadda multiplier consumes less power because it requires a minimum amount of hardware. The main drawbacks of the Dadda multiplier are that it becomes more complex to construct and to compute for a higher number of bits, and that it performs only the minimum necessary reduction at the early stages, which results in numerous bits being passed on to the next stage.

2.3. Proposed GD Multiplier (8 × 8)

The proposed GD multiplier is a high-speed multiplier with a parallel multiplication and grouping technique, which is shown in Figure 5. The GD multiplier follows the steps below for 8 × 8 binary multiplication:

Partial products generation.
Parallel grouping of partial products and decomposition of grouped bits using a 4 × 4 Wallace and a 4 × 4 Dadda multiplier.
Final summations of bits performed using 5LA.

After the generation of the partial products, they are divided into groups of equal size. The groups are processed alternately through the Dadda and the Wallace multipliers; this process is called “decomposition”. It uses both Wallace and Dadda multipliers in parallel in order to reduce the computational time incurred during the summation of partial products. The outputs from each group are then passed on to the 5LA [27,28,29], shown in Figure 6, and the half adders to obtain the final results. The 5LA is a combination of two full adders that minimizes carry-over steps. The GD multiplier is a hybrid multiplier incorporating both Wallace and Dadda, and it aims to combine the unique advantages of both. The overall working principle of the GD multiplier is shown in Figure 7, with a test input of 8 × 8.
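The grouping and decomposition idea can be illustrated arithmetically with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: each of the four groups is modeled as a 4 × 4 product of operand nibbles, the mapping of groups to nibbles is a guess, the 4 × 4 Wallace and Dadda blocks are replaced by a single stand-in function `mul4x4` (both produce the same 8-bit product), and the 5LA/half-adder stage is replaced by ordinary integer addition. The names `mul4x4` and `gd_multiply_8x8` are hypothetical.

```python
def mul4x4(a, b):
    """Stand-in for a 4 x 4 Wallace or Dadda block; returns the 8-bit product."""
    assert 0 <= a < 16 and 0 <= b < 16
    return a * b


def gd_multiply_8x8(x, y):
    """Multiply two 8-bit operands via four independent 4 x 4 group products."""
    x_lo, x_hi = x & 0xF, x >> 4
    y_lo, y_hi = y & 0xF, y >> 4

    # The four group products do not depend on each other,
    # so in hardware they can be evaluated in parallel.
    g1 = mul4x4(x_lo, y_lo)   # weight 2^0
    g2 = mul4x4(x_hi, y_lo)   # weight 2^4
    g3 = mul4x4(x_lo, y_hi)   # weight 2^4
    g4 = mul4x4(x_hi, y_hi)   # weight 2^8

    # Stand-in for the final summation stage (5LA and half adders in the paper).
    return g1 + ((g2 + g3) << 4) + (g4 << 8)


print(gd_multiply_8x8(0b10110110, 0b10011100))  # 28392
```

The sketch only shows why four equally sized groups suffice for an 8 × 8 product. It says nothing about the delay or power of the actual circuit.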

2.4. Back-End Design Implementation of Proposed and Existing Multiplier Design

The physical design implementation of both the existing and the proposed multiplier designs was carried out according to the conventional process flow. After the schematic design of the multiplier, the physical design implementation was carried out using the Cadence® Virtuoso Layout Suite XL tool with a fixed place and route boundary. The interconnection of the transistors was achieved using various metal layers by following the rule deck provided by the vendor. The Design Rule Check (DRC) and the Layout Versus Schematic (LVS) physical verifications of the back-end design were performed using the Cadence® Assura tool. The width and spacing rules of the design were checked through the DRC, and the open and short errors were checked through LVS. The violations were fixed manually to improve the quality of the layout. Parasitic values were extracted from the layout and used in the post-layout simulation of the design.

3. Results and Discussion

The proposed GD multiplier architecture has the advantage of performing higher-order bit multiplication with less computational time than the existing multiplier architectures. This has been achieved in the proposed multiplier by grouping the partial products equally and applying the decomposition procedure along with parallel computation. The proposed architecture has been implemented in the 180 nm technology library. The front-end design of the circuit was performed using Cadence® Virtuoso, the physical design of the circuit was performed using the Cadence® Layout Editor, and physical verification of the layout was carried out using the Cadence® Assura tool. Wallace and Dadda are the two main multiplier architectures against which the proposed multiplier architecture is compared. Both Wallace and Dadda use parallel multiplication, whereas the proposed multiplier applies parallel computation to grouped partial products.

3.1. Proposed GD Multiplier Implementation

The proposed GD multiplier architecture produces the same product result as the Wallace or Dadda multiplier for any set of binary input patterns chosen. The GD multiplier product results obtained for the sample input patterns, x = 10110110, y = 10011100 and x = 11110000, y = 00001111, are shown in Figure 8. The logic high value is represented by 1.8 V, and the logic low value is represented by 0 V. The time duration for each bit is 50 ns.
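As a quick cross-check of the waveform, the expected 16-bit products for the two sample input patterns can be computed directly. The snippet below is illustrative only, and its variable names are not from the paper.

```python
# Sample input patterns (x, y) as 8-bit binary strings.
samples = [("10110110", "10011100"), ("11110000", "00001111")]

expected = {}
for x_bits, y_bits in samples:
    product = int(x_bits, 2) * int(y_bits, 2)
    # Format the product as a 16-bit binary string, MSB first.
    expected[(x_bits, y_bits)] = format(product, "016b")

print(expected[("10110110", "10011100")])  # 0110111011101000 (182 x 156 = 28392)
print(expected[("11110000", "00001111")])  # 0000111000010000 (240 x 15 = 3600)
```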

3.2. Front-End Design Performance Comparison of the Proposed GD Multiplier Architecture with Existing Multiplier Architectures

Table 2 shows the comparative analysis of the front-end designs of the proposed 4 × 4 GD multiplier against those of existing architectures in terms of delay and power consumption. From Table 2, it can be inferred that the GD multiplier achieves a 29.93% reduction in computational time over the existing multiplier architectures without compromising power consumption. This has been achieved through parallel computation, which is introduced during the decomposition of the groups of partial products. In the GD multiplier, full adder logic is reduced, and half adder logic is increased to balance the power consumption. The proposed GD multiplier architecture has a power–delay product that is 33.18% less than that of the Wallace multiplier and 25.91% less than that of the Dadda multiplier.
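For reference, the power–delay product figures quoted here are assumed to follow the usual definition, i.e., average power multiplied by delay, with the percentage reduction taken relative to the reference (Wallace or Dadda) design. The symbols P_avg, t_d, PDP_ref, and PDP_GD are notation introduced for this note and are not taken from the tables.

```latex
\mathrm{PDP} = P_{\mathrm{avg}} \times t_{d},
\qquad
\text{PDP reduction (\%)} =
\frac{\mathrm{PDP}_{\mathrm{ref}} - \mathrm{PDP}_{\mathrm{GD}}}{\mathrm{PDP}_{\mathrm{ref}}} \times 100
```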

Table 3 shows the comparison of the front-end design parameters of the GD multiplier (8 × 8) against those of the existing architectures. When compared with the existing multiplier designs, the GD multiplier shows a 59.19% and a 56.29% decrease in computational time with respect to the Wallace and Dadda multipliers, respectively. The power consumption of the proposed multiplier is higher than that of the existing multipliers. The proposed multiplier consumes 13.76% and 13.62% more power compared to the Dadda and Wallace multipliers, respectively. Although there is a tolerable increase in the power consumption of the proposed multiplier, it is compensated for by the lower power–delay product of the GD multiplier relative to the existing architectures. PDP reductions of 56.52% and 53.491% are achieved in the GD multiplier with respect to the Wallace and Dadda multipliers, respectively.

The bar graph in Figure 9 shows the percentage improvement of the proposed GD multiplier with respect to the existing multipliers in terms of power–delay product (PDP) and computational time. It can be inferred that with higher-order multiplication, the proposed multiplier shows a greater improvement in terms of PDP and computational time. From the graph, one can see that the 8 × 8 architecture has an improvement of approximately 30% in performance parameters over the 4 × 4 version of the proposed architecture.

3.3. Physical Design Implementation and Comparison Results against the Existing Multipliers

The physical design implementation of the 4 × 4 Wallace multiplier is shown in Figure 10. This is achieved through the interconnection of the schematic-driven layout of a two-input AND gate, a half adder, and a full adder cell, as per the front-end design.

The physical design implementation of the 4 × 4 Dadda multiplier is shown in Figure 11. This is achieved through the interconnection of the schematic-driven layout of a two-input AND gate, a half adder, and a full adder cell, as per the front-end design.

The physical design implementation of the 8 × 8 GD multiplier is shown in Figure 12. This is achieved through the interconnection of the schematic-driven layout of 4 × 4 Wallace and Dadda multipliers, a half adder, and a full adder cell, as per the front-end design.

Table 4 shows the back-end design comparison between the proposed GD multiplier and the existing Wallace and Dadda multipliers. The back-end design shows that the proposed GD multiplier achieves a computational time reduction of 43.85% compared to the Dadda multiplier and 47.52% compared to the Wallace multiplier. The results also show that the post-layout designs of both the proposed and existing architectures consume approximately 1.5% more power than the corresponding front-end designs, which is due to the routing of interconnecting nets. The power–delay product of the proposed GD multiplier shows a 66.7% and a 65.07% improvement with respect to the Wallace and Dadda multipliers, respectively. The Dadda multiplier consumes less silicon area than the other two multipliers because it requires fewer single-bit adders in the design. The GD multiplier consumes approximately 15% more silicon area than the other two multipliers. This is caused by the greater amount of logic required by the grouping and decomposition stage. However, if the order of the multiplication increases, the other two multipliers may consume more silicon area than the proposed architecture since a greater amount of layer reduction is required, whereas the proposed architecture always has a fixed amount of layer reduction, irrespective of the order of the multiplication.

4. Conclusions

This research work proposes a high-speed binary multiplier that uses a grouping and decomposition algorithm. The proposed multiplier was implemented in a 180 nm CMOS process. The comparative analysis of the front-end architecture of the proposed multiplier shows that it requires 56.29% less computational time and has a 53.49% lower power–delay product than the existing parallel architectures. Using the grouping and decomposition algorithm, the proposed multiplier can execute the reduction of bits in parallel, regardless of the total number of partial products. The physical design implementation of the proposed GD multiplier shows a reduction of 47.52% and 43.85% in computational time when compared to the Wallace and Dadda multipliers, respectively. A comparative analysis of the proposed architecture with existing multipliers shows that the parallel operation of the grouping and decomposition algorithm makes the proposed multiplier outperform existing multipliers, such as Wallace and Dadda. Therefore, it is preferable for high-speed VLSI applications.

Author Contributions

Conceptualization and methodology: K.K.P. and U.S.; software: K.K.P.; validation: U.S.; formal analysis: K.K.P. and U.S.; investigation: K.K.P., U.S. and A.S.P.; resources: U.S.; data curation: K.K.P. and U.S.; writing—original draft preparation: K.K.P., U.S. and A.S.P.; writing—review and editing: K.K.P., U.S. and A.S.P.; visualization: K.K.P.; supervision: U.S.; project administration: K.K.P. and U.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Acknowledgments

The authors are grateful to the VIT management for providing the necessary facilities to carry out this research work.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Wallace, C.S. A suggestion for a fast multiplier. IEEE Trans. Electron. Comput. 1964, 1, 14–17.
2. Kulkarni, P.; Gupta, P.; Ercegovac, M. Trading accuracy for power with an underdesigned multiplier architecture. In Proceedings of the 2011 24th International Conference on VLSI Design, Chennai, India, 2–7 January 2011; pp. 346–351.
3. Habibi, A.; Wintz, P.A. Fast multipliers. IEEE Trans. Comput. 1970, 100, 153–157.
4. Bandi, V.L.; Gamini, P.; Harshith, B.S. Performance analysis of Dadda multiplier using modified full adder. Int. J. Innov. Res. Comput. Commun. Eng. 2018, 6, 126–130.
5. Baran, D.; Aktan, M.; Oklobdzija, V.G. Multiplier structures for low power applications in deep-CMOS. In Proceedings of the 2011 IEEE International Symposium of Circuits and Systems (ISCAS), Rio de Janeiro, Brazil, 15–18 May 2011; pp. 1061–1064.
6. Townsend, W.J.; Swartzlander, E.E., Jr.; Abraham, J.A. A comparison of Dadda and Wallace multiplier delays. In Advanced Signal Processing Algorithms, Architectures, and Implementations XIII; SPIE: San Diego, CA, USA, 2003; Volume 5205.
7. Weste, N.H.; Harris, D.M. Integrated Circuit Design; Pearson: Boston, MA, USA, 2010.
8. Maurya, K.A.; Lakshmanna, Y.R.; Sindhuri, K.B.; Kumar, N.U. Design and implementation of 32-bit adders using various full adders. In Proceedings of the 2017 Innovations in Power and Advanced Computing Technologies (i-PACT), Vellore, India, 21–22 April 2017.
9. Bandi, V. Performance Analysis for Vedic Multiplier using Modified Full Adders. In Proceedings of the 2017 Innovations in Power and Advanced Computing Technologies (i-PACT), Vellore, India, 21–22 April 2017.
10. Ram, G.C.; Lakshmanna, Y.R.; Rani, D.S.; Sindhuri, K.B. Area Efficient Modified Vedic Multiplier. In Proceedings of the 2016 International Conference on Circuit, Power and Computing Technologies, Nagercoil, India, 18–19 March 2016; pp. 1–5.
11. Akbari, O.; Kamal, M.; Afzali-Kusha, A.; Pedram, M. RAP-CLA: A reconfigurable approximate carry look-ahead adder. IEEE Trans. Circuits Syst. II Express Briefs 2016, 65, 1089–1093.
12. Raha, A.; Jayakumar, H.; Raghunathan, V. Input-based dynamic reconfiguration of approximate arithmetic units for video encoding. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2015, 24, 846–857.
13. Sampson, A.; Dietl, W.; Fortuna, E.; Gnanapragasam, D.; Ceze, L.; Grossman, D. EnerJ: Approximate data types for safe and general low-power computation. In Proceedings of the 32nd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI 11), New York, NY, USA, 4–8 June 2011; pp. 164–174.
14. Sebastian, A.; Jose, F.; Gopakumar, K.; Thiyagarajan, P. Design and Implementation of an Efficient Dadda Multiplier Using Novel Compressors and Fast Adder. In Proceedings of the 2020 International Symposium on Devices, Circuits and Systems (ISDCS), Howrah, India, 4–6 March 2020; IEEE: Piscataway, NJ, USA; pp. 1–4.
15. devi Ykuntam, Y.; Pavani, K.; Saladi, K. Design and analysis of High speed wallace tree multiplier using parallel prefix adders for VLSI circuit designs. In Proceedings of the 2020 11th International Conference on Computing, Communication and Networking Technologies (ICCCNT), Kharagpur, India, 1 July 2020; IEEE: Piscataway, NJ, USA; pp. 1–6.
16. Pathak, K.C.; Sarvaiya, J.N.; Darji, A.D.; Diwan, S.; Gangadwala, A.; Bhatt, Z.; Patel, A. An Efficient Dadda Multiplier using Approximate Adder. In Proceedings of the 2020 IEEE Region 10 Conference (TENCON), Osaka, Japan, 16–19 November 2020; IEEE: Piscataway, NJ, USA; pp. 176–181.
17. Jaiswal, K.B.; Kumar, N.; Seshadri, P.; Lakshminarayanan, G. Low power wallace tree multiplier using modified full adder. In Proceedings of the 2015 3rd International Conference on Signal Processing, Communication and Networking (ICSCN), Chennai, India, 26–28 March 2015; IEEE: Piscataway, NJ, USA; pp. 1–4.
18. Devnath, B.C.; Biswas, S.N.; Datta, M.R. 4-bit Wallace and Dadda Multiplier design using novel hybrid 3-2 Counter. In Proceedings of the 2020 2nd International Conference on Advanced Information and Communication Technology (ICAICT), Dhaka, Bangladesh, 28–29 November 2020; IEEE: Piscataway, NJ, USA; pp. 189–194.
19. Ram, G.C.; Rani, D.S.; Balasaikesava, R.; Sindhuri, K.B. Design of delay efficient modified 16 bit Wallace multiplier. In Proceedings of the 2016 IEEE International Conference on Recent Trends in Electronics, Information & Communication Technology (RTEICT), Bengaluru, India, 20–21 May 2016; IEEE: Piscataway, NJ, USA; pp. 1887–1891.
20. Momeni, A.; Han, J.; Montuschi, P.; Lombardi, F. Design and analysis of approximate compressors for multiplication. IEEE Trans. Comput. 2014, 64, 984–994.
21. Akbari, O.; Kamal, M.; Afzali-Kusha, A.; Pedram, M. Dual-quality 4:2 compressors for utilizing in dynamic accuracy configurable multipliers. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2017, 25, 1352–1361.
22. Elguibaly, F. A Fast Parallel Multiplier-Accumulator Using the Modified Booth Algorithm. IEEE Trans. Circuits Syst. 2000, 47, 902–908.
23. Wang, Z.; Jullien, G.A.; Miller, W.C. A new design technique for column compression multipliers. IEEE Trans. Comput. 1995, 44, 962–970.
24. Kakde, S.; Khan, S.; Dakhole, P.; Badwaik, S. Design of area and power aware reduced Complexity Wallace Tree multiplier. In Proceedings of the 2015 International Conference on Pervasive Computing (ICPC), Pune, India, 8–10 January 2015; IEEE: Piscataway, NJ, USA; pp. 1–6.
25. Waters, R.S.; Swartzlander, E.E. A reduced complexity wallace multiplier reduction. IEEE Trans. Comput. 2010, 59, 1134–1137.
26. Asif, S.; Kong, Y. Low-area Wallace multiplier. In VLSI Design; Hindawi Limited: London, UK, 12 May 2014; pp. 1–16.
27. Senapati, R.K.; Ravindra, J. Low-power near-explicit 5:2 compressor for superior performance multipliers. Int. J. Eng. 2018, 11, 529–545.
28. Priya, K.B.; Sudarmani, R. Performance analysis of Dadda multiplier using 5:2 compressor and its applications. Int. J. Adv. Inf. Sci. Technol. 2016, 5, 72–78.
29. Asif, S.; Kong, Y. Design of an algorithmic Wallace multiplier using high speed counters. In Proceedings of the 2015 Tenth International Conference on Computer Engineering & Systems (ICCES), Cairo, Egypt, 23–24 December 2015; IEEE: Piscataway, NJ, USA; pp. 133–138.

Figure 1. Schematic design of 4 × 4 partial product generator.

Figure 2. (a) Schematic design of static CMOS single-bit full adder; (b) Schematic design of static CMOS single-bit half adder.

Figure 3. Schematic design of 4 × 4 Wallace multiplier.

Figure 4. Schematic design of 4 × 4 Dadda multiplier.

Figure 5. Schematic design of 8 × 8 GD multiplier.

Figure 6. Schematic of 5LA.

Figure 7. Working principle of 8 × 8 GD multiplier: (a) Grouping of products (into 4 groups) performed in 8 × 8 GD multiplier; (b) Flow of 4 × 4 Wallace multiplier used during decomposition (for groups 2 and 3) and Flow of 4 × 4 Dadda multiplier used during decomposition (for groups 1 and 4); (c) Terminal-layer processing using 5LA.

Figure 8. Simulation waveform of GD multiplier (8 × 8).

Figure 9. Proposed multiplier vs. existing multiplier.

Figure 10. DRC- and LVS-clean physical design implementation of 4 × 4 Wallace multiplier.

Figure 11. DRC- and LVS-clean physical design implementation of 4 × 4 Dadda multiplier.

Figure 12. DRC- and LVS-clean physical design implementation of 8 × 8 GD (grouping and decomposition) multiplier.

Table 1. Comparison of parameters reported in existing literature.

No.	Title	Multiplier	Computational Time	Power Consumption/Area Consumption	Implementation Tool/Technology Node
1.	Design and Implementation of an Efficient Dadda Multiplier Using Novel Compressors and Fast Adder	16-bit Dadda multiplier with 4–2 compressors	20.612 ns	84 slices	Xilinx ISE
2.	Design and Analysis of High speed Wallace Tree Multiplier Using Parallel Prefix Adders for VLSI Circuit Designs	16-bit Wallace tree multiplier using Kogge–Stone adder	29.44 ns	634 LUT	Xilinx ISE
3.	An Efficient Dadda Multiplier using Approximate Adder	8-bit Dadda multiplier using almost full adder	NA	11.409 Watt/ 0.20 LUT	Xilinx ISE
4.	Low-Power Wallace Tree Multiplier Using Modified Full Adder	16-bit Wallace tree multiplier using full adder	8.81 ns	6.5534 mW/ 12,627.71 μm²	Synopsys design compiler using SAED 90 nm CMOS technology
5.	4-bit Wallace and Dadda Multiplier Design Using Novel Hybrid 3–2 Counter	4-bit Dadda multiplier with hybrid full adder	220.9 ps	20.34 μW	65 nm technology
6.	Design of Delay Efficient Modified 16-bit Wallace Multiplier	16 × 16-bit Wallace multiplier binary to excess code converter	24.948 ns	86.48 mW / 1019 LUT	Xilinx
7.	Design and Analysis of Approximate Compressors for Multiplication	8-bit Dadda multiplier with approximate 4–2 compressor	44.35 ps	1.14 μW	32 nm HSPICE simulation
8.	Dual-Quality 4:2 Compressors for Utilizing in Dynamic Accuracy Configurable Multipliers	8-bit Dadda multiplier with dual-quality 4:2 compressors mixed	0.25 ns	424 μW/ 423 μm²	45 nm technology node

Table 2. Front-end design performance comparison between the proposed GD architecture and existing multiplier architectures.

Parameters	Wallace Multiplier (4 × 4)	Dadda Multiplier (4 × 4)	GD Multiplier (4 × 4)
Delay (ns)	1.7518	1.7518	1.2274
Power (nW)	1.604	1.567	1.659
Power–Delay Product (PDP) (×10⁻¹⁵ J)	0.00280	0.00274	0.00203
Hardware Requirements	2 i/p AND gate = 16	2 i/p AND gate = 16	2 i/p AND gate = 16
	FA = 8	FA = 8	FA = 4
	HA = 4	HA = 4	HA = 11

FA = single-bit full adder, HA = single-bit half adder.
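
As a consistency check on Table 2, the power–delay product can be recomputed from the power and delay rows. The sketch below assumes that the PDP row is expressed in units of 10⁻¹⁵ J (so that nW × ns = 10⁻¹⁸ J is scaled by 10⁻³); the dictionary and function names are illustrative only.

```python
# (power in nW, delay in ns) copied from Table 2
TABLE_2 = {
    "Wallace (4 x 4)": (1.604, 1.7518),
    "Dadda (4 x 4)": (1.567, 1.7518),
    "GD (4 x 4)": (1.659, 1.2274),
}


def pdp_femtojoules(power_nw, delay_ns):
    """Power-delay product in units of 1e-15 J.
    nW * ns = 1e-18 J, so scale by 1e-3 (assumed unit convention)."""
    return power_nw * delay_ns * 1e-3


for name, (power_nw, delay_ns) in TABLE_2.items():
    print(f"{name}: PDP = {pdp_femtojoules(power_nw, delay_ns):.5f} x 1e-15 J")
```

Under this assumption, the recomputed values agree with the tabulated PDP entries to within one unit in the last printed digit.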

Table 3. Front-end performance comparison between the GD multiplier and existing techniques, with power and delay as parameters.

Parameters	Wallace Multiplier (8 × 8)	Dadda Multiplier (8 × 8)	GD Multiplier (8 × 8)
Delay (ns)	13.527	12.631	5.5199
Power (nW)	7.082	7.0370	7.9955
Power–Delay Product (PDP) (×10⁻¹⁵ J)	0.0950	0.0888	0.0413
Hardware Requirements	2 i/p AND gate = 64	2 i/p AND gate = 64	2 i/p AND gate = 64
	FA = 53	FA = 48	FA = 54
	HA = 8	HA = 8	HA = 18
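
The relative improvements implied by Table 3 can be derived with a short calculation. The sketch below uses only the tabulated delay and PDP values; `percent_reduction` is an illustrative helper, and which baseline the abstract's headline figures refer to is an inference rather than something the table states.

```python
def percent_reduction(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline


# Values copied from Table 3 (8 x 8 front-end results).
DELAY_NS = {"Wallace": 13.527, "Dadda": 12.631, "GD": 5.5199}
PDP = {"Wallace": 0.0950, "Dadda": 0.0888, "GD": 0.0413}

for baseline in ("Wallace", "Dadda"):
    delay_cut = percent_reduction(DELAY_NS[baseline], DELAY_NS["GD"])
    pdp_cut = percent_reduction(PDP[baseline], PDP["GD"])
    print(f"GD vs {baseline}: delay -{delay_cut:.1f}%, PDP -{pdp_cut:.1f}%")
```

With the Dadda multiplier as the baseline, this gives roughly 56% for delay and 53% for PDP, which is consistent with the reductions quoted in the abstract.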

Table 4. Back-end design performance comparison between the proposed multiplier and existing multiplier techniques.

Parameters	Wallace Multiplier (8 × 8)	Dadda Multiplier (8 × 8)	GD Multiplier (8 × 8)
Computational time (ns)	10.511	9.824	5.5159
Power consumption (nW)	7.1336	7.1390	8.1154
Area consumption (mm²)	2.1304	2.0092	2.3638
Power–delay product (PDP) (×10⁻¹⁵ J)	0.0749	0.0713	0.0249

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).


Padmanabhan, K.K.; Seerengasamy, U.; Ponraj, A.S. High-Speed Grouping and Decomposition Multiplier for Binary Multiplication. Electronics 2022, 11, 4202. https://doi.org/10.3390/electronics11244202

