Article

Approximate Floating-Point Multiplier based on Static Segmentation

Department of Electrical Engineering and Information Technology, University of Naples Federico II, 80131 Naples, Italy
*
Author to whom correspondence should be addressed.
Electronics 2022, 11(19), 3005; https://doi.org/10.3390/electronics11193005
Submission received: 26 August 2022 / Revised: 15 September 2022 / Accepted: 19 September 2022 / Published: 22 September 2022

Abstract

In this paper, a novel low-power approximate floating-point multiplier is presented. Since the mantissa computation is responsible for the largest part of the power consumption, we apply a novel approximation technique to mantissa multiplication, based on static segmentation. In our approach, the inputs of the mantissa multiplier are properly segmented so that a small inner multiplier can be used to calculate the output, with beneficial impact on power and area. To further improve performance, we introduce a novel segmentation-and-truncation approach which allows us to eliminate the shifter normally present at the output of the segmented multiplier. In addition, a simple compensation term for reducing approximation error is employed. The accuracy of the circuit can be tailored at design time by acting on a single parameter. The proposed approximate floating-point multiplier is compared with the state-of-the-art, showing good performance in terms of both precision and hardware saving. For the single-precision floating-point format, the obtained NMED is in the range 10^−5 to 7 × 10^−7, while MRED is in the range 3 × 10^−3 to 1.7 × 10^−4. Synthesis results in 28 nm CMOS show area and power savings of up to 82% and 85%, respectively, compared to the exact floating-point multiplier. Image processing applications confirm the expectations, with results very close to the exact case.

1. Introduction

Multipliers are the most-used arithmetic blocks in many digital signal processing applications, being the basic elements for operations such as filtering, correlation, de-noising, and domain transformation. Thanks to its favorable hardware features, fixed-point implementation is extensively exploited in a wide range of electronic systems including transceivers [1,2], FPGA accelerators [3,4], digital phase locked loops and spread spectrum clock generators [5,6,7]. Fixed-point arithmetic employs a fixed number of bits for representing the integer and the fractional parts of the signals. The bit-length of the integer part is related to the range of representable numbers, while the bit-length of the fractional part affects the accuracy of the operations. Therefore, the designer must properly choose the signal bit-widths and resolutions to manage numerical range and precision.
Floating-point (FP) arithmetic, while complex from the point of view of hardware implementation, offers a flexible way of performing numerical computations, providing at the same time a large range of representable values and high precision. It is therefore routinely used in applications such as scientific computing, digital signal processing, and computer graphics. The standardized IEEE 754 format [8] is ubiquitous in most computing platforms, from CPUs to GPUs and microcontrollers. According to this standard, an FP number consists of sign S, exponent E and mantissa M. The value encoded in the FP format is given by: A = (−1)^S·(1 + M)·2^(E−bias), with the mantissa M in the range [0, 1).
An FP multiplication involves a fixed-point adder to sum up the exponents, a fixed-point multiplier for the mantissa processing, and a normalization logic for the result. It follows that in most applications the FP multiplication dissipates the largest part of the overall power consumption [9]. In recent years, approximate computing techniques have been introduced in FP multipliers to save power and area, at the cost of introducing some (small) errors in the result. Approximate computing is a general paradigm that aims at improving the hardware performance by intentionally introducing approximations into the design [10,11]. The authors of [12] devise a piecewise linear approximation for the mantissa multiplication and introduce a tunable error compensation technique. The work [13] involves gate-level pruning and an inexact carry-propagate adder (CPA) for the computation of the product, whereas [14] skips the multiplication and transfers one of the input mantissas to the output. In [15], the mantissa multiplier is substituted by an adder, and an adaptive control logic is introduced for alleviating the approximation error. The paper [16] scales the accuracy of the multiplier by freezing some of its partial products, whereas in [17] an approximate addition performs the product in the logarithmic number system.
While only a few papers deal with approximate FP multipliers, approximate fixed-point multipliers have been extensively studied by using several techniques. In [18,19,20,21], the partial products generation stage is approximated by truncating some of the less-significant partial products and mitigating the truncation error with a suitable correction function. The papers [22,23,24,25,26,27,28,29] approximate the compression of the partial product matrix: the papers [22,23] use simple OR gates for the compression, whereas [24,25,26,27,28,29] use more complex approximate compressors. The works [30,31,32,33,34,35] design low-power multipliers by assembling small elementary approximate multipliers in a hierarchical fashion. The use of the logarithmic number system is investigated in [36,37,38,39]. The static and the dynamic segment methods reduce the size of the multiplier by selecting portions of the inputs for the product [40,41,42,43]. While the dynamic segmentation allows a finer selection at the cost of additional logic (such as leading-one detectors and barrel shifters), the static segmentation alleviates the hardware burden by choosing between fixed portions of the inputs.
In this paper, we propose the design of a novel approximate static segmented floating-point multiplier (SSFPM) with static segmentation applied to the mantissa product. The proposed circuit is designed for single precision floating-point arithmetic.
In the proposed approach, the inputs of the mantissa multiplier are properly segmented so that a small inner multiplier can be used to calculate the output. To further improve performance, we introduce a novel segmentation-and-truncation approach which allows us to eliminate the shifter normally present at the output of the segmented multiplier. In addition, as in [41], a simple compensation term for reducing the approximation error is employed. The accuracy of the SSFPM can be accurately tailored at the design time, by acting on a single parameter.
Analysis of the error metrics shows that the proposed SSFPM is competitive with the state-of-the-art. The power consumption and the area occupation, obtained by synthesis in TSMC 28 nm CMOS technology, demonstrate remarkable performance. Finally, the results of signal processing applications such as JPEG compression, image filtering, and tone mapping of high dynamic range (HDR) images [44] show the effectiveness of the proposed SSFPM.
The paper is organized as follows: in Section 2 we recall the steps used to perform the standard floating-point multiplication. In this section we also describe the optimized normalization logic, the proposed static segment method, and the correction technique. Section 3 shows the results in terms of error metrics and hardware performance, and discusses the behavior of the SSFPM in image processing applications. Section 4 compares the results with the state-of-the-art, while Section 5 concludes the paper.

2. Floating-Point Multiplication

2.1. Single Precision Floating-Point Multiplier

In the following we consider the IEEE 754 single precision standard. According to this standard, an FP number consists of sign S, exponent E and mantissa M.
The value encoded in the FP format, in the case of normalized representations, is given by: X = (−1)^S·(1 + M)·2^(E−bias) (for the sake of simplicity, we do not consider de-normalized representations).
The mantissa M is in the range [0, 1). The ‘1’ bit added to M is the so-called implicit bit. The mantissa M is represented with 23 bits (with MSB and LSB of weights 2^−1 and 2^−23, respectively). The exponent E is an 8-bit integer, while the exponent bias is 127. Thus, the exponent value E − bias is in the range [−127, 128].
Figure 1 shows an example, where the decimal number −13.140625 is represented according to the IEEE 754 single precision standard. This number can be written as: −2^3 × (1 + 1/2 + 1/8 + 1/64 + 1/512). Thus: the sign bit is S = 1; the exponent value is E = 3 + bias = 130, corresponding to binary 10000010; the mantissa is M = 1/2 + 1/8 + 1/64 + 1/512, corresponding to binary 10100100100000000000000.
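The encoding of the example above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `decode_float32` is hypothetical (not from the paper) and it relies on the standard `struct` module to obtain the 32-bit word.

```python
import struct

def decode_float32(x):
    """Split a value, stored as IEEE 754 single precision, into (S, E, M) bit fields."""
    bits = struct.unpack(">I", struct.pack(">f", x))[0]
    sign = bits >> 31                # S: 1 bit
    exponent = (bits >> 23) & 0xFF   # E: 8 bits, biased by 127
    mantissa = bits & 0x7FFFFF       # M: 23 bits, the implicit '1' is not stored
    return sign, exponent, mantissa

s, e, m = decode_float32(-13.140625)
print(s, format(e, "08b"), format(m, "023b"))

# Rebuild the value from the fields: (-1)^S * (1 + M) * 2^(E - bias)
value = (-1) ** s * (1 + m / 2**23) * 2.0 ** (e - 127)
```

The printed fields can be compared with the sign, exponent and mantissa strings given in the text, and `value` reproduces the original number.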
Figure 2 shows the block diagram of the single precision floating-point multiplier.
As detailed above, the input a is expressed on 32 bits, with ne = 8 bits dedicated to the exponent and nm = 23 bits dedicated to the mantissa. More precisely, the most significant bit (MSB) of a, a[31], is the sign Sa, while the portions Ea = a[30:23] and Ma = a[22:0] are the exponent and the mantissa, respectively. The same scheme applies also to the input b, with sign Sb = b[31], exponent Eb = b[30:23], and mantissa Mb = b[22:0]. Therefore, the signals a and b are represented as follows:
$$a = (-1)^{S_a} \cdot 2^{E_a - 127} \cdot (1 + M_a), \qquad b = (-1)^{S_b} \cdot 2^{E_b - 127} \cdot (1 + M_b) \tag{1}$$
Similarly, the product c = a·b is expressed in the form:
$$c = (-1)^{S_c} \cdot 2^{E_c - 127} \cdot (1 + M_c) \tag{2}$$
where Sc, Ec, and Mc are the sign, exponent and mantissa of c, respectively.
The mantissas Ma, Mb, and Mc are in the range [0, 1) with MSB and LSB of weights 2^−1 and 2^−23, respectively. Therefore, they constitute the fractional part of the quantities (1 + Ma), (1 + Mb), and (1 + Mc) (also indicated with (1.Ma), (1.Mb) and (1.Mc) in the following).
The arithmetic stage of Figure 2 computes Sc, Ec, and the mantissa product P. As shown, the XOR between Sa and Sb allows us to obtain the sign Sc. The sum of Ea and Eb computes the exponent Ec, while the subtraction of 127 accounts for the exponent bias. In the mantissa multiplier, a bit ‘1’ is explicitly concatenated to Ma and Mb at the most significant position (see (1.Ma) = (1 + Ma) and (1.Mb) = (1 + Mb) in the figure), to compute:
$$P = (1 + M_a) \cdot (1 + M_b) \tag{3}$$
It is worth noting that (1 + Ma) lies in the range [1, 2). As a consequence, P is in the range [1, 4), which means that 2 MSBs are needed to represent its integer part (namely p[47] and p[46]). We also underline that P is expressed on 48 bits.
The normalization logic extracts the mantissa Mc from P and consequently adjusts the exponent Ec. If P < 2 (i.e., if p[47] is low), the product P is in the form (1.Mc). Therefore, the extraction of Mc simply requires selecting the fractional part of P, that is p[45:0]. Actually, only 24 bits of P (that is p[45:22]) are sent to the next rounding stage.
In the case P ≥ 2 (i.e., if p[47] is high), we need to right-shift P by one position in order to express the product in the form (1.Mc) before applying the rounding. Therefore, the segment p[46:23] is sent to the rounding stage. In this case, moreover, the exponent is incremented by one to compensate for the shift on P, as shown in the exponent update block in Figure 2.
The mantissa rounding block, finally, rounds the mantissa to 23 bits, as required by the standard, with the help of a 24-bit adder.
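The datapath described in this subsection can be mimicked bit by bit in software. The following is a minimal sketch under stated assumptions, not the authors' RTL: the function names are hypothetical, rounding is modelled as round-half-up on the single guard bit of the 24-bit segment (the text only says that a 24-bit adder is used), and zeros, de-normalized numbers, infinities, NaNs and exponent overflow are not handled.

```python
import struct

BIAS, NM = 127, 23  # exponent bias and mantissa width of single precision

def f32_to_bits(x):
    return struct.unpack(">I", struct.pack(">f", x))[0]

def bits_to_f32(b):
    return struct.unpack(">f", struct.pack(">I", b))[0]

def fp_multiply(a_bits, b_bits):
    """Multiply two normalized single-precision numbers given as 32-bit words."""
    sa, ea, ma = a_bits >> 31, (a_bits >> 23) & 0xFF, a_bits & 0x7FFFFF
    sb, eb, mb = b_bits >> 31, (b_bits >> 23) & 0xFF, b_bits & 0x7FFFFF

    sc = sa ^ sb                              # sign: XOR of the input signs
    ec = ea + eb - BIAS                       # exponent sum minus the bias
    p = ((1 << NM) | ma) * ((1 << NM) | mb)   # (1.Ma)*(1.Mb) on 48 bits, Eq. (3)

    if (p >> 47) & 1:                 # P >= 2: take p[46:23] and bump the exponent
        seg = (p >> 23) & 0xFFFFFF
        ec += 1
    else:                             # P < 2: take p[45:22]
        seg = (p >> 22) & 0xFFFFFF

    # seg holds 23 mantissa bits plus one guard bit (assumed round-half-up).
    mc = (seg + 1) >> 1
    if mc >> NM:                      # rounding overflow: mantissa was all ones
        mc = 0
        ec += 1
    return (sc << 31) | ((ec & 0xFF) << 23) | (mc & 0x7FFFFF)

print(bits_to_f32(fp_multiply(f32_to_bits(1.5), f32_to_bits(2.5))))
```

For operands whose product is exactly representable, the sketch returns the same word as a native single-precision multiplication; for other operands the result may differ in the last bit from IEEE round-to-nearest-even because no sticky bit is modelled.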

2.2. Proposed Optimization for the Mantissa Computation

In the proposed approach, instead of computing P as in (3), we compute:
$$P' = P - 1 \tag{4}$$
as follows:
$$P' = (1 + M_a) \cdot (1 + M_b) - 1 = M_a \cdot M_b + M_a + M_b \tag{5}$$
Please note that (3) requires a 24 bit × 24 bit multiplier, due to the bit ‘1’ explicitly concatenated to Ma and Mb. On the other hand, the calculation of P′ in (5) requires a smaller 23 bit × 23 bit multiplier. Equation (5) opens the way to a static segmentation of the multiplication: in fact, in the multiplication described by (3) the bit ‘1’ prevents segmenting the multiplier inputs, whereas the multiplicands in (5) do not include any stuck-at-‘1’ bit. Figure 3a shows the structure of the proposed floating-point multiplier and highlights in the dashed red rectangle the multiply-and-add unit (MAA) that implements Equation (5).
As shown in the figure, since the LSBs of the mantissas and of the product Ma·Mb have weights 2−23 and 2−46, respectively, both Ma and Mb are added with a hard-wired left-shift of 23 positions, to properly align their LSBs to Ma·Mb.
In this implementation, the two MSBs of P’ are exploited to manage the normalization process. Before proceeding with the discussion, let us observe that the maximum value of P’, named Pmax, can easily be calculated since the maximum value of Ma and Mb is equal to (1 − 2^−23). We have: Pmax = (1 − 2^−23)^2 + 2·(1 − 2^−23), therefore Pmax is slightly less than 3. Thus, 0 ≤ P’ < 3 and, in binary representation, the two MSBs of P’, which constitute its integer part, can vary between ‘00’, ‘01’, and ‘10’. This observation helps the calculation of P = P’ + 1. In fact, the fractional bits of P coincide with the fractional bits of P’ and only the two MSBs of P (the integer part) should be computed from the two MSBs of P’. To this purpose, let us consider the various cases reported in the truth table of Figure 3b. When p’[47:46] = ‘00’, the two MSBs of P are p[47:46] = ‘01’. In this case, the normalization is not required since P < 2. Conversely, when p’[47:46] = ‘01’ or ‘10’, the two MSBs of P are p[47:46] = ‘10’ or ‘11’. These cases require normalization since P ≥ 2. Following the truth table, we can define the signal sel as the OR between p’[47] and p’[46], and normalize the mantissa when sel = 1 (see also Figure 3a). The truth table also shows that p[46] is inverted with respect to p’[46]. Therefore, the output of the mantissa multiplier is given by {~p’[46], p[45:0]}, where ‘~’ is used to represent the complement operation.
After the mantissa multiplier, mantissa normalization and rounding follow, as in Figure 2. The main difference is the use of the signal sel (instead of p[47]) to perform normalization. Note also that the exponent update is implemented without a multiplexer, by using sel to increment the exponent.
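The equivalence between the optimized mantissa path and the reference product of (3) can be verified numerically. The sketch below uses hypothetical function names and plain Python integers; it computes P’ with a 23 × 23 product plus the two mantissas shifted by 23 positions, derives sel and the inverted bit from the two MSBs of P’, and compares the outcome with p[47] and p[46:0] of the 24 × 24 product.

```python
import random

NM = 23  # mantissa bits

def mantissa_product_optimized(ma, mb):
    """Return (sel, {~p'[46], p'[45:0]}) computed from P' = Ma*Mb + Ma + Mb."""
    p_prime = ma * mb + ((ma + mb) << NM)   # P' with 46 fractional bits
    hi = p_prime >> 46                      # integer part p'[47:46]: 0, 1 or 2
    sel = (hi >> 1) | (hi & 1)              # OR of p'[47] and p'[46]
    p46 = (~hi) & 1                         # p[46] is the complement of p'[46]
    return sel, (p46 << 46) | (p_prime & ((1 << 46) - 1))

def mantissa_product_reference(ma, mb):
    """Return (p[47], p[46:0]) of the 24 x 24 product (1.Ma)*(1.Mb)."""
    p = ((1 << NM) | ma) * ((1 << NM) | mb)
    return p >> 47, p & ((1 << 47) - 1)

rng = random.Random(0)
pairs = [(0, 0), (2**NM - 1, 2**NM - 1)]
pairs += [(rng.getrandbits(NM), rng.getrandbits(NM)) for _ in range(1000)]
all_equal = all(mantissa_product_optimized(a, b) == mantissa_product_reference(a, b)
                for a, b in pairs)
print(all_equal)
```

The check relies only on the identity (1 + Ma)·(1 + Mb) = 1 + P’, so the two functions agree for every pair of 23-bit mantissas.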

2.3. Static Segmentation Method

The static segmentation [40,41] reduces the size of the multiplier by segmenting the multiplicands before the product. Each segment comprises m bits, with nm/2 < m < nm. Therefore, an m × m multiplier is employed for computing the result instead of a nm × nm multiplier. Synthesizable HDL descriptions of fixed-point static-segmented multipliers are available in [45].
Figure 4 shows the segmentation for the mantissa Ma, where, for the sake of simplicity, we assume nm = 8 bits with m = 5. The mantissa Ma is divided into a lower portion (LPa), given by its m LSBs, and an upper portion (UPa), given by its m MSBs. The (nm − m) MSBs constitute the control segment CSa, used to decide between LPa and UPa. When all the bits of CSa are low, the segment LPa is selected for the multiplication. Conversely, if at least one of the bits of CSa is high, the segment UPa is chosen. A similar mechanism is applied also to input Mb. Therefore, defining the selection flags αMa and αMb as the OR of the bits of the control segments CSa and CSb, respectively, and naming Massm, Mbssm the segmented signals, the following relations hold:
$$M_a^{ssm} = \begin{cases} M_a[m-1:0] & \text{if } \alpha_{M_a} = 0 \\ M_a[n_m-1:n_m-m] & \text{if } \alpha_{M_a} = 1 \end{cases} \tag{6}$$
$$M_b^{ssm} = \begin{cases} M_b[m-1:0] & \text{if } \alpha_{M_b} = 0 \\ M_b[n_m-1:n_m-m] & \text{if } \alpha_{M_b} = 1 \end{cases} \tag{7}$$
Please note that an approximation error is introduced when the upper portion of the mantissa is multiplied (i.e., when the selection flag is high) since the less significant part is discarded (namely εMa in the figure). On the contrary, no approximation is introduced if the lower portion is selected.
After the multiplication, a left-shift is required to extend the result from 2·m bits to 2·nm bits. Then, the approximate product Kssm is computed as follows:
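$$K^{ssm} = (M_a^{ssm} \cdot 2^{LSH_a}) \cdot (M_b^{ssm} \cdot 2^{LSH_b}) = (M_a^{ssm} \cdot M_b^{ssm}) \cdot 2^{LSH} \tag{8}$$
with LSHa and LSHb defined as
$$LSH_a = \begin{cases} 0 & \text{if } \alpha_{M_a} = 0 \\ n_m - m & \text{if } \alpha_{M_a} = 1 \end{cases} \qquad LSH_b = \begin{cases} 0 & \text{if } \alpha_{M_b} = 0 \\ n_m - m & \text{if } \alpha_{M_b} = 1 \end{cases} \tag{9}$$
The term LSH in (8) is given by: LSH = LSHa + LSHb and is the number of positions for the overall left-shift:
$$LSH = \begin{cases} 0 & \text{if } \alpha_{M_a} = 0,\ \alpha_{M_b} = 0 \\ n_m - m & \text{if } \alpha_{M_a} = 0,\ \alpha_{M_b} = 1 \text{ or } \alpha_{M_a} = 1,\ \alpha_{M_b} = 0 \\ 2 \cdot (n_m - m) & \text{if } \alpha_{M_a} = 1,\ \alpha_{M_b} = 1 \end{cases} \tag{10}$$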
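The static segmentation of (6)–(10) can be sketched in a few lines for generic unsigned operands. The function names `segment` and `ssm_multiply` are illustrative choices, and the example uses the nm = 8, m = 5 setting of Figure 4.

```python
def segment(x, n, m):
    """Return (segment, left_shift) for an n-bit input, following Eqs. (6), (7), (9)."""
    alpha = int((x >> m) != 0)          # OR of the (n - m) control MSBs
    if alpha:
        return x >> (n - m), n - m      # upper portion: the m MSBs
    return x & ((1 << m) - 1), 0        # lower portion: the m LSBs

def ssm_multiply(a, b, n, m):
    """Approximate a*b with an m x m product and a final left shift, Eq. (8)."""
    seg_a, lsh_a = segment(a, n, m)
    seg_b, lsh_b = segment(b, n, m)
    return (seg_a * seg_b) << (lsh_a + lsh_b)

# Both control segments low: the lower portions are used and the result is exact.
print(ssm_multiply(0b00010110, 0b00001011, 8, 5), 0b00010110 * 0b00001011)
# Control segment of a is high: its 3 LSBs are discarded, so the result is approximate.
print(ssm_multiply(0b10110110, 0b00001011, 8, 5), 0b10110110 * 0b00001011)
```

In the second call the discarded LSBs of the first operand are 110 (decimal 6), so the approximate product falls short of the exact one by 6 × 11 = 66.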

2.4. Static Segmentation Applied to the Mantissa Product

In this paragraph, we apply the segmentation to the inputs of the MAA unit to employ an m × m multiplier and an m-bit adder for the mantissa computation. By assuming that Massm and Mbssm have LSB of weight 2^0, we write the approximate product P’apprx as follows:
$$P'_{apprx} = (M_a^{ssm} \cdot M_b^{ssm}) \cdot 2^{-2 n_m + LSH} + M_a^{ssm} \cdot 2^{-n_m + LSH_a} + M_b^{ssm} \cdot 2^{-n_m + LSH_b} \tag{11}$$
Figure 5 depicts the circuit that implements Equation (11).
The multiplexer mux LSH, used for the left-shift of the multiplier output, prevents merging the multiplier and the adder into a fused partial product matrix (PPM), and leads to the usage of two cascaded CPAs, one for computing the product Massm·Mbssm·2^(−2nm+LSH) and one for computing P’apprx. Furthermore, Massm·2^(−nm+LSHa) and Mbssm·2^(−nm+LSHb) involve up to 2·nm bits when αMa = 1 and αMb = 1, thus degrading the performance of the adder.
In order to optimize the MAA unit, we analyze Equation (11) considering all the possible combinations for αMa and αMb. Figure 6 shows the alignments of the signals reporting the exact MAA for reference (Figure 6a), and the alignments with segmentation in the case nm = 8 and m = 6 (Figure 6b–d).
If αMa = 0 and αMb = 0, the shifts LSH, LSHa and LSHb are zero, and Equation (11) becomes:
$$P'_{apprx} = \left[ (M_a^{ssm} \cdot M_b^{ssm}) \cdot 2^{-n_m} + M_a^{ssm} + M_b^{ssm} \right] \cdot 2^{-n_m} \tag{12}$$
As shown in Figure 6b, the product is on 2·m bits, whereas the shifted mantissas imply the usage of an adder on (nm + m) bits. To employ an m-bit adder, we truncate the product Massm·Mbssm·2^−nm, discarding the gray LSBs in the figure. Therefore, we approximate Equation (12) as follows:
$$P'_{apprx} = \left[ \left\lfloor M_a^{ssm} \cdot M_b^{ssm} \cdot 2^{-n_m} \right\rfloor + M_a^{ssm} + M_b^{ssm} \right] \cdot 2^{-n_m} \tag{13}$$
where ⌊·⌋ is used to represent the floor operator.
In order to implement the calculations in (13) with a fused PPM and a unique CPA, we rearrange (13) as follows:
$$P'_{apprx} = \left[ M_a^{ssm,mult} \cdot M_b^{ssm,mult} + M_a^{ssm,add} + M_b^{ssm,add} \right] \cdot 2^{-n_m} \tag{14}$$
where:
$$M_a^{ssm,mult} = \left\lfloor M_a^{ssm} \cdot 2^{-\lceil n_m/2 \rceil} \right\rfloor, \quad M_b^{ssm,mult} = \left\lfloor M_b^{ssm} \cdot 2^{-\lfloor n_m/2 \rfloor} \right\rfloor, \quad M_a^{ssm,add} = M_a^{ssm}, \quad M_b^{ssm,add} = M_b^{ssm} \tag{15}$$
and ⌈·⌉ is the ceiling operator.
For the sake of simplicity, let us consider the case in which nm is even. The final floor operation on Massm,mult and Mbssm,mult of (15) results in truncating nm/2 bits from m-bit operands and, therefore, the computation of (14) requires a (m − nm/2) × (m − nm/2) multiplier.
As observable from the formula, we can first compute Massm,mult, Mbssm,mult, that are truncated versions of Massm, Mbssm, and then execute the multiply-and-add operation. In this way, we remove the shift between the multiplier and the adder and design the MAA unit with a fused PPM and a unique CPA.
When αMa = 1 and αMb = 0, we have LSH = LSHa = nm − m (refer also to (9), (10)). Therefore, (11) becomes
$$P'_{apprx} = \left[ (M_a^{ssm} \cdot M_b^{ssm}) \cdot 2^{-n_m} + M_a^{ssm} + M_b^{ssm} \cdot 2^{-LSH} \right] \cdot 2^{-n_m + LSH} \tag{16}$$
Applying the same reasoning, we need to discard the gray LSBs of Massm·Mbssm·2^−nm and of Mbssm·2^−LSH (see Figure 6c) in order to involve again an m-bit addition. Furthermore, we also need to remove the shift at the output of the multiplier.
The above considerations lead to the following approximation for (16):
$$P'_{apprx} = \left[ M_a^{ssm,mult} \cdot M_b^{ssm,mult} + M_a^{ssm,add} + M_b^{ssm,add} \right] \cdot 2^{-n_m + LSH} \tag{17}$$
with Massm,mult, Mbssm,mult, and Massm,add defined as in (15), and Mbssm,add given by
$$M_b^{ssm,add} = \left\lfloor M_b^{ssm} \cdot 2^{-LSH} \right\rfloor \tag{18}$$
Here, the addend Mbssm is also truncated along with the multiplier inputs. Since the expressions of Massm,mult and Mbssm,mult remain the same, also in this case the computation of (17) requires a (m − nm/2) × (m − nm/2) multiplier.
A similar reasoning applies to the case αMa = 0 and αMb = 1, with Massm,mult, Mbssm,mult, and Mbssm,add defined as in (15) and Massm,add = ⌊Massm·2^−LSH⌋.
Finally, when αMa = 1 and αMb = 1, we have LSH = 2·(nm − m) and LSHa = LSHb = nm − m. Therefore, (11) becomes
$$P'_{apprx} = \left[ (M_a^{ssm} \cdot M_b^{ssm}) \cdot 2^{-n_m + LSH_a} + M_a^{ssm} + M_b^{ssm} \right] \cdot 2^{-n_m + LSH_a} \tag{19}$$
As shown in Figure 6d, we need to truncate (Massm·Mbssm)·2^(−nm+LSHa) for employing an m-bit adder, and to shift the inputs of the multiplier for employing a unique CPA. Therefore, (19) is approximated as follows
$$P'_{apprx} = \left[ M_a^{ssm,mult} \cdot M_b^{ssm,mult} + M_a^{ssm,add} + M_b^{ssm,add} \right] \cdot 2^{-n_m + LSH_a} \tag{20}$$
with
$$M_a^{ssm,mult} = \left\lfloor M_a^{ssm} \cdot 2^{-\lceil (n_m - LSH_a)/2 \rceil} \right\rfloor, \quad M_b^{ssm,mult} = \left\lfloor M_b^{ssm} \cdot 2^{-\lfloor (n_m - LSH_a)/2 \rfloor} \right\rfloor \tag{21}$$
and Massm,add, Mbssm,add defined as in (15). If, for the sake of simplicity, we consider the case in which nm and m are even, the final floor operation on Massm,mult and Mbssm,mult of (21) results in truncating (nm − LSHa)/2 = m/2 bits from m-bit operands; therefore, the computation of (20) requires a (m/2) × (m/2) multiplier. Since m − nm/2 < m/2, overall, the computation of P’apprx requires a (m/2) × (m/2) multiplier.
Figure 7a shows the circuit that implements the static segmented multiply-and-add unit (SSMAA), whereas Table 1 collects the segments of Massm, Mbssm obtained with the segment-and-truncate approach. The multiplexers on Ma and Mb perform the segmentation of Table 1 at the input of both the multiplier and the adder. Then, a further multiplexer applies the final shift on Pssm in order to express the result Papprx on (2·nm + 2) = 48 bits.
It is worth mentioning that the Papprx is subsequently quantized in the normalization process. It follows that the rounding allows us to reduce the size of the final multiplexer, since the result can be expressed on (2·nm + 2 − nq) bits, nq being the number of discarded LSBs (nq = 22 for the single precision FPM).
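The segment-and-truncate scheme of this subsection can be summarized by a small behavioural model. This is a sketch under stated assumptions rather than the authors' circuit: the function names are hypothetical, the result is returned as an integer scaled by 2^(2·nm), and for odd shift amounts the truncation is split as a ceiling on the a input and a floor on the b input, which is an assumption of this sketch. The final quantization of P’apprx and the correction term of the next subsection are not modelled.

```python
def ssmaa(ma, mb, nm, m):
    """Approximate P' = Ma*Mb + Ma + Mb; the result is scaled by 2**(2*nm)."""
    sh = nm - m
    alpha_a = int((ma >> m) != 0)           # selection flags: OR of the control MSBs
    alpha_b = int((mb >> m) != 0)
    seg_a = (ma >> sh) if alpha_a else ma   # upper or lower m-bit segment
    seg_b = (mb >> sh) if alpha_b else mb
    lsh_a, lsh_b = alpha_a * sh, alpha_b * sh

    if alpha_a and alpha_b:
        trunc = nm - lsh_a                  # bits dropped from the product, Eq. (21)
        add_a, add_b = seg_a, seg_b
    else:
        trunc = nm                          # Eq. (15)
        add_a = seg_a >> lsh_b              # an addend is truncated only when the
        add_b = seg_b >> lsh_a              # other input is left-shifted, Eq. (18)

    mult_a = seg_a >> ((trunc + 1) // 2)    # assumed ceiling split for input a
    mult_b = seg_b >> (trunc // 2)          # assumed floor split for input b
    p_ssm = mult_a * mult_b + add_a + add_b # fused multiply-and-add
    return p_ssm << (nm + max(lsh_a, lsh_b))  # final alignment shift

def exact_maa(ma, mb, nm):
    """Exact P' with the same 2**(2*nm) scaling."""
    return ma * mb + ((ma + mb) << nm)

# nm = 8, m = 6 as in Figure 6.
print(ssmaa(180, 13, 8, 6), exact_maa(180, 13, 8))
print(ssmaa(200, 150, 8, 6), exact_maa(200, 150, 8))
```

Since every approximation step is a truncation, the model never exceeds the exact value, which is the reason a positive compensation term can reduce the error.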

2.5. Error Analysis and Correction

The approximation errors that affect the proposed SSMAA are due to the discarding of the least significant parts of Ma and Mb (when the segmentation selects the upper segments), and due to the truncations used to employ the m-bit adder and a unique CPA. It follows that the largest error arises when αMa = 1, αMb = 1.
Estimating the error E = P’ − Papprx can help in improving the accuracy of the SSMAA unit. The idea is to compute the SSMAA result as Papprx,c = Papprx + E*, where E* is a term which approximates E.
To study the error, by considering Table 1, let us write the inputs of the SSMAA as:
$$\begin{aligned} M_a^{ssm,mult} &= UP_a^{mult} = \sum_{k=n_m-m'}^{n_m-1} m_{a,k}\, 2^k, & M_b^{ssm,mult} &= UP_b^{mult} = \sum_{k=n_m-m'}^{n_m-1} m_{b,k}\, 2^k, \\ M_a^{ssm,add} &= UP_a^{add} = \sum_{k=n_m-m}^{n_m-1} m_{a,k}\, 2^k, & M_b^{ssm,add} &= UP_b^{add} = \sum_{k=n_m-m}^{n_m-1} m_{b,k}\, 2^k \end{aligned} \tag{22}$$
where UPamult, UPbmult, UPaadd, and UPbadd are the quantities selected for the SSMAA, m_{a,k} and m_{b,k} are the k-th bits of Ma and Mb, and m’ = m − m/2 = m/2. In this discussion we consider an even value of m for the sake of simplicity in order to have the same truncation for both Massm,mult and Mbssm,mult.
Additionally, the quantities pruned for the segmentation are:
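$$\begin{aligned} \varepsilon_{M_a,mult} &= \sum_{k=0}^{n_m-m'-1} m_{a,k}\, 2^k, & \varepsilon_{M_b,mult} &= \sum_{k=0}^{n_m-m'-1} m_{b,k}\, 2^k, \\ \varepsilon_{M_a,add} &= \sum_{k=0}^{n_m-m-1} m_{a,k}\, 2^k, & \varepsilon_{M_b,add} &= \sum_{k=0}^{n_m-m-1} m_{b,k}\, 2^k \end{aligned} \tag{23}$$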
By writing Ma and Mb as follows for the product:
$$M_a = (UP_a^{mult} + \varepsilon_{M_a,mult}) \cdot 2^{-n_m}, \qquad M_b = (UP_b^{mult} + \varepsilon_{M_b,mult}) \cdot 2^{-n_m} \tag{24}$$
and as follows for the addition:
$$M_a = (UP_a^{add} + \varepsilon_{M_a,add}) \cdot 2^{-n_m}, \qquad M_b = (UP_b^{add} + \varepsilon_{M_b,add}) \cdot 2^{-n_m} \tag{25}$$
we obtain the exact result of MAA:
$$P' = (UP_a^{mult}\, UP_b^{mult} + UP_a^{mult}\, \varepsilon_{M_b,mult} + UP_b^{mult}\, \varepsilon_{M_a,mult} + \varepsilon_{M_a,mult}\, \varepsilon_{M_b,mult}) \cdot 2^{-2 n_m} + (UP_a^{add} + \varepsilon_{M_a,add}) \cdot 2^{-n_m} + (UP_b^{add} + \varepsilon_{M_b,add}) \cdot 2^{-n_m} \tag{26}$$
Therefore, being P’apprx given by:
$$P'_{apprx} = UP_a^{mult}\, UP_b^{mult} \cdot 2^{-2 n_m} + UP_a^{add} \cdot 2^{-n_m} + UP_b^{add} \cdot 2^{-n_m} \tag{27}$$
the error E is:
$$E = (UP_a^{mult}\, \varepsilon_{M_b,mult} + UP_b^{mult}\, \varepsilon_{M_a,mult} + \varepsilon_{M_a,mult}\, \varepsilon_{M_b,mult}) \cdot 2^{-2 n_m} + \varepsilon_{M_a,add} \cdot 2^{-n_m} + \varepsilon_{M_b,add} \cdot 2^{-n_m} \tag{28}$$
In order to simplify the discussion, let us focus on the most significant term E*:
$$E^* = (UP_a^{mult}\, \varepsilon_{M_b,mult} + UP_b^{mult}\, \varepsilon_{M_a,mult}) \cdot 2^{-2 n_m} \tag{29}$$
Following the approach of [41], we approximate εMa,mult with:
$$\varepsilon_{M_a,mult} \approx (2\, m_{a,n_m-m'-1} + 1) \cdot 2^{n_m-m'-2} \tag{30}$$
and write UPamult as:
$$UP_a^{mult} = 2^{n_m-m'} \sum_{k=0}^{m'-1} m_{a,k+n_m-m'}\, 2^k \tag{31}$$
A similar expression holds for εMb,mult and for UPbmult.
Then, substituting (30) and (31) in (29), the error becomes:
$$E^* = \left[ 2^{2 n_m - 2 m' - 2} \sum_{k=0}^{m'-1} e_{k+n_m-m'}\, 2^k \right] \cdot 2^{-2 n_m} \tag{32}$$
with:
$$e_{k+n_m-m'} = 2 \cdot (m_{a,k+n_m-m'}\, m_{b,n_m-m'-1} + m_{b,k+n_m-m'}\, m_{a,n_m-m'-1}) + m_{a,k+n_m-m'} + m_{b,k+n_m-m'} \tag{33}$$
Approximating e_{k+nm−m’} with:
$$e^*_{k+n_m-m'} = 4 \cdot (m_{a,k+n_m-m'}\, m_{b,n_m-m'-1} \ \text{OR} \ m_{b,k+n_m-m'}\, m_{a,n_m-m'-1}) \tag{34}$$
for further simplification, we obtain:
$$E^* \approx \left[ 2^{2 n_m - 2 m' - 2} \sum_{k=0}^{m'-1} e^*_{k+n_m-m'}\, 2^k \right] \cdot 2^{-2 n_m} \tag{35}$$
In the case of m odd, we consider the following approximate expression for e*k+nm-m’:
e * k + n m m = 4 · ( m a k + n m m m b n m m 1   O R   m b k + n m m m a n m m 1 )
Approximating the summation in (35) with two or three terms allows us to sufficiently reduce the approximation error in the SSMAA.
Figure 7b depicts the scheme of the proposed corrected SSMAA (cSSMAA in the following). The correction term E*, highlighted in red, can be directly fused in the PPM, thus implying a minimum impact on the hardware performances.
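The error decomposition used above can be checked numerically for the case αMa = αMb = 1 and m even. The sketch below uses hypothetical helper names and integer arithmetic scaled by 2^(2·nm): it verifies that the difference between the exact and the approximate MAA equals the full error of (28), extracts the dominant term of (29), and evaluates the approximation of the pruned bits given in (30). The hardware correction logic of Figure 7b is not modelled.

```python
def split(x, nm, keep):
    """Split an nm-bit word into its `keep` MSBs (kept in place) and the pruned LSBs."""
    low_mask = (1 << (nm - keep)) - 1
    return x & ~low_mask, x & low_mask

def error_terms(ma, mb, nm, m):
    """Return (exact, approx, E, E*) scaled by 2**(2*nm), for m even."""
    mp = m // 2                                     # m' = m/2
    upa_mult, eps_a_mult = split(ma, nm, mp)
    upb_mult, eps_b_mult = split(mb, nm, mp)
    upa_add, eps_a_add = split(ma, nm, m)
    upb_add, eps_b_add = split(mb, nm, m)
    exact = ma * mb + ((ma + mb) << nm)                              # Eq. (26)
    approx = upa_mult * upb_mult + ((upa_add + upb_add) << nm)       # Eq. (27)
    e_full = (upa_mult * eps_b_mult + upb_mult * eps_a_mult
              + eps_a_mult * eps_b_mult
              + ((eps_a_add + eps_b_add) << nm))                     # Eq. (28)
    e_star = upa_mult * eps_b_mult + upb_mult * eps_a_mult           # Eq. (29)
    return exact, approx, e_full, e_star

def eps_approx(x, nm, keep):
    """Eq. (30): MSB of the pruned bits plus the midpoint of the remaining ones."""
    n = nm - keep                                   # number of pruned bits (n >= 2)
    msb = (x >> (n - 1)) & 1
    return (2 * msb + 1) << (n - 2)

exact, approx, e_full, e_star = error_terms(200, 150, 8, 6)
print(exact - approx, e_full, e_star)
```

With nm = 8 and m = 6 the pruned part of each multiplier input has five bits, so the approximation of (30) deviates from the true pruned value by at most 2^3 = 8.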

3. Results

3.1. Error Metrics Analysis

We exploit some of the common error metrics to verify the performance of the proposed SSFPM. Let us define the exact and the approximate result of the floating-point multiplier as C and Capprx, respectively. The approximation error is given by EFMP = C − Capprx, while the error distance is ED = |EFMP|. The normalized mean error and the normalized mean error distance are NM = mean(EFMP)/Cmax and NMED = mean(ED)/Cmax, respectively, where mean(·) is the average operator and Cmax = 2^128 is the maximum value of C. The mean relative error distance is MRED = mean(ED/|C|), whereas the normalized maximum error distance is defined as NmaxED = EDmax/Cmax, EDmax being the maximum value of ED.
We compute the error metrics by simulating the SSFPM with 10^7 pairs of random inputs lying in the whole range of representation (that is, about [−2^128, 2^128]).
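The metrics defined above translate directly into code. The following is a minimal sketch with a hypothetical function name; it takes two equally long sequences of exact and approximate products (with no zero among the exact values, since MRED divides by |C|) and is demonstrated on a toy data set with Cmax = 8 rather than on the 10^7-sample simulation used in the paper.

```python
def error_metrics(exact, approx, c_max):
    """Return NM, NMED, MRED and NmaxED for paired exact/approximate results."""
    n = len(exact)
    errors = [c - ca for c, ca in zip(exact, approx)]   # signed approximation error
    ed = [abs(e) for e in errors]                       # error distance
    return {
        "NM": sum(errors) / n / c_max,
        "NMED": sum(ed) / n / c_max,
        "MRED": sum(d / abs(c) for d, c in zip(ed, exact)) / n,
        "NmaxED": max(ed) / c_max,
    }

metrics = error_metrics([2.0, -4.0], [1.0, -3.0], c_max=8.0)
print(metrics)
```

In the toy example the two signed errors cancel, so NM is zero while NMED, MRED and NmaxED remain positive, which illustrates why the distance-based metrics are reported alongside the mean error.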
Figure 8 represents the behavior of NMED and MRED as a function of the parameter m, used to define the accuracy of the segmentation. In the corrected version of the SSFPM, two terms are used to approximate the summation in (35).
As expected, increasing m allows us to improve both NMED and MRED, which decrease from 1.32 × 10^−5 to 3.16 × 10^−7 and from 3.41 × 10^−3 to 7.96 × 10^−5, respectively. The figure also highlights the beneficial effects of the correction technique, since NMED and MRED are about halved in the corrected case.
Table 2 collects the error metrics of the proposed SSFPM and cSSFPM in the cases m = 12, 14, 16, and 18. For the sake of comparison, we show also the accuracy of the approximate FPMs obtained by exploiting the techniques of [22,42,43] for the computation of the product Ma·Mb in (5), and by implementing the proposal of [17].
In [42] (referred to as TOSAM in the table), the authors devise the usage of a dynamic segmentation to perform multiplication with good precision and optimized power and area. Here, the multiplication is revisited as a multiply-and-add operation, with the multiplicands truncated on h bits after the leading one. The addends are also truncated since h + 4 LSBs are discarded. The work [43] (referred to as DRUM in the table) explores the dynamic segmentation selecting a segment of k bits (comprising the leading one bit and a correction bit at the least significant position) for the multiplication. Then, a barrel shifter allows us to extend the result on the desired number of bits. The HDL description of the DRUM multiplier [43] is available in [46]. The technique of [22] (referred to as DATE17 in the table) organizes the PPM in groups of L rows. Then, the rows of each group are compressed by means of L-input OR gates. In [17] (referred to as AFMB in the table), a modified version of the Mitchell algorithm is employed to compute the product of (3), with the input signals truncated on t bits. Here, since the leading one bit always corresponds to the implicit bit, no leading one detectors and barrel shifters are used, with beneficial improvements on power and area.
As shown in the table, the proposed SSFPM and cSSFPM are competitive with the state-of-the-art. The cSSFPM implementations perform better than [22] with L = 4 and L = 6 and slightly outperform [42,43], exhibiting NMED and MRED of 1.37 × 10^−6 and 3.48 × 10^−4 (m = 16), and 6.79 × 10^−7 and 1.73 × 10^−4 (m = 18). In the case m = 18, the SSFPM also performs better than [42,43], with NMED = 1.57 × 10^−6 and MRED = 4.05 × 10^−4. The FPM [22] with L = 2 shows the best accuracy, with NMED = 5.30 × 10^−8 and MRED = 1.35 × 10^−5. On the other hand, the implementations of [17] exhibit the worst performance, with NMED around 10^−4 to 10^−3 and MRED that ranges between 2.5 × 10^−2 and 2.9 × 10^−1.
It is also worth noting that the correction technique allows us to reduce the NmaxED since it lowers the maximum error of the segmented multiplier.

3.2. Electrical Performances

We synthesize the proposed segmented floating-point multipliers and the state-of-the-art with a physical flow in TSMC 28 nm CMOS technology using Cadence Genus, with a clock period Tclk of 500 ps and a standard threshold voltage cell library. Furthermore, the exact FPM described in Section 2.2 is implemented for reference. In the implementation of FPMs, pipeline levels are often inserted to shorten the critical path. In our case, we employ a single pipeline level between the arithmetic stage and the normalization logic in all investigated FPMs.
The power and area are obtained from post-synthesis analyses, with power consumption computed by simulating the synthesized netlist at 1 GHz. To this aim, SDF and TCF format files are employed to annotate the path delays and the toggle activity of the signals. In addition, we also study the minimum clock period employable for each FPM, corresponding to the minimum clock period that ensures positive slack.
As shown in Table 3, the proposed SSFPM and cSSFPM are also competitive from a hardware point of view, exhibiting remarkable results with area and power reductions of up to 82.3% and 85.5% with SSFPM and m = 12. The corrected circuits exhibit only a slight worsening of the performance with respect to the uncorrected ones, with degradations of area and power in the range of 1–3%.
The minimum Tclk improves with respect to the exact implementation, with the multiplier m = 12 that exhibits the best speed.
The FPMs with [22,42,43] show poorer performance, with area improvement limited to 47.4%, 50.2%, and 60%, respectively, and power saving up to 55.1%, 50.9% and 65%. In particular, [22] with L = 2, which offers the best accuracy, exhibits limited improvements, with area and power reductions of 35.2% and 47.5%. The minimum Tclk worsens with [42,43], increasing up to 25% and 35%, respectively. On the contrary, the minimum Tclk improves up to 30% in the case of [22]. The work [17] shows the best hardware performance, with area reduction around 94% and power saving up to 95.7%. Moreover, the minimum Tclk exhibits the best improvement (up to 70% with t = 18). These results are due to the realization of the multiplication by means of a truncated adder in the logarithmic number system. However, by looking at the data of Table 2, we can conclude that these electrical features are achieved at the price of an accuracy loss.

3.3. Image Processing Applications

3.3.1. Image Filtering

We verify the validity of our proposal by exploring the performances of the segmented FPMs in an image filtering application. The image filtering performs the following operation on the input image I:
$$I_{filtered}(i,j) = \sum_{p=-d}^{d} \sum_{q=-d}^{d} I(i+p,\, j+q) \cdot h(p+d+1,\, q+d+1) \tag{37}$$
where h is the kernel matrix of the filter, with size (2d + 1) × (2d + 1).
In this example, we consider a smoothing application with a Gaussian kernel of size 5 × 5 and standard deviation 2, whose coefficients are floating-point numbers with values normalized to 1. The Matlab command fspecial('gaussian', 5, 2) allows us to obtain the filter mask. Then, the products in (37) are realized by means of our approximate FPMs and the state-of-the-art ones.
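The filtering step can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `gaussian_kernel` is our own helper intended to mirror a normalized 5 × 5 Gaussian mask with standard deviation 2, border pixels are replicated (the text does not state the border policy), and the `multiply` argument is a hypothetical hook where a software model of an approximate FPM could be plugged in. Indices are 0-based here.

```python
import math


def gaussian_kernel(size: int = 5, sigma: float = 2.0):
    """Normalized Gaussian mask whose coefficients sum to 1."""
    d = size // 2
    raw = [[math.exp(-(p * p + q * q) / (2.0 * sigma * sigma))
            for q in range(-d, d + 1)] for p in range(-d, d + 1)]
    total = sum(map(sum, raw))
    return [[v / total for v in row] for row in raw]


def filter_image(image, kernel, multiply=lambda a, b: a * b):
    """Weighted sum over a (2d+1) x (2d+1) window around each pixel.

    Border pixels are replicated (an assumption made for this sketch).
    """
    rows, cols = len(image), len(image[0])
    d = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for p in range(-d, d + 1):
                for q in range(-d, d + 1):
                    ii = min(max(i + p, 0), rows - 1)
                    jj = min(max(j + q, 0), cols - 1)
                    # Each product is where an approximate FPM would be used.
                    acc += multiply(image[ii][jj], kernel[p + d][q + d])
            out[i][j] = acc
    return out


kernel = gaussian_kernel()
flat = [[100.0] * 6 for _ in range(6)]
smoothed = filter_image(flat, kernel)
```

Since the mask is normalized, a constant image is left unchanged by the exact filter, which makes it a convenient sanity check before swapping in an approximate `multiply`.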
As a second example, we analyze the performances in edge detection. In this case, the gradient G of the original image is computed to highlight its edges. To this aim, the Sobel kernel hSobel and its transpose hTSobel are used to compute the x and the y component of G (named Gx and Gy respectively). Then, the gradient G is computed as follows:
$$G = \sqrt{G_x^2 + G_y^2} \tag{38}$$
In our trial, we use the following Sobel kernel:
$$h_{Sobel} = \begin{pmatrix} 1 & 2 & 1 \\ 0 & 0 & 0 \\ -1 & -2 & -1 \end{pmatrix} \tag{39}$$
and use the investigated FPMs to implement the multiplications in (38) and (39). Table 4 collects the results in terms of structural similarity index measure (SSIM) and peak signal-to-noise ratio (PSNR, in dB), obtained by filtering three images (Lena, Cameraman, and Lady). For each example, we average the SSIM and the PSNR obtained by processing the test images, and we report the overall average SSIM and PSNR as a summary parameter in the last column of the table.
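The gradient computation can be sketched as follows. The sign convention of the Sobel kernel, the border replication, and the `multiply` hook (a placeholder for a software model of an approximate FPM) are assumptions of this sketch; the multiplications appear both in the kernel products and in the squares of Gx and Gy.

```python
import math

H_SOBEL = [[1, 2, 1], [0, 0, 0], [-1, -2, -1]]    # sign convention assumed
H_SOBEL_T = [list(col) for col in zip(*H_SOBEL)]  # transpose of the kernel


def exact_mul(a, b):
    return a * b


def apply_3x3(image, kernel, multiply):
    """Apply a 3x3 kernel with replicated borders (assumption of this sketch)."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for i in range(rows):
        for j in range(cols):
            acc = 0.0
            for p in (-1, 0, 1):
                for q in (-1, 0, 1):
                    ii = min(max(i + p, 0), rows - 1)
                    jj = min(max(j + q, 0), cols - 1)
                    acc += multiply(image[ii][jj], kernel[p + 1][q + 1])
            out[i][j] = acc
    return out


def gradient_magnitude(image, multiply=exact_mul):
    """G = sqrt(Gx^2 + Gy^2), with every product routed through `multiply`."""
    gx = apply_3x3(image, H_SOBEL, multiply)
    gy = apply_3x3(image, H_SOBEL_T, multiply)
    return [[math.sqrt(multiply(x, x) + multiply(y, y))
             for x, y in zip(row_x, row_y)]
            for row_x, row_y in zip(gx, gy)]


# Horizontal ramp: the intensity grows by 1 per column.
ramp = [[float(j) for j in range(5)] for _ in range(5)]
print(gradient_magnitude(ramp)[2][2])  # 8.0
```

Because the Sobel coefficients are 0, ±1 and ±2, the kernel products only change the sign or the exponent of a floating-point operand, so the kernel stage is barely affected by a mantissa approximation.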
As shown in Table 4, the segmented FPMs achieve SSIM very close to 1 and PSNR up to 70 dB with Gaussian filtering, and produce the exact result with m > 12 in the edge detection. The correction technique allows us to improve the PSNR, with a maximum increment of 5.4 dB for m = 14 in the smoothing application.
Moreover, the implementations based on [42,43] and on [22] with L = 4, 6 achieve good results, with SSIM very close to 1 and PSNR values up to 63.1 dB in the Gaussian case. Similarly, the PSNR exceeds 60 dB with [42,43] k = 6, and [22] L = 4 in the edge detection. The design [22] with L = 2 offers the best results on average, whereas the implementation [17] shows the worst performances due to the stronger approximation, with a PSNR up to 37.6 dB on average (case t = 14).
Figure 9 shows the results of the edge detection using our SSFPMs with m = 12, 18 and the implementations from [17]. We report the negative of G to better highlight the computed edges. The images obtained with the proposed multipliers are practically unchanged with respect to the exact one (as also expected from the high values of SSIM and PSNR). The results from [17] depend on the truncation t, with a noticeable degradation of the detection in the background with t = 18 (again confirmed by the lower values of SSIM and PSNR).

3.3.2. JPEG Compression

JPEG compression leverages the limits of human vision to reduce the bit volume of images. The compression algorithm exploits the discrete cosine transform (DCT), applied to disjoint blocks of 8 × 8 pixels, and quantizes the transformed image with a variable resolution. The lower-frequency components, more visible to the human eye, are approximated with a finer quantization step, whereas the high-frequency components, less appreciable to the human eye, are approximated with a coarser quantization step.
In addition, a quality factor Q, defined in the range [0, 100], allows us to further modify the quantization accuracy and, as a consequence, the compression, with Q = 0 implying the strongest compression and Q = 100 the lightest. Then, the quantized transformed image is brought back to the original domain by means of the inverse discrete cosine transform (iDCT). In the algorithm, the DCT and the iDCT require multiplications between real numbers and are therefore suitable to verify the validity of our proposal in a concrete scenario.
For the performance assessment, we approximate the DCT and the iDCT by using the proposed segmented FPMs and the state-of-the-art ones, considering the cases Q = 40, Q = 70, and Q = 100.
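A minimal sketch of the DCT, quantization, and iDCT chain on one 8 × 8 block is given below. Several choices are assumptions made only for illustration: the base quantization steps (8 + 4(u + v)) are hypothetical and not a standard JPEG table, the mapping from Q to a scale factor follows the widely used IJG-style convention (with Q clamped to [1, 100]), the usual level shift by 128 is omitted, and `multiply` is a placeholder for a software model of an approximate FPM.

```python
import math

N = 8
# Orthonormal DCT-II matrix: C[u][x]
C = [[(math.sqrt(1 / N) if u == 0 else math.sqrt(2 / N))
      * math.cos((2 * x + 1) * u * math.pi / (2 * N))
      for x in range(N)] for u in range(N)]


def exact_mul(a, b):
    return a * b


def matmul(a, b, multiply=exact_mul):
    """8x8 matrix product with every scalar product routed through `multiply`."""
    return [[sum(multiply(a[i][k], b[k][j]) for k in range(N))
             for j in range(N)] for i in range(N)]


def transpose(a):
    return [list(row) for row in zip(*a)]


def quant_table(q: int):
    """Quantization steps: finer at low frequencies, coarser at high ones."""
    q = min(max(q, 1), 100)                       # Q = 0 is clamped to 1 here
    scale = 5000 / q if q < 50 else 200 - 2 * q   # IJG-style scaling (assumed)
    return [[max(1, math.floor(((8 + 4 * (u + v)) * scale + 50) / 100))
             for v in range(N)] for u in range(N)]


def compress_block(block, q, multiply=exact_mul):
    """DCT -> quantization -> iDCT of a single 8x8 block."""
    ct = transpose(C)
    coeffs = matmul(matmul(C, block, multiply), ct, multiply)      # DCT
    table = quant_table(q)
    quantized = [[round(coeffs[u][v] / table[u][v]) * table[u][v]
                  for v in range(N)] for u in range(N)]
    return matmul(matmul(ct, quantized, multiply), C, multiply)    # iDCT


flat_block = [[128.0] * N for _ in range(N)]
restored = compress_block(flat_block, q=100)
```

With Q = 100 all steps collapse to 1, so a constant block is reconstructed almost exactly; lowering Q enlarges the steps, most of all for the high-frequency coefficients.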
Table 5 collects the results, again expressed in terms of SSIM and PSNR, obtained by compressing three grayscale images: Lena, Cameraman, and Peppers (SSIM and PSNR are computed relative to the image compression performed with the exact multiplier). In this case, we also report the mean SSIM and the mean PSNR for each Q, obtained by processing the three images, and indicate the overall average SSIM and PSNR in the last column of the table. Figure 10 reports the Peppers image compressed with the segmented multipliers for Q = 40.
As can be observed, our segmented multipliers again ensure an SSIM very close to 1 in all cases, and a PSNR that ranges between 47 dB and 63 dB on average. Increasing m improves the quality of results, while the correction technique leads to a remarkable PSNR increment, especially for small values of m (up to +8 dB in the case m = 12, Q = 100). In addition, performances are almost constant with respect to the quality factor Q. The designs based on [42,43] and on [22] with L = 4, 6 exhibit lower accuracy, with the average PSNR limited to 53 dB, whereas [22] with L = 2 achieves the best accuracy, with a PSNR of 71.4 dB on average.
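For reference, the PSNR between an image processed with the exact multiplier and one processed with an approximate multiplier can be computed as in the sketch below. The peak value of 255 assumes 8-bit grayscale images, and the function name is ours; identical images give an infinite PSNR.

```python
import math


def psnr(reference, test, peak: float = 255.0) -> float:
    """PSNR in dB between two equally sized grayscale images (lists of rows)."""
    squared_error = 0.0
    count = 0
    for ref_row, test_row in zip(reference, test):
        for r, t in zip(ref_row, test_row):
            squared_error += (r - t) ** 2
            count += 1
    mse = squared_error / count
    if mse == 0:
        return math.inf          # identical images
    return 10.0 * math.log10(peak * peak / mse)


exact_img = [[10, 20], [30, 40]]
approx_img = [[11, 21], [31, 41]]   # every pixel off by one grey level
print(round(psnr(exact_img, approx_img), 2))  # 48.13
```

An error of one grey level on every pixel already corresponds to about 48 dB, which gives a feeling for the 47–63 dB range discussed in the text.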

3.3.3. Tone Mapping of HDR Images

As a last example, we employ the investigated multipliers for a tone mapping application on HDR images. An HDR image exploits floating-point pixels to represent a high dynamic range of luminance. A mapping operation is needed to properly adapt the high dynamic range of luminance to a lower range of values, whenever requested by the application. The algorithm devised in [44] exploits the following formula to perform tone mapping:
$$L_{mapped}(i,j) = \frac{L(i,j)}{1 + L(i,j)} \tag{40}$$
The L(i,j) is a pixel of the luminance matrix of the image, obtained by executing the following steps:
$$\begin{aligned} L_{tmp}(i,j) &= 0.27 \cdot R(i,j) + 0.67 \cdot G(i,j) + 0.06 \cdot B(i,j) \\ L_m &= \exp\left(\frac{1}{N} \sum_{i,j} \log\left(L_{tmp}(i,j)\right)\right) \\ L(i,j) &= \frac{\beta}{L_m} \cdot L_{tmp}(i,j) \end{aligned} \tag{41}$$
where R, G, B are the three channels of the input HDR image, N is the number of pixels in a channel, Lm is the geometric mean of Ltmp, and β is a value in the range [0, 1].
Applying (40) allows us to properly scale the luminance, since large values of L(i,j) are normalized to 1, whereas small values of L(i,j) are practically unmodified. Then, the channels of the original image are weighted by L(i,j) as follows:
$$\begin{aligned} R_{out}(i,j) &= [L(i,j) \cdot R(i,j)] / L_m \\ G_{out}(i,j) &= [L(i,j) \cdot G(i,j)] / L_m \\ B_{out}(i,j) &= [L(i,j) \cdot B(i,j)] / L_m \end{aligned} \tag{42}$$
and quantized to integer values in the range [0, 255].
As in the previous demonstrations, the approximate tone mapping is obtained using the approximate FPMs in the multiplications in (41) and (42). In our trials, we set β = 0.5.
Table 6 collects the results of the comparison between the exact and the approximate algorithms, again expressed in terms of SSIM and PSNR; the processed HDR images are Oxford Church, Office, and Bottles_small. Figure 11 depicts the result for the Bottles_small image.
In this case, the SSIM achieved with the segmented multipliers is also very close to 1, and the PSNR ranges between 46 dB and 64 dB on average. The results are comparable with [42,43], whereas the implementations of [17] and of [22] with L = 6 exhibit lower performances (with PSNR up to 32.6 dB on average).
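The luminance steps and the mapping of (40) can be sketched as below. This covers only the luminance computation and its compression, not the final channel weighting and quantization. The function name, the flat-list pixel layout, and the `multiply` hook (a placeholder for a software model of an approximate FPM) are assumptions of this sketch, and the luminance is assumed strictly positive so that the logarithm is defined.

```python
import math


def tone_map_luminance(r, g, b, beta: float = 0.5,
                       multiply=lambda x, y: x * y):
    """Return (L, L_mapped) for flat lists of R, G, B pixel values."""
    # Weighted sum of the three channels.
    l_tmp = [multiply(0.27, rv) + multiply(0.67, gv) + multiply(0.06, bv)
             for rv, gv, bv in zip(r, g, b)]
    # Geometric mean of the temporary luminance (requires l_tmp > 0).
    l_m = math.exp(sum(math.log(v) for v in l_tmp) / len(l_tmp))
    # Scaled luminance and its compressed version L / (1 + L).
    scale = beta / l_m
    lum = [multiply(scale, v) for v in l_tmp]
    lum_mapped = [v / (1.0 + v) for v in lum]
    return lum, lum_mapped


# Two pixels with very different brightness (grey, so R = G = B).
channel = [1.0e6, 1.0]
lum, lum_mapped = tone_map_luminance(channel, channel, channel)
```

In this example the bright pixel has L = 500 and is mapped just below 1, while the dark pixel has L = 0.0005 and is left almost unchanged, matching the behavior described for (40).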

4. Discussion

The static segmentation applied to the MAA operation of (5) allows us to reduce power, delay, and area of the FPM while preserving remarkable accuracy performances. This is mainly due to the reduction in the input bit-width in the MAA unit. In addition, the proposed shift-and-truncate technique allows us to realize a fused PPM for the MAA unit with a unique CPA, with beneficial effects on the hardware performances.
At the same time, the SSFPM exhibits very good accuracy since (i) the approximation is applied only to the mantissa computation and (ii) the employed approximation, based on static segmentation of the fixed-point multiplier needed in mantissa computation, provides a small relative error. Indeed, the SSM approach introduces an error only when large values are represented, whereas small values are not approximated. This leads to good error performances that are suitable for the implementation of a floating-point multiplier.
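The behavior described above (no error on small operands, a bounded relative error on large ones) can be illustrated with a generic static segmentation scheme on unsigned integers, in the spirit of [41]. This is a simplified sketch, not the segment-and-truncate circuit proposed here: the function names are ours, the final shift is kept explicit, and no correction term is applied.

```python
def segment(x: int, n: int, m: int):
    """Pick an m-bit segment of an n-bit operand and the shift it implies.

    If the upper n - m bits are zero, the low m bits hold the exact value.
    Otherwise the m most significant bits are kept and the rest is dropped.
    """
    if x >> m == 0:
        return x, 0
    return x >> (n - m), n - m


def ssm_multiply(a: int, b: int, n: int = 8, m: int = 6) -> int:
    """Approximate a * b using an m x m multiplier on the selected segments."""
    seg_a, shift_a = segment(a, n, m)
    seg_b, shift_b = segment(b, n, m)
    return (seg_a * seg_b) << (shift_a + shift_b)


print(ssm_multiply(13, 27), 13 * 27)    # 351 351   (small operands: exact)
print(ssm_multiply(203, 27), 203 * 27)  # 5400 5481 (large operand: truncated)
```

In the second case the two dropped bits of 203 cause an underestimate of about 1.5%. Since dropping bits can only lower the result, the error is one-sided, which is why a compensation term such as the one used in the cSSFPM can reduce it.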
The multipliers that exploit [42,43] benefit from dynamic segmentation to approximate the product Ma·Mb in (5). These solutions achieve satisfactory error performances, as also demonstrated in the image processing applications: the error metrics are comparable with those of the SSFPM, and the SSIM and PSNR values confirm the quality of these solutions. On the other hand, shifters are required between the multiplier and the adder of the MAA, thus implying the usage of two CPAs. This worsens the hardware performances with respect to the SSFPM, as demonstrated by the lower power and area savings, limited to 55% and 50%, respectively. Furthermore, the minimum Tclk is larger due to the shifters placed between the multiplier and the adder of the MAA unit.
The FPMs with the approximation of [22] exhibit performances that strongly depend on L, with the accuracy worsening as L increases. Area and power reductions are limited in the case L = 2 (35.2% and 47.5%, respectively), whereas an improvement is registered with L = 4, 6 (up to 59.7% and 65.0%, respectively) at the cost of precision.
The proposal of [17] exhibits the best hardware performances, with area and power savings both exceeding 90%. These improvements are due to the usage of an adder in place of a multiplier for the realization of the mantissa product. In addition, leading one detectors and barrel shifters are not used in this case, since the position of the implicit bit is always known. However, approximating the product with a logarithmic sum leads to a larger error, due to the logarithm approximation. In addition, the adder is also truncated for further optimization, with a consequent accuracy loss.
As part of a joint assessment of hardware performances and accuracy, we show the power and the area saving versus the NMED and MRED of each FPM in Figure 12. The black dotted line indicates the Pareto front. For NMED < 10⁻⁵, the proposed SSFPMs outperform [42,43] and [22] with L = 4, 6, exhibiting a power saving greater than 70% and an area improvement larger than 60%. The cSSFPMs also define the Pareto front in that region of the graph. Similar observations also apply to the case MRED < 10⁻². The figures show again that [17] performs better from a hardware point of view, but at the cost of a loss of accuracy (NMED > 8 × 10⁻⁵ and MRED > 2 × 10⁻²). Similarly, [22] with L = 2 has the best accuracy, but at the cost of a degradation of power and area.
Therefore, we can conclude that our proposal offers the best trade-off between hardware improvement and accuracy of results.
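The trade-off between power saving and NMED can be reproduced from the tabulated data with a simple non-dominated filter. In the sketch below the NMED values are copied from Table 2 and the power savings from Table 3; the function name and the tuple layout are ours, and the resulting list is only what the tabulated numbers imply, so it may differ in detail from the front drawn in Figure 12.

```python
# (design, NMED, power saving in %) copied from Tables 2 and 3.
DESIGNS = [
    ("TOSAM h=2", 7.79e-6, 55.1), ("TOSAM h=3", 3.91e-6, 40.5),
    ("DRUM k=4", 2.08e-5, 50.9), ("DRUM k=6", 5.20e-6, 17.9),
    ("AFMB t=14", 9.01e-5, 93.7), ("AFMB t=16", 9.10e-5, 94.5),
    ("AFMB t=18", 1.02e-3, 95.7),
    ("DATE17 L=2", 5.30e-8, 47.5), ("DATE17 L=4", 1.03e-5, 58.3),
    ("DATE17 L=6", 5.28e-5, 65.0),
    ("SSFPM m=12", 1.32e-5, 85.5), ("cSSFPM m=12", 5.67e-6, 84.2),
    ("SSFPM m=14", 6.52e-6, 82.5), ("cSSFPM m=14", 2.78e-6, 81.5),
    ("SSFPM m=16", 3.20e-6, 79.3), ("cSSFPM m=16", 1.37e-6, 79.0),
    ("SSFPM m=18", 1.57e-6, 76.2), ("cSSFPM m=18", 6.79e-7, 72.6),
]


def pareto_front(designs):
    """Designs for which no other design has both lower NMED and higher saving."""
    front = []
    best_saving = float("-inf")
    # Scan by increasing error; keep a design only if it saves more power
    # than every design that is at least as accurate.
    for name, _nmed, saving in sorted(designs, key=lambda d: (d[1], -d[2])):
        if saving > best_saving:
            front.append(name)
            best_saving = saving
    return front


print(pareto_front(DESIGNS))
```

With these numbers the front runs from [22] with L = 2 (most accurate), through the cSSFPMs and the SSFPM with m = 12, to the three configurations of [17] (largest saving), consistent with the discussion of Figure 12.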

5. Conclusions

In this paper we propose a novel low-power approximate floating-point multiplier, based on static segmentation. To optimize hardware performances, the mantissa product is first revisited as a multiply-and-add operation. In this way the implicit bit is excluded from the computation to reduce the complexity of the multiplier, and additive logic is introduced to recover the exact result.
Then, a segmentation scheme is applied to the mantissas, to reduce the size of the multiplier. The proposed technique leverages a segment-and-truncate approach to eliminate the left-shift operation at the output of the multiplier. In this way, we can realize the mantissa multiplier by means of a fused partial product matrix and a unique carry-propagate adder, with beneficial effects on the hardware performances. In addition, a correction term is introduced, to reduce the approximation error due to the segmentation. The accuracy of the circuit can be tailored at design time by acting on a single parameter, m.
Analysis of the error metrics shows that the proposed floating-point multiplier is competitive with the state-of-the-art for values of m in the range 12–18 (in the considered case of single-precision floating-point format). Syntheses in 28 nm CMOS reveal a remarkable reduction in power consumption and area, with the best results achieved with m = 12 (up to 85% of power saving and up to 82% of area reduction compared to the exact floating-point multiplier).
Implementations of several image processing algorithms (JPEG compression, image filtering, tone mapping of HDR images) show the effectiveness of the proposed architecture in real applications.
By a joint analysis of electrical performances and error metrics, the proposed approximate floating-point multiplier outperforms the state-of-the-art, exhibiting the best trade-off between hardware improvements and quality of results.

Author Contributions

Conceptualization, G.D.M., G.S. and A.G.M.S.; methodology, G.D.M. and G.S.; software, G.D.M. and G.S.; validation, G.D.M., G.S. and A.G.M.S.; formal analysis, G.D.M., G.S. and A.G.M.S.; investigation, G.D.M., G.S. and A.G.M.S.; data curation, G.D.M. and G.S.; writing—original draft preparation, G.D.M., G.S., A.G.M.S. and D.D.C.; writing—review and editing, A.G.M.S., D.D.C. and N.P.; visualization, G.D.M. and G.S.; supervision, A.G.M.S., D.D.C. and N.P.; project administration, A.G.M.S. and D.D.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The Verilog code is available on GitHub at https://github.com/GenDiMeo/SSFPM (accessed on 18 September 2022).

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Montanari, D.; Castellano, G.; Kargaran, E.; Pini, G.; Tijani, S.; De Caro, D.; Strollo, A.G.M.; Manstretta, D.; Castello, R. An FDD Wireless Diversity Receiver with Transmitter Leakage Cancellation in Transmit and Receive Bands. IEEE J. Solid-State Circuits 2018, 53, 1945–1959.
  2. Kurzo, Y.; Kristensen, A.T.; Burg, A.; Balatsoukas-Stimming, A. Hardware Implementation of Neural Self-Interference Cancellation. IEEE J. Emerg. Sel. Top. Circuits Syst. 2020, 10, 204–216.
  3. Mookherjee, S.; DeBrunner, L.; DeBrunner, V. A low power radix-2 FFT accelerator for FPGA. In Proceedings of the 2015 49th Asilomar Conference on Signals, Systems and Computers, Pacific Grove, CA, USA, 8–11 November 2015; pp. 447–451.
  4. Wei, X.; Du, G.-M.; Wang, X.; Cao, H.; Hu, S.; Zhang, D.; Li, Z. FPGA Implementation of Hardware Accelerator for Real-time Video Image Edge Detection. In Proceedings of the 2021 IEEE 15th International Conference on Anti-counterfeiting, Security, and Identification (ASID), Xiamen, China, 29–31 October 2021; pp. 16–20.
  5. Xie, G.; Wang, C. An All-Digital PLL for Video Pixel Clock Regeneration Applications. In Proceedings of the 2009 WRI World Congress on Computer Science and Information Engineering, Los Angeles, CA, USA, 31 March–2 April 2009; pp. 392–396.
  6. De Caro, D.; Tessitore, F.; Vai, G.; Imperato, N.; Petra, N.; Napoli, E.; Parrella, C.; Strollo, A.G.M. A 3.3 GHz Spread-Spectrum Clock Generator Supporting Discontinuous Frequency Modulations in 28 nm CMOS. IEEE J. Solid-State Circuits 2015, 50, 2074–2089.
  7. De Caro, D.; Di Meo, G.; Napoli, E.; Petra, N.; Strollo, A.G.M. A 1.45 GHz All-Digital Spread Spectrum Clock Generator in 65nm CMOS for Synchronization-Free SoC Applications. IEEE Trans. Circuits Syst. I Regul. Pap. 2020, 67, 3839–3852.
  8. IEEE Std 754-2019 (Revision of IEEE 754-2008); Microprocessor Standards Committee, IEEE Standard for Floating-Point Arithmetic. IEEE Computer Society: New York, NY, USA, 2019; pp. 1–84.
  9. Tong, J.; Nagle, D.; Rutenbar, R. Reducing power by optimizing the necessary precision/range of floating-point arithmetic. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2000, 8, 273–286.
  10. Han, J.; Orshansky, M. Approximate computing: An emerging paradigm for energy-efficient design. In Proceedings of the 2013 18th IEEE European Test Symposium (ETS), Avignon, France, 27–30 May 2013; pp. 1–6.
  11. Chippa, V.K.; Chakradhar, S.T.; Roy, K.; Raghunathan, A. Analysis and characterization of inherent application resilience for approximate computing. In Proceedings of the 2013 50th ACM/EDAC/IEEE Design Automation Conference (DAC), Austin, TX, USA, 29 May–7 June 2013; pp. 1–9.
  12. Chen, C.; Qian, W.; Imani, M.; Yin, X.; Zhuo, C. PAM: A Piecewise-Linearly-Approximated Floating-Point Multiplier with Unbiasedness and Configurability. IEEE Trans. Comput. 2021, 71, 2473–2486.
  13. Camus, V.; Schlachter, J.; Enz, C.; Gautschi, M.; Gurkaynak, F.K. Approximate 32-bit floating-point unit design with 53% power-area product reduction. In Proceedings of the ESSCIRC Conference 2016: 42nd European Solid-State Circuits Conference, Lausanne, Switzerland, 12–15 September 2016; pp. 465–468.
  14. Imani, M.; Peroni, D.; Rosing, T. CFPU: Configurable floating point multiplier for energy-efficient computing. In Proceedings of the 2017 54th ACM/EDAC/IEEE Design Automation Conference (DAC), Austin, TX, USA, 18–22 June 2017; pp. 1–6.
  15. Imani, M.; Garcia, R.; Gupta, S.; Rosing, T. RMAC: Runtime Configurable Floating Point Multiplier for Approximate Computing. In Proceedings of the International Symposium on Low Power Electronics and Design (ISLPED’ 18), Seattle, WA, USA, 23–25 July 2018; Association for Computing Machinery: New York, NY, USA, 2018; pp. 1–6.
  16. Zhang, H.; Ko, S.-B. Variable-Precision Approximate Floating-Point Multiplier for Efficient Deep Learning Computation. IEEE Trans. Circuits Syst. II Express Briefs 2022, 69, 2503–2507.
  17. Saadat, H.; Bokhari, H.; Parameswaran, S. Minimally Biased Multipliers for Approximate Integer and Floating-Point Multiplication. IEEE Trans. Comput. Des. Integr. Circuits Syst. 2018, 37, 2623–2635.
  18. Petra, N.; De Caro, D.; Garofalo, V.; Napoli, E.; Strollo, A.G.M. Design of Fixed-Width Multipliers With Linear Compensation Function. IEEE Trans. Circuits Syst. I Regul. Pap. 2010, 58, 947–960.
  19. De Caro, D.; Petra, N.; Strollo, A.G.M.; Tessitore, F.; Napoli, E. Fixed-Width Multipliers and Multipliers-Accumulators With Min-Max Approximation Error. IEEE Trans. Circuits Syst. I Regul. Pap. 2013, 60, 2375–2388.
  20. Zervakis, G.; Tsoumanis, K.; Xydis, S.; Soudris, D.; Pekmestzi, K. Design-Efficient Approximate Multiplication Circuits Through Partial Product Perforation. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2016, 24, 3105–3117.
  21. Leon, V.; Zervakis, G.; Xydis, S.; Soudris, D.; Pekmestzi, K. Walking through the Energy-Error Pareto Frontier of Approximate Multipliers. IEEE Micro 2018, 38, 40–49.
  22. Qiqieh, I.; Shafik, R.; Tarawneh, G.; Sokolov, D.; Yakovlev, A. Energy-efficient approximate multiplier design using bit significance-driven logic compression. In Proceedings of the Design, Automation & Test in Europe Conference & Exhibition (DATE), Lausanne, Switzerland, 27–31 March 2017; Volume 2017, pp. 7–12.
  23. Esposito, D.; Strollo, A.G.M.; Alioto, M. Low-power approximate MAC unit. In Proceedings of the 2017 13th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME), Taormina, Italy, 12–15 June 2017; pp. 81–84.
  24. Esposito, D.; Strollo, A.G.M.; Napoli, E.; De Caro, D.; Petra, N. Approximate Multipliers Based on New Approximate Compressors. IEEE Trans. Circuits Syst. I Regul. Pap. 2018, 65, 4169–4182.
  25. Yang, Z.; Han, J.; Lombardi, F. Approximate compressors for error-resilient multiplier design. In Proceedings of the 2015 IEEE International Symposium on Defect and Fault Tolerance in VLSI and Nanotechnology Systems (DFTS), Amherst, MA, USA, 12–14 October 2015; pp. 183–186.
  26. Akbari, O.; Kamal, M.; Afzali-Kusha, A.; Pedram, M. Dual-Quality 4:2 Compressors for Utilizing in Dynamic Accuracy Configurable Multipliers. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2017, 25, 1352–1361.
  27. Ahmadinejad, M.; Moaiyeri, M.H.; Sabetzadeh, F. Energy and area efficient imprecise compressors for approximate multiplication at nanoscale. AEU-Int. J. Electron. Commun. 2019, 110, 152859.
  28. Strollo, A.G.M.; Napoli, E.; De Caro, D.; Petra, N.; Di Meo, G. Comparison and Extension of Approximate 4-2 Compressors for Low-Power Approximate Multipliers. IEEE Trans. Circuits Syst. I Regul. Pap. 2020, 67, 3021–3034.
  29. Strollo, A.G.M.; De Caro, D.; Napoli, E.; Petra, N.; Di Meo, G. Low-Power Approximate Multiplier with Error Recovery using a New Approximate 4-2 Compressor. In Proceedings of the 2020 IEEE International Symposium on Circuits and Systems (ISCAS), Seville, Spain, 12–14 October 2020; pp. 1–4.
  30. Kulkarni, P.; Gupta, P.; Ercegovac, M. Trading Accuracy for Power with an Underdesigned Multiplier Architecture. In Proceedings of the 2011 24th International Conference on VLSI Design, Chennai, India, 2–7 January 2011; pp. 346–351.
  31. Gillani, G.A.; Hanif, M.A.; Verstoep, B.; Gerez, S.H.; Shafique, M.; Kokkeler, A.B.J. MACISH: Designing Approximate MAC Accelerators with Internal-Self-Healing. IEEE Access 2019, 7, 77142–77160.
  32. Ansari, M.S.; Jiang, H.; Cockburn, B.F.; Han, J. Low-Power Approximate Multipliers Using Encoded Partial Products and Approximate Compressors. IEEE J. Emerg. Sel. Top. Circuits Syst. 2018, 8, 404–416.
  33. Waris, H.; Wang, C.; Liu, W.; Han, J.; Lombardi, F. Hybrid Partial Product-Based High-Performance Approximate Recursive Multipliers. IEEE Trans. Emerg. Top. Comput. 2020, 10, 507–513.
  34. Nunziata, I.; Zacharelos, E.; Saggese, G.; Strollo, A.M.G.; Napoli, E. Approximate Recursive Multipliers Using Carry Truncation and Error Compensation. In Proceedings of the 2022 17th Conference on Ph.D. Research in Microelectronics and Electronics (PRIME), Villasimius, Italy, 12–15 June 2022; pp. 137–140.
  35. Zacharelos, E.; Nunziata, I.; Saggese, G.; Strollo, A.G.; Napoli, E. Approximate Recursive Multipliers Using Low Power Building Blocks. IEEE Trans. Emerg. Top. Comput. 2022, 10, 1315–1330.
  36. Kim, M.S.; Del Barrio, A.A.; Hermida, R.; Bagherzadeh, N. Low-power implementation of Mitchell’s approximate logarithmic multiplication for convolutional neural networks. In Proceedings of the 2018 23rd Asia and South Pacific Design Automation Conference (ASP-DAC), Jeju, Korea, 22–25 January 2018; pp. 617–622.
  37. Liu, W.; Xu, J.; Wang, D.; Wang, C.; Montuschi, P.; Lombardi, F. Design and Evaluation of Approximate Logarithmic Multipliers for Low Power Error-Tolerant Applications. IEEE Trans. Circuits Syst. I Regul. Pap. 2018, 65, 2856–2868.
  38. Kim, M.S.; Del Barrio, A.A.; Oliveira, L.T.; Hermida, R.; Bagherzadeh, N. Efficient Mitchell’s Approximate Log Multipliers for Convolutional Neural Networks. IEEE Trans. Comput. 2018, 68, 660–675.
  39. Ansari, M.S.; Cockburn, B.F.; Han, J. An Improved Logarithmic Multiplier for Energy-Efficient Neural Computing. IEEE Trans. Comput. 2021, 70, 614–625.
  40. Narayanamoorthy, S.; Moghaddam, H.A.; Liu, Z.; Park, T.; Kim, N.S. Energy-Efficient Approximate Multiplication for Digital Signal Processing and Classification Applications. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2014, 23, 1180–1184.
  41. Strollo, A.G.M.; Napoli, E.; De Caro, D.; Petra, N.; Saggese, G.; Di Meo, G. Approximate Multipliers Using Static Segmentation: Error Analysis and Improvements. IEEE Trans. Circuits Syst. I Regul. Pap. 2022, 69, 2449–2462.
  42. Vahdat, S.; Kamal, M.; Afzali-Kusha, A.; Pedram, M. TOSAM: An Energy-Efficient Truncation- and Rounding-Based Scalable Approximate Multiplier. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2019, 27, 1161–1173.
  43. Hashemi, S.; Bahar, R.I.; Reda, S. DRUM: A Dynamic Range Unbiased Multiplier for approximate applications. In Proceedings of the 2015 IEEE/ACM International Conference on Computer-Aided Design (ICCAD), Austin, TX, USA, 2–6 November 2015; pp. 418–425.
  44. Kinoshita, Y.; Shiota, S.; Kiya, H. Fast inverse tone mapping with Reinhard’s global operator. In Proceedings of the 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), New Orleans, LA, USA, 5–9 March 2017; pp. 1972–1976.
  45. GitHub. Available online: https://github.com/astrollo/SSM (accessed on 16 February 2020).
  46. GitHub. Available online: https://github.com/scale-lab/DRUM (accessed on 18 April 2020).
Figure 1. Representation of the decimal number −13.140625 according to the IEEE 754 single precision standard. The numbers from 0 to 31 highlight the bit position in the digital string.
Figure 2. Block diagram of the single precision floating-point multiplier.
Figure 3. (a) Block diagram of the improved floating-point multipliers and (b) truth table that links the MSBs of P’ and P. NA stands for not applicable.
Figure 4. Segmentation of the mantissa Ma in the case nm = 8 bits and m = 5 bits.
Figure 5. Segmented MAA that implements Equation (11).
Figure 6. Signal alignment for the mantissa product in (a) the exact case, and the approximate static segmented cases with (b) αMa = 0, αMb = 0, (c) αMa = 1, αMb = 0, and (d) αMa = 1, αMb = 1. In this example, we consider nm = 8 bits and m = 6 bits.
Figure 7. Block diagram of the SSMAA (a) without the correction term and (b) with the correction term (highlighted in red).
Figure 8. Behavior of (a) NMED and (b) MRED as function of m for the SSFPM and the cSSFPM.
Figure 9. Results of the edge detection for the Cameraman image.
Figure 10. JPEG compressed images with multiplications realized by means of proposed SSFPMs. The quality factor is Q = 40.
Figure 11. Example of tone mapping with the proposed segmented multipliers and the state-of-the-art.
Figure 12. Power saving versus (a) NMED and (b) MRED, and area saving versus (c) NMED and (d) MRED.
Table 1. Segmentation scheme applied to Ma, Mb at the inputs of both the multiplier and the adder.
| αMa, αMb | Ma (ssm, mult) | Mb (ssm, mult) | Ma (ssm, add) | Mb (ssm, add) | Final Shift |
|---|---|---|---|---|---|
| 00 | Ma[m−1 : nm/2] | Mb[m−1 : nm/2] | Ma[m−1 : 0] | Mb[m−1 : 0] | nm−nq |
| 01 | Ma[m−1 : nm/2] | Mb[nm−1 : nm−m+nm/2] | Ma[m−1 : nm−m] | Mb[nm−1 : nm−m] | 2nm−m−nq |
| 10 | Ma[nm−1 : nm−m+nm/2] | Mb[m−1 : nm/2] | Ma[nm−1 : nm−m] | Mb[m−1 : nm−m] | 2nm−m−nq |
| 11 | Ma[nm−1 : nm−m+m/2] | Mb[nm−1 : nm−m+m/2] | Ma[nm−1 : nm−m] | Mb[nm−1 : nm−m] | 2nm−m−nq |
Table 2. Error metrics of the approximate FPMs.
| Approximate FPM | NM | NMED | MRED | NmaxED |
|---|---|---|---|---|
| TOSAM [42], h = 2 | 8.13 × 10⁻⁹ | 7.79 × 10⁻⁶ | 1.99 × 10⁻³ | 1.48 × 10⁻² |
| TOSAM [42], h = 3 | −1.93 × 10⁻⁸ | 3.91 × 10⁻⁶ | 1.00 × 10⁻³ | 7.27 × 10⁻³ |
| DRUM [43], k = 4 | −3.99 × 10⁻⁸ | 2.08 × 10⁻⁵ | 5.39 × 10⁻³ | 2.85 × 10⁻² |
| DRUM [43], k = 6 | −7.30 × 10⁻⁹ | 5.20 × 10⁻⁶ | 1.35 × 10⁻³ | 7.31 × 10⁻³ |
| AFMB [17], t = 14 | 3.65 × 10⁻⁷ | 9.01 × 10⁻⁵ | 2.50 × 10⁻² | 4.72 × 10⁻² |
| AFMB [17], t = 16 | 3.28 × 10⁻⁷ | 9.10 × 10⁻⁵ | 2.53 × 10⁻² | 5.30 × 10⁻² |
| AFMB [17], t = 18 | −2.81 × 10⁻⁶ | 1.02 × 10⁻³ | 2.91 × 10⁻¹ | 4.85 × 10⁻¹ |
| DATE17 [22], L = 2 | 2.63 × 10⁻¹⁰ | 5.30 × 10⁻⁸ | 1.35 × 10⁻⁵ | 1.82 × 10⁻⁴ |
| DATE17 [22], L = 4 | −7.94 × 10⁻⁸ | 1.03 × 10⁻⁵ | 2.68 × 10⁻³ | 2.03 × 10⁻² |
| DATE17 [22], L = 6 | −1.93 × 10⁻⁷ | 5.28 × 10⁻⁵ | 1.34 × 10⁻² | 8.70 × 10⁻² |
| SSFPM m = 12 | 4.06 × 10⁻⁸ | 1.32 × 10⁻⁵ | 3.41 × 10⁻³ | 7.55 × 10⁻³ |
| cSSFPM m = 12 | 3.46 × 10⁻⁹ | 5.67 × 10⁻⁶ | 1.45 × 10⁻³ | 4.62 × 10⁻³ |
| SSFPM m = 14 | 1.69 × 10⁻⁸ | 6.52 × 10⁻⁶ | 1.68 × 10⁻³ | 3.80 × 10⁻³ |
| cSSFPM m = 14 | 5.23 × 10⁻⁹ | 2.78 × 10⁻⁶ | 7.08 × 10⁻⁴ | 2.28 × 10⁻³ |
| SSFPM m = 16 | 1.06 × 10⁻⁸ | 3.20 × 10⁻⁶ | 8.28 × 10⁻⁴ | 1.85 × 10⁻³ |
| cSSFPM m = 16 | 8.91 × 10⁻⁹ | 1.37 × 10⁻⁶ | 3.48 × 10⁻⁴ | 1.12 × 10⁻³ |
| SSFPM m = 18 | 6.98 × 10⁻⁹ | 1.57 × 10⁻⁶ | 4.05 × 10⁻⁴ | 9.03 × 10⁻⁴ |
| cSSFPM m = 18 | 2.42 × 10⁻⁹ | 6.79 × 10⁻⁷ | 1.73 × 10⁻⁴ | 5.37 × 10⁻⁴ |
Table 3. Hardware performances of the proposed SSFPM and cSSFPM, and the state-of-the-art.
| FPM | Min Tclk [ps] | Area [µm²] | Power [µW @ 1 GHz] |
|---|---|---|---|
| Exact | 374 | 1753.8 | 2874.3 |
| TOSAM [42], h = 2 | 430 (15.0%) | 922.7 (−47.4%) | 1290.1 (−55.1%) |
| TOSAM [42], h = 3 | 468 (25.1%) | 1226.6 (−30.1%) | 1710.6 (−40.5%) |
| DRUM [43], k = 4 | 447 (19.5%) | 873.9 (−50.2%) | 1411.5 (−50.9%) |
| DRUM [43], k = 6 | 503 (34.5%) | 1320.2 (−24.7%) | 2361.0 (−17.9%) |
| AFMB [17], t = 14 | 144 (−61.5%) | 112.4 (−93.6%) | 179.7 (−93.7%) |
| AFMB [17], t = 16 | 134 (−64.2%) | 99.4 (−94.3%) | 157.9 (−94.5%) |
| AFMB [17], t = 18 | 110 (−70.6%) | 85.2 (−95.1%) | 124.5 (−95.7%) |
| DATE17 [22], L = 2 | 333 (−11.0%) | 1136.3 (−35.2%) | 1508.5 (−47.5%) |
| DATE17 [22], L = 4 | 299 (−20.1%) | 815.2 (−53.5%) | 1199.1 (−58.3%) |
| DATE17 [22], L = 6 | 261 (−30.2%) | 707.0 (−59.7%) | 1005.2 (−65.0%) |
| SSFPM m = 12 | 289 (−22.7%) | 310.5 (−82.3%) | 417.1 (−85.5%) |
| cSSFPM m = 12 | 286 (−23.5%) | 364.8 (−79.2%) | 454.8 (−84.2%) |
| SSFPM m = 14 | 323 (−13.6%) | 392.7 (−77.6%) | 504.1 (−82.5%) |
| cSSFPM m = 14 | 314 (−16.0%) | 437.9 (−75.0%) | 532.4 (−81.5%) |
| SSFPM m = 16 | 333 (−11.0%) | 484.9 (−72.4%) | 594.5 (−79.3%) |
| cSSFPM m = 16 | 337 (−9.9%) | 477.5 (−72.8%) | 603.6 (−79.0%) |
| SSFPM m = 18 | 355 (−5.1%) | 524.7 (−70.1%) | 683.6 (−76.2%) |
| cSSFPM m = 18 | 360 (−3.7%) | 611.4 (−65.2%) | 787.9 (−72.6%) |
Table 4. Accuracy of the proposed segmented FPMs and of the state-of-the-art in image filtering application.
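| Approximate FPM | Gaussian Filter SSIM | Gaussian Filter PSNR [dB] | Edge Detector SSIM | Edge Detector PSNR [dB] | Average SSIM | Average PSNR [dB] |
|---|---|---|---|---|---|---|
| TOSAM [42], h = 2 | 1.000 | 61.7 | 1.000 | 62.4 | 1.000 | 62.1 |
| TOSAM [42], h = 3 | 1.000 | 63.1 | 1.000 | 64.4 | 1.000 | 63.7 |
| DRUM [43], k = 4 | 0.999 | 54.7 | 1.000 | 55.4 | 0.999 | 55.1 |
| DRUM [43], k = 6 | 0.999 | 56.7 | 1.000 | 61.2 | 0.999 | 58.9 |
| AFMB [17], t = 14 | 0.996 | 41.5 | 0.988 | 33.7 | 0.993 | 37.6 |
| AFMB [17], t = 16 | 0.996 | 40.9 | 0.969 | 32.7 | 0.986 | 36.8 |
| AFMB [17], t = 18 | 0.998 | 39.8 | 0.810 | 27.8 | 0.929 | 33.8 |
| DATE17 [22], L = 2 | 1.000 | 91.3 | 1.000 | | 1.000 | |
| DATE17 [22], L = 4 | 0.999 | 56.4 | 1.000 | 63.9 | 0.999 | 60.1 |
| DATE17 [22], L = 6 | 0.998 | 47.3 | 0.999 | 50.4 | 0.999 | 48.9 |
| SSFPM m = 12 | 0.999 | 55.5 | 1.000 | 71.9 | 0.999 | 63.7 |
| cSSFPM m = 12 | 1.000 | 60.4 | 1.000 | 76.4 | 1.000 | 68.4 |
| SSFPM m = 14 | 0.999 | 59.0 | 1.000 | | 1.000 | |
| cSSFPM m = 14 | 1.000 | 64.4 | 1.000 | | 1.000 | |
| SSFPM m = 16 | 1.000 | 62.6 | 1.000 | | 1.000 | |
| cSSFPM m = 16 | 1.000 | 67.0 | 1.000 | | 1.000 | |
| SSFPM m = 18 | 1.000 | 66.1 | 1.000 | | 1.000 | |
| cSSFPM m = 18 | 1.000 | 70.0 | 1.000 | | 1.000 | |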
Table 5. Performances of the proposed SSFPM, cSSFPM and the state-of-the-art in the JPEG compression.
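| Approximate FPM | Q = 40 SSIM | Q = 40 PSNR [dB] | Q = 70 SSIM | Q = 70 PSNR [dB] | Q = 100 SSIM | Q = 100 PSNR [dB] | Average SSIM | Average PSNR [dB] |
|---|---|---|---|---|---|---|---|---|
| TOSAM [42], h = 2 | 0.997 | 48.9 | 0.997 | 49.5 | 0.999 | 53.9 | 0.998 | 50.8 |
| TOSAM [42], h = 3 | 0.998 | 50.8 | 0.998 | 51.5 | 0.999 | 56.1 | 0.999 | 52.8 |
| DRUM [43], k = 4 | 0.992 | 44.8 | 0.993 | 45.6 | 0.997 | 49.2 | 0.994 | 46.5 |
| DRUM [43], k = 6 | 0.998 | 50.0 | 0.998 | 50.9 | 0.999 | 56.1 | 0.998 | 52.3 |
| AFMB [17], t = 14 | 0.973 | 34.5 | 0.980 | 35.0 | 0.986 | 35.7 | 0.980 | 35.1 |
| AFMB [17], t = 16 | 0.972 | 34.4 | 0.979 | 34.9 | 0.985 | 35.6 | 0.979 | 34.9 |
| AFMB [17], t = 18 | 0.960 | 32.7 | 0.968 | 33.0 | 0.974 | 33.6 | 0.967 | 33.1 |
| DATE17 [22], L = 2 | 1.000 | 63.9 | 1.000 | 75.5 | 1.000 | 74.7 | 1.000 | 71.4 |
| DATE17 [22], L = 4 | 0.993 | 43.9 | 0.995 | 44.8 | 0.998 | 47.6 | 0.995 | 45.4 |
| DATE17 [22], L = 6 | 0.970 | 34.2 | 0.975 | 34.6 | 0.982 | 35.5 | 0.976 | 34.7 |
| SSFPM m = 12 | 0.995 | 45.6 | 0.997 | 46.9 | 0.999 | 49.7 | 0.997 | 47.4 |
| cSSFPM m = 12 | 0.999 | 52.9 | 0.999 | 53.0 | 1.000 | 57.8 | 0.999 | 54.6 |
| SSFPM m = 14 | 0.999 | 51.7 | 0.999 | 52.4 | 1.000 | 56.9 | 0.999 | 53.7 |
| cSSFPM m = 14 | 0.999 | 55.4 | 0.999 | 56.7 | 1.000 | 60.2 | 1.000 | 57.4 |
| SSFPM m = 16 | 0.999 | 54.9 | 0.999 | 56.3 | 1.000 | 60.1 | 1.000 | 57.1 |
| cSSFPM m = 16 | 1.000 | 58.1 | 1.000 | 59.3 | 1.000 | 62.5 | 1.000 | 60.0 |
| SSFPM m = 18 | 1.000 | 58.3 | 1.000 | 59.4 | 1.000 | 62.6 | 1.000 | 60.1 |
| cSSFPM m = 18 | 1.000 | 62.0 | 1.000 | 62.9 | 1.000 | 64.8 | 1.000 | 63.2 |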
Table 6. Results for tone mapping of HDR images.
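| Approximate FPM | Bottle_Small SSIM | Bottle_Small PSNR [dB] | Oxford Church SSIM | Oxford Church PSNR [dB] | Office SSIM | Office PSNR [dB] | Average SSIM | Average PSNR [dB] |
|---|---|---|---|---|---|---|---|---|
| TOSAM [42], h = 2 | 1.000 | 51.4 | 0.999 | 50.4 | 0.999 | 48.4 | 0.999 | 50.1 |
| TOSAM [42], h = 3 | 1.000 | 53.5 | 0.999 | 55.4 | 1.000 | 55.4 | 1.000 | 54.8 |
| DRUM [43], k = 4 | 0.999 | 43.9 | 0.996 | 43.5 | 0.998 | 42.1 | 0.998 | 43.1 |
| DRUM [43], k = 6 | 1.000 | 55.3 | 0.999 | 55.5 | 1.000 | 54.3 | 1.000 | 55.0 |
| AFMB [17], t = 14 | 1.000 | 33.6 | 0.962 | 26.2 | 0.988 | 35.3 | 0.981 | 31.7 |
| AFMB [17], t = 16 | 1.000 | 34.4 | 0.962 | 26.3 | 0.988 | 35.4 | 0.982 | 32.1 |
| AFMB [17], t = 18 | 1.000 | 33.3 | 0.949 | 26.5 | 0.976 | 30.2 | 0.972 | 30.0 |
| DATE17 [22], L = 2 | 1.000 | 73.6 | 1.000 | 72.7 | 1.000 | 72.7 | 1.000 | 73.0 |
| DATE17 [22], L = 4 | 1.000 | 45.1 | 0.998 | 45.9 | 0.999 | 45.5 | 0.999 | 45.5 |
| DATE17 [22], L = 6 | 0.995 | 33.0 | 0.981 | 33.0 | 0.987 | 31.9 | 0.988 | 32.6 |
| SSFPM m = 12 | 1.000 | 47.5 | 0.999 | 45.2 | 0.999 | 46.0 | 0.999 | 46.2 |
| cSSFPM m = 12 | 1.000 | 56.1 | 0.999 | 54.2 | 0.999 | 51.3 | 1.000 | 53.9 |
| SSFPM m = 14 | 1.000 | 52.4 | 0.999 | 51.4 | 0.999 | 52.0 | 0.999 | 52.0 |
| cSSFPM m = 14 | 1.000 | 58.1 | 1.000 | 60.7 | 1.000 | 56.9 | 1.000 | 58.6 |
| SSFPM m = 16 | 1.000 | 54.8 | 1.000 | 57.2 | 1.000 | 54.6 | 1.000 | 55.6 |
| cSSFPM m = 16 | 1.000 | 60.6 | 1.000 | 60.5 | 1.000 | 60.1 | 1.000 | 60.4 |
| SSFPM m = 18 | 1.000 | 59.3 | 1.000 | 60.2 | 1.000 | 57.8 | 1.000 | 59.1 |
| cSSFPM m = 18 | 1.000 | 67.1 | 1.000 | 62.5 | 1.000 | 62.9 | 1.000 | 64.2 |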
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Di Meo, G.; Saggese, G.; Strollo, A.G.M.; De Caro, D.; Petra, N. Approximate Floating-Point Multiplier based on Static Segmentation. Electronics 2022, 11, 3005. https://doi.org/10.3390/electronics11193005
