POSIT vs. Floating Point in Implementing IIR Notch Filter by Enhancing Radix-4 Modified Booth Multiplier

Esmaeel, Anwar A.; Abed, Sa’ed; Mohd, Bassam J.; Fairouz, Abbas A.

doi:10.3390/electronics11010163


¹ Computer Engineering Department, Kuwait University, P.O. Box 5969, Safat 13060, Kuwait

² Computer Engineering Department, The Hashemite University, P.O. Box 330127, Zarqa 13133, Jordan

* Author to whom correspondence should be addressed.

Electronics 2022, 11(1), 163; https://doi.org/10.3390/electronics11010163

Submission received: 16 November 2021 / Revised: 30 December 2021 / Accepted: 1 January 2022 / Published: 5 January 2022

(This article belongs to the Topic Advanced Systems Engineering: Theory and Applications)


Abstract

The increased demand for better accuracy and precision and wider data size has strained the current floating point system and motivated the development of the POSIT system. The POSIT system supports flexible formats and tapered precision and provides equivalent accuracy with fewer bits. This paper examines the POSIT and floating point systems, comparing the performance of 32-bit POSIT and 32-bit floating point systems using an IIR notch filter implementation. Given that the bulk of the calculations in the filter are multiplication operations, an Enhanced Radix-4 Modified Booth Multiplier (ERMBM) is implemented to increase the calculation speed and efficiency. ERMBM enhances area, speed, power, and energy compared to the POSIT regular multiplier by 26.80%, 51.97%, 0.54%, and 52.22%, respectively, without affecting the accuracy. Moreover, the Taylor series technique is adopted to implement the division operation along with a cosine arithmetic unit for POSIT numbers. After comparing POSIT with floating point, the accuracy of POSIT is 92.31%, which is better than floating point’s accuracy of 23.08%. Moreover, POSIT reduces area by 21.77% while increasing the delay. However, when the ERMBM is utilized instead of the POSIT regular multiplier in implementing the filter, POSIT outperforms floating point in all the performance metrics including area, speed, power, and energy by 35.68%, 20.66%, 31.49%, and 45.64%, respectively.

Keywords:

POSIT; floating point; infinite impulse response filter; Radix-4 modified Booth multiplier; Taylor series

1. Introduction

The discovery of deep neural networks, an increase in data size, and a high demand for better accuracy and precision mean that the standard floating point (FP) system will not be efficient enough to meet specified requirements. Therefore, modifications are required on the computer’s architecture to improve the performance. One of the modifications is upgrading the numbering system from the current floating point standard (IEEE 754), which has not been modified for three decades.

The bit widths of the three fields in the FP format are fixed, which results in redundant bits in both the mantissa and the exponent. In floating point format, bits are also wasted on exceptions, including NaNs (not a number). This exception case represents illegal mathematical operations, including dividing a number by zero [1]. Therefore, an alternative number representation system called POSIT was introduced by Gustafson in 2017 to address the drawbacks of the floating point system. Unums, which stands for universal numbers, were invented by John Gustafson as an alternative for representing real numbers using a finite number of bits. The latest version of unums is called POSIT or Type III unum. It is considered to be a hardware-friendly format compared to previous versions of unums. Although the Type II unum was invented to resolve the drawbacks of Type I, including the complexity of hardware implementation and the possibility of representing certain values with several different representations, the Type II unum depends heavily on look-up tables for most of its operations. Moreover, implementing the dot product is considered to be expensive in this format. That is why POSIT was invented to resolve all these drawbacks while being hardware friendly. POSIT systems are capable of executing complex mathematical operations and maintaining the precision and accuracy of the result with a smaller number of bits. The exponent in POSIT is of variable length, where the smaller exponents are assigned fewer bits, which introduces the concept of tapered precision [2].

The idea of having an extra field called “regime” makes POSIT unique as it allows for the creation of tapered accuracy. Values with small exponents have better accuracy than very large or very small numbers. Moreover, unlike floating point, POSIT does not have subnormal or denormalized numbers as the hidden bit of its fractional part is always set to 1. Therefore, all POSIT values are normalized and only one rounding mode exists. POSIT has only two special cases reserved, namely 0 and ∞. The POSIT numbering system does not have NaN (not a number) cases [3]. Furthermore, POSIT neither over- nor under-flows, and its implementation is much simpler and smaller compared to floating point [4].

The POSIT numbering system is considered a new approach and currently is not deployed in the arithmetic architectures of existing hardware. The focus of this paper is to demonstrate that the POSIT arithmetic unit outperforms floating point in accuracy, area, speed, power, and energy. As mentioned previously, POSIT provides better accuracy compared with floating point. However, POSIT could potentially increase the execution time. On the one hand, the fixed position of the bits representing the exponent and the fractional portion of floating point facilitates simpler and parallel decoding operations. On the other hand, POSIT requires serialization to determine the regime before the remaining fields can be decoded. Hence, to boost the design speed, a special multiplier called the Enhanced Radix-4 Modified Booth Multiplier (ERMBM) was developed and implemented.

In this paper, the two numbering systems are employed to implement the second-order infinite impulse response notch filter. The comparison focuses on a well-known Digital Signal Processing (DSP) application because the main purpose of DSP is to manage and manipulate information, which makes accuracy and precision fundamental factors. This requirement stems from the general structure of DSP: the sampled analog signal is converted into numbers, certain computations are executed, and the result is converted back to an analog signal. Hence, the accuracy of the filter calculation is a critical design feature. Recent studies focus on enhancing the execution of complex algorithms and functions without increasing power, energy dissipation, or area. As optimizing all these performance metrics is significant to the field of DSP, the paper focuses on enhancing and testing performance by implementing the POSIT numbering system while enhancing its arithmetic units to optimize its performance even more.

The main contributions in our paper are:

(1): Enhancing the POSIT multiplication operation by implementing a special type of multiplier called ERMBM, which significantly enhances the area, speed, and energy compared with the regular multiplier proposed previously in [5].
(2): Implementing the POSIT cosine function along with the division arithmetic unit utilizing a shared hardware that is based on the Taylor series technique to reduce the design area.
(3): Building the second-order IIR notch filter for POSIT and floating point designs. The floating point adder is built as proposed by the work represented in [6], while the FP multiplier is based on [7].
(4): Comparing the performances of POSIT when utilizing a regular multiplier with floating point which results in better accuracy and area enhancement.

The rest of the paper is organized as follows. Section 2 includes the background information. Section 3 reviews previous articles. Section 4 demonstrates the components design. Section 5 summarizes the implementation results along with the discussion. Finally, Section 6 includes the conclusion of this study and recommendations for future work.

2. Background

2.1. POSIT

An N-bit POSIT number is defined by its exponent size. The way the POSIT numbering system is structured is different than the floating point structure. The POSIT number consists of a sign bit, regime bits, exponent bits, and mantissa bits, as shown in Figure 1 [8].

Often, the sign and regime bits will be enough to represent any POSIT number. That is, the existence of the exponent and mantissa bits is not always necessary for representing numbers in POSIT. Therefore, what distinguishes POSIT from floating point is the presence of an extra field.

The sign bit in POSIT is set to 0 if the number is positive or 1 if it is negative [10]. If the number is negative, the 2′s complement of the number must be computed before decoding the remaining fields, including the regime, exponent, and mantissa. Additionally, the regime field consists of a run of identical bits terminated by a bit of the opposite value, so the MSB of this sequence is opposite to its least significant bit. The MSB can be either 0 or 1. If it is 0, then k (the regime value) is determined by calculating the 1′s complement of the field and counting the number of 1′s in the resulting binary string. In this case, the sign of k is negative. For example, if the regime field is equal to 00001, then its 1′s complement is 11110, which contains four 1′s, making k = −4. Equivalently, when the MSB is 0, k can be found by counting the number of 0s in the field. However, if the MSB of the regime field is equal to 1, then k is found by counting the number of 1′s and subtracting one. The value of k in this case is non-negative. For example, if the regime is equal to 11110, then k = 4 − 1 = +3.
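The regime rule described above can be sketched in a few lines of Python. This is only an illustration of the counting rule, not the hardware decoder used in the paper; the function name `regime_value` and the bit-string input format are assumptions made for the example.

```python
def regime_value(regime_bits: str) -> int:
    """Return k for a regime field given as a bit string (terminating bit included).

    Hypothetical helper for illustration: a run of 0s of length r gives k = -r,
    a run of 1s of length r gives k = r - 1.
    """
    if not regime_bits or set(regime_bits) - {"0", "1"}:
        raise ValueError("regime must be a non-empty string of 0s and 1s")
    lead = regime_bits[0]
    # Length of the run of identical leading bits (the terminating bit is excluded).
    run = len(regime_bits) - len(regime_bits.lstrip(lead))
    return -run if lead == "0" else run - 1


print(regime_value("00001"))  # -4, the first example in the text
print(regime_value("11110"))  # 3, the second example in the text
```

Counting the run directly is equivalent to the 1′s complement route in the text: inverting 00001 and counting the 1′s gives the same run length of four.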

In POSIT, the regime value of k is added to the exponent value. That is, instead of having a single field to determine the operational range of a number, POSIT uses two fields, namely the regime field and the exponent field, to form the scale factor. The exponent field is preserved as unsigned numbers while the regime field is treated as signed numbers. The k value of the regime contributes to the overall scale factor by $(2^{2^{e_{bits}}})^{k}$, where $e_{bits}$ represents the width of the exponent. In addition, the contribution of the exponent value to the overall scale factor is by $2^{exp}$, where exp represents the value of the exponent. Both values are multiplied together to construct a single signed number that is called the scale factor, which represents the full operational range, as illustrated in Equation (1).

Scale Factor = \left(2^{2^{e_{bits}}}\right)^{k} \cdot 2^{exp}

(1)

The fraction field is denoted the “mantissa” or “significand.” The mantissa’s format in POSIT is the same as it is in floating point. It is normalized, which means that the stored fraction has a value less than 1 (0.xxx) and follows an implicit leading 1 [8]. The decimal equivalent of a POSIT number can be calculated using Equation (2), while the decimal equivalent of a floating point number can be calculated using Equation (3) [11].

The decimal value of POSIT = (-1)^{s} \cdot \left(2^{2^{e_{bits}}}\right)^{k} \cdot 2^{exp} \cdot 1.mantissa

(2)

The decimal value of FP = (-1)^{s} \cdot 2^{(exp - bias)} \cdot 1.mantissa

(3)

However, sometimes the size of the POSIT number is not enough to include all the exponent bits as per its exponent field size ($e_{bits}$). This is because the width of the regime sequence can be up to N − 1 bits, where N is the size of the POSIT number [5]. When this is the case, the bits that remain for the exponent are treated as its most significant bits and the missing lower bits are treated as 0. For example, when the POSIT number (01111001) is 8 bits wide and the number of bits allocated for the exponent is 4 ($e_{bits}$ = 4), the sign bit is 0 and the regime sequence is equal to 11110. However, since only 2 bits (01) of the exponent part are present in the POSIT number, they are treated as the most significant bits and the rest of the bits are set to 0. Therefore, the value of the exponent is 0100 and the mantissa is equal to (1 + 0.0).
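As a rough software check of Equations (1) and (2) and of the truncated-exponent rule, the following Python sketch decodes a POSIT bit string into an exact rational value. It is a minimal illustration under stated assumptions, not the paper's decoder: the function name `decode_posit`, the bit-string interface, and the use of `fractions.Fraction` for exact arithmetic are all choices made for this example.

```python
from fractions import Fraction


def decode_posit(bits: str, es: int) -> Fraction:
    """Decode an N-bit POSIT given as a bit string; es is the exponent width (e_bits)."""
    n = len(bits)
    if n < 2:
        raise ValueError("a POSIT needs at least a sign bit and one regime bit")
    word = int(bits, 2)
    if word == 0:
        return Fraction(0)
    if word == 1 << (n - 1):
        raise ValueError("the pattern 100...0 is the reserved infinity case")
    sign = word >> (n - 1)
    if sign:
        # Negative numbers are 2's complemented before the fields are decoded.
        word = (-word) & ((1 << n) - 1)
    body = format(word, f"0{n}b")[1:]            # everything after the sign bit
    lead = body[0]
    run = len(body) - len(body.lstrip(lead))     # length of the regime run
    k = -run if lead == "0" else run - 1
    rest = body[run + 1:]                        # skip the terminating regime bit
    exp_bits = rest[:es].ljust(es, "0")          # missing low exponent bits become 0
    frac_bits = rest[es:]
    exp = int(exp_bits, 2) if es else 0
    frac = Fraction(int(frac_bits, 2), 1 << len(frac_bits)) if frac_bits else Fraction(0)
    scale = k * (1 << es) + exp                  # exponent of 2 in Equation (1)
    value = (1 + frac) * Fraction(2) ** scale
    return -value if sign else value


# Example from the text: 01111001 with e_bits = 4 gives k = 3, exponent 0100, mantissa 1.0.
print(decode_posit("01111001", 4) == 2 ** 52)  # True
```

For the example in the text, the scale factor is (2^16)^3 · 2^4 = 2^52, which is what the sketch returns.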

2.2. IIR Filter

In signal processing, filters are used to eliminate some of the signal’s undesired components or to extract certain segments out of a signal. Filters are constructed using digital or analog circuits. In the case of digital filters, mathematical operations are applied to a discrete signal, whereas analog filters operate on continuous-time analog signals. Digital filters can be classified into two main types: the IIR filter and the finite impulse response (FIR) filter. The main feature that distinguishes the two filters is the nature of the impulse response; the response of the FIR filter is of finite duration, unlike the response of the IIR filter, which is infinite.

Two essential input signals are used to evaluate a system in control systems: the step and the impulse. The impulse response, also referred to as the impulse response function (IRF), is used to assess the response of a system to all frequency components at the same magnitude.

Filters can be categorized based on the mechanism that is used for the selection of various frequency bands from a signal. For example, the notch filter is a band-stop filter with a narrow stopband and is used in applications that require the exclusion of certain frequency components. Notch filters include a high-bandwidth type (FIR filter) and a low-bandwidth type (IIR filter). The main difference between the two is that the FIR filter can affect a range of frequencies, attenuating the frequencies around the anticipated null, whereas the IIR filter eliminates only the frequency of interest. Since the IIR filter keeps the broadband signal unchanged while eliminating the narrowband interference signal, it is considered to be an essential and critical block in communication systems. A detailed comparison between FIR and IIR filters is given in Table 1.

3. Related Work

The main aim of this paper is to compare POSIT and floating point numbering systems and examine the performance of the IIR filter implementation. Consequently, the literature review is divided into two main categories: IIR filter and the POSIT numbering system.

3.1. IIR Filter

Numerous published articles have examined IIR implementation and addressed various aspects such as design, filter order, and fixed point representation.

Ushanandhini et al. [12] enhanced the performance of the IIR filter by proposing a new technique to optimize the multiplication process utilizing a Radix-4 multiplier to reduce the number of partial products from n to n/2. They also included the idea of pre-charging, aiming to reduce the delay time, increase the electrical units’ lifespan, and enhance the trustworthiness of the system.

A new proposal was developed by Wang et al. [13] where the IIR filter is designed through an enhanced local search operator of the multi-objective optimization evolutionary algorithm (MOEA). This proposal considers the filter order, magnitude phase error, and the linear phase response inaccuracy. The fixed point representation was applied to the LS-MOEA to establish a new form of local search operator-enhanced multi-objective evolutionary algorithm with floating point representation (LS-MOEA-FR), which was tested on multiple types of IIR filter, including LP, HP, BP, and BS.

Chengliang et al. [14] implemented a fifth-order IIR filter in MATLAB using Simulink, and testing showed that an efficient IIR filter can be built this way. The authors also concluded that a higher filter order gives a better filtering result but consumes more computation time. To overcome this issue, the paper recommended using a low-order IIR filter, which is acceptable because an IIR filter can meet the same design requirements at a lower order than an FIR filter.

In Bajwa et al. [15], the authors designed a low-pass IIR filter and proposed a new technique optimizing its architecture to improve precision. Unlike the coding that is seen in most studies, Bajwa et al.’s filter parameters were encoded using a fixed point numbering system. In fact, it is better to go with the fixed point numbering system for real applications as it saves power and, therefore, cost. It also reduces the amount of computational resources that are utilized. The two-state ensemble evolutionary algorithm (TEEA) was used for the optimization process. This algorithm is divided into two phases. The first is the global shrinking phase, which reduces the search range and redirects it to the most favorable area as fast as possible. The second is the local exploration phase, which investigates the constrained region comprehensively, aiming to find a better solution.

The transfer functions of one-dimensional (1-D) and two-dimensional (2-D) filters are included in Stavrou et al. [16]. Two of the most crucial (2-D) filters, including IIR and FIR filters, were designed using different techniques. The IIR filter in this paper was designed using the bilinear transformation method while the FIR filter was implemented using the windows method. The utilized approaches to design the 2-D IIR filter were based on the appropriate 1-D functions along with the appropriate optimization methods.

Software was developed to overcome constrained optimization challenges in Stavrou et al. [17]. The software is based on a modified hybrid genetic algorithm, which includes a series of enhanced genetic operators to maintain the viability of experimental solutions and terminates using a stochastic stopping rule. It involves a global exploration search along with a local optimization process, aiming for faster recognition of the global minimum. The proposed technique was examined on multiple benchmark functions from the literature, and it was utilized to overcome the instability issues that were raised when implementing 2-D IIR filters. The software was written in ANSI C++ and the results were compared with the DONLP2 algorithm.

3.2. POSIT Numbering System

Numerous articles were published examining various aspects of the POSIT system, including targeted application, operations, conversion to/from FP, and the implementation of POSIT adder, subtractor, and multiplier using application-specific integrated circuit (ASIC) and field-programmable gate array (FPGA) technologies.

Langroudi et al. [18] decided to use POSIT for performing a low-precision deep learning inference on embedded devices. According to the paper, reducing the number of bits below 8 when implementing a deep learning inference significantly impacts accuracy. However, the accuracy of the normalized POSIT surpasses other numbering systems when reducing the number of bits. A model was proposed for performing image classification tasks. This model converts the weights from floating point into POSIT for all operations involving memory, including read and write.

A prototype called POSGEN was developed by Podobas et al. [19], which creates any POSIT operation the user requests. The user specifies the operation type, width of the total format, and the exponent width. Although the constructed pipelined hardware covers 32-bit POSIT only, their future work will be dedicated to optimizing other bit-widths. The authors compared the POSIT design of three different operations: addition, subtraction, and multiplication.

In Jaiswal et al. [20], the authors proposed an algorithmic flow along with an architecture for a POSIT adder and subtractor. The implementations of 8-, 16-, and 32-bit adders and subtractors were realized on FPGA and ASIC platforms. The authors compared the POSIT adders with single-precision (SP) FP adders. The POSIT adders show average latency and area compared with several FP adder architectures.

The research work of Jaiswal et al. [21] included the algorithmic flow and the hardware architecture of the POSIT adder, subtractor, and multiplier as well as the conversion of FP to POSIT and vice versa. The first step of converting FP to POSIT is to extract the data of FP while considering the zero and infinity cases and excluding the NaN (not a number) cases. Then, the POSIT fields are constructed to concatenate the sign bit with the REM [N − 1:1]; where REM stands for regime, exponent, and mantissa data construction. Similarly, when converting POSIT to FP, the POSIT inputs should be checked for exceptions, including zero or infinity. The sign bit is next examined to see if it is negative, and a 2′s complement is computed for the input. The equivalent FP number is created after extracting all the POSIT data.

The software and hardware implementations of Type I unum and POSIT were demonstrated by Lehoczky et al. [22]. For both types, the software implementation was built on the .NET platform, written in C#, and released as open source. The authors used Hastlayer, which was run on Visual Studio 2017, to transform the .NET code into VHDL, which was used to configure the FPGA logic gates. The authors implemented Type I and POSIT using a structure called Bitmask. This structure can handle environments of all sizes. They performed addition, subtraction, comparison, and conversion for both numbering systems.

Hou et al. [23] examined the drawbacks of floating point along with POSIT structure. The logic circuit diagrams of the unum adder and multiplier were provided in this work. The result that was acquired by the unum adder displays only the rightmost and leftmost bit based on the value of esize and fsize, respectively. Therefore, when using unum, the authors managed to decrease the bandwidth without affecting the accuracy of the result.

Glaser et al. created an ASIC design with a 128-bit ALU to execute unum operations, including addition and subtraction [24]. The authors also implemented two ubound and unum-specific functions—optimize and unify. The “optimize” function was used to ensure that the intermediate results were not lost, and “unify” was used for external data movement.

A reconfigurable logic of a matrix multiplier was designed by Chen et al. and implemented on an FPGA [25]. The multiplier was constructed for matrices with 32-bit POSIT numbers with es = 2, which represents the exponent size. Chen et al. decided to create a matrix multiplier because it significantly improved memory access and tolerated having a large bandwidth for the input and output. Having a matrix multiplier also boosts the number of FP operations that can be implemented.

From the previous research addressing the IIR digital filter and the POSIT numbering system, it is clear that POSIT implementation is attractive. However, not enough research has been conducted to improve POSIT performance in filter designs. Moreover, in the case of studies showing POSIT numbers, the main focus is on enhancing either the speed, area, or both. That is, limited numbers of performance metrics have been considered. However, this paper focuses on enhancing the performance of POSIT arithmetic units including accuracy, area, speed, power, and energy, which are utilized to implement the second-order IIR notch filter.

This paper focuses on the work of Podobas et al. [19] and Jaiswal et al. [20]. Podobas et al. [19] demonstrated a hardware diagram for three arithmetic operations, including a POSIT adder/subtractor and a POSIT multiplier. The Podobas multiplier is a regular multiplier, while our paper creates a more efficient and faster POSIT multiplier that significantly optimizes speed. Moreover, our paper includes the design of two additional POSIT operations: the division and cosine function. Our paper also implements a second-order infinite impulse response notch filter, which is implemented for both POSIT and floating point numbering systems. Moreover, since the work presented by Jaiswal et al. [20] only proposes a POSIT adder/subtractor, this research builds on it by constructing additional POSIT arithmetic units, including ERMBM and a common block for executing division and cosine functions.

4. Components Design

An overview of the proposed methodology in implementing the second-order infinite impulse response notch filter along with the ERMBM for POSIT numbers is provided in this section.

4.1. Enhanced Radix-4 Modified Booth Multiplier

The modified Booth multiplier consists of three main steps: the modified Booth encoder, the generation of the partial products, and partial product summation [26]. Since this paper constructs a Radix-4 Modified Booth Multiplier for POSIT numbers, it has to operate on unsigned numbers, as it is used to multiply the POSIT fractional parts only. Usually, when adjusting the Radix-4 Modified Booth Multiplier to operate on unsigned numbers, the size of the multiplier’s input and output increases significantly [27]. This is mainly because each n-bit operand has to be sign-extended by a further n bits. In the case of unsigned numbers, the sign bit is equal to 0. However, in this paper, the developed Radix-4 Modified Booth Multiplier operates without changing the size of the multiplier and, therefore, the number of partial products that are generated remains the same as if the two inputs were signed numbers. Only the multiplicand is sign-extended. Moreover, since most of the multiplier’s delay is due to computing the 2′s complement of the multiplicand and adding the grouped partial products together, this paper implements an efficient way to reduce the delay of the 2′s complement computation [28]. Furthermore, it reduces the number of partial products that are added together. The architecture of the modified Booth multiplier that is proposed by this paper consists of the following main steps:

4.1.1. Computing 2′s Complement

The block diagram of computing the 2′s complement is provided in Figure 2. The design groups every two neighboring binary bits in one group. Since the size of the multiplier’s input is 29 bits, as only the mantissas are multiplied together, 29 groups will be created once every two neighboring bits are grouped together. This is followed by calculating the conversion signal for each group. The conversion signal is found by setting to 1 all the bits after (that is, more significant than) the rightmost 1. Then, a conversion signal for each 4-bit group is found, followed by the conversion signal for each 8-bit group. The same approach is repeated until all bits after the rightmost 1 are converted, as proposed by Tripathi et al. [29]. The last two steps of finding the 2′s complement are to shift the final conversion signal one bit to the left and XOR it with the multiplicand.
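The idea behind this step can be checked in software with a short Python sketch. It is a behavioral model only, written under the assumption that the conversion signal is a prefix-OR that spreads the rightmost 1 toward the MSB over spans of 1, 2, 4, 8, ... bits; the function name `twos_complement_prefix` is hypothetical and the code does not reproduce the gate-level grouping of Figure 2.

```python
def twos_complement_prefix(x: int, width: int) -> int:
    """2's complement of a width-bit value without a carry-propagating +1.

    Assumed behavioral model: build the conversion signal with a doubling
    prefix-OR, shift it left by one bit, and XOR it with the input.
    """
    mask = (1 << width) - 1
    x &= mask
    conv = x
    span = 1
    while span < width:
        # After this step every set bit has been spread over 2*span positions.
        conv |= (conv << span) & mask
        span <<= 1
    # conv now has 1s at the rightmost 1 of x and at every position above it.
    # Shifting left by one leaves only the bits strictly above the rightmost 1,
    # and those are exactly the bits that the 2's complement inverts.
    return x ^ ((conv << 1) & mask)


print(format(twos_complement_prefix(0b0110, 4), "04b"))  # 1010
```

The doubling loop needs only about log2(width) steps, which is the reason a scheme of this kind can be faster than inverting and then rippling a +1 through all 29 bits.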

4.1.2. Encoding the Multiplier

When multiplying two numbers, A and B, using a regular multiplier, as proposed in [5], the number of partial products that are created is equal to the number of bits of the multiplier (MR). The Booth encoder is incorporated in this design to reduce the number of partial products and, with it, the multiplication time [30]. In general, the Radix-2ⁱ Booth encoding reduces the number of partial products to (#of MR bits/i). To implement an n-bit multiplier using the Radix-4 Booth encoder, a zero is placed at position 0 and then every 3 bits are grouped together, with adjacent groups overlapping by one bit, into (#of MR bits)/2 groups [31]. If the total number of bits of the MR or the multiplicand (MD) is odd, then the sign bit is extended. In this paper, a 29 × 29-bit Enhanced Radix-4 Modified Booth Multiplier is implemented. Therefore, the bits of the multiplier are grouped into 15 three-bit groups, as shown in Figure 3.

Each group is encoded to either 0y, (±1y) or (±2y), where y represents the multiplicand, as presented in Table 2. Prior to encoding, the MD is concatenated with n zeros, where n is equal to its size, to utilize the Booth algorithm for two unsigned numbers. When a 4-bit MR with a value of 1111 is multiplied by a 4-bit MD with a value of 1111, the operation is executed differently depending on the number representation. If the numbers are signed, the answer is (−1) × (−1) = +1. However, if both numbers are unsigned, the answer is 15 × 15 = 225. A 16-bit unsigned Booth multiplier can be expressed as a 33 × 33-bit signed multiplier, resulting in 17 partial products [27]. The size of the multiplier is equal to 33 × 33 bits, as 16 of those 33 bits are for the operand, another 16 bits are for sign extension, and one bit is the zero that is placed at position 0 for Booth encoding. However, in this paper, a different approach has been adopted to multiply two unsigned numbers while keeping the number of partial products equal to (#of MR bits)/2. The MR’s number of bits does not change except for adding a zero at position 0 and another 0 at position 30 to give the MR an even number of bits. Therefore, the number of partial products remains the same. The conversion of each partial product is done based on the vector values, as shown in Table 2 [32].

Usually, when multiplying two numbers of the same size (e.g., N), the product size is equal to 2N. Since each input of our multiplier is 29 bits wide, the product is 58 bits wide. Therefore, each partial product is 58 bits wide. Before adding the partial products, each partial product is shifted to the left by 2 bits relative to the preceding partial product, as shown in Figure 4. The final step of the multiplication process is to add the partial products together to form the final product, which in this paper is 58 bits wide.

Figure 5 illustrates an example, along with the steps, in which the proposed ERMBM multiplies two mantissas together, assuming that both mantissas are 7 bits wide. In this example, the first mantissa has the value of $62_{10}$, while the mantissa of the second input has a value of $102_{10}$. Moreover, Figure 6 represents the general architecture of the proposed Radix-4 Modified Booth Multiplier that was used to multiply the two mantissas.
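The encoding and summation steps can be illustrated with a small behavioral model in Python. This is a sketch of textbook Radix-4 Booth recoding applied to unsigned operands, not the ERMBM hardware itself: the function name `booth_radix4_unsigned`, the digit table layout, and the use of Python's unbounded integers in place of 58-bit sign-extended partial products are assumptions made for the example.

```python
# Radix-4 Booth digit for each overlapping 3-bit group (b[2i+1], b[2i], b[2i-1]).
BOOTH_DIGIT = {
    0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
    0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0,
}


def booth_radix4_unsigned(md: int, mr: int, width: int) -> int:
    """Multiply two unsigned width-bit values with Radix-4 Booth recoding of mr."""
    if not (0 <= md < (1 << width) and 0 <= mr < (1 << width)):
        raise ValueError("operands must fit in the given width")
    # A zero is appended at position 0. Bits above the operand are implicitly 0,
    # so the top group always starts with a 0 and mr is read as unsigned.
    padded = mr << 1
    groups = width // 2 + 1          # 15 groups for a 29-bit multiplier
    product = 0
    for i in range(groups):
        digit = BOOTH_DIGIT[(padded >> (2 * i)) & 0b111]
        # Each partial product is 0, +/-md or +/-2*md, shifted left by 2 bits
        # relative to the preceding one.
        product += (digit * md) << (2 * i)
    return product


print(booth_radix4_unsigned(62, 102, 7))   # 6324, the operands of the Figure 5 example
print(booth_radix4_unsigned(15, 15, 4))    # 225, the unsigned 1111 x 1111 case
```

For the 7-bit operands of the example, the multiplier 102 is recoded into four digits (−2, +2, −2, +2 from least to most significant), so four partial products are summed instead of seven.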

4.2. IIR Notch Filter

As mentioned previously, one of the paper’s objectives is to implement the second-order IIR notch filter. The transfer function of the second-order IIR notch filter is expressed in Equation (4), where $B_{1}$ and $B_{2}$ represent the filter’s feed-forward coefficients, while $A_{1}$ and $A_{2}$ characterize the filter’s feed-back coefficients. Since these coefficients can be calculated using the formulas that are provided in Equations (5)–(11), the transfer function can be rewritten as provided in Equation (12), where ω is the normalized polar notch frequency; α is the pole contraction factor, which defines the bandwidth of the notch; $f_{0}$ is the notch frequency; and $f_{s}$ is the sampling frequency [33].

H (z) = \frac{B_{0} + B_{1} z^{(- 1)} + B_{2} z^{(- 2)}}{A_{0} + A_{1} z^{(- 1)} + A_{2} z^{(- 2)}}

(4)

B_{0} = 1

(5)

B_{1} = - 2 \cos (ω)

(6)

B_{2} = 1

(7)

A_{0} = 1

(8)

A_{1} = - 2 \cdot α \cdot \cos (ω)

(9)

A_{2} = {(α)}^{2}

(10)

ω = 2 \cdot π \cdot (\frac{f_{0}}{f_{s}})

(11)

∴ H (z) = \frac{1 - 2 \cos (ω) z^{(- 1)} + z^{(- 2)}}{1 - 2 (α) \cos (ω) z^{(- 1)} + {(α)}^{2} z^{(- 2)}}

(12)

Equation (13) is obtained by expressing the z-domain transfer function as an input–output relation in the form of a linear difference equation with constant coefficients. Figure 7 represents the data flow of the second-order IIR notch filter.

∴ y [n] = x [n] - 2 \cos (ω) \cdot x [n - 1] + x [n - 2] + 2 (α) \cos (ω) \cdot y [n - 1] - {(α)}^{2} \cdot y [n - 2]

(13)
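As a software illustration of Equation (13), the following Python sketch applies the difference equation to a list of samples. It uses ordinary double-precision floats rather than POSIT arithmetic, and the function and variable names are hypothetical; it is meant only to show how the two input delays and the two output delays enter the computation.

```python
import math


def notch_filter(samples, f0, fs, alpha):
    """Apply the second-order IIR notch difference equation of Equation (13)."""
    c = math.cos(2.0 * math.pi * f0 / fs)  # cos(omega), with omega from Equation (11)
    x1 = x2 = y1 = y2 = 0.0                # x[n-1], x[n-2], y[n-1], y[n-2] start at zero
    out = []
    for x in samples:
        y = x - 2.0 * c * x1 + x2 + 2.0 * alpha * c * y1 - alpha * alpha * y2
        x2, x1 = x1, x                     # shift the input delay line
        y2, y1 = y1, y                     # shift the output delay line
        out.append(y)
    return out


# A tone at the notch frequency is suppressed once the transient has decayed.
tone = [math.sin(2.0 * math.pi * 60.0 * n / 768.0) for n in range(3000)]
filtered = notch_filter(tone, 60.0, 768.0, 0.96875)
print(max(abs(v) for v in filtered[-100:]))
```

Because the zeros of the transfer function lie on the unit circle at the notch frequency, the steady-state response to a tone at that frequency is zero, while the pole radius set by the pole contraction factor controls how quickly the transient dies out.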

The block diagram of the second-order IIR notch filter that is simulated using Xilinx software is provided in Figure 8. The aim of this paper is to improve the performance of the implemented filter by utilizing a Radix-4 Modified Booth multiplier, as demonstrated in Figure 9. The paper also implements the POSIT division operation based on the Taylor series technique. The same technique is used to implement the cosine function on shared hardware.

As the data are represented using the 32-bit POSIT numbering system when implementing the IIR filter, all the arithmetic operations in the IIR filter equations are calculated in POSIT format. These operations include:

(1): POSIT addition/subtraction,
(2): POSIT multiplication,
(3): POSIT division,
(4): Cosine function for POSIT.

4.2.1. POSIT Divider

In this paper, the POSIT multiplication and division arithmetic units have several steps in common. As in the multiplication design explained previously, the division operation starts by checking the corner cases, including 0 and ∞. It then decodes all the necessary fields for both inputs, including the sign bit (S), regime check (RC), regime (R), exponent (E), and mantissa (M). The 2′s complement is generated for either input depending on the result of XORing the sign bits of the two inputs. Essentially, the division operation is implemented by dividing the mantissa of the dividend by the mantissa of the divisor. However, the standard division operation is replaced with a multiplication: the first input is multiplied by the reciprocal of the second input, which ultimately generates the same output as a standard division arithmetic unit. The reason for this replacement is that division is a complex operation requiring significant execution time. Furthermore, standard division has a large implementation area and consumes high power [34]. Many algorithms can be used to execute the division operation, and they are commonly divided into slow and fast division algorithms. This paper focuses on implementing a fast division technique based on the Taylor series.

Since division can be expressed as A × (1/B) instead of (A/B), where A represents the dividend and B represents the divisor, the division operation is replaced with a multiplication by the reciprocal of the divisor. The focus is therefore on deriving an accurate value for the reciprocal of the denominator, which can be computed using the Taylor series method.

The idea of Taylor-series expansion is to find a higher-order polynomial approximation of a continuous function around some value in its domain [35]. The technique approximates the function, together with its first n derivatives at a certain point, by a polynomial of degree n. As the order of this polynomial increases, the accuracy of the approximation increases. Therefore, the Taylor series replaces the division operation with the multiplication of A by a polynomial in B. The polynomial of order n for $y = f (x)$ at a value $x_{0}$ is shown in Equation (14). In this paper, the Taylor series technique uses a fifth-order polynomial to limit the area: as the order of the polynomial increases, more terms must be calculated and the area grows considerably. Moreover, since the mantissa part used to compute (1/B) lies in the range [1,2), the polynomial would naturally be centered at 1.5, because the minimum value of the mantissa is 1.0, while the maximum value is $1.111 \dots 1 \approx 1.9999$. However, when the error was calculated at 1.5 with n = 5 using Equation (15), a different value of $x_{0}$ was chosen to reduce the error and simplify the generated equation even more. When the regime and exponent correspond to a scale of $2^{1}$ (100001), the input of the Taylor series lies in [2,4). Therefore, the Taylor series equation used in this paper is instead centered at 3. Substituting $f (x) = 1 / x$ in Equation (14) generates Equation (16), and setting n to 5 generates Equation (17), which is used to calculate (1/x).

However, before any number is substituted into the Taylor series equation, x must be in the range [2,4). Therefore, the scale factor of the POSIT number must first be modified; in this paper, the size of the regime is 2 bits while the size of the exponent is 4 bits. Substituting these values into Equation (2) generates Equation (18). Because the division operation involves subtracting the exponents, the exponent of the inverse scale factor needs to be negative. The term $16 k$ becomes negative by setting $k = - 1$, which is done by setting the regime bits to 01. On the other hand, the negated form of the second exponent in Equation (18) is 1 − exp. This representation is equivalent to 17 − exp for a four-bit number, which is the case for the exponent field. Thus, the inverse of the scale factor is formed by concatenating 0 for the sign bit, 01 for the regime bits, and 17 − exp for the exponent bits, and by setting the fraction bits to zero. This inverse is then multiplied by the reciprocal of $(2^{+ 1} \cdot 1 . f)$, which is calculated using the Taylor series technique as illustrated in Equation (17), to obtain the value of (1/x).

P_{n} (x) = \frac{f^{(0)} (x_{0})}{0!} + \frac{f^{(1)} (x_{0})}{1!} (x - x_{0}) + \frac{f^{(2)} (x_{0})}{2!} {(x - x_{0})}^{2} + \dots + \frac{f^{(n)} (x_{0})}{n!} {(x - x_{0})}^{n}

(14)

E_{n} (x) = \frac{1}{n!} \int_{x_{0}}^{x} {(x - t)}^{n} f^{(n + 1)} (t) d t

(15)

P_{n} (x) = \sum_{k = 0}^{n} {(- 1)}^{k} \cdot 3^{- 1 - k} \cdot {(x - 3)}^{k}

(16)

P_{5} (x) = \frac{1}{3} - \frac{1}{9} (x - 3) + \frac{1}{27} {(x - 3)}^{2} - \frac{1}{81} {(x - 3)}^{3} + \frac{1}{243} {(x - 3)}^{4} - \frac{1}{729} {(x - 3)}^{5}

(17)

\text{Decimal value of POSIT} = {(- 1)}^{0} \cdot {(2^{2^{4}})}^{k} \cdot 2^{exp} \cdot 1 . f = 2^{16 k} \cdot 2^{exp} \cdot 1 . f = 2^{16 k} \cdot 2^{exp - 1} \cdot (2^{+ 1} \cdot 1 . f)

(18)
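The reciprocal computation can be illustrated with a short Python sketch. It evaluates the fifth-order polynomial of Equation (17) in Horner form after scaling the input into [2,4), in the spirit of the factorization in Equation (18). The sketch works on ordinary floats and uses `math.frexp` and `math.ldexp` for the scaling instead of POSIT regime and exponent fields, so it is an assumption-laden model of the arithmetic and not the hardware data path; all function names are hypothetical.

```python
import math


def reciprocal_poly5(m):
    """Fifth-order Taylor polynomial of 1/m centred at 3 (Equation (17)), Horner form."""
    t = (m - 3.0) / 3.0
    # (1/3) * (1 - t + t^2 - t^3 + t^4 - t^5)
    return (1.0 / 3.0) * (1.0 + t * (-1.0 + t * (1.0 + t * (-1.0 + t * (1.0 - t)))))


def reciprocal(b):
    """Approximate 1/b for b > 0 by first scaling b into [2, 4)."""
    if b <= 0.0:
        raise ValueError("this sketch only handles positive inputs")
    frac, exp = math.frexp(b)      # b = frac * 2**exp with frac in [0.5, 1)
    m = frac * 4.0                 # m in [2, 4), so b = m * 2**(exp - 2)
    return math.ldexp(reciprocal_poly5(m), 2 - exp)


def divide(a, b):
    """Division replaced by a multiplication with the reciprocal of the divisor."""
    return a * reciprocal(b)


print(divide(60.0, 768.0))         # f0 / fs for the values used later in the paper
```

Since the polynomial is a truncated geometric series, its relative error in this model is ((x − 3)/3)^6, which is at most 1/729 at the edges of [2,4) and vanishes at the center. For a sampling frequency of 768 = 3 × 2^8, the scaled value is exactly 3, so the polynomial introduces no truncation error for that particular input.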

The implemented flow of the shared hardware which represents the division part is provided in Figure 10.

4.2.2. POSIT Cosine Function

Trigonometric functions such as sine and cosine can be computed by several methods, such as look-up tables and the Taylor series. However, since the look-up table technique requires more memory resources than the Taylor series [36], the cosine function in this paper is designed using the Taylor series expansion method. This technique breaks the function down into an infinite sum of terms that are expressed through the derivatives of the function at a certain value. The Taylor series expansion is shown in Equation (19). The accuracy of the approximation increases as the number of terms increases.

f (x) = f (a) + f^{'} (a) (x - a) + \frac{f^{''} (a)}{2!} {(x - a)}^{2} + \frac{f^{'''} (a)}{3!} {(x - a)}^{3} + \dots + \frac{f^{(n)} (a)}{n!} {(x - a)}^{n} + \dots

(19)

\cos (x) = 1 - \frac{x^{2}}{2!} + \frac{x^{4}}{4!} - \frac{x^{6}}{6!} + \frac{x^{8}}{8!} - \frac{x^{10}}{10!} + \frac{x^{12}}{12!} + \dots

(20)

In this paper, the cosine function represented in Equation (20) is implemented, and the final answer is a 32-bit POSIT number. The input angle of this arithmetic unit is in radians. Since the number of terms kept from the Taylor series determines the precision of the answer, the paper considers the first three terms. Although more terms would increase the accuracy of the answer, only one POSIT adder and one POSIT multiplier were used to implement the cosine function, and increasing the number of adders and multipliers would occupy more area. Due to this trade-off, only three terms have been considered to calculate the value of the cosine function. The approximating polynomial used for the Taylor series expansion was centered at zero with an order equal to three. The main steps needed to implement the second part of the shared hardware, which computes the cosine function, are shown in Figure 11. The figure also illustrates how the three main filter coefficients, B1, A1, and A2, are calculated.
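The truncated series of Equation (20) can be sketched as follows. This Python sketch uses double-precision floats instead of POSIT numbers, builds each term from the previous one so that no factorial is computed explicitly, and exposes the number of terms as a hypothetical `terms` parameter whose default of three follows the text above.

```python
import math


def cos_taylor(x, terms=3):
    """Sum the first `terms` terms of the series of cos(x) in Equation (20)."""
    x2 = x * x
    term = 1.0       # current term, starting with x^0 / 0!
    total = 0.0
    for k in range(terms):
        total += term
        # The next term is the current one times -x^2 / ((2k + 1)(2k + 2)).
        term *= -x2 / ((2 * k + 1) * (2 * k + 2))
    return total


omega = 2.0 * math.pi * 60.0 / 768.0
print(cos_taylor(omega), math.cos(omega))
```

Because the series alternates with decreasing terms for small angles, the truncation error is bounded by the first omitted term, which is x^6/6! when three terms are kept.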

5. Results and Discussion

The result of implementing the second-order IIR notch filter, explained in Section 4, is presented and discussed in this section. Xilinx Integrated Software Environment (ISE) v. 14.7 was used to implement the filter in Verilog HDL. The arithmetic units and the IIR notch filter architecture are simulated for a Virtex-5 family XC5VLX20T field-programmable gate array (FPGA) device. The floating point adder/subtractor was built based on the approach proposed in [6], while the floating point multiplier was constructed based on the work presented in [7]. Both studies were selected because they propose and implement standard floating point arithmetic units, which is essential for this paper. Moreover, POSIT is compared with floating point both before and after enhancing the POSIT arithmetic units, to conduct a fair comparison between the two numbering systems. A total of 10 different samples are passed to the filter as its main inputs. Five performance metrics are measured and discussed: accuracy, area, speed, power, and energy. The section includes two types of comparisons:

(a): A comparison within POSIT (regular multiplier vs. Radix-4 Modified Booth Multiplier)
(b): A detailed comparison between POSIT and floating point

5.1. Second-Order IIR Notch Filter

5.1.1. Calculating the Filter’s Coefficients

Coefficients are one of the most fundamental aspects of any filter, as they determine the filter’s output. Therefore, it is vital to calculate the coefficients precisely to generate the correct output. From Equation (12), which represents the second-order notch filter in the frequency domain, it can be concluded that three main coefficients must be calculated: B1, A1, and A2. For the IIR filter, three inputs are needed to calculate the filter’s coefficients, namely $f_{0}$, $f_{s}$, and $α$, which represent the notch frequency, the sampling frequency, and the pole contraction factor, respectively. The values of these inputs are assumed to be 60, 768, and 0.96875, respectively.
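For reference, the three coefficients can be computed for these input values with a few lines of Python. This is a double-precision sketch using the library cosine, not the POSIT data path of the paper; the function name is hypothetical, and the negative sign of A1 is taken from the denominator of Equation (12) and from the negative A1 values reported in Table 7.

```python
import math


def notch_coefficients(f0, fs, alpha):
    """Return (B1, A1, A2) of the second-order IIR notch filter."""
    omega = 2.0 * math.pi * (f0 / fs)      # normalized notch frequency, Equation (11)
    b1 = -2.0 * math.cos(omega)
    a1 = -2.0 * alpha * math.cos(omega)    # sign as in the denominator of Equation (12)
    a2 = alpha * alpha
    return b1, a1, a2


b1, a1, a2 = notch_coefficients(60.0, 768.0, 0.96875)
print(b1, a1, a2)
```

With these inputs, A2 equals 0.9384765625 exactly, and B1 and A1 agree with the MATLAB reference values listed in Table 7 to about six decimal places.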

First, the reciprocal of $f_{s}$ is calculated and then multiplied by $f_{0}$. Ten multiplication and six addition operations are needed to calculate $1 / f_{s}$. Once the reciprocal of $f_{s}$ is available, two further multiplication operations are needed to calculate $ω = 2 \cdot π \cdot (f_{0} / f_{s})$. These calculations are executed using only one POSIT multiplier and one POSIT adder. The reciprocal of the sampling frequency is calculated in 12 cycles, while the normalized polar notch frequency $ω$ is calculated in Cycle 14.

Moreover, six multiplication and three addition operations are needed to calculate $\cos (ω)$, which is executed using one multiplier and one adder. The value of the cosine function is available in Cycle 21. To calculate B1 and A1, one additional multiplication is executed for each: B1 is calculated in Cycle 22, while A1 is calculated in Cycle 23. However, since A2 does not depend on the value of the cosine function, it is computed as soon as the multiplier is free, which is in Cycle 21. Therefore, in Cycle 23, the valid register is set to 1 to indicate that all the filter’s coefficients have been calculated. The accuracy of both numbering systems is discussed in greater detail in the next section.

5.1.2. Generating the Output of the IIR Filter

To implement Equation (13), which is the equation of the IIR filter, three additional multiplication and four addition operations were executed using one adder, one multiplier, and four 32-bit registers. These registers hold x[n − 1], x[n − 2], y[n − 1], and y[n − 2]. If the reset is equal to 1, these registers are cleared to zero; otherwise, they are updated with the values of their inputs at the rising edge of the clock. At each rising edge of the clock, a new sample enters the filter, updating the values of these four registers and producing the output for that sample. In this paper, 10 samples are passed to the IIR filter. The output of the first sample is calculated in Cycle 27, as listed in Table 3. The result for each sample is compared with the corresponding MATLAB result, which serves as a reference for a fair comparison of accuracy between POSIT and floating point.

5.2. Comparison

This section first compares the POSIT arithmetic units, detailing the performance of the Radix-4 Modified Booth Multiplier against that of the regular multiplier. It then examines the performance of the filter using the two numbering systems. The results obtained from MATLAB are used as a reference for a fair comparison between the two systems.

5.2.1. Radix-4 Modified Booth Multiplier vs. the Regular POSIT Multiplier

Accuracy

The main goal of implementing ERMBM is to speed up the calculation and consequently reduce the amount of occupied area. The accuracy of both multipliers has been tested for many cases, all of which show that the two multipliers have the same accuracy, as shown in Table 4.

Area and Speed

As demonstrated in Table 5, the ERMBM outperforms the regular multiplier proposed in [5] in both area and speed. It reduces the maximum delay to less than half that of the regular multiplier. This is because ERMBM reduces the number of partial products from n to about n/2, where n is the size of the multiplier. The improvements of the implemented approach (ERMBM) in speed and area are 51.97% and 26.80%, respectively.

Power and Energy

As illustrated in Table 6, the ERMBM improves on the regular multiplier in power and energy by 0.54% and 52.22%, respectively. Power and area usually track each other in a datapath, so a 26.80% reduction in area would be expected to reduce power by a similar percentage. However, this is not the case here because the switching rate of the ERMBM nodes is higher.
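The percentages quoted for the multiplier comparison can be reproduced from the figures in Tables 5 and 6. The short Python sketch below is only a consistency check with hypothetical helper names; it assumes that each enhancement is the relative reduction with respect to the regular multiplier and that energy is power multiplied by the datapath delay, which matches the tabulated energies.

```python
def enhancement_percent(baseline, improved):
    """Relative reduction of `improved` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - improved) / baseline


def energy_pj(power_mw, delay_ns):
    """Energy as power times delay; mW multiplied by ns gives pJ."""
    return power_mw * delay_ns


# Values copied from Tables 5 and 6.
regular = {"luts": 2634, "delay_ns": 48.440, "power_mw": 1117}
ermbm = {"luts": 1928, "delay_ns": 23.268, "power_mw": 1111}

for key in ("luts", "delay_ns", "power_mw"):
    print(key, round(enhancement_percent(regular[key], ermbm[key]), 2))

e_regular = energy_pj(regular["power_mw"], regular["delay_ns"])
e_ermbm = energy_pj(ermbm["power_mw"], ermbm["delay_ns"])
print("energy", round(enhancement_percent(e_regular, e_ermbm), 2))
```

The check makes the source of the energy gain visible: power is almost unchanged, so the 52.22% energy reduction comes almost entirely from the shorter datapath delay.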

5.2.2. A Comparison between POSIT and Floating Point

Accuracy

Table 7 compares the accuracy between a 32-bit POSIT and a 32-bit floating point for the IIR filter. The comparison includes 13 cases: the filter’s coefficients (i.e., B1, A1, and A2) and the filter’s ten outputs (i.e., Y1, Y2, …, Y10). Each case includes the MATLAB result, POSIT (result and error), and floating point (result and error). The errors are computed in reference to MATLAB. Table 7 shows that 10 cases have better accuracy for POSIT while two have the same accuracy. Only one case has shown a better accuracy for the floating point system. Hence, overall, POSIT provides better accuracy for the IIR filter.
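The error entries for B1 in Table 7 can be reproduced with the following Python sketch. It assumes that the tabulated error is the relative error with respect to MATLAB expressed in percent, which is an inference from the B1 row and not a definition given in the text. The sketch also shows one reading of the case counts, under which a tie is credited to both systems; this reading is likewise an assumption, and the helper names are hypothetical.

```python
def percent_error(reference, measured):
    """Relative error of `measured` with respect to `reference`, in percent."""
    return 100.0 * abs(measured - reference) / abs(reference)


# B1 values copied from Table 7.
B1_MATLAB = -1.763842361931102233771193617584
B1_POSIT = -1.763842344
B1_FP = -1.7638425827026367

print(percent_error(B1_MATLAB, B1_POSIT))  # about 1.0166e-06
print(percent_error(B1_MATLAB, B1_FP))     # about 1.2517e-05


def share_at_least_as_accurate(better, same, total):
    """Percentage of cases in which a system is at least as accurate as the other."""
    return 100.0 * (better + same) / total


print(round(share_at_least_as_accurate(10, 2, 13), 2))  # POSIT: 92.31
print(round(share_at_least_as_accurate(1, 2, 13), 2))   # floating point: 23.08
```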

Area and Speed

The area and speed of the filter implementation for the two numbering systems using the regular multiplier are provided in Table 8. The POSIT filter reduces the area by 21.77% but takes 27.81% more time than floating point. This is because the fixed positions of the floating point fields facilitate parallel decoding, whereas POSIT requires some serialization because the regime must be determined first.

However, if the Enhanced Radix-4 Modified Booth Multiplier is used instead of the regular multiplier, POSIT outperforms floating point in speed, since the ERMBM improves the multiplier speed by 51.97%, as listed in Table 5. As shown in Table 9, the area and speed of the whole filter are improved by 35.68% and 20.66%, respectively, while maintaining the same precision for the output and using zero DSP48E slices. By contrast, the regular multiplier utilizes 8 DSPs, each of which adds complexity to the design and significantly increases the area. The DSP48E is a digital signal processing element that is part of the FPGA device. Xilinx uses this element to implement several basic and complex operations, and sometimes cascades several DSPs to implement complicated operations, including advanced multiplications and N-tap filters.

Power and Energy

When comparing POSIT with floating point in terms of power and energy, POSIT demonstrates better results for both, as illustrated in Table 10 and Table 11.

5.3. Discussion

After enhancing POSIT arithmetic units and comparing the system with the floating point numbering system, several recommendations result, including:

(1): Using the Enhanced Radix-4 Modified Booth Multiplier (ERMBM) instead of a regular multiplier enhances the area, speed, power, and energy without affecting the accuracy of the output.
(2): If accuracy, area, and power are the main concerns, then it is recommended to utilize the POSIT numbering system with a regular multiplier instead of floating point.
(3): If the aim is to enhance all five performance metrics, then it is recommended to use POSIT units along with ERMBM instead of floating point, as POSIT optimizes area, speed, power, and energy by 35.68%, 20.66%, 31.49%, and 45.64%, respectively.

6. Conclusions and Future Works

In this paper, the main goal was to implement the second-order IIR notch filter in two numbering systems, POSIT and floating point, aiming to measure their performance metrics for accuracy, area, speed, power, and energy.

When comparing POSIT and floating point, the regular multiplier was used first instead of the Radix-4 Modified Booth Multiplier, with the goal of assessing the two numbering systems without any optimization or enhancement of the POSIT arithmetic units. In this case, POSIT is more accurate than floating point: POSIT’s accuracy was estimated at 92.31%, as 10 of the 13 cases showed better accuracy for POSIT and 2 of the 13 showed the same accuracy for both POSIT and floating point (12 of 13 in total). In comparison, the estimated accuracy of floating point was 23.08%, as 1 of the 13 cases showed better accuracy for floating point in addition to the 2 ties (3 of 13 in total). When implementing the entire filter, POSIT occupies 21.77% less area than floating point. However, the longest timing path for floating point is shorter by 27.81%, which indicates that there is a trade-off between area and speed. The dissipated power and energy consumption of the whole filter implemented using POSIT were better than those of floating point. Moreover, POSIT outperforms floating point in all five performance metrics when the ERMBM is utilized instead of the regular multiplier. In this case, the speed, area, power, and energy were improved by 20.66%, 35.68%, 31.49%, and 45.64%, respectively, while utilizing zero DSPs.

For future work, we plan to improve the performance of POSIT arithmetic units even more by implementing a pipelined design aiming to enhance POSIT’s performance in resource use, speed of execution, power, and energy consumption. It would be advantageous to compare 32-bit POSIT with a custom precision floating point to determine the number of bits floating point would need to provide the same performance.

Author Contributions

Conceptualization, A.A.E., S.A. and B.J.M.; methodology, A.A.E., S.A. and B.J.M.; software, A.A.E. and S.A.; validation, A.A.E., S.A. and B.J.M.; formal analysis, A.A.E., S.A. and B.J.M.; investigation, A.A.E., S.A. and B.J.M.; resources, A.A.E. and S.A.; writing—original draft preparation, A.A.E., S.A., B.J.M. and A.A.F.; writing—review and editing, A.A.E., S.A., B.J.M. and A.A.F.; visualization, A.A.E., S.A. and B.J.M.; supervision, S.A. and B.J.M.; project administration, S.A. and B.J.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

1. Hauser, J.R. Handling floating-point exceptions in numeric programs. ACM Trans. Program. Lang. Syst. 1996, 18, 139–174.
2. Lindstrom, P.; Lloyd, S.; Hittinger, J. Universal Coding of the Reals: Alternatives to IEEE Floating Point. In Proceedings of the Conference for Next Generation Arithmetic, Singapore, 28 March 2018; ACM Press: New York, NY, USA, 2018; p. 5.
3. Gustafson, J.L. A Radical Approach to Computation with Real Numbers. Supercomput. Front. Innov. 2016, 3, 1–16.
4. Cook, J. Anatomy of a POSIT Number. 2018. Available online: https://www.johndcook.com/blog/2018/04/11/anatomy-of-a-POSIT-number/ (accessed on 3 April 2019).
5. Jaiswal, M.; So, H. PA Cogen: A Hardware POSIT Arithmetic Core Generator. IEEE Access 2019, 7, 74586–74601.
6. Doable, R.; Chaturvedi, S. Implementation of 32 Bit Binary Floating Point Adder Using IEEE 754 Single Precision Format. VOSR J. VLSI Signal Processing (IOSR-JVSR-JVSP) 2015, 5, 2319–4197.
7. Tejaswini, H.N.; Ravishankar, C.V. Single Precision Floating Point Numbers Multiplication using Standard IEEE 754. Int. J. Adv. Res. Electron. Commun. Eng. (IJARECE) 2015, 4, 1–5.
8. Gustafson, J.; Yosemite, I. Beating Floating Point at its Own Game: POSIT Arithmetic. Supercomput. Front. Innov. Int. J. 2017, 4, 2409–6008.
9. Montero, R.M. Study of the POSIT Number System: A Practical Approach. Bachelor’s Thesis, Universidad Complutense de Madrid, Madrid, Spain, 2019; pp. 1–78.
10. Carmichael, Z.; Langroudi, H.F.; Khazanov, C.; Lillie, J.; Gustafson, J.L.; Kudithipudi, D. Deep Positron: A Deep Neural Network Using the Posit Number System. arXiv, 2018; arXiv:1812.01762.
11. Dam, L. Enabling High Performance POSIT Arithmetic Applications Using Hardware Acceleration. Master’s Thesis, TU Delft Electrical Engineering, Mathematics and Computer Science, Delft, The Netherlands, 2018; pp. 1–112.
12. Ushanandhini, S.S. Designing of IIR Filter using Radix-4 Multiplier by Recharging Technique. Int. J. Trend Sci. Res. Dev. (IJTSRD) 2018, 2, 1100–1107.
13. Wang, Y.; Li, B.; Li, Z. Fixed-point digital IIR filter design using multi-objective optimization evolutionary algorithm. In Proceedings of the 2010 IEEE Youth Conference on Information, Computing and Telecommunications, Beijing, China, 28–30 November 2010; pp. 174–177.
14. Chengliang, Z.; Lihong, W. IIR Digital Filter Research and Simulation on MATLAB. In Proceedings of the 4th International Conference on Signal Processing Systems (ICSPS), Kuala Lumpur, Malaysia, 21–22 December 2012; Volume 58, pp. 138–142.
15. Bajwa, D.; Singh, K.; Chahal, N. Fixed Point digital IIR filter Design using two-stage ensemble evolutionary algorithm. Int. J. Adv. Res. Comput. Sci. Softw. Eng. (IJARCSSE) 2014, 4, 329–338.
16. Stavrou, V.; Tsoulos, I.; Mastorakis, N. Transformations for FIR and IIR Filters’ Design. Symmetry 2021, 13, 533.
17. Tsoulos, I.G.; Stavrou, V.; Mastorakis, N.E.; Tsalikakis, D. GenConstraint: A programming tool for constraint optimization problems. SoftwareX 2019, 10, 100355.
18. Langroudi, S.H.F.; Pandit, T.; Kudithipudi, D. Deep Learning Inference on Embedded Devices: Fixed-Point vs Posit. In Proceedings of the 2018 1st Workshop on Energy Efficient Machine Learning and Cognitive Computing for Embedded Applications (EMC2), Williamsburg, VA, USA, 25 March 2018; Volume EMC2, pp. 19–23.
19. Podobas, A.; Matsuoka, S. Hardware Implementation of POSITs and Their Application in FPGAs. In Proceedings of the 2018 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Vancouver, BC, Canada, 21–25 May 2018; pp. 138–145.
20. Jaiswal, M.K.; So, H.K.-H. Architecture Generator for Type-3 Unum Posit Adder/Subtractor. In Proceedings of the 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, Italy, 27–30 May 2018; pp. 1–5.
21. Jaiswal, M.K.; So, H.K.-H. Universal number posit arithmetic generator on FPGA. In Proceedings of the 2018 Design, Automation & Test in Europe Conference & Exhibition (DATE), Dresden, Germany, 19–23 March 2018; pp. 1159–1162.
22. Lehóczky, Z.; Retzler, A.; Tóth, R.; Szabó, Á.; Farkas, B.; Somogyi, K. High-level .NET software implementations of unum type I and posit with simultaneous FPGA implementation using Hastlayer. In Proceedings of the Conference for Next Generation Arithmetic, Singapore, 28 March 2018; ACM Press: New York, NY, USA, 2018; p. 4.
23. Hou, J.; Zhu, Y.; Shen, Y.; Li, M.; Wu, H.; Song, H. Tackling Gaps in Floating Point Arithmetic: Unum Arithmetic Implementaion on FPGA. In Proceedings of the IEEE 3rd International Conference on Data Science and Systems (HPCC/SmartCity/DSS), Bangkok, Thailand, 18–20 December 2017; pp. 615–616.
24. Glaser, F.; Mach, S.; Rahimi, A.; Gurkaynak, F.K.; Huang, Q.; Benini, L. An 826 MOPS, 210uW/MHz Unum ALU in 65 nm. In Proceedings of the 2018 IEEE International Symposium on Circuits and Systems (ISCAS), Florence, Italy, 27–30 May 2018; pp. 1–5.
25. Chen, J.; Al-Ars, Z.; Hofstee, H.P. A matrix-multiply unit for posits in reconfigurable logic leveraging (open) CAPI. In Proceedings of the Conference for Next Generation Arithmetic, Singapore, 28 March 2018; ACM Press: New York, NY, USA, 2018; p. 1.
26. Fadavi-Ardekani, J. M*N Booth encoded multiplier generator using optimized Wallace trees. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 1993, 1, 120–125.
27. Krushna, M.; Kumar, K. Modified Booth Encoding Multiplier for both Signed and Unsigned Radix Based Multi-Modulus Multiplier. Int. J. Mag. Eng. Technol. Manag. Res. (IJMETMR) 2015, 2, 1–6.
28. Kang, J.-Y.; Gaudiot, J.-L. A fast and well-structured multiplier. Euromicro Symposium on Digital System Design, DSD, Rennes, France, 31 August–3 September 2004; pp. 508–515.
29. Tripathi, R.; Prakash, S. Comparative Design of Regular Structured Modified Booth Multiplier. Int. J. VLSI Des. Commun. Syst. (VLSICS) 2016, 7, 21–30.
30. Kshirsagar, R.; Aishwarya, E.; Vishwanath, A.; Jayakrishnan, P. Implementation of pipelined Booth Encoded Wallace tree Multiplier architecture. In Proceedings of the 2013 International Conference on Green Computing, Communication and Conservation of Energy (ICGCE), Chennai, India, 12–14 December 2013; pp. 199–204.
31. Ramteke, S.; Khandagre, Y.; Dubey, A. Implementation of Low Power Booth’s Multiplier by Utilizing Ripple Carry Adder. Int. J. Sci. Eng. Res. (IJSER) 2014, 5, 2229–5518.
32. Kaur, S.; Manna, M. Implementation of Modified Booth Algorithm (Radix-4) and its Comparison with Booth Algorithm (Radix-2). Res. India Publ. Adv. Electron. Electr. Eng. 2013, 3, 683–690.
33. Wang, C.M.; Xiao, W.C. Second-order IIR Notch Filter Design and implementation of digital signal processing system. In Proceedings of the 2nd International Symposium on Computer, Communication, Control and Automation, Singapore, 1–2 December 2013; Atlantis Press: Paris, France, 2013; pp. 0576–0578.
34. Lavanya, K.; Savidhan, S.; Akshatha, D. Computing Non-Restoring and Newton Raphson’s Method for Division. IOSR J. Electron. Commun. Eng. (IOSR-JECE) 2017, 12, 57–60.
35. Oberman, S.; Flynn, M. An Analysis of Division Algorithms and Implementations; Departments of Electrical Engineering and Computer Science, Stanford University: Stanford, CA, USA, 1995; Volume 46, pp. 833–854.
36. Malepati, H. Digital Media Processing DSP Algorithms Using C; Elsevier: Newnes, Australia, 2010; pp. 1–768.

Figure 1. POSIT format (2) [9].

Figure 2. The block diagram of the 2′s complement that is utilized in the ERMBM multiplier.

Figure 3. How the multiplier’s (MR) bits will be grouped.

Figure 4. Alignment of shifted partial products.

Figure 5. Multiplication of two unsigned numbers using Radix-4 modified Booth algorithm.

Figure 6. Proposed architecture for the Enhanced Radix-4 Modified Booth Multiplier.

Figure 7. Data flow of the second-order IIR notch filter.

Figure 8. Block diagram of the second-order IIR notch filter.

Figure 9. Block diagram of POSIT ERMBM.

Figure 10. The shared hardware of the division arithmetic unit that is used to implement $\frac{1}{f_{s}}$.

Figure 11. Shared hardware utilized flow to calculate cos(x).

Table 1. The difference between FIR and IIR filter.

Finite Impulse Response Filter (FIR Filter)	Infinite Impulse Response Filter (IIR Filter)
Depend on current input only, no feedback is required (only zeros)	Depend on previous output, feedback is required (poles and zeros)
Uses more number of terms to achieve same result-more computations–slower	Uses less number of terms to achieve same result-less computations–faster.
Require more memory	Require less memory space
Can be easily designed with a linear phase	Non-linear phase
Always stable	Has stability issues–Not always stable
Has constant delay at all frequencies	Has unequal delays at each frequency
No analog history	Derived from analog filters
Used with high order tapping (20–2000)	Used with fewer order tapping type < 1/10th order of FIR filter (4–20)

Table 2. Radix-4 Modified Booth encoding table.

Y_i	Y_i−1	Y_i−2	Required Operation on the Multiplicand	Acquired Output
0	0	0	0 × MD	Zero
0	0	1	+1 × MD	Value of the multiplicand
0	1	0	+1 × MD	Value of the multiplicand
0	1	1	+2 × MD	Shifting the MD 1 bit to the left
1	0	0	−2 × MD	Finding 2′s complement of the MD and then shifting 1 bit to the left
1	0	1	−1 × MD	Finding 2′s complement of MD
1	1	0	−1 × MD	Finding 2′s complement of MD
1	1	1	0 × MD	Zero

Table 3. Multiplication/Addition operations and cycle number to compute filter Equation (13).

Term	Multiplication	Addition	Cycle Number
$1 / f_{s}$	10	6	12
$ω$	2	-	14
$\cos (ω)$	6	3	21
A2	1	-	21
B1	1	-	22
A1	1	-	23
Equation (13) Result	3	4	27

Table 4. A total of three random cases that were tested to compare the accuracy between the arithmetic units ERMBM and the regular multiplier.

Case One ($34.3456 \times 434$)
ERMBM	Output in POSIT = 01011011101000111001111111011000; Output in decimal = 14905.99023	ERMBM error = $1.14048107 \times 10^{-6}$
Regular POSIT multiplier proposed by [5]	Output in POSIT = 01011011101000111001111111011000; Output in decimal = 14905.99023	Regular multiplier error = $1.14048107 \times 10^{-6}$
MATLAB	14905.990399999998771818354725838
Case Two ($-2.3456 \times 10^{-3} \times 0.564$)
ERMBM	Output in POSIT = 11010011010010100110100011100110; Output in decimal = $-1.3229184 \times 10^{-3}$	ERMBM error = $1.63910665 \times 10^{-14}$
Regular POSIT multiplier proposed by [5]	Output in POSIT = 11010011010010100110100011100110; Output in decimal = $-1.3229184 \times 10^{-3}$	Regular multiplier error = $1.63910665 \times 10^{-14}$
MATLAB	−0.0013229183999999998867791450862796
Case Three ($-585.56 \times -5529.34$)
ERMBM	Output in POSIT = 01100101100010110011110000000010; Output in decimal = 3237760.25	ERMBM error = 99.99999748
Regular POSIT multiplier proposed by [5]	Output in POSIT = 01100101100010110011110000000010; Output in decimal = 3237760.25	Regular multiplier error = 99.99999748
MATLAB	3237760.3303999998606741428375244
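
As a rough check on how the error column can be read, the sketch below treats the error as a percentage relative error against the MATLAB reference. This formula is an assumption inferred from the Case One figures, not a definition quoted from the article, and the function name `percent_relative_error` is hypothetical.

```python
def percent_relative_error(measured: float, reference: float) -> float:
    """Relative error of `measured` against `reference`, in percent (assumed metric)."""
    return abs(measured - reference) / abs(reference) * 100.0


# Case One of Table 4: decimal POSIT output versus the MATLAB reference.
matlab_reference = 14905.990399999998771818354725838
posit_output = 14905.99023

case_one_error = percent_relative_error(posit_output, matlab_reference)
print(f"Case One error: {case_one_error:.6e}")
```

Under this assumption the Case One result comes out close to the $1.14048107 \times 10^{-6}$ listed for both multipliers. The decimal output in the table is rounded, so agreement beyond the first few significant digits should not be expected.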

Table 5. Comparison of the regular multiplier with ERMBM for area and speed.

Bit Configuration	Slice LUTs	DSP48Es	Datapath Delay (ns)	Total Gate Delay (ns Logic)	Total Net Delay (ns Route)
Regular POSIT multiplier [5] (Exp = 7 & Regime = 2)	2634	0	48.440	21.657	26.783
Enhanced Radix-4 Modified Booth Multiplier (ERMBM) (Exp = 7 & Regime = 2)	1928	0	23.268	11.698	11.570
Percentage of area enhancement for ERMBM 26.80%			Percentage of speed enhancement for ERMBM 51.97%

Table 6. Comparison of the regular multiplier with ERMBM for power and energy.

Bit Configuration	Power (mW)	Energy (pJ)
Regular POSIT multiplier [5] (Exp = 7 & Regime = 2)	1117	54,107.48
Enhanced Radix-4 Modified Booth Multiplier (ERMBM) (Exp = 7 & Regime = 2)	1111	25,850.748
Percentage of power enhancement for ERMBM 0.54%	Percentage of energy enhancement for ERMBM 52.22%
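
The figures in Tables 5 and 6 are mutually consistent if energy is taken as power multiplied by datapath delay (mW × ns = pJ) and each enhancement as the reduction relative to the regular multiplier. The sketch below recomputes them under that assumption; the function names `energy_pj` and `enhancement_percent` are hypothetical.

```python
def energy_pj(power_mw: float, delay_ns: float) -> float:
    """Energy in picojoules, assuming energy = power x datapath delay."""
    return power_mw * delay_ns


def enhancement_percent(baseline: float, improved: float) -> float:
    """Reduction of `improved` relative to `baseline`, in percent."""
    return (baseline - improved) / baseline * 100.0


# Regular POSIT multiplier versus ERMBM (values from Tables 5 and 6).
regular = {"luts": 2634, "delay_ns": 48.440, "power_mw": 1117}
ermbm = {"luts": 1928, "delay_ns": 23.268, "power_mw": 1111}

regular_energy = energy_pj(regular["power_mw"], regular["delay_ns"])
ermbm_energy = energy_pj(ermbm["power_mw"], ermbm["delay_ns"])

print(f"area:   {enhancement_percent(regular['luts'], ermbm['luts']):.2f}%")
print(f"speed:  {enhancement_percent(regular['delay_ns'], ermbm['delay_ns']):.2f}%")
print(f"power:  {enhancement_percent(regular['power_mw'], ermbm['power_mw']):.2f}%")
print(f"energy: {enhancement_percent(regular_energy, ermbm_energy):.2f}%")
```

Because the power figures differ by well under 1%, nearly all of the energy improvement comes from the shorter datapath delay.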

Table 7. Comparison of accuracy between 32-bit POSIT and 32-bit floating point.

Calculation Source	Generated Output	Calculated Error
B1 (POSIT is more accurate)
B1 from MATLAB	−1.763842361931102233771193617584
B1 from POSIT	10111110011110001110100110101000 → −1.763842344	POSIT error = $1.01659325 \times 10^{-6}$
B1 from FP proposed in [6,7]	bfe1c598 → −1.7638425827026367	FP error = $12.51651160 \times 10^{-6}$
A1 (POSIT is more accurate)
A1 from MATLAB	−1.7087222881207552889658438170345
A1 from POSIT	10111110100101010010001001011011 → −1.708722264	POSIT error = $1.41162526 \times 10^{-6}$
A1 from FP proposed in [6,7]	bfdab76b → −1.708722472190857	FP error = $10.77238260 \times 10^{-6}$
A2 (same precision)
A2 from MATLAB	0.9384765625
A2 from POSIT	00111111110000010000000000000000 → 0.9384765625	POSIT error = 0
A2 from FP proposed in [6,7]	3f704000 → 0.9384765625	FP error = 0
Sample = 67 (same precision)
Y1 from MATLAB	67.0
Y1 from POSIT	0100110000011000000000000000000 → 67.0	POSIT error = 0
Y1 from FP proposed in [6,7]	42860000 → 67.0	FP error = 0
Sample = −456 (POSIT is more accurate)
Y2 from MATLAB	−459.69304494529324530195843663682
Y2 from POSIT	10101110011010001001110100101000 → −459.6930466	POSIT error = $0.35995906 \times 10^{-6}$
Y2 from FP proposed in [6,7]	c3e5d8b6 → −459.69305419921875	FP error = $2.01306624 \times 10^{-6}$
Sample = 19 (POSIT is more accurate)
Y3 from MATLAB	41.946435760963943178399103612273
Y3 from POSIT	01001010100111110010010010000000 → 41.9464111328125	POSIT error = $58.71333520 \times 10^{-6}$
Y3 from FP proposed in [6,7]	4227c930 → 41.94647216796875	FP error = $86.79403660 \times 10^{-6}$
Sample = 28 (POSIT is more accurate)
Y4 from MATLAB	41.572953440710447008497063642695
Y4 from POSIT	01001010100110010010101010101000 → 41.57291412	POSIT error = $94.58243210 \times 10^{-6}$
Y4 from FP proposed in [6,7]	42264ac0 → 41.572998046875	FP error = $107.29611600 \times 10^{-6}$
Sample = 160 (POSIT is more accurate)
Y5 from MATLAB	161.28329915100100573765568271504
Y5 from POSIT	01001110100001010010001000001101 → 161.2832546	POSIT error = $27.62282350 \times 10^{-6}$
Y5 from FP proposed in [6,7]	4321488a → 161.28335571289062	FP error = $35.06989870 \times 10^{-6}$
Sample = 39 (POSIT is more accurate)
Y6 from MATLAB	21.358347613975862138928193744926
Y6 from POSIT	01001000101010110111011100110000 → 21.35829926	POSIT error = $226.3938050 \times 10^{-6}$
Y6 from FP proposed in [6,7]	41aade00 → 21.3583984375	FP error = $237.9562550 \times 10^{-6}$
Sample = 142 (POSIT is more accurate)
Y7 from MATLAB	118.34503631422772995296612011743
Y7 from POSIT	01001101101100101100001010001101 → 118.3449955	POSIT error = $34.48748590 \times 10^{-6}$
Y7 from FP proposed in [6,7]	42ecb0ae → 118.34507751464844	FP error = $34.81381390 \times 10^{-6}$
Sample = 151 (POSIT is more accurate)
Y8 from MATLAB	121.70887719492042148245152838541
Y8 from POSIT	01001101110011011010101110111000 → 121.70887402	POSIT error = $2.60861861 \times 10^{-6}$
Y8 from FP proposed in [6,7]	42f36af6 → 121.70890808105469	FP error = $25.37705960 \times 10^{-6}$
Sample = 117 (FP is more accurate)
Y9 from MATLAB	89.562431604401888060911774239986
Y9 from POSIT	01001100110011000111111111010100 → 86.56241608	POSIT error = $3349636.081 \times 10^{-6}$
Y9 from FP proposed in [6,7]	42b31ff8 → 89.56243896484375	FP error = $8.21822468 \times 10^{-6}$
Sample = 294 (POSIT is more accurate)
Y10 from MATLAB	277.44683801916972138270230542038
Y10 from POSIT	01010000001010101110010011001000 → 277.4468384	POSIT error = $0.13726243 \times 10^{-6}$
Y10 from FP proposed in [6,7]	43687263 → 277.4468231201172	FP error = $5.37005671 \times 10^{-6}$
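
The MATLAB reference column of Table 7 can be reproduced in double precision, which may be useful when checking another implementation against the same samples. The sketch below assumes a direct-form second-order difference equation, $y[n] = x[n] + B_1 x[n-1] + x[n-2] - A_1 y[n-1] - A_2 y[n-2]$, with zero initial state. This form is inferred from the tabulated values rather than quoted from Equation (13), and the function name `notch_filter` is hypothetical.

```python
# Coefficients taken from the MATLAB rows of Table 7.
B1 = -1.763842361931102233771193617584
A1 = -1.7087222881207552889658438170345
A2 = 0.9384765625


def notch_filter(samples: list[float]) -> list[float]:
    """Assumed recurrence: y[n] = x[n] + B1*x[n-1] + x[n-2] - A1*y[n-1] - A2*y[n-2]."""
    x1 = x2 = y1 = y2 = 0.0  # zero initial state (assumption)
    outputs = []
    for x in samples:
        y = x + B1 * x1 + x2 - A1 * y1 - A2 * y2
        outputs.append(y)
        x2, x1 = x1, x  # shift the input history
        y2, y1 = y1, y  # shift the output history
    return outputs


# Input samples in the order they appear in Table 7.
samples = [67, -456, 19, 28, 160, 39, 142, 151, 117, 294]
outputs = notch_filter(samples)
for index, value in enumerate(outputs, start=1):
    print(f"Y{index} = {value:.9f}")
```

The double-precision outputs stand in for the MATLAB column only; they say nothing about how the POSIT or floating point hardware rounds intermediate results.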

Table 8. Comparison of the area and speed between a 32-bit POSIT IIR notch filter utilizing a regular multiplier and a 32-bit floating point second-order IIR notch filter.

Bit Configuration	Slice LUTs	DSP48Es	Datapath Delay (ns)	Total Gate Delay (ns Logic)	Total Net Delay (ns Route)
32-bit POSIT utilizing regular multiplier in implementing IIR notch filter (Exp = 7 & Regime = 2)	5729	8	58.070	28.799	29.271
32-bit FP IIR notch filter using the proposed FP adder/subtractor & the proposed multiplier	7323	8	41.918	14.744	27.174
Percentage of area enhancement for POSIT 21.77%			Percentage of speed enhancement for FP 27.81%

Table 9. Comparison of the area and speed between a 32-bit POSIT IIR notch filter utilizing an ERMBM and a 32-bit floating point second-order IIR notch filter.

Bit Configuration	Slice LUTs	DSP48Es	Datapath Delay (ns)	Total Gate Delay (ns Logic)	Total Net Delay (ns Route)
32-bit POSIT utilizing ERMBM in implementing IIR notch filter (Exp = 7 & Regime = 2)	12,270	0	58.014	28.844	29.170
32-bit FP IIR notch filter using the proposed FP adder/subtractor & the proposed multiplier	19,077	0	73.125	32.918	40.207
Percentage of area enhancement for POSIT 35.68%			Percentage of speed enhancement for POSIT 20.66%

Table 10. Comparison of the power and energy between a 32-bit POSIT IIR notch filter utilizing a regular multiplier and a 32-bit floating point second-order IIR notch filter.

Bit Configuration	Power (mW)	Energy (pJ)
32-bit POSIT utilizing regular multiplier in implementing IIR notch filter (Exp = 7 & Regime = 2)	1630	94,654.1
32-bit FP IIR notch filter using the proposed FP adder/subtractor & the proposed multiplier	2671	111,962.98
Percentage of power enhancement for POSIT 38.98%	Percentage of energy enhancement for POSIT 15.46%

Table 11. Comparison of the power and energy between a 32-bit POSIT IIR notch filter utilizing an ERMBM and a 32-bit floating point second-order IIR notch filter.

Bit Configuration	Power (mW)	Energy (pJ)
32-bit POSIT utilizing ERMBM in implementing IIR notch filter (Exp = 7 & Regime = 2)	1619	93,924.67
32-bit FP IIR notch filter using the proposed FP adder/subtractor & the proposed multiplier	2363	172,794.38
Percentage of power enhancement for POSIT 31.49%	Percentage of energy enhancement for POSIT 45.64%

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
