Article

Efficient FPGA Implementation of an RFIR Filter Using the APC–OMS Technique with WTM for High-Throughput Signal Processing

by Kasarla Satish Reddy 1, Sowmya Madhavan 2, Przemysław Falkowski-Gilski 3,*, Parameshachari Bidare Divakarachari 2,* and Arun Mathiyalagan 4
1 Department of Electronics and Communication Engineering, Hyderabad Institute of Technology and Management, Hyderabad 501401, India
2 Department of Electronics and Communication Engineering, Nitte Meenakshi Institute of Technology, Yelahanka, Bangalore 560064, India
3 Faculty of Electronics, Telecommunications and Informatics, Gdansk University of Technology, Narutowicza 11/12, 80-233 Gdansk, Poland
4 Department of Electronics and Communication Engineering, Panimalar Institute of Technology, Chennai 600123, India
* Authors to whom correspondence should be addressed.
Electronics 2022, 11(19), 3118; https://doi.org/10.3390/electronics11193118
Submission received: 30 August 2022 / Revised: 20 September 2022 / Accepted: 25 September 2022 / Published: 29 September 2022
(This article belongs to the Special Issue Embedded Systems: Fundamentals, Design and Practical Applications)

Abstract

Finite Impulse Response (FIR) filters are used to change the attributes of a signal in the time or frequency domain. Among FIR filters, a reconfigurable filter has the advantage that its coefficients can be changed in real time while the filter is operating. In this paper, the Anti-Symmetric Product Coding (APC) and Odd Multiple Storage (OMS) modules are utilized to implement the reconfigurable FIR filter (RFIR–APC–OMS). Herein, the APC–OMS module is used to reduce the area of the RFIR architecture. The performance of the RFIR–APC–OMS is analyzed in terms of area, power, delay, LUTs, flip-flops, slices, and frequency. RFIR–APC–OMS reduces the area by 3.44% compared to the existing RFIR architecture employing the Dynamic Reconfigurable Partial Product Generator (DRPPG) module.

1. Introduction

The FIR filter is the most commonly used filter in numerous Digital Signal Processing (DSP) applications, such as echo cancellation, speech signal processing, loudspeaker equalization, adaptive noise cancellation, and communication [1,2]. Infinite Impulse Response (IIR) and Finite Impulse Response (FIR) filters are the two kinds of digital filters used in communication systems. In DSP systems, the FIR design plays a crucial role by convolving the input data samples with the desired unit impulse response of the filter [3]. Complex systems such as image processing and DSP applications use the FIR filter as a basic building block because of its absolute stability and linear-phase property [4]. In multimedia and mobile communication systems, fast and simple FIR filters are mostly used for applications such as channel equalization, analog signal processing, filtering systems, and digital signal processing functions [5].
The traditional FIR filter design suffers from a major drawback: the number of computations is too high, so it requires a higher filter order and a larger hardware area, and it consumes more energy than the reconfigurable FIR filter [6,7]. In the conventional design, the FIR filter was implemented with the Distributed Arithmetic (DA) technique, in which the filter order increased to a higher level [8,9,10,11]. Many of the existing architectures have been designed by using various types of FIR filter, such as the linear-phase FIR filter [12], parallel FIR filter [13], low-power-multiplier FIR filter [14], and DA-based FIR filter [15].
The above-mentioned conventional designs occupy substantial hardware resources and provide low efficiency. Moreover, most of the conventional designs do not address reconfigurability. An innovative scheme proposed in [16,17] simplifies the design of a block-based RFIR structure. Yet, these architectures require more hardware to perform the filter operations and do not support the reconfiguration process. As a result, the block-based RFIR filter model is only applicable to adaptive filters and 2-dimensional filters [18]. To overcome this limitation, the RFIR–APC–OMS (Anti-Symmetric Product Coding–Odd Multiple Storage) architecture is introduced in this paper.
The scientific contributions of this work are as follows:
  • Conventional multipliers and adders greatly increase the hardware utilization of the filter. An adder and a multiplier built from fewer combinational blocks raise the filter performance.
  • The Carry Look-Ahead adder (CLA) and the Wallace Tree Multiplier (WTM) both play an important role in minimizing the energy consumption of the proposed architecture.
  • The APC–OMS structure helps to redesign the Dynamic Reconfigurable Partial Product Generator (DRPPG) module, which in turn occupies less area and incurs less delay.
  • The proposed architecture was evaluated in terms of Field Programmable Gate Array (FPGA) and Application Specific Integrated Circuit (ASIC) performance against different architectures. The overall performance of the RFIR–APC–OMS architecture is better than that of the existing architectures.
The paper is organized as follows. Section 2 reviews previously published papers on RFIR filter architectures. Section 3 presents the WTM algorithm and the CLA addition operation within the RFIR architecture. Section 4 presents a comparative analysis of the RFIR–APC–OMS filter and conventional methods based on experimental results. Section 5 gives the overall conclusion of our work.

2. Related Works

Ramanathan et al., in Ref. [19], introduced a high-throughput and high-speed adaptive filter based on the DA technique. In the low-power adaptive FIR filter, the Least Mean Square (LMS) method was used to update the weights and minimize the Mean Square Error (MSE) between the desired response and the current filter output. Due to the usage of the pipelined DA table, switching activity increased, which made the proposed architecture consume more power.
Krishnamoorthy et al., in Ref. [20], introduced a reconfigurable FIR filter in the Very Large-Scale Integration (VLSI) domain. In this work, interpolation and a multi-standard digital up-converter were used to limit the area and power. The number of multiplications per input sample was reduced. In addition, a binary common sub-expression algorithm acted as an effective multiplier, which also improved the operating frequency. However, the coefficient generation process was not explained clearly.
Mohanty et al., in Ref. [21], proposed a reconfigurable FIR filter based on a VLSI architecture using distributed arithmetic operations. In this work, two types of FIR filter structure were analyzed, namely the transpose-form structure and the direct-form structure. According to the analysis, the transpose-form FIR structure required a larger number of registers than the direct-form FIR structure. The DA-based FIR filter is a reconfigurable system that achieves high scalability for some specific applications, namely larger block-size applications. The main disadvantage of the distributed arithmetic-based RFIR filter was that its performance was very low in ASIC implementation.
Pan and Kumar [22] implemented an FIR filter based on the bit-level optimization technique. In this work, the Multiple Constant Multiplication (MCM) algorithm was utilized to execute the FIR filter operation. MCM enables sub-expression elimination, which optimizes the adder tree. The proposed algorithm eliminates unwanted expressions to reduce computational time and hardware complexity. However, the MCM algorithm required additional submodules to store the intermediate register values, which caused more power consumption and higher delay.
Seik-Jae et al., in Ref. [23], illustrated an architectural approach for implementing a low-power RFIR filter structure. This approach was applicable to a filter whose order can change dynamically. The mathematical analysis of the power-saving process was explained clearly. The power and area savings were equal to 41.9% and 5.3%, respectively, compared to conventional methods, with little performance degradation. The bit-level optimization-based FIR filter gradually decreased the throughput and system speed, which caused high delays.
Roy and Chandra [24] proposed a triangular common subexpression elimination algorithm that uses fewer logic blocks in the design of FIR filters. In this work, more logical operators are required to generate the bit pattern. The subexpressions were eliminated once the bit values were processed on the average of the triangular model. The critical path of the proposed model required four adder steps for every bit of operation. The required multiplier and adder blocks occupied 1418 registers.
Tan et al., in Ref. [25], presented the forgetting-factor-based Recursive Least Squares (RLS) algorithm for the identification of the FIR process with input noise. The forgetting factor of the proposed model helped to process the recursive estimation of the noise variance. A linear array design model was used to perform the bias compensation, which varied across different FPGA devices. The proposed design, implemented on a Zynq FPGA, occupied 11500 LUTs, which is high for the RLS algorithm.
Sakthimohan and Deny [26] illustrated a 16-tap FIR filter using a Radix-4 Booth multiplier with the help of the Booth decoding method. The decoder and encoder modules depended on the input bits given to the filter design. The proposed design required a minimum number of steps to process the multiplication operation. However, the partial product operation in the multiplier module was performed with higher power consumption (1790 W).
Sumalatha et al., in Ref. [27], proposed the VLSI implementation of the FIR filter for ECG denoising applications. In this work, a Vedic multiplier was designed to perform the multiplication operation with less power consumption. The filter model was evaluated on different FPGA and ASIC platforms to obtain the area, power, and delay performances. The denoising application was implemented to remove the noise present in the ECG signals. However, the Vedic multiplier completed one multiplication only every eight clock cycles, which increased the latency of the design model.
Patali and Kassim [28] proposed an efficient design methodology using retiming and two-level pipelining techniques to improve the performance of the FIR filter architecture. In this paper, the two-level pipelining technique separated the multiplication and addition operations, which reduced the latency of the filter. Simulation results of a 64-tap FIR filter showed 35%, 33.97%, 38.06%, and 29.67% improvements in critical path delay, latency, power delay product, and area delay product, respectively. When the retiming technique was applied to the design model, the addition operation was broken, which caused degradation in the filter output.
Gandhi et al., in Ref. [29], presented an FIR filter implementation using a self-tunable addition and multiplication process. Two mechanisms were proposed in this paper. The first one reset the circuit and tuned the coefficients to a certain level. Here, the self-tuning system helped the circuit to auto-tune itself at runtime. The second one reduced the number of addition and multiplication nodes. The results showed that this design gave the highest threshold limit with a satisfactorily low area and delay. However, a single gate-level netlist required four adders and five multipliers, which resulted in 384 LUTs for a simple design model.
According to the overall analysis, real-time systems frequently have a high throughput requirement. Additionally, adaptable systems can be modified in accordance with new specifications. Furthermore, the performance enhancement obtained through parallel implementation typically calls for expensive dedicated software. It is crucial to monitor the configuration files to ensure that the right bits are used at the right locations on the FPGA. If one bit is misplaced, the configuration as a whole is invalid. Therefore, RFIR–APC–OMS is proposed, wherein the reconfigurable hardware devices allow the creation of unique, high-performance computing circuits while retaining the flexibility of software. This adaptability enables the use of FIR filters built into FPGAs in real-time and high-throughput applications.

3. RFIR–APC–OMS Methodology

The proposed RFIR–APC–OMS filter uses the WTM and CLA, which reduce the complexity of the multiplication process compared to a conventional multiplier design. The WTM is employed as an alternative to the shift-accumulator operation to increase system performance while avoiding undesired blocks that could increase area and power consumption. The system delay is decreased by using the CLA to perform additions at high speed. The DRPPG operation is carried out by the APC–OMS architecture with fewer slices. Therefore, compared to the current state-of-the-art RFIR filter designs, the proposed RFIR–APC–OMS architecture occupies less area and consumes less power.
The improvement of the FPGA hardware design process is an important research area in communication systems, as it has the potential to significantly enhance the performance of a signal processing system. In the proposed method, the FPGA architecture is used to implement the RFIR filter. The block-level representation of the proposed RFIR–APC–OMS architecture is presented in Figure 1. The algorithm of the Wallace tree design based on the RFIR architecture is presented in Section 3.1.

3.1. RFIR Filter Model Using Wallace Tree Design

The RFIR–APC–OMS block diagram fundamentally requires a parallel-level shifter and an accumulator for the addition and multiplication operations. In the proposed method, a reusable RAM-based LUT is utilized rather than a ROM-based LUT. In the RFIR–APC–OMS strategy, an effective plan helps to minimize the LUT count. Moreover, the CLA is used for the addition operation in the design of the FIR filter. The reconfigurable WTM-based FIR filter is utilized in the proposed method to improve the operation of the RFIR filter and reduce the complexity of multiplication. Furthermore, the RFIR architecture requires less hardware because it uses the WTM, which enhances the functionality of the entire system. In the proposed architecture, the length of the register is represented as N. However, registers are a limited resource in FPGA devices, and LUTs in the FPGA have only two-bit registers. Distributed RAM (DRAM) is used to realize the LUTs in the FPGA implementation. Moreover, L denotes the input bit length, and the sample duration equals L times the clock period of the design. Such a bit-serial design is rarely suitable for applications that demand high throughput.
In the LUT implementation, the DRAM is used to lower the resource utilization of each bit slice. Henceforth, this work splits the partial inner-product generator into $Q$ parallel units. Each of the $Q$ parallel units handles $R$ bit slices through $R$ time-multiplexed operations. Here, $L$ is assumed to be a composite number, $L = RQ$ (where $R$ and $Q$ are two positive integers). In Equation (1), the index $l$ is mapped to $r + qR$, where $r = 0, 1, 2, \ldots, R-1$ and $q = 0, 1, 2, \ldots, Q-1$. The value $S_{l,p}$ is given in Equation (2).
$$y = \sum_{l=0}^{L-1} 2^{-l} \left( \sum_{p=0}^{P-1} S_{l,p} \right) \tag{1}$$
$$S_{l,p} = \sum_{m=0}^{M-1} h(m + pM)\, [s(m + pM)]_{l} \tag{2}$$
$S_{l,p}$ is the sum of the partial products of $M$ samples, where $[\cdot]_{l}$ denotes the $l$-th bit of a sample. Here, M is represented as 222; the index $l$ takes the values $0, 1, 2, 3, \ldots, L-1$ and the index $p$ takes the values $0, 1, 2, \ldots, P-1$.
$$y = \sum_{q=0}^{Q-1} 2^{-Rq} \left[ \sum_{r=0}^{R-1} 2^{-r} \left( \sum_{p=0}^{P-1} S_{r+qR,\,p} \right) \right] \tag{3}$$
In Equation (3), q represents the index of the parallel unit, and r denotes the index of the time slot. The proposed architecture uses R time slots, each lasting one working clock period. In every period of R cycles, the filter delivers a single output. The RFIR–APC–OMS architecture has Q units, and each unit consists of P DRAMs together with APC, OMS, and pipeline adder tree (PAT) modules that perform the right-most summation. The shift accumulator performs the shift operation required by the second summation. Generally, the RFIR processes the input by multiplying it by the coefficients. Here, the multiplication is performed by the WTM. Instead of a normal digital adder, the CLA is used to improve the addition performance.
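The bit-sliced inner product described above can be checked numerically. The following is a minimal Python sketch, not the authors' Verilog design. It assumes unsigned L-bit input samples, a bit weight of 2^-l with l = 0 as the most significant bit, and the index mapping l = r + qR with L = RQ. All function names are hypothetical and introduced only for illustration.

```python
from fractions import Fraction


def bit_slice(x, l, L):
    """Bit l of an unsigned L-bit word x, where l = 0 is the most significant bit."""
    return (x >> (L - 1 - l)) & 1


def partial_sum(h, s, l, p, M, L):
    """S_{l,p}: M coefficients of section p times bit slice l of the matching samples."""
    return sum(h[m + p * M] * bit_slice(s[m + p * M], l, L) for m in range(M))


def da_bit_serial(h, s, P, M, L):
    """Bit-serial form: one bit slice per step, each weighted by 2^-l."""
    return sum(
        Fraction(1, 2 ** l) * sum(partial_sum(h, s, l, p, M, L) for p in range(P))
        for l in range(L)
    )


def da_time_multiplexed(h, s, P, M, R, Q):
    """Q parallel units, each handling R bit slices in R time slots (L = R * Q)."""
    L = R * Q
    total = Fraction(0)
    for q in range(Q):              # parallel units
        unit = Fraction(0)
        for r in range(R):          # time slots inside one unit
            l = r + q * R           # assumed index mapping
            unit += Fraction(1, 2 ** r) * sum(
                partial_sum(h, s, l, p, M, L) for p in range(P)
            )
        total += Fraction(1, 2 ** (R * q)) * unit
    return total


def direct_inner_product(h, s, L):
    """Reference: plain multiply-accumulate with samples scaled by 2^-(L-1)."""
    return Fraction(sum(hn * sn for hn, sn in zip(h, s)), 2 ** (L - 1))
```

For an 8-tap example with P = 2, M = 4, and L = 4 (R = Q = 2), both decomposed forms return the same value as the direct multiply-accumulate, which is the property the parallel units rely on.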

3.1.1. Working Principle of APC–OMS

The block diagram of APC–OMS is shown in Figure 2. In this section, the LUT memory reduction and the shift-complement (SC) coefficient generation are explained with the optimal designs. The design is implemented for various numbers of filter taps.
The 16-tap filter design generates 16 coefficients, which are stored in conventional LUT registers. The coefficient values are given in Table 1. From the conventional LUTs, the optimal LUTs are generated, which hold four rows of coefficients, as given in Table 2. The remaining coefficients are generated with the help of the SC-LUT technique. The optimal SC-based LUTs are given in Table 3. A detailed explanation of the proposed work is given in the mathematical operation below.

3.1.2. Mathematical Operation of the Proposed Method with an Example

  • In this work, APC and OMS designs are used to implement the RFIR filter.
  • In the conventional methods, 16 memory units are required to fetch the 16 values from the respective address. The main motivation for using the APC–OMS design is to reduce the memory unit count.
  • With the help of the APC–OMS design, 4 memory units (0000, 0010, 0100, and 0110) are enough to fetch the 16 values from the respective address.
  • Left-shifting and 2′s complement operations are applied to those 4 memory units, which yields the 16 data values.
Example.
Let us consider that the 1st memory unit is A = 0100.
The 1st memory unit undergoes the left-shifting operations (<<0, <<1, <<2, <<3), which produce the shifted outputs given in Equations (4)–(7).
$$LS_0 = A \ll 0 \tag{4}$$
$$LS_1 = A \ll 1 \tag{5}$$
$$LS_2 = A \ll 2 \tag{6}$$
$$LS_3 = A \ll 3 \tag{7}$$
After performing the shifting operation, the outputs $LS_0$, $LS_1$, $LS_2$, $LS_3$ are obtained as 0100, 1000, 0001, and 0010.
The values carried into the complement step are $LS_0 = 0100$; $LS_1 = 0010$; $LS_2 = 0101$; $LS_3 = 1000$.
The 2′s complement of each of these shifted outputs is then taken, as given in Equations (8)–(11).
$$\mathit{2s\,comp\_out}_0 = 2s(LS_0) \tag{8}$$
$$\mathit{2s\,comp\_out}_1 = 2s(LS_1) \tag{9}$$
$$\mathit{2s\,comp\_out}_2 = 2s(LS_2) \tag{10}$$
$$\mathit{2s\,comp\_out}_3 = 2s(LS_3) \tag{11}$$
  • $\mathit{2s\,comp\_out}_0$, the 2′s complement of $LS_0$, is 1100;
  • $\mathit{2s\,comp\_out}_1$, the 2′s complement of $LS_1$, is 1110;
  • $\mathit{2s\,comp\_out}_2$, the 2′s complement of $LS_2$, is 1011;
  • $\mathit{2s\,comp\_out}_3$, the 2′s complement of $LS_3$, is 1000.
As per the 2′s complement output, the 1st memory unit 0100 fetches the data from the four addresses (1100, 1110, 1011, 1000). Similarly, the remaining memory units (0000, 0010, and 0110) are processed in the APC–OMS process. According to this operation, 4 memory units are enough to fetch the 16 values from the respective memory addresses.
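The two primitive operations in this example, shifting a 4-bit word and taking its 2′s complement, can be sketched in Python as follows. This is only an illustrative model with hypothetical function names. The shift is modeled as a circular (rotating) left shift within 4 bits, which is an assumption chosen because it reproduces the first list of shifted outputs given above (0100, 1000, 0001, 0010); a plain logical shift would instead drop the bits moved past the most significant position.

```python
WIDTH = 4                      # word length used in the example
MASK = (1 << WIDTH) - 1        # 0b1111


def rotate_left(a, k):
    """Circular left shift of a WIDTH-bit word by k positions (assumed shift model)."""
    k %= WIDTH
    return ((a << k) | (a >> (WIDTH - k))) & MASK


def twos_complement(a):
    """2's complement of a WIDTH-bit word: invert all bits and add one, modulo 2^WIDTH."""
    return (-a) & MASK


if __name__ == "__main__":
    A = 0b0100
    shifted = [rotate_left(A, k) for k in range(4)]
    print([format(v, "04b") for v in shifted])

    # 2's complement of the LS values that the text carries into Equations (8)-(11)
    listed = [0b0100, 0b0010, 0b0101, 0b1000]
    print([format(twos_complement(v), "04b") for v in listed])
```

Applying `twos_complement` to the listed values 0100, 0010, 0101, and 1000 gives 1100, 1110, 1011, and 1000, matching the four addresses stated in the example.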
  • The optimal CLA and WTM are used in the shift accumulator module.
  • Due to the usage of the optimal logical block, the hardware utilization of the RFIR filter has been reduced.
  • Moreover, these optimal designs have a small delay, which increases the speed of the RFIR filter architecture.
  • The SC-based LUT design is selected because the results show a significant area reduction compared to previous works.
The schematic diagram of the WTM is shown in Figure 3. The output of the adders is given to the input of the Wallace tree multiplier. The WTM performs the multiplication operation efficiently. This design avoids unwanted blocks, which would otherwise lead to more area and power consumption.
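To illustrate why a Wallace tree shortens the multiplication path, the sketch below models the reduction at word level in Python: partial products are compressed three rows at a time by carry-save (3:2) adders until two rows remain, and only then is a single carry-propagating addition performed. This is a behavioral model under stated assumptions (unsigned operands, a hypothetical default width of 8 bits, hypothetical function names), not a gate-accurate netlist of the design in Figure 3.

```python
def carry_save_add(a, b, c):
    """3:2 compressor on whole words: returns (sum_word, carry_word) with a+b+c preserved."""
    sum_word = a ^ b ^ c
    carry_word = ((a & b) | (b & c) | (a & c)) << 1
    return sum_word, carry_word


def wallace_multiply(x, y, width=8):
    """Multiply two unsigned width-bit integers; returns (product, reduction_levels)."""
    # One partial product row per multiplier bit
    rows = [(x << i) if (y >> i) & 1 else 0 for i in range(width)]
    levels = 0
    while len(rows) > 2:
        full = (len(rows) // 3) * 3          # rows that form complete groups of three
        reduced = []
        for i in range(0, full, 3):
            reduced.extend(carry_save_add(rows[i], rows[i + 1], rows[i + 2]))
        reduced.extend(rows[full:])          # leftover rows pass to the next level
        rows = reduced
        levels += 1
    # Final carry-propagating addition (a fast adder such as a CLA in hardware)
    return sum(rows), levels
```

In this model, eight partial product rows shrink to two in four carry-save levels, none of which propagates a carry across the word; the only carry-propagating step is the final addition.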

3.1.3. Carry Look-Ahead Adder Design

In the proposed RFIR–APC–OMS filter architecture, the CLA is used for the addition operations, as shown in Figure 4. The CLA is faster than normal adders, which makes it suitable for the proposed method. Generally, four 4-bit adder blocks are required to design the 16-bit adder, as shown in Figure 4.
The 4-bit CLA module plays a vital role in designing the 16-bit CLA. Based on the 16-bit CLA, the operations are performed and the propagate (P) and generate (G) computations are evaluated, as shown in Figure 5. CLAs are generally designed as 4-bit modules, which are connected together to construct larger adders. In the RFIR–APC–OMS architecture, the 16-bit CLA is used to reduce the power, area, and delay.
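The construction of a 16-bit CLA from four 4-bit blocks can be sketched as a bit-level Python model. The sketch uses the standard definitions generate g = a AND b and propagate p = a XOR b, and reuses one look-ahead function both inside each 4-bit block and across the four blocks. The function names are hypothetical, and the two-level organization is an assumption about the structure in Figures 4 and 5 rather than a transcription of it.

```python
def lookahead4(g, p, c0):
    """Carry look-ahead over four (generate, propagate) pairs.

    Returns the carry into each of the four positions plus the group G and P.
    """
    c1 = g[0] | (p[0] & c0)
    c2 = g[1] | (p[1] & g[0]) | (p[1] & p[0] & c0)
    c3 = g[2] | (p[2] & g[1]) | (p[2] & p[1] & g[0]) | (p[2] & p[1] & p[0] & c0)
    group_g = g[3] | (p[3] & g[2]) | (p[3] & p[2] & g[1]) | (p[3] & p[2] & p[1] & g[0])
    group_p = p[3] & p[2] & p[1] & p[0]
    return [c0, c1, c2, c3], group_g, group_p


def cla4(a, b, c0):
    """4-bit CLA block: returns (4-bit sum, group generate, group propagate)."""
    g = [(a >> i) & (b >> i) & 1 for i in range(4)]
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(4)]
    carries, group_g, group_p = lookahead4(g, p, c0)
    total = sum((p[i] ^ carries[i]) << i for i in range(4))
    return total, group_g, group_p


def cla16(a, b, c0=0):
    """16-bit adder from four 4-bit CLA blocks; returns (16-bit sum, carry out)."""
    na = [(a >> (4 * k)) & 0xF for k in range(4)]
    nb = [(b >> (4 * k)) & 0xF for k in range(4)]
    # Group G and P do not depend on the carry-in, so they are available early.
    G, P = zip(*(cla4(na[k], nb[k], 0)[1:] for k in range(4)))
    block_carries, top_g, top_p = lookahead4(G, P, c0)
    total = 0
    for k in range(4):
        s, _, _ = cla4(na[k], nb[k], block_carries[k])
        total |= s << (4 * k)
    return total, top_g | (top_p & c0)
```

Because every carry is written as a flat AND-OR expression of the g and p signals, no carry has to ripple through the preceding bit positions, which is the source of the speed advantage over a ripple-carry adder.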

3.2. Coefficient Generation

The coefficients are produced with the MATLAB Filter Design and Analysis (FDA) tool, as shown in Figure 6. The generated coefficients are then used in the RFIR filter design. The filter specification is as follows:
  • The direct form FIR;
  • Design method—Equiripple;
  • Density factor—8;
  • Response type—Low pass.
Figure 6. FDA tool interface with selected parameters.
With the help of the filter response icon, the magnitude response is generated, which helps to optimize the overall design. The export option in the File menu is used to generate the coefficients. These coefficients are used in the Verilog program for simulating the FIR operation.
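Exported coefficients are floating-point values, so they have to be converted to fixed-point words before they can appear in HDL code. The following Python sketch shows one common way to do this. The 8-bit signed fractional format, the saturation behavior, the function names, and the example coefficient values are all assumptions made for illustration; the word length and coefficient values actually used in this work are not stated here.

```python
def quantize(coeffs, width=8):
    """Scale coefficients in [-1, 1) to signed width-bit integers, saturating at the limits."""
    scale = 1 << (width - 1)
    lo, hi = -scale, scale - 1
    return [max(lo, min(hi, round(c * scale))) for c in coeffs]


def to_verilog_literal(value, width=8):
    """Format a signed integer as a two's-complement Verilog hex literal."""
    digits = (width + 3) // 4
    return f"{width}'h{value & ((1 << width) - 1):0{digits}X}"


if __name__ == "__main__":
    # Hypothetical symmetric low-pass coefficients, not taken from the paper
    example = [-0.0125, 0.05, 0.25, 0.5, 0.25, 0.05, -0.0125]
    for q in quantize(example):
        print(to_verilog_literal(q))
```

A wider word reduces the quantization error of the coefficients at the cost of larger LUTs and adders, so the chosen width is a trade-off between frequency-response accuracy and area.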

4. Results and Discussion

The proposed method was evaluated using a PC with 8 GB of RAM, a 1.60 GHz Intel i5 processor, and a 1 TB hard disk. The proposed architecture is designed in Verilog HDL. Coefficients are generated using the MATLAB FDA tool. The Verilog HDL code is developed and simulated in the ModelSim 10.5 tool. The ASIC performance (area, delay, and power) is calculated using a Cadence RTL compiler. The FPGA performance and the hardware utilization of the proposed architecture are calculated using the Xilinx 14.4 ISE simulator, which synthesizes and implements the model, as shown in Figure 7.
Xilinx 14.4 ISE is a platform used to compile combinational and sequential design models at high speed. Based on the specifications, the RTL code is designed and synthesized in Xilinx 14.4 ISE. After synthesis, the frequency and the RTL schematic are obtained from the synthesis report. During the Implement Design step, the translate, map, and place & route operations are performed for the RFIR design architecture. After implementing the design in Xilinx 14.4 ISE, the LUTs, flip-flops, and slices are evaluated for the proposed RFIR design model. Xilinx 14.4 ISE was also used during experiments with the “Generate Programming File” option.
The area, power, and delay results of the 4-bit design are given in Table 4. In this comparison, the DA [7], CBA [9], and R2 designs are taken as references and compared with the proposed filter design. The RFIR–R2–CSLA, RFIR–R2–LCSLA, and RFIR–VM–CLA architectures were previously implemented by the same authors, and their results are also compared with the proposed method.
The comparisons of area, power, and delay are shown in Figure 8, Figure 9 and Figure 10, respectively. In these graphs, the first two groups (3-tap and 7-tap) are for 180 nm technology, and the remaining two groups (3-tap and 7-tap) are for 45 nm technology. The plots show that the proposed method requires less area, power, and delay than the conventional designs.
The area, power, and delay performance of the 8-bit design are given in Table 5. The 8-bit RFIR is based on the 4-bit design. The major change in the 8-bit RFIR filter is that the filter input consists of 8-bit values. With the aid of the optimal designs, the ASIC performance of the proposed work is improved compared to conventional works.
The LUT, flip-flop, slice, and frequency values are given in Table 6. Due to the usage of the WTM and CLA, the RFIR design requires less hardware, which helps to improve the overall system performance. Different FPGA devices are used to analyze the hardware utilization of the filter design. Due to the lower latency, the operating speed of the proposed design is also improved.
The comparisons of the LUTs, flip-flops, slices, and frequency are shown in Figure 11, Figure 12, Figure 13 and Figure 14, respectively. In these graphs, the first 3-tap and 7-tap groups represent the Virtex 4 results, the second 3-tap and 7-tap groups represent the Virtex 5 results, and the final 3-tap and 7-tap groups represent the Virtex 6 results. As seen from these plots, the FPGA performance of the RFIR–APC–OMS method is improved compared to the existing methods.
Similarly, the FPGA performance of the 8-bit filter design is compared with conventional designs. This architecture works in the same way as the 4-bit architecture, except that the design complexity is higher and the inputs are 8 bits in length. The 8-bit architecture also provides efficient performance results in terms of ASIC and FPGA evaluation. The RTL schematics of the main module and the internal modules, generated in Xilinx ISE, are shown in Figure 15 and Figure 16.
The proposed filter may be used, e.g., in the medical industry, to reduce the noise present in ECG and EEG signals. The filtering operation may aid in obtaining a clear view of the medical signals, which helps to identify the condition of the patient. The proposed filter can address numerous problems in the signal processing field that involve signal denoising and signal enhancement with restoration. During the denoising process, the proposed filter is monitored for data loss, since each signal is important for identifying the health condition of the patient [27]. Additionally, RFIR filters can solve various problems occurring in the wavelet transform [30].

5. Conclusions

In this work, the RFIR–APC–OMS architecture is designed in Verilog HDL and evaluated on FPGA and ASIC platforms. The RFIR–APC–OMS filter has been implemented using the WTM and CLA, which reduce the complexity of the multiplication process compared to a normal multiplier design. The WTM has been used as an alternative to the shift-accumulator operation to improve the system performance. The CLA performs the addition operation at high speed, which in turn reduces the system delay. The APC–OMS architecture has been used to perform the equivalent DRPPG operation with fewer slices. Hence, the proposed RFIR–APC–OMS architecture requires less area and less power than existing state-of-the-art RFIR filter designs. Compared to the existing Dynamic Reconfigurable Partial Product Generator (DRPPG) modules, the RFIR–APC–OMS architecture reduces the area by 3.44%, the power by 2.74%, and the delay by 3.84% (using ASIC 180 nm technology). The proposed RFIR–APC–OMS functions better for small variations in the filter order, which results in minimal hardware usage. However, operating the filters at lower orders cannot boost the throughput of these structures. This fact will be taken into account in upcoming studies. In the future, various types of filter architectures can be designed to further improve both FPGA and ASIC performance.

Author Contributions

The paper investigation, resources, data curation, writing—original draft preparation, writing—review and editing, and visualization were performed by K.S.R. The paper conceptualization and software were conducted by S.M. and A.M. The validation and formal analysis, methodology, supervision, project administration, and funding acquisition of the version to be published were conducted by P.F.-G. and P.B.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Not applicable.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Naveen, N.S.; Gupta, K.A. An efficient reconfigurable FIR digital filter using modified distribute arithmetic technique. Int. J. Emerg. Technol. Adv. Eng. 2015, 5, 152–156.
  2. Rasekh, A.; Bakhtiar, M.S. Design of low-power low-area tunable active RC filters. IEEE Trans. Circuits Syst. II Express Briefs 2018, 65, 6–10.
  3. Thakur, R.; Khare, K. High-speed FPGA implementation of FIR filter for DSP applications. Int. J. Model. Optim. 2013, 3, 92–94.
  4. Bhagyalakshmi, N.; Rekha, K.R.; Nataraj, K.R. Design and Implementation of DA-based Reconfigurable FIR Digital Filter on FPGA. In Proceedings of the 2015 International Conference on Emerging Research in Electronics, Computer Science and Technology, Mandya, India, 17–19 December 2015.
  5. Maamoun, M.; Hassani, A.; Dahmani, S.; Ait Saadi, H.; Zerari, G.; Chabini, N.; Beguenane, R. Efficient FPGA based architecture for high-order FIR filtering using simultaneous DSP and LUT reduced utilization. IET Circuits Devices Syst. 2021, 15, 475–484.
  6. Karthick, S.; Valarmathy, S.; Kamalanathan, C. Design and performance analysis of a reconfigurable FIR filter. Int. J. Innov. Eng. Technol. 2017, 8, 73–80.
  7. Meher, P.K.; Park, S.Y. High-throughput pipelined realization of adaptive FIR filter based on distributed arithmetic. In Proceedings of the 2011 IEEE/IFIP 19th International Conference on VLSI and System-on-Chip, Hong Kong, China, 3–5 October 2011.
  8. Bonetti, A.; Teman, A.; Flatresse, P.; Burg, A. Multipliers-driven perturbation of coefficients for low-power operation in reconfigurable FIR filters. IEEE Trans. Circuits Syst. I Regul. Pap. 2017, 64, 2388–2400.
  9. Reddy, K.S.; Suresh, H.N. A Low Power VLSI implementation of reconfigurable FIR filter using carry bypass adder. Int. J. Intell. Eng. Syst. 2018, 11, 225–236.
  10. Reddy, K.S.; Suresh, H.N. A low-power VLSI implementation of RFIR filter design using Radix-2 algorithm with LCSLA. IETE J. Res. 2019, 66, 741–750.
  11. Reddy, K.S.; Suresh, H.N. FPGA implementation of reconfigurable FIR filter using Vedic design with CLA adder. Int. J. Adv. Sci. Technol. 2019, 28, 144–161.
  12. Tsao, Y.C.; Choi, K. Area-efficient VLSI implementation for parallel linear-phase FIR digital filters of odd length based on fast FIR algorithm. IEEE Trans. Circuits Syst. II Express Briefs 2012, 59, 371–375.
  13. Khan, S.; Jaffery, Z.A. Low power FIR filter implementation on FPGA using parallel distributed arithmetic. In Proceedings of the 2015 Annual IEEE India Conference, New Delhi, India, 17–20 December 2015.
  14. Rashidi, B.; Rashidi, B.; Pourormazd, M. Design and implementation of low power digital FIR filter based on low power multipliers and adders on Xilinx FPGA. In Proceedings of the 2011 3rd International Conference on Electronics Computer Technology, Kanyakumari, India, 8–10 April 2011.
  15. Park, S.Y.; Meher, P.K. Efficient FPGA and ASIC realizations of DA-based reconfigurable FIR digital filter. IEEE Trans. Circuits Syst. II Express Briefs 2014, 61, 511–515.
  16. Mohanty, B.K.; Meher, P.K. A high-performance energy-efficient architecture for FIR adaptive filter based on new distributed arithmetic formulation of block LMS algorithm. IEEE Trans. Signal. Process. 2013, 61, 921–932.
  17. Mohanty, B.K.; Meher, P.K.; Al-Maadeed, S.; Amira, A. Memory footprint reduction for power-efficient realization of 2-D finite impulse response filters. IEEE Trans. Circuits Syst. I Regul. Pap. 2014, 61, 120–133.
  18. Mohanty, B.K.; Meher, P.K. A high-performance FIR filter architecture for fixed and reconfigurable applications. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2016, 24, 444–452.
  19. Ramanathan, S.; Anand, G.; Reddy, P.; Sridevi, S.A. Low power adaptive FIR filter based on distributed arithmetic. Int. J. Eng. Res. Appl. 2016, 6, 47–51.
  20. Krishnamoorthy, R.; Kalaivaani, P.T.; Thirumurugan, P. Performance evaluation of re-configurable VLSI architecture based on finite impulse response interpolation filter. Int. J. Recent Technol. Eng. 2018, 7, 484–491.
  21. Mohanty, B.K.; Meher, P.K.; Singhal, S.K.; Swamy, M.N.S. A high-performance VLSI architecture for reconfigurable FIR using distributed arithmetic. Integr. VLSI J. 2016, 54, 37–46.
  22. Pan, Y.; Meher, P.K. Bit-level optimization of adder-trees for multiple constant multiplications for efficient FIR filter implementation. IEEE Trans. Circuits Syst. I Regul. Pap. 2013, 61, 455–462. [Google Scholar] [CrossRef]
  23. Lee, S.J.; Choi, J.W.; Kim, S.W.; Park, J. A reconfigurable FIR filter architecture to trade off filter performance for dynamic power consumption. IEEE Trans. Very Large Scale Integr. VLSI Syst. 2010, 19, 2221–2228. [Google Scholar] [CrossRef]
  24. Roy, S.; Chandra, A. A triangular common subexpression elimination algorithm with reduced logic operators in FIR Filter. IEEE Trans. Circuits Syst. II Express Briefs 2020, 67, 3527–3531. [Google Scholar] [CrossRef]
  25. Tan, H.J.; Chan, S.C.; Lin, J.Q.; Sun, X. A new variable forgetting factor-based bias-compensated RLS algorithm for identification of FIR systems with input noise and its hardware implementation. IEEE Trans. Circuits Syst. I Regul. Pap. 2019, 67, 198–211. [Google Scholar] [CrossRef]
  26. Sakthimohan, M.; Deny, J. An optimistic design of 16-tap FIR filter with Radix-4 booth multiplier using improved booth recoding algorithm. Microprocess. Microsyst. 2020, 103453. [Google Scholar] [CrossRef]
  27. Sumalatha, M.; Naganjaneyulu, P.V.; Prasad, K.S. Low power and low area VLSI implementation of Vedic design FIR filter for ECG signal de-noising. Microprocess. Microsyst. 2019, 71, 102883. [Google Scholar] [CrossRef]
  28. Patali, P.; Kassim, S.T. High throughput and energy efficient FIR filter architectures using retiming and two level pipelining. Procedia Comput. Sci. 2020, 171, 617–626. [Google Scholar] [CrossRef]
  29. Gandhi, M.; Periyasamy, M.; Murugeswari, S.; Washburn, S.P.S. A VLSI implementation of FIR filter using self tunable addition and multiplication. Mater. Today Proc. 2020, 33, 4318–4322. [Google Scholar] [CrossRef]
  30. Radhakrishnan, P.; Themozhi, G. FPGA implementation of XOR-MUX full adder based DWT for signal processing applications. Microprocess. Microsyst. 2020, 73, 102961. [Google Scholar] [CrossRef]
Figure 1. Block-level illustration of the proposed RFIR–APC–OMS architecture.
Figure 2. Block diagram of the APC–OMS.
Figure 3. Schematic of the Wallace tree multiplier.
Figure 4. CLA adder design.
Figure 5. Block-level illustration of the 4-bit CLA design.
Figure 7. Xilinx ISE synthesis and implementation design.
Figure 8. Comparative analysis of area for 180 nm and 45 nm.
Figure 9. Comparative analysis of power for 180 nm and 45 nm.
Figure 10. Comparative analysis of delay for 180 nm and 45 nm.
Figure 11. Comparative analysis of LUT with various types of Virtex devices.
Figure 12. Comparative analysis of flip-flop with various types of Virtex devices.
Figure 13. Comparative analysis of slices with various types of Virtex devices.
Figure 14. Comparative analysis of the frequency with various types of Virtex devices.
Figure 15. Main module RTL design.
Figure 16. RTL schematic of the internal architecture.
Table 1. 16-tap and 16-bit LUT table.
| Address (C1) | 0000 | 0001 | 0010 | 0011 | 0100 | 0101 | 0110 | 0111 |
| Coefficient (C2) | 0A | 02A | 03A | 04A | 05A | 06A | 07A | 08A |
| Address (C3) | 1000 | 1001 | 1010 | 1011 | 1100 | 1101 | 1110 | 1111 |
| Coefficient (C4) | 15A | 14A | 13A | 12A | 11A | 10A | 9A | 16A |
Table 2. Symmetric property coefficient value.
| Address (C1) | 0000 | 0001 | 0010 | 0011 | 0100 | 0101 | 0110 | 0111 | 1000 |
| Coefficient (C2) | A | 2A | 3A | 4A | 5A | 6A | 7A | 8A | 16A |
Table 3. Optimized LUT using the SC algorithm.
| Address | Coefficient (C1) | Shift Count | Output |
|---|---|---|---|
| 4 bit: 0000 | 1 × A | 1 | 2 × A |
| | | 2 | 4 × A |
| | | 3 | 8 × A |
| | | 4 | 16 × A |
| 4 bit: 0010 | 03A | 1 | 6A |
| 4 bit: 0100 | 05A | 0 | |
| 4 bit: 0110 | 07A | 0 | |
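The idea behind Table 3 can be illustrated in software: only odd multiples of the coefficient are stored, and other multiples are obtained by left-shifting a stored value (for example, 1 × A shifted by 1 to 4 positions gives 2 × A to 16 × A). The following is a minimal Python sketch under that assumption. The function names `decompose`, `build_odd_lut`, and `multiply_via_lut` are hypothetical and are not taken from the paper, and the sketch does not model the hardware address encoding.

```python
def decompose(k: int) -> tuple[int, int]:
    """Split a positive multiplier k into (odd_part, shift) so that k == odd_part << shift."""
    if k <= 0:
        raise ValueError("multiplier must be a positive integer")
    shift = 0
    while k % 2 == 0:  # strip factors of two; each one becomes a left shift
        k //= 2
        shift += 1
    return k, shift


def build_odd_lut(a: int, max_multiple: int) -> dict[int, int]:
    """Store only the odd multiples 1*a, 3*a, 5*a, ... up to max_multiple (illustrative LUT)."""
    return {m: m * a for m in range(1, max_multiple + 1, 2)}


def multiply_via_lut(lut: dict[int, int], k: int) -> int:
    """Return k*a by reading the stored odd multiple and shifting it left."""
    odd, shift = decompose(k)
    return lut[odd] << shift


if __name__ == "__main__":
    a = 5                      # example coefficient value (assumption)
    lut = build_odd_lut(a, 16)  # 8 stored words cover multiples 1..16
    for k in (1, 2, 4, 8, 16, 6, 12):
        print(k, multiply_via_lut(lut, k))
```

With 16 possible multiples, the dictionary holds 8 entries instead of 16, which is the storage saving that the odd-multiple scheme aims for; the shift itself is cheap in hardware.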
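Table 4. Area, power, and delay performance for the 4-bit design.
4-Bit Input
| Technology | Architecture | Bits and Taps | Area [μm²] | Power [nW] | Delay [ps] | APP [μm² × nW] | ADP [μm² × ps] |
|---|---|---|---|---|---|---|---|
| 180 nm | DA-RFIR [7] | 4B and 3T | 214,781 | 884,722 | 178 | 10,021,475,882 | 38,231,018 |
| 180 nm | DA-RFIR [7] | 4B and 7T | 364,700 | 1,201,345 | 178 | 438,130,521,500 | 64,916,600 |
| 180 nm | LC-CBA-RFIR [9] | 4B and 3T | 201,475 | 814,360 | 165 | 164,073,181,000 | 33,243,375 |
| 180 nm | LC-CBA-RFIR [9] | 4B and 7T | 306,987 | 1,153,698 | 165 | 354,170,287,926 | 50,652,855 |
| 180 nm | RFIR-R2-CSLA | 4B and 3T | 186,413 | 795,214 | 151 | 148,238,227,382 | 28,148,363 |
| 180 nm | RFIR-R2-CSLA | 4B and 7T | 236,947 | 1,132,478 | 158 | 268,303,290,326 | 37,437,626 |
| 180 nm | RFIR-R2-LCSLA | 4B and 3T | 154,789 | 759,641 | 146 | 117,584,070,749 | 22,599,194 |
| 180 nm | RFIR-R2-LCSLA | 4B and 7T | 206,415 | 964,178 | 139 | 199,020,801,870 | 28,691,685 |
| 180 nm | RFIR-VM-CLA | 4B and 3T | 103,654 | 693,894 | 128 | 71,924,888,676 | 13,267,712 |
| 180 nm | RFIR-VM-CLA | 4B and 7T | 181,498 | 956,414 | 128 | 173,587,228,172 | 23,231,744 |
| 180 nm | RFIR-APC-OMS | 4B and 3T | 100,141 | 674,841 | 125 | 67,579,252,581 | 12,517,625 |
| 180 nm | RFIR-APC-OMS | 4B and 7T | 162,478 | 936,421 | 125 | 152,147,811,238 | 20,309,750 |
| 45 nm | DA-RFIR [7] | 4B and 3T | 6421 | 42,015 | 198 | 269,778,315 | 1,271,358 |
| 45 nm | DA-RFIR [7] | 4B and 7T | 6991 | 49,579 | 194 | 346,606,789 | 1,356,254 |
| 45 nm | LC-CBA-RFIR [9] | 4B and 3T | 5098 | 39,798 | 171 | 202,890,204 | 871,758 |
| 45 nm | LC-CBA-RFIR [9] | 4B and 7T | 6425 | 40,514 | 172 | 260,302,450 | 1,105,100 |
| 45 nm | RFIR-R2-CSLA | 4B and 3T | 4841 | 35,087 | 170 | 169,856,167 | 822,970 |
| 45 nm | RFIR-R2-CSLA | 4B and 7T | 5099 | 38,894 | 170 | 198,320,506 | 866,830 |
| 45 nm | RFIR-R2-LCSLA | 4B and 3T | 3947 | 32,614 | 164 | 128,727,458 | 647,308 |
| 45 nm | RFIR-R2-LCSLA | 4B and 7T | 4198 | 36,524 | 164 | 153,327,752 | 688,472 |
| 45 nm | RFIR-VM-CLA | 4B and 3T | 2295 | 29,017 | 160 | 66,594,015 | 367,200 |
| 45 nm | RFIR-VM-CLA | 4B and 7T | 2301 | 29,594 | 159 | 68,095,794 | 365,859 |
| 45 nm | RFIR-APC-OMS | 4B and 3T | 2201 | 27,014 | 154 | 59,457,814 | 338,954 |
| 45 nm | RFIR-APC-OMS | 4B and 7T | 2245 | 27,954 | 155 | 62,756,730 | 347,975 |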
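The column units in Table 4 suggest that APP is the area–power product (μm² × nW) and ADP is the area–delay product (μm² × ps). The short Python sketch below assumes exactly that reading and recomputes both figures for the two RFIR-APC-OMS 4B and 3T rows; the class name `Design` and its field names are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Design:
    """One table row: area in um^2, power in nW, delay in ps (units as given in the table header)."""
    area_um2: int
    power_nw: int
    delay_ps: int

    @property
    def app(self) -> int:
        # Area-power product, assumed to be a plain multiplication
        return self.area_um2 * self.power_nw

    @property
    def adp(self) -> int:
        # Area-delay product, assumed to be a plain multiplication
        return self.area_um2 * self.delay_ps


if __name__ == "__main__":
    apc_oms_180nm = Design(area_um2=100_141, power_nw=674_841, delay_ps=125)
    apc_oms_45nm = Design(area_um2=2_201, power_nw=27_014, delay_ps=154)
    for name, d in (("180 nm", apc_oms_180nm), ("45 nm", apc_oms_45nm)):
        print(f"{name}: APP = {d.app:,}  ADP = {d.adp:,}")
```

For these two rows the recomputed products agree with the tabulated APP and ADP values, which supports the assumed definitions; the same check can be run on any other row to spot transcription problems.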
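Table 5. Area, power, and delay performance for the 8-bit design.
8-Bit Input
| Technology | Architecture | Bits and Taps | Area [μm²] | Power [nW] | Delay [ps] | APP [μm² × nW] | ADP [μm² × ps] |
|---|---|---|---|---|---|---|---|
| 180 nm | DA–RFIR [7] | 8B and 3T | 256,478 | 241,897,112 | 279 | 679,402,784,138 | 71,557,362 |
| 180 nm | DA–RFIR [7] | 8B and 7T | 266,457 | 2,431,657 | 278 | 647,932,029,249 | 74,075,046 |
| 180 nm | LC–CBA–RFIR [9] | 8B and 3T | 234,674 | 2,240,984 | 265 | 525,900,679,216 | 62,188,610 |
| 180 nm | LC–CBA–RFIR [9] | 8B and 7T | 254,613 | 2,314,521 | 270 | 589,307,135,373 | 68,745,510 |
| 180 nm | RFIR–R2–CSLA | 8B and 3T | 201,556 | 193,225,220 | 265 | 389,457,024,423 | 53,412,340 |
| 180 nm | RFIR–R2–CSLA | 8B and 7T | 224,513 | 1,834,612 | 265 | 411,894,243,956 | 59,495,945 |
| 180 nm | RFIR–R2–LCSLA | 8B and 3T | 201,450 | 193,215,421 | 261 | 389,232,465,604 | 52,578,450 |
| 180 nm | RFIR–R2–LCSLA | 8B and 7T | 214,781 | 1,984,548 | 258 | 426,243,203,988 | 55,413,498 |
| 180 nm | RFIR–VM–CLA | 8B and 3T | 192,357 | 1,351,544 | 130 | 259,978,949,208 | 25,006,410 |
| 180 nm | RFIR–VM–CLA | 8B and 7T | 192,962 | 1,140,187 | 130 | 220,012,763,894 | 25,085,060 |
| 180 nm | RFIR–APC–OMS | 8B and 3T | 192,247 | 1,241,063 | 125 | 238,590,638,561 | 24,030,875 |
| 180 nm | RFIR–APC–OMS | 8B and 7T | 192,847 | 1,057,894 | 125 | 204,011,684,218 | 24,105,875 |
| 45 nm | DA–RFIR [7] | 8B and 3T | 13,347 | 99,420 | 195 | 1,326,958,740 | 2,602,665 |
| 45 nm | DA–RFIR [7] | 8B and 7T | 13,457 | 94,152 | 197 | 1,267,003,464 | 2,651,029 |
| 45 nm | LC–CBA–RFIR [9] | 8B and 3T | 10,428 | 8,945,243 | 189 | 932,805,456 | 1,970,892 |
| 45 nm | LC–CBA–RFIR [9] | 8B and 7T | 12,471 | 91,247 | 184 | 1,137,941,337 | 2,294,664 |
| 45 nm | RFIR–R2–CSLA | 8B and 3T | 9478 | 85,186 | 171 | 807,392,908 | 1,670,550 |
| 45 nm | RFIR–R2–CSLA | 8B and 7T | 9614 | 84,754 | 175 | 814,824,956 | 1,682,450 |
| 45 nm | RFIR–R2–LCSLA | 8B and 3T | 9426 | 8,515,222 | 169 | 802,644,825 | 1,592,994 |
| 45 nm | RFIR–R2–LCSLA | 8B and 7T | 8414 | 86,541 | 169 | 728,155,974 | 1,421,966 |
| 45 nm | RFIR–VM–CLA | 8B and 3T | 3772 | 5,489,347 | 159 | 207,056,396 | 599,748 |
| 45 nm | RFIR–VM–CLA | 8B and 7T | 3795 | 5,592,854 | 159 | 212,246,760 | 603,405 |
| 45 nm | RFIR–APC–OMS | 8B and 3T | 3654 | 534,781 | 154 | 195,408,612 | 562,716 |
| 45 nm | RFIR–APC–OMS | 8B and 7T | 3641 | 541,872 | 154 | 197,294,867 | 560,714 |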
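Table 6. LUT, flip-flop, slices, and frequency performance for the 4-bit design.
| Target FPGA Device | Methodology | Bits and Taps | No. of LUTs | No. of Flip-Flops | No. of Slices | Frequency [MHz] |
|---|---|---|---|---|---|---|
| Virtex-4 Xc4vfx12 | DA–RFIR [7] | 4B and 3T | 82 | 54 | 57 | 221.145 |
| Virtex-4 Xc4vfx12 | DA–RFIR [7] | 4B and 7T | 142 | 98 | 105 | 110.214 |
| Virtex-4 Xc4vfx12 | LC–CBA–RFIR [9] | 4B and 3T | 78 | 46 | 52 | 235.120 |
| Virtex-4 Xc4vfx12 | LC–CBA–RFIR [9] | 4B and 7T | 138 | 90 | 96 | 115.312 |
| Virtex-4 Xc4vfx12 | RFIR–R2–CSLA | 4B and 3T | 66 | 44 | 42 | 254.754 |
| Virtex-4 Xc4vfx12 | RFIR–R2–CSLA | 4B and 7T | 130 | 82 | 88 | 136.418 |
| Virtex-4 Xc4vfx12 | RFIR–R2–LCSLA | 4B and 3T | 57 | 48 | 41 | 278.36 |
| Virtex-4 Xc4vfx12 | RFIR–R2–LCSLA | 4B and 7T | 128 | 81 | 84 | 141.25 |
| Virtex-4 Xc4vfx12 | RFIR–VM–CLA | 4B and 3T | 42 | 35 | 31 | 315.706 |
| Virtex-4 Xc4vfx12 | RFIR–VM–CLA | 4B and 7T | 110 | 70 | 79 | 160.962 |
| Virtex-4 Xc4vfx12 | RFIR–APC–OMS | 4B and 3T | 38 | 31 | 28 | 321.141 |
| Virtex-4 Xc4vfx12 | RFIR–APC–OMS | 4B and 7T | 105 | 66 | 75 | 164.215 |
| Virtex-5 xc5vlx20t | DA–RFIR [7] | 4B and 3T | 88 | 108 | 52 | 210.54 |
| Virtex-5 xc5vlx20t | DA–RFIR [7] | 4B and 7T | 98 | 46 | 42 | 142.130 |
| Virtex-5 xc5vlx20t | LC–CBA–RFIR [9] | 4B and 3T | 74 | 94 | 45 | 224.125 |
| Virtex-5 xc5vlx20t | LC–CBA–RFIR [9] | 4B and 7T | 92 | 46 | 42 | 154.216 |
| Virtex-5 xc5vlx20t | RFIR–R2–CSLA | 4B and 3T | 72 | 92 | 41 | 233.36 |
| Virtex-5 xc5vlx20t | RFIR–R2–CSLA | 4B and 7T | 87 | 49 | 45 | 139.54 |
| Virtex-5 xc5vlx20t | RFIR–R2–LCSLA | 4B and 3T | 65 | 84 | 41 | 241.36 |
| Virtex-5 xc5vlx20t | RFIR–R2–LCSLA | 4B and 7T | 86 | 47 | 43 | 171.24 |
| Virtex-5 xc5vlx20t | RFIR–VM–CLA | 4B and 3T | 43 | 35 | 23 | 289.763 |
| Virtex-5 xc5vlx20t | RFIR–VM–CLA | 4B and 7T | 77 | 70 | 36 | 196.398 |
| Virtex-5 xc5vlx20t | RFIR–APC–OMS | 4B and 3T | 41 | 31 | 20 | 294.324 |
| Virtex-5 xc5vlx20t | RFIR–APC–OMS | 4B and 7T | 74 | 66 | 32 | 201.654 |
| Virtex-6 Xc6vcx75t | DA–RFIR [7] | 4B and 3T | 91 | 54 | 69 | 78.36 |
| Virtex-6 Xc6vcx75t | DA–RFIR [7] | 4B and 7T | 132 | 72 | 96 | 54.152 |
| Virtex-6 Xc6vcx75t | LC–CBA–RFIR [9] | 4B and 3T | 81 | 45 | 60 | 80.124 |
| Virtex-6 Xc6vcx75t | LC–CBA–RFIR [9] | 4B and 7T | 124 | 65 | 92 | 62.145 |
| Virtex-6 Xc6vcx75t | RFIR–R2–CSLA | 4B and 3T | 76 | 40 | 54 | 84.612 |
| Virtex-6 Xc6vcx75t | RFIR–R2–CSLA | 4B and 7T | 116 | 60 | 84 | 68.154 |
| Virtex-6 Xc6vcx75t | RFIR–R2–LCSLA | 4B and 3T | 69 | 39 | 52 | 96.32 |
| Virtex-6 Xc6vcx75t | RFIR–R2–LCSLA | 4B and 7T | 115 | 59 | 83 | 74.25 |
| Virtex-6 Xc6vcx75t | RFIR–VM–CLA | 4B and 3T | 62 | 36 | 48 | 121.24 |
| Virtex-6 Xc6vcx75t | RFIR–VM–CLA | 4B and 7T | 112 | 56 | 79 | 85.364 |
| Virtex-6 Xc6vcx75t | RFIR–APC–OMS | 4B and 3T | 54 | 33 | 44 | 124.210 |
| Virtex-6 Xc6vcx75t | RFIR–APC–OMS | 4B and 7T | 99 | 49 | 70 | 94.369 |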
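To read the comparisons in Table 6 as relative improvements, a signed percentage change against a baseline is often convenient. The sketch below is one way to compute it in Python; the helper name `percent_change` is hypothetical, and the choice of RFIR–VM–CLA as the baseline is an assumption made only for this example, using the Virtex-4, 4B and 3T rows.

```python
def percent_change(baseline: float, proposed: float) -> float:
    """Signed change of `proposed` relative to `baseline`, in percent.

    Negative values mean the proposed design uses less (e.g., fewer LUTs);
    positive values mean it achieves more (e.g., a higher clock frequency).
    """
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return 100.0 * (proposed - baseline) / baseline


if __name__ == "__main__":
    # Virtex-4, 4B and 3T: RFIR-VM-CLA (baseline) vs. RFIR-APC-OMS (proposed)
    rows = {
        "LUTs": (42, 38),
        "Flip-flops": (35, 31),
        "Slices": (31, 28),
        "Frequency [MHz]": (315.706, 321.141),
    }
    for metric, (base, prop) in rows.items():
        print(f"{metric}: {percent_change(base, prop):+.2f}%")
```

A signed value keeps resource metrics (where lower is better) and frequency (where higher is better) in one helper without a separate "reduction" function.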
Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

MDPI and ACS Style

Reddy, K.S.; Madhavan, S.; Falkowski-Gilski, P.; Divakarachari, P.B.; Mathiyalagan, A. Efficient FPGA Implementation of an RFIR Filter Using the APC–OMS Technique with WTM for High-Throughput Signal Processing. Electronics 2022, 11, 3118. https://doi.org/10.3390/electronics11193118
