MDPI - Publisher of Open Access Journals

8 pages, 921 KB

Open Access Proceeding Paper

Design of Complementary Metal–Oxide–Semiconductor Encoder/Decoder with Compact Circuit Structure for Booth Multiplier

by Yu-Nsin Wang and Yu-Cherng Hung

Eng. Proc. 2025, 103(1), 21; https://doi.org/10.3390/engproc2025103021 - 1 Sep 2025

Viewed by 834

Multipliers are crucial components in digital processing and in the arithmetic logic unit (ALU) of central processing unit (CPU) designs. As the data bit length increases, the number of partial products in the multiplication process increases, resulting in an increased summation time for the partial products. Consequently, the speed of the multiplier circuit is adversely affected by increased time delays. In this article, we present a combined radix-4 Booth encoding module that employs metal–oxide–semiconductor (MOS) transistors that share common control signals to reduce the transistor count. HSPICE simulations verified the functionality of the proposed circuit architecture and confirmed the reduction in the number of transistors used.

(This article belongs to the Proceedings of The 8th Eurasian Conference on Educational Innovation 2025)
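The radix-4 Booth recoding mentioned in the abstract above can be illustrated in software. The following is a minimal sketch of the textbook recoding, not the transistor-level encoder/decoder proposed in the paper; the function names `booth_radix4_digits` and `booth_multiply` are hypothetical. It shows how an n-bit signed multiplier is recoded into n/2 digits in {-2, -1, 0, 1, 2}, which halves the number of partial products.

```python
def booth_radix4_digits(multiplier: int, bits: int) -> list:
    """Recode a signed multiplier into radix-4 Booth digits, least significant first.

    Textbook recoding only (illustrative names, not the paper's circuit).
    """
    if bits % 2:
        bits += 1  # sign-extend to an even width so the bits group into pairs
    if not -(1 << (bits - 1)) <= multiplier < (1 << (bits - 1)):
        raise ValueError("multiplier does not fit in the given width")
    # Two's-complement bit pattern with the implicit bit y[-1] = 0 appended on the right.
    y = (multiplier & ((1 << bits) - 1)) << 1
    # Each overlapping triplet (y[2k+1], y[2k], y[2k-1]) maps to one signed digit.
    table = {0b000: 0, 0b001: 1, 0b010: 1, 0b011: 2,
             0b100: -2, 0b101: -1, 0b110: -1, 0b111: 0}
    return [table[(y >> i) & 0b111] for i in range(0, bits, 2)]


def booth_multiply(multiplicand: int, multiplier: int, bits: int) -> int:
    """Sum the partial products digit * multiplicand * 4**k."""
    digits = booth_radix4_digits(multiplier, bits)
    return sum(d * multiplicand * (4 ** k) for k, d in enumerate(digits))


if __name__ == "__main__":
    print(booth_radix4_digits(7, 4))   # digits for 7 = -1 + 2*4
    print(booth_multiply(5, -3, 4))
```

An 8-bit multiplier yields four digits, so only four partial products need to be summed instead of eight.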

11 pages, 1974 KB

Open Access Proceeding Paper

Chip Design of Multithreaded and Pipelined RISC-V Microcontroller Unit

by Mao-Hsu Yen, Yih-Hsia Lin, Tzu-Feng Lin, Yu-Hui Chen, Yuan-Fu Ku and Chien-Ting Kao

Eng. Proc. 2025, 89(1), 31; https://doi.org/10.3390/engproc2025089031 - 28 Feb 2025

Cited by 1 | Viewed by 2484

Abstract

Multithreading is widely used in microcontroller unit (MCU) chips. Multithreaded hardware is composed of multiple identical single threads and provides instructions to different threads. Using the concept of thread-level parallelism (TLP), the pauses that occur during single-thread operation are filled with work from other threads to increase the throughput of the same unit. The principle of pipelining is to use instruction-level parallelism (ILP) by splitting the MCU into multiple stages. While one instruction occupies a certain stage, other instructions operate in the otherwise idle stages, which improves execution efficiency. Based on the four-thread and pipelined RISC-V MCU architecture, we analyzed the instruction types of three benchmarks, i.e., Coremark, SHA, and Dijkstra. A total of 94% of the instructions use the arithmetic logic unit (ALU). Based on the executable four-thread architecture, we developed RISC-V architectures with two to four ALUs and a dispatch algorithm. This architecture allows for the simultaneous delivery of multiple instructions, enabling parallel processing of instructions and increasing efficiency. Compared to the traditional RISC-V architecture with only one ALU, the test results showed that the instructions per clock (IPC) of the RISC-V architectures with two, three, and four ALUs increased by 76, 128.9, and 154.3%, while the area increased by 12, 22.3, and 32.6% and the static power consumption increased by 5.1, 9.2, and 13.3%, respectively. The results showed a significant improvement in performance with only a slight increase in the area. Due to the limited chip area, a two-thread microcontroller architecture was used for the IC design and tape-out. The chip was implemented in TSMC’s 180 nm process with an area of 1190 μm × 1190 μm and a clock frequency of 133 MHz.

(This article belongs to the Proceedings of 2024 IEEE 7th International Conference on Knowledge Innovation and Invention)
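The trade-off reported in the abstract above can be made concrete with a short calculation. This sketch uses only the percentages quoted in the abstract; the metric "IPC per unit area relative to the one-ALU baseline" and the function name `relative_ipc_per_area` are assumptions introduced here for illustration and are not taken from the paper.

```python
# Percent increases over the one-ALU baseline, as quoted in the abstract.
ALU_COUNTS = (2, 3, 4)
IPC_GAIN_PCT = (76.0, 128.9, 154.3)
AREA_GAIN_PCT = (12.0, 22.3, 32.6)


def relative_ipc_per_area(ipc_gain_pct: float, area_gain_pct: float) -> float:
    """IPC per unit area, normalized so that the one-ALU baseline equals 1.0.

    This metric is an illustrative assumption, not one defined in the paper.
    """
    return (1.0 + ipc_gain_pct / 100.0) / (1.0 + area_gain_pct / 100.0)


ratios = {
    n: relative_ipc_per_area(ipc, area)
    for n, ipc, area in zip(ALU_COUNTS, IPC_GAIN_PCT, AREA_GAIN_PCT)
}

if __name__ == "__main__":
    for n, r in ratios.items():
        print(f"{n} ALUs: {r:.3f}x IPC per unit area")
```

Under this assumed metric, each added ALU still improves IPC per unit area, although the marginal benefit shrinks as the ALU count grows.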

15 pages, 1714 KB

Open Access Article

SAluMC: Thwarting Side-Channel Attacks via Random Number Injection in RISC-V

by Shibo Dang, Yunlong Shao, Zhida Li, Adetokunbo Makanju and Thomas Aaron Gulliver

Entropy 2025, 27(2), 202; https://doi.org/10.3390/e27020202 - 14 Feb 2025

Cited by 3 | Viewed by 1932

Abstract

As processor performance advances, the cache has become an essential component of computer architecture. Moreover, the rapid digital transformation of daily life has resulted in electronic devices storing greater amounts of sensitive information. Thus, device users are becoming more concerned about the security of their personal information, so improving processor performance is no longer the sole priority. Hardware vulnerabilities are generally more difficult to detect and address than software viruses and related threats. A common technique for exploiting hardware vulnerabilities is the side-channel attack, which can bypass software security to extract personal information directly from hardware components such as the cache or registers. This paper introduces a novel architecture for the arithmetic logic unit (ALU) and associated memory controller (MC) based on the RISC-V microarchitecture to mitigate side-channel attacks. The proposed approach employs hardware-generated random numbers and has minimal design costs, negligible impact on the original system structure, seamless integration, and easy modification of internal components. Results are presented that show it is effective against side-channel attacks.

(This article belongs to the Section Information Theory, Probability and Statistics)

13 pages, 3558 KB

Open Access Article

Multi-Layer QCA Reversible Full Adder-Subtractor Using Reversible Gates for Reliable Information Transfer and Minimal Power Dissipation on Universal Quantum Computer

by Jun-Cheol Jeon

Appl. Sci. 2024, 14(19), 8886; https://doi.org/10.3390/app14198886 - 2 Oct 2024

Cited by 3 | Viewed by 2598

Abstract

The effects of quantum mechanics dominate nanoscale devices, where Moore’s law no longer holds true. Additionally, with the recent rapid development of quantum computers, reversible gates that overcome the problems of energy and information loss, and the nano-level quantum-dot cellular automata (QCA) technology that implements them efficiently, are in the spotlight. In this study, a full adder-subtractor, a core operation of the arithmetic and logic unit (ALU), the most important hardware device in computer operations, is implemented as a circuit capable of reversible operation using QCA-based reversible gates. The proposed circuit consists of one reversible QCA gate and two Feynman gates and is designed as a multi-layer structure for efficient use of area and minimization of delay. The proposed circuit is tested in QCADesigner 2.0.3 and QCADesigner-E 2.2 and shows the best performance and lowest energy dissipation. In particular, it shows improvement rates of 180% and 562% in two representative standard design cost indicators compared to the best existing studies, and also shows the highest circuit average output polarization.

16 pages, 7976 KB

Open Access Article

Design of All-Optical D Flip Flop Memory Unit Based on Photonic Crystal

by Yonatan Pugachov, Moria Gulitski and Dror Malka

Nanomaterials 2024, 14(16), 1321; https://doi.org/10.3390/nano14161321 - 6 Aug 2024

Cited by 9 | Viewed by 3515

Abstract

This paper proposes a unique configuration for an all-optical D Flip Flop (D-FF) utilizing a quasi-square ring resonator (RR) and T-Splitter, as well as NOT and OR logic gates within a two-dimensional (2D) square lattice photonic crystal (PC) structure. The components realizing the all-optical D-FF consist of optical waveguides in a 2D square lattice PC of 45 × 23 silicon (Si) rods in a silica (SiO₂) substrate. The use of these specific materials facilitates fabrication of the design, in contrast to alternative approaches that employ an air substrate, which cannot be fabricated. The configuration was examined and simulated using both plane-wave expansion (PWE) and finite-difference time-domain (FDTD) methods. The simulation results demonstrate that the designed waveguides and RR implement the operating principles of the D-FF by guiding light as intended. The suggested configuration holds promise as a logic block within all-optical arithmetic logic units (ALUs) designed for digital computing optical circuits. The design was optimized for operation within the C-band spectrum, particularly at 1550 nm. The results reveal a clear distinction between logic states ‘1’ and ‘0’, which makes decisions on the receiver side more robust and minimizes logic errors in the photonic decision circuit. The D-FF displays a contrast ratio (CR) of 4.77 dB, a stabilization time of 0.66 ps, and a footprint of 21 μm × 12 μm.

(This article belongs to the Section Nanophotonics Materials and Devices)
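To put the contrast ratio quoted in the abstract above in linear terms, the sketch below converts between decibels and an optical power ratio. It assumes the common definition CR = 10·log10(P1/P0), where P1 and P0 are the output powers for logic '1' and logic '0'; the abstract does not state which definition the authors use, and the function names are hypothetical.

```python
import math


def contrast_ratio_db(power_high: float, power_low: float) -> float:
    """Contrast ratio in dB, assuming CR = 10*log10(P1/P0) (a common definition)."""
    return 10.0 * math.log10(power_high / power_low)


def power_ratio_from_db(cr_db: float) -> float:
    """Inverse conversion: linear power ratio P1/P0 for a contrast ratio in dB."""
    return 10.0 ** (cr_db / 10.0)


if __name__ == "__main__":
    # Under the assumed definition, 4.77 dB is close to a 3:1 power ratio.
    print(round(power_ratio_from_db(4.77), 3))
```

Under this assumption, the reported 4.77 dB corresponds to a logic '1' output roughly three times as strong as the logic '0' output.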

18 pages, 688 KB

Open Access Article

An Optimized Hardware Implementation of a Non-Adjacent Form Algorithm Using Radix-4 Multiplier for Binary Edwards Curves

by Asher Sajid, Omar S. Sonbul, Muhammad Rashid, Muhammad Arif and Amar Y. Jaffar

Appl. Sci. 2024, 14(1), 54; https://doi.org/10.3390/app14010054 - 20 Dec 2023

Cited by 1 | Viewed by 2245

Abstract

Binary Edwards Curves (BEC) play a pivotal role in modern cryptographic processes and applications, offering a combination of robust security and computational efficiency. For robust security, this article harnesses the inherent strengths of BEC for the cryptographic point multiplication process by utilizing the Non-Adjacent Form (NAF) algorithm. For computational efficiency, a hardware architecture for the NAF algorithm is proposed. Central to this architecture is an Arithmetic Logic Unit (ALU) designed for streamlined execution of essential operations, including addition, squaring, and multiplication. One notable innovation in our ALU design is the integration of multiplexers, which maximize ALU efficiency with minimal additional hardware requirements. Complementing the optimized ALU, the proposed architecture incorporates a radix-4 multiplier, renowned for its efficiency in both multiplication and reduction. It eliminates resource-intensive divisions, resulting in a substantial boost to overall computational speed. The architecture is implemented on Xilinx Virtex series Field-Programmable Gate Arrays (FPGAs). It achieves throughput-to-area ratios of 14.819 (Virtex-4), 25.5 (Virtex-5), 34.58 (Virtex-6), and 37.07 (Virtex-7). These outcomes underscore the efficacy of our optimizations, emphasizing an equilibrium between computational performance and area utilization.
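The Non-Adjacent Form used for point multiplication in the abstract above can be sketched in a few lines. This is the standard textbook NAF recoding and a generic double-and-add loop, not the hardware architecture of the paper; `naf` and `scalar_multiply` are hypothetical names, and plain integers under addition stand in for curve points so that the example stays self-contained.

```python
def naf(k: int) -> list:
    """Return the non-adjacent form of k >= 0 as digits in {-1, 0, 1}, least significant first."""
    digits = []
    while k > 0:
        if k & 1:
            d = 2 - (k % 4)  # +1 if k = 1 (mod 4), -1 if k = 3 (mod 4)
            k -= d           # makes k divisible by 4, so the next digit is 0
        else:
            d = 0
        digits.append(d)
        k >>= 1
    return digits


def scalar_multiply(k, point, add, negate, identity):
    """Left-to-right double-and-add/subtract driven by the NAF digits of k.

    `add`, `negate` and `identity` describe the group; they are placeholders
    for real curve arithmetic, which this sketch does not implement.
    """
    result = identity
    for d in reversed(naf(k)):
        result = add(result, result)  # doubling step
        if d == 1:
            result = add(result, point)
        elif d == -1:
            result = add(result, negate(point))
    return result


if __name__ == "__main__":
    print(naf(7))  # 7 = 8 - 1
    print(scalar_multiply(29, 5, lambda a, b: a + b, lambda a: -a, 0))
```

Because no two adjacent NAF digits are nonzero, on average only about one third of the digits trigger an addition or subtraction, which is why NAF is attractive for point multiplication.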

12 pages, 944 KB

Open Access Article

Accelerating DSP Applications on a 16-Bit Processor: Block RAM Integration and Distributed Arithmetic Approach

by Bharathi M, Krithikaa Mohanarangam, Yasha Jyothi M Shirur and Jun Rim Choi

Electronics 2023, 12(20), 4236; https://doi.org/10.3390/electronics12204236 - 13 Oct 2023

Cited by 4 | Viewed by 3274

Abstract

Modern processors have improved performance but still face challenges such as power consumption, storage limitations, and the need for faster processing. The 16-bit Digital Signal Processors (DSPs) accelerate DSP applications by significantly enhancing speed and performance for tasks including audio processing, telecommunications, image and video processing, wireless communication, and consumer electronics. This paper presents a novel technique for accelerating DSP applications on a 16-bit processor by combining two methods: Block Random Access Memory (BRAM) and Distributed Arithmetic (DA). Integrating BRAM as a replacement for conventional RAM minimizes timing and critical path delays, improving processor efficiency and performance. Furthermore, the Distributed Arithmetic approach enhances performance and efficiency by utilizing precomputed lookup tables to expedite multiplication operations within the Arithmetic and Logic Unit (ALU). We use the Xilinx Vivado tool, a robust development environment for FPGA-based systems, for the design process and execute the hardware implementation using the Genesys2 Kintex board. The proposed work achieves improved efficiency with 2 cycles per instruction (CPI), a delay of 2.009 ns, a critical path delay of 8.182 ns, and a power consumption of 4 mW.
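The Distributed Arithmetic idea described in the abstract above, replacing multiplications with precomputed lookup tables, can be sketched as follows. This is the generic bit-serial DA scheme for a fixed-coefficient inner product, shown under the assumption of two's-complement inputs; it is not the authors' RTL, and `build_lut` and `da_inner_product` are hypothetical names.

```python
def build_lut(coeffs):
    """Precompute the sum of coefficients selected by every possible address bit pattern."""
    n = len(coeffs)
    return [sum(c for i, c in enumerate(coeffs) if (addr >> i) & 1)
            for addr in range(1 << n)]


def da_inner_product(coeffs, samples, bits):
    """Compute sum(c * x) with one table lookup per bit position and no multiplier.

    `samples` are signed integers that fit in `bits`-bit two's complement.
    """
    lut = build_lut(coeffs)
    acc = 0
    for b in range(bits):
        # Gather bit b of every sample into one LUT address.
        addr = 0
        for i, x in enumerate(samples):
            addr |= ((x >> b) & 1) << i
        term = lut[addr] << b  # weight the lookup by 2**b using a shift
        if b == bits - 1:
            acc -= term        # the sign bit carries negative weight
        else:
            acc += term
    return acc


if __name__ == "__main__":
    coeffs = [3, -1, 4, 2]
    samples = [5, -7, 2, -1]
    print(da_inner_product(coeffs, samples, bits=4))
```

The table has 2^N entries for N coefficients, so practical designs usually split long filters into several smaller tables; that partitioning is omitted here.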

24 pages, 11377 KB

Open Access Article

Reversible Quantum-Dot Cellular Automata-Based Arithmetic Logic Unit

by Mohammed Alharbi, Gerard Edwards and Richard Stocker

Nanomaterials 2023, 13(17), 2445; https://doi.org/10.3390/nano13172445 - 29 Aug 2023

Cited by 25 | Viewed by 4309

Abstract

Quantum-dot cellular automata (QCA) are a promising nanoscale computing technology that exploits the quantum mechanical tunneling of electrons between quantum dots in a cell and electrostatic interaction between dots in neighboring cells. QCA can achieve higher speed, lower power, and smaller area than conventional complementary metal-oxide semiconductor (CMOS) technology. Developing QCA circuits in a logically and physically reversible manner can provide exceptional reductions in energy dissipation. The main challenge is to maintain reversibility down to the physical level. A crucial component of a computer’s central processing unit (CPU) is the arithmetic logic unit (ALU), which executes multiple logical and arithmetic functions on the data processed by the CPU. Current QCA ALU designs are either irreversible or only logically reversible; they lack physical reversibility, a crucial requirement for increasing energy efficiency. This paper presents a new multilayer design for a QCA ALU that can carry out 16 different operations and is both logically and physically reversible. The design is based on reversible majority gates, which are the key building blocks. We use QCADesigner-E software to simulate and evaluate energy dissipation. The proposed logically and physically reversible QCA ALU offers an improvement of 88.8% in energy efficiency. Compared to the next most efficient 16-operation QCA ALU, this ALU uses 51% fewer QCA cells and 47% less area.

(This article belongs to the Special Issue Study on Quantum Dot and Quantum Dot-Based Device)

22 pages, 1511 KB

Open Access Article

Verilog Design, Synthesis, and Netlisting of IoT-Based Arithmetic Logic and Compression Unit for 32 nm HVT Cells

by Raj Mouli Jujjavarapu and Alwin Poulose

Signals 2022, 3(3), 620-641; https://doi.org/10.3390/signals3030038 - 13 Sep 2022

Cited by 3 | Viewed by 4354

Abstract

Microprocessor designs have become a revolutionary technology in almost every industry. They have made automation and electronic gadgets a reality. As these hardware modules have been improved to handle heavy computational loads, they have approached limits in size, power efficiency, and similar aspects. Due to these constraints, many manufacturers and corporate entities are exploring ways to optimize these devices. One such approach is to design microprocessors for a specified operating system. This approach came to the limelight when many companies launched their own microprocessors. In this paper, we look into one method of using an arithmetic logic unit (ALU) module for internet of things (IoT)-enabled devices. A specific set of operations is added to the classical ALU to speed up computation in IoT-specific programs. We integrated a compression module and a fast multiplier based on the Vedic algorithm into the 16-bit ALU module. The designed ALU module is also synthesized with a 32 nm HVT cell library from the Synopsys database to generate an overview of the area efficiency, logic levels, and layout of the designed module; the synthesis also yields a netlist. The synthesis provides a complete overview of how the module would be manufactured if sent to a foundry.

(This article belongs to the Special Issue Advances of Signal Processing for Signal, Image and Video Technology)

24 pages, 5480 KB

Open Access Article

Ternary Arithmetic Logic Unit Design Utilizing Carbon Nanotube Field Effect Transistor (CNTFET) and Resistive Random Access Memory (RRAM)

by Furqan Zahoor, Fawnizu Azmadi Hussin, Farooq Ahmad Khanday, Mohamad Radzi Ahmad and Illani Mohd Nawi

Micromachines 2021, 12(11), 1288; https://doi.org/10.3390/mi12111288 - 21 Oct 2021

Cited by 19 | Viewed by 6131

Abstract

Due to the difficulties associated with scaling of silicon transistors, various technologies beyond binary logic processing are actively being investigated. Ternary logic circuit implementation with carbon nanotube field effect transistors (CNTFETs) and resistive random access memory (RRAM) integration is considered a possible technology option. CNTFETs are currently preferred for implementing ternary circuits due to their desirable multiple threshold voltage and geometry-dependent properties, whereas RRAM is used due to its multilevel cell capability, which enables storage of multiple resistance states within a single cell. This article presents a 2-trit arithmetic logic unit (ALU) design using CNTFETs and RRAM as the design elements. The proposed ALU incorporates a transmission gate block, a function select block, and various ternary function processing modules. The ALU design is optimized by introducing a controlled ternary adder–subtractor module instead of separate adder and subtractor circuits. The designs are simulated and validated using Synopsys HSPICE simulation software with standard 32 nm CNTFET technology under different operating conditions (supply voltages) to test their robustness. The simulation results indicate that the proposed CNTFET-RRAM integration enables compact circuit realization with good robustness. Moreover, due to the addition of RRAM as a circuit element, the proposed ALU has the advantage of non-volatility.

(This article belongs to the Special Issue Advances in Resistive Switching Memory Devices)

14 pages, 473 KB

Open Access Article

Efficient Designs of Quantum Adder/Subtractor Using Universal Reversible Gate on IBM Q

by Mohamed Osman and Khaled El-Wazan

Symmetry 2021, 13(10), 1842; https://doi.org/10.3390/sym13101842 - 2 Oct 2021

Cited by 4 | Viewed by 4366

Abstract

The reversible arithmetic and logic unit (ALU) is a necessary part of quantum computing. In this work, we present improved designs of reversible half and full addition and subtraction circuits. The proposed designs are based on a universal one-type gate (G gate library). The G gate library can generate all possible permutations of the symmetric group. The presented designs are multi-function circuits that are capable of performing additional logical operations. We achieve a reduction in the quantum cost, gate count, number of constant inputs, and delay with zero garbage, compared to relevant results obtained by others. The experimental results using IBM Quantum Experience (IBM Q) illustrate the success probability of the proposed designs.

25 pages, 15040 KB

Open Access Article

Robust Circuit and System Design for General-Purpose Computational Resistive Memories

by Felipe Pinto and Ioannis Vourkas

Electronics 2021, 10(9), 1074; https://doi.org/10.3390/electronics10091074 - 1 May 2021

Cited by 8 | Viewed by 3674

Abstract

Resistive switching devices (memristors) constitute a promising device technology that has emerged for the development of future energy-efficient general-purpose computational memories. Research has been done at both the device and circuit levels for the realization of primitive logic operations with memristors. Likewise, significant effort is being devoted to the development of logic synthesis algorithms for resistive RAM (ReRAM)-based computing. However, system-level design of computational memories has not been given significant consideration, and developing arithmetic logic unit (ALU) functionality entirely using ReRAM-based word-wise arithmetic operations remains a challenging task. In this context, we present our results in circuit- and system-level design, towards implementing a ReRAM-based general-purpose computational memory with ALU functionality. We built upon the 1T1R crossbar topology and adopted a logic design style in which all computations are equivalent to modified memory read operations for higher reliability, performed either in a word-wise or bit-wise manner, owing to an enhanced peripheral circuitry. Moreover, we present the concept of a segmented ReRAM architecture with functional and topological features that benefit flexibility of data movement and improve latency of multi-level (sequential) in-memory computations. Robust system functionality is validated via LTspice circuit simulations for an n-bit word-wise binary adder, showing promising performance features compared to other state-of-the-art implementations.

(This article belongs to the Special Issue Computing-in-Memory Devices and Systems)

18 pages, 798 KB

Open Access Article

Fast Approximations of Activation Functions in Deep Neural Networks when using Posit Arithmetic

by Marco Cococcioni, Federico Rossi, Emanuele Ruffaldi and Sergio Saponara

Sensors 2020, 20(5), 1515; https://doi.org/10.3390/s20051515 - 10 Mar 2020

Cited by 30 | Viewed by 6693

Abstract

As real-time scenarios place increasing constraints on the use of Deep Neural Networks (DNNs), there is a need to review information representation. A very challenging path is to employ an encoding that allows fast processing and a hardware-friendly representation of information. Among the proposed alternatives to the IEEE 754 standard for floating point representation of real numbers, the recently introduced Posit format has been shown theoretically to be very promising in satisfying these requirements. However, in the absence of proper hardware support for this novel type, this evaluation can be conducted only through software emulation. While waiting for the widespread availability of Posit Processing Units (PPUs, the equivalent of the Floating Point Unit (FPU)), we can already exploit the Posit representation and the currently available Arithmetic-Logic Unit (ALU) to speed up DNNs by manipulating the low-level bit string representations of Posits. As a first step, in this paper, we present new arithmetic properties of the Posit number system with a focus on the configuration with 0 exponent bits. In particular, we propose a new class of Posit operators called L1 operators, which consists of fast and approximated versions of existing arithmetic operations or functions (e.g., hyperbolic tangent (TANH) and exponential linear unit (ELU)) using only integer arithmetic. These operators introduce very interesting properties and results: (i) faster evaluation than the exact counterpart with a negligible accuracy degradation; (ii) an efficient ALU emulation of a number of Posit operations; and (iii) the possibility to vectorize operations in Posits, using existing ALU vectorized operations (such as the scalable vector extension of ARM CPUs or advanced vector extensions on Intel CPUs).
As a second step, we test the proposed activation function on Posit-based DNNs, showing that 16-bit down to 10-bit Posits represent an exact replacement for 32-bit floats, while 8-bit Posits could be an interesting alternative to 32-bit floats: their performance is slightly lower, but their high speed and low storage requirements are very appealing (leading to a lower bandwidth demand and more cache-friendly code). Finally, we point out that small Posits (i.e., up to 14 bits long) are very interesting until PPUs become widespread, since Posit operations can be tabulated in a very efficient way (see details in the text).

(This article belongs to the Special Issue Applications in Electronics Pervading Industry, Environment and Society – Sensing Systems and Pervasive Intelligence)
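The idea of approximating an activation function purely with integer operations on the Posit bit string can be illustrated with a widely cited trick from the Posit literature: for Posits with 0 exponent bits, flipping the sign bit and shifting right by two approximates the sigmoid. The sketch below assumes 8-bit Posits and is not claimed to be one of the L1 operators defined in the paper; `posit8_to_float` and `fast_sigmoid_bits` are hypothetical names, and the decoder exists only to inspect the results.

```python
import math


def posit8_to_float(p: int) -> float:
    """Decode an 8-bit Posit with 0 exponent bits into a Python float."""
    p &= 0xFF
    if p == 0x00:
        return 0.0
    if p == 0x80:
        return math.nan  # NaR (not a real)
    negative = bool(p & 0x80)
    if negative:
        p = (-p) & 0xFF  # two's-complement negation yields the magnitude pattern
    bits = [(p >> i) & 1 for i in range(6, -1, -1)]  # the 7 bits after the sign
    run = 1
    while run < 7 and bits[run] == bits[0]:
        run += 1                         # length of the regime run
    k = run - 1 if bits[0] else -run     # regime value
    fraction_bits = bits[run + 1:]       # skip the regime terminator bit
    fraction = sum(b / (1 << (i + 1)) for i, b in enumerate(fraction_bits))
    value = (2.0 ** k) * (1.0 + fraction)
    return -value if negative else value


def fast_sigmoid_bits(p: int) -> int:
    """Approximate sigmoid on the raw bit pattern: flip the sign bit, shift right by 2."""
    return ((p ^ 0x80) & 0xFF) >> 2


if __name__ == "__main__":
    for pattern in (0xC0, 0x00, 0x40):  # encodes -1.0, 0.0, 1.0
        x = posit8_to_float(pattern)
        approx = posit8_to_float(fast_sigmoid_bits(pattern))
        exact = 1.0 / (1.0 + math.exp(-x))
        print(f"x={x:+.2f} approx={approx:.4f} exact={exact:.4f}")
```

The approximation uses one XOR and one shift, which is the kind of operation an ordinary integer ALU executes in a single instruction; its accuracy is rough and degrades for inputs of larger magnitude.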

15 pages, 334 KB

Open Access Article

Forwarding Path Limitation and Instruction Allocation for In-Order Processor with ALU Cascading

by Ryotaro Kobayashi, Anri Suzuki and Hajime Shimada

J. Low Power Electron. Appl. 2017, 7(4), 32; https://doi.org/10.3390/jlpea7040032 - 14 Dec 2017

Viewed by 8252

Abstract

Much research focuses on many-core processors, which possess a vast number of cores. Their area, energy consumption, and performance tend to be proportional to the number of cores. It is better to utilize in-order (IO) execution for better area/energy efficiency. However, expanding two-way IO to three-way IO offers very little improvement, since data dependency limits the effectiveness. In addition, if the core is changed from IO to out-of-order (OoO) execution to improve instructions per cycle (IPC), area and energy consumption increase significantly. The combination of IO execution and arithmetic logic unit (ALU) cascading is an effective solution to alleviate this problem. However, ALU cascading is implemented by complex bypass circuits because it requires a connection between all outputs and all inputs of all ALUs. The hardware complexity of the bypass circuits increases area, energy consumption, and delay. In this study, we propose a mechanism that limits the number of forwarding paths and allocates instructions to ALUs in accordance with the limited paths. This mechanism scales down the bypass circuits to reduce the hardware complexity. Our evaluation results show that the proposed mechanism can reduce the area by 38.7%, the energy by 41.1%, and the delay by 23.2% with very little IPC loss on average, as compared with the conventional mechanism.

Search Results (14)
