Journal of Low Power Electronics and Applications

18 pages, 413 KB

Open Access Article

Time- and Amplitude-Controlled Power Noise Generator against SPA Attacks for FPGA-Based IoT Devices

by Luis Parrilla, Antonio García, Encarnación Castillo, Salvador Rodríguez-Bolívar and Juan Antonio López-Villanueva

J. Low Power Electron. Appl. 2022, 12(3), 48; https://doi.org/10.3390/jlpea12030048 - 10 Sep 2022

Cited by 2 | Viewed by 3399

Abstract

Power noise generation for masking power traces is a powerful countermeasure against Simple Power Analysis (SPA), and it has also been used against Differential Power Analysis (DPA) or Correlation Power Analysis (CPA) in the case of cryptographic circuits. This technique makes use of power consumption generators as basic modules, which are usually based on ring oscillators when implemented on FPGAs. These modules can be used to generate power noise and also to extract digital signatures through the power side channel for Intellectual Property (IP) protection purposes. In this paper, a new power consumption generator, named Xored High Consuming Module (XHCM), is proposed. Compared to other proposals in the literature, XHCM improves the current consumption per LUT when implemented on FPGAs. Experimental results show that these modules can achieve current increments in the range from 2.4 mA (with only 16 LUTs on Artix-7 devices with a power consumption density of 0.75 mW/LUT when using a single HCM) to 11.1 mA (with 67 LUTs when using 8 XHCMs, with a power consumption density of 0.83 mW/LUT). Moreover, a version controlled by Pulse-Width Modulation (PWM) has been developed, named PWM-XHCM, which is, like XHCM, suitable for power watermarking. In order to build countermeasures against SPA attacks, a multi-level XHCM (ML-XHCM) is also presented, which is capable of generating different power consumption levels with minimal area overhead (27 six-input LUTs for generating 16 different amplitude levels on Artix-7 devices). Finally, a randomized version, named RML-XHCM, has also been developed using two True Random Number Generators (TRNGs) to generate current consumption peaks with random amplitudes at random times. RML-XHCM requires fewer than 150 LUTs on Artix-7 devices. Taking these characteristics into account, this article makes two main contributions: first, XHCM and PWM-XHCM provide an efficient power consumption generator for extracting digital signatures through the power side channel; second, ML-XHCM and RML-XHCM are powerful tools for protecting processing units against SPA attacks in IoT devices implemented on FPGAs.

(This article belongs to the Special Issue Low-Power Hardware Security)

18 pages, 18002 KB

Open Access Article

LoRa-Based Wireless Sensors Network for Rockfall and Landslide Monitoring: A Case Study in Pantelleria Island with Portable LoRaWAN Access

by Mattia Ragnoli, Alfiero Leoni, Gianluca Barile, Giuseppe Ferri and Vincenzo Stornelli

J. Low Power Electron. Appl. 2022, 12(3), 47; https://doi.org/10.3390/jlpea12030047 - 7 Sep 2022

Cited by 27 | Viewed by 7862

Abstract

Rockfalls and landslides are hazards triggered by geomorphological and climatic factors, in addition to human interaction. Their economic and social impacts are not negligible; therefore, the topic has become an important field in the application of remote monitoring. Wireless sensor networks (WSNs) are particularly suited to the deployment of such systems, thanks to the different technologies and topologies that are currently evolving. Among these, the LoRa modulation technique represents a fitting technical solution for node communication in a WSN. In this paper, a smart autonomous LoRa-based rockfall and landslide monitoring system is presented. The structure has been operating on Pantelleria Island, Sicily, Italy. The sensing elements are placed in sensor nodes arranged in a star topology. Network access to the LoRaWAN and the Internet is provided through gateways using a portable, solar-powered device assembly. A system overview covering both the hardware and the functionality of the node and gateway devices is given, a power analysis is reported, and a monthly recorded result is presented with related discussion.

(This article belongs to the Special Issue Advances in Embedded Artificial Intelligence and Internet-of-Things)

11 pages, 10992 KB

Open Access Article

High-Speed and Energy-Efficient Carry Look-Ahead Adder

by Padmanabhan Balasubramanian and Nikos E. Mastorakis

J. Low Power Electron. Appl. 2022, 12(3), 46; https://doi.org/10.3390/jlpea12030046 - 10 Aug 2022

Cited by 29 | Viewed by 9035

Abstract

The carry look-ahead adder (CLA) is well known among the family of high-speed adders. However, a conventional CLA is not faster than other high-speed adders such as a conditional sum adder (CSA), a carry-select adder (CSLA), and the Kogge–Stone adder (KSA), which is the fastest parallel-prefix adder. Further, in terms of power-delay product (PDP) that characterizes the energy of digital circuits, the conventional CLA is not efficient compared to CSLA and KSA. In this context, this paper presents a high-speed and energy-efficient architecture for the CLA. Many adders ranging from ripple carry to parallel-prefix adders were implemented using a 32-28 nm CMOS standard digital cell library by considering a 32-bit addition. The adders were structurally described in Verilog and synthesized using Synopsys Design Compiler. From the results obtained, it is observed that the proposed CLA achieves a reduction in critical path delay by 55.3% and a reduction in PDP by 45% compared to the conventional CLA. Compared to the CSA, the proposed CLA achieves a reduction in critical path delay by 33.9%, a reduction in power by 26.1%, and a reduction in PDP by 51.1%. Compared to an optimized CSLA, the proposed CLA achieves a reduction in power by 35.4%, a reduction in area by 37.3%, and a reduction in PDP by 37.1% without sacrificing the speed. Although the KSA is faster, the proposed CLA achieves a reduction in power by 39.6%, a reduction in PDP by 6.5%, and a reduction in area by 55.6% in comparison.
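
Because PDP is the product of power and critical path delay, the percentage reductions quoted in this abstract combine multiplicatively rather than additively. The short Python sketch below illustrates that arithmetic using only the rounded percentages given above; the function names are illustrative and do not come from the article, and any value derived this way inherits the rounding of the quoted figures.

```python
def pdp_reduction(delay_reduction_pct, power_reduction_pct):
    """Percent PDP reduction implied by percent reductions in delay and power."""
    delay_ratio = 1.0 - delay_reduction_pct / 100.0   # proposed / reference delay
    power_ratio = 1.0 - power_reduction_pct / 100.0   # proposed / reference power
    return (1.0 - delay_ratio * power_ratio) * 100.0


def implied_ratio(pdp_reduction_pct, other_reduction_pct):
    """Ratio (proposed / reference) of the factor that was not reported,
    given the PDP reduction and the reduction of the other factor."""
    pdp_ratio = 1.0 - pdp_reduction_pct / 100.0
    other_ratio = 1.0 - other_reduction_pct / 100.0
    return pdp_ratio / other_ratio


if __name__ == "__main__":
    # CSA comparison: delay -33.9 %, power -26.1 %
    print(pdp_reduction(33.9, 26.1))
    # KSA comparison: PDP -6.5 %, power -39.6 %; delay is the unreported factor
    print(implied_ratio(6.5, 39.6))
```

For the CSA comparison, the delay and power reductions combine to a PDP reduction of roughly 51%, in line with the quoted 51.1% up to rounding. For the KSA comparison, the same relation suggests a delay ratio above 1, which is consistent with the statement that the KSA is faster.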

21 pages, 851 KB

Open Access Tutorial

Computer Engineering Education Experiences with RISC-V Architectures—From Computer Architecture to Microcontrollers

by Peter Jamieson, Huan Le, Nathan Martin, Tyler McGrew, Yicheng Qian, Eric Schonauer, Alan Ehret and Michel A. Kinsy

J. Low Power Electron. Appl. 2022, 12(3), 45; https://doi.org/10.3390/jlpea12030045 - 9 Aug 2022

Cited by 7 | Viewed by 5887

Abstract

With the growing popularity of RISC-V and the release of various open-source RISC-V processors, it is now possible for computer engineering students to explore this simple and relevant architecture. These students can also explore and design a microcontroller at a low level using real tool-flows, and implement and test their hardware. In this work, we describe our experiences with undergraduate engineers building RISC-V architectures on an FPGA and then extending their experiences to implement an Arduino-like RISC-V tool-flow and the respective hardware and software to handle input-output ports, interrupts, hardware timers, and communication protocols. The microcontroller is implemented on an FPGA as a Senior Design project to test the viability of such efforts. We explain how undergraduates can achieve these experiences, including preparation for these projects, the tool-flows they use, and the challenges in understanding and extending a RISC-V processor with microcontroller functionality. We also suggest how to integrate this learning into an existing curriculum, and discuss whether these deeper experiences should be included in the Computer Engineering undergraduate curriculum.

(This article belongs to the Special Issue RISC-V Architectures and Systems: Hardware and Software Perspectives)

47 pages, 3468 KB

Open Access Review

Review on the Basic Circuit Elements and Memristor Interpretation: Analysis, Technology and Applications

by Aliyu Isah and Jean-Marie Bilbault

J. Low Power Electron. Appl. 2022, 12(3), 44; https://doi.org/10.3390/jlpea12030044 - 3 Aug 2022

Cited by 34 | Viewed by 14330

Abstract

Circuit or electronic components are useful elements allowing the realization of different circuit functionalities. The resistor, capacitor and inductor represent the three commonly known basic passive circuit elements owing to their fundamental nature relating them to the four circuit variables, namely voltage, magnetic flux, current and electric charge. The memory resistor (or memristor) was claimed to be the fourth basic passive circuit element, complementing the resistor, capacitor and inductor. This paper presents a review of the four basic passive circuit elements. After a brief review of the first three known basic passive circuit elements, a thorough description of the memristor follows. The memristor has sparked interest in the scientific community due to its features, for example nano-scalability, memory capability, conductance modulation, connection flexibility and compatibility with CMOS technology. These features, among many others, are currently in high demand on an industrial scale. For this reason, thousands of memristor-based applications have been reported. Hence, the paper presents an in-depth overview of the philosophical arguments surrounding the memristor, as well as its technologies and applications.

13 pages, 2205 KB

Open Access Article

A Subthreshold Layout Strategy for Faster and Lower Energy Complex Digital Circuits

by Jordan Morris, Pranay Prabhat, James Myers and Alex Yakovlev

J. Low Power Electron. Appl. 2022, 12(3), 43; https://doi.org/10.3390/jlpea12030043 - 2 Aug 2022

Viewed by 3145

Abstract

This work presents complex circuitry from subthreshold standard cell libraries created by geometric STI spacer patterning for bulk planar CMOS technology nodes. Performance/leakage granularity enhancement affords safer multi-Vt synthesis in aggressive voltage scaling schemes. Libraries are evaluated in silicon through implementation of 32-bit datapath 128-bit AES cores. Intra-die nominal temperature (20 °C) analysis reveals improvements of up to 8.65×/24% MEP-to-MEP in frequency and energy-per-cycle respectively, compared to a state-of-the-art subthreshold library. A negative temperature correlation with performance enhancement is demonstrated extending beyond the cell level and into more complex designs. MEP-to-MEP performance enhancement and energy-per-cycle reduction are demonstrated over a temperature range of 0 °C to 85 °C.

17 pages, 964 KB

Open Access Article

The Benefits and Costs of Netlist Randomization Based Side-Channel Countermeasures: An In-Depth Evaluation

by Ali Asghar, Andreas Becher and Daniel Ziener

J. Low Power Electron. Appl. 2022, 12(3), 42; https://doi.org/10.3390/jlpea12030042 - 23 Jul 2022

Cited by 1 | Viewed by 2660

Abstract

Exchanging FPGA-based implementations of cryptographic algorithms during run-time using netlist randomized versions has been introduced recently as a unique countermeasure against side channel attacks. Using partial reconfiguration, it is possible to shuffle between structurally different but functionally similar versions of a cryptographic implementation. The resulting varying power profile enhances the resistance against power-based side channel attacks. While side channel leakage is reduced, costs in terms of additional resources and/or lowered throughput are often increased due to the overheads of the required online partial reconfiguration. In this work, we provide an in-depth evaluation of the leakage-area-throughput trade-off.

(This article belongs to the Special Issue Low-Power Hardware Security)

19 pages, 5912 KB

Open Access Article

Electrical Impedance Tomography for Hand Gesture Recognition for HMI Interaction Applications

by Noelia Vaquero-Gallardo and Herminio Martínez-García

J. Low Power Electron. Appl. 2022, 12(3), 41; https://doi.org/10.3390/jlpea12030041 - 18 Jul 2022

Cited by 8 | Viewed by 4332

Abstract

Electrical impedance tomography (EIT) is based on the physical principle of bioimpedance, defined as the opposition that biological tissues exhibit to the flow of a rotating alternating electrical current. Here, we propose studying the characterization and classification of bioimpedance patterns based on EIT by measuring, non-invasively on the forearm with eight electrodes, the potential drops resulting from the execution of six hand gestures. The starting point was the acquisition of bioimpedance patterns, which were studied by means of principal component analysis (PCA), validated through cross-validation, and classified using the k-nearest neighbor (kNN) classification algorithm. As a result, it is concluded that reduction and classification are feasible, with a sensitivity of 0.89 in the worst case, for each of the reduced bioimpedance patterns, leading to the following direct advantage: a reduction in the number of electrodes and electronics required. In this work, bioimpedance patterns were investigated for monitoring subjects' mobility. Existing solutions are generally based on a sensor system with moving parts that suffer from significant problems of wear, lack of adaptability to the patient, and lack of resolution. In contrast, the proposal implemented in this prototype, based on electrical impedance tomography, does not have these problems.

18 pages, 500 KB

Open Access Article

Dynamic SIMD Parallel Execution on GPU from High-Level Dataflow Synthesis

by Aurelien Bloch, Simone Casale-Brunet and Marco Mattavelli

J. Low Power Electron. Appl. 2022, 12(3), 40; https://doi.org/10.3390/jlpea12030040 - 17 Jul 2022

Viewed by 3150

Abstract

Developing and fine-tuning software programs for heterogeneous hardware such as CPU/GPU processing platforms is a highly complex endeavor that demands considerable time and effort from software engineers and requires evaluating various fundamental components and features of both the design and the platform to maximize the overall performance. The dataflow programming approach has proven to be an appropriate methodology for reaching such a difficult and complex goal, owing to its intrinsic portability and the possibility of easily decomposing a network of actors across the different processing units of the heterogeneous hardware. Nonetheless, such a design method might not be enough on its own to achieve the desired performance goals, and supporting tools are useful for efficiently exploring the design space so as to optimize the desired performance objectives. This article presents a methodology composed of several stages for enhancing the performance of dataflow software developed in RVC-CAL and generating low-level implementations to be executed on GPU/CPU heterogeneous hardware platforms. The stages are composed of a method for the efficient scheduling of parallel CUDA partitions, an optimization of the performance of the data transmission tasks across computing kernels, and the exploitation of dynamic programming for introducing SIMD-capable graphics processing unit systems. The methodology is validated both quantitatively and qualitatively by means of dataflow software application examples running on platforms according to various mapping configurations.

14 pages, 1937 KB

Open Access Article

Efficiency of Priority Queue Architectures in FPGA

by Lukáš Kohútka

J. Low Power Electron. Appl. 2022, 12(3), 39; https://doi.org/10.3390/jlpea12030039 - 14 Jul 2022

Cited by 3 | Viewed by 5160

Abstract

This paper presents a novel SRAM-based architecture of a data structure that represents a set of multiple priority queues that can be implemented in FPGA or ASIC. The proposed architecture is based on shift registers, systolic arrays and SRAM memories. This architecture, called MultiQueue, is optimized for minimum chip area costs, which also leads to lower energy consumption. The MultiQueue architecture has constant time complexity, constant critical path length and constant latency. Therefore, it is highly predictable and well suited to real-time systems. The proposed architecture was verified using a simplified version of UVM and applying millions of instructions with randomly generated input values. The FPGA synthesis results achieved are presented and discussed. These results show significant savings in FPGA Look-Up Table consumption in comparison to existing solutions. More than 63% of Look-Up Tables can be saved using the MultiQueue architecture instead of the existing priority queues.

(This article belongs to the Special Issue Advanced Researches in Embedded Systems)

16 pages, 1752 KB

Open Access Article

Analysis and Comparison of Different Approaches to Implementing a Network-Based Parallel Data Processing Algorithm

by Iouliia Skliarova

J. Low Power Electron. Appl. 2022, 12(3), 38; https://doi.org/10.3390/jlpea12030038 - 9 Jul 2022

Viewed by 2876

Abstract

It is well known that network-based parallel data processing algorithms are well suited to implementation in reconfigurable hardware using either Field-Programmable Gate Arrays (FPGA) or Programmable Systems-on-Chip (PSoC). The intrinsic parallelism of these devices makes it possible to execute several data-independent network operations in parallel. However, the approaches to designing the respective systems vary significantly with the experience and background of the engineer in charge. In this paper, we analyze and compare the pros and cons of using an embedded processor, high-level synthesis methods, and low-level register-transfer design in terms of design effort, performance, and power consumption for implementing a parallel algorithm to find the two smallest values in a dataset. This problem is easy to formulate, has a number of practical applications (for instance, in low-density parity check decoders), and is very well suited to parallel implementation based on comparator networks.
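
To make the problem concrete, the following Python sketch finds the two smallest values with a tree of pairwise merge steps, which is one way such a comparator network can be organized. It is a software illustration under stated assumptions, not the design evaluated in the article: the names `merge_pairs` and `two_smallest` are hypothetical, and odd-sized inputs are padded with infinity, which is a choice made here for simplicity.

```python
import math


def merge_pairs(a, b):
    """Merge two sorted pairs (min, second_min) into the two smallest of all four values."""
    a1, a2 = a
    b1, b2 = b
    smallest = min(a1, b1)
    # The runner-up is either the larger of the two minima or the smaller of the two second values.
    second = min(max(a1, b1), min(a2, b2))
    return smallest, second


def two_smallest(values):
    """Return (smallest, second_smallest) using a tree of pairwise merges.

    All merges within one level are independent of each other, so a hardware
    implementation could evaluate them in parallel.
    """
    items = list(values)
    if len(items) < 2:
        raise ValueError("need at least two values")
    if len(items) % 2:
        items.append(math.inf)  # padding for odd-sized inputs (assumption of this sketch)
    # First level: one comparator per input pair produces a sorted pair.
    pairs = [(min(x, y), max(x, y)) for x, y in zip(items[0::2], items[1::2])]
    # Remaining levels: halve the number of pairs until one is left.
    while len(pairs) > 1:
        merged = [merge_pairs(pairs[i], pairs[i + 1]) for i in range(0, len(pairs) - 1, 2)]
        if len(pairs) % 2:
            merged.append(pairs[-1])  # an unpaired entry passes through to the next level
        pairs = merged
    return pairs[0]


if __name__ == "__main__":
    print(two_smallest([7, 3, 9, 1, 4]))
```

The number of levels grows logarithmically with the input size, which is what makes this kind of structure attractive for parallel hardware compared with a sequential scan.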

26 pages, 1106 KB

Open Access Review

Deep Learning Approaches to Source Code Analysis for Optimization of Heterogeneous Systems: Recent Results, Challenges and Opportunities

by Francesco Barchi, Emanuele Parisi, Andrea Bartolini and Andrea Acquaviva

J. Low Power Electron. Appl. 2022, 12(3), 37; https://doi.org/10.3390/jlpea12030037 - 5 Jul 2022

Cited by 6 | Viewed by 6885

Abstract

To cope with the increasing complexity of digital systems programming, deep learning techniques have recently been proposed to enhance software deployment by analysing source code for different purposes, ranging from performance and energy improvement to debugging and security assessment. As embedded platforms for cyber-physical systems are characterised by increasing heterogeneity and parallelism, one of the most challenging and specific problems is efficiently allocating computational kernels to available hardware resources. In this field, deep learning applied to source code can be a key enabler to face this complexity. However, due to the rapid development of such techniques, it is not easy to understand which of those are suitable and most promising for this class of systems. For this purpose, we discuss recent developments in deep learning for source code analysis, and focus on techniques for kernel mapping on heterogeneous platforms, highlighting recent results, challenges and opportunities for their applications to cyber-physical systems.

(This article belongs to the Special Issue Advances in Programming Parallel and Heterogeneous Computing for Cyber-Physical Systems)

21 pages, 764 KB

Open Access Article

Performance Estimation of High-Level Dataflow Program on Heterogeneous Platforms by Dynamic Network Execution

by Aurelien Bloch, Simone Casale-Brunet and Marco Mattavelli

J. Low Power Electron. Appl. 2022, 12(3), 36; https://doi.org/10.3390/jlpea12030036 - 23 Jun 2022

Cited by 1 | Viewed by 2994

Abstract

The performance of programs executed on heterogeneous parallel platforms largely depends on the design choices regarding how to partition the processing among the various processing units. In other words, it depends on the assumptions and parameters that define the partitioning, mapping, scheduling, and allocation of data exchanges among the various processing elements of the platform executing the program. The advantage of programs written in languages using the dataflow model of computation (MoC) is that executing the program with different configurations and parameter settings does not require rewriting the application software for each configuration setting, but only requires generating a new synthesis of the execution code corresponding to different parameters. The synthesis stage of dataflow programs is usually supported by automatic code generation tools. Another competitive advantage of dataflow software methodologies is that they are well suited to support designs on heterogeneous parallel systems, as they are inherently free of memory access contention issues and naturally expose the available intrinsic parallelism. To fully exploit these advantages and to search the configuration space efficiently for the design points that better satisfy the desired design constraints, it is necessary to develop tools and associated methodologies capable of evaluating the performance of different configurations and of driving the search for good design configurations according to the desired performance criteria. The number of possible design assumptions and associated parameter settings (i.e., the dimensions and size of the design space) is usually so large that intuition and trial and error are clearly unfeasible, inefficient approaches. This paper describes a method for the clock-accurate profiling of software applications developed using the dataflow programming paradigm, such as the formal RVC-CAL language. The profiling can be applied when the application program has been compiled and executed on GPU/CPU heterogeneous hardware platforms utilizing two main methodologies, denoted as static and dynamic. This paper also describes how a method for the qualitative evaluation of the performance of such programs as a function of the supplied configuration parameters can be successfully applied to heterogeneous platforms. The technique is illustrated using two different application software examples and several design points.

12 pages, 1968 KB

Open Access Article

±0.3V Bulk-Driven Fully Differential Buffer with High Figures of Merit

by Manaswini Gangineni, Jaime Ramirez-Angulo, Héctor Vázquez-Leal, Jesús Huerta-Chua, Antonio J. Lopez-Martin and Ramon Gonzalez Carvajal

J. Low Power Electron. Appl. 2022, 12(3), 35; https://doi.org/10.3390/jlpea12030035 - 22 Jun 2022

Cited by 7 | Viewed by 4288

Abstract

A high-performance bulk-driven rail-to-rail fully differential buffer operating from ±0.3 V supplies in 180 nm CMOS technology is reported. It has a differential–difference input stage and common mode feedback circuits implemented with no-tail, high CMRR bulk-driven pseudo-differential cells. It operates in subthreshold, has infinite input impedance, low output impedance (1.4 kΩ), 86.77 dB DC open-loop gain, 172.91 kHz bandwidth and 0.684 μW static power dissipation with a 50 pF load capacitance. The buffer has power-efficient class AB operation, a small signal figure of merit FOM_SS = 12.69 MHz·pF·μW⁻¹, a large signal figure of merit FOM_LS = 34.89 (V/μs)·pF·μW⁻¹, CMRR = 102 dB, PSRR+ = 109 dB, PSRR− = 100 dB, 1.1 μV/√Hz input noise spectral density, 0.3 mVrms input noise and 3.5 mV input DC offset voltage.

J. Low Power Electron. Appl., Volume 12, Issue 3 (September 2022) – 14 articles
