MDPI - Publisher of Open Access Journals

22 pages, 1556 KB

Open Access Article

Hardware Accelerator Design for MUSIC-DOA Estimation with Bilateral Jacobi Optimization

by Yafan Gao, Weijiang Wang, Chengbo Xue, Shiwei Ren, Kuanhao Liu and Xiangnan Li

Electronics 2026, 15(10), 1982; https://doi.org/10.3390/electronics15101982 - 7 May 2026

Viewed by 264

Real-time Direction of Arrival (DOA) estimation demands high computational throughput and numerical precision. Consequently, dedicated hardware accelerators are essential. This paper presents an architecture to accelerate the MUSIC algorithm using an improved complex bilateral Jacobi eigenvalue decomposition (EVD). First, we design a triangular systolic array for Hermitian matrices. It employs an output-stationary dataflow to enable efficient parallel covariance computation. Second, we propose an enhanced EVD algorithm. It replaces CORDIC approximations with direct analytical rotations. This significantly improves numerical stability and accuracy. Third, we introduce hardware optimizations. These include unit reuse, integrated termination conditions, and pre-stored steering vectors. These measures reduce resource consumption while maintaining full functionality. Experiments on a Xilinx Virtex-6 platform validate the design. The architecture achieves a root mean square error (RMSE) below 0.24° with 300 snapshots. Processing latency is only 76.17 µs. The design utilizes 10,775 LUTs and 73 DSP slices. This work balances accuracy, speed, and efficiency. It offers a practical solution for real-time, high-precision DOA systems.

(This article belongs to the Special Issue New Advances of FPGAs in Signal Processing)
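The covariance step mentioned in this abstract can be illustrated in software. The following is a minimal sketch, not the authors' hardware design: it assumes the sample covariance R = (1/N) Σ x xᴴ, keeps one accumulator per upper-triangular entry (loosely mirroring an output-stationary cell that stays in place while snapshots stream past), and fills the lower triangle by conjugation. The function name `hermitian_covariance` is hypothetical.

```python
def hermitian_covariance(snapshots):
    """Sample covariance R = (1/N) * sum_k x_k x_k^H, computing only i <= j."""
    n_snap = len(snapshots)
    m = len(snapshots[0])
    # One accumulator per upper-triangular cell; each result stays in its
    # cell while the input snapshots are streamed through.
    acc = {(i, j): 0j for i in range(m) for j in range(i, m)}
    for x in snapshots:
        for (i, j) in acc:
            acc[(i, j)] += complex(x[i]) * complex(x[j]).conjugate()
    # Fill the full matrix; the lower triangle follows from Hermitian symmetry.
    r = [[0j] * m for _ in range(m)]
    for (i, j), v in acc.items():
        r[i][j] = v / n_snap
        r[j][i] = (v / n_snap).conjugate()
    return r


if __name__ == "__main__":
    demo = hermitian_covariance([[1 + 1j, 2], [0, 1j]])
    for row in demo:
        print(row)
```

For an M-element array this needs M(M+1)/2 accumulators instead of M², which is the saving a triangular arrangement exploits.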

19 pages, 352 KB

Open Access Article

Enhancing Polynomial Multiplication in Post-Quantum Cryptography for IoT Applications: A Hybrid Serial–Parallel Systolic Architecture

by Atef Ibrahim and Fayez Gebali

Computers 2026, 15(4), 224; https://doi.org/10.3390/computers15040224 - 3 Apr 2026

Viewed by 574

Abstract

The rapid growth of the Internet of Things (IoT) is fundamentally altering industrial and economic landscapes by embedding smart, connected devices into everyday operations. Despite these benefits, significant concerns regarding data protection and user privacy continue to obstruct the widespread use of these technologies, particularly with the looming threat of quantum computing. Implementing post-quantum cryptographic (PQC) solutions is vital for addressing these risks, yet the limited resources found in IoT edge devices present major deployment challenges. Lattice-based cryptography has become a leading solution to these problems, largely because it depends on efficient polynomial multiplication. Enhancing the execution of this mathematical operation is crucial for improving the overall performance of PQC protocols. In this work, we introduce a hybrid serial–parallel systolic architecture specifically engineered for polynomial multiplication within the Binary Ring Learning With Errors (BRLWE) scheme. Designed for the security processors used in IoT hardware, this architecture significantly increases processing speeds while minimizing the use of hardware resources and reducing energy consumption. Such improvements are critical for establishing a secure IoT infrastructure that is resilient against quantum-era attacks and capable of supporting industrial expansion. Moreover, this research aligns with global Sustainable Development Goals (SDGs) 8 and 9 by building trust in innovative systems and fostering a more secure, sustainable, and productive digital economy.
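The polynomial multiplication at the core of such schemes can be sketched in a few lines. This is an illustrative software model only, not the architecture from the paper: it assumes the ring Z_q[x]/(x^n + 1) and one operand with binary coefficients, and it consumes one bit of that operand per step, as a serial input would. The function name and the parameter values in the demo are hypothetical.

```python
def negacyclic_mul_binary(a, b_bits, q):
    """Multiply a(x) by a binary polynomial b(x) modulo (x^n + 1, q).

    a      : list of n integer coefficients, lowest degree first
    b_bits : list of n coefficients in {0, 1}, lowest degree first
    """
    n = len(a)
    acc = [0] * n
    shifted = [c % q for c in a]  # holds a(x) * x^i at step i
    for bit in b_bits:  # one binary coefficient per step (serial input)
        if bit:
            # A binary coefficient turns multiplication into a plain addition.
            acc = [(s + t) % q for s, t in zip(acc, shifted)]
        # Multiply by x modulo x^n + 1: rotate, negating the wrapped term.
        shifted = [(-shifted[-1]) % q] + shifted[:-1]
    return acc


if __name__ == "__main__":
    # (1 + 2x + 3x^2 + 4x^3) * (1 + x) in Z_256[x]/(x^4 + 1)
    print(negacyclic_mul_binary([1, 2, 3, 4], [1, 1, 0, 0], 256))
```

Because one operand is binary, each step reduces to a conditional add and a rotate with one negation, which is why the operation maps well onto small processing elements.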

17 pages, 340 KB

Open Access Article

Efficient Serial Systolic Polynomial Multiplier for Lattice-Based Post-Quantum Cryptographic Schemes in IoT Edge Node

by Atef Ibrahim and Fayez Gebali

Network 2026, 6(2), 21; https://doi.org/10.3390/network6020021 - 1 Apr 2026

Viewed by 386

Abstract

The rapid development of the Internet of Things (IoT) is transforming various economic and industrial sectors by embedding interconnected devices within their operational processes. However, security and privacy risks associated with these interconnected devices pose significant barriers to widespread adoption, particularly in light of potential quantum threats. To mitigate these challenges, it is imperative to employ post-quantum cryptographic schemes. However, the inherent constraints of IoT edge nodes complicate the effective implementation of such schemes. Among the most promising approaches in post-quantum cryptography are lattice-based schemes, which rely heavily on polynomial multiplication operations at their core. Improving the implementation of polynomial multiplication will significantly enhance the performance of these schemes. Therefore, this paper proposes an efficient low-complexity serial systolic array optimized for polynomial multiplication, particularly tailored for the Binary Ring Learning With Errors (BRLWE) scheme. Designed for cryptographic processors targeting capable IoT edge nodes, the proposed architecture demonstrates remarkable performance improvements, achieving a maximum operating frequency of 280 MHz for a field size of 256, while requiring only 8232 lookup tables (LUTs) and 2616 flip-flops (FFs). These results reflect a 16.8% reduction in LUT usage and a 19% reduction in FFs compared to the nearest competing designs, all while maintaining high throughput and low area utilization. This work significantly advances the establishment of secure and efficient infrastructure for IoT systems, bolstering their resilience against post-quantum attacks and supporting the growth of a robust digital economy. Furthermore, it aligns with Sustainable Development Goals 8 and 9 by fostering trust and facilitating the adoption of cutting-edge IoT technologies, ultimately promoting more resilient and innovative economic activities.

(This article belongs to the Special Issue Cybersecurity and Privacy in Internet-of-Things: Advances, Challenges, and Emerging Trends)

27 pages, 1058 KB

Open Access Article

Ordered Eigenvalue Decomposition Implementation on Systolic Arrays via Virtual Rewiring

by Chengqian Tang, Yaxuan Lu, Bowen Liang, Yunhe Cao and Mengmeng Han

Electronics 2026, 15(5), 941; https://doi.org/10.3390/electronics15050941 - 25 Feb 2026

Viewed by 527

Abstract

Eigenvalue Decomposition (EVD) is a fundamental operation in real-time signal processing, yet obtaining sorted outputs from systolic arrays remains a persistent engineering challenge. The conventional Brent–Luk architecture relies on external sorting networks to reorder eigenvalues. Attempts to achieve in-place sorting via angle adjustment fail due to “topological mismatch,” a conflict between implicit data permutation from large-angle Givens rotations and fixed hardware routing. To address this, we propose a virtual rewiring mechanism. By exploiting the inherent half-cycle reversal pattern of Round-Robin scheduling, we derive a correction algorithm requiring only sign-bit operations. This achieves automatic descending-order arrangement without modifying physical interconnects. Field-Programmable Gate Array (FPGA) experiments demonstrate that the proposed scheme requires negligible additional resources (0.8% Look-Up Tables (LUTs)) while reducing sorting-related logic by 91%. Furthermore, sorting is achieved entirely within the existing computational pipeline, resulting in zero additional hardware latency per sweep.

(This article belongs to the Special Issue New Advances of FPGAs in Signal Processing)
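For context, the conventional baseline that this abstract contrasts against (Jacobi rotations followed by a separate sorting step) can be sketched as follows. This is a simplified illustration, assuming a real symmetric matrix and a plain cyclic sweep order; it does not implement the virtual rewiring mechanism proposed in the paper, and the function name is hypothetical.

```python
import math


def jacobi_eigenvalues_sorted(a, sweeps=10, tol=1e-12):
    """Eigenvalues of a real symmetric matrix via cyclic Jacobi, sorted afterwards."""
    n = len(a)
    a = [row[:] for row in a]  # work on a copy
    for _ in range(sweeps):
        # Stop once the off-diagonal energy is negligible.
        off = sum(a[p][q] ** 2 for p in range(n) for q in range(p + 1, n))
        if off < tol:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Rotation angle that zeroes a[p][q]: tan(2θ) = 2a_pq / (a_pp - a_qq)
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[p][p] - a[q][q])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # right-multiply: update columns p and q
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp + s * akq
                    a[k][q] = -s * akp + c * akq
                for k in range(n):  # left-multiply: update rows p and q
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk + s * aqk
                    a[q][k] = -s * apk + c * aqk
    # Separate sorting step, the part an external sorting network would perform.
    return sorted((a[i][i] for i in range(n)), reverse=True)


if __name__ == "__main__":
    print(jacobi_eigenvalues_sorted([[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]))
```

The final `sorted` call stands in for the extra reordering stage; the abstract's claim is that this stage can be folded into the rotation pipeline itself.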

18 pages, 47766 KB

Open Access Article

Scalable AI + DSP Compute Frameworks Using AMD Xilinx RF-SoC ZCU/VCU Platforms for Wireless Testbeds for Scientific, Commercial, Space, and Defense Applications

by Buddhipriya Gayanath, Gayani Rathnasekara, Kasun Karunanayake and Arjuna Madanayake

Electronics 2026, 15(2), 445; https://doi.org/10.3390/electronics15020445 - 20 Jan 2026

Viewed by 1334

Abstract

This paper describes recent engineering designs that allow full-duplex SerDes connectivity between a number of cascaded Xilinx radio frequency system-on-chip (RF-SoC) and VCU FPGA systems. The design allows for unlimited scalability with all-to-all connectivity across FPGA systems and RF-SoCs, providing bidirectional data transport in streaming mode at a capacity of 50 Gbps per ADC-DAC channel. A custom massively parallel systolic-array architecture supporting 8 parallel data streams from time-interleaved ADC/DACs allows real-time matrix–vector multiplication (MVM). The supported MVM matrix sizes are 8 × 8, 8 × 16, …, 8 × 1024, and a sustained real-time throughput of 1 TeraMAC/second is demonstrated for matrix size 8 × 512. The MVM is the building block supporting machine learning and filtering, with the computational graph split across FPGA systems using the SerDes connections. The RF data processed by the FPGA chain can be further utilized for higher-level AI workloads on an NVIDIA DGX Spark platform connected to the system. We demonstrate two platforms in which ZCU111 and ZCU1285 RF-SoC boards perform direct-RF data acquisition, while compute engines operating in real time on VCU128 and VCU129 FPGA boards showcase both digital beamforming and polyphase FIR filterbanking in a real-time bandwidth of 1.0 GHz.

(This article belongs to the Special Issue Emerging Applications of FPGAs and Reconfigurable Computing System)

16 pages, 998 KB

Open Access Article

Architecture Design of a Convolutional Neural Network Accelerator for Heterogeneous Computing Based on a Fused Systolic Array

by Yang Zong, Zhenhao Ma, Jian Ren, Yu Cao, Meng Li and Bin Liu

Sensors 2026, 26(2), 628; https://doi.org/10.3390/s26020628 - 16 Jan 2026

Viewed by 784

Abstract

Convolutional Neural Networks (CNNs) generally suffer from excessive computational overhead, high resource consumption, and complex network structures, which severely restrict their deployment on microprocessor chips. Existing related accelerators only have an energy efficiency ratio of 2.32–6.5925 GOPs/W, making it difficult to meet the low-power requirements of embedded application scenarios. To address these issues, this paper proposes a low-power and high-energy-efficiency CNN accelerator architecture based on a central processing unit (CPU) and an Application-Specific Integrated Circuit (ASIC) heterogeneous computing architecture, adopting an operator-fused systolic array algorithm with the YOLOv5n target detection network as the application benchmark. It integrates a 2D systolic array with Conv-BN fusion technology to achieve deep operator fusion of convolution, batch normalization and activation functions; optimizes the RISC-V core to reduce resource usage; and adopts a locking mechanism and a prefetching strategy for the asynchronous platform to ensure operational stability. Experiments on the Nexys Video development board show that the architecture achieves 20.6 GFLOPs of computational performance, 1.96 W of power consumption, and 10.46 GOPs/W of energy efficiency ratio, which is 58–350% higher than existing mainstream accelerators, thus demonstrating excellent potential for embedded deployment.

(This article belongs to the Section Intelligent Sensors)

28 pages, 1828 KB

Open Access Article

Edge Detection on a 2D-Mesh NoC with Systolic Arrays: From FPGA Validation to GDSII Proof-of-Concept

by Emma Mascorro-Guardado, Susana Ortega-Cisneros, Francisco Javier Ibarra-Villegas, Jorge Rivera, Héctor Emmanuel Muñoz-Zapata and Emilio Isaac Baungarten-Leon

Appl. Sci. 2026, 16(2), 702; https://doi.org/10.3390/app16020702 - 9 Jan 2026

Cited by 1 | Viewed by 756

Abstract

Edge detection is a key building block in real-time image-processing applications such as drone-based infrastructure inspection, autonomous navigation, and remote sensing. However, its computational cost remains a challenge for resource-constrained embedded systems. This work presents a hardware-accelerated edge detection architecture based on a homogeneous 2D-mesh Network-on-Chip (NoC) integrating systolic arrays to efficiently perform the convolution operations required by the Sobel filter. The proposed architecture was first developed and validated as a 3 × 3 mesh prototype on FPGA (Xilinx Zynq-7000, Zynq-7010, XC7Z010-CLG400A, Zybo board, utilizing 26,112 LUTs, 24,851 flip-flops, and 162 DSP blocks), achieving a throughput of 8.8 Gb/s with a power consumption of 0.79 W at 100 MHz. Building upon this validated prototype, a reduced 2 × 2 node cluster with 14-bit word width was subsequently synthesized at the physical level as a proof-of-concept using the OpenLane RTL-to-GDSII open-source flow targeting the SkyWater 130 nm PDK (sky130A). Post-layout analysis confirms the manufacturability of the design, with a total power consumption of 378 mW and compliance with timing constraints, demonstrating the feasibility of mapping the proposed architecture to silicon and its suitability for drone-based infrastructure monitoring applications.

(This article belongs to the Special Issue Advanced Integrated Circuit Design and Applications)
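The Sobel computation that this architecture accelerates can be written as a short reference model. This is a plain software sketch, not the NoC or systolic implementation: it assumes a grayscale image stored as a list of rows, processes only interior pixels (no padding), and uses the |Gx| + |Gy| approximation of the gradient magnitude, which is a common hardware-friendly choice but an assumption here.

```python
SOBEL_X = [[-1, 0, 1], [-2, 0, 2], [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1], [0, 0, 0], [1, 2, 1]]


def sobel_magnitude(img):
    """Return |Gx| + |Gy| for every interior pixel of a 2D grayscale image."""
    h, w = len(img), len(img[0])
    out = [[0] * (w - 2) for _ in range(h - 2)]
    for y in range(h - 2):
        for x in range(w - 2):
            gx = gy = 0
            # 3x3 window of multiply-accumulate operations, the part a
            # systolic array would perform in parallel.
            for i in range(3):
                for j in range(3):
                    pixel = img[y + i][x + j]
                    gx += SOBEL_X[i][j] * pixel
                    gy += SOBEL_Y[i][j] * pixel
            out[y][x] = abs(gx) + abs(gy)
    return out


if __name__ == "__main__":
    # A vertical step edge: dark on the left, bright on the right.
    image = [[0, 0, 10, 10]] * 3
    print(sobel_magnitude(image))
```

Each output pixel needs 18 multiply-accumulates over a sliding window, so neighbouring windows share most of their inputs, which is the data reuse that array-style hardware exploits.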

12 pages, 4871 KB

Open Access Article

A Hybrid Scale-Up and Scale-Out Approach for Performance and Energy Efficiency Optimization in Systolic Array Accelerators

by Hao Sun, Junzhong Shen, Changwu Zhang and Hengzhu Liu

Micromachines 2025, 16(3), 336; https://doi.org/10.3390/mi16030336 - 14 Mar 2025

Cited by 2 | Viewed by 2884

Abstract

The rapid development of deep neural networks (DNNs), such as convolutional neural networks and transformer-based large language models, has significantly advanced AI applications. However, these advances have introduced substantial computational and data demands, presenting challenges for the development of systolic array accelerators, which excel in tensor operations. Systolic array accelerators are typically developed using two approaches: scale-up, which increases the size of a single array, and scale-out, which involves multiple parallel arrays of fixed size. Scale-up achieves high performance in large-scale matrix multiplications, while scale-out offers better energy efficiency for lower-dimensional matrix multiplications. However, neither approach can simultaneously maintain both high performance and high energy efficiency across the full spectrum of DNN tasks. In this work, we propose a hybrid approach that integrates scale-up and scale-out techniques. We use mapping space exploration in a multi-tenant application environment to assign DNN operations to specific systolic array modules, thereby optimizing performance and energy efficiency. Experiments show that our proposed hybrid systolic array accelerator reduces energy consumption by up to 8% on average and improves throughput by up to 57% on average, compared to TPUv3 across various DNN models.

28 pages, 422 KB

Open Access Article

Enhancing Security and Efficiency in IoT Assistive Technologies: A Novel Hybrid Systolic Array Multiplier for Cryptographic Algorithms

by Atef Ibrahim and Fayez Gebali

Appl. Sci. 2025, 15(5), 2660; https://doi.org/10.3390/app15052660 - 1 Mar 2025

Cited by 4 | Viewed by 1363

Abstract

The incorporation of Internet of Things (IoT) edge nodes into assistive technologies greatly improves the daily lives of individuals with disabilities by facilitating real-time data processing and seamless connectivity. However, the increasing adoption of IoT edge devices intended for individuals with disabilities presents significant security challenges, particularly concerning the safeguarding of sensitive data and the heightened risk of cyber vulnerabilities. To effectively mitigate these risks, advanced cryptographic protocols, including those based on elliptic curve cryptography, have been proposed to establish robust security measures. While these protocols are effective in reducing the risk of data exposure, they often demand considerable computational resources, which poses challenges for cost-effective IoT devices. Therefore, it is essential to prioritize the effective execution of cryptographic algorithms, as they rely on finite field operations such as multiplication, inversion, and division. Among these computations, field multiplication is particularly critical, serving as the backbone for the other operations. This study intends to create an innovative hybrid systolic array design for the Dickson basis multiplier, which integrates both serial and parallel inputs to enhance overall performance. The proposed design is anticipated to significantly reduce space and power consumption, thereby enabling the secure execution of complex cryptographic algorithms on resource-limited IoT devices designed for disabled people. By addressing these pressing security issues, the study aspires to fully leverage IoT technologies to enhance the living standards of individuals with disabilities, while ensuring that their privacy and security are meticulously maintained.

(This article belongs to the Special Issue Recent Advances in the Internet of Things (IoT): Architecture, Protocols and Security, 2nd Edition)

30 pages, 526 KB

Open Access Article

Optimizing Security of Radio Frequency Identification Systems in Assistive Devices: A Novel Unidirectional Systolic Design for Dickson-Based Field Multiplier

by Atef Ibrahim and Fayez Gebali

Systems 2025, 13(3), 154; https://doi.org/10.3390/systems13030154 - 25 Feb 2025

Cited by 3 | Viewed by 1242

Abstract

The emergence of the Internet of Things (IoT) technologies has greatly enhanced the lives of individuals with disabilities by leveraging radio frequency identification (RFID) systems to improve autonomy and access to essential services. However, these advancements also pose significant security risks, particularly through side-channel attacks that exploit weaknesses in the design and operation of RFID tags and readers, potentially jeopardizing sensitive information. To combat these threats, several solutions have been proposed, including advanced cryptographic protocols built on cryptographic algorithms such as elliptic curve cryptography. While these protocols offer strong protection and help minimize data leakage, they often require substantial computational resources, making them impractical for low-cost RFID tags. Therefore, it is essential to focus on the efficient implementation of cryptographic algorithms, which are fundamental to most encryption systems. Cryptographic algorithms primarily depend on various finite field operations, including field multiplication, field inversion, and field division. Among these operations, field multiplication is especially crucial, as it forms the foundation for executing other field operations, making it vital for the overall performance and security of the cryptographic framework. The method of implementing field multiplication operation significantly influences the system’s resilience against side-channel attacks; for instance, implementation using unidirectional systolic array structures can provide enhanced error detection capabilities, improving resistance to side-channel attacks compared to traditional bidirectional multipliers. Therefore, this research aims to develop a novel unidirectional systolic array structure for the Dickson basis multiplier, which is anticipated to achieve lower space and power consumption, facilitating the efficient and secure implementation of computationally intensive cryptographic algorithms in RFID systems with limited resources. This advancement is crucial as RFID technology becomes increasingly integrated into various IoT applications for individuals with disabilities, including secure identification and access control.

(This article belongs to the Special Issue Cybersecurity and Secure Information Systems: Challenges and Solutions in Digital Environment)

21 pages, 2394 KB

Open Access Editor’s Choice Article

AFHRE: An Accurate and Fast Hardware Resources Estimation Method for Convolutional Accelerator with Systolic Array Structure on FPGA

by Yongchang Wang, Hongzhi Zhao and Jinyao Zhao

Electronics 2025, 14(1), 168; https://doi.org/10.3390/electronics14010168 - 3 Jan 2025

Viewed by 1810

Abstract

FPGA-based convolutional accelerators have been widely used in image recognition scenarios. Many convolutional accelerators utilize the systolic array structure to enhance parallelism. Developing a method to efficiently estimate the utilized hardware resources of an FPGA for such a structure would be helpful in improving the speed of achieving an optimal systolic array structure with the best performance on a given FPGA device. Currently, most estimation work has either focused on the evaluation of hardware resources for general structures or has not adequately assessed hardware resources specifically for systolic arrays. To reduce estimation latency, this paper proposes an Accurate and Fast Hardware Resources Estimation method (AFHRE) that addresses these shortcomings by analyzing the structure of systolic arrays and utilizing mathematical formulas to describe their characteristics. Experiments show that the DSP resource occupancy estimated by AFHRE is fully consistent with that by Vivado HLS. The error rates of the other three types of hardware resources (BRAM, LUT, and FF) are within 11%. In addition, resource estimation using this method is 40× to 610× faster than that of Vivado HLS. AFHRE can serve as a preprocessing step for Vivado HLS, finding optimal or sub-optimal systolic array parameters much faster than the original simulation flow of Vivado HLS.

(This article belongs to the Special Issue FPGA-Based Reconfigurable Embedded Systems)

18 pages, 3376 KB

Open Access Feature Paper Article

Heterogeneous Edge Computing for Molecular Property Prediction with Graph Convolutional Networks

by Mahdieh Grailoo and Jose Nunez-Yanez

Electronics 2025, 14(1), 101; https://doi.org/10.3390/electronics14010101 - 30 Dec 2024

Cited by 3 | Viewed by 2089

Abstract

Graph-based neural networks have proven to be useful in molecular property prediction, a critical component of computer-aided drug discovery. In this application, in response to the growing demand for improved computational efficiency and localized edge processing, this paper introduces a novel approach that leverages specialized accelerators on a heterogeneous edge computing platform. Our focus is on graph convolutional networks, a leading graph-based neural network variant that integrates graph convolution layers with multi-layer perceptrons. Molecular graphs are typically characterized by a low number of nodes, leading to low-dimensional dense matrix multiplications within multi-layer perceptrons, conditions that are particularly well-suited for Edge TPUs. These TPUs feature a systolic array of multiply–accumulate units optimized for dense matrix operations. Furthermore, the inherent sparsity in molecular graph adjacency matrices offers additional opportunities for computational optimization. To capitalize on this, we developed an FPGA GFADES accelerator, using high-level synthesis, specifically tailored to efficiently manage the sparsity in both the graph structure and node features. Our hardware/software co-designed GCN+MLP architecture delivers performance improvements, achieving up to 58× increased speed compared to conventional software implementations. This architecture is implemented using the Pynq framework and TensorFlow Lite Runtime, running on a multi-core ARM CPU within an AMD/Xilinx Zynq Ultrascale+ device, in combination with the Edge TPU and programmable logic.

(This article belongs to the Special Issue Machine Learning in Electronic and Biomedical Engineering, 3rd Edition)

15 pages, 2101 KB

Open Access Article

Scalable Transformer Accelerator with Variable Systolic Array for Multiple Models in Voice Assistant Applications

by Seok-Woo Chang and Dong-Sun Kim

Electronics 2024, 13(23), 4683; https://doi.org/10.3390/electronics13234683 - 27 Nov 2024

Cited by 4 | Viewed by 4724

Abstract

The Transformer model is a type of deep learning model that has quickly become fundamental in natural language processing (NLP) and other machine learning tasks. Transformer hardware accelerators are usually designed for specific models, such as Bidirectional Encoder Representations from Transformers (BERT), and vision Transformer models, like the ViT. In this study, we propose a Scalable Transformer Accelerator Unit (STAU) for multiple models, enabling efficient handling of various Transformer models used in voice assistant applications. A Variable Systolic Array (VSA) centralized design, along with control and data preprocessing in embedded processors, enables matrix operations of varying sizes. In addition, we propose an efficient variable structure and a row-wise data input method for natural language processing where the word count changes. The proposed scalable Transformer accelerator accelerates text summarization, audio processing, image search, and generative AI used in voice assistance.

(This article belongs to the Topic Theory and Applications of High Performance Computing)

12 pages, 3631 KB

Open Access Article

Fiber Bragg Grating Pulse and Systolic Blood Pressure Measurement System Based on Mach–Zehnder Interferometer

by Yuanjun Li, Bo Wang, Shanren Liu, Mengmeng Gao, Qianhua Li, Chao Chen, Qi Guo and Yongsen Yu

Sensors 2024, 24(19), 6222; https://doi.org/10.3390/s24196222 - 26 Sep 2024

Cited by 2 | Viewed by 2906

Abstract

A fiber Bragg grating (FBG) pulse and systolic blood pressure (SBP) measurement system based on the edge-filtering method is proposed. The edge filter is the Mach–Zehnder interferometer (MZI) fabricated by two fiber couplers with a linear slope of 52.45 dBm/nm. The developed system consists of a broadband light source, an edge filter, fiber Bragg gratings (FBGs), a coarse wavelength-division multiplexer (CWDM), and signal-processing circuits based on a field-programmable gate array (FPGA). It can simultaneously measure pulse pulsations of the radial artery in the wrist at three positions: Cun, Guan and Chi. The SBP can be calculated based on the pulse transit time (PTT) principle. The measurement results compared to a standard blood pressure monitor showed the mean absolute error (MAE) and standard deviation (STD) of the SBP were 0.93 ± 3.13 mmHg. The system meets the requirements of the Association for the Advancement of Medical Instrumentation (AAMI) equipment standards. The proposed system can achieve continuous real-time measurement of pulse and SBP and has the advantages of fast detection speed, stable performance, and no compression sensation for subjects. The system has important application value in the fields of human health monitoring and medical device development.

(This article belongs to the Section Optical Sensors)

23 pages, 16139 KB

Open Access Article

Bioarchitectonic Nanophotonics by Replication and Systolic Miniaturization of Natural Forms

by Konstantina Papachristopoulou and Nikolaos A. Vainos

Biomimetics 2024, 9(8), 487; https://doi.org/10.3390/biomimetics9080487 - 13 Aug 2024

Viewed by 5051

Abstract

The mimesis of biological mechanisms by artificial devices constitutes the modern, rapidly expanding, multidisciplinary biomimetics sector. In the broader bioinspiration perspective, however, bioarchitectures may perform independent functions without necessarily mimicking their biological generators. In this paper, we explore such Bioarchitectonic notions and demonstrate three-dimensional photonics by the exact replication of insect organs using ultra-porous silica aerogels. The subsequent conformal systolic transformation yields their miniaturized affine ‘clones’ having higher mass density and refractive index. Focusing on the paradigms of ommatidia, the compound eye of the hornet Vespa crabro flavofasciata and the microtrichia of the scarab Protaetia cuprea phoebe, we fabricate their aerogel replicas and derivative clones and investigate their photonic functionalities. Ultralight aerogel microlens arrays are proven to be functional photonic devices having a focal length f ~ 1000 μm and f-number f/30 in the visible spectrum. Stepwise systolic transformation yields denser and affine functional elements, ultimately fused silica clones, exhibiting strong focusing properties due to their very short focal length of f ~ 35 μm and f/3.5. The fabricated transparent aerogel and xerogel replicas of microtrichia demonstrate a remarkable optical waveguiding performance, delivering light to their sub-100 nm nanotips. Dense fused silica conical clones deliver light through sub-50 nm nanotips, enabling nanoscale light–matter interactions. Super-resolution bioarchitectonics offers new and alternative tools and promises novel developments and applications in nanophotonics and other nanotechnology sectors.

(This article belongs to the Special Issue Biomimetic Nanotechnology Vol. 4: Advances in Biomimetic Nanotechnology)

Search Results (82)
