Search Results (20)

Search Parameters:
Keywords = hardware-aware deep learning

19 pages, 13244 KB  
Article
MWR-Net: An Edge-Oriented Lightweight Framework for Image Restoration in Single-Lens Infrared Computational Imaging
by Xuanyu Qian, Xuquan Wang, Yujie Xing, Guishuo Yang, Xiong Dun, Zhanshan Wang and Xinbin Cheng
Remote Sens. 2025, 17(17), 3005; https://doi.org/10.3390/rs17173005 - 29 Aug 2025
Abstract
Infrared video imaging is a cornerstone technology for environmental perception, particularly in drone-based remote sensing applications such as disaster assessment and infrastructure inspection. Conventional systems, however, rely on bulky optical architectures that limit deployment on lightweight aerial platforms. Computational imaging offers a promising alternative by integrating optical encoding with algorithmic reconstruction, enabling compact hardware while maintaining imaging performance comparable to sophisticated multi-lens systems. Nonetheless, achieving real-time video-rate computational image restoration on resource-constrained unmanned aerial vehicles (UAVs) remains a critical challenge. To address this, we propose Mobile Wavelet Restoration-Net (MWR-Net), a lightweight deep learning framework tailored for real-time infrared image restoration. Built on a MobileNetV4 backbone, MWR-Net leverages depthwise separable convolutions and an optimized downsampling scheme to minimize parameters and computational overhead. A novel wavelet-domain loss enhances high-frequency detail recovery, while the modulation transfer function (MTF) is adopted as an optics-aware evaluation metric. With only 666.37 K parameters and 6.17 G MACs, MWR-Net achieves a PSNR of 37.10 dB and an SSIM of 0.964 on a custom dataset, outperforming a pruned U-Net baseline. Deployed on an RK3588 chip, it runs at 42 FPS. These results demonstrate MWR-Net’s potential as an efficient and practical solution for UAV-based infrared sensing applications. Full article
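The abstract does not specify the wavelet-domain loss in detail; as a hedged illustration, a single-level Haar decomposition with extra weight on the high-frequency subbands might look like the following sketch (the function names and weighting scheme are assumptions, not the paper's formulation):

```python
import numpy as np

def haar_subbands(img):
    """Single-level 2D Haar transform: returns (LL, LH, HL, HH) subbands."""
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0   # low-frequency approximation
    lh = (a + b - c - d) / 4.0   # horizontal detail
    hl = (a - b + c - d) / 4.0   # vertical detail
    hh = (a - b - c + d) / 4.0   # diagonal detail
    return ll, lh, hl, hh

def wavelet_loss(pred, target, hf_weight=2.0):
    """L1 loss on subbands, weighting the high-frequency bands more heavily."""
    loss = 0.0
    for i, (p, t) in enumerate(zip(haar_subbands(pred), haar_subbands(target))):
        w = 1.0 if i == 0 else hf_weight   # i == 0 is the LL band
        loss += w * np.mean(np.abs(p - t))
    return loss
```

Weighting the LH/HL/HH bands pushes the restoration network toward recovering edges and fine texture, which ordinary pixel-wise losses tend to blur.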

24 pages, 8688 KB  
Article
Lightweight Obstacle Avoidance for Fixed-Wing UAVs Using Entropy-Aware PPO
by Meimei Su, Haochen Chai, Chunhui Zhao, Yang Lyu and Jinwen Hu
Drones 2025, 9(9), 598; https://doi.org/10.3390/drones9090598 - 26 Aug 2025
Abstract
Obstacle avoidance during high-speed, low-altitude flight remains a significant challenge for unmanned aerial vehicles (UAVs), particularly in unfamiliar environments where prior maps and heavy onboard sensors are unavailable. To address this, we present an entropy-aware deep reinforcement learning framework that enables fixed-wing UAVs to navigate safely using only monocular onboard cameras. Our system features a lightweight, single-frame depth estimation module optimized for real-time execution on edge computing platforms, followed by a reinforcement learning controller equipped with a novel reward function that balances goal-reaching performance with path smoothness under fixed-wing dynamic constraints. To enhance policy optimization, we incorporate high-quality experiences from the replay buffer into the gradient computation, introducing a soft imitation mechanism that encourages the agent to align its behavior with previously successful actions. To further balance exploration and exploitation, we integrate an adaptive entropy regularization mechanism into the Proximal Policy Optimization (PPO) algorithm. This module dynamically adjusts policy entropy during training, leading to improved stability, faster convergence, and better generalization to unseen scenarios. Extensive software-in-the-loop (SITL) and hardware-in-the-loop (HITL) experiments demonstrate that our approach outperforms baseline methods in obstacle avoidance success rate and path quality, while remaining lightweight and deployable on resource-constrained aerial platforms. Full article
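The adaptive entropy regularization described above dynamically adjusts the policy-entropy bonus during training. A minimal sketch of one plausible scheme is below; the paper's actual update rule, step sizes, and bounds are not given in the abstract, so all constants here are assumptions:

```python
def adapt_entropy_coef(coef, current_entropy, target_entropy,
                       step=1.05, coef_min=1e-4, coef_max=0.1):
    """Raise the entropy bonus when the policy collapses below the target
    entropy (encouraging exploration); lower it when entropy exceeds the
    target (favouring exploitation). Result is clipped to [coef_min, coef_max]."""
    if current_entropy < target_entropy:
        coef *= step
    else:
        coef /= step
    return min(max(coef, coef_min), coef_max)
```

In a PPO loop this coefficient would multiply the entropy term of the surrogate objective each update, so exploration pressure tracks how deterministic the policy has become.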

17 pages, 1292 KB  
Article
An Instrumental High-Frequency Smart Meter with Embedded Energy Disaggregation
by Dimitrios Kolosov, Matthew Robinson, Pascal A. Schirmer and Iosif Mporas
Sensors 2025, 25(17), 5280; https://doi.org/10.3390/s25175280 - 25 Aug 2025
Abstract
Most available smart meters sample at low rates and transmit the acquired measurements to a cloud server for further processing. This article presents a prototype smart meter operating at a high sampling frequency (15 kHz) and performing energy disaggregation locally, thus negating the need to transmit the acquired high-frequency measurements. The prototype’s architecture comprises a custom signal conditioning circuit and an embedded board that performs energy disaggregation using a deep learning model. The influence of the sampling frequency on the model’s accuracy and the edge device power consumption, throughput, and latency across different hardware platforms is evaluated. The architecture embeds NILM inference into the meter hardware while maintaining a compact and energy-efficient design. The presented smart meter is benchmarked across six embedded platforms, evaluating model accuracy, latency, power usage, and throughput. Furthermore, three novel hardware-aware performance metrics are introduced to quantify NILM efficiency per unit cost, throughput, and energy, offering a reproducible framework for future NILM-enabled edge meter designs. Full article
(This article belongs to the Section Electronic Sensors)
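The three hardware-aware NILM metrics are described only as efficiency per unit cost, throughput, and energy; one plausible formulation is sketched below (the names and formulas are illustrative, not the paper's definitions):

```python
def nilm_efficiency_metrics(f1_score, throughput_ips, power_w, cost_usd):
    """Illustrative hardware-aware NILM metrics: disaggregation accuracy
    (F1) normalized per dollar of hardware, per watt drawn, and per joule
    spent on a single inference."""
    energy_per_inference_j = power_w / throughput_ips   # J = W * s/inference
    return {
        "f1_per_dollar": f1_score / cost_usd,
        "f1_per_watt": f1_score / power_w,
        "f1_per_joule": f1_score / energy_per_inference_j,
    }
```

Normalizing accuracy this way lets embedded platforms with very different price and power envelopes be ranked on a common scale.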

32 pages, 1435 KB  
Review
Smart Safety Helmets with Integrated Vision Systems for Industrial Infrastructure Inspection: A Comprehensive Review of VSLAM-Enabled Technologies
by Emmanuel A. Merchán-Cruz, Samuel Moveh, Oleksandr Pasha, Reinis Tocelovskis, Alexander Grakovski, Alexander Krainyukov, Nikita Ostrovenecs, Ivans Gercevs and Vladimirs Petrovs
Sensors 2025, 25(15), 4834; https://doi.org/10.3390/s25154834 - 6 Aug 2025
Abstract
Smart safety helmets equipped with vision systems are emerging as powerful tools for industrial infrastructure inspection. This paper presents a comprehensive state-of-the-art review of such VSLAM-enabled (Visual Simultaneous Localization and Mapping) helmets. We surveyed the evolution from basic helmet cameras to intelligent, sensor-fused inspection platforms, highlighting how modern helmets leverage real-time visual SLAM algorithms to map environments and assist inspectors. A systematic literature search was conducted targeting high-impact journals, patents, and industry reports. We classify helmet-integrated camera systems into monocular, stereo, and omnidirectional types and compare their capabilities for infrastructure inspection. We examine core VSLAM algorithms (feature-based, direct, hybrid, and deep-learning-enhanced) and discuss their adaptation to wearable platforms. Multi-sensor fusion approaches integrating inertial, LiDAR, and GNSS data are reviewed, along with edge/cloud processing architectures enabling real-time performance. This paper compiles numerous industrial use cases, from bridges and tunnels to plants and power facilities, demonstrating significant improvements in inspection efficiency, data quality, and worker safety. Key challenges are analyzed, including technical hurdles (battery life, processing limits, and harsh environments), human factors (ergonomics, training, and cognitive load), and regulatory issues (safety certification and data privacy). We also identify emerging trends, such as semantic SLAM, AI-driven defect recognition, hardware miniaturization, and collaborative multi-helmet systems. This review finds that VSLAM-equipped smart helmets offer a transformative approach to infrastructure inspection, enabling real-time mapping, augmented awareness, and safer workflows. 
We conclude by highlighting current research gaps, notably in standardizing systems and integrating with asset management, and provide recommendations for industry adoption and future research directions. Full article

24 pages, 8344 KB  
Article
Research and Implementation of Travel Aids for Blind and Visually Impaired People
by Jun Xu, Shilong Xu, Mingyu Ma, Jing Ma and Chuanlong Li
Sensors 2025, 25(14), 4518; https://doi.org/10.3390/s25144518 - 21 Jul 2025
Abstract
Blind and visually impaired (BVI) people face significant challenges in perception, navigation, and safety during travel. Existing infrastructure (e.g., blind lanes) and traditional aids (e.g., walking sticks, basic audio feedback) provide limited flexibility and interactivity for complex environments. To solve this problem, we propose a real-time travel assistance system based on deep learning. The hardware comprises an NVIDIA Jetson Nano controller, an Intel D435i depth camera for environmental sensing, and SG90 servo motors for feedback. To address embedded device computational constraints, we developed a lightweight object detection and segmentation algorithm. Key innovations include a multi-scale attention feature extraction backbone, a dual-stream fusion module incorporating the Mamba architecture, and adaptive context-aware detection/segmentation heads. This design ensures high computational efficiency and real-time performance. The system workflow is as follows: (1) the D435i captures real-time environmental data; (2) the processor analyzes this data, converting obstacle distances and path deviations into electrical signals; (3) servo motors deliver vibratory feedback for guidance and alerts. Preliminary tests confirm that the system can effectively detect obstacles and correct path deviations in real time, suggesting its potential to assist BVI users. However, as this is a work in progress, comprehensive field trials with BVI participants are required to fully validate its efficacy. Full article
(This article belongs to the Section Intelligent Sensors)
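Step (3) of the workflow converts obstacle distance into a vibratory feedback signal. A toy sketch of such a mapping follows; the distance thresholds and linear intensity ramp are assumptions for illustration, not the system's calibration:

```python
import numpy as np

def vibration_intensity(depth_map_m, near_m=0.5, far_m=3.0):
    """Map the closest obstacle in a depth map (metres) to a feedback
    intensity in [0, 1]: 1.0 at or inside `near_m`, 0.0 at or beyond
    `far_m`, linear in between."""
    d = float(np.min(depth_map_m))
    if d <= near_m:
        return 1.0
    if d >= far_m:
        return 0.0
    return (far_m - d) / (far_m - near_m)
```

The returned intensity could then drive a servo's vibration duty cycle, so closer obstacles produce stronger alerts.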

23 pages, 2426 KB  
Article
SUQ-3: A Three Stage Coarse-to-Fine Compression Framework for Sustainable Edge AI in Smart Farming
by Thavavel Vaiyapuri and Huda Aldosari
Sustainability 2025, 17(12), 5230; https://doi.org/10.3390/su17125230 - 6 Jun 2025
Abstract
Artificial intelligence of things (AIoT) has become a pivotal enabler of precision agriculture by supporting real-time, data-driven decision-making at the edge. Deep learning (DL) models are central to this paradigm, offering powerful capabilities for analyzing environmental and climatic data in a range of agricultural applications. However, deploying these models on edge devices remains challenging due to constraints in memory, computation, and energy. Existing model compression techniques predominantly target large-scale 2D architectures, with limited attention to one-dimensional (1D) models such as gated recurrent units (GRUs), which are commonly employed for processing sequential sensor data. To address this gap, we propose a novel three-stage coarse-to-fine compression framework, termed SUQ-3 (Structured, Unstructured Pruning, and Quantization), designed to optimize 1D DL models for efficient edge deployment in AIoT applications. The SUQ-3 framework sequentially integrates (1) structured pruning with an M×N sparsity pattern to induce hardware-friendly, coarse-grained sparsity; (2) unstructured pruning to eliminate low-magnitude weights for fine-grained compression; and (3) quantization, applied post quantization-aware training (QAT), to support low-precision inference with minimal accuracy loss. We validate the proposed SUQ-3 by compressing a GRU-based crop recommendation model trained on environmental and climatic data from an agricultural dataset. Experimental results show a model size reduction of approximately 85% and an 80% improvement in inference latency while preserving high predictive accuracy (F1 score: 0.97 vs. baseline: 0.9837). Notably, when deployed on a mobile edge device using TensorFlow Lite, the SUQ-3 model achieved an estimated energy consumption of 1.18 μJ per inference, representing a 74.4% reduction compared with the baseline and demonstrating its potential for sustainable low-power AI deployment in agricultural environments. 
Although demonstrated in an agricultural AIoT use case, the generality and modularity of SUQ-3 make it applicable to a broad range of DL models across domains requiring efficient edge intelligence. Full article
(This article belongs to the Collection Sustainability in Agricultural Systems and Ecosystem Services)
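Stage (1) of SUQ-3, structured pruning with an M×N sparsity pattern, keeps only the M largest-magnitude weights in every group of N consecutive weights. A numpy sketch for the common 2:4 pattern is below; SUQ-3's exact pattern and grouping are not specified in the abstract, so treat this as illustrative:

```python
import numpy as np

def prune_m_of_n(weights, m=2, n=4):
    """Structured M:N sparsity sketch: in every group of `n` consecutive
    weights, keep the `m` largest-magnitude entries and zero the rest."""
    w = weights.flatten().copy()
    pad = (-len(w)) % n                      # pad so length divides by n
    w = np.concatenate([w, np.zeros(pad)])
    groups = w.reshape(-1, n)
    # Zero out everything except the top-m magnitudes in each group.
    idx = np.argsort(np.abs(groups), axis=1)[:, :-m]
    np.put_along_axis(groups, idx, 0.0, axis=1)
    return groups.flatten()[:weights.size].reshape(weights.shape)
```

The fixed M-of-N layout is what makes the sparsity "hardware-friendly": accelerators can skip the zeroed positions without storing an irregular index structure.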

20 pages, 305 KB  
Article
Revisiting Database Indexing for Parallel and Accelerated Computing: A Comprehensive Study and Novel Approaches
by Maryam Abbasi, Marco V. Bernardo, Paulo Váz, José Silva and Pedro Martins
Information 2024, 15(8), 429; https://doi.org/10.3390/info15080429 - 24 Jul 2024
Cited by 1
Abstract
While the importance of indexing strategies for optimizing query performance in database systems is widely acknowledged, the impact of rapidly evolving hardware architectures on indexing techniques has been an underexplored area. As modern computing systems increasingly leverage parallel processing capabilities, multi-core CPUs, and specialized hardware accelerators, traditional indexing approaches may not fully capitalize on these advancements. This comprehensive experimental study investigates the effects of hardware-conscious indexing strategies tailored for contemporary and emerging hardware platforms. Through rigorous experimentation on a real-world database environment using the industry-standard TPC-H benchmark, this research evaluates the performance implications of indexing techniques specifically designed to exploit parallelism, vectorization, and hardware-accelerated operations. By examining approaches such as cache-conscious B-Tree variants, SIMD-optimized hash indexes, and GPU-accelerated spatial indexing, the study provides valuable insights into the potential performance gains and trade-offs associated with these hardware-aware indexing methods. The findings reveal that hardware-conscious indexing strategies can significantly outperform their traditional counterparts, particularly in data-intensive workloads and large-scale database deployments. Our experiments show improvements ranging from 32.4% to 48.6% in query execution time, depending on the specific technique and hardware configuration. However, the study also highlights the complexity of implementing and tuning these techniques, as they often require intricate code optimizations and a deep understanding of the underlying hardware architecture. Additionally, this research explores the potential of machine learning-based indexing approaches, including reinforcement learning for index selection and neural network-based index advisors. 
While these techniques show promise, with performance improvements of up to 48.6% in certain scenarios, their effectiveness varies across different query types and data distributions. By offering a comprehensive analysis and practical recommendations, this research contributes to the ongoing pursuit of database performance optimization in the era of heterogeneous computing. The findings inform database administrators, developers, and system architects on effective indexing practices tailored for modern hardware, while also paving the way for future research into adaptive indexing techniques that can dynamically leverage hardware capabilities based on workload characteristics and resource availability. Full article
(This article belongs to the Special Issue Advances in High Performance Computing and Scalable Software)
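To give a flavour of what "SIMD-optimized" index probing means, the sketch below runs a vectorized batch membership probe against a sorted key column, answering all queries in one binary-search pass. This is a numpy illustration of the idea, not the study's implementation:

```python
import numpy as np

def batch_probe_sorted(keys_sorted, queries):
    """Vectorized membership probe: binary-search every query position at
    once, then verify hits with a single elementwise comparison."""
    pos = np.searchsorted(keys_sorted, queries)
    in_range = pos < len(keys_sorted)        # guard out-of-range positions
    hits = np.zeros(len(queries), dtype=bool)
    hits[in_range] = keys_sorted[pos[in_range]] == queries[in_range]
    return hits
```

Batching the probes this way keeps the work in wide vector units and amortizes cache misses across queries, the same effect hardware-conscious index designs exploit.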
23 pages, 3711 KB  
Article
ESC-NAS: Environment Sound Classification Using Hardware-Aware Neural Architecture Search for the Edge
by Dakshina Ranmal, Piumini Ranasinghe, Thivindu Paranayapa, Dulani Meedeniya and Charith Perera
Sensors 2024, 24(12), 3749; https://doi.org/10.3390/s24123749 - 9 Jun 2024
Cited by 11
Abstract
The combination of deep-learning and IoT plays a significant role in modern smart solutions, providing the capability of handling task-specific real-time offline operations with improved accuracy and minimised resource consumption. This study provides a novel hardware-aware neural architecture search approach called ESC-NAS, to design and develop deep convolutional neural network architectures specifically tailored for handling raw audio inputs in environmental sound classification applications under limited computational resources. The ESC-NAS process consists of a novel cell-based neural architecture search space built with 2D convolution, batch normalization, and max pooling layers, and capable of extracting features from raw audio. A black-box Bayesian optimization search strategy explores the search space and the resulting model architectures are evaluated through hardware simulation. The models obtained from the ESC-NAS process achieved the optimal trade-off between model performance and resource consumption compared to the existing literature. The ESC-NAS models achieved accuracies of 85.78%, 81.25%, 96.25%, and 81.0% for the FSC22, UrbanSound8K, ESC-10, and ESC-50 datasets, respectively, with optimal model sizes and parameter counts for edge deployment. Full article
(This article belongs to the Section Internet of Things)
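The ESC-NAS loop pairs a cell-based search space with a hardware constraint. The sketch below substitutes plain random sampling for the paper's Bayesian optimization and a crude parameter-count proxy for its hardware simulation; all names, ranges, and formulas are illustrative:

```python
import random

def sample_config():
    """Sample one cell configuration from a toy search space (the real
    ESC-NAS space is built from 2D conv, batch norm, and max-pool cells)."""
    return {"cells": random.randint(2, 6),
            "filters": random.choice([16, 32, 64]),
            "kernel": random.choice([3, 5])}

def param_estimate(cfg):
    # Crude proxy for parameter count; a real flow would build the model.
    return cfg["cells"] * cfg["filters"] * cfg["kernel"] ** 2 * 100

def search(budget_params, evaluate, trials=50, seed=0):
    """Hardware-aware search sketch: configs over the parameter budget are
    rejected before evaluation; the best surviving config wins."""
    random.seed(seed)
    best, best_score = None, float("-inf")
    for _ in range(trials):
        cfg = sample_config()
        if param_estimate(cfg) > budget_params:
            continue                      # fails the hardware constraint
        score = evaluate(cfg)
        if score > best_score:
            best, best_score = cfg, score
    return best
```

The key structural point is that the hardware check gates candidates *before* the (expensive) accuracy evaluation, which is what makes the search hardware-aware rather than accuracy-only.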

22 pages, 1350 KB  
Article
Causal-Based Approaches to Explain and Learn from Self-Extension—A Review
by Rebeca Marfil, Pablo Bustos and Antonio Bandera
Electronics 2024, 13(7), 1169; https://doi.org/10.3390/electronics13071169 - 22 Mar 2024
Cited by 1
Abstract
The last decades have seen a revolution in autonomous robotics. Deep learning approaches and their hardware implementations have made it possible to endow robots with extraordinary perceptual capabilities. In addition, they can benefit from advances in Automated Planning, allowing them to autonomously solve complex tasks. However, on many occasions, the robot still acts without internalising and understanding the reasons behind a perception or an action, beyond an immediate response to a current state of the context. This gap results in limitations that affect its performance, reliability, and trustworthiness. Deep learning alone cannot bridge this gap because the reasons behind behaviour, when it emanates from a model in which the world is a black-box, are not accessible. What is really needed is an underlying architecture based on deeper reasoning. Among other issues, this architecture should enable the robot to generate explanations, allowing people to know why the robot is performing, or has performed, a certain action, or the reasons that may have caused a certain plan failure or perceptual anomaly. Furthermore, when these explanations arise from a cognitive process and are shared, and thus validated, with people, the robot should be able to incorporate these explanations into its knowledge base, and thus use this understanding to improve future behaviour. Our article looks at recent advances in the development of self-aware, self-evolving robots. These robots are designed to provide the necessary explanations to their human counterparts, thereby enhancing their functional capabilities in the quest to gain their trust. Full article
(This article belongs to the Special Issue Intelligent Control and Computing in Advanced Robotics)

13 pages, 2860 KB  
Article
Effectiveness of Data Augmentation for Localization in WSNs Using Deep Learning for the Internet of Things
by Jehan Esheh and Sofiene Affes
Sensors 2024, 24(2), 430; https://doi.org/10.3390/s24020430 - 10 Jan 2024
Cited by 6
Abstract
Wireless sensor networks (WSNs) have become widely popular and are extensively used for various sensor communication applications due to their flexibility and cost effectiveness, especially for applications where localization is a main challenge. The Dv-hop algorithm is a range-free localization algorithm commonly used in WSNs; despite its simplicity and low hardware requirements, it suffers from limited localization accuracy. In this article, we develop an accurate Deep Learning (DL)-based range-free localization method for WSN applications in the Internet of Things (IoT). To improve localization performance, we exploit a deep neural network (DNN) to correct the estimated distance between the unknown nodes (i.e., position-unaware) and the anchor nodes (i.e., position-aware) without burdening the IoT cost. Like most machine learning models, DNNs require substantial training data for optimal performance. To address this challenge, we propose a Data Augmentation Strategy (DAS) that creates multiple virtual anchors around the existing real anchors, generating more training data and significantly increasing the dataset size. We prove that DAS can provide the DNN with sufficient training data, ultimately making it more feasible for WSNs and the IoT to fully benefit from low-cost DNN-aided localization. The simulation results indicate that the accuracy of the proposed method (Dv-hop with DNN correction) surpasses that of Dv-hop. Full article
(This article belongs to the Section Internet of Things)
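The Data Augmentation Strategy scatters virtual anchors around the real ones to enlarge the training set. One plausible placement is sketched below, sampling virtual anchors uniformly inside a disc around each real anchor; the paper's exact placement strategy is not reproduced here:

```python
import numpy as np

def virtual_anchors(real_anchors, per_anchor=4, radius=1.0, seed=0):
    """DAS-style augmentation sketch: place `per_anchor` virtual anchors
    uniformly at random inside a disc of `radius` around each real anchor."""
    rng = np.random.default_rng(seed)
    out = []
    for ax, ay in real_anchors:
        # sqrt on the radial draw gives a uniform density over the disc.
        r = radius * np.sqrt(rng.uniform(0, 1, per_anchor))
        theta = rng.uniform(0, 2 * np.pi, per_anchor)
        out.extend(zip(ax + r * np.cos(theta), ay + r * np.sin(theta)))
    return np.array(out)
```

Each virtual anchor contributes extra (hop-count, distance) training pairs, multiplying the dataset size without deploying additional hardware.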

28 pages, 1384 KB  
Article
Quantization-Aware NN Layers with High-throughput FPGA Implementation for Edge AI
by Mara Pistellato, Filippo Bergamasco, Gianluca Bigaglia, Andrea Gasparetto, Andrea Albarelli, Marco Boschetti and Roberto Passerone
Sensors 2023, 23(10), 4667; https://doi.org/10.3390/s23104667 - 11 May 2023
Cited by 7
Abstract
Over the past few years, several applications have been extensively exploiting the advantages of deep learning, in particular when using convolutional neural networks (CNNs). The intrinsic flexibility of such models makes them widely adopted in a variety of practical applications, from medical to industrial. In this latter scenario, however, using consumer Personal Computer (PC) hardware is not always suitable for the potential harsh conditions of the working environment and the strict timing that industrial applications typically have. Therefore, the design of custom FPGA (Field Programmable Gate Array) solutions for network inference is gaining massive attention from researchers and companies as well. In this paper, we propose a family of network architectures composed of three kinds of custom layers working with integer arithmetic with a customizable precision (down to just two bits). Such layers are designed to be effectively trained on classical GPUs (Graphics Processing Units) and then synthesized to FPGA hardware for real-time inference. The idea is to provide a trainable quantization layer, called Requantizer, acting both as a non-linear activation for neurons and a value rescaler to match the desired bit precision. This way, the training is not only quantization-aware, but also capable of estimating the optimal scaling coefficients to accommodate both the non-linear nature of the activations and the constraints imposed by the limited precision. In the experimental section, we test the performance of this kind of model while working both on classical PC hardware and a case-study implementation of a signal peak detection device running on a real FPGA. We employ TensorFlow Lite for training and comparison, and use Xilinx FPGAs and Vivado for synthesis and implementation. 
The results show an accuracy of the quantized networks close to the floating point version, without the need for representative data for calibration as in other approaches, and performance that is better than dedicated peak detection algorithms. The FPGA implementation is able to run in real time at a rate of four gigapixels per second with moderate hardware resources, while achieving a sustained efficiency of 0.5 TOPS/W (tera operations per second per watt), in line with custom integrated hardware accelerators. Full article
(This article belongs to the Special Issue Sensors Based SoCs, FPGA in IoT Applications)
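The Requantizer acts as a value rescaler plus a clip to the target bit width. A fixed-scale sketch is below; in the paper the scaling coefficients are learned during quantization-aware training, whereas here the scale is a plain argument:

```python
import numpy as np

def requantize(x, scale, bits=2):
    """Requantizer-style activation sketch: rescale, round, and clip to a
    signed `bits`-wide integer grid, then map back to real values."""
    qmin, qmax = -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    q = np.clip(np.round(x / scale), qmin, qmax)
    return q * scale   # outputs lie on a 2**bits-level grid
```

The clip doubles as the layer's non-linearity, which is why a separate activation function is not needed in front of the quantized layers.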

28 pages, 23741 KB  
Article
Embedded Vision Intelligence for the Safety of Smart Cities
by Jon Martin, David Cantero, Maite González, Andrea Cabrera, Mikel Larrañaga, Evangelos Maltezos, Panagiotis Lioupis, Dimitris Kosyvas, Lazaros Karagiannidis, Eleftherios Ouzounoglou and Angelos Amditis
J. Imaging 2022, 8(12), 326; https://doi.org/10.3390/jimaging8120326 - 14 Dec 2022
Cited by 6
Abstract
Advances in artificial intelligence (AI) and embedded systems have resulted in a recent increase in the use of image processing applications for smart cities’ safety. This enables a cost-adequate scale of automated video surveillance, increasing the data available and reducing the need for human intervention. At the same time, although deep learning is a very intensive task in terms of computing resources, hardware and software improvements have emerged, allowing embedded systems to implement sophisticated machine learning algorithms at the edge. Additionally, new lightweight open-source middleware for constrained resource devices, such as EdgeX Foundry, have appeared to facilitate the collection and processing of data at sensor level, with communication capabilities to exchange data with a cloud enterprise application. The objective of this work is to show and describe the development of two Edge Smart Camera Systems for the safety of smart cities within the S4AllCities H2020 project. Hence, the work presents hardware and software modules developed within the project, including a custom hardware platform specifically developed for the deployment of deep learning models based on the I.MX8 Plus from NXP, which considerably reduces processing and inference times; a custom Video Analytics Edge Computing (VAEC) system deployed on a commercial NVIDIA Jetson TX2 platform, which provides high level results on person detection processes; and an edge computing framework for the management of those two edge devices, namely Distributed Edge Computing framework, DECIoT. To verify the utility and functionality of the systems, extended experiments were performed. The results highlight their potential to provide enhanced situational awareness and demonstrate the suitability for edge machine vision applications for safety in smart cities. Full article

13 pages, 2313 KB  
Article
Hardware-Aware Mobile Building Block Evaluation for Computer Vision
by Maxim Bonnaerens, Matthias Freiberger, Marian Verhelst and Joni Dambre
Appl. Sci. 2022, 12(24), 12615; https://doi.org/10.3390/app122412615 - 9 Dec 2022
Cited by 1
Abstract
In this paper, we propose a methodology to accurately evaluate and compare the performance of efficient neural network building blocks for computer vision in a hardware-aware manner. Our comparison uses Pareto fronts based on randomly sampled networks from a design space to capture the underlying accuracy/complexity trade-offs. We show that our approach matches the information obtained by previous comparison paradigms, but provides more insights into the relationship between hardware cost and accuracy. We use our methodology to analyze different building blocks and evaluate their performance on a range of embedded hardware platforms. This highlights the importance of benchmarking building blocks as a preselection step in the design process of a neural network. We show that choosing the right building block can speed up inference by up to a factor of two on specific hardware ML accelerators. Full article
(This article belongs to the Special Issue Hardware-Aware Deep Learning)
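The Pareto-front comparison this abstract describes can be sketched in a few lines: given randomly sampled networks represented as (hardware cost, accuracy) pairs, keep only the non-dominated ones. The function name and sample values below are illustrative assumptions, not taken from the paper.

```python
def pareto_front(points):
    """Keep the (cost, accuracy) points not dominated by any other point,
    i.e. no other network is both cheaper-or-equal and at least as accurate."""
    front = []
    for cost, acc in points:
        dominated = any(
            c <= cost and a >= acc and (c, a) != (cost, acc)
            for c, a in points
        )
        if not dominated:
            front.append((cost, acc))
    return sorted(front)

# Hypothetical sampled networks: (MACs in millions, top-1 accuracy in %).
samples = [(120, 71.0), (80, 70.5), (80, 68.0), (200, 72.1), (150, 71.5)]
print(pareto_front(samples))
# -> [(80, 70.5), (120, 71.0), (150, 71.5), (200, 72.1)]
```

Computing one such front per building-block family lets a designer read off, for any given hardware budget, which family offers the best accuracy.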

14 pages, 3879 KB  
Article
SRAM-Based CIM Architecture Design for Event Detection
by Muhammad Bintang Gemintang Sulaiman, Jin-Yu Lin, Jian-Bai Li, Cheng-Ming Shih, Kai-Cheung Juang and Chih-Cheng Lu
Sensors 2022, 22(20), 7854; https://doi.org/10.3390/s22207854 - 16 Oct 2022
Viewed by 3345
Abstract
Convolutional neural networks (CNNs) play a key role in deep learning applications. However, the high computational complexity and high energy consumption of CNNs hinder their application in hardware accelerators. Computing-in-memory (CIM) is the technique of running calculations entirely in memory (in our design, we use SRAM). CIM architectures have demonstrated great potential for efficiently computing large-scale matrix-vector multiplications. Our CIM-based architecture for event detection is designed to trigger the next stage of high-precision inference. To implement an SRAM-based CIM accelerator, a software-hardware co-design approach must account for the CIM macro's hardware limitations when mapping the weights onto AI edge devices. In this paper, we designed a hierarchical AI architecture to optimize the end-to-end system power in AIoT applications. In our experiments, a CIM-aware algorithm with 4-bit activations and 8-bit weights achieves 99.70% and 70.58% accuracy on the hand-gesture and CIFAR-10 datasets, respectively. We also developed a profiling tool to measure the efficiency of the proposed design. Operating at 100 MHz on these datasets, with a network of nine convolutional layers and one fully connected (FC) layer, the proposed system achieves a frame rate of 662 FPS, 37.6% processing-unit utilization, and a power consumption of 0.853 mW. Full article
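The 4-bit-activation/8-bit-weight scheme mentioned in the abstract can be illustrated with a minimal uniform symmetric quantizer; the helper and the example tensors below are assumptions for illustration, not the paper's actual implementation.

```python
def quantize(values, bits):
    """Map floats to signed integers in [-(2**(bits-1) - 1), 2**(bits-1) - 1]
    using a single per-tensor scale (uniform symmetric quantization)."""
    qmax = 2 ** (bits - 1) - 1
    scale = max(abs(v) for v in values) / qmax
    return [round(v / scale) for v in values], scale

# Illustrative weight and activation vectors.
weights, w_scale = quantize([0.42, -1.3, 0.07, 0.9], bits=8)   # 8-bit weights
acts, a_scale = quantize([0.1, 0.8, -0.5, 0.3], bits=4)        # 4-bit activations

# A CIM macro accumulates the integer products in memory; the integer
# result is rescaled back to the float domain with w_scale * a_scale.
acc = sum(w * a for w, a in zip(weights, acts))
print(weights, acts, acc * w_scale * a_scale)
```

The integer dot product approximates the float one (here roughly -0.76 versus -0.72 after rescaling), which is the accuracy/efficiency trade-off the CIM-aware training has to absorb.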

16 pages, 5375 KB  
Article
Computational Optimization of Image-Based Reinforcement Learning for Robotics
by Stefano Ferraro, Toon Van de Maele, Pietro Mazzaglia, Tim Verbelen and Bart Dhoedt
Sensors 2022, 22(19), 7382; https://doi.org/10.3390/s22197382 - 28 Sep 2022
Cited by 5 | Viewed by 2626
Abstract
The robotics field has been deeply influenced by the advent of deep learning. In recent years, this trend has been characterized by the adoption of large, pretrained models for robotic use cases, which are not compatible with the computational hardware available in robotic systems. Moreover, such large, computationally intensive models impede the low-latency execution required by many closed-loop control systems. In this work, we propose several strategies for improving the computational efficiency of the deep learning models adopted in reinforcement learning (RL) scenarios. As a use case, we consider an image-based RL method that exploits the synergy between push and grasp actions. As a first optimization step, we reduce the complexity of the model architecture by decreasing the number of layers and altering its structure. Second, we downscale the input resolution to reduce the computational load. Finally, we perform weight quantization, comparing post-training quantization and quantization-aware training. We benchmark the improvements introduced by each optimization with a standard testing routine. We show that the optimization strategies introduced can improve computational efficiency by around 300 times, while also slightly improving the functional performance of the system. In addition, we demonstrate closed-loop control behaviour on a real-world robot, while processing everything on a Jetson Xavier NX edge device. Full article
(This article belongs to the Special Issue Applications of Resource Efficient Machine Learning in Smart Sensors)
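One of the optimizations described above, downscaling the input resolution, pays off because the MAC count of a convolution scales with the spatial area it processes. The layer shapes below are illustrative, not the paper's actual network.

```python
def conv_macs(h, w, c_in, c_out, k):
    """Multiply-accumulate count for a k x k, stride-1, 'same'-padded
    convolution on an h x w input with c_in input and c_out output channels."""
    return h * w * c_in * c_out * k * k

full = conv_macs(224, 224, 3, 32, 3)    # full-resolution input
small = conv_macs(112, 112, 3, 32, 3)   # input downscaled by 2x per side
print(full / small)  # -> 4.0: halving each side quarters the compute
```

Combined with a shallower architecture and integer weights, per-layer savings of this kind compound into the roughly 300x overall efficiency gain the abstract reports.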
