Search Results (530)

Search Parameters:
Keywords = convolution neural network accelerator

24 pages, 3877 KiB  
Article
A Hybrid Approach for Sports Activity Recognition Using Key Body Descriptors and Hybrid Deep Learning Classifier
by Muhammad Tayyab, Sulaiman Abdullah Alateyah, Mohammed Alnusayri, Mohammed Alatiyyah, Dina Abdulaziz AlHammadi, Ahmad Jalal and Hui Liu
Sensors 2025, 25(2), 441; https://doi.org/10.3390/s25020441 - 13 Jan 2025
Viewed by 372
Abstract
This paper presents an approach for event recognition in sequential images using human body part features and their surrounding context. Key body points were approximated to track and monitor their presence in complex scenarios. Various feature descriptors, including MSER (Maximally Stable Extremal Regions), SURF (Speeded-Up Robust Features), distance transform, and DOF (Degrees of Freedom), were applied to skeleton points, while BRIEF (Binary Robust Independent Elementary Features), HOG (Histogram of Oriented Gradients), FAST (Features from Accelerated Segment Test), and Optical Flow were used on silhouettes or full-body points to capture both geometric and motion-based features. Feature fusion was employed to enhance the discriminative power of the extracted features and the physical parameters calculated by the different feature extraction techniques. The system utilized a hybrid CNN (Convolutional Neural Network) + RNN (Recurrent Neural Network) classifier for event recognition, with Grey Wolf Optimization (GWO) for feature selection. Experimental results showed high accuracy, achieving 98.5% on the UCF-101 dataset and 99.2% on the YouTube dataset. Compared to state-of-the-art methods, our approach achieved better performance in event recognition.
(This article belongs to the Section Intelligent Sensors)
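
For readers wanting to experiment with the hybrid classifier pattern this abstract describes, here is a minimal PyTorch sketch of a CNN + RNN video classifier; every layer size and name is an illustrative assumption, and the GWO feature-selection stage is omitted.

```python
# Minimal sketch of a hybrid CNN + RNN event classifier. All layer sizes and
# names are illustrative assumptions, not the authors' actual architecture.
import torch
import torch.nn as nn

class CNNRNNClassifier(nn.Module):
    def __init__(self, num_classes: int = 101):
        super().__init__()
        # Per-frame CNN feature extractor (applied to each frame independently).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),      # -> (batch*time, 32)
        )
        # RNN aggregates the per-frame features over time.
        self.rnn = nn.LSTM(input_size=32, hidden_size=64, batch_first=True)
        self.head = nn.Linear(64, num_classes)

    def forward(self, clips: torch.Tensor) -> torch.Tensor:
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (hidden, _) = self.rnn(feats)
        return self.head(hidden[-1])                    # (batch, num_classes)

logits = CNNRNNClassifier()(torch.randn(2, 8, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 101])
```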

14 pages, 3586 KiB  
Article
Improving Human Activity Recognition Through 1D-ResNet: A Wearable Wristband for 14 Workout Movements
by Sang-Un Kim and Joo-Yong Kim
Processes 2025, 13(1), 207; https://doi.org/10.3390/pr13010207 - 13 Jan 2025
Viewed by 401
Abstract
This study presents a 1D Residual Network (ResNet)-based algorithm for human activity recognition (HAR) focused on classifying 14 different workouts, which represent key exercises commonly performed in fitness training, using wearable inertial measurement unit (IMU) sensors. Unlike traditional 1D convolutional neural network (CNN) models, the proposed 1D ResNet incorporates residual blocks to prevent gradient vanishing and exploding problems, allowing for deeper networks with improved performance. The IMU sensor, placed on the wrist, provided Z-axis acceleration data, which were used to train the model. A total of 901 data samples were collected from five participants, with 600 used for training and 301 for testing. The model achieved a recognition accuracy of 97.09%, surpassing the 89.03% of a 1D CNN without residual blocks and the 92% of a cascaded 1D CNN from previous research. These results indicate that the 1D ResNet model is highly effective in recognizing a wide range of workouts. The findings suggest that wearable devices can autonomously classify human activities and provide personalized training recommendations, paving the way for AI-driven personal training systems.
(This article belongs to the Special Issue Smart Wearable Technology: Thermal Management and Energy Applications)
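
The residual-block idea the abstract credits for avoiding vanishing and exploding gradients can be sketched in a few lines of PyTorch; channel count and kernel size here are assumptions, not the paper's configuration.

```python
# Minimal sketch of the 1D residual block behind a 1D ResNet for IMU-based HAR.
import torch
import torch.nn as nn

class ResidualBlock1D(nn.Module):
    def __init__(self, channels: int = 32, kernel_size: int = 5):
        super().__init__()
        pad = kernel_size // 2
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm1d(channels), nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.BatchNorm1d(channels),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # The skip connection lets gradients bypass the convolutions,
        # mitigating vanishing/exploding gradients in deeper stacks.
        return torch.relu(self.body(x) + x)

x = torch.randn(4, 32, 128)          # (batch, channels, time) acceleration windows
print(ResidualBlock1D()(x).shape)    # torch.Size([4, 32, 128])
```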

21 pages, 5845 KiB  
Article
FPGA-QNN: Quantized Neural Network Hardware Acceleration on FPGAs
by Mustafa Tasci, Ayhan Istanbullu, Vedat Tumen and Selahattin Kosunalp
Appl. Sci. 2025, 15(2), 688; https://doi.org/10.3390/app15020688 - 12 Jan 2025
Viewed by 468
Abstract
Recently, convolutional neural networks (CNNs) have received a massive amount of interest due to their ability to achieve high accuracy in various artificial intelligence tasks. With the development of complex CNN models, a significant drawback is their high computational burden and memory requirements. The performance of a typical CNN model can be enhanced by the improvement of hardware accelerators. Practical implementations on field-programmable gate arrays (FPGAs) have the potential to reduce resource utilization while maintaining low power consumption. Nevertheless, complex CNN models may require more computational and memory capacity than many current FPGAs provide. An effective solution to this issue is to use quantized neural network (QNN) models to remove the burden of full-precision weights and activations. This article proposes an accelerator design framework for FPGAs, called FPGA-QNN, aimed at reducing the high computational burden and memory requirements of CNN implementations. To approach this goal, FPGA-QNN exploits QNN models by converting the high burden of full-precision weights and activations into integer operations. The FPGA-QNN framework comprises 12 accelerators based on multi-layer perceptron (MLP) and LeNet CNN models, each of which is associated with a specific combination of quantization and folding. Performance evaluations on a Xilinx PYNQ Z1 development board demonstrated the superiority of FPGA-QNN in terms of resource utilization and energy efficiency in comparison to several recent approaches. The proposed MLP model classified the FashionMNIST dataset at a speed of 953 kFPS with 1019 GOPs while consuming 2.05 W.
(This article belongs to the Special Issue Advancements in Deep Learning and Its Applications)
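
The core QNN step, replacing full-precision values with low-bit integers, can be illustrated with generic uniform quantization; this is not the FPGA-QNN framework's actual quantization and folding scheme.

```python
# Illustrative sketch of uniform symmetric quantization to low-bit integers,
# the generic idea behind QNN models (not FPGA-QNN's specific scheme).
import numpy as np

def quantize(x: np.ndarray, bits: int = 4):
    """Quantize a tensor to signed integers of the given bit width."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(x).max() / qmax          # one scale per tensor (assumption)
    q = np.clip(np.round(x / scale), -qmax - 1, qmax).astype(np.int8)
    return q, scale

w = np.random.randn(8, 8).astype(np.float32)
q, scale = quantize(w, bits=4)
# Dequantize to check the error introduced by the 4-bit representation.
print(np.abs(w - q.astype(np.float32) * scale).max())
```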

15 pages, 11124 KiB  
Article
Intraoperative Augmented Reality for Vitreoretinal Surgery Using Edge Computing
by Run Zhou Ye and Raymond Iezzi
J. Pers. Med. 2025, 15(1), 20; https://doi.org/10.3390/jpm15010020 - 6 Jan 2025
Viewed by 456
Abstract
Purpose: Augmented reality (AR) may allow vitreoretinal surgeons to leverage microscope-integrated digital imaging systems to analyze and highlight key retinal anatomic features in real time, possibly improving safety and precision during surgery. By employing convolutional neural networks (CNNs) for retina vessel segmentation, a retinal coordinate system can be created that allows pre-operative images of capillary non-perfusion or retinal breaks to be digitally aligned and overlaid upon the surgical field in real time. Such technology may be useful in assuring thorough laser treatment of capillary non-perfusion or in using pre-operative optical coherence tomography (OCT) to guide macular surgery when microscope-integrated OCT (MIOCT) is not available. Methods: This study is a retrospective analysis involving the development and testing of a novel image-registration algorithm for vitreoretinal surgery. Fifteen anonymized cases of pars plana vitrectomy with epiretinal membrane peeling, along with corresponding preoperative fundus photographs and optical coherence tomography (OCT) images, were retrospectively collected from the Mayo Clinic database. We developed a TPU (Tensor-Processing Unit)-accelerated CNN for semantic segmentation of retinal vessels from fundus photographs and subsequent real-time image registration in surgical video streams. An iterative patch-wise cross-correlation (IPCC) algorithm was developed for image registration, with a focus on optimizing processing speeds and maintaining high spatial accuracy. The primary outcomes measured were processing speed in frames per second (FPS) and the spatial accuracy of image registration, quantified by the Dice coefficient between registered and manually aligned images. Results: When deployed on an Edge TPU, the CNN model combined with our image-registration algorithm processed video streams at a rate of 14 FPS, which is superior to processing rates achieved on other standard hardware configurations. The IPCC algorithm efficiently aligned pre-operative and intraoperative images, showing high accuracy in comparison to manual registration. Conclusions: This study demonstrates the feasibility of using TPU-accelerated CNNs for enhanced AR in vitreoretinal surgery.
(This article belongs to the Section Methodology, Drug and Device Discovery)
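
The basic operation behind patch-wise cross-correlation registration, locating the shift that maximizes correlation, can be sketched with an FFT; this toy is translation-only and single-patch, not the authors' IPCC algorithm.

```python
# Illustrative FFT-based cross-correlation to estimate the offset between two
# patches; the per-patch, iterative IPCC machinery is not reproduced here.
import numpy as np

def estimate_shift(reference: np.ndarray, moving: np.ndarray):
    """Estimate the (dy, dx) offset of `moving` relative to `reference`
    via FFT-based cross-correlation of zero-mean patches."""
    ref = reference - reference.mean()
    mov = moving - moving.mean()
    corr = np.fft.ifft2(np.fft.fft2(mov) * np.conj(np.fft.fft2(ref))).real
    dy, dx = np.unravel_index(np.argmax(corr), corr.shape)
    # Wrap circular shifts larger than half the patch into negative offsets.
    if dy > ref.shape[0] // 2: dy -= ref.shape[0]
    if dx > ref.shape[1] // 2: dx -= ref.shape[1]
    return int(dy), int(dx)

patch = np.random.rand(128, 128)
shifted = np.roll(patch, (5, -9), axis=(0, 1))   # moving patch, offset by (5, -9)
print(estimate_shift(patch, shifted))            # (5, -9)
```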

13 pages, 1853 KiB  
Article
Optimizing Deep Learning Acceleration on FPGA for Real-Time and Resource-Efficient Image Classification
by Ahmad Mouri Zadeh Khaki and Ahyoung Choi
Appl. Sci. 2025, 15(1), 422; https://doi.org/10.3390/app15010422 - 5 Jan 2025
Viewed by 655
Abstract
Deep learning (DL) has revolutionized image classification, yet deploying convolutional neural networks (CNNs) on edge devices for real-time applications remains a significant challenge due to constraints in computation, memory, and power efficiency. This work presents an optimized implementation of VGG16 and VGG19, two widely used CNN architectures, for classifying the CIFAR-10 dataset using transfer learning on field-programmable gate arrays (FPGAs). Utilizing the Xilinx Vitis-AI and TensorFlow2 frameworks, we adapt VGG16 and VGG19 for FPGA deployment through quantization, compression, and hardware-specific optimizations. Our implementation achieves high classification accuracy, with Top-1 accuracy of 89.54% and 87.47% for VGG16 and VGG19, respectively, while delivering significant reductions in inference latency (7.29× and 6.6× compared to CPU-based alternatives). These results highlight the suitability of our approach for resource-efficient, real-time edge applications. Key contributions include a detailed methodology for combining transfer learning with FPGA acceleration, an analysis of hardware resource utilization, and performance benchmarks. This work underscores the potential of FPGA-based solutions to enable scalable, low-latency DL deployments in domains such as autonomous systems, IoT, and mobile devices.
(This article belongs to the Special Issue Research on Machine Learning in Computer Vision)
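
The transfer-learning setup, a pretrained VGG16 with its head swapped for CIFAR-10's ten classes, looks roughly like this; the paper uses TensorFlow2 and Vitis-AI, so this PyTorch sketch only illustrates the head-swap idea, not the deployment pipeline.

```python
# Minimal transfer-learning sketch: pretrained VGG16 with a new CIFAR-10 head.
import torch
import torch.nn as nn
from torchvision import models

model = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
for p in model.features.parameters():
    p.requires_grad = False                  # freeze the convolutional backbone
model.classifier[6] = nn.Linear(4096, 10)    # new 10-way CIFAR-10 head

# CIFAR-10 images upscaled to VGG's 224x224 input (an assumption here).
logits = model(torch.randn(1, 3, 224, 224))
print(logits.shape)                          # torch.Size([1, 10])
```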

18 pages, 4377 KiB  
Article
Deep Convolutional Framelets for Dose Reconstruction in Boron Neutron Capture Therapy with Compton Camera Detector
by Angelo Didonna, Dayron Ramos Lopez, Giuseppe Iaselli, Nicola Amoroso, Nicola Ferrara and Gabriella Maria Incoronata Pugliese
Cancers 2025, 17(1), 130; https://doi.org/10.3390/cancers17010130 - 3 Jan 2025
Viewed by 469
Abstract
Background: Boron neutron capture therapy (BNCT) is an innovative binary form of radiation therapy with high selectivity towards cancer tissue, based on the neutron capture reaction 10B(n,α)7Li: patients are exposed to neutron beams after administration of a boron compound that accumulates preferentially in cancer cells. The high linear energy transfer products of the ensuing reaction deposit their energy at the cell level, sparing normal tissue. Although progress in accelerator-based BNCT has led to renewed interest in this cancer treatment modality, in vivo dose monitoring during treatment remains infeasible, and several approaches are under investigation. While Compton imaging presents various advantages over other imaging methods, it typically requires long reconstruction times, comparable with BNCT treatment duration. Methods: This study develops deep neural network models that estimate the dose distribution from a simulated dataset of BNCT Compton camera images. The models avoid the iteration time associated with the maximum-likelihood expectation-maximization (MLEM) algorithm, enabling prompt dose reconstruction during treatment. The U-Net architecture and two variants based on the deep convolutional framelets framework were used for noise and artifact reduction in few-iteration reconstructed images. Results: This approach led to promising results in terms of reconstruction accuracy and processing time, with a reduction by a factor of about 6 with respect to classical iterative algorithms. Conclusions: This is a good reconstruction-time performance given typical BNCT treatment times. Further enhancements may be achieved by optimizing the reconstruction of input images with different deep learning techniques.
(This article belongs to the Section Methods and Technologies Development)
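
A toy encoder/decoder with a skip connection conveys the U-Net pattern used here for cleaning few-iteration reconstructions; depth and channel counts are assumptions, and the deep convolutional framelets variants are not reproduced.

```python
# Toy U-Net-style denoiser: encode, downsample, decode with a skip connection.
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(1, 16, 3, padding=1), nn.ReLU())
        self.down = nn.MaxPool2d(2)
        self.mid = nn.Sequential(nn.Conv2d(16, 32, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(32, 16, 2, stride=2)
        # Decoder sees upsampled features concatenated with the skip connection.
        self.dec = nn.Conv2d(32, 1, 3, padding=1)

    def forward(self, x):
        e = self.enc(x)                       # skip-connection features
        m = self.up(self.mid(self.down(e)))
        return self.dec(torch.cat([m, e], dim=1))

noisy = torch.randn(1, 1, 64, 64)             # few-iteration MLEM image (toy)
print(TinyUNet()(noisy).shape)                 # torch.Size([1, 1, 64, 64])
```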

23 pages, 6640 KiB  
Article
Research on Prediction of Multiple Degenerative Diseases and Biomarker Screening Based on DNA Methylation
by Ruoting Tian, Hao Zhang, Chencai Wang, Shengyang Zhou, Li Zhang and Han Wang
Int. J. Mol. Sci. 2025, 26(1), 313; https://doi.org/10.3390/ijms26010313 - 1 Jan 2025
Viewed by 599
Abstract
Aging leads to a gradual functional decline in the human body and a significantly increased risk of degenerative diseases. DNA methylation patterns change markedly with age, serving as a biomarker of biological age and closely linked to the occurrence and progression of age-related diseases. Currently, diagnostic methods for individual degenerative diseases are relatively mature. However, aging often accompanies the onset of multiple degenerative diseases, which existing single-disease diagnostic models handle poorly. Additionally, some identified DNA methylation biomarkers are typically applicable to only one or a few types of cancer or disease, further restricting their utility. We therefore screen for biomarkers associated with multiple degenerative diseases from the perspective of aging-related co-morbid mechanisms and perform multi-disease diagnosis. In this study, we investigated shared mechanisms across multiple degenerative diseases based on methylation correlations and patterns, identifying a set of biomarkers associated with them. Omics analysis narrowed the candidate set from 600 to 110 biomarkers, and we demonstrated the validity and predictive ability of the 110 retained biomarkers. We propose a disease diagnostic model based on a multi-scale one-dimensional convolutional neural network (MSDCNN) and a multi-class degenerative disease prediction model (ResDegNet). Both models were trained and tested to accurately diagnose disease and categorize four types of degenerative diseases. The research identified 110 biomarkers associated with degenerative diseases, providing a foundation for further exploration of age-related degenerative conditions and facilitating early diagnosis, biomarker identification, and the development of therapeutic targets for drug interventions.
(This article belongs to the Section Molecular Pathology, Diagnostics, and Therapeutics)
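
A multi-scale 1D CNN can be sketched as parallel Conv1d branches with different kernel sizes whose pooled outputs are concatenated; the abstract does not specify MSDCNN's architecture, so everything below is assumed.

```python
# Illustrative multi-scale 1D CNN over a methylation profile: three parallel
# branches with different receptive fields, concatenated before the classifier.
import torch
import torch.nn as nn

class MultiScale1DCNN(nn.Module):
    def __init__(self, n_classes: int = 4):
        super().__init__()
        # One branch per kernel size, i.e. per feature scale.
        self.branches = nn.ModuleList([
            nn.Sequential(nn.Conv1d(1, 8, k, padding=k // 2), nn.ReLU(),
                          nn.AdaptiveMaxPool1d(1))
            for k in (3, 7, 15)
        ])
        self.head = nn.Linear(8 * 3, n_classes)

    def forward(self, x):
        # x: (batch, 1, n_markers) methylation values at selected CpG sites
        feats = torch.cat([b(x).flatten(1) for b in self.branches], dim=1)
        return self.head(feats)

print(MultiScale1DCNN()(torch.randn(2, 1, 110)).shape)  # torch.Size([2, 4])
```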

16 pages, 4126 KiB  
Article
Deep Learning for Predicting Spheroid Viability: Novel Convolutional Neural Network Model for Automating Quality Control for Three-Dimensional Bioprinting
by Zyva A. Sheikh, Oliver Clarke, Amatullah Mir and Narutoshi Hibino
Bioengineering 2025, 12(1), 28; https://doi.org/10.3390/bioengineering12010028 - 1 Jan 2025
Viewed by 638
Abstract
Spheroids serve as the building blocks for three-dimensional (3D) bioprinted tissue patches. When larger than 500 μm, the desired size for 3D bioprinting, they tend to have a hypoxic core with necrotic cells. Therefore, it is critical to assess the viability of spheroids in order to ensure the successful fabrication of high-viability patches. However, current viability assays are time-consuming, labor-intensive, require specialized training, or are subject to human bias. In this study, we build a convolutional neural network (CNN) model to efficiently and accurately predict spheroid viability, using a phase-contrast image of a spheroid as its input. A comprehensive dataset of mouse mesenchymal stem cell (mMSC) spheroids of varying sizes with corresponding viability percentages, which was obtained through CCK-8 assays, was established and used to train and validate the model. The model was trained to automatically classify spheroids into one of four distinct categories based on their predicted viability: 0–20%, 20–40%, 40–70%, and 70–100%. The model achieved an average accuracy of 92%, with a consistent loss below 0.2. This deep-learning model offers a non-invasive, efficient, and accurate method to streamline the assessment of spheroid quality, thereby accelerating the development of bioengineered cardiac tissue patches for cardiovascular disease therapies.
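
The described classifier maps a phase-contrast image to one of four viability bins; a minimal sketch, with an assumed architecture, could look like this.

```python
# Minimal sketch of a CNN classifying a spheroid image into four viability
# bins (0-20%, 20-40%, 40-70%, 70-100%). Layer sizes are assumptions.
import torch
import torch.nn as nn

viability_cnn = nn.Sequential(
    nn.Conv2d(1, 16, 3, stride=2, padding=1), nn.ReLU(),
    nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, 4),                       # logits for the four viability bins
)

image = torch.randn(1, 1, 128, 128)         # grayscale phase-contrast image (toy)
print(viability_cnn(image).softmax(dim=1))  # predicted bin probabilities
```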

14 pages, 3865 KiB  
Article
SiC MOSFET with Integrated SBD Device Performance Prediction Method Based on Neural Network
by Xiping Niu, Ling Sang, Xiaoling Duan, Shijie Gu, Peng Zhao, Tao Zhu, Kaixuan Xu, Yawei He, Zheyang Li, Jincheng Zhang and Rui Jin
Micromachines 2025, 16(1), 55; https://doi.org/10.3390/mi16010055 - 31 Dec 2024
Viewed by 659
Abstract
The SiC MOSFET with an integrated SBD (SBD-MOSFET) exhibits excellent performance in power electronics. However, the static and dynamic characteristics of this device are influenced by a multitude of parameters, and traditional TCAD simulation methods are often complex. Motivated by the growing use of neural networks for device prediction in recent years, such as for GaN JBS diodes and FinFET devices, this paper applies neural networks to the performance prediction of SiC MOSFET devices with an integrated SBD. This study introduces a novel approach utilizing neural network machine learning to predict the static and dynamic characteristics of the SBD-MOSFET. SBD-MOSFET devices are modeled and simulated using Sentaurus TCAD (2017), generating 625 sets of device structure and sample data that serve as the sample set for the neural network. These input variables are then fed into the network for prediction. The findings indicate that the mean square error (MSE) values for the threshold voltage (Vth), breakdown voltage (BV), specific on-resistance (Ron), and total switching power dissipation (E) are 0.0051, 0.0031, 0.0065, and 0.0220, respectively, demonstrating a high degree of accuracy in the predicted values. In a comparison between the convolutional neural network and classical machine learning methods, the CNN achieved much higher accuracy. This method of predicting device performance via neural networks offers a rapid means of designing SBD-MOSFETs with specified performance targets, thereby presenting significant advantages in accelerating research on SBD-MOSFET performance prediction.
(This article belongs to the Special Issue Research Progress of Advanced SiC Semiconductors)
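
The surrogate-modeling idea, a network mapping device and structure parameters to Vth, BV, Ron, and E under an MSE loss, can be sketched as follows; the input dimensionality and layer sizes are assumptions, as the abstract does not specify the network.

```python
# Illustrative surrogate model: MLP from structure parameters to four
# (normalized) electrical characteristics, trained with the MSE loss the
# abstract reports. Shapes are assumptions.
import torch
import torch.nn as nn

surrogate = nn.Sequential(
    nn.Linear(6, 64), nn.ReLU(),    # 6 structure parameters (assumed)
    nn.Linear(64, 64), nn.ReLU(),
    nn.Linear(64, 4),               # -> [Vth, BV, Ron, E], normalized
)
loss_fn = nn.MSELoss()

params = torch.randn(625, 6)        # 625 TCAD samples, as in the paper
targets = torch.randn(625, 4)       # placeholder labels
loss = loss_fn(surrogate(params), targets)
loss.backward()                     # gradient for one training step
print(loss.item())
```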

18 pages, 3376 KiB  
Article
Heterogeneous Edge Computing for Molecular Property Prediction with Graph Convolutional Networks
by Mahdieh Grailoo and Jose Nunez-Yanez
Electronics 2025, 14(1), 101; https://doi.org/10.3390/electronics14010101 - 30 Dec 2024
Viewed by 471
Abstract
Graph-based neural networks have proven to be useful in molecular property prediction, a critical component of computer-aided drug discovery. In response to the growing demand for improved computational efficiency and localized edge processing, this paper introduces a novel approach that leverages specialized accelerators on a heterogeneous edge computing platform. Our focus is on graph convolutional networks, a leading graph-based neural network variant that integrates graph convolution layers with multi-layer perceptrons. Molecular graphs are typically characterized by a low number of nodes, leading to low-dimensional dense matrix multiplications within multi-layer perceptrons—conditions that are particularly well-suited for Edge TPUs. These TPUs feature a systolic array of multiply–accumulate units optimized for dense matrix operations. Furthermore, the inherent sparsity in molecular graph adjacency matrices offers additional opportunities for computational optimization. To capitalize on this, we developed an FPGA GFADES accelerator, using high-level synthesis, specifically tailored to efficiently manage the sparsity in both the graph structure and node features. Our hardware/software co-designed GCN+MLP architecture delivers performance improvements, achieving up to 58× increased speed compared to conventional software implementations. This architecture is implemented using the Pynq framework and TensorFlow Lite Runtime, running on a multi-core ARM CPU within an AMD/Xilinx Zynq Ultrascale+ device, in combination with the Edge TPU and programmable logic.
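
The computation pattern the accelerators split between FPGA and Edge TPU, a sparse adjacency aggregation followed by a dense feature transform, looks like this in PyTorch; graph size and feature widths are illustrative.

```python
# One graph convolution layer as sparse (adjacency) x dense (features)
# aggregation followed by a dense weight multiplication. Sizes are toy values.
import torch

n_atoms, f_in, f_out = 9, 16, 32                      # small molecular graph
indices = torch.tensor([[0, 1, 1, 2], [1, 0, 2, 1]])  # bonded atom pairs
adj = torch.sparse_coo_tensor(indices, torch.ones(4), (n_atoms, n_atoms))

features = torch.randn(n_atoms, f_in)                 # dense node features
weight = torch.randn(f_in, f_out)

# Aggregate neighbors (sparse, suits the FPGA path), then transform
# (low-dimensional dense matmul, suits the Edge TPU's systolic array).
hidden = torch.relu(torch.sparse.mm(adj, features) @ weight)
print(hidden.shape)                                   # torch.Size([9, 32])
```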

31 pages, 3152 KiB  
Article
Research on Spaceborne Neural Network Accelerator and Its Fault Tolerance Design
by Yingzhao Shao, Junyi Wang, Xiaodong Han, Yunsong Li, Yaolin Li and Zhanpeng Tao
Remote Sens. 2025, 17(1), 69; https://doi.org/10.3390/rs17010069 - 28 Dec 2024
Viewed by 378
Abstract
To meet the high-reliability requirements of real-time on-orbit tasks, this paper proposes a fault-tolerant reinforcement design method for spaceborne intelligent processing algorithms based on convolutional neural networks (CNNs). This method is built on a CNN accelerator using Field-Programmable Gate Array (FPGA) technology, analyzing the impact of Single-Event Upsets (SEUs) on neural network computation. The accelerator design integrates data validation, Triple Modular Redundancy (TMR), and other techniques, optimizing a partial fault-tolerant architecture based on SEU sensitivity. This fault-tolerant architecture analyzes the hardware accelerator, parameter storage, and actual computation, employing data validation to reinforce model parameters and spatial and temporal TMR to reinforce accelerator computations. Using the ResNet18 model, fault tolerance performance tests were conducted by simulating SEUs. Compared to the prototype network, this fault-tolerant design method increases tolerance to SEU error accumulation by a factor of five while increasing resource consumption by less than 15%, making it more suitable for spaceborne on-orbit applications than traditional fault-tolerant design approaches.
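
The TMR principle, running a computation three times and majority-voting so that one SEU-corrupted copy is outvoted, is easy to convey in software; the paper applies it spatially and temporally in hardware, so this is only the voting idea.

```python
# Toy illustration of Triple Modular Redundancy: bitwise majority voting over
# three redundant results masks a single corrupted copy.
import numpy as np

def tmr_vote(a: np.ndarray, b: np.ndarray, c: np.ndarray) -> np.ndarray:
    """Bitwise majority of three redundant integer results."""
    return (a & b) | (a & c) | (b & c)

x = np.array([0b1010, 0b1100], dtype=np.uint8)
corrupted = x.copy()
corrupted[0] ^= 0b0100                     # simulate a single-event upset
print(tmr_vote(x, x, corrupted))           # [10 12], the upset is masked
```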

42 pages, 7308 KiB  
Article
Vertical Force Monitoring of Racing Tires: A Novel Deep Neural Network-Based Estimation Method
by Semih Öngir, Egemen Cumhur Kaleli, Mehmet Zeki Konyar and Hüseyin Metin Ertunç
Appl. Sci. 2025, 15(1), 123; https://doi.org/10.3390/app15010123 - 27 Dec 2024
Viewed by 457
Abstract
This study aims to accurately estimate vertical tire forces on racing tires of specific stiffness using acceleration, pressure, and speed data measurements from a test rig. A hybrid model, termed Random Forest Assisted Deep Neural Network (RFADNN), is introduced, combining a novel deep learning framework with the Random Forest Algorithm to enhance estimation accuracy. By leveraging the Temporal Convolutional Network (TCN), Minimal Gated Unit (MGU), Long Short-Term Memory (LSTM), and Attention mechanisms, the deep learning framework excels in extracting complex features, which the Random Forest Model subsequently analyzes to improve the accuracy of estimating vertical tire forces. Validated with test data, this approach outperforms standard models, achieving an MAE of 0.773 kgf, demonstrating the advantage of the RFADNN method in vertical force estimation tasks for racing tires. This comparison emphasizes the significant benefits of incorporating advanced deep learning with traditional machine learning to provide a comprehensive and interpretable solution for complex estimation challenges in automotive engineering.
(This article belongs to the Section Computing and Artificial Intelligence)
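
The hybrid pattern, deep features handed to a Random Forest regressor, can be sketched with a stand-in extractor; the actual TCN/MGU/LSTM/Attention framework and all shapes below are assumptions.

```python
# Sketch of the neural-features -> Random Forest pattern. A simple MLP stands
# in for the paper's TCN/MGU/LSTM/Attention feature extractor.
import numpy as np
import torch
import torch.nn as nn
from sklearn.ensemble import RandomForestRegressor

extractor = nn.Sequential(nn.Flatten(), nn.Linear(3 * 100, 32), nn.ReLU())

windows = torch.randn(500, 3, 100)    # acceleration/pressure/speed windows (toy)
forces = np.random.rand(500)          # vertical force labels in kgf (toy)

with torch.no_grad():                 # deep features for the forest
    feats = extractor(windows).numpy()

rf = RandomForestRegressor(n_estimators=100).fit(feats, forces)
print(rf.predict(feats[:3]))          # force estimates for three windows
```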

29 pages, 1433 KiB  
Article
Sparse Convolution FPGA Accelerator Based on Multi-Bank Hash Selection
by Jia Xu, Han Pu and Dong Wang
Micromachines 2025, 16(1), 22; https://doi.org/10.3390/mi16010022 - 27 Dec 2024
Viewed by 501
Abstract
Reconfigurable processor-based acceleration of deep convolutional neural network (DCNN) algorithms has emerged as a widely adopted technique, with particular attention on sparse neural network acceleration as an active research area. However, many computing devices that claim high computational power still struggle to execute neural network algorithms with optimal efficiency, low latency, and minimal power consumption. Consequently, there remains significant potential for further exploration into improving the efficiency, latency, and power consumption of neural network accelerators across diverse computational scenarios. This paper investigates three key techniques for hardware acceleration of sparse neural networks. The main contributions are as follows: (1) Most neural network inference tasks are typically executed on general-purpose computing devices, which often fail to deliver high energy efficiency and are not well-suited for accelerating sparse convolutional models. In this work, we propose a specialized computational circuit for the convolutional operations of sparse neural networks. This circuit is designed to detect and eliminate the computational effort associated with zero values in the sparse convolutional kernels, thereby enhancing energy efficiency. (2) The data access patterns in convolutional neural networks introduce significant pressure on the high-latency off-chip memory access process. Due to issues such as data discontinuity, the data reading unit often fails to fully exploit the available bandwidth during off-chip read and write operations. In this paper, we analyze bandwidth utilization in the context of convolutional accelerator data handling and propose a strategy to improve off-chip access efficiency. Specifically, we leverage a compiler optimization plugin developed for Vitis HLS, which automatically identifies and optimizes on-chip bandwidth utilization. (3) In coefficient-based accelerators, the synchronous operation of individual computational units can significantly hinder efficiency. Previous approaches have achieved asynchronous convolution by designing separate memory units for each computational unit; however, this method consumes a substantial amount of on-chip memory resources. To address this issue, we propose a shared feature map cache design for asynchronous convolution in the accelerators presented in this paper. This design resolves address access conflicts when multiple computational units concurrently access a set of caches by utilizing a hash-based address indexing algorithm. Moreover, the shared cache architecture reduces data redundancy and conserves on-chip resources. Using the optimized accelerator, we successfully executed ResNet50 inference on an Intel Arria 10 1150GX FPGA, achieving a throughput of 497 GOPS, or an equivalent computational power of 1579 GOPS, with a power consumption of only 22 watts.
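
A software analogue of the zero-skipping circuit in contribution (1) iterates only over nonzero kernel coefficients, so zero weights cost no multiply-accumulates; the dedicated detection logic of the real accelerator is of course not modeled here.

```python
# Toy zero-skipping convolution: only nonzero kernel coefficients contribute,
# mirroring (in software) the skipped multiply-accumulates of the circuit.
import numpy as np

def sparse_conv2d_valid(x: np.ndarray, k: np.ndarray) -> np.ndarray:
    kh, kw = k.shape
    out = np.zeros((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i, j in zip(*np.nonzero(k)):          # skip zero coefficients entirely
        out += k[i, j] * x[i:i + out.shape[0], j:j + out.shape[1]]
    return out

x = np.random.rand(8, 8)
k = np.array([[0.0, 1.0, 0.0],
              [0.5, 0.0, 0.0],
              [0.0, 0.0, -1.0]])              # only 3 of 9 weights are nonzero
print(sparse_conv2d_valid(x, k).shape)        # (6, 6)
```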

25 pages, 12414 KiB  
Article
Investigation into the Prediction of Ship Heave Motion in Complex Sea Conditions Utilizing Hybrid Neural Networks
by Yuchen Liu, Xide Cheng, Kunyu Han, Zhechun Liu and Baiwei Feng
J. Mar. Sci. Eng. 2025, 13(1), 1; https://doi.org/10.3390/jmse13010001 - 24 Dec 2024
Viewed by 388
Abstract
While navigating at sea, ships are influenced by various factors, including wind, waves, and currents, which can result in heave motion that significantly impacts operations and potentially leads to accidents. Accurate forecasting of ship heaving is essential to guarantee the safety of maritime navigation. Consequently, this paper proposes a hybrid neural network method that combines Convolutional Neural Networks (CNNs), Bidirectional Long Short-Term Memory Networks (BiLSTMs), and an Attention Mechanism to predict the heaving motion of ships in moderate to complex sea conditions. The data feature extraction ability of CNNs, the temporal analysis capabilities of BiLSTMs, and the dynamic adjustment function of Attention on feature weights were comprehensively utilized to predict a ship's heave motion. Simulations of a standard container ship's motion time series under complex sea state conditions were carried out. The model training and validation results indicate that, under sea conditions 4, 5, and 6, the CNN-BiLSTM-Attention method demonstrated significant improvements in MAPE, APE, and RMSE compared to the traditional LSTM, Attention, and LSTM-Attention methods, enhancing prediction accuracy. Heave displacement, pitch displacement, pitch velocity, pitch acceleration, and incoming wave height were chosen as key input features. Sensitivity analysis was conducted to optimize the prediction performance of the CNN-BiLSTM-Attention hybrid method, resulting in a significant further improvement in MAPE. The research presented in this paper establishes a foundation for future studies on ship motion prediction.
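
The CNN-BiLSTM-Attention pipeline can be sketched as Conv1d feature extraction, a bidirectional LSTM, and a learned softmax weighting over timesteps; all dimensions below are assumptions.

```python
# Minimal CNN-BiLSTM-Attention sketch for sequence-to-one heave prediction.
import torch
import torch.nn as nn

class CNNBiLSTMAttention(nn.Module):
    def __init__(self, n_features: int = 5):
        super().__init__()
        self.cnn = nn.Conv1d(n_features, 16, kernel_size=3, padding=1)
        self.bilstm = nn.LSTM(16, 32, batch_first=True, bidirectional=True)
        self.attn = nn.Linear(64, 1)     # scores each timestep
        self.head = nn.Linear(64, 1)     # next-step heave displacement

    def forward(self, x):
        # x: (batch, time, features), e.g. heave/pitch/wave-height channels
        h = torch.relu(self.cnn(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.bilstm(h)                          # (batch, time, 64)
        w = torch.softmax(self.attn(h), dim=1)         # attention over time
        return self.head((w * h).sum(dim=1))           # weighted context

print(CNNBiLSTMAttention()(torch.randn(2, 50, 5)).shape)  # torch.Size([2, 1])
```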

18 pages, 2514 KiB  
Article
Research on Prediction and Optimization of Airport Express Passenger Flow Based on Fusion Intelligence Network Model
by Jin He, Yinzhen Li and Yuhong Chao
Appl. Sci. 2024, 14(24), 11886; https://doi.org/10.3390/app142411886 - 19 Dec 2024
Viewed by 411
Abstract
The purpose of this paper is to optimize the accuracy of airport express passenger flow prediction so as to meet the need for the optimal allocation of traffic resources against the background of accelerated urbanization and the rapid development of airport express services. A fusion intelligence network model (FINM) is proposed, which integrates the advantages of convolutional neural networks, bidirectional long short-term memory networks, and gated recurrent units. Firstly, by using the powerful feature extraction ability of convolutional neural networks, local features and detail information are captured from the input data to improve the data representation ability. Secondly, bidirectional long short-term memory networks are used to process the sequence data, capture the global information and its context relationship, and enhance the model's understanding of the dependence of time series data. Finally, gated recurrent units are introduced to simplify the computational complexity while maintaining high prediction accuracy and training efficiency. Based on the actual passenger flow data for Tianjin Metro Line 2 on a 30 min time scale, the proposed FINM is verified. The experimental results show that the model achieves an excellent performance: a loss value of 0.0160, mean absolute error (MAE) of 0.0947, mean squared error (MSE) of 0.0160, root mean square error (RMSE) of 0.1255, mean absolute percentage error (MAPE) of 18.40, and coefficient of determination (R²) of 0.7788. Compared with the baseline algorithms, this model shows significant advantages on all indicators, proving its effectiveness in dealing with complex time series data.
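
The FINM stacking order described, convolutional feature extraction, then a BiLSTM, then a GRU stage before the output head, can be sketched compactly; hidden sizes and the input window length are assumptions.

```python
# Compact sketch of a CNN -> BiLSTM -> GRU fusion model for flow prediction.
import torch
import torch.nn as nn

class FINMSketch(nn.Module):
    def __init__(self):
        super().__init__()
        self.cnn = nn.Conv1d(1, 16, kernel_size=3, padding=1)  # local features
        self.bilstm = nn.LSTM(16, 32, batch_first=True, bidirectional=True)
        self.gru = nn.GRU(64, 32, batch_first=True)            # lighter stage
        self.head = nn.Linear(32, 1)       # next 30-min passenger flow

    def forward(self, x):                   # x: (batch, time, 1) flow history
        h = torch.relu(self.cnn(x.transpose(1, 2))).transpose(1, 2)
        h, _ = self.bilstm(h)
        _, last = self.gru(h)
        return self.head(last[-1])

print(FINMSketch()(torch.randn(4, 48, 1)).shape)  # torch.Size([4, 1])
```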
