A Review of Recent Hardware and Software Advances in GPU-Accelerated Edge-Computing Single-Board Computers (SBCs) for Computer Vision
Abstract
1. Introduction
2. Review Taxonomy
3. Advances in GPU-Accelerated Hardware
3.1. ASUS Tinker Boards
3.2. NVIDIA Jetson Boards
3.3. Libre Boards
3.4. Other Boards
3.5. Comparative Analysis
3.6. Mapping of Fundamental Computer Vision (CV) Tasks to SBCs
3.6.1. Entry-Level Computer Vision (CV) Task
3.6.2. Moderate-Performance Computer Vision (CV) Task
3.6.3. High-Performance Computer Vision (CV) Task
3.6.4. Very-High-Performance Computer Vision (CV) Task
4. Software Advances
4.1. Computer Vision (CV) Algorithm Optimization Techniques
4.1.1. Model Quantization
- Post-Training Quantization: This technique converts pre-trained models to low-precision representations, typically 8-bit fixed point, without retraining. Notable advancements include TensorFlow’s post-training quantization, which provides a straightforward path for converting models to 8-bit precision, substantially reducing memory usage and accelerating inference with little loss in accuracy [78] (see the sketch after this list).
- Quantization-Aware Training: This technique integrates quantization constraints during the model training process, enabling the creation of models that can be directly quantized without significant accuracy loss. Noteworthy advancements in quantization-aware training include the development of techniques in popular deep learning frameworks such as TensorFlow, PyTorch, and ONNX, allowing for the seamless integration of quantization during the training phase [78].
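To make the post-training path concrete, the following minimal sketch converts a toy Keras model with TensorFlow’s post-training quantization; the architecture, random calibration data, and file names are illustrative assumptions rather than details from the cited works.

```python
import numpy as np
import tensorflow as tf

# Toy model standing in for a trained CV network.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.Conv2D(8, 3, activation="relu"),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(10),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enable post-training quantization

# A representative dataset lets the converter calibrate activation ranges
# for full 8-bit integer quantization; random tensors are stand-ins here.
def representative_data_gen():
    for _ in range(100):
        yield [np.random.rand(1, 224, 224, 3).astype(np.float32)]

converter.representative_dataset = representative_data_gen
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS_INT8]

tflite_model = converter.convert()
with open("model_int8.tflite", "wb") as f:
    f.write(tflite_model)
```

Weight-only quantization needs no calibration data; full integer quantization, as above, is what integer-only accelerators such as Edge TPUs typically require.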
4.1.2. Model Pruning
- Magnitude-Based Pruning: This approach removes small-magnitude weights or connections from the network, often based on a predefined threshold. Advancements in this technique have led to iterative pruning methods, which repeatedly prune the least significant weights and fine-tune the remaining network to maintain performance [81] (a minimal sketch follows this list).
- Filter Pruning: Filter pruning techniques aim to remove entire filters or channels within convolutional layers that contribute minimally to the network’s overall output. Network slimming and ThiNet are examples of filter pruning techniques that have demonstrated significant reductions in model size while preserving model accuracy [82].
- Structured Pruning: Structured pruning techniques target specific structures within the network, including entire neurons or layers, for removal. By leveraging the structured patterns present in neural networks, these techniques enable more efficient and systematic model compression. Recent advancements in structured pruning have focused on preserving model performance through techniques such as gradual pruning and retraining, ensuring that the pruned networks retain their original functionality [80].
- Channel Pruning: Channel pruning techniques specifically target individual channels within convolutional layers based on their importance to the network’s output. These methods identify redundant or less significant channels and selectively prune them to reduce computational overhead while maintaining model accuracy. Recent developments in channel pruning have emphasized the integration of sparsity regularization and fine-grained pruning techniques to achieve better trade-offs between model size and performance.
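As a minimal sketch of the magnitude-based pruning described above (the toy model and the 30% pruning ratio are illustrative assumptions), PyTorch’s built-in utilities can zero out the smallest-magnitude weights; in iterative pruning, this step would alternate with fine-tuning:

```python
import torch
import torch.nn as nn
import torch.nn.utils.prune as prune

# Small CNN standing in for a model to be compressed.
model = nn.Sequential(
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(16, 10),
)

# Zero out the 30% of weights with the smallest L1 magnitude in each layer.
for module in model.modules():
    if isinstance(module, (nn.Conv2d, nn.Linear)):
        prune.l1_unstructured(module, name="weight", amount=0.3)
        prune.remove(module, "weight")  # fold the pruning mask into the weights

# Report the resulting global sparsity.
total = sum(p.numel() for p in model.parameters())
zeros = sum((p == 0).sum().item() for p in model.parameters())
print(f"global sparsity: {zeros / total:.2%}")
```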
4.1.3. Knowledge Distillation
- Hinton’s Knowledge Distillation: Proposed by Hinton et al. in 2015, this pioneering technique trains the student model to match the softened probabilities generated by the teacher model. It provided a foundational framework for subsequent developments in knowledge distillation, emphasizing the transfer of rich knowledge representations from complex models to compact ones (the distillation loss is sketched after this list).
- Born-Again Networks: Introduced in 2018, this approach trains a student with the same architecture as its teacher on the teacher’s soft targets; the resulting “born-again” student often outperforms the original teacher, and the procedure can be iterated for more effective knowledge transfer.
- Attention Mechanism-Based Distillation: These approaches incorporate attention mechanisms to guide the student model in focusing on crucial details provided by the teacher model. Attention-based distillation techniques facilitate the transfer of intricate knowledge representations, enabling the student model to capture important patterns and nuances present in the data [86].
- Multi-Stage Distillation: Multi-stage distillation techniques refine the knowledge transfer process iteratively, allowing the student model to learn from multiple stages of the teacher’s learning process. By progressively transferring knowledge across different stages, the student model can capture a more comprehensive understanding of the underlying data distribution, resulting in improved performance and robustness [87].
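The following sketch implements the Hinton-style distillation loss referenced at the start of this list; the temperature, weighting factor, and random logits are illustrative assumptions:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Hinton-style distillation: match softened teacher probabilities
    plus the usual cross-entropy on ground-truth labels."""
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)  # T^2 factor keeps gradient scales comparable, per the original paper
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard

# Toy usage with random logits for a 10-class task.
student_logits = torch.randn(8, 10, requires_grad=True)
teacher_logits = torch.randn(8, 10)
labels = torch.randint(0, 10, (8,))
loss = distillation_loss(student_logits, teacher_logits, labels)
loss.backward()
```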
4.1.4. Hardware-Aware Optimization
- Neural Network Design for Specific Hardware: These approaches involve designing neural network architectures tailored to the specific hardware characteristics of edge devices. By customizing the network structure to exploit hardware features such as parallel processing capabilities and memory hierarchies, these techniques optimize the overall performance and energy efficiency of models deployed on edge devices [89] (one such design motif is sketched after this list).
- Compiler Optimizations for Edge Devices: Compiler optimization techniques have been developed to transform high-level deep learning model representations into efficient executable code optimized for specific edge-computing platforms. These optimizations include code transformations and scheduling techniques that leverage the underlying hardware architecture to improve the performance and energy efficiency of deployed models [90].
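As one example of tailoring a network to edge hardware, MobileNet-style architectures replace a standard k × k convolution with a depthwise plus pointwise pair, cutting multiply-accumulate counts by roughly a factor of k² on constrained devices. The block below is a minimal PyTorch sketch of that design motif, not an implementation from the cited works:

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise convolution followed by a 1x1 pointwise convolution."""
    def __init__(self, in_ch, out_ch, k=3, stride=1):
        super().__init__()
        self.depthwise = nn.Conv2d(in_ch, in_ch, k, stride=stride,
                                   padding=k // 2, groups=in_ch, bias=False)
        self.pointwise = nn.Conv2d(in_ch, out_ch, 1, bias=False)
        self.bn1 = nn.BatchNorm2d(in_ch)
        self.bn2 = nn.BatchNorm2d(out_ch)
        self.act = nn.ReLU6(inplace=True)  # bounded activation quantizes well to int8

    def forward(self, x):
        x = self.act(self.bn1(self.depthwise(x)))
        return self.act(self.bn2(self.pointwise(x)))

x = torch.randn(1, 32, 56, 56)
print(DepthwiseSeparableConv(32, 64)(x).shape)  # torch.Size([1, 64, 56, 56])
```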
4.1.5. Federated Learning
- Efficient Communication Protocols: These approaches aim to minimize communication overhead and optimize the exchange of model updates and gradients across distributed edge devices. By implementing efficient communication protocols, such as adaptive quantization and compression techniques, federated learning systems can achieve faster convergence and lower communication costs while maintaining data privacy and security (a minimal update-quantization sketch follows this list).
- Enhanced Security Protocols: Advanced federated learning systems integrate robust security protocols, including Secure Multiparty Computation (SMC) and homomorphic encryption, to protect sensitive data and model parameters during collaborative model training. These protocols ensure that privacy-sensitive information remains secure and encrypted throughout the entire learning process, enabling decentralized edge devices to participate in collaborative learning without compromising data privacy.
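A simple instance of the communication-efficient exchange in the first item of this list is uniform 8-bit quantization of model updates before transmission, reducing payloads roughly fourfold relative to 32-bit floats. The NumPy sketch below is illustrative only; practical systems typically layer error feedback and secure aggregation on top:

```python
import numpy as np

def quantize_update(update, num_bits=8):
    """Uniformly quantize a model-update tensor to num_bits integers,
    returning codes plus the scale needed to dequantize server-side."""
    max_abs = float(np.abs(update).max()) or 1.0
    levels = 2 ** (num_bits - 1) - 1
    scale = max_abs / levels
    codes = np.round(update / scale).astype(np.int8)
    return codes, scale

def dequantize_update(codes, scale):
    return codes.astype(np.float32) * scale

# A client compresses its local update before transmission; the server
# reconstructs an approximation and aggregates it with other clients'.
update = np.random.randn(1000).astype(np.float32) * 0.01
codes, scale = quantize_update(update)
recovered = dequantize_update(codes, scale)
print(f"compression: 4x, max error: {np.abs(update - recovered).max():.2e}")
```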
4.1.6. Model Compression
- Parameter Regularization Techniques: These techniques impose constraints on model parameters during training to prevent overfitting and reduce model complexity. Methods such as L1 and L2 regularization encourage sparsity in the weight matrices, leading to more compact models with improved generalization and a reduced memory footprint [92] (see the sketch after this list).
- Dynamic Network Surgery: Dynamic network surgery techniques dynamically adjust the network architecture during the training process, allowing the model to grow or shrink based on the task requirements. By adaptively adding or removing network components, dynamic network surgery enables the creation of highly efficient and task-specific models tailored for edge-computing applications [93].
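As a brief illustration of the L1 penalty mentioned above (the model, data, and regularization strength are placeholders), the penalty is simply added to the task loss at each training step, driving many weights towards zero over training:

```python
import torch
import torch.nn as nn

model = nn.Linear(128, 10)  # stand-in for a larger network
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()
l1_lambda = 1e-4  # assumed regularization strength

x = torch.randn(32, 128)
y = torch.randint(0, 10, (32,))

# Task loss plus an L1 penalty on all parameters; the penalty rewards
# sparse weights, yielding a more compressible model.
loss = criterion(model(x), y)
l1_penalty = sum(p.abs().sum() for p in model.parameters())
loss = loss + l1_lambda * l1_penalty
loss.backward()
optimizer.step()
```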
4.2. Computer Vision (CV) Packages and Libraries
- TensorFlow is a widely adopted end-to-end open-source platform introduced by the Google Brain team for CV and neural network development, offering a comprehensive and flexible ecosystem of tools, libraries, and community resources [94]. TensorFlow provides stable Python and C++ APIs, alongside APIs for other programming languages that are not guaranteed to be backward compatible.
- PyTorch, initially developed by Meta AI, has emerged as one of the leading open-source ML libraries and serves as a versatile tool for CV development [95]. Offering a polished Python interface and a secondary C++ interface, PyTorch caters to diverse user preferences and development requirements. PyTorch supports ONNX for seamless model conversion between frameworks. PyTorch’s Tensor class facilitates efficient storage and manipulation of multi-dimensional arrays, offering seamless integration with CUDA-capable GPUs and ongoing support for diverse GPU platforms.
- OpenCV is an open-source CV library that serves as a foundational framework for a wide array of CV applications, offering accessibility and adaptability through its Apache 2 licensing [96]. Featuring over 2500 optimized algorithms, OpenCV enables users to tackle diverse CV tasks (e.g., face detection, optical tracking, and object detection); a minimal detection sketch follows this list. OpenCV is one of the most widely adopted CV packages, with a user community of around 47 thousand. It supports multiple programming languages, including C++, Python, Java, and MATLAB, as well as major operating systems such as Windows, Linux, Android, and macOS, ensuring broad accessibility and integration. Ongoing development efforts are directed towards enhancing GPU acceleration through CUDA and OpenCL interfaces, keeping OpenCV at the forefront of CV research and application development.
- Caffe, one of the earliest deep learning frameworks, was developed by Berkeley AI Research (BAIR) with a focus on expression, speed, and modularity [97]. Caffe operates under the BSD 2-Clause license, fostering an open and collaborative development environment. It offers seamless switching between CPU and GPU for training and deployment, providing versatility across computing environments, from high-performance GPU machines to commodity clusters and mobile devices. At its launch, the Caffe project had over 1000 contributors enhancing its capabilities; however, it could not keep pace once libraries such as PyTorch and TensorFlow took over. Although Caffe is no longer actively maintained, NVIDIA still supports running it on recent generations of GPUs through its cuDNN-accelerated Caffe build. Caffe remains popular in the community, while Caffe2 has been adopted by researchers as its successor.
- Scikit-image, developed by Stéfan van der Walt and formerly known as scikits.image, is a Python-based open-source library dedicated to image-processing tasks. It features a comprehensive suite of algorithms covering segmentation, geometric transformations, colour space manipulation, and feature detection [98]. Designed to seamlessly integrate with Python’s numerical and scientific libraries, such as NumPy and SciPy, scikit-image offers a robust ecosystem for image analysis and manipulation. Leveraging a predominantly Python-based architecture, scikit-image optimizes performance by implementing core algorithms in Cython, striking a balance between ease of use and computational efficiency.
- SimpleCV is an accessible framework for open-source CV development that leverages the capabilities of OpenCV and the simplicity of the Python programming language [99]. SimpleCV caters to both novice and experienced programmers, providing a comprehensive platform for basic CV functions as well as an elegant programming interface for advanced users. With features for the easy extraction, sorting, and filtering of image information, plus fast manipulations with intuitive naming conventions, SimpleCV streamlines the development of CV applications. Moreover, it abstracts away the complexities of underlying CV libraries, such as OpenCV, allowing users to focus on application development without delving into technical details like bit depths, file formats, or linear algebra concepts. In essence, SimpleCV lets users harness the power of CV without unnecessary barriers, making the field more accessible and approachable.
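As a minimal illustration of the OpenCV workflow noted above, the sketch below runs the Haar-cascade face detector bundled with the opencv-python package; the image paths are placeholders:

```python
import cv2

# Haar-cascade face detector shipped with opencv-python.
cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

image = cv2.imread("test.jpg")  # placeholder path to any local test image
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

# Draw a rectangle around each detected face and save the result.
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces.jpg", image)
```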
Comparative Analysis
4.3. Computer Vision (CV) Development Frameworks
- Detectron2, developed by Facebook AI Research (FAIR) in 2019 [100], is a widely adopted open-source framework in the research community, offering a range of detection and segmentation model variants. It is built on PyTorch and known for its modularity, flexibility, and performance. The Detectron2 model zoo includes a wide array of recent object detection and instance segmentation models, including variants of Faster R-CNN and Mask R-CNN. Furthermore, it uses GPU acceleration with mixed-precision training to achieve higher inference speeds. In addition, it simplifies the deployment of models to production by offering standard training workflows and model conversion capabilities.
- The NVIDIA TAO Toolkit, unveiled in 2020, is a leading solution for CV and AI applications, particularly suited to edge computing and embedded systems [101]. Developed to streamline AI model training and deployment, TAO abstracts the intricacies of deep learning frameworks like TensorFlow and PyTorch, offering a low-code approach to model customization and optimization. Leveraging pre-trained vision AI models from the NVIDIA GPU Cloud (NGC), users can fine-tune and customize models effortlessly, culminating in trained models deployable across a spectrum of platforms, from GPUs to CPUs and MCUs. Key features of the TAO Toolkit include its AutoML capability, which streamlines model training by automating hyperparameter tuning, and its support for model pruning and quantization-aware training, optimizing model size and inference performance. Additionally, TAO facilitates seamless deployment on various devices through its support for ONNX export and multi-GPU/multi-node training, ensuring scalability and efficiency.
- OpenMMLab, introduced in October 2018, is a comprehensive and modular open-source platform for the development of deep learning-driven CV applications [102]. Built upon the PyTorch framework, OpenMMLab leverages MMEngine to provide a universal training and evaluation engine, alongside MMCV, which offers essential neural network operators and data transforms. With over 30 vision libraries, 300 implemented algorithms, and a repository containing over 2000 pre-trained models, OpenMMLab continues to drive innovation and empower developers with the tools needed to tackle complex CV tasks effectively.
- The Ultralytics platform, released in 2019, has rapidly gained attention for its YOLO series of models in object detection and instance segmentation research [103]. Ultralytics’ open-source projects on GitHub provide state-of-the-art solutions for a diverse range of AI tasks, spanning detection, segmentation, classification, tracking, and pose estimation. With an open and inclusive approach, Ultralytics actively seeks feedback, feature requests, and bug reports from its user base, ensuring continuous improvement and innovation. Leveraging the power of PyTorch and a range of compatible models, Ultralytics offers features such as mixed-precision training, real-time model evaluation, and visualization, facilitating a seamless transition from model creation to practical deployment (see the sketch below).
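As a minimal sketch of the Ultralytics workflow (the checkpoint name and image path are placeholders; the pre-trained weights download on first use), a YOLOv8 detector can be loaded, run, and exported in a few lines:

```python
from ultralytics import YOLO

# Load a small pre-trained detection model and run it on a test image.
model = YOLO("yolov8n.pt")
results = model("test.jpg")

# Print class name, confidence, and box coordinates for each detection.
for r in results:
    for box in r.boxes:
        cls_id = int(box.cls)
        conf = float(box.conf)
        print(model.names[cls_id], f"{conf:.2f}", box.xyxy.tolist())

# Export to ONNX for deployment on edge runtimes.
model.export(format="onnx")
```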
Comparative Analysis
4.4. Computer Vision (CV) Packages for Hardware Deployment
- PyTorch Mobile is a leading solution for deploying ML models on low-power mobile and edge-computing devices, designed for compactness and performance in CV tasks [104]. It is part of the PyTorch ecosystem, allowing a smooth transition from model training to deployment. Highlighted features include privacy-preserving federated learning capability, cross-platform support, TorchScript support, and integration with optimization techniques.
- OpenVINO, short for Open Visual Inference and Neural network Optimization, is an open-source toolkit developed by Intel for optimizing and deploying AI inference on a variety of devices, particularly embedded systems and edge-computing devices [105]. By leveraging models trained on popular frameworks like TensorFlow and PyTorch, OpenVINO enables users to optimize the model inference on low-power and resource-constrained SBCs. The core components of the OpenVINO toolkit encompass the OpenVINO Model Converter (OVC), OpenVINO Runtime, and a versatile set of plugins catering to CPUs, GPUs, and heterogeneous computing environments. Additionally, the toolkit provides a suite of samples and frontends, facilitating model conversion, inference, and transformation tasks with ease.
- ONNX is an open-source initiative providing a unified format for the seamless interoperability of models across development frameworks [106]. ONNX enables AI developers to transcend framework limitations and exchange models between platforms such as TensorFlow, PyTorch, and Caffe. It facilitates access to hardware accelerators through compatible runtimes and libraries, optimizing performance across a spectrum of hardware configurations (an export-and-inference sketch follows this list).
- TensorRT, introduced by NVIDIA, is a high-performance deep learning inference engine designed for the efficient deployment of models on edge devices and low-power systems [107]. Compatible with popular frameworks like TensorFlow and PyTorch, TensorRT offers a suite of optimization techniques aimed at minimizing memory usage and computational overhead. Leveraging the NVIDIA CUDA parallel programming model, TensorRT enables developers to enhance inference performance through quantization, layer fusion, kernel tuning, and other optimization methods on NVIDIA GPUs. Furthermore, it supports INT8 quantization-aware training, post-training quantization, and FP16 optimizations.
- TensorFlow Lite is a specialized version of TensorFlow explicitly designed for deployment on edge-computing and embedded devices [108]. By optimizing for on-device operations, TensorFlow Lite addresses crucial constraints such as latency, privacy, connectivity, size, and power consumption, making it ideal for edge-computing scenarios where real-time processing and data privacy are significant. With hardware acceleration and model optimization techniques, TensorFlow Lite delivers high-performance inference, ensuring the efficient execution of ML models on resource-constrained devices.
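To make the deployment path concrete across these tools, the sketch below exports a pre-trained torchvision classifier to ONNX and runs it with ONNX Runtime; the model choice, file name, random input, and CPU execution provider are illustrative assumptions:

```python
import numpy as np
import onnxruntime as ort
import torch
import torchvision

# Export a pre-trained classifier to the framework-neutral ONNX format.
model = torchvision.models.mobilenet_v2(weights="DEFAULT").eval()
dummy = torch.randn(1, 3, 224, 224)
torch.onnx.export(model, dummy, "mobilenet_v2.onnx",
                  input_names=["input"], output_names=["logits"])

# Run the exported model with ONNX Runtime, which can delegate to
# hardware execution providers (e.g., CUDA or TensorRT) when available.
session = ort.InferenceSession("mobilenet_v2.onnx",
                               providers=["CPUExecutionProvider"])
x = np.random.rand(1, 3, 224, 224).astype(np.float32)
logits = session.run(None, {"input": x})[0]
print("predicted class:", logits.argmax())
```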
Comparative Analysis
5. Challenges, Limitations, and Opportunities
5.1. Current Limitations and Challenges
- Limited Computational Power: SBCs have restricted computational capabilities and necessitate highly optimized models that balance computational complexity and performance to maintain real-time processing. Techniques such as model pruning, quantization, and knowledge distillation are essential to reduce a model’s computational footprint while preserving accuracy. Additionally, optimizing the computational graph and leveraging specialized instruction sets at the CUDA level can further enhance inference performance.
- Memory and Storage Constraints: SBCs often feature limited RAM and non-volatile storage, which restricts the size and complexity of deployable models. This limitation requires the deployment of compact neural network architectures (e.g., MobileNets or SqueezeNet) and efficient memory management strategies to fit models within the available memory without significantly sacrificing performance. Memory mapping techniques and in-memory computation strategies are critical for maximizing the usage of available resources.
- Real-Time Processing: Real-time video analytics on SBCs demands efficient processing pipelines capable of handling high-resolution video feeds and complex CV tasks with minimal latency. Techniques like pipeline parallelism, edge-computing strategies, and the use of lightweight neural networks (e.g., YOLO-lite or Tiny YOLO) are crucial for achieving the desired throughput and response times. Implementing hardware-level model optimization and leveraging asynchronous processing can significantly reduce processing times and enhance real-time capabilities.
- Maintenance and Management: Deploying and managing CV models on distributed edge devices involve complex challenges related to monitoring, updates, and maintenance. This necessitates the implementation of Over-The-Air (OTA) updates, remote diagnostics, and automated monitoring systems to ensure continuous operation and adaptability to evolving requirements. Utilizing containerization (e.g., Docker or Kubernetes) and employing Continuous Integration/Continuous Deployment (CI/CD) pipelines can streamline the deployment and maintenance processes.
- Heterogeneity in Edge Devices: The diversity of edge devices, each with distinct hardware capabilities, operating systems, and software ecosystems, presents significant compatibility challenges. Developing universally compatible CV models requires cross-platform development frameworks and hardware abstraction layers to ensure consistent performance across different devices. Implementing device-specific optimizations and leveraging cloud-based orchestration can further enhance compatibility and performance.
- Heat Management in SBCs: SBCs often rely on passive cooling mechanisms, which may be insufficient for systems running 24/7. Continuous operation under high computational loads can lead to overheating, resulting in thermal throttling or system failures. Even active cooling solutions may not provide adequate heat dissipation, necessitating the design of efficient thermal management strategies, such as heat sinks, heat pipes, and advanced cooling systems, to ensure reliable long-term operation.
- Power Consumption: SBCs designed for remote applications must contend with high power consumption, which can make them unsuitable for scenarios where energy efficiency is critical. Optimizing power usage through efficient hardware design, low-power components, and dynamic power-management techniques is essential to extend operational uptime and reduce overall energy costs. The integration of energy-efficient processors and peripherals, coupled with software optimizations for minimal power consumption during idle and active states, is crucial for enhancing the practicality of SBCs in remote deployments.
5.2. Future Opportunities
- Advancements in Edge-Computing Hardware: The development of next-generation embedded hardware, featuring more powerful processors, enhanced memory, and efficient energy utilization, will facilitate the deployment of more sophisticated CV models. Innovations in GPU architectures and dedicated AI accelerators will further enhance the performance of SBCs.
- Optimized Deep Learning Algorithms: The development of lightweight yet efficient CV models specifically for mobile deployment is an active area of research. In addition, advancements in model optimization and quantization techniques will enable high-performance inference on SBCs.
- Integration of AI Accelerators: Embedding dedicated AI accelerators, such as Google’s Edge TPU, Intel’s Movidius, and NVIDIA’s Tensor cores, into edge devices will dramatically enhance inference speed and energy efficiency, enabling the real-time processing of more complex CV tasks directly on the edge. These accelerators provide specialized hardware designed to accelerate common deep learning operations (e.g., matrix multiplications and convolutions) and support parallel processing, significantly boosting performance. Elevated performance has already been observed on the latest NVIDIA Jetson boards with dedicated Tensor cores.
- Offline Functionality: Enhancements in edge-computing capabilities will allow CV systems to perform critical tasks offline, ensuring continuous operation even in the absence of stable network connectivity. This is particularly advantageous for applications in remote areas or environments with unreliable internet access, where uninterrupted real-time processing is essential. Developing robust fallback mechanisms and local data-storage solutions will support seamless offline functionality.
- Intelligent Power Management: Power management in SBCs stands out as a critical challenge that requires immediate attention to extend operational times effectively. Future research directions are increasingly focusing on AI-oriented approaches to power management and optimization at the device level. These approaches aim to leverage AI techniques such as ML and reinforcement learning to dynamically adjust power consumption based on workload demands and environmental conditions. Optimizing at the device level involves developing energy-efficient hardware designs and implementing intelligent power-saving algorithms.
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
Abbreviations
CV | Computer Vision
GPU | Graphical Processing Unit
CNN | Convolutional Neural Network
SBC | Single-Board Computer
CUDA | Compute Unified Device Architecture
LiDAR | Light Detection and Ranging
AI | Artificial Intelligence
ARM | Advanced RISC Machine
CPU | Central Processing Unit
GPIO | General-Purpose Input/Output
DSI | Display Serial Interface
MIPI | Mobile Industry Processor Interface
CSI | Camera Serial Interface
HDMI | High-Definition Multimedia Interface
USB | Universal Serial Bus
RAM | Random Access Memory
eMMC | Embedded Multi-Media Card
microSD | Micro Secure Digital
ML | Machine Learning
TPU | Tensor Processing Unit
LPDDR | Low-Power Double Data Rate
I/O | Input/Output
LAN | Local Area Network
IoT | Internet of Things
UART | Universal Asynchronous Receiver/Transmitter
PCIe | Peripheral Component Interconnect Express
SO-DIMM | Small Outline Dual Inline Memory Module
EVE | Embedded Vision Engine
NPU | Neural Processing Unit
ONNX | Open Neural Network Exchange
SMC | Secure Multiparty Computation
Caffe | Convolutional Architecture for Fast Feature Embedding
BAIR | Berkeley AI Research
FAIR | Facebook AI Research
NGC | NVIDIA GPU Cloud
OVC | OpenVINO Model Converter
OTA | Over The Air
CI/CD | Continuous Integration/Continuous Deployment
References
- Lürig, M.D.; Donoughe, S.; Svensson, E.I.; Porto, A.; Tsuboi, M. Computer vision, machine learning, and the promise of phenomics in ecology and evolutionary biology. Front. Ecol. Evol. 2021, 9, 642774. [Google Scholar] [CrossRef]
- Zhu, L.; Spachos, P.; Pensini, E.; Plataniotis, K.N. Deep learning and machine vision for food processing: A survey. Curr. Res. Food Sci. 2021, 4, 233–249. [Google Scholar] [CrossRef]
- Iqbal, U.; Perez, P.; Li, W.; Barthelemy, J. How computer vision can facilitate flood management: A systematic review. Int. J. Disaster Risk Reduct. 2021, 53, 102030. [Google Scholar] [CrossRef]
- Akbari, Y.; Almaadeed, N.; Al-Maadeed, S.; Elharrouss, O. Applications, databases and open computer vision research from drone videos and images: A survey. Artif. Intell. Rev. 2021, 54, 3887–3938. [Google Scholar] [CrossRef]
- Paletta, Q.; Terrén-Serrano, G.; Nie, Y.; Li, B.; Bieker, J.; Zhang, W.; Dubus, L.; Dev, S.; Feng, C. Advances in solar forecasting: Computer vision with deep learning. Adv. Appl. Energy 2023, 11, 100150. [Google Scholar] [CrossRef]
- Gunawardena, N.; Ginige, J.A.; Javadi, B. Eye-tracking technologies in mobile devices Using edge computing: A systematic review. ACM Comput. Surv. 2022, 55, 1–33. [Google Scholar] [CrossRef]
- Barthélemy, J.; Verstaevel, N.; Forehead, H.; Perez, P. Edge-computing video analytics for real-time traffic monitoring in a smart city. Sensors 2019, 19, 2048. [Google Scholar] [CrossRef] [PubMed]
- Iqbal, U.; Barthelemy, J.; Perez, P.; Davies, T. Edge-computing video analytics solution for automated plastic-bag contamination detection: A case from remondis. Sensors 2022, 22, 7821. [Google Scholar] [CrossRef]
- Papini, M.; Iqbal, U.; Barthelemy, J.; Ritz, C. The role of deep learning models in the detection of anti-social behaviours towards women in public transport from surveillance videos: A scoping review. Safety 2023, 9, 91. [Google Scholar] [CrossRef]
- Iqbal, U.; Bin Riaz, M.Z.; Barthelemy, J.; Perez, P. Artificial Intelligence of Things (AIoT)-oriented framework for blockage assessment at cross-drainage hydraulic structures. Australas. J. Water Resour. 2023, 1–11. [Google Scholar] [CrossRef]
- Feng, X.; Jiang, Y.; Yang, X.; Du, M.; Li, X. Computer vision algorithms and hardware implementations: A survey. Integration 2019, 69, 309–320. [Google Scholar] [CrossRef]
- Plastiras, G.; Terzi, M.; Kyrkou, C.; Theocharides, T. Edge intelligence: Challenges and opportunities of near-sensor machine learning applications. In Proceedings of the 2018 IEEE 29th International Conference on Application-Specific Systems, Architectures and Processors (ASAP), Milano, Italy, 10–12 July 2018; IEEE: Piscataway, NJ, USA, 2018; pp. 1–7. [Google Scholar]
- Yu, W.; Liang, F.; He, X.; Hatcher, W.G.; Lu, C.; Lin, J.; Yang, X. A survey on the edge computing for the Internet of Things. IEEE Access 2017, 6, 6900–6919. [Google Scholar] [CrossRef]
- Himeur, Y.; Sayed, A.; Alsalemi, A.; Bensaali, F.; Amira, A. Edge AI for Internet of Energy: Challenges and perspectives. Internet Things 2023, 25, 101035. [Google Scholar] [CrossRef]
- Nastic, S.; Rausch, T.; Scekic, O.; Dustdar, S.; Gusev, M.; Koteska, B.; Kostoska, M.; Jakimovski, B.; Ristov, S.; Prodan, R. A serverless real-time data analytics platform for edge computing. IEEE Internet Comput. 2017, 21, 64–71. [Google Scholar] [CrossRef]
- Wang, F.; Zhang, M.; Wang, X.; Ma, X.; Liu, J. Deep learning for edge computing applications: A state-of-the-art survey. IEEE Access 2020, 8, 58322–58336. [Google Scholar] [CrossRef]
- Pandey, M.; Fernandez, M.; Gentile, F.; Isayev, O.; Tropsha, A.; Stern, A.C.; Cherkasov, A. The transformational role of GPU computing and deep learning in drug discovery. Nat. Mach. Intell. 2022, 4, 211–221. [Google Scholar] [CrossRef]
- Dally, W.J.; Keckler, S.W.; Kirk, D.B. Evolution of the graphics processing unit (GPU). IEEE Micro 2021, 41, 42–51. [Google Scholar] [CrossRef]
- Gill, S.S.; Wu, H.; Patros, P.; Ottaviani, C.; Arora, P.; Pujol, V.C.; Haunschild, D.; Parlikad, A.K.; Cetinkaya, O.; Lutfiyya, H.; et al. Modern computing: Vision and challenges. Telemat. Inform. Rep. 2024, 13, 100116. [Google Scholar] [CrossRef]
- Varghese, B.; Wang, N.; Bermbach, D.; Hong, C.H.; Lara, E.D.; Shi, W.; Stewart, C. A survey on edge performance benchmarking. Acm Comput. Surv. (CSUR) 2021, 54, 1–33. [Google Scholar] [CrossRef]
- Afif, M.; Said, Y.; Atri, M. Computer vision algorithms acceleration using graphic processors NVIDIA CUDA. Clust. Comput. 2020, 23, 3335–3347. [Google Scholar] [CrossRef]
- Scorptec. ASUS Tinker Board S Revision 2.0. Available online: https://www.scorptec.com.au/product/motherboards/development-kits/100884-tinker-board-s-r2.0-a-2g-16g?gad_source=1&gclid=CjwKCAjww_iwBhApEiwAuG6ccFwXvZeXVq9ruR1A6GPsy1Mnyl_Y0fGq80Y7QUnYKUYRhi-sVY8-FBoCuf4QAvD_BwE (accessed on 22 February 2024).
- DigiKey. ASUS Tinker Edge T. Available online: https://www.digikey.com/en/products/detail/asus/TINKER-EDGE-T/14005964 (accessed on 15 February 2024).
- RSOnline. ASUS Tinker Board 2. Available online: https://uk.rs-online.com/web/p/single-board-computers/2657193 (accessed on 16 February 2024).
- Rutronik. ASUS Tinker Board 3N. Available online: https://www.rutronik24.com/product/asus/tinker+board+3n/21508431.html (accessed on 22 February 2024).
- Taşpınar, Y.S.; Selek, M. Object recognition with hybrid deep learning methods and testing on embedded systems. Int. J. Intell. Syst. Appl. Eng. 2020, 8, 71–77. [Google Scholar] [CrossRef]
- Chen, W.Y.; Wu, F.; Hu, C.C. Application of OpenCV in Asus Tinker Board for face recognition. In Proceedings of the Second International Workshop on Pattern Recognition, Singapore, 1–3 May 2017; SPIE: Bellingham, WA, USA, 2017; Volume 10443, pp. 87–91. [Google Scholar]
- Jahan, N.; Rupa, F.Y.; Sarkar, S.; Hossain, S.; Kabir, S.S. Performance Analysis of ASUS Tinker and MobileNetV2 in Face Mask Detection on Different Datasets. In Proceedings of the International Conference on Machine Intelligence and Emerging Technologies, Noakhali, Bangladesh, 23–25 September 2022; Springer: Berlin/Heidelberg, Germany, 2022; pp. 283–293. [Google Scholar]
- Tran, L.D.; Tran, M.T. Enhancing Edge-Based Mango Pest Classification Through Model Optimization. In Proceedings of the International Conference on Computing and Communication Technologies (RIVF), Hanoi, Vietnam, 23–25 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 266–271. [Google Scholar]
- Scorptec. NVIDIA Jetson Nano. Available online: https://www.scorptec.com.au/product/motherboards/development-kits/89586-900-13448-0020-000 (accessed on 28 February 2024).
- LittleBird. NVIDIA Jetson TX2. Available online: https://littlebirdelectronics.com.au/products/nvidia-jetson-tx2-developer-kit (accessed on 3 March 2024).
- JW-Electronics. NVIDIA Jetson Xavier NX. Available online: https://www.jw.com.au/product/nvidia-jetson-xavier-nx-developer-kit-development-board (accessed on 2 March 2024).
- LittleBird. NVIDIA Jetson ORIN AGX. Available online: https://littlebirdelectronics.com.au/products/nvidia-jetson-agx-orin-64gb-developer-kit (accessed on 4 March 2024).
- LittleBird. NVIDIA Jetson ORIN Nano. Available online: https://littlebirdelectronics.com.au/products/nvidia-jetson-orin-nano-8gb-developer-kit (accessed on 4 March 2024).
- Chen, C.; Wang, W. Jetson Nano-Based Subway Station Area Crossing Detection. In Proceedings of the International Conference on Artificial Intelligence in China, Wuhan, China, 26–28 July 2024; Springer: Berlin/Heidelberg, Germany, 2023; pp. 627–635. [Google Scholar]
- Sarvajcz, K.; Ari, L.; Menyhart, J. AI on the Road: NVIDIA Jetson Nano-Powered Computer Vision-Based System for Real-Time Pedestrian and Priority Sign Detection. Appl. Sci. 2024, 14, 1440. [Google Scholar] [CrossRef]
- Wang, Y.; Zou, R.; Chen, Y.; Gao, Z. Research on Pedestrian Detection Based on Jetson Xavier NX Platform and YOLOv4. In Proceedings of the 4th International Symposium on Computer Engineering and Intelligent Communications (ISCEIC), Nanjing, China, 18–20 August 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 373–377. [Google Scholar]
- Baller, S.P.; Jindal, A.; Chadha, M.; Gerndt, M. DeepEdgeBench: Benchmarking deep neural networks on edge devices. In Proceedings of the International Conference on Cloud Engineering (IC2E), San Francisco, CA, USA, 4–8 October 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 20–30. [Google Scholar]
- Tolmacheva, A.; Ogurtsov, D.; Dorrer, M. Justification for choosing a single-board hardware computing platform for a neural network performing image processing. IOP Conf. Ser. Mater. Sci. Eng. 2020, 734, 012130. [Google Scholar] [CrossRef]
- Valencia, C.A.A.; Suliva, R.S.S.; Villaverde, J.F. Hardware Performance Evaluation of Different Computing Devices on YOLOv5 Ship Detection Model. In Proceedings of the 14th International Conference on Humanoid, Nanotechnology, Information Technology, Communication and Control, Environment, and Management (HNICEM), Boracay Island, Philippines, 1–4 December 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–5. [Google Scholar]
- Hakim, A.A.; Juanara, E.; Rispandi, R. Mask Detection System with Computer Vision-Based on CNN and YOLO Method Using Nvidia Jetson Nano. J. Inf. Syst. Explor. Res. 2023, 1, 109–122. [Google Scholar] [CrossRef]
- Süzen, A.A.; Duman, B.; Şen, B. Benchmark analysis of jetson tx2, jetson nano and raspberry pi using deep-cnn. In Proceedings of the International Congress on Human-Computer Interaction, Optimization and Robotic Applications (HORA), Ankara, Turkey, 1–13 June 2021; IEEE: Piscataway, NJ, USA, 2020; pp. 1–5. [Google Scholar]
- Zhu, J.; Feng, H.; Zhong, S.; Yuan, T. Performance analysis of real-time object detection on Jetson device. In Proceedings of the 22nd International Conference on Computer and Information Science (ICIS), Zhuhai, China, 28 June 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 156–161. [Google Scholar]
- Ma, M.Y.; Shen, S.E.; Huang, Y.C. Enhancing UAV Visual Landing Recognition with YOLO’s Object Detection by Onboard Edge Computing. Sensors 2023, 23, 8999. [Google Scholar] [CrossRef] [PubMed]
- Berardini, D.; Galdelli, A.; Mancini, A.; Zingaretti, P. Benchmarking of dual-step neural networks for detection of dangerous weapons on edge devices. In Proceedings of the 18th International Conference on Mechatronic and Embedded Systems and Applications (MESA), Taipei, Taiwan, 28–30 November 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6. [Google Scholar]
- Beegam, K.S.; Shenoy, M.V.; Chaturvedi, N. Hybrid consensus and recovery block-based detection of ripe coffee cherry bunches using RGB-D sensor. IEEE Sens. J. 2021, 22, 732–740. [Google Scholar] [CrossRef]
- Chen, Y.; Wang, C.; Lou, S. Edge artificial intelligence camera network: An efficient object detection and tracking framework. J. Electron. Imaging 2022, 31, 033030. [Google Scholar] [CrossRef]
- Xun, D.T.W.; Lim, Y.L.; Srigrarom, S. Drone detection using YOLOv3 with transfer learning on NVIDIA Jetson TX2. In Proceedings of the Second International Symposium on Instrumentation, Control, Artificial Intelligence, and Robotics (ICA-SYMP), Bangkok, Thailand, 20–22 January 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 1–6. [Google Scholar]
- Nguyen, H.H.; Tran, D.N.N.; Jeon, J.W. Towards real-time vehicle detection on edge devices with nvidia jetson tx2. In Proceedings of the International Conference on Consumer Electronics-Asia (ICCE-Asia), Busan, Republic of Korea, 23–25 October 2023; IEEE: Piscataway, NJ, USA, 2020; pp. 1–4. [Google Scholar]
- Afifi, M.; Ali, Y.; Amer, K.; Shaker, M.; ElHelw, M. Robust real-time pedestrian detection in aerial imagery on jetson tx2. arXiv 2019, arXiv:1905.06653. [Google Scholar]
- Byzkrovnyi, O.; Smelyakov, K.; Chupryna, A.; Lanovyy, O. Comparison of Object Detection Algorithms for the Task of Person Detection on Jetson TX2 NX Platform. In Proceedings of the Open Conference of Electrical, Electronic and Information Sciences (eStream), Vilnius, Lithuania, 25 April 2024; IEEE: Piscataway, NJ, USA, 2024; pp. 1–6. [Google Scholar]
- Sa, I.; Chen, Z.; Popović, M.; Khanna, R.; Liebisch, F.; Nieto, J.; Siegwart, R. WeedNet: Dense semantic weed classification using multispectral images and MAV for smart farming. IEEE Robot. Autom. Lett. 2017, 3, 588–595. [Google Scholar] [CrossRef]
- Chen, C.J.; Huang, Y.Y.; Li, Y.S.; Chen, Y.C.; Chang, C.Y.; Huang, Y.M. Identification of fruit tree pests with deep learning on embedded drone to achieve accurate pesticide spraying. IEEE Access 2021, 9, 21986–21997. [Google Scholar] [CrossRef]
- Kumar, P.; Batchu, S.; Kota, S.R. Real-time concrete damage detection using deep learning for high rise structures. IEEE Access 2021, 9, 112312–112331. [Google Scholar] [CrossRef]
- Aishwarya, N.; Kumar, V. Banana Ripeness Classification with Deep CNN on NVIDIA Jetson Xavier AGX. In Proceedings of the 7th International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud)(I-SMAC), Kirtipur, Nepal, 11–13 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 663–668. [Google Scholar]
- Aljaafreh, A.; Abadleh, A.; Alja’Afreh, S.S.; Alawasa, K.; Almajali, E.; Faris, H. Edge deep learning and computer vision-based physical distance and face mask detection system using Jetson Xavior NX. Emerg. Sci. J. 2022, 7, 70–80. [Google Scholar] [CrossRef]
- Shin, D.J.; Kim, J.J. A deep learning framework performance evaluation to use YOLO in Nvidia Jetson platform. Appl. Sci. 2022, 12, 3734. [Google Scholar] [CrossRef]
- Wasule, S.; Khadatkar, G.; Pendke, V.; Rane, P. Xavier Vision: Pioneering Autonomous Vehicle Perception with YOLO v8 on Jetson Xavier NX. In Proceedings of the Pune Section International Conference (PuneCon), Pune, India, 15–17 December 2022; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6. [Google Scholar]
- Chen, Y.C.; Fathoni, H.; Yang, C.T. Implementation of fire and smoke detection using deepstream and edge computing approachs. In Proceedings of the International Conference on Pervasive Artificial Intelligence (ICPAI), Taipei, Taiwan, 3–5 December 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 272–275. [Google Scholar]
- Dao, T.T.; Pham, Q.V.; Hwang, W.J. FastMDE: A fast CNN architecture for monocular depth estimation at high resolution. IEEE Access 2022, 10, 16111–16122. [Google Scholar] [CrossRef]
- Zahid, A.; Majeed, Y.; Ojo, M.O. Standalone Edge Ai-Based Solution for Tomato Diseases Detection. Available online: https://ouci.dntb.gov.ua/en/works/4YEJxPq4/ (accessed on 14 March 2024).
- Avila, R.; Kitani, E.; Zampirolli, F.D.A.; Yoshioka, L.; Celiberto, L.A.; Ibusuki, U. Comparisons of Neural Networks Using Computer Vision for Agricultural Automation. In Proceedings of the 15th International Conference on Industry Applications (INDUSCON), Sao Bernardo do Campo, Brazil, 22–24 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 466–470. [Google Scholar]
- Bhattacharjee, A.; Patel, B.; Taylor, A.J.; Rivera, J.A. Object detection for infrared ground to ground applications on the edge. In Proceedings of the Automatic Target Recognition XXXIV, National Harbor, MD, USA, 7 June 2024; SPIE: Bellingham, WA, USA, 2024; Volume 13039, pp. 172–179. [Google Scholar]
- Belhaoua, A.; Kimpe, T.; Crul, S. TensorRT-based surgical instrument detection assessment for deep learning on edge computing. In Proceedings of the Medical Imaging 2024: Image-Guided Procedures, Robotic Interventions, and Modeling, San Diego, CA, USA, 19–23 February 2024; SPIE: Bellingham, WA, USA, 2024; Volume 12928, pp. 368–371. [Google Scholar]
- Carvalho, D.R.; Lompado, A.; Consolo, R.; Bhattacharjee, A.; Brown, J.P. Real-time object detection and tracking using flash LiDAR imagery. In Proceedings of the Automatic Target Recognition XXXIV, National Harbor, MD, USA, 7 June 2024; SPIE: Bellingham, WA, USA, 2024; Volume 13039, pp. 45–57. [Google Scholar]
- LoveRPI. Libre Tritium. Available online: https://www.loverpi.com/products/libre-computer-board-all-h3-cc (accessed on 21 February 2024).
- LoveRPI. Libre Le Potato. Available online: https://www.loverpi.com/products/libre-computer-board-aml-s905x-cc (accessed on 22 February 2024).
- LoveRPI. Libre Renegade. Available online: https://www.loverpi.com/products/libre-computer-board-roc-rk3328-cc (accessed on 21 February 2024).
- AllNet. StarFive VisionFive 2. Available online: https://shop.allnetchina.cn/products/starfive-visionfive-2-quad-core-risc-v-dev-board (accessed on 24 February 2024).
- Amazon. ROCK PI N10. Available online: https://www.amazon.com.au/Designed-Solutions-Rockchip-RK3399pro-Acrylic/dp/B0BGJ14391 (accessed on 25 February 2024).
- RSOnline. BeagleBone AI. Available online: https://au.rs-online.com/web/p/single-board-computers/2397123 (accessed on 25 February 2024).
- Jacob. Enabling Computer Vision: Object Detection with StarFive VisionFive 2 using GPU Acceleration. 2023. Available online: https://forum.youyeetoo.com/t/enabling-computer-vision-object-detection-with-starfive-visionfive-2-using-gpu-acceleration/296 (accessed on 25 February 2024).
- Bogacz, J.; Qouneh, A. Convolution Neural Network on BeagleBone Black Wireless for Machine Learning Applications. In Proceedings of the MIT Undergraduate Research Technology Conference (URTC), Cambridge, MA, USA, 30 September–2 October 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–4. [Google Scholar]
- Civit-Masot, J.; Luna-Perejón, F.; Corral, J.M.R.; Domínguez-Morales, M.; Morgado-Estévez, A.; Civit, A. A study on the use of Edge TPUs for eye fundus image segmentation. Eng. Appl. Artif. Intell. 2021, 104, 104384. [Google Scholar] [CrossRef]
- Petersson, M.; Mohammedi, Y.M. Real-time Counting of People in Public Spaces. Bachelor’s Thesis, Linnaeus University, Växjö, Sweden, 2022. [Google Scholar]
- Swaminathan, T.P.; Silver, C.; Akilan, T. Benchmarking Deep Learning Models on NVIDIA Jetson Nano for Real-Time Systems: An Empirical Investigation. arXiv 2024, arXiv:2406.17749. [Google Scholar]
- Rosero-Montalvo, P.D.; Tözün, P.; Hernandez, W. Optimized CNN Architectures Benchmarking in Hardware-Constrained Edge Devices in IoT Environments. IEEE Internet Things J. 2024, 11, 20357–20366. [Google Scholar] [CrossRef]
- Orăşan, I.L.; Seiculescu, C.; Caleanu, C.D. Benchmarking tensorflow lite quantization algorithms for deep neural networks. In Proceedings of the 2022 IEEE 16th International Symposium on Applied Computational Intelligence and Informatics (SACI), Timişoara, Romania, 15–28 May 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 000221–000226. [Google Scholar]
- Gholami, A.; Kim, S.; Dong, Z.; Yao, Z.; Mahoney, M.W.; Keutzer, K. A survey of quantization methods for efficient neural network inference. In Low-Power Computer Vision; Chapman and Hall/CRC: Boca Raton, FL, USA, 2022; pp. 291–326. [Google Scholar]
- Vadera, S.; Ameen, S. Methods for pruning deep neural networks. IEEE Access 2022, 10, 63280–63300. [Google Scholar] [CrossRef]
- Li, G.; Yang, P.; Qian, C.; Hong, R.; Tang, K. Stage-wise magnitude-based pruning for recurrent neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2022, 35, 1666–1680. [Google Scholar] [CrossRef]
- Luo, J.H.; Wu, J.; Lin, W. Thinet: A filter level pruning method for deep neural network compression. In Proceedings of the IEEE International Conference on COMPUTER Vision, Venice, Italy, 22–29 October 2017; pp. 5058–5066. [Google Scholar]
- He, Y.; Xiao, L. Structured pruning for deep convolutional neural networks: A survey. IEEE Trans. Pattern Anal. Mach. Intell. 2023, 46, 2900–2919. [Google Scholar] [CrossRef]
- Sarfraz, F.; Arani, E.; Zonooz, B. Knowledge distillation beyond model compression. In Proceedings of the 2020 25th International Conference on Pattern Recognition (ICPR), Milan, Italy, 10–15 January 2021; IEEE: Piscataway, NJ, USA, 2021; pp. 6136–6143. [Google Scholar]
- Kim, H.Y.; Jeon, W.; Kim, D. An enhanced variant effect predictor based on a deep generative model and the Born-Again Networks. Sci. Rep. 2021, 11, 19127. [Google Scholar] [CrossRef]
- Wang, J.; Jiang, T.; Cui, Z.; Cao, Z.; Cao, C. A Knowledge Distillation Method based on IQE Attention Mechanism for Target Recognition in Sar Imagery. In Proceedings of the IGARSS 2022–2022 IEEE International Geoscience and Remote Sensing Symposium, Kuala Lumpur, Malaysia, 17–22 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1043–1046. [Google Scholar]
- Dong, N.; Zhang, Y.; Ding, M.; Xu, S.; Bai, Y. One-stage object detection knowledge distillation via adversarial learning. Appl. Intell. 2022, 52, 4582–4598. [Google Scholar] [CrossRef]
- Li, H.; Meng, L. Hardware-aware approach to deep neural network optimization. Neurocomputing 2023, 559, 126808. [Google Scholar] [CrossRef]
- Gholami, A.; Kwon, K.; Wu, B.; Tai, Z.; Yue, X.; Jin, P.; Zhao, S.; Keutzer, K. Squeezenext: Hardware-aware neural network design. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition Workshops, Salt Lake City, UT, USA, 18–22 June 2018; pp. 1638–1647. [Google Scholar]
- Jain, A.; Bhattacharya, S.; Masuda, M.; Sharma, V.; Wang, Y. Efficient execution of quantized deep learning models: A compiler approach. arXiv 2020, arXiv:2006.10226. [Google Scholar]
- Guendouzi, B.S.; Ouchani, S.; Assaad, H.E.; Zaher, M.E. A systematic review of federated learning: Challenges, aggregation methods, and development tools. J. Netw. Comput. Appl. 2023, 220, 103714. [Google Scholar] [CrossRef]
- Nusrat, I.; Jang, S.B. A comparison of regularization techniques in deep neural networks. Symmetry 2018, 10, 648. [Google Scholar] [CrossRef]
- Guo, Y.; Yao, A.; Chen, Y. Dynamic network surgery for efficient dnns. Adv. Neural Inf. Process. Syst. 2016, 29. [Google Scholar]
- Pang, B.; Nijkamp, E.; Wu, Y.N. Deep learning with tensorflow: A review. J. Educ. Behav. Stat. 2020, 45, 227–248. [Google Scholar] [CrossRef]
- Imambi, S.; Prakash, K.B.; Kanagachidambaresan, G. PyTorch. In Programming with TensorFlow: Solution for Edge Computing Applications; Springer: Berlin/Heidelberg, Germany, 2021; pp. 87–104. [Google Scholar]
- Bradski, G.; Kaehler, A. The OpenCV Library. Dr. Dobb’s Journal: Software Tools for the Professional Programmer; Miller Freeman Inc.: San Francisco, CA, USA, 2000; Volume 25, pp. 120–123. [Google Scholar]
- Jia, Y.; Shelhamer, E.; Donahue, J.; Karayev, S.; Long, J.; Girshick, R.; Guadarrama, S.; Darrell, T. Caffe: Convolutional architecture for fast feature embedding. In Proceedings of the 22nd ACM International Conference on Multimedia, Orlando, FL, USA, 3–7 November 2014; pp. 675–678. [Google Scholar]
- Van der Walt, S.; Schönberger, J.L.; Nunez-Iglesias, J.; Boulogne, F.; Warner, J.D.; Yager, N.; Gouillart, E.; Yu, T. scikit-image: Image processing in Python. PeerJ 2014, 2, e453. [Google Scholar] [CrossRef]
- Demaagd, K.; Oliver, A.; Oostendorp, N.; Scott, K. Practical Computer Vision with SimpleCV: The Simple Way to Make Technology See; O’Reilly Media, Inc.: Newton, MA, USA, 2012. [Google Scholar]
- Meta. Detectron 2. Available online: https://ai.meta.com/tools/detectron2/ (accessed on 23 February 2024).
- NVIDIA. TAO Toolkit. Available online: https://developer.nvidia.com/tao-toolkit (accessed on 22 February 2024).
- OpenMM Lab. Available online: https://openmmlab.com (accessed on 23 February 2024).
- Ultralytics. Available online: https://www.ultralytics.com (accessed on 24 February 2024).
- PyTorch Mobile. Available online: https://pytorch.org/mobile/home/ (accessed on 21 February 2024).
- Intel. OpenVINO. Available online: https://docs.openvino.ai/2024/index.html# (accessed on 21 February 2024).
- Open Neural Network Exchange (ONNX). Available online: https://onnx.ai (accessed on 22 February 2024).
- NVIDIA. TensorRT SDK. Available online: https://developer.nvidia.com/tensorrt (accessed on 23 February 2024).
- TensorFlow Lite. Available online: https://www.tensorflow.org/lite (accessed on 23 February 2024).
Maker | Name | Release Year | CPU Technology | GPU Technology | RAM | Storage | Avg Power Consumption | Dimensions (inch) | Price (USD)
---|---|---|---|---|---|---|---|---|---
ASUS | Tinker Board S | 2021 | Rockchip RK3288 | ARM Mali-T764 | 2 GB DDR3 | 16 GB eMMC | ≈5 W | 3.37″ × 2.125″ | USD 199
 | Tinker Edge T | 2019 | NXP i.MX 8M | ARM Mali-T860 | 1 GB LPDDR4 | 8 GB eMMC | ≈5–10 W | 3.35″ × 2.20″ | USD 240
 | Tinker Board 2 | 2020 | Rockchip RK3399 | ARM Mali-T860 | 2 GB LPDDR4 | 16 GB eMMC | ≈5–10 W | 3.37″ × 2.12″ | USD 120
 | Tinker Board 3N | 2023 | Rockchip RK3568 | ARM Mali-G52 | 4 GB LPDDR4 | 64 GB eMMC | ≈5–10 W | 4″ × 4″ | USD 160
NVIDIA | Jetson Nano | 2019 | ARM Cortex-A57 | 128-core Maxwell | 4 GB LPDDR4 | External microSD | ≈5–10 W | 2.72″ × 1.77″ | USD 249
 | Jetson TX2 | 2017 | ARM Cortex-A57 | 256-core Pascal | 8 GB LPDDR4 | 32 GB eMMC | ≈15 W | 3.42″ × 1.96″ | USD 810
 | Jetson Xavier NX | 2020 | 6-core Carmel | 384-core Volta 1 | 8 GB LPDDR4 | 16 GB eMMC | ≈10–30 W | 2.74″ × 1.77″ | USD 530
 | Jetson AGX Orin | 2023 | ARM Cortex-A78AE | 2048-core Ampere 2 | 32 GB LPDDR5 | 64 GB eMMC | ≈15–60 W | 4.33″ × 4.33″ | USD 3000
 | Jetson Orin Nano | 2023 | ARM Cortex-A78AE | 512-core Ampere 3 | 8 GB LPDDR5 | External microSD | ≈7–15 W | 3.93″ × 3.11″ | USD 800
Libre | Libre Tritium | 2018 | 4× ARM Cortex-A7 | ARM Mali-400 | 2 GB DDR3 | External microSD | ≈5 W | 3.34″ × 2.20″ | USD 35
 | Libre Le Potato | 2017 | 4× ARM Cortex-A53 | ARM Mali-450 | 2 GB DDR3 | External microSD | ≈5 W | 3.34″ × 2.20″ | USD 30
 | Libre Renegade | 2018 | 4× ARM Cortex-A53 | ARM Mali-450 | 4 GB DDR4 | External microSD | ≈5–10 W | 3.34″ × 2.20″ | USD 45
Others | VisionFive 2 | 2021 | StarFive JH7110 | BXE-4-32 | 8 GB LPDDR4 | External microSD | ≈10 W | 3.93″ × 2.83″ | USD 65
 | ROCK PI N10 | 2021 | ARM Cortex-A72 | ARM Mali-T860 MP4 | 4 GB DDR3 | 8 GB eMMC | ≈15–18 W | 3.93″ × 3.93″ | USD 199
 | BeagleBone AI | 2019 | ARM Cortex-A15 | PowerVR SGX544 | 1 GB | 16 GB eMMC | ≈5–10 W | 3.50″ × 2.12″ | USD 198
 | HiKey970 | 2017 | ARM Cortex-A73 | ARM Mali-G72 | 6 GB LPDDR4 | 64 GB UFS | ≈10–15 W | 4.14″ × 3.93″ | USD 239
 | Coral Dev Board | 2019 | ARM Cortex-A53 | GC7000 Lite Graphics | 4 GB LPDDR4 | 8 GB eMMC | ≈5 W | 5.40″ × 3.90″ | USD 200
 | Coral Dev Mini | 2020 | ARM Cortex-A35 | PowerVR GE8300 | 2 GB LPDDR3 | 8 GB eMMC | ≈3 W | 2.52″ × 1.89″ | USD 100
SBC | CV Task | Purpose | Model | Packages | Inference Performance | Reference
---|---|---|---|---|---|---
Jetson Nano | Detection | Pedestrian Detection | YOLOv5s | TensorRT | 15 FPS | Chen and Wang [35]
 | Classification | ImageNet Classification | MobileNetv2 | TensorFlow, TensorRT, ONNX | 0.020 s per image | Baller et al. [38]
 | Classification | Binary Classification | MobileNetv3 | PyTorch, ONNX, TensorRT | 0.300 s per image | Swaminathan et al. [76]
 | Detection | Ship Detection | YOLOv5 | PyTorch with CUDA | 4.86 FPS | Valencia et al. [40]
 | Classification | Multiclass Custom | MobileNetv2 | TensorRT | 6 ms per image | Rosero-Montalvo et al. [77]
 | Detection | Face Mask Detection | YOLOv4 Tiny | PyTorch with CUDA | 13 FPS | Hakim et al. [41]
 | Detection | Pedestrian Detection | SSD | PyTorch, TensorRT | 8.7 FPS | Sarvajcz et al. [36]
 | Classification | DeepFashion2 Classification | Custom CNN | CUDA, cuDNN | 6.4 ms per image | Süzen et al. [42]
 | Detection | COCO Detection | YOLOv3 | PyTorch with CUDA | 0.786 ms per image | Zhu et al. [43]
 | Detection | Pedestrian Detection | YOLOv4 | PyTorch with CUDA | 8 FPS | Wang et al. [37]
 | Classification | MNIST Classification | DarkNet | TensorFlow | 0.217 ms per image | Tolmacheva et al. [39]
 | Detection | Landing Platform Identification | YOLOv4 Tiny | TensorRT | 36.48 FPS | Ma et al. [44]
 | Dual Detection | Person and Weapon | SSD, YOLOv4 | FP16 with TensorRT | 1.4 FPS | Berardini et al. [45]
 | Detection | Plastic Bag Detection | YOLOv4 Tiny | TensorRT, FP32, DeepStream | 16.4 FPS | Iqbal et al. [8]
 | Detection | Ripe Coffee Detection | YOLOv3 | TensorFlow | 8180 ms per image | Beegam et al. [46]
Jetson TX2 | Detection | PASCAL VOC Detection | SSD | PyTorch with CUDA | 9.35 FPS | Chen et al. [47]
 | Detection | COCO Detection | Custom CNN | OpenCV, TensorFlow | 0.130 s per image | Taspinar and Selek [26]
 | Classification | Multiclass Custom | MobileNetv2 | TensorRT | 3.1 ms per image | Rosero-Montalvo et al. [77]
 | Classification | DeepFashion2 Classification | Custom CNN | CUDA, cuDNN | 4.6 ms per image | Süzen et al. [42]
 | Detection | Drone Detection | YOLOv3 | PyTorch with CUDA | <1 FPS | Xun et al. [48]
 | Detection | Vehicle Detection | YOLOv3 Tiny | TensorRT with FP16 | 18.3 FPS | Nguyen et al. [49]
 | Detection | Pedestrian Detection | YOLOv3 | PyTorch with CUDA | 6.6 FPS | Afifi et al. [50]
 | Detection | Person Detection | YOLOv8 | PyTorch with CUDA | 5.61 FPS | Byzkrovnyi et al. [51]
 | Detection | Plastic Bag Detection | YOLOv4 Tiny | TensorRT, FP32, DeepStream | 24.8 FPS | Iqbal et al. [8]
 | Segmentation | Weed Segmentation | SegNet | Caffe, CUDA, cuDNN | 0.56 s per image | Sa et al. [52]
 | Detection | Pest Detection | YOLOv3 Tiny | PyTorch with CUDA | 8.71 FPS | Chen et al. [53]
 | Detection | Concrete Crack Detection | YOLOv3 | OpenCV, PyTorch, CUDA | 33 ms per image | Kumar et al. [54]
Jetson Xavier NX | Detection | Banana Ripeness Detection | YOLOv8 Nano | PyTorch with CUDA | 13.9 ms per image | Aishwarya and Kumar [55]
 | Detection | Face Mask Detection | YOLOv5s | PyTorch with CUDA | 12 FPS | Aljaafreh et al. [56]
 | Detection | COCO Detection | YOLOv4 Tiny | TensorRT with FP16 | 0.0035 ms per image | Shin and Kim [57]
 | Detection | Vehicle and Pedestrian Detection | YOLOv8 | DeepStream, TensorRT | 6.7 ms per image | Wasule et al. [58]
 | Detection | COCO Detection | YOLOv3 | PyTorch with CUDA | 0.252 ms per image | Zhu et al. [43]
 | Detection | Pedestrian Detection | YOLOv4 | PyTorch with CUDA | 15 FPS | Wang et al. [37]
 | Depth Estimation | Monocular Estimation | FastMDE | PyTorch, ONNX, TensorRT | 30 ms per image | Dao et al. [60]
 | Detection | Fire and Smoke Detection | YOLOv3 | PyTorch, DeepStream | 9.9 FPS | Chen et al. [59]
Jetson AGX Orin | Classification | Tomato Disease Classification | MobileNetv2 | ONNX with ONNX Runtime | 0.4 ms per image | Zahid et al. [61]
 | Detection | COCO Detection | SSD | TensorRT | 48.41 FPS | Avila et al. [62]
 | Detection | Ground Vehicle Detection | YOLOX | TensorRT with FP32 | 37 FPS | Bhattacharjee et al. [63]
 | Detection | Surgical Instrument Detection | YOLOv5 | TensorRT with INT8, DeepStream | 2.3 ms per image | Belhaoua et al. [64]
 | Detection+Tracking | Vehicle Detection | YOLOX | FP32 with TensorRT | 87.2 FPS | Carvalho et al. [65]
Jetson Orin Nano | Classification | Tomato Disease Classification | MobileNetv2 | ONNX with ONNX Runtime | 0.6 ms per image | Zahid et al. [61]
 | Detection+Tracking | Vehicle Detection | YOLOX | FP32 with TensorRT | 78.9 FPS | Carvalho et al. [65]
HiKey970 | Detection | PASCAL VOC Detection | SSD | PyTorch with CUDA | 1.45 FPS | Chen et al. [47]
 | Classification | MNIST Classification | DarkNet | TensorFlow | 0.460 ms per image | Tolmacheva et al. [39]
ASUS Tinker Board S | Detection | COCO Detection | Custom CNN | OpenCV, TensorFlow | 0.245 s per image | Taspinar and Selek [26]
 | Detection | Face Detection | Haar Features | OpenCV | 0.2 s per image | Chen et al. [27]
 | Classification | Face with Mask Classification | MobileNetv2 | TensorFlow | Not reported | Jahan et al. [28]
ASUS Tinker Edge T | Classification | Pest Classification | ResNet 8 | TFLite | 3 ms per image | Tran and Tran [29]
Coral Dev Board | Classification | ImageNet Classification | MobileNetv2 | TFLite, Quant Integer | 0.004 s per image | Baller et al. [38]
 | Segmentation | Eye Optical Disc Segmentation | UNet | TFLite | 9 ms per image | Masot et al. [74]
 | Classification | Multiclass Custom | MobileNetv2 | TFLite with IQ quantization | 2.9 ms per image | Rosero-Montalvo et al. [77]
 | Classification | Tomato Disease Classification | MobileNetv2 | ONNX with ONNX Runtime | 1.16 ms per image | Zahid et al. [61]
 | Detection | People Detection | Custom Model | OpenCV, TFLite | 1 FPS | Petersson and Mohammedi [75]
 | Detection | Person and Weapon | SSD, YOLOv4 | INT8 with TFLite | 0.9 FPS | Berardini et al. [45]
BeagleBone AI | Classification | MNIST Classification | Custom Model | OpenCV, TFLite | 0.524 ms per image | Bogacz and Qouneh [73]
VisionFive 2 | Detection | Universal Object Detection | YOLOv3 | OpenCV, PyTorch | 427.70 ms per image | Jacob [72]
CV Task | Model Complexity | Inference Speed | Visual Complexity | Single/Multiple Models | Input Streams | Recommended SBCs
---|---|---|---|---|---|---
Entry-Level 1 Classification | Simple (e.g., MobileNet) | Not Critical | Simple Background | Single Model Inference | Single Stream | Tinker Board S, Tinker Edge T, Tinker Board 2, Libre Tritium, Libre Le Potato, Libre Renegade, ROCK PI N10, BeagleBone AI, Coral Dev Mini
Moderate-Performance 2 Classification | Moderate (e.g., ResNet50) | <1 FPS | Visually Challenging Background | Single Model Inference | Single Stream | Tinker Board 3N, Jetson Nano, VisionFive 2, HiKey970, Coral Dev Board
High-Performance 3 Detection and Tracking | Complex (e.g., YOLO, DINO) | >15 FPS | Complex and Dynamically Changing Background | Single Model Inference + Tracker | Single Stream | Jetson TX2, Jetson Xavier NX, Jetson Orin Nano
Very-High-Performance 4 Detection and Tracking | Complex (e.g., YOLO, DINO) | >15 FPS | Complex and Dynamically Changing Background | Multiple Model Inference + Tracker | Multiple Streams | Jetson AGX Orin
Feature | OpenCV | TensorFlow | Caffe | PyTorch | Scikit-Image | SimpleCV |
---|---|---|---|---|---|---|
License | Apache 2 | Apache 2 | BSD 2-Clause | Custom (BSD-style) | BSD 3-Clause | BSD 3-Clause |
Application | CV and ML | End-to-End ML | Deep Learning | ML and Deep Learning | Image Processing | Machine Vision |
Language | C++, Python, Java, MATLAB | Python, C++, Java | C++, Python | Python, C++ | Python | Python |
Optimization | MMX, SSE | GPU (CUDA, OpenCL) | GPU Acceleration | GPU Acceleration | Cython | NA |
Feature | Detectron2 | NVIDIA TAO Toolkit | OpenMMLab | Ultralytics |
---|---|---|---|---|
License | Apache 2 | Proprietary | Apache 2 | MIT |
Developer | Facebook AI Research | NVIDIA | MMLAB | Ultralytics |
Framework | PyTorch | TensorFlow, PyTorch | PyTorch | PyTorch |
CV Tasks | Det, Seg | Det, Seg, Clas, Action, Pose | All | Det, Seg |
AutoML | No | Yes | No | No |
Export Options | TorchScript, ONNX | ONNX, TRT Engine | ONNX, Torch | ONNX, Torch |
Feature | PyTorch Mobile | OpenVINO | ONNX | TensorRT | TensorFlow Lite |
---|---|---|---|---|---|
License | BSD | Apache 2 | Apache 2 | NVIDIA License | Apache 2 |
Developer | Facebook AI Research | Intel | Linux Foundation | NVIDIA | Google
Framework Comparability | PyTorch | TensorFlow, PyTorch | TensorFlow, PyTorch, Caffe | TensorFlow, PyTorch, Caffe | TensorFlow |
Optimization Techniques | Pruning, Quantization | Model Conversion | Framework Independent | Quantizations, Layer Fusion | Quantization, Operator Optimization |
GPU Support | Upcoming | Yes (Intel GPUs) | Yes | Yes (NVIDIA GPUs) | Yes |
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).