Image Processing Using FPGAs

A special issue of Journal of Imaging (ISSN 2313-433X).

Deadline for manuscript submissions: closed (30 November 2018) | Viewed by 74889

A printed edition of this Special Issue is available.

Special Issue Editor


Guest Editor
Department of Mechanical and Electrical Engineering, School of Food and Advanced Technology, Massey University, Palmerston North 4442, New Zealand
Interests: machine vision; FPGA based design; digital image processing

Special Issue Information

Dear Colleagues,

Field Programmable Gate Arrays (FPGAs) are increasingly being used for the implementation of image processing applications. This is especially the case for real-time embedded applications, where latency and power are important considerations. An FPGA embedded in a smart camera is able to perform much of the image processing directly as the image is streamed from the sensor, providing a processed data stream, rather than images. The parallelism of hardware is able to exploit the spatial and temporal parallelism implicit within many image processing tasks. Unfortunately, simply porting a software algorithm onto an FPGA often gives disappointing results, because many image processing algorithms have been optimised for a serial processor. It is usually necessary to transform the algorithm to efficiently exploit the parallelism and resources available on an FPGA. This can lead to novel algorithms and hardware computational architectures, both at the image processing operation level and also the application level.

The aim of this Special Issue is to present and highlight novel algorithms, architectures, techniques and applications of FPGAs for image processing.

Prof. Donald Bailey
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, proceed to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Journal of Imaging is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1800 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • Hardware algorithms for imaging
  • Computational imaging architectures
  • Reconfigurable image processing systems
  • Parallel image processing
  • Hardware acceleration for imaging applications
  • FPGA based smart cameras

Published Papers (10 papers)

Editorial

4 pages, 169 KiB  
Editorial
Image Processing Using FPGAs
by Donald G. Bailey
J. Imaging 2019, 5(5), 53; https://doi.org/10.3390/jimaging5050053 - 10 May 2019
Cited by 11 | Viewed by 6772
Abstract
Nine articles have been published in this Special Issue on image processing using field programmable gate arrays (FPGAs). The papers address a diverse range of topics relating to the application of FPGA technology to accelerate image processing tasks. The range includes: Custom processor design to reduce the programming burden; memory management for full frames, line buffers, and image border management; image segmentation through background modelling, online K-means clustering, and generalised Laplacian of Gaussian filtering; connected components analysis; and visually lossless image compression.
(This article belongs to the Special Issue Image Processing Using FPGAs)

Research

29 pages, 2873 KiB  
Article
A JND-Based Pixel-Domain Algorithm and Hardware Architecture for Perceptual Image Coding
by Zhe Wang, Trung-Hieu Tran, Ponnanna Kelettira Muthappa and Sven Simon
J. Imaging 2019, 5(5), 50; https://doi.org/10.3390/jimaging5050050 - 26 Apr 2019
Cited by 6 | Viewed by 8023
Abstract
This paper presents a hardware efficient pixel-domain just-noticeable difference (JND) model and its hardware architecture implemented on an FPGA. This JND model architecture is further proposed to be part of a low complexity pixel-domain perceptual image coding architecture, which is based on downsampling and predictive coding. The downsampling is performed adaptively on the input image based on regions-of-interest (ROIs) identified by measuring the downsampling distortions against the visibility thresholds given by the JND model. The coding error at any pixel location can be guaranteed to be within the corresponding JND threshold in order to obtain excellent visual quality. Experimental results show the improved accuracy of the proposed JND model in estimating visual redundancies compared with classic JND models published earlier. Compression experiments demonstrate improved rate-distortion performance and visual quality over JPEG-LS as well as reduced compressed bit rates compared with other standard codecs such as JPEG 2000 at the same peak signal-to-perceptible-noise ratio (PSPNR). FPGA synthesis results targeting a mid-range device show very moderate hardware resource requirements and over 100 Megapixel/s throughput of both the JND model and the perceptual encoder.
(This article belongs to the Special Issue Image Processing Using FPGAs)

26 pages, 659 KiB  
Article
Zig-Zag Based Single-Pass Connected Components Analysis
by Donald G. Bailey and Michael J. Klaiber
J. Imaging 2019, 5(4), 45; https://doi.org/10.3390/jimaging5040045 - 06 Apr 2019
Cited by 16 | Viewed by 5625
Abstract
Single-pass connected components analysis (CCA) algorithms suffer from a time overhead to resolve labels at the end of each image row. This work demonstrates how this overhead can be eliminated by replacing the conventional raster scan with a zig-zag scan. This enables chains of labels to be correctly resolved while processing the next image row. The effect is faster processing in the worst case with no end-of-row overheads. CCA hardware architectures using the novel algorithm proposed in this paper are, therefore, able to process images at higher throughput than other state-of-the-art methods while reducing the hardware requirements. The latency introduced by the conversion from raster scan to zig-zag scan is compensated for by a new method of detecting object completion, which enables the feature vector for completed connected components to be output at the earliest possible opportunity.
(This article belongs to the Special Issue Image Processing Using FPGAs)
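
As a rough illustration of the two scan orders named in the abstract above, the following Python sketch lists pixel coordinates in raster order and in zig-zag order. It assumes that "zig-zag" means reversing the scan direction on every other row; the function names are hypothetical, and the sketch shows only the visiting order, not the paper's label resolution or hardware architecture.

```python
def raster_scan(height, width):
    """Yield (row, col) in conventional raster order: every row left to right."""
    for row in range(height):
        for col in range(width):
            yield row, col


def zigzag_scan(height, width):
    """Yield (row, col) with the scan direction reversed on every other row.

    Assumption: even rows run left to right, odd rows run right to left.
    """
    for row in range(height):
        if row % 2 == 0:
            cols = range(width)
        else:
            cols = range(width - 1, -1, -1)
        for col in cols:
            yield row, col


if __name__ == "__main__":
    print(list(raster_scan(2, 3)))
    print(list(zigzag_scan(2, 3)))
```

In the zig-zag order each row starts in the column where the previous row ended, so the scan never jumps back to the opposite side of the image between rows.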

17 pages, 5883 KiB  
Article
High-Level Synthesis of Online K-Means Clustering Hardware for a Real-Time Image Processing Pipeline
by Aiman Badawi and Muhammad Bilal
J. Imaging 2019, 5(3), 38; https://doi.org/10.3390/jimaging5030038 - 14 Mar 2019
Cited by 10 | Viewed by 7424
Abstract
The growing need for smart surveillance solutions requires modern video capturing devices to be equipped with advanced features, such as object detection, scene characterization, and event detection. Image segmentation into various connected regions is a vital pre-processing step in these and other advanced computer vision algorithms. Thus, the inclusion of a hardware accelerator for this task in the conventional image processing pipeline inevitably reduces the workload for more advanced operations downstream. Moreover, design entry by using high-level synthesis tools is gaining popularity for the facilitation of system development under a rapid prototyping paradigm. To address these design requirements, we have developed a hardware accelerator for image segmentation, based on an online K-Means algorithm using a Simulink high-level synthesis tool. The developed hardware uses a standard pixel streaming protocol, and it can be readily inserted into any image processing pipeline as an Intellectual Property (IP) core on a Field Programmable Gate Array (FPGA). Furthermore, the proposed design reduces the hardware complexity of the conventional architectures by employing a weighted average instead of a moving average to update the clusters. Experimental evidence has also been provided to demonstrate that the proposed weighted average-based approach yields better results than the conventional moving average on test video sequences. The synthesized hardware has been tested in a real-time environment to process Full HD video at 26.5 fps, while the estimated dynamic power consumption is less than 90 mW on the Xilinx Zynq-7000 SoC.
(This article belongs to the Special Issue Image Processing Using FPGAs)

20 pages, 1993 KiB  
Article
High-Throughput Line Buffer Microarchitecture for Arbitrary Sized Streaming Image Processing
by Runbin Shi, Justin S.J. Wong and Hayden K.-H. So
J. Imaging 2019, 5(3), 34; https://doi.org/10.3390/jimaging5030034 - 06 Mar 2019
Cited by 4 | Viewed by 7641
Abstract
Parallel hardware designed for image processing promotes vision-guided intelligent applications. With the advantages of high throughput and low latency, streaming architecture on FPGA is especially attractive for real-time image processing. Notably, many real-world applications, such as region of interest (ROI) detection, demand the ability to process images continuously at different sizes and resolutions in hardware without interruptions. FPGAs are especially suitable for implementation of such flexible streaming architecture, but most existing solutions require run-time reconfiguration, and hence cannot achieve seamless image size-switching. In this paper, we propose a dynamically-programmable buffer architecture (D-SWIM) based on the Stream-Windowing Interleaved Memory (SWIM) architecture to realize image processing on FPGA for image streams at arbitrary sizes defined at run time. D-SWIM redefines the way that on-chip memory is organized and controlled, and the hardware adapts to arbitrary image size with sub-100 ns delay that ensures minimum interruptions to the image processing at a high frame rate. Compared to the prior SWIM buffer for high-throughput scenarios, D-SWIM achieved dynamic programmability with only a slight overhead on logic resource usage, but saved up to 56% of the BRAM resource. The D-SWIM buffer achieves a maximum operating frequency of 329.5 MHz and a 45.7% reduction in power consumption compared with the SWIM scheme. Real-world image processing applications, such as 2D-Convolution and the Harris Corner Detector, have also been used to evaluate D-SWIM’s performance, where pixel throughputs of 4.5 Gigapixel/s and 4.2 Gigapixel/s were achieved, respectively. Compared to the implementation with prior streaming frameworks, the D-SWIM-based design not only realizes seamless image size-switching, but also improves hardware efficiency by up to 30×.
(This article belongs to the Special Issue Image Processing Using FPGAs)

13 pages, 6972 KiB  
Article
Efficient FPGA Implementation of Automatic Nuclei Detection in Histopathology Images
by Haonan Zhou, Raju Machupalli and Mrinal Mandal
J. Imaging 2019, 5(1), 21; https://doi.org/10.3390/jimaging5010021 - 17 Jan 2019
Cited by 3 | Viewed by 5551
Abstract
Accurate and efficient detection of cell nuclei is an important step towards the development of a pathology-based Computer Aided Diagnosis. Generally, high-resolution histopathology images are very large, on the order of a billion pixels; nuclei detection is therefore a highly compute-intensive task, and software implementation requires a significant amount of processing time. To assist doctors in real time, special hardware accelerators, which can reduce the processing time, are required. In this paper, we propose a Field Programmable Gate Array (FPGA) implementation of an automated nuclei detection algorithm using generalized Laplacian of Gaussian filters. The experimental results show that the implemented architecture has the potential to provide a significant improvement in processing time without losing detection accuracy.
(This article belongs to the Special Issue Image Processing Using FPGAs)

22 pages, 2370 KiB  
Article
FPGA-Based Processor Acceleration for Image Processing Applications
by Fahad Siddiqui, Sam Amiri, Umar Ibrahim Minhas, Tiantai Deng, Roger Woods, Karen Rafferty and Daniel Crookes
J. Imaging 2019, 5(1), 16; https://doi.org/10.3390/jimaging5010016 - 13 Jan 2019
Cited by 34 | Viewed by 12257
Abstract
FPGA-based embedded image processing systems offer considerable computing resources but present programming challenges when compared to software systems. The paper describes an approach based on an FPGA-based soft processor called Image Processing Processor (IPPro), which can operate at up to 337 MHz on a high-end Xilinx FPGA family, and gives details of the dataflow-based programming environment. The approach is demonstrated for a k-means clustering operation and a traffic sign recognition application, both of which have been prototyped on an Avnet Zedboard that has a Xilinx Zynq-7000 system-on-chip (SoC). A number of parallel dataflow mapping options were explored, giving a speed-up of 8 times for the k-means clustering using 16 IPPro cores, and a speed-up of 9.6 times for the morphology filter operation of the traffic sign recognition using 16 IPPro cores, compared to their equivalent ARM-based software implementations. We show that for k-means clustering, the 16 IPPro cores implementation is 57, 28 and 1.7 times more power efficient (fps/W) than the ARM Cortex-A7 CPU, NVIDIA GeForce GTX980 GPU and ARM Mali-T628 embedded GPU, respectively.
(This article belongs to the Special Issue Image Processing Using FPGAs)

23 pages, 2876 KiB  
Article
Optimized Memory Allocation and Power Minimization for FPGA-Based Image Processing
by Paulo Garcia, Deepayan Bhowmik, Robert Stewart, Greg Michaelson and Andrew Wallace
J. Imaging 2019, 5(1), 7; https://doi.org/10.3390/jimaging5010007 - 01 Jan 2019
Cited by 18 | Viewed by 8257
Abstract
Memory is the biggest limiting factor to the widespread use of FPGAs for high-level image processing, which requires complete frame(s) to be stored in situ. Since FPGAs have limited on-chip memory capabilities, efficient use of such resources is essential to meet performance, size and power constraints. In this paper, we investigate allocation of on-chip memory resources in order to minimize resource usage and power consumption, contributing to the realization of power-efficient high-level image processing fully contained on FPGAs. We propose methods for generating memory architectures, from both Hardware Description Languages and High Level Synthesis designs, which minimize memory usage and power consumption. Based on a formalization of on-chip memory configuration options and a power model, we demonstrate how our partitioning algorithms can outperform traditional strategies. Compared to commercial FPGA synthesis and High Level Synthesis tools, our results show that the proposed algorithms can result in up to 60% higher utilization efficiency, increasing the sizes and/or number of frames that can be accommodated, and reduce frame buffers’ dynamic power consumption by up to approximately 70%. In our experiments using Optical Flow and MeanShift Tracking, representative high-level algorithms, data show that partitioning algorithms can reduce total power by up to 25% and 30%, respectively, without impacting performance.
(This article belongs to the Special Issue Image Processing Using FPGAs)

21 pages, 509 KiB  
Article
Border Handling for 2D Transpose Filter Structures on an FPGA
by Donald G. Bailey and Anoop S. Ambikumar
J. Imaging 2018, 4(12), 138; https://doi.org/10.3390/jimaging4120138 - 26 Nov 2018
Cited by 7 | Viewed by 6028
Abstract
It is sometimes desirable to implement filters using a transpose-form filter structure. However, managing image borders is generally considered more complex than it is with the more commonly used direct-form structure. This paper explores border handling for transpose-form filters, and proposes two novel mechanisms: transformation coalescing, and combination chain modification. For linear filters, coefficient coalescing can effectively exploit the digital signal processing blocks, resulting in the smallest resource requirements. Combination chain modification requires similar resources to direct-form border handling. It is demonstrated that the combination chain multiplexing can be split into two stages, consisting of a combination network followed by the transpose-form combination chain. The resulting transpose-form border handling networks are of similar complexity to the direct-form networks, enabling the transpose-form filter structure to be used where required. The transpose form is also significantly faster, being automatically pipelined by the filter structure. Of the border extension methods, zero-extension requires the least resources.
(This article belongs to the Special Issue Image Processing Using FPGAs)
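
For readers unfamiliar with border extension, the Python sketch below pads a single image row in software so that a filter window always has samples to read near the edges. Only zero-extension is named in the abstract above; the "replicate" and "mirror" modes are assumed here as commonly used alternatives, and the function name and parameters are hypothetical. It does not reproduce the paper's transpose-form hardware mechanisms.

```python
def extend_border(samples, pad, mode="zero"):
    """Return a copy of `samples` with `pad` extra values on each side.

    mode="zero":      pad with zeros (zero-extension)
    mode="replicate": repeat the edge sample (assumed alternative)
    mode="mirror":    reflect about the edge sample without repeating it
                      (assumed alternative)
    """
    n = len(samples)
    if not 0 <= pad < n:
        raise ValueError("pad must satisfy 0 <= pad < len(samples)")
    if mode == "zero":
        left = [0] * pad
        right = [0] * pad
    elif mode == "replicate":
        left = [samples[0]] * pad
        right = [samples[-1]] * pad
    elif mode == "mirror":
        left = [samples[i] for i in range(pad, 0, -1)]
        right = [samples[n - 1 - i] for i in range(1, pad + 1)]
    else:
        raise ValueError("unknown mode: " + mode)
    return left + list(samples) + right


if __name__ == "__main__":
    row = [1, 2, 3, 4]
    for mode in ("zero", "replicate", "mirror"):
        print(mode, extend_border(row, 2, mode))
```

Zero-extension needs no knowledge of the neighbouring pixel values, which is consistent with the abstract's observation that it requires the least resources.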

17 pages, 1830 KiB  
Article
Accelerating SuperBE with Hardware/Software Co-Design
by Andrew Tzer-Yeu Chen, Rohaan Gupta, Anton Borzenko, Kevin I-Kai Wang and Morteza Biglari-Abhari
J. Imaging 2018, 4(10), 122; https://doi.org/10.3390/jimaging4100122 - 18 Oct 2018
Cited by 5 | Viewed by 5410
Abstract
Background Estimation is a common computer vision task, used for segmenting moving objects in video streams. This can be useful as a pre-processing step, isolating regions of interest for more complicated algorithms performing detection, recognition, and identification tasks, in order to reduce overall computation time. This is especially important in the context of embedded systems like smart cameras, which may need to process images with constrained computational resources. This work focuses on accelerating SuperBE, a superpixel-based background estimation algorithm that was designed for simplicity and reducing computational complexity while maintaining state-of-the-art levels of accuracy. We explore both software and hardware acceleration opportunities, converting the original algorithm into a greyscale, integer-only version, and using Hardware/Software Co-design to develop hardware acceleration components on FPGA fabric that assist a software processor. We achieved a 4.4× speed improvement with the software optimisations alone, and a 2× speed improvement with the hardware optimisations alone. When combined, these led to a 9× speed improvement on a Cyclone V System-on-Chip, delivering almost 38 fps on 320 × 240 resolution images.
(This article belongs to the Special Issue Image Processing Using FPGAs)