Editorial

4 pages, 169 KiB

Open Access Editorial

by Donald G. Bailey

J. Imaging 2019, 5(5), 53; https://doi.org/10.3390/jimaging5050053 - 10 May 2019

Cited by 11 | Viewed by 6772

Nine articles have been published in this Special Issue on image processing using field programmable gate arrays (FPGAs). The papers address a diverse range of topics relating to the application of FPGA technology to accelerate image processing tasks. The range includes custom processor design to reduce the programming burden; memory management for full frames, line buffers, and image border management; image segmentation through background modelling, online K-means clustering, and generalised Laplacian of Gaussian filtering; connected components analysis; and visually lossless image compression.

(This article belongs to the Special Issue Image Processing Using FPGAs)

Research

29 pages, 2873 KiB

Open Access Article

A JND-Based Pixel-Domain Algorithm and Hardware Architecture for Perceptual Image Coding

by Zhe Wang, Trung-Hieu Tran, Ponnanna Kelettira Muthappa and Sven Simon

J. Imaging 2019, 5(5), 50; https://doi.org/10.3390/jimaging5050050 - 26 Apr 2019

Cited by 6 | Viewed by 8023

Abstract

This paper presents a hardware-efficient pixel-domain just-noticeable difference (JND) model and its hardware architecture implemented on an FPGA. This JND model architecture is further proposed to be part of a low-complexity pixel-domain perceptual image coding architecture, which is based on downsampling and predictive coding. The downsampling is performed adaptively on the input image based on regions-of-interest (ROIs) identified by measuring the downsampling distortions against the visibility thresholds given by the JND model. The coding error at any pixel location can be guaranteed to be within the corresponding JND threshold in order to obtain excellent visual quality. Experimental results show the improved accuracy of the proposed JND model in estimating visual redundancies compared with classic JND models published earlier. Compression experiments demonstrate improved rate-distortion performance and visual quality over JPEG-LS as well as reduced compressed bit rates compared with other standard codecs such as JPEG 2000 at the same peak signal-to-perceptible-noise ratio (PSPNR). FPGA synthesis results targeting a mid-range device show moderate hardware resource requirements and over 100 Megapixel/s throughput of both the JND model and the perceptual encoder.

(This article belongs to the Special Issue Image Processing Using FPGAs)
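The PSPNR metric mentioned in the abstract above counts only the part of the coding error that exceeds the JND threshold at each pixel. The following is a minimal sketch, assuming the definition commonly used in the perceptual coding literature (mean squared supra-threshold error, 8-bit peak of 255); the abstract does not state the formula, and the function name `pspnr` and its flat-list inputs are hypothetical choices for illustration.

```python
import math

def pspnr(original, decoded, jnd, peak=255):
    """Peak signal-to-perceptible-noise ratio in dB (assumed definition).

    Only the error that exceeds the per-pixel JND threshold is counted;
    errors at or below the threshold are treated as invisible.
    """
    total = 0.0
    count = 0
    for o, d, t in zip(original, decoded, jnd):
        err = abs(o - d)
        if err > t:
            total += (err - t) ** 2  # perceptible part of the error
        count += 1
    if total == 0:
        return math.inf  # every error is within its JND threshold
    return 10 * math.log10(peak ** 2 / (total / count))

# One pixel is exact, the other is off by 10 with a JND threshold of 5.
value = pspnr([100, 100], [100, 110], [3, 5])
```

Under this definition, a codec that keeps every pixel error within its JND threshold, as the abstract describes, yields an infinite PSPNR.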

26 pages, 659 KiB

Open Access Article

Zig-Zag Based Single-Pass Connected Components Analysis

by Donald G. Bailey and Michael J. Klaiber

J. Imaging 2019, 5(4), 45; https://doi.org/10.3390/jimaging5040045 - 06 Apr 2019

Cited by 16 | Viewed by 5625

Abstract

Single-pass connected components analysis (CCA) algorithms suffer from a time overhead to resolve labels at the end of each image row. This work demonstrates how this overhead can be eliminated by replacing the conventional raster scan with a zig-zag scan. This enables chains of labels to be correctly resolved while processing the next image row. The effect is faster processing in the worst case with no end-of-row overheads. CCA hardware architectures using the novel algorithm proposed in this paper are, therefore, able to process images at higher throughput than other state-of-the-art methods while reducing the hardware requirements. The latency introduced by the conversion from raster scan to zig-zag scan is compensated for by a new method of detecting object completion, which enables the feature vector for completed connected components to be output at the earliest possible opportunity.

(This article belongs to the Special Issue Image Processing Using FPGAs)
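The difference between the raster scan and the zig-zag scan named in the abstract above can be illustrated by the order in which pixel coordinates are visited. This is a minimal sketch of the scan order only, not of the CCA algorithm; it assumes that even rows run left to right and odd rows right to left, and the function names are hypothetical.

```python
def raster_scan(height, width):
    """Yield (row, col) with every row traversed left to right."""
    for row in range(height):
        for col in range(width):
            yield row, col

def zigzag_scan(height, width):
    """Yield (row, col) with alternate rows traversed in opposite directions.

    Assumption: even rows go left to right, odd rows right to left.
    """
    for row in range(height):
        cols = range(width) if row % 2 == 0 else range(width - 1, -1, -1)
        for col in cols:
            yield row, col

order = list(zigzag_scan(2, 3))
```

In the zig-zag order, each row begins at the column where the previous row ended, so consecutive pixels in the stream are always neighbours in the image.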

17 pages, 5883 KiB

Open Access Article

High-Level Synthesis of Online K-Means Clustering Hardware for a Real-Time Image Processing Pipeline

by Aiman Badawi and Muhammad Bilal

J. Imaging 2019, 5(3), 38; https://doi.org/10.3390/jimaging5030038 - 14 Mar 2019

Cited by 10 | Viewed by 7424

Abstract

The growing need for smart surveillance solutions requires modern video capturing devices to be equipped with advanced features such as object detection, scene characterization, and event detection. Image segmentation into various connected regions is a vital pre-processing step in these and other advanced computer vision algorithms. Thus, the inclusion of a hardware accelerator for this task in the conventional image processing pipeline inevitably reduces the workload for more advanced operations downstream. Moreover, design entry using high-level synthesis tools is gaining popularity because it facilitates system development under a rapid prototyping paradigm. To address these design requirements, we have developed a hardware accelerator for image segmentation, based on an online K-Means algorithm, using a Simulink high-level synthesis tool. The developed hardware uses a standard pixel streaming protocol, and it can be readily inserted into any image processing pipeline as an Intellectual Property (IP) core on a Field Programmable Gate Array (FPGA). Furthermore, the proposed design reduces the hardware complexity of the conventional architectures by employing a weighted average instead of a moving average to update the clusters. Experimental evidence is also provided to demonstrate that the proposed weighted average-based approach yields better results than the conventional moving average on test video sequences. The synthesized hardware has been tested in a real-time environment to process Full HD video at 26.5 fps, while the estimated dynamic power consumption is less than 90 mW on the Xilinx Zynq-7000 SoC.

(This article belongs to the Special Issue Image Processing Using FPGAs)
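An online K-means update of the kind described in the abstract above processes one pixel at a time: the pixel is assigned to its nearest cluster centre, and only that centre is adjusted. The sketch below is a software illustration under a stated assumption: the abstract does not give the update formula, so a fixed-weight update `c += alpha * (x - c)` is used here as one plausible reading of "weighted average". The names `online_kmeans`, `nearest`, and `alpha` are hypothetical.

```python
def nearest(centroids, pixel):
    """Return the index of the centroid closest to the pixel (squared distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((c - p) ** 2 for c, p in zip(centroids[i], pixel)),
    )

def online_kmeans(pixels, centroids, alpha=0.5):
    """Label a pixel stream and update centroids one pixel at a time.

    Assumed update rule: the winning centroid moves a fixed fraction
    `alpha` of the way towards the pixel. No per-cluster sums or counts
    are stored, so no division is needed.
    """
    centroids = [list(c) for c in centroids]
    labels = []
    for pixel in pixels:
        k = nearest(centroids, pixel)
        labels.append(k)
        for ch, value in enumerate(pixel):
            centroids[k][ch] += alpha * (value - centroids[k][ch])
    return labels, centroids

# Single-channel stream with two initial cluster centres.
labels, centres = online_kmeans([(10,), (200,), (12,), (198,)], [(0,), (255,)])
```

When `alpha` is a power of two, the multiplication reduces to a bit shift, which is one reason a fixed-weight update can be attractive in hardware; whether the published design uses this exact form is not stated in the abstract.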

20 pages, 1993 KiB

Open Access Article

High-Throughput Line Buffer Microarchitecture for Arbitrary Sized Streaming Image Processing

by Runbin Shi, Justin S.J. Wong and Hayden K.-H. So

J. Imaging 2019, 5(3), 34; https://doi.org/10.3390/jimaging5030034 - 06 Mar 2019

Cited by 4 | Viewed by 7641

Abstract

Parallel hardware designed for image processing promotes vision-guided intelligent applications. With the advantages of high throughput and low latency, streaming architecture on FPGA is especially attractive for real-time image processing. Notably, many real-world applications, such as region of interest (ROI) detection, demand the ability to process images continuously at different sizes and resolutions in hardware without interruptions. FPGA is especially suitable for implementation of such flexible streaming architecture, but most existing solutions require run-time reconfiguration, and hence cannot achieve seamless image size-switching. In this paper, we propose a dynamically-programmable buffer architecture (D-SWIM) based on the Stream-Windowing Interleaved Memory (SWIM) architecture to realize image processing on FPGA for image streams at arbitrary sizes defined at run time. D-SWIM redefines the way that on-chip memory is organized and controlled, and the hardware adapts to arbitrary image size with sub-100 ns delay that ensures minimum interruptions to the image processing at a high frame rate. Compared to the prior SWIM buffer for high-throughput scenarios, D-SWIM achieved dynamic programmability with only a slight overhead on logic resource usage, but saved up to 56% of the BRAM resource. The D-SWIM buffer achieves a maximum operating frequency of 329.5 MHz and a reduction in power consumption of 45.7% compared with the SWIM scheme. Real-world image processing applications, such as 2D-Convolution and the Harris Corner Detector, have also been used to evaluate D-SWIM’s performance, where pixel throughputs of 4.5 Gigapixel/s and 4.2 Gigapixel/s were achieved, respectively. Compared to the implementation with prior streaming frameworks, the D-SWIM-based design not only realizes seamless image size-switching, but also improves hardware efficiency by up to 30×.

(This article belongs to the Special Issue Image Processing Using FPGAs)
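The role of a line buffer in streaming image processing can be sketched in software: pixels arrive one at a time in raster order, and the buffer retains just enough of the recent stream to form a sliding window. The sketch below shows a basic single-pixel-per-step line buffer, not the D-SWIM or SWIM architecture; the function name `stream_windows` is hypothetical, and the image width is passed as an ordinary run-time parameter.

```python
from collections import deque

def stream_windows(pixels, width, k=3):
    """Yield (row, col, window) for each fully valid k x k window.

    `pixels` is a raster-order stream and (row, col) is the position of
    the newest pixel, i.e. the bottom-right corner of the window. Only
    (k - 1) full rows plus k pixels are held at any time.
    """
    buf = deque(maxlen=(k - 1) * width + k)
    for index, value in enumerate(pixels):
        buf.append(value)
        row, col = divmod(index, width)
        if row >= k - 1 and col >= k - 1:
            # The buffer is full here, so buf[r * width + c] is the pixel
            # at window row r, window column c.
            window = [[buf[r * width + c] for c in range(k)] for r in range(k)]
            yield row, col, window

# A 3-row, 4-column image whose pixel values equal their raster index.
windows = list(stream_windows(range(12), width=4))
```

A hardware line buffer reads all window rows in parallel from separate memories rather than indexing one queue, but the storage requirement of roughly k - 1 image rows is the same.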

13 pages, 6972 KiB

Open Access Article

Efficient FPGA Implementation of Automatic Nuclei Detection in Histopathology Images

by Haonan Zhou, Raju Machupalli and Mrinal Mandal

J. Imaging 2019, 5(1), 21; https://doi.org/10.3390/jimaging5010021 - 17 Jan 2019

Cited by 3 | Viewed by 5551

Abstract

Accurate and efficient detection of cell nuclei is an important step towards the development of pathology-based Computer Aided Diagnosis. High-resolution histopathology images are generally very large, on the order of a billion pixels. Nuclei detection is therefore a highly compute-intensive task, and a software implementation requires a significant amount of processing time. To assist doctors in real time, special hardware accelerators that can reduce the processing time are required. In this paper, we propose a Field Programmable Gate Array (FPGA) implementation of an automated nuclei detection algorithm using generalized Laplacian of Gaussian filters. The experimental results show that the implemented architecture has the potential to provide a significant improvement in processing time without losing detection accuracy.

(This article belongs to the Special Issue Image Processing Using FPGAs)

22 pages, 2370 KiB

Open Access Article

FPGA-Based Processor Acceleration for Image Processing Applications

by Fahad Siddiqui, Sam Amiri, Umar Ibrahim Minhas, Tiantai Deng, Roger Woods, Karen Rafferty and Daniel Crookes

J. Imaging 2019, 5(1), 16; https://doi.org/10.3390/jimaging5010016 - 13 Jan 2019

Cited by 34 | Viewed by 12257

Abstract

FPGA-based embedded image processing systems offer considerable computing resources but present programming challenges when compared to software systems. The paper describes an approach based on an FPGA-based soft processor called Image Processing Processor (IPPro), which can operate at up to 337 MHz on a high-end Xilinx FPGA family, and gives details of the dataflow-based programming environment. The approach is demonstrated for a k-means clustering operation and a traffic sign recognition application, both of which have been prototyped on an Avnet Zedboard that has a Xilinx Zynq-7000 system-on-chip (SoC). A number of parallel dataflow mapping options were explored, giving a speed-up of 8 times for the k-means clustering using 16 IPPro cores, and a speed-up of 9.6 times for the morphology filter operation of the traffic sign recognition using 16 IPPro cores, compared to their equivalent ARM-based software implementations. We show that for k-means clustering, the 16 IPPro cores implementation is 57, 28 and 1.7 times more power efficient (fps/W) than the ARM Cortex-A7 CPU, NVIDIA GeForce GTX980 GPU and ARM Mali-T628 embedded GPU, respectively.

(This article belongs to the Special Issue Image Processing Using FPGAs)

23 pages, 2876 KiB

Open Access Article

Optimized Memory Allocation and Power Minimization for FPGA-Based Image Processing

by Paulo Garcia, Deepayan Bhowmik, Robert Stewart, Greg Michaelson and Andrew Wallace

J. Imaging 2019, 5(1), 7; https://doi.org/10.3390/jimaging5010007 - 01 Jan 2019

Cited by 18 | Viewed by 8257

Abstract

Memory is the biggest limiting factor to the widespread use of FPGAs for high-level image processing, which requires complete frame(s) to be stored in situ. Since FPGAs have limited on-chip memory capabilities, efficient use of such resources is essential to meet performance, size and power constraints. In this paper, we investigate allocation of on-chip memory resources in order to minimize resource usage and power consumption, contributing to the realization of power-efficient high-level image processing fully contained on FPGAs. We propose methods for generating memory architectures, from both Hardware Description Languages and High Level Synthesis designs, which minimize memory usage and power consumption. Based on a formalization of on-chip memory configuration options and a power model, we demonstrate how our partitioning algorithms can outperform traditional strategies. Compared to commercial FPGA synthesis and High Level Synthesis tools, our results show that the proposed algorithms can result in up to 60% higher utilization efficiency, increasing the sizes and/or number of frames that can be accommodated, and reduce frame buffers’ dynamic power consumption by up to approximately 70%. In our experiments using Optical Flow and MeanShift Tracking, two representative high-level algorithms, the data show that partitioning algorithms can reduce total power by up to 25% and 30%, respectively, without impacting performance.

(This article belongs to the Special Issue Image Processing Using FPGAs)

21 pages, 509 KiB

Open Access Article

Border Handling for 2D Transpose Filter Structures on an FPGA

by Donald G. Bailey and Anoop S. Ambikumar

J. Imaging 2018, 4(12), 138; https://doi.org/10.3390/jimaging4120138 - 26 Nov 2018

Cited by 7 | Viewed by 6028

Abstract

It is sometimes desirable to implement filters using a transpose-form filter structure. However, managing image borders is generally considered more complex than it is with the more commonly used direct-form structure. This paper explores border handling for transpose-form filters, and proposes two novel mechanisms: transformation coalescing, and combination chain modification. For linear filters, coefficient coalescing can effectively exploit the digital signal processing blocks, resulting in the smallest resource requirements. Combination chain modification requires similar resources to direct-form border handling. It is demonstrated that the combination chain multiplexing can be split into two stages, consisting of a combination network followed by the transpose-form combination chain. The resulting transpose-form border handling networks are of similar complexity to the direct-form networks, enabling the transpose-form filter structure to be used where required. The transpose form is also significantly faster, being automatically pipelined by the filter structure. Of the border extension methods, zero-extension requires the least resources.

(This article belongs to the Special Issue Image Processing Using FPGAs)
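The direct-form and transpose-form structures contrasted in the abstract above compute the same linear filter with different dataflow. The following is a minimal one-dimensional sketch of the two structures only; it does not implement the paper's 2D border handling mechanisms. Registers start at zero, which corresponds to zero-extension before the first sample, and the function names are hypothetical.

```python
def fir_direct(samples, coeffs):
    """Direct form: delay the inputs, then multiply and sum all taps."""
    taps = [0] * len(coeffs)  # delay line of past inputs, newest first
    out = []
    for x in samples:
        taps = [x] + taps[:-1]
        out.append(sum(c * t for c, t in zip(coeffs, taps)))
    return out

def fir_transpose(samples, coeffs):
    """Transpose form: multiply the input by every coefficient at once,
    then delay and accumulate the partial sums in a register chain."""
    regs = [0] * (len(coeffs) - 1)  # partial sums waiting for later inputs
    out = []
    for x in samples:
        products = [c * x for c in coeffs]
        out.append(products[0] + (regs[0] if regs else 0))
        # Each register takes the next product plus the register behind it.
        regs = [
            products[i + 1] + (regs[i + 1] if i + 1 < len(regs) else 0)
            for i in range(len(regs))
        ]
    return out

direct = fir_direct([1, 2, 3, 4], [1, 2, 3])
transposed = fir_transpose([1, 2, 3, 4], [1, 2, 3])
```

In the transpose form, each addition is separated from the next by a register, which matches the abstract's remark that the structure is automatically pipelined. The registers hold partial results rather than raw pixels, which suggests why border handling cannot simply reuse the direct-form approach of substituting window inputs.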

17 pages, 1830 KiB

Open Access Article

Accelerating SuperBE with Hardware/Software Co-Design

by Andrew Tzer-Yeu Chen, Rohaan Gupta, Anton Borzenko, Kevin I-Kai Wang and Morteza Biglari-Abhari

J. Imaging 2018, 4(10), 122; https://doi.org/10.3390/jimaging4100122 - 18 Oct 2018

Cited by 5 | Viewed by 5410

Abstract

Background Estimation is a common computer vision task, used for segmenting moving objects in video streams. This can be useful as a pre-processing step, isolating regions of interest for more complicated algorithms performing detection, recognition, and identification tasks, in order to reduce overall computation time. This is especially important in the context of embedded systems like smart cameras, which may need to process images with constrained computational resources. This work focuses on accelerating SuperBE, a superpixel-based background estimation algorithm that was designed for simplicity and reduced computational complexity while maintaining state-of-the-art levels of accuracy. We explore both software and hardware acceleration opportunities, converting the original algorithm into a greyscale, integer-only version, and using Hardware/Software Co-design to develop hardware acceleration components on FPGA fabric that assist a software processor. We achieved a 4.4× speed improvement with the software optimisations alone, and a 2× speed improvement with the hardware optimisations alone. When combined, these led to a 9× speed improvement on a Cyclone V System-on-Chip, delivering almost 38 fps on 320 × 240 resolution images.

(This article belongs to the Special Issue Image Processing Using FPGAs)
