Convolutional Neural Network Design and Hardware Implementation for Real-Time Vision Applications

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Computer Science & Engineering".

Deadline for manuscript submissions: closed (31 October 2019) | Viewed by 74779

Special Issue Information

Dear Colleagues,

Processing speed is critical for many visual computing tasks. Many computer vision algorithms produce accurate results but run too slowly to be useful in real time. Others process at camera frame rates but with reduced accuracy, a combination that is often more useful for real-time applications. Meanwhile, FPGAs are increasing in capacity and decreasing in power consumption, making them more attractive for embedded applications such as onboard vision and control for unmanned vehicles. Convolutional neural networks (CNNs) offer state-of-the-art accuracy for many computer vision tasks, and their capabilities generalize to many different real-world applications, which often require real-time responsiveness from the vision system. This Special Issue focuses on CNNs and their application to real-time computer vision tasks.

General topics covered in this Special Issue include, but are not limited to:

  • FPGA-based hardware acceleration of vision algorithms;
  • GPU-based acceleration of vision algorithms;
  • Embedded vision sensors for applications that require real-time performance;
  • CNN architecture optimizations for real-time performance;
  • CNN acceleration through approximate computing;
  • GPU-based implementations for real-time CNN performance;
  • FPGA-based implementations for real-time CNN performance;
  • Real-time CNN performance on resource limited systems;
  • CNN applications that require real-time performance;
  • Tradeoff analysis between speed and accuracy in CNNs.

Prof. Dr. Dah-Jye Lee
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies is available on the MDPI website.

Published Papers (12 papers)


Research


28 pages, 12120 KiB  
Article
A Low-Power Spike-Like Neural Network Design
by Michael Losh and Daniel Llamocca
Electronics 2019, 8(12), 1479; https://doi.org/10.3390/electronics8121479 - 4 Dec 2019
Cited by 7 | Viewed by 5388
Abstract
Modern massively-parallel Graphics Processing Units (GPUs) and Machine Learning (ML) frameworks enable neural network implementations of unprecedented performance and sophistication. However, state-of-the-art GPU hardware platforms are extremely power-hungry, while microprocessors cannot meet the performance requirements. Biologically-inspired Spiking Neural Networks (SNNs) have inherent characteristics that lead to lower power consumption. We thus present a bit-serial SNN-like hardware architecture. By using counters, comparators, and an indexing scheme, the design effectively implements the sum-of-products inherent in neurons. In addition, we experimented with various strength-reduction methods to lower neural network resource usage. The proposed Spiking Hybrid Network (SHiNe), validated on an FPGA, achieves reasonable performance with low resource utilization, with some trade-off in hardware throughput and signal representation.
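As a rough illustration of the counter-and-comparator sum-of-products described above, consider the following toy Python sketch. It is not the SHiNe hardware design; the spike-count encoding, integer weights, and threshold value are assumptions made purely for illustration:

```python
def shine_like_neuron(spike_counts, weights, threshold):
    """Toy model of a spike-count neuron: accumulate one weight per input
    spike (a counter-style sum-of-products), then a comparator decides
    whether the accumulated total crosses the firing threshold."""
    acc = 0
    for count, w in zip(spike_counts, weights):
        for _ in range(count):   # bit-serial flavor: one weight added per spike
            acc += w
    return 1 if acc >= threshold else 0  # comparator emits the output spike

# Example: 3 inputs firing 2, 0, and 3 times with small integer weights.
print(shine_like_neuron([2, 0, 3], [1, 4, 2], threshold=7))  # -> 1 (2*1 + 3*2 = 8)
```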

15 pages, 6106 KiB  
Article
A Modularized Architecture of Multi-Branch Convolutional Neural Network for Image Captioning
by Shan He and Yuanyao Lu
Electronics 2019, 8(12), 1417; https://doi.org/10.3390/electronics8121417 - 28 Nov 2019
Cited by 12 | Viewed by 3356
Abstract
Image captioning is a comprehensive task spanning computer vision (CV) and natural language processing (NLP): the algorithm automatically generates descriptive text for an input image. In this paper, we present an end-to-end model that uses a deep convolutional neural network (CNN) as the encoder and a recurrent neural network (RNN) as the decoder. To obtain better image features for captioning, we propose a highly modularized multi-branch CNN, which increases accuracy while keeping the number of hyper-parameters unchanged. This strategy yields a simply designed network consisting of parallel sub-modules of the same structure. While traditional CNNs go deeper and wider to increase accuracy, our proposed method is more effective with a simple design, which is easier to optimize for practical applications. Experiments are conducted on the Flickr8k, Flickr30k, and MSCOCO datasets. Results demonstrate that our method achieves state-of-the-art performance in terms of caption quality.
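The split-transform-merge idea behind parallel sub-modules of identical structure can be sketched in a few lines of numpy. This is a generic illustration under assumed shapes and a two-layer branch, not the paper's exact module:

```python
import numpy as np

rng = np.random.default_rng(0)

def branch(x, w1, w2):
    """One sub-module: two tiny dense transforms with a ReLU between them.
    Every branch shares this topology; only its weights differ."""
    return np.maximum(x @ w1, 0) @ w2

def multi_branch_block(x, branches):
    """Aggregate parallel branches of identical structure by summation,
    then add the input back (a residual-style merge, assumed here)."""
    return sum(branch(x, w1, w2) for w1, w2 in branches) + x

d, bottleneck, num_branches = 64, 4, 8
branches = [(rng.normal(size=(d, bottleneck)) * 0.1,
             rng.normal(size=(bottleneck, d)) * 0.1)
            for _ in range(num_branches)]
y = multi_branch_block(rng.normal(size=(1, d)), branches)
print(y.shape)  # (1, 64)
```

Adding branches widens the block without introducing new kinds of hyper-parameters, since each branch repeats the same structure.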

19 pages, 8452 KiB  
Article
Real-Time Object Detection in Remote Sensing Images Based on Visual Perception and Memory Reasoning
by Xia Hua, Xinqing Wang, Ting Rui, Dong Wang and Faming Shao
Electronics 2019, 8(10), 1151; https://doi.org/10.3390/electronics8101151 - 11 Oct 2019
Cited by 10 | Viewed by 4464
Abstract
Aiming at the real-time detection of multiple objects and micro-objects in large-scene remote sensing images, a cascaded convolutional neural network real-time object-detection framework for remote sensing images is proposed, integrating visual perception with convolutional memory network reasoning. The detection framework is composed of two fully convolutional networks: the strengthened object self-attention pre-screening fully convolutional network (SOSA-FCN) and the object accurate detection fully convolutional network (AD-FCN). SOSA-FCN introduces a self-attention module to extract attention feature maps and constructs a depth feature pyramid that optimizes the attention feature maps with convolutional long short-term memory networks. It guides the acquisition of potential sub-regions of the object in the scene, reduces computational complexity, and enhances the network's ability to extract multi-scale object features, adapting to the complex backgrounds and small objects characteristic of large-scene remote sensing images. In AD-FCN, an object mask and an object orientation estimation layer are designed to achieve fine positioning of candidate frames. The performance of the proposed algorithm is compared with that of other advanced methods on NWPU_VHR-10, DOTA, UCAS-AOD, and other open datasets. The experimental results show that the proposed algorithm significantly improves the efficiency of object detection while maintaining detection accuracy, and its high adaptability gives it broad prospects for engineering applications.
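For readers unfamiliar with spatial self-attention, below is a minimal numpy sketch of a generic non-local attention map; SOSA-FCN's actual module is more elaborate, and the single-head form and scaling here are assumptions:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention_map(feat):
    """Non-local style self-attention over an (H, W, C) feature map:
    every spatial position attends to every other position."""
    h, w, c = feat.shape
    x = feat.reshape(h * w, c)
    attn = softmax((x @ x.T) / np.sqrt(c))   # (HW, HW) affinity matrix
    return (attn @ x).reshape(h, w, c)       # attention-weighted features

feat = np.random.default_rng(1).normal(size=(8, 8, 16))
print(self_attention_map(feat).shape)  # (8, 8, 16)
```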

12 pages, 3522 KiB  
Article
Detection of Wildfire Smoke Images Based on a Densely Dilated Convolutional Network
by Tingting Li, Enting Zhao, Junguo Zhang and Chunhe Hu
Electronics 2019, 8(10), 1131; https://doi.org/10.3390/electronics8101131 - 7 Oct 2019
Cited by 42 | Viewed by 4752
Abstract
Recently, many researchers have attempted to use convolutional neural networks (CNNs) for wildfire smoke detection. However, the application of CNNs to wildfire smoke detection still faces several issues, e.g., the high false-alarm rate of detection and the imbalance of training data. To address these issues, we propose a novel framework integrating conventional methods into a CNN for wildfire smoke detection, which consists of a candidate smoke region segmentation strategy and an advanced network architecture, namely the wildfire smoke dilated DenseNet (WSDD-Net). Candidate smoke region segmentation removes the complex backgrounds of the wildfire smoke images. The proposed WSDD-Net achieves multi-scale feature extraction by combining dilated convolutions with dense blocks. To address the dataset imbalance, an improved cross-entropy loss function, namely balanced cross entropy (BCE), is used instead of the original cross-entropy loss during training. The proposed WSDD-Net was evaluated on two smoke datasets, i.e., WS and Yuan, and achieved a high AR (99.20%) and a low FAR (0.24%). The experimental results demonstrate that the proposed framework has better detection capability under different negative-sample interferences.
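Balanced cross entropy commonly takes a weighted form like the one below; the beta weighting is one standard formulation and may differ in detail from the paper's BCE:

```python
import numpy as np

def balanced_cross_entropy(p, y, beta=0.9):
    """Class-balanced binary cross-entropy: the positive (smoke) term is
    weighted by beta and the negative term by (1 - beta), so the minority
    class is not drowned out by the abundant one."""
    p = np.clip(p, 1e-7, 1 - 1e-7)
    return -np.mean(beta * y * np.log(p) + (1 - beta) * (1 - y) * np.log(1 - p))

y = np.array([1, 0, 0, 0, 0, 0])              # imbalanced: 1 smoke vs. 5 background
p = np.array([0.7, 0.2, 0.1, 0.3, 0.2, 0.1])  # predicted smoke probabilities
print(balanced_cross_entropy(p, y))
```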

13 pages, 1194 KiB  
Article
Efficient Implementation of 2D and 3D Sparse Deconvolutional Neural Networks with a Uniform Architecture on FPGAs
by Deguang Wang, Junzhong Shen, Mei Wen and Chunyuan Zhang
Electronics 2019, 8(7), 803; https://doi.org/10.3390/electronics8070803 - 18 Jul 2019
Cited by 11 | Viewed by 3631
Abstract
Three-dimensional (3D) deconvolution is widely used in many computer vision applications. However, most previous works have focused only on accelerating two-dimensional (2D) deconvolutional neural networks (DCNNs) on Field-Programmable Gate Arrays (FPGAs), while the acceleration of 3D DCNNs, which have higher computational complexity and sparsity than 2D DCNNs, has not been studied in depth. In this paper, we focus on accelerating both 2D and 3D sparse DCNNs on FPGAs by proposing efficient schemes for mapping them onto a uniform architecture. Firstly, a pruning method removes unimportant network connections and increases the sparsity of the weights; after pruning, the number of DCNN parameters is reduced significantly without accuracy loss. Secondly, the remaining non-zero weights are encoded in coordinate (COO) format, reducing the memory demands of the parameters. Finally, to demonstrate the effectiveness of our work, we implement our accelerator design on the Xilinx VC709 evaluation platform for four real-life 2D and 3D DCNNs. After the first two steps, the storage required by the DCNNs is reduced by up to 3.9×. Results show that our accelerator outperforms our prior work by 2.5× to 3.6× in latency.
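The prune-then-encode pipeline can be illustrated briefly. The magnitude threshold below is a stand-in for the paper's pruning method, and the shapes are arbitrary:

```python
import numpy as np

def to_coo(weights, threshold=0.05):
    """Prune small-magnitude weights, then store the survivors in
    coordinate (COO) format: parallel index arrays plus a value array."""
    pruned = np.where(np.abs(weights) > threshold, weights, 0.0)
    idx = np.nonzero(pruned)                   # one index array per dimension
    return np.stack(idx, axis=1), pruned[idx]  # (nnz, ndim) coords, (nnz,) values

w = np.random.default_rng(2).normal(scale=0.05, size=(4, 4, 3, 3))
coords, values = to_coo(w)
dense_bytes, coo_bytes = w.nbytes, coords.nbytes + values.nbytes
print(f"nnz={len(values)}, dense={dense_bytes} B, coo={coo_bytes} B")
```

COO pays off once the weight tensor is sparse enough that storing (index, value) pairs beats storing the full dense array.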

20 pages, 3477 KiB  
Article
Jet Features: Hardware-Friendly, Learned Convolutional Kernels for High-Speed Image Classification
by Taylor Simons and Dah-Jye Lee
Electronics 2019, 8(5), 588; https://doi.org/10.3390/electronics8050588 - 27 May 2019
Cited by 3 | Viewed by 3808
Abstract
This paper explores a set of learned convolutional kernels which we call Jet Features. Jet Features are efficient to compute in software, easy to implement in hardware, and perform well on visual inspection tasks. Because Jet Features can be learned, they can be used in machine learning algorithms. Using Jet Features, we make significant improvements on our previous work, the Evolution Constructed Features (ECO Features) algorithm. Not only do we gain a 3.7× speedup in software without losing any accuracy on the CIFAR-10 and MNIST datasets, but Jet Features also allow us to implement the algorithm on an FPGA using only a fraction of its resources. We hope to apply the benefits of Jet Features to convolutional neural networks in the future.

22 pages, 3964 KiB  
Article
An Efficient Streaming Accelerator for Low Bit-Width Convolutional Neural Networks
by Qinyu Chen, Yuxiang Fu, Wenqing Song, Kaifeng Cheng, Zhonghai Lu, Chuan Zhang and Li Li
Electronics 2019, 8(4), 371; https://doi.org/10.3390/electronics8040371 - 27 Mar 2019
Cited by 4 | Viewed by 4148
Abstract
Convolutional Neural Networks (CNNs) have been widely applied in various fields, such as image recognition and speech processing, as well as many big-data analysis tasks. However, their large size and intensive computation hinder their deployment in hardware, especially on embedded systems with stringent latency, power, and area requirements. To address this issue, low bit-width CNNs have been proposed as a highly competitive candidate. In this paper, we propose an efficient, scalable accelerator for low bit-width CNNs based on a parallel streaming architecture. With a novel coarse-grain task partitioning (CGTP) strategy, the proposed accelerator, with heterogeneous computing units supporting multi-pattern dataflows, can nearly double the throughput for various CNN models on average. In addition, a hardware-friendly algorithm is proposed to simplify the activation and quantization process, reducing power dissipation and area overhead. Based on the optimized algorithm, an efficient reconfigurable three-stage activation-quantization-pooling (AQP) unit with a low-power staged blocking strategy is developed, which can process activation, quantization, and max-pooling operations simultaneously. Moreover, an interleaving memory scheduling scheme is proposed to support the streaming architecture. The accelerator is implemented in TSMC 40 nm technology with a core size of 0.17 mm². It achieves 7.03 TOPS/W energy efficiency and 4.14 TOPS/mm² area efficiency at 100.1 mW, which makes it a promising design for embedded devices.
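A generic uniform activation quantizer illustrates what a low bit-width datapath computes; the clipping range and bit-width below are assumptions, not the AQP unit's actual logic:

```python
import numpy as np

def quantize_uniform(x, bits=2):
    """Uniform affine quantization of activations to a low bit-width:
    clip to [0, 1], snap onto 2**bits - 1 evenly spaced levels, de-quantize."""
    levels = 2 ** bits - 1
    q = np.round(np.clip(x, 0.0, 1.0) * levels)
    return q / levels

acts = np.array([0.03, 0.4, 0.72, 1.3, -0.2])
print(quantize_uniform(acts, bits=2))  # 4 representable levels: 0, 1/3, 2/3, 1
```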

15 pages, 2318 KiB  
Article
Optimized Compression for Implementing Convolutional Neural Networks on FPGA
by Min Zhang, Linpeng Li, Hai Wang, Yan Liu, Hongbo Qin and Wei Zhao
Electronics 2019, 8(3), 295; https://doi.org/10.3390/electronics8030295 - 6 Mar 2019
Cited by 53 | Viewed by 7865
Abstract
The field-programmable gate array (FPGA) is widely considered a promising platform for convolutional neural network (CNN) acceleration. However, the large number of parameters in CNNs places heavy computing and memory burdens on FPGA-based CNN implementations. To solve this problem, this paper proposes an optimized compression strategy and realizes an FPGA-based accelerator for CNNs. Firstly, a reversed-pruning strategy is proposed, which reduces the number of parameters of AlexNet by a factor of 13× without accuracy loss on the ImageNet dataset; peak-pruning is further introduced for better compressibility, and quantization gives another 4× reduction with negligible loss of accuracy. Secondly, efficient storage techniques are presented to reduce the cache overhead of the convolutional layer and the fully connected layer, respectively. Finally, the effectiveness of the proposed strategy is verified with an accelerator implemented on a Xilinx ZCU104 evaluation board. By improving existing pruning techniques and the storage format of sparse data, we significantly reduce the size of AlexNet by 28×, from 243 MB to 8.7 MB. In addition, our accelerator achieves an overall performance of 9.73 fps on the compressed AlexNet. Compared with central processing unit (CPU) and graphics processing unit (GPU) platforms, our implementation achieves 182.3× and 1.1× improvements in latency and throughput, respectively, on the convolutional (CONV) layers of AlexNet, with 822.0× and 15.8× improvements in energy efficiency, respectively. This compression strategy provides a reference for other neural network applications, including CNNs, long short-term memory (LSTM) networks, and recurrent neural networks (RNNs).
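The prune-then-quantize recipe can be sketched as follows. Plain magnitude pruning stands in for the paper's reversed-pruning and peak-pruning, and the keep ratio and bit-width are illustrative assumptions:

```python
import numpy as np

def compress(weights, keep_ratio=0.08, bits=8):
    """Two-step compression sketch: magnitude pruning keeps only the
    largest-|w| fraction of weights, then survivors are linearly
    quantized to signed integers with a per-tensor scale."""
    flat = np.abs(weights).ravel()
    thresh = np.sort(flat)[int(len(flat) * (1 - keep_ratio))]
    mask = np.abs(weights) >= thresh
    pruned = weights * mask
    scale = np.abs(pruned).max() / (2 ** (bits - 1) - 1) or 1.0
    q = np.round(pruned / scale).astype(np.int8)
    return q, scale, mask

w = np.random.default_rng(3).normal(size=(256, 256)).astype(np.float32)
q, scale, mask = compress(w)
print(f"kept {mask.mean():.1%} of weights; int8 non-zeros = {np.count_nonzero(q)}")
```

The surviving integer weights would then be stored in a sparse format, which is where the storage-technique portion of the paper takes over.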

18 pages, 5742 KiB  
Article
An FPGA-Based CNN Accelerator Integrating Depthwise Separable Convolution
by Bing Liu, Danyin Zou, Lei Feng, Shou Feng, Ping Fu and Junbao Li
Electronics 2019, 8(3), 281; https://doi.org/10.3390/electronics8030281 - 3 Mar 2019
Cited by 79 | Viewed by 10261
Abstract
The Convolutional Neural Network (CNN) has been used in many fields, such as image classification, face detection, and speech recognition, and has achieved remarkable results. Compared to GPUs (graphics processing units) and ASICs, an FPGA (field-programmable gate array)-based CNN accelerator has great advantages due to its low power consumption and reconfigurability. However, the FPGA's extremely limited resources and the CNN's huge number of parameters and computational complexity pose great challenges to the design. Based on the ZYNQ heterogeneous platform, and coordinating resource and bandwidth issues with the roofline model, the CNN accelerator we designed can accelerate both standard convolution and depthwise separable convolution with high hardware resource utilization. The accelerator handles network layers of different scales through parameter configuration, and it maximizes bandwidth and achieves a fully pipelined design by using a data-stream interface and ping-pong on-chip caching. The experimental results show that our accelerator achieves 17.11 GOPS for 32-bit floating point while also accelerating depthwise separable convolution, which gives it obvious advantages over other designs.
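Depthwise separable convolution factors a standard convolution into a per-channel spatial filter followed by a 1×1 pointwise mix, which is what makes it attractive on resource-limited FPGAs. A minimal numpy reference (a sketch, not the accelerator's implementation) is:

```python
import numpy as np

def depthwise_separable_conv(x, dw, pw):
    """x: (H, W, Cin); dw: (k, k, Cin), one spatial filter per channel;
    pw: (Cin, Cout), the 1x1 pointwise mix. 'Valid' padding, stride 1."""
    h, w, cin = x.shape
    k = dw.shape[0]
    oh, ow = h - k + 1, w - k + 1
    depth = np.empty((oh, ow, cin))
    for i in range(oh):                 # depthwise: filter each channel alone
        for j in range(ow):
            depth[i, j] = np.einsum('klc,klc->c', x[i:i+k, j:j+k], dw)
    return depth @ pw                   # pointwise: mix channels at each pixel

x = np.random.default_rng(4).normal(size=(8, 8, 16))
dw = np.random.default_rng(5).normal(size=(3, 3, 16))
pw = np.random.default_rng(6).normal(size=(16, 32))
print(depthwise_separable_conv(x, dw, pw).shape)  # (6, 6, 32)
# Parameters: 3*3*16 + 16*32 = 656, vs. 3*3*16*32 = 4608 for standard conv.
```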

18 pages, 4563 KiB  
Article
Energy-Efficient Gabor Kernels in Neural Networks with Genetic Algorithm Training Method
by Fanjie Meng, Xinqing Wang, Faming Shao, Dong Wang and Xia Hua
Electronics 2019, 8(1), 105; https://doi.org/10.3390/electronics8010105 - 18 Jan 2019
Cited by 22 | Viewed by 6570
Abstract
Deep-learning convolutional neural networks (CNNs), with their multilayer structures, have proven successful in various cognitive applications. However, their high computational energy and time requirements hinder their practical application; hence, the realization of highly energy-efficient and fast-learning neural networks has aroused great interest. In this work, we address the computing-resource-saving problem by developing a deep model, termed the Gabor convolutional neural network (Gabor CNN), which incorporates highly expression-efficient Gabor kernels into CNNs. In order to effectively imitate the structural characteristics of traditional weight kernels, we improve upon traditional Gabor filters to obtain stronger frequency and orientation representations. In addition, we propose a procedure for training Gabor CNNs, termed the fast training method (FTM). In FTM, we design a new training method based on the multipopulation genetic algorithm (MPGA) and an evaluation structure to optimize the improved Gabor kernels, while training the rest of the Gabor CNN parameters with back-propagation. Training the improved Gabor kernels with MPGA is much more energy-efficient, requiring fewer samples and iterations. Simple tasks, such as character recognition on the Mixed National Institute of Standards and Technology database (MNIST), traffic sign recognition on the German Traffic Sign Recognition Benchmark (GTSRB), and face detection on the Olivetti Research Laboratory database (ORL), are implemented using the LeNet architecture. The experimental results for the Gabor CNN and the MPGA training method show a 17–19% reduction in computational energy and time and an 18–21% reduction in storage requirements, with a less than 1% decrease in accuracy. By incorporating highly expression-efficient Gabor kernels into CNNs, we eliminate a significant fraction of the computation-hungry components of the training process.
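For reference, a classic (unimproved) Gabor kernel is a sinusoid windowed by a Gaussian envelope; the parameter values below are arbitrary defaults, and the paper's improved kernels differ:

```python
import numpy as np

def gabor_kernel(size=7, theta=0.0, sigma=2.0, lam=4.0, psi=0.0, gamma=0.5):
    """Real-valued Gabor kernel: a sinusoid at wavelength lam and
    orientation theta, windowed by an anisotropic Gaussian envelope."""
    half = size // 2
    y, x = np.mgrid[-half:half + 1, -half:half + 1]
    xr = x * np.cos(theta) + y * np.sin(theta)
    yr = -x * np.sin(theta) + y * np.cos(theta)
    return (np.exp(-(xr**2 + (gamma * yr)**2) / (2 * sigma**2))
            * np.cos(2 * np.pi * xr / lam + psi))

# A small bank covering four orientations, as a convolutional layer might use.
bank = np.stack([gabor_kernel(theta=t)
                 for t in np.linspace(0, np.pi, 4, endpoint=False)])
print(bank.shape)  # (4, 7, 7)
```

Because such kernels are fully described by a handful of parameters (theta, sigma, lam, ...), a genetic algorithm can search that small parameter space instead of back-propagating through every kernel weight.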

19 pages, 752 KiB  
Article
A Uniform Architecture Design for Accelerating 2D and 3D CNNs on FPGAs
by Zhiqiang Liu, Paul Chow, Jinwei Xu, Jingfei Jiang, Yong Dou and Jie Zhou
Electronics 2019, 8(1), 65; https://doi.org/10.3390/electronics8010065 - 7 Jan 2019
Cited by 51 | Viewed by 5318
Abstract
Three-dimensional convolutional neural networks (3D CNNs) have gained popularity in many complicated computer vision applications. Many customized FPGA-based accelerators have been proposed for 2D CNNs, but very few for 3D CNNs, which are far more computationally intensive; the design space for 3D CNN acceleration is also further expanded by the extra dimension, making it a big challenge to accelerate 3D CNNs on FPGAs. Motivated by the finding that the computation patterns of 2D and 3D CNNs are very similar, we propose a uniform architecture design for accelerating both 2D and 3D CNNs in this paper. The uniform architecture is based on the idea of mapping convolutions to matrix multiplications. A customized mapping module generates the feature-matrix tilings without storing the entire enlarged feature matrix on-chip or off-chip, a splitting strategy reconstructs a convolutional layer to fit the on-chip memory capacity, and a 2D multiply-and-accumulate (MAC) array computes the matrix multiplications efficiently. For demonstration, we implement an accelerator prototype with a high-level synthesis (HLS) methodology on a Xilinx VC709 board and test it on three typical CNN models: AlexNet, VGG16, and C3D. Experimental results show that the accelerator achieves state-of-the-art throughput on both 2D and 3D CNNs, with much better energy efficiency than the CPU and GPU.
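The convolution-to-matrix-multiplication mapping (often called im2col) can be shown compactly. Note that, unlike the paper's mapping module, this toy materializes the whole enlarged feature matrix, which is exactly what the accelerator avoids:

```python
import numpy as np

def im2col_matmul_conv(x, kernels):
    """Map a 2D convolution to a matrix multiplication (im2col):
    x: (H, W); kernels: (Cout, k, k). 'Valid' padding, stride 1."""
    k = kernels.shape[1]
    oh, ow = x.shape[0] - k + 1, x.shape[1] - k + 1
    # Each row of the feature matrix is one flattened k-by-k input patch.
    cols = np.stack([x[i:i+k, j:j+k].ravel()
                     for i in range(oh) for j in range(ow)])   # (oh*ow, k*k)
    out = cols @ kernels.reshape(len(kernels), -1).T           # (oh*ow, Cout)
    return out.T.reshape(len(kernels), oh, ow)

x = np.arange(25, dtype=float).reshape(5, 5)
kernels = np.random.default_rng(7).normal(size=(2, 3, 3))
print(im2col_matmul_conv(x, kernels).shape)  # (2, 3, 3)
```

The same trick extends to 3D by flattening k×k×k voxel patches instead of k×k patches, which is what makes a single MAC array serviceable for both cases.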

Review


25 pages, 376 KiB  
Review
A Review of Binarized Neural Networks
by Taylor Simons and Dah-Jye Lee
Electronics 2019, 8(6), 661; https://doi.org/10.3390/electronics8060661 - 12 Jun 2019
Cited by 166 | Viewed by 13520
Abstract
In this work, we review Binarized Neural Networks (BNNs). BNNs are deep neural networks that use binary values for activations and weights instead of full-precision values. With binary values, BNNs can execute computations using bitwise operations, which reduces execution time, and their model sizes are much smaller than those of their full-precision counterparts. While the accuracy of a BNN model is generally lower than that of a full-precision model, BNNs have been closing the accuracy gap and are becoming more accurate on larger datasets like ImageNet. BNNs are also good candidates for deep-learning implementations on FPGAs and ASICs due to their bitwise efficiency. We give a tutorial on the general BNN methodology and review various contributions, implementations, and applications of BNNs.
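The bitwise speedup comes from evaluating {-1, +1} dot products with XNOR and popcount, a formulation that is standard across the BNN literature:

```python
def binary_dot(a_bits, w_bits, n):
    """Dot product of two {-1, +1} vectors packed as n-bit integers:
    XNOR marks agreeing positions, popcount tallies them, and the
    result is matches - mismatches = 2 * popcount(XNOR) - n."""
    agreements = bin(~(a_bits ^ w_bits) & ((1 << n) - 1)).count("1")
    return 2 * agreements - n

# Vectors (+1,-1,+1,+1) and (+1,+1,-1,+1), encoded with bit 1 = +1, bit 0 = -1.
a, w = 0b1011, 0b1101
print(binary_dot(a, w, 4))  # 0, matching (+1)(+1)+(-1)(+1)+(+1)(-1)+(+1)(+1)
```

One machine word thus carries 32 or 64 multiply-accumulates at once, which is the source of the execution-time and model-size savings the review surveys.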
