Article

Flare: An FPGA-Based Full Precision Low Power CNN Accelerator with Reconfigurable Structure

School of Electronics and Information Technology (School of Microelectronics), Sun Yat-sen University, Guangzhou 510275, China
*
Author to whom correspondence should be addressed.
Sensors 2024, 24(7), 2239; https://doi.org/10.3390/s24072239
Submission received: 3 March 2024 / Revised: 20 March 2024 / Accepted: 29 March 2024 / Published: 31 March 2024
(This article belongs to the Section Electronic Sensors)

Abstract

Convolutional neural networks (CNNs) have significantly advanced various fields; however, their computational demands and power consumption have escalated, posing challenges for deployment in low-power scenarios. To address this issue and facilitate the application of CNNs in power-constrained environments, the development of dedicated CNN accelerators is crucial. Prior research has predominantly concentrated on low-precision CNN accelerators built from code generated by high-level synthesis (HLS) tools. Unfortunately, these approaches often fail to efficiently utilize the computational resources of field-programmable gate arrays (FPGAs) and do not extend well to full-precision scenarios. To overcome these limitations, we use vector dot products to unify the convolution and fully connected layers. By treating the row vectors of the input feature maps as the fundamental processing unit, we balance processing latency against resource consumption while eliminating data rearrangement time. Furthermore, an accurate design space exploration (DSE) model is established to identify the optimal design point for each CNN layer, and dynamic partial reconfiguration is employed to maximize each layer’s access to computational resources. Our approach is validated through implementations of AlexNet and VGG16 on 7A100T and ZU15EG platforms, respectively, achieving average convolutional-layer throughputs of 28.985 GOP/s and 246.711 GOP/s at full precision. Notably, the proposed accelerator demonstrates remarkable power efficiency, with maximum improvements of 23.989 and 15.376 times over current state-of-the-art FPGA implementations.
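To illustrate the unification described in the abstract, the sketch below expresses both a fully connected layer and a single-channel, stride-1, unpadded convolution as dot products over row vectors of the input. It is a minimal conceptual model in Python/NumPy, not the authors' hardware datapath; the function names and the simplifying single-channel assumptions are ours.

import numpy as np

def dot_product(a, b):
    # The shared fundamental operation: a vector dot product.
    return float(np.dot(a, b))

def fc_layer(x, weights):
    # Fully connected layer: each output neuron is one dot product
    # between the input vector and a row of the weight matrix.
    return np.array([dot_product(x, w_row) for w_row in weights])

def conv_layer(ifm, kernel):
    # Single-channel, stride-1, unpadded 2-D convolution lowered to
    # dot products over row vectors of the input feature map.
    H, W = ifm.shape
    K = kernel.shape[0]
    out = np.zeros((H - K + 1, W - K + 1))
    for oy in range(H - K + 1):
        for ox in range(W - K + 1):
            acc = 0.0
            for ky in range(K):
                # One dot product per kernel row against the matching
                # segment of the corresponding input row.
                acc += dot_product(ifm[oy + ky, ox:ox + K], kernel[ky])
            out[oy, ox] = acc
    return out

Because both layer types reduce to the same dot-product primitive, a single hardware dot-product unit can, in principle, serve convolutional and fully connected layers alike, which is the reuse the abstract alludes to.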
Keywords: FPGA accelerator; convolutional neural networks; full precision; design space exploration; dynamic partial reconfiguration

Share and Cite

Xu, Y.; Luo, J.; Sun, W. Flare: An FPGA-Based Full Precision Low Power CNN Accelerator with Reconfigurable Structure. Sensors 2024, 24, 2239. https://doi.org/10.3390/s24072239
