Article

A Collaborative CPU Vector Offloader: Putting Idle Vector Resources to Work on Commodity Processors

1 Department of Artificial Intelligence, Hanyang University, Seoul 04763, Korea
2 Department of Computer Science, Hanyang University, Seoul 04763, Korea
* Author to whom correspondence should be addressed.
Electronics 2021, 10(23), 2960; https://doi.org/10.3390/electronics10232960
Submission received: 4 November 2021 / Revised: 24 November 2021 / Accepted: 26 November 2021 / Published: 28 November 2021
(This article belongs to the Section Computer Science & Engineering)

Abstract

Most modern processors contain a vector accelerator or internal vector units for the fast computation of large target workloads. However, accelerating applications with vector units is difficult because the underlying data parallelism must be exposed explicitly using vector-specific instructions. As a result of these challenges in vector code generation, vector units are often underutilized or remain idle. To address this underutilization, we propose the Vector Offloader, which executes scalar programs by treating the vector unit as a scalar operation unit. Using vector masking, an appropriate partition of the vector unit can be used to execute scalar instructions. To utilize all execution units efficiently, including the vector unit, the Vector Offloader runs the target application concurrently on both the central processing unit (CPU) and the decoupled vector unit by offloading parts of the program to the vector unit. Furthermore, a profile-guided optimization technique determines the offloading ratio that best balances the load between the CPU and the vector unit. We implemented the Vector Offloader on a RISC-V infrastructure with a Hwacha vector unit and evaluated its performance using the Polybench benchmark set. In a field-programmable gate array (FPGA)-level evaluation, the proposed technique achieved a speedup of up to 1.31× over CPU-only execution.
Keywords: vector processors; job offloading; resource utilization; data parallelism; heterogeneous system architectures
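
The load-balancing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' profile-guided procedure: it assumes a simple linear cost model in which each unit's run time scales with its share of the work, and the function names (`balanced_offload_ratio`, `concurrent_time`) and the profiled times are hypothetical values chosen for illustration only.

```python
def balanced_offload_ratio(cpu_time: float, vector_time: float) -> float:
    """Fraction of the work to offload so both units finish together.

    cpu_time:    profiled time for the CPU to run the whole workload alone.
    vector_time: profiled time for the vector unit to run it alone.
    Assumes run time scales linearly with the share of work (an assumption).
    """
    if cpu_time <= 0 or vector_time <= 0:
        raise ValueError("profiled times must be positive")
    # Solve (1 - r) * cpu_time == r * vector_time for r.
    return cpu_time / (cpu_time + vector_time)


def concurrent_time(cpu_time: float, vector_time: float, ratio: float) -> float:
    """Elapsed time when both units run concurrently on their shares."""
    return max((1.0 - ratio) * cpu_time, ratio * vector_time)


# Hypothetical profile: the vector unit, used as a scalar unit, is 3x slower.
cpu_time = 10.0
vector_time = 30.0

ratio = balanced_offload_ratio(cpu_time, vector_time)
total = concurrent_time(cpu_time, vector_time, ratio)
speedup = cpu_time / total
print(f"offload ratio = {ratio:.2f}, speedup over CPU-only = {speedup:.2f}x")
```

Under this model, even a unit that is much slower than the CPU shortens the total run time when it takes a proportionally small share of the work, which is why the offloading ratio has to be tuned rather than fixed.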

Share and Cite

MDPI and ACS Style

Son, Y.; Kang, S.; Um, H.; Lee, S.; Ham, J.; Kim, D.; Park, Y. A Collaborative CPU Vector Offloader: Putting Idle Vector Resources to Work on Commodity Processors. Electronics 2021, 10, 2960. https://doi.org/10.3390/electronics10232960

