Article

Evaluating ARM and RISC-V Architectures for High-Performance Computing with Docker and Kubernetes

by Vedran Dakić 1, Leo Mršić 2,*, Zdravko Kunić 2 and Goran Đambić 3
1 Department of Cybersecurity and System Engineering, Algebra University, 10000 Zagreb, Croatia
2 Department of Information Systems and Business Analytics, Algebra University, 10000 Zagreb, Croatia
3 Department of Software Engineering, Algebra University, 10000 Zagreb, Croatia
* Author to whom correspondence should be addressed.
Electronics 2024, 13(17), 3494; https://doi.org/10.3390/electronics13173494
Submission received: 11 August 2024 / Revised: 28 August 2024 / Accepted: 29 August 2024 / Published: 3 September 2024
(This article belongs to the Section Computer Science & Engineering)

Abstract:
This paper thoroughly assesses the ARM and RISC-V architectures in the context of high-performance computing (HPC). It includes an analysis of Docker and Kubernetes integration. Our study aims to evaluate and compare these systems’ performance, scalability, and practicality in a general context and then assess the impact they might have on special use cases, like HPC. ARM-based systems exhibited better performance and seamless integration with Docker and Kubernetes, underscoring their advanced development and effectiveness in managing high-performance computing workloads. On the other hand, despite their open-source architecture, RISC-V platforms presented considerable intricacy and difficulties in working with Kubernetes, which hurt their overall effectiveness and ease of management. The results of our study offer valuable insights into the practical consequences of implementing these architectures for HPC, highlighting ARM’s preparedness and the potential of RISC-V while acknowledging the increased complexity and significant trade-offs involved at this point.

1. Introduction

HPC has played a leading role in pushing technological improvements, particularly in scientific research, weather forecasting, financial modeling, and other sectors that rely on extensive computational capabilities. Historically, the x86 architecture, introduced by Intel and AMD, has been the prevailing force in the field of HPC. The advancement of this design, characterized by ongoing enhancements in computational capability, parallel processing, and energy efficiency, has facilitated the creation of some of the most formidable supercomputers globally. The Summit supercomputer at Oak Ridge National Laboratory and Frontera at the Texas Advanced Computing Center illustrate the exceptional performance of the x86 architecture in HPC.
Nevertheless, the increasing need for computational capacity, along with the necessity for energy-saving solutions, has stimulated interest in alternative architectures. ARM, initially developed for energy-efficient use in mobile devices, has gained attention in HPC due to its energy efficiency and expanding processing capabilities. ARM made its HPC debut with the Fujitsu A64FX processor in the Fugaku supercomputer. This processor has achieved remarkable success in terms of both performance and power efficiency, leading it to secure the top position on the TOP500 list of the world’s most powerful supercomputers.
The ARM and RISC-V architectures are separate but very capable ISAs (Instruction Set Architectures) in contemporary computing. ARM has become a prominent and influential player, especially in the mobile and embedded industries, thanks to its well-developed network of resources, comprehensive assistance, and demonstrated history of success. Renowned for its exceptional power economy and superior performance, this technology is the favored option for various consumer electronics, from smartphones to IoT devices. The proprietary nature of ARM guarantees a resilient environment with significant support from the industry, but it also restricts the ability to customize and adapt.
Conversely, RISC-V is a developing open-source option quickly gaining popularity, especially in academic and specialized sectors. RISC-V’s open ISA enables developers to customize their designs for specific applications without being limited by licensing fees. This level of transparency promotes creativity and cooperation among individuals worldwide. Although ARM offers a dependable and well-established platform, RISC-V’s ability to be customized and its open nature make it a potentially revolutionary technology, especially in fields that require specialized computing solutions.
The HPC community has shown great interest in using both architectures: ARM for its already well-known capabilities, and RISC-V for its customizability and the advantages of an open-source paradigm. However, as discussed in Section 4, the intricate nature of incorporating RISC-V into contemporary software ecosystems, including containerization technologies such as Docker and orchestration tools like Kubernetes, poses distinct difficulties.
Docker and Kubernetes provide a streamlined and effective method for deploying and overseeing applications in many settings, improving scalability, uniformity, and security. In high-performance computing, isolated and reproducible environments are a must; for science experiments, reproducibility and security are paramount to the scientific method. This approach can reduce the effort required for deployment and enhance the efficient exploitation of resources. Standardization expedites development, cooperation, and portability while guaranteeing the smooth operation of applications across many system architectures. Containers are beginning to play a vital role in optimizing HPC workflows, lowering overhead when compared to physical or virtualized workloads, and minimizing costs by enabling more efficient utilization of hardware resources. Therefore, introducing containerization technologies like Docker and Kubernetes can fundamentally transform the management and deployment of HPC workloads. These technologies provide the capacity to quickly move and adapt applications to different computer environments while improving efficiency. By integrating these technologies with ARM and RISC-V platforms, HPC can achieve higher levels of performance and efficiency. However, combining these systems has challenges, mainly due to the variances in architecture and varied levels of software support.
This paper is organized as follows: the following sections discuss related research and the basics of ARM and RISC-V as platforms for Docker and Kubernetes. We then describe our experimental setup and methodology, followed by the performance evaluations, a discussion of performance and feasibility for HPC environments, future work, and the conclusion.

2. Related Work

Researchers presented a low-power, fault-tolerant heterogeneous computer featuring multi-core CPUs, GPUs, and FPGAs for small satellite applications in 2015. The system’s x86 CPU allows for extensive software compatibility in space environments. Integrating GPU and FPGA enhances computational performance, achieving TFLOP-level processing power. Challenges include ensuring radiation tolerance and safety-critical operations, previously unaddressed for x86 processors in space. Future research could focus on further improving radiation tolerance and developing more robust software ecosystems for space applications [1].
The integration of FPGAs into heterogeneous system architecture (HSA)-compliant systems was also researched, addressing the challenge of interfacing FPGAs with CPUs. The LibHSA IP library simplifies this integration, allowing FPGA accelerators to utilize high-level language toolchains. Demonstrations with image processors show significant performance improvements. Problems include the lack of FPGA models in HSA standards and the complexities of interfacing. Future work could aim to standardize FPGA models in HSA and further simplify FPGA–CPU integration [2].
Feng et al. (2016) introduced HeteroSim, a simulator for heterogeneous CPU–FPGA systems. It supports architectural exploration by simulating x86 multi-cores connected to FPGAs, allowing for performance analysis and the optimization of memory hierarchies. Challenges include the absence of integrated simulators for system-level architectural exploration. Future research may enhance simulation accuracy and extend the tool to support a broader range of architectures and applications [3].
A 2017 survey integrated heterogeneous systems and collaborative computing techniques, focusing on CPU-GPU and CPU–FPGA systems. OpenCL’s effectiveness for programming these systems was evaluated, highlighting limitations and suggesting future programming languages. Problems include inadequate support for memory coherence and shared virtual memory in current programming interfaces. Future research could improve programming models to enhance collaboration between heterogeneous devices [4].
CPU–GPU heterogeneous computing techniques were also reviewed in 2015, discussing workload partitioning and the design of CPU–GPU systems. They address performance and energy efficiency challenges and review benchmark suites for evaluating such systems. Critical problems include the complexity of optimizing workload distribution and the need for better performance models. Future research may focus on developing more effective optimization techniques and comprehensive benchmark suites [5].
In 2016, a performance evaluation of APIs and partially reconfigurable hardware accelerators on heterogeneous computing platforms was proposed. The authors use the Parallella single-board computer to compare matrix-vector multiplication performance across different accelerators. They find PR hardware accelerators the most efficient for increasing data processing. Challenges include optimizing API performance and managing hardware reconfiguration. Future work could improve API efficiency and explore new applications for PR hardware accelerators [6].
Researchers also presented different platforms combining CPUs, GPUs, and FPGAs for high-performance and embedded systems. These platforms achieve significant speed-ups and energy savings compared to dual-device systems. Challenges include managing data transfers and maximizing parallelism. Future research could optimize workload distribution and explore new high-performance applications for trigeneous platforms [7].
HERO (the open Heterogeneous Research Platform), a research platform combining RISC-V cores on FPGAs with ARM host processors, was introduced in 2017. HERO includes a comprehensive software stack supporting OpenMP and enabling the rapid exploration of heterogeneous systems. Key challenges involve integrating and scaling PMCA (Programmable Many-Core Accelerator) architectures. Future research could expand HERO’s capabilities and apply it to more diverse application domains [8].
In 2017, a new programming model that supports approximation semantics in heterogeneous architectures comprising CPUs, GPUs, and FPGAs was proposed. The model allows for dynamic performance–quality trade-offs. Challenges include ensuring seamless execution and effective approximation control. Future research may refine the approximation model and expand its application to other domains [9].
A 2015 survey reviews performance tools for heterogeneous systems, focusing on GPUs and Intel’s Xeon Phi. It describes development frameworks and performance models for accelerator-based systems. Key issues include the need for more accurate simulators and comprehensive performance tools. Future research could enhance performance-modeling techniques and develop more robust simulation tools for heterogeneous computing [10].
A notable study introduced a synergistic computing framework built on the Halide programming model, designed to enhance performance in heterogeneous systems by leveraging both CPUs and GPUs. The framework addresses critical issues like data coherence, workload partitioning, and job dispatching. Despite significant performance gains, the complexity of writing cooperative Halide programs remains challenging, necessitating further simplifications and optimizations in future research [11]. FPGA implementations have shown significant potential in portable and energy-efficient applications. An innovative solution for portable DNA sequencing using a base-calling hardware architecture based on RISC-V demonstrated a 1.95× energy efficiency improvement over x86 and 38% over ARM. Despite these advancements, further miniaturization and computational load reductions are required for broader adoption [12].
Another study explored the integration of novel computing architectures, including ARM and RISC-V CPUs and FPGAs, into HPC education at Georgia Tech. The collaborative workflow highlights the benefits and challenges of integrating these architectures into educational curricula. Key sticking points included the complexity of managing diverse hardware and software environments. Future research should develop more streamlined tools and methodologies to facilitate the integration of heterogeneous computing architectures in educational settings [13]. An FPGA-based research platform, HEROv2, combines RISC-V cores with ARMv8 or RV64 host processors, enabling significant application speed-ups. HEROv2’s complexity suggests a need for further simplifications in hardware–software integration. Future research could enhance the platform’s compiler capabilities and expand its application range [14].
PCS, a productive computational science platform, aims to unify multiple programming models for cluster-scale heterogeneous computing, emphasizing FPGA acceleration for graph-centric workloads. The platform presents significant advancements but faces challenges in optimizing FPGA designs for specific applications and ensuring compatibility with diverse hardware. Future research should focus on developing more adaptable FPGA models and enhancing integration [15]. Combining multiple CPU and CGRA (Coarse-Grained Reconfigurable Architecture) cores is explored, presenting implementation results for digital audio and machine learning applications. Despite the promising results, the absence of commercial CGRAs poses challenges regarding widespread adoption. Future research should aim at improving performance metrics and exploring the commercial applications of CGRA-based architectures to validate their practical viability [16].
The study on accelerating elliptic curve cryptography on NVIDIA Tegra X2 embedded GPU platforms achieved significant speedups in cryptographic operations, outperforming ARM CPUs and FPGA implementations in terms of power efficiency. The reliance on specific hardware platforms limits the generalizability of the results, suggesting future research should explore broader hardware compatibility and further optimization techniques [17].
Heterogeneous computing is crucial in AI and Big Data applications within high-energy physics. Integrating GPUs, FPGAs, and other specialized processors presents challenges in performance optimization and energy efficiency. Future research should focus on developing unified frameworks and tools to manage heterogeneous systems effectively, ensuring seamless integration and optimal resource utilization across various computational tasks [18]. The FLIA architecture abstracts heterogeneous computing for mobile GPUs and FPGAs, supporting task partition, communication, and synchronization. Despite impressive performance gains, challenges include the complexity of developing applications for heterogeneous architectures and managing inter-processor communication. Future research should simplify the development process and enhance the proposed architecture’s scalability for broader applications [19].
Another significant contribution is the development of Molecule, a serverless computing system utilizing heterogeneous computers, including general-purpose devices like Nvidia DPUs and domain-specific accelerators such as FPGAs and GPUs. By leveraging heterogeneous hardware, Molecule significantly improves function density and application performance. Key innovations include the XPU-Shim for multi-OS systems and vectorized sandbox for hardware abstraction. Despite the performance improvements, challenges remain in optimizing startup and communication latency. Future research should refine these optimizations and expand the system’s compatibility with additional hardware types [20].
Energy efficiency is another critical focus area. A Xilinx Zynq MPSoC device approach demonstrates significant energy savings and performance improvements by simultaneously executing tasks and applying adaptive voltage scaling. Integrating GPUs and FPGAs as specialized hardware units showcases the potential of heterogeneous computing for energy-efficient applications. However, optimizing the balance between performance and energy consumption across different tasks remains challenging. Future research should enhance the adaptability of voltage-scaling techniques and explore additional use cases for reconfigurable MPSoCs [21].
Heterogeneous computing also plays a crucial role in deep learning. Combining FPGAs and GPUs for accelerating deep neural networks in embedded systems demonstrates that the direct hardware mapping of CNNs on FPGAs outperforms GPU implementations regarding energy efficiency and execution time. However, the resource-intensive nature of FPGA-based implementations necessitates a hybrid approach. The study shows that heterogeneous FPGA–GPU acceleration reduces energy consumption and latency compared to GPU-only solutions. Challenges include managing communication overheads and optimizing the division of tasks between FPGAs and GPUs. Future research should focus on refining hybrid acceleration techniques and expanding the range of supported neural network architectures [22].
Security concerns in heterogeneous systems are also highlighted. A survey on electrical-level attacks on CPUs, FPGAs, and GPUs discusses how vulnerabilities in individual components can affect the overall security of heterogeneous architectures. Ensuring the security of multitenant environments and developing robust defense mechanisms are vital challenges. Future research should investigate the potential for cross-component attacks in heterogeneous systems and develop comprehensive security frameworks to mitigate these risks [23].
The demand for computing power and the diversity of computational problems have led to exploring the non-uniform decomposition of data domains to improve fluid flow simulation performance on hybrid architectures. Evaluations of systems combining x86 CPUs with GPUs and ARM CPUs with FPGAs show performance improvements of up to 15.15% with non-uniform partitioning. Despite the benefits, challenges include managing the complexity of data decomposition and optimizing the collaboration between different hardware components. Future research should refine data partitioning techniques and expand the applicability of collaborative execution methods to other computational problems [24].
Scheduling frameworks for heterogeneous computing platforms combining CPU and FPGA resources effectively increase performance and reduce energy consumption. Extending a high-level C++ template-based scheduler to the Xeon+FPGA platform demonstrates performance improvements of up to 8× compared to CPU-only solutions. Key challenges include optimizing the scheduler for dynamic and adaptive task allocation. Future research should enhance the framework’s adaptability to different application domains and explore additional HPC benchmarks to validate its effectiveness [25]. The study on accelerating coupled-cluster calculations with GPUs presents an algorithm adapted for heterogeneous computing platforms with multicore CPUs and GPUs. The authors demonstrate significant speedups on pre-exascale and exascale supercomputers. Challenges include managing the limited memory space of GPUs and minimizing CPU–GPU data transfers. Future research should optimize data tiling strategies and expand the algorithm’s applicability to other computational chemistry problems [26].
Another study reviews discussions from critical workshops on heterogeneous computing, highlighting the integration of multicore CPUs with GPUs, FPGAs, and Intel Xeon Phis in HPC clusters. The workshops emphasized challenges such as maximizing efficiency, resource utilization, and energy optimization. Future research should aim to develop specialized programming environments and tools to address these challenges and foster innovation in heterogeneous computing [27]. The development of Xar-Trek, a compiler and runtime framework, allows execution migration between heterogeneous ISA CPUs and FPGAs at runtime. By compiling applications for multiple CPU ISAs and FPGA acceleration, Xar-Trek enables dynamic and transparent migration. Challenges include optimizing scheduling policies and managing diverse hardware configurations. Future research should focus on refining scheduling heuristics and expanding the framework’s support to additional hardware architectures for broader applicability [28].
A historical perspective on heterogeneous computing highlights the complexity of integrating diverse processors like RISC, GPUs, TPUs, and FPGAs. It emphasizes the need for advanced compilers to map programming languages to various hardware platforms while ensuring optimal performance. Key challenges include maintaining compatibility and efficiency across heterogeneous systems. Future research should focus on developing robust compiler technologies and enhancing the interoperability of different processing units to exploit the potential of heterogeneous computing entirely [29].
Advancements in heterogeneous computing have revolutionized HPC platforms, with significant developments in both hardware and software. A paper from 2022 presented algorithmic and software development advancements, emphasizing the need for novel programming environments to harness the potential of heterogeneous systems. The authors note the increasing adoption of GPU accelerators and highlight the challenges in achieving performance portability and energy efficiency across diverse architectures. Future research could explore further integrating new accelerator technologies and improving programming models to enhance efficiency and performance [30].
The impact of heterogeneous computing systems cannot be overstated, especially for complex scientific discovery workflows. They highlight the transition from homogeneous to heterogeneous architectures as a response to the limits of Moore’s law, with a focus on sustainable computing through domain-specific hardware. However, the increased complexity in system design and the need for better integration of multidisciplinary workflows are significant challenges. Potential future research could aim to optimize these systems’ design to balance performance and energy efficiency [31].
In a paper from 2021, challenges of heterogeneous migration between x86 and ARM machines are addressed, focusing on techniques to simplify the migration process. The research highlights the significant overhead induced by transforming the execution state during migration, which offsets the benefits. The goal is to create a uniform address space to simplify migration, but performance and energy efficiency remain problematic. Future research could explore more efficient methods of state transformation and address space unification [32].
In 2022, a runtime framework for performance portable programming on distributed heterogeneous systems was introduced. By hiding hardware idiosyncrasies and optimizing resource utilization, substantial performance and scalability improvements were noticed. Despite these advancements, programming such architectures remains challenging due to their increased complexity. Future research could focus on further reducing this complexity and enhancing the portability of performance across diverse systems [33].
In 2020, a new methodology for simultaneous heterogeneous computing using a quad-core ARM Cortex-A53 CPU and FPGA accelerator was introduced. Authors achieved significant performance improvements through a heterogeneous scheduler that optimally distributes tasks. However, integrating diverse computing units introduces task scheduling and synchronization complexity. Future research could aim to develop more sophisticated scheduling algorithms to enhance performance and energy efficiency further [34].
A paper from 2022 discussed the introduction of heterogeneous computing in undergraduate education using DPC++. The authors highlight the importance of teaching modern computing architectures, including GPUs and FPGAs, to bridge the gap between academic knowledge and industry requirements. The challenge lies in simplifying complex hardware concepts for students. Future research could explore more effective teaching methods and tools to improve student understanding and engagement in heterogeneous computing [35].
A runtime support framework for performance portability on heterogeneous distributed platforms was presented in 2023. This framework significantly improves performance and scalability, particularly in distributed memory environments. However, achieving efficient inter-node communication among diverse devices remains a challenge. Future research could further focus on optimizing communication protocols and exploring new abstractions to enhance performance portability [36].
Scientists explored energy-aware self-adaptation for applications on heterogeneous parallel architectures. They developed a framework that automates application configuration and deployment to improve energy efficiency. Despite these advancements, managing the complexities of heterogeneous devices and achieving consistent energy savings remains challenging. Future research could investigate more adaptive algorithms and techniques to optimize energy usage dynamically [37].
A methodology from 2022 characterizes uncertainties in Earth system modeling with heterogeneous many-core architectures. They address the challenges of non-bit-for-bit reproducibility and numerical perturbations in simulations, which can blend with coding errors. Their methodology provides a way to distinguish platform-induced perturbations from software bugs, enhancing model reliability. Future research could focus on refining and applying this methodology to a broader range of applications to ensure robust modeling on new architectures [38].
Constructing and applying a large-scale ARM computing cluster for high-energy physics experiments was discussed in 2022. They highlight the cost-effectiveness of ARM processors compared to traditional x86 processors. However, they also note the performance limitations of ARM processors, especially in memory-bound operations. Future research could explore optimizing ARM-based clusters for specific scientific applications and reducing reliance on single-chip architectures to enhance performance [39].
In their study, scientists proposed AGILER, a tile-based many-core architecture for RISC-V processors designed for adaptability and modularity in heterogeneous computing environments. The architecture supports 32-bit and 64-bit RISC-V ISAs with scalable network-on-chip communication, enabling high system scalability. Evaluations on Xilinx FPGAs showed scalable performance up to 685 MOPS for 32-bit tiles and 316 MOPS for 64-bit tiles, with reconfiguration times of 38.1 ms per tile. The key challenges include managing the complexity of modularity and ensuring efficient reconfiguration. Future research could focus on optimizing the reconfiguration manager and expanding the architecture’s application domains [40].
In their survey, scientists analyzed SoC platform security across ARM, Intel, and RISC-V architectures, highlighting RISC-V’s potential for customizable security extensions. They discuss various hardware and software security attacks and compare RISC-V’s capabilities to traditional architectures. The trade-off between performance and security remains a significant challenge. They suggest further research into enhancing RISC-V’s security features without compromising performance, mainly focusing on developing robust security frameworks and exploring new attack vectors to improve defense mechanisms [41].
xBGAS (eXtended Big Global Address Space), an extension to the RISC-V ISA to enhance HPC by enabling direct access to remote shared memory, was introduced in 2021. This reduces inter-process communication overhead by 69.26% and achieves an average 21.96% performance gain. Challenges include ensuring seamless integration with existing software infrastructures and maintaining low latency in data access. Future research might explore further optimization of the xBGAS design and its application in various HPC scenarios, potentially expanding its adoption in large-scale distributed systems [42].
An open-source FPGA platform for exploring heterogeneous many-core architectures, focusing on integrating custom accelerators with standard RISC-V cores, was discussed in a paper from 2023. The platform’s coherent shared memory model enhances programmability and communication efficiency. Preliminary results indicate significant benefits from using systolic accelerators. The main challenges are optimizing the network and memory subsystems and managing resource usage. Future work could involve refining the platform to support more complex accelerators and improving the coherence mechanisms to enhance overall system performance [43].
An examination of using RISC-V in genomics-based workloads, benchmarking the Variant-Interaction Analytics use case, was conducted in 2023. The authors highlight the potential of RISC-V for HPC in large-scale scientific environments. The study reveals challenges in achieving comparable performance to x86 architectures, particularly in data processing and system integration. Future research should optimize RISC-V implementations for specific scientific applications, improve performance and scalability, and address integration issues to facilitate broader adoption in genomics and other data-intensive fields [44].
A secure heterogeneous RISC-V system featuring a protection-dedicated core for establishing root-of-trust and monitoring execution was developed in 2023. This architecture enhances system security by providing hardware mechanisms for control and monitoring. The primary challenge lies in integrating these security features without significantly impacting performance. Future research could explore advanced security protocols and mechanisms to strengthen system integrity further and optimize the balance between security and performance in heterogeneous RISC-V systems [45].
A heterogeneous RISC-V SoC, integrating high-performance out-of-order cores, energy-efficient in-order cores, and specialized accelerators in a low-power 22 nm FinFET process was presented in 2021. The SoC achieves substantial performance and efficiency gains, with up to 286× MOPS/W improvement. Challenges include managing the complexity of integrating diverse components and optimizing power consumption. Future research could focus on refining the integration process, exploring new accelerator designs, and reducing power consumption while maintaining high performance across varied workloads [46].
In 2020, scientists proposed a modular RISC-V-based many-core architecture for FPGA accelerators designed for flexibility and scalability. The architecture features multiple processing clusters connected via a network-on-chip, supporting dynamic and partial reconfiguration. Evaluations demonstrate scalable performance and memory bandwidth. Key challenges include managing reconfiguration complexity and ensuring efficient intra-cluster communication. Future research might explore enhancing the flexibility of reconfiguration processes and expanding the architecture’s applicability to a broader range of applications, improving performance and energy efficiency [47].
A programmable heterogeneous microprocessor based on bit-scalable in-memory computing (IMC), addressing energy and throughput trade-offs in accessing data, was explored in 2020. The architecture integrates a 590 Kb IMC accelerator, digital near-memory computing (NMC) accelerator, and RISC-V CPU. It achieves high energy efficiency and performance in deep neural network tasks. Challenges involve maintaining computation signal-to-noise ratio (SNR) and ensuring robust hardware and software integration. Future research could improve IMC and NMC integration, enhance SNR, and extend the architecture’s application to more complex and varied workloads [48].
Docker and Kubernetes have been thoroughly covered in one of our previous papers [49]. The only notable addition to this technology overview is that Kubernetes is still unavailable on the RISC-V platform, while Docker recently became available in the upstream Ubuntu repositories. We will cover this topic in Section 4 of this paper.
The following two sections will discuss ARM and RISC-V in detail as platforms for Docker and Kubernetes. Understanding the context of these two architectures is essential, as these two ISAs are not evaluated as much as x86-based platforms.

3. ARM as a Platform for Docker and Kubernetes

ARM processors are increasingly used to deploy Docker and Kubernetes on Ubuntu because of their energy efficiency, scalability, and cost-effectiveness. This is particularly advantageous in cloud computing and edge contexts. ARM’s RISC architecture is highly efficient at processing high-throughput workloads while consuming less power. This makes it an excellent option for running containerized apps using Docker and orchestrating them with Kubernetes. These benefits are especially noticeable when energy economy and cost-effectiveness are crucial, such as in extensive cloud data centers and dispersed edge computing configurations.
Utilizing ARM processors with Docker on Ubuntu enables developers to generate compact and adaptable containers capable of operating on many platforms, hence offering versatility in deploying applications. The ARM architecture is compatible with multiple Linux distributions, such as Ubuntu, making it a flexible choice for developers who want to utilize containerization technologies. Docker is highly efficient on ARM processors because of their capacity to manage concurrent operations with reduced energy requirements compared to standard x86 processors. Docker packages apps and their dependencies into containers. Efficiency is paramount when implementing services that must be scaled over several nodes, as shown in extensive cloud infrastructures or distributed networks [50].
Kubernetes boosts the functionality of ARM processors by effectively managing and orchestrating Docker containers in a scalable manner. It enables the automatic deployment, scaling, and management of application containers across groups of hosts, offering a framework that guarantees the reliability and resilience of applications. Integrating ARM processors and Kubernetes on Ubuntu provides a robust solution for delivering microservices and other cloud-native applications necessitating comprehensive orchestration. Kubernetes’ capacity to scale and oversee containers over a wide range of nodes, including those utilizing ARM processors, guarantees effective deployment and management of applications, even in different environments [51,52].
Furthermore, researchers have conducted several experiments to investigate the integration of Kubernetes with ARM processors to enhance performance and optimize resource consumption. An example is research conducted on the KubCG platform, which showcased the effectiveness of a dynamic Kubernetes scheduler in enhancing container deployment in clusters with diverse architectures, such as those including ARM processors. The utilization of ARM processors in managing containerized workloads using Kubernetes has demonstrated a notable decrease in job completion time, highlighting the potential for enhanced efficiency. A different research study emphasized the utilization of ARM-based fog computing platforms that employ Docker and Kubernetes for effective data processing at the network edge, further confirming the appropriateness of ARM processors in situations that need both scalability and low latency [53,54].
The combination of ARM processors, Docker, and Kubernetes is seen in the implementation of distributed file systems, which are crucial for efficiently handling data over extensive clusters. Studies have demonstrated that deploying distributed file systems such as CephFS and Lustre-ZFS on ARM-based Kubernetes clusters can enhance the flexibility of data management and the dependability of services. This is especially advantageous in contemporary data centers and cloud environments requiring fast data transfer rates and reliable operations [55].
Using Docker and Kubernetes on Ubuntu operating on ARM processors offers a resilient and effective solution for contemporary cloud computing and edge scenarios [56]. The combination utilizes ARM’s energy-efficient and scalable technology, Docker’s containerization capabilities, and Kubernetes’ powerful orchestration to provide high-performance, cost-effective, and scalable solutions for various applications. This is why cloud providers are partially switching to ARM-based platforms for Kubernetes environments, as they offer excellent performance for most everyday applications while being more efficient than x86 platforms.
For these evaluations, we decided to use the TuringPi2 platform on the ARM side. This platform offers a motherboard that connects four compute modules (similar to Supermicro Twin servers). Multiple models of ARM-based compute modules can be installed in it, which aids the platform’s modularity and allows reconfiguration with different modules.
Deployment processes on the TuringPi2 platform were quite complex; this comes down to our hardware choice, driven by the lack of availability of, for example, Ampere-based servers. Specifically, we had to:
  • Flash the image to the module using the TuringPi2 web interface (for RK1) or the Seeed Studio Development Board Kit connected to an Ubuntu-based laptop with a specific Ubuntu release and NVIDIA’s SDK Manager;
  • Power on the module to enter the default installation;
  • Configure output on TuringPi2 to output from the module via HDMI.
The TuringPi2 platform has HDMI output available, so we could use GUI if required. This is much more convenient than using the USB serial console, which is easy to break physically on our RISC-V platform.
There are multiple reasons why we selected the TuringPi2 platform for these evaluations:
  • Other ARM-based platforms with decent enough performance were unavailable on the market;
  • It offers easily replaceable ARM-based modules that we could use for evaluations;
  • It is very price-efficient for getting to know a new platform;
  • It was widely available and already had a foothold on the market.

3.1. Docker Deployment

Docker deployment on the ARM platform is straightforward. A set of Docker packages is available in repositories for Ubuntu for ARM. Hence, the installation process for Docker requires one apt command on the Ubuntu Server:
apt -y install docker.io
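For illustration, a quick way to confirm that the resulting ARM build of Docker is functional is shown below; this is a generic sanity check rather than part of our evaluation procedure:
# Confirm the daemon reports the ARM architecture (e.g., aarch64)
docker info --format '{{.Architecture}}'
# Pull and run a throwaway multi-arch container to confirm basic operation
docker run --rm hello-world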
Even if there were no packages, compiling Docker is not a complex task; it takes a couple of hours, but Docker works afterward. Once installed, Docker’s containerization features are fully available, at feature parity with the x86 platform. This also makes the deployment experience equal to our experience on x86 platforms. Let us now see if the same applies to deploying Kubernetes on the ARM platform.

3.2. Kubernetes Deployment

Kubernetes deployment has always been more complex than Docker deployment, as it is a much bigger platform. However, deployment on ARM closely resembles the deployment process on x86 counterparts. The detailed installation procedure is available online [57].
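For illustration only, a typical kubeadm-based installation on Ubuntu for ARM follows the sketch below; it assumes that a container runtime (e.g., containerd) and the upstream Kubernetes apt repository have already been configured, and the exact steps in [57] should be treated as authoritative:
# Kernel and host prerequisites commonly required by kubeadm
sudo swapoff -a
sudo modprobe br_netfilter
echo 'net.ipv4.ip_forward=1' | sudo tee /etc/sysctl.d/k8s.conf && sudo sysctl --system
# Install the Kubernetes tooling (arm64 builds are published upstream)
sudo apt-get update && sudo apt-get install -y kubelet kubeadm kubectl
# Initialize the control plane and apply a CNI plugin (Flannel shown as one common choice)
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml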

4. RISC-V as a Platform for Docker and Kubernetes

The RISC-V architecture has become increasingly popular in recent years because of its open-source nature, which enables more customization and freedom in designing processors. This architecture’s scalability and cost-effectiveness make it suitable for cloud computing, IoT, and edge computing applications. However, several obstacles and constraints limit RISC-V performance on the SiFive HiFive Unmatched Rev B platform, the only widely available RISC-V-based platform on the market. This platform is based on the SiFive Freedom U740 SoC RISC-V chip. The first constraint we will mention is the very slow NVMe controller. Figure 1 clearly shows the difference in performance between the NVMe controller on the TuringPi2 platform and the SiFive platform based on the U740 RISC-V SoC with the same SSD.
Operationally speaking, SiFive’s RISC-V platform has one big issue: it is unable to boot from NVMe, i.e., it only boots from microSD, which is much slower. For reference, we are talking about 50 MB/sec cached reads and 1.51 MB/s buffered disk reads, which makes it unusable except for the initial boot and a bit of configuration to make the platform use NVMe as an Ubuntu root partition drive. Even the regular package deployment processes can become unusably slow if we were to go down that route, which is not recommended. This would be a huge issue if we wanted to run containers from a local disk.
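The cached and buffered read figures quoted above match the form of hdparm’s output; for reference, a measurement of this kind can be reproduced as follows (device names are examples and will differ per system):
# Compare raw read throughput of the microSD card and the NVMe drive
sudo hdparm -tT /dev/mmcblk0    # microSD (boot device)
sudo hdparm -tT /dev/nvme0n1    # NVMe drive behind the slow controller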
Implementing Docker on RISC-V is reasonably seamless, thanks to the flexibility of Linux as a platform, which serves as the foundation for Ubuntu and Docker’s containerization technologies. Nevertheless, difficulties arise in the process of coordinating these containers using Kubernetes. Kubernetes is essential for efficiently managing large-scale containerized applications commonly found in cloud computing settings. Regrettably, there is no complete and officially endorsed version of Kubernetes available for the RISC-V architecture, and there is also a significant lack of available RISC-V-compatible Docker containers with which to work. For example, there is no official Ubuntu RISC-V image available at the time of writing this paper. Therefore, the potential for implementing Kubernetes in a production setting on RISC-V processors is significantly restricted [49,56].
The only existing binary package of Kubernetes for RISC-V is version 1.16, which provides only fundamental services. As a result of this constraint, certain sophisticated functionalities of Kubernetes, including automatic recovery, scalability, and gradual upgrades, may not operate as intended or may necessitate substantial adjustments and customization. Furthermore, the absence of upstream support means that any upgrades or security patches must be applied manually, making it more complicated and increasing the risks involved in maintaining a Kubernetes cluster on RISC-V [56].
Notwithstanding these obstacles, endeavors have been made to narrow the divide. An orchestration platform called KubeEdge-V has been created explicitly for RISC-V computers. This platform establishes the essential elements necessary to facilitate the fundamental functionalities of containerization and orchestration. It has undergone testing on a prototype system utilizing SiFive processors. Nevertheless, this solution is still under development and does not provide the complete array of functionalities that Kubernetes offers on well-established architectures such as x86 or ARM [56,58].
RISC-V processors present promising opportunities for open-source hardware and software ecosystems [59,60]. The utilization of Docker and Kubernetes on these processors, particularly on Ubuntu, is still at an early stage of development. The absence of a comprehensively endorsed Kubernetes version and the restricted capabilities of the current binary package are substantial obstacles to extensive adoption. Continued progress and assistance from the community will be essential in overcoming these obstacles and fully harnessing the capabilities of RISC-V in cloud-native settings.
First, Linux must be deployed on the set of RISC-V nodes. The deployment process for these platforms is more involved than using an x86 platform. That is partially due to the hardware choices we made and partially due to the immaturity of these platforms. Deployment for the SiFive-based RISC-V platform was as painless as possible [61]:
  • Downloading the Ubuntu Server 24.04 RISC-V image;
  • Unpacking the image and flashing it to an SD card using Raspberry Pi Imager (or dd, if Linux is used);
  • Connecting the serial console and following the standard Ubuntu boot procedure.
After that, it would be prudent to make the board boot the root filesystem from the NVMe drive—it is much faster than the microSD. We had to change a few settings in the u-boot configuration files. This procedure is documented on the Ubuntu Tutorials homepage [60], and it requires the following steps (a condensed command sketch follows the list):
  • Downloading the Ubuntu RISC-V image on the booted RISC-V system (booted via MicroSD card);
  • Finding the corresponding NVMe device entry in the Linux /dev filesystem (usually /dev/nvme0n1);
  • Writing the image to the NVMe device using dd;
  • Mounting the new NVMe-hosted filesystem to /mnt and chrooting into it;
  • Changing u-boot configuration and applying it;
  • Rebooting the system.
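The condensed command sketch below illustrates this procedure; the image file name, partition numbering, and u-boot file locations are assumptions for illustration, and the Ubuntu tutorial [60] remains the authoritative reference:
# Write the Ubuntu RISC-V image to the NVMe drive (image file name is a placeholder)
xz -dc <ubuntu-riscv64-image>.img.xz | sudo dd of=/dev/nvme0n1 bs=4M status=progress
# Mount the new root partition and chroot into it (partition number may differ)
sudo mount /dev/nvme0n1p1 /mnt
sudo chroot /mnt /bin/bash
# Inside the chroot: point the u-boot configuration at the NVMe root partition
# (e.g., a file such as /etc/default/u-boot; see [60] for the exact file), apply it, then:
exit
sudo reboot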
After that, a system-wide upgrade to the latest packages (apt-get -y upgrade) is recommended, and a reboot is mandatory after the new kernel has been deployed. The following steps involve installing Docker and Kubernetes (if possible), which we will do in the following two sub-sections.

4.1. Docker Deployment

Since we began finalizing this paper a couple of months ago, the situation with Docker deployment has improved immensely. Docker packages are now available in the Ubuntu 24.04 repositories for RISC-V, so the installation process for Docker is straightforward:
apt -y install docker.io
This is a recent development in the Ubuntu/RISC-V world, as these packages were unavailable when we started writing this paper a couple of months ago. The results are visible in Figure 2.
Even if there were no packages, compiling Docker is not a big challenge; it takes a couple of hours, but Docker works afterward.
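As with ARM, a basic sanity check can confirm that the riscv64 Docker build works; the container image used below is an assumption, since (as noted earlier) the selection of RISC-V-compatible images on public registries is still limited:
# Confirm the daemon reports the riscv64 architecture
docker info --format '{{.Architecture}}'
# Run a minimal container, assuming a riscv64 image (e.g., riscv64/alpine) is published
docker run --rm riscv64/alpine uname -m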

4.2. Kubernetes Deployment

Unfortunately, Kubernetes still does not have current upstream packages for RISC-V. In this sense, there are three available options:
  • To compile Kubernetes from source code (a very complex task that will not necessarily end up being successful);
  • To use the only available Kubernetes binary package for RISC-V (version 1.16 from September 2019), available online [62];
  • To install and run k3s.
K3s is a version of K8s with a much smaller footprint; it uses fewer resources, its configuration is more straightforward (albeit with a limited set of options), and it is not meant to be scalable and highly available for production-level environments. It is also much more limited in features and extensions, offering only limited compatibility with standard K8s tools and extensions (a brief installation sketch for this route follows the service list below). We used three RISC-V nodes based on SiFive HiFive Unmatched boards. We deployed the available, minimal Kubernetes v1.16 package to evaluate whether using RISC-V as a platform makes sense for k8s workloads. However, we must also note one simple fact: this package does not contain the full k8s environment with all modules and add-ons; it contains only the minimum services, like:
  • Set of required services and binaries, like kubectl, kubeadm, etc.;
  • apiserver;
  • Controller-manager;
  • Scheduler;
  • Proxy;
  • Pause (for pod network namespace);
  • etcd;
  • coredns.
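For completeness, the k3s route listed above is normally installed through its upstream convenience script, as sketched below; note that official k3s builds target x86_64, arm64, and armhf, so a riscv64 binary would have to come from a community build, which is an assumption here:
# Single-node k3s install via the upstream script
curl -sfL https://get.k3s.io | sh -
# Verify the node registered
sudo k3s kubectl get nodes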
First and foremost, a couple of dependencies must be deployed before the k8s v1.16 package deployment. We need to employ a set of commands as described on Carlos Eduardo’s GitHub page [62,63]. Since that GitHub page was created, many new Docker versions have been released, so warnings about k8s v1.16 not being compatible with, for example, Docker 24.0.7 are to be expected.
After the package deployment on our three nodes, the Kubernetes cluster works, and we can conduct a performance evaluation with it. However, we also need to point out that this package version is five years old and is missing many features introduced since then, such as:
  • Changes to Ingress controller (1.18);
  • Better CLI support, logging, new APIs, CSI health monitoring (v1.19);
  • Docker deprecation (v1.20);
  • Changes to Kubelet logging, storage capacity tracking (v1.21);
  • External credential providers support (v1.22);
  • Dual-stack IPv4/IPv6 networking, HorizontalPodAutoscaler v2 changes (v1.23);
  • Removal of Dockershim from Kubelet, changes in storage plugins (v1.24);
  • cgroups v2 support, further changes in storage plugins (v1.25);
  • API changes (v1.26);
  • iptables performance improvements (v1.27);
  • Changes to Ceph support (removal of the CephFS plugin in favor of CephFS CSI driver) (v1.28), etc.
Furthermore, many stability issues exist when deploying Kubernetes from the binary package on Ubuntu 24.04: the Kubelet service occasionally times out (even during the cluster initialization phase), containers sometimes fail to start, and there are issues with networking and firewalling as well as problems with the cgroups v2 subsystem, among others. However, we got it up and running and ran some tests to understand how this platform performs compared to ARM-based platforms.

5. Experimental Setup and Study Methodology

When we started working on this paper a couple of years ago, the priority was to get access to hardware to do real-life performance evaluations, not to write about theory and technical marketing. Years later, these platforms are still challenging to get, especially in volume. The availability of ARM servers in the EU region is poor. RISC-V is even worse, although it has been years since various vendors promised that such systems would be available. The situation is becoming a bit better in 2024, but still, no high-performance RISC-V processors are available, and the availability of, for example, ARM Ampere-based multicore systems is not much better.
Ultimately, we opted to do our software and performance evaluations on readily available platforms—a set of TuringPi2 platforms plus a selection of ARM-based compute modules for ARM systems, and the SiFive HiFive Unmatched Rev B for RISC-V. For TuringPi compute modules, we acquired the Turing RK1, Raspberry Pi CM4 modules, and the NVIDIA Jetson TX2 NX. The Turing RK1, based on the Rockchip RK3588, is by far the most performant module within the price envelope. At the time of writing this paper, the price of the TuringPi2 cluster board plus an RK1 was comparable to that of the SiFive Unmatched Rev B with a RISC-V CPU, once we add the cost of memory, which was an extra expense for the RISC-V board. This price similarity gave us a good baseline with which to work.
Regarding performance evaluations, we focused on a stack of CPU, memory, and disk evaluations implemented by a set of custom containers managed by Kubernetes. This means that all the scores will be from the perspective of an Alpine container with the necessary tools (stress-ng, sysbench, etc.) installed inside. There was no point in using any GPU tests as GPUs are far from being supported on the RISC-V platform, making the comparison moot. However, we will reflect on that in our Discussion section to provide the correct information.
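For illustration, a container of this kind can be built as sketched below; the base image tag and package names are assumptions, and our exact images may differ:
# Build a minimal Alpine-based benchmark image with the measurement tools inside
cat > Dockerfile <<'EOF'
FROM alpine:3.20
RUN apk add --no-cache stress-ng sysbench
EOF
docker build -t bench:latest .
# Quick check that the tools are present in the image
docker run --rm bench:latest stress-ng --version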
For HPC performance evaluations, we decided to use a standard set of performance evaluations based on HPCC (HPC Challenge), as it has different test suites and gives us a broad performance evaluation for various types of workloads. First and foremost, HPCC needed to be compiled for every one of these platforms. For that, we also had to compile the OpenBLAS (Open Basic Linear Algebra Subprograms) library and then compile HPCC (which required a custom Makefile per platform); all of that was then merged into a per-platform Docker image to keep the methodology constant across all performance evaluations. We used the latest OpenBLAS library (v0.3.28) and the latest version of HPCC (1.5.0). Also, as we used Ubuntu Linux 24.04 across all our platforms, we had to install some dependencies, which was performed via the same command on all platforms:
apt -y install build-essential hwloc libhwloc-dev libevent-dev gfortran libblas-dev liblapack-dev mpich libopenmpi-dev make
On our GitHub page dedicated to this paper [64], we published the procedure for compiling OpenBLAS, installing these dependencies, and finishing Makefiles for HPCC for all platforms. Configuration and compilation processes for these utils take quite a while, so we are publishing these configuration details for transparency reasons in case someone needs them for verification.
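As a condensed illustration of that build flow, the sketch below shows the general shape of the per-platform compilation; the install prefix and architecture name are placeholders, and the exact Makefiles are published in [64]:
# Build OpenBLAS v0.3.28 and install it to a local prefix (path is an example)
git clone -b v0.3.28 https://github.com/OpenMathLib/OpenBLAS.git
make -C OpenBLAS -j"$(nproc)"
sudo make -C OpenBLAS install PREFIX=/opt/openblas
# Unpack HPCC 1.5.0 (tarball from the HPC Challenge website), add a per-platform
# Make.<arch> (pointing at /opt/openblas and the MPI toolchain) under hpl/, then build
tar xzf hpcc-1.5.0.tar.gz && cd hpcc-1.5.0
cp ../Make.<arch> hpl/
make arch=<arch>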

6. ARM and RISC-V Performance Evaluation

We used a set of standardized tests for performance evaluation, like stress-ng and sysbench, where available (sysbench is not supported on the RISC-V architecture). We focused on CPU and memory performance paired with power usage, as this seemed like a reasonable scenario—these platforms should be efficient compared to x86 platforms. We used an HP ProLiant Gen8 server based on an Intel Xeon E5-2680 CPU for context—we wanted to see how the performance of all these RISC-V and ARM platforms stacks up against a similarly priced x86 CPU, despite the fact that the E5-2680 is a twelve-year-old CPU. The RK3588 processor mentioned in the performance evaluations is the CPU on the Turing RK1 compute module.
Stress-ng has a set of different metrics that we need to cover before moving on to the following sub-section, specifically:
  • bogo ops—the total number of bogo ops (bogus operations) performed by the workload during the evaluation;
  • bogo ops/s (real time)—the number of bogo ops divided by the overall (wall-clock) run time of the evaluation;
  • bogo ops/s (usr+sys time)—the number of bogo ops divided by the combined user and system CPU time used by the evaluation’s workload.
It is also worth remembering that stress-ng can set the number of CPU cores used for performance evaluation (parameter --cpu).
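For reference, the single-core and all-core runs differ only in that parameter; an illustrative invocation (the timeout value is an example, not our exact setting) is:
stress-ng --cpu 1 --metrics-brief --timeout 60s   # single-core run
stress-ng --cpu 0 --metrics-brief --timeout 60s   # --cpu 0 uses all online CPUs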
Let us start with single-core performance, as this is very important when dealing with various types of workloads based on containers.

6.1. Single-Core Performance

In single-core performance tests, the Turing RK1 ARM-based system wins considerably. What is surprising is that all the other ARM-based platforms and the RISC-V-based U740 are nowhere near it in that respect. We do have to note, though, that the NVIDIA Jetson TX2 NX has a built-in GPU with 256 CUDA (Compute Unified Device Architecture) cores, which is one of the reasons why its CPU part loses by such a margin, as can be seen in Figure 3.
Regarding memory performance, the Turing RK1 ARM-based system is miles ahead of everything else. However, the exciting part is the fact that it is also significantly faster than our x86-based system, as can be seen in Figure 4.
The RISC-V-based system’s (U740) score is a bit misleading in the following performance chart because we cannot run a sysbench latency test on it, so it did not post any score. If we exclude that result, we can see that the Turing RK1 is still much better than anything else, including other ARM CPUs that also have memory built in, as can be seen in Figure 5.
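For reference, memory throughput and latency figures of this kind come from sysbench’s memory test and its per-event latency statistics; an illustrative invocation (parameters are examples, not our exact settings) is:
# Sequential write throughput plus latency statistics
sysbench memory --memory-block-size=1K --memory-total-size=10G run
# Random read access pattern
sysbench memory --memory-oper=read --memory-access-mode=rnd run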
We will continue our performance evaluations with multi-core performance tests to see how performance scales across all available cores. Our E5-2680-based system exposes 32 logical cores with HyperThreading enabled, the RK3588 is an 8-core CPU, and all the other CPUs have 4 cores. We can expect this to impact performance results significantly, but that is the whole point—we need a complete overview of a platform’s performance.

6.2. All-Core Performance

With 32 available x86 logical cores and a significantly higher frequency, it is no wonder that the E5-2680 is far ahead of every other CPU—but that is also not the point. If we evaluate all the other platforms, we can again see the Turing RK1 compute module being far ahead of all other assessed platforms, as can be seen in Figure 6.
Memory performance in the all-core scenario continues the same trend, as the Rockchip RK3588-based Turing RK1 compute module still has a significant lead over the RISC-V and other ARM platforms, as can be seen in Figure 7.
Again, setting aside the fact that we cannot perform latency testing on the RISC-V platform (U740 chip), Turing RK1 still does very well, although the A72-based CPU has marginally lower latency (0.57 ms vs. 0.58 ms). This time, however, the comparison with the x86-based system is the one to note, as there is a world of difference between them: the x86-based system has almost three times the latency of the ARM-based platforms. This is one of the fundamental issues with x86 platforms in general; memory sits physically farther from the CPU, so latency is higher, and the built-in caches cannot fully compensate compared to CPUs with on-package memory, as the ARM chips here have. Intel and AMD have announced that this issue will be addressed in future x86 chips over the next couple of generations, as this design feature has the most detrimental influence on performance. The average latency of all platforms can be seen in Figure 8.
ARM-based systems are a much better choice for CPU- or memory-intensive workloads. It is surprising how much faster they are, especially compared to the similarly priced RISC-V platform. Let us now perform some essential HPC-related evaluations to see what performance we can expect from these platforms.

6.3. HPC Performance

The first set of tests in our HPC performance evaluation was related to HPL (High-Performance Linpack), in terms of both achieved TFLOPS (Tera Floating-Point Operations Per Second) and HPL time (the time required to finish the evaluation). The TFLOPS evaluation shows how much faster the evaluated ARM-based platforms are than the RISC-V platform (especially Turing RK1 vs. the U740 RISC-V SoC), as can be seen in Figure 9.
HPL time measures how long the evaluation takes to finish; the shorter the time, the more performance the platform has. The results can be seen in Figure 10.
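For orientation, the HPL numbers come out of the HPCC suite, which is driven by an hpccinf.txt input in the HPL.dat format. The sketch below shows a typical launch; the rank count is an illustrative assumption, and the actual per-platform problem sizes are in the repository [64].

    # hpccinf.txt uses the HPL.dat format: N (problem size), NB (block size), and a
    # P x Q process grid that must match the MPI rank count.
    cp _hpccinf.txt hpccinf.txt        # edit Ns, NBs, Ps, Qs for the board's RAM/cores
    mpirun -np 4 ./hpcc                # e.g., 4 ranks on the quad-core boards

    # The summary section of hpccoutf.txt carries the two HPL metrics charted above.
    grep -E "HPL_Tflops|HPL_time" hpccoutf.txt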
DGEMM, part of the HPCC benchmark, measures double-precision floating-point matrix-to-matrix multiplication performance. These performance evaluations, as well as some that follow, should scale similarly to the HPL scores, and they do, as can be seen in Figure 11.
PTRANS evaluates the parallel matrix transpose capabilities of our platforms, while RandomAccess measures random-access performance for large data arrays in multi-core scenarios; both results can be seen in Figure 12.
HPCC STREAM evaluates sustainable memory bandwidth in GB/s. The RISC-V platform falls to the bottom here, but the ARM platforms are surprisingly close to the x86 platform (especially considering that they consume much less power), as can be seen in Figure 13.
The last set of performance evaluations is related to FFT, which measures the floating-point DFT (Discrete Fourier Transform) execution rate. Again, Turing RK1 is very close to the x86 system here, and the other ARM platforms are far ahead of the RISC-V platform, as can be seen in Figure 14.
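The per-test values behind Figures 11–14 can be pulled from the same hpccoutf.txt summary; the key names below follow the standard HPCC output format, and the one-liner is a sketch of that extraction rather than our exact post-processing.

    # DGEMM, PTRANS, RandomAccess, STREAM (Triad), and FFT summary values.
    grep -E "StarDGEMM_Gflops|PTRANS_GBs|StarRandomAccess_GUPs|StarSTREAM_Triad|StarFFT_Gflops" \
        hpccoutf.txt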
In all the performance metrics we could show in this paper (and quite a few more), ARM platforms are much faster than anything the RISC-V platform can offer for the same price. Let us discuss this in a bit more detail in the next section.

7. Discussion

We can conclude that the ARM platform is much more robust and production-ready than the RISC-V platform. This is not surprising: ARM has been on the market for decades, and its experience designing CPU architectures is backed by billions of processors used in various devices, so this was to be expected.
What we did not expect, however, was the size of the performance gap between our RISC-V platform and all the ARM platforms. The RISC-V platform is much slower in terms of both throughput and latency. If we frame these results on a timeline, a direct comparison can be made between the A72 ARM CPU and the RISC-V platform, as they were launched almost simultaneously. Comparing the quad-core Cortex-A72 to the U740 RISC-V core, the ARM part delivers more than four times the memory performance, roughly the same all-core CPU performance, and better single-core performance.
Then, there is the comparison between the Turing RK1, the NVIDIA Jetson TX2 NX, and the RISC-V platform. Both ARM platforms are newer than the RISC-V platform, although the TX2 NX was introduced only a few months after the U740, while the RK1 was introduced a year and a half later. Even so, the performance difference, even before accounting for the 256 CUDA cores in the Jetson TX2 NX, is staggering, and we are comparing them at roughly the same price and with a much more favorable power envelope. Jetson's memory performance is roughly 5× that of the U740, while the RK3588 is more than 15× faster in memory performance. The CPU performance gap is also substantial: the RK3588 is approximately 5× faster, and the Jetson TX2 NX approximately 2× faster, than the U740. If we also count the CUDA cores on the Jetson, the comparison becomes even more lopsided. That is why, if we were to deploy micro-clusters for Docker/Kubernetes, either for cloud services or for highly efficient HPC environments, TuringPi2 platforms based on NVIDIA compute modules or the Turing RK1 are a much faster and more efficient solution. The only fact that works in the RISC-V platform's favor is the PCI-Express slot on its motherboard. However, that advantage is null and void when we look at the following facts:
  • The only officially supported PCIe graphics cards are AMD RX 500-series and Radeon HD 6000-series VGA cards, which are both old and, being AMD parts, do not support CUDA, so they cannot be used to accelerate anything;
  • There is no support for CUDA on the RISC-V platform, even if the platform supports NVIDIA GPUs;
  • There are no known FPGAs that can be used with this RISC-V platform;
  • There are no known ASICs that can be used with this RISC-V platform.
The big plus of RISC-V, the fact that it is an open-source platform, will only start paying dividends when key players on the market support the platform for common use cases. There are currently EU-sponsored projects, such as the European Processor Initiative, developing an HPC ecosystem based on RISC-V cores. This is where concepts like FAUST [65] will shine: specialized acceleration units that can be integrated with the RISC-V architecture are where RISC-V's forte will come to the fore. However, this will take time, as RISC-V is currently not well supported on the software side, and the basic hardware side still needs quite a bit of additional development before it can be used in a heterogeneous HPC environment managed by Kubernetes.
Looking at the performance charts, we can see why ARM, specifically the TuringPi2 platform, is used so often, especially in the education sector, to teach distributed programming and HPC-related topics. These platforms are very price-competitive, highly capable, and offer remarkable consolidation ratios. With all four nodes running at full speed, we can have four independent nodes in one mini-ITX system that consumes less than 70 W of power, which is incredibly power-efficient compared to anything x86 offers. For reference, the x86 system idles at around 160 W, and its power usage increases to 350 W at full utilization, while the RISC-V system consumes roughly 100 W. These numbers were measured by a switching/monitoring PDU with per-socket power management. ARM Ampere-based systems would probably be an even better example to illustrate the efficiency point, as more and more research indicates that ARM is energy-efficient even in high-end CPU designs [66,67,68]. In any case, TuringPi2 systems with ARM CPUs can handle Docker and Kubernetes with full upstream support for those platforms; they are incredibly energy-efficient and can be procured quickly and used for both educational and production tasks.
Before we discuss possible future research areas, it is fair to note that some types of workloads would greatly benefit from RISC-V architectures with specific ISA extensions. For example, AI (Artificial Intelligence) training can be made very efficient by pairing RISC-V with a built-in AI co-processor, as can vectorizing Deep Neural Networks via the RISC-V vector extension and using PiM (Processing-in-Memory) units to speed up AI inference [69,70,71]. However, such parts are incredibly difficult to acquire and were out of reach for our evaluations.

8. Future Works

We see multiple exciting research areas for the future of heterogeneous computing based on different ISAs, especially in HPC. These research areas depend on Intel, AMD, NVIDIA, and others further developing their ARM and, especially, RISC-V software stacks and offering readily available software support so that this research path can continue. With better workload scheduling and further development, heterogeneous HPC clusters could deliver a significant improvement in energy efficiency.
Further research is needed to optimize RISC-V performance. Given its open-source nature, RISC-V could benefit from more microarchitectural enhancements and, even more so, from the integration of various hardware accelerators (AI, cryptography, etc.) or domain-specific architectures for specific industries and tasks, giving RISC-V more of a foothold in specific niche technology areas. Viewed from the perspective of hardware-software co-design, which usually yields code optimized for the platform in use, the software (Kubernetes, libraries, compilers, HPC applications, tools, etc.) needs to be developed further for RISC-V to extract its performance potential in the future. We do not necessarily consider this a huge issue, more a normal stage in the RISC-V platform's development, as we see definite value in the RISC-V systems with built-in accelerators currently being researched for current and future workloads.
Research into performance optimizations must be combined with additional research into compiler and library optimization to boost performance in specific, targeted applications.
Further research needs to be conducted on energy efficiency and power consumption, especially for various workloads. x86 platforms will remain the best overall choice in many cases, but there will also be areas where ARM and, potentially, RISC-V might be the correct choice. The last five years of data-center-oriented ARM development are a great example of ARM many-core architectures (for example, Ampere) taking a strong foothold in the data center space, as they are very efficient for many different tasks. Given the choice, cloud providers will gladly sell us the capability to run Kubernetes/Docker environments on ARM-based architectures, and rightfully so, as they are much more efficient than x86 platforms. This research can lead to workload partitioning across different ISAs when building heterogeneous computing environments from the ground up.
Regarding the scalability of using heterogeneous computing environments with Kubernetes for HPC, one more critical obstacle must be overcome—the networking utilized to scale the environment. The selection of Ethernet or InfiniBand for interconnects is vital in heterogeneous HPC environments that include x86, ARM, and RISC-V servers and still needs to be thoroughly researched to be ready for production HPC systems. Ethernet, renowned for its adaptability and seamless integration, effortlessly accommodates all three architectures—x86, ARM, and RISC-V—without any compatibility concerns. While it is economically advantageous and suitable for various general-purpose HPC tasks, it may experience increased delay and reduced data transfer rate compared to InfiniBand. In contrast, InfiniBand is designed explicitly for HPC and offers reduced latency and increased bandwidth, making it well-suited for applications that necessitate fast data transfers between nodes. InfiniBand provides strong support for x86 and almost as good support for ARM architectures, facilitating efficient communication and performance scalability in these situations. Nevertheless, the integration of InfiniBand and RISC-V-based platforms is still in its early stages. Although efforts are underway to create InfiniBand drivers and support for RISC-V, it is not currently as advanced as it is for x86 and ARM. Consequently, Ethernet may be favored in settings that extensively employ RISC-V because of its wider and more reliable assistance.
More research is needed into standardization in heterogeneous computing to make it easier for researchers and regular or business users to integrate and switch seamlessly between various architectures. For example, we mentioned that Kubernetes is supported on x86 and ARM but not on RISC-V. That means that, even if we wanted to, we could not use the same toolset that Kubernetes offers on RISC-V, no matter how hard we try: the latest available binary distribution of Kubernetes for RISC-V is five years old and needs to be brought up to date. Research into the management and implementation of such heterogeneous Kubernetes clusters is underway, and given the popularity of the ARM platform and the rising interest in RISC-V, it seems to be the right way to go [71]. We can see a potential future in which different ISAs are used for different applications in large-scale heterogeneous environments, especially when combined with more research into the energy efficiency of Kubernetes scheduler workload placement.
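To illustrate how mixed-ISA clusters are handled with today's tooling, the sketch below relies only on the standard kubernetes.io/arch node label and a nodeSelector; the pod name and image are hypothetical examples, and a riscv64 value would only become usable once upstream RISC-V support exists.

    # Show which architecture each node reports (amd64, arm64, ...).
    kubectl get nodes -L kubernetes.io/arch

    # Pin an example workload to nodes whose architecture has matching images.
    kubectl apply -f - <<'EOF'
    apiVersion: v1
    kind: Pod
    metadata:
      name: arch-pinned-demo            # hypothetical example workload
    spec:
      nodeSelector:
        kubernetes.io/arch: arm64       # riscv64 would require upstream support
      containers:
      - name: app
        image: alpine:3.20              # multi-arch image
        command: ["sleep", "infinity"]
    EOF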

9. Conclusions

The research described in this paper thoroughly examines the performance of the ARM and RISC-V architectures, both in general and with HPC workloads, with a specific emphasis on their integration with Docker and Kubernetes. It provides detailed empirical assessments, practical insights, and performance comparisons between these two architectures, which are increasingly important in the changing landscape of heterogeneous computing systems.
In most instances, the performance study demonstrates the superiority of ARM over RISC-V. The ARM architecture consistently showed exceptional performance, particularly in tasks that require high memory usage and extensive CPU processing. The ARM-based Turing RK1 demonstrated superior performance compared to RISC-V, exhibiting notably better data processing speed and lower latencies in the same price range. The advantage is particularly pronounced in Docker and Kubernetes environments, where ARM's well-developed ecosystem guarantees smooth integration. The ARM platform's capacity to efficiently manage intricate containerized workloads makes it the more practical choice for HPC applications, especially in cloud and edge computing scenarios where scalability and power efficiency are essential.
RISC-V, despite its open-source appeal and capacity for customization, encounters substantial obstacles. The study indicates that RISC-V's performance is inferior to ARM's in the evaluations performed, particularly in single- and multi-core processing. Furthermore, the absence of well-developed software support impeded the incorporation of Kubernetes on RISC-V platforms. The lack of a comprehensive and officially endorsed Kubernetes version for RISC-V is a significant obstacle to its implementation in production-level environments. Despite the potential for increased flexibility and creativity, these drawbacks of the RISC-V architecture, together with the added intricacies of deploying and operating Kubernetes clusters on this platform, currently outweigh its benefits.
RISC-V will exhibit its potential in the mid-term, especially in specialized fields where its open-source characteristics and ability to be tailored to specific needs could be advantageous. Excellent examples are various vectorization and AI acceleration units that are either ready or being developed for RISC-V. Optimizing RISC-V’s performance and improving its software ecosystem to increase its competitiveness in heterogeneous computing settings, especially in scenarios where Docker and Kubernetes are becoming more widespread, is an absolute must. Otherwise, RISC-V might end up being an excellent idea that never was—on its own or in heterogeneous environments.

Author Contributions

Conceptualization, V.D. and L.M.; methodology, V.D. and Z.K.; software, Z.K.; validation, V.D. and L.M.; formal analysis, V.D. and G.Đ.; investigation, Z.K. and G.Đ.; resources, L.M.; data curation, V.D. and G.Đ.; writing—original draft preparation, Z.K. and G.Đ.; writing—review and editing, L.M. and V.D.; supervision, L.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data supporting this study’s findings are available from the corresponding author, [L.M.], upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bruhn, F.; Brunberg, K.; Hines, J.; Asplund, L.; Norgren, M. Introducing radiation tolerant heterogeneous computers for small satellites. In Proceedings of the IEEE Aerospace Conference, Big Sky, MT, USA, 7–14 March 2015; pp. 1–10. [Google Scholar] [CrossRef]
  2. Reichenbach, M.; Holzinger, P.; Häublein, K.; Lieske, T.; Blinzer, P.; Fey, D. Heterogeneous Computing Utilizing FPGAs. J. Signal Process. Syst. 2018, 91, 745–757. [Google Scholar] [CrossRef]
  3. Feng, L.; Liang, H.; Sinha, S.; Zhang, W. HeteroSim: A Heterogeneous CPU-FPGA Simulator. IEEE Comput. Archit. Lett. 2016, 16, 38–41. [Google Scholar] [CrossRef]
  4. Chang, L.; Gómez-Luna, J.; Hajj, I.E.; Huang, S.; Chen, D.; Hwu, W. Collaborative Computing for Heterogeneous Integrated Systems. In Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, L’Aquila, Italy, 22–26 April 2017. [Google Scholar] [CrossRef]
  5. Mittal, S.; Vetter, J. A Survey of CPU-GPU Heterogeneous Computing Techniques. ACM Comput. Surv. (CSUR) 2015, 47, 1–35. [Google Scholar] [CrossRef]
  6. Prongnuch, S.; Wiangtong, T. Heterogeneous Computing Platform for data processing. In Proceedings of the 2016 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), Phuket, Thailand, 24–27 October 2016; pp. 1–4. [Google Scholar] [CrossRef]
  7. Rethinagiri, S.; Palomar, O.; Moreno, J.; Unsal, O.; Cristal, A. Trigeneous Platforms for Energy Efficient Computing of HPC Applications. In Proceedings of the 2015 IEEE 22nd International Conference on High Performance Computing (HiPC), Bengaluru, India, 16–19 December 2015; pp. 264–274. [Google Scholar] [CrossRef]
  8. Kurth, A.; Vogel, P.; Capotondi, A.; Marongiu, A.; Benini, L. HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA. arXiv 2017, arXiv:1712.06497. [Google Scholar] [CrossRef]
  9. Parnassos, I.; Bellas, N.; Katsaros, N.; Patsiatzis, N.; Gkaras, A.; Kanellis, K.; Antonopoulos, C.; Spyrou, M.; Maroudas, M. A programming model and runtime system for approximation-aware heterogeneous computing. In Proceedings of the 2017 27th International Conference on Field Programmable Logic and Applications (FPL), Ghent, Belgium, 4–8 September 2017; pp. 1–4. [Google Scholar] [CrossRef]
  10. Lopez-Novoa, U.; Mendiburu, A.; Miguel-Alonso, J. A Survey of Performance Modeling and Simulation Techniques for Accelerator-Based Computing. IEEE Trans. Parallel Distrib. Syst. 2015, 26, 272–281. [Google Scholar] [CrossRef]
  11. Liao, S.-W.; Kuang, S.-Y.; Kao, C.-L.; Tu, C.-H. A Halide-based Synergistic Computing Framework for Heterogeneous Systems. J. Signal Process. Syst. 2019, 91, 219–233. [Google Scholar] [CrossRef]
  12. Wu, Z.; Hammad, K.; Beyene, A.; Dawji, Y.; Ghafar-Zadeh, E.; Magierowski, S. An FPGA Implementation of A Portable DNA Sequencing Device Based on RISC-V. In Proceedings of the 2022 20th IEEE Interregional NEWCAS Conference (NEWCAS), Quebec City, QC, Canada, 19–22 June 2022; pp. 417–420. [Google Scholar] [CrossRef]
  13. Young, J.; Jezghani, A.; Valdez, J.; Jijina, S.; Liu, X.; Weiner, M.D.; Powell, W.; Sarajlic, S. Enhancing HPC Education and Workflows with Novel Computing Architectures. J. Comput. Sci. Educ. 2022, 13, 31–38. [Google Scholar] [CrossRef]
  14. Kurth, A.; Forsberg, B.; Benini, L. HEROv2: Full-Stack Open-Source Research Platform for Heterogeneous Computing. IEEE Trans. Parallel Distrib. Syst. 2022, 33, 4368–4382. [Google Scholar] [CrossRef]
  15. Ojika, D.; Gordon-Ross, A.; Lam, H.; Yoo, S.; Cui, Y.; Dong, Z.; Dam, K.V.; Lee, S.; Kurth, T. PCS: A Productive Computational Science Platform. In Proceedings of the 2019 International Conference on High Performance Computing & Simulation (HPCS), Dublin, Ireland, 15–19 July 2019; pp. 636–641. [Google Scholar] [CrossRef]
  16. Fiolhais, L.; Gonçalves, F.F.; Duarte, R.; Véstias, M.; Sousa, J. Low Energy Heterogeneous Computing with Multiple RISC-V and CGRA Cores. In Proceedings of the 2019 IEEE International Symposium on Circuits and Systems (ISCAS), Sapporo, Japan, 26–29 May 2019; pp. 1–5. [Google Scholar] [CrossRef]
  17. Dong, J.; Zheng, F.; Lin, J.; Liu, Z.; Xiao, F.; Fan, G. EC-ECC: Accelerating Elliptic Curve Cryptography for Edge Computing on Embedded GPU TX2. ACM Trans. Embed. Comput. Syst. (TECS) 2022, 21, 1–25. [Google Scholar] [CrossRef]
  18. D’Agostino, D.; Cesini, D. Editorial: Heterogeneous Computing for AI and Big Data in High Energy Physics. Front. Big Data 2021, 4, 652881. [Google Scholar] [CrossRef]
  19. Hu, N.; Wang, C.; Zhou, X. FLIA: Architecture of Collaborated Mobile GPU and FPGA Heterogeneous Computing. Electronics 2022, 11, 3756. [Google Scholar] [CrossRef]
  20. Du, D.; Liu, Q.; Jiang, X.; Xia, Y.; Zang, B.; Chen, H. Serverless computing on heterogeneous computers. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February–4 March 2022. [Google Scholar] [CrossRef]
  21. Núñez-Yáñez, J. Energy Proportional Heterogenous Computing with Reconfigurable MPSoC. In Proceedings of the 2019 International Conference on High Performance Computing & Simulation (HPCS), Dublin, Ireland, 15–19 July 2019; p. 642. [Google Scholar] [CrossRef]
  22. Carballo-Hernández, W.; Pelcat, M.; Berry, F. Why is FPGA-GPU Heterogeneity the Best Option for Embedded Deep Neural Networks? arXiv 2021, arXiv:2102.01343. [Google Scholar] [CrossRef]
  23. Mahmoud, D.G.; Lenders, V.; Stojilović, M. Electrical-Level Attacks on CPUs, FPGAs, and GPUs: Survey and Implications in the Heterogeneous Era. ACM Comput. Surv. (CSUR) 2022, 55, 1–40. [Google Scholar] [CrossRef]
  24. Freytag, G.; Serpa, M.; Lima, J.F.; Rech, P.; Navaux, P. Collaborative execution of fluid flow simulation using non-uniform decomposition on heterogeneous architectures. J. Parallel Distrib. Comput. 2021, 152, 11–20. [Google Scholar] [CrossRef]
  25. Rodríguez, A.; Navarro, A.; Asenjo, R.; Corbera, F.; Gran, R.; Suárez, D.; Núñez-Yáñez, J. Parallel multiprocessing and scheduling on the heterogeneous Xeon+FPGA platform. J. Supercomput. 2019, 76, 4645–4665. [Google Scholar] [CrossRef]
  26. Datta, D.; Gordon, M.S. Accelerating Coupled-Cluster Calculations with GPUs: An Implementation of the Density-Fitted CCSD(T) Approach for Heterogeneous Computing Architectures Using OpenMP Directives. J. Chem. Theory Comput. 2023, 19, 7640–7657. [Google Scholar] [CrossRef]
  27. Lastovetsky, A.L.; Manumachu, R.R. The 27th International Heterogeneity in Computing Workshop and the 16th International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms. Concurr. Comput. Pract. Exp. 2020, 104–118. [Google Scholar] [CrossRef]
  28. Horta, E.; Chuang, H.-R.; VSathish, N.R.; Philippidis, C.J.; Barbalace, A.; Olivier, P.; Ravindran, B. Xar-trek: Run-time execution migration among FPGAs and heterogeneous-ISA CPUs. In Proceedings of the 22nd International Middleware Conference, Québec City, QC, Canada, 6–10 December 2021. [Google Scholar] [CrossRef]
  29. Cerf, V. On heterogeneous computing. Commun. ACM 2021, 64, 9. [Google Scholar] [CrossRef]
  30. Wyrzykowski, R.; Ciorba, F. Algorithmic and software development advances for next-generation heterogeneous platforms. Concurr. Comput. Pract. Exp. 2022, 34, e7013. [Google Scholar] [CrossRef]
  31. Hagleitner, C.; Diamantopoulos, D.; Ringlein, B.; Evangelinos, C.; Johns, C.; Chang, R.N.; D’Amora, B.D.; Kahle, J.; Sexton, J.; Johnston, M.; et al. Heterogeneous Computing Systems for Complex Scientific Discovery Workflows. In Proceedings of the 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, 1–5 February 2021; pp. 13–18. [Google Scholar] [CrossRef]
  32. Mavrogeorgis, N. Simplifying heterogeneous migration between x86 and ARM machines. arXiv 2021, arXiv:2112.01189. [Google Scholar]
  33. Thomadakis, P.; Chrisochoides, N. Towards Performance Portable Programming for Distributed Heterogeneous Systems. arXiv 2022, arXiv:2210.01238. [Google Scholar] [CrossRef]
  34. Nikov, K.; Hosseinabady, M.; Asenjo, R.; Rodríguez, A.; Navarro, A.; Núñez-Yáñez, J. High-Performance Simultaneous Multiprocessing for Heterogeneous System-on-Chip. arXiv 2020, arXiv:2008.08883. [Google Scholar]
  35. Fuentes, J.; López, D.; González, S. Teaching Heterogeneous Computing Using DPC++. In Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Lyon, France, 30 May–3 June 2022; pp. 354–360. [Google Scholar] [CrossRef]
  36. Thomadakis, P.; Chrisochoides, N. Runtime Support for Performance Portability on Heterogeneous Distributed Platforms. arXiv 2023, arXiv:2303.02543. [Google Scholar] [CrossRef]
  37. Kavanagh, R.; Djemame, K.; Ejarque, J.; Badia, R.M.; García-Pérez, D. Energy-Aware Self-Adaptation for Application Execution on Heterogeneous Parallel Architectures. IEEE Trans. Sustain. Comput. 2020, 5, 81–94. [Google Scholar] [CrossRef]
  38. Yu, Y.; Zhang, S.; Fu, H.; Wu, L.; Chen, D.; Gao, Y.; Wei, Z.; Jia, D.; Lin, X. Characterizing uncertainties of Earth system modeling with heterogeneous many-core architecture computing. Geosci. Model Dev. 2022, 15, 6695–6708. [Google Scholar] [CrossRef]
  39. Cheng, Y.; Sun, W.-T.; Bi, Y.; Cheng, Y.; Shi, J.; Wang, L.; Yao, Q.; Hu, Q.; Zhang, M. Large Scale ARM Computing Cluster and its Application in HEP. In Proceedings of the International Symposium on Grids & Clouds 2022—PoS (ISGC2022), Taipei, Taiwan, 21–25 March 2022. [Google Scholar] [CrossRef]
  40. Kamaleldin, A.; Göhringer, D. AGILER: An Adaptive Heterogeneous Tile-Based Many-Core Architecture for RISC-V Processors. IEEE Access 2022, 10, 43895–43913. [Google Scholar] [CrossRef]
  41. Nicholas, G.S.; Gui, Y.; Saqib, F. A Survey and Analysis on SoC Platform Security in ARM, Intel and RISC-V Architecture. In Proceedings of the 2020 IEEE 63rd International Midwest Symposium on Circuits and Systems (MWSCAS), Springfield, MA, USA, 9–12 August 2020; pp. 718–721. [Google Scholar] [CrossRef]
  42. Wang, X.; Leidel, J.D.; Williams, B.; Ehret, A.; Mark, M.; Kinsy, M.; Chen, Y. xBGAS: A Global Address Space Extension on RISC-V for High Performance Computing. In Proceedings of the 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Portland, OR, USA, 17–21 May 2021; pp. 454–463. [Google Scholar] [CrossRef]
  43. Tornero, R.; Rodríguez, D.; Martínez, J.M.; Flich, J. An Open-Source FPGA Platform for Shared-Memory Heterogeneous Many-Core Architecture Exploration. In Proceedings of the 2023 38th Conference on Design of Circuits and Integrated Systems (DCIS), Málaga, Spain, 15–17 November 2023; pp. 1–6. [Google Scholar] [CrossRef]
  44. Gómez-Sánchez, G.; Call, A.; Teruel, X.; Alonso, L.; Morán, I.; Perez, M.A.; Torrents, D.; Berral, J.L. Challenges and Opportunities for RISC-V Architectures towards Genomics-based Workloads. In International Conference on High Performance Computing; Springer Nature: Cham, Switzerland, 2023. [Google Scholar] [CrossRef]
  45. Stoyanov, S.; Kakanakov, N.; Marinova, M. Secure Heterogeneous Architecture based on RISC-V and root-of-trust. In Proceedings of the 24th International Conference on Computer Systems and Technologies, Ruse, Bulgaria, 16–17 June 2023. [Google Scholar] [CrossRef]
  46. Gonzalez, A.; Zhao, J.; Korpan, B.; Genç, H.; Schmidt, C.; Wright, J.; Biswas, A.; Amid, A.; Sheikh, F.; Sorokin, A.; et al. A 16mm2 106.1 GOPS/W Heterogeneous RISC-V Multi-Core Multi-Accelerator SoC in Low-Power 22nm FinFET. In Proceedings of the ESSCIRC 2021—IEEE 47th European Solid State Circuits Conference (ESSCIRC), Grenoble, France, 13–22 September 2021; pp. 259–262. [Google Scholar] [CrossRef]
  47. Kamaleldin, A.; Hesham, S.; Göhringer, D. Towards a Modular RISC-V Based Many-Core Architecture for FPGA Accelerators. IEEE Access 2020, 8, 148812–148826. [Google Scholar] [CrossRef]
  48. Jia, H.; Valavi, H.; Tang, Y.; Zhang, J.; Verma, N. A Programmable Heterogeneous Microprocessor Based on Bit-Scalable In-Memory Computing. IEEE J. Solid-State Circuits 2020, 55, 2609–2621. [Google Scholar] [CrossRef]
  49. Dakić, V.; Kovač, M.; Slovinac, J. Evolving High-Performance Computing Data Centers with Kubernetes, Performance Analysis, and Dynamic Workload Placement Based on Machine Learning Scheduling. Electronics 2024, 13, 2651. [Google Scholar] [CrossRef]
  50. Vohra, D. Installing Kubernetes Using Docker. In Kubernetes Microservices with Docker; Apress: Berkeley, CA, USA, 2016; pp. 3–38. [Google Scholar] [CrossRef]
  51. Chen, C.; Hung, M.; Lai, K.; Lin, Y. Docker and Kubernetes. In Industry 4.1; The Institute of Electrical and Electronics Engineers, Inc.: Piscataway, NJ, USA, 2021; pp. 169–213. [Google Scholar] [CrossRef]
  52. Menegidio, F.B.; Jabes, D.L.; Costa de Oliveira, R.; Nunes, L.R. Dugong: A Docker Image, Based on Ubuntu Linux, Focused on Reproducibility and Replicability for Bioinformatics Analyses. Bioinformatics 2017, 34, 514–515. [Google Scholar] [CrossRef]
  53. El Haj Ahmed, G.; Gil-Castiñeira, F.; Costa-Montenegro, E. KubCG: A Dynamic Kubernetes Scheduler for Heterogeneous Clusters. Softw. Pract. Exp 2020, 51, 213–234. [Google Scholar] [CrossRef]
  54. Eiermann, A.; Renner, M.; Großmann, M.; Krieger, U.R. On a Fog Computing Platform Built on ARM Architectures by Docker Container Technology. Commun. Comput. Inf. Sci. 2017, 717, 71–86. [Google Scholar] [CrossRef]
  55. Fornari, F.; Cavalli, A.; Cesini, D.; Falabella, A.; Fattibene, E.; Morganti, L.; Prosperini, A.; Sapunenko, V. Distributed Filesystems (GPFS, CephFS and Lustre-ZFS) Deployment on Kubernetes/Docker Clusters. In Proceedings of the International Symposium on Grids & Clouds 2021—PoS (ISGC2021), Taipei, Taiwan, 22–26 March 2021. [Google Scholar] [CrossRef]
  56. Lumpp, F.; Barchi, F.; Acquaviva, A.; Bombieri, N. On the Containerization and Orchestration of RISC-V Architectures for Edge-Cloud Computing. In Proceedings of the 3rd Eclipse Security, AI, Architecture and Modelling Conference on Cloud to Edge Continuum, Ludwigsburg, Germany, 17 October 2023. [Google Scholar] [CrossRef]
  57. Kubernetes.io. Available online: https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/ (accessed on 28 August 2024).
  58. Butler, S.; Gamalielsson, J.; Lundell, B.; Brax, C.; Persson, T.; Mattsson, A.; Gustavsson, T.; Feist, J.; Öberg, J. An Exploration of Openness in Hardware and Software Through Implementation of a RISC-V Based Desktop Computer. In Proceedings of the 18th International Symposium on Open Collaboration, Madrid, Spain, 7–9 September 2022. [Google Scholar] [CrossRef]
  59. Zaruba, F.; Benini, L. The Cost of Application-Class Processing: Energy and Performance Analysis of a Linux-Ready 1.7-GHz 64-Bit RISC-V Core in 22-Nm FDSOI Technology. IEEE Trans. VLSI Syst. 2019, 27, 2629–2640. [Google Scholar] [CrossRef]
  60. Miura, J.; Miyazaki, H.; Kise, K. A Portable and Linux Capable RISC-V Computer System in Verilog HDL. arXiv 2020, arXiv:2002.03576. [Google Scholar] [CrossRef]
  61. Ubuntu.com. Ubuntu Tutorials. Available online: https://ubuntu.com/tutorials/how-to-install-ubuntu-on-risc-v-hifive-boards#4-installing-ubuntu-to-an-nvme-drive-only-for-unmatched (accessed on 28 August 2024).
  62. GitHub.com. Available online: https://github.com/carlosedp/riscv-bringup/releases/download/v1.0/kubernetes_1.16.0_riscv64.deb (accessed on 9 August 2024).
  63. GitHub.com. Available online: https://github.com/carlosedp/riscv-bringup/blob/master/kubernetes/Readme.md (accessed on 9 August 2024).
  64. GitHub.com. Available online: https://github.com/vEddYcro/HPCC-HetComp (accessed on 10 August 2024).
  65. Kovač, M.; Dragić, L.; Malnar, B.; Minervini, F.; Palomar, O.; Rojas, C.; Olivieri, M.; Knezović, J.; Kovač, M. FAUST: Design and Implementation of a Pipelined RISC-V Vector Floating-Point Unit. Microprocess. Microsyst. 2023, 97, 104762. [Google Scholar] [CrossRef]
  66. Simakov, N.A.; Deleon, R.L.; White, J.P.; Jones, M.D.; Furlani, T.R.; Siegmann, E.; Harrison, R.J. Are We Ready for Broader Adoption of ARM in the HPC Community: Performance and Energy Efficiency Analysis of Benchmarks and Applications Executed on High-End ARM Systems. In Proceedings of the HPC Asia 2023 Workshops, Singapore, 27 February–2 March 2023. [Google Scholar] [CrossRef]
  67. Elwasif, W.; Godoy, W.; Hagerty, N.; Harris, J.A.; Hernandez, O.; Joo, B.; Kent, P.; Lebrun-Grandie, D.; Maccarthy, E.; Melesse Vergara, V.; et al. Application Experiences on a GPU-Accelerated Arm-Based HPC Testbed. In Proceedings of the HPC Asia 2023 Workshops, Singapore, 27 February–2 March 2023. [Google Scholar] [CrossRef]
  68. Godoy, W.F.; Valero-Lara, P.; Dettling, T.E.; Trefftz, C.; Jorquera, I.; Sheehy, T.; Miller, R.G.; Gonzalez-Tallada, M.; Vetter, J.S.; Churavy, V. Evaluating Performance and Portability of High-Level Programming Models: Julia, Python/Numba, and Kokkos on Exascale Nodes. In Proceedings of the 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), St. Petersburg, FL, USA, 15–19 May 2023. [Google Scholar] [CrossRef]
  69. Christofas, V.; Amanatidis, P.; Karampatzakis, D.; Lagkas, T.; Goudos, S.K.; Psannis, K.E.; Sarigiannidis, P. Comparative Evaluation between Accelerated RISC-V and ARM AI Inference Machines. In Proceedings of the 2023 6th World Symposium on Communication Engineering (WSCE), Thessaloniki, Greece, 27–29 September 2023. [Google Scholar] [CrossRef]
  70. Cococcioni, M.; Rossi, F.; Ruffaldi, E.; Saponara, S. Vectorizing Posit Operations on RISC-V for Faster Deep Neural Networks: Experiments and Comparison with ARM SVE. Neural Comput. Appl. 2021, 33, 10575–10585. [Google Scholar] [CrossRef]
  71. Verma, V.; Stan, M.R. AI-PiM—Extending the RISC-V Processor with Processing-in-Memory Functional Units for AI Inference at the Edge of IoT. Front. Electron. 2022, 3, 898273. [Google Scholar] [CrossRef]
Figure 1. NVMe SSD speed comparison: Turing RK1 vs. SiFive platform.
Figure 2. Docker deployment on the RISC-V platform is now available.
Figure 3. Single-core CPU performance for all platforms.
Figure 4. Memory performance with a single CPU core used for all platforms.
Figure 5. Average compute latency of all available platforms in single-core scenario.
Figure 6. All-core CPU performance for all platforms.
Figure 7. Memory performance with all CPU cores used for all platforms.
Figure 8. Average compute latency of all available platforms in an all-core scenario.
Figure 9. HPL TFLOPS evaluation for all platforms (more is better).
Figure 10. HPL evaluation—time required for HPL evaluation to complete (less is better).
Figure 11. DGEMM double precision scores.
Figure 12. PTRANS and RandomAccess evaluations for large arrays.
Figure 13. HPCC STREAM performance evaluation.
Figure 14. HPCC FFT performance analysis.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
