Electronics
  • Article
  • Open Access

3 September 2024

Evaluating ARM and RISC-V Architectures for High-Performance Computing with Docker and Kubernetes

1 Department of Cybersecurity and System Engineering, Algebra University, 10000 Zagreb, Croatia
2 Department of Information Systems and Business Analytics, Algebra University, 10000 Zagreb, Croatia
3 Department of Software Engineering, Algebra University, 10000 Zagreb, Croatia
* Author to whom correspondence should be addressed.
This article belongs to the Section Computer Science & Engineering

Abstract

This paper thoroughly assesses the ARM and RISC-V architectures in the context of high-performance computing (HPC). It includes an analysis of Docker and Kubernetes integration. Our study aims to evaluate and compare these systems’ performance, scalability, and practicality in a general context and then assess the impact they might have on special use cases, like HPC. ARM-based systems exhibited better performance and seamless integration with Docker and Kubernetes, underscoring their advanced development and effectiveness in managing high-performance computing workloads. On the other hand, despite their open-source architecture, RISC-V platforms presented considerable intricacy and difficulties in working with Kubernetes, which hurt their overall effectiveness and ease of management. The results of our study offer valuable insights into the practical consequences of implementing these architectures for HPC, highlighting ARM’s preparedness and the potential of RISC-V while acknowledging the increased complexity and significant trade-offs involved at this point.

1. Introduction

HPC has played a leading role in pushing technological improvements, particularly in scientific research, weather forecasting, financial modeling, and other sectors that rely on extensive computational capabilities. Historically, the x86 architecture, introduced by Intel and AMD, has been the prevailing force in the field of HPC. The advancement of this design, characterized by ongoing enhancements in computational capability, parallel processing, and energy efficiency, has facilitated the creation of some of the most formidable supercomputers globally. The Summit supercomputer at Oak Ridge National Laboratory and Frontera at the Texas Advanced Computing Center illustrate the exceptional performance of the x86 architecture in HPC.
Nevertheless, the increasing need for computational capacity, along with the necessity for energy-saving solutions, has stimulated interest in alternative architectures. ARM, initially developed for energy-efficient use in mobile devices, has gained attention in HPC due to its energy efficiency and expanding processing capabilities. ARM made its HPC debut with the Fujitsu A64FX processor in the Fugaku supercomputer. This processor has achieved remarkable success in terms of both performance and power efficiency, leading it to secure the top position on the TOP500 list of the world's most powerful supercomputers.
The ARM and RISC-V architectures are separate but very capable ISAs (Instruction Set Architectures) in contemporary computing. ARM has become a prominent and influential player, especially in the mobile and embedded industries, thanks to its well-developed network of resources, comprehensive assistance, and demonstrated history of success. Renowned for its exceptional power economy and superior performance, this technology is the favored option for various consumer electronics, from smartphones to IoT devices. The proprietary nature of ARM guarantees a resilient environment with significant support from the industry, but it also restricts the ability to customize and adapt.
Conversely, RISC-V is a developing open-source option quickly gaining popularity, especially in academic and specialized sectors. RISC-V’s open ISA enables developers to customize their designs for specific applications without being limited by licensing fees. This level of transparency promotes creativity and cooperation among individuals worldwide. Although ARM offers a dependable and well-established platform, RISC-V’s ability to be customized and its open nature make it a potentially revolutionary technology, especially in fields that require specialized computing solutions.
The HPC community has shown great interest in using both architectures: ARM for its already well-known capabilities, and RISC-V for its customization options and the advantages of an open-source paradigm. However, as discussed in Section 4, the intricate nature of incorporating RISC-V into contemporary software ecosystems, including containerization technologies such as Docker and orchestration tools like Kubernetes, poses distinct difficulties.
Docker and Kubernetes provide a streamlined and effective method for deploying and overseeing applications in many settings, improving scalability, uniformity, and security. In high-performance computing, isolated and reproducible environments are a must; for science experiments, reproducibility and security are paramount to the scientific method. This approach can reduce the effort required for deployment and enhance the efficient exploitation of resources. Standardization expedites development, cooperation, and portability while guaranteeing the smooth operation of applications across many system architectures. Containers are beginning to play a vital role in optimizing HPC workflows, lowering overhead compared to physical or virtualized workloads, and minimizing costs by enabling more efficient utilization of hardware resources. Therefore, introducing containerization technologies like Docker and Kubernetes can fundamentally transform the management and deployment of HPC workloads. These technologies provide the capacity to quickly move and adapt applications to different computing environments while improving efficiency. By integrating these technologies with ARM and RISC-V platforms, HPC can achieve higher levels of performance and efficiency. However, combining these systems presents challenges, mainly due to architectural differences and varying levels of software support.
This paper is organized as follows: In the next two sections, we discuss related research and the basics of ARM and RISC-V as platforms for Docker and Kubernetes. After those two sections, we describe our experimental test and setup environment, followed by the performance evaluations, a discussion of performance and feasibility for HPC environments, future work, and the conclusion.

3. ARM as a Platform for Docker and Kubernetes

ARM processors are increasingly used to deploy Docker and Kubernetes on Ubuntu because of their energy efficiency, scalability, and cost-effectiveness. This is particularly advantageous in cloud computing and edge contexts. ARM’s RISC architecture is highly efficient at processing high-throughput workloads while consuming less power. This makes it an excellent option for running containerized apps using Docker and orchestrating them with Kubernetes. These benefits are especially noticeable when energy economy and cost-effectiveness are crucial, such as in extensive cloud data centers and dispersed edge computing configurations.
Utilizing ARM processors with Docker on Ubuntu enables developers to generate compact and adaptable containers capable of operating on many platforms, hence offering versatility in deploying applications. The ARM architecture is compatible with multiple Linux distributions, such as Ubuntu, making it a flexible choice for developers who want to utilize containerization technologies. Docker is highly efficient on ARM processors because of their capacity to manage concurrent operations with reduced energy requirements compared to standard x86 processors. Docker packages apps and their dependencies into containers. Efficiency is paramount when implementing services that must be scaled over several nodes, as shown in extensive cloud infrastructures or distributed networks [50].
Kubernetes boosts the functionality of ARM processors by effectively managing and orchestrating Docker containers in a scalable manner. It enables the automatic deployment, scaling, and management of application containers across groups of hosts, offering a framework that guarantees the reliability and resilience of applications. Integrating ARM processors and Kubernetes on Ubuntu provides a robust solution for delivering microservices and other cloud-native applications necessitating comprehensive orchestration. Kubernetes’ capacity to scale and oversee containers over a wide range of nodes, including those utilizing ARM processors, guarantees effective deployment and management of applications, even in different environments [51,52].
Furthermore, researchers have conducted several experiments to investigate the integration of Kubernetes with ARM processors to enhance performance and optimize resource consumption. An example is research conducted on the KubCG platform, which showcased the effectiveness of a dynamic Kubernetes scheduler in enhancing container deployment in clusters with diverse architectures, such as those including ARM processors. The utilization of ARM processors in managing containerized workloads using Kubernetes has demonstrated a notable decrease in job completion time, highlighting the potential for enhanced efficiency. A different research study emphasized the utilization of ARM-based fog computing platforms that employ Docker and Kubernetes for effective data processing at the network edge, further confirming the appropriateness of ARM processors in situations that need both scalability and low latency [53,54].
The combination of ARM processors, Docker, and Kubernetes is seen in the implementation of distributed file systems, which are crucial for efficiently handling data over extensive clusters. Studies have demonstrated that deploying distributed file systems such as CephFS and Lustre-ZFS on ARM-based Kubernetes clusters can enhance the flexibility of data management and the dependability of services. This is especially advantageous in contemporary data centers and cloud environments requiring fast data transfer rates and reliable operations [55].
Using Docker and Kubernetes on Ubuntu operating on ARM processors offers a resilient and effective solution for contemporary cloud computing and edge scenarios [56]. The combination utilizes ARM’s energy-efficient and scalable technology, Docker’s containerization capabilities, and Kubernetes’ powerful orchestration to provide high-performance, cost-effective, and scalable solutions for various applications. This is why cloud providers are partially switching to ARM-based platforms for Kubernetes environments, as they offer excellent performance for most everyday applications while being more efficient than x86 platforms.
For these evaluations, we decided to use the TuringPi2 platform on the ARM side. This platform offers a motherboard that connects four compute modules (similar to Supermicro Twin servers). Multiple models of ARM-based compute modules can be installed in it, which supports the platform's modularity and easy reconfiguration with different modules.
Deployment on the TuringPi2 platform was fairly involved, largely because of our hardware choice, which was driven by the lack of availability of, for example, Ampere-based servers. Specifically, we had to:
  • Flash the image to the module using the TuringPi2 web interface (for RK1) or the Seeed Studio Development Board Kit connected to an Ubuntu-based laptop with a specific Ubuntu release and NVIDIA’s SDK Manager;
  • Power on the module to enter the default installation;
  • Configure output on TuringPi2 to output from the module via HDMI.
The TuringPi2 platform has HDMI output available, so we could use a GUI if required. This is much more convenient than using the USB serial console, which is easy to break physically on our RISC-V platform.
There are multiple reasons why we selected the TuringPi2 platform for these evaluations:
  • Other ARM-based platforms with sufficient performance were unavailable on the market;
  • It offers easily replaceable ARM-based modules that we could use for evaluations;
  • It is very price-efficient for getting to know a new platform;
  • It was widely available and already had a foothold on the market.

3.1. Docker Deployment

Docker deployment on the ARM platform is straightforward. A set of Docker packages is available in the Ubuntu repositories for ARM. Hence, the installation process for Docker requires one apt command on the Ubuntu Server:
apt -y install docker.io
Even if there were no packages, compiling Docker from source is not a complex task; it takes a couple of hours, but Docker works afterward. After installation, Docker's containerization features are fully available, at feature parity with the x86 platform, which makes the deployment experience equal to what we are used to on x86 platforms. Let us now see whether the same applies to deploying Kubernetes on the ARM platform.
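As a quick sanity check after installation, the following commands confirm that the Docker daemon runs natively on the ARM (aarch64) host; this is a minimal sketch, with hello-world used only because it is a readily available multi-architecture test image:
uname -m                            # should report aarch64 on the Turing RK1
docker run --rm hello-world         # pulls the arm64 variant of the multi-arch test image
docker info | grep -i architecture  # the daemon reports the native architecture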

3.2. Kubernetes Deployment

Kubernetes deployment has always been more complex than Docker deployment, as it is a much bigger platform. However, deployment on ARM closely resembles the deployment process on x86 counterparts. The detailed installation procedure is available online [57].
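For orientation, a typical kubeadm-based deployment on Ubuntu for ARM follows the same steps as on x86. The following is a minimal sketch that assumes the upstream Kubernetes apt repository has already been configured as described in the referenced guide [57]; the pod network CIDR, CNI manifest, token, and hash are placeholders:
apt -y install kubeadm kubelet kubectl
# on the first (control-plane) node:
kubeadm init --pod-network-cidr=10.244.0.0/16
# deploy a CNI plugin of choice, then join the remaining nodes:
kubectl apply -f <cni-manifest.yaml>
kubeadm join <control-plane-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>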

4. RISC-V as a Platform for Docker and Kubernetes

The RISC-V architecture has become increasingly popular in recent years because of its open-source nature, which enables more customization and freedom in designing processors. This architecture's scalability and cost-effectiveness make it suitable for cloud computing, IoT, and edge computing applications. However, several obstacles and constraints affect RISC-V performance on the SiFive HiFive Unmatched Rev B platform, the only widely available RISC-V-based platform on the market. This platform is based on the SiFive Freedom U740 RISC-V SoC. The first obstacle we will mention is the very slow NVMe controller. Figure 1 clearly shows the difference in performance between the NVMe controller on the TuringPi2 platform and the SiFive platform based on the U740 RISC-V SoC with the same SSD.
Figure 1. NVMe SSD speed comparison: Turing RK1 vs. SiFive platform.
Operationally speaking, SiFive's RISC-V platform has one big issue: it is unable to boot from NVMe, i.e., it only boots from microSD, which is much slower. For reference, we are talking about 50 MB/s cached reads and 1.51 MB/s buffered disk reads, which makes the microSD unusable except for the initial boot and a bit of configuration to make the platform use NVMe as the Ubuntu root partition drive. Even regular package deployment can become unusably slow if we went down the microSD route, which is not recommended. This would be a huge issue if we wanted to run containers from a local disk.
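Figures of this kind can be reproduced with hdparm; a minimal sketch, assuming the microSD card appears as /dev/mmcblk0 and the NVMe SSD as /dev/nvme0n1:
hdparm -T /dev/mmcblk0    # cached reads on the microSD card
hdparm -t /dev/mmcblk0    # buffered disk reads on the microSD card
hdparm -tT /dev/nvme0n1   # the same two measurements on the NVMe SSD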
Implementing Docker on RISC-V is reasonably seamless, thanks to the flexibility of Linux as a platform, which serves as the foundation for Ubuntu and Docker’s containerization technologies. Nevertheless, difficulties arise in the process of coordinating these containers using Kubernetes. Kubernetes is essential for efficiently managing large-scale containerized applications commonly found in cloud computing settings. Regrettably, there is no complete and officially endorsed version of Kubernetes available for the RISC-V architecture, and there is also a significant lack of available RISC-V-compatible Docker containers with which to work. For example, there is no official Ubuntu RISC-V image available at the time of writing this paper. Therefore, the potential for implementing Kubernetes in a production setting on RISC-V processors is significantly restricted [49,56].
The sole existing binary package of Kubernetes for RISC-V is version 1.16, providing solely fundamental services. As a result of this constraint, certain sophisticated functionalities of Kubernetes, including automatic recovery, scalability, and gradual upgrades, may not operate as intended or necessitate substantial adjustments and customization. Furthermore, the absence of support from the upstream source means that any upgrades or security patches must be applied manually, making it more complicated and increasing the risks involved in maintaining a Kubernetes cluster on RISC-V [56].
Notwithstanding these obstacles, endeavors have been made to narrow the divide. An orchestration platform called KubeEdge-V has been created explicitly for RISC-V computers. This platform establishes the essential elements necessary to facilitate the fundamental functionalities of containerization and orchestration. It has undergone testing on a prototype system utilizing SiFive processors. Nevertheless, this solution is under development and does not offer the complete array of functionality that Kubernetes provides on well-established architectures such as x86 or ARM [56,58].
RISC-V processors present promising opportunities for open-source hardware and software ecosystems [59,60]. The utilization of Docker and Kubernetes on these processors, particularly on Ubuntu, is still at an early stage of development. The absence of a comprehensively endorsed Kubernetes version and the restricted capabilities of the current binary package are substantial obstacles to extensive adoption. Continued progress and assistance from the community will be essential in overcoming these obstacles and fully harnessing the capabilities of RISC-V in cloud-native settings.
First, Linux must be deployed on the set of RISC-V nodes. The deployment process for these platforms is more involved than on an x86 platform. That is partially due to the hardware choices we made and partially due to the immaturity of these platforms. Deployment for the SiFive-based RISC-V platform was as painless as possible [61]:
  • Downloading the Ubuntu Server 24.04 RISC-V image;
  • Unpacking the image and flashing it to an SD card for installation using Raspberry Pi Imager (or dd, if Linux is used);
  • Connecting the serial console and following the standard Ubuntu boot procedure.
After that, it is prudent to make the board boot the root filesystem from the NVMe drive, which is much faster than the microSD. This requires changing a few settings in the u-boot configuration files. The procedure is documented on the Ubuntu Tutorials homepage [60] and requires the following steps (a command-level sketch follows the list):
  • Downloading the Ubuntu RISC-V image on the RISC-V system booted from the microSD card;
  • Finding the corresponding NVMe device entry in the Linux /dev filesystem (usually /dev/nvme0n1);
  • Writing the image to the NVMe device using dd;
  • Mounting the new NVMe-hosted root filesystem to /mnt and chrooting into it;
  • Changing the u-boot configuration and applying it;
  • Rebooting the system.
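A minimal command-level sketch of these steps follows; the image file name and partition layout are assumptions and must be adapted to the actual downloaded image:
# unpack the downloaded image (file name is a placeholder)
xz -d ubuntu-24.04-preinstalled-server-riscv64.img.xz
# write the image to the NVMe device identified under /dev
dd if=ubuntu-24.04-preinstalled-server-riscv64.img of=/dev/nvme0n1 bs=4M status=progress
# mount the new NVMe-hosted root filesystem (partition number depends on the image layout) and chroot into it
mount /dev/nvme0n1p1 /mnt
chroot /mnt
# adjust the u-boot configuration inside the chroot, exit the chroot, and reboot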
After that, a system-wide upgrade to the latest packages (apt-get -y upgrade) is recommended, and a reboot is mandatory after the new kernel has been deployed. The following steps involve installing Docker and Kubernetes (if possible), which we will do in the following two sub-sections.

4.1. Docker Deployment

Since we started preparing this paper a couple of months ago, the situation with Docker deployment has improved immensely. A set of Docker packages is now available in the Ubuntu 24.04 repositories for RISC-V, so the installation process for Docker is straightforward:
apt -y install docker.io
This is a recent development in the Ubuntu/RISC-V world, as these packages were unavailable when we started writing this paper. The results are visible in Figure 2.
Figure 2. Docker deployment on the RISC-V platform is now available.
Even if there were no packages, compiling Docker from source is not a big challenge; it takes a couple of hours, but Docker works afterward.
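As on ARM, a quick check confirms that the daemon runs natively on riscv64; this is a minimal sketch, and the riscv64/alpine image name is an assumption about what is available in the registry being used:
uname -m                                  # should report riscv64
docker run --rm riscv64/alpine uname -m   # prints riscv64 if the pull and run succeed
docker info | grep -i architecture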

4.2. Kubernetes Deployment

Unfortunately, Kubernetes still does not have current upstream packages for RISC-V. In this sense, there are three available options:
  • To compile Kubernetes from source code (a very complex task that will not necessarily end up being successful);
  • To use the only available Kubernetes binary package for RISC-V (version 1.16, from September 2019), available online [62];
  • To install and run k3s.
K3s is a Kubernetes distribution with a much smaller footprint; it uses fewer resources and is more straightforward to configure, albeit with a limited set of options, and it is not meant to be scalable and highly available for production-level environments. It is also much more limited in features and extensions, while offering limited compatibility with standard K8s tools and extensions. We used three RISC-V nodes based on SiFive HiFive Unmatched boards and deployed the available, minimal Kubernetes v1.16 package to evaluate whether RISC-V makes sense as a platform for K8s workloads. However, we must also note one simple fact: this package does not contain the full K8s environment with all modules and addons; it contains only the minimum services, such as:
  • A set of required services and binaries, such as kubectl, kubeadm, etc.;
  • kube-apiserver;
  • kube-controller-manager;
  • kube-scheduler;
  • kube-proxy;
  • pause (for the pod network namespace);
  • etcd;
  • CoreDNS.
First and foremost, a couple of dependencies must be deployed before the K8s v1.16 package deployment. We need to employ a set of commands as described on Carlos Eduardo's GitHub page [62,63]. Since that GitHub page was created, many new Docker versions have been released, so warnings about K8s v1.16 not being compatible with, for example, Docker 24.0.7 are to be expected.
After the package deployment on our three nodes, the Kubernetes cluster works, and we can conduct a performance evaluation with it. However, we also need to point out that this package version is five years old and is missing many new features that were introduced during that time, such as:
  • Changes to Ingress controller (v1.18);
  • Better CLI support, logging, new APIs, CSI health monitoring (v1.19);
  • Docker deprecation (v1.20);
  • Changes to Kubelet logging, storage capacity tracking (v1.21);
  • External credential providers support (v1.22);
  • Dual-stack IPv4/IPv6 networking, HorizontalPodAutoscaler v2 changes (v1.23);
  • Removal of Dockershim from Kubelet, changes in storage plugins (v1.24);
  • cgroups v2 support, further changes in storage plugins (v1.25);
  • API changes (v1.26);
  • iptables performance improvements (v1.27);
  • Changes to Ceph support (removal of the CephFS plugin in favor of CephFS CSI driver) (v1.28), etc.
Furthermore, many stability issues exist when deploying Kubernetes from the binary package on Ubuntu 24.04. The kubelet service times out occasionally (even during the cluster initialization phase), containers sometimes fail to start, and there are issues with networking and firewalling and problems with the cgroups v2 subsystem. However, we got it up and running and ran tests to understand how this platform performs compared to the ARM-based platforms.
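Once the cluster initializes, standard kubectl checks are enough to confirm that the nodes and core components are healthy; a minimal sketch:
kubectl get nodes -o wide   # all three SiFive-based nodes should eventually report Ready
kubectl get pods -A         # core components (etcd, apiserver, scheduler, proxy, coredns) should reach Running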

5. Experimental Setup and Study Methodology

When we started working on this paper a couple of years ago, the priority was to get access to hardware to do real-life performance evaluations, not to write about theory and technical marketing. Years later, these platforms are still challenging to get, especially in volume. The availability of ARM servers in the EU region is poor, and RISC-V availability is even worse, although it has been years since various vendors promised they would be available. The situation is becoming somewhat better in 2024, but still, no high-performance RISC-V processors are available, and the availability of, for example, ARM Ampere-based multicore systems is not much better.
Ultimately, we opted to base our software and performance evaluations on readily available platforms: a set of TuringPi2 platforms plus a selection of ARM-based compute modules for the ARM systems, and the SiFive HiFive Unmatched Rev B for RISC-V. For the TuringPi2 compute modules, we acquired the Turing RK1, Raspberry Pi CM4 modules, and the NVIDIA Jetson TX2 NX. The Turing RK1, based on the Rockchip RK3588, is by far the most performant module within the price envelope. At the time of writing, the price of the TuringPi2 cluster board plus an RK1 was comparable to that of the SiFive Unmatched Rev B once we add the cost of memory, which is an extra cost for the RISC-V board. This price similarity gave us a good baseline to work with.
Regarding performance evaluations, we focused on a stack of CPU, memory, and disk evaluations implemented by a set of custom containers managed by Kubernetes. This means that all the scores are from the perspective of an Alpine container with the necessary tools (stress-ng, sysbench, etc.) installed inside. There was no point in using any GPU tests, as GPUs are far from being supported on the RISC-V platform, which would make the comparison moot. However, we will reflect on that in our Discussion section to provide the proper context.
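To illustrate the kind of containerized test this implies, the following is a minimal sketch of running stress-ng inside an Alpine-based container; the image name and package source are assumptions rather than our exact container definitions:
docker run --rm alpine sh -c "apk add --no-cache stress-ng && stress-ng --cpu 1 --metrics-brief --timeout 60s"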
For HPC performance evaluations, we decided to use a standard set of performance evaluations based on HPCC (HPC Challenge), as it contains different test suites and gives us a broad performance evaluation for various types of workloads. First and foremost, HPCC needed to be compiled for each of these platforms. For that, we also had to compile the OpenBLAS (Open Basic Linear Algebra Subprograms) library and then compile HPCC (which required a custom Makefile per platform); all of that was then merged into a per-platform Docker container image to keep the methodology constant across all performance evaluations. We used the latest OpenBLAS library (v0.3.28) and the latest version of HPCC (1.5.0). Also, as we used Ubuntu Linux 24.04 across all our platforms, we had to install some dependencies, which was performed via the same command on all platforms:
apt -y install build-essential hwloc libhwloc-dev libevent-dev gfortran libblas-dev liblapack-dev mpich libopenmpi-dev make
On our GitHub page dedicated to this paper [64], we published the procedure for compiling OpenBLAS, installing these dependencies, and the finished Makefiles for HPCC for all platforms. The configuration and compilation processes for these utilities take quite a while, so we publish these configuration details for transparency, in case someone needs them for verification.
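For orientation, the overall build and run flow looks roughly like the following; this is a minimal sketch, the make target name and install prefix are assumptions specific to one platform, and the exact per-platform Makefiles are on the GitHub page [64]:
# build and install OpenBLAS (run inside the OpenBLAS source tree)
make -j"$(nproc)"
make PREFIX=/opt/openblas install
# build HPCC using a per-platform Make.<arch> file placed in its hpl/ directory (arch name is an assumption)
make arch=Linux_ARM64
# run the benchmark under MPI; hpccinf.txt holds the problem configuration and results go to hpccoutf.txt
mpirun -np 4 ./hpcc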

6. ARM and RISC-V Performance Evaluation

We used a set of standardized tests for performance evaluation, like stress-ng and sysbench, where available (sysbench is not supported on the RISC-V architecture). We focused on CPU and memory performance paired with power usage, as this seemed like a reasonable scenario: these platforms should be efficient compared to x86 platforms. We used an HP ProLiant Gen8 server based on an Intel Xeon E5-2680 CPU for context; we wanted to see how the performance of all these RISC-V and ARM platforms stacks up against a similarly priced x86 CPU, even though the E5-2680 is a twelve-year-old CPU. The RK3588 processor mentioned in the performance evaluations is the CPU on the Turing RK1 compute module.
Stress-ng has a set of different metrics that we need to cover before moving on to the following sub-section, specifically:
  • bogo ops—the number of bogo ops (bogus operations) completed overall by the workload performing the evaluation;
  • bogo ops/s (real time)—the number of bogo ops divided by the overall (wall-clock) run time of the evaluation;
  • bogo ops/s (usr+sys time)—the number of bogo ops divided by the combined user and system CPU time used by the evaluation's workload.
It is also worth remembering that stress-ng can set the number of CPU cores used for performance evaluation (parameter --cpu).
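For reference, the invocations behind these metrics look roughly like the following; a minimal sketch with arbitrary run times, where the --cpu value switches between single-core and all-core runs:
stress-ng --cpu 1 --metrics-brief --timeout 60s   # single-core run; reports bogo ops and bogo ops/s
stress-ng --cpu 0 --metrics-brief --timeout 60s   # --cpu 0 loads all online CPU cores
sysbench cpu run                                  # CPU test; reports average event latency
sysbench memory run                               # memory throughput test (not available on RISC-V)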
Let us start with single-core performance, as this is very important when dealing with various types of workloads based on containers.

6.1. Single-Core Performance

In single-core performance tests, the Turing RK1 ARM-based system wins considerably. What is surprising is that all the other ARM-based platforms and the RISC-V-based U740 are nowhere near it in that respect. We do have to note, though, that the NVIDIA Jetson TX2 NX has a built-in GPU with 256 CUDA (Compute Unified Device Architecture) cores, which is one of the reasons why its CPU part loses by such a margin, as can be seen in Figure 3.
Figure 3. Single-core CPU performance for all platforms.
Regarding memory performance, the Turing RK1 ARM-based system is miles ahead of everything else. The interesting part is that it is also significantly faster than our x86-based system, as can be seen in Figure 4.
Figure 4. Memory performance with a single CPU core used for all platforms.
The RISC-V-based system (U740) posts no score in the following performance chart because the sysbench latency test cannot be run on it. Excluding that, we can see that the Turing RK1 is still much better than anything else, including other ARM CPUs that also have memory built in, as can be seen in Figure 5.
Figure 5. Average compute latency of all available platforms in single-core scenario.
We will continue our performance evaluations with multi-core performance tests to see how performance scales across all available cores. The E5-2680 has 32 cores with Hyper-Threading enabled, the RK3588 is an 8-core CPU, and all the other CPUs have 4 cores. We can expect this to impact performance results significantly, but that is the whole point: we need a complete overview of each platform's performance.

6.2. All-Core Performance

With 32 available x86 cores and a significantly higher clock frequency, it is no wonder that the E5-2680 is far ahead of every other CPU, but that is also not the point. If we evaluate all the other platforms, we can again see the Turing RK1 compute module being far ahead of all the other assessed platforms, as can be seen in Figure 6.
Figure 6. All-core CPU performance for all platforms.
Memory performance for the all-core scenario continues the same trend, as the Rockchip RK3588-based Turing RK1 compute module still has a significant lead compared to the RISC-V and other ARM platforms, as can be seen in Figure 7.
Figure 7. Memory performance with all CPU cores used for all platforms.
Again, ignoring that we cannot perform latency testing on the RISC-V platform (U740 chip), the Turing RK1 still does very well, although the A72-based CPU has a tiny bit less latency (0.57 ms vs. 0.58 ms). However, this time, we need to compare this latency to the x86-based system, as there is a world of difference between them: the x86-based system has almost three times the latency of the ARM-based platforms. This is one of the fundamental issues with x86 platforms in general: memory sits farther from the CPU, and the built-in caches cannot fully compensate compared to CPUs with on-chip memory, as ARM chips have. Intel and AMD have announced that this issue will be addressed in future x86 chips over the next couple of generations, as this design feature has a detrimental influence on performance. The average latency of all platforms can be seen in Figure 8.
Figure 8. Average compute latency of all available platforms in an all-core scenario.
ARM-based systems are a much better choice for CPU- or memory-intensive workloads. It is surprising how much faster they are, especially compared to the similarly priced RISC-V platform. Let us now perform some essential HPC-related evaluations to see what performance we can expect from these platforms.

6.3. HPC Performance

The first stack of tests for our HPC performance evaluation relates to HPL (High-Performance Linpack) in terms of available TFLOPS (tera floating-point operations per second) and HPL time (the time required to finish the evaluation). The TFLOPS evaluation shows how much faster the evaluated ARM-based platforms are than the RISC-V platform (especially the Turing RK1 vs. the U740 RISC-V SoC), as can be seen in Figure 9.
Figure 9. HPL TFLOPS evaluation for all platforms (more is better).
HPL time measures the time needed to finish the evaluation; the shorter the time, the more performant the platform. The results can be seen in Figure 10.
Figure 10. HPL evaluation—time required for HPL evaluation to complete (less is better).
DGEMM, part of the HPCC benchmark, measures double-precision floating-point matrix-to-matrix multiplication performance. These performance evaluations, as well as some that follow, should scale similarly to the HPL scores, and they do, as can be seen in Figure 11.
Figure 11. DGEMM double precision scores.
PTRANS evaluates the parallel matrix transpose capabilities of our platforms. RandomAccess, shown in the same figure, measures random-access performance for large data arrays in multicore scenarios. The evaluation results can be seen in Figure 12.
Figure 12. PTRANS and RandomAccess evaluations for large arrays.
HPCC STREAM evaluates the sustainable memory bandwidth in GB/s. The RISC-V platform falls to the bottom here, but the ARM platforms are surprisingly close to the x86 platform (especially considering that they consume much less power), as can be seen in Figure 13.
Figure 13. HPCC STREAM performance evaluation.
The last set of performance evaluations relates to FFT, which measures the floating-point execution rate of the DFT (Discrete Fourier Transform). Again, the Turing RK1 is very close to the x86 system here, and the other ARM platforms are far ahead of the RISC-V platform, as can be seen in Figure 14.
Figure 14. HPCC FFT performance analysis.
In all the performance metrics we could show in this paper (and quite a few more), ARM platforms are much faster than anything the RISC-V platform can offer for the same price. Let us discuss this in a bit more detail in the next section.

7. Discussion

We can conclude that the ARM platform is much more robust and production-ready than the RISC-V platform. Of course, this is not surprising, as it has been on the market for decades. ARM has gained experience designing CPU architectures from the billions of processors used in various devices, so this was to be expected.
What we did not expect, however, was the difference in performance between our RISC-V platform and all the ARM platforms. The RISC-V platform is much slower in terms of performance and latency. If we discuss these results along the release timeline, a direct comparison can be made between the A72 ARM CPU and the RISC-V platform, as they were launched almost simultaneously. The difference is notable when comparing the quad-core Cortex-A72 ARM CPU to the U740 RISC-V CPU: more than four times the memory performance, roughly the same basic all-core CPU performance, and better single-core performance.
Then, there is the comparison to the Turing RK1 and NVIDIA Jetson TX2 NX. Yes, both platforms are newer than the RISC-V platform, although the TX2 NX was introduced only a few months after the U740, while the RK1 was introduced a year and a half later. However, the performance difference, even accounting for the 256 CUDA cores in the Jetson TX2 NX, is staggering. We are comparing them at the same price point and with a much more favorable power envelope. The Jetson's memory performance is roughly 5× that of the U740, while the RK3588 is more than 15× faster in memory performance. The CPU performance gap is also quite big: the RK3588 is approximately 5× faster, and the Jetson TX2 NX is approximately 2× faster than the U740. If we count the CUDA cores on the Jetson, the comparison becomes even more lopsided. That is why, if we were to deploy micro-clusters for Docker/Kubernetes for either cloud services or super-efficient HPC environments, TuringPi2 platforms based on NVIDIA compute modules or the Turing RK1 would be a much more efficient and faster solution. The only fact that works in the RISC-V platform's favor is the PCI-Express slot on its motherboard. However, that advantage is null and void when we look at the following facts:
  • The only officially supported PCIe graphics cards are AMD RX 500-series and Radeon HD 6000-series VGA cards, which are both old and do not support CUDA, so they cannot be used to accelerate anything;
  • There is no support for CUDA on the RISC-V platform, even if the platform supports NVIDIA GPUs;
  • There are no known FPGAs that can be used on RISC-V;
  • There are no known ASICs that can be used on RISC-V.
The big plus of RISC-V, the fact that it is an open-source platform, will only start paying dividends when critical players on the market support the platform for familiar use cases. There are currently EU-sponsored projects, such as the European Processor Initiative, for developing an HPC ecosystem based on the RISC-V core. This is where concepts like FAUST [65] will shine; specialized acceleration units that can be integrated with the RISC-V architecture are where RISC-V's strengths will come to the fore. However, it will also take time, as RISC-V is currently not well supported on the software side, and the basic hardware side still needs quite a bit of additional development to be used in a heterogeneous HPC environment managed by Kubernetes.
Looking at the performance analytics charts, we can see why ARM, specifically the TuringPi2 platform, is used so often, especially in the education sector, to teach the different ways to do distributed programming and HPC-related topics. These platforms are very price competitive, highly capable, and offer incredible consolidation ratios. With all four nodes running at full speed, we can have four independent nodes in one mini-ITX system that consumes less than 70 W of power, which is incredibly power-efficient compared to anything x86 offers. For reference, the x86 system idles at around 160 W, and its power usage increases to 350 W at 100% utilization, while the RISC-V system consumes roughly 100 W. These numbers were measured by a switching/monitoring PDU with per-socket power management. ARM Ampere-based systems would probably be an even better example to illustrate the efficiency point, as more and more research points to the fact that ARM is energy-efficient even in high-end CPU designs [66,67,68]. However, TuringPi2 systems with ARM CPUs can handle Docker and Kubernetes with full upstream support for those platforms; they are incredibly energy-efficient and can be procured quickly and used for educational and production tasks.
Before we discuss possible future research areas, it would be fair to note that some types of workloads would greatly benefit from RISC-V architectures with specific ISA extensions. For example, AI (Artificial Intelligence) training can be very efficient on RISC-V with a built-in AI co-processor, as can vectorization of deep neural networks via the RISC-V vector extension and the use of PiM (Processing-in-Memory) units to speed up AI inference [69,70,71]. However, such systems are incredibly difficult to acquire and were out of reach for our evaluations.

8. Future Works

We see multiple exciting research areas for the future of heterogeneous computing based on different ISAs, especially in HPC. These research areas depend on Intel, AMD, NVIDIA, and others further developing their ARM and, especially, RISC-V-based software stacks and offering readily available software support to continue down this research path. Given better workload scheduling and more development, heterogeneous HPC clusters could provide a massive bump in energy efficiency.
Further research is needed to optimize RISC-V performance. Given its open-source nature, RISC-V could benefit from further microarchitectural enhancements and, even more so, from the integration of various hardware accelerators (AI, cryptography, etc.) or domain-specific architectures for specific industries and tasks, to give RISC-V more of a foothold in specific niche technology areas. If we look at this issue from the perspective of hardware–software co-design, which usually yields code that is optimized for the platform used, then software (Kubernetes, libraries, compilers, HPC apps, tools, etc.) needs to be developed further for RISC-V to be able to extract its performance potential in the future. We might also add that we do not necessarily consider this a huge issue, more a normal stage in the RISC-V platform's development, as we see definite value in RISC-V systems with built-in accelerators currently being researched for current and future workloads.
Research into performance optimizations must be combined with additional research into compiler and library optimization to boost performance in specific, targeted applications.
Further research needs to be conducted on energy efficiency and power consumption, especially for various workloads. x86 platforms will often be the best overall choice, but there will also be areas where ARM and, potentially, RISC-V might be the correct choice. The last five years of development of ARM products for the data center are a great example of ARM many-core architectures (for example, Ampere) taking a strong foothold in the data center space, as they are very efficient for many different tasks. Given the choice, cloud providers will gladly sell us the capability to run Kubernetes/Docker environments on ARM-based architectures, and rightfully so, as they are much more efficient than x86 platforms. This research can lead to workload partitioning across different ISA architectures for bottom-up environments for heterogeneous computing.
Regarding the scalability of using heterogeneous computing environments with Kubernetes for HPC, one more critical obstacle must be overcome—the networking utilized to scale the environment. The selection of Ethernet or InfiniBand for interconnects is vital in heterogeneous HPC environments that include x86, ARM, and RISC-V servers and still needs to be thoroughly researched to be ready for production HPC systems. Ethernet, renowned for its adaptability and seamless integration, effortlessly accommodates all three architectures—x86, ARM, and RISC-V—without any compatibility concerns. While it is economically advantageous and suitable for various general-purpose HPC tasks, it may experience increased delay and reduced data transfer rate compared to InfiniBand. In contrast, InfiniBand is designed explicitly for HPC and offers reduced latency and increased bandwidth, making it well-suited for applications that necessitate fast data transfers between nodes. InfiniBand provides strong support for x86 and almost as good support for ARM architectures, facilitating efficient communication and performance scalability in these situations. Nevertheless, the integration of InfiniBand and RISC-V-based platforms is still in its early stages. Although efforts are underway to create InfiniBand drivers and support for RISC-V, it is not currently as advanced as it is for x86 and ARM. Consequently, Ethernet may be favored in settings that extensively employ RISC-V because of its wider and more reliable assistance.
More research is needed into standardization in heterogeneous computing to make it easier for researchers and regular or business users to integrate and switch seamlessly between various architectures. For example, we mentioned that Kubernetes is supported on x86 and ARM but not RISC-V. That means that, even if we wanted to, we cannot use the same toolset that Kubernetes offers on RISC-V, no matter how hard we try. The latest available binary distribution of Kubernetes on RISC-V is five years old and needs to be brought into the present. Research into the management and implementation of such heterogeneous Kubernetes clusters is underway. Given the popularity of the ARM platform and the rising interest in the RISC-V platform, it seems to be the right way to go [71]. We can see a potential future in which different ISAs will be used for various applications in large-scale heterogeneous environments, especially when combined with more research into the energy efficiency of the Kubernetes scheduler workload placement.

9. Conclusions

The research described in this paper thoroughly examines ARM and RISC-V architectures’ performance in general and with HPC workloads, specifically emphasizing their incorporation with Docker and Kubernetes. It provides thorough empirical assessments, practical insights, and performance comparisons between these two architectures, which are increasingly important in the changing landscape of heterogeneous computing systems.
In most instances, the performance study demonstrates the superiority of ARM over RISC-V. The architecture developed by ARM has consistently shown exceptional performance, particularly in tasks that require high memory usage and extensive processing by the CPU. The Turing RK1, based on the ARM architecture, demonstrated superior performance compared to RISC-V, exhibiting notably better data processing speed and reduced latencies for the same price range. The advantage is particularly highlighted in Docker and Kubernetes environments, where ARM’s well-developed ecosystem guarantees smooth integration. The ARM platform’s capacity to efficiently manage intricate containerized workloads makes it a more practical choice for HPC applications, especially in cloud and edge computing scenarios where scalability and power economy are essential.
RISC-V, despite its open-source appeal and ability for customization, encounters substantial obstacles. The study indicates that RISC-V’s performance is inferior to ARM’s in the evaluations performed, particularly in single- and multi-core processing. Furthermore, the absence of well-developed software assistance impeded the incorporation of Kubernetes on RISC-V platforms. The lack of a comprehensive and officially endorsed Kubernetes version for RISC-V is a significant obstacle to its implementation in production-level environments. Despite the potential for increased flexibility and creativity, these drawbacks of the RISC-V architecture and the added intricacies of deploying and operating Kubernetes clusters on this platform outweigh its current benefits.
RISC-V will exhibit its potential in the mid-term, especially in specialized fields where its open-source characteristics and ability to be tailored to specific needs could be advantageous. Excellent examples are various vectorization and AI acceleration units that are either ready or being developed for RISC-V. Optimizing RISC-V’s performance and improving its software ecosystem to increase its competitiveness in heterogeneous computing settings, especially in scenarios where Docker and Kubernetes are becoming more widespread, is an absolute must. Otherwise, RISC-V might end up being an excellent idea that never was—on its own or in heterogeneous environments.

Author Contributions

Conceptualization, V.D. and L.M.; methodology, V.D. and Z.K.; software, Z.K.; validation, V.D. and L.M.; formal analysis, V.D. and G.Đ.; investigation, Z.K. and G.Đ.; resources, L.M.; data curation, V.D. and G.Đ.; writing—original draft preparation, Z.K. and G.Đ.; writing—review and editing, L.M. and V.D.; supervision, L.M. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data supporting this study’s findings are available from the corresponding author, [L.M.], upon reasonable request.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Bruhn, F.; Brunberg, K.; Hines, J.; Asplund, L.; Norgren, M. Introducing radiation tolerant heterogeneous computers for small satellites. In Proceedings of the IEEE Aerospace Conference, Big Sky, MT, USA, 7–14 March 2015; pp. 1–10. [Google Scholar] [CrossRef]
  2. Reichenbach, M.; Holzinger, P.; Häublein, K.; Lieske, T.; Blinzer, P.; Fey, D. Heterogeneous Computing Utilizing FPGAs. J. Signal Process. Syst. 2018, 91, 745–757. [Google Scholar] [CrossRef]
  3. Feng, L.; Liang, H.; Sinha, S.; Zhang, W. HeteroSim: A Heterogeneous CPU-FPGA Simulator. IEEE Comput. Archit. Lett. 2016, 16, 38–41. [Google Scholar] [CrossRef]
  4. Chang, L.; Gómez-Luna, J.; Hajj, I.E.; Huang, S.; Chen, D.; Hwu, W. Collaborative Computing for Heterogeneous Integrated Systems. In Proceedings of the 8th ACM/SPEC on International Conference on Performance Engineering, L’Aquila, Italy, 22–26 April 2017. [Google Scholar] [CrossRef]
  5. Mittal, S.; Vetter, J. A Survey of CPU-GPU Heterogeneous Computing Techniques. ACM Comput. Surv. (CSUR) 2015, 47, 1–35. [Google Scholar] [CrossRef]
  6. Prongnuch, S.; Wiangtong, T. Heterogeneous Computing Platform for data processing. In Proceedings of the 2016 International Symposium on Intelligent Signal Processing and Communication Systems (ISPACS), Phuket, Thailand, 24–27 October 2016; pp. 1–4. [Google Scholar] [CrossRef]
  7. Rethinagiri, S.; Palomar, O.; Moreno, J.; Unsal, O.; Cristal, A. Trigeneous Platforms for Energy Efficient Computing of HPC Applications. In Proceedings of the 2015 IEEE 22nd International Conference on High Performance Computing (HiPC), Bengaluru, India, 16–19 December 2015; pp. 264–274. [Google Scholar] [CrossRef]
  8. Kurth, A.; Vogel, P.; Capotondi, A.; Marongiu, A.; Benini, L. HERO: Heterogeneous Embedded Research Platform for Exploring RISC-V Manycore Accelerators on FPGA. arXiv 2017, arXiv:1712.06497. [Google Scholar] [CrossRef]
  9. Parnassos, I.; Bellas, N.; Katsaros, N.; Patsiatzis, N.; Gkaras, A.; Kanellis, K.; Antonopoulos, C.; Spyrou, M.; Maroudas, M. A programming model and runtime system for approximation-aware heterogeneous computing. In Proceedings of the 2017 27th International Conference on Field Programmable Logic and Applications (FPL), Ghent, Belgium, 4–8 September 2017; pp. 1–4. [Google Scholar] [CrossRef]
  10. Lopez-Novoa, U.; Mendiburu, A.; Miguel-Alonso, J. A Survey of Performance Modeling and Simulation Techniques for Accelerator-Based Computing. IEEE Trans. Parallel Distrib. Syst. 2015, 26, 272–281. [Google Scholar] [CrossRef]
  11. Liao, S.-W.; Kuang, S.-Y.; Kao, C.-L.; Tu, C.-H. A Halide-based Synergistic Computing Framework for Heterogeneous Systems. J. Signal Process. Syst. 2019, 91, 219–233. [Google Scholar] [CrossRef]
  12. Wu, Z.; Hammad, K.; Beyene, A.; Dawji, Y.; Ghafar-Zadeh, E.; Magierowski, S. An FPGA Implementation of A Portable DNA Sequencing Device Based on RISC-V. In Proceedings of the 2022 20th IEEE Interregional NEWCAS Conference (NEWCAS), Quebec City, QC, Canada, 19–22 June 2022; pp. 417–420. [Google Scholar] [CrossRef]
  13. Young, J.; Jezghani, A.; Valdez, J.; Jijina, S.; Liu, X.; Weiner, M.D.; Powell, W.; Sarajlic, S. Enhancing HPC Education and Workflows with Novel Computing Architectures. J. Comput. Sci. Educ. 2022, 13, 31–38. [Google Scholar] [CrossRef]
  14. Kurth, A.; Forsberg, B.; Benini, L. HEROv2: Full-Stack Open-Source Research Platform for Heterogeneous Computing. IEEE Trans. Parallel Distrib. Syst. 2022, 33, 4368–4382. [Google Scholar] [CrossRef]
  15. Ojika, D.; Gordon-Ross, A.; Lam, H.; Yoo, S.; Cui, Y.; Dong, Z.; Dam, K.V.; Lee, S.; Kurth, T. PCS: A Productive Computational Science Platform. In Proceedings of the 2019 International Conference on High Performance Computing & Simulation (HPCS), Dublin, Ireland, 15–19 July 2019; pp. 636–641. [Google Scholar] [CrossRef]
  16. Fiolhais, L.; Gonçalves, F.F.; Duarte, R.; Véstias, M.; Sousa, J. Low Energy Heterogeneous Computing with Multiple RISC-V and CGRA Cores. In Proceedings of the 2019 IEEE International Symposium on Circuits and Systems (ISCAS), Sapporo, Japan, 26–29 May 2019; pp. 1–5. [Google Scholar] [CrossRef]
  17. Dong, J.; Zheng, F.; Lin, J.; Liu, Z.; Xiao, F.; Fan, G. EC-ECC: Accelerating Elliptic Curve Cryptography for Edge Computing on Embedded GPU TX2. ACM Trans. Embed. Comput. Syst. (TECS) 2022, 21, 1–25. [Google Scholar] [CrossRef]
  18. D’Agostino, D.; Cesini, D. Editorial: Heterogeneous Computing for AI and Big Data in High Energy Physics. Front. Big Data 2021, 4, 652881. [Google Scholar] [CrossRef]
  19. Hu, N.; Wang, C.; Zhou, X. FLIA: Architecture of Collaborated Mobile GPU and FPGA Heterogeneous Computing. Electronics 2022, 11, 3756. [Google Scholar] [CrossRef]
  20. Du, D.; Liu, Q.; Jiang, X.; Xia, Y.; Zang, B.; Chen, H. Serverless computing on heterogeneous computers. In Proceedings of the 27th ACM International Conference on Architectural Support for Programming Languages and Operating Systems, Lausanne, Switzerland, 28 February–4 March 2022. [Google Scholar] [CrossRef]
  21. Núñez-Yáñez, J. Energy Proportional Heterogenous Computing with Reconfigurable MPSoC. In Proceedings of the 2019 International Conference on High Performance Computing & Simulation (HPCS), Dublin, Ireland, 15–19 July 2019; p. 642. [Google Scholar] [CrossRef]
  22. Carballo-Hernández, W.; Pelcat, M.; Berry, F. Why is FPGA-GPU Heterogeneity the Best Option for Embedded Deep Neural Networks? arXiv 2021, arXiv:2102.01343. [Google Scholar] [CrossRef]
  23. Mahmoud, D.G.; Lenders, V.; Stojilović, M. Electrical-Level Attacks on CPUs, FPGAs, and GPUs: Survey and Implications in the Heterogeneous Era. ACM Comput. Surv. (CSUR) 2022, 55, 1–40. [Google Scholar] [CrossRef]
  24. Freytag, G.; Serpa, M.; Lima, J.F.; Rech, P.; Navaux, P. Collaborative execution of fluid flow simulation using non-uniform decomposition on heterogeneous architectures. J. Parallel Distrib. Comput. 2021, 152, 11–20. [Google Scholar] [CrossRef]
  25. Rodríguez, A.; Navarro, A.; Asenjo, R.; Corbera, F.; Gran, R.; Suárez, D.; Núñez-Yáñez, J. Parallel multiprocessing and scheduling on the heterogeneous Xeon+FPGA platform. J. Supercomput. 2019, 76, 4645–4665. [Google Scholar] [CrossRef]
  26. Datta, D.; Gordon, M.S. Accelerating Coupled-Cluster Calculations with GPUs: An Implementation of the Density-Fitted CCSD(T) Approach for Heterogeneous Computing Architectures Using OpenMP Directives. J. Chem. Theory Comput. 2023, 19, 7640–7657. [Google Scholar] [CrossRef]
  27. Lastovetsky, A.L.; Manumachu, R.R. The 27th International Heterogeneity in Computing Workshop and the 16th International Workshop on Algorithms, Models and Tools for Parallel Computing on Heterogeneous Platforms. Concurr. Comput. Pract. Exp. 2020, 104–118. [Google Scholar] [CrossRef]
  28. Horta, E.; Chuang, H.-R.; VSathish, N.R.; Philippidis, C.J.; Barbalace, A.; Olivier, P.; Ravindran, B. Xar-trek: Run-time execution migration among FPGAs and heterogeneous-ISA CPUs. In Proceedings of the 22nd International Middleware Conference, Québec City, QC, Canada, 6–10 December 2021. [Google Scholar] [CrossRef]
  29. Cerf, V. On heterogeneous computing. Commun. ACM 2021, 64, 9. [Google Scholar] [CrossRef]
  30. Wyrzykowski, R.; Ciorba, F. Algorithmic and software development advances for next-generation heterogeneous platforms. Concurr. Comput. Pract. Exp. 2022, 34, e7013. [Google Scholar] [CrossRef]
  31. Hagleitner, C.; Diamantopoulos, D.; Ringlein, B.; Evangelinos, C.; Johns, C.; Chang, R.N.; D’Amora, B.D.; Kahle, J.; Sexton, J.; Johnston, M.; et al. Heterogeneous Computing Systems for Complex Scientific Discovery Workflows. In Proceedings of the 2021 Design, Automation & Test in Europe Conference & Exhibition (DATE), Grenoble, France, 1–5 February 2021; pp. 13–18. [Google Scholar] [CrossRef]
  32. Mavrogeorgis, N. Simplifying heterogeneous migration between x86 and ARM machines. arXiv 2021, arXiv:2112.01189. [Google Scholar]
  33. Thomadakis, P.; Chrisochoides, N. Towards Performance Portable Programming for Distributed Heterogeneous Systems. arXiv 2022, arXiv:2210.01238. [Google Scholar] [CrossRef]
  34. Nikov, K.; Hosseinabady, M.; Asenjo, R.; Rodríguez, A.; Navarro, A.; Núñez-Yáñez, J. High-Performance Simultaneous Multiprocessing for Heterogeneous System-on-Chip. arXiv 2020, arXiv:2008.08883. [Google Scholar]
  35. Fuentes, J.; López, D.; González, S. Teaching Heterogeneous Computing Using DPC++. In Proceedings of the 2022 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), Lyon, France, 30 May–3 June 2022; pp. 354–360. [Google Scholar] [CrossRef]
  36. Thomadakis, P.; Chrisochoides, N. Runtime Support for Performance Portability on Heterogeneous Distributed Platforms. arXiv 2023, arXiv:2303.02543. [Google Scholar] [CrossRef]
  37. Kavanagh, R.; Djemame, K.; Ejarque, J.; Badia, R.M.; García-Pérez, D. Energy-Aware Self-Adaptation for Application Execution on Heterogeneous Parallel Architectures. IEEE Trans. Sustain. Comput. 2020, 5, 81–94. [Google Scholar] [CrossRef]
  38. Yu, Y.; Zhang, S.; Fu, H.; Wu, L.; Chen, D.; Gao, Y.; Wei, Z.; Jia, D.; Lin, X. Characterizing uncertainties of Earth system modeling with heterogeneous many-core architecture computing. Geosci. Model Dev. 2022, 15, 6695–6708. [Google Scholar] [CrossRef]
  39. Cheng, Y.; Sun, W.-T.; Bi, Y.; Cheng, Y.; Shi, J.; Wang, L.; Yao, Q.; Hu, Q.; Zhang, M. Large Scale ARM Computing Cluster and its Application in HEP. In Proceedings of the International Symposium on Grids & Clouds 2022—PoS (ISGC2022), Taipei, Taiwan, 21–25 March 2022. [Google Scholar] [CrossRef]
  40. Kamaleldin, A.; Göhringer, D. AGILER: An Adaptive Heterogeneous Tile-Based Many-Core Architecture for RISC-V Processors. IEEE Access 2022, 10, 43895–43913. [Google Scholar] [CrossRef]
  41. Nicholas, G.S.; Gui, Y.; Saqib, F. A Survey and Analysis on SoC Platform Security in ARM, Intel and RISC-V Architecture. In Proceedings of the 2020 IEEE 63rd International Midwest Symposium on Circuits and Systems (MWSCAS), Springfield, MA, USA, 9–12 August 2020; pp. 718–721. [Google Scholar] [CrossRef]
  42. Wang, X.; Leidel, J.D.; Williams, B.; Ehret, A.; Mark, M.; Kinsy, M.; Chen, Y. xBGAS: A Global Address Space Extension on RISC-V for High Performance Computing. In Proceedings of the 2021 IEEE International Parallel and Distributed Processing Symposium (IPDPS), Portland, OR, USA, 17–21 May 2021; pp. 454–463. [Google Scholar] [CrossRef]
  43. Tornero, R.; Rodríguez, D.; Martínez, J.M.; Flich, J. An Open-Source FPGA Platform for Shared-Memory Heterogeneous Many-Core Architecture Exploration. In Proceedings of the 2023 38th Conference on Design of Circuits and Integrated Systems (DCIS), Málaga, Spain, 15–17 November 2023; pp. 1–6. [Google Scholar] [CrossRef]
  44. Gómez-Sánchez, G.; Call, A.; Teruel, X.; Alonso, L.; Morán, I.; Perez, M.A.; Torrents, D.; Berral, J.L. Challenges and Opportunities for RISC-V Architectures towards Genomics-based Workloads. In International Conference on High Performance Computing; Springer Nature: Cham, Switzerland, 2023. [Google Scholar] [CrossRef]
  45. Stoyanov, S.; Kakanakov, N.; Marinova, M. Secure Heterogeneous Architecture based on RISC-V and root-of-trust. In Proceedings of the 24th International Conference on Computer Systems and Technologies, Ruse, Bulgaria, 16–17 June 2023. [Google Scholar] [CrossRef]
  46. Gonzalez, A.; Zhao, J.; Korpan, B.; Genç, H.; Schmidt, C.; Wright, J.; Biswas, A.; Amid, A.; Sheikh, F.; Sorokin, A.; et al. A 16 mm² 106.1 GOPS/W Heterogeneous RISC-V Multi-Core Multi-Accelerator SoC in Low-Power 22 nm FinFET. In Proceedings of the ESSCIRC 2021—IEEE 47th European Solid State Circuits Conference (ESSCIRC), Grenoble, France, 13–22 September 2021; pp. 259–262. [Google Scholar] [CrossRef]
  47. Kamaleldin, A.; Hesham, S.; Göhringer, D. Towards a Modular RISC-V Based Many-Core Architecture for FPGA Accelerators. IEEE Access 2020, 8, 148812–148826. [Google Scholar] [CrossRef]
  48. Jia, H.; Valavi, H.; Tang, Y.; Zhang, J.; Verma, N. A Programmable Heterogeneous Microprocessor Based on Bit-Scalable In-Memory Computing. IEEE J. Solid-State Circuits 2020, 55, 2609–2621. [Google Scholar] [CrossRef]
  49. Dakić, V.; Kovač, M.; Slovinac, J. Evolving High-Performance Computing Data Centers with Kubernetes, Performance Analysis, and Dynamic Workload Placement Based on Machine Learning Scheduling. Electronics 2024, 13, 2651. [Google Scholar] [CrossRef]
  50. Vohra, D. Installing Kubernetes Using Docker. In Kubernetes Microservices with Docker; Apress: Berkeley, CA, USA, 2016; pp. 3–38. [Google Scholar] [CrossRef]
  51. Chen, C.; Hung, M.; Lai, K.; Lin, Y. Docker and Kubernetes. In Industry 4.1; The Institute of Electrical and Electronics Engineers, Inc.: Piscataway, NJ, USA, 2021; pp. 169–213. [Google Scholar] [CrossRef]
  52. Menegidio, F.B.; Jabes, D.L.; Costa de Oliveira, R.; Nunes, L.R. Dugong: A Docker Image, Based on Ubuntu Linux, Focused on Reproducibility and Replicability for Bioinformatics Analyses. Bioinformatics 2017, 34, 514–515. [Google Scholar] [CrossRef]
  53. El Haj Ahmed, G.; Gil-Castiñeira, F.; Costa-Montenegro, E. KubCG: A Dynamic Kubernetes Scheduler for Heterogeneous Clusters. Softw. Pract. Exp. 2020, 51, 213–234. [Google Scholar] [CrossRef]
  54. Eiermann, A.; Renner, M.; Großmann, M.; Krieger, U.R. On a Fog Computing Platform Built on ARM Architectures by Docker Container Technology. Commun. Comput. Inf. Sci. 2017, 717, 71–86. [Google Scholar] [CrossRef]
  55. Fornari, F.; Cavalli, A.; Cesini, D.; Falabella, A.; Fattibene, E.; Morganti, L.; Prosperini, A.; Sapunenko, V. Distributed Filesystems (GPFS, CephFS and Lustre-ZFS) Deployment on Kubernetes/Docker Clusters. In Proceedings of the International Symposium on Grids & Clouds 2021—PoS (ISGC2021), Taipei, Taiwan, 22–26 March 2021. [Google Scholar] [CrossRef]
  56. Lumpp, F.; Barchi, F.; Acquaviva, A.; Bombieri, N. On the Containerization and Orchestration of RISC-V Architectures for Edge-Cloud Computing. In Proceedings of the 3rd Eclipse Security, AI, Architecture and Modelling Conference on Cloud to Edge Continuum, Ludwigsburg, Germany, 17 October 2023. [Google Scholar] [CrossRef]
  57. Kubernetes.io. Available online: https://kubernetes.io/docs/tasks/tools/install-kubectl-linux/ (accessed on 28 August 2024).
  58. Butler, S.; Gamalielsson, J.; Lundell, B.; Brax, C.; Persson, T.; Mattsson, A.; Gustavsson, T.; Feist, J.; Öberg, J. An Exploration of Openness in Hardware and Software Through Implementation of a RISC-V Based Desktop Computer. In Proceedings of the 18th International Symposium on Open Collaboration, Madrid, Spain, 7–9 September 2022. [Google Scholar] [CrossRef]
  59. Zaruba, F.; Benini, L. The Cost of Application-Class Processing: Energy and Performance Analysis of a Linux-Ready 1.7-GHz 64-Bit RISC-V Core in 22-Nm FDSOI Technology. IEEE Trans. Very Large Scale Integr. (VLSI) Syst. 2019, 27, 2629–2640. [Google Scholar] [CrossRef]
  60. Miura, J.; Miyazaki, H.; Kise, K. A Portable and Linux Capable RISC-V Computer System in Verilog HDL. arXiv 2020, arXiv:2002.03576. [Google Scholar] [CrossRef]
  61. Ubuntu.com. Ubuntu Tutorials. Available online: https://ubuntu.com/tutorials/how-to-install-ubuntu-on-risc-v-hifive-boards#4-installing-ubuntu-to-an-nvme-drive-only-for-unmatched (accessed on 28 August 2024).
  62. GitHub.com. Available online: https://github.com/carlosedp/riscv-bringup/releases/download/v1.0/kubernetes_1.16.0_riscv64.deb (accessed on 9 August 2024).
  63. GitHub.com. Available online: https://github.com/carlosedp/riscv-bringup/blob/master/kubernetes/Readme.md (accessed on 9 August 2024).
  64. GitHub.com. Available online: https://github.com/vEddYcro/HPCC-HetComp (accessed on 10 August 2024).
  65. Kovač, M.; Dragić, L.; Malnar, B.; Minervini, F.; Palomar, O.; Rojas, C.; Olivieri, M.; Knezović, J.; Kovač, M. FAUST: Design and Implementation of a Pipelined RISC-V Vector Floating-Point Unit. Microprocess. Microsyst. 2023, 97, 104762. [Google Scholar] [CrossRef]
  66. Simakov, N.A.; Deleon, R.L.; White, J.P.; Jones, M.D.; Furlani, T.R.; Siegmann, E.; Harrison, R.J. Are We Ready for Broader Adoption of ARM in the HPC Community: Performance and Energy Efficiency Analysis of Benchmarks and Applications Executed on High-End ARM Systems. In Proceedings of the HPC Asia 2023 Workshops, Singapore, 27 February–2 March 2023. [Google Scholar] [CrossRef]
  67. Elwasif, W.; Godoy, W.; Hagerty, N.; Harris, J.A.; Hernandez, O.; Joo, B.; Kent, P.; Lebrun-Grandie, D.; Maccarthy, E.; Melesse Vergara, V.; et al. Application Experiences on a GPU-Accelerated Arm-Based HPC Testbed. In Proceedings of the HPC Asia 2023 Workshops, Singapore, 27 February–2 March 2023. [Google Scholar] [CrossRef]
  68. Godoy, W.F.; Valero-Lara, P.; Dettling, T.E.; Trefftz, C.; Jorquera, I.; Sheehy, T.; Miller, R.G.; Gonzalez-Tallada, M.; Vetter, J.S.; Churavy, V. Evaluating Performance and Portability of High-Level Programming Models: Julia, Python/Numba, and Kokkos on Exascale Nodes. In Proceedings of the 2023 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW), St. Petersburg, FL, USA, 15–19 May 2023. [Google Scholar] [CrossRef]
  69. Christofas, V.; Amanatidis, P.; Karampatzakis, D.; Lagkas, T.; Goudos, S.K.; Psannis, K.E.; Sarigiannidis, P. Comparative Evaluation between Accelerated RISC-V and ARM AI Inference Machines. In Proceedings of the 2023 6th World Symposium on Communication Engineering (WSCE), Thessaloniki, Greece, 27–29 September 2023. [Google Scholar] [CrossRef]
  70. Cococcioni, M.; Rossi, F.; Ruffaldi, E.; Saponara, S. Vectorizing Posit Operations on RISC-V for Faster Deep Neural Networks: Experiments and Comparison with ARM SVE. Neural Comput. Appl. 2021, 33, 10575–10585. [Google Scholar] [CrossRef]
  71. Verma, V.; Stan, M.R. AI-PiM—Extending the RISC-V Processor with Processing-in-Memory Functional Units for AI Inference at the Edge of IoT. Front. Electron. 2022, 3, 898273. [Google Scholar] [CrossRef]
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.