1. Introduction
The cost of owning and maintaining high-end supercomputers capable of dealing with large-scale problems is usually too high for medium-sized companies or educational institutions. This problem is addressed by modern big data engines such as Apache Spark [
1] that are designed to run on commodity hardware, which is efficiently combined to create computing clusters. The common practice is to lease the hardware necessary for executing analyses in the form of virtual machines that reside in the cloud (Infrastructure as a Service, IaaS). In this way, an organization can greatly reduce its running costs, as it pays for hardware only when it uses it. However, the majority of organizations still have to purchase and maintain personal computers for their employees. This hardware remains largely underutilized, as it is only used during working hours, and even when it is in use, most of the available resources (CPUs, RAM, etc.) are idle. The work of Mohammad and Abbas [2] identifies 11 major barriers that hinder small and medium enterprises from effectively utilizing cloud computing resources.
Such barriers can be categorized as technological, organizational, and environmental. Technological issues include security and privacy concerns, integration difficulties with existing infrastructure, and unreliable cloud service performance. Organizational challenges stem from limited IT expertise, financial constraints, and resistance to change, which hinder smooth adoption. Environmental barriers involve regulatory compliance, vendor lock-in, market pressure for rapid adoption, unreliable internet connectivity, and lack of trust in service providers. To overcome these obstacles, organizations must adopt strategic planning, invest in training, and carefully select cloud providers to maximize the benefits of cloud computing.
Moreover, the current trend of research in this area is concentrated on high-performance systems, focusing on optimizing resource allocation, minimizing latency, and improving scalability in dynamic and heterogeneous environments. Advanced frameworks like EVRM [
3] leverage cutting-edge technologies such as deep reinforcement learning to manage multiple virtual resources, ensuring efficient utilization of CPUs, memory, and bandwidth while adapting to fluctuating workloads. These innovations aim to address the limitations of traditional resource management approaches, enhancing system performance and reliability, especially in the context of virtualization and cloud computing.
The Elastic Virtual Resource Management (EVRM) framework, which is the work of Wang et al. (2025) [
3], incorporates four modules—monitor, analyzer, planner, and executor—to dynamically manage virtual instances. The framework employs a deep reinforcement learning-based resource allocation method, Deep Deterministic Policy Gradient-based Resource Allocation (DDPG-RA), to optimize the allocation of CPUs, memory, and bandwidth. This approach learns the complex dependencies between resources and application performance, enabling better adaptation to fluctuating demands.
The demand for high-performance computing has increased dramatically, fueled by advancements in big data analytics, artificial intelligence, and other computation-intensive workloads [
4]. However, acquiring and maintaining specialized hardware, such as supercomputers, remains financially challenging for many medium-sized organizations and educational institutions [
5]. This limitation has led to a growing interest in utilizing commodity hardware and virtualization technologies to achieve scalable and cost-effective computing solutions.
We developed a system that integrates cutting-edge open-source virtualization and cluster management technologies to address this need. Our approach involves deploying a hypervisor stack, virtualization management software, and a desktop environment on standard personal computers. This setup enables the creation of virtual machines (VMs) on each physical host, which users can seamlessly access through a VNC-based application. This ensures that users can perform their daily tasks without realizing they are interacting with a virtualized system rather than one running on physical hardware. This configuration also empowers system administrators with precise control over resource allocation for VMs on each host, allowing them to run additional tasks on the same machines.
The use of virtual machines instead of physical machines provides significant security enhancements by isolating workloads effectively [
6]. Each VM operates within its own sandboxed environment, ensuring that a compromise in one VM does not impact others or the host system. This isolation reduces the risk of malware propagation or unauthorized access. Additionally, VMs enable secure snapshots and rollbacks, allowing administrators to quickly restore systems to a known safe state in the event of a security breach. Virtualized environments also support advanced monitoring and access control mechanisms, simplifying real-time threat detection and response. Features like virtual network segmentation and controlled resource sharing further enhance the security and manageability of the infrastructure compared to traditional physical systems.
For enhanced flexibility and easier maintenance, the physical machines are interconnected to form a cluster. This cluster setup provides numerous benefits for managing virtualized environments efficiently. By pooling resources such as CPU, RAM, and storage across multiple nodes, the cluster enhances scalability and ensures optimal utilization [
7]. It facilitates seamless VM migration between nodes with minimal downtime, ensuring high availability and robust disaster recovery capabilities to maintain service continuity even during hardware failures [
8]. Centralized cluster management simplifies administration by offering a unified interface to monitor and control the entire infrastructure. Moreover, shared storage solutions improve data redundancy and accessibility. With integrated tools for backup, restoration, and resource allocation, clustering enables organizations to maintain highly efficient, flexible, and reliable virtualized infrastructures.
This cluster, as mentioned before, can also be exploited to run big data engines like Apache Spark. However, this cannot be achieved without a specialized cluster manager for these kinds of jobs. Our solution is to host (in the same physical machines) some additional virtual machines that form a Kubernetes [
9] cluster. Using Apache Spark with Kubernetes (K8s) offers powerful advantages for managing big data workloads. Kubernetes provides a flexible, scalable, and containerized environment that simplifies deploying and orchestrating Spark applications. Unlike Spark’s Standalone mode, which lacks the breadth of ecosystem integrations and advanced scheduling features, Kubernetes provides fine-grained control over resource allocation and integrates natively with DevOps workflows, CI/CD pipelines, and monitoring tools. By leveraging Kubernetes’ built-in resource management, Spark clusters can scale dynamically to handle varying workloads efficiently. The integration allows seamless containerization, ensuring consistency across development, testing, and production environments. Additionally, Kubernetes enhances fault tolerance by automatically rescheduling Spark executors in case of failures. Its native support for multi-cloud and on-premises environments ensures flexibility in infrastructure choices, while unified logging and monitoring tools streamline performance tracking and debugging. This container-centric approach reduces overhead, ensures portability, and simplifies management compared to provisioning and maintaining dedicated Spark Standalone clusters. Together, K8s and Spark enable robust, scalable, and efficient big data processing pipelines.
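For illustration, the snippet below sketches how a Spark application can target a Kubernetes master instead of a Standalone one. It is a minimal client-mode sketch under assumed settings, not our exact deployment: the API server URL, container image, namespace, and input path are placeholders, and in practice spark-submit in cluster mode is the more common route.

```python
# Minimal sketch: a PySpark session scheduled on Kubernetes instead of a
# Standalone cluster. URL, image, namespace, and input path are placeholders.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("k8s://https://10.0.0.1:6443")       # Kubernetes API server (assumed address)
    .appName("commc-wordcount")
    .config("spark.kubernetes.container.image", "apache/spark:3.5.0")  # executor image
    .config("spark.kubernetes.namespace", "spark-jobs")                # assumed namespace
    .config("spark.executor.instances", "4")     # Spark requests 4 executor pods
    .config("spark.executor.memory", "4g")
    .config("spark.executor.cores", "2")
    .getOrCreate()
)

# A trivial distributed job: word counts over a file on shared storage.
counts = (
    spark.read.text("/data/corpus.txt").rdd      # hypothetical input location
    .flatMap(lambda row: row.value.split())
    .map(lambda word: (word, 1))
    .reduceByKey(lambda a, b: a + b)
)
print(counts.take(10))
spark.stop()
```

Because executors run as pods, Kubernetes reschedules them on failure and enforces the declared CPU and memory limits, which is precisely the fault tolerance and fine-grained resource control described above.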
Compared to Mesos or YARN, Kubernetes benefits from a wider adoption and more active development community, which translates into better support, frequent updates, and rich plugins for networking, storage, and security. While YARN is tailored to Hadoop ecosystems and Mesos is known for generalized cluster management, Kubernetes embraces a broader cloud-native philosophy that lets Spark coexist seamlessly with other containerized workloads, ultimately delivering greater flexibility and efficiency. In CommC’s case, this setup allows for running a series of services in the same Kubernetes environment. Running web services on Kubernetes enhances scalability, reliability, and operational efficiency by automating scaling to handle traffic fluctuations, ensuring consistent performance, and optimizing resource use. Its self-healing capabilities replace failed containers and restart unresponsive services, boosting uptime. Kubernetes simplifies deployments with declarative configurations, enabling seamless updates and rollbacks while minimizing disruptions. Moreover, Kubernetes natively provides both Layer 4 (TCP/UDP) load-balancing through its Service abstraction (e.g., ClusterIP, NodePort, and LoadBalancer Services) and Layer 7 (HTTP/HTTPS) load-balancing via the Ingress resource, which ensures even traffic distribution for better performance and resilience. Its platform-agnostic design supports deployments across on-premises, cloud, or hybrid environments, avoiding vendor lock-in. Robust networking features, including service discovery and secure communication, combined with integrated monitoring and logging, make Kubernetes an ideal platform for deploying, managing, and scaling modern web applications.
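As a concrete example of the Layer 4 load-balancing mentioned above, the sketch below declares a NodePort Service through the official Kubernetes Python client; the names, labels, and port numbers are illustrative assumptions, and a Layer 7 equivalent would declare an Ingress resource via the NetworkingV1Api instead.

```python
# Sketch: declaring a Layer 4 (TCP) load-balancing Service with the official
# Kubernetes Python client (pip install kubernetes). Names/ports are examples.
from kubernetes import client, config

config.load_kube_config()  # assumes a reachable kubeconfig
v1 = client.CoreV1Api()

service = client.V1Service(
    metadata=client.V1ObjectMeta(name="demo-web"),
    spec=client.V1ServiceSpec(
        type="NodePort",                  # expose the Service on every node's IP
        selector={"app": "demo-web"},     # traffic spreads across pods with this label
        ports=[client.V1ServicePort(port=80, target_port=8080, node_port=30080)],
    ),
)
v1.create_namespaced_service(namespace="default", body=service)
```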
Nonetheless, the architecture described in the previous paragraphs comes with some drawbacks, too. Some jobs are inherently difficult to parallelize, and the machines used in this scenario have very limited resources, so it is of great importance to extract every bit of available performance without sacrificing the ease of managing the system. Our proposed solution for serial, performance-oriented tasks is to run Linux containers alongside the full virtual machines. This enables us to have a high-level production orchestration system for managing our cluster and, at the same time, maintain full control over the resources allocated by the nodes participating in it.
However, beyond its technological advantages, this approach also addresses environmental concerns. The rapid growth of electronic waste (E-waste) [10] has been flagged as a pressing issue by numerous sources and organizations. The work of Jain et al. (2023) [
10] highlights that E-waste generated by the disposal of electronics like laptops, mobile phones, and televisions contains hazardous materials posing significant risks to human health and the environment. The study explores global and Indian E-waste production trends, identifying factors such as industrialization, technological advancements, and poor recycling practices as key contributors. E-waste arises not only from the end of a product’s lifespan but also from consumer behavior and technological obsolescence. Despite its rapid growth (3–4% annually), only 15% of E-waste is recycled, highlighting a significant environmental and health concern due to toxic substances [
11]. Our proposed system architecture extends the lifespan of systems deemed obsolete and thereby helps mitigate the E-waste problem from which our planet suffers.
Our contribution: The CommC system introduces a novel approach to leveraging commodity hardware for cost-effective, scalable, and multi-purpose computing. Unlike traditional cloud-based or high-performance computing (HPC) cluster solutions, CommC integrates virtualization (Proxmox VE) and container orchestration (Kubernetes and Docker) within the same physical infrastructure, optimizing resource utilization while maintaining workload isolation. This hybrid model allows organizations to repurpose underutilized hardware, providing a low-cost alternative to expensive cloud services while retaining flexibility.
One of the core contributions of CommC is its cost-efficiency. By utilizing open-source technologies, CommC offers a self-managed cloud-like environment without incurring vendor lock-in or recurring subscription fees. Our benchmarking results demonstrate significant cost reductions compared to cloud providers such as Google Cloud, AWS, and Microsoft Azure, making it an ideal solution for medium-sized enterprises, research institutions, and educational organizations. Additionally, CommC ensures scalability and adaptability, dynamically adjusting resource allocation based on workload demands. In the current system state, an administrator is responsible for the resource allocation task. Our real-world experiments validate its ability to handle machine learning, big data analytics, and web services efficiently.
Beyond cost and performance benefits, CommC contributes to environmental sustainability by extending the lifespan of older hardware. By repurposing existing systems, the platform helps mitigate electronic waste (E-waste), offering a greener alternative to cloud computing, which relies on energy-intensive data centers. Furthermore, CommC enhances security and fault tolerance by leveraging KVM-based virtualization and Kubernetes security policies, ensuring workload isolation, access control, and resilience against cyber threats.
Our novelty: The CommC system was developed as a direct response to the high cost of cloud infrastructure, which often makes large-scale computing inaccessible to medium-sized enterprises, research institutions, and educational organizations. Even though cloud platforms like Google Cloud, AWS, and Microsoft Azure offer scalable computing solutions, their recurring costs, data transfer fees, and vendor lock-in present significant financial and operational challenges. Many organizations face the dilemma of either investing in expensive cloud services or struggling with underutilized local hardware, which remains idle for large portions of the day. CommC bridges this gap by providing a scalable, cloud-like environment that repurposes existing hardware into a cost-efficient and high-performance computing cluster. Additionally, CommC delivers substantial cost savings compared to cloud platforms by eliminating compute instance rental fees, storage costs, and network egress charges. Instead of paying for cloud infrastructure that scales with usage, organizations using CommC can leverage their own underutilized hardware, transforming idle machines into an efficient computing cluster.
Beyond cost efficiency, CommC promotes sustainability by extending the lifespan of commodity hardware, reducing electronic waste (E-waste), and lowering energy consumption compared to large cloud data centers. Unlike hyperscale cloud providers, which operate energy-intensive facilities, CommC runs on local infrastructure, offering a greener alternative without sacrificing computational power.
In summary, the proposed architecture was designed to eliminate the financial and operational barriers of cloud computing by offering an affordable, scalable, and secure alternative that enables organizations to fully utilize their existing hardware investments.
The remainder of this paper is structured as follows:
Section 2 reviews related technologies and services, highlighting their roles within the proposed system.
Section 3 details the system architecture, and
Section 3.1 describes the main cluster components.
Section 3.2 discusses the network requirement for the system. The software used for creating CommC is analyzed in
Section 4 while the physical implementation of the cluster, supported applications, use cases, and cost analysis are described in
Section 5. Finally, conclusions, limitations, and future directions are outlined in
Section 6.
3. System Architecture
The high-level architecture of our proposed system is illustrated in
Figure 1. Our cluster is composed of multiple physical nodes that are equipped with a similar set of components, and they collectively form a cohesive virtualized environment managed through a centralized cluster management layer.
The Virtualization Cluster is represented at the top, symbolizing the unified virtualized environment that spans across all physical nodes. This centralized layer ensures that all nodes work in harmony, managing resources and Virtual Machines (VMs) across the entire cluster. By linking multiple nodes together, the cluster provides redundancy, load-balancing, and scalability, allowing the virtualized environment to handle varying workloads and ensure high availability. In its current state, CommC operates under a hierarchical resource management structure. At a higher level, an administrator is responsible for allocating the necessary resources for each virtual machine, similar to how resource management is handled in cloud-based environments. The actual distribution of tasks and resource assignment within the system is managed by the cluster orchestrator, which, in this case, is Kubernetes.
Each physical node contains three main components:
1. GUI (in yellow): This provides a graphical user interface for interacting with the node locally. The GUI offers a user-friendly way to interact with the virtual machines running on the cluster.
2. Cluster Manager (in blue): This component handles the orchestration of resources within the node and coordinates with the other nodes in the cluster. The Cluster Manager communicates with the centralized Virtualization Cluster layer, ensuring that resources are allocated effectively and that VMs can be managed across nodes.
3. Virtual Machines (in green): These are the actual virtualized instances running on each physical node. VMs are isolated environments that simulate separate operating systems, allowing multiple applications to run concurrently on a single physical server. The hypervisor on each node allocates resources to these VMs and ensures their isolation from each other.
At the bottom of the diagram, the Kubernetes icon indicates that some of the VMs within the cluster are running a separate Kubernetes cluster. This setup enables Kubernetes to orchestrate containerized applications on top of virtualized infrastructure, providing an additional layer of flexibility. By running Kubernetes within VMs, the architecture can support containerized workloads alongside traditional virtualized applications, enabling a hybrid environment within each physical node.
In summary, each node is equipped with local management and monitoring capabilities, and virtual machines provide isolated environments for running both standard applications and a Kubernetes cluster.
3.1. Cluster Node Stack
Figure 2 presents a comprehensive architecture of the main interconnected components that comprise a single node in our cluster design. The architecture is organized into three main sections: the
Local User’s GUI, the
Virtualization Stack, and the
Operating System, all unified under the
Cluster Node.
On the left, the Local User’s GUI serves as the interface through which users seamlessly interact with the virtual environment. Key components, including the Controller and VNC Client within the VM Viewer, provide direct control and monitoring capabilities for virtual machines, enabling local users to manage the VMs intuitively and effectively. A Desktop Environment is required for these components to function properly.
At the heart of the diagram, the Virtualization Stack drives the entire virtualization process. A Web Application layer at the top enables remote access and management, offering a centralized web-based interface for streamlined control. Directly beneath, the Virtualization Management layer orchestrates the entire virtual infrastructure, overseeing the deployment, monitoring, and administration of both virtual machines and containers. This centralized management ensures cohesive and efficient control over all virtualized resources and provides the functionality for connecting each node to the Virtualization Cluster.
Within the Virtualization Stack, the Hypervisor and OS Virtualization components represent the two main virtualization technologies powering the system. The Hypervisor facilitates the creation of Virtual Machines (VMs), each operating as an independent, fully isolated environment with its own dedicated resources and operating system. This design is ideal for use cases requiring high levels of isolation and security. In contrast, OS Virtualization supports Containers, which are lightweight and efficient and share the host OS kernel while remaining securely isolated from one another. Containers are well-suited for rapid scaling and microservices architectures, allowing for agile and resource-efficient deployments.
The Operating System layer at the base underpins the entire setup, providing essential resources such as CPU, memory, and storage for both VMs and containers. As the foundation for the Virtualization Stack, it establishes a stable and reliable environment that enables seamless operation and scalability across all virtualized workloads.
Overall, this cluster node architecture provides a robust, flexible framework that integrates user-friendly interfaces, powerful virtualization management, and efficient resource allocation. Its dual support for VMs and containers ensures that diverse workload requirements are met with precision and agility, making it an ideal solution for modern cloud, data center, and enterprise environments.
3.2. Network Schema
The network configuration requirements for the proposed setup are intricate. The users’ virtual machines should remain isolated, whereas those that form the Kubernetes cluster must communicate with each other. Every virtual machine should be able to access the internet but should not be directly accessible from it. However, a select few services must be exposed to the World Wide Web; otherwise, the system’s functionality would be rather limited. To satisfy all of these needs, we designed the architecture demonstrated in
Figure 3.
This network architecture is designed specifically to combine traditional virtual machines with containerized workloads, forming a hybrid infrastructure that runs both simple VMs and Kubernetes clusters.
At the edge of the network, traffic from the internet passes through a reverse proxy, which intermediates requests and forwards them to appropriate resources within the cluster. A gateway sits between the reverse proxy and the cluster, handling the routing to ensure that external traffic is appropriately directed to internal systems.
Within the cluster, each node contains both physical and virtual networking layers. The physical network interface card (NIC) provides the hardware connection between the node and the broader network, and virtual network interfaces enable communication within the node and across the cluster. Two main types of virtual interfaces are used: NAT interfaces and VXLAN interfaces. NAT interfaces allow internal workloads, such as virtual machines and containers, to access external networks while masking their internal IP addresses, providing security, simplifying routing, and saving static IPs. VXLAN interfaces create an overlay network, allowing seamless communication between workloads across different nodes as if they were on the same local network.
Inside each cluster node, a mix of virtualized environments operates, including simple virtual machines and Kubernetes-managed nodes that host containerized applications or pods. These workloads rely on the virtual network interfaces to interact both within the node and across the cluster. By combining these networking layers, the system provides a secure, flexible, and scalable environment for managing hybrid workloads that include both traditional VMs and containerized applications. This configuration enables smooth communication between distributed components while maintaining efficient and secure access to external networks.
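To make the overlay idea tangible, the sketch below creates a VXLAN interface programmatically with the third-party pyroute2 library; the interface names, VXLAN ID, and multicast group are assumptions, and on a real CommC node this would typically be handled by Proxmox’s network configuration rather than ad hoc scripts.

```python
# Sketch (assumed setup): creating a VXLAN overlay interface with pyroute2
# (pip install pyroute2). Interface names, VNI, and group are placeholders.
from pyroute2 import IPRoute

ipr = IPRoute()
phys_idx = ipr.link_lookup(ifname="eth0")[0]   # physical NIC carrying overlay traffic

# VXLAN interface: workloads attached to it on different nodes appear to
# share one L2 segment, tunnelled over the physical network.
ipr.link("add", ifname="vxlan100", kind="vxlan",
         vxlan_id=100,                 # VNI shared by all participating nodes
         vxlan_link=phys_idx,
         vxlan_group="239.1.1.1")      # multicast group used for peer discovery
ipr.link("set", index=ipr.link_lookup(ifname="vxlan100")[0], state="up")
ipr.close()

# NAT for outbound traffic is configured separately, e.g. an iptables
# MASQUERADE rule on the gateway:
#   iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
```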
3.3. Security
A publicly accessible cluster requires stringent measures to minimize its attack surface and protect against unauthorized access. In the configuration that we propose, the firewall is set to allow only a small number of ports, which have been intentionally randomized, rather than using the default well-known ports. SSH access via the root account is completely disabled, and VNC functionality is restricted to the local network to reduce external visibility. All internet-bound traffic is routed through Cloudflare, which serves as a reverse proxy and secures data transmissions with SSL encryption. Furthermore, user accounts are grouped according to role-based permissions, ensuring that privileges are granted only as needed and preventing inadvertent or malicious misuse. Multi-tenancy challenges are managed by allocating separate virtual machines with dedicated storage to each user, thereby containing any potential security incidents within isolated environments.
4. Technologies and Services
4.1. Proxmox Virtual Environment (Proxmox VE)
Proxmox Virtual Environment (Proxmox VE) [
26] is an open-source platform for efficiently managing virtual machines (VMs) and containers. Built on Debian Linux, it integrates Kernel-based Virtual Machine (KVM) for full virtualization and Linux Containers (LXC) for lightweight containerization, providing flexibility to handle diverse workloads. Proxmox VE supports both single-node setups and scalable cluster deployments, making it suitable for various infrastructure requirements.
The platform features a centralized management interface accessible through a web-based graphical user interface (GUI), which simplifies tasks such as live migration, backups, and high-availability configurations. Advanced users can utilize a RESTful API, a command-line interface, and role-based access control for more granular management. The GUI is designed for ease of use, featuring a resource tree, real-time log monitoring, and secure SSL-encrypted access. It also supports multi-factor authentication and integrates with authentication sources like LDAP and Microsoft Active Directory.
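The RESTful API mentioned above can also be scripted directly. The sketch below queries node status and starts a VM through the documented /api2/json endpoints; the host, node name, VM ID, and token value are placeholders.

```python
# Sketch: talking to the Proxmox VE REST API with an API token.
# Host, node name, VM ID, and the token value are placeholders.
import requests

PVE = "https://pve-node1.example.local:8006"   # assumed cluster address
HEADERS = {
    # Proxmox API-token authentication header
    "Authorization": "PVEAPIToken=root@pam!monitor=00000000-0000-0000-0000-000000000000"
}

# List cluster nodes and their status.
nodes = requests.get(f"{PVE}/api2/json/nodes", headers=HEADERS, verify=False)
for node in nodes.json()["data"]:
    print(node["node"], node["status"])

# Start a stopped VM (vmid 101) on node 'pve-node1'.
requests.post(f"{PVE}/api2/json/nodes/pve-node1/qemu/101/status/start",
              headers=HEADERS, verify=False)   # verify=False: self-signed lab certs
```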
Proxmox VE offers a flexible storage model, supporting both local and shared storage solutions such as NFS, SAN, Ceph RBD, and ZFS. This flexibility enables live migration and efficient management of storage resources across cluster nodes. An integrated backup tool, vzdump, is included to create consistent snapshots of running VMs and containers, ensuring data integrity and simplifying restoration processes.
In comparison with other similar systems, like OpenStack, Proxmox is a more lightweight and straightforward platform, making it well-suited for labs or small-to-medium enterprises (SMEs), where simplicity and ease of setup are key. Its intuitive web interface, which bundles virtualization (KVM) and containers (LXC) into one cohesive system, minimizes the need for multiple specialized services and extensive configuration. This makes Proxmox faster to deploy and easier to maintain, especially for teams with limited time or resources. In contrast, OpenStack is a full-fledged private cloud solution that offers a rich ecosystem of components for large-scale, multi-tenant environments. However, it is typically more complex to install, operate, and scale, requiring specialized expertise. For lab environments or SMEs that value a more direct, out-of-the-box virtualization solution without extensive overhead, Proxmox’s simplicity, resource efficiency, and robust feature set present a compelling choice.
4.2. Docker
Docker [
27] revolutionized containerization by making it accessible and user-friendly. It packages applications and their dependencies into standardized containers, ensuring consistency across development, testing, and production environments. Docker’s integration with tools like Docker Hub and orchestration platforms like Kubernetes has cemented its position as a cornerstone of modern DevOps practices. Its focus on portability, scalability, and efficiency makes Docker essential for cloud-native application development.
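For illustration, the following minimal sketch drives the local Docker daemon through the Docker SDK for Python; the image tag and command are arbitrary examples.

```python
# Minimal sketch with the Docker SDK for Python (pip install docker).
# Image tag and command are illustrative only.
import docker

client = docker.from_env()  # connect to the local Docker daemon

# Run a throwaway container and capture its stdout (--rm semantics via remove=True).
output = client.containers.run(
    "python:3.12-slim",
    ["python", "-c", "print('hello from an isolated container')"],
    remove=True,
)
print(output.decode())
```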
4.3. Kubernetes (K8s)
Kubernetes [
9] is a robust orchestration platform designed to manage, scale, and automate containerized applications. It organizes containers into pods, enabling seamless communication and scaling across clusters. Kubernetes’ features, such as rolling updates, self-healing, and persistent storage integration, have made it the industry standard for container orchestration in cloud-native and distributed systems. While its complexity presents a learning curve, Kubernetes remains essential for managing complex, large-scale applications.
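As a small illustration of programmatic orchestration, the sketch below lists pods and scales a deployment with the official Python client; the deployment name and replica count are assumptions.

```python
# Sketch with the official Kubernetes Python client (pip install kubernetes).
# Deployment name and replica count are placeholders.
from kubernetes import client, config

config.load_kube_config()

# Pods are the schedulable unit that Kubernetes monitors and self-heals.
core = client.CoreV1Api()
for pod in core.list_pod_for_all_namespaces().items:
    print(pod.metadata.namespace, pod.metadata.name, pod.status.phase)

# Scaling is a one-line declarative change; Kubernetes converges to it.
apps = client.AppsV1Api()
apps.patch_namespaced_deployment_scale(
    name="demo-web", namespace="default",
    body={"spec": {"replicas": 5}},
)
```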
4.4. Desktop Environments
Desktop environments offer a graphical workspace that abstracts the complexities of command-line interfaces, providing users with a visually appealing and customizable experience. They are designed to cater to diverse needs, from resource-efficient options like Xfce to more feature-rich and visually intensive alternatives. These environments support extensive customization, enabling users to personalize layouts, themes, and workflows, and they typically come with integrated tools like file managers and system settings to streamline usage.
Xfce is a lightweight desktop environment known for its balance of performance, functionality, and resource efficiency. Built on the GTK toolkit, Xfce is modular in design, allowing users to replace or customize components. Its low system requirements make it ideal for older hardware or resource-constrained environments. Xfce supports extensive theming and customization, offering a user-friendly and stable desktop experience suitable for both personal and professional use.
4.5. Graphical Desktop-Sharing Systems
Graphical desktop-sharing systems enable remote access to and control of computers over networks, facilitating tasks like troubleshooting, collaboration, and telecommuting. These systems use a client-server model, with protocols such as Remote Desktop Protocol (RDP) and Virtual Network Computing (VNC) enabling secure and seamless interactions. Desktop-sharing technologies are widely adopted in modern hybrid work environments, offering real-time collaboration and remote system management.
VNC is a cross-platform remote desktop protocol that uses the Remote Framebuffer (RFB) protocol described in [28] to share graphical desktops between devices. Its simplicity and compatibility across operating systems make it a popular choice for remote access. VNC operates via a client-server model, capturing the host desktop environment and transmitting it to a remote client; advancements like compression and adaptive quality have improved its performance over bandwidth-constrained connections. VNC is widely used in IT support, education, and system administration.
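Scripted VNC access follows the same client-server pattern; the sketch below uses the third-party vncdotool library, with the address, port, and password being placeholders.

```python
# Sketch of scripted VNC access via the third-party vncdotool library
# (pip install vncdotool). Address, port, and password are placeholders.
from vncdotool import api

# 'host::5901' addresses TCP port 5901 directly (display :1).
client = api.connect("192.168.10.21::5901", password="example")
client.captureScreen("vm-desktop.png")  # save the remote framebuffer as a PNG
client.keyPress("enter")                # inject a keystroke into the session
client.disconnect()
```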
TigerVNC Client [
29] is a high-performance, secure client for accessing remote desktops. It supports modern graphics capabilities, high-resolution displays, and encryption protocols like TLS, ensuring smooth and secure remote desktop connections. Its cross-platform compatibility and intuitive interface make it an efficient tool for remote work, system management, and troubleshooting, catering to both novice and advanced users.
4.6. Cloudflare
Cloudflare is a prominent web infrastructure and security platform that functions as a powerful reverse proxy server. By sitting between client requests and backend servers, Cloudflare intercepts, optimizes, and secures web traffic while protecting backend servers from direct exposure to the internet. This intermediary role enables Cloudflare to deliver enhanced performance, robust security, and flexibility in managing web applications.
Cloudflare secures web traffic with signed SSL/TLS certificates, offering end-to-end encryption to protect data from eavesdropping and tampering. By terminating SSL connections at the edge of its global network, Cloudflare reduces the computational load on backend servers. It supports various SSL configurations, such as flexible SSL for basic encryption and strict SSL for maximum security, catering to diverse business needs.
Cloudflare also provides extensive customization options through its rules engine. Users can create firewall rules to block malicious traffic, implement rate-limiting to prevent abuse, and use page rules to optimize performance or redirect visitors. Advanced features like Workers and Transform Rules enable scripting and dynamic customization, allowing businesses to deploy serverless applications, rewrite requests, and adapt content on the fly. These capabilities make Cloudflare a versatile and essential tool for securing and optimizing web applications, enhancing user experiences, and simplifying traffic management.
5. Applications, Use Cases, and Cost Analysis
The deployment of a cluster of commodity hardware provides an affordable and scalable solution to run various workloads, including hosted services, machine learning experiments, and big data processing. This section highlights the primary applications and use cases supported by the cluster, which demonstrate its versatility and resource efficiency. In addition, a cost analysis will be conducted to evaluate the economic benefits of utilizing commodity hardware, emphasizing its cost-effectiveness compared to traditional high-end computing solutions.
We implemented our proposed architecture in a research laboratory at the University of Patras equipped with 16 personal computers (PCs), each powered by an Intel
® Core™ i7-10700 CPU @ 2.90 GHz, except for two systems that are equipped with an Intel® Core™ i7-3770K processor @ 3.50 GHz and one that uses an Intel
® Core™ i7-4770 @ 3.40 GHz. Fifteen of these PCs are configured with 16 GB of memory, and one has 24 GB. In terms of storage, the systems are equipped with 512 GB Solid State Drives (SSDs). Detailed specifications are shown in
Table 1. Additionally, the three systems with different processors feature RAM and storage components of varying speeds and manufacturers, allowing us to evaluate CommC’s performance under heterogeneous hardware conditions. The first version of our system is publicly available on GitHub and can be accessed using the
GitHub link, accompanied by a detailed installation tutorial and troubleshooting documentation.
5.1. Application and Use Cases
The cluster serves as a backbone for various hosted services that support research, collaboration, and operational needs. These services include APIs, web hosting, and collaborative platforms, which leverage the cluster’s distributed architecture for reliability and performance. DiscMycoVir is an elegant and user-friendly web platform for biological analysis, emphasizing a comprehensive and detailed result-reporting user experience. The Overleaf Server is a collaborative LaTeX editing platform within a Kubernetes environment tested specifically on a MicroK8s cluster, utilizing NFS for persistent storage.
Table 2 summarizes the hosted services and their respective types.
The cluster is extensively utilized for Apache Spark experiments, enabling distributed computing for tasks such as natural language processing, graph analytics, and community detection. These experiments make full use of the cluster’s computational and memory resources, as shown in
Table 3. It should be noted here that the RAM shown in
Table 3 expresses the memory usage for each experiment and does not account for the necessary RAM that the operating system and the containers need. Therefore, an overhead of 10–20% exists, which our system is more than capable of handling.
The SparkNLP experiment leverages Spark for natural language processing tasks. It utilized seven nodes, consuming 28 CPUs and 68 GB of RAM over an estimated total duration of 80 h. The Neural Networks with TorchDistributor experiment integrates Spark with TorchDistributor to run distributed neural networks; it used four nodes, 16 CPUs, and 32 GB of RAM over 80 h. The Spark Bayesian Networks [
31] experiment is focused on implementing Bayesian networks in Spark. It used seven nodes, consuming 28 CPUs and 68 GB of RAM over 40 h.
The next series of experiments form part of ongoing research on graph analytics, with objectives like counting triangles in graph communities [
32]. The algorithms under evaluation were implemented in three phases. The first one utilized nine nodes, 36 CPUs, and 104 GB of RAM, running for a total of 192 h. The second and third phases have a broader aim to detect community structures within graphs, and the resource usage was adjusted accordingly. The second phase was executed on 13 nodes with 28 CPUs and 80 GB of RAM for 144 h, and the third one required significant resources, with 11 nodes, 44 CPUs, and 148 GB of RAM, making it the most resource-intensive experiment, lasting for 240 h.
5.2. Cost Analysis
In the previous subsection, several applications of the cluster were presented, showcasing its versatility and utility. In this subsection, we will focus on conducting a detailed cost analysis by estimating the expenses associated with running the same experiments on prominent cloud platforms, namely Google Cloud, Amazon Web Services (AWS), and Microsoft Azure. This analysis aims to compare the resource requirements of each experiment and their corresponding costs on these industry-leading platforms, providing insights into the economic efficiency of utilizing a commodity hardware cluster versus well-established cloud services.
The analysis will focus on three key aspects: computational resources, storage, and the network. The computational resources aspect includes the number of required instances, the core count, and the RAM necessary for each experiment. Regarding storage, each experiment was allocated 512 GB, and this will be factored into the cost analysis for each cloud platform. However, network traffic costs—both inbound and outbound—will be excluded from the analysis, along with any additional costs associated with disk read and write operations enforced by certain services. This simplification will result in a slightly lower overall estimated cost for the experiments, but it ensures a more streamlined and focused cost comparison.
Each cloud platform provides a cost estimator tool that allows users to estimate the necessary funds for their system based on the specific requirements of an experiment. However, this approach often results in a rough estimation of the cost, or a “guesstimate”, due to variations in how each service calculates expenses. For instance, differences in pricing models, regional availability, instance types, and additional fees for features, such as data transfer or specialized hardware, can cause discrepancies in the estimations. As a result, although these tools offer a useful starting point, they may not always reflect the precise costs incurred during actual usage. Therefore, we analyze each aspect separately.
Firstly, for each cloud service, we selected the appropriate configuration and virtual machine (VM) type based on the resource requirements of each experiment.
Table 4 presents the chosen specifications for each platform. It is evident that, in some cases, the RAM requirements slightly exceed the experiment’s actual needs. However, we opted for the configuration that most closely aligned with the resource demands of the experiments while ensuring the necessary computational capacity was met. This pragmatic approach balances cost efficiency and functionality, leveraging the most suitable VM types available on each platform.
Table 4 provides details about the cost rate per hour of usage for each selected configuration. It is important to note that these rates may vary depending on the date of access or the user’s geographic location. For this analysis, the cost rates were retrieved on 22 January 2025, from Greece, with the servers hosted in the Central Europe region. These factors may slightly influence the overall cost and should be taken into consideration when interpreting the results.
The costs of each experiment regarding the computational aspect were calculated as shown in Equation (
1). The results are depicted in
Table 5. The table summarizes the estimated costs for running the experiments on three major cloud platforms: Google Cloud, Azure, and AWS. Each experiment is listed, along with its corresponding cost on each platform, calculated based on the VM configurations and usage hours. The total cost for all experiments is also provided, highlighting the differences in cost efficiency among the platforms.
The aforementioned cost is considered the monthly cost of the service used to conduct each experiment. Each service offers various storage options. However, we chose a persistent storage SSD of 512 GB, with the cost depicted in the last row of
Table 5.
Our cluster comprises 16 computers, each with a maximum power supply of 200 watts, giving a total power draw of 3.2 kW, i.e., an energy consumption of 3.2 kWh per hour of operation for the entire cluster. In a worst-case scenario, where all systems operate at full capacity continuously for 24 h a day over a month (720 h), the cluster consumes 3.2 × 720 = 2304 kWh, and the total cost would amount to 2304 · c €, where c is the average cost of 1 kWh in Greece. This duration is more than sufficient to complete the required experiments, and the overall cost remains significantly lower compared to alternative cloud platforms.
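The arithmetic behind both cost figures can be made explicit. The sketch below assumes Equation (1) multiplies the hourly instance rate by the instance count and duration; all rates shown are placeholders rather than the values of Table 4.

```python
# Worked cost arithmetic. The cloud formula assumes Equation (1) is
# rate x instances x hours; all rates below are placeholders, not Table 4 values.

def cloud_cost(rate_per_hour_eur, instances, hours):
    """Estimated cloud cost for one experiment (Equation (1), assumed form)."""
    return rate_per_hour_eur * instances * hours

def cluster_energy_cost(nodes, watts_per_node, hours, eur_per_kwh):
    """Worst-case energy cost of the commodity cluster."""
    kwh = nodes * watts_per_node / 1000 * hours   # 16 nodes x 200 W = 3.2 kWh per hour
    return kwh * eur_per_kwh

# Hypothetical experiment: 7 instances for 80 h at 0.35 EUR/h each.
print(cloud_cost(0.35, 7, 80))                  # -> 196.0 EUR

# Whole cluster at full load for a month: 3.2 kWh/h x 720 h = 2304 kWh.
print(cluster_energy_cost(16, 200, 720, 0.20))  # -> 460.8 EUR at an assumed 0.20 EUR/kWh
```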
If commodity hardware were not utilized, the alternative approach would require the acquisition of pre-configured enterprise systems from major vendors, which comes with significantly higher costs. Systems such as the Dell PowerEdge R7525, HPE ProLiant DL385 Gen10 Plus, Lenovo ThinkSystem SR645, and Supermicro A+ Server offer high-performance configurations, typically featuring 128 to 240 threads, 256–512 GB RAM, and 8–16 TB of SSD storage. However, these solutions come at a steep price, ranging from EUR 15,000 to over EUR 30,000, depending on CPU configurations, memory capacity, storage options, and vendor support packages. For instance, a Dell PowerEdge R7525 with 128 cores, 256 GB RAM, and 8 TB NVMe storage can cost between EUR 18,000 and EUR 25,000, while an HPE ProLiant DL385 Gen10 Plus with similar specifications ranges from EUR 20,000 to EUR 30,000. These prices vary further based on warranties, performance tiers, and additional enterprise support options. In contrast, CommC leverages existing commodity hardware, significantly reducing upfront investment costs while providing a scalable and flexible computing environment. This cost-effective approach makes high-performance computing more accessible to organizations that may otherwise struggle with the financial burden of traditional enterprise solutions.
Discussing the performance of our design relative to popular cloud alternatives is a difficult task. Apart from sustainability, the motivation behind the proposed system design was the high cost one must face to take advantage of such infrastructures. However, when observing the hardware specifications of our implemented cluster and the alternatives in
Table 4, one would expect that our approach should outperform Azure and Amazon Web Services due to higher CPU clock speed, and Google Cloud would be the best performer regarding tackling the tasks shown in
Table 3. This expectation, by itself, may be misleading. Multiple factors affect the performance of such systems, such as disk read/write speeds and network transfer speeds. In this study, we specified a similar SSD for each configuration in order to minimize speed differences and the cost of the alternatives. Finally, in terms of networking, our commodity hardware is connected to a local network, which can sometimes be slower than communication between virtual machines running on a single physical system. However, cloud providers do not guarantee that their virtual machines will be hosted on the same physical machine. They offer network plans for a fixed fee or based on the amount of transferred data, which would further elevate the cost. To sum up, this study does not attempt to compare the performance of our design against cloud alternatives but rather offers a sustainable, highly customizable solution that can satisfy the needs of at least medium-sized organizations and enterprises.
6. Conclusions and Future Work
The proposed CommC system effectively demonstrates the potential of repurposing commodity hardware into a high-performing, multi-purpose cluster. By integrating advanced virtualization and containerization technologies, including KVM, Docker, and Kubernetes, CommC provides a scalable, flexible, and cost-efficient solution tailored to diverse workload demands. This hybrid architecture optimizes underutilized hardware and significantly reduces operational costs while offering enhanced fault tolerance and security.
The use of commodity hardware extends the lifecycle of existing systems, contributing to sustainability by addressing the growing concern about E-waste. Through strategic resource allocation and innovative management approaches, CommC empowers medium-sized enterprises and educational institutions to adopt high-performance computing solutions without incurring prohibitive costs.
However, we were unable to find any recent (2024–2025) peer-reviewed publications specifically addressing the energy consumption and environmental impact of cloud providers. Many articles [
33,
34,
35] exist, revealing that over the past five years, the operation of data centers by major tech companies has led to more than USD 5.4 billion in public health costs due to air pollution. In 2023 alone, these costs were estimated at USD 1.5 billion, marking a 20% increase from the previous year. Google’s data centers were identified as the largest contributors, accounting for USD 2.6 billion in health-related expenses between 2019 and 2023. The energy consumption of these centers, often reliant on fossil fuels, has been linked to air pollution that contributes to ailments such as cancer and asthma.
Furthermore, popular cloud service providers do not offer a carbon emission estimator based on a chosen configuration, making it challenging to accurately assess the environmental impact of different setups. As a result, a direct comparison of energy consumption between our system and cloud-based alternatives remains difficult without standardized emission metrics. However, one of the key sustainability advantages of our approach is that it extends the lifespan of existing hardware, preventing it from being discarded as electronic waste. By repurposing commodity hardware instead of decommissioning it, our system reduces E-waste accumulation and contributes to a more sustainable computing model.
The CommC system offers a cost-effective and scalable solution for leveraging commodity hardware, but it is not without its limitations. One significant challenge lies in the complexity of maintaining the architecture. The system integrates multiple advanced technologies, including KVM, Kubernetes, Docker, and Proxmox, which, while offering flexibility and functionality, introduce a level of operational complexity. Organizations with limited technical expertise or small IT teams may find it difficult to manage and troubleshoot such a setup. The steep learning curve associated with configuring and maintaining these tools could require considerable time and effort, potentially offsetting the system’s benefits.
Another limitation of CommC that is related to the underlying systems, mainly Proxmox, is the number of nodes that can be added to a single cluster. Proxmox relies on Corosync for synchronization, which can introduce overhead when achieving consensus. According to Proxmox’s official documentation, “There is no explicit limit for the number of nodes in a cluster. In practice, the actual possible node count may be limited by the host and network performance. Currently (2021), there are reports of clusters (using high-end enterprise hardware) with over 50 nodes in production”. However, other accounts indicate that Corosync may exhibit communication issues in clusters with more than 30 nodes. The primary bottleneck and root of these types of problems lies in the computers’ network interface controllers, which, unfortunately, cannot be remedied without a hardware upgrade. Our own deployment, consisting of 16 physical nodes, has not encountered any noticeable delays or discrepancies.
The cost analysis presented in this paper highlights the significant financial advantages of the CommC system over cloud services, even when factoring in additional costs, such as operational expenses and hardware failures. While the comparison primarily focuses on initial expenses and general operational costs, it is important to note that long-term factors like power consumption, cooling, periodic hardware upgrades, and occasional hardware failures do not substantially diminish the cost-efficiency of the CommC system. Even when accounting for these additional expenses, the overall affordability remains superior to cloud-based alternatives, where recurring costs such as subscription fees and data transfer charges often accumulate rapidly over time. Furthermore, the flexibility to avoid vendor lock-in and the ability to use open-source tools further enhance the system’s financial viability. By carefully managing operational expenses and leveraging commodity hardware, the CommC system provides a robust and cost-effective alternative to traditional cloud services, even when considering the potential for additional costs.
In the future, we plan to improve the CommC system in two primary areas. First, we aim to enhance resource allocation through intelligent automation powered by machine learning algorithms. These algorithms will dynamically analyze workload patterns to optimize resource distribution, ensuring higher efficiency and improved utilization of the underlying hardware. This approach will minimize the need for manual intervention, making the system more adaptable to varying demands. Additionally, we plan to develop an API that securely integrates with the laboratory’s Proxmox cluster to automate processes, generate telemetry, and efficiently manage resources for multiple users. Furthermore, we intend to implement a machine learning model capable of predicting future resource utilization based on historical usage trends, leveraging auto-regressive and moving average techniques to enhance forecasting accuracy; a minimal sketch of this idea follows.
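The sketch below illustrates the auto-regressive/moving-average idea on synthetic utilization data using statsmodels; the series, model order, and forecast horizon are illustrative assumptions, not our trained model.

```python
# Sketch of ARMA-style utilization forecasting with statsmodels
# (pip install statsmodels numpy). Data, order, and horizon are illustrative.
import numpy as np
from statsmodels.tsa.arima.model import ARIMA

rng = np.random.default_rng(0)
hours = np.arange(24 * 14)                          # two weeks of hourly samples
cycle = 40 + 25 * np.sin(2 * np.pi * hours / 24)    # synthetic daily usage cycle (%)
cpu_util = cycle + rng.normal(0, 5, hours.size)     # plus measurement noise

# ARIMA(2, 0, 1) == ARMA(2, 1): AR and MA terms, no differencing.
model = ARIMA(cpu_util, order=(2, 0, 1)).fit()
forecast = model.forecast(steps=24)                 # predicted load for the next 24 h
print(forecast.round(1))
```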
Second, we intend to develop a user-friendly platform for monitoring and accessing the various services hosted within the CommC environment. This platform would simplify administrative tasks by providing an intuitive interface for visualizing resource usage, managing services, and configuring system parameters. It would also enhance the user experience by allowing seamless access to hosted services, thereby lowering the technical barrier for end-users. These enhancements are expected to improve the scalability, usability, and overall efficiency of the CommC system, paving the way for broader adoption across diverse organizational settings.