Enhancing DevOps Practices in the IoT–Edge–Cloud Continuum: Architecture, Integration, and Software Orchestration Demonstrated in the COGNIFOG Framework

by Kostas Petrakis 1, Evangelos Agorogiannis 1, Grigorios Antonopoulos 1,*, Themistoklis Anagnostopoulos 1, Nasos Grigoropoulos 1, Eleni Veroni 1, Alexandre Berne 2, Selma Azaiez 2, Zakaria Benomar 3, Harry Kakoulidis 4, Marios Prasinos 4, Philippos Sotiriades 4, Panagiotis Mavrothalassitis 5 and Kosmas Alexopoulos 5,6
1 Netcompany-Intrasoft S.A., 2b, rue Nicolas Bové, L-1253 Luxembourg, Luxembourg
2 Commissariat à l’Énergie Atomique et aux Énergies Alternatives, Laboratoire d’Intégration de Systèmes et des Technologies (CEA LIST), Université Paris-Saclay, F-91120 Palaiseau, France
3 Thales Research & Technology, Thales Group, 1 Avenue Augustin Fresnel, F-91767 Palaiseau, France
4 Telematic Medical Applications, Skra 1-3, 17673 Kallithea, Greece
5 Laboratory for Manufacturing Systems & Automation (LMS), Mechanical Engineering & Aeronautics Department (MEAD), University Campus Rio Patras, University of Patras, 26504 Patras, Greece
6 Department of Digital Industry Technologies, National and Kapodistrian University of Athens, Euripos Complex, 34400 Psachna, Greece
* Author to whom correspondence should be addressed.
Software 2025, 4(2), 10; https://doi.org/10.3390/software4020010
Submission received: 25 February 2025 / Revised: 4 April 2025 / Accepted: 11 April 2025 / Published: 15 April 2025

Abstract: This paper presents COGNIFOG, an innovative framework under development that is designed to leverage decentralized decision-making, machine learning, and distributed computing to enable autonomous operation, adaptability, and scalability across the IoT–edge–cloud continuum. The work emphasizes Continuous Integration/Continuous Deployment (CI/CD) practices, development, and versatile integration infrastructures. The described methodology ensures efficient, reliable, and seamless integration of the framework, offering valuable insights into integration design, data flow, and the incorporation of cutting-edge technologies. Through three real-world trials in smart cities, e-health, and smart manufacturing and the development of a comprehensive QuickStart Guide for deployment, this work highlights the efficiency and adaptability of the COGNIFOG platform, presenting a robust solution for addressing the complexities of next-generation computing environments.

1. Introduction

1.1. CI/CD Practices in the IoT–Edge–Cloud Continuum

In the constantly evolving technological landscape, the widespread adoption of Artificial Intelligence (AI), the Internet of Things (IoT), and cloud computing has created significant opportunities and challenges in the IT industry. A primary concern is managing the vast amounts of data generated by such systems. In 2020, approximately 64 trillion gigabytes of data were created, duplicated, and consumed globally [1]. This volume continues to grow with the increasing integration of IoT services in sectors such as transportation, healthcare, agriculture, and industrial automation. By 2022, there were 13.2 billion active IoT devices worldwide, a number projected to reach 34.4 billion by 2032 [2].
Traditional IoT architectures, which rely heavily on centralized cloud computing for data processing and analytics, face several limitations. While straightforward in design, these architectures struggle with inefficiencies in latency, network traffic, computational processing, and energy consumption [3]. Additionally, centralized systems often fail to meet the stringent demands of time-sensitive and real-time applications, where delays in transferring data to the cloud and back compromise performance and timely, efficient processing [4]. Efficiently managing the entire lifecycle of an IoT ecosystem, including the development, deployment, and operation of its applications, is a complex and demanding task.
Modern software development and deployment practices have evolved to adopt DevOps approaches, especially continuous integration/continuous delivery (CI/CD), to overcome these challenges.
These practices enable frequent code integration, automated testing, and continuous deployment, ensuring the smooth integration of components while maintaining system reliability. In the context of IoT–edge–cloud environments, CI/CD practices are particularly crucial, as they help orchestrate these distributed systems while ensuring consistent quality and rapid delivery [5,6].
Edge computing reduces the quantity of data transferred to the cloud by processing data closer to its source using methods such as data fusion, trend analysis, and partial decision-making. This approach lessens the problems of network traffic, bandwidth usage, latency, and energy consumption. Moreover, by keeping local resources available during major events affecting cloud infrastructure, such as cyberattacks, edge computing enhances security and resilience. Fog computing, first introduced by Cisco Systems in 2012 [7], expands this approach by adding a layer of distributed resources between edge devices and the cloud.
The IoT–edge–cloud ecosystem represents a paradigm shift in distributed computing, where workloads are dynamically deployed across cloud, fog, and edge environments to optimize resource utilization and minimize latency [8]. Given the complexity of such environments, robust orchestration frameworks like Kubernetes and automated CI/CD pipelines play a pivotal role in ensuring scalable and efficient deployment.
Together, these architectures form the IoT–edge–cloud continuum, an ecosystem that integrates powerful cloud data centers with energy-efficient, low-latency, low-resource devices close to data sources [9]. This continuum is ideal for AI-driven workloads, where IoT devices generate and collect raw data and the cloud handles data-intensive analytics, with edge servers acting as an intermediate layer. However, orchestrating this heterogeneous ecosystem presents challenges, including the dynamic management of distributed computing, storage, and network resources. Effective orchestration is critical to reducing energy consumption, latency, and message congestion while ensuring secure and efficient data handling in large-scale IoT scenarios.
The COGNIFOG project aims to build an open-source, modular framework for next-generation information systems that spans from IoT devices to the edge and the cloud. It provides a secure Cognitive Fog environment with dynamic resource and service orchestration capabilities, real-time scalable monitoring, and AI-based analytical services to ensure adaptability, dependability, scalability, and energy efficiency. The framework is currently under development and is named after the Horizon Europe project COGNIFOG, which will be completed by the end of 2025 [10].

1.2. Orchestration Challenges

Fog computing extends cloud resources to the edge, reducing latency and enhancing proximity to user devices. Unlike cloud infrastructure, fog and edge environments often rely on resource-constrained devices, making traditional virtual machines (VMs) less suitable. Container-based virtualization offers a lightweight, secure, and portable alternative, driven by the adoption of microservices, which decompose applications into smaller, independently managed components. Containers offer superior scalability, flexibility, and agility compared to VMs, making them ideal for modern distributed systems [11].
In dynamic microservices platforms, containers are organized into clusters consisting of a control plane and compute nodes, which can be physical or virtual and located on-premises or in the cloud. The control plane manages the desired state of the cluster, while the nodes execute applications and workloads. As container adoption grows, organizations are challenged to meet service-level agreement (SLA) and quality of service (QoS) requirements. Advanced orchestration solutions are essential to autonomously provide, deploy, scale, and secure containerized applications, driving the shift to fully self-healing systems.
Within the IoT–edge–cloud continuum, maintaining application portability, scalability, and resiliency becomes increasingly challenging due to the geographically distributed and heterogeneous nature of clusters, often complicated by unstable connectivity and node availability [12]. Multi-cluster orchestration addresses these complexities by managing applications and services across multiple clusters, whether on premises, in hybrid environments, or in multi-cloud environments. This approach aims to simplify operations, increase scalability, improve resiliency, and strengthen security and compliance.
Some multi-cluster orchestration solutions, such as Rancher Fleet [13], Open Cluster Management [14], and OpenShift [15], are currently used in the continuum to achieve comprehensive communication and management of resources [16]. Still, they often fail to address common communication, networking, high-availability, and service discovery issues because they were not designed for the specifics of fog–edge computing [17].

1.3. Resource and Availability Challenges

Resource allocation within the IoT–edge–cloud continuum presents significant challenges: the heterogeneity and connectivity issues described in Section 1.2 complicate effective resource distribution across geographically dispersed clusters. The dynamic nature of workloads and the mobility of edge devices further exacerbate resource provisioning complexities, necessitating advanced orchestration techniques capable of real-time demand balancing across multiple clusters. Ineffective resource management can lead to suboptimal utilization, increased latency, and degraded service quality [18].
Achieving energy efficiency while maintaining high performance across the continuum is another critical challenge. Edge and fog devices often operate under stringent energy constraints, and inefficient orchestration can result in unnecessary energy consumption. In large-scale deployments, cumulative energy usage can escalate significantly, underscoring the need for effective energy management strategies. Techniques such as dynamic resource scaling and adaptive workload distribution are essential to optimize energy usage while meeting the performance and quality requirements of time-sensitive applications [19].
Deployment and availability time also pose significant barriers in the IoT–edge–cloud ecosystem. Continuous deployment pipelines and multi-cluster orchestration tools aim to streamline application delivery, but their effectiveness is often limited by unstable connectivity and varying node availability across distributed environments. Ensuring high availability and minimal downtime is particularly challenging in fog and edge computing, where devices are susceptible to failures, disconnections, or performance degradation. Current orchestration tools, while helpful, often fall short in addressing the unique demands of fog and edge environments, such as service discovery and network reliability. Novel solutions are needed to enhance the robustness and responsiveness of deployment processes while minimizing disruptions to service availability [20].

2. Solution

2.1. Overall COGNIFOG Framework (For IoT–Edge–Cloud Continuum)

The COGNIFOG framework presented in this paper represents an innovative cognitive fog solution tailored to the challenges of next-generation computing, with a particular emphasis on enabling efficient CI/CD processes and enhancing DevOps operations that facilitate the execution of relevant applications. This framework is designed to orchestrate diverse distributed computing resources across IoT, edge, and cloud ecosystems, creating a unified fog continuum that supports the dynamic needs of modern software development and deployment pipelines. COGNIFOG focuses on the following:
  • Reducing operational costs and accelerating service delivery by dynamically provisioning computing, storage, and network resources with minimal human intervention, aligning with the automation goals of DevOps practices.
  • Enabling the rapid development, testing, and deployment of applications through open application programming interfaces (APIs), streamlining the CI/CD pipeline, and fostering agility in DevOps environments.
This paper emphasizes the critical role of the deployment and integration environment in achieving the goals of the COGNIFOG project. By using advanced automation and configuration management strategies, the framework ensures smooth integration and deployment processes across distributed systems. These practices not only reduce human intervention but also increase consistency, scalability, and adaptability in the management of complex infrastructures. Through efficient automation, the COGNIFOG framework significantly accelerates the realization of its goals and supports the rapid and reliable delivery of applications in a dynamic, resource-constrained ecosystem.
A high-level architecture diagram of the COGNIFOG framework is shown in Figure 1. A complete presentation of the overall COGNIFOG framework and software components is given in [21].

2.2. CI/CD Practices, Tools, and Platform

The Integration Platform of COGNIFOG is the software environment that integrates various services and applications executed within the project’s scope. It provides the necessary infrastructure and customized CI/CD tools to support the integration, testing, and delivery of the COGNIFOG services and applications developed as part of the project.
A suitable development environment was established to guide the development of the COGNIFOG platform, allowing for the continued automation and close monitoring of the development, testing, and integration processes. The primary objective of this activity is to set up, implement, and configure a CI/CD infrastructure to facilitate development workflows [22].
The conceptual approach of the development and deployment workflows is depicted in Figure 2. The software development lifecycle within a CI/CD pipeline begins with the developer pushing code to a designated branch in the Version Control System (VCS). This action triggers the Continuous Integration (CI) server, which activates deployment pipelines. The CI server subsequently builds the application and conducts unit tests to validate the implemented functionalities. Upon successful test execution, the application is deployed to the development server for further validation. The code then undergoes a vulnerability assessment to identify and mitigate potential security risks and weaknesses. Once the vulnerability checks are passed, the application is encapsulated into a container image to ensure consistency and portability. This container image is then stored in a centralized registry, which serves as a repository for managing and distributing service images. The CI server retrieves the container image from the registry, and the application is ultimately deployed within a containerized environment on the operational infrastructure, facilitating scalability, reliability, and resilience.

2.2.1. Deployment and Integration Environment

A collection of collaborative productivity tools is deployed to automate the software development, deployment, and integration processes; together, these tools form the COGNIFOG continuous integration/continuous delivery (CI/CD) framework. Two methodologies are defined and recommended, to be selected according to each service’s requirements and restrictions: a Docker-based methodology, where services are deployed as Docker containers, and a Kubernetes-based methodology, where applications are deployed as pods. Well-known open-source tools are used to build this ecosystem for better cost-efficiency, interoperability, transparency, and flexibility (an illustrative Docker Compose sketch of part of the stack follows the list):
  • GitHub is used for source control and tracking, code repository, and code versioning.
  • Jenkins 2.470 is used for automated building, testing, and deployment.
  • Docker 27.0 and Docker Compose 2.26 are used for bundling developed services and components into containers using de facto standards.
  • SonarQube 10.6 is used for performing a static analysis of code to detect bugs, code smells, and security vulnerabilities.
  • Kubernetes 1.30 is used for orchestrating the deployed services.
  • Pods are used for deploying the implemented applications in Kubernetes.
  • Harbor 2.7 is used as a container registry for managing, storing, and distributing the produced Docker images.
  • Portainer 2.20 is used for providing logging and health monitoring information for the developed and deployed applications, offering easy Docker container management through a GUI.
  • Kubernetes Dashboard is used for monitoring and managing the running Kubernetes clusters and their elements.
  • Keycloak 21.0 is used for handling the access and identity management of the users.
  • Microsoft Teams 2024/Mail is used for receiving notifications of the whole CI/CD process.
  • Ansible 2.17 is used for automating the deployment of the CI/CD stack.
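As an illustration of how part of this tool stack can be brought up, the following Docker Compose sketch runs Jenkins, SonarQube, and Portainer side by side. The image tags, ports, and volumes shown here are common defaults and assumptions, not the project’s actual configuration.

```yaml
# Illustrative docker-compose.yml for part of the CI/CD stack.
# Image tags, ports, and volumes are common defaults, not the project's actual setup.
services:
  jenkins:
    image: jenkins/jenkins:lts
    ports:
      - "8080:8080"     # web UI
      - "50000:50000"   # inbound build agents
    volumes:
      - jenkins_home:/var/jenkins_home
  sonarqube:
    image: sonarqube:community
    ports:
      - "9000:9000"     # web UI and scanner endpoint
  portainer:
    image: portainer/portainer-ce
    ports:
      - "9443:9443"     # web UI (HTTPS)
    volumes:
      - /var/run/docker.sock:/var/run/docker.sock   # lets Portainer manage the local Docker engine
volumes:
  jenkins_home:
```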
In terms of security, the CI/CD platform enables users to connect to and access tools with GitHub authentication via Keycloak, which serves as the single sign-on (SSO) solution and ensures that deployment workflows can be triggered exclusively by authenticated and authorized users.
Regarding both the Docker-based and Kubernetes-based solutions, there are two available workflows: the GitHub-based solution, where the source code of the application needs to be available to initiate the process, and the Harbor-based solution, where only the Docker image of the application is required. The appropriate procedure is selected and applied, considering any source-code-sharing security restrictions imposed by technical teams.
For the Docker-based methodology, the GitHub-based approach requires the following steps (an illustrative automation sketch follows these steps):
  • Initially, the developer, after being authenticated via Keycloak (Figure 3a), commits the code to the project’s source code repository on GitHub (Figure 3b).
  • Then, a webhook between GitHub and Jenkins is enabled (Figure 3c), triggering Jenkins’s automation processes.
  • In the next step, before the code is built, it undergoes a quality check using SonarQube for code examination (Figure 3d).
  • If the checks are successful, a new Docker image is created and then uploaded to the Harbor registry (Figure 3e).
  • After those steps, another webhook between Harbor and Jenkins triggers Jenkins’s deployment process (Figure 3f).
  • Finally, the Docker image is executed and deployed as a Docker Container on the development and testing server (Figure 3g) through Jenkins.
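In COGNIFOG, these steps are driven by Jenkins pipelines; purely as an equivalent illustration, the hypothetical Ansible playbook below sketches the same build, push, and run sequence. The registry URL, image name, source path, and host group are assumptions, not the project’s actual values.

```yaml
# Illustrative only: the project drives these steps from Jenkins; this playbook
# sketches the same build -> push -> run sequence. Registry URL, image name,
# source path, and host group are hypothetical.
- name: Build, push, and deploy a service image
  hosts: dev_server
  tasks:
    - name: Build the image from the checked-out source and push it to Harbor
      community.docker.docker_image:
        name: harbor.example.org/cognifog/sample-service
        tag: "{{ build_tag | default('latest') }}"
        build:
          path: /opt/src/sample-service   # path of the cloned GitHub repository
        source: build
        push: true

    - name: Run the pushed image as a container on the development/testing server
      community.docker.docker_container:
        name: sample-service
        image: "harbor.example.org/cognifog/sample-service:{{ build_tag | default('latest') }}"
        state: started
        restart_policy: unless-stopped
        published_ports:
          - "8081:8080"   # hypothetical service port
```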
In the Harbor-based approach, the deployment process adheres to the following steps:
  • The developer, after being authenticated through Keycloak (Figure 4a), pushes a new Docker image directly to the Harbor container registry (Figure 4b).
  • The previous action triggers Jenkins’s automation processes via a webhook (Figure 4c) enabled between Harbor and Jenkins.
  • Afterwards, the Docker image is executed and deployed as a Docker Container on the development and testing server (Figure 4d).
The Kubernetes-based solution within the COGNIFOG framework is depicted in Figure 5 and Figure 6, which demonstrate the concrete steps that should be performed to deploy and run each component successfully as a Kubernetes Pod. For the Kubernetes-based deployment process, a Kubernetes cluster runs on the integration environment, and specific steps are followed.
The GitHub-based approach is carried out with the following steps:
  • The developer, upon authenticating via Keycloak (Figure 5a), commits a new component code to GitHub (Figure 5b).
  • This action enables the GitHub-Jenkins webhook (Figure 5c) and triggers a Jenkins job.
  • The Jenkins pipeline executes a series of checks through SonarQube (Figure 5d).
  • Following successful validation, a Docker image is created and pushed to the Harbor registry (Figure 5e).
  • Then, a webhook between Jenkins and Kubernetes is triggered, and a new version of the component is deployed into the integration namespace of the same Kubernetes cluster (Figure 5f) to facilitate the automated functional and integration testing of the application.
  • If these tests pass, the updated component version is ready for deployment across the operational environment Kubernetes cluster (Figure 5g).
The Harbor-based approach is carried out with the following steps:
  • The procedure is initiated by the developer authenticating via Keycloak (Figure 6a) and pushing a new Docker image to the Harbor (Figure 6b).
  • The webhook between Harbor and Jenkins is activated (Figure 6c), and a Jenkins pipeline is enabled to trigger the deployment of the application through Kubernetes into the integration environment (Figure 6d), where the component tests are performed.
  • Similarly to the GitHub-based approach, if the checks are successful, the service can be deployed in the operational environment (Figure 6e).
As outlined above, and as will be shown in real-life deployments (Section 3), Kubernetes serves as the backbone for container orchestration in COGNIFOG, ensuring workload distribution across IoT–edge–cloud environments. Edge IoT devices, however, are limited in terms of resources and performance. The overall platform design therefore incorporates lightweight certified Kubernetes distributions built to support IoT and edge computing, such as K3s [23] and KubeEdge [24]. This approach is considered the most suitable and efficient for IoT and edge cases, as it effectively addresses these constraints [25,26]. The lightweight Kubernetes distributions also offer simpler installation with fewer dependencies and only the features necessary for orchestration. K3s and KubeEdge optimize resource usage at the edge, with binaries under 100 MB, RAM requirements below 512 MB, and native support for ARM architectures. COGNIFOG’s orchestration platform uses K3s and KubeEdge for IoT devices, as presented in the trials in Section 3 that validate the overall solution.
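For illustration, K3s can be tuned for constrained nodes through its configuration file; the sketch below is a hypothetical /etc/rancher/k3s/config.yaml, whose keys mirror the k3s server CLI flags. The tier label is an assumption introduced here for illustration.

```yaml
# Hypothetical /etc/rancher/k3s/config.yaml for a constrained edge node.
# Keys mirror k3s server CLI flags; the tier label is illustrative.
write-kubeconfig-mode: "0644"   # make the kubeconfig readable by non-root tooling
disable:
  - traefik                     # drop the bundled ingress controller to save memory
node-label:
  - "cognifog/tier=edge"        # lets schedulers target edge-class nodes
```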
Kubernetes, along with its lighter distributions, automates deployment, manages scaling policies, and optimizes resource allocation across clusters, enhancing reliability in dynamic edge environments. The integration of Kubernetes operators and Helm charts facilitates seamless multi-cluster management [27].

2.2.2. Automation and Configuration Management

The Kubernetes deployment applied for COGNIFOG is shown in Figure 7, in which three distinct node types have been identified. These nodes collectively form the infrastructure of a Kubernetes cluster [28], and their interactions are designed accordingly. Each plays a distinct role in the management, execution, and maintenance of applications and services within the active containerized environment. The node types are the Control Plane nodes, shown in the left part of Figure 7; the Worker nodes, shown in the right part of Figure 7; and the Load Balancer node, the intermediate node that orchestrates traffic between the control plane and worker nodes.
The overall workflow of deploying an application in a pod on the running cluster involves several steps and orchestrated communication between the Control Plane and Worker Node components through the Load Balancer. The process begins with the developer defining the desired state of the application through a manifest file and using Kubernetes CLI commands to send the deployment request to the API Server. The API Server is the central management entity that interfaces with etcd to store and retrieve the cluster state, and it also communicates with the scheduler to provide notifications about new pods needing assignment. Additionally, the API Server interacts with the controller manager to manage different controllers, ensuring that the cluster maintains the desired state.
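As a concrete illustration of the desired state handed to the API Server, the minimal manifest below declares a deployment that the scheduler then places on worker nodes and the kubelets realize as running containers; it would be submitted with kubectl apply. The component name, namespace, and image are hypothetical.

```yaml
# Minimal illustrative manifest; component name, namespace, and image are hypothetical.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-component
  namespace: integration        # e.g., the integration namespace used for testing
spec:
  replicas: 2                   # desired state the controller manager reconciles
  selector:
    matchLabels:
      app: sample-component
  template:
    metadata:
      labels:
        app: sample-component
    spec:
      containers:
        - name: sample-component
          image: harbor.example.org/cognifog/sample-component:1.0.0
          ports:
            - containerPort: 8080
```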
The scheduler plays a crucial role by monitoring the API Server for unscheduled pods and then determining the most appropriate worker node for each pod based on resource requirements and constraints. Once a worker node is selected from the Load Balancer, the scheduler updates the API Server with the node assignment. The controller manager continuously checks the API Server to maintain the state of controllers, making necessary adjustments to match the desired state specified in the deployment. On the worker nodes, the kubelet, through the Load Balancing node, watches the API Server for pod assignments and communicates with Containerd to manage the lifecycle of containers, including pulling images from registries. The kube-proxy on each worker node ensures that network traffic is correctly routed to the pods by maintaining network rules and inspecting the Load Balancer and the API Server for service and endpoint changes. This coordinated process ensures that the application is deployed and operates as expected within the COGNIFOG Kubernetes cluster and is depicted in Figure 7. In the next subsections, a more detailed analysis is presented for each Kubernetes component.

2.2.3. Kubernetes Control Plane Components

Kubernetes control plane nodes are central to managing the cluster’s overall state and operations. They host essential components, such as the API server, controller manager, and scheduler, which collectively handle tasks like application scheduling, maintaining the cluster state, and scaling operations. These components make global decisions to ensure the cluster functions as intended.
The API server (kube-apiserver) serves as the primary interface to the cluster, exposing the Kubernetes API through RESTful HTTPS operations. Its key roles include acting as a unified endpoint for all Kubernetes resources, maintaining the shared state of the cluster, validating and enforcing configuration data, and managing CRUD (Create, Read, Update, and Delete) operations for API objects like pods, services, and deployments. As the central component of the Kubernetes control plane, the API server processes requests, manages the cluster state, and enforces policies to ensure consistency and reliability. Similarly, etcd, a distributed key-value store, is essential for maintaining the cluster’s state and configurations. It securely stores information about pods, services, and deployments, ensuring consistency through the Raft consensus algorithm. By continuously reconciling discrepancies between the actual and desired state, Kubernetes ensures reliability and resilience in large-scale clusters.
The kube-scheduler is responsible for assigning unscheduled pods to nodes within the cluster and optimizing pod placement based on various criteria. These include resource requirements like CPU, memory, and disk space, as well as constraints such as node affinity, performance considerations, and inter-pod communication needs. By applying these criteria, the scheduler ensures efficient resource allocation while minimizing latency and maximizing performance.
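These scheduling criteria are expressed declaratively in the pod specification. The fragment below, in which the image, label key, and values are hypothetical, requests CPU and memory and constrains placement with node affinity:

```yaml
# Fragment of a pod specification; image, label key, and values are hypothetical.
spec:
  containers:
    - name: analytics
      image: harbor.example.org/cognifog/analytics:1.0.0
      resources:
        requests:           # minimum resources the scheduler must find on a node
          cpu: "500m"
          memory: 256Mi
        limits:             # hard ceiling enforced at runtime
          cpu: "1"
          memory: 512Mi
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
          - matchExpressions:
              - key: cognifog/tier
                operator: In
                values:
                  - edge    # only place this pod on nodes labeled as edge tier
```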
The kube-controller-manager hosts multiple controllers, each responsible for monitoring and managing specific cluster resources. These controllers, including the Node, ReplicaSet, and Endpoint controllers, operate in event-driven loops, continuously reconciling the cluster’s actual state with the desired state defined in Kubernetes objects. They detect discrepancies and take corrective actions, such as replacing failed pods or maintaining the required number of replicas, to ensure the cluster remains stable and aligned with its intended configuration.

2.2.4. Kubernetes Application Plane/Worker Nodes Components

The data (application) plane nodes in the Kubernetes cluster play a vital role in maintaining running pods and providing the runtime environment for containerized applications, such as those of the COGNIFOG Platform during the testing and integration phase. These nodes include essential components like the kubelet, kube-proxy, and the container runtime, which work together to ensure the smooth operation of pods. The kubelet, as the primary node agent, monitors and manages the containers within pods, ensuring their health by interacting with the Kubernetes API to report status and health information. The kube-proxy acts as a network proxy on each node, managing network rules to facilitate continuous communication between pods and external services.
The container runtime, specifically Containerd in this deployment, is responsible for executing and managing the lifecycle of containers on the nodes. Examples of container runtimes include CRI-O, Docker, and others, but Containerd was chosen for its efficiency and reliability. Together, these components enable the effective execution and management of containers, maintain pod health, and ensure efficient networking within and outside the Kubernetes cluster, ensuring the robust operation of the data plane [29].

2.2.5. Load Balancer Node

The load balancer plays a vital role in ensuring high availability (HA) for the Kubernetes control plane by distributing incoming traffic evenly across multiple control plane nodes. Acting as a mediator, it prevents any single control plane node from being overwhelmed, ensuring the control plane remains responsive and highly available. In the COGNIFOG deployment, the HAProxy Community Edition is utilized in Layer 4 (TCP) mode as an external module, enabling the transition from a single control plane node to multiple nodes for enhanced HA. HAProxy employs security measures, such as active SSL health checks, to detect unresponsive or problematic control plane nodes. By periodically polling the Kubernetes control plane nodes, HAProxy removes any faulty node from the pool until it resumes normal operation. This configuration guarantees that incoming requests are reliably routed to functional nodes, minimizing downtime and maintaining the stability of the Kubernetes control plane.

2.2.6. Kubernetes Cluster Networking

Kubernetes separates networking from its core, allowing users to integrate various solutions based on specific needs. The Container Network Interface (CNI) standardizes the interaction between container runtimes and networking layers, facilitating the integration of diverse networking options. Popular CNI plugins include Weave, Flannel, Calico, and Cilium. In the COGNIFOG project, Calico was selected for its robust performance, flexibility, security features, and reliability.
Calico [30] is an open-source networking and security solution for workloads on native hosts, virtual machines, and containers. Operating at Layer 3 of the OSI model, it uses the Border Gateway Protocol (BGP) for node-to-node packet transfer and supports multiple data planes, including Linux eBPF and Windows Host Networking Service (HNS). Calico offers a comprehensive networking stack, supports Kubernetes network policies, and facilitates pod traffic encryption using WireGuard tunnels, ensuring secure, reliable, and high-performance networking within Kubernetes clusters.
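Calico enforces standard Kubernetes NetworkPolicy objects. As a minimal, hypothetical example, the following policy admits MQTT traffic to a data-processing pod only from pods labeled as gateways; all names and labels are illustrative.

```yaml
# Hypothetical policy: only pods labeled role=gateway may reach the data hub on MQTT.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-gateway-to-hub
  namespace: integration
spec:
  podSelector:
    matchLabels:
      app: data-processing-hub
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              role: gateway
      ports:
        - protocol: TCP
          port: 1883    # MQTT
```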
To evaluate the robustness and resilience of the COGNIFOG CI/CD process and its behavior under real-world failure scenarios, a benchmarking procedure has been implemented based on chaos-engineering principles [31]. The procedure is designed to simulate critical fault conditions, including hardware and node failures, network disconnections, and workload overloads, with the aim of assessing the system’s ability to maintain deployment consistency, recover autonomously, and ensure continuous service delivery.
The testing methodology involves deploying two separate environments: a fully featured instance of the COGNIFOG platform, equipped with dynamic reconfiguration, health monitoring, and self-healing capabilities, and a control environment without these enhancements, used as a comparative baseline. The benchmarking is being executed using automated tools, such as Chaos Mesh, to inject failures by deliberately shutting down fog nodes, severing network links, or progressively increasing system load.
As depicted in Figure 8, during hardware and node failure simulations, the system’s response is being measured in terms of workload redistribution, fault isolation, and recovery time once nodes are restored. In overload scenarios, the evaluation focuses on latency, throughput, and resource utilization under increasing device and data volumes. For network failures, the emphasis is on buffering mechanisms, retry logic, and the time required to resume normal operations once connectivity is re-established.
Key performance metrics—including the response time, error rates, task recovery time, and system resource usage—are being collected via Prometheus throughout each test cycle. These experiments are expected to demonstrate that COGNIFOG’s CI/CD infrastructure is resilient to disruptions common in edge environments, offering automated rollback, consistent deployment integrity, and adaptive behavior across heterogeneous and unstable infrastructures [31]. The experimental measurements are currently in progress.
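As an illustration of how such faults are declared, Chaos Mesh experiments are Kubernetes custom resources; the hypothetical example below takes one pod of a target application offline for a fixed interval. The names, labels, and duration are assumptions, not the project’s actual test definitions.

```yaml
# Hypothetical Chaos Mesh experiment: make one pod of a target app unavailable for 60 s.
apiVersion: chaos-mesh.org/v1alpha1
kind: PodChaos
metadata:
  name: fog-node-failure
  namespace: chaos-testing
spec:
  action: pod-failure     # simulate an abrupt component outage
  mode: one               # affect a single randomly selected pod
  selector:
    namespaces:
      - integration
    labelSelectors:
      app: sample-component
  duration: "60s"         # how long the injected failure lasts
```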

2.2.7. Deployment Environment

Virtual servers on cloud infrastructure have been deployed to set up the COGNIFOG Development and Testing environment, following the recommendations for hardware specifications for the Kubernetes cluster core elements, as specified in the official Kubernetes documentation site. The environment includes dedicated machines to host the CI/CD tools and processes, as mentioned in previous sections; the VPN server that establishes a reliable and independent network securing the testing infrastructure; and the Kubernetes services. The running Kubernetes cluster includes three instances for control plane nodes, three for workers, and one for load balancing. An additional all-in-one Kubernetes node is deployed to support further testing related to the edge and multi-cluster management operations of the platform. The described architecture of the overall environment is depicted in Figure 9.
The Kubeadm tool was utilized for the initialization and bootstrapping of the Kubernetes cluster. It carries out the tasks required to quickly and easily establish a minimum viable, secure cluster. Kubeadm is designed to be a modular building block for higher-level tools, and its scope is restricted to the local node filesystem and the Kubernetes API [32].
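A minimal kubeadm configuration for such an HA bootstrap points the cluster at the load balancer rather than at an individual control plane node; in the sketch below, the endpoint name and pod subnet are illustrative assumptions.

```yaml
# Hypothetical kubeadm configuration for bootstrapping the HA control plane,
# passed as `kubeadm init --config cluster.yaml` on the first control plane node.
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
kubernetesVersion: v1.30.0
controlPlaneEndpoint: "haproxy-lb.cognifog.local:6443"  # the HAProxy node, not a single API server
networking:
  podSubnet: "192.168.0.0/16"   # pod CIDR matching Calico's default manifests
```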
The multiple software components introduced within COGNIFOG, combined with the intricacies of the Kubernetes elements, form a complex ecosystem that is difficult to analyze piece by piece. To address these challenges during the integration and deployment phase of COGNIFOG, a deployment view for testing activities is defined, as shown in Figure 10, providing a comprehensive overview of the environments. The focus is on ensuring that the deployment views are aligned with the evolving requirements of each trial and the overall system architecture; they are designed to provide a high-level understanding of the infrastructure components, their interactions, and how they are deployed in different environments.

3. Deployment in Real-Life Environments

The transition from a theoretical framework to practical implementation presents unique challenges in the IoT–edge–cloud continuum, where traditional deployment approaches often fall short of addressing the complexities of distributed systems [33]. The COGNIFOG framework addresses these challenges through an innovative, multi-layered approach to deployment that combines automated tooling, standardized practices, and flexible orchestration capabilities. A key component of this approach is the COGNIFOG QuickStart Guide, a comprehensive deployment solution that bridges the gap between development and production environments. This guide, coupled with robust CI/CD practices and intelligent resource orchestration, enables organizations to rapidly deploy and scale their applications across the continuum while maintaining security and performance. Following extensive laboratory validation through unit and integration testing of the framework components, the following sections detail the practical implementation of these solutions across diverse real-world scenarios, from initial deployment steps to full-scale operations in smart cities, healthcare, and manufacturing environments. The COGNIFOG deployment methodology demonstrates how systematic automation and standardization not only streamline the implementation process but also provide the flexibility and reliability required for modern distributed applications.
The COGNIFOG approach includes dynamically reallocating resources using optimization tools to adapt to fluctuating workload demands [21]. Additionally, seamless over-the-air updates are implemented to ensure applications remain up to date and operate at peak performance. Once processed at the edge, the data are forwarded to the cloud for visualization and storage, depending on the specific requirements of the deployed applications.
Monitoring forms the core of the COGNIFOG framework features, encompassing both infrastructure and applications. At the cloud level, monitoring data collected from edge nodes—such as CPU and RAM usage and latency metrics—are utilized by the resource optimization component for intelligent workload placement. This process is supported by integrating the container orchestration system with the resource optimizer, enabling advanced capabilities for resource optimization and system scalability. The COGNIFOG components deployed [21] are the following:
  • IoT Sensors: These collect data and transmit them to edge-based applications (using MQTT).
  • Edge Gateway: This manages the acquisition and initial processing of IoT data at the edge level [34].
  • Secure Operating System: This provides a trusted execution environment integrated into the edge infrastructure (ARCA Trusted OS [35]).
  • Monitoring System: This tracks resource usage in the edge infrastructure, feeding data to the resource optimization component (Prometheus/Thanos [36]).
  • Resource Optimizer: This analyzes monitoring data to optimize application placement at the edge. It is an extension of the Kubernetes scheduler that implements an extended model for describing enhanced application requirements and infrastructure capabilities (Smart Allocator [36]).
  • Application Performance Monitor: This tracks the health and performance of applications running on edge nodes (polygraph [37]).
  • Data Processing Hub: This handles data processing from IoT devices at the edge level [38].
  • Monitoring Dashboard: This provides intuitive data visualization for decision-making and situational awareness (Front-End Dashboard [39]).

3.1. First Steps for Deployment: The COGNIFOG Quickstart Guide

Deploying and managing the COGNIFOG platform across cloud and edge environments presents unique challenges due to its distributed nature and the need for seamless integration of multiple components provided by different entities. COGNIFOG Quick Start was created to address these challenges by offering an automated, efficient deployment approach that helps COGNIFOG solution users to quickly set up and test the platform’s capabilities, reducing the complexity of infrastructure management and accelerating development cycles.
Quick Start serves as a deployment tool rather than just a one-time setup solution. It is not itself a COGNIFOG use case; rather, it exists to demonstrate how the components are integrated. It provides a structured way to deploy and configure all the necessary infrastructure components, including Kubernetes (K8s), KubeEdge, K3s, and the COGNIFOG subsystems, allowing users to focus on testing, integration, and feature development without spending time on manual configurations. A key advantage of Quick Start is its ability to seamlessly integrate into CI/CD workflows, supporting continuous testing and deployment across complex distributed environments. Whether testing it in a local standalone setup or deploying it across multiple locations, Quick Start ensures consistency, repeatability, and scalability.
To provide an automated all-in-one COGNIFOG deployment, Quick Start uses a combination of virtualization and automation tools tailored for different environments (standalone or distributed):
  • Vagrant 2.4.2 is used for testing within a standalone environment, allowing developers to quickly spin up a virtualized cloud–edge system on their local machines. This environment includes all required components, such as K8s, K3s, and KubeEdge clusters, and the COGNIFOG subsystems to validate applications and workflows before moving to production.
  • Terraform 1.11.2 is leveraged for deployment in a distributed environment, enabling scalable infrastructure provisioning across cloud and edge locations. It ensures smooth and efficient CI/CD processes by automating resource management and deployment across geographically dispersed infrastructures. This component helps COGNIFOG users to deploy the subsystems on their own physical target in a real environment.
  • Ansible 2.17 is used in both standalone and distributed environments to automate the configuration of the deployed infrastructure, ensuring uniformity and reliability, and reducing manual intervention.
  • Helm charts are used for defining, installing, and managing applications running on clusters through the Helm package manager for Kubernetes. They are YAML files that specify the applications, services, configurations, and dependencies required by applications.
  • Kubernetes manifests are YAML or JSON files used to define Kubernetes resources like pods, services, and deployments. They specify how to configure those resources and are directly applied to Kubernetes clusters using the kubectl command.
Figure 11 provides an overview of the Quick Start. This environment aims to integrate most of the components developed by technology providers for the project. As previously mentioned, the goal of Quick Start is to provide users with an emulated continuum environment featuring the COGNIFOG platform to ease integration on real targets. By deploying Quick Start, COGNIFOG users can also see the connections between the main components of the project. In Figure 11, six virtual machines (VMs) are automatically provisioned on a single host (using Vagrant). The first VM represents the cloud, while the remaining VMs emulate edge nodes. This setup can be easily expanded or adapted to match edge user needs by simply modifying the Vagrant configuration file, offering flexibility for different testing scenarios.
The entire virtualized infrastructure is automatically configured using Ansible, which handles the setup process, including the networking configurations as well as the setup of KubeEdge and Kubernetes (K3s). Additionally, Quick Start deploys key COGNIFOG components, including the Smart Allocator, the MQTT message broker, application and node monitoring, and other essential services required for the platform’s operation. To provide a complete emulation environment, Quick Start also provides a set of containerized IoT data generators, emulating data streams from multiple IoT nodes and a weather station scenario. A dedicated simulator, provided by CEA, is seamlessly integrated into the deployment using containers and Kubernetes manifests, allowing users to test and validate their applications under realistic data flows.
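The split between the cloud VM and the edge VMs can be captured in an Ansible inventory that the provisioning playbooks then target group by group; the hypothetical YAML inventory below mirrors the six-VM Quick Start topology (host names and addresses are assumptions).

```yaml
# Hypothetical inventory mirroring the Quick Start topology: one cloud VM, five edge VMs.
# Host names and addresses are illustrative (Vagrant-style host-only network).
all:
  children:
    cloud:
      hosts:
        cloud-vm: { ansible_host: 192.168.56.10 }
    edge:
      hosts:
        edge-vm-1: { ansible_host: 192.168.56.11 }
        edge-vm-2: { ansible_host: 192.168.56.12 }
        edge-vm-3: { ansible_host: 192.168.56.13 }
        edge-vm-4: { ansible_host: 192.168.56.14 }
        edge-vm-5: { ansible_host: 192.168.56.15 }
```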
Energy management in COGNIFOG is addressed through the deployment of a dedicated monitoring and allocation subsystem designed to optimize workload placement based on real-time power usage data, hardware efficiency profiles, and application requirements [31]. The overall strategy focuses on minimizing energy waste and improving sustainability without compromising system performance. Figure 12 illustrates the adaptive resource management enabled by this subsystem.
The energy-aware orchestration layer integrates several components. The monitoring system and application performance monitor gather real-time consumption metrics from cloud and edge nodes, while the resource optimizer uses these data to inform placement decisions [36]. Monitoring is supported by specialized tools such as Prometheus for time-series metric collection, and two open-source energy exporters—Kepler [40] and Scaphandre [41]—which provide granular visibility into resource-level power usage.
Kepler, in particular, leverages eBPF technology to measure container- and node-level energy consumption in Kubernetes clusters. Scaphandre complements this by exposing hardware-level power metrics, such as CPU and memory usage, across various Linux platforms. Both tools integrate seamlessly with Prometheus, allowing energy data to be visualized in real time via Grafana dashboards [31].
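Since both exporters expose Prometheus-compatible endpoints, wiring them into the monitoring stack amounts to adding scrape jobs. In the hypothetical fragment below, the service names and ports are assumed defaults and should be verified per deployment.

```yaml
# Fragment of prometheus.yml; service names and target ports are assumed defaults,
# to be verified per deployment.
scrape_configs:
  - job_name: kepler
    static_configs:
      - targets: ["kepler-exporter.kepler.svc:9102"]   # per-container energy metrics
  - job_name: scaphandre
    static_configs:
      - targets: ["scaphandre.monitoring.svc:8080"]    # host-level power metrics
```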
The energy benchmarking methodology was designed to capture consumption at three levels of the continuum: cloud, edge, and far edge. Monitoring subsystems were deployed across representative nodes, and Prometheus was used to collect energy consumption data across the system. Key performance indicators include CPU and memory utilization, energy draw per container, and system-level power efficiency under different operational conditions.
These data are continuously analyzed by the resource optimizer to enforce energy-aware scheduling and task migration policies. By deploying workloads to the most energy-efficient nodes available at runtime, the system reduces redundant power consumption and contributes to the platform’s sustainability goals.
In future work, further power consumption curves may be published across different test conditions and workloads, further illustrating the impact of dynamic resource orchestration on energy optimization.

3.2. Trial 1 (Thales, Paris-Saclay)—Collaborative Missions in Urban Areas

In urban environments, disruptions caused by natural disasters, social unrest, or other events can impact essential services like power, water, and communication, necessitating coordinated responses from civil organizations. Sensor networks, video surveillance, and other data sources enhance situational awareness and decision-making through OODA (Observe, Orient, Decide, and Act) loops, while digitalized infrastructure, such as air traffic control and transportation networks, improves safety and security [42]. Trial 1 aims to validate the integration of COGNIFOG building blocks for crisis management, using the centennial flood in Paris as a scenario. The trial demonstrates COGNIFOG’s self-adaptability for cloud-native technologies like Kubernetes, Kafka, and software-defined networking, supporting dependable orchestration and meta-orchestration for higher-level functional chains and time-predictable operations.
For this trial scenario, a network of IoT sensors is leveraged to periodically send data to edge-based applications. These applications are designed to detect anomalies that require attention, such as water levels exceeding predefined thresholds.
As depicted in Figure 13, at the edge level, these data undergo pre-processing and analysis using applications managed and provisioned through COGNIFOG. Ensuring the reliability and security of this layer is crucial to prevent misbehavior or configuration issues that could compromise system performance. To support the efficient deployment and management of these edge-based applications, container-based orchestration platforms are utilized, such as K3s and KubeEdge. These platforms streamline the application lifecycle, from deployment to ongoing management, across the continuum.
Regarding the hardware used in this trial, microcontroller- and microprocessor-powered boards are selected, such as Arduino 2.3.6 and Raspberry Pi 4 Model B, equipped with various sensors (e.g., gyroscope, magnetometer, and accelerometer) to emulate the IoT level. At the edge level, NXP LX2160ARDB cards (NXP Semiconductors N.V., Eindhoven, The Netherlands) featuring 16 Cortex A72 cores and 32 GB of RAM are deployed, as well as Raspberry Pi boards running the secure operating system. Additionally, a physical and distributed cluster of server-class machines is integrated to enhance computational capabilities. To configure the testing environment, Proxmox PVE and Terraform are utilized, employing Infrastructure-as-Code (IaC) methodologies with Ansible for automation (as performed in QuickStart). This IaC approach is crucial for enabling the replication and redeployment of the scenario across different environments and for testing new applications.
The deployment of these components follows the COGNIFOG CI/CD pipeline, ensuring consistent and automated rollout across the infrastructure. Each component is containerized and deployed through the established DevOps practices, with the QuickStart Guide providing standardized deployment procedures. The monitoring and resource optimization components are particularly crucial for maintaining the performance and reliability of the edge infrastructure.

3.3. Trial 2 (TMA, Piraeus, Greece)—E-Health Services in the Edge–Cloud Continuum

This trial aims to enhance medical IoT devices by integrating the COGNIFOG framework with the “Noah Ark of Health—NoAH” [43] Telemedicine Station along with wearable IoT devices—such as smartwatches—enabling self-management evaluation notifications for employees of remote worksites. Health measurements, such as blood pressure, oxygen levels, blood markers, temperature, and physical activity, are processed locally and forwarded to the cloud for computational analysis and health status evaluation. These data primarily come from two sources: wearable IoT devices worn by the personnel at construction sites and portable Telemedicine Suitcases (physically in the same location as the edge server) that include advanced health diagnostic tools. The system architecture ensures that these data streams are efficiently handled at the edge, reducing reliance on uninterrupted connectivity to the cloud and offsetting privacy concerns.
Wearable IoT devices (smartwatches) continuously collect vital signs like heart rate, oxygen saturation, and physical activity. They connect via Wi-Fi—provided by IoT edge gateways—and send data to them using the MQTT protocol. Edge gateways also run custom software that preprocesses these messages and automatically generates alerts if anomalies, such as a critical drop in a health metric, are detected.
Alongside the wearables, the Telemedicine Suitcase operates as a compact, portable station containing sensors for ECG, blood pressure, glucose monitoring, and a video-conferencing module for real-time medical consultations. This suitcase transmits its measurements through Ethernet or Wi-Fi to edge servers. When network conditions permit, it can stream data and teleconsultation sessions directly to remote medical staff, but if connectivity is intermittent, local buffering at the edge layer ensures no loss of critical information.
The deployment of Trial 2 utilizes the COGNIFOG QuickStart Guide to ensure standardized and efficient setup of the system components. The general deployment view for Trial 2 is depicted in Figure 14. The distinct layers are highlighted: management, cloud (a working cluster), edge, and far edge. At the top, the management cluster includes components for dashboard visualization, monitoring management, data processing, and workload scheduling, which together provide global control, observability, and orchestration.
The cloud portion of the working cluster hosts resource optimization and container orchestration components, while the edge layer runs telemedicine software on the secure operating system. At this layer, the telemedicine suitcases communicate with the edge server. Finally, Raspberry Pi devices at the far edge collect and filter data from IoT devices [21].
The trial addresses challenges like simplifying the integration of new devices, reducing the installation time, providing first-level diagnoses without human intervention, and ensuring functionality in harsh network conditions. Trial 2 demonstrates the full functionality of COGNIFOG, showcasing its edge computing capabilities in e-health services, with additional AI-based cloud processing and storage. The use case highlights the effective use of medical IoT devices for accurate patient assessments, even in mobile or remote settings.
The goal of the e-health trial is to make the described scenario feasible using COGNIFOG by addressing the outlined challenges and enabling additional functionalities. Specifically, the trial aims to achieve the following:
  • Scale up operations across multiple clusters of physical sites;
  • Manage these sites using negotiated service-level agreements (SLAs);
  • Verify the overall topology and monitor connections;
  • Monitor energy consumption and optimize resource usage;
  • Enhance connectivity through edge computing.
By deploying the telemedicine systems and server at the edge, the trial ensures that critical health data can be processed locally, even in the absence of internet connectivity. This edge computing capability allows for real-time health monitoring and immediate medical interventions without relying solely on cloud connectivity. The IoT edge gateway facilitates this process by acting as a bridge between IoT medical devices and the edge computing infrastructure.
The framework provides advanced interoperability and orchestration capabilities [21], ensuring quality communication between wearable devices (IoT), edge stations, and the backend cloud system. The Data Processing Hub enables IoT devices and edge gateways to exchange information and interact within the setup. The platform leverages decentralized capabilities to orchestrate servers analyzing health data at the edge, reducing the need to transmit sensitive medical information to the cloud, aligning with privacy policies that restrict data transmission.
The platform includes a comprehensive cybersecurity framework with features like a trusted execution environment (TEE), secure operating systems, cryptographic services, and automated threat detection. The CI/CD pipeline ensures these security features are consistently deployed and configured across all sites. The secure operating system provides these security measures, ensuring a robust and secure operating environment for telemedicine applications.
COGNIFOG supports deployment across multiple clusters of physical sites through automated deployment procedures, allowing the telemedicine system to scale horizontally and manage numerous construction sites simultaneously. The cloud-to-edge orchestrator facilitates this scalability by orchestrating containerized applications across the continuum, while network management components support efficient resource allocation and network operations [21].
The resource optimization component uses advanced scheduling and solving mechanisms to manage resource allocation, ensuring telemedicine services meet predefined quality-of-service constraints. The QuickStart Guide automates the deployment of these scheduling mechanisms and their integration with monitoring systems. The dashboard interface allows for user-friendly management and monitoring of these resource allocations, with automated configuration through declarative YAML files.
The adaptation services of COGNIFOG apply corrective actions during service operation to maintain optimal performance and reliability. The deployment pipeline includes the automated setup of these adaptation mechanisms, ensuring consistent service quality across all sites. Monitoring and modeling tools [21] provide detailed insights into system performance and predictive analytics.
The monitoring infrastructure, automatically deployed through the QuickStart Guide, collects telemetry data from various parts of the COGNIFOG continuum. The system aggregates these data into a topological model, providing a comprehensive view of the infrastructure status. The monitoring components ensure detailed and continuous oversight across the edge–cloud continuum, enabling proactive adjustments and maintaining overall system health.
COGNIFOG integrates AI-driven self-optimization features to manage energy consumption across the edge–cloud continuum. The platform includes functionalities for real-time power monitoring, workload balancing, and resource migration to optimize energy use. The use of Kepler, Prometheus, and Grafana by ATOS facilitates comprehensive energy and performance monitoring. By reducing unnecessary data transmission and leveraging edge computing, COGNIFOG minimizes the energy footprint of telemedicine operations. The specific KPI is a 20% reduction in overall energy consumption and associated CO2 emissions.
The e-health trial demonstrates that telemedicine services can be effectively delivered to rural and remote settings, such as construction sites and the marine, aviation, and tourism industries, through enhanced connectivity, privacy and security, scalable operations, SLA-based management, comprehensive infrastructure monitoring, and energy-efficient practices.

3.4. Trial 3 (LMS, Patras)—Automated Edge-Cloud Continuum for Smart Manufacturing

European industries are increasingly investing in industrial robotics to enhance flexibility and efficiency in production and, in turn, their competitiveness in the global market. In 2022, EU member states installed approximately 72,000 industrial robots, a 6% increase over the previous year [44]. Nevertheless, the full potential of industrial robots is not being realized in many EU production plants due to poor acceptance on the shopfloor. Trial 3 aims to improve shopfloor efficiency by introducing flexible mobile dual-arm robots controlled by a flexible IT infrastructure. This approach allows the use of computing resources to be optimized in terms of throughput, cost, and energy savings.
In this scenario, robots perform a variety of tasks such as material handling, assembly, and transportation. The use case involves multiple robotic systems, including mobile robots and collaborative arms, working in tandem to ensure smooth and efficient operations on the factory floor. The system is designed to manage real-time and historical data, coordinate tasks between robots, and ensure that all components function optimally, as depicted in Figure 15. Key challenges addressed include the orchestration of multiple devices, dynamic task scheduling, and the ability to scale operations seamlessly as production demands fluctuate. The integration of the COGNIFOG platform enables real-time, higher-level observation that enhances decision-making: predictive maintenance data and operations can be stored and optimized, and resource utilization is improved through smart orchestration, leading to higher efficiency, reduced downtime, and greater flexibility in handling complex manufacturing workflows.
Here, the deployment of COGNIFOG, especially in relation to its orchestration capabilities, DevOps practices, and performance metrics, is instrumental in improving operational efficiency. In the context of industrial robotics, the integration of COGNIFOG provides the flexibility needed to orchestrate multiple devices, ensuring that they work in coordination to accomplish tasks like assembly or material transport. The need for dynamic orchestration is clear, as manufacturing environments often require real-time adjustments due to disruptions, maintenance schedules, or shifts in production demands. COGNIFOG’s platform facilitates this by enabling smart, automated orchestration of various hardware and software components, reducing the reliance on manual intervention. This is achieved through the use of the QuickStart Guide, which helps deploy the platform efficiently, with minimal setup time and greater consistency. As part of the integration process, the platform also incorporates performance metrics and Key Performance Indicators (KPIs), ensuring that all aspects of the deployment are continuously monitored for operational improvements.
A key challenge addressed by COGNIFOG in the smart manufacturing use case is the need for seamless communication across devices and systems. Traditional systems often struggle with data coordination between edge devices (such as robots) and the cloud, especially when handling real-time data or dealing with network disruptions. By using a multi-layered architecture (edge, fog, and cloud), COGNIFOG is able to process data closer to the source, reducing latency and ensuring faster response times for critical decisions. In situations where cloud connectivity may be unreliable, the edge layer of COGNIFOG ensures that the manufacturing process continues without interruption thanks to the processing and decision-making capabilities at the edge.
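One way to express this edge-local placement, assuming nodes are labelled by continuum layer when they join the cluster, is a simple node selector. The label key, workload name, and image below are hypothetical.
```yaml
# Sketch: pinning a latency-critical workload to edge nodes so that local
# processing continues even if the uplink to the cloud is lost.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: robot-controller
spec:
  replicas: 1
  selector:
    matchLabels:
      app: robot-controller
  template:
    metadata:
      labels:
        app: robot-controller
    spec:
      nodeSelector:
        cognifog.eu/layer: edge   # assumed node label marking edge nodes
      containers:
        - name: controller
          image: registry.example.org/robot-controller:1.0
```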
The use of Kubernetes (K8s) for orchestrating the deployment of IoT devices and edge nodes is another area where COGNIFOG proves beneficial (Figure 16). Kubernetes’ ability to manage containerized applications is essential in the manufacturing environment, where the complexity of tasks and the number of devices may fluctuate. COGNIFOG’s integration with Kubernetes enhances this by allowing for the dynamic scaling of resources based on real-time workload assessments. This level of automation reduces the strain on manual processes, enhances scalability, and ensures that the infrastructure can evolve with the growing demands of the manufacturing system.
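A standard Kubernetes mechanism for such workload-driven scaling is the HorizontalPodAutoscaler. The sketch below scales the hypothetical robot-controller Deployment from the previous example between one and ten replicas based on CPU utilization; the thresholds are illustrative.
```yaml
# HPA sketch (autoscaling/v2): adds replicas when average CPU usage across the
# controller pods exceeds 70% of the requested CPU, and removes them when idle.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: robot-controller-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: robot-controller
  minReplicas: 1
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```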
Furthermore, DevOps practices play a vital role in the successful deployment and maintenance of COGNIFOG in the smart manufacturing use case. Through continuous integration and continuous delivery (CI/CD) pipelines, updates to the system and robot applications can be rolled out with minimal disruption to ongoing operations, ensuring that manufacturing processes always run the latest and most efficient software versions. The monitoring and testing of these deployments, powered by COGNIFOG, allow for continuous optimization.
The QuickStart Guide, which is part of the deployment framework, plays a crucial role in simplifying the initial setup process. It provides a predefined, standardized approach to configuring the COGNIFOG platform for specific industrial use cases. This not only reduces the setup time but also minimizes errors during deployment, ensuring a smoother transition into production environments.
In terms of improvement metrics, COGNIFOG's performance is tracked through various KPIs, such as development and installation time, service establishment time, edge deployment speed, and scalability under load. These metrics allow the manufacturing team to assess the effectiveness of the platform in real time, making it easier to identify areas for improvement. The scalability and resilience of COGNIFOG are tested in the smart manufacturing use case, particularly when multiple robots and devices need to be deployed simultaneously. The platform's dynamic scaling capability ensures that as new robots are added to the system or tasks become more complex, the infrastructure can grow without significant delays.
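To indicate how such disruption-free updates can be enforced at the deployment level, the fragment below uses Kubernetes' rolling-update strategy so that a CI/CD-triggered version change never reduces the number of serving pods. The service name and image tag are placeholders, not part of the trial's actual configuration.
```yaml
# Rolling-update sketch: one extra pod is created at a time (maxSurge: 1) and
# none are taken down early (maxUnavailable: 0), so service capacity holds
# steady while the pipeline rolls out a new image version.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: shopfloor-service
spec:
  replicas: 3
  strategy:
    type: RollingUpdate
    rollingUpdate:
      maxUnavailable: 0
      maxSurge: 1
  selector:
    matchLabels:
      app: shopfloor-service
  template:
    metadata:
      labels:
        app: shopfloor-service
    spec:
      containers:
        - name: service
          image: registry.example.org/shopfloor:2.1   # updated by the pipeline
```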

4. Conclusions and Future Work

The work described in this paper showcases the substantial progress achieved in establishing a robust, secure, and efficient CI/CD environment for the COGNIFOG framework, contributing to the practices applied to efficiently build an IoT–edge–cloud ecosystem. This environment plays a crucial role in ensuring solid development, deployment, and testing of the COGNIFOG platform's components. Through the work carried out on automated CI/CD tools, orchestration functionalities, methodologies, guidelines, security measures, and the development of a comprehensive QuickStart Guide, the platform is well prepared to support the project's progress toward the IoT–edge–cloud continuum, covering the interconnections and essential interfaces of the components as well as the first release of the integrated COGNIFOG platform.
One of the significant achievements of this work is the setup of a complete CI/CD workflow, which leverages best practices in automation and security to enhance efficiency and scalability. The QuickStart Guide provides a standardized approach for deployment, significantly reducing setup complexity and time while ensuring consistency across different environments. This CI/CD practice not only streamlines the integration process but also provides a scalable framework that can accommodate the growing complexity of the platform as new components are integrated. By integrating diverse open-source tools, the platform achieves a high degree of distributed orchestration and integration automation, enabling continuous delivery and monitoring of components, while ensuring their performance is rigorously tested across the IoT–edge–cloud continuum.
Moving forward, the platform is equipped to handle the increasing demands of future development cycles, the final integrated COGNIFOG platform, final optimization and lab testing activities, and pilot activities. This will ensure that COGNIFOG can adapt to the dynamic and challenging environments of the project’s use cases in the areas of Smart Cities, e-Health, and Industry 4.0. Indicatively, preliminary experimental results from Trial 1 and Trial 3 (to be published in upcoming project deliverables) concerning the setup time of the cloud and edge infrastructure, including networking configuration and Kubernetes and K3s orchestration, are already very promising, with setup times measured around 5 min, achieving the respective COGNIFOG KPI value.
Beyond these developments, future research will explore how orchestration strategies can be extended to highly constrained IoT devices. The heterogeneity of such platforms introduces new challenges in management and configuration, which may be addressed through dynamic firmware deployment, configurable protocol stacks, or meta-OS architectures that support peer-to-peer collaboration [45]. Concepts such as swarm intelligence and distributed decision-making could further enable self-organizing, cooperative behavior among cognitive edge devices, unlocking new levels of autonomy and efficiency within the IoT–edge–cloud continuum.
Future research within the COGNIFOG framework will focus on enhancing orchestration intelligence, integrating AI-driven mechanisms to optimize resource allocation and workload distribution across the IoT–edge–cloud continuum. Security will be further reinforced through the adoption of zero-trust architectures and blockchain-based authentication, ensuring resilience in highly dynamic environments. In parallel, the platform’s adaptability will be extended to emerging application domains, such as autonomous mobility, AI-powered industrial automation, and real-time healthcare analytics. Efforts will also address energy efficiency in resource-constrained devices and investigate federated learning techniques to enable decentralized, privacy-preserving AI training at the edge. These advancements aim to position COGNIFOG as a scalable, intelligent fog computing framework equipped to address the evolving demands of distributed computing systems.
In conclusion, the structured integration approach has laid the groundwork for the continued success of the COGNIFOG project. With a secure, scalable, and efficient integration environment in place, the project is well positioned to achieve its technical and operational goals, ultimately contributing to the development of a cognitive fog computing platform that spans the IoT–edge–fog–cloud continuum.

Author Contributions

Conceptualization, K.P., E.A., G.A., T.A., N.G. and E.V.; methodology, K.P., E.A., T.A., G.A., N.G. and E.V.; software, A.B., Z.B., H.K. and M.P.; validation, K.P., E.A., T.A. and G.A.; formal analysis, K.P. and E.A.; investigation, all authors; data curation, K.P., E.A., T.A., G.A. and E.V.; writing—original draft preparation, all authors; writing—review and editing, K.P., E.A., G.A., N.G. and E.V. All authors have read and agreed to the published version of the manuscript.

Funding

This work received funding from the European Union’s Horizon Europe Research and Innovation Framework Programme under Grant Agreement No 101092968 (project COGNIFOG).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The raw data supporting the conclusions of this article will be made available by the corresponding author upon request.

Conflicts of Interest

Authors Kostas Petrakis, Evangelos Agorogiannis, Grigorios Antonopoulos, Themistoklis Anagnostopoulos, Nasos Grigoropoulos, and Eleni Veroni were employed by the company Netcompany-Intrasoft. Author Zakaria Benomar was employed by the company Thales Group. Authors Harry Kakoulidis, Marios Prasinos, and Philippos Sotiriades were employed by the company Telematic Medical Applications. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

References

  1. Statista. Volume of Data/Information Created, Captured, Copied, and Consumed Worldwide from 2010 to 2020, with Forecasts from 2021 to 2025. 2022. Available online: https://www.statista.com/statistics/871513/worldwide-data-created (accessed on 20 November 2024).
  2. Morrish, J.; Arnott, M.; Hatton, M. Global IoT Forecast Report, 2022–2032. 2023. Available online: https://transformainsights.com/research/reports/global-iot-forecast-report-2032 (accessed on 20 November 2024).
  3. Escamilla-Ambrosio, P.; Rodriguez-Mota, A.; Aguirre-Anaya, E.; Acosta-Bermejo, R.; Salinas-Rosales, M. Distributing computing in the internet of things: Cloud, fog and edge computing overview. In Proceedings of the NEO 2016: Results of the Numerical and Evolutionary Optimization Workshop NEO 2016 and the NEO Cities 2016 Workshop, Tlalnepantla, Mexico, 20–24 September 2016; Springer: Berlin/Heidelberg, Germany, 2018; pp. 87–115. [Google Scholar]
  4. Adame, T.; Carrascosa-Zamacois, M.; Bellalta, B. Time-sensitive networking in IEEE 802.11 be: On the way to low-latency WiFi 7. Sensors 2021, 21, 4954. [Google Scholar] [CrossRef] [PubMed]
  5. Pittet, S. Continuous Integration vs. Delivery vs. Deployment. Available online: https://www.atlassian.com/continuous-delivery/principles/continuous-integration-vs-delivery-vs-deployment (accessed on 21 November 2024).
  6. Rehkopf, M. Continuous Delivery Principles. Available online: https://www.atlassian.com/continuous-delivery/principles (accessed on 20 November 2024).
  7. Bonomi, F.; Milito, R.; Zhu, J.; Addepalli, S. Fog computing and its role in the internet of things. In Proceedings of the First Edition of the MCC Workshop on Mobile Cloud Computing, Helsinki, Finland, 17 August 2012; pp. 13–16. [Google Scholar]
  8. Böhm, S.; Wirtz, G. Towards Orchestration of Cloud-Edge Architectures with Kubernetes. In Proceedings of the EAI Edge-IoT 2021—2nd EAI International Conference on Intelligent Edge Processing in the IoT Era 2021, Online, 24–26 November 2021; Springer: Berlin/Heidelberg, Germany, 2022; pp. 207–230. Available online: https://www.researchgate.net/publication/356641902_Towards_Orchestration_of_Cloud-Edge_Architectures_with_Kubernetes (accessed on 22 November 2024).
  9. Kimovski, D.; Mathá, R.; Hammer, J.; Mehran, N.; Hellwagner, H.; Prodan, R. Cloud, fog, or edge: Where to compute? IEEE Internet Comput. 2021, 25, 30–36. [Google Scholar] [CrossRef]
  10. The COGNIFOG Consortium. Horizon Europe Project COGNIFOG Main Website. Available online: https://cognifog.eu/ (accessed on 21 November 2024).
  11. Dogani, J.; Namvar, R.; Khunjush, F. Auto-scaling techniques in container-based cloud and edge/fog computing: Taxonomy and survey. Comput. Commun. 2023, 209, 120–150. [Google Scholar] [CrossRef]
  12. Costa, B.; Bachiega, J., Jr.; de Carvalho, L.R.; Araujo, A.P. Orchestration in fog computing: A comprehensive survey. ACM Comput. Surv. 2022, 55, 29. [Google Scholar] [CrossRef]
  13. SUSE. Rancher Fleet Main Website. Available online: https://fleet.rancher.io/ (accessed on 26 November 2024).
  14. The Cloud Native Computing Foundation. OCM Main Website. Available online: https://open-cluster-management.io/ (accessed on 26 November 2024).
  15. Red Hat. OpenShift Documentation. Available online: https://docs.openshift.com/ (accessed on 27 November 2024).
  16. Fevereiro, D.D.M. Smart Orchestration on Cloud-Native Environments. Master's Thesis, Universidade de Coimbra, Coimbra, Portugal, 2023. [Google Scholar]
  17. Vaño, R.; Lacalle, I.; Sowinski, P.; S-Julián, R.; Palau, C.E. Cloud-native workload orchestration at the edge: A deployment review and future directions. Sensors 2023, 23, 2215. [Google Scholar] [CrossRef] [PubMed]
  18. Vengara, J.; Botero, J.; Fletscher, L. A Comprehensive Survey on Resource Allocation Strategies in Fog/Cloud Environments. Sensors 2023, 23, 4413. [Google Scholar] [CrossRef]
  19. Holmes, T.; McLarty, C.; Shi, Y.; Bobbie, P.; Suo, K. Energy Efficiency on Edge Computing: Challenges and Vision. In Proceedings of the 2022 IEEE International Performance, Computing, and Communications Conference (IPCCC), Austin, TX, USA, 11–13 November 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–6. [Google Scholar]
  20. Gkonis, P.; Giannopoulos, A.; Trakadas, P.; Masip-Bruin, X.; D’Andria, F. A Survey on IoT-Edge-Cloud Continuum Systems: Status, Challenges, Use Cases, and Open Issues. Future Internet 2023, 15, 383. [Google Scholar] [CrossRef]
  21. Adame, T.; Amri, E.; Antonopoulos, G.; Azaiez, S.; Berne, A.; Camargo, J.S.; Kakoulidis, H.; Kleisarchaki, S.; Llamedo, A.; Prasinos, M.; et al. Presenting the COGNIFOG Framework: Architecture, Building Blocks and Road toward Cognitive Connectivity. Sensors 2024, 24, 5283. [Google Scholar] [CrossRef] [PubMed]
  22. Shahin, M.; Ali Babar, M.; Zhu, L. Continuous Integration, Delivery and Deployment: A Systematic Review on Approaches, Tools, Challenges and Practices. IEEE Access 2017, 5, 3903–3943. [Google Scholar] [CrossRef]
  23. K3s. K3s Documentation. Available online: https://docs.k3s.io/ (accessed on 3 April 2025).
  24. KubeEdge. KubeEdge Documentation. Available online: https://kubeedge.io/ (accessed on 3 April 2025).
  25. Böhm, S.; Wirtz, G. Profiling Lightweight Container Platforms: MicroK8s and K3s in Comparison to Kubernetes. In Proceedings of the 13th Central European Workshop on Services and Their Composition, Bamberg, Germany, 25–26 February 2021. [Google Scholar]
  26. Skoularikis, M.; Liatifis, A.; Pliatsios, D.; Argyriou, V.; Markakis, E.; Lagkas, T.; Papadopoulos, G.; Sargiannidis, P. Kubernetes in Edge and Cloud Computing: A Comparative Study of K3s, K0s, MicroK8s, and K8s. Available online: https://www.researchgate.net/profile/Georgios-Papadopoulos-2/publication/389465530_Kubernetes_in_Edge_and_Cloud_Computing_A_Comparative_Study_of_K3s_K0s_MicroK8s_and_K8s/links/67c316528311ce680c791c59/Kubernetes-in-Edge-and-Cloud-Computing-A-Comparative-Study-of-K3s-K0s-MicroK8s-and-K8s.pdf (accessed on 3 April 2025).
  27. Fernandez, J.-M.; Vidal, I.; Valera, F. Enabling the Orchestration of IoT Slices through Edge and Cloud Microservice Platforms. Sensors 2019, 19, 2980. [Google Scholar] [CrossRef] [PubMed]
  28. Kubernetes. Kubernetes Components. Available online: https://kubernetes.io/docs/concepts/overview/components/ (accessed on 29 November 2024).
  29. Kubernetes. Container Runtimes. Available online: https://kubernetes.io/docs/setup/production-environment/container-runtimes/ (accessed on 29 November 2024).
  30. Tigera. Calico. Available online: https://docs.tigera.io/calico/latest/about (accessed on 3 December 2024).
  31. COGNIFOG D2.6. COGNIFOG Website. Available online: https://cognifog.eu/wp-content/uploads/2025/01/COGNIFOG_D2.6_Validation_Benchmarking_Plan_v2.0.0.pdf (accessed on 3 April 2025).
  32. Kubernetes. Creating Highly Available Clusters with Kubeadm. Available online: https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/high-availability/ (accessed on 3 December 2024).
  33. Arzovs, A.; Judvaitis, J.; Nesenbergs, K.; Selavo, L. Distributed Learning in the IoT–Edge–Cloud Continuum. Mach. Learn. Knowl. Extr. 2024, 6, 283–315. [Google Scholar] [CrossRef]
  34. CYSEC. I2CAT and CYSEC Join Forces to Advance IoT Security within COGNIFOG Project. Available online: https://www.cysec.com/i2cat-and-cysec (accessed on 2 December 2024).
  35. CYSEC. ARCA Trusted OS. Available online: https://www.cysec.com/arca-trusted-os/ (accessed on 3 December 2024).
  36. COGNIFOG D3.4. COGNIFOG Website. Available online: https://cognifog.eu/wp-content/uploads/2025/01/D3.4-Data-management-tools-and-secured-cloud-to-edge-continuum-orchestration.pdf (accessed on 3 April 2025).
  37. Dubrulle, P.; Gaston, C.; Kosmatov, N.; Lapitre, A.; Louise, S. A Data Flow Model with Frequency Arithmetic. In Fundamental Approaches to Software Engineering; FASE 2019, Lecture Notes in Computer Science; Hähnle, R., van der Aalst, W., Eds.; Springer: Cham, Switzerland, 2019; Volume 11424. [Google Scholar]
  38. Kentyou. Disaster Management Using Kentyou Platform in the EU Project COGNIFOG. Available online: https://kentyou.com/2023/06/08/disaster-management-using-kentyou-platform-in-the-eu-project-cognifog/ (accessed on 4 December 2024).
  39. COGNIFOG D3.2. COGNIFOG Website. Available online: https://cognifog.eu/wp-content/uploads/2025/01/D3.2_Application-and-connectivity-layer-modules-final.pdf (accessed on 3 April 2025).
  40. Kubernetes Efficient Power Level Exporter (Kepler). Available online: https://sustainable-computing.io/ (accessed on 25 February 2025).
  41. Scaphandre Documentation. Available online: https://hubblo-org.github.io/scaphandre-documentation/index.html (accessed on 25 February 2025).
  42. Kaniewski, P.; Romanik, J.; Zubel, K.; Golan, E.; R-Moreno, M.D.; Skokowski, P. Heterogeneous Wireless Sensor Networks Enabled Situational Awareness Enhancement for Armed Forces Operating in an Urban Environment. In Proceedings of the 2023 Communication and Information Technologies (KIT), Vysoke Tatry, Slovakia, 11–13 October 2023; pp. 1–8. [Google Scholar]
  43. TMA. NOAH—Ark of Health. Available online: https://tma.gr/hardware-solutions/noah/ (accessed on 4 December 2024).
  44. International Federation of Robotics. European Union: Industries Invest Heavily in Robotics. Available online: https://ifr.org/ifr-press-releases/news/eu-industries-invest-heavily-in-robotics (accessed on 5 December 2024).
  45. Trakadas, P.; Masip-Bruin, X.; Facca, F.M.; Spantideas, S.T.; Giannopoulos, A.E.; Kapsalis, N.C.; Martins, R.; Bosani, E.; Ramon, J.; Prats, R.G.; et al. A Reference Architecture for Cloud–Edge Meta-Operating Systems Enabling Cross-Domain, Data-Intensive, ML-Assisted Applications: Architectural Overview and Key Concepts. Sensors 2022, 22, 9003. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The high-level COGNIFOG architecture.
Figure 2. A schematic representation of the COGNIFOG CI/CD conceptual workflow, including all steps required for successfully deploying an application on the operational environment.
Figure 3. COGNIFOG CI/CD operational workflow with Docker, with source code available. The intermediate workflow steps (steps a–g) are depicted in the figure and outlined in the main text.
Figure 4. COGNIFOG CI/CD operational workflow with Docker, with a Docker image available. The intermediate workflow steps (steps a–d) are depicted in the figure and outlined in the main text.
Figure 5. COGNIFOG CI/CD operational workflow with Kubernetes, with source code available. The intermediate workflow steps (steps a–g) are depicted in the figure and outlined in the main text.
Figure 6. COGNIFOG CI/CD operational workflow with Kubernetes, with a Docker image available. The intermediate workflow steps (steps a–e) are depicted in the figure and outlined in the main text.
Figure 7. Kubernetes cluster core elements and their interactions in the COGNIFOG infrastructure.
Figure 8. The workflow of a fog node failure scenario in which the network is reconfigured to migrate the workload to a different node.
Figure 9. The high-level architecture of the development and integration environment, including the CI/CD tools and the Kubernetes cluster.
Figure 10. Testing deployment view.
Figure 11. The emulated environment in the COGNIFOG QuickStart, with the management and working clusters and the implemented tools and their interactions.
Figure 12. A resource and energy monitoring system for cloud–edge deployments using Kubernetes and COGNIFOG supporting services.
Figure 13. Trial 1 architecture overview with the Kubernetes cluster deployment on the cloud and multiple edge nodes.
Figure 14. Trial 2 architecture overview with management and working Kubernetes clusters in the cloud, edge, and far edge layers, including supporting COGNIFOG tools.
Figure 15. Smart manufacturing case: current setup and deployment using Docker in the cloud and edge layers.
Figure 16. Smart manufacturing case: an example of K8s cluster deployment with COGNIFOG advancements.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
