**1. Introduction**

Securing distributed collaborative multi-agent agent systems is an extremely complex task. Since attackers are not obliged to follow the protocols defined by the system developers, they may create diverse adverse effects with simple manipulations applied to non-adversary-resilient protocols. Unfortunately, it is extremely difficult to secure a multiagent system if it was not designed with security in mind. A good example of a design decision that may affect the overall security of a system is the choice of the leader-election algorithm [1]. In this article, we explore the consequences of the insecure leader election algorithm used in Docker Swarm.

As Docker gained popularity among cloud service providers, attackers began to develop various techniques to attack Docker-based applications. Although a great deal of attention was paid to securing Docker hosts from application level exploits and container escape few solutions exist for securing against privilege escalation among different hosts in a Docker cluster. In this work, we show how an attacker with access to a manager host inside a Docker cluster can escalate their privileges in the cluster. The research scope is presented in Figure 1.

For example, Raft, a consensus algorithm used to manage a replicated log [2], is used in Docker Swarm to synchronize the cluster's state between all managers of the cluster. See Section 2.2 for details. The logs are replicated using a strong leader, which is elected in the leader election phase in the algorithm. In case of a leader failure (a crash, network issues, etc.), the rest of the managers choose a new leader using the Raft algorithm. Despite its many advantages, Raft is a non-Byzantine algorithm that can allow a malicious insider to become a leader.

**Citation:** Farshteindiker, A.; Puzis, R. Leadership Hijacking in Docker Swarm and Its Consequences. *Entropy* **2021**, *23*, 914. https://doi.org/ 10.3390/e23070914

Academic Editors: Sotiris Kotsiantis and Alberto Guillén

Received: 24 May 2021 Accepted: 5 July 2021 Published: 19 July 2021

**Publisher's Note:** MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

**Copyright:** © 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https:// creativecommons.org/licenses/by/ 4.0/).

**Figure 1.** High-level description of the end-to-end scenario.

In this paper, we highlight a new privilege escalation technique called *leadership hijacking* (see Section 4.2). An attacker with access to a manager node in Docker Swarm can use this technique, which abuses the aforementioned fact that Raft is a non-Byzantine algorithm, to escalate their cluster privileges and become the cluster leader. By doing so, the attacker can control all messages and decisions within the cluster.

In addition, we demonstrate two possible malicious payloads expected to be executed by a typical attacker: a *lateral movement* payload and a *defense evasion* payload. The former utilizes cluster leader privileges and allows the attacker to execute code on every host in the cluster.

The latter is used by an attacker in order to hide their malicious activity from infrastructure management tools.

The rest of this paper is structured as follows: Section 2 reviews the technical background. In Section 4.2, we introduce the novel privilege escalation technique, called leadership hijacking. Next, in Section 4.3 we investigate malicious payloads that can be executed after the privilege escalation. In Section 5, we demonstrate an end-to-end attack scenario that illustrates the potential security risk and the impact of the investigated attack. Finally, in Section 6, we discuss possible mitigation and propose countermeasures. Our final remarks can be found in Section 7.

#### **2. Background**

#### *2.1. Docker Swarm*

An increasing number of organizations are moving their digital systems to the cloud. The benefits of cloud servers are easy deployment, high availability, continuous maintenance, system security, and more. From online websites to internal servers and databases, cloud servers store a lot of sensitive information, making them an attractive target for attackers. As the cost of hardware has decreased, software has become the main performance bottleneck. In order to fully utilize the available hardware, cloud service providers use virtualization technology to run different applications on the same hardware.

Until recently, the most advanced solution was virtual machine (VM) technology.

VM technology allows one physical server to run many different virtual servers, all of them running different operating systems.

From a security point of view, a VM is a good solution, since breaking out of a VM is a relatively complex task [3].

On the other hand, VMs suffer from significant performance overhead [4]. The main reason for the reduced performance is the overhead added by the hypervisor to each hardware operation emulated to the VM.

Today, many cloud service providers use operating system level virtualization, which employs isolated user space instances called containers. In contrast to a VM, which includes its own operating system, containers run under the host's operating system and communicate with it directly. During the runtime, a container communicates through a regular system call interface with the host OS, without any intermediate software. The architectural difference is illustrated in Figure 2.

**Figure 2.** Container vs. VM architecture [5].

At the time of this writing, Docker is one of the leading OS virtualization solutions (https://resources.flexera.com/web/media/documents/rightscale-2019-state-ofthe-cloud-report-from-flexera.pdf (accessed on 8 July 2021)). Docker is implemented in the Go programming language and enables the creation, deployment, and management of containers on a host computer. A Docker container is a lightweight software unit that bundles its own tools and libraries. Typically, one container includes one instance of an application or service, e.g., a Web server, database, or scientific software package.

Docker is a rich ecosystem. One of the main components of this ecosystem is the Docker daemon. The Docker daemon is software that runs on the host and is responsible for the creation of images and containers. The Docker daemon can run containers and create their runtime environment; it can also create a container's networking interfaces, mount points, can trigger actions, and execute commands inside a running container. The Docker daemon implements Docker's main logic and many of its features.

When deploying an application in a production environment, it is important to ensure that when a container fails, a new container will start and replace the faulty container. In addition, it is highly recommended to run several instances of a container for high availability and load balancing. To address these issues, Docker introduced a feature called swarm.

Docker Swarm abstracts many Docker hosts to one virtual Docker host. Each host that participates in the swarm cluster is called a *node*. Each node can have two roles: *manager* or *worker*. A manager's job is to keep a replicated state of the cluster. One manager node is also a *leader*. The cluster's leader is responsible for scheduling new containers and services for the cluster. A worker's job is to get container tasks from the leader and to actually run the container. The weakest point in the design of Docker Swarm exploited in this research is the Raft leader election algorithm.
