#### *5.3. Container Exploitation*

First, the attacker needs an initial foothold in the cluster. The attacker has network access to an application that runs in a container in the cluster and obtains this foothold by exploiting a vulnerability in that application.

In this scenario, the application running inside the container is the Apache Tomcat Web server, version 8.5.19. The attacker finds a one-day exploit for this Web server in the Metasploit Framework; once the exploit succeeds, the attacker has shell access to the application's container.

#### *5.4. Container Escape*

After successfully exploiting the application, the attacker has a shell inside the restricted Docker environment. To execute our privilege escalation technique, the attacker must escape from this restricted environment and obtain a shell on the container's underlying host.

The attacker then exploits CVE-2019-5736, a vulnerability in the host's RunC component (https://www.cvedetails.com/cve/CVE-2019-5736/ (accessed on 8 July 2021)). RunC is a container runtime, originally developed as part of Docker, that is responsible for creating and running new container environments.

The vulnerability affects RunC versions up to and including 1.0-rc6 (as used by Docker versions before 18.09.2) and allows the attacker to overwrite the host's RunC binary and, thus, achieve code execution with root privileges on the host.
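From the defender's side, hosts can at least be triaged against the version threshold above. The following is a minimal Go sketch, assuming the Docker Engine version string has already been collected (for example, with `docker version --format '{{.Server.Version}}'`); the helper names are hypothetical. It compares only against 18.09.2, so it may flag engines whose RunC was patched separately or through a backported fix; a positive result is a reason to inspect the host, not proof that it is vulnerable.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion extracts the numeric major, minor, and patch components from a
// Docker version string such as "18.09.1" or "18.06.1-ce". Any suffix after
// the first '-' or '+' is ignored.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	if i := strings.IndexAny(v, "-+"); i >= 0 {
		v = v[:i]
	}
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version format %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, fmt.Errorf("bad version component %q: %w", p, err)
		}
		out[i] = n
	}
	return out, nil
}

// dockerBundlesVulnerableRunc (hypothetical helper) reports whether a Docker
// Engine version is older than 18.09.2, the threshold given for CVE-2019-5736.
// It does not detect separately patched or backported RunC builds.
func dockerBundlesVulnerableRunc(version string) (bool, error) {
	v, err := parseVersion(version)
	if err != nil {
		return false, err
	}
	fixed := [3]int{18, 9, 2}
	for i := 0; i < 3; i++ {
		if v[i] != fixed[i] {
			return v[i] < fixed[i], nil
		}
	}
	return false, nil // exactly 18.09.2
}

func main() {
	for _, v := range []string{"18.06.1-ce", "18.09.1", "18.09.2", "19.03.5"} {
		vuln, err := dockerBundlesVulnerableRunc(v)
		fmt.Println(v, vuln, err)
	}
}
```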

#### *5.5. Cloud Privilege Escalation*

Once the attacker has achieved code execution on a Docker Swarm manager host, they can execute the leadership hijacking technique and escalate their privileges to become the cluster leader (see Section 4.2 for a description of the leadership hijacking technique).

Once the leadership hijacking technique succeeds, the attacker holds leader privileges in the cluster and can therefore control all messages that flow between the leader and the other hosts in the cluster.

The result of the technique's successful execution is shown in Figure 5: before the attack, UBUNTU-HOST3 was the cluster leader, and after the technique was executed, UBUNTU-HOST1 (the attacker's manager) obtained the leadership role in the cluster.


**Figure 5.** Successful attack attempt.

#### *5.6. Lateral Movement and Defense Evasion*

Armed with leader privileges, the attacker can now control all messages that flow between the leader and other hosts in the cluster. As described in Sections 4.3.1 and 4.3.2, the attacker can execute a malicious container on each host in the cluster and hide these actions from various management tools.

To demonstrate the attack and its potential impact, in our scenario the attacker deploys a WebShell service, which runs a WebShell container on every host in the cluster.

The malicious WebShell container provides a root-privileged command execution environment on the underlying host. The host's file system is mounted in the container's /tmp directory, which allows the attacker to view, modify, and delete the host's files. Effectively, the attacker runs a root WebShell on every host in the cluster.

Figure 6 shows the output of the WebShell and confirms that the WebShell runs with high privileges (root).


**Figure 6.** The output of the malicious WebShell.

The attacker then uses the defense evasion functionality described in Section 4.3.2, hooking the function in the leader's Docker daemon that is responsible for listing services and the containers of each service. As a result, the attacker monitors every service listing request made to the cluster leader. Whenever the attacker's malicious service is running, the attacker spoofs the listing response and disguises the malicious service's image as a benign Alpine image.

As seen in Figure 7, the `docker service ls` command reveals a single running service with the image "alpine:latest". It also appears that the service has no listening ports; in reality, however, a container on each host is listening on port 80.

Furthermore, the attacker also hooks the function responsible for listing the containers of each service; thus, the output of `docker service ps $(docker service ls -q)` does not reveal the image that each container is actually running. According to Docker's default tools, the running service appears to be a benign Alpine service, but accessing port 80 on any host reveals the service's true "face".


**Figure 7.** Docker's default tools used for viewing information about malicious services.
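The spoofing shown above works because defenders trust the leader's answer. As a hedged illustration of the detection direction suggested in the Conclusions, the Go sketch below cross-checks the image that the manager reports for each service against the images that each node's local Docker daemon reports for the containers it actually runs. The `Observation` type and `Mismatches` function are hypothetical, collecting the observations is out of scope, and the approach assumes the nodes' local daemons have not themselves been tampered with.

```go
package main

import (
	"fmt"
	"sort"
)

// Observation is what one node's local daemon reports for a running task.
// The field names are illustrative, not a Docker API type.
type Observation struct {
	Node    string
	Service string
	Image   string
}

// Mismatches returns a sorted list describing every task whose locally
// observed image differs from the image the manager reports for its service,
// or whose service the manager does not report at all.
func Mismatches(reported map[string]string, observed []Observation) []string {
	var out []string
	for _, o := range observed {
		want, ok := reported[o.Service]
		switch {
		case !ok:
			out = append(out, fmt.Sprintf("%s: service %q not reported by manager", o.Node, o.Service))
		case want != o.Image:
			out = append(out, fmt.Sprintf("%s: service %q runs %q, manager reports %q", o.Node, o.Service, o.Image, want))
		}
	}
	sort.Strings(out)
	return out
}

func main() {
	// Hypothetical data: the manager claims "alpine:latest", nodes disagree.
	reported := map[string]string{"web": "alpine:latest"}
	observed := []Observation{
		{Node: "UBUNTU-HOST1", Service: "web", Image: "example/webshell:1.0"},
		{Node: "UBUNTU-HOST2", Service: "web", Image: "example/webshell:1.0"},
	}
	for _, m := range Mismatches(reported, observed) {
		fmt.Println(m)
	}
}
```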

#### **6. Discussion**

The main advantage of our technique is that, unlike many techniques seen in the wild, it does not exploit any software bug. A software bug is usually a mistake in a program's code that can lead to unexpected behavior of the program, and in most cases, such bugs are easily fixed. Our technique, by contrast, exploits a design flaw rather than a programming error. Unlike programming errors, design flaws are much harder to fix, since, in many scenarios, a large amount of code must be changed, which can be costly and time-consuming for software developers.

As shown in Section 4.2, our technique exploits the fact that Raft, the algorithm used to replicate logs in the Docker Swarm environment, is a non-adversarial algorithm: it tolerates participants that crash but assumes that no participant behaves maliciously. Raft is a key component of Docker Swarm's management infrastructure and is integrated into its core logic. Replacing Raft in Docker Swarm is a mandatory step in mitigating our proposed technique, since exploits for escaping from a container to its host (as shown in Section 5) are very common and relatively easy to find. Because the weakness lies in the design, replacing Raft requires a significant amount of work.
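To make the non-adversarial assumption concrete, the Go sketch below restates Raft's published vote-granting rule for a single follower. It is not Docker Swarm's code, and the type names are illustrative. Every check compares the candidate's self-reported term and log position with the follower's own state; nothing in the rule checks whether those values are truthful, which is acceptable for crash faults but not for a participant controlled by an attacker.

```go
package main

import "fmt"

// RequestVote mirrors the arguments of Raft's RequestVote RPC.
type RequestVote struct {
	Term         uint64 // candidate's term
	CandidateID  string
	LastLogIndex uint64 // index of the candidate's last log entry
	LastLogTerm  uint64 // term of the candidate's last log entry
}

// Follower holds the state a Raft node consults when deciding on a vote.
type Follower struct {
	CurrentTerm  uint64
	VotedFor     string // "" means no vote cast in CurrentTerm
	LastLogIndex uint64
	LastLogTerm  uint64
}

// HandleRequestVote applies Raft's vote-granting rule. All inputs come from
// the candidate and are taken at face value.
func (f *Follower) HandleRequestVote(req RequestVote) bool {
	if req.Term < f.CurrentTerm {
		return false // stale candidate
	}
	if req.Term > f.CurrentTerm {
		f.CurrentTerm = req.Term // adopt the higher term; no vote cast in it yet
		f.VotedFor = ""
	}
	if f.VotedFor != "" && f.VotedFor != req.CandidateID {
		return false // already voted for another candidate in this term
	}
	// Election restriction: the candidate's log must be at least as up to date.
	upToDate := req.LastLogTerm > f.LastLogTerm ||
		(req.LastLogTerm == f.LastLogTerm && req.LastLogIndex >= f.LastLogIndex)
	if !upToDate {
		return false
	}
	f.VotedFor = req.CandidateID
	return true
}

func main() {
	f := &Follower{CurrentTerm: 5, LastLogIndex: 100, LastLogTerm: 5}
	fmt.Println(f.HandleRequestVote(RequestVote{Term: 4, CandidateID: "a", LastLogIndex: 200, LastLogTerm: 4})) // stale term
	fmt.Println(f.HandleRequestVote(RequestVote{Term: 6, CandidateID: "b", LastLogIndex: 100, LastLogTerm: 5})) // granted
	fmt.Println(f.HandleRequestVote(RequestVote{Term: 6, CandidateID: "c", LastLogIndex: 500, LastLogTerm: 6})) // already voted
}
```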

First, Docker's developers should choose and implement a Byzantine fault-tolerant (BFT) algorithm [9,11] in Go, or find such an implementation as a Go package. The implementation must be of high quality, since it will be deployed to every manager in the cluster. Next, the developers should modify Docker Swarm's source code. In Docker Swarm, the Raft implementation is encapsulated in a wrapper object, so the developers would have to change the entire wrapper object to encapsulate the new package instead of Raft.
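The wrapper-object refactoring described above can be sketched as programming against an interface rather than a concrete Raft type. In the Go sketch below, `Consensus`, `Register`, `New`, and `Store` are hypothetical names, not Docker Swarm's API, and the toy single-node backend exists only to make the example runnable. A real BFT backend would register itself under its own name, and the same registry would let a configuration value, rather than a source code change, select the backend.

```go
package main

import (
	"errors"
	"fmt"
)

// Consensus is a hypothetical interface that a Swarm-style store could depend
// on instead of a concrete Raft type. Any backend (Raft, a BFT protocol, ...)
// that satisfies it could be swapped in without touching the callers.
type Consensus interface {
	Name() string
	Propose(entry []byte) error // replicate an entry to the cluster
	Leader() string             // node ID of the current leader
}

// registry maps a backend name (e.g., read from configuration) to a constructor.
var registry = map[string]func(nodeID string) Consensus{}

// Register makes a backend selectable by name.
func Register(name string, ctor func(nodeID string) Consensus) { registry[name] = ctor }

// New instantiates the backend selected by name.
func New(name, nodeID string) (Consensus, error) {
	ctor, ok := registry[name]
	if !ok {
		return nil, fmt.Errorf("unknown consensus backend %q", name)
	}
	return ctor(nodeID), nil
}

// singleNode is a toy backend for illustration only: it appends entries to a
// local slice and always considers itself the leader.
type singleNode struct {
	id  string
	log [][]byte
}

func (s *singleNode) Name() string   { return "single" }
func (s *singleNode) Leader() string { return s.id }
func (s *singleNode) Propose(entry []byte) error {
	if len(entry) == 0 {
		return errors.New("empty entry")
	}
	s.log = append(s.log, entry)
	return nil
}

func init() {
	Register("single", func(id string) Consensus { return &singleNode{id: id} })
}

// Store plays the role of the wrapper: it depends only on the interface.
type Store struct{ c Consensus }

// CreateService replicates a (simplified) service spec through the backend.
func (st *Store) CreateService(spec string) error { return st.c.Propose([]byte(spec)) }

func main() {
	c, err := New("single", "manager-1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	st := &Store{c: c}
	fmt.Println(st.CreateService("web"), c.Name(), c.Leader())
}
```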

Then, a series of tests should be run to ensure that the new package meets Docker's efficiency requirements, both locally and on the network: the new package should not consume a significant amount of the host's resources and should be efficient in terms of network traffic between hosts in the Docker Swarm. Moreover, the tests should ensure that the new package works as expected on every operating system supported by Docker Swarm. Since managers are the most valuable servers in the cluster, any bug in a manager can be fatal, so the tests should ensure, as far as possible, that the new package is bug-free and has no unwanted side effects. In any case, replacing the Raft implementation carries major risk and may cause service degradation.

There are some best practices that may block our attack; the most common is to separate the manager nodes from the worker nodes. In that case, even if the attacker compromises a worker node, they will not be able to escalate their privileges in the way described in this article, since the attacker's node is not part of the managers group. However, although this separation is considered a best practice, it is not the default behavior of Docker Swarm. We believe that Docker's developers chose to make each manager node a worker too by default in order not to waste expensive computing power: if a node acts only as a manager, it does not receive client containers to execute, and the cluster's computing capacity therefore decreases. In this article, we chose to research and exploit systems in their default state rather than delve into best practices.

We offer two strategies for effectively mitigating our technique. In the short term, the technique can be mitigated by detecting and blocking container escape exploits. As discussed in Section 4.2, the leadership hijacking technique must be executed from a manager host, and we showed in Section 5 that an attacker can gain such access using a container escape exploit. If the container escape exploit fails, the attacker cannot launch the technique and, therefore, cannot escalate their privileges in the cluster. To reduce the number of container escape exploits, Docker could start a bug bounty program. We believe that this would help Docker patch container escape vulnerabilities before real attackers exploit them in the wild.

In the long term, we propose replacing the Raft algorithm with a Byzantine fault-tolerant algorithm [33,34]. As discussed earlier, Raft is a non-adversarial algorithm; hence, an attacker who controls a Raft participant can forge and spoof messages. In this way, the attacker can trick the other participants into voting for them in the leader election phase and become the cluster's leader. If a BFT algorithm is used, the other participants would not vote for the attacker, since the algorithm tolerates Byzantine participants as long as fewer than one third of them are compromised; the attacker would therefore not be able to escalate their privileges to cluster leader. Furthermore, to support future changes, Docker's developers should decouple Docker's infrastructure from the leader election algorithm. The architecture of Docker Swarm should be "plug and play", such that the leader election algorithm is chosen as a configuration option instead of through a source code modification.
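The cost of tolerating Byzantine managers can be made concrete with the standard bounds: a crash-fault-tolerant protocol such as Raft needs n ≥ 2f + 1 nodes to survive f crashes, whereas BFT protocols need n ≥ 3f + 1 nodes to tolerate f Byzantine nodes, with quorums large enough that any two of them overlap in at least f + 1 nodes. The Go helpers below (hypothetical names) compute these values for a given number of managers. One consequence is that a three-manager swarm cannot tolerate even a single compromised manager under a BFT protocol; at least four managers would be needed.

```go
package main

import "fmt"

// RaftTolerated returns the number of crashed nodes a majority-quorum
// protocol survives with n nodes (n >= 2f+1). n is assumed positive.
func RaftTolerated(n int) int { return (n - 1) / 2 }

// RaftQuorum returns the majority quorum size for n nodes.
func RaftQuorum(n int) int { return n/2 + 1 }

// BFTTolerated returns the largest f with n >= 3f+1. n is assumed positive.
func BFTTolerated(n int) int { return (n - 1) / 3 }

// BFTQuorum returns ceil((n+f+1)/2): the smallest quorum size such that any
// two quorums intersect in at least f+1 nodes (so in at least one honest
// node). For n = 3f+1 this equals 2f+1.
func BFTQuorum(n int) int {
	f := BFTTolerated(n)
	return (n + f + 2) / 2
}

func main() {
	for _, n := range []int{3, 4, 5, 7} {
		fmt.Printf("managers=%d  raft: crashes=%d quorum=%d  bft: byzantine=%d quorum=%d\n",
			n, RaftTolerated(n), RaftQuorum(n), BFTTolerated(n), BFTQuorum(n))
	}
}
```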

#### **7. Conclusions**

In this work, we presented a new attack vector on the Docker Swarm orchestrator. Our technique demonstrates a new concept in offensive security, in which a cluster is treated as a single processing unit and an attacker is able to escalate their privileges in that unit and, thereafter, perform malicious activity on every component of that unit separately (i.e., every host in the cluster).

We presented a novel technique that, when combined with our proposed payloads, allows an attacker to gain full control over the Docker Swarm cluster. Since our technique and payloads do not exploit a software bug but rather exploit a design weakness, developers should take them into account during the design of their multi-agent systems. Future research should, on the one hand, explore additional ways in which attackers can obtain leader privileges in other cloud environments, e.g., Kubernetes, and, on the other hand, develop methods to detect misbehaving managers, for example, using anomaly detection techniques.

**Author Contributions:** Conceptualization, A.F.; Software, A.F.; Validation, A.F.; Formal analysis, A.F.; Investigation, A.F.; Methodology, R.P.; Supervision, R.P.; Writing—Original Draft, A.F. and R.P.; Writing—Review & Editing, A.F. and R.P. All authors have read and agreed to the published version of the manuscript.

**Funding:** This research was partially supported by the Cyber Security Research Center at Ben-Gurion University of the Negev.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest.

#### **References**

