#### 5.4.1. Description

This second branch covers scenarios in which an organization's AWS policy administration is not working properly and a user account retains unnecessary policies, e.g., after a user changes role or area within the company. This scenario, caused by a misconfiguration in the IAM module, becomes more critical when such a policy enables the user account to restore previous policy versions. The user may then cause an elevation of privileges that allows him/her to access services and data in an unauthorized way. As part of the security inspection that a cybersecurity team could execute over a business infrastructure, one may assume that an internal attacker, e.g., an employee or contractor, could be interested in checking whether his/her account allows the use of policies beyond those required for the role. Similarly, an external attacker could be interested in checking whether a previously compromised AWS account with limited permissions can be elevated.

Considering the previous scenario, the following SCE definitions aligned to the scientific method are posed:


For the **execution** of the second branch of the tree, ChaosXploit checks the policies assigned to the user account's profile defined for the experiment setup. If it identifies that the user account has the permission to restore previous versions of its policies, then it lists all the policy versions and searches for the one with elevated permissions to gain access to a privileged service, i.e., the AWS managed storage service (S3). This will achieve the goal of the attack tree: to extract or modify information. If the user does not have such a permission, ChaosXploit will start the execution of the third branch of the presented attack tree.
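The decision taken at the start of this branch can be sketched as follows. This is a minimal illustration and not ChaosXploit's actual code: the function names `allows` and `next_step`, the returned labels, and the two sample policy documents are hypothetical. The matching is deliberately simplified: it considers only `Allow` statements with wildcard action patterns and ignores `Deny`, `NotAction`, resources, and conditions. In a live experiment the policy documents would be retrieved from the IAM API rather than written inline.

```python
from fnmatch import fnmatchcase

# IAM action that lets a principal switch a policy's default version.
RESTORE_ACTION = "iam:SetDefaultPolicyVersion"


def _as_list(value):
    """IAM policy JSON accepts either a single item or a list of items."""
    return value if isinstance(value, list) else [value]


def allows(policy_document, action):
    """Return True if some Allow statement has an Action pattern matching `action`.

    Simplification: Deny statements, NotAction, Resource, and Condition are ignored.
    """
    for statement in _as_list(policy_document.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        for pattern in _as_list(statement.get("Action", [])):
            # IAM action names are case-insensitive and may contain '*' wildcards.
            if fnmatchcase(action.lower(), pattern.lower()):
                return True
    return False


def next_step(policy_documents):
    """Continue with branch 2 only if a restore permission is present."""
    if any(allows(doc, RESTORE_ACTION) for doc in policy_documents):
        return "continue-branch-2"
    return "start-branch-3"


# Hypothetical documents used only to exercise the sketch.
limited_iam = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Action": ["iam:ListPolicyVersions", "iam:SetDefaultPolicyVersion"],
        "Resource": "*",
    }],
}
read_only_s3 = {
    "Version": "2012-10-17",
    "Statement": {"Effect": "Allow", "Action": "s3:Get*", "Resource": "*"},
}

decision = next_step([read_only_s3, limited_iam])
```

With these sample documents the restore permission is found in `limited_iam`, so the sketch stays on branch 2; removing that document from the list would send it to branch 3.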

The upper part of Table 5 shows the two main variables that were monitored through the experiments of branch 2, i.e., *Attached-User-Policies* and *Current-Policy*. First, *Attached-User-Policies* is used at two moments of the branch execution: (i) at the beginning of branch 2, to identify all policies associated with a user account, and (ii) in the middle of branch 2, to identify the permission associated with the user account that allows for the restoration of previous policy versions, together with a previous policy that may be a suitable candidate to be restored, e.g., a policy that allows for the extraction and modification of information in the AWS S3 service. Second, *Current-Policy* represents the current version of the user's policy set, so this variable verifies whether the previous policy's restoration was successful.
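The role of *Current-Policy* as a verification variable can be illustrated with a short sketch. It assumes that the variable can be read as the identifier of the policy version currently flagged as default, and that version listings are shaped like the `Versions` entries returned by IAM's `ListPolicyVersions` operation (`VersionId`, `IsDefaultVersion`). The function names and the sample listings are hypothetical and are not taken from ChaosXploit.

```python
def default_version_id(policy_versions):
    """Return the VersionId of the entry flagged as the default version."""
    defaults = [v["VersionId"] for v in policy_versions if v.get("IsDefaultVersion")]
    if len(defaults) != 1:
        # A managed policy is expected to have exactly one default version.
        raise ValueError(f"expected one default version, found {len(defaults)}")
    return defaults[0]


def restoration_succeeded(before, after, target_version_id):
    """Compare the default version observed before and after the restore attempt."""
    was_different = default_version_id(before) != target_version_id
    is_target_now = default_version_id(after) == target_version_id
    return was_different and is_target_now


# Hypothetical observations taken before and after the restore action.
observed_before = [
    {"VersionId": "v1", "IsDefaultVersion": True},
    {"VersionId": "v2", "IsDefaultVersion": False},
]
observed_after = [
    {"VersionId": "v1", "IsDefaultVersion": False},
    {"VersionId": "v2", "IsDefaultVersion": True},
]

restored = restoration_succeeded(observed_before, observed_after, "v2")
```

Checking both observations, rather than only the final one, avoids reporting success when the target version was already the default before the experiment acted.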

**Table 5.** Monitored variables and input parameters considered along the execution of branch 2 by ChaosXploit.


On the other hand, the lower part of Table 5 shows the input parameters that ChaosXploit receives for the execution of this branch. ChaosXploit takes the name of the user account (*user account*) on which the security inspection must be performed and the name of the output file (*output*) in which to store the results.

#### 5.4.2. Results Analysis

Figure 5 shows the execution of ChaosXploit for branch 2, which includes (i) the setup of ChaosXploit (lines 1–5), (ii) the steady state validation, which assumes a correct configuration of the policies assigned to the user account under analysis (lines 6–10), and (iii) the execution of the actions that validate the hypothesis through an attempt to restore a previous policy (lines 11–20). This last set of lines includes listing the user policies (lines 13–14), validating the current version (line 15), identifying the version that allows the privilege escalation (line 16), restoring the desired policy (lines 17–18), and validating the restore (line 20).

**Figure 5.** Validation of the steady state and elevation of privileges achieved by ChaosXploit through branch 2.

Table 6 shows the details of each of the policy versions found by ChaosXploit for the user account under analysis. This table lists the policy versions, the effects on the actions (either allow or deny access), the actions that indicate what the user can or cannot do, the resources on which the actions may be applied, and additional conditions under which the policy has an effect. The current policy version (1) has limited actions related to the IAM service, but it still allows changing the policy through the action *SetDefaultPolicyVersion*. It is also possible to identify policy version 5, which includes some actions to manage the AWS S3 service. However, such actions would not allow reaching the attack goal because they do not allow modifying information. Finally, the version chosen by ChaosXploit to be restored (2) was the one that allows any action on any resource without any condition.
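The selection among the versions can be sketched as a filter over the retrieved policy documents. The three documents below are hypothetical stand-ins that only mirror the prose description of versions 1, 5, and 2 (Table 6 holds the real ones), and `GOAL_ACTIONS` is an assumed encoding of the attack goal that requires both reading and writing S3 objects. As a simplification, only `Allow` statements and wildcard action patterns are evaluated, and `Statement` is assumed to be a list.

```python
from fnmatch import fnmatchcase

# Assumed encoding of the goal: extract (read) and modify (write) S3 objects.
GOAL_ACTIONS = ("s3:GetObject", "s3:PutObject")


def statement_allows(statement, action):
    """True if an Allow statement has an Action pattern matching `action`."""
    if statement.get("Effect") != "Allow":
        return False
    patterns = statement.get("Action", [])
    if isinstance(patterns, str):
        patterns = [patterns]
    return any(fnmatchcase(action.lower(), p.lower()) for p in patterns)


def pick_version(versions, goal_actions=GOAL_ACTIONS):
    """Return the first version id whose statements cover every goal action, else None."""
    for version_id, document in versions.items():
        statements = document["Statement"]
        if all(any(statement_allows(s, goal) for s in statements) for goal in goal_actions):
            return version_id
    return None


# Hypothetical policy documents, keyed by version id.
versions = {
    "v1": {"Statement": [{
        "Effect": "Allow",
        "Action": ["iam:Get*", "iam:List*", "iam:SetDefaultPolicyVersion"],
        "Resource": "*",
    }]},
    "v5": {"Statement": [{
        "Effect": "Allow",
        "Action": ["s3:Get*", "s3:List*"],
        "Resource": "*",
    }]},
    "v2": {"Statement": [{"Effect": "Allow", "Action": "*", "Resource": "*"}]},
}

candidate = pick_version(versions)
```

Under these assumptions the read-only S3 version is rejected because it does not cover `s3:PutObject`, and the unrestricted version is the only one that satisfies the goal.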

**Table 6.** Policy versions found by ChaosXploit through branch 2.


Once the previous policy is restored, as shown in Figure 5, ChaosXploit initiates the actions shown in Figure 6. Among the first actions, ChaosXploit establishes the connection to the target and defines the *collect* mode to inspect the files in the bucket and the *write* mode to write a new file (lines 1–4). Additionally, ChaosXploit creates new files in the S3 bucket, as this experiment was executed in its own controlled environment (lines 5–6). The validation of the steady state at lines 8–10 failed in this case, as the policy settings can be manipulated and used to alter the information.


**Figure 6.** Attack goal (extract or modify information) achieved by ChaosXploit through branch 2.

In the experiments executed along branch 1 (Section 5.3) and branch 2 (Section 5.4), the attack goal was achieved, so the experiments ended in a **critical** state similar to the one seen in line 11 of Figure 6. Table 7 shows the summary of the results for this second experiment, considering the differences between traditional pentesting and SCE presented in Section 3.3. In this case, we highlight the ChaosXploit capabilities to develop this kind of experiment that exploits the AWS authorization module. Additionally, we limit the scope of the experiment to users belonging to the same IAM account. Finally, as the experiment ended in a critical state, we report a vulnerability associated with privilege escalation, which allows a user to pass from few to many permissions, putting the confidentiality and integrity of the information available in the different AWS services at risk.

**Table 7.** Results of ChaosXploit's execution of branch 2 in terms of the differences between traditional pentesting and SCE.


#### **6. Toward an Adoption of SCE in Industry**

With the growing adoption of CE, many companies have included it as a discipline for improving reliability. According to InfoQ [41], the appropriation of CE practices to inject failures and generate resilience has evolved to the "Early Majority stage", which means that its adoption is about one-third of the overall population. Gremlin, Litmus, and Steadybit are some key CE initiatives that have contributed to this achievement.

The stories of the adoption of CE reported by companies such as Capital One, LinkedIn, Google, and Microsoft [34] are examples of its wide acceptance. The appropriation of CE as a common discipline to inject failures and generate resilience provides arguments to justify the success of this discipline in both industry and academia.

Not only have infrastructure failures attracted the attention of practitioners, but data breaches and security incidents have also risen in recent years [42]. Failures to implement basic configurations and appropriate security controls are among the causes that contribute to these security incidents. Undoubtedly, organizations are being asked to produce with extremely high throughput and with very few resources to maintain the security status quo. All the while, there is a widening gap between how we design and build distributed systems and how we approach security engineering.

In this sense, SCE serves as a foundation for developing a learning culture around how organizations build, operate, instrument, and secure their systems. The goal of these experiments is to move security in practice from subjective assessment into objective measurements. As they do in CE, Security Chaos experiments allow security teams to reduce the "unknown unknowns" and replace "known unknowns" with information that can drive improvements to security posture. The promise in terms of adoption and sophistication is immense.

Even though introducing false positives into production networks and other infrastructures in the context of CE is a common practice nowadays, SCE is still seen more as an academic research topic than as an industry practice. Nevertheless, in recent years, SCE has started to become known in the industry. One example is the Thoughtworks report [43], which documented an evolution of this technique, moving SCE from the "Assess" phase to the "Trial" phase. This means that SCE could eventually be used in a controlled way to validate that the security policies in place are robust enough to handle common security failure modes.

Another remarkable example of the application of SCE in the industry was documented by Jamie Dicken [44]. She wrote about her SCE journey at Cardinal Health, a global Fortune 20 healthcare manufacturer and distributor of medical and laboratory products and a provider of performance and data solutions for healthcare facilities. Cardinal Health needed an applied security model to protect critical infrastructure and data as it was moving to the cloud, and SCE became the most appropriate answer. Cardinal Health created a process named Continuous Verification and Validation (CVV) that, by using SCE, allowed them to continuously verify that security controls were working correctly and as expected.

Adopting SCE first requires a solid understanding of the principles of chaos. For example, insufficient observability of the chaos experiments would prevent drawing reliable conclusions about a hypothesis. After understanding the fundamentals, the next step is to develop competency and confidence in the methods and tools needed to perform SCE experiments. For this, a new SCE practitioner may decide to start by designing small, manual experiments. If the hypothesis is not disproved, the experiment can then be automated. Here, ChaosXploit may play a key role as one of the few SCE platforms available today that may enable the industry to design and execute experiments aimed at the automatic and controlled exploitation of vulnerabilities and the validation of system security. Security validations can also be achieved progressively through security chaos game days that allow players to advance along this path without causing a security incident in production.

In addition, diverse teams should know and try SCE, since it is no longer a concept limited to Security Engineers or security teams. We believe that if SCE begins as an engineering practice, it could be quickly adopted by other roles (Cloud Engineers, Software Engineers, Site Reliability Engineers) and teams (platform, infrastructure, operations, and application development), as it would allow them to improve the reliability of their applications through proactive testing of their own security.

#### **7. Conclusions and Future Work**

The digital revolution, or digital transformation, as it has been called in recent years, has proven to be an incredible driving factor in our society. Thanks to this revolution, our society was able to handle some of the most serious restrictions that the recent pandemic put on different essential services, e.g., the use of highly interactive e-health services in response to the restrictions regarding in-person medical consultations, exploitation of e-learning platforms to face the limitations in the physical access to formal educational services, enabling e-payments as an alternative to the use of traditional financial services, among many others.

On the downside, such a change also implies the existence of ill-motivated entities that constantly try to attack connected systems to damage the confidentiality, integrity, or availability of the provided online services. Such threat entities use increasingly advanced techniques, for example, based on malware campaigns [45] or threats addressed to a specific technology [46].

Over the last few years, a novel paradigm has emerged, the so-called Chaos Engineering (CE), whose main objective is to test the resiliency of distributed and complex systems through continuous observation and experimentation. More recently, the paradigm has evolved to embrace the entire cybersecurity ecosystem, i.e., Security Chaos Engineering (SCE) comes into play to defend the system assets against cyberattacks through continuous and rigorous experimentation on possible security holes and the corresponding mitigations.

In this paper, we proposed ChaosXploit, an SCE-powered framework that can conduct SCE experiments on different target architectures. Based on the hypothesis generated by the knowledge database and the attack representations, ChaosXploit executes SCE experiments over a target to find a potential security problem as an ultimate goal. In addition, ChaosXploit features an observer that is in charge of verifying the change between the steady state of a certain hypothesis and the current state of the system. To prove the capabilities of ChaosXploit, a set of experiments was conducted on several AWS S3 buckets, evaluating their security characteristics with SCE. The results demonstrated that our approach could be successful, highlighting several unprotected buckets for a specific attack path. To foster its adoption, ChaosXploit was made publicly available for the cybersecurity community through the repository of the project [39].

Future work will explore the possibility of widening the ChaosXploit framework target architectures to include other use cases, systems, or providers. That is, the extension of the Attack Trees knowledge base is considered mandatory to include a number of different application scenarios, which can also lead to potential improvements of ChaosXploit. Particularly, one could easily argue that using a standardized attack modeling methodology (e.g., MITRE ATT&CK [47]) would be beneficial for the proposed SCE framework, even if some adjustments are needed to achieve full compliance. Besides, integrating a recommendation module to suggest countermeasures once a security flaw is discovered is worth investigating. In this sense, several attack models have been proposed in the literature so far, and some of them already integrate the Attack Trees representation adding countermeasures (e.g., Attack Countermeasures Trees [48], Attack Response Trees [49], etc.). Thus, ChaosXploit may incorporate those representations in the knowledge base and select the optimal reaction to fire against the threat based on specific criteria [50]. Moreover, the performance of ChaosXploit should be further evaluated to prove its usefulness in performance-demanding or critical scenarios. Specifically, the assessment of the response time and resource consumption is essential to argue the applicability of the presented framework in scenarios where the threat discovery procedure must be executed in real time or with limited computation capabilities.

**Author Contributions:** Conceptualization, S.P.C., P.N. and D.D.-L.; methodology, S.P.C., P.N. and D.D.-L.; software, S.P.C.; validation, P.N. and D.D.-L.; formal analysis, S.P.C., P.N. and D.D.-L.; investigation, S.P.C. and P.N.; resources, D.D.-L.; data curation, S.P.C.; writing—original draft preparation, S.P.C., P.N., D.D.-L. and Y.N.R.; writing—review and editing, P.N. and D.D.-L.; visualization, S.P.C.; supervision, P.N. and D.D.-L.; project administration, D.D.-L.; funding acquisition, D.D.-L. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work has been supported by Universidad del Rosario (Bogotá) through the project "IV-TFA043—Developing Cyber Intelligence Capacities for the Prevention of Crime" and through "Becas para Estancias de Docencia e Investigación. Universidad del Rosario".

**Institutional Review Board Statement:** Not applicable.

**Informed Consent Statement:** Not applicable.

**Data Availability Statement:** Not applicable.

**Conflicts of Interest:** The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

#### **Abbreviations**

The following abbreviations are used in this manuscript:

