1. Introduction
Traditional digital forensic techniques and tools are becoming obsolete as technology evolves. Proactive digital forensics is an emerging trend, because most investigations will no longer require analysts to be physically present on the premises to obtain digital evidence. Procedurally, digital forensics has always been performed as a post-mortem analysis [1]. Recently, however, organisations have had to shift their focus from a reactive to a proactive strategy. Technologically enhanced businesses need to recognise the value of detecting and analysing potential evidence before incidents occur, preferably while monitoring endpoints or network traffic. This allows evidence for criminal proceedings to be extracted in a sound and timely manner while also demonstrating compliance. Digital forensics aims to obtain sound electronic evidence that is admissible in court. Comprehensive Digital Evidence (CDE) is defined as electronic evidence containing the relevant facts necessary to establish the cause of a case, thereby linking it to the perpetrator and leading to a successful prosecution [2].
The adversary emulation concept originates from MITRE and its ATT&CK framework. ATT&CK stands for “Adversarial Tactics, Techniques, and Common Knowledge”. The framework serves as a knowledge base for tracing the methods threat actors use at each stage of a cyberattack, from intrusion through to data exfiltration [3]. During adversary emulation, traces of evidence are created, enabling an investigator to identify, examine, and collect them and to report their findings. Adversary emulation is an offensive security engagement that simulates known threats by leveraging threat intelligence to reproduce attackers’ actions and behaviours. It differs from penetration testing in that it draws on threat intelligence in addition to exploiting vulnerabilities or weaknesses in a system.
This research paper analyses proactive digital forensic investigations as a solution to current problems arising from cloud computing and/or networked systems, such as the identification, collection, preservation, analysis, and timelining of evidence drawn from vast, cumulative, and heterogeneous datasets. With the increasing complexity of cloud infrastructures and distributed systems, operations must continue in the event of an incident. This study aims to analyse proactive digital forensic investigations by leveraging adversary emulation in a virtualised environment. Accordingly, the research objectives are to enable forensic-driven endpoint monitoring, to perform adversary emulation, to automate the identification and collection of digital forensic artefacts, and, lastly, to cross-examine evidence obtained from network and endpoint devices. The desired results were achieved by combining the functionalities of security operations and an endpoint detection and response (EDR/digital forensics) system. This paper is organised as follows: Section 2 details the different types of forensic approaches and related work; Section 3 describes the deployment model for proactive digital forensics; Section 4 presents the methods and materials used; Sections 5 and 6 detail the results and their interpretation; Section 7 presents future work; and Section 8 presents the conclusions.
2. Literature Review
Digital forensics is described as a branch of forensic science that focuses on the investigation and examination of artefacts collected from electronic devices [4]. The National Institute of Standards and Technology (NIST) defines it as “the application of science to the identification, collection, examination, and analysis of data while preserving the integrity of the information and maintaining a strict chain of custody for the data” [5]. Proactive forensics may be described as the design and configuration of systems to make the organisation responsive to future digital investigations [6]. The concept put forward by the authors of [6] centres on the ability to analyse logs, asserting that proactive forensics is long-term and involves configuring alerts and system properties, as opposed to ad hoc intrusion detection. There are three types of digital forensics: proactive, active, and reactive [2]. Reactive forensics is defined as the examination of computer devices after an incident has occurred and is also called dead forensics [7]. Should an incident occur, proactive forensics will already have put in place the processes, methods, techniques, and tools for conducting the investigation, thus cutting costs, reducing the impact of the incident, and improving investigative efficiency [2,8,9]. Active digital forensics, by contrast, relies on intrusion detection to secure relevant, court-ready electronic evidence when an investigation is deemed necessary [10]. Remote forensic investigations are an essential tool for implementing active digital forensics, and they usually require the analyst to use programs that were already present on the device being analysed during the timeframe of the incident [2].
In [7], the authors performed a systematic literature review and found the multi-component view too broad, making it inefficient to implement within automated solutions. Thus, only the proactive and reactive components were retained, and the active component was removed from the resulting model. This study therefore adopted the definition of proactive digital forensics as a method for establishing processes, procedures, policies, tools, and technology ahead of time in order to collect an event or alert and to safely preserve and examine evidence in the case of an incident [7]. The main objectives of proactive digital forensics are “system structuring and augmentation for automated data discovery, lead formation, and efficient data preservation” [2]. Proactive forensics has five phases: “proactive collection, event triggering, proactive preservation, proactive analysis, and lastly, preliminary reporting” [2]. Proactive digital forensics resembles digital forensic readiness and computer intrusion forensics. Computer intrusion forensics (CIF) differs from classical computer forensics in that CIF is carried out once an intrusion has been detected and the incident needs to be assessed, whereas classical computer forensics focuses on gathering evidence from digital devices, which need not be computers [11].
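To make the five phases listed in [2] more concrete, the following Python sketch chains them into a single pipeline. It is a minimal illustration under stated assumptions, not the method of [2] or of this study: the `Event` and `Case` types, the function names, and the keyword-based trigger rule (`SUSPICIOUS_KEYWORDS`) are hypothetical stand-ins for the detection logic that a real EDR or NIDS would supply.

```python
import hashlib
from dataclasses import dataclass, field

# Hypothetical trigger rule; a real deployment would rely on EDR/NIDS detections.
SUSPICIOUS_KEYWORDS = ("mimikatz", "powershell -enc", "psexec")


@dataclass
class Event:
    """A monitored event, e.g. one log line from a host."""
    source: str
    message: str


@dataclass
class Case:
    """Evidence preserved for a single triggering event."""
    trigger: Event
    sha256: str
    findings: list = field(default_factory=list)


def proactive_collection(raw_lines):
    """Phase 1: collect events continuously, before any incident is known."""
    return [Event(source=src, message=msg) for src, msg in raw_lines]


def event_triggering(events):
    """Phase 2: keep only events that match the trigger rule."""
    return [e for e in events
            if any(k in e.message.lower() for k in SUSPICIOUS_KEYWORDS)]


def proactive_preservation(event):
    """Phase 3: preserve the event together with a SHA-256 digest of its content."""
    digest = hashlib.sha256(f"{event.source}|{event.message}".encode()).hexdigest()
    return Case(trigger=event, sha256=digest)


def proactive_analysis(case):
    """Phase 4: record which trigger keywords the preserved event contains."""
    text = case.trigger.message.lower()
    case.findings = [k for k in SUSPICIOUS_KEYWORDS if k in text]
    return case


def preliminary_reporting(cases):
    """Phase 5: one summary line per case for the investigator."""
    return [f"{c.trigger.source}: {', '.join(c.findings)} (sha256={c.sha256[:12]})"
            for c in cases]


def run_pipeline(raw_lines):
    """Run the five phases in order over raw (source, message) pairs."""
    events = proactive_collection(raw_lines)
    cases = [proactive_analysis(proactive_preservation(e))
             for e in event_triggering(events)]
    return preliminary_reporting(cases)


if __name__ == "__main__":
    sample = [("host-a", "User logged in"),
              ("host-b", "powershell -enc SQBFAFgA")]
    for line in run_pipeline(sample):
        print(line)
```

Fed one benign and one encoded-PowerShell log line, the pipeline produces a single preliminary report entry, for the second host. In practice each phase would likely be a separate service; keeping them as plain functions simply shows the order in which data flows.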
Intrusion detection depends on the analysis of logs and computer audit trails gathered from critical infrastructure components such as routers, servers, and PC workstations. Based on the same principles, forensic readiness aims to make the best use of incident data as evidence while minimising the cost of the forensic operation [9,12]. The resulting incident evidence has potential use in internal matters, in regulatory compliance, and as evidence in court. It may also be used in vulnerability assessments and operational troubleshooting [12]. The extent to which network security services or tools can be used to collect evidence from computer and network systems during an incident is unclear [8]. However, a real-time forensic examination may, at least in theory, give investigators an opportunity to extract evidence about the intrusion as it happens. There has been debate over whether a Host Intrusion Detection System (HIDS) or a Network Intrusion Detection System (NIDS) is preferable; however, given the sophistication of modern attacks, a combined solution is preferred [9]. Beyond HIDS, Endpoint Detection and Response (EDR) systems have recently gained attention because they offer continuous threat monitoring and rapid response, with the aim of protecting the entire enterprise at all times [13]. The major advantage of an EDR over a HIDS is that it combines prevention, post-detection investigation, and response within a single platform, thereby providing unmatched security and operational effectiveness [13].
Traditionally, computer forensics examines a bit-for-bit duplicate of a disk together with artefacts extracted from memory, file and web history, network connections, jump lists, and link files, which provide a basic understanding of the activities performed on a victim’s electronic device before it was shut down [14]. Digital forensics can also be performed on live hosts using specialised software. The author of [14] proposed a collection technique for virtualised environments that takes snapshots of a virtual machine, either by transferring it to a new virtual machine instance or by using an algorithm that determines from the incident timeline when to take a snapshot. However, traditional digital forensic tools are becoming outdated as technology evolves, and tools such as Forensic Toolkit (FTK) [15] and EnCase [16] may no longer cope with the complexity of modern systems and applications [17]. Some drawbacks of these techniques are that they cannot extract pre-incident evidence and are poorly suited to remote use, as communication overhead can significantly degrade the quality of forensic results [17,18]. The authors of [19] suggest that, for an organisation to be considered digital-forensics-ready, there should be a “communication channel, Encryption, compression, authentication of log data and proof of integrity, authenticating the client and server, and timestamping”. In the cloud, security and digital forensics converge; hence, unifying them may improve both the forensic capabilities and the security of cloud environments and/or networked systems [20].
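Several of the readiness properties quoted from [19] (authentication of log data, proof of integrity, compression, and timestamping) can be illustrated with the Python standard library alone. The sketch below assumes a hypothetical shared HMAC key and chains each record to the previous record’s MAC, so that altering, removing, or reordering records becomes detectable. Encryption and client/server authentication (for example, TLS on the communication channel) are deliberately left out and would still be needed in a real deployment.

```python
import hashlib
import hmac
import json
import zlib
from datetime import datetime, timezone

# Hypothetical shared key; in practice it would come from a key-management system.
KEY = b"example-log-signing-key"
GENESIS = "0" * 64  # placeholder "previous MAC" for the first record


def seal_record(message, prev_mac):
    """Timestamp, chain, authenticate, and compress one log record."""
    body = {
        "ts": datetime.now(timezone.utc).isoformat(),  # timestamping
        "msg": message,
        "prev": prev_mac,  # links this record to the one before it
    }
    payload = json.dumps(body, sort_keys=True).encode()
    mac = hmac.new(KEY, payload, hashlib.sha256).hexdigest()  # authentication and integrity
    return zlib.compress(payload), mac  # compression before transport


def seal_log(messages):
    """Seal a sequence of messages into a MAC-chained list of records."""
    records, prev = [], GENESIS
    for message in messages:
        blob, mac = seal_record(message, prev)
        records.append((blob, mac))
        prev = mac
    return records


def verify_log(records):
    """True only if every MAC is valid and each record points to its predecessor."""
    prev = GENESIS
    for blob, mac in records:
        payload = zlib.decompress(blob)
        expected = hmac.new(KEY, payload, hashlib.sha256).hexdigest()
        if not hmac.compare_digest(expected, mac):
            return False  # record content was altered
        if json.loads(payload)["prev"] != prev:
            return False  # a record was removed or the order was changed
        prev = mac
    return True
```

Here `verify_log(seal_log([...]))` returns `True`, while editing a record, removing one from the middle of the log, or reordering records makes it return `False`. Silently truncating the final records would not be caught by the chain alone, which is one reason the channel-level protections listed in [19] still matter.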
3. Conceptual Model
The concept of proactive digital forensics resembles a security operations centre (SOC) architecture, with continuous monitoring of both the network and the endpoints. Digital forensics and incident response infrastructure (hardware and software) is installed in advance, in anticipation of future incidents that may require quick access to electronic evidence artefacts for investigations.
Figure 1, below, illustrates the proactive digital forensics concept using a Unified Modelling Language (UML) diagram.
The proactive digital forensics investigation begins with network and endpoint monitoring, in which a monitoring server, assisted by an agent (in the case of an EDR) or a network tap (in the case of a NIDS), continuously monitors network and host devices for suspicious behaviour. When an alert or notification is received, the forensic investigator verifies whether it is a true positive, false positive, true negative, or false negative. The investigator then checks whether an Indicator of Compromise (IOC) or an Indicator of Attack (IOA) exists. If one exists, an investigation is initiated on either the network or the endpoint. For known attacks, the identification of artefacts is automatic, which makes it easier for the forensic investigator to extract and preserve the evidence. The investigation is performed on top of security monitoring, which provides increased visibility of both network and endpoint devices. The security events received depend on the NIDS detection capabilities, using either signature-based identification or behavioural analysis. Based on the NIDS events, the analyst then performs a digital triage on the endpoint to obtain further details.
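The alert-verification step described above can be sketched as follows. This is an illustrative Python fragment, not the study’s implementation: the `Alert` type, the `KNOWN_IOCS` set (which uses RFC 5737 documentation addresses and a reserved `.example` domain), and the routing rule, under which a NIDS alert leads to a network investigation followed by endpoint triage, are assumptions based on the workflow in the text.

```python
from dataclasses import dataclass, field

# Hypothetical indicator set; the values are documentation/reserved placeholders.
KNOWN_IOCS = {
    "ip": {"203.0.113.50"},
    "domain": {"malicious.example"},
}


@dataclass
class Alert:
    source: str                                     # "nids" or "edr"
    fired: bool                                     # did monitoring raise an alert?
    indicators: dict = field(default_factory=dict)  # e.g. {"ip": "203.0.113.50"}


def classify(alert, confirmed_malicious):
    """Map the alert and the investigator's verdict onto the four outcomes."""
    if alert.fired:
        return "true positive" if confirmed_malicious else "false positive"
    return "false negative" if confirmed_malicious else "true negative"


def matches_ioc(alert):
    """True if any indicator on the alert appears in the known-IOC set."""
    return any(value in KNOWN_IOCS.get(kind, set())
               for kind, value in alert.indicators.items())


def next_steps(alert):
    """Decide where to investigate once an IOC has been found."""
    if not matches_ioc(alert):
        return []
    if alert.source == "nids":
        # NIDS events are followed by a triage of the endpoint for further detail.
        return ["network investigation", "endpoint triage"]
    return ["endpoint investigation"]
```

The investigator’s verdict (`confirmed_malicious`) remains a human input; the code only records which of the four outcomes it corresponds to and where the follow-up investigation should start.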
Unlike the solutions in [11,14,21], which take a virtual snapshot of the computer system and can be costly in terms of network overhead because of the snapshot’s size, the proactive digital forensic investigation (PDFI) concept implements real-time digital forensic triage. The authors in [22] define digital triage as a technical process that enables the efficient identification, verification, and collection of electronically stored information (ESI) in order to prioritise digital artefacts for easier analysis. There are several definitions of digital triage; nonetheless, the researchers have chosen the term “forensic digital triage” because the results are to be used in a forensic examination. In addition, the collection is performed forensically: the identified digital artefacts/ESI are validated and verified by calculating their hash values and are transported over a secure network for further analysis, as recommended by the authors in [19]. Furthermore, unlike the traditional digital forensics model, which does not provide pre-incident information and analysis (analysis is normally done post-mortem), proactive digital forensics offers live examination, leading to faster processing of evidential media. In [23], the authors studied the use of keystroke logging as a proactive digital forensic preservation technique; however, they do not indicate that their approach performs a live investigation or triage in near real time.
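The hash-based validation of triaged artefacts can be sketched with Python’s `hashlib`: a SHA-256 digest is recorded for each artefact at collection time and recomputed once the artefact arrives at the analysis host. The file names, the choice of SHA-256, and the local file copy standing in for the secure network transfer are assumptions made purely for illustration.

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_file(path, chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so large artefacts need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(paths):
    """Record a digest for every artefact at the moment of collection."""
    return {Path(p).name: sha256_file(p) for p in paths}


def verify_manifest(manifest, received_dir):
    """Recompute digests after transfer and return the names that do not match."""
    mismatches = []
    for name, expected in manifest.items():
        candidate = Path(received_dir) / name
        if not candidate.exists() or sha256_file(candidate) != expected:
            mismatches.append(name)
    return mismatches


def demo():
    """Collect two hypothetical artefacts, 'transfer' them, alter one, then verify."""
    with tempfile.TemporaryDirectory() as src, tempfile.TemporaryDirectory() as dst:
        paths = []
        for name, data in [("prefetch_dump.bin", b"\x00" * 64),
                           ("event_log.txt", b"sample log data")]:
            path = Path(src) / name
            path.write_bytes(data)
            paths.append(path)
        manifest = build_manifest(paths)
        for path in paths:
            shutil.copy(path, Path(dst) / path.name)  # stands in for the secure transfer
        (Path(dst) / "event_log.txt").write_bytes(b"altered in transit")
        return verify_manifest(manifest, dst)
```

`demo()` returns `['event_log.txt']`, flagging the artefact that was altered after collection while the untouched artefact verifies cleanly. Storing the manifest separately from the artefacts is what allows it to serve as proof of integrity later.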
The advantages of the proposed model are forensic readiness, an improved organisational security posture, proof of information governance (compliance), lower downtime (continued operations), and more precise identification, collection, and analysis of digital evidence. Proactive digital forensics can provide near real-time results, thereby enhancing resilience to threats and meeting information governance needs. However, the legality of live and/or remote forensics remains a major concern in several jurisdictions, which affects the credibility of digital artefacts obtained in this manner. The concept relies heavily on the principles of information security, namely confidentiality, integrity, and availability (CIA) and authentication, authorisation, and accountability (AAA). The SOC architecture and EDR systems should therefore uphold these principles, because security breach data and potential evidential data are classified as sensitive.
6. Discussion
With reference to the results section, one can observe that the digital triage was a comprehensive investigation conducted in near real time. It tells the story of exactly what happened to the system, from the moment of the attack to the execution of the payload and the provision of access to an attacker. An incident-response-focused organisation would typically favour eradicating the threat over containing it and collecting forensic evidence. However, these activities can occur simultaneously, because in the case of a data breach an organisation will need to prove to the regulator that the attack was beyond its technical and administrative safeguards. The research had four objectives, of which three were met and the remaining one was partially fulfilled. The first objective was to enable forensic-driven endpoint monitoring, which was achieved by setting up a virtualised network environment that simulated real-world networks. A virtualised environment naturally differs from the real world; nevertheless, it provides a basis for analysing and testing the concept. This virtualised setup comprised adversary emulation software as well as security monitoring and digital forensic tools. The Velociraptor digital forensics/EDR software was installed on the monitoring server, and agents were deployed to the client and server virtual machines. Through these actions, the first step of the proactive digital forensics model, namely infrastructure readiness and enhanced incident detection capabilities, was established.
In practice, forensic-driven endpoint monitoring made it straightforward to perform digital forensic functions such as reducing, examining, collating, and reconstructing evidence from network and endpoint devices, enabling the analyst to work effectively [29] while obtaining comprehensive digital evidence [2]. The second objective was to perform adversary emulation that mimics the tactics, techniques, and procedures adversaries use to compromise networks and systems. The researchers reproduced seven adversary emulation techniques aligned with the MITRE ATT&CK matrices [30]. For illustrative purposes, only one attack scenario is documented and demonstrated in this paper. The implemented phases include reconnaissance, initial access, execution and persistence, privilege escalation, credential access, discovery and lateral movement, command and control (C2), and data exfiltration [35]. This approach allowed the researchers not only to assess how various attack techniques are investigated but also to perform the investigations themselves. Several digital forensic concepts were discussed, ranging from file system and memory forensics to network forensics, providing a detailed assessment of proactive digital forensics.
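Reconstructing an attack narrative from findings tagged by ATT&CK phase can be illustrated with a short Python sketch that orders findings by timestamp and labels each with its ATT&CK Enterprise tactic ID. The function names are hypothetical, and the findings used to exercise the code are invented for illustration; they are not results from this study.

```python
from datetime import datetime

# MITRE ATT&CK Enterprise tactic IDs for the phases named in the text.
TACTICS = {
    "reconnaissance": "TA0043",
    "initial access": "TA0001",
    "execution": "TA0002",
    "persistence": "TA0003",
    "privilege escalation": "TA0004",
    "credential access": "TA0006",
    "discovery": "TA0007",
    "lateral movement": "TA0008",
    "command and control": "TA0011",
    "exfiltration": "TA0010",
}


def build_timeline(findings):
    """Sort (timestamp, tactic, description) tuples chronologically and
    label each with its ATT&CK tactic ID."""
    ordered = sorted(findings, key=lambda f: datetime.fromisoformat(f[0]))
    return [f"{ts} [{TACTICS[tactic]} {tactic}] {description}"
            for ts, tactic, description in ordered]


def missing_tactics(findings):
    """Emulated tactics for which no artefact has been recovered yet."""
    seen = {tactic for _, tactic, _ in findings}
    return [tactic for tactic in TACTICS if tactic not in seen]
```

Given a finding such as `("2023-05-01T10:00:00", "initial access", "phishing attachment opened")`, `build_timeline` emits it ahead of any later execution events, and `missing_tactics` lists the emulated phases for which no artefact was recovered, which may point the analyst to gaps in collection.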
The third objective was to automate the identification and collection of digital artefacts. Automated identification and collection were carried out by the NIDS, assisted by Stenographer, a full packet capture tool. During the laboratory experiments, the researchers observed that automating the identification component enhanced the analyst’s investigative capabilities and made extracting digital artefacts more efficient. Moreover, the Velociraptor digital forensics/EDR software proved very useful for performing forensic investigations on the endpoint, especially for acquiring live evidence over a network. However, extracting the artefacts still required an analyst to examine and collect them. Hence, the researchers concluded that digital artefact collection can be automated only to a certain degree, namely through enhanced detection capabilities. Automation efforts should therefore focus on detection and preservation, while the digital forensic examination itself is performed by a forensic investigator using their knowledge and skills together with specialised software.
The last objective was to cross-examine digital evidence found in network sources against that found on endpoint devices in order to corroborate the evidence from both perspectives. The approach provided thorough insight and coherence between the digital artefacts found on the network and on endpoint devices, leaving little doubt that an incident had indeed occurred. In some experiments, more artefacts were found on the endpoint than on the network, and in others the reverse. Hence, having evidence from both network and endpoint devices enables an analyst to establish what truly transpired.
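The cross-examination of network and endpoint evidence can be sketched as a join on remote address, port, and time. In the Python fragment below, the record field names (`dst_ip`, `remote_ip`, and so on) and the five-second tolerance are assumptions and do not correspond to any particular NIDS or EDR export format. Events that find no partner are returned separately, mirroring the observation that some artefacts appear on only one side.

```python
from datetime import datetime, timedelta


def _ts(value):
    """Parse an ISO 8601 timestamp string."""
    return datetime.fromisoformat(value)


def corroborate(network_events, endpoint_events, tolerance_s=5):
    """Pair each network flow with endpoint connections to the same remote IP
    and port seen within `tolerance_s` seconds.

    Returns (matched, network_only, endpoint_only)."""
    tolerance = timedelta(seconds=tolerance_s)
    matched, network_only, used = [], [], set()
    for flow in network_events:
        hits = [i for i, conn in enumerate(endpoint_events)
                if conn["remote_ip"] == flow["dst_ip"]
                and conn["remote_port"] == flow["dst_port"]
                and abs(_ts(conn["ts"]) - _ts(flow["ts"])) <= tolerance]
        if hits:
            matched.append((flow, [endpoint_events[i] for i in hits]))
            used.update(hits)
        else:
            network_only.append(flow)
    endpoint_only = [conn for i, conn in enumerate(endpoint_events) if i not in used]
    return matched, network_only, endpoint_only
```

Records in `network_only` or `endpoint_only` are not discarded: they show where one vantage point saw activity that the other missed, which makes them natural candidates for closer manual examination.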