### *3.2. Ransomware Detection*

In this section, we discuss the main research efforts on ransomware detection, mitigation, and prevention. Detection methods rely on the effects that ransomware attack behaviors have on computer systems, such as file or network activity. They raise an alarm that prompts end-users to act to protect their files and important data. An SDN-based system that improves protection against Ransomware by observing the ransomware attack is presented in [29]. By analyzing the behavior of two popular Ransomware families, CryptoWall and Locky, the authors showed that Ransomware can be detected from network-communication observations, namely HTTPS messaging sequences and content sizes.

The authors of [35] proposed PayBreak, a recovery solution that restores encrypted files on a victim's machine by extracting the encryption keys needed to decrypt them following a Ransomware attack. PayBreak effectively implements a key escrow mechanism that stores session keys in a key vault encrypted with the user's public key; thus, the user can decrypt the key vault with the corresponding private key following a ransomware attack. In another research work, Continella et al. proposed ShieldFS [36]. This scheme acts at the operating-system and file-system levels and serves as a shield that detects suspicious activities and corrects their effects.

Kharraz et al. [27] carried out a long-term study of ransomware attacks, analyzing more than 1300 samples collected between 2006 and 2014 that belong to 15 different Ransomware families. The study showed that monitoring file-system activity helps with Ransomware detection. R-Locker, a general technique intended to prevent crypto Ransomware action, was introduced in [37]. The researchers used the honeyfile technique to block the ransomware once it accessed a trap file, which helps to preserve the data on the system. Moreover, while the ransomware is blocked, a countermeasure can be applied to eliminate the threat from the environment.

The ultimate objective of the study presented in [29] was to detect the underlying Ransomware and mitigate its impact on the systems. The work in [38] provides a signature-based detection approach by observing the original semantics of the dataset of malware. Here, semantics are required to be as effective as malware. However, the authors conclude that, for broad classes such as Trojans, detection based on these signatures comes with higher error rates. In [39], the authors introduced CryptoDrop, an early-warning system that notifies users of unusual file operations during a ransomware attack. Based on common ransomware behavior criteria, the proposed solution tracks changes to the victim's data and identifies Ransomware in the process. Their experiments on 492 real-world Ransomware samples, representing 14 families, achieved high detection rates with low false positives. Ransomware designers continually improve their techniques to spread their attacks, especially for Ransomware types that are not easily detected. They use encryption algorithms to hide malicious code within benign code to be executed later.

Shafiq, Khayam, and Farooq [40] proposed a scheme to detect embedded malware, that is, malicious code hidden within benign files, using statistical anomaly detection. Yüksel, den Hartog, and Etalle [41] described a protocol-aware anomaly detection framework that aims to protect a network from embedded malware by monitoring it for SMB and Microsoft Remote Procedure Call (RPC) messages. The work presented in [42] studies the whole life cycle of Ransomware creation, design, and implementation using Dynamic Data Exchange (DDE) in the Python scripting language and REST APIs in PHP, with a MySQL database as the back-end. Their study aimed to show that even though many security measures and several top-quality antivirus programs are currently in use, ransomware authors continue to develop dangerous malicious code that can be distributed easily through connected devices. Meanwhile, various research efforts have explored the analysis and detection of Ransomware based on its characteristics, leveraging machine learning techniques. In [43], Lim and Ramli applied machine learning techniques to classify features extracted through static and behavioral analysis, and they developed an efficient malware analysis framework based on these features.

An approach to efficiently detect Ransomware was presented in [44]. The authors incorporated feature-generation engines and machine learning in a reverse-engineering framework. The framework aims at better examination and interpretation of malware code segments by performing multilevel analysis at the raw-binary, library, function-call, and assembly levels. Binaries are decoded into assembly-level instructions and DLL libraries using the object-code dump tool (Linux) and a portable executable (PE) parser. The experiments were conducted using supervised ML techniques on both Ransomware and normal binaries. Seven of the eight ML classifiers that were tested had a detection rate of at least 90%.

In [45], Cusack, Michel, and Keller proposed a solution that uses programmable forwarding engines to monitor the network traffic between the infected computer and the command-and-control server. They derived high-level flow features from this traffic and used the resulting dataset to detect Ransomware. Their classification model achieved a detection rate of around 0.86.

While Ransomware most frequently infects personal computers, the rapid spread and increased usage of mobile devices and smartphones have led Ransomware writers and hackers to pay particular attention to this evolving market. Although mobile applications must meet specific store standards before they are made available to end-users, users can still find and download infected applications from these stores. Andronio, Zanero, and Maggi [46] developed HelDroid, a detection scheme trained on ransomware samples. Their approach detects whether a particular application will attempt to lock or encrypt a mobile device without the user's approval. It can also detect ransom requests from within the text of the application itself.

Stokkel [47] proposed detection code for an open-source intrusion detection system called Bro that detects many ransomware samples. Cuzzocrea, Martinelli, and Mercaldo [48] presented a fuzzy logic classification method to identify whether a mobile application exhibits Ransomware behavior; they performed their evaluation on a dataset containing 10,052 legitimate and illegitimate Android mobile applications.

The work presented in [49] proposed a detection method leveraging a Support Vector Machine (SVM), a supervised machine learning algorithm. Using this approach, the authors classify the API call logs of Ransomware samples based on their features. They evaluated this scheme using 276 real Ransomware samples and concluded that their technique increases the predictive accuracy and the Ransomware detection rate. Ref. [50] surveyed Ransomware detection using dynamic analysis and machine learning, covering 2019 to 2021.

### *3.3. Recovery from Ransomware*

This section provides an overview of the literature on recovery from ransomware attacks, the proposed schemes to counter them, and the efficiencies involved. Zimba, Wang, and Simukonda [51] examined crypto Ransomware samples through reverse engineering and dynamic analysis to evaluate their underlying attack structures and deletion techniques. They conclude that no matter how disruptive a crypto Ransomware attack is, the key to data recovery is the underlying attack structure and the deletion technique applied. They show that data recovery based on the structure of the attack is possible. The work presented in [52] studies the recovery of files lost to ransomware attacks in a network-shared volume scenario. It presents a software tool that monitors the traffic and records all user actions on the file. The authors demonstrate that their proposed tool can recover the file from previous and subsequent operations without taking the encrypted content as valid data. The tool was evaluated on test-traffic records of 18 different families and recovered the files successfully. The work in [53] presents a tool to evaluate Ransomware backup systems during security-risk assessment; it enables auditors to analyze backup systems effectively and improves an organization's ability to detect and recover from Ransomware attacks.

RDS3 is a novel Ransomware Defense Strategy that stealthily backs up data in the spare space of a computing device so that data encrypted by ransomware can be restored [54,55]. Kim et al. [56] proposed a method to decrypt Hive ransomware and recover infected data. Continella et al. [36] described a self-healing, ransomware-aware file system that monitors low-level filesystem activity. If a process violates a previously trained model, its operations are deemed malicious, and the side-effects on the filesystem are transparently rolled back. The work carried out by Ye et al. [57] suggests monitoring and analyzing operating-system events to ensure that a backup is created whenever a suspicious event is detected. If the suspicion is confirmed, the system can be rolled back to that backup.

### **4. Proposed Version-Aware Ransomware Recovery Framework**

In this section, we describe the proposed framework for Self-Healing Version-Aware Ransomware Recovery (SH-VARR). The main goal of the proposed framework is to serve as a version-control system and assist in recovery from ransomware attacks targeting XML-based documents. To achieve this goal, we implemented a distributed version-control system that adds the absolute URL path of the original file to keep track of file versions. Further, we employed access-control techniques to protect file versions from modification or deletion. These techniques ensure protection from ransomware attacks while allowing users to keep track of older versions of their files. Here, we point out that the novelty of our proposed framework lies in the way we combine well-known techniques from access-control theory and version-control mechanisms to achieve the desired Self-Healing Version-Aware Ransomware Recovery of XML-based documents.

Figure 1 depicts the overall framework architecture. In this framework, all XML-based documents in a predefined directory go through the version-control module at the time of file closing to maintain the latest version of each document. The access-control module is then activated by invoking the root daemon service, which write-protects the snapshot version that already points to the original file.
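The write-protection step can be illustrated with a minimal Python sketch. It assumes that write protection is expressed through POSIX permission bits; the excerpt does not specify the mechanism the root daemon actually uses, and the function names `write_protect` and `is_write_protected` are hypothetical. In a deployment along the lines described above, a privileged service would presumably also own the snapshot so that an unprivileged process could not restore the write bits; that part is not shown.

```python
import stat
from pathlib import Path

# All write bits (owner, group, others).
WRITE_BITS = stat.S_IWUSR | stat.S_IWGRP | stat.S_IWOTH


def write_protect(snapshot: Path) -> None:
    """Clear every write bit on a snapshot file (assumed protection mechanism)."""
    mode = stat.S_IMODE(snapshot.stat().st_mode)
    snapshot.chmod(mode & ~WRITE_BITS)


def is_write_protected(snapshot: Path) -> bool:
    """Return True if no write bit is set on the snapshot file."""
    mode = stat.S_IMODE(snapshot.stat().st_mode)
    return (mode & WRITE_BITS) == 0
```

Permission bits alone do not stop a process running with root privileges, so this sketch only illustrates the idea of making snapshots read-only for ordinary user processes.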

**Figure 1.** The overall architecture of the SH-VARR framework.

### *4.1. Details of the Proposed SH-VARR Framework*

We first describe the version-control module, illustrating the importance of using absolute URL links to keep track of old versions of a file. This is followed by a detailed description of the access-control module.

### 4.1.1. Version-Control Module

The version-control module is designed to maintain a copy of the XML-based file at the time of file closure so that the latest version can be retrieved in case of any corruption or system failure. We use the term *snapshot* to refer to the resulting file version. This can be achieved by adding a special plugin for Microsoft Word or LibreOffice. As part of this work, we have implemented a custom plugin for Microsoft Word 2013.

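Our framework is specifically designed to recover XML-based documents in a predefined folder/directory in case of a ransomware attack. Microsoft Word documents and LibreOffice documents are XML-based documents that are compressed using the zip compression algorithm. To create a snapshot of a .odt or .docx file, the plugin performs the following steps:

1. Rename the file so that it has the .zip extension.
2. Unzip the resulting file to obtain the XML file archive.
3. Add a link.XML file containing the absolute path of the new version to the archive.
4. Compress the archive back into a .zip file.
5. Copy the resulting file to the predefined protected versions directory.
6. Change the file extension back to the original one.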

As an illustrative example, consider Figure 2, which shows the main steps performed by our distributed version-control module to obtain a new version of an XML-based file abc.odt. In this example, we assume that the file is in the user directory /home/user/documents. The version (i.e., a file snapshot) is created by renaming the file to abc.zip and then unzipping the resulting file to obtain the XML file archive. The main reason for performing this step is to add an absolute path (i.e., a link) to the location of the newly introduced version. Assuming that the file version will be stored in /home/user/versions with the name abc-version1.zip, the absolute path /home/user/versions/abc-version1.zip is saved in the link.XML file that is added to the document archive in step 3. In step 4, the XML-based document archive is compressed back to obtain abc.zip. At this point, the file is copied to the predefined protected versions directory /home/user/versions. Finally, the file extension is changed back to .odt.
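The example above can be sketched in Python using only the standard library. This is an illustration under stated assumptions, not the plugin described in this work: the function name `create_snapshot` is hypothetical, link.XML is assumed to hold the absolute path as plain UTF-8 text (its actual format is not specified here), and the archive is rebuilt entry by entry in a temporary file instead of being renamed and unzipped on disk.

```python
import shutil
import tempfile
import zipfile
from pathlib import Path

LINK_NAME = "link.XML"  # entry name from the text; plain-text content is an assumption


def create_snapshot(document: Path, versions_dir: Path, version_name: str) -> Path:
    """Embed a link to the new version in the document and store a snapshot copy.

    Returns the absolute path of the snapshot in versions_dir.
    """
    versions_dir.mkdir(parents=True, exist_ok=True)
    snapshot = (versions_dir / version_name).resolve()

    with tempfile.TemporaryDirectory() as tmp:
        rebuilt = Path(tmp) / (document.stem + ".zip")
        # Treat the document as a zip archive and copy its entries in their
        # original order and compression (ODF expects "mimetype" to stay first).
        with zipfile.ZipFile(document) as src, zipfile.ZipFile(rebuilt, "w") as dst:
            for info in src.infolist():
                if info.filename != LINK_NAME:  # drop a stale link, if any
                    dst.writestr(info, src.read(info))
            # Add the link that records where the new version is stored.
            dst.writestr(LINK_NAME, str(snapshot), compress_type=zipfile.ZIP_DEFLATED)
        # Copy the rebuilt archive to the versions directory ...
        shutil.copyfile(rebuilt, snapshot)
        # ... and put it back under the original document name and extension.
        shutil.copyfile(rebuilt, document)
    return snapshot
```

With the paths from the example, a call such as `create_snapshot(Path("/home/user/documents/abc.odt"), Path("/home/user/versions"), "abc-version1.zip")` would leave abc.odt in place and create the snapshot in the versions directory.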

Note that the version-control module is invoked at the time of closing the document. This ensures that a new snapshot of the XML-based document is saved each time the user closes the file. We emphasize that keeping track of document history (i.e., versions) is achieved by following the absolute path stored in the link.XML file of each version. Figure 3 shows the approach used to retrieve older versions. Starting with the newest version (*VN*), it is possible to retrieve the preceding version by following the link found in the link.XML file stored in the version itself. Older versions can be retrieved similarly. For recovery from a ransomware attack, it would be sufficient to keep the latest version only. However, if the objective is to retrieve older file versions while providing ransomware recovery capability, the system can be configured to store protected versions in precisely the same way as described in this section.
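Retrieving the version history by following links can be sketched as a short traversal. The sketch assumes that the link.XML entry of each version holds the absolute path of the preceding version as plain UTF-8 text and that the oldest version has no link.XML entry; both are assumptions made for illustration, and the names `read_link` and `version_history` are hypothetical.

```python
import zipfile
from pathlib import Path
from typing import Iterator, Optional

LINK_NAME = "link.XML"  # entry name from the text; plain-text content is an assumption


def read_link(version: Path) -> Optional[Path]:
    """Return the path stored in a version's link.XML, or None if there is none."""
    with zipfile.ZipFile(version) as zf:
        if LINK_NAME not in zf.namelist():
            return None
        target = zf.read(LINK_NAME).decode("utf-8").strip()
    return Path(target) if target else None


def version_history(newest: Path) -> Iterator[Path]:
    """Yield versions from newest to oldest by following the stored links."""
    seen = set()
    current: Optional[Path] = newest
    # Stop at a missing link, a missing file, or a link that loops back.
    while current is not None and current not in seen and current.exists():
        seen.add(current)
        yield current
        current = read_link(current)
```

The `seen` set guards against a version whose link points to itself or to a newer version, which would otherwise make the traversal loop forever.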

**Figure 2.** An illustrative example of the main steps of the version-control module.

**Figure 3.** Keeping track of file version history based on link concept.
