#### **2. Background**

In this section, we introduce a set of definitions related to Android apps, machine learning, and the features used by BrainShield to detect malware apps.

Malware detection is a classification problem [7], which consists of determining the class of an app. In this paper, we consider two classes: (1) malware apps; and (2) benign apps. A malware app is an Android package kit (APK), also known as an Android app, used to serve illegal purposes, such as espionage or extortion. An app is benign if it is legitimate and harmless.

Machine learning [8] is a discipline comprising many different methods and objectives. We use: (1) fully connected neural networks (i.e., dense layers) implemented in TensorFlow and fed with a single feature vector, with Dropout regularization on the hidden layers to reduce overfitting and improve the generalization error of deep neural networks; (2) the sigmoid activation function on the output layer, which yields a probability between 0 and 1; and (3) the Adam optimizer to minimize the error.
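To make these building blocks concrete, the following is a minimal NumPy sketch of a single forward pass through such a network: a dense hidden layer, inverted Dropout, and a sigmoid output unit. This is an illustration of the mechanisms, not the BrainShield implementation; the layer sizes, the ReLU hidden activation, and the random weights are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    # Squashes any real value into (0, 1), suitable as a probability.
    return 1.0 / (1.0 + np.exp(-z))

def dense(x, w, b, activation=None):
    # Fully connected layer: every input unit feeds every output unit.
    z = x @ w + b
    return activation(z) if activation else z

def dropout(x, rate, training=True):
    # Inverted dropout: during training, zero a fraction `rate` of the
    # activations and rescale the survivors; at inference, pass through.
    if not training:
        return x
    mask = rng.random(x.shape) >= rate
    return x * mask / (1.0 - rate)

# Illustrative sizes: 151 input features, one hidden layer of 64 units.
x = rng.random((1, 151))                                  # one app's feature vector
w1, b1 = rng.standard_normal((151, 64)) * 0.1, np.zeros(64)
w2, b2 = rng.standard_normal((64, 1)) * 0.1, np.zeros(1)

h = dense(x, w1, b1, activation=lambda z: np.maximum(z, 0.0))  # ReLU hidden layer
h = dropout(h, rate=0.5, training=True)
p = dense(h, w2, b2, activation=sigmoid)   # probability that the app is malware
```

In a TensorFlow model, the `dense`, `dropout`, and `sigmoid` steps above correspond to stacked `Dense` and `Dropout` layers ending in a single sigmoid unit, trained with Adam.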

These machine learning methods all take as input a large number of features, labeled in the case of supervised learning and unlabeled in the case of unsupervised learning. The quantity and the balance of the data are very important for building an accurate classification model, which we take into account in our proposed model. Labeling is the act of marking an app as malware (i.e., value = 1) or benign (i.e., value = 0). Therefore, we use binary classification, which outputs a probability between 0 (i.e., benign app) and 1 (i.e., malware app).
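The labeling convention and the resulting decision rule can be written in a few lines. The 0.5 decision threshold below is a common default for binary classifiers, not a value prescribed by the paper:

```python
# Labels follow the convention described above: 1 = malware, 0 = benign.
LABELS = {"benign": 0, "malware": 1}

def classify(probability, threshold=0.5):
    # The model outputs a probability in [0, 1]; the app is flagged as
    # malware when that probability exceeds the decision threshold.
    return LABELS["malware"] if probability > threshold else LABELS["benign"]

print(classify(0.92))  # high score: classified as malware (1)
print(classify(0.08))  # low score: classified as benign (0)
```

Raising the threshold trades recall for precision: fewer benign apps are flagged, but more malware slips through.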

Features are needed in the case of supervised learning. They represent an app as faithfully as possible. Static features are those obtained using static tools, while dynamic features are those obtained using dynamic tools [9].

Evaluation metrics [10] are quantifiable measures that determine whether the detection model efficiently differentiates malware from benign apps. Among these metrics, let us quote the ones used to evaluate the performance of our proposed model. The accuracy is the proportion of correct predictions. The precision is the proportion of correct positive predictions; a detection model producing no false positives has a precision of 1. The recall is the proportion of actual positives that are correctly identified; it is also called the true positive rate (TPR), and a detection model producing no false negatives has a recall of 1. The F1 score is the harmonic mean of the precision and the recall; it therefore accounts for both false positives and false negatives. The area under the receiver operating characteristic curve (AUROC) measures the two-dimensional area underneath the receiver operating characteristic (ROC) curve. It gives an aggregate measure of performance across all classification thresholds.
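These definitions follow directly from the confusion-matrix counts. The sketch below computes them for a hypothetical test set of 100 apps; the counts are invented for illustration and do not come from the paper's experiments:

```python
def metrics(tp, fp, fn, tn):
    # Standard definitions from the confusion matrix:
    # tp/fp = true/false positives, fn/tn = false/true negatives.
    accuracy  = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)          # also called the true positive rate
    f1        = 2 * precision * recall / (precision + recall)
    return accuracy, precision, recall, f1

# Hypothetical counts: 45 malware caught, 5 benign apps wrongly flagged,
# 10 malware missed, 40 benign apps correctly passed.
acc, prec, rec, f1 = metrics(tp=45, fp=5, fn=10, tn=40)
print(round(acc, 3), round(prec, 3), round(rec, 3), round(f1, 3))
```

Note that with fp = 0 the precision is exactly 1, and with fn = 0 the recall is exactly 1, matching the statements above. AUROC is omitted here since it requires the full set of scores rather than a single confusion matrix.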

#### **3. Related Work**

In this section, we present a literature review based on four categories of malware detection methods for mobile devices running the Android operating system: (1) company solutions; (2) static methods; (3) dynamic methods; and (4) hybrid methods. At the end of this section, we present the limitations of the existing methods.

#### *3.1. Company Solutions*

In this section, we present a non-exhaustive list of the most popular antivirus Android apps available on the Google Play Store. This list covers solutions proposed by companies that offer additional features beyond detecting malicious apps. Table 1 compares these Android apps, including the descriptive information for each app according to the Google Play Store in autumn 2019, as well as the prices offered by each app publisher.


**Table 1.** List of antivirus software on Google Play Store.

The detection methods used by the Android apps presented in Table 1 are not publicly known. This opacity prevents us from building our own detection method on top of them, and motivates a broader study of the detection methods existing on the market. In addition, most of these Android apps provide functionalities besides malware detection, such as a network scanner, a virtual private network (VPN) service, AppLock, and a permissions scanner. Typically, these features are accessible through a monthly or annual paid subscription.

Even Google Inc. cannot guarantee a 100% detection rate. Although Google Inc. made huge strides in 2019, its 2012 detection system, Google Bouncer, could be bypassed. Indeed, the official announcement of its existence in February 2012 [11] caused a boom in the field of research, and several researchers studied Google Bouncer to find out more. On 4 June 2012, Jon Oberheide and Charlie Miller [12] presented interesting results: they were able to probe the system remotely for attributes of the Bouncer environment, such as the version of the kernel running, the contents of the file system, or information on some of the devices emulated by the Bouncer environment. Against these new and increasingly virulent threats, Google Inc. revised its policy and established Google Play Protect [13], the integrated malware protection platform for most Android devices. Google Play Protect relies on machine learning techniques to analyze more than 50 billion apps per day. Despite these advancements, malware is still found in the Google Play Store [14].

#### *3.2. Static Method*

The static analysis method does not require running the app on a device. It focuses on the app code rather than on its actual behavior when it is executed, since the app code is supposed to be faithful to the app functionality.

Fournier et al. [4] proposed a static detection method based on 151 Android system permissions, trained with the Waikato environment for knowledge analysis (WEKA). The model is trained on a set of 10,000 apps, consisting of 5000 benign apps and 5000 malware samples. The malware comes from the Drebin dataset [15], dated 2010 to 2012. The benign apps come from the top 500 of each category of the Google Play Store; the drawback is that no security check was performed to verify that these apps were not malware. The reported accuracy on the test set is 94.62%.

IntelliAV [16] is an on-device malware detection system, which uses static analysis coupled with machine learning. The app is available on the Google Play Store. Based on a training and validation set of 19,722 apps, including 9664 malware samples, the authors obtained a TPR of 92.5% and a false positive rate (FPR) of 4.2% on the validation set, with 1000 attributes generated by the training process. Moreover, the authors evaluated their model on a set of 2898 benign apps and 2311 malware samples from VirusTotal dated February 2017, obtaining an accuracy of 71.96%.

MaMaDroid [17] detects malware from a behavioral perspective, modeling an app as a sequence of abstracted API calls. It is based on a static analysis system that collects the API calls made by an app, and then builds a model, in the form of Markov chains, from the sequences obtained from the call graph. This makes the model more resilient to API changes and keeps the feature set manageable in size. MaMaDroid has been tested on a dataset of 8500 benign apps and 35,500 malware samples collected over a six-year period, with an F-measure reaching 99%.
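The core idea of the Markov-chain model can be illustrated in a few lines: count transitions between consecutive abstracted calls, then normalize each row into transition probabilities, which become the feature vector. This is a toy sketch of the idea, not MaMaDroid's implementation; the abstraction of calls to the states `android`, `java`, and `self` is illustrative.

```python
from collections import Counter, defaultdict

def markov_features(call_sequence, states):
    # Count transitions between consecutive abstracted API calls.
    counts = defaultdict(Counter)
    for src, dst in zip(call_sequence, call_sequence[1:]):
        counts[src][dst] += 1
    # Normalize each row so that outgoing probabilities sum to 1
    # (rows with no outgoing transitions stay at 0).
    matrix = {}
    for src in states:
        total = sum(counts[src].values())
        matrix[src] = {dst: (counts[src][dst] / total if total else 0.0)
                       for dst in states}
    return matrix

# A call sequence abstracted to package families (illustrative values).
seq = ["android", "java", "android", "android", "self", "android"]
m = markov_features(seq, states=["android", "java", "self"])
print(m["android"])  # transition probabilities out of the "android" state
```

Because the features are transition probabilities between abstract states rather than raw API names, renaming or adding individual APIs changes the model little, which is the resilience property claimed above.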

DroidSieve [18] adopts a combination of features, which the authors suggest is crucial for the robust detection of both plain and obfuscated malware. Thus, syntactic features (e.g., API calls and system permissions) are integrated into the detection method. These features have been used to build a classifier that is robust for both old and new malware, which tend to be increasingly obfuscated. To enrich the syntactic features, new features based on explicit intents, meta-information, and Dalvik Executable (DEX) files have been added. The authors created a ranking of the most relevant features for detecting malware, where Android permissions and intents come first. The system achieves an accuracy of 99.82% with zero false positives.

FlowDroid [19] is a tool that performs taint analysis on the app code, which enables the discovery of flows where the device's International Mobile Equipment Identity (IMEI) is sent to a third party over the network. It achieves a recall of 93% and a precision of 86%.

Maldozer [20] is based on the classification of raw sequences of API method calls, using deep learning techniques. Maldozer can be used as a malware detection system on servers, on mobile devices, and even on Internet of Things (IoT) devices. It achieves an F1 score of 96–99% and a false positive rate of 0.06%. The datasets used were from the Malgenome project (2010–2011).

AndroGuard [21] is a Python library that extracts various information from the code, resources, or AndroidManifest.xml file of an Android app. It is used for static feature extraction.

#### *3.3. Dynamic Method*

The dynamic analysis method requires running the app code on a device. Dynamic analysis is used in the literature because techniques such as encryption, code obfuscation, dynamic code loading, or reflection can be employed to evade the static analysis method. A significant number of studies attempt to work around this problem by monitoring the actions of the app in an emulator or on a real device.

TaintDroid [22] introduces and prototypes a taint tracking method that is now widely used. However, the authors had to explore the apps manually, which greatly limits the number of apps that can be analyzed; indeed, only 30 randomly selected apps were analyzed.

AppsPlayground [23] takes the concept of taint tracking and develops an intelligent method of input generation and app path exploration for dynamic analysis, which makes the detection automatic, with the tests performed on an emulator. On the other hand, like TaintDroid, it requires a modification of the Android operating system to track data via taint tracking. AppsPlayground was evaluated with 3968 apps from the Google Play Store.

Chen et al. [24] proposed a data-mining-based system for automatic ransomware detection. The actual behavior of the apps is monitored and represented as a set of features in the API (Application Programming Interface) call flow graph.

Emulator vs. real phone [25] offers a detailed study of the differences between the two execution environments. This study recommends performing the detection on a real device.

DroidBox [26] allows the monitoring of a wide range of events, such as file access, network traffic, or DEX files loaded dynamically at runtime. DroidBox uses API level 16, which covers 99.6% of smartphones according to Android. It is used for feature extraction in the context of dynamic analysis.

#### *3.4. Hybrid Method*

We define the hybrid analysis method as a method that combines static and dynamic analysis methods.

MADAM [27] is a hybrid framework using machine learning to detect malware. It classifies apps based on suspicious behavior observed at different levels of Android: kernel, application, user, and package. MADAM requires administrator privileges on the phone, since it works at the kernel level. Thus, the authors specify that their solution is not intended for the general public, but seeks to prove the strength of such an approach (i.e., multi-level, dynamic, and on-device). The 2018 version offers real-world experiments on 2800 malware samples from 125 different families across three datasets.

SAMADroid [28] uses machine learning to detect malware. It works both on local hosts (i.e., on-device), to perform dynamic analysis, and on remote hosts, to perform static analysis and prediction. The SAMADroid client app is developed for Android devices. The dataset for neural network training is Drebin (2010–2012) [15], which contains old malware. Nevertheless, SAMADroid claims to achieve an accuracy of 99.07%.

AndroPyTool, adopted by Martin et al. [29], provides two tools that are of great importance for our own detection method: (1) the AndroPyTool framework; and (2) the Omnidroid dataset. AndroPyTool is developed in Python, and the code is hosted on GitHub. It can perform a complete extraction of static and dynamic features. It integrates the most used Android malware analysis tools (i.e., FlowDroid [19], DroidBox [26], AndroGuard [21] and Strace [30]) to perform a source code inspection, and to retrieve information on behavior when the sample is run in a controlled environment.

#### *3.5. Limitations of the Existing Methods*

The static, dynamic, and hybrid approaches presented above have the following shortcomings.

For static analysis, techniques such as encryption, code obfuscation, dynamic code loading, or reflection can be used to evade detection.

For dynamic analysis, (1) manual intervention may be required [22,32] to guarantee full exploration of the app; and (2) the app could determine that the runtime environment is an emulator, in which case the malicious code would not be triggered, preventing its detection [23].

In addition to the previous shortcomings, hybrid approaches may have the following drawbacks: (1) only average performance; and (2) the need for a rooted device.

Finally, all the methods presented above achieve high accuracy only if they are associated with: (1) a large number of apps in the training and evaluation datasets; and (2) recent malware. Indeed, any method that claims to achieve an accuracy of around 99% while being evaluated on old datasets should be considered obsolete.
