5.2.4. Relabeling

The Omnidroid dataset was labeled using VirusTotal with a threshold ε equal to 1. The threshold ε is the minimum number of antiviruses that must detect an app as malicious for it to be labeled as malware. Thus, with ε set to 1, an app is labeled as malicious in Omnidroid as soon as a single one of the sixty antiviruses on VirusTotal detects it. We therefore count, for each number of detecting antiviruses, how many apps are detected by exactly that many antiviruses.
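
The thresholding rule described above can be illustrated with a minimal sketch. The function name `label_app` and its parameter `positives` (the number of antiviruses that flag an app) are hypothetical names chosen for illustration; they are not taken from Omnidroid or from the VirusTotal API.

```python
def label_app(positives: int, epsilon: int = 1) -> str:
    """Label an app from its detection count (hypothetical helper).

    positives: number of antiviruses that flagged the app as malicious.
    epsilon:   minimum number of detections required to label it as malware.
    """
    if epsilon < 1:
        raise ValueError("epsilon must be at least 1")
    # The app is malware as soon as the detection count reaches the threshold.
    return "malware" if positives >= epsilon else "benign"


# An app flagged by a single antivirus changes label between epsilon=1 and epsilon=2.
single_detection_eps1 = label_app(positives=1, epsilon=1)
single_detection_eps2 = label_app(positives=1, epsilon=2)
```

With ε = 1 the single detection is enough to label the app as malware, whereas with ε = 2 the same app is labeled as benign.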

The results in Figure 13 are valid as of September 2019. We used the VirusTotal service [37], through academic API access, to obtain a report for each app identified by its hash. These results therefore date from a little more than a year and a half after the initial results obtained by [29]. A year and a half later, only 9024 apps are still labeled as benign with a threshold ε equal to 1, which corresponds to 0 antiviruses on the abscissa. In addition, 1807 apps are classified as malware even though only one antivirus has detected them as malicious; a threshold ε equal to 2 would have been enough to classify them as benign.
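
The effect of raising ε on the number of benign apps can be checked with a short sketch. It uses only the two bins quoted in the text (0 and 1 detections); the remaining bins of Figure 13 are not reproduced here, so the sketch is only meaningful for ε ≤ 2. The names `benign_count` and `known_bins` are hypothetical.

```python
from typing import Dict


def benign_count(histogram: Dict[int, int], epsilon: int) -> int:
    """Number of apps labeled benign: those with fewer than `epsilon` detections.

    histogram maps a detection count to the number of apps with that count.
    """
    return sum(n for detections, n in histogram.items() if detections < epsilon)


# Only the two bins stated in the text; bins for 2 or more detections are omitted.
known_bins = {0: 9024, 1: 1807}

benign_eps1 = benign_count(known_bins, epsilon=1)  # apps with 0 detections
benign_eps2 = benign_count(known_bins, epsilon=2)  # apps with 0 or 1 detection
```

Under these figures, moving from ε = 1 to ε = 2 raises the number of benign apps from 9024 to 9024 + 1807 = 10,831.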

**Figure 13.** Scans on VirusTotal of 22,000 apps.

In Figures 14 and 15, the results are based on 22,000 app reports from September 2019, collected using a Python script and a VirusTotal academic key. We relabeled Omnidroid for static training by varying the threshold ε from 1 to 4 for the 2018 reports, and from 1 to 10 for the 2019 reports.

In Figure 14, we note an improvement in the results following the relabeling. We deduce that the antiviruses on VirusTotal have produced new scan results over that year and a half, and that this has an impact on the detection of malware.

Moreover, we relabeled Omnidroid for dynamic training in Figure 15, this time varying the threshold ε from 1 to 6 for the reports. As with static relabeling, we notice an improvement in the results following dynamic relabeling. In particular, recall is the metric that improves the most (by up to 10%). Relabeling has made it possible to detect more malware that was previously undetected with the Omnidroid labeling.
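
For reference, recall is the fraction of actual malware that the classifier detects, TP / (TP + FN). The sketch below computes it from two label lists; the function name `recall` and the toy labels are hypothetical and do not reproduce the experiments of Figure 15.

```python
def recall(y_true, y_pred, positive="malware"):
    """Recall = TP / (TP + FN) for the `positive` class (hypothetical helper)."""
    pairs = list(zip(y_true, y_pred))
    # True positives: malware correctly predicted as malware.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    # False negatives: malware predicted as anything else.
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp + fn == 0:
        return 0.0  # no positive sample in the ground truth
    return tp / (tp + fn)


# Toy example: three malware samples, two of which are detected.
truth = ["malware", "malware", "benign", "malware"]
predicted = ["malware", "benign", "benign", "malware"]
toy_recall = recall(truth, predicted)
```

Because the ground-truth labels themselves change when ε changes, recall values obtained with different thresholds are measured against different sets of malware.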
