**2. Related Work**

The issue of identifying malicious domains is a fundamental problem in cybersecurity. This section discusses recent results in this area, focusing on two significant methodologies: mathematical theory (MT) approaches and machine learning (ML)-based techniques.

The use of graph theory to identify malicious domains was more pervasive in the past [16,26,31–33]. Yadav et al. [26] presented a method for recognizing malicious domain names based on fast flux. Fast flux is a DNS technique used by botnets to hide phishing and malware delivery sites behind an ever-changing network of compromised hosts acting as proxies. They analyzed DNS queries and responses to detect if and when domain names were being generated by a Domain Generation Algorithm (DGA). Their solution computed the distribution of alphanumeric characters for groups of domains and applied statistical metrics, namely the Kullback-Leibler (KL) divergence, the edit distance, and the Jaccard measure, to identify these domains. For a fast-flux attack using the Jaccard index, they achieved impressive results, with 100% detection and 0% false positives. However, for smaller numbers of generated domains per TLD, their false-positive rates were much higher: 15% when 50 domains were generated per TLD using the KL-divergence over unigrams, and 8% when 200 domains were generated per TLD using the edit distance.
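As an illustration of these distance metrics, the following minimal sketch (with hypothetical domain groups, not the authors' dataset) computes a unigram character distribution per group, the KL divergence between two such distributions, and a Jaccard index over character bigrams:

```python
import math
from collections import Counter

def unigram_dist(domains):
    """Normalized frequency of alphanumeric characters across a group of domains."""
    counts = Counter(c for d in domains for c in d if c.isalnum())
    total = sum(counts.values())
    return {c: n / total for c, n in counts.items()}

def kl_divergence(p, q, eps=1e-9):
    """KL(p || q) over the union of observed characters (smoothed to avoid log 0)."""
    chars = set(p) | set(q)
    return sum(p.get(c, eps) * math.log(p.get(c, eps) / q.get(c, eps)) for c in chars)

def jaccard(a, b):
    """Jaccard index between the character bigram sets of two domain groups."""
    bigrams = lambda ds: {d[i:i + 2] for d in ds for i in range(len(d) - 1)}
    sa, sb = bigrams(a), bigrams(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

benign = ["google", "facebook", "wikipedia", "amazon"]
dga    = ["xkqzjw", "qwzxkv", "zzqxjk", "kqwzxv"]
# A DGA-generated group diverges sharply from the benign character distribution:
print(kl_divergence(unigram_dist(dga), unigram_dist(benign)))
print(jaccard(benign, dga))
```

In practice, groups of domains sharing a TLD or resolving to the same IPs are compared against a reference distribution built from known-benign names.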

Dolberg et al. [16] described a system called *Multi-dimensional Aggregation Monitoring (MAM)* that detects anomalies in DNS data by measuring and comparing a "steadiness" metric over time for domain names and IP addresses using a tree-based mechanism. The steadiness metric captures how similar a domain's IP resolution patterns remain when DNS data are compared over a sequence of consecutive time frames; domain-name-to-IP mappings are aggregated, and their steadiness is measured. In terms of detecting malicious domains, the results showed that an average steadiness value of 0.45 could be used as a reasonable threshold, yielding a 73% true positive rate and only a 0.3% false positive rate. However, steadiness may not be a good indicator when fewer malicious activities are present (e.g., <10%).
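A drastically simplified illustration of the idea (not the authors' tree-based aggregation scheme) is to score a domain by the average Jaccard similarity of its resolved-IP sets across consecutive time frames:

```python
def steadiness(frames):
    """Average Jaccard similarity of a domain's resolved-IP set between
    consecutive time frames; 1.0 means perfectly stable resolution.
    Assumes at least two time frames."""
    scores = []
    for prev, curr in zip(frames, frames[1:]):
        union = prev | curr
        scores.append(len(prev & curr) / len(union) if union else 1.0)
    return sum(scores) / len(scores)

THRESHOLD = 0.45  # threshold value reported by the authors as a reasonable cut-off

# Hypothetical resolution histories: a stable domain vs. a fast-flux-like one.
stable_domain = [{"1.2.3.4"}, {"1.2.3.4"}, {"1.2.3.4", "1.2.3.5"}]
flux_domain   = [{"9.0.0.1"}, {"9.0.0.2"}, {"9.0.0.3"}]
print(steadiness(stable_domain) >= THRESHOLD)  # stable: above threshold
print(steadiness(flux_domain) < THRESHOLD)     # flux: flagged as anomalous
```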

However, the most common approach to identifying malicious domains is by means of machine learning (ML) and deep learning (DL) [11,14,20,23,24,27,28,34–42]. Researchers can train ML algorithms to label URLs as malicious or benign using a set of extracted features. Shi et al. [23] proposed a machine learning methodology, the closest to the one employed here, to detect malicious domain names using the Extreme Learning Machine (ELM) [19]. ELM is a single-hidden-layer feedforward neural network whose hidden weights are assigned randomly, offering high accuracy and a fast learning speed. The authors divided their features into four categories: construction-based, IP-based, TTL-based, and *WHOIS*-based. Their evaluation resulted in a high detection rate, with an accuracy exceeding 95%, and a fast learning speed. However, as shown below, a significant fraction of the features used in their work emerged as non-robust and ineffective in the presence of an intelligent adversary.
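ELM trains quickly because only the output weights are learned: the hidden-layer weights are drawn at random and the output weights are solved in closed form via a pseudoinverse. A minimal sketch on synthetic feature vectors (standing in for the construction/IP/TTL/WHOIS features; all data here are hypothetical):

```python
import numpy as np

rng = np.random.default_rng(0)

def elm_train(X, y, hidden=50):
    """Extreme Learning Machine: random input weights and biases, sigmoid
    hidden layer, output weights solved by least squares (pseudoinverse)."""
    W = rng.normal(size=(X.shape[1], hidden))
    b = rng.normal(size=hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))  # hidden-layer activations
    beta = np.linalg.pinv(H) @ y            # closed-form output weights
    return W, b, beta

def elm_predict(X, W, b, beta):
    H = 1.0 / (1.0 + np.exp(-(X @ W + b)))
    return (H @ beta > 0.5).astype(int)

# Toy feature matrix with a synthetic "malicious" label (illustration only):
X = rng.normal(size=(200, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(float)
W, b, beta = elm_train(X, y)
acc = (elm_predict(X, W, b, beta) == y).mean()
```

Because no iterative weight updates are needed, training reduces to a single linear solve, which explains the fast learning speed reported by the authors.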

Sun et al. [24] presented a system called *HinDom*, which generates a heterogeneous graph (in contrast to homogeneous graphs created by Rahbarinia et al. [22] and Yadav et al. [26]) in order to robustly identify malicious attacks (e.g., spam, phishing, malware, and botnets). Even though HinDom collected DNS and pDNS data, it also has the ability to collect information from various clients inside networks (e.g., CERNET2 and TUNET); thus, its perspective is different from the perspective of this study (i.e., client perspective). Nevertheless, HinDom has achieved remarkable results using a transductive classifier and achieved a high accuracy and F1-scores of 99% and 97.5%, respectively.

Bilge et al. [13] created a system called *Exposure*, which is designed to detect malicious domain names. Their system uses passive DNS data collected over a period of time to extract features related to known malicious and benign domains. Passive DNS replication [11,13,20,22,25,27,28] refers to the reconstruction of DNS zone data by recording and aggregating live DNS queries and responses; passive DNS data can be collected without requiring the cooperation of zone administrators. The Exposure system is designed to detect malware- and spam-related domains. It can also detect malicious fast-flux and DGA-related domains based on their unique features. The system computes the following four sets of features from anonymized DNS records: (a) time-based features related to the periods and frequencies in which a specific domain name was queried; (b) DNS-answer-based features calculated according to the number of distinct resolved IP addresses and domain names, the countries in which the IP addresses reside, and the ratio of the resolved IP addresses that can be matched with valid domain names and other services; (c) TTL-based features that are calculated based on a statistical analysis of the TTL over a given time series; and (d) domain name-based features that are extracted by computing the ratio of numerical characters to the length of the domain name string, and the ratio of the length of the longest meaningful substring in the domain name to that length. Using a Decision Tree model, Exposure reported a total of 100,261 distinct domains as malicious, which mapped to 19,742 unique IP addresses. The combination of features used to identify malicious domains led to the successful identification of several domains related to botnets, flux networks, and DGAs, with low false-positive and high detection rates. It may not be possible to generalize the detection rate reported by the authors (98%), since it was highly dependent on comparisons with biased datasets.
Despite the positive results, once an identification scheme is published, it is always possible for an attacker to evade detection by mimicking the behaviors of benign domains.
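The two domain name-based features in category (d) are easy to illustrate; the sketch below uses a toy word list (Exposure's actual dictionary and exact feature definitions may differ):

```python
def numeric_ratio(domain):
    """Fraction of characters in the name (TLD stripped) that are digits."""
    name = domain.split(".")[0]
    return sum(c.isdigit() for c in name) / len(name)

def longest_meaningful_ratio(domain, dictionary):
    """Length of the longest dictionary word contained in the name, relative
    to the name's length (brute-force substring scan for illustration)."""
    name = domain.split(".")[0]
    best = 0
    for i in range(len(name)):
        for j in range(i + 1, len(name) + 1):
            if name[i:j] in dictionary and j - i > best:
                best = j - i
    return best / len(name)

WORDS = {"bank", "mail", "secure", "login", "news"}  # illustrative word list
print(numeric_ratio("x9f3k2.biz"))                   # DGA-like names score high
print(longest_meaningful_ratio("securemail24.com", WORDS))
```

DGA-generated names tend to have a high numeric ratio and a low meaningful-substring ratio, which is what makes these features discriminative.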

Rahbarinia et al. [22] presented a system called *Segugio*, an anomaly detection system based on passive DNS traffic that identifies malware-controlled domain names based on their relationship to known malicious domains. The system detects malware-controlled domains by creating a machine-domain bipartite graph representing the underlying relations between new domains and known benign/malicious domains. The system operates by calculating the following features: (a) machine behavior, based on the ratio of "known malicious" and "unknown" machines that query a given domain d over the total number of machines that query d; the larger the total number of queries and the fraction of malicious-related queries, the higher the probability that d is a malware-controlled domain; (b) domain activity, computed, for a given time period, by counting the total number of days in which the domain was actively queried; and (c) IP abuse, which, given the set of IP addresses that the domain resolves to, represents the fraction of those IP addresses that were previously targeted by known malware-controlled domains. Using a Random Forest model, Segugio was shown to produce a high true positive rate and a very low false positive rate (94% and 0.1%, respectively). It was also able to detect malicious domains earlier than commercial blacklisting websites. However, Segugio can only detect malware-related domains based on their relationship to previously known domains and therefore cannot detect new malicious domains that are unrelated to previous ones. Additional information concerning malicious domain filtering and malicious URL detection can be found in [34,42].
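Segugio's three features can be sketched as simple computations over a toy machine-domain query log (hypothetical data and simplified definitions, not the authors' exact formulas):

```python
def machine_behavior(queries, domain, infected):
    """Fraction of the machines querying `domain` that are known-infected."""
    machines = {m for m, d in queries if d == domain}
    return len(machines & infected) / len(machines) if machines else 0.0

def domain_activity(query_days):
    """Number of distinct days on which the domain was actively queried."""
    return len(set(query_days))

def ip_abuse(resolved_ips, abused_ips):
    """Fraction of the domain's resolved IPs previously used by known
    malware-controlled domains."""
    return len(resolved_ips & abused_ips) / len(resolved_ips) if resolved_ips else 0.0

# Toy machine -> domain query log (edges of the bipartite graph):
queries  = [("m1", "evil.biz"), ("m2", "evil.biz"), ("m3", "evil.biz"), ("m3", "cnn.com")]
infected = {"m1", "m2"}
print(machine_behavior(queries, "evil.biz", infected))  # 2 of 3 querying machines infected
print(domain_activity([1, 1, 2, 5]))                    # active on 3 distinct days
print(ip_abuse({"6.6.6.6", "7.7.7.7"}, {"6.6.6.6"}))    # half of its IPs already abused
```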

Adversarial machine learning is a subfield of machine learning in which the instances used to train the model and the instances encountered in the wild may be characterized by different distributions. For example, an attacker may apply perturbations to a malicious instance so that it is falsely classified as benign. Such manipulated instances are commonly called *adversarial examples (AE)* [43]. AE are samples that an attacker modifies based on some knowledge of the model's classification function. These examples differ only slightly from correctly classified examples; therefore, the model fails to classify them correctly. AE are widely used in the fields of spam filtering [44], network intrusion detection systems (IDS) [45], anti-virus signature tests [46], and biometric recognition [47].

Attackers commonly follow one of two models to generate adversarial examples: (1) the white-box model [48–51], in which the attacker has full knowledge of the classifier and the training/test data, and (2) the black-box model [48,52,53], in which the attacker only has access to the model's output for each given input. Various methods have emerged to counter AE-based attacks and make ML models robust. The most promising are those based on game-theoretic approaches [54–56], robust optimization [48,49,57], and adversarial retraining [30,58,59]. These approaches mainly concern *feature-space models* of attacks, which assume that the attacker changes the values of features directly. Note that such attacks may be an abstraction of reality, as arbitrary modifications to feature values may not be realizable or may not preserve the functionality of the manipulated instance.
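The following minimal sketch (synthetic two-feature data, a simple logistic-regression classifier, and a toy feature-space attack; all values are hypothetical) illustrates both an evasion attack and adversarial retraining, in which the evasive samples are added back to the training set with their true label:

```python
import numpy as np

rng = np.random.default_rng(1)

def train(X, y, lr=0.1, epochs=1000):
    """Minimal logistic regression trained by full-batch gradient descent."""
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        p = 1 / (1 + np.exp(-(X @ w)))
        w -= lr * X.T @ (p - y) / len(y)
    return w

def evade(X_mal, w, eps=8.0):
    """Feature-space evasion: shift one attacker-controlled feature against
    sign(w) so the score drops below the decision boundary. This abstracts
    away whether the change is realizable without breaking functionality."""
    X_adv = X_mal.copy()
    X_adv[:, 0] -= eps * np.sign(w[0])
    return X_adv

# Synthetic benign (label 0) and malicious (label 1) instances, two features:
n = 100
X = np.vstack([rng.normal(-3, 0.3, (n, 2)), rng.normal(3, 0.3, (n, 2))])
y = np.array([0] * n + [1] * n)

w = train(X, y)
adv = evade(X[y == 1], w)
fooled = float(((adv @ w) < 0).mean())          # evasion rate vs. original model

# Adversarial retraining: append the evasive samples with their true label.
w_robust = train(np.vstack([X, adv]), np.concatenate([y, np.ones(n)]))
caught = float(((adv @ w_robust) > 0).mean())   # evasive samples now detected
```

The retrained model recovers because the attacker could only move one feature, so the evasive samples remain separable from benign ones along the untouched feature.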

Note that the topic of robust feature selection has attracted an increasing number of researchers in recent years [30,60,61]. In the domain of PDF malware, Tong et al. [30] extracted a set of features, termed "conserved features," that the adversary cannot unilaterally modify without compromising malicious functionality. In the domain of APK malware, Chen et al. [60] demonstrated the need for robust feature selection with their tool, Android HIV, which takes advantage of non-robust features to easily bypass state-of-the-art Android malware classifiers.
