DoH Tunneling Detection System for Enterprise Network Using Deep Learning Technique
Abstract
1. Introduction
- We propose a two-layer detection approach based on the Transformer architecture as an effective DoH tunneling detection model. The first layer separates DoH traffic from other HTTPS traffic. The second layer separates legitimate DoH traffic from malicious DoH traffic.
- We implement a fully functional DoH tunneling detection solution that can be integrated into an enterprise network’s security operation system. It is an end-to-end system that collects HTTPS traffic traces, stores and analyzes them to detect malicious DoH traffic, and alerts the network manager.
- Finally, we conduct comprehensive experiments to evaluate the proposed scheme. The results show that our method outperforms the compared approaches, reaching an F1-score of 0.99 in the first layer and 0.994 in the second layer, versus at most 0.94 for the baselines.
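
The two-layer detection flow in the first contribution can be sketched as a simple cascade. This is a minimal illustration, not the authors' implementation. In the paper, both layers are Transformer classifiers. Here they are abstracted as plain callables so that the control flow stands on its own. The label strings and the toy threshold rules are hypothetical.

```python
from typing import Callable, Sequence

Classifier = Callable[[Sequence[float]], bool]

# Hypothetical label names; the paper's exact labels are not given here.
NON_DOH_HTTPS = "non-DoH HTTPS"
BENIGN_DOH = "benign DoH"
MALICIOUS_DOH = "malicious DoH"


def classify_flow(features: Sequence[float], layer1_is_doh: Classifier,
                  layer2_is_malicious: Classifier) -> str:
    """Layer 1 separates DoH from other HTTPS; layer 2 runs only on flows labeled DoH."""
    if not layer1_is_doh(features):
        return NON_DOH_HTTPS
    return MALICIOUS_DOH if layer2_is_malicious(features) else BENIGN_DOH


if __name__ == "__main__":
    # Toy threshold rules standing in for the two trained models (illustration only).
    layer1 = lambda f: f[0] > 0.5
    layer2 = lambda f: f[1] > 0.5
    print(classify_flow([0.9, 0.2], layer1, layer2))  # benign DoH
```

A practical benefit of the cascade is that the second model only needs to process flows that the first model has already identified as DoH.
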
2. Related Works
2.1. DNS Attacks and Countermeasures Methods
2.2. DoH Tunneling Detection Methods
3. Background Knowledge
3.1. Domain Name System (DNS)
3.2. DNS-over-HTTPS
3.3. DNS-over-HTTPS Vulnerabilities in Enterprise Network
3.4. Transformer Architecture
- Encoder: The original Transformer encoder has six identical layers, each composed of a multi-head self-attention sublayer and a fully connected feed-forward sublayer. Each of these sublayers is wrapped in a residual connection followed by layer normalization. In our work, we use only four layers.
- Decoder: The original decoder also has six identical layers. In addition to the two sublayers found in each encoder layer, each decoder layer contains a third sublayer that performs multi-head attention over the output of the encoder stack.
- Attention Mechanism: The attention component identifies the critical and relevant elements of the input sequence and assigns them higher weights, which improves the accuracy of the output prediction.
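- Scaled Dot-Product Attention: Scaled dot-product attention computes the output as a weighted sum of the values. The weight assigned to each value is obtained from the dot product of the query with all the keys, divided by $\sqrt{d_k}$ (where $d_k$ is the key dimension) and normalized with a softmax:
  $$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{QK^{\top}}{\sqrt{d_k}}\right)V$$
- Multi-Head Attention: The architecture of multi-head attention is shown in Figure 4. Instead of computing attention only once, the multi-head attention mechanism runs scaled dot-product attention several times in parallel on different learned projections of the queries, keys, and values. The separate attention outputs are concatenated and linearly projected to the expected dimension. For input matrices (Q, K, V), multi-head attention is calculated as:
  $$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h)\,W^{O}, \qquad \mathrm{head}_i = \mathrm{Attention}(QW_i^{Q}, KW_i^{K}, VW_i^{V})$$
  Here, $h$ is the number of heads, and $W_i^{Q}$, $W_i^{K}$, $W_i^{V}$, and $W^{O}$ are learned projection matrices.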
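
This subsection describes scaled dot-product attention, multi-head attention, and the residual-plus-normalization wrapping of each encoder sublayer. All three can be sketched in NumPy as follows. This is a minimal forward-pass illustration under stated assumptions, not the authors' model:

- The weights are random rather than trained.
- Each projection is a single $d_{model} \times d_{model}$ matrix split column-wise into heads. This is equivalent to separate per-head matrices.
- Layer normalization omits its learnable gain and bias.
- The feed-forward sublayer is left out.

The 4 heads and the sequence length of 128 follow the hyperparameter table. The value `d_model = 32` is an arbitrary choice.

```python
import numpy as np


def softmax(x: np.ndarray, axis: int = -1) -> np.ndarray:
    # Subtract the max before exponentiating for numerical stability.
    z = np.exp(x - x.max(axis=axis, keepdims=True))
    return z / z.sum(axis=axis, keepdims=True)


def scaled_dot_product_attention(q: np.ndarray, k: np.ndarray, v: np.ndarray):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V for 2-D (seq_len, d) arrays."""
    d_k = q.shape[-1]
    weights = softmax(q @ k.T / np.sqrt(d_k), axis=-1)  # each row sums to 1
    return weights @ v, weights


def multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads: int) -> np.ndarray:
    """Self-attention with num_heads heads.

    Each (d_model, d_model) projection is split column-wise into num_heads
    blocks, which is equivalent to using separate per-head matrices W_i.
    """
    d_model = x.shape[-1]
    assert d_model % num_heads == 0, "d_model must be divisible by num_heads"
    d_head = d_model // num_heads
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    heads = []
    for h in range(num_heads):
        cols = slice(h * d_head, (h + 1) * d_head)
        out, _ = scaled_dot_product_attention(q[:, cols], k[:, cols], v[:, cols])
        heads.append(out)
    # Concatenate the heads and apply the output projection W^O.
    return np.concatenate(heads, axis=-1) @ w_o


def layer_norm(x: np.ndarray, eps: float = 1e-6) -> np.ndarray:
    # Normalize each position's feature vector (learnable gain and bias omitted).
    mean = x.mean(axis=-1, keepdims=True)
    var = x.var(axis=-1, keepdims=True)
    return (x - mean) / np.sqrt(var + eps)


def encoder_attention_sublayer(x, w_q, w_k, w_v, w_o, num_heads: int) -> np.ndarray:
    # Residual connection followed by layer normalization (post-norm, as in the original Transformer).
    return layer_norm(x + multi_head_attention(x, w_q, w_k, w_v, w_o, num_heads))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    seq_len, d_model, num_heads = 128, 32, 4  # d_model = 32 is an arbitrary illustrative value
    x = rng.standard_normal((seq_len, d_model))
    w_q, w_k, w_v, w_o = (0.1 * rng.standard_normal((d_model, d_model)) for _ in range(4))
    print(encoder_attention_sublayer(x, w_q, w_k, w_v, w_o, num_heads).shape)  # (128, 32)
```

With one head and identity projections, `multi_head_attention` reduces exactly to a single scaled dot-product attention over the input. This is a quick way to sanity-check the head-splitting logic.
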
3.5. ELK Architecture
- Elasticsearch is a NoSQL database with RESTful APIs, based on the Lucene search engine. It is a highly adaptable, distributed search and analytics engine. It supports powerful queries for in-depth analysis and stores all data centrally for fast document search. It also scales horizontally, allowing fast deployment, high reliability, and control.
- Logstash is a data collection pipeline tool. It is the first stage of the ELK pipeline: it collects data and feeds it to Elasticsearch. It ingests many kinds of data from numerous sources simultaneously and makes them immediately available for further use.
- Kibana is a data visualization tool. It visualizes Elasticsearch data and gives developers fast access to it. The Kibana dashboard presents sophisticated queries as interactive charts, geographical information, timelines, and diagrams. Kibana allows network managers to create and save individual charts tailored to their specific requirements.
- Beats are lightweight agents installed on each client station to collect logs and send them to the ELK host.
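
In this pipeline, Beats and Logstash move data into Elasticsearch, and Kibana reads it back out. The sketch below shows only the step where a detection result is stored in Elasticsearch and later queried. It uses the Elasticsearch REST document index endpoint (`POST /<index>/_doc`) and a Query DSL `bool` filter. It is an illustration, not the authors' integration. The host URL, the index name `doh-alerts`, and all field names are hypothetical. The `term` filter also assumes that `label` is indexed as a keyword field.

```python
import json
import urllib.request
from datetime import datetime, timezone

ES_URL = "http://localhost:9200"  # assumption: a local node on Elasticsearch's default HTTP port
INDEX = "doh-alerts"              # hypothetical index name


def build_index_request(src_ip: str, dst_ip: str, label: str) -> urllib.request.Request:
    """Build (but do not send) a POST /<index>/_doc request that stores one detection result."""
    doc = {
        "@timestamp": datetime.now(timezone.utc).isoformat(),
        # Hypothetical field names chosen for illustration.
        "src_ip": src_ip,
        "dst_ip": dst_ip,
        "label": label,
    }
    return urllib.request.Request(
        url=f"{ES_URL}/{INDEX}/_doc",
        data=json.dumps(doc).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def recent_malicious_query(window: str = "now-1h") -> dict:
    """Query DSL body for POST /<index>/_search: malicious flows newer than `window`."""
    return {
        "query": {
            "bool": {
                "filter": [
                    {"term": {"label": "malicious"}},
                    {"range": {"@timestamp": {"gte": window}}},
                ]
            }
        }
    }
```

Passing the request to `urllib.request.urlopen` would actually send it. A Kibana visualization or a periodic alerting job could then run a search body like `recent_malicious_query()` against `/doh-alerts/_search`. In a production stack, this indexing step would more typically go through Logstash's Elasticsearch output or an official client.
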
3.6. Security Operation Center (SOC) in Enterprise Network
- Prevention and detection: In cybersecurity, prevention is more effective than response. Rather than responding only after attacks occur, an SOC continuously monitors the network to detect them early. As a result, the SOC team can spot malicious activity and stop it before it causes any harm. When SOC analysts notice something unusual, they gather as much information as possible to conduct a more thorough investigation.
- Investigation: At the investigation stage, the SOC team analyzes anomalous activity to establish the nature of the threat and how far it has penetrated the infrastructure. The security analyst examines the company’s network and activities from an attacker’s perspective, looking for key indicators and weak points before they are exploited. By understanding how attacks escalate and responding before they get out of control, the analyst identifies and triages various security problems. For an effective defense, the SOC analyst combines knowledge of the enterprise network with the latest global threat intelligence, including specifics on attacker tools, techniques, and trends.
- Response: Following the investigation, the SOC team plans a response to address the issue. As soon as an incident is confirmed, the SOC acts as the first responder: it isolates endpoints, terminates malicious programs, prevents them from executing, removes them, and more.
4. Model Development and System Architecture
4.1. DoH Traffic Collection and Statistical Feature Extraction
- Browser: Google Chrome, Firefox.
- DoH Server: AdGuard, Cloudflare, Google, and Quad9.
- Traffic capture: tcpdump [41].
- Data rate: randomly chosen between 100 B/s and 1000 B/s.
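
The statistical features extracted from each flow are listed in the feature table. Two groups of them can be sketched with Python's `statistics` module: the packet-length statistics (features 6 to 13) and the sent/received rates (features 3 and 5). The exact formulas used by the authors' extraction tool are not given in this section, so the following definitions are assumptions:

- Variance is the population variance.
- "Skew from median" is Pearson's second skewness coefficient, 3(mean − median)/std.
- "Skew from mode" is Pearson's first coefficient, (mean − mode)/std.
- The coefficient of variation is std/mean.
- A rate is bytes divided by the flow duration.

The same helper could be applied to the packet-time and response-time series (features 14 to 29), whose precise definitions are not specified here.

```python
import statistics
from typing import Dict, Sequence


def distribution_features(prefix: str, values: Sequence[float]) -> Dict[str, float]:
    """Eight summary statistics of one non-empty series (e.g. packet lengths).

    Assumed definitions: population variance, Pearson's second skewness
    coefficient for "skew from median" and Pearson's first for "skew from mode".
    """
    mean = statistics.fmean(values)
    median = statistics.median(values)
    mode = statistics.mode(values)  # Python 3.8+: returns the first mode on ties
    var = statistics.pvariance(values)
    std = var ** 0.5
    return {
        f"{prefix}_variance": var,
        f"{prefix}_std": std,
        f"{prefix}_mean": mean,
        f"{prefix}_median": median,
        f"{prefix}_mode": mode,
        f"{prefix}_skew_from_median": 3 * (mean - median) / std if std else 0.0,
        f"{prefix}_skew_from_mode": (mean - mode) / std if std else 0.0,
        f"{prefix}_cv": std / mean if mean else 0.0,
    }


def flow_rate(total_bytes: int, duration_s: float) -> float:
    """Bytes per second over the flow duration (assumed definition of the rate features)."""
    return total_bytes / duration_s if duration_s > 0 else 0.0


if __name__ == "__main__":
    lengths = [100, 100, 200, 400]  # toy packet lengths in bytes
    feats = distribution_features("pkt_len", lengths)
    print(feats["pkt_len_mean"], feats["pkt_len_median"], feats["pkt_len_mode"])  # 200.0 150.0 100
```

In the toy series, the mean (200) exceeds both the median (150) and the mode (100), so both skew features come out positive, reflecting the long tail toward the 400-byte packet.
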
4.2. Training Process
4.3. System Implementation
5. Results and Evaluation
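- Precision: the number of true positives (TP) divided by the number of true positives plus the number of false positives (FP), i.e., $\mathrm{Precision} = TP/(TP + FP)$.
- Recall: the number of true positives divided by the number of true positives plus the number of false negatives (FN), i.e., $\mathrm{Recall} = TP/(TP + FN)$.
- F1-score: the harmonic mean of precision and recall, i.e., $F_1 = 2 \cdot \mathrm{Precision} \cdot \mathrm{Recall}/(\mathrm{Precision} + \mathrm{Recall})$.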
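
The three metrics defined above can be computed from confusion-matrix counts as follows. This is a small sketch. Returning 0.0 when a denominator is zero is a convention chosen here, and the counts in the usage example are hypothetical, not results from this paper.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Precision = TP/(TP+FP), Recall = TP/(TP+FN), F1 = harmonic mean of the two."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical confusion counts, not results from this paper.
    p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
    print(f"precision={p:.3f} recall={r:.3f} f1={f1:.3f}")  # precision=0.900 recall=0.750 f1=0.818
```

In this example, the F1-score (about 0.818) is slightly below the arithmetic mean of precision and recall (0.825). This is because the harmonic mean penalizes imbalance between the two.
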
6. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Houser, R.; Hao, S.; Li, Z.; Liu, D.; Cotton, C.; Wang, H. A Comprehensive Measurement-based Investigation of DNS Hijacking. In Proceedings of the 2021 40th International Symposium on Reliable Distributed Systems (SRDS), Chicago, IL, USA, 20–23 September 2021; pp. 210–221.
- Stewart, J. DNS Cache Poisoning—The Next Generation. 2003. Available online: https://www.ida.liu.se/~TDDD17/literature/dnscache.pdf (accessed on 23 February 2022).
- Vaughn, R. DNS Amplification Attacks (Preliminary Release). 2006. Available online: http://index-of.es/Tutorials/AstalaVista/dns-amplification-attacks.pdf (accessed on 23 February 2022).
- Wang, Y.; Zhou, A.; Liao, S.; Zheng, R.; Hu, R.; Zhang, L. A comprehensive survey on DNS tunnel detection. Comput. Networks 2021, 197, 108322.
- OpenDNS. Available online: https://www.opendns.com/ (accessed on 23 December 2021).
- DNSFilter. Available online: https://www.dnsfilter.com/ (accessed on 23 December 2021).
- Infoblox BloxOne Threat Defense. Available online: https://www.infoblox.com/products/bloxone-threat-defense/ (accessed on 23 December 2021).
- Bieler, D.; Kindness, A. The Enterprise Network Enables Business Innovation. Available online: https://www.hughes.com/ (accessed on 9 February 2021).
- Domain Name System—Request For Comments 1034. Available online: https://tools.ietf.org/html/rfc1034 (accessed on 9 February 2022).
- Carli, F. Security Issues with DNS. Available online: https://www.sans.org/white-papers/1069/ (accessed on 9 February 2022).
- Kim, T.H.; Reeves, D. A survey of domain name system vulnerabilities and attacks. J. Surveill. Secur. Saf. 2020, 1, 34–60.
- Guha, S.; Francis, P. Identity trail: Covert surveillance using DNS. In Proceedings of the International Workshop on Privacy Enhancing Technologies (PET), Ottawa, ON, Canada, 20–22 June 2007.
- Hoffman, P.; McManus, P. RFC 8484—DNS Queries over HTTPS. October 2018. Available online: https://datatracker.ietf.org/doc/html/rfc8484 (accessed on 23 May 2021).
- Mockapetris, P. Domain Names—Implementation and Specification. Available online: https://datatracker.ietf.org/doc/html/rfc1035 (accessed on 23 February 2022).
- Montazeri Shatoori, M.; Davidson, L.; Kaur, G.; Lashkari, A.H. Detection of DoH Tunnels using Time-series Classification of Encrypted Traffic. In Proceedings of the 2020 IEEE International Conference on Dependable, Autonomic and Secure Computing, International Conference on Pervasive Intelligence and Computing, International Conference on Cloud and Big Data Computing, International Conference on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), Calgary, AB, Canada, 17–22 August 2020; pp. 63–70.
- Mitsuhashi, R.; Satoh, A.; Jin, Y.; Iida, K.; Shinagawa, T.; Takai, Y. Identifying Malicious DNS Tunnel Tools from DoH Traffic Using Hierarchical Machine Learning Classification. In Information Security. ISC 2021; Lecture Notes in Computer Science; Liu, J.K., Katsikas, S., Meng, W., Susilo, W., Intan, R., Eds.; Springer: Cham, Switzerland, 2021; Volume 13118.
- Singh, S.K.; Roy, P.K. Detecting Malicious DNS over HTTPS Traffic Using Machine Learning. In Proceedings of the 2020 International Conference on Innovation and Intelligence for Informatics, Computing and Technologies (3ICT), Sakheer, Bahrain, 20–21 December 2020; pp. 1–6.
- De Vries, L.J.W. Detection of DoH Tunnelling: Comparing Supervised with Unsupervised Learning. 2021. Available online: http://essay.utwente.nl/88335/ (accessed on 23 December 2021).
- Rezaei, S.; Liu, X. Deep Learning for Encrypted Traffic Classification: An Overview. IEEE Commun. Mag. 2019, 57, 76–81.
- Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. arXiv 2017, arXiv:1706.03762.
- CISA Alert (TA13-088A) DNS Amplification Attacks. Available online: https://www.cisa.gov/uscert/ncas/alerts/TA13-088A (accessed on 23 February 2022).
- Maksutov, A.A.; Cherepanov, I.A.; Alekseev, M.S. Detection and prevention of DNS spoofing attacks. In Proceedings of the 2017 Siberian Symposium on Data Science and Engineering (SSDSE), Novosibirsk, Russia, 12–13 April 2017; pp. 84–87.
- Arends, R.; Telematica Instituut. “DNS Security Introduction and Requirements” RFC4033. Available online: https://datatracker.ietf.org/doc/html/rfc4033 (accessed on 9 February 2022).
- Aiello, M.; Mongelli, M.; Papaleo, G. DNS tunneling detection through statistical fingerprints of protocol messages and machine learning. Int. J. Commun. Syst. 2015, 28, 1987–2002.
- Bottger, T.; Cuadrado, F.; Antichi, G.; Fernandes, E.L.; Tyson, G.; Castro, I.; Uhlig, S. An Empirical Study of the Cost of DNS-over-HTTPS. In Proceedings of the Internet Measurement Conference, Amsterdam, The Netherlands, 21–23 October 2019; pp. 15–21.
- Hounsel, A.; Borgolte, K.; Schmitt, P.; Holland, J.; Feamster, N. Analyzing the costs (and benefits) of DNS, DoT, and DoH for the modern web. In Proceedings of the Applied Networking Research Workshop, Montreal, QC, Canada, 22 July 2019; pp. 20–22.
- Wijenbergh, J.; Moonsamy, V.; van Rijsdijk-Deij, R.; Kuijsters, D.D. Performance comparison of DNS over HTTPS to Unencrypted DNS. Bachelor’s Thesis, Radboud University, Nijmegen, The Netherlands, 2019.
- Siby, S.; Juarez, M.; Vallina-Rodriguez, N.; Troncoso, C. DNS Privacy not So Private: The Traffic Analysis Perspective. 2018. Available online: https://cpb-us-e1.wpmucdn.com/sites.usc.edu/dist/5/475/files/2020/01/hotpets18.pdf (accessed on 23 December 2021).
- Siby, S.; Juarez, M.; Diaz, C.; Vallina-Rodriguez, N.; Troncoso, C. Encrypted DNS → Privacy? A Traffic Analysis Perspective. arXiv 2019, arXiv:1906.09682.
- Bushart, J.; Rossow, C. Padding Ain’t Enough: Assessing the Privacy Guarantees of Encrypted DNS. arXiv 2019, arXiv:1907.01317.
- Lu, C.; Liu, B.; Li, Z.; Hao, S.; Duan, H.; Zhang, M.; Leng, C.; Liu, Y.; Zhang, Z.; Wu, J. An end-to-end, large-scale measurement of DNS-over-Encryption: How far have we come? In Proceedings of the Internet Measurement Conference, Amsterdam, The Netherlands, 21–23 October 2019; pp. 22–35.
- Nijeboer, F. Detection of HTTPS Encrypted DNS Traffic. Bachelor’s Thesis, University of Twente, Enschede, The Netherlands, 2020.
- Hjelm, D. A New Needle and Haystack: Detecting DNS over HTTPS Usage. Available online: https://www.sans.org/white-papers/39160/ (accessed on 23 December 2021).
- Godlua Backdoor. Available online: https://blog.netlab.360.com/an-analysis-of-godlua-backdoor-en/ (accessed on 23 December 2021).
- What Is the ELK Stack? Available online: https://www.elastic.co/what-is/elk-stack (accessed on 23 December 2021).
- Checkpoint SOC. Available online: https://www.checkpoint.com/cyber-hub/threat-prevention/what-is-soc/ (accessed on 23 December 2021).
- Alexa Traffic Rank. Available online: https://www.alexa.com/topsites (accessed on 23 December 2021).
- dns2tcp. Available online: https://github.com/alex-sector/dns2tcp (accessed on 23 December 2021).
- DNSCat2. Available online: https://github.com/iagox86/dnscat2 (accessed on 23 December 2021).
- Iodine. Available online: https://github.com/yarrick/iodine (accessed on 23 December 2021).
- tcpdump. Available online: https://github.com/the-tcpdump-group/tcpdump (accessed on 23 December 2021).
- CICFlowMeter. Available online: https://github.com/ahlashkari/CICFlowMeter (accessed on 23 December 2021).
Author | Method | Drawback |
---|---|---|
M. Montazeri Shatoori [15] | Supervised machine learning technique | Requires a large amount of labeled data |
R. Mitsuhashi [16] | Supervised machine learning technique | Requires a large amount of labeled data |
S. K. Singh [17] | Supervised machine learning technique | Only distinguishes DoH traffic from traditional DNS traffic |
D. Hjelm [33] | Application fingerprinting | Inflexible; requires building an application fingerprinting database |

No. | Feature Name |
---|---|
1 | Duration |
2 | Flow bytes sent |
3 | Flow sent rate |
4 | Flow bytes received |
5 | Flow received rate |
6 | Packet length variance |
7 | Packet length standard deviation |
8 | Packet length mean |
9 | Packet length median |
10 | Packet length mode |
11 | Packet length skew from median |
12 | Packet length skew from mode |
13 | Packet length coefficient of variation |
14 | Packet time variance |
15 | Packet time standard deviation |
16 | Packet time mean |
17 | Packet time median |
18 | Packet time mode |
19 | Packet time skew from median |
20 | Packet time skew from mode |
21 | Packet time coefficient of variation |
22 | Response time variance |
23 | Response time standard deviation |
24 | Response time mean |
25 | Response time median |
26 | Response time mode |
27 | Response time skew from median |
28 | Response time skew from mode |
29 | Response time coefficient of variation |

Hyperparameters | Values |
---|---|
Number of attention heads | 4 |
Sequence length | 128 |
Number of Transformer Blocks | 4 |
Dropout | 0.25 |
Batch size | 64 |
Epochs | 120 |
Optimizer | ADAM |

Model | Number of Instances in Training Data (Layer 1/Layer 2) | Precision (Layer 1/Layer 2) | Recall (Layer 1/Layer 2) | F1-Score (Layer 1/Layer 2) |
---|---|---|---|---|
RF [15] | 233,427/53,928 | 0.92/0.93 | 0.92/0.93 | 0.92/0.93 |
C4.5 [15] | 233,427/53,928 | 0.92/0.93 | 0.92/0.92 | 0.92/0.92 |
2D CNN [16] | 233,427/53,928 | 0.91/0.91 | 0.91/0.91 | 0.91/0.91 |
XGBoost [17] | 233,427/53,928 | 0.94/0.94 | 0.94/0.94 | 0.94/0.94 |
Transformer (our work) | 233,427/53,928 | 0.99/0.994 | 0.99/0.994 | 0.99/0.994 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.

© 2022 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Nguyen, T.A.; Park, M. DoH Tunneling Detection System for Enterprise Network Using Deep Learning Technique. Appl. Sci. 2022, 12, 2416. https://doi.org/10.3390/app12052416