A Survey on TLS-Encrypted Malware Network Traffic Analysis Applicable to Security Operations Centers
Abstract
1. Introduction
- TLS is a widely used end-to-end encryption protocol with a wide variety of applications in diverse configurations [28]. At the same time, various malware families (notably Trickbot and Dridex) abuse TLS encryption [29,30], which has become one of the biggest challenges faced by SOCs in recent years. Moreover, the fraction of TLS-encrypted flows among malware flows is increasing dramatically: an industry report from April 2021 stated that nearly half of malware uses TLS [31], and another reported that in the second quarter of 2021, 91.5% of malware arrived over TLS-encrypted connections [32]. Because this survey primarily discusses NTA methods applicable to malware detection and malware family classification, both security experts in SOCs and researchers in academia can obtain useful information from it.
- While many surveys focus only on comparing existing methods, we also cover industrial and community efforts on so-called TLS fingerprinting techniques. As with many data-driven approaches, the performance of NTA depends directly on the quality of the dataset. Fortunately, several open source threat intelligence (OSINT) feeds [33,34] now provide TLS fingerprint information. Through our discussion, we therefore show how integrating such information can lead to better traffic analysis results.
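As an illustration of how TLS fingerprints are commonly computed and matched against such feeds, the sketch below follows the publicly described JA3 scheme [99]: the ClientHello version, cipher suites, extensions, elliptic curves, and elliptic curve point formats are written as decimal values, joined with dashes within a field and commas between fields, with GREASE values (RFC 8701) skipped, and the resulting string is hashed with MD5. It assumes the ClientHello has already been parsed; the `feed` dictionary, its label, and the helper names are hypothetical placeholders for entries from an OSINT feed such as [33], whose actual download format is not modeled here.

```python
import hashlib
from typing import Dict, Optional, Sequence

# GREASE values reserved by RFC 8701: 0x0A0A, 0x1A1A, ..., 0xFAFA.
GREASE = {(b << 8) | b for b in range(0x0A, 0x100, 0x10)}


def _field(values: Sequence[int]) -> str:
    """Join one ClientHello field as dash-separated decimals, skipping GREASE values."""
    return "-".join(str(v) for v in values if v not in GREASE)


def ja3_string(version: int, ciphers: Sequence[int], extensions: Sequence[int],
               curves: Sequence[int], point_formats: Sequence[int]) -> str:
    """Concatenate the five JA3 fields, comma-separated, in JA3 order."""
    return ",".join([str(version), _field(ciphers), _field(extensions),
                     _field(curves), _field(point_formats)])


def ja3_hash(*fields) -> str:
    """Hex MD5 digest of the JA3 string, the form in which JA3 fingerprints are usually shared."""
    return hashlib.md5(ja3_string(*fields).encode("ascii")).hexdigest()


def lookup(fingerprint: str, feed: Dict[str, str]) -> Optional[str]:
    """Return the label attached to a fingerprint in a (hypothetical) OSINT feed, if any."""
    return feed.get(fingerprint)


# Usage with made-up ClientHello values (0x1A1A and 0x2A2A are GREASE and get dropped).
hello = (771, [0x1A1A, 49195, 49199], [0x2A2A, 0, 10, 11], [29, 23], [0])
print(ja3_string(*hello))  # 771,49195-49199,0-10-11,29-23,0
feed = {ja3_hash(*hello): "hypothetical-family"}  # placeholder feed entry
print(lookup(ja3_hash(*hello), feed))  # hypothetical-family
```

Because only the digest is shared, two clients that differ in any field value produce unrelated fingerprints: exact-match lookups against a feed are cheap, but the hash carries no notion of similarity between fingerprints.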
2. Background
2.1. Basics of SSL/TLS
2.2. The Goals of Network Traffic Analysis for SOCs
- Malware detection: In malware (traffic) detection, NTA is used to detect network traffic that contains various types of malicious content or is associated with malicious applications. Traditionally, malicious traffic is detected using pre-configured rules for known attacks, but machine learning-based detection has been proposed as a complement to signature-based network intrusion detection systems [44]. Because malware detection methods typically rely on accumulated attack knowledge, collecting and regularly updating the knowledge base is important in SOCs [2].
- (Network) anomaly detection: Network anomaly detection, or anomaly-based intrusion detection, is the problem of detecting exceptional patterns in network traffic that can be distinguished from the expected normal traffic pattern [45]. A broad range of anomaly detection techniques, such as statistical, unsupervised, and rule-based techniques, have been proposed in the literature [46]. Furthermore, deep learning-based anomaly detection systems are being actively discussed [47]. However, in real-world SOCs, the judgment of human security experts may be trusted more than automated methods, so some SOCs use or develop practical machine learning-based anomaly detection solutions combined with information visualization [48,49]; such solutions are out of our scope.
- Application identification: NTA for application identification recognizes network traffic generated by particular applications, including unauthorized ones. This can be used to enforce specific policies in SOCs (e.g., blocking Amazon traffic during work hours). Recently, especially for mobile traffic, several machine learning-based solutions have been proposed that identify mobile applications and even user actions [21,22]; in the context of SOCs, this is sometimes called user behavior analytics (UBA) [50]. While malware family classification can be seen as a variant of the conventional application identification problem, to the best of our knowledge, no existing NTA method identifies fine-grained malware behavior from encrypted traffic.
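The complementary roles of signature-based matching and statistical anomaly detection described above can be illustrated with a toy two-stage detector. The sketch below is our own simplification rather than a method from [44] or [46]: the indicator set stands in for the SOC knowledge base, while the single feature (`bytes_out`), the 3-standard-deviation threshold, and the names `Flow` and `HybridDetector` are hypothetical choices for illustration.

```python
import statistics
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Flow:
    dst_ip: str
    bytes_out: float  # client-to-server bytes in the flow (illustrative feature)


class HybridDetector:
    """Stage 1: match known indicators (signature-style).
    Stage 2: flag flows whose feature deviates strongly from a benign baseline (z-score)."""

    def __init__(self, indicators: Iterable[str], baseline: Iterable[float], threshold: float = 3.0):
        self.indicators = set(indicators)
        values = list(baseline)
        self.mean = statistics.fmean(values)
        self.stdev = statistics.pstdev(values)
        self.threshold = threshold

    def z_score(self, value: float) -> float:
        if self.stdev == 0:
            # Degenerate baseline: any deviation at all is treated as infinitely anomalous.
            return 0.0 if value == self.mean else float("inf")
        return abs(value - self.mean) / self.stdev

    def classify(self, flow: Flow) -> str:
        if flow.dst_ip in self.indicators:
            return "known-malicious"  # covered by the knowledge base
        if self.z_score(flow.bytes_out) > self.threshold:
            return "anomalous"  # not covered by any indicator, but unusual
        return "normal"


# Documentation-range addresses and made-up byte counts.
detector = HybridDetector(["203.0.113.7"], [1000, 1100, 900, 1050, 950])
print(detector.classify(Flow("203.0.113.7", 1000)))   # known-malicious
print(detector.classify(Flow("198.51.100.2", 5000)))  # anomalous
print(detector.classify(Flow("198.51.100.2", 1020)))  # normal
```

A practical system would use richer features and models; the point here is only that the second stage can flag flows that no stored indicator covers, which is why the knowledge base and the statistical model complement each other.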
3. The Deployment Models
3.1. TLS Interception without Private Key
3.2. Inspection Using Cryptographic Functions
3.2.1. TLS Inspection with a Private Key
3.2.2. Privacy-Preserving Inspection through Searchable Encryption
3.3. Inspection without Decryption
4. Machine Learning Pipeline for Passive Inspection of TLS-Encrypted Traffic
4.1. Traffic Sniffing
4.2. Collecting Flow Records
4.3. Feature Extraction
- Variable-size sequential data type: The TLS message type sequence, packet length sequence, interarrival time sequence, and time-slotted Zeek connection state log [86] have variable sizes, which makes them unsuitable as input for certain machine learning algorithms. Several studies therefore transform variable-size data into representative statistical values (e.g., maximum, minimum, median, and standard deviation) or into a specific probabilistic/statistical object, such as a histogram and its self-similarity matrix [89], a first-order Markov chain [90], a second-order Markov chain [91], or a hidden Markov model [92]. Each of these can be represented as a finite-dimensional vector, although only statistical information is retained in such models. Among these, the Markov chain transformation has been widely used in TLS-encrypted traffic classification. Note that the approach in [89] has only been validated on unencrypted traffic; hence, we regard its adoption for encrypted traffic as a prospect for future work. In contrast, several approaches use machine learning algorithms that accept variable-size input. FS-Net [93] is an end-to-end traffic classification model based on a recurrent neural network (RNN) that takes the packet length sequence of a flow record as input. According to [93], FS-Net outperforms several Markov chain-based approaches in both true positive rate and false positive rate. Shen et al. [94] proposed a novel graph-based representation of packet length sequences (including the direction of each packet between the client and the server), called the traffic interaction graph (TIG). The same work also proposes a graph neural network that classifies decentralized applications on Ethereum from TLS-encrypted traffic.
- Categorical data type: Each element of the TLS client-offered ciphersuite list and the TLS client-advertised extension list takes one of a finite number n of distinct values, so each list can be represented as an n-bit vector by one-hot encoding its elements, although the order of the list is lost. For example, Anderson and McGrew [17] observed only 176 distinct values among the elements of the client-offered ciphersuite lists in their dataset. They also reported that an order-preserving representation of the list did not significantly improve performance.
- Numeric data type: The TCP header and the TLS message header of each packet contain several numeric fields, which do not need to be transformed into other data types.
- String data type: The HostName field in the Server Name Indication (SNI) extension, the Certificate message in the TLS handshake, the subjectAltName field in the Certificate message, and the TLS flow data can be considered string data. Since each character takes a discrete value and a string has variable length, these data can also be treated as variable-size sequential data. In this context, the byte distribution in [83] can be viewed as a histogram of TLS flow data. However, many approaches, such as [91], extract only the string length as a feature. In addition, Anderson and McGrew [17] report that a mismatch between the subjectAltName field and the HostName field, when both are available, can be an effective feature for malware detection.
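The per-type transformations above can be sketched as follows. This is a minimal illustration under our own assumptions, not the implementation of any cited work: packet lengths are binned into 10 states of 150 bytes (arbitrary values) before fitting a first-order Markov chain, each ciphersuite list is one-hot encoded element by element into a single n-bit vector over a made-up vocabulary, the single numeric field is made up, and the helper names (`markov_features`, `multi_hot`, `san_mismatch`, `flow_vector`) are hypothetical. The SNI/subjectAltName check handles only a single left-most wildcard label.

```python
import statistics
from typing import List, Optional, Sequence


def length_stats(lengths: Sequence[int]) -> List[float]:
    """Fixed-size statistical summary (max, min, median, population std) of a non-empty sequence."""
    return [max(lengths), min(lengths), statistics.median(lengths), statistics.pstdev(lengths)]


def markov_features(lengths: Sequence[int], bin_size: int = 150, n_bins: int = 10) -> List[float]:
    """First-order Markov chain over binned packet lengths, flattened row by row."""
    # Map each length to a state; lengths >= bin_size * (n_bins - 1) share the last state.
    states = [min(length // bin_size, n_bins - 1) for length in lengths]
    counts = [[0] * n_bins for _ in range(n_bins)]
    for src, dst in zip(states, states[1:]):
        counts[src][dst] += 1
    features: List[float] = []
    for row in counts:
        total = sum(row)
        # Row-normalise counts into transition probabilities; unvisited states stay zero.
        features.extend(c / total if total else 0.0 for c in row)
    return features  # always n_bins * n_bins values, whatever the flow length


def multi_hot(values: Sequence[int], vocabulary: Sequence[int]) -> List[int]:
    """n-bit vector over a fixed vocabulary; list order and duplicates are discarded."""
    present = set(values)
    return [1 if v in present else 0 for v in vocabulary]


def hostname_matches(sni: str, san: str) -> bool:
    """Compare an SNI HostName with one subjectAltName entry (single left-most wildcard only)."""
    sni, san = sni.lower(), san.lower()
    if san.startswith("*."):
        head, _, rest = sni.partition(".")
        return bool(head) and rest == san[2:]
    return sni == san


def san_mismatch(sni: Optional[str], sans: Sequence[str]) -> int:
    """1 if the SNI is present but matches no subjectAltName entry; 0 otherwise (incl. unavailable)."""
    if not sni or not sans:
        return 0
    return 0 if any(hostname_matches(sni, s) for s in sans) else 1


def flow_vector(lengths, ciphers, cipher_vocab, numeric, sni, sans) -> List[float]:
    """Concatenate the per-type features into one fixed-length vector."""
    return (length_stats(lengths) + markov_features(lengths)
            + multi_hot(ciphers, cipher_vocab) + list(numeric)
            + [san_mismatch(sni, sans)])


# Hypothetical flow: packet lengths, offered ciphersuites, one raw numeric field.
vec = flow_vector([517, 1460, 1460, 80, 517], [49195, 4865],
                  [4865, 4866, 49195, 49199], [64240],
                  "api.example.com", ["*.example.com"])
print(len(vec))  # 4 + 100 + 4 + 1 + 1 = 110
```

Every flow yields a vector of the same length regardless of how many packets it contains, which is what allows fixed-input classifiers to consume it; the statistical and Markov summaries keep only statistical information about the original sequence.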
4.4. TLS Flow Fingerprinting
4.5. Feature Representation
4.6. Machine Learning Algorithms and Model Selection
4.7. Hyperparameter Tuning
5. Conclusions
- Existing proposals have been validated on different, small datasets. While the lack of diverse, large, and shareable labeled datasets is a persistent problem in NTA [115], sharing TLS fingerprints through OSINT feeds appears relatively feasible. Thus, designing OSINT-friendly TLS fingerprinting techniques that incorporate more features optimized for machine learning-based NTA is a promising research direction.
- With the fast adoption of TLS 1.3, the visibility of TLS-encrypted traffic through TLS interception is rapidly decreasing in many SOCs, even when enhanced flow records are collected. This is because in TLS 1.3, many TLS handshake metadata features (e.g., the server certificate, which TLS 1.3 encrypts) can no longer be collected owing to the protocol's inherently secure design. This implies that more features of TLS-encrypted traffic should be collected and combined with novel feature representations, well-designed machine learning algorithms, and model optimization techniques under the diverse constraints of SOCs (privacy, cost, automation, scalability, etc.). Recent advances in deep learning-based NTA offer a potential research direction.
- The current academic literature gives little consideration to real-time and online processing for NTA. Given the higher computational requirements of deep learning-based methods, systematic and holistic approaches to NTA are needed.
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Conflicts of Interest
References
- Zimmerman, C. Ten Strategies of a World-Class Cybersecurity Operations Center; The MITRE Corporation: McLean, VA, USA, 2014.
- Vielberth, M.; Bohm, F.; Fichtinger, I.; Pernul, G. Security Operations Center: A Systematic Study and Open Challenges. IEEE Access 2020, 8, 227756–227779.
- Kokulu, F.B.; Shoshitaishvili, Y.; Soneji, A.; Zhao, Z.; Ahn, G.J.; Bao, T.; Doupé, A. Matched and Mismatched SOCs: A Qualitative Study on Security Operations Center Issues. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (ACM CCS), London, UK, 11–15 November 2019; pp. 1955–1970.
- Bejtlich, R. The Practice of Network Security Monitoring: Understanding Incident Detection and Response; No Starch Press: San Francisco, CA, USA, 2013.
- Sanders, C.; Smith, J. Applied Network Security Monitoring: Collection, Detection, and Analysis; Syngress: Burlington, MA, USA, 2014.
- Richardson, M.; Harris, G. PCAP Capture File Format; Technical Report Draft-Gharris-Opsawg-Pcap-02; Internet Engineering Task Force: Fremont, CA, USA, 2021.
- Trammell, B.; Boschi, E. An Introduction to IP Flow Information Export (IPFIX). IEEE Commun. Mag. 2011, 49, 89–95.
- Santos, O. Network Security with NetFlow and IPFIX: Big Data Analytics for Information Security; Cisco Press: Indianapolis, IN, USA, 2015.
- ENEA Qosmos Division. Importance of Network Traffic Analysis (NTA) for SOCs; Technical Report; ENEA Qosmos Division: Paris, France, 2019.
- Symantec. A Technology Brief on SSL/TLS Traffic; Symantec Corporation World Headquarters: Mountain View, CA, USA, 2017.
- Cisco. Cisco Encrypted Traffic Analytics. 2019. Available online: https://www.cisco.com/c/en/us/td/docs/ios-xml/ios/sec_data_eta/configuration/xe-16-10/sec-data-encrypted-traffic-analytics-xe-16-10-book/sec-data-encrypted-traffic-analytics-xe-16-6-book_chapter_01.pdf?dtid=osscdc000283 (accessed on 11 November 2021).
- Naylor, D.; Finamore, A.; Leontiadis, I.; Grunenberger, Y.; Mellia, M.; Munafò, M.; Papagiannaki, K.; Steenkiste, P. The Cost of the “S” in HTTPS. In Proceedings of the 10th Conference on Emerging Networking Experiments and Technologies (ACM CoNEXT), Sydney, Australia, 2–5 December 2014; pp. 133–140.
- Google. HTTPS Encryption on the Web. 2021. Available online: https://transparencyreport.google.com/https/overview (accessed on 11 November 2021).
- “Let’s Encrypt”. Let’s Encrypt Stats. 2021. Available online: https://letsencrypt.org/stats/ (accessed on 11 November 2021).
- Aas, J.; Barnes, R.; Case, B.; Durumeric, Z.; Eckersley, P.; Flores-López, A.; Halderman, J.A.; Hoffman-Andrews, J.; Kasten, J.; Rescorla, E.; et al. Let’s Encrypt: An Automated Certificate Authority to Encrypt the Entire Web. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (ACM CCS) CCS ’19, London, UK, 11–15 November 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 2473–2487.
- Mayer, W.; Schmiedecker, M. TLScompare: Crowdsourcing Rules for HTTPS Everywhere. In Proceedings of the 25th International Conference Companion on World Wide Web (WWW), Montreal, QC, Canada, 11–15 April 2016; pp. 471–476.
- Anderson, B.; McGrew, D. Identifying Encrypted Malware Traffic with Contextual Flow Data. In Proceedings of the 9th ACM Workshop on Artificial Intelligence and Security (ACM AISec’2016), Co-Located with ACM CCS 2016, Vienna, Austria, 28 October 2016; Association for Computing Machinery, Inc.: New York, NY, USA, 2016; pp. 35–46.
- Papadogiannaki, E.; Ioannidis, S. A Survey on Encrypted Network Traffic Analysis Applications, Techniques, and Countermeasures. ACM Comput. Surv. 2021, 54, 1–35.
- Pacheco, F.; Exposito, E.; Gineste, M.; Baudoin, C.; Aguilar, J. Towards the Deployment of Machine Learning Solutions in Network Traffic Classification: A Systematic Survey. IEEE Commun. Surv. Tutor. 2019, 21, 1988–2014.
- Velan, P.; Čermák, M.; Čeleda, P.; Drašar, M. A survey of methods for encrypted traffic classification and analysis. Int. J. Netw. Manag. 2015, 25, 355–374.
- Aceto, G.; Ciuonzo, D.; Montieri, A.; Pescapé, A. Mobile encrypted traffic classification using deep learning: Experimental evaluation, lessons learned, and challenges. IEEE Trans. Netw. Serv. Manag. 2019, 16, 445–458.
- Conti, M.; Li, Q.Q.; Maragno, A.; Spolaor, R. The Dark Side(-Channel) of Mobile Devices: A Survey on Network Traffic Analysis. IEEE Commun. Surv. Tutor. 2018, 20, 2658–2713.
- Poh, G.S.; Divakaran, D.M.; Lim, H.W.; Ning, J.; Desai, A. A Survey of Privacy-Preserving Techniques for Encrypted Traffic Inspection over Network Middleboxes. arXiv 2021, arXiv:2101.04338.
- Rezaei, S.; Liu, X. Deep Learning for Encrypted Traffic Classification: An Overview. IEEE Commun. Mag. 2019, 57, 76–81.
- Shen, M.; Liu, Y.; Zhu, L.; Xu, K.; Du, X.; Guizani, N. Optimizing feature selection for efficient encrypted traffic classification: A systematic approach. IEEE Netw. 2020, 34, 20–27.
- Shbair, W.M.; Cholez, T.; Francois, J.; Chrisment, I. A Survey of HTTPS Traffic and Services Identification Approaches. arXiv 2020, arXiv:2008.08339.
- De Carnavalet, X.C.; van Oorschot, P.C. A survey and Analysis of TLS Interception Mechanisms and Motivations. arXiv 2020, arXiv:2010.16388.
- Mckay, K.; Cooper, D. Guidelines for the Selection, Configuration, and Use of Transport Layer Security (TLS) Implementations; Technical Report; NIST: Gaithersburg, MD, USA, 2019.
- Anderson, B.; Paul, S.; McGrew, D. Deciphering malware’s use of TLS (without decryption). J. Comput. Virol. Hacking Tech. 2018, 14, 195–211.
- Warburton, D. The 2021 TLS Telemetry Report; Technical Report; F5 Labs: Washington, DC, USA, 2021.
- Gallagher, S. Nearly Half of Malware Now Use TLS to Conceal Communications; Technical Report; SophosLabs: Tokyo, Japan, 2021.
- WatchGuard Threat Lab. Internet Security Report: Q2 2021; Technical Report; Watchguard: Seattle, WA, USA, 2021.
- Abuse.ch. SSLBL | Malicious JA3 Fingerprints. Available online: https://sslbl.abuse.ch/ja3-fingerprints/ (accessed on 11 November 2021).
- SSL Fingerprint JA3. Available online: https://ja3er.com/ (accessed on 11 November 2021).
- Freier, A.O.; Karlton, P.; Kocher, P.C. The Secure Sockets Layer (SSL) Protocol Version 3.0. RFC 6101. 2011. Available online: https://www.rfc-editor.org/rfc/rfc6101 (accessed on 11 November 2021).
- Barnes, R.; Thomson, M.; Pironti, A.; Langley, A. Deprecating Secure Sockets Layer Version 3.0. RFC 7568. 2015. Available online: https://www.rfc-editor.org/rfc/rfc7568 (accessed on 11 November 2021).
- Allen, C.; Dierks, T. The TLS Protocol Version 1.0. RFC 2246. 1999. Available online: https://www.rfc-editor.org/rfc/rfc2246 (accessed on 11 November 2021).
- Rescorla, E. The Transport Layer Security (TLS) Protocol Version 1.3. RFC 8446. 2018. Available online: https://www.rfc-editor.org/rfc/rfc8446 (accessed on 11 November 2021).
- Moriarty, K.; Farrell, S. Deprecating TLS 1.0 and TLS 1.1. RFC 8996. 2021. Available online: https://www.rfc-editor.org/rfc/rfc8996 (accessed on 11 November 2021).
- Qualys, I. Qualys SSL Labs—SSL Pulse. 2021. Available online: https://www.ssllabs.com/ssl-pulse/ (accessed on 11 November 2021).
- Rescorla, E.; Dierks, T. The Transport Layer Security (TLS) Protocol Version 1.2. RFC 5246. 2008. Available online: https://rfc-editor.org/rfc/rfc5246 (accessed on 11 November 2021).
- Eastlake, D.E., 3rd. Transport Layer Security (TLS) Extensions: Extension Definitions. RFC 6066. 2011. Available online: https://rfc-editor.org/rfc/rfc6066 (accessed on 11 November 2021).
- Axon, L.; AlAhmadi, B.A.; Nurse, J.R.C.; Goldsmith, M.; Creese, S. Data presentation in security operations centres: Exploring the potential for sonification to enhance existing practice. J. Cybersecur. 2020, 6, tyaa004.
- Fu, C.; Li, Q.; Shen, M.; Xu, K. Realtime Robust Malicious Traffic Detection via Frequency Domain Analysis. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (ACM CCS), CCS ’21, Virtual, 15–19 November 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 3431–3446.
- Bhuyan, M.H.; Bhattacharyya, D.K.; Kalita, J.K. Network anomaly detection: Methods, systems and tools. IEEE Commun. Surv. Tutor. 2014, 16, 303–336.
- Chandola, V.; Banerjee, A.; Kumar, V. Anomaly Detection: A Survey. ACM Comput. Surv. 2009, 41, 1–58.
- Aldweesh, A.; Derhab, A.; Emam, A.Z. Deep learning approaches for anomaly-based intrusion detection systems: A survey, taxonomy, and open issues. Knowl. Based Syst. 2020, 189, 105124.
- Goodall, J.R.; Ragan, E.D.; Steed, C.A.; Reed, J.W.; Richardson, G.D.; Huffer, K.M.; Bridges, R.A.; Laska, J.A. Situ: Identifying and Explaining Suspicious Behavior in Networks. IEEE Trans. Vis. Comput. Graph. 2019, 25, 204–214.
- Choi, I.; Lee, J.; Kwon, T.; Kim, K.; Choi, Y.; Song, J. An Easy-to-use Framework to Build and Operate AI-based Intrusion Detection for In-situ Monitoring. In Proceedings of the 2021 16th Asia Joint Conference on Information Security (AsiaJCIS), Seoul, Korea, 19–20 August 2021; pp. 1–8.
- Smith, M. The SOC is Dead, Long Live the SOC! ITNOW 2020, 62, 34–35.
- Finsterbusch, M.; Richter, C.; Rocha, E.; Müller, J.A.; Hänßgen, K. A survey of payload-based traffic classification approaches. IEEE Commun. Surv. Tutor. 2014, 16, 1135–1156.
- Durumeric, Z.; Ma, Z.; Springall, D.; Barnes, R.; Sullivan, N.; Bursztein, E.; Bailey, M.; Halderman, J.A.; Paxson, V. The Security Impact of HTTPS Interception. In Proceedings of the 2017 Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 26 February–1 March 2017; Internet Society: Reston, VA, USA, 2017.
- Saltzer, J.H.; Reed, D.P.; Clark, D.D. End-to-end arguments in system design. ACM Trans. Comput. Syst. (TOCS) 1984, 2, 277–288.
- Liang, J.; Jiang, J.; Duan, H.; Li, K.; Wan, T.; Wu, J. When HTTPS meets CDN: A case of authentication in delegated service. In Proceedings of the 2014 IEEE Symposium on Security and Privacy (IEEE S&P), Berkeley, CA, USA, 18–21 May 2014; pp. 67–82.
- Huang, L.S.; Rice, A.; Ellingsen, E.; Jackson, C. Analyzing forged SSL certificates in the wild. In Proceedings of the IEEE Symposium on Security and Privacy (IEEE S&P), Berkeley, CA, USA, 18–21 May 2014; pp. 83–97.
- Dekker, M. The HTTPS Interception Dilemma: Pros and Cons. 2017. Available online: https://www.helpnetsecurity.com/2017/03/08/https-interception-dilemma/ (accessed on 11 November 2021).
- Clark, J.; Van Oorschot, P.C. SoK: SSL and HTTPS: Revisiting past challenges and evaluating certificate trust model enhancements. In Proceedings of the 2013 IEEE Symposium on Security and Privacy (IEEE S&P), Berkeley, CA, USA, 19–22 May 2013; pp. 511–525.
- Song, D.X.; Wagner, D.; Perrig, A. Practical techniques for searches on encrypted data. In Proceedings of the IEEE Computer Society Symposium on Security and Privacy (IEEE S&P), Berkeley, CA, USA, 14–17 May 2000; pp. 44–55.
- Curtmola, R.; Garay, J.; Kamara, S.; Ostrovsky, R. Searchable symmetric encryption. In Proceedings of the 13th ACM Conference on Computer and Communications Security (ACM CCS), Virginia, VA, USA, 3 October–3 November 2006; Volume 402, pp. 79–88.
- Bösch, C.; Hartel, P.; Jonker, W.; Peter, A. A survey of provably secure searchable encryption. ACM Comput. Surv. 2014, 47, 1–51.
- O’Neill, M.; Ruoti, S.; Seamons, K.; Zappala, D. TLS Proxies. In Proceedings of the 2016 ACM Internet Measurement Conference (ACM IMC), Santa Monica, CA, USA, 14–16 November 2016; ACM: New York, NY, USA, 2016; pp. 551–557.
- Waked, L.; Mannan, M.; Youssef, A. The Sorry State of TLS Security in Enterprise Interception Appliances. Digit. Threat. Res. Pract. 2020, 1, 1–26.
- Sherry, J.; Lan, C.; Popa, R.A.; Ratnasamy, S. BlindBox. ACM SIGCOMM Comput. Commun. Rev. 2015, 45, 213–226.
- Lan, C.; Sherry, J.; Popa, R.A.; Ratnasamy, S.; Liu, Z. Embark: Securely Outsourcing Middleboxes to the Cloud. In Proceedings of the 13th USENIX Symposium on Networked Systems Design and Implementation (USENIX NSDI), Santa Clara, CA, USA, 16–18 March 2016; pp. 255–273.
- Yuan, X.; Wang, X.; Lin, J.; Wang, C. Privacy-preserving deep packet inspection in outsourced middleboxes. In Proceedings of the 35th Annual IEEE International Conference on Computer Communications (IEEE INFOCOM), San Francisco, CA, USA, 10–14 April 2016.
- Ning, J.; Poh, G.S.; Loh, J.C.; Chia, J.; Chang, E.C. PrivDPI: Privacy-preserving encrypted traffic inspection with reusable obfuscated rules. In Proceedings of the ACM Conference on Computer and Communications Security (ACM CCS), New York, NY, USA, 11–15 November 2019; pp. 1657–1670.
- Canard, S.; Diop, A.; Kheir, N.; Paindavoine, M.; Sabt, M. BlindIDS: Market-Compliant and Privacy-Friendly Intrusion Detection System over Encrypted Traffic. In Proceedings of the 2017 ACM on Asia Conference on Computer and Communications Security (ACM ASIACCS), ASIA CCS ’17, Abu Dhabi, United Arab Emirates, 2–6 April 2017; Association for Computing Machinery: New York, NY, USA, 2017; pp. 561–574.
- Baek, J.; Kim, J.; Susilo, W. Inspecting TLS Anytime Anywhere: A New Approach to TLS Interception. In Proceedings of the 15th ACM Asia Conference on Computer and Communications Security (ACM ASIACCS), ASIA CCS ’20, Taipei, Taiwan, 5–9 October 2020; Association for Computing Machinery: New York, NY, USA, 2020; pp. 116–126.
- Kim, J.; Camtepe, S.; Baek, J.; Susilo, W.; Pieprzyk, J.; Nepal, S. P2DPI: Practical and Privacy-Preserving Deep Packet Inspection. In Proceedings of the 2021 ACM Asia Conference on Computer and Communications Security, ASIA CCS ’21, Virtual Event, 7–11 June 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 135–146.
- Han, J.; Kim, S.; Ha, J.; Han, D. SGX-Box. In Proceedings of the First Asia-Pacific Workshop on Networking (APNet), Hong Kong, China, 3–4 August 2017; ACM: New York, NY, USA, 2017; pp. 99–105.
- Naylor, D.; Li, R.; Gkantsidis, C.; Karagiannis, T.; Steenkiste, P. And then there were more: Secure communication for more than two parties. In Proceedings of the 13th International Conference on Emerging Networking EXperiments and Technologies (ACM CoNEXT 2017), New York, NY, USA, 12–15 December 2017; pp. 88–100.
- Costan, V.; Devadas, S. Intel SGX Explained. Technical Report. 2016. Available online: http://css.csail.mit.edu/6.858/2020/readings/costan-sgx.pdf (accessed on 11 November 2021).
- Papadogiannaki, E.; Ioannidis, S. Acceleration of Intrusion Detection in Encrypted Network Traffic Using Heterogeneous Hardware. Sensors 2021, 21, 1140.
- Karagiannis, T.; Papagiannaki, K.; Faloutsos, M. BLINC: Multilevel Traffic Classification in the Dark. In Proceedings of the 2005 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications, SIGCOMM ’05, Philadelphia, PA, USA, 22 August 2005; Association for Computing Machinery: New York, NY, USA, 2005; pp. 229–240.
- Velan, P.; Medková, J.; Jirsík, T.; Čeleda, P. Network traffic characterisation using flow-based statistics. In Proceedings of the 2016 IEEE/IFIP Network Operations and Management Symposium (IEEE/IFIP NOMS), Istanbul, Turkey, 25–29 April 2016; pp. 907–912.
- Sanders, C. Practical Packet Analysis: Using Wireshark to Solve Real-World Network Problems, 3rd ed.; No Starch Press: San Francisco, CA, USA, 2017.
- Collins, M. Network Security through Data Analysis: From Data to Action, 2nd ed.; O’Reilly: Sebastopol, CA, USA, 2017; p. 428.
- Rohde & Schwarz Company. R&S® PACE 2—First Packet Classification in An Encrypted World. 2021. Available online: https://www.ipoque.com/news-media/resources/brochures/dpi-engine-pace-2-first-packet-classification (accessed on 11 November 2021).
- Deri, L.; Martinelli, M.; Bujlow, T.; Cardigliano, A. nDPI: Open-source high-speed deep packet inspection. In Proceedings of the 2014 International Wireless Communications and Mobile Computing Conference (IWCMC), Nicosia, Cyprus, 4–8 August 2014; pp. 617–622.
- Bernaille, L.; Teixeira, R.; Salamatian, K. Early Application Identification. In Proceedings of the 2nd Conference on Future Networking Technologies (ACM CoNEXT), New York, NY, USA, 4–7 December 2006.
- Bernaille, L.; Teixeira, R. Implementation issues of early application identification. In Proceedings of the 3rd Asian Conference on Internet Engineering: Sustainable Internet (AINTEC), Phuket, Thailand, 27–29 November 2007; pp. 156–166.
- Bernaille, L.; Teixeira, R. Early recognition of encrypted applications. In Proceedings of the 8th International Conference on Passive and Active Network Measurement (PAM), Louvain-la-Neuve, Belgium, 5–6 April 2007; pp. 165–175.
- McGrew, D.; Anderson, B. Enhanced telemetry for encrypted threat analytics. In Proceedings of the 24th IEEE ICNP Workshop on Machine Learning in Computer Networks (NetworkML 2016), Singapore, 8 November 2016; pp. 1–6.
- McGrew, D.; Anderson, B.; Perricone, P.; Hudson, B. Joy: A Package for Capturing and Analyzing Network Flow Data and Intraflow Data, for Network Research, Forensics, and Security Monitoring. 2016. Available online: https://github.com/cisco/joy (accessed on 11 November 2021).
- Tegeler, F.; Fu, X.; Vigna, G.; Kruegel, C. BotFinder: Finding Bots in Network Traffic Without Deep Packet Inspection. In Proceedings of the 8th International Conference on Emerging Networking Experiments and Technologies (ACM CoNEXT, 2012), Nice, France, 10–13 December 2012; ACM Press: New York, NY, USA, 2012; p. 349.
- Alahmadi, B.A.; Mariconti, E.; Spolaor, R.; Stringhini, G.; Martinovic, I. BOTection: Bot Detection by Building Markov Chain Models of Bots Network Behavior. In Proceedings of the 15th ACM Asia Conference on Computer and Communications Security, (ACM ASIACCS), New York, NY, USA, 5–9 October 2020; pp. 652–664.
- AlAhmadi, B.A.; Martinovic, I. MalClassifier: Malware Family Classification Using Network Flow Sequence Behaviour. In Proceedings of the 13th APWG Symposium on Electronic Crime Research (eCrime), San Diego, CA, USA, 15–17 May 2018; pp. 1–13.
- Paxson, V. Bro: A system for detecting network intruders in real-time. Comput. Netw. 1999, 31, 2435–2463.
- Bartos, K.; Sofka, M.; Franc, V. Optimized Invariant Representation of Network Traffic for Detecting Unseen Malware Variants. In Proceedings of the 25th USENIX Security Symposium (USENIX Security), Austin, TX, USA, 10–12 August 2016; pp. 807–822.
- Korczyński, M.; Duda, A. Markov Chain Fingerprinting to Classify Encrypted Traffic. In Proceedings of the 33rd IEEE International Conference on Computer Communications (IEEE INFOCOM), Toronto, ON, Canada, 27 April–2 May 2014.
- Shen, M.; Wei, M.; Zhu, L.; Wang, M. Classification of Encrypted Traffic with Second-Order Markov Chains and Application Attribute Bigrams. IEEE Trans. Inf. Forensics Secur. 2017, 12, 1830–1843.
- Fu, Y.; Xiong, H.; Lu, X.; Yang, J.; Chen, C. Service Usage Classification with Encrypted Internet Traffic in Mobile Messaging Apps. IEEE Trans. Mob. Comput. 2016, 15, 2851–2864.
- Liu, C.; He, L.; Xiong, G.; Cao, Z.; Li, Z. FS-Net: A Flow Sequence Network for Encrypted Traffic Classification. In Proceedings of the 38th IEEE International Conference on Computer Communications (IEEE INFOCOM), Paris, France, 29 April–2 May 2019.
- Shen, M.; Zhang, J.; Zhu, L.; Xu, K.; Du, X. Accurate Decentralized Application Identification via Encrypted Traffic Analysis Using Graph Neural Networks. IEEE Trans. Inf. Forensics Secur. 2021, 16, 2367–2380.
- Paine, K.; Whitehouse, O.; Sellwood, J. Indicators of Compromise (IoCs) and Their Role in Attack Defence; Technical Report Draft-Paine-Smart-Indicators-of-Compromise-03; Internet Engineering Task Force: Fremont, CA, USA, 2021.
- Anderson, B.; McGrew, D. Machine Learning for Encrypted Malware Traffic Classification: Accounting for Noisy Labels and Non-stationarity. In Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (ACM KDD), Halifax, NS, Canada, 13–17 August 2017; Volume Part F1296.
- Husák, M.; Čermák, M.; Jirsík, T.; Čeleda, P. HTTPS traffic analysis and client identification using passive SSL/TLS fingerprinting. EURASIP J. Inf. Secur. 2016, 2016, 1–14.
- Laperdrix, P.; Bielova, N.; Baudry, B.; Avoine, G. Browser Fingerprinting: A Survey. ACM Trans. Web 2020, 14, 1–33.
- Althouse, J.B.; Atkinson, J.; Atkins, J. Open Sourcing JA3: SSL/TLS Client Fingerprinting for Malware Detection. 2017. Available online: https://engineering.salesforce.com/open-sourcing-ja3-92c9e53c3c41 (accessed on 11 November 2021).
- Benjamin, D. Applying Generate Random Extensions And Sustain Extensibility (GREASE) to TLS Extensibility. RFC 8701. 2020. Available online: https://rfc-editor.org/rfc/rfc8701.txt (accessed on 11 November 2021).
- Ristic, I. HTTP Client Fingerprinting Using SSL Handshake Analysis. 2009. Available online: https://www.ssllabs.com/projects/client-fingerprinting/ (accessed on 11 November 2021).
- Majkowski, M. SSL Fingerprinting for p0f. 2012. Available online: https://idea.popcount.org/2012-06-17-ssl-fingerprinting-for-p0f/ (accessed on 11 November 2021).
- Brotherston, L. TLS Fingerprinting: Smarter Defending & Stealthier Attacking. 2015. Available online: https://blog.squarelemon.com/tls-fingerprinting/ (accessed on 11 November 2021).
- Matoušek, P.; Burgetová, I.; Ryšavý, O.; Victor, M. On Reliability of JA3 Hashes for Fingerprinting Mobile Applications. In Proceedings of the 12th EAI International Conference on Digital Forensics & Cyber Crime (EAI ICDF2C), Singapore, 7–9 December 2021; Volume 351, pp. 1–22.
- Kotzias, P.; Paterson, K.G.; Razaghpanah, A.; Vallina-Rodriguez, N.; Amann, J.; Caballero, J. Coming of age: A longitudinal study of TLS deployment. In Proceedings of the Internet Measurement Conference, Boston, MA, USA, 31 October–2 November 2018; pp. 415–428.
- Frolov, S.; Wustrow, E. The use of TLS in Censorship Circumvention. In Proceedings of the 2019 Network and Distributed System Security Symposium (NDSS), San Diego, CA, USA, 24–27 February 2019; Internet Society: Reston, VA, USA, 2019.
- Anderson, B.; McGrew, D. Accurate TLS Fingerprinting using Destination Context and Knowledge Bases. arXiv 2020, arXiv:2009.01939.
- Artur, K.; Tomas, V.; Roman, L. Encrypted Traffic Analysis: The Data Privacy-Preserving Way to Regain Visibility into Encrypted Communication. 2019. Available online: https://www.flowmon.com/en/solutions/security-operations/encrypted-traffic-analysis (accessed on 11 November 2021).
- Hynek, K.; Luk, C. JA3cury—A New Approch to TLS Fingerprinting by Merging Fingerprinting Methods. Presented at Excel@FIT 2021. Available online: http://excel.fit.vutbr.cz/submissions/2021/013/13.pdf (accessed on 11 November 2021).
- Holland, J.; Schmitt, P.; Feamster, N.; Mittal, P. New Directions in Automated Traffic Analysis. In Proceedings of the 2021 ACM SIGSAC Conference on Computer and Communications Security (ACM CCS), CCS ’21, Virtual Event, 15–19 November 2021; Association for Computing Machinery: New York, NY, USA, 2021; pp. 3366–3383.
- Benjamini, Y.; Hochberg, Y. Controlling the False Discovery Rate: A Practical and Powerful Approach to Multiple Testing. J. R. Stat. Soc. Ser. B Methodol. 1995, 57, 289–300.
- Nechay, D.; Pointurier, Y.; Coates, M. Controlling False Alarm/Discovery Rates in Online Internet Traffic Flow Classification. In Proceedings of the IEEE International Conference on Computer Communications (IEEE INFOCOM), Rio de Janeiro, Brazil, 19–25 April 2009; pp. 684–692.
- Alahmadi, B.A.; Axon, L.; Martinovic, I. 99% False Positives: A Qualitative Study of SOC Analysts’ Perspectives on Security Alarms. In Proceedings of the 31st USENIX Security Symposium (USENIX Security), Boston, MA, USA, 10–12 August 2022; Available online: https://www.usenix.org/conference/usenixsecurity22/presentation/alahmadi (accessed on 11 November 2021).
- Erickson, N.; Mueller, J.; Shirkov, A.; Zhang, H.; Larroy, P.; Li, M.; Smola, A. AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data. arXiv 2020, arXiv:2003.06505.
- Dainotti, A.; Pescapé, A.; Claffy, K.C. Issues and future directions in traffic classification. IEEE Netw. 2012, 26, 35–40.
| Survey | Protocols | Problem Domains | Methods | Notes |
|---|---|---|---|---|
| [18] | Various | Various | Various | • Comprehensive and up-to-date survey of encrypted NTA methods • Insufficient detail for the security domain |
| [19] | Various | Various | ML-based only | • Comprehensive survey of ML-based methods • Omits recent work on encrypted malware traffic |
| [20] | Various | Various | Various | • The first survey in the encrypted traffic analysis area • Published in 2015, so it lacks state-of-the-art methods |
| [21] | Various | Mobile Apps | DL-based only | • Experimental evaluation of existing DL-based methods |
| [22] | Various | Mobile Apps | Various | • The most comprehensive survey of mobile traffic • May not be suitable for the many SOCs that protect servers |
| [23] | Various | Detection | Various | • Comprehensive survey of privacy-preserving inspection in middleboxes • Little coverage of ML-based methods |
| [24] | Various | Traffic Classification | DL-based only | • Brief overview of DL-based methods |
| [25] | Various | Website Fingerprinting | Feature selection only | • Brief overview of ML-based methods • Focuses on website fingerprinting datasets |
| [26] | HTTPS | Web Apps | Various | • Service identification inside HTTPS |
| [27] | TLS | Various | Interception-based only | • Analysis of industry practices for TLS interception |
| This Paper | TLS | Malware Traffic | Various, focusing on ML-based | • State of the art in ML-based malware detection and family classification |
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Oh, C.; Ha, J.; Roh, H. A Survey on TLS-Encrypted Malware Network Traffic Analysis Applicable to Security Operations Centers. Appl. Sci. 2022, 12, 155. https://doi.org/10.3390/app12010155