An Exploit Traffic Detection Method Based on Reverse Shell
Abstract
1. Introduction
- We develop a reverse shell traffic detection method based on the attacker's thinking time. We derive fusion information features from original features, such as the packet delay sequence, and use them as input to a decision tree model that identifies reverse shell traffic in the shellcode execution stage.
- We propose ETDetector, a novel exploit detection method. Unlike existing methods, we take the reverse shell, one of the most popular exploit behaviors, as the entry point for detecting exploit traffic. We first identify reverse shell traffic in the shellcode execution stage. Then, we trace suspicious traffic in the shellcode delivery stage by reconstructing the session relationship between the two stages.
- We design a traffic stratification method based on a bisecting K-means algorithm, which shows the traffic communication behavior intuitively and improves the interpretability of ETDetector.
- We simulate ten vulnerability exploitation experiments to evaluate ETDetector. The results show that ETDetector detects nine of them and that detection is not affected by the specific vulnerability category or the traffic encryption method.
2. Background
2.1. Shellcode
2.2. Reverse Shell
2.2.1. Reverse Shell Definition and Classification
2.2.2. Reverse Shell Communication Behavior Model
3. Methodology
3.1. Traffic Feature Analysis
3.1.1. Original Session Features
- (1) Packet delay sequence. The delay of a packet in a session is the time interval between the current packet and the previous valid packet; the sequence of these intervals is the packet delay sequence. During a reverse shell attack, the attacker needs time to decide what to do before each operation, so the packet containing an attack command is separated from the next packet by a relatively long interval. Normal traffic generally does not show this pattern. This feature therefore reflects the attacker’s thinking time, and we regard it as a crucial signal for detecting reverse shell behavior;
- (2) Packet direction sequence. We define the sequence of directions of the valid packets transmitted in a session as the packet direction sequence. The direction of the first SYN packet in the three-way handshake is “0”, and the opposite direction is marked as “1”. Besides packet delay, the packet direction sequence is another important feature that reflects the interaction pattern of a reverse shell attack. In the early stage of a reverse shell attack, the target initiates a TCP connection and returns shell-related information to prepare for subsequent interactions. After the reverse shell succeeds, the attacker sends a command, and the target executes it and returns the result. Taking the Metasploit reverse shell module as an example, the attacker usually first sends packets bearing the Trojan to the target machine. When Meterpreter completely takes over control of the target machine, the target sends the packet carrying the Meterpreter shell information to the control terminal, and the packet direction is “0”. The last stage is command interaction. The control terminal sends packets carrying attack commands, whose direction is “1”. The target machine sends packets carrying the response, whose direction is “0”. Because the attacker usually waits for the target machine to return its response before executing a new command, between two consecutive packets with direction “1” there are generally only packets with direction “0”;
- (3) Packet length sequence. We define the sequence of valid packet lengths in a session as the packet length sequence, where packet length means the transport-layer payload length. After the shell is returned, the attacker sends a command, and the target machine returns the command execution result. The length of a system command generally falls within a confined range, whereas the length of the command execution result is not limited. One of the attacker's primary purposes in executing commands after a successful reverse shell attack is to obtain information about the target, so from the perspective of data flow direction there is data leakage. In the traffic, this behavior appears as a shorter payload for the packet carrying the command and a longer payload for the packet carrying the response.
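The three original sequences above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' implementation: the `Packet` record and the `original_features` function are hypothetical names, and the sketch assumes the caller has already kept only the valid packets of one session, ordered by time and starting with the first SYN.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Packet:
    """Hypothetical minimal packet record (not a structure from the paper)."""
    timestamp: float   # capture time in seconds
    src: str           # source IP address
    payload_len: int   # transport-layer payload length in bytes


def original_features(packets: List[Packet]) -> Tuple[List[float], List[int], List[int]]:
    """Return the (delay, direction, length) sequences of one session.

    Assumes `packets` is time-ordered and starts with the first SYN,
    so the sender of packets[0] defines direction 0.
    """
    if not packets:
        return [], [], []
    initiator = packets[0].src
    delays: List[float] = []
    directions: List[int] = []
    lengths: List[int] = []
    prev_time = packets[0].timestamp
    for pkt in packets:
        delays.append(pkt.timestamp - prev_time)             # gap to the previous packet
        directions.append(0 if pkt.src == initiator else 1)  # 0 = same direction as first SYN
        lengths.append(pkt.payload_len)
        prev_time = pkt.timestamp
    return delays, directions, lengths
```

In a reverse shell session the target sends the first SYN, so under this convention command packets from the attacker carry direction 1 and tend to show both a long delay and a short payload.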
3.1.2. Fusion Information Features
- (1) The number distribution sequence of delay packets in the reverse flow within the first 3 min. Three flag bits represent the distribution of delay packets over the first three minutes. A bit of “0” indicates that no reverse delay packet exists in the corresponding time range, and “1” indicates that at least one exists;
- (2) Whether the packet length sequences of the delayed packets and non-delayed packets obey the same distribution. First, we extract the packet length sequences of the session, with the packet direction encoded as the sign (positive or negative) of each value. Then, we use the Kolmogorov–Smirnov test (K-S test) to decide whether the two packet length sequences obey the same distribution; a p-value less than 0.05 indicates that the distributions are inconsistent. “0” means the sequences obey the same distribution, and “1” means they do not;
- (3) The difference between the packet direction sequence of the delayed packets and the packet direction change sequence. First, the packet direction sequences of all delayed packets and of all non-delayed packets are extracted separately, with “0” representing forward and “1” representing reverse. Then, the packet direction change sequence is calculated for each, where “0” means the direction of adjacent packets remains unchanged and “1” means it changes. Next, we calculate the Hamming distance between the delay packet direction sequence and the packet direction change sequence, and the ratio of this distance to the sequence length. The feature is represented by a single flag bit with a threshold of 0.58. If the ratio is greater than 0.58, the flag bit is “1”; otherwise, it is “0”. The choice of threshold is discussed in Section 4.2.1.
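Features (1) and (3) above can be illustrated with a short Python sketch. It is not the authors' code and makes several labeled assumptions: the three flag bits are taken to be one bit per minute, a packet counts as a delay packet when its delay exceeds a placeholder `delay_threshold`, and the direction change sequence starts with 0. The function names are hypothetical.

```python
from typing import List


def delay_flag_bits(times: List[float], directions: List[int], delays: List[float],
                    delay_threshold: float = 1.0) -> List[int]:
    """Three flag bits for the first three minutes of a session.

    Assumption: one bit per minute. A bit is 1 if at least one
    reverse-direction (direction 1) delay packet falls in that minute.
    `times` are seconds since session start; `delay_threshold` (seconds)
    is a placeholder, not a value taken from the paper.
    """
    bits = [0, 0, 0]
    for t, direction, delay in zip(times, directions, delays):
        if direction == 1 and delay > delay_threshold and 0 <= t < 180:
            bits[int(t // 60)] = 1
    return bits


def direction_change(directions: List[int]) -> List[int]:
    """0 for the first packet, then 1 whenever the direction differs from the previous one."""
    if not directions:
        return []
    return [0] + [int(a != b) for a, b in zip(directions, directions[1:])]


def hamming_ratio_flag(delayed_directions: List[int], threshold: float = 0.58) -> int:
    """Flag bit: 1 if Hamming distance / sequence length exceeds the threshold."""
    if not delayed_directions:
        return 0
    changes = direction_change(delayed_directions)
    distance = sum(d != c for d, c in zip(delayed_directions, changes))
    return int(distance / len(delayed_directions) > threshold)
```

For a delay packet direction sequence made only of “1” values, the change sequence is all “0”, so the ratio is 1.0 and the flag is set.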
3.1.3. Analysis of Two-Stage Session Association Features of Vulnerability Exploitation
- (1) Using the reverse shell session features described in Section 3.1.2, we can identify the IP address of the suspicious target machine. Because a shellcode delivery session must be bound to the target, the first step is to select all packets whose destination is the target’s IP address;
- (2) Chronologically, the shellcode execution phase always occurs after the shellcode delivery phase, which means the message carrying the shellcode must appear before the first SYN message of the three-way handshake of the reverse shell session. The interval between the execution of the shellcode and the establishment of the reverse shell connection is usually less than 0.1 s. Allowing for delays caused by network transmission, we select the packets by timestamp and further analyze the suspicious sessions.
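A minimal sketch of these two filtering steps, assuming packets are available as simple records. The `PacketRecord` type, the function name, and the `window` parameter are hypothetical; in particular, the window length is a placeholder for the margin that allows for network delay, which the text does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class PacketRecord:
    """Hypothetical packet summary used only for this illustration."""
    timestamp: float   # capture time in seconds
    src: str           # source IP address
    dst: str           # destination IP address


def candidate_delivery_packets(packets: List[PacketRecord], target_ip: str,
                               reverse_shell_syn_time: float,
                               window: float = 5.0) -> List[PacketRecord]:
    """Keep packets sent to the target shortly before the reverse shell SYN.

    Step (1): the destination must be the suspicious target.
    Step (2): the packet must precede the first SYN of the reverse shell
    session; `window` (seconds) is an assumed look-back margin.
    """
    start = reverse_shell_syn_time - window
    return [p for p in packets
            if p.dst == target_ip and start <= p.timestamp < reverse_shell_syn_time]
```

The surviving packets can then be grouped into sessions for the shellcode delivery analysis.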
3.1.4. Session Feature Analysis in Shellcode Delivery Phase
- (1) Direction of TCP connection establishment. In the shellcode delivery phase, the attacker has usually already gathered service information about the target, so it is reasonable for the attacker to establish the TCP connection. In contrast, a legitimate client sends the first SYN message itself, whether the user is browsing web pages or transferring files. Therefore, the direction in which the TCP session is established can be crucial for identifying exploit traffic. In the real world, attackers also use social engineering to induce users to click malicious links and actively establish TCP connections with malicious servers to obtain shellcode. To bound the scope of this paper, we assume the attacker always initiates the TCP session. If the target receives the first SYN message in a session, proceed to step (2). Otherwise, the session is considered benign traffic;
- (2) The direction of data flow. In the shellcode delivery phase, the attacker sends the shellcode to the target, and the target usually only sends an ACK message. Therefore, data flows mainly from the attacker to the target. We set the timestamp of the first SYN message in the three-way handshake of the reverse shell session as the cut-off point. If the data flows to the target in the last interaction before this time point, proceed to step (3). Otherwise, the session is considered benign traffic;
- (3) Number of response messages. From our analysis of exploit traffic, we infer that after the attacker sends shellcode packets, the target either sends no response message or sends multiple response packets. Legitimate programs on the target always send one or more response packets after receiving a request from the client, unless they receive a FIN or RST message that closes the connection. Therefore, we base the verdict on the number of response messages sent by the target in the last interaction before the cut-off time. If the target sends no response while the connection remains open, the session is a suspicious shellcode delivery session. If there are multiple response messages, proceed to step (4). Otherwise, the session is considered benign traffic;
- (4) Packet length distribution. After the shellcode is delivered and executed successfully, the target may return content containing information about itself. This resembles the response messages sent by legitimate programs, but the payload length distributions differ. The former depends on the specific command sent by the attacker, so the payload lengths vary. The latter is generally split into fragments of the maximum payload length to improve transmission efficiency, so every fragment except the last has the same number of bytes. We therefore infer the type of a session from whether the payload lengths of the response packet fragments are uniform. If they conform to this general fragmentation rule, the session is benign traffic. Otherwise, it is a suspicious shellcode delivery session.
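The four checks above form a short decision procedure, sketched below in Python. This is an illustrative reading of the steps, not the authors' code: the `DeliverySession` summary and its field names are hypothetical, and the uniformity test adds the assumption that the final fragment is no longer than the others.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DeliverySession:
    """Hypothetical summary of one candidate session before the cut-off time."""
    target_received_first_syn: bool   # step (1)
    last_data_to_target: bool         # step (2)
    connection_still_open: bool       # no FIN or RST seen before the cut-off
    response_payload_lengths: List[int] = field(default_factory=list)  # target's responses


def uniform_fragmentation(lengths: List[int]) -> bool:
    """True if all fragments except the last share one payload length.

    Assumption: the last fragment must not be longer than the others.
    """
    body = lengths[:-1]
    if not body:
        return True
    return all(n == body[0] for n in body) and lengths[-1] <= body[0]


def classify_delivery_session(s: DeliverySession) -> str:
    """Return "suspicious" or "benign" following steps (1) to (4)."""
    if not s.target_received_first_syn:      # step (1): the target initiated the session
        return "benign"
    if not s.last_data_to_target:            # step (2): data did not flow to the target
        return "benign"
    count = len(s.response_payload_lengths)  # step (3): number of responses
    if count == 0:
        return "suspicious" if s.connection_still_open else "benign"
    if count == 1:
        return "benign"
    # step (4): multiple responses, so compare the fragment lengths
    return "benign" if uniform_fragmentation(s.response_payload_lengths) else "suspicious"
```

For example, three responses of 1460, 1460, and 312 bytes look like ordinary fragmentation and are treated as benign, while responses of 37, 512, and 90 bytes are flagged.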
3.2. Overview of ETDetector
3.2.1. Data Preprocessing
3.2.2. Feature Extraction
3.2.3. Abnormal Detection
4. Evaluation
- RQ1: Can the decision tree model identify reverse shell traffic regardless of whether it is encrypted?
- RQ2: Can ETDetector detect the traffic of the shellcode delivery stage based on reverse shell traffic?
4.1. Experimental Setup
4.1.1. Data Collection
4.1.2. Evaluation Metrics
- TP (true positive): The current traffic is predicted to be reverse shell traffic, and it is reverse shell traffic.
- FN (false negative): The current traffic is predicted to be normal, but it is reverse shell traffic.
- FP (false positive): The current traffic is predicted to be reverse shell traffic, but it is normal traffic.
- TN (true negative): The current traffic is predicted to be normal, and it is normal traffic.
- M: The number of code execution vulnerability exploitation behaviors detected in the defined time T.
- MC: The number of exploitation behaviors detected under traffic encryption conditions in the defined time T.
- MN: The number of exploitation behaviors detected without encryption in the defined time T.
- N: The total number of occurrences of code execution vulnerability exploitation behavior in the defined time T.
- NC: The number of occurrences under traffic encryption conditions in the defined time T.
- NN: The number of occurrences without encryption in the defined time T.
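The quantities above combine into the usual classification scores. The sketch below uses the standard definitions of accuracy, precision, recall, and the F-beta score; the `detection_rate` helper, which divides detected behaviors by occurrences (for example M by N), is an assumption about how the counts M, MC, and MN are meant to be used. All function names are hypothetical.

```python
from typing import Dict


def _ratio(numerator: float, denominator: float) -> float:
    """Safe division that returns 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp: int, fn: int, fp: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision, recall, F1, and F2 from the confusion counts."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)

    def f_score(beta: float) -> float:
        # F-beta weights recall beta times as heavily as precision.
        b2 = beta * beta
        return _ratio((1 + b2) * precision * recall, b2 * precision + recall)

    return {
        "accuracy": _ratio(tp + tn, tp + fn + fp + tn),
        "precision": precision,
        "recall": recall,
        "f1": f_score(1.0),
        "f2": f_score(2.0),
    }


def detection_rate(detected: int, occurred: int) -> float:
    """Assumed ratio of detected to occurred behaviors, e.g. M / N."""
    return _ratio(detected, occurred)
```

With nine of ten simulated exploits detected, `detection_rate(9, 10)` gives 0.9.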
4.2. Experimental Design Scheme
4.2.1. Comparison of Detection Results of Mixed Traffic in Three Proportions
- (1) The number distribution sequence of delay packets in the reverse flow within the first three minutes. Figure 3 shows that, compared with benign traffic, reverse shell traffic contains more delay packets in the reverse direction. Once attackers obtain a reverse shell, they typically start executing commands immediately, which produces a series of delay packets caused by the thinking time. Moreover, attackers usually wait for the command execution result before sending new commands. Therefore, the transmission rate of the delay packets is limited, and they are relatively dispersed over a given period. In contrast, the interaction of benign programs is almost fully automatic, which makes delay packets rare. Even when delay packets occur, they are concentrated in short intervals because of the sending rate of the programs. Therefore, the number of delay packets in each interval can effectively distinguish traffic generated by human operation from traffic generated automatically, reducing the possibility of false positives;
- (2) Whether the packet length sequences of the delayed packets and non-delayed packets obey the same distribution. Figure 4 shows the K-S test results for the packet length sequences of the delayed and non-delayed packets, where the red dashed line marks a K-S test p-value of 0.05. The p-values of the reverse shell traffic sessions are below 0.05, meaning the two sequences do not obey the same distribution, whereas 98% of benign traffic sessions show the opposite. Packets sent by benign programs follow the fragmentation rule to achieve maximum transmission efficiency, but reverse shell traffic does not. Delay packets bearing commands have a short payload, and non-delayed packets delivering shellcode have more bytes, causing a relatively large difference in the packet length distribution. We judge whether the packet length sequences in the bidirectional flows obey the same distribution with the K-S test because it is one of the most popular non-parametric methods and is sensitive to differences in both the location and the shape of the empirical distribution functions of the samples. When the p-value is less than 0.05, we reject the null hypothesis, meaning the two packet length sequences do not obey the same distribution;
- (3) The difference between the packet direction sequence of the delayed packets and the packet direction change sequence. Figure 5 shows the ratio of the Hamming distance to the packet sequence length. Compared with benign traffic, the delay packet direction sequence of reverse shell traffic differs more from its packet direction change sequence. Because commands always travel from the attacker to the target in reverse shell interaction, delay packets carrying commands always belong to the reverse flow, and the direction changes infrequently. We let the direction change sequence start with “0”; for each subsequent packet, we record “1” if its direction differs from that of the previous packet and “0” otherwise. Therefore, the direction change sequence of the delay packets in reverse shell sessions consists almost entirely of “0”. According to Section 3.1.1, “1” appears continuously in the direction sequence of the reverse shell traffic delay packets. We infer that the distribution of the above sequences is symmetrical, and the Hamming distance is the best way to describe the difference between the two sequences. Because the length of the packet sequence may cause interference, we use the ratio of the Hamming distance to the sequence length as the basis for judgment. Figure 5 shows that the ratio for reverse shell traffic is always greater than 0.58, whereas the ratio for 99% of benign traffic is not. Therefore, reverse shell traffic can be detected easily with a threshold of 0.58.
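The K-S decision in item (2) can be sketched with the Python standard library alone. This is an approximation under stated assumptions rather than the authors' implementation: the p-value uses the asymptotic Kolmogorov series, which is rough for short sequences, and the function names are hypothetical. Where SciPy is available, `scipy.stats.ks_2samp` is the usual choice.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample K-S statistic: the largest gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    gap = 0.0
    for x in a_sorted + b_sorted:
        fa = bisect_right(a_sorted, x) / len(a_sorted)
        fb = bisect_right(b_sorted, x) / len(b_sorted)
        gap = max(gap, abs(fa - fb))
    return gap


def ks_pvalue_asymptotic(d: float, n: int, m: int, terms: int = 100) -> float:
    """Asymptotic p-value from the Kolmogorov distribution (rough for small samples)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.1:  # the series converges slowly here, and the p-value is close to 1
        return 1.0
    total = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


def length_distribution_flag(delayed: Sequence[float], non_delayed: Sequence[float],
                             alpha: float = 0.05) -> int:
    """Return 1 if the two signed length sequences differ (p < alpha), else 0."""
    if not delayed or not non_delayed:
        return 0  # assumption: too little data to reject the null hypothesis
    d = ks_statistic(delayed, non_delayed)
    return int(ks_pvalue_asymptotic(d, len(delayed), len(non_delayed)) < alpha)
```

Short, negatively signed command lengths compared with long, positively signed response lengths give a statistic of 1.0 and set the flag; which direction receives the negative sign is an assumption of the example.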
4.2.2. Comparison of Detection Results before and after Feature Fusion
4.2.3. Comparison of Detection Results of Exploit Traffic
4.2.4. The Stratification Results of Detected Exploit Traffic
5. Conclusions
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
References
- Kong, D.; Tian, D.; Pan, Q.; Liu, P.; Wu, D. Semantic aware attribution analysis of remote exploits. Secur. Commun. Netw. 2013, 6, 818–832.
- Wu, J.; Arrott, A.; Osorio, F.C.C. Protection against remote code execution exploits of popular applications in Windows. In Proceedings of the IEEE 9th International Conference on Malicious and Unwanted Software: The Americas (MALWARE), Fajardo, PR, USA, 28–30 October 2014.
- Parrend, P.; Navarro, J.; Guigou, F.; Deruyver, A.; Collet, P. Foundations and applications of artificial Intelligence for zero-day and multi-step attack detection. EURASIP J. Inf. Secur. 2018, 2018, 4.
- Homoliak, I.; Teknos, M.; Ochoa, M.; Breitenbacher, D.; Hosseini, S.; Hanacek, P. Improving Network Intrusion Detection Classifiers by Non-payload-Based Exploit-Independent Obfuscations: An Adversarial Approach. EAI Endorsed Trans. Secur. Saf. 2018, 5, 156245.
- Chen, L.; Sultana, S.; Sahita, R. Henet: A deep learning approach on intel® processor trace for effective exploit detection. In Proceedings of the IEEE Security and Privacy Workshops (SPW), SP Workshops 2018, San Francisco, CA, USA, 24 May 2018.
- Biswas, S.; Sohel, M.; Sajal, M.M.; Afrin, T.; Bhuiyan, T.; Hassan, M.M. A study on remote code execution vulnerability in web applications. In Proceedings of the International Conference on Cyber Security and Computer Science (ICONCS 2018), Karabuk, Turkey, 18–20 October 2018.
- Mokbal, F.M.; Dan, W.; Imran, A.; Jiuchuan, L.; Akhtar, F.; Xiaoxi, W. MLPXSS: An integrated XSS-based attack detection scheme in web applications using multilayer perceptron technique. IEEE Access 2019, 7, 100567–100580.
- Polychronakis, M.; Anagnostakis, K.G.; Markatos, E.P. Network-level polymorphic shellcode detection using emulation. J. Comput. Virol. 2007, 2, 257–274.
- Borders, K.; Prakash, A.; Zielinski, M. Spector: Automatically analyzing shell code. In Proceedings of the 23rd Annual Computer Security Applications Conference (ACSAC 2007), Miami Beach, FL, USA, 10–14 December 2007.
- Kanemoto, Y.; Aoki, K.; Iwamura, M.; Miyoshi, J.; Kotani, D.; Takakura, H.; Okabe, Y. Detecting successful attacks from IDS alerts based on an emulation of remote shellcodes. In Proceedings of the IEEE 43rd Annual Computer Software and Applications Conference (COMPSAC), Milwaukee, WI, USA, 15–19 July 2019.
- Pratomo, B.A.; Burnap, P.; Theodorakopoulos, G. Blatta: Early exploit detection on network traffic with recurrent neural networks. Secur. Commun. Netw. 2020, 2020, 8826038.
- Pratomo, B. Low-Rate Attack Detection with Intelligent Fine-Grained Network Analysis. Ph.D. Thesis, Cardiff University, Cardiff, UK, 2020.
- Foster, J.C. Sockets, Shellcode, Porting, and Coding: Reverse Engineering Exploits and Tool Coding for Security Professionals; Elsevier: Amsterdam, The Netherlands, 2005.
- Cheng, T.H.; Lin, Y.D.; Lai, Y.C.; Lin, P.C. Evasion techniques: Sneaking through your intrusion detection/prevention systems. IEEE Commun. Surv. Tutor. 2011, 14, 1011–1020.
- Noman, H.A.; Al-Maatouk, Q.; Noman, S.A. Design and Implementation of a Security Analysis Tool that Detects and Eliminates Code Caves in Windows Applications. In Proceedings of the International Conference on Data Analytics for Business and Industry, Virtual, 25–26 October 2021.
- Yadav, T.; Rao, A.M. Technical aspects of the cyber kill chain. In Proceedings of the International Symposium on Security in Computing and Communication, Kochi, India, 10–13 August 2015.
- Stipovic, I. Antiforensic techniques deployed by custom developed malware in evading anti-virus detection. arXiv 2019, arXiv:1906.10625.
- Leka, C.; Ntantogian, C.; Karagiannis, S.; Magkos, E.; Verykios, V.S. A Comparative Analysis of VirusTotal and Desktop Antivirus Detection Capabilities. In Proceedings of the 2022 13th International Conference on Information, Intelligence, Systems & Applications, IISA 2022, Corfu, Greece, 18–20 July 2022.
- Denis, M.; Zena, C.; Hayajneh, T. Penetration testing: Concepts, attack methods, and defense strategies. In Proceedings of the IEEE Long Island Systems, Applications, and Technology Conference, LISAT 2016, Farmingdale, New York, NY, USA, 29 April 2016.
- Rahalkar, S.; Jaswal, N. The Complete Metasploit Guide: Explore Effective Penetration Testing Techniques with Metasploit; Packt Publishing Ltd.: Birmingham, UK, 2019.
Scene | Traffic Type (Number) | Use |
---|---|---|
Get online legally | Background traffic (38,512) | 18% for training and validation (6847) and 82% for testing in Experiments 1 and 2 (31,665) |
Non-encrypted reverse shell | Non-encrypted reverse shell traffic (77) | 72% for training and validation (56) and 28% for testing in Experiments 1 and 2 (21) |
Encrypted reverse shell | Encrypted reverse shell traffic (39) | Used for testing in Experiment 2 |
Realistic attack scenario | Non-encrypted reverse shell traffic (5); encrypted reverse shell traffic (5) | Used for testing in Experiment 3 |
Ratio | Accuracy | Precision | Recall | F1-Score | F2-Score |
---|---|---|---|---|---|
1:50 | 0.995 | 0.942 | 0.82 | 0.876 | 0.841 |
1:100 | 0.995 | 0.835 | 0.671 | 0.703 | 0.679 |
1:300 | 0.998 | 0.785 | 0.720 | 0.717 | 0.715 |
Feature Vector Type | Mean Time Required/s (1:50) | Mean Time Required/s (1:100) | Mean Time Required/s (1:300) |
---|---|---|---|
Packet delay sequence | 3.614 | 4.384 | 7.868 |
Packet direction sequence | 3.392 | 4.120 | 7.105 |
Packet length sequence | 3.505 | 4.292 | 7.335 |
Packet delay + packet direction sequence | 3.587 | 4.364 | 7.901 |
Packet length + packet direction sequence | 3.601 | 4.185 | 7.368 |
Packet delay + packet length sequence | 6.522 | 7.716 | 14.36 |
Packet delay + packet length + packet direction sequence | 6.424 | 7.943 | 13.992 |
Fusion information feature vector | 0.863 | 0.949 | 1.379 |
Vulnerability Type, CVE, and Shell Type | Encrypted | Reference [11] | Ours |
---|---|---|---|
Remote code execution vulnerability, CVE-2021-22205, bash | ✗ | ✓ | ✓ |
Buffer overflow vulnerability, MS17-010, Python | ✗ | ✓ | ✓ |
Remote code execution vulnerability, CVE-2020-17530, Perl+OpenSSL | ✓ | ✓ | ✓ |
Remote code execution vulnerability, CVE-2019-15107, linux/x64/shell/reverse_tcp | ✗ | ✓ | ✓ |
Buffer overflow vulnerability, MS17-010, windows/shell_reverse_tcp | ✗ | ✓ | ✓ |
Buffer overflow vulnerability, CVE-2017-7494, python/shell_reverse_tcp_ssl | ✓ | ✗ | ✓ |
Remote code execution vulnerability, CVE-2019-0708, windows/Meterpreter/reverse_tcp | ✓ | ✗ | ✓ |
Buffer overflow vulnerability, CVE-2017-7494, linux/x64/Meterpreter/reverse_tcp | ✓ | ✗ | ✓ |
Remote code execution vulnerability, CVE-2019-0708, windows/Meterpreter/reverse_http | ✓ | ✗ | ✗ |
Remote code execution vulnerability, CVE-2019-0708, windows/Meterpreter/reverse_tcp_rc4 | ✓ | ✗ | ✓ |
© 2023 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Share and Cite
Liu, Y.; Cai, R.; Yin, X.; Liu, S. An Exploit Traffic Detection Method Based on Reverse Shell. Appl. Sci. 2023, 13, 7161. https://doi.org/10.3390/app13127161