1. Introduction
In recent years, the Advanced Persistent Threat (APT) [1] has become one of the most serious forms of cyber attack. It steals confidential information from, or undermines the information systems of, a particular organization or company. As a kind of high-latency, highly concealed and highly harmful malware, the Trojan plays an indispensable role in APT attacks. In particular, the Remote Access Trojan (RAT) is often used by APT attackers because it gives them interactive access to a victim’s computer and lets them steal confidential data [2,3]. A RAT is often implanted into a system through an email attachment, a USB memory device or a bundled file, or in combination with the exploitation of a zero-day vulnerability. An implanted RAT is hard for ordinary users to detect, and even administrators find such malware difficult to discover.
Given the serious security threat posed by RATs, a growing body of research has focused on efficient RAT detection methods to limit this damage. Existing RAT detection methods can be divided into two categories [4]: host-based detection methods and network-based detection methods. Host-based methods that employ static analysis rely largely on syntactic signatures or semantic features. However, static analysis alone might not be sufficient to identify a RAT [5]. Host-based methods that employ dynamic analysis are a good complement to static analysis, but they have difficulty detecting a RAT in its keep-alive state. Network-based detection methods usually identify RATs by checking the payload of network traffic, analyzing the statistical characteristics of network traffic, or analyzing the behaviour of network traffic. Methods that analyze the payload need a signature database of RAT network traffic to be established in advance, and thus cannot detect unknown RATs. Methods that analyze communication behaviour can detect some unknown RATs because they focus on the differences in communication behaviour between RATs and benign programs. Unfortunately, many benign programs, such as P2P software, have network communication characteristics similar to those of RATs.
Because the two approaches complement each other’s strengths, a hybrid method that combines host-based and network-based detection is feasible. Based on this idea, we propose a phased RATs detection method with double-sides features (PRATD). In PRATD, the network traffic and host operation records of a program are extracted separately but at the same time, and the extracted double-sides (host-side and network-side) features are then combined into one feature vector. In addition, a separate detection model is trained for each of the two running states of a RAT, i.e., keep-alive and command & control. Notably, PRATD neither collects the payload of the packets transmitted by a program nor monitors the concrete application data used by the program, i.e., it does not threaten the privacy of the user.
The main contributions of this article can be summarized as follows:
We combine both host-side and network-side features to detect RATs. Since a RAT performs operations on the network-side or the host-side throughout its running process, this article extracts features from both sides to cover the RAT’s network and host behaviours, which helps to distinguish RATs from benign programs.
We propose a phased method named PRATD for detecting RATs based on the double-side features and the two major states of a RAT. A RAT has different behaviour characteristics in different states, so we train a detection model for each of the two major states (i.e., keep-alive and command & control) to improve the detection performance.
We implement a prototype system and evaluate PRATD on different kinds of benign programs and RATs. The experimental results show that PRATD obtains good detection performance; for example, its True Positive Rate (TPR) and False Positive Rate (FPR) for known RATs are 93.609% and 0.407%, respectively, when AdaBoost is used to train its two detection models.
The article is organized as follows. Section 2 summarizes the related works. In Section 3, we analyze the runtime states of RATs. Section 4 describes the features used by PRATD and introduces the details of the method. In Section 5, we describe the experimental data and the experimental results. Section 6 discusses our results. Finally, Section 7 concludes the article.
2. Related Works
As a highly harmful kind of malware, the RAT has received extensive attention from researchers. Farinholt et al. [6] and Rezaeirad et al. [3] try to reveal the operators and procedures of RATs. However, they concentrate on analysing only one or two RAT families and do not propose any detection method. To detect Trojans, many detection methods have been proposed in the past few decades. Based on the objects to be monitored, these methods can be divided into two categories, i.e., host-based and network-based methods. For host-based methods, two types of technologies are currently used, namely, static and dynamic analysis. Static analysis examines an executable program before it is executed. The most widely used static technology is the traditional signature-based technique [7], but it can easily be evaded by obfuscation techniques [8]. Instead of using only signatures, some researchers detect Trojans by employing the opcode sequences of the executable program. In [9], opcodes and API function names are used to classify malware. Some methods based on malware images and deep learning [10,11] can also be classified as static analysis methods. They convert malware binaries into images and then use deep learning for malware classification. In general, static detection is fast, but it has difficulty analyzing malware that uses code obfuscation techniques (e.g., UPX, PEX, ASPack and FSG) [12]. Dynamic analysis, by contrast, analyzes the behaviours of a program while it is running. In recent years, information collected while the program is running, such as API call sequences [13], system logs [8], CPU usage [14], process behaviours [15] and system provenance graphs [16], has been used in some dynamic analysis methods. For example, Yang et al. [8] propose an instrumentation-free RAT forensic system that can reconstruct a RAT attack by using the system logs on the Windows platform. However, most dynamic detection methods can achieve a high TPR only when the RAT has performed many operations.
A growing number of researchers are paying attention to network-based RAT detection. The most widely used detection technique is Deep Packet Inspection (DPI) [17], which distinguishes between benign programs and malware by analyzing whether the payload of each network packet contains sensitive information. This technique can effectively identify abnormal traffic and has been deployed in many Network Intrusion Detection Systems (NIDS). However, this traditional technique not only needs its detection rules to be maintained continuously but also has to inspect the content of the payload [18], so it often has a high False Negative Rate (FNR) and threatens users’ privacy. To overcome the limitations of DPI, many researchers focus on the communication characteristics of RATs and detect RATs based on different network features [4,19,20,21]. In [19], the authors pay attention to the early stage of RAT communication and propose a method that detects RATs by analyzing their behaviours in that stage. Xie et al. [21] extract packet-level and flow-level features to form two feature vectors, and then build a hybrid-structure neural network model based on deep learning to detect HTTP-based Trojans. To improve the detection result, Pallaprolu et al. [22] detect RATs based on the voting result of three different classifiers. It can be seen that research on network-based RAT detection focuses on analyzing the statistical characteristics of the network traffic of RATs. However, some benign software, especially P2P software, has network behaviour similar to that of RATs. Therefore, network-based RAT detection methods usually suffer from a high FPR.
Host-based and network-based methods each have a distinct analysis object and distinct advantages and disadvantages. Therefore, a method that integrates host and network data can benefit from the strengths of both. In the domain of bot/botnet detection, a few methods that detect bots/botnets based on multiple sources have been proposed recently. Zeng et al. [23] proposed the first botnet detection method that combines host-level and network-level information. This method detects the behaviours of bots on the host-side and the network-side separately. When its network-side analyzer detects an abnormality, the method triggers the correlation engine to correlate the host-side and network-side detection results and then obtains the final detection result. This method targets botnet detection and relies on the assumption that similarities exist among bots, so it is not well suited to RAT detection. Shin et al. [24] proposed a host-network cooperative framework for bot malware detection. This method correlates information from different host-level and network-level aspects and performs heavy monitoring only when necessary. One of its core modules assumes that bots use DNS to contact their master, so this method is also not suitable for RAT detection. Kalpika and Vasudevan [25] proposed a system that consists of folder monitoring, network traffic monitoring and API hook monitoring for detecting the Zeus bot. This system triggers an alert for the presence of the Zeus bot if the three conditions regarding a specific folder, network traffic and API hooks are all satisfied. Ahmed A.awad et al. [26] introduced a machine learning-based framework for detecting compromised hosts and networks that are infected by RAT-Bots. This method relies heavily on the host agent because its network agent does not start to run until it receives the alarm sent by the host agent. However, it is difficult for the host agent to obtain a very high TPR because modern RATs use various concealment technologies. Different from the above methods, the proposed PRATD focuses on RAT detection and simultaneously analyzes the behaviours of a RAT on the host-side and the network-side in both the keep-alive and command & control states, which contributes to high detection accuracy.
3. RAT Runtime State Analysis
Generally, RATs are based on the C/S architecture, and thus a RAT consists of a client and a server. The client is controlled by the attacker and the server is implanted into the victim computer. The attacker uses the client to control the server and thereby remotely control the victim host. In early RATs, the client actively connected to its server, but this kind of RAT is easy to detect because many security devices strictly check incoming traffic. For concealment, RATs in which the server actively connects to the client have been widely used recently. As shown in Figure 1, the whole communication of a RAT includes three runtime states: connection establishment, keep-alive and command & control. Of the three states, connection establishment comes first and lasts only a short time, while the keep-alive and command & control states appear alternately and repeatedly [19]. This article focuses on the keep-alive and command & control states because they take up most of the runtime of a RAT. Next, we introduce the details of these two states.
3.1. Keep-Alive State
The keep-alive state occupies the longest period of the runtime of a RAT, and it is also the state that changes the least. When the server of a RAT does not receive a command request within a certain amount of time, it actively sends a keep-alive request to the client to inform the hacker that it is online. On the one hand, since the keep-alive request is small, the network packets that carry it often contain only a few bytes and mostly the same content. On the other hand, the server of a RAT in the keep-alive state performs few actions on the host, since it does not execute substantial operations without receiving a command request from the client. Therefore, from the perspective of detection, the size and content of the network packets and the number of host operation records can be used to identify the behaviours of a RAT in its keep-alive state. However, some benign programs, such as instant messaging software, use the same kind of keep-alive communication, so it is difficult to distinguish these benign programs from RATs by using only network-side features in this state. For this reason, it is reasonable to add host-side features in order to detect the behaviours of a RAT in its keep-alive state more effectively.
3.2. Command & Control State
Command & control is the state in which the client and server of a RAT interact most frequently. After the client and server successfully establish a connection, the hacker uses the client to send command requests to the server as needed. After receiving a command, the server of the RAT completes the requested task quickly and returns the result to the client. Firstly, since attackers often use commands for stealing the victim’s data, the server of the RAT usually sends the corresponding data to the client in the command & control state, so the RAT’s connections in this state often carry large amounts of upload data or packets. Secondly, there is a think time after the hacker receives the result of a command: the next command is not sent until the hacker makes a decision, which leads to a relatively long interval between one request and the next in this state. Thirdly, because the server needs to use system resources to complete the requested command and send the result to the client, many host and network records are generated on the victim host in this state. Therefore, from the perspective of detection, it is important to pay attention to the network traffic and host behaviours related to the execution of the RAT’s commands.
As mentioned above, the client and server of a RAT interact in both the keep-alive and command & control states, and as a result, network traffic and host operation records related to the actions of the RAT are generated. Therefore, both the network traffic and the host operation records are useful for detecting RATs. Moreover, the network traffic and operation records of a RAT differ between the keep-alive and command & control states. In the keep-alive state, the amounts of network traffic and operation records generated by a RAT during different periods are usually the same, while in the command & control state they often differ. Based on these observations, it is reasonable to detect RATs by using the double-side features in each of the keep-alive and command & control states. Following this line of thought, we propose PRATD and give its details in Section 4.
4. PRATD: A Phased RATs Detection Method with Double-Sides Features
To describe the proposed PRATD, we first introduce its architecture. Then, the features used in the PRATD are described. Finally, we explain the details of the model training and detection.
4.1. Architecture
The intuition behind PRATD is that a RAT not only transmits information over the network but also uses host resources while it is running. Besides, as mentioned above, a RAT has two main runtime states, i.e., keep-alive and command & control, and we find that its host and network behaviours differ between these two states. Therefore, we combine the host-side and network-side features and train two detection models separately for these two states. As shown in Figure 2, the proposed PRATD can be divided into three parts: the feature extraction phase, the model training phase and the detection phase. In the feature extraction phase, the host-side and network-side features are first extracted separately and then combined into one feature vector. In the model training phase, the feature vectors are labelled and used as two training sets to build, with machine learning algorithms, two detection models for the keep-alive and command & control states of a RAT. In the detection phase, the feature vectors of the test set are examined by the built detection models to determine whether they belong to the behaviours of a RAT. More details are described next.
4.2. Feature Extraction
A communication session represents all information in a communication interaction between two parties. It is usually determined by a 5-tuple (source IP, source port, destination IP, destination port, protocol). However, communications initiated to the same destination address (including the destination IP and destination port) over a period of time are often initiated by the same process. Therefore, in this article, we define a session as all network traffic that has the same destination address in a time-window, which is set to five minutes. Specifically, a session consists of four elements: source IP, destination IP, destination port, and protocol. To obtain a good detection performance, both host-side and network-side features are used in this article to detect RATs. In this research, seven network session features and 10 host process features are respectively extracted from network and host.
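The session definition above can be illustrated with a minimal sketch. It assumes that captured packets have already been normalised so that the monitored host is always the "host" side, and that time-windows are fixed five-minute intervals counted from the start of the capture. The `Packet` record, its field names and the `group_sessions` helper are hypothetical and are not taken from the PRATD prototype.

```python
from collections import defaultdict
from typing import NamedTuple

WINDOW_SECONDS = 5 * 60  # five-minute time-window, as stated in the text


class Packet(NamedTuple):
    """Hypothetical packet record; all field names are illustrative."""
    timestamp: float   # seconds since the start of the capture
    host_ip: str       # IP of the monitored host (session source IP)
    host_port: int     # port on the monitored host (session source port)
    remote_ip: str     # session destination IP
    remote_port: int   # session destination port
    protocol: str      # "TCP" or "UDP"
    size: int          # packet size in bytes
    upload: bool       # True if the packet was sent by the monitored host


def group_sessions(packets):
    """Group packets into sessions keyed by time-window and 4-tuple.

    The key is (window index, source IP, destination IP, destination port,
    protocol); the source port is deliberately left out of the key, so all
    sub-connections to one destination fall into the same session.
    """
    sessions = defaultdict(list)
    for p in packets:
        window = int(p.timestamp // WINDOW_SECONDS)
        key = (window, p.host_ip, p.remote_ip, p.remote_port, p.protocol)
        sessions[key].append(p)
    return dict(sessions)
```

Leaving the source port out of the key is what lets the number of source ports later serve as a per-session feature.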
4.2.1. Network-Side Features Extraction
To analyze the network behaviour of RAT, we mainly monitor network layer and transport layer protocols. Given that the most widely used transport layer protocols are TCP and UDP, this article focuses solely on the sessions using TCP or UDP. Then, we extract features from the network sessions. Compared to most benign programs, RATs may have less communication traffic during communication in order to maintain the characteristics of concealment. To test this idea and investigate the different network behaviours of benign programs and RATs, we used five benign programs and three RATs. These five benign programs belong to five different types of applications including browsers, instant messaging software, video software, collaborative office software, and download software. The three RATs are darkcomet, njrat and vantom, which are well-known and widely used. We collected network traffic of the eight programs separately in a pure Windows 7 environment and compared them from four different perspectives by calculating the mean of five time-windows. Specifically, the RATs exhibited the following network characteristics during communication: (1) The RATs generally communicate with the target host with fewer source ports than benign programs in order to maintain characteristics of concealment. As shown in
Figure 3a, the three RATs usually use fewer source ports than the five benign programs. (2) For the servers of the RATs, the packets they receive carry command and control information, so these packets are usually small.
Figure 3b shows the ratios of small download packets to all download packets of the eight processes, and it can be seen that the ratio of RAT is usually greater than that of benign programs. (3) After receiving the result of a request command, the hacker often needs a period of think time to decide its next move, thus there is usually a longer interval between a request and the next request than many benign programs.
Figure 3c exhibits the average intervals of communication interaction of the eight processes. (4) For the servers of the RATs, the size of upload-side data is generally larger than the download-side data. The ratios of the upload data to the download data of the eight processes are given by
Figure 3d, and it can be observed that the ratio for the RATs is usually greater than 1.0. Note that in Figure 3 a few benign programs exhibit network characteristics similar to those of the RATs.
Based on the above network characteristics of the RATs, we have extracted seven network features, described in detail as follows:
The number of source ports used in a session, also known as the number of sub-connections.
The number of small download packets in a session. This article regards a packet as small when its size is less than a threshold, which is set to 70 bytes.
The average interaction time over the multiple interactions in a session.
The ratio of the number of small download packets to the number of all download packets in a session.
The ratio of the number of upload packets to the number of download packets in a session.
The ratio of the size of all upload data to the size of all download data in a session.
The transport layer protocol used by the session.
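As a rough illustration of how the seven network-side features could be computed from one session, consider the following sketch. Several choices here are assumptions rather than details given in the article: the average interaction time is approximated by the mean gap between consecutive download packets, a ratio with a zero denominator is reported as 0.0, and the `Packet` record and feature names are hypothetical.

```python
from statistics import mean
from typing import NamedTuple

SMALL_PACKET_BYTES = 70  # small-packet threshold stated in the text


class Packet(NamedTuple):
    """Hypothetical packet record; all field names are illustrative."""
    timestamp: float   # seconds
    host_port: int     # source port on the monitored host
    size: int          # bytes
    upload: bool       # True if sent by the monitored host
    protocol: str      # "TCP" or "UDP"


def _ratio(numerator, denominator):
    # Assumption: an undefined ratio is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def network_features(session):
    """Compute the seven network-side features for one session."""
    down = sorted((p for p in session if not p.upload),
                  key=lambda p: p.timestamp)
    up = [p for p in session if p.upload]
    small_down = [p for p in down if p.size < SMALL_PACKET_BYTES]
    # Assumption: one "interaction" starts at each download packet.
    gaps = [b.timestamp - a.timestamp for a, b in zip(down, down[1:])]
    return {
        "n_source_ports": len({p.host_port for p in session}),
        "n_small_download": len(small_down),
        "avg_interaction_time": mean(gaps) if gaps else 0.0,
        "ratio_small_download": _ratio(len(small_down), len(down)),
        "ratio_up_down_packets": _ratio(len(up), len(down)),
        "ratio_up_down_bytes": _ratio(sum(p.size for p in up),
                                      sum(p.size for p in down)),
        "protocol": session[0].protocol if session else None,
    }
```

For a session with two small commands received 30 seconds apart and two large uploads in between, this sketch yields a small-download ratio of 1.0 and an upload-to-download size ratio well above 1.0, which matches the RAT-like pattern described for Figure 3.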
4.2.2. Host-Side Features Extraction
From the observations in Section 4.2.1, it can be found that using only network-side features to detect RATs may lead to some mistakes. Intuitively, RATs not only generate network traffic but also leave traces on the host when they are running, because they need system resources to complete their functions. In contrast to most benign programs, RATs usually use fewer security-critical system resources most of the time because they need to remain concealed. Therefore, adding host-side features helps to distinguish between benign programs and RATs. Based on the analysis of different programs, we regard files, network connections, the registry and processes as security-critical system resources. As in Section 4.2.1, we compared several host operations of the same eight programs in a clean Windows 7 environment. The number of file creation records, the number of network connection creations, the number of registry operations (including creating, deleting and modifying registry records) and the number of process operations (including process access and process creation) of these eight programs in a time-window (each value is the average of five time-windows) are shown in Figure 4a–d, respectively. As shown in these figures, there are big differences between RATs and benign programs in their file, network, registry and process operations.
The host-side features are extracted for each process. Through observation and analysis, we find that the content of network transmission changes little and the usage of security-critical system resources is relatively fixed when a RAT is in the keep-alive state, whereas the transported information varies greatly when the RAT is in the command & control state. Considering that system resource usage differs between the states of a RAT, we have extracted 10 host-side features. These features can be divided into two categories: one is the number of times a process uses a resource in a time-window, and the other is the proportion of resources used by the process in the current time-window. A detailed description of the host-side features is given in Table 1.
As shown in Table 1, the 10 host-side features are related to the host operations of a program. For example, the number of network connection creations indicates the number of network connections created by a process in the current time-window, and the proportion of network connections represents the ratio of the number of network connections created by a process to the total number of network connections created by all processes of the operating system in the current time-window. The other features are defined analogously.
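The two categories of host-side features (a per-process count and a per-process proportion) can be sketched as follows. Because Table 1 is not reproduced here, the sketch does not attempt to list the exact 10 features; it only shows the count and proportion pattern for the four resource types named in the text. The record format and the `host_features` helper are hypothetical.

```python
from collections import Counter

# Security-critical resource types named in the text.
RESOURCES = ("file", "network", "registry", "process")


def host_features(records, pid):
    """Compute count and proportion features for one process.

    `records` is an iterable of (pid, resource) pairs, one pair per host
    operation observed on the whole system in the current time-window.
    """
    records = list(records)
    per_process = Counter(res for p, res in records if p == pid)
    system_wide = Counter(res for _, res in records)
    features = {}
    for res in RESOURCES:
        count = per_process[res]
        total = system_wide[res]
        features[f"n_{res}"] = count
        # Assumption: the proportion is 0.0 when nobody used the resource.
        features[f"prop_{res}"] = count / total if total else 0.0
    return features
```

For example, a process that created two of the three network connections seen in a window gets `n_network = 2` and `prop_network = 2/3`.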
4.2.3. Features Combination
The network-side and host-side features are extracted simultaneously. We then combine the double-sides features into a combined feature vector, which is used as the input for the next phase. We monitor the TCP/UDP network connection records of each process and correlate the host-side and network-side features by finding the process ID of the session. PRATD determines the process ID corresponding to a session by querying the network connection record that is closest in time to the current session. The details are shown in Algorithm 1.
Algorithm 1 Double-sides features combination
Input: A collection $N$ of network-side feature vectors composed of $n$ vectors, $N = \{N_1, \dots, N_n\}$; a collection $H$ of host-side feature vectors composed of $m$ vectors, $H = \{H_1, \dots, H_m\}$; the collection $D$ of records of each process creating a network connection on the host, composed of $r$ records, $D = \{D_1, \dots, D_r\}$.
Output: The collection $M$ of double-sides feature vectors composed of $s$ vectors, $M = \{M_1, \dots, M_s\}$.
1: $M = \emptyset$;
2: Sort $D$ in ascending order of the generated times;
3: for each feature vector $N_i \in N$ do
4:     $tuple_i$ ← the 4-tuple of $N_i$;
5:     $time_i$ ← the timestamp of $N_i$;
6:     $c$ ← the ID of the record of network connection creation in $D$ whose timestamp is closest to $time_i$;
7:     while … do
8:         … // $tuple_c$ is the 4-tuple of $c$
9:         … // $pid_i$ represents the process ID of $N_i$, $pid_c$ indicates the process ID of the record in $D$
10:        …
11:        …
12:    end while
13:    for each feature vector $H_j \in H$ do
14:        $pid_j$ ← the process ID of $H_j$;
15:        … // $h_i$ represents the host-side features related to $N_i$
16:        …
17:    end for
18:    $M_i$ ← the concatenation of $N_i$ and $h_i$;
19:    append $M_i$ to $M$;
20: end for
21: return $M$
As shown in Algorithm 1, Line 2 sorts the records D of each process creating a network connection in ascending order of their generated times, where r is the number of records. If the quick-sort algorithm [27] is employed, the average time complexity of the sort is $O(r \log r)$. The main loop (Lines 3–20) looks, for every feature vector in the collection N of network-side feature vectors, for the associated feature vector in the collection H of host-side feature vectors. The first inner loop (Lines 7–12) searches for the ID of the process that created the session connection; the time complexity of this loop is $O(c)$, where c is the ID of the record whose timestamp is closest to that of the network-side feature vector. The second inner loop (Lines 13–17) finds the host-side feature vector that corresponds to the current network-side feature vector based on the found process ID; the time complexity of this loop is $O(m)$ (m is the number of host-side feature vectors). Then, the host-side features are concatenated to the network-side features in Line 18. Overall, for n network-side feature vectors, the time complexity of Algorithm 1 is $O(r \log r + n(c + m))$.
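The combination step can be sketched in Python as follows. This is a simplified illustration rather than a line-by-line transcription of Algorithm 1: instead of sorting the records and searching backwards from the closest one, it scans the records that share the session’s 4-tuple and picks the one closest in time, and it replaces the second inner loop with a dictionary lookup by process ID. All class and function names are hypothetical, and sessions without a matching record or host vector are simply skipped, which is an assumption.

```python
from typing import NamedTuple


class NetVector(NamedTuple):
    four_tuple: tuple   # (source IP, destination IP, destination port, protocol)
    timestamp: float
    features: list      # network-side features


class ConnRecord(NamedTuple):
    four_tuple: tuple
    timestamp: float
    pid: int            # process that created the connection


class HostVector(NamedTuple):
    pid: int
    features: list      # host-side features of this process


def combine(net_vectors, host_vectors, conn_records):
    """Attach to each network-side vector the host-side features of the
    process that created the matching connection closest in time."""
    host_by_pid = {h.pid: h for h in host_vectors}
    combined = []
    for n in net_vectors:
        candidates = [d for d in conn_records if d.four_tuple == n.four_tuple]
        if not candidates:
            continue  # assumption: no owning process found, skip the session
        closest = min(candidates, key=lambda d: abs(d.timestamp - n.timestamp))
        host = host_by_pid.get(closest.pid)
        if host is None:
            continue  # assumption: no host-side vector for this process
        combined.append(n.features + host.features)
    return combined
```

The dictionary lookup trades the $O(m)$ inner loop for an average constant-time access, at the cost of assuming one host-side vector per process in the current time-window.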
4.3. Model Training and Detection
Model training: In this phase, two RAT detection models are trained. To obtain the two training sets, a controllable real environment is built, and the network traffic and host behaviours of benign programs and RATs are collected while they are running. As mentioned in Section 4.2, each feature vector in these two training sets contains both network-side and host-side features as well as a class label (RAT or benign program). Besides, because the two detection models are used for the two different runtime states of a RAT, it is necessary to collect the network traffic and host behaviours of the RAT in its keep-alive state and its command & control state separately, and then give a tag (keep-alive state or command & control state) to the corresponding feature vector. Note that these tags are used only in the training phase to distinguish the states. Two training sets are thus obtained in this phase: one consists of the feature vectors belonging to benign programs and RATs in the keep-alive state, and the other consists of the feature vectors belonging to benign programs and RATs in the command & control state. Based on the two training sets and machine learning algorithms, two RAT detection models are built, which are then used in the detection phase.
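The construction of the two training sets can be illustrated with a short sketch. It assumes each collected sample is a triple of feature vector, class label and state tag, with the tag strings chosen here purely for illustration; the `build_training_sets` helper is hypothetical.

```python
def build_training_sets(samples):
    """Split tagged samples into the two training sets.

    `samples` is an iterable of (features, label, state) triples, where
    label is "RAT" or "benign" and state is "keep-alive",
    "command-control", or None for benign programs.
    Benign samples are shared by both training sets.
    """
    samples = list(samples)
    benign = [(f, "benign") for f, label, _ in samples if label == "benign"]
    keep_alive = [(f, "RAT") for f, label, state in samples
                  if label == "RAT" and state == "keep-alive"]
    command_control = [(f, "RAT") for f, label, state in samples
                       if label == "RAT" and state == "command-control"]
    # The state tag is dropped here: it is only needed to route samples.
    return benign + keep_alive, benign + command_control
```

Each returned set can then be passed to any classifier; the article reports results with AdaBoost, but the sketch is independent of the learning algorithm.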
Detection: In this phase, a detection result is given for each unlabelled sample of the test set. Specifically, each feature vector in the test data is used as input and examined simultaneously by the two built RAT detection models, each of which gives its own detection result. After the two detection results are obtained, we consider an unlabelled instance from the test set to be a RAT if at least one of the two detection models regards it as a RAT, because its feature vector is then similar to the feature vectors of at least one of the main states of a RAT. The details are shown in Algorithm 2. In Lines 2–3, the feature vector is examined simultaneously by Model A and Model B to obtain their respective detection results. The time complexities of training Model A and Model B and of detection depend on the classification algorithms that are used. Lines 4–8 give the final detection result of a feature vector by combining the respective detection results. For n feature vectors, the time complexity of determining their final detection results is $O(n)$.
Algorithm 2 Merge phased results to give final detection result
Input: A collection $M$ of double-sides feature vectors composed of $n$ vectors, $M = \{M_1, \dots, M_n\}$; Model A, the detection model for the command & control state of RAT; Model B, the detection model for the keep-alive state of RAT.
Output: The collection $R$ of all detection results, $R = \{r_1, \dots, r_n\}$, where each $r_i$ is the predicted result (benign or RAT).
1: for each feature vector $M_i \in M$ do
2:     Model A detects $M_i$ and gives a result $r_A$ (benign or RAT);
3:     Model B detects $M_i$ and gives a result $r_B$ (benign or RAT);
4:     if $r_A$ = RAT or $r_B$ = RAT then
5:         $r_i$ ← RAT;
6:     else
7:         $r_i$ ← benign;
8:     end if
9:     append $r_i$ to $R$;
10: end for
11: return $R$
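The merging rule of the detection phase amounts to a logical OR over the two model outputs, which can be sketched as follows. The models are represented here as plain callables that return "RAT" or "benign"; this interface and the `merge_phased_results` name are assumptions made for illustration, and any trained classifier could be wrapped to fit it.

```python
def merge_phased_results(vectors, model_a, model_b):
    """Label a feature vector as RAT if either state-specific model does.

    `model_a` stands for the command & control model and `model_b` for the
    keep-alive model; each is a callable mapping a feature vector to
    "RAT" or "benign".
    """
    results = []
    for v in vectors:
        r_a = model_a(v)
        r_b = model_b(v)
        results.append("RAT" if "RAT" in (r_a, r_b) else "benign")
    return results
```

Because one positive vote is enough, this rule favours a higher TPR: a sample is reported as benign only when both state-specific models agree that it is benign.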
7. Conclusions
RATs have long been a threat to organizational and personal computers. In this article, we propose a RAT detection method named PRATD. The first core idea of PRATD is that a separate detection model is trained for each of the two runtime states of a RAT. As a result, PRATD can detect RATs in both the keep-alive and command & control states. The second core idea of PRATD is that it combines the features extracted from the network-side and the host-side, so that the detection capabilities on the network and on the host are combined. We conducted experiments with five kinds of benign programs and 20 well-known RATs. The experimental results show that PRATD obtains better detection results than host-based and network-based methods, which suggests that using the double-sides features and building detection models for the different runtime states of a RAT helps to improve the accuracy of RAT detection.
As for future work, we will test PRATD in practical scenarios. Moreover, extracting better host-side features and further improving the TPR of the RAT detection method also need to be addressed.