A router is a device that forwards data packets between computer networks. Routers are widely used in the Internet and have become the traffic hubs of network flows. Many services run on these embedded devices, such as web, file transfer, domain name resolution, and dynamic host configuration servers. Although routers operate like small general-purpose computers and offer the corresponding security configuration options, they run no anomaly detection or prediction software; they depend on firmware updates and firewalls for protection. Today, routers are easy targets for attackers, for three reasons. First, users often lack professional knowledge of device security and configure their devices improperly. Second, the router is the data forwarding center of the network, which makes it a valuable target. Third, routers are constantly online to forward data, and router systems can contain inherent bugs and misconfigurations introduced by developers or administrators. Moreover, all network traffic passes through the routers. Thus, an adversary who compromises a router can paralyze the whole network or gain control over the data of the whole system. Some recent attacks on mainstream routers are shown in Table 1.
Because routers are vulnerable, many methods have been proposed to diagnose them. Labovitz et al. [1] analyzed commercial Internet traffic to identify changes and reported traffic trends for the network and its communication devices. Lee et al. [2] used a MapReduce-based method to perform flow analysis of routers. However, not all router attacks are reflected in traffic changes. Numerous works concentrate instead on routers' syslogs. Routers record the events occurring on them by logging messages that contain debugging or error messages, internal states, and exceptions. Some existing log-based anomaly detection methods are ill-suited to routers. The authors of [3, 4] relied on the source code of the systems to perform their analysis. Although access to source code can improve attack detection accuracy, these approaches are not practical in real router systems: most routers are closed systems whose source code is usually unavailable, and reverse engineering of the firmware encounters many difficulties and obstacles [5, 6]. Some researchers also concentrate on pure log diagnosis. Dlog [
7] extracted log templates and clustered the attacks. DeepLog [8] constructed workflows from system logs and detected anomalies through deep learning. Dusi et al. used network traffic logs such as tcpdump logs [9] to monitor the network and find missing segments. Balzarotti et al. [10] constructed system-call traces and analyzed system-call logs to detect malicious code. However, all these works use syslogs as the single diagnosis source and ignore other system information that is also significant for anomaly detection.

According to our survey of routers, syslog messages alone do not record all events and errors on the device. Log information is recorded at different granularities: when the system enters a predetermined state, the corresponding log message is printed. Therefore, the router log can be modeled as a finite state machine [11]. However, attacks on the network are rampant, and modified variants of router attacks keep appearing. Some attacks and anomalies are no longer reflected in any single log source. To detect them, it is necessary to correlate and analyze multiple types of log information, which in turn improves the accuracy of anomaly detection. Instead of using a single log source as in [
8, 12], we leverage multiple log sources, such as syslogs, firewall logs, CPU utilization, LAN status, and memory information, in the training step. With labeled data, we can learn benign and malicious events. However, labels are unavailable for some events before detection. We therefore use correlation analysis to perform clustering, which helps us mitigate the labeled-training problem. Because the volume of log data is prohibitive, we use data mining and machine learning to accelerate processing.

Our approach consists of three steps: (1) learn the normal and abnormal states of the routers from the correlation of multiple log data; (2) detect anomalies from multiple audit data inputs combined with the result of the first step; and (3) use the longest common subsequence (LCS) algorithm to find the regular pre-steps before an attack and use this attack chain for prediction, in order to achieve proactive protection. We applied our approach to a real router network, and the results show that it improves on current intrusion detection techniques.

In summary, our key contributions are:
(1) We use multi-source logs in the router for offline learning and training, which improves the accuracy of attack detection.
(2) We use a correlation analysis method to obtain the relationships among events and to find some unlabeled events during the learning step.
(3) We perform anomaly detection by calculating the distance between an event and the clusters, and classify the anomaly accordingly.
(4) We use the LCS algorithm to find the pre-steps before an attack, and we use this chain to predict the attack before it happens.
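To make the LCS-based pre-step extraction more concrete, the following is a minimal sketch rather than the implementation used in this work. It assumes that each attack trace has already been reduced to an ordered list of event labels; the labels (`port_scan`, `login_fail`, and so on) and the helper name `common_pre_steps` are hypothetical. It also assumes that folding a pairwise LCS across traces is an acceptable approximation, since this heuristic does not in general return the exact LCS of more than two sequences.

```python
from typing import Hashable, List, Sequence


def lcs(a: Sequence[Hashable], b: Sequence[Hashable]) -> List[Hashable]:
    """Return one longest common subsequence of a and b (classic DP)."""
    m, n = len(a), len(b)
    # table[i][j] holds the LCS length of a[:i] and b[:j].
    table = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m):
        for j in range(n):
            if a[i] == b[j]:
                table[i + 1][j + 1] = table[i][j] + 1
            else:
                table[i + 1][j + 1] = max(table[i][j + 1], table[i + 1][j])

    # Walk back from the bottom-right corner to recover the subsequence.
    result: List[Hashable] = []
    i, j = m, n
    while i > 0 and j > 0:
        if a[i - 1] == b[j - 1]:
            result.append(a[i - 1])
            i -= 1
            j -= 1
        elif table[i - 1][j] >= table[i][j - 1]:
            i -= 1
        else:
            j -= 1
    result.reverse()
    return result


def common_pre_steps(traces: Sequence[Sequence[Hashable]]) -> List[Hashable]:
    """Fold pairwise LCS over all traces (heuristic, hypothetical helper)."""
    if not traces:
        return []
    chain = list(traces[0])
    for trace in traces[1:]:
        chain = lcs(chain, trace)
    return chain


if __name__ == "__main__":
    # Hypothetical event sequences observed before three similar attacks.
    traces = [
        ["port_scan", "login_fail", "cfg_read", "login_fail", "dns_change"],
        ["port_scan", "arp_update", "login_fail", "login_fail", "dns_change"],
        ["dhcp_renew", "port_scan", "login_fail", "login_fail", "dns_change"],
    ]
    print(common_pre_steps(traces))
```

In this toy example, the events shared in order by all three traces form the candidate attack chain; unrelated events such as `cfg_read` or `dhcp_renew` drop out. A predictor could then raise an alert when a live event stream matches a prefix of such a chain.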