### **1. Introduction**

High connectivity and automotive electronics are two major developments in modern vehicles, which are evolving to provide various convenience features to drivers. Vehicle connectivity through smart devices and cellular networks has enabled the consumption of various content in the vehicle through an infotainment platform. Particularly, vehicle-to-vehicle communication has enabled the sharing of driving information and dangerous situations on the road. Likewise, vehicle-to-infrastructure communication has broadened the prospects of autonomous vehicles, which previously depended only on their existing sensors, through the exchange of traffic signals and flows. Furthermore, vehicles are evolving into giant smart devices equipped with safety devices, such as forward collision-avoidance and lane-keeping assists, as well as convenience devices, such as telematics and power supply electric devices.

However, such diverse connectivity increases the attack surface of vehicles and their exposure to external attacks. As the current controller area network (CAN) message frame lacks authentication and access control mechanisms, in-vehicle data transfer is performed without any security techniques. Furthermore, as the in-vehicle controllers are interconnected, the complexity of the architecture increases. Interference or mutual effects between controllers may cause unintended motions or failures, thus posing further threats to the cybersecurity of vehicles and the safety of passengers.

Existing connected vehicles attain security by configuring a separate dedicated network for in-vehicle Internet services, such as telematics, and separating the connectivity services of the vehicle from the Internet. However, the dedicated network is costly to construct and operate, and it limits how far the platform can be opened to expand connectivity-related services. Hence, because dedicated Internet services and local area network systems have been combined, a more fundamental solution is now required that protects the devices without depending on traditional communication network security.

To design the cybersecurity of a mission-critical environment, such as vehicles, the characteristics of the external network environment, such as vehicle domain and machine-to-machine (M2M) communication, should be considered. Particularly, intrusion detection or prevention systems for in-vehicle network protection require high accuracy. If important messages in the vehicle are mistaken for an attack and blocked, the vehicle may malfunction and develop safety problems. Therefore, false alarms must be prevented in the intrusion prevention of in-vehicle networks.

Additionally, real-time response is critical for the cybersecurity of vehicles. Malicious attacks on moving vehicles are directly linked to the safety of passengers, pedestrians and other vehicles. Therefore, when external attack messages are identified, the vehicle must be able to implement response measures in real time. However, due to the nature of embedded environments, such as vehicles, there are constraints in temporal and spatial resources. As the available resources for learning and classifying intrusion data are limited, a real-time intrusion detection system (IDS) having high accuracy should be constructed, and it should be able to function with the minimum available computing power of the vehicle.

In 2015, a Jeep Cherokee was remotely hacked, and the reported attack raised awareness of the cybersecurity of vehicles [**?** ]. In a recent article [**?** ], the author argued that, because it is impossible to produce vehicles with a perfect security system that rules out hacking, we should not depend only on defending against attacks but should also design the security system to detect attacks and respond appropriately.

Therefore, in this study, we developed a model for detecting anomalous behaviors and attacks caused by message injection on vehicles in real time with high accuracy. We applied a hierarchical data analysis technique for detecting and classifying attack data. Furthermore, in training the intrusion detection model with a machine learning algorithm, we minimized false detections and missed detections. An appropriate algorithm for the dataset was selected to detect the attack data, and a simulation environment was set up to derive the optimal hyperparameters. Particularly, we propose a method that learns the behaviors of the CAN data to detect the presence or absence of attacks quickly and hierarchically. The accuracy of the model was increased to make it applicable to an actual vehicle environment, and a model that responds in real time using limited resources was implemented. Accuracy, F1 score and detection time were used as metrics to evaluate the proposed model. Using these metrics, we obtained an improved model for detecting attacks and anomalous behaviors entering vehicles. The contributions of this study are as follows.


The rest of this paper is organized as follows. Section **??** introduces existing related studies. Section **??** details the CAN message frame and topology for an understanding of vehicle cybersecurity. Section **??** describes the dataset we used, as well as the concrete data analysis method and analysis model proposed in this paper. This includes the algorithm for vehicle data analysis, performance measurement metrics and hypothesis space comparison of models for in-vehicle data analysis. Section **??** interprets the simulation results and verifies the effectiveness of the proposed method by comparing it with existing results. In Section **??**, we present the conclusion and future research direction.

### **2. Related Work**

This section highlights existing works related to this study. The problems in each domain, existing methods to solve them, advantages and disadvantages of the solutions and constraints are stated.

Song et al. [**?** ] proposed an intrusion detection model that learns the sequential pattern of in-vehicle network traffic and detects message insertion attacks according to traffic changes. The structure of the inception-ResNet model, which was designed for large-scale images, was used, and the deep convolutional neural network was redesigned by reducing the architecture complexity. Particularly, the authors experimented with a dataset extracted from an actual vehicle environment and suggested that the model has an advantage in detecting complex, irregular random attacks. The experiment compared long short-term memory (LSTM), artificial neural network, support vector machine, k-nearest neighbors (kNN) [**?** ], naïve Bayes (NB) and decision tree (DT) [**?** ] algorithms. Zhang et al. [**?** ] proposed a vehicle intrusion detection model based on the neural network algorithm. They compared detection performances using gradient descent with momentum and adaptive gain, and they performed verification and evaluation using data collected from actual vehicles. Further, the authors proposed a host-type intrusion detection model for in-vehicle intrusion detection. However, a host-type IDS may be inefficient in a broadcast-type communication environment, such as CAN. Because every controller receives the same message, detections are duplicated, and each controller must reserve separate resources for intrusion detection, which makes this architecture impractical in an embedded environment with limited resources. Kang et al. [**?** ] proposed a deep neural network (DNN)-based IDS to monitor the CAN message frame. The DNN model was pre-trained using a deep-belief network. The authors used probability-based feature vectors extracted from packets to train the model to classify messages as normal or attack. The experiment demonstrated that a detection ratio of approximately 0.98 can be provided with real-time response.

Hoppe et al. [**?** ] placed an anomaly-based IDS in the CAN bus to monitor network traffic. The IDS detects randomly manipulated messages by comparing them with normal patterns. Four attack scenarios related to the CAN bus were presented and classified using the established computer emergency response team taxonomy. The study includes technical and managerial considerations for protecting the in-vehicle network in comparison with the traditional information technology system, and it discusses countermeasures by analyzing security vulnerabilities and potential safety implications. Taylor et al. [**?** ] suggested an anomaly detection method based on the LSTM neural network to detect attacks on the CAN bus. The authors analyzed data by manipulating the identifiers (IDs) of the message frame in a dataset extracted from vehicles rather than injecting attack traffic into the in-vehicle network. By assuming that the CAN traffic was regular, they detected traffic outside the normal sequence in five dataset manipulation scenarios. The result of detecting the known attacks of the CAN bus showed potential for development and provided follow-up tasks to improve the experimental method and detection model. Wang et al. [**?** ] proposed a distributed anomaly detection framework using hierarchical temporal memory (HTM) to strengthen the security of the in-vehicle CAN bus. This method evaluates the output using an anomaly score mechanism that learns the prior state of the CAN network and predicts the flow data. The authors extracted CAN traffic and modified the data fields manually. In addition, they created attack data by replaying the captured traffic on the dataset. They claimed that the area under the curve score was higher than those of the recurrent neural network and hidden Markov model (HMM), but a method of efficiently detecting attacks where multiple IDs interact without relying on a single message ID should also be considered. Furthermore, experiments are required on indices related to time or resource utilization to examine the applicability of the proposed model to an actual vehicle environment.

The common limitation of the studies mentioned above is that the existing models only determine whether an attack injected into the in-vehicle network has occurred. In an actual vehicle environment, merely distinguishing between attack and benign status is insufficient. It is highly important to provide additional information for immediately determining the type of attack and the target it affects. It may be easy to inject attack data into a network and track the sign of occurrence. However, a large amount of computation, which is proportional to the number of target labels, is required to extensively determine the semantics of the attack injected into the vehicle. To address these limitations and satisfy the requirements of an IDS in an actual vehicle environment, we propose a learning model that can not only determine whether an attack occurred, but also classify the attack type and target vehicle.

### **3. In-Vehicle Network Security**

To define the proposed multi-labeled hierarchical classification (MLHC) model, this section describes the vehicle CAN message frame, CAN bus structure and attack vector for the vehicle.

#### *3.1. Controller Area Network Message Frame and Topology*

The CAN, developed by Robert Bosch GmbH [**?** ] in the early 1980s, is the most representative in-vehicle network technology. Its specifications are still being expanded, and it is used as a major protocol in the On-Board Diagnostics II standard. The International Organization for Standardization (ISO) standardized the CAN as ISO 11898 [**?** ] and is still expanding it. This standard was designed to enable communication between in-vehicle microcontrollers and devices and is used for information exchange between electronic control units (ECUs). A CAN device transfers data on the CAN network in units of message frames. The message frame contains neither source nor target addresses but only an ID that determines its priority. The real-time priority-based message transfer system uses IDs composed of an 11- or 29-bit string, and a lower ID has a higher priority. Before sending a message, a CAN node first determines whether the CAN bus is in use and then detects collisions between messages. When two nodes send messages simultaneously, the message with the higher priority is sent first, and the message with the lower priority is delayed.
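The priority rule described above can be illustrated with a short sketch. It assumes the usual CAN wire convention, which the text does not spell out: a dominant bit (0) overrides a recessive bit (1), IDs are sent most significant bit first, and a node withdraws as soon as it reads back a bit it did not send. The function name `arbitrate` and the example IDs are hypothetical, and all contenders are assumed to use the same ID length.

```python
def arbitrate(ids, id_bits=11):
    """Return the ID that wins bitwise arbitration among simultaneous senders.

    Assumes wired-AND bus behavior: a dominant bit (0) overrides a
    recessive bit (1), so the numerically lowest ID wins.
    """
    contenders = set(ids)
    for bit in range(id_bits - 1, -1, -1):  # most significant bit first
        # The bus level is the AND of all bits currently being sent.
        bus_level = min((i >> bit) & 1 for i in contenders)
        # Nodes that sent recessive (1) but read dominant (0) back off.
        contenders = {i for i in contenders if (i >> bit) & 1 == bus_level}
    (winner,) = contenders
    return winner


winner = arbitrate([0x2C0, 0x130, 0x545])
print(hex(winner))  # the lowest ID wins
```

Because the losing nodes simply stop transmitting and retry later, the winning frame is not corrupted, which is why the lower-priority message is delayed rather than lost.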

The CAN message frame is divided into base and extended formats depending on the length of the arbitration field, as shown in Figure **??**. The base format supports the CAN 2.0A protocol, whereas the extended format supports the CAN 2.0B protocol and also accepts the CAN 2.0A protocol. We describe the fields used in the present paper, and the abbreviations for the remaining fields are presented in the Abbreviation Section.

**Figure 1.** Controller area network (CAN) message structures: (**a**) base format; and (**b**) extended format.
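As a rough illustration of the two formats, the sketch below models only the fields an analysis model would typically consume: the arbitration ID, the payload and a flag for the extended format. The class name `CanFrame`, its field names and the example values are assumptions made for this sketch, and framing fields such as the CRC and ACK are deliberately omitted.

```python
from dataclasses import dataclass

BASE_ID_BITS = 11      # CAN 2.0A arbitration ID width
EXTENDED_ID_BITS = 29  # CAN 2.0B arbitration ID width
MAX_DATA_BYTES = 8     # classic CAN payload limit


@dataclass(frozen=True)
class CanFrame:
    """Illustrative subset of a CAN data frame (no CRC, ACK, or framing bits)."""
    can_id: int
    data: bytes
    extended: bool = False

    def __post_init__(self):
        # Reject IDs that do not fit the arbitration field of the format.
        id_bits = EXTENDED_ID_BITS if self.extended else BASE_ID_BITS
        if not 0 <= self.can_id < (1 << id_bits):
            raise ValueError(f"ID does not fit in {id_bits} bits")
        if len(self.data) > MAX_DATA_BYTES:
            raise ValueError("classic CAN carries at most 8 data bytes")

    @property
    def dlc(self):
        """Data length code: number of payload bytes."""
        return len(self.data)


base = CanFrame(can_id=0x316, data=bytes([0x05, 0x21, 0x68, 0x09]))
extended = CanFrame(can_id=0x18DAF110, data=b"\x02\x01\x0c", extended=True)
print(base.dlc, extended.dlc)
```

Note that nothing in the structure identifies the sender, which matches the observation that the frame carries no source or target address.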


The ECU is a component of the in-vehicle network. It is an embedded device that controls other in-vehicle controllers or devices. The ECU contains input and output interfaces for interconnecting the microcontroller unit, memories (such as read-only and random-access memories), sensors and actuators. The ECU collects and analyzes data from sensors, and it generates control signals and sends them to actuators.

Figure **??** illustrates the CAN topology composed of the in-vehicle network and controllers. The ECUs are grouped under domain controllers that logically separate vehicle functions by use, and the CAN bus interconnects the ECUs to enable mutual cooperation or control between them. Vehicle Ethernet may be used for interconnecting controllers that require high-speed communication, and the Media Oriented Systems Transport (MOST) network is often used for multimedia communication. A gateway may be installed to control diagnostic communication or external interfaces, and installing an IDS function for monitoring the CAN traffic inside this gateway may be effective. As shown in Figure **??**, external attacks may be injected through a diagnostic bus connected to the CAN bus or through an external interface, allowing an attacker to dominate the CAN bus or an ECU.

**Figure 2.** CAN topology and attack vectors: (**a**) external interfaces; (**b**) diagnostic bus; and (**c**) occupation of CAN bus.

#### *3.2. Attack Vectors on In-Vehicle Network*

Attack vectors affecting confidentiality, integrity and availability need to be considered for defense against vehicle cyberattacks. Attackers can seize the rights for a vehicle or the systems connected to a vehicle and randomly tap major traffic in the vehicle or peek into sensitive information, such as the location of the vehicle. They can also attempt to manipulate the ECU software by reprogramming it, or launch a denial-of-service attack by generating large-scale traffic inside the vehicle to disable normal messages. By entering the in-vehicle network and injecting random messages, hackers can threaten the confidentiality, integrity and availability of the vehicle. Threats that compromise the security objectives of in-vehicle systems are outlined in Table **??**.


**Table 1.** Summary of security objectives and corresponding threats on in-vehicle network.

A landmark event in vehicle cybersecurity occurred in 2015 when Miller and Valasek [**?** ] hacked a Jeep Cherokee and disclosed the attack to the media and at a hacking conference. They demonstrated a hacking attack on a real moving vehicle by exploiting vulnerabilities in the cellular network and the external interface of the connected service. They accessed the CAN bus through the head unit of a remote vehicle and, after acquiring the rights of the controller, successfully installed tampered firmware. After acquiring the control rights of the vehicle, they could remotely operate not only the audio and wipers of the moving vehicle, but also the brakes and steering wheel. Consequently, Fiat Chrysler Automobiles recalled 1.4 million vehicles that could be attacked and was fined \$105 million. Furthermore, Tencent's Keen Security Lab [**?** ] recently seized the rights of a Lexus NX300 by exploiting a vulnerability in the audio-video navigation system of the vehicle. They informed the manufacturer that they had invaded the CAN bus and successfully injected a malicious message that can cause the vehicle to malfunction, and they warned of the vulnerability on their blog.

Various attack vectors that may damage the security objectives of vehicles in an in-vehicle network topology are shown in Figure **??**. Attackers can use various remote-connection external interfaces, such as Wi-Fi hotspots and Bluetooth, as well as the Internet and cellular networks. It is also possible to form sessions with remote vehicles by scanning the M2M network of a specific communication service provider for connectivity services and searching for the Internet protocol address and open service ports of the vehicle. In addition, a controller can be forcibly operated or reprogrammed using diagnostic communication that bypasses the authentication system of the gateway in an in-vehicle network. Once a specific controller is seized, it is possible to launch an attack that occupies the network and stops services by sending many CAN messages with manipulated priorities to the CAN bus.
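The bus-occupation attack at the end of this paragraph can be made concrete with a toy simulation. The sketch below is a simplified model rather than a faithful timing model of CAN: it assumes one frame is delivered per time slot and that the lowest pending ID always wins the slot. The function name `simulate_bus`, the sender periods and the attacker ID `0x000` are hypothetical.

```python
def simulate_bus(slots, senders):
    """Simulate a bus that delivers one frame per slot.

    `senders` maps a CAN ID to its transmission period in slots; a sender
    queues a frame whenever the slot index is a multiple of its period.
    In each slot the lowest pending ID wins arbitration; the others wait.
    Returns the number of frames delivered per ID.
    """
    pending = {can_id: 0 for can_id in senders}
    delivered = {can_id: 0 for can_id in senders}
    for slot in range(slots):
        for can_id, period in senders.items():
            if slot % period == 0:
                pending[can_id] += 1
        waiting = [can_id for can_id, count in pending.items() if count > 0]
        if waiting:
            winner = min(waiting)  # lowest ID has the highest priority
            pending[winner] -= 1
            delivered[winner] += 1
    return delivered


normal = {0x130: 2, 0x2C0: 4, 0x545: 10}
print(simulate_bus(100, normal))

flooded = dict(normal)
flooded[0x000] = 1  # hypothetical attacker: top-priority frame every slot
print(simulate_bus(100, flooded))
```

In the flooded run, every slot is taken by the attacker's frame, so none of the legitimate IDs delivers a single message, even though each injected frame is itself well formed.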

### **4. Materials and Methods**

#### *4.1. Multi-Labeled Hierarchical Classification (MLHC) Process*

The overall process of the proposed model is illustrated in Figure **??**. The CAN traffic extracted from vehicles is preprocessed so that the classifier can learn from and be evaluated on it. The data analysis model uses a classification algorithm, preconfigured hyperparameters and performance evaluation metrics. The analysis model is trained on the training data, and the performance of the trained model is evaluated using the test data. The intrusion detection module, which embeds the trained model in an actual application environment, receives CAN message frames as input and outputs follow-up information: attack or benign, vehicle type and attack type.

**Figure 3.** Overall multi-labeled hierarchical classification (MLHC) process.
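The hierarchical idea behind this process can be sketched in a few lines. The example below is only a minimal illustration under stated assumptions and not the model proposed in this paper: a toy nearest-centroid classifier stands in for the selected classification algorithm, the two features (a normalized ID and an inter-arrival time in milliseconds) and the labels `flooding` and `fuzzing` are hypothetical, and the vehicle-type label is omitted for brevity.

```python
from collections import defaultdict


class NearestCentroid:
    """Toy stand-in classifier: predicts the label of the closest class mean."""

    def fit(self, samples, labels):
        sums = {}
        counts = defaultdict(int)
        for x, y in zip(samples, labels):
            acc = sums.setdefault(y, [0.0] * len(x))
            for i, value in enumerate(x):
                acc[i] += value
            counts[y] += 1
        self.centroids = {
            y: [total / counts[y] for total in acc] for y, acc in sums.items()
        }
        return self

    def predict(self, x):
        def squared_distance(centroid):
            return sum((a - b) ** 2 for a, b in zip(x, centroid))
        return min(self.centroids, key=lambda y: squared_distance(self.centroids[y]))


class HierarchicalDetector:
    """Stage 1 decides attack vs. benign; stage 2 runs only on attacks."""

    def fit(self, samples, labels):
        # Labels are "benign" or a (hypothetical) attack-type name.
        binary = ["benign" if y == "benign" else "attack" for y in labels]
        self.stage1 = NearestCentroid().fit(samples, binary)
        attacks = [(x, y) for x, y in zip(samples, labels) if y != "benign"]
        self.stage2 = NearestCentroid().fit(*zip(*attacks))
        return self

    def predict(self, x):
        if self.stage1.predict(x) == "benign":
            return ("benign", None)  # skip the costlier second stage
        return ("attack", self.stage2.predict(x))


# Synthetic features: (ID / 0x7FF, inter-arrival time in ms).
train_x = [(0.30, 10.0), (0.40, 9.5), (0.35, 10.5), (0.50, 10.0),
           (0.00, 0.3), (0.00, 0.2), (0.00, 0.4),
           (0.90, 1.0), (0.20, 1.2), (0.60, 0.8)]
train_y = ["benign"] * 4 + ["flooding"] * 3 + ["fuzzing"] * 3

detector = HierarchicalDetector().fit(train_x, train_y)
print(detector.predict((0.45, 10.2)))
print(detector.predict((0.00, 0.25)))
print(detector.predict((0.70, 1.1)))
```

The design point is that benign traffic, which dominates in practice, exits after the first binary decision, and the finer-grained labels are computed only for frames already flagged as attacks.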
