#### 4.1.1. Dataset

The schema of the in-vehicle network intrusion detection challenge dataset released by Han et al. [**?** ] includes CAN ID, DLC and data payload, reflecting the CAN message structure, along with the timestamp at which each data sample was recorded. A binary label was also added to indicate whether each sample corresponds to an attack or a benign state. We selected this dataset because it was extracted from an actual vehicle environment and supports a hierarchical structure of detailed data in the lower layers, such as attack type and vehicle type, for training the vehicle IDS model. The dataset comprises a total of 12 files, covering three attack types and three vehicle types in normal and message-injected states. It was constructed using data from vehicle models of three manufacturers. A group of vehicles using the same CAN database forms a vehicle type, which depends on the manufacturer that designs the CAN database. The distributions of the data in each data type are outlined in Table **??**.



The message injection into the in-vehicle network was attempted using three attack types. For the flooding attack, numerous messages with a high-priority CAN ID were injected to induce service delay. For the fuzzing attack, random CAN IDs were injected by brute force until a pre-defined valid CAN ID in the vehicle reacted. For the malfunction attack, valid CAN IDs for each vehicle type were collected in advance, random data fields were configured using these IDs and the tampered values were injected. The dataset can be expanded without limitation when additional information is required, such as attack type and vehicle type.

#### 4.1.2. Data Preprocessing

For the classifier to learn the CAN traffic for data analysis, the data preprocessing step illustrated in Figure **??** is required. The CAN IDS dataset used in this model consists of 12 files, separated by vehicle type and attack type, in which each sample carries only a binary attack-or-benign label. However, as the vehicle type or attack type is not known in advance in an actual environment, the intrusion detection module should be able to detect anomalies even under random combinations of vehicle types and attack types. Therefore, to enable the classification of vehicle type and attack type from the incoming data, each unit dataset was integrated into one data frame, as shown in Equation (**??**):

$$S = \sum\_{v\_{type}} \sum\_{a\_{type}} S\_{v\_{type},\, a\_{type}} \tag{1}$$

where *S* is the total dataset required for data analysis, $v\_{type}$ is the vehicle type and $a\_{type}$ is the attack type. The unit dataset $S\_{v\_{type}, a\_{type}}$ is subdivided by attack type and vehicle type, and the existing binary labels are encoded into multiple sub-labels to express additional information, such as vehicle type or attack type.

**Figure 4.** MLHC data preprocessing: (**a**) initial dataset; (**b**) merging and feature selection; (**c**) scaling; and (**d**) data split.

The features of this dataset include timestamp, time interval, CAN ID, DLC and eight data bytecodes for the payload. The feature set of the input data is extracted using the improved feature selection (IFS) method proposed by Park and Choi [**?** ]. This method uses correlations and cross-entropy between the features to combine the high values derived from correlation and information gain: it finds both the greedy features and those with the highest correlation, and the two vectors are combined to determine the final features that are highly correlated and have a strong impact on the classes. Consequently, timestamp is excluded from the original feature set, and the selected features are time interval, CAN ID, DLC and data payload. The data payload consists of at most 64 bits and can be converted to a byte code string whose length is specified by the DLC field. Normalization is applied to prevent underflow or overflow during the learning process and to evenly distribute the impact of each byte string of the payload. The eight independent byte strings, each taking values in the range 0 to 255, are converted to eight floating-point variables between 0 and 1 using the min-max normalizer with the minimum and maximum values as follows:

$$\mathbf{x}'\_{i} = \frac{\mathbf{x}\_{i} - \min(\mathbf{x}\_{i})}{\max(\mathbf{x}\_{i}) - \min(\mathbf{x}\_{i})} = \frac{\mathbf{x}\_{i}}{2^{8} - 1} \tag{2}$$

where $x'\_i$ is the normalized value and $x\_i$ is the original vector of feature *i*.
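As a concrete illustration, the byte-wise scaling of Equation (2) can be sketched as follows; the payload values and variable names below are made up for demonstration and are not from the dataset:

```python
import numpy as np

# Hypothetical payload matrix: each row is one CAN frame, each column one of
# the eight data bytes (0-255).
payload = np.array([
    [0x00, 0x10, 0xFF, 0x7F, 0x00, 0x00, 0xA0, 0x01],
    [0x00, 0x20, 0xFE, 0x80, 0x00, 0x00, 0xA1, 0x02],
], dtype=np.float64)

# Equation (2): with byte-valued features, min = 0 and max = 255, so
# min-max scaling reduces to dividing by 2**8 - 1.
normalized = payload / (2**8 - 1)

assert normalized.min() >= 0.0 and normalized.max() <= 1.0
```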

The dataset *S* used as input contains a feature set *X* and a target set *Y*. It is split into training, validation and test sets, which are used for learning. For the feature and target sets, *S* is divided by columns, whereas for the training, validation and test sets, *S* is divided by rows. $x\_i^{(l)}$ and $y\_j^{(l)}$ denote the data element of feature *i* and the label of classification group *j* for sample *l*, respectively. In this study, the training and test sets were divided at a ratio of 8:2. The model was trained using 80% of the total data, and the performance of the final model was evaluated using the remaining 20% of the samples. The test set was separated to prevent overfitting and to accurately predict the model performance on new, real data. Notably, the test set was used only for evaluating the model and not for learning. Instead, part of the training data was held out for validation to measure model performance during the learning stage and to obtain hyperparameters yielding excellent performance. This process is illustrated in Figure **??**.

**Figure 5.** MLHC 10-fold cross validation: (**a**) dividing the training set into 10 folds; and (**b**) learning training-fold and validating with the other fold.

After dividing the training set into 10 folds, the learning was performed 10 times; in each iteration, nine folds were used for training and the remaining fold was used for validation.
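This fold scheme can be sketched with plain NumPy as follows; the toy index array stands in for the 80% training split of the CAN dataset:

```python
import numpy as np

# 10-fold cross validation sketch (Figure 5): shuffle the sample indices,
# split them into k folds, and rotate the held-out fold.
n_samples, k = 50, 10
indices = np.random.default_rng(0).permutation(n_samples)
folds = np.array_split(indices, k)

for i in range(k):
    val_idx = folds[i]                                     # held-out validation fold
    train_idx = np.concatenate(folds[:i] + folds[i + 1:])  # remaining nine folds
    assert len(val_idx) == 5 and len(train_idx) == 45
```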

The target data must carry additional information, such as the vehicle information and attack type, in addition to the attack-or-benign status of the CAN message. The label was excluded from the feature set because, in supervised learning, it is used to evaluate the learning result; instead, it was included in the target data and reorganized to express this additional information. To hierarchically classify data traffic as suggested in this study, the target data must form a similar hierarchical structure. As shown in Figure **??**, the first row of the target data classifies attack or benign, and the lower rows form a hierarchy that distinguishes the vehicle information or attack type only for attack data. Furthermore, the target data were designed in a multi-labeled form so that the additional information can be included. Finally, the output data become a vector set including sub-vectors.

**Figure 6.** MLHC target labeling.

#### *4.2. MLHC Model*

The objective of this study was to effectively detect anomaly behaviors, such as message injection attack, in the CAN traffic of vehicles. To detect intrusion or anomaly behaviors external to the vehicle, an intrusion detection module is required in the CAN bus. Prior studies have detected anomaly behaviors by training normal CAN traffic and analyzing the time interval between messages, or by using machine learning algorithms. In this present study, we adopted a hierarchical approach using multi-label and multi-class classifiers. Hence, we propose a machine-learning-based multi-labeled method for detecting intrusions into the CAN and classifying attack techniques in a hierarchical manner. The multi-class classifier can identify more various categories of data with one classifier as compared to binary classification, and the multi-labeled classifier can contain various types of information simultaneously in a single classifier. This section explains the learning process and algorithm of the hierarchical intrusion detection method using the multi-labeled technique proposed in this study. This subsection describes the MLHC algorithm and compares the space of hypothesis and accuracy according to the classification model.

#### 4.2.1. MLHC Algorithm

Algorithm **??** presents the MLHC algorithm. The data preprocessing described in Section **??** corresponds to Lines 1–4, and the model learning process corresponds to Lines 5–17.

In the preprocessing stage, we use the IFS method to select the features for the model (Line 1). Then, we normalize the features using min-max normalization, as described in Equation (**??**) (Line 2). The training and test sets are split (Line 3); the training set is divided into k folds using k-fold cross-validation (Line 4).

In the learning stage, the algorithm iterates over the samples of the training set $S\_{train}$, determines whether each data sample $x^{(l)}$ is benign or an attack using the first classifier $c\_0$ and records the result in $\hat{y}\_0^{(l)}$ (Line 7). If the data sample indicates a benign state, it is not classified further, and the learning of the corresponding sample is terminated (Lines 8–9). Otherwise (Line 10), the result $\hat{y}\_j^{(l)}$ of the additional classification by each sub-classifier $c\_j$ is obtained and stored in the detailed information vector $\hat{V}^{(l)}$ (Lines 12–13). $\hat{Y}$, which is returned as the result of the model, is a set with elements $\hat{y}^{(l)}$, as shown in Equation (**??**):

$$\hat{Y} = \left\{ \hat{y}^{(l)} \mid l \in \{0, 1, \dots, n(S)\} \right\} \tag{3}$$

where *l* is the index of a sample of dataset *S*. Regarding dataset *S*, $S\_{train}$ is the training set and $S\_{test}$ is the test set; both are generically denoted *S*. The result for each sample *l* can be expressed as a concatenation of $\hat{y}\_0^{(l)}$ and $\hat{V}^{(l)}$ (Line 16), as expressed in Equation (**??**):

$$\hat{y}^{(l)} = \begin{bmatrix} \hat{y}\_0^{(l)} & \hat{V}^{(l)} \end{bmatrix} \tag{4}$$

where $\hat{y}\_0^{(l)}$ is the binary classification result determining whether sample *l* is a benign or an attack case. $\hat{V}^{(l)}$ is a vector set that expresses additional information if $\hat{y}\_0^{(l)}$ indicates an attack, and it can be expressed in detail as Equation (**??**):

$$\hat{V}^{(l)} = \begin{cases} \emptyset, & \text{if } \hat{y}\_0^{(l)} \text{ is } benign \\ \begin{bmatrix} \hat{y}\_1^{(l)} & \hat{y}\_2^{(l)} & \cdots & \hat{y}\_j^{(l)} \end{bmatrix}, & \text{otherwise} \end{cases} \tag{5}$$

where $\hat{V}^{(l)}$ is an empty matrix if $\hat{y}\_0^{(l)}$ is a benign case. Otherwise, $\hat{V}^{(l)}$ is a matrix of elements $\hat{y}\_1^{(l)}, \hat{y}\_2^{(l)}, \dots, \hat{y}\_j^{(l)}$ representing the additional learning results of each classifier when $\hat{y}\_0^{(l)}$ is an attack.

**Algorithm 1** Multi-labeled hierarchical anomaly detection.

**Input:** *S* is a universal dataset including a feature set *X* and a target set *Y*.

**Output:** $\hat{Y}$ is a set of learning results including $\hat{y}\_0^{(l)}$ and $\hat{V}^{(l)}$ for all samples *l*.

> $\hat{y}\_0^{(l)}$ is the result of determining whether sample *l* is benign or an attack. $\hat{V}^{(l)}$ is the combination of additional classification results for an attack sample ($j \neq 0$). $S\_{train}$ is the training set; $n(S) = n(S\_{train}) + n(S\_{validation}) + n(S\_{test})$.

```
1:  X ← IFS(S)                              ▷ feature selection
2:  X ← MinMaxNormalize(X)                  ▷ Equation (2)
3:  Split S into S_train and S_test
4:  Divide S_train into k folds
5:  for x^(l) ∈ X ∧ X ⊂ S_train do
6:      Initialize V̂^(l) ← ∅
7:      ŷ_0^(l) ← c_0(x^(l))
8:      if ŷ_0^(l) is benign then
9:          continue
10:     else
11:         for c_j ∈ C ∧ j ≠ 0 do
12:             ŷ_j^(l) ← c_j(x^(l))
13:             V̂^(l) ← add(ŷ_j^(l))
14:         end for
15:     end if
16:     ŷ^(l) ← ŷ_0^(l) concat V̂^(l)
17: end for
```
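The prediction side of Algorithm 1 can be sketched in Python as follows; the stand-in classifiers are toy functions, not the paper's trained models:

```python
# Minimal sketch of the hierarchical prediction in Algorithm 1, assuming
# pre-trained classifiers: c0 decides benign (0) vs attack (1); the
# sub-classifiers refine attack samples only.
def mlhc_predict(samples, c0, sub_classifiers):
    """Return Y_hat: [y0] for benign samples, [y0, y1, ..., yj] for attacks."""
    Y_hat = []
    for x in samples:                       # Line 5
        V = []                              # Line 6: V <- empty
        y0 = c0(x)                          # Line 7: first-layer decision
        if y0 == 0:                         # Lines 8-9: benign, stop here
            pass
        else:                               # Lines 10-14: refine attack info
            for cj in sub_classifiers:
                V.append(cj(x))
        Y_hat.append([y0] + V)              # Line 16: concatenate
    return Y_hat

# Toy stand-in classifiers: attack iff the first feature exceeds a threshold.
c0 = lambda x: int(x[0] > 0.5)
vehicle_clf = lambda x: 0   # always vehicle type 0 (illustrative)
attack_clf = lambda x: 2    # always attack type 2 (illustrative)

print(mlhc_predict([[0.1], [0.9]], c0, [vehicle_clf, attack_clf]))
# -> [[0], [1, 0, 2]]
```

Note that, as in the paper, the benign branch skips every sub-classifier, which is the source of the computation savings quantified in Section 4.2.3.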

#### 4.2.2. Confusion Matrix and Evaluation Metric for MLHC

A confusion matrix is used to evaluate the classification results. In general, when the training results of the model are returned only in binary classification, the results are expressed in only two types, positive and negative, so they have a simple matrix, as presented in Table **??**.


**Table 3.** Confusion matrix for binary classification.

However, the proposed MLHC method contains more information than the typical confusion matrix because it is a multi-class method that processes data of various categories and carries multiple classification results simultaneously. As in the existing confusion matrix, a result is a true negative (TN) or true positive (TP) if a benign sample is accurately classified as benign or the sub-classification information of an attack sample, such as vehicle type and attack type, is accurately detected. A result is a false negative (FN) if attack detection is missed because a sample containing sub attack information is misclassified as normal, and a false positive (FP) if normal data are erroneously detected as an attack, in which case a sub attack classification result is still returned. The difference from the existing confusion matrix is that, when the model classifies a data sample as an attack, the classification results of the various categories in the layers below the attack are included. If the first classifier accurately detected an attack but erroneously classified the additional information, such as vehicle type and attack type in the lower layers, the result is a partial true positive (PTP). The hierarchical confusion matrix containing PTP in the MLHC model is shown in Table **??**.


**Table 4.** Hierarchical confusion matrix.

To assess the model's performance, accuracy and F1 score are used among the classification indices, as shown in Equations (**??**) and (**??**), respectively.

$$Accuracy = \frac{TN + \sum TP}{TN + \sum TP + \sum FN + \sum FP + \sum PTP} \tag{6}$$

where accuracy represents the ratio of cases accurately classified, i.e., attack cases as attack and benign cases as benign, among all cases. For attack cases, only TP cases where even the additional information is correct are counted. The precision, which represents the probability that the actual correct answer is included among the values predicted as attack (i.e., $P\_{predict}$) by the classifier, is expressed as follows:

$$Precision = \frac{\text{positive detections}}{\text{all detections of the algorithm}} = \frac{\sum TP}{P\_{predict}} = \frac{\sum TP}{\sum TP + \sum FP + \sum PTP} \tag{7}$$

However, precision does not count as correct the PTP cases, in which the vehicle type or attack type is not accurately detected.

The recall, which represents the probability that the actual attack cases noted as *P* are accurately predicted as attack by the classifier, is expressed as follows:

$$Recall = \frac{\text{positive detections}}{\text{total number of existing positives}} = \frac{\sum TP}{P} = \frac{\sum TP}{\sum TP + \sum FN + \sum PTP} \tag{8}$$

As with precision, PTP cases are not counted as correct in recall. Precision and recall have a trade-off relationship: when the recall is raised by adjusting the parameters of the algorithm, false alarms increase; if the conditions are tightened to reduce false alarms, the recall drops. Therefore, recall and precision should be considered together. Hence, in this study, we used the F1 score, the harmonic mean of the two, as follows:

$$F1score = 2 \times \frac{Precision \times Recall}{Precision + Recall} \tag{9}$$
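Equations (6)–(9) can be computed directly from the hierarchical counts, as sketched below; the counts are made-up values for demonstration:

```python
# Hierarchical metrics of the MLHC confusion matrix: note that PTP appears in
# every denominator but never in a numerator, so partially correct attack
# classifications are penalized.
def mlhc_metrics(tn, tp, fn, fp, ptp):
    accuracy = (tn + tp) / (tn + tp + fn + fp + ptp)   # Equation (6)
    precision = tp / (tp + fp + ptp)                   # Equation (7)
    recall = tp / (tp + fn + ptp)                      # Equation (8)
    f1 = 2 * precision * recall / (precision + recall) # Equation (9)
    return accuracy, precision, recall, f1

# Illustrative counts (sums over all attack categories).
acc, prec, rec, f1 = mlhc_metrics(tn=890, tp=95, fn=5, fp=4, ptp=6)
print(acc)  # -> 0.985
```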

#### 4.2.3. Space of Hypothesis

The space of hypothesis *H*(*S*, *C*), which represents the space set of the model, is the product of the number of samples and the number of classifiers raised to the power of the data depth. It can be expressed as Equation (**??**):

$$H(\mathcal{S}, \mathcal{C}) = \left( n(\mathcal{S}) \times n(\mathcal{C}) \right)^{depth} \tag{10}$$

where *S* is the set of all samples, *C* is the set of classifiers for distinguishing the type of each target and depth is the number of layers of each classifier. The related notations are outlined in Table **??**.

**Table 5.** Summary of notations.


In this section, the existing two models, two-layer multi-class detection (TLMD) and single-layer based multi-class classification (SLMC), are compared in terms of space set with our proposed data learning model MLHC. The TLMD model proposed by Yuan et al. [**?** ] performs multi-class classification independently in each layer by two independent classifiers using the C5.0 algorithm and NB algorithm, respectively. By contrast, the method proposed by Aburomman and Reaz [**?** ] is an SLMC model that contains a multi-class classifier using a support vector machine that has a weight in one layer.

Figure **??**a illustrates the traditional model TLMD, which repeats the learning of the total dataset for the number of classifiers, and the computation of TLMD is shown in Equation (**??**):

$$H\_{TLMD}(\mathcal{S}, \mathcal{C}) = \frac{n(\mathcal{S})}{\mathcal{c}\_0} \times \frac{n(\mathcal{S})}{\mathcal{c}\_1} \times \dots \times \frac{n(\mathcal{S})}{\mathcal{c}\_j} = (n(\mathcal{S}))^{j+1} \cdot \prod\_{i=0}^j \left(\frac{1}{\mathcal{c}\_i}\right) \tag{11}$$

where the number of sample data to be learned in each classifier is *n*(*S*)/*cj*, and training is repeated for the number of classifiers *cj*.

**Figure 7.** Comparison classification methods: (**a**) two layers multi-class detection model; (**b**) single-layer based multi-class classification model; and (**c**) multi-labeled hierarchical classification model (our approach).

Figure **??**b illustrates the SLMC model, which classifies all the target data using one classifier. The multi-class classification method is required because the single classifier must express every combination of the classes $k\_j$ classified by the classifiers *C*. The computation of SLMC is expressed as Equation (**??**):

$$H\_{SLMC}(S, C) = \left\{ n(S) \times (c\_0 \times c\_1 \times \cdots \times c\_j) \right\}^1 = n(S) \cdot \prod\_{i=0}^{j} c\_i \tag{12}$$

where the target data are expressed as a combination of all data types that can be expressed by each classifier. Therefore, classifier *C* is $c\_0 \times c\_1 \times \cdots \times c\_j$, and the depth is one.

By contrast, our proposed MLHC method in Figure **??**c forms one classifier by combining multi-class classification and multi-labeled classification. Therefore, the computation of the MLHC is expressed as Equation (**??**):

$$H\_{MLHC}(\mathcal{S}, \mathcal{C}) = \left\{ n(\mathcal{S}) \times \left( 1 + c\_1 \times \dots \times c\_j \right) \right\}^1 = n(\mathcal{S}) \cdot \left( 1 + \prod\_{i=1}^j c\_i \right) \tag{13}$$

Compared to Equation (**??**), Equation (**??**) can reduce the amount of computation for benign data because it does not perform a separate classification process if the result of classifier $c\_0$ of the first layer is benign. To compare them, the two equations are rearranged after replacing $n(S) \cdot \prod\_{i=1}^{j} c\_i$ with $\delta$. $H\_{SLMC}(S, C)$ and $H\_{MLHC}(S, C)$ are then expressed as Equations (**??**) and (**??**), respectively:

$$H\_{SLMC}(S, \mathbb{C}) = n(S) \cdot c\_0 \cdot \prod\_{i=1}^{j} c\_i = \delta \cdot c\_0 \tag{14}$$

$$H\_{MLHC}(S, \mathcal{C}) = n(S) + n(S) \cdot \prod\_{i=1}^{j} c\_i = \delta + n(S) \tag{15}$$

In the SLMC model, an increase in data types to be classified means that the space of hypothesis increases according to the multiplicative function. By contrast, in the MLHC model, classifier *c*<sup>0</sup> of the first layer determines benign or attack; if it is benign, classification stops. Therefore, the amount of computation can be reduced for the amount of benign data. When the present dataset, where 89.39% of the total data is benign, is applied, only 10.61% of the attack data is used to classify the vehicle type and attack type. Hence, the space of hypothesis is reduced for the ratio of attack data.
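A back-of-the-envelope check of Equations (14) and (15) can be made as follows, assuming illustrative class counts (two first-layer classes, three vehicle types, three attack types) and the approximate sample count from Section 5.1:

```python
# Compare the hypothesis spaces of SLMC (Equation (14)) and MLHC
# (Equation (15)). The class counts are illustrative assumptions.
n_S = 1_735_840          # approx. total samples (Section 5.1)
c = [2, 3, 3]            # c0 = attack/benign, c1 = vehicle types, c2 = attack types

delta = n_S * c[1] * c[2]   # delta = n(S) * prod_{i=1}^{j} c_i
H_slmc = delta * c[0]       # Equation (14): delta * c0
H_mlhc = delta + n_S        # Equation (15): delta + n(S)

print(H_mlhc / H_slmc)      # MLHC needs a fraction of the SLMC space here
```

With these numbers the ratio is exactly 10/18, i.e., MLHC uses roughly half of the SLMC hypothesis space even before the benign short-circuit described above is taken into account.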

#### **5. Results and Discussion**

#### *5.1. Simulation Environments*

In the simulation, the data were learned using the learning model described in Section **??**, and the performance was compared by measuring accuracy and time. For the intrusion detection model of the in-vehicle network, we used the dataset [**?** ] released from the challenge of in-vehicle intrusion detection. The model was trained and verified by randomly extracting 80% of the data samples from a total of 1.73 million data samples, and the model performance was evaluated using the remaining 20% of the data samples. To classify attack or benign, vehicle type and attack type of CAN traffic, the data samples were learned as multi-labels, and the targets were classified as multi-classes to accommodate various vehicle types and attack techniques.

We used four machine learning algorithms to compare the performance of the proposed method. The stochastic gradient descent (SGD) algorithm [**?** ] is an iterative algorithm for optimizing objective functions that have suitable smoothness properties. We used SGD because it reduces the computational burden of high-dimensional optimization problems, thereby achieving faster iterations, although at a lower convergence rate. In the kNN classification algorithm, the input consists of the k closest training examples in the feature space; an object is classified by a plurality vote of its neighbors and assigned to the class most common among its k nearest neighbors. We used this algorithm as a simple baseline capable of multi-class classification for performance evaluation.
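The plurality-vote rule of kNN described above can be sketched as follows; the toy 2-feature points are illustrative, not CAN data:

```python
import numpy as np

# Minimal kNN: classify a query point by the plurality vote of the labels of
# its k nearest training samples in the feature space.
def knn_predict(X_train, y_train, x, k=3):
    dists = np.linalg.norm(X_train - x, axis=1)   # distances to all samples
    nearest = y_train[np.argsort(dists)[:k]]      # labels of the k closest
    return np.bincount(nearest).argmax()          # plurality vote

X_train = np.array([[0.0, 0.0], [0.1, 0.1], [1.0, 1.0], [0.9, 1.1]])
y_train = np.array([0, 0, 1, 1])

print(knn_predict(X_train, y_train, np.array([0.95, 0.95]), k=3))  # -> 1
```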

The DT algorithm constructs a tree structure where each non-leaf node represents an attribute evaluation and each leaf node represents a class label. This algorithm can effectively analyze and classify the data to identify the attributes with information gain. We also used DT in our study as it is a classification algorithm and can achieve good performance depending on the type of dataset used. Furthermore, the random forest (RF) algorithm [**?** ] is a kind of ensemble learning that is used for classification and regression. It returns the classification and average prediction results from the DTs and is therefore an extension of DT. We used the RF algorithm as well, to address the problem of overfitting on the training data and for obtaining a high accuracy.

To evaluate the performance of the classification model, detection rate and training time were selected as evaluation metrics. Accuracy, recall, precision and F1 score were calculated to evaluate the accuracy of the model in a reliable manner, and the elapsed time for training and evaluation of the model were measured. For the reference to evaluate whether the data samples were accurately classified, we used the hierarchical confusion matrix illustrated in Table **??**. This matrix does not include PTPs in TPs where the vehicle type or attack type is incorrect even if the attack or benign is accurately detected. We implemented classifiers using our novel method specified in Algorithm **??** and measured the accuracy.

### *5.2. Simulation Results*

Table **??** outlines and compares the simulation results of the four machine learning algorithms, namely, SGD, kNN, DT and RF, in terms of the detection rate; these models are described in Section **??**. The results are rounded at the fifth decimal place. In all three models, the RF algorithm shows a high positive detection rate of 0.99 or higher. Moreover, the MLHC model proposed in this study showed the highest detection rates consistently across the other three algorithms as well. The algorithm with the highest F1 score in each model and a graph of the F1 scores are shown in Figure **??**. All three models performed best with RF. Disregarding the training time, the F1 score is highest for MLHC, followed by TLMD and SLMC. The reason for the higher detection rate of MLHC compared to the other models can be explained as follows.


**Table 6.** Simulation results for detection rate.

**Figure 8.** Simulation results for the best F1 score of each model.

MLHC determines whether an attack has occurred and then classifies the attack information in a hierarchical manner. Therefore, benign and attack data are separated for each data sample in the first stage itself. Subsequently, the model uses only the attack data when classifying specific attack information, such as the attack type and vehicle type. Therefore, the benign data do not contribute to any errors in this model. Consequently, the MLHC model shows a higher detection rate than the TLMD model, which contains two layers, and the SLMC model, which comprises a single layer.

Table **??** presents the time elapsed for training and model evaluation in each model. For the training data, 1,388,672 data samples corresponding to 80% of all data samples were extracted randomly. Each model was evaluated using the remaining 20% (347,168) of the data samples. The first method, TLMD, uses independent classifiers in each layer to classify the attack type and vehicle type from the CAN traffic data. For this, $\prod\_{i}^{j=9} (n(S)/c\_i)$ needs to be computed in the three layers using Equation (**??**) during training. Consequently, TLMD took the largest amount of time for training and evaluation across all the algorithms. In the case of the SLMC model, the hypothesis space is proportional to the product of the number of samples and the number of classifiers. The hypothesis space is represented by $n(S) \cdot \prod\_{i}^{j=24} c\_i$, as shown in Equation (**??**), and the number of classifiers is 24.


**Table 7.** Simulation results for elapsed time.

On the contrary, the MLHC model uses a single classifier to learn the entire data and then determines whether a data sample represents an attack or a benign state. In this method, the benign data that do not require additional analysis, such as vehicle type or attack type, are excluded from the sub-classification targets. Therefore, Equation (**??**) reduces the amount of computation by the number of benign samples compared with the SLMC of Equation (**??**). Since in the MLHC model the benign data (89.4% of the total data) need not be reclassified, the learning time is reduced by 99.92% on average as compared to the TLMD model.

Figure **??** shows the number of CAN messages that can be processed per unit time by each algorithm of each model. The kNN and RF of the TLMD model processed 528 and 1927 test messages per second, respectively, whereas the kNN of the SLMC model processed 2973 messages per second. Considering that a 1 Mbps CAN bus at 50% channel utilization carries 5000 or more messages per second, these three configurations are not suitable for processing flooding messages in real time. Considering future high-speed CAN communication, the DT algorithm of the MLHC model, which can process 43.5 million messages per second, should be used to prevent a bottleneck at the intrusion detection module.

#### **6. Conclusions**

This paper proposes the MLHC learning model that hierarchically classifies attacks using a machine learning algorithm to detect anomaly behaviors of the in-vehicle network accurately and rapidly. The MLHC method can make quick judgements about attack or benign cases for in-vehicle networks by learning the CAN traffic, and it can classify additional detailed information when an attack is detected. A learning model that accommodates multi-labeled multi-class schemas was designed to include various attributes simultaneously while classifying various types of attack data. To evaluate the performance of our model, we applied four machine learning algorithms to existing models and compared accuracy, precision, recall, F1 score and elapsed times for training step and test step.

The simulation results show that the proposed MLHC model achieved high accuracy when based on the RF algorithm and rapid detection when based on the DT algorithm. Both algorithms derived F1 scores higher than 0.998. Thus, we conclude that the DT and RF algorithms are applicable to high-speed internal communication environments, as well as in CAN for analyzing 43 million and 46 million CAN message frames per second, respectively.

In the future, we plan to train and verify intrusion detection models on traffic injected into vehicles after directly generating messages of attack types beyond fuzzing, flooding and malfunction. Furthermore, we will analyze vehicle Ethernet traffic beyond the CAN as target networks to investigate methods of applying traditional intrusion detection and prevention patterns to the in-vehicle network. We also intend to investigate the parallel processing method [**?** ] for fast real-time data processing against sequential message injection attacks.

**Author Contributions:** Conceptualization, S.P. and J.-Y.C.; methodology, S.P.; software, S.P.; validation, S.P. and J.-Y.C.; data curation, S.P.; writing—original draft preparation, S.P.; writing—review and editing, J.-Y.C.; visualization, S.P.; supervision, J.-Y.C.; project administration, S.P. and J.-Y.C.; and funding acquisition, J.-Y.C. All authors have read and agreed to the published version of the manuscript.

**Funding:** This work was supported by Institute for Information and communications Technology Promotion (IITP) grant funded by the Korea government (MSIT) [No. 2018-0-00532, Development of High-Assurance (≥EAL6) Secure Microkernel].

**Conflicts of Interest:** The authors declare no conflict of interest.

*Sensors* **2020**, *20*, 3934
