*4.2. Datasets*

We evaluated our approach against baseline models on two network intrusion detection datasets: CIC-IDS2017 and NSL-KDD. The CIC-IDS2017 dataset reflects recent attacks and, to some extent, satisfies the criteria for reliable intrusion detection datasets proposed by [39]: anonymity, attack diversity, complete capture, complete interaction, complete network configuration, available protocol, complete traffic feature set, metadata, heterogeneity, and labeling.

The dataset was developed at the Canadian Institute for Cybersecurity at the University of New Brunswick (UNB) in 2017. It comprises raw PCAP files, captured on five consecutive days of one week (Monday to Friday), together with 80 statistical features generated from those files. The dataset covers several attacks and sub-attacks, as listed in Table 1.


**Table 1.** Attack composition of the CIC-IDS2017 dataset.

The NSL-KDD dataset was also generated at the Canadian Institute for Cybersecurity. It was created to address a known problem of the original KDDcup'99 dataset, in which about 78% of the training records and 75% of the testing records are duplicates [40]. NSL-KDD removes this redundancy while retaining the original 41 features.
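
The kind of redundancy described above can be measured and removed with a few lines of code. The following is a minimal sketch, not the procedure actually used to build NSL-KDD: the function names `duplicate_fraction` and `drop_duplicates` are hypothetical, the toy records are invented for illustration, and each record is assumed to be a tuple of feature values followed by a label.

```python
from typing import Hashable, Iterable, List, Sequence, Tuple

Record = Tuple[Hashable, ...]


def duplicate_fraction(records: Sequence[Record]) -> float:
    """Return the fraction of records that repeat an earlier, identical record."""
    if not records:
        return 0.0
    return 1.0 - len(set(records)) / len(records)


def drop_duplicates(records: Iterable[Record]) -> List[Record]:
    """Keep the first occurrence of each record, preserving the original order."""
    seen = set()
    unique = []
    for rec in records:
        if rec not in seen:
            seen.add(rec)
            unique.append(rec)
    return unique


# Invented toy records (protocol, service, src_bytes, label); not real data.
toy = [
    ("tcp", "http", 215, "normal"),
    ("tcp", "http", 215, "normal"),
    ("udp", "domain_u", 45, "normal"),
    ("tcp", "private", 0, "neptune"),
    ("tcp", "private", 0, "neptune"),
    ("tcp", "private", 0, "neptune"),
]

print(duplicate_fraction(toy))    # 0.5
print(len(drop_duplicates(toy)))  # 3
```

Removing exact duplicates in this way keeps a classifier from being biased toward the most frequently repeated records and keeps test-set scores from being inflated by them.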

Table 2 gives a breakdown of the attacks in the dataset that have more than five samples.


**Table 2.** Attack composition of the NSL-KDD dataset (training set).
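
A breakdown restricted to attack types with more than five samples, as in Table 2, could be produced along the following lines. This is only an illustrative sketch: the helper `attack_breakdown` is hypothetical, and the label counts in the toy list are invented and do not correspond to the real NSL-KDD distribution.

```python
from collections import Counter
from typing import Dict, Iterable


def attack_breakdown(labels: Iterable[str], min_samples: int = 5) -> Dict[str, int]:
    """Count samples per label and keep labels with more than `min_samples` samples."""
    counts = Counter(labels)
    # most_common() yields labels in order of decreasing count.
    return {label: n for label, n in counts.most_common() if n > min_samples}


# Invented label list; the counts are for illustration only.
toy_labels = (
    ["neptune"] * 10 + ["smurf"] * 7 + ["satan"] * 6 + ["phf"] * 5 + ["spy"] * 2
)

print(attack_breakdown(toy_labels))
# {'neptune': 10, 'smurf': 7, 'satan': 6}
```

Labels with exactly five samples are excluded, since the threshold is strictly "more than five".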
