### 4.1.1. BoT-IoT Dataset

In previous studies, different datasets, such as KDD99, ISCX, and CICIDS2017, have been used to evaluate ML models; however, few datasets have been produced to reflect realistic IoT network traffic. Existing datasets were either not diverse enough in terms of attacks or not realistic in terms of the testbed [19]. Therefore, Koroniotis et al. [49] designed the BoT-IoT dataset to address these limitations. The BoT-IoT dataset is used in forensic analysis and to evaluate IDS. It contains normal IoT traffic and different types of attack traffic, with subcategories for each type, which are listed in Table 2. Reconnaissance is a privacy threat that allows a threat actor to collect data about a victim through port scanning and OS fingerprinting, among other techniques. Information theft includes data theft by unauthorized access and keylogging. A DoS threat, on the other hand, affects the availability of services and can damage systems, which makes it one of the biggest threats to smart cities. In this dataset, the UDP, TCP, and HTTP protocols were used to perform both DoS and DDoS attacks.
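For illustration only, the categories and subcategories named in the paragraph above can be written down as a small lookup structure, for example as a starting point for label encoding. This is a minimal sketch: the dictionary is transcribed from the prose rather than from Table 2, the normal-traffic class is left out, and the names `BOT_IOT_TAXONOMY` and `flat_labels` are hypothetical and not part of the dataset.

```python
# Category -> subcategories, as named in the text (assumption: Table 2
# remains the authoritative listing and may use different label strings).
BOT_IOT_TAXONOMY = {
    "Reconnaissance": ["port scanning", "OS fingerprinting"],
    "Information theft": ["data theft", "keylogging"],
    "DoS": ["UDP", "TCP", "HTTP"],
    "DDoS": ["UDP", "TCP", "HTTP"],
}


def flat_labels(taxonomy):
    """Return one 'category/subcategory' string per attack subcategory."""
    return [
        f"{category}/{subcategory}"
        for category, subcategories in taxonomy.items()
        for subcategory in subcategories
    ]


labels = flat_labels(BOT_IOT_TAXONOMY)
```

Keeping the category in each flattened label avoids the ambiguity that would otherwise arise because DoS and DDoS share the same protocol subcategories.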


**Table 2.** Attack categories in BoT-IoT dataset.

### 4.1.2. TON\_IoT Dataset

The TON\_IoT dataset [50] is one of the newest cybersecurity datasets; it was collected from a testbed network for Industry 4.0 IoT and Industrial IoT (IIoT), which makes it suitable for evaluating CTI in a smart city. We used the TON\_IoT train–test dataset, which is provided in CSV format. The dataset contains a total of 461,043 instances and 9 types of attacks, which are presented in Table 3 along with the number of instances for each type.
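The per-type instance counts can be tallied directly from the CSV file. The following is a minimal sketch using only the Python standard library; the function name `count_attack_types`, the default column name `type`, and any file path passed to it are assumptions for illustration and should be adjusted to the actual file header.

```python
import csv
from collections import Counter


def count_attack_types(csv_path, type_column="type"):
    """Count rows per value of `type_column` in a CSV file with a header row."""
    counts = Counter()
    with open(csv_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        # Fail early if the assumed column name does not match the header.
        if reader.fieldnames is None or type_column not in reader.fieldnames:
            raise KeyError(f"column {type_column!r} not found in {csv_path}")
        for row in reader:
            counts[row[type_column]] += 1
    return counts
```

Calling `count_attack_types(path)` on the train–test file and summing the returned values gives a quick consistency check against the 461,043 instances reported above.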

**Table 3.** Attack categories in TON\_IoT dataset.

