1. Introduction
The rapid development of the Internet of Things (IoT), in areas such as smart homes, smart cities, and Industry 4.0, has opened up a new cyber–physical technological era that impacts our daily lives. According to the International Data Corporation, approximately 41.6 billion IoT devices will be connected to the Internet and will generate a total of 79.4 zettabytes of data in 2025 [1]. This growth is not expected to slow down and will continue in the coming years.
Innovation in smart devices is also increasing. For instance, smart homes can be equipped with several sensors for remotely controlling video surveillance, lighting, and heating systems [2]. Users can access information from these IoT devices or perform daily tasks through them (e.g., turning off a light). As a result, many consumer IoT devices (e.g., light actuators, Wi-Fi plugs, and cameras) are connected together to form a smart-home environment.
IoT is expected to account for a major part of Internet traffic; however, only a few studies model IoT traffic and characterize its properties [
3]. Compared with current Internet traffic, IoT traffic exhibits different characteristics because of the diversity of sources, heterogeneity of hardware devices, and novel services, leading to new traffic patterns [
4,
5]. IoT traffic is the aggregation of packets generated by several devices from different environments, such as smart homes or smart cities. These environments include several sensors dedicated to specific tasks, such as monitoring systems or collecting cyber–physical values (e.g., temperature and humidity).
Traffic generators are essential tools for evaluating network performance and characterizing traffic [
6]. Several traffic generators [
7], such as Iperf, PackETH, D-ITG, and Ostinato, focus on Internet traffic. IoT traffic has different characteristics, such as heterogeneity of sources, multiple sources, new traffic patterns, and different supported services [
4].
The rise of the IoT has also unveiled new vulnerabilities, as observed with the Mirai distributed denial of service (DDoS) botnet in 2016, severely impacting the Internet [
8]. The virtual carjacking of a vehicle has been reported as an exploit [
9]. Data from heart monitoring systems for babies were found to be unencrypted, and alerts in the case of an emergency could have been modified and strongly impacted the medical process [
10]. Hence, the IoT is facing new challenges regarding the cybersecurity of devices, data privacy, and communication. The collected data can convey critical information about users, their privacy, and the environment. It is essential to characterize the traffic of IoT devices to prevent security threats and mitigate vulnerabilities.
To study IoT traffic characteristics, we set up an IoT testbed for smart-home environments and performed extensive measurement campaigns. We also designed a novel IoT traffic generator tool called IoTTGen. It is a packet-level traffic generator that can generate traffic from multiple devices, emulating large-scale scenarios with different devices under different network conditions. It can be used to study the properties of IoT traffic from various environments, such as smart homes and biomedical environments [
3].
From our IoT traffic dataset, we computed the entropy values of traffic parameters and visually observed the IoT traffic on behavior shape (BS) graphs. We propose a new entropy-based method for identifying devices, which computes the entropy values of traffic features and relies on machine learning to classify the traffic. The proposed method identifies devices under various network conditions with an accuracy of up to 94% and is robust to unpredictable network behavior, such as traffic anomalies spreading in the network.
The rest of the paper is organized as follows.
Section 2 surveys the related work on IoT traffic.
Section 3 introduces the proposed IoT testbed for smart-home environments.
Section 4 describes the proposed novel IoT traffic generator tool called IoTTGen.
Section 5 presents our measurement experiments.
Section 6 shows the traffic entropy analysis.
Section 7 introduces the proposed new entropy-based identification method and discusses its performance accuracy.
Section 8 concludes this paper and provides some insights into future work.
4. IoT Traffic Generator Tool
In the previous section, we presented our testbed for collecting smart-home IoT traffic. Deploying a testbed allows a real dataset to be collected. However, a testbed has limitations: it is complex to extend its scale in order to test new devices, and its cost grows in direct relation to the number of devices. To address this, we designed a new IoT traffic generator tool, which we introduce in this section. With our generator, it is possible to reproduce the behavior of IoT networks in various environments and at any scale.
4.1. Overview
IoT traffic is the aggregation of packets generated by several devices from different environments, such as smart homes or smart cities. These environments involve several sensors dedicated to specific tasks, such as monitoring systems or collecting cyber–physical values (temperature, humidity, etc.). IoT traffic can be more easily predicted, as sensor devices are deployed to perform the same tasks continuously and generate the same amount of data periodically. This is the opposite of general Internet traffic, which has human-generated aspects (flash crowds, popularity, etc.). Some alerts can also occur, adding less predictable traffic behavior; however, they are still part of the sensor devices’ behavior. Hence, we designed a novel IoT traffic generator tool called IoTTGen. Our packet-level generator can finely tune all the traffic feature parameters, such as the packet size or the time interval between packets. This tool is essential for modeling IoT traffic, studying its characteristics, modeling unpredicted traffic behavior, and understanding IoT traffic anomalies.
4.2. Architecture
Different kinds of IoT environments can be modeled using IoTTGen, in which each sensor device produces its traffic trace according to its functionality and characteristics. For instance, a video recording camera generates continuous data flows with large-sized packets, and smart plugs generate small-sized packets at a slow pace. IoTTGen is designed to configure each device’s parameters easily. All the packets generated for each device are stored in a single trace file in the following supported file formats: pcap, csv, and txt.
The IoTTGen architecture comprises (i) a device configuration module, (ii) a packet creator module, and (iii) a main controller.
The device configuration module defines the IoT device configuration, such as the packet size, port number, payload, and period between packets. The user needs to define a new device template to add a new device (a smart light bulb or camera).
The packet creator module forges the packets based on the device configuration from the previous module. To forge packets and generate real packet traffic, we used Scapy [
47]. Depending on the user's needs, it can also generate packet traces in text format only. For a long-duration experiment, this drastically reduces the duration of the packet generation process.
The main controller controls the execution of IoTTGen. The controller extracts the parameters provided by the device configuration module and causes the packet creator to forge packets or provide packet traces. It also merges each device trace into a single trace; however, it is still possible to generate one trace per device.
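The three modules described above can be sketched as follows. This is a minimal illustration under stated assumptions and not IoTTGen's actual code: the `DeviceTemplate` fields, the tuple record layout, the function names, and the port numbers are all hypothetical, and the sketch only produces text-style records rather than forging packets with Scapy.

```python
import csv
import heapq
import io
from dataclasses import dataclass
from typing import Iterator, List, Tuple

# Hypothetical record layout: (timestamp_s, device_name, dst_port, size_bytes)
Packet = Tuple[float, str, int, int]


@dataclass(frozen=True)
class DeviceTemplate:
    """Device configuration module: one template per device (assumed fields)."""
    name: str
    packet_size: int  # bytes
    dst_port: int     # illustrative value, not taken from the paper
    period: float     # seconds between two packets


def create_packets(dev: DeviceTemplate, duration: float) -> Iterator[Packet]:
    """Packet creator module: emit one text-only record per period."""
    for i in range(int(duration / dev.period)):
        yield (i * dev.period, dev.name, dev.dst_port, dev.packet_size)


def run(devices: List[DeviceTemplate], duration: float) -> List[Packet]:
    """Main controller: build one trace per device and merge them by time."""
    traces = [create_packets(dev, duration) for dev in devices]
    return list(heapq.merge(*traces))


def to_csv(trace: List[Packet]) -> str:
    """Serialize the merged trace in a csv layout (column names are assumed)."""
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["time_s", "device", "dst_port", "size_bytes"])
    writer.writerows(trace)
    return buf.getvalue()


devices = [
    DeviceTemplate("plug", packet_size=100, dst_port=9999, period=2.0),
    DeviceTemplate("camera", packet_size=1000, dst_port=554, period=0.5),
]
trace = run(devices, duration=4.0)
```

Because each per-device trace is already ordered in time, `heapq.merge` can combine them lazily without sorting the full trace; keeping the per-device generators separate also makes it straightforward to write one trace per device instead.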
4.2.1. Device-Generated Traffic
IoT devices are dedicated entities that continuously perform their tasks, such as sensing the environment and transferring data among objects or users automatically, without human intervention. Thus, each device continuously generates traffic even when no user is explicitly requesting information from it.
Figure 3 shows the packet generation process for IoTTGen. The period for generating packets varies for each kind of device and can be configured. For example,
Figure 3 shows that the devices have periods of 1, 1.5, 1, and 2 s. When generating traffic, all the devices are synchronized with the same time origin; however, it is also possible to configure the starting time and add some delays among packets and periods to reduce the devices’ synchronicity. Each device then generates a different number of packets with a different period for the entire duration of the scenario. Besides predefined parameters, it is also possible to assign parameters as intervals so that their values are randomly chosen within ranges. This is useful for modeling the unpredictable behavior of devices and users in specific scenarios (e.g., a daily routine except for one day, new behaviors, or different interactions with devices). Thus, our generator is a complete scenario composition framework for the IoT.
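The scheduling options described above (fixed periods, a configurable start time, and parameters drawn from ranges) can be illustrated with a short sketch. The function names, the uniform distribution, and the example range are assumptions made for illustration; the text does not specify how IoTTGen samples values within a range.

```python
import random
from typing import List, Tuple, Union

# A parameter is either a fixed value or a (low, high) range (assumed encoding).
Value = Union[float, Tuple[float, float]]


def draw(value: Value, rng: random.Random) -> float:
    """Return a fixed parameter as-is, or sample uniformly from a range."""
    if isinstance(value, tuple):
        low, high = value
        return rng.uniform(low, high)
    return float(value)


def send_times(period: Value, duration: float, start: float = 0.0,
               seed: int = 0) -> List[float]:
    """Timestamps of one device's packets over the scenario duration."""
    rng = random.Random(seed)
    times: List[float] = []
    t = start
    while t < duration:
        times.append(t)
        t += draw(period, rng)
    return times


# The distinct periods of Figure 3 (1, 1.5, and 2 s) with a common time origin.
fixed = {p: send_times(p, duration=6.0) for p in (1.0, 1.5, 2.0)}

# A period drawn from [0.8, 1.2] s and a delayed start reduce synchronicity.
jittered = send_times((0.8, 1.2), duration=6.0, start=0.25, seed=42)
```

With fixed periods, every device sends its first packet at the common origin, which is the synchronized case; the jittered device drifts away from the others after a few packets.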
4.2.2. Human-Generated Traffic
IoTTGen can generate traffic triggered by human activities. Each device has different event patterns (e.g., the smart plug has a turn-on event and turn-off event); hence, IoTTGen can also model the traffic generated by human activities. Besides, the user can create different scenarios using IoT devices in regular daily activities. For example, when coming home, the user can turn on the bulb and plug, launch the Spotify music service on the hub, and access the camera’s recording activities.
Figure 3 shows that plug 1 is switched on at activity 1, and two packets are generated for plug 1. Then, when the bulb is switched on at activity 2, IoTTGen generates packets accordingly. Furthermore, human activities can also be created randomly. For example, when the user has a party at home, smart-home devices can operate randomly. Therefore, IoTTGen can also set the time to trigger events for these devices randomly. Thus, human-generated traffic can be generated easily using IoTTGen.
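A minimal sketch of event-triggered traffic is given below. The event table, the inter-packet gap, and the function names are hypothetical; only the two packets generated when plug 1 is switched on come from the description of Figure 3, and the other packet counts are placeholders.

```python
import random
from typing import Dict, List, Tuple

Packet = Tuple[float, str, str]  # (timestamp_s, device, event)

# Packets emitted per event. Two packets for a plug switch-on follows Figure 3;
# the remaining counts are illustrative assumptions.
EVENT_PACKETS: Dict[Tuple[str, str], int] = {
    ("plug1", "turn_on"): 2,
    ("plug1", "turn_off"): 2,
    ("bulb", "turn_on"): 3,
}


def event_packets(device: str, event: str, at: float,
                  gap: float = 0.01) -> List[Packet]:
    """Packets generated by one human-triggered event starting at time `at`."""
    count = EVENT_PACKETS[(device, event)]
    return [(at + i * gap, device, event) for i in range(count)]


def scripted(activities: List[Tuple[float, str, str]]) -> List[Packet]:
    """Expand a list of (time, device, event) activities into packets."""
    packets: List[Packet] = []
    for at, device, event in activities:
        packets.extend(event_packets(device, event, at))
    return sorted(packets)


def random_activities(duration: float, count: int,
                      seed: int = 0) -> List[Tuple[float, str, str]]:
    """Random trigger times and events, e.g., for the party scenario."""
    rng = random.Random(seed)
    choices = sorted(EVENT_PACKETS)
    return sorted((rng.uniform(0.0, duration),) + rng.choice(choices)
                  for _ in range(count))


coming_home = scripted([(10.0, "plug1", "turn_on"), (25.0, "bulb", "turn_on")])
party = scripted(random_activities(duration=3600.0, count=20, seed=7))
```

The scripted list corresponds to a predefined daily routine, whereas `random_activities` covers the case in which the trigger times are chosen randomly.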
In the following section, we show how we applied IoTTGen to an IoT environment and generated different kinds of traffic.
5. IoT Traffic Measurement Experiments
We set up three experiments: collecting measured traffic using the testbed, generating synthetic traffic using IoTTGen, and generating anomalous traffic by injecting malicious traffic into synthetic traffic and measured traffic.
We used IoTTGen for implementing experiments and generating synthetic IoT traffic for smart-home environments. The measured traffic was obtained from our testbed, as discussed in
Section 3.1. We also generated malicious traffic by extracting anomalous traffic from the real dataset [
19]. Thus, we aimed to investigate the properties of IoT traffic and study the impact of malicious traffic.
IoTTGen can generate one day’s traffic for the five-device smart-home scenario in a few seconds, at more than 60,000 packets per second. For the 52-device scenario, generation reaches up to 80,000 packets per second and also produces one day’s traffic in a few seconds. This larger experiment is faster, as more devices have similar properties. The complexity of IoTTGen is presented in
Table 4. Thus, IoTTGen can generate daily traffic in a few seconds and generate a longer traffic trace according to the scenarios. We conducted all the experiments on a computer running an Intel Core i7-7700 at 3.6 GHz, with 8 GB of RAM and 64-bit Windows 10 Professional (Microsoft Co., Redmond, WA, USA).
5.1. Generating Synthetic Traffic
5.1.1. Smart-Home Environment
A smart home is defined as a house equipped with (multiple) cyber–physical sensors for temperature, humidity, light control, smart hubs, and other things, allowing inhabitants to obtain information on their environment (e.g., temperature) and control and monitor it remotely (e.g., turning on equipment). In our smart-home environment, the devices were connected to the Internet with Wi-Fi through a home gateway for controlling the flow of information among smart appliances and to the remote network. Thus, remote users can access data and control home sensors with dedicated devices, such as smartphones, tablets, and computers.
5.1.2. Smart-Home Scenario
To generate IoT traffic for a smart home, we set up an experiment with a four-room house equipped with the following 13 smart devices (
Figure 4): one smart hub (e.g., Amazon Echo) for controlling other devices, four smart cameras (e.g., Belkin NetCam), four smart lights (e.g., Lifx Bulb), and four smart plugs (e.g., TP-link Smart Plug).
To show that the different parameters would have an impact on the generated traffic, we considered two distinct sets of parameters for the smart-home environments: a case with custom parameters and another case with parameters extracted from the dataset [
48].
Table 5 summarizes the custom parameters that we modeled based on the functionality of each device. For instance, a smart plug generates short-sized packets (100 bytes) periodically, as smart plugs are low-bandwidth sensor devices. A smart hub may have larger packets (200 bytes) for management purposes but with the same period. We consider that a smart camera is a high-bandwidth device that continuously generates large-sized video packets (1000 bytes) at a short period (50 ms), giving a video bitrate of 160 Kbps.
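The camera bitrate quoted above follows directly from the packet size and the period, as the short check below shows. The helper name is hypothetical, and the calculation assumes that the bitrate counts only the stated 1000-byte packet size, with 1 Kbps taken as 1000 bit/s.

```python
def bitrate_bps(packet_size_bytes: int, period_ms: int) -> float:
    """Average bitrate of a periodic source: bits per packet / period."""
    bits_per_packet = packet_size_bytes * 8
    return bits_per_packet * 1000 / period_ms


# Smart camera: 1000-byte packets every 50 ms, i.e., 20 packets per second.
camera_bps = bitrate_bps(packet_size_bytes=1000, period_ms=50)
camera_kbps = camera_bps / 1000
```

Here 1000 bytes × 8 bits × 20 packets/s gives 160,000 bit/s, which matches the 160 Kbps stated for the camera.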
Table 5 shows the parameters extracted from the previous study [
48]. With the same network configuration, the smart hub, smart light, smart plug, and smart camera generated short-sized packets (144, 94, 120, and 100 bytes, respectively) at distinct periods (2.77, 3.2, 10, and 2 s, respectively).
Different parameter values (custom vs. extracted parameters) will have an impact on the traffic properties (
Figure 5), which will be described in
Section 6.2.1.
For this smart-home environment scenario, the overall smart-home traffic was generated by IoTTGen for three durations: 8 h, 24 h, and 7 days.
By adding new devices, we also implemented an experiment with a biomedical environment, as in a previous study [
3].
5.2. Measured Traffic on Testbed
5.2.1. Traffic Trace Observation
Table 6 summarizes the one-day trace of the collected traffic. The camera, a high-bandwidth device, generated the major part of the packets and bytes, whereas the plugs accounted for a small amount of the traffic. The hub also generated more packets than the camera, but overall, the hub accounted for a lower amount of traffic.
5.2.2. Cloud Server Observation
There are many IoT manufacturers in the market today. Each manufacturer has its cloud server that manages its devices, and each DNS query–response pair is mapped into a particular domain owned by the manufacturers, as shown in
Table 7. For instance, the TP-link plug device is directed to the TP-Link cloud server at devs.tplinkcloud.com. Internet users may access many online servers (e.g., the web, online social networks, and e-business sites) during their activities. However, IoT devices are dedicated to a single task and communicate only with pre-established servers. Inspecting the remote servers that IoT devices connect to can therefore indicate whether the devices have been corrupted or whether they may be sending information to nonlegitimate servers.
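The server inspection described above can be sketched as a check of observed DNS query names against a per-device list of expected domains. This is only an illustration: the device key, the allowlist structure, and the second query name are hypothetical, and only devs.tplinkcloud.com is taken from the text.

```python
from typing import Dict, Iterable, List, Set, Tuple

# Hypothetical allowlist of manufacturer domains per device.
ALLOWED_DOMAINS: Dict[str, Set[str]] = {
    "tpl_plug": {"tplinkcloud.com"},
}


def is_expected(device: str, qname: str,
                allowed: Dict[str, Set[str]] = ALLOWED_DOMAINS) -> bool:
    """True if the queried name is an allowed domain or one of its subdomains."""
    name = qname.rstrip(".").lower()
    return any(name == dom or name.endswith("." + dom)
               for dom in allowed.get(device, set()))


def unexpected_queries(
    queries: Iterable[Tuple[str, str]]
) -> List[Tuple[str, str]]:
    """Return the (device, query name) pairs that fall outside the allowlist."""
    return [(dev, q) for dev, q in queries if not is_expected(dev, q)]


observed = [
    ("tpl_plug", "devs.tplinkcloud.com"),  # server named in the text
    ("tpl_plug", "updates.example.net"),   # made-up name for illustration
]
flagged = unexpected_queries(observed)
```

Matching on a leading dot before the domain avoids accepting look-alike names that merely end with the same characters as a legitimate domain.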
5.3. Synthetic Traffic vs. Measured Traffic
To compare the measured traffic and synthetic traffic, we used IoTTGen to generate traffic based on parameters extracted from
Table 8. These parameters are similar to the parameters of measured traffic from our testbed.
For a larger number of devices, we also considered a scenario with a house equipped with the following 52 smart devices: four hubs, eight cameras, 20 bulbs, 10 Tpl plugs, and 10 Tk plugs. The total duration of this experiment was 2 h.
Figure 6 shows the differences between two scenarios of five devices and 52 devices. With 52 devices, the traffic was eight times higher than that with five devices.
5.4. Anomalous Traffic
Besides the regular user activity described previously (the ON and OFF periods on the testbed) and the smart-home IoT traffic generated by IoTTGen, another scenario of interest is when IoT devices are under attack and security threats are present in the network. Many IoT devices are connected to the Internet, and many cybersecurity threats or anomalies have been observed, such as the Mirai DDoS botnet. It is therefore essential to accurately detect when devices are under attack or exhibit anomalous traffic, so that the operator can react quickly to new threats. For this reason, we also wanted to model anomalous IoT traffic.
To this end, we relied on a public IoT traffic dataset [
19], from which we extracted the traffic of three cybersecurity threats: port scanning (PS), DoS, and DDoS. Five bots generated the DDoS, whereas a single bot generated the DoS and the port scanning. Each malicious traffic trace was injected into our generated IoT traffic. We thus obtained five different traces: one original trace of IoT traffic and four malicious traces, one for each anomaly and one for their mixture (referred to as DDoS, DoS, PS, and mix). The total duration of the experiment was 24 h, and the cybersecurity threats occurred at 06:00 for 38 min. For the other traffic, however, we used different configurations for the anomalous traffic.
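The injection step can be sketched as a time shift followed by a merge. The record layout, the function name, and the packet rates below are assumptions chosen for illustration; only the 24 h duration, the 06:00 start time, and the 38 min attack duration come from the text.

```python
import heapq
from typing import List, Tuple

Record = Tuple[float, str]  # (timestamp_s, label); hypothetical layout


def inject(base: List[Record], attack: List[Record],
           start_s: float) -> List[Record]:
    """Shift the attack trace so it begins at start_s, then merge by time."""
    if not attack:
        return sorted(base)
    t0 = min(t for t, _ in attack)
    shifted = [(t - t0 + start_s, label) for t, label in attack]
    return list(heapq.merge(sorted(base), sorted(shifted)))


DAY_S = 24 * 3600
ATTACK_START_S = 6 * 3600  # 06:00
ATTACK_LEN_S = 38 * 60     # 38 min

# Illustrative traces: one IoT packet per minute, one attack packet per 10 s.
iot = [(float(t), "iot") for t in range(0, DAY_S, 60)]
ddos = [(float(t), "ddos") for t in range(0, ATTACK_LEN_S, 10)]

mixed = inject(iot, ddos, start_s=ATTACK_START_S)
```

Re-basing the attack trace on its first timestamp keeps its internal packet spacing unchanged, so the same extracted trace can be placed at any point of the generated day.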
For the measured traffic scenario, the anomalous traffic traces are presented in
Table 9, and the total duration of each trace was 1 min. The original traffic is depicted in
Figure 2.
For the synthetic traffic scenario, the anomalous traffic statistics are presented in
Table 10, and the total duration of the traces was 38 min.
Figure 7 shows the generated traffic used for the smart-home scenario. The generated anomalous traffic is a mixture of the three different attacks and the IoT traffic.
6. Behavior Shape Traffic Analysis
To study the different traffic properties, we computed the entropy of the six traffic parameters: the IP source, IP destination, port source, port destination, packet size, and byte count. Then, we plotted the BS graphs [
20] of the entropy to visualize and compare the traffic properties. The entropy can also be represented with time series [
49] or three-dimensional graphs [
24]. In our work, we chose the behavior shape graph, as we aimed at characterizing the traffic visually.
In this section, the theory of information entropy is first briefly introduced. Then, the results of our experiments are presented through the use of the entropy behavior shape. First, we present the results for regular traffic (
Section 6.2) (i) generated by IoTTGen (
Section 6.2.1) and (ii) measured from the testbed (
Section 6.2.2), and the comparison (
Section 6.2.3). Second, we present the results with anomalous traffic. In these experiments, we also showed the overall traffic of a smart home as well as the traffic of individual devices.
6.1. Entropy
As we aimed to identify IoT devices based on network traffic (legitimate or anomalous), the frequencies of traffic parameters, such as IP addresses or ports, could help to identify the devices and the network conditions. Hence, we computed the entropy values of the following traffic parameters [
24]: the IP source, IP destination, port source, port destination, packet size, and bytes.
Information entropy is a quantity in information theory for measuring uncertainty (see Equation (
1)) [
25].
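For reference, the standard Shannon form of the entropy, which matches the range from 0 to log(n) stated in this section, can be written as below. The symbols are notation assumed here: $n$ is the number of distinct values observed for a traffic parameter, $m_i$ is the number of packets carrying the $i$-th value, and $p_i$ is its empirical probability. The base-2 logarithm is also an assumption; it is consistent with destination-port entropy values close to 16 for attacks spanning the port range.

```latex
H(X) = -\sum_{i=1}^{n} p_i \log_2 p_i ,
\qquad
p_i = \frac{m_i}{\sum_{j=1}^{n} m_j}
```

With this form, $H(X) = 0$ when a single value carries all the packets, and $H(X) = \log_2 n$ when the $n$ values are equally frequent.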
The entropy value can vary from 0 to log(n). An entropy value equal to 0 means that all the observations (i.e., packets) are identical for the given parameter, whereas a higher entropy value shows that the observations are more diverse. As shown in the rest of the paper, these parameters convey sufficient information for precisely identifying the devices under different network conditions.
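The computation of the six entropy values can be sketched as follows. This is a minimal illustration assuming the standard Shannon definition with a base-2 logarithm; the packet field names are hypothetical, and the byte count is interpreted as the total bytes per IP source, as described later for the DoS traffic.

```python
import math
from collections import Counter
from typing import Any, Dict, Iterable, List, Mapping


def entropy(weights: Iterable[float]) -> float:
    """Shannon entropy (base 2) of non-negative counts or weights."""
    positive = [w for w in weights if w > 0]
    total = sum(positive)
    if total == 0:
        return 0.0
    return sum((w / total) * math.log2(total / w) for w in positive)


# Hypothetical field names for the five frequency-based parameters.
FIELDS = ("src_ip", "dst_ip", "src_port", "dst_port", "size")


def traffic_entropies(packets: List[Mapping[str, Any]]) -> Dict[str, float]:
    """Entropy of each traffic parameter over one trace."""
    result = {f: entropy(Counter(p[f] for p in packets).values())
              for f in FIELDS}
    # Byte count: total bytes per IP source (assumed interpretation).
    bytes_per_src: Counter = Counter()
    for p in packets:
        bytes_per_src[p["src_ip"]] += p["size"]
    result["bytes"] = entropy(bytes_per_src.values())
    return result


# Two devices sending the same number of packets to a single gateway.
packets = [
    {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.1", "src_port": 5000,
     "dst_port": 443, "size": 100},
    {"src_ip": "10.0.0.2", "dst_ip": "10.0.0.1", "src_port": 5001,
     "dst_port": 443, "size": 100},
    {"src_ip": "10.0.0.3", "dst_ip": "10.0.0.1", "src_port": 6000,
     "dst_port": 443, "size": 200},
    {"src_ip": "10.0.0.3", "dst_ip": "10.0.0.1", "src_port": 6000,
     "dst_port": 443, "size": 200},
]
values = traffic_entropies(packets)
```

In this toy trace, the single destination yields an entropy of 0 for the IP destination, while the two equally active sources yield 1 bit for the IP source.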
6.2. IoT Traffic under Regular Network Condition
In the following, we present the characterization of the IoT traffic collected on the testbed and generated by our generator.
6.2.1. Synthetic Traffic under Regular Network Condition
Figure 5 and
Figure 8 present the BS graphs of the generated traffic for the smart home from the experiments presented in
Section 5.1.
Figure 5 compares the traffic that our generator produced with the custom parameters and with the extracted parameters in
Table 5.
Figure 8 shows smart-home traffic that was also generated on the basis of the extracted parameters in
Table 5.
In all the figures, the entropy value for the IP destination parameter is equal to 0. This is because, for all the experiments, the traffic generated from IoT devices flowed to the same destination (i.e., the gateway). Then, there was no entropy, as there was only a single destination.
In Figure 5, the IoT traffic was generated over 24 h as a regular daily activity with the same network configuration (smart-home environment). The parameter values directly impacted the shape of the traffic: the extracted parameters exhibited a larger shape than the custom parameters. The number of IoT devices was the same, but the parameters (period and packet size) were different, resulting in different BS graphs.
In Figure 8, the experiments were performed for 8 h, 24 h, and 7 days. For all these durations, the IoT traffic exhibited the same BS graph. Because the traffic was synthetic, the traffic parameters stayed unchanged during all the experiments, and there was no traffic evolution (e.g., there were no new connected devices or failures).
6.2.2. Measured Traffic under Regular Network Condition
As we split the traffic into 5 min time slots, each trace from the same state (i.e., ON, OFF, or anomalous) showed similar properties. We show the BS graph of a single 5 min trace for each device.
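Splitting a trace into 5 min time slots can be sketched as below. The record layout, the function name, and the synthetic one-day trace are assumptions for illustration; a one-day trace yields 288 slots of 300 s each.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Packet = Tuple[float, int]  # (timestamp_s, size_bytes); hypothetical layout


def split_into_slots(packets: Iterable[Packet],
                     slot_s: float = 300.0) -> Dict[int, List[Packet]]:
    """Group packets by the index of the time slot that contains them."""
    slots: Dict[int, List[Packet]] = defaultdict(list)
    for pkt in packets:
        slots[int(pkt[0] // slot_s)].append(pkt)
    return dict(slots)


# Illustrative one-day trace with one 100-byte packet every 10 s.
one_day = [(float(t), 100) for t in range(0, 86_400, 10)]
slots = split_into_slots(one_day)
```

Each slot can then be processed independently, for example by computing the entropy of its traffic parameters to obtain one instance per 5 min trace.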
Figure 9 shows the BS graph of one day’s traffic for five consecutive days. As the traffic shapes were consistent over a day, a single day’s traffic is representative of the other days.
Figure 10,
Figure 11,
Figure 12,
Figure 13 and
Figure 14 present the BS graphs for different smart devices during one day. The OFF period area is smaller than that of the ON period, except for the camera. This can be explained by the fact that in the ON period, the camera uses almost only one type of packet, whereas the remaining devices use more diverse types. This leads to a decrease in the entropy value, expressed by a smaller BS area for the camera ON traffic. Moreover, the BS graph for each device is very different, which can help to identify the IoT devices. Each device has different characteristics and functionality; hence, the devices send or receive different kinds of packets. For example, the camera uses many different source ports when sending packets and shows a greater entropy value for the port source (3.7) than that for the hub (2.9) or those of other devices (i.e., the Tpl plug (1.55), Tk plug (2.0), and bulb (1.9)).
Regarding the OFF period, the BS graphs for the camera and hub are larger than those of the other devices. Those devices have many functionalities and generate many more packets, even in the OFF period, than the other devices (the bulb or plugs). The entropy values for such devices reached a higher level.
6.2.3. Synthetic Traffic vs. Measured Traffic under Regular Network Condition
We validated the effectiveness of our generator by comparing synthetic traffic and measured traffic.
Figure 15 and
Figure 16 show the BS graphs of the generated traffic from IoTTGen (based on parameters in
Table 8) and testbed, as presented in
Section 5.3. The BS graphs of the synthetic traffic with the measured parameters and the measured traffic overlap in both figures and exhibit the same properties.
Figure 15 shows that the entropy values were nearly equal for each pair of synthetic and measured traffic.
Figure 16 shows that these values are similar when focusing only on the hub device with the ON/OFF activity. This shows that our generator succeeded in modeling the IoT traffic and capturing its main characteristics.
6.3. Anomalous Traffic
The number of packets increased drastically when the network was under attack. Therefore, anomalous traffic had a strong impact on the shape of the IoT traffic.
Figure 17,
Figure 18,
Figure 19,
Figure 20 and
Figure 21 present the BS graphs of the anomalous traffic (see
Section 5.4) for the smart-home environment.
6.3.1. Synthetic Traffic under Network Attack Condition
Figure 17 shows the IoT traffic, each of the malicious traffic types (DoS, DDoS, and PS), and the aggregation of all the traffic (IoT and anomalies). As we previously observed, the duration of the experiments had no impact on the entropy values; for this experiment, we focused on the daily activity patterns and present the experiment results for 24 h.
Figure 17 shows that the malicious traffic impacted the entropy values and the BS graph of the IoT traffic.
For all the malicious traffic, there was a higher entropy for the destination port. It was more pronounced for DoS and DDoS (14.5) than for port scanning (). These anomalies come from security threats targeting a large number of destination ports. Thus, it is possible to visually observe such anomalies in the network by computing the entropy value.
Regarding the entropy value of the IP source, the IoT traffic had higher entropy than the malicious traffic. Recall that the traffic was generated by 13 IoT devices for the smart-home environment, whereas the DDoS was generated by five bots and the DoS and port scanning by a single bot. There were therefore more distinct IP sources in the legitimate traffic, leading to more variation and higher entropy. This parameter thus allowed us to observe variation in the traffic and differentiate the malicious traffic from the legitimate traffic. For the source port, the observation was similar to that for the IP source. The DoS and DDoS relied on a single port to send traffic, so their entropy values reached the lowest value at 0.11. IoT scenarios involve more source ports and show higher entropy.
Regarding the packet size parameter, port scanning reached a higher value than the other traffic, and the DoS and DDoS traffic showed the lowest value. This is because the DoS and DDoS traffic sent only 60-byte packets, whereas the port scanning used various sizes of packets (100 B to 1 MB;
Table 10). The IoT traffic also used various packets according to the devices and environment, as shown in
Table 5. Thus, there were more variations for port scanning and higher entropy.
Regarding the byte count, the DoS traffic showed lower entropy because the byte count totals the bytes per IP source, so its entropy depends on the diversity of the packet sizes and the number of IP sources.
We experimented with computing the entropy level regarding traffic parameters and visually observed the traffic on BS graphs. Different traffic, such as legitimate or malicious traffic, shows different entropy values and has different impacts on a network. Using our IoT generator, we succeeded in picturing IoT traffic characteristics and showed that it is possible to detect anomalies based on entropy and the visual representation of the traffic, such as the behavior shape.
6.3.2. Measured Traffic under Network Attack Condition
Figure 18 shows that the shape of the hub traffic with DDoS was different from that of the regular hub traffic. The DDoS exhibited much higher entropy values for port destinations (16) because it targeted many destination ports. For the packet size and port source, the hub traffic had higher entropy than the DDoS traffic. DDoS relies on a single port to send packets of the same size, so its traffic entropy for these features is close to zero. Overall, the BS graph can represent the nature of the traffic.
We also compared the entropy values of all the device traffic with those of each type of anomalous traffic.
Figure 19 shows the difference between the ranges of the entropy values of the anomalous traffic and normal traffic. The entropy values of the normal traffic always fluctuated within a certain range, whereas those of the anomalous traffic remained almost stable. Additionally, the minimum value of the normal traffic was always greater than, or its maximum value always less than, that of the anomalous traffic. From the observation of the entropy value, it is therefore possible to detect anomalies for IoT devices.
6.3.3. Synthetic Traffic vs. Measured Traffic under Network Attack Condition
We also used BS graphs to evaluate the effectiveness of our generator under the influence of anomalous traffic.
Figure 20 and
Figure 21 show that anomalous traffic directly impacted the shape of the traffic. This impact was equivalent for both the synthetic traffic and the measured traffic. As analyzed in previous sections, the malicious traffic shapes were still different from those of the legitimate traffic. For the different attack cases, the shape of the anomalous synthetic traffic nearly coincided with that of the anomalous measured traffic.
With IoTTGen, we can still model IoT traffic successfully under various network conditions.
7. Entropy-Based IoT Device Identification
Based on the experimental results, we propose a new approach to identifying and classifying IoT devices based on machine learning and traffic entropy values.
7.1. Classification Algorithms
In the previous section, the entropy values of traffic features showed different characteristics for each device and could help to identify devices. We now aim to classify the devices by relying on machine learning algorithms. Our collected one-day trace was split into several 5 min traces. We evaluated the effectiveness of our classification using a 10-fold cross-validation method [
50] and then applied it to an independent validation dataset. The dataset was randomly divided into two datasets: the training dataset (80% of the total instances, i.e., 5-min traces) and validation dataset (20% of the total instances).
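The evaluation protocol (a random 80%/20% split followed by 10-fold cross-validation on the training part) can be sketched with the standard library alone. This is not Weka's procedure: the function names, the seed, the number of instances, and the absence of stratification are assumptions made for illustration.

```python
import random
from typing import Iterator, List, Tuple


def train_validation_split(n: int, validation_ratio: float = 0.2,
                           seed: int = 0) -> Tuple[List[int], List[int]]:
    """Randomly split instance indices into training and validation sets."""
    indices = list(range(n))
    random.Random(seed).shuffle(indices)
    n_val = round(n * validation_ratio)
    return indices[n_val:], indices[:n_val]


def k_folds(indices: List[int],
            k: int = 10) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train, test) index lists; each fold is held out exactly once."""
    for i in range(k):
        test = indices[i::k]
        held_out = set(test)
        train = [j for j in indices if j not in held_out]
        yield train, test


# 100 hypothetical instances, each standing for one 5 min trace.
train_idx, val_idx = train_validation_split(100)
folds = list(k_folds(train_idx, k=10))
```

The validation indices are never seen during cross-validation, which keeps the final evaluation independent of the model selection step.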
Using the Weka software [
51], we relied on six classification algorithms: (i) decision tree (DT), (ii) random forest (RF), (iii)
k-nearest neighbors (KNN), (iv) Gaussian Naive Bayes (NB), (v) neural network (NN), and (vi) support vector machine (SVM). We considered the following metrics for evaluating the algorithms’ performance: the true positive rate (TPR), false positive rate (FPR), precision, recall, and
F-measure.
Table 11 shows the classification results for these algorithms for the validation dataset for all the devices and network conditions.
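The metrics listed above have standard definitions in terms of true and false positives and negatives, computed per class in a one-vs-rest manner. The sketch below uses those textbook definitions; the function names and the example counts are illustrative and are not taken from Table 11.

```python
from typing import Dict


def _ratio(num: float, den: float) -> float:
    """Safe division that returns 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def binary_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """TPR, FPR, precision, recall, and F-measure for one class."""
    precision = _ratio(tp, tp + fp)
    recall = _ratio(tp, tp + fn)  # recall is the same quantity as the TPR
    return {
        "tpr": recall,
        "fpr": _ratio(fp, fp + tn),
        "precision": precision,
        "recall": recall,
        "f_measure": _ratio(2 * precision * recall, precision + recall),
    }


# Made-up counts for a single class.
m = binary_metrics(tp=8, fp=2, fn=2, tn=88)
```

For a multi-class problem, these per-class values are typically averaged, possibly weighted by class size, to obtain one figure per algorithm.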
In our experiments, the SVM and NN algorithms performed poorly based on the TPR metric: 0.2996 and 0.3137, respectively. NB and DT showed better performance; however, NB only reached an average level of performance (0.5712), whereas DT reached a higher level and could classify approximately 72% of the IoT devices properly. The RF and KNN algorithms outperformed the other algorithms and showed a high level of performance. KNN succeeded in classifying >92% of the devices, and RF exhibited even higher performance, up to 94.74%. Although both RF and KNN succeeded in classifying IoT devices based on the entropy value, RF showed the best performance. In the following, we relied on the RF algorithm for computing the confusion matrix.
7.2. Confusion Matrix for IoT Identification
Through all our data, we could define 50 classes based on the states of each device and the type of network traffic (normal or anomalous). More precisely, the five devices could be classified into ON or OFF states and under five different network conditions: regular traffic, DDoS, DoS, PS, and the three attacks jointly.
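The count of 50 classes follows from combining five devices, two states, and five network conditions, as the enumeration below shows. The label format is hypothetical; the device names are those of the testbed devices mentioned in the text.

```python
from itertools import product

DEVICES = ("hub", "camera", "bulb", "Tpl plug", "Tk plug")
STATES = ("ON", "OFF")
CONDITIONS = ("regular", "DDoS", "DoS", "PS", "mix")

# One class per (device, state, network condition) combination.
CLASSES = [f"{device}/{state}/{condition}"
           for device, state, condition in product(DEVICES, STATES, CONDITIONS)]

num_classes = len(CLASSES)  # 5 devices x 2 states x 5 conditions
```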
After processing the data with RF, we obtained a probability vector for each class, shown on the confusion matrix (CM) in
Figure 22. The accuracy of the classification depends on the ratio of accurate predictions. The CM provides further information on the accuracy of different classifiers and which classes are correctly or incorrectly predicted. It also provides the misclassification type.
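A confusion matrix and its row-normalized form can be computed as sketched below. The labels and predictions are made-up values for illustration and are unrelated to the results in Figure 22; the function names are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """matrix[i][j] counts instances of class i predicted as class j."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for truth, pred in zip(y_true, y_pred):
        matrix[index[truth]][index[pred]] += 1
    return matrix


def row_normalize(matrix: List[List[int]]) -> List[List[float]]:
    """Turn each row into per-class prediction rates."""
    rates = []
    for row in matrix:
        total = sum(row)
        rates.append([c / total if total else 0.0 for c in row])
    return rates


labels = ["bulb/ON", "bulb/OFF", "hub/ON"]
y_true = ["bulb/ON"] * 4 + ["bulb/OFF"] * 4 + ["hub/ON"] * 2
y_pred = (["bulb/ON"] * 4
          + ["bulb/OFF", "bulb/OFF", "bulb/ON", "hub/ON"]
          + ["hub/ON"] * 2)

cm = confusion_matrix(y_true, y_pred, labels)
rates = row_normalize(cm)
accuracy = sum(cm[i][i] for i in range(len(labels))) / len(y_true)
```

The diagonal of the normalized matrix gives the fraction of each class that is correctly predicted, and the off-diagonal entries show which classes are confused with each other.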
Our classification method showed very high accuracy in identifying the devices under different network conditions. For the plugs, 96% of the traffic for these devices was accurately classified. The precision for the hub under the regular network condition was also very high (95%) and remained over 80% under anomalous network conditions. The prediction for camera OFF also reached a high level (96%), but the prediction accuracy decreased under network anomalies. Similarly, for the bulb, the prediction was very high for the ON period (98–99%) but dropped drastically for the OFF period (43–57%). The predictions were very accurate with regular traffic and dropped when cyberattacks compromised the network. The intense-activity period (ON) also showed higher accuracy than the no-user-activity period (OFF).
Finally, IoT device classification based on entropy succeeds in identifying IoT devices under various network conditions.
8. Conclusions
We performed measurement experiments and modeling to study the properties of IoT traffic. We set up a smart-home testbed to collect consumers’ IoT traffic. From these experiments, we derived a new IoT traffic generator tool to model and emulate IoT traffic. Our tool can be used as a scenario composition framework for modeling IoT environments, such as smart homes.
We also extracted IoT anomalies from a public dataset and compared the synthetic traffic with the measured traffic by computing the entropy of the traffic parameters and observing the resulting BS graphs. The traffic shape differed significantly for different scenarios and for each type of smart IoT device, and it was also possible to observe the impact of anomalous traffic. Our traffic generator succeeded in representing the traffic characteristics, and our methodology shows that we can also compare traffic and highlight anomalies.
Furthermore, we introduced a new entropy-based method for identifying IoT devices based on the computation of the entropy values of traffic features. We used machine-learning algorithms, such as RF, to classify the devices based on the IoT traffic’s entropy value in different scenarios. Our results show that our method reached a high level of performance, accurately classifying the devices (94% accuracy), especially in the case of intense activity. We also applied our method under different scenarios and cybersecurity threats, in which it still classified devices accurately.
For future work, we are now using our generator to model other IoT environments and characterize different IoT devices. For instance, we are adapting our generator tool for new scenarios, such as biomedical or smart-factory environments. Our generator is also being tuned to generate IoT network anomalies and combine them with the measured traffic. It is an essential tool for studying consumer IoT devices in a wide range of scenarios. For visualizing IoT traffic, we currently use the BS graphs to observe and analyze the traffic in general. In the future, the similarity of these shapes will be quantified to detect IoT anomalies; for example, we can use the Euclidean distance to evaluate the differences between shapes. Regarding our entropy-based identification method, we are now adapting it to classify IoT devices on the fly. This method could be used in a gateway to immediately detect legitimate or malicious devices in the network environment. Furthermore, deep-learning algorithms and feature selection methods will be applied to our dataset to improve and evaluate the performance of our method.
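The Euclidean distance between two behavior shapes can be sketched by treating each shape as a vector of six entropy values. The feature names, the function name, and the numeric values below are illustrative assumptions and not results from the experiments.

```python
import math
from typing import Mapping

# Hypothetical names for the six entropy axes of a behavior shape.
FEATURES = ("src_ip", "dst_ip", "src_port", "dst_port", "size", "bytes")


def shape_distance(a: Mapping[str, float], b: Mapping[str, float]) -> float:
    """Euclidean distance between two shapes over the six entropy axes."""
    return math.sqrt(sum((a[f] - b[f]) ** 2 for f in FEATURES))


# Made-up entropy vectors for a regular and an anomalous trace.
regular = {"src_ip": 2.0, "dst_ip": 0.0, "src_port": 4.0,
           "dst_port": 2.0, "size": 1.5, "bytes": 1.0}
anomalous = {"src_ip": 2.0, "dst_ip": 0.0, "src_port": 1.0,
             "dst_port": 6.0, "size": 1.5, "bytes": 1.0}

d_same = shape_distance(regular, regular)
d_diff = shape_distance(regular, anomalous)
```

A distance of zero means that the two shapes coincide, and a threshold on the distance to a reference shape would be one possible way to flag anomalies; choosing such a threshold is outside the scope of this sketch.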