1. Introduction
The Internet of Things (IoT) is a network of interconnected things. It extends the Internet by connecting various information sensing devices (RFID, infrared sensors, global positioning systems, etc.) to it for information exchange and communication, enabling intelligent identification, control, perception, monitoring, and management. The IoT has a wide range of application areas, including smart homes, smart agriculture, industrial manufacturing, smart transportation, smart cities, and healthcare. As the technology continues to develop and improve, the IoT will play a greater role, promoting the intelligent development of society and enabling efficient, convenient, data-driven decision-making.
IoT devices will be a significant part of the Internet. As of the beginning of 2024, there were 16.7 billion IoT devices. According to one study, there will likely be more than 29 billion IoT connections by 2027 [1]. The rapid development of IoT devices has enabled a more modern and intelligent way of life. In the past few years, the number of IoT devices has grown explosively, significantly exceeding the number of Internet users. However, as the number of IoT devices continues to increase, the associated security risks are escalating. Due to its intrinsic characteristics, the IoT faces heightened risks of cyber threats [2,3], such as data leakage, broken authentication, and spoofing. In recent years, there have been multiple instances of IoT devices being misused for network attacks [4,5,6]. As IoT applications extend into more areas, bringing convenience and efficiency, the need for IoT device identification becomes urgent. As a fundamental technology in network security and defense, IoT device identification plays a crucial role in securing the underlying infrastructure.
However, the standards for IoT devices are diverse. Even devices of the same type may differ in their control systems, transmission protocols, and cipher suites. Environments that use IoT devices typically deploy a large number of them, of different types and from different manufacturers. If administrators cannot easily identify devices and their healthy behavior, network security and management may suffer.
Because IoT device communication follows distinctive behavior patterns, researchers usually identify devices by analyzing characteristics of network traffic, such as communication mode, protocol, and packet size [7]. Traditional methods use the information carried by devices during communication, such as the Organizationally Unique Identifier (OUI) in MAC addresses, TLS certificate manufacturer information, destination IP addresses, DNS requests, and the User-Agent field in HTTP requests. However, such rule-based identification is sensitive to interference. For example, User-Agent-based identification cannot be applied to encrypted traffic and has a high delay, so it is not universal [8,9]. Although the MAC OUI is helpful for device identification, its granularity is too coarse to identify devices accurately [10], making it difficult to meet identification needs within a local area network.
Traditional methods are intuitive and cost-effective, but they often lack accuracy and efficiency, particularly in complex scenarios, and struggle to keep pace with the rapid changes and growing demands of the IoT environment. With the advancement of technologies such as artificial intelligence, machine-learning-based approaches for IoT device identification have emerged in recent years. Unlike traditional methods, machine-learning-based approaches can automatically learn and adapt to complex and dynamic device behaviors, thereby enhancing accuracy and efficiency in identification while also offering stronger dynamic adaptability and scalability. In order to classify IoT devices efficiently and stably, current researchers are attempting to extract features from the network traffic and provide them as inputs to machine learning models to learn hidden patterns from complex traffic features. Many studies have followed this approach and achieved high accuracy. Recent studies have extracted packet header fields or network layer features of IoT devices for device identification [11,12,13,14]. However, the network layer characteristics of many IoT devices exhibit consistency in patterns. Furthermore, most existing methods are not applicable when IoT devices are located behind a NAT (Network Address Translation)-enabled router, as many features change during the NAT process [15].
In this paper, we aim to establish a practical IoT device identification system that accurately identifies IoT devices through sessions; in addition, our method is also applicable to IoT devices behind NAT-enabled routers, as the NAT process does not alter the communication payload. Compared to previous works that focused on identifying IoT devices by extracting features from the header of relatively easily tampered packets, our method focuses on extracting the payload content and features of IoT device sessions to identify IoT devices. It is more universal and suitable for any IoT device, even if they use different protocols. Meanwhile, this method also provides a high accuracy.
This work offers the following contributions and differences from previous work: (1) Our method does not rely on packet header fields or network layer features. Instead, we attempt to use the more stable and reliable payload of sessions for identification, which makes our method more robust and widely applicable. (2) We propose a novel two-stage machine learning model based on sessions to identify IoT devices. By establishing machine learning models based on the frequent items and network communication features of sessions, we achieve efficient identification of IoT devices and identify some network communication features. (3) Our method only requires one session to achieve the identification of IoT devices, with an identification accuracy of up to 99.48%. Moreover, the method can accurately identify IoT devices even in network communications with non-IoT devices.
The remainder of this paper is organized as follows. Section 2 reviews related work on IoT device identification. Section 3 gives an overview of our method. Section 4 describes our model selection. Section 5 presents the evaluation experiments and results. Section 6 concludes the work.
2. Related Work
The increasing number of IoT devices has made it difficult for network managers to monitor the IoT environment. For network managers, it is necessary to be able to identify which IoT devices have been or are attempting to connect to their network and monitor them to protect them from network threats.
In recent years, researchers have developed several identification methods for IoT devices. These methods can be primarily categorized into active IoT device identification technology and passive IoT device identification techniques.
Active IoT device identification, also known as active probing, involves the identification party actively sending probe messages to IoT devices based on communication protocol formats. It collects and analyzes relevant fields from response data, such as DNS requests, MAC OUI, and the user-agent field in HTTP requests. These fields are then matched with identification rules, which are either manually crafted or generated automatically. If a substring from the application layer response data matches a rule, the device is labeled using the information from that rule.
Currently, the most popular large-scale active probing platforms, such as Censys, Shodan, and ZoomEye, rely on manually crafted rule libraries for identification. However, manual rule annotation is cumbersome and demands annotators with strong professional knowledge. Moreover, as the number of protocols and device types grows, maintaining and updating large rule libraries becomes increasingly difficult. Therefore, researchers have attempted to generate rules automatically.
Feng et al. proposed ARE (Acquisitional Rule-Based Engine) [16], which can automatically generate identification rules. ARE actively sends probe data to IoT devices and extracts relevant fields from the response data. It then uses these fields as search query keywords in search engines. Subsequently, ARE crawls websites from the search result list. For relevant webpages, ARE employs Named Entity Recognition (NER) to extract device annotations, including the device type, vendor, and product. However, ARE has some limitations, such as the inability to perform search queries if there is no device-related information in the response data.
To identify IoT devices that ARE cannot recognize, Wang et al. proposed IoTtracker [17]. In the feature extraction stage, IoTtracker divides application layer data into semi-structured and unstructured data and then extracts the structure or features of each category. During the device identification stage, a labeled device database is required, and the similarity between the features of unknown devices and known devices is calculated. If the similarity exceeds a certain threshold, the label of the known device can be used to label the unknown device. The number of devices that this method can identify depends on the completeness of the label dataset.
Active IoT device identification, under the condition of allowing access to device information, can obtain more granular details such as the device type, vendor, and model. However, it has limitations when device ports are not open to external access or when third-party information collection is not permitted. In such cases, device information cannot be obtained, thus restricting the effectiveness of this method.
Passive IoT device identification identifies IoT devices connected to the network by monitoring network traffic and analyzing device communication behavior. This method does not actively intervene in the network but passively observes the communication activities of devices.
In [18], the authors proposed a fingerprint-based identification framework called IoTsense. It generates device fingerprints based on network activities, protocol features, payload features, and other aspects of the first five data packets from the IoT device and then identifies the device type. Their fingerprint identification method provides an overview of the behavioral profile of device types. IoTsense collects both incoming and outgoing network traffic and then extracts relevant features using statistical tools, aggregates these features, and uses them as references for device type identification. This approach enables the monitoring of device behavior throughout its entire lifecycle. The results of the study showed recall rates of 93–100% for each device, with an average accuracy of 99%.
Sivanathan et al. [19] classified IoT devices using a large dataset collected over a six-month period from 28 types of IoT devices (including motion sensors, digital cameras, plugs, smart lights, and healthcare-related monitoring devices). They employed four device network activity attributes: (1) flow volume; (2) flow duration; (3) average flow rate; (4) device sleep time; and four application layer protocol attributes: (1) server port numbers; (2) Domain Name System (DNS) queries; (3) Network Time Protocol (NTP) queries; (4) cipher suites. In total, eight key attributes were used for IoT device identification. The classification accuracy for the 28 IoT devices reached 99.88%. However, four devices from the dataset were not used, and some elements of the feature set, such as port numbers, DNS queries, and cipher suites, were overly specific and did not capture device behavior.
In Reference [20], the authors proposed an identification mechanism called IoT Sentinel, which automatically identifies IoT devices connected to the network based on the traffic generated during device boot-up. The study monitored device behavior by generating specific fingerprints for each device using data captured from IoT devices at Aalto University. IoT Sentinel extracts 23 features from each packet, describing its protocol, IP options, length, destination IP address, and port, and then selects the first 12 packets of each device to construct a feature matrix with 276 values (12 × 23) as the device fingerprint. The study employed nine classifiers, among which Random Forest performed the best, achieving 95% accuracy. However, this method requires capturing traffic during the device boot-up phase; once the device is operational, it can no longer extract features and build models. Additionally, this method cannot distinguish devices from the same vendor.
Meanwhile, Kostas et al. [21] proposed a machine-learning-based method for IoT device identification called IoTDevID. The study classified device behavior based on a single packet. In the feature extraction process, redundant features are eliminated using importance voting. The optimal feature subset is determined using a genetic algorithm, reducing the complexity of the model. The accuracy was 83.30% on the Aalto dataset and 94.30% on the UNSW dataset. IoTDevID achieves high accuracy for IoT devices at the packet level. Despite these advancements, a common issue remains: the computational burden caused by excessive feature extraction, which leads to intricate feature selection and complex features.
3. Proposed Method
In this study, we propose a method for identifying and classifying IoT devices at the session level based on a network communication payload.
3.1. System Overview and Methodology
The system extracts features from flows and sessions, as well as frequent patterns in payloads, and uses a two-layer machine learning model for effective IoT device identification. As depicted in Figure 1, our system workflow consists of the following four steps:
- (a)
Traffic and session splitting: dividing the original traffic into multiple individual sessions.
- (b)
Session Communication Feature Extraction: extracting session communication features of different IoT devices.
- (c)
Frequent Item Extraction and Initial Classification: extracting frequent patterns from different IoT device sessions and conducting initial classification based on the extraction results.
- (d)
Second Classification Prediction: performing a second round of IoT device classification based on session communication features and initial classification results.
The detailed instructions for each step are as follows.
3.1.1. Traffic and Session Splitting
Our method first acquires traffic from the IoT network, where traffic from multiple devices is mixed together. The purpose of this step is therefore to separate the traffic of different IoT devices and reorganize it into sessions for feature extraction. In IoT device identification, traffic can be split at the granularity of packets, TCP connections, flows, or sessions, each yielding different traffic units. Flows and sessions capture the complete communication interaction, helping to better understand the relationships between packets, so we chose to split traffic based on flows and sessions. A flow is defined as the packets sharing the same 5-tuple (source IP, destination IP, source port, destination port, and transport protocol), while a session consists of the bidirectional flows, containing both directions of traffic. This step has two operations: (1) IoT device traffic splitting: In the dataset we use, each device’s MAC address is fixed. Therefore, we first use the SplitCap tool (a free, open-source pcap file splitting tool that can split traffic based on flows, sessions, MAC addresses, etc.) to segment traffic by MAC address, separating the traffic generated by each IoT device. (2) Session splitting: For the traffic obtained for each IoT device, we again use SplitCap to split it into multiple files based on TCP or UDP flows, with each file representing a set of sessions for IoT devices.
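The distinction between a flow and a session can be illustrated with a minimal Python sketch. It is not the SplitCap implementation; the `Packet` record and the function names are hypothetical and only assume that each packet exposes its 5-tuple. A flow key keeps the direction, while a session key orders the two endpoints so that both directions map to the same session.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical, already-parsed packet record (illustration only)."""
    src_ip: str
    dst_ip: str
    src_port: int
    dst_port: int
    proto: str
    payload: bytes


def flow_key(p: Packet) -> tuple:
    # A flow is directional: the plain 5-tuple.
    return (p.src_ip, p.dst_ip, p.src_port, p.dst_port, p.proto)


def session_key(p: Packet) -> tuple:
    # A session is bidirectional: sort the two endpoints so that
    # A->B and B->A produce the same key.
    a = (p.src_ip, p.src_port)
    b = (p.dst_ip, p.dst_port)
    first, second = sorted((a, b))
    return (first, second, p.proto)


def split_sessions(packets):
    """Group packets into sessions, preserving arrival order."""
    sessions = defaultdict(list)
    for p in packets:
        sessions[session_key(p)].append(p)
    return dict(sessions)
```

In this sketch a DNS query and its reply end up in one session, whereas a TCP exchange between the same endpoints and ports forms a separate session because the transport protocol differs.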
3.1.2. Session Communication Feature Extraction
In this step, we will extract network communication features from the effective payload of sessions. Most session payloads are located at the top layer of the computer network protocol stack—the application layer, which is responsible for handling communication and data exchange between applications. The application layer data of different protocols vary significantly, and different IoT devices use different application layer protocols, exhibiting distinct features. Therefore, we attempt to extract communication features from each session.
The extracted features are shown in Table 1, mainly including the following categories:
- 1.
Message characteristics
The message refers to the payload of a session, i.e., the actual data transmitted in the communication. Message characteristics mainly include the text-to-binary ratio of the message, the information entropy of the message, and the duration of the session.
Text-to-binary ratio: In our definition, text fields consist of continuous printable characters, including numbers, letters, punctuation marks, and standard symbols like spaces and tabs. When there are four or more consecutive characters that qualify as printable characters, that particular byte sequence is categorized as a text field; conversely, if not, it is identified as a binary field. For consistency and efficiency, we perform this calculation on the initial 100 bytes of each flow only.
Information entropy: Information entropy is a quantity in information theory used to measure uncertainty [22]. It is defined as follows:
$$H(X) = -\sum_{i} p(x_i) \log p(x_i)$$
where $p(x_i)$ represents the probability of random event $X$ being $x_i$, and $H(X)$ is the value of the entropy.
Our goal is to identify IoT devices based on the session’s payload. Therefore, we calculate the entropy of source and destination flows. As with the text-to-binary ratio, when determining the entropy of the payload, we limit our analysis to the first 100 bytes as well. If a session has no payload, its entropy value is assigned as 0.
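The two message characteristics above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' code: the printable set is taken to be ASCII 0x20 to 0x7E plus tab, the ratio is expressed as the share of bytes belonging to text fields (which avoids dividing by zero when there is no binary field), and the entropy uses a base-2 logarithm over the byte distribution. The function names are hypothetical.

```python
import math
from collections import Counter

# Assumption: "printable" means ASCII 0x20-0x7E plus the tab character.
PRINTABLE = set(range(0x20, 0x7F)) | {0x09}


def text_share(payload: bytes, limit: int = 100, min_run: int = 4) -> float:
    """Share of the first `limit` bytes that lie in text fields.

    A text field is a run of at least `min_run` consecutive printable bytes;
    everything else counts as binary.
    """
    data = payload[:limit]
    if not data:
        return 0.0
    text_bytes = 0
    run = 0
    for b in data:
        if b in PRINTABLE:
            run += 1
        else:
            if run >= min_run:
                text_bytes += run
            run = 0
    if run >= min_run:  # close a run that reaches the end of the data
        text_bytes += run
    return text_bytes / len(data)


def payload_entropy(payload: bytes, limit: int = 100) -> float:
    """Shannon entropy (in bits, an assumption) of the first `limit` bytes.

    Returns 0 for an empty payload, as described in the text.
    """
    data = payload[:limit]
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())
```

Applied separately to the source-to-destination and destination-to-source payloads, these two functions would yield one pair of values per direction.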
- 2.
Counting characteristics
These are attributes that can be derived through simple counting, primarily the number of packets, the overall session length, etc. The counting features cover both directions of the flow and the entire session.
- 3.
Statistical characteristics
Statistical characteristics are statistical values of flows and sessions, mainly including the average length, the quartiles of the flows in both directions, the maximum and minimum packet sizes, the variance of packet length, etc.
Previous researchers mainly extracted features from just the initial few packets or individual packets from the traffic, which could lead to the neglect of correlations between packets. Therefore, we attempt to extract these features from complete sessions. Statistical features also encompass both directions of the flow and the entire session.
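As an illustration of the counting and statistical characteristics, the following Python sketch computes them from lists of packet lengths for the two directions and for the whole session. The exact feature list is given in Table 1; the feature names, the use of the population variance, and the "inclusive" quartile method here are assumptions made for the example.

```python
import statistics


def length_features(lengths):
    """Counting and statistical features for one list of packet lengths."""
    if not lengths:
        return {"count": 0, "total": 0, "mean": 0.0, "min": 0, "max": 0,
                "var": 0.0, "q1": 0.0, "median": 0.0, "q3": 0.0}
    if len(lengths) >= 2:
        q1, median, q3 = statistics.quantiles(lengths, n=4, method="inclusive")
    else:
        # Quartiles of a single value are that value itself.
        q1 = median = q3 = float(lengths[0])
    return {
        "count": len(lengths),
        "total": sum(lengths),
        "mean": statistics.mean(lengths),
        "min": min(lengths),
        "max": max(lengths),
        "var": statistics.pvariance(lengths),  # population variance (assumption)
        "q1": q1,
        "median": median,
        "q3": q3,
    }


def session_features(fwd_lengths, bwd_lengths):
    """Features for both directions and for the entire session."""
    features = {}
    groups = {
        "fwd": fwd_lengths,
        "bwd": bwd_lengths,
        "all": list(fwd_lengths) + list(bwd_lengths),
    }
    for prefix, lengths in groups.items():
        for name, value in length_features(lengths).items():
            features[f"{prefix}_{name}"] = value
    return features
```

Because every packet of the session contributes, the features reflect the whole interaction rather than only its first few packets.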
3.1.3. Frequent Item Extraction and Initial Classification
In this step, we extract frequent items from the sessions of different IoT devices. Previous studies have attempted to create device fingerprints for IoT device identification from specific application layer protocols (such as HTTP and DNS) [23]; however, this method has latency issues. Consequently, instead of limiting ourselves to certain protocols, our method attempts to extract frequent items from all IoT device sessions to construct their fingerprints.
- 1.
Session Aggregation
The first step aggregates sessions based on the destination port of the IoT device. We found that for most IoT devices, the server-side ports they connect to tend to be fixed or closely related, as illustrated in Figure 2. Furthermore, sessions with similar or identical destination ports closely resemble one another. Therefore, we aggregate sessions of IoT devices based on their destination port numbers. We group port numbers into the following categories based on their characteristics: 53-DNS, 67-BOOTP server, 68-BOOTP client, 80-HTTP, 123-NTP, 137-NBNS, 554-RTSP, 443-HTTPS, 445-SMB, 1900-SSDP, 5353-mDNS, 5355-LLMNR, 8080-WWW, 49153-ANTLR, 0:1023-well-known ports, 1024:49151-registered ports, and 49152:65535-dynamic (private) ports [21]. These categories are mutually exclusive; for example, port 80 is not classified as a well-known port. If the destination port numbers of two sessions belong to the same category, the two sessions are aggregated together. However, if two sessions have the same destination port number but different transport layer protocols, they are not grouped into the same category.
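The port categorization can be written as a small lookup. The sketch below is illustrative: the function names are hypothetical, named ports take precedence over the generic ranges so the categories stay mutually exclusive, and the transport protocol is part of the aggregation key so that TCP and UDP sessions on the same port are kept apart.

```python
# Named ports listed in the text; they take precedence over the generic ranges.
NAMED_PORTS = {
    53: "DNS", 67: "BOOTP server", 68: "BOOTP client", 80: "HTTP",
    123: "NTP", 137: "NBNS", 443: "HTTPS", 445: "SMB", 554: "RTSP",
    1900: "SSDP", 5353: "mDNS", 5355: "LLMNR", 8080: "WWW", 49153: "ANTLR",
}


def port_category(port: int) -> str:
    """Map a destination port to exactly one category."""
    if not 0 <= port <= 65535:
        raise ValueError(f"invalid port: {port}")
    if port in NAMED_PORTS:
        return NAMED_PORTS[port]
    if port <= 1023:
        return "well-known"
    if port <= 49151:
        return "registered"
    return "dynamic"


def aggregation_key(dst_port: int, proto: str) -> tuple:
    """Sessions are aggregated only if category and transport protocol match."""
    return (port_category(dst_port), proto.upper())
```

For example, `aggregation_key(53, "udp")` and `aggregation_key(53, "tcp")` differ, so DNS over UDP and DNS over TCP are aggregated separately.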
- 2.
Frequent Item Extraction
Next, we proceed to extract the IoT device’s frequent items from the sessions that have been aggregated in step one for the training set. It is important to note that we do not extract frequent items for non-IoT devices at this step.
Taking inspiration from the N-gram concept, we devised a method to extract frequent items from the first 40 bytes of the payload in each session. The process is as follows: A sliding window of size N moves across the payload at the byte level, generating a series of byte sequences, each having a length of N. We term each of these byte sequences a “gram”. Then, we calculate the frequency of occurrence of each gram in each category of sessions, apply a predefined threshold for filtering, and generate the list of frequent patterns for the current IoT device. Finally, we merge all frequent items extracted from the sessions of all IoT devices to compile a word set. The pseudocode is shown in Algorithm 1.
Algorithm 1: Frequent_Item_Extraction(pcaps, N, TS)
Input: IoT session files pcaps, gram length N, threshold TS
Output: frequent items Freq_item
dport_set ← dict()
for Session in pcaps do
    dport_set[Session.dport].append(Session)
end for
Freq_item ← set()
for dport, sessions in dport_set do
    F_set ← dict()
    for s in sessions do
        for i in [0 : 40 − N] do
            F_set[s[i : i+N]] ← F_set[s[i : i+N]] + 1
        end for
    end for
    for item, count in F_set do
        if count / len(sessions) ≥ TS then
            Freq_item.add(item)
        end if
    end for
end for
return Freq_item
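A runnable Python sketch of the same idea is given below. It is a minimal illustration rather than the authors' implementation: sessions are assumed to arrive as hypothetical `(category, payload)` pairs for a single IoT device, and each gram is counted at most once per session so that the frequency is the fraction of sessions containing it, which is one reading of the threshold described in the text.

```python
from collections import Counter, defaultdict


def frequent_items(sessions, n=2, threshold=0.6, prefix_len=40):
    """Extract frequent n-byte grams from one device's sessions.

    sessions: iterable of (category, payload) pairs, where `category` is the
    aggregation key of the session (e.g. port category and protocol).
    """
    groups = defaultdict(list)
    for category, payload in sessions:
        groups[category].append(payload[:prefix_len])

    frequent = set()
    for payloads in groups.values():
        counts = Counter()
        for p in payloads:
            # Sliding window of size n over the payload prefix; the set makes
            # each gram count once per session (assumption).
            grams = {p[i:i + n] for i in range(len(p) - n + 1)}
            counts.update(grams)
        for gram, count in counts.items():
            if count / len(payloads) >= threshold:
                frequent.add(gram)
    return frequent
```

With three HTTP sessions starting with `GET /a`, `GET /b`, and `POST x`, and the default `n=2` and threshold 0.6, the grams `GE`, `ET`, `T ` and ` /` are kept because each appears in at least two of the three sessions.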
- 3.
Initial Classification
Next, all sessions need to be transformed into word vectors based on the word set. The length of each vector corresponds to the total number of unique words in the set. For each session, if a word from the set is present, the matching position in the word vector is assigned a value of one; conversely, if the word is absent, the position is assigned a zero. After vectorizing all sessions, a word vector matrix is obtained. Each word vector is labeled according to the IoT device type. If there are non-IoT devices in the dataset, all non-IoT devices are classified into one category. After transforming the sessions into a word vector matrix, we input the training set and labels into the first-layer machine learning model to train the first-layer classifier, which we denote as “Classifier 1”. Then, we utilize Classifier 1 to calculate the confidence matrix for the initial classification of the training set. The pseudocode is shown in Algorithm 2.
Algorithm 2: Initial_Classification(FIs, train_set, train_labels)
Input: frequent item sets of IoT devices FIs, training set train_set, training labels train_labels
Output: initial classifier Classifier_1, confidence matrix CM
WS ← Union(FIs)  # union of FIs
Matrix ← list()
for session in train_set do
    Vector ← [0 for i in len(WS)]
    for i in 0:len(WS) do
        if WS[i] in session then
            Vector[i] ← 1
        end if
    end for
    Matrix.append(Vector)
end for
Classifier_1 ← MLModel(Matrix, train_labels)
CM ← Classifier_1(Matrix)
return Classifier_1, CM
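The vectorization part of this step can be sketched in Python as follows; classifier training is left out because it depends on the chosen machine learning library. The function names are hypothetical, the word list is sorted only to obtain a stable column order, and matching against the first 40 bytes of the payload (the same prefix used for extraction) is an assumption.

```python
def build_word_list(per_device_items):
    """Union of all devices' frequent items, sorted for a stable column order."""
    return sorted(set().union(*per_device_items))


def to_word_vector(payload, word_list, prefix_len=40):
    """Binary vector: 1 if the word occurs in the payload prefix, else 0."""
    head = payload[:prefix_len]
    return [1 if word in head else 0 for word in word_list]


def to_word_matrix(payloads, word_list):
    """One fresh vector per session, stacked row by row."""
    return [to_word_vector(p, word_list) for p in payloads]
```

The resulting matrix has one row per session and one column per word, and together with the device labels it forms the training input of Classifier 1.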
3.1.4. Second Classification
Based on the second and third steps, we obtain two feature matrices: one is the network communication feature matrix (M1) for sessions, and the other is the confidence matrix (M2) for initial classification. In this step, we will utilize these two matrices to construct the prediction model for IoT device classification. We will input M1 along with M2 and the labels into the second-layer classifier for training. Subsequently, based on the results, we will fine-tune and optimize the model parameters to derive the final classification model, referred to as “Classifier 2”. The output of Classifier 2 will serve as the final result.
3.2. Identification of IoT Devices Using Dual-IoTID
On the basis of the described workflow, we propose a dual-layer model for IoT device identification. The operational process is as follows. Given a session, we first extract its network communication features, denoted as f1. Next, based on the word set established in Section 3.1.3, we transform the payload of this session into a word vector, termed v. We then feed v into Classifier 1, which produces a confidence vector, c. Finally, we combine c and f1 as the input for Classifier 2 to obtain the classification result T. The pseudocode of the method is shown in Algorithm 3.
Algorithm 3: Dual-IoTID(S, WS)
Input: session S, word set WS
Output: device type T
v ← [0 for i in len(WS)]
f1 ← network_feature_Extractor(S)
for i in 0:len(WS) do
    if WS[i] in S then
        v[i] ← 1
    end if
end for
c ← Classifier_1(v)
T ← Classifier_2([f1, c])
return T
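The two-layer prediction can be sketched in Python with the classifiers passed in as plain callables, so the example does not depend on a particular machine learning library. All names are hypothetical: `clf1_proba` stands for any function returning per-class confidences for a word vector, and `clf2_predict` for any function mapping the combined feature vector to a device type.

```python
def dual_predict(features, payload, word_list, clf1_proba, clf2_predict):
    """Predict the device type of one session with two stacked classifiers.

    features:     network communication features of the session
    payload:      session payload as bytes
    word_list:    ordered list of frequent items (the word set)
    clf1_proba:   callable, word vector -> list of class confidences
    clf2_predict: callable, combined feature vector -> device type
    """
    # Layer 1: binary word vector -> confidence vector.
    word_vector = [1 if word in payload else 0 for word in word_list]
    confidence = list(clf1_proba(word_vector))
    # Layer 2: communication features concatenated with the confidences.
    return clf2_predict(list(features) + confidence)
```

Feeding the confidence vector, rather than only the predicted label, into the second layer lets Classifier 2 weigh uncertain first-layer decisions against the communication features.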
4. Model Selection
In this section, we introduce the dataset and evaluation metrics utilized in the experiment. Subsequently, we proceed to select the optimal classification algorithm as the IoT device identification model through the experiment.
4.1. Dataset
To accurately evaluate the performance of Dual-IoTID, we selected the publicly available UNSW dataset [19], which contains diverse real network data from various IoT devices. The UNSW dataset was collected by the University of New South Wales to evaluate device type identification in real smart environments. It comprises traffic from 23 different IoT devices, such as Amazon Echo and a Belkin Wemo sensor, recorded over 26 weeks; however, only 20 days of data are publicly available. The dataset also includes traffic from non-IoT devices such as laptops and iPhones, as detailed in Table 2. All of this traffic was generated during normal device operation and does not include any abnormal or attack traffic. For each IoT device category, we conducted experiments using up to 10,000 randomly selected sessions. Despite some devices having limited data, we retained them for analysis.
In the experiments, these sessions were used to train machine learning models for IoT device identification. We divided the dataset into a 70:30 ratio, meaning that 70% of the sessions from each device class were randomly selected for training, while the remaining 30% was used for testing.
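A per-class 70:30 split of this kind can be sketched in Python as follows. This is an illustration, not the authors' script: the function name, the fixed seed, and the rounding of the cut point are assumptions.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=0):
    """Return (train_indices, test_indices), split separately for each class."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)

    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)  # random selection within the class
        cut = round(len(indices) * train_frac)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test
```

Splitting within each class keeps devices with few sessions represented in both the training and the test set.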
4.2. Evaluation Metrics
To evaluate the performance of Dual-IoTID on the test set, we used the following four evaluation metrics: (1) accuracy, (2) precision, (3) recall, and (4) F1 score. These four metrics are determined by TP, TN, FP, and FN, where TP represents a true positive, TN represents a true negative, FP represents a false positive, and FN represents a false negative.
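For reference, the standard definitions of these metrics in terms of TP, TN, FP, and FN can be written as a short Python function. The function name is hypothetical, and returning 0 when a denominator is zero is a convention chosen for this sketch.

```python
def binary_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    """Accuracy, precision, recall, and F1 score from confusion counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall:
        f1 = 2 * precision * recall / (precision + recall)
    else:
        f1 = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

In a multi-class setting such as device identification, these counts are typically taken per device class in a one-versus-rest fashion.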
4.3. Algorithm Selection
In terms of device identification models, based on previous research and related studies [24], we selected five popular machine learning models: Random Forest (RF), eXtreme Gradient Boosting (XGB), Decision Tree (DT), Naive Bayes (NB), and Support Vector Machine (SVM). We evaluated the effectiveness of classification using a 10-fold cross-validation method and utilized a random search to find the optimal parameters for each model, which were then applied to independent test datasets. The model selection process was conducted excluding non-IoT devices.
4.3.1. Initial Classification Model Selection
As described in Section 3, there are significant differences in the payloads of sessions for different IoT devices. Therefore, we split the payloads of sessions and extracted frequent items from the segmented content for the initial classification of IoT devices. We used the confidence matrix output by the models with high initial classification accuracy as the input for the second-layer classification.
Firstly, we proposed to slice the effective content of sessions into a fixed-length gram in N-gram mode and statistically analyze each gram to obtain a set of frequent items exceeding the threshold. A key parameter to determine in this process was the specific value of N.
Table 3 illustrates the accuracy of classification predictions on the test set using different models under different values of N. In the experiment, we set the threshold for frequent items to 0.6, meaning that words appearing in a class of sessions with a frequency exceeding 60% were considered frequent items.
From the experimental results, it can be observed that an increase in the value of N led to a gradual decline in the accuracy of each model. Consequently, when setting the length of frequent items at N = 2, the initial classification model exhibited the highest classification performance.
In terms of classification performance, XGB, RF, and DT were the best-performing algorithms, with initial classification accuracies exceeding 90%. NB consumed the least amount of time compared to the other algorithms but had the lowest accuracy. SVM also achieved good results in the initial classification model, but it also required much more time than the other methods. As a result, in the remaining work, we will try to use the confidence matrix outputted by XGB, RF, and DT as inputs for the second-layer model.
4.3.2. Second Classification Model Selection
After the initial classification model, we obtained a confidence matrix for each session. Next, we combined the confidence matrix of the training set with the extracted session communication features (as described in Section 3.1.2). These merged data, together with the corresponding training set labels, were then fed into the second-layer classification model. For the sake of identification effectiveness, we only attempted DT, XGB, and RF for the second-layer model.
Table 4 presents the IoT device identification results on the test set using different classifiers. Similarly, we employed a random search to find the optimal parameters for each model and utilized a 10-fold cross-validation to assess the effectiveness of the classification.
From Table 4, it can be seen that in the second-layer classification model, regardless of the first-layer model, RF, XGB, and DT perform well, achieving identification accuracies of 99% and above. Among them, the XGB + XGB model exhibits the best performance, achieving an accuracy of 99.48% and a recall of 99%. Additionally, XGB + XGB also provides relatively fast identification efficiency. Based on these considerations, we selected the highest-scoring combination, XGB + XGB, for device classification.
5. Performance Evaluation
In this section, we present our experimental results and provide a detailed analysis of the obtained results. Subsequently, we compare the performance of our proposed Dual-IoTID method with other existing methods.
5.1. Experimental Results
We conducted two experiments to evaluate the performance of Dual-IoTID. The first experiment included only the communication traffic of 23 IoT devices, excluding non-IoT device traffic.
Table 5 presents the accuracy, precision, recall, and F1 score for all devices in our dataset. It can be observed that our approach achieved high accuracy for most devices, with identification precision and recall exceeding 99%. Some IoT devices like Nest Dropcam and Dropcam exhibited relatively lower accuracy due to insufficient data volume, making it challenging to extract an adequate amount of features for identification. However, overall, we achieved excellent performance, with a global accuracy of 99.48% for all devices.
To better demonstrate the effectiveness of our proposed method, Figure 3 illustrates the confusion matrix for each device. It can be observed that only a small number of devices are misclassified. Additionally, our method exhibits a low misclassification rate for devices with different purposes from the same vendor, indicating high identification accuracy even for similar devices from the same manufacturer. In this regard, our method outperforms previous works.
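As a reminder of how per-device precision, recall, and F1 score and the global accuracy relate to a confusion matrix, the following sketch derives them from a small toy matrix. The row/column convention (rows are true devices, columns are predicted devices) and the toy counts are assumptions for illustration; they are not taken from Table 5 or Figure 3.

```python
from typing import Dict, List


def per_class_metrics(cm: List[List[int]]) -> List[Dict[str, float]]:
    """cm[i][j] = number of samples of true class i predicted as class j."""
    n = len(cm)
    results = []
    for k in range(n):
        tp = cm[k][k]
        fn = sum(cm[k]) - tp                       # class k predicted as something else
        fp = sum(cm[i][k] for i in range(n)) - tp  # other classes predicted as k
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        results.append({"precision": precision, "recall": recall, "f1": f1})
    return results


def global_accuracy(cm: List[List[int]]) -> float:
    """Fraction of all samples that lie on the diagonal."""
    total = sum(sum(row) for row in cm)
    correct = sum(cm[i][i] for i in range(len(cm)))
    return correct / total if total else 0.0


# Toy 3-device confusion matrix (assumed values).
cm = [
    [8, 2, 0],
    [1, 9, 0],
    [0, 0, 10],
]
metrics = per_class_metrics(cm)
accuracy = global_accuracy(cm)
```

Off-diagonal entries in the same row or column as a device show which other devices it is confused with, which is how confusion between devices from the same vendor would appear.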
To verify whether our proposed method can accurately identify IoT devices in an Internet environment, we conducted Experiment 2, which included traffic generated by non-IoT devices. In this experiment, 23 labels were assigned to 23 IoT devices, while all non-IoT devices were assigned a different label.
Table 6 presents the overall device identification results of Experiment 2. With the inclusion of traffic generated by non-IoT devices, the overall identification precision, recall, F1 score, and accuracy decreased slightly compared with the first experiment. However, the identification accuracy still reached 98.5%, indicating that our proposed method can accurately identify IoT devices even in an Internet environment and distinguish them from non-IoT devices.
Figure 4 shows the confusion matrix for this experiment.
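The labeling scheme of Experiment 2, one label per IoT device plus a single shared label for every non-IoT device, can be sketched as follows. The label string `non-IoT`, the helper name, and the non-IoT device names are hypothetical; only two of the 23 IoT device names are listed here.

```python
from typing import Iterable, List, Set

NON_IOT_LABEL = "non-IoT"  # hypothetical name for the shared label


def assign_labels(devices: Iterable[str], iot_devices: Set[str]) -> List[str]:
    """Keep each IoT device's own label; collapse all other devices into one."""
    return [d if d in iot_devices else NON_IOT_LABEL for d in devices]


# Two IoT device names from the text; the full set would contain 23 devices.
iot_devices = {"Nest Dropcam", "Dropcam"}

# Source device of each session; "Laptop" and "Android Phone" are assumed examples.
session_devices = ["Nest Dropcam", "Laptop", "Dropcam", "Android Phone"]
labels = assign_labels(session_devices, iot_devices)
```

With all 23 IoT devices listed, the classifier would be trained on 24 classes in total.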
5.2. Comparison with Previous Works
As shown in Table 7, we compared Dual-IoTID with recent advanced IoT classification and identification techniques. The IoT Sentinel model selects 23 features from the first 12 packets of each device as device fingerprints, including source IP address, destination IP address, packet attributes, etc., and then uses machine learning algorithms for detection. Its global identification accuracy for 27 IoT devices was 81.50%.
IoTDevID identifies IoT devices based on packet payloads and systematically studies packet-level features, proposing various methods to filter feature sets for identification, which contributes to the automation of feature analysis. IoTDevID achieved an IoT device identification accuracy of 94.30% on the UNSW dataset.
IoTTFID [25] is an incremental learning model based on traffic fingerprints, capable of dynamically adapting to environments and identifying new types of IoT devices without requiring retraining from scratch. The model achieves this by extracting traffic fingerprints from new devices, preprocessing them into input vectors, and then updating selected parameters of the existing model’s networks, thereby extending its identification scope.
Compared with IoTDevID, IoT Sentinel, and IoTTFID, Dual-IoTID achieved a higher accuracy of 99.48% on the UNSW dataset. It outperforms existing works in classification performance and supports finer-grained identification: our method can correctly distinguish devices of different types from the same manufacturer, whereas IoT Sentinel and IoTDevID are less effective at identifying such similar IoT devices.
IoT Sentinel and IoTDevID utilize network layer and link layer features for IoT device identification; IoTTFID also utilizes information from the link layer, network layer, and transport layer to generate fingerprints. However, the network environment is not static, and changes in the environment can affect these features. For example, in networks using NAT, the effectiveness of device identification can be significantly degraded. Unlike these methods, we identify IoT devices by analyzing the payload of IoT device sessions. Since the payload is unaffected by NAT and changes in the network environment, our method can be applied to more scenarios while still achieving accurate IoT device identification.
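The NAT argument can be made concrete with a small sketch, assuming a basic NAT that rewrites only the source address and port. The `Packet` class, the addresses, and the use of a SHA-256 hash as a stand-in for payload-derived features are illustrative assumptions; Dual-IoTID's actual payload features are not hashes.

```python
import hashlib
from dataclasses import dataclass, replace
from typing import Tuple


@dataclass(frozen=True)
class Packet:
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    payload: bytes


def apply_nat(pkt: Packet, public_ip: str, public_port: int) -> Packet:
    """Model a basic NAT: rewrite source address and port, leave payload as is."""
    return replace(pkt, src_ip=public_ip, src_port=public_port)


def header_fingerprint(pkt: Packet) -> Tuple[str, int, str, int]:
    """A fingerprint built only from header fields."""
    return (pkt.src_ip, pkt.src_port, pkt.dst_ip, pkt.dst_port)


def payload_fingerprint(pkt: Packet) -> str:
    """A stand-in for any feature computed only from the payload bytes."""
    return hashlib.sha256(pkt.payload).hexdigest()


# Documentation-range addresses; all values are assumed.
inside = Packet("192.168.1.20", 50000, "203.0.113.5", 443, b"device-hello")
outside = apply_nat(inside, "198.51.100.7", 61000)

header_changed = header_fingerprint(inside) != header_fingerprint(outside)
payload_unchanged = payload_fingerprint(inside) == payload_fingerprint(outside)
```

The header-based fingerprint differs on the two sides of the NAT, whereas anything computed from the payload alone is identical.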
6. Discussions and Conclusions
Identification of IoT device types is an essential part of recognizing instances of IoT devices, which contributes to establishing robust device authentication. In this study, we introduced a novel dual-machine learning model named Dual-IoTID, designed for IoT device identification. Unlike previous works that focused on packet headers and network layer features, our approach concentrated on the network communication payload, utilizing payload content and communication features for IoT device identification.
Specifically, we commenced by categorizing IoT device sessions according to their destination ports, subsequently extracting frequent items according to aggregation results. Following this, the sessions were converted into a word vector matrix, which served as the input for the first-layer classifier, generating an initial classification confidence matrix. Ultimately, this confidence matrix, coupled with the extracted session communication features, was fed into the second-layer classifier to train the definitive classifier.
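The first steps of this pipeline (grouping sessions by destination port, extracting frequent items per group, and turning each session into a vector) can be sketched as below. This is a simplified illustration under stated assumptions, not the authors' procedure: the `Session` structure, the whitespace-style tokens, the `min_support` threshold, and the count-based vectorization are all hypothetical choices.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Session:
    dst_port: int
    tokens: List[str]  # assumed: payload already split into items


def group_by_port(sessions: List[Session]) -> Dict[int, List[Session]]:
    """Aggregate sessions by their destination port."""
    groups: Dict[int, List[Session]] = defaultdict(list)
    for s in sessions:
        groups[s.dst_port].append(s)
    return dict(groups)


def frequent_items(group: List[Session], min_support: float) -> List[str]:
    """Items that occur in at least min_support of the group's sessions."""
    counts: Counter = Counter()
    for s in group:
        counts.update(set(s.tokens))  # count each item once per session
    threshold = min_support * len(group)
    return sorted(t for t, c in counts.items() if c >= threshold)


def build_vocabulary(sessions: List[Session], min_support: float) -> List[str]:
    """Union of the frequent items found in every port group."""
    vocab = set()
    for group in group_by_port(sessions).values():
        vocab.update(frequent_items(group, min_support))
    return sorted(vocab)


def vectorize(session: Session, vocab: List[str]) -> List[int]:
    """Count how often each vocabulary item appears in the session."""
    counts = Counter(session.tokens)
    return [counts[t] for t in vocab]


# Toy sessions (assumed content).
sessions = [
    Session(443, ["hello", "nest", "cam"]),
    Session(443, ["hello", "nest"]),
    Session(80, ["GET", "status"]),
    Session(80, ["GET", "update"]),
]
vocab = build_vocabulary(sessions, min_support=0.75)
matrix = [vectorize(s, vocab) for s in sessions]
```

The resulting matrix would be the input to the first-layer classifier, whose per-class confidences are then merged with the session communication features for the second layer.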
Experimental results demonstrate the outstanding accuracy of Dual-IoTID. Furthermore, our method accurately identified devices of different types from the same vendor. Even in Internet environments, Dual-IoTID demonstrated excellent identification performance. Because our method focuses solely on session payloads, irrespective of the protocols used, it can accurately identify devices even when they employ encrypted communication, which enhances its versatility. For the same reason, Dual-IoTID is also applicable in network environments that use NAT.
Dual-IoTID offers a practical and efficient solution for IoT device identification. While the Dual-IoTID algorithm represents a significant stride in enhancing the accuracy of IoT device identification, it is not without limitations. For instance, although it is designed to bolster network security, its in-depth analysis of session payloads may inadvertently encroach upon user privacy, posing ethical and legal concerns. Moving forward, our objective is to extend the application of this method to broader scenarios, including new areas such as agriculture, industry, healthcare, and smart cities, and to further refine IoT device risk detection capabilities from this foundation. Concurrently, we intend to optimize the method to facilitate the identification of IoT devices in real-time scenarios. Ultimately, we aim to develop advanced privacy-preserving technologies that safeguard sensitive IoT data while enabling meaningful risk assessments. In an increasingly interconnected world, striking a balance between privacy and security is paramount. We aspire to establish a more comprehensive and dynamic framework for IoT device risk detection, fostering a safer and more resilient IoT ecosystem.