IoT-Portrait: Automatically Identifying IoT Devices via Transformer with Incremental Learning
Round 1
Reviewer 1 Report
This paper presents IoT-Portrait: Automatically Identifying IoT Devices via Transformer with Incremental Learning. It addresses an interesting problem and reads well. However, some major points for your consideration are:
· The language needs to be polished.
· The authors are encouraged to add a subsection on feature engineering, including the motivation for selecting these features and an explanation of each feature and its importance.
· It would be helpful if the authors included a table of acronyms.
· Regarding the contribution, you present only the improvements without analyzing the drawbacks of the proposed approach. For example, did you observe, or do you expect, computational overhead or reduced responsiveness caused by executing the proposed approach?
· Present some background on the incremental-learning-empowered disease diagnosis mechanism in IoT. The description must clarify how this method will be utilized.
· It is not clear what the motivation is for adopting the proposed method as a training model. Are proposed methods necessarily better than other models?
Author Response
Point 1: The language needs to be polished.
Response 1: Thanks for your comment. We have checked the paper thoroughly using English editing software.
Point 2: The authors are encouraged to add a subsection on feature engineering, including the motivation for selecting these features and an explanation of each feature and its importance.
Response 2: Thanks for your suggestion. After the data preprocessing phase and data sampling phase, we convert the sample into a matrix as the model’s input. Therefore, the features are generated by converting the data in the packet into numbers one by one. We have added the explanation of these features and their importance in Section 4.3.3 as follows:
4.3.3. Training Data Generation
After the data preprocessing and data sampling phases, we obtain many samples. Each sample contains a fixed number of consecutive packets, and each packet has a fixed length. Let packet_num denote the number of consecutive packets and packet_len the length of each packet. We generate the training data by converting each sample into a packet_num*(packet_len+1) matrix: in each row, one element holds the number of packets observed within 100 ms, which represents the packet rate, and the other packet_len elements hold the data of the packet. The types of features involved are as follows:
- Some obviously important features obtained from the data packet. As analyzed in Section 3, there are distinguishable patterns between the traffic of IoT devices and non-IoT devices, and between different IoT devices (the IP and MAC addresses of the device's communication endpoints, the communication ports, and the protocols used). In addition, the data packet length is an important feature, since the packets of different IoT devices may have different lengths.
- Other features obtained from the data packet. The data packet information (the version of IP, etc.) designed and used by each device may differ, and the patterns in this information are not easy to identify by hand. Therefore, we convert the information in the data packet within the packet_len length into numbers and store them in the matrix.
- Calculated statistical characteristics of multiple packets. As analyzed in Section 3, IoT devices have regular communication patterns, so the traffic rate is also an important characteristic. We count the number of packets within 100 ms to represent the traffic rate of the device.
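The matrix construction described in Section 4.3.3 can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the placement of the rate value in the first column, the trailing 100 ms counting window, and the zero-padding of short packets are all assumptions made for illustration.

```python
from bisect import bisect_right


def packet_rates(timestamps, window=0.1):
    """For each packet, count the packets that arrived in the `window`
    seconds up to and including its own arrival time.

    `timestamps` must be sorted in ascending order (seconds).
    The trailing-window definition is an assumption.
    """
    return [
        bisect_right(timestamps, t) - bisect_right(timestamps, t - window)
        for t in timestamps
    ]


def sample_to_matrix(packets, timestamps, packet_len):
    """Turn one sample into a packet_num x (packet_len + 1) matrix.

    Column 0 holds the packet rate (assumed position); the remaining
    packet_len columns hold the packet bytes as integers.
    """
    rows = []
    for payload, rate in zip(packets, packet_rates(timestamps)):
        data = list(payload[:packet_len])        # truncate long packets
        data += [0] * (packet_len - len(data))   # zero-pad short packets (assumption)
        rows.append([rate] + data)
    return rows


# Hypothetical sample: three packets with arrival times in seconds.
matrix = sample_to_matrix(
    [b"\x45\x00\x01", b"\x45", b"\x45\x00\x00\x28\xff"],
    [0.00, 0.05, 0.30],
    packet_len=4,
)
```

Each row then corresponds to one packet, so a sample of packet_num packets yields packet_num rows of length packet_len + 1.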
Point 3: It would be helpful if the authors included a table of acronyms.
Response 3: Thanks for your suggestion. We have added a table of acronyms before the References section.
Abbreviations
The following abbreviations are used in this manuscript:
IoT Internet of Things
OS Operating System
HTTP Hypertext Transfer Protocol
SSH Secure Shell Protocol
FTP File Transfer Protocol
HTML HyperText Markup Language
LSTM Long Short-Term Memory
TCP Transmission Control Protocol
SDN Software-Defined Networking
IP Internet Protocol
MAC Media Access Control
PC Personal Computer
HTTPS Hypertext Transfer Protocol Secure
UDP User Datagram Protocol
DNS Domain Name System
NTP Network Time Protocol
MDNS Multicast DNS
LAN Local Area Network
ICMP Internet Control Message Protocol
RNN Recurrent Neural Network
UNSW University of New South Wales
FT Fine Tune
FTDL Fine Tune with Distillation Loss
MBC Multiple Binary Classifiers
Point 4: Regarding the contribution, you present only the improvements without analyzing the drawbacks of the proposed approach. For example, did you observe, or do you expect, computational overhead or reduced responsiveness caused by executing the proposed approach?
Response 4: Thanks for your comment. The drawback of our approach is that if a system error occurs or the network status fluctuates, our identification model may misidentify devices, because the traffic at that time does not conform to the knowledge learned by the model. We have added a new section to discuss the drawbacks of our work and future work as follows:
6. Limitations and Future Works
Our identification model relies on a stable network. If a system error occurs or the network status fluctuates, the traffic behavior of the device will change, which degrades the performance of device identification. Since network traffic may vary under natural conditions, for example because of short-term network jitter, strategies that adapt to this natural diversity need to be investigated to ensure the effectiveness of the identification model. For example, we can average the traffic data over a certain time interval. In the future, we can make more attempts at feature engineering to solve this problem.
There are many studies on class incremental learning. In this paper, we only compare the effects of three main methods applied to our model. In addition, there are some methods to learn new knowledge by expanding the model structure or adjusting model parameters. There is not much research on class incremental learning for IoT device identification, and we can continue to experiment in this area in the future.
Point 5: Present some background on the incremental-learning-empowered disease diagnosis mechanism in IoT. The description must clarify how this method will be utilized.
Response 5: Thanks for your suggestion. Section 4.5 discusses how the class incremental learning method is used. The FTDL method adds a distillation loss to the original loss function when training the model, and the distillation loss retains the knowledge of the old model. We have added more details as follows:
The FTDL method adds a distillation loss to the original cross-entropy loss function. When the new model is trained, the distillation loss measures the discrepancy between the output logits of the old model and those of the new model, which keeps the predictions of the new model as close as possible to those of the old model. The Fine Tuning (FT) method preserves the knowledge learned from the old data by fixing the parameters of the old model and learns new knowledge by retraining only the dense layer. In contrast, the FTDL method uses the distillation loss when retraining the classifier, so it can reproduce the predictions of the old model as closely as possible while learning from the new data. The Multiple Binary Classifiers (MBC) method trains one binary classifier Ci for each class i, whereas the FTDL method trains a single multi-class classifier and retrains it when necessary. Although the MBC method does not need to retrain existing classifiers and only trains a new binary classifier for each new device, the memory cost and prediction time of the FTDL method are much lower than those of the MBC method.
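The combination of cross-entropy and distillation loss can be sketched as follows. The response does not give the exact formula, so this sketch assumes a common formulation (cross-entropy between temperature-softened old and new predictions over the old classes); the names `ftdl_loss`, `lam`, and `temperature` are hypothetical and are not taken from the manuscript.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax with an optional temperature."""
    scaled = [z / temperature for z in logits]
    peak = max(scaled)
    exps = [math.exp(z - peak) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(logits, label):
    """Standard cross-entropy of the new model's logits against the true label."""
    return -math.log(softmax(logits)[label])


def distillation_loss(new_logits, old_logits, temperature=2.0):
    """Discrepancy between old and new model outputs on the old classes.

    Assumed form: cross-entropy between the softened distributions.
    It is smallest when the new model's logits for the old classes
    match the old model's logits.
    """
    n_old = len(old_logits)
    p_old = softmax(old_logits, temperature)
    p_new = softmax(new_logits[:n_old], temperature)
    return -sum(po * math.log(pn) for po, pn in zip(p_old, p_new))


def ftdl_loss(new_logits, old_logits, label, lam=1.0, temperature=2.0):
    """Cross-entropy on the current data plus a weighted distillation term."""
    return cross_entropy(new_logits, label) + lam * distillation_loss(
        new_logits, old_logits, temperature
    )


# Hypothetical example: the old model knew 3 devices, the new model knows 4.
old = [2.0, 0.5, -1.0]
new = [1.8, 0.6, -0.9, 0.3]
loss = ftdl_loss(new, old, label=3)
```

Setting `lam` to zero reduces this to plain fine-tuning with cross-entropy only, which makes the role of the distillation term easy to isolate in experiments.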
Point 6: It is not clear what the motivation is for adopting the proposed method as a training model. Are proposed methods necessarily better than other models?
Response 6: Thanks for your suggestion. We explain why we chose the transformer network in Section 4.3.4, and we have rewritten that explanation in more detail as follows:
First, device traffic exhibits regular patterns over time, and both the transformer network and the Recurrent Neural Network (RNN) can capture such regularities in time-series features. Second, unlike the RNN, which must process the sequence step by step in chronological order, the transformer processes all positions in parallel, which brings higher computational efficiency and significantly speeds up training. In addition, the RNN is prone to exploding and vanishing gradients during training, which hampers model training, while the transformer largely avoids these problems.
Author Response File: Author Response.pdf
Reviewer 2 Report
Overall, the article is good. The technical parameters of the network used are not new, they have only been used for a new type of network devices. For this reason, I think the problem is not new.
I have a few comments that should be corrected or added:
-In the caption of figure 8 it should be briefly repeated what f1 and acc are.
-There are only a few examples in the overview. I think that the literature on this subject is more extensive.
-The labeling algorithm could be better described. This seems to be the main contribution of the authors.
-The summary needs to be expanded so that after reading this passage, the reader will immediately know your results and contributions.
-It is worth adding ideas for other further work, e.g. practical application.
Author Response
Point 1: In the caption of figure 8 it should be briefly repeated what f1 and acc are.
Response 1: Thanks for your suggestion. We have added the formulas to describe what f1 and acc are before Figure 8.
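The formulas themselves are not reproduced in this response. For reference, the standard definitions are given below, on the assumption that these are the ones added before Figure 8; TP, TN, FP, and FN denote true positives, true negatives, false positives, and false negatives, and for multi-class identification the F1 score is typically computed per class and then averaged (also an assumption here).

```latex
\mathrm{acc} = \frac{TP + TN}{TP + TN + FP + FN}, \qquad
\mathrm{precision} = \frac{TP}{TP + FP}, \qquad
\mathrm{recall} = \frac{TP}{TP + FN}, \qquad
\mathrm{f1} = \frac{2 \cdot \mathrm{precision} \cdot \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}}
```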
Point 2: There are only a few examples in the overview. I think that the literature on this subject is more extensive.
Response 2: Thanks for your comment. We have added further literature on this subject to the References section and added descriptions of two other works to the Related Works section as follows:
Lopez-Martin et al. designed a network traffic classifier for IoT devices using CNN and RNN, and investigated the impact of the features chosen. HomeMole is a traffic-analysis system that can automatically infer the IoT devices behind a smart home network. It uses a bidirectional LSTM model that is able to identify IoT devices based on the sniffed packets.
References
- Duan, C.; Gao, H.; Song, G.; Yang, J.; Wang, Z. ByteIoT: A practical IoT device identification system based on packet length distribution. IEEE Transactions on Network and Service Management, 2021, 19(2), 1717-1728.
- Aksoy, A.; Gunes, M. H. Automated IoT device identification using network traffic. In ICC 2019-2019 IEEE International Conference on Communications. IEEE, 2019, 1-7.
- Salman, O.; Elhajj, I. H.; Chehab, A.; Kayssi, A. A machine learning based framework for IoT device identification and abnormal traffic detection. Transactions on Emerging Telecommunications Technologies, 2022, 33(3), e3743.
- Bai, L.; Yao, L.; Kanhere, S. S.; Wang, X.; Yang, Z. Automatic device classification from network traffic streams of Internet of Things. In 2018 IEEE 43rd Conference on Local Computer Networks. IEEE, 2018, 1-9.
- Kotak, J.; Elovici, Y. IoT device identification using deep learning. In 13th International Conference on Computational Intelligence in Security for Information Systems. Springer International Publishing, 2021, 12, 76-86.
- Lopez-Martin, M.; Carro, B.; Sanchez-Esguevillas, A.; Lloret, J. Network traffic classifier with convolutional and recurrent neural networks for Internet of Things. IEEE Access, 2017, 5, 18042-18050.
Point 3: The labeling algorithm could be better described. This seems to be the main contribution of the authors.
Response 3: Thanks for your suggestion. We obtain detailed device information through active information collection, using the IP address extracted from the packet, and use this information as the sample’s label. We have restructured this part and partially rewritten it as follows:
For the sample labeling, we use information such as product type and device category, collected through the device information collection module, as labels. Specifically, we extract the IP address from the data packet, and actively collect information based on this IP address. Afterward, the packet undergoes data preprocessing and data sampling, and the information obtained before is used as the label of this sample. If the collected information is sufficient, the samples will be automatically labeled. Otherwise, they need to be manually labeled. After labeling, the samples will be used as the training input of the transformer model.
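The labeling flow described above can be sketched as follows. This is an illustrative assumption rather than the authors' algorithm: the function name, the field names `product_type` and `device_category`, and the rule that information is "sufficient" only when both fields are present are all hypothetical.

```python
def label_sample(src_ip, collected_info,
                 required=("product_type", "device_category")):
    """Return (label, needs_manual_labeling) for one sample.

    `collected_info` maps an IP address to the device information gathered
    by the (hypothetical) device information collection module.
    """
    info = collected_info.get(src_ip, {})
    if all(info.get(field) for field in required):
        # Enough information was collected: label the sample automatically.
        return tuple(info[field] for field in required), False
    # Information is missing or incomplete: defer to manual labeling.
    return None, True


# Hypothetical collected information keyed by the IP extracted from packets.
collected = {
    "192.168.1.10": {"product_type": "camera", "device_category": "IoT"},
    "192.168.1.11": {"product_type": "plug"},
}
auto_label, manual = label_sample("192.168.1.10", collected)
```

Samples flagged for manual labeling would be queued for an operator, while automatically labeled samples can go straight to the transformer model as training input.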
Point 4: The summary needs to be expanded so that after reading this passage, the reader will immediately know your results and contributions.
Response 4: Thanks for your suggestion. We have rewritten the conclusion to make our results and contributions clearer as follows:
In this paper, we design IoT-Portrait, an automatic IoT device identification framework based on the transformer. Compared with previous works, IoT-Portrait leverages an information acquisition scheme that combines active and passive approaches to reduce the manual effort of labeling. In addition, IoT-Portrait uses a transformer neural network to mine potential features in the IoT device's traffic. To address catastrophic forgetting, IoT-Portrait applies the class incremental learning method to retrain the model when new devices join the network. To the best of our knowledge, IoT-Portrait is the first work that applies the class incremental learning method to IoT device identification. The experimental results show that IoT-Portrait achieves high accuracy for IoT device identification and is robust to catastrophic forgetting.
Our results show that IoT devices can be identified well from traffic information, and they also verify the feasibility of class incremental learning in IoT device identification. In the future, we will do more research to overcome the interference of network fluctuations with IoT device identification and will try more class incremental learning methods.
Point 5: It is worth adding ideas for other further work, e.g. practical application.
Response 5: Thanks for your suggestion. We have added a new section to discuss the drawbacks of our work and future work as follows:
6. Limitations and Future Works
Our identification model relies on a stable network. If a system error occurs or the network status fluctuates, the traffic behavior of the device will change, which degrades the performance of device identification. Since network traffic may vary under natural conditions, for example because of short-term network jitter, strategies that adapt to this natural diversity need to be investigated to ensure the effectiveness of the identification model. For example, we can average the traffic data over a certain time interval. In the future, we can make more attempts at feature engineering to solve this problem.
There are many studies on class incremental learning. In this paper, we only compare the effects of three main methods applied to our model. In addition, there are some methods to learn new knowledge by expanding the model structure or adjusting model parameters. There is not much research on class incremental learning for IoT device identification, and we can continue to experiment in this area in the future.
Author Response File: Author Response.pdf
Round 2
Reviewer 1 Report
The authors have considered all comments from the last round. It is therefore possible to publish this paper in its current form.
Author Response
Thank you very much for all your advice!