ICLSTM: Encrypted Traffic Service Identification Based on Inception-LSTM Neural Network
Abstract
1. Introduction
- A new encrypted traffic identification method, ICLSTM, is proposed. It extracts traffic features automatically with neural networks, so no complex feature engineering is required.
- A model architecture combining two neural networks is proposed for feature extraction from encrypted traffic. A one-dimensional convolutional neural network with an embedded Inception module extracts local features of the traffic, and an LSTM model extracts the temporal features of the packets within a session. The extracted features are then fused, which extends the feature information and strengthens the representation of packet features. Experiments show that our method achieves better results in encrypted traffic service identification.
- A processing scheme for unbalanced datasets is proposed: assigning weights to the different categories enhances the symmetry of the data and effectively alleviates the data imbalance problem.
2. Related Work
3. Methodology
3.1. Dataset
3.2. Preprocessing
3.2.1. Traffic Representation Options
- Raw flow: the set P of all packets, where each packet carries five-tuple information (source IP, source port, destination IP, destination port, transport protocol).
- Flow: the set P is divided into subsets according to the five-tuple information, and the packets in each subset are arranged in chronological order within a certain time window, forming a flow f.
- Session: unlike a flow, the source and destination IP/port of the five-tuple are interchangeable, so a session is also called a bidirectional flow. Most current research is based on sessions, so we also chose sessions.
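The difference between a flow and a session can be sketched in a few lines of Python. This is an illustrative sketch only: the `Packet` record and its field names are hypothetical, and the time-window cut-off mentioned above is left out for brevity.

```python
from collections import defaultdict
from typing import NamedTuple


class Packet(NamedTuple):
    """Hypothetical minimal packet record; field names are illustrative."""
    timestamp: float
    src_ip: str
    src_port: int
    dst_ip: str
    dst_port: int
    protocol: str


def flow_key(p: Packet) -> tuple:
    # Unidirectional flow: the five-tuple exactly as observed.
    return (p.src_ip, p.src_port, p.dst_ip, p.dst_port, p.protocol)


def session_key(p: Packet) -> tuple:
    # Bidirectional flow: sort the two endpoints so that A->B and B->A
    # produce the same key.
    a, b = sorted([(p.src_ip, p.src_port), (p.dst_ip, p.dst_port)])
    return (a, b, p.protocol)


def group_packets(packets, key_fn):
    """Group packets by key and order each group chronologically."""
    groups = defaultdict(list)
    for p in packets:
        groups[key_fn(p)].append(p)
    return {k: sorted(v, key=lambda p: p.timestamp) for k, v in groups.items()}


packets = [
    Packet(0.0, "10.0.0.1", 50000, "10.0.0.2", 443, "TCP"),
    Packet(0.1, "10.0.0.2", 443, "10.0.0.1", 50000, "TCP"),
    Packet(0.2, "10.0.0.1", 50000, "10.0.0.2", 443, "TCP"),
]
flows = group_packets(packets, flow_key)
sessions = group_packets(packets, session_key)
print(len(flows), len(sessions))
```

With these three packets the request and response directions form two separate flows but a single session, which is why a session keeps both sides of a conversation together.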
3.2.2. Data Preprocessing
- (1) Pcap-session segmentation: continuous raw traffic is divided into multiple discrete traffic units according to a certain granularity [28].
- (2) Traffic cleaning: packets without an application layer produce bin files with no actual content, and sessions or flows with identical content produce duplicate files. The segmented traffic data therefore has to be cleaned so that only the needed traffic data is retained.
- (3) Unifying the input size and generating gray images: training a deep neural network requires a fixed input size, so we unify the session segments from the previous steps to 784 bytes. On the one hand, relevant papers have shown that 784 bytes are effective for classification; on the other hand, 784 bytes are more lightweight than the 1500 bytes used in some of the literature. If a segment is larger than 784 bytes, it is trimmed to 784 bytes. If a segment is smaller than 784 bytes, 0x00 is appended at the end to pad it to 784 bytes. The result is converted to a gray image of size 28 × 28.
- (4) Conversion to IDX: IDX is a common file format in the field of deep learning. We converted the generated traffic gray images to the IDX file format, which is commonly used by neural network models.
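Steps (3) and (4) can be sketched with the Python standard library alone. This is a minimal sketch, not the authors' tooling: the function names are hypothetical, and the IDX header layout follows the publicly documented MNIST image format (two zero bytes, type code 0x08 for unsigned byte, number of dimensions, then big-endian 32-bit dimension sizes).

```python
import struct

SEGMENT_BYTES = 784  # 28 * 28, the fixed input size named in the text
SIDE = 28


def to_fixed_length(segment: bytes, size: int = SEGMENT_BYTES) -> bytes:
    """Trim to `size` bytes, or right-pad with 0x00 up to `size` bytes."""
    return segment[:size].ljust(size, b"\x00")


def to_gray_image(segment: bytes) -> list:
    """Reshape 784 bytes into 28 rows of 28 pixel values (0-255)."""
    fixed = to_fixed_length(segment)
    return [list(fixed[r * SIDE:(r + 1) * SIDE]) for r in range(SIDE)]


def encode_idx_images(segments) -> bytes:
    """Serialize segments as an IDX unsigned-byte array of shape (n, 28, 28)."""
    # Header: 0x00 0x00, type code 0x08 (unsigned byte), 3 dimensions,
    # followed by the three dimension sizes as big-endian 32-bit integers.
    header = struct.pack(">BBBBIII", 0, 0, 0x08, 3, len(segments), SIDE, SIDE)
    return header + b"".join(to_fixed_length(s) for s in segments)


short_segment = bytes([0x16, 0x03, 0x01])  # shorter than 784 bytes: padded
long_segment = bytes(range(256)) * 4       # 1024 bytes: trimmed
image = to_gray_image(short_segment)
blob = encode_idx_images([short_segment, long_segment])
```

Each byte maps directly to one gray-scale pixel, so no scaling is needed before the reshape; labels would go into a separate one-dimensional IDX file in the same manner.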
3.3. Class Imbalance
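The category-weighting idea named in the contributions can be illustrated as follows. This is a sketch under assumptions: it uses the inverse-frequency ("balanced") heuristic, weight = N / (K × n_c), which is one common way to assign class weights; whether the authors used exactly this formula is not stated here. The sample counts are copied from the dataset table of this paper, with the 6983-sample row taken to be the Email class.

```python
import math

# Sample counts per class, copied from the dataset table in this paper.
# Assumption: the 6983-sample row corresponds to the "Email" class.
SAMPLE_COUNTS = {
    "Chat": 9285, "Email": 6983, "File Transfer": 51235, "P2P": 940,
    "Streaming": 1680, "VoIP": 71093,
    "VPN-Chat": 3621, "VPN-Email": 268, "VPN-File Transfer": 915,
    "VPN-P2P": 429, "VPN-Streaming": 590, "VPN-VoIP": 6330,
}


def balanced_class_weights(counts: dict) -> dict:
    """Inverse-frequency weights: weight_c = N / (K * n_c)."""
    total = sum(counts.values())
    num_classes = len(counts)
    return {label: total / (num_classes * n) for label, n in counts.items()}


def weighted_loss(prob_of_true_class: float, label: str, weights: dict) -> float:
    """Cross-entropy for one sample, scaled by the weight of its true class."""
    return -weights[label] * math.log(prob_of_true_class)


weights = balanced_class_weights(SAMPLE_COUNTS)
for label, w in sorted(weights.items(), key=lambda kv: kv[1]):
    print(f"{label:18s} {w:8.3f}")
```

Rare classes such as VPN-Email receive large weights and frequent classes such as VoIP receive small ones, so a misclassified minority sample contributes more to the training loss.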
3.4. Background on Neural Networks
3.4.1. Inception Module
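The core idea of an Inception module, parallel branches with different receptive fields whose outputs are concatenated, can be shown on a one-dimensional signal. The sketch below is a toy illustration, not the ICLSTM layer configuration: the kernel sizes 1, 3, and 5 and the pooling branch follow the original Inception design [10], the kernel values are fixed rather than learned, and the 1 × 1 dimension-reduction convolutions are omitted.

```python
def conv1d_same(signal, kernel):
    """1-D cross-correlation with zero padding; output length equals input length.

    Assumes an odd kernel length so the padding is symmetric.
    """
    pad = len(kernel) // 2
    padded = [0.0] * pad + list(signal) + [0.0] * pad
    return [
        sum(k * padded[i + j] for j, k in enumerate(kernel))
        for i in range(len(signal))
    ]


def max_pool_same(signal, width=3):
    """Stride-1 max pooling that keeps the input length."""
    pad = width // 2
    padded = [float("-inf")] * pad + list(signal) + [float("-inf")] * pad
    return [max(padded[i:i + width]) for i in range(len(signal))]


def inception_block_1d(signal, kernels):
    """Apply every branch to the same input and stack the results as channels."""
    branches = [conv1d_same(signal, k) for k in kernels]
    branches.append(max_pool_same(signal))
    return branches  # shape: (len(kernels) + 1, len(signal))


signal = [0.0, 0.0, 1.0, 0.0, 0.0]
kernels = [[1.0], [1.0] * 3, [1.0] * 5]  # receptive fields of 1, 3 and 5
channels = inception_block_1d(signal, kernels)
for row in channels:
    print(row)
```

Each branch sees the same byte sequence at a different scale, and stacking the branch outputs as channels lets the following layer combine fine-grained and wider local patterns.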
3.4.2. Long Short-Term Memory Module
- (I) Forget gate. The first step of the LSTM is to decide which information to discard from the cell state. This is done by the sigmoid unit of the forget gate, whose output ranges from 0 (completely discarded) to 1 (completely retained). It is calculated as in Equation (3).
- (II) Input gate. The second step of the LSTM is to use the input gate to decide which new information to add to the cell state. The output is calculated as in Equations (4)–(6).
- (III) Output gate. After the cell state is updated, the output gate determines which features of the cell state are output, based on the previous hidden state h_{t-1} and the current input x_t. The calculation is as in Equations (7) and (8).
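For reference, the three gates can be written in the standard LSTM formulation below. This is the textbook form, given under the assumption that Equations (3)–(8) follow it in this order; the symbols W, b, σ and ⊙ (weight matrices, biases, the sigmoid function and element-wise multiplication) are the conventional ones and are not taken from the source.

```latex
\begin{aligned}
f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) && \text{forget gate} \\
i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) && \text{input gate} \\
\tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) && \text{candidate cell state} \\
C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t && \text{cell state update} \\
o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) && \text{output gate} \\
h_t &= o_t \odot \tanh\left(C_t\right) && \text{hidden state}
\end{aligned}
```

A forget-gate value near 0 erases the corresponding component of the previous cell state, while a value near 1 carries it forward unchanged, which matches the 0/1 interpretation given in item (I).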
3.5. Model Architectures
4. Experiments
4.1. Experimental Configuration
4.2. Evaluation Indicators
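The indicators reported in the result tables (accuracy, precision, recall and F1-score, plus macro and weighted averages) are conventionally computed from the TP, TN, FP and FN counts listed in the abbreviations. The sketch below uses these conventional definitions; the function names and the example counts are illustrative and are not taken from the experiments.

```python
def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Share of all samples that were classified correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


def precision(tp: int, fp: int) -> float:
    """Share of predicted positives that are truly positive."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Share of actual positives that were found."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def f1_score(p: float, r: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def macro_average(values):
    """Unweighted mean over classes: every class counts equally."""
    return sum(values) / len(values)


def weighted_average(values, supports):
    """Mean over classes weighted by the number of true samples per class."""
    return sum(v * s for v, s in zip(values, supports)) / sum(supports)


# Illustrative counts for one class in a one-vs-rest setting.
tp, tn, fp, fn = 90, 870, 10, 30
p, r = precision(tp, fp), recall(tp, fn)
print(accuracy(tp, tn, fp, fn), p, r, f1_score(p, r))
```

On an unbalanced dataset the macro and weighted averages can differ noticeably, because the weighted average is dominated by the largest classes while the macro average treats a rare class like any other.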
4.3. Evaluation
4.3.1. Impact of Setting Category Weights
4.3.2. Analysis of Results
4.3.3. Model Comparison
5. Conclusions and Future Work
Author Contributions
Funding
Institutional Review Board Statement
Informed Consent Statement
Data Availability Statement
Acknowledgments
Conflicts of Interest
Abbreviations
MLP | Multi-Layer Perceptron |
CNN | Convolutional Neural Network |
SAE | Stacked Autoencoder |
LSTM | Long Short-Term Memory Network |
NN | Neural Networks |
TP | True Positives |
TN | True Negatives |
FP | False Positives |
FN | False Negatives |
References
- Cisco Encrypted Traffic Analytics 2019. Available online: https://www.cisco.com/c/en/us/solutions/collateral/enterprise-networks/enterprise-network-security/nb-09-encrytd-traf-anlytcs-wp-cte-en.html (accessed on 10 February 2021).
- Soleymanpour, S.; Sadr, H.; Beheshti, H. An Efficient Deep Learning Method for Encrypted Traffic Classification on the Web. In Proceedings of the 2020 6th International Conference on Web Research (ICWR), Tehran, Iran, 22–23 April 2020; pp. 209–216.
- Wang, W.; Zhu, M.; Zeng, X.; Ye, X.; Sheng, Y. Malware traffic classification using convolutional neural network for representation learning. In Proceedings of the 2017 International Conference on Information Networking (ICOIN), Da Nang, Vietnam, 11–13 January 2017; pp. 712–717.
- Javaid, A.Y.; Niyaz, Q.; Sun, W.; Alam, M. A Deep Learning Approach for Network Intrusion Detection System. EAI Endorsed Trans. Security Safety 2016, 3, e2.
- Vu, L.; Thuy, H.V.; Nguyen, Q.U.; Ngoc, T.N.; Nguyen, D.N.; Hoang, D.T.; Dutkiewicz, E. Time Series Analysis for Encrypted Traffic Classification: A Deep Learning Approach. In Proceedings of the 2018 18th International Symposium on Communications and Information Technologies (ISCIT), Bangkok, Thailand, 26–29 September 2018; pp. 121–126.
- Anderson, B.; McGrew, D.A. Identifying Encrypted Malware Traffic with Contextual Flow Data. In Proceedings of the 2016 ACM Workshop on Artificial Intelligence and Security, AISec@CCS 2016, Vienna, Austria, 28 October 2016; Freeman, D.M., Mitrokotsa, A., Sinha, A., Eds.; ACM: New York, NY, USA, 2016; pp. 35–46.
- Al-Obaidy, F.; Momtahen, S.; Hossain, M.F.; Mohammadi, F.A. Encrypted Traffic Classification Based ML for Identifying Different Social Media Applications. In Proceedings of the 2019 IEEE Canadian Conference of Electrical and Computer Engineering, CCECE 2019, Edmonton, AB, Canada, 5–8 May 2019; pp. 1–5.
- Alshammari, R.; Zincir-Heywood, A.N. Can encrypted traffic be identified without port numbers, IP addresses and payload inspection? Comput. Netw. 2011, 55, 1326–1350.
- Wang, P.; Li, S.; Ye, F.; Wang, Z.; Zhang, M. PacketCGAN: Exploratory Study of Class Imbalance for Encrypted Traffic Classification Using CGAN. In Proceedings of the 2020 IEEE International Conference on Communications, ICC 2020, Dublin, Ireland, 7–11 June 2020; pp. 1–7.
- Szegedy, C.; Liu, W.; Jia, Y.; Sermanet, P.; Reed, S.E.; Anguelov, D.; Erhan, D.; Vanhoucke, V.; Rabinovich, A. Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, CVPR 2015, Boston, MA, USA, 7–12 June 2015; pp. 1–9.
- Pimenta Rodrigues, G.A.; De Oliveira Albuquerque, R.; Gomes de Deus, F.E.; De Sousa, R.T., Jr.; De Oliveira Júnior, G.A.; García Villalba, L.J.; Kim, T.-H. Cybersecurity and Network Forensics: Analysis of Malicious Traffic towards a Honeynet with Deep Packet Inspection. Appl. Sci. 2017, 7, 1082.
- Ning, J.; Poh, G.S.; Loh, J.; Chia, J.; Chang, E.-C. PrivDPI: Privacy-Preserving Encrypted Traffic Inspection with Reusable Obfuscated Rules. In Proceedings of the 2019 ACM SIGSAC Conference on Computer and Communications Security (CCS’19), London, UK, 11–15 November 2019; Association for Computing Machinery: New York, NY, USA, 2019; pp. 1657–1670.
- Velan, P.; Cermák, M.; Celeda, P.; Drasar, M. A survey of methods for encrypted traffic classification and analysis. Int. J. Netw. Manag. 2015, 25, 355–374.
- Yao, Z.; Ge, J.; Wu, Y.; Lin, X.; He, R.; Ma, Y. Encrypted traffic classification based on Gaussian mixture models and Hidden Markov Models. J. Netw. Comput. Appl. 2020, 166, 102711.
- Madhukar, A.; Williamson, C.L. A Longitudinal Study of P2P Traffic Classification. In Proceedings of the 14th International Symposium on Modeling, Analysis, and Simulation of Computer and Telecommunication Systems (MASCOTS 2006), Monterey, CA, USA, 11–14 September 2006; pp. 179–188.
- Lucia, M.J.D.; Cotton, C. Detection of Encrypted Malicious Network Traffic using Machine Learning. In Proceedings of the 2019 IEEE Military Communications Conference, MILCOM 2019, Norfolk, VA, USA, 12–14 November 2019; pp. 1–6.
- Draper-Gil, G.; Lashkari, A.H.; Mamun, M.S.I.; Ghorbani, A. Characterization of Encrypted and VPN Traffic using Time-related Features. In Proceedings of the 2nd International Conference on Information Systems Security and Privacy, ICISSP 2016, Rome, Italy, 19–21 February 2016; Camp, O., Furnell, S., Mori, P., Eds.; SciTePress: Setúbal, Portugal, 2016; pp. 407–414.
- Bhatia, M.; Sharma, V.; Singh, P.; Masud, M. Multi-Level P2P Traffic Classification Using Heuristic and Statistical-Based Techniques: A Hybrid Approach. Symmetry 2020, 12, 2117.
- Ma, C.; Du, X.; Cao, L. Improved KNN Algorithm for Fine-Grained Classification of Encrypted Network Flow. Electronics 2020, 9, 324.
- De Toledo, T.R.; Torrisi, N.M. Encrypted DNP3 Traffic Classification Using Supervised Machine Learning Algorithms. Mach. Learn. Knowl. Extr. 2019, 1, 384–399.
- Anderson, B.; Paul, S.; McGrew, D.A. Deciphering malware’s use of TLS (without decryption). J. Comput. Virol. Hacking Tech. 2018, 14, 195–211.
- Wang, W.; Zhu, M.; Wang, J.; Zeng, X.; Yang, Z. End-to-end encrypted traffic classification with one-dimensional convolution neural networks. In Proceedings of the 2017 IEEE International Conference on Intelligence and Security Informatics (ISI), Beijing, China, 22–24 July 2017; pp. 43–48.
- Lotfollahi, M.; Siavoshani, M.J.; Zade, R.S.H.; Saberian, M. Deep packet: A novel approach for encrypted traffic classification using deep learning. Soft Comput. 2020, 24, 1999–2012.
- Zou, Z.; Ge, J.; Zheng, H.; Wu, Y.; Han, C.; Yao, Z. Encrypted Traffic Classification with a Convolutional Long Short-Term Memory Neural Network. In Proceedings of the 20th IEEE International Conference on High Performance Computing and Communications; 16th IEEE International Conference on Smart City; 4th IEEE International Conference on Data Science and Systems, HPCC/SmartCity/DSS 2018, Exeter, UK, 28–30 June 2018; pp. 329–334.
- Van Roosmalen, J.; Vranken, H.P.E.; van Eekelen, M.C.J.D. Applying deep learning on packet flows for botnet detection. In Proceedings of the 33rd Annual ACM Symposium on Applied Computing, SAC 2018, Pau, France, 9–13 April 2018; Haddad, H.M., Wainwright, R.L., Chbeir, R., Eds.; ACM: New York, NY, USA, 2018; pp. 1629–1636.
- Dong, C.; Zhang, C.; Lu, Z.; Liu, B.; Jiang, B. CETAnalytics: Comprehensive effective traffic information analytics for encrypted traffic classification. Comput. Netw. 2020, 176, 107258.
- Xu, L.; Dou, D.; Chao, H.J. ETCNet: Encrypted Traffic Classification Using Siamese Convolutional Networks. In Proceedings of the Workshop on Network Application Integration/CoDesign (NAI’20), Virtual Event, New York, NY, USA, 14 August 2020; ACM: New York, NY, USA, 2020; p. 3.
- SplitCap. Available online: https://www.netresec.com/index.ashx?page=SplitCap (accessed on 20 September 2020).
- Scikitlearn. Available online: https://www.cntofu.com/book/170/docs/5.md (accessed on 5 October 2020).
- Branson, S.; Horn, G.V.; Belongie, S.J.; Perona, P. Bird Species Categorization Using Pose Normalized Deep Convolutional Nets. arXiv 2014, arXiv:1406.2952.
- Wang, Z.; Wang, X.; Wang, G. Learning fine-grained features via a CNN Tree for Large-scale Classification. Neurocomputing 2018, 275, 1231–1240.
- Gu, J.; Wang, Z.; Kuen, J.; Ma, L.; Shahroudy, A.; Shuai, B.; Liu, T.; Wang, X.; Wang, G. Recent Advances in Convolutional Neural Networks. arXiv 2015, arXiv:1512.07108.
- Wang, P.; Ye, F.; Chen, X.; Qian, Y. Datanet: Deep Learning Based Encrypted Network Traffic Classification in SDN Home Gateway. IEEE Access 2018, 6, 55380–55391.
- Siami-Namini, S.; Tavakoli, N.; Namin, A.S. The Performance of LSTM and BiLSTM in Forecasting Time Series. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019; pp. 3285–3292.
- Huang, Q.; Chen, R.; Zheng, X.; Dong, Z. Deep Sentiment Representation Based on CNN and LSTM. In Proceedings of the 2017 International Conference on Green Informatics (ICGI), Fuzhou, China, 15–17 August 2017; pp. 30–33.
- Luan, Y.; Lin, S. Research on Text Classification Based on CNN and LSTM. In Proceedings of the 2019 IEEE International Conference on Artificial Intelligence and Computer Applications (ICAICA), Dalian, China, 29–31 March 2019; pp. 352–355.
- Li, C.; Zhan, G.; Li, Z. News Text Classification Based on Improved Bi-LSTM-CNN. In Proceedings of the 2018 9th International Conference on Information Technology in Medicine and Education (ITME), Hangzhou, China, 19–21 October 2018; pp. 890–893.
- Ioffe, S.; Szegedy, C. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift. arXiv 2015, arXiv:1502.03167.
- Song, M.; Ran, J.; Li, S. Encrypted Traffic Classification Based on Text Convolution Neural Networks. In Proceedings of the 2019 IEEE 7th International Conference on Computer Science and Network Technology (ICCSNT), Dalian, China, 19–20 October 2019; pp. 432–436.

Encryption Type | Traffic Service Type | Applications | Sample Size |
---|---|---|---|
Regular encrypted traffic | Chat | ICQ/AIM/Skype/Facebook/Hangouts | 9285 |
Regular encrypted traffic | Email | Email/Gmail | 6983 |
Regular encrypted traffic | File Transfer | File Transfer | 51,235 |
Regular encrypted traffic | P2P | Torrent | 940 |
Regular encrypted traffic | Streaming | Netflix/Spotify/Netflix/Vimeo/Youtube | 1680 |
Regular encrypted traffic | VoIP | Hangouts | 71,093 |
VPN encrypted traffic | VPN-Chat | AIM/Facebook/Hangouts/ICQ/Skype | 3621 |
VPN encrypted traffic | VPN-Email | | 268 |
VPN encrypted traffic | VPN-File Transfer | Ftps/Sftp/Skype | 915 |
VPN encrypted traffic | VPN-P2P | Bittorrent | 429 |
VPN encrypted traffic | VPN-Streaming | Youtube/Vimeo/Netflix/Spotify/Facebook | 590 |
VPN encrypted traffic | VPN-VoIP | Facebook/Hangouts/Skype/Voipbuster | 6330 |

Experiment | Description | Classifier (number of classes) |
---|---|---|
1 | Protocol encapsulation identification | 2 |
2 | Mixed encrypted traffic service identification | 6 |
3 | Regular encrypted traffic service identification | 6 |
4 | VPN encrypted traffic service identification | 6 |
5 | Encrypted traffic service identification | 12 |

Experiment | Accuracy | Precision | Recall | F1-Score |
---|---|---|---|---|
1 | 100 | 99.9 | 100 | 100 |
2 | 98.2 | 98 | 98.4 | 98.2 |
3 | 98.7 | 98.8 | 99 | 98.8 |
4 | 99 | 97.6 | 97.6 | 97.5 |
5 | 98.1 | 98 | 98 | 98.1 |

Work | Model | Non-VPN Precision | Non-VPN Recall | VPN Precision | VPN Recall |
---|---|---|---|---|---|
Draper-Gil [17] | C4.5 | 90.6 | 88.8 | 89 | 92 |
Wang [22] | 1DCNN | 100 | 99.9 | 99.9 | 100 |
This paper | ICLSTM | 100 | 100 | 99.9 | 100 |

Work | Model | Experiment 2 Precision | Experiment 2 Recall | Experiment 3 Precision | Experiment 3 Recall | Experiment 4 Precision | Experiment 4 Recall |
---|---|---|---|---|---|---|---|
Draper-Gil [17] | C4.5 | N/A | N/A | 89 | 85.5 | 84 | 87.6 |
Wang [22] | 1DCNN | N/A | N/A | 85.5 | 85.8 | 94.9 | 97.3 |
This paper | ICLSTM | 98.6 | 98.9 | 99.1 | 99.4 | 98.3 | 97.6 |

Work | Model | Non-VPN Precision | Non-VPN Recall | Non-VPN F1-Score | VPN Precision | VPN Recall | VPN F1-Score |
---|---|---|---|---|---|---|---|
Draper-Gil [17] | C4.5 | 84.3 | 79.3 | 81.7 | 78.2 | 81.3 | 79.7 |
Wang [22] | 1DCNN | 85.8 | 85.9 | 85.8 | 92 | 95.2 | 93.5 |
Song [39] | Text-CNN | 87.6 | 87.3 | 87.5 | 95.2 | 97.4 | 96.1 |
Lotfollahi [23] | CNN | 88.8 | 88.8 | 88.5 | 99.1 | 99.1 | 99.1 |
Lotfollahi [23] | SAE | 86.6 | 88.8 | 87.3 | 97.8 | 96.3 | 97 |
This paper | ICLSTM | 96.6 | 97.6 | 97 | 98.5 | 97.8 | 98 |

Class | ICLSTM Precision | ICLSTM Recall | ICLSTM F1 | SAE [23] Precision | SAE [23] Recall | SAE [23] F1 |
---|---|---|---|---|---|---|
Chat | 99.2 | 98.6 | 98.9 | 82 | 68 | 74 |
Email | 98.2 | 99.3 | 98.7 | 97 | 93 | 95 |
FileTransfer | 93.4 | 95.3 | 94.3 | 98 | 99 | 99 |
P2P | 96.3 | 98.6 | 97.4 | 97 | 99 | 98 |
Streaming | 95.6 | 99 | 97.3 | 82 | 84 | 83 |
VoIP | 96.6 | 95 | 95.8 | 64 | 90 | 75 |
VPN-Chat | 99.3 | 99.5 | 99.4 | 95 | 94 | 94 |
VPN-Email | 98.5 | 91.2 | 94.7 | 97 | 93 | 95 |
VPN-FileTransfer | 97.4 | 99.2 | 98.3 | 98 | 95 | 97 |
VPN-P2P | 97.5 | 96.7 | 97 | 99 | 97 | 98 |
VPN-Streaming | 98.5 | 99.8 | 99 | 99 | 99 | 99 |
VPN-VoIP | 99.9 | 100 | 99.9 | 99 | 100 | 99 |
macro avg | 97.5 | 97.7 | 97.6 | N/A | N/A | N/A |
weighted avg | 96 | 95.9 | 96 | 92 | 92 | 92 |

Publisher’s Note: MDPI stays neutral with regard to jurisdictional claims in published maps and institutional affiliations.
© 2021 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
Lu, B.; Luktarhan, N.; Ding, C.; Zhang, W. ICLSTM: Encrypted Traffic Service Identification Based on Inception-LSTM Neural Network. Symmetry 2021, 13, 1080. https://doi.org/10.3390/sym13061080