Article
Peer-Review Record

No Pictures, Please: Using eXplainable Artificial Intelligence to Demystify CNNs for Encrypted Network Packet Classification

Appl. Sci. 2024, 14(13), 5466; https://doi.org/10.3390/app14135466
by Ernesto Luis-Bisbé, Víctor Morales-Gómez, Daniel Perdices and Jorge E. López de Vergara *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 3 May 2024 / Revised: 14 June 2024 / Accepted: 19 June 2024 / Published: 24 June 2024
(This article belongs to the Special Issue Recent Applications of Explainable AI (XAI))

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

1. Lines 206 and 223 contain many references. Please state the exact reason why each of them is relevant.

2. Section 3 - According to previous research, the Ethernet and TCP headers may also contain relevant information. Should they be discarded? Explain why this choice was made and whether it biases the CNN toward using only packet length as a metric.

3. It is unclear whether the CNN inputs of different classes (email, video, etc.) use the same input size.

4. Explain the choice of the number of convolution stages. 

5. Figure 3 needs more explanation. How exactly is the length of the packet being used as input? 

6. Section 6.2 discusses 'flow dependence'. Please explain what that is and how it impacts train, test, and validation splits.  

7. Figure 6 - Again, explain how the position of each byte is used as input so that its relevance can be gauged with Grad-CAM.

8. What exactly are we interpreting from Figure 8? 

9. Why was padding selected as a hyperparameter for tuning? Does that imply other hyperparameters were explored but didn't show variance? Or should the authors try more hyperparameters to justify their importance? 

10. Figure 3 uses 'Packet Length' while Figure 9 uses 'Position of bytes'. Assuming both terms refer to the bytes arranged in either 1D or 2D format, why are two different names used?

11. If Figure 10 allows you to conclude that "it is likely that the convolutional model is just extracting the feature that the packet is ending at around 300 bytes", would it be worthwhile to explore how these 300 bytes vary between different classes? I am wondering how the last 300 bytes equate to 'packet length is the only thing being observed by the CNN'.

12. In Figure 10 (right), is it correct to assume that packet length differs within the same class? If so, how is this variability in packet length represented in a 2D CNN? For example, how would the 2D representation of an email with a packet size of 250 differ from that of another email with a packet size of 500?
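Regarding points 10 and 12, one minimal way to lay a packet out as a 2D input is sketched below (a hypothetical 32 x 32 byte image; not necessarily what the manuscript does). Two packets of different lengths yield arrays of the same shape, differing only in where the zero padding begins.

```python
import numpy as np

def packet_to_image(payload: bytes, side: int = 32) -> np.ndarray:
    # Zero-pad (or truncate) to side*side bytes and reshape into a square
    # array: a 250-byte and a 500-byte packet produce the same 32x32 shape,
    # but the all-zero region starts at a different position.
    flat = np.zeros(side * side, dtype=np.uint8)
    data = payload[: side * side]
    flat[: len(data)] = np.frombuffer(data, dtype=np.uint8)
    return flat.reshape(side, side)
```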

To summarize, the paper lacks continuity. It is important to explain how the data are modified as inputs and what role that may play in the downstream analysis. The authors' writing style and the seemingly arbitrary use of hyperparameters for tuning are hard to follow, which makes the results difficult to interpret. Please address all the concerns and resubmit.


Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Comments

In this work, the problem of real-time traffic classification is addressed by proposing a method to classify encrypted network packets using convolutional neural networks (CNNs) and explainable artificial intelligence (XAI) techniques. The authors organize the traffic by flows to avoid data leakage and improve model generalization. The results suggest that CNNs can be effective for this task, particularly when biases in the data processing are removed. Additionally, it is shown that a simple decision tree based on packet length can achieve comparable performance.
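For context, a minimal sketch of the kind of packet-length-based decision-tree baseline referred to in this summary (hypothetical feature extraction; scikit-learn is assumed and this is not the authors' actual pipeline):

```python
# Hypothetical sketch (not the authors' code): a decision tree that
# classifies traffic using only the packet length as a feature,
# given a list of (payload_bytes, label) pairs.
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

def length_only_baseline(payloads, labels, max_depth=5):
    X = [[len(p)] for p in payloads]  # single feature: packet length
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, labels, test_size=0.2, stratify=labels, random_state=0)
    tree = DecisionTreeClassifier(max_depth=max_depth, random_state=0)
    tree.fit(X_tr, y_tr)
    return accuracy_score(y_te, tree.predict(X_te))
```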


In Section 1, lines 49-52: strictly speaking, in AI terms this is not a contribution, as it is typically done in machine learning tasks. The separation of training and validation data is an implementation rule.


In section 3, starting from the paragraph beginning on line 264, in the description of the packet filtering process and the removal of Ethernet, IP, and TCP headers, more details could be added about how this process is technically performed. Is it a data preprocessing step? What tools or methods are used to carry out these tasks? It would be helpful to see a packet diagram (in a figure) and understand what information is used and what is discarded, making it clearer for the reader.
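For illustration only, one way such a preprocessing step could be implemented with Scapy (the tool choice and function names are assumptions for this sketch; the manuscript may use a different toolchain):

```python
# Illustrative sketch, not the authors' pipeline: read a capture and keep
# only the transport-layer payload, discarding Ethernet/IP/TCP(UDP) headers.
from scapy.all import rdpcap, TCP, UDP

def extract_payloads(pcap_path):
    payloads = []
    for pkt in rdpcap(pcap_path):
        for transport in (TCP, UDP):
            if pkt.haslayer(transport):
                payloads.append(bytes(pkt[transport].payload))
                break
    return payloads
```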


In the same section, at line 268, after describing how packets are grouped by flow using the quintuple (source IP, destination IP, source port, destination port, transport protocol), you could expand on why this specific approach was chosen, perhaps highlighting the port-application relationship more clearly, if that is the case. What advantages does it have over other forms of data grouping?
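As a simple illustration of this grouping, a plain dictionary keyed on the five-tuple is enough (the attribute names below are placeholders, not the authors' code):

```python
from collections import defaultdict

def group_by_flow(packets):
    # Packets sharing the same five-tuple belong to the same flow;
    # the field names are hypothetical.
    flows = defaultdict(list)
    for pkt in packets:
        key = (pkt.src_ip, pkt.dst_ip, pkt.src_port, pkt.dst_port, pkt.proto)
        flows[key].append(pkt)
    return flows
```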


In Section 3.1, line 306: could you describe in more detail why zero padding was chosen instead of random values? How does this affect the neural network training process?
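For reference, zero padding to a fixed input length is typically implemented along these lines (the 1500-byte target is an assumption for the sketch, not taken from the manuscript):

```python
import numpy as np

def pad_packet(payload: bytes, target_len: int = 1500) -> np.ndarray:
    # Truncate long packets and right-pad short ones with zeros so that
    # every sample fed to the network has the same length.
    arr = np.frombuffer(payload[:target_len], dtype=np.uint8)
    return np.pad(arr, (0, target_len - len(arr)), constant_values=0)
```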


In general, in section 3.2, you might consider adding examples or illustrations to visually demonstrate how each method of dataset splitting would be implemented. This would help readers better understand the concepts presented.
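As one possible illustration of a flow-aware split (as opposed to a per-packet split), scikit-learn's group-based splitter can keep all packets of a flow on the same side; this only sketches the idea and is not the authors' implementation:

```python
from sklearn.model_selection import GroupShuffleSplit

def flow_aware_split(X, y, flow_ids, test_size=0.2):
    # Every packet carrying the same flow id lands entirely in either the
    # training or the test set, avoiding leakage between the two.
    splitter = GroupShuffleSplit(n_splits=1, test_size=test_size, random_state=0)
    train_idx, test_idx = next(splitter.split(X, y, groups=flow_ids))
    return train_idx, test_idx
```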


In section 4, in the first part, where feature extraction is described, you could add a brief explanation about why ReLU or sigmoid activation is used in the convolutional layer.


In the second part, where feature analysis is discussed, you could briefly explain why dense layers are chosen instead of additional convolutional layers.


In Section 4 (Model Definition), in the paragraph starting at line 395, it is not clear why you use an RNN. Could you explain it in detail and clarify Figure 1, especially the part related to the MLP/RNN block?


Overall, in Section 6.1: it is not clear how you managed the validation data. Why didn't you employ a validation technique such as k-fold cross-validation?

- In addition to displaying the confusion matrix, it would be beneficial to conduct a deeper analysis of the errors made by the model. Why do certain confusions occur between specific classes? Are there common patterns in the errors that could indicate areas for improvement in the model?


In Section 6.3, line 569, although it is mentioned that some hyperparameters, such as the number of convolutional stages and padding, can be optimized, it would be helpful to provide more details on how these hyperparameters were selected and how they affect the model's performance. This would help readers better understand how to adjust hyperparameters to optimize the model's performance in different situations.


In Section 6.4, the section mentions that the proposed decision tree model achieves performance comparable to the original CNN models but could benefit from a more detailed comparison. It would be helpful to include a table or graph showing a side-by-side comparison of the performance metrics between the original CNN model and the decision tree model.

Although it is suggested that the simplified decision tree model is a viable alternative to the original CNN model, it would also be important to discuss the possible limitations of this approach. Are there specific scenarios where the decision tree model might not be as effective as the CNN model?

- Could you change the font type of the tree diagram in Figure 11? It's hard to read at 100% zoom.


Finally, in the conclusion section, the limitations of the study could be addressed and areas for future research suggested. What aspects of the encrypted network packet classification problem were not addressed in this study and could be the subject of future research?


Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

In this manuscript, the authors study various CNN models pertaining to packet classification and use explainable AI to analyze them. I would recommend this paper for publication. Following are my comments.

Abstract: The abstract indicates the motivation behind the work. However, it does not indicate any of the crucial findings of the study. Readers should be able to understand the study's contributions by reading the abstract.

Introduction:

The introduction provides the background properly. One recommendation would be to keep the introduction down to three paragraphs. The authors already have a related-works section to discuss earlier works. Therefore, rather than having many short paragraphs in the introduction, which are hard to read and repeat information, the authors could use fewer, larger paragraphs focusing only on the motivation of their work.

Related works and comparative analysis:

The authors could also provide information on how the review was conducted. What "keywords" were used, and were there any inclusion or exclusion criteria?

Initial data processing:

Why were a CNN and an RNN chosen for this task? Is the model a combination of recurrent and convolutional layers?

Why were the TCP and IP headers excluded? Wouldn't this affect the model's ability to use critical information?

How do application labels influence QoS policies?

The authors talk about overfitting. Were there any other steps taken to address this issue?

From Packets to Images:

This section contains many short paragraphs, which breaks the flow of the information. The authors should combine them into two or three larger, cohesive paragraphs.

Did the authors face any significant challenges in terms of time complexity when using a 3D CNN as opposed to a 1D CNN?

Did the authors assess whether there is any risk of losing critical information from smaller packets?

Organizing traffic flows:

Can the authors explain the folder structure further? Maybe include a diagram.

Model Definition:

Give some rationale for why the specific parameters and hyperparameters were chosen.

For instance, why was max pooling chosen over average pooling?

Also, was the choice of D=2 based on empirical results?

Further, explain any steps taken to tackle overfitting.

Gradient analysis:

Can the authors provide more information on counterfactual explanations? How effective are they in debugging and evaluating model performance?

Explain the impact of noise on Grad-CAM.

Results:

Can the authors provide some information on the generalizability of the model?

How well does the model perform on datasets with different traffic patterns? Would it be consistent?

In the future, what additional features could be extracted or engineered to improve the differentiation between traffic patterns?

Is there a reason why the authors did not explore hybrid models, such as combining 1D and 2D CNNs?

In addition to precision and accuracy, the results should also elaborate on the computational and time complexity of the model.

Also, the authors need to evaluate the model's performance with real-time continuous learning using real-world traffic data updates.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf
