*4.2. Discussion*

There is a trade-off between the word size and the classification performance, i.e., accuracy and detection time. Reducing the word size tends to increase the detection time, and the training time as well, because more words are produced, longer sentences are created, and the input dimension grows. The effect on accuracy varies with the attack type under consideration. Clearly, the word size cannot be scaled down arbitrarily; in our evaluation, we set it to two bytes, as mentioned above. A promising research direction is an algorithm that automatically tunes this value for the deployment environment, or even for specific attacks.
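To make the trade-off concrete, the following sketch splits a raw payload into fixed-size "words"; the helper name, hex token format, and toy payload are illustrative assumptions, not the paper's actual preprocessing code. It shows how halving the word size doubles the sequence length fed to the model.

```python
# Illustrative sketch (not the paper's implementation): tokenize a packet
# payload into fixed-size words represented as hex strings.

def payload_to_words(payload: bytes, word_size: int = 2) -> list:
    """Split raw payload bytes into hex tokens of `word_size` bytes each."""
    return [payload[i:i + word_size].hex()
            for i in range(0, len(payload), word_size)]

payload = bytes(range(8))            # 8-byte toy payload
print(payload_to_words(payload, 2))  # 4 tokens: ['0001', '0203', '0405', '0607']
print(payload_to_words(payload, 1))  # 8 tokens: halving the word size doubles the sequence length
```

With a smaller word size, each packet yields a longer token sequence, which is what drives up the detection and training times noted above.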

In addition, performing classification at the packet level opens the door to accelerating detection, since packets can be scheduled for parallel processing. The obvious drawback is increased training time and resource usage (e.g., memory), owing to the large number of parameters and the volume of data fed into the training model. Balancing acceptable training time against the best achievable classification performance is a non-trivial task and a potential research direction.
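The parallelism argument can be sketched as follows. Because each packet is classified independently, inference can be fanned out across workers; `classify` below is a trivial stand-in for the trained model's per-packet inference call, and the marker-byte rule is an assumption made purely for illustration.

```python
# Illustrative sketch: packet-level classification is embarrassingly parallel,
# so packets can be scheduled across a pool of workers.
from concurrent.futures import ThreadPoolExecutor

def classify(packet: bytes) -> int:
    # Placeholder for model inference: flag a packet as malicious (1)
    # if it contains a hypothetical marker byte 0xFF.
    return int(0xFF in packet)

packets = [bytes([0x00, 0x01]), bytes([0xFF, 0x02]), bytes([0x03])]
with ThreadPoolExecutor(max_workers=4) as pool:
    labels = list(pool.map(classify, packets))
print(labels)  # [0, 1, 0]
```

In a real deployment, the per-packet call would invoke the trained model (batched on a GPU rather than threaded), but the scheduling pattern is the same.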

Finally, the DL-based classification approach is highly susceptible to data poisoning attacks because of its dependence on training data. So far, we have found few attack models targeting evasion of deep learning-based malicious-traffic classification systems, including ours. However, this may soon change as the growing popularity of deep learning attracts more attackers seeking to exploit its vulnerabilities for hacking or monetization. Generating, and defending against, adversarial models for deep learning is thus one of the most interesting and promising security research topics.
