*4.1. Time Efficiency*

Time efficiency is another important metric in our experiment. It involves two cases: training time and detection time. Training time is defined as the total time required to complete training on the selected dataset. This time depends on the data preprocessing, the training model (number of layers and dimensions), and server capacity; it would therefore not be fair to draw comparisons in this respect, given the differences in the hardware and software used. Because our method operates at a deeper level (the packet level), the amount of data (after transformation from packet information) used for training is expected to be higher than in flow-based approaches, and the training time is correspondingly longer. Our offline training could take up to 17 h on USTC-TFC2016 at 200 epochs to reach the detection performance reported above; the larger the dataset, the longer the training takes.

However, our method has an advantage in detection time. Detection time in this study is defined as the total (elapsed) time to perform classification on a given testing sample, i.e., the testing dataset. Given a 108 MB testing file and our server capacity (mentioned above), our system takes fewer than 2 s to complete classification, regardless of the traffic type. This result supports our approach for online monitoring, since we believe this speed can satisfy most on-demand applications and medium-sized networks. For core networks, more effort may be needed to integrate this system for potential deployment, since the network speed at such nodes can reach hundreds of gigabits per second. Note, however, that systems at core networks can be equipped with much more powerful servers.
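The detection-speed claim above can be put into perspective with a back-of-the-envelope throughput estimate. The sketch below uses only the figures quoted in the text (a 108 MB testing file classified in under 2 s) and an assumed 100 Gbit/s core-network link speed, which is a hypothetical value chosen for illustration, not a measurement from our setup.

```python
# Back-of-the-envelope throughput estimate from the figures reported above.
# Assumption: 108 MB testing file, <= 2 s elapsed detection time (from the text);
# the 100 Gbit/s core link speed is a hypothetical illustrative value.

test_file_mb = 108.0      # size of the testing file in megabytes
detection_s = 2.0         # upper bound on elapsed detection time in seconds

throughput_mb_s = test_file_mb / detection_s      # sustained MB per second
throughput_mbit_s = throughput_mb_s * 8           # convert MB/s to Mbit/s

core_link_gbit_s = 100.0  # hypothetical core-network link speed
shortfall = core_link_gbit_s * 1000 / throughput_mbit_s

print(f"Detection throughput: ~{throughput_mbit_s:.0f} Mbit/s")
print(f"A {core_link_gbit_s:.0f} Gbit/s core link is ~{shortfall:.0f}x faster")
```

Under these assumptions the system sustains roughly 432 Mbit/s, comfortable for medium networks but two orders of magnitude below core-network line rates, which is consistent with the deployment caveat above.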
