Open Access Article

Peer-Review Record

Analysis of Lightweight Feature Vectors for Attack Detection in Network Traffic

Appl. Sci. 2018, 8(11), 2196; https://doi.org/10.3390/app8112196

by Fares Meghdouri, Tanja Zseby and Félix Iglesias*

Reviewer 1: Anonymous

Reviewer 2: Anonymous

Reviewer 3: Anonymous

Submission received: 25 October 2018 / Revised: 5 November 2018 / Accepted: 6 November 2018 / Published: 9 November 2018

Round 1

Reviewer 1 Report

The article can be accepted in its present form.

Author Response

Dear reviewer,

Thank you for the effort you put into reading our contribution and for your final decision.

Best regards,

Félix Iglesias

Reviewer 2 Report

The main text needs minor corrections:

- line 72, both for applications related to network security and network traffic analysis. ---> for applications related to both network security and network traffic analysis.

- line 76, Section 7 ---> Section 6

- lines 84-85, original sources ---> original sources [...]

- line 152, family (...) are ---> family (...) is

- line 200, underwent and additional phase ---> underwent an additional phase

- line 265, learners, they ---> learners, but they

- line 321, data was ---> data were

- lines 326-327, attacking ... are very active ---> attacking ... is very active

- lines 363-364, Using ... have become ---> Using ... has become

- line 364, Bayond ---> Beyond

- line 397, extracting ... require ---> extracting ... requires

Author Response

Dear reviewer,

Thank you for the effort you put into reading our contribution and for your final decision. We have fixed all the mentioned typos in the new version of the paper.

Best regards,

Félix Iglesias

Reviewer 3 Report

This paper is interesting and clearly presented. However, it lacks the elements needed to judge the validity of the classification and data analysis performed with the various algorithms. The scikit classification algorithms are very academic and are not applicable, in the strict sense of the word, without the necessary adaptations. It is therefore important to show the accuracy and validity of these classifications compared to the original data.

For data from a network, scikit algorithms are simplistic and need to be adapted to the various complex sets of variables characterizing network packets and flows. This article presents no 3D scatter plots, so the reader has no picture of how the set of points in R^n is configured. We cannot understand how the algorithms have skeletonized this set.

It is probable that the set of points resulting from the packet flow of a network forms a rather continuous set in R^n, with large connected zones, which resists the quite naive scikit classifications (by Naïve Bayes, Random Forest, Decision Tree, Support Vector Machine or Logistic Regression classifiers). It would be interesting to represent these continuous subsets and show how random forest algorithms have integrated the geometry of these subsets. We would like to see the shadows of these sets in R^n as 2D or 3D projections. This would have enriched the data analysis beyond merely executing the rather frustrating algorithms of scikit, which rarely give relevant results. In addition, ultimately few variables are used in this analysis, and we would like to know more about the relative influence of each variable on the classification.
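For tree ensembles, the relative influence of each variable raised in this comment is commonly estimated from impurity-based feature importances. The following is a minimal sketch using scikit-learn's `RandomForestClassifier` on synthetic data. The data, the five-feature layout and all variable names are assumptions made for illustration; nothing here is taken from the paper, its datasets or its released code.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic stand-in for a feature matrix: 500 samples, 5 numerical features.
# (Assumption for illustration only; not the paper's data.)
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 5))

# The label depends only on feature 0, so that feature should dominate.
y = (X[:, 0] > 0).astype(int)

forest = RandomForestClassifier(n_estimators=50, random_state=0)
forest.fit(X, y)

# Impurity-based importances: one non-negative value per feature, summing to 1.
importances = forest.feature_importances_

# Feature indices ordered from most to least influential.
ranking = np.argsort(importances)[::-1]

for idx in ranking:
    print(f"feature {idx}: importance {importances[idx]:.3f}")
```

Impurity-based importances can be biased toward high-cardinality features, so a real analysis would likely cross-check them with another measure, such as permutation importance on held-out data.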

minor corrections

76 Section 5 shows and discusses results, Section 7 faces results from the perspective of data encryption, => Section 5 shows and discusses results, Section 6 faces results from the perspective of data encryption,

200 underwent and additional phase to convert unidirectional vectors into bidirectional vectors => underwent an additional phase to convert unidirectional vectors into bidirectional vectors

364 Bayond web browsing => Beyond web browsing

Author Response

Dear reviewer,

Thank you for the effort you put into reading our contribution and for your final decision. We have fixed all the observed typos. We have also added some scatter plots of randomly selected numerical features for the UNSW format case (the new Figure 1). This gives some insight into the kind of input space that the algorithms are facing (at least with regard to numerical variables).

Please note that we are dealing with 5 multidimensional spaces with a considerable number of features (namely 45, 30, 19, 13 and 22 before any one-hot encoding or "dummy" variable transformation). It would take many pages to show scatter plots for all the studied vectors. Moreover, data visualization is itself an open issue in the field, so the usefulness of such plots would be debatable: class overlap is high as a general trend, and trying to understand spaces with more than 3 dimensions based on histograms and 2D and 3D scatter plots is in any case a very challenging task. Needless to say, we agree that visualizing data is a mandatory step in any data mining or machine learning application, and we do it in all our research projects and publications.
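To make the dimensionality point concrete, the sketch below shows how one-hot encoding widens a feature vector and how such a vector can then be projected to 2D for plotting. It is an illustration only: the helper names `one_hot` and `project_2d`, the toy `protocol` feature and the random numerical values are all hypothetical, and the projection uses plain PCA via NumPy's SVD without the feature scaling a real analysis would probably need.

```python
import numpy as np

def one_hot(values):
    """Encode a list of categorical values as a 0/1 matrix, one column per category."""
    categories = sorted(set(values))
    index = {c: i for i, c in enumerate(categories)}
    encoded = np.zeros((len(values), len(categories)))
    for row, v in enumerate(values):
        encoded[row, index[v]] = 1.0
    return encoded, categories

def project_2d(X):
    """Project the rows of X onto their first two principal components (PCA via SVD)."""
    Xc = X - X.mean(axis=0)  # center each column
    # Rows of Vt are principal directions, ordered by decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T

# Hypothetical toy data: 3 numerical features plus 1 categorical feature.
rng = np.random.default_rng(0)
numeric = rng.normal(size=(6, 3))
protocol = ["tcp", "udp", "tcp", "icmp", "udp", "tcp"]

proto_encoded, proto_categories = one_hot(protocol)

# 4 original features become 6 columns after one-hot encoding.
X = np.hstack([numeric, proto_encoded])

# 2D coordinates that could be passed to a scatter plot.
X2 = project_2d(X)
print(X.shape, X2.shape)
```

A 2D projection of this kind keeps only the two directions of largest variance, so overlapping classes in the plot do not necessarily overlap in the full space.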

We agree with the observation about scikit algorithms: they are not perfect and a lot of improvement is still required, but they are quite a decent tool for fundamental research and prototyping, provided they are properly adjusted/parameterized. There is a huge community behind scikit, and it has become quite popular in the research community in recent years (note the increase of scikit documented by KDnuggets in [1]). In any case, our experiments were conducted in MATLAB too, but we opted to release the Python version for the sake of reproducibility (Python and scikit-learn are free and open-source).

[1] https://www.kdnuggets.com/2016/06/r-python-top-analytics-data-mining-data-science-software.html

Best regards,

Félix Iglesias
