Article
Peer-Review Record

Anomaly Based Unknown Intrusion Detection in Endpoint Environments

Electronics 2020, 9(6), 1022; https://doi.org/10.3390/electronics9061022
by Sujeong Kim, Chanwoong Hwang and Taejin Lee *
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 27 April 2020 / Revised: 14 June 2020 / Accepted: 16 June 2020 / Published: 20 June 2020
(This article belongs to the Special Issue New Challenges on Cyber Threat Intelligence)

Round 1

Reviewer 1 Report

This article analyzes endpoint event logs using anomaly scores and proposes LOF and AutoEncoder models to calculate those scores, in response to the intensifying security issues caused by malware that targets endpoints. In addition, it proposes a model that detects threats when suspicious events matching rules generated from an attack profile occur continuously.

Overall, I think this article is of average quality in both writing and results. Additional sections and improvements are needed before it can be accepted for publication:


1. The writing in the paper has to be improved:
In the first two sections, i.e., introduction and related work, the authors should describe what previous research has not addressed and identify their own academic contributions.
One more section should be added after Section 4, Experimental Results and Analysis, to discuss the results and draw out their implications. In addition, the authors should discuss the improvements achieved by this research and how it compares with other work, rather than only presenting the results and some basic analysis. The authors should highlight their results and compare them with other researchers' work, preferably in a table.
Figures 5, 8, and 9 appear before they are described in the article. The authors should describe each figure first and then place it.
Figure 7 should not be in the conclusion section; it can be moved to the discussion section.

2. The abbreviations must be defined upon first use. Many abbreviations are not defined in this article. The authors have to check them overall. The following is just some of them:
Line. 26: IoT, AI, L. 31 DLL, L.51 IDS, L. 77 DNN, RNN, LSTM, L. 82 APT, L. 113 IP, L. 183 ATP, L. 187 C&C, L. 227 PE, L. 331 EDR...

3. Typos or bad grammar :
The authors repeatedly use "IP" instead of "IP address" in the whole article. It's suggested to use "IP address", e.g., "local IP address" instead of "local IP".
L. 158: "system behavior model and network behavior model ware independently generated" -> "system behavior model and network behavior model were independently generated"
L. 165 "Table 3 show the" -> "Table 3 shows the"
L. 173 "Each model 20 epochs." -> "Each model has 20 epochs."
L. 183 "ATP" -> "APT"?
L. 222 "on an ip" -> "on an IP address"?
L. 293 "3 of them were detected in network behavior" -> "Three of them were detected in network behavior"
"Three" instead of "3" should be used in the beginning of a sentence.
L. 302 "that can arise with suspicious processes form the anomaly event analysis proposed earlier" ->
"that can arise with suspicious processes from the anomaly event analysis proposed earlier"

4. The sentences have to be improved due to some repeated words, e.g.,:
L. 33 "Fileless attacks bypass antivirus because they cannot be analyzed because they do not create the file." -> Duplicated "because they"
L. 327 "This paper suggests the necessity of security measures against security threats against rapidly growing endpoints in hyper-connected society. This paper suggests that security measures are needed for increasing endpoints in hyper-connected societies." -> Duplicated "This paper suggests".

Author Response

1. The writing in the paper has to be improved:
In the first two sections, i.e., introduction and related work, the authors should describe what previous research has not addressed and identify their own academic contributions.

Thank you for the kind comments. We revised the first two sections.

  • The introduction in Section 1 now describes the challenges of previous research and the major contributions of this paper.
    1. We designed a model that uses LOF and an Autoencoder to detect anomalies efficiently. It also treats the detected anomalies as suspicious behavior and shows the threats detected by rules generated through attack profile analysis.
    2. Existing studies use supervised learning models to classify labeled datasets, which requires the administrator to analyze big data and generate labels directly. In contrast, we propose an unsupervised learning-based model that can detect anomalies even in a big-data environment.
    3. Existing security solutions are limited in real-time detection in big-data environments, which can lead to major security incidents caused by zero-day vulnerabilities. The model proposed in this paper detects suspicious behavior in real time according to the operational policy discussed in Section 5 and presents the threat.
    4. Labeled datasets are used to create and evaluate models, and the labels are analyzed and generated by the security administrator. The whitelist operation, however, lightens the burden on security managers by reducing the need for analysis. It is also efficient because the learning period required by the operation policy can be set.

One more section should be added after Section 4, Experimental Results and Analysis, to discuss the results and draw out their implications. In addition, the authors should discuss the improvements achieved by this research and how it compares with other work, rather than only presenting the results and some basic analysis. The authors should highlight their results and compare them with other researchers' work, preferably in a table.

  • Section 5 was added to discuss the experimental results of Section 4 and to draw out their implications. We proposed operational policies and compared execution times in order to improve and compare the models. Other studies use statistical methods to detect anomalies, for example outlier detection, time series analysis, and probability models. Our proposed LOF method is related to outlier detection. In addition, we propose an Autoencoder based on unsupervised learning. Table 9 compares the performance of LOF and Autoencoder. Table 10 shows an example in which the whitelist operation policy is applied: the number of suspicious processes to be reviewed decreases as the policy operates.
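
For readers unfamiliar with LOF-based outlier detection, the following is a minimal sketch of the textbook Local Outlier Factor definition. It is not the authors' implementation: the function name `lof_scores`, the parameter `k`, and the sample points are assumptions made for illustration, and the sketch assumes distinct points and ignores ties at the k-distance.

```python
import math

def lof_scores(points, k=2):
    """Return a Local Outlier Factor per point (brute force, O(n^2))."""
    n = len(points)
    # Pairwise Euclidean distances between all points.
    dist = [[math.dist(p, q) for q in points] for p in points]

    # For each point: its k nearest neighbours (excluding itself)
    # and the distance to the k-th of them (the k-distance).
    neighbors, k_dist = [], []
    for i in range(n):
        order = sorted((j for j in range(n) if j != i), key=lambda j: dist[i][j])
        neighbors.append(order[:k])
        k_dist.append(dist[i][order[k - 1]])

    # Local reachability density: inverse of the mean reachability distance,
    # where reach(i, j) = max(k-distance of j, distance from i to j).
    density = []
    for i in range(n):
        reach = [max(k_dist[j], dist[i][j]) for j in neighbors[i]]
        density.append(len(reach) / sum(reach))

    # LOF: mean neighbour density divided by the point's own density.
    return [
        sum(density[j] for j in neighbors[i]) / (len(neighbors[i]) * density[i])
        for i in range(n)
    ]

# Hypothetical 2-D feature vectors: four clustered events and one far away.
events = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10)]
scores = lof_scores(events, k=2)
most_anomalous = max(range(len(events)), key=lambda i: scores[i])
print(events[most_anomalous], scores[most_anomalous])
```

A score near 1 means a point is about as dense as its neighbours, while a score well above 1 marks it as a candidate anomaly. Any threshold on the score would be an operational choice and is not specified in this record.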

Figures 5, 8, and 9 appear before they are described in the article. The authors should describe each figure first and then place it.
Figure 7 should not be in the conclusion section; it can be moved to the discussion section.

  • We fixed the figure layout so that each figure is described before it appears. We also removed the existing Figure 7 from the conclusion section and added it as a figure in the discussion section.

2. The abbreviations must be defined upon first use. Many abbreviations are not defined in this article. The authors have to check them overall. The following is just some of them:
Line. 26: IoT, AI, L. 31 DLL, L.51 IDS, L. 77 DNN, RNN, LSTM, L. 82 APT, L. 113 IP, L. 183 ATP, L. 187 C&C, L. 227 PE, L. 331 EDR...

  • Each abbreviation is now defined at its first use. The changes are marked in blue. Please confirm. Thank you for your kindness.

3. Typos or bad grammar :
The authors repeatedly use "IP" instead of "IP address" in the whole article. It's suggested to use "IP address", e.g., "local IP address" instead of "local IP".
L. 158: "system behavior model and network behavior model ware independently generated" -> "system behavior model and network behavior model were independently generated"
L. 165 "Table 3 show the" -> "Table 3 shows the"
L. 173 "Each model 20 epochs." -> "Each model has 20 epochs."
L. 183 "ATP" -> "APT"?
L. 222 "on an ip" -> "on an IP address"?
L. 293 "3 of them were detected in network behavior" -> "Three of them were detected in network behavior"
"Three" instead of "3" should be used in the beginning of a sentence.
L. 302 "that can arise with suspicious processes form the anomaly event analysis proposed earlier" ->
"that can arise with suspicious processes from the anomaly event analysis proposed earlier"

  • We corrected the typos and grammatical errors. The changes are marked in blue. Please confirm. Thank you for your kindness.

4. The sentences have to be improved due to some repeated words, e.g.,:
L. 33 "Fileless attacks bypass antivirus because they cannot be analyzed because they do not create the file." -> Duplicated "because they"
L. 327 "This paper suggests the necessity of security measures against security threats against rapidly growing endpoints in hyper-connected society. This paper suggests that security measures are needed for increasing endpoints in hyper-connected societies." -> Duplicated "This paper suggests".

  • We improved the sentences that contained repeated words. The changes are marked in blue. Please confirm. Thank you for your kindness.

Author Response File: Author Response.pdf

Reviewer 2 Report

To address unknown attacks in internet security, a problem known as anomaly detection, the authors proposed and analyzed Local Outlier Factor (LOF) and AutoEncoder methods based on machine learning. First, regarding the practical use of the proposed anomaly detection: does the method perform real-time analysis during the test period? The conclusion describes it as “real-time”. For real-time operation, what are the computational costs, such as memory and speed? Next, Sec. 2 explains the related works. Did the authors compare their approach with these methods? Also, several figures do not have sufficient resolution. The authors should improve the figures.

Author Response

To address unknown attacks in internet security, a problem known as anomaly detection, the authors proposed and analyzed Local Outlier Factor (LOF) and AutoEncoder methods based on machine learning. First, regarding the practical use of the proposed anomaly detection: does the method perform real-time analysis during the test period? The conclusion describes it as “real-time”. For real-time operation, what are the computational costs, such as memory and speed? Next, Sec. 2 explains the related works. Did the authors compare their approach with these methods? Also, several figures do not have sufficient resolution. The authors should improve the figures.

Thank you for the kind comments. We address these points in a discussion section added as Section 6. The changes are marked in blue.

  • We added the memory and CPU used in testing. In addition, Table 9 shows the real-time analysis rates of the proposed LOF and Autoencoder methods. LOF analyzes 166 events per second, while Autoencoder analyzes 571 events per second. The analysis speed is expected to improve with GPUs or upgraded hardware. How real-time analysis proceeds depends on the operation policy, as shown in Figure 9. Applying whitelists and blacklists reduces the number of suspicious processes under review, as shown in Table 10, and this can be verified by setting a training period for machine learning.
  • Other studies use statistical methods to detect anomalies, for example outlier detection, time series analysis, and probability models. Our proposed LOF method is related to outlier detection. Jabez et al. [16] showed that NOF-based outlier detection is the fastest. This article showed that the Autoencoder method is faster than LOF. In addition, most of the related works use labeled datasets to classify anomalies with supervised learning. We detect suspicious behaviors that deviate from normal behavior based on unsupervised learning. The main contribution is to detect suspicious processes in a big-data environment and reduce the number of review targets. Section 1 explains the details.
  • We revised the figures in this article to improve their resolution.
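
To put the quoted throughput figures in perspective, the sketch below works through the arithmetic. The two rates are the ones stated in the response above; the backlog of one million events and the helper name `seconds_to_analyze` are hypothetical and chosen only for illustration.

```python
# Analysis rates quoted in the author response (events per second).
LOF_EVENTS_PER_SEC = 166
AUTOENCODER_EVENTS_PER_SEC = 571

def seconds_to_analyze(n_events, events_per_sec):
    """Time needed to score n_events at a constant analysis rate."""
    return n_events / events_per_sec

# Relative speed of the Autoencoder compared with LOF.
speedup = AUTOENCODER_EVENTS_PER_SEC / LOF_EVENTS_PER_SEC

# Hypothetical backlog size, not taken from the paper.
backlog = 1_000_000
for name, rate in [("LOF", LOF_EVENTS_PER_SEC),
                   ("Autoencoder", AUTOENCODER_EVENTS_PER_SEC)]:
    minutes = seconds_to_analyze(backlog, rate) / 60
    print(f"{name}: {minutes:.1f} minutes for {backlog} events")
print(f"Autoencoder speedup over LOF: {speedup:.2f}x")
```

At these rates the Autoencoder is roughly 3.4 times faster than LOF. Whether either rate counts as real-time depends on the event arrival rate at the endpoint, which this record does not state.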

Author Response File: Author Response.pdf

Reviewer 3 Report

In this paper, the authors proposed an anomaly score-based detection method and attack profile technique to counter threats caused by malware intrusion. The work is interesting. However, the paper needs to be modified. Below are detailed comments:

  1. The abstract needs to be modified. The authors need to discuss the current results in relation to existing works in detail. Why is the proposed method better than existing methods?
  2. The introduction needs to be modified.
  3. In related work, could you summarize the pros and cons of all these algorithms in a table to make the comparison more straightforward for readers?
  4. The novelty of the proposed work is limited, and the explanation is not convincing.
  5. Compare the results with existing algorithms. Explain why the proposed model is better than existing algorithms.

Author Response

In this paper, the authors proposed an anomaly score-based detection method and attack profile technique to counter threats caused by malware intrusion. The work is interesting. However, the paper needs to be modified. Below are detailed comments:

Thanks for the kind comments.

1. The abstract needs to be modified. The authors need to discuss the current results in relation to existing works in detail. Why is the proposed method better than existing methods?

  • We have modified the article's abstract. In addition, a discussion section (Section 5) has been added to discuss the results further. Previous studies used labeled datasets to classify threats with supervised learning, and the labeling must be done manually by the security administrator. In contrast, we use an unlabeled dataset to detect anomalies effectively based on unsupervised learning and show the corresponding threats. In addition, operational policies reduce the amount of analysis required of security managers. Details are provided in Section 1, Major Contribution.

2. The introduction needs to be modified.

  • We revised the introduction. As the number of devices to be managed increases, a variety of endpoint environments are forming. As a result, suspicious behavior at the endpoint needs to be detected effectively. We explain the limitations of existing security solutions and propose anomaly detection to detect unknown intrusions. The introduction also describes the major contributions of this article.

3. In related work, could you summarize the pros and cons of all these algorithms in a table to make the comparison more straightforward for readers?

  • To make it easier for readers to compare the related works, we have added a figure showing the execution time and accuracy of all algorithms.

4. The novelty of the proposed work is limited, and the explanation is not convincing.

  • The major contribution is to detect suspicious processes in a big-data environment and reduce the number of review targets. To explain this, the major contributions are described in the introduction, and the experimental results are discussed in the Section 6 discussion.

5. Compare the results with existing algorithms. Explain why the proposed model is better than existing algorithms.

  • Other studies use statistical methods to detect anomalies, for example outlier detection, time series analysis, and probability models. Our proposed LOF method is related to outlier detection. Jabez et al. [16] showed that NOF-based outlier detection is the fastest. This article showed that the Autoencoder method is faster than LOF. In addition, most of the related works use labeled datasets to classify anomalies with supervised learning. We detect suspicious behaviors that deviate from normal behavior based on unsupervised learning. The anomaly detection we proposed is effective. The operational policies described in the Section 6 discussion enable real-time detection and reduce the review workload.
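
The way an operational policy can reduce the review workload might be sketched as follows. This is an illustrative assumption, not the authors' code: the function `triage`, the process names, and the list contents are hypothetical, and the sketch uses the allowlist and denylist terminology adopted later in this review.

```python
def triage(flagged, allowlist, denylist):
    """Split flagged process names into (confirmed_threats, needs_review).

    Allowlisted processes are dropped, denylisted ones are reported
    immediately, and only the remainder is left for manual review.
    """
    threats = [p for p in flagged if p in denylist]
    review = [p for p in flagged if p not in allowlist and p not in denylist]
    return threats, review

# Hypothetical output of the anomaly detector and hypothetical policy lists.
flagged = ["updater.exe", "dropper.exe", "unknown_tool.exe", "backup_agent.exe"]
allowlist = {"updater.exe", "backup_agent.exe"}
denylist = {"dropper.exe"}

threats, review = triage(flagged, allowlist, denylist)
print("threats:", threats)
print("needs review:", review)
```

In this toy example, four flagged processes shrink to one item for manual review, which is the kind of reduction the response attributes to the policy.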

 

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

The authors should pay attention to the writing in the manuscript. There are still some issues, e.g.,
Line 32 APT is not defined first, and instead it is defined later at line 134.
Line 67 LOF is not defined first, and instead it is defined later at line 158.
Line 71 DOS, U2R, R2L are not defined.
Line 91 "NOD" should be "NOF"?
Line 97 KDD99 is not defined, and there is no reference for it.
Line 119 k-NN is not defined, instead it is defined later at line 187.
Line 123 CANN is not defined.
Line 393 The table caption should be at the same page with its table. It should not cross the pages.
Line 404 Bad grammar:
"Figure 9 is shows the procedure in real time." -> "Figure 9 shows the procedure in real time."

In addition, to be “racially neutral”, it's recommended to replace use of whitelist with allowlist and blacklist with blocklist (or denylist).

Author Response

Thanks for the kind comments. The revision was clearly highlighted using the "Track Changes" function in Microsoft Word. Also, it is marked in blue.

The authors should pay attention to the writing in the manuscript. There are still some issues, e.g.,
Line 32 APT is not defined first, and instead it is defined later at line 134.

  • In the revised manuscript, the definition of APT was removed from line 138 and placed at line 32.

Line 67 LOF is not defined first, and instead it is defined later at line 158.

  • In the revised manuscript, the definition of LOF was removed from line 162 and placed at line 68.

Line 71 DOS, U2R, R2L are not defined.

  • DOS, U2R, and R2L are defined in line 72 of the revised manuscript.

Line 91 "NOD" should be "NOF"?

  • It was a typo. Corrected it to "NOF" in line 93 of the revised manuscript.

Line 97 KDD99 is not defined, and there is no reference for it.

  • The KDD-99 dataset is explained and referenced in line 99 of the revised manuscript.

Line 119 k-NN is not defined, instead it is defined later at line 187.

  • In the revised manuscript, the definition of k-NN was removed from line 191 and placed at line 122.

Line 123 CANN is not defined.

  • CANN was defined in line 127 of the revised manuscript.

Line 393 The table caption should be at the same page with its table. It should not cross the pages.

  • We moved table 5 so that the table caption was on the same page as the table, and added more details above table 9.

Line 404 Bad grammar:
"Figure 9 is shows the procedure in real time." -> "Figure 9 shows the procedure in real time."

  • We corrected the grammatical error. Please confirm.

In addition, to be “racially neutral”, it's recommended to replace use of whitelist with allowlist and blacklist with blocklist (or denylist).

  • We replaced every occurrence of whitelist and blacklist in the manuscript with allowlist and denylist.

Author Response File: Author Response.pdf

Reviewer 3 Report

The reviewer thinks that the authors have addressed well what the reviewer suggested in the first-round review. This revised paper looks much better.

Author Response

I greatly appreciate your kind words.

Thank you for your attention to this matter.

 

Author Response File: Author Response.pdf
