Article
Peer-Review Record

LAN Intrusion Detection Using Convolutional Neural Networks

by Hanan Zainel and Cemal Koçak *
Reviewer 1:
Reviewer 2: Anonymous
Appl. Sci. 2022, 12(13), 6645; https://doi.org/10.3390/app12136645
Submission received: 25 May 2022 / Revised: 27 June 2022 / Accepted: 28 June 2022 / Published: 30 June 2022

Round 1

Reviewer 1 Report

The main problem with the results presented in the manuscript is that the NSL-KDD dataset is not new, and there are numerous papers applying CNNs to anomaly detection on this dataset:

https://www.researchgate.net/profile/Kathiravan-Natarajan/publication/326566510_An_Empirical_Study_on_Network_Anomaly_Detection_Using_Convolutional_Neural_Networks/links/5bcabc3ba6fdcc03c7962386/An-Empirical-Study-on-Network-Anomaly-Detection-Using-Convolutional-Neural-Networks.pdf

https://ieeexplore.ieee.org/abstract/document/8463997

https://link.springer.com/chapter/10.1007/978-3-319-70139-4_87

https://dl.acm.org/doi/abs/10.1145/3297156.3297230

To be continued...

The motivation for choosing NSL-KDD and CNN is not given. Why not newer datasets, such as CSE-CIC-IDS2018?

The experiments are also not well described; e.g., classical cross-validation is not applied, so the error rate and overall accuracy are open to question.

In general, such research can exist, but in that case the achieved results should be higher than those of other researchers, and a comparison should be provided.

Author Response

Response to Reviewer 1 Comments

Acknowledgement

The authors would like to thank the reviewer for his/her invaluable comments and suggestions. We have made the requested changes and reorganized the manuscript according to the reviewer's comments. Our explanations of the reviewer's comments follow.

Point 1: The main problem with the results presented in the manuscript is that the NSL-KDD dataset is not new, and there are numerous papers applying CNNs to anomaly detection on this dataset:

 

https://www.researchgate.net/profile/Kathiravan-Natarajan/publication/326566510_An_Empirical_Study_on_Network_Anomaly_Detection_Using_Convolutional_Neural_Networks/links/5bcabc3ba6fdcc03c7962386/An-Empirical-Study-on-Network-Anomaly-Detection-Using-Convolutional-Neural-Networks.pdf

https://ieeexplore.ieee.org/abstract/document/8463997

https://link.springer.com/chapter/10.1007/978-3-319-70139-4_87

https://dl.acm.org/doi/abs/10.1145/3297156.3297230

To be continued...

The motivation for choosing NSL-KDD and CNN is not given. Why not newer datasets, such as CSE-CIC-IDS2018?

Response 1: This is addressed in lines 118-134 of the revised manuscript as follows:

The NSL-KDD dataset is derived from a simulated military network environment with an Air Force LAN (Local Area Network). A range of attacks and TCP/IP data was created to replicate a U.S. Air Force (USAF) LAN. Each data record comprises 41 network features, and every record falls into the normal class or one of four attack categories (DoS, U2R, R2L, and Probing).

The CIC-IDS-2017 and CSE-CIC-IDS-2018 datasets were created by the Canadian Institute for Cybersecurity using intrusion traffic characterization techniques. These datasets are divided into seven attack types, each describing a current attack scenario. The NSL-KDD dataset, on the other hand, comprises 41 features, which is more than the CIC-IDS-2018 dataset.

The large quantity of redundant records in the KDD data set causes learning algorithms to be biased towards frequent records, preventing them from learning rare records, which are often more harmful to networks, such as U2R and R2L attacks. Furthermore, the presence of these repeated records in the test set biases the evaluation findings in favor of approaches with higher detection rates on frequent records.

In addition, the complexity level of the records in the KDD data set was examined. Surprisingly, all 21 learners accurately categorized roughly 98 percent of the records in the train set and 86 percent of the records in the test set.
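For readers who want to inspect these properties directly, the following minimal sketch (our illustration, not part of the manuscript; it assumes the NSL-KDD training file KDDTrain+.txt has been downloaded locally and uses generic column names) loads the file with pandas and counts records per label:

```python
# Minimal sketch (assumption: KDDTrain+.txt downloaded from the NSL-KDD site).
# Each NSL-KDD row carries 41 feature values plus a label and a difficulty score.
import pandas as pd

cols = [f"f{i}" for i in range(41)] + ["label", "difficulty"]
df = pd.read_csv("KDDTrain+.txt", header=None, names=cols)

print(df.shape)                    # (number of records, 43)
print(df["label"].value_counts())  # record counts per raw label (normal, neptune, ...)
```

The value counts make the redundancy and class-imbalance problems described above directly visible.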

 

Point 2: The experiments are also not well described; e.g., classical cross-validation is not applied, so the error rate and overall accuracy are open to question.

Response 2: This is addressed in lines 509-528 of the revised manuscript as follows:

 

Cross-validation is a model selection method that is mostly used to choose hyperparameters. The number of parameters in the model changes when the hyperparameters are changed; increasing the number of layers in a neural network, for example, can create thousands of new parameters (depending on the width of the layer). Simple models with few parameters, simple hyperparameters, and straightforward tuning are the ones most often used with k-fold cross-validation. CNNs, moreover, tend to overfit rather than underfit.
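To make the point about hyperparameters changing the parameter count concrete, here is a small sketch (TensorFlow/Keras is our assumption; the layer widths are illustrative and not the architecture used in the paper): adding a single 64-unit hidden layer more than doubles the number of trainable parameters.

```python
# Sketch only: counts trainable parameters before and after adding one layer.
import tensorflow as tf

base = tf.keras.Sequential([
    tf.keras.Input(shape=(41,)),                     # 41 NSL-KDD features
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(5, activation="softmax"),  # 5 output classes
])
deeper = tf.keras.Sequential([
    tf.keras.Input(shape=(41,)),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(64, activation="relu"),    # one extra hidden layer
    tf.keras.layers.Dense(5, activation="softmax"),
])

print(base.count_params())    # 3013
print(deeper.count_params())  # 7173 -- thousands of new parameters
```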

The parameters are the exact values of the weights or coefficients that the model ends up with after solving the optimization with gradient descent or whatever approach is employed; in a CNN, this is the weight matrix of each layer.

Cross-validation is used to discover the optimal collection of hyperparameters, whereas the optimizer (gradient descent, Adam, etc.) finds the optimal set of parameters for a given collection of hyperparameters and data.

Furthermore, cross-validation is mostly used in standard regression problems; in this paper the problem is classification, which would require k-fold cross-validation as the equivalent of traditional cross-validation. In this experiment, the reason for avoiding k-fold cross-validation is that the folds should have the same labeling percentage in each fold, which is very challenging to achieve, especially with a large dataset, and inconsistent labeling would negatively affect the accuracy of the model. As an alternative, different evaluation metrics (f1-score, precision, and recall) are applied to both models.
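A minimal sketch of this alternative evaluation (scikit-learn's classification_report, whose output has the same shape as Tables 5 and 6 below; the toy arrays here stand in for the real model predictions):

```python
from sklearn.metrics import classification_report

# Toy stand-ins: in the experiments these would be the true and predicted
# class indices for the 34,694 held-out NSL-KDD records.
y_true = [0, 0, 1, 2, 2, 4]
y_pred = [0, 0, 1, 2, 1, 4]

print(classification_report(
    y_true, y_pred,
    labels=[0, 1, 2, 3, 4],
    target_names=["Normal", "DoS", "Probe", "R2L", "U2R"],
    zero_division=0,   # classes never predicted (like R2L here) report 0.00
))
```

The zero_division behavior also explains the 0.00 rows that can appear for very rare classes such as R2L.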

 

Point 3: In general, such research can exist, but in that case the achieved results should be higher than those of other researchers, and a comparison should be provided.

Response 3: This is addressed in lines 578-583 of the revised manuscript as follows:

 

Here we have divided the NSL-KDD train dataset into train and test parts with an 80-20 percent ratio using the sklearn split function. Table 5 shows the train and test accuracy of both models. According to paper [29], a conventional LSTM model shows low performance on the NSL-KDD Test and Test-21 sets due to the low number of records in those files, while our method of splitting the train dataset into train and test parts shows higher performance.
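A minimal sketch of the split described above (scikit-learn's train_test_split; the random arrays and fixed seed are our stand-ins for the preprocessed NSL-KDD features and labels):

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Toy stand-ins: X would hold the 41 preprocessed NSL-KDD features per
# record and y the numeric class labels.
X = np.random.rand(100, 41)
y = np.random.randint(1, 6, size=100)

X_train, X_test, y_train, y_test = train_test_split(
    X, y,
    test_size=0.20,    # the 80-20 ratio described above
    random_state=42,   # fixed seed so the split is reproducible
)
print(X_train.shape, X_test.shape)  # (80, 41) (20, 41)
```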

 

It is also addressed in lines 596-615 of the revised manuscript as follows:

 

For further verification, f1-score, precision, recall, and support reports are generated for both the CNN and CNN-LSTM models. Table 5 gives the metrics of the CNN model on all categories of attacks, whereas Table 6 gives the same report for the CNN-LSTM model.

Table 5. Validation metrics on the CNN model.

              precision    recall    f1-score    support
Normal             0.97      0.99        0.98      18050
DoS                0.94      0.96        0.95       3352
Probe              0.99      0.98        0.98      12437
R2L                0.00      0.00        0.00         30
U2R                0.91      0.73        0.81        825
accuracy                                 0.97      34694
macro avg          0.76      0.73        0.74      34694
weighted avg       0.97      0.97        0.97      34694

 

Table 6. Validation metrics on the CNN-LSTM model.

              precision    recall    f1-score    support
Normal             0.99      0.99        0.99      18050
DoS                0.97      0.99        0.98       3352
Probe              1.00      1.00        1.00      12437
R2L                0.73      0.27        0.39         30
U2R                0.94      0.81        0.87        825
accuracy                                 0.99      34694
macro avg          0.93      0.81        0.85      34694
weighted avg       0.99      0.99        0.99      34694

 

Table 7 below gives a brief comparison of the accuracies of different models on the same dataset. Only testing accuracy is compared, as the other approaches did not report training accuracy; testing accuracy is also the most important figure, since it measures a model's effectiveness on unseen data.

Table 7. Accuracy comparison between the proposed model and other models.

Model                             Testing Accuracy
CNN model                         98.56%
Genetic algorithm model [32]      94.58%
CNN multi-stage [33]              80.13%
Proposed CNN-LSTM model           99.25%

 

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors provide a survey of their approach to preventing LAN intrusions using Convolutional Neural Networks. The presentation is written clearly, but from my point of view there is not much new in this approach. What I want to appreciate is the theoretical introduction and review. In the practical part, mostly a statistical review is provided.

It would be better to extend the part about the implementation of the solution and the application, and to describe it.

List of minor issues:

- keywords: correct spelling of Deep Learning

- l. 67: phrase -> phase

- l. 101, double "is" in sentence

- python -> Python

- l. 168-172 - the section descriptions are somehow wrong; they do not correspond to the real sections, Sect. IV appears twice, and the conclusion is missing

- l. 198, in their work

- l. 282, in [24], add comma ...

- l. 352-355: what do the labels mean? This is unclear

- useable -> usable

- l. 414, why sectioning with A. numbering?

- formatting of references is wrong, please check the proper format

Author Response

Response to Reviewer 2 Comments

Acknowledgement

The authors would like to thank the reviewer for his/her invaluable comments and suggestions. We have made the requested changes and reorganized the manuscript according to the reviewer's comments. Our explanations of the reviewer's comments follow.

 

Point 1: keywords: correct spelling of Deep Learning

Response 1: It is corrected.

 

Point 2: l. 67: phrase -> phase

Response 2: It is corrected.

 

Point 3: l. 101, double "is" in sentence

Response 3: The sentence was revised in the manuscript as follows:

A Convolutional Neural Network is a subset of deep learning and is becoming a popular method for identifying sophisticated attacks with unusual patterns [11].

 

Point 4: python -> Python

Response 4: It is corrected.

 

Point 5: l. 168-172 - the section descriptions are somehow wrong; they do not correspond to the real sections, Sect. IV appears twice, and the conclusion is missing

Response 5: This is addressed in lines 168-172 of the revised manuscript as follows:

 

To that end, this paper is organized as follows: Section II delves into CNNs and their derivatives. Section III contains information on NIDS data sets as well as the hyperparameter tuning technique for the CNN and hybrid network parameters. Section IV gives an overview of the experimental outcomes. Performance evaluation of the proposed system and comparisons with the literature are given in Section V. Finally, Section VI concludes the article with future work.

 

Point 6: l. 198, in their work

Response 6: This is addressed in line 198 of the revised manuscript as follows:

 

In their work, the system has a high detection rate and a low false positive alarm rate.

 

Point 7: l. 282, in [24], add comma ...

Response 7: It is corrected.

 

Point 8: l. 352-355: what do the labels mean? This is unclear

Response 8: This is addressed in lines 373-381 of the revised manuscript as follows:

Since a CNN cannot process non-numeric values, it is not possible to train on the data while keeping a textual property for each row, such as the label property containing values like Normal, DoS, Probe, U2R, and R2L. Consequently, these values must be converted into numerical ones. Each label is changed to a number and added to the dataset as a separate property; for example, instead of one property in the dataset called "label" which contains all the categories, each category is added as a separate property in the data frame before parsing it to the model. As a result, the label Normal is changed to "1," followed by DoS with "2," Probe with "3," R2L with "4," and U2R with "5."
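A minimal sketch of this label conversion (pandas; the tiny example frame is ours, not the manuscript's data frame):

```python
import pandas as pd

# Toy frame standing in for the NSL-KDD "label" property.
df = pd.DataFrame({"label": ["Normal", "DoS", "Probe", "R2L", "U2R", "Normal"]})

# Each textual label becomes a numeric code: Normal -> 1, DoS -> 2, ...
codes = {"Normal": 1, "DoS": 2, "Probe": 3, "R2L": 4, "U2R": 5}
df["label_num"] = df["label"].map(codes)

# ...and each category is also added as its own property (one column per
# class) before the frame is parsed to the model.
df = pd.concat([df, pd.get_dummies(df["label"])], axis=1)
print(df.head())
```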

 

Point 9: useable -> usable

Response 9: It is corrected.

 

Point 10: l. 414, why sectioning with A. numbering?

Response 10: The "A. Custom CNN model" subtitle has been removed.

 

Point 11: formatting of references is wrong, please check the proper format

Response 11: The references have been checked and converted to the appropriate format.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

I can agree with the comments provided on point 1, and partially on point 2 (although I do not think it could not be implemented). Still, the comments on point 3 are not yet sufficient, since there is a need for an in-depth comparison with the results of other authors, perhaps even in the form of a table.

Author Response

Acknowledgement

We thank the reviewer for the valuable comments and suggestions. We made the requested changes and rearranged the article according to the reviewer's comments. We have answered the comments below, item by item, and incorporated the changes into the article.

  1. The main problem with the results presented in the manuscript is that the NSL-KDD dataset is not new, and there are numerous papers applying CNNs to anomaly detection on this dataset:

https://www.researchgate.net/profile/Kathiravan-Natarajan/publication/326566510_An_Empirical_Study_on_Network_Anomaly_Detection_Using_Convolutional_Neural_Networks/links/5bcabc3ba6fdcc03c7962386/An-Empirical-Study-on-Network-Anomaly-Detection-Using-Convolutional-Neural-Networks.pdf

https://ieeexplore.ieee.org/abstract/document/8463997

https://link.springer.com/chapter/10.1007/978-3-319-70139-4_87

https://dl.acm.org/doi/abs/10.1145/3297156.3297230

To be continued...

The motivation for choosing NSL-KDD and CNN is not given. Why not newer datasets, such as CSE-CIC-IDS2018?

  2. The experiments are also not well described; e.g., classical cross-validation is not applied, so the error rate and overall accuracy are open to question.
  3. In general, such research can exist, but in that case the achieved results should be higher than those of other researchers, and a comparison should be provided.

Response to Reviewer 1 Comments

 

Point 1: The main problem with the results presented in the manuscript is that the NSL-KDD dataset is not new, and there are numerous papers applying CNNs to anomaly detection on this dataset:

https://www.researchgate.net/profile/Kathiravan-Natarajan/publication/326566510_An_Empirical_Study_on_Network_Anomaly_Detection_Using_Convolutional_Neural_Networks/links/5bcabc3ba6fdcc03c7962386/An-Empirical-Study-on-Network-Anomaly-Detection-Using-Convolutional-Neural-Networks.pdf

https://ieeexplore.ieee.org/abstract/document/8463997

https://link.springer.com/chapter/10.1007/978-3-319-70139-4_87

https://dl.acm.org/doi/abs/10.1145/3297156.3297230

To be continued...

The motivation for choosing NSL-KDD and CNN is not given. Why not newer datasets, such as CSE-CIC-IDS2018?

 

Response 1: This is addressed in lines 118-134 of the revised manuscript as follows:

The NSL-KDD dataset is derived from a simulated military network environment with an Air Force LAN (Local Area Network). A range of attacks and TCP/IP data was created to replicate a U.S. Air Force (USAF) LAN. Each data record comprises 41 network features, and every record falls into the normal class or one of four attack categories (DoS, U2R, R2L, and Probing) [13].

The CIC-IDS-2017 and CSE-CIC-IDS-2018 datasets were created by the Canadian Institute for Cybersecurity using intrusion traffic characterization techniques. These datasets are divided into seven attack types, each describing a current attack scenario. The NSL-KDD dataset, on the other hand, comprises 41 features, which is more than the CIC-IDS-2018 dataset [14].

The large quantity of redundant records in the KDD data set causes learning algorithms to be biased towards frequent records, preventing them from learning rare records, which are often more harmful to networks, such as U2R and R2L attacks. Furthermore, the presence of these repeated records in the test set biases the evaluation findings in favor of approaches with higher detection rates on frequent records.

In addition, the complexity level of the records in the KDD data set was examined. Surprisingly, all 21 learners accurately categorized roughly 98 percent of the records in the train set and 86 percent of the records in the test set [15].

 

Point 2: The experiments are also not well described; e.g., classical cross-validation is not applied, so the error rate and overall accuracy are open to question.

Response 2: This is addressed in lines 511-530 of the revised manuscript as follows:

Cross-validation is a model selection method that is mostly used to choose hyperparameters. The number of parameters in the model changes when the hyperparameters are changed; increasing the number of layers in a neural network, for example, can create thousands of new parameters (depending on the width of the layer). Simple models with few parameters, simple hyperparameters, and straightforward tuning are the ones most often used with k-fold cross-validation. CNNs, moreover, tend to overfit rather than underfit [32].

The parameters are the exact values of the weights or coefficients that the model ends up with after solving the optimization with gradient descent or whatever approach is employed; in a CNN, this is the weight matrix of each layer.

Cross-validation is used to discover the optimal collection of hyperparameters, whereas the optimizer (gradient descent, Adam, etc.) finds the optimal set of parameters for a given collection of hyperparameters and data [33].

Furthermore, cross-validation is mostly used in standard regression problems; in this paper the problem is classification, which would require k-fold cross-validation as the equivalent of traditional cross-validation. In this experiment, the reason for avoiding k-fold cross-validation is that the folds should have the same labeling percentage in each fold, which is very challenging to achieve, especially with a large dataset, and inconsistent labeling would negatively affect the accuracy of the model. As an alternative, different evaluation metrics (f1-score, precision, and recall) are applied to both models [34].

 

Point 3: In general, such research can exist, but in that case the achieved results should be higher than those of other researchers, and a comparison should be provided.

Response 3: This is addressed in lines 580-585 of the revised manuscript as follows:

 

Here we have divided the NSL-KDD train dataset into train and test parts with an 80-20 percent ratio using the sklearn split function. Table 5 shows the train and test accuracy of both models. According to paper [35], a conventional LSTM model shows low performance on the NSL-KDD Test and Test-21 sets due to the low number of records in those files, while our method of splitting the train dataset into train and test parts shows higher performance.

 

It is also addressed in lines 598-616 of the revised manuscript as follows:

 

For further verification, f1-score, precision, recall, and support reports are generated for both the CNN and CNN-LSTM models. Table 5 gives the metrics of the CNN model on all categories of attacks, whereas Table 6 gives the same report for the CNN-LSTM model.

Table 5. Validation metrics on the CNN model.

              precision    recall    f1-score    support
Normal             0.97      0.99        0.98      18050
DoS                0.94      0.96        0.95       3352
Probe              0.99      0.98        0.98      12437
R2L                0.00      0.00        0.00         30
U2R                0.91      0.73        0.81        825
accuracy                                 0.97      34694
macro avg          0.76      0.73        0.74      34694
weighted avg       0.97      0.97        0.97      34694

 

Table 6. Validation metrics on the CNN-LSTM model.

              precision    recall    f1-score    support
Normal             0.99      0.99        0.99      18050
DoS                0.97      0.99        0.98       3352
Probe              1.00      1.00        1.00      12437
R2L                0.73      0.27        0.39         30
U2R                0.94      0.81        0.87        825
accuracy                                 0.99      34694
macro avg          0.93      0.81        0.85      34694
weighted avg       0.99      0.99        0.99      34694

 

Table 7 below gives a brief comparison of the accuracies of different models on the same dataset. Only testing accuracy is compared, as the other approaches did not report training accuracy; testing accuracy is also the most important figure, since it measures a model's effectiveness on unseen data.

Table 7. Accuracy comparison between the proposed model and other models.

Model                             Testing Accuracy
CNN model                         98.56%
Genetic algorithm model [32]      94.58%
CNN multi-stage [33]              80.13%
Proposed CNN-LSTM model           99.25%

 

Author Response File: Author Response.pdf

Reviewer 2 Report

The reviewer agrees with the changes and the present version. Most of the reviewer's recommendations have been addressed. The paper can be published in this version after proofreading.

Author Response

Acknowledgement

We thank the reviewer for the valuable comments and suggestions. We made the requested changes and rearranged the article according to the reviewer's comments. We have answered the comments below, item by item, and incorporated the changes into the article.

  1. keywords: correct spelling of Deep Learning
  2. l. 67: phrase -> phase
  3. l. 101, double "is" in sentence
  4. python -> Python
  5. l. 168-172 - the section descriptions are somehow wrong; they do not correspond to the real sections, Sect. IV appears twice, and the conclusion is missing
  6. l. 198, in their work
  7. l. 282, in [24], add comma ...
  8. l. 352-355: what do the labels mean? This is unclear
  9. useable -> usable
  10. l. 414, why sectioning with A. numbering?
  11. formatting of references is wrong, please check the proper format

Response to Reviewer 2 Comments

 

Point 1: keywords: correct spelling of Deep Learning

 

Response 1: The misspelling has been corrected to "Deep Learning".

 

Point 2: l. 67: phrase -> phase

 

Response 2: The word phrase has been corrected to "phase".

 

Point 3: l. 101, double "is" in sentence

 

Response 3: The sentence was revised in lines 101-102 of the manuscript as follows:

A Convolutional Neural Network is a subset of deep learning and is becoming a popular method for identifying sophisticated attacks with unusual patterns [11].

 

Point 4: python -> Python

 

Response 4: All occurrences of "python" in the article have been corrected to "Python".

 

Point 5: l. 168-172 - the section descriptions are somehow wrong; they do not correspond to the real sections, Sect. IV appears twice, and the conclusion is missing

 

Response 5: The section description (formerly at lines 168-172) has been corrected in the revised manuscript at lines 188-191 as follows:

Section IV gives an overview of the experimental outcomes. Performance evaluation of the proposed system and comparisons with the literature are given in Section V. Finally, Section VI concludes the article with future work.

 

Point 6: l. 198, in their work

 

Response 6: This is addressed in lines 217-218 of the revised manuscript. The sentence "According to the writer's experiences, the system has a high detection rate and a low false positive alarm rate." has been corrected to "In their work, the system has a high detection rate and a low false positive alarm rate."

 

Point 7: l. 282, in [24], add comma ...

 

Response 7: In the edited article, the source numbered [24] became the source numbered [27]. A comma has been added after the reference number, as detailed in lines 287-288 of the revised manuscript:

The new NSL-KDD dataset [27], is an upgraded version of the previous one.

 

Point 8: l. 352-355: what do the labels mean? This is unclear

 

Response 8: The sentence has been rearranged and made understandable, as detailed in lines 374-382 of the revised manuscript:

Since a CNN cannot process non-numeric values, it is not possible to train on the data while keeping a textual property for each row, such as the label property containing values like Normal, DoS, Probe, U2R, and R2L. Consequently, these values must be converted into numerical ones. Each label is changed to a number and added to the dataset as a separate property; for example, instead of one property in the dataset called "label" which contains all the categories, each category is added as a separate property in the data frame before parsing it to the model. As a result, the label Normal is changed to "1," followed by DoS with "2," Probe with "3," R2L with "4," and U2R with "5."

 

Point 9: useable -> usable

 

Response 9: The word "useable" has been corrected to "usable" at line 390 of the article.

 

Point 10: l. 414, why sectioning with A. numbering?

 

Response 10: The subtitle "A. Custom CNN model" has been removed.

 

Point 11: formatting of references is wrong, please check the proper format

 

Response 11: The references have been checked and converted to the appropriate format.

References

  1. Riesco, R.; Larriva-Novo, X.; Villagrá, V.A. Cybersecurity threat intelligence knowledge exchange based on blockchain. Telecommunication Systems 2020, 73(2), 259–288.
  2. Conti, M.; Dargahi, T.; Dehghantanha, A. Cyber threat intelligence: challenges and opportunities. In Cyber Threat Intelligence; 2018; pp. 1–6.
  3. Wang, H.; Han, B.; Su, J.; Wang, X. A High-Performance Intrusion Detection Method Based on Combining Supervised and Unsupervised Learning. In Proceedings of the 2018 IEEE SmartWorld, Ubiquitous Intelligence & Computing, Advanced & Trusted Computing, Scalable Computing & Communications, Cloud & Big Data Computing, Internet of People and Smart City Innovation (SmartWorld/SCALCOM/UIC/ATC/CBDCom/IOP/SCI), 2018; pp. 1803–1810.
  4. Internet Security Threat Report. Available online: https://docs.broadcom.com/doc/istr-23-2018-en (accessed on 22 June 2022).
  5. Ghorbani, A.A.; Lu, W.; Tavallaee, M. Network Intrusion Detection and Prevention: Concepts and Techniques; Vol. 47; Springer Science & Business Media, 2010; pp. 27–54.
  6. Iqbal, S.; Kiah, M.L.M.; Dhaghighi, B.; Hussain, M.; Khan, S.; Khan, M.K.; Choo, K.K.R. On cloud security attacks: A taxonomy and intrusion detection and prevention as a service. Journal of Network and Computer Applications 2016, 74, 98–120.
  7. Amala, P.; Gayathri, G.; Dinesh, S.; Prabagar, S. Effective Intrusion Detection System Using Support Vector Machine Learning. International Journal of Advanced Science and Engineering Research 2018, 3(1), 302–305.
  8. Hakim, L.; Fatma, R. Influence Analysis of Feature Selection to Network Intrusion Detection System Performance Using NSL-KDD Dataset. In Proceedings of the 2019 International Conference on Computer Science, Information Technology, and Electrical Engineering (ICOMITEE); IEEE, 2019; pp. 217–220.
  9. DDoS attack that disrupted internet was largest of its kind in history, experts say. The Guardian, London, U.K., 26 October 2016. Available online: https://www.theguardian.com/technology/2016/oct/26/ddos-attack-dyn-mirai-botnet (accessed on 22 June 2022).
  10. Habeeb, R.A.A.; Nasaruddin, F.; Gani, A.; Hashem, I.A.T.; Ahmed, E.; Imran, M. Real-time big data processing for anomaly detection: A survey. International Journal of Information Management 2019, 45, 289–307.
  11. Kim, J.; Shin, Y.; Choi, E. An intrusion detection model based on a convolutional neural network. Journal of Multimedia Information System 2019, 6(4), 165–172.
  12. Zhang, W.; Yang, G.; Lin, Y.; Ji, C.; Gupta, M.M. On Definition of Deep Learning. In Proceedings of the 2018 World Automation Congress (WAC); IEEE, 2018; pp. 1–5.
  13. Stolfo, S.J.; Fan, W.; Lee, W.; Prodromidis, A.; Chan, P.K. Cost-based modeling and evaluation for data mining with application to fraud and intrusion detection: Results from the JAM Project. 2000; pp. 1–15.
  14. Sharafaldin, I.; Habibi Lashkari, A.; Ghorbani, A.A. Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization. In Proceedings of the 4th International Conference on Information Systems Security and Privacy, 2018; pp. 108–116. doi:10.5220/0006639801080116.
  15. Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A.A. A detailed analysis of the KDD CUP 99 data set. In Proceedings of the 2009 IEEE Symposium on Computational Intelligence for Security and Defense Applications, July 2009; pp. 1–6. doi:10.1109/CISDA.2009.5356528.
  16. El Mrabet, Z.; El Ghazi, H.; Kaabouch, N. A performance comparison of data mining algorithms based intrusion detection system for smart grid. In Proceedings of the 2019 IEEE International Conference on Electro Information Technology (EIT), 2019; pp. 298–303.
  17. Phadke, A.; Kulkarni, M.; Bhawalkar, P.; Bhattad, R. A review of machine learning methodologies for network intrusion detection. In Proceedings of the 2019 3rd International Conference on Computing Methodologies and Communication (ICCMC), March 2019; pp. 272–275.
  18. Sivanantham, S.; Abirami, R.; Gowsalya, R. Comparing the performance of adaptive boosted classifiers in anomaly based intrusion detection system for networks. In Proceedings of the 2019 International Conference on Vision Towards Emerging Trends in Communication and Networking (ViTECoN), March 2019; pp. 1–5.
  19. Thomas, R.; Pavithran, D. A survey of intrusion detection models based on NSL-KDD data set. In Proceedings of the 2018 Fifth HCT Information Technology Trends (ITT), 2018; pp. 286–291.
  20. Saljoughi, A.S.; Mehrvarz, M.; Mirvaziri, H. Attacks and intrusion detection in cloud computing using neural networks and particle swarm optimization algorithms. Emerging Science Journal 2017, 1(4), 179–191.
  21. Mehibs, S.M.; Hashim, S.H. Proposed network intrusion detection system based on fuzzy c mean algorithm in cloud computing environment. Journal of University of Babylon for Pure and Applied Sciences 2017, 26(2), 27–35.
  22. Idhammad, M.; Afdel, K.; Belouch, M. Distributed intrusion detection system for cloud environments based on data mining techniques. Procedia Computer Science 2018, 127, 35–41.
  23. Deshpande, P.S.; Sharma, S.C.; Peddoju, S.K. Security and Data Storage Aspect in Cloud Computing; Vol. 52; Springer: Singapore, 2019.
  24. Vinayakumar, R.; Alazab, M.; Soman, K.P.; Poornachandran, P.; Al-Nemrat, A.; Venkatraman, S. Deep learning approach for intelligent intrusion detection system. IEEE Access 2019, 7, 41525–41550.
  25. Gupta, N.; Jindal, V.; Bedi, P. LIO-IDS: Handling class imbalance using LSTM and improved one-vs-one technique in intrusion detection system. Computer Networks 2021, 192, 108076.
  26. Tavallaee, M.; Stakhanova, N.; Ghorbani, A.A. Toward credible evaluation of anomaly-based intrusion-detection methods. IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews) 2010, 40(5), 516–524.
  27. Kaushik, S.S.; Deshmukh, P.R. Detection of attacks in an intrusion detection system. International Journal of Computer Science and Information Technologies (IJCSIT) 2011, 2(3), 982–986.
  28. NSL-KDD Dataset. Available online: https://www.unb.ca/cic/datasets/nsl.html (accessed on 22 June 2022).
  29. Dhanabal, L.; Shantharajah, S.P. A study on NSL-KDD dataset for intrusion detection system based on classification algorithms. International Journal of Advanced Research in Computer and Communication Engineering 2015, 4(6), 446–452.
  30. Buda, M.; Maki, A.; Mazurowski, M.A. A systematic study of the class imbalance problem in convolutional neural networks. Neural Networks 2018, 106, 249–259.
  31. Albawi, S.; Mohammed, T.A.; Al-Zawi, S. Understanding of a convolutional neural network. In Proceedings of the 2017 International Conference on Engineering and Technology (ICET); IEEE, August 2017; pp. 1–6.
  32. Refaeilzadeh, P.; Tang, L.; Liu, H. Cross-validation. In Encyclopedia of Database Systems; 2009; Vol. 5, pp. 532–538.
  33. Berrar, D. Cross-Validation. In Encyclopedia of Bioinformatics and Computational Biology; 2019; pp. 542–545.
  34. Yadav, S.; Shukla, S. Analysis of k-fold cross-validation over hold-out validation on colossal datasets for quality classification. In Proceedings of the 2016 IEEE 6th International Conference on Advanced Computing (IACC), February 2016; pp. 78–83.
  35. Dong, R.H.; Yan, H.H.; Zhang, Q.Y. An Intrusion Detection Model for Wireless Sensor Network Based on Information Gain Ratio and Bagging Algorithm. International Journal of Network Security 2020, 22(2), 218–230.
  36. Amrita, K.K.R. A Hybrid Intrusion Detection System: Integrating Hybrid Feature Selection Approach with Heterogeneous Ensemble of Intelligent Classifiers. International Journal of Network Security 2018, 20, 41–55.
  37. Lakhina, S.; Joseph, S.; Verma, B. Feature Reduction using Principal Component Analysis for Effective Anomaly-Based Intrusion Detection on NSL-KDD. International Journal of Engineering Science and Technology 2010.

Author Response File: Author Response.docx
