Electronics
  • Article
  • Open Access

7 October 2023

Numerical Feature Selection and Hyperbolic Tangent Feature Scaling in Machine Learning-Based Detection of Anomalies in the Computer Network Behavior

1 Center for Applied Mathematics and Electronics, 11000 Belgrade, Serbia
2 Mathematical Institute of SASA, 11000 Belgrade, Serbia
3 Military Academy, University of Defense, 11042 Belgrade, Serbia
4 Faculty of Technical Science, University of Novi Sad, 21000 Novi Sad, Serbia
This article belongs to the Special Issue New Trends in Information Security

Abstract

Anomaly-based intrusion detection systems identify computer network behavior that deviates from a statistical model of typical network behavior. Binary classifiers based on supervised machine learning are very accurate at classifying network data into two categories: normal traffic and anomalous activity. Most problems with supervised learning are related to the large amount of data required to train the classifiers. Feature selection can be used to reduce datasets: its goal is to select a subset of relevant input features in order to optimize the evaluation and improve the performance of a given classifier. Feature scaling normalizes all features to the same range, preventing large-valued features from dominating the classification model or the other features. The most commonly used supervised machine learning models, including decision trees, support vector machines, k-nearest neighbors, weighted k-nearest neighbors, and feedforward neural networks, can all be improved by feature selection and feature scaling. This paper introduces a new feature scaling technique based on the hyperbolic tangent function and the damping strategy of the Levenberg–Marquardt algorithm.

1. Introduction

In recent years, the need for data protection has grown as computer applications process large amounts of data across the globe. Intrusion detection systems (IDSs) serve as the primary line of defense against malicious attacks on computer networks. Signature-based intrusion detection systems protect computer networks by proactively detecting the presence of known attacks. Anomaly-based intrusion detection systems identify abnormal network behavior by detecting deviations from what is considered “normal” behavior [1]. The concept of normality is generally introduced through a formal model that describes the relationships between the variables involved in a system’s dynamics, and events are identified as abnormal when the deviation from the normality model is large enough. Anomaly detection in supervised machine learning (ML) can be treated as a binary classification problem because the datasets used to train and test the models contain binary labels.
Research efforts have been extensively dedicated to utilizing ML algorithms for anomaly detection [2,3,4,5]. To determine the statistical model of an anomaly-based IDS, various approaches rely on a number of assumptions about the training data and validation methods. To classify network traffic, the authors of [6,7,8,9] present comparative analyses and performance evaluations of multiple supervised ML algorithms. The most widely used supervised ML models were analyzed, including support vector machines (SVM) [3,4,5], Naïve Bayes (NB) [6,7], decision trees (DT) [7,8], nearest neighbors [9,10,11], random forests (RF) [6,7,12], artificial neural networks (ANN) [7,10,13], logistic regression (LR) [6,7], discriminant analysis (DA) [9], and ensemble methods (EM) [14,15,16].
The main problem with supervised learning in the context of anomaly-based intrusion detection is that a large amount of data can affect classifier decisions. Dimensionality reduction is the most common method for eliminating features that are irrelevant or redundant with respect to the target concept. Without compromising classification performance, this technique can significantly reduce processing time and improve the overall effectiveness of an IDS [17]. By identifying the features that matter most in identifying anomalies, feature selection can improve classifier performance. The selected features must be able to distinguish between samples from different classes; if they cannot, the model is more likely to overfit, resulting in poor classifier performance. It is important to note that a classifier built from the reduced feature set must be as effective as one built from the full feature set [18].
Feature scaling is a popular method for enhancing classifier performance. Along with feature selection, many ML-based algorithms require feature scaling to ensure that all relevant features are on the same scale. This prevents a single feature from having a negative impact on the model or the other features due to its magnitude. Since ML-based algorithms operate primarily on numbers, features with different ranges can cause the larger values to become more influential during model training. This paper introduces a novel feature scaling technique, based on the hyperbolic tangent transformation and the Levenberg–Marquardt damping strategy, that speeds up processing while improving model accuracy.
Over the past few decades, researchers have examined a variety of anomaly-based intrusion detection systems. These systems have been tested using different datasets, and this study provides an analysis and comparison of 14 such datasets. Each dataset is unique in terms of the amount of data it contains, the nature of the attacks, and its intended use [19,20,21,22,23,24,25,26]. These datasets are primarily designed for signature-based intrusion detection and mainly simulate real network traffic. The Kyoto 2006+ dataset, which was recorded directly from actual network traffic, is the only dataset designed specifically for anomaly-based detection. As a result, we used the Kyoto 2006+ dataset as a baseline for tests on the feedforward neural network (FNN), DT, SVM, k-nearest neighbors (k-NN), and weighted k-NN (wk-NN) [10,11,12,13,27,28,29,30,31,32,33]. We compared the classification performance of the models using accuracy, processing time, and F1-score.

3. Proposed Work

The main purpose of feature scaling is to prevent the models from giving more weight to one feature than to the others. For classifiers that make decisions based on the Euclidean distance between two points, features in different units of measure can affect the classification and lead to biased, incorrect results. If gradient descent (GD) is used as the optimization algorithm and the features are on different scales, the classifier can update some weights faster than others; feature scaling helps GD converge faster. The algorithms used in SVM, k-NN, wk-NN, and FNN classification converge faster and give better results with scaled features.
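The effect of unequal scales on distance-based classifiers can be illustrated with a small sketch. The two instances and feature ranges below are purely illustrative values, not taken from the Kyoto 2006+ dataset: without scaling, the large-magnitude feature dominates the Euclidean distance almost completely.

```python
import math

# Two hypothetical instances: feature 1 in [0, 1], feature 2 in [0, 60000]
# (e.g., a rate and a byte count). All values are illustrative.
a = (0.2, 12000.0)
b = (0.9, 12500.0)

def euclidean(p, q):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

# Unscaled: feature 2 dominates the distance purely because of its magnitude.
d_raw = euclidean(a, b)

# Min-Max scale each feature to [0, 1] using (assumed) known ranges.
ranges = [(0.0, 1.0), (0.0, 60000.0)]
def scale(p):
    return tuple((x - lo) / (hi - lo) for x, (lo, hi) in zip(p, ranges))

d_scaled = euclidean(scale(a), scale(b))
print(d_raw)     # ~500: driven almost entirely by feature 2
print(d_scaled)  # ~0.70: both features now contribute on a comparable scale
```

After scaling, the relative difference in feature 1 (0.2 vs. 0.9) is no longer drowned out by the absolute difference in feature 2.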
In this paper, we describe a novel hyperbolic tangent feature scaling technique. The technique is inspired by the S-shaped, zero-centered hyperbolic tangent function (tanh), its sharp gradient (tanh′), and the damping strategy of the Levenberg–Marquardt (LM) algorithm applied to the quasi-linear part of the tanh function. The main idea is to find a near-optimal solution of the nonlinear objective function by exploiting the very sharp gradient of the tanh function and by approximating the Hessian matrix of the truncated Taylor formula (second-order derivatives) with the product of Jacobians (first-order derivatives) in an iterative procedure based on a step length and a search direction. Figure 4 shows the tanh function and its derivative. Because the tanh function is centered on 0 and ranges from −1 to 1, the first derivative of the function is very sharp near zero.
Figure 4. Hyperbolic tangent function (blue) and its first derivative (red).
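As a rough numeric check of Figure 4, the derivative can be evaluated from the standard identity tanh′(x) = 1 − tanh(x)²; the helper name below is ours, not from the paper:

```python
import math

# tanh and its first derivative, tanh'(x) = 1 - tanh(x)^2 (Figure 4).
# The derivative peaks at x = 0 and falls off quickly, which is the
# "very sharp gradient near zero" the scaling technique relies on.
def tanh_prime(x):
    return 1.0 - math.tanh(x) ** 2

print(tanh_prime(0.0))   # 1.0, the maximum
print(tanh_prime(1.0))   # ~0.42, already much flatter
print(math.tanh(1.0))    # ~0.7616, the bound of the quasi-linear range
```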
The magnitudes of the features are limited to the range [tanh(−1), tanh(1)] ≈ [−0.7616, +0.7616] while the signs are preserved, which ensures that the directions of the gradient-based updates are independent. The part of the hyperbolic tangent function corresponding to the range [−0.7616, +0.7616] can be assumed to be quasi-linear (see Figure 5). The quasi-linearity used in preprocessing can be considered fine-tuning before the training step. It is important to note that, because the instances are limited to the range [−0.7616, +0.7616], the product of these values and the maximum values of the weights cannot reach ±1, which speeds up any gradient-based algorithm. Saturation of the FNN at the beginning of the training process can also be avoided in this way.
Figure 5. Quasi-linear part of hyperbolic tangent function.
The Min-Max normalization is given by Formula (1):
x(k)_normalized = (x(k) − (x_max + x_min)/2) / ((x_max − x_min)/2),
where x(k)_normalized ∈ [−1, 1] is the normalized instance, and x_max and x_min are the maximum and minimum values of the feature, respectively. The hyperbolic tangent normalization (NTH) scales a feature as follows (2):
x(k)_NTH = tanh(x(k)_normalized),
where x(k)NTH represents the NTH-normalized instance.
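Formulas (1) and (2) can be sketched in a few lines. The feature values below are illustrative and the function names are ours, not from the paper:

```python
import math

def min_max_to_signed_unit(xs):
    """Min-Max normalization to [-1, 1], as in Formula (1):
    x_norm = (x - (x_max + x_min)/2) / ((x_max - x_min)/2)."""
    x_min, x_max = min(xs), max(xs)
    mid, half = (x_max + x_min) / 2.0, (x_max - x_min) / 2.0
    return [(x - mid) / half for x in xs]

def nth(xs):
    """NTH normalization (Formula (2)): tanh of the [-1, 1]-normalized
    value, which confines every instance to
    [-tanh(1), tanh(1)] ~ [-0.7616, +0.7616]."""
    return [math.tanh(x) for x in min_max_to_signed_unit(xs)]

feature = [3.0, 10.0, 250.0, 4096.0]   # illustrative raw feature values
scaled = nth(feature)
# Every NTH-scaled value stays strictly inside the quasi-linear range.
assert all(abs(v) <= math.tanh(1.0) for v in scaled)
```

The minimum and maximum of the raw feature map exactly to −tanh(1) and +tanh(1), so no instance can push a product with a weight in [−1, +1] to ±1.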
The main characteristic of NTH normalization is that the product of an instance and a weight chosen from the range −1 to 1 can never reach the values ±1. This characteristic further speeds up the training of the classifiers. This is due to the properties of the non-linear Levenberg–Marquardt algorithm, which combines the GD algorithm and the Gauss–Newton (GN) algorithm. The GD algorithm minimizes the objective function through an iterative process based on the step length and a search direction determined by the negative of the gradient. The GN algorithm uses the Jacobian matrix (J) to simplify the calculation of the Hessian matrix (H), because the error function is assumed to be approximately quadratic close to the optimal solution [109]. The idea of the LM algorithm is to approximate a nonlinear function f: R → R close to the current point m [110], so that the approximation f̂(x) ≈ f(x) holds for x ≈ m. Consider the iterates x(1), x(2), …, x(l). The iterate x(i+1) represents the solution of the problem when x(i+1) ≈ x(i). It can be derived as in (3).
x(i+1) = x(i) − (H + β(i)I)^(−1) J^T f(x(i)) = x(i) − (J^T J + β(i)I)^(−1) J^T f(x(i)),   β(i) > 0,
where the parameter β(i) is an adjustable variable called the damping factor, I is the identity matrix, and the Hessian matrix is approximated as H ≈ J^T J. The parameter β(i) is changed as follows: if β(i) is small, then x(i+1) is far from x(i) and the quadratic approximation may be poor, in which case β(i) should be increased (typically β(i+1) = 2β(i)); otherwise, x(i+1) is too close to x(i), which slows progress, and β(i) should be decreased (typically β(i+1) = 0.5β(i)). Additionally, when β(i) → ∞, then H + β(i)I ≈ β(i)I, and the LM algorithm behaves like the GD algorithm. When β(i) → 0, the iterate x(i) is close to the optimal solution, and the LM algorithm behaves like the GN algorithm [61]. The switch between the GD and GN algorithms is called the damping strategy [111]. The LM approximation and the damping strategy are described in more detail in our previous work [105,112]. The damping strategy is used here to speed up processing and prevent problems caused by very large or very small gradients. The main goal of the NTH normalization and damping strategy approach is not only to speed up model evaluation but also to accelerate weight adjustment and avoid zigzag movements.
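A minimal scalar sketch of the update (3) with the damping strategy follows. It solves a toy one-dimensional root-finding problem rather than FNN training, so J and H collapse to scalars; the acceptance test and the factors 0.5/2 mirror the damping rule described above:

```python
def lm_scalar(f, jac, x, beta=1.0, iters=50):
    """Toy scalar Levenberg-Marquardt iteration:
    x_next = x - J * f(x) / (J^2 + beta), the scalar form of
    x(i+1) = x(i) - (J^T J + beta*I)^(-1) J^T f(x(i)).
    A step that lowers |f| is accepted and beta is halved (toward GN);
    otherwise the step is rejected and beta is doubled (toward GD)."""
    for _ in range(iters):
        J = jac(x)
        trial = x - J * f(x) / (J * J + beta)
        if abs(f(trial)) < abs(f(x)):
            x, beta = trial, 0.5 * beta   # good quadratic approximation
        else:
            beta = 2.0 * beta             # poor approximation: damp harder
    return x

# Solve f(x) = x^2 - 2 = 0 from x = 3, i.e., find sqrt(2) ~ 1.41421.
root = lm_scalar(lambda x: x * x - 2.0, lambda x: 2.0 * x, x=3.0)
print(root)
```

As the iterate approaches the solution, every step is accepted, β shrinks toward 0, and the iteration behaves like Gauss–Newton, which is exactly the switch the damping strategy is meant to produce.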

4. Experimental Results

To train, test, and compare the models in terms of feature selection and feature scaling, the MATLAB Classification Learner is used. About 60 thousand instances from the Kyoto 2006+ dataset are used as a benchmark for the experiments. The fundamentals of the experiments are fully explained in our previous work [112]. In brief, the characteristics of the classifiers are as follows: (1) the maximum number of splits in DT classification is 20; (2) for the k-NN and wk-NN models, k = 10, the distance measure is Euclidean, and the weights are random numbers limited to the range [−1, +1]; (3) a Gaussian SVM with the One-vs-One multiclass method is applied, and data standardization is used to differentiate true and false; (4) an FNN with one hidden layer is used (nine inputs, nine hidden-layer neurons, and one output neuron, all with tanh activation functions); the BP algorithm is based on mean squared errors (maximum number of iterations = 1000, gradient magnitude < 10^−5, validation checks = 6). The Kyoto 2006+ dataset is first preprocessed to remove Not-a-Number (NaN) values from the features, after which ~57 thousand instances are used to train and test the models. Second, all irrelevant features (categorical, statistical, and features intended for additional analyses) are eliminated, leaving nine numerical features for model evaluation. Anomalies are determined via the feature Label: if Label = 1, the network traffic is considered normal; otherwise (Label = 0), an anomaly is detected. Both Min-Max normalization and NTH normalization were used to classify the data, with similar results. The classifier comparison is carried out using the binary confusion matrix as a tool. The confusion matrix contains True Positives (TP), False Positives (FP), True Negatives (TN), and False Negatives (FN) [113]. Binary zero describes the negatives, while binary one describes the positives.
These numbers are frequently used to describe classification performance on datasets where the true values are known [114]. The metrics calculated from the confusion matrix are listed in Table 5.
Table 5. Metrics based on confusion matrix.
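The standard Table 5 metrics can be computed directly from the four confusion-matrix cells. The counts below are illustrative only and do not reproduce the paper's results:

```python
def binary_metrics(tp, fp, tn, fn):
    """Accuracy, precision, recall, and F1-score computed from the four
    cells of a binary confusion matrix (positives = 1, negatives = 0)."""
    accuracy  = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall    = tp / (tp + fn)
    f1        = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Illustrative counts only, not the experimental results of this paper.
m = binary_metrics(tp=950, fp=30, tn=900, fn=20)
print(m["accuracy"], m["f1"])
```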
The findings demonstrate that all models, with the exception of the DT classifier, make very accurate decisions. The DT model is one of the most efficient classifiers, but it is almost unaffected by feature scaling methodologies. Both kinds of nearest neighbor models take much longer to process than the other models. Min-Max scaling prevents problems with very high or very low derivatives; it did not, however, have an impact on F1-score or accuracy. The results show highly accurate models, but the processing-time problem remained. In the experiments based on NTH normalization, the processing time of all classifiers was significantly reduced. The results support the assumption that it is reasonable to use NTH normalization to speed up training. All models have a very high F1-score and accuracy (the SVM is the exception). Table 6 displays the results.
Table 6. Accuracy, processing time, and F1-score of the classifiers.
Min-Max normalization helps solve the problem of extremely small and extremely large derivatives that make training difficult. Except for the FNN, this normalization improves the models in terms of processing time and F1-score. The results suggest that the issue of differing scales has no effect on model evaluation. The range of the activation functions of the input nodes is the key reason why this is not true for the FNN. If the number of inputs is greater than 3, the activation functions in the hidden layer can become essentially saturated when the weight updates are all positive or all negative. Also, because of its very sharp gradient, the tanh activation function prevents zigzagging. When the range [−0.7616, +0.7616] is chosen, the quasi-linear part of the tanh accelerates training. However, this method slightly reduces the accuracy of all models.

6. Conclusions

Classifying network behavior as normal or abnormal can be achieved relatively easily by identifying anomalies. The Kyoto 2006+ dataset, which includes approximately 90 million instances of real network data, is used in this study to evaluate five binary classifiers for anomaly detection. The dataset size is reduced via feature selection techniques used as a preprocessing stage in the classification process. Furthermore, the features are scaled with the hyperbolic tangent function. To speed up training, the damping strategy is used. According to the results, the processing time is significantly shorter compared to Min-Max normalization. Future research will use the findings of this study to identify a new search space to improve classification models. In addition, both feature selection and feature scaling will be explored further. Hybrid classifiers, which are expected to be more accurate with faster processing, will also be the subject of further research.

Author Contributions

Conceptualization, D.P.; methodology, D.P.; software, D.P.; validation, M.S. (Miomir Stanković), R.P. and I.V.; formal analysis, D.P. and M.S. (Miomir Stanković); investigation, D.P.; resources, M.S. (Miomir Stanković), G.M.S. and M.S. (Mitar Simić); data curation, D.P.; writing—original draft preparation, D.P.; writing—review and editing, M.S. (Miomir Stanković), R.P., I.V., G.M.S., G.O. and S.S.; visualization, D.P.; supervision, M.S. (Miomir Stanković), G.M.S., M.S. (Mitar Simić), G.O. and S.S.; project administration, G.M.S.; funding acquisition, M.S. (Miomir Stanković), G.M.S. and M.S. (Mitar Simić). All authors have read and agreed to the published version of the manuscript.

Funding

This work has partly received funding from the Horizon Europe Framework Programme of the European Commission, under grant agreement No. 101086348.

Data Availability Statement

New data were not created.

Acknowledgments

This research was partly supported by the GaitREHub project, which has received funding from the European Union’s Horizon research and innovation programme.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Syrris, V.; Geneiatakis, D. On machine learning effectiveness for malware detection in Android OS. J. Inf. Secur. Appl. 2021, 59, 102794. [Google Scholar] [CrossRef]
  2. Viegas, E.K.; Santin, A.O.; Oliviera, L.S. Toward a reliable anomaly-based intrusion detection in real-world environments. Comput. Netw. 2017, 127, 200–216. [Google Scholar] [CrossRef]
  3. Aljawarneh, S.; Aldwairi, M.; Yassein, M.B. Anomaly-based intrusion detection system through feature selection analysis and building hybrid efficient model. J. Comput. Sci. 2018, 25, 152–160. [Google Scholar] [CrossRef]
  4. Ullah, I.; Mahmoud, Q.M. A filter-based feature selection model for anomaly-based intrusion detection systems. In Proceedings of the 2017 IEEE International Conference on Big Data, Boston, MA, USA, 11–14 December 2017; pp. 2151–2159. [Google Scholar] [CrossRef]
  5. Vengatesan, K.; Kumar, A.; Naik, R.; Verma, D.K. Anomaly Based Novel Intrusion Detection System for Network Traffic Reduction. In Proceedings of the 2018 2nd International Conference on I-SMAC (IoT in Social, Mobile, Analytics and Cloud) (I-SMAC), Palladam, India, 30–31 August 2018; pp. 688–690. [Google Scholar] [CrossRef]
  6. Aloraini, F.; Javed, A.; Rana, O.; Burnap, P. Adversarial machine learning in IoT from an insider point of view. J. Inf. Secur. Appl. 2022, 70, 103341. [Google Scholar] [CrossRef]
  7. Pai, V.; Devidas, B.; Adesh, N.D. Comparative analysis of machine learning algorithms for intrusion detection. IOP Conf. Ser. Mater. Sci. Eng. 2021, 1013, 012038. [Google Scholar] [CrossRef]
  8. Protic, D.; Stankovic, M. Detection of Anomalies in the Computer Network Behaviour. Eur. J. Eng. Form. Sci. 2020, 4, 7–13. [Google Scholar] [CrossRef]
  9. Saranya, T.; Sridevi, S.; Deisy, C.; Chung, T.D.; Khan, M.K.A. Performance analysis of Machine Learning Algorithms in Intrusion Detection Systems: A review. Procedia Comput. Sci. 2020, 171, 1251–1260. [Google Scholar] [CrossRef]
  10. Protic, D.; Stankovic, M. A Hybrid Model for Anomaly-Based Intrusion Detection in Complex Computer Networks. In Proceedings of the 2020 21st International Arab Conference on Information Technology (ACIT), Giza, Egypt, 28–30 November 2020; pp. 1–8. [Google Scholar] [CrossRef]
  11. Protic, D.; Stankovic, M. Anomaly-Based Intrusion Detection: Feature Selection and Normalization Instance to the Machine Learning Model Accuracy. Eur. J. Eng. Form. Sci. 2018, 1, 43–48. [Google Scholar] [CrossRef][Green Version]
  12. Siddiqui, M.A.; Pak, W. Optimizing Filter-Based Feature Selection Method Flow for Intrusion Detection System. Electronics 2020, 9, 2114. [Google Scholar] [CrossRef]
  13. Abiodun, O.I.; Jantan, A.; Omolara, A.E.; Dada, K.V.; Mohamed, N.A.; Arshad, H. State-of-the-art in artificial neural network applications: A survey. Heliyon 2018, 4, e00938. [Google Scholar] [CrossRef] [PubMed]
  14. Ganapathy, S.; Kulothungan, K.; Muthurajkumar, S.; Vijayalakshimi, M.; Yogesh, P.; Kannan, A. Intelligent feature selection and classification techniques for intrusion detection in networks: A survey. EURASIP J. Wirel. Commun. Netw. 2013, 2013, 271. [Google Scholar] [CrossRef]
  15. Gautam, R.K.S.; Doegar, E.A. An ensemble approach for intrusion detection system using machine learning algorithms. In Proceedings of the 2018 8th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 11–12 January 2018; pp. 14–15. [Google Scholar]
  16. Khonde, R.S.; Ulagamuthalvi, V. Ensemble-based semi-supervised learning approach for a distributed intrusion detection system. J. Cyber Secur. Technol. 2019, 3, 163–188. [Google Scholar] [CrossRef]
  17. Ambusaidi, M.A.; He, X.; Nanda, P.; Tan, Y. Building an Intrusion Detection System Using a Filter-Based Feature Selection Algorithm. IEEE Trans. Comput. 2016, 65, 2986–2998. [Google Scholar] [CrossRef]
  18. Najafabadi, M.M.; Khoshgoftaar, T.; Selyia, N. Evaluating Feature Selection Methods for Network Intrusion Detection with Kyoto Data. Int. J. Reliab. Qual. Saf. Eng. 2016, 23, 1650001. [Google Scholar] [CrossRef]
  19. Protic, D. Review of KDD CUP ‘99, NSL-KDD and Kyoto 2006+ Datasets. Vojnoteh. Glas. Mil. Tech. Cour. 2018, 66, 580–595. [Google Scholar] [CrossRef]
  20. Bohara, B.; Bhuyan, J.; Wu, F.; Ding, J. A Survey on the Use of Data Clustering for Intrusion Detection System in Cybersecurity. Int. J. Netw. Secur. Appl. 2020, 12, 1–18. [Google Scholar] [CrossRef]
  21. Thakkar, A.; Lohiya, R. A Review of the Advancement in the Intrusion Detection Datasets. Procedia Comput. Sci. 2020, 167, 636–645. [Google Scholar] [CrossRef]
  22. Khraisat, A.; Gondal, L.; Vamplew, P.; Kamruzzaman, J. Survey of intrusion detection systems: Techniques, datasets and challenges. Cybersecurity 2019, 2, 20. [Google Scholar] [CrossRef]
  23. Ferryian, A.; Thamrin, A.H.; Takeda, K.; Murai, J. Generating Network Intrusion Detection Dataset Based on Real and Encrypted Synthetic Attack Traffic. Appl. Sci. 2021, 11, 7868. [Google Scholar] [CrossRef]
  24. Serkani, E.; Gharaee, H.; Mohammadzadeh, N. Anomaly detection using SVM as classifier and DT for optimizing feature vectors. ISeCure 2019, 11, 159–171. [Google Scholar]
  25. Mighan, S.N.; Kahani, M.S. A novel scalable intrusion detection system based on deep learning. Int. J. Inf. Secur. 2021, 20, 387–403. [Google Scholar] [CrossRef]
  26. Soltani, M.; Siavoshani, M.J.; Jahangir, A.H. A content-based deep intrusion detection system. Int. J. Inf. Secur. 2021, 21, 547–562. [Google Scholar] [CrossRef]
  27. Suman, C.; Tripathy, S.; Saha, S. Building an Effective Intrusion Detection Systems using Unsupervised Feature Selection in Multi-objective Optimization Framework. arXiv 2019, arXiv:1905.06562. [Google Scholar]
  28. Ruggieri, S. Complete Search for Feature Selection Decision Trees. J. Mach. Learn. Res. 2019, 20, 1–34. [Google Scholar]
  29. Shmiilovici, A. Support vector machines. In Data Mining and Knowledge Discovery Handbook; Springer: Boston, MA, USA, 2005. [Google Scholar] [CrossRef]
  30. Deris, A.M.; Zain, A.M.; Sallehudin, R. Overview of Support Vector Machine in Modeling Machining Performances. Procedia Eng. 2011, 24, 301–312. [Google Scholar] [CrossRef]
  31. Halimaa, A.; Sundarakantham, K. Machine Learning Based Intrusion Detection System. In Proceedings of the 2019 3rd International Conference on Trends in Electronics and Informatics (ICOEI), Tirunelveli, India, 23–25 April 2019; pp. 916–920. [Google Scholar] [CrossRef]
  32. Bhati, B.S.; Rai, C.S. Analysis of Support Vector Machine-based Intrusion Detection Techniques. Arab. J. Sci. Eng. 2020, 45, 2371–2383. [Google Scholar] [CrossRef]
  33. Nawi, N.M.; Atomi, W.H.; Rehman, M.Z. The Effect of Data Pre-processing on Optimizing Training on Artificial Neural Network. Procedia Technol. 2013, 11, 23–39. [Google Scholar] [CrossRef]
  34. Jie, C.; Jiawei, L.; Shulin, W.; Sheng, Y. Feature Selection in Machine Learning: A New Perspective. Neurocomputing 2018, 300, 70–79. [Google Scholar] [CrossRef]
  35. Ullah, F.; Babar, M.A. Architectural Tactics for Big Data Cybersecurity Analysis Systems: A Review. J. Syst. Softw. 2019, 15, 81–118. [Google Scholar] [CrossRef]
  36. Luque, A.; Carrasco, A.; Martin, A.; de las Heras, A. The impact of class imbalance in classification performance metrics based on the binary confusion matrix. Pattern Recognit. 2019, 91, 216–239. [Google Scholar] [CrossRef]
  37. Xie, M.; Hu, J.; Chang, E. Evaluating Host-Based Anomaly Detection Systems: Application of the Frequency-Based Algorithms to ADFA-LD. In Network and System Security. NSS 2015. Lecture Notes in Computer Science 8792; Au, M.H., Carminati, B., Kuo, J., Eds.; Springer: Cham, Switzerland, 2014. [Google Scholar] [CrossRef]
  38. Borisniya, B.; Patel, D.R. Evaluation of Modified Vector Space Representation Using ADFA-LD and ADFA-WD Datasets. J. Inf. Secur. 2015, 6, 250. [Google Scholar] [CrossRef]
  39. Vijayakumar, S.; Sannasi, G. Machine Learning Approach to Combat False Alarms in Wireless Intrusion Detection System. Comput. Inf. Sci. 2018, 11, 67. [Google Scholar] [CrossRef][Green Version]
  40. Proebstel, E.P. Characterizing and Improving Distributed Network-Based Intrusion Detection Systems (NIDS): Timestamp Synchronization and Sampled Traffic. Master’s Thesis, University of California DAVIS, Davis, CA, USA, 2008. [Google Scholar]
  41. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization. In Proceedings of the 4th International Conference on Information Systems Security and Privacy, Madeira, Portugal, 22–24 January 2018; pp. 108–116. [Google Scholar]
  42. Levy, J.L.; Khoshgoftaar, T.M. A survey and analysis of intrusion detection models based on CSE-CIC-IDS2018 Big Data. J. Big Data 2020, 7, 104. [Google Scholar] [CrossRef]
  43. Lippmann, R.P.; Cunningham, R.K.; Fried, D.J.; Graf, I.; Kendal, K.R.; Webster, S.E.; Zissman, M.A. Results of DARPA 1998 Offline Intrusion Detection Evaluation. In Proceedings of the DARPA Information Survivability Conference and Exposition (DISCEX), Hilton Head, SC, USA, 25–27 January 2000; IEEE Computer Society Press: Los Alamitos, CA, USA, 2000. [Google Scholar]
  44. Tavallaee, M.; Bagheri, E.; Lu, W.; Ghorbani, A. A Detailed Analysis of the KDD Cup ’99 dataset. In Proceedings of the 2009 IEEE Symposium of Computational Intelligence for Security and Defense Applications, Ottawa, ON, Canada, 8–10 July 2009; pp. 1–6. [Google Scholar] [CrossRef]
  45. Moustafa, N.; Slay, J. UNSW-NB15: A comprehensive data set for network intrusion detection systems (UNSW-NB15 network data set). In Proceedings of the 2015 Military Communications and Information Systems Conference, Canberra, ACT, Australia, 10–12 November 2015; pp. 1–6. [Google Scholar] [CrossRef]
  46. Song, J.; Takakura, H.; Okabe, Y.; Eto, M.; Inoue, D.; Nakao, K. Statistical Analysis of Honeypot Data and Building of Kyoto 2006+ Dataset for NIDS Evaluation. In Proceedings of the First Workshop on Building Analysis Datasets and Gathering Experience Returns for Security, Salzburg, Austria, 10–13 April 2011; pp. 29–36. [Google Scholar] [CrossRef]
  47. Singh, R.; Kumar, H.; Singla, R.K. An intrusion detection system using network traffic profiling and online sequential extreme learning machine. Expert Syst. Appl. 2015, 42, 8609–8624. [Google Scholar] [CrossRef]
  48. SIGKDD-KDD Cup. KDD Cup 1999: Computer Network Intrusion Detection. 1999. Available online: https://www.kdd.org/kdd-cup/view/kdd-cup-1999/Tasks (accessed on 20 May 2016).
  49. Park, K.; Song, Y.; Cheong, Y. Classification of Attack Types for Intrusion Detection Systems Using a Machine Learning Algorithm. In Proceedings of the 2018 IEEE 4th International Conference on Big Data Computing Service and Applications, Bamberg, Germany, 26–29 March 2018; pp. 282–286. [Google Scholar] [CrossRef]
  50. Ting, K.M. Confusion matrix. In Encyclopedia of Machine Learning; Sammut, C.J., Webb, G.I., Eds.; Springer: Boston, MA, USA, 2011. [Google Scholar] [CrossRef]
  51. Ring, M.; Wunderlich, S.; Scheuring, D.; Landes, D.; Hotho, A. A Survey of Network-based Intrusion Detection Data Sets. arXiv 2019, arXiv:1903.02460. [Google Scholar] [CrossRef]
  52. Demertzis, K. The Bro Intrusion Detection System. Machine Learning to Cyber Security. 2018. Available online: https://www.researchgate.net/publication/329217161_The_Bro_Intrusion_Detection_System (accessed on 27 May 2020).
  53. McCarthy, R. Network Analysis with the Bro Security Monitor. 2014. Available online: https://www.admin-magazine.com/Archive/2014/24/Network-analysis-with-the-Bro-Network-Security-Monitor (accessed on 14 February 2019).
  54. Papamartzivanos, D.; Marmol, F.G.; Kamblourakis, K. Introducing deep learning self-adaptive misuse network intrusion detection system. IEEE Access 2019, 7, 13546–13560. [Google Scholar] [CrossRef]
  55. Sahu, A.; Mao, Y.; Davis, K.; Goulart, A.E. Data Processing and Model Selection for Machine Learning-based Network Intrusion-Detection. In Proceedings of the 2020 IEEE International Workshop Technical Committee on Communications Quality and Reliability (CQR), Stevenson, WA, USA, 14 May 2020; pp. 1–6. [Google Scholar]
  56. Weston, J.; Elisseff, A.; Schoelkopf, B.; Tipping, M. Use of the zero norm with linear models and kernel methods. J. Mach. Learn. Res. 2003, 3, 1439–1461. [Google Scholar]
  57. Song, L.; Smola, A.; Gretton, A.; Borgwardt, K.; Bedo, J. Supervised feature selection via dependence estimation. In Proceedings of the 24th International Conference on Machine Learning, Corvallis, OR, USA, 20–24 June 2007; pp. 823–830. [Google Scholar] [CrossRef]
  58. Dy, J.G.; Brodley, C.E. Feature selection for unsupervised learning. J. Mach. Learn. Res. 2005, 5, 845–889. [Google Scholar]
  59. Mitra, P.; Murthy, C.A.; Pal, S. Unsupervised feature selection using feature similarity. IEEE Trans. Pattern Anal. Mach. Intell. 2002, 24, 301–312. [Google Scholar] [CrossRef]
  60. Zhao, Z.; Liu, H. Semi-supervised feature selection via spectral analysis. In Proceedings of the SIAM International Conference on Data Mining, Minneapolis, MN, USA, 26–28 April 2007; pp. 641–646. [Google Scholar] [CrossRef]
  61. Xu, Z.; Jin, R.; Ye, J.; Lyu, M.; King, I. Discriminative semi-supervised feature selection via manifold regularization. IEEE Trans. Neural Netw. 2010, 21, 1033–1047. [Google Scholar]
  62. Porkodi, R. Comparison on filter-based feature selection algorithms: An overview. Int. J. Innov. Res. Technol. Sci. 2014, 2, 108–113. [Google Scholar]
  63. Artur, A. Review the performance of the Bernoulli Naïve Bayes Classifier in Intrusion Detection Systems using Recursive Feature Elimination with Cross-validated Selection of the Best Number of Features. Procedia Comput. Sci. 2021, 190, 564–570. [Google Scholar] [CrossRef]
  64. Osnaiye, O.; Ogundile, O.; Aina, F.; Peniola, A. Feature Selection for Intrusion Detection System in cluster-based heterogeneous wireless sensor networks. Facta Univ. Ser. Electron. Energetics 2019, 32, 315–330. [Google Scholar] [CrossRef]
  65. Aamir, A.; Zaidi, S.M.A. DDoS Attack detection with feature engineering and machine learning, the framework and performance evaluation. Int. J. Inf. Secur. 2019, 18, 761–785. [Google Scholar] [CrossRef]
  66. Khammassi, C.; Krichen, S. A GA-LR wrapper approach for feature selection in network intrusion detection. Comput. Secur. 2017, 70, 255–277. [Google Scholar] [CrossRef]
  67. Umar, M.A.; Zhanfang, C.; Liu, Y. Network Intrusion Detection Using Wrapper-based Decision Tree for Feature Selection. In Proceedings of the 2020 International Conference on Internet Computing for Science and Engineering, Male, Maldives, 14–16 January 2020; pp. 5–13. [Google Scholar] [CrossRef]
  68. Venkateswaran, N.; Umadevi, K. Hybridized Wrapper Filter Using Deep Neural Network for Intrusion Detection. Comput. Syst. Sci. Eng. 2022, 42, 1–14. [Google Scholar] [CrossRef]
  69. Thakkar, A.; Lohiya, R. A survey on intrusion detection system: Feature selection, model, performance measures, application perspective, challenges, and future research directions. Artif. Intell. Rev. 2021, 55, 453–563. [Google Scholar] [CrossRef]
  70. Almomani, O. A Feature Selection Model for Network Intrusion Detection System Based on PSO, GWO, FFA and GA Algorithms. Symmetry 2020, 12, 1046. [Google Scholar] [CrossRef]
  71. Choudhury, A. What are Feature Selection Techniques in Machine Learning. Available online: https://analyticsindiamag.com/what-are-feature-selection-techniques-in-machine-learning/ (accessed on 21 May 2022).
  72. Biswas, S.K.; Bordoloi, M.; Purkayastha, B. Review of Feature Selection and Classification using Neuro-Fuzzy Approaches. Int. J. Appl. Evol. Comput. 2016, 7, 28–44. [Google Scholar] [CrossRef]
  73. Rosely, N.F.L.M.; Sallehuddin, R.; Zain, A.M. Overview Feature Selection Algorithm. J. Phys. Conf. Ser. 2019, 1192, 012068. [Google Scholar] [CrossRef]
  74. Musheer, R.A.; Verma, C.K.; Srivastava, N. Dimension reduction methods for microarray data: A review. AIMS Bioeng. 2017, 4, 179–197. [Google Scholar] [CrossRef]
  75. Ahmed, I.; Shin, H.; Hong, M. Fast Content-Based File Type Identification. In Advances in Digital Forensics VII. Digital Forensics 2011; Springer: Berlin/Heidelberg, Germany, 2011. [Google Scholar] [CrossRef]
  76. Maza, S.; Touahria, M. Feature Selection Algorithms in Intrusion Detection System: A Survey. KSII Trans. Internet Inf. Syst. 2018, 12, 5079–5099. [Google Scholar] [CrossRef]
  77. Zhao, F.; Zhao, J.; Niu, X.; Luo, S.; Xin, Y. A Filter Feature Selection Algorithm Based on Mutual Information for Intrusion Detection. Appl. Sci. 2018, 8, 1535. [Google Scholar] [CrossRef]
  78. Seok, J.; Kang, Y.S. Mutual information between discrete variables with many categories using recursive adaptive partitioning. Sci. Rep. 2015, 5, 10981. [Google Scholar] [CrossRef] [PubMed]
  79. Battiti, R. Using mutual information for selecting features in supervised neural net learning. IEEE Trans. Neural Netw. 1994, 5, 537–550. [Google Scholar] [CrossRef] [PubMed]
  80. Peng, H.; Long, F.; Ding, C. Feature Selection based on Mutual Information: Criteria of max-dependency, max-relevance, and min-redundancy. IEEE Trans. Pattern Anal. Mach. Intell. 2005, 27, 1226–1238. [Google Scholar] [CrossRef]
  81. Kwak, N.; Choi, C.H. Input feature selection for classification problems. IEEE Trans. Neural Netw. 2002, 13, 143–159. [Google Scholar] [CrossRef]
  82. Novovicova, J.; Somol, P.; Haindl, M.; Pudil, P. Conditional Mutual Information Based Feature Selection for Classification Tasks; Springer: Berlin/Heidelberg, Germany, 2007. [Google Scholar]
  83. Amiri, F.; Yousefi, M.M.R.; Lucas, C.; Shakery, A.; Yazdani, N. Mutual information-based feature selection for intrusion detection systems. J. Netw. Comput. Appl. 2011, 34, 1184–1195. [Google Scholar]
  84. Bindumadhavi, P.; Sandhya Rani, B. Building an intrusion detection system using a filter-based feature selection algorithm. Int. J. Innov. Res. Stud. 2017, 7, 24–35. [Google Scholar]
  85. Alanazi, R.; Aljuhani, A. Anomaly Detection for Industrial Internet of Things Cyberattacks. Comput. Syst. Sci. Eng. 2022, 44, 2361–2378. [Google Scholar] [CrossRef]
  86. Mitchell, T. Machine Learning; McGraw Hill: New York, NY, USA, 1997. [Google Scholar]
  87. Schober, P.; Boer, C.; Schwarte, L.A. Correlation coefficients: Appropriate use and interpretation. Anesth. Analg. 2018, 126, 1763–1768. [Google Scholar] [CrossRef]
  88. Kalavadekar, P.N.; Sane, S.S. Building an Effective Intrusion Detection System using combined Signature and Anomaly Detection Techniques. Int. J. Innov. Technol. Explor. Eng. 2019, 8, 429–435. [Google Scholar] [CrossRef]
  89. Perez, D.; Alonso, S.; Moran, A.; Prada, M.A.; Fuertes, J.J.; Dominguez, M. Comparison of Network Intrusion Detection Performance Using Feature Representation. In Engineering Applications of Neural Networks. EANN 2019. Communications in Computer and Information Science (1000); Macintyre, J., Iliadis, L., Maglogiannis, I., Jayne, C., Eds.; Springer: Cham, Switzerland, 2019. [Google Scholar] [CrossRef]
  90. Salo, F. Towards Efficient Intrusion Detection Using Hybrid Data Mining Techniques. Ph.D. Thesis, Western University, London, ON, Canada, 2019. [Google Scholar]
  91. Ahmad, T.; Aziz, M.N. Data preprocessing and feature selection for machine learning intrusion detection systems. ICIC Int. Express Lett. 2019, 13, 93–101. [Google Scholar]
  92. Patro, S.G.K.; Sahu, K.K. Normalization: A Preprocessing Stage. 2015. Available online: https://arxiv.org/ftp/arxiv/papers/1503/1503.06462.pdf (accessed on 19 September 2022).
  93. Panda, S.K.; Jana, P.K. An Efficient Task Scheduling Algorithm for Heterogeneous Multi-cloud Environment. In Proceedings of the 3rd International Conference on Advances in Computing, Communications & Informatics (ICACCI), Delhi, India, 24–27 September 2014; pp. 1204–1209. [Google Scholar]
  94. Deb, K.; Pratap, A.; Agarwal, S.; Meyarivan, T. A fast and elitist multiobjective genetic algorithm: NSGA-II. IEEE Trans. Evol. Comput. 2002, 6, 182–197. [Google Scholar] [CrossRef]
  95. Mienye, I.D.; Sun, Y.; Wang, Z. Prediction performance of improved decision tree-based algorithms: A review. Procedia Manuf. 2019, 35, 698–703. [Google Scholar] [CrossRef]
  96. Singh, S.; Banerjee, S. Machine Learning Mechanisms for Network Anomaly Detection System: A Review. In Proceedings of the International Conference on Communication and Signal Processing (ICCSP), Chennai, India, 28–30 July 2020; pp. 0976–0980. [Google Scholar] [CrossRef]
  97. Mohammadi, S.; Mirvaziri, H.; Ghazizadeh-Ahsaee, M.; Karimipour, H. Cyber intrusion detection by the combined feature selection algorithm. J. Inf. Secur. Appl. 2019, 44, 80–88. [Google Scholar] [CrossRef]
  98. Keserwani, P.K.; Govil, M.C.; Pilli, E.S. An effective NIDS framework based on a comprehensive survey of feature optimization and classification techniques. Neural Comput. Appl. 2023, 35, 4993–5013. [Google Scholar] [CrossRef]
  99. Mishra, A.; Cheng, A.; Zhang, Y. Intrusion Detection Using Principal Component Analysis and Support Vector Machines. In Proceedings of the 2020 IEEE 16th International Conference on Control and Automation (ICCA), Singapore, 9–11 October 2020; pp. 907–912. [Google Scholar]
  100. Vallejo-Huanga, D.; Ambuludi, M.; Morillo, P. Empirical Exploration of Machine Learning Techniques for Detection of Anomalies Based on NIDS. IEEE Lat. Am. Trans. 2021, 19, 772–779. [Google Scholar] [CrossRef]
  101. Seelammal, C.; Devi, K.V. Multi-criteria decision support for feature selection in network anomaly detection system. Int. J. Data Anal. Tech. Strateg. 2018, 10, 334–350. [Google Scholar] [CrossRef]
  102. Wang, S.; Cai, C.X.; Tseng, Y.W.; Lin, K.S.M. Feature Selection for Malicious Traffic Detection with Machine Learning. In Proceedings of the 2020 International Computer Symposium (ICS), Tainan, Taiwan, 17–19 December 2020; pp. 414–419. [Google Scholar] [CrossRef]
  103. Idhammad, M.; Afdel, K.; Belouch, M. Semi-supervised machine learning approach for DDoS detection. Appl. Intell. 2018, 48, 3193–3208. [Google Scholar] [CrossRef]
  104. Hamid, Y.; Balasaraswathi, V.R.; Journaux, L.; Sugumaran, M. Benchmark Dataset for Network Intrusion Detection: A review. Int. J. Netw. Secur. 2018, 20, 645–654. [Google Scholar]
  105. Protic, D.; Stankovic, M.; Antic, V. WK-FNN design for detection of anomalies in the computer network traffic. Facta Univ. Ser. Electron. Energetics 2022, 35, 269–282. [Google Scholar] [CrossRef]
  106. Thakkar, A.; Lohiya, R. Attack classification using feature selection techniques: A comparative study. J. Ambient. Intell. Humaniz. Comput. 2020, 12, 1249–1266. [Google Scholar] [CrossRef]
  107. Rahman, A.; Islam, Z. AWST: A novel attribute weight selection technique for data clustering. In Proceedings of the 13th Australasian Data Mining Conference, Sydney, Australia, 8–9 August 2015; pp. 51–58. [Google Scholar]
  108. Rahman, M.A.; Islam, M. CRUDAW: A novel fuzzy technique for clustering records following user-defined attribute weights. In Proceedings of the 10th Australasian Data Mining Conference, Sydney, Australia, 5–7 December 2012; Volume 134, pp. 27–42. [Google Scholar]
  109. Torres, J.M.; Comesana, C.I.; Garcia-Nieto, P. Review: Machine learning techniques applied to cybersecurity. Int. J. Mach. Learn. Cybern. 2019, 10, 2823–2836. [Google Scholar] [CrossRef]
  110. Wang, Z.; Cai, L.; Su, Y.; Peng, Z. An inexact affine scaling Levenberg–Marquardt method under local error bound conditions. Acta Math. Appl. Sin. Engl. 2019, 35, 830–844. [Google Scholar] [CrossRef]
  111. Xue, X.; Zhang, K.; Tan, K.; Feng, L.; Wang, L.; Chen, G.; Zhao, X.; Zhang, L.; Yao, J. Affine transformation-enhanced multifactorial optimization for heterogeneous problems. IEEE Trans. Cybern. 2020, 52, 6217–6231. [Google Scholar] [CrossRef] [PubMed]
  112. Protic, D.; Gaur, L.; Stankovic, M.; Rahman, M.A. Cybersecurity in smart cities: Detection of opposing decisions of anomalies in the computer network behavior. Electronics 2022, 11, 3718. [Google Scholar] [CrossRef]
  113. Singh, P.; Singh, N.; Kant Singh, K.; Singh, A. Chapter 5—Diagnosing of disease using machine learning. In Machine Learning and the Internet of Medical Things in Healthcare; Academic Press: Cambridge, MA, USA, 2021; pp. 89–111. [Google Scholar]
  114. Kulkarni, A.; Chong, D.; Batarseh, F.A. Data democracy. In Nexus of Artificial Intelligence, Software, Development, and Knowledge Engineering; Academic Press: Cambridge, MA, USA, 2020; pp. 83–106. [Google Scholar]
  115. Tyagi, N. What is Confusion Matrix? 2021. Available online: https://www.analyticssteps.com/blogs/what-confusion-matrix (accessed on 30 March 2022).
  116. Split Software: False Positive Rate. 2020. Available online: https://www.split.io/glossary/false-positive-rate/ (accessed on 27 August 2021).