Article

Enhancing Firewall Packet Classification through Artificial Neural Networks and Synthetic Minority Over-Sampling Technique: An Innovative Approach with Evaluative Comparison

by Adem Korkmaz 1,*, Selma Bulut 2, Tarık Talan 3, Selahattin Kosunalp 1 and Teodor Iliev 4,*
1 Department of Computer Technologies, Gönen Vocational School, Bandırma Onyedi Eylül University, Bandırma 10200, Türkiye
2 Department of Computer Technologies, Vocational School of Technical Sciences, Kırklareli University, Kırklareli 39100, Türkiye
3 Department of Computer Engineering, Faculty of Engineering and Natural Sciences, Gaziantep Islam Science and Technology University, Gaziantep 27000, Türkiye
4 Department of Telecommunications, University of Ruse, 7017 Ruse, Bulgaria
* Authors to whom correspondence should be addressed.
Appl. Sci. 2024, 14(16), 7426; https://doi.org/10.3390/app14167426
Submission received: 20 July 2024 / Revised: 12 August 2024 / Accepted: 21 August 2024 / Published: 22 August 2024
(This article belongs to the Special Issue Progress and Research in Cybersecurity and Data Privacy)

Abstract:
Firewall packet classification is a critical component of network security, demanding precise and reliable methods to ensure optimal functionality. This study introduces an advanced approach that combines Artificial Neural Networks (ANNs) with various data balancing techniques, including the Synthetic Minority Over-sampling Technique (SMOTE), ADASYN, and BorderlineSMOTE, to enhance the classification of firewall packets into four distinct classes: ‘allow’, ‘deny’, ‘drop’, and ‘reset-both’. Initial experiments without data balancing revealed that while the ANN model achieved perfect precision, recall, and F1-Scores for the ‘allow’, ‘deny’, and ‘drop’ classes, it struggled to accurately classify the ‘reset-both’ class. To address this, we applied SMOTE, ADASYN, and BorderlineSMOTE to mitigate class imbalance, which led to significant improvements in overall classification performance. Among the techniques, the ANN combined with BorderlineSMOTE demonstrated superior efficacy, achieving a 97% overall accuracy and consistently high performance across all classes, particularly in the accurate classification of minority classes. In contrast, while SMOTE and ADASYN also improved the model’s performance, the results with BorderlineSMOTE were notably more balanced and reliable. This study provides a comparative analysis with existing machine learning models, highlighting the effectiveness of the proposed approach in firewall packet classification. The synthesized results validate the potential of integrating ANNs with advanced data balancing techniques to enhance the robustness and reliability of network security systems. The findings underscore the importance of addressing class imbalance in machine learning models, particularly in security-critical applications, and offer valuable insights for the design and improvement of future network security infrastructures.

1. Introduction

Network security ensures information confidentiality, integrity, and availability in today’s increasingly digital world. As gatekeepers to a network, firewalls play a crucial role in managing and filtering network traffic, thus protecting the system from potential threats [1]. However, manually configuring and updating firewall rules can be challenging, especially given the exponential growth of data traffic and the rapid evolution of cyber threats.
A firewall operates as a fundamental component of network security, meticulously regulating both inbound and outbound data traffic by applying a defined set of rules or conditions. Its primary function is safeguarding an organization’s digital infrastructure from external threats. Presently, the usage of firewalls extends beyond their conventional protective role, being integrated into various internal domains of organizational networks. Illustratively, firewalls have been leveraged to restrict employee access to sensitive internal systems such as those for financial management or human resource administration [2]. Firewalls fall into five distinct categories: packet filtering firewalls, circuit-level gateways, application-level gateways (commonly referred to as proxy firewalls), stateful firewalls, and the more recent incarnation known as the Next Generation Firewall (NGFW). Beyond their principal role, firewall devices and services encompass a broader spectrum of security functions. For instance, they offer systems for intrusion detection or prevention (IDS/IPS), safeguards against Denial of Service (DoS) attacks, session surveillance, and an array of additional protective services, thereby fortifying the security architecture of the network [3].
Packet filtering firewalls are a widely used tool for network security. Network administrators set firewall rules to control firewall policies. The firewall examines each packet containing user data and control information and tests it against a predetermined set of rules. If the packet passes the test, the firewall allows it to reach its target; packets that fail the test are rejected. Firewalls test packets by examining rule sets, protocols, ports, and destination addresses [4]. To maintain a secure network, it is essential to understand how packet filtering firewalls work and how to set firewall rules properly [5].
Firewall logs are valuable sources of evidence for network security, but analyzing them can be challenging [6]. Research in network intrusion detection often focuses on using anomaly detection to find attacks. Anomaly detection involves monitoring a network’s activity for deviations from average profiles learned from benign traffic using machine learning tools [7]. To confront this exigency, the domains of artificial intelligence, machine learning, and deep learning have surfaced as formidable technological instruments for the construction of rigorous security infrastructures capable of expeditiously managing intricate cyberattacks. Various machine learning methodologies, encompassing multiclass Support Vector Machines (SVMs) and Artificial Neural Networks (ANNs), have been efficaciously applied for packet categorization and filtration within firewall systems, exemplifying the profound potential of these techniques in fortifying cybersecurity measures [8,9,10,11]. ANNs offer a promising solution among these techniques. Inspired by the functioning of the human brain, ANNs are designed to learn from data and improve their performance over time [12], making them well-suited for detecting patterns in network traffic and enhancing firewall efficiency [13].
The purpose of this research is multifaceted. It addresses critical aspects of firewall packet classification within the network security framework. The principal objective is to introduce and scrutinize a novel methodology incorporating ANNs to efficiently classify network packets within firewall systems, responding to the escalating complexities and diversities in network traffic and the intensifying spectrum of cyber threats.
The research is motivated by two overarching goals:
Data Preparation and Structuring: The first goal is to delineate a coherent and systematic approach for the preprocessing and structuring of raw network traffic data, making it amenable to the rigorous training of machine learning models. The study aims to elucidate the intricacies of data transformation, offering a structured pathway to convert raw network traffic data into an optimal format for training machine learning models.
Model Evaluation and Result Interpretation: The second goal is to delve into the interpretative aspects of the model’s outcomes, emphasizing seminal classification metrics such as precision, recall, F1-Score, and accuracy to ascertain the model’s proficiency and reliability in classifying network packets.
This research acknowledges the challenges posed by class imbalances, particularly in the ‘reset-both’ category, and addresses these issues through the strategic integration of advanced data balancing techniques such as the Synthetic Minority Over-sampling Technique (SMOTE), ADASYN, and BorderlineSMOTE. By incorporating these methods, the study significantly enhances the model’s predictive accuracy and reliability across all classes, thereby advancing the overall effectiveness of firewall packet classification.
In exploring the potential of Artificial Neural Networks (ANNs) within the domain of network packet classification, this research underscores the critical role of data resampling techniques in refining model performance. The study presents a forward-thinking, data-driven approach that bridges the fields of machine learning and network security, aligning with contemporary advancements in the field. It contributes to the evolution of network security by promoting more automated and precise methodologies for firewall packet classification.
This research stands as a seminal contribution by integrating ANNs with SMOTE, ADASYN, and BorderlineSMOTE to develop a robust and efficient framework for firewall packet classification. By combining these advanced methodologies, the study transcends traditional approaches, offering improved solutions to the increasingly complex challenges inherent in network security. This structured approach not only refines the classification process but also enhances the broader understanding and application of machine learning techniques in cybersecurity.
Addressing critical issues such as the exponential growth of data traffic, the rapid evolution of cyber threats, and the inherent challenges of data imbalance, the proposed framework significantly improves the accuracy, reliability, and real-time processing capabilities of firewall packet classification systems. Leveraging the adaptive learning capabilities of ANNs alongside the class-balancing strengths of SMOTE, ADASYN, and BorderlineSMOTE, this research provides comprehensive methodologies for data preprocessing, model training, and evaluation, ensuring solutions that are both practical and scalable. Ultimately, this work underscores the importance of merging theoretical advancements with practical implementations, setting a new benchmark for firewall packet classification and contributing to a more secure and resilient network environment.

2. Related Work

The intricate confluence of machine learning and network security has spurred extensive research, underscoring the paramountcy of ANNs in network traffic classification. This area’s burgeoning interest necessitates a granular examination of related works to delineate our study’s contextual fabric and underscore its significance.
Garcia-Teodoro et al. [14] spearheaded investigations in this domain, providing seminal insights into leveraging diverse machine learning techniques, including neural networks, for intrusion detection systems. This research is pivotal, laying the foundational understanding of the synergy between machine learning and network security and guiding our endeavor to explore advanced techniques for enhanced efficacy. Similarly, Sommer and Paxson [7] highlighted the pressing need for robust machine-learning models to navigate the multifaceted landscape of network traffic data. Their emphasis on resilient models informs our approach, focusing on developing ANNs to address the highlighted complexities [7].
Each subsequent study illuminates distinct facets of this domain. For instance, Ucar and Ozhan [15] concentrated on the adeptness of various data processing methodologies in analyzing firewall log data. On the other hand, Ertam and Kaya [9] and AL-Tarawneh and Bani-Salameh [8] explored classification approaches, providing comparative evaluations of classifiers’ efficacy using different metrics. These studies enriched our comprehension of classification efficacy, steering our selection and evaluation of techniques.
Naseer et al. [16] contrasted deep learning methodologies with conventional machine learning methodologies for intrusion detection, highlighting the former’s superiority. This understanding motivated our deep exploration into advanced learning methodologies for optimizing anomaly detection. Moreover, a comprehensive review by Salman et al. [17] unraveled the multifaceted applications and challenges inherent in internet traffic classification, illuminating the nuances of machine learning-based approaches. Zhao et al. [18] and AL-Behadili [19] probed into the enhancement of firewall systems through diverse machine learning models. Their findings corroborate the transformative potential of machine learning models in fortifying firewall operations and network security.
Andalib and Babamir [20] and Enosh Shaoul and Sonare [21] accentuated the formidable capabilities of machine learning algorithms coupled with big data in detecting anomalies in firewall policies and identifying complex Denial of Service (DoS) and Distributed Denial of Service (DDoS) attacks, respectively. Integrating big data with machine learning in these studies informed our methodology, as it emphasized the value of leveraging diverse datasets and sophisticated algorithms for precise anomaly detection.
Enosh Shaoul and Sonare [21]’s research is pivotal as they utilized a convolutional neural network (CNN) model to detect complex DoS and DDoS attacks, achieving 98% accuracy. This research not only underscores the potential of CNNs in recognizing various attack patterns but also sets a benchmark for our study, inspiring advancements in network security. Drawing parallels, Zhang et al. [22] delved into a comparative study, introducing a new neural network architecture—MODLSTM. Their innovative approach demonstrated substantial potential in detecting DoS attacks, contributing pivotal insights for enhancing the reliability of firewall systems against diverse malicious traffic, which is closely related to our research focus. Building on these insights, Al-Haija and Ishtaiwi [23] engineered an intelligent classification model for firewall systems, utilizing a shallow neural network (SNN). Their model, attaining 98.5% accuracy, surpassed several contemporary firewall classification systems, emphasizing the significance of incorporating advanced models in firewall systems—a concept central to our study. Further, Marques et al. [24] developed a dataset based on accurate DNS logs and employed various machine-learning methods. Their findings, especially the superior performance of the CART algorithm, highlighted the importance of effective feature selection methods and profoundly impacted our approach to data classification and feature selection.
Rahman et al. [25] emphasized the significance of measurement data in enhancing classifier performance, achieving a harmonic mean F1-Score with 99% accuracy using a random forest (RF) technique. Their methodology and results provided critical insights into optimizing classifier performance, steering our study toward developing robust and accurate classification models.
As cloud computing becomes increasingly central due to data security concerns and cyber threats, the development of practical intrusion detection systems (IDSs) has gained importance. Machine learning-based IDSs are proficient at detecting network threats, yet they experience performance degradation with large datasets. Bakro et al. [26] have proposed a hybrid approach that addresses the challenges of imbalanced datasets by integrating SMOTE with various feature selection techniques. This approach, tested using an RF model, achieved 98% and 99% accuracy rates on the UNSW-NB15 and Kyoto datasets, respectively, demonstrating superior performance compared to similar studies.
Bakro et al. [27] presented an advanced IDS for cloud environments in their study. This system enhances classification accuracy by optimizing feature selection and weighting the ensemble model using the crow search algorithm (CSA). The ensemble classifier incorporates machine and deep learning models, including RF, SVM, XGBoost, and Decision Tree (DT). Tested on NSL-KDD, Kyoto, and CSE-CIC-IDS-2018 datasets, the ensemble model achieved high accuracy, recall, and precision levels. The study particularly highlights the superiority of the ensemble model in terms of the false positive rate (FPR) and false negative rate (FNR).
In recent years, the evolution of deep learning has introduced novel methodologies like the Deep Packet method [28,29,30]. Due to its capacity to uncover hidden network patterns, this method offers a blueprint for our study, allowing us to delve deeper into network behavior analysis.
In summary, the existing body of research demonstrates significant progress in the application of machine learning techniques for firewall packet classification, with models like Random Forests (RFs), Support Vector Machines (SVMs), and K-Nearest Neighbors (KNNs) showing high accuracy across various datasets. However, a common limitation across many of these studies is the insufficient focus on addressing class imbalance, a critical issue that can skew results and compromise the reliability of classification models. While methods such as under-sampling and basic preprocessing have been applied, they often fall short of effectively balancing the data. Our study builds upon these foundational works by integrating advanced data balancing techniques—SMOTE, ADASYN, and BorderlineSMOTE—into an Artificial Neural Network (ANN) framework. This approach not only enhances the accuracy and reliability of classification across all classes, including minority ones, but also sets a new benchmark in the field of firewall packet classification, addressing the gaps identified in previous research.
Table 1 presents a comparative analysis of the success rates in using ANNs and various machine learning techniques for firewall packet classification across different datasets and feature sets.
In conclusion, our study makes a significant contribution to the field of network security by addressing the critical challenge of firewall packet classification through the innovative combination of Artificial Neural Networks (ANNs) and advanced data balancing techniques, including SMOTE, ADASYN, and BorderlineSMOTE. Unlike previous studies, such as those by Aljabri et al. [6] and AL-Tarawneh and Bani-Salameh [8], which primarily relied on traditional machine learning models like Random Forests (RFs) and K-Nearest Neighbors (KNNs) without specifically addressing class imbalance, our approach directly tackles this issue, which is often a limiting factor in achieving high accuracy and reliable classification performance.
The primary objective of our research was to improve the accuracy and reliability of classifying firewall packets, particularly in the face of class imbalances that can compromise the effectiveness of security systems. While studies like those by Ertam and Kaya [9] and Ucar and Ozhan [15] demonstrated high accuracy with methods like Support Vector Machines (SVMs) and KNNs, they did not incorporate strategies to mitigate the impact of imbalanced datasets. By carefully analyzing network traffic data and implementing these advanced balancing techniques, we developed a robust model that achieves enhanced classification performance across all packet classes, including those that are traditionally underrepresented.
This study not only advances the current understanding and application of machine learning in network security—where prior works such as those by AL-Behadili [19] and Rahman et al. [25] showed the efficacy of Decision Trees (DTs) and Random Forests—but also establishes a solid foundation for future research. By explicitly integrating data balancing methods that adaptively generate synthetic samples based on the difficulty of classifying minority classes, as demonstrated in our approach, we aspire to set a new standard for firewall packet classification methodologies, addressing the shortcomings of previous studies that often overlooked the complexities introduced by class imbalance. This research, therefore, builds upon and significantly contributes to existing work in the field, offering a more comprehensive and effective solution to the challenges of network security.

3. Materials and Methods

3.1. Firewall Data

The dataset employed in the analysis presented in Table 2 encompasses a variety of attributes pertinent to network traffic. These attributes include the source port, destination port, Network Address Translation (NAT) source port, NAT destination port, and metrics related to data flow, such as the volume of bytes and the number of packets sent and received. Additionally, the dataset captures the duration of network activities, providing a comprehensive overview of network traffic characteristics essential for effective firewall packet classification. Furthermore, the dataset records the action taken in response to the network traffic, with activities such as allowance, denial, dropping, or reset-both [36].
Here is a brief description of each of the fields in the dataset:
Source Port: this designates the port number on the originating machine from which the network traffic is initiated.
Destination Port: this signifies the port number on the recipient machine to which the network traffic is directed.
NAT Source Port: NAT, an acronym for Network Address Translation, is a methodology firewalls use to correlate an internal (private) IP address to a distinctive public IP address. This field represents the source port number post the application of NAT.
NAT Destination Port: this constitutes the destination port number after the NAT implementation.
Bytes: this quantifies the cumulative number of bytes constituting the data in the network packet.
Bytes Sent: this represents the volume of data bytes dispatched from the source to the destination.
Bytes Received: this demarcates the quantity of data bytes that the source accrues from the destination.
Packets: this denotes the overall count of network packets embodied in the network traffic.
Elapsed Time (s): this signifies the total duration of the network communication.
pkts_sent: this quantifies the network packets transmitted from the source to the destination.
pkts_received: this enumerates the network packets the source garners from the destination.
Action: the “Action” field in a firewall denotes the prescribed response to network packets and acts as a categorical variable, including ‘allow’, ‘deny’, ‘drop’, and ‘reset-both’ as potential values. ‘Allow’ permits passage of the packet through the network; ‘deny’ blocks it; ‘drop’ silently discards it without notifying the source; and ‘reset-both’ sends a reset packet to both source and destination, terminating the connection. As demonstrated in Figure 1, the distribution of actions is as follows: ‘allow’ at 37,640 instances, ‘deny’ at 14,987, ‘drop’ at 12,851, and ‘reset-both’ at 54, reflecting the varying frequencies of each response type in the analyzed dataset.

3.2. Limitations

The dataset reveals a significant number of instances classified as “allow” (37,640 occurrences), indicating a prevalence of packets allowed by the firewall. Additionally, there are substantial numbers of samples classified as “deny” (14,987 occurrences) and “drop” (12,851 occurrences), representing packets that were denied or dropped by the firewall, respectively. However, the class “reset-both” is severely underrepresented in the dataset, with only 54 occurrences. This class imbalance raises concerns about the model’s performance and ability to predict the “reset-both” class accurately. Moreover, the restricted quantity of instances encapsulated in the “reset-both” class might constrain the model’s capacity to generalize and apprehend the complexity inherent in real-world situations, particularly those where the firewall undertakes the reset of packets. Furthermore, within the methodological constraints of the study, the analysis was conducted solely using ANNs combined with SMOTE-based oversampling techniques. The sequential process adhered to within this research study is depicted in Figure 2.

3.3. Data Analysis

The aim is to train an ANN model to predict the ‘Action’ taken given the other network features. The model is implemented in Python using the Keras library with the TensorFlow backend.
The method used in the analysis can be outlined as follows:
Data Preparation: The dataset is first loaded into a pandas DataFrame. The target variable ‘Action’ is separated from the input features. The input features are normalized using the StandardScaler from the sklearn library, which transforms the data such that their distribution will have a mean value of 0 and a standard deviation of 1. The target variable is label-encoded and then one-hot-encoded to transform into a binary class matrix for use with the categorical cross-entropy loss function during training.
Data Splitting: The dataset is subjected to a partitioning procedure to create distinct training and testing subsets. Specifically, 80% of the dataset is allocated for training purposes, establishing the foundation for the model’s learning. The remaining 20% is reserved for testing, providing an unbiased evaluation of the model’s predictive capability on unseen data.
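The preparation and splitting steps above can be sketched in a few lines. The article uses pandas, scikit-learn’s StandardScaler/LabelEncoder, and Keras’ one-hot encoding; the sketch below reproduces the same transformations in plain NumPy purely for illustration (the function name prepare_and_split and the seed are ours, not from the paper):

```python
import numpy as np

def prepare_and_split(X, y, train_frac=0.8, seed=42):
    """Standardize features, one-hot-encode labels, and split 80/20."""
    # Standardize each feature to mean 0 and standard deviation 1
    # (the role played by StandardScaler in the paper's pipeline).
    X = (X - X.mean(axis=0)) / X.std(axis=0)

    # Label-encode, then one-hot-encode into a binary class matrix
    # suitable for the categorical cross-entropy loss.
    classes, y_idx = np.unique(y, return_inverse=True)
    Y = np.eye(len(classes))[y_idx]

    # Shuffle, then reserve 80% for training and 20% for testing.
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))
    cut = int(train_frac * len(X))
    tr, te = order[:cut], order[cut:]
    return X[tr], X[te], Y[tr], Y[te], classes
```

In practice the same effect is obtained with StandardScaler, LabelEncoder, to_categorical, and train_test_split from the libraries named in the text.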

3.4. Artificial Neural Networks (ANNs)

ANNs epitomize a subset of machine learning models that derive their foundational concept from biological neural networks’ structural and operational paradigm. ANNs encompass a network of interconnected artificial neurons systematically organized in layers. These layers are instrumental in processing and disseminating information via weighted connections [37]. The artificial neuron mimics a biological neuron’s input, processing, and output properties. Figure 3 illustrates how the network produces its output: the net input, obtained by multiplying the information entered into the network by its weights (W), is processed with the transfer function and taken from the output layer [38,39,40].
Classification employing ANNs necessitates training a neural network model to categorize data into distinct classes or categories. ANNs have been extensively applied to classification tasks due to their ability to learn intricate patterns and relationships in the data. The model encompasses an input layer, a hidden layer equipped with ten neurons, a subsequent hidden layer endowed with five neurons, and an output layer having a neuron count equivalent to the number of classes within the classification task. The Rectified Linear Unit (ReLU) is utilized in the hidden layers, while the sigmoid activation function is employed in the output layer.
Multilayer Perceptrons are employed when the relationship between the inputs and outputs of an ANN is nonlinear; this method was therefore adopted in the present study.
Model Architecture: A sequential ANN model is constructed using the Keras deep learning framework. This model encompasses an input layer, two hidden layers, and an output layer. The input layer is designed with a neuron count equivalent to the number of input features. The two concealed layers possess 10 and 5 neurons, applying the Rectified Linear Unit (ReLU) as their activation function. The output layer comprises a neuron count mirroring the number of classes in the target variable and employs the Sigmoid activation function, hence providing class probabilities as the output.
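Assuming the standard Keras Sequential API, the described architecture (10- and 5-neuron ReLU hidden layers, sigmoid output, Adam optimizer with categorical cross-entropy) might look like the following sketch; the helper name build_model is ours, not from the paper:

```python
from tensorflow import keras

def build_model(n_features, n_classes):
    """Sequential ANN as described in the text: two ReLU hidden layers
    (10 and 5 neurons) and a sigmoid output emitting class probabilities."""
    model = keras.Sequential([
        keras.Input(shape=(n_features,)),
        keras.layers.Dense(10, activation="relu"),
        keras.layers.Dense(5, activation="relu"),
        keras.layers.Dense(n_classes, activation="sigmoid"),
    ])
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Per the training settings reported in the text, fitting would then be a call along the lines of model.fit(X_train, Y_train, epochs=50, batch_size=50).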
Model Compilation: The categorical cross-entropy loss function is widely recognized as an optimal choice for multi-class classification problems due to its ability to effectively measure the discrepancy between predicted probabilities and actual class labels. The model undergoes compilation utilizing the Adam optimization algorithm and the categorical cross-entropy loss function, an optimal arrangement for addressing multi-class classification quandaries [41].
Model Training: the model is trained on the training data for a specified number of epochs (50). The batch size (50) is also specified.
Model Evaluation: the model’s performance on the test set is evaluated, and the accuracy of the predictions is reported.
Results Visualization: A confusion matrix and a classification report are constructed to delineate the model’s performance on the test set. The confusion matrix provides a graphical representation of the model’s predictive success and failure for each class, enumerating correct and incorrect predictions. In parallel, the classification report offers salient classification metrics such as precision, recall, and F1-Score for each class. These tools collectively facilitate a comprehensive understanding of the model’s class-wise performance.
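The confusion matrix and per-class metrics described above are standard; a minimal NumPy sketch, equivalent in spirit to scikit-learn’s confusion_matrix and classification_report (function names are ours), is:

```python
import numpy as np

def confusion_matrix(y_true, y_pred, n_classes):
    """Rows index the actual class, columns the predicted class."""
    cm = np.zeros((n_classes, n_classes), dtype=int)
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    return cm

def class_report(cm):
    """Per-class precision, recall, and F1-Score from a confusion matrix."""
    report = {}
    for c in range(cm.shape[0]):
        tp = cm[c, c]
        fp = cm[:, c].sum() - tp      # predicted c, actually another class
        fn = cm[c, :].sum() - tp      # actually c, predicted another class
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        report[c] = (prec, rec, f1)
    return report
```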
The Sigmoid and ReLU activation functions used in the ANNs are calculated as follows:
Sigmoid: f(a) = 1 / (1 + e^(−a)) = e^(a) / (1 + e^(a))
ReLU: f(a) = 0 if a < 0; f(a) = a if a ≥ 0
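In code, both activation functions are one-liners; a NumPy sketch:

```python
import numpy as np

def sigmoid(a):
    # f(a) = 1 / (1 + e^(-a)): squashes any real input into (0, 1)
    return 1.0 / (1.0 + np.exp(-a))

def relu(a):
    # f(a) = 0 for a < 0, f(a) = a for a >= 0
    return np.maximum(0.0, a)
```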

3.5. Synthetic Minority Over-Sampling Technique (SMOTE)

In machine learning classification, imbalanced data refers to datasets where the number of instances belonging to different classes is uneven, leading to a potential bias in the classifier’s performance. SMOTE, introduced by Chawla et al. [42], is an oversampling technique that aims to overcome the imbalance problem by generating synthetic examples for the minority class. The method works by randomly selecting a member from the minority class and identifying its K-Nearest Neighbors (k-NNs). From these neighbors, SMOTE creates new synthetic examples along the line segment connecting the minority class member and its neighbors [42].
Generating synthetic examples involves selecting a minority class instance and calculating the feature-wise differences between it and its neighbors. SMOTE then randomly chooses a number between 0 and 1, multiplying the feature-wise differences by this value. The resulting values are added to the selected minority instance, producing new synthetic examples that represent the minority class but differ slightly in their feature values. Applying SMOTE makes the class distribution more balanced, as synthetic samples are introduced to augment the minority class. This helps to alleviate the bias caused by imbalanced data and allows the classifier to learn from a more representative dataset [43]. The minority class member generation diagram based on SMOTE and k-NN algorithms is shown in Figure 4 below.
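The interpolation procedure described above can be sketched in a few lines of NumPy. This is a simplified, single-sample illustration rather than the full SMOTE algorithm; the function name smote_sample is ours:

```python
import numpy as np

def smote_sample(X_min, k=5, rng=None):
    """Generate one synthetic minority sample: pick a minority instance,
    pick one of its k nearest minority neighbors, and add a random
    fraction of the feature-wise difference."""
    if rng is None:
        rng = np.random.default_rng()
    i = rng.integers(len(X_min))
    x = X_min[i]
    # k nearest minority-class neighbors by Euclidean distance
    d = np.linalg.norm(X_min - x, axis=1)
    neighbors = np.argsort(d)[1:k + 1]   # index 0 is the point itself
    nb = X_min[rng.choice(neighbors)]
    gap = rng.random()                   # random number in [0, 1)
    return x + gap * (nb - x)
```

Because the new point lies on the line segment between two real minority samples, it stays inside the region the minority class already occupies.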

3.6. BorderlineSMOTE

BorderlineSMOTE is an advanced oversampling technique developed to improve upon the traditional SMOTE method, which generates synthetic samples for minority classes using their nearest neighbors [42]. A key limitation of SMOTE is its tendency to cause overlap between minority and majority class samples, leading to potential misclassification. To address this, Han et al. introduced BorderlineSMOTE, which focuses specifically on minority class samples near the decision boundary—those most susceptible to misclassification [45].
The BorderlineSMOTE algorithm identifies the K-Nearest Neighbors of each minority class sample. If a sample is surrounded predominantly by majority class neighbors, it is placed in a DANGER set, as it is more likely to be misclassified. Synthetic samples are then generated for these DANGER set instances by interpolating between the original sample (Pd) and its nearest neighbors (Pk). The new synthetic sample (Pnew) is calculated using the following formula [46,47]:
P_new = P_d + rand(0, 1) × (P_k − P_d)
Here, rand(0, 1) is a random number between 0 and 1, ensuring that the synthetic sample is located along the line segment between the original sample and its nearest neighbor.
By focusing on generating synthetic samples near the decision boundary, BorderlineSMOTE reduces the risk of class overlap and improves the classifier’s ability to distinguish between classes, particularly in datasets with significant class imbalance.
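The DANGER-set selection can be sketched as follows. This is an illustrative NumPy snippet assuming the standard rule from Han et al. that a minority sample is "in danger" when at least half, but not all, of its k nearest neighbors belong to the majority class (samples with all-majority neighborhoods are treated as noise and skipped); the helper name `danger_set` is ours.

```python
import numpy as np

def danger_set(X, y, minority, k=5):
    """Borderline sketch: flag minority samples whose k nearest
    neighbours (over the whole dataset) are mostly majority-class."""
    idx = []
    for i in np.where(y == minority)[0]:
        d = np.linalg.norm(X - X[i], axis=1)
        nn = np.argsort(d)[1:k + 1]      # exclude the point itself
        m = np.sum(y[nn] != minority)    # count of majority-class neighbours
        # DANGER: at least half but not all neighbours are majority
        if k / 2 <= m < k:
            idx.append(i)
    return np.array(idx, dtype=int)
```

Synthetic samples are then generated only for the returned indices, using the interpolation formula above.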

3.7. Adaptive Synthetic Sampling (ADASYN)

ADASYN, introduced by He et al. [48], is an adaptive oversampling technique designed to address class imbalance by generating synthetic samples for the minority class. Unlike the traditional SMOTE method, which generates an equal number of synthetic samples for each minority instance, ADASYN prioritizes those minority samples that are harder to classify, thereby producing more synthetic data around these challenging instances. This approach reduces the learning bias introduced by imbalanced data and enhances the classifier’s ability to correctly classify minority class instances [49].
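The adaptive weighting at the heart of ADASYN can be sketched as follows. This illustrative NumPy snippet assumes the difficulty ratio r_i = Δ_i/k from He et al. (the fraction of majority-class points among a minority sample's k nearest neighbors); the hypothetical `adasyn_weights` helper returns only the normalized generation weights, not the synthetic points themselves.

```python
import numpy as np

def adasyn_weights(X, y, minority, k=5):
    """ADASYN sketch: weight each minority sample by the fraction of
    majority-class points among its k nearest neighbours, so harder
    samples receive proportionally more synthetic data."""
    min_idx = np.where(y == minority)[0]
    r = np.empty(len(min_idx))
    for j, i in enumerate(min_idx):
        d = np.linalg.norm(X - X[i], axis=1)
        nn = np.argsort(d)[1:k + 1]           # exclude the point itself
        r[j] = np.mean(y[nn] != minority)     # difficulty ratio in [0, 1]
    return r / r.sum()                        # normalised generation weights
```

Multiplying these weights by the total number of synthetic samples required gives the per-instance generation counts; the interpolation itself proceeds as in SMOTE.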

3.8. Model Performance and Evaluation

The TP, TN, FP, and FN confusion matrix metrics in Table 3 provide values for correct or incorrect classification of packets at the firewall. Recall, also known as sensitivity, measures the model’s ability to identify positive cases correctly. It is computed as the ratio of accurate positive predictions to the total number of actual positive cases. Precision evaluates the proportion of relevant instances among those retrieved by the model, defined as the ratio of true positive predictions to all positive predictions made by the model. Accuracy represents the overall effectiveness of a model in making correct predictions, calculated as the proportion of true predictions (both true positives and true negatives) to the total number of cases. F-Score, or F1-Score, integrates precision and recall into a single metric by calculating their harmonic mean. This metric is particularly useful when seeking a balance between precision and recall, providing a more holistic evaluation of model performance. These values were used to calculate precision, recall, F-measure, and accuracy metrics as follows:
Precision = TP / (TP + FP)
Recall = TP / (TP + FN)
F-measure = (2 × Precision × Recall) / (Precision + Recall)
Accuracy = (TP + TN) / (TP + TN + FP + FN)
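These formulas translate directly into code; a minimal per-class sketch from raw confusion-matrix counts:

```python
def metrics(tp, tn, fp, fn):
    """Compute precision, recall, F-measure, and accuracy from
    confusion-matrix counts for a single class."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)  # harmonic mean
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return precision, recall, f1, accuracy
```

For a multiclass problem such as this one, these values are computed per class (one-vs-rest) and then combined via macro or weighted averaging.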

4. Results

In this investigation, the proficiency of the ANN model in classifying firewall packets based on their attributes is scrutinized. A range of widely accepted performance metrics, encompassing the confusion matrix, accuracy, precision, recall, and F1-Score, are employed to comprehensively evaluate the model’s performance.
As presented in Table 4, the initial experiments involved using ANNs without SMOTE to classify firewall packets. The results revealed that for the “allow”, “deny”, and “drop” classes, the model achieved perfect precision, recall, and F1-Score values (1.00), indicating flawless performance. However, for the “reset-both” class, all these metrics were at 0.00, indicating that the model could not accurately predict any instance of this class. Despite the issue with the “reset-both” class, the model’s overall accuracy was still extremely high (1.00), as measured across a total of 13,107 samples.
The confusion matrix delineated in Figure 5 illuminates the proficiency of the ANNs in categorizing network actions into ‘allow’, ‘deny’, and ‘drop’. The ANN exhibits high accuracy with 7521 correct predictions for ‘allow’ and perfect accuracy for ‘drop’ actions with 2562 correct predictions. While the ‘deny’ class has a notable accuracy with 2939 correct classifications, there is a small fraction of misclassifications, with 55 instances erroneously predicted as ‘drop’. The model notably underperforms in the ‘reset-both’ category, misclassifying all instances into other categories, indicating a need for model refinement or reconsideration of the feature set for this particular class to improve its predictive capability. The results highlight the strengths and areas for improvement in the ANN’s capacity to discern between classes, particularly when handling imbalanced datasets or underrepresented classes. Figure 6 presents the graphs depicting the model’s training and validation accuracy and its training and validation loss.
Figure 6 presents the accuracy and loss graphs for the training and validation phases of the ANN model. The accuracy graph shows a rapid increase during the initial epochs, with both training and validation accuracy quickly reaching and stabilizing around 99%. This high level of accuracy, maintained consistently across epochs, indicates that the model is well-trained and highly capable of generalizing to unseen data, with minimal discrepancies between the training and validation sets.
The loss graph complements this observation, showing a steep decline in both training and validation losses early in the training process. The losses stabilize at low values, reflecting the model’s strong learning capability and its ability to minimize errors. Although there are minor fluctuations, particularly a noticeable spike in the validation loss around epoch 30, the overall trend suggests that the model effectively converges, with both losses remaining low and closely aligned.
These results underscore the model’s robustness and efficiency, highlighting its ability to achieve near-perfect accuracy with minimal overfitting. The minor variations in the validation loss suggest that the model handles the complexity of the data well, maintaining a high level of performance throughout the training process.
The SMOTE algorithm was applied to balance the dataset and address the shortcomings of the initial model. The results of the subsequent experiments using this balanced dataset are presented in Table 5. The model’s performance improved for all classes, including the previously problematic “reset-both” class, which now showed a precision of 0.77, a recall of 0.87, and an F1-Score of 0.82. The “allow” and “drop” classes maintained their perfect scores, while the “deny” class showed a slight drop but still had high scores. When tested on the set of 13,107 samples in Table 4, the model’s overall accuracy was 1.00, indicating a high level of predictive capability across all classes.
The confusion matrix presented in Figure 7 underscores the elevated level of predictive precision achieved by the ANN after applying SMOTE to a balanced training dataset. The ANN demonstrated exceptional accuracy in the classification of ‘allow’, ‘deny’, and ‘drop’ directives, with the classification of ‘drop’ actions being particularly noteworthy due to its flawless accuracy. Notwithstanding a slight misclassification in the ‘reset-both’ category, such instances were negligible in the grand scope of the data, emphasizing the model’s resilience. The findings corroborate the effectiveness of employing SMOTE with an ANN to amplify the training dataset’s balance, thereby significantly boosting the model’s discriminative capacity across the various classes within the dataset. Figure 8 presents the graphs depicting the model’s training and validation accuracy and its training and validation loss.
The training and validation graphs presented provide critical insights into the performance of the ANN model when combined with SMOTE, as reflected in the numerical results of Table 5. The accuracy graph (left) shows that while training accuracy steadily increases, validation accuracy demonstrates fluctuations across epochs. These fluctuations indicate potential overfitting, where the model might be learning patterns specific to the training data but not generalizing well to unseen data. This behavior is consistent with the precision, recall, and F1-Scores observed, particularly for the minority classes like “deny” and “reset-both”, where the recall values are lower, leading to a weighted average accuracy of 90%. The loss graph (right) further corroborates these observations, with the validation loss not following the smooth decrease seen in the training loss. This divergence suggests that while the model is minimizing error on the training data, it struggles to achieve the same level of performance on the validation set, reinforcing the need for further tuning or alternative approaches to improve generalization.
The results presented in Table 6 demonstrate the performance of the ANN model when applying the ADASYN function for data balancing. The overall accuracy is 85%, indicating a decrease compared to the SMOTE-enhanced model. This reduction is primarily attributable to the “deny” class, which shows a recall of 60% and an F1-Score of 68%, highlighting the model’s struggle to accurately identify instances in this class. While the “allow” and “drop” classes exhibit high precision and recall, the “reset-both” class also encounters moderate challenges, as evidenced by its F1-Score of 79%. The weighted and macro averages, both at 85%, reflect a more balanced yet less robust performance across all classes compared to the SMOTE results. These findings suggest that while ADASYN contributes to addressing class imbalance, it may not be as effective as SMOTE in enhancing the model’s overall classification accuracy and stability across all classes. Further optimization or the exploration of alternative methods may be necessary to achieve higher performance levels, particularly for the more challenging minority classes.
The confusion matrix in Figure 9, coupled with the performance metrics in Table 6, offers a detailed insight into the model’s classification capabilities following the application of the ADASYN function. The model exhibits outstanding accuracy in predicting the “allow” and “drop” classes, with near-perfect precision and recall values, as evidenced by minimal misclassifications. However, the matrix reveals a notable challenge in distinguishing between the “deny” and “reset-both” classes, where a significant number of “deny” instances are incorrectly classified as “reset-both”. This misclassification is reflected in the lower recall and F1-Scores for the “deny” class, indicating that while the model is generally effective, there is a clear need for further refinement to enhance its ability to accurately differentiate between these classes. These results suggest that while the model performs well overall, targeted improvements are necessary to address these specific classification challenges and improve its robustness in more complex scenarios.
Figure 10 illustrates the training and validation accuracy and loss curves across 50 epochs for the ANN model enhanced with the ADASYN function. The accuracy graph shows a steady increase in both training and validation accuracy, eventually converging around 85%, indicating that the model achieves a high level of generalization without overfitting. The close alignment of the training and validation curves further suggests that the model maintains consistency in its performance across both the training and validation datasets.
On the loss graph, both training and validation losses decrease significantly during the initial epochs, stabilizing around a similarly low value after approximately 20 epochs. The minimal gap between the training and validation loss curves, along with their convergence, indicates that the model is learning effectively, with no signs of significant overfitting or underfitting. Overall, these graphs confirm that the ANN model, when combined with ADASYN, performs robustly, achieving a balanced and reliable classification across the imbalanced dataset.
Table 7 summarizes the performance metrics for the ANN model enhanced with the BorderlineSMOTE function, highlighting its effectiveness in classifying across different categories. The model achieves exemplary results, particularly in the “allow” and “reset-both” classes, with F1-Scores of 1.00 and 0.98, respectively, reflecting nearly perfect precision and recall. The “deny” class also performs strongly, with an F1-Score of 0.95, showcasing the model’s ability to handle this category with high accuracy despite the inherent complexity. The “drop” class, though with a lower support, still attains a notable F1-Score of 0.92, supported by a perfect recall, demonstrating the model’s robustness even with fewer instances.
The overall accuracy of 0.97, combined with the high weighted and macro averages of 0.96 and 0.97, respectively, indicates that the model not only addresses class imbalance effectively but also sustains a high level of accuracy across all classes. These results underscore the model’s capability to deliver balanced and reliable classification performance, making it a highly suitable approach for tasks that demand both precision and sensitivity, particularly in imbalanced datasets.
Figure 11 illustrates the confusion matrix for the ANN model combined with the BorderlineSMOTE function, showcasing the model’s classification performance across the four classes. The model demonstrates exceptional accuracy, particularly in the “allow” and “reset-both” classes, with a high number of correctly classified instances (6001 and 5867, respectively) and minimal misclassifications. The “deny” class also performs well, although there is a slight decrease in precision, as indicated by 384 instances being misclassified as “reset-both” and 160 as “allow”. Notably, the “drop” class is perfectly classified, with all 2122 instances accurately identified.
This confusion matrix highlights the model’s robustness, especially in handling the “allow” and “reset-both” classes, and its effectiveness in maintaining high classification accuracy despite class imbalances. The minor misclassifications in the “deny” class suggest an area for potential improvement, but overall, the model exhibits outstanding performance, providing reliable predictions across all categories. These results underscore the efficacy of integrating ANNs with the BorderlineSMOTE function in enhancing the model’s ability to accurately differentiate between various classes.
Figure 12 displays the accuracy and loss graphs for both the training and validation phases in the ANN model enhanced with the BorderlineSMOTE function. The accuracy graph shows that both training and validation accuracy improve steadily, stabilizing around 97%, indicating that the model has achieved high generalization capability. The close alignment between the training and validation accuracy curves, with only minor fluctuations, suggests that the model is not overfitting and is performing consistently across both datasets.
The loss graph further corroborates this observation, with both training and validation losses decreasing over time. The training loss stabilizes early, while the validation loss continues to fluctuate slightly, which is expected given the complexity of the data and the nature of the BorderlineSMOTE function. Despite a few spikes in the validation loss, the overall downward trend and the eventual convergence of the training and validation losses indicate that the model is effectively learning and generalizing.
These results affirm that the ANN model, when combined with BorderlineSMOTE, not only handles class imbalances effectively but also maintains strong predictive performance with minimal overfitting, making it a robust choice for complex classification tasks.
The real-time performance metrics in Table 8 reveal notable differences in latency and throughput among the various ANN models used for firewall packet classification. The baseline ANN model, without any data balancing techniques, exhibits the highest latency at 0.355833 s per prediction and the lowest throughput at 360.58 samples per second, indicating slower processing and less efficiency in handling large volumes of data in real-time scenarios.
On the other hand, the ANN models integrated with data balancing techniques, such as SMOTE, BorderlineSMOTE, and ADASYN, demonstrate significant improvements in real-time performance. The ANN + BorderlineSMOTE model, in particular, achieves the highest throughput of 510.53 samples per second with a markedly lower latency of 0.107312 s per prediction. This suggests that BorderlineSMOTE not only enhances the model’s classification accuracy but also optimizes its efficiency in real-time processing.
Similarly, the ANN + ADASYN model achieves the lowest latency among the tested models, at 0.089748 s per prediction, along with a robust throughput of 444.19 samples per second. The ANN + SMOTE model also shows considerable gains, with a latency of 0.107679 s and a throughput of 406.59 samples per second, demonstrating the effectiveness of SMOTE in improving both classification performance and processing speed.
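Latency and throughput figures of this kind are typically obtained by timing repeated batch predictions; a generic sketch of such a measurement is shown below. This is illustrative only: `predict_fn` stands in for any trained model's prediction callable, and the `realtime_profile` helper is an assumption of ours, not the authors' benchmarking code.

```python
import time
import numpy as np

def realtime_profile(predict_fn, X, n_runs=10):
    """Measure mean latency per prediction call and throughput
    (samples per second) for a prediction callable."""
    t0 = time.perf_counter()
    for _ in range(n_runs):
        predict_fn(X)
    elapsed = time.perf_counter() - t0
    latency = elapsed / n_runs                 # seconds per prediction call
    throughput = len(X) * n_runs / elapsed     # samples per second
    return latency, throughput
```

Averaging over several runs with `time.perf_counter` smooths out scheduling noise, which matters when comparing models whose per-call differences are in the millisecond range.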
Overall, these findings underscore the benefits of integrating advanced data balancing techniques with ANNs. Not only do these techniques enhance the accuracy of the models when handling imbalanced datasets, but they also significantly improve real-time processing capabilities, making the models more suitable for deployment in time-sensitive network security environments.
Table 9 provides a comparative analysis of various studies on firewall packet classification, highlighting the performance of different models across a range of datasets and feature sets. In comparison to other studies, the ANN model combined with BorderlineSMOTE in our investigation demonstrates a high level of accuracy at 97%, with a weighted average of 96%. This performance is notably strong, especially when considering the complex nature of the data and the challenges associated with class imbalance, which were effectively mitigated using BorderlineSMOTE.
The comparison shows that our model’s performance is competitive with other advanced classifiers, such as Random Forest (RF) and Support Vector Machine (SVM), which are commonly used in similar contexts. For instance, the Random Forest model in AL-Tarawneh and Bani-Salameh’s [8] study achieved a slightly higher accuracy of 99.7%, though this model’s recall was lower at 0.85, indicating that while it was highly precise, it may not have been as effective at identifying all relevant instances. Similarly, the study by Rahman et al. [25] reported a 99% accuracy using SVM, RF, k-NN, and Logistic Regression, which is slightly higher than our results but without addressing the specific challenges of imbalanced datasets that our study confronts.
Furthermore, the diversity of models used across the different studies, such as Decision Trees (DTs), SVMs with various activation functions, and ensemble methods, highlights the complexity of achieving high accuracy in firewall packet classification. Despite this complexity, our study’s results are robust, showcasing the efficacy of integrating ANNs with data balancing techniques like BorderlineSMOTE to achieve a balanced, high-performance model. This positions our approach as a strong contender in the domain, particularly in tasks requiring both high accuracy and the ability to handle class imbalance effectively.

5. Conclusions

This study investigated the use of Artificial Neural Networks (ANNs) in combination with various data balancing techniques—namely SMOTE, ADASYN, and BorderlineSMOTE—to classify firewall packets, a critical task in network security. The findings from our research demonstrate that while ANNs are highly effective for this purpose, their performance is significantly enhanced when combined with appropriate data balancing techniques to address the inherent challenges of class imbalance.
The ANN model combined with BorderlineSMOTE emerged as the most effective approach, achieving an accuracy of 97% and a weighted average precision and recall of 96%. This model demonstrated strong performance across all classes, including minority ones, making it a robust tool for real-world applications where security accuracy is paramount. The integration of SMOTE and ADASYN also provided improvements, though to varying degrees, with SMOTE showing better overall performance in balancing the dataset than ADASYN.
Moreover, the real-time performance metrics indicate that the integration of these balancing techniques not only improves classification accuracy but also enhances the model’s efficiency in processing real-time data. This dual benefit underscores the critical importance of addressing class imbalance in machine learning models, particularly in cybersecurity contexts where the cost of misclassification can be severe.
The research further highlights that combining ANNs with data balancing techniques not only improves overall accuracy but also ensures that the model performs consistently across all classes, including those that are underrepresented.

6. Discussion

The findings of this study offer valuable insights into the application of Artificial Neural Networks (ANNs) combined with advanced data balancing techniques—specifically SMOTE, ADASYN, and BorderlineSMOTE—for the classification of firewall packets. The results not only demonstrate the robustness of ANNs in this context but also emphasize the critical importance of addressing class imbalance to enhance model performance across all categories.
Our analysis revealed that integrating ANNs with BorderlineSMOTE resulted in the most effective model, achieving an accuracy of 97% with a weighted average precision and recall of 96%. This model’s strong and consistent performance across all classes, including minority ones, is further underscored by its real-time efficiency, as evidenced by the latency and throughput metrics in Table 8. With a latency of 0.107312 s per prediction and a throughput of 510.53 samples per second, the ANN + BorderlineSMOTE model not only achieves high classification accuracy but also demonstrates superior real-time processing capabilities, making it a robust tool for practical network security applications.
In comparison to prior studies, such as the Random Forest (RF) model used by AL-Tarawneh and Bani-Salameh [8], which achieved slightly higher accuracy at 99.7%, our approach offers a more balanced performance across all classes, particularly those that are underrepresented. The RF model’s lower recall (0.85) indicates potential struggles with accurately identifying minority classes, a challenge our model effectively addresses through the use of BorderlineSMOTE.
Ertam and Kaya [9] explored Support Vector Machines (SVMs) with different activation functions, achieving a high recall of 0.985 with the sigmoid function, but their results were less consistent across metrics. In contrast, our model with BorderlineSMOTE maintained a more balanced performance, achieving high precision, recall, and F1-Scores across all classes. Furthermore, our model exhibited lower latency and higher throughput compared to traditional methods, making it more suitable for real-time applications.
The study by AL-Behadili [19], which reported a 99.839% accuracy using Decision Trees (DTs), highlights the necessity of addressing class imbalance, an area that our research specifically targeted. While their model achieved high accuracy, it did not address the complexities of imbalanced data, which are critical for ensuring robust model performance. Our findings indicate that the integration of data balancing techniques like BorderlineSMOTE can significantly improve the model’s ability to handle such challenges, as reflected in both the classification accuracy and real-time performance.
Similarly, Rahman et al. [25] achieved 99% accuracy using a combination of SVM, Random Forest, k-NN, and Logistic Regression. However, their lack of focus on class imbalance limits the generalizability of their findings to datasets with underrepresented classes. Our approach, which integrates ANNs with data balancing techniques, particularly BorderlineSMOTE, provides a more comprehensive solution, ensuring balanced classification across all classes and demonstrating efficiency in real-time processing.
Sharma et al. [31] employed various classifiers, achieving 99.8% accuracy with a stacking ensemble, but the direct handling of class imbalance through techniques like BorderlineSMOTE, as demonstrated in our study, offers a more targeted approach to ensuring consistent performance across all classes, particularly the minority ones. Additionally, our model’s favorable real-time metrics highlight its practicality in real-world security scenarios where quick and accurate decision-making is essential.
The comparison with prior studies underscores the importance of not only achieving high accuracy but also ensuring that models perform consistently across all classes, particularly in imbalanced datasets. Our findings suggest that the integration of ANNs with targeted data balancing techniques like BorderlineSMOTE offers a robust solution for firewall packet classification, especially in applications where the correct classification of minority classes is critical. The added benefit of superior real-time performance further strengthens the case for using such integrated approaches in operational environments.
Future research could explore further combinations of ANNs with other advanced techniques, such as ensemble methods or hybrid approaches that incorporate different data balancing strategies. Additionally, expanding the scope of datasets to include more diverse network traffic scenarios could enhance the generalizability of these findings. Addressing these aspects will contribute to the development of more sophisticated and effective network security solutions.
In conclusion, while our study’s accuracy is slightly lower than some of the highest-reported figures in the literature, the balanced and consistent performance across all metrics, combined with the enhanced real-time processing capabilities, makes it a compelling approach for firewall packet classification. This study highlights the importance of addressing class imbalance and provides a foundation for future exploration in this area, contributing to the broader field of cybersecurity and machine learning.

Author Contributions

Conceptualization, A.K. and S.B.; methodology, A.K.; software, A.K., S.B., S.K. and T.T.; validation, A.K., S.B., T.T. and T.I.; investigation, A.K., S.B., T.T., S.K. and T.I.; writing—original draft preparation, A.K., S.B., T.T., S.K. and T.I.; writing—review and editing, A.K., S.B., T.T., S.K. and T.I.; visualization, A.K., S.B., T.T. and T.I. All authors have read and agreed to the published version of the manuscript.

Funding

This study is partially financed by the European Union-NextGenerationEU, through the National Recovery and Resilience Plan of the Republic of Bulgaria, project № BG-RRP-2.013-0001-C01.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Three datasets were collected from publicly available sources; they can be found at (1) UCI Machine Learning Repository. https://doi.org/10.24432/C5131M (accessed on 10 July 2023).

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Pang, B.; Fu, Y.; Ren, S.; Shen, S.; Wang, Y.; Liao, Q.; Jia, Y. A multi-modal approach for context-aware network traffic classification. In Proceedings of the ICASSP 2023—2023 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Rhodes Island, Greece, 4–10 June 2023; pp. 1–5. [Google Scholar] [CrossRef]
  2. Gupta, B.B.; Badve, O.P. Taxonomy of DoS and DDoS attacks and desirable defense mechanism in a cloud computing environment. Neural Comput. Appl. 2017, 28, 3655–3682. [Google Scholar] [CrossRef]
  3. DeCarlo, A.L.; Ferrell, R.G. The 5 Different Types of Firewalls Explained. SearchSecurity. 2021. Available online: https://www.techtarget.com/searchsecurity/feature/The-five-different-types-of-firewalls (accessed on 10 July 2023).
  4. Indeed.com. What Is Packet Filtering? (Benefits and Types). 2022. Available online: https://www.indeed.com/career-advice/career-development/packet-filtering (accessed on 10 July 2023).
  5. Khunkitti, A.; Chongsujjatham, P. A rule-based training for artificial neural network packet filtering Firewall. In Proceedings of the 2019 6th International Conference on Systems and Informatics (ICSAI), Shanghai, China, 2–4 November 2019; pp. 1010–1014. [Google Scholar] [CrossRef]
  6. Aljabri, M.; Alahmadi, A.A.; Mohammad, R.M.A.; Aboulnour, M.; Alomari, D.M.; Almotiri, S.H. Classification of firewall log data using multiclass machine learning models. Electronics 2022, 11, 1851. [Google Scholar] [CrossRef]
  7. Sommer, R.; Paxson, V. Outside the closed world: On using machine learning for network intrusion detection. In Proceedings of the 2010 IEEE Symposium on Security and Privacy, Oakland, CA, USA, 16–19 May 2010; pp. 305–316. [Google Scholar] [CrossRef]
  8. AL-Tarawneh, B.A.; Bani-Salameh, H. Classification of firewall logs actions using machine learning techniques and deep neural network. AIP Conf. Proc. 2023, 2979, 050003. [Google Scholar] [CrossRef]
  9. Ertam, F.; Kaya, M. Classification of firewall log files with multiclass support vector machine. In Proceedings of the 2018 6th International Symposium on Digital Forensic and Security (ISDFS), Antalya, Turkey, 22–25 March 2018; pp. 1–4. [Google Scholar] [CrossRef]
  10. Turčaník, M. Packet filtering by artificial neural network. In Proceedings of the International Conference on Military Technologies (ICMT), Brno, Czech Republic, 19–21 May 2015; pp. 1–4. [Google Scholar] [CrossRef]
  11. Valentin, K.; Malý, M. Network firewall using artificial neural networks. Comput. Inform. 2013, 32, 1312–1327. Available online: https://www.cai.sk/ojs/index.php/cai/article/view/2167 (accessed on 10 July 2023).
  12. Şenol, A.; Talan, T.; Aktürk, C. A new hybrid feature reduction method by using MCMSTClustering algorithm with various feature projection methods: A case study on sleep disorder diagnosis. Signal Image Video Process. 2024, 18, 4589–4603. [Google Scholar] [CrossRef]
  13. Bishop, C.M. Pattern Recognition and Machine Learning; Springer: Berlin/Heidelberg, Germany, 2006; Volume 2, pp. 5–43. [Google Scholar]
  14. Garcia-Teodoro, P.; Diaz-Verdejo, J.; Maciá-Fernández, G.; Vázquez, E. Anomaly-based network intrusion detection: Techniques, systems and challenges. Comput. Secur. 2009, 28, 18–28. [Google Scholar] [CrossRef]
  15. Ucar, E.; Ozhan, E. The analysis of firewall policy through machine learning and data mining. Wirel. Pers. Commun. 2017, 96, 2891–2909. [Google Scholar] [CrossRef]
  16. Naseer, S.; Saleem, Y.; Khalid, S.; Bashir, M.K.; Han, J.; Iqbal, M.M.; Han, K. Enhanced network anomaly detection based on deep neural networks. IEEE Access 2018, 6, 48231–48246. [Google Scholar] [CrossRef]
  17. Salman, O.; Elhajj, I.H.; Kayssi, A.; Chehab, A. A review on machine learning–based approaches for Internet traffic classification. Ann. Telecommun. 2020, 75, 673–710. [Google Scholar] [CrossRef]
  18. Zhao, Q.; Sun, J.; Ren, H.; Sun, G. Machine-learning based TCP security action prediction. In Proceedings of the 2020 5th International Conference on Mechanical, Control and Computer Engineering (ICMCCE), Harbin, China, 25–27 December 2020; pp. 1329–1333. [Google Scholar] [CrossRef]
  19. AL-Behadili, H.N.K. Decision tree for multiclass classification of firewall access. Int. J. Intell. Eng. Syst. 2021, 14, 294–302. [Google Scholar] [CrossRef]
  20. Andalib, A.; Babamir, S.M. Anomaly detection of policies in distributed firewalls using data log analysis. J. Supercomput. 2023, 79, 19473–19514. [Google Scholar] [CrossRef]
  21. Enosh Shaoul, P.; Sonare, S. IoT network attack detection and classification using standardized recurrent neural network model. Int. J. Adv. Eng. Manag. 2023, 5, 157–164. [Google Scholar]
  22. Zhang, H.; Min, Y.; Liu, S.; Tong, H.; Li, Y.; Lv, Z. Improve the security of industrial control system: A fine-grained classification method for DoS attacks on Modbus/TCP. Mob. Netw. Appl. 2023, 28, 839–852. [Google Scholar] [CrossRef]
  23. Al-Haija, Q.A.; Ishtaiwi, A. Multiclass classification of firewall log files using shallow neural network for network security applications. In Soft Computing for Security Applications: Proceedings of ICSCS 2021; Ranganathan, Y., Fernando, X., Shi, F., El Allioui, Eds.; Springer: Berlin/Heidelberg, Germany, 2021; pp. 27–41. [Google Scholar] [CrossRef]
  24. Marques, C.; Malta, S.; Magalhães, J. DNS firewall based on machine learning. Future Internet 2021, 13, 309. [Google Scholar] [CrossRef]
  25. Rahman, M.H.; Islam, T.; Rana, M.M.; Tasnim, R.; Mona, T.R.; Sakib, M.M. Machine learning approach on multiclass classification of internet firewall log files. In Proceedings of the 2023 International Conference on Computational Intelligence and Sustainable Engineering Solutions (CISES), Greater Noida, India, 28–30 April 2023; pp. 358–364. [Google Scholar] [CrossRef]
  26. Bakro, M.; Kumar, R.R.; Alabrah, A.A.; Ashraf, Z.; Bisoy, S.K.; Parveen, N.; Khawatmi, S.; Abdelsalam, A. Efficient Intrusion Detection System in the Cloud Using Fusion Feature Selection Approaches and an Ensemble Classifier. Electronics 2023, 12, 2427. [Google Scholar] [CrossRef]
  27. Bakro, M.; Kumar, R.R.; Alabrah, A.; Ashraf, Z.; Ahmed, M.N.; Shameem, M.; Abdelsalam, A. An Improved Design for a Cloud Intrusion Detection System Using Hybrid Features Selection Approach with ML Classifier. IEEE Access 2023, 11, 64228–64247. [Google Scholar] [CrossRef]
  28. Bengio, Y. Learning Deep Architectures for AI. Found. Trends® Mach. Learn. 2009, 2, 1–127. [Google Scholar] [CrossRef]
  29. LeCun, Y.; Bengio, Y.; Hinton, G. Deep learning. Nature 2015, 521, 436–444. [Google Scholar] [CrossRef] [PubMed]
  30. Lotfollahi, M.; Jafari Siavoshani, M.; Shirali Hossein Zade, R.; Saberian, M. Deep packet: A novel approach for encrypted traffic classification using deep learning. Soft Comput. 2020, 24, 1999–2012. [Google Scholar] [CrossRef]
  31. Sharma, D.; Wason, V.; Johri, P. Optimized classification of firewall log data using heterogeneous ensemble techniques. In Proceedings of the 2021 International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 4–5 March 2021; pp. 368–372. [Google Scholar] [CrossRef]
  32. Allagi, S.; Rachh, R. Analysis of network log data using machine learning. In Proceedings of the 2019 IEEE 5th International Conference for Convergence in Technology (I2CT), Bombay, India, 29–31 March 2019; pp. 1–3. [Google Scholar] [CrossRef]
  33. As-Suhbani, H.E.; Khamitkar, S.D. Classification of firewall logs using supervised machine learning algorithms. Int. J. Comput. Sci. Eng. 2019, 7, 301–304. [Google Scholar] [CrossRef]
  34. Cao, Q.; Qiao, Y.; Lyu, Z. Machine learning to detect anomalies in web log analysis. In Proceedings of the 2017 3rd IEEE International Conference on Computer and Communications (ICCC), Chengdu, China, 13–16 December 2017; pp. 519–523. [Google Scholar] [CrossRef]
  35. Schindler, T. Anomaly Detection in Log Data using Graph Databases and Machine Learning to Defend Advanced Persistent Threats. arXiv 2018, arXiv:1802.00259. [Google Scholar] [CrossRef]
  36. UCI. Internet Firewall Data. UCI Machine Learning Repository; University of California, Irvine: Irvine, CA, USA, 2019. [Google Scholar] [CrossRef]
  37. He, K.; Zhang, X.; Ren, S.; Sun, J. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, 27–30 June 2016; pp. 770–778. [Google Scholar]
  38. Egrioglu, E.; Aladag, C.H.; Yolcu, U.; Uslu, V.R.; Basaran, M.A. A new approach based on artificial neural networks for high order multivariate fuzzy time series. Expert Syst. Appl. 2009, 36, 10589–10594. [Google Scholar] [CrossRef]
  39. Haykin, S. Neural Networks and Learning Machines; Pearson Education: Chennai, India, 2010; pp. 10–11. [Google Scholar]
  40. Öztemel, E. Yapay Sinir Ağlari; Papatya Yayincilik: Istanbul, Turkey, 2012. [Google Scholar]
  41. Goodfellow, I.; Bengio, Y.; Courville, A. Deep Learning; MIT Press: Cambridge, MA, USA, 2016. [Google Scholar]
  42. Chawla, N.V.; Bowyer, K.W.; Hall, L.O.; Kegelmeyer, W.P. SMOTE: Synthetic minority over-sampling technique. J. Artif. Intell. Res. 2002, 16, 321–357. [Google Scholar] [CrossRef]
  43. Amirruddin, A.D.; Muharam, F.M.; Ismail, M.H.; Tan, N.P.; Ismail, M.F. Synthetic minority over-sampling technique (SMOTE) and logistic model tree (LMT)-adaptive boosting algorithms for classifying imbalanced datasets of nutrient and chlorophyll sufficiency levels of oil palm (Elaeis guineensis) using spectroradiometers and unmanned aerial vehicles. Comput. Electron. Agric. 2022, 193, 106646. [Google Scholar] [CrossRef]
  44. Hu, F.; Li, H. A novel boundary oversampling algorithm based on Neighborhood rough set model: NRSBoundary-SMOTE. Math. Probl. Eng. 2013, 2013, 694809. [Google Scholar] [CrossRef]
  45. Han, H.; Wang, W.Y.; Mao, B.H. Borderline-SMOTE: A new over-sampling method in imbalanced data sets learning. In International Conference on Intelligent Computing; Springer: Berlin/Heidelberg, Germany, 2005; pp. 878–887. [Google Scholar]
  46. Lee, T.; Kim, M.; Kim, S.P. Data augmentation effects using borderline-SMOTE on classification of a P300-based BCI. In Proceedings of the 2020 8th International Winter Conference on Brain-Computer Interface (BCI), Gangwon, Republic of Korea, 26–28 February 2020; IEEE: Piscataway, NJ, USA, 2020; pp. 1–4. [Google Scholar] [CrossRef]
  47. Dey, I.; Pratap, V. A comparative study of SMOTE, borderline-SMOTE, and ADASYN oversampling techniques using different classifiers. In Proceedings of the 2023 3rd International Conference on Smart Data Intelligence (ICSMDI), Trichy, India, 30–31 March 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 294–302. [Google Scholar] [CrossRef]
  48. He, H.; Bai, Y.; Garcia, E.A.; Li, S. ADASYN: Adaptive synthetic sampling approach for imbalanced learning. In Proceedings of the 2008 IEEE International Joint Conference on Neural Networks (IEEE World Congress on Computational Intelligence), Hong Kong, 1–8 June 2008; IEEE: Piscataway, NJ, USA, 2008; pp. 1322–1328. [Google Scholar] [CrossRef]
  49. Gosain, A.; Sardana, S. Handling class imbalance problem using oversampling techniques: A review. In Proceedings of the 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), Udupi, India, 13–16 September 2017; IEEE: Piscataway, NJ, USA, 2017; pp. 79–85. [Google Scholar] [CrossRef]
Figure 1. Firewall dataset classification attribute data distribution.
Figure 2. Research methodology steps.
Figure 3. Artificial neural network structure.
Figure 4. Minority class member generation diagram based on SMOTE algorithm [44].
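The member-generation step illustrated in Figure 4 can be sketched in a few lines: for each new sample, SMOTE picks a minority point, selects one of its k nearest minority neighbors, and places a synthetic point at a random position on the line segment between them. A minimal pure-Python sketch, assuming invented `minority` points and illustrative `k` and `n_new` values (in practice one would typically use `imblearn.over_sampling.SMOTE`):

```python
import random
from math import dist  # Euclidean distance, Python 3.8+

def smote_sample(minority, k=2, n_new=4, seed=42):
    """Generate n_new synthetic minority samples by SMOTE-style
    interpolation between a point and one of its k nearest neighbors."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        x = rng.choice(minority)
        # k nearest minority neighbors of x (excluding x itself)
        neighbors = sorted((p for p in minority if p != x),
                           key=lambda p: dist(x, p))[:k]
        nn = rng.choice(neighbors)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append(tuple(xi + gap * (ni - xi)
                               for xi, ni in zip(x, nn)))
    return synthetic

# Invented two-dimensional minority-class points for illustration
minority = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.3), (1.1, 1.1)]
new_points = smote_sample(minority)
print(len(new_points))  # 4 synthetic samples inside the minority region
```

Because each synthetic point lies between two existing minority points, it stays inside the minority region rather than duplicating existing samples, which is what distinguishes SMOTE from naive over-sampling.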
Figure 5. Confusion matrix output of the study.
Figure 6. Accuracy and loss graphs of training and validation metrics of ANN’s analysis.
Figure 7. Confusion matrix output of study with ANNs + SMOTE function.
Figure 8. Accuracy and loss graphs of training and validation metrics of ANNs + SMOTE analysis.
Figure 9. Confusion matrix output of study with ANNs + ADASYN function.
Figure 10. Accuracy and loss graphs of training and validation metrics of ANNs + ADASYN analysis.
Figure 11. Confusion matrix output of study with ANNs + BorderlineSMOTE function.
Figure 12. Accuracy and loss graphs of training and validation metrics of ANNs + BorderlineSMOTE analysis.
Table 1. Success rates of ANN and machine learning applications for firewall packet classification.
| Authors | Dataset | Data Preprocessing | Number of Features | Models | Results (Accuracy) |
|---|---|---|---|---|---|
| Aljabri et al. [6] | Private data extracted from a firewall | Under-sampling | 11 features, 13 features | K-Nearest Neighbor (KNN), Naive Bayes (NB), J48, RF, and ANN | (RF) 99.6 |
| AL-Tarawneh & Bani-Salameh [8] | Public network log from Firat University | - | 11 features | KNN, RF, and DNN | (RF) 99.7 |
| Ertam & Kaya [9] | Public network log from Firat University | - | 11 features | SVM with linear, polynomial, RBF, and sigmoid activation functions | (SVM with sigmoid) 98.5 |
| Ucar & Ozhan [15] | Private data extracted from the firewall | - | 17 features | Naive Bayes, KNN, Decision Table, and HyperPipes | (KNN) 100 |
| AL-Behadili [19] | Public network log from Firat University | - | 11 features | DT, SVM, One R, ANN, PSO, and ZeroR | (DT) 99.8 |
| Andalib & Babamir [20] | Their data | - | 17 features | DT, DNN, SVM | (Combined method) 97 |
| Marques et al. [24] | Their data | - | 34 features | SVM, Logistic Regression (LR), Linear Discriminant Analysis (LDA), KNN, CART, NB | (CART) above 96% |
| Rahman et al. [25] | Public network log from Firat University | - | 11 features | SVM, RF, k-NN, Logistic Regression | (RF) 99% |
| Sharma et al. [31] | Public network log from Firat University | - | 11 features | KNN, LR, SVM, DT, and stochastic gradient descent classifier | (Stacked ensemble classifier) 98.5 |
| Allagi & Rachh [32] | Public network log from UCI machine learning repository | - | - | K-means with SOFM algorithms | 97.2 |
| As-Suhbani & Khamitkar [33] | Private network logs from their department | - | Six features | NB, KNN, One R, and J48 | (KNN) 99.87 |
| Cao et al. [34] | Private security company network log | - | Six features | First level: SVM, LR, or DT; second level: HMM | 93.5 |
| Schindler [35] | Public KDD-Cup 99/DARPA | - | - | Multi-class and one-class SVMs | (One-class SVMs) 98.67 |
Table 2. Dataset information.
| Feature | Mean | Std | Min | 25% | 50% | 75% | Max |
|---|---|---|---|---|---|---|---|
| SourcePort | 4.94 × 10^10 | 1.53 × 10^10 | 0.0 | 49,183.0 | 53,776.5 | 58,638.00 | 6.55 × 10^10 |
| DestinationPort | 1.06 × 10^10 | 1.85 × 10^10 | 0.0 | 80.0 | 445.0 | 15,000.00 | 6.55 × 10^10 |
| NATSourcePort | 1.93 × 10^10 | 2.20 × 10^10 | 0.0 | 0.0 | 8820.5 | 38,366.25 | 6.55 × 10^10 |
| NATDestinationPort | 2.67 × 10^10 | 9.74 × 10^9 | 0.0 | 0.0 | 53.0 | 443.00 | 6.55 × 10^10 |
| Bytes | 9.71 × 10^10 | 5.62 × 10^12 | 60.0 | 66.0 | 168.0 | 752.25 | 1.27 × 10^15 |
| BytesSent | 2.24 × 10^10 | 3.83 × 10^12 | 60.0 | 66.0 | 90.0 | 210.00 | 9.48 × 10^14 |
| BytesReceived | 7.47 × 10^10 | 2.46 × 10^12 | 0.0 | 0.0 | 79.0 | 449.00 | 3.21 × 10^14 |
| Packets | 1.03 × 10^8 | 5.13 × 10^9 | 1.0 | 1.0 | 2.0 | 6.00 | 1.04 × 10^12 |
| ElapsedTime (s) | 6.58 × 10^7 | 3.02 × 10^8 | 0.0 | 0.0 | 15.0 | 30.00 | 1.08 × 10^10 |
| pkts_sent | 4.14 × 10^7 | 3.22 × 10^9 | 1.0 | 1.0 | 1.0 | 3.00 | 7.48 × 10^11 |
| pkts_received | 6.15 × 10^7 | 2.22 × 10^10 | 0.0 | 0.0 | 1.0 | 2.00 | 3.27 × 10^11 |
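Descriptive statistics of the kind shown in Table 2 can be reproduced with the Python standard library alone. The sketch below mirrors the table's columns; the `packets` values are invented for illustration and are not drawn from the actual dataset:

```python
import statistics

def describe(values):
    """Summary statistics in the layout of Table 2:
    mean, std, min, 25%, 50%, 75%, max."""
    q1, q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    return {
        "mean": statistics.mean(values),
        "std": statistics.stdev(values),
        "min": min(values),
        "25%": q1,
        "50%": q2,
        "75%": q3,
        "max": max(values),
    }

# Invented per-session packet counts: mostly tiny values plus one outlier,
# echoing the heavy skew visible in Table 2 (median 2 vs. a huge max)
packets = [1, 1, 2, 2, 3, 6, 10, 1000]
stats = describe(packets)
print(stats["50%"])  # 2.5
```

The same skew explains why the means in Table 2 sit far above the medians: a handful of extreme sessions dominate the averages.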
Table 3. Confusion matrix.
| | Predicted: Yes | Predicted: No |
|---|---|---|
| Actual: Yes | TP | FN |
| Actual: No | FP | TN |
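The precision, recall, and F1-Score figures reported in the following tables are derived per class from these four confusion-matrix cells. A minimal sketch with illustrative counts (not values from the study):

```python
def precision_recall_f1(tp, fp, fn):
    """Derive the per-class metrics of Tables 4-7 from the
    confusion-matrix cells of Table 3."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Illustrative counts for a single class
p, r, f1 = precision_recall_f1(tp=90, fp=10, fn=30)
print(round(p, 2), round(r, 2), round(f1, 2))  # 0.9 0.75 0.82
```

Note the zero-division guards: for a class the model never predicts correctly (as with 'reset-both' in Table 4), all three metrics degenerate to 0.00.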
Table 4. Numerical results of the experimental study.
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| allow | 1.00 | 1.00 | 1.00 | 7545 |
| deny | 0.99 | 1.00 | 1.00 | 2994 |
| drop | 1.00 | 1.00 | 1.00 | 2562 |
| reset-both | 0.00 | 0.00 | 0.00 | 6 |
| accuracy | | | 1.00 | 13,107 |
| weighted avg | 1.00 | 1.00 | 1.00 | 13,107 |
| macro avg | 0.75 | 0.75 | 0.75 | 13,107 |
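The gap between the two averages is what exposes the class imbalance: the macro average weights all four classes equally, so the failed 'reset-both' class drags it down to 0.75, while the weighted average is dominated by the three large classes and stays near 1.00. A brief sketch using the per-class F1-Scores and supports from Table 4:

```python
def macro_and_weighted(scores, supports):
    """Macro average treats every class equally; weighted average
    weights each class by its support."""
    macro = sum(scores) / len(scores)
    total = sum(supports)
    weighted = sum(s * n for s, n in zip(scores, supports)) / total
    return macro, weighted

# Per-class F1-Scores and supports: allow, deny, drop, reset-both
f1 = [1.00, 1.00, 1.00, 0.00]
support = [7545, 2994, 2562, 6]
macro, weighted = macro_and_weighted(f1, support)
print(round(macro, 2), round(weighted, 2))  # 0.75 1.0
```

This is why the data-balancing experiments that follow track both averages rather than accuracy alone.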
Table 5. Numerical results of the experimental study with ANNs + SMOTE function.
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| allow | 1.00 | 1.00 | 1.00 | 6005 |
| deny | 0.85 | 0.75 | 0.80 | 6174 |
| drop | 1.00 | 1.00 | 1.00 | 5956 |
| reset-both | 0.77 | 0.87 | 0.82 | 5941 |
| accuracy | | | 0.90 | 24,076 |
| weighted avg | 0.91 | 0.90 | 0.90 | 24,076 |
| macro avg | 0.90 | 0.90 | 0.90 | 24,076 |
Table 6. Numerical results of the experimental study with ANNs + ADASYN function.
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| allow | 1.00 | 0.99 | 0.99 | 6024 |
| deny | 0.78 | 0.60 | 0.68 | 6189 |
| drop | 0.88 | 1.00 | 0.94 | 5988 |
| reset-both | 0.75 | 0.83 | 0.79 | 5875 |
| accuracy | | | 0.85 | 24,076 |
| weighted avg | 0.85 | 0.86 | 0.85 | 24,076 |
| macro avg | 0.85 | 0.85 | 0.85 | 24,076 |
Table 7. Numerical results of the experimental study with ANNs + BorderlineSMOTE function.
| Class | Precision | Recall | F1-Score | Support |
|---|---|---|---|---|
| allow | 1.00 | 0.99 | 1.00 | 6048 |
| deny | 0.99 | 0.91 | 0.95 | 6049 |
| drop | 0.85 | 1.00 | 0.92 | 2122 |
| reset-both | 0.97 | 1.00 | 0.98 | 5896 |
| accuracy | | | 0.97 | 20,115 |
| weighted avg | 0.95 | 0.97 | 0.96 | 20,115 |
| macro avg | 0.97 | 0.97 | 0.97 | 20,115 |
Table 8. Real-time performance metrics of different ANN models for firewall packet classification.
| Model | Latency per Prediction | Throughput (Samples/Second) |
|---|---|---|
| ANNs | 0.355833 s | 360.58 |
| ANNs + SMOTE | 0.107679 s | 406.59 |
| ANNs + BorderlineSMOTE | 0.107312 s | 510.53 |
| ANNs + ADASYN | 0.089748 s | 444.19 |
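Latency and throughput figures of the kind in Table 8 can be collected with `time.perf_counter`. The sketch below is a generic harness, not the study's actual benchmark; `dummy_predict` and the batch size are hypothetical stand-ins for a trained model's predict call:

```python
import time

def measure(predict, batch, repeats=100):
    """Average per-call latency (seconds) and throughput
    (samples/second) for a batch-prediction callable."""
    start = time.perf_counter()
    for _ in range(repeats):
        predict(batch)
    elapsed = time.perf_counter() - start
    latency = elapsed / repeats                  # seconds per call
    throughput = repeats * len(batch) / elapsed  # samples per second
    return latency, throughput

# Hypothetical stand-in for a trained model's predict method
dummy_predict = lambda xs: [0 for _ in xs]
latency, throughput = measure(dummy_predict, batch=[None] * 32)
print(f"{latency:.6f} s, {throughput:.1f} samples/s")
```

Averaging over many repeats, as here, smooths out timer resolution and one-off scheduling noise, which matters when per-call latencies are fractions of a millisecond.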
Table 9. Comparison of experimental studies on firewall packet classification with similar ones.
Table 9. Comparison of experimental studies on firewall packet classification with similar ones.
AuthorsDatasetNumber of FeaturesModelsResults (Accuracy)
AL-Tarawneh & Bani-Salameh [8]Public network log from Firat University11 featuresK-Nearest Neighbor
(KNN), Random Forest (RF), and Deep Neural Network (DNN) classifiers
The accuracy for RF is 99.7%; for KNN, it is 99.3%; and for DNN, it is 49.47%
RF precision 1.00, recall %85
Ertam & Kaya [9]SVM with Linear, polynomial, sigmoid, and RBF activation functions.SVM with the linear activation precision 0.675, recall (SVM with the Sigmoid) 0.985, SVM with the RBF activation F1-Score 0.764
AL-Behadili [19]DT, SVM, One R, ANN, PSO, and ZeroR.Accuracy: (DT) 99.839%
Rahman et al. [25]SVM, RF, k-NN, Logistic RegressionAccuracy: 99%
Sharma et al. [31]KNN, LR, SVM, DT, and stochastic gradient descent classifier. Plus, stacking ensemble with RF as its metaPrecision: (DT) 87% Precision (Stacking Ensemble): 91%
Accuracy: (Stacking Ensemble) 99.8%
Our StudyANN, SMOTE + ANN, ADASYN + ANN, BorderlineSMOTE + ANNANN + BorderlineSMOTE Accuracy %97
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Korkmaz, A.; Bulut, S.; Talan, T.; Kosunalp, S.; Iliev, T. Enhancing Firewall Packet Classification through Artificial Neural Networks and Synthetic Minority Over-Sampling Technique: An Innovative Approach with Evaluative Comparison. Appl. Sci. 2024, 14, 7426. https://doi.org/10.3390/app14167426

