Enhancing IoT Network Security Using Feature Selection for Intrusion Detection Systems

Almohaimeed, Muhannad; Albalwy, Faisal

doi:10.3390/app142411966

by Muhannad Almohaimeed ¹ and Faisal Albalwy ²,*

¹ Department of Information Systems, College of Computer Science and Engineering, Taibah University, Madinah 42353, Saudi Arabia

² Department of Cybersecurity, College of Computer Science and Engineering, Taibah University, Madinah 42353, Saudi Arabia

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(24), 11966; https://doi.org/10.3390/app142411966

Submission received: 18 October 2024 / Revised: 12 December 2024 / Accepted: 17 December 2024 / Published: 20 December 2024

(This article belongs to the Special Issue Applications in Neural and Symbolic Artificial Intelligence)

Abstract:

The Internet of Things (IoT) connects people, devices, and processes in multiple ways, resulting in the rapid transformation of several industries. Alongside its benefits, the IoT presents various challenges that must be overcome. Because IoT devices are often resource-constrained and deployed in insecure environments, their proliferation raises several security concerns. Given these vulnerabilities, this paper presents criteria for identifying the features most closely related to such vulnerabilities to help enhance anomaly-based intrusion detection systems (IDSs). This study uses the RT-IoT2022 dataset, sourced from the UCI Machine Learning Repository, which was specifically developed for real-time IoT intrusion detection tasks. Feature selection is performed by combining information gain, gain ratio, correlation-based feature selection, Pearson’s correlation analysis, and symmetrical uncertainty. This approach offers new insights into detecting and mitigating IoT-based threats by analyzing the major correlations between features of the network and specific types of attacks, such as the relationship between ‘fwd_init_window_size’ and SYN flood attacks. The proposed IDS framework is accurate, can be integrated with real-time applications, and provides a robust solution to IoT security threats. The selected features can be applied to machine learning and deep learning classifiers to further enhance detection capabilities in IoT environments.

Keywords:

IoT security; intrusion detection systems; feature selection; machine learning; real-time IoT monitoring

1. Introduction

Rapid growth in the Internet of Things (IoT) has enabled automation and an unparalleled degree of interconnection and real-time analysis of disparate data, driving a revolution in several industries such as manufacturing, transportation, healthcare, and smart homes [1]. Intelligent sensing has been embedded into daily objects, and the efficiency of its application has been enhanced through IoT devices. Although IoT devices have become an important component of system infrastructure, a high level of connectivity introduces several security challenges because the system is then exposed to various risks. Owing to their deployment in insecure environments, IoT devices may be exposed to unauthorized access and several types of attacks, such as denial-of-service attacks [2,3]. Generally, traditional measures of security demand a high level of resources, but IoT devices have a limited level of computational resources; thus, these measures cannot be implemented effectively [4]. Both IoT devices and the networks in which they are enrolled must be protected to maintain the continuously growing reliance on IoT systems. A lack of security may have severe effects on data privacy, physical safety, and operational continuity [5]. To avoid the negative consequences resulting from a lack of security, a robust, scalable, and efficient security solution is required [5].

IoT devices are often integrated into critical infrastructure and are deployed in remote and insecure environments. Therefore, they are exposed to significant security risks. Owing to limitations in memory, computational power, and energy, traditional measures of security, such as antivirus programs and firewalls, are not effective [6]. Moreover, the lack of homogeneity in IoT systems further complicates the development of useful security systems [7], and the vulnerabilities in these systems expose them to attackers. Considering various entry points, attackers can exploit IoT devices and infiltrate networks, potentially resulting in large-scale cyberattacks. With the current increase in the sophistication and level of cyber threats, security solutions must be tailored to protect IoT devices without stretching the limited availability of resources [8]. Overall, it is necessary to ensure the resilience and security of IoT systems.

Intrusion detection is an important component of IoT security that is designed to effectively identify and proactively mitigate potential threats and damage. Traditional methods rely on the predefined signatures of known attacks. Conversely, anomaly-based intrusion detection systems (IDSs) are more relevant for IoT systems. These systems detect both known and emerging threats by monitoring the real-time behavior of the network and flagging any deviations from established norms [9]. The dynamic and diverse operation of IoT ecosystems can be managed by anomaly-based IDSs, thereby generating a flexible and robust defense system and ultimately preventing the exploitation of the unique characteristics of IoT devices. Apart from identifying attack signatures, these systems monitor activity and focus on detecting unusual patterns. This generates an important layer of protection, thus complementing existing security systems and securing IoT networks from the growing range and risk level of cyber threats [10].

The RT-IoT2022 dataset used in this study contains a large number of instances with a variety of features related to network behavior [11]. Larger and more complex datasets may provide more valuable results regarding intrusion detection. However, with major increases in the datasets generated by IoT devices, the usual process of analyzing data has become a complicated task for investigators. Hence, there have been continuing efforts to adopt machine learning methodologies to investigate more complex security concerns.

In the present paper, we aim to facilitate research in intrusion prediction using a predictive model. This model is expected to be used as a reference for the implementation and integration of machine learning methods in network management related to IoT domain threats and security. The contributions of this study can be summarized as follows:

- A comprehensive feature selection framework that applies several feature selection methods to identify consistent, non-redundant, and relevant variables that help expose vulnerabilities in IoT environments.
- Optimized feature set generation, which investigates different thresholds for feature importance, combines the features selected by multiple methods, and produces a reduced feature set that enhances model performance.
- Improved IoT network security, demonstrated by showing that the reduced feature set significantly enhances the accuracy and efficiency of the multi-layer perceptron (MLP) classifier compared with the full feature set.

These contributions mark a significant advancement in enhancing the accuracy and efficiency of anomaly-based IDSs, thereby securing IoT ecosystems against increasingly sophisticated cyber threats.

The remainder of this paper is structured as follows. Section 2 reviews related work on IoT security and intrusion detection. Section 3 details the methods used for feature selection and dimensionality reduction. Section 4 presents the experimental setup and results, and Section 5 discusses the implications of the findings. Finally, Section 6 concludes the paper and suggests directions for future research.

2. Related Work

IDSs for the IoT have been extensively studied because of the unique challenges presented by IoT environments, including resource-constrained devices, diverse protocols, and the massive scale of network traffic. Feature selection is one of the critical steps in optimizing IDS performance because it helps reduce computational overhead and increase detection accuracy. To address these challenges, various approaches have been proposed in the literature, ranging from feature selection techniques to advanced machine learning models and hybrid frameworks. The following subsections review these works, categorizing them into feature selection methodologies, machine learning-based approaches, hybrid models, and innovative architectures for IoT security.

2.1. Feature Selection Techniques in Intrusion Detection Systems

Barbosa et al. proposed a method that utilizes Pareto dominance sets with mutual information and linear correlation to filter features for intrusion detection [12]. Their approach focused on balancing information quantity and correlation among features, which significantly reduced the feature set and achieved over 95% accuracy with only 14% of the original features, making it particularly suited for IoT applications.

Awad and Fraihat minimized the number of features by optimizing feature selection for the IDS in IoT networks, based on recursive feature elimination via cross-validation and decision trees [13]. The efficiency of the anomaly-based IDS was increased by reducing the feature count from 42 to 15 while maintaining an accuracy of 95.3% in the classification task.

Li et al. studied the trade-offs between feature selection and feature extraction [14]. They demonstrated that while feature extraction may contribute to detection accuracy, feature selection contributes to reducing computational complexity. They evaluated these trade-offs and their influence using specific IoT scenarios.

Jayasanker et al. proposed a two-stage approach, using a model based on dynamic search firework optimization for selecting features [15]. This helped the system focus on the most relevant features, thus eliminating irrelevant ones and enhancing the task of intrusion detection. The accuracy reached 96.11%, using benchmark datasets.

2.2. Machine Learning Models for Anomaly Detection

Musthafa et al. emphasized the influence of feature selection on accuracy when employing analysis of variance and ensemble machine learning techniques [16]. Accuracies of 96.92% and 99.77% were achieved using the UNSW-NB15 and NSL-KDD datasets, respectively, thus justifying the combination of feature selection criteria and ensemble learning models.

Johnson et al. explored feature selection techniques for IoT attack classification. Using OneVsRest (OVR) with machine learning models, such as extreme gradient boosting and random forest, their approach achieved 98.89% accuracy and optimized prediction times, highlighting its utility for real-time intrusion detection in IoT networks [17].

Lee et al. demonstrated the importance of ensemble machine learning models for IoT anomaly detection. Leveraging Bayesian hyperparameter optimization, their method addressed the heterogeneity of IoT datasets while significantly improving predictive accuracy, showcasing the role of feature selection in robust IDS frameworks [18].

2.3. Hybrid Approaches Combining Feature Selection and Machine Learning

Otokwala et al. introduced an optimized common feature selection and deep-autoencoder model, which operated on the MQTT-IoT-IDS2020 dataset and achieved an accuracy of 99% in lightweight intrusion detection [19]. Feature selection was incorporated, and memory usage was reduced to 2 kB.

Maseno and Wang combined a genetic algorithm with an extreme learning machine to create a hybrid wrapper feature selection method for intrusion detection [20]. This approach underscored the importance of selecting relevant features in the IDS and accurately detected intrusions using the ToN_IoT dataset.

Azimjonov and Kim developed a lightweight IDS specifically designed for IoT devices [21]. The IDS selected a subset of features based on their efficiency and detected cyberattacks based on stochastic gradient descent classifiers. The dimensionality of the dataset was reduced by 79.93% while the detection of cyberattacks was enhanced and computational complexity was minimized.

Xu et al. combined the Pearson correlation coefficient with random forest techniques for feature selection, addressing feature redundancy and improving the accuracy of IIoT anomaly detection. By integrating bidirectional gated recurrent units (BiGRU) and inception-CNN, they tackled challenges such as data imbalance, enhancing the overall performance of intrusion detection systems [22].

2.4. Innovative Architectures for IoT Security

Aljehane proposed a model named GSAFS-OQNN to classify intrusions in IoT environments. The optimal set of features was selected using a gravitational search algorithm [23]. By focusing on the relevant features, irrelevant features were eliminated, and Z-score normalization was applied to the selected features to improve the performance of a machine learning algorithm developed to classify intrusions.

Bakır and Ceviz combined hybrid feature selection criteria with hyperparameter tuning, based on a genetic algorithm [24]. This approach enhanced IDS performance by focusing on features according to their relevance, as demonstrated using the CICIDS2017 dataset.

Kim et al. used mutual information for feature selection and subspace clustering algorithms, such as CLIQUE and PROCLUS, to enhance anomaly detection. By integrating ensemble learning models, such as LightGBM and XGBoost, they achieved high accuracy and reduced the false-positive rates in IoT intrusion detection tasks [25].

3. Materials and Methods

3.1. Description of the Datasets and Methods

An open-source dataset, namely, RT-IoT2022, which is accessible via the UCI Machine Learning Repository [11], was used in this research. A large amount of network traffic data was derived from various real-time IoT devices such as Amazon Alexa (https://alexa.amazon.com, accessed on 15 October 2024) and MQTT (https://mqtt.org, accessed on 15 October 2024) to generate the dataset. The dataset captures the complexity involved in network traffic via 83 input features and an output feature determining the type of attack. The count of instances in the dataset is 123,117, each of which is classified as either normal or abnormal, and the attack pattern is identified for the abnormal cases. Several attack patterns have been captured, namely, address resolution protocol poisoning, denial-of-service (DoS) SYN Hping, brute-force SSH, Slowloris distributed denial-of-service (DDoS), and five different Nmap patterns. Because the RT-IoT2022 dataset provides comprehensive information about several factors related to IDSs, it is a valuable resource for developing a robust solution to security concerns [26].

Herein, a network identification model is proposed to identify the most important variables related to the IoT domain. As one of the important steps in the task of building the model, the dataset is first preprocessed and then cleaned to enhance performance. Several preprocessing methods were executed to check for missing values and to determine the appropriate format for each data type.

For this paper, an exhaustive feature screening process was deployed to recognize the most common predictors in IoT-based threat prediction. The proposed approach can be summarized into four main steps, as illustrated in Figure 1:

- Finding feature importance: we applied five distinct feature selection methods to ensure a robust and comprehensive feature elimination process. The selected methods were information gain, correlation-based feature subset selection (CFS), the gain ratio, symmetrical uncertainty, and Pearson’s analysis. These algorithms are among the most popular feature selection algorithms and are used in many areas [27,28]. For each method, an importance score was generated for each feature. Finally, the feature selection techniques were compared in terms of execution time, search method, and attribute evaluator.
- Investigating the effect of thresholds and the number of features: we used the feature importance scores from each method (obtained in the previous step) to set different threshold values (cutoffs), each of which yielded a set of features contributing a certain percentage of the total feature importance. For each feature selection method, the multi-layer perceptron (MLP) classifier was trained on the different feature sets and threshold values. We then compared the model performance across the different numbers of features to find the optimal cutoff. The MLP classifier is a neural network model that processes data through interconnected layers, utilizing non-linear transformations to solve complex tasks.
- Combining features generated from different feature selection methods: we used a frequency-based approach to retain the features that were selected by at least four feature selection methods. The most frequently occurring features were selected, and the linear relationships among them were measured using the correlation coefficient. This step produces the reduced set of features.
- Examining the effect of the reduced set of features in enhancing IoT network security: we trained an MLP classifier on both datasets (the full feature set and the reduced set generated in this study) to test whether the proposed feature selection methods can enhance the accuracy and efficiency of the classifier. A quantitative analysis of the experiments in terms of accuracy, precision, recall, and F1 score was then performed.
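The frequency-based combination step can be illustrated with a minimal sketch. The per-method selections below are invented for illustration only (the study reports only that ‘fwd_init_window_size’ was selected by all five methods), and the helper name `combine_by_frequency` is hypothetical.

```python
from collections import Counter

# Hypothetical per-method selections, made up for this example.
# They are NOT the feature sets reported in the study.
selected_by_method = {
    "cfs": {"fwd_init_window_size", "service", "id.resp_p"},
    "pearson": {"fwd_init_window_size", "service", "flow_SYN_flag_count", "idle.avg"},
    "gain_ratio": {"fwd_init_window_size", "service", "id.resp_p",
                   "flow_SYN_flag_count", "idle.avg"},
    "info_gain": {"fwd_init_window_size", "service", "id.resp_p", "flow_SYN_flag_count"},
    "symmetrical_uncertainty": {"fwd_init_window_size", "id.resp_p", "flow_SYN_flag_count"},
}

def combine_by_frequency(selections, min_methods=4):
    """Keep the features that were selected by at least `min_methods` methods."""
    counts = Counter(f for features in selections.values() for f in features)
    return sorted(f for f, n in counts.items() if n >= min_methods)

reduced = combine_by_frequency(selected_by_method)
print(reduced)
```

In this toy input, ‘idle.avg’ is chosen by only two methods and is therefore dropped, while the other four features pass the four-method cutoff.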

3.2. Feature Selection

Deriving valuable information from large-scale datasets is a topic of special interest in machine learning and data mining [29]. Feature selection is conducted to select an appropriate set of features and eliminate the irrelevant features from a dataset [16]. Minimizing the number of features enables effective data analysis [30]. For several tasks, such as clustering, classification, regression, and generating association rules, the usage of all features is not recommended. The number of features must be minimized because some act as noise and increase data redundancy. The selection of appropriate features reduces data dimensionality, decreases computational costs, and improves the performance of classification [31].

Feature selection techniques designed with different evaluation criteria are broadly divided into three categories: filter, wrapper, and embedded models. In filter-based methods, which were mainly used in this study, features are chosen based on statistical measures [31]. This technique does not rely on learning induction methods and instead selects the features in a preprocessing phase. In addition, filter-based methods have relatively low complexity and offer satisfactory stability and scalability [32]. Meanwhile, wrapper methods assess the performance of various combinations of input features to find the best combination for the model. This method is computationally expensive but achieves high accuracy [20]. Moreover, embedded feature selection methods have built-in mechanisms for choosing the most important features as part of the learning process [20]. Various filter-based methods, specifically CFS, information gain, the gain ratio, Pearson’s analysis, and symmetrical uncertainty, were applied in the present study.

Information gain is an entropy-based feature selection technique commonly used in the field of machine learning. It measures how much information a variable provides about the target class, thereby revealing the most informative features. Features with high information gain are highly relevant to the target class and are generally selected to obtain the best classification results. Nevertheless, ranking features by information gain does not eliminate redundant features, so an additional redundancy filter is needed. Information gain is derived from entropy, which measures the uncertainty of the class distribution; the gain is the reduction in that uncertainty once the value of a feature is known [33]. Information gain is bounded above by the entropy of the target class, which equals 1 bit for two equally likely classes and log2(k) bits for k equally likely classes.
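As a minimal sketch of this definition, information gain can be computed as the class entropy minus the class entropy that remains after grouping by a feature. The sketch assumes discrete feature values and uses toy data; the study itself computed information gain with Scikit-learn, which also handles continuous features, so this is an illustration rather than the authors' implementation.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(feature, labels):
    """Class entropy minus the expected class entropy within each feature value."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional

# Toy data: one feature separates the classes perfectly, the other not at all.
labels = ["normal", "normal", "attack", "attack"]
informative = ["low", "low", "high", "high"]
uninformative = ["a", "b", "a", "b"]

ig_informative = information_gain(informative, labels)
ig_uninformative = information_gain(uninformative, labels)
print(ig_informative, ig_uninformative)
```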

The gain ratio is a non-symmetrical measure that modifies the information gain to decrease its bias toward features with many distinct values. The gain ratio accounts for the number and size of branches when selecting a feature and adjusts the information gain by considering the intrinsic information of a split. Intrinsic information is the entropy of the distribution of instances across the branches. The gain ratio of a feature decreases as its intrinsic information becomes larger [34].
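The effect of this normalization can be seen in a short sketch, assuming discrete features and toy data. The identifier-like feature `flow_id` is hypothetical: it has a unique value per instance, so it reaches the same information gain as a genuinely useful feature but a lower gain ratio.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def gain_ratio(feature, labels):
    """Information gain divided by the intrinsic (split) information of the feature."""
    n = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    info_gain = entropy(labels) - sum(len(g) / n * entropy(g) for g in groups.values())
    split_info = entropy(feature)  # entropy of the branch sizes
    return info_gain / split_info if split_info > 0 else 0.0

labels = ["normal", "normal", "attack", "attack"]
two_valued = ["low", "low", "high", "high"]   # two branches of two instances
flow_id = ["f1", "f2", "f3", "f4"]            # hypothetical: one branch per instance

gr_two_valued = gain_ratio(two_valued, labels)  # information gain 1, split information 1
gr_flow_id = gain_ratio(flow_id, labels)        # information gain 1, split information 2
print(gr_two_valued, gr_flow_id)
```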

CFS is a feature selection method that chooses subsets of variables that provide the highest amount of information about the target class while minimizing redundancy among the selected features. The CFS technique first measures the correlation between each variable and the target class. Next, it calculates the correlation between each pair of features. Finally, it picks the subset of features that has the maximum correlation with the target class and the lowest correlation among its own features [35].
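The trade-off between relevance and redundancy is commonly expressed through the CFS merit score, given here as a reference formula rather than as a description of the exact Weka configuration used in this study. For a subset S of k features, with mean feature-class correlation \overline{r_{cf}} and mean feature-feature correlation \overline{r_{ff}}:

```latex
\mathrm{Merit}_S = \frac{k\,\overline{r_{cf}}}{\sqrt{k + k(k-1)\,\overline{r_{ff}}}}
```

With illustrative (not measured) values k = 5 and \overline{r_{cf}} = 0.4, a subset with low redundancy (\overline{r_{ff}} = 0.2) scores 2/\sqrt{9} \approx 0.67, whereas a highly redundant subset (\overline{r_{ff}} = 0.8) scores 2/\sqrt{21} \approx 0.44, so the less redundant subset is preferred.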

The Pearson correlation coefficient is generally used to identify the degree of linear correlation between two features, which can help to eliminate features. When two variables are highly correlated, one of them can be dropped. Features with high mutual correlation are considered redundant; thus, only the features with minimum redundancy are selected, and features with lower mutual correlation are added to the selected feature set [36].
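A minimal sketch of correlation-based redundancy pruning follows. The column names, the toy values, and the 0.9 cutoff are assumptions made for illustration; they are not taken from the RT-IoT2022 dataset or from the thresholds reported in this study.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a constant column has no defined correlation
    return cov / (sd_x * sd_y)

def drop_redundant(columns, threshold=0.9):
    """Greedily keep a column only if it is not highly correlated with any kept column."""
    kept = []
    for name, values in columns.items():
        if all(abs(pearson(values, columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept

# Hypothetical columns: byte_count is an exact multiple of pkt_count.
columns = {
    "pkt_count": [1, 2, 3, 4, 5],
    "byte_count": [10, 20, 30, 40, 50],
    "duration": [5, 1, 4, 2, 3],
}
kept = drop_redundant(columns)
print(kept)
```

Because the pruning is greedy, the result depends on column order; ranking columns by their relevance to the class first is one way to decide which member of a correlated pair survives.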

Symmetrical uncertainty is an entropy-based nonlinear correlation measure that is used to evaluate the relationship between a feature and the target class. Symmetrical uncertainty extends mutual information by normalizing it with the entropies of the feature and the target class. It treats the two variables symmetrically and assesses the suitability of a feature for selection by computing its correlation with the target class. Features with higher values of symmetrical uncertainty have higher importance [37].
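In its standard form, given here as a reference definition, symmetrical uncertainty between a feature X and the class Y normalizes the information gain IG(X; Y) by the two entropies H(X) and H(Y):

```latex
SU(X, Y) = \frac{2\, IG(X; Y)}{H(X) + H(Y)}, \qquad 0 \le SU(X, Y) \le 1
```

As an illustrative calculation with made-up values, H(X) = 1 bit, H(Y) = 1 bit, and IG(X; Y) = 0.5 bit give SU = 2(0.5)/(1 + 1) = 0.5. A value of 0 indicates that the feature carries no information about the class, and a value of 1 indicates that each variable fully determines the other.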

4. Results

The work described herein was performed using the system specifications shown in Table 1. The Pandas and NumPy libraries were used for preprocessing, and the Matplotlib library was used to visualize the dataset. Additionally, the Scikit-learn and Weka platforms were used for data analysis. The Weka platform was used to perform CFS, determine the gain ratio, and ascertain symmetrical uncertainty, while the Scikit-learn package was used to perform Pearson’s analysis, determine information gain, and employ the MLP classifier.

Using the open-source RT-IoT2022 dataset, we considered 123,117 instances. In this section, we provide an overview of the experimental results of this work and assess the performance of the feature selection methods.

We employed five distinct feature selection techniques: CFS, Pearson’s analysis, the gain ratio, information gain, and symmetrical uncertainty. The list of reduced features (five features) is given for the CFS technique only, while in other techniques, an importance score for each feature is generated to identify strong correlations for detecting intrusions.

Therefore, we analyzed the importance scores for each technique in order to specify the possible optimal threshold values, as illustrated in Figure 2a, Figure 3a, Figure 4a and Figure 5a. To determine a reasonable threshold for feature selection, one approach is to identify the knee points in the importance score plots. The knee point represents the feature after which the importance scores drop sharply, indicating diminishing returns when adding more features. We identified 4–5 knee points for each technique. To find the most significant features that would most strongly influence the models, the threshold values were assessed with the MLP classifier, as shown in Figure 2b, Figure 3b, Figure 4b and Figure 5b.
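One simple way to locate candidate knee points is to rank the scores and look for the largest drops between neighbors. The sketch below uses this heuristic with invented scores; the study does not state how its knee points were computed, so both the heuristic and the function name `knee_candidates` are assumptions.

```python
def knee_candidates(scores, n=1):
    """Return the scores that sit just before the n largest drops in the ranked list.

    Each returned score can serve as a candidate threshold: keeping the
    features with a score greater than or equal to it stops right before a drop.
    """
    ranked = sorted(scores, reverse=True)
    drops = [ranked[i] - ranked[i + 1] for i in range(len(ranked) - 1)]
    largest = sorted(range(len(drops)), key=lambda i: drops[i], reverse=True)[:n]
    return [ranked[i] for i in sorted(largest)]

# Invented importance scores for six features.
scores = [0.62, 0.10, 0.58, 0.31, 0.60, 0.29]
thresholds = knee_candidates(scores, n=2)
print(thresholds)
```

Each candidate threshold would then be evaluated with the classifier, as described above, to choose the cutoff that gives the best accuracy.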

The experimental results showed the impact of threshold values for the various feature selection methods on the performance of the MLP classifier. In the Pearson analysis, the optimal threshold of 0.2 resulted in the highest accuracy of 94.84%, indicating a strong correlation between the selected features and attack classification. Similarly, the gain ratio method achieved its best accuracy of 95.96% at a threshold of 0.41. Information gain showed the highest performance at a threshold of 0.6, achieving an accuracy of 95.12%, highlighting its ability to prioritize features based on entropy reduction. Symmetrical uncertainty reached its optimal performance of 95.6% at a threshold of 0.3, reflecting its capacity to measure non-linear relationships effectively. These findings emphasize the significance of selecting appropriate thresholds for feature selection methods to enhance the predictive accuracy of classification models in IoT security scenarios.

Referring to the feature selection outcomes across the techniques, the data from the algorithms indicated that the optimal number of predictor variables varies between the different algorithms. The number of features was reduced using CFS, Pearson’s analysis, the gain ratio, information gain, and symmetrical uncertainty to 5, 32, 51, 45, and 60, respectively. A comparison between the feature selection techniques in terms of execution time, search method, and attribute evaluator is shown in Table 2. The run-time of these algorithms on the dataset ranges between 1.2 and 13.3 s.

Delving into the feature importance within different feature selection techniques, the feature ‘fwd_init_window_size’ was the most important for predicting network behavior and appeared in all feature selection techniques. Fifteen other features also had relatively high importance, appearing in the reduced set of features of at least four methods. These features are shown in Table 3. Furthermore, the gain ratio, information gain, and symmetric uncertainty captured the same set of high-importance features. These features were then used for classification to test the usefulness of this process.

In addition, although the idle-time features (‘idle.avg’, ‘idle.max’, ‘idle.min’, ‘idle.std’, and ‘idle.tot’) were selected by some feature selection methods (Pearson’s analysis and the gain ratio), they had relatively low importance, indicating that they contribute little to predicting network behavior. Moreover, seven features were not selected by any of the feature selection methods, namely, ‘fwd_bulk_rate’, ‘bwd_URG_flag_count’, ‘fwd_bulk_packets’, ‘flow_CWR_flag_count’, ‘bwd_bulk_rate’, ‘fwd_bulk_bytes’, and ‘flow_ECE_flag_count’, which suggests that these features do not contribute to predicting network behavior.

Then, the correlation coefficients were determined based on the linear relationships between the most important features (16 features) and the class (attack type). We evaluated the correlation coefficients among the data, as presented in Figure 6. This correlation matrix shows that there are significant correlations between most of the features. The results suggest that ‘id.resp_p’, ‘service’, ‘fwd_init_window_size’, and ‘flow_SYN_flag_count’ are the principal contributors to ‘attack type’, with scores of 0.41, 0.4, 0.37, and 0.37, respectively.

Moreover, we assessed the use of 16 features, generated using a combination of feature selection methods with the MLP classifier, and compared the results with those when deploying the reduced set of features generated by each feature selection method and the full set of features (83 features) with the same classifier. To perform the experiments, each dataset was initially partitioned: 70% was used as a training set and 30% as a testing set. All performance metrics, including accuracy, precision, recall, and F1-score, were stated as percentages. Table 4 shows the performance of the approaches.
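The evaluation protocol (a 70/30 split followed by accuracy, precision, recall, and F1-score as percentages) can be sketched without a classifier. The sketch below is an assumption-laden illustration: it uses a plain shuffled split and macro-averaging over classes, neither of which is stated in the study, and the predictions are invented rather than produced by the MLP.

```python
import random

def train_test_split(rows, test_fraction=0.3, seed=0):
    """Shuffle a copy of the rows and split it into training and testing parts."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    cut = round(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

def macro_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall, and F1, all as percentages."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        tp = sum(t == c and p == c for t, p in zip(y_true, y_pred))
        fp = sum(t != c and p == c for t, p in zip(y_true, y_pred))
        fn = sum(t == c and p != c for t, p in zip(y_true, y_pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
    k = len(classes)
    return {
        "accuracy": 100 * accuracy,
        "precision": 100 * sum(precisions) / k,
        "recall": 100 * sum(recalls) / k,
        "f1": 100 * sum(f1s) / k,
    }

train, test = train_test_split(list(range(10)))

# Invented labels and predictions, used only to exercise the metric code.
y_true = ["normal"] * 6 + ["attack"] * 4
y_pred = ["normal"] * 5 + ["attack"] + ["attack"] * 3 + ["normal"]
metrics = macro_metrics(y_true, y_pred)
print(metrics)
```

For an imbalanced multi-class dataset, a stratified split and a clearly stated averaging scheme would make the reported figures easier to reproduce.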

The results of the experiment demonstrated the impact of feature selection on the accuracy of the MLP classifier. The reduced set of features derived from the combination of five feature selection methods achieved the highest performance, with 96.4%, 97.4%, 87.1%, and 91.9% for accuracy, precision, recall, and F1-score, respectively, indicating that combining methods can effectively identify the most relevant features. Runtime decreased by 66.4% when feature selection was used. Among the individual methods, the gain ratio produced the highest accuracy at 95.96%, followed closely by symmetrical uncertainty (95.6%) and information gain (95.12%). Pearson’s analysis and CFS yielded accuracies of 94.84% and 93.1%, respectively, showing varied performance among the methods. In contrast, using the full set of features resulted in an accuracy of 93.48%, lower than every approach except CFS, highlighting the importance of feature selection in improving classification performance and reducing noise. These findings demonstrate that combining feature selection techniques can outperform both individual feature selection methods and the use of the full dataset.

5. Discussion

Intrusion detection is a vital process in IoT security as it is employed to effectively recognize and mitigate possible threats and malicious attacks. Notably, the absence of overt predictor variables until an attack occurs underscores the need for early detection and preventive strategies [23]. Interventions have proved to be significantly more effective when threats and attacks are identified at nascent stages [12].

In many practical scenarios, it might be challenging or cost-prohibitive to analyze a comprehensive set of features. Thus, building predictive models using reduced data offers a promising approach for the early screening and detection of abnormal activities. By identifying those areas most vulnerable to attack through simple indicators and by recommending further precise monitoring, this approach can help reduce unnecessary cybersecurity solutions and save costs.

Our aim was to study feature selection in IoT network traffic data in order to develop a predictive model that could contribute to the early warning and early detection of attacks. Our results have two applications. First, we focused on the key features in IoT network traffic data that can help in detecting malicious attacks. The properties of malicious attacks recognized by the feature selection algorithms can help IT specialists to better defend these networks. Innovation in machine learning does not always mean using the most sophisticated methods or complicated feature engineering [38]. Sometimes, simplifying the development of predictive models is itself an important form of innovation that improves their adoption and usability. The comparative analysis shows that deploying various feature selection methods on an average-power device is computationally inexpensive and fast. It also has the advantage of decreasing the complexity of the model and boosting its practicality. The data from these feature selection algorithms indicated optimal predictive performance upon the inclusion of 16 predictor variables.

Second, feature selection was used to develop an effective machine and deep learning classifier-based IDS. Herein, we present an MLP algorithm built with the help of feature selection methods. This approach offers higher predictive accuracy over a model with a full set of features, a finding in alignment with contemporary studies on feature selection with ML [38]. Our approach achieved remarkable performance with respect to detection and false alarm rates. The results showed that various feature selection methods are capable of identifying relevant features in network traffic data and training a powerful ML-based IDS. Furthermore, the broad feature selection process and rigorous validation reaffirm the model’s robustness and reliability.

In this study, we identified significant correlations between various network traffic features and specific types of attacks through the application of feature selection algorithms and correlation analysis. These correlations reveal how certain traffic patterns and behaviors are closely linked to distinct cyberattack types, providing insights that enhance the performance of intrusion detection systems (IDSs), particularly in resource-constrained IoT environments. The following subsections delve into each identified correlation, discussing its implications for IoT network security, along with supporting evidence from the literature, and reviewing the potential for improving detection accuracy and robustness against evolving threats.

5.1. Fwd Init Window Size and Its Role in Detecting SYN Flood and TCP-Based Attacks

The results of this study, as highlighted in the correlation coefficient heatmap, identified ‘fwd_init_window_size’ as one of the key features strongly correlated with SYN flood attacks. This finding emphasizes the importance of this parameter in detecting TCP-based threats, particularly in IoT environments. By uncovering this correlation, our analysis validates the role of ‘fwd_init_window_size’ as a critical metric for identifying those anomalies associated with these types of cyberattacks.
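To make the notion of a feature-to-attack correlation concrete, the following sketch computes Pearson's correlation coefficient between a numeric feature and a binary attack indicator. The sample values are invented for illustration and are not taken from RT-IoT2022; the names `pearson`, `WINDOW_SIZES`, and `IS_SYN_FLOOD` are hypothetical. The sign and strength of the coefficient on real data depend entirely on the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no linear relationship
    return cov / sqrt(var_x * var_y)


# Invented example values (assumption): one feature value per flow,
# with 1 marking flows labeled as SYN flood and 0 marking all other flows.
WINDOW_SIZES = [64240, 65535, 29200, 64240, 1024, 512, 1024, 2048]
IS_SYN_FLOOD = [0, 0, 0, 0, 1, 1, 1, 1]

r = pearson(WINDOW_SIZES, IS_SYN_FLOOD)
print(f"correlation between feature and attack label: {r:.3f}")
```

Repeating this calculation for every feature and attack class yields the kind of matrix that a correlation heatmap visualizes. A coefficient near zero does not rule out a nonlinear relationship, which is one reason to combine correlation analysis with entropy-based metrics.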

A SYN flood attack is a TCP-based attack for which ‘fwd_init_window_size’ is a critical parameter. This parameter represents the initial size of the transmission window allocated by the server for incoming connections. SYN flood attacks manipulate this feature: a large number of SYN requests are sent to the server without completing the TCP handshake, which consumes server resources and results in denial of service [39]. In addition to the manipulated initial window size, the server becomes preoccupied with managing the flood of incomplete connections and can no longer handle legitimate traffic. In this way, the attacker exhausts the server’s resources by exploiting the handshake mechanism of TCP [40]. The role of ‘fwd_init_window_size’ illustrates the vulnerabilities of IoT devices, which are frequently resource-constrained and rely on efficient communication with the network. IoT devices are particularly susceptible to attackers who disrupt services by manipulating TCP parameters [41]. An anomaly in the ‘fwd_init_window_size’ parameter could therefore serve as a warning of SYN flood attacks and other TCP-based threats.

Recently developed detection systems are based on deep learning models, such as CNN-GRU. These systems analyze patterns in network traffic and detect attacks in real time, based on deviations in the initial window size [41]. By exploiting these deviations, machine learning methods such as the AdaBoost algorithm have improved the detection accuracy of UDP Lag and SYN flood attacks [39]. The detection of SYN flood attacks may be further enhanced by considering changes in network traffic patterns, including changes in the initial window size. These changes can be identified by entropy-based analysis in hybrid SDN environments, as in the case of SynFloWatch [42].

Overall, ‘fwd_init_window_size’ is useful for detecting vulnerabilities in IoT networks. Its values can be monitored to recognize an ongoing attack, based on unusual traffic patterns. Both the integrity and availability of services can be protected by fortifying IoT networks against TCP-based attacks with machine learning and deep learning techniques [42]. Preventing sophisticated attacks, such as off-path TCP hijacking, requires comprehensive security measures that consider all aspects of TCP communication, including the initial window size [43]. Moreover, packet size variations may be exploited in Wi-Fi networks via side-channel attacks. Ultimately, the manipulation of ‘fwd_init_window_size’ can lead to significant disruptions in IoT services; thus, it acts as an important indicator of SYN flood and other TCP-based attacks. Network administrators can integrate this feature into detection and mitigation strategies to enhance the resilience of IoT devices against such threats and to ensure robust and secure network operations [40].

5.2. Flow SYN Flag Count as an Indicator of Denial-of-Service (DoS) Attacks

The correlation analysis in this study revealed that ‘flow_SYN_flag_count’ is strongly associated with denial-of-service (DoS) attacks, particularly SYN flood attacks. This finding underscores the importance of monitoring SYN flag counts as a key metric for detecting anomalous traffic patterns that are indicative of DoS activity. By establishing this correlation, our results validate the relevance of this feature in identifying vulnerabilities in IoT networks.

DoS attacks, especially TCP SYN flood attacks, are specifically designed to send numerous connection requests to the server, thus overwhelming its resources. One of the critical metrics for identifying and mitigating these attacks is ‘flow_SYN_flag_count’, which indicates the presence of abnormal patterns in traffic. Notably, the SYN flag count can be monitored continuously, and unusual spikes in SYN requests that deviate from normal traffic patterns reveal potential attacks. For example, normal and SYN flood traffic can be differentiated in software-defined networks using OpenFlow port statistics and machine learning models, such as random forest classifiers. This enables the precise localization and mitigation of threats [44].
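The idea of watching for spikes in the SYN flag count can be sketched as a simple adaptive threshold over per-interval counts. This is an illustrative baseline only, not a method from the cited works: the function name `syn_spikes`, the warm-up length, the multiplier `k`, and the sample counts are all assumptions.

```python
from statistics import mean, pstdev


def syn_spikes(counts, warmup=5, k=3.0):
    """Return indices of intervals whose SYN count exceeds baseline mean + k * std.

    The first `warmup` intervals are assumed to be benign and seed the baseline.
    Intervals that raise an alert are not added to the baseline, so an ongoing
    flood does not gradually raise the threshold.
    """
    alerts = []
    baseline = list(counts[:warmup])
    for i in range(warmup, len(counts)):
        mu = mean(baseline)
        sigma = max(pstdev(baseline), 1.0)  # floor avoids a zero-width band
        if counts[i] > mu + k * sigma:
            alerts.append(i)
        else:
            baseline.append(counts[i])
    return alerts


# Hypothetical SYN counts per one-second interval.
syn_counts = [10, 12, 11, 9, 10, 11, 250, 300, 12]
print(syn_spikes(syn_counts))  # -> [6, 7]
```

A fixed multiple of the standard deviation is easy to compute on a constrained device, but it assumes that the warm-up period is attack-free; a learned classifier over several flow features would be the more robust choice where resources allow.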

Moreover, the SYNTROPY framework enhances detection accuracy with Rényi entropy by adjusting its sensitivity to varying network conditions [45]. Enhancing detection accuracy is crucial because attack patterns can be sophisticated and variable in IoT environments. A count-less sketch can detect anomalies, such as SYN floods, without updating dynamic data structures, and it executes network measurement tasks robustly and accurately across various traffic distributions [46].
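For readers unfamiliar with Rényi entropy, the sketch below shows the general definition, H_α = log2(Σ p_i^α) / (1 − α) for α ≠ 1, applied to an observed sequence of discrete values such as the source addresses of SYN packets. It illustrates the measure only and does not reproduce the SYNTROPY framework; the function name `renyi_entropy` and the sample addresses are hypothetical.

```python
from collections import Counter
from math import log2


def renyi_entropy(items, alpha=2.0):
    """Rényi entropy (in bits) of the empirical distribution of `items`.

    alpha == 1 is handled as the Shannon limit. Larger alpha values put more
    weight on the most frequent items, which changes how sharply the measure
    reacts to a few dominant sources.
    """
    counts = Counter(items)
    total = sum(counts.values())
    probs = [c / total for c in counts.values()]
    if alpha == 1.0:
        return -sum(p * log2(p) for p in probs)
    return log2(sum(p ** alpha for p in probs)) / (1.0 - alpha)


# Hypothetical source addresses observed in SYN packets during two intervals.
balanced = ["10.0.0.1", "10.0.0.2", "10.0.0.3", "10.0.0.4"]
dominated = ["10.0.0.1"] * 9 + ["10.0.0.2"]

print(renyi_entropy(balanced, alpha=2.0))   # -> 2.0
print(renyi_entropy(dominated, alpha=2.0))
```

A sharp change in this value between consecutive intervals, in either direction, is the kind of signal an entropy-based detector would examine; the order α is the tunable parameter that trades sensitivity against stability.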

SYN flag monitoring is an important strategy for maintaining service availability under attack conditions. As high-performance load balancers are frequently the targets of SYN attacks, SYN flag monitoring is among the innovative schemes that have been proposed to maintain high-throughput connections during attacks [47]. In addition, flow tables and sketches can be integrated in software-defined networks. This fine-grained traffic measurement enables the separation of mouse and elephant flows and thus contributes to the task of identifying SYN flood attacks [48]. Some approaches, such as tree-based DDoS detection methods, have early detection capabilities, identifying anomalies in the first few packets. Thus, SYN flag counts can be used to identify vulnerabilities and maintain the security of the network, representing one of the general countermeasures used to protect both traditional networks and IoT environments from DoS attacks.

5.3. The Destination Port as a Key Feature in Port Scanning and Reconnaissance Attacks

The correlation analysis from this study highlights the ‘Destination Port (id.resp_p)’ as a significant feature associated with port scanning and reconnaissance activities. This finding aligns with the nature of these attacks, in which malicious actors exploit open ports to identify vulnerabilities. By linking this feature to attack types, our results validate its importance in IoT network security.

The ‘Destination Port (id.resp_p)’ is one of the key features used for recognizing unauthorized attempts to access services or open ports, and it therefore plays a crucial role in identifying scanning attacks and other reconnaissance activities. Port scanning systematically probes networked systems to identify active services. Malicious actors can collect and leverage this information to locate vulnerable systems in the network. The same methods are often employed by IT professionals to troubleshoot network issues and maintain the security of the system [49]. The destination port identifies the specific service or application that a network packet is trying to access, making it a significant parameter to monitor. By analyzing usage patterns, security professionals can detect unusual and unauthorized attempts to access destination ports.
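One simple usage pattern that can be derived from the destination port is the number of distinct ports a single source contacts within an observation window. The sketch below is a minimal heuristic under assumed inputs, not a method from this study: the function name `scan_suspects`, the `(source, destination_port)` record layout, and the threshold are hypothetical.

```python
from collections import defaultdict


def scan_suspects(connections, max_ports=10):
    """Map each source to its distinct destination-port count if above `max_ports`.

    `connections` is an iterable of (source_address, destination_port) pairs
    collected over one observation window.
    """
    ports_by_source = defaultdict(set)
    for source, dst_port in connections:
        ports_by_source[source].add(dst_port)
    return {
        source: len(ports)
        for source, ports in ports_by_source.items()
        if len(ports) > max_ports
    }


# Hypothetical window: one host sweeps ports 20-39, another uses two web ports.
window = [("10.0.0.5", port) for port in range(20, 40)]
window += [("10.0.0.9", 443), ("10.0.0.9", 443), ("10.0.0.9", 80)]
print(scan_suspects(window))  # -> {'10.0.0.5': 20}
```

The threshold would need tuning per deployment, since legitimate management tools also touch many ports, as the paragraph above notes.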

The security of data transmission is a paramount goal in IoT systems, which frequently rely on power-line communication networks. The physical-layer security of such networks can be compromised by the presence of an eavesdropper, but it has been enhanced using destination scheduling schemes that optimize the paths for data transmission and reduce the probability of interception. Secrecy performance can be further improved by controlling the impulsive-to-background-noise power ratio and the arrival rate at receivers [50]. Potential vulnerabilities in IoT devices can be identified by continuously monitoring the destination ports that are most frequently open to exploitation. Correlating these destination ports with port scanning activities supports the development of more robust security measures.

In summary, the destination port is a critical feature for recognizing and mitigating several vulnerabilities in IoT systems. It provides insights into unauthorized attempts to access the network and provides guidance for effectively implementing security measures to protect against various port scanning attacks and several other reconnaissance activities.

5.4. The Fwd Packet Length Mean as an Indicator of Botnet Activity

Our correlation analysis highlights the ‘Fwd Packet Length Mean’ feature as significantly associated with botnet activity. This result underscores its importance for detecting the anomalies caused by botnets in IoT environments. The identified correlation supports its utility in differentiating normal traffic patterns from malicious activities, particularly in resource-constrained IoT devices.

Botnet attacks can be identified in IoT networks by considering the ‘Fwd Packet Length Mean’ feature, which captures the average size of the packets traversing a network session. Because botnet attacks are accompanied by packets of abnormal length that deviate from typical network patterns, packet length is a significant metric for detecting these attacks. Furthermore, these deviations are more prominent and more easily detected in low-powered IoT environments with limited processing capabilities. During attacks such as DDoS attacks, botnets generally flood the target with packets of unusual sizes to overwhelm the system; hence, the presence of packets with abnormal lengths indicates malicious activity [51,52]. These anomalies must be detected to maintain the security and operational integrity of IoT networks [53].
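The following sketch shows one way such a feature could be derived from per-packet records and compared against a benign baseline. It assumes that each packet is tagged with a direction and a length, and that "forward" means the direction of the flow initiator; the record layout, the function names `fwd_packet_length_mean` and `is_outlier`, and the sample values are hypothetical.

```python
from statistics import mean, pstdev


def fwd_packet_length_mean(packets):
    """Mean length of the forward-direction packets in one flow.

    `packets` is a list of (direction, length) tuples, where direction is
    "fwd" or "bwd". Flows without forward packets yield 0.0.
    """
    lengths = [length for direction, length in packets if direction == "fwd"]
    return mean(lengths) if lengths else 0.0


def is_outlier(value, baseline, k=3.0):
    """True if `value` lies more than k standard deviations from the baseline mean."""
    mu = mean(baseline)
    sigma = pstdev(baseline)
    if sigma == 0:
        return value != mu
    return abs(value - mu) / sigma > k


# Hypothetical flow and hypothetical benign baseline of per-flow means.
flow = [("fwd", 100), ("bwd", 1500), ("fwd", 200)]
benign_means = [140, 150, 160, 155, 145]

feature = fwd_packet_length_mean(flow)
print(feature, is_outlier(feature, benign_means))  # -> 150 False
```

A threshold on a single feature is only a first screen; the models cited in this section combine packet length statistics with many other flow features.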

Features such as ‘Fwd Packet Length Mean’ have been effectively analyzed using machine learning and deep learning models, such as support vector machines, random forest, and long short-term memory, to identify and mitigate botnet attacks in IoT networks [51,52]. Several datasets, such as IoT-DH and IoT-23, include various attack scenarios captured in realistic IoT environments and are used to train robust detection mechanisms [54]. As IoT ecosystems are dynamic, novel, low-frequency attacks can arise. Few-shot learning and federated learning approaches address such attacks by allowing models to adapt to new attack patterns with minimal data [53]. The integration of advanced learning techniques enables networks to proactively respond to emerging threats, thus maintaining resilience against the vulnerabilities induced by botnets [55]. Other methods, such as the Stockwell transform, can complement the analysis of important features, such as packet length, through time-frequency analysis. This helps to improve the detection of intrusion anomalies in complex environments [56].

Overall, ‘Fwd Packet Length Mean’ is an important indicator of potential vulnerabilities in IoT networks. Analysis of this feature via several machine learning frameworks is essential to enhance the level of security against various botnet attacks [57].

5.5. Flow Duration and Its Correlation with DoS and DDoS Attacks

The results of our study identified a significant correlation between flow duration and the presence of DoS and DDoS attacks in IoT environments. This finding highlights the role of flow duration as an indicator of abnormal network behavior. The extended flow duration observed in attack scenarios aligns with the consumption of resources caused by these attacks, validating our results through evidence in the existing literature.

Because flow duration reflects the nature and behavior of network traffic, it is a critical parameter in the context of DoS and DDoS attacks, especially in IoT networks. Given that IoT devices are resource-constrained and operate on limited bandwidth, an extended flow duration can indicate malicious activity. These attacks exhaust the resources of the target: by maintaining prolonged connections, they consume excess bandwidth and processing power, which can result in either the denial or degradation of service [58,59]. For example, DDoS attacks may establish long-lasting connections that flood the network with excessive traffic, and monitoring these deviations helps to detect the attack [60].

Because many devices, such as smart refrigerators and webcams, do not typically possess adequate security measures, they are often targeted in IoT networks. Analyzing flow duration can help identify vulnerabilities and elucidate potential entry points for attackers [61]. Abnormal flow durations can be detected when a network feature changes beyond a particular threshold value; hence, a DDoS attack that causes abnormalities in the flow duration can be detected using entropy variation [60].
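A threshold on entropy variation can be sketched as follows: flow durations in a reference window and in the current window are binned, the Shannon entropy of each histogram is computed, and an alert is raised when the two values differ by more than a chosen threshold. This is an illustrative sketch rather than the method of [60]; the function names, the bin width, the threshold, and the sample durations are assumptions.

```python
from collections import Counter
from math import log2


def duration_entropy(durations, bin_width=1.0):
    """Shannon entropy (bits) of flow durations after binning into `bin_width` seconds."""
    bins = Counter(int(d // bin_width) for d in durations)
    total = len(durations)
    return -sum((n / total) * log2(n / total) for n in bins.values())


def entropy_shift(reference, current, threshold=0.5, bin_width=1.0):
    """Return (alert, delta), where delta is the absolute entropy difference."""
    delta = abs(
        duration_entropy(current, bin_width) - duration_entropy(reference, bin_width)
    )
    return delta > threshold, delta


# Hypothetical durations in seconds: varied benign flows versus many flows
# that are all held open for roughly the same long period.
benign_window = [0.2, 1.5, 2.7, 3.1]
suspect_window = [30.0, 30.5, 30.9, 30.2]
print(entropy_shift(benign_window, suspect_window))  # -> (True, 2.0)
```

Entropy reacts to a change in the shape of the duration distribution rather than to its absolute values, so in practice it would be paired with a direct check on the durations themselves.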

Advanced detection systems, such as the FLUID system, analyze features on a multidimensional basis. Unified information measures and flow behavior can be incorporated into the FLUID system to identify vulnerabilities in IoT network traffic and effectively distinguish between legitimate and malicious traffic [59]. Features capturing flow duration, among others, can be analyzed using the machine learning approaches employed in the DDoS-FOCUS model to enhance detection capabilities and accurately identify DDoS attacks [62]. The CICDDoS2019 and DoS/DDoS-MQTT-IoT datasets include several scenarios related to these attacks; thus, methods based on flow duration can be tested and validated using these datasets [63].

Ultimately, IoT systems can be protected from the detrimental effects of DoS and DDoS attacks by developing more robust defense mechanisms based on the continuous monitoring of flow duration [58]. Moreover, detection techniques like entropy-based methods and feature mapping can be integrated with flow duration analysis to enhance resilience against several cyber threats by providing a comprehensive approach for identifying and mitigating vulnerabilities in IoT networks [64].

5.6. Fwd Header Length and Its Role in Detecting Reconnaissance Activities

Our analysis revealed a notable correlation between ‘Fwd Header Length’ and reconnaissance activities, validating its importance in detecting the preliminary phases of cyberattacks. This feature, as highlighted in our results, serves as a critical marker for abnormal traffic patterns, particularly in IoT environments where resource constraints amplify the impact of such attacks.

‘Fwd Header Length’ is the length of the forward packet headers and is a critical feature in the context of reconnaissance attacks, especially in IoT environments. This feature can help detect abnormal packet headers, which often indicate scanning activities or attempts by attackers to collect data. Reconnaissance is the preliminary phase of a cyberattack, in which attackers gather information about the targeted system to identify vulnerabilities that can be exploited. Similar to DIS attacks [65], reconnaissance attacks may increase power consumption, which is a particular concern because IoT networks using the RPL protocol are characteristically low-power and lossy. Characteristics of network traffic, such as ‘Fwd Header Length’, can be analyzed to enhance the detection of reconnaissance activities and identify deviations from normal behavior. Lightweight IDSs that use hardware performance counters can monitor packet headers in real time to detect ongoing remote attacks and reconnaissance activities with minimal overhead on IoT devices [66]. IoT network traffic can also be analyzed with machine learning techniques, such as those used in MANDRAKE, to detect vulnerabilities related to unencrypted data transfer, which may be exploited during reconnaissance [67]. Adaptive feature engineering can be implemented using attention-based long short-term memory models to detect reconnaissance attacks. These models focus on the relevant features of the packet, such as the header length, extract the temporal dependencies, and finally identify the attack patterns [68]. Abnormal header lengths affect CPU utilization and packet sizes; hence, early detection methods that analyze performance and traffic characteristics can quickly identify reconnaissance attacks [69].

Integrating these techniques into a comprehensive security framework provides protection against subsequent exploitation attempts and significantly enhances the ability of the network to detect and prevent reconnaissance attacks in IoT environments. Several studies on the detection and prevention of vulnerabilities in IoT systems show that this approach contributes to the broader goal of improving security [70,71,72].

5.7. Inbound Packet Count as a Marker of Distributed Denial-of-Service (DDoS) Attacks

Our results identified a significant correlation between ‘Inbound Packet Count’ and DDoS attacks, validating its role as a critical feature in IoT network intrusion detection. The results from our analysis demonstrate how abnormal spikes in inbound packet counts are strongly associated with these types of attacks, providing a robust metric for identifying DDoS activities and enhancing detection frameworks.

The ‘Inbound Packet Count’ metric is one of the most critical metrics from the perspective of DDoS attacks, especially in IoT networks. DDoS attacks disrupt and overwhelm network resources by delivering a high volume of inbound packets from several sources. Hence, the count of inbound packets is an important feature to monitor in IoT environments. IoT devices such as thermostats, webcams, and refrigerators often lack adequate security measures and do not receive frequent updates; thus, they are susceptible to being co-opted by botnets in DDoS attacks [61]. The IoT-DH dataset, which is specifically designed for detecting and classifying DDoS attacks, has been used to demonstrate that inbound packet counts can be leveraged to effectively identify and mitigate these threats [54]. Analysis of the KDD99 and IoTID20 datasets can identify traffic scenarios related to common and emerging attack patterns [73].

As discussed by Kaur and Ayoade [74], robust defenses can be developed by understanding the operational mechanisms of DDoS attacks across the various layers of the IoT architecture. Machine learning techniques can be used to analyze IoT network traffic and detect vulnerabilities based on unusual patterns in the inbound packet counts, such as those associated with unencrypted data transfer [67]. The DDoS-FOCUS model employs deep learning to accurately detect and mitigate DDoS attacks, based on the inbound packet count [62]. Ahmad and Buriro proposed a machine-learning-based approach that dynamically leverages attribute selection by focusing on important features, such as inbound packet counts, to enhance the detection of DDoS attacks [75]. Overall, the performance of IoT networks can be optimized and their complexity can be reduced by using appropriate feature selection techniques and by focusing on important metrics, such as inbound packet counts [76]. Detection methods based on variations in entropy measures also use the count of inbound packets to identify anomalies in traffic density that indicate DDoS attacks [60]. The DoS/DDoS-MQTT-IoT dataset highlights the importance of analyzing inbound traffic counts in the context of specific IoT communication protocols, namely, MQTT. This supports the development of effective countermeasures against DDoS attacks [63].

In summary, the inbound packet count enables the development of targeted detection and mitigation strategies because it acts as an indicator of potential DDoS attacks and vulnerabilities in IoT networks.

6. Conclusions

In this paper, we present a comprehensive approach to enhancing the performance of anomaly-based IDSs in IoT environments via a feature selection process that retains relevant features and discards irrelevant ones. This paper uses the RT-IoT2022 dataset, which is specifically designed for intrusion detection in a real-time IoT environment, to fill the research gap noted in previous studies. The proposed approach is validated on this dataset, demonstrating the need to use more recent and specific datasets to investigate the security challenges associated with real-time IoT systems.

Whereas prior research has mostly focused on features identified using a single feature selection method, this paper combines the gain ratio, information gain, CFS, and symmetric uncertainty to comprehensively select multiple features. Feature selection makes the dataset suitable for resource-constrained IoT devices by streamlining it without compromising performance.

This paper adds a layer of granularity and depth that has not previously been discussed. For example, the specific correlations between key features, such as ‘fwd_init_window_size’, and various types of attacks have been analyzed. Such correlation analysis is an important requirement that has not been adequately addressed in previous research.

In conclusion, this study offers an efficient, highly accurate, and scalable IDS framework that contributes to the growing body of research in the domain of IoT network security. In the future, the methodology should be tested using other large datasets from multiple sources, such as IoT-23 and UNSW-NB15, to capture latent features related to the field. In addition, the application of the selected features to classify incoming network traffic in the IoT environment as either normal or abnormal should be explored. Both machine learning classifiers and deep learning models can be implemented for this classification task. Various models, such as artificial neural networks, deep neural networks, and TabNet, can be leveraged to further improve the accuracy and operational efficiency of IDSs in real-time IoT systems. This approach may enhance the detection and classification of IoT attacks, thereby improving the security of IoT systems. The proposed approach can be deployed in real-time IoT systems by leveraging the reduced feature set for efficient processing on resource-constrained devices. Such a deployment would integrate lightweight machine learning models and streaming analytics to classify traffic dynamically, enabling the immediate detection and mitigation of IoT-related cyberattacks.

Author Contributions

Conceptualization, M.A. and F.A.; methodology, M.A. and F.A.; software, M.A. and F.A.; validation, M.A. and F.A.; formal analysis, M.A. and F.A.; investigation, M.A. and F.A.; resources, M.A. and F.A.; data curation, M.A. and F.A.; writing—original draft preparation, M.A. and F.A.; writing—review and editing, M.A. and F.A.; supervision, M.A. and F.A. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Justin, J.; Razali, N.F.; Badaruddin, M.N.A. Transforming Malaysia’s Economic Landscape: The Pivotal Role of the Internet of Things (IoT). In Proceedings of the 2023 IEEE 21st Student Conference on Research and Development (SCOReD), Kuala Lumpur, Malaysia, 13–14 December 2023; pp. 463–468. [Google Scholar]
Abaimov, S. Understanding and Classifying Permanent Denial-of-Service Attacks. J. Cybersecur. Priv. 2024, 4, 324–339. [Google Scholar] [CrossRef]
Roopak, M.; Parkinson, S.; Tian, G.Y.; Ran, Y.; Khan, S.; Chandrasekaran, B. An unsupervised approach for the detection of zero-day distributed denial of service attacks in Internet of Things networks. IET Netw. 2024, 13, 513–527. [Google Scholar] [CrossRef]
Funchal, G.S.; Pedrosa, T.; Prieta, F.d.l.; Leitão, P. Edge Multi-agent Intrusion Detection System Architecture for IoT Devices with Cloud Continuum. In Proceedings of the 2024 IEEE 7th International Conference on Industrial Cyber-Physical Systems (ICPS), St. Louis, MO, USA, 12–15 May 2024; pp. 1–6. [Google Scholar]
Akinsanya, M.O.; Ekechi, C.C.; Okeke, C.D. Security Paradigms for Iot in Telecom Networks: Conceptual Challenges and Solution Pathways. Eng. Sci. Technol. J. 2024, 5, 1431–1451. [Google Scholar] [CrossRef]
Nguyen, V.-T.; Navas, R.E.; Doyen, G. Lightweight Security for IoT Systems leveraging Moving Target Defense and Intrusion Detection. In Proceedings of the NOMS 2024-2024 IEEE Network Operations and Management Symposium, Seoul, Republic of Korea, 6–10 May 2024; pp. 1–6. [Google Scholar]
Piyush, P.; Gill, N.S.; Gulia, P.; Rao, D.D.; Mandiga, Y.; Pareek, P.K. Systematic Analysis of threats, Machine Learning solutions and Challenges for Securing IoT environment. J. Cybersecur. Inf. Manag. 2024, 14, 367–382. [Google Scholar] [CrossRef]
Tanksale, V. Efficient Elliptic Curve Diffie–Hellman Key Exchange for Resource-Constrained IoT Devices. Electronics 2024, 13, 3631. [Google Scholar] [CrossRef]
Bella, K.; Guezzaz, A.; Benkirane, S.; Azrour, M.; Fouad, Y.; Benyeogor, M.S.; Innab, N. An efficient intrusion detection system for IoT security using CNN decision forest. PeerJ Comput. Sci. 2024, 10, e2290. [Google Scholar] [CrossRef] [PubMed]
Mazhar, T.; Talpur, D.B.; Shloul, T.A.; Ghadi, Y.Y.; Haq, I.; Ullah, I.; Ouahada, K.; Hamam, H. Analysis of IoT security challenges and its solutions using artificial intelligence. Brain Sci. 2023, 13, 683. [Google Scholar] [CrossRef] [PubMed]
Sharmila, B.S.; Nagapadma, R. RT-IoT2022; UCI Machine Learning Repository: Irvine, CA, USA, 2024. [Google Scholar] [CrossRef]
Barbosa, G.N.N.; Andreoni, M.; Mattos, D.M.F. Optimizing feature selection in intrusion detection systems: Pareto dominance set approaches with mutual information and linear correlation. Ad Hoc Netw. 2024, 159, 103485. [Google Scholar] [CrossRef]
Awad, M.; Fraihat, S. Recursive feature elimination with cross-validation with decision tree: Feature selection method for machine learning-based intrusion detection systems. J. Sens. Actuator Netw. 2023, 12, 67. [Google Scholar] [CrossRef]
Li, J.; Othman, M.S.; Chen, H.; Yusuf, L.M. Optimizing IoT intrusion detection system: Feature selection versus feature extraction in machine learning. J. Big Data 2024, 11, 36. [Google Scholar] [CrossRef]
Jayasankar, T.; Kiruba Buri, R.; Maheswaravenkatesh, P. Intrusion detection system using metaheuristic fireworks optimization based feature selection with deep learning on Internet of Things environment. J. Forecast. 2024, 43, 415–428. [Google Scholar] [CrossRef]
Musthafa, M.B.; Huda, S.; Kodera, Y.; Ali, M.A.; Araki, S.; Mwaura, J.; Nogami, Y. Optimizing IoT Intrusion Detection Using Balanced Class Distribution, Feature Selection, and Ensemble Machine Learning Techniques. Sensors 2024, 24, 4293. [Google Scholar] [CrossRef]
Alrefaei, A.; Ilyas, M. Using Machine Learning Multiclass Classification Technique to Detect IoT Attacks in Real Time. Sensors 2024, 24, 4516. [Google Scholar] [CrossRef] [PubMed]
Lai, T.; Farid, F.; Bello, A.; Sabrina, F. Ensemble learning based anomaly detection for IoT cybersecurity via Bayesian hyperparameters sensitivity analysis. Cybersecurity 2024, 7, 44. [Google Scholar] [CrossRef]
Otokwala, U.; Petrovski, A.; Kalutarage, H. Optimized common features selection and deep-autoencoder (OCFSDA) for lightweight intrusion detection in Internet of things. Int. J. Inf. Secur. 2024, 23, 2559–2581. [Google Scholar] [CrossRef]
Maseno, E.M.; Wang, Z. Hybrid wrapper feature selection method based on genetic algorithm and extreme learning machine for intrusion detection. J. Big Data 2024, 11, 24. [Google Scholar] [CrossRef]
Azimjonov, J.; Kim, T. Stochastic gradient descent classifier-based lightweight intrusion detection systems using the efficient feature subsets of datasets. Expert Syst. Appl. 2024, 237, 121493. [Google Scholar] [CrossRef]
Yang, K.; Wang, J.; Li, M. An improved intrusion detection method for IIoT using attention mechanisms, BiGRU, and Inception-CNN. Sci. Rep. 2024, 14, 19339. [Google Scholar] [CrossRef] [PubMed]
Aljehane, N.O.; Mengash, H.A.; Hassine, S.B.; Alotaibi, F.A.; Salama, A.S.; Abdelbagi, S. Optimizing intrusion detection using intelligent feature selection with machine learning model. Alex. Eng. J. 2024, 91, 39–49. [Google Scholar] [CrossRef]
Bakır, H.; Ceviz, Ö. Empirical enhancement of intrusion detection systems: A comprehensive approach with genetic algorithm-based hyperparameter tuning and hybrid feature selection. Arab. J. Sci. Eng. 2024, 49, 13025–13043. [Google Scholar] [CrossRef]
Zhu, J.; Liu, X. An integrated intrusion detection framework based on subspace clustering and ensemble learning. Comput. Electr. Eng. 2024, 115, 109113. [Google Scholar] [CrossRef]
Sharmila, B.; Nagapadma, R. Quantized autoencoder (QAE) intrusion detection system for anomaly detection in resource-constrained IoT devices using RT-IoT2022 dataset. Cybersecurity 2023, 6, 41. [Google Scholar] [CrossRef]
Kumar, V.; Minz, S. Feature selection. SmartCR 2014, 4, 211–229. [Google Scholar] [CrossRef]
Venkatesh, B.; Anuradha, J. A review of feature selection and its methods. Cybern. Inf. Technol. 2019, 19, 3–26. [Google Scholar] [CrossRef]
Bharadiya, J.P. The role of machine learning in transforming business intelligence. Int. J. Comput. Artif. Intell. 2023, 4, 16–24. [Google Scholar] [CrossRef]
Pande, S.; Khamparia, A.; Gupta, D. Feature selection and comparison of classification algorithms for wireless sensor networks. J. Ambient Intell. Humaniz. Comput. 2023, 14, 1977–1989. [Google Scholar] [CrossRef]
Moslemi, A. A tutorial-based survey on feature selection: Recent advancements on feature selection. Eng. Appl. Artif. Intell. 2023, 126, 107136. [Google Scholar] [CrossRef]
Masoudi-Sobhanzadeh, Y.; Motieghader, H.; Masoudi-Nejad, A. FeatureSelect: A software for feature selection based on machine learning approaches. BMC Bioinform. 2019, 20, 170. [Google Scholar] [CrossRef]
Win, T.Z.; Kham, N.S.M. Information Gain Measured Feature Selection to Reduce High Dimensional Data. Ph.D. Thesis, MERAL Portal, Naypyidaw, Myanmar, 2019. [Google Scholar]
Tamilmani, A.; Sughasiny, M. Gain ratio with optimization based feature selection method. Webology 2021, 18, 6545–6557. [Google Scholar]
Doshi, M. Correlation based feature selection (CFS) technique to predict student Perfromance. Int. J. Comput. Netw. Commun. 2014, 6, 197. [Google Scholar] [CrossRef]
Mei, K.; Tan, M.; Yang, Z.; Shi, S. Modeling of feature selection based on random forest algorithm and Pearson correlation coefficient. In Journal of Physics: Conference Series; IOP Publishing: Bristol, UK, 2022; p. 012046. [Google Scholar]
Mustafa, B.; Cudi, O.M. A Comprehensive Review of Feature Selection and Feature Selection Stability in Machine Learning; Gazi University: Ankara, Turkey, 2023. [Google Scholar]
Kamalov, F.; Moussa, S.; Zgheib, R.; Mashaal, O. Feature selection for intrusion detection systems. In Proceedings of the 2020 13th International Symposium on Computational Intelligence and Design (ISCID), Hangzhou, China, 12–13 December 2020; IEEE: Piscataway, NJ, USA, 2020. [Google Scholar]
Syafiuddin, N.H.; Mandala, S.; Cahyani, N.D.W. Detection syn flood and UDP lag attacks based on machine learning using AdaBoost. In Proceedings of the 2023 International Conference on Data Science and Its Applications (ICoDSA), Bandung, Indonesia, 9–10 August 2023; IEEE: Piscataway, NJ, USA, 2023. [Google Scholar]
Wibowo, N.A.; Ariami, D.; Lim, C. Analysis of SYN flood attack detection on web-based services using round trip time (RTT) calculation. In Proceedings of the 2023 IEEE International Conference on Cryptography, Informatics, and Cybersecurity (ICoCICs), Bogor, Indonesia, 22–24 August 2023; IEEE: Piscataway, NJ, USA, 2023. [Google Scholar]
Ishaq, M.; Khan, I.; Ullah, S.I.; Ullah, T. TCP flood attack detection on internet of things devices using CNN-GRU deep learning model. In Proceedings of the 2023 3rd International Conference on Digital Futures and Transformative Technologies (ICoDT2), Islamabad, Pakistan, 3–4 October 2023; IEEE: Piscataway, NJ, USA, 2023. [Google Scholar]
Sinha, M. SynFloWatch: A Detection System against TCP-SYN based DDoS Attacks using Entropy in Hybrid SDN. In Proceedings of the 25th International Conference on Distributed Computing and Networking, Chennai, India, 4–7 January 2024; ACM: New York, NY, USA, 2024. [Google Scholar]
Wang, Z.; Feng, X.; Li, Q.; Sun, K.; Yang, Y.; Li, M.; Du, G.; Xu, K.; Wu, J. Off-path TCP hijacking in Wi-Fi networks: A packet-size side channel attack. arXiv 2024, arXiv:2402.12716. [Google Scholar] [CrossRef]
Das, T.; Hamdan, O.A.; Sengupta, S.; Arslan, E. Flood control: TCP-SYN flood detection for software-defined networks using OpenFlow port statistics. In Proceedings of the 2022 IEEE International Conference on Cyber Security and Resilience (CSR), Rhodes, Greece, 27–29 July 2022; IEEE: Piscataway, NJ, USA, 2022; pp. 1–8. [Google Scholar]
Shirsath, V.A.; Chandane, M.M.; Lal, C.; Conti, M. SYNTROPY: TCP SYN DDoS attack detection for software defined network based on Rényi entropy. Comput. Netw. 2024, 244, 110327. [Google Scholar] [CrossRef]
Kim, S.; Jung, C.; Jang, R.; Mohaisen, D.; Nyang, D. A robust counting sketch for data plane intrusion detection. In Proceedings of the 2023 Network and Distributed System Security Symposium, San Diego, CA, USA, 27 February–3 March 2023; Internet Society: Reston, VA, USA, 2023. [Google Scholar]
Cohen, R.; Kadosh, M.; Lo, A.; Sayah, Q. On the protection of a high performance load balancer against SYN attacks. IEEE Trans. Cloud Comput. 2023, 11, 2897–2909. [Google Scholar] [CrossRef]
Qian, Z.; Gao, G.; Du, Y. Per-flow size measurement by combining sketch and flow table in software-defined networks. In Proceedings of the 2022 IEEE Intl Conf on Parallel & Distributed Processing with Applications, Big Data & Cloud Computing, Sustainable Computing & Communications, Social Computing & Networking (ISPA/BDCloud/SocialCom/SustainCom), Melbourne, Australia, 17–19 December 2022; IEEE: Piscataway, NJ, USA, 2022. [Google Scholar]
Pittman, J.M. A comparative analysis of port scanning tool efficacy. arXiv 2023, arXiv:2303.11282. [Google Scholar] [CrossRef]
Kundu, C.; Dubey, A.; Tonello, A.M.; Nallanathan, A.; Flanagan, M.F. Destination scheduling for secure pinhole-based power-line communication. IEEE Open J. Commun. Soc. 2023, 4, 2245–2260. [Google Scholar] [CrossRef]
Ebady Manaa, M.; Hussain, S.M.; Alasadi, S.A.; Al-Khamees, H.A.A. DDoS attacks detection based on machine learning algorithms in IoT environments. Intel. Artif. 2024, 27, 152–165. [Google Scholar] [CrossRef]
Jalo, H.; Heydarian, M. A hybrid technique based on RF-PCA and ANN for detecting DDoS attacks IoT. InfoTech Spectr. Iraqi J. Data Sci. 2024, 1, 27–41. [Google Scholar] [CrossRef]
Monda, D.D.; Bovenzi, G.; Montieri, A.; Persico, V.; Pescapè, A. IoT botnet-traffic classification using few-shot learning. In Proceedings of the 2023 IEEE International Conference on Big Data (BigData), Sorrento, Italy, 15–18 December 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 3284–3293. [Google Scholar]
Saif, S.; Widyawan, W.; Ferdiana, R. IoT-DH dataset for classification, identification, and detection DDoS attack in IoT. Data Brief 2024, 54, 110496. [Google Scholar] [CrossRef]
Famera, A.G.; Shukla, R.M.; Bhunia, S. Cross device federated intrusion detector for early stage botnet propagation in IoT. In Proceedings of the 2024 IEEE International Systems Conference (SysCon), Montreal, QC, Canada, 15–18 April 2024. [Google Scholar]
Zeng, Y.; Zhang, J.; Zhong, Y.; Deng, L.; Wang, M. STNet: A time-frequency analysis-based intrusion detection network for distributed optical fiber acoustic sensing systems. Sensors 2024, 24, 1570. [Google Scholar] [CrossRef] [PubMed]
Mata-Hernandez, R.; Cardenas-Juarez, M.; Simón, J.; Stevens-Navarro, E.; Rizzardi, A. Exploring the path loss of a hacking tool for security matters in the internet of things. In Proceedings of the 2023 IEEE International Autumn Meeting on Power, Electronics and Computing (ROPEC), Ixtapa, Mexico, 18–20 October 2023; IEEE: Piscataway, NJ, USA, 2023. [Google Scholar]
Pakmehr, A.; Aßmuth, A.; Taheri, N.; Ghaffari, A. DDoS attack detection techniques in IoT networks: A survey. Clust. Comput. 2024, 27, 14637–14668. [Google Scholar] [CrossRef]
Saiyed, M.F.; Al-Anbagi, I. Flow and unified information-based DDoS attack detection system for multi-topology IoT networks. Internet Things 2023, 24, 100976. [Google Scholar] [CrossRef]
Pandey, N.; Mishra, P.K. Performance analysis of entropy variation-based detection of DDoS attacks in IoT. Internet Things 2023, 23, 100812. [Google Scholar] [CrossRef]
Pravylo, V.; Averkiiev, Y. Analysing malicious software supporting DDoS attacks on IoT networks. Inf. Telecommun. Sci. 2024, 1, 50–54. [Google Scholar] [CrossRef]
Al-Khafajiy, M.; Al-Tameemi, G.; Baker, T. DDoS-FOCUS: A distributed DoS attacks mitigation using deep learning approach for a secure IoT network. In Proceedings of the 2023 IEEE International Conference on Edge Computing and Communications (EDGE), Chicago, IL, USA, 2–8 July 2023; IEEE: Piscataway, NJ, USA, 2023. [Google Scholar]
Alatram, A.; Sikos, L.F.; Johnstone, M.; Szewczyk, P.; Kang, J.J. DoS/DDoS-MQTT-IoT: A dataset for evaluating intrusions in IoT networks using the MQTT protocol. Comput. Netw. 2023, 231, 109809. [Google Scholar] [CrossRef]
Mekala, S.H.; Baig, Z.; Anwar, A.; Syed, N. DoS attacks, human factors, and evidence extraction for the industrial internet of things (IIoT) paradigm. In Proceedings of the 2023 38th IEEE/ACM International Conference on Automated Software Engineering Workshops (ASEW), Luxembourg, 11–15 September 2023; IEEE: Piscataway, NJ, USA, 2023. [Google Scholar]
Kamal, T.; Helmy, E.; Fahmy, S.; Abd El-Azeem, M.H. Detecting and preventing for performance assessment of IoT devices under DODAG information solicitation (DIS) attacks. In Proceedings of the 2023 40th National Radio Science Conference (NRSC), Giza, Egypt, 30 May–1 June 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 110–120. [Google Scholar]
Bouazzati, M.E.; Tessier, R.; Tanguy, P.; Gogniat, G. A lightweight intrusion detection system against IoT memory corruption attacks. In Proceedings of the 2023 26th International Symposium on Design and Diagnostics of Electronic Circuits and Systems (DDECS), Tallinn, Estonia, 3–5 May 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 118–123. [Google Scholar]
Brezolin, U.; Vergütz, A.; Nogueira, M. A method for vulnerability detection by IoT network traffic analytics. Ad Hoc Netw. 2023, 149, 103247. [Google Scholar] [CrossRef]
Alanazi, H.; Bi, S.; Wang, T.; Hou, T. Adaptive feature engineering via attention-based LSTM towards high performance reconnaissance attack detection. In Proceedings of the MILCOM 2023-2023 IEEE Military Communications Conference (MILCOM), Boston, MA, USA, 30 October–3 November 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 542–547. [Google Scholar]
Keshavamurthy, P.; Kulkarni, S. Early detection of reconnaissance attacks on IoT devices by analyzing performance and traffic characteristics. In Proceedings of the 2023 IEEE International Conference on Cyber Security and Resilience (CSR), Venice, Italy, 31 July–2 August 2023; IEEE: Piscataway, NJ, USA, 2023. [Google Scholar]
Ma, X.; Yan, C.; Wang, Y.; Wei, Q.; Wang, Y. A vulnerability scanning method for web services in embedded firmware. Appl. Sci. 2024, 14, 2373. [Google Scholar] [CrossRef]
Bassiony, I.; Hussein, S.; Salama, G. Position falsification detection approach using travel distance-based feature. Transp. Telecommun. J. 2024, 25, 278–288. [Google Scholar] [CrossRef]
Li, S.; Zhu, Z.; Zhu, Y.; Zhu, Q.; Zhang, J.; Sun, W.; Dai, G.; Qiao, F.; Yang, H.; Wang, Y. Memory-efficient and real-time SPAD-based dToF depth sensor with spatial and statistical correlation. In Proceedings of the 2023 60th ACM/IEEE Design Automation Conference (DAC), San Francisco, CA, USA, 9–13 July 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6. [Google Scholar]
Niang, P. Analysis of Data Sets for the Study of Computer Network Vulnerabilities. In Intelligent Transport Systems; Russian University of Transport: Moscow, Russia, 2024; pp. 699–709. [Google Scholar]
Kaur, K.; Ayoade, J. Analysis of DDoS attacks on IoT architecture. In Proceedings of the 2023 10th International Conference on Electrical Engineering, Computer Science and Informatics (EECSI), Palembang, Indonesia, 20–21 September 2023; IEEE: Piscataway, NJ, USA, 2023. [Google Scholar]
Ullah, S.; Mahmood, Z.; Ali, N.; Ahmad, T.; Buriro, A. Machine learning-based dynamic attribute selection technique for DDoS attack classification in IoT networks. Computers 2023, 12, 115. [Google Scholar] [CrossRef]
Kumar, K.R.; Nakkeeran, R. A comprehensive study on denial of service (DoS) based on feature selection of a given set datasets in internet of things (IoT). In Proceedings of the 2023 International Conference on Signal Processing, Computation, Electronics, Power and Telecommunication (IConSCEPT), Karaikal, India, 25–26 May 2023; IEEE: Piscataway, NJ, USA, 2023. [Google Scholar]

Figure 1. The proposed feature selection methodology.

Figure 2. Pearson’s analysis: feature importance analysis. (a) The importance scores for all features. (b) Performance when using the MLP classifier, with the respective threshold values.

Figure 3. Gain ratio: feature importance analysis. (a) The importance scores for all features. (b) Performance when using the MLP classifier, with the respective threshold values.

Figure 4. Information gain: feature importance analysis. (a) The importance scores for all features. (b) Performance when using the MLP classifier, with the respective threshold values.

Figure 5. Symmetrical uncertainty: feature importance analysis. (a) The importance scores for all features. (b) Performance when using the MLP classifier, with the respective threshold values.

Figure 6. Correlation coefficient heatmap indicating the correlations between attack type and the identified important features.

Table 1. Environment setup.

Hardware/Software	Specification/Version
OS	macOS Big Sur 11.7.10
CPU	2.3 GHz 8-Core Intel Core i9
Hard disk space	1 TB
RAM	16 GB
GPU	AMD Radeon Pro (4 GB); Intel UHD Graphics 630 (1536 MB)
Weka	3.8.6
Python	3.9
NumPy	1.26.4
Pandas	2.2.2
Matplotlib	3.8.4
Scikit-learn	1.4.2

Table 2. Comparative analysis of feature selection techniques.

Feature Selection	Search Method	Attribute Evaluator	Time (s)	Number of Features
CFS	Best first	CFS subset evaluator	13.3	5
Pearson’s analysis	Attribute ranking	Correlation ranking filter	1.2	32
Gain ratio	Attribute ranking	Gain ratio feature evaluator	9.75	51
Information gain	Attribute ranking	Information gain ranking filter	9.24	45
Symmetrical uncertainty	Attribute ranking	Symmetrical uncertainty ranking filter	10.02	60

Table 3. The occurrence of the most significant features. A value of ‘1’ means that the feature appeared in the specified feature selection method, while a value of ‘0’ means that the feature did not appear.

Feature	CFS	Pearson’s Analysis	Gain Ratio	Information Gain	Symmetric Uncertainty	Number of Occurrences
fwd_init_window_size	1	1	1	1	1	5
bwd_pkts_payload.avg	0	1	1	1	1	4
bwd_pkts_payload.max	0	1	1	1	1	4
bwd_pkts_payload.std	0	1	1	1	1	4
flow_SYN_flag_count	0	1	1	1	1	4
flow_iat.std	0	1	1	1	1	4
flow_pkts_payload.max	0	1	1	1	1	4
fwd_iat.avg	0	1	1	1	1	4
fwd_iat.max	0	1	1	1	1	4
fwd_last_window_size	1	0	1	1	1	4
fwd_pkts_payload.avg	0	1	1	1	1	4
fwd_pkts_payload.max	1	0	1	1	1	4
fwd_subflow_pkts	0	1	1	1	1	4
id.resp_p	0	1	1	1	1	4
payload_bytes_per_second	0	1	1	1	1	4
service	0	1	1	1	1	4
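
The occurrence counts in Table 3 amount to a simple vote across the five feature selection methods. The following is a minimal sketch of that tally, not the authors' code: the helper name `count_occurrences` is hypothetical, the `selections` dictionary is an abbreviated illustration holding only three of the features listed in Table 3, and the cut-off of four occurrences is an assumption inferred from the fact that every feature in the table appears in at least four methods.

```python
from collections import Counter


def count_occurrences(selections):
    """Count how many feature selection methods picked each feature."""
    counts = Counter()
    for features in selections.values():
        # Use a set so that each method votes at most once per feature.
        counts.update(set(features))
    return counts


# Abbreviated, illustrative selector outputs; membership follows Table 3.
selections = {
    "CFS": ["fwd_init_window_size", "fwd_last_window_size"],
    "Pearson's analysis": ["fwd_init_window_size", "flow_SYN_flag_count"],
    "Gain ratio": ["fwd_init_window_size", "fwd_last_window_size",
                   "flow_SYN_flag_count"],
    "Information gain": ["fwd_init_window_size", "fwd_last_window_size",
                         "flow_SYN_flag_count"],
    "Symmetrical uncertainty": ["fwd_init_window_size", "fwd_last_window_size",
                                "flow_SYN_flag_count"],
}

counts = count_occurrences(selections)

# Assumed cut-off: keep features chosen by at least four of the five methods.
min_votes = 4
selected = sorted(f for f, n in counts.items() if n >= min_votes)

for feature in selected:
    print(f"{feature}: {counts[feature]} of {len(selections)} methods")
```

Counting set membership rather than rank position keeps the tally independent of the different score scales used by each method, which is why a plain vote is sufficient here.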

Table 4. Comparison of the performance of the MLP classifier using the full features dataset and different reduced feature datasets.

Methods	Number of Features	Accuracy	Precision	Recall	F1-Score
Original data	83	93.5%	61.7%	99.7%	76.2%
All FS methods	16	96.4%	97.4%	87.1%	91.9%
CFS	5	93.1%	80.9%	63.0%	70.8%
Pearson’s analysis	32	94.8%	76.8%	99.1%	86.5%
Gain ratio	51	96.0%	84.0%	90.3%	87.0%
Information gain	45	95.1%	92.9%	89.9%	91.4%
Symmetrical uncertainty	60	95.6%	93.9%	93.9%	93.9%
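
As a consistency check on Table 4, the F1-score can be recomputed from the reported precision and recall. The sketch below assumes that the reported F1-score is the harmonic mean of the precision and recall in the same row (the averaging scheme across classes is not stated in the table); the helper name `f1_from` and the `rows` dictionary are illustrative only. Because the table values are rounded to one decimal place, small differences between the recomputed and reported figures are expected.

```python
def f1_from(precision, recall):
    """Harmonic mean of precision and recall (both given as fractions)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision %, recall %, reported F1 %) copied from Table 4.
rows = {
    "Original data": (61.7, 99.7, 76.2),
    "All FS methods": (97.4, 87.1, 91.9),
    "CFS": (80.9, 63.0, 70.8),
    "Pearson's analysis": (76.8, 99.1, 86.5),
    "Gain ratio": (84.0, 90.3, 87.0),
    "Information gain": (92.9, 89.9, 91.4),
    "Symmetrical uncertainty": (93.9, 93.9, 93.9),
}

# Absolute gap, in percentage points, between recomputed and reported F1.
differences = {}
for method, (precision, recall, reported) in rows.items():
    computed = 100 * f1_from(precision / 100, recall / 100)
    differences[method] = abs(computed - reported)
    print(f"{method}: computed {computed:.2f}%, reported {reported:.1f}%")

max_difference = max(differences.values())
```

Under this assumption, every row agrees with the reported F1-score to within 0.1 percentage points, which is consistent with rounding of the published precision and recall values.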

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

