1. Introduction
The rapid evolution of the Internet of Things (IoT) and the advent of 5G technology have profoundly transformed various sectors, including e-learning, e-health, and intelligent industrial manufacturing. While these advancements have integrated smart devices into our daily lives, they have also introduced numerous security challenges. The interconnectivity of devices in IoT ecosystems creates vulnerabilities that cyberattackers can exploit, threatening the integrity and security of these systems [1]. This vulnerability is particularly dangerous in IoT devices operating over 5G networks, which enhance connectivity and capacity while expanding the spectrum of frequencies available for mobile networks [2].
As 5G networks mature, wireless communication technology continues to advance, and each new technology brings new security issues. Developing robust security guidelines and approaches is critical, particularly as more individuals and systems rely on 5G networks; this entails keeping data safe, safeguarding essential services and systems, and protecting user privacy. Addressing these security problems directly is a prerequisite for a secure and robust 5G network. The increased connectivity and bandwidth of 5G networks also allow more mobile devices to connect simultaneously [3].
The faster speeds and shorter delays of 5G technology raise additional concerns about user privacy and data security. The increased amount of data generated and shared creates risks of unauthorized access to private or sensitive information [4]. Network elements such as base stations and edge servers are potential failure points that criminals might exploit to gain unauthorized access to user data. The interconnected nature of devices and systems in 5G networks heightens these risks [5]. Furthermore, the vast amount of data disseminated by IoT devices using 5G raises concerns about data security, including the resale of these data, user profiling, and intelligence collection [6,7].
One security threat to IoT devices is the Cross-Site Scripting (XSS) attack, which can be particularly harmful to web-based networks connected to IoT devices [8]. These attacks allow an attacker to exploit XSS vulnerabilities in web applications, altering IoT devices’ behavior or gaining access to the system to acquire sensitive information. Severe consequences of XSS attacks on IoT include unauthorized use of devices, data breaches, privacy breaches, and potential physical harm or safety threats [9]. The high-speed distribution of data over 5G networks amplifies the threat posed by XSS attacks, making it critical to develop a comprehensive and proactive security approach [10].
XSS is a type of security vulnerability commonly found in web applications that allows attackers to inject malicious scripts into web pages viewed by other users [11]. These scripts can execute in the context of the user’s browser, potentially leading to unauthorized actions, data theft, session hijacking, and more [12]. XSS attacks come in three main types: Stored XSS, Reflected XSS, and DOM-based XSS. Stored XSS involves permanently storing malicious scripts on a target server, such as in a database, comment field, or forum post, from which they are served to users’ browsers upon request [13]. Reflected XSS occurs when malicious scripts are reflected off a web server in immediate responses, such as error messages or search results, executing as soon as the user receives them [14]. DOM-based XSS happens when client-side scripts in the web application modify the DOM environment in unsafe ways, enabling the execution of an attacker’s script [15].
Integrating IoT devices with 5G networks creates new vectors for XSS attacks due to increased connectivity, faster data transmission speeds, and device interdependencies [16]. Attackers can inject malicious scripts into IoT devices that host web interfaces, such as smart home devices with web dashboards. These malicious scripts can be injected into device logs, configurations, or user interfaces. In a 5G network, where IoT devices frequently communicate, a script injected into one device could propagate to other interconnected devices [17]. The high-speed data transmission capabilities of 5G networks mean that the execution of malicious scripts can happen more quickly and on a larger scale, leading to widespread disruption if one device is compromised.
Web-based management interfaces for IoT devices are another common target for XSS attacks. Attackers can inject scripts into fields that administrators or users interact with, like login pages, settings, or logs, causing the malicious script to execute when accessed [18]. Social engineering and phishing are also tactics used by attackers to trick users into clicking on links that contain malicious scripts, exploiting vulnerabilities in the web interfaces of IoT devices [19]. Additionally, as 5G enhances mobile device capabilities, mobile apps that control IoT devices can be targeted by injecting scripts into web views within these apps, compromising the managed IoT devices [20].
Mitigating XSS attacks in IoT over 5G involves several strategies. Input validation and sanitization ensure that all input received by IoT devices and web interfaces is free of malicious scripts [21]. Implementing a Content Security Policy (CSP) restricts the sources from which scripts can be executed. Regular security updates for IoT device firmware and software patch known vulnerabilities. Secure coding practices during development can prevent the introduction of XSS vulnerabilities. Additionally, educating users about the dangers of phishing and social engineering attacks is crucial in reducing the risk of XSS attacks. By employing these measures, the security of IoT devices operating over 5G networks can be significantly improved, mitigating the potential impact of XSS attacks [22].
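As an illustration of two of the mitigations listed above, the following minimal Python sketch escapes untrusted text before it is placed in an HTML page and defines a restrictive CSP header. The function name `render_device_log_entry`, the table-row markup, and the specific policy string are hypothetical choices made for this example; they are not taken from any system evaluated in this paper.

```python
import html

def render_device_log_entry(entry: str) -> str:
    """Return an HTML table row with the untrusted log text escaped."""
    # html.escape converts <, >, & and quotes into HTML entities, so an
    # injected <script> tag is displayed as text instead of being executed.
    return "<tr><td>" + html.escape(entry, quote=True) + "</td></tr>"

# Hypothetical CSP header: only scripts served from the device's own origin may run.
CSP_HEADER = ("Content-Security-Policy", "default-src 'self'; script-src 'self'")

payload = "<script>alert(1)</script>"
safe_row = render_device_log_entry(payload)
print(safe_row)
```

Output encoding of this kind addresses Stored and Reflected XSS at the point where data are rendered; the CSP header acts as a second layer if an injection is missed.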
Contribution
The main contributions of this research to IoT security over 5G networks can be summarized as follows:
Novel ANN Application for XSS Detection: We introduced a novel ANN approach to detect XSS attacks in IoT systems over 5G, significantly improving detection accuracy and efficiency compared with traditional methods.
Comprehensive Dataset Utilization: We employed NF-ToN-IoT-v2 and Edge-IIoTset datasets to validate the model’s effectiveness and reliability across diverse IoT environments, ensuring generalizability and robustness.
Enhanced Feature Selection: We utilized both filter (mutual information (MI)) and wrapper (recursive feature elimination (RFE)) feature selection methods to optimize predictive performance, reducing computational complexity while maintaining high accuracy.
Statistical Validation via ANOVA Test: We applied an ANOVA test to confirm significant improvements in detection accuracy, addressing performance variability due to initial conditions and ensuring robustness and consistency.
High Detection Accuracy: We achieved remarkable detection accuracies with BLNN and TLNN models, reaching up to 99.84% using BLNN on the NF-ToN-IoT-v2 dataset and 99.79% using TLNN on the Edge-IIoTset dataset, demonstrating the potential for real-time intrusion detection in IoT systems over 5G networks.
The rest of the paper is structured as follows:
Section 2 provides related works;
Section 3 presents the proposed methodology to detect XSS attacks;
Section 4 presents the results and performance evaluation of the ANN detection approach;
Section 5 presents the results of the ANOVA test;
Section 6 discusses the effectiveness and efficiency of the proposed approach over the related works; and finally,
Section 7 concludes the paper and suggests future research directions.
2. Related Work
The widespread use of IoT systems and the deployment of 5G networks have significantly changed how we interact with our environments. However, this new level of connectivity has also opened up many security risks, with XSS attacks posing a significant threat to the security of IoT systems. This section reviews the literature on using Machine Learning (ML) and Deep Learning (DL) to secure 5G IoT networks, focusing on studies that address XSS attacks on IoT systems.
Duan et al. [23] addressed the intrusion detection problem in IoT systems, particularly relevant to smart cities. The researchers proposed a novel approach supported by dynamic line graph neural networks and semi-supervised learning. They tested their model on six datasets, including NF-ToN-IoT-V2, and achieved the highest detection accuracy of 95.70% for XSS attacks.
Gaber et al. [24] suggested an injection attack detection system for IoT, proposing an Intrusion Detection System (IDS). They investigated two feature selection approaches, constant removal and recursive feature elimination, with three machine learning classifiers: Support Vector Machines (SVMs), Random Forest, and Decision Tree. Using the AWID dataset (version AWID-CLS-R, created by Constantinos Kolias et al., University of the Aegean, Samos, Greece), the Decision Tree classifier outperformed the others with a 99% injection attack detection rate using only eight selected features. This research highlights the importance of injection attack detection for the security of smart cities, where numerous threats are anticipated due to their development. Awad et al. [25] emphasized the rapid increase in cyberattacks on IoT networks and devices, highlighting the significance of ML in Network Intrusion Detection Systems (NIDSs). They noted that the prediction time in anomaly-based NIDSs is directly proportional to the number of features used by the ML model. Their proposed model achieved a detection accuracy of 98% for XSS attacks using just 13 features, demonstrating the effectiveness of the feature importance model.
Yigit et al. [26] conducted a study on a digital twin-empowered smart attack detection system for 6G Edge of Things (EoT) networks using the ToN-IoT datasets. They employed an online learning algorithm with AutoFS and AutoCM for dynamic and adaptive feature selection and classification. Their system achieved a sensitivity metric of 98.04% for XSS attack detection, proving its efficiency in detecting and preventing these attacks due to innovative feature selection and machine learning techniques. Sarhan et al. [27] explored XSS attack detection within IoT environments, integrating their work with the NF-ToN-IoT-V2 dataset. They utilized a machine-learning-based model to identify XSS injection attacks prevalent in such networks. Their model showcased robustness with an accuracy of 96.83% in detecting XSS attacks.
Awad et al. [28] conducted a study focused on enhancing IIoT security using ML and DL techniques for intrusion detection. The primary objective was to detect and mitigate 14 distinct types of cyberattacks targeting IIoT and IoT protocols. Their methodology involved using the Edge-IIoTset dataset. They implemented various ML algorithms, including k-nearest neighbors (K-NNs), Decision Trees (DTs), and neural networks (NNs). The experiments were conducted using the KNIME platform, with preprocessing steps that included data cleaning, missing-value handling, and normalization to improve classification performance. Their results revealed that the K-NN algorithm achieved an accuracy of 54.37%, while the DT algorithm achieved 85.48% accuracy in detecting XSS attacks. Their study is relevant as it focuses on the effectiveness of ML and DL in securing IoT environments, aligning with our goal of using ANN for XSS attack detection over 5G networks. However, its lower accuracy for specific attacks like XSS indicates a gap in comprehensive threat detection.
Ahmed and Askar [29] developed EdgeGuard, a framework utilizing machine learning for proactive intrusion detection on edge networks. The main aim was to identify and counteract various cyber threats targeting edge and IoT environments. Their approach used convolutional neural networks (CNNs) with residual connections to effectively identify complex patterns in network traffic data. The experiments, conducted using the Edge-IIoTset dataset, demonstrated that their method achieved 77% accuracy in detecting XSS attacks. This research is significant as it showcases the effectiveness of ML techniques in enhancing the security of IoT and edge environments, aligning with our objective of using ANN for detecting XSS attacks over 5G networks.
Ferrag et al. [30] introduced SecurityBERT, a model designed to be both lightweight and privacy-preserving, utilizing the BERT architecture to detect cyber threats in IoT devices. The research focused on enhancing threat detection accuracy while keeping computational demands low, thus making the model ideal for use in environments with limited resources. The methodology incorporated Privacy-Preserving Fixed-Length Encoding (PPFLE) and the Byte-Level Byte-Pair Encoder (BBPE) Tokenizer to effectively process network traffic data. Testing on the Edge-IIoTset dataset showed that SecurityBERT achieved an overall accuracy of 98.2% in detecting fourteen types of attacks, and it achieved 76.22% accuracy specifically for XSS attack detection.
The literature review highlights the significance of Artificial Intelligence (AI) approaches in securing 5G IoT networks from XSS attacks. Most studies noted the increased vulnerability caused by the adoption of IoT systems and the societal transformation facilitated by 5G networks. They emphasized the need for feature selection approaches to enhance the detection rate of intrusion detection systems, affirming the necessity of improving security in AI optimization. Thus, these efforts are justified to enhance the security and preservation of IoT in smart cities despite increasing insecurity.
3. Proposed Methodology
The rapid development of 5G technology and IoT has significantly transformed various aspects of modern life. These advancements have provided mobile networks with wider bandwidths, faster connections, and improved performance. However, they have also introduced a new range of security threats. Among these, XSS attacks are particularly impactful on data confidentiality, exploiting vulnerabilities in network components and web applications and jeopardizing user privacy and data security. The primary objective of this research is to develop a robust deep learning method for detecting XSS attacks on 5G-enabled IoT devices. This approach is crucial for preventing security breaches and ensuring the overall security of IoT systems.
Figure 1 illustrates the proposed methodology for identifying XSS attacks using the NF-ToN-IoT-v2 and Edge-IIoTset datasets.
The methodology begins with selecting subsets from the NF-ToN-IoT-v2 and Edge-IIoTset datasets that focus on “XSS” and “benign” categories. The preprocessing steps include data cleaning, normalization, and label encoding. The Synthetic Minority Over-sampling Technique (SMOTE) is applied to address the issue of imbalanced data. Next, both filter and wrapper feature selection methods are used to identify the most valuable features for XSS detection by ANN architectures. The dataset is divided into subsets: 70% for training and 30% for testing. Various ANN classifiers, including Narrow Neural Network, Bilayered Neural Network, and Trilayered Neural Network, are then employed to detect XSS attacks. The performance of these models is evaluated using metrics such as accuracy, precision, recall, and F1-score. Additionally, the ANOVA statistical test is applied to validate the results. This approach demonstrates its effectiveness in enhancing IoT system security and mitigating potential risks by achieving high accuracy in identifying XSS attacks.
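The oversampling step mentioned above can be illustrated with a minimal sketch of the SMOTE idea: each synthetic minority sample is an interpolation between a real minority sample and one of its nearest minority neighbours. This is a simplified, standard-library version written for illustration only; the function name `smote`, its parameters, and the toy points are assumptions, and it is not the implementation used in the experiments.

```python
import random

def smote(minority, n_synthetic, k=2, seed=0):
    """Create synthetic minority samples by interpolating between neighbours.

    minority: list of feature vectors (lists of floats), at least two samples.
    """
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # k nearest minority neighbours of the base sample (squared Euclidean distance).
        neighbours = sorted(
            (p for p in minority if p is not base),
            key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)),
        )[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # interpolation factor in [0, 1)
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic

# Toy minority class (made-up points on the unit square).
minority_points = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
new_points = smote(minority_points, n_synthetic=6)
print(len(new_points))
```

Because every synthetic point lies on a segment between two real minority samples, the new samples stay inside the region already occupied by the minority class.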
3.1. Dataset
In this work, we adopted two datasets to evaluate the performance of our model as follows:
NF-ToN-IoT-v2 Dataset: All the NetFlow records in the ToN-IoT dataset are generated from publicly available packet captures (pcaps), resulting in the development of NFSIoT, a NetFlow-based IoT network dataset. The NF-ToN-IoT-v2 dataset comprises a total of 16,940,496 data flows. Of these, 6,099,469 (36.01%) are benign samples, and 10,841,027 (63.99%) are attack samples [27]. The sampling distribution is well balanced, as shown in Table 1.
Table 2 provides a comprehensive overview of the features included in the NF-ToN-IoT-v2 dataset, which is utilized for detecting XSS attacks in IoT environments over 5G networks. Each feature is described with its corresponding data type and classification as categorical or numerical. Key features include network-related attributes such as source and destination IP addresses, port numbers, protocol identifiers, and traffic metrics like byte and packet counts. Additional features capture specific behaviors and network traffic characteristics, such as TCP flags, flow durations, TTL values, packet lengths, and retransmission statistics. The dataset also includes metadata about DNS queries and FTP command responses, which are critical for identifying anomalies indicative of security threats. Combining these diverse features enables a thorough analysis and accurate classification of network traffic, facilitating the detection of potential XSS vulnerabilities and enhancing the overall security of IoT systems in high-speed 5G networks.
Edge-IIoTset Dataset: The Edge-IIoTset dataset is a comprehensive cybersecurity dataset for IoT and Industrial Internet of Things (IIoT) environments, designed to support developing and evaluating intrusion detection systems. This dataset contains network traffic data collected from various IoT devices under normal and attack conditions. The dataset comprises 157,800 records featuring diverse types of cyberattacks and benign instances [31], as shown in Table 3.
Table 4 provides a comprehensive overview of the features included in the Edge-IIoTset dataset, which is utilized for detecting various cyberattacks in IoT and IIoT environments. Each feature is described with its corresponding data type, either categorical or numerical. Key features include network-related attributes such as source and destination IP addresses, port numbers, protocol identifiers, and traffic metrics like byte and packet counts. Additional features, such as TCP flags, flow durations, and retransmission statistics, capture specific behaviors and network traffic characteristics. The dataset also includes metadata about HTTP requests, DNS queries, and other protocol-specific details critical for identifying anomalies indicative of security threats. Combining these diverse features enables a thorough analysis and accurate classification of network traffic, facilitating the detection of potential cyber vulnerabilities and enhancing the overall security of IoT systems.
3.2. Data Preprocessing
Preprocessing is essential in transforming unprocessed input into usable data, involving extensive cleaning to remove errors and superfluous information. Our method modifies the raw NF-ToN-IoT-v2 dataset, which includes 44 attributes and 53,464 entries, by correcting anomalies with substitute values and addressing missing entries using the maximum values of attributes. To facilitate the training of ANNs, we convert categorical labels into numeric codes, classifying network traffic as “Benign (0)” or “Attack (1)”, and use the SMOTE technique to balance the dataset. We also apply feature selection techniques to decrease the size of the dataset and simplify the computational demands, enhancing the effectiveness and quality of the analysis. These steps in the data preparation process are crucial for ensuring that the ANN model is trained efficiently and effectively utilizing MATLAB software (version R2023a, MathWorks, Natick, MA, USA).
Data Cleaning: The first step involved identifying and eliminating all cases with missing (NaN) or infinite (Inf) values. Such values may arise from measurement errors or data corruption. Once these cases were removed, the dataset became more uniform and accurate for further analysis and model training [32].
Data Normalization: The data were normalized to improve training and the performance of the ANN model. Normalization is critical when the features of a dataset differ greatly in numerical scale. Scaling all features to the range between 0 and 1 supports classification accuracy and also reduces training time and potential error, since features with larger magnitudes no longer dominate the learning process [33].
Label Encoding: Label encoding is a fundamental preprocessing step in the field of ANN, particularly useful when working with categorical data. This technique transforms categorical variables into a numerical format, making them compatible with ANN algorithms that require numerical input. In this study, we employed label encoding to convert categorical features into numeric labels, facilitating the training and evaluation of our predictive model [34]. The adopted label encoding process can be summarized using Algorithm 1.
Algorithm 1: Label encoding of categorical features.
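The three preprocessing steps described in this subsection (cleaning, normalization, and label encoding) can be sketched in a few lines of standard-library Python. This is an illustrative sketch under simplifying assumptions, not the MATLAB pipeline used in the study: the function names, the sorted-order encoding rule, and the toy rows are all hypothetical.

```python
import math

def drop_invalid(rows, labels):
    """Remove samples whose feature vector contains NaN or infinite values."""
    kept = [(r, l) for r, l in zip(rows, labels) if all(math.isfinite(v) for v in r)]
    return [r for r, _ in kept], [l for _, l in kept]

def min_max_normalize(rows):
    """Scale every feature column to the range [0, 1]."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(r, lows, highs)]
        for r in rows
    ]

def label_encode(values):
    """Map each distinct category to an integer code (sorted order, an assumption)."""
    mapping = {category: code for code, category in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

# Toy data (made up): two numeric features and a class label per flow.
raw_rows = [[10.0, 200.0], [20.0, float("nan")], [30.0, 400.0], [float("inf"), 100.0]]
raw_labels = ["Benign", "xss", "xss", "Benign"]

clean_rows, clean_labels = drop_invalid(raw_rows, raw_labels)
normalized = min_max_normalize(clean_rows)
encoded, mapping = label_encode(clean_labels)
print(normalized, encoded, mapping)
```

Constant columns are mapped to 0.0 here to avoid division by zero; other conventions are possible.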
3.3. Filter Feature Selection Method
We utilized mutual information (MI) to evaluate and select features. This method operates independently of any ANN model, focusing instead on the intrinsic properties of the data. The fundamental concept behind filter methods is to score the relevance of features based on statistical measures, in this case the MI, which quantifies the amount of information one variable provides about another. The MI between a feature and the target variable is a non-negative value that measures their dependency [35]. It is calculated as in Equation (1):
$$I(X;Y) = \sum_{x \in X} \sum_{y \in Y} p(x,y)\,\log\frac{p(x,y)}{p(x)\,p(y)} \tag{1}$$
where $X$ and $Y$ are the feature and target variable, respectively; $p(x,y)$ is the joint probability distribution function of $X$ and $Y$; and $p(x)$ and $p(y)$ are the marginal probability distribution functions of $X$ and $Y$. A higher $I(X;Y)$ score indicates a greater relevance of the feature to the target variable, suggesting that the feature shares more information with the target.
The process began with the encoded dataset S, from which the features X and the target variable y were isolated. Following this, for each feature f in X, we computed the mutual information score between f and y using the mutual_info_classif function. We then selected the top ten features with the highest mutual information (MI) scores to ensure an optimal balance between retaining highly informative features and minimizing model complexity. This approach was strategically chosen based on empirical evidence, suggesting that selecting features with the highest MI scores significantly enhances the model’s predictive accuracy while reducing the risk of overfitting. Algorithm 2 illustrates the details of this method.
Figure 2 presents the MI scores for all features in the NF-ToN-IoT-V2 dataset, and Figure 3 presents the selected features with the highest MI values. Figure 4 presents the MI scores for all features in the Edge-IIoTset dataset, and Figure 5 presents the selected features with the highest MI values.
Algorithm 2: Filter feature selection based on mutual information (MI).
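To make the filter step concrete, the sketch below computes the MI between a discrete feature and the target directly from empirical frequencies and then keeps the k highest-scoring features. It is a simplified plug-in estimator for discrete values only and is an assumption made for illustration; the `mutual_info_classif` function named in the text handles continuous features with a different estimator. The function names and toy columns are hypothetical.

```python
import math
from collections import Counter

def mutual_information(feature, target):
    """Plug-in MI estimate (in nats) between two discrete sequences."""
    n = len(feature)
    joint = Counter(zip(feature, target))
    p_x = Counter(feature)
    p_y = Counter(target)
    mi = 0.0
    for (x, y), count in joint.items():
        p_xy = count / n
        mi += p_xy * math.log(p_xy / ((p_x[x] / n) * (p_y[y] / n)))
    return mi

def top_k_features(columns, target, k):
    """Rank named feature columns by MI with the target and keep the best k."""
    scores = {name: mutual_information(col, target) for name, col in columns.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)
    return ranked[:k], scores

# Toy example: feature "a" determines the label, feature "b" is independent of it.
target = [0, 0, 1, 1]
columns = {"a": [0, 0, 1, 1], "b": [0, 1, 0, 1]}
selected, scores = top_k_features(columns, target, k=1)
print(selected, scores)
```

In this toy case the informative feature scores log 2 (about 0.693 nats) and the independent feature scores 0, so only the former is retained.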
3.4. Wrapper Feature Selection Method
We employed the recursive feature elimination (RFE) method for feature selection. Unlike filter methods, RFE operates with a predictive model, iteratively refining the feature subset to enhance model performance [36]. The core principle of wrapper methods is to evaluate feature subsets based on the performance of a chosen model, optimizing for the most predictive combination of features. RFE begins by training an estimator on the entire set of features and computing the importance of each feature. The least important features are then recursively pruned from the current set of features. Specifically, RFE ranks the features based on their importance and recursively removes the least important feature, refitting the model on the remaining features in each iteration until the desired number of features is reached [37].
In this study, we utilized a Random Forest classifier as the estimator within the RFE algorithm, leveraging its robustness and capability to handle complex datasets. The process started with the encoded dataset S, isolating the features X and the target variable y. We then applied the RFE method with this estimator to iteratively rank and select the top ten most important features. Equation (2) expresses the selection as follows:
$$X^{*} = \underset{X' \subseteq X,\; |X'| = k}{\arg\min}\; L\big(f(X')\big) \tag{2}$$
where $X^{*}$ represents the selected subset of features, $k$ is the number of desired features, $L$ is the loss function, and $f$ is the predictive model. By minimizing the loss function, RFE identifies the feature subset that contributes most significantly to the model’s predictive accuracy.
This approach ensures that the selected features not only retain high predictive power but also maintain the interpretability and relevance of the model. Algorithm 3 details the RFE process.
Figure 6 and Figure 7 illustrate the feature importances and the selected features in the NF-ToN-IoT-V2 dataset, respectively, while Figure 8 and Figure 9 present the corresponding results for the Edge-IIoTset dataset.
Algorithm 3: Wrapper feature selection based on recursive feature elimination (RFE).
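The elimination loop described above can be sketched as follows. To keep the example self-contained, the importance function is a hypothetical stand-in that scores each remaining feature by its absolute Pearson correlation with the target; in the setting described in this section, that function would instead fit the chosen estimator on the remaining features and read off its feature importances. All names and the toy data are assumptions.

```python
import math

def abs_correlation(col, target):
    """Absolute Pearson correlation between one feature column and the target."""
    n = len(col)
    mean_x, mean_y = sum(col) / n, sum(target) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(col, target))
    var_x = sum((a - mean_x) ** 2 for a in col)
    var_y = sum((b - mean_y) ** 2 for b in target)
    if var_x == 0 or var_y == 0:
        return 0.0
    return abs(cov / math.sqrt(var_x * var_y))

def correlation_importance(columns, target):
    """Stand-in for 'fit a model on these features and return their importances'."""
    return {name: abs_correlation(col, target) for name, col in columns.items()}

def recursive_feature_elimination(columns, target, k, importance_fn):
    """Drop the least important feature repeatedly until k features remain."""
    remaining = dict(columns)
    while len(remaining) > k:
        importances = importance_fn(remaining, target)  # re-evaluated every round
        weakest = min(importances, key=importances.get)
        del remaining[weakest]
    return list(remaining)

target = [0, 0, 1, 1]
columns = {"a": [0, 0, 1, 1], "b": [0, 1, 0, 1], "c": [0, 0, 0, 1]}
kept = recursive_feature_elimination(columns, target, k=2, importance_fn=correlation_importance)
print(kept)
```

Passing the importance function as a parameter keeps the loop independent of the estimator, so a model-based scorer can be substituted without changing the elimination logic.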
3.5. Classification Methods
To address the challenge of XSS attack detection, we adopted neural network architectures ranging from the simple Narrow Neural Network to the more complex Trilayered Neural Network. The models were built in MATLAB, which allowed an in-depth comparative analysis of architectures that vary in the number of layers and neurons, so as to capture and model the dataset’s intricate dynamics accurately.
3.5.1. Narrow Neural Network
The Narrow Neural Network (NNN), with its single hidden layer, exemplifies model efficiency and swift training capabilities at the foundational level, making it particularly suitable for less complex predictive tasks. This algorithm initializes with random weights and biases, progressing through cycles of forward propagation, where data transformations through linear and non-linear operations culminate in output predictions. Backpropagation follows, adjusting weights and biases to minimize error. The forward computation is expressed in Equation (3); Algorithm 4 illustrates the procedures of this model as follows [38]:
$$\hat{y} = \sigma\big(W_2\,\sigma(W_1 x + b_1) + b_2\big) \tag{3}$$
where $\sigma$ denotes the activation function; $W_1$ and $W_2$ the weight matrices; and $b_1$ and $b_2$ the bias vectors, illuminating the path from inputs $x$ to the network’s output $\hat{y}$.
Algorithm 4: Training procedure for Narrow Neural Network (NNN).
3.5.2. Bilayered Neural Network
Increasing in complexity, the Bilayered Neural Network (BLNN) integrates an additional hidden layer, enabling the model to capture more nuanced patterns. This architecture’s ability to abstract complex relationships makes it apt for tasks with evolving data patterns, such as image and speech recognition. The BLNN extends the operational framework of the NNN, incorporating an extra layer into both the forward and backward propagation phases [39], thereby enhancing the model’s depth and capability, as is apparent in Equation (4); Algorithm 5 presents the details of this model as follows:
$$\hat{y} = \sigma\big(W_3\,\sigma(W_2\,\sigma(W_1 x + b_1) + b_2) + b_3\big) \tag{4}$$
Here, $W_3$ and $b_3$ extend the model to accommodate another layer of computation, enriching the network’s capacity to process and learn from the input data.
Algorithm 5: Training procedure for Bilayered Neural Network (BLNN).
3.5.3. Trilayered Neural Network
The Trilayered Neural Network (TLNN), with its three hidden layers, represents the zenith of complexity in our exploration. This architecture’s deep structure is adept at modeling high-level abstractions, making it ideal for tackling the most intricate tasks in ANN, including natural language processing and advanced time series forecasting. The TLNN algorithm meticulously orchestrates forward and backward propagations across three layers, refining the network’s parameters for optimal performance [40]. The mathematical representation captures this complexity in Equation (5); Algorithm 6 illustrates the procedures of this model as follows:
$$\hat{y} = \sigma\big(W_4\,\sigma(W_3\,\sigma(W_2\,\sigma(W_1 x + b_1) + b_2) + b_3) + b_4\big) \tag{5}$$
where $W_4$ and $b_4$ are added to accommodate the third layer, highlighting the intricate computations that enable the TLNN to perform its sophisticated analyses.
Algorithm 6: Training procedure for Trilayered Neural Network (TLNN).
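The three architectures differ only in how many hidden layers the input passes through, which a single forward-propagation routine can show. The sketch below is a standard-library illustration under stated assumptions: the sigmoid activation, the layer widths, and the all-zero placeholder weights are chosen only to keep the example small and deterministic, and they do not reflect the trained MATLAB models.

```python
import math

def sigmoid(z):
    """Logistic activation (an assumption; other activations fit the same scheme)."""
    return 1.0 / (1.0 + math.exp(-z))

def dense(inputs, weights, biases, activation):
    """One fully connected layer: activation(W * inputs + b)."""
    return [
        activation(sum(w * x for w, x in zip(row, inputs)) + b)
        for row, b in zip(weights, biases)
    ]

def forward(x, layers, activation=sigmoid):
    """Apply each (weights, biases) pair in turn, from input to output."""
    a = x
    for weights, biases in layers:
        a = dense(a, weights, biases, activation)
    return a

def zero_layer(n_out, n_in):
    """Placeholder layer with all-zero parameters, for demonstration only."""
    return [[0.0] * n_in for _ in range(n_out)], [0.0] * n_out

x = [0.2, 0.7]
nnn = [zero_layer(3, 2), zero_layer(1, 3)]                                     # 1 hidden layer
blnn = [zero_layer(3, 2), zero_layer(3, 3), zero_layer(1, 3)]                   # 2 hidden layers
tlnn = [zero_layer(3, 2), zero_layer(3, 3), zero_layer(3, 3), zero_layer(1, 3)]  # 3 hidden layers

for name, net in (("NNN", nnn), ("BLNN", blnn), ("TLNN", tlnn)):
    print(name, forward(x, net))
```

With zero parameters every unit outputs sigmoid(0) = 0.5; training replaces these placeholders with learned weights and biases through backpropagation.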
3.6. Evaluation Metrics
The original set was divided into two parts: 80% for training and the remainder for testing. Performance measures were derived from the confusion matrix, which counts true positives (TP), false positives (FP), false negatives (FN), and true negatives (TN). FP corresponds to false alarms, and FN corresponds to misses. The evaluation process uses accuracy, precision, recall, and F1-score measures.
Accuracy: Accuracy measures the overall correctness of model predictions by comparing correct predictions with the total predictions. While useful, it may not be the best indicator for imbalanced datasets, where it can misleadingly appear high if the model predominantly predicts the majority class accurately but fails with the minority class [41].
Precision: Precision is a metric that evaluates the accuracy of a model’s positive predictions. It is calculated by dividing the number of true positives by the total predicted positives, which include both true and false positives. This measure is crucial in contexts where avoiding false positives is important. It helps assess how well a model identifies only relevant instances as positive [42].
Recall (Sensitivity or True Positive Rate): Recall is the proportion of actual positive instances that the model predicts as positive. In other words, it shows how many of the positive instances the model identified without missing them. It is calculated as the ratio of true positives to the sum of true positives and false negatives [43].
F1-score: The F1-score is the harmonic mean of precision and recall. It provides a single balanced measure that considers both and is mainly used to find an optimal balance between them. It is calculated as follows [44]:
$$F1 = 2 \times \frac{\text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}}$$
These metrics are crucial for evaluating the performance of a classification model comprehensively. Accuracy provides a general indication of the model’s effectiveness. Meanwhile, precision, recall, and the F1-score offer detailed insights into the model’s ability to manage false positives and false negatives and the balance between precision and recall. The choice of which metric(s) to use depends on the specific problem being addressed and the desired outcomes of the model evaluation.
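The four metrics follow directly from the confusion-matrix counts, as the short sketch below shows. The function name and the counts in the example are made up for illustration and are not results from this study.

```python
def classification_metrics(tp, fp, fn, tn):
    """Compute accuracy, precision, recall, and F1-score from confusion-matrix counts."""
    total = tp + fp + fn + tn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if (precision + recall) else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

# Illustrative counts only: 90 attacks detected, 10 false alarms, 30 misses, 870 true negatives.
metrics = classification_metrics(tp=90, fp=10, fn=30, tn=870)
print(metrics)
```

In this made-up case accuracy is 0.96 even though a quarter of the attacks are missed (recall 0.75), which illustrates why accuracy alone can mislead on imbalanced data.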
3.7. ANOVA Test for Performance Variability Analysis
To analyze the variability in neural network performance due to different initial conditions, we employed the Analysis of Variance (ANOVA) test. ANOVA is a statistical method used to determine whether there are any statistically significant differences between the means of two or more independent groups [45]. In this study, ANOVA was applied to compare neural network performance metrics (accuracy) across multiple trials with varying initial conditions.
The ANOVA test partitions the total variability in the data into components attributable to different sources of variation. In a one-way ANOVA, the primary components are the variability between groups and the variability within groups. The between-groups variability measures the variation due to differences between the groups, whereas the within-groups variability measures the variation within each group.
The total sum of squares ($SS_{total}$) is the sum of the between-groups sum of squares ($SS_{between}$) and the within-groups sum of squares ($SS_{within}$). The sum of squares for between groups ($SS_{between}$) is calculated as follows:
$$SS_{between} = \sum_{i=1}^{k} n_i \left(\bar{X}_i - \bar{X}\right)^2$$
where $k$ is the number of groups, $n_i$ is the number of observations in group $i$, $\bar{X}_i$ is the mean of group $i$, and $\bar{X}$ is the overall mean of all observations.
The sum of squares for within groups ($SS_{within}$) is calculated as follows:
$$SS_{within} = \sum_{i=1}^{k} \sum_{j=1}^{n_i} \left(X_{ij} - \bar{X}_i\right)^2$$
where $X_{ij}$ is the $j$-th observation in group $i$.
The mean squares for between groups ($MS_{between}$) and within groups ($MS_{within}$) are obtained by dividing the corresponding sum of squares by their degrees of freedom (df). The mean square for between groups ($MS_{between}$) is calculated as follows:
$$MS_{between} = \frac{SS_{between}}{k - 1}$$
and the mean square for within groups ($MS_{within}$) is calculated as follows:
$$MS_{within} = \frac{SS_{within}}{N - k}$$
where $N$ is the total number of observations across all groups, and $k$ is the number of groups.
The F-statistic is then calculated as the ratio of the between-groups mean square to the within-groups mean square, as follows:
$$F = \frac{MS_{between}}{MS_{within}}$$
This F-statistic follows an F-distribution with $k - 1$ and $N - k$ degrees of freedom. The p-value associated with the F-statistic is used to determine whether the observed differences between group means are statistically significant.
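The one-way ANOVA computation described above reduces to a few sums, so it can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' Algorithm 7: the function name `one_way_anova` and the toy groups are hypothetical, and the sketch stops at the F-statistic rather than computing the p-value.

```python
from statistics import fmean


def one_way_anova(groups):
    """Return (F, df_between, df_within) for a list of groups of observations."""
    k = len(groups)                          # number of groups
    n_total = sum(len(g) for g in groups)    # N, total number of observations
    if k < 2 or any(len(g) == 0 for g in groups) or n_total <= k:
        raise ValueError("need at least two non-empty groups and N > k")

    grand_mean = fmean(x for g in groups for x in g)
    group_means = [fmean(g) for g in groups]

    # Between-groups sum of squares: n_i * (group mean - overall mean)^2
    ss_between = sum(len(g) * (m - grand_mean) ** 2
                     for g, m in zip(groups, group_means))
    # Within-groups sum of squares: (observation - its group mean)^2
    ss_within = sum((x - m) ** 2
                    for g, m in zip(groups, group_means) for x in g)

    df_between, df_within = k - 1, n_total - k
    ms_between = ss_between / df_between
    ms_within = ss_within / df_within
    if ms_within == 0:
        raise ValueError("within-group variance is zero; F is undefined")
    return ms_between / ms_within, df_between, df_within


# Toy values chosen so the arithmetic is easy to follow (not study data):
# SS_between = 26, SS_within = 6, MS_between = 13, MS_within = 1, so F = 13.
toy_groups = [[1, 2, 3], [2, 3, 4], [5, 6, 7]]
f_stat, df_b, df_w = one_way_anova(toy_groups)
print(f"F = {f_stat:.2f} with ({df_b}, {df_w}) degrees of freedom")
```

In practice each group would hold the accuracies obtained from repeated trials under one initial condition. If SciPy is available, `scipy.stats.f_oneway(*groups)` returns the same statistic together with its p-value.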
We applied this ANOVA test to analyze the variability in neural network performance so that the evaluation accounts for differences due to initial conditions. Algorithm 7 outlines the procedure, which provides a statistically grounded comparison of the performance metrics. The results of the ANOVA test are discussed in Section 4.
Algorithm 7: ANOVA test for neural network performance.
4. Results
The performance metrics presented in Table 5 provide a comprehensive evaluation of the neural network architectures during the training and testing stages across four datasets: the NF-ToN-IoT-V2 Filtered Dataset, the NF-ToN-IoT-V2 Wrapper Dataset, the Edge-IIoTset Filtered Dataset, and the Edge-IIoTset Wrapper Dataset.
For the NF-ToN-IoT-V2 Filtered Dataset, the Bilayered Neural Network architectures (2 × 10 and 2 × 25) exhibit superior performance, achieving remarkably high metrics across both training and testing stages. Specifically, the Bilayered Neural Network 2 × 10 achieves a training accuracy of 99.93% and a testing accuracy of 99.84%, with precision values of 99.90% (train) and 99.78% (test), recall values of 99.92% (train) and 99.82% (test), and F1-scores of 99.91% (train) and 99.80% (test). The Bilayered Neural Network 2 × 25 closely follows with a training accuracy of 99.91% and a testing accuracy of 99.88%, precision values of 99.88% (train) and 99.84% (test), recall values of 99.89% (train) and 99.86% (test), and F1-scores of 99.89% (train) and 99.85% (test). These results highlight their robust capability in generalizing from training data to accurately detect XSS attacks with minimal false positives and false negatives.
In comparison, the Medium Neural Network 1 × 25 demonstrates strong performance, particularly in the testing stage, with an accuracy of 98.99%, a precision value of 98.68%, a recall value of 98.88%, and an F1-score of 98.78%. This surpasses the Wide Neural Network 1 × 100, which records a slightly lower testing accuracy of 98.79%, a precision value of 98.43%, a recall value of 98.66%, and an F1-score of 98.54%. This indicates the effectiveness of the Medium Neural Network in balancing model complexity and generalization ability, making it preferable to the Wide Neural Network for this dataset. Conversely, the Narrow Neural Network 1 × 10 shows lower performance, with a training accuracy of 98.77%, a testing accuracy of 98.74%, precision values of 98.40% (train) and 98.36% (test), recall values of 98.64% (train) and 98.60% (test), and F1-scores of 98.52% (train) and 98.48% (test). These results highlight the challenges that the simplest architectures face in accurately detecting XSS attacks and underline the value of a modest increase in model complexity.
For the NF-ToN-IoT-V2 Wrapper Dataset, the Trilayered Neural Network configurations (3 × 10 and 3 × 25) show consistently robust performance. The 3 × 10 network achieves a training accuracy of 99.76% and a testing accuracy of 99.76%, precision values of 99.68% (train) and 99.68% (test), recall values of 99.73% (train) and 99.73% (test), and F1-scores of 99.71% (train) and 99.71% (test). The 3 × 25 network follows closely with a training accuracy of 99.72% and a testing accuracy of 99.71%, precision values of 99.63% (train) and 99.61% (test), recall values of 99.68% (train) and 99.67% (test), and F1-scores of 99.66% (train) and 99.64% (test). The Bilayered Neural Network 2 × 25 also performs exceptionally well on this dataset and records slightly higher precision (99.84% train, 99.78% test) and recall (99.86% train, 99.81% test) than either trilayered configuration.
In the case of the Edge-IIoTset dataset, the Trilayered Neural Network 3 × 10 emerges as the top performer with a test stage accuracy of 99.79%, showcasing its outstanding generalization capabilities. Specifically, it records a training accuracy of 99.85%, precision values of 99.80% (train) and 99.72% (test), recall values of 99.83% (train) and 99.76% (test), and F1-scores of 99.81% (train) and 99.74% (test). The Bilayered Neural Network 2 × 25 also achieves high performance, particularly in the filtered dataset variant, recording a training accuracy of 99.32% and a test stage accuracy of 99.00%, precision values of 99.11% (train) and 98.84% (test), recall values of 99.24% (train) and 99.01% (test), and F1-scores of 99.17% (train) and 98.92% (test). These results indicate that while both configurations are highly effective, the Trilayered Neural Network 3 × 10 holds a slight edge in terms of overall performance.
Across all datasets, the Bilayered Neural Network architectures consistently achieve high accuracy and robustness with lower complexity than the trilayered networks. This is particularly evident in the NF-ToN-IoT-V2 Filtered Dataset, NF-ToN-IoT-V2 Wrapper Dataset, and Edge-IIoTset Wrapper Dataset, where the Bilayered Neural Network 2 × 10 and 2 × 25 configurations demonstrate high accuracy, precision, recall, and F1-scores, underscoring their effectiveness and efficiency. These findings suggest that the bilayered architectures provide a balanced and highly effective solution for accurately detecting XSS attacks in IoT systems over 5G networks, achieving high performance with less complexity.
The confusion matrices presented in Figure 10 provide a comprehensive view of the performance of the neural network configurations during both the training and testing stages. Starting with the Narrow Neural Network 1 × 10 (a, b), the model performs well on the majority class (Benign), with 26,152 true positives and 10,813 true negatives during training, alongside 276 false positives and 184 false negatives. In the test stage, it shows 11,204 true positives and 4633 true negatives, with 121 false positives and 81 false negatives. The Medium Neural Network 1 × 25 (c, d) reduces both error types relative to the narrow network, with 26,197 true positives and 10,831 true negatives during training and 238 false positives and 159 false negatives. In the test stage, it achieves 11,233 true positives and 4644 true negatives, with 97 false positives and 65 false negatives. This indicates that a larger number of neurons in the single hidden layer improves the model’s ability to identify XSS attacks. The Wide Neural Network 1 × 100 (e, f) does not extend this gain: it records 26,163 true positives and 10,817 true negatives during training, with 267 false positives and 178 false negatives, and 11,210 true positives and 4635 true negatives in the test stage, with 116 false positives and 78 false negatives. Its error counts are lower than those of the narrow network but higher than those of the medium network. The Medium Neural Network 1 × 25 therefore performs best among the single-layer models, highlighting its ability to generalize from training to testing data and to balance the trade-off between sensitivity and specificity.
In the case of multi-layer networks, the Bilayered Neural Network achieved the highest performance among all models. The 2 × 10 configuration (g, h) demonstrates exceptional accuracy during both the training and test stages, with 26,459 true positives and 10,940 true negatives during training and 16 false positives and 10 false negatives. In the test stage, it achieves 11,329 true positives and 4684 true negatives, with 16 false positives and 10 false negatives, keeping both error types low when detecting XSS attacks in IoT environments. The more complex 2 × 25 configuration (i, j) shows a slight reduction in training performance, with 26,454 true positives and 10,937 true negatives and very low errors of 20 false positives and 14 false negatives. In the test stage, it maintains this strong performance with 11,334 true positives and 4686 true negatives and minimal errors of 11 false positives and 8 false negatives, indicating excellent generalization and robust detection capability. The Trilayered Neural Networks, while still highly effective, show slightly higher error rates than their bilayered counterparts. The 3 × 25 configuration (m, n) records 26,239 true positives and 10,849 true negatives during training, with 202 false positives and 135 false negatives, and in the test stage, it achieves 11,233 true positives and 4644 true negatives, with 97 false positives and 65 false negatives. The 3 × 10 configuration (k, l) shows 26,247 true positives and 10,852 true negatives during training, with 196 false positives and 130 false negatives, and in the test stage, it records 11,246 true positives and 4649 true negatives, with 86 false positives and 58 false negatives.
Across all stages and configurations, increasing the network width and depth does not necessarily improve detection performance. This is evident in the experimental results for the NF-ToN-IoT-V2 Filtered Dataset, where the bilayered model with a simple 2 × 10 configuration obtained better results than the deeper trilayered networks. Therefore, model complexity must be taken into consideration when adopting attack detection models.
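As a consistency check on the counts quoted for Figure 10, accuracy can be recomputed directly from the test-stage confusion matrices. The sketch below does this for five of the configurations. The counts are copied from the text above, while the dictionary and function names are illustrative choices.

```python
# Test-stage counts (TP, TN, FP, FN) as quoted in the text for Figure 10.
FIG10_TEST_COUNTS = {
    "Narrow 1x10": (11204, 4633, 121, 81),
    "Medium 1x25": (11233, 4644, 97, 65),
    "Wide 1x100": (11210, 4635, 116, 78),
    "Bilayered 2x10": (11329, 4684, 16, 10),
    "Bilayered 2x25": (11334, 4686, 11, 8),
}


def accuracy_percent(tp, tn, fp, fn):
    """Accuracy as a percentage: correct predictions over all predictions."""
    return 100.0 * (tp + tn) / (tp + tn + fp + fn)


accuracies = {name: round(accuracy_percent(*counts), 2)
              for name, counts in FIG10_TEST_COUNTS.items()}

for name, acc in accuracies.items():
    print(f"{name}: {acc:.2f}%")
```

Rounded to two decimals, the values are 98.74, 98.99, 98.79, 99.84, and 99.88, in line with the testing accuracies reported for the NF-ToN-IoT-V2 Filtered Dataset. Every matrix sums to the same 16,039 test samples.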
The confusion matrices in Figure 11 show that the Narrow Neural Network 1 × 10 (a, b) achieves 25,808 true positives and 10,670 true negatives during training, but has 568 false positives and 379 false negatives; in testing, it has 11,132 true positives, 4602 true negatives, 183 false positives, and 122 false negatives. The Medium Neural Network 1 × 25 (c, d) improves with 26,133 true positives and 10,805 true negatives during training and 11,180 true positives and 4623 true negatives in testing, with fewer false positives and negatives than the narrow network. The Wide Neural Network 1 × 100 (e, f) shows 26,152 true positives and 10,813 true negatives during training and 11,208 true positives and 4634 true negatives in testing, which are the highest counts among the single-layer models on this dataset. For multi-layer networks, the Bilayered Neural Network 2 × 10 (g, h) achieves 26,335 true positives and 10,888 true negatives in training and 11,278 true positives and 4663 true negatives in testing, with low error rates. The 2 × 25 configuration (i, j) presents the best performance with 26,446 true positives and 10,934 true negatives in training and 11,328 true positives and 4684 true negatives in testing. The Trilayered Neural Networks show slightly higher error rates than this configuration: the 3 × 25 configuration (k, l) has 26,403 true positives and 10,917 true negatives in training and 11,314 true positives and 4678 true negatives in testing, while the 3 × 10 configuration (m, n) achieves 26,414 true positives and 10,921 true negatives in training and 11,321 true positives and 4680 true negatives in testing. Overall, the Bilayered Neural Network 2 × 25 achieves the best results in detecting XSS attacks on this dataset while remaining simpler than the trilayered configurations.
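The text for Figure 11 lists false positives and false negatives only for the narrow network. Assuming every configuration was evaluated on the same test split (an assumption, although the narrow network's four test counts do sum to 16,039), the number of misclassified test samples for the other models can be recovered by subtraction. The names used in this sketch are illustrative.

```python
# Size of the test split, taken from the narrow network's four test counts.
TEST_TOTAL = 11132 + 4602 + 183 + 122  # = 16039

# Test-stage (TP, TN) pairs as quoted in the text for Figure 11.
FIG11_TEST_TP_TN = {
    "Medium 1x25": (11180, 4623),
    "Wide 1x100": (11208, 4634),
    "Bilayered 2x10": (11278, 4663),
    "Bilayered 2x25": (11328, 4684),
    "Trilayered 3x25": (11314, 4678),
    "Trilayered 3x10": (11321, 4680),
}

# Assumption: all models share the same test split, so FP + FN = total - TP - TN.
misclassified = {name: TEST_TOTAL - tp - tn
                 for name, (tp, tn) in FIG11_TEST_TP_TN.items()}
test_accuracy = {name: round(100.0 * (tp + tn) / TEST_TOTAL, 2)
                 for name, (tp, tn) in FIG11_TEST_TP_TN.items()}

for name in FIG11_TEST_TP_TN:
    print(f"{name}: {misclassified[name]} errors, {test_accuracy[name]:.2f}% accuracy")
```

Under this assumption the trilayered 3 × 10 and 3 × 25 networks come out at 99.76% and 99.71% testing accuracy, which agrees with the values reported for the NF-ToN-IoT-V2 Wrapper Dataset.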
The confusion matrices presented in Figure 12 provide insights into the performance of the neural network configurations on the Edge-IIoTset Filtered Dataset. The Narrow Neural Network 1 × 10 (a, b) shows relatively higher error rates, with 16,442 true positives and 6798 true negatives during training and 7043 true positives and 2912 true negatives during testing. The Medium Neural Network 1 × 25 (c, d) improves upon this, recording 16,578 true positives and 6854 true negatives during training and 7105 true positives and 2938 true negatives during testing. The Wide Neural Network 1 × 100 (e, f) improves further in training, with 16,666 true positives and 6891 true negatives, while its test-stage counts of 7099 true positives and 2935 true negatives are marginally below those of the medium network. The Bilayered Neural Network 2 × 10 (g, h) demonstrates exceptional performance, achieving 16,699 true positives and 6904 true negatives during training and 7117 true positives and 2943 true negatives during testing. The 2 × 25 configuration (i, j) shows even better results, with 16,898 true positives and 6986 true negatives during training and 7226 true positives and 2988 true negatives during testing. The Trilayered Neural Networks 3 × 10 (k, l) and 3 × 25 (m, n) perform well but show slightly higher error rates than their bilayered counterparts. Overall, the Bilayered Neural Networks, particularly the 2 × 25 configuration, achieve the highest accuracy with lower complexity, emphasizing the importance of considering model complexity in neural network design for security applications.
Finally, for the Edge-IIoTset Wrapper Dataset, Figure 13 shows the confusion matrices during the training stage. The Narrow Neural Network (1 × 10) (a) achieved 16,787 true positives and 6941 true negatives, while the Wide Neural Network (1 × 100) (b) slightly outperformed it with 16,844 true positives and 6964 true negatives. The Medium Neural Network (1 × 25) (c) had similar results to the wide network. In the test stage, the Wide Neural Network (d) again showed strong performance with 7214 true positives and 2982 true negatives. The Bilayered Neural Network with a 2 × 10 configuration achieved 16,959 true positives and 7012 true negatives during training (e) and 7270 true positives and 3006 true negatives in the test stage (f), indicating excellent detection capabilities. The 2 × 25 configuration (g) also performed well with 16,927 true positives and 6998 true negatives during training and 7257 true positives and 3001 true negatives during testing (h). The Trilayered Neural Network configurations, both 3 × 10 (i) and 3 × 25 (j), demonstrated robust performance, with the 3 × 10 configuration achieving 16,934 true positives and 7001 true negatives in training and 7255 true positives and 2999 true negatives in testing (k). The 3 × 25 configuration achieved 16,901 true positives and 6988 true negatives in training and 7240 true positives and 2994 true negatives in testing (l). The results indicate that performance remains high as network complexity increases, with the bilayered and trilayered networks showing particularly strong results in both the training and testing phases.