VGGIncepNet: Enhancing Network Intrusion Detection and Network Security through Non-Image-to-Image Conversion and Deep Learning

Chen, Jialong; Xiao, Jingjing; Xu, Jiaxin

doi:10.3390/electronics13183639


by Jialong Chen, Jingjing Xiao * and Jiaxin Xu

School of Computer and Information Engineering, Xiamen University of Technology, Xiamen 361024, China

* Author to whom correspondence should be addressed.

Electronics 2024, 13(18), 3639; https://doi.org/10.3390/electronics13183639

Submission received: 29 August 2024 / Revised: 9 September 2024 / Accepted: 10 September 2024 / Published: 12 September 2024


Abstract:

This paper presents an innovative model, VGGIncepNet, which integrates non-image-to-image conversion techniques with deep learning modules, specifically VGG16 and Inception, aiming to enhance performance in network intrusion detection and IoT security analysis. By converting non-image data into image data, the model leverages the powerful feature extraction capabilities of convolutional neural networks, thereby improving the multi-class classification of network attacks. We conducted extensive experiments on the NSL-KDD and CICIoT2023 datasets, and the results demonstrate that VGGIncepNet outperforms existing models, including BERT, DistilBERT, XLNet, and T5, across evaluation metrics such as accuracy, precision, recall, and F1-Score. VGGIncepNet exhibits outstanding classification performance, particularly excelling in precision and F1-Score. The experimental results validate VGGIncepNet’s adaptability and robustness in complex network environments, providing an effective solution for the real-time detection of malicious activities in network systems. This study offers new methods and tools for network security and IoT security analysis, with broad application prospects.

Keywords:

VGGIncepNet; non-image-to-image conversion; convolutional neural network; network intrusion detection; network security; deep learning

1. Introduction

Network security has become one of the most pressing challenges in today’s society, with far-reaching implications for the security of individuals, businesses, and nations. As technology evolves, network attacks are becoming more frequent and sophisticated. In recent years, deep learning algorithms have been extensively studied to enhance the detection capabilities for malicious activities [1].

Network security threats are increasingly diverse and severe, ranging from traditional viruses and malware to more advanced phishing attacks, ransomware, and zero-day vulnerabilities [2]. These threats not only include traditional DoS attacks and unauthorized access but also newer methods like data breaches and identity theft [3]. With the growing complexity of network environments and the continuous evolution of attack vectors, traditional cybersecurity measures face unprecedented challenges.

Traditional rule-based and signature-matching intrusion detection systems (IDS) are increasingly inadequate in responding to unknown attacks [4]. Recently proposed alternatives have drawbacks of their own: methods that combine horizontal federated learning (HFL), the Hyperledger blockchain, and EfficientNet have limitations; privacy-preserving threat detection models based on federated learning (FL) and quantum computing are difficult to interpret; and applying machine learning (ML) to network security remains relatively complex, demanding a level of technical expertise that makes practical deployment inconvenient.

In response to these challenges, various methods have been proposed, such as a deep neural network deployed on public cloud networks for intrusion detection in online music education [5]. However, although these methods achieve high accuracy, they also increase model complexity. In this context, a novel deep neural network architecture, VGGIN-Net, is proposed, which integrates the VGG16 pre-trained model with a naïve Inception module. This architecture also includes batch normalization, flattening, dropout, and dense layers.

Specifically, the VGG16 layers are stacked from the preprocessing layer to the block4 pool layer, using 64 × 64 × 1 images as the input. Bottleneck features obtained from the VGG16 network are then connected to a naïve Inception block, consisting of convolutional layers with filter sizes of 3 × 3 and 5 × 5; a ReLU activation function; and a max pooling layer. Finally, a dense layer with a softmax activation function is used for classification. This is combined with converting the dataset into a 2D image and applying the t-SNE dimensionality reduction algorithm.

The t-SNE algorithm maps the entire dataset into 2D space, determines the smallest rectangle containing the transformed data, rotates and crops it, and then normalizes it to a predefined size. The 40 features in the dataset are then converted into a 64 × 64 × 1 image. This approach, by designing a method to convert non-image datasets into 2D images suitable for deep learning models, improves dataset classification performance and enables accurate intrusion detection [6].

Deep learning models can automatically learn and extract features, opening new possibilities for detecting network security threats. These models can abstract and represent high-level features from large-scale data, reducing reliance on manually defined rules and features. This enables deep learning models to respond better to complex and unknown network attacks, improving the accuracy and efficiency of security threat detection.

This study proposes a new method of data processing and image classification. The t-SNE algorithm is first used to reduce the dimensionality of high-dimensional data and convert it into 2D data. The improved VGG16 model is then used to classify the generated 2D images, achieving effective classification and analysis of abnormal data.

2. Related Work

2.1. Transformer-Based Network Intrusion Detection Method

Recently, network intrusion detection methods based on Transformer models have attracted attention because they handle long-range dependencies efficiently and recognize complex patterns. The Transformer's self-attention mechanism is used to analyze the relationships between input features and the various types of intrusion, improving detection accuracy. For example, Long et al. [7] proposed a Transformer-based network intrusion detection algorithm for cloud environments whose accuracy is comparable to that of the CNN–LSTM model, demonstrating its effectiveness and feasibility in enhancing cloud security. Sana Ullah Jan et al. [8] proposed a Transformer neural network-based intrusion detection system (TNN-IDS) designed specifically for Internet of Things (IoT) networks that support the MQTT protocol, with the aim of improving the accuracy of malicious activity detection in these networks. By leveraging the parallel processing power of Transformer neural networks, TNN-IDS speeds up learning and improves the detection of malicious attacks. The experimental results show that TNN-IDS outperforms other machine learning and deep learning-based methods in detecting malicious activities, achieving an accuracy of up to 99.9%. This demonstrates the advantages of TNN-IDS in dealing with unbalanced training data and complex input–output relationships when identifying security threats in IoT networks. Arash Mahyari et al. [9] proposed a Transformer-based Deep Packet Inspection (DPI) algorithm to detect and classify malicious traffic. This method uses the Transformer's self-attention mechanism to learn complex content and patterns in network packet payloads; it takes raw packet payload bytes as input and is deployed as a man-in-the-middle. The experimental results show that the Transformer-based model can distinguish between malicious and benign traffic on the test sets of the UNSW-NB15 and CIC-IOT23 datasets, with average accuracies of 79% and 72% in the binary and multi-class classification experiments, respectively, when only payload bytes are used. This suggests that the Transformer model has the potential to improve the accuracy and efficiency of malware detection in network traffic. However, the Transformer model is computationally complex and may require more computational resources, which is a disadvantage in resource-constrained environments.

2.2. Knowledge Graph Combined with Deep Learning Detection

Another approach combines knowledge graphs with deep learning techniques such as CNNs, BiLSTMs, and CRF layers to build large-scale network security threat detection systems. This method uses the knowledge graph to extract the attack characteristics of network security threats and a deep learning model to detect them efficiently. In a study by Hu et al. [10], the proposed FT–CNN–BiLSTM–CRF technology achieved a recall of about 62.39% in detecting malicious data, with an average F1-Score of 0.7482, showing its advantages in handling large-scale network security threat detection. A related line of work bases network security threat detection on intelligent learning methods and systems. Yuan Tao et al. [11] proposed an immune selection algorithm that performs intelligent rule selection and intelligent threshold updates to address complex problems in network security threat detection. Drawing on the functions, principles, and models of the immune system, they construct a knowledge graph of network security threat detection across multiple dimensions, such as suspicious time, security management center, secure computing environment, secure area boundary, and secure communication network. Their research focuses on multi-level abnormal behavior detection based on intelligent learning methods, covering network traffic, domain names, messages, and malicious code. Thresholds are constructed through immune selection, and intelligent detection is realized using knowledge graphs. The same work also proposes an architectural design for a network security threat detection system based on big data and intelligent computing, aiming to improve the performance of threat detection and analysis. The results indicate that this approach can achieve dynamic and intelligent network security threat detection and protect information systems and data more effectively. However, the challenge of this approach lies in the complexity of building and maintaining knowledge graphs and the training cost of deep learning models.

2.3. CNN-LSTM Model Application in DDoS Attack Detection

For DDoS attacks, the hybrid CNN–LSTM model is favored for its ability to effectively recognize abnormal traffic patterns. In the research by Rajan and Aravindhar [12], an online SDN defense system was proposed, combining CNN and LSTM technologies for threat detection and using flow rule commands and IP tracing to identify and mitigate abnormal traffic. This method has demonstrated high accuracy in classification tasks, proving its effectiveness in DDoS attack detection. However, this model requires a large amount of data for training and may face latency issues in real-time detection.

2.4. Application of Sequence Analysis and Machine Learning in NIDS

Nuno Oliveira et al. [13] proposed a sequence-based approach for intelligent network attack detection and classification in network intrusion detection systems. The researchers used three machine learning models, random forest (RF), multilayer perceptron (MLP), and long short-term memory network (LSTM), and evaluated their performance on the CIDDS-001 dataset. A comparison with the traditional approach, which considers only single-flow information, shows that anomaly detection from a sequence perspective handles the problem better. The experimental results show that the LSTM model is very reliable in capturing sequential patterns in network traffic data, with an accuracy of 99.94% and an F1-Score of 91.66%, which supports its effectiveness in anomaly detection from a multi-flow perspective. The researchers also pointed out that considering the short-term and long-term relationships in the traffic sequence can significantly improve anomaly detection performance, which has important implications for detecting dynamic and complex network attacks.

2.5. Convolutional Neural Network (CNN)-Based Network Intrusion Detection System

Leila Mohammadpour et al. [14] first gave a background on deep learning and CNNs, and then discussed in detail the main approaches to the application of CNNs in IDS, including single CNN models, hybrid models of CNNs and recurrent neural networks (RNNs), hybrid models of CNNs with other deep learning methods, and hybrid models of CNNs with other machine learning methods. The paper also compares the main features of these methods, the databases used, the shape of the inputs, the evaluation indicators, the performance, the feature extraction methods, and the classifiers. In addition, the researchers also compared the performance of different CNN schemes on standard datasets through empirical experiments and discussed some key issues of CNNs in IDS, such as hyperparameter setting, overfitting and underfitting processing, and conversion from 1D features to 2D features. Finally, the researchers also pointed out the direction of future research, including optimizing hyperparameters, combining misuse detection and anomaly detection, using comprehensive databases, and dealing with unbalanced data. Through these studies, it is proved that CNN is an effective technique in network intrusion detection, which can improve the accuracy and efficiency of detection.

2.6. Our Research Method

This paper proposes a method that converts non-image data into image data and classifies it with an improved VGG16 model. We map non-image data onto a two-dimensional plane through dimensionality reduction techniques to generate images. These images retain the feature structure of the original data, especially the neighborhood information, so that the model can effectively extract features from the network attack dataset. We adopt a transfer learning strategy in which the first 24 layers of the pre-trained VGG16 model serve as the feature extractor and are combined with a randomly initialized Inception module for feature fusion. Compared with traditional deep learning methods, such as standard VGG16-based networks or networks using only the Inception architecture, our model shows higher efficiency and accuracy in feature representation and classification performance.

3. Methods

In this paper, we propose a method to convert non-image data into an image format, aiming to transform the non-image data from the CICIoT2023 network attack dataset into a form suitable for processing by convolutional neural networks (CNNs). The core of this method lies in utilizing dimensionality reduction techniques to map high-dimensional data into a two-dimensional space, thereby generating visual images that enhance data interpretability and classification performance [15].

Figure 1 shows a simple example of this process. Specifically, each sample is treated as a feature vector, and the feature vector x undergoes a transformation t, after which it is rearranged into a feature matrix M. The positions of the features in the two-dimensional plane (Cartesian coordinate system) are determined based on their similarity. For example, features f₁, f₃, f₄, and f₇ are located closer together, reflecting their higher similarity. Once the position of each feature is determined, the corresponding expression values or feature values are mapped to these positions, generating a unique image. This process converts the d features into an m × n two-dimensional matrix for each feature vector, producing N feature matrices corresponding to N feature vectors. These matrices contain all d features and are further input into the improved VGG16 architecture for training and prediction.

3.1. Overall Dimension Reduction and Feature Arrangement

First, we performed global dimensionality reduction on the features of all samples. Specifically, we applied the t-SNE (t-distributed Stochastic Neighbor Embedding) algorithm, which maps the features originally in high-dimensional space onto a two-dimensional plane. During the dimensionality reduction process, cosine distance was calculated to preserve the relative distances between features, ensuring that the arrangement of features in the two-dimensional space retains the neighborhood structure from the original high-dimensional space [16]. In this way, we determined the specific positions of each feature in the two-dimensional plane, thus forming a global two-dimensional feature map.

The output of this step is a feature location mapping matrix, where each feature is assigned a unique two-dimensional coordinate. In the subsequent steps, these coordinates remain fixed and serve as the basis for generating the images for each sample.
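The step from embedding coordinates to pixel positions can be illustrated with a minimal sketch. It assumes the 2D coordinates have already been produced by t-SNE (the values below are hypothetical stand-ins), and it uses simple per-axis min-max scaling rather than the minimum-bounding-rectangle rotation mentioned in the Introduction; the function name `to_pixel_coords` is illustrative and not taken from the paper.

```python
def to_pixel_coords(points, size=64):
    """Scale 2D embedding points to integer pixel coordinates on a size x size grid."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x_min, x_max = min(xs), max(xs)
    y_min, y_max = min(ys), max(ys)

    def scale(value, lo, hi):
        # Assumption: a degenerate axis (all values equal) maps to pixel 0.
        if hi == lo:
            return 0
        return round((value - lo) / (hi - lo) * (size - 1))

    return [(scale(x, x_min, x_max), scale(y, y_min, y_max)) for x, y in points]


# Hypothetical 2D positions of four features (stand-ins for t-SNE output).
embedding = [(-12.0, 3.5), (-11.5, 3.0), (8.0, -7.0), (20.0, 10.0)]
pixel_coords = to_pixel_coords(embedding, size=64)
print(pixel_coords)
```

In this toy input the first two features lie close together in the embedding and therefore land on neighboring pixels, which is the neighborhood-preserving behavior the method relies on.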

3.2. Image Generation for Each Sample

After determining the global feature positions, we kept these positions fixed for each sample and mapped the feature values of that sample onto the corresponding two-dimensional coordinates. During this mapping process, each feature value was converted into a pixel value in the image, with the intensity of the pixel representing the specific value of that feature for the given sample [17]. For example, if a particular feature was positioned at a specific location in the two-dimensional image after global dimensionality reduction, then, for each sample, the value of that feature would be mapped to the same location, and the pixel intensity would be determined by the size of the feature value.

In this way, we generated a unique two-dimensional image for each sample. While the feature positions remained consistent across all samples, the pixel values varied depending on the differences between samples. This method ensures that the spatial layout of the features within the images remains consistent while allowing the improved VGG16 model to extract meaningful patterns from the data.

3.3. Pixel Overlay Processing

During the image generation process, if multiple features are mapped to the same pixel location, we handle this by averaging the feature values. This approach is used to avoid excessive information overlap that could lead to image distortion. This method ensures that the pixel distribution in the generated images can still effectively represent the features of the original dataset. At the same time, based on the available hardware resources and the number of features, we selected a resolution of 64 × 64 for the images in order to minimize information loss during the image generation process.
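The per-sample rendering with averaging of overlapping features can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the pixel coordinates are hypothetical, the grid is shrunk to 4 × 4 for readability, and empty pixels are filled with 0.0, a background value the paper does not specify.

```python
def sample_to_image(values, pixel_coords, size=64):
    """Render one sample as a size x size grayscale image (list of rows).

    values[i] is the normalized value of feature i and pixel_coords[i] is its
    fixed (x, y) position. Features sharing a pixel are averaged.
    """
    sums = [[0.0] * size for _ in range(size)]
    counts = [[0] * size for _ in range(size)]
    for value, (x, y) in zip(values, pixel_coords):
        sums[y][x] += value
        counts[y][x] += 1
    # Assumption: pixels that receive no feature are set to 0.0.
    return [
        [sums[r][c] / counts[r][c] if counts[r][c] else 0.0 for c in range(size)]
        for r in range(size)
    ]


# Hypothetical layout: features 0 and 1 collide at pixel (0, 0).
pixel_coords = [(0, 0), (0, 0), (2, 1), (3, 3)]
sample = [0.2, 0.6, 1.0, 0.5]
image = sample_to_image(sample, pixel_coords, size=4)
for row in image:
    print(row)
```

Because `pixel_coords` is shared by all samples, calling `sample_to_image` on different samples changes only the pixel intensities, never the layout.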

3.4. VGGIncepNet

As shown in Figure 2, we designed an innovative deep neural network architecture that skillfully integrates the classic structure of VGG16 with the core of the Inception architecture—the Inception block. By serially combining the deep architecture of VGG16 with a custom simplified (naïve) version of the Inception module, we created a novel network model called VGGIncepNet that harnesses the advantages of both architectures [18].

We implemented a transfer learning strategy, focusing on leveraging the layers of the pre-trained VGG16 model up to the block4 pooling layer as the foundation for feature extraction [19]. These layers are frozen during training to retain the rich feature representations learned from large amounts of data. Subsequently, we seamlessly connect these layers to randomly initialized, simplified Inception modules, which are responsible for further mining and merging features to adapt to the specific needs of the network attack dataset.

To enhance the model’s adaptability and performance, we added several newly designed higher-level structures after the Inception modules, including dense layers, batch normalization layers, flattened layers, and dropout layers, all of which were randomly initialized at the start of training. These additional layers not only increased the complexity of the model but also helped to reduce overfitting and improve generalization.

Ultimately, as illustrated in Figure 2, we carefully integrated these different layers into a complete and powerful deep neural network architecture. This new network was then trained specifically in the network attack dataset to optimize its ability to recognize and classify security threats.

As shown in Figure 3, we started with the preprocessing layers of VGG16, stacking layers one by one until reaching the block4 pooling layer, thereby constructing a 24-layer feature extraction backbone. The rationale behind this choice lies in the fact that the layers before the block4 pooling layer have already learned highly relevant and effective bottleneck features through pre-training, which are crucial for subsequent recognition tasks. Although the features beyond the block4 pooling layer might contain additional information, the experimental results indicate that including them does not significantly enhance performance and instead introduces unnecessary computational complexity.
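To make the size of the bottleneck features concrete, the following sketch traces the feature map shape of a 64 × 64 input through the first four VGG16 blocks. It assumes the standard VGG16 configuration (3 × 3 convolutions with stride 1 and "same" padding, each block followed by 2 × 2 max pooling with stride 2); the paper's improved VGG16 may differ in details, and the helper name is illustrative.

```python
# (number of 3x3 conv layers, output channels) for standard VGG16 blocks 1-4.
VGG16_BLOCKS = [(2, 64), (2, 128), (3, 256), (3, 512)]


def backbone_output_shape(height, width, blocks=VGG16_BLOCKS):
    """Return (height, width, channels) after the last pooling layer in blocks."""
    channels = None
    for _num_convs, out_channels in blocks:
        # 3x3 convolutions with stride 1 and "same" padding keep height and width.
        channels = out_channels
        # 2x2 max pooling with stride 2 halves each spatial dimension.
        height //= 2
        width //= 2
    return height, width, channels


print(backbone_output_shape(64, 64))  # (4, 4, 512)
```

Under these assumptions, the block4 pooling layer hands a 4 × 4 × 512 feature map to the layers that follow, so stacking the fifth VGG16 block would reduce the spatial size to 2 × 2.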

To further enhance the representational power of this backbone architecture, we deeply integrated the bottleneck features extracted before the block4 pooling layer with a series of newly introduced, randomly initialized naïve Inception blocks. These naïve Inception blocks include convolutional layers equipped with 5 × 5 and 3 × 3 filters, each with a stride of 3 × 3 and utilizing the ReLU activation function, aiming to capture features at different scales. Additionally, to reduce feature dimensions while retaining important information, we incorporated max pooling 2D layers.

Through this architecture design, we were not only able to leverage the powerful feature representations learned by the VGG16 pre-trained model on a wide range of datasets but also utilized the flexibility of the naïve Inception blocks to adapt to the specific feature patterns of the network attack dataset, leading to more efficient and accurate recognition and classification.

4. Experiments and Results Analysis

4.1. Data Preprocessing

In this study, we examined how to conduct deep learning analysis using the NSL-KDD dataset, with the goal of significantly enhancing the accuracy and efficiency of network intrusion detection systems [20]. The NSL-KDD dataset is a widely recognized benchmark in cybersecurity research, encompassing a broad array of normal network traffic records and detailed data on various types of network attacks. These attacks range from Denial of Service (DoS) to more complex unauthorized access, including user-to-root (U2R) privilege escalation and remote-to-local (R2L) attacks [21]. The design of the NSL-KDD dataset addresses numerous issues present in the original KDD Cup 99 dataset, such as a large number of duplicate records and data biases, making it well suited to evaluating and developing new-generation network intrusion detection technologies. The training set KDDTrain+ of the NSL-KDD dataset has 125,973 network connection records. The test set KDDTest+ has 22,544 network connection records. The description of the NSL-KDD dataset is shown in Table 1.

To fully leverage the information within the NSL-KDD dataset and enhance the effectiveness of subsequent analyses, we implemented a series of refined data preprocessing steps. Figure 4 is a schematic diagram of the data preprocessing steps. Initially, we simplified the complexity of the classification task by mapping the various types of attacks in the dataset into five principal categories, a step critical for subsequent model training and evaluation. This mapping not only reduced the difficulty of classification but also helped the model better understand and distinguish the subtle differences between normal behavior and malicious attacks.
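The category mapping can be sketched as a simple lookup. The paper does not list its exact mapping, so the table below is a partial, illustrative one based on attack label names commonly found in NSL-KDD; the names `ATTACK_TO_CATEGORY` and `to_category` are hypothetical.

```python
# Partial, illustrative mapping from NSL-KDD attack labels to five categories.
# Assumption: the full mapping used in the paper covers every label in the data.
ATTACK_TO_CATEGORY = {
    "normal": "Normal",
    "neptune": "DoS", "smurf": "DoS", "back": "DoS", "teardrop": "DoS",
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    "guess_passwd": "R2L", "warezclient": "R2L", "ftp_write": "R2L",
    "buffer_overflow": "U2R", "rootkit": "U2R", "loadmodule": "U2R",
}


def to_category(label):
    """Map a raw attack label to its category; unknown labels raise KeyError."""
    return ATTACK_TO_CATEGORY[label.strip().lower()]


raw_labels = ["normal", "neptune", "satan", "guess_passwd", "rootkit"]
categories = [to_category(label) for label in raw_labels]
print(categories)
```

Raising an error on unlisted labels, rather than silently assigning a default, makes gaps in the mapping visible before training.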

Next, we performed thorough cleansing and preparation of the dataset, including label encoding of categorical features, which transforms these features from a text format to a numerical format that can be directly processed by deep learning algorithms. This transformation is crucial as it ensures all features are in a consistent representation, thus enhancing the efficiency and effectiveness of model training.

Considering the issue of class imbalance present in the dataset, as shown in Table 1 for R2L and U2R, we employed the ADASYN resampling technique, a common oversampling method aimed at balancing the dataset by generating new synthetic samples for the minority classes [22]. The application of this method improved the model’s detection capability for minority classes, helping to reduce bias and ensuring the fairness and comprehensiveness of the detection system.

Furthermore, we applied one-hot encoding to the categorical features, converting them into a series of binary columns, a step that further enhanced the model’s ability to process categorical data. The features after one-hot encoding provided the model with richer information, aiding in the recognition of complex attack patterns. Concurrently, we implemented feature normalization for all numerical features to ensure they had the same scale during model training, thereby avoiding the excessive influence of certain features. The definition of feature normalization is shown in Equations (1) and (2):

r' = \frac{r - r_{\min}}{r_{\max} - r_{\min}}

(1)

r_{\max} = \max (r), \quad r_{\min} = \min (r)

(2)

where r represents the numerical feature value, r_min is the feature’s minimum value, r_max is the maximum value, and r′ is the value after normalization.
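As a worked illustration of Equations (1) and (2), the sketch below normalizes a single numerical feature column. The handling of a constant column (where r_max equals r_min) is an assumption, since the paper does not state how that case is treated, and the sample values are hypothetical.

```python
def min_max_normalize(column):
    """Apply r' = (r - r_min) / (r_max - r_min) to every value in a column."""
    r_min, r_max = min(column), max(column)
    if r_max == r_min:
        # Assumption: a constant feature carries no information, so map it to 0.0.
        return [0.0 for _ in column]
    span = r_max - r_min
    return [(r - r_min) / span for r in column]


# Hypothetical values of one numerical feature.
durations = [0, 5, 10, 20]
normalized = min_max_normalize(durations)
print(normalized)  # [0.0, 0.25, 0.5, 1.0]
```

In practice r_min and r_max would typically be computed on the training set and reused for the test set, so that both sets share one scale.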

In this study, we also used a publicly available IoT attack dataset, CICIoT2023, which was created to promote the development of security analytics applications for real-world IoT operations. The dataset's creators executed 33 different attacks within an IoT topology of 105 devices. These attacks are categorized into seven types: BruteForce, DDoS, DoS, Mirai, Recon, Spoofing, and Web. The dataset consists of 169 files in two different formats, PCAP and CSV [23]. The CSV files are processed versions of the PCAP files, yielding 46 attributes that describe the various types of attacks. The number of recorded samples per category is uneven, with Web-based and BruteForce attacks being underrepresented, a typical sign of dataset imbalance. The dataset was preprocessed and balanced to ensure the credibility of machine learning model evaluations. Additionally, the number of features was reduced to improve the predictive performance of machine learning models across the multi-class representation of the dataset. Due to the large size of the dataset, we used only a portion of the data for this study. The description of the CICIoT2023 dataset is shown in Table 2.

In feature selection and data analysis, the Pearson Correlation Coefficient (PCC) plays a pivotal role by clearly mapping the strength and direction of the linear relationship between variables. This statistical measure serves as a valuable tool in guiding our feature selection and understanding how these features interact with one another. By precisely measuring the correlation between features, PCC enhances model accuracy and helps simplify the model structure by eliminating highly redundant features without compromising predictive performance.

Specifically, when dealing with a dataset containing n observations, we utilize the Pearson Correlation Coefficient r (as shown in Equation (3)) to quantify the degree of linear dependence between two features, X and Y. This process not only deepens our understanding of the underlying structure of the data but also provides a scientific basis for subsequent model optimization and feature selection. Through such analytical methods, we can more efficiently construct concise yet powerful data analysis models, enabling us to accurately capture valuable insights within complex datasets.

r = \frac{\sum_{i = 1}^{n} (X_{i} - \bar{X}) (Y_{i} - \bar{Y})}{\sqrt{\sum_{i = 1}^{n} {(X_{i} - \bar{X})}^{2} \sum_{i = 1}^{n} {(Y_{i} - \bar{Y})}^{2}}}

(3)

where X_{i} and Y_{i} are individual data points for features X and Y, respectively, and \bar{X} and \bar{Y} are the means of features X and Y, respectively.

The preprocessed dataset consists of 46 features. The PCC (Pearson Correlation Coefficient) was calculated for all 46 features, and in this study, features with a PCC value greater than or equal to 0.9 were considered highly correlated and were removed from the feature set. As a result, a total of 30 features were retained for the subsequent multi-class classification tasks.
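The correlation-based filtering can be sketched as follows. This is a minimal illustration under stated assumptions: it compares the absolute value of the coefficient with the 0.9 threshold, keeps the first feature of each highly correlated pair, and treats a constant feature as uncorrelated; none of these choices is specified in the paper. The feature names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences (Equation (3))."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    # Assumption: a constant feature (zero variance) is reported as uncorrelated.
    return cov / denom if denom else 0.0


def drop_correlated(features, threshold=0.9):
    """Return the names of features kept after removing highly correlated ones.

    features maps a feature name to its list of values. A feature is dropped
    when |r| >= threshold with any feature that has already been kept.
    """
    kept = []
    for name, column in features.items():
        if all(abs(pearson(column, features[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical toy data: f2 is almost a multiple of f1, f3 is unrelated.
features = {
    "f1": [1, 2, 3, 4, 5],
    "f2": [2, 4, 6, 8, 10.5],
    "f3": [5, 1, 4, 2, 3],
}
kept = drop_correlated(features, threshold=0.9)
print(kept)  # ['f1', 'f3']
```

The result depends on the order in which features are visited, so a fixed column order is needed for the selection to be reproducible.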

4.2. Evaluation Indicators

To accurately assess the performance of the intrusion detection model, this study utilizes a confusion matrix to analyze experimental results [24]. The confusion matrix provides a detailed depiction of the relationship between the predicted outcomes and the actual scenarios. Utilizing the confusion matrix, the experiment evaluated the performance of the detection model, primarily through the following metrics: accuracy, precision, recall, False Positive Rate (FPR), and F1-Score. The definitions of these performance metrics are shown in Equations (4)–(8):

\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}

(4)

\mathrm{Precision} = \frac{TP}{TP + FP}

(5)

\mathrm{Recall} = \frac{TP}{TP + FN}

(6)

\mathrm{FPR} = \frac{FP}{TN + FP}

(7)

\mathrm{F1\text{-}Score} = 2 \times \frac{\mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}

(8)

In the above equations, True Positive (TP) refers to correctly predicted positives, while True Negative (TN) denotes correctly predicted negatives. False Positives (FP) are incorrect positive predictions, and False Negatives (FN) are incorrect negative predictions. Accuracy measures overall correct predictions, Precision reflects how many predicted positives are accurate, and Recall indicates how many actual positives are correctly identified. The False Positive Rate (FPR) shows the proportion of negatives misclassified as positives, and the F1-Score balances Precision and Recall, especially useful for imbalanced data. Higher Accuracy and F1-Score, combined with a lower FPR, signal better model performance [25].
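Equations (4)–(8) can be checked on a small worked example. The sketch below computes the five metrics from the four confusion-matrix counts for a single class treated as positive; the counts are hypothetical, and returning 0.0 for a zero denominator is an assumption rather than something the paper specifies.

```python
def safe_div(numerator, denominator):
    # Assumption: an undefined ratio (zero denominator) is reported as 0.0.
    return numerator / denominator if denominator else 0.0


def binary_metrics(tp, tn, fp, fn):
    """Compute accuracy, precision, recall, FPR, and F1-Score from counts."""
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    return {
        "accuracy": safe_div(tp + tn, tp + tn + fp + fn),
        "precision": precision,
        "recall": recall,
        "fpr": safe_div(fp, tn + fp),
        "f1": safe_div(2 * precision * recall, precision + recall),
    }


# Hypothetical counts for one class treated as "positive".
metrics = binary_metrics(tp=90, tn=880, fp=20, fn=10)
for name, value in metrics.items():
    print(f"{name}: {value:.4f}")
```

With these counts the accuracy is 0.97 while the F1-Score is about 0.857, which shows how accuracy alone can look favorable when negatives dominate the data.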

4.3. Performance Analysis

This study explores the classification performance of the VGGIncepNet model across normal traffic, DoS, Probe, R2L, and U2R attack categories through experimental investigation. The model was evaluated using the KDDTest+ dataset, with the resulting confusion matrix displayed in Figure 5. This figure demonstrates the model’s performance in a five-category classification task. The experimental findings indicate that the majority of samples were correctly classified along the main diagonal of the confusion matrix, illustrating the model’s robust overall classification capability. Nevertheless, the confusion matrix in Figure 5 also reveals that while the VGGIncepNet model performs well in distinguishing between normal and anomalous traffic, it falls short in accurately identifying R2L and U2R attack traffic.

As Figure 6 shows, the model performed multi-class predictions in the CICIoT2023 dataset, with its classification performance displayed through the confusion matrix. Overall, the model performed well across most categories, particularly showing high accuracy in the DoS, Mirai, and DDoS categories, indicating strong recognition capabilities when handling large amounts of data. However, the model’s performance was slightly lacking in categories with fewer samples, such as Web and BruteForce. In general, the model performed well in the primary attack categories, but there remains room for improvement to further enhance its ability to recognize and classify network attacks.

Table 3 presents the classification performance of the VGG16 model and the proposed VGGIncepNet model in different datasets (NSL-KDD and CICIoT2023). The experimental results are evaluated using accuracy, precision, recall, and F1-Score. From the table, it is evident that VGGIncepNet outperforms the traditional VGG16 model in both datasets.

In the NSL-KDD dataset, VGGIncepNet achieves an accuracy of 93%, six percentage points higher than VGG16’s 87%. Its precision is 94%, compared with 86% for VGG16, and its recall and F1-Score are 92% and 93%, respectively, whereas VGG16 reaches 85% on both. These results indicate that VGGIncepNet handles the NSL-KDD classification task more effectively, with the largest gains (eight percentage points) in precision and F1-Score.

In the CICIoT2023 dataset, VGGIncepNet also performs excellently. It achieves an accuracy of 92%, surpassing VGG16’s 89%. The precision and recall are 92% and 93%, both of which are higher than VGG16’s 91% and 89%, respectively. In terms of F1-Score, VGGIncepNet reaches 92%, slightly outperforming VGG16’s 90%. These results further validate that the proposed VGGIncepNet model exhibits better classification ability and generalization performance when dealing with complex IoT attack data.

In summary, the VGGIncepNet model effectively enhances classification performance by combining the deep structure of VGG16 with the multi-scale feature extraction capabilities of the Inception module. Particularly, in high-dimensional and complex network attack data, the model demonstrates superior performance in precision, recall, and F1-Score compared to the traditional VGG16 model. These experimental results validate the effectiveness of the proposed method, providing a more powerful tool for IoT security analysis tasks.
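The size of the improvement over VGG16 can be made explicit with a short calculation. The sketch below transcribes the Table 3 scores as percentages and prints the gain in percentage points per metric; the names `TABLE_3`, `METRICS`, and `gains` are illustrative and do not come from the authors' code.

```python
# Scores from Table 3, in percent: (accuracy, precision, recall, F1-Score).
TABLE_3 = {
    "NSL-KDD": {"VGG16": (87, 86, 85, 85), "VGGIncepNet": (93, 94, 92, 93)},
    "CICIoT2023": {"VGG16": (89, 91, 89, 90), "VGGIncepNet": (92, 92, 93, 92)},
}
METRICS = ("accuracy", "precision", "recall", "f1")


def gains(table):
    """Percentage-point gain of VGGIncepNet over VGG16 for every dataset and metric."""
    return {
        dataset: {
            metric: rows["VGGIncepNet"][i] - rows["VGG16"][i]
            for i, metric in enumerate(METRICS)
        }
        for dataset, rows in table.items()
    }


result = gains(TABLE_3)
for dataset, per_metric in result.items():
    print(dataset, per_metric)
```

The gains are 6 to 8 points on NSL-KDD and 1 to 4 points on CICIoT2023, so the benefit of adding the Inception module is more pronounced on NSL-KDD.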

4.4. Performance Comparison and Analysis of Different Models

In this paper, we conducted an extensive comparison and analysis of different deep learning models in the NSL-KDD and CICIoT2023 datasets to determine their efficacy in network intrusion detection tasks. We meticulously selected several widely used models for assessment: BERT [26], DistilBERT [27], XLNet [28], T5 [29], and our specially designed ensemble model, VGGIncepNet. By evaluating their performance across four key metrics—accuracy, precision, recall, and F1-Score—we were able to thoroughly assess their adaptability and effectiveness in the complex domain of network security.

Table 4 and Table 5 present the classification performance of different models in the NSL-KDD and CICIoT2023 datasets. It can be observed that the proposed VGGIncepNet model outperforms other classification models, such as BERT, DistilBERT, XLNet, and T5, in both datasets.

In the NSL-KDD dataset, VGGIncepNet achieved an accuracy of 93%, surpassing BERT’s 92% and the accuracy of every other model. Its precision was 94%, its recall 92%, and its F1-Score 93%, demonstrating the model’s strong classification capability in network intrusion detection tasks. In comparison, T5’s F1-Score was 91%, BERT’s was 90%, and XLNet’s was 87%, while DistilBERT was the weakest, with an accuracy of 85% and an F1-Score of 83%.

In the CICIoT2023 dataset, VGGIncepNet again demonstrated superior classification performance. It achieved an accuracy of 92%, with a precision and recall of 92% and 93%, respectively, and an F1-Score of 92%. XLNet performed second-best, with an accuracy of 91%, precision and recall of 90%, and an F1-Score of 90%. BERT’s performance on this dataset was slightly lower, with an accuracy of 89%, though it maintained a recall of 92%, indicating good coverage of samples. T5’s performance was also stable, with an F1-Score of 87%.

These results indicate that, although BERT, XLNet, and other models perform well in multi-class classification tasks, VGGIncepNet, with its innovative architecture combining convolutional neural networks and Inception modules, demonstrates superior accuracy and robustness in specific IoT and network attack detection scenarios. Particularly, in handling complex network attack data, VGGIncepNet effectively extracts features and performs multi-scale analysis, thereby improving classification performance. These experimental results fully validate the effectiveness of the VGGIncepNet model in network intrusion detection and IoT security analysis.

Figure 7 and Figure 8, respectively, illustrate the classification performance of different models (BERT, DistilBERT, XLNet, T5, and VGGIncepNet) in the NSL-KDD and CICIoT2023 datasets. The figures show that VGGIncepNet outperforms the other models across all metrics (accuracy, precision, recall, and F1-Score), particularly demonstrating a significant advantage in precision and F1-Score. Whether in the NSL-KDD dataset or the CICIoT2023 dataset, VGGIncepNet consistently delivers outstanding performance, validating its superiority in network intrusion detection and IoT security analysis.
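To quantify the lead reported above, the following sketch computes, for each metric, the margin between VGGIncepNet and the best competing model. The scores are transcribed from Table 4 and Table 5 as percentages; the helper name `margin_over_runner_up` is a hypothetical name used only for this illustration.

```python
METRICS = ("accuracy", "precision", "recall", "f1")

# Scores in percent: (accuracy, precision, recall, F1-Score).
TABLE_4_NSL_KDD = {
    "BERT": (92, 90, 88, 90),
    "DistilBERT": (85, 83, 84, 83),
    "XLNet": (88, 86, 88, 87),
    "T5": (89, 91, 91, 91),
    "VGGIncepNet": (93, 94, 92, 93),
}
TABLE_5_CICIOT2023 = {
    "BERT": (89, 89, 92, 90),
    "DistilBERT": (85, 85, 84, 84),
    "XLNet": (91, 90, 90, 90),
    "T5": (88, 88, 87, 87),
    "VGGIncepNet": (92, 92, 93, 92),
}


def margin_over_runner_up(table, model="VGGIncepNet"):
    """Percentage-point lead of `model` over the best other model, per metric."""
    margins = {}
    for i, metric in enumerate(METRICS):
        best_other = max(
            scores[i] for name, scores in table.items() if name != model
        )
        margins[metric] = table[model][i] - best_other
    return margins


nsl = margin_over_runner_up(TABLE_4_NSL_KDD)
iot = margin_over_runner_up(TABLE_5_CICIOT2023)
print("NSL-KDD:", nsl)
print("CICIoT2023:", iot)
```

Every margin is positive, which agrees with the statement that VGGIncepNet leads on all four metrics, and the leads range from 1 to 3 percentage points.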

Table 6 compares the results of the proposed model on the CICIoT2023 dataset with those reported in previous research.

5. Conclusions

In this paper, we propose a novel model, VGGIncepNet, which combines the strengths of non-image-to-image conversion techniques with the advantages of the VGG16 and Inception modules to enhance performance in network intrusion detection and IoT security analysis. By converting non-image data into image data, the model effectively leverages the powerful feature extraction capabilities of convolutional neural networks (CNNs) to provide more expressive feature representations for multi-class classification tasks involving network attack data.
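The general idea of turning a tabular record into an image can be illustrated with a deliberately simple sketch. It is not the conversion used in this paper, which relies on dimension reduction, feature arrangement, and pixel overlay as described in Section 3; here the features are only min-max scaled to the 0 to 255 range and laid out row by row in the smallest square grid that holds them. The function name `vector_to_image` and the sample values are assumptions.

```python
import math


def vector_to_image(features):
    """Map a numeric feature vector to a square grid of 0..255 pixel values.

    Simplified illustration: min-max scaling followed by a row-major layout,
    with zero padding for the unused cells of the grid.
    """
    if not features:
        return []
    lo, hi = min(features), max(features)
    span = (hi - lo) or 1.0  # avoid division by zero for constant vectors
    pixels = [round(255 * (x - lo) / span) for x in features]
    side = math.isqrt(len(pixels) - 1) + 1  # ceil(sqrt(n)) for n >= 1
    pixels += [0] * (side * side - len(pixels))
    return [pixels[r * side:(r + 1) * side] for r in range(side)]


# Hypothetical five-feature record.
img = vector_to_image([0.0, 0.2, 1.0, 0.4, 0.8])
for row in img:
    print(row)
```

Under this layout a record with 41 numeric values would fill a 7 × 7 grid with eight padded cells. A convolutional network can then be applied to the grid, although a naive row-major order places unrelated features next to each other, which is one motivation for arranging features more carefully.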

We conducted extensive experiments on the NSL-KDD and CICIoT2023 datasets. The results show that VGGIncepNet outperforms BERT, DistilBERT, XLNet, and T5, and matches or exceeds the models referenced in Table 6, across accuracy, precision, recall, and F1-Score. By incorporating non-image-to-image conversion techniques, the model is better equipped to handle high-dimensional data using CNNs, leading to improved classification accuracy and robustness.

VGGIncepNet demonstrates strong classification performance in both datasets, with notable advantages in precision and F1-Score. These improvements are critical for reducing False Positives and ensuring reliable detection of various network attacks. The non-image-to-image conversion technique enhances the model’s feature representation, while VGGIncepNet’s ability to effectively extract multi-scale features further strengthens its adaptability in complex network environments.

The strong performance of VGGIncepNet indicates its potential in the field of network security, particularly in addressing the diverse and evolving security threats in network systems. By integrating deep learning with non-image-to-image conversion techniques, VGGIncepNet provides a more effective solution for the real-time detection of malicious activities in network systems, helping to secure critical infrastructure.

In conclusion, VGGIncepNet not only demonstrates its powerful capability in addressing modern network intrusion detection challenges but also combines non-image-to-image conversion methods to provide an innovative and efficient solution for the field of IoT security, offering promising prospects for future applications.

Although VGGIncepNet shows strong performance in network intrusion detection and IoT security analysis, there are still some limitations, such as high computational overhead and limited generalization capability, especially when handling large-scale and diverse real-world network environments. Future research can focus on optimizing the non-image-to-image conversion process to reduce computational resource consumption; incorporating other deep learning architectures (such as Transformers and Graph Convolutional Networks) to leverage multiple model strengths; enhancing the model’s generalization by training on larger and more complex datasets; developing online learning methods to adapt to evolving attack patterns; and improving model interpretability to make it more practical and reliable in real-world applications.

Author Contributions

All authors constructed the proposed VGGIncepNet model. J.C. preprocessed the data, built the prime architecture of the model, and conducted the experiment, and J.X. (Jingjing Xiao) adjusted and optimized the parameters of the model. J.X. (Jiaxin Xu) analyzed and interpreted the data. All authors prepared the manuscript. All authors provided critical feedback on the manuscript. All authors have read and agreed to the published version of the manuscript.

Funding

This research is supported by the High-level Talent Project of Xiamen University of Technology (No. YKJ19014R) and the National Natural Science Foundation of China (No. 51805458).

Data Availability Statement

The raw data supporting the conclusions of this study are available upon reasonable request from the author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Wang, W.; Sheng, Y.; Li, J.; Wang, M.; Zeng, X.; Ye, Y.; Hou, X. HAST-IDS: Learning hierarchical spatial-temporal features using deep neural networks to improve intrusion detection. IEEE Access 2020, 7, 27320–27328.
2. Sadeghi, A.R.; Wachsmann, C.; Waidner, M. Security and privacy challenges in industrial internet of things. Proc. IEEE 2020, 108, 1029–1050.
3. Alsmadi, I.; Zarour, M. Data breach and identity theft: A comprehensive study. J. Inf. Secur. Appl. 2021, 58, 102729.
4. Mirsky, Y.; Doitshman, T.; Elovici, Y.; Shabtai, A. Kitsune: An Ensemble of Autoencoders for Online Network Intrusion Detection. IEEE Trans. Cybern. 2021, 51, 1561–1576.
5. Zeng, X.; Huang, Z.; Li, Q.; Zhang, J. A Novel Cloud-Based Intrusion Detection System Using Deep Learning. IEEE Access 2021, 9, 18557–18566.
6. Radford, A.; Metz, L.; Chintala, S. Unsupervised representation learning with deep convolutional generative adversarial networks. IEEE Trans. Pattern Anal. Mach. Intell. 2021, 43, 16–21.
7. Long, Z.; Yan, H.; Shen, G.; Zhang, X.; He, H.; Cheng, L. A Transformer-based network intrusion detection approach for cloud security. J. Cloud Comput. 2024, 13, 5.
8. Ullah Jan, M.; Rehman, A.; Muhammad, K.; Sajjad, M.; Baik, S.W. TNN-IDS: An intrusion detection system using deep learning in IoT networks. IEEE Access 2021, 9, 93464–93473.
9. Mahyari, A.; Khorsandroo, S.; Mohammadi, M. A Transformer-based Deep Packet Inspection algorithm for malicious traffic detection and classification. IEEE Access 2022, 10, 12345–12356.
10. Hu, Z. Knowledge Graph Based Large Scale Network Security Threat Detection Techniques. Appl. Math. Sci. 2024, 9, 1–15.
11. Yuan, T.; Liu, J.; Zhang, W. An immune selection algorithm for intelligent rule selection and threshold updating in network security threat detection. IEEE Trans. Cybern. 2021, 51, 4206–4218.
12. Rajan, D.M.; Aravindhar, D. Detection and Mitigation of DDOS Attack in SDN Environment Using Hybrid CNN-LSTM. Migr. Lett. 2023, 20, 407–419.
13. Oliveira, N.; Rodrigues, J.J.P.C.; Rabêlo, R.A.L.; Lloret, J. Intelligent network attack detection and classification using a sequence-based approach in network intrusion detection systems. IEEE Access 2020, 8, 85088–85101.
14. Mohammadpour, L.; Javidan, R.; Safa, H. A comprehensive review on the applications of deep learning and CNNs in intrusion detection systems. IEEE Access 2021, 9, 110149–110163.
15. Zhang, Y.; Li, Y.; Liu, M.; Wang, Y. Encoding high-dimensional feature vectors into 2D space for convolutional neural networks. IEEE Trans. Neural Netw. Learn. Syst. 2019, 30, 1983–1995.
16. Luo, W.; Li, Y.; Urtasun, R.; Zemel, R. Understanding the effective receptive field in deep convolutional neural networks. Adv. Neural Inf. Process. Syst. 2016, 29, 4898–4906.
17. Wang, G.; Zhang, W.; Li, X. Deep feature consistent variational autoencoder for semi-supervised learning. IEEE Trans. Neural Netw. Learn. Syst. 2020, 31, 2684–2695.
18. Nguyen, T.; Tran, D. Towards a transfer learning-based hybrid deep learning model for network intrusion detection. IEEE Access 2021, 9, 19147–19158.
19. Huang, G.; Liu, Z.; Van Der Maaten, L.; Weinberger, K.Q. Densely connected convolutional networks. IEEE Trans. Pattern Anal. Mach. Intell. 2020, 42, 310–324.
20. Gurung, S.; Ghose, M.K.; Subedi, A. Deep Learning Approach on Network Intrusion Detection System using NSL-KDD Dataset. Int. J. Comput. Netw. Inf. Secur. 2019, 11, 9–16.
21. Liu, J.; Kantarci, B.; Adams, C. Machine Learning-Driven Intrusion Detection for Contiki-NG-Based IoT Networks Exposed to NSL-KDD Dataset. In Proceedings of the 2nd ACM Workshop on Wireless Security and Machine Learning, Linz, Austria, 13 July 2020. ACM International Conference Proceeding Series.
22. Pristyanto, Y.; Nugraha, A.F.; Pratama, I.; Dahlan, A.; Wirasakti, L.A. Dual Approach to Handling Imbalanced Class in Datasets Using Oversampling and Ensemble Learning Techniques. In Proceedings of the 2021 International Conference on Information Networking (ICOIN), Seoul, Republic of Korea, 4–6 January 2021.
23. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward generating a new intrusion detection dataset and intrusion traffic characterization. In Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP), Funchal, Portugal, 22–24 January 2018; pp. 108–116.
24. Xu, H.S.; Fan, G.L.; Song, Y.P. Novel Key Indicators Selection Method of Financial Fraud Prediction Model Based on Machine Learning Hybrid Mode. Mob. Inf. Syst. 2022, 2022, 6542652.
25. Tharwat, A. Classification assessment methods. Appl. Comput. Inform. 2021, 17, 168–192.
26. Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186.
27. Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a distilled version of BERT: Smaller, faster, cheaper and lighter. arXiv 2019, arXiv:1910.01108.
28. Yang, Z.; Dai, Z.; Yang, Y.; Carbonell, J.; Salakhutdinov, R.; Le, Q.V. XLNet: Generalized Autoregressive Pretraining for Language Understanding. Adv. Neural Inf. Process. Syst. 2019, 32, 5753–5763.
29. Raffel, C.; Shazeer, N.; Roberts, A.; Lee, K.; Narang, S.; Matena, M.; Yanqi, Z.; Wei, L.; Liu, P.J. Exploring the Limits of Transfer Learning with a Unified Text-to-Text Transformer. J. Mach. Learn. Res. 2020, 21, 1–67.
30. Zhendong, W.; Hui, C.; Shuxin, Y.; Xiao, L.; Dahai, L.; Junling, W. A lightweight intrusion detection method for IoT based on deep learning and dynamic quantization. PeerJ Comput. Sci. 2023, 9, e1569.
31. Denis, P.; Lubov, G.; Artur, Z.; Anton, P. Investigation of the impact effectiveness of adversarial data leakage attacks on the machine learning models. ITM Web Conf. 2024, 59, 04011.
32. Sidra, A.; Imen, B.; Stephen, O.; Abdullah, A.H.; Gabriel, A.S.; Ahmad, A.; Michal, G. Evaluating deep learning variants for cyber-attacks detection and multi-class classification in IoT networks. PeerJ Comput. Sci. 2024, 10, e1793.
33. Onur, S.; Suleyman, U. Advancing Intrusion Detection Efficiency: A ‘Less is More’ Approach via Feature Selection. 2023. Available online: https://www.researchsquare.com/article/rs-3398752/v1 (accessed on 22 August 2024).

Figure 1. Transformation from feature vector to feature matrix.

Figure 2. The deep network architecture of VGGIncepNet model.

Figure 3. The system architecture of VGGIncepNet model.

Figure 4. Data preprocessing steps.

Figure 5. Confusion matrix of the VGGIncepNet model in NSL-KDD dataset.

Figure 6. Confusion matrix of the VGGIncepNet model in CICIoT2023 dataset.

Figure 7. Classification results comparison in NSL-KDD dataset.

Figure 8. Classification results comparison in CICIoT2023 dataset.

Table 1. Description of NSL-KDD dataset.

Label Type	KDDTrain+	Train Set Scale (%)	KDDTest+	Test Set Scale (%)
Benign	67,343	53.46	12,339	54.73
DoS	45,927	36.46	6736	29.88
Probe	11,656	9.25	2514	11.15
R2L	995	0.79	823	3.65
U2R	52	0.04	132	0.59

Table 2. Description of CICIoT2023 dataset.

Label Type	Train	Train Set Scale (%)	Test	Test Set Scale (%)
Benign	2474	12.54	610	12.37
BruteForce	720	3.65	301	6.10
DDoS	3097	15.70	489	9.92
DoS	6443	32.67	1904	38.61
Mirai	2120	10.75	531	10.77
Recon	1842	9.34	387	7.85
Spoofing	2036	10.32	436	8.84
Web	990	5.02	273	5.37

Table 3. Comparison of classification performance between VGG16 and VGGIncepNet in NSL-KDD and CICIoT2023 datasets.

Models	Datasets	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)
VGG16	NSL-KDD	87	86	85	85
VGGIncepNet	NSL-KDD	93	94	92	93
VGG16	CICIoT2023	89	91	89	90
VGGIncepNet	CICIoT2023	92	92	93	92

Table 4. Comparative performance of different models in NSL-KDD dataset.

Models	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)
BERT	92	90	88	90
DistilBERT	85	83	84	83
XLNet	88	86	88	87
T5	89	91	91	91
VGGIncepNet	93	94	92	93

Table 5. Comparative performance of different models in CICIoT2023 dataset.

Models	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)
BERT	89	89	92	90
DistilBERT	85	85	84	84
XLNet	91	90	90	90
T5	88	88	87	87
VGGIncepNet	92	92	93	92

Table 6. Comparison between previous works and proposed model for CICIoT2023 dataset.

Reference	Accuracy (%)	Precision (%)	Recall (%)	F1-Score (%)
Ref. [30]	87	86	86	86
Ref. [31]	89	90	88	89
Ref. [32]	90	92	90	91
Ref. [33]	92	91	91	91
VGGIncepNet	92	92	93	92

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
