Deep Feature Fusion via Transfer Learning for Multi-Class Network Intrusion Detection

Lee, Sunghyuk; Roh, Donghwan; Yu, Jaehak; Moon, Daesung; Lee, Jonghyuk; Bae, Ji-Hoon

doi:10.3390/app15094851


by Sunghyuk Lee ¹, Donghwan Roh ¹, Jaehak Yu ², Daesung Moon ², Jonghyuk Lee ¹,* and Ji-Hoon Bae ³,*

¹ Department of AI and Big Data Engineering, Daegu Catholic University, Gyeongsan-si 38430, Republic of Korea
² Electronics and Telecommunications Research Institute, Daejeon 34129, Republic of Korea
³ Department of Computer Education, Korea National University of Education, Cheongju-si 28173, Republic of Korea
* Authors to whom correspondence should be addressed.

Appl. Sci. 2025, 15(9), 4851; https://doi.org/10.3390/app15094851 (registering DOI)

Submission received: 31 March 2025 / Revised: 22 April 2025 / Accepted: 23 April 2025 / Published: 27 April 2025

(This article belongs to the Section Computing and Artificial Intelligence)


Abstract:

With the rapid advancement of network technologies, cyberthreats have become increasingly sophisticated, posing significant challenges to traditional intrusion detection systems. Conventional machine learning and deep learning approaches frequently experience performance degradation when confronted with imbalanced datasets and novel attack vectors. To address these limitations, this study proposes a deep learning-based intrusion detection framework that employs feature fusion through incremental transfer learning between source and target domains. The proposed architecture integrates convolutional neural networks (CNNs) with an attention mechanism to extract and aggregate salient features, thereby enhancing the model’s discriminative capacity between normal traffic and various network attack categories. Experimental results demonstrate that the proposed model achieves a detection accuracy of 94.21% even when trained on only 33% of the available data, outperforming conventional models. These findings underscore the effectiveness of the proposed feature fusion strategy via transfer learning in improving detection capabilities within dynamic and evolving cyberthreat environments.

Keywords:

network intrusion detection; transfer learning; deep learning; feature fusion

1. Introduction

With the rapid advancement of information and communication technology (ICT), networks have evolved into essential infrastructure in contemporary society, serving diverse technologies and services beyond fundamental data exchange. Critical sectors of modern society—including economics, education, healthcare, telecommunications, and finance—are increasingly interconnected and operated through networks, amplifying their significance. In the global marketplace, organizations exchange data in real time, while e-commerce platforms facilitate worldwide customer connections, enabling transactions irrespective of time or location. Furthermore, networks constitute the foundational infrastructure for cloud-based services and big data analytics, facilitating more efficient organizational decision making [1].

This proliferation of network technologies has generated vast quantities of data exchange while simultaneously creating novel security vulnerabilities, resulting in escalating cybercrimes and threats that exploit these weaknesses. Both the frequency and magnitude of cyberattacks increase annually, with attack methodologies becoming increasingly sophisticated and complex. Vulnerable sectors, including finance, healthcare, and defense, face threats that extend beyond simple data breaches, potentially incurring severe damage through attacks such as denial of service (DoS) and ransomware [2]. Consequently, cybersecurity has emerged as a critical imperative across societal and industrial domains. The advancement of technologies for detecting intrusions and anomalies in encrypted network traffic is essential to counter irregular attack patterns and evolving methodologies that transform annually across heterogeneous environments. Moreover, conventional security technologies exhibit limitations in defending against emergent threats that elude anticipation, necessitating the implementation of more sophisticated and automated detection mechanisms [3].

Previous research has predominantly focused on the binary classification of network traffic into normal and malicious categories using machine learning-based pattern analysis [4,5,6]. While this approach demonstrated efficacy in relatively simplistic historical network environments, it exhibits limitations in adequately capturing the complexities of rapidly evolving contemporary networks and traffic patterns. Specifically, traditional methodologies rely on fixed pattern-based learning, reducing adaptability to novel attack types or dynamic traffic variations. In addition, the substantial disparity between normal traffic and attack data proportions creates class imbalance, causing models to develop bias toward the predominant normal class, thereby diminishing sensitivity to actual attacks. Consequently, these approaches struggle to effectively respond to highly sophisticated and irregular network attacks [7].

Furthermore, datasets commonly employed in network intrusion detection research are predominantly designed around well-documented attack types, resulting in diminished generalization performance against emerging cyber threats [8]. Consequently, while existing models achieve high detection performance for known attacks, their effectiveness deteriorates rapidly when encountering unknown threats. This challenge intensifies as network security environments grow increasingly complex and cyber attackers employ diverse techniques to circumvent traditional detection systems. Accordingly, modern intrusion detection systems (IDSs) must demonstrate both scalability and adaptability to effectively counter emerging cyber threats, providing a defense not only against well-documented attack types but also against the increasing diversity and complexity of novel cyberattacks. Achieving this objective requires approaches that resolve dataset imbalance issues while maintaining robust detection performance across diverse environments through techniques such as transfer learning [9].

To detect increasingly sophisticated and varied cyberattacks in network security environments, this paper uses the CIC-IDS datasets, which are widely employed for evaluating machine learning- and deep learning-based IDSs and cover diverse attack types and network traffic patterns. The methodology first defines source and target domains. Features are learned in the source domain from the input features shared by both domains and are then transferred to the target domain, where they undergo fusion and fine-tuning. Through this process, a domain-adaptive transfer learning-based feature fusion model is proposed to classify multiple attack types in the target domain. The CIC-IDS2017 dataset serves as the source domain, and transfer learning carries the learned knowledge to the target domain, which is based on the CSE-CIC-IDS2018 dataset, enabling the model to acquire new features and maximize transfer learning effectiveness. In addition, the features extracted by a binary classification model and a multi-class classification model are fused, which allows the data to be analyzed from multiple perspectives and thereby enhances detection performance. The result is a flexible intrusion detection system that addresses the limitations of previous models in detecting previously unknown attack types and in handling dataset imbalance in network traffic.

The structure of this paper is as follows: Section 2 analyzes existing research to highlight the distinct contributions of this study. Section 3 provides a detailed discussion of the datasets and data preprocessing techniques employed for training the AI models. Section 4 introduces the proposed transfer learning and feature fusion-based network attack-type classification model. Section 5 presents experimental results and performance analysis to validate the effectiveness of the proposed model. Finally, Section 6 concludes the study and discusses future research directions.

2. Related Work

In the field of network intrusion detection, numerous studies have explored performance enhancement through machine learning- and deep learning-based models. This section examines relevant research and delineates the distinctive contributions of the present study.

Chen et al. proposed a hybrid intrusion detection model that integrates artificial neural networks (ANNs) with support vector machines (SVMs), employing traditional machine learning techniques [10]. Their methodology utilized feature extraction and dimensionality reduction to initially classify data into normal and abnormal categories, followed by multi-class classification of abnormal data to further differentiate attack types. This approach aimed to enhance both accuracy and sensitivity in intrusion detection while addressing contemporary network security challenges.

Selvam et al. (2023) developed a convolutional neural network (CNN)-based model to improve existing autoencoder-based intrusion detection systems [11]. By leveraging the robust feature extraction capabilities of CNNs, their model effectively learned critical patterns from network data, thereby accelerating training and increasing detection rates across various attack types compared to preceding methods. The research validated this model using the CIC-IDS dataset, demonstrating superior detection performance relative to established techniques.

Gayatri et al. introduced a hybrid intrusion detection system that synthesizes the advantages of both network intrusion detection systems (NIDSs) and host intrusion detection systems (HIDSs) [12]. This comprehensive approach analyzes not only network traffic data but also internal host logs and file modification data. Furthermore, the system implements a two-stage classification process: initially employing binary classification to distinguish between normal and abnormal data, subsequently applying multi-class classification to categorize abnormal data by specific attack type, thereby maximizing detection efficiency.

Transfer learning has emerged as a significant area of interest in intrusion detection research, offering techniques that accelerate learning and enhance performance by applying pretrained models to novel problems. Transfer learning is an effective strategy for addressing performance degradation across domains and the issue of insufficient labeled data. Ma et al. [13] conducted a systematic review of 1676 recent studies, providing a comprehensive overview of the current applications, practical use cases, and future research directions of transfer learning techniques. Transfer learning facilitates knowledge transference from models trained in the source domain to the target domain, enabling high performance even with limited training data. In network intrusion detection, where new attack types continuously emerge, the detection of threats in zero-day attacks and data-scarce environments constitutes a critical challenge. Consequently, research exploring intrusion detection through transfer learning has gained considerable momentum [14,15].

Additionally, feature fusion methodologies have been widely implemented to enhance performance across various deep learning domains by integrating diverse data sources or distinct features. Dai et al. [16] classified plant disease images by proposing a deep information feature fusion network (DFN-PSAN) that integrates and processes multilevel deep feature representations for improved diagnostic accuracy. Jabeen et al. [17] introduced a breast cancer classification framework that uses deep learning and feature fusion to classify breast cancer automatically from ultrasound images. Li et al. [18] proposed a multi-scale feature fusion framework (MFFSP) that combines the complementary characteristics of CNNs and Transformers for precise landslide detection. Separately, Li et al. [19] proposed a device for automatically detecting early depression (AEDE) and developed a method that fuses pupil wave and pulse rate variability (PRV) signals via a multi-feature cross-attention (MFCA) technique.

Han et al. [20] demonstrated that despite transfer learning’s excellent performance in isolation, its limitations become evident in data-scarce environments. Integrating transfer learning with feature fusion to effectively combine complementary information from both domains achieves outstanding generalization performance and robustness, even when confronted with noise and limited data. This approach shows significant usefulness for accurately identifying complex network traffic and diverse abnormal signal patterns.

Therefore, this study proposes an intrusion detection model that surpasses existing approaches by applying transfer learning for performance enhancement in conjunction with advanced feature fusion techniques. The key contributions of this research in improving attack type classification performance for network anomaly detection are as follows:

(1): This study employs the CIC-IDS dataset for attack-type classification in network anomaly detection. To address the persistent class imbalance challenge documented in previous research, the methodology incorporates strategic data preprocessing: rare attack categories were eliminated, while similar attack types were consolidated. Subsequently, random resampling techniques were implemented to equilibrate data distribution, accompanied by standard scaling to normalize all features to a mean of 0 and variance of 1, thereby enhancing the model’s generalization capabilities across diverse network environments.
(2): By transferring learned feature representations from source domain models to the target domain, the proposed approach achieves superior detection performance despite limited training data availability. Furthermore, the model effectively leverages common feature spaces even when the source and target domains exhibit partial incongruence, significantly reducing computational training requirements while mitigating overfitting phenomena that commonly plague complex network detection systems.
(3): To maximize detection efficacy, feature fusion techniques are integrated with the transfer learning-based architecture, amalgamating feature representations utilized in both binary classification and multi-class classification processes. This integration enables the model to synthesize diverse feature perspectives and derive more generalized pattern recognition capabilities, substantially enhancing its precision in identifying and categorizing various network attack typologies.

3. Data Processing and Preparation

This research utilizes the CIC-IDS2017 and CSE-CIC-IDS2018 datasets [21,22] for training and evaluating a network attack-type classification model that implements transfer learning techniques between source and target domains. These datasets, developed by the Canadian Institute for Cybersecurity (CIC), serve as standard benchmarks in network intrusion detection system (NIDS) research and provide comprehensive resources for training machine learning and deep learning models due to their inclusion of both normal network traffic and diverse cyberattack data.

The CIC-IDS2017 dataset, collected in 2017, comprises network traffic data encompassing normal traffic alongside 14 distinct attack types, such as DDoS, Brute Force, Botnet, and various Web Attacks. The CIC-IDS2018 dataset represents an enhanced version of the previous dataset, incorporating continuous network activity monitoring and expanding coverage to include IoT-based attacks as well as the latest web-based attacks, such as SQL Injection and cross-site scripting (XSS). Significantly, both datasets share 80 identical features, providing useful attributes for network traffic analysis, including flow duration metrics, interpacket time intervals, and TCP flag information.

A notable characteristic of both datasets is the predominance of normal traffic relative to attack traffic, with certain attack types represented by comparatively few samples. This class imbalance presents a critical challenge for model training, potentially inducing bias toward majority classes and compromising detection performance for underrepresented attack categories, thus presenting a significant obstacle to effective supervised learning [23]. Consequently, this study focuses on evaluating transfer learning techniques in network intrusion detection through the implementation of preprocessing methodologies that strategically integrate similar attack types and establish balanced class distributions to mitigate these inherent dataset limitations.

This study employs the CIC-IDS2017 and CIC-IDS2018 datasets as training data for the network intrusion detection model. These datasets include various cyberattack typologies and undergo systematic preprocessing to enhance model training effectiveness by integrating similar attack types and balancing data across classes. Figure 1 illustrates the key steps of this preprocessing process.

1.: Data Loading and Outlier Removal

The first preprocessing step involves data loading and outlier elimination. After importing the CIC-IDS2017 and CIC-IDS2018 datasets, any outliers present in the original data are processed. Outliers can lead to calculation errors and cause distorted results during model training [24]. Therefore, Inf (∞) and −Inf (−∞) values are replaced with NaN, and all missing values (NaN) are removed to perform data cleaning.
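The cleaning rule described above can be sketched in a few lines. This is a minimal illustration using NumPy on a toy array; the function name `drop_non_finite_rows` and the sample values are assumptions made for the example, not the authors' code.

```python
import numpy as np

def drop_non_finite_rows(features: np.ndarray) -> np.ndarray:
    """Replace +Inf/-Inf with NaN, then drop every row that contains a NaN."""
    cleaned = features.astype(float, copy=True)
    cleaned[np.isinf(cleaned)] = np.nan          # Inf and -Inf become NaN
    keep = ~np.isnan(cleaned).any(axis=1)        # rows without any missing value
    return cleaned[keep]

# Toy flow records (hypothetical values): rows 1, 2 and 3 are invalid.
flows = np.array([
    [1.0, 2.0],
    [np.inf, 3.0],
    [4.0, np.nan],
    [5.0, -np.inf],
    [6.0, 7.0],
])
clean_flows = drop_non_finite_rows(flows)
```

In practice the same mask would also have to be applied to the label column so that features and labels stay aligned.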

2.: Removal of Features That Can Negatively Impact Model Training

After removing outliers, certain features within the dataset were removed. The dataset includes features that may not be useful for model training or may not be measured in real-world environments. For example, features such as Protocol, Destination Port (Dst Port), and Timestamp may be included in the dataset creation process but may not be measured in actual attack detection environments and can cause the model to overfit. Therefore, these features are removed to enhance the model’s generalization capability [25].
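As a small illustration of this step, the sketch below removes the three features named in the text from a table held as a header plus rows. The text gives these only as examples, so the set `DROPPED_FEATURES` is not necessarily the complete list used in the study, and the remaining column names are hypothetical.

```python
# Features named in the text as examples; the full list used in the study
# is not given, so this set is an assumption.
DROPPED_FEATURES = {"Protocol", "Dst Port", "Timestamp"}

def drop_columns(header, rows, dropped=DROPPED_FEATURES):
    """Return a new header and rows without the columns listed in `dropped`."""
    keep_idx = [i for i, name in enumerate(header) if name not in dropped]
    new_header = [header[i] for i in keep_idx]
    new_rows = [[row[i] for i in keep_idx] for row in rows]
    return new_header, new_rows

# Hypothetical miniature table.
header = ["Dst Port", "Protocol", "Timestamp", "Flow Duration", "Label"]
rows = [
    [80, 6, "2018-02-14 08:31:01", 112641, "Benign"],
    [443, 6, "2018-02-14 08:33:50", 3761843, "DoS"],
]
new_header, new_rows = drop_columns(header, rows)
```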

3.: Rare Class Removal

Next, after eliminating irrelevant features, attack types with extremely small sample sizes were excluded to address data imbalance issues. Certain attack categories containing minimal instances in the dataset tend to function as noise during training [26]. Due to the insufficient representative samples for these rare classes, the model struggles to learn their distinguishing characteristics, which consequently impairs its generalization capability when encountering similar attack types in operational scenarios. Moreover, retaining these underrepresented classes introduces superfluous noise and exacerbates the imbalance in sample distribution, further hindering the learning process by compromising the model’s ability to discern nuanced patterns within the dataset [27]. Therefore, to mitigate these challenges and focus on transfer learning performance, the following attack types were removed from the CIC-IDS2017 dataset: Bot, Web Attack Brute Force, Web Attack XSS, Infiltration, Web Attack SQL Injection, and Heartbleed. Similarly, BruteForce-Web, BruteForce-XSS, and SQL Injection were excluded from the CIC-IDS2018 dataset.
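The filtering described here amounts to dropping every record whose label belongs to a fixed set. The following sketch uses the class names as they are written in the text; the exact label strings in the raw dataset files may differ, and the helper `drop_rare_classes` and the toy records are hypothetical.

```python
# Class names as listed in the text; the raw CSV label strings may differ.
RARE_CIC_IDS2017 = {
    "Bot", "Web Attack Brute Force", "Web Attack XSS",
    "Infiltration", "Web Attack SQL Injection", "Heartbleed",
}
RARE_CIC_IDS2018 = {"BruteForce-Web", "BruteForce-XSS", "SQL Injection"}

def drop_rare_classes(records, rare_labels):
    """Keep only (features, label) pairs whose label is not in `rare_labels`."""
    return [(x, y) for x, y in records if y not in rare_labels]

# Hypothetical records: two of the four carry a rare label.
records = [
    ([0.1, 0.2], "BENIGN"),
    ([0.3, 0.4], "Heartbleed"),
    ([0.5, 0.6], "DoS_Hulk"),
    ([0.7, 0.8], "Bot"),
]
kept = drop_rare_classes(records, RARE_CIC_IDS2017)
```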

4.: Application of Sampling Techniques to Mitigate Data Imbalance

Following the removal of rare attack types, undersampling techniques were applied to adjust the number of samples per class, mitigating data imbalance across attack types [28]. In multi-class classification models, an imbalanced number of samples across classes can negatively impact overall model training, and this imbalance can be mitigated by undersampling the majority-class data [29]. In this process, the data ratio for each attack type was adjusted to 1:1 relative to the minority classes, preventing overfitting to majority-class attack types.
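One plausible reading of the 1:1 adjustment is random undersampling of every class down to the size of the smallest class. The sketch below implements that reading with NumPy; the function name, the seed, and the toy label counts are assumptions, and the study may have used a different target size per class.

```python
import numpy as np

def undersample_to_minority(labels, seed=0):
    """Return sorted row indices so that every class keeps as many samples
    as the smallest class (random selection without replacement)."""
    labels = np.asarray(labels)
    rng = np.random.default_rng(seed)
    classes, counts = np.unique(labels, return_counts=True)
    n_min = counts.min()
    picked = [
        rng.choice(np.flatnonzero(labels == c), size=n_min, replace=False)
        for c in classes
    ]
    return np.sort(np.concatenate(picked))

# Hypothetical imbalanced label vector: 10 / 4 / 3 samples per class.
labels = ["Benign"] * 10 + ["DoS"] * 4 + ["DDoS"] * 3
balanced_idx = undersample_to_minority(labels)
```

The returned indices can be used to select the matching rows from both the feature matrix and the label vector.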

5.: Integration of Similar Attack Types

As some attack types share similar characteristics, they were grouped into the same category, as outlined in [30]. Attack types that were not removed but had relatively fewer data samples were merged based on their similarities. For example, in the CIC-IDS2017 dataset, DoS_Hulk, DoS_GoldenEye, DoS_slowloris, and DoS_Slowhttptest were integrated into “DoS”, and in the CIC-IDS2018 dataset, each DoS attack type was combined into “DoS”, while DDoS attack-HOIC, DDoS attacks-LOIC-HTTP, and DDoS attack-LOIC-UDP were merged into “DDoS”. This process can help mitigate the data imbalance, ensuring the model learns effectively from minority-class attack types.
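The merging rule can be expressed as a simple label mapping. The sketch below uses only the class names quoted in the text; the individual CIC-IDS2018 DoS labels are not listed there and are therefore left out, and the helper name `merge_labels` is hypothetical.

```python
# Mapping built from the class names quoted in the text. The individual
# CIC-IDS2018 DoS labels are not named there, so they are not included.
MERGE_MAP = {
    "DoS_Hulk": "DoS",
    "DoS_GoldenEye": "DoS",
    "DoS_slowloris": "DoS",
    "DoS_Slowhttptest": "DoS",
    "DDoS attack-HOIC": "DDoS",
    "DDoS attacks-LOIC-HTTP": "DDoS",
    "DDoS attack-LOIC-UDP": "DDoS",
}

def merge_labels(labels, mapping=MERGE_MAP):
    """Replace each label by its merged category; unknown labels pass through."""
    return [mapping.get(label, label) for label in labels]

merged = merge_labels(["DoS_Hulk", "Benign", "DDoS attack-HOIC", "DoS_slowloris"])
```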

6.: Data Standardization (Standard Scaling)

When features have different units, certain features may disproportionately influence model training. To prevent this, the standard scaling technique was applied [31]. Standard scaling transforms each feature to exhibit a mean (μ) of 0 and a standard deviation (σ) of 1, following Equation (1).

x' = \frac{x - \mu}{\sigma}, \qquad (1)

where x' is the standardized value, x is the original data, μ is the mean of the data, and σ is the standard deviation. Through this transformation, all features are adjusted to a comparable scale, allowing the model to learn from the data more effectively.
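Equation (1) applied column by column can be sketched as follows. The guard for constant columns (σ = 0) is an assumption added so the example never divides by zero; the text does not say how such columns were handled, and the sample matrix is hypothetical.

```python
import numpy as np

def standard_scale(x: np.ndarray) -> np.ndarray:
    """Apply x' = (x - mu) / sigma to every column of x."""
    mu = x.mean(axis=0)
    sigma = x.std(axis=0)
    # Assumption: constant columns (sigma == 0) are mapped to 0 instead of NaN.
    sigma_safe = np.where(sigma == 0, 1.0, sigma)
    return (x - mu) / sigma_safe

# Two features on very different scales (hypothetical values).
x = np.array([[1.0, 1000.0],
              [2.0, 2000.0],
              [3.0, 3000.0]])
x_scaled = standard_scale(x)
```

When a model is evaluated on held-out data, μ and σ are normally estimated on the training split only and reused for the test split.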

Thus, as illustrated in Figure 1, the data preprocessing process in this study comprises six sequential stages: removal of outliers, elimination of unnecessary features, removal of rare classes, application of sampling techniques, integration of similar classes, and standardization of data. This preprocessing mitigated the inherent data imbalance problem in network intrusion datasets, allowing the model to focus more on the transfer learning perspective and detect new attack types more accurately.

4. Proposed Method

The network intrusion detection model proposed in this study implements a three-stage learning process for network attack type classification, utilizing advanced transfer learning-based feature fusion techniques. The model is designed to achieve high detection performance across both source and target domains by integrating a one-dimensional CNN architecture with a multi-head self-attention mechanism [32]. Thus, the core innovation of the proposed model lies in its capacity to enhance the generalization performance through strategic feature fusion and transfer learning.

4.1. Overview of the Proposed Model

This study proposes a transfer learning-based feature fusion model that effectively transfers learned knowledge representations between domains to improve intrusion detection performance. Transfer learning enhances model generalization by adapting knowledge acquired from a source domain to a target domain [33]. Feature fusion, in parallel, integrates features from multiple sources or domains into a unified representation, enabling the model to capture complementary patterns for enhanced classification performance. The proposed design aims to mitigate the challenges posed by evolving threat patterns in real-world environments by employing a transfer learning-based feature fusion strategy. This approach enables the model to effectively capture both general anomaly patterns and fine-grained characteristics of specific attack types.

The proposed model consists of three main stages to achieve this purpose.

(1): First, binary and multi-class classification models are constructed by training on the CIC-IDS2017 dataset in the source domain.
(2): Subsequently, these models are transferred to the target domain using the CIC-IDS2018 dataset and retrained to adapt to updated traffic patterns and newly emerging attack types.
(3): Finally, the feature representations extracted from these retrained models are systematically fused, and final fine-tuning [34] is performed utilizing the CIC-IDS2018 dataset to execute a multi-class classification of network attack types.

Figure 2 illustrates the training methodology of the proposed model. The training process begins in the source domain (CIC-IDS2017), where two separate CNN-based models are trained in parallel.

(1): Source Model #1 performs binary classification (normal network traffic and attack traffic) to detect the presence of malicious traffic.
(2): Source Model #2 handles multi-class classification to further categorize attack traffic into specific attack types.

Upon completion of training, knowledge representations from both Source Model #1 and Source Model #2 are systematically transferred to the target domain through transfer learning mechanisms.

Subsequently, in the target domain, both Source Model #1 and Source Model #2 undergo retraining using the CIC-IDS2018 dataset, incorporating contemporary network intrusion data. This adaptive retraining process optimizes the models to accommodate the evolving network environment and emerging attack vectors in the target domain, thereby minimizing performance degradation resulting from domain disparities.

In the final phase, features extracted from the retrained Source Model #1 and Source Model #2 are strategically fused at the high-level feature layer, where semantic representations from both the binary and multi-class pathways are aggregated. This combined representation captures both general threat patterns and specific attack signatures, enhancing the model’s ability to distinguish between various types of intrusions. A subsequent fine-tuning process further refines the model for multi-class classification across six distinct categories (normal traffic plus five attack types), including identifying novel attack patterns absent in the source domain.

By synthesizing information from both detection and classification pathways, the proposed architecture achieves high precision in multi-class intrusion detection tasks. This strategy yields superior detection performance and enhanced generalization capabilities in diverse network environments compared to conventional single-model methodologies [35].

4.2. Detailed Description of the Proposed Model Structure

The feature fusion process integrates the complementary advantages of the binary classification model (Source Model #1) and the multi-class classification model (Source Model #2), both trained using transfer learning techniques in the source and target domains. Through this process, binary and multi-class classification features extracted from both models are fused, ultimately aiming for multi-class classification, including newly emerging attack types within the CIC-IDS2018 dataset.

Figure 3 illustrates the detailed structure of the binary and multi-class classification models in the source domain. Source Model #1 is designed as a binary classification model for network intrusion detection, while Source Model #2 is a multi-class classification model for identifying various attack types. Both models share an identical network structure except for the classifier.

Each model combines a one-dimensional convolutional neural network (1D-CNN)-based feature extraction module with a multi-head self-attention mechanism to learn patterns in network traffic data effectively. First, the 1D-CNN block sequentially extracts key features from traffic data. Then, the multi-head self-attention block is applied to assign weights to important features, enhancing learning performance.

Both models are trained using the CIC-IDS2017 dataset. The input data undergo the preprocessing steps shown in Figure 1 and are transformed into a two-dimensional array of shape (76, 1). These input data are defined as shown in Equation (2).

X = [x_{1}, x_{2}, \dots, x_{76}], \quad x_{t} \in \mathbb{R}^{1}, \qquad (2)

where x_{t} represents the t-th feature value.

At the input stage of the model, the data passes through two 1D-CNN blocks, each consisting of a Conv1D layer and a MaxPooling layer. This process extracts key data features while reducing dimensionality and improving computational efficiency. The convolution and max-pooling operations performed in each 1D-CNN block l (l = 1, 2) are defined by Equations (3) and (4), respectively.

h_{i}^{(l)} = \mathrm{ReLU}\left(\sum_{j=0}^{k-1} W_{j}^{(l)} h_{i+j}^{(l-1)} + b^{(l)}\right), \qquad (3)

where ReLU maps negative inputs to zero and passes positive inputs unchanged, that is, \mathrm{ReLU}(x) = \max(0, x). W represents the weights of the convolution filter, k is the filter length, h represents the feature values of the previous layer, and b represents the bias.

h_{j}^{(l)}(\mathrm{pool}) = \max\left(h_{2j-1}^{(l)}, h_{2j}^{(l)}\right). \qquad (4)
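Equations (3) and (4) can be checked on a short sequence with a single-channel NumPy sketch. The filter values, bias, and input below are hypothetical, and the sketch assumes one filter with no padding and a pooling window of two; the actual layers use multiple filters whose sizes are not stated in this section.

```python
import numpy as np

def conv1d_relu(h_prev: np.ndarray, w: np.ndarray, b: float) -> np.ndarray:
    """Equation (3) for one filter: h_i = ReLU(sum_j w_j * h_prev[i + j] + b)."""
    k = len(w)
    pre_activation = np.array(
        [np.dot(w, h_prev[i:i + k]) + b for i in range(len(h_prev) - k + 1)]
    )
    return np.maximum(pre_activation, 0.0)

def max_pool_pairs(h: np.ndarray) -> np.ndarray:
    """Equation (4): keep the larger value of each consecutive pair."""
    n_pairs = len(h) // 2
    return h[:2 * n_pairs].reshape(n_pairs, 2).max(axis=1)

h_prev = np.array([1.0, -2.0, 3.0, 0.0, 2.0, -1.0])   # hypothetical input
w = np.array([1.0, 0.0, -1.0])                         # hypothetical filter, k = 3
conv_out = conv1d_relu(h_prev, w, b=0.5)               # [0.0, 0.0, 1.5, 1.5]
pooled = max_pool_pairs(conv_out)                      # [0.0, 1.5]
```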

The features extracted through the additional Conv1D layer are then input to the multi-head attention (MA) block. This process assigns higher weights to important patterns in the input data, facilitating more effective learning. The attention score used in multi-head self-attention is calculated from Equations (5)–(7) [36].

Q = Z W_{Q}, \quad K = Z W_{K}, \quad V = Z W_{V}. \qquad (5)

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V. \qquad (6)

\mathrm{AttentionScore} = \frac{Q K^{T}}{\sqrt{d_{k}}}. \qquad (7)

Here, Equation (5) generates the Query (Q), Key (K), and Value (V) matrices by multiplying the input Z by the corresponding weight matrices W_{Q}, W_{K}, and W_{V}. Equation (6) defines the Attention function: the dot product of Q and K is scaled by dividing it by the square root of the Key vector dimension d_{k}, and the result is passed through the Softmax function, which yields weights between 0 and 1 that sum to 1, so that important features receive higher attention. Equation (7) gives the Attention Score, that is, the scaled dot product of Q and K before the Softmax function is applied.
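A single attention head following Equations (5)–(7) can be sketched in NumPy as shown below. The dimensions (5 positions, model width 8, d_k = 4) and the random weights are assumptions chosen only to make the example runnable; they are not the values used in the proposed model.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = x - x.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def self_attention(z, w_q, w_k, w_v):
    """Return (output, attention weights, raw scores) for one head."""
    q, k, v = z @ w_q, z @ w_k, z @ w_v            # Equation (5)
    scores = q @ k.T / np.sqrt(k.shape[-1])        # Equation (7)
    weights = softmax(scores)                      # softmax in Equation (6)
    return weights @ v, weights, scores

rng = np.random.default_rng(0)
z = rng.normal(size=(5, 8))                        # hypothetical input Z
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
attn_out, attn_weights, attn_scores = self_attention(z, w_q, w_k, w_v)
```

Each row of `attn_weights` is a probability distribution over the input positions, which is what lets the block emphasize some features over others.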

Finally, multi-head attention is defined as a process where multiple independent attention mechanisms are executed in parallel. The outputs from each attention mechanism are then concatenated, followed by multiplication with the weight matrix W_{O} to generate the final output. This can be expressed in Equation (8) as follows:

\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_{1}, \dots, \mathrm{head}_{h}) W_{O}, \qquad (8)

where h represents the number of heads, and W_{O} represents the weight matrix applied to the combined heads.
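Equation (8) can be illustrated by running several heads and projecting their concatenation. The sketch below assumes two heads, a model width of 8, and random weights; these values and the helper names are illustrative only and do not reflect the configuration of the proposed model.

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    shifted = x - x.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

def attention_head(z, w_q, w_k, w_v):
    """One scaled dot-product attention head."""
    q, k, v = z @ w_q, z @ w_k, z @ w_v
    return softmax(q @ k.T / np.sqrt(k.shape[-1])) @ v

def multi_head_attention(z, head_weights, w_o):
    """Equation (8): concatenate all head outputs, then multiply by W_O."""
    heads = [attention_head(z, *weights) for weights in head_weights]
    return np.concatenate(heads, axis=-1) @ w_o

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 5, 8, 2                # hypothetical sizes
d_k = d_model // n_heads
z = rng.normal(size=(seq_len, d_model))
head_weights = [
    tuple(rng.normal(size=(d_model, d_k)) for _ in range(3))
    for _ in range(n_heads)
]
w_o = rng.normal(size=(n_heads * d_k, d_model))
mha_out = multi_head_attention(z, head_weights, w_o)
```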

Next, the Flatten layer converts the 2D feature map into a 1D vector, which is then processed through Dense and Dropout layers to reconstruct the features into a format suitable for classification. Finally, Source Model #1 performs binary classification, determining normal and attack status using the Sigmoid function, and Source Model #2 classifies six attack types through the Softmax function. Both models therefore share the same network architecture, and their roles differ only in the activation function used in the final output layer, which enables binary and multi-class classification, respectively.

Figure 4 illustrates the detailed structure of the binary and multi-class classification models in the target domain. In the target domain, the two models pre-trained in the source domain, Source Model #1 and Source Model #2, are retrained using the CIC-IDS2018 dataset through transfer learning techniques. During this process, the architecture and parameters of each model remain identical to those of Source Model #1 and Source Model #2 in the source domain, except for the classifier.

As the model structure remains the same in the target domain, the training process follows a similar approach to that of the source domain, allowing the models to learn the features of the CIC-IDS2018 dataset. Source Model #1 performs binary classification, determining whether the traffic is normal or an attack. Source Model #2 conducts multi-class classification for six attack types, including previously learned attack types from the source domain and newly emerging attack types in the target domain. Finally, after feature fusion using the retrained Source Model #1 and Source Model #2 in the target domain, the final classification process is carried out for six attack types in the CIC-IDS2018 dataset.

Figure 5 illustrates the overall structure of the proposed network intrusion detection model, incorporating transfer learning and feature fusion. The upper section of Figure 5 summarizes the overall model training process, while the lower section illustrates the detailed training procedure, including the feature fusion process. Both sections visually represent the same overall workflow.

During the model training process, once training in the source domain is completed, Source Model #1 and Source Model #2 are transferred to the target domain through transfer learning. Here, #1 represents the transfer learning process of Source Model #1, while #2 represents that of Source Model #2.

Next, in the target domain, the previously trained binary classification model (Source Model #1) and multi-class classification model (Source Model #2) are retrained. During this phase, the classifier layers are excluded, and only the remaining model structure is used for feature extraction. Once both models complete training, the trained models are saved.

At this stage, the extracted features from both models are structured as (1, 64) each, and these features are concatenated to form fused features in the shape of (1, 128).

Finally, the fused features are processed through a Softmax classifier, enabling multi-class classification into six attack types. In this feature fusion process, the features output from Source Model #1 and Source Model #2 are combined, creating a richer feature map that integrates more information than single-model training. Additionally, this transfer learning-based feature fusion approach helps reduce the bias that can occur in single-model training and improves generalization performance even in small-scale data environments. This method enhances detection performance for new attack types and enables the model to adapt more effectively to changing network environments. Table 1 presents the final structure of the proposed model.
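The fusion step described above, in which two 64-dimensional feature vectors are concatenated into a 128-dimensional vector and passed to a six-way Softmax classifier, can be sketched as follows. This is a minimal illustration under stated assumptions: the feature values and classifier weights are random placeholders rather than trained parameters, and the function names `fuse` and `classify` are hypothetical.

```python
import math
import random

def softmax(logits):
    """Numerically stable softmax."""
    peak = max(logits)
    exps = [math.exp(v - peak) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def fuse(features_1, features_2):
    """Concatenate the two branch outputs: 64 + 64 -> 128 values."""
    return list(features_1) + list(features_2)

def classify(fused, weights, bias):
    """Dense layer (one weight vector per class) followed by Softmax."""
    logits = [sum(f * w for f, w in zip(fused, row)) + b
              for row, b in zip(weights, bias)]
    return softmax(logits)

rng = random.Random(0)  # placeholder values only, not trained parameters
features_1 = [rng.random() for _ in range(64)]   # stands in for the binary branch output
features_2 = [rng.random() for _ in range(64)]   # stands in for the multi-class branch output
weights = [[rng.uniform(-0.1, 0.1) for _ in range(128)] for _ in range(6)]
bias = [0.0] * 6

fused = fuse(features_1, features_2)
probabilities = classify(fused, weights, bias)
predicted_class = probabilities.index(max(probabilities))
print(len(fused), len(probabilities), predicted_class)
```

Concatenation keeps both feature sets intact, so the classifier can weight the binary-branch and multi-class-branch features independently.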

5. Results

This section presents the training parameters, the performance evaluation metrics, and a comparative performance analysis of the model proposed in Figure 5 against various existing models.

5.1. Experimental Environment and Parameter Settings

The datasets used in the experiments are CIC-IDS2017 and CIC-IDS2018. To verify the effectiveness of transfer learning in the target domain, where attack-type data are sparser than in the source domain, 1000 samples were randomly selected from each dataset for the experiment. Subsequently, the data were divided into training data and test data at a 3:1 ratio, and in the target domain, the class ratio was maintained at 1:1. Table 2 shows the main training parameters used in the binary and multi-class classification model training experiments on the source domain and target domain datasets. In particular, key training parameters, such as the learning rate, number of epochs, and batch size, were optimized systematically through validation experiments. This process employed a grid search technique using the Keras Tuner API [37], allowing for the efficient exploration of various hyperparameter combinations across the source and target domains. The configuration that yielded the highest performance was selected as the final setting.
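A 3:1 train/test split that preserves a 1:1 class ratio can be sketched as below. The text does not state whether the split was stratified per class, so the per-class splitting here is an assumption, as are the function name `stratified_split`, the fixed seed, and the round figure of 1000 samples per class (Table 3 lists the actual per-class counts).

```python
import random
from collections import defaultdict

def stratified_split(labels, train_fraction=0.75, seed=0):
    """Split sample indices 3:1 within each class so class ratios are preserved."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for index, label in enumerate(labels):
        by_class[label].append(index)
    train, test = [], []
    for label in sorted(by_class):
        indices = by_class[label]
        rng.shuffle(indices)
        cut = round(len(indices) * train_fraction)
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return train, test

# Illustrative labels: six target-domain classes with 1000 samples each (assumed).
classes = ["Benign", "DoS", "DDoS", "FTP-Patator", "SSH-Patator", "Bot"]
labels = [name for name in classes for _ in range(1000)]

train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```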

Table 3 presents the class types and distributions in both domains. In the target domain, data were evenly sampled to maintain a 1:1 class ratio across all classes. Meanwhile, the source domain includes the PortScan attack type, which is absent in the target domain, whereas the target domain introduces the Bot attack type, which is not present in the source domain.

5.2. Analysis of Results for the Proposed Method

Accuracy and F1-score are used to evaluate the performance of the proposed model. Accuracy is defined as shown in Equation (9).

$$\mathrm{Accuracy} = \frac{TP + TN}{TP + TN + FP + FN}, \tag{9}$$

where True Positive (TP) refers to the number of actual positive samples correctly classified as positive, and True Negative (TN) represents the number of actual negative samples correctly classified as negative. Conversely, False Positive (FP) indicates the number of actual negative samples incorrectly classified as positive, and False Negative (FN) denotes the number of actual positive samples incorrectly classified as negative. While this accuracy metric is useful for intuitively evaluating the model’s overall performance, its reliability may decrease if there is a class imbalance problem. Therefore, this study uses the F1-score as a supplementary performance metric. The F1-score, defined in Equation (12), is the harmonic mean of Precision and Recall.

$$\mathrm{Precision} = \frac{TP}{TP + FP}. \tag{10}$$

$$\mathrm{Recall} = \frac{TP}{TP + FN}. \tag{11}$$

$$\mathrm{F1\text{-}score} = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}. \tag{12}$$

Here, Precision in Equation (10) represents the proportion of actual positive samples among those predicted as positive by the model, while Recall in Equation (11) denotes the proportion of correctly predicted positive samples among all actual positive samples. The F1-score, derived from these values, serves as an effective metric for assessing the model’s overall performance, even in class imbalance scenarios.
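Equations (9) through (12) translate directly into code. The sketch below computes all four metrics from confusion counts for a single positive class; the counts are made-up illustrative values, and the function name `binary_metrics` is hypothetical. How the per-class scores are averaged in the multi-class setting is not specified in the text, so no averaging is shown.

```python
def binary_metrics(tp, tn, fp, fn):
    """Return (accuracy, precision, recall, f1) following Equations (9)-(12)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return accuracy, precision, recall, f1

# Illustrative counts only (not taken from the experiments).
accuracy, precision, recall, f1 = binary_metrics(tp=90, tn=80, fp=20, fn=10)
print(f"accuracy={accuracy:.4f} precision={precision:.4f} recall={recall:.4f} f1={f1:.4f}")
```

With these counts, accuracy is 170/200 and the F1-score equals 2TP/(2TP + FP + FN) = 180/210, which shows how F1 ignores true negatives and therefore reacts more strongly to errors on the positive class.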

Figure 6 visually represents the loss and accuracy trends over epochs during the training process of the proposed model. This study applied the Early Stopping technique to prevent unnecessary training while maintaining optimal performance. As a result, training continued up to the 16th epoch and was terminated early as no further performance improvement was observed. Figure 6a illustrates the changes in training and test loss over epochs, showing a gradual decrease and convergence of loss values as the model training progresses, indicating the model’s gradual optimization process. Figure 6b presents the changes in training accuracy and test accuracy over epochs. As training progressed, accuracy steadily increased and eventually stabilized at a certain level. This observation confirms that the model is effectively learning from the training data while maintaining stable performance without signs of overfitting.
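The Early Stopping rule mentioned above can be illustrated with a patience-based loop. The monitored quantity, the patience value, and the loss sequence below are all assumptions for the sake of the example; the text only states that training stopped at the 16th epoch of a 25-epoch budget.

```python
def early_stopping_epoch(monitored_losses, patience=3, min_delta=0.0):
    """Return the 1-based epoch at which training would stop.

    Training stops once the monitored loss has failed to improve by more
    than min_delta for `patience` consecutive epochs.
    """
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(monitored_losses, start=1):
        if loss < best - min_delta:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    return len(monitored_losses)

# Hypothetical loss curve: improves for three epochs, then plateaus.
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73, 0.5]
stop_epoch = early_stopping_epoch(losses, patience=3)
print(stop_epoch)
```

A larger patience tolerates longer plateaus at the cost of extra epochs, so the value is usually tuned together with the learning rate.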

For performance comparison with the proposed model, existing machine learning models such as k-nearest neighbor (KNN) [38], logistic regression (LR) [39], and support vector machine (SVM) [40] were used, as shown in Table 4. KNN is an algorithm that makes predictions and classifications based on the k closest neighbors of a data point. LR linearly combines input data, applies a Sigmoid function to convert the result into probability values, and classifies labels accordingly. SVM is a model that classifies data by finding the optimal hyperplane that maximizes the margin. Before evaluating models with transfer learning and feature fusion applied, a plain CNN without these techniques was evaluated first, followed by variants that add Attention and then transfer learning. Here, NoTL denotes a model trained without transfer learning. Additionally, to evaluate the performance improvement attributable to the feature fusion technique, the proposed model was compared with a CNN model that employed transfer learning from the source domain to the target domain without applying feature fusion (CNN + Attention (TL)). The proposed model in this study is designed with an architecture that includes CNN layers, attention layers, and a feature fusion process.

The proposed model achieved an accuracy of 94.21% and an F1-score of 94.16%, the highest among all models compared. Among traditional machine learning models, KNN exhibited the highest performance, with 94.12% accuracy and a 94.05% F1-score, followed by SVM with 92.06% accuracy and a 91.84% F1-score, and LR with 90.86% accuracy and a 90.77% F1-score. Among deep learning models, the CNN (NoTL) model recorded 92.70% accuracy and a 92.63% F1-score. The CNN + Attention (NoTL) model, incorporating an attention mechanism, improved on this with 93.71% accuracy and a 93.64% F1-score, and the CNN + Attention (TL) model, which adopted transfer learning, improved further to 93.96% accuracy and a 93.90% F1-score. Adding feature fusion in the proposed model yielded a further gain of 0.25 percentage points in accuracy over CNN + Attention (TL), although the margin over KNN is narrow (0.09 percentage points in accuracy). These results suggest that model architecture optimization and learning strategy improvements enhance classification performance.

5.3. Comprehensive Performance Analysis of the Proposed Method

Table 5 and Table 6 present the results of experiments conducted to verify the impact of transfer learning in target domains with limited data. To analyze this effect, the accuracy and F1-score of each model were compared as the training data ratio was progressively reduced from 100% to 66% and 33%.

According to the experimental results in Table 5, the proposed model consistently maintains the highest accuracy across all data ratios, and its lead over the second-best model widens as the available data decreases. When utilizing 100% of the training data, the proposed model achieved 94.21% accuracy, outperforming all comparison models. At this stage, KNN recorded the second-highest accuracy at 94.12%, while CNN-based models also maintained relatively high accuracy. As the training data ratio decreased to 66%, all comparison models experienced a decline in accuracy. The proposed model remained the most accurate at 94.10%, while the CNN + Attention (TL) model recorded 93.77% (Table 5), the second-highest accuracy. When the training data ratio was further reduced to 33%, all models exhibited a decrease in accuracy due to the significant reduction in data. The proposed model dropped by only 0.55 percentage points relative to the full training set (94.21% to 93.66%), the smallest degradation among the CNN-based models; only LR, whose accuracy is more than three percentage points lower, degraded less (0.45 percentage points). By comparison, the CNN + Attention (NoTL) model dropped to 92.43% and the CNN (NoTL) model to 91.38%.

These results indicate that the proposed model maintains a high level of generalization performance even in limited data environments. The proposed model is more robust to data reduction than KNN, SVM, and the existing CNN- and attention-based models, which suggests more effective feature learning; LR degrades slightly less in Table 5 but starts from a much lower accuracy.
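The degradation of each model can be read directly off Table 5 by subtracting the accuracy at 33% of the training data from the accuracy at 100%. The short script below performs this arithmetic on the values reported in Table 5; the variable and function names are illustrative.

```python
# Accuracy (%) at 100%, 66%, and 33% of the training data, copied from Table 5.
TABLE_5_ACCURACY = {
    "KNN": (94.12, 93.67, 93.02),
    "LR": (90.86, 90.74, 90.41),
    "SVM": (92.06, 91.82, 91.38),
    "CNN (NoTL)": (92.70, 92.40, 91.38),
    "CNN + Attention (NoTL)": (93.71, 93.32, 92.43),
    "CNN + Attention (TL)": (93.96, 93.77, 93.08),
    "Proposed model": (94.21, 94.10, 93.66),
}

def accuracy_drop(scores):
    """Percentage-point drop from 100% to 33% of the training data."""
    full, _, third = scores
    return round(full - third, 2)

drops = {model: accuracy_drop(scores) for model, scores in TABLE_5_ACCURACY.items()}
for model, drop in sorted(drops.items(), key=lambda item: item[1]):
    print(f"{model}: {drop:.2f}")
```

Listing the drops next to the absolute accuracies makes it easier to separate robustness to data reduction from overall accuracy when comparing models.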

Next, the F1-scores in Table 6 across the different training data ratios (100%, 66%, 33%) show the same pattern as the accuracy results in Table 5: the proposed model consistently achieved the highest F1-score across all data ratios. Additionally, as the training data decreased, the performance gap between models tended to increase. With 100% of the training data, the proposed model recorded the highest F1-score of 94.16%. KNN followed with 94.05%, while the CNN + Attention (TL) model maintained competitive performance with an F1-score of 93.90%. When the training data ratio decreased to 66%, all comparison models experienced performance degradation, and the proposed model showed the smallest decrease, with an F1-score of 94.05%. Compared to the CNN + Attention (NoTL) model, which recorded 93.27%, and the CNN (NoTL) model, which dropped to 92.33%, the proposed model demonstrated higher robustness to data reduction. When the training data ratio was reduced to 33%, the F1-score of all models decreased; however, the proposed model maintained the highest F1-score at 93.59%. Notably, the CNN + Attention (TL) model decreased to 92.99%, and the CNN + Attention (NoTL) model declined further to 92.33%, indicating that omitting feature fusion, and omitting transfer learning in addition, leads to larger performance degradation. Conversely, the proposed model exhibited relatively stable performance.

A comprehensive analysis of the experimental results from Table 5 and Table 6 indicates that the proposed model maintains high classification performance even as the training data decreases. This result demonstrates that the proposed approach can minimize performance degradation even in situations with limited data availability by effectively leveraging the feature representations of pre-trained models. Additionally, it confirms that the model can sustain high generalization performance even in new data environments, such as previously unseen attack types.

Notably, compared to traditional CNN-based models, the proposed model significantly improves robustness to data reduction by effectively adjusting initial weight configurations through transfer learning and optimizing information from various feature maps through feature fusion.

Figure 7 presents confusion matrices that visualize each model’s prediction results for each attack type, providing an intuitive comparison between actual and predicted classifications. In Figure 7a, the CNN (NoTL) model, which does not incorporate transfer learning or feature fusion, demonstrates lower classification performance for several attack types compared to the proposed model in Figure 7b. For the Benign type, the two models are nearly identical: the CNN (NoTL) model correctly classified 250 out of 251 instances, while the proposed model classified 249 correctly. For the DDoS type, the CNN (NoTL) model correctly predicted 243 out of 250 instances, whereas the proposed model improved on this by correctly predicting 249 instances. For the DoS type, the CNN (NoTL) model accurately classified 189 instances, while the proposed model correctly classified 191 instances.

Meanwhile, both models performed equally in classifying the FTP-Patator and SSH-Patator attack types, accurately predicting 235 out of 251 and 251 out of 251 instances, respectively. However, for the Bot attack type, which is newly introduced in the target domain and is highlighted with a red circle in Figure 7, the CNN (NoTL) model correctly classified 237 out of 251 instances, whereas the proposed model correctly classified 250. These results provide experimental evidence that transfer learning and feature fusion improve the classification of new attack types.
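The per-class counts quoted above can be converted into per-class recall (correct predictions divided by actual instances) for a more direct comparison. The sketch below uses only the classes for which the text states both the number of correct predictions and the class total; the dictionary layout and function name are illustrative.

```python
# (total instances, correct by CNN (NoTL), correct by proposed model), as quoted in the text.
REPORTED_COUNTS = {
    "Benign": (251, 250, 249),
    "DDoS": (250, 243, 249),
    "Bot": (251, 237, 250),
}

def recall_percent(correct, total):
    """Per-class recall in percent: correct predictions over actual instances."""
    return 100.0 * correct / total

per_class = {}
for attack, (total, cnn_correct, proposed_correct) in REPORTED_COUNTS.items():
    per_class[attack] = (
        round(recall_percent(cnn_correct, total), 2),
        round(recall_percent(proposed_correct, total), 2),
    )
    print(attack, per_class[attack])
```

For the Bot class, this corresponds to roughly 94.4% recall for the CNN (NoTL) model against roughly 99.6% for the proposed model.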

6. Conclusions

This study proposed a multi-class deep learning model that applies transfer learning-based feature fusion for network attack detection. To address the limitations of existing intrusion detection systems, which struggle to predict new attack types effectively, this study mitigated the data shortage problem through transfer learning and enhanced detection accuracy for new attack types by incorporating feature fusion techniques.

The experimental results demonstrated that, when the amount of training data was reduced, the proposed model retained the highest accuracy and F1-score and exhibited the smallest performance degradation among the CNN-based comparison models, owing to the transfer learning process between the source and target domains. Additionally, it was confirmed that combining the binary and multi-class classification models through feature fusion improved performance compared to existing methods. The study also indicated that transfer learning for minority-class data mitigated the class imbalance problem, resulting in enhanced classification performance. In terms of overall performance, the proposed model surpassed all comparison models, including traditional machine learning techniques, achieving 94.21% accuracy and a 94.16% F1-score. Therefore, this study demonstrated the potential for more effectively detecting newly emerging attack types, and thereby strengthening cybersecurity defenses, by introducing a network intrusion detection model that fuses features extracted through transfer learning.

Meanwhile, training and prediction on the attack types removed during the experiments remain challenging. To address this, future research will explore techniques for augmenting rare classes to mitigate the data imbalance problem. Specifically, future work will focus on applying data augmentation techniques, such as oversampling, generative adversarial network-based synthesis, and diffusion model-based synthesis, to generate additional samples for underrepresented attack types and incorporate them into the model proposed in this study. These efforts are expected to improve the model’s robustness and enhance its effectiveness in responding to the evolving cyberthreat landscape.

Author Contributions

Conceptualization, S.L., D.R. and J.-H.B.; methodology, S.L., J.-H.B. and J.Y.; software, S.L. and D.R.; validation, J.-H.B., J.L. and D.M.; formal analysis, J.L. and D.M.; investigation, S.L., D.R., J.-H.B. and J.Y.; resources, J.-H.B. and J.Y.; data curation, S.L. and D.R.; writing – original draft preparation, S.L. and D.R.; writing – review and editing, S.L., D.R. and J.-H.B.; visualization, D.M. and J.L.; supervision, J.-H.B., J.Y. and J.L.; project administration, J.-H.B. and J.Y.; funding acquisition, J.-H.B. and J.Y. All authors have read and agreed to the published version of the manuscript.

Funding

This work was supported by an Institute of Information & Communications Technology Planning & Evaluation (IITP) grant funded by the Korea government (MSIT) (RS-2023-00235509, Development of security monitoring technology based network behavior against encrypted cyber threats in ICT convergence environment).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The datasets analyzed in this study, CIC-IDS2017 and CIC-IDS2018, are publicly accessible and can be downloaded from the Canadian Institute for Cybersecurity (CIC) at the University of New Brunswick (UNB). The CIC-IDS2017 dataset is available at https://www.unb.ca/cic/datasets/ids-2017.html, accessed on 22 April 2025, and the CIC-IDS2018 dataset can be found at https://www.unb.ca/cic/datasets/ids-2018.html, accessed on 22 April 2025.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Dutton, W.H.; Peltu, M. Information and Communication Technologies: Visions and Realities; Oxford University Press: New York, NY, USA, 1996; pp. 113–115. ISBN 0198774966.
2. Falowo, O.I.; Ozer, M.; Li, C.; Abdo, J.B. Evolving malware and DDoS attacks: Decadal longitudinal study. IEEE Access 2024, 12, 39221–39237.
3. Schmitt, M. Securing the digital world: Protecting smart infrastructures and digital industries with artificial intelligence (AI)-enabled malware and intrusion detection. J. Ind. Inf. Integr. 2023, 36, 100520.
4. Zhang, C.; Jia, D.; Wang, L.; Wang, W.; Liu, F.; Yang, A. Comparative research on network intrusion detection methods based on machine learning. Comput. Secur. 2022, 121, 102861.
5. Zaman, M.; Lung, C.-H. Evaluation of Machine Learning Techniques for Network Intrusion Detection. In Proceedings of the NOMS 2018 - 2018 IEEE/IFIP Network Operations and Management Symposium, Taipei, Taiwan, 23–27 April 2018; pp. 1–5.
6. Li, J.; Qu, Y.; Chao, F.; Shum, H.P.H.; Ho, E.S.L.; Yang, L. Machine learning algorithms for network intrusion detection. In AI in Cybersecurity; Intelligent Systems Reference Library; Sikos, L., Ed.; Springer: Cham, Switzerland, 2018; pp. 151–179.
7. Ashiku, L.; Dagli, C. Network intrusion detection system using deep learning. Procedia Comput. Sci. 2021, 185, 239–247.
8. Cantone, M.; Marrocco, C.; Bria, A. On the cross-dataset generalization of machine learning for network intrusion detection. arXiv 2024, arXiv:2402.10974.
9. Mehedi, S.T.; Anwar, A.; Rahman, Z.; Ahmed, K. Deep transfer learning based intrusion detection system for electric vehicular networks. Sensors 2021, 21, 4736.
10. Chen, Z.; Simsek, M.; Kantarci, B.; Bagheri, M.; Djukic, P. Machine learning-enabled hybrid intrusion detection system with host data transformation and an advanced two-stage classifier. Comput. Netw. 2024, 250, 110576.
11. Selvam, R.; Velliangiri, S. An Improving Intrusion Detection Model Based on Novel CNN Technique Using Recent CIC-IDS Datasets. In Proceedings of the 2024 International Conference on Distributed Computing and Optimization Techniques (ICDCOT), Bengaluru, India, 15–16 March 2024; pp. 1–6.
12. Gayatri, K.; Premamayudu, B.; Yadav, M.S. A Two-Level Hybrid Intrusion Detection Learning Method. In Machine Intelligence and Soft Computing, Proceedings of the ICMISC 2021, Guntur, India, 22–24 September 2021; Bhattacharyya, D., Thirupathi Rao, N., Eds.; Springer: Singapore, 2021; pp. 241–253.
13. Ma, Y.; Chen, S.; Ermon, S.; Lobell, D.B. Transfer learning in environmental remote sensing. Remote Sens. Environ. 2024, 301, 113924.
14. Lu, H.; Zhao, Y.; Song, Y.; Yang, Y.; He, G.; Yu, H.; Ren, Y. A transfer learning-based intrusion detection system for zero-day attack in communication-based train control system. Clust. Comput. 2024, 27, 8477–8492.
15. Li, X.; Hu, Z.; Xu, M.; Wang, Y.; Ma, J. Transfer learning based intrusion detection scheme for Internet of vehicles. Inf. Sci. 2021, 547, 119–135.
16. Dai, G.; Tian, Z.; Fan, J.; Sunil, C.K.; Dewi, C. DFN-PSAN: Multi-level deep information feature fusion extraction network for interpretable plant disease classification. Comput. Electron. Agric. 2024, 216, 108481.
17. Jabeen, K.; Khan, M.A.; Alhaisoni, M.; Tariq, U.; Zhang, Y.-D.; Hamza, A.; Mickus, A.; Damaševičius, R. Breast cancer classification from ultrasound images using probability-based optimal deep learning feature fusion. Sensors 2022, 22, 807.
18. Li, P.; Wang, Y.; Si, T.; Ullah, K.; Han, W.; Wang, L. MFFSP: Multi-scale feature fusion scene parsing network for landslides detection based on high-resolution satellite images. Eng. Appl. Artif. Intell. 2024, 127, 107337.
19. Li, M.; Chen, Y.; Lu, Z.; Ding, F.; Hu, B. ADED: Method and device for automatically detecting early depression using multimodal physiological signals evoked and perceived via various emotional scenes in virtual reality. IEEE Trans. Instrum. Meas. 2025, 74, 1–16.
20. Han, Y.; Choi, Y.; Lee, J.; Bae, J.-H. Feature fusion model using transfer learning and bidirectional attention mechanism for plant pipeline leak detection. Appl. Sci. 2025, 15, 490.
21. Sharafaldin, I.; Lashkari, A.H.; Ghorbani, A.A. Toward Generating a New Intrusion Detection Dataset and Intrusion Traffic Characterization. In Proceedings of the 4th International Conference on Information Systems Security and Privacy (ICISSP 2018), Funchal–Madeira, Portugal, 22–24 January 2018; Mori, P., Furnell, S., Camp, O., Eds.; Springer: Cham, Switzerland, 2019; pp. 108–116.
22. The CIC-IDS2017 Dataset and CIC-IDS2018 Dataset. Available online: https://www.unb.ca/cic/datasets/index.html (accessed on 22 April 2025).
23. Francazi, E.; Baity-Jesi, M.; Lucchi, A. A theoretical analysis of the learning dynamics under class imbalance. arXiv 2024, arXiv:2207.00391.
24. Michelucci, U.; Venturini, F. New metric formulas that include measurement errors in machine learning for natural sciences. Expert Syst. Appl. 2023, 224, 120013.
25. Ayinde, B.O.; Zurada, J.M. Building efficient ConvNets using redundant feature pruning. arXiv 2018, arXiv:1802.07653.
26. He, H.; Garcia, E.A. Learning from imbalanced data. IEEE Trans. Knowl. Data Eng. 2009, 21, 1263–1284.
27. Mujahid, M.; Kına, E.; Rustam, F.; Villar, M.G.; Alvarado, E.S.; de la Torre Diez, I.; Ashraf, I. Data oversampling and imbalanced datasets: An investigation of performance for machine learning and feature engineering. J. Big Data 2024, 11, 87.
28. Abu Elsoud, E.; Hassan, M.; Alidmat, O.; Al Henawi, E.; Alshdaifat, N.; Igtait, M.; Ghaben, A.; Katrawi, A.; Dmour, M. Under sampling techniques for handling unbalanced data with various imbalance rates: A comparative study. Int. J. Adv. Comput. Sci. Appl. 2024, 15, 1274–1284.
29. Kim, J.-M.; Chung, Y.-J. Clustering based under-sampling for imbalanced data classification. J. Korean Inst. Inf. Technol. 2024, 22, 51–60.
30. Ryu, K.J.; Shin, D.-I.; Shin, D.-G.; Park, J.-C.; Kim, J.-G. A pre-processing study to solve the problem of rare class classification of network traffic data. KIPS Trans. Softw. Data Eng. 2020, 9, 411–418.
31. De Amorim, L.B.V.; Cavalcanti, G.D.C.; Cruz, R.M.O. The choice of scaling technique matters for classification performance. Appl. Soft Comput. 2023, 133, 109924.
32. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention is all you need. arXiv 2023, arXiv:1706.03762.
33. Yao, R.; Zhao, H.; Zhao, Z.; Guo, C.; Deng, W. Parallel convolutional transfer network for bearing fault diagnosis under varying operation states. IEEE Trans. Instrum. Meas. 2024, 73, 1–13.
34. Ravikumar, A.; Harini, S. A comprehensive review of transfer learning on deep convolutional neural network models. Int. J. Adv. Trends Comput. Sci. Eng. 2020, 9, 8272–8278.
35. Mungoli, N. Adaptive ensemble learning: Boosting model performance through intelligent feature fusion in deep neural networks. arXiv 2023, arXiv:2304.02653.
36. Al-Deen, H.S.S.; Zeng, Z.; Al-Sabri, R.; Hekmat, A. An improved model for analyzing textual sentiment based on a deep neural network using multi-head attention mechanism. Appl. Syst. Innov. 2021, 4, 85.
37. Roy, S.; Mehera, R.; Pal, R.K.; Bandyopadhyay, S.K. Hyperparameter optimization for deep neural network models: A comprehensive study on methods and techniques. Innov. Syst. Softw. Eng. 2023.
38. Fix, E.; Hodges, J.L. Discriminatory Analysis. Nonparametric Discrimination: Consistency Properties; USAF School of Aviation Medicine: Dayton, OH, USA, 1951.
39. Cox, D.R. The regression analysis of binary sequences. J. R. Stat. Soc. Ser. B Stat. Methodol. 1958, 20, 215–232.
40. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.

Figure 1. NIDS data preprocessing process for training the proposed model.

Figure 2. Overview of the proposed model training process.

Figure 3. Detailed structure of source domain binary and multi-class classification models.

Figure 4. Detailed structure of binary and multi-class classification models in the target domain.

Figure 5. Transfer learning-based feature fusion deep learning model for network intrusion detection.

Figure 6. Loss and accuracy of the model by epoch: (a) loss by epoch; (b) accuracy by epoch.

Figure 7. Confusion matrices for each attack type: (a) CNN (NoTL) model; (b) proposed model.

Table 1. Structure of final proposed model.

Stage	Output Dimension	Layer
Input	$76 \times 1$	Input Layer
1D CNN	$76 \times 64$	$[1 \times 6, 64]$
MaxPooling1D	$38 \times 64$	Pool size = 2
1D CNN	$38 \times 128$	$[1 \times 6, 128]$
MaxPooling1D	$19 \times 128$	Pool size = 2
1D CNN	$19 \times 256$	$[1 \times 6, 256]$
MultiHeadAttention	$19 \times 256$	Head = 4
Flatten	4864	-
Dense Layer 1	256	FC: $[4864 \to 256]$
Dense Layer 2	64	FC: $[256 \to 64]$
Concatenate	128	-
Classifier	6	FC: $[128 \to 6]$, Softmax
# params	2,598,790
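The output dimensions in Table 1 can be checked with a short shape trace. The sketch assumes that each 1D convolution preserves the sequence length (that is, "same" padding), which is inferred from the unchanged lengths in Table 1 rather than stated explicitly, and that pooling halves the length; the helper names are hypothetical.

```python
def conv1d_same(shape, filters):
    """1D convolution with length-preserving padding (assumed): only channels change."""
    length, _ = shape
    return (length, filters)

def max_pool1d(shape, pool_size=2):
    """Max pooling divides the sequence length by the pool size."""
    length, channels = shape
    return (length // pool_size, channels)

shape = (76, 1)                       # Input
shape = conv1d_same(shape, 64)        # (76, 64)
shape = max_pool1d(shape)             # (38, 64)
shape = conv1d_same(shape, 128)       # (38, 128)
shape = max_pool1d(shape)             # (19, 128)
shape = conv1d_same(shape, 256)       # (19, 256)
attention_shape = shape               # multi-head attention keeps the shape
flattened = attention_shape[0] * attention_shape[1]
branch_features = 64                  # output size of Dense Layer 2 in each branch
fused = 2 * branch_features           # concatenation of the two branches

print(attention_shape, flattened, fused)
```

The flattened size of 19 × 256 = 4864 matches the input of Dense Layer 1, and the two 64-dimensional branch outputs account for the 128-dimensional input of the classifier.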

Table 2. Training parameters for source (CIC-IDS2017) and target domains (CIC-IDS2018).

Parameter	CIC-IDS2017 (Source Domain)	CIC-IDS2018 (Target Domain)
Train data	74,236	4510
Test data	24,746	1503
Optimizer	Adam	Adam
Loss function	Binary, categorical	Binary, categorical
Learning rate	$5 \times 10^{- 4}$	$1 \times 10^{- 4}$
Epoch	25	25
Batch size	16	64

Table 3. Class distribution in source (CIC-IDS2017) and target (CIC-IDS2018) domains.

Class	CIC-IDS2017 (Source Domain)	CIC-IDS2018 (Target Domain)
Benign	21,996	1003
PortScan	21,996	-
DoS	21,996	1001
DDoS	21,996	1001
FTP-Patator	5499	1003
SSH-Patator	5499	1003
Bot	-	1003
Total	98,982	6014

Table 4. Comparison of categorical classification performance among various models.

Model	Accuracy (%)	F1-Score (%)
KNN	94.12	94.05
LR	90.86	90.77
SVM	92.06	91.84
CNN (NoTL)	92.70	92.63
CNN + Attention (NoTL)	93.71	93.64
CNN + Attention (TL)	93.96	93.90
Proposed model	94.21	94.16

Table 5. Accuracy evaluation of models with 100%, 66%, and 33% training data.

Model	Train 100%	Train 66%	Train 33%
KNN	94.12	93.67	93.02
LR	90.86	90.74	90.41
SVM	92.06	91.82	91.38
CNN (NoTL)	92.70	92.40	91.38
CNN + Attention (NoTL)	93.71	93.32	92.43
CNN + Attention (TL)	93.96	93.77	93.08
Proposed model	94.21	94.10	93.66

Table 6. F1-score evaluation of models with 100%, 66%, and 33% training data.

Model	Train 100%	Train 66%	Train 33%
KNN	94.05	93.58	92.92
LR	90.77	90.64	90.29
SVM	91.84	91.59	91.14
CNN (NoTL)	92.63	92.33	91.25
CNN + Attention (NoTL)	93.64	93.27	92.33
CNN + Attention (TL)	93.90	93.70	92.99
Proposed model	94.16	94.05	93.59

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

