Article
Peer-Review Record

Deep Learning-Enabled Heterogeneous Transfer Learning for Improved Network Attack Detection in Internal Networks

Appl. Sci. 2023, 13(21), 12033; https://doi.org/10.3390/app132112033
by Gang Wang, Dong Liu, Chunrui Zhang * and Teng Hu
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 24 September 2023 / Revised: 23 October 2023 / Accepted: 2 November 2023 / Published: 4 November 2023
(This article belongs to the Special Issue New Insights and Perspectives in Cyber and Information Security)

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

1- The details of the developed algorithm are not explained in the study. It should be made clear how the contributions of this research can advance the application of deep learning in network security.

2- The general descriptions in the manuscript should be replaced by the details of the algorithm developed in this research.

3- The conclusion should be modified to discuss more results from the study.

Comments on the Quality of English Language

The quality of English language should be enhanced.

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 2 Report

Comments and Suggestions for Authors

Overview of the Article:

 

About the Topic and Content. CONTRIBUTION:

The article "Deep Learning-Enabled Heterogeneous Transfer Learning for Improved Network Attack Detection in Internal Networks" investigates the problem of detecting attacks on internal networks (classifying network activity as attack or non-attack), specifically through the use of machine learning, with models trained on large labeled (or unlabeled) datasets.

It is noted that applying this approach to internal networks is difficult because there is usually not enough labeled data available. To address this difficulty, the article proposes a new deep heterogeneous transfer learning model for detecting attacks on internal networks. This model learns knowledge from a source domain and applies it to a target domain, even though the two domains have different feature spaces and different probability distributions. This new model constitutes the main contribution of the work.

In this new model, ways of resolving the difficulty caused by the heterogeneity of the feature spaces and the probability distributions between the domains are proposed, for which the following are used:

1. A neural network for projecting/transforming the data (features) from the source domain to the target domain, creating a common feature space. This transformation reduces the heterogeneity of the feature spaces, which facilitates the transfer of knowledge between the source and target domains. (Figure 1, Section 3.1).

The use of neural networks for data transformation generalizes previous work by the authors themselves (ref. 30), where a linear transformation was used; a linear transformation has the advantage of simplicity but, for that same reason, sometimes fails to fully capture the complex nonlinear relationships that can exist in the data.

2. Another neural network for classification, whose input is the transformed features and which classifies network activity as attack or non-attack. Cross-entropy is used as the classification loss function. (Figure 1, Section 3.1).

 

1. The maximum mean discrepancy (MMD) is used to align the probability distributions between the domains. The MMD quantifies the difference between two probability distributions by measuring the discrepancy between the means of samples drawn from each distribution. (Equations 1 and 2 of Section 3.1.3).

MMD uses kernel functions to map data into higher-dimensional feature spaces where their differences become more apparent. The choice of kernel function affects MMD's sensitivity to different types of differences in distribution characteristics, such as shifts in means, variances, and correlations.

2. Use of unlabeled data from the target domain, to which a "pseudo-soft classification" is applied, which consists of assigning probability distributions over classes to each unlabeled sample. This allows a more stable iteration process and avoids negative transfer.

3. Using MMD makes it easy to integrate the unlabeled data of the target domain and allows it to be used in the classification task. MMD is applied not only to the marginal distributions, as usual, but also to the conditional distributions (calculation of MMD over classes). (Section 3.1.3). At the initial stage, the untrained classification network makes random guesses for the unlabeled data, so in that step minimizing the distance between the conditional distributions has little effect.

4. The loss function of the proposed model takes into account the classification loss as well as the distribution-alignment loss, which is calculated based on the maximum mean discrepancy (MMD) for the marginal and conditional distributions. (Equation 4 in Section 3.1.4).

5. The proposed structure of this deep network for heterogeneous transfer learning is shown very clearly in Figure 1 ("Deep network architecture for heterogeneous transfer learning").

6. During the iterative training of the model, in order to minimize the objective function given by Equation 4, the different parameters of the model are adjusted/optimized, such as the parameters of the data transformation network and the classification network, the components of the alignment, and the alpha coefficient (alpha weights the relative importance of the classification loss and the distribution-alignment loss). (Figure 1 and Equation 4 in Section 3.1.4).
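To make the described mechanism concrete, the combination of a classification loss with an MMD-based alignment loss can be sketched as follows. This is a minimal NumPy illustration, not the authors' implementation; the Gaussian (RBF) kernel, its bandwidth `sigma`, and the weight `alpha` are illustrative assumptions.

```python
import numpy as np

def rbf_kernel(x, y, sigma=1.0):
    # Gram matrix of the Gaussian (RBF) kernel between two sample sets
    sq_dists = ((x[:, None, :] - y[None, :, :]) ** 2).sum(-1)
    return np.exp(-sq_dists / (2.0 * sigma ** 2))

def mmd2(xs, xt, sigma=1.0):
    # Biased estimator of the squared MMD between source and target samples:
    # the squared distance between the kernel mean embeddings of the two sets
    return (rbf_kernel(xs, xs, sigma).mean()
            + rbf_kernel(xt, xt, sigma).mean()
            - 2.0 * rbf_kernel(xs, xt, sigma).mean())

def transfer_loss(class_loss, src_feats, tgt_feats, alpha=0.5, sigma=1.0):
    # Total objective: classification loss plus alpha-weighted alignment loss
    return class_loss + alpha * mmd2(src_feats, tgt_feats, sigma)
```

With identical distributions the alignment term is near zero, so the objective reduces to the classification loss; the further apart the domains drift, the more the alpha-weighted MMD term dominates.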

 

Timeliness and novelty.

The topic is highly relevant, since the development of new attacks imposes the need for increasingly effective detection methods, and the proposed model presents novelties with respect to previous work in this field, as already mentioned.

Structure of the work. Writing and style.

The work is well structured and written in a clear academic style, which makes it easy to understand despite the complexity of the topic. The introduction presents the problem, mentions the previous approaches, and outlines the solution idea on which the proposed new model is based.

Section 2. Related Work, is subdivided into:

2.1. Network Attack Detection.

2.2. Transfer Learning in Network Attack Detection.

2.3. Deep Learning for Transfer Learning

and each of these gives an overview of the advances in previous work on these topics, pointing out contributions and limitations. It is highlighted that there has been limited exploration of deep learning in the context of heterogeneous transfer learning for the detection of attacks on internal networks.

Section 3. System Design and Methods is structured into:

3.1. Network Architecture Design:

3.1.1. Feature Projection Networks.

3.1.2. Classification Network.

3.1.3. Distribution Alignment.

3.1.4. The Optimization Objective of the Transfer Learning Network

Each of these headings describes clearly and comprehensibly the elements that make up the proposed new model.

In Section 4, Performance Evaluation, the design of the experiments and the results of the experimental validation are presented.

Experimental validation.

Three well-known databases (DBs) were used to validate the detection of network attacks: DB1 = NSL-KDD, DB2 = UNSW-NB15, DB3 = CIC-IDS2017, and two types of experiments were performed. The characteristics of these databases are described, and in particular the types of attacks in each one (Table 1).

Experiment 1: The model was trained on DB1 (source domain) and classified on DB2 (target domain). Seven knowledge-transfer tasks were performed for different types of attacks (the 7 columns in Table 2). Only one hidden layer was used in the data transformation neural network, and 3 hidden layers in the classification neural network. In all hidden layers, the very popular nonlinear activation function ReLU (Rectified Linear Unit) was used, which replaces all negative input values with zeros while passing positive inputs through unchanged. The dimension of the common feature space (the number of outputs of the feature transformation network) was set to 256. The training consisted of 3 epochs, each performing multiple iterations depending on the size of the dataset.
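The shapes involved in this Experiment 1 setup (one hidden layer in the projection network, a 256-dimensional common feature space, three ReLU hidden layers in the classifier) can be sketched as below. This is an illustrative sketch with randomly initialized weights, not the authors' code; the input dimension (41) and the hidden-layer widths are assumptions, since the review does not state them.

```python
import numpy as np

def mlp_forward(x, layer_dims, rng):
    # Forward pass through an MLP: ReLU on every hidden layer, linear output
    n_layers = len(layer_dims) - 1
    for i, (d_in, d_out) in enumerate(zip(layer_dims[:-1], layer_dims[1:])):
        w = rng.normal(0.0, 0.1, (d_in, d_out))
        x = x @ w
        if i < n_layers - 1:
            x = np.maximum(x, 0.0)  # ReLU zeroes out negative activations
    return x

rng = np.random.default_rng(0)
x_src = rng.normal(size=(8, 41))  # 8 samples, 41 raw features (illustrative)

# Projection network: one hidden layer, 256-dimensional common feature space
feats = mlp_forward(x_src, [41, 64, 256], rng)

# Classification network: three hidden layers, 2 output scores (attack / non-attack)
scores = mlp_forward(feats, [256, 128, 64, 32, 2], rng)
```

The projection output `feats` plays the role of the common feature space into which both domains are mapped before classification and MMD alignment.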

The proposed model is theoretically compared with 4 previous models mentioned in its reference (30), arguing the superiority of the proposed model based on three factors: the use of a nonlinear projection of the feature space, the optimization of the classification loss, and the use of all available data for training, unlike the compared methods, which only use partial data.

The numerical results of the experiment are shown in Table 2 and confirm that better classification accuracy is achieved than with the 4 previous methods compared, surpassing the HeMMD method, which was the most accurate among them. The possible causes of this superiority are discussed.

Experiment 2: The model was trained on DB2 and classified on DB3. Taking into account the types of attacks in each DB (Table 1), the source and target datasets were divided into groups according to the types of similar attacks, as shown in Table 3, thus constructing several scenarios to validate the proposed method. Different parameters were used than in Experiment 1. In the neural network for data transformation, 2 and 3 hidden layers were used at the source (X_S) and target (X_T), respectively, and 4 hidden layers in the classification neural network. The dimension of the common feature space was set to 128. The results of the experiment are shown in Table 4. The method works well in the detection of attacks, but nevertheless has a high probability of classifying benign data as attacks (a high false-positive rate), which constitutes a limitation of the method that is clearly pointed out by the authors as a direction for future work.

In conclusion, the experimental results validate the efficacy of the proposed heterogeneous transfer deep learning model in improving the accuracy of data classification during the detection of attacks on internal networks.

 

Remarks

The ReLU activation function has some limitations that may affect its performance. One of these is that, if the input to a neuron is negative, the neuron produces zero output, becomes inactive, and does not contribute to the output of the network (a "dead neuron"). As alternatives to this problem, other activation functions have been proposed, such as Leaky ReLU, Parametric ReLU, and ELU.
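To make the "dead neuron" issue concrete, the following minimal sketch contrasts ReLU with Leaky ReLU; the 0.01 negative slope is the common default, used here as an illustrative assumption.

```python
import numpy as np

def relu(x):
    # ReLU: negative inputs become exactly zero (and so does their gradient),
    # which is what can leave a neuron permanently inactive ("dead")
    return np.maximum(x, 0.0)

def leaky_relu(x, slope=0.01):
    # Leaky ReLU: negative inputs keep a small slope, so gradients still flow
    return np.where(x > 0, x, slope * x)

x = np.array([-2.0, -0.5, 0.0, 1.5])
r = relu(x)        # all three non-positive inputs map to 0
lr = leaky_relu(x) # non-positive inputs keep a small nonzero response
```

Because Leaky ReLU never produces an exactly flat region for negative inputs, a neuron pushed into the negative regime can still recover during training, which is the motivation behind the alternatives named above.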

QUESTION 1: Can this problem occur in this model? If so, could you assess the influence of using some of the alternative activation functions instead of ReLU?

 

Conclusions of the review:

 

The topic is timely, the proposed model is novel with respect to previous methods, and the paper is well structured and written with scientific rigor; its publication is recommended.

   

 

Comments for author File: Comments.pdf

Comments on the Quality of English Language

  Minor review

 

 

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

Comments and Suggestions for Authors

The authors need to address the following comments:

- Define what MMD, NSL-KDD, UNSW-NB15, and CICIDS2017 stand for in your abstract.

- It would be good to list the contributions in bullet points in the Introduction section.

- Label your equations.

- Any graphical results?

- The paper is generally good and comprehensive.

Comments on the Quality of English Language

Minor editing of English language required

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The paper can be accepted.
