Open Access Article

Peer-Review Record

Similarity-Based Malware Classification Using Graph Neural Networks

Appl. Sci. 2022, 12(21), 10837; https://doi.org/10.3390/app122110837

by Yu-Hung Chen *, Jiann-Liang Chen and Ren-Feng Deng

Reviewer 1: Nikos Kanakaris

Reviewer 2: Anonymous

Reviewer 3: Muhamad Usman Ashraf

Submission received: 14 August 2022 / Revised: 14 October 2022 / Accepted: 24 October 2022 / Published: 26 October 2022

Round 1

Reviewer 1 Report

This paper deals with the task of Malware Classification using graph neural networks and Siamese networks. The experimental results of the proposed approach show an improvement in the accuracy of the considered task.

The paper is well written, coherent, and well structured.
The introduction section explains the concept in a nice and understandable way. Also, it mentions the motivation and the contribution of this study, something that is really important.
The proposed model and the experiments are well explained.

Suggestions and observations:

The dataset is very small for an approach based on graph neural networks.
Could you please test whether the proposed model overfits or not?
It is recommended that the authors upload their code to GitHub.
In the last paragraph of the Introduction section, I suggest you mention the contributions of this paper in a bulleted list.
Please improve the quality of Figures 10 and 11.
Please convert the tables that are included in Figures 10 and 11 into two separate tables (and not a screenshot of your results).
It is suggested you use t-SNE instead of PCA to visualize your embeddings in Figure 12.
Could you please run a statistical test to show that the evaluation results have statistical significance?
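One way the significance check requested in the last suggestion could be run is sketched below. It is a minimal illustration, assuming the proposed model and a baseline were evaluated on the same cross-validation folds; the per-fold accuracies are hypothetical placeholders, not results from the paper, and the exact sign-flip permutation test is only one of several tests that would be reasonable here.

```python
from itertools import product
from statistics import mean


def paired_permutation_test(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired scores.

    Under the null hypothesis the two models are interchangeable, so the
    sign of each per-fold difference is equally likely to be + or -.
    """
    if len(scores_a) != len(scores_b):
        raise ValueError("scores must be paired (same folds, same order)")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(mean(diffs))
    count = 0
    total = 0
    # Enumerate every sign assignment (2**n of them; fine for small n).
    for signs in product((1, -1), repeat=len(diffs)):
        permuted = abs(mean([s * d for s, d in zip(signs, diffs)]))
        if permuted >= observed - 1e-12:  # tolerance for float rounding
            count += 1
        total += 1
    return count / total


# Hypothetical per-fold accuracies, for illustration only.
proposed = [0.97, 0.96, 0.98, 0.97, 0.96]
baseline = [0.93, 0.94, 0.95, 0.92, 0.94]
p_value = paired_permutation_test(proposed, baseline)
print(f"two-sided p-value: {p_value}")
```

With only five folds, the smallest two-sided p-value this test can produce is 2/32 = 0.0625, so more folds or repeated cross-validation would be needed before a 0.05 threshold could be reached.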

Author Response

The response to reviewer 1 is as attached.

Author Response File: Author Response.docx

Reviewer 2 Report

Summary. The manuscript presents a methodology to classify malware. The methodology is based on (1) a GNN and (2) a Siamese network: the GNN is used for classification, whereas the Siamese network is used to compare the similarity of two malware files.
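For readers unfamiliar with the Siamese idea summarized above, a minimal sketch follows. It assumes one common arrangement in which a single shared encoder embeds both inputs and the embeddings are then compared; the `embed` function is a hypothetical mean-pooling stand-in for a trained GNN, and cosine similarity is an assumed choice of comparison, neither taken from the manuscript.

```python
import math


def embed(node_features):
    """Hypothetical stand-in for a trained GNN encoder.

    Mean-pools the per-node feature vectors into one graph-level vector.
    """
    n = len(node_features)
    dim = len(node_features[0])
    return [sum(v[i] for v in node_features) / n for i in range(dim)]


def cosine_similarity(u, v):
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def siamese_similarity(graph_a, graph_b):
    """Apply the SAME encoder to both inputs, then compare the embeddings."""
    return cosine_similarity(embed(graph_a), embed(graph_b))


# Toy node-feature lists standing in for two function call graphs.
sample_a = [[1.0, 0.0], [0.0, 1.0]]
sample_b = [[2.0, 0.0], [0.0, 2.0]]
sample_c = [[1.0, 0.0], [1.0, 0.0]]
print(siamese_similarity(sample_a, sample_b))
print(siamese_similarity(sample_a, sample_c))
```

The point of sharing the encoder is that both files are mapped into the same embedding space, so the similarity score depends only on the inputs and not on which branch processed them.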

Evaluation.

1. The manuscript needs significant improvement in terms of writing and structure. The current version is hard to read because it lacks a clear structure. For instance, avoid long paragraphs. As an example, the third paragraph of Section 2 is almost 1.5 pages long.

2. Most of the figures are not legible and, more importantly, they are not explained sufficiently. Each figure needs a thorough explanation.

3. Regarding the overall procedure, it is not clear how the GNN part and the Siamese part are interrelated. They seem to do different things on their own. Accordingly, Figure 1 needs to be revised to show the connection between the last two steps.

4. There is no discussion of whether the proposed approach would work on obfuscated malware. It is unrealistic to believe that the code of each malware sample is clean enough to run a machine/deep learning algorithm to predict its class. Most malware is obfuscated, encrypted, etc., and the experimental part needs to discuss these types of malware to show whether the proposed methodology works on them.

5. Change Figure 2 to a table.

6. There is not much background information about GNNs and ASM2VEC.

7. The phrase “latent semantic” is used in several places, but it is not clear what it refers to.

8. There is some discussion of converting to graphs and pruning call graphs, but there is no clear discussion of how these graphs are generated and how they are pruned.

9. Figure 4 uses a color-coding scheme that is never discussed.

10. Figure 3 is hard to understand since there is no clear description.

11. There are a good number of typos and grammatical errors, e.g., the title of Section 3.4 or line 326.

12. Table 3 compares the paper’s results with literature dating back to 2006 and 2013. During that period, deep learning approaches were not popular. The authors need to compare their results with existing work that uses deep learning for malware classification. The authors cannot compare apples and oranges.

Author Response

The response to reviewer 2 is as attached.

Author Response File: Author Response.docx

Reviewer 3 Report

The authors propose a novel GNN-based malware identification model. The idea is really innovative and is presented well in this paper. However, the authors need to improve the paper with the following corrections/suggestions.

The sentence "The function call relationship and the function assembly content are obtained by analyzing the malware to generate a graph that represents its functional structure." is confusing and needs to be rewritten.

Add a paragraph explaining how the other sections are organized in this paper.

In Figure 1 (Proposed Problem-Solving Mechanism), each stage should be numbered for clarity.

The acronyms should be defined, e.g., kNN, CNN, DLL, etc.

To bring consistency to the paper, I would recommend adding a preliminaries section after the introduction and discussing 3.3 Graph Neural Networks (GNN), 3.4. Classiication Model, and 3.5 Similarity Model there.

Spelling mistake: "Classiication" should be "Classification".

Figures 10 and 11 are blurry. Use high-resolution figures.

For the results comparison, the existing studies are too old (2006, 2013). Use the most recently proposed techniques for the comparison.

It doesn't make sense to end the results section suddenly without appropriate discussion.

Author Response

The response to reviewer 3 is as attached.

Author Response File: Author Response.docx

Round 2

Reviewer 1 Report

Thank you for your changes!

Author Response

Thank you for your review and comment.

Reviewer 3 Report

All my previous comments have been addressed carefully, and I appreciate the authors' effort. However, some minor comments should be addressed in the updated version, as follows:

Table 3: name the techniques being compared.

Line 513: correct "It's" to "It is".

The word 'So' is not preferred in professional writing. It is better to replace it with alternative connecting words, such as however, nevertheless, therefore, or hence.

Highlight the future work of your study, preferably at the end of the conclusion section.

Author Response

Please see the attachment. Thanks.

Author Response File: Author Response.docx
