Article
Peer-Review Record

A Multifaceted Deep Generative Adversarial Networks Model for Mobile Malware Detection

Appl. Sci. 2022, 12(19), 9403; https://doi.org/10.3390/app12199403
by Fahad Mazaed Alotaibi 1,† and Fawad 2,*,†
Reviewer 1:
Reviewer 2: Anonymous
Submission received: 31 August 2022 / Revised: 14 September 2022 / Accepted: 15 September 2022 / Published: 20 September 2022
(This article belongs to the Section Electrical, Electronics and Communications Engineering)

Round 1

Reviewer 1 Report

A Multifaceted Deep Generative Adversarial Networks Model for Mobile Malware Detection

This paper introduces a novel mobile malware (mainly for Android) detection algorithm called MDGAN. As its name implies, MDGAN employs GAN to overcome the problem of limited training samples. Features extracted from an APK file include its binary sequence (represented as a gray-scale image) and an API numeric sequence, as inputs to the underlying GAN.

MDGAN tackles an important problem: malware has consistently attracted attention from both industry and academia, and the battle against it has never ended. GAN, as one of the most powerful tools of the past decade, could be a good candidate for this problem, as the authors also mention. I appreciate the effort put into this field.

However, the current version is still a bit rough. Here I list several high-level points; details can be found in the comments below. (1) The problem definition is not quite clear. The definition of malware should not rely on datasets. (2) The motivation for using images as input is unclear, especially when the problem to be solved is largely independent of image processing. (3) The formal representation is quite broken. Section 4.3 is not readable at present because of undefined notation. I highly recommend that the authors pursue this research further and polish the draft to address the problems stated above, making the paper more reasonable, reproducible, and stronger.

Please find my detailed comments below:

Intro
- The problem definition is a little unclear: what is the scope of "malware" that MDGAN is detecting? Is it mainly defined by the app's behavior, its outcome, or something else? Will a legal app that frequently fetches ads be regarded as malware? Will an unlicensed app without any malicious behavior be regarded as malware? After reading the paper, my impression is that you depend heavily on the dataset used for evaluation to determine what malware is. For a research paper, this is not sufficient; the introduction should have a clearer problem definition.

Literature Review
- "To enhance the classification performance, the malware is first converted into an image before the feature extraction and classification stage": why is this necessary for higher performance, and how does such a transformation work? I understand that existing techniques (e.g., GoogleNet) may provide APIs taking images as inputs, but it would be nice if you provided more rationale for converting APK files to images, because the two are intuitively unrelated.
- "Various types of neural networks have been developed in the literature for the classification of the ImageNet dataset": why is ImageNet representative enough for images generated from malware signatures? The hidden assumption is that "malware images" have the same essence as normal images, but I'm not convinced of that.
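
For readers unfamiliar with the transformation questioned above, one common way to turn a binary into a gray-scale image is to treat each byte as one 8-bit pixel and wrap the byte stream into rows of fixed width. The following is a minimal sketch under that assumption; the function name `bytes_to_grayscale` and the `width` parameter are illustrative and are not taken from the manuscript, whose exact conversion procedure is not described in this record.

```python
from typing import List


def bytes_to_grayscale(data: bytes, width: int = 16) -> List[List[int]]:
    """Reshape a raw byte sequence into rows of 8-bit gray-scale pixel values.

    Each byte (0-255) becomes one pixel intensity. The last row is padded
    with zeros (black pixels) so that every row has exactly `width` pixels.
    `width` is an illustrative assumption, not a value from the manuscript.
    """
    if width <= 0:
        raise ValueError("width must be positive")
    # Number of zero bytes needed to complete the final row.
    padding = (-len(data)) % width
    padded = data + bytes(padding)
    # Slice the padded stream into consecutive rows of `width` pixels.
    return [list(padded[i:i + width]) for i in range(0, len(padded), width)]


if __name__ == "__main__":
    # Three bytes wrapped into rows of two pixels; the last row is zero-padded.
    image = bytes_to_grayscale(b"\x00\xff\x10", width=2)
    print(image)
```

Such a mapping preserves byte order but carries no spatial semantics of its own, which is why the rationale for feeding the result to image classifiers needs to be stated explicitly.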

Section 4
- "The mapping f : {τ, χ} → x": the notation here is not defined, so this section is hard to parse.
- "the conditional constrained χ" vs. "the synthetic by variable χ": the same symbol χ appears to be described in two different ways.
- Too many undefined notations in Section 4.3, e.g., ζ and L_g in (4). What does "arg" mean in (4)? Do you mean argmin or argmax? I highly recommend rewriting this section, as it is not readable at present.
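
As a point of reference for the "arg" question above: in the original GAN formulation, the optimal generator is the minimizer over G of the value function after it has been maximized over the discriminator D, so "arg" would normally be expected to read as an arg min over G combined with a max over D. The symbols below (G, D, V, p_data, p_z) follow that standard formulation and are assumptions here; they are not taken from the manuscript's Eq. (4), which is not reproduced in this record.

```latex
\begin{align}
V(D, G) &= \mathbb{E}_{x \sim p_{\mathrm{data}}(x)}\bigl[\log D(x)\bigr]
         + \mathbb{E}_{z \sim p_{z}(z)}\bigl[\log\bigl(1 - D(G(z))\bigr)\bigr], \\
G^{*} &= \arg\min_{G} \max_{D} V(D, G).
\end{align}
```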

Evaluation
- In Fig. 4, it is a bit odd that "others" is the majority. Do the files belonging to "others" share the same characteristics? If not, the majority of the dataset is still a mystery. Unless I missed something, it looks like you also do not include "others" in the subsequent results, do you? If so, please remove "others" from Fig. 4, since it is misleading. If not, please be explicit about how "others" are handled.

nits:
- "detecting zero-day type malware can’t adopt the feature reduction technique": "cannot"; "it’s not possible due to": "it is". Spelling out contractions is strongly encouraged in academic papers.
- "The handcrafted approach for the features extraction process." Is this sentence complete?
- "The Feature set consists of": "feature" (lower case)
- In (2) and (3), is there an extra closing bracket in the first term of the right-hand side?

Author Response

Thank you very much, honorable reviewer, for the in-depth review of our manuscript. We have tried to address all the comments and answer all the questions raised by the respected reviewers. We have updated our manuscript with the suggested changes; the updated manuscript and the response to the reviewers are attached/uploaded here.

Author Response File: Author Response.pdf

Reviewer 2 Report

The authors propose novel and interesting work on "Deep Generative Adversarial Networks Model for Mobile Malware Detection". After carefully reading the manuscript, I find that it needs to be strengthened in various sections. Suggestions are listed below:

a. What is the specific motivation for choosing MDGAN when other types of GANs, such as SinGAN and DCGAN, are available? Kindly add a description in the revised manuscript.

b. The proposed framework shown in Fig. 2 is not clear, as the classification models are not mentioned. Please revise it.

c. How is the multi-face contingent pixel-to-pixel version of the GAN better than other GANs? Kindly add a clearer description.

d. It is unclear in Fig. 5 which classifier the confusion matrix was obtained with. Further, the results should be compared with those of other classifiers so that the results and discussion, as well as the justification of the proposed methodology, are strengthened. Kindly refer to the following journal article, in which the authors applied a GAN and compared the results across different classifiers.

1. https://journals.sagepub.com/doi/abs/10.1177/09544062211043132

e. Grayscale images should be added to the revised manuscript.

f. It is not stated whether the authors present training or testing results (Table 3 and Table 4). Kindly include both training and testing results in the revised manuscript.

g. Some recent literature related to GANs and mobile malware detection, with the necessary description, should be added to the revised manuscript:

1. https://ieeexplore.ieee.org/document/9346277

2. https://journals.sagepub.com/doi/abs/10.1177/09544062211043132

3. https://www.mdpi.com/1424-8220/22/11/4302

4. https://www.mdpi.com/2076-3417/12/9/4664

Author Response

Respected reviewer, we have tried to address all the comments and answer all the questions raised. The files are attached here.

Author Response File: Author Response.pdf

Round 2

Reviewer 1 Report

Thanks for the response and the updated version! Most of my comments have been resolved properly and the draft is in much better shape. Below is my last set of concerns, mainly about the formulas, which still need more polishing:

- Some more undefined notations: "τ" and "x" in "The mapping f : = {τ, χ} → x". Also, "χ" is reused later for synthetic samples. Please do not treat my examples as an exhaustive list of such issues; I am not aiming to point out all cases. Instead, please proofread this section carefully. As it includes formal expressions, it must be as rigorous as possible.
- In (2) and (3), there is still an extra closing bracket in the first term of the right-hand side, in the subscript of ζ.

nits:
- "Although the disparity exists between the malware images and the real natural image; however, ..": "however" is redundant here because you already have "although" at the beginning.
- "The objective function in Eq." "The overall accuracy obtained with the expression given in Eq." Are the equation numbers missing? I.e., Eq. (1), Eq. (8).

Author Response

We are very thankful to the AE, EIC, and reviewers for their profound and thorough review. We have revised our manuscript in light of their valuable queries and suggestions. We hope our revision has improved the paper quality to a level of reviewers’ satisfaction. The answers to their specific suggestions/queries/comments are given below in detail.

Author Response File: Author Response.docx

Reviewer 2 Report

The authors have addressed the reviewer comments and revised the manuscript accordingly.

Author Response

We are thankful to the honorable reviewer for accepting our work, and we hope the paper will be useful for future researchers working in the field.

Author Response File: Author Response.docx
