
Open Access Article

Peer-Review Record

Using Live Spam Beater (LiSB) Framework for Spam Filtering during SMTP Transactions

Appl. Sci. 2022, 12(20), 10491; https://doi.org/10.3390/app122010491

by Silvana Gómez-Meire¹, César Gabriel Márquez¹, Eliana Patricia Aray-Cappello¹ and José R. Méndez¹,²,³,*

Reviewer 1: Anonymous

Reviewer 2: Salman Yussof


Submission received: 20 September 2022 / Revised: 7 October 2022 / Accepted: 16 October 2022 / Published: 18 October 2022

(This article belongs to the Special Issue Applied Artificial Intelligence (AI))

Round 1

Reviewer 1 Report

Advantages of the paper:

1. It presents a study on an important issue, i.e., automatic on-the-fly detection of e-mail spam.

2. The authors provide access to the produced code.

Disadvantages:

1. I do not understand what exactly is new here:

a. The system is not compared to the existing solutions of this kind.

b. The system is based on machine learning, which apparently does not work as expected, since the quality indicators are generally poor.

2. Other problematic aspects from the scientific viewpoint:

a. Very old datasets: does it really make sense to study spam filtering on data that are 20 years old?

b. It is not clear what exactly is being compared, and what is meant by training or model building, for the data provided in Table 1.

c. The selection of the literature on spam filtering in general and on spam detection supported by machine learning in particular seems to be quite random.

3. There are some editing issues that suggest that the paper was not thoroughly checked:

a. Strange font in footnotes 7 and 9.

b. "behatviour" on page 2.

c. The caption of Figure 2 is not placed directly below the figure.

d. Improper diacritic over the "i" in "Naive" in Table 2.

e. Stray text "The text continues here" on page 8.

Author Response

GENERAL OPINION.-

Advantages of the paper:

It presents a study on an important issue, i.e., automatic on-the-fly detection of e-mail spam.
The authors provide access to the produced code.

GENERAL RESPONSE: Thank you for your thorough review and your positive feedback on our manuscript. Please find below our point-by-point responses to your suggestions.

Thank you very much for your constructive criticism and for the issues reported. We have studied them carefully and made the necessary modifications, which have improved the quality of the manuscript. Changes in the current version of the manuscript are highlighted in red. Please examine our point-by-point responses to the comments raised and check the suitability of the changes made to the manuscript.

QUESTION 1.- I do not understand what exactly is new here:

The system is not compared to the existing solutions of this kind.
The system is based on machine learning, which apparently does not work as expected, since the quality indicators are generally poor.

RESPONSE 1: We understand your comment and will address both points. Our system is not compared with existing solutions because there are no other systems that apply spam filtering at SMTP time. This issue is highlighted in the State of the Art section.

Regarding the second point, the aim of our system is to achieve a compromise between the time needed to run the machine learning algorithms and their effectiveness. To apply ML techniques in real time, the dimensionality of the feature vector must be reduced so that classification can be performed in an acceptable time. Using a high dimensionality, which would improve classification results, is not feasible because it would increase the time needed to process each email and cause timeout issues. Please review the changes in the current version of the manuscript: we have added an explanation in the Introduction and Conclusions sections (highlighted in red).

QUESTION 2.- Other problematic aspects from the scientific viewpoint:

Very old datasets: does it really make sense to study spam filtering on data that are 20 years old?
It is not clear what exactly is being compared, and what is meant by training or model building, for the data provided in Table 1.
The selection of the literature on spam filtering in general and on spam detection supported by machine learning in particular seems to be quite random.

RESPONSE 2: Thank you for reporting these issues. Despite their age, we have used the SpamAssassin, Enron and Bruce Guenter datasets because they are still a reference in many current works (see, for example, the documents with DOI: 10.1016/j.ipm.2021.102812 and 10.3390/app12147043). We have added a clarification about this issue in Subsection 3.2.

Table 1 shows the time needed to build each of the models analysed. We have reworded the explanation in the text of the manuscript to make it clearer.

Finally, regarding the apparently random selection of the literature on spam filtering and detection using machine learning, this is due to the near absence of works on real-time classification. However, following your advice, we have included some references to review works on the application of ML to spam filtering. Please note that these studies address the use of ML only in the context of reactive spam filtering (and therefore do not apply at SMTP time).

Please review our changes in the current version of the manuscript (highlighted in red).

QUESTION 3.- There are some editing issues that suggest that the paper was not thoroughly checked:

Strange font in footnotes 7 and 9.
"behatviour" on page 2.
The caption of Figure 2 is not placed directly below the figure.
Improper diacritic over the "i" in "Naive" in Table 2.
Stray text "The text continues here" on page 8.

RESPONSE 3:

Thank you very much for finding and reporting these editing issues. We have corrected all of them. Please refer to the manuscript to evaluate our changes, which are highlighted in red.

Author Response File: Author Response.docx

Reviewer 2 Report

This is a well-written paper on a proactive spam filtering framework. The framework is even published on GitHub as open source, which is very much appreciated.

One minor clarification is required. In lines 292 and 293, it is stated that classification time should always be under 18 milliseconds. Why 18 milliseconds? Is there a reference, or was an experiment conducted, to justify this?

Author Response

GENERAL OPINION.- This is a well-written paper on a proactive spam filtering framework. The framework is even published on GitHub as open source, which is very much appreciated.

GENERAL RESPONSE: Thank you for your thorough review and your positive feedback on our manuscript. Please find below our responses to your suggestions.

QUESTION 1.- One minor clarification is required. In lines 292 and 293, it is stated that classification time should always be under 18 milliseconds. Why 18 milliseconds? Is there a reference, or was an experiment conducted, to justify this?

RESPONSE 1: We fully agree that a clarification is needed. With this statement, we intended to highlight that the time needed by the models to classify a message is always below 18 ms. We consider this time acceptable for processing an email because it does not cause an appreciable delay during the SMTP transaction. We have reworded the sentence to clarify this idea. Please refer to the new version of the manuscript and review our changes, which are highlighted in red.

Author Response File: Author Response.docx
