Article
Peer-Review Record

Quantifiable Interactivity of Malicious URLs and the Social Media Ecosystem

Electronics 2020, 9(12), 2020; https://doi.org/10.3390/electronics9122020
by Chun-Ming Lai 1,*, Hung-Jr Shiu 1 and Jon Chapman 2
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Reviewer 3: Anonymous
Submission received: 19 October 2020 / Revised: 11 November 2020 / Accepted: 23 November 2020 / Published: 30 November 2020
(This article belongs to the Special Issue New Challenges on Cyber Threat Intelligence)

Round 1

Reviewer 1 Report

This paper focuses on quantifying the social influence of various malicious URLs on Facebook public pages. The paper is well written, with appropriate results to support its conclusions. My only suggestion to the authors is to further improve the language of the paper before publication. As a result, I support publication with minor revision.

Author Response

Thank you so much for the suggestion. We have carefully reviewed the paper and corrected several typographical and grammatical errors to make the draft more readable. For example, we made the following changes in this version.

Line 4: “has lead” → “has led”

Line 71: “it’s” → “its”

Line 106: “the user” → “users”

Line 127: page_id: removed the original subscript error

Line 311: “it’s” → “its”

Lines 212, 232: added and updated the equation numbers

Line 212: t_final: fixed the subscript format

Line 248: closed the unclosed parenthesis

Figure 5: re-typeset the figure at a higher resolution

We also revised the paper title to better capture the scope of this work.

Reviewer 2 Report

This paper gets the bulk of its strength from its large dataset of over 48,000 posts and 88,000,000 comments on those posts, as large datasets are vital to the accurate training of classifiers. Additionally, the methodology of the paper is well stated and flows logically with the intentions of the paper. The overall presentation of the data is very professional and relatively easy to understand. The following list covers the major methodological issues identified in the paper.

In lines 155-157 the authors say that "We first use a well-known Whitelist ’facebook.com’, ’youtube.com’, ’Twitter’, ’on.fb.me’, ’en.wikipedia’, ’huffingtonpost.com’, ’foxnews.com’, ’cnn.com’, ’google.com’, ’bbc.co.uk’, ’nytimes.com’, ’washingtonpost.com’ to do the first-step filter." There is a major issue here with the whitelist itself. If you are whitelisting Facebook, Twitter, and YouTube, you are essentially discarding large sources of malicious URLs. What about a hacking video posted on youtube.com? What about a conspiracy theory from twitter.com?

The definitions of Light and Critical URLs (lines 162-176) seem arbitrary, inconsistent, and unscientific. First, none of these definitions has anything to do with social influence (the main goal of the paper is quantifying social influence). All topics in the Light category are related to advertisements, meaning that people posting these links are simply trying to get people to use their website to generate revenue. This has nothing to do with convincing them of anything other than informing them that the service exists while encouraging them to use it. In the Critical category, there are a few subcategories that could be used to influence an individual’s behavior, specifically the “Aggressive” and “Violence” categories. However, many URLs in these two categories could come from whitelisted websites like YouTube or Twitter.
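The first-step whitelist filter quoted above can be sketched as follows. This is purely an illustrative sketch, not the authors' implementation; the normalized domain forms (e.g. mapping the quoted entries ’Twitter’ and ’en.wikipedia’ to twitter.com and en.wikipedia.org) are assumptions made only so the example is runnable.

```python
# Illustrative sketch only -- NOT the authors' implementation. Shows how a
# first-step whitelist filter like the one quoted above might work. The
# normalized domain forms (e.g. "twitter.com" for the quoted entry "Twitter")
# are assumptions for the sake of a runnable example.
from urllib.parse import urlparse

WHITELIST = {
    "facebook.com", "youtube.com", "twitter.com", "on.fb.me",
    "en.wikipedia.org", "huffingtonpost.com", "foxnews.com", "cnn.com",
    "google.com", "bbc.co.uk", "nytimes.com", "washingtonpost.com",
}

def is_whitelisted(url: str) -> bool:
    """True if the URL's host equals a whitelisted domain or is a subdomain of one."""
    host = urlparse(url).netloc.lower().split(":")[0]
    return any(host == d or host.endswith("." + d) for d in WHITELIST)

def first_step_filter(urls):
    """Keep only the URLs that survive the whitelist (candidates for further checks)."""
    return [u for u in urls if not is_whitelisted(u)]
```

Under such a scheme, every youtube.com or twitter.com URL is dropped before any maliciousness check, which is exactly the blind spot the report points out.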
Other categories labeled as Critical, such as “Download”, “Drugs”, and “Weapons”, have the same weakness as those labeled Light: they primarily exist as money-making ventures for those who post them, albeit through illicit and illegal means. They do not actually try to influence user behavior beyond informing users that the service exists. And the “Spyware” category literally cannot influence behavior, as the software must be hidden to function properly. It is mentioned (lines 323-325) that light-type users are primarily commercially motivated while critical-type users are more likely to be politically influential, but this does not seem consistent with the definitions of these categories. In fact, it can be presumed that a significant proportion of critical-type users are instead commercially motivated in the same vein as the light-type users.

The other major problem with the methodology is that it ignores some important details. For example, the influence ratio of a comment is defined as the log of the ratio of the number of activities after the comment over the number of activities before the comment, for equal durations. This model seems very simplistic. For example, assume comment C2, which contains a link, is posted immediately after C1, a very controversial comment. C1 would receive lots of feedback from users (appearing as comments C3, ...), but none of those are reactions to C2. Nevertheless, C2 would receive a high influence ratio, the same as C1. Wouldn't it be more useful to know whether subsequent comments are replies to that specific comment rather than to other comments?

Also, several of the conclusions derived by the paper, and the justifications given for the results, are simply assumptions.
For example, in lines 234-239 the authors say that target posts receive more comments than non-target posts, and one of the reasons given is that "Normal users tended to react more than usual because of those malicious URLs — novel information would ignite interest to join a discussion." This, to me, looks like an assumption that overgeneralizes. For example, how could “Spyware” links ignite interest to join a discussion?

While the presented results include good information regarding the lifecycle of a post and its predictability, with an accuracy of 75% I am not sure whether the models are predicting based on the prevalence of malicious URLs in the comments, or whether the lifecycle of a post can simply be predicted easily for large-enough platforms (such as a post made by CNN). I do not know how valuable the results of this paper are in a different context, but it gives very little insight into the patterns and prevalence of influence campaigns that are any more complicated than an advertisement.

I believe that if the authors were to put more effort into detecting the purpose of the URLs without whitelisting known 'safe' domains, they would have a much better paper. For example, they could look through all the YouTube links and label the difference between a benign link and one to a conspiracy theory video.

Finally, there are very obvious mistakes in the paper that are hard to ignore. For example, "its" has been mistakenly written as "it's" on multiple occasions, the lines between 212 and 213 have no numbers, the subscript of t_{final} is not correct (between lines 212 and 213), and in line 248 a parenthesis is not closed.
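The influence-ratio metric that the report criticizes can be sketched as follows. This is a reconstruction from the review's wording (log of activity counts in equal-length windows after vs. before a comment), not the paper's actual code; the epsilon guard for empty windows is an assumption added here to keep the example runnable.

```python
# Illustrative reconstruction from the review's wording -- not the paper's code.
# influence_ratio(t) = log(#activities in (t, t+w] / #activities in [t-w, t)).
# The epsilon guard against empty windows is an assumption added here.
import math

def influence_ratio(activity_times, comment_time, window, eps=1e-9):
    """Log-ratio of activity counts in equal-length windows after vs. before a comment."""
    after = sum(1 for t in activity_times if comment_time < t <= comment_time + window)
    before = sum(1 for t in activity_times if comment_time - window <= t < comment_time)
    return math.log((after + eps) / (before + eps))
```

Note that the metric is blind to attribution: a burst of activity sparked by a neighboring comment C1 inflates the ratio of a link comment C2 posted at almost the same time, which is exactly the simplification the report objects to.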

Author Response

Please see the attachment.

Author Response File: Author Response.pdf

Reviewer 3 Report

This paper analyzed how social recommendation systems affect the occurrence of malicious URLs on Facebook and other social media websites. The authors also explored how social recommendation systems work for both target and non-target threads. Two major experiments were designed, and multiple analysis models and algorithms were used.

One question: on lines 160-161, the authors mention that "Among the 74 categories listed, we manually divided targeted URLs into two classes: Light and Critical". But there are thousands of URLs in these two categories according to Table 2.

 

This paper is very well written and organized.  

 

 

Author Response

Thank you so much for the suggestion. We have carefully reviewed the paper and corrected several typographical and grammatical errors to make the draft more readable. For example, we made the following changes in this version.

Line 4: “has lead” → “has led”

Line 71: “it’s” → “its”

Line 106: “the user” → “users”

Line 127: page_id: removed the original subscript error

Line 311: “it’s” → “its”

Lines 212, 232: added and updated the equation numbers

Line 212: t_final: fixed the subscript format

Line 248: closed the unclosed parenthesis

Figure 5: re-typeset the figure at a higher resolution

We also revised the paper title to better capture the scope of this work.

As for lines 160-161: yes, we divided the malicious URLs into two classes (Light / Critical) based on the Shalla List <http://www.shallalist.de/>.

Thank you so much for the confirmation; we have added this clarification to the draft.

Round 2

Reviewer 2 Report

Some of the changes made in the paper address a few of the concerns mentioned, and I hope the rest can be addressed in the next steps of this project. It would have been more helpful to list, for each addressed comment, the changes made along with their line numbers.

 
