Open Access Article

Peer-Review Record

Social Media Analytics on Russia–Ukraine Cyber War with Natural Language Processing: Perspectives and Challenges

Information 2023, 14(9), 485; https://doi.org/10.3390/info14090485

by Fahim Sufi

Reviewer 1: Anonymous

Reviewer 2: Agnes Allansdottir

Reviewer 3: Xujuan Zhou

Submission received: 8 August 2023 / Revised: 29 August 2023 / Accepted: 30 August 2023 / Published: 31 August 2023

(This article belongs to the Collection Natural Language Processing and Applications: Challenges and Perspectives)

Round 1

Reviewer 1 Report

The paper presents a comprehensive literature review that establishes the research context, methodology, framework, and challenges. Connecting the findings more tightly back to the studies cited in the introduction could make the value-add even clearer. The literature cited covers the breadth of this novel cyber intelligence analysis using NLP on tweets quite well. The paper conducts an extensive analysis of 37K tweets in 54 languages related to Russia-Ukraine cyber issues, which is the first reported large-scale multilingual study on this topic. The granularity of the NLP techniques used is impressive. The paper identifies 12 main challenges faced in using NLP for social media cyber intelligence, categorized into data quality, privacy, bias, scope, technical, and legal issues. The coverage of challenges is quite comprehensive.

Here are some potential shortcomings to note in the technical overview and methodology of this paper:

1. While a high-level flowchart (Figure 4) of the methodology is provided, the paper lacks precise explanations and pseudocode showing how the different NLP techniques (LDA, N-Grams...) are sequenced and integrated in the pipeline.

2. There is no discussion of data preprocessing steps, such as cleaning and normalization, that are typically essential in NLP workflows. Details on how linguistic noise was handled are absent.

3. The methodology seems to rely primarily on off-the-shelf APIs and packages, without much customization or domain adaptation of models. Advanced techniques like fine-tuning sentiment models on cybersecurity text could have improved performance.

4. While LDA topic modeling is used, no technical details are provided regarding number of topics, hyperparameter tuning, interpretation of topic weights etc. More analysis and discussion of the topic modeling process would add value.

5. In the literature review part, here are some papers published in 2022 on social media analysis of Russia-Ukraine cyberwar and its challenges, which are not cited in this article:

- Disinformation Warfare: Understanding State-Sponsored Trolls on Twitter and Their Influence on the Web, arXiv:1801.09288

The paper, which examines state-related disinformation on Twitter, provides case studies in response to the Ukraine crisis, doi:10.1609/icwsm.v14i1.7342

- Characterizing the Use of Images in State-Sponsored Information Warfare Operations by Russian Trolls on Twitter.

- MAUIL: Multilevel attribute embedding for semisupervised user identity linkage (2022)

This article explains the problem of information alignment across multiple social media, which is a challenge that the theme of this article "Russia-Ukraine Cyber War with NLP: Perspectives & Challenges" must face, but the author did not mention it.

6. There is no examination of the error rates, biases, or limitations of the NLP models applied, as highlighted in the challenges section. The technical results lack this self-critical analysis.

Some minor problems that need to be revised.

7. Some phrases are vague like "this term suggests" (Line 585) without specifying what the term concretely refers to. Replacing vague phrases with more definitive statements would add clarity.

I recommend accepting this paper after a minor revision.

Good.

Author Response

It is my great pleasure to know that all the reviewers have suggested acceptance of this paper. I am grateful that the honorable reviewers have taken great interest in this paper and provided their suggestions for improving its quality. I have taken all the suggestions constructively and updated the entire manuscript accordingly.

Many thanks for this crucial suggestion. Accordingly, I have added the following new paragraph:

“Figure 4 shows the sequence of actions that starts with language detection, followed by translation, sentiment analysis, country-wise information extraction, TF-IDF (i.e., term frequency-inverse document frequency), Porter stemming, N-Gram, and finally LDA.”
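Two of the later stages named in this sequence, N-Gram extraction and TF-IDF weighting, can be illustrated with a minimal plain-Python sketch. It is not the author's implementation: the function names `ngrams` and `tf_idf`, the toy token lists, and the particular weighting variant (tf = count / document length, idf = log(N / df)) are all assumptions made for illustration, since the manuscript does not state which variant or library was used.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return every contiguous run of n tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tf_idf(docs):
    """Score each term of each document (docs is a list of token lists).

    Assumed variant: tf = count / document length, idf = log(N / df).
    """
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))  # count each term once per document
    scores = []
    for doc in docs:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return scores


# Toy tokens, not taken from the study's dataset.
docs = [
    ["cyber", "attack", "ukraine"],
    ["cyber", "attack", "russia"],
    ["cyber", "defence", "ukraine"],
]
print(ngrams(docs[0], 2))
print(tf_idf(docs)[1])
```

In this toy corpus a term that appears in every document ("cyber") receives a score of zero, while a term unique to one document ("russia") receives the highest score in that document, which is the behaviour that makes TF-IDF useful for surfacing distinctive keywords.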

2. There is no discussion of data preprocessing steps, such as cleaning and normalization, that are typically essential in NLP workflows. Details on how linguistic noise was handled are absent.

Many thanks for pointing out this important matter. Accordingly, in the updated manuscript, we have added the following:

“It should be mentioned that generic preprocessing steps like transforming all the posts into lower case texts, removing stop-words, removing hypertext markup tags, and tokenizing were performed similar to recent studies in [17] [22] [35].”
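The four preprocessing steps listed in this added paragraph could look roughly like the following sketch. It is only an illustration under stated assumptions: the `preprocess` function, the regular expressions, and the very small `STOP_WORDS` set are hypothetical, and the manuscript does not say which stop-word list or tokenizer was actually used.

```python
import re

# Small illustrative stop-word list (assumption); the study's list is not given.
STOP_WORDS = {"a", "an", "the", "is", "are", "of", "on", "in", "to", "and"}

TAG_RE = re.compile(r"<[^>]+>")      # matches hypertext markup tags such as <p> or </b>
TOKEN_RE = re.compile(r"[a-z0-9]+")  # ASCII word tokens; assumes posts were already translated to English


def preprocess(post):
    """Strip markup tags, lower-case, tokenize, and drop stop-words."""
    text = TAG_RE.sub(" ", post)
    text = text.lower()
    tokens = TOKEN_RE.findall(text)
    return [t for t in tokens if t not in STOP_WORDS]


print(preprocess("<p>The attack on <b>Ukraine</b> is ONGOING</p>"))
# prints ['attack', 'ukraine', 'ongoing']
```

Removing tags before tokenizing matters here, because otherwise tag names such as "p" or "b" would survive as tokens.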

3. The methodology seems to rely primarily on off-the-shelf APIs and packages, without much customization or domain adaptation of models. Advanced techniques like fine-tuning sentiment models on cybersecurity text could have improved performance.

I concur with this suggestion. We have highlighted this aspect within the conclusion section with the following:

“Another limitation of this study was its reliance on third-party APIs and black-box algorithms, which meant that hyperparameters could not be fine-tuned for optimization.”

4. While LDA topic modeling is used, no technical details are provided regarding number of topics, hyperparameter tuning, interpretation of topic weights etc. More analysis and discussion of the topic modeling process would add value.

Many thanks for this crucial suggestion. Accordingly, I have added the following new paragraph:

“Within the topic analysis process, we configured the LDA algorithm to produce 7 topics for both Russia and Ukraine. For each of these topics, LDA was configured to identify the 5 most used keywords ranked by weight, as shown in Table 5.”
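The final ranking step described here, picking the 5 highest-weighted keywords for each topic, can be sketched independently of any particular LDA implementation. The sketch below assumes a fitted topic-word weight matrix is already available as a list of rows, one per topic; the `top_keywords` helper, the vocabulary, and the weights are hypothetical toy values (two topics rather than the 7 used in the study).

```python
def top_keywords(topic_word_weights, vocabulary, k=5):
    """Return, for each topic, the k (word, weight) pairs with the largest weights."""
    ranked = []
    for weights in topic_word_weights:
        # Indices of the vocabulary sorted by descending weight for this topic.
        order = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
        ranked.append([(vocabulary[i], weights[i]) for i in order[:k]])
    return ranked


# Toy topic-word weights (assumption), one row per topic.
vocabulary = ["cyber", "attack", "hack", "bank", "power", "grid"]
weights = [
    [0.30, 0.25, 0.20, 0.12, 0.08, 0.05],
    [0.04, 0.09, 0.05, 0.12, 0.40, 0.30],
]
for topic_id, words in enumerate(top_keywords(weights, vocabulary)):
    print(topic_id, words)
```

With a real model, the rows would come from the fitted LDA output, and `k=5` with 7 rows would reproduce the shape of the table the author describes.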

5. In the literature review part, here are some papers published in 2022 on social media analysis of Russia-Ukraine cyberwar and its challenges, which are not cited in this article:

- Disinformation Warfare: Understanding State-Sponsored Trolls on Twitter and Their Influence on the Web, arXiv:1801.09288

The paper, which examines state-related disinformation on Twitter, provides case studies in response to the Ukraine crisis, doi:10.1609/icwsm.v14i1.7342

- Characterizing the Use of Images in State-Sponsored Information Warfare Operations by Russian Trolls on Twitter.

- MAUIL: Multilevel attribute embedding for semisupervised user identity linkage (2022)

Many thanks for this suggestion. Accordingly, I have referenced these suggested articles in [88], [89], and [90].

This article explains the problem of information alignment across multiple social media, which is a challenge that the theme of this article "Russia-Ukraine Cyber War with NLP: Perspectives & Challenges" must face, but the author did not mention it.

Many thanks for this suggestion. Accordingly, I have added the following sentence to the conclusion.

“Moreover, while analyzing social media posts from multiple social-media platforms (e.g., Twitter, Facebook, Instagram etc.), information alignment becomes a critical challenge, and this study did not provide a solution to this critical limitation.”

6. There is no examination of the error rates, biases, or limitations of the NLP models applied, as highlighted in the challenges section. The technical results lack this self-critical analysis.

Many thanks for this suggestion. Accordingly, I have added the following sentence to the conclusion.

“Finally, using various NLP techniques on social media posts often results in errors and misclassifications. For example, in [22], 4149 false negatives and 2241 false negatives were identified while using NLP on cyber related social media posts. With all these misclassifications, about 17% accuracy was obtained in [22]. Therefore, strategic decision makers need to be cautious while using social media based cyber intelligence.”
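For readers who want to see how error rates of the kind requested by the reviewer are usually derived, the following sketch computes accuracy, precision, and recall from confusion-matrix counts. The `classification_metrics` helper and the counts passed to it are purely illustrative assumptions; they are not the figures reported in [22] or in this study.

```python
def safe_div(numerator, denominator):
    """Divide, returning 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def classification_metrics(tp, fp, fn, tn):
    """Standard metrics from true/false positive and negative counts."""
    total = tp + fp + fn + tn
    return {
        "accuracy": safe_div(tp + tn, total),   # share of all posts classified correctly
        "precision": safe_div(tp, tp + fp),     # share of flagged posts that were truly relevant
        "recall": safe_div(tp, tp + fn),        # share of relevant posts that were flagged
    }


# Illustrative counts only (assumption), not results from any cited study.
metrics = classification_metrics(tp=8, fp=2, fn=1, tn=9)
print(metrics)
```

Reporting precision and recall alongside accuracy makes it clearer whether misclassifications are mostly false alarms or mostly missed posts.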

Some minor problems that need to be revised.

7. Some phrases are vague like "this term suggests" (Line 585) without specifying what the term concretely refers to. Replacing vague phrases with more definitive statements would add clarity.

Many thanks for this suggestion. Accordingly, the updated manuscript now clearly shows the term in brackets for all cases of “this term suggests”.

I recommend accepting this paper after a minor revision.

Reviewer 2 Report

This is a very good and very interesting manuscript, so I recommend that it be published without any further revisions.

The topic itself, social media analytics on the Russia-Ukraine war using NLP: Perspectives and challenges, is of high general interest, but more importantly the methodological approach is innovative and very interesting. Further, all the analysis is clearly and transparently presented.

The manuscript is well structured, well written, and well argued. The only thing I found a little strange, or rather unexpected, in the flow of the text is that the Discussion section opens with a comparison between Australia and China, while no individual countries included in the analysis had been mentioned before.

The bibliography is adequate and indicates that the author has a grasp of the literature beyond the topic of cyber security.

In the conclusions the author claims that this analysis is groundbreaking, and I believe that such an optimistic statement is actually right in this context as the approach seems to be generalisable and applicable to other topics of analysis.

Author Response

Reviewer 3 Report

In this paper, the authors conducted a comprehensive analysis of the crucial role of social media-based cyber intelligence in understanding Russia's cyber threats during the ongoing Russo-Ukrainian conflict. An innovative multidimensional cyber intelligence framework was proposed. Using 37,386 tweets originating from 30,706 users in 54 languages from 13 October 2022 to 6 April 2023, the authors reported the first detailed multilingual analysis of the Russia-Ukraine cyber crisis in 4 cyber dimensions (geopolitical and socioeconomic, targeted victim, psychological and societal, and national priority and concerns). This study also highlights challenges faced in harnessing reliable social media-based cyber intelligence.

Overall, the research work presented in the paper is interesting and good. There are only minor issues to further improve the paper.

1. It would be good if the authors could add a paragraph at the end of the introduction section to describe the structure of the rest of the paper to increase the readability of the paper.

2. It would be good if the authors could provide, at the end of the Background Context and Literature section, a brief summary of the literature review that helps readers understand the research gaps.

3. What is this study’s limitation?

Author Response

Overall, the research work presented in the paper is interesting and good. There are only minor issues to further improve the paper.

It would be good if the authors could add a paragraph at the end of the introduction section to describe the structure of the rest of the paper to increase the readability of the paper.

Many thanks for this suggestion. I concur that adding the description of the structure of this paper at the end of introduction would improve the overall quality of the manuscript. Hence, I have added the following new paragraph:

“In the next section, background and contextual information on the Russia-Ukraine cyber war, multi-dimensional analysis of cyber-threat, and NLP based Tweet analysis is provided. Then, in Section 3 (i.e., Materials and Methods), the detailed steps, flow chart, and algorithms are provided for NLP based Tweet analysis for harnessing Russia-Ukraine cyber intelligence. Finally, results, discussions, and concluding remarks are detailed in Sections 4, 5, and 6, respectively.”

It would be good if the authors could provide a brief summary of the literature review which helps the readers to understand the research gaps at the end of the Background Context and Literature section

Many thanks for the suggestion. Accordingly, I have added the following new paragraph at the end of background context and literature section.

“Even though the papers in [21] [23] [24] [25] sporadically used various NLP based techniques, this study provides the most comprehensive use of NLP based techniques in a systematic manner. Hence this study leads to gradual improvements upon the research work presented in [21] [23] [24] [25] [28] [27] [29].”

What is this study’s limitation?

Many thanks for this suggestion. I concur that adding the limitations of this study would improve the overall quality of the manuscript. Hence, I have added the following new paragraph:

“These 12 challenges briefly depict the limitations of the social-media based cyber intelligence methodology that was portrayed in this study. The most crucial limitations of this study are not identifying information coming from fake Twitter users [88], and misinformation (or fake information) generated by organized entities as a part of information operations [89] [90]. Another limitation of this study was its reliance on third-party APIs and black-box algorithms, which meant that hyperparameters could not be fine-tuned for optimization. Moreover, while analyzing social media posts from multiple social-media platforms (e.g., Twitter, Facebook, Instagram etc.), information alignment becomes a critical challenge, and this study did not provide a solution to this critical limitation. In our future studies, we endeavor to address these limitations with innovative algorithms and methodologies.”
