Data Mining and Machine Learning in Social Network Analysis

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 January 2025) | Viewed by 25972

Special Issue Editor

Dr. Dionisios Sotiropoulos

Guest Editor

Department of Informatics, University of Piraeus, Karaoli & Dimitriou 80, 18534 Piraeus, Greece
Interests: machine learning; data mining; evolutionary computing; signal processing; digital social networks

Special Issue Information

Dear Colleagues,

Data mining and machine learning have found significant applications in the realm of social networks, transforming the way we understand and interact with online communities. These technologies enable the extraction of valuable insights from the massive volumes of data generated on platforms like Facebook, Twitter, and Instagram. By analyzing user behavior, content interactions, and network structures, data mining uncovers hidden patterns and trends that inform personalized content recommendations, targeted advertising, and even sentiment analysis. Machine learning algorithms, on the other hand, play a pivotal role in predicting user preferences, identifying influencers, and detecting anomalies such as fake accounts or cyberbullying.

In the context of social networks, the synergy of data mining and machine learning drives the development of recommendation systems that cater to individual interests, enhancing user engagement and retention. Moreover, the integration of these technologies allows platforms to combat the spread of misinformation and harmful content by recognizing patterns of virality and identifying sources of fake news. As social networks continue to evolve, data mining and machine learning promise to reshape user experiences, fostering more personalized, secure, and socially responsible interactions in the digital landscape. However, ethical considerations surrounding data privacy, algorithmic biases, and potential misuse highlight the need for a thoughtful and balanced approach in leveraging these technologies for the benefit of both users and society as a whole.

This Special Issue will accept publications that fall within the following research topics:

Community detection: development of novel machine learning algorithms to identify and classify communities within social networks based on structural and behavioral patterns.
Influence propagation modeling: investigating machine learning approaches to model and predict the spread of influence and information within social networks.
Anomaly detection: design of techniques using machine learning to detect anomalous behaviors and activities within social networks, such as bots, spam, and unusual user interactions.
Link prediction: exploring predictive models using machine learning to forecast future connections between users or entities in social networks.
Sentiment analysis: development of advanced sentiment analysis methods using machine learning to understand and predict user emotions and opinions within social media posts.
User profiling and personalization: utilizing machine learning to create accurate user profiles for personalized content recommendation and targeted advertising in social networks.
Fake news detection: designing machine learning algorithms to identify and combat the dissemination of fake news and misinformation within social networks.
Opinion dynamics modeling: investigating how machine learning can be employed to model the evolution of opinions and beliefs in social networks over time.
Network evolution prediction: development of predictive models using machine learning to anticipate changes and shifts in the structure and dynamics of social networks.
Graph representation learning: exploring techniques for learning informative node and graph embeddings in social networks, enhancing various downstream tasks.
Network robustness analysis: using machine learning to study the vulnerability and resilience of social networks against attacks, failures, and cascading events.
Privacy preservation: researching machine learning methods to analyze and mitigate privacy risks in social networks while preserving data utility.
Temporal network analysis: development of models using machine learning to analyze the temporal dynamics of social networks and capture patterns of interactions over time.
Behavioral pattern recognition: designing algorithms that utilize machine learning to recognize recurring behavioral patterns and trends within social network activities.
Cross-network analysis: investigating methods to combine information from multiple social networks or platforms using machine learning to gain deeper insights.
Network visualization: exploring machine learning-driven visualization techniques to represent complex social network structures and interactions in interpretable ways.
Opinion leader identification: developing machine learning approaches to identify influential users and opinion leaders within social networks based on their impact and interactions.
Gender and demographic analysis: using machine learning to infer user gender, age, and other demographics from their social network activities, enabling targeted studies.
Network fairness and bias: researching machine learning techniques to identify and mitigate biases in social network algorithms that can lead to unfair outcomes.
Multi-modal social network analysis: combining textual, visual, and other modalities in social network data using machine learning for a comprehensive understanding of user interactions.

Dr. Dionisios Sotiropoulos
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

data mining
machine learning
recommendation systems
social networks

Benefits of Publishing in a Special Issue

Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
Reprint: MDPI Books provides the opportunity to republish successful Special Issues in book format, both online and in print.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (10 papers)

Research

22 pages, 818 KB

Open Access Article

Detecting Fake Reviews Using Aspect-Based Sentiment Analysis and Graph Convolutional Networks

by Prathana Phukon, Petros Potikas and Katerina Potika

Appl. Sci. 2025, 15(7), 3771; https://doi.org/10.3390/app15073771 - 29 Mar 2025

Viewed by 2399

Abstract

Online reviews significantly influence consumer behavior and business reputations. Detecting fake reviews is important for maintaining trust and integrity on these platforms. We present an aspect-based sentiment analysis approach, referred to as FakeDetectionGCN, to distinguish genuine feedback from deceptive content. The idea is to analyze sentiments related to specific aspects (features) within reviews. Graph convolutional networks are used to model the complex contextual dependencies in the review texts. Additionally, SenticNet, an external semantic resource, is integrated to enhance the understanding of sentiments in the reviews. This model is capable of identifying both human-generated (genuine) and computer-generated (fake) reviews. It has been evaluated on two types of datasets and has shown strong performance across both. Through this work, we contribute to the effective detection of fake reviews and to maintaining a trustworthy online review ecosystem. Full article

(This article belongs to the Special Issue Data Mining and Machine Learning in Social Network Analysis)

21 pages, 721 KB

Open Access Article

Be Sure to Use the Same Writing Style: Applying Authorship Verification on Large-Language-Model-Generated Texts

by Janith Weerasinghe, Ovendra Seepersaud, Genesis Smothers, Julia Jose and Rachel Greenstadt

Appl. Sci. 2025, 15(5), 2467; https://doi.org/10.3390/app15052467 - 25 Feb 2025

Viewed by 2511

Abstract

Recently, there have been significant advances and wide-scale use of generative AI in natural language generation. Models such as OpenAI’s GPT3 and Meta’s LLaMA are widely used in chatbots, to summarize documents, and to generate creative content. These advances raise concerns about abuses of these models, especially in social media settings, such as large-scale generation of disinformation, manipulation campaigns that use AI-generated content, and personalized scams. We used stylometry (the analysis of style in natural language text) to analyze the style of AI-generated text. Specifically, we applied an existing authorship verification (AV) model, which can predict whether two documents were written by the same author, to texts generated by GPT2, GPT3, ChatGPT and LLaMA. Our AV model was trained only on human-written text and was effectively used in social media settings to analyze cases of abuse. We generated texts by providing the language models with fanfiction snippets and prompting them to complete the rest in the same writing style as the original snippet. We then applied the AV model across the texts generated by the language models and the human-written texts to analyze the similarity of the writing styles between these texts. We found that texts generated with GPT2 had the highest similarity to the human texts. Texts generated by GPT3 and ChatGPT were very different from the human snippet, and were similar to each other. LLaMA-generated texts had some similarity to the original snippet but also had similarities with other LLaMA-generated texts and texts from other models. We then conducted a feature analysis to identify the features that drive these similarity scores. This analysis helped us answer questions like which features distinguish the language style of language models and humans, which features are different across different models, and how these linguistic features change over different language model versions. The dataset and the source code used in this analysis have been made public to allow for further analysis of new language models. Full article

(This article belongs to the Special Issue Data Mining and Machine Learning in Social Network Analysis)

27 pages, 1831 KB

Open Access Article

A Multi-Architecture Approach for Offensive Language Identification Combining Classical Natural Language Processing and BERT-Variant Models

by Ashok Yadav, Farrukh Aslam Khan and Vrijendra Singh

Appl. Sci. 2024, 14(23), 11206; https://doi.org/10.3390/app142311206 - 1 Dec 2024

Cited by 1 | Viewed by 2613

Abstract

Offensive content is a complex and multifaceted form of harmful material that targets individuals or groups. In recent years, offensive language (OL) has become increasingly harmful, as it incites violence and intolerance. The automatic identification of OL on social networks is essential to curtail the spread of harmful content. We address this problem by developing an architecture to effectively respond to and mitigate the impact of offensive content on society. In this paper, we use the Davidson dataset containing 24,783 samples of tweets and propose three different architectures for detecting OL on social media platforms. Our proposed approach involves concatenation of features (TF-IDF, Word2Vec, sentiments, and FKRA/FRE) and a baseline machine learning model for the classification. We explore the effectiveness of different dimensions of GloVe embeddings in conjunction with deep learning models for classifying OL. We also propose an architecture that utilizes advanced transformer models such as BERT, ALBERT, and ELECTRA for pre-processing and encoding, with 1D CNN and neural network layers serving as the classification components. We achieve the highest precision, recall, and F1 score (0.89, 0.90, and 0.90, respectively) for both the “bert encased preprocess/1 + small bert/L4H512A8/1 + neural network layers” model and the “bert encased preprocess/1 + electra small/2 + cnn” architecture. Full article

(This article belongs to the Special Issue Data Mining and Machine Learning in Social Network Analysis)

30 pages, 3530 KB

Open Access Article

Spotting Leaders in Organizations with Graph Convolutional Networks, Explainable Artificial Intelligence, and Automated Machine Learning

by Yunbo Xie, Jose D. Meisel, Carlos A. Meisel, Juan Jose Betancourt, Jianqi Yan and Roberto Bugiolacchi

Appl. Sci. 2024, 14(20), 9461; https://doi.org/10.3390/app14209461 - 16 Oct 2024

Viewed by 1535

Abstract

Over the past few decades, the study of leadership theory has expanded across various disciplines, delving into the intricacies of human behavior and defining the roles of individuals within organizations. Its primary objective is to identify leaders who play significant roles in the communication flow. In addition, behavioral theory posits that leaders can be distinguished based on their daily conduct, while social network analysis provides valuable insights into behavioral patterns. Our study investigates five and six types of social networks frequently observed in different organizations. This study is conducted using datasets we collected from an IT company and public datasets collected from a manufacturing company for the thorough evaluation of prediction performance. We leverage PageRank and effective word embedding techniques to obtain novel features. State-of-the-art performance is obtained using various statistical machine learning methods, graph convolutional networks (GCN), automated machine learning (AutoML), and explainable artificial intelligence (XAI). More specifically, our approach can achieve state-of-the-art performance with an accuracy close to 90% for leader identification with data from projects of different types. This investigation contributes to the establishment of sustainable leadership practices by aiding organizations in retaining their leadership talent. Full article

(This article belongs to the Special Issue Data Mining and Machine Learning in Social Network Analysis)

23 pages, 5384 KB

Open Access Article

An Evaluation of the Maternal Patient Experience through Natural Language Processing Techniques: The Case of Twitter Data in the United States during COVID-19

by Debapriya Banik, Sreenath Chalil Madathil, Amit Joe Lopes, Sergio A. Luna Fong and Santosh K. Mukka

Appl. Sci. 2024, 14(19), 8762; https://doi.org/10.3390/app14198762 - 28 Sep 2024

Viewed by 1664

Abstract

The healthcare sector constantly investigates ways to improve patient outcomes and provide more patient-centered care. Delivering quality medical care involves ensuring that patients have a positive experience. Most healthcare organizations use patient survey feedback to measure patients’ experiences. However, the power of social media can be harnessed using artificial intelligence and machine learning techniques to provide researchers with valuable insights into understanding patient experience and care. Our primary research objective is to develop a social media analytics model to evaluate the maternal patient experience during the COVID-19 pandemic. We used the “COVID-19 Tweets” Dataset, which has over 28 million tweets, and extracted tweets from the US with words relevant to maternal patients. The maternal patient cohort was selected because the United States has the highest maternal mortality and morbidity rates among the developed countries in the world. We evaluated patient experience using natural language processing (NLP) techniques such as word clouds, word clustering, frequency analysis, and network analysis of words that relate to “pains” and “gains” regarding the maternal patient experience, which are expressed through social media. The pandemic showcased the worries of mothers and providers about the risks of COVID-19. However, many people also shared how they survived the pandemic. Both providers and maternal patients had concerns regarding the pregnancy risks due to COVID-19. This model will help process improvement experts without domain expertise to understand the various domain challenges efficiently. Such insights can help decision-makers improve the patient care system. Full article

(This article belongs to the Special Issue Data Mining and Machine Learning in Social Network Analysis)

22 pages, 3950 KB

Open Access Article

Visual Censorship: A Deep Learning-Based Approach to Preventing the Leakage of Confidential Content in Images

by Abigail Paradise Vit, Yarden Aronson, Raz Fraidenberg and Rami Puzis

Appl. Sci. 2024, 14(17), 7915; https://doi.org/10.3390/app14177915 - 5 Sep 2024

Cited by 1 | Viewed by 2808

Abstract

Online social networks (OSNs) are fertile ground for information sharing and public relationships. However, the uncontrolled dissemination of information poses a significant risk of the inadvertent disclosure of sensitive information. This poses a notable challenge to the information security of many organizations. Improving organizations’ ability to automatically identify data leaked within image-based content requires specialized techniques. In contrast to traditional vision-based tasks, detecting data leaked within images presents a unique challenge due to the context-dependent nature and sparsity of the target objects, as well as the possibility that these objects may appear in an image inadvertently as background or small elements rather than as the central focus of the image. In this paper, we investigated the ability of multiple state-of-the-art deep learning methods to detect censored objects in an image. We conducted a case study utilizing Instagram images published by members of a large organization. Six types of objects that were not intended for public exposure were detected with an average accuracy of 0.9454 and an average macro F1-score of 0.658. A further analysis of relevant OSN images revealed that many contained confidential information, exposing the organization and its members to security risks. Full article

(This article belongs to the Special Issue Data Mining and Machine Learning in Social Network Analysis)

16 pages, 339 KB

Open Access Article

RumorLLM: A Rumor Large Language Model-Based Fake-News-Detection Data-Augmentation Approach

by Jianqiao Lai, Xinran Yang, Wenyue Luo, Linjiang Zhou, Langchen Li, Yongqi Wang and Xiaochuan Shi

Appl. Sci. 2024, 14(8), 3532; https://doi.org/10.3390/app14083532 - 22 Apr 2024

Cited by 21 | Viewed by 5380

Abstract

With the rapid development of the Internet and social media, false information, rumors, and misleading content have become pervasive, posing significant threats to public opinion and social stability, and even causing serious societal harm. This paper introduces a novel solution to address the challenges of fake news detection, presenting the “Rumor Large Language Models” (RumorLLM), a large language model finetuned with rumor writing styles and content. The key contributions include the development of RumorLLM and a data-augmentation method for small categories, effectively mitigating the issue of category imbalance in real-world fake-news datasets. Experimental results on the BuzzFeed and PolitiFact datasets demonstrate the superiority of the proposed model over baseline methods, particularly in F1 score and AUC-ROC. The model’s robust performance highlights its effectiveness in handling imbalanced datasets and provides a promising solution to the pressing issue of false-information proliferation. Full article

(This article belongs to the Special Issue Data Mining and Machine Learning in Social Network Analysis)

23 pages, 4610 KB

Open Access Article

Exploring the Performance of Continuous-Time Dynamic Link Prediction Algorithms

by Raphaël Romero, Maarten Buyl, Tijl De Bie and Jefrey Lijffijt

Appl. Sci. 2024, 14(8), 3516; https://doi.org/10.3390/app14083516 - 22 Apr 2024

Viewed by 1666

Abstract

Dynamic Link Prediction (DLP) addresses the prediction of future links in evolving networks. However, accurately portraying the performance of DLP algorithms poses challenges that might impede progress in the field. Importantly, common evaluation pipelines usually calculate ranking or binary classification metrics, where the scores of observed interactions (positives) are compared with those of randomly generated ones (negatives). However, a single metric is not sufficient to fully capture the differences between DLP algorithms, and is prone to overly optimistic performance evaluation. Instead, an in-depth evaluation should reflect performance variations across different nodes, edges, and time segments. In this work, we contribute tools to perform such a comprehensive evaluation. (1) We propose Birth–Death diagrams, a simple but powerful visualization technique that illustrates the effect of time-based train–test splitting on the difficulty of DLP on a given dataset. (2) We describe an exhaustive taxonomy of negative sampling methods that can be used at evaluation time. (3) We carry out an empirical study of the effect of the different negative sampling strategies. Our comparison between heuristics and state-of-the-art memory-based methods on various real-world datasets confirms a strong effect of using different negative sampling strategies on the test area under the curve (AUC). Moreover, we conduct a visual exploration of the prediction, with additional insights on which different types of errors are prominent over time. Full article

(This article belongs to the Special Issue Data Mining and Machine Learning in Social Network Analysis)

24 pages, 501 KB

Open Access Article

Outlier Detection and Prediction in Evolving Communities

by Nikolaos Sachpenderis and Georgia Koloniari

Appl. Sci. 2024, 14(6), 2356; https://doi.org/10.3390/app14062356 - 11 Mar 2024

Cited by 2 | Viewed by 1837

Abstract

Community detection in social networks is of great importance and is used in a variety of applications such as recommendation systems and targeted advertising. While detecting dense groups with high levels of connectivity and similar interests between their members is the main target of traditional network analysis, finding network members whose behavior differs markedly from that of the majority of nodes is important as well. These nodes are known as outliers, and their accurate detection can be very useful; when outliers are marked as noisy nodes, their early exclusion from analysis can lead to large computational savings. On the other hand, they can represent interesting components that call for further investigation to find the reasons for their outlying behavior and possible ways to include them in a neighboring community. Both community and outlier detection are challenging in temporal environments where changes occur in real time; thus, dynamic methods need to be deployed rather than static methods. In our work, we take into account the content of the network, in contrast to most related studies, where only the network’s structure contributes to community formation. We define an adaptive outlier score to be assigned to each node in order to quantify its outlierness, and introduce a complete online community detection algorithm that analyzes both the network’s structure and content while at the same time detecting community outliers. To evaluate our method, we retrieved and processed two real datasets regarding social networks with temporal and content information. Experimental results show that our method is capable of detecting outliers in real-time evolving communities and provides an outlier score which is a better metric of each node’s outlierness compared to widely used metrics. Finally, experimental results indicate that our method is suitable for predicting the status of future nodes based on their current outlier score. Full article

(This article belongs to the Special Issue Data Mining and Machine Learning in Social Network Analysis)

19 pages, 1345 KB

Open Access Article

Two-Stage Dimensionality Reduction for Social Media Engagement Classification

by Jose Luis Vieira Sobrinho, Flavio Henrique Teles Vieira and Alisson Assis Cardoso

Appl. Sci. 2024, 14(3), 1269; https://doi.org/10.3390/app14031269 - 3 Feb 2024

Cited by 4 | Viewed by 1481

Abstract

The high dimensionality of real-life datasets is one of the biggest challenges in the machine learning field. Due to the increased need for computational resources, the higher the dimension of the input data is, the more difficult the learning task will be—a phenomenon commonly referred to as the curse of dimensionality. Laying the paper’s foundation based on this premise, we propose a two-stage dimensionality reduction (TSDR) method for data classification. The first stage extracts high-quality features to a new subset by maximizing the pairwise separation probability, with the aim of avoiding overlap between individuals from different classes that are close to one another, also known as the class masking problem. The second stage takes the previous resulting subset and transforms it into a reduced final space in a way that maximizes the distance between the cluster centers of different classes while also minimizing the dispersion of instances within the same class. Hence, the second stage aims to improve the accuracy of the succeeding classifier by lowering its sensitivity to an imbalanced distribution of instances between different classes. Experiments on benchmark and social media datasets show how promising the proposed method is over some well-established algorithms, especially regarding social media engagement classification. Full article

(This article belongs to the Special Issue Data Mining and Machine Learning in Social Network Analysis)
