Data Mining and Machine Learning in Social Network Analysis

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: 20 January 2025 | Viewed by 7407

Special Issue Editor


Dr. Dionisios Sotiropoulos
Guest Editor
Department of Informatics, University of Piraeus, Karaoli & Dimitriou 80, 18534 Piraeus, Greece
Interests: machine learning; data mining; evolutionary computing; signal processing; digital social networks

Special Issue Information

Dear Colleagues,

Data mining and machine learning have found significant applications in the realm of social networks, transforming the way we understand and interact with online communities. These technologies enable the extraction of valuable insights from the massive volumes of data generated on platforms like Facebook, Twitter, and Instagram. By analyzing user behavior, content interactions, and network structures, data mining uncovers hidden patterns and trends that inform personalized content recommendations, targeted advertising, and even sentiment analysis. Machine learning algorithms, on the other hand, play a pivotal role in predicting user preferences, identifying influencers, and detecting anomalies such as fake accounts or cyberbullying.

In the context of social networks, the synergy of data mining and machine learning drives the development of recommendation systems that cater to individual interests, enhancing user engagement and retention. Moreover, the integration of these technologies allows platforms to combat the spread of misinformation and harmful content by recognizing patterns of virality and identifying sources of fake news. As social networks continue to evolve, data mining and machine learning promise to reshape user experiences, fostering more personalized, secure, and socially responsible interactions in the digital landscape. However, ethical considerations surrounding data privacy, algorithmic biases, and potential misuse highlight the need for a thoughtful and balanced approach in leveraging these technologies for the benefit of both users and society as a whole.

This Special Issue will accept publications that fall within the following research topics:

  • Community detection: development of novel machine learning algorithms to identify and classify communities within social networks based on structural and behavioral patterns.
  • Influence propagation modeling: investigating machine learning approaches to model and predict the spread of influence and information within social networks.
  • Anomaly detection: design of techniques using machine learning to detect anomalous behaviors and activities within social networks, such as bots, spam, and unusual user interactions.
  • Link prediction: exploring predictive models using machine learning to forecast future connections between users or entities in social networks.
  • Sentiment analysis: development of advanced sentiment analysis methods using machine learning to understand and predict user emotions and opinions within social media posts.
  • User profiling and personalization: utilizing machine learning to create accurate user profiles for personalized content recommendation and targeted advertising in social networks.
  • Fake news detection: designing machine learning algorithms to identify and combat the dissemination of fake news and misinformation within social networks.
  • Opinion dynamics modeling: investigating how machine learning can be employed to model the evolution of opinions and beliefs in social networks over time.
  • Network evolution prediction: development of predictive models using machine learning to anticipate changes and shifts in the structure and dynamics of social networks.
  • Graph representation learning: exploring techniques for learning informative node and graph embeddings in social networks, enhancing various downstream tasks.
  • Network robustness analysis: using machine learning to study the vulnerability and resilience of social networks against attacks, failures, and cascading events.
  • Privacy preservation: researching machine learning methods to analyze and mitigate privacy risks in social networks while preserving data utility.
  • Temporal network analysis: development of models using machine learning to analyze the temporal dynamics of social networks and capture patterns of interactions over time.
  • Behavioral pattern recognition: designing algorithms that utilize machine learning to recognize recurring behavioral patterns and trends within social network activities.
  • Cross-network analysis: investigating methods to combine information from multiple social networks or platforms using machine learning to gain deeper insights.
  • Network visualization: exploring machine learning-driven visualization techniques to represent complex social network structures and interactions in interpretable ways.
  • Opinion leader identification: developing machine learning approaches to identify influential users and opinion leaders within social networks based on their impact and interactions.
  • Gender and demographic analysis: using machine learning to infer user gender, age, and other demographics from their social network activities, enabling targeted studies.
  • Network fairness and bias: researching machine learning techniques to identify and mitigate biases in social network algorithms that can lead to unfair outcomes.
  • Multi-modal social network analysis: combining textual, visual, and other modalities in social network data using machine learning for a comprehensive understanding of user interactions.

Dr. Dionisios Sotiropoulos
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • data mining
  • machine learning
  • recommendation systems
  • social networks

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (7 papers)

Research

30 pages, 3530 KiB  
Article
Spotting Leaders in Organizations with Graph Convolutional Networks, Explainable Artificial Intelligence, and Automated Machine Learning
by Yunbo Xie, Jose D. Meisel, Carlos A. Meisel, Juan Jose Betancourt, Jianqi Yan and Roberto Bugiolacchi
Appl. Sci. 2024, 14(20), 9461; https://doi.org/10.3390/app14209461 - 16 Oct 2024
Viewed by 313
Abstract
Over the past few decades, the study of leadership theory has expanded across various disciplines, delving into the intricacies of human behavior and defining the roles of individuals within organizations. Its primary objective is to identify leaders who play significant roles in the communication flow. In addition, behavioral theory posits that leaders can be distinguished based on their daily conduct, while social network analysis provides valuable insights into behavioral patterns. Our study investigates five and six types of social networks frequently observed in different organizations. This study is conducted using datasets we collected from an IT company and public datasets collected from a manufacturing company for the thorough evaluation of prediction performance. We leverage PageRank and effective word embedding techniques to obtain novel features. State-of-the-art performance is obtained using various statistical machine learning methods, graph convolutional networks (GCN), automated machine learning (AutoML), and explainable artificial intelligence (XAI). More specifically, our approach can achieve state-of-the-art performance with an accuracy close to 90% for leader identification with data from projects of different types. This investigation contributes to the establishment of sustainable leadership practices by aiding organizations in retaining their leadership talent.
(This article belongs to the Special Issue Data Mining and Machine Learning in Social Network Analysis)

23 pages, 5384 KiB  
Article
An Evaluation of the Maternal Patient Experience through Natural Language Processing Techniques: The Case of Twitter Data in the United States during COVID-19
by Debapriya Banik, Sreenath Chalil Madathil, Amit Joe Lopes, Sergio A. Luna Fong and Santosh K. Mukka
Appl. Sci. 2024, 14(19), 8762; https://doi.org/10.3390/app14198762 - 28 Sep 2024
Viewed by 534
Abstract
The healthcare sector constantly investigates ways to improve patient outcomes and provide more patient-centered care. Delivering quality medical care involves ensuring that patients have a positive experience. Most healthcare organizations use patient survey feedback to measure patients’ experiences. However, the power of social media can be harnessed using artificial intelligence and machine learning techniques to provide researchers with valuable insights into understanding patient experience and care. Our primary research objective is to develop a social media analytics model to evaluate the maternal patient experience during the COVID-19 pandemic. We used the “COVID-19 Tweets” Dataset, which has over 28 million tweets, and extracted tweets from the US with words relevant to maternal patients. The maternal patient cohort was selected because the United States has the highest maternal mortality and morbidity rates among developed countries. We evaluated patient experience using natural language processing (NLP) techniques such as word clouds, word clustering, frequency analysis, and network analysis of words that relate to “pains” and “gains” regarding the maternal patient experience, which are expressed through social media. The pandemic showcased the worries of mothers and providers on the risks of COVID-19. However, many people also shared how they survived the pandemic. Both providers and maternal patients had concerns regarding the pregnancy risks due to COVID-19. This model will help process improvement experts without domain expertise to understand the various domain challenges efficiently. Such insights can help decision-makers improve the patient care system.

22 pages, 3950 KiB  
Article
Visual Censorship: A Deep Learning-Based Approach to Preventing the Leakage of Confidential Content in Images
by Abigail Paradise Vit, Yarden Aronson, Raz Fraidenberg and Rami Puzis
Appl. Sci. 2024, 14(17), 7915; https://doi.org/10.3390/app14177915 - 5 Sep 2024
Viewed by 517
Abstract
Online social networks (OSNs) are fertile ground for information sharing and public relationships. However, the uncontrolled dissemination of information poses a significant risk of the inadvertent disclosure of sensitive information. This poses a notable challenge to the information security of many organizations. Improving organizations’ ability to automatically identify data leaked within image-based content requires specialized techniques. In contrast to traditional vision-based tasks, detecting data leaked within images presents a unique challenge due to the context-dependent nature and sparsity of the target objects, as well as the possibility that these objects may appear in an image inadvertently as background or small elements rather than as the central focus of the image. In this paper, we investigated the ability of multiple state-of-the-art deep learning methods to detect censored objects in an image. We conducted a case study utilizing Instagram images published by members of a large organization. Six types of objects that were not intended for public exposure were detected with an average accuracy of 0.9454 and an average macro F1-score of 0.658. A further analysis of relevant OSN images revealed that many contained confidential information, exposing the organization and its members to security risks.

16 pages, 339 KiB  
Article
RumorLLM: A Rumor Large Language Model-Based Fake-News-Detection Data-Augmentation Approach
by Jianqiao Lai, Xinran Yang, Wenyue Luo, Linjiang Zhou, Langchen Li, Yongqi Wang and Xiaochuan Shi
Appl. Sci. 2024, 14(8), 3532; https://doi.org/10.3390/app14083532 - 22 Apr 2024
Cited by 4 | Viewed by 2209
Abstract
With the rapid development of the Internet and social media, false information, rumors, and misleading content have become pervasive, posing significant threats to public opinion and social stability, and even causing serious societal harm. This paper introduces a novel solution to address the challenges of fake news detection, presenting the “Rumor Large Language Models” (RumorLLM), a large language model finetuned with rumor writing styles and content. The key contributions include the development of RumorLLM and a data-augmentation method for small categories, effectively mitigating the issue of category imbalance in real-world fake-news datasets. Experimental results on the BuzzFeed and PolitiFact datasets demonstrate the superiority of the proposed model over baseline methods, particularly in F1 score and AUC-ROC. The model’s robust performance highlights its effectiveness in handling imbalanced datasets and provides a promising solution to the pressing issue of false-information proliferation.

23 pages, 4610 KiB  
Article
Exploring the Performance of Continuous-Time Dynamic Link Prediction Algorithms
by Raphaël Romero, Maarten Buyl, Tijl De Bie and Jefrey Lijffijt
Appl. Sci. 2024, 14(8), 3516; https://doi.org/10.3390/app14083516 - 22 Apr 2024
Viewed by 798
Abstract
Dynamic Link Prediction (DLP) addresses the prediction of future links in evolving networks. However, accurately portraying the performance of DLP algorithms poses challenges that might impede progress in the field. Importantly, common evaluation pipelines usually calculate ranking or binary classification metrics, where the scores of observed interactions (positives) are compared with those of randomly generated ones (negatives). However, a single metric is not sufficient to fully capture the differences between DLP algorithms, and is prone to overly optimistic performance evaluation. Instead, an in-depth evaluation should reflect performance variations across different nodes, edges, and time segments. In this work, we contribute tools to perform such a comprehensive evaluation. (1) We propose Birth–Death diagrams, a simple but powerful visualization technique that illustrates the effect of time-based train–test splitting on the difficulty of DLP on a given dataset. (2) We describe an exhaustive taxonomy of negative sampling methods that can be used at evaluation time. (3) We carry out an empirical study of the effect of the different negative sampling strategies. Our comparison between heuristics and state-of-the-art memory-based methods on various real-world datasets confirms a strong effect of using different negative sampling strategies on the test area under the curve (AUC). Moreover, we conduct a visual exploration of the prediction, with additional insights on which different types of errors are prominent over time.

24 pages, 501 KiB  
Article
Outlier Detection and Prediction in Evolving Communities
by Nikolaos Sachpenderis and Georgia Koloniari
Appl. Sci. 2024, 14(6), 2356; https://doi.org/10.3390/app14062356 - 11 Mar 2024
Cited by 1 | Viewed by 903
Abstract
Community detection in social networks is of great importance and is used in a variety of applications such as recommendation systems and targeted advertising. While detecting dense groups with high levels of connectivity and similar interests between their members is the main target of traditional network analysis, finding network members with quite different behavior than the majority of nodes is important as well. These nodes are known as outliers, and their accurate detection can be very useful; when outliers are marked as noisy nodes, their early exclusion from analysis can lead to high computational profits. On the other hand, they can represent interesting components that call for further investigation to find the reasons for their outlying behavior and possible ways to include them in a neighboring community. Both community and outlier detection are challenging in temporal environments where changes occur in real time; thus, dynamic methods need to be deployed rather than static methods. In our work, we take into account the content of the network, in contrast to most related studies, where only the network’s structure contributes to community formation. We define an adaptive outlier score to be assigned to each node in order to quantify its outlierness, and introduce a complete online community detection algorithm that analyzes both the network’s structure and content while at the same time detecting community outliers. To evaluate our method, we retrieved and processed two real datasets regarding social networks with temporal and content information. Experimental results show that our method is capable of detecting outliers in real-time evolving communities and provides an outlier score which is a better metric of each node’s outlierness compared to widely used metrics. Finally, experimental results indicate that our method is suitable for predicting the status of future nodes based on their current outlier score.

19 pages, 1345 KiB  
Article
Two-Stage Dimensionality Reduction for Social Media Engagement Classification
by Jose Luis Vieira Sobrinho, Flavio Henrique Teles Vieira and Alisson Assis Cardoso
Appl. Sci. 2024, 14(3), 1269; https://doi.org/10.3390/app14031269 - 3 Feb 2024
Cited by 1 | Viewed by 897
Abstract
The high dimensionality of real-life datasets is one of the biggest challenges in the machine learning field. Due to the increased need for computational resources, the higher the dimension of the input data is, the more difficult the learning task will be, a phenomenon commonly referred to as the curse of dimensionality. Laying the paper’s foundation based on this premise, we propose a two-stage dimensionality reduction (TSDR) method for data classification. The first stage extracts high-quality features to a new subset by maximizing the pairwise separation probability, with the aim of avoiding overlap between individuals from different classes that are close to one another, also known as the class masking problem. The second stage takes the previous resulting subset and transforms it into a reduced final space in a way that maximizes the distance between the cluster centers of different classes while also minimizing the dispersion of instances within the same class. Hence, the second stage aims to improve the accuracy of the succeeding classifier by lowering its sensitivity to an imbalanced distribution of instances between different classes. Experiments on benchmark and social media datasets show how promising the proposed method is over some well-established algorithms, especially regarding social media engagement classification.

Planned Papers

The list below contains only planned manuscripts. Some of these manuscripts have not yet been received by the Editorial Office. Papers submitted to MDPI journals are subject to peer review.

Title: Visual Censorship: A Deep Learning-Based Approach to Prevent Leakage of Confidential Content in Images
Author: Paradise Vit
Highlights: We investigate the ability of state-of-the-art deep learning methods to detect censored objects in images; We demonstrate the effectiveness of our method with real-world organizational data collected from Instagram. Images published by organization members contained censored objects, exposing the organization to potential security risks. This highlights the need for robust methods to detect and mitigate data leakage on OSNs.
