MDPI - Publisher of Open Access Journals

24 pages, 15793 KB

Open Access Article

AirCalypse: A Case Study of Temporal and User-Behaviour Contrasts in Social Media for Urban Air Pollution Monitoring in New Delhi Before and During COVID-19

by Prithviraj Pramanik, Tamal Mondal, Sirshendu Arosh and Mousumi Saha

Sustainability 2025, 17(19), 8924; https://doi.org/10.3390/su17198924 - 8 Oct 2025

Abstract

Air pollution has become a significant concern for human health, especially in developing countries. Among primary pollutants, particulate matter 2.5 ($\mathrm{PM}_{2.5}$), which refers to airborne particles with a diameter of 2.5 micrometres or less, has become a widely used measure for monitoring air quality globally. The standard method uses Federal Reference Grade sensors to assess air quality, but these are cost-prohibitive, so low-cost (LC) air quality sensors are a popular alternative. Even LC air quality monitors do not cover many areas, especially across the Global South. On the other hand, the ubiquitous use of online social media (OSM) has led to its evolution into a form of participatory sensing. While it does not function as a physical sensor, it can serve as a proxy indicator of public perception of the topic under study. OSM platforms such as Twitter/X and Reddit have already demonstrated their value in understanding human perception across various domains, including air quality monitoring. This study focuses on understanding air pollution in a resource-constrained setting by examining how community perception on social media can complement traditional monitoring. We leverage metadata readily available from social media user data to find patterns with air quality fluctuations before and during the pandemic. We use the US Embassy $\mathrm{PM}_{2.5}$ data as the baseline measurement. In the study, we empirically analyse the variations in quantitative and intent-based community perception during seasonal and pandemic outbreaks with varying air quality. We compare the baseline against temporal and user-specific attributes of Twitter/X, such as the daily frequency of tweets, tweet lags 1–5, user followers, user verification status, and user list memberships, across two timelines: pre-COVID-19 (20 March 2019–29 February 2020) and COVID-19 (1 March 2020–20 September 2020). Our analysis examines both quantitative and intent-based community engagement, highlighting the significance of features such as user authenticity, tweet recurrence rates, and intensity of participation. Furthermore, we show how behavioural patterns in the online discussions diverged across the two periods, reflecting broader shifts in air pollution levels and public attention. This study empirically demonstrates the significance of X/Twitter metadata beyond standard tweet content and provides additional features for modelling and understanding air quality in developing countries.
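The abstract compares daily tweet frequency and "tweet lags 1–5" against the US Embassy $\mathrm{PM}_{2.5}$ baseline but does not define the lag features. A minimal sketch, assuming that lag k means the daily tweet count k days earlier and that the comparison is a Pearson correlation (both assumptions, not the authors' stated method; `pearson` and `lag_correlations` are hypothetical helper names):

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def lag_correlations(tweets_per_day, pm25_per_day, max_lag=5):
    """Correlate PM2.5 on day t with the tweet count on day t - k, for k = 0..max_lag.

    Assumption: both inputs are aligned, gap-free daily series of equal length.
    Lag 0 is the plain same-day frequency comparison.
    """
    if len(tweets_per_day) != len(pm25_per_day):
        raise ValueError("series must cover the same days")
    result = {}
    for k in range(max_lag + 1):
        x = tweets_per_day[: len(tweets_per_day) - k]  # tweet counts on day t - k
        y = pm25_per_day[k:]                            # PM2.5 readings on day t
        result[k] = pearson(x, y)
    return result
```

Because the abstract contrasts two periods, one would run this separately on the pre-COVID-19 and COVID-19 windows rather than on the pooled series.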

(This article belongs to the Special Issue Air Pollution and Sustainability)

19 pages, 4717 KB

Open Access Article

Benchmarking Psychological Lexicons and Large Language Models for Emotion Detection in Brazilian Portuguese

by Thales David Domingues Aparecido, Alexis Carrillo, Chico Q. Camargo and Massimo Stella

AI 2025, 6(10), 249; https://doi.org/10.3390/ai6100249 - 1 Oct 2025

Abstract

Emotion detection in Brazilian Portuguese is less studied than in English. We benchmarked a large language model (Mistral 24B), a language-specific transformer model (BERTimbau), and the lexicon-based EmoAtlas for classifying emotions in Brazilian Portuguese text, with a focus on eight emotions derived from Plutchik’s model. Evaluation covered four corpora: 4000 stock-market tweets, 1000 news headlines, 5000 GoEmotions Reddit comments translated by LLMs, and 2000 DeepSeek-generated headlines. While BERTimbau achieved the highest average scores (accuracy 0.876, precision 0.529, and recall 0.423), an overlap with Mistral (accuracy 0.831, precision 0.522, and recall 0.539) and notable performance variability suggest there is no single top performer; however, both transformer-based models outperformed the lexicon-based EmoAtlas (accuracy 0.797) but required up to 40 times more computational resources. We also introduce a novel “emotional fingerprinting” methodology using a synthetically generated dataset to probe emotional alignment, which revealed an imperfect overlap in the emotional representations of the models. While LLMs deliver higher overall scores, EmoAtlas offers superior interpretability and efficiency, making it a cost-effective alternative. This work delivers the first quantitative benchmark for interpretable emotion detection in Brazilian Portuguese, with open datasets and code to foster research in multilingual natural language processing.
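Lexicon-based tools such as EmoAtlas score text by looking words up in an emotion lexicon. The sketch below illustrates that general idea for Plutchik's eight emotions as a multi-label output; it is not EmoAtlas's API, and the tiny Portuguese lexicon `TOY_LEXICON` is hypothetical and far too small for real use:

```python
import re
from collections import Counter

PLUTCHIK = ("anger", "anticipation", "disgust", "fear", "joy", "sadness", "surprise", "trust")

# Hypothetical toy lexicon (word -> emotions); real lexicons have thousands of entries.
TOY_LEXICON = {
    "medo": {"fear"},
    "feliz": {"joy"},
    "alegria": {"joy"},
    "triste": {"sadness"},
    "raiva": {"anger"},
    "nojo": {"disgust"},
    "surpresa": {"surprise"},
    "confiança": {"trust"},
    "esperança": {"anticipation", "joy"},
}


def emotion_profile(text, lexicon=TOY_LEXICON):
    """Count lexicon hits per Plutchik emotion (Unicode-aware tokenization)."""
    counts = Counter()
    for token in re.findall(r"\w+", text.lower()):
        for emotion in lexicon.get(token, ()):
            counts[emotion] += 1
    return {emotion: counts[emotion] for emotion in PLUTCHIK}


def detected_emotions(text, lexicon=TOY_LEXICON, min_hits=1):
    """Multi-label output: every emotion with at least `min_hits` lexicon matches."""
    return {e for e, n in emotion_profile(text, lexicon).items() if n >= min_hits}
```

For example, `detected_emotions("Estou feliz, mas com medo do mercado")` returns `{"joy", "fear"}`. This toy version ignores negation and inflection; it only shows why lexicon methods are cheap and interpretable: every label traces back to specific words.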

(This article belongs to the Special Issue Understanding Transformers and Large Language Models (LLMs) with Natural Language Processing (NLP))

15 pages, 1346 KB

Open Access Article

Using Social Media Listening to Characterize the Flare Lexicon in Patients with Sjögren’s Disease

by Chiara Baldini, Maurice Flurie, Zachary Cline, Colton Flowers, Coralie Peter Bouillot, Linda J. Stone, Lauren Dougherty, Christopher DeFelice and Maria Picone

Rheumato 2025, 5(4), 14; https://doi.org/10.3390/rheumato5040014 - 26 Sep 2025

Abstract

Background/Objectives: Sjögren’s disease (SjD) flares are incompletely understood. The patient perspective is critical to closing this gap. This retrospective social media listening (SML) study characterized the flare lexicon within the online Reddit SjD community using novel machine learning and natural language processing. Methods: Documents (posts/comments) were analyzed from the subreddit group “r/Sjogrens” (October 2012 to August 2023). Outcomes were as follows: (1) frequency of documents mentioning flare, and contexts in which flare was mentioned; (2) clinical concepts associated with flare (analyzed using co-occurrence and pointwise mutual information [PMI]); (3) proportion of flare vs. non-flare documents relevant to SYMPTOMS or TESTING (compared using a two-proportion z-test); and (4) primary emotions mentioned in flare documents. Results: Of 59,266 documents from 5025 authors, flare was mentioned 3330 times (4.4% of documents from 19.1% of authors). Flare was discussed as a symptom (1423 instances), disease (13), or with no clinical category (1890). Flare-associated clinical concepts (co-occurrence > 100 and PMI² > 3) included SYMPTOMS (pain, fatigue, dryness of eye, xerostomia, arthralgia, stress) and BODY PARTS (eye, mouth, joints, whole body). More flare vs. non-flare documents mentioned a SYMPTOM, whereas fewer mentioned a TEST (p < 0.001 for both). Within flare documents, 36.5% expressed emotions, primarily fear (40.5% of primary emotions), happiness (17.8%), sadness (15.7%), and anger (15.5%). Conclusions: The SjD community discusses flare frequently and in context with symptoms, specifically pain, eye and mouth dryness, and fatigue. Flare conversations frequently involve negative emotions. Additional research is required to clarify the patient experience of flare, its clinical parameters, and implications.
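The flare vs. non-flare comparison uses a two-proportion z-test. The sketch below is the standard pooled, two-sided form of that test; the counting unit (documents) and the function name are assumptions for illustration, since the abstract does not report the underlying counts:

```python
import math


def two_proportion_z_test(x1, n1, x2, n2):
    """Pooled two-proportion z-test, two-sided.

    x1/n1: e.g. flare documents mentioning a SYMPTOM / all flare documents.
    x2/n2: e.g. non-flare documents mentioning a SYMPTOM / all non-flare documents.
    Returns (z, p_value).
    """
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # common proportion under H0: p1 == p2
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided tail of the standard normal: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

With hypothetical counts, `two_proportion_z_test(50, 100, 30, 100)` gives z ≈ 2.89 and p ≈ 0.004.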

13 pages, 986 KB

Open Access Article

Public Engagement with Lung Cancer Screening Information: Topic Modeling of Lung Cancer-Related Reddit Posts

by Aditi Jaiswal, Samia Amin, Sayed M. S. Amin, Donghee Nicole Lee, Sungshim Lani Park and Pallav Pokhrel

Curr. Oncol. 2025, 32(10), 529; https://doi.org/10.3390/curroncol32100529 - 23 Sep 2025

Abstract

Lung cancer screening (LCS) with low-dose computed tomography is an effective strategy for early detection and improved survival. Despite its clinical benefits, public engagement with LCS as a topic remains unclear, particularly in digital health communities. This study examines the thematic landscape of lung cancer-related discussions on Reddit. Using Python’s Reddit API Wrapper, we collected 109,868 posts from six lung cancer-related subreddits between January 2019 and December 2024. After preprocessing, 105,118 unique posts were analyzed using Latent Dirichlet Allocation topic modeling to identify emergent themes. Topics were qualitatively reviewed and categorized into four high-level themes: treatment, mental health, smoking, and LCS. Mental health (71.82%) and treatment (16.84%) dominated the discourse, followed by smoking (8.30%), while LCS remained underrepresented (3.04%). Despite an increase in overall engagement from 2022 onward, LCS-related posts remained sparse, with no sustained upward trend. Reddit users frequently discuss treatment and mental health concerns related to lung cancer but rarely engage with LCS as a topic, revealing a critical gap in public awareness. These findings highlight the need for targeted public health strategies to promote LCS awareness on social media platforms, leveraging the platforms’ growing role in health communication.
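Latent Dirichlet Allocation can be fitted in several ways, and libraries such as gensim and scikit-learn provide implementations; the abstract does not say which one was used. To show what the model actually estimates, here is a minimal collapsed Gibbs sampler in pure Python, assuming posts are already tokenized into integer word ids. The hyperparameters `alpha` and `beta` are illustrative defaults, not values from the study:

```python
import random


def lda_gibbs(docs, n_topics, vocab_size, alpha=0.1, beta=0.01, iterations=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch).

    docs: list of documents, each a list of integer word ids in [0, vocab_size).
    Returns (theta, phi): per-document topic proportions and per-topic word distributions.
    """
    rng = random.Random(seed)
    topics = range(n_topics)
    doc_topic = [[0] * n_topics for _ in docs]                # n_{d,k}
    topic_word = [[0] * vocab_size for _ in topics]           # n_{k,w}
    topic_total = [0] * n_topics                              # n_k
    assignments = []

    # Start from a random topic for every token.
    for d, doc in enumerate(docs):
        z_d = []
        for w in doc:
            k = rng.randrange(n_topics)
            z_d.append(k)
            doc_topic[d][k] += 1
            topic_word[k][w] += 1
            topic_total[k] += 1
        assignments.append(z_d)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                k = assignments[d][i]
                # Remove the token's current assignment from the counts...
                doc_topic[d][k] -= 1
                topic_word[k][w] -= 1
                topic_total[k] -= 1
                # ...then resample its topic from the collapsed full conditional.
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in topics
                ]
                k = rng.choices(topics, weights=weights)[0]
                assignments[d][i] = k
                doc_topic[d][k] += 1
                topic_word[k][w] += 1
                topic_total[k] += 1

    theta = [
        [(doc_topic[d][t] + alpha) / (len(doc) + n_topics * alpha) for t in topics]
        for d, doc in enumerate(docs)
    ]
    phi = [
        [(topic_word[t][w] + beta) / (topic_total[t] + vocab_size * beta) for w in range(vocab_size)]
        for t in topics
    ]
    return theta, phi
```

The study's next step, reviewing topics and grouping them into the four high-level themes, was qualitative and is not part of this sketch.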

(This article belongs to the Section Thoracic Oncology)

17 pages, 2139 KB

Open Access Article

Decoding Digital Labor: A Topic Modeling Analysis of Platform Work Experiences

by Oya Ütük Bayılmış and Serdar Orhan

Systems 2025, 13(9), 819; https://doi.org/10.3390/systems13090819 - 18 Sep 2025

Abstract

The growing prevalence of digital labor platforms has fundamentally transformed business models by creating interconnected value systems that redefine how work is organized, delivered, and monetized in today’s digital economy. This study examines platform-based business model innovation through the lens of value co-creation processes, analyzing user-generated content from digital work platforms including Reddit, FlexJobs, Toptal, and Deel. Using Latent Dirichlet Allocation (LDA) topic modeling on 342 semantically filtered reviews from platform workers, we identified six key themes characterizing stakeholder experiences: User Experience and Platform Evaluation (23.77%), Financial Concerns and Time Management (18.49%), Platform Satisfaction and Recommendation System (16.60%), Paid Services and Investment Strategies (15.09%), Job Search Processes and Remote Work Alternatives (13.96%), and Overall Platform Performance and Account Management (12.08%). These findings reveal how digital platforms create value through complex interactions between technology infrastructure, governance mechanisms, and stakeholder experiences within interconnected ecosystems. The dominance of user experience concerns over purely economic considerations challenges traditional labor economics frameworks and highlights the critical role of platform design in worker satisfaction. Our analysis demonstrates that successful platform business models depend on balancing technological capabilities with human-centered value propositions, requiring innovative approaches to ecosystem orchestration, stakeholder engagement, and value distribution. The study contributes to understanding how digital business models can leverage interconnected value systems to drive sustainable innovation, offering strategic insights for platform design, ecosystem governance, and business model optimization in the digital era.

(This article belongs to the Special Issue Business Model Innovation in the Digital Era)

12 pages, 902 KB

Open Access Article

Mapping the Infodemic: Geolocating Reddit Users and Unsupervised Topic Modeling of COVID-19-Related Misinformation

by Lulu Alarfaj, Jeremy Blackburn, Maaz Amjad, Jay Patel and Zeynep Ertem

Information 2025, 16(9), 748; https://doi.org/10.3390/info16090748 - 28 Aug 2025

Abstract

This study tackles the problem of geolocating Reddit users without access to the author information API. Using subreddit data, we inferred user locations from their interactions within location-specific subreddits. Using unsupervised learning methods such as the Latent Dirichlet Allocation (LDA) and Non-Negative Matrix Factorization (NMF) algorithms, we examined conversations about COVID-19 and immunization across the U.S., focusing on COVID-19 vaccination. Our topic modeling identifies four themes: humor and sarcasm (e.g., jokes about microchips), conspiracy theories (e.g., tracking devices and microchips in the COVID-19 vaccine), public skepticism (e.g., debates over vaccine safety and freedom), and vaccine brand concerns (e.g., Pfizer, Moderna, and booster shots). Our geolocation analysis shows that regions with lower vaccination rates often exhibit a higher prevalence of misinformation-labeled comments. For example, counties such as Ada County (Idaho), Newton County (Missouri), and Flathead County (Montana) showed both low vaccine uptake and a high rate of false information. This study documents the many different forms of misinformation disseminated online and provides a better understanding of how people in different parts of the U.S. think about getting a COVID-19 vaccine.
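The abstract does not state the rule used to turn subreddit activity into a location. One simple rule consistent with its description is a majority vote over a user's posts and comments in location-specific subreddits. In the sketch below, the subreddit-to-county mapping and both thresholds (`min_votes`, `min_share`) are hypothetical choices for illustration, not the study's:

```python
from collections import Counter

# Hypothetical mapping from location-specific subreddits to counties (illustration only).
SUBREDDIT_TO_COUNTY = {
    "boise": "Ada County, ID",
    "kalispell": "Flathead County, MT",
}


def infer_location(user_subreddits, mapping=SUBREDDIT_TO_COUNTY, min_votes=3, min_share=0.5):
    """Majority vote over the location-specific subreddits a user is active in.

    user_subreddits: one subreddit name per post or comment by the user.
    Returns a location only if it has at least `min_votes` votes and strictly more
    than `min_share` of the user's location-bearing activity; otherwise None.
    """
    votes = Counter(
        mapping[name.lower()] for name in user_subreddits if name.lower() in mapping
    )
    total = sum(votes.values())
    if total == 0:
        return None
    location, count = votes.most_common(1)[0]
    if count >= min_votes and count / total > min_share:
        return location
    return None
```

Returning `None` for thin or evenly split evidence trades coverage for precision; users whose activity spans several regions are simply left unlocated.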

20 pages, 1317 KB

Open Access Article

The ChatGPT Effect: Investigating Shifting Discourse Patterns, Sentiment, and Benefit–Challenge Framing in AI Mental Health Support

by Sanguk Lee, Minjin (MJ) Rheu and Jie Zhuang

Behav. Sci. 2025, 15(9), 1172; https://doi.org/10.3390/bs15091172 - 28 Aug 2025

Abstract

AI has the potential to enhance mental health by scaling support. However, its implementation brings uncertainties and challenges that require careful review to ensure safety. This study examined evolving public views on AI mental health support by analyzing relevant Reddit posts (n = 517). Following the release of ChatGPT in 2022, discussions about AI in the context of mental health surged, with a noticeable shift in preference toward large language models (LLMs) over conventional therapy chatbots. Users appreciated AI for its emotional support, companionship, and accessibility, while also expressing concerns about adverse effects and lack of conversational depth and emotional connection. Distinct patterns in how benefits and challenges were discussed emerged between experienced and non-experienced AI users, as well as between AI-focused and mental health-focused communities. AI-experienced users acknowledged both the benefits and limitations, whereas AI communities emphasized the positives and mental health communities highlighted the lack of conversational depth. These findings underscore the need for tailored communication strategies to set realistic expectations about the utility of AI in mental healthcare among different stakeholders. This research provides insights into developing ethical AI systems that complement traditional care while addressing current limitations.

(This article belongs to the Special Issue Promoting Health Behaviors in the New Media Era)

20 pages, 351 KB

Open Access Article

Multi-Level Depression Severity Detection with Deep Transformers and Enhanced Machine Learning Techniques

by Nisar Hussain, Amna Qasim, Gull Mehak, Muhammad Zain, Grigori Sidorov, Alexander Gelbukh and Olga Kolesnikova

AI 2025, 6(7), 157; https://doi.org/10.3390/ai6070157 - 15 Jul 2025

Abstract

Depression is now one of the most common mental health concerns in the digital era, calling for powerful computational tools to detect it and estimate its severity. This study proposes a multi-level depression severity detection framework for the Reddit social media network, in which posts are classified into four levels: minimum, mild, moderate, and severe. We take a dual approach using classical machine learning (ML) algorithms and recent Transformer-based architectures. For the ML track, we build ten classifiers, namely Logistic Regression, SVM, Naive Bayes, Random Forest, XGBoost, Gradient Boosting, K-NN, Decision Tree, AdaBoost, and Extra Trees, with two widely used word embedding methods, Word2Vec and GloVe, and we fine-tune them for mental health text classification. Of these, XGBoost yields the highest F1-score of 94.01 using GloVe embeddings. For the deep learning track, we fine-tune ten Transformer models: BERT, RoBERTa, XLM-RoBERTa, MentalBERT, BioBERT, RoBERTa-large, DistilBERT, DeBERTa, Longformer, and ALBERT. The highest performance was achieved by the MentalBERT model, with an F1-score of 97.31, followed by RoBERTa (96.27) and RoBERTa-large (96.14). Our results demonstrate that, to the best of the authors’ knowledge, domain-transferred Transformers outperform non-Transformer-based ML methods in capturing subtle linguistic cues indicative of different levels of depression, highlighting their potential for fine-grained mental health monitoring in online settings.
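The abstract reports F1-scores without naming the averaging scheme. For a four-level task with imbalanced classes, macro-averaged and support-weighted F1 can differ noticeably, so the choice matters when comparing numbers such as 94.01 and 97.31. The sketch below computes both from label lists; which scheme the study used is left open:

```python
LEVELS = ("minimum", "mild", "moderate", "severe")


def per_class_f1(y_true, y_pred, labels=LEVELS):
    """F1 = 2TP / (2TP + FP + FN) per label (0.0 if the label never occurs)."""
    scores = {}
    for c in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == c and p == c)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != c and p == c)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == c and p != c)
        denom = 2 * tp + fp + fn
        scores[c] = 2 * tp / denom if denom else 0.0
    return scores


def macro_and_weighted_f1(y_true, y_pred, labels=LEVELS):
    """Return (macro F1, support-weighted F1) over the labels present in y_true."""
    f1 = per_class_f1(y_true, y_pred, labels)
    support = {c: sum(1 for t in y_true if t == c) for c in labels}
    present = [c for c in labels if support[c] > 0]
    macro = sum(f1[c] for c in present) / len(present)
    weighted = sum(f1[c] * support[c] for c in present) / sum(support[c] for c in present)
    return macro, weighted
```

On a toy split with three "minimum" posts and one missed "severe" post, macro F1 is 3/7 ≈ 0.43 while weighted F1 is 9/14 ≈ 0.64, because the weighted score is dominated by the majority class.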

(This article belongs to the Special Issue AI in Bio and Healthcare Informatics)

20 pages, 3153 KB

Open Access Article

Backfire Effect Reveals Early Controversy in Online Media

by Songtao Peng, Tao Jin, Kailun Zhu, Qi Xuan and Yong Min

Mathematics 2025, 13(13), 2147; https://doi.org/10.3390/math13132147 - 30 Jun 2025

Abstract

The rapid development of online media has significantly facilitated the public’s information consumption, knowledge acquisition, and opinion exchange. However, it has also led to more violent conflicts in online discussions. Controversy detection has therefore become important for the computational and social sciences. Previous research on detection methods has primarily focused on larger datasets and more complex computational models but has rarely examined the underlying mechanisms of conflict, particularly the psychological motivations behind them. In this paper, we propose a lightweight and language-independent method for controversy detection by introducing two novel psychological features: ascending gradient (AG) and tier ascending gradient (TAG). These features capture psychological signals in user interactions, specifically the patterns where controversial comments generate disproportionate replies or replies outperform parent comments in likes. We develop these features based on the theory of the backfire effect in ideological conflict and demonstrate their consistent effectiveness across models and platforms. Compared with structural, interaction, and text-based features, AG and TAG show higher importance scores and better generalizability. Extensive experiments on Chinese and English platforms (Reddit, Toutiao, and Sina) confirm the robustness of our features across languages and algorithms. Moreover, the features exhibit strong performance even when applied to early-stage data or limited “one-page” scenarios, supporting their utility for early controversy detection. Our work highlights a new psychological perspective on conflict behavior in online discussions and bridges behavioral patterns and computational modeling.
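The exact AG and TAG formulas are defined in the full paper and are not reproduced here. To illustrate the kind of signal the abstract describes (replies outperforming their parent comments in likes), the sketch below computes a hypothetical proxy: for each reply depth ("tier"), the share of replies with more likes than their parent. The function name and the comment schema are assumptions:

```python
from collections import defaultdict


def reply_outperformance_by_tier(comments):
    """Hypothetical proxy in the spirit of TAG, NOT the paper's definition.

    comments: list of dicts with keys 'id', 'parent' (None for top-level) and 'likes'.
    Returns {tier: share of replies at that tier with more likes than their parent},
    where tier 1 = direct replies to top-level comments.
    """
    by_id = {c["id"]: c for c in comments}

    def depth(comment):
        d = 0
        while comment["parent"] is not None:
            comment = by_id[comment["parent"]]
            d += 1
        return d

    wins, totals = defaultdict(int), defaultdict(int)
    for c in comments:
        if c["parent"] is None:
            continue  # top-level comments have no parent to outperform
        tier = depth(c)
        totals[tier] += 1
        if c["likes"] > by_id[c["parent"]]["likes"]:
            wins[tier] += 1
    return {tier: wins[tier] / totals[tier] for tier in sorted(totals)}
```

Features of this kind need only reply structure and like counts, not text, which is consistent with the abstract's claim that the method is lightweight and language-independent.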

(This article belongs to the Special Issue Data Mining Algorithms and Mathematical Models for Social Network Analysis)

30 pages, 2494 KB

Open Access Article

A Novel Framework for Mental Illness Detection Leveraging TOPSIS-ModCHI-Based Feature-Driven Randomized Neural Networks

by Santosh Kumar Behera and Rajashree Dash

Math. Comput. Appl. 2025, 30(4), 67; https://doi.org/10.3390/mca30040067 - 30 Jun 2025

Abstract

Mental illness has emerged as a significant global health crisis, inflicting immense suffering and causing a notable decrease in productivity. Identifying mental health disorders at an early stage allows healthcare professionals to implement more targeted and impactful interventions, leading to a significant improvement in the overall well-being of the patient. Recent advances in Artificial Intelligence (AI) have opened new avenues for analyzing medical records and behavioral data of patients to assist mental health professionals in their decision-making processes. In this study, the performance of four Randomized Neural Networks (RandNNs), namely the Broad Learning System (BLS), Random Vector Functional Link Network (RVFLN), Kernelized RVFLN (KRVFLN), and Extreme Learning Machine (ELM), is explored for detecting the type of mental illness a user may have by analyzing free-form text the user has posted on social media. To improve the performance of the RandNNs when handling text documents with unbalanced class distributions, a hybrid feature selection (FS) technique named TOPSIS-ModCHI is proposed for the preprocessing stage of the classification framework. After experiments on two benchmark multiclass unbalanced datasets, the effectiveness of the proposed FS with all four randomized networks is assessed on the publicly available Reddit Mental Health Dataset. The experimental results show that BLS with TOPSIS-ModCHI outperforms ELM, RVFLN, and KRVFLN in detecting mental illness, achieving a precision of 0.92, a recall of 0.66, an F-measure of 0.77, and a Hamming loss of 0.06 with a minimum feature size of 900. Overall, utilizing BLS for mental health analysis can offer a promising avenue toward improved interventions and a better understanding of mental health issues, aiding in decision-making processes.
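TOPSIS-ModCHI builds on chi-square (CHI) feature selection. The modification and the TOPSIS ranking step are specific to the paper and are not reproduced here. For reference, the sketch below implements the standard chi-square score of a term for a class from a 2×2 table of document counts, keeping the best-scoring terms; taking the maximum over classes is one common choice and an assumption here:

```python
def chi_square(A, B, C, D):
    """Standard CHI score from a 2x2 contingency table of document counts.

    A: docs in the class containing the term    B: docs outside the class containing it
    C: docs in the class without the term       D: docs outside the class without it
    """
    N = A + B + C + D
    denom = (A + C) * (B + D) * (A + B) * (C + D)
    return N * (A * D - C * B) ** 2 / denom if denom else 0.0


def select_terms(docs, labels, k):
    """Keep the k terms with the highest CHI score (max over classes).

    docs: list of token sets, one per document; labels: one class label per document.
    This is plain CHI, not the paper's ModCHI or TOPSIS ranking.
    """
    vocab = set().union(*docs)
    N = len(docs)
    scores = {}
    for term in vocab:
        best = 0.0
        for c in set(labels):
            A = sum(1 for d, y in zip(docs, labels) if term in d and y == c)
            B = sum(1 for d, y in zip(docs, labels) if term in d and y != c)
            C = sum(1 for d, y in zip(docs, labels) if term not in d and y == c)
            best = max(best, chi_square(A, B, C, N - A - B - C))
        scores[term] = best
    # Sort by descending score, breaking ties alphabetically for determinism.
    return sorted(scores, key=lambda t: (-scores[t], t))[:k]
```

In a setting like the one above, `k` would play the role of the feature size (e.g., 900). Plain CHI tends to favour terms from large classes, and that class-imbalance weakness is what a modified CHI typically aims to address.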

21 pages, 8895 KB

Open Access Article

Opioid Crisis Detection in Social Media Discourse Using Deep Learning Approach

by Muhammad Ahmad, Grigori Sidorov, Maaz Amjad, Iqra Ameer and Ildar Batyrshin

Information 2025, 16(7), 545; https://doi.org/10.3390/info16070545 - 27 Jun 2025

Abstract

Opioid overdose deaths remain a significant public health crisis in the U.S., where an opioid epidemic has led to a dramatic rise in overdose deaths over the past two decades. Since 1999, opioids have been implicated in approximately 75% of the nearly one million drug-related deaths. Research indicates that the epidemic is caused by both over-prescribing and social and psychological determinants such as economic stability, hopelessness, and social isolation. Impeding this research is the lack of measurements of these social and psychological constructs at fine-grained spatial and temporal resolution. To address this issue, we sourced data from Reddit, where people share self-reported experiences with opioid substances, specifically using opioid drugs through different routes of administration. To this end, we created an opioid overdose dataset, manually annotated for binary and multi-class classification, along with detailed annotation guidelines. In traditional manual investigations, the route of administration is determined solely through biological laboratory testing. This study investigates the efficacy of an automated tool leveraging natural language processing and transformer models, such as RoBERTa, to analyze patterns of substance use. By systematically examining these patterns, the model contributes to public health surveillance efforts, facilitating the identification of at-risk populations and informing the development of targeted interventions. This approach ultimately aims to enhance prevention and treatment strategies for opioid misuse through data-driven insights. The findings show that our proposed methodology achieved the highest cross-validation score of 93% for binary classification and 91% for multi-class classification, demonstrating performance improvements of 9.41% and 10.98%, respectively, over the baseline model (XGB, 85% in binary class and 81% in multi-class).
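If the reported gains are read as relative improvements over the baseline cross-validation score $s$ (an assumption, since the abstract does not define them), the binary-classification figure follows directly from the rounded scores:

```latex
\Delta_{\mathrm{rel}}
  = \frac{s_{\mathrm{model}} - s_{\mathrm{baseline}}}{s_{\mathrm{baseline}}}
  = \frac{0.93 - 0.85}{0.85}
  \approx 0.0941 = 9.41\%
```

The same formula applies to the multi-class comparison; the unrounded multi-class scores are not given in the abstract.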

(This article belongs to the Special Issue Learning and Knowledge: Theoretical Issues and Applications)

15 pages, 2018 KB

Open Access Article

An Exploratory Network Analysis of Discussion Topics About Autism Across Subreddit Communities

by Skylar DeWitt, Kendall Mills and Adam M. Briggs

Behav. Sci. 2025, 15(6), 812; https://doi.org/10.3390/bs15060812 - 13 Jun 2025

Abstract

Using an inductive computational approach, our present data exploration sought to use machine learning methodology to define and identify patterns and gain insight into autism-related discussions on Reddit across three different categories of subreddits: (a) individuals who self-identify as autistic, (b) parents of individuals on the autism spectrum, and (c) behavior therapists. By doing so, we sought to review authentic autism-related discussions and identify important topics that emerged across these three demographic groups, including insights related to assessing and treating challenging behavior. Following basic and advanced preprocessing, our extraction resulted in 57 subreddits and 46,914 comments from autism spectrum subreddit members, 46 subreddits and 27,838 comments from parent subreddit members, and six subreddits with 3163 comments from behavior therapist subreddit members. Subsequent network analyses revealed interesting patterns of discussion within and across subreddit groups that may be used to inform support and resources, practice considerations, and future directions for research.
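The abstract does not detail how the discussion networks were built. A common construction for this kind of exploratory analysis, assumed here for illustration, links two terms whenever they appear in the same comment and weights the edge by the number of comments containing both. The `min_weight` threshold is a hypothetical pruning parameter:

```python
from collections import Counter
from itertools import combinations


def cooccurrence_network(comments, min_weight=2):
    """Build a weighted term co-occurrence network.

    comments: list of token lists (already preprocessed).
    Returns {(term_a, term_b): weight} with term_a < term_b, keeping edges
    that occur in at least `min_weight` comments.
    """
    edges = Counter()
    for tokens in comments:
        # Count each pair at most once per comment.
        for a, b in combinations(sorted(set(tokens)), 2):
            edges[(a, b)] += 1
    return {edge: w for edge, w in edges.items() if w >= min_weight}


def weighted_degree(edges):
    """Sum of incident edge weights per term, a simple centrality measure."""
    degree = Counter()
    for (a, b), w in edges.items():
        degree[a] += w
        degree[b] += w
    return dict(degree)
```

Building one network per subreddit group (autistic individuals, parents, behavior therapists) allows the within-group and across-group comparisons the abstract describes.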

(This article belongs to the Special Issue Challenging Behavior of Individuals with Autism and/or Other Neurodevelopmental Disabilities)

37 pages, 8684 KB

Open Access Article

Information Diffusion Modeling in Social Networks: A Comparative Analysis of Delay Mechanisms Using Population Dynamics

by Kamila Bakenova, Oleksandr Kuznetsov, Iryna Artyshchuk, Aigul Shaikhanova, Ruslan Shevchuk and Oleksandra Orobchuk

Appl. Sci. 2025, 15(11), 6092; https://doi.org/10.3390/app15116092 - 28 May 2025

Abstract

This study presents a comprehensive analysis of information diffusion in social networks with time delay mechanisms. We first analyze real Reddit thread data, identifying limitations in the sample size. To overcome this, we develop synthetic network models with varied structural properties. Our approach tests three delay types (constant, uniform, exponential) across different network structures, using machine learning models to identify key factors influencing information coverage. The results show that spread probability consistently impacts diffusion across all datasets. Gradient Boosting models achieve R² = 0.847 on synthetic data. Random networks with a constant delay mechanism and high spread probability (0.4) maximize coverage. When verified against test data, peak speed time emerges as the strongest predictor (r = 0.995, p < 0.001). Our findings provide practical recommendations for optimizing information spread in social networks and demonstrate the value of integrating real and synthetic data in diffusion modeling.
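The abstract does not give the diffusion model's equations. One common way to combine a spread probability with delay mechanisms, assumed here, is an independent-cascade simulation in which each successful transmission arrives after a delay drawn from a constant, uniform, or exponential distribution, and a node is informed at the earliest arrival. The parameter names and the choice of a mean delay of 1.0 for all three types are illustrative assumptions; "coverage" is taken to be the fraction of nodes informed:

```python
import heapq
import random


def sample_delay(kind, rng, mean=1.0):
    """Draw a transmission delay; all three kinds share the same mean (an assumption)."""
    if kind == "constant":
        return mean
    if kind == "uniform":
        return rng.uniform(0.0, 2.0 * mean)
    if kind == "exponential":
        return rng.expovariate(1.0 / mean)
    raise ValueError(f"unknown delay type: {kind}")


def simulate_cascade(adj, seeds, p, delay="constant", horizon=float("inf"), seed=0):
    """Independent cascade with transmission delays (event-driven).

    adj: {node: [neighbours]}. When a node becomes informed, it gets one chance to
    inform each uninformed neighbour, succeeding with probability p after a random delay.
    Returns {node: time informed} for nodes informed no later than `horizon`.
    """
    rng = random.Random(seed)
    informed = {}
    queue = [(0.0, s) for s in seeds]
    heapq.heapify(queue)
    while queue:
        t, node = heapq.heappop(queue)
        if node in informed or t > horizon:
            continue  # already reached by an earlier arrival, or too late
        informed[node] = t
        for nb in adj.get(node, ()):
            if nb not in informed and rng.random() < p:
                heapq.heappush(queue, (t + sample_delay(delay, rng), nb))
    return informed


def coverage(adj, informed):
    """Fraction of nodes reached by the cascade."""
    return len(informed) / len(adj)
```

Sweeping `p` and `delay` over many random graphs and seeds would produce the kind of tabular data a Gradient Boosting model could then be fitted to.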

(This article belongs to the Special Issue Empowering Interactions: Advancing Human-Centred AI for Transparent, Collaborative and Accessible Applications)

21 pages, 2372 KB

Open Access Article

Will You Become the Next Troll? A Computational Mechanics Approach to the Contagion of Trolling Behavior

by Qiusi Sun and Martin Hilbert

Entropy 2025, 27(5), 542; https://doi.org/10.3390/e27050542 - 21 May 2025

Abstract

Trolling behavior is not simply a result of ‘bad actors’, an individual trait, or a linguistic phenomenon, but emerges from complex contagious social dynamics. This study uses formal concepts from information theory and complexity science to study it as such. The data comprised over 13 million Reddit comments, which were classified as troll or non-troll messages using a BERT model fine-tuned with a human coding set. We derive the unique, minimally complex, and maximally predictive models from computational mechanics, i.e., ε-machines and transducers, which allow us to distinguish which aspects of trolling behavior are self-motivated and which are socially induced. While the vast majority of self-driven dynamics are like flipping a coin (86.3%), when social contagion is considered, most users (95.6%) show complex hidden multiple-state patterns. Within this complexity, trolling follows predictable transitions, with, for example, a 76% probability of remaining in a trolling state once it is reached. We find that replying to a trolling comment significantly increases the likelihood of switching to a trolling state or staying in it (72%). Besides being a showcase for the use of information-theoretic measures from dynamic systems theory to conceptualize human dynamics, our findings suggest that users and platform designers should go beyond calling out and removing trolls and instead foster and design environments that discourage the dynamics leading to the emergence of trolling behavior.
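Reconstructing ε-machines and transducers requires dedicated causal-state inference algorithms (for example, Causal State Splitting Reconstruction, CSSR), which are beyond a short example. Quantities like the probability of remaining in a trolling state can, however, be illustrated with a much simpler first-order Markov estimate over a user's troll/non-troll label sequence. The sketch below is that simpler model, not the authors' ε-machine; the "T"/"N" encoding is an assumption:

```python
from collections import Counter, defaultdict


def transition_probabilities(states):
    """First-order Markov estimate of P(next state | current state).

    states: one user's comment labels in time order, assumed to be
    "T" (troll) or "N" (non-troll); any hashable labels work.
    """
    counts = defaultdict(Counter)
    for current, nxt in zip(states, states[1:]):
        counts[current][nxt] += 1
    return {
        current: {nxt: n / sum(row.values()) for nxt, n in row.items()}
        for current, row in counts.items()
    }
```

For the sequence `"NNTTTNTTN"`, this gives P(T | T) = 0.6. An ε-machine differs in that it may need several hidden states to be maximally predictive. This is why the study can report multi-state patterns that a single first-order table like this one cannot capture.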

(This article belongs to the Special Issue Complex Dynamic System Modelling, Identification and Control, 2nd Edition)

19 pages, 641 KB

Open Access Article

Big Five Personality Trait Prediction Based on User Comments

by Kit-May Shum, Michal Ptaszynski and Fumito Masui

Information 2025, 16(5), 418; https://doi.org/10.3390/info16050418 - 20 May 2025

Cited by 1 | Viewed by 5084

Abstract

The study of personality is a major component of human psychology, and an understanding of personality traits has practical applications in various domains, such as mental health care, predicting job performance, and optimising marketing strategies. This study explores the prediction of Big Five personality trait scores from online comments using transformer-based language models, focusing on improving model performance with a larger dataset and investigating the role of intercorrelations among traits. Using the PANDORA dataset from Reddit, RoBERTa and BERT models, in both base and large variants, were fine-tuned and evaluated to determine their effectiveness in personality trait prediction. Compared to previous work, our study utilises a significantly larger dataset to enhance the model’s generalisation and robustness. The results indicate that RoBERTa outperforms BERT across most metrics, with RoBERTa large achieving the best overall performance. In addition to evaluating overall predictive accuracy, this study investigates the impact of intercorrelations among personality traits. A comparative analysis is conducted between a single-model approach, which predicts all five traits simultaneously, and a multiple-model approach, in which separate models are fine-tuned independently, each predicting a single trait. The findings reveal that the single-model approach achieves a lower RMSE and higher $R^{2}$ values, highlighting the importance of incorporating trait intercorrelations to improve prediction accuracy. Furthermore, RoBERTa large demonstrated a stronger ability to capture these intercorrelations than models in previous studies. These findings emphasise the potential of transformer-based models in personality computing and underscore the importance of leveraging both larger datasets and trait intercorrelations to enhance predictive performance.
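
The abstract compares the single-model and multiple-model approaches by RMSE and $R^{2}$ and attributes the difference to trait intercorrelations. As a minimal standard-library sketch of those quantities (not the authors' evaluation code), the functions below compute per-trait RMSE and $R^{2}$ from gold and predicted score rows, plus pairwise Pearson correlations among the gold trait columns. The trait column order, the row layout, the function names, and the test data are assumptions for illustration; no particular PANDORA score scale is assumed.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Assumed column order for each user's Big Five scores (hypothetical layout).
TRAITS = ("openness", "conscientiousness", "extraversion", "agreeableness", "neuroticism")


def rmse(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Root mean squared error between gold and predicted scores."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination 1 - SS_res / SS_tot (undefined if y_true is constant)."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length columns."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def per_trait_scores(
    gold: List[Tuple[float, ...]], pred: List[Tuple[float, ...]]
) -> Dict[str, Tuple[float, float]]:
    """Return {trait: (RMSE, R^2)}; each row holds one user's five scores."""
    scores = {}
    for i, name in enumerate(TRAITS):
        yt = [row[i] for row in gold]
        yp = [row[i] for row in pred]
        scores[name] = (rmse(yt, yp), r_squared(yt, yp))
    return scores


def trait_correlations(gold: List[Tuple[float, ...]]) -> Dict[Tuple[str, str], float]:
    """Pairwise Pearson correlations among the gold trait columns (10 pairs)."""
    cols = {name: [row[i] for row in gold] for i, name in enumerate(TRAITS)}
    return {
        (a, b): pearson(cols[a], cols[b])
        for i, a in enumerate(TRAITS)
        for b in TRAITS[i + 1:]
    }
```

Because both approaches produce the same five output columns per user, `per_trait_scores` can score a single five-output model and five separately fine-tuned single-trait models on equal terms, while `trait_correlations` shows how much shared structure a joint model could exploit.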
