2nd Edition of Information Retrieval and Social Media Mining

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: closed (30 November 2024) | Viewed by 17139

Special Issue Editor


Dr. María N. Moreno García
Guest Editor
Department of Computer Science and Automation, Science Faculty, University of Salamanca, Plaza de los Caídos s/n, 37008 Salamanca, Spain
Interests: data mining; web mining; machine learning; deep learning; recommender system; decision support in medicine

Special Issue Information

Dear Colleagues,

The MDPI journal Information is inviting submissions for a Special Issue on “2nd Edition of Information Retrieval and Social Media Mining”.

The increasing interest of citizens in websites, social networks, streaming services, and other online media has made the web an indispensable instrument in daily life for business activities, learning, entertainment, and communication. Internet users can now share and access a nearly unlimited amount of information. This opens up great opportunities to exploit this valuable information by transforming it into useful knowledge through appropriate techniques. In this context, data mining methods arise as efficient tools to help users retrieve suitable online information, products, or services, and they are also useful for exploring a wide range of social media aspects such as user behavior, communities, network structures, and information diffusion.

This Special Issue aims to provide a forum for the presentation and discussion of the latest advances concerning web and social media mining.

Topics of interest may include, but are not limited to, the following:

  • Web mining—content, structure, and usage mining;
  • User profiling and personalization;
  • Recommender systems;
  • Sentiment analysis and opinion mining;
  • Social influence analysis;
  • Detection and analysis of social communities;
  • Information diffusion in social media.

Dr. María N. Moreno García
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • social media mining
  • personalization
  • recommender systems
  • sentiment analysis

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (10 papers)

Research

18 pages, 2811 KiB  
Article
The Power of Words from the 2024 United States Presidential Debates: A Natural Language Processing Approach
by Ana Lorena Jiménez-Preciado, José Álvarez-García, Salvador Cruz-Aké and Francisco Venegas-Martínez
Information 2025, 16(1), 2; https://doi.org/10.3390/info16010002 - 25 Dec 2024
Abstract
This study analyzes the linguistic patterns and rhetorical strategies employed in the 2024 U.S. presidential debates from the exchanges between Donald Trump, Joe Biden, and Kamala Harris. This paper examines debate transcripts to find underlying themes and communication styles using Natural Language Processing (NLP) advanced techniques, including an n-gram analysis, sentiment analysis, and lexical diversity measurements. The methodology combines a quantitative text analysis with qualitative interpretation through the Jaccard similarity coefficient, the Type–Token Ratio, and the Measure of Textual Lexical Diversity. The empirical results reveal distinct linguistic profiles for each candidate: Trump consistently employed emotionally charged language with high sentiment volatility, while Biden and Harris demonstrated more measured approaches with higher lexical diversity. Finally, this research contributes to the understanding of political discourse in high-stakes debates through NLP and can offer information on the evolution of the communication strategies of the presidential candidates of any country with this regime.
(This article belongs to the Special Issue 2nd Edition of Information Retrieval and Social Media Mining)

24 pages, 1208 KiB  
Article
Text Analytics on YouTube Comments for Food Products
by Maria Tsiourlini, Katerina Tzafilkou, Dimitrios Karapiperis and Christos Tjortjis
Information 2024, 15(10), 599; https://doi.org/10.3390/info15100599 - 30 Sep 2024
Viewed by 1289
Abstract
YouTube is a popular social media platform in the contemporary digital landscape. The primary focus of this study is to explore the underlying sentiment in user comments about food-related videos on YouTube, specifically within two pivotal food categories: plant-based and hedonic product. We labeled comments using sentiment lexicons such as TextBlob, VADER, and Google’s Sentiment Analysis (GSA) engine. Comment sentiment was classified using advanced Machine-Learning (ML) algorithms, namely Support Vector Machines (SVM), Multinomial Naive Bayes, Random Forest, Logistic Regression, and XGBoost. The evaluation of these models encompassed key macro average metrics, including accuracy, precision, recall, and F1 score. The results from GSA showed a high accuracy level, with SVM achieving 93% accuracy in the plant-based dataset and 96% in the hedonic dataset. In addition to sentiment analysis, we delved into user interactions within the two datasets, measuring crucial metrics, such as views, likes, comments, and engagement rate. The findings illuminate significantly higher levels of views, likes, and comments in the hedonic food dataset, but the plant-based dataset maintains a superior overall engagement rate.
(This article belongs to the Special Issue 2nd Edition of Information Retrieval and Social Media Mining)

15 pages, 549 KiB  
Article
A Hybrid Hierarchical Mathematical Heuristic Solution of Sparse Algebraic Equations in Sentiment Analysis
by Maryam Jalali, Morteza Zahedi, Abdorreza Alavi Gharahbagh, Vahid Hajihashemi, José J. M. Machado and João Manuel R. S. Tavares
Information 2024, 15(9), 513; https://doi.org/10.3390/info15090513 - 23 Aug 2024
Viewed by 745
Abstract
Many text mining methods use statistical information as a text- and language-independent approach for sentiment analysis. However, text mining methods based on stochastic patterns and rules require many samples for training. On the other hand, deterministic and non-probabilistic methods are easier and faster to solve than other methods, but they are inefficient when dealing with Natural Language Processing (NLP) data. This research presents a novel hybrid solution based on two mathematical approaches combined with a heuristic approach to solve unbalanced pseudo-linear algebraic equation systems that can be used as a sentiment word scoring system. In its first step, the proposed solution uses two mathematical approaches to find two initial populations for a heuristic method. The heuristic solution solves a pseudo-linear NLP scoring scheme in a polarity detection method and determines the final scores. The proposed solution was validated using three scenarios on the SemEval-2013 competition, the ESWC dataset, and the Taboada dataset. The simulation results revealed that the proposed solution is comparable to the best state-of-the-art methods in polarity detection.
(This article belongs to the Special Issue 2nd Edition of Information Retrieval and Social Media Mining)

13 pages, 2270 KiB  
Article
GRAAL: Graph-Based Retrieval for Collecting Related Passages across Multiple Documents
by Misael Mongiovì and Aldo Gangemi
Information 2024, 15(6), 318; https://doi.org/10.3390/info15060318 - 29 May 2024
Viewed by 849
Abstract
Finding passages related to a sentence over a large collection of text documents is a fundamental task for claim verification and open-domain question answering. For instance, a common approach for verifying a claim is to extract short snippets of relevant text from a collection of reference documents and provide them as input to a natural language inference machine that determines whether the claim can be deduced or refuted. Available approaches struggle when several pieces of evidence from different documents need to be combined to make an inference, as individual documents often have a low relevance with the input and are therefore excluded. We propose GRAAL (GRAph-based retrievAL), a novel graph-based approach that outlines the relevant evidence as a subgraph of a large graph that summarizes the whole corpus. We assess the validity of this approach by building a large graph that represents co-occurring entity mentions on a corpus of Wikipedia pages and using this graph to identify candidate text relevant to a claim across multiple pages. Our experiments on a subset of FEVER, a popular benchmark, show that the proposed approach is effective in identifying short passages related to a claim from multiple documents.
(This article belongs to the Special Issue 2nd Edition of Information Retrieval and Social Media Mining)

18 pages, 4615 KiB  
Article
Fake User Detection Based on Multi-Model Joint Representation
by Jun Li, Wentao Jiang, Jianyi Zhang, Yanhua Shao and Wei Zhu
Information 2024, 15(5), 266; https://doi.org/10.3390/info15050266 - 9 May 2024
Viewed by 1363
Abstract
The existing deep learning-based detection of fake information focuses on the transient detection of news itself. Compared to user category profile mining and detection, transient detection is prone to higher misjudgment rates due to the limitations of insufficient temporal information, posing new challenges to social public opinion monitoring tasks such as fake user detection. This paper proposes a multimodal aggregation portrait model (MAPM) based on multi-model joint representation for social media platforms. It constructs a deep learning-based multimodal fake user detection framework by analyzing user behavior datasets within a time retrospective window. It integrates a pre-trained Domain Large Model to represent user behavior data across multiple modalities, thereby constructing a high-generalization implicit behavior feature spectrum for users. In response to the tendency of existing fake user behavior mining to neglect time-series features, this study introduces an improved network called Sequence Interval Detection Net (SIDN) based on Sequence to Sequence (seq2seq) to characterize time interval sequence behaviors, achieving strong expressive capabilities for detecting fake behaviors within the time window. Ultimately, the amalgamation of latent behavioral features and explicit characteristics serves as the input for spectral clustering in detecting fraudulent users. The experimental results on Weibo real dataset demonstrate that the proposed model outperforms the detection utilizing explicit user features, with an improvement of 27.0% in detection accuracy.
(This article belongs to the Special Issue 2nd Edition of Information Retrieval and Social Media Mining)

20 pages, 425 KiB  
Article
Clustering-Based Joint Topic-Sentiment Modeling of Social Media Data: A Neural Networks Approach
by David Hanny and Bernd Resch
Information 2024, 15(4), 200; https://doi.org/10.3390/info15040200 - 4 Apr 2024
Cited by 3 | Viewed by 2486
Abstract
With the vast amount of social media posts available online, topic modeling and sentiment analysis have become central methods to better understand and analyze online behavior and opinion. However, semantic and sentiment analysis have rarely been combined for joint topic-sentiment modeling which yields semantic topics associated with sentiments. Recent breakthroughs in natural language processing have also not been leveraged for joint topic-sentiment modeling so far. Inspired by these advancements, this paper presents a novel framework for joint topic-sentiment modeling of short texts based on pre-trained language models and a clustering approach. The method leverages techniques from dimensionality reduction and clustering for which multiple algorithms were considered. All configurations were experimentally compared against existing joint topic-sentiment models and an independent sequential baseline. Our framework produced clusters with semantic topic quality scores of up to 0.23 while the best score among the previous approaches was 0.12. The sentiment classification accuracy increased from 0.35 to 0.72 and the uniformity of sentiments within the clusters reached up to 0.9 in contrast to the baseline of 0.56. The presented approach can benefit various research areas such as disaster management where sentiments associated with topics can provide practical useful information.
(This article belongs to the Special Issue 2nd Edition of Information Retrieval and Social Media Mining)

17 pages, 1604 KiB  
Article
Social Network Community Detection to Deal with Gray-Sheep and Cold-Start Problems in Music Recommender Systems
by Diego Sánchez-Moreno, Vivian F. López Batista, María Dolores Muñoz Vicente, Ángel Luis Sánchez Lázaro and María N. Moreno-García
Information 2024, 15(3), 138; https://doi.org/10.3390/info15030138 - 29 Feb 2024
Cited by 2 | Viewed by 1745
Abstract
Information from social networks is currently being widely used in many application domains, although in the music recommendation area, its use is less common because of the limited availability of social data. However, most streaming platforms allow for establishing relationships between users that can be leveraged to address some drawbacks of recommender systems. In this work, we take advantage of the social network structure to improve recommendations for users with unusual preferences and new users, thus dealing with the gray-sheep and cold-start problems, respectively. Since collaborative filtering methods base the recommendations for a given user on the preferences of his/her most similar users, the scarcity of users with similar tastes to the gray-sheep users and the unawareness of the preferences of the new users usually lead to bad recommendations. These general problems of recommender systems are worsened in the music domain, where the popularity bias drawback is also present. In order to address these problems, we propose a user similarity metric based on the network structure as well as on user ratings. This metric significantly improves the recommendation reliability in those scenarios by capturing both homophily effects in implicit communities of users in the network and user similarity in terms of preferences.
(This article belongs to the Special Issue 2nd Edition of Information Retrieval and Social Media Mining)

18 pages, 1865 KiB  
Article
Online Information Reviews to Boost Tourism in the B&B Industry to Reveal the Truth and Nexus
by Xiaoqun Wang, Xihui Chen and Zhouyi Gu
Information 2024, 15(2), 103; https://doi.org/10.3390/info15020103 - 9 Feb 2024
Cited by 1 | Viewed by 1948
Abstract
Grasping the concerns of customers is paramount, serving as a foundation for both attracting and retaining a loyal customer base. While customer satisfaction has been extensively explored across diverse industries, there remains a dearth of insights into how distinct rural bed and breakfasts (RB&Bs) can effectively cater to the specific needs of their target audience. This research utilized latent semantic analysis and text regression techniques on online reviews, uncovering previously unrecognized factors contributing to RB&B customer satisfaction. Furthermore, the study demonstrates that certain factors wield distinct impacts on guest satisfaction within varying RB&B market segments. The implications of these findings extend to empowering RB&B owners with actionable insights to enhance the overall customer experience.
(This article belongs to the Special Issue 2nd Edition of Information Retrieval and Social Media Mining)

31 pages, 8938 KiB  
Article
Sentiment Analysis in the Age of COVID-19: A Bibliometric Perspective
by Andra Sandu, Liviu-Adrian Cotfas, Camelia Delcea, Liliana Crăciun and Anca Gabriela Molănescu
Information 2023, 14(12), 659; https://doi.org/10.3390/info14120659 - 13 Dec 2023
Cited by 13 | Viewed by 2466
Abstract
The global impact of the COVID-19 pandemic has been profound, placing significant challenges upon healthcare systems and the world economy. The pervasive presence of illness, uncertainty, and fear has markedly diminished overall life satisfaction. Consequently, sentiment analysis has gained substantial traction among scholars seeking to unravel the emotional and attitudinal dimensions of this crisis. This research endeavors to provide a bibliometric perspective, shedding light on the principal contributors to this emerging field. It seeks to spotlight the academic institutions associated with this research domain, along with identifying the most influential publications in terms of both paper volume and h-index metrics. To this end, we have meticulously curated a dataset comprising 646 papers sourced from the ISI Web of Science database, all centering on the theme of sentiment analysis during the COVID-19 pandemic. Our findings underscore a burgeoning interest exhibited by the academic community in this particular domain, evident in an astonishing annual growth rate of 153.49%. Furthermore, our analysis elucidates key keywords and collaborative networks within the authorship, offering valuable insights into the global proliferation of this thematic pursuit. In addition to this, our analysis encompasses an n-gram investigation across keywords, abstracts, titles, and keyword plus, complemented by an examination of the most frequently cited works. The results gleaned from these endeavors offer crucial perspectives, contribute to the identification of pertinent issues, and provide guidance for informed decision-making.
(This article belongs to the Special Issue 2nd Edition of Information Retrieval and Social Media Mining)

28 pages, 4736 KiB  
Article
Polarizing Topics on Twitter in the 2022 United States Elections
by Josip Katalinić, Ivan Dunđer and Sanja Seljan
Information 2023, 14(11), 609; https://doi.org/10.3390/info14110609 - 10 Nov 2023
Viewed by 2675
Abstract
Politically polarizing issues are a growing concern around the world, creating divisions along ideological lines, which was also confirmed during the 2022 United States midterm elections. The purpose of this study was to explore the relationship between the results of the 2022 U.S. midterm elections and the topics that were covered during the campaign. A dataset consisting of 52,688 tweets in total was created by collecting tweets of senators, representatives and governors who participated in the elections one month before the start of the elections. Using unsupervised machine learning, topic modeling is built on the collected data and visualized to represent topics. Furthermore, supervised machine learning is used to classify tweets to the corresponding political party, whereas sentiment analysis is carried out in order to detect polarity and subjectivity. Tweets from participating politicians, U.S. states and involved parties were found to correlate with polarizing topics. This study hereby explored the relationship between the topics that were creating a divide between Democrats and Republicans during their campaign and the 2022 U.S. midterm election outcomes. This research found that polarizing topics permeated the Twitter (today known as X) campaign, and that all elections were classified as highly subjective. In the Senate and House elections, this classification analysis showed significant misclassification rates of 21.37% and 24.15%, respectively, indicating that Republican tweets often aligned with traditional Democratic narratives.
(This article belongs to the Special Issue 2nd Edition of Information Retrieval and Social Media Mining)