Natural Language Processing (NLP) with Applications and Natural Language Understanding (NLU)

A special issue of Information (ISSN 2078-2489). This special issue belongs to the section "Information Applications".

Deadline for manuscript submissions: 30 July 2025 | Viewed by 14198

Special Issue Editor


Guest Editor
Department of Linguistics and Comparative Cultural Studies, Ca' Foscari University of Venice, 30123 Venice, Italy
Interests: natural language processing; text analysis; information extraction; computational linguistics

Special Issue Information

Dear Colleagues,

NLP is both a technology and a science, a branch of computational linguistics and artificial intelligence. The current LLM hype may have reached its peak thanks to ChatGPT and its attractive applications: how will this influence NLP, and is NLP still needed, perhaps in a different form? This Special Issue aims to bring together researchers from all areas of NLP to discuss applications and future research directions in the field of natural language understanding for DNNs.

NLP applications are all based on the same source of knowledge, i.e., words, as even visual neural networks need words for their captions. LLM vocabularies are usually made up of the 50,000 or so most frequent words or types extracted from billions of tokens, yet these models still produce hallucinations and biases. Part of the problem at the heart of DNN use is their lack of generalization and their inherent inability to understand what they are processing.
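
As a toy illustration of the frequency-ranked vocabularies mentioned above, the sketch below ranks whitespace tokens by frequency and keeps only the top types. This is a simplification under invented assumptions: real LLM vocabularies are built with subword algorithms such as BPE over billions of tokens, and the corpus, function name, and vocabulary size here exist only for the example.

```python
from collections import Counter

def build_vocab(corpus, max_size=10):
    """Rank whitespace-separated tokens by frequency and keep the top max_size types."""
    counts = Counter(tok for line in corpus for tok in line.lower().split())
    return [tok for tok, _ in counts.most_common(max_size)]

# Tiny invented corpus; real vocabularies are extracted from billions of tokens.
corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
]
vocab = build_vocab(corpus, max_size=5)
print(vocab)  # the most frequent type, 'the', comes first
```

A hard cut-off of this kind inevitably leaves rare words out of the vocabulary, which is one reason subword units are preferred in practice.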

In this Special Issue, we wish to feature papers on all kinds of language-oriented applications, from speech recognition and synthesis to machine translation and question answering. Below is a provisional list of the most relevant ones:

  • Sentiment Analysis from Social Media
  • Chatbots and Smart Assistants
  • Email Filters
  • Text Summarization
  • Customer Support and Analytics for Market Intelligence
  • Online Search Engine and Autocomplete
  • Recruitment and Hiring
  • Auto-Correct and Next-Word Prediction
  • Spell and Grammar Checking
  • Text Extraction and Classification
  • Image and Facial Recognition for/from Captions
  • NLP for Multimedia Self-Learning Language Tools

Traditionally, NLP techniques range from symbolic to statistical approaches, but they always address linguistic content that may consist of phonemes or phones; morphemes or sequences of subword units; tokens of various types and lengths, including punctuation; words, multiwords, or polywords; syntactic constituents; and dependency structures. These low-level strata make up what is currently addressed, directly or indirectly, by the majority of applications.
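
The lower strata listed above, tokens including punctuation and subword units, can be made concrete with a deliberately naive sketch. The regex tokenizer and the fixed-length chunking below are illustrative assumptions only; learned subword models such as BPE behave quite differently.

```python
import re

def tokenize(text):
    """Split text into word and punctuation tokens (a low-level stratum)."""
    return re.findall(r"\w+|[^\w\s]", text)

def subword_units(token, max_len=3):
    """Crude stand-in for subword segmentation: greedy fixed-length chunks."""
    return [token[i:i + max_len] for i in range(0, len(token), max_len)]

tokens = tokenize("Venice hosts canals, famously.")
print(tokens)                      # punctuation kept as separate tokens
print(subword_units("famously"))   # ['fam', 'ous', 'ly']
```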

Attempts are being made to address text understanding, which belongs to higher levels of linguistic knowledge: pronominal binding, coreference resolution, quantifier raising, and semantic representation in terms of AMRs or other similar theories, completed by word-sense disambiguation. Not all of these tasks are suited to current transformer-based LLMs, but they could be carried out by NLP components. Finally, we assume that, to attain reasoning that includes causal inference from knowledge of the world, lower linguistic strata should not be erased but used as a trigger for further processing. Thus, we wish to dedicate a separate subsection of this Special Issue to these latter topics, which may be dubbed NLP for future applications with AI for text understanding, or NLU, welcoming project presentations and works in progress.
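
As one small example of a task from these higher strata that an NLP component could carry out, the sketch below implements a simplified Lesk-style word-sense disambiguation heuristic: it picks the sense whose gloss shares the most words with the context. The two-sense inventory for "bank" and its glosses are invented for illustration.

```python
def lesk(context, senses):
    """Pick the sense whose gloss overlaps most with the context words
    (a simplified Lesk word-sense disambiguation heuristic)."""
    ctx = set(context.lower().split())
    return max(senses, key=lambda s: len(ctx & set(senses[s].lower().split())))

# Hypothetical two-sense inventory for "bank".
senses = {
    "bank/finance": "an institution that accepts deposits and lends money",
    "bank/river": "the sloping land beside a body of water",
}
sense = lesk("she sat on the sloping land by the water", senses)
print(sense)  # 'bank/river'
```

A real system would use WordNet glosses, lemmatization, and stopword removal; the point is only that such symbolic components can complement transformer-based LLMs.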

In sum, in this Special Issue, we wish to bring together researchers to discuss innovative applications in all fields of NLP, alongside innovative results in semantic processing for NLU.

Prof. Dr. Rodolfo Delmonte
Guest Editor

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to the website. Registered authors can then access the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Information is an international peer-reviewed open access monthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 1600 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • natural language processing in current applications
  • natural language understanding for future applications
  • question answering
  • sentiment analysis from social media
  • chatbots and smart assistants
  • email filters
  • text summarization
  • customer support and analytics for market intelligence
  • online search engine and autocomplete
  • recruitment and hiring
  • auto-correct and next-word prediction
  • spell and grammar checking
  • text extraction and classification
  • image and facial recognition for/from captions
  • NLP for multimedia self-learning language tools

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (9 papers)


Research

Jump to: Review

18 pages, 1491 KiB  
Article
Using Natural Language Processing and Machine Learning to Detect Online Radicalisation in the Maldivian Language, Dhivehi
by Hussain Ibrahim, Ahmed Ibrahim and Michael N. Johnstone
Information 2025, 16(5), 342; https://doi.org/10.3390/info16050342 - 24 Apr 2025
Abstract
Early detection of online radical content is important for intelligence services to combat radicalisation and terrorism. The motivation for this research was the lack of language tools for the detection of radicalisation in the Maldivian language, Dhivehi. This research applied Machine Learning and Natural Language Processing (NLP) to detect online radicalisation content in Dhivehi, with the incorporation of domain-specific knowledge. The research used Machine Learning to evaluate the most effective technique for the detection of radicalisation text in Dhivehi and used interviews with Subject Matter Experts (SMEs) and self-deradicalised individuals to validate the results, add contextual information, and improve recognition accuracy. The contributions of this research to the existing body of knowledge include datasets in the form of labelled radical/non-radical text, a sentiment corpus of radical words, primary interview data from self-deradicalised individuals, and, for the first time, a Machine Learning technique for the detection of radicalisation text in Dhivehi. We found that the Naïve Bayes algorithm worked best for the detection of radicalisation text in Dhivehi, with an accuracy of 87.67%, precision of 85.35%, recall of 92.52%, and an F2 score of 91%. Including the radical words identified through the SME interviews as a count feature improved the performance of the ML algorithms, with Naïve Bayes improving by 9.57%.

31 pages, 442 KiB  
Article
Recommending Actionable Strategies: A Semantic Approach to Integrating Analytical Frameworks with Decision Heuristics
by Renato Ghisellini, Remo Pareschi, Marco Pedroni and Giovanni Battista Raggi
Information 2025, 16(3), 192; https://doi.org/10.3390/info16030192 - 1 Mar 2025
Viewed by 614
Abstract
We present a novel approach for recommending actionable strategies by integrating strategic frameworks with decision heuristics through semantic analysis. While strategy frameworks provide systematic models for assessment and planning, and decision heuristics encode experiential knowledge, these traditions have historically remained separate. Our methodology bridges this gap using advanced natural language processing (NLP), demonstrated through integrating frameworks like the 6C model with the Thirty-Six Stratagems. The approach employs vector space representations and semantic similarity calculations to map framework parameters to heuristic patterns, supported by a computational architecture that combines deep semantic processing with constrained use of Large Language Models. By processing both primary content and secondary elements (diagrams, matrices) as complementary linguistic representations, we demonstrate effectiveness through corporate strategy case studies. The methodology generalizes to various analytical frameworks and heuristic sets, culminating in a plug-and-play architecture for generating recommender systems that enable cohesive integration of strategic frameworks and decision heuristics into actionable guidance.

24 pages, 3266 KiB  
Article
A Novel Comprehensive Framework for Detecting and Understanding Health-Related Misinformation
by Halyna Padalko, Vasyl Chomko, Sergiy Yakovlev and Dmytro Chumachenko
Information 2025, 16(3), 175; https://doi.org/10.3390/info16030175 - 26 Feb 2025
Viewed by 670
Abstract
The spread of health-related misinformation has become a significant global challenge, particularly during the COVID-19 pandemic. This study introduces a comprehensive framework for detecting and analyzing misinformation using advanced natural language processing techniques. The proposed classification model combines BERT embeddings with a Bi-LSTM architecture and attention mechanisms, achieving high performance, including 99.47% accuracy and an F1-score of 0.9947. In addition to classification, topic modeling is employed to identify thematic clusters, providing valuable insights into misinformation narratives. The findings demonstrate the effectiveness and reliability of the proposed methodology in detecting misinformation while offering tools for understanding its underlying themes. The adaptable and scalable approach makes it applicable to various domains and datasets. This research supports public health communication and the effort to combat misinformation in digital environments.

13 pages, 1920 KiB  
Article
Bibliometric Analysis on ChatGPT Research with CiteSpace
by Dongyan Nan, Xiangying Zhao, Chaomei Chen, Seungjong Sun, Kyeo Re Lee and Jang Hyun Kim
Information 2025, 16(1), 38; https://doi.org/10.3390/info16010038 - 9 Jan 2025
Cited by 1 | Viewed by 1519
Abstract
ChatGPT is a generative artificial intelligence (AI)-based chatbot developed by OpenAI that has attracted great attention since its launch in late 2022. This study aims to provide an overview of ChatGPT research through a CiteSpace-based bibliometric analysis. We collected 2465 published articles related to ChatGPT from the Web of Science. The main forces in ChatGPT research were identified by examining productive researchers, institutions, and countries/regions. Moreover, we performed co-authorship network analysis at the levels of author and country/region. Additionally, we conducted a co-citation analysis to identify impactful researchers, journals/sources, and literature in the ChatGPT field and performed a cluster analysis to identify the primary themes in this field. The key findings of this study are as follows. First, we found that the most productive researcher, institution, and country in ChatGPT research are Ishith Seth/Himel Mondal, Stanford University, and the United States, respectively. Second, highly cited researchers in this field are Tiffany H. Kung, Tom Brown, and Malik Sallam. Third, impactful sources/journals in this area are ARXIV, Nature, and the Cureus Journal of Medical Science. Fourth, the most impactful work was published by Kung et al., who demonstrated that ChatGPT can potentially support medical education. Fifth, the overall author-based collaboration network consists of several isolated sub-networks, which indicates that the authors work in small groups and lack communication. Sixth, the United Kingdom, India, and Spain had a high degree of betweenness centrality, which means that they play significant roles in the country/region-based collaboration network. Seventh, the major themes in the ChatGPT area were “data processing using ChatGPT”, “exploring user behavioral intention of ChatGPT”, and “applying ChatGPT for differential diagnosis”. Overall, we believe that our findings will help scholars and stakeholders understand the academic development of ChatGPT.

17 pages, 588 KiB  
Article
EX-CODE: A Robust and Explainable Model to Detect AI-Generated Code
by Luana Bulla, Alessandro Midolo, Misael Mongiovì and Emiliano Tramontana
Information 2024, 15(12), 819; https://doi.org/10.3390/info15120819 - 20 Dec 2024
Viewed by 1859
Abstract
Distinguishing whether some code portions were implemented by humans or generated by a tool based on artificial intelligence has become hard. However, such a classification would be important, as it could point developers towards further validation of the produced code. Additionally, it holds significant importance in security, legal contexts, and educational settings, where upholding academic integrity is of utmost importance. We present EX-CODE, a novel and explainable model that leverages the probability of the occurrence of some tokens, within a code snippet, estimated according to a language model, to distinguish human-written from AI-generated code. EX-CODE has been evaluated on a heterogeneous real-world dataset and stands out for its ability to provide human-understandable explanations of its outcomes. It achieves this by uncovering the features that cause a snippet of code to be classified as human-written (or AI-generated).

19 pages, 5271 KiB  
Article
Design and Implementation of an Intelligent Web Service Agent Based on Seq2Seq and Website Crawler
by Mei-Hua Hsih, Jian-Xin Yang and Chen-Chiung Hsieh
Information 2024, 15(12), 818; https://doi.org/10.3390/info15120818 - 20 Dec 2024
Viewed by 739
Abstract
This paper proposes using a web crawler to organize website content as a dialogue tree in some domains. We build an intelligent customer service agent based on this dialogue tree for general usage. The encoder-decoder architecture Seq2Seq is used to understand natural language and is then modified with a bi-directional LSTM to increase accuracy on polysemy cases. An attention mechanism is added in the decoder to mitigate the drop in accuracy as sentences grow in length. We conducted four experiments. The first is an ablation experiment demonstrating that Seq2Seq + bi-directional LSTM + attention mechanism is superior to LSTM, Seq2Seq, and Seq2Seq + attention mechanism in natural language processing. Using an open-source Chinese corpus for testing, the accuracy was 82.1%, 63.4%, 69.2%, and 76.1%, respectively. The second experiment uses knowledge of the target domain to ask questions. Five thousand data items from the Taiwan Water Supply Company were used as the target training data, and a thousand questions that differed from the training data but related to water were used for testing. The accuracy of RasaNLU and this study were 86.4% and 87.1%, respectively. The third experiment uses knowledge from non-target domains to ask questions and compares answers from RasaNLU with the proposed neural network model. Five thousand questions were extracted as the training data, including chat databases from eight public sources such as Weibo, Tieba, Douban, and other well-known social networking sites in mainland China, as well as PTT in Taiwan. Then, 1000 questions from the same corpus that differed from the training data were extracted for testing. The accuracy of this study was 83.2%, which is far better than RasaNLU. This confirms that the proposed model is more accurate in the general field. The last experiment compares this study with voice assistants such as Xiao Ai, Google Assistant, Siri, and Samsung Bixby. Although this study cannot answer vague questions accurately, it is more accurate in the trained application fields.

23 pages, 1262 KiB  
Article
Leveraging Large Language Models in Tourism: A Comparative Study of the Latest GPT Omni Models and BERT NLP for Customer Review Classification and Sentiment Analysis
by Konstantinos I. Roumeliotis, Nikolaos D. Tselikas and Dimitrios K. Nasiopoulos
Information 2024, 15(12), 792; https://doi.org/10.3390/info15120792 - 10 Dec 2024
Cited by 1 | Viewed by 2158
Abstract
In today’s rapidly evolving digital landscape, customer reviews play a crucial role in shaping the reputation and success of hotels. Accurately analyzing and classifying the sentiment of these reviews offers valuable insights into customer satisfaction, enabling businesses to gain a competitive edge. This study undertakes a comparative analysis of traditional natural language processing (NLP) models, such as BERT, and advanced large language models (LLMs), specifically GPT-4 omni and GPT-4o mini, both pre- and post-fine-tuning with few-shot learning. By leveraging an extensive dataset of hotel reviews, we evaluate the effectiveness of these models in predicting star ratings based on review content. The findings demonstrate that the GPT-4 omni family significantly outperforms the BERT model, achieving an accuracy of 67%, compared to BERT’s 60.6%. GPT-4o, in particular, excelled in accuracy and contextual understanding, showcasing the superiority of advanced LLMs over traditional NLP methods. This research underscores the potential of using sophisticated review evaluation systems in the hospitality industry and positions GPT-4o as a transformative tool for sentiment analysis. It marks a new era in automating and interpreting customer feedback with unprecedented precision.

Review

Jump to: Research

27 pages, 2910 KiB  
Review
A Survey on Multimodal Large Language Models in Radiology for Report Generation and Visual Question Answering
by Ziruo Yi, Ting Xiao and Mark V. Albert
Information 2025, 16(2), 136; https://doi.org/10.3390/info16020136 - 12 Feb 2025
Viewed by 1990
Abstract
Large language models (LLMs) and large vision models (LVMs) have driven significant advancements in natural language processing (NLP) and computer vision (CV), establishing a foundation for multimodal large language models (MLLMs) to integrate diverse data types in real-world applications. This survey explores the evolution of MLLMs in radiology, focusing on radiology report generation (RRG) and radiology visual question answering (RVQA), where MLLMs leverage the combined capabilities of LLMs and LVMs to improve clinical efficiency. We begin by tracing the history of radiology and the development of MLLMs, followed by an overview of MLLM applications in RRG and RVQA, detailing core datasets, evaluation metrics, and leading MLLMs that demonstrate their potential in generating radiology reports and answering image-based questions. We then discuss the challenges MLLMs face in radiology, including dataset scarcity, data privacy and security, and issues within MLLMs such as bias, toxicity, hallucinations, catastrophic forgetting, and limitations in traditional evaluation metrics. Finally, this paper proposes future research directions to address these challenges, aiming to help AI researchers and radiologists overcome these obstacles and advance the study of MLLMs in radiology.

36 pages, 1347 KiB  
Review
Transitioning from MLOps to LLMOps: Navigating the Unique Challenges of Large Language Models
by Saurabh Pahune and Zahid Akhtar
Information 2025, 16(2), 87; https://doi.org/10.3390/info16020087 - 23 Jan 2025
Cited by 1 | Viewed by 3501
Abstract
Large Language Models (LLMs), such as the GPT series, LLaMA, and BERT, possess incredible capabilities in human-like text generation and understanding across diverse domains, which have revolutionized artificial intelligence applications. However, their operational complexity necessitates a specialized framework known as LLMOps (Large Language Model Operations), which refers to the practices and tools used to manage lifecycle processes, including model fine-tuning, deployment, and LLM monitoring. LLMOps is a subcategory of the broader concept of MLOps (Machine Learning Operations), which is the practice of automating and managing the lifecycle of ML models. The LLM landscape is currently composed of platforms (e.g., Vertex AI) that manage end-to-end deployment solutions and frameworks (e.g., LangChain) that customize LLM integration and application development. This paper attempts to understand the key differences between LLMOps and MLOps, highlighting their unique challenges, infrastructure requirements, and methodologies. The paper explores the distinction between traditional ML workflows and those required for LLMs to emphasize security concerns, scalability, and ethical considerations. Fundamental platforms, tools, and emerging trends in LLMOps are evaluated to offer actionable information for practitioners. Finally, the paper presents future potential trends for LLMOps by focusing on its critical role in optimizing LLMs for production use in fields such as healthcare, finance, and cybersecurity.
