Emerging Theory and Applications in Natural Language Processing

A special issue of Electronics (ISSN 2079-9292). This special issue belongs to the section "Artificial Intelligence".

Deadline for manuscript submissions: 15 November 2024 | Viewed by 15528

Special Issue Editors


Dr. Linmei Hu
Guest Editor
School of Computer Science and Technology, Beijing Institute of Technology, Beijing 100081, China
Interests: knowledge graph; natural language processing; multimodal

Dr. Jian Liu
Guest Editor
School of Computer Science, Beijing Jiaotong University, Beijing 100091, China
Interests: natural language processing; knowledge graph; machine learning

Dr. Bo Xu
Guest Editor
School of Computer Science and Technology, Dalian University of Technology, Dalian 116081, China
Interests: information retrieval; question answering and dialogue; natural language processing; biomedical literature-based knowledge discovery

Special Issue Information

Dear Colleagues,

In recent years, natural language processing (NLP) has been transformed by groundbreaking advances in deep learning and the emergence of large language models (LLMs). Combining LLMs with adaptation tuning methods has significantly increased the generalization capabilities of NLP models, pointing researchers toward general artificial intelligence systems. Given the significance of these developments, it is crucial to explore their potential and to understand how they relate to classical methods in shaping the future of NLP and its real-world applications. The aim of this Special Issue is to showcase cutting-edge research in NLP, highlighting novel theories, methods, and applications that advance the state of the art, while also promoting interdisciplinary research.

Suggested themes for this Special Issue include, but are not limited to:

(1)  Novel NLP theory, architectures, and algorithms;

(2)  Theoretical foundations of LLMs: emergent abilities, scaling effects, etc.;

(3)  Model training and utilization strategies;

(4)  Efficiency and scalability of language models;

(5)  Integration of NLP with other AI technologies;

(6)  Interpretability of NLP models and LLMs;

(7)  Evaluating large language models: capabilities and limitations;

(8)  Ethical considerations and fairness;

(9)  Safety and alignment in LLMs;

(10) Domain-specific NLP applications;

(11)  Other emerging topics in NLP and LLM research.

Dr. Linmei Hu
Dr. Jian Liu
Dr. Bo Xu
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, click here to go to the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the special issue website. Research articles, review articles as well as short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Electronics is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • natural language processing
  • large language models
  • NLP theory and application

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found here.

Published Papers (11 papers)


Research

16 pages, 319 KiB  
Article
KRA: K-Nearest Neighbor Retrieval Augmented Model for Text Classification
by Jie Li, Chang Tang, Zhechao Lei, Yirui Zhang, Xuan Li, Yanhua Yu, Renjie Pi and Linmei Hu
Electronics 2024, 13(16), 3237; https://doi.org/10.3390/electronics13163237 - 15 Aug 2024
Viewed by 565
Abstract
Text classification is a fundamental task in natural language processing (NLP). Deep-learning-based text classification methods usually have two stages: training and inference. However, the training dataset is only used in the training stage. To make full use of the training dataset in the inference stage in order to improve model performance, we propose a k-nearest neighbors retrieval augmented method (KRA) for deep-learning-based text classification models. KRA works by first constructing a storage system that stores the embeddings of the training samples during the training stage. During the inference stage, the model retrieves the top k-nearest neighbors of the testing text from the storage. Then, we use text augmentation methods to expand the retrieved neighbors, including traditional augmentation methods and a large language model (LLM)-based method. Next, the method weights the augmented neighbors based on their distances from the target text and incorporates their labels into the inference of the final results accordingly. We evaluate our KRA method on six benchmark datasets using four commonly used deep learning models: CNN, LSTM, BERT, and RoBERTa. The results demonstrate that KRA significantly improves the classification performance of these models, with an average accuracy improvement of 0.3% for BERT and up to 0.4% for RoBERTa. These improvements highlight the effectiveness and generalizability of KRA across different models and datasets, making it a valuable enhancement for a wide range of text classification tasks. Full article
(This article belongs to the Special Issue Emerging Theory and Applications in Natural Language Processing)
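The retrieval-augmented inference the abstract describes can be sketched roughly as follows; the distance weighting, interpolation coefficient, and all names here are illustrative assumptions, not the authors' implementation:

```python
import numpy as np

def knn_augmented_predict(query_emb, store_embs, store_labels, model_probs,
                          n_classes, k=8, lam=0.5, temperature=1.0):
    """Blend a model's class probabilities with a distance-weighted vote
    over the k nearest stored training embeddings (hypothetical sketch)."""
    # Euclidean distance from the query to every stored training embedding
    dists = np.linalg.norm(store_embs - query_emb, axis=1)
    nn = np.argsort(dists)[:k]                  # indices of the k nearest
    # Closer neighbors receive larger weights (softmax over negative distance)
    w = np.exp(-dists[nn] / temperature)
    w /= w.sum()
    knn_probs = np.zeros(n_classes)
    for weight, idx in zip(w, nn):
        knn_probs[store_labels[idx]] += weight
    # Interpolate the backbone's prediction with the neighbor vote
    return lam * model_probs + (1 - lam) * knn_probs
```

A query embedding that falls near stored examples of one class pulls the final distribution toward that class even when the backbone model is uncertain, which is the intuition behind using the training set again at inference time.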

18 pages, 487 KiB  
Article
NLOCL: Noise-Labeled Online Continual Learning
by Kan Cheng, Yongxin Ma, Guanglu Wang, Linlin Zong and Xinyue Liu
Electronics 2024, 13(13), 2560; https://doi.org/10.3390/electronics13132560 - 29 Jun 2024
Viewed by 429
Abstract
Continual learning (CL) from infinite data streams has become a challenge for neural network models in real-world scenarios. Catastrophic forgetting of previous knowledge occurs in this learning setting, and existing supervised CL methods rely excessively on accurately labeled samples. However, real-world data labels are usually corrupted by noise, which misleads CL agents and aggravates forgetting. To address this problem, we propose a method named noise-labeled online continual learning (NLOCL), which implements the online CL model with noise-labeled data streams. NLOCL uses an experience replay strategy to retain crucial examples, separates data streams by small-loss criteria, and includes semi-supervised fine-tuning for labeled and unlabeled samples. In addition, NLOCL combines small loss with class diversity measures and eliminates online memory partitioning. Furthermore, we optimized the experience replay stage to enhance the model performance by retaining significant clean-labeled examples and carefully selecting suitable samples. In the experiment, we designed noise-labeled data streams by injecting noisy labels into multiple datasets and partitioning tasks to simulate infinite data streams realistically. The experimental results demonstrate the superior performance and robust learning capabilities of our proposed method. Full article
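The small-loss criterion mentioned in the abstract, under which low-loss samples are treated as (probably) clean and the rest as noisy, can be sketched as follows; the selection ratio and function name are assumptions for illustration, not the paper's actual procedure:

```python
import numpy as np

def split_by_small_loss(probs, labels, clean_fraction=0.5):
    """Small-loss sketch: samples whose cross-entropy against their
    (possibly noisy) labels is lowest are treated as clean-labeled;
    the remainder are treated as noisy/unlabeled for semi-supervised use."""
    # Per-sample cross-entropy against the given labels
    losses = -np.log(probs[np.arange(len(labels)), labels] + 1e-12)
    n_clean = max(1, int(clean_fraction * len(labels)))
    order = np.argsort(losses)        # ascending loss
    clean_idx = order[:n_clean]
    noisy_idx = order[n_clean:]
    return clean_idx, noisy_idx
```

The underlying assumption, common in noisy-label learning, is that networks fit clean labels earlier than noisy ones, so low-loss samples are more trustworthy.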

24 pages, 8284 KiB  
Article
Hybrid Natural Language Processing Model for Sentiment Analysis during Natural Crisis
by Marko Horvat, Gordan Gledec and Fran Leontić
Electronics 2024, 13(10), 1991; https://doi.org/10.3390/electronics13101991 - 20 May 2024
Viewed by 899
Abstract
This paper introduces a novel natural language processing (NLP) model as an original approach to sentiment analysis, with a focus on understanding emotional responses during major disasters or conflicts. The model was created specifically for Croatian and is based on unigrams, but it can be used with any language that supports the n-gram model and expanded to multiple word sequences. The presented model generates a sentiment score aligned with discrete and dimensional emotion models, reliability metrics, and individual word scores using the affective datasets Extended ANEW and the NRC Word-Emotion Association Lexicon. The sentiment analysis model incorporates different methodologies, including lexicon-based, machine learning, and hybrid approaches. The preprocessing pipeline includes translation, lemmatization, and data refinement, utilizing automated translation services as well as the CLARIN Knowledge Centre for South Slavic languages (CLASSLA) library, with a particular emphasis on diacritical mark correction and tokenization. The presented model was experimentally evaluated on three simultaneous major natural crises that recently affected Croatia. The study’s findings reveal a significant shift in emotional dimensions during the COVID-19 pandemic, particularly a decrease in valence, arousal, and dominance, which corresponded with the two-month recovery period. Furthermore, the 2020 Croatian earthquakes elicited a wide range of negative discrete emotions, including anger, fear, and sadness, with a recuperation period much longer than in the case of COVID-19. This study represents an advancement in sentiment analysis, particularly in linguistically specific contexts, and provides insights into the emotional landscape shaped by major societal events. Full article
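The lexicon-based core of such a model can be sketched by averaging valence-arousal-dominance (VAD) coordinates over the tokens found in an ANEW-style lexicon; the entries and the coverage-based reliability cue below are illustrative assumptions, not the paper's actual data or scoring formula:

```python
# Toy affective lexicon with (valence, arousal, dominance) in [1, 9],
# in the style of Extended ANEW; these entries are illustrative only.
VAD = {
    "fear":     (2.25, 6.96, 3.43),
    "hope":     (7.05, 5.44, 5.80),
    "damage":   (2.88, 5.03, 4.16),
    "recovery": (6.66, 4.23, 5.85),
}

def score_text(tokens, lexicon=VAD):
    """Average the VAD coordinates of the lemmas found in the lexicon.
    Coverage (fraction of tokens scored) serves as a rough reliability cue."""
    hits = [lexicon[t] for t in tokens if t in lexicon]
    if not hits:
        return None, 0.0
    v, a, d = (sum(x) / len(hits) for x in zip(*hits))
    return (v, a, d), len(hits) / len(tokens)
```

In a real pipeline the tokens would first be lemmatized (e.g. with CLASSLA for Croatian) so that inflected forms match the lexicon entries.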

17 pages, 1582 KiB  
Article
AdMISC: Advanced Multi-Task Learning and Feature-Fusion for Emotional Support Conversation
by Xuhui Jia, Jia He, Qian Zhang and Jin Jin
Electronics 2024, 13(8), 1484; https://doi.org/10.3390/electronics13081484 - 13 Apr 2024
Viewed by 946
Abstract
The emotional support dialogue system is an emerging and challenging task in natural language processing to alleviate people’s emotional distress. Each utterance in the dialogue has features such as emotion, intent, and commonsense knowledge. Previous research has indicated subpar performance in strategy prediction accuracy and response generation quality due to overlooking certain underlying factors. To address these issues, we propose Advanced Multi-Task Learning and Feature-Fusion for Emotional Support Conversation (AdMISC), which extracts various potential factors influencing dialogue through neural networks, thereby improving the accuracy of strategy prediction and the quality of generated responses. Specifically, we extract features affecting dialogue through dynamic emotion extraction and commonsense enhancement and then model strategy prediction. Additionally, the model learns these features through attention networks to generate higher quality responses. Furthermore, we introduce a method for automatically averaging loss function weights to improve the model’s performance. Experimental results using the emotional support conversation dataset ESConv demonstrate that our proposed model outperforms baseline methods in both strategy label prediction accuracy and a range of automatic and human evaluation metrics. Full article

25 pages, 3406 KiB  
Article
Persona-Identified Chatbot through Small-Scale Modeling and Data Transformation
by Bitna Keum, Juoh Sun, Woojin Lee, Seongheum Park and Harksoo Kim
Electronics 2024, 13(8), 1409; https://doi.org/10.3390/electronics13081409 - 9 Apr 2024
Viewed by 1094
Abstract
Research on chatbots aimed at facilitating more natural and engaging conversations is actively underway. With the growing recognition of the significance of personas in this context, persona-based conversational research is gaining prominence. Despite the abundance of publicly available chit-chat datasets, persona-based chat datasets remain scarce, primarily due to the higher associated costs. Consequently, we propose a methodology for transforming extensive chit-chat datasets into persona-based chat datasets. Simultaneously, we propose a model adept at effectively incorporating personas into responses, even with a constrained number of parameters. This model can discern the most relevant information from persona memory without resorting to a retrieval model. Furthermore, it makes decisions regarding whether to reference the memory, thereby enhancing the interpretability of the model’s judgments. Our CC2PC framework demonstrates superior performance in both automatic and LLM evaluations when compared to a high-cost persona-based chat dataset. Additionally, experimental results on the proposed model indicate improved persona-based response capabilities. Full article

21 pages, 9086 KiB  
Article
Robust Testing of AI Language Model Resiliency with Novel Adversarial Prompts
by Brendan Hannon, Yulia Kumar, Dejaun Gayle, J. Jenny Li and Patricia Morreale
Electronics 2024, 13(5), 842; https://doi.org/10.3390/electronics13050842 - 22 Feb 2024
Cited by 1 | Viewed by 2236
Abstract
In the rapidly advancing field of Artificial Intelligence (AI), this study presents a critical evaluation of the resilience and cybersecurity efficacy of leading AI models, including ChatGPT-4, Bard, Claude, and Microsoft Copilot. Central to this research are innovative adversarial prompts designed to rigorously test the content moderation capabilities of these AI systems. This study introduces new adversarial tests and the Response Quality Score (RQS), a metric specifically developed to assess the nuances of AI responses. Additionally, the research spotlights FreedomGPT, an AI tool engineered to optimize the alignment between user intent and AI interpretation. The empirical results from this investigation are pivotal for assessing AI models’ current robustness and security. They highlight the necessity for ongoing development and meticulous testing to bolster AI defenses against various adversarial challenges. Notably, this study also delves into the ethical and societal implications of employing advanced “jailbreak” techniques in AI testing. The findings are significant for understanding AI vulnerabilities and formulating strategies to enhance AI technologies’ reliability and ethical soundness, paving the way for safer and more secure AI applications. Full article

21 pages, 815 KiB  
Article
Prediction of Arabic Legal Rulings Using Large Language Models
by Adel Ammar, Anis Koubaa, Bilel Benjdira, Omer Nacar and Serry Sibaee
Electronics 2024, 13(4), 764; https://doi.org/10.3390/electronics13040764 - 15 Feb 2024
Viewed by 1451
Abstract
In the intricate field of legal studies, the analysis of court decisions is a cornerstone for the effective functioning of the judicial system. The ability to predict court outcomes helps judges during the decision-making process and equips lawyers with invaluable insights, enhancing their strategic approaches to cases. Despite its significance, the domain of Arabic court analysis remains under-explored. This paper pioneers a comprehensive predictive analysis of Arabic court decisions on a dataset of 10,813 commercial court real cases, leveraging the advanced capabilities of the current state-of-the-art large language models. Through a systematic exploration, we evaluate three prevalent foundational models (LLaMA-7b, JAIS-13b, and GPT-3.5-turbo) and three training paradigms: zero-shot, one-shot, and tailored fine-tuning. In addition, we assess the benefit of summarizing and/or translating the original Arabic input texts. This leads to a spectrum of 14 model variants, for which we offer a granular performance assessment with a series of different metrics (human assessment, GPT evaluation, ROUGE, and BLEU scores). We show that all variants of LLaMA models yield limited performance, whereas GPT-3.5-based models outperform all other models by a wide margin, surpassing the average score of the dedicated Arabic-centric JAIS model by 50%. Furthermore, we show that all scores except human evaluation are inconsistent and unreliable for assessing the performance of large language models on court decision predictions. This study paves the way for future research, bridging the gap between computational linguistics and Arabic legal analytics. Full article

24 pages, 2195 KiB  
Article
CLICK: Integrating Causal Inference and Commonsense Knowledge Incorporation for Counterfactual Story Generation
by Dandan Li, Ziyu Guo, Qing Liu, Li Jin, Zequn Zhang, Kaiwen Wei and Feng Li
Electronics 2023, 12(19), 4173; https://doi.org/10.3390/electronics12194173 - 8 Oct 2023
Viewed by 1431
Abstract
Counterfactual reasoning explores what could have happened if the circumstances were different from what actually occurred. As a crucial subtask, counterfactual story generation integrates counterfactual reasoning into the generative narrative chain, which requires the model to preserve minimal edits and ensure narrative consistency. Previous work prioritizes conflict detection as a first step, and then replaces conflicting content with appropriate words. However, these methods mainly face two challenging issues: (a) the causal relationship between story event sequences is not fully utilized in the conflict detection stage, leading to inaccurate conflict detection, and (b) the absence of proper planning in the content rewriting stage results in a lack of narrative consistency in the generated story ending. In this paper, we propose a novel counterfactual generation framework called CLICK based on causal inference in event sequences and commonsense knowledge incorporation. To address the first issue, we utilize the correlation between adjacent events in the story ending to iteratively calculate the contents from the original ending affected by the condition. The content with the original condition is then effectively prevented from carrying over into the new story ending, thereby avoiding causal conflict with the counterfactual conditions. Considering the second issue, we incorporate structural commonsense knowledge about counterfactual conditions, equipping the framework with comprehensive background information on the potential occurrence of counterfactual conditional events. Through leveraging a rich hierarchical data structure, CLICK gains the ability to establish a more coherent and plausible narrative trajectory for subsequent storytelling. Experimental results show that our model outperforms previous unsupervised state-of-the-art methods and achieves gains of 2.65 in BLEU, 4.42 in ENTScore, and 3.84 in HMean on the TIMETRAVEL dataset. Full article

16 pages, 615 KiB  
Article
Asking Questions about Scientific Articles—Identifying Large N Studies with LLMs
by Razvan Paroiu, Stefan Ruseti, Mihai Dascalu, Stefan Trausan-Matu and Danielle S. McNamara
Electronics 2023, 12(19), 3996; https://doi.org/10.3390/electronics12193996 - 22 Sep 2023
Viewed by 1181
Abstract
The exponential growth of scientific publications increases the effort required to identify relevant articles. Moreover, the scale of studies is a frequent barrier to research as the majority of studies are low or medium-scaled and do not generalize well while lacking statistical power. As such, we introduce an automated method that supports the identification of large-scale studies in terms of population. First, we introduce a training corpus of 1229 manually annotated paragraphs extracted from 20 articles with different structures and considered populations. Our method considers prompting a FLAN-T5 language model with targeted questions and paragraphs from the previous corpus so that the model returns the number of participants from the study. We adopt a dialogic extensible approach in which the model is asked a sequence of questions that are gradual in terms of focus. Second, we use a validation corpus with 200 articles labeled for having N larger than 1000 to assess the performance of our language model. Our model, without any preliminary filtering with heuristics, achieves an F1 score of 0.52, surpassing previous analyses that obtained an F1 score of 0.51. Moreover, we achieved an F1 score of 0.69 when combined with previous extraction heuristics, thus arguing for the robustness and extensibility of our approach. Finally, we apply our model to a newly introduced dataset of ERIC publications to observe trends across the years in the Education domain. A spike was observed in 2019, followed by a decrease in 2020 and, afterward, a positive trend; nevertheless, the overall percentage is lower than 3%, suggesting a major problem in terms of scale and the need for a change in perspective. Full article

19 pages, 3239 KiB  
Article
ConKgPrompt: Contrastive Sample Method Based on Knowledge-Guided Prompt Learning for Text Classification
by Qian Wang, Cheng Zeng, Bing Li and Peng He
Electronics 2023, 12(17), 3656; https://doi.org/10.3390/electronics12173656 - 30 Aug 2023
Viewed by 1988
Abstract
Text classification aims to classify text according to pre-defined categories. Despite the success of existing methods based on the fine-tuning paradigm, there is a significant gap between fine-tuning and pre-training. Currently, prompt learning methods can bring state of the art (SOTA) performance to pre-trained language models (PLMs) in text classification and transform a classification problem into a masked language modeling problem. The crucial step of prompt learning is to construct a map between original labels and the label extension words. However, most mapping construction methods consider only labels themselves; relying solely on a label is not sufficient to achieve accurate prediction of mask tokens, especially in classification tasks where semantic features and label words are highly interrelated. Therefore, the accurate prediction of mask tokens requires one to consider additional factors beyond just label words. To this end, we propose a contrastive sample method based on knowledge-guided prompt learning framework (ConKgPrompt) for text classification. Specifically, this framework utilizes external knowledge bases (KBs) to expand the label vocabulary of verbalizers at multiple granularities. In the contrastive sample module, we incorporate supervised contrastive learning to make representations more expressive. Our approach was validated on four benchmark datasets, and extensive experimental results and analysis demonstrated the effectiveness of each module of the ConKgPrompt method. Full article
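The verbalizer-expansion idea, mapping each class to several label words and scoring classes via masked-token probabilities, can be sketched as follows; the word lists and probabilities are toy values, and in a real system `mask_token_probs` would come from a pre-trained language model's prediction at the [MASK] position:

```python
# Hypothetical expanded verbalizer: each class maps to several label words
# (in ConKgPrompt these expansions come from external knowledge bases).
verbalizer = {
    "sports":   ["sports", "football", "athlete"],
    "business": ["business", "finance", "market"],
}

def classify(mask_token_probs, verbalizer):
    """Pick the class whose expanded label words receive the highest mean
    probability at the masked position; words absent from the vocabulary
    distribution contribute zero."""
    scores = {
        label: sum(mask_token_probs.get(w, 0.0) for w in words) / len(words)
        for label, words in verbalizer.items()
    }
    return max(scores, key=scores.get), scores
```

Expanding each label into multiple related words makes the prediction less dependent on a single token, which is the motivation the abstract gives for knowledge-guided verbalizers.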

18 pages, 546 KiB  
Article
Prompt Learning with Structured Semantic Knowledge Makes Pre-Trained Language Models Better
by Hai-Tao Zheng, Zuotong Xie, Wenqiang Liu, Dongxiao Huang, Bei Wu and Hong-Gee Kim
Electronics 2023, 12(15), 3281; https://doi.org/10.3390/electronics12153281 - 30 Jul 2023
Viewed by 1651
Abstract
Pre-trained language models with structured semantic knowledge have demonstrated remarkable performance in a variety of downstream natural language processing tasks. The typical methods of integrating knowledge are designing different pre-training tasks and training from scratch, which requires high-end hardware, massive storage resources, and long computing times. Prompt learning is an effective approach to tuning language models for specific tasks, and it can also be used to infuse knowledge. However, most prompt learning methods accept one token as the answer, instead of multiple tokens. To tackle this problem, we propose the long-answer prompt learning method (KLAPrompt), with three different long-answer strategies, to incorporate semantic knowledge into pre-trained language models, and we compare the performance of these three strategies through experiments. We also explore the effectiveness of the KLAPrompt method in the medical field. Additionally, we generate a word sense prediction dataset (WSP) based on the Xinhua Dictionary and a disease and category prediction dataset (DCP) based on MedicalKG. Experimental results show that discrete answers with the answer space partitioning strategy achieve the best results, and introducing structured semantic information can consistently improve language modeling and downstream tasks. Full article
