Natural Language Processing: Novel Methods and Applications

A special issue of Applied Sciences (ISSN 2076-3417). This special issue belongs to the section "Computing and Artificial Intelligence".

Deadline for manuscript submissions: closed (20 December 2024) | Viewed by 19411

Special Issue Editors


Dr. Giovanni Siragusa
Guest Editor
Department of Computer Science, Università di Torino, 10124 Turin, Italy
Interests: text summarization; chatbot; word embedding; natural language processing

Dr. Emilio Sulis
Guest Editor
Department of Computer Science, Università di Torino, 10124 Turin, Italy
Interests: text mining; information extraction; knowledge management; process mining; network analysis

Special Issue Information

Dear Colleagues,

We are inviting submissions to this Special Issue, entitled “Natural Language Processing: Novel Methods and Applications”.

Natural language processing is becoming ubiquitous in our lives. It is used in many fields and contexts, ranging from simple voice assistants for home and car automation to complex systems that retrieve similar judicial cases for courts.

In the last decade, the availability of a large volume and variety of textual documents has attracted interest from many scientific and humanistic communities. The main objective of this Special Issue is therefore to foster a shared view of the topic, integrating ideas and real-world applications from different communities.

In this Special Issue, we invite submissions exploring novel research frontiers and recent advances in this field, demonstrating how the interaction of different communities (such as psychology, law, etc.) and research fields (such as logic, human–computer interaction, deep learning, etc.) can both benefit from natural language processing and, in turn, advance the field itself.

Dr. Giovanni Siragusa
Dr. Emilio Sulis
Guest Editors

Manuscript Submission Information

Manuscripts should be submitted online at www.mdpi.com by registering and logging in to this website. Once you are registered, you can access the submission form. Manuscripts can be submitted until the deadline. All submissions that pass pre-check are peer-reviewed. Accepted papers will be published continuously in the journal (as soon as accepted) and will be listed together on the Special Issue website. Research articles, review articles, and short communications are invited. For planned papers, a title and short abstract (about 100 words) can be sent to the Editorial Office for announcement on this website.

Submitted manuscripts should not have been published previously, nor be under consideration for publication elsewhere (except conference proceedings papers). All manuscripts are thoroughly refereed through a single-blind peer-review process. A guide for authors and other relevant information for submission of manuscripts is available on the Instructions for Authors page. Applied Sciences is an international peer-reviewed open access semimonthly journal published by MDPI.

Please visit the Instructions for Authors page before submitting a manuscript. The Article Processing Charge (APC) for publication in this open access journal is 2400 CHF (Swiss Francs). Submitted papers should be well formatted and use good English. Authors may use MDPI's English editing service prior to publication or during author revisions.

Keywords

  • machine translation
  • irony and sarcasm detection
  • conversational agents
  • sentiment analysis
  • parsing and grammar formalism
  • speech recognition
  • text summarization
  • lexical semantics
  • linguistic resources
  • contrastive learning
  • social media mining
  • recommendation systems
  • information retrieval and semantic search
  • human–computer interaction

Benefits of Publishing in a Special Issue

  • Ease of navigation: Grouping papers by topic helps scholars navigate broad scope journals more efficiently.
  • Greater discoverability: Special Issues support the reach and impact of scientific research. Articles in Special Issues are more discoverable and cited more frequently.
  • Expansion of research network: Special Issues facilitate connections among authors, fostering scientific collaborations.
  • External promotion: Articles in Special Issues are often promoted through the journal's social media, increasing their visibility.
  • e-Book format: Special Issues with more than 10 articles can be published as dedicated e-books, ensuring wide and rapid dissemination.

Further information on MDPI's Special Issue policies can be found on the journal's website.

Published Papers (9 papers)

Research

16 pages, 2242 KiB  
Article
Effective Data Augmentation Techniques for Arabic Speech Emotion Recognition Using Convolutional Neural Networks
by Wided Bouchelligua, Reham Al-Dayil and Areej Algaith
Appl. Sci. 2025, 15(4), 2114; https://doi.org/10.3390/app15042114 - 17 Feb 2025
Viewed by 525
Abstract
This paper investigates the effectiveness of various data augmentation techniques for enhancing Arabic speech emotion recognition (SER) using convolutional neural networks (CNNs). Utilizing the Saudi Dialect and BAVED datasets, we address the challenges of limited and imbalanced data commonly found in Arabic SER. To improve model performance, we apply augmentation techniques such as noise addition, time shifting, increasing volume, and reducing volume. Additionally, we examine the optimal number of augmentations required to achieve the best results. Our experiments reveal that these augmentations significantly enhance the CNN’s ability to recognize emotions, with certain techniques proving more effective than others. Furthermore, the number of augmentations plays a critical role in model accuracy. The Saudi Dialect dataset achieved its best results with two augmentations (increasing and reducing volume), reaching an accuracy of 96.81%. Similarly, the BAVED dataset demonstrated optimal performance with a combination of three augmentations (noise addition, increasing volume, and reducing volume), achieving an accuracy of 92.60%. These findings indicate that carefully selected augmentation strategies can greatly improve the performance of CNN-based SER systems, particularly in the context of Arabic speech. This research underscores the importance of tailored augmentation techniques for SER performance and sets a foundation for future advancements in this field. Full article
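
The four augmentations named in the abstract are simple waveform-level transforms. As a rough illustration, here is a minimal sketch assuming recordings are 1-D NumPy arrays; the gain values, noise factor, and shift range are illustrative, not the paper's settings.

    import numpy as np

    def add_noise(wave, noise_factor=0.005):
        # Mix in Gaussian noise scaled by noise_factor.
        return wave + noise_factor * np.random.randn(len(wave))

    def time_shift(wave, max_shift=1600):
        # Circularly shift the waveform by a random number of samples.
        return np.roll(wave, np.random.randint(-max_shift, max_shift))

    def change_volume(wave, gain):
        # gain > 1 increases volume; gain < 1 reduces it.
        return wave * gain

    wave = np.random.randn(16000)  # stand-in for a 1 s recording at 16 kHz
    # The two-augmentation setup reported for the Saudi Dialect dataset:
    augmented = [change_volume(wave, 1.5), change_volume(wave, 0.5)]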

19 pages, 421 KiB  
Article
Task-Oriented Adversarial Attacks for Aspect-Based Sentiment Analysis Models
by Monserrat Vázquez-Hernández, Ignacio Algredo-Badillo, Luis Villaseñor-Pineda, Mariana Lobato-Báez, Juan Carlos Lopez-Pimentel and Luis Alberto Morales-Rosales
Appl. Sci. 2025, 15(2), 855; https://doi.org/10.3390/app15020855 - 16 Jan 2025
Viewed by 849
Abstract
Adversarial attacks deliberately modify deep learning inputs, mislead models, and cause incorrect results. Previous adversarial attacks on sentiment analysis models have demonstrated success in misleading these models. However, most existing attacks in sentiment analysis apply a generalized approach to input modifications, without considering the characteristics and objectives of the different analysis levels. Specifically, for aspect-based sentiment analysis, there is a lack of attack methods that modify inputs in accordance with the evaluated aspects. Consequently, unnecessary modifications are made, compromising the input semantics, making the changes more detectable, and preventing the identification of new vulnerabilities. In previous work, we proposed a model for generating adversarial examples tailored to aspect-based sentiment analysis. In this paper, we assess the effectiveness of our adversarial example model in degrading aspect-based model results while largely preserving the semantics of the inputs. To conduct this evaluation, we run diverse adversarial attacks across different dataset domains and target architectures, and consider distinct levels of victim-model knowledge, thus obtaining a comprehensive evaluation. The obtained results demonstrate that our approach outperforms existing attack methods in terms of accuracy reduction and semantic similarity, achieving a 65.30% reduction in model accuracy with a low perturbation ratio of 7.79%. These findings highlight the importance of considering task-specific characteristics when designing adversarial examples, as even simple modifications to elements that support task classification can successfully mislead models. Full article
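
To make the core idea concrete, here is a toy illustration (not the authors' model) of restricting perturbations to words near the evaluated aspect, so the rest of the input stays untouched; the synonym table and window size are invented for the example.

    SYNONYMS = {"slow": "unhurried", "tasty": "flavorful", "rude": "brusque"}

    def aspect_targeted_attack(tokens, aspect, window=3):
        # Swap only synonym-replaceable words within `window` tokens of the aspect.
        if aspect not in tokens:
            return tokens
        center = tokens.index(aspect)
        return [SYNONYMS[t] if abs(i - center) <= window and t in SYNONYMS else t
                for i, t in enumerate(tokens)]

    sentence = "the service was slow but the food stayed tasty".split()
    print(aspect_targeted_attack(sentence, "service"))
    # ['the', 'service', 'was', 'unhurried', 'but', 'the', 'food', 'stayed', 'tasty']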

14 pages, 325 KiB  
Article
Word Vector Representation of Latin Cuengh Based on Root Feature Enhancement
by Weibin Lyu, Jinlong Chen, Xingguo Qin and Jun Li
Appl. Sci. 2025, 15(1), 211; https://doi.org/10.3390/app15010211 - 29 Dec 2024
Viewed by 859
Abstract
Latin Cuengh is a language used in China’s minority regions. Owing to its complex pronunciation and semantic system, it is difficult to spread widely. To further study and preserve this language, this paper applies current word vector representation technology to it. Word vector representation is a basic method and an important foundation of current natural language processing research. It relies on large data resources and is obtained through the paradigm of pre-training and feature learning. Because Latin Cuengh corpus resources are extremely scarce, it is very difficult to obtain word vectors through large-scale data training. In this study, we propose a word vector representation method that incorporates the root features of Latin Cuengh words. Specifically, while training on the Latin Cuengh corpus, this method uses the special word roots of the language to modify the training process, which enhances the expressive power of the root features. The method applies BERT-style masking to the word roots after word segmentation and predicts the masked roots at the model’s output layer to obtain better vector representations of Latin Cuengh words. The experimental results show that the proposed word vector representation method is effective and captures Latin Cuengh semantics. Word-level semantic accuracy is nearly two percentage points higher than that of the plain BERT representation, and judgments of word semantic similarity are more accurate. Full article
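
The root-masking step can be pictured with a standard masked-language-model interface. A minimal sketch, assuming Hugging Face Transformers, a multilingual BERT checkpoint as a stand-in for the paper's model, and an invented root inventory and example sentence:

    import torch
    from transformers import AutoTokenizer, AutoModelForMaskedLM

    name = "bert-base-multilingual-cased"  # stand-in checkpoint
    tokenizer = AutoTokenizer.from_pretrained(name)
    model = AutoModelForMaskedLM.from_pretrained(name)

    ROOTS = {"ndei"}  # hypothetical root inventory

    # Mask every root token so the model must predict it from context.
    words = ["mwngz", "ndei", "lai"]
    text = " ".join(tokenizer.mask_token if w in ROOTS else w for w in words)

    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        logits = model(**inputs).logits
    pos = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero()[0].item()
    pred = logits[0, pos].argmax().item()
    print(tokenizer.decode([pred]))  # model's guess for the masked root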

16 pages, 635 KiB  
Article
TAWC: Text Augmentation with Word Contributions for Imbalance Aspect-Based Sentiment Classification
by Noviyanti Santoso, Israel Mendonça and Masayoshi Aritsugi
Appl. Sci. 2024, 14(19), 8738; https://doi.org/10.3390/app14198738 - 27 Sep 2024
Viewed by 940
Abstract
Text augmentation plays an important role in enhancing the generalizability of language models. However, traditional methods often overlook both the unique roles that individual words play in conveying meaning and imbalanced class distributions, thereby risking suboptimal performance and compromising the model’s generalizability. This limitation motivated us to develop a novel technique called Text Augmentation with Word Contributions (TAWC). Our approach tackles this problem in two core steps: first, it employs analytical correlation and semantic similarity metrics to discern the relationships between words and their associated aspect polarities; second, it tailors distinct augmentation strategies to individual words based on their identified functional contributions in the text. Extensive experiments on two aspect-based sentiment analysis datasets demonstrate that the proposed TAWC model significantly improves the classification performance of popular language models, achieving gains of up to 4% compared with unaugmented data, thereby setting a new standard in the field of text augmentation. Full article
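
The gist of contribution-aware augmentation can be sketched in a few lines. This is a simplified stand-in, assuming a crude conditional-probability contribution score rather than the paper's analytical correlation and semantic similarity metrics:

    import random
    from collections import Counter

    def word_contributions(docs, labels):
        # Crude contribution score: P(positive | word) - P(positive).
        prior = sum(labels) / len(labels)
        pos, tot = Counter(), Counter()
        for words, y in zip(docs, labels):
            for w in set(words):
                tot[w] += 1
                pos[w] += y
        return {w: pos[w] / tot[w] - prior for w in tot}

    def augment(words, contrib, thresh=0.2, p_drop=0.3):
        # Randomly delete words, but never those that carry polarity.
        return [w for w in words
                if abs(contrib.get(w, 0.0)) >= thresh or random.random() > p_drop]

    docs = [["great", "battery", "life"], ["awful", "battery", "life"]]
    contrib = word_contributions(docs, labels=[1, 0])
    print(augment(["great", "battery", "life"], contrib))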

26 pages, 6325 KiB  
Article
Improving the Accuracy and Effectiveness of Text Classification Based on the Integration of the Bert Model and a Recurrent Neural Network (RNN_Bert_Based)
by Chanthol Eang and Seungjae Lee
Appl. Sci. 2024, 14(18), 8388; https://doi.org/10.3390/app14188388 - 18 Sep 2024
Cited by 6 | Viewed by 4687
Abstract
This paper proposes a new robust model for text classification on the Stanford Sentiment Treebank v2 (SST-2) dataset, with a focus on model accuracy. We developed a Recurrent Neural Network Bert-based (RNN_Bert_based) model designed to improve classification accuracy on the SST-2 dataset. This dataset consists of movie review sentences, each labeled with either positive or negative sentiment, making it a binary classification task. Recurrent Neural Networks (RNNs) are effective for text classification because they capture the sequential nature of language, which is crucial for understanding context and meaning. Bert excels at text classification by providing bidirectional context, generating contextual embeddings, and leveraging pre-training on large corpora, which allows it to capture nuanced meanings and relationships within the text. Combining Bert with RNNs can therefore be highly effective: Bert’s bidirectional context and rich embeddings provide a deep understanding of the text, while RNNs capture sequential patterns and long-range dependencies, leveraging the strengths of both architectures on complex classification tasks. We also developed an integration of Bert with a K-Nearest Neighbor-based method (KNN_Bert_based) as a comparative scheme for our proposed work. Our experiments show that the proposed model outperforms traditional text classification models, as well as existing models, in terms of accuracy. Full article
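
One plausible reading of such a stack, sketched with PyTorch and Hugging Face Transformers: BERT's contextual embeddings feed an LSTM, whose final state drives a binary classifier. The hidden size and other hyperparameters are placeholders, not the paper's configuration.

    import torch
    import torch.nn as nn
    from transformers import BertModel, BertTokenizer

    class RnnBertClassifier(nn.Module):
        def __init__(self, hidden=256, n_classes=2):
            super().__init__()
            self.bert = BertModel.from_pretrained("bert-base-uncased")
            self.lstm = nn.LSTM(self.bert.config.hidden_size, hidden, batch_first=True)
            self.head = nn.Linear(hidden, n_classes)

        def forward(self, input_ids, attention_mask):
            # Contextual token embeddings from BERT, then sequence modeling.
            emb = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
            _, (h_n, _) = self.lstm(emb)
            return self.head(h_n[-1])  # logits: positive vs. negative

    tok = BertTokenizer.from_pretrained("bert-base-uncased")
    batch = tok(["a gripping, beautifully shot film"], return_tensors="pt")
    logits = RnnBertClassifier()(batch["input_ids"], batch["attention_mask"])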

18 pages, 1519 KiB  
Article
An Investigation of Applying Large Language Models to Spoken Language Learning
by Yingming Gao, Baorian Nuchged, Ya Li and Linkai Peng
Appl. Sci. 2024, 14(1), 224; https://doi.org/10.3390/app14010224 - 26 Dec 2023
Cited by 6 | Viewed by 5272
Abstract
People have long desired intelligent conversational systems that can provide assistance in practical scenarios. The latest advancements in large language models (LLMs) are making significant strides toward turning this aspiration into a tangible reality. LLMs are believed to hold the most potential and value in education, especially in the creation of AI-driven virtual teachers that facilitate language learning. This study assesses the effectiveness of LLMs in the educational domain, specifically in spoken language learning, encompassing phonetics, phonology, and second language acquisition. To this end, we first introduce a new multiple-choice question dataset to evaluate the effectiveness of LLMs in these scenarios, including the understanding and application of spoken language knowledge. Moreover, we investigate the influence of various prompting techniques, such as zero- and few-shot methods (prepending the question with question-answer exemplars), chain-of-thought (CoT) prompting, in-domain exemplars, and external tools. We conducted a comprehensive evaluation of 20 popular LLMs using these methods. The experimental results showed that extracting conceptual knowledge posed few challenges for these LLMs, whereas application questions proved relatively difficult. In addition, some widely proven prompting methods, combined with domain-specific examples, yielded significant performance improvements over the zero-shot baselines. Further preliminary experiments also demonstrated the strengths and weaknesses of different LLMs. The findings of this study can shed light on the application of LLMs to spoken language learning. Full article
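
The prompting variants the study compares differ mainly in how the input string is assembled. A schematic sketch with invented question text (not items from the authors' dataset):

    QUESTION = ("Which sound class is typically devoiced word-finally in German? "
                "(a) vowels (b) obstruents (c) nasals (d) glides")

    EXEMPLAR = ("Q: Which articulator forms bilabial stops? (a) tongue (b) lips "
                "(c) velum (d) glottis\n"
                "A: Let's think step by step. Bilabial means both lips, "
                "so the answer is (b).")

    zero_shot = f"Q: {QUESTION}\nA:"
    few_shot = f"{EXEMPLAR}\n\nQ: {QUESTION}\nA:"          # prepend QA exemplars
    cot = f"Q: {QUESTION}\nA: Let's think step by step."    # chain-of-thought cue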

13 pages, 829 KiB  
Article
Boosting Lightweight Sentence Embeddings with Knowledge Transfer from Advanced Models: A Model-Agnostic Approach
by Kadir Gunel and Mehmet Fatih Amasyali
Appl. Sci. 2023, 13(23), 12586; https://doi.org/10.3390/app132312586 - 22 Nov 2023
Viewed by 1510
Abstract
In this study, we investigate knowledge transfer between two distinct sentence embedding models: a computationally demanding, highly performant model and a lightweight model derived from word vector averaging. Our objective is to augment the representational power of the lightweight model by exploiting the sophisticated features of the robust model. Diverging from traditional knowledge distillation methods that align the logits or hidden states of teacher and student models, our approach uses only the output sentence vectors of the teacher model for alignment with the student model’s word vector representations. We implement two minimization techniques for this purpose: distance minimization, and joint distance and perplexity minimization. Our methodology uses WMT datasets for training, and the enhanced embeddings are validated via Google’s Analogy tasks and Meta’s SentEval datasets. We found that our proposed models intriguingly retained and conveyed information in a model-specific fashion. Full article
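
The distance-minimization variant reduces to regressing the student's averaged word vectors onto precomputed teacher sentence vectors. A minimal sketch in PyTorch, omitting the perplexity term and using random stand-ins for the teacher vectors and token batches:

    import torch
    import torch.nn as nn

    class AveragingStudent(nn.Module):
        # Lightweight student: a sentence vector is the mean of its word vectors.
        def __init__(self, vocab_size=30000, dim=768):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, dim)

        def forward(self, token_ids):
            return self.emb(token_ids).mean(dim=1)

    student = AveragingStudent()
    opt = torch.optim.Adam(student.parameters(), lr=1e-4)

    token_ids = torch.randint(0, 30000, (32, 20))  # stand-in sentence batch
    teacher_vecs = torch.randn(32, 768)            # precomputed teacher outputs

    loss = nn.functional.mse_loss(student(token_ids), teacher_vecs)  # distance term
    opt.zero_grad(); loss.backward(); opt.step()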

25 pages, 1513 KiB  
Article
Hierarchical Clause Annotation: Building a Clause-Level Corpus for Semantic Parsing with Complex Sentences
by Yunlong Fan, Bin Li, Yikemaiti Sataer, Miao Gao, Chuanqi Shi, Siyi Cao and Zhiqiang Gao
Appl. Sci. 2023, 13(16), 9412; https://doi.org/10.3390/app13169412 - 19 Aug 2023
Cited by 4 | Viewed by 2098
Abstract
Most natural-language-processing (NLP) tasks, such as semantic parsing, syntactic parsing, machine translation, and text summarization, suffer performance degradation when encountering long complex sentences. Previous works have addressed the issue with the intuition of decomposing complex sentences and linking simple ones, as in rhetorical-structure-theory (RST)-style discourse parsing, split-and-rephrase (SPRP), text simplification (TS), simple sentence decomposition (SSD), etc. However, these works are not applicable to semantic parsing, such as abstract meaning representation (AMR) parsing and semantic dependency parsing, because they misalign with semantic relations and cannot preserve the original semantics. Following the same intuition while avoiding these deficiencies, we propose a novel framework, hierarchical clause annotation (HCA), for capturing the clausal structure of complex sentences, based on linguistic research into clause hierarchy. With the HCA framework, we annotated a large HCA corpus to explore the potential of integrating HCA structural features into semantic parsing of complex sentences. Moreover, we decomposed HCA into two subtasks, i.e., clause segmentation and clause parsing, and provide neural baseline models for producing more silver annotations. Evaluating the proposed models on our manually annotated HCA dataset, clause segmentation and parsing achieved a 91.3% F1-score and an 88.5% Parseval score, respectively. Since the same model architectures were employed, the performance differences between the clause/discourse segmentation and parsing subtasks reflect differences between our HCA corpus and the compared discourse corpora, where our sentences contain more segment units and fewer interrelations than those in the compared corpora. Full article
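
Clause segmentation, the first subtask, can be framed as token-level boundary tagging. A minimal sketch of one plausible neural baseline (a BiLSTM tagger, not the authors' exact architecture), with random token IDs standing in for real data:

    import torch
    import torch.nn as nn

    class ClauseSegmenter(nn.Module):
        # Tag each token: 1 = opens a new clause, 0 = continuation.
        def __init__(self, vocab_size=30000, dim=128, hidden=128):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, dim)
            self.rnn = nn.LSTM(dim, hidden, batch_first=True, bidirectional=True)
            self.out = nn.Linear(2 * hidden, 2)

        def forward(self, token_ids):
            h, _ = self.rnn(self.emb(token_ids))
            return self.out(h)  # per-token boundary logits

    token_ids = torch.randint(0, 30000, (1, 12))  # stand-in sentence
    boundaries = ClauseSegmenter()(token_ids).argmax(-1)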

18 pages, 2814 KiB  
Article
A Multitask Cross-Lingual Summary Method Based on ABO Mechanism
by Qing Li, Weibing Wan and Yuming Zhao
Appl. Sci. 2023, 13(11), 6723; https://doi.org/10.3390/app13116723 - 31 May 2023
Cited by 1 | Viewed by 1356
Abstract
Recent cross-lingual summarization research has pursued unified end-to-end models, which have demonstrated a certain level of improvement in performance and effectiveness, but this approach stitches together multiple tasks and makes computation more complex. Less work has focused on alignment relationships across languages, which has led to persistent problems of summary misordering and loss of key information. For this reason, we first simplify the multitasking by converting the translation task into an equal proportion of cross-lingual summary tasks, so that the model performs only cross-lingual summarization when generating cross-lingual summaries. In addition, we splice monolingual and cross-lingual summary sequences into a single input so that the model can fully learn the core content of the corpus. We then propose a model-based reinforced regularization method to improve robustness, and build a targeted ABO mechanism to enhance the semantic-relationship alignment and key-information retention of the cross-lingual summaries. Ablation experiments on three datasets of different orders of magnitude demonstrate the effective enhancement brought by the optimization approach; the resulting model outperforms mainstream approaches on both the cross-lingual and monolingual summarization tasks for the full dataset. Finally, we validate the model’s capabilities on a cross-lingual summary dataset from a professional domain, and the results demonstrate its superior performance and ability to improve cross-lingual ordering. Full article
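
The splicing step pairs each training instance's monolingual and cross-lingual summaries in one sequence. A schematic sketch; the segment tags and example data are invented, since the abstract does not specify the exact format, and the ABO mechanism itself is not reproduced here:

    # Hypothetical segment tags marking the two spliced summaries.
    def splice(mono_summary, cross_summary):
        # One spliced sequence carrying both summaries during training.
        return f"<mono> {mono_summary} <cross> {cross_summary}"

    print(splice(
        "研究人员提出新的跨语言摘要方法。",
        "Researchers propose a new cross-lingual summarization method.",
    ))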