Search Results (22)

Search Parameters:
Keywords = fastText and BERT word embeddings

21 pages, 588 KB  
Article
Research on an MOOC Recommendation Method Based on the Fusion of Behavioral Sequences and Textual Semantics
by Wenxin Zhao, Lei Zhao and Zhenbin Liu
Appl. Sci. 2025, 15(18), 10024; https://doi.org/10.3390/app151810024 - 13 Sep 2025
Viewed by 383
Abstract
To address the challenges of user behavior sparsity and insufficient utilization of course semantics on MOOC platforms, this paper proposes a personalized recommendation method that integrates user behavioral sequences with course textual semantic features. First, shallow word-level features from course titles are extracted using FastText, and deep contextual semantic representations from course descriptions are obtained via a fine-tuned BERT model. The two sets of semantic features are concatenated to form a multi-level semantic representation of course content. Next, the fused semantic features are mapped into the same vector space as course ID embeddings through a linear projection layer and combined with the original course ID embeddings via an additive fusion strategy, enhancing the model’s semantic perception of course content. Finally, the fused features are fed into an improved SASRec model, where a multi-head self-attention mechanism is employed to model the evolution of user interests, enabling collaborative recommendations across behavioral and semantic modalities. Experiments conducted on the MOOCCubeX dataset (1.26 million users, 632 courses) demonstrated that the proposed method achieved NDCG@10 and HR@10 scores of 0.524 and 0.818, respectively, outperforming SASRec and semantic single-modality baselines. This study offers an efficient yet semantically rich recommendation solution for MOOC scenarios.
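
A minimal sketch of the fusion step described above, assuming illustrative dimensions (300-d FastText title vectors, 768-d BERT description vectors, 64-d course-ID embeddings); this is not the authors' released code:

```python
# Concatenate FastText and BERT features, project them into the
# course-ID embedding space, and fuse additively with the ID embedding.
import torch
import torch.nn as nn

FASTTEXT_DIM, BERT_DIM, ID_DIM, NUM_COURSES = 300, 768, 64, 632  # assumed dims

class SemanticFusion(nn.Module):
    def __init__(self):
        super().__init__()
        self.id_embedding = nn.Embedding(NUM_COURSES, ID_DIM)
        # Linear projection maps the concatenated semantics into the ID space.
        self.projection = nn.Linear(FASTTEXT_DIM + BERT_DIM, ID_DIM)

    def forward(self, course_ids, fasttext_vec, bert_vec):
        semantic = torch.cat([fasttext_vec, bert_vec], dim=-1)
        # Additive fusion with the original course-ID embedding.
        return self.id_embedding(course_ids) + self.projection(semantic)

fusion = SemanticFusion()
out = fusion(torch.tensor([3, 17]),
             torch.randn(2, FASTTEXT_DIM), torch.randn(2, BERT_DIM))
print(out.shape)  # torch.Size([2, 64]) -- ready for a SASRec-style encoder
```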

23 pages, 3847 KB  
Article
Optimizing Sentiment Analysis in Multilingual Balanced Datasets: A New Comparative Approach to Enhancing Feature Extraction Performance with ML and DL Classifiers
by Hamza Jakha, Souad El Houssaini, Mohammed-Alamine El Houssaini, Souad Ajjaj and Abdelali Hadir
Appl. Syst. Innov. 2025, 8(4), 104; https://doi.org/10.3390/asi8040104 - 28 Jul 2025
Viewed by 3187
Abstract
Social network platforms have a significant impact on the development of companies by influencing clients’ behaviors and sentiments, which directly affect corporate reputations. Analyzing this feedback has become an essential component of business intelligence, supporting the improvement of long-term marketing strategies on a larger scale. The implementation of powerful sentiment analysis models requires a comprehensive and in-depth examination of each stage of the process. In this study, we present a new comparative approach for several feature extraction techniques, including TF-IDF, Word2Vec, FastText, and BERT embeddings. These methods are applied to three multilingual datasets collected from hotel review platforms in the tourism sector, in English, French, and Arabic. These datasets were preprocessed through cleaning, normalization, labeling, and balancing before being used to train various machine learning and deep learning algorithms. The effectiveness of each feature extraction method was evaluated using metrics such as accuracy, F1-score, precision, recall, the ROC AUC curve, and a new metric that measures the execution time for generating word representations. Our extensive experiments achieved accuracy rates of approximately 99% for the English dataset, 94% for the Arabic dataset, and 89% for the French dataset. These findings confirm the strong impact of vectorization techniques on the performance of sentiment analysis models. They also highlight the important relationship between balanced datasets, effective feature extraction methods, and the choice of classification algorithms. This study thus aims to simplify the selection of feature extraction methods and appropriate classifiers for each language, thereby contributing to advancements in sentiment analysis.
(This article belongs to the Topic Social Sciences and Intelligence Management, 2nd Volume)
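
The execution-time metric for generating word representations is simple to instrument. Below is a minimal sketch of such a timing harness, with TF-IDF and Word2Vec standing in for the four compared techniques and a toy corpus in place of the hotel-review datasets:

```python
# Time how long each feature extraction technique takes to produce
# representations; the corpus and hyperparameters are illustrative only.
import time
from sklearn.feature_extraction.text import TfidfVectorizer
from gensim.models import Word2Vec

reviews = ["the hotel room was clean and quiet",
           "terrible service and a noisy location",
           "great breakfast friendly staff will return"]

def timed(name, fn):
    start = time.perf_counter()
    result = fn()
    print(f"{name}: {time.perf_counter() - start:.4f}s")
    return result

tfidf = timed("TF-IDF", lambda: TfidfVectorizer().fit_transform(reviews))
w2v = timed("Word2Vec", lambda: Word2Vec([r.split() for r in reviews],
                                         vector_size=50, min_count=1, epochs=20))
print(tfidf.shape, w2v.wv["hotel"].shape)
```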

15 pages, 5650 KB  
Article
Enhancing Interprofessional Communication in Healthcare Using Large Language Models: Study on Similarity Measurement Methods with Weighted Noun Embeddings
by Ji-Young Yeo, Sungkwan Youm and Kwang-Seong Shin
Electronics 2025, 14(11), 2240; https://doi.org/10.3390/electronics14112240 - 30 May 2025
Viewed by 532
Abstract
Large language models (LLMs) are increasingly applied to specialized domains like medical education, necessitating tailored approaches to evaluate structured responses such as SBAR (Situation, Background, Assessment, Recommendation). This study developed an evaluation tool for nursing student responses using LLMs, focusing on word-based learning and assessment methods to align automated scoring with expert evaluations. We propose a three-stage biasing approach: (1) integrating reference answers into the training corpus; (2) incorporating high-scoring student responses; (3) applying domain-critical token weighting through Weighted Noun Embeddings to enhance similarity measurements. By assigning higher weights to critical medical nouns and lower weights to less relevant terms, the embeddings prioritize domain-specific terminology. Employing Word2Vec and FastText models trained on general conversation, medical, and reference answer corpora alongside Sentence-BERT for comparison, our results demonstrate that biasing with reference answers, high-scoring responses, and weighted embeddings improves alignment with human evaluations. Word-based models, particularly after biasing, effectively distinguish high-performing responses from lower ones, as evidenced by increased cosine similarity differences. These findings validate that the proposed methodology enhances the precision and objectivity of evaluating descriptive answers, offering a practical solution for educational settings where fairness and consistency are paramount.
(This article belongs to the Special Issue Deep Learning Approaches for Natural Language Processing)
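
A minimal sketch of the weighted pooling behind Weighted Noun Embeddings: domain-critical medical nouns receive larger weights when word vectors are averaged, so cosine similarity against the reference answer is driven by key terminology. The toy vocabulary, 4-dimensional vectors, and weight values are assumptions:

```python
# Pool word vectors with per-token weights, boosting critical nouns.
import numpy as np

word_vectors = {"patient": np.array([0.9, 0.1, 0.0, 0.2]),
                "oxygen":  np.array([0.1, 0.8, 0.1, 0.0]),
                "is":      np.array([0.2, 0.2, 0.2, 0.2]),
                "low":     np.array([0.0, 0.3, 0.7, 0.1])}
noun_weights = {"patient": 2.0, "oxygen": 2.0}  # domain-critical nouns

def weighted_embedding(tokens, default_weight=0.5):
    weights = [noun_weights.get(t, default_weight) for t in tokens]
    return np.average([word_vectors[t] for t in tokens], axis=0, weights=weights)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

reference = weighted_embedding(["patient", "oxygen", "is", "low"])
student = weighted_embedding(["oxygen", "low", "patient"])
print(f"similarity: {cosine(reference, student):.3f}")
```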

28 pages, 1007 KB  
Article
Predicting the Event Types in the Human Brain: A Modeling Study Based on Embedding Vectors and Large-Scale Situation Type Datasets in Mandarin Chinese
by Xiaorui Ma and Hongchao Liu
Appl. Sci. 2025, 15(11), 5916; https://doi.org/10.3390/app15115916 - 24 May 2025
Viewed by 615
Abstract
Event types classify Chinese verbs based on the internal temporal structure of events. The categorization of verb event types is the most fundamental classification of concept types represented by verbs in the human brain. Meanwhile, event types exhibit strong predictive capabilities for exploring collocational patterns between words, making them crucial for Chinese teaching. This work focuses on constructing a statistically validated gold-standard dataset, forming the foundation for achieving high accuracy in recognizing verb event types. Utilizing a manually annotated dataset of verbs and aspectual markers’ co-occurrence features, the research conducts hierarchical clustering of Chinese verbs. The resulting dendrogram indicates that verbs can be categorized into three event types—state, activity and transition—based on semantic distance. Two approaches are employed to construct vector matrices: a supervised method that derives word vectors based on linguistic features, and an unsupervised method that uses four models to extract embedding vectors, including Word2Vec, FastText, BERT and ChatGPT. The classification of verb event types is performed using three classifiers: multinomial logistic regression, support vector machines and artificial neural networks. Experimental results demonstrate the superior performance of embedding vectors. Employing the pre-trained FastText model in conjunction with an artificial neural network classifier, the model achieves an accuracy of 98.37% in predicting 3133 verbs, thereby enabling the automatic identification of event types at the level of Chinese verbs and validating the high accuracy and practical value of embedding vectors in addressing complex semantic relationships and classification tasks. This work constructs datasets of considerable semantic complexity, comprising a substantial volume of verbs along with their feature vectors and situation type labels, which can be used for evaluating large language models in the future.
(This article belongs to the Special Issue Application of Artificial Intelligence and Semantic Mining Technology)
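
The best-performing configuration reported (pre-trained embedding vectors fed to an artificial neural network classifier) reduces to a familiar pattern, sketched below with random vectors in place of real FastText embeddings and scikit-learn's MLPClassifier as the network:

```python
# Classify verb embedding vectors into three event types.
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(300, 300))    # stand-in for FastText verb vectors
y = rng.integers(0, 3, size=300)   # 0 = state, 1 = activity, 2 = transition

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = MLPClassifier(hidden_layer_sizes=(128,), max_iter=500,
                    random_state=0).fit(X_tr, y_tr)
print(f"accuracy: {clf.score(X_te, y_te):.2f}")  # ~chance on random toy data
```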

14 pages, 223 KB  
Proceeding Paper
Handling Semantic Relationships for Classification of Sparse Text: A Review
by Safuan and Ku Ruhana Ku-Mahamud
Eng. Proc. 2025, 84(1), 61; https://doi.org/10.3390/engproc2025084061 - 17 Feb 2025
Viewed by 1003
Abstract
The classification of sparse text, common in short or specialized content, is challenging for natural language processing. These challenges stem from high-dimensional data and scarce relevant features because sparse text can result from noisy, short, or contextually limited inputs. This paper reviews approaches for handling semantic relationships in sparse text classification. Approaches like FastText and Latent Dirichlet Allocation are discussed for addressing feature sparsity while maintaining semantic integrity. Embedding techniques, such as Word2Vec and BERT, are crucial for capturing contextual meanings and improving accuracy. Recent advances include hybrid models that combine deep learning and traditional methods for better performance. These approaches work across various datasets, including social media and scientific publications. Finally, progress in using semantic relationships for sparse text classification is reviewed, and open challenges and future research directions are identified to better integrate semantic understanding in sparse text classification.
26 pages, 2692 KB  
Article
Automated Research Review Support Using Machine Learning, Large Language Models, and Natural Language Processing
by Vishnu S. Pendyala, Karnavee Kamdar and Kapil Mulchandani
Electronics 2025, 14(2), 256; https://doi.org/10.3390/electronics14020256 - 9 Jan 2025
Cited by 4 | Viewed by 3875
Abstract
Research expands the boundaries of a subject, economy, and civilization. Peer review is at the heart of research and is understandably an expensive process. This work, with a human in the loop, aims to support the research community in multiple ways: it predicts quality and acceptance, and recommends reviewers. It helps authors and editors to evaluate research work using machine learning models developed on a dataset comprising 18,000+ research papers, some of which are from highly acclaimed, top conferences in Artificial Intelligence such as NeurIPS and ICLR, along with their reviews, aspect scores, and accept/reject decisions. A comprehensive system is built using machine learning algorithms such as Support Vector Machines; deep learning recurrent neural network architectures such as LSTM; a wide variety of pre-trained word vectors from Word2Vec, GloVe, and FastText; the transformer-based BERT and DistilBERT; Google’s Large Language Model (LLM) PaLM 2; and a TF-IDF vectorizer. For the system to be readily usable and to facilitate future enhancements, a frontend, a Flask server in the cloud, and a NoSQL database at the backend are implemented, making it a complete system. The work is novel in using a unique blend of tools and techniques to address most aspects of building a system to support the peer review process. The experiments result in an 86% test accuracy on acceptance prediction using DistilBERT. Results from other models are comparable, with PaLM-based LLM embeddings achieving 84% accuracy.
(This article belongs to the Special Issue Data-Centric Artificial Intelligence: New Methods for Data Processing)
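
One piece of such a system, turning paper text into DistilBERT features for an accept/reject classifier, can be outlined as follows. The checkpoint, mean pooling, and toy labels are assumptions; the authors' dataset and training setup are not reproduced:

```python
# Extract DistilBERT sentence features and fit a simple classifier.
import torch
from sklearn.linear_model import LogisticRegression
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("distilbert-base-uncased")
model = AutoModel.from_pretrained("distilbert-base-uncased")

abstracts = ["We propose a novel attention mechanism ...",
             "This paper trains a small CNN on MNIST ..."]
labels = [1, 0]  # 1 = accept, 0 = reject (toy labels)

with torch.no_grad():
    batch = tokenizer(abstracts, padding=True, truncation=True,
                      return_tensors="pt")
    # Mean-pool the final hidden states into one vector per abstract.
    features = model(**batch).last_hidden_state.mean(dim=1).numpy()

clf = LogisticRegression().fit(features, labels)
print(clf.predict(features))
```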

20 pages, 1728 KB  
Article
Sentence Embedding Generation Framework Based on Kullback–Leibler Divergence Optimization and RoBERTa Knowledge Distillation
by Jin Han and Liang Yang
Mathematics 2024, 12(24), 3990; https://doi.org/10.3390/math12243990 - 18 Dec 2024
Viewed by 2304
Abstract
In natural language processing (NLP) tasks, computing semantic textual similarity (STS) is crucial for capturing nuanced semantic differences in text. Traditional word vector methods, such as Word2Vec and GloVe, as well as deep learning models like BERT, face limitations in handling context dependency and polysemy and present challenges in computational resources and real-time processing. To address these issues, this paper introduces two novel methods. First, a sentence embedding generation method based on Kullback–Leibler Divergence (KLD) optimization is proposed, which enhances semantic differentiation between sentence vectors, thereby improving the accuracy of textual similarity computation. Second, this study proposes a framework incorporating RoBERTa knowledge distillation, which integrates the deep semantic insights of the RoBERTa model with prior methodologies to enhance sentence embeddings while preserving computational efficiency. Additionally, the study extends its contributions to sentiment analysis tasks by leveraging the enhanced embeddings for classification. The sentiment analysis experiments, conducted using a Stochastic Gradient Descent (SGD) classifier on the ACL IMDB dataset, demonstrate the effectiveness of the proposed methods, achieving high precision, recall, and F1 score metrics. To further augment model accuracy and efficacy, a feature selection approach is introduced, specifically through the Dynamic Principal Component Selection (DPCS) algorithm. The DPCS method autonomously identifies and prioritizes critical features, thus enriching the expressive capacity of sentence vectors and significantly advancing the accuracy of similarity computations. Experimental results demonstrate that our method outperforms existing methods in semantic similarity computation on the SemEval-2016 dataset. When evaluated using cosine similarity of average vectors, our model achieved a Pearson correlation coefficient (τ) of 0.470, a Spearman correlation coefficient (ρ) of 0.481, and a mean absolute error (MAE) of 2.100. Compared to traditional methods such as Word2Vec, GloVe, and FastText, our method significantly enhances similarity computation accuracy. Using TF-IDF-weighted cosine similarity evaluation, our model achieved a τ of 0.528, a ρ of 0.518, and an MAE of 1.343. Additionally, in the cosine similarity assessment leveraging the DPCS algorithm, our model achieved a τ of 0.530, a ρ of 0.518, and an MAE of 1.320, further demonstrating the method’s effectiveness and precision in handling semantic similarity. These results indicate that our proposed method achieves high correlation and low error in semantic textual similarity tasks, thereby better capturing subtle semantic differences between texts.
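
The evaluation protocol named here (cosine similarity of averaged sentence vectors scored against gold ratings via Pearson, Spearman, and MAE) can be sketched as below; the vectors and gold scores are synthetic toys that exist only to exercise the metrics:

```python
# Score sentence pairs by cosine similarity of averaged vectors, then
# evaluate against (synthetic) gold ratings with Pearson/Spearman/MAE.
import numpy as np
from scipy.stats import pearsonr, spearmanr

rng = np.random.default_rng(1)
n_pairs = 40
left = rng.normal(size=(n_pairs, 100))                     # averaged vectors
right = left + rng.normal(scale=0.5, size=(n_pairs, 100))  # noisy partners

cos = np.sum(left * right, axis=1) / (
    np.linalg.norm(left, axis=1) * np.linalg.norm(right, axis=1))
pred = 5 * (cos + 1) / 2                                   # rescale to 0-5
gold = np.clip(pred + rng.normal(scale=0.3, size=n_pairs), 0, 5)  # toy gold

print(f"Pearson:  {pearsonr(gold, pred)[0]:.3f}")
print(f"Spearman: {spearmanr(gold, pred)[0]:.3f}")
print(f"MAE:      {np.mean(np.abs(gold - pred)):.3f}")
```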

19 pages, 1401 KB  
Article
Enhancing Arabic Sentiment Analysis of Consumer Reviews: Machine Learning and Deep Learning Methods Based on NLP
by Hani Almaqtari, Feng Zeng and Ammar Mohammed
Algorithms 2024, 17(11), 495; https://doi.org/10.3390/a17110495 - 3 Nov 2024
Cited by 2 | Viewed by 2534
Abstract
Sentiment analysis utilizes Natural Language Processing (NLP) techniques to extract opinions from text, which is critical for businesses looking to refine strategies and better understand customer feedback. Understanding people’s sentiments about products through emotional tone analysis is paramount. However, analyzing sentiment in Arabic and its dialects poses challenges due to the language’s intricate morphology, right-to-left script, and nuanced emotional expressions. To address these challenges, this study introduces the Arb-MCNN-Bi model, which integrates the strengths of the transformer-based AraBERT (Arabic Bidirectional Encoder Representations from Transformers) model with a Multi-channel Convolutional Neural Network (MCNN) and a Bidirectional Gated Recurrent Unit (BiGRU) for Arabic sentiment analysis. AraBERT, designed specifically for Arabic, captures rich contextual information through word embeddings. These embeddings are processed by the MCNN to enhance feature extraction and by the BiGRU to retain long-term dependencies. The final output is obtained through feedforward neural networks. The study compares the proposed model with various machine learning and deep learning methods, applying advanced NLP techniques such as Term Frequency-Inverse Document Frequency (TF-IDF), n-gram, Word2Vec (Skip-gram), and fastText (Skip-gram). Experiments are conducted on three Arabic datasets: the Arabic Customer Reviews Dataset (ACRD), Large-scale Arabic Book Reviews (LABR), and the Hotel Arabic Reviews dataset (HARD). The Arb-MCNN-Bi model with AraBERT achieved accuracies of 96.92%, 96.68%, and 92.93% on the ACRD, HARD, and LABR datasets, respectively. These results demonstrate the model’s effectiveness in analyzing Arabic text data and outperforming traditional approaches.

23 pages, 410 KB  
Article
Towards AI-Generated Essay Classification Using Numerical Text Representation
by Natalia Krawczyk, Barbara Probierz and Jan Kozak
Appl. Sci. 2024, 14(21), 9795; https://doi.org/10.3390/app14219795 - 26 Oct 2024
Cited by 2 | Viewed by 1986
Abstract
Distinguishing essays written by AI from those authored by students is an increasingly significant issue in educational settings. This research examines various numerical text representation techniques to improve the classification of these essays. Utilizing a diverse dataset, we undertook several preprocessing steps, including data cleaning, tokenization, and lemmatization. Our system analyzes different text representation methods, such as Bag of Words, TF-IDF, and fastText embeddings, in conjunction with multiple classifiers. Our experiments showed that TF-IDF weights paired with logistic regression reached the highest accuracy of 99.82%. Methods like Bag of Words, TF-IDF, and fastText embeddings achieved accuracies exceeding 96.50% across all tested classifiers. Sentence embeddings, including MiniLM and distilBERT, yielded accuracies from 93.78% to 96.63%, indicating room for further refinement. Conversely, pre-trained fastText embeddings showed reduced performance, with a low of 89.88% using logistic regression. Remarkably, the XGBoost classifier delivered the highest minimum accuracy of 96.24%. Specificity and precision were above 99% for most methods, showcasing high capability in differentiating between student-created and AI-generated texts. This study underscores the vital role of choosing dataset-specific text representations to boost classification accuracy.
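
The strongest configuration reported, TF-IDF weights paired with logistic regression, corresponds to a very short scikit-learn pipeline; the toy essays and labels below are placeholders for the actual AI-vs-student dataset:

```python
# TF-IDF features + logistic regression for AI-vs-student classification.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

essays = ["In conclusion, the aforementioned factors clearly demonstrate ...",
          "i think school lunches should be better because ..."]
labels = [1, 0]  # 1 = AI-generated, 0 = student-written (toy labels)

clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(essays, labels)
print(clf.predict(["the evidence presented above demonstrates ..."]))
```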

21 pages, 4107 KB  
Article
Sentiment Analysis: Predicting Product Reviews for E-Commerce Recommendations Using Deep Learning and Transformers
by Oumaima Bellar, Amine Baina and Mostafa Ballafkih
Mathematics 2024, 12(15), 2403; https://doi.org/10.3390/math12152403 - 2 Aug 2024
Cited by 11 | Viewed by 12255
Abstract
The abundance of publicly available data on the internet within the e-marketing domain is consistently expanding. A significant portion of this data revolves around consumers’ perceptions and opinions regarding the goods or services of organizations, making it valuable for market intelligence collectors in marketing, customer relationship management, and customer retention. Sentiment analysis serves as a tool for examining customer sentiment, marketing initiatives, and product appraisals. This valuable information can inform decisions related to future product and service development, marketing campaigns, and customer service enhancements. In social media, predicting ratings is commonly employed to anticipate product ratings based on user reviews. Our study provides an extensive benchmark comparison of different deep learning models, including convolutional neural networks (CNN), recurrent neural networks (RNN), and bi-directional long short-term memory (Bi-LSTM). These models are evaluated using various word embedding techniques, such as bi-directional encoder representations from transformers (BERT) and its derivatives, FastText, and Word2Vec. The evaluation encompasses two setups: 5-class versus 3-class. This paper focuses on sentiment analysis using neural network-based models for consumer sentiment prediction by evaluating and contrasting their performance indicators on a dataset of reviews of different products from customers of an online women’s clothes retailer.

22 pages, 3560 KB  
Article
An Efficient Deep Learning for Thai Sentiment Analysis
by Nattawat Khamphakdee and Pusadee Seresangtakul
Data 2023, 8(5), 90; https://doi.org/10.3390/data8050090 - 13 May 2023
Cited by 22 | Viewed by 5931
Abstract
The number of reviews from customers on travel websites and platforms is quickly increasing. These platforms let people write reviews about their experience with respect to service quality, location, rooms, and cleanliness, thereby helping others before booking hotels. Many people are unable to take these reviews into account when booking because they take a long time to read, and many are in a non-native language. Thus, hotel businesses need an efficient process to analyze and categorize the polarity of reviews as positive, negative, or neutral. In particular, low-resource languages such as Thai have greater limitations in terms of resources for classifying sentiment polarity. In this paper, a sentiment analysis method is proposed for Thai sentiment classification in the hotel domain. Firstly, the Word2Vec technique (the continuous bag-of-words (CBOW) and skip-gram approaches) was applied to create word embeddings of different vector dimensions. Secondly, each word embedding model was combined with deep learning (DL) models to observe the impact of each word vector dimension. We compared the performance of nine DL models (CNN, LSTM, Bi-LSTM, GRU, Bi-GRU, CNN-LSTM, CNN-BiLSTM, CNN-GRU, and CNN-BiGRU) with different numbers of layers to evaluate their performance in polarity classification. The dataset was also classified using the pre-trained FastText and BERT models for comparison. Finally, our experimental results show that the WangchanBERTa model slightly improved the accuracy, producing a value of 0.9225, and the skip-gram and CNN model combination outperformed the other DL models, reaching an accuracy of 0.9170. From the experiments, we found that the word vector dimensions, hyperparameter values, and the number of layers of the DL models affected the performance of sentiment classification. Our research provides guidance for setting suitable hyperparameter values to improve the accuracy of sentiment classification for the Thai language in the hotel domain.
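
The first stage described, training CBOW and skip-gram Word2Vec embeddings at several vector dimensions, looks roughly like the following gensim sketch; the two word-segmented Thai reviews are placeholder data:

```python
# Train skip-gram (sg=1) and CBOW (sg=0) embeddings at two dimensions.
from gensim.models import Word2Vec

reviews = [["โรงแรม", "สะอาด", "มาก"],        # "the hotel is very clean"
           ["ห้อง", "เล็ก", "แต่", "สะอาด"]]   # "the room is small but clean"

for dim in (100, 300):
    skipgram = Word2Vec(reviews, vector_size=dim, sg=1, min_count=1)
    cbow = Word2Vec(reviews, vector_size=dim, sg=0, min_count=1)
    print(dim, skipgram.wv["สะอาด"].shape, cbow.wv["สะอาด"].shape)
```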

23 pages, 32221 KB  
Article
Learned Text Representation for Amharic Information Retrieval and Natural Language Processing
by Tilahun Yeshambel, Josiane Mothe and Yaregal Assabie
Information 2023, 14(3), 195; https://doi.org/10.3390/info14030195 - 20 Mar 2023
Cited by 13 | Viewed by 6362
Abstract
Over the past few years, word embeddings and bidirectional encoder representations from transformers (BERT) models have brought better solutions to learning text representations for natural language processing (NLP) and other tasks. Many NLP applications rely on pre-trained text representations, leading to the development of a number of neural network language models for various languages. However, this is not the case for Amharic, which is known to be a morphologically complex and under-resourced language; usable pre-trained models for automatic Amharic text processing are not available. This paper presents an investigation into learned text representations for information retrieval and NLP tasks using word embeddings and BERT language models. We explored the most commonly used methods for word embeddings, including word2vec, GloVe, and fastText, as well as the BERT model. We investigated the performance of query expansion using word embeddings. We also analyzed the use of a pre-trained Amharic BERT model for masked language modeling, next sentence prediction, and text classification tasks. Amharic ad hoc information retrieval test collections that contain word-based, stem-based, and root-based text representations were used for evaluation. We conducted a detailed empirical analysis of the usability of word embeddings and BERT models on word-based, stem-based, and root-based corpora. Experimental results show that word-based query expansion and language modeling perform better than stem-based and root-based text representations, and that fastText outperforms the other word embeddings on the word-based corpus.
(This article belongs to the Special Issue Novel Methods and Applications in Natural Language Processing)
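
Embedding-based query expansion, as evaluated here, typically appends each query term's nearest neighbors in vector space. The sketch below uses a toy English vector store in place of embeddings trained on an Amharic corpus; the expansion size k is an illustrative choice:

```python
# Expand each query term with its k nearest neighbors in embedding space.
import numpy as np
from gensim.models import KeyedVectors

def expand_query(query_terms, wv, k=2):
    expanded = list(query_terms)
    for term in query_terms:
        if term in wv:
            expanded += [w for w, _ in wv.most_similar(term, topn=k)]
    return expanded

# Toy vector store standing in for word embeddings trained on a real corpus.
wv = KeyedVectors(vector_size=3)
wv.add_vectors(["election", "vote", "ballot", "rain"],
               np.array([[1.0, 0.1, 0.0], [0.9, 0.2, 0.1],
                         [0.8, 0.1, 0.2], [0.0, 1.0, 0.9]]))
print(expand_query(["election"], wv))  # ['election', 'vote', 'ballot']
```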

16 pages, 1438 KB  
Article
A Comparative Analysis of Word Embedding and Deep Learning for Arabic Sentiment Classification
by Sahar F. Sabbeh and Heba A. Fasihuddin
Electronics 2023, 12(6), 1425; https://doi.org/10.3390/electronics12061425 - 16 Mar 2023
Cited by 31 | Viewed by 7384
Abstract
Sentiment analysis on social media platforms (i.e., Twitter or Facebook) has become an important tool to learn about users’ opinions and preferences. However, the accuracy of sentiment analysis is limited by the challenges of natural language processing (NLP). Recently, deep learning models have shown superior performance over statistical and lexicon-based approaches in NLP-related tasks. Word embedding is an important layer of deep learning models used to generate input features. Many word embedding models have been presented for text representation, covering both classic and context-based word embeddings. In this paper, we present a comparative analysis to evaluate both classic and contextualized word embeddings for sentiment analysis. The four most frequently used word embedding techniques were used in their trained and pre-trained versions. The selected embeddings represent classical and contextualized techniques: classical word embedding includes algorithms such as GloVe, Word2vec, and FastText, while ARBERT is used as the contextualized embedding model. Since word embedding is most typically employed as the input layer in deep networks, we used the deep learning architectures BiLSTM and CNN for sentiment classification. To achieve these goals, the experiments were applied to a series of benchmark datasets: HARD, Khooli, AJGT, ArSAS, and ASTD. Finally, a comparative analysis was conducted on the results obtained from the experimented models. Our outcomes indicate that, in general, an embedding trained with a given technique achieves higher performance than the pre-trained version of the same technique, by around 0.28 to 1.8% in accuracy, 0.33 to 2.17% in precision, and 0.44 to 2% in recall. Moreover, the contextualized transformer-based embedding model achieved the highest performance in both its pre-trained and trained versions. Additionally, the results indicate that BiLSTM outperforms CNN by approximately 2% on three datasets (HARD, Khooli, and ArSAS), while CNN achieved around 2% higher performance on the smaller datasets (AJGT and ASTD).

17 pages, 4611 KB  
Article
The Application of Artificial Intelligence to Automate Sensory Assessments Combining Pretrained Transformers with Word Embedding Based on the Online Sensory Marketing Index
by Kevin Hamacher and Rüdiger Buchkremer
Computers 2022, 11(9), 129; https://doi.org/10.3390/computers11090129 - 26 Aug 2022
Cited by 2 | Viewed by 5024
Abstract
We present how artificial intelligence (AI)-based technologies create new opportunities to capture and assess sensory marketing elements. Based on the Online Sensory Marketing Index (OSMI), a sensory assessment framework designed to evaluate e-commerce websites manually, the goal is to offer an alternative procedure that assesses sensory elements such as text and images automatically. This approach aims to provide marketing managers with valuable insights and potential sensory marketing improvements. To accomplish the task, we initially reviewed 469 related peer-reviewed scientific publications; in this process, manual reading was complemented by a validated AI methodology. We identified relevant topics and checked whether they exhibit a comprehensible distribution over recent years. We recognize and discuss similar approaches from machine learning and the big data environment. For the principal analysis, we apply state-of-the-art methods from the natural language processing domain, such as the word embedding techniques GloVe and Word2Vec, and leverage transformers such as BERT. To validate the performance of our newly developed AI approach, we compare results with manually collected parameters from previous studies and observe similar findings in both procedures. Our results reveal a functional and scalable AI approach for determining the OSMI for industries, companies, or even individual (sub-)websites. In addition, the new AI selection and assessment procedures are extremely fast, with only a small loss in performance compared to manual evaluation, making this an efficient way to evaluate sensory marketing efforts.

19 pages, 1714 KB  
Article
Transfer Learning for Sentiment Analysis Using BERT Based Supervised Fine-Tuning
by Nusrat Jahan Prottasha, Abdullah As Sami, Md Kowsher, Saydul Akbar Murad, Anupam Kumar Bairagi, Mehedi Masud and Mohammed Baz
Sensors 2022, 22(11), 4157; https://doi.org/10.3390/s22114157 - 30 May 2022
Cited by 165 | Viewed by 22736
Abstract
The growth of the Internet has expanded the amount of data expressed by users across multiple platforms. The availability of these different worldviews and individuals’ emotions empowers sentiment analysis. However, sentiment analysis becomes even more challenging due to a scarcity of standardized labeled data in the Bangla NLP domain. The majority of existing Bangla research has relied on deep learning models that focus on context-independent word embeddings, such as Word2Vec, GloVe, and fastText, in which each word has a fixed representation irrespective of its context. Meanwhile, context-based pre-trained language models such as BERT have recently revolutionized the state of natural language processing. In this work, we utilized BERT’s transfer learning ability in a deep integrated CNN-BiLSTM model for enhanced decision-making performance in sentiment analysis. In addition, we applied transfer learning to classical machine learning algorithms for performance comparison with CNN-BiLSTM. We also explore various word embedding techniques, such as Word2Vec, GloVe, and fastText, and compare their performance to the BERT transfer learning strategy. As a result, we show state-of-the-art binary classification performance for Bangla sentiment analysis that significantly outperforms all other embeddings and algorithms tested.
(This article belongs to the Topic Artificial Intelligence in Sensors)
