Exploring Tourist Experience through Online Reviews Using Aspect-Based Sentiment Analysis with Zero-Shot Learning for Hospitality Service Enhancement

Nawawi, Ibrahim; Ilmawan, Kurnia Fahmy; Maarif, Muhammad Rifqi; Syafrudin, Muhammad

doi:10.3390/info15080499


¹ Department of Electrical, Mechatronics and Information Engineering, Tidar University, Magelang 56116, Indonesia

² Department of Tourism, Tidar University, Magelang 56116, Indonesia

³ Department of Mechanical and Industrial Engineering, Tidar University, Magelang 56116, Indonesia

⁴ Department of Artificial Intelligence and Data Science, Sejong University, Seoul 05006, Republic of Korea

* Authors to whom correspondence should be addressed.

Information 2024, 15(8), 499; https://doi.org/10.3390/info15080499

Submission received: 16 July 2024 / Revised: 16 August 2024 / Accepted: 18 August 2024 / Published: 20 August 2024

(This article belongs to the Special Issue Advances in Data and Network Sciences Applied to Computational Social Science)


Abstract:

Hospitality services play a crucial role in shaping tourist satisfaction and revisiting intention toward destinations. Traditional feedback methods like surveys often fail to capture the nuanced and real-time experiences of tourists. Digital platforms such as TripAdvisor, Yelp, and Google Reviews provide a rich source of user-generated content, but the sheer volume of reviews makes manual analysis impractical. This study proposes integrating aspect-based sentiment analysis with zero-shot learning to analyze online tourist reviews effectively without requiring extensive annotated datasets. Using pretrained models like RoBERTa, the research framework involves keyword extraction, sentence segment detection, aspect construction, and sentiment polarity measurement. The dataset, sourced from TripAdvisor reviews of attractions, hotels, and restaurants in Central Java, Indonesia, underwent preprocessing to ensure suitability for analysis. The results highlight the importance of aspects such as food, accommodation, and cultural experiences in tourist satisfaction. The findings indicate a need for continuous service improvement to meet evolving tourist expectations, demonstrating the potential of advanced natural language processing techniques in enhancing hospitality services and customer satisfaction.

Keywords:

natural language processing; large language model application; tourism data analytics; tourist feedback analysis; tourist preference understanding

1. Introduction

Hospitality services play a critical role in shaping the tourist experience, which, in turn, influences their satisfaction and intention to revisit the destinations. Positive experiences in hospitality services like accommodation, dining, and customer service can enhance a tourist’s impression, while negative ones can deter future visits. Hence, there is a clear need to regularly capture and evaluate tourist feedback about their preferences and perceptions of the hospitality services they receive. Traditional methods to acquire tourist feedback like surveys often fail to capture nuanced experiences due to predefined questions and limited responses, leading to low participation and less rich feedback. In contrast, digital platforms like TripAdvisor, Yelp, and Google Reviews allow tourists to share detailed experiences through online reviews, offering valuable insights for both travelers and service providers [1]. This user-generated content (UGC) helps travelers make informed decisions and provides businesses with feedback to identify strengths and areas for improvement, enabling them to tailor their services to better meet customer needs and enhance overall satisfaction [2].

With the increasing volume of online reviews, there is a growing need to analyze these data systematically to extract meaningful insights that can enhance service quality and overall customer satisfaction. Platforms like TripAdvisor generate an immense amount of user-generated content, making manual analysis impractical and inefficient. Hence, in recent years, there has been growing research using natural language processing (NLP) as a computational tool for analyzing textual review data. One of the most adopted NLP techniques in tourism is sentiment analysis, which has been widely applied in the tourism industry to gauge tourist satisfaction and preferences [3]. This method processes large volumes of textual data to determine the overall sentiment expressed by tourists through their reviews on online platforms or social media [4,5].

Despite its wide adoption and practicality, traditional sentiment analysis often misses the nuances and specific aspects of the tourist experience embedded within a single review, because it tends to treat an entire review as a monolithic entity [6]. For instance, a review might praise the cleanliness of a hotel room but criticize the quality of customer service. Regular sentiment analysis struggles to capture these distinct sentiments accurately, potentially leading to a misleading summary of the review’s overall tone. Furthermore, this approach does not provide detailed insights into specific areas that require improvement or are performing well, making it difficult for businesses to take targeted actions based on the feedback. A specialized form of sentiment analysis, aspect-based sentiment analysis (ABSA), addresses these limitations by dissecting reviews into specific components, allowing for a more nuanced understanding of tourist feedback [7].

By analyzing sentiments associated with distinct aspects, such as cleanliness, customer service, and amenities, ABSA provides detailed insights that can help businesses identify precise areas for improvement and strengths to leverage [8]. This granular approach facilitates more targeted and effective actions, enhancing overall service quality and tourist satisfaction. However, despite its benefits, ABSA faces challenges, particularly concerning the availability of suitable training data [9]. The effectiveness of ABSA largely depends on the quality and quantity of annotated datasets that accurately represent the diverse aspects of tourist experiences. Unfortunately, such datasets are often limited, making it difficult to train models effectively. This scarcity of training data can impede the ability of ABSA to accurately capture the full range of sentiments expressed in reviews, potentially limiting its utility [10].

One solution to overcome the limitations of ABSA due to scarce training data is the application of Zero-Shot Learning (ZSL) [11,12]. ZSL allows models to classify data without extensive task-specific training by leveraging pretrained language models like BERT [13,14]. This research proposes integrating ABSA with ZSL to analyze online tourist reviews effectively without needing large annotated datasets. The objectives of this study are twofold, as follows: (1) to develop a comprehensive framework for extracting and analyzing aspects of tourist experiences from online reviews using ABSA with Zero-Shot Learning, and (2) to provide actionable recommendations for hospitality service providers based on the analysis of sentiments associated with various aspects of tourist experiences. This approach provides a detailed and accurate understanding of tourist feedback, enabling businesses to make informed decisions and improve their services.

By systematically analyzing a large dataset of tourist reviews, this study highlights the importance of specific aspects such as food, accommodation, and cultural experiences in shaping overall tourist satisfaction. The findings underscore the need for continuous monitoring and adaptation of service offerings to meet the evolving expectations of tourists, ultimately enhancing the competitiveness and appeal of tourist destinations. This research contributes to the growing body of knowledge on the application of advanced NLP techniques in the tourism sector and demonstrates the potential of ABSA and ZSL in driving service improvements and customer satisfaction. The primary research questions addressed in this study are as follows: (1) How does tourist satisfaction vary across specific aspects such as food, accommodation, and cultural activities? (2) How can advanced NLP techniques like ABSA and ZSL be applied to effectively analyze tourist feedback for service improvement in the hospitality industry? By answering these questions, this study aims to provide actionable insights for enhancing hospitality services.

2. Social Media Analytics for Tourism and Hospitality

Recent studies have increasingly focused on leveraging social media analytics to understand and enhance the tourism and hospitality industry, particularly through the examination of customer feedback and sentiment expressed in online reviews. Several researchers have explored how social media platforms can be utilized to assess customer satisfaction and identify areas for service improvement. For instance, Park et al. [15] and Kim et al. [16] both investigated the impact of specific service attributes on customer emotions and satisfaction. While Park et al. [15] analyzed the attractiveness of service robots and their influence on customer emotions, Kim et al. [16] focused on understanding tourist satisfaction and dissatisfaction through key satisfaction attributes, emphasizing the significant role these attributes play in shaping overall customer experiences. Both studies underline the importance of specific service elements in the hospitality sector and their direct influence on customer sentiment.

Another notable area of research has been the analysis of online reviews to identify complaints and areas for improvement within hospitality services. Cevrimkaya and Sengel [17] concentrated on the complaints of local tourists at five-star hotels, revealing that issues related to ambiance, food, and staff were the most prevalent. This study, along with the findings of Qiang et al. [18], who identified various tourist interaction actors and emotions through user-generated content, highlights the utility of mixed methods and sentiment analysis in uncovering nuanced feedback. These studies emphasize the critical role of online reviews as data sources for identifying both strengths and weaknesses in service delivery, allowing businesses to make informed decisions about where to focus their improvement efforts.

Cluster analysis and sentiment clustering have also emerged as powerful techniques in the realm of social media analytics for tourism. Ghosh and Mukherjee [19] employed clustering techniques to group travelers based on their social media interactions, leading to the identification of distinct traveler segments with similar behaviors. Similarly, Mirzaalian and Halpenny [20] developed analytics methods to extract sentiments and loyalty behaviors from online reviews, providing insights into visitor preferences in natural settings such as Jasper National Park. Both studies illustrate how clustering and segmentation can be used to understand diverse traveler preferences and behaviors, enabling the creation of more targeted and effective marketing strategies.

In addition to clustering, the development of recommender systems and evaluation frameworks has been another key focus in the literature. Yong et al. [21] proposed a recommender system that ranks tourist attractions based on aspects extracted from online reviews. This system integrates sentiment analysis with multicriteria decision-making methods to provide comprehensive rankings of tourist attractions. The approach taken by Yong et al. [21] is echoed in the work of Skotis and Livas [22], who used a similar framework to assess factors influencing traveler satisfaction in historic districts. Both studies highlight the potential of combining sentiment analysis with evaluation frameworks to enhance the decision-making process for tourists and service providers alike.

Moreover, studies have explored the relationship between tourist experiences and a destination’s cultural and environmental attributes. Liu et al. [15] investigated the role of social media affordances in shaping destination image formation, finding that these affordances directly impact tourists’ cognitive and affective perceptions. This finding aligns with the research of Taechurangroj and Stoica [23], who compared place experiences using AI-generated lexicons to reveal how different aspects of the environment and cultural context influence tourist satisfaction. These studies emphasize the importance of considering both physical and cultural elements when evaluating tourist experiences, as they play a significant role in shaping overall satisfaction and destination image.

Finally, the application of advanced sentiment analysis techniques such as bi-directional models and multicriteria evaluation is gaining traction in tourism research. Chen et al. [24] explored the use of a bi-directional hotel attribute performance analysis to identify attributes critical for improving customer ratings. Their study, along with the work of Yong et al. [21], underscores the growing importance of advanced analytical techniques in providing deeper insights into customer satisfaction. These approaches not only enhance the precision of sentiment analysis but also offer practical tools for service providers to optimize their offerings based on detailed customer feedback.

3. Materials and Methods

3.1. Dataset Collection and Preparation

The dataset for this study was sourced from TripAdvisor, a widely used online platform where tourists share reviews about their experiences at various attractions, hotels, and restaurants. TripAdvisor was chosen because of its extensive collection of user-generated content that provides rich and diverse insights into tourist experiences. TripAdvisor is a credible and representative data source for tourism research due to its extensive global reach and trusted reputation. As a leading online travel platform, it hosts millions of reviews from a diverse user base, offering insights into various aspects of the hospitality sector. This diversity ensures that the data reflect a wide range of cultural and demographic perspectives, enhancing the generalizability of the findings [25]. TripAdvisor significantly influences consumer decision making, as travelers often consult its reviews to make informed choices about accommodations, dining, and attractions. The detailed user-generated content provides valuable insights into tourist satisfaction and dissatisfaction, making TripAdvisor an ideal source for analyzing nuanced aspects of the tourist experience [26].

In data collection, we focused on hospitality services (attractions, hotels, and restaurants) from the following four major cities in the central part of Java Island, Indonesia: Semarang, Yogyakarta, Magelang, and Surakarta. These cities were selected because of their cultural significance, with notable attractions such as the Borobudur Temple in Magelang and various cultural attractions in Yogyakarta.

The Python (version 3.10) library Scrapy (version 2.11.2) was used to scrape reviews from TripAdvisor (Needham, MA, USA). The scraping process first extracted the URLs of pages related to attractions, hotels, and restaurants in the selected cities. It then gathered all available reviews from these URLs, including the review text, rating, date of review, and reviewer details. The collected reviews were posted between September 2010 and August 2024. Finally, the scraped reviews were compiled into a structured dataset of approximately 21,390 reviews from 477 distinct objects.

After the collection process, the data underwent several preprocessing steps to ensure their suitability for analysis, as follows:

Cleaning: removing non-review-related content, such as advertisements, HTML tags, and other extraneous information;
Tokenization: splitting the review text into individual tokens, such as words or phrases, to facilitate analysis;
Stop-Word Removal: removing commonly used words (e.g., “and”, “the”, and “is”) that do not contribute to sentiment analysis, using a predefined stop-word list;
Normalization: converting all text to lowercase to avoid case sensitivity issues and standardizing different spellings of the same word to ensure uniformity.

These preprocessing steps were crucial for preparing the dataset for subsequent aspect extraction and sentiment analysis.
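The four preprocessing steps can be illustrated with a minimal sketch that uses only the Python standard library. The stop-word list and the regular expressions below are assumptions made for illustration; the study does not specify which list or tokenizer was used. For simplicity, the sketch lowercases the text before tokenizing it, so the order of steps differs slightly from the list above.

```python
import html
import re

# Hypothetical, deliberately tiny stop-word list; the study's actual list is not specified.
STOP_WORDS = {"a", "an", "and", "the", "is", "was", "were", "to", "of", "in", "it"}


def preprocess(review: str) -> list[str]:
    """Clean, normalize, tokenize, and filter a single review."""
    text = re.sub(r"<[^>]+>", " ", review)             # cleaning: strip HTML tags
    text = html.unescape(text)                          # cleaning: decode entities such as &amp;
    text = text.lower()                                 # normalization: lowercase everything
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text)    # tokenization: alphabetic words only
    return [t for t in tokens if t not in STOP_WORDS]   # stop-word removal


print(preprocess("<p>The room was CLEAN &amp; the staff is friendly!</p>"))
# ['room', 'clean', 'staff', 'friendly']
```

Spelling standardization, which the normalization step also mentions, is not covered here because it depends on a dictionary the source does not describe.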

3.2. Research Framework

The research framework involves a series of systematic steps designed to extract and analyze key aspects of tourist experiences from online reviews. Figure 1 depicts the step-by-step approach performed in this research.

Elaborating on the framework depicted in Figure 1, after data preprocessing, the following steps were undertaken:

Keyword Extraction: After preprocessing the dataset, the next step involved extracting keywords using a zero-shot keyword extraction technique. We employed KeyBERT, which uses the pretrained model RoBERTa (Robustly optimized BERT approach). RoBERTa was chosen over other pretrained models like BERT or GPT-3 because of its improved training methodology, which includes larger mini-batches, removing the next sentence prediction objective, and dynamically changing the masking pattern applied to the training data. These enhancements enable RoBERTa to achieve better performance in understanding the context and semantics of the text, making it highly suitable for keyword extraction.
Aspect Candidate Preservation: After keyword extraction, we filtered the identified keywords by their frequency of appearance. We applied a simple threshold, defined as the mean frequency minus one standard deviation. Keywords with a frequency above the threshold were retained as aspect candidates. These keywords represented the various elements of tourist experiences discussed in the reviews, serving as the foundation for further analysis.
Clustering of Keywords: The next step involved clustering these keywords to form more abstract aspects. K-means clustering was employed for this purpose, using word embeddings generated by RoBERTa (accessed on 5 August 2024 at https://huggingface.co/docs/transformers/en/model_doc/roberta) to calculate the similarity among keywords. RoBERTa’s robust contextual embeddings facilitated the identification of semantically similar keywords, ensuring meaningful clusters.
Visualization and Construction of Abstract Aspects: To visualize the keyword clusters, t-SNE (t-distributed stochastic neighbor embedding) was used. t-SNE reduces the dimensionality of high-dimensional data, making it easier to visualize clusters in a two-dimensional space. This visualization helped in constructing more abstract aspects by grouping semantically similar words. Each abstract aspect represented a set of related keywords, providing a comprehensive view of the main themes and sentiments expressed in the tourist reviews.
Segment Detection: Next, we identified segments of each review that were related to the preserved keywords. For this purpose, sentence embeddings were employed to capture the contextual meaning of sentences. Zero-shot learning was utilized again, using Sentence-BERT (SBERT) with the same pretrained RoBERTa model. Sentence-BERT effectively maps sentences to fixed-size vectors, allowing for efficient similarity calculations. This method ensures that the segments most relevant to the identified keywords are accurately detected.
Sentiment Polarity Measurement: Each identified segment, along with its corresponding keywords, was then subjected to sentiment polarity measurement using VADER (Valence Aware Dictionary and Sentiment Reasoner). VADER is specifically designed for sentiment analysis in social media texts, providing accurate sentiment scores (positive, negative, or neutral) for each segment. This step ensured that each keyword was associated with a corresponding sentiment, reflecting the tourists’ opinions expressed in the reviews.
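The frequency filter in the aspect candidate preservation step can be sketched as follows. The function name `aspect_candidates` is hypothetical, and the use of the population standard deviation (`pstdev`) is an assumption, since the text does not say whether the population or sample form was used.

```python
from collections import Counter
from statistics import mean, pstdev


def aspect_candidates(keywords: list[str]) -> list[str]:
    """Keep keywords whose frequency exceeds mean(freq) - std(freq)."""
    counts = Counter(keywords)
    freqs = list(counts.values())
    threshold = mean(freqs) - pstdev(freqs)  # assumption: population standard deviation
    return sorted(k for k, c in counts.items() if c > threshold)


# Toy keyword stream: frequencies are food=5, hotel=4, temple=3, taxi=1.
extracted = ["food"] * 5 + ["hotel"] * 4 + ["temple"] * 3 + ["taxi"]
print(aspect_candidates(extracted))
# ['food', 'hotel', 'temple']
```

In this toy example the mean frequency is 3.25 and the standard deviation is about 1.48, so the threshold is about 1.77 and only the rare keyword is discarded. Because the threshold sits below the mean, the filter is permissive by design.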

The implementation of this methodological approach facilitated a comprehensive examination of tourists’ multifaceted experiences, enabling a nuanced analysis of diverse hospitality service dimensions and their corresponding affective responses. This research framework’s efficacy in capturing subtle experiential variations contributed to a more robust and granular understanding of the subject matter.
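To make the keyword clustering step of the framework more concrete, the sketch below runs a bare-bones K-means (Lloyd's algorithm) on toy two-dimensional vectors. Everything here is illustrative: the vectors are invented stand-ins for RoBERTa embeddings, and the initialization simply takes the first k points, which is a simplification rather than the procedure used in the study.

```python
import math


def kmeans(points, k, iterations=20):
    """Minimal Lloyd's algorithm; the first k points serve as initial centroids."""
    centroids = [tuple(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centroids[j] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels


# Hypothetical 2-D "embeddings" for four keywords.
keywords = ["food", "hotel", "restaurant", "room"]
vectors = [(0.9, 0.1), (0.1, 0.9), (0.8, 0.2), (0.2, 0.8)]
for word, label in zip(keywords, kmeans(vectors, k=2)):
    print(word, label)
```

With these vectors, "food" and "restaurant" fall into one cluster and "hotel" and "room" into the other, which mirrors how semantically close keywords would be merged into a single abstract aspect.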

3.3. Zero-Shot Learning Using BERT Language Model

BERT (Bidirectional Encoder Representations from Transformers) is a groundbreaking model in the field of natural language processing (NLP) developed by researchers at Google. Introduced in 2018, BERT has significantly advanced the capabilities of NLP by enabling models to understand the context of words in a sentence effectively [13]. By considering the full context of a word based on its surrounding words, BERT excels in a variety of NLP tasks such as question answering, sentiment analysis, and language inference, setting new benchmarks for performance in these areas. Additionally, BERT supports zero-shot learning, allowing the model to perform tasks it was not explicitly trained on by leveraging its deep contextual understanding and vast pretraining knowledge [27,28].

3.3.1. BERT Architecture

Unlike traditional models that read text input sequentially, BERT’s architecture allows it to consider the context of a word from both directions, making it bidirectional. This bidirectional nature enables BERT to capture a word’s full context in a sentence, leading to a more accurate understanding and processing of language. Figure 2 depicts the architecture of the BERT model, which is based on the Transformer model introduced by Vaswani et al. [29]. The BERT model uses a stack of Transformer encoder layers to process text input in a bidirectional manner, allowing it to understand context from both directions [13].

Each Transformer layer in the BERT model consists of the following three elements: multihead attention, add and norm, and a feed-forward neural network. The first component, multihead attention, allows the model to focus on different parts of the input sentence simultaneously. It applies self-attention multiple times in parallel (hence, “multihead”) and then combines the results. Self-attention weighs the importance of different words in a sentence: while processing a given word, the model focuses on the words most relevant to it, effectively capturing long-range dependencies and relationships within the text. Equation (1) mathematically describes the attention mechanism [29].

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(\frac{Q K^{T}}{\sqrt{d_{k}}}\right) V, \tag{1}$$

where Q (query), K (key), and V (value) are derived from the input embeddings, and $d_{k}$ is the dimension of the key vectors.
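Equation (1) can be traced step by step with a small pure-Python sketch. The matrices are plain nested lists and the input values are arbitrary toy numbers; a real implementation would use a tensor library and would apply the operation once per attention head.

```python
import math


def softmax(row):
    """Numerically stable softmax over one list of scores."""
    peak = max(row)
    exps = [math.exp(x - peak) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = len(K[0])
    output = []
    for q in Q:
        # Similarity of this query with every key, scaled by sqrt(d_k).
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d_k) for k in K]
        weights = softmax(scores)
        # Weighted sum of the value vectors.
        output.append(
            [sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))]
        )
    return output


Q = [[1.0, 0.0]]
K = [[1.0, 0.0], [0.0, 1.0]]
V = [[1.0, 0.0], [0.0, 1.0]]
print(attention(Q, K, V))
```

Here the query matches the first key more closely than the second, so the output leans toward the first value vector while still mixing in part of the second.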

The second component of the Transformer layer is the “add & norm”, which enhances model stability and efficiency by adding the attention layer’s input to its output (residual connection) and normalizing the result to have a mean of 0 and variance of 1. This normalization prepares the output for the next component, the feed-forward neural network, which processes each position separately through two linear transformations with a nonlinear activation in between (ReLU in the original Transformer [29]; BERT itself uses GELU). This setup captures complex patterns and relationships in the data, enhancing the model’s ability to understand and process language.
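The add & norm step described above can be sketched for a single position as follows. This is a simplified illustration: it omits the learned scale and shift parameters that layer normalization normally carries, and the small constant `eps` is an assumed value added for numerical stability.

```python
import math


def add_and_norm(x, sublayer_out, eps=1e-5):
    """Residual connection followed by layer normalization (no learned gain or bias)."""
    residual = [a + b for a, b in zip(x, sublayer_out)]        # add: input + sublayer output
    mu = sum(residual) / len(residual)
    var = sum((r - mu) ** 2 for r in residual) / len(residual)
    return [(r - mu) / math.sqrt(var + eps) for r in residual]  # norm: mean 0, variance ~1


print(add_and_norm([1.0, 2.0, 3.0, 4.0], [0.5, -0.5, 1.0, 0.0]))
```

For this input the residual is [1.5, 1.5, 4.0, 4.0], so the normalized values come out very close to [-1, -1, 1, 1].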

In summary, the workflow of the BERT architecture begins with multihead attention, which allows the model to focus on different parts of the input simultaneously and capture long-range dependencies. The add & norm component enhances stability and efficiency by adding the input to the output (residual connection) and normalizing the result. Finally, the feed-forward neural network processes each position with two linear transformations and a nonlinear activation, helping the model capture complex patterns and relationships in the data. Together, these components enable BERT to process language effectively.

3.3.2. Pretrained Model for BERT

Pretrained models are machine learning models that have been trained on a large dataset and can then be fine-tuned for specific tasks. In the context of BERT, pretraining involves training the model on vast amounts of text data to understand language patterns and semantics. This pretrained BERT model can then be fine-tuned on smaller, task-specific datasets, significantly enhancing performance and reducing training time. Several pretrained models are built on the BERT architecture, each with unique enhancements and optimizations that offer different trade-offs between accuracy and efficiency. Table 1 outlines some available pretrained models.

The pretrained models outlined in Table 1 offer varying levels of performance and are suitable for different use cases. BERT-Base provides a good baseline performance for many NLP tasks and serves as a general benchmark. BERT-Large, with its 24 layers, achieves higher accuracy but requires more computational resources, making it ideal for tasks needing higher precision. RoBERTa stands out with superior performance on NLP benchmarks due to extensive training data and optimized techniques, making it suitable for applications demanding top accuracy. DistilBERT retains about 97% of BERT’s performance while being faster and more efficient, perfect for resource-constrained environments. ALBERT, with parameter sharing and factorized embeddings, offers a comparable performance to BERT-Large but with reduced memory usage and faster training times, making it efficient for high-performance tasks with limited resources.

In this research, we utilized RoBERTa because of its superior performance and robust optimization techniques. RoBERTa’s training on a larger and more diverse dataset, along with improvements such as removing the next sentence prediction task and using dynamic masking, makes it more effective at understanding context and capturing nuances in language. These enhancements result in higher accuracy and better performance on various NLP benchmarks, making RoBERTa the optimal choice for our task of analyzing and understanding tourist experiences through online reviews.

3.3.3. BERT Embedding

Embeddings are a type of representation that allows words or sentences to be converted into continuous vector spaces, enabling machine learning models to process and understand text data numerically [33]. General word embeddings, like Word2Vec and GloVe, map words to vectors based on their context and co-occurrence in large corpora, capturing semantic relationships among words so that similar words have similar vector representations. Sentence embeddings extend this concept by representing entire sentences as fixed-size vectors, capturing the overall meaning of the sentence. Expanding the conventional embedding approach, BERT embeddings create deep contextualized representations of text [34]. These BERT embeddings improve the performance of various NLP tasks by leveraging the full context of words and sentences, enhancing the model’s ability to understand and process language. Like conventional embeddings, BERT embeddings can be divided into word embeddings and sentence embeddings.

BERT generates contextual word embeddings, meaning the representation of a word is influenced by the words around it. This allows the model to understand polysemous words (words with multiple meanings) based on their context. For instance, the word “bank” would have different embeddings in the sentences “He sat on the river bank” and “He went to the bank to deposit money”. The self-attention mechanism in BERT captures these nuances by considering the entire sentence when encoding each word [35]. Mathematically, the embedding for a word, $w_{i}$, in context $C$ can be expressed as in Equation (2).

$$E(w_{i} \mid C), \tag{2}$$

where $E$ denotes the embedding function, and $C$ is the context provided by the surrounding words.

In addition to word embeddings, BERT can generate embeddings for entire sentences using models like Sentence-BERT (SBERT) [36]. SBERT modifies BERT by adding a pooling operation to produce fixed-size sentence embeddings, facilitating tasks like semantic similarity and sentence classification. The pooling operation typically involves averaging or taking the [CLS] token’s output from the final layer. This results in a fixed-size vector that represents the entire sentence, making it suitable for comparison and classification tasks. Equation (3) shows the mathematical expression of the embedding for a sentence, $S$, where $H_{[CLS]}$ is the hidden state of the [CLS] token from the last Transformer layer, and Pooling denotes the pooling operation (e.g., mean pooling or max pooling).

$$E(S) = \mathrm{Pooling}(H_{[CLS]}) \tag{3}$$
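The two pooling options mentioned above can be sketched on toy token vectors. The function names and the attention-mask convention (1 for real tokens, 0 for padding) are assumptions for illustration; they are not taken from the SBERT code base.

```python
def cls_pool(token_vectors):
    """Use the first token's vector (the [CLS] position) as the sentence embedding."""
    return list(token_vectors[0])


def mean_pool(token_vectors, attention_mask):
    """Average the vectors of real tokens, skipping padded positions (mask == 0)."""
    kept = [v for v, m in zip(token_vectors, attention_mask) if m == 1]
    return [sum(col) / len(kept) for col in zip(*kept)]


# Three token positions with 2-D hidden states; the last position is padding.
hidden = [[1.0, 2.0], [3.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 0]
print(cls_pool(hidden))         # [1.0, 2.0]
print(mean_pool(hidden, mask))  # [2.0, 3.0]
```

Either variant yields a vector whose size does not depend on sentence length, which is what makes sentence-to-sentence and sentence-to-keyword similarity comparisons straightforward.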

The embedding mechanism enables BERT to excel in a variety of NLP tasks by capturing both word-level nuances and sentence-level semantics. Word embeddings are particularly useful for tasks such as named entity recognition and part-of-speech tagging, where understanding the context of individual words is crucial. In contrast, sentence embeddings are essential for tasks like semantic similarity, sentiment analysis, and question answering, where the overall meaning of the sentence is more important than individual word meanings. By leveraging both types of embeddings, BERT can effectively address a wide range of language-understanding challenges.

3.3.4. KeyBERT for Keyword Extraction

KeyBERT, proposed by Lee et al. [37], is a minimal and easy-to-use keyword extraction technique that leverages BERT embeddings to identify contextually relevant keywords from a given text. This technique exploits BERT’s powerful language-understanding capabilities to generate keywords that are closely aligned with the content and context of the input text. The process of keyword extraction in KeyBERT involves the following three main steps:

Embedding Generation: BERT generates embeddings for each word in the text. These embeddings are rich, contextual representations of the words, capturing their meanings based on the surrounding words in the sentence.
Keyword Scoring: KeyBERT calculates the similarity between the document embedding (a representation of the entire text) and individual word embeddings. This similarity score indicates how relevant each word is to the overall content of the document. Several similarity measures (e.g., cosine or Jaccard) can be employed.
Keyword Extraction: The top-scoring keywords, as determined by their similarity scores, are selected as the most relevant keywords representing the document. These keywords effectively summarize the main themes and topics of the text.
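The scoring and extraction steps above amount to ranking candidate words by their similarity to the document vector. The sketch below illustrates this with cosine similarity on invented three-dimensional vectors; it does not call the KeyBERT library, and the function `top_keywords` and all vector values are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def top_keywords(doc_vec, word_vecs, top_n=2):
    """Rank candidate words by similarity to the document embedding."""
    scored = [(word, cosine(doc_vec, vec)) for word, vec in word_vecs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Hypothetical embeddings for one review and three candidate words.
doc_vec = (1.0, 1.0, 0.0)
word_vecs = {
    "temple": (0.9, 0.8, 0.1),
    "sunrise": (0.7, 0.9, 0.3),
    "parking": (0.0, 0.1, 1.0),
}
for word, score in top_keywords(doc_vec, word_vecs):
    print(f"{word}: {score:.3f}")
```

In this toy case "temple" and "sunrise" point in nearly the same direction as the document vector and are kept, while "parking" is almost orthogonal to it and is dropped.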

KeyBERT uses pretrained BERT models to generate the word embeddings necessary for keyword extraction. The choice of a pretrained model can significantly impact the quality and relevance of the extracted keywords. Pretrained models like BERT-Base, BERT-Large, and RoBERTa are commonly used because they have been trained on vast amounts of text data, enabling them to generate high-quality embeddings that capture complex linguistic nuances. In this research, the pretrained RoBERTa is used alongside KeyBERT. By leveraging RoBERTa as a high-performance pretrained model, KeyBERT can efficiently extract meaningful keywords without the need for extensive task-specific training, making it a practical and powerful tool for various NLP applications, including document summarization, topic modeling, and information retrieval.

3.4. Analyzing Segment Sentiment with VADER

VADER (Valence Aware Dictionary and Sentiment Reasoner), proposed by Hutto et al. [38], is a rule-based sentiment analysis tool specifically designed to analyze sentiments expressed in social media texts. It is effective at capturing both the intensity and the valence (positive, negative, or neutral) of sentiments in text data. VADER uses a combination of a sentiment lexicon and grammatical and syntactical rules to assign sentiment scores to words and phrases, taking into account factors such as punctuation, capitalization, and conjunctions.

VADER operates by first tokenizing the text into individual words and phrases, then assigning each token a sentiment score based on its presence in the sentiment lexicon. These scores are adjusted according to the context provided by the surrounding words and punctuation. For example, intensifiers like “very” can amplify the sentiment, while negations like “not” can reverse it. The final sentiment score for a given piece of text is a composite of these adjusted scores, resulting in an overall valence score.
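The lexicon-plus-rules idea can be sketched in a few lines. The following is a deliberately simplified illustration and not the VADER implementation: the mini-lexicon, the intensifier boost, the negation scalar, and the normalization constant `alpha` are all illustrative assumptions, and the function name `toy_compound` is hypothetical.

```python
import math

# Hypothetical mini-lexicon of word valences; the real VADER lexicon is far larger.
LEXICON = {"good": 1.9, "great": 3.1, "clean": 1.7, "dirty": -1.9, "rude": -2.0}
INTENSIFIERS = {"very": 0.293, "extremely": 0.293}  # boost added to the magnitude
NEGATIONS = {"not", "never", "no"}
NEGATION_SCALAR = -0.74  # flips and dampens the valence of a negated word

def toy_compound(text, alpha=15.0):
    """Return a score in (-1, 1) from lexicon valences adjusted by simple rules."""
    tokens = text.lower().replace(",", " ").replace(".", " ").split()
    total = 0.0
    for i, token in enumerate(tokens):
        valence = LEXICON.get(token)
        if valence is None:
            continue  # words outside the lexicon carry no sentiment
        window = tokens[max(0, i - 2):i]  # look back at up to two preceding tokens
        for prev in window:
            if prev in INTENSIFIERS:
                valence += math.copysign(INTENSIFIERS[prev], valence)
        if any(prev in NEGATIONS for prev in window):
            valence *= NEGATION_SCALAR
        total += valence
    # Squash the summed valence into the open interval (-1, 1).
    return total / math.sqrt(total * total + alpha)

for sentence in ("The room was good", "The room was very good", "The room was not good"):
    print(sentence, round(toy_compound(sentence), 3))
```

Under these assumptions, an intensifier raises the score of a positive word, a nearby negation pushes it below zero, and a sentence with no lexicon words scores exactly zero.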

One of the key advantages of VADER is its ability to perform well without requiring extensive training on specific datasets. This capability stems from VADER’s reliance on a preconstructed sentiment lexicon and heuristic rules, which allows it to analyze sentiments in new and diverse datasets without the need for additional training [39]. This feature is particularly valuable in the tourism sector, where user-generated content is highly varied and continuously evolving.

By integrating VADER into our ABSA framework, we can efficiently measure sentiment polarity across different aspects of tourist reviews. VADER’s zero-shot learning capability enables us to apply it directly to the extracted review segments related to specific aspects, providing accurate and immediate sentiment analysis without the need for large annotated datasets. This integration enhances our ability to derive meaningful insights from online reviews, ultimately contributing to the improvement of hospitality services based on real-time tourist feedback.

4. Results

This section presents a detailed analysis of the key findings from the aspect extraction and sentiment analysis of tourist reviews. This analysis provides a comprehensive understanding of the main themes and sentiments expressed by tourists in their reviews, offering valuable insights into their preferences and experiences.

4.1. Aspect Extraction

The aspect extraction process involved the following three key steps: keyword identification, keyword clustering, and aspect construction/labeling. Initially, keywords were identified using KeyBERT (version 0.8.5), which leverages BERT embeddings to generate contextually relevant keywords from the text. This ensured that the extracted keywords were both frequent and significant within the reviews. These keywords were then filtered for relevance, retaining only the most pertinent terms. Using BERT embeddings, the similarity between keywords was measured to construct a keyword cluster for capturing their semantic meaning and context. By examining these clusters, keywords that were contextually and semantically related were grouped to construct distinct aspects of the tourist experience. This clustering approach, grounded in the powerful representations of BERT embeddings, ensured a nuanced and data-driven identification of aspects, accurately reflecting the underlying sentiments and themes present in the reviews.
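As an illustration of how embedding similarity can group keywords into candidate aspects, the sketch below applies a greedy, threshold-based grouping to hypothetical toy vectors. The clustering algorithm is not specified in the text, so the greedy scheme, the `threshold` value, and the function name `cluster_keywords` are assumptions made purely for demonstration.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def cluster_keywords(vectors, threshold=0.8):
    """Greedy single pass: a keyword joins the first cluster whose seed is similar enough."""
    clusters = []  # each entry is (seed_vector, [member keywords])
    for word, vec in vectors.items():
        for seed, members in clusters:
            if cosine(seed, vec) >= threshold:
                members.append(word)
                break
        else:
            clusters.append((vec, [word]))  # no close cluster found, so start a new one
    return [members for _, members in clusters]

# Hypothetical toy embeddings standing in for BERT keyword embeddings.
vectors = {
    "hotel": [0.9, 0.1, 0.1],
    "room": [0.8, 0.2, 0.1],
    "spa": [0.1, 0.9, 0.2],
    "massage": [0.2, 0.9, 0.1],
    "taxi": [0.1, 0.1, 0.9],
    "bus": [0.2, 0.1, 0.8],
}

clusters = cluster_keywords(vectors)
print(clusters)
```

Each resulting group could then be inspected and given an aspect label by hand, in the spirit of the aspect construction/labeling step.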

Figure 3a,b depict visual representations of the keywords identified and filtered, respectively, during the aspect extraction process. Figure 3a presents a word cloud of all the keywords initially identified using KeyBERT. This figure illustrates the breadth of terms extracted from the online reviews, encompassing various aspects of the tourist experience. Keywords such as “hotel”, “restaurant”, “breakfast”, “food”, and “coffee” appear prominently, indicating their high frequency and relevance across the dataset. The diversity of keywords in this word cloud highlights the multifaceted nature of tourist experiences and the wide range of topics discussed in the reviews.

The retained keywords in Figure 3b were then used to construct aspects through a clustering process. A T-SNE plot was generated and is depicted in Figure 4, projecting high-dimensional data into a two-dimensional space to highlight clusters. Keywords that appear close to each other in the T-SNE plot indicate a high degree of semantic similarity, allowing for the grouping of related terms into coherent aspects. Each point in the plot represents a keyword, and the colors correspond to the estimated labels of the clusters, as indicated in the legend. This visual representation helps to identify and construct meaningful clusters based on keyword proximity and contextual similarity.

In Figure 4, prominent keywords, such as “hotel”, “restaurant”, and “breakfast”, form distinct clusters, underscoring their central importance in the reviews. These keywords frequently appear together in the context of hospitality services, reflecting core aspects of tourists’ experiences. For instance, “hotel” and “restaurant” are commonly mentioned together as they represent primary components of accommodation and dining experiences. Other notable clusters emerge around keywords related to health and relaxation services. Keywords such as “spa”, “massage”, and “wellness” cluster together, highlighting tourists’ interest in rejuvenation and self-care activities. This cluster signifies the value placed on wellness services during travel. Similarly, keywords associated with retail experiences, such as “shopping”, “mall”, and “shops”, form a cohesive cluster. This grouping indicates that tourists frequently discuss their shopping experiences, reflecting the significance of retail therapy and souvenir hunting as part of their travel activities.

The T-SNE plot also reveals clusters around cultural and local experiences. Keywords like “temple”, “sunset”, and “cultural” indicate a focus on sightseeing and cultural immersion. These clusters demonstrate tourists’ appreciation for local heritage and natural beauty, which are pivotal aspects of their travel experience. Accommodation-related keywords, such as “guest”, “room”, and “cleanliness”, form another significant cluster, reflecting the importance of comfort and hygiene in tourist accommodations. This cluster underscores the critical role that accommodation quality plays in shaping overall tourist satisfaction. Keywords related to transportation, such as “airport”, “taxi”, and “bus”, cluster together, indicating the frequent discussions around travel logistics and the ease of getting around tourist destinations. This cluster highlights the importance of efficient and reliable transportation services for a seamless travel experience.

The clusters identified in the T-SNE plot (Figure 4) were further analyzed to construct meaningful aspects of the tourist experience. Table 2 summarizes the proposed aspects and lists the corresponding keywords identified within each cluster. A comparative analysis of the frequency and significance of these keywords reveals several insights into the primary concerns and interests of tourists.

From Table 2, the aspect “Food in general” emerges as one of the most discussed topics, highlighting the central role of general food experiences in tourist reviews. In contrast, “Food items (specified)” focuses on detailed mentions of specific dishes, indicating tourists’ keen interest in particular food items. Although “Indonesian specified cuisine” appears less frequently, it is crucial for understanding tourists’ appreciation of local cuisine. The aspect “Restaurants service” reflects tourists’ emphasis on the quality of dining experiences, while “Accommodation” underscores the importance of comfort and hygiene in shaping overall satisfaction.

Retail experiences, captured in the “Shopping” aspect, show the significance of retail therapy and souvenir purchasing. The “Landmarks” aspect highlights the value placed on sightseeing and cultural immersion. Wellness services and transportation logistics indicate that tourists also prioritize health and convenience during their travels. Lastly, aspects such as “Cultural items” and “Toiletries” reflect tourists’ appreciation for cultural heritage and basic comfort amenities.

From a more comparative point of view, the aspects outlined in Table 2 reveal that while food and accommodation dominate tourist reviews, other elements, like restaurant service, local cuisine, shopping, landmarks, wellness, and transportation, also significantly shape the overall tourist experience. General food discussions and specific dish mentions highlight the importance of culinary experiences, while frequent mentions of accommodation stress the need for comfort and cleanliness. Restaurant service, another aspect that gained much attention, is crucial in shaping tourists’ perceptions, and the interest in local cuisine underscores the desire for authentic, culturally rich experiences.

4.2. Aspect-Based Sentiment Analysis Results

Understanding tourists’ sentiments toward various aspects of their travel experience is crucial for improving hospitality services. Figure 5 presents a comprehensive sentiment analysis of the identified aspects, categorized into positive, neutral, and negative sentiments. This bar chart provides a clear visual representation of how tourists perceive different components of their travel, highlighting areas of satisfaction and concern.

The sentiment analysis reveals a complex landscape of tourist feedback. From Figure 5, overall, aspects related to food, accommodation, and restaurant services are among the most frequently mentioned, reflecting their central importance in the tourist experience. Positive sentiments are predominant in these areas, suggesting that tourists generally have favorable experiences with food and lodging. However, the presence of neutral and negative sentiments indicates areas where improvements are necessary.

From the sentiment analysis results on each aspect, positive sentiment is notably high for “General food terms” and “Restaurants and chefs”, underscoring the critical role of culinary experiences in tourism. Tourists frequently express enjoyment of food and dining, contributing significantly to overall satisfaction. Similarly, accommodation garners a strong positive sentiment, emphasizing the importance of comfort and quality. However, the presence of neutral and negative sentiments in these areas points to inconsistencies in service that need addressing. “Food items” and “Desserts and beverages” show a balanced sentiment distribution, indicating varied experiences and opportunities for culinary improvements to cater to diverse tastes.

Critical feedback is more prevalent in aspects like “Toiletries”, “Shopping”, and “Hospitality”, suggesting gaps in meeting tourist expectations. These areas have higher proportions of neutral and negative sentiments, indicating a need for better quality, variety, and consistent service. Meanwhile, “Landmarks”, “Travel”, and “Cultural items” predominantly receive neutral feedback, implying these aspects meet basic expectations but lack elements that significantly enhance satisfaction. Lastly, wellness services and transportation show mixed sentiments, highlighting the growing interest in health and relaxation during travel and the importance of convenience and ease of movement.

Figure 6 provides a different perspective on sentiment analysis than Figure 5 by illustrating the proportion of positive, neutral, and negative sentiments for each aspect. Unlike Figure 5, which shows sentiment based on frequency, this figure focuses on the relative distribution of sentiments within each aspect, offering insights into the overall quality and consistency of tourist experiences. Figure 6 complements the previous sentiment analysis by providing a nuanced view of tourist feedback across different aspects. By examining both the frequency and proportion of sentiments, stakeholders can gain a comprehensive understanding of tourist experiences, identify key areas for improvement, and implement strategies to enhance service quality and overall satisfaction.
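The difference between the frequency view and the proportion view can be made concrete with a small sketch. The labelled segments below are hypothetical and do not come from the study's dataset; the function name `sentiment_tables` is likewise introduced only for illustration.

```python
from collections import Counter, defaultdict

LABELS = ("positive", "neutral", "negative")

def sentiment_tables(labelled_segments):
    """Return per-aspect sentiment counts and the within-aspect proportions."""
    counts = defaultdict(Counter)
    for aspect, label in labelled_segments:
        counts[aspect][label] += 1
    proportions = {}
    for aspect, counter in counts.items():
        total = sum(counter.values())
        proportions[aspect] = {label: counter[label] / total for label in LABELS}
    return counts, proportions

# Hypothetical (aspect, sentiment) pairs, one per review segment.
segments = (
    [("Accommodation", "positive")] * 6
    + [("Accommodation", "neutral")] * 3
    + [("Accommodation", "negative")] * 1
    + [("Landmarks", "positive")] * 3
    + [("Landmarks", "neutral")] * 1
)

counts, proportions = sentiment_tables(segments)
for aspect in counts:
    print(aspect, dict(counts[aspect]), proportions[aspect])
```

In this toy data, one aspect has more positive mentions in absolute terms while the other has the higher share of positive mentions, which is the kind of contrast the two views are meant to expose.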

From Figure 6, it can be seen that the proportions of positive sentiment are highest for “Landmarks”, “Travel”, and “Desserts and beverages”, suggesting that these aspects generally meet or exceed tourists’ expectations. Conversely, “Toiletries” and “Hospitality” show higher proportions of neutral and negative sentiments, highlighting areas where improvements are needed to better satisfy tourists. Interestingly, while “General food terms” and “Restaurants and chefs” have high positive sentiment frequencies, their proportions of positive sentiment are lower compared to other aspects. This indicates that although many tourists have positive dining experiences, there are significant neutral and negative experiences that require attention. Similarly, “Accommodation” shows a substantial portion of neutral and negative sentiments, reinforcing the need for consistent service quality. Finally, the relatively balanced sentiment proportions in aspects like “Food items”, “Shopping”, and “Wellness” highlight the diverse range of tourist experiences and suggest areas where targeted improvements could enhance overall satisfaction. The predominantly neutral sentiments in “Cultural items” and “Transportation” suggest these aspects meet basic expectations but lack the impact needed to generate strong positive feedback.

To complement Figure 5 and Figure 6, Figure 7 plots the aspects based on their sentiment polarity scores, categorizing them into positive, negative, and neutral zones. This approach offers a more detailed view by using the polarity score of each review instead of discrete positive/negative/neutral classifications. In Figure 7, aspects with a polarity score greater than 0.2 fall into the positive zone, indicating strong positive sentiment. These aspects include “General food terms”, “Restaurants and chefs”, “Accommodation”, “Food items”, and “Desserts and beverages”, reinforcing their significant positive impact on tourist experiences. Aspects with polarity scores between −0.2 and 0.2 are considered neutral, showing a mix of positive, neutral, and negative sentiments. Most aspects fall into this category, including “Cultural Items”, “Transportation”, “Wellness”, “Dining and cooking”, “Indonesian cuisine”, “Travel”, “Landmarks”, “Shopping”, and “Hospitality”. This distribution suggests that while these aspects are important, they elicit a wide range of sentiments from tourists. Interestingly, “Toiletries” is the only aspect that approaches the negative zone, with a polarity score slightly below neutral.
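The zone assignment described above can be sketched as follows. The ±0.2 boundaries follow the text, whereas averaging the per-review scores within each aspect is an assumption about the aggregation step, and the scores themselves are hypothetical; `polarity_zone` and `aspect_zones` are illustrative names.

```python
from statistics import mean

def polarity_zone(score, threshold=0.2):
    """Map a polarity score to a zone using symmetric thresholds."""
    if score > threshold:
        return "positive"
    if score < -threshold:
        return "negative"
    return "neutral"

def aspect_zones(scores_by_aspect, threshold=0.2):
    """Average the per-review scores of each aspect and assign a zone to the mean."""
    result = {}
    for aspect, scores in scores_by_aspect.items():
        average = mean(scores)
        result[aspect] = (average, polarity_zone(average, threshold))
    return result

# Hypothetical per-review polarity scores in [-1, 1].
scores_by_aspect = {
    "Accommodation": [0.6, 0.4, 0.1],
    "Transportation": [0.3, -0.1, 0.0],
    "Toiletries": [-0.3, -0.2, 0.1],
}

zones = aspect_zones(scores_by_aspect)
for aspect, (average, zone) in zones.items():
    print(f"{aspect}: {average:.2f} ({zone})")
```

With these toy scores, an aspect whose mean is slightly below zero still lands in the neutral zone, analogous to an aspect that approaches but does not enter the negative zone.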

Figure 7 provides a refined perspective on tourist sentiments, allowing stakeholders to understand the intensity of feelings toward different aspects. By focusing on both the frequency and polarity of sentiments, a comprehensive strategy can be developed to enhance tourist experiences and address specific areas of concern effectively. The polarity score offers a nuanced view of how strongly tourists feel about each aspect, highlighting not just whether the sentiment is positive or negative but also the intensity of those feelings. In particular, the near-negative score for “Toiletries” highlights a significant area of concern where tourists’ experiences are more frequently negative, indicating an urgent need for improvement.

5. Discussion

ABSA is a sophisticated technique in NLP that allows for the extraction and analysis of sentiments toward specific aspects or features within a given text. By focusing on particular components, such as service quality or food experiences in tourist reviews, ABSA provides detailed insights into which aspects contribute positively or negatively to overall sentiment. As an enhancement to traditional ABSA techniques, the use of zero-shot learning in ABSA offers significant advantages by leveraging the power of LLMs. Zero-shot learning models, which are capable of understanding and classifying data without needing extensive training on specific datasets, allow for more flexible and scalable sentiment analysis. This approach is particularly valuable in the dynamic field of tourism, where new trends and aspects emerge frequently [40,41].

In the tourism context, ABSA enhances the precision of sentiment analysis, which allows for a more nuanced understanding of tourist feedback, identifying not just the overall sentiment but the particular elements that contribute to satisfaction or dissatisfaction. Hence, ABSA plays a crucial role in adapting to tourist preferences by providing immediate feedback related to their experiences. By continuously monitoring feedback, providers can detect shifts in tourist expectations and preferences, allowing them to adapt their offerings accordingly to enhance the overall quality of service. For example, if an increasing number of guests express interest in wellness services, providers can introduce new wellness packages or enhance existing ones to meet this demand.

Based on the ABSA results, enhancing hospitality services involves strategically addressing areas with negative feedback and amplifying the aspects that receive positive sentiments. In our case study, the highly positive sentiments observed in aspects related to culinary experiences—including “General food terms”, “Food items”, “Indonesian specified cuisine”, “Desserts and beverages”, “Dining and cooking”, and “Restaurants service”—highlight a key strength in the current hospitality services provided to tourists. These aspects, frequently mentioned and predominantly receiving positive feedback, indicate that tourists consistently enjoy and appreciate the diverse and high-quality food offerings at tourist destinations. Those aspects have shown significant positive polarity scores, underscoring the critical role of culinary experiences in enhancing tourist satisfaction. This aligns with recent literature emphasizing the pivotal role of food in shaping overall tourism experiences, significantly influencing tourists’ perceptions and satisfaction [42,43].

The strong positive sentiment for accommodation highlights the importance of comfort, cleanliness, and service quality in lodging. Tourists frequently mention these factors as key contributors to a pleasant stay, indicating that many hotels and lodging facilities are successfully providing environments that meet and exceed guest expectations. Positive polarity scores for accommodation suggest that this aspect is a significant strength for many destinations, reinforcing the importance of maintaining high standards in lodging services. Recent research supports this, indicating that accommodation quality is a major determinant of overall tourist satisfaction and repeat visitation [44,45]. Beyond food and accommodation, other aspects such as “Landmarks” and “Travel” also exhibit high proportions of positive sentiments. The positive feedback in these areas suggests that tourists value engaging with cultural and natural landmarks, as well as having reliable and enjoyable travel experiences. Landmarks, in particular, offer unique and memorable experiences that contribute significantly to the overall appeal of a destination [46].

On the other hand, insights into areas with negative feedback can guide targeted improvements. Negative feedback in aspects such as “Toiletries” and “Hospitality” highlights specific areas where tourists’ expectations are not being met. For toiletries, dissatisfaction often stems from poor quality, insufficient quantity, or irregular replenishment of amenities like soap, shampoo, and towels. Tourists expect a certain standard of comfort and convenience, and any lapses in these basic provisions can significantly impact their overall experience [47,48]. In hospitality, negative sentiments are typically due to inconsistent service quality, unprofessional staff behavior, or lack of personalized attention [45]. Hence, enhancing staff training programs to focus on personalized service and cultural sensitivity can improve guest satisfaction. In addition, implementing standardized service protocols can help maintain consistent quality across all interactions. By addressing these pain points, providers can turn negative experiences into positive ones, fostering loyalty and repeat visits.

The integration of Aspect-Based Sentiment Analysis (ABSA) with Zero-Shot Learning (ZSL) in this study offers significant potential for adoption in tourism and hospitality research. This method provides a scalable and flexible approach to analyzing vast amounts of user-generated content, such as online reviews and social media posts, without requiring extensive labeled datasets. In the dynamic tourism industry, where consumer feedback is abundant and constantly evolving, this capability is particularly valuable. By capturing and analyzing detailed aspects of tourist experiences, the method enables researchers and practitioners to understand specific areas of satisfaction and dissatisfaction, allowing service providers to tailor their offerings to better meet customer needs and enhance overall satisfaction.

Furthermore, the method’s adaptability to emerging trends and shifting consumer preferences makes it a powerful tool for strategic decision making and policy development. The tourism and hospitality sectors are characterized by rapid changes in consumer behavior and expectations, and the ability to quickly adapt to these changes provides a competitive advantage. The insights gained from this method can guide businesses in optimizing their services and marketing strategies, ensuring they remain relevant in a rapidly evolving market. Additionally, policymakers can leverage these insights to develop targeted policies that improve tourism infrastructure, promote sustainable practices, and enhance service quality, ultimately benefiting the entire tourism ecosystem. The method’s broad applicability extends beyond tourism, offering valuable insights for other service-oriented industries seeking to understand and respond to customer sentiment effectively.

6. Conclusions and Future Works

Aspect-Based Sentiment Analysis (ABSA) enhanced by zero-shot learning is a powerful tool for understanding tourist preferences by analyzing feedback on specific aspects like food, accommodation, and hospitality. This method offers detailed insights into what drives tourist satisfaction, enabling service providers to improve guest experiences. The flexibility of zero-shot learning allows for rapid adaptation to new trends in tourism, while BERT and RoBERTa provide high accuracy in keyword extraction and sentiment analysis. ABSA’s insights help hospitality services quickly identify and address issues, enhancing service quality and competitiveness. By leveraging these techniques, providers can better understand and anticipate tourist preferences, fostering positive experiences, building loyalty, and ensuring long-term success in the industry.

Beyond the technical aspect, the research contributes significantly to the tourism sector by offering practical applications with economic and commercial impacts, such as optimizing services and targeting marketing strategies to attract more visitors. In the educational sphere, this study can be integrated into tourism and hospitality management curricula, equipping students with skills in data-driven decision making and advanced analytics. Policymakers can also utilize these insights to develop guidelines that improve service quality and infrastructure, fostering a competitive and sustainable tourism sector. The societal impact is evident in the enhancement of tourist experiences and the promotion of cultural exchange between visitors and local communities. To improve the replicability of this study, future researchers can adopt the detailed methodology outlined here and share anonymized datasets and code, facilitating further exploration of ABSA and zero-shot learning in diverse contexts.

A limitation of our current study is the lack of consideration for reviewers’ origins, which may influence their perceptions and evaluations of tourist experiences. To address this, future research should explore the impact of reviewers’ nationalities on their feedback. This could involve analyzing metadata on reviewer origins, conducting comparative studies of reviews from different countries for the same destinations, and investigating how cultural dimensions correlate with review content and sentiment. Such analyses would provide a more nuanced understanding of how cultural backgrounds and expectations shape tourist feedback. This approach could reveal valuable insights into universal versus culture-specific preferences in hospitality, allowing for more targeted and culturally sensitive service improvements. By incorporating this dimension, future studies can offer a more comprehensive view of tourist experiences, potentially leading to more effective strategies for enhancing hospitality services across diverse cultural contexts. This extension of our current work would significantly contribute to the field’s understanding of cross-cultural dynamics in tourism and hospitality management.

Author Contributions

Conceptualization, I.N., K.F.I. and M.S.; methodology, I.N., M.R.M. and M.S.; software, M.R.M.; validation, I.N., K.F.I. and M.S.; formal analysis, M.R.M.; investigation, K.F.I.; resources, I.N. and M.R.M.; data curation, M.R.M.; writing—original draft preparation, I.N., K.F.I. and M.R.M.; writing—review and editing, M.S.; visualization, M.R.M.; supervision, M.S.; project administration, M.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data used in this study may be requested from the first/corresponding author.

Acknowledgments

The authors express their sincere appreciation for the exceptional support, in terms of facilities and policies, provided by Tidar University and Sejong University. This support was instrumental in the successful completion and publication of our work. In addition, we highly appreciate the financial support given by Tidar University for this collaboration.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Guo, X.; Pesonen, J.; Komppula, R. Comparing Online Travel Review Platforms as Destination Image Information Agents. Inf. Technol. Tour. 2021, 23, 159–187. [Google Scholar] [CrossRef]
Garner, B.; Kim, D. Analyzing User-Generated Content to Improve Customer Satisfaction at Local Wine Tourism Destinations: An Analysis of Yelp and TripAdvisor Reviews. Consum. Behav. Tour. Hosp. 2022, 17, 413–435. [Google Scholar] [CrossRef]
Álvarez-Carmona, M.Á.; Aranda, R.; Rodríguez-Gonzalez, A.Y.; Fajardo-Delgado, D.; Sánchez, M.G.; Pérez-Espinosa, H.; Díaz-Pacheco, Á. Natural Language Processing Applied to Tourism Research: A Systematic Review and Future Research Directions. J. King Saud Univ.-Comput. Inf. Sci. 2022, 34, 10125–10144. [Google Scholar] [CrossRef]
Abbasi-Moud, Z.; Vahdat-Nejad, H.; Sadri, J. Tourism Recommendation System Based on Semantic Clustering and Sentiment Analysis. Expert Syst. Appl. 2021, 167, 114324. [Google Scholar] [CrossRef]
Mehraliyev, F.; Chan, I.C.C.; Kirilenko, A.P. Sentiment Analysis in Hospitality and Tourism: A Thematic and Methodological Review. Int. J. Contemp. Hosp. Manag. 2022, 34, 46–77. [Google Scholar] [CrossRef]
Raghunathan, N.; Saravanakumar, K. Challenges and Issues in Sentiment Analysis: A Comprehensive Survey. IEEE Access 2023, 11, 69626–69642. [Google Scholar] [CrossRef]
Zhang, W.; Li, X.; Deng, Y.; Bing, L.; Lam, W. A Survey on Aspect-Based Sentiment Analysis: Tasks, Methods, and Challenges. IEEE Trans. Knowl. Data Eng. 2022, 35, 11019–11038. [Google Scholar] [CrossRef]
Jain, A.; Bansal, A.; Tomar, S. Aspect-Based Sentiment Analysis of Online Reviews for Business Intelligence. Int. J. Inf. Technol. Syst. Approach IJITSA 2022, 15, 1–21. [Google Scholar] [CrossRef]
Jiang, Q.; Chen, L.; Xu, R.; Ao, X.; Yang, M. A Challenge Dataset and Effective Models for Aspect-Based Sentiment Analysis. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Hong Kong, China, 3–7 November 2019; pp. 6280–6285. [Google Scholar] [CrossRef]
Nazir, A.; Rao, Y.; Wu, L.; Sun, L. Issues and Challenges of Aspect-Based Sentiment Analysis: A Comprehensive Survey. IEEE Trans. Affect. Comput. 2020, 13, 845–863. [Google Scholar] [CrossRef]
Wang, W.; Zheng, V.W.; Yu, H.; Miao, C. A Survey of Zero-Shot Learning: Settings, Methods, and Applications. ACM Trans. Intell. Syst. Technol. TIST 2019, 10, 1–37. [Google Scholar] [CrossRef]
Shu, L.; Xu, H.; Liu, B.; Chen, J. Zero-Shot Aspect-Based Sentiment Analysis. arXiv 2022, arXiv:2202.01924. [Google Scholar] [CrossRef]
Devlin, J.; Chang, M.W.; Lee, K.; Toutanova, K. Bert: Pre-Training of Deep Bidirectional Transformers for Language Understanding. arXiv 2018, arXiv:1810.04805. [Google Scholar] [CrossRef]
Hoang, M.; Bihorac, O.A.; Rouces, J. Aspect-Based Sentiment Analysis Using BERT. In Proceedings of the 22nd Nordic Conference on Computational Linguistics, Turku, Finland, 30 September–2 October 2019; pp. 187–196. [Google Scholar]
Park, H.; Jiang, S.; Lee, O.K.D.; Chang, Y. Exploring the attractiveness of service robots in the hospitality industry: Analysis of online reviews. Inf. Syst. Front. 2024, 26, 41–61.
Kim, W.; Kim, S.B.; Park, E. Mapping tourists’ destination (dis)satisfaction attributes with user-generated content. Sustainability 2021, 13, 12650.
Çevrimkaya, M.; Çavus, Ş.; Şengel, Ü. Assessment of hotels’ online complaints in domestic tourism: Mixed analysis approach. Int. J. Tour. Cities, 2024; in press.
Yan, Q.; Jiang, T.; Zhou, S.; Zhang, X. Exploring tourist interaction from user-generated content: Topic analysis and content analysis. J. Vacat. Mark. 2024, 30, 327–344.
Ghosh, P.; Mukherjee, S. Understanding tourist behaviour towards destination selection based on social media information: An evaluation using unsupervised clustering algorithms. J. Hosp. Tour. Insights 2023, 6, 754–778.
Mirzaalian, F.; Halpenny, E. Exploring destination loyalty: Application of social media analytics in a nature-based tourism setting. J. Destin. Mark. Manag. 2021, 20, 100598.
Qin, Y.; Wang, X.; Xu, Z. Ranking tourist attractions through online reviews: A novel method with intuitionistic and hesitant fuzzy information based on sentiment analysis. Int. J. Fuzzy Syst. 2022, 24, 755–777.
Skotis, A.; Livas, C. A data-driven analysis of experience in urban historic districts. Ann. Tour. Res. Empir. Insights 2022, 3, 100052.
Taecharungroj, V.; Stoica, I.S. Assessing place experiences in Luton and Darlington on Twitter with topic modelling and AI-generated lexicons. J. Place Manag. Dev. 2024, 17, 49–73.
Chen, Y.; Zhong, Y.; Yu, S.; Xiao, Y.; Chen, S. Exploring bidirectional performance of hotel attributes through online reviews based on sentiment analysis and Kano-IPA model. Appl. Sci. 2022, 12, 692.
Ayeh, J.K.; Au, N.; Law, R. “Do we believe in TripAdvisor?” Examining credibility perceptions and online travelers’ attitude toward using user-generated content. J. Travel Res. 2013, 52, 437–452.
Filieri, R.; Acikgoz, F.; Ndou, V.; Dwivedi, Y. Is TripAdvisor still relevant? The influence of review credibility, review usefulness, and ease of use on consumers’ continuance intention. Int. J. Contemp. Hosp. Manag. 2021, 33, 199–223.
Chen, C.Y.; Li, C.T. ZS-BERT: Towards Zero-Shot Relation Extraction with Attribute Representation Learning. arXiv 2021, arXiv:2104.04697.
Wang, Y.; Wu, L.; Li, J.; Liang, X.; Zhang, M. Are the BERT Family Zero-Shot Learners? A Study on Their Potential and Limitations. Artif. Intell. 2023, 322, 103953.
Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is All You Need. arXiv 2017, arXiv:1706.03762.
Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692.
Sanh, V.; Debut, L.; Chaumond, J.; Wolf, T. DistilBERT, a Distilled Version of BERT: Smaller, Faster, Cheaper and Lighter. arXiv 2019, arXiv:1910.01108.
Lan, Z.; Chen, M.; Goodman, S.; Gimpel, K.; Sharma, P.; Soricut, R. ALBERT: A Lite BERT for Self-Supervised Learning of Language Representations. arXiv 2019, arXiv:1909.11942.
Selva Birunda, S.; Kanniga Devi, R. A Review on Word Embedding Techniques for Text Classification. In Innovative Data Communication Technologies and Application: Proceedings of ICIDCA 2020; Springer Nature: Berlin, Germany, 2021; pp. 267–281.
Puccetti, G.; Miaschi, A.; Dell’Orletta, F. How Do BERT Embeddings Organize Linguistic Knowledge? In Proceedings of the Deep Learning Inside Out (DeeLIO): The 2nd Workshop on Knowledge Extraction and Integration for Deep Learning Architectures, Dublin, Ireland, 10 June 2021; pp. 48–57.
Wiedemann, G.; Remus, S.; Chawla, A.; Biemann, C. Does BERT Make Any Sense? Interpretable Word Sense Disambiguation with Contextualized Embeddings. arXiv 2019, arXiv:1909.10430.
Reimers, N.; Gurevych, I. Sentence-BERT: Sentence Embeddings Using Siamese BERT-Networks. arXiv 2019, arXiv:1908.10084.
Lee, J.S.; Hsiang, J. Patent Classification by Fine-Tuning BERT Language Model. World Pat. Inf. 2020, 61, 101965.
Hutto, C.; Gilbert, E. VADER: A Parsimonious Rule-Based Model for Sentiment Analysis of Social Media Text. In Proceedings of the International AAAI Conference on Web and Social Media, Ann Arbor, MI, USA, 1–4 June 2014; Volume 8, pp. 216–225.
Bonta, V.; Kumaresh, N.; Janardhan, N. A Comprehensive Study on Lexicon Based Approaches for Sentiment Analysis. Asian J. Comput. Sci. Technol. 2019, 8, 1–6.
Toubes, D.R.; Araújo Vila, N.; Fraiz Brea, J.A. Changes in Consumption Patterns and Tourist Promotion after the COVID-19 Pandemic. J. Theor. Appl. Electron. Commer. Res. 2021, 16, 1332–1352.
Kalia, P.; Mladenović, D.; Acevedo-Duque, Á. Decoding the Trends and the Emerging Research Directions of Digital Tourism in the Last Three Decades: A Bibliometric Analysis. Sage Open 2022, 12, 21582440221128179.
Rachão, S.; Breda, Z.; Fernandes, C.; Joukes, V. Food Tourism and Regional Development: A Systematic Literature Review. Eur. J. Tour. Res. 2019, 21, 33–49.
Naruetharadhol, P.; Gebsombut, N. A Bibliometric Analysis of Food Tourism Studies in Southeast Asia. Cogent Bus. Manag. 2020, 7, 1733829.
Kalnaovakul, K.; Promsivapallop, P. Hotel Service Quality Dimensions and Attributes: An Analysis of Online Hotel Customer Reviews. Tour. Hosp. Res. 2023, 23, 420–440.
Ali, B.J.; Gardi, B.; Othman, B.J.; Ahmed, S.A.; Ismael, N.B.; Hamza, P.A.; Anwar, G. Hotel Service Quality: The Impact of Service Quality on Customer Satisfaction in Hospitality. Int. J. Eng. Bus. Manag. 2021, 5, 14–28.
Chen, J.; Park, H.; Fan, P.; Tian, L.; Ouyang, Z.; Lafortezza, R. Cultural Landmarks and Urban Landscapes in Three Contrasting Societies. Sustainability 2021, 13, 4295.
Cicerali, E.E.; Kaya Cicerali, L.; Saldamlı, A. Linking Psycho-Environmental Comfort Factors to Tourist Satisfaction Levels: Application of a Psychology Theory to Tourism Research. J. Hosp. Mark. Manag. 2017, 26, 717–734.
PJ, S.; Singh, K.; Kokkranikal, J.; Bharadwaj, R.; Rai, S.; Antony, J. Service Quality and Customer Satisfaction in Hospitality, Leisure, Sport and Tourism: An Assessment of Research in Web of Science. J. Qual. Assur. Hosp. Tour. 2023, 24, 24–50.

Figure 1. Step-by-step research framework (originally compiled by authors).

Figure 2. General BERT architecture (adapted from Vaswani et al. [30]).

Figure 3. Word clouds of identified keywords: (a) all initially identified keywords; (b) keywords retained after filtering for relevance and significance.

Figure 4. t-SNE plot of the keywords’ semantic similarities based on BERT embeddings.

Figure 5. The sentiment analysis results for each aspect based on frequency.

Figure 6. The sentiment analysis results for each aspect based on proportion.

Figure 7. The distribution of the sentiment polarity scores of all proposed aspects.

Table 1. Available generic pretrained models for BERT.

Model Name	Architecture	Training Data	Performance
BERT-Base [13]	12 layers, 768 hidden units, 12 attention heads	BooksCorpus and English Wikipedia (16 GB)	Good baseline performance on NLP tasks
BERT-Large [13]	24 layers, 1024 hidden units, 16 attention heads	BooksCorpus and English Wikipedia (16 GB)	Good baseline performance on NLP tasks
RoBERTa [30]	Similar to BERT (Base and Large variants)	160 GB of diverse text data	Superior performance on NLP benchmarks
DistilBERT [31]	6 layers, 768 hidden units, 12 attention heads	Same data as BERT	Slightly lower accuracy, much faster, and more efficient
ALBERT [32]	Parameter sharing across layers, factorized embeddings	Same data as BERT	Comparable to BERT-Large with reduced memory usage
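
As a small worked check of the architecture column in Table 1, the sketch below computes the per-head attention dimension, which in the standard multi-head attention design is the hidden size divided by the number of heads. It is illustrative only: the dictionary `MODELS` and the helper `head_dim` are hypothetical names introduced here, and only the rows of Table 1 that state explicit layer, hidden-unit, and head counts are included.

```python
# Architecture figures copied from Table 1 (rows with explicit numbers only).
# MODELS and head_dim are hypothetical names used for this illustration.
MODELS = {
    "BERT-Base": {"layers": 12, "hidden": 768, "heads": 12},
    "BERT-Large": {"layers": 24, "hidden": 1024, "heads": 16},
    "DistilBERT": {"layers": 6, "hidden": 768, "heads": 12},
}


def head_dim(cfg: dict) -> int:
    """Return the per-head dimension: hidden size split evenly across heads."""
    hidden, heads = cfg["hidden"], cfg["heads"]
    if hidden % heads != 0:
        raise ValueError("hidden size must be divisible by the number of heads")
    return hidden // heads


if __name__ == "__main__":
    for name, cfg in MODELS.items():
        print(f"{name}: {cfg['layers']} layers, {head_dim(cfg)} dimensions per head")
```

All three listed configurations give 64 dimensions per head, so the models in Table 1 differ in depth and width rather than in the size of an individual attention head.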

Table 2. List of proposed aspects and the corresponding identified keywords.

Aspect Name	Keywords	Frequency
Food in general	foods, food, menu, meal, dishes, breakfast	5914
Food items (specified)	beef, meat, pork, seafood, noodles, pasta, pizza, steak, sushi, vegetarian, soup, chicken, fish, lamb, mushroom	3087
Specified Indonesian cuisine	bakso, gudeg, soto	303
Desserts and beverage	cake, dessert, snacks, cafe, milk, chocolate, coffee	1561
Dining and cooking	bar, kitchen, cook	289
Restaurants service	restaurant, restaurants, chef	5522
Shopping	supermarket, mall, shopping, shop, shops	780
Landmarks	borobudur, sunrise, sunset	370
Wellness	massage, spa	257
Transportation	airport, car, travel agent	152
Travel and tour	holiday, trip, vacation	342
Hospitality	hospitality, guest, guests, receptionist, waitress	616
Accommodation	hostel, hotel, hotels, favehotel, cleanliness, pool	4246
Cultural items	batik, wayang	110
Toiletries	bath, bathroom, toilet, towels	1064
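
The keyword-to-aspect grouping in Table 2 can be read as a lookup table. The following is a minimal sketch under stated assumptions, not the authors' implementation: it copies only a subset of the aspects and keywords from Table 2, matches single-word keywords after a simple regular-expression tokenization (so a multi-word keyword such as "travel agent" would need extra handling), and the names `ASPECT_KEYWORDS`, `KEYWORD_TO_ASPECT`, and `aspects_in_segment` are hypothetical.

```python
import re

# Subset of Table 2: aspect name -> keywords (copied from the table).
ASPECT_KEYWORDS = {
    "Food in general": ["foods", "food", "menu", "meal", "dishes", "breakfast"],
    "Specified Indonesian cuisine": ["bakso", "gudeg", "soto"],
    "Wellness": ["massage", "spa"],
    "Accommodation": ["hostel", "hotel", "hotels", "favehotel", "cleanliness", "pool"],
    "Toiletries": ["bath", "bathroom", "toilet", "towels"],
}

# Inverted index: keyword -> aspect name.
KEYWORD_TO_ASPECT = {
    keyword: aspect
    for aspect, keywords in ASPECT_KEYWORDS.items()
    for keyword in keywords
}


def aspects_in_segment(segment: str) -> list:
    """Return the aspects whose keywords occur in the segment,
    in order of first appearance and without duplicates."""
    found = []
    for token in re.findall(r"[a-z]+", segment.lower()):
        aspect = KEYWORD_TO_ASPECT.get(token)
        if aspect is not None and aspect not in found:
            found.append(aspect)
    return found


if __name__ == "__main__":
    review = "The hotel breakfast was great but the bathroom was small."
    print(aspects_in_segment(review))
```

For the sample sentence, the function returns `['Accommodation', 'Food in general', 'Toiletries']`. Each matched segment could then be passed to a sentiment scorer, so that polarity is attributed per aspect rather than per review.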

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
