1. Introduction
In this paper, we present a system for collecting and visualizing tourist reviews that accommodates on-spot reviews for less-known tourist spots. On-spot reviews are online opinions assumed to have been posted at the target facility. We treat geotagged tweets as potential on-spot reviews, estimate their adequacy as reviews, and then use the verified tweets as on-spot reviews in the Park Supplementary Review System (PSRS) that we designed.
Recently, there has been a rapidly increasing demand for the application of information technologies in the field of tourism (defined with a blanket term of Tourism Informatics). Diverse Big Data have been applied to tourism research and have made considerable improvements, for example, in the development of recommendation systems (Masui et al. [
1]), navigation systems (Yoshida et al. [
2]), and regional content tourism support systems (Masui et al. [
3]). The main goal is to promote tourism of a specific place and to provide personalized information as per specific search. Apart from the developed systems, the task of analyzing tourism information is of great importance. It enables the collection of large amounts of data to supplement the developed systems. By data sources, tourism-related Big Data generally fall into a few broad categories, which include the following.
User Generated Contents (UGC), defined as data generated by users which includes online textual and photo data, etc.;
Device Data (generated by devices), which includes GPS data, roaming data from mobile devices, Bluetooth data, etc.;
Transaction Data (generated by operations), with the likes of Web search data, Web page visiting data, or online booking data.
These carry different information and different data types which may address different tourism issues as explained by Ling et al. [
4].
The Internet today has vastly altered the data landscape, by accumulating a lot of information. People, businesses, and devices have all become data factories that are pumping out large amounts of information to the Web each day, Askitasklaus et al. [
5]. This huge amount of data shared on the Internet can be utilized to foster tourism activities in a given specific area. Internet users can easily express their opinions about a product, service or a place they have recently visited using popular Social Networking Services (SNS), such as Twitter, Facebook, or Instagram and reach millions of other potential visitors. In this way, people tend to transmit their daily events in the form of diaries and textual messages using online social services such as blogs, online posts, microblogs, and other SNS. Among many SNS, the one that has been greatly popular for people to express their opinions, share their thoughts, and report real-time events has been Twitter (
https://twitter.com/, accessed on 15 January 2022). Many companies and organizations are interested in using Twitter data to study people’s opinions on products, services, facilities, and events around the world. Because posting is simple, a great number of messages (known as “tweets”) are posted on Twitter daily. Moreover, with GPS technology built into mobile phones and computers, sightseers also share views and pictures of their tour experiences on Twitter. This information is valuable for facilitating tourism in the specific area tagged with GPS information. Online opinions can thus have a great impact on the reputation of a brand, product, or place, and some potential visitors base their decisions on them. There are a number of online review sites for tourism-related activities, such as TripAdvisor (
https://tripadvisor.com/, accessed on 15 January 2022), Booking.com, or Expedia (
https://www.expedia.com, accessed on 15 January 2022).
Unfortunately, less-known and rarely visited sightspots often do not accumulate a sufficient number of valuable opinions. To address this, we introduce the concept of on-spot reviews: geotagged tweets about the target spot whose contents have been verified to contain visitor opinions. To verify the adequacy of the extracted information, we propose a classification method that uses a fine-tuned BERT model. Previously, Shimada et al. [
6] introduced a two-stage method, combining a rule-based and a contextual approach, to identify the on-site likelihood of tweets. Unlike them, we verify adequacy using a fine-tuned BERT model.
Approved geotagged tweets are mapped as on-spot reviews in the designed system (PSRS). The aim is to cultivate new Points Of Interest (POI) and to supplement information on the less-known places in the target spot, Serengeti and Ngorongoro National Parks (NP), which are famous and the largest NP in northern Tanzania. Serengeti’s annual great wildebeest migration, which takes place around the end of the year, is an iconic feature of the park. Both parks are on the list of UNESCO World Heritage Sites, with the Serengeti NP property changing seamlessly into the Ngorongoro Conservation Unit (see
Figure 1 for details). The plains of Serengeti NP comprise 1.5 million hectares of savanna. In the annual migration, two million wildebeests and thousands of other ungulates, in search of pasture and water, engage in a 1000 km long circular trek spanning the two adjacent countries of Kenya and Tanzania. It is known as one of nature’s most impressive spectacles (
https://whc.unesco.org/en/list/156/, accessed on 15 January 2022). Together, the two spots cover more than twenty thousand square kilometers, with many sightspots scattered around the area. Because the area is so wide, some spots are less known among sightseers than others and are therefore rarely visited, accumulating few reviews.
Additionally, the wildebeest migration is a famous but seasonal spectacle across the target spot. Its precise timing depends entirely on each year’s rainfall patterns; hence, POI also differ periodically. Although the migration and animal spots can be predicted, in this study we make an extra effort to cultivate new POI pointed out by tourists in their tweets. This is an important task, as it can improve tourism activities at those target spots. Moreover, if the method is verified as effective, it can also be applied to other attractive yet rarely visited sightseeing spots in any country.
Therefore, in this study, we propose a method of obtaining tourist on-spot reviews from the Internet to complement the least reviewed sightspots by extracting information directly from geotagged tweets. Tweets are considered geotagged if they have geolocation information assigned to them. We treat tweets that include the name of the target spot as potential tourist on-spot reviews. The results published in this paper represent an effort to complement review information for less-known and rarely visited sightspots. By presenting a method that supports less-known yet valuable tourist attractions through on-spot reviews cultivated from automatically collected and analyzed geotagged tweets, this article contributes to Tourism Informatics in general. The main scientific problem we address is how to identify the authenticity and utility of the extracted tweets as equivalents of online reviews.
Various types of approaches were developed and improved to tackle the task of extracting valuable information from the Internet by proposing POI recommendations that provide a location suitable as per user’s preferences. Some of the most successful approaches so far include rule-based or statistical approaches, while novel Deep Learning-based approaches are yet to be commonly used (Minaee et al. [
7]).
This study attempts to address the task of obtaining online reviews (UGC data source category) by extracting Twitter microblog posts (tweets), in form of textual data with the aim to extract useful information and further create a classifier to determine whether the tweets are likely to carry similar information. This task is widely recognized as text classification which is one of the fundamental tasks in Natural Language Processing (NLP).
Due to its nature, text classification has important implications for NLP tasks, which aim to either analyze, understand, or produce human language. Text classification has a large potential for various applications in the domain of text mining, especially those that require semantic analysis, such as author profiling and sentiment analysis (Sboev et al. [
8]).
Categorizing tweets is challenging because tweets carry little contextual information and are noisy. Recently, Zahera et al. [
9] suggested a multi-model approach for disaster management that identifies actionable information in disaster-related tweets using BERT, a graph attention network, and a relation network. Their work focuses on multi-class classification to allow rapid detection of various categories of tweets, and their approach is reported to outperform state-of-the-art approaches. Masaki et al. [
10] proposed a real-time analysis method for detecting tourist spots from geotagged tweets using the tweets’ location information and time-series changes. Their method showed improvements over their previous moving-average method. In contrast to the above related works, we use BERT both for binary text classification (on-spot tweet or not) and for multi-class text classification, in which we identify the sentiment polarity of tweets on three-point and five-point rating scales.
The main contribution of this study is four-fold. Firstly, we propose a mining framework that cultivates on-spot reviews and related POI from geotagged tweets using location clustering and a BERT neural network model. Secondly, we add the most probable rating score to the extracted on-spot reviews by learning the sentiment orientation of the tweets with a BERT neural network model. Thirdly, we develop a corpus of on-spot annotated tweets that can be applied to other NLP tasks. Finally, we design a web system (PSRS) that uses the selected and rated tweets as touristic information.
From a global perspective, this study intends to support the local tourism sector in Tanzania, specifically wildlife-based tourism, which is one of the most promising and fastest-growing sectors in Tanzania, with the selected target spot attracting the most sightseers [
11,
12,
13].
The rest of this paper is organized as follows.
Section 2 introduces related works and previous research.
Section 3 briefly outlines the proposed method used in this research.
Section 4 describes the applied data. Experiment setup and analysis of the results are discussed in
Section 5. Additionally,
Section 6 discusses results and various experimental findings.
Section 7 introduces the designed system (PSRS). Finally, in
Section 8 we present conclusions and future works.
3. Proposed Method
In this section, we describe the proposed method that:
- (i) classifies on-spot tweets from Twitter data by incorporating clustering and BERT, and
- (ii) adds rating information to on-spot judged tweets.
We first introduce the procedures involved in realizing the proposed method and then discuss the inner processes of each stage.
The proposed method incorporates location clustering and classification techniques. It consists of a series of stages, outlined in
Figure 2.
In stage A, tweets are collected from the Internet through an accredited Twitter API (
https://developer.twitter.com/en/products/twitter-api, accessed on 15 January 2022) by specifying the keywords “ngorongoro” and “serengeti”, which may appear anywhere in the tweet. In stage B, we cluster the collected tweets by location. The K-means algorithm, a vector quantization algorithm introduced by Hartigan et al. [
19], is applied to the tweets’ location information to automatically partition them into K clusters by assigning each tweet to the cluster with the nearest centroid (mean). Tweets located within the estimated boundaries of the target spot are retained. Since the target spot boundaries are not explicitly specified, we determine them with the help of Google Maps (
https://maps.google.com/, accessed on 15 January 2022), which gives the East, West, North, and South boundary points of the target spot as follows:
East = 2°24′13.5″ S 35°16′03.4″ E
West = 2°11′27.2″ S 34°07′58.8″ E
North = 1°26′33.6″ S 34°48′45.0″ E
South = 3°11′02.6″ S 34°38′08.2″ E
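The boundary filter of stage B can be illustrated with a minimal sketch. It assumes that the four coordinates above are given in degrees, minutes, and seconds, and that the target spot is approximated by the rectangle spanned by these four extreme points; both are assumptions made for illustration. The tweet records and the function names are hypothetical.

```python
def dms_to_decimal(degrees, minutes, seconds, hemisphere):
    """Convert degrees/minutes/seconds to signed decimal degrees."""
    value = degrees + minutes / 60.0 + seconds / 3600.0
    # Southern latitudes and western longitudes are negative.
    return -value if hemisphere in ("S", "W") else value


# Boundary points from the text, read as degrees, minutes, seconds (assumption).
EAST = (dms_to_decimal(2, 24, 13.5, "S"), dms_to_decimal(35, 16, 3.4, "E"))
WEST = (dms_to_decimal(2, 11, 27.2, "S"), dms_to_decimal(34, 7, 58.8, "E"))
NORTH = (dms_to_decimal(1, 26, 33.6, "S"), dms_to_decimal(34, 48, 45.0, "E"))
SOUTH = (dms_to_decimal(3, 11, 2.6, "S"), dms_to_decimal(34, 38, 8.2, "E"))

# Assumption: approximate the target spot by the rectangle spanned by the
# four extreme points (latitude from South to North, longitude from West to East).
LAT_MIN, LAT_MAX = SOUTH[0], NORTH[0]
LON_MIN, LON_MAX = WEST[1], EAST[1]


def in_target_spot(lat, lon):
    """Return True if a tweet location falls inside the estimated rectangle."""
    return LAT_MIN <= lat <= LAT_MAX and LON_MIN <= lon <= LON_MAX


# Hypothetical tweet records with decimal-degree coordinates.
tweets = [
    {"id": 1, "lat": -2.33, "lon": 34.83},  # inside the rectangle
    {"id": 2, "lat": -6.79, "lon": 39.21},  # far outside (coastal Tanzania)
]
retained = [t for t in tweets if in_target_spot(t["lat"], t["lon"])]
print([t["id"] for t in retained])
```

A rectangle is the simplest possible approximation; a polygon of the actual park boundary would be more precise if such data were available.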
In stage C, we manually annotate the location-clustered tweets as either on-spot or not, and we also assign a sentiment score to each tweet. Three annotators carry out this task; the annotation is discussed in detail in Section 3.2.
In stages D and E, we train our classifier to predict the on-spot label of tweets and the sentiment score assigned to them, and we evaluate the model performance. We adopt a pre-trained BERT neural network model for this task. In stage F, we map the selected and rated tweets as touristic information in the designed system (PSRS).
3.1. Location Clustering of Tweets
Clustering is the task of grouping a set of objects in such a way that objects in the same group (called a cluster) are more similar (in some sense) to each other than to those in other groups (clusters).
With K-means clustering, the number of clusters must be decided beforehand. Based on the distribution of the collected tweets, we identify the optimal number of clusters using the Elbow method, the Average Silhouette method, and the Gap statistic method.
Figure 3 shows the optimal number of clusters obtained with the Elbow method. A 2D representation of the obtained clusters, with the distribution of the extracted tweets, is shown in
Figure 4.
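The Elbow method can be sketched as follows: K-means is run for several values of K, and the within-cluster sum of squared distances (inertia) is compared across them. The sketch below is a plain-Python illustration on synthetic coordinates, not the implementation used in this study. The farthest-point initialisation is chosen here only to keep the example deterministic, and the use of Euclidean distance on raw latitude and longitude is an assumption that is reasonable only for small areas near the equator.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two (lat, lon) pairs."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def init_centroids(points, k):
    """Deterministic farthest-point initialisation (an assumption of this sketch)."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids


def assign(points, centroids):
    """Assign every point to the index of its nearest centroid."""
    return [
        min(range(len(centroids)), key=lambda c: sq_dist(p, centroids[c]))
        for p in points
    ]


def kmeans(points, k, iterations=100):
    """Lloyd's algorithm; returns (centroids, labels, inertia)."""
    centroids = init_centroids(points, k)
    for _ in range(iterations):
        labels = assign(points, centroids)
        updated = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                updated.append((sum(m[0] for m in members) / len(members),
                                sum(m[1] for m in members) / len(members)))
            else:
                updated.append(centroids[c])  # keep the centroid of an empty cluster
        if updated == centroids:
            break
        centroids = updated
    labels = assign(points, centroids)
    inertia = sum(sq_dist(p, centroids[lab]) for p, lab in zip(points, labels))
    return centroids, labels, inertia


# Synthetic tweet locations: three tight groups of four points each.
offsets = [(-0.02, 0.01), (0.0, 0.0), (0.02, -0.01), (0.01, 0.02)]
centres = [(-2.33, 34.83), (-3.17, 35.58), (-6.79, 39.21)]
points = [(lat + dx, lon + dy) for lat, lon in centres for dx, dy in offsets]

# Inertia for K = 1..5; the "elbow" is where the decrease levels off.
inertias = {k: kmeans(points, k)[2] for k in range(1, 6)}
for k, value in inertias.items():
    print(k, round(value, 4))
```

On this synthetic data the inertia drops sharply up to K = 3 and changes little afterwards, which is the pattern the Elbow method looks for.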
We analyze the results of location clustering and consider tweets within the target spot boundaries as potential on-spot reviews. Tweets beyond the target spot boundaries are filtered out.
Table 1 shows a few examples of on-spot judged tweets. In the next procedure, we identify on-spot tweets and assign sentiment scores to them by manual annotation, taking into consideration the sets of features discussed in the following sections.
3.2. Corpus Annotation
Annotation is a methodology for adding information to a document at some level, such as a word, a phrase, paragraph, section, or the entire document. Manual text annotation is an essential part of text analytics. Although annotators (workers performing the manual annotation) work with limited parts of data sets, their results are applied to further train automated text classification techniques and thus affect the final classification results. Automated text analytics methods rely on manually annotated data by building their heuristic, or statistical rules, or neural networks on such annotated data (Bobicev et al. [
20]). In the annotation process, we define the text to annotate, set labels to put in tweets, and we discard tweets with a certain degree of ambiguity so as to reduce noise when classifying.
To accomplish this task, we asked three annotators to carefully label the 1273 clustered tweets as on-spot or not. The annotators also assigned each tweet a sentiment score of positive, neutral, or negative.
Table 2 summarizes the annotated tweets (here referred to as the corpus of annotated tweets).
Table 3 shows a number of examples of the annotated tweets. “1” indicates an on-spot annotated tweet, while “0” indicates a not on-spot tweet. The remarks column gives the reason for the annotation. For example, tweet “k” in
Table 3 was not posted from the target spot, although the name of the target spot was tagged in it. Cases like this are why manual annotation of our data is important.
3.3. Inter-Rater Agreement
The reliability of annotations and adequacy of assigned labels are especially important in the case of sentiment annotations. In particular, Plaban et al. [
21] addressed the importance of evaluating the reliability between annotators for statistical accuracy. To measure the agreement between the three raters, we use Cohen’s kappa coefficient (Cohen [
22]).
The kappa coefficient between annotators can be computed using the following formula:
\[ \kappa = \frac{P_o - P_h}{1 - P_h} \]
In the above equation, P_o is the relative observed agreement among raters, and P_h is the hypothetical probability of chance agreement, calculated from the observed tweet data as the probability of each rater randomly assigning each category.
When kappa = 1, the annotators are in complete agreement. When kappa = 0, the agreement is no better than would be expected by chance. A negative score indicates that there is no effective agreement between annotators, that is, the agreement is worse than random.
In addition, the hypothetical probability of chance agreement can be computed using the following formula:
\[ P_h = \frac{1}{N^2} \sum_{k} n_{k1} n_{k2} \]
where
k represents the categories,
N is the number of observations to categorize, and n_{ki} is the number of times rater i assigned category k. In this study, the degree of agreement between the three annotators was calculated as 0.37. Kappa values have conventional interpretations; on the scale of Landis and Koch [
23], a value of 0.37 falls in the “fair” range (0.21 to 0.40). This value is not high enough to say that the annotators agree on the annotation results. From this observation, we can assume that the final results of our proposed model were also affected by the low level of agreement between annotators. One way to improve this is to carefully remove ambiguous tweets, which we will consider in future work.
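As an illustration of the computation described in this section, the following minimal sketch computes Cohen’s kappa for one pair of raters. Since Cohen’s kappa is defined for two raters, the sketch averages the pairwise values to obtain a single figure for three raters; this averaging is an assumption for illustration, as is the toy label data. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two equally long label sequences."""
    n = len(rater_a)
    # Observed agreement: share of items on which both raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over categories of the product of label counts.
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    p_h = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    # Note: undefined (division by zero) when p_h equals 1.
    return (p_o - p_h) / (1 - p_h)


def mean_pairwise_kappa(raters):
    """Average Cohen's kappa over all rater pairs (assumed aggregation)."""
    pairs = list(combinations(raters, 2))
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


# Toy on-spot labels (1 = on-spot, 0 = not on-spot) from three annotators.
rater_1 = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]
rater_2 = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]
rater_3 = [1, 1, 0, 0, 0, 1, 1, 0, 0, 1]

print(round(cohen_kappa(rater_1, rater_2), 3))
print(round(mean_pairwise_kappa([rater_1, rater_2, rater_3]), 3))
```

For the first pair, the raters agree on 8 of 10 tweets (P_o = 0.8) and the chance agreement is P_h = (6·6 + 4·4)/100 = 0.52, so kappa = 0.28/0.48, or about 0.583.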
3.4. Feature Selection
Many tourism-related tweets on Twitter do not contain on-spot information. One of the solutions to extract on-spot tweets is by classifying them as such by using a machine learning-based classifier. In collecting tourists’ tweets, it is necessary to determine the conditions of considering which tweets are tourists’ tweets. Therefore, we introduce a set of tweets classification features to be used for the automatic classification as follows:
Tweet location: We observed that tweets posted within the target spot’s boundaries (latitude and longitude) introduced in the previous section, which were acquired using Google’s Geocoding API (
https://developers.google.com/maps/documentation/geocoding/overview, accessed on 15 January 2022), often had a high chance of being a valuable on-spot review.
Presence of “NOW”: The word “now” is a characteristic keyword on Twitter. Although its presence does not always indicate on-spot information, it suggests a high probability that the tweet contains on-spot information. We therefore retain tweets with this word.
Presence of a mention “@ Target spot”: In many cases, tourists’ tweets about the places they are visiting are accompanied by images that the users attach using mobile camera functions. In such tweets, the place visited frequently follows “@”, as in “@ Serengeti national park”.
Bag of Words (BOW): All words from the whole corpus, which contains 1273 sentences, with their term frequencies for the BOW language model.
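The textual features listed above can be sketched as a small extraction function. This is an illustrative sketch only: the tokenisation rule, the list of target spot names, and the function name are assumptions, and the tweet location feature is left out because it depends on coordinates rather than text.

```python
import re
from collections import Counter

# Assumed target spot names, taken from the collection keywords.
TARGET_SPOTS = ("serengeti", "ngorongoro")


def extract_features(text):
    """Extract the "now", "@ target spot", and bag-of-words features."""
    lowered = text.lower()
    # Simple tokenisation into lowercase alphanumeric tokens (assumption).
    tokens = re.findall(r"[a-z0-9']+", lowered)
    # Match "now" as a whole token, so that "know" does not count.
    has_now = "now" in tokens
    # "@" optionally followed by spaces and then a target spot name.
    has_at_target = any(
        re.search(r"@\s*" + re.escape(spot), lowered) for spot in TARGET_SPOTS
    )
    return {
        "has_now": has_now,
        "has_at_target": has_at_target,
        "bow": Counter(tokens),  # term frequencies
    }


on_spot = extract_features("Watching lions now @ Serengeti National Park")
off_spot = extract_features("I know Serengeti from a documentary")
print(on_spot["has_now"], on_spot["has_at_target"])
print(off_spot["has_now"], off_spot["has_at_target"])
```

Matching “now” at the token level rather than as a substring avoids false positives from words such as “know” or “snow”.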
3.5. BERT for Classification
We adopted a BERT model for the training and evaluation of our classifier. The BERT architecture is defined as follows: “BERT stands for Bidirectional Encoder Representations from Transformers. It is designed to pre-train deep bidirectional representations from an unlabeled text by jointly conditioning on both the left and right context. As a result, the pre-trained BERT model can be fine-tuned with just one additional output layer to create state-of-the-art models for a wide range of NLP tasks” [
24]. The Transformer architecture is the main building block of BERT. The Transformer is a deep learning model used primarily in the field of NLP. BERT is deeply bidirectional, which means that it learns from both the left and right context of a token during training. Its input representation for each token is constructed by summing the token, segment, and position embeddings [
25]. One of the biggest challenges in NLP is the shortage of training data. However, by adopting a fine-tuned BERT model, which takes into account the context of each token in the sentence, it is in theory possible to obtain good results with only a limited amount of training data. This is the main reason for adopting this approach. The advantage stems from the pre-training mechanism, which established transfer learning in NLP. Transfer learning in NLP consists of two major processes, namely pre-training and fine-tuning.
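The construction of the input representation can be shown with a toy sketch. The embedding tables below are filled with random numbers and the vocabulary is hypothetical; in an actual BERT model these tables are learned during pre-training, the hidden size is much larger, and the sum is additionally passed through layer normalisation and dropout.

```python
import random

HIDDEN = 4  # toy hidden size, chosen only for readability
rng = random.Random(0)


def table(rows):
    """Random stand-in for a learned embedding table with `rows` entries."""
    return [[rng.uniform(-1, 1) for _ in range(HIDDEN)] for _ in range(rows)]


# Hypothetical toy vocabulary with the special [CLS] and [SEP] tokens.
vocab = {"[CLS]": 0, "lions": 1, "at": 2, "serengeti": 3, "[SEP]": 4}
token_emb = table(len(vocab))  # one vector per vocabulary entry
segment_emb = table(2)         # sentence A (0) or sentence B (1)
position_emb = table(8)        # one vector per position, up to 8 tokens


def input_representation(tokens, segment_ids):
    """Sum token, segment, and position embeddings for each input token."""
    rows = []
    for position, (token, segment) in enumerate(zip(tokens, segment_ids)):
        t = token_emb[vocab[token]]
        s = segment_emb[segment]
        p = position_emb[position]
        rows.append([t[d] + s[d] + p[d] for d in range(HIDDEN)])
    return rows


tokens = ["[CLS]", "lions", "at", "serengeti", "[SEP]"]
rows = input_representation(tokens, [0] * len(tokens))
print(len(rows), len(rows[0]))
```

Because the position embedding differs at each index, the same word receives a different input vector depending on where it appears in the tweet.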
6. Results and Discussion
We adopted BERT for on-spot tweet classification and sentiment polarity prediction. The results show that BERT outperforms the baseline classifiers in the binary classification task. In the sentiment polarity classification,
Table 10 shows that prediction performance was better for the 3-star score range than for the 5-star score range: the three-score setup outperformed the five-score scale with an F-score of 0.74. A likely reason is that the smaller number of classes results in more samples per class, which eventually allows for better generalization. Recently, Kayastha et al. [
31] demonstrated a procedure for tackling class imbalance by adding per-class weights to the standard cross-entropy loss function, which shows better results than oversampling or undersampling. We will therefore consider it for future improvements.
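The idea of adding per-class weights to the cross-entropy loss can be sketched as follows. The inverse-frequency weighting used here is one common choice and is an assumption of this sketch; it is not necessarily the weighting scheme of the cited work. The label distribution is a toy example.

```python
import math


def inverse_frequency_weights(labels, num_classes):
    """Weight each class by total / (num_classes * class_count)."""
    counts = [labels.count(c) for c in range(num_classes)]
    total = len(labels)
    # Classes that never occur get weight 0 to avoid division by zero.
    return [total / (num_classes * n) if n else 0.0 for n in counts]


def weighted_cross_entropy(probabilities, label, weights):
    """Cross-entropy of one prediction, scaled by the weight of its true class."""
    return -weights[label] * math.log(probabilities[label])


# Toy imbalanced sentiment labels: 0 = negative, 1 = neutral, 2 = positive.
train_labels = [2] * 6 + [1] * 3 + [0] * 1
weights = inverse_frequency_weights(train_labels, 3)

# The same predicted probability (0.5) costs more for the rare class.
rare_loss = weighted_cross_entropy([0.5, 0.3, 0.2], 0, weights)
common_loss = weighted_cross_entropy([0.2, 0.3, 0.5], 2, weights)
print([round(w, 3) for w in weights])
print(round(rare_loss, 3), round(common_loss, 3))
```

With these weights, a mistake on a rare negative tweet contributes more to the loss than an equally uncertain prediction on a frequent positive tweet, which pushes the classifier to pay attention to minority classes.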
On the other hand, as observed in
Table 11, the classifier’s predicted score sometimes differed from the annotated score. We identified these tweets as difficult to judge.
Figure 8 also shows that our model evaluated a negative-sentiment tweet (tweet number 23) as positive. One way to improve model performance is to remove highly ambiguous tweets from the training set, which we will consider in future work.
The results demonstrate that our proposed method, although not ideal, is sufficiently usable for score generation.
Moreover, wildlife-related sentiments differed significantly. For example,
Serengeti is basically just animals killing one another
I have been to Africa and the Serengeti. I have seen hundreds of giraffes. Killing one as a sport
Serengeti: pride of lions hunting and killing zebras
#serengeti That lion killing the cub, has put me in such mood. i’m absolutely livid
Words such as “animal killing” and “killing” can be perceived as dangerous or scary, which could lead to lower generated ratings. This contextual ambiguity poses a challenge for the automatic prediction of wildlife sentiment ratings. To deal with it, it is necessary to remove noisy data and thereby improve the degree of agreement between annotators.
7. PSRS
Tourism is one of the most important industries for many areas of the world. The activation of tourism leads to the activation of industries and communities. In this situation, easy access to information provided on the World Wide Web plays an important role. In this study, apart from developing two methods, one for detecting opinionated tweets and a second for assigning sentiment scores, we also designed a system (PSRS) on which we mapped all on-spot judged tweets (on-spot reviews) as tourists’ opinions.
Figure 9 shows the interface of the designed system in a mobile view. The system displays verified tweets (on-spot reviews) as tourists’ opinions collected from the target spot, using our method to compensate for the lack of reviews of rarely visited sightspots.
Figure 10 shows an overview of the designed system. In this system, a user can search for and select registered sightspots and new POI (see
Figure 11 for examples of sightspots registered in the system). The database of the designed system holds about 165 sightspots and POI. Extracted sightspots include wildlife spots, accommodation spots, souvenir spots, and restaurant spots in the target spot. Upon selection, the system displays not only the verified tweets (on-spot reviews) corresponding to the selected sightspot but also the rating information added to the sightspot. Optimal routes between two points in the target spot can also be displayed (see
Figure 12). We used the Google Maps API for this function. Since one sightspot has a number of rating scores, each evaluated from one of multiple on-spot reviews, we also compute the average rating score of a given sightspot using the formula below.
\[ \overline{scr} = \frac{1}{n} \sum_{i=1}^{n} scr_i \]
\overline{scr} = Sightspot score
n = number of score items
scr_i = the value of each individual score given
We then use the computed score to represent the sightspot rating information (see
Figure 13).
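The averaging step can be sketched in a few lines. The sightspot names, the scores, and the handling of sightspots without reviews are hypothetical and serve only as an illustration of the arithmetic mean described above.

```python
def sightspot_score(scores):
    """Arithmetic mean of the rating scores of one sightspot."""
    if not scores:
        return None  # no on-spot reviews yet, so no rating to display
    return sum(scores) / len(scores)


# Hypothetical rating scores predicted for the on-spot reviews of each sightspot.
reviews = {
    "ngorongoro crater": [3, 2, 3, 1],
    "mara river": [2, 3],
    "new spot": [],
}
ratings = {spot: sightspot_score(scores) for spot, scores in reviews.items()}
print(ratings)
```

Returning no rating for a sightspot without reviews, rather than a score of zero, avoids presenting an unreviewed spot as a poorly rated one.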
The designed system is intended to support local tourism in Tanzania, specifically in the area of wildlife-based tourism.
POI Discovered
Location and time data associated with extracted tweets can be considered as useful geographically annotated materials on the Web. They generate detailed tourist trails of which regions have been visited more or are more attractive (Lee et al. [
32]).
Figure 14 visualizes the distribution of the collected geotagged tweets, some in the target spot (possible on-spot tweets) and others outside it (not on-spot). From this figure, we can observe tweets posted in different areas (here referred to as sightspots) with their precise location and time of day. This helps us analyze posting behavior and extract useful information such as POI, events, trends, or activities. For example, we can observe late-night tweets at some points in the target spot, which possibly represent on-spot reviews of the accommodation facilities used by the tourists. Another example is the accumulation of tweets from the same location, which indicates the most visited spots (for example,
Figure 15 shows 92 occurrences for the Ngorongoro crater spot). This information is also useful for recommending sightspots or sightspot routes.
Table 12 shows a few examples of tweets with newly found POI. From our analysis, we discovered 68 different POI.
Figure 15 is a bar graph of the discovered sightspots and their number of occurrences in tweets. For example, “ngorongoro crater” has the largest number of on-spot tweets (92 tweets), which suggests that this sightspot was the most visited. We were also interested in cultivating information such as the season and time of day of visits.
Figure 16 shows the progression of visits over the year and by time of day. The interval between 15:00 and 18:00 accumulates most of the tweets, which possibly represents the preferred time for visiting the crater. Another point of attraction discovered is the Mara River. The annual wildebeest migration across the plains of Serengeti crosses this river. Using the geolocation information tagged in the tweets, we can identify different observation points along the river that offer the best scenery. For example,
Figure 17 shows a point of intersection between the river channel and the sightseeing pattern, which possibly represents the optimal viewpoint for the wildebeest migration.
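The occurrence counts and the time-of-day analysis described above can be sketched as follows. The tweet records, their field names, and the three-hour window are hypothetical, and the timestamps are assumed to be already expressed in local time.

```python
from collections import Counter
from datetime import datetime

# Hypothetical on-spot tweets with an assigned sightspot and a local timestamp.
tweets = [
    {"spot": "ngorongoro crater", "created_at": "2021-07-14T16:05:00"},
    {"spot": "ngorongoro crater", "created_at": "2021-07-15T15:40:00"},
    {"spot": "ngorongoro crater", "created_at": "2021-08-02T17:20:00"},
    {"spot": "ngorongoro crater", "created_at": "2021-08-03T09:10:00"},
    {"spot": "mara river", "created_at": "2021-08-10T11:00:00"},
    {"spot": "lodge", "created_at": "2021-08-10T23:30:00"},
]

# Number of on-spot tweets per sightspot (compare the bar graph of occurrences).
spot_counts = Counter(t["spot"] for t in tweets)

# Hour-of-day histogram for a single sightspot.
crater_hours = Counter(
    datetime.fromisoformat(t["created_at"]).hour
    for t in tweets
    if t["spot"] == "ngorongoro crater"
)


def busiest_window(hour_counts, width=3):
    """Start hour of the `width`-hour window that contains the most tweets."""
    return max(
        range(24 - width + 1),
        key=lambda start: sum(hour_counts[h] for h in range(start, start + width)),
    )


print(spot_counts.most_common(1))
print(busiest_window(crater_hours))
```

The same grouping can be applied by month instead of hour to study seasonal patterns such as the migration period.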
8. Conclusions
In this study, we proposed a new method for extracting new on-spot tourist opinions from the Internet for a tourism information analysis system. The method collects Twitter data and builds a classifier that distinguishes on-spot tweets within the collected tweets and automatically adds rating information to each opinion, using a BERT neural language model-based classifier that learns from geotagged tweet information.
The proposed method incorporates location clustering and classification using multiple algorithms, including a state-of-the-art neural architecture. In the experiment, we used location-clustered tweets to build classifiers that learn from on-spot annotated tweets, and we compared various classifiers. Among the baseline classifiers, the best performance was achieved by SVM, with an F-score of 0.85.
Finally, we compared the SVM classifier with a state-of-the-art deep learning technique (BERT) on the same tweet dataset. Experiments showed that BERT outperformed SVM, achieving an F-score of 0.94. Despite its demand for high computing power, BERT showed excellent results with only limited training data. This suggests that a BERT model can be adopted for on-spot tweet identification and sentiment polarity prediction, in particular when training data are limited.
Classified on-spot tweets with their added rating information were mapped as on-spot reviews into the designed system (PSRS) as sightseer’s supplementary opinions. From the classified on-spot reviews, we also took efforts to discover POI from the tweets and present them as interesting sightseeing points.
Since we built a classifier that automatically detects on-spot tweets and adds rating information to them by relying solely on geotagged tweets, it would be interesting to apply this classifier to non-geotagged tweets as well. We will consider this in our future studies. Furthermore, we hope that our corpus (
Table 2) of on-spot annotated tweets can be used in the future for the deployment of prediction systems. To increase its usefulness, we also plan to increase the data volume of this corpus. We hope that this study will inform other researchers and be useful for future studies exploring the application of NLP, Big Data, and Artificial Intelligence to the revitalization of regional tourism in areas other than Tanzania as well.