Article

Effects of Machine Learning and Multi-Agent Simulation on Mining and Visualizing Tourism Tweets as Not Summarized but Instantiated Knowledge

by Shun Hattori 1,*, Yuto Fujidai 2, Wataru Sunayama 1 and Madoka Takahara 3
1 Faculty of Advanced Engineering, The University of Shiga Prefecture, 2500 Hassaka-cho, Hikone 522-8533, Japan
2 Graduate School of Engineering, The University of Shiga Prefecture, Hikone 522-8533, Japan
3 Faculty of Advanced Science and Technology, Ryukoku University, Otsu 520-2194, Japan
* Author to whom correspondence should be addressed.
Electronics 2024, 13(16), 3276; https://doi.org/10.3390/electronics13163276
Submission received: 15 July 2024 / Revised: 8 August 2024 / Accepted: 14 August 2024 / Published: 19 August 2024
(This article belongs to the Special Issue New Advances in Multi-agent Systems: Control and Modelling)

Abstract
Various technologies with AI (Artificial Intelligence), DS (Data Science), and/or IoT (Internet of Things) have started to become pervasive in e-tourism (i.e., smart tourism). However, for a given target (e.g., what to do at a tourism spot such as Hikone Castle), most of them utilize its “typical/major signals” (e.g., taking a photo) as summarized knowledge based on “The Principle of Majority”, and tend to filter out not only noises but also valuable “peculiar/minor signals” (e.g., viewing Sawayama Castle) as instantiated knowledge. Therefore, as a challenge to salvage not only “typical signals” but also “peculiar signals” without noises for e-tourism, this paper compares various methods of ML (Machine Learning) for text-classifying a tweet as a “tourism tweet” or not, in order to precisely mine tourism tweets as not summarized but instantiated knowledge. In addition, this paper proposes a MAS (Multi-Agent Simulation), powered with artisoc, for visualizing “tourism tweets”, including not only “typical signals” but also “peculiar signals”, whose number can be enormous, as not summarized but instantiated knowledge, i.e., instances of them without any summarization, and validates the effects of the proposed MAS by conducting experiments with subjects.

1. Introduction

In recent years, AI (Artificial Intelligence) technologies, especially Artificial Neural Networks [1], have started to become pervasive/ubiquitous in various situations in the real world (towards Society 5.0 [2], which was proposed by the Cabinet Office, Government of Japan): dialogue systems based on LLMs (Large Language Models) such as OpenAI’s ChatGPT and Google’s Gemini, Text-to-Image generation [3] such as Stable Diffusion and Midjourney, DX (Digital Transformation) in companies, more advanced ITSs (Intelligent Transport Systems) and automated driving, and a diverse array of AIs in education (EduTech [4]), finance (FinTech), medical care and nursing care (MedTech and SleepTech [5,6]), clothing, food, and housing [7,8], various forms of entertainment such as sports [9] and video games [10,11], and so on.
A great number of technologies with AI, DS (Data Science), and/or IoT (Internet of Things) are also pervasive in e-tourism (i.e., smart tourism) [12,13,14,15]. The research domains most related to this paper are data analyses of user-generated content on tourism from the Web, especially SNS (Social Networking Service) and tourism review (i.e., word of mouth) websites such as Tripadvisor.com (Tripadvisor offers more than 1 billion user-generated ratings and reviews on over 8 million experiences, accommodations, restaurants, airlines, and cruises), and various systems that use the analyzed results, as shown in Figure 1:
(a) Analysis of where tourism spots/regions are, e.g., geomapping tourism spots using place names in contents (texts) of tweets, geotags in metadata of tweets [16] and geotags in EXIF (Exchangeable Image File Format) of sightseeing photos [17];
(b) Analysis of what to do in tourism spots/regions, e.g., mining and geomapping local experiences from blog entries [18,19,20] and real-world context-aware querying (ReCQ) based on geospatial activities in geographical spots/regions [21,22];
(c) Analysis of how tourism spots look (from the other spots), e.g., extracting visual appearance descriptions of real-world objects, especially geographical features, from the Web [23,24] and using search engines’ metadata [25];
(d) Analysis of relationships between tourism spots, e.g., networks of paths (edges/links) between tourism spots (nodes), typical routes and their context from local blogs [26] and travel paths from travelogues [27] in SNS;
(e) Various systems to support/recommend users while planning before traveling, while traveling, while creating their word of mouth after traveling, and so forth [28,29,30,31].
However, for a given target (e.g., routes for visitors to tourism spots, or images as visual appearance information of real-world objects, especially geographical features), most of these technologies utilize its “typical signals” (e.g., typical routes [32], or typical images [33]) as summarized knowledge (i.e., collective intelligence) based on “The Principle of Majority Rule [34]”, and tend to filter out not only noises but also valuable “peculiar signals” (e.g., peculiar routes traced by a specific author such as a celebrity or a historical figure [27], or peculiar images [35,36,37]) as instantiated knowledge (i.e., individual intelligence).
As shown in Figure 2, mining the Web, e.g., Twitter.com (currently, X.com), for “typical/major signals” as summarized knowledge without noises is less problematic. The number $N_t$ of sampled tweets that are summarized as a “typical/major signal” tends to be greater than the number $N_n$ of sampled tweets that are summarized as a noise, so the mining only has to be based on “The Principle of Majority Rule [34]”, even if “tourism tweets” have not been mined precisely. In contrast, mining the Web for not only “typical/major signals” but also “peculiar/minor signals” as not summarized but instantiated knowledge without noises remains problematic. Because the tweets are not summarized, the number $N$ of sampled tweets that is instantiated as a “typical/major signal”, a “peculiar/minor signal”, or a noise is one in every case, so it is not enough to rely on “The Principle of Majority Rule [34]”, and “tourism tweets” must be mined as precisely as possible. Therefore, as a challenge to salvage not only “typical/major signals” but also “peculiar/minor signals” without noises for e-tourism (i.e., smart tourism), this paper compares various methods of ML (Machine Learning) for text-classifying a tweet as a “tourism tweet” or not, in order to precisely mine tourism tweets as not summarized but instantiated knowledge.
Meanwhile, visualizing (e.g., geomapping) “typical/major signals” as summarized knowledge is less problematic because they are summarized and their number is limited, while visualizing (e.g., geomapping) not only “typical/major signals” but also “peculiar/minor signals” as instantiated knowledge remains problematic because they are not summarized and their number, especially that of the latter, is not limited and can be enormous. Therefore, this paper proposes a MAS (Multi-Agent Simulation), powered with artisoc (artisoc [38] is a MAS platform whose intuitive controllability allows ideas to be rapidly incorporated into models), for visualizing “tourism tweets”, including not only “typical/major signals” but also “peculiar/minor signals” as not summarized but instantiated knowledge, i.e., instances of them without any summarization, and validates the effects of the proposed MAS by conducting experiments with subjects.
The proposed system with MLs and MAS for mining and visualizing “tourism tweets” as not summarized but instantiated knowledge has the potential to enable users to discover new/unknown tourism spots/regions and new/unknown attractions of well-known tourism spots/regions, and to encourage users to visit new ones. According to the surveys on domestic tourism needs in Japan by Jalan Research Center [39], a considerable number of users prefer new/unknown tourism spots/regions to well-known ones.
Note that this paper defines a “tourism tweet” as a tweet in which a tourist seems to participate in any tourism activity in a tourism spot/region. For instance, the following texts as contents of tweets are classified as “tourism tweets” for the tourism spot Hikone Castle:
  • “I spent about an hour visiting the grounds and the museum of Hikone Castle.”
    (where = “Japan” > “Shiga” > “Hikone” > “Hikone Castle”, what to do = “spend about an hour visiting the grounds and the museum of Hikone Castle”);
  • “I had a photo taken of me and Hikonyan with Hikone Castle in the background.” (Hikonyan is the city’s mascot character, created in homage to the Hikone domain, and one of the most widely recognized mascot characters in Japan [40].)
    (where = “Japan” > … > “Hikone Castle”, what to do = “take a photo of me and Hikonyan with Hikone Castle in the background”, who/when/how = ?);
  • “There was a nice souvenir shop near Hikone Castle. I had bought many souvenirs.”
    (where = “Japan” > … > “Hikone Castle”, what to do = “buy many souvenirs (at a nice souvenir shop near Hikone Castle)”, who/when/how = ?);
  • “This is the castle stamp of Sawayama Castle. I only viewed Sawayama Castle from Hikone Castle in this trip. I wanna climb Sawayama Castle in the next trip.”
    (where = “Japan” > … > “Hikone Castle”, what to do = “view Sawayama Castle from Hikone Castle”, who/when/how = ?).
Meanwhile, the following texts as contents of tweets are classified not as “tourism tweets” for the tourism spot Hikone Castle but as noises:
  • “In a ground of Hikone Castle, I did radio gymnastic exercises in my daily routine.”
    (because she/he seems to be not a tourist but a citizen near Hikone Castle);
  • “Hikone Castle is one of the 12 original castles in Japan.”
    (because it is among the common knowledge about Hikone Castle);
  • “Where are you all? I am just at Hikone Castle.”
    (because it is uncertain whether or not she/he was on a trip to Hikone Castle).
The remainder of this paper is organized as follows. Section 2 proposes a MAS (Multi-Agent Simulation) powered with artisoc for visualizing tourism tweets as not summarized but instantiated knowledge, i.e., instances of them without any summarization. Section 3 shows the results of Experiment I, which validates the effects of the proposed MAS on visualizing tourism tweets as not summarized but instantiated knowledge. Section 4 shows the results of Experiment II, which investigates the bad effects of noises for e-tourism, i.e., tweets that include a tourism spot in “Hikone”, Japan, but do not include any tourism activity in the tourism spot, on visualizing tourism tweets as not summarized but instantiated knowledge (more specifically, on simple geomapping rather than on visualization by the MAS proposed in Section 2). Section 5 shows the results of Experiment III, which compares various methods of ML (Machine Learning) with respect to their effects on precisely mining Twitter.com (currently, X.com) by text-classifying tweets as tourism tweets or not. Finally, Section 6 concludes this paper.

2. Multi-Agent Simulation for Visualizing Tourism Tweets

This section proposes a MAS (Multi-Agent Simulation) powered with artisoc [38] for visualizing “tourism tweets”, including not only “typical/major signals” but also “peculiar/minor signals” as not summarized but instantiated knowledge, i.e., instances of them without any summarization.
Artisoc [38] is a MAS (Multi-Agent Simulation) platform whose intuitive controllability allows ideas to be rapidly incorporated into models. Our proposed MAS powered with artisoc has the following two kinds of agents as shown in Figure 3:
  • Spot Agent is an agent in the Universe of artisoc that represents a tourism spot, especially in “Hikone”, Japan.
    • Properties (i.e., data; fields in Java programming): Each Spot Agent has the name of the tourism spot, e.g., “Hikone Castle” or “Hikone Station”, and its corresponding GPS geotag, i.e., longitude and latitude.
    • Behaviors (i.e., functions; methods in Java programming): Each Spot Agent appears at the start of a simulation at the position given by its GPS geotag. In addition, each Spot Agent offers its GPS geotag to the User Agents whose posted tourism tweets contain the name of its tourism spot in their text.
  • User Agent is an agent in the universe of artisoc that represents a user who makes a trip to a tourism spot and posts a tourism tweet about the trip.
    • Properties (i.e., data; fields in Java programming): Each User Agent has the text of the posted tourism tweet, the tourism spot (e.g., “Hikone Castle”) extracted from the text, the activities (i.e., verbs like “take (a photo)” or “view (Sawayama Castle)”) extracted from the text, and the posted time stamp (e.g., “11:06 8 April 2024” or “12:47 8 April 2024”) as metadata of the posted tourism tweet, as shown in Step 2 of Figure 3.
    • Behaviors (i.e., functions; methods in Java programming): Each User Agent appears with her/his activities, if any, at the posted time stamp in a simulation. Subsequently, each User Agent departs from a point at a randomized distance, because the point from which she/he traveled to the destination might be unknown, and travels to the GPS geotag of the designated Spot Agent (i.e., tourism spot). Finally, each User Agent disappears at the destination a constant time after arriving. Note that the number of User Agents is equal to the number of tourism tweets because instances of them are utilized without any summarization.
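The two kinds of agents described above could be modelled as plain data classes. The following is a minimal Python sketch and not artisoc code: the class names, the field names, the helper `assign_spot`, and the approximate coordinates are all illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class SpotAgent:
    """A tourism spot: its name and its GPS geotag."""
    name: str
    latitude: float
    longitude: float


@dataclass
class UserAgent:
    """One tourism tweet, kept as an instance (never summarized)."""
    text: str
    posted_at: datetime
    activities: List[str] = field(default_factory=list)
    spot: Optional[SpotAgent] = None  # set once a Spot Agent is matched


def assign_spot(user: UserAgent, spots: List[SpotAgent]) -> Optional[SpotAgent]:
    """Give the user the first spot whose name occurs in the tweet text."""
    for spot in spots:
        if spot.name in user.text:
            user.spot = spot
            return spot
    return None


# Illustrative data; the coordinates are approximate assumptions.
castle = SpotAgent("Hikone Castle", 35.2764, 136.2518)
station = SpotAgent("Hikone Station", 35.2716, 136.2637)
user = UserAgent(
    text="I only viewed Sawayama Castle from Hikone Castle in this trip.",
    posted_at=datetime(2024, 4, 8, 12, 47),
    activities=["view (Sawayama Castle)"],
)
assign_spot(user, [castle, station])
```

Keeping one `UserAgent` per tweet mirrors the point that instances are used without summarization; plain substring matching on the spot name is only a stand-in for whatever extraction the actual system performs.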
Our proposed MAS powered with artisoc runs a simulation with the Spot Agents and User Agents in the universe and renders its result as an animation for visualizing “tourism tweets” in a target tourism region such as “Hikone” in Japan, including not only “typical/major signals” (e.g., take a photo) but also “peculiar/minor signals” (e.g., view Sawayama Castle) as not summarized but instantiated knowledge, i.e., instances of them without any summarization, as shown in Figure 1b. Our proposed MAS has the following three steps:
  • Step 1. At the start of a simulation, each Spot Agent reads the name of the tourism spot and its corresponding GPS geotag, i.e., longitude and latitude. Each Spot Agent converts its GPS geotag to a 2D position in the simulation, and all Spot Agents appear with the names of their tourism spots at their 2D positions. In addition, each Spot Agent offers its GPS geotag to the User Agents whose posted tourism tweets contain the name of its tourism spot in their text, although no User Agent has yet appeared in the simulation.
  • Step 2. Each User Agent appears with her/his activities at the posted time stamp in the simulation, then departs from a point at a randomized distance and travels to the GPS geotag of the designated Spot Agent (i.e., tourism spot). Finally, each User Agent disappears at the destination a constant time after arriving.
  • Step 3. The simulation is finished when all User Agents (i.e., instances of tourism tweets) have been visualized and disappear from the simulation.
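The three steps above can be sketched as a small tick-based loop. This is a hedged Python illustration rather than the artisoc model itself: the linear GPS-to-2D mapping, the bounding box, the speed, the stay duration, and all function names are assumptions made for the example.

```python
import math
import random
from dataclasses import dataclass
from typing import Optional, Tuple

Point = Tuple[float, float]


@dataclass
class User:
    appear_tick: int                    # tick derived from the tweet's time stamp
    dest: Point                         # 2D position of the designated Spot Agent
    pos: Optional[Point] = None         # None until the user has appeared
    arrived_tick: Optional[int] = None
    done: bool = False


def to_2d(lat, lon, bounds, size=100.0) -> Point:
    """Step 1: linearly map a GPS geotag into a size x size square."""
    lat_min, lat_max, lon_min, lon_max = bounds
    x = (lon - lon_min) / (lon_max - lon_min) * size
    y = (lat - lat_min) / (lat_max - lat_min) * size
    return (x, y)


def step(user, tick, rng, speed=2.0, stay=3, max_dist=30.0):
    """Step 2: advance one user by one tick."""
    if user.done or tick < user.appear_tick:
        return
    if user.pos is None:  # appear at a randomized distance and direction
        dist = rng.uniform(5.0, max_dist)
        angle = rng.uniform(0.0, 2.0 * math.pi)
        user.pos = (user.dest[0] + dist * math.cos(angle),
                    user.dest[1] + dist * math.sin(angle))
        return
    if user.arrived_tick is None:  # travel towards the destination
        dx, dy = user.dest[0] - user.pos[0], user.dest[1] - user.pos[1]
        remaining = math.hypot(dx, dy)
        if remaining <= speed:
            user.pos, user.arrived_tick = user.dest, tick
        else:
            user.pos = (user.pos[0] + dx / remaining * speed,
                        user.pos[1] + dy / remaining * speed)
    elif tick - user.arrived_tick >= stay:  # disappear a constant time later
        user.done = True


def run(users, seed=0, max_ticks=10_000) -> int:
    """Step 3: run until every user has disappeared; return the final tick."""
    rng = random.Random(seed)
    for tick in range(max_ticks):
        for user in users:
            step(user, tick, rng)
        if all(u.done for u in users):
            return tick
    raise RuntimeError("simulation did not finish")


bounds = (35.25, 35.30, 136.23, 136.28)  # hypothetical bounding box of the map
castle_xy = to_2d(35.2764, 136.2518, bounds)
users = [User(appear_tick=0, dest=castle_xy), User(appear_tick=5, dest=castle_xy)]
final_tick = run(users)
```

Because each user is simulated individually, the cost grows with the number of tourism tweets, which is consistent with the observation that the number of instances can be enormous.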
For example, a 27 s animation video is publicly available online [41] and was used in Experiment I to validate our proposed MAS powered with artisoc for visualizing tourism tweets as not summarized but instantiated knowledge, i.e., instances of them without any summarization, as shown in Figure 4.

3. Experiment I: The Effects of Multi-Agent Simulation on Visualizing Tourism Tweets

This section, Experiment I, shows the experimental results to validate the effects of the proposed MAS (Multi-Agent Simulation) powered with artisoc [38] for visualizing tourism tweets as not summarized but instantiated knowledge.

3.1. Subjects

In January 2023, the first questionnaire investigation was conducted with 81 subjects: 76 males and 5 females, all Japanese students who study at The University of Shiga Prefecture, Department of Electronic Systems Engineering, School of Engineering.
The subject samples for Experiment I have a potential limitation because of their gender imbalance. Future work should make the subject samples for Experiment I as gender-balanced and as age-balanced as possible.

3.2. Dataset

Dataset I has Japanese texts of 94 ground-truth tourism tweets (i.e., only signals and no noises), pseudo-automatically collected by firstly inputting “Hikone” as a query to Twitter.com (currently, X.com), secondly filtering the Japanese texts with the names of tourism spots in “Hikone”, their GPS geotag, and/or TFIDF-based feature words [42] of the tourism region “Hikone”, and finally having the second author manually check whether or not each of them is a tourism tweet.
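The automatic part of this collection procedure could look roughly like the following Python sketch. It only illustrates the idea of keeping a tweet as a candidate for manual checking when at least one filter matches; the function name, the filter labels, and the example feature word are hypothetical, and the exact way the filters were combined ("and/or") is not specified in the text.

```python
from typing import Iterable, List


def candidate_filters(text: str,
                      spot_names: Iterable[str],
                      feature_words: Iterable[str],
                      geotag_in_region: bool = False) -> List[str]:
    """Return the labels of the filters that the tweet passes."""
    matched = []
    if any(name in text for name in spot_names):
        matched.append("spot_name")
    if geotag_in_region:
        matched.append("geotag")
    if any(word in text for word in feature_words):
        matched.append("feature_word")
    return matched


spot_names = ["Hikone Castle", "Hikone Station"]
feature_words = ["Hikonyan"]  # hypothetical TFIDF-based feature word
tweets = [
    "I spent about an hour visiting the grounds and the museum of Hikone Castle.",
    "Good morning, everyone.",
]
# Tweets that pass at least one filter are handed to the manual check.
candidates = [t for t in tweets if candidate_filters(t, spot_names, feature_words)]
```

The final decision on whether a candidate is a tourism tweet remains manual, as in the procedure above.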

3.3. Experimental Method

Experiment I has the following 4 steps:
  • Step 1. As a baseline, each subject reads the Japanese texts of the 94 tourism tweets without any visualization by a MAS (Multi-Agent Simulation) such as artisoc;
  • Step 2. Each subject watches a 27 s animation video [41] that visualizes them by our MAS powered with artisoc proposed in Section 2, as shown in Figure 4;
  • Step 3. Each subject gives five-grade evaluations for four kinds of questions to compare our proposed MAS with the baseline;
  • Step 4. Each subject names the tourism spot(s) in Hikone that she/he wants to take a trip to, if any, and then freely comments on the baseline and the MAS.

3.4. Experimental Results

Table 1 shows the subjective evaluations of the Japanese texts of the 94 tourism tweets without visualization by a MAS (Multi-Agent Simulation) such as artisoc ($N = 81$), while Table 2 shows the subjective evaluations of their visualization by our proposed MAS powered with artisoc ($N = 81$). In addition, Table 2 shows the p-value of the ANOVA test [43] that compares our proposed MAS with the baseline. Note that the tourism spots in Hikone that the subjects wanted to take a trip to were “Hikone Castle”, “la-men NIKKOU”, and “Ohmi-beef SENNARITEI” for the baseline, and “Hikone Castle”, “la-men NIKKOU”, and “Hikone Castle Museum” for our proposed MAS.
Unfortunately, our proposed MAS is not superior to the baseline, i.e., presenting the Japanese texts of tourism tweets without visualization, with respect to any question. An analysis of the problems of our proposed MAS relative to the baseline, together with the free descriptions, suggests the following future directions for improvement:
  • It should be easier to watch, by resolving the overlap of texts such as the name of each Spot Agent (i.e., tourism spot) and each User Agent’s activities (i.e., verbs) for a designated Spot Agent;
  • It should be geomapped as shown in Figure 1;
  • It should also show the number of User Agents (i.e., visitors) to each Spot Agent (i.e., tourism spot) and the typical routes that more User Agents traveled as summarized knowledge, as well as each User Agent’s travels with its activities (i.e., verbs) for a designated Spot Agent as instantiated knowledge.

4. Experiment II: The Bad Effects of Noises on Visualizing Tourism Tweets

This section, Experiment II, shows the experimental results of investigating the bad effects of noises for e-tourism, i.e., tweets that include a tourism spot in “Hikone”, Japan, but do not include any tourism activity in the tourism spot (although they might include a daily routine, e.g., “In a ground of Hikone Castle, I did radio gymnastic exercises in my daily routine.”), on visualizing tourism tweets as not summarized but instantiated knowledge, i.e., instances of them without any summarization (more specifically, on simple geomapping rather than on visualization by our proposed MAS powered with artisoc [38] in Section 2).

4.1. Subjects

In November 2023, the second questionnaire investigation was conducted with 63 subjects: 59 males and 4 females, all Japanese students who study at The University of Shiga Prefecture, Department of Electronic Systems Engineering, School of Engineering. Table 3 shows their four-grade degree of knowledge about tourism in “Hikone”, Japan; the average over all subjects is 1.29, i.e., they knew little about it on average.
The subject samples for Experiment II have a potential limitation because of their gender imbalance. Future work should make the subject samples for Experiment II as gender-balanced and as age-balanced as possible.

4.2. Dataset

Dataset II has Japanese texts of tweets posted on 23 October 2022 and 24 October 2022, each of which necessarily includes at least one tourism spot in “Hikone”, Japan. It includes not only ground-truth tourism tweets (i.e., signals) but also noises, and was pseudo-automatically collected by firstly inputting “Hikone” as a query to Twitter.com (currently, X.com), secondly filtering the Japanese texts with the names of tourism spots in “Hikone”, their GPS geotag, and/or TFIDF-based feature words [42] of the tourism region “Hikone”, and finally having the second author manually check whether or not each of them is a tourism tweet. Note that the Japanese texts of both signals and noises include a tourism spot in “Hikone”, Japan, but only the texts of signals include a tourism activity in the tourism spot, whereas the texts of noises do not (although they might include a daily routine, e.g., “In a ground of Hikone Castle, I did radio gymnastic exercises in my daily routine.”). Also note that a popular annual event, “Gotouchi-Chara Fair in Hikone 2022” [44], was held on 23 October 2022 in Yonbancho Square of “Hikone”, Japan, while no specific event was held on 24 October 2022 in “Hikone”, Japan.
Visualizations (more specifically, geomappings) of tourism tweets (i.e., signals $s$) with noises $n$ at a ratio of $\frac{n}{s+n}$ = 0.00 to 0.50 were generated manually by the second author using Google My Maps [45]. Figure 5a,b show the geomapped tourism activities of tourism tweets on 23 October 2022 in “Hikone”, Japan, with not-tourism activities of noises at ratios of $\frac{n}{s+n}$ = 0.00 and 0.50, respectively, while Figure 6a,b show the geomapped tourism activities of tourism tweets on 24 October 2022 in “Hikone”, Japan, with not-tourism activities of noises at ratios of $\frac{n}{s+n}$ = 0.00 and 0.50, respectively.
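To make the noise ratio concrete: for a fixed number of signals $s$ and a target ratio $r = \frac{n}{s+n}$, the number of noises to mix in is $n = \frac{r \cdot s}{1 - r}$. The short Python sketch below computes this; the function name and the signal counts are hypothetical, since the number of signals per day is not given in the text.

```python
def noises_needed(num_signals: int, ratio: float) -> int:
    """Number of noises n such that n / (s + n) is (approximately) `ratio`."""
    if not 0.0 <= ratio < 1.0:
        raise ValueError("ratio must satisfy 0 <= ratio < 1")
    # Solve r = n / (s + n) for n, then round to a whole number of tweets.
    return round(ratio * num_signals / (1.0 - ratio))


# Hypothetical example with 20 signals: a ratio of 0.50 needs as many noises as signals.
for r in (0.0, 0.1, 0.5):
    n = noises_needed(20, r)
    print(f"ratio {r:.2f}: mix in {n} noise tweet(s)")
```

At a ratio of 0.50, half of the geomapped tweets are noises, which is the most heavily contaminated condition in Experiment II.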

4.3. Experimental Method

Experiment II has the following 2 steps:
  • Step 1. Each subject views a static image of a map on which tourism tweets (i.e., signals $s$) posted on 23 or 24 October 2022 are geomapped together with noises $n$ at a ratio of $\frac{n}{s+n}$ = 0.00 to 0.50. The image was generated manually by the second author using Google My Maps [45], and the subject cannot operate the map dynamically;
  • Step 2. Each subject gives 10-grade evaluations for two kinds of questions to investigate the bad effects of noises for e-tourism on visualizing (more specifically, geomapping) tourism tweets as not summarized but instantiated knowledge, i.e., instances of them without any summarization.

4.4. Experimental Results

Table 4 investigates the bad effects of noises for e-tourism on visualizing (more specifically, geomapping) tourism tweets as not summarized but instantiated knowledge on 23 October 2022, when a popular annual event “Gotouchi-Chara Fair in Hikone 2022” [44] was held in “Hikone”, Japan. In addition, Table 4 shows the p-value of the ANOVA test [43] that compares the ratio of noises $\frac{n}{s+n}$ = 0.10 (unlike Table 5, not 0.00 but 0.10 has been adopted because the result of 0.00 for Q5 seems to be an outlier) with 0.50, for all subjects ($N = 63$).
Meanwhile, Table 5 investigates the bad effects of noises for e-tourism on visualizing (more specifically, geomapping) tourism tweets as not summarized but instantiated knowledge on 24 October 2022, when no specific event was held in “Hikone”, Japan. In addition, Table 5 shows the p-value of the ANOVA test [43] that compares the ratio of noises $\frac{n}{s+n}$ = 0.00 (unlike Table 4, not 0.10 but 0.00 has been adopted because the result of 0.10 for Q5 seems to be an outlier) with 0.50, for all subjects ($N = 63$).
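For readers unfamiliar with the ANOVA test, the following Python sketch computes the F statistic of a one-way (between-groups) ANOVA for two sets of ratings. It is an illustration under assumptions: the text does not state which ANOVA variant was used, the ratings below are made-up numbers, and the p-value itself additionally requires the F distribution (for instance via `scipy.stats.f_oneway`, which is not used here to keep the sketch dependency-free).

```python
from statistics import mean


def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean(x for g in groups for x in g)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of the individual ratings around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Made-up 10-grade ratings for two noise ratios (not data from the paper).
ratings_low_noise = [7, 8, 6, 9, 7]
ratings_high_noise = [6, 7, 5, 8, 7]
f_stat, df_b, df_w = one_way_anova_f(ratings_low_noise, ratings_high_noise)
```

A large within-group variance inflates the denominator of F and therefore pushes the p-value up, which is why widely scattered responses tend to produce non-significant results.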
An analysis of these tables and Figure 7 yielded the following findings:
  • Unfortunately, the bad effects of noises for e-tourism make no statistically significant (n.s.) difference on geomapping tourism tweets as instantiated knowledge (one reason is that the responses of all subjects have a large variance, so the p-values of the ANOVA test [43] are large).
  • Some sort of difference (more specifically, a decline) seems to be made by the bad effects of noises on the naturality (Q5) and the understandability (Q6) for e-tourism, especially on 23 October 2022, when a popular annual event “Gotouchi-Chara Fair in Hikone 2022” [44] was held in “Hikone”, Japan.
  • No difference seems to be made by the effects of noises on the naturality (Q5) and the understandability (Q6) for e-tourism on 24 October 2022, when no specific event was held in “Hikone”, Japan (because, on average, the subjects knew little about tourism in “Hikone”, Japan, and were not able to distinguish tourism tweets, i.e., signals $s$, from noises $n$, especially on 24 October 2022, when the number of tourism tweets was much smaller than on 23 October 2022).
Therefore, one of the requirements for visualizing (more specifically, not only geomapping simply but visualizing by our proposed MAS powered with artisoc [38] in Section 2) tourism tweets as not summarized but instantiated knowledge, i.e., instances of them without any summarization, is to precisely mine the Web, e.g., Twitter.com (currently, X.com), for tourism tweets as not summarized but instantiated knowledge. Section 5 compares various methods of ML (Machine Learning) to precisely text-classify a tweet as a tourism tweet, where the precision 0.80 and recall 1.00 are set as a goal.
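The stated goal (precision 0.80 and recall 1.00) can be checked with a few lines of Python. This is a minimal sketch with hypothetical labels, where 1 stands for a tourism tweet and 0 for a noise; the function names are illustrative.

```python
def precision_recall(y_true, y_pred):
    """Precision and recall for the positive class (1 = tourism tweet)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def meets_goal(precision, recall, min_precision=0.80, min_recall=1.00):
    """True if both targets set in the text are reached."""
    return precision >= min_precision and recall >= min_recall


# Hypothetical labels: every tourism tweet is found, one noise slips through.
y_true = [1, 1, 1, 1, 0, 0]
y_pred = [1, 1, 1, 1, 1, 0]
precision, recall = precision_recall(y_true, y_pred)
goal_reached = meets_goal(precision, recall)
```

A recall of 1.00 means that no tourism tweet, including a peculiar/minor signal, is lost, while a precision of 0.80 tolerates up to one noise among every five tweets classified as tourism tweets.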
The responses of all subjects have a large variance and might depend on their prior knowledge about tourism in “Hikone”, Japan. On average, the subjects knew little about tourism in “Hikone” and were not able to distinguish tourism tweets, i.e., signals, from noises. Therefore, Table 6 considers not all subjects ($N = 63$) but only the subjects ($N = 23$) who knew a little (2) or a lot (3) about tourism in “Hikone”, Japan, excluding those who knew nothing (0) or little (1) about it. For these subjects, it investigates the bad effects of noises for e-tourism on visualizing (more specifically, geomapping) tourism tweets as not summarized but instantiated knowledge on 23 October 2022, when a popular annual event “Gotouchi-Chara Fair in Hikone 2022” [44] was held in “Hikone”, Japan, and shows the p-value of the ANOVA test [43] that compares the ratio of noises $\frac{n}{s+n}$ = 0.00 with 0.50.
Meanwhile, Table 7 likewise considers not all subjects ($N = 63$), whose average degree of knowledge about tourism in “Hikone” is 1.29, but only the same subjects ($N = 23$) who knew a little (2) or a lot (3) about it. For these subjects, it investigates the bad effects of noises for e-tourism on visualizing (more specifically, geomapping) tourism tweets as not summarized but instantiated knowledge on 24 October 2022, when no specific event was held in “Hikone”, Japan, and shows the p-value of the ANOVA test [43] that compares the ratio of noises $\frac{n}{s+n}$ = 0.00 with 0.50.
An analysis of these tables and Figure 8 yielded the following findings:
  • The bad effects of noises fortunately make a statistically significant (*) difference (more specifically, a decline) only on the naturality (Q5) of geomapping tourism tweets as instantiated knowledge on 23 October 2022, when a popular annual event “Gotouchi-Chara Fair in Hikone 2022” [44] was held in “Hikone”, while they unfortunately make no statistically significant (n.s.) difference on the others.
  • Nevertheless, some sort of difference (more specifically, a decline) seems to be made by the bad effects of noises not on the understandability (Q6) but only on the naturality (Q5) for e-tourism on 24 October 2022, when no specific event was held in “Hikone”, Japan.

5. Experiment III: The Effects of Machine Learning on Mining Tourism Tweets

This section, Experiment III, shows the experimental results of comparing various methods of ML (Machine Learning) with respect to their effects on precisely mining Twitter.com (currently, X.com) by text-classifying tweets as tourism tweets or not, as a challenge to salvage not only “typical/major signals” but also “peculiar/minor signals” without noises for e-tourism (i.e., smart tourism).

5.1. Dataset

Dataset III has Japanese texts of 94 ground-truth tourism tweets (i.e., signals), which are the same as those of Dataset I, and 100 not-tourism tweets (i.e., noises) in “Hikone”, Japan. It was pseudo-automatically collected by firstly inputting “Hikone” as a query to Twitter.com (currently, X.com), secondly filtering the Japanese texts with the names of tourism spots in “Hikone”, their GPS geotag, and/or TFIDF-based feature words [42] of the tourism region “Hikone”, and finally having the second author manually check whether or not each of them is a tourism tweet. Note that the Japanese texts of both signals and noises include a tourism spot in “Hikone”, Japan, but only the texts of signals include a tourism activity in the tourism spot, whereas the texts of noises do not (although they might include a daily routine, e.g., “In a ground of Hikone Castle, I did radio gymnastic exercises in my daily routine.”). Also note that only the Japanese texts as contents of tweets were utilized for MLs to precisely text-classify a tweet as a “tourism tweet” or not; the other data may be utilized in the near future, as shown in Figure 9.
Dataset III has a potential limitation because of its small size, unlike the dataset of over eight million tweets used for temporal and spatiotemporal analyses of tourists’ attraction visit sentiments in Chicago on Twitter [46]. Future work should make Dataset III as large as possible and make it cover tourism spots/regions that are as diverse as possible.

5.2. Experimental Methods

Experiment III compares various methods of ML (Machine Learning) with respect to their effects on precisely mining Twitter.com (currently, X.com) by text-classifying tweets as tourism tweets or not, utilizing a randomly sampled 80% of Dataset III for training an ML model and the remainder (i.e., 20%) for testing the trained ML model.
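The random 80%/20% split can be sketched in Python as follows. The seed, the function name, and the use of a simple shuffle (rather than, for example, a stratified split) are assumptions; the text only states that 80% of Dataset III is randomly sampled.

```python
import random


def split_80_20(items, train_fraction=0.8, seed=42):
    """Shuffle a copy of `items` and cut it into two parts."""
    rng = random.Random(seed)      # fixed seed so the split is reproducible
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


# Dataset III has 94 signals and 100 noises, i.e., 194 labelled tweets in total.
# Placeholder texts stand in for the actual tweets here.
dataset = [(f"tweet {i}", 1 if i < 94 else 0) for i in range(194)]
train_set, test_set = split_80_20(dataset)
print(len(train_set), len(test_set))  # 155 39
```

With only 39 test tweets, each misclassified tweet shifts the measured precision or recall by several percentage points, which is one practical consequence of the small dataset size.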

5.3. Machine Learning for Mining Tourist Tweets

With reference to several review articles [47,48,49] on traditional ML (Machine Learning) for text classification compared with BERT [50] and ChatGPT [51], this paper has selected and adopted the following five (eight in more detail) kinds of supervised MLs for precisely mining Twitter.com (currently, X.com) by text-classifying tweets as tourism tweets or not, as a challenge to salvage not only “typical/major signals” but also “peculiar/minor signals” without noises for e-tourism (i.e., smart tourism).

5.3.1. SVM

SVM (Support Vector Machine) [52] is a kind of supervised ML model, mainly for robust classification, regression, and outlier detection. In this paper, the input to the SVM was a TFIDF-based vector [42] or a Doc2Vec-based vector [53,54], transformed from the Japanese text of a tweet. Note that scikit-learn [55] was adopted for SVM and TFIDF, gensim [56] was adopted for Doc2Vec, and their hyperparameters were left at the default settings.
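A TFIDF-plus-SVM classifier with default hyperparameters can be assembled in scikit-learn roughly as below. This is a hedged sketch, not the authors' script: the toy English texts are adapted from the examples in Section 1, the labels are illustrative, and Japanese text would additionally need a word segmenter or a character-level analyzer, which is not described in this section.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# Toy training data: 1 = tourism tweet (signal), 0 = noise.
texts = [
    "I spent about an hour visiting the grounds and the museum of Hikone Castle.",
    "There was a nice souvenir shop near Hikone Castle. I bought many souvenirs.",
    "I viewed Sawayama Castle from Hikone Castle in this trip.",
    "Hikone Castle is one of the 12 original castles in Japan.",
    "Where are you all? I am just at Hikone Castle.",
    "I did radio gymnastic exercises at Hikone Castle in my daily routine.",
]
labels = [1, 1, 1, 0, 0, 0]

# Default settings for both steps, mirroring the statement in the text.
model = make_pipeline(TfidfVectorizer(), SVC())
model.fit(texts, labels)

predictions = model.predict(["I visited the museum of Hikone Castle on my trip."])
```

Wrapping the vectorizer and the classifier in one pipeline ensures that the TFIDF vocabulary is learned from the training texts only, so the test texts do not leak into the features.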

5.3.2. fastText

fastText [57] is a free, open-source, lightweight library for learning text representations and text classifiers based on the Word2Vec models [58], developed by Facebook, Inc. (currently, Meta Platforms, Inc., Menlo Park, CA, USA). fastText can transform a character n-gram, i.e., a subword, to its embedding, while the Word2Vec models (CBOW or skip-gram) can transform only whole words to their embeddings. As a result, fastText can embed not only the words registered in a word dictionary but also unknown (unregistered) words. Note that the hyperparameters for fastText were left at the default settings.
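The subword idea can be illustrated by enumerating the character n-grams of a word after wrapping it in the boundary markers `<` and `>`. This is a minimal sketch, not fastText's implementation: the function name `char_ngrams` and the n-gram ranges are assumptions chosen for illustration, and the ranges used in the experiment are not stated beyond the default settings.

```python
def char_ngrams(word, min_n=3, max_n=6):
    """List the character n-grams of a word wrapped in boundary markers."""
    marked = f"<{word}>"
    grams = []
    for n in range(min_n, max_n + 1):
        for start in range(len(marked) - n + 1):
            grams.append(marked[start:start + n])
    return grams


print(char_ngrams("where", 3, 3))  # ['<wh', 'whe', 'her', 'ere', 're>']
print(char_ngrams("彦根城", 2, 2))   # ['<彦', '彦根', '根城', '城>']
```

Because an unknown word still shares some of its n-grams with known words, an embedding can be composed for it from the embeddings of those shared subwords.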

5.3.3. Naive Bayes

Naive Bayes algorithms are a family of supervised ML models based on applying Bayes’ theorem with strong (naive) feature-independence assumptions. From among the various NB algorithms, this paper adopted Gaussian Naive Bayes (GaussianNB) [59]. The alternatives include BernoulliNB for multivariate Bernoulli models, CategoricalNB for categorical features, MultinomialNB for multinomial models, and Complement Naive Bayes (ComplementNB) [60], which is designed to correct the severe assumptions made by the standard Multinomial Naive Bayes (MultinomialNB) classifier and is particularly suited to imbalanced datasets. The input to GaussianNB was a TFIDF-based vector [42] transformed from the Japanese text of a tweet. Note that scikit-learn [55] was adopted for GaussianNB, and its hyperparameters were left at the default settings.
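The decision rule of Gaussian Naive Bayes can be sketched in a few lines: each class keeps a prior plus a per-feature mean and variance, and a vector is assigned to the class with the largest log-posterior under the independence assumption. The code below is a simplified standard-library illustration, not scikit-learn's GaussianNB: the function names, the fixed variance floor `eps`, and the two-dimensional toy vectors are assumptions.

```python
import math


def fit_gaussian_nb(X, y, eps=1e-9):
    """Estimate the log-prior, feature means, and feature variances of each class."""
    model = {}
    for label in set(y):
        rows = [x for x, target in zip(X, y) if target == label]
        columns = list(zip(*rows))
        means = [sum(col) / len(rows) for col in columns]
        # eps (assumption) keeps every variance strictly positive.
        variances = [
            sum((v - m) ** 2 for v in col) / len(rows) + eps
            for col, m in zip(columns, means)
        ]
        model[label] = (math.log(len(rows) / len(X)), means, variances)
    return model


def predict_gaussian_nb(model, x):
    """Return the class with the highest log-posterior for feature vector x."""
    def log_posterior(label):
        log_prior, means, variances = model[label]
        return log_prior + sum(
            -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
            for v, m, var in zip(x, means, variances)
        )
    return max(model, key=log_posterior)


# Toy 2-D feature vectors (assumption) standing in for TFIDF-based vectors.
X = [[0.9, 0.1], [0.8, 0.2], [0.1, 0.9], [0.2, 0.7]]
y = ["tourism", "tourism", "noise", "noise"]
model = fit_gaussian_nb(X, y)
```

Calling `predict_gaussian_nb(model, [0.85, 0.15])` assigns the vector to the "tourism" class, because it lies at that class's feature means.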

5.3.4. BERT and RoBERTa

BERT (Bidirectional Encoder Representations from Transformers) [61] is a kind of self-supervised Transformer-based ML model for NLP (Natural Language Processing), developed by Google LLC (Mountain View, CA, USA). This paper adopted tohoku-nlp/bert-base-japanese [62] as a pre-trained model for the Japanese texts of tweets, conducted transfer learning on the pre-trained model with the training portion of Dataset III, and tested the resulting transfer-learned model with the testing portion of Dataset III. Note that BertJapaneseTokenizer [63] was adopted as the tokenizer for the Japanese texts of tweets.
RoBERTa (Robustly optimized BERT approach) [64] is a robustly optimized method for pre-training NLP systems that improves on BERT, developed by Facebook, Inc. (currently, Meta Platforms, Inc.). This paper adopted rinna/japanese-roberta-base [65] as a pre-trained model for the Japanese texts of tweets, conducted transfer learning on the pre-trained model with the training portion of Dataset III, and tested the resulting transfer-learned model with the testing portion of Dataset III. Note that T5Tokenizer [66] was adopted as the tokenizer for the Japanese texts of tweets.

5.3.5. ChatGPT

OpenAI’s ChatGPT [67] is one of the most popular Q&A (Question Answering) systems based on LLMs (Large Language Models), e.g., GPT (Generative Pre-trained Transformer). In this paper, as a result of prompt engineering, ChatGPT was given the following two types of prompts, which ask it to text-classify a tweet as being a “tourism tweet” or not. Note that gpt-3.5-turbo, not the latest gpt-4o (omni), was adopted for ChatGPT.
  • Simple Prompt:
    • Question 1 (by a user): From now, you are asked by me to text-classify an inputted tweet into a tourism tweet or not. A “tourism tweet” is defined as follows:
      • A tourism tweet is defined as a tweet in which a tourist travels to tourism spot(s) and does any tourism activity in the tourism spot.
    • Please output the result as follows: (a text of an inputted tweet).
    • The result of classification: a tourism tweet in Hikone, or a tourism tweet but not in Hikone, or not a tourism tweet.
    • The reason why you have classified the above-mentioned result: (the reason).
    • An example of an inputted tweet and its result and reason:
    • “Today, I went to Hikone Castle. Hikonyan came to me.”
    • The result of classification: a tourism tweet in Hikone.
    • The reason why it is a tourism tweet in Hikone: The inputted tweet specifies that it is a trip to Hikone Castle and also that it is an activity of the day. In addition, the inputted tweet obviously shows a tourism activity in Hikone Castle because Hikonyan (who is the mascot character of Hikone Castle) appeared in it.
    • I will ask you to text-classify inputted tweets as soon as they are ready.
    • Answer 1 (by ChatGPT): I have understood your instruction. Please input tweets for text-classification, as soon as they are ready. I can help you for any tweets.
  • Extended Prompt:
    • Question 2 (by a user): From now, you are asked by me to text-classify an inputted tweet into a tourism tweet or not. A “tourism tweet” is defined as follows:
      • A tourism tweet is defined as a tweet in which a tourist travels to tourism spot(s) and does any tourism activity in the tourism spot.
    • The detailed conditions of tourism tweets are as follows:
    • (1) A tourism tweet requires to include not only a tourism spot in a tourism region or the name of a tourism region but also any tourism activity in the tourism spot/region. Even if a tourism activity is not specified in a tweet, you have to classify the tweet as a tourism tweet if the tourism activity performed in the tweet can be judged as such based on the context.
    • (2) A tourism tweet requires to be posted on the day (“I went to today!” is a tourism tweet, but “I enjoyed yesterday!” and “I wanna go to next week!” are not tourism tweets). If the date is not specified in a tweet, you have to judge that the date is the day (i.e., today).
    • (3) A tourism tweet requires to include not only the other’s tourism activity but any first-person tourism activity (“I went to today!” is a tourism tweet, but “Ms. said that she enjoyed today!” is not a tourism tweet.).
    • (4) Such a tweet as “I’m in ” can be classified as a tourism tweet only if the target topics of the tweet are about a tourism spot.
    • (5) If a tweet includes a tourism activity and not only its mainly targeted tourism spot but also the other tourism spots, the tourism activity has to be paired with the mainly targeted tourism spot.
    • (6) A tweet posted by a user who seems to live in its mainly targeted tourism spot cannot be classified as a tourism tweet (because any activity in the tweet is not a tourism activity for her/him who is a citizen of its mainly targeted tourism spot).
    • Please output the result as follows: (a text of an inputted tweet).
    • The result of classification: a tourism tweet in Hikone, a tourism tweet but not in Hikone, or not a tourism tweet.
    • The reason why you have classified the above-mentioned result: (the reason).
    • An example of an inputted tweet and its result and reason:
    • “Today, I went to Hikone Castle. Hikonyan came to me.”
    • The result of classification: a tourism tweet in Hikone.
    • The reason why it is a tourism tweet in Hikone: The inputted tweet specifies that it is a trip to Hikone Castle and also that it is an activity of the day. In addition, the inputted tweet obviously shows a tourism activity in Hikone Castle because Hikonyan (who is the mascot character of Hikone Castle) appeared in it.
    • I will ask you to text-classify inputted tweets as soon as they are ready.
    • Answer 2 (by ChatGPT): I have understood your instruction. Please input tweets for text-classification as soon as they are ready. I can help you for any tweets.
ChatGPT was regarded as having text-classified a tweet as a tourism tweet in “Hikone”, Japan, only if it output “a tourism tweet in Hikone” as the result of the classification. Accordingly, ChatGPT was regarded as having text-classified a tweet as not a tourism tweet in “Hikone”, Japan, if it output “a tourism tweet but not in Hikone”, “not a tourism tweet”, or any other text (e.g., “not a tourism tweet in Hikone”) as the result of the classification.
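The mapping from a ChatGPT reply to a binary label can be sketched as below. The sketch assumes that the reply follows the output format requested in the prompts, i.e., that it contains a line beginning with “The result of classification:”; the function name `is_hikone_tourism_tweet` and the sample replies are hypothetical. Note that the label is compared for equality rather than searched as a substring, because “not a tourism tweet in Hikone” contains the positive label verbatim.

```python
RESULT_PREFIX = "The result of classification:"
POSITIVE_LABEL = "a tourism tweet in hikone"


def is_hikone_tourism_tweet(reply: str) -> bool:
    """Return True only if the reply's result line is exactly the positive label."""
    for line in reply.splitlines():
        line = line.strip()
        if line.startswith(RESULT_PREFIX):
            label = line[len(RESULT_PREFIX):].strip().rstrip(".").lower()
            return label == POSITIVE_LABEL
    # A reply without a result line is treated as "any other text".
    return False


reply = (
    "“Today, I went to Hikone Castle. Hikonyan came to me.”\n"
    "The result of classification: a tourism tweet in Hikone.\n"
    "The reason why you have classified the above-mentioned result: ..."
)
print(is_hikone_tourism_tweet(reply))
```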

5.4. Experimental Results

Table 8 compares the various ML (Machine Learning) methods on Dataset III with respect to how precisely they mine Twitter.com (currently, X.com) by text-classifying tweets as being tourism tweets or not. An analysis of the table yielded the following findings:
  • With respect to the precision for tourism tweets, SVM with TFIDF-based vectors, fastText, and ChatGPT with the simple/extended prompt performed very well, while SVM with Doc2Vec-based vectors did not perform well.
  • With respect to the recall for tourism tweets, Naive Bayes (more specifically, GaussianNB) with TFIDF-based vectors performed the best.
  • With respect to the F1-score for tourism tweets, Naive Bayes (more specifically, GaussianNB) with TFIDF-based vectors performed the best, but an ML method with a higher F1-score (>0.88) is required for visualizing tourism tweets as not summarized but instantiated knowledge, i.e., instances of them without any summarization.
  • SVM with TFIDF-based vectors, fastText, and ChatGPT with the simple/extended prompt achieved very high precision but not high recall; that is, they filtered out not only not-tourism tweets (i.e., noises) but also many tourism tweets (i.e., signals), possibly including valuable “peculiar/minor signals”. Meanwhile, Naive Bayes with TFIDF-based vectors achieved the highest recall and a precision that was not low, and it may have best tackled the challenge of salvaging not only “typical/major signals” but also “peculiar/minor signals” without noises for e-tourism (i.e., smart tourism).
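The three measures used in the findings above can be computed from true and predicted labels as in the following sketch, where tourism tweets are the positive class. The function name and the toy label lists are assumptions made for illustration only; they are not the results of Table 8.

```python
def precision_recall_f1(y_true, y_pred, positive=1):
    """Compute precision, recall, and F1-score for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Toy labels (assumption): 1 = tourism tweet (signal), 0 = not-tourism tweet (noise).
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
cautious = [1, 1, 0, 0, 0, 0, 0, 0]   # never wrong when it says "signal", but misses half
inclusive = [1, 1, 1, 1, 1, 0, 0, 0]  # keeps every signal at the cost of one noise

print(precision_recall_f1(y_true, cautious))
print(precision_recall_f1(y_true, inclusive))
```

In this toy case the cautious classifier has perfect precision but a recall of only 0.5, whereas the inclusive one keeps every signal with a precision of 0.8, which mirrors the trade-off between filtering out noises and salvaging “peculiar/minor signals”.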

6. Conclusions

A great number of technologies with AI (Artificial Intelligence), DS (Data Science), and/or IoT (Internet of Things) are becoming pervasive in e-tourism (i.e., smart tourism). However, most of these technologies for a target (e.g., what to do in such a tourism spot as Hikone Castle) utilize their “typical/major signals” (e.g., take a photo) as summarized knowledge based on “The Principle of Majority”, and tend to filter out not only their noises but also their valuable “peculiar/minor signals” (e.g., view Sawayama Castle) as instantiated knowledge. Therefore, as a challenge to salvage not only “typical signals” but also “peculiar signals” without noises for e-tourism, this paper has compared various methods of ML (Machine Learning) with respect to their effects on precisely mining Twitter.com (currently, X.com) for tourism tweets as not summarized but instantiated knowledge. In addition, this paper has proposed a MAS (Multi-Agent Simulation), powered with artisoc, for visualizing tourism tweets, including not only “typical signals” but also “peculiar signals” whose number can be enormous, as not summarized but instantiated knowledge, i.e., instances of them without summarization, and has tried to validate the effects of the proposed MAS powered with artisoc by conducting some experiments with subjects.
This paper has obtained a diverse array of findings from three kinds of experiments. Unfortunately, neither the compared ML methods nor the proposed MAS powered with artisoc performed well enough. The following future work is planned:
  • Data other than the Japanese text of a tweet will be utilized in the near future, as shown in Figure 10. For example, Japanese texts converted from the photo(s) in a tweet by an Image-to-Text model such as Japanese Stable VLM (Vision–Language Model) [68] will be used to compare various ML methods for precisely text-classifying a tweet as being a “tourism tweet” or not.
  • The proposed MAS powered with artisoc will be improved by resolving the overlap problem of texts, such as the name of each Spot Agent (i.e., tourism spot) and each User Agent’s activities (i.e., verbs) for a designated Spot Agent; made more understandable by geomapping Spot Agents and User Agents; and made more hybrid by visualizing not only instances of tourism tweets, i.e., each User Agent’s traveling with its tourism activities (i.e., verbs) in a designated Spot Agent as instantiated knowledge, but also the number of User Agents (i.e., visitors) for each Spot Agent (i.e., tourism spot) as well as the typical routes that more User Agents traveled as summarized knowledge.
  • Future work should make the subject samples as gender-balanced and as age-balanced as possible. In addition, future work should make the dataset of consumer-generated content, e.g., tweets, as large as possible and make it cover tourism spots/regions that are as diverse as possible.

Author Contributions

Conceptualization, Y.F. and S.H.; methodology, Y.F. and S.H.; software, Y.F.; validation, Y.F. and S.H.; formal analysis, Y.F. and S.H.; investigation, Y.F., S.H. and M.T.; resources, Y.F.; data curation, Y.F. and S.H.; writing—original draft, Y.F. and S.H.; writing—review and editing, S.H., Y.F., W.S. and M.T.; visualization, Y.F. and S.H.; supervision, S.H., W.S. and M.T.; project administration, S.H.; funding acquisition, S.H. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by JSPS’s (the Japan Society for the Promotion of Science) KAKENHI grants (24K06287 to S.H.).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study.

Data Availability Statement

The datasets presented in this article may not be available on request from the corresponding author due to the privacy reasons of subjects and copyrights of tweets (from Twitter.com, currently X.com).

Acknowledgments

This work was partially supported by Regional ICT Research Center of Human, Industry and Future at The University of Shiga Prefecture, and by the Cabinet Office, Government of Japan.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. Yang, Z.R.; Yang, Z. Chapter 6.01—Artificial Neural Networks. In Comprehensive Biomedical Physics; Persson, B., Ed.; Volume 6: Bioinformatics; Elsevier: Amsterdam, The Netherlands, 2014; pp. 1–17. [Google Scholar] [CrossRef]
  2. Society 5.0. Available online: https://www8.cao.go.jp/cstp/english/society5_0/index.html (accessed on 7 July 2024).
  3. Hattori, S.; Aiba, K.; Takahara, M. R2-B2: A Metric of Synthesized Image’s Photorealism by Regression Analysis based on Recognized Objects’ Bounding Box. In Proceedings of the Joint 12th International Conference on Soft Computing and Intelligent Systems and 23rd International Symposium on advanced Intelligent Systems (SCIS&ISIS’22), Online/Ise-Shima, Japan, 29 November–2 December 2022. F-1-F-1. [Google Scholar] [CrossRef]
  4. Hattori, S.; Takahara, M. A Study on Human-Computer Interaction with Text-to/from-Image Game AIs for Diversity Education. In Proceedings of the 25th International Conference on Human-Computer Interaction (HCI International 2023), Online/Copenhagen, Denmark, 23–28 July 2023; LNCS. Volume 14015, pp. 471–486. [Google Scholar] [CrossRef]
  5. Takahara, M.; Hattori, S. A Study on HCI of a Collaborated Nurture Game for Sleep Education with Child and Parent. In Proceedings of the 25th International Conference on Human-Computer Interaction (HCI International 2023), Online/Copenhagen, Denmark, 23–28 July 2023; LNCS. Volume 14015, pp. 169–181. [Google Scholar] [CrossRef]
  6. Takahara, M.; Nishimura, S.; Hattori, S. A Study on a Mechanism to Prevent Sleeping Smartphones using ASMR. In Proceedings of the 26th International Conference on Human-Computer Interaction (HCI International 2024), Online/Washington, DC, USA, 29 June–4 July 2024; LNCS. Volume 14689, pp. 279–288. [Google Scholar] [CrossRef]
  7. Hattori, S.; Miyamoto, S.; Sunayama, W.; Takahara, M. A Study on Input Methods of User Preference for Personalized Fashion Coordinate Recommendations. In Proceedings of the 26th International Conference on Human-Computer Interaction (HCI International 2024), Online/Washington, DC, USA, 29 June–4 July 2024; LNCS. Volume 14691, pp. 178–196. [Google Scholar] [CrossRef]
  8. SAMOE–Simple Simulation for Semi-Order Made Apron-Premium Pattern-. Available online: https://samoe.net/f/simulation-special (accessed on 8 August 2024).
  9. Arasawa, K.; Hattori, S. Automatic Baseball Video Tagging based on Voice Pattern Prioritization and Recursive Model Localization. J. Adv. Comput. Intell. Intell. Inform. 2017, 21, 1262–1279. [Google Scholar] [CrossRef]
  10. Watanabe, R.; Arasawa, K.; Hattori, S. Rule-Based Role Analysis of Game Characters Using Tags about Characteristics for Strategy Estimation by Game AI. In Proceedings of the Intelligent Systems Workshop 2018 (ISWS ’18) in Conjunction with SCIS&ISIS’18, Toyama, Japan, 5–8 December 2018; Fr6-1-5. pp. 814–819. [Google Scholar]
  11. Hattori, S.; Kurono, M.; Yoshida, Y.; Takahara, M.; Kudo, Y. Time Control of Thinking and Cursor Movement for Humanized Othello AIs. Inf. Process. Soc. Jpn. Trans. Database 2023, 16, 16–33. [Google Scholar]
  12. Sharda, N. (Ed.) Tourism Informatics: Visual Travel Recommender Systems, Social Communities, and User Interface Design; Information Science Reference (ISR); IGI Global: Hershey, PA, USA, 2009. [Google Scholar] [CrossRef]
  13. Matsuo, T.; Hashimoto, K.; Iwamoto, H. (Eds.) Tourism Informatics; Springer: Berlin/Heidelberg, Germany, 2015; ISRL; Volume 90. [Google Scholar] [CrossRef]
  14. Wang, R.; Wu, C.; Wang, X.; Xu, F.; Yuan, Q. e-Tourism Information Literacy and Its Role in Driving Tourist Satisfaction with Online Travel Information: A Qualitative Comparative Analysis. J. Travel Res. 2024, 63, 904–922. [Google Scholar] [CrossRef]
  15. Wei, W.; Ding, S.; Chai, Y.; Sun, J.; Wang, F. Exploring the Role of Information Technology in Tourism Informatics: State of the Art. Asia Pac. J. Tour. Res. 2024, 29, 995–1016. [Google Scholar] [CrossRef]
  16. Oku, K.; Hattori, F. Mapping Geotagged Tweets to Tourist Spots Considering Activity Region of Spot. In Tourism Informatics; Matsuo, T., Hashimoto, K., Iwamoto, H., Eds.; Springer: Berlin/Heidelberg, Germany, 2015; ISRL; Volume 90, pp. 15–30. [Google Scholar] [CrossRef]
  17. Nakano, H.; Arasawa, K.; Watanabe, R.; Hattori, S. Sightseeing Spot Recommendation Based on Photographer’s Preference Extracted from Sightseeing Photographs. IEICE SIG-IN Tech. Rep. 2019, 118, 45–50. [Google Scholar]
  18. Kurashima, T.; Tezuka, T.; Tanaka, K. Blog Map of Experiences: Extracting and Geographically Mapping Visitor Experiences from Urban Blogs. In Proceedings of the 6th International Conference on Web Information Systems Engineering (WISE ’05), New York, NY, USA, 20–22 November 2005; LNCS. Volume 3806, pp. 496–503. [Google Scholar] [CrossRef]
  19. Kurashima, T.; Tezuka, T.; Tanaka, K. Mining and Visualizing Local Experiences from Blog Entries. In Proceedings of the 17th International Conference on Database and Expert Systems Applications (DEXA’06), Kraków, Poland, 4–8 September 2006; LNCS. Volume 4080, pp. 213–222. [Google Scholar] [CrossRef]
  20. Kurashima, T.; Fujimura, K.; Okuda, H. Discovering Association Rules on Experiences from Large-Scale Blog Entries. In Proceedings of the 31st European Conference on IR Research on Advances in Information Retrieval (ECIR’09), Toulouse, France, 6–9 April 2009; LNCS. Volume 5478, pp. 546–553. [Google Scholar] [CrossRef]
  21. Hattori, S.; Tezuka, T.; Tanaka, K. Activity-Based Query Refinement for Context-Aware Information Retrieval. In Proceedings of the 9th International Conference on Asian Digital Libraries (ICADL’06), Kyoto, Japan, 27–30 November 2006; LNCS. Volume 4312, pp. 474–477. [Google Scholar] [CrossRef]
  22. Hattori, S.; Tezuka, T.; Ohshima, H.; Oyama, S.; Kawamoto, J.; Tajima, K.; Tanaka, K. ReCQ: Real-world Context-aware Querying. In Proceedings of the 6th International and Interdisciplinary Conference on Modeling and Using Context (CONTEXT’07), Roskilde, Denmark, 20–24 August 2007; LNAI. Volume 4635, pp. 248–262. [Google Scholar] [CrossRef]
  23. Hattori, S.; Tezuka, T.; Tanaka, K. Mining the Web for Appearance Description. In Proceedings of the 18th International Conference on Database and Expert Systems Applications (DEXA’07), Regensburg, Germany, 3–7 September 2007; LNCS. Volume 4653, pp. 790–800. [Google Scholar] [CrossRef]
  24. Hattori, S.; Tezuka, T.; Tanaka, K. Extracting Visual Descriptions of Geographic Features from the Web as the Linguistic Alternatives to Their Images in Digital Documents. Inf. Process. Soc. Jpn. Trans. Database 2007, 48, 69–82. [Google Scholar]
  25. Hattori, S.; Ohshima, H.; Oyama, S.; Tanaka, K. Extracting Conceptu(r)al Hierarchies from the Web by Term Coordinate and Property Inheritance Relationships. IEICE SIG-DE Tech. Rep. 2006, 107, 127–132. [Google Scholar]
  26. Kori, H.; Hattori, S.; Tezuka, T.; Tanaka, K. Automatic Generation of Multimedia Tour Guide from Local Blogs. In Proceedings of the 13th International MultiMedia Modeling Conference (MMM’07), Singapore, 9–12 January 2007; LNCS. Volume 4351, pp. 690–699. [Google Scholar] [CrossRef]
  27. Nagasawa, Y.; Yoshida, K.; Hattori, S. A Study on Travel Path Extraction and Mapping for Understanding Support of Travelogues in Mobile Devices. IEICE SIG-MoNA Tech. Rep. 2014, 114, 19–24. [Google Scholar]
  28. Samejima, M. Topic Analysis of Case Reports in Tourism towards Collaborative Tourism Planning Support. In Tourism Informatics; Matsuo, T., Hashimoto, K., Iwamoto, H., Eds.; Springer: Berlin/Heidelberg, Germany, 2015; ISRL; Volume 90, pp. 1–14. [Google Scholar] [CrossRef]
  29. Tezuka, T.; Kurashima, T.; Tanaka, K. Toward Tighter Integration of Web Search with a Geographic Information System. In Proceedings of the 15th International Conference on World Wide Web (WWW’06), Edinburgh, UK, 23–26 May 2006; pp. 277–286. [Google Scholar] [CrossRef]
  30. Kawamura, N.; Arasawa, K.; Hattori, S. e-Travel: Automatical Travel Support Site Generation Based on Review Analysis per Travel Style. IEICE SIG-IN Tech. Rep. 2020, 119, 13–18. [Google Scholar]
  31. Uwano, F.; Kobayashi, R.; Ohta, M. Automatic Extraction of User-Centric Aspects for Tourist Spot Recommender Systems Using Reviews in Japanese. In Proceedings of the 26th International Conference on Human-Computer Interaction (HCI International’24), Online/Washington, DC, USA, 29 June–4 July 2024; LNCS. Volume 14691, pp. 245–260. [Google Scholar] [CrossRef]
  32. Kori, H.; Hattori, S.; Tezuka, T.; Tanaka, K. Extraction of Visitors’ Typical Route and its Context from Local Blogs. IEICE SIG-DE Tech. Rep. 2006, 106, 29–34. [Google Scholar]
  33. Hattori, S.; Tanaka, K. Search(ing) the Web for Typical Images based on Extracting Color-names from the Web and Converting them to Color-Features. Inf. Process. Soc. Jpn. Trans. Database 2008, 6, 9–12. [Google Scholar]
  34. Heinberg, J.G. History of the Majority Principle. Am. Political Sci. Rev. 1926, 20, 52–68. [Google Scholar] [CrossRef]
  35. Hattori, S.; Tanaka, K. Search(ing) the Web for Peculiar Images by Converting Web-extracted Peculiar Color-Names into Color-Features. Inf. Process. Soc. Jpn. Trans. Database 2010, 3, 49–63. [Google Scholar]
  36. Hattori, S. Peculiar Image Retrieval by Cross-Language Web-extracted Appearance Descriptions. Int. J. Comput. Inf. Syst. Ind. Manag. 2012, 4, 486–495. [Google Scholar]
  37. Hattori, S. Hyponymy-Based Peculiar Image Retrieval. Int. J. Comput. Inf. Syst. Ind. Manag. 2013, 5, 79–88. [Google Scholar]
  38. artisoc4—MAS Community—Kozo Keikaku Engineering Inc. Available online: https://mas.kke.co.jp/en/artisoc4/ (accessed on 7 July 2024).
  39. The Surveys on Domestic Tourism Needs by Jalan Research Center. Available online: https://jrc.jalan.net/surveys/corona_investigation/ (accessed on 8 August 2024).
  40. Hikonyan—Visit Omi. Available online: https://visit-omi.com/people/article/hikonyan (accessed on 7 July 2024).
  41. Visualizing Tourism Tweets as a MAS (Multi-Agent Simulation) Powered with Artisoc. Available online: https://www.youtube.com/watch?v=8K3wQy60slI (accessed on 7 July 2024).
  42. TfidfTransformer—Scikit-Learn 1.5.1 Documentation. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.feature_extraction.text.TfidfTransformer.html (accessed on 7 July 2024).
  43. Ståhle, L.; Wold, S. Analysis of Variance (ANOVA). Chemom. Intell. Lab. Syst. 1989, 6, 259–272. [Google Scholar] [CrossRef]
  44. Gotouchi-Chara in Hikone 2022. Available online: http://gotouchi-chara.jp/hikone2022/ (accessed on 7 July 2024).
  45. Google My Maps—About. Available online: https://www.google.com/maps/about/mymaps/ (accessed on 7 July 2024).
  46. Padilla, J.J.; Kavak, H.; Lynch, C.J.; Gore, R.J.; Diallo, S.Y. Temporal and Spatiotemporal Investigation of Tourist Attraction Visit Sentiment on Twitter. PLoS ONE 2018, 13, e0198857. [Google Scholar] [CrossRef] [PubMed]
  47. Kotsiantis, S.B.; Zaharakis, I.D.; Pintelas, P.E. Machine Learning: A Review of Classification and Combining Techniques. Artif. Intell. Rev. 2006, 26, 159–190. [Google Scholar] [CrossRef]
  48. Khan, A.; Baharudin, B.; Lee, L.H.; Khan, K. A Review of Machine Learning Algorithms for Text-Documents Classification. J. Adv. Inf. Technol. 2007, 1, 4–20. [Google Scholar] [CrossRef]
  49. Osisanwo, F.Y.; Akinsola, J.E.T.; Awodele, O.; Hinmikaiye, J.O.; Olakanmi, O.; Akinjobi, J. Supervised Machine Learning Algorithms: Classification and Comparison. Int. J. Comput. Trends Technol. (IJCTT) 2017, 48, 128–138. [Google Scholar] [CrossRef]
  50. Garrido-Merchán, E.C.; Gozalo-Brizuela, R.; González-Carvajal, S. Comparing BERT Against Traditional Machine Learning Text Classification. J. Comput. Cogn. Eng. 2023, 2, 352–356. [Google Scholar] [CrossRef]
  51. Reiss, M.V. Testing the Reliability of ChatGPT for Text Annotation and Classification: A Cautionary Remark. arXiv 2023, arXiv:2304.11085. [Google Scholar] [CrossRef]
  52. Support Vector Machines—Scikit-Learn 1.5.1 Documentation. Available online: https://scikit-learn.org/stable/modules/svm.html (accessed on 7 July 2024).
  53. Mikolov, T.; Sutskever, I.; Chen, K.; Corrado, G.; Dean, J. Distributed Representations of Words and Phrases and their Compositionality. In Proceedings of the 26th International Conference on Neural Information Processing Systems (NeurIPS’13), Lake Tahoe, NV, USA, 5–10 December 2013; Volume 2, pp. 3111–3119. [Google Scholar]
  54. models.doc2vec—Doc2vec Paragraph Embeddings—Gensim. Available online: https://radimrehurek.com/gensim/models/doc2vec.html (accessed on 7 July 2024).
  55. Scikit-Learn: Machine Learning in Python—Scikit-Learn 1.5.1 Documentation. Available online: https://scikit-learn.org/stable/ (accessed on 7 July 2024).
  56. Gensim: Topic Modelling for Humans. Available online: https://radimrehurek.com/gensim/ (accessed on 7 July 2024).
  57. fastText. Available online: https://fasttext.cc/ (accessed on 7 July 2024).
  58. Mikolov, T.; Chen, K.; Corrado, G.; Dean, J. Efficient Estimation of Word Representations in Vector Space. In Proceedings of the 1st International Conference on Learning Representations (ICLR’13), Scottsdale, AZ, USA, 2–4 May 2013. Workshop Track Proceedings. [Google Scholar]
  59. GaussianNB—Scikit-Learn 1.5.1 Documentation. Available online: https://scikit-learn.org/stable/modules/generated/sklearn.naive_bayes.GaussianNB.html (accessed on 7 July 2024).
  60. Rennie, J.D.M.; Shih, L.; Teevan, J.; Karger, D.R. Tackling the Poor Assumptions of Naive Bayes Text Classifiers. In Proceedings of the Twentieth International Conference on Machine Learning (ICML’03), Washington, DC, USA, 21–24 August 2003; pp. 616–623. [Google Scholar]
  61. Devlin, J.; Chang, M.-W.; Lee, K.; Toutanova, K. BERT: Pre-Training of Deep Bidirectional Transformers for Language Understanding. In Proceedings of the 17th Annual Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (NAACL-HLT’19), Minneapolis, MN, USA, 2–7 June 2019; pp. 4171–4186. [Google Scholar]
  62. tohoku-nlp/bert-base-japanese—Hugging Face. Available online: https://huggingface.co/tohoku-nlp/bert-base-japanese (accessed on 7 July 2024).
  63. BertJapanese—Hugging Face. Available online: https://huggingface.co/docs/transformers/en/model_doc/bert-japanese (accessed on 7 July 2024).
  64. Liu, Y.; Ott, M.; Goyal, N.; Du, J.; Joshi, M.; Chen, D.; Levy, O.; Lewis, M.; Zettlemoyer, L.; Stoyanov, V. RoBERTa: A Robustly Optimized BERT Pretraining Approach. arXiv 2019, arXiv:1907.11692. [Google Scholar]
  65. Rinna/Japanese-Roberta-Base—Hugging Face. Available online: https://huggingface.co/rinna/japanese-roberta-base (accessed on 7 July 2024).
  66. T5—Hugging Face. Available online: https://huggingface.co/docs/transformers/model_doc/t5 (accessed on 7 July 2024).
  67. ChatGPT|OpenAI. Available online: https://openai.com/chatgpt/ (accessed on 7 July 2024).
  68. Japanese Stable VLM—Stability.ai. Available online: https://ja.stability.ai/blog/japanese-stable-vlm (accessed on 7 July 2024).
Figure 1. Four kinds of data analyses of user-generated contents on tourism spots from the Web.
Figure 2. ML (Machine Learning) and MAS (Multi-Agent Simulation) powered with artisoc [38] for mining and visualizing tourism tweets as not summarized but instantiated knowledge.
Figure 3. An example flow of our proposed MAS of Spot Agents and User Agents, powered with artisoc [38], for visualizing tourism tweets as not summarized but instantiated knowledge.
Figure 4. Examples of frames in a 27 s animation video [41] of visualizing tourism tweets on 3 February 2019 in “Hikone”, Japan as a MAS (Multi-Agent Simulation) powered with artisoc [38].
Figure 5. Geomapping tourism activities of tourism tweets on 23 October 2022 when a popular annual event “Gotouchi-Chara Fair in Hikone 2022” [44] was held in “Hikone”, Japan, with not-tourism activities of noises at a ratio of n/(s + n) = 0.00 and 0.50, respectively.
Figure 6. Geomapping tourism activities of tourism tweets on 24 October 2022 when no specific event was held in “Hikone”, Japan, with not-tourism activities of noises at a ratio of n/(s + n) = 0.00 and 0.50, respectively.
Figure 7. The bad effects of noises on visualizing (more specifically, geomapping) tourism tweets as not summarized but instantiated knowledge for all subjects (N = 63).
Figure 8. The bad effects of noises on visualizing (more specifically, geomapping) tourism tweets as not summarized but instantiated knowledge only for the subjects (N = 23) who knew a little or a lot about tourism in “Hikone”, Japan, excluding the subjects who knew nothing or little.
Figure 8. The bad effects of noises on visualizing (more specifically, geomapping) tourism tweets as not summarized but instantiated knowledge only for the subjects ( N = 23 ) who knew a little or a lot about tourism in “Hikone”, Japan, excluding the subjects who knew nothing or little.
Electronics 13 03276 g008
Figure 9. Contents and metadata of a tweet, which can be utilized for ML (Machine Learning) for mining tourist tweets as not summarized but instantiated knowledge.
Figure 10. Contents and metadata of a tweet, together with Image-to-Text output (e.g., from Japanese Stable VLM [68]), which can be utilized for ML (Machine Learning) for mining tourist tweets as not summarized but instantiated knowledge.
Table 1. Subjective evaluations of the Japanese texts of 94 tourism tweets without visualizing them by a MAS (Multi-Agent Simulation) such as artisoc (N = 81).

| Question | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Average |
|---|---|---|---|---|---|---|
| Q1: Did it make Hikone seem attractive to you? | 3 | 25 | 21 | 31 | 1 | 3.02 ± 0.95 |
| Q2: Did it make you want a trip to Hikone? | 5 | 31 | 26 | 17 | 2 | 2.75 ± 0.94 |
| Q3: Did you find the tourism spot(s) in Hikone you want a trip to? | 7 | 21 | 20 | 27 | 6 | 3.05 ± 1.12 |
| Q4: Did you find a new discovery about Hikone? | 8 | 19 | 19 | 27 | 8 | 3.10 ± 1.17 |

Grade columns give the number of subjects per grade of the 5-grade evaluation: 1 = “I disagree very much”, 2 = “I disagree a little”, 3 = “I neither agree nor disagree”, 4 = “I agree a little”, 5 = “I agree very much”.
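As a reading aid for the “Average” column, the following minimal sketch recomputes the mean and standard deviation of a question from its per-grade counts. The helper name `mean_sd_from_counts` is hypothetical, and the use of the sample standard deviation (divisor n − 1) is an assumption; the table itself does not state which divisor was used.

```python
from math import sqrt


def mean_sd_from_counts(counts):
    """Mean and sample SD of 5-grade answers, given counts for grades 1..5."""
    n = sum(counts)
    mean = sum(grade * c for grade, c in enumerate(counts, start=1)) / n
    # Sample variance (divide by n - 1); this divisor is an assumption.
    var = sum(c * (grade - mean) ** 2
              for grade, c in enumerate(counts, start=1)) / (n - 1)
    return mean, sqrt(var)


q1_counts = [3, 25, 21, 31, 1]  # Q1 row of Table 1 (N = 81)
mean, sd = mean_sd_from_counts(q1_counts)
print(f"{mean:.2f} ± {sd:.2f}")  # prints "3.02 ± 0.95"
```

Under this assumption the sketch agrees with the reported 3.02 ± 0.95 for Q1; the same call can be applied to the other rows.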
Table 2. Subjective evaluations of visualizing the Japanese texts of 94 tourism tweets by a MAS such as artisoc, and p-values of the ANOVA test [43] comparing the evaluations with and without the MAS (N = 81).

| Question | Grade 1 | Grade 2 | Grade 3 | Grade 4 | Grade 5 | Average | p-Value |
|---|---|---|---|---|---|---|---|
| Q1: Did it make Hikone seem attractive to you? | 3 | 25 | 21 | 31 | 1 | 2.59 ± 0.92 | 0.003 ** |
| Q2: Did it make you want a trip to Hikone? | 13 | 25 | 27 | 16 | 0 | 2.57 ± 0.94 | 0.224 (n.s.) |
| Q3: Did you find the tourism spot(s) in Hikone you want a trip to? | 12 | 28 | 19 | 18 | 4 | 2.68 ± 1.13 | 0.037 * |
| Q4: Did you find a new discovery about Hikone? | 7 | 24 | 24 | 22 | 4 | 2.90 ± 1.06 | 0.261 (n.s.) |

Grade columns give the number of subjects per grade of the 5-grade evaluation: 1 = “I disagree very much”, 2 = “I disagree a little”, 3 = “I neither agree nor disagree”, 4 = “I agree a little”, 5 = “I agree very much”. * = p < 0.05, ** = p < 0.01, (n.s.) = not significant.
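To illustrate how a p-value of this kind can be obtained from per-grade counts, the sketch below runs a one-way ANOVA on two groups of answers rebuilt from the Q2 counts reported without the MAS (5, 31, 26, 17, 2) and with the MAS (13, 25, 27, 16, 0). Treating the two conditions as independent groups is an assumption, since the exact ANOVA design is not stated here, and the function names are hypothetical. Only the Python standard library is used, so the tail probability is integrated numerically rather than taken from a statistics package.

```python
from math import exp, lgamma, log, pi, sqrt


def expand(counts):
    """Turn counts for grades 1..5 into a flat list of individual answers."""
    return [grade for grade, c in enumerate(counts, start=1) for _ in range(c)]


def one_way_anova_two_groups(a, b):
    """Return (F, p) of a one-way ANOVA for two independent groups."""
    na, nb = len(a), len(b)
    ma, mb = sum(a) / na, sum(b) / nb
    grand = (sum(a) + sum(b)) / (na + nb)
    ss_between = na * (ma - grand) ** 2 + nb * (mb - grand) ** 2
    ss_within = sum((x - ma) ** 2 for x in a) + sum((x - mb) ** 2 for x in b)
    df = na + nb - 2  # within-group degrees of freedom
    f = ss_between / (ss_within / df)
    # With two groups, F(1, df) = t^2, so p is the two-sided Student-t tail.
    t = sqrt(f)
    log_norm = lgamma((df + 1) / 2) - lgamma(df / 2) - 0.5 * log(df * pi)

    def pdf(x):
        return exp(log_norm - (df + 1) / 2 * log(1 + x * x / df))

    steps = 2000  # even number of intervals for Simpson's rule on [0, t]
    h = t / steps
    area = pdf(0) + pdf(t)
    area += sum((4 if i % 2 else 2) * pdf(i * h) for i in range(1, steps))
    area *= h / 3
    return f, 1 - 2 * area


without_mas = expand([5, 31, 26, 17, 2])  # Q2 counts, text only
with_mas = expand([13, 25, 27, 16, 0])    # Q2 counts, with the MAS
f_stat, p_value = one_way_anova_two_groups(without_mas, with_mas)
print(f"F = {f_stat:.3f}, p = {p_value:.3f}")
```

Under these assumptions the result is close to the 0.224 reported for Q2, which suggests, but does not confirm, that an unpaired comparison of this form was used.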
Table 3. Subjects’ 4-grade degree of knowledge about tourism in “Hikone”, Japan in Experiment II.

| Grade | 4-Grade Degree of Knowledge | Number of Subjects |
|---|---|---|
| 0 | “I do not know.” | 13 |
| 1 | “I know little.” | 27 |
| 2 | “I know a little.” | 15 |
| 3 | “I know well.” | 8 |
| 1.29 | (Average over all subjects) | 63 |
Table 4. The adverse effects of noise on visualizing (more specifically, geomapping) tourism tweets as not summarized but instantiated knowledge, and p-values of the ANOVA test [43] comparing the ratio of noises n/(s + n) = 0.00 with 0.50, on 23 October 2022 when a popular annual event “Gotouchi-Chara Fair in Hikone 2022” [44] was held in “Hikone”, Japan, for all subjects (N = 63).

| Question | n/(s + n) = 0.00 | 0.10 | 0.20 | 0.30 | 0.40 | 0.50 | p-Value |
|---|---|---|---|---|---|---|---|
| Q5: (Naturalness) Did you feel strangeness with the map? | 6.51 | 6.51 | 6.35 | 6.27 | 6.27 | 6.00 | 0.215 (n.s.) |
| Q6: (Understandability) Can you understand tourism of Hikone? | 5.89 | 6.21 | 6.15 | 6.13 | 6.02 | 5.94 | 0.471 (n.s.) |

For Q5, 1 = “I felt strangeness with the map.” ⇔ 10 = “I did not feel strangeness with the map.” For Q6, 1 = “I was not able to understand Hikone.” ⇔ 10 = “I was able to understand Hikone.” (n.s.) = not significant.
Table 5. The adverse effects of noise on visualizing (more specifically, geomapping) tourism tweets as not summarized but instantiated knowledge, and p-values of the ANOVA test [43] comparing the ratio of noises n/(s + n) = 0.00 with 0.50, on 24 October 2022 when no specific event was held in “Hikone”, Japan, for all subjects (N = 63).

| Question | n/(s + n) = 0.00 | 0.10 | 0.20 | 0.30 | 0.40 | 0.50 | p-Value |
|---|---|---|---|---|---|---|---|
| Q5: (Naturalness) Did you feel strangeness with the map? | 6.38 | 6.06 | 6.41 | 6.35 | 6.37 | 6.49 | 0.779 (n.s.) |
| Q6: (Understandability) Can you understand tourism of Hikone? | 6.10 | 6.10 | 6.00 | 6.19 | 6.08 | 6.10 | 1.000 (n.s.) |

For Q5, 1 = “I felt strangeness with the map.” ⇔ 10 = “I did not feel strangeness with the map.” For Q6, 1 = “I was not able to understand Hikone.” ⇔ 10 = “I was able to understand Hikone.” (n.s.) = not significant.
Table 6. The adverse effects of noise on visualizing (more specifically, geomapping) tourism tweets as not summarized but instantiated knowledge, and p-values of the ANOVA test [43] comparing the ratio of noises n/(s + n) = 0.00 with 0.50, on 23 October 2022 when a popular annual event “Gotouchi-Chara Fair in Hikone 2022” [44] was held in “Hikone”, Japan, only for the subjects (N = 23) who knew a little or well about tourism in “Hikone”, excluding the subjects who knew nothing or little.

| Question | n/(s + n) = 0.00 | 0.10 | 0.20 | 0.30 | 0.40 | 0.50 | p-Value |
|---|---|---|---|---|---|---|---|
| Q5: (Naturalness) Did you feel strangeness with the map? | 6.87 | 6.83 | 6.70 | 6.22 | 5.70 | 5.35 | 0.032 * |
| Q6: (Understandability) Can you understand tourism of Hikone? | 6.13 | 6.35 | 6.43 | 6.13 | 5.65 | 5.83 | 0.624 (n.s.) |

For Q5, 1 = “I felt strangeness with the map.” ⇔ 10 = “I did not feel strangeness with the map.” For Q6, 1 = “I was not able to understand Hikone.” ⇔ 10 = “I was able to understand Hikone.” * = p < 0.05, (n.s.) = not significant.
Table 7. The adverse effects of noise for e-tourism on visualizing (more specifically, geomapping) tourism tweets as not summarized but instantiated knowledge, and p-values of the ANOVA test [43] comparing the ratio of noises n/(s + n) = 0.00 with 0.50, on 24 October 2022 when no specific event was held in “Hikone”, Japan, only for the subjects (N = 23) who knew a little or well about tourism in “Hikone”, excluding the subjects who knew nothing or little about it.

| Question | n/(s + n) = 0.00 | 0.10 | 0.20 | 0.30 | 0.40 | 0.50 | p-Value |
|---|---|---|---|---|---|---|---|
| Q5: (Naturalness) Did you feel strangeness with the map? | 6.65 | 6.39 | 6.35 | 6.57 | 6.35 | 6.22 | 0.510 (n.s.) |
| Q6: (Understandability) Can you understand tourism of Hikone? | 5.91 | 6.17 | 6.13 | 6.00 | 6.17 | 6.17 | 0.703 (n.s.) |

For Q5, 1 = “I felt strangeness with the map.” ⇔ 10 = “I did not feel strangeness with the map.” For Q6, 1 = “I was not able to understand Hikone.” ⇔ 10 = “I was able to understand Hikone.” (n.s.) = not significant.
Table 8. Comparison between ML (Machine Learning) methods using Dataset III with respect to their effects to precisely text-classify tweets as being tourism tweets or not.

| ML Method | Accuracy | A Tourism Tweet in Hikone: Precision | A Tourism Tweet in Hikone: Recall | A Tourism Tweet in Hikone: F1-Score | Not a Tourism Tweet in Hikone: Precision | Not a Tourism Tweet in Hikone: Recall | Not a Tourism Tweet in Hikone: F1-Score |
|---|---|---|---|---|---|---|---|
| SVM with TFIDF-based vectors | 0.749 | 0.970 | 0.520 | 0.677 | 0.679 | 0.986 | 0.804 |
| SVM with Doc2Vec-based vectors | 0.497 | 0.499 | 0.387 | 0.436 | 0.514 | 0.619 | 0.562 |
| fastText | 0.685 | 0.955 | 0.367 | 0.530 | 0.626 | 0.984 | 0.765 |
| Naive Bayes with TFIDF-based vectors | 0.731 | 0.663 | 0.853 | 0.746 | 0.836 | 0.636 | 0.722 |
| BERT | 0.516 | 0.511 | 0.447 | 0.477 | 0.509 | 0.570 | 0.538 |
| RoBERTa | 0.492 | 0.515 | 0.490 | 0.502 | 0.466 | 0.492 | 0.479 |
| ChatGPT with simple prompt | 0.655 | 0.935 | 0.309 | 0.464 | 0.601 | 0.980 | 0.745 |
| ChatGPT with extended prompt | 0.747 | 1.000 | 0.479 | 0.647 | 0.671 | 1.000 | 0.803 |
Bold = highly-scored ML method(s) for each criterion, especially for “A Tourism Tweet in Hikone” (i.e., signals).
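For readers comparing the Precision, Recall, and F1-Score columns, the following minimal sketch applies the standard F1 definition, the harmonic mean of precision and recall, to two rows of Table 8. The helper name `f1_score` is chosen here for illustration only and is not taken from the authors' code.

```python
def f1_score(precision, recall):
    """Harmonic mean of precision and recall (standard F1 definition)."""
    if precision + recall == 0:
        return 0.0  # convention when both precision and recall are zero
    return 2 * precision * recall / (precision + recall)


# (precision, recall) for "A Tourism Tweet in Hikone", copied from Table 8.
rows = {
    "SVM with TFIDF-based vectors": (0.970, 0.520),
    "Naive Bayes with TFIDF-based vectors": (0.663, 0.853),
}
for name, (p, r) in rows.items():
    print(f"{name}: F1 = {f1_score(p, r):.3f}")
```

The two rows illustrate the trade-off visible in the table: the SVM is highly precise but recovers only about half of the tourism tweets, whereas Naive Bayes trades precision for recall and ends up with the higher F1-Score for the signal class.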
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Hattori, S.; Fujidai, Y.; Sunayama, W.; Takahara, M. Effects of Machine Learning and Multi-Agent Simulation on Mining and Visualizing Tourism Tweets as Not Summarized but Instantiated Knowledge. Electronics 2024, 13, 3276. https://doi.org/10.3390/electronics13163276
