1. Introduction
In the digital transformation era, online platforms have become the primary means for travellers to search, compare, and book travel accommodations. With the vast amount of information available, finding the most suitable option can be overwhelming for users. Moreover, travellers often have distinct preferences, such as location, amenities, price range, and specific interests. Consequently, online platforms and revenue managers (in the tourism domain, a person whose job is to optimize the performance of an accommodation is commonly referred to as a “revenue manager” or “revenue optimization manager”) in the hospitality industry must develop a comprehensive understanding of these dynamics to formulate a competitive and appealing offering [1].
Recent advancements in Natural Language Processing (NLP), specifically through the development of large language models based on transformers, have demonstrated significant progress in capturing the intricate nuances of human language [2]. Transformer-based models have exhibited remarkable capabilities in diverse natural language understanding tasks, including language generation, sentiment analysis, and question answering [3,4,5,6]. A wide range of systems now rely on these technologies, including modern conversational agents [7,8], medical applications [9], translation systems [10], and even tools for literature reviews [11]. Knowledge graphs, on the other hand, have emerged as potent instruments for representing and organizing structured information in a semantic manner [12]. These knowledge bases effectively capture the relationships between entities and attributes, providing a machine-readable depiction of the domain to various intelligent services [13]. Typically, knowledge graphs structure information based on a domain ontology [14], which formally describes entity types and relationships while supporting reasoning processes. They can also be automatically refined by means of link prediction techniques, which aim to identify additional relationships between domain entities [15,16].
Nevertheless, effectively integrating these two powerful technologies remains an ongoing challenge, giving rise to several intriguing issues [17]. The primary challenges revolve around effectively combining information from unstructured and structured data sources, as well as appropriately encoding knowledge graph information [18].
This paper presents KGE-BERT (Knowledge Graph Enhanced BERT), an innovative deep learning methodology that combines large language models with domain knowledge graphs with the goal of classifying tourism offers. Our approach employs transformer models to acquire a comprehensive understanding of accommodation descriptions that are expressed as unstructured texts. This acquired knowledge is then seamlessly combined with a detailed depiction of the tourism domain obtained from a knowledge graph that we generated using Airbnb data, improving the classification capabilities of the system. The underlying knowledge graph (Tourism Knowledge Graph—London) describes over 65,000 accommodations modelled according to the Tourism Analytic Ontology (TAO) (see http://purl.org/tao/ns, accessed on 1 July 2024).
The main objective of our system is to assist revenue managers in the following two fundamental dimensions: (i) comprehending the market positioning of their accommodation offerings, taking into account price and availability, together with user reviews and demand, and (ii) optimizing their offerings on online platforms. This optimization can be achieved through improvements in the style and level of detail provided in the descriptions or through modifications to the accommodations themselves. For instance, introducing new amenities that are typically associated with better reviews can enhance the overall appeal of the offering.
More specifically, we focus on a set of classification tasks that were identified by collaborating with Linkalab S.R.L., an Italian company specializing in data science and data engineering, which has developed an industrial project about Tourism 4.0 called Data Lake Turismo (“Turismo” means tourism in Italian) that collects and analyzes tourism-related data from the web. Hospitality business strategy needs to be informed by a variety of information about the business audience’s consumer behaviour to maximize revenue and ensure competitiveness. When it comes to revenue management, it is critical to create specific strategies and adapt them to the current conditions. Therefore, revenue managers need tools to analyze various aspects, such as pricing, availability, and distribution channels. Our system aims to provide additional support through the classification of offer positioning based on the following four predicted dimensions: (1) price range, (2) user interest, (3) relevance within a specific market, and (4) user appreciation after utilization. We describe each relevant classification task in detail in Section 4.
We evaluated our approach by comparing it to a BERT classifier and a baseline logistic regression classifier on a dataset of more than 15,000 accommodation offers for each classification task. We also performed a study on various combinations of feature types, which highlighted how these have a significant impact on the performance of the models. The proposed solution obtains excellent results and significantly outperforms alternative methods, such as transformer models trained on the texts.
The main contributions of our work are summarized as follows:
We propose a novel methodology that effectively integrates large language models and knowledge graphs in the context of the tourism domain.
We provide a comprehensive evaluation demonstrating the advantages of the proposed solution compared to conventional transformer models.
We conduct an in-depth analysis of feature engineering techniques to identify the most effective combination of features.
We offer the full codebase (https://github.com/luca-secchi/kge-bert, accessed on 15 June 2024) of our methodology, which successfully addresses four classification tasks within the tourism domain.
This manuscript is structured as follows. In Section 2, we report previous related work. Section 3 presents the materials (e.g., ontology and knowledge graph) considered in our study. Section 4 discusses and formalizes the four tasks we tackle, while Section 5 describes, in detail, the dataset used for training the machine learning models. Section 6 discusses a variety of alternative strategies for feature engineering. Section 7 presents the architecture of our system, and Section 8 reports the experimental evaluation. Section 9 discusses the obtained results. Finally, Section 10 presents conclusions and future research directions.
4. Task Formulation
We identified four classification tasks that can be used to optimize an accommodation offer. These were derived from discussions with stakeholders and revenue managers in the context of our collaboration with Linkalab.
We defined and labeled them as follows:
- Task 1. Price value: Predict whether the accommodation can be considered to be of high value.
- Task 2. User interest: Predict whether potential customers would be interested in a specific accommodation.
- Task 3. Relevance: Predict whether the accommodation offer is competitive for the market.
- Task 4. User appreciation: Predict whether users would appreciate the accommodation after trying it.
We encode each of them as a binary classification task, enabling the use of the outcomes as practical checklists for revenue managers. This approach allows users of our systems to experiment with various options and observe how their decisions affect the predicted dimensions.
The price value classification task uses two labels, namely low and high. Low-valued accommodations have a single-night stay price that is lower than the median price of Airbnb accommodations in the destination after removing outliers (prices that are too high are removed: we first calculate the mean (μ) and the standard deviation (σ) of all prices p, then remove every price lying more than a fixed number of standard deviations above the mean). Otherwise, the accommodation is labeled as high value. This categorization helps users determine the appropriate price to propose, whether as an initial proposal or as a response to market fluctuations.
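The labeling rule above can be sketched as follows. This is a minimal illustration, not the authors' code; the cutoff `k = 3` standard deviations for outlier removal is our assumption, since the paper only states that overly large prices are discarded.

```python
import statistics


def price_value_label(price, all_prices, k=3):
    """Label an accommodation "high" or "low" value relative to its destination.

    `k` (number of standard deviations used to trim price outliers) is an
    assumption; the paper only says that prices that are too big are removed.
    """
    mu = statistics.mean(all_prices)
    sigma = statistics.stdev(all_prices)
    # Discard outliers lying more than k standard deviations above the mean.
    trimmed = [p for p in all_prices if p <= mu + k * sigma]
    median = statistics.median(trimmed)
    # Below the trimmed median -> low value; otherwise high value.
    return "low" if price < median else "high"
```

For example, with a destination whose trimmed median nightly price is 100, an offer priced at 150 would be labeled high value and one at 50 low value.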
For the user interest task, we use the number of reviews to measure user interest. Indeed, Airbnb’s review system mandates that users can only provide a review once they have actually stayed at a property. Consequently, the number of reviews can be perceived as an indirect indicator of the property’s occupancy and, correspondingly, of the volume of booking requests received by the host. Even when a user gives a negative evaluation, they must nevertheless have booked and stayed at the property in the first place, thus manifesting concrete interest in the offer. In this case, we define the following two classes: uninteresting and interesting. Uninteresting accommodations have no reviews in the last 12 months, while interesting accommodations have one or more reviews in the same interval of time.
To evaluate the relevance, we considered the availability calendar for 365 days in the future for each accommodation. We define the following two classes: high relevance if there is at least one bookable date in the next year and low relevance otherwise (even if it is not possible to know why an accommodation is not available for booking on a specific date, it is very unlikely that it is booked for all days in the following year). If an accommodation becomes unavailable for booking for long periods (365 days), we suppose that this happens because it is not competitive in the market and that this can be related to what it offers and how. Thus, this classification works as an alert about the relevance of the offering in the long run.
Finally, we evaluated user appreciation by using the average review score. In Airbnb, each user must give six different review scores about specific aspects of their experience, namely accuracy, cleanliness, check-in, communication, location, and value. The average review score is a number from 1 to 5 calculated as the average of these six scores. We define the following two classes: highly appreciated if the average review score is higher than 4.5 and normally appreciated otherwise.
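The three remaining labeling rules (user interest, relevance, and user appreciation) can be summarized in a short sketch. This is an illustrative restatement of the rules given above, not the authors' implementation; the function names are ours.

```python
def user_interest_label(reviews_last_12_months):
    """Interesting if at least one review was received in the last 12 months."""
    return "interesting" if reviews_last_12_months >= 1 else "uninteresting"


def relevance_label(bookable_dates_next_365_days):
    """High relevance if at least one date is bookable in the next year."""
    return "high" if bookable_dates_next_365_days >= 1 else "low"


def user_appreciation_label(aspect_scores):
    """Highly appreciated if the average of the six Airbnb aspect scores
    (accuracy, cleanliness, check-in, communication, location, value),
    each ranging from 1 to 5, exceeds 4.5."""
    avg = sum(aspect_scores) / len(aspect_scores)
    return "highly appreciated" if avg > 4.5 else "normally appreciated"
```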
6. Feature Engineering
The datasets extracted from TKG were processed to produce the following four types of features: (i) textual features, which constitute a natural input for transformer models such as BERT; (ii) numerical features; (iii) categorical features; and (iv) linked entities, i.e., DBpedia entities extracted from the descriptions. The process is illustrated in block B of Figure 1.
The process includes the following steps. First, accommodation properties (i) are processed by a data transformation process (3) to preprocess the textual descriptions (a), which are then transformed into text features expressed as tokens (f) by a BERT tokenizer (6). The same data transformation also produces a vector of numeric values (b) that is normalized by a dedicated process (8) to produce a vector of numeric features (h). Amenities (ii) are transformed into numeric vectors (e) using a one-hot encoding process (5). DBpedia entities (iii) are reduced to a manageable number by discarding the ones associated with fewer than 100 accommodations (2), then transformed into numeric vectors (d) using a one-hot encoding process (4). Finally, we use a text augmentation process (9) to generate an augmented version of the descriptions that also include numeric features and amenities (c). This is processed by a BERT tokenizer (7) and transformed into another set of text features (g) that will be used to evaluate the ability of transformers to directly process structured features.
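Two of the steps above, filtering rare DBpedia entities (step 2) and one-hot encoding (steps 4 and 5), can be sketched as follows. This is a simplified illustration under our own naming, not the paper's codebase; the BERT tokenization steps (6 and 7) are omitted here, as they rely on a pretrained tokenizer.

```python
from collections import Counter


def filter_entities(entity_lists, min_support=100):
    """Keep only DBpedia entities linked to at least `min_support`
    accommodations (step 2 in the pipeline description)."""
    counts = Counter(e for entities in entity_lists for e in set(entities))
    return sorted(e for e, c in counts.items() if c >= min_support)


def one_hot(items, vocabulary):
    """Encode a set of items (amenities or entities) as a binary vector
    over a fixed, ordered vocabulary (steps 4 and 5)."""
    item_set = set(items)
    return [1 if v in item_set else 0 for v in vocabulary]
```

For instance, an accommodation offering only wifi and heating over the amenity vocabulary `["bathtub", "heating", "wifi"]` would be encoded as `[0, 1, 1]`.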
The longest textual description in our dataset consists of 198 words, with an average length of 114 words. Consequently, these descriptions can be conveniently processed by BERT, which accommodates texts with a maximum token limit of 512. Additionally, we have ample “room” within this limit to incorporate supplementary features in text format.
The numerical features encompass a range of metrics already in numeric format, including the number of bedrooms and beds, the number of bathrooms, minimum night stays, and so on. All dates (e.g., the first review date) are expressed as the number of days in the past with respect to the day the original data were retrieved from Airbnb (in our case, 10 September 2022). Finally, true/false flag values, such as the instantly bookable flag, are transformed into numeric values of 1 or 0. Concatenating the numerical features, we produce an n-dimensional vector whose length differs slightly for each task, as we exclude the predicted variables specific to that task as well as all correlated values. For example, for the user interest classification task, we hide all metrics about the number of reviews (e.g., number_of_reviews, first_review, and last_review) or those related to review scores (e.g., review_scores_rating, review_scores_accuracy, and so on). Similarly, for the relevance classification task, we hide all variables about availability (e.g., availability_30, availability_60, availability_90, and availability_365). As a final step, we performed unity-based normalization on this vector, aiming to scale all values within the range of [0, 1]. This normalization process ensures a standardized representation of the vector, facilitating comparisons and analysis across different variables.
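The date conversion and unity-based (min-max) normalization described above can be sketched as follows. This is a minimal illustration under our own naming; the choice of returning zeros for a constant-valued column is our assumption, since the paper does not specify that edge case.

```python
from datetime import date


def days_before_snapshot(d, snapshot=date(2022, 9, 10)):
    """Express a date as the number of days before the data-retrieval day
    (10 September 2022 in the paper)."""
    return (snapshot - d).days


def unity_normalize(column):
    """Unity-based (min-max) normalization of a feature column into [0, 1].

    If all values are equal, we return zeros to avoid division by zero
    (an assumption on our part)."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(v - lo) / (hi - lo) for v in column]
```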
Information Injection as Text
In order to enhance the classification process, transformers such as BERT can be provided with additional information by extending the description texts with injected knowledge in the form of key terms and numbers. This can be considered a form of prompt addition, as described in [42]. To assess this methodology, we produced the augmented descriptions (g) depicted in Figure 1. We did so by appending numeric and categorical features after the accommodation description. Entities extracted from the description in the previous steps are excluded to avoid redundancy.
More specifically, we used the following approach: (i) numeric properties’ values were incorporated as text, with each value separated by spaces; (ii) TAO amenities were included by adding their corresponding labels, separated by spaces. Regarding numeric value injection as text, previous works [43,44] have proven that BERT can handle numeric values expressed as text.
What follows is an example of an accommodation description text extended with the list of amenities (in bold) and the numerical features (in underlined text).
“This beautifully decorated two-bedroom serviced apartment is conveniently located in the vibrant Shoreditch area […] dishes and silverware cable tv cooking basics bathtub carbon monoxide alarm smoke alarm heating lockbox first aid kit […] 1 2.0 3.0 4.0 […]”.
In this example, we list four numeric properties associated with the accommodation whose meanings are based on their positions in the sequence, namely the host-is-super-host binary flag (value 1), the number of bedrooms (value 2.0), the number of beds (value 3.0), and the minimum nights bookable (value 4.0).