A Multi-Aspect Informed GRU: A Hybrid Model of Flight Fare Forecasting with Sentiment Analysis

Degife, Worku Abebe; Lin, Bor-Shen

doi:10.3390/app14104221


by Worku Abebe Degife and Bor-Shen Lin *

Department of Information Management, National Taiwan University of Science and Technology, Taipei City 106335, Taiwan

* Author to whom correspondence should be addressed.

Appl. Sci. 2024, 14(10), 4221; https://doi.org/10.3390/app14104221

Submission received: 7 April 2024 / Revised: 12 May 2024 / Accepted: 13 May 2024 / Published: 16 May 2024

(This article belongs to the Section Computing and Artificial Intelligence)


Abstract:

This paper presents an advanced method for forecasting flight fares that combines aspect-based sentiment analysis (ABSA) with deep learning techniques, particularly the gated recurrent unit (GRU) model. This approach leverages historical airline ticket transaction data and customer reviews to better understand airline fare dynamics and the impact of customer sentiments on pricing. The aspect analysis extracts key service aspects from customer feedback and provides insightful correlations with airfare. These aspects were further categorized into nine groups for sensitivity analysis, which offered a deeper understanding of how each group influences customers’ attitudes. This ABSA-driven forecasting method marks a departure from traditional models by utilizing sentiment data alongside airline transaction data to improve predictive accuracy. Its effectiveness is demonstrated through metrics including a root mean square error (RMSE) of 0.0071, a mean absolute error (MAE) of 0.0137, and a coefficient of determination (R²) of 0.9899. Additionally, this model shows strong prediction performance in both short- and long-term fare predictions. It not only advances airfare forecasting methods but also provides valuable insights for decision makers in the airline industry, who can use them to refine pricing strategies or to improve services that the analysis indicates require further attention.

Keywords:

aspect-based sentiment analysis; flight fare forecasting; gated recurrent unit; aspect extraction; aspect categorization; sensitivity analysis

1. Introduction

Air travel has undergone a remarkable transformation over the years, evolving from an emblem of luxury to an accessible mode of global transportation. It stands as a testament to human ingenuity, facilitating global connectivity while adhering to stringent safety protocols [1]. The aviation revolution has fundamentally transformed spatial mobility and catalyzed significant shifts in international commerce. Nations with a strong aerospace presence enjoy benefits like boosted tourism, enhanced employment opportunities, technological advancements, and overall economic prosperity [2,3]. A comprehensive study of airline markets is essential, which benefits not only frequent flyers and industry stakeholders but broader intellectual and economic spheres [4,5].

However, forecasting airfares continues to pose significant difficulties [4,6]. The dynamics of travel patterns are characterized by continual fluctuations, which ultimately impact the variability of ticket prices [7]. Airlines face the challenge of striking a balance between maximizing profits and managing changes in demand, since travelers are constantly seeking cost-effective options and availability [8,9,10]. The convergence of these diverse interests highlights the importance of comprehending the mechanics of ticket pricing. Therefore, this study explores the intricacies associated with predicting airplane ticket fares to gain comprehensive business insights.

On the other hand, understanding customers’ viewpoints is of the utmost significance in formulating price prediction methods, particularly within the aviation industry [11]. The dissemination of opinions on social networks significantly influences purchase patterns [12]. However, traditional forecasting techniques, which rely heavily on pricing data shaped by such factors as oil prices, seat demand, seasonality, and rival pricing tactics, sometimes fail to consider the influence of digital footprints [4]. Though such factors have undeniable influences on the pricing dynamics, it is not easy to formulate them precisely in predictive models. Digital footprints, by contrast, are openly accessible from social media, so customer attitudes can be understood through sentiment analysis, which plays a crucial role in enhancing price prediction models and helps ensure that the inferred attitudes align with real market sentiments [13]. Sentiment analysis, which examines customer views across various platforms, has increasingly garnered attention and recognition [14,15]. Within this domain, ABSA has emerged as a powerful technique that provides a nuanced analysis of the feelings directed towards specific features of a service [16]. In this paper, instead of using manually defined aspects, the sentiment analysis first makes use of a BERT model to automatically extract the aspects of airline service, such as safety, seat, comfort, food, airport, baggage, and customer service, from the customer reviews. A few aspects were then selected and categorized into nine related aspect groups from the airline management point of view, based on which sensitivity analysis was conducted to assess the customer attitudes toward individual groups. These efforts play a crucial role in enriching the understanding of customer perspectives, which is essential for the success of the airline business.

Additionally, a hybrid model was proposed to combine aspect-based sentiment analysis (ABSA) with historical ticket transaction data, specifically leveraging the capabilities of the gated recurrent unit (GRU) model, which has shown impressive progress in modeling sequential data [17]. Including ABSA expands the information available to the models and gives a more precise depiction of market attitudes when performing predictions. In a series of comprehensive experiments, ABSA yields significant enhancements in predictive capability, and the GRU model with ABSA achieves the best RMSE, MAE, and R² among all compared models, which may establish a benchmark in airfare forecasting methods through thorough empirical validation.

The key contributions of this paper are summarized as follows:

ABSA-driven airfare forecasting approach: This study proposes using deep learning models with ABSA. This methodology is a shift from classical machine learning models, making use of both ticket transaction data and sentiment data to enhance predictive performance. The proposed model steps forward in airfare forecasting;
Aspect extraction and correlation analysis: It involves extracting key service aspects from customer reviews and performing correlation analysis to see the correlations among the aspects and the airfare. It not only reveals how different service elements affect the customers’ perceptions and airfares but gives valuable information for improving the airline services;
Aspect categorization and sensitivity analysis: The aspects are categorized into nine aspect groups, and accordingly, sensitivity analysis was conducted to determine how each aspect group impacts the airfare. This analysis provides deeper business insights into customer preferences and sentiments.

The rest of this paper is organized to provide a comprehensive understanding of our study. Section 2, Related Work, examines existing research and sets the theoretical foundation for our approach. Section 3, Methodology, is subdivided into three parts, namely datasets and preparations, ABSA, and the GRU model, detailing the procedures and techniques employed in our research. In Section 4, Results and Discussion, we present and analyze our findings, exploring their implications and significance. Finally, Section 5, Conclusions, summarizes the study, reflecting on its contributions and potential future directions.

2. Related Work

Collectively, a few studies contribute essential insights into airfare forecasting [1,18,19,20,21,22]. As methodologies are continuously advanced and refined, the industry can achieve increasingly accurate and efficient fare prediction, which holds potential benefits for both airlines and consumers. Building upon earlier works, recent studies have made notable strides by utilizing deep learning algorithms to overcome the limitations of traditional prediction models [18]. Leveraging an expansive dataset from Ethiopian Airlines, which spans from January 2018 to July 2022, the researchers integrated 44 decision-making features into a GRU model. This model outperforms conventional machine learning methods as well as such models as multilayer perceptrons (MLP) and long short-term memory (LSTM) networks. Performance metrics including MAE, RMSE, and R² demonstrate its efficacy. The study thereby substantiates the potential of GRU-based deep learning models for enhancing the accuracy of airfare predictions, contributing valuable business insights to ongoing discussions in this domain.

In parallel with ongoing efforts in airfare prediction, another study explored various AI techniques to examine pricing policies across multiple airlines and destinations [1]. Employing an extensive dataset from four airlines, the research implemented 16 different models spanning from machine learning (ML) and deep learning (DL) to quantum machine learning (QML). With accuracies between 89% and 99%, the study broke new ground by being the first to apply QML to airfare price prediction and by offering an in-depth investigation into the influence of diverse pricing features. Subsequent research highlighted the complexity and unpredictability of airfares, suggesting that these lead to revenue inefficiencies and customer dissatisfaction [19]. This study proposed an integrated model consisting of neural networks and an XGBoost regressor that uses a light gradient boosting machine to minimize the mean absolute error in fare prediction. The analysis was based on flight data spanning from March to June 2019 across multiple airlines. The prediction outcomes can guide the airlines in resource allocation and strategic planning aimed at targeted market sectors. Another investigation contrasted eight regression models, including multilayer perceptrons, generalized regression neural networks, random forest regression trees, etc. [21]. Using a dataset of 1814 flights on a single international route, this study illuminated key factors such as departure and arrival times and luggage allowances. Among the models, bagging regression trees yielded the highest accuracy, closely followed by the random forest methods. Additionally, individual random forest and multilayer perceptron models were further merged using weighted stacking, leading to a composite model with enhanced predictive performance [22]. Employing a dataset of 51,000 records of one-week round-trip flights for three domestic airlines, this research considered a range of 12 variables, including the purchase and departure dates. The composite model demonstrated performance gains of 4.4% and 7.7% over the individual random forest and multilayer perceptron models, respectively, as gauged by the R² metric.

Moreover, another research model was presented to forecast the lowest possible fares for specific flight itineraries [23]. A unique ensemble learning algorithm was designed to leverage historical price variations to predict future fare adjustments. This recursive model iteratively utilizes prior predictions to inform subsequent ones and incorporates such features as similar itineraries and the temporal factors like the day of the week. Experimental results indicated that this model is particularly effective for those routes featuring independent pricing behaviors across different flights. Comparative assessments revealed that this ensemble-based approach outperformed the K-nearest neighbors (KNN) algorithm.

3. Methodology

3.1. Datasets

The study presented in this paper employs a rich historical dataset of flight ticket fares, with 841,160 records spanning from January 2018 to July 2023. This dataset encompasses various attributes, including travel date, booking class, distance, duration, total number of stops, flight number, flight type (domestic or international), number of passengers (on a specific ticket), weekend, holiday, season, actual fare, airline code, and so on.

While historical fare data offer a substantial basis for understanding pricing patterns, consumer reviews can possibly complement the information and enhance the fare prediction since they convey important messages correlated to fare changes, including perceptions of airline services, public sentiments, and industry events. Therefore, through integrating those reviews, we aimed to develop a more holistic and accurate predictive model. In this study, the trusted platforms of Skytrax and Trip Advisor served as sources of sentiment data, from which we gathered a collection of 46,167 consumer reviews from January 2018 to July 2023.

3.2. Data Preprocessing

In developing our fare prediction model, we initiated the process by rigorously cleaning an already high-quality and meticulously detailed ticket transactions dataset to ensure data integrity and accuracy. This dataset, characterized by its comprehensive features and adherence to industry standards, provided a solid foundation for our analysis. The initial stage involved the systematic elimination of duplicate entries and the correction of misaligned or incorrect values, with a specific focus on the Booking Class, which varies from A to Z. We identified records that, although identical in Travel Date, Distance, Class of Service, Seg Orig., and Seg Dest., diverged solely in Booking Class. These duplications, though infrequent, required careful evaluation to determine whether they represented true redundancies or necessary variations in fare classes reflecting different pricing strategies and service levels. Upon review, we found that while the duplicates could be removed, certain variations in Booking Class were critical to retain, as they provided insights into fare adjustments and reflected different service levels and associated costs. This approach helped us maintain essential data variations that are important for accurate fare predictions. By using Booking Class as a reference for identifying duplications and verifying their legitimacy, we reduced unnecessary data redundancy, enhancing the dataset’s robustness. Subsequently, we addressed anomalies in the Booking Class feature, where entries were mistakenly replaced by Class of Service. To correct these, we engaged domain experts and employed mode imputation, replacing incorrect entries with the most frequently occurring valid booking class within that fare range.

The next phase of our data preparation focused on strategically managing missing data to maintain uniformity and completeness across the dataset. For continuous variables such as Distance and Actual Fare, we corrected missing values using median values. This choice was guided by the skewed nature of these variables, where the median provides a more robust measure. Specifically, in cases of missing Distance data, we identified similar flight segments by cross-checking our dataset’s Seg Orig. and Seg Dest. to find records with the same route characteristics. This allowed us to accurately impute missing distances based on observed values from identical or similar routes. For categorical variables, we applied mode imputation, filling in missing entries with the most frequently occurring value within similar segments.
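The imputation rules described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' code: records are plain dictionaries, the field names (`seg_orig`, `seg_dest`) and the helper `impute_by_route` are hypothetical, and both the median and the mode are computed within the same route for simplicity.

```python
from collections import Counter, defaultdict
from statistics import median


def impute_by_route(records, numeric_field, categorical_field):
    """Fill missing values (None) using records that share the same route.

    Numeric gaps get the route median; categorical gaps get the route mode.
    The field names and the route grouping key are illustrative assumptions.
    """
    groups = defaultdict(list)
    for rec in records:
        groups[(rec["seg_orig"], rec["seg_dest"])].append(rec)

    for route_records in groups.values():
        # Collect the observed values first so imputed values are not reused.
        numeric_values = [r[numeric_field] for r in route_records
                          if r[numeric_field] is not None]
        categorical_values = [r[categorical_field] for r in route_records
                              if r[categorical_field] is not None]
        for rec in route_records:
            if rec[numeric_field] is None and numeric_values:
                rec[numeric_field] = median(numeric_values)
            if rec[categorical_field] is None and categorical_values:
                mode_value = Counter(categorical_values).most_common(1)[0][0]
                rec[categorical_field] = mode_value
    return records
```

A route with no observed value is left untouched in this sketch; how such cases were handled in the study is not stated.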

Following the feature engineering phase, we engaged in data transformation and normalization to meet the requirements of our machine learning algorithms. We implemented two key data transformation techniques: data encoding and data normalization. Categorical features such as Season, Booking Class, Class of Service, and Segment Origin were transformed using one-hot encoding. For continuous variables like Distance and Actual Fare, we applied min–max normalization, scaling the values to a range between 0 and 1. This scaling is achieved using the following formula:

$$x' = \frac{x - \min(X)}{\max(X) - \min(X)} \tag{1}$$

where $\min(X)$ and $\max(X)$ are the minimum and maximum values of the feature $X$ in the dataset, respectively.
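As a small illustration of the two transformations, the sketch below applies Equation (1) to a list of numbers and one-hot encodes a list of category labels. The function names are hypothetical, and the handling of a constant feature (where Equation (1) is undefined) is an assumption, since the text does not discuss that case.

```python
def min_max_normalize(values):
    """Scale a list of numbers to the range [0, 1] following Equation (1)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Constant feature: Equation (1) would divide by zero, so every value
        # is mapped to 0.0 (an assumption made for this sketch).
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


def one_hot(labels):
    """Encode each label as a 0/1 vector, one position per distinct category."""
    categories = sorted(set(labels))
    return [[1 if label == c else 0 for c in categories] for label in labels]
```

In practice the minimum and maximum would be computed on the training set and reused for the validation and test sets; the text does not say which convention was followed.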

After data preprocessing, we obtained a cleaned dataset consisting of 840,158 airline fare transaction records. We divided this dataset into three parts: the training set, the validation set, and the test set, with allocations of 80%, 10%, and 10%, respectively. A random selection method was applied to ensure unbiased distribution among these sets. Table 1 below shows the distribution of records across each set after the division. Table 2 further displays a few sample features employed in the prediction model.
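A random 80/10/10 split of this kind might look like the following sketch. The function name and the fixed seed are assumptions added for reproducibility, and integer arithmetic is used so that any rounding remainder falls into the test set; the exact rounding used in the study is not stated.

```python
import random


def split_80_10_10(records, seed=0):
    """Shuffle records and split them into training, validation, and test sets."""
    shuffled = list(records)               # copy so the caller's list is untouched
    random.Random(seed).shuffle(shuffled)  # seeded shuffle (seed is an assumption)
    n_train = len(shuffled) * 8 // 10
    n_val = len(shuffled) // 10
    train = shuffled[:n_train]
    val = shuffled[n_train:n_train + n_val]
    test = shuffled[n_train + n_val:]      # remainder goes to the test set
    return train, val, test
```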

After preparing the airline fare data, our focus was redirected toward the analysis of the customer review data to explore the attitudes and perspectives of the travelers, thereby providing a complete comprehension of the airline sector from both the ticket transactions and the customer opinions. Figure 1 depicts the workflow of processing the customer reviews to extract the aspects and evaluate the sentiment. The workflow contains mainly three stages. In the first stage of tokenization and screening, the raw texts of customer reviews are tokenized into identifiable words or symbols one by one, and the stop words as well as those terms with less impact on the semantics are removed. Table 3 shows two sample reviews before and after tokenization and screening. As can be observed from Table 3, such stop words as “I”, “the”, “and”, “by”, and “it” are removed. Following this, all the cleaned texts are collected and passed further to the stage of aspect extraction, which makes use of a pre-trained BERT model, which is specially customized for aspect-based tasks, to extract the aspect words. Aspect words are essential for the sentiment analysis since the customer opinions may be well categorized according to the aspects of the services. Afterwards, the extracted aspect words are utilized in the stage of sentiment evaluation to obtain a sentiment score for every review according to the aspect words within it.
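The first stage, tokenization and screening, can be sketched as below. The stop-word list here is a tiny illustrative set that includes the examples named above; the actual list and tokenizer used in the study are not given, so both `STOP_WORDS` and `tokenize_and_screen` are hypothetical.

```python
import re

# Illustrative stop words only; the study's actual list is not specified.
STOP_WORDS = {"i", "the", "and", "by", "it", "a", "an", "was", "is", "to", "of"}


def tokenize_and_screen(review):
    """Lowercase a review, split it into word tokens, and drop stop words."""
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", review.lower())
    return [token for token in tokens if token not in STOP_WORDS]
```

The later stages (BERT-based aspect extraction and sentiment evaluation) depend on a pre-trained model and are not reproduced here.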

Figure 2 offers a visual representation of the predominant aspect words extracted from customer reviews, presented through a word cloud. This visualization method not only highlights the relative frequency of each term by adjusting its font size but also serves as an immediate visual summary of passenger priorities and concerns. The prominence of certain aspects within the word cloud indicates their frequent mention across the dataset, suggesting that addressing these areas could significantly enhance customer satisfaction. By focusing on these high-frequency terms, airline service providers can align their service improvements more closely with customer expectations. Furthermore, this word cloud enables quick identification of the most discussed service aspects in passenger reviews, providing actionable insights that are supported by quantitative data analysis. These insights are essential for management to effectively prioritize resource allocation toward areas that will most improve service quality.

The selected sample aspects are summarized in Table 4, each with a concise description of its significance. These aspects were further used in the sentiment evaluation since, from the administrative point of view, they are what customers care about.

Additionally, Figure 3 and Figure 4 display correlation matrices that encompass both historical ticket fare data and various airline service aspects, respectively. Figure 3 focuses on booking and travel-related attributes such as holidays, booking class, and the point of ticket issuance, while Figure 4 delves into service aspects including safety, food, crew interaction, and comfort. Each matrix not only illustrates the strength and direction of relationships between pairs of features but also includes p-values to highlight the statistical significance of these correlations, thereby providing a more robust basis for data interpretation. The p-values are denoted within the matrices using a straightforward notation to simplify understanding: three asterisks (***) indicate a highly significant correlation with a p-value less than 0.0001, two asterisks (**) denote significance at the 0.01 level, one asterisk (*) signifies significance at the 0.05 level, and a dot (.) represents correlations that are not statistically significant.
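To make the notation concrete, the sketch below computes a Pearson correlation coefficient and maps a p-value to the marker scheme described above. It assumes the p-value is obtained separately (for example, from a statistics package); the function names are hypothetical, and the thresholds are the ones stated in the text.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


def significance_marker(p_value):
    """Map a p-value to the asterisk/dot notation used in the matrices."""
    if p_value < 0.0001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return "."
```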

For instance, in Figure 3, the significant positive correlation between actual fare and class of service (r = 0.96 ***) underlines a strong and statistically robust association where higher classes of service generally correspond to higher fares, consistent with market expectations. Conversely, the negative correlation between booking class D and holidays (r = −0.51 **) suggests a decrease in bookings of this class during holiday periods, likely reflecting shifts in consumer preferences or pricing strategies during these times. Additionally, the correlation between the point of ticket issuance and holiday travel (r = 0.86 ***) highlights that tickets issued at specific locations are significantly associated with holiday travel, potentially indicating regional travel trends or the impact of targeted holiday marketing.

Similarly, Figure 4 reveals crucial insights such as the extremely significant correlation between safety and actual fare (r = 0.98 ***), suggesting that perceptions of safety could significantly influence fare decisions. This implies a customer’s willingness to pay a premium for perceived higher safety standards. The correlation between the boarding process and safety (r = 0.91 ***) further supports the notion that efficient boarding procedures enhance passengers’ overall impressions of airline safety. Moreover, the correlation between comfort and class of service (r = 0.87 **) supports the expectation that higher service classes are linked to increased comfort, enriching the customer’s travel experience.

3.3. Aspect-Based Sentiment Analysis

Aspect-based sentiment analysis (ABSA) is an analysis method that extracts specific attributes or features, called aspects, from textual data and independently assesses the sentiment associated with each of them to gain more insights into customer opinions [24]. The field of sentiment analysis categorizes every review into positive, negative, or neutral attitudes, offering a comprehensive approach to understanding customer opinions [25,26,27]. Unlike traditional sentiment analysis, ABSA delves into the specific characteristics of a product or service [28]. For example, in hotel reviews analyzed in the research, aspects such as comfort, price, facility, traffic, and service are frequently discussed. A review might state, “The hotel’s location was incredibly convenient, but the room price was unexpectedly high” [29]. Here, ABSA would identify a positive sentiment toward the aspect of location (convenient) but a negative sentiment toward price (high). This detailed method enables a deeper understanding of customer perspectives, capturing nuanced views that might be overlooked in general sentiment analysis [29]. ABSA is particularly beneficial in scenarios where various product or service aspects significantly influence overall customer perception and satisfaction [30,31].

ABSA starts from pinpointing specific aspects or features mentioned in textual reviews, such as comfort, customer, and other services in the airline industry. After identifying these aspects, ABSA assigns a sentiment polarity to each aspect individually based on the context in which it is mentioned [32,33] instead of labeling an entire review with merely one sentiment score. In this way, ABSA can illuminate those aspects directly influencing the customers’ perceptions and offer a more detailed understanding of areas for enhancement or marketing emphasis [34,35,36].

In predicting airline fares, the typical factors, including season, travel day and month, weekend, holiday, booking class, class of service, and number of stops, are taken into consideration [6,9,18]. However, passenger sentiments on key services may also affect the ticket prices [37,38,39]. With ABSA, airlines can discern the specific service attributes that might determine travelers’ decisions of expenditure [39,40]. By combining this knowledge with historical fare data, the prediction models may become more refined and can capture those aspects of passengers’ experiences that influence fare choices.

In ABSA, each review in the dataset might touch upon multiple aspects of the airline service. The sentiment score for a distinct aspect $a_{i}$ is denoted as $s(a_{i})$; its value ranges from −1 to 1, and its sign denotes the polarity. For a review indexed by $l$, the sentiment score $s_{l}$ is computed as follows:

$$s_{l} = \sum_{a_{i} \in A_{l}} w_{i}\, s(a_{i}) \tag{2}$$

$$s(a_{i}) = n_{+}(a_{i}) - n_{-}(a_{i}) \tag{3}$$

where $A_{l}$ is the set containing all the aspects present in the review, $w_{i}$ is the weight assigned to the aspect $a_{i}$, and $n_{+}(a_{i})$ and $n_{-}(a_{i})$ are the positive and negative evidence for $a_{i}$, respectively.
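Equations (2) and (3) can be illustrated with a short sketch. The aspect names, evidence values, and weights in the usage note are hypothetical, and the code takes the evidence values as given: how they are scaled so that each aspect score stays between −1 and 1 is not specified in the text.

```python
def aspect_score(n_pos, n_neg):
    """Equation (3): positive evidence minus negative evidence for one aspect."""
    return n_pos - n_neg


def review_score(evidence, weights):
    """Equation (2): weighted sum of aspect scores over the aspects in a review.

    evidence maps each aspect found in the review to a (n_pos, n_neg) pair;
    weights maps each aspect to its weight w_i.
    """
    return sum(weights[aspect] * aspect_score(n_pos, n_neg)
               for aspect, (n_pos, n_neg) in evidence.items())
```

For instance, a review with evidence `{"seat": (1.0, 0.0), "food": (0.0, 1.0)}` and weights `{"seat": 0.75, "food": 0.25}` receives a score of 0.75 − 0.25 = 0.5, so the positive opinion on the more heavily weighted aspect dominates.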

3.4. GRU Model

The GRU, a member of the recurrent neural network family, excels at handling sequential regression tasks [14,41]. Its ability to preserve and recall vital data from previous steps in a sequence stands out, mirroring the operational strengths of long short-term memory (LSTM) models [42]. A standout feature of GRU is its swift training pace combined with a diminished risk of overfitting. Such traits become especially beneficial in scenarios with extensive datasets or limited computational resources, as commonly seen in airfare prediction tasks [18].

Additionally, the GRU model excels at detecting long-term dependencies and complex interrelations in time-series data [18,43]. This capability facilitates capturing influential factors such as seasonal variation, holidays, and economic oscillations, which may affect prices over time. Owing to its proficiency in capturing these intricate relationships, the GRU contributes to good airfare prediction performance.

Furthermore, GRU exhibits lower sensitivity to the vanishing gradient problem, which is a common pitfall basic recurrent neural networks (RNNs) often encounter when deciphering long-term dependencies [44]. This resilience makes GRU more robust and suitable for the tasks involving extended sequences, which is a characteristic essential for airfare prediction. Moreover, GRU can be integrated seamlessly with other neural networks or feature engineering methods to improve the prediction precision. Such versatility makes the GRU model appropriate for managing complex prediction tasks. In this paper, the ABSA is incorporated into the GRU-based airfare prediction model since the sentiment scores quantifying customer opinions on various aspects might be correlated to the airfare and could enhance the predictive capability of the model.

Next, the basics of GRU are briefly summarized. Figure 5 shows the internal architecture of a GRU, as formulated below.

$$z_{t} = \sigma(W_{z} x_{t} + U_{z} h_{t-1} + b_{z}) \tag{4}$$

$$r_{t} = \sigma(W_{r} x_{t} + U_{r} h_{t-1} + b_{r}) \tag{5}$$

$$u_{t} = \tanh(W_{h} x_{t} + U_{h} (r_{t} \odot h_{t-1}) + b_{h}) \tag{6}$$

$$h_{t} = z_{t} \odot h_{t-1} + (1 - z_{t}) \odot u_{t} \tag{7}$$

where $\sigma$ is the sigmoid function, normalizing outputs to between 0 and 1, and $\tanh$ is the hyperbolic tangent function, scaling activations to between −1 and 1. $W_{z}$, $W_{r}$, $W_{h}$, $U_{z}$, $U_{r}$, and $U_{h}$ are weight matrices, and $b_{z}$, $b_{r}$, and $b_{h}$ are the corresponding bias vectors, with the subscripts $z$, $r$, and $h$ denoting the update gate, reset gate, and candidate hidden state, respectively. These matrices and vectors are crucial for the gates’ operations and are learned during training. $x_{t}$ is the input vector at time $t$, while $h_{t-1}$ and $h_{t}$ are the previous and current hidden states, respectively. $z_{t}$, $r_{t}$, and $u_{t}$ represent the update gate, reset gate, and candidate hidden state vectors, respectively, and $\odot$ denotes element-wise multiplication.

First, the current input $x_{t}$ and the previous hidden state $h_{t-1}$ are used to compute the update gate and the reset gate, as depicted in Equations (4) and (5), respectively. The sigmoid function $\sigma$ normalizes the outputs of the two gates to the range between 0 and 1. The candidate hidden state $u_{t}$ is then calculated with Equation (6), in which the information from the previous hidden state $h_{t-1}$ and the current input $x_{t}$ is blended, and the result is further normalized with the hyperbolic tangent function, $\tanh$. The reset gate $r_{t}$ determines the fraction of the previous hidden state $h_{t-1}$ to be retained when computing $u_{t}$. Equation (7) finally takes the weighted sum of the previous hidden state $h_{t-1}$ and the candidate state $u_{t}$. The update gate $z_{t}$ is the weight that decides the proportion of $h_{t-1}$ to be kept and the proportion to be replaced by $u_{t}$. Here, $\odot$ denotes element-wise multiplication, and $1 - z_{t}$ is the complement of $z_{t}$. The weighted sum means that the new hidden state $h_{t}$ is a tradeoff between $u_{t}$ and $h_{t-1}$ controlled by $z_{t}$: the share of the previous hidden state $h_{t-1}$ increases as $z_{t}$ approaches 1 and diminishes as $z_{t}$ approaches 0. The gating mechanism in the GRU allows for effective control over the information flowing through the network, and its efficacy has been demonstrated across various tasks.
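A single GRU update following Equations (4)–(7) can be written out directly. The sketch below uses plain Python lists rather than a deep learning library, so it is meant only to make the gating arithmetic concrete; the function names and the `params` dictionary layout are assumptions for illustration, not the implementation used in the study.

```python
from math import exp, tanh


def sigmoid(v):
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + exp(-v))


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector (list of numbers)."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def gru_step(x_t, h_prev, params):
    """Compute h_t from x_t and h_{t-1} following Equations (4)-(7).

    params maps the names "W_z", "U_z", "b_z", "W_r", "U_r", "b_r",
    "W_h", "U_h", "b_h" to matrices (lists of rows) or bias vectors.
    """
    def affine(w, u, b, hidden):
        wx = matvec(params[w], x_t)
        uh = matvec(params[u], hidden)
        return [a + c + d for a, c, d in zip(wx, uh, params[b])]

    z_t = [sigmoid(v) for v in affine("W_z", "U_z", "b_z", h_prev)]  # Eq. (4)
    r_t = [sigmoid(v) for v in affine("W_r", "U_r", "b_r", h_prev)]  # Eq. (5)
    gated = [r * h for r, h in zip(r_t, h_prev)]   # reset gate applied to h_{t-1}
    u_t = [tanh(v) for v in affine("W_h", "U_h", "b_h", gated)]      # Eq. (6)
    # Eq. (7): blend the previous state and the candidate state using z_t.
    return [z * h + (1.0 - z) * u for z, h, u in zip(z_t, h_prev, u_t)]
```

With all parameters set to zero, both gates equal 0.5 and the candidate state is zero, so the new hidden state is exactly half of the previous one. A large positive update-gate bias drives $z_{t}$ toward 1, and the previous hidden state is then carried over almost unchanged.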

Moreover, a series of GRUs with shared parameters can be concatenated and stacked into a network of multiple layers so as to be used for sequence prediction tasks. Figure 6 shows the architecture of such a network, which is typically called the GRU model. In Figure 6, the input comprises a sequence of feature vectors $X_{1}, X_{2}, X_{3}, \ldots, X_{n}$. As a sequence of data flows into the network, each GRU layer processes the sequence, updates the hidden state $h_{t}$ sequentially at every time step $t$, and passes the outputs to the next layer. This stacked configuration allows for capturing complex patterns in time-series data. When used for sequence-to-one prediction, the outputs at the last time step are fed into a fully connected layer, which integrates the learned representations and performs the final prediction for a classification or regression task.

Notably, the GRU model here in this paper is intended to be used for airfare prediction, and the input vector contains the aspect-based sentiment scores actually derived from customer reviews in addition to the flight ticket data. This rich fusion of sentiment analysis and flight ticket information serves as a comprehensive input for airfare prediction, capturing both the influence of public opinions and historical pricing trends.

In this study, we employed an architecture comprising seven GRU layers with the following units: 824, 512, 256, 128, 64, 32, and 16. The rectified linear unit (ReLU) was used as the activation function, and the Adam optimizer was applied with a learning rate set at 0.001. All results presented in this study are based on test data, using a distribution ratio of 80% training set, 10% validation set, and 10% test set. Our experiment was conducted using the Python programming language. We specifically leveraged PyTorch and an NVIDIA GeForce GPU with 12 GB GDDR6X for building and training our deep learning models. The model underwent an extensive training process spanning 1400 epochs with a batch size of 450. Its design was explicitly crafted to optimize the balance between practical learning and computational efficiency. The detailed experimental architecture for the proposed ABSA_GRU flight fare prediction model is presented in Figure 7.

Following its development, the model was evaluated on the held-out test set to assess its efficacy and generalizability. We evaluated the model’s performance using the following metrics: mean absolute error (MAE), root mean squared error (RMSE), and coefficient of determination (R²), as defined in Equations (8)–(10) [45,46]. These metrics were used to compare our model with others so as to verify its performance and gain insights into its strengths and weaknesses in the context of flight fare analysis.

$$\mathrm{MAE} = \frac{1}{N} \sum_{i=1}^{N} |y_{i} - \hat{y}_{i}| \tag{8}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (y_{i} - \hat{y}_{i})^{2}} \tag{9}$$

$$R^{2} = 1 - \frac{\mathrm{RSS}}{\mathrm{TSS}} = 1 - \frac{\sum_{i=1}^{N} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i=1}^{N} (y_{i} - \bar{y})^{2}} \tag{10}$$

The MAE (Equation (8)) is the average absolute difference between predicted and actual values. In this context, $y_{i}$ represents the actual fare, $\hat{y}_{i}$ denotes the predicted fare, and $N$ is the total number of observations. The RMSE (Equation (9)) is the square root of the mean of the squared discrepancies between predicted and actual values. By squaring the errors, RMSE places a higher penalty on larger discrepancies; thus, it is particularly sensitive to significant deviations and serves as a stringent metric of model performance. Finally, $R^{2}$ (Equation (10)) indicates the proportion of variance in the dependent variable (observed flight fares) that is predictable from the independent variables. A higher $R^{2}$ value means that the model accounts for more of the variance, indicating a better fit. Here, the RSS (residual sum of squares) represents the variation left unexplained by the model, whereas the TSS (total sum of squares) is the overall variation of the observed fares around their mean $\bar{y}$.
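The three metrics can be computed directly from their definitions, as in the following minimal sketch. The function names and the toy fare values are invented for illustration and are not taken from the study's data.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error, Equation (8)."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error, Equation (9)."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def r_squared(y_true, y_pred):
    """Coefficient of determination, Equation (10): 1 - RSS / TSS."""
    mean_y = sum(y_true) / len(y_true)
    rss = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    tss = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - rss / tss


# Toy normalized fares, invented purely for illustration.
actual = [0.20, 0.40, 0.60, 0.80]
predicted = [0.25, 0.35, 0.60, 0.90]

print(f"MAE  = {mae(actual, predicted):.4f}")
print(f"RMSE = {rmse(actual, predicted):.4f}")
print(f"R^2  = {r_squared(actual, predicted):.4f}")
```

In this toy example the single larger error (0.10) raises the RMSE above the MAE, which illustrates the heavier penalty that squaring places on large deviations.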

4. Results and Discussion

In this section, we first investigate how ABSA contributes to the prediction of flight ticket fares. This experiment utilized ninety-four input features drawn from both the historical fare data and the airline sentiment data. The historical fare data included features such as travel date, booking class (ranging from A to Z), service class (economy, business, and first class), origin, destination, distance, flight duration, number of stops, and season (summer, winter, autumn, and spring). The service aspects in the sentiment data included announcements, cleanliness, safety, baggage handling, staff interactions, cabin pressure, cancellation policies, security procedures, seat, comfort, crew, ambient noise, gate efficiency, rebooking processes, loyalty program benefits, legroom, air conditioning quality, boarding experience, and in-flight entertainment offerings. Together, these features supply complementary information that may help fare prediction.

However, analyzing so many individual aspects would make the results verbose and make it difficult to obtain an overview of the service quality. To simplify the analysis, the aspects were therefore merged into nine distinct groups. Booking and ticketing addresses the initial stages of a traveler’s journey. Pre-flight procedures involve the steps for boarding. Airport services encompass the range of facilities and operations within the airport. In-flight amenities and services, along with seat and cabin features, cover comfort and entertainment during the flight. The staff category assesses the performance and interaction of airline personnel. Safety and security and cleanliness cover the standards of safety and hygiene, respectively. Lastly, post-flight services and issues deal with aspects of the journey after the aircraft lands, such as baggage claim, post-flight customer service, and handling flight-related issues. The grouping primarily aims to reduce complexity so that the sentiment analysis results are more compact and easier to visualize and interpret. In addition, the aspect groups help to quickly identify which range of service aspects needs attention, so that the airline company can improve service quality more effectively. To provide a clear view of the sentiment data used in our analysis, Table 5 displays the sentiment scores for each aspect group.

Figure 8 shows the yearly ABSA results for the nine aspect groups from 2018 to 2023. The two colors signify the ratios of positive and negative sentiments within the customer reviews for each aspect group. Across the years, the persistence of positive feedback in certain categories, such as safety and security, suggests effective service strategies, while the persistence of negative sentiment in others, such as post-flight services and issues, indicates the need for ongoing attention and enhancement. Depicting the sentiment analysis across consecutive years enables tracking long-term changes in customer perception and could provide insights for strategic planning or operational adjustments aimed at improving the customer journey.
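As an illustration of how such positive and negative ratios are obtained, the sketch below computes them from the aggregate counts in Table 5. Note the assumption: Figure 8 breaks the ratios down by year, whereas Table 5 reports totals only, so this reproduces the overall shares rather than the yearly ones. The function name `sentiment_ratios` is hypothetical.

```python
# (positive, negative) review counts per aspect group, copied from Table 5.
TABLE_5 = {
    "Staff": (18028, 14532),
    "Safety and Security": (10601, 2108),
    "Seat and Cabin Features": (14108, 7478),
    "Pre-flight Procedures": (3562, 3838),
    "Post-flight Services and Issues": (2762, 4854),
    "In-flight Amenities and Services": (9156, 2896),
    "Cleanliness": (3586, 3446),
    "Booking and Ticketing": (3454, 3362),
    "Airport Services": (11528, 12736),
}


def sentiment_ratios(counts):
    """Map each aspect group to its (positive share, negative share)."""
    ratios = {}
    for group, (pos, neg) in counts.items():
        total = pos + neg
        ratios[group] = (pos / total, neg / total)
    return ratios


ratios = sentiment_ratios(TABLE_5)

# List the groups from the most to the least positively reviewed.
for group, (pos, neg) in sorted(ratios.items(), key=lambda kv: kv[1][0], reverse=True):
    print(f"{group:34s} positive={pos:.1%} negative={neg:.1%}")
```

On the aggregate counts, safety and security has the highest positive share and post-flight services and issues the lowest, in line with the pattern described for Figure 8.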

Additionally, sensitivity analysis was conducted for the proposed fare prediction model to investigate the significance of various aspect groups. We utilized the aspect sentiment score dataset presented in Table 5, along with the flight fare dataset from Table 1. The model performs best when all aspects are considered, with RMSE = 0.0071, MAE = 0.0137, and R² = 0.9899. When the safety and security aspect group is omitted, there is a noticeable decrease in model performance, as reflected by rises in RMSE to 0.0672 and MAE to 0.5431 and a substantial drop in R² to 0.6752. This suggests that the safety and security aspect group is critical for accurate fare forecasting. A similar trend is observed when the staff aspect group is excluded: the RMSE increases to 0.0482 and the R² decreases to 0.8142, both smaller changes than for safety and security, while the MAE rises to 0.6887. Excluding seat and cabin features results in an RMSE of 0.0412, an MAE of 0.3128, and an R² of 0.9572, highlighting their significant but lesser influence compared to safety aspects. Excluding cleanliness has the smallest effect on the error metrics, with RMSE = 0.0223, MAE = 0.1145, and R² = 0.9379, indicating a lower yet still noteworthy importance in fare prediction. Table 6 shows the impact of aspect group exclusion on prediction model performance.
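The changes listed in Table 6 follow from subtracting the full-model metrics from the metrics obtained after each exclusion. The sketch below performs that arithmetic on the values quoted in the text; it does not retrain any model, and the names `BASELINE`, `ABLATED`, and `metric_changes` are hypothetical.

```python
# Full-model metrics and metrics after excluding one aspect group,
# as reported in the text.
BASELINE = {"rmse": 0.0071, "mae": 0.0137, "r2": 0.9899}

ABLATED = {
    "Safety and Security": {"rmse": 0.0672, "mae": 0.5431, "r2": 0.6752},
    "Staff": {"rmse": 0.0482, "mae": 0.6887, "r2": 0.8142},
    "Seat and Cabin Features": {"rmse": 0.0412, "mae": 0.3128, "r2": 0.9572},
    "Cleanliness": {"rmse": 0.0223, "mae": 0.1145, "r2": 0.9379},
}


def metric_changes(baseline, ablated):
    """Return (ablated - baseline) for each metric, rounded to 4 decimals.
    Positive values are increases; negative values are reductions."""
    return {
        group: {m: round(vals[m] - baseline[m], 4) for m in baseline}
        for group, vals in ablated.items()
    }


changes = metric_changes(BASELINE, ABLATED)
for group, delta in changes.items():
    print(f"{group:24s} dRMSE={delta['rmse']:+.4f} "
          f"dMAE={delta['mae']:+.4f} dR2={delta['r2']:+.4f}")
```

Ranking the groups by any one column of the result gives a simple importance ordering; the ordering can differ between metrics, so it is worth inspecting all three.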

To further elucidate the impact of ABSA on the precision of predictive models, a comparative study was conducted. This study examined multiple models to determine the extent to which incorporating ABSA can refine flight ticket fare forecasting accuracy. Figure 9 illustrates the MAE for the MLP, LSTM, and GRU models, both with and without ABSA integration. It reveals that incorporating ABSA leads to performance enhancements across all models, with sequence-based models LSTM and GRU showing particularly significant improvements. Notably, the ABSA_GRU model recorded the lowest MAE at 0.0137, underscoring the efficacy of sentiment analysis in capturing critical price-related factors from passenger feedback.

Figure 10 presents the RMSE over 1400 epochs. The ABSA_GRU model consistently outperformed the others, maintaining a low RMSE throughout training, which indicates a stable representation of the data trends. In contrast, the MLP model exhibited a higher RMSE; the comparison highlights the error reduction obtained when ABSA is applied.

An additional experiment assessed the impact of increasing the number of layers in the stacked GRU model, with results depicted in Figure 11. A clear trend emerged: RMSE decreases as more layers are added, with the best performance achieved by the seven-layer configuration. This suggests that additional layers enhance the model’s predictive capabilities.

Further extending the analysis, Figure 12 compares the R² values of the proposed ABSA_GRU model over the same training epochs. The model shows progressive accuracy improvements, with R² values nearing 1 for both training and validation sets. This consistent performance indicates a high reliability in fare prediction when ABSA is utilized, with R² improvements highlighting the substantial benefits of integrating passenger feedback into the predictive models.

Figure 13 further shows how well the ABSA_GRU model predicts flight ticket fares. On the scatter plot, the x-axis shows the actual fares, and the y-axis shows the predicted fares. The closer the data points are to the red diagonal line, the more accurate the predictions are. Most of the model’s predictions lie close to the line, and the deviations from it remain small across low and high fare ranges alike, indicating that the model is generally reliable regardless of fare level. This is consistent with its low RMSE, which reflects a small difference between the predicted and actual fares. The performance metrics for various models are summarized in Table 7.

Furthermore, a comprehensive assessment was carried out across several timeframes to ascertain the effectiveness of the GRU model with ABSA. The evaluations covered prediction horizons ranging from short-term to long-term, namely 7, 14, 21, and 30 days. Table 8 shows the predictive accuracy of the model over these time intervals. The model demonstrates high precision at the 7-day horizon, with an RMSE of 0.0072 and an R² of 0.9769, indicating its accuracy in forecasting flight ticket fares over one week. At the 14-day horizon, the RMSE remains low at 0.0078, although the R² decreases to 0.9181, a slight decrease in accuracy for medium-term predictions. Over 21 days, the model begins to show instances of deviation, but most predictions still align closely with the actual values. The increase in RMSE to 0.0671 and the decrease in R² to 0.8246 highlight the growing impact of external and inherent uncertainties on prediction accuracy. For the 30-day interval, the model exhibits more noticeable variations in its predictions, particularly for higher fares. The RMSE of 0.0881 and R² of 0.8143 indicate the challenges in accurately predicting fares over a longer period. Despite the drop in performance over extended periods, the model still demonstrates a considerable degree of predictive capability. Figure 14 compares the actual and predicted fares for each of these time intervals.

5. Conclusions

This study pioneers the integration of ABSA with advanced deep-learning models for predicting flight ticket fares. By introducing a novel feature fusion approach that combines historical price data with customer review data, our research offers a unique perspective on fare prediction. This methodology utilizes timeseries patterns of air ticket prices and incorporates the often-overlooked sentiment trends from passenger evaluations, providing a more comprehensive analysis.

Our findings underscore that integrating ABSA with the GRU model significantly enhances its performance compared to conventional models, emphasizing its practical applicability in aviation pricing strategies. This synergy between sentiment analysis and deep learning showcases a novel approach to fare prediction, contributing to the body of knowledge in this domain. The study also highlights the importance of considering external information sources, such as customer sentiments, in fare prediction models. Traditional models often overlook these aspects, yet our research demonstrates their significant impact on the accuracy of fare predictions.

Acknowledging the challenges in long-term fare prediction, our model opens avenues for future research, especially in improving accuracy over extended periods. Understanding the influence of external factors, such as global events and promotional activities, is crucial to improving long-term forecasting.

In summary, this research marks a significant step forward in integrating sentiment analysis and deep learning for predicting flight ticket fares. The insights gained from this study can inform airlines’ pricing strategies, potentially leading to increased profitability and enhanced passenger satisfaction. The research’s ability to incorporate a wide range of service aspects, as evidenced in our comprehensive assessment, underscores its potential as a transformative factor in the aviation industry.

Author Contributions

W.A.D., conceptualization, methodology, software, validation, formal analysis, investigation, visualization, and writing—original draft; B.-S.L., methodology, conceptualization, validation, supervision, resources, and writing—review and editing. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data are not publicly available due to privacy concerns regarding the use of flight fare transaction data.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. Kalampokas, T.; Tziridis, K.; Kalampokas, N.; Nikolaou, A.; Vrochidou, E.; Papakostas, G.A. A Holistic Approach on Airfare Price Prediction Using Machine Learning Techniques. IEEE Access 2023, 11, 46627–46643.
2. Buyruk, M.; Güner, E. Personalization in airline revenue management: An overview and future outlook. J. Revenue Pricing Manag. 2022, 21, 129–139.
3. Badanik, B.; Remenysegova, R.; Kazda, A. Sentimental Approach to Airline Service Quality Evaluation. Aerospace 2023, 10, 883.
4. Wang, T.; Pouyanfar, S.; Tian, H.; Tao, Y.; Alonso, M.; Luis, S.; Chen, S.-C. A framework for airfare price prediction: A machine learning approach. In Proceedings of the 2019 IEEE 20th International Conference on Information Reuse and Integration for Data Science (IRI), Los Angeles, CA, USA, 30 July–1 August 2019.
5. Samunderu, E.; Farrugia, M. Predicting customer purpose of travel in a low-cost travel environment—A Machine Learning Approach. Mach. Learn. Appl. 2022, 9, 100379.
6. Branda, F.; Marozzo, F.; Talia, D. Ticket Sales Prediction and Dynamic Pricing Strategies in Public Transport. Big Data Cogn. Comput. 2020, 4, 36.
7. Subramanian, R.R.; Murali, M.S.; Deepak, B.; Deepak, P.; Reddy, H.N.; Sudharsan, R.R. Airline Fare Prediction Using Machine Learning Algorithms. In Proceedings of the 2022 4th International Conference on Smart Systems and Inventive Technology (ICSSIT), Tirunelveli, India, 20–22 January 2022; pp. 877–884.
8. Hu, Y.; Li, J.; Ran, L. Dynamic pricing for airline revenue management under passenger mental accounting. Math. Probl. Eng. 2015, 2015, 836434.
9. Abdella, J.A.; Zaki, N.M.; Shuaib, K.; Khan, F. Airline ticket price and demand prediction: A survey. J. King Saud Univ.-Comput. Inf. Sci. 2021, 33, 375–391.
10. Wang, Z.; Han, X.; Chen, Y.; Ye, X.; Hu, K.; Yu, D. Prediction of Willingness to Pay for Airline Seat Selection Based on Improved Ensemble Learning. Aerospace 2022, 9, 47.
11. Filieri, R. What makes an online consumer review trustworthy? Ann. Tour. Res. 2016, 58, 46–64.
12. Sezgen, E.; Mason, K.J.; Mayer, R. Voice of airline passenger: A text mining approach to understand customer satisfaction. J. Air Transp. Manag. 2019, 77, 65–74.
13. Xu, X.; Liu, W.; Gursoy, D. The impacts of service failure and recovery efforts on airline customers’ emotions and satisfaction. J. Travel Res. 2019, 58, 1034–1051.
14. Birjali, M.; Kasri, M.; Beni-Hssane, A. A comprehensive survey on sentiment analysis: Approaches, challenges and trends. Knowl.-Based Syst. 2021, 226, 107134.
15. Aldayel, M.; Ykhlef, M. A new sentiment case-based recommender. IEICE Trans. Inf. Syst. 2017, 100, 1484–1493.
16. Hu, Y.-H.; Chen, Y.-L.; Chou, H.-L. Opinion mining from online hotel reviews—A text summarization approach. Inf. Process. Manag. 2017, 53, 436–449.
17. Li, C.; Qian, G. Stock Price Prediction Using a Frequency Decomposition Based GRU Transformer Neural Network. Appl. Sci. 2022, 13, 222.
18. Degife, W.A.; Lin, B.-S. Deep-Learning-Powered GRU Model for Flight Ticket Fare Forecasting. Appl. Sci. 2023, 13, 6032.
19. Tuli, M.; Singh, L.; Tripathi, S.; Malik, N. Prediction of Flight Fares Using Machine Learning. In Proceedings of the 2023 13th International Conference on Cloud Computing, Data Science & Engineering (Confluence), Noida, India, 19–20 January 2023.
20. Prasath, S.N.; Kumar, M.S.; Eliyas, S. A Prediction of Flight Fare Using K-Nearest Neighbors. In Proceedings of the 2022 2nd International Conference on Advance Computing and Innovative Technologies in Engineering (ICACITE), Greater Noida, India, 28–29 April 2022; pp. 1347–1351.
21. Tziridis, K.; Kalampokas, T.; Papakostas, G.A.; Diamantaras, K.I. Airfare prices prediction using machine learning techniques. In Proceedings of the 2017 25th European Signal Processing Conference (EUSIPCO), Kos, Greece, 28 August–2 September 2017.
22. Vu, V.H.; Minh, Q.T.; Phung, P.H. An airfare prediction model for developing markets. In Proceedings of the 2018 International Conference on Information Networking (ICOIN), Opatija, Croatia, 21–25 May 2018.
23. Chen, Y.; Cao, J.; Feng, S.; Tan, Y. An ensemble learning based approach for building airfare forecast service. In Proceedings of the 2015 IEEE International Conference on Big Data (Big Data), Santa Clara, CA, USA, 29 October–1 November 2015.
24. Mowlaei, M.E.; Abadeh, M.S.; Keshavarz, H. Aspect-based sentiment analysis using adaptive aspect-based lexicons. Expert Syst. Appl. 2020, 148, 113234.
25. Cambria, E.; Poria, S.; Gelbukh, A.; Thelwall, M. Sentiment analysis is a big suitcase. IEEE Intell. Syst. 2017, 32, 74–80.
26. Ligthart, A.; Catal, C.; Tekinerdogan, B. Systematic reviews in sentiment analysis: A tertiary study. Artif. Intell. Rev. 2021, 54, 4997–5053.
27. Young, T.; Hazarika, D.; Poria, S.; Cambria, E. Recent trends in deep learning based natural language processing. IEEE Comput. Intell. Mag. 2018, 13, 55–75.
28. Do, H.H.; Prasad, P.W.; Maag, A.; Alsadoon, A. Deep learning for aspect-based sentiment analysis: A comparative review. Expert Syst. Appl. 2019, 118, 272–299.
29. Han, C.-Z.; Lin, B.-S. A hybrid model of tensor factorization and sentiment utility logistic model for trip recommendation. In Proceedings of the 2018 1st IEEE International Conference on Knowledge Innovation and Invention (ICKII), Jeju, Republic of Korea, 23–27 July 2018.
30. Liang, B.; Su, H.; Gui, L.; Cambria, E.; Xu, R. Aspect-based sentiment analysis via affective knowledge enhanced graph convolutional networks. Knowl.-Based Syst. 2022, 235, 107643.
31. Sun, C.; Huang, L.; Qiu, X. Utilizing BERT for aspect-based sentiment analysis via constructing auxiliary sentence. arXiv 2019, arXiv:1903.09588.
32. Wang, K.; Shen, W.; Yang, Y.; Quan, X.; Wang, R. Relational graph attention network for aspect-based sentiment analysis. arXiv 2020, arXiv:2004.12362.
33. Hoang, M.; Bihorac, O.A.; Rouces, J. Aspect-based sentiment analysis using bert. In Proceedings of the 22nd Nordic Conference on Computational Linguistics, Turku, Finland, 30 September–2 October 2019.
34. Siering, M.; Deokar, A.V.; Janze, C. Disentangling consumer recommendations: Explaining and predicting airline recommendations based on online reviews. Decis. Support Syst. 2018, 107, 52–63.
35. Tubishat, M.; Idris, N.; Abushariah, M.A. Implicit aspect extraction in sentiment analysis: Review, taxonomy, oppportunities, and open challenges. Inf. Process. Manag. 2018, 54, 545–563.
36. Chatterjee, S. Explaining customer ratings and recommendations by combining qualitative and quantitative user generated contents. Decis. Support Syst. 2019, 119, 14–22.
37. Monika, R.; Deivalakshmi, S.; Janet, B. Sentiment analysis of US airlines tweets using LSTM/RNN. In Proceedings of the 2019 IEEE 9th International Conference on Advanced Computing (IACC), Tiruchirappalli, India, 13–14 December 2019.
38. Hasib, K.M. Sentiment Analysis on Bangladesh Airlines Review Data Using Machine Learning. Ph.D. Thesis, BRAC University, Dhaka, Bangladesh, 2022.
39. Farzadnia, S.; Vanani, I.R. Identification of opinion trends using sentiment analysis of airlines passengers’ reviews. J. Air Transp. Manag. 2022, 103, 102232.
40. Tsafarakis, S.; Kokotas, T.; Pantouvakis, A. A multiple criteria approach for airline passenger satisfaction measurement and service quality improvement. J. Air Transp. Manag. 2018, 68, 61–75.
41. Song, H.; Choi, H. Forecasting stock market indices using the recurrent neural network based hybrid models: CNN-LSTM, GRU-CNN, and ensemble models. Appl. Sci. 2023, 13, 4644.
42. Yurtsever, M. Gold Price Forecasting Using LSTM, Bi-LSTM and GRU. Eur. J. Sci. Technol. 2021, 1, 341–347.
43. Almuammar, M.; Fasli, M. Deep learning for non-stationary multivariate time series forecasting. In Proceedings of the 2019 IEEE International Conference on Big Data (Big Data), Los Angeles, CA, USA, 9–12 December 2019.
44. Ahmed, S.F.; Alam, M.S.B.; Hassan, M.; Rozbu, M.R.; Ishtiak, T.; Rafa, N.; Mofijur, M.; Shawkat Ali, A.; Gandomi, A.H. Deep learning modelling techniques: Current progress, applications, advantages, and challenges. Artif. Intell. Rev. 2023, 56, 13521–13617.
45. Wu, J.-H.; Lin, B.-S. Salinity analysis based on multivariate nonlinear regression for web-based visualization of oceanic data. Terr. Atmos. Ocean. Sci. 2022, 33, 6.
46. Ometov, A.; Mezina, A.; Nurmi, J. On Applicability of Imagery-Based CNN to Computational Offloading Location Selection. IEEE Access 2022, 11, 2433–2444.

Figure 1. Workflow for customer review processing, aspect extraction, and sentiment scoring.

Figure 2. Word cloud visualization of predominant aspects in customer review.

Figure 3. Correlation matrix of some selected features with statistical significance levels in historical data.

Figure 4. Correlation matrix of airline service aspects and actual flight ticket fare with statistical significance indicators.

Figure 5. Internal structure of a GRU unit.

Figure 6. The stacked network structure of the GRU model.

Figure 7. Experimental structure of the hybrid ABSA_GRU flight fare prediction model.

Figure 8. Temporal distribution of positive and negative customer sentiments by aspects group.

Figure 9. Mean absolute error (MAE) across different model architectures.

Figure 10. Comparative analysis of root mean square error (RMSE) across different model architectures over 1400 epochs.

Figure 11. Performance of RMSE for stacked-GRU model across seven layers.

Figure 12. Coefficient of determination (R²) of the proposed model.

Figure 13. Scatter plot points of actual vs. predicted flight ticket fares.

Figure 14. Comparison of actual vs. predicted flight ticket fares using the ABSA_GRU model in different prediction timeframes.

Table 1. Distribution of airline fare transactions across dataset splits.

Dataset	Records	Percentage
Training set	672,126	80%
Validation set	84,016	10%
Test set	84,016	10%

Table 2. A few sample features employed in the flight fare forecasting model.

Feature	Description	Possible Values
Travel Date	Date of the scheduled flight	YY/MM/DD
Booking Class	Code representing fare type and associated restrictions	A, B, C, D, … Z
Class of Service	Level of service for the flight	Economy, Business, First class
Seg Orig.	The starting point of the travel segment	Airport’s code
Seg Dest.	The endpoint of the travel segment	Airport’s code
Distance	Distance between the departure and arrival airports	Miles/Kilometers
Season	Time of year, such as summer, winter, autumn, or spring, when travel occurs	Summer/Winter/Autumn/Spring
Point of Ticket Issuance	The location where the ticket is purchased	Location/Office/Platform name
Luggage	Policies on allowed luggage	Yes/1–2, No/0
Overnight Flight	Whether the flight travels overnight	Yes/No
Actual Fare	Final ticket cost	USD

Table 3. Sample reviews from the dataset before and after tokenization and screening.

Review Text	Cleaned Text after Tokenization and Screening
I appreciate the good and warm services provided by every staff ground and cabin. They make this my journey comfortable and relaxing. The money is worth it, the food and beverages were good too. The flight time is right on schedule.	[‘appreciate’, ‘good’, ‘warm’, ‘services’, ‘provided’, ‘every’, ‘staff’, ‘ground’, ‘cabin’, ‘make’, ‘journey’, ‘comfortable’, ‘relaxing’, ‘money’, ‘worth’, ‘food’, ‘beverages’, ‘good’, ‘flight’, ‘time’, ‘right’, ‘schedule’]
Overall a good quality airline but not at the front. Flew A380 lower deck, aisle rear seating, reasonable experience, very smooth flight. Cabin clean and tidy, staff and service good, ground service also good. Onboard entertainment below par, selection reasonable but not up with some, food ordinary to average only, leg space and seating on lower level below average—tight between seats-limited leg/knee room. Having experienced the A380 I now prefer the Boeing 777 or 340. I spoke to friends who have flown upstairs on the A380 and thought it to be better option.	[‘Overall’, ‘good’, ‘quality’, ‘airline’, ‘front’, ‘Flew’, ‘A380’, ‘lower’, ‘deck’, ‘aisle’, ‘rear’, ‘seating’, ‘reasonable’, ‘experience’, ‘smooth’, ‘flight’, ‘Cabin’, ‘clean’, ‘tidy’, ‘staff’, ‘service’, ‘good’, ‘ground’, ‘service’, ‘also’, ‘good’, ‘Onboard’, ‘entertainment’, ‘par’, ‘selection’, ‘reasonable’, ‘food’, ‘ordinary’, ‘average’, ‘leg’, ‘space’, ‘seating’, ‘lower’, ‘level’, ‘average’, ‘tight’, ‘seats’, ‘limited’, ‘leg’, ‘knee’, ‘room’, ‘experienced’, ‘A380’, ‘prefer’, ‘Boeing’, ‘777’, ‘340’, ‘spoke’, ‘friends’, ‘flown’, ‘upstairs’, ‘A380’, ‘thought’, ‘better’, ‘option’]

Table 4. Sample of aspects with descriptions.

Aspect	Description
Safety	Protocols and measures implemented to ensure the safety and security of passengers.
Seat	Options available for seat selection, potentially encompassing aspects such as legroom, view, and proximity to amenities.
Luggage	Policies governing the types and amounts of luggage permissible, including distinctions between checked and carry-on items.
Comfort	Features enhancing passengers’ physical ease and relaxation, such as seat quality, cabin temperature, and spacing.
Food	Types and quality of meals and snacks offered to passengers, which may vary by flight duration and class of service.
Airport	Conditions and services at the originating and destination airports, including cleanliness and accessibility.
Baggage	A comprehensive term for all items passengers transport, including checked and carry-on luggage.
Turbulence	Unexpected aircraft movements due to atmospheric conditions, which can affect perceived safety and comfort.
Customer service	The quality of engagement between airline staff and passengers, including responsiveness to inquiries and resolution of issues.
Upgrades	Options for improving one’s travel experience, such as moving to a higher class of service, usually for an additional fee or as a complimentary service.
Staff	All personnel employed by the airline who have direct or indirect interactions with passengers, including flight and ground crew.
Boarding	The efficiency and organization of the process by which passengers enter the aircraft.
Entertainment	The range of multimedia options available to passengers during flight, including but not limited to audio, video, and internet services.
Gate	The area in the airport terminal where passengers wait to board their flight, which affects the overall travel experience.
Delays	Failure of a flight to adhere to its scheduled departure or arrival time.
Refund	Policies and processes governing the return of fare in the event of service failure or cancellation.
Announcement	Clarity and frequency of in-flight and airport announcements providing operational or emergency information.

Table 5. Aspects group sentiment score.

Aspect Group	Positive	Negative	Training Set	Validation Set	Test Set
Staff	18,028	14,532	26,048	3256	3256
Safety and Security	10,601	2108	10,167	1270	1272
Seat and Cabin Features	14,108	7478	17,269	2159	2159
Pre-flight Procedures	3562	3838	5920	740	740
Post-flight Services and Issues	2762	4854	6094	762	762
In-flight Amenities and Services	9156	2896	9642	1205	1205
Cleanliness	3586	3446	5626	703	703
Booking and Ticketing	3454	3362	5453	682	682
Airport Services	11,528	12,736	19,411	2426	2426

Table 6. Sensitivity analysis of the prediction model across customer service aspect groups.

Aspect Group Excluded	Change in RMSE	Change in MAE	Change in R²
Safety and Security	Increased by 0.0601	Increased by 0.5294	Reduced by 0.3147
Staff	Increased by 0.0411	Increased by 0.6750	Reduced by 0.1757
Seat and Cabin Features	Increased by 0.0341	Increased by 0.2991	Reduced by 0.0327
Cleanliness	Increased by 0.0152	Increased by 0.1008	Reduced by 0.0520

Table 7. Summary of performance metrics for different models.

Model	RMSE	MAE	R²
MLP	0.0582	0.7431	0.6991
MLP with ABSA	0.0544	0.7087	0.7644
LSTM	0.0341	0.5989	0.9053
LSTM with ABSA	0.0312	0.3684	0.9289
GRU	0.0221	0.1384	0.9463
ABSA_GRU	0.0071	0.0137	0.9899

Table 8. Performance of ABSA_GRU model over various timeframes.

Time Interval	RMSE	R²
7 days	0.0072	0.9769
14 days	0.0078	0.9181
21 days	0.0671	0.8246
30 days	0.0881	0.8143

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
