Strategic Decisions in Corporate Travel: Optimization Through Decision Trees

Zarate-Carbajal, Jose-Mario; Ruiz-Cruz, Riemann; Sánchez-Torres, Juan Diego

doi:10.3390/math12233741

by Jose-Mario Zarate-Carbajal †, Riemann Ruiz-Cruz *,† and Juan Diego Sánchez-Torres

Departamento de Matemáticas y Física, Instituto Tecnológico y de Estudios Superiores de Occidente, 8585 Periférico Sur Manuel Gómez Morín, Tlaquepaque 45604, Mexico

* Author to whom correspondence should be addressed.

† These authors contributed equally to this work.

Mathematics 2024, 12(23), 3741; https://doi.org/10.3390/math12233741

Submission received: 29 October 2024 / Revised: 12 November 2024 / Accepted: 21 November 2024 / Published: 28 November 2024

Abstract

Global corporations frequently grapple with a dilemma between fulfilling business needs and adhering to travel policies to mitigate excessive fare expenditures. This research examines the multifaceted nature of business travel, delving into its key characteristics and the inherent complexities faced by management in formulating effective policies. An optimal travel policy must both be practical to implement and contribute to budget optimization. The specific requirements of each company necessitate tailored policies; for instance, a manufacturing company with scheduled trips demands a distinct policy, unlike a consulting firm with unplanned travel. This study proposes a modified regression decision tree machine learning algorithm to incorporate the unique features of corporate travel policies. Our algorithm is designed to self-adjust based on the specific data of each individual company. The authors implement the proposed approach using travel data from a real-world company and conduct simulations in various scenarios, comparing the results with the industry standard. This research offers a machine-learning-based approach to determining the optimal advance booking policy for corporate travel.

Keywords: corporate travel policy; decision tree; machine learning

MSC: 90B50

1. Introduction

In global businesses, travel spending has a positive trend despite numerous economic downturns, including the Dot-Com (2000), Subprime Mortgage (2008), and COVID-19 (2020) crises. Travel expenses have returned to pre-pandemic levels and continue to rise, as stated in [1]. With an increasing number of companies going global, employee trips are becoming a business necessity. The market size of business travel reached USD 1.29 trillion in 2019 [2].

The adoption of communication technology by companies has had a limited effect on business air travel [3]. Travel and face-to-face meetings are the most effective way of doing business [4]. Business travel is sometimes associated with social status and has been perceived as an employee benefit [5]. The business travel industry provides exclusive services, as noted by [6]; examples of such services include frequent-flyer programs, loyalty schemes, high-speed internet, special meals, upgraded rooms, and more—benefits that are typically not available to regular travelers [7]. Business travel expenses escalate rapidly if not managed properly; the authors of [8] emphasized the importance of having a written framework to regulate business trips and manage expenses.

Business trips have created their own niche market that tourism companies have capitalized on. It started with the deregulation of airfares, allowing airlines to offer the same flight ticket at different rates based on supply and demand. American Airlines pioneered the implementation of a revenue management system; the basic concept of revenue management systems is finding the right price for the right customer at the right time. Tourism companies started to charge more for tickets with higher flexibility. A ticket booked within the same week of the flight is more expensive than one booked well in advance; as a result, it is almost impossible for two passengers to pay the same price for a given flight. This situation presents a management challenge in defining a company’s optimal travel policy.

Corporate travel policies mitigate the impact of revenue management systems while satisfying business requirements. More than a budgeting constraint, the travel policy is an optimization practice where the companies want to obtain the most benefits from each dollar spent on travel. The purpose of a travel policy is to keep the cost of corporate travel within predictable and realistic parameters and to save money for the organization [9,10,11].

The advance booking parameter is the primary factor for controlling expenses in business tourism [11,12]. Lodging, meals, and ground transportation are typically arranged after flights; hence, the key determinant of a corporate travel policy is the number of days in advance to book a flight. The advance booking parameter in the travel policy must be practical and tailored to a company’s specific needs. Travel agencies have become increasingly common in global companies for providing business travel services [13]. Companies delegate control, management, and policy rules to travel agencies. Travel agencies often formulate advance booking policies based on generalized trends but lack a clearly defined methodological framework. According to the authors’ observations across various global and specialized travel agencies, the industry standard (IS) for advance booking policies is set at 15 days for domestic flights and 21 days for international flights, as indicated in [14,15,16]. This generalized approach presents a paradox, as travel policies should ideally be tailored to the specific requirements of individual companies. For instance, the advance booking policy for a consulting firm may necessitate different considerations than that of a manufacturing entity. Consequently, there exists an opportunity to develop a quantitative model that enables the formulation of customized advance booking travel policies, thereby addressing the distinct needs of individual organizations.

The existing literature on revenue management models that are applicable to corporate travel is extensive; however, there is a notable lack of quantitative methodologies that can assist management in securing optimal advance bookings to fulfill corporate objectives. This study aims to investigate the following hypotheses:

Hypothesis 1.

A company’s optimal advance booking policy should be dynamic, necessitating periodic adjustments rather than adherence to a static policy.

Hypothesis 2.

The effectiveness of advance booking policies depends on a company’s mix of scheduled and unscheduled trips. A company with a higher proportion of unscheduled travelers requires a shorter advance booking period than one that primarily manages scheduled trips.

The historical trips of a company contain the characteristics and behavior of its travel needs; this research investigates a machine learning methodology to establish optimal travel policies from a corporate perspective, unlike traditional approaches in the tourism industry. The regression decision tree algorithm effectively leverages these historical data, ensuring that the recommended advance booking policy is tailored to the specific requirements of each individual company while preserving interpretability. By modifying the regression decision tree model, we propose a novel approach to determining the ideal book-in-advance policy considering the characteristics of corporate travel. This optimization aims to enhance fare certainty or increase the likelihood of securing fares below the average. While previous studies [17,18,19,20,21] have primarily focused on improving the prediction accuracy or reducing tree complexity, our work is novel in concentrating on the split function and its threshold. This threshold signifies the optimal book-in-advance value. This study contributes to the existing literature by providing management with a quantitative framework for determining the ideal book-in-advance parameter using modifications to the regression decision tree model.

2. Preliminary Review

2.1. Booking Traveler Problem

The core business and management strategies dictate business trips, as each company has unique needs, locations, tax tactics, engineering developments, customer distribution, sales projects, etc. Generally, business travelers prioritize suitable working conditions and aim to minimize their time away from home [22]. Traveling employees usually do not travel during holidays or vacation periods, making the seasonal factor less relevant; this contrasts with tourist travelers, who prioritize cost, planning their trips well in advance and with more extended periods of stay [23]. The majority of the expenses in business tourism are associated with air tickets and hotel accommodations; it is no coincidence that these sectors were pioneers in revenue management systems [24].

Most of the methodologies in the existing literature study the phenomenon from the perspective of the tourism industry and how the sector benefits from business travelers. The literature on revenue management systems is abundant: The authors of [25] proposed a decision system for an airline booking system to reject tourist travelers and accept more business travelers based on the book-in-advance parameter, and the authors of [26] developed a set of rules to accept bookings based on expected revenue. The authors of [27] considered dynamic pricing for hotels, and the authors of [28] tested revenue systems in hotels. The authors of [29] incorporated the overbooking factor by taking last-minute customer no-shows into account. The authors of [30,31,32,33,34] extended revenue management theory with simulations and overbooking and cancellation policies, and the authors of [35] considered the stochastic factor. The authors of [36] used machine learning techniques to predict the cancellation of bookings, and the authors of [37] proposed a self-adjusted optimization function.

Revenue management techniques have evolved to take advantage of business tourism by developing advanced quantitative approaches; this differs from the research that examines business tourism from the corporate perspective, which primarily focuses on qualitative methods. The authors of [12,38,39] analyzed the relation between policy controls and employee commitment, and the authors of [40] provided a complete overview of airline economics. The authors of [41] extended the fairness of air ticket prices created by revenue management systems. The authors of [42] developed a methodology to evaluate the efficiency of a travel department across multiple factors.

The dilemmas and violation factors in travel policy are discussed in [43,44]; there is a continuous conflict between travel budget controls and business strategies, which directly impacts travel policy compliance. In any company, booking an expensive flight for the following day to negotiate a 100-million-dollar deal may be acceptable; however, excessively restrictive travel policies can become challenging to implement, potentially leading to a decline in compliance rates [13,44,45]. Within a company, we can divide business travelers into two groups:

Scheduled travelers;
Unscheduled travelers.

Unscheduled travelers, such as executives, directors, salespersons, and consultants, typically have their trips driven by external factors such as customers, campaigns, and deals. It is generally acceptable for this group to book trips without prior planning. Conversely, scheduled travelers are more often found in departments such as those of engineering, finance, and operations. The business purpose of this group is usually related to internal activities that are planned well in advance [44]. The composition of scheduled and unscheduled employees varies across different companies.

The advance booking parameter is the foremost determinant for managing costs [11,12]; however, establishing a book-in-advance rule solely based on cost is impractical. While all employees strive to comply with the advance booking policy, exceptions are generally accepted [46,47]. The primary goals of a travel policy are to manage and optimize the budget. The advance booking parameter directly influences the prices paid by travelers, and it is the most critical factor in a travel policy. An advance booking policy should fall within a feasible time frame that aligns with the company’s travel requirements.

2.2. Machine Learning Model Selection

The core of machine learning is its capacity to continuously adapt by learning from new data, enabling it to update outputs in response to the latest information; therefore, a machine learning algorithm is suitable to capture the characteristics and behavior of each company by learning historical trips to establish the optimal book-in-advance policy.

An effective travel policy must be simple, straightforward, and supported by comprehensible procedures. The optimal book-in-advance parameter represents the division between scheduled and unscheduled travelers; the optimal parameter is a threshold, not a prediction.

Traditional regressive machine learning methods such as linear regression, the support vector machine [48], neural networks [49], improvements of these methods [50,51], the ensemble method of bagged trees [52], random forests [53], and even more advanced and recent techniques such as the gradient-boosting machine (GBM) [54], XGBoost, and flexible EHD pumps [55] work as black boxes, and their primary focus is on prediction. None of these methods provide a comprehensive optimal threshold for use as a book-in-advance parameter. Additionally, the models are not easy to interpret because of their complexity.

Machine learning methods for classification, including logistic regression, support vector machines [48], neural networks [56], and decision trees [57], as well as their ensemble improvements, primarily focus on predicting class labels. Although these supervised methods offer thresholds as part of their output, they rely on prior knowledge of the optimal booking parameters, which is the problem to be solved. A possible adjustment that can be made to use these models is to test a different set of initial values, but the initial values introduce bias in the thresholds because the models would focus on predicting the classes provided; a possible calibration using a technique such as the receiver operating characteristic (ROC) may decrease the bias, but this would increase the complexity of the methods, thus defeating the objective of keeping the method simple. For these reasons, we consider this family of machine learning methods unsuitable for finding the optimal book-in-advance parameters.

We explored unsupervised machine learning methodologies such as the nearest neighbor [58], K-means [59], hierarchical clustering [60], and Gaussian mixture models [61]. These algorithms do not provide a direct threshold; the division of classes is inferred. For instance, setting K-means for two clusters produces two centroids. The class division is the midpoint between them: half the difference between the centroids, added to the lower centroid or subtracted from the higher one. Although this algorithm family provides a threshold that can be interpreted as an optimal book-in-advance parameter, it is not possible to integrate the characteristics of the corporate travel policy to reduce costs or increase the certainty of the travel budget.
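The centroid-midpoint rule described above can be illustrated with a minimal one-dimensional sketch. The function name `two_means_threshold`, the min/max initialisation, and the sample booking days are assumptions made for illustration; they are not taken from the study.

```python
def two_means_threshold(values, iterations=100):
    """1-D two-cluster k-means; returns (low_centroid, high_centroid, midpoint)."""
    # Initialise the centroids at the extremes (an illustrative choice).
    low, high = min(values), max(values)
    for _ in range(iterations):
        # Assign each value to its nearest centroid (ties go to the lower one).
        a = [v for v in values if abs(v - low) <= abs(v - high)]
        b = [v for v in values if abs(v - low) > abs(v - high)]
        new_low = sum(a) / len(a)
        new_high = sum(b) / len(b) if b else high
        if (new_low, new_high) == (low, high):
            break  # assignments no longer change
        low, high = new_low, new_high
    # Class division: lower centroid plus half the centroid difference.
    midpoint = low + (high - low) / 2
    return low, high, midpoint


# Hypothetical days booked in advance for eight trips.
days = [1, 2, 3, 4, 20, 22, 25, 30]
low, high, threshold = two_means_threshold(days)
print(low, high, threshold)
```

The resulting midpoint depends only on the distribution of booking days, which illustrates the limitation noted above: fares play no role in where the division falls.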

Generative AI has recently been widely used in multiple areas; the purpose is to create new data and content. Its usage would contribute to the overall discussion and dilemmas in the area of corporate travel from the perspective of augmentation. However, we do not see a direct application in the optimal solution of the book-in-advance problem.

We propose a regressive decision tree with Iterative Dichotomizer 3 (ID3) [62]. The decision tree framework offers a systematic approach to developing decision policies by constructing decision paths based on thresholds that optimize a split function; this approach provides a powerful and interpretable machine learning model. By modifying the split function, we integrate the corporate travel needs, and the exhaustive search in the predictor variable ensures that all scenarios are evaluated; the key benefits include simplicity, straightforward threshold estimation, and minimal data preprocessing requirements.

2.3. Regression Decision Tree

The regression decision tree model predicts a variable $Y$ given some variables $X$ within a dataset $S$ ($S = \{Y, X\}$). The regressive tree is a data-based model, and the outcomes produced by the algorithm depend on a one-to-one relation between $Y$ and the values of $X$. The extended notation for $Y = F(X)$ is

$$\begin{bmatrix} y_{1} \\ y_{2} \\ y_{3} \\ \vdots \\ y_{i} \\ y_{n} \end{bmatrix} = F\left(\begin{bmatrix} x_{1,1} & x_{1,2} & x_{1,3} & x_{1,j} & \dots & x_{1,m} \\ x_{2,1} & x_{2,2} & x_{2,3} & x_{2,j} & \dots & x_{2,m} \\ x_{3,1} & x_{3,2} & x_{3,3} & x_{3,j} & \dots & x_{3,m} \\ \vdots & \vdots & \vdots & \vdots & \ddots & \vdots \\ x_{i,1} & x_{i,2} & x_{i,3} & x_{i,j} & \dots & x_{i,m} \\ x_{n,1} & x_{n,2} & x_{n,3} & x_{n,j} & \dots & x_{n,m} \end{bmatrix}\right), \tag{1}$$

where $i, j$ represents an element in the list, $n$ is the number of samples in the data, and $m$ is the number of predictor variables.

For simplicity, let us consider a single variable $X$ ($m = 1$), $X = \{x_{1,1}, x_{2,1}, x_{3,1}, \dots, x_{i,1}, x_{n,1}\} := \{x_{1}, x_{2}, x_{3}, \dots, x_{i}, x_{n}\}$. The regression decision tree algorithm splits the data into smaller groups based on a selected threshold value of $U_{x_{i}}$. The method searches for the predictor and split values that best reduce the sum of the square error (SSE) on the response ($Y$); assuming that the partition is into two, the resulting groups are $S_{a}$ and $S_{b}$. Group $S_{a}$ contains the following paired records:

$$S_{a} = \left\{ \begin{bmatrix} y_{1} \\ y_{2} \\ y_{3} \\ \vdots \\ y_{i} \\ y_{k_{a}} \end{bmatrix}; \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \\ \vdots \\ x_{i} \\ x_{k_{a}} \end{bmatrix} \right\}. \tag{2}$$

Similarly, Group $S_{b}$ contains the following:

$$S_{b} = \left\{ \begin{bmatrix} y_{1} \\ y_{2} \\ y_{3} \\ \vdots \\ y_{i} \\ y_{k_{b}} \end{bmatrix}; \begin{bmatrix} x_{1} \\ x_{2} \\ x_{3} \\ \vdots \\ x_{i} \\ x_{k_{b}} \end{bmatrix} \right\}. \tag{3}$$

It is important to note that $k_{a}$ and $k_{b}$ denote the number of records in each partition group $S_{a}$ and $S_{b}$, respectively. Consequently, $k_{a}$ plus $k_{b}$ equals the total number of records, $n$.

The sorted unique values of $X$ are denoted as $U_{x}$; a specific value of $U_{x_{i}}$ may have different values of $y_{i}$, and these values are organized in ascending order. The decision tree algorithm is fundamentally based on the work of [62]. In this model, iterations occur across all values of $U_{x_{i}}$ to generate combinations of $k_{a}$ and $k_{b}$ to optimize a function $L(Y)$, which is known as the split function. The most common split function is represented by the sum of square errors (SSE):

$$SSE = \sum_{U_{x_{i}} \in S_{a}} (y_{i} - \bar{y}_{a})^{2} + \sum_{U_{x_{i}} \in S_{b}} (y_{i} - \bar{y}_{b})^{2}, \tag{4}$$

where $\bar{y}_{a} = \frac{1}{k_{a}} \sum_{i \in S_{a}}^{k_{a}} y_{i}$ and $\bar{y}_{b} = \frac{1}{k_{b}} \sum_{i \in S_{b}}^{k_{b}} y_{i}$ are the averages of the partition groups within $S_{a}$ and $S_{b}$, respectively. The model searches in all values of $U_{x_{i}}$ to minimize the function (4); then, the selected value of $U_{x_{i}}$ is the threshold for dividing the dataset $S$ in $S_{a}, S_{b}$. The threshold is the middle value between $U_{x_{i}}$ and $U_{x_{i+1}}$, which is defined as $\frac{U_{x_{i}} + U_{x_{i+1}}}{2}$ to highlight the frontier.
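The exhaustive search over $U_{x_{i}}$ can be sketched for a single predictor and a depth of one as follows. This is an illustrative sketch, not the authors' implementation: the function names, the `min_leaf` parameter, the convention that records with $x$ at or below the threshold form $S_{a}$, and the toy data are all assumptions.

```python
def sse(ys):
    """Sum of squared deviations from the group mean."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def best_sse_split(xs, ys, min_leaf=1):
    """Depth-one exhaustive search; returns (threshold, sse) minimising the SSE."""
    unique_x = sorted(set(xs))
    best = None
    for lo, hi in zip(unique_x, unique_x[1:]):
        # Candidate threshold: midpoint between consecutive unique values.
        threshold = (lo + hi) / 2
        ya = [y for x, y in zip(xs, ys) if x <= threshold]  # group S_a (assumed side)
        yb = [y for x, y in zip(xs, ys) if x > threshold]   # group S_b (assumed side)
        if len(ya) < min_leaf or len(yb) < min_leaf:
            continue
        total = sse(ya) + sse(yb)
        if best is None or total < best[1]:
            best = (threshold, total)
    return best


# Toy data: days booked in advance and price factors (hypothetical values).
xs = [1, 2, 3, 10, 11, 12]
ys = [1.5, 1.4, 1.6, 0.9, 1.0, 1.1]
threshold, total = best_sse_split(xs, ys)
print(threshold, total)
```

On this toy data, the split lands between 3 and 10 days, where the two groups have clearly different average price factors.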

Another popular split function in the libraries consists of minimizing the sum of absolute errors (SAE):

$$SAE = \sum_{U_{x_{i}} \in S_{a}} | y_{i} - \bar{y}_{a} | + \sum_{U_{x_{i}} \in S_{b}} | y_{i} - \bar{y}_{b} |. \tag{5}$$

The process continues in each sub-group to create $p$ partitions in the data ($S = \{S_{a}, S_{b}\}$, $S_{a} = \{S_{a_{a}}, S_{a_{b}}\}$, $S_{b} = \{S_{b_{a}}, S_{b_{b}}\}$, $S_{a_{a}} = \{S_{a_{a_{a}}}, S_{a_{a_{b}}}\}$, …); the minimum number of partitions is two, and the maximum is the number of distinct values ($U_{x}$) in $X$. Each sub-group generates a decision path in $X$ to predict $Y$. The resulting sub-groups created by the threshold values share similar characteristics. The model estimates the prediction by computing the averages of the partition sub-group in $Y$; it is required to keep a meaningful solution by setting a minimum number of elements $i$ in each sub-sample.

The algorithm is greedy; all combinations are explored to determine the optimal partitions. The complexity behaves similarly to a combination equation:

$$\mathit{Complexity} = \left(\frac{k_{U_{x_{i,j}}}!}{p!\,(k_{U_{x_{i,j}}} - p)!}\right) d, \tag{6}$$

where $k_{U_{x_{i,j}}}$ is the number of distinct values in each $\{x_{i,1}, x_{i,2}, x_{i,3}, x_{i,j}, \dots, x_{i,m}\}$; $p$ is the number of partitions or the depth (sub-groups: $S_{a}$, $S_{b}$, $S_{a_{a}}$, $S_{a_{b}}$, $S_{b_{a}}$, $S_{b_{b}}$, $S_{a_{a_{a}}}$, …); $d$ is the number of searches of variables $X$ to consider from one to $m$. The parameters and data define the search space, which may grow exponentially yet remain finite.
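To give a sense of scale, Equation (6) can be evaluated numerically. The sketch below reads the fraction as a binomial coefficient multiplied by $d$, which is an interpretation of the formula; the function name and the input values are hypothetical.

```python
from math import comb


def search_space(distinct_values, partitions, variables):
    """Binomial coefficient C(k, p) multiplied by d, following Equation (6)."""
    return comb(distinct_values, partitions) * variables


# One predictor with 30 distinct booking days and a single split.
single_split = search_space(30, 1, 1)
# Two predictors and three partitions over the same 30 distinct values.
deeper_tree = search_space(30, 3, 2)
print(single_split, deeper_tree)
```

With one predictor and one split, the search visits only as many candidates as there are distinct values, which is why the single-variable, depth-one case discussed in this paper stays inexpensive.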

The regression decision tree model achieves optimal solutions within a bounded search space; however, practical applications may face computational constraints. To address these issues, pruning techniques—such as data sampling, partition limits, depth restrictions, and other methodologies—are utilized to balance the model complexity with real-time implementation needs.

The regression decision tree offers precise results and a highly interpretative solution, albeit at a significant computational cost. The threshold values for each data partition establish the boundaries where the predictive values shift and the decision path is traced for each input. The predictive results ($y_{i}$) and the threshold values ($U_{x_{i,j}}$) differ based on the initial variable; the model yields distinct outcomes if the partition begins with variable 1 ($x_{i,1}$) rather than variable 2 ($x_{i,2}$). The primary drawback of the regression decision tree model is its computational intensity. Identifying the optimal combination requires testing all variables and partitions; alternative ensemble decision tree algorithms, such as those of [52,53], reduce the optimization time by sampling variables and partitions. Choosing one model over another requires a thorough analysis of the problem and the response time needed in a production environment. The booking traveler problem focuses on a single predictor, $X$ (days booked in advance), which reduces the processing time. We suggest using the initial threshold as the optimal value for the booking advance policy, resulting in a decision tree with a depth of one, thus eliminating the need for pruning or ensemble techniques.

3. Optimal Policy with Decision Trees

This research methodology is structured into two primary sections. Section 2.3 introduces the concept of regression decision tree models, comprehensively discussing their strengths, weaknesses, and limitations. In Section 3.1, we delve into the distinctive characteristics of travel policies and propose modifications to the regression decision tree model, taking the unique facets of business travel and the criticality of the book-in-advance parameter into account. Section 3.2 outlines the source of the empirical dataset utilized in this study and details the data transformations undertaken to ensure strict adherence to privacy regulations. Moreover, the authors propose a simulation framework for generating supplementary datasets, enabling the evaluation of the proposed models in two additional specific scenarios.

3.1. Regressive Decision Tree Modification to Optimize Travel Policy

The split function of the regressive decision tree algorithm is adapted to consider the travel policy characteristics. Business travelers are divided into two groups: unscheduled travelers ($S_{a}$) and scheduled travelers ($S_{b}$). Rather than predicting prices ($Y$), the algorithm concentrates on the threshold ($U_{x_{i}}$). This threshold delineates the separation between unscheduled travelers ($S_{a}$) and scheduled travelers ($S_{b}$), representing the optimal book-in-advance value ($\hat{U}_{x_{i}}$) that optimizes a function of $Y$.

A variation of the decision tree algorithm exists for clustering [57]. The clustering model identifies threshold values by considering partitions in both the dependent variable $Y$ and the independent variables $X$ ($U_{y_{i}}$ and $U_{x_{i}}$). However, the method requires pre-defined classes that are not available. Consequently, the clustering decision tree model is not suitable for the problem. In contrast, the regression decision tree algorithm uses the fare prices ($Y$) and employs partitions only in $X$ ($U_{x_{i}}$), ensuring that the solution is applicable for defining the book-in-advance policy.

The main objective of a travel policy is to manage a budget, as noted by [63]. Corporate budgets are typically fixed and planned months in advance. Each fiscal year, a specific budget is allocated for travel. Significant deviations in the estimated fares directly impact the number of travelers and potentially the business strategy, emphasizing the importance of accurate budgeting. The first proposal aims to enhance the overall certainty of the fares paid by unscheduled travelers ($S_{a}$) and scheduled travelers ($S_{b}$). Both groups have the same level of certainty, leading to a lower probability of both groups paying above the average, resulting in more accurate travel expense planning to support budgeting control.

Equation (4) is modified to find the threshold ($U_{x_{i}}$) between $S_{a}$ and $S_{b}$ that minimizes the absolute difference of square errors (ADSE):

$$ADSE = \left| \sum_{U_{x_{i}} \in S_{a}} (y_{i} - \bar{y}_{a})^{2} - \sum_{U_{x_{i}} \in S_{b}} (y_{i} - \bar{y}_{b})^{2} \right|. \tag{7}$$

The nomenclature is the same: $\bar{y}_{a} = \frac{1}{k_{a}} \sum_{i \in S_{a}}^{k_{a}} y_{i}$ and $\bar{y}_{b} = \frac{1}{k_{b}} \sum_{i \in S_{b}}^{k_{b}} y_{i}$ are the averages of the partition groups within $S_{a}$ and $S_{b}$, respectively. The model searches for all possible splits in the value $U_{x_{i}}$ that minimize the function. The selected value $\hat{U}_{x_{i}}$ is the optimal threshold for dividing the dataset $S$ into $S_{a}, S_{b}$.

Similarly, Equation (5) is modified to find the absolute difference of the absolute errors (ADAE):

$$ADAE = \left| \sum_{U_{x_{i}} \in S_{a}} | y_{i} - \bar{y}_{a} | - \sum_{U_{x_{i}} \in S_{b}} | y_{i} - \bar{y}_{b} | \right|. \tag{8}$$

Equations (7) and (8) determine the threshold ($U_{x_{i}}$) at which both samples present the most similar square and absolute errors, respectively. Consequently, the variations in the fares paid by scheduled and unscheduled travelers are most balanced at this value. Using these models to set the book-in-advance policy increases the likelihood of achieving accurate budgets for both groups ($S_{a}$, $S_{b}$); we aim to evaluate the effectiveness of this proposal by analyzing the sum of the probability of paying above the average (PAA) for $S_{a}$ and $S_{b}$. The optimal advance booking parameter should result in a lower PAA than the industry standard (IS) of 15 days for domestic flights.
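A minimal sketch of the balanced-error search in Equations (7) and (8) is given below. It is an illustration under stated assumptions rather than the authors' code: trips booked at or below the threshold are treated as unscheduled ($S_{a}$), a hypothetical `min_leaf` parameter keeps at least two records per group, and the data are invented.

```python
def squared_error(ys):
    """Sum of squared deviations from the group mean."""
    mean = sum(ys) / len(ys)
    return sum((y - mean) ** 2 for y in ys)


def absolute_error(ys):
    """Sum of absolute deviations from the group mean."""
    mean = sum(ys) / len(ys)
    return sum(abs(y - mean) for y in ys)


def balanced_split(xs, ys, error=squared_error, min_leaf=2):
    """Return (threshold, gap) minimising |error(S_a) - error(S_b)|."""
    unique_x = sorted(set(xs))
    best = None
    for lo, hi in zip(unique_x, unique_x[1:]):
        threshold = (lo + hi) / 2
        ya = [y for x, y in zip(xs, ys) if x <= threshold]  # unscheduled (assumed)
        yb = [y for x, y in zip(xs, ys) if x > threshold]   # scheduled (assumed)
        if len(ya) < min_leaf or len(yb) < min_leaf:
            continue
        gap = abs(error(ya) - error(yb))
        if best is None or gap < best[1]:
            best = (threshold, gap)
    return best


# Hypothetical days booked in advance and price factors.
xs = [1, 2, 3, 4, 5, 6]
ys = [1.0, 1.6, 1.2, 1.4, 0.9, 1.1]
adse_threshold, adse_gap = balanced_split(xs, ys, error=squared_error)
adae_threshold, adae_gap = balanced_split(xs, ys, error=absolute_error)
print(adse_threshold, adse_gap, adae_threshold, adae_gap)
```

Passing the error function as an argument keeps the two criteria in one search loop; on this toy data both select the same threshold, although they need not agree in general.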

Another crucial factor in corporate travel policy is cost optimization. Unscheduled travelers ($S_{a}$) may not comply with the policy due to business reasons, so the opportunity lies with scheduled travelers ($S_{b}$).

A modified split function in the regression decision tree model is proposed, concentrating solely on $S_{b}$. We compute the empirical probability of paying below the average (PBA) specifically for scheduled travelers ($S_{b}$), which is defined as follows:

$$PBA_{S_{b}, U_{x_{i}} \in S_{b}} = \frac{{k_{b}}_{(y_{i} < \bar{y})}}{k_{b}}, \tag{9}$$

where ${k_{b}}_{(y_{i} < \bar{y})}$ is the number of employees in $S_{b}$ paying a fare below the average of total travelers ($\bar{y} = \frac{1}{n} \sum_{i \in S}^{n} y_{i}$) and $k_{b}$ is the total number of elements in the group $S_{b}$. We find the threshold ($\hat{U}_{x_{i}}$) that maximizes the PBA for the group $S_{b}$. Within each traveler group, the probability of paying below the average (PBA) and the probability of paying above the average (PAA) sum to one. Setting the advance booking day at the threshold value that maximizes $PBA_{S_{b}}$ increases the chances of securing fares below the group average compared with industry standards. This strategy supports the organization’s goal of optimizing travel expenses for employees who comply with the policy.
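The PBA criterion of Equation (9) can be sketched in the same depth-one search. The sketch assumes that scheduled travelers ($S_{b}$) are those booking more days in advance than the threshold, that ties keep the smallest threshold, and that each group holds at least `min_leaf` records; these choices and the data are illustrative, not taken from the study.

```python
def best_pba_split(xs, ys, min_leaf=2):
    """Return (threshold, pba) maximising the share of S_b below the overall mean."""
    overall_mean = sum(ys) / len(ys)  # average over all travelers
    unique_x = sorted(set(xs))
    best = None
    for lo, hi in zip(unique_x, unique_x[1:]):
        threshold = (lo + hi) / 2
        yb = [y for x, y in zip(xs, ys) if x > threshold]  # scheduled (assumed)
        ka = len(ys) - len(yb)
        if ka < min_leaf or len(yb) < min_leaf:
            continue
        # Empirical probability of paying below the overall average in S_b.
        pba = sum(1 for y in yb if y < overall_mean) / len(yb)
        if best is None or pba > best[1]:
            best = (threshold, pba)
    return best


# Hypothetical days booked in advance and price factors.
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [1.5, 1.3, 0.9, 1.2, 0.8, 0.9, 1.1, 0.7]
pba_threshold, pba_value = best_pba_split(xs, ys)
print(pba_threshold, pba_value)
```

Provided that no fare equals the overall average exactly, the PAA of the scheduled group at any threshold is one minus its PBA.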

3.2. Dataset

Data in the sector studied are considered restricted, which makes them difficult to obtain; a global tech company supplied information on domestic business trips from 2018 to 2020, spanning the most important travel locations, under the condition of scaling the data to comply with its privacy policy. The data points for these years amount to 1928, 1762, and 153, respectively. The significant decline in the number of trips in 2020 is directly linked to travel restrictions. This global tech company is considered to be between a consulting and a manufacturing company, with a balanced mix of scheduled and unscheduled travelers (see Supplementary Material).

The data include booking and flight dates, origin/destination, fares, and service levels for each record. Multiple origins, destinations, and service levels influence fares; therefore, to fairly compare the trips, the fares are converted into a price factor using linear scaling based on the average ($\bar{y} = \frac{1}{n} \sum_{i \in S}^{n} y_{i}$) for each origin–destination–service combination. For example, if the average fare for the standard service between London and San Francisco is USD 1000 and a given flight is USD 1200, the price factor would be 1.2. In other words, if the fare matches the average, the price factor is exactly one; if the fare paid is above or below the average, the price factor is correspondingly above or below one. The scaling does not influence the results and complies with data privacy. Only the book-in-advance parameter ($X$) and the price factor ($Y$) are disclosed in this study case. The case study data $S = \{Y, X\}$ are divided into three datasets: $S_{2018}$, $S_{2019}$, and $S_{2020}$.
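The price factor scaling can be illustrated with a short sketch that divides each fare by the average fare of its origin–destination–service group. The record layout, the function name, the airport codes, and the fares are hypothetical; only the USD 1000 average and USD 1200 fare mirror the example in the text.

```python
from collections import defaultdict


def price_factors(records):
    """Scale each fare by the mean fare of its (origin, destination, service) group."""
    totals = defaultdict(lambda: [0.0, 0])  # group key -> [fare sum, trip count]
    for origin, destination, service, fare in records:
        group = totals[(origin, destination, service)]
        group[0] += fare
        group[1] += 1
    factors = []
    for origin, destination, service, fare in records:
        fare_sum, count = totals[(origin, destination, service)]
        factors.append(fare / (fare_sum / count))
    return factors


# Hypothetical trips: (origin, destination, service level, fare in USD).
trips = [
    ("LHR", "SFO", "standard", 1200.0),
    ("LHR", "SFO", "standard", 800.0),
    ("LHR", "SFO", "standard", 1000.0),
    ("GDL", "MEX", "standard", 150.0),
    ("GDL", "MEX", "standard", 50.0),
]
factors = price_factors(trips)
print(factors)
```

Because every group is scaled by its own average, routes with very different fare levels become comparable on a common scale centered on one.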

3.3. Simulation Dataset

The study dataset comes from a global tech company that is considered to be between a consulting and a manufacturing company, with a balanced mix of scheduled and unscheduled travelers. The simulation assesses the proposed models’ behavior under various combinations of scheduled and unscheduled travelers; the purpose of the simulation is to replicate the behavior of the study dataset in two scenarios: one with a higher proportion of unscheduled travelers, which is assumed to represent consulting companies, and another with a higher proportion of scheduled travelers, which is assumed to represent manufacturing companies. The consulting and manufacturing companies represent the extreme values of recurrent unscheduled trips and scheduled trips; companies from other sectors would fall in the middle. Following the nomenclature in Section 2.3, each simulation $S = \{Y, X\}$ consists of the price factor ($Y$) and the book-in-advance parameter ($X$).

The adoption of revenue management systems in the airline sector has led to stochastic fares. In the simulation, we assume that the factor prices (Y) follow a geometric Brownian motion (GBM). This assumption is reasonable for dynamic pricing. Several authors have extensively studied the modeling of dynamic financial flows. Merton (1973) popularized the use of a GBM for modeling stock prices. Samuelson (1965), Tourinho (1979), and Margrabe (1978) extended the application of a GBM to model commodities and corporate assets for optimal decision making. This concept has evolved into what is now known as Real Options, where stochastic and dynamic variables are commonly modeled using a GBM.

The fares are a function of the booking in advance:

Y : {y_{1}, y_{2}, \dots, y_{i}, \dots, y_{n}} = f (X : {x_{1}, x_{2}, \dots, x_{i}, \dots, x_{n}})

, where n represents the number of business travelers. The behavior of Y is opposite to the traditional GBM, as the variability of fares increases when the booking in advance is closer to the departure time. In the simulation,

t_{0}

represents the maximum booking in advance while X denotes the book-in-advance time. Consequently,

t_{i} = t_{0} - x_{i}

represents the simulation period for each business traveler (i). We consider the price factor (

Y := Y_{t}

) to be a GBM:

d Y_{t} = μ Y_{t} \, d t + σ Y_{t} \, d W_{t},

(10)

where the solution for a given price factor of a business traveler (i) is

y_{i} = f (x_{i}) = y_{t_{0}} e^{(μ - \frac{σ^{2}}{2}) (t_{i}) + σ (t_{i}) (W_{t})},

(11)

where

y_{i}

is the price factor of the fares at the buying time for a business traveler i;

y_{t_{0}}

represents the average of the fare prices (

\bar{y} = \frac{1}{n} \sum_{i \in S}^{n} y_{i}

), and by definition, the price factor is set to one;

μ

is the rate of increase in the price factor, and it is the average of the returns of the price factors;

σ^{2}

is the variance of the returns of the price factors;

t_{i} = t_{0} - x_{i}

is the simulation period; and

W_{t}

is a random variable

\sim N (0, 1)

. The study dataset of 2019 is used to determine the parameters for the simulation and the GBM settings. The records were sorted by booking in advance, and the log returns were computed on the price factors to obtain the parameters:

μ_{\log return} = \frac{1}{n} \sum_{i = 1}^{n} log (\frac{y_{i}}{y_{i - 1}}) = 0.0523 %

and

σ_{\log return}^{2} = \frac{1}{n - 1} \sum_{i = 1}^{n} {(log (\frac{y_{i}}{y_{i - 1}}) - μ_{\log return})}^{2} = 54.492 %^{2}

. Similarly to the case study dataset, the simulation results were scaled using the average. Figure 1 illustrates the simulation process of 100 price factor trajectories using the GBM model from Equation (11); each traveler has a different book-in-advance period. The model outcomes consist of a set of book-in-advance parameters (X) and the final values of the trajectories, representing the price factor (Y).
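The parameter estimation and the trajectory simulation described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the records are ordered from the longest to the shortest lead time (the text only says they were sorted), paths advance in steps of one day, and the sketch uses the standard exact discretization of a GBM, in which the random term over one step has standard deviation σ. Whether the authors' implementation follows the same discretization is not stated. The function names and the sample inputs are hypothetical.

```python
import math
import random
from statistics import mean, variance


def log_return_params(x, y):
    """Estimate (mu, sigma^2) from log returns of consecutive price factors.

    Records are ordered from the longest to the shortest lead time
    (an assumed direction; the text only says they were sorted).
    """
    ordered = [yi for _, yi in sorted(zip(x, y), key=lambda p: -p[0])]
    returns = [math.log(b / a) for a, b in zip(ordered, ordered[1:])]
    return mean(returns), variance(returns)  # sample variance (n - 1)


def simulate_price_factor(x_i, t0, mu, sigma, y0=1.0, rng=random):
    """Final value of one daily-step GBM path over t_i = t0 - x_i days."""
    y = y0
    for _ in range(int(round(t0 - x_i))):
        z = rng.gauss(0.0, 1.0)
        y *= math.exp((mu - 0.5 * sigma ** 2) + sigma * z)  # step of one day
    return y


# Illustrative run: mu is the reported 0.0523%; sigma = 0.0738 assumes the
# reported 54.492 %^2 is a variance expressed in squared percent.
rng = random.Random(0)
x_sim = [5, 20, 60]   # hypothetical book-in-advance days
t0 = max(x_sim)       # the longest lead time is the reference point
y_sim = [simulate_price_factor(x, t0, 0.000523, 0.0738, rng=rng) for x in x_sim]
```

The traveler with the longest lead time has a simulation period of zero and therefore keeps the starting price factor of one, while travelers who book closer to departure accumulate more random steps and hence more variability.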

To model advance bookings (X), we divided the 2019 dataset into two categories: scheduled travelers and unscheduled travelers. This classification was based on an optimal threshold pre-calculated with one of the methods proposed in this study: the ADSE (7), ADAE (8), or PBA (9). Notably, both the ADSE and ADAE methods yielded the same splitting value of

17.5

. We utilized this value to split the dataset X into

X_{a}

and

X_{b}

. Specifically, for unscheduled travelers, we defined

X_{a} = {x \in X ∣ x \leq 17.5}

, while for scheduled travelers,

X_{b} = {x \in X ∣ x > 17.5}

. The number of elements in

X_{a}

is denoted by

k_{a}

, and in

X_{b}

, it is designated as

k_{b}

. The total number of records in the dataset is expressed as

k_{a} + k_{b} = n

.

The proposed simulation utilizes an exponential distribution to model unscheduled travelers’ book-in-advance days (X). In this context, the parameter

λ

denotes the mean of the distribution (the reciprocal of the arrival rate), calculated as the average number of days booked in advance in the subset

X_{a}

, where

X_{a} \sim \exp (λ),

(12)

where

λ = {\bar{x}}_{a} = (\frac{1}{k_{a}} \sum_{i \in X_{a}}^{k_{a}} x_{i}) = 10.543

.

For scheduled travelers, we proposed the use of a normal distribution to model the book-in-advance days (X). The parameters for this distribution are derived from the average and variance of the advance booking times in the subset

X_{b}

, where

X_{b} \sim N ({\bar{x}}_{b}, s_{x_{b}}^{2}),

(13)

where

{\bar{x}}_{b} = \frac{1}{k_{b}} \sum_{i \in X_{b}}^{k_{b}} x_{i} = 36.451

and

s_{x_{b}}^{2} = \frac{1}{k_{b} - 1} \sum_{i \in X_{b}}^{k_{b}} {(x_{i} - {\bar{x}}_{b})}^{2} = {24.682}^{2}

.
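The split at 17.5 days and the parameter estimates used in (12) and (13) can be reproduced with a short routine. This is a minimal sketch using only the Python standard library; the function name `split_and_fit` and the sample values are hypothetical, and the reported figures (10.543, 36.451, and 24.682) come from the 2019 dataset, which is not reproduced here.

```python
from statistics import mean, stdev


def split_and_fit(x, threshold=17.5):
    """Split advance-booking days at the threshold and fit both groups."""
    x_a = [v for v in x if v <= threshold]  # short-notice bookings (X_a)
    x_b = [v for v in x if v > threshold]   # bookings made well ahead (X_b)
    lam = mean(x_a)                         # mean of the exponential in (12)
    mean_b, sd_b = mean(x_b), stdev(x_b)    # normal parameters in (13)
    return x_a, x_b, lam, mean_b, sd_b


# Hypothetical book-in-advance days (not data from the study).
x = [2, 10, 18, 30, 40]
x_a, x_b, lam, mean_b, sd_b = split_and_fit(x)
```

Here `stdev` returns the sample standard deviation (with the k_b - 1 denominator), so its square corresponds to the variance used in (13).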

To complete the simulation setup, the prices (

Y = f (X)

) are obtained by using (11) for each element

i \in S_{sim}

. The process is repeated

k_{a}

times for unscheduled travelers using (12) and

k_{b}

times for scheduled travelers using (13), according to the desired mix. The proposed simulated scenarios are as follows.

Simulation 1: This scenario simulates a consulting company where most employees book trips a few days in advance. The mixture was selected by the authors and consists of $85\%$ unscheduled travelers ($k_{a}$) and $15\%$ scheduled travelers ($k_{b}$).
- $X_{a} \sim \exp(λ = 10.543)$, simulated $k_{a} = 1700$ times;
- $X_{b} \sim N(\bar{x}_{b} = 36.451, s_{x_{b}}^{2} = {24.682}^{2})$, simulated $k_{b} = 300$ times.

Simulation 2: This scenario employs the same setup as Simulation 1, but with a reversed mixture of travelers: $15\%$ unscheduled travelers ($k_{a}$) and $85\%$ scheduled travelers ($k_{b}$).
- $X_{a} \sim \exp(λ = 10.543)$, simulated $k_{a} = 300$ times;
- $X_{b} \sim N(\bar{x}_{b} = 36.451, s_{x_{b}}^{2} = {24.682}^{2})$, simulated $k_{b} = 1700$ times.
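The two mixtures can be generated with the Python standard library, as sketched below. The sketch assumes that λ = 10.543 is the mean of the exponential distribution, so the rate passed to `random.expovariate` is its reciprocal. The function name `simulate_booking_days` and the seed are hypothetical. The source does not say how negative draws from the normal distribution are handled, so the sketch leaves them untouched.

```python
import random


def simulate_booking_days(k_a, k_b, lam=10.543, mean_b=36.451, sd_b=24.682, seed=0):
    """Draw book-in-advance days for both traveler groups."""
    rng = random.Random(seed)
    # expovariate takes the rate, i.e., the reciprocal of the mean.
    x_a = [rng.expovariate(1.0 / lam) for _ in range(k_a)]
    # gauss takes the standard deviation, not the variance.
    x_b = [rng.gauss(mean_b, sd_b) for _ in range(k_b)]
    return x_a, x_b


sim1_a, sim1_b = simulate_booking_days(k_a=1700, k_b=300)  # Simulation 1
sim2_a, sim2_b = simulate_booking_days(k_a=300, k_b=1700)  # Simulation 2
```

Each simulated booking day would then be passed through (11) to obtain its price factor.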

This study utilizes a total of five datasets for model testing, including three years of actual data (

S_{2018}

,

S_{2019}

, and

S_{2020}

) and two simulation scenarios (

S_{sim1}

and

S_{sim2}

). The datasets are evaluated using traditional metrics such as the SSE and SAE, alongside the proposed split functions ADSE, ADAE, and PBA, in comparison with the industry standard (IS).

4. Analysis and Results

The regressive decision tree model requires the definition of operational parameters to ensure that the results have a meaningful solution in the application. We specified the minimum number of samples in the partition as

10 %

of the dataset; the number of partitions or depth was

p = 1

, since a single split was required to divide the data into two sub-groups (

S_{a}

and

S_{b}

); d was set to one because there was only one variable X (the book-in-advance period).

According to (6), the computational effort in decision trees grows exponentially; however, in this application, the complexity grows linearly because p and d are one. The complexity is linear at the rate of the maximum number of days booked in advance; each booking day represents a possible optimal threshold

U_{x_{i}}

. Airlines and companies set an upper limit for booking in advance, and the limit is usually one year (365 days). The operational parameters and the book-in-advance limit create a finite search space for this application. In a finite search space, an optimal solution is secured through an exhaustive search.
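The exhaustive search over a finite set of thresholds can be sketched as follows, using the traditional sum of squared errors as the split score. Two points are assumptions: candidate thresholds are taken as midpoints between consecutive distinct booking days (suggested by the reported value of 17.5), and the 10% minimum is applied to each side of the split. The function names and the sample data are hypothetical.

```python
import math


def sse(values):
    """Sum of squared errors around the group mean."""
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_threshold(x, y, score, min_frac=0.10):
    """Evaluate every candidate threshold and keep the lowest score."""
    min_size = max(1, math.ceil(min_frac * len(x)))
    levels = sorted(set(x))
    best = None
    for lo, hi in zip(levels, levels[1:]):
        u = (lo + hi) / 2  # midpoint between consecutive booking days
        y_a = [yi for xi, yi in zip(x, y) if xi <= u]
        y_b = [yi for xi, yi in zip(x, y) if xi > u]
        if len(y_a) < min_size or len(y_b) < min_size:
            continue  # partition too small to be meaningful
        s = score(y_a, y_b)
        if best is None or s < best[1]:
            best = (u, s)
    return best


# Hypothetical data: short-notice bookings pay roughly twice the price factor.
x = [1, 2, 3, 4, 10, 11, 12, 13]
y = [2.0, 2.1, 1.9, 2.0, 1.0, 1.1, 0.9, 1.0]
threshold, score_value = best_threshold(x, y, lambda a, b: sse(a) + sse(b))
```

The number of candidates is bounded by the number of distinct booking days (at most 365 under the usual one-year limit), so the loop visits every option and the returned threshold is optimal for the chosen score.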

We implemented the regressive decision tree model using the traditional split equations (SSE (4) and SAE (5)) and the proposed split functions (the ADSE (7), ADAE (8), and PBA (9)). In order to enhance the robustness of our analysis, we incorporate a comparative evaluation against the industry standard (IS). The search is exhaustive, and every possible threshold

U_{x_{i}}

is evaluated before determining the optimal threshold. The thresholds (

{\hat{U}}_{x_{i}}

) represent the optimal book-in-advance policy according to each model. A visualization of the numerical process of each model using the 2019 dataset is presented in Figure 2.

Table 1 presents a summary of the threshold values by model; the threshold values represent the optimal booking policy according to a business need. We added the results of the regressive decision tree model for reference. The split functions of the SSE (4) and SAE (5) represent the value where the sum of the square or absolute error is the lowest for the price factors; the variance differs between scheduled and unscheduled travelers. The regressive decision tree model does not differentiate between scheduled and unscheduled travelers; the optimal threshold is computed based on the sum of the total variance. Defining the travel policy based on the SSE or SAE may lead to low variance for one group and higher variance for the other, resulting in budget uncertainty for one group. Consequently, we did not find a practical application for the original regressive decision tree split functions.

We recommend using the ADSE (7) or ADAE (8) for companies that require more certainty in the travel budget. The ADSE and ADAE models establish a threshold where the variance between scheduled and unscheduled travelers is the most equal. Both groups have the same level of certainty, leading to a lower probability of paying above the average, resulting in more accurate travel expense planning to support budgeting control.
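The idea of balancing the error between the two groups can be illustrated with a short sketch. Equations (7) and (8) are defined earlier in the article and are not reproduced in this section, so the score below is only a reading of the verbal description: the absolute difference between the summed squared (or absolute) errors of the two groups. The function names and the sample data are hypothetical.

```python
def group_error(values, power):
    """Sum of |v - mean| ** power (2: squared errors, 1: absolute errors)."""
    m = sum(values) / len(values)
    return sum(abs(v - m) ** power for v in values)


def balanced_threshold(x, y, power=2, min_size=1):
    """Threshold whose two groups have the most similar total error."""
    levels = sorted(set(x))
    best_u, best_gap = None, float("inf")
    for lo, hi in zip(levels, levels[1:]):
        u = (lo + hi) / 2
        y_a = [yi for xi, yi in zip(x, y) if xi <= u]
        y_b = [yi for xi, yi in zip(x, y) if xi > u]
        if len(y_a) < min_size or len(y_b) < min_size:
            continue
        gap = abs(group_error(y_a, power) - group_error(y_b, power))
        if gap < best_gap:
            best_u, best_gap = u, gap
    return best_u, best_gap


# Hypothetical data in which the split at 2.5 days balances both groups.
x = [1, 2, 3, 4]
y = [0.0, 2.0, 5.0, 7.0]
u_squared, gap_squared = balanced_threshold(x, y, power=2)
u_absolute, gap_absolute = balanced_threshold(x, y, power=1)
```

In contrast to the traditional split, the score rewards partitions in which neither group carries most of the variability, which is the property the text associates with budget certainty.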

Table 2 shows the sum of the probability of paying above the average (PAA) of each traveler group in comparison with the results using the industry standard (IS) of 15 days. The proposed approach of the ADSE and ADAE leads to a lower PAA in all years and simulation scenarios.

For companies with a focus on cost and with traveling employees who are able to comply with the travel policy, we recommend the PBA proposal (Equation (9)). There are potential cost savings by setting the book-in-advance policy to the value where the probability of paying a fare below the average is the highest, which is mainly because the scheduled travelers comply with the policy. The findings presented in Table 3 illustrate the advantages of our approach over the industry standard (IS) across all study years and simulations. Notably, in challenging scenarios such as that of 2020, the probability of paying below the average with our proposal increases by up to 11.5%, resulting in savings for the group of scheduled travelers. Table 4 contains the complete set of results.
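The empirical probabilities behind the PAA and PBA comparisons can be sketched as follows. Equation (9) is not reproduced in this section, so the code follows the verbal description only: for each candidate threshold, it computes the share of travelers booking beyond the threshold whose price factor is below the overall average. The tie-breaking rule (keep the lowest threshold) and all names are assumptions, and the minimum partition size is omitted for brevity.

```python
def split_prices(x, y, u):
    """Price factors of short-notice (x <= u) and early (x > u) bookings."""
    y_a = [yi for xi, yi in zip(x, y) if xi <= u]
    y_b = [yi for xi, yi in zip(x, y) if xi > u]
    return y_a, y_b


def paa_sum(x, y, u):
    """Sum over both groups of the share of price factors above the average."""
    avg = sum(y) / len(y)
    return sum(
        sum(v > avg for v in group) / len(group)
        for group in split_prices(x, y, u)
    )


def pba_threshold(x, y):
    """Threshold maximizing the share of early bookers paying below average."""
    avg = sum(y) / len(y)
    levels = sorted(set(x))
    best_u, best_p = None, -1.0
    for lo, hi in zip(levels, levels[1:]):
        u = (lo + hi) / 2
        _, y_b = split_prices(x, y, u)
        p = sum(v < avg for v in y_b) / len(y_b)
        if p > best_p:  # ties keep the lowest threshold (an assumption)
            best_u, best_p = u, p
    return best_u, best_p


# Hypothetical price factors that fall as the lead time grows.
x = [1, 2, 3, 4, 5, 6]
y = [1.5, 1.4, 1.2, 0.7, 0.6, 0.6]
u_pba, p_below = pba_threshold(x, y)
paa_at_u = paa_sum(x, y, u_pba)
```

Note that `paa_sum` expects a threshold that leaves both groups non-empty, as is the case for every midpoint between the smallest and largest booking day.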

Figure 3 presents a visual representation of the data of the case study. The results for 2018 and 2019 are compared directly, as the company did not change its business travel strategy or the employee mix between scheduled and unscheduled travelers from one year to another. The results contrast with those of 2020, when the number of travelers decreased due to pandemic restrictions; the variations in fares in the 2020 data demonstrate that regardless of the pandemic, airline companies continued to use revenue management systems by having higher prices for flights booked a few days in advance. The optimal thresholds vary according to each model and dataset, and the optimal policy differs from year to year, contrasting with the fixed approach of the industry standard (IS). These findings suggest that a company’s optimal booking policy should be dynamic and revisited on a periodic basis.

The simulated dataset aims to test the models in the presence of a company’s different mix of scheduled and unscheduled travelers. Figure 4 visually illustrates the results for the simulated dataset. Simulation 1 has the greatest concentration of unscheduled travelers, with most of the trips being booked a few days in advance; Simulation 2 has a higher density of scheduled travelers, who book with more time in advance. The optimal book-in-advance policy varies from one simulation dataset to another; the results from the simulation demonstrate that a company with a larger proportion of unscheduled travelers has a lower optimal booking policy compared with a company with a greater proportion of scheduled travelers. For instance, a consulting company with employees traveling on short notice should have a policy with a lower book-in-advance time than a manufacturing company where trips are planned ahead of time. The optimal thresholds offer a practical book-in-advance value that adapts to the company’s needs while enabling employees to comply with the travel policy.

We acknowledge that multiple factors influence travel policies; however, numerous authors have identified the advance booking factor as the most crucial aspect in controlling expenses [11,12]. The approach proposed in this study finds the optimal book-in-advance policy using a company’s historical data and the regressive decision tree approach. Decision tree models find the local optima due to their exhaustive approach in searching all of the data; the optimal threshold is sensitive to new data and extreme values. For instance, a highly expensive ticket booked many days in advance influences the thresholds more than a similarly expensive ticket booked within just a few days; the ADAE (8) is the model that is least sensitive to variation, followed by the ADSE (7), and the PBA (9) is the most sensitive to data changes. Another disadvantage of decision trees is the high computational processing time. In this application, the parameter settings create a finite search space. In all of the scenarios, the algorithms run within seconds (see the results in Table 4), enabling the models to be implemented in real time.

Despite evidence from this study indicating that the optimal advance booking parameter is dynamic, we advise against adopting a real-time dynamic book-in-advance policy. Implementing our approach using “real-time” data could lead to frequent fluctuations in optimal advance booking parameters due to the characteristics of the regressive decision tree, challenging the principles of a coherent, compelling corporate travel policy [8]. Organizations experience gradual shifts in scheduled versus unscheduled travelers, which diminishes the need for daily updates to travel policies. However, the pandemic reinforces the necessity for a quantitative approach to regularly revising advance booking policies, as traditional fixed book-in-advance parameters inadequately address the need to minimize uncertainty and costs. The optimal frequency for adjusting the book-in-advance parameters should be tailored to each company’s specific circumstances and economic conditions while considering additional elements in the travel policy, such as management controls, approval processes, and employee commitment.

The threshold values obtained with the different datasets are consistent, and their implementation in defining the book-in-advance policy leads companies to achieve travel goals of controlling and optimizing their budgets compared with the industry standard; we propose the following methodology for implementation:

1. Collect historical data on the relationship between fares and advance booking days for the company’s most common destinations.
2. Scale the data to achieve homogeneity. We suggest linear scaling using the average as a basis for each origin–destination–service group.
3. Since the models are sensitive to extreme values in price factors, search for outliers in the data and remove any that are found.
4. Implement the decision tree regression model with the following custom split functions:
   (a) For budgeting certainty, use (7) or (8) to find the book-in-advance policy that reduces the variation in the price factor for unscheduled and scheduled travelers.
   (b) For cost savings, use (9) to increase the probability of paying a price factor below the average for scheduled travelers.
5. Periodically repeat the process to adjust the policy based on company needs.

5. Conclusions

Much of the existing literature on quantitative methodologies within business travel primarily adopts the perspective of the tourism industry, investigating the benefits accrued in this sector at the expense of corporate entities. This research endeavors to contribute a novel machine learning approach to determine the optimal advance booking policy, offering a quantitative methodology from the standpoint of companies themselves.

Two primary drivers were identified in defining the book-in-advance parameter: budget certainty and optimal cost. To address budget certainty, the traditional split function of a regressive decision tree was modified to identify the threshold between unscheduled and scheduled travelers that minimizes the absolute difference in squared or absolute errors. This modification enables companies to enhance fare certainty for both groups of travelers. For the second relevant factor, the optimizing cost, a new split function was introduced within the regressive decision tree model, focusing exclusively on scheduled employees. The empirical probability of paying below the average for scheduled travelers was subsequently calculated. The results of our approach were compared with the industry standard and presented improvements in the reduction in the probability of paying above the average (PAA) using (7) and (8), as well as an increase in the probability of paying below the average (PBA) using (9).

The modifications to the regressive decision tree split functions are centered on the threshold value rather than the prediction itself. The optimal threshold represents the ideal book-in-advance policy. The parameter settings establish a finite search space, facilitating the practical implementation of the models. The proposed quantitative approach effectively identifies the optimal advance booking policy tailored to the specific needs of the business, whether it prioritizes reducing fare variability or increasing the likelihood of securing fares that are below the average.

The proposed models were rigorously evaluated in a range of scenarios and compared with the industry standard (IS), culminating in the conclusion that a company’s optimal booking policy is dynamic, adapting to the evolving strategies of the company and the behavior of its travelers, satisfying hypothesis 1. Companies with a higher proportion of unscheduled travelers tend to exhibit lower optimal booking policies compared with those with a greater proportion of scheduled travelers, satisfying hypothesis 2. The findings demonstrate the superiority of our methodology compared with the industry standard (IS) throughout all years and simulations in this study.

Our conclusions underscore companies’ need to update book-in-advance parameters using quantitative methods to establish a realistic book-in-advance value that aligns with their specific needs while encouraging employees to continue to comply with the travel policy. The frequency of updating the book-in-advance parameters should be customized to each organization while considering management oversight and employee commitment.

Supplementary Materials

The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/math12233741/s1.

Author Contributions

Methodology, J.-M.Z.-C., R.R.-C. and J.D.S.-T.; Software, J.-M.Z.-C.; Investigation, J.-M.Z.-C. and R.R.-C.; Writing—original draft, J.-M.Z.-C.; Writing—review & editing, R.R.-C. and J.D.S.-T. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

References

1. World Travel and Tourism Council. Business Travel Spend Set to Reach Two Thirds of Pre-Pandemic Levels by 2022. 2021. Available online: https://wttc.org/news-article/business-travel-spend-set-to-reach-two-thirds-of-pre-pandemic-levels-by-2022-reveals-new-report-from-wttc (accessed on 1 October 2021).
2. Lock, S. Market Size of the Global Hotel Industry from 2014 to 2018. 2020. Available online: https://www.statista.com/statistics/247264/total-revenue-of-the-global-hotel-industry (accessed on 20 December 2022).
3. Denstadli, J.M. Impacts of videoconferencing on business travel: The Norwegian experience. J. Air Transp. Manag. 2004, 10, 371–376.
4. Denstadli, J.M.; Gripsrud, M. Face-to-face by travel or picture: The relationship between travelling and video communication in business settings. In International Business Travel in the Global Economy; Routledge: Abingdon-on-Thames, UK, 2010; pp. 217–238.
5. Lassen, C. Individual rationalities of global business travel. In International Business Travel in the Global Economy; Derudder, B., Witlox, F., Beaverstock, J.V., Eds.; Routledge: New York, NY, USA, 2016; pp. 203–220.
6. Thurlow, C.; Jaworski, A. The alchemy of the upwardly mobile: Symbolic capital and the stylization of elites in frequent-flyer programmes. Discourse Soc. 2006, 17, 99–135.
7. Steinhoff, L.; Zondag, M.M. Loyalty programs as travel companions: Complementary service features across customer journey stages. J. Bus. Res. 2021, 129, 70–82.
8. Rothschild, J. Corporate travel policy. Tour. Manag. 1988, 9, 66–68.
9. Lang, J.B. Corporate travel: How to develop a formal, written policy. HR Focus 1993, 70, 1.
10. Wint, C.; Avish, S. US Airline Mergers and Acquisitions. 2003. Available online: https://www.airlines.org/dataset/u-s-airline-mergers-and-acquisitions/ (accessed on 15 March 2022).
11. Douglas, A.; Lubbe, B.A. Identifying value conflicts between stakeholders in corporate travel management by applying the soft value management model: A survey in South Africa. Tour. Manag. 2006, 27, 1130–1140.
12. Gustafson, P. Control and commitment in corporate travel management. Res. Transp. Bus. Manag. 2013, 9, 21–28.
13. Mason, K.J. Future trends in business travel decision making. J. Air Transp. 2002, 7, 47–68.
14. Oversee. Advance Purchase Policy. 2024. Available online: https://oversee.biz/business-travel-glossary/advance-purchase-policy/ (accessed on 7 October 2024).
15. Teplis Travel. The Benefits of an Advance Purchase Policy in Air Travel. 2024. Available online: https://teplis.com/blog/the-benefits-of-an-advance-purchase-policy-in-air-travel/ (accessed on 7 October 2024).
16. Advito. The Narrowing Advantage of Advance Purchase. 2024. Available online: https://www.advito.com/resources/the-narrowing-advantage-of-advance-purchase/ (accessed on 7 October 2024).
17. Mingers, J. An empirical comparison of pruning methods for decision tree induction. Mach. Learn. 1989, 4, 227–243.
18. Olaru, C.; Wehenkel, L. A complete fuzzy decision tree technique. Fuzzy Sets Syst. 2003, 138, 221–254.
19. Poyarkov, A.; Drutsa, A.; Khalyavin, A.; Gusev, G.; Serdyukov, P. Boosted decision tree regression adjustment for variance reduction in online controlled experiments. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 235–244.
20. Hothorn, T.; Hornik, K.; Zeileis, A. Unbiased recursive partitioning: A conditional inference framework. J. Comput. Graph. Stat. 2006, 15, 651–674.
21. Jena, M.; Dehuri, S. Decision tree for classification and regression: A state-of-the art review. Informatica 2020, 44, 405–420.
22. Gustafson, P. Travel time and working time: What business travellers do when they travel, and why. Time Soc. 2012, 21, 203–222.
23. Eugenio-Martin, J.L.; Inchausti-Sintes, F. Low-cost travel and tourism expenditures. Ann. Tour. Res. 2016, 57, 140–159.
24. Yeoman, I. The History of Revenue and Pricing Management–15 Years and More. J. Revenue Pricing Manag. 2016, 15, 185–196.
25. Shlifer, E.; Vardi, Y. An airline overbooking policy. Transp. Sci. 1975, 9, 101–114.
26. Simpson, R.W. Using Network Flow Techniques to Find Shadow Prices for Market Demands and Seat Inventory Control; MIT, Department of Aeronautics and Astronautics, Flight Transportation: Cambridge, MA, USA, 1989.
27. Bitran, G.R.; Mondschein, S.V. An application of yield management to the hotel industry considering multiple day stays. Oper. Res. 1995, 43, 427–443.
28. Kimes, S.E. Strategic Pricing Through Revenue Management. 2010. Available online: https://ecommons.cornell.edu/server/api/core/bitstreams/896ca3ca-3233-41a9-b109-185d6e268cd8/content (accessed on 22 November 2022).
29. Brumelle, S.; McGill, J. A general model for airline overbooking and two-class revenue management with dependent demands. In Technical Report, Working Paper; University of British Columbia: Vancouver, BC, USA, 1989.
30. Dunleavy, H.N. Airline Passenger Overbooking. In The Handbook of Airline Economics; Jenkins, D., Ray, C.P., Eds.; The Aviation Weekly Group; McGraw-Hill: New York, NY, USA, 1995; pp. 469–482.
31. Gosavi, A.; Ozkaya, E.; Kahraman, A.F. Simulation optimization for revenue management of airlines with cancellations and overbooking. OR Spectr. 2007, 29, 21–38.
32. Fouad, A.M.; Atiya, A.F.; Saleh, M.; Bayoumi, A.E.M.M. A simulation-based overbooking approach for hotel revenue management. In Proceedings of the 10th International Computer Engineering Conference (ICENCO), Cairo, Egypt, 29–30 December 2014; pp. 61–69.
33. Baker, T.K.; Collier, D.A. The benefits of optimizing prices to manage demand in hotel revenue management systems. Prod. Oper. Manag. 2003, 12, 502–518.
34. Pimentel, V.; Aizezikali, A.; Baker, T. An evaluation of the bid price and nested network revenue management allocation methods. Comput. Ind. Eng. 2018, 115, 100–108.
35. Higle, J.L. Bid-price control with origin–destination demand: A stochastic programming approach. J. Revenue Pricing Manag. 2007, 5, 291–304.
36. Nuno, A.; de Almeida, A.; Nunes, L. Predicting hotel bookings cancellation with a machine learning classification model. In Proceedings of the 16th IEEE International Conference on Machine Learning and Applications (ICMLA), Cancun, Mexico, 18–21 December 2017; pp. 1049–1054.
37. Subulan, K.; Baykasoğlu, A.; Akyol, D.E.; Yildiz, G. Metaheuristic-based simulation optimization approach to network revenue management with an improved self-adjusting bid price function. Eng. Econ. 2017, 62, 3–32.
38. Koopman, P. Between control and commitment: Management and change as the art of balancing. Leadersh. Organ. Dev. J. 1991, 12, 3–7.
39. Verheul, I. Commitment or Control? Human Resource Management Practices in Female and Male-Led Businesses; ERIM Report Series Reference No. ERS-2007-071-ORG; Rotterdam School of Management: Rotterdam, The Netherlands, 2007.
40. Jenkins, D. (Ed.) Handbook of Airline Economics; Aviation Week Group of McGraw-Hill: Washington, DC, USA, 1995.
41. Aslani, S.; Modarres, M.; Sibdari, S. On the fairness of airlines’ ticket pricing as a result of revenue management techniques. J. Air Transp. Manag. 2014, 40, 56–64.
42. Anderson, R.I.; Lewis, D.; Parker, M.E. Another look at the efficiency of corporate travel management departments. J. Travel Res. 1999, 37, 267–272.
43. Gustafson, P. Managing business travel: Developments and dilemmas in corporate travel management. Tour. Manag. 2012, 33, 276–284.
44. Douglas, A.; Lubbe, B.A. Violation of the corporate travel policy: An exploration of underlying value-related factors. J. Bus. Ethics 2009, 84, 97–111.
45. Holma, A.M. Interpersonal interaction in business triads: Case studies in corporate travel purchasing. J. Purch. Supply Manag. 2012, 18, 101–112.
46. Douglas, A.; Lubbe, B.A. An empirical investigation into the role of personal-related factors on corporate travel policy compliance. J. Bus. Ethics 2010, 92, 451–461.
47. Weber, M.M. Corporate Travel Policy Compliance: A Generational Analysis of Corporate Travellers. Ph.D. Thesis, University of Pretoria, Pretoria, South Africa, 2019.
48. Cortes, C.; Vapnik, V. Support-vector networks. Mach. Learn. 1995, 20, 273–297.
49. McCulloch, W.S.; Pitts, W. A Logical Calculus of the Ideas Immanent in Nervous Activity. Bull. Math. Biophys. 1943, 5, 115–133.
50. Haykin, S. Neural Networks: A Comprehensive Foundation; Prentice Hall PTR: Saddle River, NJ, USA, 1994.
51. Schölkopf, B.; Smola, A.J. Learning with Kernels: Support Vector Machines, Regularization, Optimization, and Beyond; MIT Press: Cambridge, MA, USA, 2002.
52. Breiman, L. Bagging predictors. Mach. Learn. 1996, 24, 123–140.
53. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32.
54. Friedman, J.H. Greedy function approximation: A gradient boosting machine. Ann. Stat. 2001, 29, 1189–1232.
55. Peng, Y.; He, M.; Hu, F.; Mao, Z.; Huang, X.; Ding, J. Predictive Modeling of Flexible EHD Pumps using Kolmogorov-Arnold Networks. arXiv 2024, arXiv:2405.07488.
56. Rumelhart, D.E.; Hinton, G.E.; Williams, R.J. Learning Representations by Backpropagating Errors. Nature 1986, 323, 533–536.
57. Breiman, L.; Friedman, J.H.; Olshen, R.A.; Stone, C.J. Classification and Regression Trees, 1st ed.; Chapman and Hall/CRC: New York, NY, USA, 1984.
58. Cover, T.M.; Hart, P.E. Nearest Neighbor Pattern Classification. IEEE Trans. Inf. Theory 1967, 13, 21–27.
59. MacQueen, J. Some Methods for Classification and Analysis of Multivariate Observations. In Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics and Probability; University of California Press: Berkeley, CA, USA, 1967; pp. 281–297.
60. Johnson, S.C. Hierarchical Clustering Schemes. Psychometrika 1967, 32, 241–254.
61. Reynolds, D.A. Gaussian Mixture Models. In Encyclopedia of Biometrics; Jajodia, S., Prakash, S., Eds.; Springer: New York, NY, USA, 2009; pp. 659–663.
62. Quinlan, J.R. Induction of decision trees. Mach. Learn. 1986, 1, 81–106.
63. Lubbe, B. A study of corporate travel management in selected South African organisations and a conceptual model for effective corporate travel management. S. Afr. J. Econ. Manag. Sci. 2003, 6, 304–330.

Figure 1. This figure illustrates the use of the geometric Brownian motion (GBM) to model price factors of airfares. In a GBM, the variance increases over time. Conversely, the behavior of the price factors (Y) is reversed; the variability of the fares increases as the booking in advance approaches the departure time. The figure displays 100 price factor trajectories using the GBM model from Equation (11); each traveler has a different book-in-advance period. The traveler with the maximum booking in advance (

t_{0}

) is the reference for calculating the simulation period (

t_{i}

). The simulation period is the difference between the maximum booking in advance (

t_{0}

) and each traveler’s book-in-advance period (

x_{i}

) such that

t_{i} = t_{0} - x_{i}

. The model results are a set of bookings in advance (X) and the trajectories’ final values, representing the price factor (Y).

Figure 2. Numerical implementation of the models using the 2019 dataset. The search is exhaustive: each threshold $U_{x_{i}}$ is evaluated before the optimal one is determined. The thresholds (${\hat{U}}_{x_{i}}$) represent the optimal book-in-advance policy according to each model. The traditional split functions SSE (4) and SAE (5) find the threshold where the sum of the errors (squared or absolute) is the lowest, whereas the proposed split functions ADSE (7) and ADAE (8) find the partition in which the errors (squared or absolute) of the two groups are the most nearly equal. The PBA (9) finds the threshold where the scheduled travelers ($S_{b}$) have the highest probability of paying a price factor that is below the average ($\bar{y}$).
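The exhaustive search described in this caption can be sketched as follows. This is a hedged illustration, not the authors' code: Equations (4)–(9) are not reproduced in this part of the article, so the score functions below are assumptions based on the caption's wording (errors are measured around each partition's mean, ADSE is taken as the absolute difference between the two partitions' squared errors, and $\bar{y}$ is taken as the overall mean). Candidate thresholds are assumed to be midpoints between consecutive distinct booking values, and all function names and data are hypothetical.

```python
from statistics import mean


def candidate_thresholds(x):
    """Midpoints between consecutive distinct book-in-advance values."""
    xs = sorted(set(x))
    return [(lo + hi) / 2 for lo, hi in zip(xs, xs[1:])]


def split(x, y, u):
    """Partition price factors into S_a (x < u) and S_b (x > u)."""
    ya = [yi for xi, yi in zip(x, y) if xi < u]
    yb = [yi for xi, yi in zip(x, y) if xi > u]
    return ya, yb


def sq_err(v):
    m = mean(v)
    return sum((vi - m) ** 2 for vi in v)


def sse(ya, yb):
    """Assumed form of the traditional split: total squared error."""
    return sq_err(ya) + sq_err(yb)


def adse(ya, yb):
    """Assumed form of the proposed split: most nearly equal squared errors."""
    return abs(sq_err(ya) - sq_err(yb))


def make_neg_pba(y):
    """Negative share of S_b paying below the overall average (so min() maximizes it)."""
    ybar = mean(y)
    return lambda ya, yb: -sum(yi < ybar for yi in yb) / len(yb)


def best_threshold(x, y, score):
    """Evaluate every candidate and return the one with the lowest score.

    Ties are resolved in favor of the smallest threshold (a choice of this sketch).
    """
    return min(candidate_thresholds(x), key=lambda u: score(*split(x, y, u)))


# Hypothetical toy data: late bookers pay more, early bookers pay less
x = [1, 2, 3, 4, 10, 11, 12, 13]
y = [1.4, 1.5, 1.3, 1.4, 0.9, 0.8, 0.9, 0.8]

u_sse = best_threshold(x, y, sse)
u_adse = best_threshold(x, y, adse)
u_pba = best_threshold(x, y, make_neg_pba(y))
```

On this toy dataset all three scores select the midpoint between the two clusters; on real data, such as the datasets reported in Table 1, the split functions generally select different thresholds.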

Figure 3. This figure visually represents the case study data, illustrating the advance bookings and fares after converting them into price factors. Each data point represents a business trip in 2018, 2019, or 2020. The pandemic led to a reduction in trips for 2020. The optimal thresholds vary depending on each model and year, contrasting with the fixed value of the industry standard (IS), which leads to the conclusion that a company’s optimal booking policy is dynamic and adapts according to travelers’ behavior.

Figure 4. This figure shows the booking in advance and the price factors in the simulated dataset. Simulation 1 has a higher concentration of unscheduled travelers, with most bookings being made just a few days in advance. In contrast, Simulation 2 has a higher concentration of scheduled travelers who book well ahead of time. The industry standard is fixed for all companies; the results indicate that a company with a larger proportion of unscheduled travelers has a lower optimal booking policy compared with a company with a greater proportion of scheduled travelers.

Table 1. Book-in-advance policy by model.

Dataset / Model (Eq.)	SSE (4)	SAE (5)	ADSE (7)	ADAE (8)	$P B A_{S_{b}}$ (9)	Industry Standard
2018	$13.5$	$14.5$	$19.5$	$18.5$	$14.5$	$15.5$
2019	$8.5$	$11.5$	$17.5$	$17.5$	$18.5$	$15.5$
2020	$15.5$	$15.5$	$15.5$	$16.5$	$25.5$	$15.5$
Sim. 1	$1.5$	$1.5$	$3.5$	$4.5$	$17.5$	$15.5$
Sim. 2	$33.5$	$29.5$	$10.5$	$12.5$	$47.5$	$15.5$

Table 2. Probability of paying above the average by model in comparison with the industry standard (IS).

2018	${\hat{U}}_{x_{i}}$	$P A A_{S_{a}}$	$P A A_{S_{b}}$	Total $P A A$	Delta to IS
SSE	$13.5$	$0.544$	$0.340$	$0.884$	$0.045$
SAE	$14.5$	$0.518$	$0.336$	$0.854$	$0.015$
ADSE	$19.5$	$0.456$	$0.347$	$0.803$	$- 0.036$
ADAE	$18.5$	$0.460$	$0.350$	$0.811$	$- 0.029$
IS	$15.5$	$0.502$	$0.337$	$0.839$	0
2019	${\hat{U}}_{x_{i}}$	$P A A_{S_{a}}$	$P A A_{S_{b}}$	Total $P A A$	Delta to IS
SSE	$8.5$	$0.744$	$0.278$	$1.022$	$0.247$
SAE	$11.5$	$0.701$	$0.254$	$0.956$	$0.18$
ADSE	$17.5$	$0.554$	$0.173$	$0.727$	$- 0.048$
ADAE	$17.5$	$0.554$	$0.173$	$0.727$	$- 0.048$
IS	$15.5$	$0.585$	$0.190$	$0.775$	0
2020	${\hat{U}}_{x_{i}}$	$P A A_{S_{a}}$	$P A A_{S_{b}}$	Total $P A A$	Delta to IS
SSE	$15.5$	$0.786$	$0.253$	$1.039$	0
SAE	$15.5$	$0.786$	$0.253$	$1.039$	0
ADSE	$15.5$	$0.786$	$0.253$	$1.039$	0
ADAE	$16.5$	$0.770$	$0.241$	$1.011$	$- 0.028$
IS	$15.5$	$0.786$	$0.253$	$1.039$	0
Sim. 1	${\hat{U}}_{x_{i}}$	$P A A_{S_{a}}$	$P A A_{S_{b}}$	Total $P A A$	Delta to IS
SSE	$1.5$	$0.416$	$0.445$	$0.861$	$- 0.014$
SAE	$1.5$	$0.416$	$0.445$	$0.861$	$- 0.014$
ADSE	$3.5$	$0.428$	$0.446$	$0.874$	$- 0.001$
ADAE	$4.5$	$0.438$	$0.439$	$0.877$	$0.002$
IS	$15.5$	$0.439$	$0.436$	$0.875$	0
Sim. 2	${\hat{U}}_{x_{i}}$	$P A A_{S_{a}}$	$P A A_{S_{b}}$	Total $P A A$	Delta to IS
SSE	$33.5$	$0.438$	$0.486$	$0.924$	$0.023$
SAE	$29.5$	$0.437$	$0.481$	$0.918$	$0.017$
ADSE	$10.5$	$0.429$	$0.466$	$0.895$	$- 0.006$
ADAE	$12.5$	$0.432$	$0.465$	$0.897$	$- 0.004$
IS	$15.5$	$0.433$	$0.468$	$0.901$	0
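The last two columns of Table 2 appear to follow from the first two: the total is the sum of the probabilities for both partitions, and the delta is the difference between a model's total and the industry standard's total. The short check below reproduces three of the 2019 rows under that reading; the variable and function names are illustrative only.

```python
# (PAA_Sa, PAA_Sb) for three 2019 rows, copied from Table 2
rows_2019 = {
    "SSE": (0.744, 0.278),
    "ADSE": (0.554, 0.173),
    "IS": (0.585, 0.190),
}


def total_paa(paa_a, paa_b):
    """Total PAA, read here as the sum over both partitions."""
    return round(paa_a + paa_b, 3)


totals = {model: total_paa(*p) for model, p in rows_2019.items()}

# Delta to IS: a negative value means a lower total PAA than the industry standard
deltas = {model: round(t - totals["IS"], 3) for model, t in totals.items()}
```

Because the published values are rounded to three decimals, recomputing other rows from the table may differ from the printed totals in the last digit.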

Table 3. Probability of paying below the average for scheduled travelers ($S_{b}$) under the PBA model in comparison with the industry standard (IS).

	2018	2019	2020	Sim. 1	Sim. 2
PBA $S_{b}$	$0.664$	$0.839$	$0.862$	$0.572$	$0.583$
IS	$0.663$	$0.810$	$0.747$	$0.564$	$0.532$
Delta	$0.002$	$0.029$	$0.115$	$0.008$	$0.051$

Table 4. Results by dataset. The SSE, SAE, ADSE, ADAE, and PBA are defined in Equations (4), (5), (7), (8), and (9), respectively; IS denotes the industry standard.

2018	${\hat{U}}_{x_{i}}$	$\bar{y_{a}}$	$\bar{y_{b}}$	$\sigma_{y_{a}}^{2}$	$\sigma_{y_{b}}^{2}$	$P A A_{S_{a}}$	$P A A_{S_{b}}$	$P B A_{S_{a}}$	$P B A_{S_{b}}$	n	$k_{a}$	$k_{b}$	Time (s)
SSE	$13.5$	$1.106$	$0.948$	$0.220$	$0.275$	$0.544$	$0.34$	$0.456$	$0.660$	1928	638	1290	$0.22$
SAE	$14.5$	$1.092$	$0.94$	$0.226$	$0.277$	$0.518$	$0.336$	$0.482$	$0.664$	1928	760	1168	$0.192$
ADSE	$19.5$	$1.040$	$0.949$	$0.237$	$0.289$	$0.456$	$0.347$	$0.544$	$0.653$	1928	1075	853	$0.157$
ADAE	$18.5$	$1.044$	$0.952$	$0.236$	$0.286$	$0.460$	$0.350$	$0.540$	$0.650$	1928	1006	922	$0.156$
PBA $S_{b}$	$14.5$	$1.092$	$0.940$	$0.226$	$0.277$	$0.518$	$0.336$	$0.482$	$0.664$	1928	760	1168	$0.159$
IS	$15.5$	$1.078$	$0.942$	$0.226$	$0.281$	$0.502$	$0.337$	$0.498$	$0.663$	1928	825	1103	−
2019	${\hat{U}}_{x_{i}}$	$\bar{y_{a}}$	$\bar{y_{b}}$	$\sigma_{y_{a}}^{2}$	$\sigma_{y_{b}}^{2}$	$P A A_{S_{a}}$	$P A A_{S_{b}}$	$P B A_{S_{a}}$	$P B A_{S_{b}}$	n	$k_{a}$	$k_{b}$	Time (s)
SSE	$8.5$	$1.314$	$0.934$	$0.304$	$0.23$	$0.744$	$0.278$	$0.256$	$0.722$	1762	305	1457	$0.215$
SAE	$11.5$	$1.256$	$0.922$	$0.278$	$0.233$	$0.701$	$0.254$	$0.299$	$0.746$	1762	412	1350	$0.202$
ADSE	$17.5$	$1.098$	$0.907$	$0.268$	$0.242$	$0.554$	$0.173$	$0.446$	$0.827$	1762	859	903	$0.18$
ADAE	$17.5$	$1.098$	$0.907$	$0.268$	$0.242$	$0.554$	$0.173$	$0.446$	$0.827$	1762	859	903	$0.184$
PBA $S_{b}$	$18.5$	$1.088$	$0.905$	$0.263$	$0.247$	$0.540$	$0.161$	$0.460$	$0.839$	1762	918	844	$0.184$
IS	$15.5$	$1.126$	$0.906$	$0.263$	$0.243$	$0.585$	$0.190$	$0.415$	$0.810$	1762	752	1010	−
2020	${\hat{U}}_{x_{i}}$	$\bar{y_{a}}$	$\bar{y_{b}}$	$\sigma_{y_{a}}^{2}$	$\sigma_{y_{b}}^{2}$	$P A A_{S_{a}}$	$P A A_{S_{b}}$	$P B A_{S_{a}}$	$P B A_{S_{b}}$	n	$k_{a}$	$k_{b}$	Time (s)
SSE	$15.5$	$1.154$	$0.870$	$0.073$	$0.058$	$0.786$	$0.253$	$0.214$	$0.747$	153	70	83	$0.197$
SAE	$15.5$	$1.154$	$0.870$	$0.073$	$0.058$	$0.786$	$0.253$	$0.214$	$0.747$	153	70	83	$0.289$
ADSE	$15.5$	$1.154$	$0.870$	$0.073$	$0.058$	$0.786$	$0.253$	$0.214$	$0.747$	153	70	83	$0.037$
ADAE	$16.5$	$1.143$	$0.866$	$0.076$	$0.055$	$0.770$	$0.241$	$0.230$	$0.759$	153	74	79	$0.033$
PBA $S_{b}$	$25.5$	$1.045$	$0.808$	$0.081$	$0.054$	$0.581$	$0.138$	$0.419$	$0.862$	153	124	29	$0.037$
IS	$15.5$	$1.154$	$0.870$	$0.073$	$0.058$	$0.786$	$0.253$	$0.214$	$0.747$	153	70	83	−
Sim. 1	${\hat{U}}_{x_{i}}$	$\bar{y_{a}}$	$\bar{y_{b}}$	$\sigma_{y_{a}}^{2}$	$\sigma_{y_{b}}^{2}$	$P A A_{S_{a}}$	$P A A_{S_{b}}$	$P B A_{S_{a}}$	$P B A_{S_{b}}$	n	$k_{a}$	$k_{b}$	Time (s)
SSE	$1.5$	$0.976$	$1.007$	$0.093$	$0.081$	$0.416$	$0.445$	$0.584$	$0.555$	2000	459	1541	$0.225$
SAE	$1.5$	$0.976$	$1.007$	$0.093$	$0.081$	$0.416$	$0.445$	$0.584$	$0.555$	2000	459	1541	$0.205$
ADSE	$3.5$	$0.991$	$1.007$	$0.096$	$0.076$	$0.428$	$0.446$	$0.572$	$0.554$	2000	841	1159	$0.087$
ADAE	$4.5$	$0.994$	$1.006$	$0.093$	$0.076$	$0.438$	$0.439$	$0.562$	$0.561$	2000	984	1016	$0.084$
PBA $S_{b}$	$17.5$	$1.001$	$0.992$	$0.088$	$0.056$	$0.440$	$0.428$	$0.560$	$0.572$	2000	1757	243	$0.091$
IS	$15.5$	$1.000$	$0.997$	$0.088$	$0.062$	$0.439$	$0.436$	$0.561$	$0.564$	2000	1711	289	−
Sim. 2	${\hat{U}}_{x_{i}}$	$\bar{y_{a}}$	$\bar{y_{b}}$	$\sigma_{y_{a}}^{2}$	$\sigma_{y_{b}}^{2}$	$P A A_{S_{a}}$	$P A A_{S_{b}}$	$P B A_{S_{a}}$	$P B A_{S_{b}}$	n	$k_{a}$	$k_{b}$	Time (s)
SSE	$33.5$	$0.996$	$1.013$	$0.092$	$0.063$	$0.438$	$0.486$	$0.562$	$0.514$	2000	1506	494	$0.226$
SAE	$29.5$	$0.995$	$1.011$	$0.093$	$0.064$	$0.437$	$0.481$	$0.563$	$0.519$	2000	1408	592	$0.216$
ADSE	$10.5$	$0.993$	$1.005$	$0.101$	$0.073$	$0.429$	$0.466$	$0.571$	$0.534$	2000	854	1146	$0.084$
ADAE	$12.5$	$0.995$	$1.004$	$0.103$	$0.069$	$0.432$	$0.465$	$0.568$	$0.535$	2000	918	1082	$0.083$
PBA $S_{b}$	$47.5$	$1.000$	$1.003$	$0.088$	$0.054$	$0.454$	$0.417$	$0.546$	$0.583$	2000	1782	218	$0.086$
IS	$15.5$	$0.996$	$1.005$	$0.101$	$0.067$	$0.433$	$0.468$	$0.567$	$0.532$	2000	1026	974	−

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
