1. Introduction
Thailand’s transport sector faces critical challenges in energy management and sustainable development amid rapid economic growth and urbanization. As the country’s second-largest energy-consuming sector, transport accounts for approximately 30% of total final energy consumption, with a consistent upward trend over the past three decades. This escalating energy demand poses significant challenges to energy security, environmental sustainability, and the nation’s commitment to reducing greenhouse gas emissions under international agreements [
1,
2]. The complexity of Thailand’s transport energy consumption patterns is uniquely characterized by several factors. First, the country has a distinctive vehicle fleet composition, with an unusually high proportion of motorcycles alongside a growing number of private cars and commercial vehicles [
3,
4]. Second, there are significant disparities in transportation needs and energy consumption patterns between urban and rural areas, reflecting Thailand’s diverse geographical and economic landscape [
5,
6]. Third, the rapid expansion of e-commerce and logistics services has led to increased energy demand in the freight transport sector [
7,
8]. In this context, the ability to accurately predict transport energy consumption becomes crucial for effective policy planning and resource allocation [
9,
10].
Recent advancements in artificial intelligence and machine learning, particularly Long Short-Term Memory (LSTM) neural networks and XGBoost, offer promising alternatives for more accurate transport energy consumption predictions. These methods have demonstrated superior capabilities in handling complex, time-dependent data and capturing non-linear relationships between variables. LSTM networks excel in modeling sequential data with temporal dependencies, making them particularly suitable for forecasting energy consumption patterns over time [
11,
12]. XGBoost, as an ensemble learning method, has gained significant traction due to its computational efficiency and ability to handle diverse datasets with minimal preprocessing [
13,
14]. Both approaches can effectively incorporate multiple variables and identify complex interactions that traditional statistical methods often struggle to capture. Recent studies by Çınarer et al. [
15] and Hoxha et al. [
16] have demonstrated that these advanced methods significantly outperform conventional approaches in predictive accuracy for transportation-related energy consumption and emissions. However, their application in the specific context of Thailand’s transport sector remains limited, particularly in incorporating the unique characteristics of the country’s transport system and vehicle usage patterns [
17,
18].
A review of the relevant literature reveals several key trends in transport energy consumption prediction. Recent research by Pongthanaisawan et al. [
1] has highlighted the significant growth in fuel consumption and greenhouse gas emissions in Thailand’s transport sector, emphasizing the need for accurate predictive models. Studies by Ağbulut [
2] and Ji et al. [
13] have demonstrated the efficacy of machine learning approaches in forecasting transportation energy demand and emissions, with their models showing high prediction accuracies ranging from 90.8% to 95.2%. The application of LSTM networks has been particularly noteworthy, with researchers such as Duan et al. [
12] and Ghanbari and Borna [
19] demonstrating their effectiveness in capturing temporal dependencies in time-series data related to energy consumption. In parallel, XGBoost has emerged as a powerful alternative, with studies by Çınarer et al. [
15] showcasing its superior performance in predicting carbon emissions and energy needs in transportation contexts. These advanced modeling techniques have significantly outperformed traditional statistical approaches, as demonstrated by Rahman et al. [
20] and Shams Amiri et al. [
21] in their comparative analyses. When considering this body of research alongside the available dataset (1993–2022), several specific research gaps become apparent. First, while previous studies have incorporated vehicle-related variables, none have comprehensively analyzed the combined impact of differentiated vehicle categories (small, medium, and large) alongside their corresponding vehicle kilometers traveled (VKT) data for motorcycles, passenger vehicles, and trucks. This unique combination of variables presents an opportunity for more nuanced prediction modeling that has not been explored in the existing literature. Second, although studies like Mohsin et al. [
22] and Limanond et al. [
23] have used GDP and population data, they have not specifically examined the long-term relationship (30-year period) between these macroeconomic indicators and transport energy consumption in the Thai context. The extensive temporal coverage of the dataset (1993–2022) offers a unique opportunity to analyze long-term patterns that most existing studies, typically using 5–10-year periods, cannot address. Furthermore, while existing research has employed LSTM and XGBoost models, there has been limited exploration of how these methods perform with disaggregated vehicle data (both registration numbers and VKT). The granularity of the vehicle-related variables provides an opportunity to develop more sophisticated prediction models that can account for the distinct impacts of different vehicle types and their usage patterns on energy consumption. Additionally, there is a notable gap in understanding how the relationship between vehicle registration numbers and actual vehicle usage (VKT) influences energy consumption predictions. This aspect is particularly relevant for Thailand’s transport policy planning but remains unexplored in the current literature.
This study utilizes a comprehensive dataset spanning from 1993 to 2022, which provides a unique opportunity for detailed analysis and modeling. The dataset includes the following: (1) vehicle registration data categorized by size (small, medium, and large vehicles), (2) vehicle kilometers traveled (VKT) for different vehicle types (motorcycles, passenger vehicles, and trucks), (3) macroeconomic indicators (GDP and population), and (4) historical transport energy consumption data. The length and granularity of this dataset enable a more nuanced understanding of the relationships between vehicle ownership, usage patterns, economic factors, and energy consumption. This rich data environment allows for the development of more sophisticated and accurate prediction models that can capture both long-term trends and short-term variations in transport energy consumption [
8]. By leveraging this extensive historical data alongside advanced machine learning techniques, this study aims to provide insights that bridge the gap between technical modeling capabilities and practical policy applications in Thailand’s transport sector, addressing a critical need highlighted by Supasa et al. [
7] and Ji et al. [
13] in their research on energy consumption patterns and emissions in the transportation sector.
The main objectives of this study are as follows:
To develop and compare the performance of LSTM and XGBoost models in predicting Thailand’s transport energy consumption, evaluating their respective strengths and limitations in handling different aspects of the prediction task.
To analyze the differential impact of various vehicle categories and their usage patterns on energy consumption, providing insights into how fleet composition affects overall energy demand.
To identify and quantify the relative importance of different factors affecting transport energy consumption, including vehicle-related variables, economic indicators, and demographic factors.
To provide evidence-based recommendations for transport energy policy planning, particularly in areas of vehicle fleet management and energy efficiency improvements.
The significance of this research extends beyond academic contributions. For policymakers, accurate predictions of transport energy consumption are essential for infrastructure planning, setting realistic energy efficiency targets, developing effective energy conservation measures, and allocating resources for sustainable transport initiatives [
24,
25]. The methodological advancements in this study contribute to a growing body of work on applying machine learning to energy forecasting, addressing limitations in traditional approaches identified by Antonopoulos et al. [
26] and Alabi et al. [
27]. Furthermore, this research comes at a critical time when Thailand is striving to balance economic growth with sustainable development goals and energy security concerns. The country’s commitment to reducing greenhouse gas emissions and promoting sustainable transport solutions makes the accurate prediction of transport energy consumption particularly relevant for future policy planning [
1]. By providing a comparative analysis of LSTM and XGBoost models in the specific context of Thailand’s transport sector, this study offers valuable insights for both academic understanding and practical policy implementation, addressing a significant gap in regional research on advanced predictive modeling for transportation energy demand as highlighted by Rahman et al. [
20] and Emami Javanmard et al. [
28].
This study’s planning perspective is particularly significant given Thailand’s current transportation policy landscape. Accurate predictive models for transport energy consumption serve as critical decision-support tools for policymakers tasked with developing sustainable transport systems [
23,
24]. The ability to forecast energy demand with greater precision enables more efficient resource allocation, supports targeted infrastructure investments, and facilitates the development of evidence-based regulatory frameworks [
25]. Furthermore, the identification of key drivers of energy consumption through feature importance analysis provides valuable insights for prioritizing policy interventions and designing effective energy conservation measures [
26]. As Thailand continues to balance economic development with environmental sustainability goals, robust predictive models that capture the complex relationships between vehicle fleet composition, usage patterns, and energy consumption become increasingly essential for strategic transportation planning [
27,
28]. By identifying which factors most significantly influence transport energy consumption, this research provides practical guidance for policymakers to develop more targeted and effective interventions, potentially yielding substantial energy savings and emissions reductions while maintaining economic growth.
2. Literature Review
Recent research in transport energy consumption prediction has witnessed significant advancements through artificial intelligence applications. A comprehensive analysis of the literature reveals several distinct research streams that have emerged in the past five years (Table 1 and Table A1).
The application of deep learning approaches, particularly LSTM networks, has gained considerable attention for transport energy forecasting. Duan et al. [
12] demonstrated LSTM’s effectiveness in capturing temporal dependencies in time-series data, achieving prediction accuracies of up to 94% for transportation-related energy forecasting. Similarly, Ghanbari and Borna [
19] enhanced LSTM with attention mechanisms for real-time predictions of energy consumption in public transport systems, reporting accuracy improvements of 8–15% over traditional statistical methods. These advancements were further refined by Karim et al. [
11], who introduced multivariate LSTM-FCNs that demonstrated superior performance in handling multiple input variables for energy demand forecasting.
In parallel, gradient boosting methods have shown promising results for transport energy prediction. Ji et al. [
13] applied XGBoost to predict CO₂ emissions and energy consumption in China’s transport sector, leveraging economic indicators and vehicle usage data to achieve remarkable accuracy with $R^2$ values exceeding 0.95. Çınarer et al. [
15] conducted a comparative analysis of machine learning approaches for predicting transportation-related emissions in Turkey, finding that XGBoost consistently outperformed support vector machines and neural networks across various input scenarios. Similarly, Zhang et al. [
14] demonstrated XGBoost’s effectiveness in handling urban transport data from expanding megacities, achieving lower RMSE values compared to traditional statistical methods.
The integration of socioeconomic factors with vehicle-specific variables has emerged as another significant research direction. Rahman et al. [
20] developed causality-based machine learning models to establish relationships between GDP, urbanization, and transport energy demand in Saudi Arabia, finding that urbanization exerted a stronger influence than economic growth on energy consumption patterns. This approach was extended by Hoxha et al. [
16], who employed stacking ensemble methods to improve prediction accuracy by incorporating diverse socioeconomic indicators alongside traditional transport metrics.
Regional studies have provided valuable insights into context-specific factors affecting transport energy consumption. In Thailand, Pongthanaisawan et al. [
1] developed econometric models to forecast transport energy demand and emissions until 2030, identifying significant relationships between economic growth and transportation energy intensity. Champeecharoensuk et al. [
17] focused on aviation emissions in Thailand, demonstrating how economic growth and aviation activity served as primary drivers of increasing energy demand in the sector. These studies highlight the importance of region-specific analyses but are limited by their reliance on conventional statistical approaches rather than advanced machine learning techniques.
Methodological innovations have further expanded the predictive capabilities in this domain. Wang et al. [
29] introduced a LASSO regression framework for predicting fuel consumption in maritime transportation, addressing multicollinearity challenges in feature variables. Antonopoulos et al. [
26] conducted a systematic review of AI applications in energy demand-side response, identifying significant opportunities for machine learning in transport energy optimization. Alabi et al. [
27] explored the integration of optimization techniques with machine learning for energy systems planning, highlighting the potential for hybrid approaches in transport energy modeling.
The temporal scope of previous studies varies considerably, with most focusing on short to medium-term forecasting horizons of 5–10 years. Ağbulut [
2] provided longer-term projections of transportation energy demand in Turkey until 2050, but noted increasing uncertainty with extended forecast horizons. Chai et al. [
24] examined historical trends in road transportation energy consumption in China, establishing ‘S’ type patterns in the relationship between economic development and energy intensity that could inform long-term projections.
Despite these advances, several notable research gaps persist. First, while previous studies have incorporated vehicle-related variables, few have comprehensively analyzed the combined impact of different vehicle categories alongside their corresponding usage patterns. Second, the long-term relationship between macroeconomic indicators and transport energy consumption remains underexplored, particularly in developing economies like Thailand. Third, there are limited comparative analyses of deep learning versus gradient-boosting approaches for transport energy forecasting using consistent evaluation metrics and datasets. Finally, feature importance analysis to identify key drivers of transport energy consumption has received insufficient attention despite its potential to inform targeted policy interventions.
This study addresses these gaps by leveraging a comprehensive dataset spanning 30 years (1993–2022), incorporating granular vehicle data categorized by size and type, comparing two advanced machine learning approaches using consistent evaluation metrics, and conducting detailed feature importance analysis to identify key drivers of transport energy consumption in Thailand. This approach enables a more nuanced understanding of the complex relationships between vehicle ownership, usage patterns, economic factors, and energy consumption than has been previously achieved in the literature.
3. Methodology
The selection of appropriate modeling approaches is critical for achieving reliable and accurate predictions in transport energy forecasting. From the vast array of artificial intelligence and machine learning techniques available in the literature, this study deliberately chose Long Short-Term Memory (LSTM) neural networks and Extreme Gradient Boosting (XGBoost) based on several key considerations.
LSTM was selected primarily for its proven capability in handling sequential time-series data with temporal dependencies, which is essential when analyzing transportation patterns that evolve over extended periods. Unlike traditional recurrent neural networks, LSTM’s unique architecture with memory cells enables it to capture long-term relationships in time-series data while avoiding the vanishing gradient problem. This architecture is particularly relevant for the dataset spanning three decades (1993–2022), where historical patterns may significantly influence future energy consumption trends. Previous applications of LSTM in energy forecasting domains have demonstrated its effectiveness, with studies by Karim et al. [
11] and Duan et al. [
12] showing superior performance compared to conventional statistical methods for similar prediction tasks.
Conversely, XGBoost was selected for its exceptional performance in handling structured tabular data with complex feature relationships, which characterizes the multivariate dataset with vehicle categorizations, economic indicators, and usage patterns. As an ensemble learning method based on decision trees, XGBoost excels at capturing non-linear relationships and interactions between variables without requiring extensive data preprocessing. Its built-in regularization features help prevent overfitting, which is particularly valuable when working with relatively limited observations (30 years in this case) but multiple predictor variables. Additionally, XGBoost’s computational efficiency and inherent feature importance capabilities provide practical advantages for both model training and interpretability. Recent studies by Zhang et al. [
14] and Çınarer et al. [
15] have demonstrated XGBoost’s effectiveness specifically for transportation-related energy and emissions prediction tasks.
3.1. Data Collection
This study employs a comprehensive dataset spanning from 1993 to 2022, encompassing various aspects of Thailand’s transport sector, as shown in Table 2.
3.1.1. Transport Energy Consumption
The transport energy consumption in Thailand demonstrates a significant upward trajectory over the 30-year period, increasing from 14.581 MTOE in 1993 to 30.927 MTOE in 2022, representing a 112% growth. The data reveals three distinct phases: steady growth (1993–2008), accelerated increase (2009–2019), and pandemic disruption (2020–2022). The most substantial growth occurred between 2009 and 2019, with an average annual increase of 3.4%. However, the COVID-19 pandemic caused a notable disruption, resulting in an 11.6% decline from 33.607 MTOE in 2019 to 29.699 MTOE in 2020. The sector has shown signs of recovery since 2021, though not yet reaching pre-pandemic levels. This pattern reflects Thailand’s economic development, urbanization, and increasing motorization rate. The mean consumption over the period was 24.374 MTOE, with a standard deviation of 5.484 MTOE, indicating considerable variability in consumption patterns.
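The growth figures quoted above can be checked with a few lines of arithmetic. The following is a minimal sketch that uses only the MTOE values stated in this subsection; the function names are illustrative, and the compound annual growth rate is a derived quantity that is not reported in the text.

```python
def pct_change(start: float, end: float) -> float:
    """Percentage change from start to end."""
    return (end - start) / start * 100.0


def cagr(start: float, end: float, intervals: int) -> float:
    """Compound annual growth rate in percent over a number of yearly intervals."""
    return ((end / start) ** (1.0 / intervals) - 1.0) * 100.0


# Values quoted in the text (MTOE).
energy_1993, energy_2019, energy_2020, energy_2022 = 14.581, 33.607, 29.699, 30.927

total_growth = pct_change(energy_1993, energy_2022)   # overall change, 1993 to 2022
pandemic_drop = pct_change(energy_2019, energy_2020)  # change from 2019 to 2020
annualized = cagr(energy_1993, energy_2022, 2022 - 1993)

print(f"1993-2022 growth: {total_growth:.1f}%")
print(f"2019-2020 change: {pandemic_drop:.1f}%")
print(f"Implied compound annual growth, 1993-2022: {annualized:.2f}%")
```

The first two results reproduce the 112% growth and 11.6% decline stated above; the same helper can be applied to the GDP, population, and vehicle figures in the following subsections.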
3.1.2. GDP
Thailand’s Gross Domestic Product exhibits a strong positive trend, growing from 4.341 trillion Baht in 1993 to 10.676 trillion Baht in 2022, representing a 146% increase. The growth pattern shows resilience despite several economic challenges, including the 1997 Asian Financial Crisis and the 2008 Global Financial Crisis. The average GDP over the period was 7.821 trillion Baht, with a standard deviation of 2.124 trillion Baht. The data reveal four distinct growth phases: rapid growth (1993–1997), recovery and stabilization (1998–2003), sustained growth (2004–2019), and pandemic impact (2020–2022). The strongest period of growth was observed between 2004 and 2019, with an average annual growth rate of 4.2%. The COVID-19 pandemic caused a significant contraction in 2020, with GDP falling by 6.1% from 10.919 trillion Baht in 2019 to 10.259 trillion Baht in 2020. However, the economy showed remarkable recovery strength, rebounding to 10.676 trillion Baht by 2022.
3.1.3. Population
Thailand’s population demonstrates a steady but slowing growth pattern, increasing from 57.776 million in 1993 to 71.697 million in 2022, representing a 24.1% growth over the 30-year period. The data reveals a gradual demographic transition, with the average annual growth rate declining from 1.4% in the early 1990s to 0.3% in recent years. The mean population over the period was 66.839 million, with a relatively small standard deviation of 4.227 million, indicating stable demographic changes. The population growth pattern can be divided into three phases: moderate growth (1993–2000), slowing growth (2001–2010), and near-stabilization (2011–2022). This trend reflects Thailand’s successful family planning programs and the country’s transition to an aging society. The data also show increasing urbanization, with a higher concentration of population in urban areas over time. The slowing population growth has significant implications for transport energy consumption patterns, particularly in terms of changing mobility needs and transportation preferences.
3.1.4. Vehicle Registration Data (Small, Medium, Large)
The vehicle registration data reveal dramatic changes in Thailand’s motorization patterns. Small vehicles, primarily motorcycles and mopeds, increased from 7.313 million in 1993 to 22.301 million in 2022, representing a 205% growth. Medium vehicles showed the most substantial increase, from 3.030 million to 18.970 million (526% growth), reflecting rising middle-class affluence and changing consumer preferences. Large vehicles experienced more modest growth, from 0.533 million to 1.359 million (155% growth). The data show distinct growth phases: rapid motorization (1993–2003), stabilization (2004–2010), and renewed growth (2011–2022). The average number of registered vehicles reached 16.956 million for small vehicles (SD = 4.439), 9.843 million for medium vehicles (SD = 4.851), and 0.972 million for large vehicles (SD = 0.254). The growth patterns indicate a significant shift in Thailand’s vehicle fleet composition, with medium-sized vehicles gaining an increasingly larger share of the total fleet.
3.1.5. Vehicle Kilometers Traveled
The Vehicle Kilometers Traveled data provide crucial insights into actual vehicle usage patterns. Motorcycle VKT closely mirrors small vehicle registration trends, increasing from 7.313 million vehicle kilometers in 1993 to 22.301 million in 2022. Passenger vehicle VKT showed the most dramatic increase, from 3.138 million to 19.103 million vehicle kilometers, representing a 509% growth. Truck VKT demonstrated steady but slower growth, from 0.426 million to 1.226 million vehicle kilometers (188% growth). The mean VKT values were 16.956 million for motorcycles (SD = 4.439), 10.027 million for passenger vehicles (SD = 4.915), and 0.827 million for trucks (SD = 0.232). The data reveal changing mobility patterns, with significant increases in personal vehicle usage and moderate growth in freight transport. The COVID-19 pandemic caused temporary reductions in VKT across all categories in 2020, but recovery trends became evident by 2022, particularly in passenger vehicle usage.
3.2. Multivariate Long Short-Term Memory (LSTM) Neural Networks
Multivariate Long Short-Term Memory (LSTM) Neural Networks represent an advanced adaptation of the traditional LSTM architecture, specifically designed to handle multiple input variables in time series forecasting. First introduced as an extension of univariate LSTM models, multivariate LSTM has gained significant attention in complex forecasting applications where multiple interrelated factors influence the target variable [
12,
19]. This architecture’s ability to process multiple time-dependent variables simultaneously while maintaining temporal relationships has made it particularly valuable in energy consumption prediction and complex system modeling.
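Before a multivariate LSTM can be trained, the yearly observations are usually arranged into overlapping windows of shape (samples, time steps, features). The sketch below illustrates one common way to do this; the helper name `make_windows`, the look-back length of three years, and the toy numbers are assumptions for illustration and are not taken from this study.

```python
from typing import List, Sequence, Tuple


def make_windows(
    features: Sequence[Sequence[float]],
    target: Sequence[float],
    lookback: int,
) -> Tuple[List[List[List[float]]], List[float]]:
    """Build (samples, lookback, n_features) inputs and the value that follows each window."""
    if len(features) != len(target):
        raise ValueError("features and target must have the same length")
    X, y = [], []
    for end in range(lookback, len(features)):
        # Rows end-lookback .. end-1 form the input window; row `end` is the target year.
        X.append([list(row) for row in features[end - lookback:end]])
        y.append(target[end])
    return X, y


# Toy example with made-up numbers: 5 years, 2 features per year.
toy_features = [[1.0, 10.0], [2.0, 20.0], [3.0, 30.0], [4.0, 40.0], [5.0, 50.0]]
toy_target = [0.1, 0.2, 0.3, 0.4, 0.5]

X, y = make_windows(toy_features, toy_target, lookback=3)
print(len(X), len(X[0]), len(X[0][0]))  # samples, time steps, features
```

With 30 annual observations, a longer look-back leaves fewer training windows, so the window length is a trade-off between temporal context and sample count.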
The fundamental structure of multivariate LSTM builds upon the standard LSTM cell architecture but incorporates multiple input features at each time step, as shown in Figure 1. The network processes these inputs through its gates: the forget gate ($f_t$), input gate ($i_t$), and output gate ($o_t$), which now handle vectors of multiple variables rather than single values. The mathematical formulation extends the basic LSTM equations to accommodate multiple input variables: $x_t$ becomes a vector $[x_{1t}, x_{2t}, \ldots, x_{nt}]$, where $n$ represents the number of input variables. The cell state update equation becomes more complex, as it must account for the interactions between different variables:
$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$
, where $\tilde{c}_t$ now incorporates information from multiple input features [11].
The key advantage of multivariate LSTM lies in its ability to capture complex relationships and dependencies not only across time but also between different variables. For instance, in transport energy consumption forecasting, the model can simultaneously process vehicle registration data, economic indicators, and seasonal patterns, learning how these variables interact and influence energy consumption patterns. Studies by Henriques et al. [
25] demonstrated that multivariate LSTM models achieve significantly higher accuracy compared to univariate approaches, with improvements in prediction accuracy ranging from 15% to 25%.
The architecture’s effectiveness stems from its capacity to learn feature interactions automatically through its training process. Unlike traditional statistical methods that often require explicit specification of variable relationships, multivariate LSTM can discover and leverage both linear and non-linear relationships between input variables. This capability is particularly valuable in complex systems where the relationships between variables may not be immediately apparent or may change over time. Additionally, the model’s memory mechanism allows it to retain information about multiple variables over long sequences, making it especially suitable for long-term forecasting tasks where historical patterns across multiple dimensions influence future values [
13,
30].
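The gate computations described in this section can be sketched as a single forward step of a standard LSTM cell with a multivariate input vector. This is an illustrative sketch in plain Python, not the implementation used in the study; the parameter names (`W_f`, `b_f`, and so on), the hidden size, and the numeric values are assumptions chosen only to make the example runnable.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def affine(W: Matrix, z: Vector, b: Vector) -> Vector:
    """Compute W z + b for a small dense layer."""
    return [sum(w * zi for w, zi in zip(row, z)) + bi for row, bi in zip(W, b)]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector, params: Dict) -> Tuple[Vector, Vector]:
    """One step of a standard LSTM cell; x_t holds all input variables for one time step."""
    z = h_prev + x_t  # list concatenation: [h_{t-1}, x_t]
    f_t = [sigmoid(v) for v in affine(params["W_f"], z, params["b_f"])]      # forget gate
    i_t = [sigmoid(v) for v in affine(params["W_i"], z, params["b_i"])]      # input gate
    o_t = [sigmoid(v) for v in affine(params["W_o"], z, params["b_o"])]      # output gate
    c_hat = [math.tanh(v) for v in affine(params["W_c"], z, params["b_c"])]  # candidate state
    # Cell state update: keep part of the old state, add part of the candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


# Hypothetical sizes: 3 input variables, 2 hidden units, small constant weights.
n_inputs, n_hidden = 3, 2
demo_params = {
    name: [[0.1] * (n_hidden + n_inputs) for _ in range(n_hidden)]
    for name in ("W_f", "W_i", "W_o", "W_c")
}
demo_params.update({name: [0.0] * n_hidden for name in ("b_f", "b_i", "b_o", "b_c")})

h, c = lstm_step([0.5, -0.2, 0.1], [0.0, 0.0], [0.0, 0.0], demo_params)
print(h, c)
```

Because every gate sees the full input vector, a change in any one variable can alter how much of the cell state is kept or overwritten, which is how interactions between variables enter the model.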
3.3. XGBoost
XGBoost is a powerful machine learning technique that enhances the efficiency of constructing scalable decision trees and is widely applied in transport energy forecasting due to its accuracy and computational efficiency [
31]. Developed from the gradient boosting decision tree (GBDT) algorithm, XGBoost improves upon GBDT by utilizing a second-order Taylor expansion for the loss function and incorporating regularization to reduce model complexity and prevent overfitting [
32]. These enhancements make XGBoost particularly suitable for forecasting transport energy consumption, where multiple interdependent factors—such as vehicle registration, vehicle kilometers traveled (VKT), GDP, and population growth—must be accounted for in predictive modeling. The algorithm constructs a series of decision trees and refines predictions iteratively using a regularization function to balance model complexity and predictive accuracy.
A gradient boosting model such as XGBoost uses $N$ additive functions to predict the output [31]:
$$\hat{y}_i = \sum_{k=1}^{N} f_k(x_i), \quad f_k \in F$$
Here, $F = \{ f(x) = w_{q(x)} \}$ refers to the space of regression trees, following the Classification and Regression Trees (CART) methodology. $N$ denotes the total number of trees, while $F$ represents the space of all trees. The variable $q$ defines the tree structure, $T$ indicates the number of leaves, and $f_k$ represents an independent tree with the structure $q$ and leaf weights $w$. To learn the set of functions used in the model, the following regularized objective is minimized:
$$L = \sum_{i} l(y_i, \hat{y}_i) + \sum_{k=1}^{N} \Omega(f_k), \quad \Omega(f) = \gamma T + \frac{1}{2} \lambda \lVert w \rVert^2$$
The term $l$ represents a differentiable convex loss function that measures the difference between the prediction $\hat{y}_i$ and the target $y_i$. The second term, $\Omega$, serves as a regularization function to penalize model complexity, incorporating the regression tree parameters $\gamma$ and $\lambda$ as regularization factors. This term helps smooth the final learned weights, effectively reducing the risk of overfitting. The primary goal of regularization is to select a simpler, more generalizable model that maintains high predictive accuracy. Because the ensemble includes functions as parameters, it cannot be optimized with traditional optimization methods in Euclidean space; instead, the model is trained additively, optimizing the loss one tree at a time:
$$L^{(k)} = \sum_{i} l\left(y_i, \hat{y}_i^{(k-1)} + f_k(x_i)\right) + \Omega(f_k)$$
At each step, the function $f_k$ that most improves the objective is added to the model, ensuring that each tree incrementally improves the forecast accuracy. This methodology enables XGBoost to efficiently capture transport energy consumption trends by integrating complex relationships between economic indicators, transport activity, and policy changes, making it a valuable tool for sustainable energy planning and forecasting.
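The additive training idea can be illustrated with a deliberately small example: boosting one-split trees (stumps) on a single feature under a squared-error loss, using the standard second-order expressions for the optimal leaf weight, $w^* = -G/(H+\lambda)$, and the split gain penalized by $\gamma$. This is a didactic sketch, not the XGBoost library and not the configuration used in the study; the function names, the learning rate `eta`, and the toy data are all assumptions.

```python
from typing import List, Optional, Tuple


def leaf_weight(G: float, H: float, lam: float) -> float:
    """Optimal leaf weight w* = -G / (H + lambda) from the second-order objective."""
    return -G / (H + lam)


def leaf_score(G: float, H: float, lam: float) -> float:
    """Term G^2 / (H + lambda) that one leaf contributes to the structure score."""
    return G * G / (H + lam)


def fit_stump(x: List[float], grad: List[float], hess: List[float],
              lam: float, gamma: float) -> Tuple[Optional[float], float, float]:
    """Choose the threshold with the largest regularized gain; fall back to a single leaf."""
    G, H = sum(grad), sum(hess)
    root_w = leaf_weight(G, H, lam)
    best = (0.0, None, root_w, root_w)  # (gain, threshold, left weight, right weight)
    for thr in sorted(set(x))[:-1]:     # skip the maximum so the right side is never empty
        GL = sum(g for xi, g in zip(x, grad) if xi <= thr)
        HL = sum(h for xi, h in zip(x, hess) if xi <= thr)
        GR, HR = G - GL, H - HL
        gain = 0.5 * (leaf_score(GL, HL, lam) + leaf_score(GR, HR, lam)
                      - leaf_score(G, H, lam)) - gamma
        if gain > best[0]:
            best = (gain, thr, leaf_weight(GL, HL, lam), leaf_weight(GR, HR, lam))
    return best[1], best[2], best[3]


def boost(x: List[float], y: List[float], rounds: int = 20, eta: float = 0.3,
          lam: float = 1.0, gamma: float = 0.0) -> Tuple[List[float], List[float]]:
    """Add one stump per round and record the training MSE after each round."""
    pred = [0.0] * len(y)
    history = []
    for _ in range(rounds):
        grad = [p - t for p, t in zip(pred, y)]  # first derivative of 0.5 * (y - pred)^2
        hess = [1.0] * len(y)                     # second derivative of the same loss
        thr, w_left, w_right = fit_stump(x, grad, hess, lam, gamma)
        pred = [p + eta * (w_left if thr is None or xi <= thr else w_right)
                for p, xi in zip(pred, x)]
        history.append(sum((t - p) ** 2 for p, t in zip(pred, y)) / len(y))
    return pred, history


# Made-up toy data, not values from the study.
x_toy = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
y_toy = [1.0, 1.2, 1.1, 1.9, 2.1, 2.0, 3.2, 3.0]
pred, history = boost(x_toy, y_toy)
print(f"training MSE after round 1: {history[0]:.4f}, after round 20: {history[-1]:.4f}")
```

Larger values of `lam` shrink every leaf weight toward zero and larger values of `gamma` reject weak splits, which is the mechanism by which the regularization term limits overfitting on a short series.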
3.4. Model Evaluation
The performance of the models is assessed using the following statistical metrics, which provide a comprehensive evaluation of their predictive accuracy and reliability:
Mean Squared Error (MSE): Measures the average squared difference between predicted and actual values.
Root Mean Squared Error (RMSE): Provides an interpretable measure of prediction errors in the same units as the target variable.
Mean Absolute Error (MAE): Captures the average magnitude of prediction errors.
Mean Absolute Percentage Error (MAPE): Expresses errors as a percentage of actual values for better interpretability.
R-Squared ($R^2$): Indicates the proportion of variance explained by the model.
Both models are evaluated using the following metrics:
$$MSE = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2$$
$$RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \lvert y_i - \hat{y}_i \rvert$$
$$MAPE = \frac{100\%}{n} \sum_{i=1}^{n} \left\lvert \frac{y_i - \hat{y}_i}{y_i} \right\rvert$$
$$R^2 = 1 - \frac{SS_{res}}{SS_{tot}} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$
where
$n$ is the total number of observations.
$y_i$ is the actual value of the target variable.
$\hat{y}_i$ is the predicted value.
$\bar{y}$ is the mean of the actual target values.
$SS_{res}$ is the residual sum of squares, representing the variance left unexplained by the model.
$SS_{tot}$ is the total sum of squares, representing the total variance in the data.
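The five metrics listed above can be computed directly from paired actual and predicted values. The following is a minimal sketch using the standard definitions; the function name `regression_metrics` and the sample numbers are illustrative, and the MAPE calculation assumes that no actual value is zero.

```python
import math
from typing import Dict, Sequence


def regression_metrics(actual: Sequence[float], predicted: Sequence[float]) -> Dict[str, float]:
    """Return MSE, RMSE, MAE, MAPE (in percent), and R-squared for paired values."""
    n = len(actual)
    if n == 0 or n != len(predicted):
        raise ValueError("actual and predicted must be non-empty and of equal length")
    errors = [a - p for a, p in zip(actual, predicted)]
    ss_res = sum(e * e for e in errors)                   # residual sum of squares
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # total sum of squares
    mse = ss_res / n
    return {
        "MSE": mse,
        "RMSE": math.sqrt(mse),
        "MAE": sum(abs(e) for e in errors) / n,
        "MAPE": 100.0 / n * sum(abs(e / a) for e, a in zip(errors, actual)),
        "R2": 1.0 - ss_res / ss_tot,
    }


# Made-up numbers for illustration only.
metrics = regression_metrics([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 39.0])
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Reporting RMSE and MAE in the units of the target (MTOE here) alongside the unit-free MAPE and $R^2$ makes it easier to compare models across differently scaled series.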
While both LSTM and XGBoost are traditionally considered data-intensive techniques, the sample size in this study is relatively limited (30 annual observations from 1993 to 2022). This raises legitimate concerns about potential overfitting and model generalizability. To address this issue, several strategies were implemented to optimize model performance despite data constraints.
First, regularization techniques were applied in both models to prevent overfitting. For XGBoost, L1 and L2 regularization parameters were used along with early stopping. For LSTM, dropout layers and recurrent dropout were employed to enhance generalization. Second, cross-validation techniques specifically adapted for time series data were implemented, using a time-based splitting approach rather than random sampling to preserve temporal dependencies.
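A time-based split keeps every training observation earlier than every test observation. The exact scheme used here is not specified in the text, so the sketch below shows one common variant, an expanding window, under assumed settings (five folds with two test years each); the function name and these settings are hypothetical.

```python
from typing import Iterator, List, Tuple


def expanding_window_splits(
    n_obs: int, n_splits: int, test_size: int = 1
) -> Iterator[Tuple[List[int], List[int]]]:
    """Yield (train_idx, test_idx) pairs in which training always precedes testing in time."""
    first_test = n_obs - n_splits * test_size
    if first_test < 1:
        raise ValueError("not enough observations for the requested splits")
    for k in range(n_splits):
        start = first_test + k * test_size
        # The training window grows with each fold; the test block slides forward.
        yield list(range(start)), list(range(start, start + test_size))


years = list(range(1993, 2023))  # the 30 annual observations
splits = list(expanding_window_splits(len(years), n_splits=5, test_size=2))
for train_idx, test_idx in splits:
    print(f"train {years[train_idx[0]]}-{years[train_idx[-1]]}, "
          f"test {[years[i] for i in test_idx]}")
```

Unlike random K-fold sampling, no fold is evaluated on years that precede its training data, which avoids leaking future information into the model.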
Additionally, this study conducted comparative analyses with classical statistical methods, including Multiple Linear Regression (MLR) and Autoregressive Integrated Moving Average (ARIMA), which are often considered more appropriate for smaller datasets. The evaluation revealed that despite the data limitations, XGBoost consistently outperformed these traditional approaches, achieving superior prediction accuracy across multiple metrics. This aligns with findings from Sun-Youn et al. [
33], who demonstrated that machine learning approaches could achieve better performance than econometric methods even with limited observations when the underlying relationships between variables are complex and non-linear.
The performance advantage of XGBoost in this case can be attributed to its ability to capture complex interactions between variables without requiring the large sample sizes typically needed for deep learning models like LSTM. This is consistent with research by Emmanuel Hidalgo et al. [
34], who found that tree-based models like Random Forest and XGBoost could effectively model energy consumption with limited data points due to their ensemble nature and feature-based learning approach.
4. Results
4.1. Descriptive Analysis
From
Table 3, the dataset provides a comprehensive overview of key variables influencing transport energy consumption in Thailand from 1993 to 2022. Gross Domestic Product (GDP) demonstrates a steady growth trajectory, increasing from 4.341 trillion Baht in 1993 to a peak of 10.919 trillion Baht in 2019, with an average of 7.570 trillion Baht and a standard deviation of 2.133 trillion Baht. The temporary contraction in GDP during 2020 coincided with a similar dip in transport energy consumption, emphasizing the correlation between economic growth and energy demand. By 2022, GDP rebounded to 10.676 trillion Baht, reflecting resilience in economic activity.
Population growth is another significant factor influencing transport energy demand. The population increased steadily from 57.776 million in 1993 to 71.697 million in 2022, with an average of 66.368 million and a standard deviation of 4.238 million. Although population growth alone does not directly determine energy consumption, it drives increased demand for both private and public transportation.
Vehicle registration trends show substantial growth across all vehicle categories. Small vehicle registrations (motorcycles and mopeds) increased from 7.313 million in 1993 to 22.301 million in 2022, averaging 16.508 million. Medium-sized vehicles (passenger cars, pickup trucks, and vans) saw the most dramatic rise, growing from 3.030 million to 18.970 million, with an average of 9.950 million and a standard deviation of 4.876 million. Large vehicle registrations (buses, trucks, and trailers) increased from 0.533 million to 1.359 million, with an average of 0.951 million.
Vehicle kilometers traveled (VKT) reflects usage patterns across different vehicle types. Motorcycle VKT increased significantly from 7.313 million in 1993 to 22.301 million in 2022, with an average of 16.508 million. Passenger vehicle VKT grew substantially from 3.138 million to 19.103 million, with an average of 10.084 million and a standard deviation of 4.889 million. Similarly, truck VKT, which represents freight transport demand, rose from 0.426 million to 1.226 million, averaging 0.817 million over the period.
This analysis highlights the interdependence between economic growth, vehicle expansion, and transport energy consumption. The consistent rise in vehicle registrations and kilometers traveled underscores the increasing reliance on private and commercial transport, which directly impacts energy demand. Furthermore, the fluctuations observed in 2020 emphasize the sector’s vulnerability to external shocks, such as economic downturns and global disruptions. Together, these insights provide valuable guidance for future transport energy policies and planning.
As shown in
Figure 2, transport energy consumption, measured in million tons of oil equivalent (MTOE), exhibits an increasing trend over the years, with an average of 23.888 MTOE. The relationship between transport energy consumption and GDP demonstrates strong coupling, with both metrics showing similar growth trajectories until 2019, followed by a pandemic-induced decline in 2020, and a subsequent recovery phase. This visualization effectively captures the interconnection between economic growth and energy demand in the transport sector over the three-decade period.
Figure 3 illustrates the evolution of vehicle fleet composition in Thailand and its relationship with transport energy consumption. The visualization reveals a gradual but significant shift in fleet structure, with medium-sized vehicles (passenger cars, pickup trucks, and vans) gaining an increasingly larger share of the total fleet, growing from approximately 28% in 1993 to nearly 45% in 2022. This transformation coincides with the increasing trajectory of transport energy consumption, suggesting a potential causal relationship between the growing proportion of medium vehicles and rising energy demand. The dominance of small vehicles (motorcycles and mopeds) has correspondingly decreased from 67% to 52% during this period, while large vehicles have maintained a relatively stable proportion of approximately 3–5% throughout.
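The fleet shares quoted for Figure 3 can be reproduced from the registration figures given in the descriptive statistics. The short sketch below recomputes them; the variable and function names are illustrative.

```python
# Registered vehicles in millions, as reported in the descriptive analysis.
registrations = {
    1993: {"small": 7.313, "medium": 3.030, "large": 0.533},
    2022: {"small": 22.301, "medium": 18.970, "large": 1.359},
}


def fleet_shares(counts):
    """Return each category's share of the total fleet in percent, rounded to 0.1."""
    total = sum(counts.values())
    return {category: round(100.0 * value / total, 1) for category, value in counts.items()}


shares = {year: fleet_shares(counts) for year, counts in registrations.items()}

for year, result in shares.items():
    print(year, result)
```

The computed values (about 28% medium and 67% small in 1993, about 45% medium and 52% small in 2022, and 3–5% large throughout) agree with the shares stated in the text.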
Figure 4 presents the changing patterns of vehicle kilometers traveled (VKT) by different vehicle types in relation to transport energy consumption. The visualization demonstrates that passenger vehicle VKT has grown substantially as a proportion of total kilometers traveled, particularly since 2003. This trend aligns closely with the acceleration in transport energy consumption during the same period, highlighting the significant energy implications of increased passenger vehicle usage. While motorcycle VKT has maintained a substantial share of total kilometers traveled, its proportional contribution has gradually declined. Truck VKT has remained relatively stable as a percentage of total kilometers traveled, yet its energy intensity per kilometer makes it a significant contributor to overall energy consumption despite its smaller share of total VKT.
The correlation matrix provides strong empirical support for the relationships identified in the feature importance analysis. As shown in the heatmap (
Figure 5), all predictor variables exhibit high positive correlations with transport energy consumption, with coefficients ranging from 0.90 to 0.98. This confirms the strong statistical relationships underpinning the model. Particularly notable is the strong correlation (0.98) between GDP and transport energy consumption, validating the economic-energy consumption relationship established in the literature. Similarly, the high correlations between vehicle registration data and energy consumption (ranging from 0.90 to 0.95) substantiate the finding that fleet composition significantly influences energy demand. The correlation analysis also reveals interesting inter-variable relationships. For instance, the perfect correlation (1.00) between registered small vehicles and VKT Motorcycle, as well as between medium vehicles and VKT Passenger, confirms the direct relationship between vehicle ownership and usage patterns. The strong correlations between medium and large vehicles (0.99) and their corresponding VKT measures suggest coordinated growth in these segments.
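For reference, the Pearson coefficient behind such a heatmap can be computed as in the sketch below. All three series are hypothetical and chosen only to illustrate one property: a coefficient of exactly 1.00 arises when one series is an exact positive linear function of the other, whereas two series that merely grow together yield a value below 1.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical registration series (millions); not the study's data.
registered = [7.3, 9.8, 12.4, 15.1, 18.0, 22.3]
# Hypothetical usage series built as a fixed multiple of registrations.
vkt_proportional = [1.2 * r for r in registered]
# Hypothetical usage series that grows, but not proportionally.
vkt_other = [1.0, 1.1, 1.9, 2.0, 3.4, 3.5]

r_proportional = pearson_r(registered, vkt_proportional)  # equals 1 up to rounding
r_other = pearson_r(registered, vkt_other)                # high, but below 1
```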
4.2. Model Fitting and Performance Comparison of LSTM and XGBoost
The study investigates transport energy consumption in Thailand, employing advanced predictive models to analyze trends and key influencing factors over the period 1993 to 2022. By leveraging the Long Short-Term Memory (LSTM) and XGBoost models, the research aims to accurately predict energy demand and identify critical contributors to consumption patterns. Through comparative analysis, the models reveal distinct capabilities in capturing historical trends and projecting future demands. Additionally, feature importance analysis highlights the dominant role of medium vehicles and freight transportation in shaping energy usage, alongside demographic and economic factors. These findings offer valuable insights for designing energy-efficient policies and sustainable transportation strategies.
To compare the performance of LSTM and XGBoost, Bayesian optimization was applied to the critical hyperparameters of both models to reduce overfitting and ensure robust predictions, as shown in
Table 4 and
Table 5.
This study did not apply stationarity tests or differencing, as LSTM models inherently handle non-stationary time series without explicit transformations [
35]. Unlike traditional models, LSTM captures temporal dependencies directly, making such preprocessing unnecessary [
36]. However, its ability to learn long-term dependencies may be affected by extreme trends or structural shifts. In contrast, XGBoost, a tree-based model, relies on direct feature interactions rather than sequential dependencies [
37], making stationarity adjustments irrelevant to its framework. To ensure a fair comparison, both models were evaluated on the same untransformed dataset, allowing an unbiased assessment of sequential versus feature-based learning approaches.
From
Table 6, the comparison of LSTM and XGBoost model performance reveals significant differences in prediction accuracy across various evaluation metrics. XGBoost outperforms LSTM in Mean Squared Error (MSE), with values of 0.0256 for training data and 0.2790 for test data, compared to 4.9473 and 4.5695, respectively, for LSTM, highlighting XGBoost’s superior error minimization. Similarly, Root Mean Squared Error (
RMSE) is lower for XGBoost (0.1599 Train, 0.5282 Test) than for LSTM (2.2243 Train, 2.1377 Test), indicating a more precise fit. In terms of Mean Absolute Error (
MAE), XGBoost exhibits a lower error magnitude (0.0815 Train, 0.3200 Test) compared to LSTM (1.9142 Train, 1.8057 Test), further demonstrating its accuracy. Additionally, Mean Absolute Percentage Error (
MAPE) results show that XGBoost achieves superior percentage error reduction (0.49% Train, 1.08% Test) over LSTM (9.36% Train, 6.06% Test). Lastly, R-squared (
R2) values confirm that XGBoost has significantly higher explanatory power (0.9976 Train, 0.9508 Test) than LSTM (0.3140 Train, 0.2005 Test), indicating better predictive reliability. These results confirm that XGBoost consistently outperforms LSTM across all key performance metrics, making it the more effective model for this forecasting task.
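Because RMSE is the square root of MSE, the reported figures can be cross-checked against each other. The sketch below does this for the values quoted from Table 6 and also derives the ratio of test RMSE between the two models; the variable names are illustrative.

```python
import math

# (MSE, RMSE) pairs as quoted in the text for Table 6.
reported = {
    ("XGBoost", "train"): (0.0256, 0.1599),
    ("XGBoost", "test"): (0.2790, 0.5282),
    ("LSTM", "train"): (4.9473, 2.2243),
    ("LSTM", "test"): (4.5695, 2.1377),
}

# Largest difference between sqrt(MSE) and the reported RMSE across all four cases.
max_gap = max(abs(math.sqrt(mse) - rmse) for mse, rmse in reported.values())

# How many times larger the LSTM test RMSE is than the XGBoost test RMSE.
test_rmse_ratio = reported[("LSTM", "test")][1] / reported[("XGBoost", "test")][1]

print(f"max |sqrt(MSE) - RMSE| = {max_gap:.4f}")
print(f"LSTM / XGBoost test RMSE ratio = {test_rmse_ratio:.2f}")
```

The reported pairs are consistent to within rounding, and the LSTM test RMSE is roughly four times that of XGBoost.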
Figure 6 presents the results of transport energy consumption predictions in Thailand using the Long Short-Term Memory (LSTM) model, covering the period 1993 to 2022. The graph compares actual and predicted values for both the training and testing datasets. In the training phase (1993–2012), the actual values are represented by the blue line and the predicted values by the green dashed line. The predictions follow the gradual increase in energy consumption over this period, although the training R2 of 0.3140 reported in Table 6 indicates that the fit to individual years is loose. In the testing phase (2013–2022), the model’s predictions, represented by the red dashed line, are compared with actual values, shown by the purple line. The LSTM model captures the general upward trend in energy consumption, but the test R2 of 0.2005 and MAPE of 6.06% show that it explains only a small share of the year-to-year variation. The vertical dashed line in the graph separates the training and testing datasets.
Figure 7 illustrates the results of transport energy consumption predictions in Thailand using the XGBoost model, covering the same period of 1993 to 2022. This graph also compares actual and predicted values for the training and testing datasets. In the training phase (1993–2012), the actual values, represented by the blue line, closely align with the predicted values, shown by the green dashed line. This alignment demonstrates XGBoost’s ability to accurately learn and replicate historical energy consumption trends. The model captures the increasing pattern during this period with high precision. In the testing phase (2013–2022), the predicted values, represented by the red dashed line, closely follow the actual values, shown by the purple line. The graph highlights XGBoost’s strong predictive performance in capturing the continued increase in energy consumption. The vertical dashed line separates the training and testing datasets, emphasizing the clear distinction between the two phases. The XGBoost model demonstrates excellent predictive accuracy across both training and testing periods, effectively capturing long-term growth patterns in transport energy consumption. Its ability to consistently align with actual data showcases its reliability as a tool for forecasting energy demand.
The comparison between the Long Short-Term Memory (LSTM) and XGBoost models reveals notable differences in their performance for predicting transport energy consumption in Thailand from 1993 to 2022. In the training phase (1993–2012), XGBoost’s predicted values almost perfectly overlap the actual values (R2 of 0.9976), whereas the LSTM predictions follow the overall trend but deviate noticeably from individual observations (R2 of 0.3140). During the testing phase (2013–2022), XGBoost continues to exhibit higher accuracy, closely following the actual values and handling both gradual and sharp changes in energy consumption. In contrast, LSTM shows larger deviations, particularly during periods of rapid fluctuation (test MAPE of 6.06% against 1.08% for XGBoost). LSTM is designed to capture long-term trends and temporal dependencies through its memory mechanism, but with only 30 annual observations this did not translate into accurate predictions. XGBoost, on the other hand, maintains precision by iteratively refining predictions and handling non-linear relationships. Overall, XGBoost outperforms LSTM in predictive accuracy in both the training and testing phases.
Figure 8 presents the Feature Importance Analysis for the prediction model, highlighting the relative contributions of various factors to transport energy consumption. The most influential feature is Registered-vehicle Medium, which accounts for 36.6% of the total importance score, underscoring the significant role of medium vehicles, such as passenger cars and pick-up trucks, in driving energy demand. Following this, VKT Truck contributes 20.5%, emphasizing the critical impact of freight transportation and heavy vehicle usage on energy consumption. Registered-vehicle Large and Registered-vehicle Small contribute 11.6% and 11.1%, respectively, indicating that while medium vehicles dominate, large vehicles (e.g., buses, trucks) and small vehicles (e.g., motorcycles, mopeds) also play notable roles. Demographic and economic factors such as Population and GDP contribute 9.6% and 5.6%, respectively, reflecting their influence on shaping transport demand and energy usage patterns. Finally, VKT Passenger and VKT Motorcycle have smaller contributions, accounting for 3.9% and 1.1%, respectively, indicating that while personal and motorcycle travel influence energy consumption, their overall impact is relatively limited compared to other factors. These findings provide valuable insights for prioritizing energy-efficient policies and sustainable transportation planning.
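The importance scores reported for Figure 8 can be aggregated to show how much weight falls on registration, usage, and macro-level variables. The grouping into three categories in the sketch below is an assumption made for illustration and is not part of the original analysis.

```python
# Feature importance scores in percent, as reported for Figure 8.
importance = {
    "Registered-vehicle Medium": 36.6,
    "VKT Truck": 20.5,
    "Registered-vehicle Large": 11.6,
    "Registered-vehicle Small": 11.1,
    "Population": 9.6,
    "GDP": 5.6,
    "VKT Passenger": 3.9,
    "VKT Motorcycle": 1.1,
}


def group_of(feature):
    """Assign a feature to an illustrative group based on its name prefix."""
    if feature.startswith("Registered-vehicle"):
        return "registration"
    if feature.startswith("VKT"):
        return "usage"
    return "macro"


group_totals = {}
for feature, score in importance.items():
    group = group_of(feature)
    group_totals[group] = round(group_totals.get(group, 0.0) + score, 1)

total = round(sum(importance.values()), 1)
print(group_totals, "total:", total)
```

Under this grouping, registration variables carry 59.3% of the total importance, usage (VKT) variables 25.5%, and population and GDP together 15.2%, and the eight scores sum to 100%.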
6. Conclusions
This research addressed the critical challenges facing Thailand’s transport sector in energy management and sustainable development. As the second-largest energy-consuming sector in Thailand, accounting for approximately 30% of total final energy consumption with a consistent upward trend over three decades, the transport sector presents significant challenges for energy security, environmental sustainability, and greenhouse gas emission reduction commitments. The complex nature of Thailand’s transport energy consumption, characterized by its unique vehicle fleet composition, geographical disparities in transportation needs, and the growing impact of e-commerce and logistics services, necessitated a sophisticated forecasting approach.
The literature review identified several key research gaps. While previous studies have incorporated vehicle-related variables, none had comprehensively analyzed the combined impact of differentiated vehicle categories alongside their corresponding vehicle kilometers traveled data. There was also limited exploration of the long-term relationship between macroeconomic indicators and transport energy consumption in Thailand, and insufficient understanding of how the relationship between vehicle registration numbers and actual usage influences energy consumption predictions. To address these gaps, this study established four main objectives:
to develop and compare the performance of LSTM and XGBoost models in predicting Thailand’s transport energy consumption,
to analyze the differential impact of various vehicle categories and their usage patterns on energy consumption,
to identify and quantify the relative importance of different factors affecting transport energy consumption, and
to provide evidence-based recommendations for transport energy policy planning.
The study leveraged a comprehensive dataset spanning from 1993 to 2022, which included vehicle registration data categorized by size, vehicle kilometers traveled for different vehicle types, macroeconomic indicators, and historical transport energy consumption data. This extensive temporal coverage provided a unique opportunity to analyze long-term patterns that most existing studies cannot address. This study implemented and compared two advanced machine learning approaches: Multivariate Long Short-Term Memory (LSTM) neural networks and the XGBoost algorithm. The models were evaluated using multiple metrics, including Mean Squared Error, Root Mean Squared Error, Mean Absolute Error, Mean Absolute Percentage Error, and R-squared values. The findings demonstrated that XGBoost consistently outperformed LSTM across all evaluation metrics, achieving an R-squared value of 0.9508 for test data compared to LSTM’s 0.2005. Feature importance analysis revealed that medium-sized vehicles had the most substantial influence (36.6%) on transport energy consumption, followed by truck VKT (20.5%), suggesting that policies targeting fuel efficiency standards and freight transport optimization could yield substantial energy savings.
This research contributes to both academic understanding and practical policy implementation in several ways.
First, it establishes the superior performance of XGBoost for transport energy forecasting in Thailand’s context, challenging conventional assumptions about the superiority of recurrent neural networks for time-series forecasting.
Second, it quantifies the differential impact of various vehicle categories and their usage patterns on energy consumption, providing a nuanced understanding of how fleet composition shapes transport energy demand.
Third, it demonstrates the value of integrating both vehicle registration and vehicle usage data in predictive models, offering enhanced insights compared to approaches based solely on one data category.
Moreover, while this study focuses on Thailand, its methodology and findings have broader implications for other developing and middle-income economies with similar transportation characteristics. The proposed machine learning framework can be adapted to national contexts where energy demand is shaped by comparable macroeconomic conditions, vehicle fleet compositions, and infrastructure developments. Further validation across multiple regions is necessary, incorporating localized adjustments for variations in transport systems, fuel composition, and regulatory frameworks.
The dataset used, including vehicle registration, vehicle kilometers traveled (VKT), macroeconomic indicators, and historical energy consumption, aligns with data structures maintained by transport and energy agencies worldwide. Given this consistency, the XGBoost and LSTM models could be applied in other countries for energy forecasting, subject to local validation. Feature importance analysis highlights medium-sized vehicles and freight transport as dominant contributors to energy demand, a pattern observed in various developing economies. These findings suggest that machine learning-based forecasting approaches may be effective across diverse regional settings.
Beyond methodological contributions, this study presents a data-driven framework for transport energy policy planning. The predictive capability of these models supports policy evaluation in areas such as fuel efficiency regulations, vehicle electrification, and logistics optimization. Quantifying the impact of different vehicle categories on energy consumption allows policymakers to prioritize interventions such as fuel economy standards, freight transport efficiency improvements, and modal shifts from road to rail.
Given the global relevance of transport energy challenges, these models can inform international strategies for improving energy efficiency. Countries experiencing rapid urbanization and motorization could leverage these techniques to forecast fuel demand, optimize taxation policies, and design incentives for low-emission vehicles. Incorporating machine learning forecasting into multi-modal transport planning can also support sustainable urban mobility strategies.
Despite strong predictive performance, these models require adaptation to different national contexts due to variations in infrastructure, fuel types, and policy frameworks. Future research should validate their applicability in multiple regions, integrating localized variables and emerging trends such as autonomous mobility and renewable energy transitions. Refining these methodologies will enhance the role of machine learning in transport energy policy planning, supporting more effective and sustainable mobility solutions globally.
Based on the model results, this study offers the following evidence-based recommendations for transport energy policy planning in Thailand:
The XGBoost model’s feature importance analysis provides clear direction for prioritizing transport energy interventions. Given that medium-sized vehicles (passenger cars and pick-up trucks) account for 36.6% of the influence on transport energy consumption, policies targeting this vehicle segment should receive the highest priority. This study recommends implementing progressively stringent fuel economy standards for new medium vehicles, coupled with financial incentives for low-emission alternatives. Tax structures should be redesigned to favor more efficient vehicles within this category, potentially through engine displacement or emissions-based taxation.
The significant contribution of truck VKT (20.5%) indicates that freight transport efficiency represents a high-leverage intervention point. This study recommends developing a national freight efficiency program that includes driver training, logistics optimization, and aerodynamic retrofitting incentives. Additionally, strategic investment in rail freight infrastructure could yield substantial energy savings by shifting appropriate cargo from road to more efficient transport modes. Time-of-day delivery regulations in urban centers could further reduce congestion-related energy waste in freight operations.
For the combined 22.7% importance of small and large vehicle registrations, this study recommends differentiated policy approaches. For large vehicles, implementing scrappage programs targeting the oldest, most inefficient models would accelerate fleet modernization. For small vehicles, particularly motorcycles, policies facilitating the transition to electric two-wheelers through purchase incentives and charging infrastructure would address their collective impact despite their individual efficiency.
The relatively modest direct influence of GDP (5.6%) suggests opportunities for decoupling economic growth from transport energy consumption. This study recommends incorporating transport energy considerations into economic development planning, particularly through transit-oriented development policies that reduce travel demand while maintaining economic growth. Similarly, the 9.6% contribution from population factors points to the importance of integrated land-use and transportation planning to manage mobility needs as Thailand’s demographics continue to evolve.
The performance advantage of XGBoost over LSTM in the study suggests that transportation authorities would benefit from implementing similar machine learning approaches for ongoing energy forecasting. This study recommends establishing a data integration framework that regularly updates the predictive models with current vehicle registration, usage patterns, and economic indicators to maintain forecast accuracy and adjust policies accordingly.
Finally, the complex interactions between vehicle categories, usage patterns, and macroeconomic factors identified in the model highlight the need for a coordinated, cross-sectoral approach to transport energy management. This study recommends establishing an inter-ministerial transport energy task force to ensure policy coherence across economic development, environmental, and transport sectors, with regular reassessment based on updated predictive modeling to track progress toward energy efficiency goals.
This study is limited by focusing exclusively on historical patterns without addressing potential technological disruptions in transportation.
While demonstrating superior predictive performance, the XGBoost model primarily extrapolates from existing relationships between vehicles, usage, and energy consumption. The approach may fail to adequately capture the impact of emerging technologies like electric vehicles, autonomous systems, and mobility-as-a-service platforms that could fundamentally transform transport-energy dynamics.
Future research should integrate technological transition scenarios into forecasting models through hybrid approaches that combine machine learning with systems dynamics modeling. Incorporating detailed data on fuel types, powertrain technologies, and charging infrastructure would enhance the model’s sensitivity to ongoing energy transitions in transportation.