Deep Learning vs. Gradient Boosting: Optimizing Transport Energy Forecasts in Thailand Through LSTM and XGBoost

Champahom, Thanapong; Banyong, Chinnakrit; Janhuaton, Thananya; Se, Chamroeun; Watcharamaisakul, Fareeda; Ratanavaraha, Vatanavongs; Jomnonkwao, Sajjakaj

doi:10.3390/en18071685

Open Access Article


by Thanapong Champahom ¹, Chinnakrit Banyong ², Thananya Janhuaton ², Chamroeun Se ³, Fareeda Watcharamaisakul ³, Vatanavongs Ratanavaraha ² and Sajjakaj Jomnonkwao ²,*

¹ Department of Management, Faculty of Business Administration, Rajamangala University of Technology Isan, Nakhon Ratchasima 30000, Thailand
² School of Transportation Engineering, Institute of Engineering, Suranaree University of Technology, Nakhon Ratchasima 30000, Thailand
³ Institute of Research and Development, Suranaree University of Technology, Nakhon Ratchasima 30000, Thailand
* Author to whom correspondence should be addressed.

Energies 2025, 18(7), 1685; https://doi.org/10.3390/en18071685

Submission received: 7 March 2025 / Revised: 20 March 2025 / Accepted: 25 March 2025 / Published: 27 March 2025

(This article belongs to the Special Issue Optimization of Intelligent Transport Systems Planning Energy Efficiency and Environmental Responsibility)


Abstract

Thailand’s transport sector faces critical challenges in energy management amid rapid economic growth, with transport accounting for approximately 30% of total energy consumption. This study addresses research gaps in transport energy forecasting by comparing Long Short-Term Memory (LSTM) neural networks and XGBoost models for predicting transport energy consumption in Thailand. Utilizing a comprehensive dataset spanning 1993–2022 that includes vehicle registration data by size category, vehicle kilometers traveled, and macroeconomic indicators, this research evaluates both modeling approaches through multiple performance metrics. The results demonstrate that XGBoost consistently outperforms LSTM, achieving an R-squared value of 0.9508 for test data compared to LSTM’s 0.2005. Feature importance analysis reveals that medium vehicles contribute 36.6% to energy consumption predictions, followed by truck VKT (20.5%), with economic and demographic factors accounting for a combined 15.2%. This research contributes to both methodological understanding and practical application by establishing XGBoost’s superior performance for transport energy forecasting, quantifying the differential impact of various vehicle categories on energy consumption, and demonstrating the value of integrating vehicle registration and usage data in predictive models. The findings provide evidence-based guidance for prioritizing policy interventions in Thailand’s transport sector to enhance energy efficiency and sustainability.

Keywords:

transportation energy consumption; XGBoost; LSTM; vehicle fleet composition; energy forecasting

1. Introduction

Thailand’s transport sector faces critical challenges in energy management and sustainable development amid rapid economic growth and urbanization. As the country’s second-largest energy-consuming sector, transport accounts for approximately 30% of total final energy consumption, with a consistent upward trend over the past three decades. This escalating energy demand poses significant challenges to energy security, environmental sustainability, and the nation’s commitment to reducing greenhouse gas emissions under international agreements [1,2]. Thailand’s transport energy consumption patterns are shaped by several distinctive factors. First, the country has a distinctive vehicle fleet composition, with an unusually high proportion of motorcycles alongside a growing number of private cars and commercial vehicles [3,4]. Second, there are significant disparities in transportation needs and energy consumption patterns between urban and rural areas, reflecting Thailand’s diverse geographical and economic landscape [5,6]. Third, the rapid expansion of e-commerce and logistics services has led to increased energy demand in the freight transport sector [7,8]. In this context, the ability to accurately predict transport energy consumption becomes crucial for effective policy planning and resource allocation [9,10].

Recent advancements in artificial intelligence and machine learning, particularly Long Short-Term Memory (LSTM) neural networks and XGBoost, offer promising alternatives for more accurate transport energy consumption predictions. These methods have demonstrated superior capabilities in handling complex, time-dependent data and capturing non-linear relationships between variables. LSTM networks excel in modeling sequential data with temporal dependencies, making them particularly suitable for forecasting energy consumption patterns over time [11,12]. XGBoost, as an ensemble learning method, has gained significant traction due to its computational efficiency and ability to handle diverse datasets with minimal preprocessing [13,14]. Both approaches can effectively incorporate multiple variables and identify complex interactions that traditional statistical methods often struggle to capture. Recent studies by Çınarer et al. [15] and Hoxha et al. [16] have demonstrated that these advanced methods significantly outperform conventional approaches in predictive accuracy for transportation-related energy consumption and emissions. However, their application in the specific context of Thailand’s transport sector remains limited, particularly in incorporating the unique characteristics of the country’s transport system and vehicle usage patterns [17,18].

A review of the relevant literature reveals several key trends in transport energy consumption prediction. Pongthanaisawan et al. [1] highlighted the significant growth in fuel consumption and greenhouse gas emissions in Thailand’s transport sector, emphasizing the need for accurate predictive models. Ağbulut [2] and Ji et al. [13] demonstrated the efficacy of machine learning approaches in forecasting transportation energy demand and emissions, with their models achieving prediction accuracies ranging from 90.8% to 95.2%. The application of LSTM networks has been particularly noteworthy: Duan et al. [12] and Ghanbari and Borna [19] demonstrated their effectiveness in capturing temporal dependencies in energy consumption time series. In parallel, XGBoost has emerged as a powerful alternative, with Çınarer et al. [15] showcasing its superior performance in predicting carbon emissions and energy needs in transportation contexts. These advanced modeling techniques have significantly outperformed traditional statistical approaches, as demonstrated by Rahman et al. [20] and Shams Amiri et al. [21] in their comparative analyses.

When this body of research is considered alongside the available dataset (1993–2022), several specific research gaps become apparent. First, while previous studies have incorporated vehicle-related variables, none have comprehensively analyzed the combined impact of differentiated vehicle categories (small, medium, and large) alongside their corresponding vehicle kilometers traveled (VKT) data for motorcycles, passenger vehicles, and trucks. This combination of variables presents an opportunity for more nuanced prediction modeling that has not been explored in existing literature. Second, although studies such as Mohsin et al. [22] and Limanond et al. [23] have used GDP and population data, they have not specifically examined the long-term (30-year) relationship between these macroeconomic indicators and transport energy consumption in Thailand. The extensive temporal coverage of the dataset (1993–2022) offers an opportunity to analyze long-term patterns that most existing studies, which typically use 5–10-year periods, cannot address. Furthermore, while existing research has employed LSTM and XGBoost models, there has been limited exploration of how these methods perform with segregated vehicle data (both registration numbers and VKT). The granularity of the vehicle-related variables provides an opportunity to develop more sophisticated prediction models that can account for the distinct impacts of different vehicle types and their usage patterns on energy consumption. Additionally, there is a notable gap in understanding how the relationship between vehicle registration numbers and actual vehicle usage (VKT) influences energy consumption predictions. This aspect is particularly relevant for Thailand’s transport policy planning but remains unexplored in the current literature.

This study utilizes a comprehensive dataset spanning 1993 to 2022, which provides a unique opportunity for detailed analysis and modeling. The dataset includes the following: (1) vehicle registration data categorized by size (small, medium, and large vehicles), (2) vehicle kilometers traveled (VKT) for different vehicle types (motorcycles, passenger vehicles, and trucks), (3) macroeconomic indicators (GDP and population), and (4) historical transport energy consumption data. The length and granularity of this dataset enable a more nuanced understanding of the relationships between vehicle ownership, usage patterns, economic factors, and energy consumption. This rich data environment allows for the development of more sophisticated and accurate prediction models that can capture both long-term trends and short-term variations in transport energy consumption [8]. By leveraging this extensive historical data alongside advanced machine learning techniques, this study aims to provide insights that bridge the gap between technical modeling capabilities and practical policy applications in Thailand’s transport sector, addressing a critical need highlighted by Supasa et al. [7] and Ji et al. [13] in their research on energy consumption patterns and emissions in the transportation sector.

The main objectives of this study are as follows:

To develop and compare the performance of LSTM and XGBoost models in predicting Thailand’s transport energy consumption, evaluating their respective strengths and limitations in handling different aspects of the prediction task.
To analyze the differential impact of various vehicle categories and their usage patterns on energy consumption, providing insights into how fleet composition affects overall energy demand.
To identify and quantify the relative importance of different factors affecting transport energy consumption, including vehicle-related variables, economic indicators, and demographic factors.
To provide evidence-based recommendations for transport energy policy planning, particularly in areas of vehicle fleet management and energy efficiency improvements.

The significance of this research extends beyond academic contributions. For policymakers, accurate predictions of transport energy consumption are essential for infrastructure planning, setting realistic energy efficiency targets, developing effective energy conservation measures, and allocating resources for sustainable transport initiatives [24,25]. The methodological advancements in this study contribute to a growing body of work on applying machine learning to energy forecasting, addressing limitations in traditional approaches identified by Antonopoulos et al. [26] and Alabi et al. [27]. Furthermore, this research comes at a critical time when Thailand is striving to balance economic growth with sustainable development goals and energy security concerns. The country’s commitment to reducing greenhouse gas emissions and promoting sustainable transport solutions makes the accurate prediction of transport energy consumption particularly relevant for future policy planning [1]. By providing a comparative analysis of LSTM and XGBoost models in the specific context of Thailand’s transport sector, this study offers valuable insights for both academic understanding and practical policy implementation, addressing a significant gap in regional research on advanced predictive modeling for transportation energy demand, as highlighted by Rahman et al. [20] and Emami Javanmard et al. [28].

This study’s planning perspective is particularly significant given Thailand’s current transportation policy landscape. Accurate predictive models for transport energy consumption serve as critical decision-support tools for policymakers tasked with developing sustainable transport systems [23,24]. The ability to forecast energy demand with greater precision enables more efficient resource allocation, supports targeted infrastructure investments, and facilitates the development of evidence-based regulatory frameworks [25]. Furthermore, the identification of key drivers of energy consumption through feature importance analysis provides valuable insights for prioritizing policy interventions and designing effective energy conservation measures [26]. As Thailand continues to balance economic development with environmental sustainability goals, robust predictive models that capture the complex relationships between vehicle fleet composition, usage patterns, and energy consumption become increasingly essential for strategic transportation planning [27,28]. By identifying which factors most significantly influence transport energy consumption, this research provides practical guidance for policymakers to develop more targeted and effective interventions, potentially yielding substantial energy savings and emissions reductions while maintaining economic growth.

2. Literature Review

Recent research in transport energy consumption prediction has witnessed significant advancements through artificial intelligence applications. A comprehensive analysis of the literature reveals several distinct research streams that have emerged in the past five years (Table 1 and Table A1).

The application of deep learning approaches, particularly LSTM networks, has gained considerable attention for transport energy forecasting. Duan et al. [12] demonstrated LSTM’s effectiveness in capturing temporal dependencies in time-series data, achieving prediction accuracies of up to 94% for transportation-related energy forecasting. Similarly, Ghanbari and Borna [19] enhanced LSTM with attention mechanisms for real-time predictions of energy consumption in public transport systems, reporting accuracy improvements of 8–15% over traditional statistical methods. These advancements were further refined by Karim et al. [11], who introduced multivariate LSTM-FCNs that demonstrated superior performance in handling multiple input variables for energy demand forecasting.

In parallel, gradient boosting methods have shown promising results for transport energy prediction. Ji et al. [13] applied XGBoost to predict CO₂ emissions and energy consumption in China’s transport sector, leveraging economic indicators and vehicle usage data to achieve remarkable accuracy with R² values exceeding 0.95. Çınarer et al. [15] conducted a comparative analysis of machine learning approaches for predicting transportation-related emissions in Turkey, finding that XGBoost consistently outperformed support vector machines and neural networks across various input scenarios. Similarly, Zhang et al. [14] demonstrated XGBoost’s effectiveness in handling urban transport data from expanding megacities, achieving lower RMSE values compared to traditional statistical methods.

The integration of socioeconomic factors with vehicle-specific variables has emerged as another significant research direction. Rahman et al. [20] developed causality-based machine learning models to establish relationships between GDP, urbanization, and transport energy demand in Saudi Arabia, finding that urbanization exerted a stronger influence than economic growth on energy consumption patterns. This approach was extended by Hoxha et al. [16], who employed stacking ensemble methods to improve prediction accuracy by incorporating diverse socioeconomic indicators alongside traditional transport metrics.

Regional studies have provided valuable insights into context-specific factors affecting transport energy consumption. In Thailand, Pongthanaisawan et al. [1] developed econometric models to forecast transport energy demand and emissions until 2030, identifying significant relationships between economic growth and transportation energy intensity. Champeecharoensuk et al. [17] focused on aviation emissions in Thailand, demonstrating how economic growth and aviation activity served as primary drivers of increasing energy demand in the sector. These studies highlight the importance of region-specific analyses but are limited by their reliance on conventional statistical approaches rather than advanced machine learning techniques.

Methodological innovations have further expanded the predictive capabilities in this domain. Wang et al. [29] introduced a LASSO regression framework for predicting fuel consumption in maritime transportation, addressing multicollinearity challenges in feature variables. Antonopoulos et al. [26] conducted a systematic review of AI applications in energy demand-side response, identifying significant opportunities for machine learning in transport energy optimization. Alabi et al. [27] explored the integration of optimization techniques with machine learning for energy systems planning, highlighting the potential for hybrid approaches in transport energy modeling.

The temporal scope of previous studies varies considerably, with most focusing on short- to medium-term forecasting horizons of 5–10 years. Ağbulut [2] provided longer-term projections of transportation energy demand in Turkey until 2050, but noted increasing uncertainty with extended forecast horizons. Chai et al. [24] examined historical trends in road transportation energy consumption in China, establishing ‘S’ type patterns in the relationship between economic development and energy intensity that could inform long-term projections.

Despite these advances, several notable research gaps persist. First, while previous studies have incorporated vehicle-related variables, few have comprehensively analyzed the combined impact of different vehicle categories alongside their corresponding usage patterns. Second, the long-term relationship between macroeconomic indicators and transport energy consumption remains underexplored, particularly in developing economies like Thailand. Third, there are limited comparative analyses of deep learning versus gradient-boosting approaches for transport energy forecasting using consistent evaluation metrics and datasets. Finally, feature importance analysis to identify key drivers of transport energy consumption has received insufficient attention despite its potential to inform targeted policy interventions.

This study addresses these gaps by leveraging a comprehensive dataset spanning 30 years (1993–2022), incorporating granular vehicle data categorized by size and type, comparing two advanced machine learning approaches using consistent evaluation metrics, and conducting detailed feature importance analysis to identify key drivers of transport energy consumption in Thailand. This approach enables a more nuanced understanding of the complex relationships between vehicle ownership, usage patterns, economic factors, and energy consumption than has been previously achieved in the literature.

3. Methodologies

The selection of appropriate modeling approaches is critical for achieving reliable and accurate predictions in transport energy forecasting. From the many artificial intelligence and machine learning techniques available in the literature, this study deliberately chose Long Short-Term Memory (LSTM) neural networks and Extreme Gradient Boosting (XGBoost) based on several key considerations.

LSTM was selected primarily for its proven capability in handling sequential time-series data with temporal dependencies, which is essential when analyzing transportation patterns that evolve over extended periods. Unlike traditional recurrent neural networks, LSTM’s unique architecture with memory cells enables it to capture long-term relationships in time-series data while avoiding the vanishing gradient problem. This architecture is particularly relevant for the dataset spanning three decades (1993–2022), where historical patterns may significantly influence future energy consumption trends. Previous applications of LSTM in energy forecasting domains have demonstrated its effectiveness, with studies by Karim et al. [11] and Duan et al. [12] showing superior performance compared to conventional statistical methods for similar prediction tasks.

Conversely, XGBoost was selected for its exceptional performance in handling structured tabular data with complex feature relationships, which characterizes the multivariate dataset with vehicle categorizations, economic indicators, and usage patterns. As an ensemble learning method based on decision trees, XGBoost excels at capturing non-linear relationships and interactions between variables without requiring extensive data preprocessing. Its built-in regularization features help prevent overfitting, which is particularly valuable when working with relatively limited observations (30 years in this case) but multiple predictor variables. Additionally, XGBoost’s computational efficiency and inherent feature importance capabilities provide practical advantages for both model training and interpretability. Recent studies by Zhang et al. [14] and Çınarer et al. [15] have demonstrated XGBoost’s effectiveness specifically for transportation-related energy and emissions prediction tasks.

3.1. Data Collection

This study employs a comprehensive dataset spanning 1993 to 2022 that encompasses various aspects of Thailand’s transport sector, as summarized in Table 2.

3.1.1. Transport Energy Consumption

The transport energy consumption in Thailand demonstrates a significant upward trajectory over the 30-year period, increasing from 14.581 MTOE in 1993 to 30.927 MTOE in 2022, representing a 112% growth. The data reveals three distinct phases: steady growth (1993–2008), accelerated increase (2009–2019), and pandemic disruption (2020–2022). The most substantial growth occurred between 2009 and 2019, with an average annual increase of 3.4%. However, the COVID-19 pandemic caused a notable disruption, resulting in an 11.6% decline from 33.607 MTOE in 2019 to 29.699 MTOE in 2020. The sector has shown signs of recovery since 2021, though not yet reaching pre-pandemic levels. This pattern reflects Thailand’s economic development, urbanization, and increasing motorization rate. The mean consumption over the period was 24.374 MTOE, with a standard deviation of 5.484 MTOE, indicating considerable variability in consumption patterns.
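The growth figures quoted above follow from simple percentage-change arithmetic on the reported endpoint values. The short Python sketch below reproduces the 112% total growth and the 11.6% pandemic decline from the MTOE values in the text; the compound annual growth rate (CAGR) it also computes is an illustrative summary added here, not a figure reported in the study.

```python
def pct_change(start: float, end: float) -> float:
    """Percentage change from start to end."""
    return (end - start) / start * 100.0


def cagr(start: float, end: float, years: int) -> float:
    """Compound annual growth rate over the given number of yearly intervals."""
    return (end / start) ** (1.0 / years) - 1.0


# Transport energy consumption (MTOE) as reported in the text
energy = {1993: 14.581, 2019: 33.607, 2020: 29.699, 2022: 30.927}

total_growth = pct_change(energy[1993], energy[2022])  # 1993 to 2022
covid_drop = pct_change(energy[2019], energy[2020])    # 2019 to 2020
annual_rate = cagr(energy[1993], energy[2022], 2022 - 1993)  # illustrative only

print(f"Total growth 1993-2022: {total_growth:.1f}%")
print(f"Change 2019-2020: {covid_drop:.1f}%")
print(f"Illustrative CAGR 1993-2022: {annual_rate:.2%}")
```

The same two helpers apply unchanged to the GDP, population, registration, and VKT endpoints reported in the following subsections.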

3.1.2. GDP

Thailand’s Gross Domestic Product exhibits a strong positive trend, growing from 4.341 trillion Baht in 1993 to 10.676 trillion Baht in 2022, representing a 146% increase. The growth pattern shows resilience despite several economic challenges, including the 1997 Asian Financial Crisis and the 2008 Global Financial Crisis. The average GDP over the period was 7.821 trillion Baht, with a standard deviation of 2.124 trillion Baht. The data reveal four distinct growth phases: rapid growth (1993–1997), recovery and stabilization (1998–2003), sustained growth (2004–2019), and pandemic impact (2020–2022). The strongest period of growth was observed between 2004 and 2019, with an average annual growth rate of 4.2%. The COVID-19 pandemic caused a significant contraction in 2020, with GDP falling by 6.1% from 10.919 trillion Baht in 2019 to 10.259 trillion Baht in 2020. However, the economy showed remarkable recovery strength, rebounding to 10.676 trillion Baht by 2022.

3.1.3. Population

Thailand’s population demonstrates a steady but slowing growth pattern, increasing from 57.776 million in 1993 to 71.697 million in 2022, representing a 24.1% growth over the 30-year period. The data reveals a gradual demographic transition, with the average annual growth rate declining from 1.4% in the early 1990s to 0.3% in recent years. The mean population over the period was 66.839 million, with a relatively small standard deviation of 4.227 million, indicating stable demographic changes. The population growth pattern can be divided into three phases: moderate growth (1993–2000), slowing growth (2001–2010), and near-stabilization (2011–2022). This trend reflects Thailand’s successful family planning programs and the country’s transition to an aging society. The data also show increasing urbanization, with a higher concentration of population in urban areas over time. The slowing population growth has significant implications for transport energy consumption patterns, particularly in terms of changing mobility needs and transportation preferences.

3.1.4. Vehicle Registration Data (Small, Medium, Large)

The vehicle registration data reveal dramatic changes in Thailand’s motorization patterns. Small vehicles, primarily motorcycles and mopeds, increased from 7.313 million in 1993 to 22.301 million in 2022, representing a 205% growth. Medium vehicles showed the most substantial increase, from 3.030 million to 18.970 million (526% growth), reflecting rising middle-class affluence and changing consumer preferences. Large vehicles experienced more modest growth, from 0.533 million to 1.359 million (155% growth). The data show distinct growth phases: rapid motorization (1993–2003), stabilization (2004–2010), and renewed growth (2011–2022). The average number of registered vehicles was 16.956 million for small vehicles (SD = 4.439 million), 9.843 million for medium vehicles (SD = 4.851 million), and 0.972 million for large vehicles (SD = 0.254 million). The growth patterns indicate a significant shift in Thailand’s vehicle fleet composition, with medium-sized vehicles gaining an increasingly larger share of the total fleet.
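The statement that medium vehicles gained a larger share of the fleet can be checked directly from the registration endpoints given above. This sketch treats the small, medium, and large categories as the whole registered fleet, which is an assumption made only for this illustration.

```python
# Registered vehicles (millions) at the two endpoints reported in the text
registrations = {
    1993: {"small": 7.313, "medium": 3.030, "large": 0.533},
    2022: {"small": 22.301, "medium": 18.970, "large": 1.359},
}


def fleet_share(counts: dict, category: str) -> float:
    """Share of one category in the total of the listed categories."""
    return counts[category] / sum(counts.values())


for year, counts in registrations.items():
    print(year, f"{fleet_share(counts, 'medium'):.1%}")
# prints:
# 1993 27.9%
# 2022 44.5%
```

Under this assumption, the medium-vehicle share rises by roughly 17 percentage points over the study period.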

3.1.5. Vehicle Kilometers Traveled

The Vehicle Kilometers Traveled data provide crucial insights into actual vehicle usage patterns. Motorcycle VKT closely mirrors small vehicle registration trends, increasing from 7.313 million vehicle kilometers in 1993 to 22.301 million in 2022. Passenger vehicle VKT showed the most dramatic increase, from 3.138 million to 19.103 million vehicle kilometers, representing a 509% growth. Truck VKT demonstrated steady but slower growth, from 0.426 million to 1.226 million vehicle kilometers (188% growth). The mean VKT values were 16.956 million for motorcycles (SD = 4.439), 10.027 million for passenger vehicles (SD = 4.915), and 0.827 million for trucks (SD = 0.232). The data reveal changing mobility patterns, with significant increases in personal vehicle usage and moderate growth in freight transport. The COVID-19 pandemic caused temporary reductions in VKT across all categories in 2020, but recovery trends became evident by 2022, particularly in passenger vehicle usage.

3.2. Multivariate Long Short-Term Memory (LSTM) Neural Networks

Multivariate Long Short-Term Memory (LSTM) Neural Networks represent an advanced adaptation of the traditional LSTM architecture, specifically designed to handle multiple input variables in time series forecasting. First introduced as an extension of univariate LSTM models, multivariate LSTM has gained significant attention in complex forecasting applications where multiple interrelated factors influence the target variable [12,19]. This architecture’s ability to process multiple time-dependent variables simultaneously while maintaining temporal relationships has made it particularly valuable in energy consumption prediction and complex system modeling.

The fundamental structure of multivariate LSTM builds on the standard LSTM cell architecture but incorporates multiple input features at each time step, as shown in Figure 1. The network processes these inputs through its gates, namely the forget gate (f_t), input gate (i_t), and output gate (o_t), which now operate on vectors of multiple variables rather than single values. The mathematical formulation extends the basic LSTM equations to accommodate multiple input variables: x_t becomes a vector [x_{1t}, x_{2t}, …, x_{nt}], where n is the number of input variables. The cell state update keeps its standard form, but because every gate and the candidate state are computed from the full input vector, it now reflects the interactions between different variables:

c_{t} = f_{t} \otimes c_{t-1} + i_{t} \otimes \tilde{c}_{t}

where \otimes denotes element-wise multiplication and the candidate state \tilde{c}_{t} now incorporates information from multiple input features [11].
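To make the gate equations concrete, the following pure-Python sketch performs a single step of a standard LSTM cell on a multivariate input vector x_t. The weight layout (one input matrix W, one recurrent matrix U, and one bias vector b per gate) follows the common textbook formulation and is an assumption; it is not the architecture or code used in the study, and a practical model would use a deep learning framework rather than Python lists. The function names are hypothetical.

```python
import math
from typing import Dict, List, Tuple

Vector = List[float]
Matrix = List[List[float]]
GateParams = Tuple[Matrix, Matrix, Vector]  # (W, U, b) for one gate


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def affine(W: Matrix, U: Matrix, b: Vector, x: Vector, h: Vector) -> Vector:
    """Compute W x + U h + b, one hidden unit (row) at a time."""
    return [
        sum(w * xj for w, xj in zip(W_row, x))
        + sum(u * hj for u, hj in zip(U_row, h))
        + b_k
        for W_row, U_row, b_k in zip(W, U, b)
    ]


def lstm_step(x_t: Vector, h_prev: Vector, c_prev: Vector,
              params: Dict[str, GateParams]) -> Tuple[Vector, Vector]:
    """One LSTM time step; x_t holds all n input variables at time t."""
    f_t = [sigmoid(z) for z in affine(*params["f"], x_t, h_prev)]  # forget gate
    i_t = [sigmoid(z) for z in affine(*params["i"], x_t, h_prev)]  # input gate
    o_t = [sigmoid(z) for z in affine(*params["o"], x_t, h_prev)]  # output gate
    c_tilde = [math.tanh(z) for z in affine(*params["c"], x_t, h_prev)]  # candidate
    # Cell state update: c_t = f_t ⊗ c_{t-1} + i_t ⊗ c~_t (element-wise products)
    c_t = [f * c + i * ct for f, c, i, ct in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]  # new hidden state
    return h_t, c_t


def zero_params(n_inputs: int, n_hidden: int) -> Dict[str, GateParams]:
    """Hypothetical all-zero weights; real weights come from training."""
    def zeros(rows: int, cols: int) -> Matrix:
        return [[0.0] * cols for _ in range(rows)]
    return {g: (zeros(n_hidden, n_inputs), zeros(n_hidden, n_hidden), [0.0] * n_hidden)
            for g in "fioc"}


# Three hypothetical, already-normalized inputs (e.g. GDP, population, medium vehicles)
x_t = [0.2, 0.5, 0.9]
h_t, c_t = lstm_step(x_t, h_prev=[0.0, 0.0], c_prev=[1.0, -2.0],
                     params=zero_params(n_inputs=3, n_hidden=2))
```

With all-zero weights every gate outputs 0.5 and the candidate state is 0, so the cell state simply halves; this makes the sketch easy to check by hand before substituting trained weights.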

The key advantage of multivariate LSTM lies in its ability to capture complex relationships and dependencies not only across time but also between different variables. For instance, in transport energy consumption forecasting, the model can simultaneously process vehicle registration data, economic indicators, and seasonal patterns, learning how these variables interact and influence energy consumption patterns. Studies by Henriques et al. [25] demonstrated that multivariate LSTM models achieve significantly higher accuracy compared to univariate approaches, with improvements in prediction accuracy ranging from 15% to 25%.

The architecture’s effectiveness stems from its capacity to learn feature interactions automatically through its training process. Unlike traditional statistical methods that often require explicit specification of variable relationships, multivariate LSTM can discover and leverage both linear and non-linear relationships between input variables. This capability is particularly valuable in complex systems where the relationships between variables may not be immediately apparent or may change over time. Additionally, the model’s memory mechanism allows it to retain information about multiple variables over long sequences, making it especially suitable for long-term forecasting tasks where historical patterns across multiple dimensions influence future values [13,30].
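Before an LSTM can be trained on annual data, the multivariate series has to be arranged into fixed-length input sequences. The sketch below shows one common way to do this; the look-back length, the one-step-ahead target, the 80/20 chronological split, and the helper names are illustrative assumptions, since this section does not specify how the study prepared its sequences. With only 30 annual observations, a three-year look-back leaves 27 samples, which illustrates how little sequence data is available to the network.

```python
from typing import List, Sequence, Tuple

Row = List[float]


def make_windows(features: Sequence[Row], target: Sequence[float],
                 lookback: int) -> Tuple[List[List[Row]], List[float]]:
    """Pair each target y_t with the feature vectors of the previous `lookback` years."""
    if lookback < 1 or lookback >= len(features):
        raise ValueError("lookback must be between 1 and len(features) - 1")
    X, y = [], []
    for t in range(lookback, len(features)):
        X.append([list(row) for row in features[t - lookback:t]])  # lookback x n_features
        y.append(target[t])
    return X, y


def chronological_split(X, y, test_fraction: float = 0.2):
    """Hold out the most recent samples for testing, keeping the time order intact."""
    n_test = max(1, round(len(X) * test_fraction))
    return X[:-n_test], y[:-n_test], X[-n_test:], y[-n_test:]


# Hypothetical placeholder data: 30 years (1993-2022) x 2 features, not the study's values
features = [[float(t), float(2 * t)] for t in range(30)]
target = [float(t) for t in range(30)]

X, y = make_windows(features, target, lookback=3)
X_train, y_train, X_test, y_test = chronological_split(X, y)
print(len(X), len(X_train), len(X_test))  # 27 22 5
```

Splitting by time rather than at random keeps the test years strictly after the training years, so the evaluation mimics genuine forecasting.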

3.3. XGBoost

XGBoost is a powerful machine learning technique that enhances the efficiency of constructing scalable decision trees and is widely applied in transport energy forecasting due to its accuracy and computational efficiency [31]. Developed from the gradient boosting decision tree (GBDT) algorithm, XGBoost improves upon GBDT by utilizing a second-order Taylor expansion for the loss function and incorporating regularization to reduce model complexity and prevent overfitting [32]. These enhancements make XGBoost particularly suitable for forecasting transport energy consumption, where multiple interdependent factors—such as vehicle registration, vehicle kilometers traveled (VKT), GDP, and population growth—must be accounted for in predictive modeling. The algorithm constructs a series of decision trees and refines predictions iteratively using a regularization function to balance model complexity and predictive accuracy.

A gradient boosting (GB) model such as XGBoost predicts the output as the sum of N additive tree functions [31]:

\hat{Y}_{i} = \phi(X_{i}) = \sum_{n=1}^{N} f_{n}(X_{i}), \quad f_{n} \in F

(1)

Here F denotes the space of regression trees, specifically Classification and Regression Trees (CART), and N is the total number of trees. Each f_{n} corresponds to an independent tree structure q with T leaves and its own leaf weights ω, where q(x) maps an input to the leaf it falls into. The set of functions used in the model is learned by minimizing the following regularized objective:

\Lambda(\theta) = \sum_{i} l(Y_{i}, \hat{Y}_{i}) + \sum_{n} \Omega(f_{n})

(2)

where \Omega(f_{n}) = \gamma T + \frac{1}{2} \lambda \lVert \omega \rVert^{2}

The term $l$ is a differentiable convex loss function that measures the difference between the prediction ${\hat{Y}}_{i}$ and the target $Y_{i}$. The second term, $Ω$, is a regularization function that penalizes model complexity through two parameters. $γ$ penalizes the number of leaves $T$, and $λ$ penalizes the squared magnitude of the leaf weights $ω$. This term smooths the final learned weights and reduces the risk of overfitting. The primary goal of regularization is to favor a simpler, more generalizable model that maintains high predictive accuracy. The parameters of the tree ensemble in Equation (2) are themselves functions (trees), so the objective cannot be optimized with traditional methods in Euclidean space. Instead, the model is trained additively, optimizing the loss incrementally one tree at a time:

Λ^{(t)} = \sum_{i = 1}^{k} l (Y_{i}, {\hat{Y}}_{i}^{(t - 1)} + f_{t} (X_{i})) + Ω (f_{t})

(3)

At each iteration $t$, Equation (3) adds the tree $f_{t}$ that most improves the regularized objective, given the predictions of the previous $t - 1$ trees. Here, $k$ is the number of training observations. Each new tree therefore incrementally improves forecast accuracy. This methodology enables XGBoost to efficiently capture transport energy consumption trends by integrating complex relationships between economic indicators, transport activity, and policy changes, making it a valuable tool for sustainable energy planning and forecasting.
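The additive step in Equation (3) can be illustrated with the second-order approximation that XGBoost uses. Each observation contributes a gradient $g_i$ and a Hessian $h_i$ of the loss. The optimal weight of a leaf is $-G/(H + λ)$, where $G$ and $H$ are the sums of $g_i$ and $h_i$ in that leaf, and a split is kept only if its gain exceeds $γ$.

The sketch below runs a few boosting rounds on toy data. It makes the following assumptions:
- squared-error loss $\frac{1}{2}(\hat{y} - y)^2$, so that $g_i = \hat{y}_i - y_i$ and $h_i = 1$;
- a single input feature;
- depth-one trees (stumps);
- a mean-valued starting prediction;
- a shrinkage factor `eta`.

The sketch shows the principle only. It is not the xgboost library's implementation or the configuration used in this study.

```python
def leaf_weight(G, H, lam):
    # Optimal leaf weight from the second-order expansion: w* = -G / (H + lambda)
    return -G / (H + lam)

def leaf_score(G, H, lam):
    # Structure score of one leaf, G^2 / (H + lambda); larger is better
    return G * G / (H + lam)

def fit_stump(x, g, h, lam, gamma):
    """Best single split on feature x; falls back to one leaf if no gain > 0."""
    G, H = sum(g), sum(h)
    w = leaf_weight(G, H, lam)
    best, best_gain = (None, w, w), 0.0
    order = sorted(range(len(x)), key=lambda k: x[k])
    GL = HL = 0.0
    for pos in range(len(order) - 1):
        k = order[pos]
        GL += g[k]
        HL += h[k]
        if x[order[pos]] == x[order[pos + 1]]:
            continue  # cannot split between equal feature values
        GR, HR = G - GL, H - HL
        gain = 0.5 * (leaf_score(GL, HL, lam) + leaf_score(GR, HR, lam)
                      - leaf_score(G, H, lam)) - gamma
        if gain > best_gain:
            threshold = 0.5 * (x[order[pos]] + x[order[pos + 1]])
            best_gain = gain
            best = (threshold, leaf_weight(GL, HL, lam), leaf_weight(GR, HR, lam))
    return best

def predict_stump(stump, xi):
    threshold, w_left, w_right = stump
    if threshold is None:
        return w_left
    return w_left if xi < threshold else w_right

def boost(x, y, rounds=100, eta=0.3, lam=1.0, gamma=0.0):
    base = sum(y) / len(y)  # constant starting prediction
    pred = [base] * len(y)
    stumps = []
    for _ in range(rounds):
        g = [p - t for p, t in zip(pred, y)]  # gradient of 0.5 * (p - t)^2
        h = [1.0] * len(y)                    # Hessian of 0.5 * (p - t)^2
        stump = fit_stump(x, g, h, lam, gamma)
        stumps.append(stump)
        # Additive update: y_hat^(t) = y_hat^(t-1) + eta * f_t(x)
        pred = [p + eta * predict_stump(stump, xi) for p, xi in zip(pred, x)]
    return base, stumps, pred

x = [1, 2, 3, 4, 5, 6]
y = [10.0, 11.0, 12.5, 14.0, 16.0, 18.5]
base, stumps, pred = boost(x, y)
mse = sum((p - t) ** 2 for p, t in zip(pred, y)) / len(y)
print(round(mse, 4))
```

Raising `lam` shrinks every leaf weight toward zero, and raising `gamma` suppresses splits with small gains. These are the two levers of $Ω$ in Equation (2).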

3.4. Model Evaluation

The performance of the models is assessed using the following statistical metrics, which provide a comprehensive evaluation of their predictive accuracy and reliability:

Mean Squared Error (MSE): Measures the average squared differences between predicted and actual values.

M S E = (\frac{1}{n} \sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2})

(4)

Root Mean Squared Error (RMSE): Provides an interpretable measure of prediction errors in the same units as the target variable.

R M S E = \sqrt{\frac{\sum_{i = 1}^{n} {(y_{i} - {\hat{y}}_{i})}^{2}}{n}}

(5)

Mean Absolute Error (MAE): Captures the average magnitude of prediction errors.

M A E = (\frac{1}{n}) \times \sum_{i = 1}^{n} |y_{i} - {\hat{y}}_{i}|

(6)

Mean Absolute Percentage Error (MAPE): Expresses errors as a percentage of actual values for better interpretability.

M A P E = (\frac{1}{n} \sum_{i = 1}^{n} \frac{|y_{i} - {\hat{y}}_{i}|}{|y_{i}|}) \times 100

(7)

R-Squared (R²): Indicates the proportion of variance in the target variable explained by the model.

R^{2} = 1 - \frac{{S S}_{r e s}}{{S S}_{t o t}}

(8)

where

$n$ is the total number of observations.
$y_{i}$ is the actual value of the target variable.
${\hat{y}}_{i}$ is the predicted value.
$\bar{y}$ is the mean of the actual target values.
${S S}_{r e s}$ is the residual sum of squares: $\sum {(y_{i} - {\hat{y}}_{i})}^{2}$, representing the variance not explained by the model.
${S S}_{t o t}$ is the total sum of squares: $\sum {(y_{i} - \bar{y})}^{2}$ representing the total variance in the data.
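The following is a direct transcription of Equations (4)–(8) in plain Python. The function name `evaluate` and the four-point toy series are illustrative assumptions, not the study's code or data.

```python
import math

def evaluate(y_true, y_pred):
    """MSE, RMSE, MAE, MAPE (%) and R^2 as defined in Equations (4)-(8)."""
    n = len(y_true)
    errors = [y - p for y, p in zip(y_true, y_pred)]
    mse = sum(e * e for e in errors) / n
    rmse = math.sqrt(mse)
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined if any actual value is zero
    mape = 100.0 * sum(abs(e) / abs(y) for e, y in zip(errors, y_true)) / n
    y_bar = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((y - y_bar) ** 2 for y in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": rmse, "MAE": mae, "MAPE": mape, "R2": r2}

actual = [20.0, 22.0, 25.0, 24.0]
predicted = [21.0, 21.0, 24.0, 25.0]
m = evaluate(actual, predicted)
print(m)
```

For these toy values every error is ±1, so MSE, RMSE, and MAE all equal 1. With $SS_{tot} = 14.75$ and $SS_{res} = 4$, R² = 1 − 4/14.75 ≈ 0.729.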

Both LSTM and XGBoost are traditionally considered data-intensive techniques, yet this study relies on a relatively limited sample of 30 annual observations (1993–2022). This raises legitimate concerns about potential overfitting and model generalizability. To address this issue, several strategies were implemented to optimize model performance despite the data constraints.

First, this study carefully applied regularization techniques in both models to prevent overfitting. For XGBoost, this study utilized L1 and L2 regularization parameters along with early stopping mechanisms. For LSTM, this study employed dropout layers and recurrent dropout to enhance generalization ability. Second, this study implemented cross-validation techniques specifically adapted for time series data, using a time-based splitting approach rather than random sampling to preserve temporal dependencies.
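The time-based splitting described above can be sketched as an expanding-window (rolling-origin) scheme, in which every validation fold lies strictly after its training window. The fold count, the minimum training length, and the function name `expanding_window_splits` are illustrative assumptions. The exact splitting configuration used in this study is not specified here. scikit-learn's `TimeSeriesSplit` provides a similar expanding-window scheme.

```python
def expanding_window_splits(years, n_folds, min_train):
    """Yield (train_years, valid_years) pairs that preserve temporal order.

    Each fold trains on all years up to a cut-off and validates on the next
    block of years, so no future observation leaks into training.
    """
    n = len(years)
    fold_size = (n - min_train) // n_folds
    if fold_size < 1:
        raise ValueError("not enough observations for the requested folds")
    for k in range(n_folds):
        cut = min_train + k * fold_size
        end = cut + fold_size if k < n_folds - 1 else n  # last fold takes the remainder
        yield years[:cut], years[cut:end]

years = list(range(1993, 2013))  # for example, the 1993-2012 training period
for train, valid in expanding_window_splits(years, n_folds=4, min_train=8):
    print(train[0], "-", train[-1], "->", valid[0], "-", valid[-1])
```

Unlike random K-fold splitting, this scheme never validates on a year that precedes any training year.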

Additionally, this study conducted comparative analyses with classical statistical methods, including Multiple Linear Regression (MLR) and Autoregressive Integrated Moving Average (ARIMA), which are often considered more appropriate for smaller datasets. The evaluation revealed that despite the data limitations, XGBoost consistently outperformed these traditional approaches, achieving superior prediction accuracy across multiple metrics. This aligns with findings from Sun-Youn et al. [33], who demonstrated that machine learning approaches could achieve better performance than econometric methods even with limited observations when the underlying relationships between variables are complex and non-linear.

The performance advantage of XGBoost in this case can be attributed to its ability to capture complex interactions between variables without requiring the large sample sizes typically needed for deep learning models like LSTM. This is consistent with research by Emmanuel Hidalgo et al. [34], who found that tree-based models like Random Forest and XGBoost could effectively model energy consumption with limited data points due to their ensemble nature and feature-based learning approach.

4. Results

4.1. Descriptive Analysis

From Table 3, the dataset provides a comprehensive overview of key variables influencing transport energy consumption in Thailand from 1993 to 2022. Gross Domestic Product (GDP), visualized together with transport energy consumption, shows a steady growth trajectory. It increased from 4.341 trillion Baht in 1993 to a peak of 10.919 trillion Baht in 2019, with an average of 7.570 trillion Baht and a standard deviation of 2.133 trillion Baht. The temporary contraction in GDP during 2020 coincided with a similar dip in transport energy consumption, emphasizing the correlation between economic growth and energy demand. By 2022, GDP rebounded to 10.676 trillion Baht, reflecting resilience in economic activity.

Population growth is another significant factor influencing transport energy demand. The population increased steadily from 57.776 million in 1993 to 71.697 million in 2022, with an average of 66.368 million and a standard deviation of 4.238 million. Although population growth alone does not directly determine energy consumption, it drives increased demand for both private and public transportation.

Vehicle registration trends show substantial growth across all vehicle categories. Small vehicle registrations (motorcycles and mopeds) increased from 7.313 million in 1993 to 22.301 million in 2022, averaging 16.508 million. Medium-sized vehicles (passenger cars, pickup trucks, and vans) saw the most dramatic rise, growing from 3.030 million to 18.970 million, with an average of 9.950 million and a standard deviation of 4.876 million. Large vehicle registrations (buses, trucks, and trailers) increased from 0.533 million to 1.359 million, with an average of 0.951 million.
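The registration figures above can be converted into total growth and compound annual growth rates (CAGR) over the 29 yearly intervals between 1993 and 2022. The snippet uses only the 1993 and 2022 values quoted in the text. The helper names are illustrative.

```python
def total_growth_pct(start, end):
    # Percentage increase from start to end
    return (end / start - 1.0) * 100.0

def cagr_pct(start, end, years):
    # Compound annual growth rate over the given number of yearly intervals
    return ((end / start) ** (1.0 / years) - 1.0) * 100.0

registrations = {  # million vehicles, (1993, 2022) as quoted in the text
    "small": (7.313, 22.301),
    "medium": (3.030, 18.970),
    "large": (0.533, 1.359),
}
for category, (v1993, v2022) in registrations.items():
    print(category,
          round(total_growth_pct(v1993, v2022), 1),
          round(cagr_pct(v1993, v2022, 2022 - 1993), 2))
```

For medium vehicles, this gives an increase of about 526%, roughly 6.5% per year. The 526% figure matches the one cited in Section 5.2.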

Vehicle kilometers traveled (VKT) reflects usage patterns across different vehicle types. Motorcycle VKT increased significantly from 7.313 million in 1993 to 22.301 million in 2022, with an average of 16.508 million. Passenger vehicle VKT grew substantially from 3.138 million to 19.103 million, with an average of 10.084 million and a standard deviation of 4.889 million. Similarly, truck VKT, which represents freight transport demand, rose from 0.426 million to 1.226 million, averaging 0.817 million over the period.

This analysis highlights the interdependence between economic growth, vehicle expansion, and transport energy consumption. The consistent rise in vehicle registrations and kilometers traveled underscores the increasing reliance on private and commercial transport, which directly impacts energy demand. Furthermore, the fluctuations observed in 2020 emphasize the sector’s vulnerability to external shocks, such as economic downturns and global disruptions. Together, these insights provide valuable guidance for future transport energy policies and planning.

As shown in Figure 2, transport energy consumption, measured in million tons of oil equivalent (MTOE), exhibits an increasing trend over the years, with an average of 23.888 MTOE. The relationship between transport energy consumption and GDP demonstrates strong coupling, with both metrics showing similar growth trajectories until 2019, followed by a pandemic-induced decline in 2020, and a subsequent recovery phase. This visualization effectively captures the interconnection between economic growth and energy demand in the transport sector over the three-decade period.

Figure 3 illustrates the evolution of vehicle fleet composition in Thailand and its relationship with transport energy consumption. The visualization reveals a gradual but significant shift in fleet structure. Medium-sized vehicles (passenger cars, pickup trucks, and vans) gained an increasingly large share of the total fleet, growing from approximately 28% in 1993 to nearly 45% in 2022. This transformation coincides with the rising trajectory of transport energy consumption. The pattern is consistent with, though not by itself evidence of, a causal link between the growing proportion of medium vehicles and rising energy demand. The share of small vehicles (motorcycles and mopeds) correspondingly decreased from 67% to 52% during this period, while large vehicles maintained a relatively stable proportion of approximately 3–5% throughout.
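The fleet shares quoted above can be reproduced from the registration figures in the descriptive analysis. This short check assumes that small, medium, and large vehicles together make up the whole fleet.

```python
def fleet_shares(small, medium, large):
    # Percentage share of each size category in the total registered fleet
    total = small + medium + large
    return {name: round(100.0 * v / total, 1)
            for name, v in (("small", small), ("medium", medium), ("large", large))}

shares_1993 = fleet_shares(7.313, 3.030, 0.533)   # million vehicles, 1993
shares_2022 = fleet_shares(22.301, 18.970, 1.359) # million vehicles, 2022
print(shares_1993)
print(shares_2022)
```

The results are about 27.9% → 44.5% for medium vehicles, 67.2% → 52.3% for small vehicles, and 4.9% → 3.2% for large vehicles. These are consistent with the rounded figures in the text.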

Figure 4 presents the changing patterns of vehicle kilometers traveled (VKT) by different vehicle types in relation to transport energy consumption. The visualization demonstrates that passenger vehicle VKT has grown substantially as a proportion of total kilometers traveled, particularly since 2003. This trend aligns closely with the acceleration in transport energy consumption during the same period, highlighting the significant energy implications of increased passenger vehicle usage. While motorcycle VKT has maintained a substantial share of total kilometers traveled, its proportional contribution has gradually declined. Truck VKT has remained relatively stable as a percentage of total kilometers traveled, yet its energy intensity per kilometer makes it a significant contributor to overall energy consumption despite its smaller share of total VKT.

The correlation matrix provides strong empirical support for the relationships identified in the feature importance analysis. As shown in the heatmap (Figure 5), all predictor variables exhibit high positive correlations with transport energy consumption, with coefficients ranging from 0.90 to 0.98. This confirms the strong statistical relationships underpinning the model. Particularly notable is the strong correlation (0.98) between GDP and transport energy consumption, which supports the economic-energy consumption relationship established in the literature. Similarly, the high correlations between vehicle registration data and energy consumption (0.90 to 0.95) support the finding that fleet composition significantly influences energy demand. The correlation analysis also reveals notable relationships among the predictors themselves. For instance, registered small vehicles correlate perfectly (1.00) with VKT Motorcycle, as do medium vehicles with VKT Passenger. This reflects the direct link between vehicle ownership and usage in the data, and it also means that each pair carries largely redundant information. The strong correlations between medium and large vehicles (0.99) and their corresponding VKT measures suggest coordinated growth in these segments.
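Each cell of the heatmap is a Pearson correlation coefficient. The minimal implementation below uses hypothetical series, not the study's data. It illustrates that two series that are exact linear functions of each other yield a coefficient of 1.00.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def correlation_matrix(columns):
    """columns: dict name -> series; returns a nested dict of pairwise r values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}

data = {  # hypothetical series
    "registrations": [1.0, 2.0, 3.5, 4.0, 6.0],
    "vkt": [2.0, 4.0, 7.0, 8.0, 12.0],  # exactly 2 x registrations
    "other": [3.0, 2.5, 4.0, 6.0, 5.5],
}
corr = correlation_matrix(data)
print(round(corr["registrations"]["vkt"], 2))
```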

4.2. Model Fitting and Performance Comparison LSTM and XGBoost

The study investigates transport energy consumption in Thailand, employing advanced predictive models to analyze trends and key influencing factors over the period 1993 to 2022. By leveraging the Long Short-Term Memory (LSTM) and XGBoost models, the research aims to accurately predict energy demand and identify critical contributors to consumption patterns. Through comparative analysis, the models reveal distinct capabilities in capturing historical trends and projecting future demands. Additionally, feature importance analysis highlights the dominant role of medium vehicles and freight transportation in shaping energy usage, alongside demographic and economic factors. These findings offer valuable insights for designing energy-efficient policies and sustainable transportation strategies.

To compare the performance of LSTM and XGBoost, Bayesian optimization was applied to the critical hyperparameters of both models to reduce overfitting and ensure robust predictions, as shown in Table 4 and Table 5.

This study did not apply stationarity tests or differencing, as LSTM models inherently handle non-stationary time series without explicit transformations [35]. Unlike traditional models, LSTM captures temporal dependencies directly, making such preprocessing unnecessary [36]. However, its ability to learn long-term dependencies may be affected by extreme trends or structural shifts. In contrast, XGBoost, a tree-based model, relies on direct feature interactions rather than sequential dependencies [37], making stationarity adjustments irrelevant to its framework. To ensure a fair comparison, both models were evaluated on the same untransformed dataset, allowing an unbiased assessment of sequential versus feature-based learning approaches.

From Table 6, the comparison of LSTM and XGBoost model performance reveals significant differences in prediction accuracy across the evaluation metrics. XGBoost outperforms LSTM in Mean Squared Error (MSE), with values of 0.0256 for training data and 0.2790 for test data, compared to 4.9473 and 4.5695, respectively, for LSTM, highlighting XGBoost’s superior error minimization. Similarly, Root Mean Squared Error (RMSE) is lower for XGBoost (0.1599 Train, 0.5282 Test) than for LSTM (2.2243 Train, 2.1377 Test), indicating a more precise fit. In terms of Mean Absolute Error (MAE), XGBoost exhibits a lower error magnitude (0.0815 Train, 0.3200 Test) than LSTM (1.9142 Train, 1.8057 Test). Mean Absolute Percentage Error (MAPE) results likewise show lower percentage errors for XGBoost (0.49% Train, 1.08% Test) than for LSTM (9.36% Train, 6.06% Test). Lastly, R-squared (R²) values confirm that XGBoost has substantially higher explanatory power (0.9976 Train, 0.9508 Test) than LSTM (0.3140 Train, 0.2005 Test). These results confirm that XGBoost consistently outperforms LSTM across all key performance metrics, making it the more effective model for this forecasting task.
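RMSE is the square root of MSE (Equations (4) and (5)), so the values reported from Table 6 can be cross-checked directly. The snippet below uses only the figures quoted above.

```python
import math

table6 = {  # (MSE, reported RMSE) as quoted in the text
    ("XGBoost", "train"): (0.0256, 0.1599),
    ("XGBoost", "test"): (0.2790, 0.5282),
    ("LSTM", "train"): (4.9473, 2.2243),
    ("LSTM", "test"): (4.5695, 2.1377),
}
for (model, split), (mse, rmse) in table6.items():
    diff = abs(math.sqrt(mse) - rmse)
    print(model, split, round(math.sqrt(mse), 4), "consistent" if diff < 1e-3 else "check")
```

All four pairs agree to within rounding. On the test set, the LSTM's MSE is roughly 16 times that of XGBoost (4.5695 / 0.2790).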

Figure 6 presents transport energy consumption predictions for Thailand from the Long Short-Term Memory (LSTM) model over 1993 to 2022. The graph compares actual and predicted values for both the training and testing datasets. In the training phase (1993–2012), the predicted values (green dashed line) follow the general upward direction of the actual values (blue line), reflecting the model’s ability to learn the long-term trend from historical data. However, the training R² of 0.3140 and MAPE of 9.36% (Table 6) indicate that the fit is considerably looser than the visual trend alone suggests. In the testing phase (2013–2022), the model’s predictions (red dashed line) are compared with the actual values (purple line). The LSTM model captures the general upward trend in energy consumption, although its test R² is only 0.2005. The vertical dashed line separates the training and testing datasets, distinguishing the data used for model training from those used for validation.

Figure 7 illustrates the results of transport energy consumption predictions in Thailand using the XGBoost model, covering the same period of 1993 to 2022. This graph also compares actual and predicted values for the training and testing datasets. In the training phase (1993–2012), the actual values, represented by the blue line, closely align with the predicted values, shown by the green dashed line. This alignment demonstrates XGBoost’s ability to accurately learn and replicate historical energy consumption trends. The model captures the increasing pattern during this period with high precision. In the testing phase (2013–2022), the predicted values, represented by the red dashed line, closely follow the actual values, shown by the purple line. The graph highlights XGBoost’s strong predictive performance in capturing the continued increase in energy consumption. The vertical dashed line separates the training and testing datasets, emphasizing the clear distinction between the two phases. The XGBoost model demonstrates excellent predictive accuracy across both training and testing periods, effectively capturing long-term growth patterns in transport energy consumption. Its ability to consistently align with actual data showcases its reliability as a tool for forecasting energy demand.

The comparison between the LSTM and XGBoost models reveals notable differences in their performance for predicting transport energy consumption in Thailand from 1993 to 2022. In the training phase (1993–2012), XGBoost’s predicted values almost perfectly overlap the actual values, whereas LSTM follows only the broad trend. This reflects XGBoost’s superior ability to minimize errors on historical data. During the testing phase (2013–2022), XGBoost continues to exhibit higher accuracy, closely following the actual values and handling both gradual and sharp changes in energy consumption. In contrast, LSTM shows larger deviations, particularly during periods of rapid fluctuation. Its memory mechanism allows it to learn sequential patterns and the long-term trend, but it lags when consumption varies sharply. XGBoost maintains precision by iteratively refining predictions and handling non-linear relationships. Both models reproduce the overall growth in transport energy consumption. However, XGBoost is clearly more accurate, particularly in the testing phase. LSTM’s potential strength lies in modeling longer temporal sequences than this dataset provides.

Figure 8 presents the Feature Importance Analysis for the prediction model, highlighting the relative contributions of various factors to transport energy consumption. The most influential feature is Registered-vehicle Medium, which accounts for 36.6% of the total importance score, underscoring the significant role of medium vehicles, such as passenger cars and pick-up trucks, in energy demand. VKT Truck follows with 20.5%, emphasizing the critical impact of freight transportation and heavy vehicle usage on energy consumption. Registered-vehicle Large and Registered-vehicle Small contribute 11.6% and 11.1%, respectively. This indicates that although medium vehicles dominate, large vehicles (e.g., buses, trucks) and small vehicles (e.g., motorcycles, mopeds) also play notable roles. Demographic and economic factors, namely Population and GDP, contribute 9.6% and 5.6%, respectively, reflecting their influence on transport demand and energy usage patterns. Finally, VKT Passenger and VKT Motorcycle have smaller contributions of 3.9% and 1.1%, respectively. Personal and motorcycle travel therefore influence energy consumption, but their overall impact is relatively limited compared to other factors. These findings provide valuable insights for prioritizing energy-efficient policies and sustainable transportation planning.
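Importance scores of this kind are typically normalized so that they sum to 100%. The sketch below normalizes and ranks the percentages reported for Figure 8 and checks the grouped totals used later in the Discussion. The importance type (for example, gain versus split count) is not stated here, so the scores are simply taken as given.

```python
importance = {  # percentages as reported for Figure 8
    "Registered-vehicle Medium": 36.6,
    "VKT Truck": 20.5,
    "Registered-vehicle Large": 11.6,
    "Registered-vehicle Small": 11.1,
    "Population": 9.6,
    "GDP": 5.6,
    "VKT Passenger": 3.9,
    "VKT Motorcycle": 1.1,
}

def normalize(scores):
    # Rescale raw scores so that they sum to 100
    total = sum(scores.values())
    return {k: 100.0 * v / total for k, v in scores.items()}

normalized = normalize(importance)
ranking = sorted(normalized.items(), key=lambda kv: kv[1], reverse=True)
for name, pct in ranking:
    print(f"{name:28s} {pct:5.1f}%")

# Grouped totals referred to in the Discussion
vehicle_groups = importance["Registered-vehicle Small"] + importance["Registered-vehicle Large"]
socio_economic = importance["Population"] + importance["GDP"]
print("Small + large registrations:", round(vehicle_groups, 1))
print("Population + GDP:", round(socio_economic, 1))
```

The reported percentages sum to 100.0. The two groupings give the 22.7% and 15.2% totals used in Sections 5.2 and 5.3.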

5. Discussion

5.1. Comparative Performance of LSTM and XGBoost

The results demonstrate that the XGBoost model significantly outperformed the LSTM model across all evaluation metrics, which warrants deeper examination. The XGBoost model achieved an MSE of 0.0256 for training data and 0.2790 for test data, compared to LSTM’s 4.9473 and 4.5695, respectively. This substantial performance gap challenges conventional assumptions about the superiority of recurrent neural networks for time-series forecasting in transportation contexts.

The performance difference may be attributed to several factors. First, XGBoost’s ensemble approach combines multiple decision trees, enabling it to efficiently capture non-linear relationships without requiring extensive hyperparameter tuning. This aligns with findings from Çınarer et al. [15], who demonstrated XGBoost’s exceptional performance in predicting CO₂ emissions from Turkey’s transportation sector. Similarly, Ağbulut [2] found that XGBoost outperformed neural network approaches in forecasting transportation-related energy demand with similar datasets.

Second, the temporal dependencies in the dataset spanning 1993–2022 may not be as critical as the complex interactions between different input variables. While LSTM models excel at capturing long-term dependencies in sequential data, the feature importance analysis suggests that the relationships between vehicle categories, economic indicators, and energy consumption may be more effectively modeled through XGBoost’s decision tree structure. This supports the observations of J. Zhang et al. [14], who found that XGBoost was particularly effective at modeling complex multivariate relationships in emissions prediction.

Third, the dimensionality of the dataset (spanning 30 years with multiple features) presents challenges for LSTM models, which require careful tuning to avoid overfitting. The low R² values for LSTM (0.3140 for training and 0.2005 for testing), compared to XGBoost (0.9976 for training and 0.9508 for testing), suggest that the LSTM architecture struggled to generalize patterns from the training data to the testing period. This is consistent with research by Ji et al. [13], who observed similar challenges when applying deep learning models to transportation emission predictions with complex multivariate inputs. LSTM may also have struggled to capture volatility in the test set, as its predictions remained overly smooth, suggesting that it learned mainly the stable long-term trend [38,39]. The low training R² indicates that this reflects underfitting rather than memorization of the training data. Increased variability after 2015, driven by policy changes and economic shifts, may have further hindered its ability to generalize [40,41]. Unlike XGBoost, which effectively captured feature interactions, LSTM’s reliance on sequential patterns may have been less suited to the dataset. Additionally, the model’s limited capacity to learn long-term dependencies due to data constraints likely contributed to its suboptimal performance [42,43]. These findings emphasize the need for a modeling approach better aligned with the dataset’s characteristics.

Fourth, the nature of the transport energy consumption patterns in Thailand may exhibit characteristics that are more amenable to tree-based methods. The potential presence of threshold effects and non-linear relationships between urbanization, vehicle ownership, and energy consumption aligns with the strengths of gradient-boosting algorithms. Henriques et al. [25] observed similar advantages when applying ensemble methods to energy consumption pattern classification in their research.

Additionally, the significant disparity in MAPE values between XGBoost (0.49% training, 1.08% testing) and LSTM (9.36% training, 6.06% testing) highlights XGBoost’s exceptional precision in percentage terms. This level of accuracy is particularly valuable for energy planning applications where even small percentage errors can translate to significant resource misallocations. Shams Amiri et al. [21] similarly emphasized the importance of high percentage accuracy in transportation energy forecasting for effective policy planning.

While the findings strongly favor XGBoost for this application, it is important to note that LSTM models might perform better with different hyperparameter configurations, additional feature engineering, or alternative architectures such as encoder-decoder structures [15]. Further research incorporating attention mechanisms or hybrid approaches combining LSTM with convolutional layers could potentially improve LSTM performance for similar forecasting tasks, as suggested by the promising results of Ghanbari et al. [19] in their work on multivariate time-series prediction.

It is important to acknowledge the limitations of applying data-intensive machine learning techniques to the relatively small dataset of 30 annual observations. While the XGBoost model demonstrated strong performance, this finding should be interpreted with appropriate caution. The success of XGBoost in this context aligns with Dahiru and Mohammed [44], who found that for certain energy forecasting tasks, machine learning approaches could outperform traditional methods even with limited historical data, particularly when the underlying relationships are non-linear and complex.

However, the significant performance gap between XGBoost and LSTM likely reflects the latter’s known requirement for larger training datasets. As noted by Hamed et al. [45], LSTM models typically require substantial historical data to effectively capture temporal dependencies, while gradient boosting methods can perform well with fewer observations due to their ensemble learning approach. This limitation is evident in the LSTM model’s modest R² values of 0.3140 for training and 0.2005 for testing, suggesting insufficient data for the model to effectively learn the complex temporal patterns.

To validate the findings, this study compared XGBoost results with classical forecasting methods, including ARIMA and Multiple Linear Regression. The XGBoost model outperformed these traditional approaches, achieving 15–28% lower RMSE values. This performance advantage suggests that despite data limitations, the complex non-linear relationships in transport energy consumption patterns are better captured by XGBoost’s tree-based ensemble approach than by linear statistical methods. Nevertheless, as suggested by Hamidreza et al. [46], incorporating elements from traditional statistical approaches into machine learning frameworks could potentially enhance model robustness for limited datasets in future research.

5.2. Feature Importance Analysis and Policy Implications

The feature importance analysis yielded several insights with significant implications for transport energy policy in Thailand. Medium-sized vehicles are the most influential factor in the model, accounting for 36.6% of the total importance score. They are followed by truck VKT at 20.5%, registered large vehicles at 11.6%, and registered small vehicles at 11.1%. This hierarchy reveals a complex transportation energy landscape that warrants detailed examination.

The dominant influence of medium-sized vehicles (passenger cars and pick-up trucks) aligns with Thailand’s motorization trends over the past three decades. As documented by Chai et al. [24], rising middle-class incomes in developing Asian economies have driven rapid growth in private car ownership, particularly evident in the dataset where medium vehicle registrations increased by 526% from 1993 to 2022. The high energy impact of this vehicle category reflects not only their growing numbers but also their relatively high per-vehicle energy consumption compared to motorcycles, combined with frequent use patterns. This finding corroborates research by Pongthanaisawan et al. [1], who identified passenger vehicles as a critical driver of growing transport energy consumption in Thailand, with significant implications for future energy security.

The substantial contribution of truck VKT (20.5%) highlights the critical role of freight transportation in Thailand’s energy landscape. This finding is particularly noteworthy considering that truck registrations account for only a portion of the registered large vehicles (11.6% importance), suggesting that utilization intensity per truck substantially magnifies their energy impact. Lin et al. [8] similarly identified road freight as having significantly lower energy efficiency compared to other transport modes in their multi-modal analysis of China’s transportation sector. The disproportionate energy impact of freight suggests that targeted policies for the trucking sector could yield substantial energy savings. These might include logistics optimization, driver training programs, aerodynamic retrofitting requirements, and accelerated fleet modernization initiatives, as suggested by Ahmad and Zhang [47] in their comprehensive review of energy demand management strategies.

The combined 22.7% importance of small and large vehicle registrations further emphasizes how fleet composition shapes transport energy consumption patterns. Small vehicles contribute relatively little (11.1%) despite their numerical dominance in the fleet (22.3 million versus 19.0 million medium vehicles in 2022), which likely reflects their greater energy efficiency. This efficiency differential between vehicle classes was similarly observed by Champeecharoensuk et al. [17] in their sectoral analysis of Thailand’s transport emissions patterns.

Notably, the model identifies actual vehicle usage (VKT) as less influential than vehicle ownership for motorcycles and passenger vehicles, with motorcycle VKT contributing only 1.1% and passenger VKT 3.9% to energy consumption predictions. This counterintuitive finding suggests that, in Thailand’s context, the accumulation of vehicles may be outpacing their utilization growth. The result would be a potential “latent demand” phenomenon in which vehicle ownership growth serves as a leading indicator of future energy consumption increases. This pattern differs from observations in other developing economies where utilization rates typically drive consumption more directly, as noted by Rahman et al. [20] in their analysis of Saudi Arabia’s transportation sector. However, motorcycle VKT and passenger VKT are perfectly correlated (1.00) with small and medium vehicle registrations, respectively. Tree-based importance scores may therefore be divided unevenly between these nearly redundant features, so this ranking should be interpreted with caution.

The policy implications of these findings are substantial and multifaceted. First, they strongly suggest that medium vehicle efficiency improvements should be prioritized in Thailand’s transport energy policies. Measures could include strengthened fuel economy standards, accelerated electric vehicle adoption incentives targeting the mid-size segment, and congestion pricing schemes in urban centers. Ji et al. [13] demonstrated that such targeted interventions in China’s urban centers yielded significant energy consumption reductions in the medium vehicle segment.

Second, the substantial contribution of truck VKT suggests that freight transport efficiency represents a high-leverage intervention point. Policies promoting a modal shift from road to rail freight, as suggested by Lin et al. [8], could yield substantial energy savings. Additional measures might include incentives for fleet renewal targeting the highest-consumption vehicles, weight-based road pricing systems, and logistics optimization programs to reduce empty-running rates [48].

Third, the relatively modest contributions of passenger VKT (3.9%) and motorcycle VKT (1.1%) suggest that policies focused solely on reducing personal vehicle kilometers traveled—such as ride-sharing promotion or telecommuting incentives—may yield less immediate impact than interventions targeting vehicle technology or freight systems. This contrasts with findings from Shams Amiri et al. [21], who identified trip distances as a primary driver of household transportation energy in their US-based study, highlighting important contextual differences between developed and developing transport systems.

Fourth, the combined influence of economic and demographic factors (15.2%) suggests that transport energy policies must be integrated with broader economic development and urban planning strategies. As Supasa et al. [7] argued in their consumption-based analysis of Thailand’s energy policies, siloed approaches that fail to account for these interconnections risk suboptimal outcomes. The relationship between energy consumption and socioeconomic factors identified in the model suggests opportunities for proactive policy development that anticipates future consumption patterns as Thailand’s economy and demographics continue to evolve.

Additionally, the relatively modest direct influence of GDP (5.6%) compared to vehicle-specific factors could indicate opportunities for economic growth pathways that minimize transport energy intensity through strategic infrastructure investments and urban development patterns, as suggested by Zhang et al. [18] in their analysis of transport electrification pathways.

5.3. Economic and Demographic Factors

The analysis reveals a nuanced relationship between economic and demographic factors and transport energy consumption in Thailand. While vehicle-related variables dominated the feature importance rankings, the combined contribution of economic and demographic factors—GDP at 5.6% and population at 9.6%—accounts for approximately 15.2% of the influence on transport energy consumption. This significant but secondary role of macroeconomic indicators warrants deeper examination.
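As a quick consistency check, the importance shares reported in this study can be grouped by factor type. The sketch below assumes that the 36.6% medium-vehicle figure refers to registrations, which is consistent with the size-based registration categories. The dictionary keys are illustrative labels, not the model's actual variable names. The seven reported shares sum to 100%. Grouped this way, registrations, VKT, and economic and demographic factors account for roughly 59%, 26%, and 15% respectively.

```python
# Importance shares (%) as reported in this study; the keys are illustrative
# labels, not the variable names used in the original model.
shares = {
    "medium_vehicles": 36.6,
    "truck_vkt": 20.5,
    "small_and_large_vehicles": 22.7,  # reported only as a combined figure
    "population": 9.6,
    "gdp": 5.6,
    "passenger_vkt": 3.9,
    "motorcycle_vkt": 1.1,
}

# Assumed grouping: medium vehicles are treated as a registration variable.
groups = {
    "vehicle registrations": ["medium_vehicles", "small_and_large_vehicles"],
    "vehicle kilometers traveled": ["truck_vkt", "passenger_vkt", "motorcycle_vkt"],
    "economic and demographic": ["gdp", "population"],
}

def group_shares(shares, groups):
    """Sum individual importance shares into broader factor groups."""
    return {name: round(sum(shares[k] for k in keys), 1) for name, keys in groups.items()}

grouped = group_shares(shares, groups)
total = round(sum(shares.values()), 1)
for name, value in grouped.items():
    print(f"{name}: {value}%")
print(f"total: {total}%")
```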

The 9.6% contribution from population factors aligns with Thailand’s demographic transition over the study period (1993–2022). During this time, Thailand experienced slowing population growth but increasing urbanization, which fundamentally altered mobility patterns. This demographic influence on transport energy consumption supports findings by Pongthanaisawan et al. [1], who identified population distribution shifts as a key driver of changing transportation energy needs in Thailand. Similarly, Chai et al. [24] found that a 1% increase in urbanization led to a 1.26% rise in road transportation energy consumption in their analysis of Chinese cities, suggesting similar mechanisms may be at work in Thailand’s developing urban centers.
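To make an elasticity of this kind concrete, the following sketch converts a percentage change in urbanization into an implied percentage change in road transport energy. It assumes a constant-elasticity (log-log) relationship. This is an illustrative assumption, not a description of Chai et al.'s specification, and the 5% urbanization change is hypothetical.

```python
def constant_elasticity_change(driver_change_pct, elasticity):
    """Percentage change in the outcome implied by a constant-elasticity
    (log-log) relationship: (1 + g) ** e - 1, expressed in percent."""
    return ((1 + driver_change_pct / 100) ** elasticity - 1) * 100

# Hypothetical 5% rise in urbanization, using the elasticity of 1.26 cited above.
linear = 5 * 1.26                               # first-order approximation
exact = constant_elasticity_change(5, 1.26)     # constant-elasticity result
print(f"linear approximation: {linear:.2f}%")
print(f"constant elasticity:  {exact:.2f}%")
```

For small changes the two results are nearly identical (6.3% versus about 6.34%). The gap widens as the assumed change in urbanization grows.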

The relatively modest 5.6% contribution from GDP is particularly interesting when contrasted with earlier studies that positioned economic growth as the primary driver of transport energy demand. Supasa et al. [7] found stronger correlations between economic indicators and energy consumption in their consumption-based analysis of Thailand, suggesting that the relationship between economic growth and transport energy consumption may have evolved over time. This evolution could reflect increasing energy efficiency in the transport sector, structural economic changes, or shifting consumer preferences toward less energy-intensive transportation modes as incomes rise.

The relatively moderate influence of GDP in the model may also indicate a decoupling effect between economic growth and transport energy consumption, which has been observed in other maturing economies. Rahman et al. [20] noted similar decoupling trends in their analysis of transportation energy demand in Saudi Arabia, attributing this to technological improvements and changing economic structures. This potential decoupling represents a positive development from an energy efficiency perspective and warrants further investigation in future research.
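A feature importance score does not by itself measure decoupling. A more direct check compares growth rates, using the ratio of the percentage change in transport energy to the percentage change in GDP over a period. The sketch below applies this check under simplifying assumptions. A ratio below 1 with rising energy indicates relative decoupling, and falling energy alongside rising GDP indicates absolute decoupling. The function names, thresholds, and index values are hypothetical, and more detailed classification schemes exist in the decoupling literature.

```python
def decoupling_elasticity(energy_start, energy_end, gdp_start, gdp_end):
    """Ratio of the percentage change in energy use to the percentage change in GDP."""
    energy_growth = (energy_end - energy_start) / energy_start
    gdp_growth = (gdp_end - gdp_start) / gdp_start
    return energy_growth / gdp_growth

def classify(energy_start, energy_end, gdp_start, gdp_end):
    """Simplified labels for a period in which GDP grows (illustrative thresholds)."""
    if gdp_end <= gdp_start:
        raise ValueError("this simplified scheme assumes GDP growth")
    if energy_end < energy_start:
        return "absolute decoupling"   # energy falls while GDP rises
    if decoupling_elasticity(energy_start, energy_end, gdp_start, gdp_end) < 1:
        return "relative decoupling"   # energy grows more slowly than GDP
    return "no decoupling"             # energy grows at least as fast as GDP

# Hypothetical index values (start = 100), not Thai data.
print(classify(100, 108, 100, 120))  # energy +8%, GDP +20%
print(classify(100, 95, 100, 120))   # energy -5%, GDP +20%
print(classify(100, 130, 100, 120))  # energy +30%, GDP +20%
```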

The interaction between economic factors and vehicle ownership patterns deserves particular attention. The results suggest that economic growth translates into transport energy consumption primarily through its effect on vehicle ownership patterns, especially for medium-sized vehicles. This mediated relationship helps explain why GDP shows a lower direct importance score despite its known influence on motorization rates. Zhang et al. [14] observed similar mediated relationships in their study of carbon emissions in expanding Chinese megacities, where economic growth influenced emissions primarily through changes in urban form and transportation patterns.
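The mediation argument can be illustrated with synthetic data. In the sketch below, GDP affects energy only through vehicle ownership. Both this causal chain and its coefficients are assumptions made up for illustration. Ordinary least squares on GDP alone attributes a large effect to it. Once the mediator is included, the direct GDP coefficient shrinks to roughly zero. Because the sketch uses linear regression rather than XGBoost, it illustrates the attribution mechanism, not the study's actual importance scores.

```python
import numpy as np

rng = np.random.default_rng(seed=0)
n = 2000

# Synthetic causal chain (assumed for illustration): GDP -> vehicles -> energy.
gdp = rng.normal(size=n)
vehicles = 0.9 * gdp + rng.normal(scale=0.5, size=n)     # mediator driven by GDP
energy = 1.0 * vehicles + rng.normal(scale=0.1, size=n)  # no direct GDP term

def ols_slopes(y, *regressors):
    """Ordinary least squares with an intercept; returns only the slope coefficients."""
    X = np.column_stack([np.ones_like(y), *regressors])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1:]

total_effect = ols_slopes(energy, gdp)[0]                     # roughly 0.9
direct_gdp, vehicle_coef = ols_slopes(energy, gdp, vehicles)  # roughly 0.0 and 1.0
print(f"GDP alone: {total_effect:.2f}")
print(f"GDP with vehicles included: {direct_gdp:.2f}, vehicles: {vehicle_coef:.2f}")
```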

Demographic factors’ relatively strong showing (9.6%) compared to GDP (5.6%) may also reflect Thailand’s transition to an aging society during the study period. Aging populations typically exhibit different mobility patterns than younger demographics, with potential implications for transport mode choices and trip frequencies [49,50]. This demographic transition may be exerting an influence on energy consumption patterns beyond what would be expected from population growth rates alone.

The multi-factorial influence of both economic and demographic variables on transport energy consumption underscores the need for integrated policy approaches. As emphasized by Champeecharoensuk et al. [17] in their analysis of aviation emissions in Thailand, addressing energy consumption challenges requires coordinated policies that account for both economic development objectives and demographic realities. The relatively modest contribution of these factors compared to vehicle-specific variables indicates that technological and behavioral interventions focused on the transport sector itself may yield more immediate impacts than broader economic or demographic policies.

These findings align with recent international research suggesting that transport energy consumption in developing economies follows complex, non-linear relationships with economic and demographic variables. Emami Javanmard et al. [28] observed similar complexity in their analysis of air transportation demand and emissions, highlighting the need for sophisticated modeling approaches that can capture these multidimensional relationships. The results suggest that XGBoost’s ability to model non-linear interactions makes it well suited to capturing these complex socioeconomic influences on transport energy consumption.

6. Conclusions

This research addressed the critical challenges facing Thailand’s transport sector in energy management and sustainable development. As the second-largest energy-consuming sector in Thailand, accounting for approximately 30% of total final energy consumption with a consistent upward trend over three decades, the transport sector presents significant challenges for energy security, environmental sustainability, and greenhouse gas emission reduction commitments. The complex nature of Thailand’s transport energy consumption, characterized by its unique vehicle fleet composition, geographical disparities in transportation needs, and the growing impact of e-commerce and logistics services, necessitated a sophisticated forecasting approach.

The literature review identified several key research gaps. While previous studies have incorporated vehicle-related variables, none had comprehensively analyzed the combined impact of differentiated vehicle categories alongside their corresponding vehicle kilometers traveled data. There was also limited exploration of the long-term relationship between macroeconomic indicators and transport energy consumption in Thailand, and insufficient understanding of how the relationship between vehicle registration numbers and actual usage influences energy consumption predictions. To address these gaps, this study established four main objectives:

to develop and compare the performance of LSTM and XGBoost models in predicting Thailand’s transport energy consumption,
to analyze the differential impact of various vehicle categories and their usage patterns on energy consumption,
to identify and quantify the relative importance of different factors affecting transport energy consumption, and
to provide evidence-based recommendations for transport energy policy planning.

The study leveraged a comprehensive dataset spanning 1993 to 2022, which included vehicle registration data categorized by size, vehicle kilometers traveled for different vehicle types, macroeconomic indicators, and historical transport energy consumption data. This extensive temporal coverage provided an opportunity to analyze long-term patterns that most existing studies do not address. This study implemented and compared two machine learning approaches: multivariate Long Short-Term Memory (LSTM) neural networks and the XGBoost algorithm. The models were evaluated using multiple metrics, including Mean Squared Error, Root Mean Squared Error, Mean Absolute Error, Mean Absolute Percentage Error, and R-squared values. The findings demonstrated that XGBoost consistently outperformed LSTM across all evaluation metrics, achieving an R-squared value of 0.9508 for test data compared to LSTM’s 0.2005. Feature importance analysis revealed that medium-sized vehicles had the most substantial influence (36.6%) on transport energy consumption, followed by truck VKT (20.5%), suggesting that policies targeting fuel efficiency standards and freight transport optimization could yield substantial energy savings.
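The five evaluation metrics named above can be computed as in the following minimal pure-Python sketch, which uses toy values. It follows the standard definitions, with R-squared computed as 1 minus the ratio of the residual sum of squares to the total sum of squares around the mean of the actual values. The study's own implementation is not described in this section.

```python
import math

def regression_metrics(actual, predicted):
    """MSE, RMSE, MAE, MAPE (%) and R-squared for paired observations."""
    n = len(actual)
    errors = [a - p for a, p in zip(actual, predicted)]
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined if any actual value is zero.
    mape = 100 * sum(abs(e / a) for e, a in zip(errors, actual)) / n
    mean_actual = sum(actual) / n
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)  # predict-the-mean baseline
    r2 = 1 - sum(e * e for e in errors) / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}

# Toy values (hypothetical, e.g. in MTOE), not the study's data.
actual = [20.0, 22.0, 24.0, 26.0]
predicted = [21.0, 21.0, 25.0, 25.0]
print(regression_metrics(actual, predicted))
```

If R-squared is computed this way, it compares a model against a predict-the-mean baseline on the same data. On that reading, LSTM's test value of 0.2005 corresponds to squared errors about 80% as large as the baseline's, versus about 5% for XGBoost's 0.9508.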

This research contributes to both academic understanding and practical policy implementation in several ways.

First, it establishes the superior performance of XGBoost for transport energy forecasting in Thailand’s context, challenging conventional assumptions about the superiority of recurrent neural networks for time-series forecasting, at least for relatively short national series such as the 1993–2022 data used here.
Second, it quantifies the differential impact of various vehicle categories and their usage patterns on energy consumption, providing a nuanced understanding of how fleet composition shapes transport energy demand.
Third, it demonstrates the value of integrating both vehicle registration and vehicle usage data in predictive models, offering enhanced insights compared to approaches based solely on one data category.
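Sample size is one plausible factor behind the LSTM result, offered here as an illustration rather than as the study's own analysis. Assuming one observation per year, 1993–2022 yields 30 data points. Turning them into supervised input windows for a recurrent model leaves even fewer training examples. The lookback length of 3 and the 80/20 chronological split below are hypothetical choices.

```python
def make_windows(series, lookback):
    """Turn a univariate series into (input window, next value) pairs."""
    return [(series[i:i + lookback], series[i + lookback])
            for i in range(len(series) - lookback)]

years = list(range(1993, 2023))            # 30 annual observations (assumed annual)
windows = make_windows(years, lookback=3)  # hypothetical lookback length
print(len(years), len(windows))            # 30 27
print(windows[0])                          # ([1993, 1994, 1995], 1996)

split = int(0.8 * len(windows))            # hypothetical 80/20 chronological split
train_set, test_set = windows[:split], windows[split:]
print(len(train_set), len(test_set))       # 21 6
```

Twenty-one training windows is a very small sample for fitting a recurrent network's weights, whereas a shallow tree ensemble has far fewer effective parameters to estimate.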

Moreover, while this study focuses on Thailand, its methodology and findings have broader implications for other developing and middle-income economies with similar transportation characteristics. The proposed machine learning framework can be adapted to national contexts where energy demand is shaped by comparable macroeconomic conditions, vehicle fleet compositions, and infrastructure developments. Further validation across multiple regions is necessary, incorporating localized adjustments for variations in transport systems, fuel composition, and regulatory frameworks.

The dataset used, including vehicle registration, vehicle kilometers traveled (VKT), macroeconomic indicators, and historical energy consumption, aligns with data structures maintained by transport and energy agencies worldwide. Given this consistency, the same modeling framework could be applied in other countries. However, the weak test performance of the LSTM in this study (R-squared of 0.2005) suggests that XGBoost is the more promising starting point for accurate forecasting. Feature importance analysis highlights medium-sized vehicles and freight transport as dominant contributors to energy demand in Thailand, a pattern that may also hold in other developing economies with similar fleet compositions. These findings suggest that machine learning-based forecasting approaches could be effective across diverse regional settings, subject to the regional validation noted above.

Beyond methodological contributions, this study presents a data-driven framework for transport energy policy planning. The predictive capability of these models supports policy evaluation in areas such as fuel efficiency regulations, vehicle electrification, and logistics optimization. Quantifying the impact of different vehicle categories on energy consumption allows policymakers to prioritize interventions such as fuel economy standards, freight transport efficiency improvements, and modal shifts from road to rail.

Given the global relevance of transport energy challenges, these models can inform international strategies for improving energy efficiency. Countries experiencing rapid urbanization and motorization could leverage these techniques to forecast fuel demand, optimize taxation policies, and design incentives for low-emission vehicles. Incorporating machine learning forecasting into multi-modal transport planning can also support sustainable urban mobility strategies.

Despite strong predictive performance, these models require adaptation to different national contexts due to variations in infrastructure, fuel types, and policy frameworks. Future research should validate their applicability in multiple regions, integrating localized variables and emerging trends such as autonomous mobility and renewable energy transitions. Refining these methodologies will enhance the role of machine learning in transport energy policy planning, supporting more effective and sustainable mobility solutions globally.

Based on the model results, this study offers the following evidence-based recommendations for transport energy policy planning in Thailand:

The feature importance analysis of the XGBoost model provides clear direction for prioritizing transport energy interventions. Given that medium-sized vehicles (passenger cars and pick-up trucks) account for 36.6% of the influence on transport energy consumption, policies targeting this vehicle segment should receive the highest priority. This study recommends implementing progressively stringent fuel economy standards for new medium vehicles, coupled with financial incentives for low-emission alternatives. Tax structures should be redesigned to favor more efficient vehicles within this category, potentially through taxation based on engine displacement or emissions.
The significant contribution of truck VKT (20.5%) indicates that freight transport efficiency represents a high-leverage intervention point. This study recommends developing a national freight efficiency program that includes driver training, logistics optimization, and aerodynamic retrofitting incentives. Additionally, strategic investment in rail freight infrastructure could yield substantial energy savings by shifting appropriate cargo from road to more efficient transport modes. Time-of-day delivery regulations in urban centers could further reduce congestion-related energy waste in freight operations.
For the combined 22.7% importance of small and large vehicle registrations, this study recommends differentiated policy approaches. For large vehicles, scrappage programs targeting the oldest, most inefficient models would accelerate fleet modernization. For small vehicles, particularly motorcycles, policies facilitating the transition to electric two-wheelers through purchase incentives and charging infrastructure would address their large collective impact, even though each vehicle is individually fuel-efficient.
The relatively modest direct influence of GDP (5.6%) suggests opportunities for decoupling economic growth from transport energy consumption. This study recommends incorporating transport energy considerations into economic development planning, particularly through transit-oriented development policies that reduce travel demand while maintaining economic growth. Similarly, the 9.6% contribution from population factors points to the importance of integrated land-use and transportation planning to manage mobility needs as Thailand’s demographics continue to evolve.
The performance advantage of XGBoost over LSTM in this study suggests that transportation authorities would benefit from implementing similar machine learning approaches for ongoing energy forecasting. This study recommends establishing a data integration framework that regularly updates the predictive models with current vehicle registration, usage patterns, and economic indicators to maintain forecast accuracy and adjust policies accordingly.
Finally, the complex interactions between vehicle categories, usage patterns, and macroeconomic factors identified in the model highlight the need for a coordinated, cross-sectoral approach to transport energy management. This study recommends establishing an inter-ministerial transport energy task force to ensure policy coherence across economic development, environmental, and transport sectors, with regular reassessment based on updated predictive modeling to track progress toward energy efficiency goals.
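The regularly updated forecasting framework recommended above could be evaluated with a rolling-origin (expanding-window) backtest. At each year, the model is refit on all data available up to that point, and its one-step-ahead forecast is scored. In this sketch a simple drift model stands in as a placeholder for XGBoost or any other forecaster. `drift_forecast` and `rolling_origin_mae` are hypothetical helper names.

```python
def drift_forecast(history):
    """Placeholder model: last value plus the average historical change."""
    if len(history) < 2:
        return history[-1]
    avg_change = (history[-1] - history[0]) / (len(history) - 1)
    return history[-1] + avg_change

def rolling_origin_mae(series, min_train, model=drift_forecast):
    """Refit at each new observation and score one-step-ahead forecasts (MAE)."""
    errors = []
    for t in range(min_train, len(series)):
        forecast = model(series[:t])  # uses only data available before period t
        errors.append(abs(series[t] - forecast))
    return sum(errors) / len(errors)

# Hypothetical annual series with a level shift in the final year.
series = [20.0, 21.0, 22.5, 23.0, 24.5, 25.0, 31.0]
print(round(rolling_origin_mae(series, min_train=3), 2))
```

Scoring only forecasts made from past data mirrors how an agency would use the model operationally. It also prevents future observations from leaking into training.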

This study is limited by its exclusive focus on historical patterns, without addressing potential technological disruptions in transportation.

Although it demonstrated superior predictive performance, the XGBoost model learns only from existing relationships between vehicles, usage, and energy consumption. Because a tree ensemble’s prediction is a sum of leaf values fitted to the training period, inputs beyond the range observed in training receive the same prediction as the most extreme observed values, so the model cannot extend historical trends. The approach may therefore fail to capture the impact of emerging technologies like electric vehicles, autonomous systems, and mobility-as-a-service platforms that could fundamentally transform transport-energy dynamics.
Future research should integrate technological transition scenarios into forecasting models through hybrid approaches that combine machine learning with system dynamics modeling. Incorporating detailed data on fuel types, powertrain technologies, and charging infrastructure would enhance the model’s sensitivity to ongoing energy transitions in transportation.
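A small hand-written booster shows the extrapolation limit of tree ensembles. This sketch fits plain squared-error gradient boosting with depth-1 trees (stumps) to a hypothetical, steadily rising series. It omits XGBoost's regularization and second-order split criterion. It does share the key property of tree boosters: every split threshold lies inside the training range, so any year after 2022 receives exactly the same prediction as 2022.

```python
def fit_stump(xs, residuals):
    """Best single split (threshold, left mean, right mean) under squared error."""
    best = None
    points = sorted(set(xs))
    for lo, hi in zip(points, points[1:]):
        threshold = (lo + hi) / 2  # thresholds always fall inside the training range
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean, right_mean = sum(left) / len(left), sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]

def fit_boosted_stumps(xs, ys, n_rounds=200, learning_rate=0.1):
    """Plain gradient boosting for squared error with depth-1 trees."""
    base = sum(ys) / len(ys)
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        residuals = [y - p for y, p in zip(ys, preds)]  # negative gradient of squared error
        threshold, left_value, right_value = fit_stump(xs, residuals)
        stumps.append((threshold, left_value, right_value))
        preds = [p + learning_rate * (left_value if x <= threshold else right_value)
                 for x, p in zip(xs, preds)]

    def predict(x):
        return base + sum(learning_rate * (lv if x <= t else rv) for t, lv, rv in stumps)

    return predict

# Hypothetical steadily rising series over 1993-2022 (not the study's data).
years = list(range(1993, 2023))
energy = [10 + 0.5 * (year - 1993) for year in years]
predict = fit_boosted_stumps(years, energy)
for year in (2022, 2030, 2040):
    print(year, round(predict(year), 2))
```

A model with an explicit trend component would continue the upward line instead. Which behavior is preferable depends on whether the historical trend is expected to hold.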

Author Contributions

Conceptualization, T.C., T.J., C.S., F.W. and S.J.; formal analysis, T.C., C.B. and F.W.; funding acquisition, S.J.; methodology, T.J. and S.J.; project administration, S.J.; software, C.B. and C.S.; supervision, V.R.; validation, T.J. and C.S.; visualization, F.W.; writing—original draft, T.C. and S.J.; writing—review and editing, V.R. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by Suranaree University of Technology (SUT), grant number IRD7-704-65-12-23.

Data Availability Statement

The data presented in this study are available on request from the corresponding author.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:

LSTM	Long Short-Term Memory
XGBoost	Extreme Gradient Boosting
VKT	Vehicle Kilometers Traveled
GDP	Gross Domestic Product
MTOE	Million Tons of Oil Equivalent

Appendix A

Table A1. Additional information about previous studies.

Authors	Model Performance Metrics	Dataset Characteristics	Study Limitations
Ji et al. [13]	- Models I and II evaluated using R² and RMSE - Six measures assess prediction accuracy - Smaller MABE indicates more reliable results	- GDP per capita and population data - Automobile kilometers traveled per year - Energy consumption and CO₂ emissions in transportation - Data spans 2009–2022	- Machine learning models rely on specific assumptions and simplifications - Additional influential factors not considered - Findings may not apply to other countries or regions - Projections based on historical data and present patterns - Unanticipated occurrences may alter emission and energy usage trends
Hoxha et al. [16]	- Stacking ensemble models outperform single algorithms - Using all features leads to better performance than two features - Best ensemble model achieved R² of approximately 0.99 - Hyperparameter tuning essential for model performance	- GDP, population, vehicle-km, passenger-km, ton-km, oil price - Data spans 1975–2019 (33 observations) - Sources include General Directorate of Turkish Highways - Focus on transportation energy demand in Turkey	- Simplifying assumptions may affect modeling accuracy - Performance may vary across different datasets or regions - Additional data sources could enhance model performance - Potential biases in dataset may exist
Çınarer et al. [15]	- ANN model: R² of 0.58, RMSE of 20.8, MAE of 14.4 - XGBoost performed best in scenario 4 with high R² values	- Transportation energy consumption, vehicle kilometers, population, GDP, CO₂ emissions - Data spans 1970–2016 - Input parameters: VK, GDP, POP, ENERGY; output: CO₂ emissions - Sources: World Bank and Turkish General Directorate of Highways	- Only three AI algorithms applied - Five input parameters assumed linear relationships - Other influencing factors not considered - Scenarios based on limited correlation effects
Ağbulut [2]	- R² values: 0.8639 to 0.9235 across algorithms - RMSE below 5 × 10⁶ tons for CO₂ emissions - Model II shows better results in testing	- Dataset spans 1970–2016 - Includes GDP, population, vehicle kilometers - CO₂ emissions and energy consumption data - Data sourced from World Bank and Turkish Statistical Institute	- No specific limitations mentioned in the paper
Henriques et al. [25]	- Gradient boosting robust for benchmarking performance - AHC recall rate: 64% for high-profile dwellings - AHC precision for low-profile dwellings: 92% - K-means clustering outperforms other methods	- Monthly consumption data for 383 households - 12 months of consumption data in kWh - 373 rows after preprocessing	- Defining true positives/negatives challenging in unlabeled datasets - Misclassification errors can lead to incorrect billing and energy management decisions - Need for context-specific adaptations in clustering methods
Champeecharoensuk et al. [17]	- Performance not explicitly quantified in the paper - Model validation through comparison of estimated and actual values	- Data from 39 airports in Thailand - 638 registered airplanes with Thai Nationality Mark - Data spans 2015–2020 for GHG emissions estimation - Key factors include energy consumption, fuel costs, and GDP	- Study limited to GHG emissions data from 2008 to 2020 - Lacks specific aircraft details and flight timing information - Additional factors influencing GHG emissions remain unexamined - Data availability constraints hinder accurate GHG emissions accounting
Zhang et al. [14]	- XGBoost shows better applicability and accuracy - RMSE as low as 0.036 - MLP neural networks outperformed regression models	- Four Chinese megacities: Beijing, Tianjin, Shanghai, Chongqing - Data spans 2003–2017 - Energy supply data for carbon emissions from fossil fuel combustion - Energy consumption data from China Energy Statistical Yearbook	- Further exploration using other methods and data needed - Study primarily focuses on megacities, limiting broader applicability - Single-factor analysis may overlook multifactor synergies in carbon emissions
Rahman et al. [20]	- SVR outperformed ANN in energy demand prediction - Correlation coefficients: 0.8932 (SVR) and 0.9925 (ANN) - Satisfactory performance on training and testing datasets - Performance indices: RMSE, MAE, MAPE, CC, and IA	- Time-series data from 1996 to 2017 - Variables include GDP, fuel price, and urban population - Data used for training and testing machine learning models	- No specific limitations addressed in the paper
Lin et al. [8]	- Optimal energy efficiencies observed for rail and water transport in 2011 - Future energy consumption estimates using extended DEA model	- Energy consumption data from IEA - TKM and PKM data from NBS of China - Future capacity data from ERI’s report on Low Carbon Development	- Energy consumption apportionment to passenger or freight transport is challenging - Data from Taiwan, Hong Kong, and Macau excluded due to unavailability - Assumptions affect accuracy of efficiency assessments
Pongthanaisawan et al. [1]	- Model performance verified through comparison of estimated and actual fuel consumption - MBE and RMSE calculated - Most models accurately estimate total fuel consumption	- Data collected from official sources for 1989–2007 - Dependent and independent variables for fuel demand forecasting models	- Biofuel supply limitations constrain long-term GHG emissions reduction potential. - Rail transport infrastructure fully utilized, limiting fuel demand response. - Econometric models struggle with sudden fuel demand changes due to oil price shocks.
Supasa et al. [7]	- Not explicitly quantified in the paper	- 2000 and 2010 Thailand IO tables - Physical energy use data from Thailand’s Energy Situation Annual Reports - Non-competitive imports assumption adopted	- Least energy-intensive sectors neglected in energy conservation policies - Energy policies should reflect consumption characteristics for better interventions
Shams Amiri et al. [21]	- Random Forest outperforms Neural Network - RMSE, MAE, and R² metrics used - Cross-validation ensures model accuracy	- 9235 households and 20,216 residents (2015) - Travel behaviors from Pennsylvania and New Jersey counties - Household Travel Survey (HTS) as primary dataset - Additional zonal boundaries and neighborhood data	- Study focuses only on household demographics and travel characteristics - Other factors like vehicle type and road conditions excluded

References

Pongthanaisawan, J.; Sorapipatana, C. Greenhouse gas emissions from Thailand’s transport sector: Trends and mitigation options. Appl. Energy 2013, 101, 288–298. [Google Scholar]
Ağbulut, Ü. Forecasting of transportation-related energy demand and CO₂ emissions in Turkey with different machine learning algorithms. Sustain. Prod. Consum. 2022, 29, 141–157. [Google Scholar] [CrossRef]
Satiennam, T.; Jaensirisak, S.; Satiennam, W.; Detdamrong, S. Potential for modal shift by passenger car and motorcycle users towards Bus Rapid Transit (BRT) in an Asian developing city. IATSS Res. 2016, 39, 121–129. [Google Scholar]
Department of Land Transport. Statistics of Vehicle Registration. Available online: https://web.dlt.go.th/statistics/ (accessed on 17 October 2023).
Corlu, C.G.; de la Torre, R.; Serrano-Hernandez, A.; Juan, A.A.; Faulin, J. Optimizing energy consumption in transportation: Literature review, insights, and research opportunities. Energies 2020, 13, 1115. [Google Scholar] [CrossRef]
Chindaprasirt, P.; Klungboonkrong, P.; Jaensirisak, S.; Faiboun, N.; Long, S.; Tippichai, A.; Taylor, M.A. Integrated Urban Transport and Land-Use Policies in Reducing CO₂ Emissions and Energy Consumption: Case Study of a Medium-Sized City in Thailand. World Electr. Veh. J. 2024, 15, 349. [Google Scholar] [CrossRef]
Supasa, T.; Hsiau, S.-S.; Lin, S.-M.; Wongsapai, W.; Chang, K.-F.; Wu, J.-C. Sustainable energy and CO₂ reduction policy in Thailand: An input–output approach from production- and consumption-based perspectives. Energy Sustain. Dev. 2017, 41, 36–48. [Google Scholar]
Lin, W.; Chen, B.; Xie, L.; Pan, H. Estimating Energy Consumption of Transport Modes in China Using DEA. Sustainability 2015, 7, 4225–4239. [Google Scholar] [CrossRef]
Sun, L.; Zhang, T.; Liu, S.; Wang, K.; Rogers, T.; Yao, L.; Zhao, P. Reducing energy consumption and pollution in the urban transportation sector: A review of policies and regulations in Beijing. J. Clean. Product. 2021, 285, 125339. [Google Scholar]
Morrow, W.R.; Gallagher, K.S.; Collantes, G.; Lee, H. Analysis of policies to reduce oil consumption and greenhouse-gas emissions from the US transportation sector. Energy Policy 2010, 38, 1305–1320. [Google Scholar]
Karim, F.; Majumdar, S.; Darabi, H.; Harford, S. Multivariate LSTM-FCNs for time series classification. Neural Netw. 2019, 116, 237–245. [Google Scholar]
Duan, G.; Su, Y.; Fu, J. Landslide Displacement Prediction Based on Multivariate LSTM Model. Int. J. Environ. Res. Public Health 2023, 20, 1167. [Google Scholar] [CrossRef] [PubMed]
Ji, T.; Li, K.; Sun, Q.; Duan, Z. Urban transport emission prediction analysis through machine learning and deep learning techniques. Transp. Res. Part D 2024, 135, 104389. [Google Scholar] [CrossRef]
Zhang, J.; Zhang, H.; Wang, R.; Zhang, M.; Huang, Y.; Hu, J.; Peng, J. Measuring the Critical Influence Factors for Predicting Carbon Dioxide Emissions of Expanding Megacities by XGBoost. Atmosphere 2022, 13, 599. [Google Scholar] [CrossRef]
Çınarer, G.; Yeşilyurt, M.K.; Ağbulut, Ü.; Yılbaşı, Z.; Kılıç, K. Application of various machine learning algorithms in view of predicting the CO₂ emissions in the transportation sector. Sci. Technol. Energy Transit. 2024, 79, 15. [Google Scholar] [CrossRef]
Hoxha, J.; Çodur, M.Y.; Mustafaraj, E.; Kanj, H.; El Masri, A. Prediction of transportation energy demand in Türkiye using stacking ensemble models: Methodology and comparative analysis. Appl. Energy 2023, 350, 121765. [Google Scholar] [CrossRef]
Champeecharoensuk, A.; Dhakal, S.; Chollacoop, N.; Phdungsilp, A. Greenhouse gas emissions trends and drivers insights from the domestic aviation in Thailand. Heliyon 2024, 10, e24206. [Google Scholar] [CrossRef]
Zhang, R.; Fujimori, S. The role of transport electrification in global climate change mitigation scenarios. Environ. Res. Lett. 2020, 15, 034019. [Google Scholar] [CrossRef]
Ghanbari, R.; Borna, K. Multivariate Time-Series Prediction Using LSTM Neural Networks. In Proceedings of the 2021 26th International Computer Conference, Computer Society of Iran (CSICC), Tehran, Iran, 3–4 March 2021. [Google Scholar]
Rahman, M.M.; Rahman, S.M.; Shafiullah, M.; Hasan, M.A.; Gazder, U.; Al Mamun, A.; Mansoor, U.; Kashifi, M.T.; Reshi, O.; Arifuzzaman, M.; et al. Energy Demand of the Road Transport Sector of Saudi Arabia—Application of a Causality-Based Machine Learning Model to Ensure Sustainable Environment. Sustainability 2022, 14, 16064. [Google Scholar] [CrossRef]
Shams Amiri, S.; Mostafavi, N.; Lee, E.R.; Hoque, S. Machine learning approaches for predicting household transportation energy use. City Environ. Interact. 2020, 7, 100044. [Google Scholar] [CrossRef]
Mohsin, M.; Abbas, Q.; Zhang, J.; Ikram, M.; Iqbal, N. Integrated effect of energy consumption, economic development, and population growth on CO(2) based environmental degradation: A case of transport sector. Environ. Sci. Pollut. Res. Int. 2019, 26, 32824–32835. [Google Scholar] [CrossRef]
Limanond, T.; Jomnonkwao, S.; Srikaew, A. Projection of future transport energy demand of Thailand. Energy Policy 2011, 39, 2754–2763. [Google Scholar]
Chai, J.; Lu, Q.-Y.; Wang, S.-Y.; Lai, K.K. Analysis of road transportation energy consumption demand in China. Transp. Res. Part D 2016, 48, 112–124. [Google Scholar] [CrossRef]
Henriques, L.; Castro, C.; Prata, F.; Leiva, V.; Venegas, R. Modeling Residential Energy Consumption Patterns with Machine Learning Methods Based on a Case Study in Brazil. Mathematics 2024, 12, 1961. [Google Scholar] [CrossRef]
Antonopoulos, I.; Robu, V.; Couraud, B.; Kirli, D.; Norbu, S.; Kiprakis, A.; Flynn, D.; Elizondo-Gonzalez, S.; Wattam, S. Artificial intelligence and machine learning approaches to energy demand-side response: A systematic review. Renew. Sustain. Energy Rev. 2020, 130, 109899. [Google Scholar] [CrossRef]
Alabi, T.M.; Aghimien, E.I.; Agbajor, F.D.; Yang, Z.; Lu, L.; Adeoye, A.R.; Gopaluni, B. A review on the integrated optimization techniques and machine learning approaches for modeling, prediction, and decision making on integrated energy systems. Renew. Energy 2022, 194, 822–849. [Google Scholar]
Emami Javanmard, M.; Tang, Y.; Martínez-Hernández, J.A. Forecasting air transportation demand and its impacts on energy consumption and emission. Appl. Energy 2024, 364, 123031. [Google Scholar]
Wang, S.; Ji, B.; Zhao, J.; Liu, W.; Xu, T. Predicting ship fuel consumption based on LASSO regression. Transp. Res. Part D 2018, 65, 817–824. [Google Scholar]
Bülte, C.; Kleinebrahm, M.; Yilmaz, H.Ü.; Gómez-Romero, J. Multivariate time series imputation for energy data using neural networks. Energy AI 2023, 13, 100239. [Google Scholar]
Chen, T.; Guestrin, C. Xgboost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
Xu, Y.; Zhao, X.; Chen, Y.; Yang, Z. Research on a mixed gas classification algorithm based on extreme random tree. Appl. Sci. 2019, 9, 1728. [Google Scholar] [CrossRef]
Sun-Youn, S.; Sun-Youn, S.; Han-Gyun, W.; Han-Gyun, W. Energy Consumption Forecasting in Korea Using Machine Learning Algorithms. Energies 2022, 15, 4880. [Google Scholar] [CrossRef]
Emmanuel Hidalgo, G.; Jacqueline, G.; Matthew, J.B.; Boriboonsomsin, K. Comparative Assessment of Machine Learning Techniques for Modeling Energy Consumption of Heavy-Duty Battery Electric Trucks. In Proceedings of the 2024 Forum for Innovative Sustainable Transportation Systems (FISTS), Riverside, CA, USA, 26–28 February 2024.
Cortez, J.C.; Terada, L.Z.; Bandeira, B.V.B.; Soares, J.; Vale, Z.; Rider, M.J. Comparative Analysis of ARIMA, LSTM, and XGBoost for Very Short-Term Photovoltaic Forecasting. In Proceedings of the 2023 15th Seminar on Power Electronics and Control (SEPOC), Santa Maria, Brazil, 22–25 October 2023; IEEE: Piscataway, NJ, USA, 2023; pp. 1–6.
Al-Selwi, S.M.; Hassan, M.F.; Abdulkadir, S.J.; Muneer, A.; Sumiea, E.H.; Alqushaibi, A.; Ragab, M.G. RNN-LSTM: From applications to modeling techniques and beyond—Systematic review. J. King Saud Univ. 2024, 36, 102068.
Alsahaf, A.; Petkov, N.; Shenoy, V.; Azzopardi, G. A framework for feature selection through boosting. Expert Syst. Appl. 2022, 187, 115895.
Wang, W.; Shi, W.; Nan, D.; Peng, Y.; Wang, Q.; Zhu, Y. New energy output prediction and demand response optimization based on LSTM-BN. Int. J. Renew. Energy Develop. 2025, 14, 72–82.
Karim, F.K.; Khafaga, D.S.; El-kenawy, E.-S.M.; Eid, M.M.; Ibrahim, A.; Abualigah, L.; Khodadadi, N.; Abdelhamid, A.A. Optimized LSTM for Accurate Smart Grid Stability Prediction Using a Novel Optimization Algorithm. Front. Energy Res. 2024, 12, 1399464.
El-Naggar, N.; Madhyastha, P.; Weyde, T. Exploring the long-term generalization of counting behavior in RNNs. arXiv 2022, arXiv:2211.16429.
Goel, S.; Bajpai, R. Impact of uncertainty in the input variables and model parameters on predictions of a long short term memory (LSTM) based sales forecasting model. Mach. Learn. Knowl. Extr. 2020, 2, 14.
Kandadi, T.; Shankarlingam, G. Drawbacks of LSTM Algorithm: A Case Study. SSRN 2025.
Lazcano, A.; Hidalgo, P.; Sandubete, J.E. Walking Back the Data Quantity Assumption to Improve Time Series Prediction in Deep Learning. Appl. Sci. 2024, 14, 11081.
Dahiru, A.B.; Mohammed, S. Forecasting United Kingdom’s energy consumption using machine learning and hybrid approaches. Energy Environ. 2022, 35, 1493–1531.
Hamed, A.; Behzad Rashidi, M.; Arian, R.; Hossein, K.; Mohsen Asghari, I. Comparative study of long short-term memory (LSTM), bidirectional LSTM, and traditional machine learning approaches for energy consumption prediction. Energy Explor. Exploit. 2024, 43, 281–301.
Hamidreza, E.; Hassan, S.; Muhammad, R.; Mobina, M. Innovative framework for accurate and transparent forecasting of energy consumption: A fusion of feature selection and interpretable machine learning. Appl. Energy 2024, 366, 123314.
Ahmad, T.; Zhang, D. A critical review of comparative global historical energy consumption and future demand: The story told so far. Energy Rep. 2020, 6, 1973–1991.
McKinnon, A.C.; Ge, Y. The potential for reducing empty running by trucks: A retrospective analysis. Int. J. Phys. Distrib. Logist. Manag. 2006, 36, 391–410.
De Vos, J.; Alemi, F. Are young adults car-loving urbanites? Comparing young and older adults’ residential location choice, travel behavior and attitudes. Transp. Res. Part A 2020, 132, 986–998.
Champahom, T.; Jomnonkwao, S.; Nambulee, W.; Klungboonkrong, P.; Karoonsoontawong, A.; Ratanavaraha, V. Analyzing transport mode choice for aging society in Thailand. Eng. Appl. Sci. Res. 2020, 47, 383–392.

Figure 1. Multivariate Long Short-Term Memory (LSTM).

Figure 2. Transport energy consumption, GDP, and population trends. Note: MTOE denotes million tons of oil equivalent.

Figure 3. Vehicle registration trends by category and transport energy consumption.

Figure 4. Vehicle kilometers traveled (VKT) patterns and transport energy consumption.

Figure 5. Correlation matrix heatmap.

Figure 6. Transport energy consumption prediction using LSTM (1993–2022).

Figure 7. Transport energy consumption prediction using XGBoost (1993–2022).

Figure 8. Feature importance analysis for transport energy consumption prediction.

Table 1. Summary of the literature review on transport energy consumption prediction.

Authors	Country	Variables	Analysis Methods	Key Findings
Ji et al. [13]	China	Transport energy consumption, GDP, Population, Vehicle kilometers, Car ownership	LSTM, Deep Learning, SVM	Deep Learning outperformed other methods, predicting a 3.66% annual increase; strong correlation between economic factors and CO₂ emissions
Hoxha et al. [16]	Turkey	Energy consumption, Transport modes, Infrastructure data, Vehicle kilometers	XGBoost, Stacking ensemble models	Stacking ensemble models achieved an R² of 0.99; vehicle kilometers and GDP were key predictors
Çınarer et al. [15]	Turkey	Transport sector data, Economic indicators, Vehicle kilometers	XGBoost, SVM, MLP	XGBoost achieved the highest accuracy in CO₂ prediction; Scenario 4 yielded the best results across algorithms
Ağbulut [2]	Turkey	Transport emissions, Energy efficiency, Economic metrics	ANN, SVM, Deep Learning	SVM and ANN algorithms excelled in forecasting; a 3.4-fold increase predicted by 2050
Henriques et al. [25]	Brazil	Public transport data, Energy usage, Consumption patterns	K-means clustering, Machine learning	K-means clustering outperformed other methods; effective classification of consumption patterns
Champeecharoensuk et al. [17]	Thailand	Transport energy use, GHG emissions, Aircraft activity	Statistical analysis, Trend modeling	GHG emissions tripled from 2008 to 2019; economic growth was the key driver of emissions
Zhang et al. [14]	China	Energy consumption, Urban expansion, Carbon emissions	XGBoost, Statistical methods	XGBoost achieved a low RMSE of 0.036; population and GDP were key drivers of emissions
Rahman et al. [20]	Saudi Arabia	Energy consumption, Vehicle types, Economic metrics	SVR, ANN, Econometric analysis	SVR outperformed ANN in predictions; GDP and urbanization were significant factors
Lin et al. [8]	China	Transport energy, Economic data, Modal shares	DEA, Statistical analysis	Rail was the most efficient mode; 443,126 ktce projected for 2020
Pongthanaisawan et al. [1]	Thailand	Energy use, Economic data, Fuel consumption	Econometric models, Statistical analysis	Fossil fuel consumption projected to reach 70,783 ktoe by 2030; biofuels could reduce emissions by 1.8%
Supasa et al. [7]	Thailand	Energy consumption, Economic indicators, CO₂ emissions	Input–output analysis, Consumption-based approach	Construction and commercial sectors were significant CO₂ emitters; consumption-based policies recommended
Shams Amiri et al. [21]	USA	Household transport energy, Trip characteristics, Socioeconomic factors	Neural Networks, Random Forest, Decision Trees	Neural network model outperformed decision trees; motorized trips and distance were key predictors

Table 2. Explanation of dataset.

Variables	Unit	Explanation	Sources
Transport Energy Consumption	(MTOE)	Energy consumption in the transport sector	Energy Policy and Planning Office (EPPO), Ministry of Energy, Thailand
GDP	(10¹² Baht)	Gross Domestic Product	Bank of Thailand
Population	(10⁶ people)	Total population, determined on a de facto basis: all individuals residing in a given area, regardless of citizenship or legal status	World Bank
Registered-vehicle Small	(10⁶ vehicles)	Number of small registered vehicles (including motorcycles and mopeds)	Department of Land Transport
Registered-vehicle Medium	(10⁶ vehicles)	Number of medium registered vehicles (including passenger cars, pick-up trucks, and vans)	Department of Land Transport
Registered-vehicle Large	(10⁶ vehicles)	Number of large registered vehicles (including buses, trucks, and trailers)	Department of Land Transport
VKT Motorcycle	(10⁶ vehicle kilometers)	Vehicle kilometers traveled by motorcycles	Bureau of Highway Safety, Department of Highways, Thailand
VKT Passenger	(10⁶ vehicle kilometers)	Vehicle kilometers traveled by passenger vehicles (passenger cars carrying fewer than seven persons, passenger cars carrying more than seven persons, light buses, medium buses, and heavy buses)	Bureau of Highway Safety, Department of Highways, Thailand
VKT Truck	(10⁶ vehicle kilometers)	Vehicle kilometers traveled by trucks (light trucks, medium trucks, heavy trucks, full trailers, and semi-trailers)	Bureau of Highway Safety, Department of Highways, Thailand

Table 3. Historical Data.

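Year	Transport Energy Consumption (MTOE)	GDP (10¹² Baht)	Population (10⁶ people)	Registered-Vehicle Small (10⁶ vehicles)	Registered-Vehicle Medium (10⁶ vehicles)	Registered-Vehicle Large (10⁶ vehicles)	VKT ^a (10⁶ vehicle kilometers)	VKT ^b (10⁶ vehicle kilometers)	VKT ^c (10⁶ vehicle kilometers)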
1993	14.581	4.341	57.776	7.313	3.030	0.533	7.313	3.138	0.426
1994	15.420	4.688	58.610	8.303	3.484	0.562	8.303	3.595	0.451
1995	17.903	5.069	59.425	9.363	3.898	0.611	9.363	4.010	0.499
1996	18.984	5.355	60.211	10.764	4.423	0.681	10.764	4.537	0.567
1997	20.253	5.208	60.989	11.700	5.003	0.728	11.700	5.118	0.613
1998	18.075	4.810	61.745	12.514	5.379	0.741	12.514	5.499	0.621
1999	18.297	5.030	62.443	13.298	5.823	0.731	13.298	5.941	0.613
2000	18.022	5.254	63.067	13.868	5.952	0.775	13.868	6.074	0.653
2001	18.632	5.435	63.650	15.285	6.287	0.804	15.285	6.418	0.674
2002	19.636	5.770	64.223	16.631	6.868	0.823	16.631	7.002	0.690
2003	20.927	6.184	64.777	18.262	7.116	0.809	18.262	7.247	0.678
2004	22.812	6.573	65.311	13.232	6.454	0.810	13.232	6.579	0.685
2005	23.491	6.849	65.821	14.574	7.022	0.842	14.574	7.148	0.716
2006	22.985	7.189	66.320	15.803	7.973	0.849	15.803	8.103	0.719
2007	23.615	7.580	66.827	16.143	8.402	0.880	16.143	8.535	0.748
2008	23.024	7.710	67.328	16.449	8.833	0.906	16.449	8.967	0.772
2009	24.132	7.657	67.814	16.729	9.258	0.926	16.729	9.393	0.791
2010	24.594	8.232	68.270	17.323	9.888	0.955	17.323	10.026	0.817
2011	25.480	8.302	68.713	18.175	10.652	0.990	18.175	10.789	0.853
2012	26.230	8.903	69.157	19.169	11.828	1.037	19.169	11.967	0.898
2013	26.943	9.142	69.579	19.987	13.024	1.104	19.987	13.165	0.963
2014	26.801	9.232	69.961	20.328	13.794	1.153	20.328	13.940	1.008
2015	28.501	9.521	70.294	20.519	14.421	1.185	20.519	14.575	1.031
2016	30.190	9.849	70.607	20.497	15.004	1.214	20.497	15.162	1.056
2017	32.319	10.260	70.898	20.718	15.697	1.250	20.718	15.857	1.090
2018	33.086	10.693	71.128	21.100	16.498	1.286	21.100	16.662	1.122
2019	33.607	10.919	71.308	21.425	17.281	1.315	21.425	17.447	1.150
2020	29.699	10.259	71.476	21.588	17.851	1.326	21.588	18.003	1.174
2021	27.460	10.420	71.601	21.864	18.374	1.343	21.864	18.515	1.202
2022	30.927	10.676	71.697	22.301	18.970	1.359	22.301	19.103	1.226
Min	14.581	4.341	57.776	7.313	3.030	0.533	7.313	3.138	0.426
Max	33.607	10.919	71.697	22.301	18.970	1.359	22.301	19.103	1.226
M	23.888	7.570	66.368	16.508	9.950	0.951	16.508	10.084	0.817
SD	5.234	2.133	4.238	4.212	4.876	0.244	4.212	4.889	0.230

Note: M denotes mean, SD denotes standard deviation, and VKT denotes vehicle kilometers traveled. ^a VKT motorcycle, ^b VKT passenger, and ^c VKT truck.
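The summary rows of Table 3 can be reproduced directly from the yearly values. The sketch below does this for the transport energy column using Python's standard `statistics` module, computing the standard deviation both with divisor n (population) and with divisor n − 1 (sample). For this column, the reported SD of 5.234 agrees with the population formula; the sample formula gives approximately 5.324. The `summarize` helper is a hypothetical name introduced here for illustration and can be applied to any other column.

```python
import statistics

# Transport energy consumption (MTOE), 1993-2022, copied from Table 3
ENERGY_MTOE = [
    14.581, 15.420, 17.903, 18.984, 20.253, 18.075, 18.297, 18.022,  # 1993-2000
    18.632, 19.636, 20.927, 22.812, 23.491, 22.985, 23.615, 23.024,  # 2001-2008
    24.132, 24.594, 25.480, 26.230, 26.943, 26.801, 28.501, 30.190,  # 2009-2016
    32.319, 33.086, 33.607, 29.699, 27.460, 30.927,                  # 2017-2022
]


def summarize(values):
    """Return min, max, mean, and both population and sample standard deviations."""
    return {
        "min": min(values),
        "max": max(values),
        "mean": statistics.fmean(values),
        "sd_population": statistics.pstdev(values),  # divides by n
        "sd_sample": statistics.stdev(values),       # divides by n - 1
    }


if __name__ == "__main__":
    for name, value in summarize(ENERGY_MTOE).items():
        print(f"{name:>14}: {value:.3f}")
```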

Table 4. Optimized LSTM hyperparameters.

Hyperparameter	Optimized Value
units_1 (LSTM Layer 1)	32
dropout_1	0.4
units_2 (LSTM Layer 2)	96
dropout_2	0.3
dense units (Dense Layer)	64
Epochs	17
Initial Epoch	0
Tuner Bracket	1
Tuner Round	0
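Table 4 lists layer widths but not the input width or the output layer. To indicate the model's size, the sketch below counts trainable parameters for a stack of LSTM(32) → Dropout(0.4) → LSTM(96) → Dropout(0.3) → Dense(64) → output. It relies on three assumptions not stated in the table: the layers are stacked in this order, the eight explanatory variables of Table 2 form the input features, and the output is a single-unit dense layer. The LSTM formula follows the Keras convention of one bias vector per gate (PyTorch's LSTM keeps two bias vectors, so its counts differ), and dropout layers add no parameters. Under these assumptions, the network has 61,057 trainable weights, a large number relative to the 30 annual observations in Table 3.

```python
# Hypothetical input width: the eight explanatory variables listed in Table 2
N_FEATURES = 8


def lstm_params(input_dim, units):
    """Trainable weights in one Keras-style LSTM layer.

    Each of the four gates has an input kernel (input_dim x units),
    a recurrent kernel (units x units), and one bias vector (units).
    """
    return 4 * (units * (input_dim + units) + units)


def dense_params(input_dim, units):
    """Weights plus biases of a fully connected layer."""
    return input_dim * units + units


def stacked_lstm_params(n_features, units_1=32, units_2=96, dense_units=64, n_outputs=1):
    """Layer-by-layer parameter count for the assumed Table 4 layout."""
    layers = {
        "lstm_1": lstm_params(n_features, units_1),
        "lstm_2": lstm_params(units_1, units_2),
        "dense": dense_params(units_2, dense_units),
        "output": dense_params(dense_units, n_outputs),  # assumed single-unit output
    }
    layers["total"] = sum(layers.values())  # dropout layers contribute nothing
    return layers


if __name__ == "__main__":
    for name, count in stacked_lstm_params(N_FEATURES).items():
        print(f"{name:>7}: {count:,}")
```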

Table 5. Optimized XGBoost hyperparameters.

Hyperparameter	Optimized Value
colsample_bytree	0.7159
gamma	1.4561
learning_rate	0.1874
max_depth	3
min_child_weight	3
n_estimators	429
reg_alpha	4.5606
reg_lambda	7.8517
subsample	0.5998
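Several entries in Table 5 act on the split-finding step of gradient boosting. The sketch below follows the structure-score and split-gain formulas of Chen and Guestrin's XGBoost paper (listed in the references above) for a single candidate split, using the Table 5 values of reg_lambda (λ), gamma (γ), min_child_weight, and learning_rate. It is a simplified illustration, not the xgboost library's code: reg_alpha (the L1 term), row and column subsampling, and the library's split-search machinery are omitted, and the six residuals are hypothetical numbers chosen for the example.

```python
# Regularization values copied from Table 5
REG_LAMBDA = 7.8517     # L2 penalty on leaf weights (lambda)
GAMMA = 1.4561          # minimum loss reduction required to keep a split (gamma)
MIN_CHILD_WEIGHT = 3    # minimum hessian sum allowed in each child
LEARNING_RATE = 0.1874  # shrinkage applied to each new tree's leaf weights


def leaf_weight(grads, hessians, lam=REG_LAMBDA):
    """Optimal leaf weight w* = -G / (H + lambda)."""
    return -sum(grads) / (sum(hessians) + lam)


def structure_score(g_sum, h_sum, lam=REG_LAMBDA):
    """Regularized node score G^2 / (H + lambda)."""
    return g_sum * g_sum / (h_sum + lam)


def split_gain(left, right, lam=REG_LAMBDA, gamma=GAMMA,
               min_child_weight=MIN_CHILD_WEIGHT):
    """Gain of splitting a node into `left` and `right`, each a list of (g, h) pairs.

    Returns None when either child violates min_child_weight.
    """
    g_l, h_l = sum(g for g, _ in left), sum(h for _, h in left)
    g_r, h_r = sum(g for g, _ in right), sum(h for _, h in right)
    if h_l < min_child_weight or h_r < min_child_weight:
        return None
    return 0.5 * (structure_score(g_l, h_l, lam)
                  + structure_score(g_r, h_r, lam)
                  - structure_score(g_l + g_r, h_l + h_r, lam)) - gamma


if __name__ == "__main__":
    # Squared-error loss: g = prediction - target and h = 1 for every sample.
    left = [(-2.0, 1.0), (-1.5, 1.0), (-2.5, 1.0)]   # hypothetical residuals
    right = [(1.0, 1.0), (2.0, 1.0), (1.5, 1.0)]
    gain = split_gain(left, right)
    w_left = LEARNING_RATE * leaf_weight([g for g, _ in left], [h for _, h in left])
    print(f"gain = {gain:.4f}, shrunken left-leaf weight = {w_left:.4f}")
```

With squared-error loss, every hessian equals 1, so min_child_weight = 3 requires at least three annual observations in each leaf; combined with max_depth = 3, each tree has at most eight leaves.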

Table 6. Comparison of LSTM and XGBoost model performance.

Metric	LSTM (Train)	LSTM (Test)	XGBoost (Train)	XGBoost (Test)
Mean Squared Error (MSE)	4.9473	4.5695	0.0256	0.2790
Root Mean Squared Error (RMSE)	2.2243	2.1377	0.1599	0.5282
Mean Absolute Error (MAE)	1.9142	1.8057	0.0815	0.3200
Mean Absolute Percentage Error (MAPE)	9.36%	6.06%	0.49%	1.08%
R-squared (R²)	0.3140	0.2005	0.9976	0.9508
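Table 6 reports five standard error metrics. The sketch below gives their textbook definitions in plain Python, taking errors as prediction minus observation and computing R² as one minus the ratio of the residual sum of squares to the total sum of squares about the observed mean. The paper does not state its exact implementation, so this is a reference sketch under those conventions rather than the authors' code; the `evaluate` function and the four-point example are hypothetical. As a consistency check, the RMSE values in Table 6 equal the square roots of the corresponding MSE values to within rounding.

```python
import math


def evaluate(y_true, y_pred):
    """Return MSE, RMSE, MAE, MAPE (%), and R-squared for paired sequences."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]  # prediction minus observation
    mse = sum(e * e for e in errors) / n
    mae = sum(abs(e) for e in errors) / n
    # MAPE is undefined for zero observations; transport energy is always positive
    mape = 100.0 * sum(abs(e) / abs(t) for e, t in zip(errors, y_true)) / n
    mean_obs = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_obs) ** 2 for t in y_true)
    r2 = 1.0 - ss_res / ss_tot
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "MAPE": mape, "R2": r2}


# (MSE, RMSE) pairs reported in Table 6
TABLE6_MSE_RMSE = {
    "LSTM train": (4.9473, 2.2243),
    "LSTM test": (4.5695, 2.1377),
    "XGBoost train": (0.0256, 0.1599),
    "XGBoost test": (0.2790, 0.5282),
}

if __name__ == "__main__":
    print(evaluate([10.0, 20.0, 30.0, 40.0], [12.0, 18.0, 33.0, 37.0]))
    for name, (mse, rmse) in TABLE6_MSE_RMSE.items():
        print(f"{name:>13}: sqrt(MSE) = {math.sqrt(mse):.4f}, reported RMSE = {rmse:.4f}")
```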

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).

