Enhancing Accuracy in Hourly Passenger Flow Forecasting for Urban Transit Using TBATS Boosting

Patel, Madhuri; Patel, Samir B.; Swain, Debabrata; Mallagundla, Rishikesh

doi:10.3390/modelling6020032

by Madhuri Patel ¹,²,*, Samir B. Patel ¹, Debabrata Swain ¹ and Rishikesh Mallagundla ³

¹ Department of Computer Engineering, Pandit Deendayal Energy University, Gandhinagar 382007, India
² Department of Information Technology, L. D. College of Engineering, Ahmedabad 380015, India
³ Department of Computer Engineering and Technology, Chaitanya Bharathi Institute of Technology, Hyderabad 500075, India
* Author to whom correspondence should be addressed.

Modelling 2025, 6(2), 32; https://doi.org/10.3390/modelling6020032

Submission received: 17 March 2025 / Revised: 4 April 2025 / Accepted: 10 April 2025 / Published: 17 April 2025

Abstract: Passenger flow forecasting is crucial for optimizing urban transit operations, especially in developing countries such as India, where congestion, infrastructure constraints, and diverse commuter behaviors pose significant challenges. Despite its importance, limited research has explored forecasting models for Indian urban transit systems, particularly models that incorporate the effects of holidays and of disruptions caused by the COVID-19 pandemic. To address this gap, we propose TBATS Boosting, a novel hybrid forecasting model that integrates the statistical strengths of TBATS (trigonometric seasonality, Box–Cox transformation, ARMA errors, trend, and seasonal components) with the predictive power of LightGBM. The model is trained on a five-year real-world dataset from e-ticketing machines (ETM) in Thane Municipal Transport (TMT), incorporating holiday and pandemic-related variations. While Route 12 serves as the primary evaluation route, different station pairs are analyzed to validate the model's scalability across varying passenger demand levels. To comprehensively evaluate the proposed framework, a rigorous performance assessment was conducted using MAE, RMSE, MAPE, and WMAPE across station pairs characterized by heterogeneous passenger flow patterns. Empirical results demonstrate that the TBATS Boosting approach consistently outperforms benchmark models, including standalone SARIMA, TBATS, XGBoost, and LightGBM. By effectively capturing complex temporal dependencies, multiple seasonalities, and nonlinear relationships, the proposed framework significantly enhances forecasting accuracy. These advancements provide transit authorities with a robust tool for optimizing resource allocation, improving service reliability, and enabling data-driven decision making across varied and dynamic urban transit environments.

Keywords: urban transit system; passenger flow forecasting; machine learning; hybrid forecasting model; time series forecasting

1. Introduction

Forecasting passenger flow helps city transport systems improve service quality, allocate resources efficiently, minimize waiting times, and manage congestion [1,2,3,4]. Many studies have applied machine learning, deep learning, and hybrid models in developed countries, but the distinct conditions of developing countries such as India remain underexplored. These conditions include different passenger behavior, infrastructure constraints, limited data availability, and limited use of technology, while external factors such as holidays and weather add further complexity to forecasting [5,6]. Accurate and reliable forecasting in this context is essential for sustainable transport operations [7].

To capture the effect of specific events, such as regional holidays that follow the lunar calendar, and the recurring behavior of weekdays and weekends, a model must be trained on a sufficiently large dataset [8]. In addition to these temporal dynamics, the urban transportation system was significantly disrupted during the COVID-19 pandemic due to various regulatory constraints. These included complete lockdowns, social distancing mandates, reduced passenger capacity, and restricted operational hours, all of which severely impacted public transport services. Multiple studies focusing on the studied region indicate that, in the post-pandemic phase, passenger commuting behavior required a substantial period to recover to pre-pandemic levels [9]. The study by Padmakumar et al. further highlights that public transportation usage declined sharply, by approximately 70% to 90%, during the peak of the pandemic, including lockdown periods [10]. Subsequently, commuter preferences shifted more permanently: a report by TERI India reveals that 35% of commuters altered their primary mode of travel following the pandemic [11]. While similar studies on post-pandemic travel behavior have been conducted for various global cities, findings suggest that the patterns emerging in Indian cities do not align precisely with international trends [12,13]. As a result, examining the post-pandemic travel dynamics specific to Indian urban environments becomes essential for developing accurate and context-sensitive forecasting models. Extensive work on city transit systems in developed countries has been reported, yet Indian urban transport remains underexplored. Its distinct post-pandemic travel patterns and demand, together with the effects of lunar-calendar cultural holidays, set it apart from the settings covered by reported studies. The available studies on similar problems for India cover conventional temporal features such as periodicity and trends [14,15], with some consideration of external features such as weather and holidays [16,17]. It should be noted that some of these features strongly affect commuting.

The following are key contributions of the presented research:

This study identifies the gap between existing research and the operational challenges of urban public transport systems in the Indian subcontinent, and addresses it by developing a hybrid model for hourly passenger flow forecasting.
The study reports results for Route 12 of the Thane Municipal Transport system, which operates from Thane Station (West) to Pawar Nagar. This route is among the most heavily used and serves as a benchmark because its commuting behavior is typical of the region. The approach has been tested on the entire network, including this route, which provides insight to the civic authorities for further planning.
Hourly passenger flow forecasting has been carried out at the level of individual station pairs using a range of models, including SARIMA, TBATS, XGBoost, LightGBM, and the proposed TBATS Boosting model. TBATS, which integrates trigonometric seasonality, Box–Cox transformation, ARMA errors, and trend components, has been applied to capture the underlying statistical patterns in the data. The hybrid TBATS Boosting model enhances this framework by incorporating LightGBM for residual learning, enabling the capture of nonlinear dependencies and external influences. To the best of our knowledge, this is the first study from the Indian subcontinent that focuses on forecasting hourly passenger flow between city transit station pairs, explicitly incorporating regional holidays, a large historical dataset of hourly records, and the effects of the COVID-19 pandemic.
The forecasting results from all models have been compared with actual passenger flow data for the same period. Performance has been rigorously evaluated under diverse operational conditions, including weekdays, different time slots, and holidays, across station pairs with high, moderate, and low passenger volumes. The evaluation has been conducted using standard metrics such as MAE, RMSE, MAPE, WMAPE, and MREP, offering a comprehensive assessment of forecasting accuracy in realistic transit scenarios.
The proposed hybrid model demonstrates superior ability to capture dynamic passenger flow patterns compared to standalone frameworks. It offers actionable insights for transport authorities to optimize scheduling, improve resource allocation, and reduce passenger waiting times. The model is highly applicable in real-world operational settings, particularly in contexts affected by regional holidays, fluctuating demand, special events, and unexpected disruptions such as pandemics.

The paper is structured as follows: Section 2 presents an extensive literature review, critically examining existing methodologies and identifying key gaps. Section 3 provides a comprehensive dataset description, including exploratory data analysis and an assessment of feature importance. Section 4 details the methodological framework, outlining the modeling process and algorithmic design. Section 5 evaluates the performance of the proposed model across varying passenger flow scenarios using MAE, RMSE, and MREP, with a comparative analysis of standalone TBATS and the hybrid TBATS Boosting approach. Finally, Section 6 synthesizes the key findings, explores their practical implications, and suggests future directions for enhancing predictive capabilities in urban transit systems.

2. Literature Review

The availability of machine learning and deep learning models has greatly improved passenger flow prediction and forecasting. Early studies relied on historical average models for traffic flow prediction and achieved limited accuracy because passenger flow data lack adequate stability and periodicity [18,19,20]. Later, several researchers used the Kalman filter, the particle filter, and their improved variants, based on behavioral analysis, to forecast passenger flow and traffic flow [21,22,23,24,25,26]. A few works have also used popular statistical models, including autoregressive, moving average, and autoregressive moving average models. ARIMA provides accurate results for non-stationary time series with trends but no seasonality [27,28]. SARIMA handles seasonality, such as high passenger flow during peak hours or on specific days of the week, while SARIMAX suits seasonal data influenced by external factors [29,30,31,32]. Statistical models, however, handle categorical variables poorly, particularly on large-scale datasets. Machine learning and deep learning algorithms offer a good prospect for improving passenger demand forecasting. Liu et al. applied a modular CNN-based model with decision tree features to large passenger flows [33]. Li et al. used support vector machines (SVM) for multi-station subway passenger flow and demonstrated the stability of the prediction model [34]. Zhang et al. proposed a deep learning architecture that combines a residual network (ResNet), a graph convolutional network (GCN), and long short-term memory (LSTM) [35]. Several researchers have also predicted short-term passenger flow with boosting models. Zixian Xu et al. reported a multi-feature gradient boosting model on a real passenger flow dataset [36]. Liu et al. [37] used a random forest model with features such as flow direction, holidays, lunar calendars, and previous commute history. Various ensemble tree models subsequently improved forecasting accuracy. Zhou et al. [38] combined singular spectrum analysis (SSA) with the AdaBoost-weighted extreme learning machine (AWELM). Recently, boosting models have demonstrated highly effective results for intelligent transportation system (ITS) applications, particularly in traffic and passenger flow forecasting. The HI-XGB model, which integrates extreme gradient boosting (XGB) with SHAP values, demonstrated superior accuracy in urban traffic speed prediction by capturing complex spatiotemporal relationships, including the influence of non-adjacent links [39]. In parallel, a deep multimodal model leveraging vehicle detection systems achieved more than 90% accuracy with consideration of features such as peak hours and weekends, highlighting its robustness and suitability for real-time applications [40].

Hybrid models can capture both the linear and non-linear behavior of passenger flow, and recent studies show significant interest in them. Guo et al. proposed a hybrid of support vector regression and LSTM to capture unusual passenger flow behavior [41]. Other time series and statistical hybrid models have also been reported with promising results [42,43,44,45]. Pan et al. proposed a hybrid FD–Markov–LSTM model that captures residual patterns using LSTM after initial Markov modeling, aligning with the boosting concept of sequential error correction. By integrating traffic conditions into the framework, the model significantly outperformed traditional methods, demonstrating its effectiveness for advanced forecasting applications [46]. City passenger flow was severely affected during the COVID-19 pandemic, and post-pandemic commuting behavior has deviated significantly from earlier patterns. Several studies help to characterize the relationship between COVID-19 case data and passenger flow [47,48,49,50].

Studies pertaining to Indian cities are listed in Table 1, which summarizes the features considered in city transport passenger flow studies. Despite advancements and the inclusion of various requisite features, a significant gap persists in Indian transit research. City transport behaves differently from intercity transport services, and route optimization for a city system requires high accuracy and insight into patterns. This is possible when passenger flow is forecast at the shortest practical time slots over a sufficiently long time span. Limited consideration has been given to categorical features such as time slots, to calendar variables such as holidays, and to disruptive events and external conditions such as the COVID-19 pandemic and weather.

Yun et al. [51] proposed a hybrid model for rail transit systems that captures long-term trends through LSTM, performs dynamic regression through KNN clustering, and integrates the weights through LightGBM. Zhang et al. [52] reported enhanced passenger volume forecasting by integrating LightGBM with variational mode decomposition (VMD). VMD reduces noise by reconstructing the primary signal modes, and tree-structured Parzen estimator (TPE) optimization tunes the model for optimal performance. As shown in Table 1, traditional statistical models such as SARIMA and Holt–Winters have been used for short-term passenger flow forecasting in Indian cities. Hybrid models, combining LSTM with RNNs and SM-LSTM models, have been implemented mainly for intercity daily passenger flow analysis. Perone et al. used time series forecasting models such as ARIMA, ETS, NNAR, and TBATS and predicted a rise in various categories of COVID-19-affected patients in Italy for the period after October 2020 [53]. Similarly, TBATS has been reported to be more accurate than ARMA for hand, foot, and mouth disease (HFMD) in China on a dataset from 2009 to 2019 [54]. Thayyib et al. [55] forecasted monthly GST revenue with linear and non-linear features by comparing ARIMA with TBATS, ANN, NNAR, and hybrid models. The TBATS Boosting model captures linear and non-linear dynamics effectively and provides the most accurate result. The key advancements of the current study over recently reported studies are summarized in Table 2. The work reported by Halyal et al. [5] and Liu et al. [31] mainly focused on daily and hourly forecasting without considering the effect of the pandemic. Zhang et al. [35] account for the impact of the pandemic only partially and do not integrate external features. Perone et al. [53] forecast COVID-19 cases with a standalone TBATS model but did not consider external features such as lockdowns or other constraints. The proposed study addresses these limitations through a hybrid TBATS Boosting model that fully integrates pandemic data, incorporates a wide range of holiday types, and captures cyclic time features and weekday patterns, effectively handling multi-seasonality and non-linearity for robust and adaptable urban transit forecasting.

Table 1. Various passenger flow related studies reported for Indian cities.

Reported Studies	Study Area	Data	Weather	COVID-19	Holiday	Method
Shivaraj Halyal et al. [5]	Intercity (Hubballi-Dharwad BRTS)	December 2019–February 2020 (91 days)	×	×	×	LSTM, SARIMA
Shanthappa et al. [14]	Intercity (Within Udupi City)	January 2022–December 2022 (365 days)	✓	×	×	RPTW-LSTM
Nagaraj et al. [56]	Karnataka State Road Transport Corporation (KSRTC)—Various Regions	KSRTC dataset	×	✓	×	LSTM, RNN, and greedy layer-wise algorithm
Thandassery et al. [57]	Kochi Metro Rail System (KMRL), Kerala, India	AFC data from 2017 to 2019	×	×	×	Station Memorizing LSTM
Cyril et al. [58]	Trivandrum City, Kerala, India	ETM data KSRTC, 2011–2013	×	×	×	Holt–Winters’ additive and multiplicative models
Cyril et al. [59]	Inter-district travel in Kerala (Trivandrum to five districts)	ETM data KSRTC, 2010–2013	×	×	×	ARIMA
Gummadi et al. [60]	Macherla route, Andhra Pradesh, India	Transit data from Macherla route for April 2016–December 2016	×	×	×	ARIMA, artificial neural network (ANN)

× indicates that corresponding feature is not included, ✓ indicates that corresponding feature is included.

Table 2. Comparative analysis of recent studies with similar approaches and their benchmarking against the present research.

Study	Geographical Focus	Models	External Features	Pandemic Impact	Forecasting Granularity	Key Limitations
Shivaraj Halyal et al. [5]	Intercity (Hubballi-Dharwad BRTS)	LSTM, SARIMA	Weather, weekday/weekend	×	Daily passenger flow	No pandemic data; limited temporal features; intercity transport
Zhang et al. [35]	Intercity, China	RNN, GCN, LSTM	Holidays, weather	✓ (partial)	Hourly (Metro systems)	Limited to metro; no multi-seasonality handling
Perone et al. [53]	Italy	ARIMA, TBATS	COVID-19 cases (basic integration)	✓	Monthly forecasting	Focus on disease data; lacks external variability
Liu et al. [31]	Multi-city, China	CNN, decision trees	Passenger history, Time of day, lunar cycles	×	Hourly passenger flow	No pandemic context; overfitting in deep learning models
Proposed Study	Thane City, India	TBATS + LightGBM (hybrid)	Holidays (27 types), COVID-19 cases, cyclic time features (sin_time, cos_time), weekday patterns	✓ (full integration)	Hourly forecasting for diverse station pairs (high, moderate, low)	-

× indicates that corresponding feature is not included, ✓ indicates that corresponding feature is included.

To effectively address the challenges of non-linearity and trend variations in passenger flow forecasting, this study proposes a hybrid TBATS Boosting model tailored for the TMT system. Unlike conventional approaches, this model integrates regional lunar-based holidays and accounts for pandemic-induced shifts in travel behavior, ensuring a more adaptive and resilient forecasting framework.

3. Data Description and Analysis

3.1. Study Area and Data Source

Thane city is located in the Mumbai Metropolitan Region (MMR), Maharashtra, India, and is known for its rapidly growing residential and industrial areas. It connects Mumbai with neighboring regions. The urban transit system serving the Thane region, shown in Figure 1, is known as Thane Municipal Transport (TMT).

The TMT operates 158 routes, including special routes, with 460 buses and completes approximately 80,000 trips per month. It issues an average of 32 lakh (3.2 million) tickets per month, which highlights its critical role in urban mobility. Figure 1 presents the TMT network, indicating its routes with details of commuters between various places. The visualization was created using Kepler.gl, an open-source geospatial analysis tool [61].

3.2. Data Description

Hourly passenger flow forecasting has been carried out using a real-world dataset obtained from electronic ticketing machines (ETMs) for the period from January 2019 to March 2024. A total of 1,830,954 passenger ticket records were retrieved, each with ticket information such as issuance date and time, route number, and boarding and alighting stations. For discussion purposes, this study focuses on Route 12, operating between Thane Station West and Pawar Nagar. This route has one of the highest ticketing volumes and is operationally significant in the TMT network. Figure 2 shows the map of Route 12 with important bus stops and its connectivity. Place names are given in the regional language (Hindi) and in English.

3.3. Data Pre-Processing and Exploratory Data Analysis (EDA)

Preprocessing and analysis are essential for producing a proper time series dataset and identifying the features on which the time series models depend. The magnitude and attributes of the available passenger records are described above. To obtain a reliable input dataset, the pre-processing steps were designed to transform these records into a structured format for time series modeling.

The stepwise pre-processing is achieved in five steps as follows:

Mapping and Cleaning:
- Prepared a reference dictionary to convert station names from Marathi to English.
- Assigned a unique number to each station and ensured that station names and numbers are unique.
- Eliminated extraneous features such as Sourceid, Destinationid, RouteNumber, and others.
Outlier Removal:
- Handled outliers caused by unforeseen events, identified by analyzing station-wise, day-wise ticket counts.
Feature Engineering:
- Generated a Station_Pair feature by concatenating source and destination station names.
- Identified temporal features for time series analysis, viz. month, weekday, and date and time of travel.
- Added a binary Is_Holiday feature and a categorical Holiday_Type feature (27 distinct categories) to capture holiday effects.
- Introduced cyclic time features, sin_time and cos_time, for the hour of day, and a Weekday_Timeslot feature to capture behavioral variations.
Handling Missing Data:
- Restored temporal consistency by inserting missing time slots with zero passenger counts.
Integration of Pandemic Data:
- Incorporated COVID-19 indicators, including daily confirmed, deceased, and recovered cases, to capture pandemic-related impacts on passenger flow.

This concise yet comprehensive procedure ensures that the dataset reflects the temporal, pandemic, and holiday features required for effective hybrid modeling. The final list of dataset features is provided in Table 3.
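The gap-filling and feature-engineering steps listed above can be sketched in a few lines. The following is only an illustrative sketch using the Python standard library, not the pipeline used in the study. The feature names Station_Pair, Is_Holiday, Holiday_Type, sin_time, cos_time, and Weekday_Timeslot come from the text; the ticket record layout, the function names, the "_to_" separator, the 24-hour period of the cyclic encoding, the Weekday_Timeslot string format, and the sample holiday label are all assumptions made for the example.

```python
import math
from collections import Counter
from datetime import date, datetime, timedelta

DAY_NAMES = ("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")

def hourly_counts(tickets, start, end):
    """Count tickets per station pair and hour over [start, end).

    Hours with no tickets get an explicit zero so the series has no gaps.
    `tickets` is an iterable of (timestamp, source, destination) tuples
    (an assumed record layout).
    """
    counts = Counter()
    for issued_at, source, destination in tickets:
        pair = f"{source}_to_{destination}"  # Station_Pair (separator assumed)
        hour = issued_at.replace(minute=0, second=0, microsecond=0)
        counts[(pair, hour)] += 1

    rows = []
    for pair in sorted({p for p, _ in counts}):
        hour = start
        while hour < end:
            rows.append({"Station_Pair": pair, "hour": hour,
                         "passengers": counts[(pair, hour)]})
            hour += timedelta(hours=1)
    return rows

def add_time_features(row, holidays):
    """Attach cyclic, weekday, and holiday features to one hourly row.

    `holidays` maps a date to a holiday-type label (hypothetical input).
    """
    hour = row["hour"]
    angle = 2 * math.pi * hour.hour / 24  # 24-hour cycle is an assumption
    row["sin_time"] = math.sin(angle)
    row["cos_time"] = math.cos(angle)
    row["Weekday_Timeslot"] = f"{DAY_NAMES[hour.weekday()]}_{hour.hour:02d}"
    holiday_type = holidays.get(hour.date())
    row["Is_Holiday"] = int(holiday_type is not None)
    row["Holiday_Type"] = holiday_type or "None"
    return row

# Toy input: three tickets on one morning, with a gap at 09:00.
tickets = [
    (datetime(2023, 1, 26, 8, 15), "Thane_Station_West", "Pawar_Nagar"),
    (datetime(2023, 1, 26, 8, 40), "Thane_Station_West", "Pawar_Nagar"),
    (datetime(2023, 1, 26, 10, 5), "Thane_Station_West", "Pawar_Nagar"),
]
holidays = {date(2023, 1, 26): "Republic_Day"}  # illustrative label only
rows = [add_time_features(row, holidays)
        for row in hourly_counts(tickets, datetime(2023, 1, 26, 8),
                                 datetime(2023, 1, 26, 11))]
for row in rows:
    print(row["hour"], row["passengers"], row["Weekday_Timeslot"])
```

Encoding the hour as a sine and cosine pair keeps 23:00 and 00:00 close together in feature space, which a plain integer hour would not do.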

3.4. Exploratory Data Analysis (EDA)

Exploratory data analysis (EDA) was carried out to examine the core patterns and the features that give insight into the factors affecting passenger flow. Because time series models are sensitive to their inputs, this step helps to establish the temporal trends and external effects, such as peak and off-peak travel periods, weekday and holiday effects, and the major disruptions during the COVID-19 period. Together, these analyses give actionable insight into city transport passenger demand and mobility patterns, allowing the creation of robust feature sets for demand forecasting and transit system optimization.

To assess the applicability of the models across scenarios, the analysis is carried out on different origin–destination (OD) pairs representing high, moderate, and low passenger volumes. A first analysis identified such station pairs. For reporting purposes in this article, the following station pairs have been selected:

Thane_Station_West to Pawar_Nagar, classified as a high-passenger-flow pair, usually carried over 2000 daily passengers in the pre-pandemic period, with a maximum of 4000 passengers in the studied range. Hourly counts reach approximately 400 passengers during peak hours, indicating the pair's importance in the TMT network.
Thane_Station_West to Voltas_Gate, classified as a moderate-passenger-flow pair, carries a daily passenger flow of 1000 to 3000 under normal conditions. The observed hourly demand is approximately 250 passengers during peak hours. This pair strikes a balance between high demand and localized transport.
Pawar_Nagar to Civil_Court, classified as a low-demand station pair, normally has a daily demand of fewer than 1000 passengers, with a highest reported count of 800 and hourly peaks of around 80 passengers during rush hours, indicating localized demand.

Each of these station pairs represents the OD pairs with a similar passenger volume. Figure 3 shows daily passenger flow graphs based on the entire dataset from January 2019 to March 2024 and hourly graphs for January 2023. The daily graphs give a comprehensive view of long-term trends, and the hourly graphs make short-term patterns easier to see. All pairs show a significant drop in passenger flow during early 2020 and 2021, coinciding with the COVID-19 lockdowns, followed by a gradual but incomplete recovery in subsequent years. Seasonal and temporal variations are visible across all categories of station pairs, with weekday peaks and lower ridership on weekends and holidays, which highlights the importance of accounting for temporal features, multiple seasonality, and external impacts in transit forecasting.

Regional holidays do not affect passenger flow uniformly; therefore, it is essential to categorize holidays by their impact on passenger commutes. The heatmap in Figure 4 relates each holiday to the number of passengers traveling on that day and highlights significant variations in transit demand. Some festivals, such as Mahavir Jayanti, Guru Nanak Jayanti, and Independence Day, have higher passenger counts, consistent with the highest average passenger count (942) reported during non-holidays, most likely because cultural or public events encourage mobility. In contrast, demand falls significantly during holidays such as Holi, Ambedkar Jayanti, and Diwali, mainly because these holidays are more home-centric or locally celebrated. This analysis shows that both attributes are needed: the binary holiday feature (holiday vs. non-holiday) and the categorical holiday type, which enables models to capture the impact of individual holidays. For forecasting commutes in countries such as India, which have many cultural and religious holidays, these features are crucial for route scheduling and resource allocation.

Figure 5 shows the hourly passenger count over a week and reveals significant variation in commuting behavior. Weekdays have a dual-peak pattern, with sharp demand in the morning from 8 to 10 a.m. and in the evening from 5 to 7 p.m., corresponding to work and school-related travel, and a minor dip between the peaks. In contrast, Sunday shows a flat trend with a significantly lower passenger count, reflecting lower transit demand due to recreational or non-commuter activities. Saturday ridership does not match that of the other working days but lies just below it. This indicates that the model should include temporal features such as sine time and cosine time to capture daily cyclic patterns, as well as the weekday timeslot, which accounts for distinct behavioral differences between weekdays and weekends.

The COVID-19 pandemic significantly impacted city transport systems around the world. Passenger commuting was restricted for various reasons, such as reported case counts and lockdown restrictions. Figure 6 shows the timeline of reported COVID-19 cases, deceased cases, and recovered cases, together with the lockdown phases in the studied region. The strict lockdown phase in 2020 led to minimal mobility, while the second wave during April–June 2021 caused another sharp decline in transit activity due to surging cases. During this time, buses were mainly restricted to operating at constrained capacity. The effect of COVID-19 on passenger count is clearly visible in Figure 3. Although the effect of the pandemic is gradually fading, passenger commuting has not returned to the pre-pandemic level, indicating a change in travel behavior.

The correspondence between the passenger flow in Figure 3 and the COVID-19 statistics in Figure 6 supports the inclusion of COVID-19-related features, such as confirmed, recovered, and deceased cases, in forecasting models.

4. Methodology

Passenger commute analysis reveals several distinct patterns even within a single route. It is therefore essential to capture multiple seasonalities and trends as well as sudden changes in passenger flow caused by the COVID-19 pandemic and holidays. For this reason, the study forecasts passenger flow with a hybrid model that combines the statistically based TBATS model with the machine learning-based LightGBM model, aiming at high accuracy and robustness to outliers. The two models complement each other: TBATS captures temporal dependencies such as multiple seasonality and trend, and LightGBM models the residuals that TBATS leaves behind, which arise from non-linearity. The final passenger count is forecast by adding the TBATS predictions to the corresponding LightGBM-modeled residuals.
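The additive structure described above (base forecast plus modeled residual) can be illustrated with a small sketch. To keep the example runnable without third-party libraries, two deliberately simple stand-ins are used: `HourlyMeanBase` takes the place of TBATS and `ResidualByFeature` takes the place of LightGBM. Both classes, the function names, and the toy data are hypothetical and are not the models or data used in the study; only the fit-base, fit-residuals, add-predictions sequence reflects the text.

```python
from collections import defaultdict
from statistics import mean

class HourlyMeanBase:
    """Stand-in for TBATS: predicts the mean count seen at each hour of day."""

    def fit(self, hours, y):
        by_hour = defaultdict(list)
        for h, value in zip(hours, y):
            by_hour[h % 24].append(value)
        self.profile = {h: mean(values) for h, values in by_hour.items()}
        self.default = mean(y)  # fallback for hours never seen in training
        return self

    def predict(self, hours):
        return [self.profile.get(h % 24, self.default) for h in hours]

class ResidualByFeature:
    """Stand-in for LightGBM: predicts the mean residual per feature value."""

    def fit(self, features, residuals):
        groups = defaultdict(list)
        for f, r in zip(features, residuals):
            groups[f].append(r)
        self.table = {f: mean(values) for f, values in groups.items()}
        return self

    def predict(self, features):
        return [self.table.get(f, 0.0) for f in features]

def fit_hybrid(hours, features, y):
    """Fit the base model, then fit the second model on the base residuals."""
    base = HourlyMeanBase().fit(hours, y)
    residuals = [obs - fitted for obs, fitted in zip(y, base.predict(hours))]
    booster = ResidualByFeature().fit(features, residuals)
    return base, booster

def predict_hybrid(base, booster, hours, features):
    """Final forecast = base prediction + predicted residual."""
    return [b + r for b, r in
            zip(base.predict(hours), booster.predict(features))]

# Toy data: hours 8 and 9 on a normal day, then the same hours on a holiday.
hours = [8, 9, 32, 33]       # hour index since the start of the series
is_holiday = [0, 0, 1, 1]    # external feature seen only by the residual model
y = [100, 80, 60, 40]

base, booster = fit_hybrid(hours, is_holiday, y)
print(predict_hybrid(base, booster, [8, 9], [1, 1]))  # holiday forecast
print(predict_hybrid(base, booster, [8, 9], [0, 0]))  # normal-day forecast
```

In this toy case the base model averages over both days, and the residual model recovers the holiday effect that the base model cannot see, which is the division of labor the hybrid design relies on.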

4.1. TBATS

TBATS requires fewer parameters during model building and can handle complex datasets with multiple seasonal patterns and long-term trends, as observed in this dataset, which makes it more appropriate for forecasting than conventional statistical models such as ARIMA, SARIMA, and BATS. The ARMA model is limited to stationary data with no seasonal pattern, whereas TBATS accommodates non-stationary data and models intricate seasonal dependencies. The flowchart in Figure 7 represents the TBATS model for time series forecasting of passenger counts, highlighting key components such as the Box–Cox transformation, seasonal adjustments, and trend modeling. The process culminates in the ARMA error model, which refines the final forecast.

SARIMA extends ARIMA to handle seasonal variation; however, it is constrained to a single seasonality. TBATS overcomes this limitation by capturing multiple seasonalities through trigonometric terms and stabilizing variance using a Box–Cox transformation. Compared to its predecessor, BATS, TBATS requires fewer parameters for forecasting. This allows TBATS to handle multiple seasonal patterns and evolving trends efficiently.

4.1.1. Box–Cox Transformation

The Box–Cox transformation stabilizes the variance of a dataset and brings its distribution closer to normal, a property that many forecasting models rely on. This transformation is especially important when working with time series data that are highly nonlinear and heteroscedastic. The transformation is defined as:

Y_{t}^{(λ)} = \begin{cases} \frac{Y_{t}^{λ} - 1}{λ}, & λ \neq 0 \\ \log (Y_{t}), & λ = 0 \end{cases}

(1)

where $Y_{t}$ represents the original data and $λ$ is a parameter that adjusts the transformation.
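
As a minimal illustration of Equation (1), the following Python sketch applies the Box–Cox transformation to a short list of positive counts. The function name `box_cox` and the sample values are hypothetical and chosen only for demonstration; in practice, λ is estimated during model fitting rather than fixed by hand.

```python
import math

def box_cox(values, lam):
    """Apply the Box-Cox transformation of Equation (1) to positive values."""
    if any(v <= 0 for v in values):
        raise ValueError("Box-Cox requires strictly positive values")
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]

# Hypothetical hourly passenger counts (illustrative only).
counts = [12.0, 85.0, 240.0, 60.0]
log_scaled = box_cox(counts, 0.0)     # logarithmic case (lambda = 0)
sqrt_like = box_cox(counts, 0.5)      # square-root-like compression
identity_like = box_cox(counts, 1.0)  # lambda = 1 only shifts the data by -1
```

Because the transformation is defined only for positive values, hours with zero passengers would need an offset before it is applied; the sketch raises an error in that case instead of choosing an offset.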

4.1.2. Trend Component

The trend captures the long-term movement in the volume of passengers over time. To limit the impact of the slope over time, TBATS incorporates a damped trend, which ensures that the trend evolves smoothly while preventing extreme decline or growth, making it suitable for real-world forecasting scenarios. The trend can be modeled as follows:

μ_{t} = l_{t - 1} + φ b_{t - 1} + ϵ_{t}

(2)

where $φ$ denotes a damping factor, $l_{t}$ is the level, $b_{t}$ is the slope, and $ϵ_{t}$ is the error term.
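
The effect of the damping factor can be seen by iterating Equation (2) without the error term. The sketch below assumes the usual damped-trend recursion, in which the slope is multiplied by φ at every step; the function name and the numerical values are hypothetical.

```python
def damped_trend_path(level, slope, phi, horizon):
    """Project the level forward, shrinking the slope by phi at every step."""
    path = []
    for _ in range(horizon):
        slope = phi * slope    # damp the slope
        level = level + slope  # noise-free version of Equation (2)
        path.append(level)
    return path

# Hypothetical starting level of 100 passengers and slope of 5 per hour.
undamped = damped_trend_path(100.0, 5.0, 1.0, 48)  # phi = 1: linear growth
damped = damped_trend_path(100.0, 5.0, 0.9, 48)    # phi < 1: growth levels off
```

With φ = 1 the projection keeps growing linearly, whereas with φ = 0.9 it approaches the finite limit level + slope · φ / (1 − φ), which is 145 for these values.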

4.1.3. Seasonal Component

Seasonality in time series forecasting refers to recurring patterns in data that occur at regular time intervals. TBATS handles multiple and non-integer seasonality using trigonometric Fourier-based seasonality represented as follows:

s_{t} = \sum_{j = 1}^{J} \left[ γ_{j1} \sin \left( \frac{2 π j t}{T} \right) + γ_{j2} \cos \left( \frac{2 π j t}{T} \right) \right]

(3)

where T is the seasonal period, J is the number of harmonics, and $γ_{j1}$, $γ_{j2}$ are coefficients. This formulation enables the model to handle complex and overlapping seasonal patterns, which improves the accuracy of time series forecasting in dynamic scenarios.
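
Equation (3) can be evaluated directly, as in the following sketch. The coefficient values are hypothetical; in TBATS they are estimated from the data. Because the period enters only through the ratio t/T, a non-integer period can be passed in the same way.

```python
import math

def fourier_seasonal(t, period, coeffs):
    """Evaluate Equation (3) at time t; coeffs[j-1] = (gamma_j1, gamma_j2)."""
    total = 0.0
    for j, (g_sin, g_cos) in enumerate(coeffs, start=1):
        angle = 2.0 * math.pi * j * t / period
        total += g_sin * math.sin(angle) + g_cos * math.cos(angle)
    return total

# Hypothetical coefficients for J = 2 harmonics of a 24 h cycle.
daily_coeffs = [(30.0, -10.0), (8.0, 4.0)]
profile = [fourier_seasonal(h, 24, daily_coeffs) for h in range(48)]
```

The resulting profile repeats every 24 steps, so two harmonics (four coefficients) describe the whole daily shape instead of one parameter per hour.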

4.1.4. ARMA Errors

The autoregressive moving average (ARMA) component of the TBATS model captures the short-term autocorrelation and residual dependencies that remain after trend and seasonality have been modeled. The residual autocorrelation is accounted for by an ARMA (p, q) model, where p denotes the order of the autoregressive terms and q the order of the moving average terms. In this way, TBATS ensures that the dependencies in the error terms are effectively modeled, leading to robust forecasting. The ARMA component is modeled as follows:

ϵ_{t} = φ_{1} ϵ_{t - 1} + φ_{2} ϵ_{t - 2} + \dots + φ_{p} ϵ_{t - p} + θ_{1} e_{t - 1} + θ_{2} e_{t - 2} + \dots + θ_{q} e_{t - q} + e_{t}

(4)

where $ϵ_{t}$ is the current residual; $φ_{1}$, $φ_{2}$, …, $φ_{p}$ are the autoregressive (AR) coefficients; $θ_{1}$, $θ_{2}$, …, $θ_{q}$ are the moving average (MA) coefficients; $e_{t}$ is the current white noise error term; and $e_{t - 1}$, $e_{t - 2}$, …, $e_{t - q}$ are the past white noise error terms.
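
A single step of Equation (4) can be written out as follows. The ARMA(2, 1) coefficients and past values are hypothetical and serve only to show how the AR and MA parts combine.

```python
def arma_residual(past_eps, past_e, ar, ma, e_t):
    """Compute eps_t from Equation (4).

    past_eps[0] is eps_{t-1}, past_eps[1] is eps_{t-2}, and so on;
    past_e follows the same ordering for the white-noise terms.
    """
    ar_part = sum(phi * eps for phi, eps in zip(ar, past_eps))
    ma_part = sum(theta * e for theta, e in zip(ma, past_e))
    return ar_part + ma_part + e_t

# Hypothetical ARMA(2, 1) coefficients and recent values.
eps_t = arma_residual(past_eps=[2.0, -1.0], past_e=[0.5],
                      ar=[0.6, 0.2], ma=[0.4], e_t=0.1)
```

Here the AR part contributes 0.6 · 2.0 + 0.2 · (−1.0) = 1.0, the MA part contributes 0.4 · 0.5 = 0.2, and adding the current noise of 0.1 gives a residual of 1.3.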

The model can sometimes project unrealistic growth or decline over time; this is tackled by the damping mechanism, which keeps long-term trends in check by reducing their influence as the forecast horizon lengthens. This characteristic keeps the forecasts stable under changing conditions. Among the suitable models, TBATS is the most appropriate here because of its capacity to handle multiple seasonality, non-linear trends, and autocorrelated errors, which makes it a robust framework for dynamic forecasting in complex and diverse environments.

4.1.5. TBATS Model Configuration and Training

The TBATS model is configured to capture the multiple seasonal patterns and non-linear trends of the passenger flow time series. Most TBATS parameters are selected automatically during model building; in this case, the AR term is enabled and the seasonal periods are specified manually based on domain knowledge. The seasonal periods are 24 h, to capture daily seasonality, and 148 h, to capture additional latent seasonal cycles. The following components were optimized automatically by the algorithm using its internal model selection criteria:

Trend component: auto-selects linear, damped, or no trend based on model fit criteria.
Box–Cox transformation: applied for variance stabilization if it improves model performance.
Damped trend: introduced when necessary to control long-term forecast growth.

This configuration allows dynamic modeling of the multiple seasonalities that are essential for forecasting passenger counts.

4.2. LightGBM

The light gradient boosting machine (LightGBM) is based on a gradient boosting framework that builds decision trees sequentially to minimize a loss function, which enables the model to capture complex and non-linear relationships in data. It is well suited to modeling the residual errors left by statistical models such as TBATS in hybrid forecasting frameworks. The model combines weak learners into a strong learner through gradient descent on a regularized objective function, which can be expressed as

\text{Objective} = \sum_{i = 1}^{n} L (y_{i}, \hat{y}_{i}) + \sum_{k = 1}^{K} Ω (T_{k})

(5)

where $L (y_{i}, \hat{y}_{i})$ is the loss function, $Ω (T_{k})$ is the regularization term for tree $T_{k}$, K is the number of trees, and n is the number of data points. LightGBM employs a histogram-based approach to split finding, which significantly reduces training time compared to traditional gradient boosting frameworks. LightGBM also handles high-dimensional data and categorical variables efficiently, aided by exclusive feature bundling (EFB). These capabilities enable the model to use comprehensive contextual data, such as weather, holidays, and day of the week, for modeling residuals. Its leaf-wise tree-building technique enhances the ability of the model to capture intricate patterns: because splits are made on the leaves with the largest loss reduction, the model performs well even on large datasets.

Regularization techniques such as L1 and L2 help LightGBM control overfitting, and its support for distributed and parallel training makes it well suited to a hybrid approach. These features enable the model to improve passenger count forecasts by learning the residual errors of the initial TBATS model while taking external features into account. Its efficiency, scalability, and robust performance in capturing non-linearity make it an essential part of the hybrid framework. The LightGBM prediction can be expressed mathematically as

\hat{r}_{i} = \sum_{k = 1}^{K} w_{k} T_{k} (x_{i})

(6)

where $w_{k}$ are the weights of the trees $T_{k}$, and $x_{i}$ represents the input features.
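
The additive structure of Equation (6) can be illustrated with a deliberately small gradient boosting loop. This is not LightGBM: the sketch uses single-split trees (stumps) on one feature, squared loss, and one shared weight for all trees that acts as a learning rate. These simplifications, the function names, and the toy residuals are all assumptions made for illustration.

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean

def boost(xs, targets, n_trees=20, weight=0.3):
    """Return a predictor sum_k weight * T_k(x), fitted stage by stage."""
    trees = []
    preds = [0.0] * len(xs)
    for _ in range(n_trees):
        # Negative gradient of the squared loss is the current error.
        grads = [t - p for t, p in zip(targets, preds)]
        tree = fit_stump(xs, grads)
        trees.append(tree)
        preds = [p + weight * tree(x) for p, x in zip(preds, xs)]
    return lambda x: sum(weight * tree(x) for tree in trees)

def mse(model, xs, targets):
    """Mean squared error of a model over paired inputs and targets."""
    return sum((t - model(x)) ** 2 for x, t in zip(xs, targets)) / len(xs)

# Hypothetical residuals left by a first-stage model, indexed by hour of day:
# rush hours are under-predicted, all other hours slightly over-predicted.
hours = list(range(24))
resid = [15.0 if 8 <= h <= 10 or 17 <= h <= 19 else -3.0 for h in hours]
model = boost(hours, resid)
```

Each stage fits a new tree to what the previous stages still get wrong, so the training error cannot increase from one stage to the next. LightGBM follows the same additive principle but adds histogram-based split finding, leaf-wise growth, and regularization.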

Hyperparameter Selection and Tuning

The performance of the LightGBM model is strongly influenced by its hyperparameters, which control model complexity, learning speed, and generalization ability. Considering computational overhead and the required accuracy, Bayesian optimization was used for hyperparameter tuning; it navigates the hyperparameter space efficiently by using a probabilistic model to predict promising configurations. A separate holdout set was used during tuning to support generalizability and prevent overfitting. Although cross-validation is common practice for robust validation, the size and temporal nature of the dataset made holdout validation a more computationally efficient choice without compromising performance reliability. The final values obtained from this systematic tuning are listed in Table 4.

The optimized LightGBM model provides dependable performance across validation sets, confirming the effectiveness of the tuning strategy.

4.3. TBATS Boosting Hybrid Model

This study proposes the TBATS Boosting model, a hybrid model that exploits the complementary strengths of statistical and machine learning models. Among the available statistical models, TBATS is used as the base model to produce the initial passenger flow forecast by capturing the long-term trend and multiple seasonality in the data. However, TBATS struggles with non-linear relationships and the impact of external features, which leaves residual errors. To address this, a second-stage LightGBM model is employed to learn and predict these residuals. LightGBM incorporates external features such as sine- and cosine-transformed time cycles, holiday indicators, and pandemic-related statistics, along with the predicted values from the TBATS model and their lagged features.

Feature importance analysis from the residual learning stage in Figure 8 highlights the dominance of variables such as Predicted_Passenger_Count, its lagged values, and temporal features, indicating the model’s ability to capture structured dependencies. Contextual variables such as Confirmed COVID cases, Holiday_Type, and Month further enhance robustness by adjusting for external disruptions.

This enables the model to capture complex feature interactions and nonlinear patterns more effectively. The proposed TBATS Boosting architecture, illustrated in Figure 9, operates in two stages:

TBATS model: produces the initial forecast based on trend and seasonality.
LightGBM model: refines the forecast using residual learning informed by external features and lagged TBATS predictions.
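
The second-stage inputs described above (cyclical time encodings, holiday indicators, and lagged TBATS predictions) could be assembled as in the following sketch. The helper names and the column names `hour_sin`, `hour_cos`, and `pred_lag_k` are hypothetical; only `Predicted_Passenger_Count` and `is_holiday` are taken from the text, and the numbers are illustrative.

```python
import math

def cyclical_hour(hour):
    """Encode hour of day (0-23) as a point on the unit circle."""
    angle = 2.0 * math.pi * hour / 24.0
    return math.sin(angle), math.cos(angle)

def build_rows(hours, tbats_pred, is_holiday, n_lags=2):
    """Assemble one feature dict per time step; steps without full lags are skipped."""
    rows = []
    for t in range(n_lags, len(hours)):
        hour_sin, hour_cos = cyclical_hour(hours[t])
        row = {
            "hour_sin": hour_sin,
            "hour_cos": hour_cos,
            "is_holiday": is_holiday[t],
            "Predicted_Passenger_Count": tbats_pred[t],
        }
        for lag in range(1, n_lags + 1):
            row[f"pred_lag_{lag}"] = tbats_pred[t - lag]
        rows.append(row)
    return rows

# Hypothetical first-stage predictions for five consecutive hours.
rows = build_rows(hours=[22, 23, 0, 1, 2],
                  tbats_pred=[40.0, 25.0, 10.0, 6.0, 5.0],
                  is_holiday=[0, 0, 0, 0, 0])
```

The sine and cosine pair keeps 23:00 and 00:00 close together in feature space, which a raw hour index from 0 to 23 would not.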

The final forecast is obtained by summing the TBATS output and the residuals predicted by LightGBM. The detailed algorithm and training workflow are presented below in Figure 9 and Algorithm 1. The symbols utilized in the architecture flowchart have been generated using AI tools [62].

Algorithm 1. TBATS Boosting Hybrid Approach for Passenger Flow Forecasting
Step 1:	Load the input CSV files. File1: origin-destination (OD) passenger count with Date_Time and external features. File2: forecast dataset with corresponding external features.
Step 2:	Form the Train and Test datasets for TBATS using input File1.
Step 3:	Train the TBATS model using Date_Time as the index and Passenger_Count as the target variable: $\text{Passenger\_Count}_{\text{TBATS}, \text{Train}} = f_{\text{TBATS}} (\text{Date\_Time}_{\text{Train}}, \text{Passenger\_Count}_{\text{Train}})$. Save the trained TBATS model for subsequent steps.
Step 4:	Load the saved TBATS model and predict passenger count for the entire Train-Test dataset: $\text{Passenger\_Count}_{\text{TBATS}, \text{Train\_Test}} = f_{\text{TBATS}} (\text{Date\_Time}_{\text{Train\_Test}})$.
Step 5:	Calculate the residuals for the Train-Test dataset: $r_{\text{Residual}, \text{Train\_Test}} = \text{Passenger\_Count}_{\text{Actual}, \text{Train\_Test}} - \text{Passenger\_Count}_{\text{TBATS}, \text{Train\_Test}}$.
Step 6:	Merge the TBATS predictions with external features and residuals to form a comprehensive feature set for the next modeling phase.
Step 7:	Train the LightGBM model using the dataset generated in Step 6: $\hat{r}_{\text{LightGBM}, \text{Train\_Test}} = f_{\text{LightGBM}} (\text{Residual\_Dataset}_{\text{Train\_Test}})$. Save the trained LightGBM model for subsequent steps.
Step 9:	Utilize the trained TBATS model to forecast initial passenger counts for the forecast horizon: $\text{Passenger\_Count}_{\text{TBATS}, \text{Forecast}} = f_{\text{TBATS}} (\text{Date\_Time}_{\text{Forecast}})$.
Step 10:	Combine TBATS forecast outputs with corresponding external features from the forecast dataset.
Step 11:	Load the saved LightGBM model and predict the residuals for the forecast horizon: $\hat{r}_{\text{LightGBM}, \text{Forecast}} = f_{\text{LightGBM}} (\text{Passenger\_Count}_{\text{TBATS}, \text{Forecast}}, \text{External\_Features}_{\text{Forecast}})$.
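
The two-stage logic of Algorithm 1 can be sketched end to end with simple stand-ins. In the code below, the first stage is an hour-of-day average (a stand-in for TBATS) and the second stage is the mean residual per holiday flag (a stand-in for LightGBM); neither is the model used in this study, and the data are hypothetical. The purpose is only to show how residuals are computed and how the two outputs are added.

```python
from collections import defaultdict

def mean_by_key(keys, values):
    """Average the values that share the same key."""
    sums, counts = defaultdict(float), defaultdict(int)
    for k, v in zip(keys, values):
        sums[k] += v
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}

# Hypothetical history: (hour, is_holiday, passenger_count).
history = [
    (8, 0, 200.0), (8, 0, 210.0), (8, 1, 120.0),
    (14, 0, 90.0), (14, 0, 100.0), (14, 1, 50.0),
]
hours = [h for h, _, _ in history]
flags = [f for _, f, _ in history]
counts = [c for _, _, c in history]

# Stage 1 (stand-in for TBATS): a seasonal profile, the mean count per hour.
stage1 = mean_by_key(hours, counts)
base = [stage1[h] for h in hours]

# Residuals of stage 1: actual minus first-stage prediction (Step 5).
residuals = [c - b for c, b in zip(counts, base)]

# Stage 2 (stand-in for LightGBM): mean residual per holiday flag.
stage2 = mean_by_key(flags, residuals)

def hybrid_forecast(hour, is_holiday):
    """Final forecast = first-stage prediction + predicted residual."""
    return stage1[hour] + stage2[is_holiday]
```

On this toy history, the first stage alone predicts about 177 passengers at 08:00 on a holiday against an observed 120, while the hybrid forecast of about 133 is considerably closer, because the second stage has learned that holidays lower the counts.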

4.4. Evaluation Parameters

To assess the performance of implemented forecasting models, evaluation is carried out using four performance metrics: mean absolute error (MAE), root mean squared error (RMSE), mean absolute percentage error (MAPE), and weighted mean absolute percentage error (WMAPE).

MAE is the average absolute difference between the actual and forecasted values. It does not differentiate between overestimation and underestimation and weights all errors equally, so it does not emphasize the impact of outliers or extreme errors. It can be expressed as follows:

\text{MAE} = \frac{1}{n} \sum_{i = 1}^{n} |y_{i} - \hat{y}_{i}| .

(7)

RMSE is one of the important performance measures for a regression model. It is the square root of the average squared difference between forecasted and actual values. By squaring the differences, RMSE places more weight on larger errors, which makes it sensitive to outliers. It can be expressed as follows:

\text{RMSE} = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (y_{i} - \hat{y}_{i})^{2}} .

(8)

MAPE is a widely used percentage-based metric that averages the absolute differences between actual and predicted passenger counts, each expressed relative to the corresponding actual value. However, it can be sensitive to small actual values, leading to inflated error percentages. The mathematical expression of MAPE is given by

\text{MAPE} = \frac{1}{n} \sum_{i = 1}^{n} \left| \frac{y_{i} - \hat{y}_{i}}{y_{i}} \right| \times 100 .

(9)

WMAPE is less sensitive to outliers or small denominators, offering a more stable and realistic measure of forecasting performance in datasets with varying scales. It improves upon MAPE by weighting the absolute errors based on the magnitude of actual values. It can be expressed as follows:

\text{WMAPE} = \frac{\sum_{i = 1}^{n} |y_{i} - \hat{y}_{i}|}{\sum_{i = 1}^{n} y_{i}} \times 100 .

(10)
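
Equations (7)–(10) translate directly into code. The following sketch uses plain Python and a small set of hypothetical actual and forecasted counts, chosen so that one very small actual value shows the difference between MAPE and WMAPE.

```python
import math

def mae(actual, forecast):
    """Mean absolute error, Equation (7)."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)

def rmse(actual, forecast):
    """Root mean squared error, Equation (8)."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))

def mape(actual, forecast):
    """Mean absolute percentage error, Equation (9); undefined if any actual is zero."""
    return 100.0 * sum(abs((a - f) / a) for a, f in zip(actual, forecast)) / len(actual)

def wmape(actual, forecast):
    """Weighted mean absolute percentage error, Equation (10)."""
    return 100.0 * sum(abs(a - f) for a, f in zip(actual, forecast)) / sum(actual)

# Hypothetical hourly counts; the last hour has very few passengers.
actual = [100.0, 50.0, 10.0, 2.0]
forecast = [90.0, 55.0, 12.0, 4.0]
```

For these values, MAE is 4.75 and RMSE is about 5.77. MAPE is 35% because the miss of two passengers in the last hour counts as a 100% error, whereas WMAPE is about 11.7% because the same miss is weighted by the total volume.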

5. Results and Discussion

The proposed TBATS Boosting model was evaluated for hourly passenger flow forecasting and compared against traditional statistical models such as SARIMA and TBATS, as well as advanced machine learning models including XGBoost and LightGBM. All models were trained on data from 2019 to 2023, tested on January–February 2024, and used to forecast hourly passenger count for March 2024. Forecasting has been performed across Route 12, with performance analysis focused on three representative station pairs: Thane Station West to Pawar Nagar (high demand), Thane Station West to Voltas Gate (moderate demand), and Pawar Nagar to Civil Court (low demand). Model accuracy was assessed using RMSE, MAE, MAPE, and WMAPE. To evaluate adaptability under operational conditions, the analysis considered temporal variations, including weekdays, weekends, holidays, and non-holidays. The chosen station pairs effectively capture the range of passenger flow scenarios, providing a comprehensive assessment of model robustness under varying demand conditions.
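
The chronological split described above can be expressed as a simple labeling of timestamps. The boundary constants below are assumptions read off the stated periods (training through 2023, testing in January–February 2024, forecasting in March 2024); the exact first and last timestamps of the dataset are not given here.

```python
from datetime import datetime, timedelta

TRAIN_END = datetime(2024, 1, 1)     # training covers 2019-2023
TEST_END = datetime(2024, 3, 1)      # testing covers January-February 2024
FORECAST_END = datetime(2024, 4, 1)  # forecasting covers March 2024

def assign_split(timestamp):
    """Label an hourly timestamp with the period it belongs to."""
    if timestamp < TRAIN_END:
        return "train"
    if timestamp < TEST_END:
        return "test"
    if timestamp < FORECAST_END:
        return "forecast"
    return "out_of_range"

def hourly_range(start, end):
    """Yield hourly timestamps in [start, end)."""
    current = start
    while current < end:
        yield current
        current += timedelta(hours=1)

test_hours = [ts for ts in hourly_range(datetime(2024, 1, 1), FORECAST_END)
              if assign_split(ts) == "test"]
```

Splitting by time rather than at random keeps every test and forecast observation later than all training observations, which mirrors how the model would be used in operation.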

As illustrated in Figure 10, one-day hourly forecasts on a non-holiday for the high-, moderate-, and low-flow station pairs show how the TBATS and TBATS Boosting models compare. For the high-flow station pair, the TBATS Boosting model corrects the underestimation of the TBATS model and effectively captures morning and evening rush-hour peaks. For the moderate-flow station pair, the TBATS Boosting model outperforms the TBATS model by capturing midday variations and evening peaks. Similarly, for the low-flow station pair, the TBATS Boosting model aligns closely with actual passenger counts by correcting overestimations during stable periods and capturing minor spikes more effectively than TBATS.

The hourly passenger flow forecasts for Holi (25 March 2024), shown in Figure 11, highlight the superior accuracy of the TBATS Boosting model for all types of station pairs. The TBATS model captures seasonality and trend effectively; however, it struggles to adapt to the non-linear disruptions caused by holidays, which results in a consistent overestimation of passenger flow. In contrast, the TBATS Boosting model adapts by incorporating holiday-related features and produces forecasts that closely align with the actual passenger counts. The hybrid model accounts for the suppressed rush-hour peaks and decreased overall flow for the high- and moderate-flow station pairs, and it reproduces the minimal variations and reduced activity for the low-flow station pair. This shows the robustness of the proposed model in addressing extreme deviations in passenger flow patterns caused by external factors such as holidays and holiday types.

Figure 12 highlights the superior ability of the TBATS Boosting model in forecasting hourly passenger flow over one week for the different station pairs. On weekdays, passenger flow exhibits predictable peaks during morning and evening rush hours, particularly in the high-flow pair. The TBATS model captures the overall trends; however, it underestimates peak intensities and fails to adapt to sharp surges during critical time slots. The TBATS Boosting model handles the distinct weekday and weekend dynamics by integrating weekday_timeslot and other time-related features. Similarly, for the moderate-flow station pair, the hybrid model captures smaller peaks during rush hours and reduces the deviations observed in the TBATS forecast, particularly during midday fluctuations. Weekend patterns exhibit significantly lower passenger counts and reduced peak activity across all station pairs. Because TBATS relies mainly on temporal seasonality, it overestimates passenger flow during weekends and struggles to adapt to these non-linear disruptions. The hybrid model addresses these challenges, closely aligning with actual passenger counts by incorporating external influences and adapting to dynamic weekend patterns.

Figure 13 shows the performance comparison of TBATS and the proposed TBATS Boosting model for hourly passenger flow forecasts on the entire prediction horizon of one month. It reflects the different scenarios, including holidays, non-holidays, weekdays, and weekends. While forecasting the passenger flow, the TBATS Boosting model consistently outperforms the standalone TBATS model. By addressing the underestimation of TBATS during rush hours and overestimations during off-peak times for high-flow station pairs, the hybrid model accurately captures the peaks and troughs. In moderate-flow scenarios, the hybrid model reduces deviations observed in TBATS forecasts, adapting well to midday fluctuations and rush-hour trends. Similarly, minimal variability of low-flow station pairs is handled through the TBATS Boosting model with precision by correcting overestimations of TBATS and capturing sporadic fluctuations effectively.

The comparative performance of SARIMA, TBATS, XGBoost, LightGBM, and the proposed TBATS Boosting model across three station pairs, Thane Station West to Pawar Nagar (high flow), Thane Station West to Voltas Gate (moderate flow), and Pawar Nagar to Civil Court (low flow), is summarized in Table 5 and illustrated in Figure 14. The TBATS Boosting model consistently achieves the lowest forecasting errors across all evaluation metrics—MAE, RMSE, MAPE, and WMAPE—demonstrating superior accuracy and adaptability across varying demand levels.

In terms of absolute error, TBATS Boosting significantly reduced both RMSE and MAE when compared to all other models. For the high-demand pair, the RMSE and MAE decreased from 43.12 and 27.15 (SARIMA) to 24.70 and 16.11, respectively. Similar improvements were observed for the moderate-demand pair, where RMSE dropped from 43.78 to 26.37 and MAE from 29.01 to 16.19. In the low-demand scenario, the hybrid model reduced RMSE from 14.96 to 7.07 and MAE from 10.05 to 4.92. These reductions in raw error values highlight the practical operational benefits of the hybrid approach, especially in high-volume segments where minor forecast inaccuracies can have significant downstream effects on service quality and congestion.

Percentage-based metrics such as MAPE and WMAPE provide additional insights, particularly for comparing performance across station pairs with different passenger volumes. TBATS Boosting reduced WMAPE from 25.59% to 14.52% in the high-flow pair, from 21.03% to 16.62% in the moderate-flow pair, and from 26.64% to 17.33% in the low-flow pair. These relative improvements emphasize the consistent performance of the TBATS Boosting model, even when forecasting conditions vary significantly. While absolute errors are naturally lower in low-flow settings due to smaller volumes, the percentage errors in these scenarios tend to remain higher, reflecting the challenges of predicting sparse, irregular demand. Despite this, the hybrid model consistently maintained its advantage over the baselines.
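
The relative size of these improvements follows from simple arithmetic on the WMAPE values quoted above. The helper below is a hypothetical convenience function; the numbers are those stated in the text.

```python
def relative_reduction(before, after):
    """Percentage reduction of an error metric relative to its starting value."""
    return 100.0 * (before - after) / before

# WMAPE values (in %) quoted in the text: (before, after) per station pair.
wmape_pairs = {
    "high": (25.59, 14.52),
    "moderate": (21.03, 16.62),
    "low": (26.64, 17.33),
}
reductions = {name: relative_reduction(b, a) for name, (b, a) in wmape_pairs.items()}
```

This gives relative WMAPE reductions of roughly 43% for the high-flow pair, 21% for the moderate-flow pair, and 35% for the low-flow pair.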

The differences in model performance can be attributed to the inherent design and capabilities of each forecasting approach. SARIMA, although well-suited for time series with linear trends and single-seasonality, struggles with the multi-seasonal structures and nonlinear behavior characteristic of urban transit data. TBATS addresses this limitation by effectively capturing complex seasonal components; however, it lacks mechanisms to incorporate exogenous variables. In contrast, XGBoost and LightGBM excel at learning nonlinear patterns and interactions from input features but do not explicitly model seasonality. The proposed TBATS Boosting framework overcomes these individual limitations by integrating the strengths of both model families: TBATS captures long-term seasonal and trend dynamics, while LightGBM models the residual variation influenced by external factors such as holidays, temporal features, and pandemic-related disruptions. This hybrid design enables the model to maintain robust forecasting performance across diverse demand levels, time-of-day variations, and operational contexts within the transit system.

To enhance robustness and generalizability, the model was designed to handle missing or unavailable input features. For instance, when pandemic indicators are absent (as in post-COVID periods such as 2024), the corresponding variables are set to zero, enabling the model to treat these periods as non-pandemic without disruption. For holidays, a dual-feature approach was adopted, using is_holiday to signal atypical travel days and holiday_type for additional classification. This allows the model to account for both known and previously unseen holidays.
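
One possible way to implement this defaulting is sketched below. The column name `confirmed_covid_cases` and the fallback categories `"other"` and `"none"` are assumptions made for illustration; the text specifies only that pandemic variables are set to zero and that is_holiday and holiday_type are used together.

```python
PANDEMIC_FEATURES = ("confirmed_covid_cases",)  # hypothetical column name

def complete_features(row, known_holiday_types):
    """Return a copy of row with missing pandemic and holiday inputs filled in."""
    filled = dict(row)
    for name in PANDEMIC_FEATURES:
        if filled.get(name) is None:
            filled[name] = 0  # treat the period as non-pandemic
    filled.setdefault("is_holiday", 0)
    holiday_type = filled.get("holiday_type")
    if filled["is_holiday"] and holiday_type not in known_holiday_types:
        filled["holiday_type"] = "other"  # assumed fallback for unseen holidays
    elif not filled["is_holiday"]:
        filled["holiday_type"] = "none"   # assumed label for ordinary days
    return filled

known = {"national", "festival"}  # hypothetical holiday categories
unseen = complete_features({"is_holiday": 1, "holiday_type": "new_festival"}, known)
```

In this sketch, a holiday of a type not seen during training still carries is_holiday = 1, so the model receives the signal of an atypical travel day even though the specific category is unknown.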

The hybrid TBATS-LightGBM framework leverages both temporal dynamics and contextual features, enabling the system to adapt under partially unseen conditions. This adaptability was further validated through diverse testing across multiple Station_Pair routes and time periods, reflecting robust performance under varied travel behaviors.

6. Conclusions

This study highlights the effectiveness of the hybrid TBATS Boosting model for passenger flow forecasting in the Thane Municipal Transport system, addressing critical challenges in developing countries such as India. By focusing on Route 12, a high-priority route operating between Thane Station West (W) and Pawar Nagar, this research bridges the gap in existing studies on urban transit systems in contexts characterized by heterogeneous passenger behavior and infrastructure constraints. The inclusion of real-world data from 2019 to March 2024, encompassing significant disruptions such as the COVID-19 pandemic, adds depth and relevance to the analysis.

Hourly passenger flow forecasting was conducted using SARIMA, TBATS, XGBoost, LightGBM, and the proposed TBATS Boosting model. The hybrid approach consistently delivered the most accurate results across all station pairs, achieving the lowest MAE, RMSE, MAPE, and WMAPE. This superior performance reflects its ability to capture complex seasonality, nonlinear patterns, and the influence of external variables. SARIMA was limited by its assumptions of linearity and single-seasonality, while TBATS improved temporal modeling but lacked external variable integration. XGBoost and LightGBM effectively modeled nonlinear patterns and feature interactions but did not explicitly account for seasonal dynamics. By combining the strengths of TBATS and LightGBM, the hybrid model delivers more accurate and robust forecasts across varying passenger demand levels and temporal conditions.

The proposed model holds substantial practical value for transport planning and policy formulation. Accurate demand predictions enable data-driven decisions related to fleet allocation, route scheduling, and demand-responsive service planning. For example, during periods of increased passenger volume, transit agencies can allocate additional vehicles to reduce congestion, while during lower demand intervals, schedules can be adjusted to minimize operational costs without compromising service reliability. In areas with limited ridership, the model can support flexible routing strategies or microtransit services that maintain accessibility while improving resource efficiency. By informing targeted operational decisions, the model supports the development of more efficient, equitable, and resilient public transport systems in line with sustainable urban mobility goals.

The study also establishes a foundation for future work. Integrating additional external variables such as weather, socioeconomic indicators, or event-specific data could further enhance forecast precision. Moreover, validating the framework across other cities, networks, or multimodal transit systems would confirm its scalability and generalizability. Overall, the TBATS Boosting model offers a robust, adaptive, and policy-relevant solution for data-driven transit planning, supporting broader goals in sustainable urban mobility and smart city development.

Author Contributions

Conceptualization, M.P., S.B.P. and D.S.; methodology, M.P.; software, M.P. and R.M.; validation, M.P., S.B.P. and D.S.; formal analysis, M.P.; investigation, M.P. and R.M.; resources, M.P.; data curation, M.P. and S.B.P.; writing—original draft preparation, M.P.; writing—review and editing, M.P., S.B.P. and D.S.; visualization, M.P. and R.M.; supervision, S.B.P. and D.S.; project administration, M.P., S.B.P. and D.S. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The data are available upon request.

Acknowledgments

The authors are thankful to authorities of Thane Municipal Transport Services for their support in this study.

Conflicts of Interest

The authors declare no conflicts of interest.

References

Balcombe, R.; Mackett, R.; Paulley, N.; Preston, J.; Shires, J.; Titheridge, H.; Wardman, M.; White, P. The Demand for Public Transport: A Practical Guide; Technical Report; Transportation Research Laboratory: Crowthorne, UK, 2004.
Talusan, J.P.; Mukhopadhyay, A.; Freudberg, D.; Dubey, A. On Designing Day Ahead and Same Day Ridership Level Prediction Models for City-Scale Transit Networks Using Noisy APC Data. arXiv 2022, arXiv:2210.04989.
Tirachini, A.; Hensher, D.A.; Rose, J.M. Crowding in public transport systems: Effects on users, operation and implications for the estimation of demand. Transp. Res. Part A Policy Pract. 2013, 53, 36–52.
Subah, A.I.; Rhythy, T.R.; Quadir, S.T.; Raihan, M.A. A systematic review on forecasting passenger flows of multimodal transportation system integrating metro. In Proceedings of the 7th International Conference on Civil Engineering for Sustainable Development, Khulna, Bangladesh, 23 July 2024.
Halyal, S.; Mulangi, R.H.; Harsha, M.M. Forecasting public transit passenger demand: With neural networks using APC data. Case Stud. Transp. Policy 2022, 10, 965–975.
Agrawal, A. Sustainability of airlines in India with COVID-19: Challenges ahead and possible way-outs. J. Revenue Pricing Manag. 2021, 20, 457–472.
Elassy, M.; Al-Hattab, M.; Takruri, M.; Badawi, S. Intelligent transportation systems for sustainable smart cities. Transp. Eng. 2024, 16, 100252.
Ziel, F. Modeling public holidays in load forecasting: A German case study. J. Mod. Power Syst. Clean Energy 2018, 6, 191–207.
Jain, D. Impact of COVID-19 led transition of work culture and travel to work patterns on society and environment in Delhi. Urban Gov. 2024, 4, 388–400.
Padmakumar, A.; Patil, G.R. COVID-19 effects on urban driving, walking, and transit usage trends: Evidence from Indian metropolitan cities. Cities 2022, 126, 103697.
TERI. Impact of COVID-19 on Urban Mobility in India: Evidence from a Perception Study; The Energy and Resources Institute: New Delhi, India, 2020. Available online: https://www.teriin.org (accessed on 1 April 2025).
Abu-Rayash, A.; Dincer, I. Analysis of mobility trends during the COVID-19 coronavirus pandemic: Exploring the impacts on global aviation and travel in selected cities. Energy Res. Soc. Sci. 2020, 68, 101693.
Thomas, N.; Jana, A.; Bandyopadhyay, S. Physical distancing on public transport in Mumbai, India: Policy and planning implications for unlock and post-pandemic period. Transp. Policy 2022, 116, 217–236.
Shanthappa, N.K.; Mulangi, R.H.; Manjunath, H.M. Deep learning-based public transit passenger flow prediction model: Integration of weather and temporal attributes. Public Transp. 2024.
Luo, Q.; Zhou, Y. Spatial-temporal Structures of Deep Learning Models for Traffic Flow Forecasting: A Survey. In Proceedings of the 2021 4th International Conference on Intelligent Autonomous Systems (ICoIAS), Wuhan, China, 14–16 May 2021; pp. 187–193.
Fontes, T.; Correia, R.; Ribeiro, J.; Borges, J.L. A Deep Learning Approach for Predicting Bus Passenger Demand Based on Weather Conditions. Transp. Telecommun. J. 2020, 21, 255–264.
Jain, D.; Singh, S. Adaptation of trips by metro rail users at two stations in extreme weather conditions: Delhi. Urban Clim. 2021, 36, 100766.
Jeffery, D.J.; Russam, K.; Robertson, D.I. Electronic route guidance by AUTOGUIDE: The research background. Traffic Eng. Control 1987, 28, 525–529.
El Esawey, M. Daily bicycle traffic volume estimation: Comparison of historical average and count models. J. Urban Plan. Dev. 2018, 144, 04018011.
Yang, H.; Yang, J.; Han, L.D.; Liu, X.; Pu, L.; Chin, S.-M.; Hwang, H.-L. A Kriging based spatiotemporal approach for traffic volume data imputation. PLoS ONE 2018, 13, e0195957.
Kumar, S.V. Traffic Flow Prediction using Kalman Filtering Technique. Procedia Eng. 2017, 187, 582–587.
Jiao, P.; Li, R.; Sun, T.; Hou, Z.; Ibrahim, A. Three Revised Kalman Filtering Models for Short-Term Rail Transit Passenger Flow Prediction. Math. Probl. Eng. 2016, 2016, 9717582.
Emami, A.; Sarvi, M.; Asadi Bagloee, S. Using Kalman filter algorithm for short-term traffic flow prediction in a connected vehicle environment. J. Mod. Transport. 2019, 27, 222–232.
Chen, H.; Rakha, H.A. Real-time travel time prediction using particle filtering with a non-explicit state-transition model. Transp. Res. Part C Emerg. Technol. 2014, 43, 112–126.
Okutani, I.; Stephanedes, Y.J. Dynamic prediction of traffic volume through Kalman filtering theory. Transp. Res. Part B Methodol. 1984, 18, 1–11.
Whittaker, J.; Garside, S.; Lindveld, K. Tracking and predicting a network traffic process. Int. J. Forecast. 1997, 13, 51–61.
Williams, B.M.; Durvasula, P.K.; Brown, D.E. Urban Freeway Traffic Flow Prediction: Application of Seasonal Autoregressive Integrated Moving Average and Exponential Smoothing Models. Transp. Res. Rec. 1998, 1644, 132–141.
Milenković, M.; Švadlenka, L.; Melichar, V.; Bojović, N.; Avramović, Z. SARIMA modelling approach for railway passenger flow forecasting. Transport 2016, 33, 1113–1120.
Li, W.; Sui, L.; Zhou, M.; Dong, H. Short-term passenger flow forecast for urban rail transit based on multi-source data. J. Wirel. Commun. Netw. 2021, 2021, 9.
Devianto, D.; Permana, D.; Arif, E.; Afrimayani, A.; Yanuar, F.; Maiyastri, M.; Yollanda, M. An innovative model for capturing seasonal patterns of train passenger movement using exogenous variables and fuzzy time series hybridization. J. Open Innov. Technol. Mark. Complex. 2024, 10, 100232.
Liu, J.; Yang, X. Research on Passenger Flow Forecast of Urban Rail Transit Based on SARIMA-RBF Combination Model. In Proceedings of the 2021 International Conference on Intelligent Transportation, Big Data & Smart City (ICITBS), Xi’an, China, 27–28 March 2021; pp. 26–30.
Chuwang, D.D.; Chen, W. Forecasting Daily and Weekly Passenger Demand for Urban Rail Transit Stations Based on a Time Series Model Approach. Forecasting 2022, 4, 904–924.
Liu, Y.; Lyu, C.; Liu, X.; Liu, Z. Automatic Feature Engineering for Bus Passenger Flow Prediction Based on Modular Convolutional Neural Network. IEEE Trans. Intell. Transp. Syst. 2021, 22, 2349–2358.
Li, D.; Zhang, C.; Cao, J. Short-Term Passenger Flow Prediction of a Passageway in a Subway Station Using Time Space Correlations Between Multi Sites. IEEE Access 2020, 8, 72471–72484.
Zhang, J.; Chen, F.; Cui, Z.; Guo, Y.; Zhu, Y. Deep Learning Architecture for Short-Term Passenger Flow Forecasting in Urban Rail Transit. IEEE Trans. Intell. Transp. Syst. 2021, 22, 7004–7014.
Xu, Z.; Zhu, R.; Yang, Q.; Wang, L.; Wang, R.; Li, T. Short-Term Bus Passenger Flow Forecast Based on the Multi-feature Gradient Boosting Decision Tree. In Advances in Natural Computation, Fuzzy Systems and Knowledge Discovery. ICNC-FSKD 2019; Liu, Y., Wang, L., Zhao, L., Yu, Z., Eds.; Advances in Intelligent Systems and Computing; Springer: Cham, Switzerland, 2019; Volume 1074, pp. 660–673. [Google Scholar]
Liu, L.; Chen, R.-C.; Zhao, Q.; Zhu, S. Applying a multistage of input feature combination to random forest for improving MRT passenger flow prediction. J. Ambient. Intell. Hum. Comput. 2019, 10, 4515–4532. [Google Scholar] [CrossRef]
Zhou, W.; Wang, W.; Zhao, D. Passenger Flow Forecasting in Metro Transfer Station Based on the Combination of Singular Spectrum Analysis and AdaBoost-Weighted Extreme Learning Machine. Sensors 2020, 20, 3555. [Google Scholar] [CrossRef] [PubMed]
Lee, E.H. Traffic Speed Prediction of Urban Road Network Based on High Importance Links Using XGB and SHAP. IEEE Access 2023, 11, 113217–113226. [Google Scholar] [CrossRef]
Min, J.H.; Ham, S.W.; Kim, D.-K.; Lee, E.H. Deep Multimodal Learning for Traffic Speed Estimation Combining Dedicated Short-Range Communication and Vehicle Detection System Data. Transp. Res. Rec. 2023, 2677, 247–259. [Google Scholar] [CrossRef]
Guo, J.; Xie, Z.; Qin, Y.; Jia, L.; Wang, Y. Short-Term Abnormal Passenger Flow Prediction Based on the Fusion of SVR and LSTM. IEEE Access 2019, 7, 42946–42955. [Google Scholar] [CrossRef]
Xia, D.; Yang, N.; Jian, S.; Hu, Y.; Li, H. SW-BiLSTM: A Spark-based weighted BiLSTM model for traffic flow forecasting. Multimed. Tools Appl. 2022, 81, 23589–23614. [Google Scholar] [CrossRef]
Li, W.-T.; Zhao, M.; Wu, Y.-H.; Yu, J.-J.; Bao, L.-Y.; Yang, H.; Liu, D. Collaborative offloading for UAV-enabled time-sensitive MEC networks. J. Wirel. Commun. Netw. 2021, 1, 2021. [Google Scholar] [CrossRef]
Gong, M.; Fei, X.; Wang, Z.H.; Qiu, Y.J. Sequential Framework for Short-Term Passenger Flow Prediction at Bus Stop. Transp. Res. Rec. 2014, 2417, 58–66. [Google Scholar] [CrossRef]
Mirzahossein, H.; Gholampour, I.; Sajadi, S.R.; Zamani, A.H. A hybrid deep and machine learning model for short-term traffic volume forecasting of adjacent intersections. IET Intell. Transp. Syst. 2022, 16, 1648–1663. [Google Scholar]
Pan, Y.A.; Guo, J.; Chen, Y.; Cheng, Q.; Li, W.; Liu, Y. A fundamental diagram based hybrid framework for traffic flow estimation and prediction by combining a Markovian model with deep learning. Expert Syst. Appl. Part E 2024, 238, 122219. [Google Scholar] [CrossRef]
Burdzik, R.; Chema, W.; Celiński, I. A study on passenger flow model and simulation in aspect of COVID-19 spreading on public transport bus stops. J. Public Transp. 2023, 25, 100063. [Google Scholar] [CrossRef]
Shi, G.; Luo, L. Prediction and Impact Analysis of Passenger Flow in Urban Rail Transit in the Postpandemic Era. J. Adv. Transp. 2023, 2023, 3448864. [Google Scholar] [CrossRef]
Zhang, L.; Liu, K. Unsupervised origin-destination flow estimation for analyzing COVID-19 impact on public transport mobility. Cities 2024, 151, 105086. [Google Scholar] [CrossRef]
Li, X.; de Groot, M.; Bäck, T. Using forecasting to evaluate the impact of COVID-19 on passenger air transport demand. Decis. Sci. 2021, 54, 394–409. [Google Scholar] [CrossRef] [PubMed] [PubMed Central]
Jing, Y.; Hu, H.; Guo, S.; Wang, X.; Chen, F. Short-Term Prediction of Urban Rail Transit Passenger Flow in External Passenger Transport Hub Based on LSTM-LGB-DRS. IEEE Trans. Intell. Transp. Syst. 2021, 22, 4611–4621. [Google Scholar] [CrossRef]
Zhang, Y.; Zhu, C.; Wang, Q. LightGBM-based model for metro passenger volume forecasting. IET Intell. Transp. Syst. 2020, 14, 1815–1823. [Google Scholar] [CrossRef]
Perone, G. Comparison of ARIMA, ETS, NNAR, TBATS and hybrid models to forecast the second wave of COVID-19 hospitalizations in Italy. Eur. J. Health Econ. 2022, 23, 917–940. [Google Scholar] [CrossRef]
Yu, C.; Xu, C.; Li, Y.; Yao, S.; Bai, Y.; Li, J.; Wang, L.; Wu, W.; Wang, Y. Time Series Analysis and Forecasting of the Hand-Foot-Mouth Disease Morbidity in China Using An Advanced Exponential Smoothing State Space TBATS Model. Infect. Drug Resist. 2021, 14, 2809–2821. [Google Scholar] [CrossRef]
Thayyib, P.V.; Thorakkattle, M.N.; Usmani, F.; Yahya, A.T.; Farhan, N.H.S. Forecasting Indian Goods and Services Tax revenue using TBATS, ETS, Neural Networks, and hybrid time series models. Cogent Econ. Financ. 2023, 11, 2285649. [Google Scholar] [CrossRef]
Nagaraj, N.; Gururaj, H.L.; Swathi, B.H.; Hu, Y.-C. Passenger flow prediction in bus transportation system using deep learning. Multimed. Tools Appl. 2022, 81, 12519–12542. [Google Scholar] [CrossRef]
Sajanraj, T.D.; Mulerikkal, J.; Raghavendra, S.; Vinith, R.; Fábera, V. Passenger flow prediction from AFC data using station memorizing LSTM for metro rail systems. Neural Netw. World 2021, 31, 173–189. [Google Scholar] [CrossRef]
Cyril, A.; Mulangi, R.H.; George, V. Bus Passenger Demand Modelling Using Time-Series Techniques- Big Data Analytics. Open Transp. J. 2019, 13, 41–47. [Google Scholar] [CrossRef]
Cyril, A.; Mulangi, R.H.; George, V. Modelling and Forecasting Bus Passenger Demand using Time Series Method. In Proceedings of the 2018 7th International Conference on Reliability, Infocom Technologies and Optimization (Trends and Future Directions) (ICRITO), Noida, India, 29–31 August 2018; pp. 460–466. [Google Scholar] [CrossRef]
Gummadi, R.; Edara, S.R. Prediction of Passenger Flow of Transit Buses Over a Period of Time Using Artificial Neural Network. In Third International Congress on Information and Communication Technology; Yang, X.S., Sherratt, S., Dey, N., Joshi, A., Eds.; Advances in Intelligent Systems and Computing; Springer: Singapore, 2018; Volume 797, pp. 963–971. [Google Scholar]
Uber Technologies, Inc. (n.d.). Kepler.gl [Data Visualization Tool]. Available online: https://kepler.gl (accessed on 27 March 2024).
OpenAI. ChatGPT (Mar 14 Version) [Large Language Model]. 2023. Available online: https://chat.openai.com/chat (accessed on 15 October 2024).

Figure 1. Route map of the Thane Transport network with passenger flow.

Figure 2. Detailed route map of Route 12 with major stations highlighted.

Figure 3. Daily and hourly passenger flow trends across selected station pairs.

Figure 4. Impact of holidays on passenger flow: insights from average commute patterns.

Figure 5. Temporal variations in hourly passenger flow across weekdays and weekends.

Figure 6. COVID-19 trends: insights from lockdown and recovery phases.

Figure 7. Structure of TBATS model with overview of city passenger flow.

Figure 8. Normalized feature importance observed in residual learning.

Figure 9. Hybrid TBATS Boosting model workflow.

Figure 10. One-day non-holiday forecasts across different pairs: TBATS vs. hybrid TBATS Boosting models.

Figure 11. One-day holiday forecasts across different pairs: TBATS vs. hybrid TBATS Boosting models.

Figure 12. One-week passenger flow forecasts: weekday and weekend analysis across station pairs.

Figure 13. Passenger flow forecasts for the entire horizon: evaluation of TBATS and hybrid TBATS Boosting models.

Figure 14. Comparison of forecasting metrics across station pairs: TBATS vs. hybrid TBATS Boosting model.

Table 3. Description of the pre-processed dataset used for model building.

Feature Name	Description	Type	Example Values
Date_Time	Timestamp in YYYY-MM-DD HH:MM:SS format.	Timestamp	2023-01-01 00:00:00
Station_Pair	Combined source and destination station details.	Categorical	Thane_Station_West_Pawar_Nagar
Passenger_Count	Number of passengers recorded for the given timestamp.	Numeric	3, 22, 37
Is_Holiday	Binary indicator for holidays (1 = Holiday, 0 = Non-Holiday).	Binary	0
Holiday_Type	Categorical representation of holiday types.	Categorical	0, 1, 27
Month	Numeric representation of the month.	Categorical	1, 2
sin_time	Sine-transformed hourly cyclic pattern.	Numeric	0.866025404
cos_time	Cosine-transformed hourly cyclic pattern.	Numeric	0.707106781
Weekday_Timeslot	Weekday and hourly time slot (e.g., 6_0).	Categorical	6_0, 6_12
Confirmed	Confirmed COVID-19 cases for the corresponding day.	Numeric	1234, 5678
Deceased	Deceased COVID-19 cases for the corresponding day.	Numeric	5, 10
Recovered	Recovered COVID-19 cases for the corresponding day.	Numeric	1200, 5500
TBATS_prediction	Initial passenger count forecast by the TBATS model.	Numeric	25, 22, 425
TBATS_prediction lag_24	TBATS prediction value from 24 h earlier.	Numeric	22, 27, 45
TBATS_prediction lag_148	TBATS prediction value from 148 h earlier.	Numeric	22, 27, 45
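
The time-derived columns of Table 3 can be illustrated with a minimal sketch. It is not the authors' pre-processing code: the helper names `time_features` and `lagged` are hypothetical, the 24 h period of the sine and cosine encoding is an assumption (it is consistent with the example values in the table), and the weekday index assumes Monday = 0, under which 1 January 2023 at 00:00 maps to the slot 6_0.

```python
import math
from datetime import datetime


def time_features(ts: datetime) -> dict:
    """Derive the calendar features of Table 3 from one timestamp.

    Assumption: the cyclic encoding uses a 24 h period and the weekday
    index follows Python's convention (Monday = 0, Sunday = 6).
    """
    angle = 2.0 * math.pi * ts.hour / 24.0
    return {
        "Month": ts.month,
        "sin_time": math.sin(angle),
        "cos_time": math.cos(angle),
        "Weekday_Timeslot": f"{ts.weekday()}_{ts.hour}",
    }


def lagged(values: list, lag: int) -> list:
    """Shift an hourly series back by `lag` steps, padding the start with None.

    Used here to mimic columns such as TBATS_prediction lag_24.
    """
    if lag <= 0:
        raise ValueError("lag must be a positive number of hours")
    n = len(values)
    return [None] * min(lag, n) + list(values[: max(n - lag, 0)])


if __name__ == "__main__":
    print(time_features(datetime(2023, 1, 1, 0, 0, 0)))
    # Hypothetical TBATS predictions for 26 consecutive hours.
    tbats_prediction = list(range(100, 126))
    print(lagged(tbats_prediction, 24)[-2:])
```

Encoding the hour as a sine and cosine pair keeps 23:00 and 00:00 close together in feature space, which a plain integer hour would not.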

Table 4. Hyperparameters with their final values and roles in the model.

Hyperparameter	Final Value	Role in Model
Number of leaves	51	Maximum number of leaves per tree; controls tree complexity.
Learning rate	0.01	Shrinkage applied to each new tree's contribution during boosting.
Feature fraction	0.8	Fraction of features sampled per iteration; helps in regularization.
Bagging fraction	0.8	Fraction of data sampled per bagging round; reduces overfitting.
Bagging frequency	5	Bagging is performed every 5 iterations, promoting generalization.
Minimum data in leaf	30	Minimum number of samples required in a leaf; prevents overfitting.
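
As a sketch of how the values in Table 4 might be passed to LightGBM, the dictionary below uses LightGBM's standard parameter names. The mapping from the table's row labels to these names, and the `objective` setting, are assumptions rather than details reported in the article; `check_params` is a hypothetical helper.

```python
# Table 4 values expressed with LightGBM's parameter names.
# Assumption: "regression" (L2 loss) as the objective; the article's
# table does not state the objective used for residual learning.
LIGHTGBM_PARAMS = {
    "objective": "regression",
    "num_leaves": 51,           # Number of leaves
    "learning_rate": 0.01,      # Learning rate
    "feature_fraction": 0.8,    # Feature fraction
    "bagging_fraction": 0.8,    # Bagging fraction
    "bagging_freq": 5,          # Bagging frequency (every 5 iterations)
    "min_data_in_leaf": 30,     # Minimum data in leaf
}


def check_params(params: dict) -> None:
    """Basic sanity checks before handing the dictionary to a trainer."""
    if not 0.0 < params["feature_fraction"] <= 1.0:
        raise ValueError("feature_fraction must be in (0, 1]")
    if not 0.0 < params["bagging_fraction"] <= 1.0:
        raise ValueError("bagging_fraction must be in (0, 1]")
    # In LightGBM, row subsampling only takes effect when bagging_freq > 0.
    if params["bagging_fraction"] < 1.0 and params["bagging_freq"] <= 0:
        raise ValueError("bagging_fraction < 1 requires bagging_freq > 0")
    if params["num_leaves"] < 2 or params["min_data_in_leaf"] < 1:
        raise ValueError("tree size parameters out of range")


if __name__ == "__main__":
    check_params(LIGHTGBM_PARAMS)
    # With lightgbm installed, the dictionary could be used as
    # lightgbm.train(LIGHTGBM_PARAMS, train_set, ...).
    print(LIGHTGBM_PARAMS)
```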

Table 5. Forecasting performance metrics: comparison of the benchmark models (SARIMA, TBATS, XGBoost, LightGBM) and the hybrid TBATS Boosting model across station pairs.

Station_Pair	MAE	RMSE	MAPE (%)	WMAPE (%)	Forecasting_Model
Thane_Pawar	27.15	43.12	32.72	25.59	SARIMA
Thane_Pawar	24.65	37.12	30.09	21.76	TBATS
Thane_Pawar	20.47	29.23	27.95	17.12	XGBoost
Thane_Pawar	20.32	29.17	27.75	16.99	LightGBM
Thane_Pawar	16.11	24.70	22.03	14.52	TBATS Boosting
Thane_Voltas	29.01	43.78	34.62	21.03	SARIMA
Thane_Voltas	26.01	38.04	32.07	20.25	TBATS
Thane_Voltas	20.08	28.23	31.23	18.41	XGBoost
Thane_Voltas	19.06	28.81	29.93	18.38	LightGBM
Thane_Voltas	16.19	26.37	25.07	16.62	TBATS Boosting
Pawar_Civil	10.05	14.96	36.14	26.64	SARIMA
Pawar_Civil	8.67	11.31	33.08	23.63	TBATS
Pawar_Civil	6.20	8.90	32.43	21.63	XGBoost
Pawar_Civil	6.27	8.98	31.02	21.87	LightGBM
Pawar_Civil	4.92	7.07	27.37	17.33	TBATS Boosting
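
The four metrics reported in Table 5 can be sketched as below, using their commonly used definitions; whether these match the article's exact formulas (for example, how hours with zero passengers are treated in MAPE) is an assumption. The function `relative_improvement` is a hypothetical helper for comparing two rows of the table.

```python
import math


def mae(actual, forecast):
    """Mean absolute error."""
    return sum(abs(a - f) for a, f in zip(actual, forecast)) / len(actual)


def rmse(actual, forecast):
    """Root mean squared error."""
    return math.sqrt(sum((a - f) ** 2 for a, f in zip(actual, forecast)) / len(actual))


def mape(actual, forecast):
    """Mean absolute percentage error, in percent.

    Assumption: hours with zero actual passengers are skipped, because the
    ratio is undefined there.
    """
    ratios = [abs(a - f) / abs(a) for a, f in zip(actual, forecast) if a != 0]
    return 100.0 * sum(ratios) / len(ratios)


def wmape(actual, forecast):
    """Weighted MAPE, in percent: total absolute error over total actual volume."""
    total_error = sum(abs(a - f) for a, f in zip(actual, forecast))
    return 100.0 * total_error / sum(abs(a) for a in actual)


def relative_improvement(baseline, candidate):
    """Percentage reduction of an error metric relative to a baseline model."""
    return 100.0 * (baseline - candidate) / baseline


if __name__ == "__main__":
    # Toy hourly counts, not data from the study.
    actual = [10, 20, 30]
    forecast = [12, 18, 33]
    print(mae(actual, forecast), rmse(actual, forecast))
    print(mape(actual, forecast), wmape(actual, forecast))
    # MAE for Thane_Pawar in Table 5: TBATS 24.65 vs. TBATS Boosting 16.11.
    print(relative_improvement(24.65, 16.11))
```

WMAPE weights each hour by its passenger volume, so low-demand hours inflate it less than they inflate MAPE; this is one reason the two columns can rank models differently.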

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
