Article

Short-Term and Long-Term Travel Time Prediction Using Transformer-Based Techniques

1 Department of Computer Science, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan
2 Institute of Information Management, National Yang Ming Chiao Tung University, Hsinchu 30010, Taiwan
3 Department of Management Information Systems, National Chung Hsing University, Taichung 402, Taiwan
* Author to whom correspondence should be addressed.
Appl. Sci. 2024, 14(11), 4913; https://doi.org/10.3390/app14114913
Submission received: 19 April 2024 / Revised: 20 May 2024 / Accepted: 31 May 2024 / Published: 5 June 2024

Abstract: In the evolving field of Intelligent Transportation Systems (ITSs), accurate and reliable traffic prediction is essential for enhancing management and planning capabilities. Accurately predicting traffic conditions over both short-term and long-term horizons is vital for the practical application of ITS. The integration of deep learning has been crucial in advancing traffic prediction beyond traditional approaches, particularly in analyzing and forecasting complex traffic scenarios. Despite these advancements, existing methods struggle to handle both short-term and long-term traffic patterns given their complex nature, revealing the need for more comprehensive forecasting solutions. To address this need, we propose the Short-Term and Long-Term Integrated Transformer (SLIT), a Transformer-based encoder–decoder architecture designed for the effective prediction of both short-term and long-term travel times. The architecture integrates the Enhanced Data Preprocessing (EDP) with the Short-Term and Long-Term Integrated Encoder–Decoder (SLIED), a combination that enables SLIT to capture the complexities of traffic data over varying time horizons. Extensive evaluations on a large-scale real-world traffic dataset demonstrate that SLIT outperforms existing competitive methods in both short- and long-term travel time prediction across various metrics, with particularly strong gains in short-term forecasting: improvements of up to 9.67% are observed across all evaluation metrics and time horizons. Furthermore, SLIT can analyze traffic patterns across roads of varying complexity, achieving improvements of up to 10.83% under different road conditions and proving its adaptability and effectiveness in diverse traffic scenarios. These results highlight the strong potential of SLIT to significantly enhance traffic prediction within ITS.

1. Introduction

In recent years, the rapid expansion of sensor technologies has produced a surge of traffic data, ushering in the era of Intelligent Transportation Systems (ITSs). ITS has revolutionized traffic management and efficiency, offering advanced solutions for monitoring vehicle flows in real time, improving safety measures, evaluating environmental impacts, and optimizing public transport systems. These advancements are crucial for the growth of smart cities, emphasizing the pivotal role of ITS in modern urban infrastructure. Given the availability of extensive real-world traffic data, there is a significant effort to use these data to accurately predict future travel times, a task fundamental to the domain of ITS. Travel time prediction utilizes historical traffic data to estimate future travel conditions, which is essential for effective route planning for users and traffic flow management for urban planners. Due to its significant impact on numerous real-world applications, travel time prediction has attracted growing interest, establishing itself as a notable research topic. This focus reflects the significance of leveraging large-scale datasets to improve the predictive accuracy of traffic models, serving as a foundation for advancing ITS applications.
In the development of ITS, traditional methods for predicting traffic have long served as the foundation for improvements in managing and planning traffic. These traditional approaches, including statistical models and machine learning, played a crucial role in the initial understanding of traffic patterns and in making predictions. However, these methods show their limitations, especially in dealing with the complex, non-linear nature of traffic data and the dynamic conditions on the roads, highlighting the need for more advanced solutions that can effectively address these challenges. Given these circumstances, the rise of deep learning represents a significant advancement, leading to a new era in how complex traffic data are handled and analyzed. The ability of deep learning models to identify complex patterns in extensive datasets makes them particularly suitable for addressing the challenges of traffic prediction tasks.
In traffic prediction, the categorization of short-term and long-term predictions varies across the academic literature. For example, Ref. [1] generally classifies short-term predictions as those lasting under 15 min, while long-term predictions are considered to surpass 1 h. Other studies, such as [2,3,4,5], categorize long-term predictions as those that extend past 1 day. In our study, we define short-term predictions as those within an hour’s time frame and long-term predictions as those that exceed this duration. Significant advancements have been made in the area of short-term traffic prediction, largely driven by developments in deep learning technologies that focus on immediate or upcoming traffic conditions. Research has explored the use of Long Short-Term Memory Networks combined with Deep Neural Networks (LSTMs–DNNs) for analyzing highway travel time data [6]. Similarly, another study merged LSTM techniques with ensemble learning and optimization strategies to enhance short-term traffic flow predictions [7]. Moreover, the application of Gated Recurrent Units (GRUs) coupled with eXtreme Gradient Boosting (XGBoost) in a separate study has provided deeper insights into traffic behaviors, thus enhancing the precision of traffic predictions [8]. These studies highlight various methods employed in short-term travel time prediction, each with unique benefits. However, they also share common obstacles such as the complexity of multi-step forecasting and the accumulation of errors [4].
Innovative modeling techniques have significantly impacted the field of long-term traffic prediction, notably improving strategic planning and decision making within urban traffic management. Research employing Recurrent Neural Networks (RNNs) has demonstrated the ability to forecast traffic flows over extended periods by incorporating meteorological and contextual data [9]. Additionally, a blend of wavelet decomposition, Convolutional Neural Network (CNN), and LSTM has been applied to predict traffic flow for the subsequent day, emphasizing the analysis of long-term temporal characteristics [10]. Moreover, recent progress has showcased the utility of Transformer-based methods in effectively capturing long-term dependencies, with one study introducing a Transformer model equipped with multi-head attention designed for this task [11]. Another study presented a Multi-Size Patched Spatiotemporal Transformer Network (MSP-STTN), a model that utilizes a patched Transformer structure to predict traffic flow in both the short and long term [4]. These technological advancements highlight the importance of long-term traffic prediction for real-world applications, illustrating how these models provide valuable insights that aid in traffic management and urban planning. The continuous development of these methods to capture complex traffic patterns plays a vital role in advancing ITS.
To address the challenges and evolving requirements in short-term and long-term traffic prediction, this study introduces the Short-Term and Long-Term Integrated Transformer (SLIT). SLIT is designed to leverage the strengths of Transformer models for effectively handling the complexities of traffic data, enabling reliable predictions for continuous periods within the short term (up to 1 h) and for specific days in the long-term duration (e.g., 1, 2, ... days ahead). Unlike existing methods, which often struggle to simultaneously capture the dynamic nature of short-term and long-term traffic patterns, SLIT introduces a novel architecture that ensures a comprehensive approach to traffic forecasting. The key contributions of this study are as follows:
  • The proposed SLIT, which incorporates the Enhanced Data Preprocessing (EDP) and the Short-Term and Long-Term Integrated Encoder–Decoder (SLIED), represents a significant advancement in traffic prediction. SLIT demonstrates the capability to predict travel times across various periods.
  • The EDP module effectively utilizes periodic segments and temporal attributes in data processing, significantly enhancing the model’s predictive capabilities for both short-term and long-term traffic scenarios.
  • The SLIED module, consisting of Short-Term and Long-Term Integrated Encoding (SLI-E) and Decoding (SLI-D), captures dependencies over different time frames. This design mitigates error accumulation commonly observed in autoregressive models.
  • Comprehensive testing on a large-scale real-world dataset proves that SLIT outperforms contemporary methods in both short-term and long-term travel time predictions. Notably, SLIT exhibits significant enhancement in short-term prediction, achieving improvements of up to 9.67%, 9.2%, and 8.66% in terms of MAE, RMSE, and SMAPE, respectively.
  • Across a wide range of road complexity conditions, SLIT consistently achieves notable results. Such performance demonstrates SLIT’s capability to handle varied road conditions effectively, proving its adaptability and robustness across multiple traffic environments.
The rest of this paper is organized as follows: Section 2 provides a review of the relevant literature. Section 3 details the proposed framework. Following this, Section 4 and Section 5 present the experimental setup and the experimental results and discussions, respectively. Finally, Section 6 concludes with a summary of our study.

2. Related Work

This section explores the use of Transformer models in Intelligent Transportation Systems (ITSs), emphasizing their widespread applications with a particular focus on advantages in long-term prediction. The section also presents an overview of the evolution and capabilities of sequence-to-sequence models in long-term travel time prediction. This review highlights the existing gaps that the proposed model is designed to address.

2.1. Transformer Models in Traffic Prediction

Transformers have broadened the scope of traffic prediction research in ITS, showing notable versatility in managing sequential data and capturing long-range dependencies [12,13]. They are effectively applied in a range of applications from travel time prediction [14] to traffic flow [1,4,15] and speed analysis [16], demonstrating their capability to process diverse data types and address complex traffic system needs. Studies [12,17] have illustrated their ability to capture complex spatiotemporal patterns and adapt to various scenarios, including those integrating weather data [1,4]. Such adaptability not only demonstrates their potential to address multifaceted traffic system challenges but also demonstrates their ability to incorporate external factors, such as weather conditions, into their analysis. The notable effectiveness of Transformers across various settings, particularly in their ability to manage long-range dependencies, highlights a holistic approach to understanding traffic behavior.
Transformer-based models have introduced innovative strategies that significantly improve over traditional forecasting methods, particularly by capturing intricate temporal relationships more effectively. One such advancement, the Informer model [18], tackles the inherent issues of traditional Transformers related to quadratic time complexity and substantial memory requirements. Employing a ProbSparse self-attention mechanism alongside a generative decoder, the Informer model significantly boosts the speed of processing long data sequences, improving computational efficiency. Further, Spatial-Temporal Transformer Networks (STTNs) [19] leverage dynamically directed spatial dependencies and long-range temporal dependencies. This approach emphasizes the capacity of Transformer architectures to navigate the highly nonlinear and dynamic spatiotemporal dependencies characteristic of traffic flows. Additionally, MultiResFormer [20] introduces a dynamic approach to modeling time series by selectively choosing the optimal lengths of data patches, demonstrating the flexibility and efficiency of Transformer models. Meanwhile, TS-Fastformer [21] addresses the common limitation of slow training and inference times in Transformer models; with mechanisms such as the Sub Window Tokenizer and a Time-Series Pre-trained Encoder, TS-Fastformer accelerates processing speed, streamlining the workflow for faster outcomes. Moreover, PDFormer [15] introduces a novel approach for traffic flow prediction that considers propagation delays within dynamic long-range scenarios, broadening the forecasting framework and enriching the predictive accuracy and relevance of traffic flow models. These contributions signal a significant enhancement in traffic forecasting methodologies.
Transformer models, with their unique strengths, have shown vast potential in long-term time series analysis, adeptly managing the dynamic and complex nature of traffic data. This capability provides more precise, reliable, and comprehensive solutions to the challenges widespread in today’s ITS landscape.

2.2. Sequence-to-Sequence Models in Long-Term Prediction

Sequence-to-sequence methods are commonly used in the domain of travel time prediction, representing a significant evolution in the application of deep learning within ITS. These methods have led to the development of innovative models that provide substantial improvements over traditional machine learning approaches. For instance, the deep ensemble stacked LSTM model (DE-SLSTM) [22] integrates weather conditions into long-term forecasts covering several hours. One study established a network of Long Short-Term Memory units paired with Deep Neural Networks (LSTMs–DNNs) for analyzing highway travel time data [6]. In another effort, the synergistic use of GRU and XGBoost was utilized to uncover hidden features in traffic data, leading to improved short-term forecasts [8]. Furthermore, MTSMFF introduces a multivariate time series forecasting framework that employs an attention-based encoder–decoder model, utilizing BiLSTM for encoding data’s hidden states before making predictions [23]. Continuing this trend, the temporal fusion Transformer (TFT) [24] combines short-term and long-term temporal dependencies, utilizing diverse inputs for predictions ranging from 5 to 150 min. The fusion network [25] captures both long- and short-term dependencies by integrating spatial and temporal information. Additionally, the study in [5] introduces an LSTM-based encoder–decoder architecture with attention mechanisms, designed for multi-step, long-term travel time predictions. These advancements highlight the role of deep learning networks as foundational elements in traffic prediction, enhancing the capabilities of ITS.
These approaches generally focus on predicting traffic conditions for the upcoming few hours. However, when forecasts are required for more extended periods, such as multiple days, the task becomes significantly more challenging. The complexity of traffic variables and the necessity for longer forecasting intervals highlight the critical need for accurate long-term forecasts in practical scenarios. One common difficulty with extended forecasts is the progressive accumulation of errors in multi-step forecasting. To tackle this, the research highlighted in [5] presents an LSTM-based encoder–decoder model with attention mechanisms tailored to each of the periodic patterns. This model, referred to as PASS2S, processes each periodic segment individually, generating multiple forecasts that are then integrated through a fusion mechanism to produce the final prediction. By effectively leveraging periodic patterns, this strategy enables the model to focus on essential information at each forecasting step, thereby mitigating the error accumulation common in long-term predictions. While this demonstrates advantages in handling periodic data and employing attention mechanisms, it does not specifically address the simultaneous management of short-term and long-term predictions, a critical requirement for comprehensive traffic forecasting.
Despite these advancements, the domain of both long-term and short-term prediction remains challenging, particularly when extending the forecasting horizon while ensuring reliable performance. Capturing complex traffic patterns, along with the need to manage sequential data and long-range dependencies, demonstrates the intricacies involved in traffic forecasting. These challenges motivate the exploration of Transformer-based methods, known for their capabilities in capturing long-range dependencies effectively. In this study, the proposed Short-Term and Long-Term Integrated Transformer (SLIT) effectively addresses both the need for accurate short-term forecasts and the complexities of long-term traffic prediction.

3. Proposed Methodology

3.1. Problem Formulation

The aim of this study is to forecast future travel times over extended periods for specific road segments by leveraging their respective historical traffic data. The predictive challenge can be formulated as follows:
Given the historical traffic data $X^{\tau}$ spanning a duration of $T$ time steps for a road segment $r$, our goal is to develop a model $f(\cdot)$ that estimates the travel time for the upcoming $T$ time steps, starting from $\Delta$ steps after the present time $\tau$, denoted as $Y^{\tau,\Delta}$. The model $f(\cdot)$ is thus defined as follows:

$$[X_{\tau-T+1}, \ldots, X_{\tau-1}, X_{\tau}] \xrightarrow{\;f_r^{\Delta}\;} [Y_{\tau+\Delta+1}, Y_{\tau+\Delta+2}, \ldots, Y_{\tau+\Delta+T}]$$
Note: To maintain clarity and focus in the subsequent sections of this paper, the superscripts of τ and Δ in X and Y will be omitted.
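The mapping above can be illustrated with a simple sliding-window sketch (a hedged example in NumPy; the function and variable names are ours, not from the paper):

```python
import numpy as np

def make_windows(series, T, delta, T_out):
    """Slice a 1-D travel-time series into (input, target) pairs.

    For each time tau, the input covers [x_{tau-T+1}, ..., x_{tau}]
    and the target covers the T_out steps starting delta steps after
    tau, mirroring the mapping f_r^Delta in the text."""
    X, Y = [], []
    for tau in range(T - 1, len(series) - delta - T_out):
        X.append(series[tau - T + 1 : tau + 1])
        Y.append(series[tau + delta + 1 : tau + delta + 1 + T_out])
    return np.array(X), np.array(Y)

series = np.arange(100, dtype=float)  # toy travel-time sequence
X, Y = make_windows(series, T=12, delta=0, T_out=12)
# X[k] holds 12 historical steps; Y[k] holds the 12 steps to predict.
```

With delta = 0 the target window begins immediately after the observed history, matching the 1 h (twelve 5 min slots) prediction setting used later in the paper.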

3.2. Overall Structure of the Proposed Framework

Figure 1 presents the architecture of the proposed Short-Term and Long-Term Integrated Transformer (SLIT), which is composed of two main modules: the Enhanced Data Preprocessing (EDP) and the Short-Term and Long-Term Integrated Encoder–Decoder (SLIED). First, the EDP module refines the input historical traffic data X into attribute-encoded periodic segments $\hat{S}$. These segments capture both short-term and long-term patterns of the historical traffic data. They incorporate periodic information, temporal attributes, and positional encodings to extract patterns over various time horizons, thereby enhancing the model's predictive capabilities for diverse time frames. Subsequently, SLIED, a Transformer-based encoder–decoder architecture, processes these enriched segments to generate multi-step travel time predictions $\hat{Y}$.

3.3. Enhanced Data Preprocessing

Inspired by the collaborative use of attention mechanisms with periodic segment generation for travel time prediction [5], the Enhanced Data Preprocessing (EDP) is devised to enhance the quality of historical traffic data X . This procedure aims to facilitate the generation of attribute-encoded periodic segments S ^ , which are further divided into long-term and short-term segments. The procedure is illustrated in Figure 2 and is detailed in Algorithm A1, as presented in Appendix A. For additional clarity, a detailed explanation of the symbols utilized throughout the process is provided in Table 1.
In Figure 2, EDP incorporates several procedures to refine the input data for subsequent predictive modeling. Initially, missing data points are handled by the Missing Data Filling (MDF) step, which combines linear interpolation with averaging of the data from equivalent time slots on adjacent days where information is present. MDF thereby ensures the continuity and reliability of the dataset, preparing it for further analysis.
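As a hedged illustration of the interpolation part of MDF (the adjacent-day averaging follows the same idea with a one-day offset; the data and names here are illustrative, not the paper's code):

```python
import numpy as np
import pandas as pd

# Toy 5-minute travel-time series over two days (values illustrative).
idx = pd.date_range("2019-10-01", periods=576, freq="5min")
tt = pd.Series(np.linspace(60.0, 70.0, 576), index=idx)
tt.iloc[300:303] = np.nan  # simulate missing sensor readings

# MDF-style gap filling via linear interpolation between valid slots.
filled = tt.interpolate(method="linear", limit_direction="both")
```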
Following this, the Data Scaling (DS) step normalizes the data via Z-score standardization. This standardization mitigates the influence of outliers, ensures consistency in data scales, and enhances the predictive model's ability to generalize and perform consistently regardless of the varying scales of the input data.
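Z-score standardization can be sketched as follows (fit on the training data only, so that no statistics leak into validation or testing; names are illustrative):

```python
import numpy as np

def zscore_fit(train):
    """Compute mean and standard deviation on training data only."""
    mu = float(np.mean(train))
    sigma = float(np.std(train)) or 1.0  # guard against zero variance
    return mu, sigma

def zscore(x, mu, sigma):
    """Standardize x to zero mean and unit variance."""
    return (np.asarray(x, dtype=float) - mu) / sigma

train = np.array([55.0, 60.0, 65.0, 70.0, 75.0])  # toy travel times
mu, sigma = zscore_fit(train)
z = zscore(train, mu, sigma)
```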
Furthermore, the Temporal Attribute Extraction (TAE) transforms each data point, denoted as $X_i$, into an attribute-encoded vector $\hat{X}_i \in \mathbb{R}^m$, where $m = 316$ [5]. The first two dimensions of this vector carry essential traffic information, namely travel time and speed. The remaining 314 dimensions are one-hot encodings of temporal attributes, including holidays, the day of the week, the month, peak hours, and specific time slots, providing a comprehensive view of traffic patterns influenced by temporal regularities and societal behaviors, as detailed in Table 2.
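A simplified sketch of the TAE idea is shown below. The paper's exact breakdown of the 314 one-hot dimensions is given in its Table 2, so the attribute sizes used here (time slot, day of week, month) are assumptions for illustration only:

```python
import numpy as np

# Assumed attribute sizes for illustration; the paper's full encoding
# (m = 316) also includes holiday and peak-hour attributes.
SLOTS_PER_DAY, DOW, MONTHS = 288, 7, 12

def encode_point(travel_time, speed, slot, dow, month):
    """Encode one data point: 2 real-valued traffic features followed
    by concatenated one-hot temporal attributes."""
    onehot = np.zeros(SLOTS_PER_DAY + DOW + MONTHS)
    onehot[slot] = 1.0                          # 5-min slot of the day
    onehot[SLOTS_PER_DAY + dow] = 1.0           # day of week
    onehot[SLOTS_PER_DAY + DOW + month] = 1.0   # month
    return np.concatenate(([travel_time, speed], onehot))

v = encode_point(62.5, 88.0, slot=96, dow=2, month=0)  # 08:00, Wed, Jan
```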
The core part of EDP is Periodic Segmentation (PS). Building upon the segmentation approach outlined in [5], PS is tailored to SLIT's requirements. To effectively capture temporal dependencies in traffic data, PS systematically segments $\hat{X}$ into periodic segments S, which comprise long-term segments $S^L$ and short-term segments $S^S$, as presented in Figure 3. For example, consider the timestamp of January 8th at 8:00 AM ($\tau$), with the objective of forecasting travel times for the following hour. As depicted in Figure 3a, $S^L$ includes traffic data from the same hour on the d-th day before, with each day defined by an interval of 1 day. Thus, $S^L_7$ contains traffic data from January 1st (the 7th day before January 8th), represented as twelve consecutive time slots, each denoted by $s^L_{7,l}$ ($l = 1, \ldots, 12$). Each time slot represents a discrete 5 min interval, specifically from 8:00 AM to 8:55 AM, covering the entire hour prior to 9:00 AM and aligned with the prediction target. Each vector $s^L_{7,l}$ is then encoded into a 316-dimensional feature vector, as illustrated in Figure 3b. On the other hand, $S^S$ captures the most recent short-term segment, containing the latest data points leading up to the current time, ensuring that the model has access to the most immediate information. Each data point in both $S^L$ and $S^S$ is comprehensively encoded. Detailed descriptions of the symbol definitions and specific dimensions are provided in Table 1.
PS plays a pivotal role in structuring the data for the subsequent application of positional encoding, thereby enhancing the overall effectiveness of the SLIT model in traffic prediction. The segmentation procedure, detailed in Algorithm A1, is presented in a structured form that supports both clarity and reproducibility.
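The segmentation step can be sketched as follows (a simplified NumPy version under the stated settings of 5 min slots, twelve-slot segments, and seven daily segments; boundary handling and the exact procedure follow Algorithm A1 in the paper):

```python
import numpy as np

SLOTS_PER_DAY, T = 288, 12  # 5-min slots per day; 12-slot (1 h) segments

def periodic_segments(X_hat, tau, days=7):
    """Split the encoded history into long-term segments (same hour on
    each of the previous `days` days) and one short-term segment (the
    most recent hour). X_hat: (num_points, m); tau: current slot index."""
    S_L = [X_hat[tau - d * SLOTS_PER_DAY - T + 1 : tau - d * SLOTS_PER_DAY + 1]
           for d in range(days, 0, -1)]
    S_S = X_hat[tau - T + 1 : tau + 1]
    return np.stack(S_L), S_S  # shapes (days, T, m) and (T, m)

rng = np.random.default_rng(0)
X_hat = rng.random((288 * 8, 316))        # 8 days of encoded points
S_L, S_S = periodic_segments(X_hat, tau=288 * 8 - 1)
```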
Following the original Transformer model [26], Positional Encoding (PE) is applied to the periodic segments S to enhance temporal context understanding. PE utilizes sinusoidal functions to encode each position within $S \in \mathbb{R}^{(D \times T) \times m}$, where $D = 8$ and $T = 12$ denote the number of segments and the number of time points per segment, respectively. This results in 96 unique positions, each associated with $m = 316$ feature dimensions, making up a tensor of size $96 \times 316$. The encoding embeds each position with a pair of sine and cosine functions whose frequencies depend on the position within S and are scaled logarithmically by the dimension index. A detailed pseudocode for this PE process is given in Algorithm A1. Following PE, the encoded tensor $\hat{S}$ is split into short-term and long-term encoded segments, represented as $\hat{S}^S$ (a $12 \times 316$ tensor) and $\hat{S}^L$ (an $84 \times 316$ tensor), respectively. These encoded segments are subsequently fed into the SLIED module. This process not only enhances SLIT's capability for predicting travel time across different time frames but also improves data quality.
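The sinusoidal encoding can be sketched as in the original Transformer (a minimal version for the stated dimensions; the paper's Algorithm A1 is authoritative):

```python
import numpy as np

def positional_encoding(num_pos=96, m=316):
    """Sinusoidal positional encoding (Vaswani et al.);
    num_pos = D * T = 8 * 12 positions, m feature dimensions."""
    pe = np.zeros((num_pos, m))
    pos = np.arange(num_pos)[:, None]
    div = np.exp(np.arange(0, m, 2) * (-np.log(10000.0) / m))
    pe[:, 0::2] = np.sin(pos * div)  # even dimensions: sine
    pe[:, 1::2] = np.cos(pos * div)  # odd dimensions: cosine
    return pe

pe = positional_encoding()
# After adding pe to the segments, the 96 rows are split into the
# short-term part (12 rows) and the long-term part (84 rows).
```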

3.4. Short-Term and Long-Term Integrated Encoder–Decoder

The Short-Term and Long-Term Integrated Encoder–Decoder (SLIED) forms an integral part of the SLIT framework, exploiting the potential of $\hat{S}$ for multi-step travel time prediction $\hat{Y}$. SLIED is composed of two main components: the Short-Term and Long-Term Integrated Encoder (SLI-E) and the corresponding Decoder (SLI-D). The procedure is detailed in Algorithm 1.
Algorithm 1 Short-Term and Long-Term Integrated Encoder–Decoder.
 1: Input: periodic segments Ŝ; target travel time Y
 2: Output: predicted travel time Ŷ
 3: procedure SLI-E
 4:     S_enc^S ← SDEB(Ŝ^S)
 5:     S_enc^L ← LDEB(Ŝ^L)
 6:     S_enc ← Concatenate(S_enc^S, S_enc^L)
 7: end procedure
 8: procedure SLI-D
 9:     DecInput ← PE(Y)
10:     for each decoder layer do
11:         MaskedOut ← MSA(DecInput)
12:         CrossOut ← CrossAtt(MaskedOut, S_enc)
13:         DecOut ← FFN(CrossOut)
14:         DecInput ← AddNorm(DecOut)
15:     end for
16:     S_dec ← DecInput
17: end procedure
18: Ŷ ← FC(S_dec)
SLI-E is designed to capture a wide range of temporal patterns, addressing both long-term and short-term dependencies within the data through two distinct encoding blocks: the Long-Term Dependency Encoding Block (LDEB) and the Short-Term Dependency Encoding Block (SDEB), respectively. The design of LDEB and SDEB follows the encoder block of the original Transformer model [26].
LDEB processes the long-term segments $\hat{S}^L$, represented as an $84 \times 316$ tensor, which aggregates each day's short-term segments extracted by EDP. This aggregation over consecutive days is beneficial for learning dependencies from patterns observed over extended periods. The self-attention mechanism is then applied to these periodic segments, weighing the significance of each input based on its relevance to the prediction. LDEB is designed to model potential influences of past traffic conditions on future predictions, enriching the model's forecasting capabilities based on observed outcomes. Its output, denoted as $S^L_{enc}$, offers a representation of long-term temporal dependencies. Conversely, SDEB captures the most recent traffic conditions close to the current time by processing $\hat{S}^S$, a $12 \times 316$ tensor containing the immediate short-term traffic data. SDEB analyzes data points up to the current moment, using self-attention to learn from rapid changes occurring in short intervals and assigning significance to each data point based on its impact on upcoming traffic conditions. Its output is denoted as $S^S_{enc}$.
The encoded long-term and short-term representations, $S^L_{enc}$ and $S^S_{enc}$, are integrated into $S_{enc}$, a $96 \times 316$ tensor that contains the temporal dependencies important for both immediate and extended travel time predictions. This integrated result, $S_{enc}$, then serves as the input for the subsequent SLI-D stage.
SLI-D uses cross-attention to relate the encoded features $S_{enc}$ with the target data Y, establishing connections between historical data and target queries and thereby enhancing travel time prediction. The non-autoregressive design of the decoder, combined with cross-attention, effectively mitigates error accumulation across extended prediction periods. The output of SLI-D, denoted as $S_{dec}$, is processed through a two-layer fully connected neural network (FC), which maps the encoded features of $S_{dec}$ into a $12 \times 1$ vector. Each element of this vector provides a travel time prediction for one of the consecutive time steps within the designated hour. The result is structured into $\hat{Y}$, a practical forecast format for hourly traffic management.
In our design, we strategically utilize periodic segments together with the architecturally distinct dual aspects of the Transformer encoder blocks, specifically LDEB and SDEB in SLI-E, to separately learn short-term and long-term dependencies. These components are effectively integrated with SLI-D, which utilizes attention mechanisms to focus on various temporal horizons. This strategic integration not only sets our model apart from conventional Transformer applications but also is specifically designed to tackle the complex multi-horizon forecasting challenges associated with traffic data patterns.
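The overall data flow of SLIED can be sketched in PyTorch as follows. This is a hedged approximation, not the authors' implementation: `SLIEDSketch` and its argument names are ours, standard `nn.Transformer` blocks stand in for LDEB, SDEB, and SLI-D, and the sizes follow Section 4.3 (4 heads, 2 layers, m = 316, a 12-step horizon):

```python
import torch
import torch.nn as nn

class SLIEDSketch(nn.Module):
    """Minimal stand-in for SLIED: two encoder stacks (SDEB, LDEB),
    a decoder with cross-attention, and an FC prediction head."""
    def __init__(self, m=316, heads=4, layers=2):
        super().__init__()
        def enc():
            return nn.TransformerEncoder(
                nn.TransformerEncoderLayer(d_model=m, nhead=heads,
                                           batch_first=True),
                num_layers=layers)
        self.sdeb, self.ldeb = enc(), enc()
        self.slid = nn.TransformerDecoder(
            nn.TransformerDecoderLayer(d_model=m, nhead=heads,
                                       batch_first=True),
            num_layers=layers)
        self.fc = nn.Sequential(nn.Linear(m, m), nn.ReLU(), nn.Linear(m, 1))

    def forward(self, s_short, s_long, dec_queries):
        # SLI-E: encode short- and long-term segments, then concatenate.
        s_enc = torch.cat([self.sdeb(s_short), self.ldeb(s_long)], dim=1)
        # SLI-D: decoder cross-attends to S_enc; dec_queries stands in
        # for the positionally encoded targets PE(Y) of Algorithm 1.
        s_dec = self.slid(dec_queries, s_enc)
        return self.fc(s_dec).squeeze(-1)  # (batch, 12) travel times

model = SLIEDSketch()
y_hat = model(torch.randn(2, 12, 316),   # Ŝ^S: short-term segment
              torch.randn(2, 84, 316),   # Ŝ^L: long-term segments
              torch.randn(2, 12, 316))   # decoder queries
```

Note that d_model = 316 is divisible by 4 heads (79 per head), matching the hyperparameters reported in Section 4.3.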

4. Experimental Setup

4.1. Datasets

To validate the performance of the proposed SLIT, comprehensive experiments were conducted using the Taiwan expressway dataset from the Freeway Bureau of Taiwan, R.O.C. [27]. This dataset includes traffic information for 322 road segments. Following the methodology in [5], we selected 15 representative road segments from different regions of Taiwan—North, Central, and South. Table 3 provides detailed information about these segments, including their geographic locations and highway specifications, demonstrating the distribution of segments across Taiwan's major regions and providing context for our experimental environment. The selected data cover travel times from 1 October 2019 to 31 January 2021. To ensure fairness in the experimental design, the first year of data was used for training, the next month for validation, and the last 3 months for testing.

4.2. Competitive Methods

To effectively assess the proposed SLIT’s capabilities, it is compared with both widely recognized baseline and existing leading methods in the field. These methods are briefly outlined as follows:
  • HA [28]: The HA model employs a traditional time series regression approach by using historical data averages as the basis for its forecasts.
  • LSTM [29]: LSTM models, a specialized form of recurrent neural networks, are adept at handling sequence prediction tasks, including the forecasting of travel times.
  • DNN [30]: This deep neural network model, consisting of six layers, is designed to tackle a range of traffic-related prediction tasks, such as estimating travel times and analyzing traffic flows.
  • DE-SLSTM [22]: This DE-SLSTM enhances the capabilities of traditional LSTM models by focusing on both short-term and long-term historical traffic data dependencies, aiming to improve travel time prediction accuracy.
  • MTSMFF [23]: A framework for multivariate time series forecasting that integrates BiLSTM units with attention mechanisms to unveil hidden data patterns for precise forecasting.
  • DHM [8]: This DHM integrates the Gated Recurrent Unit (GRU) with the XGBoost algorithm for freeway travel time predictions, employing linear regression for the integration process.
  • TFT [24]: TFT leverages Temporal Fusion Transformers to integrate various input types, showcasing its adaptability in predicting freeway speeds under different conditions.
  • PASS2S [5]: The PASS2S model embeds an attention mechanism within a sequence-to-sequence LSTM framework, targeting specifically the challenge of long-term travel time forecasting.
To ensure consistency in our evaluation, we use the test results from [5], aligning our experimental data with their study’s findings.

4.3. Parameter Settings

In configuring our model’s hyperparameters, a learning rate of 0.0001 and a batch size of 128 were set. We employed a multi-head attention mechanism with 4 heads in each encoder block. The architecture includes 2 layers each for both the encoder and the decoder. Additionally, the output dimension of the fully connected layer was set to 12, corresponding to a 1 h travel time prediction. Our experimental setup involved training the model for 15 epochs, and Mean Square Error (MSE) was utilized as the criterion for the loss function.
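For reference, the hyperparameters above can be collected into a single configuration object. The field names below are illustrative conveniences, not identifiers from the authors' implementation.

```python
from dataclasses import dataclass

@dataclass
class SLITConfig:
    # Values taken from Section 4.3; field names are our own.
    learning_rate: float = 1e-4
    batch_size: int = 128
    num_heads: int = 4          # multi-head attention heads per encoder block
    num_encoder_layers: int = 2
    num_decoder_layers: int = 2
    output_dim: int = 12        # 12 five-minute slots = one hour of predictions
    epochs: int = 15
    loss: str = "mse"           # Mean Square Error as the training criterion

cfg = SLITConfig()
# One prediction step covers output_dim * 5 minutes = 60 minutes.
horizon_minutes = cfg.output_dim * 5
```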

4.4. Evaluation Metrics

The evaluation of the proposed SLIT in this study was conducted using three standard metrics often utilized in time series forecasting. These metrics are as follows:
  • Mean Absolute Error (MAE): Measures the average magnitude of errors in a set of predictions, without considering their direction.
  • Root Mean Square Error (RMSE): Measures the square root of the average of squared differences between predicted and actual values, thereby emphasizing larger errors.
  • Symmetric Mean Absolute Percentage Error (SMAPE): Offers a normalized measure of prediction error in percentage terms, symmetrically penalizing both over-predictions and under-predictions, thus ensuring fairness regardless of the direction of errors.
These metrics are calculated as follows:
\mathrm{MAE} = \frac{1}{N \times l} \sum_{i=1}^{N} \sum_{j=1}^{l} \left| y_{i,j} - \hat{y}_{i,j} \right|

\mathrm{RMSE} = \sqrt{ \frac{1}{N \times l} \sum_{i=1}^{N} \sum_{j=1}^{l} \left( y_{i,j} - \hat{y}_{i,j} \right)^{2} }

\mathrm{SMAPE} = \frac{100}{N \times l} \sum_{i=1}^{N} \sum_{j=1}^{l} \frac{ \left| y_{i,j} - \hat{y}_{i,j} \right| }{ \left( \left| y_{i,j} \right| + \left| \hat{y}_{i,j} \right| \right) / 2 }

where
  • y_{i,j} is the actual travel time for the i-th sample at the j-th time point;
  • \hat{y}_{i,j} is the predicted travel time for the i-th sample at the j-th time point;
  • N is the total number of samples in the dataset; and
  • l is the number of time points in the prediction horizon.
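A minimal NumPy sketch of the three metrics, matching the formulas above term by term (arrays have shape N samples by l horizon points):

```python
import numpy as np

def mae(y, y_hat):
    """Mean Absolute Error over all N x l prediction points."""
    return np.mean(np.abs(y - y_hat))

def rmse(y, y_hat):
    """Root Mean Square Error; squaring emphasizes larger errors."""
    return np.sqrt(np.mean((y - y_hat) ** 2))

def smape(y, y_hat):
    """Symmetric MAPE in percent; penalizes over- and under-prediction alike."""
    return 100.0 * np.mean(np.abs(y - y_hat) / ((np.abs(y) + np.abs(y_hat)) / 2))

# Toy example: N = 2 samples, l = 2 time points per horizon
y = np.array([[100.0, 110.0], [90.0, 95.0]])
y_hat = np.array([[102.0, 108.0], [88.0, 97.0]])
```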

5. Results and Discussions

This section presents a comparative analysis of the proposed SLIT against competitive methods. We use MAE, RMSE, and SMAPE metrics to assess SLIT’s performance for short-term predictions within the next hour and long-term predictions over horizons beyond 1 h across all road segments. Additionally, empirical evaluation results are categorized based on the road segments’ complexity levels. The best-performing results are highlighted in bold, with the second-best underlined.

5.1. Comprehensive Comparison across All Road Segments

5.1.1. Short-Term Travel Time Prediction

As depicted in Table 4, SLIT significantly outperforms all other methods in short-term prediction across all metrics. This superior performance is attributed to the integration of attention mechanisms across periodic segments and within the final prediction phase through the proposed decoder, SLI-D. This design allows SLIT to dynamically weigh the importance of short-term versus long-term data, adapting responsively to changes in traffic patterns, and the improvements demonstrate its effectiveness in handling the intricacies of short-term traffic data. Historical Average (HA), a well-known traditional method, yields the highest error, highlighting that traffic prediction demands models more expressive than simple averaging. LSTM, DNN, and DE-SLSTM show significant improvements over HA, emphasizing the value of deep learning in traffic forecasting. Among the competitive methods, DHM performs notably well in MAE, while PASS2S excels in RMSE. SLIT goes further by integrating attention across multiple temporal segments, facilitating a more comprehensive understanding of traffic dynamics.

5.1.2. Long-Term Travel Time Prediction

Table 4 also illustrates the average overall performance for long-term predictions spanning 1 to 7 days. The proposed SLIT consistently achieves the best results in long-term forecasting for both MAE and SMAPE metrics, demonstrating its robustness. Although PASS2S leads in RMSE, SLIT secures the second-best performance, further evidencing its comprehensive predictive strength. Drawing on the strengths of PASS2S, which excels in applying attention mechanisms across periodic patterns, SLIT extends this strategy by broadening the attention mechanism to cover both short-term and long-term data. This design allows SLIT to dynamically adjust its focus depending on the immediate and future relevance of the data, thereby providing more accurate and comprehensive forecasts across different time horizons. Overall, SLIT demonstrates a significant advancement in modeling complex temporal relationships over extended horizons. The modest decrease in RMSE relative to the best-performing model reflects the challenges inherent in long-term forecasting. Despite these complexities, SLIT maintains a remarkable performance, proving its versatility across varying forecasting horizons, a crucial attribute for ITS implementations where prediction needs can range widely.
Delving deeper into the long-term prediction performance, Table 5 presents a day-by-day analysis over a 7-day period, revealing SLIT’s robust performance in long-term traffic prediction. Specifically, SLIT consistently leads in MAE and SMAPE metrics across each day, proving its superiority compared with competitive methods. Its consistent performance in MAE across all 7 days demonstrates SLIT’s ability to enhance the reliability of traffic forecasts. For RMSE, although SLIT does not always secure the top rank, its performance remains competitive, nearly matching or sometimes exceeding PASS2S. This tight competition in RMSE metrics shows SLIT’s competency in maintaining prediction performance even as forecasting horizons extend. Furthermore, the SMAPE results across the week highlight SLIT’s exceptional performance in ensuring consistently precise predictions. Achieving the lowest SMAPE values demonstrates SLIT’s capability to provide consistent and reliable forecasts, which is crucial for planning and operational decision making in ITS applications.

5.1.3. Statistical Analysis of Predictive Performance

To validate our findings, we conducted a statistical analysis comparing the MAEs and RMSEs of the presented models for statistical significance. We selected the Mann–Whitney U test for its robustness in handling non-normally distributed data [31]. Our analysis utilized data from multiple prediction horizons, as presented in Table 5, to assess the stability and consistency of the SLIT model in long-term prediction. For MAE, our analysis indicates that SLIT statistically outperforms the best competitive method (PASS2S), demonstrating a significant advantage across the prediction horizon. Conversely, the RMSE analysis showed no significant difference, suggesting that SLIT and PASS2S manage larger errors similarly and that SLIT is not at a significant disadvantage. Given these outcomes, we extended our analysis to tests against the other competitive models, hypothesizing that SLIT's design is likely to yield lower RMSE values; the results confirm that SLIT's RMSE advantage over these competitors is statistically significant under long-term prediction conditions. The details of this statistical analysis are described in Appendix B.

5.2. Performance Comparison on Road Segments of Varying Complexities

To explore SLIT’s effectiveness in diverse traffic scenarios, its performance was evaluated across road segments categorized by different levels of complexity. This complexity was measured based on the variability of travel times, as depicted in Figure 4. We divided road segments into three groups—High, Moderate, and Low Variability—according to their standard deviations (σ). Segments with σ greater than 100, termed High Variability, experience significant travel time fluctuations, often due to congestion, accidents, or adverse weather. Moderate Variability segments, defined by σ between 50 and 100, experience moderate fluctuations, while those with σ below 50, labeled Low Variability, indicate consistent and predictable travel times.
Additionally, we consider the coefficient of variation (CV), which measures variability relative to the mean. This metric is calculated as the ratio of the standard deviation (σ) to the mean travel time (μ), and is represented as follows:

\mathrm{CV} = \frac{\sigma}{\mu}
A higher CV value suggests a higher forecasting challenge due to increased unpredictability within a segment [5,32]. Typically, High or Moderate Variability segments exhibit CV values exceeding 0.1, signifying the heightened challenge in forecasting their conditions as opposed to those in the Low Variability category. Figure 4 provides a visual representation of the σ and CV of travel times for each road segment, illustrating the different levels of variability and complexity across the categorized segments. This depiction clearly represents the variability range from high to low, offering a concise visual summary of the textual data presented.
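The categorization above can be sketched as follows. The σ thresholds (100 and 50) follow this section; the function name, the boundary handling at exactly σ = 50, and the example series are our own assumptions.

```python
import numpy as np

def classify_variability(travel_times: np.ndarray):
    """Label a road segment by the sigma thresholds of Section 5.2
    and report its coefficient of variation CV = sigma / mu."""
    sigma = np.std(travel_times)
    mu = np.mean(travel_times)
    cv = sigma / mu
    if sigma > 100:
        label = "High"
    elif sigma >= 50:
        label = "Moderate"
    else:
        label = "Low"
    return label, cv

# Illustrative series: a stable segment vs. a congestion-prone one
stable = np.array([300.0, 305.0, 295.0, 302.0])
volatile = np.array([200.0, 500.0, 900.0, 100.0])
```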
Table 6 illustrates the short-term prediction performance for different types of road segments. The proposed SLIT outperforms all competitive methods across complex, medium, and smooth road conditions, proving its robustness. Notably, SLIT exhibits the most remarkable improvements in all metrics for road segments of varying complexities, particularly in complex road scenarios, which present greater forecasting challenges due to increased variability and unpredictability in traffic patterns.
Table 7 presents the average overall performance for long-term predictions from 1 to 7 days across road segments of varying complexities. The proposed SLIT consistently achieves the best results for both MAE and SMAPE metrics. Although SLIT often matches or exceeds the performance of PASS2S in long-term predictions, it is noteworthy that, in high variability road segments, SLIT ranks second to PASS2S regarding RMSE. This difference can possibly be explained by the design orientations of the two systems: while SLIT is designed to adapt across various prediction horizons, PASS2S focuses on long-term prediction with an attention mechanism that handles each segment independently, perhaps making it particularly effective in highly variable environments. SLIT’s broader attention mechanism, focusing on flexibility across different horizons, may not specifically target the variabilities in complex traffic scenarios as effectively as PASS2S, where RMSE becomes a critical metric. This observation suggests that although SLIT provides robust long-term prediction capabilities, there may be opportunities to refine its attention mechanism to better manage scenarios with high variability. Exploring these refinements could be a valuable direction for future work, potentially leading to enhanced RMSE management in complex traffic conditions for SLIT.
As we conclude our primary analysis of SLIT’s performance across various traffic scenarios, we have also included detailed experimental results for peak and off-peak hours in Appendix C. These results are aligned with the PASS2S [5] settings for consistency. These additional results provide a deeper dive into the model’s adaptability to different traffic densities and times, offering insights that complement the main findings discussed above.

6. Conclusions

This study addresses predicting travel times over both short-term and long-term intervals, a key requirement in Intelligent Transportation Systems (ITSs). We propose the Short-Term and Long-Term Integrated Transformer (SLIT), which harmonizes Enhanced Data Preprocessing (EDP) with the Short-Term and Long-Term Integrated Encoder–Decoder (SLIED). This combination enables SLIT to effectively handle the intricacies of traffic data across varied time horizons. Extensive evaluations on a large-scale real-world traffic dataset demonstrate the robust capabilities of SLIT compared with existing competitive methods in both short-term and long-term travel time predictions. Remarkable improvements are observed, with enhancements of up to 9.67%, 9.20%, and 8.66% in MAE, RMSE, and SMAPE, respectively, for short-term forecasting. Furthermore, the results highlight the notable competence of SLIT across road segments of varying complexities, demonstrating its adaptability and efficacy in diverse traffic scenarios. These results signify a significant advancement in ITS travel time prediction.

Author Contributions

Conceptualization, H.-T.C.L., H.D. and V.S.T.; methodology, H.-T.C.L., H.D. and V.S.T.; software, H.D.; validation, H.-T.C.L. and H.D.; formal analysis, H.-T.C.L. and V.S.T.; investigation, H.-T.C.L. and H.D.; resources, V.S.T.; data curation, H.-T.C.L. and H.D.; writing—original draft preparation, H.-T.C.L. and V.S.T.; writing—review and editing, H.-T.C.L. and V.S.T.; visualization, H.-T.C.L.; supervision, V.S.T.; project administration, V.S.T.; funding acquisition, V.S.T. All authors have read and agreed to the published version of the manuscript.

Funding

This research was supported in part by the National Science and Technology Council of Taiwan under grant nos. 111-2221-E-A49-124-MY3 and 112-2634-F-A49-005.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The original data presented in this study are openly available on the website of the Freeway Bureau of Taiwan, R.O.C., at https://tisvcloud.freeway.gov.tw (accessed on 10 March 2021).

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A. Algorithm for EDP

Algorithm A1 Enhanced Data Preprocessing.
1: Input: Historical data X
2: Output: Attribute-encoded periodic segments Ŝ
3: Define the range of X: X = [X_{τ−T+1}, …, X_{τ−1}, X_τ]
4: MDF: Fill missing data using linear interpolation
5: DS: Standardize the data using the Z-score: z = (X − μ) / σ
6: TAE: Encode each data point X_i into X̂_i ∈ R^m, where m = 316
7: procedure PS
8:     for d = 1 to D do
9:         Define S_d^L as the d-th long short-term segment
10:        t_d = τ − d × interval
11:        for l = 1 to T do
12:            s_{d,l}^L ← X̂_{t_d + l}
13:        end for
14:        S_d^L ← s_{d,1:T}^L
15:    end for
16:    S^L ← S_{1:D}^L
17:    Define S^S as the short-term segment
18:    t_k = τ − T
19:    for l = 1 to T do
20:        s_l^S ← X̂_{t_k + l}
21:    end for
22:    S^S ← s_{1:T}^S
23:    Combine S^L and S^S into S
24: end procedure
25: procedure PE
26:    Ŝ ← tensor of size (D′ × T) × m
27:    for pos ← 0 to (D′ × T) − 1 do
28:        for i ← 0 to m/2 − 1 do
29:            Ŝ(pos, 2i) ← sin(pos / 10000^{2i/d_model})
30:            Ŝ(pos, 2i+1) ← cos(pos / 10000^{2i/d_model})
31:        end for
32:    end for
33:    Ŝ ← Combine Ŝ with S
34: end procedure
35: Split: Divide Ŝ into Ŝ^S and Ŝ^L
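The PE procedure is the standard sinusoidal positional encoding of Vaswani et al. [26]. A NumPy sketch follows, under the assumption that d_model equals the feature dimension m = 316 and that D′ = 8 (D = 7 long short-term segments plus the short-term segment) with T = 12 slots per segment, as in Table 1.

```python
import numpy as np

def positional_encoding(seq_len: int, d_model: int) -> np.ndarray:
    """Sinusoidal positional encoding over seq_len = D' x T positions."""
    pe = np.zeros((seq_len, d_model))
    pos = np.arange(seq_len)[:, None]          # positions 0 .. seq_len - 1
    i = np.arange(0, d_model, 2)[None, :]      # even feature indices 2i
    angle = pos / np.power(10000.0, i / d_model)
    pe[:, 0::2] = np.sin(angle)                # even dimensions
    pe[:, 1::2] = np.cos(angle)                # odd dimensions
    return pe

pe = positional_encoding(seq_len=8 * 12, d_model=316)  # D' = 8 segments, T = 12
```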

Appendix B. Statistical Analysis of Predictive Performance

To validate our findings, we conducted a statistical analysis comparing the MAEs and RMSEs of the presented models for statistical significance. We selected the Mann–Whitney U test for its robustness in handling non-normally distributed data [31]. Our analysis utilized 7 days of data, as presented in Table 5, to assess the stability and consistency of the SLIT model under varying conditions. For the comparison with the best competitive method, PASS2S, in MAE, we employed a one-tailed test based on our directional hypothesis, anticipating superior performance from SLIT.
  • Null Hypothesis ( H 0 ): M A E S L I T M A E P A S S 2 S
  • Alternative Hypothesis ( H 1 ): M A E S L I T < M A E P A S S 2 S
Our findings (U statistic: 9.0, p-value: 0.0265) indicate that SLIT statistically outperforms PASS2S. This result is statistically significant as the p-value is less than 0.05, suggesting a significant advantage in decreasing errors across the prediction horizon.
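This one-tailed test can be reproduced with SciPy's `mannwhitneyu`, applied to the seven per-day MAE values from Table 5. This is a sketch under the assumption that the test was run on exactly these daily scores.

```python
from scipy.stats import mannwhitneyu

# Daily MAE values for SLIT and PASS2S over the 1-7 day horizons (Table 5)
slit_mae   = [27.745, 27.758, 27.670, 28.383, 28.354, 29.074, 28.906]
pass2s_mae = [28.295, 29.178, 29.035, 28.684, 28.914, 29.368, 29.028]

# H1: SLIT's MAE values are stochastically smaller than PASS2S's
u_stat, p_value = mannwhitneyu(slit_mae, pass2s_mae, alternative="less")
```

With these inputs, the exact test yields a U statistic of 9.0 and p ≈ 0.0265, matching the reported result.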
Conversely, for RMSE, initial tests were two-tailed to assess any performance differences without a predefined direction of superiority, given the competitive nature of SLIT and PASS2S in managing larger errors.
  • Null Hypothesis ( H 0 ): R M S E S L I T = R M S E P A S S 2 S
  • Alternative Hypothesis ( H 1 ): R M S E S L I T R M S E P A S S 2 S
The RMSE results (U statistic: 38.0, p-value: 0.9636) showed no significant difference, indicating that both models manage larger errors similarly. This finding suggests that there is no statistically significant difference in how each model handles larger errors under the tested conditions.
Given these outcomes, we extended our analysis to include one-tailed tests against other competitive models, hypothesizing that SLIT’s capabilities are likely to result in lower RMSE values.
  • Null Hypothesis ( H 0 ): R M S E S L I T R M S E O t h e r M o d e l s
  • Alternative Hypothesis ( H 1 ): R M S E S L I T < R M S E O t h e r M o d e l s
Results from these one-tailed tests, detailed in Table A1, confirm that SLIT significantly outperforms DE-SLSTM, MTSMFF, DHM, and TFT in RMSE reduction. This evidence, indicated by p-values well below 0.05, underscores that SLIT's RMSE advantage over these competitors is statistically significant under long-term prediction conditions.
Table A1. Statistical comparison results.

Competitive Methods | U Statistic | p-Value
DE-SLSTM | 0.0 | 0.00029
MTSMFF | 6.0 | 0.00874
DHM | 5.0 | 0.00554
TFT | 5.0 | 0.00554

Appendix C. Experimental Results for Peak and Off-Peak Hours

Appendix C.1. Short-Term Prediction

In the short-term prediction analysis, the performance of the proposed SLIT excels during both peak and off-peak hours, indicating its robustness under varying traffic conditions. These results are detailed in Table A2, where SLIT’s superior performance across all evaluation metrics can be observed.
Table A2. Comparison with competitive methods in short-term prediction across peak and off-peak hours.

Method | Peak MAE | Peak RMSE | Peak SMAPE (%) | Off-Peak MAE | Off-Peak RMSE | Off-Peak SMAPE (%)
HA [28] | 43.326 | 87.989 | 7.076 | 32.654 | 70.332 | 6.161
LSTM [29] | 36.152 | 81.328 | 5.541 | 21.001 | 57.223 | 3.931
DNN [30] | 32.605 | 68.975 | 5.342 | 21.827 | 53.581 | 4.189
DE-SLSTM [22] | 29.003 | 62.941 | 4.861 | 19.315 | 48.498 | 3.740
MTSMFF [23] | 32.490 | 70.536 | 5.837 | 23.742 | 54.207 | 4.825
DHM [8] | 26.035 | 64.071 | 4.449 | 17.284 | 47.132 | 3.448
TFT [24] | 44.852 | 89.494 | 8.029 | 28.362 | 62.695 | 5.620
PASS2S [5] | 28.222 | 61.011 | 4.778 | 18.738 | 46.972 | 3.668
SLIT | 22.667 | 53.892 | 3.846 | 16.656 | 42.846 | 3.295
Improvement ratio (%) | 12.938 | 11.668 | 13.555 | 3.634 | 8.783 | 4.432

Appendix C.2. Long-Term Prediction

For long-term predictions, the results are detailed in Table A3, representing the average performance over a 7-day period. SLIT does not perform as effectively in RMSE, particularly during peak hours. However, SLIT excels in MAE and SMAPE, outperforming all comparative methods during off-peak hours, and shows a performance that is comparable with that of PASS2S during peak hours.
Table A3. Comparison with competitive methods in long-term prediction across peak and off-peak hours.

Method | Peak MAE | Peak RMSE | Peak SMAPE (%) | Off-Peak MAE | Off-Peak RMSE | Off-Peak SMAPE (%)
HA [28] | 43.335 | 87.007 | 7.081 | 32.659 | 70.134 | 6.141
LSTM [29] | 43.440 | 89.895 | 6.958 | 29.011 | 69.080 | 5.488
DNN [30] | 41.065 | 84 | 6.861 | 29.219 | 66.087 | 5.681
DE-SLSTM [22] | 41.680 | 84.723 | 6.841 | 29.899 | 68.765 | 5.632
MTSMFF [23] | 41.230 | 83.999 | 7.026 | 28.193 | 63.141 | 5.509
DHM [8] | 43.506 | 86.560 | 7.459 | 29.609 | 64.803 | 5.808
TFT [24] | 46.439 | 91.406 | 7.965 | 29.023 | 62.462 | 5.705
PASS2S [5] | 37.596 | 78.879 | 6.211 | 27.105 | 62.947 | 5.197
SLIT | 37.879 | 81.329 | 6.152 | 26.248 | 63.620 | 4.992
Improvement ratio (%) | −0.753 | −3.106 | 0.948 | 3.162 | −1.855 | 3.935

References

  1. Qi, X.; Mei, G.; Tu, J.; Xi, N.; Piccialli, F. A Deep Learning Approach for Long-Term Traffic Flow Prediction With Multifactor Fusion Using Spatiotemporal Graph Convolutional Network. IEEE Trans. Intell. Transp. Syst. 2023, 24, 8687–8700. [Google Scholar] [CrossRef]
  2. Hou, Z.; Li, X. Repeatability and Similarity of Freeway Traffic Flow and Long-Term Prediction Under Big Data. IEEE Trans. Intell. Transp. Syst. 2016, 17, 1786–1796. [Google Scholar] [CrossRef]
  3. Li, R.; Hu, Y.; Liang, Q. T2F-LSTM Method for Long-term Traffic Volume Prediction. IEEE Trans. Fuzzy Syst. 2020, 28, 3256–3264. [Google Scholar] [CrossRef]
  4. Xie, Y.; Niu, J.; Zhang, Y.; Ren, F. Multisize Patched Spatial-Temporal Transformer Network for Short- and Long-Term Crowd Flow Prediction. IEEE Trans. Intell. Transp. Syst. 2022, 23, 21548–21568. [Google Scholar] [CrossRef]
  5. Huang, Y.; Dai, H.; Tseng, V.S. Periodic Attention-based Stacked Sequence to Sequence framework for long-term travel time prediction. Knowl.-Based Syst. 2022, 258, 109976. [Google Scholar] [CrossRef]
  6. Liu, Y.; Wang, Y.; Yang, X.; Zhang, L. Short-term travel time prediction by deep learning: A comparison of different LSTM-DNN models. In Proceedings of the 2017 IEEE 20th International Conference on Intelligent Transportation Systems (ITSC), Yokohama, Japan, 16–19 October 2017; pp. 1–8. [Google Scholar]
  7. Zhao, F.; Zeng, G.Q.; Lu, K.D. EnLSTM-WPEO: Short-Term Traffic Flow Prediction by Ensemble LSTM, NNCT Weight Integration, and Population Extremal Optimization. IEEE Trans. Veh. Technol. 2020, 69, 101–113. [Google Scholar] [CrossRef]
  8. Ting, P.Y.; Wada, T.; Chiu, Y.L.; Sun, M.T.; Sakai, K.; Ku, W.S.; Jeng, A.A.K.; Hwu, J.S. Freeway Travel Time Prediction Using Deep Hybrid Model – Taking Sun Yat-Sen Freeway as an Example. IEEE Trans. Veh. Technol. 2020, 69, 8257–8266. [Google Scholar] [CrossRef]
  9. Belhadi, A.; Djenouri, Y.; Djenouri, D.; Lin, J. A recurrent neural network for urban long-term traffic flow forecasting. Appl. Intell. 2020, 50, 3252–3265. [Google Scholar] [CrossRef]
  10. Li, Y.; Chai, S.; Ma, Z.; Wang, G. A Hybrid Deep Learning Framework for Long-Term Traffic Flow Prediction. IEEE Access 2021, 9, 11264–11271. [Google Scholar] [CrossRef]
  11. Reza, S.; Ferreira, M.; Machado, J.; Tavares, J. A Multi-head Attention-based Transformer Model for Traffic Flow Forecasting with a Comparative Analysis to Recurrent Neural Networks. Expert Syst. Appl. 2022, 202, 1–11. [Google Scholar] [CrossRef]
  12. Jin, D.; Shi, J.; Wang, R.; Li, Y.; Huang, Y.; Yang, Y.B. Trafformer: Unify Time and Space in Traffic Prediction. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, BC, Canada, 22–25 February 2023; Volume 37, pp. 8114–8122. [Google Scholar]
  13. Oluwasanmi, A.; Aftab, M.U.; Qin, Z.; Sarfraz, M.S.; Yu, Y.; Rauf, H.T. Multi-head spatiotemporal attention graph convolutional network for traffic prediction. Sensors 2023, 23, 3836. [Google Scholar] [CrossRef] [PubMed]
  14. Mashurov, V.; Chopurian, V.; Porvatov, V.; Ivanov, A.; Semenova, N. GCT-TTE: Graph Convolutional Transformer for Travel Time Estimation. arXiv 2023, arXiv:2301.07945. [Google Scholar] [CrossRef]
  15. Jiang, J.; Han, C.; Zhao, W.X.; Wang, J. PDFormer: Propagation Delay-Aware Dynamic Long-Range Transformer for Traffic Flow Prediction. In Proceedings of the AAAI conference on artificial intelligence, Vancouver, BC, Canada, 22–25 February 2023; Volume 37, pp. 4365–4373. [Google Scholar]
  16. Wu, L.; Wang, Y.Q.; Liu, J.B.; Shan, D.H. Developing a time-series speed prediction model using Transformer networks for freeway interchange areas. Comput. Electr. Eng. 2023, 110, 108860. [Google Scholar] [CrossRef]
  17. Chen, C.; Liu, Y.; Chen, L.; Zhang, C. Bidirectional Spatial-Temporal Adaptive Transformer for Urban Traffic Flow Forecasting. IEEE Trans. Neural Netw. Learn. Syst. 2022, 34, 6913–6925. [Google Scholar] [CrossRef] [PubMed]
  18. Zhou, H.; Zhang, S.; Peng, J.; Zhang, S.; Li, J.; Xiong, H.; Zhang, W. Informer: Beyond Efficient Transformer for Long Sequence Time-Series Forecasting. Proc. AAAI Conf. Artif. Intell. 2021, 35, 11106–11115. [Google Scholar] [CrossRef]
  19. Mingxing, X.; Dai, W.; Liu, C.; Gao, X.; Lin, W.; Qi, G.J.; Xiong, H. Spatial-Temporal Transformer Networks for Traffic Flow Forecasting. arXiv 2020, arXiv:2001.02908. [Google Scholar]
  20. Du, L.; Xin, J.; Labach, A.; Zuberi, S.; Volkovs, M.; Krishnan, R.G. MultiResFormer: Transformer with Adaptive Multi-Resolution Modeling for General Time Series Forecasting. arXiv 2023, arXiv:2311.18780. [Google Scholar]
  21. Lee, S.; Hong, J.; Liu, L.; Choi, W. TS-Fastformer: Fast Transformer for Time-series Forecasting. ACM Trans. Intell. Syst. Technol. 2024, 15, 1–20. [Google Scholar] [CrossRef]
  22. Chou, C.; Huang, Y.; Huang, C.; Tseng, V. Long-term traffic time prediction using deep learning with integration of weather effect. In Proceedings of the Advances in Knowledge Discovery and Data Mining —23rd Pacific-Asia Conference, PAKDD 2019, Macau, China, 14–17 April 2019. [Google Scholar] [CrossRef]
  23. Du, S.; Li, T.; Yang, Y.; Horng, S.J. Multivariate time series forecasting via attention-based encoder–decoder framework. Neurocomputing 2020, 388, 269–279. [Google Scholar] [CrossRef]
  24. Zhang, H.; Zou, Y.; Yang, X.; Yang, H. A temporal fusion transformer for short-term freeway traffic speed multistep prediction. Neurocomputing 2022, 500, 329–340. [Google Scholar] [CrossRef]
  25. Lin, Y.; Ge, L.; Li, S.; Zeng, B. Prior Knowledge and Data-Driven Based Long- and Short-Term Fusion Network for Traffic Forecasting. In Proceedings of the 2022 International Joint Conference on Neural Networks (IJCNN), Padua, Italy, 18–23 July 2022; pp. 1–8. [Google Scholar]
  26. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, L.; Polosukhin, I. Attention Is All You Need. In Proceedings of the Advances in Neural Information Processing Systems, Long Beach, CA, USA, 4–9 December 2017. [Google Scholar]
  27. Freeway Bureau Taiwan R.O.C. Taiwan Expressway Dataset. Available online: https://tisvcloud.freeway.gov.tw (accessed on 10 March 2021).
  28. Smith, B.L.; Demetsky, M.J. Traffic Flow Forecasting: Comparison of Modeling Approaches. J. Transp. Eng. 1997, 123, 261–266. [Google Scholar] [CrossRef]
  29. Hochreiter, S.; Schmidhuber, J. Long short-term memory. Neural Comput. 1997, 9, 1735–1780. [Google Scholar] [CrossRef] [PubMed]
  30. Qu, L.; Li, W.; Li, W.; Ma, D.; Wang, Y. Daily Long-Term Traffic Flow Forecasting Based on a Deep Neural Network. Expert Syst. Appl. 2019, 121, 304–312. [Google Scholar] [CrossRef]
  31. Ruland, F. The Wilcoxon-Mann-Whitney Test—An Introduction to Nonparametrics; Independently Published: USA, 2018; p. 77. [Google Scholar]
  32. Everitt, B.; Skrondal, A. The Cambridge Dictionary of Statistics; Cambridge University Press: Cambridge, UK, 2010. [Google Scholar]
Figure 1. The architecture of the proposed framework.
Figure 2. Enhanced data preprocessing.
Figure 3. Periodic segments.
Figure 4. Travel time variation by road segment complexity organized in a descending step line chart. The chart displays the standard deviation (std) of travel time as a blue line and the coefficient of variation (cv) as a red dashed line, plotted against the left and right y-axes, respectively. Segments are annotated on the chart with thresholds of 100 and 50 to indicate their variability categories—high, moderate, or low.
Table 1. Symbol descriptions.

Symbol | Description
S^L | Collection of long short-term segments S_d^L in R^{(D×T)×m}.
S^S | Short-term segment in R^{(1×T)×m}.
T | Number of data points per periodic segment, 12 time slots per hour.
t_d, t_k | Starting time step for S_d^L and S^S, respectively.
S | Combined periodic segments in R^{(D′×T)×m}, with m = 316.
D | Number of long short-term segments, set to 7 in this study.
D′ | Represents D + 1, denoting the combined periodic segments S.
interval | Duration of each period's time step, with 288 time slots per day.
pos, i | Position within Ŝ; i indexes the feature dimensions.
d_model | Size of the feature vector.
Table 2. Temporal attributes.

Attributes | Dimensions | Description
Holiday | 3 | Weekday, Weekend, National holiday
Day of the week | 7 | Monday to Sunday
Month | 12 | January to December
Peak | 4 | Morning-peak, Noon-peak, Night-peak, Off-peak
Time slot | 288 | # of time slots in each day (1 time slot = 5 min)
Table 3. Road segment information.

Area | ID | Highway | Name
North | nfb0019 | No. 1 | Neihu Interchange to Yuanshan Interchange
North | nfb0033 | No. 1 | Linkou Interchange to Taoyuan Interchange
North | nfb0370 | No. 5 | Toucheng Interchange to Pinglin Traffic Control Interchange
North | nfb0425 | No. 1 | Taishan Connector Interchange to Linkou Interchange
North | nfb0431 | No. 1 | Elevated Yangmei End to Hukou Interchange
Central | nfb0061 | No. 1 | Hsinchu System Interchange to Toufen Interchange
Central | nfb0063 | No. 1 | Toufen Interchange to Touwu Interchange
Central | nfb0064 | No. 1 | Touwu Interchange to Toufen Interchange
Central | nfb0247 | No. 3 | Tongxiao Interchange to Yuanli Interchange
Central | nfb0248 | No. 3 | Yuanli Interchange to Tongxiao Interchange
South | nfb0117 | No. 1 | Chiayi System Interchange to Xinying Service Area
South | nfb0123 | No. 1 | Xinying Interchange to Xiaying System Interchange
South | nfb0124 | No. 1 | Xiaying System Interchange to Xinying Interchange
South | nfb0327 | No. 3 | Tianliao Interchange to Yanchao System Interchange
South | nfb0328 | No. 3 | Yanchao System Interchange to Tianliao Interchange
Table 4. Comprehensive comparison with competitive methods across short- and long-term prediction.

Method | Short-Term MAE | Short-Term RMSE | Short-Term SMAPE (%) | Long-Term MAE | Long-Term RMSE | Long-Term SMAPE (%)
HA [28] | 35.036 | 75.516 | 6.408 | 34.516 | 74.000 | 6.304
LSTM [29] | 23.626 | 62.972 | 4.210 | 31.520 | 74.062 | 5.743
DNN [30] | 23.695 | 57.216 | 4.389 | 31.279 | 70.295 | 5.886
DE-SLSTM [22] | 20.994 | 51.870 | 3.934 | 31.948 | 72.549 | 5.842
MTSMFF [23] | 26.048 | 59.649 | 5.092 | 31.639 | 69.964 | 5.910
DHM [8] | 19.591 | 52.872 | 3.712 | 33.281 | 71.896 | 6.245
TFT [24] | 31.964 | 70.441 | 6.118 | 32.316 | 70.084 | 6.089
PASS2S [5] | 20.381 | 50.250 | 3.860 | 28.929 | 66.695 | 5.373
SLIT | 17.697 | 45.628 | 3.391 | 28.270 | 67.794 | 5.194
Improvement ratio (%) | 9.67 | 9.20 | 8.66 | 2.28 | −1.65 | 3.33
Table 5. Comparison with competitive methods in long-term prediction over 7-day horizons.

Metric | Method | 1 Day | 2 Days | 3 Days | 4 Days | 5 Days | 6 Days | 7 Days
MAE | DE-SLSTM [22] | 30.154 | 32.390 | 32.138 | 31.442 | 32.297 | 33.285 | 31.926
MAE | MTSMFF [23] | 32.093 | 32.389 | 31.667 | 31.070 | 31.586 | 31.145 | 31.520
MAE | DHM [8] | 33.056 | 33.521 | 33.663 | 33.971 | 33.354 | 33.770 | 31.635
MAE | TFT [24] | 31.781 | 32.949 | 31.622 | 31.816 | 32.651 | 32.966 | 32.429
MAE | PASS2S [5] | 28.295 | 29.178 | 29.035 | 28.684 | 28.914 | 29.368 | 29.028
MAE | SLIT | 27.745 | 27.758 | 27.670 | 28.383 | 28.354 | 29.074 | 28.906
RMSE | DE-SLSTM [22] | 71.720 | 75.150 | 74.403 | 70.567 | 70.850 | 74.206 | 70.945
RMSE | MTSMFF [23] | 71.692 | 71.533 | 69.482 | 69.014 | 70.718 | 69.537 | 67.771
RMSE | DHM [8] | 74.553 | 72.640 | 71.874 | 72.127 | 71.246 | 72.920 | 67.911
RMSE | TFT [24] | 70.021 | 72.044 | 69.479 | 69.227 | 70.322 | 71.335 | 68.161
RMSE | PASS2S [5] | 67.431 | 67.937 | 66.930 | 65.891 | 67.171 | 67.267 | 64.238
RMSE | SLIT | 68.808 | 68.299 | 65.951 | 68.345 | 68.044 | 69.401 | 65.711
SMAPE (%) | DE-SLSTM [22] | 5.492 | 5.810 | 5.856 | 5.814 | 5.997 | 6.058 | 5.866
SMAPE (%) | MTSMFF [23] | 5.955 | 6.020 | 5.920 | 5.866 | 5.854 | 5.842 | 5.910
SMAPE (%) | DHM [8] | 6.154 | 6.311 | 6.322 | 6.422 | 6.256 | 6.297 | 5.950
SMAPE (%) | TFT [24] | 5.984 | 6.162 | 5.978 | 6.017 | 6.173 | 6.193 | 6.118
SMAPE (%) | PASS2S [5] | 5.238 | 5.367 | 5.423 | 5.341 | 5.359 | 5.451 | 5.433
SMAPE (%) | SLIT | 5.090 | 5.070 | 5.134 | 5.201 | 5.246 | 5.296 | 5.321
Table 6. Comparison of short-term prediction results on different types of road segments against competitive methods.
| Method | High MAE | High RMSE | High SMAPE (%) | Moderate MAE | Moderate RMSE | Moderate SMAPE (%) | Low MAE | Low RMSE | Low SMAPE (%) |
|---|---|---|---|---|---|---|---|---|---|
| HA [28] | 68.907 | 141.653 | 10.191 | 25.554 | 68.178 | 6.935 | 13.130 | 25.293 | 2.903 |
| LSTM [29] | 46.336 | 117.844 | 6.619 | 17.772 | 58.566 | 4.637 | 8.604 | 20.183 | 1.918 |
| DNN [30] | 45.047 | 100.781 | 6.739 | 20.179 | 60.147 | 5.305 | 8.245 | 18.959 | 1.819 |
| DE-SLSTM [22] | 38.638 | 87.794 | 5.849 | 18.308 | 57.005 | 4.776 | 8.081 | 18.510 | 1.777 |
| MTSMFF [23] | 44.152 | 97.656 | 6.939 | 22.695 | 62.921 | 6.051 | 13.198 | 25.795 | 2.912 |
| DHM [8] | 34.706 | 88.262 | 5.254 | 17.805 | 57.428 | 4.608 | 8.185 | 20.343 | 1.830 |
| TFT [24] | 57.547 | 124.223 | 8.868 | 25.550 | 68.055 | 6.912 | 14.919 | 27.214 | 3.297 |
| PASS2S [5] | 37.125 | 82.990 | 5.697 | 17.847 | 55.799 | 4.648 | 8.118 | 19.268 | 1.806 |
| SLIT | 31.454 | 74.005 | 4.852 | 15.928 | 52.754 | 4.149 | 7.412 | 17.229 | 1.667 |
| Improvement ratio (%) | 9.37 | 10.83 | 7.65 | 10.38 | 5.46 | 9.96 | 8.28 | 6.92 | 6.19 |
Table 7. Comparison of long-term prediction results on different types of road segments against competitive methods.
| Method | High MAE | High RMSE | High SMAPE (%) | Moderate MAE | Moderate RMSE | Moderate SMAPE (%) | Low MAE | Low RMSE | Low SMAPE (%) |
|---|---|---|---|---|---|---|---|---|---|
| HA [28] | 67.888 | 140.256 | 10.006 | 24.913 | 63.929 | 6.788 | 13.107 | 25.501 | 2.896 |
| LSTM [29] | 60.085 | 140.170 | 8.734 | 23.520 | 63.769 | 6.311 | 13.049 | 25.833 | 2.873 |
| DNN [30] | 59.056 | 129.579 | 8.976 | 25.511 | 63.939 | 6.899 | 11.977 | 25.129 | 2.637 |
| DE-SLSTM [22] | 61.200 | 135.985 | 8.970 | 23.805 | 63.383 | 6.395 | 12.999 | 25.796 | 2.867 |
| MTSMFF [23] | 58.257 | 126.716 | 8.675 | 24.828 | 64.113 | 6.706 | 13.997 | 26.571 | 3.074 |
| DHM [8] | 61.833 | 130.079 | 9.316 | 25.607 | 64.545 | 6.985 | 14.604 | 28.311 | 3.191 |
| TFT [24] | 59.128 | 126.357 | 8.880 | 25.252 | 63.877 | 6.858 | 14.683 | 27.328 | 3.250 |
| PASS2S [5] | 54.747 | 121.140 | 8.217 | 22.404 | 61.597 | 5.996 | 11.764 | 24.723 | 2.587 |
| SLIT | 53.264 | 124.927 | 7.809 | 22.132 | 61.202 | 5.895 | 11.534 | 24.578 | 2.547 |
| Improvement ratio (%) | 2.71 | −3.13 | 4.97 | 1.22 | 0.64 | 1.70 | 1.95 | 0.59 | 1.55 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Lin, H.-T.C.; Dai, H.; Tseng, V.S. Short-Term and Long-Term Travel Time Prediction Using Transformer-Based Techniques. Appl. Sci. 2024, 14, 4913. https://doi.org/10.3390/app14114913
