Article

Robust Truck Transit Time Prediction through GPS Data and Regression Algorithms in Mixed Traffic Scenarios

by Adel Ghazikhani 1,2, Samaneh Davoodipoor 1, Amir M. Fathollahi-Fard 3,4,*, Mohammad Gheibi 5,6 and Reza Moezzi 6,7

1 Department of Computer Engineering, Imam Reza International University, Mashhad 178-436, Iran
2 Big Data Lab, Imam Reza International University, Mashhad 178-436, Iran
3 Département d’Analytique, Opérations et Technologies de l’Information, Université de Québec à Montreal, 315, Sainte-Catherine Street East, Montreal, QC H2X 3X2, Canada
4 Department of Engineering Science, Faculty of Innovation Engineering, Macau University of Science and Technology, Macau 999078, China
5 Institute for Nanomaterials, Advanced Technologies, and Innovation, Technical University of Liberec, 461 17 Liberec, Czech Republic
6 Faculty of Mechatronics, Informatics and Interdisciplinary Studies, Technical University of Liberec, 461 17 Liberec, Czech Republic
7 Association of Talent Under Liberty in Technology (TULTECH), 10615 Tallinn, Estonia
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(13), 2004; https://doi.org/10.3390/math12132004
Submission received: 31 March 2024 / Revised: 31 May 2024 / Accepted: 27 June 2024 / Published: 28 June 2024

Abstract:
To enhance safety and efficiency in mixed traffic scenarios, it is crucial to predict freight truck traffic flow accurately. Issues arise due to the interactions between freight trucks and passenger vehicles, leading to problems like traffic congestion and accidents. Utilizing data from the Global Positioning System (GPS) is a practical method to enhance comprehension and forecast the movement of truck traffic. This study primarily focuses on predicting truck transit time, which involves accurately estimating the duration it will take for a truck to travel between two locations. Precise forecasting has significant implications for truck scheduling and urban planning, particularly in the context of cross-docking terminals. Regression algorithms are beneficial in this scenario due to the empirical evidence confirming their efficacy. This study aims to achieve accurate travel time predictions for trucks by utilizing GPS data and regression algorithms. This research utilizes a variety of algorithms, including AdaBoost, GradientBoost, XGBoost, ElasticNet, Lasso, KNeighbors, Linear, LinearSVR, and RandomForest. The research provides a comprehensive assessment and discussion of important performance metrics, including Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared (R2). Based on our research findings, combining empirical methods, algorithmic knowledge, and performance evaluation helps to enhance truck travel time prediction. This has significant implications for logistical efficiency and transportation dynamics.

1. Introduction

The precise forecasting of heavy truck movements’ timing and location is crucial for the efficient operation of cross-docking terminals, which are integral to the management of consumable goods, just-in-time inventory strategies, and rapid distribution across various locations [1,2,3,4]. Over the last three decades, numerous studies on transportation have focused on generating short-term forecasts for traffic variables, including traffic flow patterns, flow velocity, travel durations, and matchmaking distances [5,6,7]. Estimating travel times for commercial vehicles in urban environments presents significant challenges due to limited observed data, numerous origin–destination pairs, and variations in travel times primarily caused by traffic-related delays [8,9,10].
Traditional data collection methods provide general insights but fall short of the in-depth analysis required for modern transportation dynamics [11,12]. The integration of GPS data in traffic monitoring and strategic planning has increased steadily over the past two decades, driven by significant technological advancements [13,14]. This integration is particularly relevant in regions like Iran, which serves as a critical junction for the transportation of goods between European nations and Central Asia, including Turkmenistan, Tajikistan, Kazakhstan, Kyrgyzstan, and Uzbekistan. Iran’s strategic location offers diverse opportunities for trade and transportation interactions.
The accurate utilization of GPS data is essential for understanding and predicting truck traffic behavior in complex scenarios. The numerous border crossings and maritime gateways in Iran facilitate the intricate process of transporting goods across the country, providing trucks with multiple entry and exit options. This predictive model extends beyond logistics, impacting border traffic forecasting by leveraging the consistent trends observed in the trucking industry, where trucks return to the country with or without cargo after delivering shipments abroad [15].
This paper presents a comprehensive framework, combining GPS data with advanced machine learning regression algorithms to forecast travel times for freight trucks accurately. The study explores how to maximize the efficiency of goods in transit through Iran’s strategic location, offering valuable insights for logistics planning, border traffic estimation in strategic trade contexts, and cross-dock management. By examining the intricacies of trucking operations, including border crossings, road transport, and maritime gateways, the research enhances our understanding of these processes.
Key contributions of this study include:
  • Introducing a novel method that integrates GPS data with advanced machine learning regression algorithms to forecast freight truck travel times accurately.
  • Providing valuable insights for logistics planning and border traffic estimation through an exploration of goods’ transit efficiency in Iran’s strategic trade corridors.
  • Enhancing the depth of analysis by examining various factors influencing trucking operations, such as border crossings and maritime gateways.
  • Offering a unique and detailed visualization of truck trajectories, grouped by date and time, to improve the understanding of movement patterns.
  • Establishing a foundation for informed decision-making in transportation, logistics, and cross-docking terminals by advancing knowledge in travel time prediction and truck traffic flow.
The structure of this paper is as follows: Section 2 presents a literature review to identify research gaps forming the basis of our contributions. Section 3 explains the research method, which involves advanced machine learning regression algorithms. Section 4 provides detailed findings, potential scenarios, and data analysis related to predicting journey times and truck traffic trends. Section 5 discusses practical recommendations and managerial perspectives on cross-docking terminals, transportation, and logistics planning. Finally, Section 6 concludes the research by summarizing the findings, constraints, and potential avenues for future research.

2. Literature Review

Case studies from China, the US, and Europe underscore the significance of accurate truck scheduling and urban planning, particularly for cross-docking terminals [16,17]. Various prediction techniques have been employed in truck scheduling research, with recent studies focusing on data mining and machine learning methods [18,19,20]. Below, we examine key contributions in this field.
One of the earliest studies, conducted by Zhao and Goodchild [21], investigated port drayage, a critical aspect of intermodal maritime systems affecting supply chain efficiency. By leveraging truck entry data, they improved port drayage and operational effectiveness. Their research included a reliability assessment of travel time changes across drayage networks, an analysis of truck routing choices, and a method to predict 95% confidence intervals for travel times between origin–destination pairs using GPS data. Morgul et al. [22] assessed the role of GPS data in transportation planning and proposed an integrated strategy to use robust GPS data for predicting travel times for commercial vehicles. Their study showed that the travel times derived from taxi GPS data closely matched those of trucks, suggesting the potential for scaling taxi GPS data to enhance truck travel time insights. Despite limited truck GPS data, taxi GPS data provided citywide travel time estimations, showcasing an innovative synergy between existing data sources and effective estimation strategies. Moniruzzaman et al. [1] utilized one-month volume data from remote microwave traffic sensors and one-year GPS data to develop two sets of artificial neural network (ANN) models. These models predicted short-term truck volumes at a specific crossing, bridge clearance times, and traversal durations. Separate ANN models were trained for volume prediction using a multi-layer feedforward neural network with backpropagation. The predicted crossing times from the ANN models showed a strong correlation with observed values, confirmed by evaluation indices, demonstrating the robust predictive capability of the models.
Jiang [23] focused on predicting bus transit times using bus GPS data and artificial neural networks. Accurate predictions were essential for urban transportation planning and optimizing bus schedules. Jiang introduced three predictive models for travel time estimation, each based on a three-layer neural network architecture. The first model predicted total travel time using calculated features from bus GPS data. The second model utilized information from preceding buses to predict segment travel times, and the third combined segment predictions to estimate the total route travel time. Wang et al. [24] presented an innovative method for predicting truck traffic flow using sampled GPS data within road networks. They employed a two-stage framework: expansion and prediction. The expansion phase used a piecewise constant coefficient method to align the sampled and actual truck flows, considering road gradients and traffic flow magnitudes. The prediction phase applied Long Short-Term Memory (LSTM) and Gated Recurrent Unit (GRU) methodologies, significantly improving prediction accuracy. In recent research, Demissie and Kattan [25] explored large-scale GPS data streams to estimate truck origin–destination flows. They developed an exploratory framework to identify significant events such as truck halts and trips, supported by the Pearson correlation coefficient and an entropy measure. This approach facilitated the comparative analysis of truck movement patterns and identified potential shifts in truck travel dynamics over a year. The researchers then used a multinomial logistic model to develop destination choice models across five time intervals.
Table 1 summarizes the current data mining and machine learning methodologies based on the referenced studies and research environment. Notably, none of these studies combine GPS data with machine learning techniques, such as AdaBoost, GradientBoost, XGBoost, ElasticNet, Lasso, KNeighbors, Linear, LinearSVR, and RandomForest for predicting the arrival times of heavy trucks in transportation systems and cross-docking terminals.
To examine the landscape of heavy truck arrival time prediction more deeply, we introduce a comprehensive Sankey diagram, illustrated in Figure 1. This diagram provides an insightful visual representation of prominent research trends at the confluence of transportation, machine learning, and predictive analysis. Leveraging data sourced from the Scopus database in July 2023, the diagram synthesizes the interrelationships among countries, keywords, and primary sources within this domain.
The diagram distinctly accentuates three pivotal countries that have emerged as influential drivers in shaping this research terrain: China, the United States, and India. These nations have consistently exhibited remarkable prominence in propelling advancements at the nexus of transportation and machine learning. Moreover, the diagram highlights three predominant keywords that have captured substantial attention within the discourse: “machine learning”, “deep learning”, and “prediction”. These keywords encapsulate the essence of research endeavors focused on leveraging computational intelligence to elevate predictive modelling and analytical capacities within transportation systems. Significantly, the Sankey diagram also spotlights the primary sources that serve as critical conduits for disseminating cutting-edge research outcomes in this domain. Noteworthy among these are the journals “Transportation Research Part C: Emerging Technologies”, “IEEE Access”, and “IEEE Transactions on Intelligent Transportation Systems”.
Realizing the critical importance of predicting vehicle timing and location on both domestic and international roads, particularly in regions like Iran where research in this area is scarce, we initiated a comprehensive investigation into this problem. While extensive studies have focused on truck scheduling, urban planning, and the application of machine learning in transportation, there remains a significant gap in accurately predicting heavy truck transit times using GPS data combined with advanced machine learning algorithms. Existing studies have largely utilized traditional data sources or applied machine learning techniques to related but distinct problems, such as bus transit time prediction or using taxi GPS data for estimating truck travel times. None have comprehensively integrated GPS data with a wide array of machine learning algorithms specifically tailored for predicting heavy truck arrival times in cross-docking terminals and complex transportation networks.
This paper addresses this gap by introducing a novel approach that leverages GPS data alongside advanced regression algorithms, including AdaBoost, GradientBoost, XGBoost, ElasticNet, Lasso, KNeighbors, Linear, LinearSVR, and RandomForest. To facilitate accurate prediction, the geographic coordinates are converted into real addresses, followed by a normalization process using the MinMax method to account for differences in time and place. Data segmentation then categorizes each route based on parameters such as date and time, allowing for a detailed visualization of each truck’s full path and location on specific dates through scatter plots. This approach not only enhances the visualization of truck trajectories but also connects their movement paths chronologically.
For evaluation, the study employs the k-fold method, utilizing 80% of the dataset for training and 20% for thorough testing. The integration of the K-Nearest Neighbors (KNN) algorithm with the Leave One Out technique further refines the evaluation framework, streamlining both the training and testing phases. By doing so, this research not only improves the accuracy of truck travel time predictions but also provides valuable insights for logistics planning, border traffic estimation, and the optimization of cross-dock operations. This comprehensive integration of empirical data and machine learning techniques offers a robust framework for enhancing the efficiency and reliability of truck transit time predictions, thereby filling a critical void in the existing body of research.

3. Methodology and Empirical Applications

Accurate freight truck traffic flow prediction is crucial for solving problems in urban planning, truck scheduling, and cross-docking terminals. These predictions help urban planners optimize traffic management, allocate resources, and design road infrastructure for freight truck movements, ultimately reducing congestion and accidents and improving traffic safety. In logistics, predictive truck scheduling enables companies to streamline operations by optimizing delivery schedules, reducing idle time, and enhancing fleet efficiency. This leads to reduced costs and improved sustainability. Cross-docking terminals, which are critical for time-sensitive goods and just-in-time inventory management, benefit from accurate predictions by optimizing inbound and outbound shipment coordination, minimizing wait times, and ensuring smooth goods flow. Our research provides significant benefits for urban infrastructure planning, logistics operations, and cross-docking terminal efficiency, extending beyond algorithmic advancements (Figure 2).
Within this paper, we first describe the data preprocessing (Section 3.1), followed by the regression algorithms (Section 3.2) and the selection of suitable time series techniques (Section 3.3). Section 3.4 then details the steps of our full algorithmic framework, and Section 3.5 describes model validation. Figure 3 shows the conceptual roadmap of our research methodology.

3.1. Data Preprocessing

We began by collecting raw GPS data from freight trucks navigating various routes in mixed traffic scenarios. The initial step involved cleaning the data to address outliers and missing values. We then normalized the features using the MinMax method to ensure equal contribution during model training. Additionally, we engineered features such as road type and truck characteristics to enhance predictive performance.
For feature selection, we employed correlation analysis, feature importance assessment, and dimensionality reduction techniques to identify the most informative subset of features. This selection process was guided by domain-specific insights, ensuring the inclusion of factors known to influence truck transit time.
By systematically preprocessing the data, we ensured that our models were trained on high-quality, relevant information, thereby enhancing the accuracy and reliability of our predictions. The following sections provide detailed descriptions of the regression algorithms employed and the selection of time series techniques, followed by the intricate steps of our algorithm development process.
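As a minimal sketch of the cleaning and MinMax steps described above (the record layout here is an assumption for illustration; the dataset's exact schema is introduced later in Section 4), the preprocessing could look like:

```python
# Illustrative preprocessing sketch: drop records with missing values,
# then MinMax-scale a numeric feature to [0, 1].

def minmax_scale(values):
    """Scale a list of numbers to [0, 1], as in MinMax normalization."""
    lo, hi = min(values), max(values)
    if hi == lo:                       # constant feature: map everything to 0.0
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def clean_and_scale(records, key):
    """Drop records missing `key`, then attach a MinMax-scaled copy of it."""
    kept = [r for r in records if r.get(key) is not None]
    scaled = minmax_scale([r[key] for r in kept])
    for r, s in zip(kept, scaled):
        r[key + "_scaled"] = s
    return kept

records = [{"speed": 62.0}, {"speed": None}, {"speed": 48.0}, {"speed": 90.0}]
cleaned = clean_and_scale(records, "speed")
print([round(r["speed_scaled"], 3) for r in cleaned])  # [0.333, 0.0, 1.0]
```

In practice a library scaler (e.g., scikit-learn's MinMaxScaler) would be fit on the training split only, to avoid leaking test-set statistics.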

3.2. Regression

One of the most popular methods to model data is linear regression, which has a robust and simple mathematical foundation [28]. This method is beneficial for identifying linear correlations between two variables, enabling the estimation of one variable’s value based on another. In this context, a linear relationship indicates that a change in one variable directly impacts the other [42]. The independent variable, the one driving this change, is a key component of the model.
A scatter plot is used to depict the relationship between two variables by plotting one against the other [43]. If the graph forms a straight line, it indicates a linear relationship between the variables [28]. Four conditions must be satisfied to validate the proposed model that defines the link between the data and the dependent variable:
  • Linearity: the relationship between the variables should manifest as a linear pattern on the scatter plot.
  • Independence: each data point should remain independent, without any connection or reliance on others.
  • Homoscedasticity: the variability in the dependent variable should remain consistent across the range of the independent variable, with data points scattering similarly around the regression line for all values of the independent variable.
  • Normality: The residuals, representing the differences between observed and predicted values, should follow a normal distribution. This condition is crucial for hypothesis testing and confidence interval calculations.
Ensuring these conditions are met enhances confidence in the accuracy of the proposed model, establishing a strong foundation for further investigation.
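For illustration, a one-predictor linear fit and its residuals, whose distribution the normality condition concerns, can be computed in closed form (the data below are a toy example, not the study's dataset):

```python
# Ordinary least squares for a single predictor, plus residuals for
# checking the model conditions listed above.

def fit_line(x, y):
    """Return (slope, intercept) minimizing the sum of squared residuals."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept

x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.0, 8.1, 9.9]          # roughly y = 2x
slope, intercept = fit_line(x, y)
residuals = [yi - (slope * xi + intercept) for xi, yi in zip(x, y)]
print(round(slope, 2), round(intercept, 2))  # 1.98 0.06
```

Plotting the residuals against x (or against the fitted values) is the usual visual check for homoscedasticity; a histogram or Q-Q plot of the residuals checks normality.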

3.3. Time Series

The data windowing approach is also utilized in data prediction. A time series can sometimes be viewed as a regression problem in which “time” serves as the independent variable [44]. The primary objective of time series analysis is to forecast the future value of the dependent variable. Stationarity, the property that a series’ statistical characteristics remain constant over time, is crucial for effective analysis [45].
Time windows are essential for determining the long-term movement of a truck or vehicle. Forecasting time depends on the location parameter (Loc). In our research, we examine several time frames, such as 1, 3, 5, 7, 9, and 11 units. By carefully analyzing outcomes from different time frame configurations, we gain a deeper understanding of the effectiveness of our predictive models.
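The windowing step above can be sketched as follows: for a chosen window width, each set of consecutive past values becomes a feature vector whose target is the next value (a simplified 1-D location index is assumed here for illustration):

```python
# Sliding-window construction: `width` past values -> next value,
# for window widths such as 1, 3, 5, 7, 9, 11.

def make_windows(series, width):
    """Return (features, target) training pairs from a time series."""
    pairs = []
    for i in range(len(series) - width):
        pairs.append((series[i:i + width], series[i + width]))
    return pairs

loc = [10, 12, 15, 19, 24, 30]         # toy location index per time step
for width in (1, 3, 5):
    pairs = make_windows(loc, width)
    print(width, len(pairs), pairs[0])
```

Note the trade-off visible in the output: wider windows give each sample more history but leave fewer training pairs, which is why comparing several widths, as done here, is informative.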

3.4. Full Algorithmic Framework

Figure 4 displays the flowchart of the entire procedure included in our comprehensive method. The process begins with loading and reading raw data. Geographical coordinates must be converted into a uniform numerical representation due to the various data formats used. Data grouping is performed to enable the creation of a scatter plot. Simultaneously, data undergoes time windowing to enhance the quality of our study.
The core phase, moving ahead, involves the machine learning process of training and testing. This phase concludes with the development of a machine learning model that is carefully designed using regression algorithms. We improve our ability to forecast by utilizing the k-fold approach to enhance the accuracy of our predictions.
Here is an overview of each ensemble algorithm utilized in our research:
  • AdaBoost (Adaptive Boosting): AdaBoost is an ensemble learning method that combines multiple weak classifiers to create a strong classifier. It iteratively adjusts the weights of incorrectly classified instances to focus on the difficult-to-classify samples. Each weak classifier is trained sequentially, and its predictions are combined using a weighted majority vote.
  • GradientBoosting: Gradient boosting is a machine learning technique that builds a strong predictive model by sequentially fitting new models to the residuals of the previous models. It minimizes a loss function by iteratively adding new decision trees, where each tree is trained to correct the errors of the previous ones.
  • XGBoost (Extreme Gradient Boosting): XGBoost is an optimized implementation of gradient boosting that offers improvements in speed and performance. It incorporates features such as parallelized tree construction and hardware optimization to achieve state-of-the-art results in many machine learning tasks.
  • ElasticNet: ElasticNet is a regularization technique that combines the penalties of both the L1 (Lasso) and L2 (Ridge) regularization methods. It is used to address multicollinearity and perform feature selection by encouraging sparse coefficients while still allowing for correlated predictors.
  • Lasso (Least Absolute Shrinkage and Selection Operator): Lasso is a linear regression method that performs both variable selection and regularization by adding a penalty term to the absolute values of the regression coefficients. It encourages sparsity in the model by shrinking some coefficients to zero, effectively performing feature selection.
  • KNeighbors (K-Nearest Neighbors): KNeighbors is a non-parametric algorithm used for classification and regression tasks. It predicts the output of a data point by averaging the target values of its k nearest neighbors in the feature space.
  • Linear Regression: Linear regression is a simple linear model that predicts the target variable as a linear combination of the input features. It is widely used for regression tasks when the relationship between the features and the target variable is assumed to be linear.
  • LinearSVR (Linear Support Vector Regression): LinearSVR is a variant of support vector regression that uses a linear kernel, fitting a hyperplane directly in the input feature space rather than mapping to a higher-dimensional one. It seeks the hyperplane that best fits the training data within an ε-insensitive margin.
  • RandomForest: RandomForest is an ensemble learning method that constructs a multitude of decision trees during training and outputs the average prediction of the individual trees. It improves upon the decision tree algorithm by reducing overfitting and increasing robustness.
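To make one of the listed methods concrete, here is a minimal pure-Python version of KNeighbors regression: the target of a query point is predicted as the mean target of its k nearest training points. (This is an illustrative re-implementation for a single feature; the study itself relies on library implementations of all nine algorithms.)

```python
# Minimal 1-D K-Nearest Neighbors regression.

def knn_predict(train_x, train_y, query, k):
    """Predict the target of `query` as the mean of its k nearest neighbours."""
    ranked = sorted(zip(train_x, train_y), key=lambda p: abs(p[0] - query))
    nearest = ranked[:k]
    return sum(y for _, y in nearest) / k

train_x = [1.0, 2.0, 3.0, 10.0]
train_y = [10.0, 20.0, 30.0, 100.0]
print(knn_predict(train_x, train_y, query=2.5, k=2))  # mean of 20 and 30 -> 25.0
```

The same averaging idea extends to multiple features by replacing the absolute difference with a Euclidean distance over the feature vector.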

3.5. Model Validation

Ensuring the dependability and accuracy of our predictive models is crucial for the strength of our research. The validation phase evaluates the accuracy of our created models in forecasting truck transit times by analyzing the collected GPS location data.
To validate our models, we adopted a rigorous approach, employing the following key methodologies:
  • Dataset Splitting: The dataset was divided into training and testing sets. The training set was utilized for model training, while the testing set remained unseen during the training phase, allowing us to evaluate the model’s generalization to new, unseen data.
  • Evaluation Metrics: We employed standard evaluation metrics to assess the performance of our models. Key metrics include Mean Squared Error (MSE), Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and R-squared (R2). These metrics provide a comprehensive understanding of the model’s accuracy and predictive capabilities.
  • Cross-Validation: To further enhance the robustness of our models, we implemented cross-validation techniques. This involved dividing the dataset into multiple folds, training the model on subsets of the data, and evaluating its performance across different subsets. This approach helps mitigate overfitting and ensures the model’s consistency across various data partitions.
  • Comparison with Baseline: We compared the performance of our models against a baseline model, such as simple linear regression. This comparison quantifies the added value of our proposed approach.
Through these validation procedures, this study aims to ascertain the effectiveness of our models in accurately predicting truck transit times from GPS location data. The results of this validation process are presented in the next section.
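The validation loop above can be sketched in plain Python: a k-fold index splitter plus the four reported metrics (MSE, RMSE, MAE, R2), shown here on toy values rather than the study's data:

```python
import math

def kfold_indices(n, k):
    """Yield (train_idx, test_idx) for k contiguous folds over n samples."""
    fold = n // k
    for f in range(k):
        test = list(range(f * fold, (f + 1) * fold if f < k - 1 else n))
        train = [i for i in range(n) if i not in test]
        yield train, test

def metrics(y_true, y_pred):
    """Compute the four evaluation metrics reported in the paper."""
    n = len(y_true)
    mse = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    mae = sum(abs(t - p) for t, p in zip(y_true, y_pred)) / n
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    r2 = 1 - mse * n / ss_tot if ss_tot else float("nan")
    return {"MSE": mse, "RMSE": math.sqrt(mse), "MAE": mae, "R2": r2}

y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
print(metrics(y_true, y_pred))
```

In a full run, each fold's model is trained on the train indices and scored on the test indices, and the per-fold metrics are averaged; randomized fold assignment (rather than the contiguous folds sketched here) is typically preferred when records are time-ordered groups.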

4. Results

This research is based on a detailed dataset consisting of 14 Excel files specifically organized to store truck tracking data from 17 June 2021 to 3 August 2022. Each file contains 1,048,576 records, the maximum row count of an Excel worksheet. The data cover truck routes from various cities in Iran to cities in Turkey, Turkmenistan, Tajikistan, and Afghanistan (Figure 5).
The dataset contains crucial attributes such as truck number, x-coordinate position, y-coordinate position, Gregorian date, and time. The sole concern is converting the coordinates into geographic locations, for which Section 4.1 provides a detailed explanation.
Regression techniques are well-suited for estimating truck trip times due to their compatibility and effectiveness when used in combination. The full year of truck monitoring data enables the platform to generate precise forecasts of truck arrival times and locations. This section delves into the insights obtained by utilizing GPS data to predict the travel time of a truck, compiling a comprehensive analysis from several perspectives on the outcomes of predictive modeling. Section 4.1 explains the process of converting raw geographic coordinates into actual addresses, a crucial step in the research. Section 4.2 discusses the methodical categorization of the data into several groups. Section 4.3 presents the results visually using scatter plots, while Section 4.4 describes the structure of the data. Section 4.5 outlines the processes of data training and testing, and Section 4.6 tests the proposed method in various scenarios, resulting in a comprehensive evaluation across multiple settings.

4.1. Process of Transforming Geographic Coordinates into Physical Addresses

Geographical coordinates are essential in mapping vehicles, linking an address to specific latitude and longitude points on the map. Geocoding is the process of converting a physical address into its corresponding geographical coordinates; reverse geocoding converts geographic coordinates into physical addresses. Geocoding assigns unique coordinates to each address on the map. This conversion is performed by a locator service, accessed through the Locator class.
When geocoding, several input parameters must be considered. The Address object is crucial for matching addresses during the geocoding process. During reverse geocoding, the Point object holds more significance than the Address object. The Address object within the geocoding framework communicates with the geocoding service. The geocoding service returns an Address Candidate object that includes the matched address and its matching map point. Subsequently, this map point becomes the focal point on the map.
The geographical coordinates of address-linked sites have been meticulously assigned, as evident from a detailed examination of Table 2. Assigning a distinct identifying field to each vehicle is crucial in this architecture, as it simplifies monitoring the routes they have travelled.

4.2. Data Grouping

An essential aspect of our study involves categorizing each route and linking it to precise time markers, which include both date and time. This thorough classification is beneficial for locating and identifying all the locations visited by each truck. The segmentation provides a precise depiction of the whereabouts of each vehicle on specific dates. Please review Table 3 for complete information on the locations and positioning of each vehicle at various periods.
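The grouping step can be sketched as bucketing GPS records by truck id and date, then sorting each bucket by time so every truck's visited locations on a given day appear in chronological order (the field names below are assumptions for illustration):

```python
# Group GPS records by (truck id, date) and order each group by time.
from collections import defaultdict

records = [
    {"id": "T1", "date": "2021-06-17", "time": "14:30", "loc": "Quchan"},
    {"id": "T1", "date": "2021-06-17", "time": "08:00", "loc": "Mashhad"},
    {"id": "T2", "date": "2021-06-17", "time": "09:15", "loc": "Tehran"},
]

groups = defaultdict(list)
for r in records:
    groups[(r["id"], r["date"])].append(r)
for key in groups:                      # keep each route in chronological order
    groups[key].sort(key=lambda r: r["time"])

print([(k, [r["loc"] for r in v]) for k, v in sorted(groups.items())])
```

With a DataFrame library the same segmentation is a groupby on the id and date columns followed by a sort on time within each group.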

4.3. Scatter Plots

Visualizing data graphically is a strategic approach to clarify and reveal the intricate patterns, trends, and relationships within the data. We utilized the elements id, Loc, and time to create a three-dimensional graph to analyze the intricate network of vehicle paths. This research on dynamic visualization provides us with the means to go beyond static images and experience the immersive capabilities of a 3D platform. Figure 6 visually represents the multidimensional trip and acts as evidence of this visual narrative.
Strategic color usage enhances the graphical scene with deeper significance. The color palette is an effective tool for categorizing and labelling different data classes. Vehicles are categorized by a unique identifier (id), and each vehicle is associated with a particular color, occupying a specified location. Each vehicle is vividly colored to represent a specific group, which lends energy and visual differentiation to the graphical representation. Please refer to Figure 7 for a colorful and compelling graphic portrayal.
Figure 8 illustrates the evolution of a vehicle’s trajectory over time, providing a detailed representation of the vehicle’s chronological voyage by clearly depicting its path and movements from the beginning of its journey to the conclusion of its route. Overall, these figures offer interpretive insight rather than merely numerical detail.

4.4. Data Structure

This research is based on a large dataset consisting of 14 Excel files. This data repository stores intricate truck tracking data from 17 June 2021 to 3 August 2022, spanning a complete year. The dataset has five crucial attributes: (1) truck number; (2) x-coordinate; (3) y-coordinate; (4) Gregorian date; and (5) time.
Once the machine learning regression models have been developed, the next crucial stage is evaluation. We thoroughly assess each model's performance using a well-chosen set of metrics: MSE, RMSE, MAE, and R2. Table 4 documents the results of this assessment, which forms the crucial validation step.
In the study, we employed an ensemble of algorithms to predict truck transit time. These algorithms include AdaBoost, GradientBoosting, XGBoost (Extreme Gradient Boosting), ElasticNet, Lasso (Least Absolute Shrinkage and Selection Operator), KNeighbors (K-Nearest Neighbors), Linear Regression, LinearSVR (Linear Support Vector Regression), and RandomForest.
AdaBoost iteratively adjusts the weights of incorrectly classified instances to focus on difficult-to-classify samples. GradientBoosting sequentially fits new models to the residuals of the previous models, minimizing a loss function by adding decision trees that correct the errors of the preceding ones. XGBoost, an optimized implementation of gradient boosting, incorporates features like parallelized tree construction and hardware optimization to achieve state-of-the-art results. ElasticNet combines L1 and L2 regularization to address multicollinearity and perform feature selection. Lasso performs variable selection and regularization by adding a penalty term to the absolute values of regression coefficients, encouraging sparsity in the model. KNeighbors predicts the output of a data point by averaging the target values of its k nearest neighbors. Linear Regression predicts the target variable as a linear combination of input features and is used when the relationship is assumed to be linear. LinearSVR uses a linear kernel function to find a hyperplane that best fits the training data while maximizing the margin. RandomForest constructs multiple decision trees during training and outputs the average prediction of individual trees, reducing overfitting and increasing robustness.
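As a sketch, the nine-model ensemble can be assembled in scikit-learn roughly as follows. The default hyperparameters shown here are an assumption (the study's settings are not specified), and XGBoost ships as a separate package, so it is imported guardedly.

```python
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import LinearSVR

# One possible setup of the nine regressors named in the text;
# hyperparameters are scikit-learn defaults, not the study's values.
models = {
    "AdaBoost": AdaBoostRegressor(random_state=0),
    "GradientBoost": GradientBoostingRegressor(random_state=0),
    "ElasticNet": ElasticNet(),
    "Lasso": Lasso(),
    "KNeighbors": KNeighborsRegressor(),
    "Linear": LinearRegression(),
    "LinearSVR": LinearSVR(max_iter=10_000),
    "RandomForest": RandomForestRegressor(random_state=0),
}

try:  # XGBoost is an optional third-party dependency
    from xgboost import XGBRegressor
    models["XGBoost"] = XGBRegressor(random_state=0)
except ImportError:
    pass
```

Keeping the estimators in a dictionary makes it easy to loop over all nine models and collect the same metrics for each, as the scenario tables below do.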

4.5. Training and Testing Process

Our machine learning models were trained on 80% of the dataset, as illustrated in Figure 9. The core idea of this approach is to evaluate a model on data it has never seen: after the training phase, the remaining 20% of the data was held out for validation. Testing against this unfamiliar portion measures how well each model generalizes its learned patterns to new scenarios, and hence its potential for application in new environments.
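The 80/20 split and the metric computation described above can be sketched with scikit-learn on synthetic data; the feature matrix below is an illustrative stand-in, not the study's GPS features.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the feature matrix and transit-time target.
rng = np.random.default_rng(42)
X = rng.normal(size=(500, 4))
y = X @ np.array([1.5, -2.0, 0.7, 0.0]) + rng.normal(scale=0.5, size=500)

# 80% of the data trains the model; the held-out 20% is never seen
# during training.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)

model = RandomForestRegressor(random_state=0).fit(X_train, y_train)
pred = model.predict(X_test)

mse = mean_squared_error(y_test, pred)
print(f"MSE={mse:.3f}  RMSE={mse ** 0.5:.3f}  "
      f"MAE={mean_absolute_error(y_test, pred):.3f}  "
      f"R2={r2_score(y_test, pred):.3f}")
```

The same four metrics (MSE, RMSE, MAE, R2) are the ones tabulated for every algorithm and scenario in the evaluation.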

4.6. Evaluation of Algorithms

The evaluation, a crucial component of applying the algorithms, is conducted on the data prepared and trained in the preceding section. This section demonstrates how the proposed algorithms perform on real-world data across a range of scenarios and systematically selected criteria.
  • First Scenario
In the first scenario, we investigate truck movement prediction and analyze how window size affects predictive accuracy, testing window sizes of 1, 3, 5, 7, 9, and 11.
Table 5 displays the results of applying our nine regression techniques with each window size, capturing the factors that affect prediction accuracy and the interaction between the algorithms and the window size.
Predictive performance varies across window sizes, with each size affecting the algorithms differently.
  • With a window size of 1, the XGBRegressor is the top performer, with an R2 of 0.4156. The AdaBoostRegressor performs poorly at 0.1129, and the Lasso and ElasticNet algorithms are mediocre.
  • With a window size of 3, the XGBRegressor again leads with an R2 of 0.458, while the Lasso and ElasticNet algorithms stall at a value of 0.
  • With a window size of 5, the XGBRegressor reaches an R2 of 0.4494; Lasso and ElasticNet remain at 0.
  • With a window size of 7, the XGBRegressor improves further to an R2 of 0.4628, whereas Lasso and ElasticNet again produce 0.
  • With a window size of 9, the XGBRegressor stands out with an R2 of 0.4659; Lasso and ElasticNet edge up only to 0.0001.
  • With a window size of 11, the XGBRegressor achieves an R2 of 0.4624, and the AdaBoostRegressor reaches a modest 0.346.
Table 5 details the interaction between algorithms and window sizes. A window size of 7 gives the best overall performance: around 70% of the algorithms perform optimally at this size, making it a dependable choice.
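The windowed inputs behind this scenario can be built in one common way with NumPy; this is a sketch, since the study does not specify its exact feature construction.

```python
import numpy as np

def make_windows(series, window):
    """Turn a 1-D sequence into supervised pairs: each row of X holds
    `window` consecutive values, and y holds the value that follows."""
    X = np.lib.stride_tricks.sliding_window_view(series, window)[:-1]
    y = series[window:]
    return X, y

# For window=3, position t is predicted from positions t-3, t-2, t-1.
X, y = make_windows(np.arange(10.0), 3)
print(X.shape, y.shape)
```

Varying `window` over 1, 3, 5, 7, 9, and 11 and refitting each regressor yields a table of scores analogous to Table 5.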
  • Second Scenario
In the second scenario, we divide the data into segments of 50,000, 75,000, and 100,000 records and apply the regression methods to each, since the amount of data shapes the patterns each algorithm can learn. Table 6 illustrates the interaction between the data segments and the regression algorithms.
From the outcomes for each segment size in Table 6, we draw the following conclusions:
  • With a segment size of 50,000, the XGBRegressor leads the ensemble with an R2 of 0.4486, while the AdaBoostRegressor trails at 0.0434. The GradientBoosting and KNeighbors algorithms also perform strongly.
  • With a segment size of 75,000, the XGBRegressor again emerges as the top performer with an R2 of 0.4395, while the AdaBoostRegressor performs far less impressively, at 1.124.
  • With a segment size of 100,000, the XGBRegressor achieves an R2 of 0.4417, outperforming the Lasso and ElasticNet algorithms, which both score 0.
Across this range of record counts, the XGBRegressor algorithm consistently delivers the most satisfactory results.
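The segment-wise evaluation can be sketched as a small helper. The default sizes mirror the study's segments, but the model factory, split ratio, and scoring choice are illustrative assumptions.

```python
import numpy as np
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

def segment_scores(X, y, model_factory, sizes=(50_000, 75_000, 100_000)):
    """Fit and score a fresh model on the first n records for each
    segment size; returns {segment_size: test R2}."""
    results = {}
    for n in sizes:
        X_tr, X_te, y_tr, y_te = train_test_split(
            X[:n], y[:n], test_size=0.2, random_state=0)
        model = model_factory().fit(X_tr, y_tr)
        results[n] = r2_score(y_te, model.predict(X_te))
    return results
```

Calling `segment_scores` once per regressor reproduces the structure of Table 6: one score per (algorithm, segment size) pair.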
  • Third Scenario
In the third scenario, k-fold cross-validation provides insight into the practical utility of the model and the reliability of its predictions. The parameter k determines how many subsets the data samples are divided into; these subsets are used for both training and validation.
K-fold cross-validation divides the dataset into k subsets: one subset is held out for testing while the remaining k-1 subsets are used for training. We used 10 folds with the nine regression algorithms. Table 7 displays the findings and the effectiveness of the k-fold method in achieving accurate results.
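A minimal 10-fold run with scikit-learn, using synthetic data and a linear model as stand-ins for the study's dataset and nine regressors:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import KFold, cross_val_score

# Synthetic stand-in data; the real inputs are the GPS-derived features.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = X @ np.array([1.0, 0.5, -1.0]) + rng.normal(scale=0.1, size=200)

# Ten folds: each fold serves once as the test set while the other
# nine train the model, yielding ten R2 scores.
scores = cross_val_score(LinearRegression(), X, y,
                         cv=KFold(n_splits=10, shuffle=True, random_state=1),
                         scoring="r2")
print(scores.round(3), scores.mean().round(3))
```

Repeating this call for each of the nine regressors produces one score per fold per algorithm, which is the layout of Table 7.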
The algorithms were assessed repeatedly as the folds rotated through the k-fold validation. The main findings from Table 7 can be summarized as follows:
In fold 1, the XGBRegressor leads with an R2 of 0.4433; Lasso and ElasticNet score 0, the LinearSVR performs poorly, and the KNeighbors, Linear, RandomForest, and GradientBoost algorithms all perform well. In fold 2, the XGBRegressor achieves an R2 of 0.4476; Lasso, ElasticNet, and AdaBoost fall behind, while KNeighbors, Linear, LinearSVR, RandomForest, and GradientBoost perform consistently.
In fold 3, the XGBRegressor records an R2 of 0.4402; AdaBoost fails at 1.9995, and Lasso and ElasticNet remain weak. In fold 4, the XGBRegressor achieves an R2 of 0.436, AdaBoost drops to 3.6026, and LinearSVR reaches only 0.0491. KNeighbors performs well with an R2 of 0.4267, while Lasso and ElasticNet lag behind. In fold 6, the XGBRegressor outperforms Lasso and ElasticNet with an R2 of 0.4477, and AdaBoost again contributes little.
In fold 7, the Linear model reaches an R2 of 0.4201, but AdaBoost diverges to 18.0133 and the LinearSVR's optimization fails. In fold 8, the XGBRegressor achieves an R2 of 0.4374, while the LinearSVR scores 1.1097; the remaining algorithms perform comparably. In fold 9, KNeighbors reaches an R2 of 0.44, with Lasso and ElasticNet trailing. In fold 10, the XGBRegressor produces a final R2 of 0.4389; AdaBoost registers 0.3251, and the other models respond consistently.
  • Fourth Scenario
In the fourth scenario, we use the Leave One Out (LOO) method, an extreme case of k-fold cross-validation in which k equals n, the number of samples in the dataset. The model's error rate is the average error observed across all n iterations. Table 8 displays the results of this scenario and the knowledge acquired through this approach.
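The LOO procedure reduces to one scikit-learn call; here it runs on synthetic data, scoring each held-out sample by squared error (an illustrative choice, since R2 is undefined for a single test point).

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import LeaveOneOut, cross_val_score

# Small synthetic stand-in; LOO fits the model n times, so it is
# costly on large datasets.
rng = np.random.default_rng(2)
X = rng.normal(size=(40, 2))
y = X @ np.array([2.0, -1.0]) + rng.normal(scale=0.1, size=40)

# LOO is k-fold with k = n: each sample is held out once, and the
# model's error is the average over all n single-sample tests.
errors = -cross_val_score(LinearRegression(), X, y,
                          cv=LeaveOneOut(),
                          scoring="neg_mean_squared_error")
print(len(errors), errors.mean().round(4))
```

Averaging the n per-sample errors gives the single error rate reported for this scenario.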

5. Discussion and Managerial Insights

The above-mentioned findings offer a comprehensive overview of outcomes derived from different scenarios, parameters, and evaluations, aiming to provide deep insights into the practicality and efficiency of the proposed algorithms through an Iranian case study.
The analysis of different scenarios has uncovered interesting patterns. The XGBRegressor algorithm consistently outperformed others in predictive accuracy across various window sizes, demonstrating its ability to effectively manage variations in the data’s level of detail. The XGBRegressor also performed well in data-partitioning scenarios, handling different data quantities adeptly. In particular, the dataset containing 75,000 records required further investigation into data distribution and algorithmic patterns.

The k-fold cross-validation method revealed nuanced variations in algorithm efficacy, with XGBRegressor remaining the top performer, although other algorithms showed strengths in different iterations. This underscores the importance of using diverse validation strategies in model assessment. Both the Leave One Out (LOO) technique and the k-fold method highlighted the importance of validation techniques in achieving conclusive results.

The parameters MSE, RMSE, MAE, and R2 served as benchmarks for model evaluation in all situations. These metrics provided a comprehensive view of each algorithm’s predictive capability. The XGBRegressor consistently demonstrated strong performance, aligning with its reputation for reliable and accurate prediction. The managerial implications suggest that algorithms like RandomForest, KNeighbors, and XGBRegressor hold significant potential for practical use.

5.1. Managerial Insights

The outcomes can provide crucial managerial insights that can significantly enhance transportation and logistics management, as depicted in Figure 10.
By leveraging GPS data and advanced regression algorithms such as XGBoost, RandomForest, and GradientBoosting, the study achieves high accuracy in predicting truck transit times. This precision is vital for logistics managers to effectively plan and optimize delivery schedules, ultimately improving operational efficiency. For example, Zhao et al. (2019) used GPS data from Beijing’s Sixth Ring Road to predict truck travel speeds under various conditions with an optimized GRU algorithm, demonstrating the practical application of similar methodologies [20].
Accurate transit time predictions enable logistics companies to streamline their operations by minimizing idle time, optimizing routes, and improving fleet utilization, leading to cost reductions and increased operational efficiency. Better predictions also allow for more precise resource allocation, such as scheduling loading and unloading activities and managing driver shifts. This reduces bottlenecks at cross-docking terminals and other logistics hubs, ensuring a smoother flow of goods. Wang et al. (2020) similarly applied machine learning techniques to improve driving style identification in open-pit mining, demonstrating the broader applicability of these methods [47].
The integration of GPS data and machine learning provides a robust foundation for data-driven decision-making. Managers can rely on empirical data and sophisticated algorithms rather than heuristics or past experiences. This data-driven approach enhances the scalability and flexibility of predictive models, allowing for adaptation to different regions, traffic conditions, and logistical scenarios. Predictive analytics also aid in long-term strategic planning. Understanding traffic patterns and potential delays can inform infrastructure development, investment in new technologies, and partnerships with other logistics providers. Predictive models help identify potential delays and disruptions in advance, enabling managers to develop contingency plans to mitigate risks, ensuring more reliable delivery schedules and improved customer satisfaction. Rivera-Campoverde et al. (2024) demonstrated similar applications in the management of vehicle emissions, showing the versatility of these approaches [48].
The study highlights the importance of synchronizing inbound and outbound logistics at cross-docking terminals. Efficiently managing these terminals reduces wait times and ensures a smooth flow of goods, especially crucial for time-sensitive and perishable items. For policymakers, the study provides valuable insights into traffic management and infrastructure development, informing policies aimed at reducing congestion and improving road safety.

5.2. Alignment with Sustainable Development Goals (SDGs)

The present study aligns with several Sustainable Development Goals (SDGs) through its implications in logistics, urban planning, and transportation efficiency. The following sections provide a detailed assessment of the study’s impact on specific SDGs [49,50,51]:
  • SDG 3: Improved health and safety on roads due to reduced congestion and accidents.
  • SDG 7: Lower fuel consumption through optimized routes.
  • SDG 8: Increased economic productivity and better working conditions for drivers.
  • SDG 9: Innovations in transportation infrastructure and logistics.
  • SDG 11: Enhanced urban sustainability and livability.
  • SDG 12: More efficient use of resources in production and distribution.
  • SDG 13: Reduced greenhouse gas emissions.
The conceptual model in Figure 11 illustrates the flow from data collection and processing to the positive impacts on various SDGs, showcasing the broader societal benefits of the research on truck transit time prediction.

6. Conclusions, Limitations, and Future Works

The effective optimization of truck scheduling in cross-docking terminals and traffic management in large urban areas requires an insightful strategy. Restrictions on heavy-duty trucks, which limit them to specific routes, often lead to traffic congestion. Thus, predicting travel speeds on specific roads is crucial to providing customized information services to drivers. This study utilized truck-generated tracking data and regression algorithms, recognized for their predictive capabilities in estimating travel durations and vehicle locations. By employing a range of regression algorithms, including AdaBoost, GradientBoost, XGBoost, ElasticNet, RandomForest, KNeighbors, Linear, LinearSVR, and Lasso, our investigation provides insights across various scenarios. Among these, the XGBRegressor algorithm consistently stands out as a superior predictor, surpassing other algorithms in different time steps and situations. This finding opens exciting research opportunities for the future, including investigating different regression algorithms and performing comparative analyses to understand their effectiveness relative to our results.
Expanding the scope of this predictive model to include a broader range of vehicle types, such as buses, vans, and cars, presents a promising opportunity. These diverse vehicle types exhibit inherent variations in movement patterns, characteristics, and operational dynamics, making them rich subjects for investigation. By customizing and refining predictive methodologies to match their unique characteristics, we can develop a comprehensive toolkit for predicting the travel times and trajectories of various vehicles. This effort is crucial for enhancing transportation management strategies that address both freight logistics and the efficient movement of passengers, promoting a holistic approach to optimizing mobility.
Our research established a solid foundation for improved predictive models in estimating truck travel time. Future research can focus on ongoing exploration, method improvement, and expanding applications to various vehicle types. This advancement will contribute significantly to transportation management, achieving effective and sustainable mobility solutions [52,53].

Author Contributions

Conceptualization and design: A.G. and S.D.; Data collection: M.G. and R.M.; Analysis and interpretation of the results: M.G., A.G. and A.M.F.-F.; Draft manuscript preparation: A.G., S.D., M.G. and R.M.; Writing—review and editing: R.M.; Supervision: A.M.F.-F. All authors have read and agreed to the published version of the manuscript.

Funding

The authors would like to thank the Technical University of Liberec for its support through the Student Grant Competition SGS-2023-3401. The research was also supported by the Research Infrastructure NanoEnviCz, via the Czech Republic’s Ministry of Education, Youth, and Sports under Project No. LM2023066.

Data Availability Statement

The datasets examined in this research are not accessible to the public because of privacy considerations, but the data can be made accessible upon a reasonable request to the corresponding author.

Conflicts of Interest

The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.

References

  1. Moniruzzaman, M.; Maoh, H.; Anderson, W. Short-term prediction of border crossing time and traffic volume for commercial trucks: A case study for the Ambassador Bridge. Transp. Res. Part C Emerg. Technol. 2016, 63, 182–194. [Google Scholar] [CrossRef]
  2. Golob, T.F.; Regan, A.C. Impacts of highway congestion on freight operations: Perceptions of trucking industry managers. Transp. Res. Part A Policy Pract. 2001, 35, 577–599. [Google Scholar] [CrossRef]
  3. Theophilus, O.; Dulebenets, M.A.; Pasha, J.; Lau, Y.Y.; Fathollahi-Fard, A.M.; Mazaheri, A. Truck scheduling optimization at a cold-chain cross-docking terminal with product perishability considerations. Comput. Ind. Eng. 2021, 156, 107240. [Google Scholar] [CrossRef]
  4. Fathollahi-Fard, A.M.; Ranjbar-Bourani, M.; Cheikhrouhou, N.; Hajiaghaei-Keshteli, M. Novel modifications of social engineering optimizer to solve a truck scheduling problem in a cross-docking system. Comput. Ind. Eng. 2019, 137, 106103. [Google Scholar] [CrossRef]
  5. Shi, L.; Liu, M.; Liu, Y.; Zhao, Q.; Cheng, K.; Zhang, H.; Fathollahi-Fard, A.M. Evaluation of Urban Traffic Accidents Based on Pedestrian Landing Injury Risks. Appl. Sci. 2022, 12, 6040. [Google Scholar] [CrossRef]
  6. Nadi, A.; Sharma, S.; Snelder, M.; Bakri, T.; van Lint, H.; Tavasszy, L. Short-term prediction of outbound truck traffic from the exchange of information in logistics hubs: A case study for the port of Rotterdam. Transp. Res. Part C Emerg. Technol. 2021, 127, 103111. [Google Scholar] [CrossRef]
  7. Borowska-Stefańska, M.; Kowalski, M.; Kurzyk, P.; Sahebgharani, A.; Sapińska, P.; Wiśniewski, S.; Goniewicz, K.; Dulebenets, M.A. Assessing the impacts of sunday trading restrictions on urban public transport: An example of a big city in central Poland. J. Public Transp. 2023, 25, 100049. [Google Scholar] [CrossRef]
  8. Sun, Z.; Ban, X.J. Vehicle classification using GPS data. Transp. Res. Part C Emerg. Technol. 2013, 37, 102–117. [Google Scholar] [CrossRef]
  9. Gingerich, K.; Maoh, H.; Anderson, W. Classifying the purpose of stopped truck events: An application of entropy to GPS data. Transp. Res. Part C Emerg. Technol. 2016, 64, 17–27. [Google Scholar] [CrossRef]
  10. Li, F.; Feng, J.; Yan, H.; Jin, G.; Yang, F.; Sun, F.; Jin, D.; Li, Y. Dynamic graph convolutional recurrent network for traffic prediction: Benchmark and solution. ACM Trans. Knowl. Discov. Data 2023, 17, 1–21. [Google Scholar] [CrossRef]
  11. Zhou, M.; Kong, N.; Zhao, L.; Huang, F.; Wang, S.; Campy, K.S. Understanding urban delivery drivers’ intention to adopt electric trucks in China. Transp. Res. Part D Transp. Environ. 2019, 74, 65–81. [Google Scholar] [CrossRef]
  12. Bai, R.; Xue, N.; Chen, J.; Roberts, G.W. A set-covering model for a bidirectional multi-shift full truckload vehicle routing problem. Transp. Res. Part B Methodol. 2015, 79, 134–148. [Google Scholar] [CrossRef]
  13. Yang, Y.; Jia, B.; Yan, X.Y.; Jiang, R.; Ji, H.; Gao, Z. Identifying intracity freight trip ends from heavy truck GPS trajectories. Transp. Res. Part C Emerg. Technol. 2022, 136, 103564. [Google Scholar] [CrossRef]
  14. Gheibi, M.; Karrabi, M.; Latifi, P.; Fathollahi-Fard, A.M. Evaluation of traffic noise pollution using geographic information system and descriptive statistical method: A case study in Mashhad, Iran. In Environmental Science and Pollution Research; Springer: Berlin/Heidelberg, Germany, 2022; pp. 1–14. [Google Scholar]
  15. Bombelli, A.; Fazi, S. The ground handler dock capacitated pickup and delivery problem with time windows: A collaborative framework for air cargo operations. Transp. Res. Part E Logist. Transp. Rev. 2022, 159, 102603. [Google Scholar] [CrossRef]
  16. Ni, L.; Wang, X.C.; Zhang, D. Impacts of information technology and urbanization on less-than-truckload freight flows in China: An analysis considering spatial effects. Transp. Res. Part A Policy Pract. 2016, 92, 12–25. [Google Scholar] [CrossRef]
  17. Popken, D.A. An analytical framework for routing multiattribute multicommodity freight. Transp. Res. Part B Methodol. 1996, 30, 133–145. [Google Scholar] [CrossRef]
  18. Li, N.; Wu, Y.; Wang, Q.; Ye, H.; Wang, L.; Jia, M.; Zhao, S. Underground mine truck travel time prediction based on stacking integrated learning. Eng. Appl. Artif. Intell. 2023, 120, 105873. [Google Scholar] [CrossRef]
  19. Sharman, B.W.; Roorda, M.J. Multilevel modelling of commercial vehicle inter-arrival duration using GPS data. Transp. Res. Part E Logist. Transp. Rev. 2013, 56, 94–107. [Google Scholar] [CrossRef]
  20. Zhao, J.; Gao, Y.; Yang, Z.; Li, J.; Feng, Y.; Qin, Z.; Bai, Z. Truck traffic speed prediction under non-recurrent congestion: Based on optimized deep learning algorithms and GPS data. IEEE Access 2019, 7, 9116–9127. [Google Scholar] [CrossRef]
  21. Zhao, W.; Goodchild, A.V. Truck travel time reliability and prediction in a port drayage network. Marit. Econ. Logist. 2011, 13, 387–418. [Google Scholar] [CrossRef]
  22. Morgul, E.F.; Ozbay, K.; Iyer, S.; Holguin-Veras, J. Commercial vehicle travel time estimation in urban networks using GPS data from multiple sources. In Proceedings of the Transportation Research Board 92nd Annual Meeting (No. 13–4439), Washington, DC, USA, 13–17 January 2013. [Google Scholar]
  23. Jiang, F. Bus Transit Time Prediction Using GPS Data with Artificial Neural Networks. 2017. Available online: https://www.semanticscholar.org/paper/Bus-Transit-Time-Prediction-using-GPS-Data-with-Jiang/fd4d5ffba0471cffeee9b0045b3b4407b26ef160 (accessed on 20 May 2023).
  24. Wang, S.; Zhao, J.; Shao, C.; Dong, C.; Yin, C. Truck traffic flow prediction based on LSTM and GRU methods with sampled GPS data. IEEE Access 2020, 8, 208158–208169. [Google Scholar] [CrossRef]
  25. Demissie, M.G.; Kattan, L. Estimation of truck origin-destination flows using GPS data. Transp. Res. Part E Logist. Transp. Rev. 2022, 159, 102621. [Google Scholar] [CrossRef]
  26. Pani, C.; Fadda, P.; Fancello, G.; Frigau, L.; Mola, F. A data mining approach to forecast late arrivals in a transhipment container terminal. Transport 2014, 29, 175–184. [Google Scholar] [CrossRef]
  27. Bhattacharya, A.; Kumar, S.A.; Tiwari, M.K.; Talluri, S. An intermodal freight transport system for optimal supply chain logistics. Transp. Res. Part C Emerg. Technol. 2014, 38, 73–84. [Google Scholar] [CrossRef]
  28. Li, X.; Bai, R. Freight vehicle travel time prediction using gradient boosting regression tree. In Proceedings of the 2016 15th IEEE International Conference on Machine Learning and Applications (ICMLA), Anaheim, CA, USA, 18–20 December 2016; pp. 1010–1015. [Google Scholar]
  29. Van der Spoel, S.; Amrit, C.; van Hillegersberg, J. Predictive analytics for truck arrival time estimation: A field study at a European distribution centre. Int. J. Prod. Res. 2017, 55, 5062–5078. [Google Scholar] [CrossRef]
  30. Salleh, N.H.M.; Riahi, R.; Yang, Z.; Wang, J. Predicting a containership’s arrival punctuality in liner operations by using a fuzzy rule-based Bayesian network (FRBBN). Asian J. Shipp. Logist. 2017, 33, 95–104. [Google Scholar] [CrossRef]
  31. Alcoba, R.D.; Ohlund, K.W. Predicting on-Time Delivery in the Trucking Industry. Ph.D. Dissertation, Supply Chain Management Program, Massachusetts Institute of Technology, Cambridge, MA, USA, 2017. Available online: https://dspace.mit.edu/handle/1721.1/112870 (accessed on 20 May 2023).
  32. Barbour, W.; Samal, C.; Kuppa, S.; Dubey, A.; Work, D.B. On the data-driven prediction of arrival times for freight trains on us railroads. In Proceedings of the 2018 21st International Conference on Intelligent Transportation Systems (ITSC), Maui, HI, USA, 4–7 November 2018; pp. 2289–2296. [Google Scholar]
  33. Wu, R.; Luo, G.; Shao, J.; Tian, L.; Peng, C. Location prediction on trajectory data: A review. Big Data Min. Anal. 2018, 1, 108–127. [Google Scholar] [CrossRef]
  34. James, J.Q.; Yu, W.; Gu, J. Online vehicle routing with neural combinatorial optimization and deep reinforcement learning. IEEE Trans. Intell. Transp. Syst. 2019, 20, 3806–3817. [Google Scholar]
  35. Yu, J.; Tang, G.; Song, X.; Yu, X.; Qi, Y.; Li, D.; Zhang, Y. Ship arrival prediction and its value on daily container terminal operation. Ocean Eng. 2018, 157, 73–86. [Google Scholar] [CrossRef]
  36. Balster, A.; Hansen, O.; Friedrich, H.; Ludwig, A. An ETA prediction model for intermodal transport networks based on machine learning. Bus. Inf. Syst. Eng. 2020, 62, 403–416. [Google Scholar] [CrossRef]
  37. Servos, N.; Liu, X.; Teucke, M.; Freitag, M. Travel time prediction in a multimodal freight transport relation using machine learning algorithms. Logistics 2019, 4, 1. [Google Scholar] [CrossRef]
  38. Verma, A.K.; Saxena, R.; Jadeja, M.; Bhateja, V.; Lin, J.C.W. Bet-GAT: An Efficient Centrality-Based Graph Attention Model for Semi-Supervised Node Classification. Appl. Sci. 2023, 13, 847. [Google Scholar] [CrossRef]
  39. Liu, Y.; Zou, B.; Ni, A.; Gao, L.; Zhang, C. Calibrating microscopic traffic simulators using machine learning and particle swarm optimization. Transp. Lett. 2021, 13, 295–307. [Google Scholar] [CrossRef]
  40. Antamis, T.; Medentzidis, C.R.; Skoumperdis, M.; Vafeiadis, T.; Nizamis, A.; Ioannidis, D.; Tzovaras, D. AI-supported forecasting of intermodal freight transportation delivery time. In Proceedings of the 2021 62nd International Scientific Conference on Information Technology and Management Science of Riga Technical University (ITMS), Riga, Latvia, 14–15 October 2021; pp. 1–6. [Google Scholar]
  41. Valatsos, P.; Vafeiadis, T.; Nizamis, A.; Ioannidis, D.; Tzovaras, D. Freight transportation route time prediction with ensemble learning techniques. In Proceedings of the 25th Pan-Hellenic Conference on Informatics, Volos, Greece, 26–28 November 2021; pp. 52–57. [Google Scholar]
  42. Konečný, V.; Brídziková, M.; Marienka, P. Research of bus transport demand and its factors using multicriteria regression analysis. Transp. Res. Procedia 2021, 55, 180–187. [Google Scholar] [CrossRef]
  43. Costa, M.; Félix, R.; Marques, M.; Moura, F. Impact of COVID-19 lockdown on the behavior change of cyclists in Lisbon, using multinomial logit regression analysis. Transp. Res. Interdiscip. Perspect. 2022, 14, 100609. [Google Scholar] [CrossRef]
  44. Comi, A.; Zhuk, M.; Kovalyshyn, V.; Hilevych, V. Investigating bus travel time and predictive models: A time series-based approach. Transp. Res. Procedia 2020, 45, 692–699. [Google Scholar] [CrossRef]
  45. Ma, T.; Antoniou, C.; Toledo, T. Hybrid machine learning algorithm and statistical time series model for network-wide traffic forecast. Transp. Res. Part C Emerg. Technol. 2020, 111, 352–372. [Google Scholar] [CrossRef]
  46. Li, M.; Qi, J.; Tian, X.; Guo, H.; Liu, L.; Fathollahi-Fard, A.M.; Tian, G. Smartphone-based straw incorporation: An improved convolutional neural network. Comput. Electron. Agric. 2024, 221, 109010. [Google Scholar] [CrossRef]
  47. Wang, Q.; Zhang, R.; Wang, Y.; Lv, S. Machine learning-based driving style identification of truck drivers in open-pit mines. Electronics 2020, 9, 19. [Google Scholar] [CrossRef]
  48. Rivera-Campoverde, N.D.; Arenas-Ramírez, B.; Muñoz Sanz, J.L.; Jiménez, E. GPS Data and Machine Learning Tools, a Practical and Cost-Effective Combination for Estimating Light Vehicle Emissions. Sensors 2024, 24, 2304. [Google Scholar] [CrossRef]
  49. Behdadfar, E.; Samaei, S.R. Towards a Smart Tehran: Leveraging Machine Learning for Sustainable Development, Balanced Growth, and Resilience. J. New Res. Smart City 2024, 2, 53–67. [Google Scholar]
  50. Alqahtani, H.; Kumar, G. Machine learning for enhancing transportation security: A comprehensive analysis of electric and flying vehicle systems. Eng. Appl. Artif. Intell. 2024, 129, 107667. [Google Scholar] [CrossRef]
  51. Kunieda, Y.; Suzuki, H. A Detection Method of Garbage Collection Status from Sound of Garbage Trucks. In Proceedings of the 2024 IEEE International Conference on Consumer Electronics (ICCE), Las Vegas, NV, USA, 6–8 January 2024; pp. 1–6. [Google Scholar]
  52. Zhan, C.; Zhang, X.; Yuan, J.; Chen, X.; Zhang, X.; Fathollahi-Fard, A.M.; Tian, G. A hybrid approach for low-carbon transportation system analysis: Integrating CRITIC-DEMATEL and deep learning features. Int. J. Environ. Sci. Technol. 2024, 21, 791–804. [Google Scholar] [CrossRef] [PubMed]
  53. Fathollahi-Fard, A.M.; Woodward, L.; Akhrif, O. A distributed permutation flow-shop considering sustainability criteria and real-time scheduling. J. Ind. Inf. Integr. 2024, 39, 100598. [Google Scholar] [CrossRef]
Figure 1. Sankey diagram highlighting the main countries, keywords, and journal sources in the field.
Figure 2. Empirical application of this research.
Figure 3. Conceptual diagram of this research.
Figure 4. Flowchart for the full algorithm.
Figure 5. Geographical locations covered by truck routes connecting cities in Iran with cities in Turkey, Turkmenistan, Tajikistan, and Afghanistan (some city labels on this map appear in Persian/Arabic script: Tehran: تهران, Riyadh: الریاض, Jeddah: جده, and Cairo: القاهره).
Figure 6. Scatter diagram of truck tracking data.
Figure 7. Color diagram of data dispersion for truck routing.
Figure 8. Analyses of the travel path: (a) travel path of a vehicle with a given ID over time; (b) change of a truck's route over time.
Figure 9. Training and testing data.
Figure 10. The managerial insight plan of the present study.
Figure 11. Assessment of the SDG aspects of the present study.
Table 1. Review of existing machine learning and data mining techniques in this field.

Data Mining and Machine Learning Methods | Reference
Regression and Classification Tree | Pani et al. [26]
Support Vector Machine (SVM), Regression, and Mixed Integer Programming | Bhattacharya et al. [27]
Gradient Boosting Regression (GBR) and Decision Tree (DT) | Li and Bai [28]
KNeighbors, DT, SVM, and Ensemble Learning | Van der Spoel et al. [29]
Bayesian Network using Fuzzy Rules | Salleh et al. [30]
Logistic Regression | Alcoba and Ohlund [31]
Random Forest (RF), Non-Linear and Linear SVM, and ANN | Barbour et al. [32]
Distribution, Spatiotemporal Data Mining, Pattern, and Social Representation and Relation Analysis | Wu et al. [33]
Backpropagation, Regression and Classification Tree, and RF | James et al. [34]
Segment-Based Ordinary Kriging and Regression Kriging for Spatial Prediction | Yu et al. [35]
RF, GBR, and Linear Regression Trees | Balster et al. [36]
SVM, Adaptive Boosting, and Extremely Randomized Tree | Servos et al. [37]
Graph Neural Network (GNN) | Verma et al. [38]
Dynamic Graph Convolutional Recurrent Imputation Network (DGCRIN) | Li et al. [10]
SVM, DT, ANN, and Gaussian Process Regression | Liu et al. [39]
RF, GBR, Bagging, and WaveNet | Antamis et al. [40]
RF, GBR, Natural GBR, Extreme GBR, and Bagging | Valatsos et al. [41]
Table 2. Physical aspects.

Index | Id | x1 | y1 | x2 | y2 | Loc1 | Loc2 | Date | Second
0 | 0 | 44.38008 | 39.401642 | 44.380080 | 39.401642 | 24,227.816042 | 24,227.816042 | 2021-06-17 04:47:58 | 900.0
1 | 0 | 44.38008 | 39.401642 | 44.380080 | 39.401642 | 24,227.816042 | 24,227.816042 | 2021-06-17 05:02:58 | 900.0
2 | 0 | 44.38008 | 39.401642 | 44.380080 | 39.401642 | 24,227.816042 | 24,227.816042 | 2021-06-17 05:17:58 | 900.0
3 | 0 | 44.38008 | 39.401642 | 44.380080 | 39.401642 | 24,227.816042 | 24,227.816042 | 2021-06-17 05:32:58 | 900.0
4 | 0 | 44.38008 | 39.401642 | 44.380080 | 39.401642 | 24,227.816042 | 24,227.816042 | 2021-06-17 05:47:58 | 900.0
6,995,098 | 4896 | 25.986813 | 43.974322 | 25.986813 | 43.974322 | 20,921.600716 | 20,921.600716 | 2022-07-28 10:38:47 | 901.0
6,995,099 | 4896 | 25.986813 | 43.974322 | 25.986813 | 43.974322 | 20,921.600716 | 20,921.600716 | 2022-07-28 10:40:27 | 100.0
6,995,100 | 4896 | 25.986813 | 43.974322 | 25.986813 | 43.974322 | 20,921.600716 | 20,921.600716 | 2022-07-28 10:42:09 | 102.0
6,995,101 | 4896 | 25.986813 | 43.974322 | 25.986813 | 43.974322 | 20,921.600716 | 20,921.600716 | 2022-07-28 10:57:11 | 902.0
6,995,102 | 4896 | 25.986813 | 43.974322 | 25.986813 | 43.974322 | 20,921.600716 | 20,921.600716 | 2022-07-28 11:03:30 | 379.0
6,995,103 rows × 9 columns
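Each row of Table 2 pairs a start and an end coordinate with location values and an elapsed time in seconds. For readers who want to derive displacement from such raw fixes, the sketch below computes the great-circle (haversine) distance between two GPS points. The column semantics assumed here (x = longitude, y = latitude, in degrees) are our reading of the value ranges, not a schema documented with the dataset.

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two WGS-84 points."""
    r = 6371.0  # mean Earth radius, km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Coordinates of the first record in Table 2: start and end coincide,
# so the displacement over that 900 s interval is zero.
d = haversine_km(39.401642, 44.38008, 39.401642, 44.38008)
```

A stationary fix like this one is exactly the kind of record that dwell-time analysis (Table 3) aggregates rather than treats as movement.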
Table 3. Data grouping.

Number of Data | Date | Id | Loc | Time | Time_Cumsum
0 | 2021-06-17 01:00:00 | 0 | 24,228.181138 | 45,000.0 | 45,000.0
1 | 2021-06-17 02:00:00 | 0 | 24,228.181138 | 3600.0 | 48,600.0
2 | 2021-06-17 03:00:00 | 0 | 24,228.181138 | 3600.0 | 52,200.0
3 | 2021-06-17 04:00:00 | 0 | 24,228.434423 | 21,522.0 | 73,722.0
4 | 2021-06-17 05:00:00 | 0 | 24,227.816042 | 3600.0 | 77,322.0
1,158,510 | 2022-07-29 06:00:00 | 4896 | 20,692.553074 | 3118.0 | 4,806,768.0
1,158,511 | 2022-07-29 07:00:00 | 4896 | 20,698.443486 | 3545.0 | 4,810,313.0
1,158,512 | 2022-07-29 08:00:00 | 4896 | 20,802.591637 | 46,731.0 | 4,857,044.0
1,158,513 | 2022-07-29 09:00:00 | 4896 | 20,916.917395 | 122,581.0 | 4,979,625.0
1,158,514 | 2022-07-29 12:00:00 | 4896 | 20,884.905480 | 87,301.0 | 5,066,926.0
1,158,515 rows × 5 columns
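Table 3 aggregates the raw fixes of Table 2 into hourly bins per truck, with a running total of seconds (Time_Cumsum). A minimal pandas sketch of that grouping step, using toy data and assumed column names rather than the authors' exact pipeline:

```python
import pandas as pd

# Toy GPS log shaped like Table 2 (column names are assumptions):
# one truck, time-stamped fixes, elapsed seconds per fix.
raw = pd.DataFrame({
    "id": [0, 0, 0, 0],
    "date": pd.to_datetime([
        "2021-06-17 04:47:58", "2021-06-17 05:02:58",
        "2021-06-17 05:17:58", "2021-06-17 06:02:58",
    ]),
    "loc": [24227.816042] * 4,
    "second": [900.0, 900.0, 900.0, 900.0],
})

# Bin each truck's records by hour, sum the elapsed seconds, then
# accumulate per truck -- mirroring Time / Time_Cumsum in Table 3.
grouped = (
    raw.groupby(["id", raw["date"].dt.floor("h")])
       .agg(loc=("loc", "last"), time=("second", "sum"))
       .reset_index()
)
grouped["time_cumsum"] = grouped.groupby("id")["time"].cumsum()
```

On this toy log the three hourly bins carry 900, 1800, and 900 s, and the cumulative column runs 900, 2700, 3600 s.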
Table 4. Comparison of algorithms based on metrics.

Algorithm | MSE | RMSE | MAE | R2
KNeighborsRegressor | 0.000050 | 0.007063 | 0.004540 | 0.430742
LinearRegression | 0.000052 | 0.007212 | 0.004717 | 0.406420
LinearSVR | 0.000054 | 0.007359 | 0.004423 | 0.381872
RandomForestRegressor | 0.000052 | 0.007223 | 0.004656 | 0.404593
AdaBoostRegressor | 0.000101 | 0.010039 | 0.008029 | −0.150267
GradientBoostingRegressor | 0.000051 | 0.007151 | 0.004599 | 0.416330
XGBRegressor | 0.000051 | 0.007129 | 0.004598 | 0.419922
Lasso | 0.000088 | 0.009361 | 0.006873 | −0.000002
ElasticNet | 0.000088 | 0.009361 | 0.006873 | −0.000002
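The comparison protocol behind Table 4 can be reproduced in outline with scikit-learn. The sketch below is illustrative only: it fits the paper's sklearn-based models on synthetic data (the real GPS feature matrix is not reproduced here, and XGBRegressor, which needs the separate xgboost package, is omitted) and computes MSE, RMSE, MAE, and R2 for each.

```python
import numpy as np
from sklearn.ensemble import (AdaBoostRegressor, GradientBoostingRegressor,
                              RandomForestRegressor)
from sklearn.linear_model import ElasticNet, Lasso, LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsRegressor
from sklearn.svm import LinearSVR

# Synthetic stand-in for the GPS-derived features; y is a noisy linear target.
rng = np.random.default_rng(0)
X = rng.normal(size=(500, 4))
y = X @ np.array([0.5, -0.2, 0.1, 0.3]) + rng.normal(scale=0.1, size=500)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, random_state=0)

models = {
    "KNeighborsRegressor": KNeighborsRegressor(),
    "LinearRegression": LinearRegression(),
    "LinearSVR": LinearSVR(max_iter=10_000),
    "RandomForestRegressor": RandomForestRegressor(random_state=0),
    "AdaBoostRegressor": AdaBoostRegressor(random_state=0),
    "GradientBoostingRegressor": GradientBoostingRegressor(random_state=0),
    "Lasso": Lasso(),
    "ElasticNet": ElasticNet(),
}

scores = {}
for name, model in models.items():
    pred = model.fit(X_tr, y_tr).predict(X_te)
    mse = mean_squared_error(y_te, pred)
    scores[name] = {
        "MSE": mse,
        "RMSE": mse ** 0.5,  # RMSE is the square root of MSE
        "MAE": mean_absolute_error(y_te, pred),
        "R2": r2_score(y_te, pred),
    }
```

With default hyperparameters, Lasso and ElasticNet shrink all coefficients on data of this scale toward zero and so score an R2 near 0 — consistent with their near-zero R2 in Table 4.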
Table 5. Prediction results for different windows.

Window Size | Algorithm | MSE | RMSE | MAE | R2
1 | KNeighborsRegressor | 0.0028 | 0.0528 | 0.0352 | 0.409
1 | LinearRegression | 0.0029 | 0.0535 | 0.0366 | 0.3912
1 | LinearSVR | 0.0029 | 0.054 | 0.0356 | 0.3814
1 | RandomForestRegressor | 0.0032 | 0.0565 | 0.0375 | 0.3224
1 | AdaBoostRegressor | 0.0052 | 0.0724 | 0.0601 | −0.1129
1 | GradientBoostingRegressor | 0.0028 | 0.0527 | 0.0355 | 0.4108
1 | XGBRegressor | 0.0028 | 0.0525 | 0.0352 | 0.4156
1 | Lasso | 0.0047 | 0.0686 | 0.0515 | 0
1 | ElasticNet | 0.0047 | 0.0686 | 0.0515 | 0
3 | KNeighborsRegressor | 0.0026 | 0.0515 | 0.0340 | 0.4399
3 | LinearRegression | 0.0027 | 0.0517 | 0.0350 | 0.4338
3 | LinearSVR | 0.0028 | 0.0533 | 0.0326 | 0.3989
3 | RandomForestRegressor | 0.0027 | 0.052 | 0.0347 | 0.4275
3 | AdaBoostRegressor | 0.0035 | 0.0594 | 0.0454 | 0.2543
3 | GradientBoostingRegressor | 0.0026 | 0.0511 | 0.0340 | 0.4486
3 | XGBRegressor | 0.0026 | 0.0506 | 0.0335 | 0.458
3 | Lasso | 0.0047 | 0.0687 | 0.0514 | -
3 | ElasticNet | 0.0047 | 0.0687 | 0.0514 | 0
5 | KNeighborsRegressor | 0.0026 | 0.0512 | 0.0338 | 0.424
5 | LinearRegression | 0.0026 | 0.0514 | 0.0347 | 0.4197
5 | LinearSVR | 0.0028 | 0.0531 | 0.0327 | 0.3813
5 | RandomForestRegressor | 0.0026 | 0.0513 | 0.0342 | 0.4236
5 | AdaBoostRegressor | 0.0034 | 0.0584 | 0.0448 | 0.2508
5 | GradientBoostingRegressor | 0.0026 | 0.0506 | 0.0337 | 0.4387
5 | XGBRegressor | 0.0025 | 0.0501 | 0.0332 | 0.4494
5 | Lasso | 0.0046 | 0.0675 | 0.0509 | -
5 | ElasticNet | 0.0046 | 0.0675 | 0.0509 | -
7 | KNeighborsRegressor | 0.0027 | 0.0518 | 0.0339 | 0.4227
7 | LinearRegression | 0.0027 | 0.0515 | 0.0346 | 0.4298
7 | LinearSVR | 0.0028 | 0.0533 | 0.0326 | 0.3885
7 | RandomForestRegressor | 0.0026 | 0.0511 | 0.0341 | 0.4387
7 | AdaBoostRegressor | 0.004 | 0.0632 | 0.0514 | 0.1419
7 | GradientBoostingRegressor | 0.0026 | 0.0507 | 0.0335 | 0.4483
7 | XGBRegressor | 0.0025 | 0.05 | 0.0330 | 0.4628
7 | Lasso | 0.0047 | 0.0682 | 0.0510 | 0
7 | ElasticNet | 0.0047 | 0.0682 | 0.0510 | 0
9 | KNeighborsRegressor | 0.0027 | 0.0519 | 0.0339 | 0.4199
9 | LinearRegression | 0.0026 | 0.0512 | 0.0343 | 0.4343
9 | LinearSVR | 0.0028 | 0.0525 | 0.0324 | 0.4059
9 | RandomForestRegressor | 0.0026 | 0.0507 | 0.0338 | 0.4458
9 | AdaBoostRegressor | 0.0045 | 0.067 | 0.0553 | 0.0336
9 | GradientBoostingRegressor | 0.0025 | 0.0505 | 0.0333 | 0.4508
9 | XGBRegressor | 0.0025 | 0.0498 | 0.0328 | 0.4659
9 | Lasso | 0.0046 | 0.0681 | 0.0508 | −0.0001
9 | ElasticNet | 0.0046 | 0.0681 | 0.0508 | −0.0001
11 | KNeighborsRegressor | 0.0027 | 0.0516 | 0.0339 | 0.4133
11 | LinearRegression | 0.0026 | 0.0508 | 0.0342 | 0.4311
11 | LinearSVR | 0.0037 | 0.0609 | 0.0497 | 0.1835
11 | RandomForestRegressor | 0.0025 | 0.0504 | 0.0338 | 0.4407
11 | AdaBoostRegressor | 0.0061 | 0.0781 | 0.0685 | −0.346
11 | GradientBoostingRegressor | 0.0025 | 0.05 | 0.0332 | 0.4491
11 | XGBRegressor | 0.0024 | 0.0494 | 0.0328 | 0.4624
11 | Lasso | 0.0045 | 0.0674 | 0.0505 | 0
11 | ElasticNet | 0.0045 | 0.0674 | 0.0505 | 0
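Table 5 reruns the comparison for sliding-window sizes from 1 to 11. The paper does not spell out the windowing construction in this excerpt, so the sketch below shows one common scheme, assumed here for illustration: the previous `window` travel-time values become the feature vector for predicting the next value.

```python
import numpy as np

def make_windows(series, window):
    """Stack the previous `window` values as features to predict the next one."""
    series = np.asarray(series, dtype=float)
    X = np.array([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

travel_time = [900, 905, 910, 920, 915, 930, 940]  # illustrative values
X, y = make_windows(travel_time, window=3)
# X[0] = [900, 905, 910] is paired with the target y[0] = 920, and so on.
```

A larger window gives each model more history per sample but fewer samples overall, which is the trade-off the window-size sweep in Table 5 probes.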
Table 6. Results of different data parts for the algorithms based on different parameters.

Size of Each Data Part | Algorithm | MSE | RMSE | MAE | R2
50,000 | KNeighborsRegressor | 0.0042 | 0.065 | 0.043326 | 0.4353
50,000 | LinearRegression | 0.0043 | 0.0653 | 0.044548 | 0.429
50,000 | LinearSVR | 0.0048 | 0.0692 | 0.043532 | 0.3596
50,000 | RandomForestRegressor | 0.0044 | 0.0664 | 0.044555 | 0.4108
50,000 | AdaBoostRegressor | 0.0078 | 0.0883 | 0.075135 | −0.0434
50,000 | GradientBoostingRegressor | 0.0042 | 0.0644 | 0.043308 | 0.4445
50,000 | XGBRegressor | 0.0041 | 0.0642 | 0.042847 | 0.4486
50,000 | Lasso | 0.0075 | 0.0865 | 0.064728 | 0
50,000 | ElasticNet | 0.0075 | 0.0865 | 0.064728 | 0
75,000 | KNeighborsRegressor | 0.0026 | 0.0511 | 0.033681 | 0.4357
75,000 | LinearRegression | 0.0027 | 0.0515 | 0.034808 | 0.4166
75,000 | LinearSVR | 0.0028 | 0.0531 | 0.032382 | 0.3799
75,000 | RandomForestRegressor | 0.0027 | 0.0518 | 0.034425 | 0.411
75,000 | AdaBoostRegressor | 0.0097 | 0.0983 | 0.089883 | −1.124
75,000 | GradientBoostingRegressor | 0.0026 | 0.0508 | 0.033757 | 0.432
75,000 | XGBRegressor | 0.0025 | 0.0505 | 0.033364 | 0.4395
75,000 | Lasso | 0.0045 | 0.0674 | 0.05034 | 0
75,000 | ElasticNet | 0.0045 | 0.0674 | 0.05034 | 0
100,000 | KNeighborsRegressor | 0.0027 | 0.0518 | 0.034224 | 0.4242
100,000 | LinearRegression | 0.0027 | 0.0521 | 0.035349 | 0.4177
100,000 | LinearSVR | 0.003 | 0.055 | 0.034161 | 0.3519
100,000 | RandomForestRegressor | 0.0027 | 0.0523 | 0.034851 | 0.4134
100,000 | AdaBoostRegressor | 0.004 | 0.0632 | 0.052425 | 0.1442
100,000 | GradientBoostingRegressor | 0.0026 | 0.0514 | 0.034216 | 0.4333
100,000 | XGBRegressor | 0.0026 | 0.0511 | 0.033777 | 0.4417
100,000 | Lasso | 0.0047 | 0.0683 | 0.051225 | 0
100,000 | ElasticNet | 0.0047 | 0.0683 | 0.051225 | 0
Table 7. Outcomes of our algorithms employing our parameters in the k-fold technique.

k | Algorithm | MSE | RMSE | MAE | R2
1 | KNeighborsRegressor | 0 | 0.007 | 0.004554 | 0.4319
1 | LinearRegression | 0 | 0.007 | 0.00473 | 0.4228
1 | LinearSVR | 0.0001 | 0.0093 | 0.00688 | 0.0029
1 | RandomForestRegressor | 0 | 0.0071 | 0.004652 | 0.4205
1 | AdaBoostRegressor | 0.0001 | 0.0083 | 0.006639 | 0.209
1 | GradientBoostingRegressor | 0 | 0.007 | 0.0046 | 0.4314
1 | XGBRegressor | 0 | 0.0069 | 0.004516 | 0.4433
1 | Lasso | 0.0001 | 0.0093 | 0.006879 | 0
1 | ElasticNet | 0.0001 | 0.0093 | 0.006879 | 0
2 | KNeighborsRegressor | 0 | 0.007 | 0.004545 | 0.4354
2 | LinearRegression | 0 | 0.007 | 0.004724 | 0.4226
2 | LinearSVR | 0.0001 | 0.0071 | 0.005144 | 0.4067
2 | RandomForestRegressor | 0 | 0.007 | 0.004652 | 0.4206
2 | AdaBoostRegressor | 0.0001 | 0.0082 | 0.006646 | 0.2141
2 | GradientBoostingRegressor | 0 | 0.0069 | 0.004586 | 0.4381
2 | XGBRegressor | 0 | 0.0069 | 0.004514 | 0.4476
2 | Lasso | 0.0001 | 0.0093 | 0.006886 | 0
2 | ElasticNet | 0.0001 | 0.0093 | 0.006886 | 0
3 | KNeighborsRegressor | 0 | 0.007 | 0.004553 | 0.4317
3 | LinearRegression | 0 | 0.0071 | 0.004732 | 0.4167
3 | LinearSVR | 0.0001 | 0.0073 | 0.004471 | 0.3704
3 | RandomForestRegressor | 0.0001 | 0.0071 | 0.004656 | 0.4128
3 | AdaBoostRegressor | 0.0003 | 0.016 | 0.01224 | −1.9995
3 | GradientBoostingRegressor | 0 | 0.007 | 0.004599 | 0.4301
3 | XGBRegressor | 0 | 0.0069 | 0.00452 | 0.4402
3 | Lasso | 0.0001 | 0.0093 | 0.006873 | 0
3 | ElasticNet | 0.0001 | 0.0093 | 0.006873 | 0
4 | KNeighborsRegressor | 0 | 0.0071 | 0.004562 | 0.4228
4 | LinearRegression | 0.0001 | 0.0071 | 0.004743 | 0.4127
4 | LinearSVR | 0.0001 | 0.0095 | 0.008205 | −0.0491
4 | RandomForestRegressor | 0.0001 | 0.0071 | 0.004662 | 0.4108
4 | AdaBoostRegressor | 0.0004 | 0.0199 | 0.016922 | −3.6026
4 | GradientBoostingRegressor | 0.0001 | 0.0071 | 0.004616 | 0.4214
4 | XGBRegressor | 0 | 0.007 | 0.004531 | 0.436
4 | Lasso | 0.0001 | 0.0093 | 0.006885 | 0
4 | ElasticNet | 0.0001 | 0.0093 | 0.006885 | 0
5 | KNeighborsRegressor | 0 | 0.007 | 0.004566 | 0.4267
5 | LinearRegression | 0 | 0.007 | 0.004745 | 0.4225
5 | LinearSVR | 0.0001 | 0.0075 | 0.004588 | 0.3488
5 | RandomForestRegressor | 0.0001 | 0.0071 | 0.004663 | 0.407
5 | AdaBoostRegressor | 0.0001 | 0.0088 | 0.006909 | 0.1045
5 | GradientBoostingRegressor | 0.0001 | 0.0073 | 0.004623 | 0.3747
5 | XGBRegressor | 0.0001 | 0.0073 | 0.004541 | 0.3761
5 | Lasso | 0.0001 | 0.0093 | 0.006882 | 0
5 | ElasticNet | 0.0001 | 0.0093 | 0.006882 | 0
6 | KNeighborsRegressor | 0 | 0.0069 | 0.004538 | 0.4372
6 | LinearRegression | 0.0001 | 0.0071 | 0.004717 | 0.3977
6 | LinearSVR | 0.0001 | 0.0082 | 0.006636 | 0.2024
6 | RandomForestRegressor | 0 | 0.007 | 0.004638 | 0.422
6 | AdaBoostRegressor | 0.0001 | 0.0084 | 0.006924 | 0.1487
6 | GradientBoostingRegressor | 0 | 0.0069 | 0.004595 | 0.4336
6 | XGBRegressor | 0 | 0.0068 | 0.00451 | 0.4477
6 | Lasso | 0.0001 | 0.0092 | 0.006883 | 0
6 | ElasticNet | 0.0001 | 0.0092 | 0.006883 | 0
7 | KNeighborsRegressor | 0.0001 | 0.0076 | 0.004562 | 0.3993
7 | LinearRegression | 0.0001 | 0.0075 | 0.004743 | 0.4201
7 | LinearSVR | 0.0001 | 0.0109 | 0.008452 | −0.2307
7 | RandomForestRegressor | 0.0001 | 0.0077 | 0.004662 | 0.3857
7 | AdaBoostRegressor | 0.0018 | 0.0428 | 0.036901 | −18.0133
7 | GradientBoostingRegressor | 0.0001 | 0.0076 | 0.004603 | 0.3981
7 | XGBRegressor | 0.0001 | 0.0076 | 0.004525 | 0.4055
7 | Lasso | 0.0001 | 0.0098 | 0.006902 | 0
7 | ElasticNet | 0.0001 | 0.0098 | 0.006902 | 0
8 | KNeighborsRegressor | 0 | 0.007 | 0.004564 | 0.4295
8 | LinearRegression | 0.0001 | 0.0071 | 0.00474 | 0.4114
8 | LinearSVR | 0.0002 | 0.0134 | 0.012251 | −1.1097
8 | RandomForestRegressor | 0.0001 | 0.0071 | 0.004686 | 0.4115
8 | AdaBoostRegressor | 0.0001 | 0.0084 | 0.006911 | 0.1732
8 | GradientBoostingRegressor | 0 | 0.007 | 0.004618 | 0.4253
8 | XGBRegressor | 0 | 0.0069 | 0.004541 | 0.4374
8 | Lasso | 0.0001 | 0.0092 | 0.006881 | 0
8 | ElasticNet | 0.0001 | 0.0092 | 0.006881 | 0
9 | KNeighborsRegressor | 0 | 0.0069 | 0.004527 | 0.44
9 | LinearRegression | 0 | 0.007 | 0.004718 | 0.4198
9 | LinearSVR | 0 | 0.007 | 0.004775 | 0.4185
9 | RandomForestRegressor | 0 | 0.007 | 0.004633 | 0.4208
9 | AdaBoostRegressor | 0.0001 | 0.0082 | 0.006648 | 0.2064
9 | GradientBoostingRegressor | 0 | 0.0069 | 0.004583 | 0.4296
9 | XGBRegressor | 0 | 0.0069 | 0.004505 | 0.433
9 | Lasso | 0.0001 | 0.0092 | 0.006887 | 0
9 | ElasticNet | 0.0001 | 0.0092 | 0.006887 | 0
10 | KNeighborsRegressor | 0 | 0.0071 | 0.00457 | 0.4275
10 | LinearRegression | 0.0001 | 0.0071 | 0.004749 | 0.4209
10 | LinearSVR | 0.0001 | 0.0074 | 0.004489 | 0.3769
10 | RandomForestRegressor | 0.0001 | 0.0072 | 0.004678 | 0.4049
10 | AdaBoostRegressor | 0.0001 | 0.0107 | 0.009156 | −0.3251
10 | GradientBoostingRegressor | 0 | 0.007 | 0.004618 | 0.4292
10 | XGBRegressor | 0 | 0.007 | 0.004542 | 0.4389
10 | Lasso | 0.0001 | 0.0093 | 0.00689 | 0
10 | ElasticNet | 0.0001 | 0.0093 | 0.00689 | 0
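Table 7 repeats the evaluation under k-fold cross-validation. A compact sketch of the protocol with scikit-learn's KFold, on synthetic data and a single linear model for brevity (the paper's full model set would slot into the loop unchanged):

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import KFold

# Synthetic stand-in for the GPS-derived features and travel-time target.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 3))
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(scale=0.1, size=300)

# Ten folds: each iteration holds out one tenth of the data for testing
# and records the four metrics reported in Table 7.
fold_scores = []
for train_idx, test_idx in KFold(n_splits=10, shuffle=True, random_state=1).split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    pred = model.predict(X[test_idx])
    mse = mean_squared_error(y[test_idx], pred)
    fold_scores.append({
        "MSE": mse,
        "RMSE": mse ** 0.5,
        "MAE": mean_absolute_error(y[test_idx], pred),
        "R2": r2_score(y[test_idx], pred),
    })
```

The spread of the per-fold metrics is what makes k-fold useful here: a model whose R2 collapses on some folds (as AdaBoost does at k = 3, 4, and 7 in Table 7) is flagged as unstable even if its average looks acceptable.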
Table 8. Comparison of parameters using the LOO technique.

Metric | Result
MSE | 0.009389521
RMSE | 0.096899541
MAE | 0.06528897
R2 | 0.404312521
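Because each leave-one-out (LOO) fold tests a single sample, R2 cannot be computed per fold. A standard approach, assumed here to match the single row of metrics in Table 8, is to pool the held-out predictions across all folds and compute the four metrics once:

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import LeaveOneOut

# Small synthetic regression problem (LOO fits one model per sample,
# so it is kept deliberately tiny).
rng = np.random.default_rng(2)
X = rng.normal(size=(60, 2))
y = X @ np.array([0.8, -0.3]) + rng.normal(scale=0.1, size=60)

# Each fold predicts exactly one held-out sample; pool the predictions.
preds = np.empty_like(y)
for train_idx, test_idx in LeaveOneOut().split(X):
    model = LinearRegression().fit(X[train_idx], y[train_idx])
    preds[test_idx] = model.predict(X[test_idx])

mse = mean_squared_error(y, preds)
rmse = mse ** 0.5
mae = mean_absolute_error(y, preds)
r2 = r2_score(y, preds)
```

Pooling in this way yields one MSE/RMSE/MAE/R2 quadruple for the whole dataset, the same shape of result that Table 8 reports.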
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Ghazikhani, A.; Davoodipoor, S.; Fathollahi-Fard, A.M.; Gheibi, M.; Moezzi, R. Robust Truck Transit Time Prediction through GPS Data and Regression Algorithms in Mixed Traffic Scenarios. Mathematics 2024, 12, 2004. https://doi.org/10.3390/math12132004
