## **1. Introduction**

Smart cities have emerged at the heart of "next stage urbanization" because they are equipped with fully digital infrastructure and communication technologies that facilitate efficient urban mobility. Connected devices are the fundamental enabler of a smart city, though the real concern is how the collected data are distributed city-wide through sensor technologies via the Internet of Things (IoT). Heterogeneous vehicular networks in a connected infrastructure are able to sense, compute, and communicate information through various access technologies: Universal Mobile Telecommunications System (UMTS), Fourth Generation (4G), and Dedicated Short-Range Communications (DSRC) [1,2]. In vehicular sensor networks (VSN) and the Internet of Vehicles (IoV), each vehicle acts as a receiver, sender, and router simultaneously, transmitting data over the network or to a central transportation agency as an integral part of intelligent transportation systems (ITS) [3,4]. Furthermore, every network node in a VSN is assumed to store, carry, and precisely transfer the data with cooperative behavior.

In recent years, following rapid diversification, navigation technologies and traffic information services have enabled a large amount of data to be collected from different devices such as loop detectors, on-board equipment, speed sensors, remote traffic microwave sensors (RTMS), and road-side surveillance cameras, which have been proactively used for monitoring traffic conditions in the ITS domain [5–9]. Sensor networks in the form of road side units (RSUs) offer numerous applications, including broadcasting periodic informational, warning, and safety messages to road users. The data obtained from these different sources have provided myriad opportunities to estimate and predict travel time and future traffic states through a large number of data-driven computational and machine learning approaches. Accurate traffic state prediction (TSP) supports efficient vehicle route planning and proactive real-time traffic management.

TSP is achieved in three distinct steps: (i) prediction of the desired traffic flow parameters (i.e., volume, speed, and occupancy); (ii) identification of the traffic state; and (iii) realization of the traffic state output. TSP can be classified as either short-term or long-term prediction. Short-term prediction targets changes in traffic status over horizons such as 5, 10, 15, or 30 min, whereas long-term prediction is usually expressed in days and months [10]. Short-term predictions can either be used directly by traffic professionals to take appropriate actions or serve as inputs to proactive congestion management solutions. Short-term prediction helps reduce common problems such as traffic congestion, road accidents, and air pollution; it also provides road users and traffic management agencies with important information to assist in better decision-making [11]. Three factors affect the quality of prediction in real-time traffic information: (i) variation in data collected from sensors and other sources; (ii) the dynamic nature of traffic conditions; and (iii) the randomness and stochastic nature of traffic supply and demand. Addressing these factors remains a significant challenge for quality prediction of real-time traffic information [12].
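The three-step structure described above can be illustrated with a minimal Python sketch. It is not the method used in this study: the moving-average forecaster, the function names, and the density breakpoints (loosely modeled on commonly cited freeway level-of-service bands, in vehicles per mile per lane) are all assumptions chosen only to make the steps concrete.

```python
from bisect import bisect_left

# Illustrative density breakpoints (veh/mi/lane) separating classes A..F.
# These values are an assumption for this sketch, not the paper's criteria.
LOS_BREAKS = [11, 18, 26, 35, 45]
LOS_LABELS = "ABCDEF"


def predict_next(values, window=3):
    """Step (i): naive forecast of the next value as a moving average."""
    if not values:
        raise ValueError("values must not be empty")
    recent = values[-window:]
    return sum(recent) / len(recent)


def identify_state(flow_vph_per_lane, speed_mph):
    """Step (ii): derive density from flow / speed and map it to a class."""
    if speed_mph <= 0:
        return "F"  # stopped traffic is treated as the worst class
    density = flow_vph_per_lane / speed_mph
    return LOS_LABELS[bisect_left(LOS_BREAKS, density)]


def predict_state(flows, speeds, window=3):
    """Step (iii): output the predicted traffic state for the next interval."""
    return identify_state(predict_next(flows, window),
                          predict_next(speeds, window))
```

For instance, `predict_state([1200, 1300, 1400], [60, 60, 60])` averages to a flow of 1300 veh/h/lane at 60 mph, i.e., a density of about 21.7 veh/mi/lane, which falls in the illustrative class C.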

In TSP, prediction methodologies are broadly divided into two main categories: parametric and non-parametric techniques [8]. Parametric methods include the autoregressive integrated moving average (ARIMA) method [13], exponential smoothing (ES) [14], and the seasonal autoregressive integrated moving average (SARIMA) method [15,16]. Li et al. suggested a multi-view learning approach to estimate the missing values in traffic-related time series data [17]. Parametric methods pre-determine the structure of the model based on theoretical or physical assumptions and then tune a set of parameters that represent the traffic conditions (i.e., a trend in the actual world) [10,11]. These practices develop a mathematical function between historical and predicted states; an example is model-based time series such as ARIMA, which is commonly used for traffic prediction among parametric methods [18]. Autoregressive models provide better accuracy for TSP when traffic information from upstream and downstream locations on freeways is taken into account [9]. Parametric methods have good accuracy and high computational efficiency and are highly suited for linear or stationary time series [19]. On the other hand, non-parametric approaches provide several advantages, such as the ability to avoid strong model assumptions and to learn implicit dynamic traffic characteristics from archived traffic data. These models can manage non-linear, dynamic tasks and can also utilize spatial–temporal relationships, but they require a large amount of historical data and training. Non-parametric techniques include artificial neural networks (ANN) [20–22], support vector regression (SVR) [23,24], K-nearest neighbors (KNN) [25–29], and Bayesian models [30,31]. Non-parametric techniques yield better prediction accuracy than ordinary parametric techniques such as time series models, but they require significantly higher computational effort, and their prediction accuracy depends largely on the quantity and quality of the training data [32]. The above-mentioned methods have been successfully deployed in various transport-related applications where predictions are required, for example for excessive passenger flow at a metro station or for a crowd gathered at a special event [33,34].
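To make the idea of a parametric method concrete, the following Python sketch implements simple exponential smoothing, the most basic member of the ES family mentioned above. The model structure is fixed in advance (a single level updated as s_t = αx_t + (1 − α)s_{t−1}), and only the parameter α is tuned. The function name and the default value of `alpha` are assumptions for illustration; the sketch is not taken from the cited works.

```python
def exponential_smoothing_forecast(series, alpha=0.5):
    """One-step-ahead forecast with simple exponential smoothing.

    The level is initialised with the first observation and updated as
    level = alpha * x + (1 - alpha) * level for each later observation.
    The final level is returned as the forecast for the next interval.
    """
    if not series:
        raise ValueError("series must not be empty")
    if not 0 < alpha <= 1:
        raise ValueError("alpha must lie in (0, 1]")
    level = series[0]
    for x in series[1:]:
        level = alpha * x + (1 - alpha) * level
    return level
```

With `alpha` close to 1 the forecast tracks the latest observation; with `alpha` close to 0 it reacts slowly and smooths out short-term fluctuations, which is the single tuning decision this model exposes.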

A critical review of the TSP literature indicates that time series and conventional ANN models have been widely employed for short-term TSP. Although these models aim to fit speed, density, or volume data, they are usually prone to overfitting, which compromises their ability to capture generalized trends for traffic prediction. Macroscopic traffic parameters such as traffic flow, traffic speed, and density are the state variables of interest used in TSP, and are subsequently evaluated using level of service (LOS). However, the training and testing accuracy of the majority of such modeling approaches is frequently questioned. To overcome this issue, we incorporated recent state-of-the-art AI and machine learning approaches, namely decision jungles and LD-SVM (via hyperparameter optimization), as these methods have rarely been explored in existing works. The data utilized in the current study were extracted from the traffic simulator 'VISSIM', which realistically simulates complex vehicle interactions in transportation systems. Furthermore, this study makes major contributions in terms of spatiotemporal analysis of different LOS classes (i.e., A-F) under different data-collection time intervals. In general, we emphasized short-term prediction, which is considered useful for improving the productivity of transportation systems and beneficial in reducing both direct and indirect costs. Moreover, this study reviewed the different techniques and approaches that have been used for short-term TSP. A comprehensive comparative analysis was also conducted to evaluate the ability and efficiency of the proposed methods in terms of prediction accuracy. The specific main contributions of this paper are:


The remainder of this paper is organized as follows. Section 2 presents a brief overview of the methods and techniques for TSP in the existing literature. Section 3 describes the preliminaries for the different machine learning models used in this study. Section 4 presents the study area, data description, and key parameter settings. Section 5 highlights the results and discussion. Section 6 compares the different models. Finally, Section 7 summarizes the conclusions, presents key study limitations, and offers an outlook for future studies.

## **2. Related Work**

Since the early 1980s, non-linear traffic flow prediction has been the focus of several research studies, as it is regarded as extremely useful for real-time proactive traffic control measures [15,16]. Since the 1980s, artificial neural networks (ANNs) have been widely used for the analysis and prediction of time series data. They can capture the non-linear connection between input features and output variables, which in turn can produce effective TSP solutions. For example, Zheng et al. combined Bayesian inference and neural networks to forecast future traffic flow [35]. Ziang and Adeli proposed a time-delay recurrent wavelet neural network, in which periodicity was shown to be significant for traffic flow forecasting [36]. Parametric methods can obtain better prediction outcomes when traffic flow data vary temporally. However, these methods rely on a variety of restrictive assumptions, such as residual normality and a predefined system structure, and they rarely converge because of the stochastic or non-linear characteristics of traffic flow.

To address the limitations of parametric models, different approaches including linear kernel, polynomial kernel, Gaussian kernel, and optimized multi-kernel SVM (MK-SVM) have been proposed in recent research studies for traffic flow prediction [37–40]. MK-SVM predicts by mapping the linear parts of historical traffic flow data using the linear kernel, while the residual is mapped using the non-linear kernel. Alternatively, rule induction techniques, which search the training data for propositional if–then rules, can also be used. CN2 is the best-known example of this approach and has been successfully utilized in previous studies for flow prediction [41,42]. Hashemi et al. developed different classification models based on if–then rules for short-term traffic state prediction on a highway segment [43]. In contrast, the most popular ANN structure is the multi-layer perceptron (MLP), which has been widely used in many transport applications due to its simplicity and capacity to conduct non-linear pattern classification and function approximation. The MLP model generally works well in capturing complex and non-linear relations, but it usually requires a large volume of data and complex training. It is widely regarded as the most commonly implemented network topology [44–46]. Recently, Chen et al. adapted a novel approach using dynamic graph hybrid automata for the modeling and estimation of density on an urban freeway in the city of Beijing, China, and validated the feasibility of their modeling approach on Beijing's Third Ring Road [47]. A recent study by Zahid et al. proposed a new ensemble-based fast forest quantile regression (FFQR) method for short-term travel speed prediction [48]. It was concluded that the proposed approach yielded robust speed prediction results, particularly at larger time horizons.

Aside from the above-mentioned models, decision trees and forests have a rich history in machine learning and have shown significant progress in TSP, as reported in some of the recent literature [49,50]. Various studies have addressed the shortcomings of traditional decision trees, for example, their sub-optimal efficiency and lack of robustness [51,52]. Another study investigated the efficacy of ensemble decision trees for TSP and concluded that tree-based models have traditionally generated efficient predictions [50]. At the same time, researchers have concluded that learning with ideal decision trees could be problematic due to overfitting [53]. This approach is also limited by the amount of data required, since the number of nodes in a decision tree increases exponentially with depth, which affects accuracy [54]. Recently, a study proposed a novel online seasonal adjustment factors coupled with adaptive Kalman filter (OSAF-AKF) model for estimating real-time seasonal heteroscedasticity in traffic flow series [55].

In contrast, machine learning techniques such as decision jungles and LD-SVM have shown encouraging performance on different classification problems; they depend heavily on a set of hyperparameters that govern different aspects of algorithm behavior [54,56,57]. It is important to note that no default configuration is suitable for all problem domains. Optimizing the hyperparameters of different models is therefore important for achieving good performance in TSP [56]. There are two types of hyperparameter optimization: manual and automatic. The manual approach is time-consuming and depends on expert input, while the automatic approach removes the need for expert input. Common automatic approaches include grid search and random search [58]. Several libraries have recently been introduced to optimize hyperparameters; the Hyperopt library, for example, offers different hyperparameter optimization algorithms for machine learning algorithms [59]. Existing EC-based hyperparameter optimization techniques [60,61] such as differential evolution (DE) and particle swarm optimization (PSO) are useful since they are conceptually simple and can achieve highly competitive results in various fields [62–65]. However, these methods are computationally expensive and converge slowly during the iterative process. In contrast, hyperparameter optimization methods such as random grid, entire grid, and random sweep have attracted a great deal of attention. In a random grid, the matrix of all combinations is computed and values are sampled from it for a defined number of iterations, whereas the entire grid evaluates all possible combinations. The difference between the random grid and the random sweep is that the latter selects random parameter values within the specified range, while the former only employs the exact values defined in the algorithm module. With this understanding, random sweep was chosen for hyperparameter optimization of the models in this study, with the intention of improving the accuracy of short-term TSP.
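The distinction between the three sweep strategies can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the tooling used in this study: the function names, the hyperparameter names in the usage note, and the use of uniform sampling for the random sweep are all hypothetical.

```python
import itertools
import random


def entire_grid(space):
    """Entire grid: every combination of the explicitly listed values."""
    keys = sorted(space)
    return [dict(zip(keys, combo))
            for combo in itertools.product(*(space[k] for k in keys))]


def random_grid(space, n_iter, rng):
    """Random grid: build the full grid, then sample n_iter points from it."""
    grid = entire_grid(space)
    return rng.sample(grid, min(n_iter, len(grid)))


def random_sweep(ranges, n_iter, rng):
    """Random sweep: draw values anywhere inside each (low, high) range.

    Uniform sampling is an assumption made for this sketch.
    """
    return [{k: rng.uniform(lo, hi) for k, (lo, hi) in sorted(ranges.items())}
            for _ in range(n_iter)]


def best_config(candidates, score):
    """Return the candidate configuration with the highest score."""
    return max(candidates, key=score)
```

As a usage example, `entire_grid({"depth": [2, 4, 8], "width": [16, 32]})` yields six configurations, `random_grid` evaluates only a sampled subset of those six, and `random_sweep({"lam": (0.001, 1.0)}, 5, random.Random(0))` draws five values of `lam` that need not coincide with any predefined grid point. In practice, `score` would be a validation metric of the model trained with each configuration.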
