1. Introduction
The prediction of the average speed (AS) of road segments plays an important role in an intelligent transportation system (ITS). Its accuracy and timeliness have a great impact on the implementation of dynamic traffic management, such as traffic congestion estimation [1] and signal control [2]. Compared with fixed detectors, floating car data collection offers high flexibility, strong real-time performance, wide coverage, and high data precision [3].
Existing studies have usually relied on traffic parameters from fixed detectors to predict the AS of a road segment, and low accuracy is the main barrier to their wide application. Cetin and Comert [4] used the coil dataset published by California PATH and proposed expectation maximization and cumulative sum (CUSUM) algorithms to predict the average traffic speed. Chandra and Al-Deek [5] mined the interaction between upstream and downstream segments using dual-loop detector data and predicted the AS of road segments with a vector autoregressive time series model. Jing et al. [6] assessed the multistep speed prediction performance of eight different models using 2-min road segment speed data collected from remote traffic microwave sensors. None of these approaches considered the traffic state of adjacent intersections, so it is difficult to accurately represent the traffic state of urban roads using data acquired from fixed detectors.
Recently, the boom in mobile Internet has inspired new ideas for traffic congestion detection. Mobile detection data from vehicles offer wide and spatially continuous coverage, and the large volume of daily traffic data makes traffic state prediction more accurate and reliable. Emerging technologies based on the Global Positioning System (GPS) make it possible to track vehicle trajectories and collect real-time traffic data across entire road networks [7], and have been introduced to predict the AS of road segments. Queen and Albers [8] proposed a dynamic Bayesian model to identify lagged causal relationships between time series and predict traffic speed at multiple road link locations. Pei et al. [9] collected GPS probe data of road segments and developed a predictive model of AS using a full Bayesian method. Combining the acceleration of the target segment and the speed of the adjacent segment, Ye et al. [10] used an improved neural network (NN) to improve the prediction performance of AS. Based on a study of the prediction bias correlation among adjacent road segments and weather factors, Yang et al. [11] employed an artificial NN and an adjustment approach to predict the AS of a road segment. Yao et al. [12] developed a support vector machine (SVM) model with spatiotemporal parameters, which is commonly used for short-term prediction under the experimental condition that the running speed is below 35 km/h. Satrinia and Saptawati [13] combined map-matching with topological information to predict traffic speed via support vector regression (SVR). Zhao et al. [14] adopted a deep learning model to predict the traffic speed during non-recurrent congestion periods. These approaches perform well only if the GPS data sampling is sufficient. Moreover, the predictive accuracy of approaches based on NNs, SVM, and SVR usually depends on the training quality of the traffic dataset.
Unlike the aforementioned methods, the Kalman filter (KF) does not depend on the training quality of the traffic dataset. It is one of the most widely used traffic prediction methods and was first introduced to traffic forecasting by Okutani and Stephanedes [15]. The KF addresses the recursive filtering of discrete linear data and has been applied to traffic variable prediction and travel time estimation [16,17]. However, because its model is linear, it is not appropriate for nonlinear and random traffic variables. To overcome this issue, the extended Kalman filter (EKF) linearizes the nonlinear state-space model and then applies the KF algorithm, which makes it suitable for nonlinear traffic prediction. Liu et al. [18] proposed a state-space model and a progressive EKF method that fuses heterogeneous data and tracks the variation in traffic dynamics. Yuan et al. [19,20] later used the EKF to predict traffic states, with a discretized Lagrangian model as the process equation. Based on the EKF, Dong et al. [21] developed a spatiotemporal model to predict traffic flow. Huang et al. [22] designed an advanced EKF algorithm that improves the accuracy of vehicle speed prediction by combining an adaptive forgetting factor with the EKF algorithm. Although the EKF has been widely adopted for speed prediction, it does not by itself deliver highly accurate parameter estimation or account for random factors.
Recursive least squares (RLS) corrects previous estimates using new observational data and is usually used for the real-time estimation of system parameters in traffic state estimation [23]. Comert et al. [24] adopted RLS filtering and proposed a model for predicting traffic speed that considers impact factors such as weather, accidents, and driving characteristics. A weighted RLS estimator was used to optimize the parameters of the linear functions. Tang et al. [25] established Takagi–Sugeno-type fuzzy rules to forecast travel speed. Aiming to optimize wireless network performance, Kulkarni et al. [26] proposed a simple mechanism to predict traffic load using RLS. However, the identification accuracy of RLS is poor in the presence of noise.
Hybrid models combine the advantages of individual approaches to improve traffic prediction accuracy [27,28]. However, the road segment data used in [27] are not sufficient, and random events should be taken into account to further improve accuracy.
In summary, existing studies based on mobile detectors have two problems. On the one hand, the accuracy of AS prediction is affected when the road segment data are insufficient or a random event occurs. On the other hand, the predictive accuracy of machine learning methods such as NNs, SVM, and SVR usually depends on the training quality of the dataset.
In this study, we propose a novel road segment AS prediction model based on floating car GPS data (FCG-ASpredictor), which adopts a spatiotemporal correlation calculation method and a recursive least squares–extended Kalman filter (RLS-EKF) to address these issues. We validated our approach by predicting the AS of four road segments in Chengdu and found that FCG-ASpredictor is feasible and highly accurate.
The rest of the paper is organized into five sections. Section 2 analyzes the data association. Section 3 describes the materials and methods. Section 4 illustrates the experimental results. Section 5 discusses the evaluation and feasibility. Finally, conclusions are drawn in Section 6.
3. Materials and Methods
The traffic flow system is highly correlated, and its changes at any given moment are random, which makes traffic status prediction difficult. RLS can estimate system parameters in real time, but its model identification accuracy is strongly affected by noise. An EKF can be applied to nonlinear system prediction, but it is sensitive to the accuracy of the state estimation.
To compensate for the shortcomings of each method and to address the issue of insufficient road segment data, FCG-ASpredictor combines the two; its main idea is shown in Figure 5. Multiple regression equations are established, and the RLS identifies their parameters from the historical AS obtained by the spatiotemporal correlation calculation method and from the external factors (i.e., weather and date attribute) of the current timeslot. The EKF then uses the measured and observed values to improve the predictive accuracy of the AS of the target road segment.
3.1. Study Area and Data Sources
Chengdu, the capital of Sichuan Province in China, is an important central city in the western region. Its geographical coordinates range over 30°05′–31°26′ N and 102°54′–104°53′ E. It consists of 20 districts, covering a total area of 14,335 km², with a resident population of 16.33 million. This paper selects the central urban areas, namely the Wuhou, Jinjiang, Qingyang, Jinniu, and Chenghua districts, as the study area.
Because floating car data have a high sampling frequency, we employ the dataset (i.e., order details) from the Chengdu branch of Didi Chuxing. The sampling interval is 3 s. The data size is 462 GB, and each record includes: (1) driver ID; (2) order ID; (3) timestamp; (4) latitude; (5) longitude; and (6) vehicle status. The raw data format is shown in Table 2.
3.2. The Computational Procedures of AS
The AS of a road segment usually refers to the average travel speed through the segment. We obtain the travel speed by accumulating (integrating) the instantaneous speed over time, and from it the AS of the road segment.
For adjacent positions belonging to the same order ID, the distance between them can be calculated from their coordinates using the spherical distance formula, and the time interval can be calculated from their timestamps. The instantaneous speed of each position is calculated as follows:
where v is the instantaneous speed, r is the earth radius, x1 and x2 are the latitudes of the adjacent positions, y1 and y2 are the longitudes of the adjacent positions, and T1 and T2 are the timestamps of the adjacent positions.
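The spherical distance formula itself is not reproduced in this text. As a minimal sketch, assuming the haversine form of the great-circle distance and a mean earth radius of 6,371 km (both are assumptions, as are the function names), the instantaneous speed between two adjacent fixes of the same order could be computed as follows:

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # assumed mean earth radius r, in metres


def spherical_distance(lat1, lon1, lat2, lon2, r=EARTH_RADIUS_M):
    """Great-circle distance in metres (haversine form); inputs in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # Clamp to 1.0 to guard against rounding slightly above the asin domain.
    return 2 * r * math.asin(min(1.0, math.sqrt(a)))


def instantaneous_speed(lat1, lon1, t1, lat2, lon2, t2):
    """Speed in m/s between two adjacent fixes; timestamps in seconds.

    Returns None when the timestamps do not advance, so callers can
    drop duplicated or out-of-order records.
    """
    dt = t2 - t1
    if dt <= 0:
        return None
    return spherical_distance(lat1, lon1, lat2, lon2) / dt
```

Here (lat1, lon1, t1) and (lat2, lon2, t2) play the roles of (x1, y1, T1) and (x2, y2, T2) in the definitions above.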
The travel distance of a positioning car based on the accumulated integral is calculated as follows:
where
is the travel distance,
is the GPS positioning time, and
is the instantaneous speed.
Since the sampling frequency is fixed, Formula (3) is modified as follows:
where
is the fixed time interval.
According to the travel distance and time interval, the travel speed is calculated as follows:
where
is the travel speed.
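The accumulation step can be illustrated with a short sketch. It assumes that the integral of the instantaneous speed is approximated by a sum over samples taken at a fixed interval, and that the elapsed time is the number of samples multiplied by that interval; the function names are hypothetical.

```python
def travel_distance(speeds, dt):
    """Approximate the integral of speed over time as a sum of v_i * dt.

    speeds: instantaneous speeds in m/s, one per GPS fix
    dt:     fixed sampling interval in seconds (3 s for this dataset)
    """
    return sum(v * dt for v in speeds)


def travel_speed(speeds, dt):
    """Travel distance divided by the elapsed time len(speeds) * dt.

    Returns None when there are no samples or the interval is invalid.
    """
    if not speeds or dt <= 0:
        return None
    return travel_distance(speeds, dt) / (len(speeds) * dt)
```

Under these assumptions the travel speed reduces to the arithmetic mean of the instantaneous speeds, because the fixed interval cancels out.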
Because floating cars are unevenly distributed over the urban road network, the speed measurement accuracy is degraded, so the AS calculation for a road segment must take the distribution of floating cars into account. To ensure an accurate calculation of the AS, the number of travel speed samples n in a given timeslot should not be less than the minimum number of samples nmin. If n is insufficient, the historical AS and the AS of the upstream and downstream segments in the same timeslot need to be integrated.
In addition, if the cumulative number m of consecutive timeslots with insufficient travel speed samples is greater than the maximum value mmax, then the number of travel speed samples in the previous mmax timeslots has been continuously less than the minimum number of samples nmin, and the AS of the upstream and downstream segments in the same timeslot is insufficient to reflect the current traffic status. In that case, it is necessary to integrate the historical AS of the road segment. The spatiotemporal correlation calculation process of AS is shown in Figure 6.
The formula for calculating the AS of the road segment is as follows:
where
is the AS of the road segment during timeslot
t,
n is the number of travel speed samples, and
is the
ith travel speed at timeslot
t.
If the travel speed sample number n of the road segment at timeslot t is smaller than the minimum sample number nmin, then the historical AS and the simultaneous AS of the upstream and downstream segments are integrated as follows:
where
is the estimated historical AS of the road segment, and
is the estimated AS of the upstream and downstream segments during the current timeslot. The control parameters nmin and mmax are obtained by calibration on the example data.
and
are calculated by weighting the corresponding correlation speeds. The weighting formula is as follows:
where
and
are the AS of the historical simultaneous timeslot and the AS of the forward timeslot, respectively;
and
are the AS of the upstream and downstream segments during the current timeslot, respectively; and
and
are weight coefficients that are adjusted according to the measurement of actual data.
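The fallback logic described in this subsection can be sketched as follows. This is one possible reading of the text, not the calibrated procedure: the thresholds `n_min` and `m_max`, the convex-combination form of the weighting, and every default weight are hypothetical placeholders.

```python
def weighted(a, b, w):
    """Convex combination w * a + (1 - w) * b (assumed weighting form)."""
    return w * a + (1.0 - w) * b


def average_speed(samples, hist_as, neighbor_as, m,
                  n_min=5, m_max=3, w_hist=0.5):
    """AS of a road segment for one timeslot, with fallbacks.

    samples:     travel speeds observed on the segment in this timeslot
    hist_as:     estimated historical AS of the segment
    neighbor_as: estimated AS of the upstream/downstream segments
    m:           count of consecutive timeslots with too few samples
    """
    n = len(samples)
    if n >= n_min:
        # Enough samples: plain mean of the travel speeds.
        return sum(samples) / n
    if m > m_max:
        # Sparse for too long: neighbours no longer reflect the segment,
        # so rely on the historical AS only.
        return hist_as
    # Sparse only recently: blend historical and neighbouring AS.
    return weighted(hist_as, neighbor_as, w_hist)
```

The two estimates passed in could themselves be built with `weighted`, for example from the AS of the historical simultaneous timeslot and the forward timeslot, or from the upstream and downstream AS.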
3.3. Establishment of Multiple Regression Equations
Because historical AS, weather, and date attributes all affect AS prediction, the strength of the relationship between the AS of the target road segment and the historical AS is measured with the Pearson correlation coefficient.
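For reference, the Pearson correlation coefficient between two equally long AS series can be computed as in the sketch below. The function name is hypothetical, and returning 0.0 for a constant series is a convention chosen here, since the coefficient is undefined in that case.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    dev_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    dev_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if dev_x == 0.0 or dev_y == 0.0:
        return 0.0  # convention for a constant series (assumption)
    return cov / (dev_x * dev_y)
```

A value close to 1 between the target segment's AS and a lagged historical series would support including that lag as a regressor.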
The AS in the historical simultaneous timeslots of the previous
days, the AS of the previous
timeslots during a day, the weather value of the current timeslot, and the date attribute value of the current timeslot are selected. The following multiple regression equation for predicting the AS value is established:
where
is the predicted AS during timeslot
t of the
kth day,
are the AS in the historical simultaneous timeslot
t of the previous
days, and
are the AS in the previous
timeslots of the
kth day.
and
are the weather-quantized value and the date-attribute-quantized value, respectively, in timeslot
t of the
kth day; these need to be quantified according to the standard.
,
,
and
are the influence weights of each system variable on the predicted value.
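As a minimal sketch of how such a regression could be evaluated, assuming it is a plain weighted sum without an intercept term (an assumption, as are the function and argument names):

```python
def predict_as(hist_same_slot, prev_slots, weather, date_attr, weights):
    """Predicted AS as a weighted sum of the regression inputs.

    hist_same_slot: AS in the same timeslot on previous days
    prev_slots:     AS in the preceding timeslots of the current day
    weather:        quantized weather value of the current timeslot
    date_attr:      quantized date attribute of the current timeslot
    weights:        one influence weight per input, in the same order
    """
    features = list(hist_same_slot) + list(prev_slots) + [weather, date_attr]
    if len(features) != len(weights):
        raise ValueError("weights must match the number of inputs")
    return sum(w * x for w, x in zip(weights, features))
```

The list `features` built here is the regressor vector whose weights the RLS step in the next subsection identifies.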
3.4. System Identification of RLS Method
The system parameters are identified and updated according to Formula (10). The transformed recursive equation is as follows:
where
is the AS value of the road segment,
is the identified parameter vector, and
is the error caused by observation noise.
and
are recorded as vectors as follows:
Combining Formulas (11), (12), and (13), the system parameter identification gain and the error covariance matrix are updated. The least-squares equation is expressed as follows:
where
is the parameter identification gain for timeslot
t,
is the error covariance matrix of different timeslots, and
is the identity matrix of the identification parameter.
According to Formulas (6), (7), and (11) to (15), the recursive formula for system parameter identification during timeslot t is expressed as follows:
where
is the least-squares estimate of the system parameters for different timeslots, and
is the correction term of the identified parameter estimation for timeslot
t−1.
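The gain, covariance, and parameter updates described above follow the general pattern of the textbook RLS recursion. The sketch below implements that standard recursion in plain Python; it is not necessarily identical to the formulas of this paper, and the forgetting factor `lam` and all names are assumptions.

```python
def rls_update(theta, P, phi, y, lam=1.0):
    """One recursive least-squares step.

    theta: current parameter estimates (list of length d)
    P:     error covariance matrix (d x d list of lists)
    phi:   regressor vector for this timeslot
    y:     observed AS for this timeslot
    lam:   forgetting factor; 1.0 gives ordinary RLS (assumption)
    """
    d = len(theta)
    P_phi = [sum(P[i][j] * phi[j] for j in range(d)) for i in range(d)]
    phiT_P = [sum(phi[i] * P[i][j] for i in range(d)) for j in range(d)]
    denom = lam + sum(phi[i] * P_phi[i] for i in range(d))
    K = [v / denom for v in P_phi]            # identification gain
    err = y - sum(phi[i] * theta[i] for i in range(d))
    theta_new = [theta[i] + K[i] * err for i in range(d)]
    # Covariance update: P <- (I - K phi^T) P / lam
    P_new = [[(P[i][j] - K[i] * phiT_P[j]) / lam for j in range(d)]
             for i in range(d)]
    return theta_new, P_new, K


# Usage: recover y = 2*a + 3*b from noise-free samples.
theta, P = [0.0, 0.0], [[1e6, 0.0], [0.0, 1e6]]
for phi, y in [([1.0, 0.0], 2.0), ([0.0, 1.0], 3.0),
               ([1.0, 1.0], 5.0), ([2.0, 1.0], 7.0)]:
    theta, P, _ = rls_update(theta, P, phi, y)
```

Starting from a large initial covariance expresses low confidence in the initial parameters, so the first few observations dominate the estimate.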
3.5. Implementation of EKF
It can be seen from Formula (10) that the AS prediction model includes nonlinear external factors such as the weather and date attributes. This study uses an EKF algorithm to improve the AS prediction accuracy of the target segment. For the sake of simplicity, Formula (10) is modified as follows:
where
is the number of timeslots in a day (assuming the length of the timeslot and the number of timeslots remain constant),
is the AS prediction of the road segment,
are the AS in the historical simultaneous timeslots of the previous
days, and
are the AS of the previous
timeslots during a day.
According to Formula (17), the standard form of the state equation and the observation equation are expressed as follows:
where
and
are state and observation vector values, respectively;
is the system process noise;
is the observation noise; and
and
are nonlinear mapping functions of the state equations and observation equations, respectively.
,
, and
are expressed as follows:
where
is the estimated value of
, and
and
are the system state matrix and the observation matrix, respectively.
According to Formulas (17) to (21),
and
are derived as follows:
The three components of state vector
in Formula (19) are
,
, and
. They are partial derivatives.
and
are converted to a Jacobian matrix:
The corresponding parameters
,
, and
in Formula (24) are calculated as follows:
Since the specific values of parameters
,
, and
corresponding to Formula (25) are calculated by Formula (16), then
and
are known values. Combining with the KF, the time update of Formula (17) is expressed as follows:
where
is the prior estimate of the state vector at timeslot
t,
is the covariance of the state vector estimation error, and
is the covariance matrix of the process noise.
According to Formulas (18), (23), (26), and (27), the observation update of Formula (17) is expressed as follows:
where
is the Kalman gain at timeslot
t,
is the posterior estimate of the state vector at timeslot
t, and
is the covariance matrix of the observation noise.
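The time update and observation update have the usual EKF structure. The sketch below shows one step of a scalar EKF to make that structure concrete. It simplifies the three-component state vector described above to a single state variable, and the callables `f`, `F`, `h`, and `H` (the mappings and their derivatives) are hypothetical placeholders rather than the functions of this paper.

```python
def ekf_step(x, P, z, f, F, h, H, Q, R):
    """One predict/update cycle of a scalar extended Kalman filter.

    x, P: posterior state estimate and its error variance at t-1
    z:    observation at timeslot t
    f, h: nonlinear state-transition and observation functions
    F, H: their derivatives (the 1x1 Jacobians)
    Q, R: process and observation noise variances
    """
    # Time update: propagate the state and linearize f around x.
    x_prior = f(x)
    Fx = F(x)
    P_prior = Fx * P * Fx + Q

    # Observation update: linearize h around the prior estimate.
    Hx = H(x_prior)
    K = P_prior * Hx / (Hx * P_prior * Hx + R)   # Kalman gain
    x_post = x_prior + K * (z - h(x_prior))
    P_post = (1.0 - K * Hx) * P_prior
    return x_post, P_post, K


# Usage with identity mappings, where the EKF reduces to the linear KF.
x1, P1, K1 = ekf_step(
    x=0.0, P=1.0, z=10.0,
    f=lambda s: s, F=lambda s: 1.0,
    h=lambda s: s, H=lambda s: 1.0,
    Q=0.0, R=1.0,
)
```

For a vector state the same steps apply with matrix products, transposes, and a matrix inverse in place of the scalar operations.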
6. Conclusions
In this paper, we propose an integrated model for predicting road segment AS: FCG-ASpredictor. It incorporates the spatiotemporal correlation calculation and RLS-EKF to address two issues: (1) low accuracy due to insufficient data and (2) poor training quality. Verification with traffic data from Chengdu, China shows that the proposed model is feasible. The main contributions of this paper are as follows: (1) a new design to obtain an accurate AS of the road segment: we use the number of travel speed samples and the cumulative number of consecutive timeslots with too few travel speed samples as the benchmark metrics, and build a spatiotemporal correlation calculation method based on GPS data; (2) a new approach based on RLS-EKF, which uses the RLS to fuse the historical AS with other factors (such as weather and date attributes) and applies the EKF to predict the AS of the target segment. The experimental results show that the RLS-EKF performs well and achieves high accuracy.
The FCG-ASpredictor combines various impact factors such as AS-hst, AS-ft, WC-ct, DA-ct, etc., and achieves good results for the AS prediction of road segments. However, limitations remain when applying the model to long-term traffic speed prediction; thus, we will work toward improving the model's adaptation to spatiotemporal correlations in the future.