A Hybrid ARIMA-LSTM Model for Short-Term Vehicle Speed Prediction

Wang, Wei; Ma, Bin; Guo, Xing; Chen, Yong; Xu, Yonghong

doi:10.3390/en17153736


by Wei Wang ^1, Bin Ma ^1,2, Xing Guo ^3,*, Yong Chen ^1,2,* and Yonghong Xu ^1,2

^1 School of Mechanical and Electrical Engineering, Beijing Information Science and Technology University, Beijing 100192, China

^2 Beijing Laboratory for New Energy Vehicles, Beijing 100192, China

^3 School of Automation and Electrical Engineering, University of Science and Technology Beijing, Beijing 100083, China

^* Authors to whom correspondence should be addressed.

Energies 2024, 17(15), 3736; https://doi.org/10.3390/en17153736

Submission received: 22 June 2024 / Revised: 22 July 2024 / Accepted: 25 July 2024 / Published: 29 July 2024

(This article belongs to the Section E: Electric Vehicles)


Abstract: Short-term vehicle speed prediction is important in predictive energy management strategies, and accurate prediction benefits energy-saving performance. However, the nonlinear features of the speed series hinder improvements in prediction accuracy. In this study, a novel hybrid model that combines an autoregressive integrated moving average (ARIMA) model and a long short-term memory (LSTM) model is proposed to handle the nonlinear part efficiently. The ARIMA component first filters out the linear trends in the speed series, with its parameters determined by analysis of the series. The LSTM then handles the normalized nonlinear part, which is the residual of the ARIMA model. Finally, the two prediction results are superimposed to obtain the final speed prediction. To assess the performance of the hybrid model (ARIMA-LSTM), two tested driving cycles and two typical driving scenarios are analyzed. The results demonstrate that the combined model outperforms the individual ARIMA and LSTM methods in dealing with complex, nonlinear variations, and significantly improves the performance metrics, including root mean square error (RMSE), mean absolute error (MAE), and mean absolute percentage error (MAPE). The proposed hybrid model further improves the accuracy of speed prediction during vehicle travel.

Keywords: speed prediction; hybrid model; ARIMA; LSTM; nonlinear

1. Introduction

The rapid development of new energy vehicles has highlighted their advantages in controllability, drivability, and environmental friendliness, making them a research hotspot for energy saving and emission reduction [1]. Developing online optimization control strategies can significantly improve the energy utilization efficiency of new energy vehicles, and accurate speed prediction is a key factor in energy management control strategies [2,3]. Therefore, the development of efficient and accurate speed prediction algorithms is vital for the advanced control of new energy vehicles.

Because vehicle speed is typically unstable, it is difficult to predict accurately with a single model. Numerous hybrid models have been proposed to enhance the convergence speed and global search capabilities of individual neural networks, thereby improving the performance of speed time series predictions. Li [3] proposed a new data-driven backpropagation long short-term memory (BP-LSTM) algorithm for predicting the long-term average speed of individuals on a driving route. Ma [4] combined the spatio-temporal feature selection algorithm (STFSA) with the convolutional neural network gated recurrent unit (CNN-GRU) to provide a hybrid traffic speed prediction model (STFSA + CNN-GRU, abbreviated as SCG). In [5,6], a combined Markov and BP prediction model was developed to forecast vehicle velocity. Hybrid models are also often used to overcome the limitations of a single model in other time series problems. For instance, Seckin [7,8] introduced a support vector regression (SVR) model incorporating an encapsulation-based feature selection method for predicting the volatility of crude oil price time series. A hybrid neural network, called DCGNet (deep CNN and gated recurrent unit network), was developed for predicting sintering temperature [9]. Li [10] proposed a new hybrid model for wind speed interval prediction based on gated recurrent unit neural networks and variational mode decomposition. Aytac [11] devised a hybrid forecasting model for digital currency time series using LSTM and empirical wavelet transform (EWT) decomposition. However, many of these models focus primarily on the nonlinear characteristics of vehicle speed and may overlook its linear components. This oversight can result in suboptimal prediction accuracy, as the linear factors also play a crucial role in determining the vehicle’s speed.

Currently, the commonly used vehicle speed prediction methods can be categorized into vehicle-road cooperative modeling and data-driven methods [12]. Vehicle-road cooperative methods establish relationships between vehicle speed and the road environment, requiring accurate maps, GPS, and sensors, which can be costly. Data-driven methods, including Kalman filter, neural networks, and time series methods, have also been explored [13,14]. However, these methods have limitations such as sensitivity to data noise, dependency on network structure parameters, and inability to fully extract effective information from historical data. Therefore, in order to improve the effectiveness and accuracy of predicting vehicle speed, a more efficient and accurate method should be established.

Vehicle speed data typically follow a time series structure, prompting numerous researchers to employ time series analysis techniques to uncover hidden patterns in historical data for predicting vehicle speed behavior [15,16,17,18]. Among these model-based methods, ARIMA is one of the most renowned and effective linear statistical models and can achieve good prediction from local historical data. As an extension of the autoregressive moving average (ARMA) model, ARIMA has been applied in numerous areas [19,20]. In [21], an ARIMA predictor was developed to forecast road gradients and vehicle speed accurately without the help of external devices or GIS maps. However, this method has not been widely used in vehicle speed prediction due to its shortcomings in nonlinear feature extraction.

With the widespread application of information and communication technology in the vehicle field, machine learning has developed steadily over the years. With big databases, data-driven deep learning (DL) models have great potential to learn speed prediction given sufficient training, and can efficiently handle the relationships among the factors that influence prediction [22,23,24]. Currently, typical DL models used for speed prediction mainly include artificial neural networks (ANN) [25], convolutional neural networks (CNN) [26], deep neural networks (DNN) [27], and temporal models (RNN, LSTM) [3,28]. Among these models, the LSTM algorithm, an improved RNN architecture introduced in 1997 [29], has gained attention for its ability to capture nonlinear trends and correlations. Results show that the LSTM performs better than other models in predicting nonlinear data [30]. However, individual machine-learning models suffer from drawbacks such as slow convergence, sensitivity to outliers, time cost, and local minima.

The enhancement of prediction performance is often achieved by integrating two prediction algorithms, surpassing the limitations of individual prediction models to achieve improved accuracy. The hybrid approach combining linear statistical models and machine learning has found success in numerous other domains [31,32,33,34,35,36]. However, this hybrid approach has not received sufficient attention in vehicle speed prediction. The ARIMA model, widely acknowledged for its proficiency, is excellent at extracting linear patterns from time series data. Meanwhile, the RNN model is well-suited for accurately addressing complex nonlinear relationships due to its unique memory capabilities and deterministic structure. Therefore, inspired by the success of combined prediction methods, we introduce a model that integrates ARIMA and LSTM techniques to construct a vehicle speed prediction model.

The main goal of this research is to develop dependable and precise time series prediction models for vehicle speed that efficiently handle nonlinear features, thereby enhancing prediction performance across various scenarios. A hybrid ARIMA-LSTM model is investigated by modeling two components, the linear and nonlinear parts. Two tested driving cycles and two typical driving scenarios are used to verify the effectiveness of the algorithm. This study makes the following main contributions. Firstly, on the theoretical front, it showcases the efficacy and convenience of the hybrid ARIMA-LSTM model for vehicle speed prediction. Secondly, based on four typical speed conditions and through comparison with various models, the results indicate that the proposed hybrid model yields smaller errors within a 15 s prediction horizon, making it suitable for short-term vehicle speed prediction. Finally, by predicting speed during vehicle operation, it offers valuable insights into future vehicle timings, thereby addressing critical concerns related to energy conservation and driving safety. The ARIMA method excels with linearly fluctuating data, while the LSTM method excels with nonlinearly fluctuating data. Combining both in the ARIMA-LSTM hybrid model integrates the strengths of linear and nonlinear models and yields superior performance.

The remainder of the paper is structured as follows: Section 2 provides a detailed description of our proposed model. Section 3 details the analysis of naturalistic driving data, covering linear predictions with the ARIMA model, nonlinear predictions with the LSTM model, and the results from the hybrid ARIMA-LSTM model. Section 4 compares the hybrid model to the individual models. Section 5 presents the study’s conclusions.

2. Methodology

Vehicle speed data, typical of time series, show both linear and nonlinear characteristics. Prior research suggests that ARIMA is a conventional but effective method for linear statistical time series prediction, while LSTM excels in capturing nonlinear features in the data. Therefore, we propose a coupled ARIMA-LSTM model to include both linear and nonlinear components.

2.1. Description of Vehicle Speed Data

Two randomly selected real-world driving routes in Beijing are recorded, covering highway sections, city roads, and parts of the higher-speed Third Ring Road and Fourth Ring Road, as shown in Figure 1. Note that the “natural driving data driving routes” in Figure 1 represent real driving data collected from actual vehicle runs. These routes have relatively long travel distances and contain sections that are busy in daily use. To thoroughly validate the effectiveness of the proposed ARIMA-LSTM, four sets of working condition data are used, namely the Third Ring Road (3-rd Ring), the Fourth Ring Road (4-th Ring), the New European Driving Cycle (NEDC), and the Urban Dynamometer Driving Schedule (UDDS). The four sets of working condition data are shown in Figure 2.

As functional routes in daily life, typical places such as schools, stations, hospitals, shopping centers, and scenic spots account for a large proportion of daily trips. Therefore, the selected scenarios include most of these places to make the test results more credible. The statistical information of the four vehicle speed conditions is calculated, as shown in Table 1. The data for the four conditions in Table 1 correspond to the four plots in Figure 2.

2.2. The Autoregressive Integrated Moving Average Modeling, ARIMA

ARIMA is widely recognized as a highly effective linear statistical model for time series forecasting. Introduced by Box and Jenkins in the early 1970s, it can run in real time and performs well for short-term prediction [37,38]. The core of the ARIMA model is to forecast from historical observations at p sampling intervals and historical random disturbances at q sampling intervals, expressed mathematically as:

x_{t} = ϕ_{0} + ϕ_{1} x_{t-1} + ϕ_{2} x_{t-2} + \dots + ϕ_{p} x_{t-p} + ε_{t} - θ_{1} ε_{t-1} - \dots - θ_{q} ε_{t-q}, \quad t \in Z .

(1)

where p represents the AR model order, q represents the MA model order, x_{t} represents the predicted data, x_{t-1}, x_{t-2}, …, x_{t-p} represent the historical observations in period p, ϕ_{0}, ϕ_{1}, ϕ_{2}, …, ϕ_{p} represent the AR model coefficients, ε_{t-1}, ε_{t-2}, …, ε_{t-q} represent the historical noise disturbances in period q, with ε ~ N (0), and θ_{1}, θ_{2}, …, θ_{q} represent the MA model coefficients.
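As a rough illustration of how Eq. (1) produces a forecast, the sketch below evaluates the right-hand side with the unknown current disturbance ε_{t} replaced by its mean of zero. The function name `arma_one_step` and the numeric values are hypothetical and are not taken from the paper; differencing is left out.

```python
def arma_one_step(history, past_noise, phi, theta):
    """Evaluate Eq. (1) with eps_t replaced by its mean (zero).

    history    : past observations, most recent last (at least p values)
    past_noise : past disturbances, most recent last (at least q values)
    phi        : [phi_0, phi_1, ..., phi_p]
    theta      : [theta_1, ..., theta_q]
    """
    p, q = len(phi) - 1, len(theta)
    # AR part: phi_1 * x_{t-1} + ... + phi_p * x_{t-p}
    ar_part = sum(phi[i] * history[-i] for i in range(1, p + 1))
    # MA part: theta_1 * eps_{t-1} + ... + theta_q * eps_{t-q}
    ma_part = sum(theta[j - 1] * past_noise[-j] for j in range(1, q + 1))
    return phi[0] + ar_part - ma_part


# Illustrative numbers only (not from the paper).
speeds = [10.0, 11.0, 12.0]
noise = [0.0, 0.5]
pred = arma_one_step(speeds, noise, phi=[1.0, 0.6, 0.3], theta=[0.2])
```

With these made-up coefficients the forecast is 1.0 + 0.6·12 + 0.3·11 − 0.2·0.5 = 11.4.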

In this paper, the modeling process consists of four parts: the stationarity test, model order determination, model parameter estimation, and model testing of the vehicle speed series, as shown in Figure 3. The details of ARIMA modeling are as follows:

(1) The stationarity of the speed series is tested using the augmented Dickey-Fuller (ADF) test. If the speed series is non-stationary, differencing is used to make it stationary. Next, the autocorrelation function (ACF) and partial autocorrelation function (PACF) are examined to aid in selecting the model type.

(2) Model order determination is based on the Bayesian information criterion (BIC), which helps determine the model orders under the various driving behaviors. The BIC formula is:

BIC = k \ln (n) - 2 \ln (L) .

(2)

where k represents the number of unknown parameters, n represents the sample size, and L represents the likelihood function.

(3) Model parameter estimation adopts the maximum likelihood method for the parameters of the ARIMA model. The system of equations for the log-likelihood function l(\tilde{β}; \tilde{x}) of \tilde{x} is given by:

\begin{cases} \frac{\partial}{\partial σ_{ε}^{2}} l(\tilde{β}; \tilde{x}) = - \frac{n}{2 σ_{ε}^{2}} + \frac{S(\tilde{β})}{2 σ_{ε}^{4}} = 0 \\ \frac{\partial}{\partial \tilde{β}} l(\tilde{β}; \tilde{x}) = - \frac{1}{2} \frac{\partial \ln |Ω|}{\partial \tilde{β}} - \frac{1}{2 σ_{ε}^{2}} \frac{\partial S(\tilde{β})}{\partial \tilde{β}} = 0 \end{cases}

(3)

where \tilde{x} = (x_{1}, \dots, x_{n})^{T}, \tilde{β} = (ϕ_{1}, \dots, ϕ_{p}, θ_{1}, \dots, θ_{q})^{T}, S(\tilde{β}) = \tilde{x}' Ω^{-1} \tilde{x}, and

Ω = \begin{bmatrix} \sum_{i=0}^{\infty} G_{i}^{2} & \dots & \sum_{i=0}^{\infty} G_{i} G_{i+n-1} \\ \vdots & \ddots & \vdots \\ \sum_{i=0}^{\infty} G_{i} G_{i+n-1} & \dots & \sum_{i=0}^{\infty} G_{i}^{2} \end{bmatrix} .

Here G represents the Green’s function of the ARIMA model.

(4) Model testing uses the Ljung-Box (LB) method to test the model for significance, i.e., a white noise test of the residual series. The statistic is given as:

LB = n (n + 2) \sum_{k=1}^{m} \frac{\tilde{ρ}_{k}^{2}}{n - k} \sim χ^{2}(m), \quad \forall m > 0 .

(4)
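The modeling steps above can be sketched with a few small helpers. This is a minimal standard-library sketch, not the authors' implementation: `difference`, `bic`, and `ljung_box` are hypothetical names, `bic` takes the log-likelihood ln(L) directly, and ρ_k is computed here as the usual sample autocorrelation of the residuals at lag k, which is an assumption about how the statistic in Eq. (4) is evaluated. The ADF test and the likelihood maximization are not shown.

```python
import math


def difference(series, d=1):
    """Apply d rounds of first-order differencing, as in step (1)."""
    out = list(series)
    for _ in range(d):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


def bic(k, n, log_likelihood):
    """Eq. (2): BIC = k ln(n) - 2 ln(L), with ln(L) passed in directly."""
    return k * math.log(n) - 2.0 * log_likelihood


def ljung_box(residuals, m):
    """Eq. (4): LB = n (n + 2) * sum_{k=1}^{m} rho_k^2 / (n - k)."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    total = 0.0
    for k in range(1, m + 1):
        # Sample autocorrelation of the residuals at lag k.
        rho_k = sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        total += rho_k ** 2 / (n - k)
    return n * (n + 2) * total
```

In practice the LB value would be compared against a χ²(m) quantile, and candidate orders (p, q) would be ranked by their BIC values, with the smallest preferred.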

2.3. The Long Short-Term Memory Modeling, LSTM

The LSTM network is a type of RNN. Compared with the traditional RNN, the LSTM network adds a memory module, consisting of one storage unit and three logic gates, to each hidden layer neuron. This module can read, write, and save information, so the network preserves the feedback error during gradient propagation. This greatly improves the convergence of the network and makes it less likely to fall into a local optimum. The LSTM network structure is depicted in Figure 4, where the three logic gates are the input gate, forget gate, and output gate.

The neurons in Figure 4 are the hidden layer neurons in the recurrent neural network, and the added long short-term memory module gives the network the ability to learn long-term information. The three logic gates of this memory module receive the current state information from the input layer and the output of the previous memory module. They combine this information using a sigmoid function (S-function) to update the current state of the memory unit. The state output of this neuron is:

c_{t} = f_{t} c_{t-1} + i_{t} \tanh (W_{xc} V_{x\_residuals} + W_{hc} h_{t-1} + b_{c}) .

(5)

where c_{t-1} is the output of the (t-1)th neuron; f_{t} and i_{t} are the outputs of the forget and input gates, respectively; V_{x\_residuals} is the input, i.e., the speed residual sequence of Algorithm 1; W_{xc} is the weight coefficient from the input layer to the current hidden layer; W_{hc} is the weight coefficient from the previous memory module to the current memory module; h_{t-1} is the output of the previous memory module; and b_{c} is the bias term specific to the current memory module.
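To make the update in Eq. (5) concrete, the following scalar sketch computes one cell-state step. The text does not give the gate equations, so the forget and input gates are assumed to take the standard LSTM form (a sigmoid of a weighted sum of the current input and the previous output); the weight names `W_xf`, `W_hf`, `b_f`, `W_xi`, `W_hi`, and `b_i` are hypothetical.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def cell_state_update(x_t, h_prev, c_prev, w):
    """Scalar version of Eq. (5); `w` is a dict of illustrative weights."""
    # Forget and input gates (standard LSTM form, assumed here).
    f_t = sigmoid(w["W_xf"] * x_t + w["W_hf"] * h_prev + w["b_f"])
    i_t = sigmoid(w["W_xi"] * x_t + w["W_hi"] * h_prev + w["b_i"])
    # Candidate memory built from the current input and the previous output.
    candidate = math.tanh(w["W_xc"] * x_t + w["W_hc"] * h_prev + w["b_c"])
    # Eq. (5): keep part of the old state, write part of the candidate.
    return f_t * c_prev + i_t * candidate
```

With all weights set to zero, both gates equal 0.5 and the candidate is 0, so the new state is simply half of the previous state.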

2.4. ARIMA-LSTM

To achieve better prediction results, we constructed a hybrid model that combines the advantages of ARIMA and LSTM. The driving data, being time series data, can be assumed to consist of both linear and nonlinear components, represented as follows:

x_{t} = L_{t} + N_{t} + ε_{t}

(6)

where L_{t} represents the linear component of the data at time t, which is processed with ARIMA, N_{t} represents the nonlinear component, which is processed with LSTM, and ε_{t} represents the error term.

The proposed hybrid ARIMA-LSTM algorithm is listed in Algorithm 1. According to the prediction process, the hybrid model can be segmented into four steps:

(1) Record raw data. The data used in this study come from two randomly selected real-world driving scenarios and two typical cycle datasets, thereby reflecting actual vehicle speed changes on the road.

(2) Linear prediction with ARIMA modeling. The ARIMA model extracts the linear component L_{t} from the speed time series, producing the residual term ε_{L} and the autoregressive order p, which are then used as inputs for the following step.

(3) Nonlinear prediction with LSTM modeling. The LSTM model in the ARIMA-LSTM uses the residuals from the ARIMA model as its sole input. The nonlinear part is therefore predicted as N_{ε}, which is derived from the current input and the previous output, as shown in the LSTM network structure in Figure 4.

(4) Coupling and evaluation of final results. The final step adds the prediction results from the ARIMA model to those from the LSTM network to obtain the fitted speed time series. The accuracy and errors of the final results are then evaluated.

Algorithm 1: ARIMA-LSTM

Forecasting process begins
1: Input the vehicle speed sequence V_{x} to be predicted
2: Stationarity processing of V_{x} (determination of parameter D)
3: V_{x\_ARIMA}(4,4) (initial orders p, q of the vehicle speed model)
4: l(\tilde{β}; \tilde{x}) = - \frac{n}{2} \ln (2 π) - \frac{n}{2} \ln (σ_{ε}^{2}) - \frac{1}{2} \ln |Ω| - \frac{1}{2 σ_{ε}^{2}} (\tilde{x}' Ω^{-1} \tilde{x}) (model parameter estimation)
5: If LB \sim χ^{2}(m) (model checking)
6: V_{x\_ARIMA\_pre} (ARIMA prediction of vehicle speed over the 15 s horizon)
7: End if
8: V_{x\_residuals} = V_{x} - V_{x\_ARIMA\_pre}
9: Input V_{x\_residuals} to the LSTM
10: Training of the LSTM
11: V_{x\_residuals\_pre} (predicted output of the speed residuals)
12: V_{x\_pre} = V_{x\_ARIMA\_pre} + V_{x\_residuals\_pre} (ARIMA-LSTM prediction of vehicle speed over the 15 s horizon)
13: End if
End of the prediction process; return to step 6 to loop through the prediction
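The data flow of Algorithm 1 can be sketched without committing to any particular library. In the sketch below, `linear_model` and `residual_model` are hypothetical callables standing in for the fitted ARIMA and the trained LSTM, and the residual is taken as the observed speed minus the linear forecast, which is what the recombination in step 12 implies. The two stand-in predictors are deliberately trivial and are not the models used in the paper.

```python
def hybrid_forecast(series, linear_model, residual_model, warmup):
    """One-step-ahead hybrid forecasts for series[warmup:].

    linear_model(history)     -> next-value forecast (stand-in for ARIMA)
    residual_model(residuals) -> next-residual forecast (stand-in for LSTM)
    """
    residuals, forecasts = [], []
    for t in range(warmup, len(series)):
        linear_pred = linear_model(series[:t])                         # step 6
        resid_pred = residual_model(residuals) if residuals else 0.0   # step 11
        forecasts.append(linear_pred + resid_pred)                     # step 12
        # Step 8: residual = observed speed minus the linear forecast.
        residuals.append(series[t] - linear_pred)
    return forecasts


def persistence(history):
    """Trivial linear stand-in: repeat the last observed value."""
    return history[-1]


def last_residual(residuals):
    """Trivial residual stand-in: repeat the last observed residual."""
    return residuals[-1]


ramp = [0.0, 2.0, 4.0, 6.0, 8.0]
linear_only = [persistence(ramp[:t]) for t in range(1, len(ramp))]
hybrid = hybrid_forecast(ramp, persistence, last_residual, warmup=1)
```

On this ramp the persistence stand-in lags the signal by 2 at every step, and the residual stand-in removes that lag from the second forecast onward, which is the role the residual model plays in the hybrid scheme.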

3. Analysis of Vehicle Speed Prediction

3.1. Evaluation Indicators

To assess performance across the various experimental scenarios, we chose standard metrics for evaluating time series prediction accuracy: RMSE, MAE, and MAPE. These metrics are defined as follows:

RMSE = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (x_{i}(t) - y_{i}(t))^{2}},

(7)

MAE = \frac{1}{n} \sum_{i=1}^{n} | x_{i}(t) - y_{i}(t) |,

(8)

MAPE = \frac{1}{n} \sum_{i=1}^{n} \left| \frac{x_{i}(t) - y_{i}(t)}{x_{i}(t)} \right|

(9)

where x_{i}(t) represents the original vehicle speed values, y_{i}(t) represents the values predicted by the different models, and n represents the number of samples in the time series. In general, lower values of RMSE, MAE, and MAPE signify improved prediction performance.
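The three metrics translate directly into code. The sketch below follows Eqs. (7) to (9) as written, so `mape` returns a fraction rather than a percentage. How zero-speed samples are treated in the MAPE is not stated in the text; this sketch simply assumes every actual value is nonzero.

```python
import math


def rmse(actual, predicted):
    """Eq. (7): root mean square error."""
    n = len(actual)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Eq. (8): mean absolute error."""
    n = len(actual)
    return sum(abs(x - y) for x, y in zip(actual, predicted)) / n


def mape(actual, predicted):
    """Eq. (9): mean absolute percentage error, as a fraction.

    Assumes no actual value is zero; multiply by 100 for percent.
    """
    n = len(actual)
    return sum(abs((x - y) / x) for x, y in zip(actual, predicted)) / n
```

For example, with actual speeds [10, 20, 40] and predictions [12, 18, 40], the MAE is 4/3 and the MAPE is (0.2 + 0.1 + 0)/3 = 0.1.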

3.2. Linear Prediction of ARIMA Model

Regardless of the driving cycle, the first 80% of each speed series is used for training and the remaining 20% for testing. According to this rule, the first 1395 samples of the 4-th Ring are used for training, and the remaining 337 samples are used for testing. For the 3-rd Ring, UDDS, and NEDC, about 2302, 1397, and 1165 samples are used for training, respectively, and the remaining samples are used for testing.

The ARIMA model is employed for vehicle speed prediction, and it also generates the residual components. The key parameters p, d, and q must first be determined to ensure prediction performance. The maximum differencing order d is set to 2 to avoid excessive differencing. For the 4-th Ring data, Figure 5 illustrates the original speed series and its second-order difference, which ensures stationarity. The ACF and PACF plots aid in selecting the optimal values of p and q, as depicted in Figure 6. The two blue lines represent the reasonable bounds of the ACF and PACF values. When the sample autocorrelation exceeds these bounds, it indicates that the original data are not stationary. According to the results, the optimal values for p and q are both 4. These optimal values are consistent across the four speed conditions, as presented in Table 2.

After identifying the parameters of the ARIMA (p, D, q) model, the speeds are predicted for all four speed conditions. Only the 4-th Ring condition is presented here to illustrate the whole process of single-step prediction with a prediction horizon of 15 s, as shown in Figure 7. The results show that the ARIMA model follows the trend well within the prediction horizon. From the ARIMA model’s predictions, the residual time series, which represents the prediction errors, is obtained. It is shown in the lower part of each speed profile, as indicated by the blue line.

According to the hybrid method, the residual time series is used to prepare the input data for the LSTM model. The black line in Figure 7 shows the ARIMA model residual predictions for the test set, which serve as the test data for the LSTM model. To assess the linear prediction performance, the ARIMA evaluation metrics are listed in Table 3. The results show that ARIMA can make the prediction and performs well under the NEDC condition, because the NEDC consists mostly of linear segments, as shown in Figure 2.

3.3. Nonlinear Prediction of LSTM Modeling

The LSTM model is developed to discover nonlinear relationships in the speed series. In the ARIMA-LSTM hybrid model, the residual values generated by the ARIMA model are used as the only input for the LSTM model. During the LSTM prediction process, v(t) is predicted from v(t-1), v(t-2), v(t-3), …, v(t-p). Likewise, to predict v(t+1), v(t) is included as an input value together with the terms mentioned previously.
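One common way to prepare such inputs is a sliding window over the residual sequence. The sketch below is an assumption about the data preparation, not a description of the authors' code; `make_windows` is a hypothetical helper, and the window length p is whatever lag count is chosen for the LSTM input.

```python
def make_windows(values, p):
    """Return (inputs, targets); each input holds the p values before its target."""
    inputs, targets = [], []
    for t in range(p, len(values)):
        inputs.append(values[t - p:t])  # v(t-p), ..., v(t-1)
        targets.append(values[t])       # v(t)
    return inputs, targets
```

For instance, `make_windows([1, 2, 3, 4, 5], 2)` pairs [1, 2] with 3, [2, 3] with 4, and [3, 4] with 5, so each new target reuses the previous target as its most recent input.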

To achieve good fitting and mitigate training bias, we standardize the training data by centering its mean at zero and scaling its variance to one [39]. The normalized values are computed as follows:

X_{t,N} = \frac{X_{t} - \bar{X}_{t}}{Std(X_{t})}, \quad Y_{t,N} = \frac{Y_{t} - \bar{Y}_{t}}{Std(Y_{t})} .

(10)

where X_{t,N} and Y_{t,N} represent the normalized input speed sequence and output speed data, respectively, X_{t} and Y_{t} represent the original input speed sequence and output speed data, and \bar{X}_{t} and \bar{Y}_{t} represent the mean values of the input and output. In addition, Std(X_{t}) and Std(Y_{t}) represent the standard deviations.
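A minimal sketch of Eq. (10) is given below. Two choices here are assumptions rather than details from the text: the statistics are computed on the training data only and then reused for the test data, and the population form of the standard deviation (dividing by n) is used. The helper names are hypothetical.

```python
import math


def fit_standardizer(train):
    """Return the mean and (population) standard deviation of the training data."""
    n = len(train)
    mean = sum(train) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in train) / n)
    return mean, std


def standardize(values, mean, std):
    """Eq. (10): subtract the mean and divide by the standard deviation."""
    return [(v - mean) / std for v in values]


def destandardize(values, mean, std):
    """Map normalized predictions back to the original scale."""
    return [v * std + mean for v in values]
```

The inverse mapping is needed before the LSTM output is added back to the ARIMA forecast, since the two parts must be on the same scale.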

Overfitting is a common challenge during training, where the model performs well on the training dataset but poorly on new data. To mitigate this issue, a dropout layer is integrated into the LSTM model architecture. Our neural network design comprises input, LSTM, dropout, fully connected, and regression layers. Given the intricate nature of LSTM models, parameter tuning significantly influences prediction accuracy. Our experiments show that increasing the number of iterations improves performance; the final values are set to 100, 200, 150, and 100 for the different speed conditions. The other hyperparameters include a mini-batch size of 32, the Adam optimizer, a dropout rate of 0.2, and an initial learning rate of 0.001. Figure 8 presents a comparison of LSTM predictions for residuals derived from ARIMA fitting under the 4-th Ring speed condition.

Table 4 presents the errors obtained using the LSTM algorithm for the ARIMA residual time series across the four speed conditions. Under the NEDC condition, both the RMSE and MAE are notably smaller, at 0.0206 and 0.0191, respectively, than under the other three conditions. This observation aligns with the earlier assertion regarding ARIMA’s proficiency in predicting linear data, thus justifying its use for forecasting the linear segment of speed data in this study.

3.4. Coupling Prediction Results

In this section, the speed sequences predicted by ARIMA and ARIMA-LSTM for the 4-th Ring are shown in Figure 9. In the speed plots for each condition, the prediction curves are compared with the original test data (red line). The plots include predictions from ARIMA (green line) and ARIMA-LSTM (black line). As shown in Figure 9, the ARIMA model can predict the linear trend for all four conditions, especially for condition four, where the raw speed data are nearly constant. However, ARIMA cannot capture the nonlinear fluctuations. Therefore, the LSTM model is used to learn the oscillations as well as the instability information. The results show that the ARIMA-LSTM model gives better results than the ARIMA model.

To illustrate the real-time prediction performance of ARIMA-LSTM as the vehicle travels, the whole prediction process under the four speed conditions is shown in Figure 10. The prediction horizon of each step is 15 s. According to the prediction results, both the ARIMA model and ARIMA-LSTM predict well in phases where the data change steadily. However, for speed series such as the 4-th Ring, 3-rd Ring, and UDDS, which mostly lie in the unstable variation range, ARIMA predicts only the linear part well, and its results are poor where the speed suddenly increases or decreases.

Further examination of vehicle driving patterns under multi-step prediction shows that the proposed hybrid ARIMA-LSTM model is highly capable of predicting acceleration and deceleration, as demonstrated in scenarios like the 4-th and 3-rd Rings. For example, the vehicle is accelerating and decelerating at step 1490 in Figure 10a and step 2050 in Figure 10b, where the black line (ARIMA) barely tracks the change while the green line (ARIMA-LSTM) follows the original data closely. Similarly, for UDDS and NEDC, where emergency stops and uniform speeds are prevalent, the hybrid ARIMA-LSTM model demonstrates superior performance, although ARIMA dominates in certain cases. Based on the comparison results, the proposed hybrid ARIMA-LSTM is able to predict sudden changes in the vehicle speed sequence, which can provide an important and accurate reference for energy-saving and safety applications.

4. Comparison and Discussion

4.1. The Prediction Performance under Single Step

In order to fully utilize the linear and nonlinear models, a hybrid model ARIMA-LSTM is introduced into vehicle speed prediction. The other models, including ARIMA, LSTM, RNN, CNN, and WNN (wavelet neural network), are considered and compared to evaluate the prediction performance of ARIMA-LSTM under four speed conditions. It is worth noting that WNN represents state-of-the-art methods [18].

Figure 11 shows the metrics for the different models in single-step prediction with a 15 s prediction horizon. Across the first three conditions, the error indicators fall significantly from ARIMA to LSTM, and further to ARIMA-LSTM. For NEDC, however, the error first increases from ARIMA to LSTM and then decreases from LSTM to ARIMA-LSTM, with ARIMA-LSTM achieving the minimum error values across all datasets. Meanwhile, compared with the other models (RNN, CNN, and WNN), ARIMA-LSTM achieves the lowest errors overall in Figure 11, which again verifies the effectiveness of the ARIMA-LSTM hybrid model proposed in this paper.

4.2. The Prediction Performance under Multi-Step

The above analysis of the error over the 15 s prediction horizon in single-step prediction shows that the ARIMA-LSTM hybrid model is the most effective. The following analysis covers the whole test data under multi-step prediction. As shown in Table 5, which lists the average errors at the fifth step over the entire test data, the pattern for the first three conditions is the same as in single-step prediction. Taking the 4-th Ring as an example, from ARIMA and LSTM to the ARIMA-LSTM hybrid model, the RMSE values are 1.1075, 0.5873, and 0.2440, the MAPE values are 4.29, 2.43, and 0.71, and the MAE values are 0.8251, 0.4774, and 0.1746; all error metrics decrease in that order. For the fifth-step prediction results of NEDC, using the RMSE as an example, the value of 1.1945 for ARIMA is less than the value of 2.3907 for the LSTM, which indicates that linear data dominate this condition; nevertheless, the hybrid model still performs best, with the smallest value of 0.3105.
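As a quick check on the 4-th Ring RMSE values quoted above, the absolute and relative reductions achieved by the hybrid model work out as follows. The percentages are derived here from the quoted values and are not reported in the text.

```latex
\begin{aligned}
1.1075 - 0.2440 &= 0.8635, & 0.8635 / 1.1075 &\approx 78.0\% \quad \text{(relative to ARIMA)} \\
0.5873 - 0.2440 &= 0.3433, & 0.3433 / 0.5873 &\approx 58.5\% \quad \text{(relative to LSTM)}
\end{aligned}
```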

Figure 12, Figure 13 and Figure 14 show the statistical plots of the different models for the four vehicle speed conditions at the fifth, tenth, and fifteenth steps. Lower RMSE, MAPE, and MAE values represent better prediction performance. For the 4-th Ring, the 3-rd Ring, and the UDDS, which represent the nonlinear data-dominated conditions, the statistical graphs show a consistent trend across all metrics, with the hybrid model outperforming the ARIMA and LSTM models at the fifth, tenth, and fifteenth steps. However, under the NEDC, which represents the linear data-dominated condition, the fifth-step results are convex across the three models ARIMA, LSTM, and ARIMA-LSTM: the error rises from ARIMA to LSTM and then falls for ARIMA-LSTM. This is because the ARIMA model is more suitable for predicting linear data than the LSTM model. However, the ARIMA model does not have long-term memory, whereas the LSTM does, which is the main reason why the LSTM is better than the ARIMA at the tenth and fifteenth steps. Overall, the ARIMA-LSTM hybrid model has the lowest errors in the statistical plots.

Notably, the ARIMA-LSTM hybrid outperforms the standalone ARIMA and LSTM models, as well as the RNN, CNN, and WNN models, in multi-step prediction, highlighting its adaptability and efficiency in capturing both linear and nonlinear temporal patterns. ARIMA excels at linear speed trends, while LSTM is particularly proficient in capturing nonlinear fluctuations. Consequently, the proposed ARIMA-LSTM hybrid model presents a robust and efficient solution for vehicle speed time series prediction, with significant implications for energy conservation and travel safety.

5. Conclusions

Accurate short-term velocity prediction benefits the energy-saving performance of predictive energy management strategies, for example by providing reliable power curves for model predictive control. This enables better power tracking and distribution among multiple power sources, thereby enhancing vehicle performance and energy utilization efficiency. However, the nonlinear feature of the speed series hinders the improvement of prediction accuracy. This study focuses on developing reliable and accurate prediction models for vehicle velocity series that handle the nonlinear part efficiently. To this end, a novel hybrid model that combines ARIMA and LSTM models is proposed. The ARIMA component filters out linear trends from the speed series data, and the LSTM handles the normalized nonlinear residual. The two parts of the prediction results are then superimposed to obtain the final speed prediction. Two actual driving cycles and two typical driving scenarios are subjected to rigorous analysis. The comparison demonstrates that the combined prediction model outperforms the individual ARIMA and LSTM methods, as well as the RNN, CNN, and WNN models. Taking the 4-th Ring as an example in multi-step prediction, the ARIMA-LSTM model reduces the RMSE at fifths by 0.8635 compared to the ARIMA model and by 0.3433 compared to the LSTM model. Additionally, compared to the RNN, CNN, and WNN models, it reduces the RMSE by 0.6226, 1.4138, and 0.4059, respectively. In conclusion, the hybrid ARIMA-LSTM model emerges as the best of the compared models for predicting vehicle speed during travel, offering improved performance in dealing with complex, nonlinear variations. The proposed hybrid model further improves the prediction accuracy for vehicle traveling processes.
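The RMSE reductions quoted for the 4-th Ring follow directly from the RMSE column of Table 5. The short check below recomputes them; the dictionary layout and variable names are illustrative choices, while the numbers are copied from Table 5.

```python
# RMSE values for the 4-th Ring at fifths, copied from Table 5.
rmse_4th_ring = {
    "ARIMA": 1.1075,
    "LSTM": 0.5873,
    "ARIMA-LSTM": 0.2440,
    "RNN": 0.8666,
    "CNN": 1.6578,
    "WNN": 0.6499,
}

hybrid = rmse_4th_ring["ARIMA-LSTM"]

# Absolute RMSE reduction of the hybrid model relative to each baseline.
reductions = {
    model: round(value - hybrid, 4)
    for model, value in rmse_4th_ring.items()
    if model != "ARIMA-LSTM"
}

for model, drop in reductions.items():
    print(f"{model}: {drop:.4f}")
# ARIMA: 0.8635
# LSTM: 0.3433
# RNN: 0.6226
# CNN: 1.4138
# WNN: 0.4059
```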

Author Contributions

Conceptualization, B.M.; methodology, B.M.; software, W.W.; validation, W.W. and X.G.; formal analysis, Y.C.; investigation, W.W. and X.G.; resources, B.M. and Y.X.; writing—original draft preparation, W.W. and X.G.; writing—review and editing, W.W., B.M. and X.G.; visualization, Y.C. and Y.X.; supervision, W.W., B.M., X.G., Y.C. and Y.X. All authors have read and agreed to the published version of the manuscript.

Funding

This work is funded by the National Natural Science Foundation of Beijing (Grant number 3212005, Grant number 3244039), the National Natural Science Foundation of China (Grant number 52302425, Grant number 51608040), and the Open Research Fund of the Public Security Behavioral Science Laboratory (Grant number 2023ZB02).

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to confidentiality reasons related to laboratory data.

Conflicts of Interest

The authors declare no conflicts of interest.

Figure 1. Natural driving data driving routes.

Figure 2. Natural driving vehicle speed data curves.

Figure 3. ARIMA modeling process.

Figure 4. The basic structure of LSTM.

Figure 5. The original speed series of 4-th Ring and after differencing.

Figure 6. The optimal p and q with the graphical method.

Figure 7. The linear modeling single-step results of the ARIMA method of 4-th Ring.

Figure 8. Comparison of LSTM modeling single-step results of 4-th Ring for ARIMA residuals.

Figure 9. Comparison of 4-th Ring hybrid modeling single-step speed prediction results.

Figure 10. Hybrid modeling results for four speeds.

Figure 11. The metric evaluation indicators of three models’ performances.

Figure 12. RMSE values of different models for different prediction step sizes.

Figure 13. MAPE values of different models for different prediction step sizes.

Figure 14. MAE values of different models for different prediction step sizes.

Table 1. The statistical information of four-speed data.

	Count	Mean	Min	Max	Standard Deviation
4-th Ring	1746	37.3143	2.436	69.0444	16.5624
3-rd Ring	2302	25.7561	0.019	67.704	18.1411
UDDS	1397	30.8974	0	91.0156	23.4633
NEDC	1165	34.0779	0	120	30.8298

Table 2. The p, D, q values of four-speed data.

Name	p Value	D Value	q Value
4-th Ring	4	2	4
3-rd Ring	5	2	5
UDDS	5	2	4
NEDC	3	2	5

Table 3. The errors of four-speed prediction data using ARIMA: the prediction time domain is 15 s.

Name	RMSE	MAPE/%	MAE
4-th Ring	1.6448	8.38	1.1543
3-rd Ring	2.8286	3.76	1.9545
UDDS	5.2525	9.37	3.8257
NEDC	0.174	0.25	0.1259

Table 4. The errors of four-speed residuals using LSTM modeling.

Name	RMSE	MAPE/%	MAE
4-th Ring	0.1422	18.57	0.1125
3-rd Ring	0.2429	175.3	0.1729
UDDS	0.5431	14.10	0.4243
NEDC	0.0206	150.7	0.0191

Table 5. Comparison of the fifth speed prediction.

Speed	Model	RMSE	MAPE/%	MAE
4-th Ring	ARIMA	1.1075	4.29	0.8251
	LSTM	0.5873	2.43	0.4774
	ARIMA-LSTM	0.2440	0.71	0.1746
	RNN	0.8666	2.71	0.7299
	CNN	1.6578	5.99	1.2204
	WNN	0.6499	4.92	0.5396
3-rd Ring	ARIMA	1.2428	7.17	0.8157
	LSTM	1.1171	2.53	0.9098
	ARIMA-LSTM	0.1028	1.35	0.0739
	RNN	1.7329	2.67	1.2673
	CNN	0.5461	1.76	0.3996
	WNN	0.5492	1.88	0.3184
UDDS	ARIMA	3.5427	43.3	2.5384
	LSTM	0.9391	3.83	0.7126
	ARIMA-LSTM	0.3505	2.94	0.2797
	RNN	0.8574	9.95	0.6988
	CNN	0.7657	9.95	0.5692
	WNN	0.6477	5.86	0.4256
NEDC	ARIMA	1.1945	0.80	0.5941
	LSTM	2.3907	2.62	2.2250
	ARIMA-LSTM	0.3105	0.30	0.2279
	RNN	0.3289	0.33	0.2741
	CNN	0.6321	0.63	0.4785
	WNN	0.6640	0.75	0.4183

© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
