Article

A Comparative Study of Vehicle Velocity Prediction for Hybrid Electric Vehicles Based on a Neural Network

1 Hubei Key Laboratory of Advanced Technology for Automotive Components, Wuhan University of Technology, Wuhan 430070, China
2 Hubei Research Center for New Energy & Intelligent Connected Vehicle Engineering, Wuhan University of Technology, Wuhan 430070, China
3 Hubei Collaborative Innovation Center for Automotive Components Technology, Wuhan University of Technology, Wuhan 430070, China
4 Hubei Longzhong Laboratory, Wuhan University of Technology, Xiangyang 441000, China
* Author to whom correspondence should be addressed.
Mathematics 2024, 12(4), 575; https://doi.org/10.3390/math12040575
Submission received: 14 January 2024 / Revised: 11 February 2024 / Accepted: 12 February 2024 / Published: 14 February 2024

Abstract: Vehicle velocity prediction (VVP) plays a pivotal role in determining the power demand of hybrid electric vehicles, which is crucial for establishing effective energy management strategies and, subsequently, improving fuel economy. Neural networks (NNs) have emerged as a powerful tool for VVP due to their robustness and non-linear mapping capabilities. This paper presents a comprehensive exploration of NN-based VVP methods employing both qualitative theoretical analysis and quantitative numerical simulations. The methodology involved extracting key feature parameters for the model inputs using Pearson correlation coefficients and the random forest (RF) method. Subsequently, three distinct NN-based VVP models were constructed: a backpropagation neural network (BPNN) model, a long short-term memory (LSTM) model, and a generative pre-training (GPT) model. Simulation experiments were conducted to investigate various factors, such as the feature parameters, sliding window length, and prediction horizon, and the prediction accuracy and computation time were identified as the key performance metrics for VVP. Finally, the relationship between the model inputs and velocity prediction performance was revealed through various comparative analyses. This study not only facilitated the identification of an optimal NN model configuration to balance prediction accuracy and computation time, but also serves as a foundational step toward enhancing the energy efficiency of hybrid electric vehicles.

1. Introduction

Vehicle velocity prediction (VVP) has significant theoretical value and widespread use in many vehicular applications, especially for energy saving and security control in new energy-based intelligent connected vehicles. For example, short-term VVP information can be applied in the energy management strategies (EMSs) and adaptive cruise control of hybrid electric vehicles [1]. To improve fuel economy and driver safety, short-term VVP algorithms have been given top priority to enhance the effectiveness and viability of predictive energy management strategies (PEMSs). Predicting vehicle velocity in a timely and precise manner is of significant importance, providing useful instructions before implementing receding horizon control. However, the velocity of a vehicle is influenced by a variety of factors, including driving style, driving pattern, and traffic conditions, and accurate VVP remains one of the key bottlenecks for the application of PEMSs. Thus, VVP has become an important research focus related to PEMSs.
The mainstream VVP methods for PEMSs reported in the literature can be generally divided into two groups: stochastic approaches and deterministic approaches [2]. These two kinds of VVP methods predict the future vehicle velocity time series as a probability distribution interval and a single trajectory curve, respectively. A Markov chain (MC) is the most representative stochastic VVP algorithm [3]. A multi-level MC model is considered to be an effective measure for improving the prediction accuracy of an MC; however, the size of the transition probability matrix grows exponentially with additional model inputs, which leads to a high computational burden when using this algorithm, making it impossible to cover all potential Markov states [4,5]. Deterministic VVP algorithms can be further divided into parametric methods and non-parametric methods [6]. Parametric methods perform VVP by building models with various parameters, such as an auto-regressive and moving average model (ARMA), an exponential model (EM), or a gray model (GM). Parametric methods require pre-calibration of the model parameters for the target data. The randomness and non-linearity of a vehicle’s velocity hinder the parameter calibration process; accordingly, the predictive errors of parametric methods are higher than those of non-parametric methods [7]. In contrast, non-parametric methods are also known as data-driven methods, as they employ historical data to make predictions [8]. Data-driven methods are preferred when testing random driving cycles, and neural networks (NNs) have emerged as a powerful tool due to their robustness and non-linear mapping capabilities. Under the same or similar driving conditions, vehicle velocity changes are similar or even the same. Accurate VVP can be obtained by training an NN with a reasonable number of samples, modifying the weight ratio of the neuronal functions in the hidden layer and the output layer, storing the non-linear characteristics of velocity change in the NN model as a black box, and reproducing the non-linear characteristics in prediction behavior. Extensive and in-depth research has been conducted on different NN-based VVP methods. From the perspective of the network architecture, neural networks that have been most frequently studied in the field of VVP are commonly classified into four groups—namely, feedforward architectures, recurrent architectures, hybrid architectures, and attention mechanisms—as shown in Figure 1.
In a unidirectional feedforward architecture, information flows from the input layer to the output layer. Two widely used configurations that fall under this category are backpropagation neural networks (BPNNs) and radial basis function neural networks (RBFNNs). The prediction accuracy of a BPNN is greatly affected by its initial weights and thresholds, making it vulnerable to local minimum issues. An RBFNN only needs to adjust a few significant weights that have an impact on the output, and the activation function used by the hidden layer's nodes is a Gaussian radial basis function that is symmetric about the center, thus avoiding the local optimization problem and allowing faster training than a BPNN. Lin et al. [9] developed a BPNN-based velocity prediction method for the predictive control strategy of fuel cell electric vehicles. Xiang et al. [10] proposed a vehicle velocity predictor based on an RBFNN for real-time energy management, and their simulation results showed that the RBFNN had a relatively high accuracy in the short-term prediction horizon. To improve the prediction performance of BPNNs and RBFNNs, some novel structures have also been applied to VVP. Based on the network architecture of an RBFNN, a general regression neural network (GRNN) can be established to achieve a higher training speed. Wang et al. [11] utilized a GRNN as an upper layer to perform velocity prediction for an MPC-based EMS, taking advantage of the GRNN's short training time and ability to yield accurate forecasts with a limited number of training samples. Based on the BPNN architecture, a non-linear auto-regressive model with exogenous inputs (NARX) has been proposed, adding delay and feedback mechanisms to enhance the memory of historical data; it converges more rapidly and generalizes well, with a lower sensitivity to long-term dependencies. Zhang et al. [12] constructed a VVP model with a NARX NN, and the model's prediction accuracy was validated by means of simulation comparison. Although feedforward neural networks (FNNs) and their variants are good at modeling non-linear characteristics, as shown in Figure 2a, they are only applicable to one-to-one prediction. Thus, recurrent architectures with feedback loops have been proposed to handle one-to-many prediction with input repetitions over time.
As shown in Figure 2b, the three representative structures of a recurrent neural network (RNN) are a standard RNN, a long short-term memory (LSTM), and a gated recurrent unit (GRU). As the standard RNN model suffers from gradient exploding or vanishing when dealing with longer sequences [13], LSTM and GRU, as promising variants of RNNs, have become effective methods for predicting future velocity with enhanced structures. LSTM adds an additional memory component to avoid the vanishing gradient problem, to some extent [14]. With a simpler internal configuration to balance prediction accuracy and computation time, a GRU exhibits better prediction performance than an LSTM. Du et al. [15] compared the predictive errors of an RNN and an LSTM model in each prediction step, and the results showed that the LSTM model exhibited a better prediction effect in the velocity time series. Wu et al. [16] established an LSTM model to perform medium-term prediction of driving cycles. Shin et al. [17] compared three VVP models—an RNN model, an LSTM model, and a GRU model—and the results indicated that the average prediction error of the GRU model was lower than that of the RNN model by 45.1% and that of the LSTM model by 11.4%.
With the development of deep learning, attention or self-attention mechanisms have emerged, which can achieve sequence-to-sequence correlation learning. NNs based on an attention mechanism have been introduced into the VVP field, and transformer modules with self-attention stacks have garnered extensive attention as a typical representative of this architecture; a component diagram is depicted in Figure 3. Xu et al. [18] devised a transformer-based model that integrated the features of multiple vehicles to predict the velocity of driving vehicles. Shen et al. [19] proposed a novel, deterministic, transformer-based NN to predict the acceleration and deceleration behaviors of drivers and implemented a stochastic MC-based Monte Carlo method to forecast the velocity trajectory.
As the above descriptions indicate, the various NN architectures that have been applied in the field of VVP have their own advantages and limitations, as summarized in Table 1.
Prediction accuracy and computational efficiency are considered the two primary issues in the implementation of NN-based VVP methods. To improve these two performance indices, both the optimization of NN model parameters and hybrid architectures have been extensively studied. On the one hand, optimization algorithms, such as genetic algorithms (GAs) and particle swarm optimization (PSO), are frequently employed to adjust the model parameters to enhance the prediction performance of NN models. For example, Liu et al. [20] applied a GA and PSO to optimize the model parameters of a BPNN-based VVP model. Hou et al. [21] used the fixed-order Akaike information criterion (AIC) to optimize the network parameters of an RBFNN-based vehicle velocity predictor. Bharti et al. [22] employed PSO to search for the best parameters of an LSTM-based VVP model on a global scale. On the other hand, hybrid architectures composed of feedforward, recurrent, or attention mechanism architectures have been designed to balance prediction accuracy and computational intensity. For instance, in [23,24], a VVP method combining an MC and a BPNN algorithm was proposed. Upadhyaya et al. [25] proposed a velocity prediction technique combining a BPNN and an RBFNN, wherein the RBFNN was adopted to compensate for the predictive errors of the BPNN. Li et al. [26] presented a mixed BP-LSTM prediction approach for performing velocity prediction in different driving scenarios. A blended convolutional neural network (CNN) and GRU model with an attention mechanism was proposed in [27] for VVP. Cao et al. [28] devised a CNN-LSTM-based model for traffic speed prediction.
Aside from network architectures and model parameters, the number and type of model inputs, such as the historical velocity, acceleration, date, location, weather, gradient, and traffic signals, also have a great impact on the prediction performance of NN-based VVP methods. The findings reported in [29] indicate that, after adding a vehicle's position information, relative velocity, and distance from the vehicle in front, the prediction accuracy of the proposed LSTM-based VVP model increased by 18.7%. Zhang et al. [30] utilized the distance from the first vehicle to the traffic light and the leading vehicle's velocity as the model inputs for a CNN-based VVP model. Even though the prediction accuracy and generalization performance of VVP models have improved with the inclusion of more information characteristics, difficulties related to information acquisition and poor data stability remain challenges in the field.
In summary, the model structures, optimization algorithms, and model inputs of NN-based VVP methods have been widely studied by domestic and foreign scholars to improve velocity prediction performance, and they have been proven effective through simulations and experiments. However, the quantitative impact of the above-mentioned measures on prediction performance is usually overlooked in existing studies. Thus, the optimal NN configuration for VVP remains unknown, and knowledge regarding its maximum improvement potential is also missing. To bridge this research gap, the primary objective of this paper is to explore the influencing mechanism of model inputs on the prediction performance of NN-based VVP models by means of a comparative analysis, thus laying a theoretical foundation for further improvement in the prediction performance of NN-based VVP models.
The major contributions of this paper are the following:
(1)
The feature parameters of the NN-based VVP model inputs were extracted using the random forest (RF) method and Pearson's correlation coefficient (PCC).
(2)
In view of the three typical network architectures of NNs, three VVP models were constructed based on a BPNN, an LSTM, and generative pre-training (GPT).
(3)
The simulation setup was designed on the basis of feature parameters, sliding window length, and prediction horizon, and the mean absolute error (MAE), goodness of fit (R2), and computation time were adopted as the main performance metrics for quantifying the prediction accuracy and real-time performance of the proposed VVP models.
(4)
The effects of feature parameters, sliding window length, and prediction horizon on the prediction performance of the VVP models were analyzed through numerical simulations, following which the relationship between vehicle velocity and model inputs was examined.
The remainder of this paper is organized as follows. Section 2 describes the extraction and preprocessing of driving feature parameters as the VVP model inputs. Section 3 illustrates the prediction frameworks of the BPNN-, LSTM-, and GPT-based VVP models. Section 4 introduces the simulation setup and performance metrics of the VVP models for the comparative experiments, followed by a presentation of the results and discussion in Section 5. Section 6 presents the main conclusions and directions for future works.

2. Vehicle Velocity Feature Selection and Preprocessing

This section introduces the various input features used for VVP. The parameters that represent features of the vehicle driving conditions are based on vehicle velocity and can be utilized to supplement vehicle velocity as the model inputs. Through an investigation of the relevant literature [31,32,33], a total of 17 feature parameters were selected as the initial inputs of the VVP models, as illustrated in Table 2. The major parameters of the feature parameter equations are summarized in Table 3.
An appropriate number of extracted feature parameters should be determined with simultaneous consideration of prediction accuracy and computational intensity; too few parameters lead to low prediction accuracy, while too many bring about a high computational burden. Therefore, it was necessary to process the above 17 feature parameters through feature selection to remove unnecessary and redundant features. The PCC is a general approach for evaluating the degree of correlation between two variables, and its calculation is presented in Equation (1). A larger correlation coefficient indicates a stronger correlation between two feature parameters. In this study, the correlation coefficient between two feature parameters was evaluated against a threshold of 0.8 [32,33]. The formula for calculating the correlation coefficient is the following:
$$r = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i=1}^{n} (X_i - \bar{X})^2} \sqrt{\sum_{i=1}^{n} (Y_i - \bar{Y})^2}} \tag{1}$$
where $X$ and $Y$ represent the two variables under consideration, and $\bar{X}$ and $\bar{Y}$ are the average values of the corresponding variables. After performing the correlation analysis for the 17 feature parameters, it was evident that the pairs of feature parameters with a strong correlation were $v_{mr}$ and $v_{me}$, $a_{\max}$ and $a_{me1}$, and $a_{\min}$ and $a_{me2}$, as described in Figure 4. Therefore, it was crucial to avoid selecting similar features repeatedly when choosing feature parameters.
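As a minimal sketch of this screening step (assuming the 17 feature parameters are stored as columns of a pandas DataFrame with hypothetical column names), the pairwise correlation matrix of Equation (1) can be computed and filtered against the 0.8 threshold:

```python
import pandas as pd

def correlated_pairs(features: pd.DataFrame, limit: float = 0.8):
    """Return feature pairs whose absolute Pearson correlation exceeds the limit."""
    corr = features.corr(method="pearson")  # Equation (1) for every column pair
    cols = corr.columns
    pairs = []
    for i in range(len(cols)):
        for j in range(i + 1, len(cols)):
            if abs(corr.iloc[i, j]) > limit:
                pairs.append((cols[i], cols[j], float(corr.iloc[i, j])))
    return pairs

# Hypothetical usage: df holds the 17 feature parameters, one column each,
# e.g. ["v_mr", "v_me", "a_max", "a_me1", "a_min", "a_me2", ...]
# for a, b, r in correlated_pairs(df):
#     print(f"{a} ~ {b}: r = {r:.2f}")
```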
Apart from the correlation analysis of the 17 feature parameters, an RF algorithm was also employed to further extract the feature parameters with the most significant impact on the predicted velocity. An RF algorithm calculates the contribution of each feature in every decision tree of the RF and then averages these contributions to obtain the importance ranking of the feature parameters. For the RF algorithm in the scikit-learn library, feature importance scores were calculated according to the Gini index. The relevant calculation formulae are as follows.
$$GI_m = \sum_{k=1}^{K} \hat{p}_{mk}\,(1 - \hat{p}_{mk}) \tag{2}$$
$$VIM_{jm}^{Gini} = GI_m - GI_l - GI_r \tag{3}$$
$$VIM_{ij}^{Gini} = \sum_{m=1}^{M} VIM_{jm}^{Gini} \tag{4}$$
$$VIM_j^{Gini} = \frac{1}{n} \sum_{i=1}^{n} VIM_{ij}^{Gini} \tag{5}$$
In Equation (2), $GI_m$ is the Gini index of node $m$, $K$ is the number of categories in the sample set, and $\hat{p}_{mk}$ represents the probability estimate that node $m$ belongs to class $k$. In Equation (3), $VIM_{jm}^{Gini}$ is the importance of feature $X_j$ in node $m$, and $GI_l$ and $GI_r$ represent the Gini indices of the two new nodes split from node $m$. In Equation (4), $VIM_{ij}^{Gini}$ represents the importance of the feature $X_j$ occurring $M$ times in the $i$th decision tree. In Equation (5), $VIM_j^{Gini}$ is the final importance score for the feature $X_j$, and $n$ is the number of decision trees.
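A sketch of this ranking with scikit-learn is given below. One caveat: the Gini formulas above describe the classification case, whereas for a continuous velocity target RandomForestRegressor computes impurity-based importances from the reduction in mean squared error; the averaging over trees in Equation (5) is the same.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

def rank_features(X, y, names, n_trees=100):
    """Rank features by impurity-based importance averaged over all trees."""
    rf = RandomForestRegressor(n_estimators=n_trees, random_state=0)
    rf.fit(X, y)
    # feature_importances_ realizes Equation (5): per-tree scores averaged
    order = np.argsort(rf.feature_importances_)[::-1]
    return [(names[i], float(rf.feature_importances_[i])) for i in order]

# Hypothetical usage: X is an (n_samples, 17) array of feature parameters and
# y the target velocity; the top entries correspond to the ranking in Table 4.
```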
As shown in Table 4, the top five features in terms of importance were $v_{mr}$, $v_{me}$, $va_{\max}$, $a_{\min}$, and $a$. Combined with the above feature correlation analysis based on the PCC, five feature parameters, namely $v_{mr}$, $va_{\max}$, $a_{\min}$, $a$, and $f_v$, were finally selected. Furthermore, as a typical time series, velocity shows a certain coherence over time, and previous vehicle velocity information provides some guidance on the future direction of vehicle velocity. At the same time, prediction performance is affected by the length of the historical vehicle velocity time window. Based on the above analysis, the current vehicle velocity was determined to be the primary feature output, and the five feature parameters and historical velocity were selected as the sub-feature inputs for the VVP models.

3. VVP Principle and Models

In this section, the specific implementation process of VVP is introduced, and the construction of the three VVP models based on the BPNN, LSTM, and GPT is described.

3.1. The Principle of VVP

After extracting the input feature parameters as described in the preceding section, the generation of target samples for model training and prediction began. Three VVP models based on the BPNN, LSTM, and GPT were constructed in the form of multiple inputs and a single output. The training step of these NN-based VVP models is shown in Figure 5, and the specific procedures are described below, where $M$ is the time length of the sliding window.
(1) Collect the standard driving cycle conditions of heavy vehicles to form a historical velocity database. Extract the velocity over the historical $M$ seconds to form a sliding window $[v_{t-M}, v_{t-M+1}, \ldots, v_{t-1}, v_t]$. Calculate the required feature parameters within the sliding window, including the average driving velocity $v_{mr}$, the maximum of velocity times acceleration $va_{\max}$, the minimum acceleration $a_{\min}$, the acceleration $a$, and the velocity variance $f_v$. Combine the velocity of the sliding window and the above calculated feature parameters to construct the input vector $N_{in}$, and take the $(t+1)$th second velocity as the output vector $N_{out}$:
$$N_{in} = [v_{t-M}, v_{t-M+1}, \ldots, v_{t-1}, v_t, v_{mr}, va_{\max}, a_{\min}, a, f_v] \tag{6}$$
$$N_{out} = v_{t+1} \tag{7}$$
(2) Based on step (1), the 1st vector of the input matrix is the velocity from the 1st second to the $M$th second and the corresponding driving feature parameters, and the velocity of the $(M+1)$th second is then used as the 1st ground truth of the output matrix. With 1 s as the sliding step size, the 2nd vector of the input matrix is the velocity from the 2nd second to the $(M+1)$th second and the corresponding driving feature parameters, and the velocity of the $(M+2)$th second is then used as the 2nd ground truth of the output matrix. After performing this data processing $(N-M+1)$ times, the final model input $N_{in\_F}$ and output $N_{out\_F}$ are formulated as indicated in Equation (8):
$$N_{in\_F} = \begin{bmatrix} v_1 & \cdots & v_M & \cdots & f_{v_M} \\ v_2 & \cdots & v_{M+1} & \cdots & f_{v_{M+1}} \\ \vdots & & \vdots & & \vdots \\ v_{N-M-1} & \cdots & v_{N-2} & \cdots & f_{v_{N-2}} \\ v_{N-M} & \cdots & v_{N-1} & \cdots & f_{v_{N-1}} \end{bmatrix}, \quad N_{out\_F} = \begin{bmatrix} v_{M+1} \\ v_{M+2} \\ \vdots \\ v_{N-1} \\ v_N \end{bmatrix} \tag{8}$$
(3) Set up the parameters of the model structure and train the three NN-based models using the above training data set, including the BPNN-, LSTM-, and GPT-based models.
(4) Import the first sequence of the test data sets into the trained model and forecast the 1st second velocity in the future. To obtain a multi-step output, the predictive velocity is filled into the next sliding window of historical velocity, and the sliding window is shifted to the right by one step. In the meantime, the feature parameters of the current sliding window are extracted and combined with the velocity of the current sliding window as the input to predict the 2nd second velocity in the future. As illustrated in Figure 6, a future multi-step output can be attained through these rolling prediction operations.
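A minimal sketch of the sample construction in steps (1) and (2) and the rolling multi-step prediction in step (4) is shown below. The inline feature definitions are simplified placeholders (the exact definitions follow Table 2), and the model is assumed to expose an sklearn-style predict method.

```python
import numpy as np

def window_features(window: np.ndarray):
    """Simplified placeholder versions of the five selected feature parameters."""
    acc = np.diff(window)
    return [window.mean(),             # v_mr: average driving velocity
            np.max(window[1:] * acc),  # va_max: max of velocity times acceleration
            acc.min(),                 # a_min: minimum acceleration
            acc[-1],                   # a: current acceleration
            window.var()]              # f_v: velocity variance

def build_samples(v: np.ndarray, M: int):
    """Build training pairs (N_in, N_out) per Equations (6)-(8)."""
    X, y = [], []
    for t in range(M, len(v)):
        window = v[t - M:t]
        X.append(np.concatenate([window, window_features(window)]))
        y.append(v[t])  # next-second ground truth
    return np.array(X), np.array(y)

def rolling_predict(model, window: np.ndarray, horizon: int):
    """Multi-step prediction: feed each one-step output back into the window."""
    window = window.copy()
    preds = []
    for _ in range(horizon):
        x = np.concatenate([window, window_features(window)])[None, :]
        v_next = float(model.predict(x)[0])
        preds.append(v_next)
        window = np.append(window[1:], v_next)  # shift the window right one step
    return preds
```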

3.2. The BPNN Model

Combining the multi-layer feedforward structure with an error backpropagation algorithm, the BPNN model is composed of an input layer, one or more hidden layers, and an output layer. The input layer receives external inputs, the hidden layer utilizes an activation function to accomplish the non-linear mapping of information, and the output layer converts the hidden layer’s output into a specific form of output data. The BPNN model, with a hyperbolic tangent S-function, is depicted in Figure 2a, and its activation function is defined as shown in Equation (9):
$$a^1 = \mathrm{tansig}(n) = \frac{e^n - e^{-n}}{e^n + e^{-n}} \tag{9}$$
$$n = W a^0 + b \tag{10}$$
where $a^1$ and $a^0$ are the neuronal outputs of the current and previous layers, respectively; $n$ is the accumulated weighted input; $W$ is the weight value; and $b$ is the bias value.
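For illustration, Equations (9) and (10) amount to the following forward pass for a single hidden layer; the weight and bias arrays are hypothetical and would be obtained through backpropagation training.

```python
import numpy as np

def tansig(n):
    """Hyperbolic tangent S-function, Equation (9); equivalent to np.tanh(n)."""
    return (np.exp(n) - np.exp(-n)) / (np.exp(n) + np.exp(-n))

def bpnn_forward(x, W1, b1, W2, b2):
    """Equation (10) followed by Equation (9), then a linear output layer."""
    hidden = tansig(W1 @ x + b1)  # non-linear mapping in the hidden layer
    return W2 @ hidden + b2       # output layer converts to the predicted value
```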

3.3. The LSTM Model

As an enhancement of the RNN configuration, an LSTM has a sophisticated information transmission framework. Specifically, an LSTM contains three gate units, namely an input gate $i_t$, a forget gate $f_t$, and an output gate $o_t$, as well as a memory unit $C_t$ at the structure's core. Information on past states is stored in the memory unit, while the input, output, and forgetting of information are managed by the gate units.
A typical LSTM network structure is shown in Figure 2b, and the three gate units are explained below.
(1) The forget gate determines how much information from the previous cell state can be transferred to the current moment, as shown in Equation (11):
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{11}$$
where $W_f$ is the forget gate's weight matrix; $h_{t-1}$ is the last cell's output; $x_t$ is the current moment's input; $b_f$ denotes the bias vector; and $\sigma$ is the sigmoid activation function.
(2) The input gate decides how much of the newly generated information from the current input can be stored in the cell state, and its two factors $i_t$ and $\tilde{C}_t$ are calculated according to Equations (12) and (13):
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{12}$$
$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \tag{13}$$
where $W_i$ and $W_c$ are the input gate's weight matrices, and $b_i$ and $b_c$ are the bias vectors. The formula for generating the updated cell state information $C_t$ can be written as the following:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{14}$$
(3) The output gate outputs the current information based on the updated cell state information, as shown in Equation (15):
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{15}$$
where $W_o$ is the output gate's weight matrix and $b_o$ is the bias vector. The output of the hidden layer at the current moment, $h_t$, is calculated according to Equation (16):
$$h_t = o_t \odot \tanh(C_t) \tag{16}$$
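Written out directly, Equations (11)-(16) define one LSTM time step as follows; this is an illustrative sketch with hypothetical per-gate weight matrices and biases, not the trained implementation used later.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, C_prev, W, b):
    """One LSTM step; W and b are dicts of per-gate weight matrices and biases."""
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    f_t = sigmoid(W["f"] @ z + b["f"])      # forget gate, Equation (11)
    i_t = sigmoid(W["i"] @ z + b["i"])      # input gate, Equation (12)
    C_tilde = np.tanh(W["c"] @ z + b["c"])  # candidate state, Equation (13)
    C_t = f_t * C_prev + i_t * C_tilde      # cell state update, Equation (14)
    o_t = sigmoid(W["o"] @ z + b["o"])      # output gate, Equation (15)
    h_t = o_t * np.tanh(C_t)                # hidden output, Equation (16)
    return h_t, C_t
```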

3.4. The GPT Model

The GPT model is made up of multi-layer unidirectional transformer decoder elements. As shown in Figure 7, each layer of the decoder is primarily composed of a feedforward NN module and a masked multi-head self-attention module.
The attention score for every input vector is calculated by the self-attention mechanism using scaled dot-product attention [34,35]. First, for every input vector, the query, key, and value matrices are created. Next, the dot products between the query and key matrices are computed, scaled, and passed through the softmax function to obtain the attention scores. Finally, the value matrix is weighted by the attention scores and summed, as described in Equation (17):
$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^T}{\sqrt{d_k}}\right) V \tag{17}$$
$$Q = W^q I \tag{18}$$
$$K = W^k I \tag{19}$$
$$V = W^v I \tag{20}$$
where $Q$ is the query matrix; $K$ is the key matrix; $V$ is the value matrix; $I$ is the input matrix; $W^q$, $W^k$, and $W^v$ are the corresponding weight matrices; and $d_k$ is the dimension of $Q$, $K$, and $V$.
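A plain numpy sketch of Equation (17) is given below; the optional mask argument anticipates the causal masking of the GPT decoder discussed shortly.

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Equation (17): softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    if mask is not None:
        scores = scores + mask  # -inf entries zero out after the softmax
    # numerically stable row-wise softmax
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V
```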
The multi-head attention module splits the single-head attention input matrix equally, and then each scaled dot product attention head focuses separately on information from different representation subspaces at different positions, as illustrated in Equation (21):
$$\mathrm{MultiHead}(Q, K, V) = \mathrm{Concat}(\mathrm{head}_1, \ldots, \mathrm{head}_h) W^O \tag{21}$$
$$\mathrm{head}_i = \mathrm{Attention}(Q W_i^Q, K W_i^K, V W_i^V) \tag{22}$$
where $W_i^Q$, $W_i^K$, and $W_i^V$ are the weight matrices of the $i$th attention head; $W^O$ is the multi-head attention weight matrix; $h$ is the number of attention heads; and the Concat function splices together the output values calculated by each attention head.
The GPT model takes a given time series as the input and, for each timestamp, generates a new prediction for the following timestamp. The generated prediction sequence is then compared with the corresponding true sequence to calculate the training loss, as demonstrated in Figure 8. To accomplish this, a mask must be employed to ensure that, at each step, the model only has access to the tokens preceding the current position in the sequence. Before the softmax function is applied, an extra matrix is added to prevent the model from cheating by looking ahead; this matrix has a value of negative infinity in the upper triangle and a value of 0 on the diagonal and in the lower triangle.
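The masking matrix described above can be built in one line and passed to the attention sketch from earlier:

```python
import numpy as np

def causal_mask(seq_len: int):
    """Negative infinity above the diagonal; 0 on and below it (added pre-softmax)."""
    return np.triu(np.full((seq_len, seq_len), -np.inf), k=1)

# e.g., scaled_dot_product_attention(Q, K, V, mask=causal_mask(Q.shape[0]))
# ensures each position attends only to itself and earlier positions.
```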
To prevent vanishing gradients, a residual connection is inserted around each decoder sublayer. Layer normalization is also utilized to accelerate network convergence, as depicted in Equation (23):
$$o = \mathrm{LayerNorm}(x + \mathrm{Sublayer}(x)) \tag{23}$$
where Sublayer denotes the function implemented inside each sublayer, such as the fully connected feedforward NN processing function, while LayerNorm is the layer normalization processing function.
Scheduled sampling is employed to train the GPT model. Unlike the traditional training method of teacher forcing, scheduled sampling selects the ground truth information with a higher probability as the model inputs in the early stage of model training, and it gradually employs the predicted outputs as the model inputs in the later stage of model training to avoid the problem of exposure bias [36,37]. The sampling rate is determined by using a probability decay function, and the general probability decay functions include linear decay, exponential decay, and inverse sigmoid decay, as illustrated in Figure 9. The inverse sigmoid decay presented in Equation (24) was chosen for this study.
$$P_i = \frac{k}{k + e^{i/k}} \tag{24}$$
where $k$ is used to fine-tune the rate of decay and $i$ is the number of training epochs.
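As a small illustration, Equation (24) and its use in scheduled sampling might look as follows; the decay constant k = 10 is an arbitrary example value.

```python
import numpy as np

def inverse_sigmoid_decay(i, k=10.0):
    """Equation (24): probability of feeding the ground truth at epoch i."""
    return k / (k + np.exp(i / k))

# During training (sketch): with probability P_i use the ground-truth token as
# the next decoder input, otherwise feed back the model's own prediction.
# use_truth = np.random.rand() < inverse_sigmoid_decay(epoch)
```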

4. Experimental Setup and Performance Metrics

4.1. Experimental Setup

Various data sets representing the typical standard driving cycle conditions of heavy vehicles were chosen as the data sets for this study. As shown in Figure 10, the training data set consisted of the following driving cycles: MANHATTAN_CYC, CHTC_HT, CHTC_B, CHTC_C, CHTC_TT, WVUSUB_CYC, WVUINTER_CYC, UDDSHDV_CYC, and HWFET_CYC. NYCTRUCK_CYC, NYCBUS_CYC, and C_WTVC were employed as three test data sets. Moreover, 70% of the data were included in the training set, while 30% of the data were included in the test sets.
As indicated in Table 5, with the consideration of the driving feature parameters, sliding window, and prediction horizon, a series of experiments were designed to explore the impact of various model inputs on the prediction performance of the NN-based VVP models.
The parameters of the trained models should be configured properly to mitigate overfitting and underfitting of the NN-based models. For the hidden layers of the BPNN and LSTM models, a three-layer network structure is sufficient for fitting the non-linear curves of simple predictive regression projects; therefore, the number of hidden layers of the BPNN and LSTM models was set to 1. In regard to the number of hidden units, it was initially determined for the BPNN model by using the empirical formula depicted in Equation (25), whereas it was set to a power of two for the LSTM model. Eventually, the critical parameters of the three VVP models were set as shown in Table 6 and Table 7.
$$n = \sqrt{i + o} + a \tag{25}$$
where $n$ is the number of hidden units, $i$ is the number of input neurons, $o$ is the number of output neurons, and $a$ is an adjustable parameter between 1 and 10.
Furthermore, the BPNN model was trained with the trainlm function, which obtains the current values of the weights and biases from the neural network and minimizes the mean square error (MSE) through the Levenberg–Marquardt algorithm to update the weight and bias values. In addition, the LSTM model employed the adaptive moment estimation (Adam) optimization function to update the weight and bias values of the network adaptively, while the GPT model utilized the stochastic gradient descent (SGD) algorithm to optimize the parameters of the network structure.

4.2. Performance Metrics

VVP models are typically tested in terms of their prediction accuracy and computation time. In this study, the MAE and R2 were employed to quantify the velocity prediction accuracy, and their expressions are shown in Equations (26) and (27):
$$MAE = \frac{1}{n} \sum_{i=1}^{n} \left| \hat{y}_i - y_i \right| \tag{26}$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (\hat{y}_i - y_i)^2}{\sum_{i=1}^{n} (\bar{y} - y_i)^2} \tag{27}$$
where $\hat{y}_i$ and $y_i$ are, respectively, the predicted velocity and the actual velocity at the $i$th second; $\bar{y}$ represents the average value of the actual velocity; and $n$ is the total number of velocity points.
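Both metrics translate directly into code; this sketch assumes equal-length one-dimensional arrays of predicted and actual velocities.

```python
import numpy as np

def mae(y_pred, y_true):
    """Equation (26): mean absolute error."""
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    return np.mean(np.abs(y_pred - y_true))

def r_squared(y_pred, y_true):
    """Equation (27): goodness of fit."""
    y_pred, y_true = np.asarray(y_pred), np.asarray(y_true)
    ss_res = np.sum((y_pred - y_true) ** 2)
    ss_tot = np.sum((y_true.mean() - y_true) ** 2)
    return 1.0 - ss_res / ss_tot
```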
The real-time performance of a prediction method is commonly evaluated based on its computation time. For NN-based VVP models, the computation time includes the inference time and the training time. The inference time refers to the execution time of a single-step prediction, while the training time reflects the complexity of an NN-based model. Moreover, the computation time of NN-based models is greatly affected by the computer hardware. As a result, every simulation group was run on the same hardware: a Lenovo laptop with an Intel(R) Core(TM) i7-7700HQ CPU at 2.8 GHz. On the basis of the above analysis, the model training time $T_{train}$ and the model inference time $T_{pre}$ were employed to measure the computational efficiency of the VVP models.

5. Simulation Results and Discussion

In this study, the above proposed performance metrics were employed to evaluate the performance of the three VVP models, and their prediction accuracy was analyzed in terms of three aspects: the driving feature parameters, the sliding window, and the prediction horizon. Since a BPNN model tends to fall into local optima easily, which leads to substantial variation in the results and hinders comparison, each group of experiments for the BPNN model was run five times in this study, and the results were averaged to obtain the final results.

5.1. Analysis of Prediction Accuracy

(1)
Driving feature parameters
Firstly, the effects of the driving feature parameters as model inputs were analyzed by comparing the simulation results of groups A1 and A2, B1 and B2, and C1 and C2, and the results are shown in Table 8. It is clear that, on the one hand, the BPNN, LSTM, and GPT models all showed an improvement in prediction accuracy after adding the driving feature parameters as model inputs, with the LSTM model showing the greatest improvement and the GPT model the smallest. On the other hand, in the simulation groups where the feature parameters were not chosen as model inputs, the BPNN model predicted better than the LSTM and GPT models.
The quantitative impact of the feature parameters on the performance of the three VVP models for groups A1 and A2 is charted in Figure 11. The MAE values for the 1 s, 5 s, and 10 s prediction horizons in group A2 were 5.4%, 17.5%, and 50.9% lower than the corresponding MAE values in group A1 for the BPNN model, respectively; 43.2%, 56.3%, and 71.2% lower for the LSTM model; and 40.0%, 20.6%, and 20.9% lower for the GPT model. When the model input contained only the historical velocity, as demonstrated in Figure 12a, the MAE values for the 1 s, 5 s, and 10 s prediction horizons for the BPNN model in group A1 were 54.3%, 40.4%, and 34.2% lower than the corresponding MAE values for the LSTM model, respectively, and 32.7%, 42.4%, and 42.4% lower than those for the GPT model. When the feature parameters were added to the model inputs, as shown in Figure 12b, the MAE values for the 5 s and 10 s prediction horizons for the LSTM model in group B2 were lower than the corresponding MAE values for the BPNN model by 19.0% and 10.9%, respectively, and lower than those for the GPT model by 50.0% and 70.3%.
Furthermore, it can be seen from the comparison of groups C2–C5 and D1 in Table 9 that the number of driving feature parameters also had an effect on the prediction accuracy of the VVP models. As depicted in Figure 13, except for the 1 s prediction horizon, the MAE of the BPNN, LSTM, and GPT models first decreased and then increased as the number of driving feature parameters decreased, and the MAE was smallest under the model inputs of group C3. This phenomenon indicates that the prediction accuracy of the BPNN, LSTM, and GPT models can only be greatly improved when the model inputs contain the acceleration parameter $a$. Meanwhile, the comparison of the MAE in groups C2 and C3 demonstrated that the addition of the velocity variance $f_v$ did not further improve the prediction performance. In addition, the prediction performance of the BPNN and LSTM models was more sensitive to the acceleration parameter $a$ than that of the GPT model.
Additionally, the influence of the type of driving feature parameters on prediction accuracy was also revealed through a comparison of groups D1–D5 and C1 using the three test data sets, and the results are shown in Table 10. The MAE was minimal in group D4 for all three VVP models. These comparison results imply that the acceleration parameter $a$ was the key feature parameter for improving the prediction accuracy of the VVP models. Compared to the results of group C1, as displayed in Figure 14, the MAE for the 10 s prediction horizon in group D4 decreased by 56.1% for the BPNN model, by 74.3% for the LSTM model, and by 18.6% for the GPT model when tested on the C_WTVC data set. Meanwhile, the impact of the driving feature parameters $v_{mr}$, $va_{\max}$, $a_{\min}$, and $f_v$ on the prediction performance of the three VVP models was not the same when tested on different test data sets. For example, taking the results for group C1 as the benchmark, it was observed that the driving feature parameter $v_{mr}$ improved the prediction accuracy of the LSTM and GPT models when tested on the C_WTVC data set, but reduced the prediction accuracy of the BPNN model. Regarding prediction performance on the three test data sets, the BPNN and LSTM models performed better when tested on the C_WTVC data set than on the NYCBUS_CYC and NYCTRUCK_CYC data sets under the same model inputs.
The above analysis shows that the acceleration parameter $a$ improved the prediction performance the most. However, the ranking of feature importance obtained using the RF method was, in order, $v_{mr}$, $v_{me}$, $va_{\max}$, $a_{\min}$, and $a$, which indicates that $v_{mr}$ should be the most significant feature parameter. In view of this difference, the variability in data distribution between the test data sets and the training data set should be considered one of the important factors affecting the prediction performance of a model. Kernel density estimation (KDE) is a non-parametric estimation method that does not require any prior knowledge and fits the distribution based on the characteristics of the data themselves; hence, it was adopted in this study to analyze the distribution of the feature parameters. The KDE can be easily visualized through a plot of the kernel probability density function to study the distributional information of the data. The relevant formulae are shown in Equations (28)–(30):
$$\hat{f}_h(x) = \frac{1}{n} \sum_{i=1}^{n} K_h(x - x_i) = \frac{1}{nh} \sum_{i=1}^{n} K\!\left(\frac{x - x_i}{h}\right) \tag{28}$$
$$K_h(x) = \frac{1}{h} K\!\left(\frac{x}{h}\right) \tag{29}$$
$$K(x) = \frac{1}{\sqrt{2\pi}} e^{-\frac{x^2}{2}} \tag{30}$$
where $K_h(x)$ is the scaled kernel function; $K(x)$ is the Gaussian kernel, which satisfies the probability density function property; $x_i$ denotes the independent and identically distributed sample points; $n$ is the number of sample points; and $h$ is the bandwidth.
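As a sketch of this distribution comparison, scipy's gaussian_kde implements a Gaussian-kernel estimate in the sense of Equations (28)–(30) with an automatically selected bandwidth; the array names below are hypothetical.

```python
import numpy as np
from scipy.stats import gaussian_kde

def kde_curve(samples, grid):
    """Evaluate the Gaussian-kernel density estimate of samples on a grid."""
    return gaussian_kde(np.asarray(samples))(np.asarray(grid))

# Hypothetical usage: overlay the KDE of a feature (e.g., the acceleration a)
# for the training set and a test set to compare their distributions:
# grid = np.linspace(a_all.min(), a_all.max(), 200)
# train_pdf, test_pdf = kde_curve(a_train, grid), kde_curve(a_test, grid)
```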
Figure 15a shows that the test data set C_WTVC and the training data set exhibit the smallest difference in the distribution of historical velocity, while the test data set NYCBUS_CYC shows the most significant difference. Regarding the distribution of the driving feature parameters, Figure 15b–f demonstrate that the distribution difference between the test data set C_WTVC and the training data set was minimal. The training data set and the test data sets NYCBUS_CYC and NYCTRUCK_CYC show relatively significant distribution differences in the feature parameters $v_{mr}$, $va_{\max}$, and $a_{\min}$, while all three test data sets and the training data set show minor distribution differences in the feature parameter $a$.
(2)
Sliding window length
The sliding window length is associated with the dimensionality of the model inputs, which affects the complexity of the model structure. Comparing the simulations of groups A1, B1, C1, E, and F, the velocity prediction results for different sliding window lengths within the 10 s prediction horizon when tested on the three test data sets are presented in Table 11. The MAE initially declined and then increased with an increase in the sliding window length. As illustrated in the shaded portion of Figure 16, the optimal sliding window length for the BPNN model was 15 s for the NYCBUS_CYC test set and 10 s for the NYCTRUCK_CYC and C_WTVC test sets; the optimal sliding window length for the LSTM model was 20 s for the NYCBUS_CYC test set and 15 s for the NYCTRUCK_CYC and C_WTVC test sets; and the optimal sliding window length for the GPT model was 15 s for the NYCTRUCK_CYC test set and 20 s for the NYCBUS_CYC and C_WTVC test sets.
In general, a short sliding window length makes the model layer structure simpler, while a long sliding window length contains more historical information and inadvertently contributes to an increase in the structural complexity of the model. Therefore, it is crucial to balance the influencing elements of the sequence information within the sliding window and the corresponding complexity of the NN model when choosing an optimal sliding window to obtain the best prediction results.
(3)
Prediction horizon
Table 12 displays the results obtained with different prediction horizons under the model inputs of group C2 for the three VVP models, which illustrates that the MAE of the VVP models increased rapidly with an increase in prediction horizon and that the LSTM model had the best performance in each prediction horizon except 1 s. Specifically, as shown in Figure 17, Figure 18 and Figure 19, the BPNN model did not effectively predict subsequent velocity changes during stages of rapid velocity change over a long-term prediction horizon, while the predicted velocity of the LSTM model fit the target velocity curve better when the target velocity changed sharply. The GPT model performed relatively well only within the 1 s and 5 s prediction horizons, and its R2 over the long-term prediction horizon was the smallest, which shows that the GPT model exhibited the worst curve-fitting performance over long-term prediction horizons.

5.2. Analysis of Computation Time

The above section mainly evaluated the prediction accuracy of the three VVP models under different model inputs from the perspective of the MAE. To assess their suitability for real-time applications, the computational efficiencies of the three VVP models were also compared. Table 13 displays the training time and inference time under the model inputs of group D1. It can be seen from Table 13 that the BPNN model had the shortest training time due to its simple network structure, followed by the relatively complex LSTM model, and the GPT model had the longest training time. Nevertheless, in terms of inference time, the single-step prediction time of the GPT model had the smallest average value. In short, for a complex NN-based velocity prediction model, online training cannot be achieved under the conditions of an existing vehicle-embedded system in complex and constantly updated driving scenarios. However, this difficulty may be solved in the near future, as the computing power of embedded systems is steadily improving.

6. Conclusions

In this study, a comparative analysis of NN-based VVP methods was conducted qualitatively based on theory and quantitatively based on simulations. As three representative NN-based models, BPNN, LSTM, and GPT models were constructed for VVP after extracting the driving feature parameters from historical vehicle velocity data using the PCC and RF methods. The effects of the model inputs, including the feature parameters, sliding window length, and prediction horizon, on the prediction performance of the three VVP models were analyzed through multiple simulation experiments. The main conclusions are summarized as follows: (1) Model inputs should match the model structure; the BPNN model (with the simplest model structure) performs better when the model input is a single historical vehicle velocity parameter, while the LSTM model performs better when the model input contains driving feature parameters. (2) Prediction accuracy declines with an increase in the prediction horizon. The BPNN model achieved the most accurate prediction in the 1 s prediction horizon, while the LSTM model presented the best prediction accuracy in both the 5 s and 10 s prediction horizons with the addition of feature parameters to the model inputs. The GPT model made accurate predictions in the 1 s prediction horizon with different model inputs, but performed poorly over a long-term prediction horizon. (3) The acceleration parameter $a$ was the most crucial feature parameter for enhancing the model prediction accuracy, while the number and type of feature parameters, as well as the distribution of the feature parameters between the training and test data sets, had a significant impact on VVP model performance. (4) In terms of the sliding window length, the three VVP models achieved a relatively higher prediction accuracy when the sliding window was between 15 and 20 s.
In future works, some potential improvements can be made in the following aspects: (1) the parameters and structures of NN-based VVP models can be identified by employing optimization algorithms to improve the computational efficiency of these prediction models; (2) to further improve the prediction accuracy, the model input used for VVP should not only be limited to historical vehicle velocity information, but should also contain other historical vehicle state information or external ITS information; (3) the appropriate number of training samples should be identified, in order to balance prediction accuracy with computation time; and (4) the prediction performance of VVP models should be further validated through a combination with a PEMS.

Author Contributions

P.Z.: Conceptualization, supervision, funding acquisition, writing—original draft, and writing—review and editing. W.L.: Formal analysis, software, investigation, validation, writing—original draft, and writing—review and editing. C.D.: Methodology, writing—review and editing, and supervision. J.H.: Writing—review and editing, supervision, and funding acquisition. F.Y.: Writing—review and editing and supervision. All authors have read and agreed to the published version of the manuscript.

Funding

This study was supported by the National Natural Science Foundation of China (52305069), Key R&D Project of Hubei Province (2022BAA076), and Independent Innovation Projects of the Hubei Longzhong Laboratory (2022ZZ-21).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data presented in this study are available if requested from the corresponding author. The data are not publicly available due to privacy.

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

References

  1. Wasserburger, A.; Schirrer, A.; Didcock, N.; Hametner, C. A probability-based short-term velocity prediction method for energy-efficient cruise control. IEEE Trans. Veh. Technol. 2020, 69, 14424–14435. [Google Scholar] [CrossRef]
  2. Liu, K.; Asher, Z.; Gong, X.; Huang, M.; Kolmanovsky, I. Vehicle Velocity Prediction and Energy Management Strategy Part 1: Deterministic and Stochastic Vehicle Velocity Prediction using Machine Learning; 0148-7191; SAE Technical Paper; SAE International: Warrendale, PA, USA, 2019. [Google Scholar] [CrossRef]
  3. Shin, J.; Sunwoo, M. Vehicle speed prediction using a Markov Chain with speed constraints. IEEE Trans. Intell. Transp. Syst. 2019, 20, 3201–3211. [Google Scholar] [CrossRef]
  4. Chao, S.; Xiaosong, H.; Moura, S.J.; Fengchun, S. Velocity predictors for predictive energy management in hybrid electric vehicles. IEEE Trans. Control. Syst. Technol. 2015, 23, 1197–1204. [Google Scholar] [CrossRef]
  5. Liu, H.; Li, X.; Wang, W.; Han, L.; Xiang, C. Markov velocity predictor and radial basis function neural network based real-time energy management strategy for plug-in hybrid electric vehicles. Energy 2018, 152, 427–444. [Google Scholar] [CrossRef]
  6. Lefevre, S.; Sun, C.; Bajcsy, R.; Laugier, C. Comparison of parametric and non-parametric approaches for vehicle speed prediction. In Proceedings of the 2014 American Control Conference, Portland, OR, USA, 4–6 June 2014; pp. 3494–3499. [Google Scholar] [CrossRef]
  7. Jing, J.; Filev, D.; Kurt, A.; Ozatay, E.; Michelini, J.; Ozguner, U. Vehicle speed prediction using a cooperative method of fuzzy Markov model and auto-regressive model. In Proceedings of the 2017 IEEE Intelligent Vehicles Symposium (IV), Los Angeles, CA, USA, 11–14 June 2017; pp. 881–886. [Google Scholar] [CrossRef]
  8. Rosolia, U.; Zhang, X.; Borrelli, F. Data-driven predictive control for autonomous systems. Annu. Rev. Control. Robot. Auton. Syst. 2018, 1, 259–286. [Google Scholar] [CrossRef]
  9. Lin, X.; Wang, Z.; Wu, J. Energy management strategy based on velocity prediction using back propagation neural network for a plug-in fuel cell electric vehicle. Int. J. Energy Res. 2020, 45, 2629–2643. [Google Scholar] [CrossRef]
  10. Xiang, C.; Ding, F.; Wang, W.; He, W. Energy management of a dual-mode power-split hybrid electric vehicle based on velocity prediction and nonlinear model predictive control. Appl. Energy 2017, 189, 640–653. [Google Scholar] [CrossRef]
  11. Wang, W.; Guo, X.; Yang, C.; Zhang, Y.; Zhao, Y.; Huang, D.; Xiang, C. A multi-objective optimization energy management strategy for power split HEV based on velocity prediction. Energy 2022, 238, 121714. [Google Scholar] [CrossRef]
  12. Zhang, Y.; Gao, M.; Hua, G.; Xie, Q.; Guo, Y.; Zheng, R. Multisource fusion of exogenous inputs based NARXs neural network for vehicle speed prediction between urban road intersections. Proc. Inst. Mech. Eng. Part D J. Automob. Eng. 2023, 09544070231186186. [Google Scholar] [CrossRef]
  13. Che, Z.; Purushotham, S.; Cho, K.; Sontag, D.; Liu, Y. Recurrent neural networks for multivariate time series with missing values. Sci. Rep. 2018, 8, 6085. [Google Scholar] [CrossRef]
  14. Sherstinsky, A. Fundamentals of recurrent neural network (RNN) and long short-term memory (LSTM) network. Phys. D Nonlinear Phenom. 2020, 404, 132306. [Google Scholar] [CrossRef]
  15. Du, Y.; Cui, N.; Li, H.; Nie, H.; Shi, Y.; Wang, M.; Li, T. The vehicle’s velocity prediction methods based on RNN and LSTM neural network. In Proceedings of the 2020 Chinese Control and Decision Conference (CCDC), Hefei, China, 22–24 August 2020; pp. 99–102. [Google Scholar] [CrossRef]
  16. Wu, Y.; Huang, Z.; Zheng, Y.; Liu, Y.; Li, H.; Che, Y.; Peng, J.; Teodorescu, R. Spatial–temporal data-driven full driving cycle prediction for optimal energy management of battery/supercapacitor electric vehicles. Energy Convers. Manag. 2023, 277, 116619. [Google Scholar] [CrossRef]
  17. Shin, J.; Yeon, K.; Kim, S.; Sunwoo, M.; Han, M. Comparative study of Markov chain with recurrent neural network for short term velocity prediction implemented on an embedded system. IEEE Access 2021, 9, 24755–24767. [Google Scholar] [CrossRef]
  18. Xu, M.; Lin, H.; Liu, Y. A deep learning approach for vehicle velocity prediction considering the influence factors of multiple lanes. Electron. Res. Arch. 2023, 31, 401–420. [Google Scholar] [CrossRef]
  19. Shen, H.; Wang, Z.; Zhou, X.; Lamantia, M.; Yang, K.; Chen, P.; Wang, J. Electric vehicle velocity and energy consumption predictions using transformer and Markov-chain Monte carlo. IEEE Trans. Transp. Electrif. 2022, 8, 3836–3847. [Google Scholar] [CrossRef]
  20. Liu, J.; Chen, Y.; Zhan, J.; Shang, F. An on-line energy management strategy based on trip condition prediction for commuter plug-in hybrid electric vehicles. IEEE Trans. Veh. Technol. 2018, 67, 3767–3781. [Google Scholar] [CrossRef]
  21. Hou, J.; Yao, D.; Wu, F.; Shen, J.; Chao, X. Online vehicle velocity prediction using an adaptive radial basis function neural network. IEEE Trans. Veh. Technol. 2021, 70, 3113–3122. [Google Scholar] [CrossRef]
  22. Redhu, P.; Kumar, K. Short-term traffic flow prediction based on optimized deep learning neural network: PSO-Bi-LSTM. Phys. A Stat. Mech. Its Appl. 2023, 625, 129001. [Google Scholar] [CrossRef]
  23. Zhang, L.; Liu, W.; Qi, B. Energy optimization of multi-mode coupling drive plug-in hybrid electric vehicles based on speed prediction. Energy 2020, 206, 118126. [Google Scholar] [CrossRef]
  24. Shen, P.; Zhao, Z.; Zhan, X.; Li, J.; Guo, Q. Optimal energy management strategy for a plug-in hybrid electric commercial vehicle based on velocity prediction. Energy 2018, 155, 838–852. [Google Scholar] [CrossRef]
  25. Upadhyaya, A.; Mahanta, C. Improving velocity prediction in electric vehicles using hybrid artificial neural network (ANN). In Proceedings of the 2022 IEEE 10th Conference on Systems, Process & Control (ICSPC), Malacca, Malaysia, 17 December 2022; pp. 94–99. [Google Scholar] [CrossRef]
  26. Yufang, L.; Mingnuo, C.; Wanzhong, Z. Investigating long-term vehicle speed prediction based on BP-LSTM algorithms. IET Intell. Transp. Syst. 2019, 13, 1281–1290. [Google Scholar] [CrossRef]
  27. Jiao, X.; Wang, Z.; Zhang, Z. Vehicle Speed Prediction Using a Combined Neural Network of Convolution and Gated Recurrent Unit with Attention. Res. Sq. 2022. [Google Scholar] [CrossRef]
  28. Cao, M.; Li, V.O.; Chan, V.W. A CNN-LSTM model for traffic speed prediction. In Proceedings of the 2020 IEEE 91st Vehicular Technology Conference (VTC2020-Spring), Antwerp, Belgium, 25–28 May 2020; pp. 1–5. [Google Scholar] [CrossRef]
  29. Yeon, K.; Min, K.; Shin, J.; Sunwoo, M.; Han, M. Ego-vehicle speed prediction using a long short-term memory based recurrent neural network. Int. J. Automot. Technol. 2019, 20, 713–722. [Google Scholar] [CrossRef]
  30. Zhang, F.; Xi, J.; Langari, R. Real-time energy management strategy based on velocity forecasts using V2V and V2I communications. IEEE Trans. Intell. Transp. Syst. 2017, 18, 416–430. [Google Scholar] [CrossRef]
  31. Huang, X.; Tan, Y.; He, X. An intelligent multifeature statistical approach for the discrimination of driving conditions of a hybrid electric vehicle. IEEE Trans. Intell. Transp. Syst. 2011, 12, 453–465. [Google Scholar] [CrossRef]
  32. Montazeri-Gh, M.; Fotouhi, A.; Naderpour, A. Driving patterns clustering based on driving feature analysis. Proc. Inst. Mech. Eng. Part C J. Mech. Eng. Sci. 2011, 225, 1301–1317. [Google Scholar] [CrossRef]
  33. Montazeri-Gh, M.; Fotouhi, A. Traffic condition recognition using the k-means clustering method. Sci. Iran. 2011, 18, 930–937. [Google Scholar] [CrossRef]
  34. Vaswani, A.; Shazeer, N.; Parmar, N.; Uszkoreit, J.; Jones, L.; Gomez, A.N.; Kaiser, Ł.; Polosukhin, I. Attention is all you need. Adv. Neural Inf. Process. Syst. 2017, 30. [Google Scholar] [CrossRef]
  35. Sun, S.; Liu, Y.; Li, Q.; Wang, T.; Chu, F. Short-term multi-step wind power forecasting based on spatio-temporal correlations and transformer neural networks. Energy Convers. Manag. 2023, 283, 116916. [Google Scholar] [CrossRef]
  36. Bengio, S.; Vinyals, O.; Jaitly, N.; Shazeer, N. Scheduled sampling for sequence prediction with recurrent neural networks. Adv. Neural Inf. Process. Syst. 2015, 28. [Google Scholar] [CrossRef]
  37. Mihaylova, T.; Martins, A.F. Scheduled sampling for transformers. arXiv 2019, arXiv:1906.07651. [Google Scholar]
Figure 1. Commonly used NN-based approaches to conducting VVP.
Figure 2. (a) Commonly used feedforward NN architectures and (b) commonly used recurrent NN architectures.
Figure 3. Attention mechanism and transformer.
Figure 4. Matrix of correlation coefficients for each pair of driving feature parameters.
Figure 5. Training step for velocity prediction.
Figure 6. Prediction process of sliding windows.
Figure 7. Structure of the GPT model.
Figure 8. Training loss calculation of the GPT model.
Figure 9. Linear, exponential, and inverse sigmoid decay curves.
Figure 10. Training data set and three test data sets.
Figure 11. Comparison of MAE in groups A1 and A2 for the three VVP models.
Figure 12. (a) Comparison of MAE in group A1 for the three VVP models and (b) comparison of MAE in group B2 for the three VVP models.
Figure 13. Comparison of MAE in the 10 s prediction horizons in groups C2–C5 and D1 when tested on the C_WTVC test set.
Figure 14. Comparison of the MAE within the 10 s prediction horizons for groups C1 and D1–D5 when tested on the C_WTVC test set.
Figure 15. The distributions of driving feature parameters.
Figure 16. Comparison of MAE for different sliding window lengths when tested on three test data sets.
Figure 17. Vehicle velocity prediction curves over different prediction horizons for the BPNN model.
Figure 18. Vehicle velocity prediction curves over different prediction horizons for the LSTM model.
Figure 19. Vehicle velocity prediction curves over different prediction horizons for the GPT model.
Table 1. Comparison of different types of NNs.

NN Type | Advantage | Limitation
BPNN | Simple structure with strong non-linear mapping capability | Long training time; slow convergence; sensitive to initial weights and thresholds; prone to falling into local minima
RBFNN | No local-minima problem and faster convergence than BPNNs | Does not work well when data are insufficient
GRNN | Faster convergence than RBFNNs, with high fault tolerance and robustness | High computational load and high storage space requirements
NARX | Adds delay and feedback mechanisms; suitable for time series problems | Backpropagation-based training is time-consuming
RNN | Can memorize time series information, though only over short spans | Gradient vanishing problem
LSTM | Stronger information memorization than RNNs; mitigates the gradient vanishing problem | Complex structure, large number of parameters, and slow training speed
GRU | Simpler structure and faster training speed than LSTMs | Predictive performance may fall short of LSTMs on complex tasks
Transformer | Strong memorization of long time series, with parallel computing capability | Complex structure; requires positional encoding to represent sequence order; may not perform well on simple prediction tasks
Table 2. Vehicle driving feature parameters.

Feature Parameter | Denotation | Unit | Calculation Equation
Average velocity | $v_{me}$ | km/h | $v_{me} = \frac{1}{n}\sum_{i=1}^{n} v_i$
Average driving velocity | $v_{mr}$ | km/h | $v_{mr} = \frac{1}{k}\sum_{m=1}^{k} v_m$
Average positive acceleration | $a_{me1}$ | m/s² | $a_{me1} = \frac{1}{m}\sum_{i=1}^{m} a_i$
Average negative acceleration | $a_{me2}$ | m/s² | $a_{me2} = \frac{1}{b}\sum_{j=1}^{b} a_j$
Velocity variance | $f_v$ | m²/s² | $f_v = \frac{1}{n}\sum_{i=1}^{n}(v_i - v_{me})^2$
Acceleration variance | $f_a$ | m²/s⁴ | $f_a = \frac{1}{n}\sum_{i=1}^{n}(a_i - a_{me})^2$
Variance in velocity times acceleration | $f_{va}$ | m⁴/s⁶ | $f_{va} = \frac{1}{n}\sum_{i=1}^{n}(va_i - va_{me})^2$
Acceleration time ratio | $P_a$ | % | $P_a = \frac{t_a}{T} \times 100$
Deceleration time ratio | $P_d$ | % | $P_d = \frac{t_d}{T} \times 100$
Uniform time ratio | $P_c$ | % | $P_c = \frac{t_c}{T} \times 100$
Idling time ratio | $P_i$ | % | $P_i = \frac{t_i}{T} \times 100$
Maximum acceleration | $a_{\max}$ | m/s² | $a_{\max} = \max\{a_1, a_2, \ldots, a_T\}$
Minimum acceleration | $a_{\min}$ | m/s² | $a_{\min} = \min\{a_1, a_2, \ldots, a_T\}$
Maximum value of velocity times acceleration | $va_{\max}$ | m²/s³ | $va_{\max} = \max\{va_1, va_2, \ldots, va_T\}$
Minimum value of velocity times acceleration | $va_{\min}$ | m²/s³ | $va_{\min} = \min\{va_1, va_2, \ldots, va_T\}$
Velocity first-order difference (acceleration) | $a$ | m/s² | $a_i = v_i - v_{i-1},\ a_0 = 0$
Velocity second-order difference | $\delta a$ | m/s³ | $\delta a_i = a_i - a_{i-1},\ \delta a_0 = 0$
Table 3. The major parameters of the feature parameter equations.

Symbol | Name
$n$, $T$ | Sampling time length
$k$ | Time length of non-zero velocity within the sampling period
$m$ | Time length of positive acceleration within the sampling period
$b$ | Time length of negative acceleration within the sampling period
$t_a$ | Acceleration time length within the sampling period
$t_d$ | Deceleration time length within the sampling period
$t_c$ | Time length of uniform velocity within the sampling period
$t_i$ | Time length of idling within the sampling period
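Because the Table 2 definitions are terse, a short numerical sketch may help. The following is a minimal illustration of computing a subset of the feature parameters over one sampling window, assuming a 1 Hz velocity trace in km/h; the function and variable names (driving_features, v_kmh, and so on) are illustrative, not taken from the paper's code.

```python
import numpy as np

def driving_features(v_kmh: np.ndarray) -> dict:
    """Selected Table 2 feature parameters computed over one sampling window."""
    v_ms = v_kmh / 3.6                           # km/h -> m/s
    a = np.diff(v_ms, prepend=v_ms[0])           # first-order difference, a_0 = 0
    va = v_ms * a                                # velocity times acceleration, m^2/s^3
    driving = v_kmh > 0                          # non-zero-velocity samples
    return {
        "v_me": v_kmh.mean(),                                      # average velocity (km/h)
        "v_mr": v_kmh[driving].mean() if driving.any() else 0.0,   # average driving velocity
        "a_me2": a[a < 0].mean() if (a < 0).any() else 0.0,        # average negative acceleration
        "f_v": v_ms.var(),                                         # velocity variance (m^2/s^2)
        "va_max": va.max(),                                        # max of velocity times acceleration
        "a_min": a.min(),                                          # minimum acceleration
        "P_d": 100.0 * (a < 0).sum() / len(v_kmh),                 # deceleration time ratio (%)
    }

window = np.array([0.0, 10.0, 20.0, 25.0, 25.0, 20.0])  # toy 6 s velocity window in km/h
print(driving_features(window))
```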
Table 4. Importance scores of the first eight feature parameters.

Feature parameter | $v_{mr}$ | $v_{me}$ | $va_{\max}$ | $a_{\min}$ | $a$ | $f_v$ | $a_{me2}$ | $P_d$
Importance score | 0.5312 | 0.4184 | 0.0199 | 0.0131 | 0.0127 | 0.0013 | 0.0012 | 0.0009
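The Table 4 scores come from the random forest importance ranking. A hedged sketch of how such a ranking is typically produced, here with scikit-learn's RandomForestRegressor on placeholder data (the shapes, seeds, and targets are stand-ins, not the paper's training windows):

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 17))   # placeholder: 17 Table 2 parameters per window
y = rng.normal(size=1000)         # placeholder: future-velocity prediction target

rf = RandomForestRegressor(n_estimators=100, random_state=0).fit(X, y)
order = np.argsort(rf.feature_importances_)[::-1][:8]   # top-8 ranking, cf. Table 4
print(order, rf.feature_importances_[order])
```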
Table 5. Experimental setup.

Group | Model Input
A1 | 5 s historical velocity
A2 | 5 s historical velocity + $v_{mr}$, $va_{\max}$, $a_{\min}$, $a$, $f_v$
B1 | 10 s historical velocity
B2 | 10 s historical velocity + $v_{mr}$, $va_{\max}$, $a_{\min}$, $a$, $f_v$
C1 | 15 s historical velocity
C2 | 15 s historical velocity + $v_{mr}$, $va_{\max}$, $a_{\min}$, $a$, $f_v$
C3 | 15 s historical velocity + $v_{mr}$, $va_{\max}$, $a_{\min}$, $a$
C4 | 15 s historical velocity + $v_{mr}$, $va_{\max}$, $a_{\min}$
C5 | 15 s historical velocity + $v_{mr}$, $va_{\max}$
D1 | 15 s historical velocity + $v_{mr}$
D2 | 15 s historical velocity + $va_{\max}$
D3 | 15 s historical velocity + $a_{\min}$
D4 | 15 s historical velocity + $a$
D5 | 15 s historical velocity + $f_v$
E | 20 s historical velocity
F | 30 s historical velocity
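Each group in Table 5 pairs a sliding window of historical velocity with an optional set of feature parameters computed over that window (cf. Figure 6). A minimal sketch of building such samples, assuming 1 Hz data; make_samples and the toy trace are illustrative names, not the paper's pipeline:

```python
import numpy as np

def make_samples(v, H=15, P=10, feat_fn=None):
    """Slide an H-step history window over velocity trace v; targets are the next P steps."""
    X, Y = [], []
    for t in range(H, len(v) - P + 1):
        window = v[t - H:t]
        extras = feat_fn(window) if feat_fn else []      # optional feature parameters
        X.append(np.concatenate([window, extras]))
        Y.append(v[t:t + P])
    return np.asarray(X), np.asarray(Y)

v = np.abs(np.cumsum(np.random.default_rng(1).normal(size=200)))   # toy velocity trace
# e.g. group D1: append average driving velocity v_mr to the window
X, Y = make_samples(v, H=15, P=10,
                    feat_fn=lambda w: [w[w > 0].mean() if (w > 0).any() else 0.0])
print(X.shape, Y.shape)   # (176, 16) (176, 10)
```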
Table 6. Related parameter settings for the BPNN and LSTM models.

Model | Group | Epochs | Number of Hidden Layers | Number of Hidden Units | Learning Rate
BPNN | A1 | 500 | 1 | 6 | 0.05
BPNN | A2\B1 | 500 | 1 | 11 | 0.05
BPNN | B2\C1 | 500 | 1 | 15 | 0.05
BPNN | C2\C3\E\F | 500 | 1 | 21 | 0.05
BPNN | C4\C5 | 500 | 1 | 18 | 0.05
BPNN | D1~D5 | 500 | 1 | 16 | 0.05
LSTM | A1~F | 500 | 1 | 32 | 0.05
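To make the Table 6 settings concrete, here is a hedged PyTorch sketch of the two architectures for group C2 (one hidden layer with 21 units for the BPNN, 32 hidden units for the LSTM). This is an illustrative reconstruction, not the authors' implementation; in particular the sigmoid activation is an assumption, as the table does not restate it.

```python
import torch
import torch.nn as nn

H, F, P = 15, 5, 10             # group C2: 15 s history + 5 feature parameters, 10 s horizon

bpnn = nn.Sequential(            # one hidden layer with 21 units, per Table 6
    nn.Linear(H + F, 21),
    nn.Sigmoid(),                # activation assumed for illustration
    nn.Linear(21, P),
)

class LSTMPredictor(nn.Module):
    def __init__(self, hidden=32, horizon=P):
        super().__init__()
        self.lstm = nn.LSTM(input_size=1, hidden_size=hidden, num_layers=1, batch_first=True)
        self.head = nn.Linear(hidden, horizon)
    def forward(self, x):                       # x: (batch, H, 1) velocity sequence
        out, _ = self.lstm(x)
        return self.head(out[:, -1])            # last hidden state -> P future steps

print(bpnn(torch.randn(4, H + F)).shape)            # torch.Size([4, 10])
print(LSTMPredictor()(torch.randn(4, H, 1)).shape)  # torch.Size([4, 10])
```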
Table 7. Related parameter settings for the GPT model.

Group | Epochs | Batch Size | N | h | k | Dropout
A1 | 1000 | 100 | 3 | 6 | 500 | 0.1
A2\B1 | 1000 | 100 | 3 | 11 | 500 | 0.1
B2\C1 | 1000 | 100 | 3 | 16 | 500 | 0.1
C2\E | 1000 | 100 | 3 | 21 | 500 | 0.1
C3 | 1000 | 100 | 3 | 20 | 500 | 0.1
C4 | 1000 | 100 | 3 | 19 | 500 | 0.1
C5 | 1000 | 100 | 3 | 18 | 500 | 0.1
D1~D5 | 1000 | 100 | 3 | 17 | 500 | 0.1
F | 1000 | 100 | 3 | 31 | 500 | 0.1
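A speculative sketch of a small decoder-only (GPT-style) velocity predictor follows. Reading N = 3 in Table 7 as the number of decoder blocks and k = 500 as the feedforward width is an assumption, and h is left here as a generic head count; none of these interpretations is confirmed by the table itself, so treat the code as a shape-compatible illustration only.

```python
import torch
import torch.nn as nn

class TinyGPT(nn.Module):
    """Decoder-only (GPT-style) predictor with causal self-attention over the window."""
    def __init__(self, d_model=32, n_heads=4, n_layers=3, d_ff=500,
                 dropout=0.1, horizon=10, max_len=64):
        super().__init__()
        self.embed = nn.Linear(1, d_model)                         # scalar velocity -> d_model
        self.pos = nn.Parameter(torch.zeros(1, max_len, d_model))  # learned positional encoding
        layer = nn.TransformerEncoderLayer(d_model, n_heads, d_ff, dropout, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, n_layers)       # n_layers decoder blocks
        self.head = nn.Linear(d_model, horizon)

    def forward(self, x):                                          # x: (batch, T, 1)
        T = x.size(1)
        causal = torch.triu(torch.full((T, T), float("-inf"), device=x.device), diagonal=1)
        h = self.blocks(self.embed(x) + self.pos[:, :T], mask=causal)
        return self.head(h[:, -1])                                 # next `horizon` steps

print(TinyGPT()(torch.randn(8, 15, 1)).shape)   # torch.Size([8, 10])
```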
Table 8. MAE of three VVP models with/without driving feature parameters on C_WTVC.

Group | Prediction Horizon P (s) | MAE (BPNN) | MAE (LSTM) | MAE (GPT)
A1 | 1 | 0.37 | 0.81 | 0.55
A1 | 5 | 1.37 | 2.38 | 2.38
A1 | 10 | 2.79 | 4.24 | 4.84
A2 | 1 | 0.35 | 0.46 | 0.33
A2 | 5 | 1.13 | 1.04 | 1.89
A2 | 10 | 1.37 | 1.22 | 3.83
B1 | 1 | 0.36 | 0.83 | 0.60
B1 | 5 | 1.35 | 2.26 | 2.32
B1 | 10 | 2.70 | 3.94 | 4.72
B2 | 1 | 0.34 | 0.41 | 0.40
B2 | 5 | 1.21 | 0.98 | 1.96
B2 | 10 | 1.42 | 1.16 | 3.91
C1 | 1 | 0.35 | 0.82 | 0.55
C1 | 5 | 1.32 | 2.19 | 2.13
C1 | 10 | 2.62 | 3.82 | 4.69
C2 | 1 | 0.35 | 0.44 | 0.37
C2 | 5 | 1.25 | 0.87 | 1.85
C2 | 10 | 1.54 | 1.14 | 3.82
Table 9. Comparison of MAE of three VVP models with different numbers of driving feature parameters when tested on the C_WTVC test set.

Group | Prediction Horizon P (s) | MAE (BPNN) | MAE (LSTM) | MAE (GPT)
C2 | 1 | 0.35 | 0.44 | 0.37
C2 | 5 | 1.25 | 0.87 | 1.85
C2 | 10 | 1.54 | 1.14 | 3.82
C3 | 1 | 0.34 | 0.40 | 0.32
C3 | 5 | 1.11 | 0.78 | 1.76
C3 | 10 | 1.32 | 1.08 | 3.61
C4 | 1 | 0.35 | 0.72 | 0.48
C4 | 5 | 1.31 | 1.73 | 1.98
C4 | 10 | 2.54 | 2.33 | 3.88
C5 | 1 | 0.38 | 0.88 | 0.55
C5 | 5 | 1.34 | 1.96 | 2.11
C5 | 10 | 2.68 | 3.25 | 4.35
D1 | 1 | 0.36 | 0.82 | 0.56
D1 | 5 | 1.34 | 2.04 | 2.08
D1 | 10 | 2.71 | 3.59 | 4.45
Table 10. The MAE of different types of driving feature parameters when examined using the three test data sets.

Group | P (s) | BPNN MAE (Cycle 1 / 2 / 3) | LSTM MAE (Cycle 1 / 2 / 3) | GPT MAE (Cycle 1 / 2 / 3)
C1 | 1 | 1.04 / 0.93 / 0.35 | 1.86 / 1.40 / 0.82 | 0.59 / 0.44 / 0.55
C1 | 5 | 3.25 / 2.46 / 1.32 | 4.57 / 2.58 / 2.19 | 2.35 / 1.90 / 2.13
C1 | 10 | 5.38 / 3.98 / 2.62 | 7.12 / 4.09 / 3.82 | 3.89 / 3.53 / 4.69
D1 | 1 | 1.03 / 0.94 / 0.36 | 2.03 / 1.52 / 0.82 | 0.50 / 0.34 / 0.56
D1 | 5 | 3.25 / 2.48 / 1.34 | 4.39 / 3.15 / 2.04 | 2.34 / 1.88 / 2.08
D1 | 10 | 5.28 / 4.17 / 2.71 | 6.57 / 3.97 / 3.59 | 3.85 / 3.50 / 4.45
D2 | 1 | 1.00 / 0.95 / 0.36 | 1.79 / 1.31 / 0.78 | 0.56 / 0.35 / 0.45
D2 | 5 | 3.45 / 2.27 / 1.33 | 3.88 / 2.83 / 1.91 | 2.44 / 1.78 / 2.05
D2 | 10 | 5.68 / 3.64 / 2.66 | 5.53 / 3.94 / 3.28 | 3.89 / 3.27 / 4.38
D3 | 1 | 1.07 / 0.94 / 0.35 | 1.90 / 1.25 / 0.80 | 0.60 / 0.37 / 0.46
D3 | 5 | 3.53 / 2.39 / 1.35 | 4.20 / 2.64 / 2.08 | 2.53 / 1.77 / 2.11
D3 | 10 | 5.69 / 3.75 / 2.73 | 5.96 / 3.92 / 3.47 | 3.83 / 3.22 / 4.55
D4 | 1 | 1.04 / 0.92 / 0.35 | 1.18 / 1.11 / 0.39 | 0.40 / 0.35 / 0.39
D4 | 5 | 1.90 / 1.27 / 1.02 | 1.30 / 1.20 / 0.72 | 2.23 / 1.73 / 1.85
D4 | 10 | 2.63 / 1.57 / 1.15 | 1.58 / 1.40 / 0.98 | 3.63 / 3.12 / 3.82
D5 | 1 | 1.04 / 0.92 / 0.37 | 2.12 / 1.68 / 0.69 | 0.56 / 0.39 / 0.43
D5 | 5 | 3.66 / 2.43 / 1.38 | 4.63 / 2.98 / 2.11 | 2.63 / 1.88 / 1.90
D5 | 10 | 5.47 / 4.11 / 2.80 | 7.23 / 4.44 / 3.77 | 3.98 / 3.61 / 3.99
Note: Cycle 1, Cycle 2, and Cycle 3 represent NYCBUS_CYC, NYCTRUCK_CYC, and C_WTVC, respectively.
Table 11. MAE for different sliding window lengths within the 10 s prediction horizon when tested on the three test data sets.

Length of Sliding Window | BPNN MAE (Cycle 1 / 2 / 3) | LSTM MAE (Cycle 1 / 2 / 3) | GPT MAE (Cycle 1 / 2 / 3)
5 s | 5.49 / 4.09 / 2.79 | 7.37 / 4.33 / 4.24 | 5.59 / 5.72 / 4.84
10 s | 5.42 / 3.86 / 2.70 | 7.31 / 4.23 / 3.94 | 4.14 / 3.72 / 4.72
15 s | 5.38 / 3.98 / 2.62 | 7.12 / 4.09 / 3.82 | 3.89 / 3.53 / 4.69
20 s | 5.40 / 4.19 / 2.81 | 5.07 / 4.29 / 3.89 | 3.60 / 3.61 / 3.95
30 s | 5.68 / 4.32 / 2.86 | 6.69 / 4.70 / 4.22 | 3.67 / 3.66 / 4.15
Table 12. The MAE over different prediction horizons for the three VVP models.

Prediction Horizon | BPNN MAE | BPNN R² | LSTM MAE | LSTM R² | GPT MAE | GPT R²
1 s | 0.35 | 0.999 | 0.44 | 0.999 | 0.37 | 0.999
5 s | 1.25 | 0.996 | 0.87 | 0.998 | 1.85 | 0.986
10 s | 1.54 | 0.995 | 1.14 | 0.996 | 3.82 | 0.947
15 s | 3.87 | 0.946 | 3.24 | 0.992 | 5.79 | 0.913
20 s | 6.63 | 0.916 | 5.28 | 0.971 | 8.02 | 0.844
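For reference, the MAE and R² in Table 12 follow their standard definitions, $\mathrm{MAE} = \frac{1}{n}\sum_i |y_i - \hat{y}_i|$ and $R^2 = 1 - \sum_i (y_i - \hat{y}_i)^2 / \sum_i (y_i - \bar{y})^2$. A small worked example (generic, not the paper's evaluation script):

```python
import numpy as np

def mae(y_true, y_pred):
    return np.mean(np.abs(y_true - y_pred))

def r2(y_true, y_pred):
    ss_res = np.sum((y_true - y_pred) ** 2)                 # residual sum of squares
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)          # total sum of squares
    return 1.0 - ss_res / ss_tot

y_true = np.array([10.0, 12.0, 15.0, 14.0])                 # toy observed velocities
y_pred = np.array([10.5, 11.5, 15.5, 13.0])                 # toy predictions
print(mae(y_true, y_pred), r2(y_true, y_pred))              # 0.625, approx. 0.8814
```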
Table 13. Training and prediction times of the three VVP models.

Model | $T_{train}$ (s) | $T_{pre}$ (s)
BPNN | 3.566 | 0.011
LSTM | 82.197 | 0.004
GPT | 423.619 | 0.003