Article

Crude Oil Price Forecasting Model Based on Neural Networks and Error Correction

Key Laboratory of Traffic Safety on Track of Ministry of Education, School of Traffic & Transportation Engineering, Central South University, Changsha 410075, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(3), 1055; https://doi.org/10.3390/app15031055
Submission received: 3 December 2024 / Revised: 16 January 2025 / Accepted: 18 January 2025 / Published: 21 January 2025
(This article belongs to the Section Energy Science and Technology)

Abstract

Crude oil price forecasting contributes to global economic development. This study proposes a hybrid deep learning model for crude oil price forecasting. First, the empirical wavelet transform decomposes the raw data into multiple subsequences. Then, three neural networks generate preliminary forecasts, which are subsequently combined by a reinforcement learning-based ensemble method. Finally, an error correction module models the residuals, further enhancing the forecasting outcomes. Three West Texas Intermediate datasets and additional emergency scenarios were used to validate the hybrid model. The findings indicate that the proposed model achieves superior predictive performance compared with sixteen benchmark methods and three advanced models.

1. Introduction

Crude oil is a key strategic resource for social and economic development. It remains important in global industrial production, even though alternative energy sources exist [1]. In addition, crude oil price volatility has an important impact on economic growth, securities markets, and national security. Efficient and accurate crude oil price forecasting contributes to global economic development. However, the price is affected by many factors, including geopolitical conflicts, oil production, and the pace of economic development [2]. Hence, crude oil price data are highly nonlinear, non-stationary, and stochastic. To forecast the crude oil price precisely, researchers have made many contributions [3].
Common crude oil price forecasting models include causal relationship forecasting models, statistical models, machine learning models, and hybrid forecasting models [4]. The causal relationship model selects several independent variables to predict future values, but it requires many data sources, and its use is limited by complicated influencing factors, high research costs, and interactions between the independent variables [5]. The current international situation changes quickly, and slow model updates make the causal relationship model less suited to short-term forecasting.
Statistical models, in contrast, have simpler structures and meet short-term speed requirements. Typical statistical methods include the autoregressive integrated moving average (ARIMA) model [6] and gray models (GMs) [7]. Yet crude oil prices often show strong nonlinearity, so such statistical models may fail to capture sudden or large price swings. This makes accurate prediction difficult. As global markets grow more volatile, better approaches are needed to handle complex data patterns.
Many scholars use deep learning models to predict time series [8,9,10,11]. Zhou et al. employed a deep neural network (DNN) model to forecast equity premiums [8]. They compared the DNN model with ordinary least squares (OLS) and historical average (HA) models, and their experimental findings indicated that the DNN model demonstrated superior predictive performance. Cen et al. performed short-term forecasts of WTI and Brent crude oil prices using long short-term memory (LSTM) networks, reducing the influence of historical data while increasing that of current data; this approach produced lower prediction errors [12]. Busari et al. compared the predictive performance of a gated recurrent unit (GRU) model with that of a single LSTM network, revealing that the GRU effectively reduced prediction errors in crude oil price models [13]. Additionally, several studies employed group method of data handling (GMDH) networks for time series forecasting and reported favorable results on their respective datasets [14,15]. Foroutan et al. implemented 16 deep learning architectures to predict daily oil prices for WTI and Brent; their findings indicated that the temporal convolutional network (TCN) surpassed the other models [16].
However, a single machine learning model may not adapt well to all time series conditions. Therefore, more researchers are turning to hybrid forecasting models. Two common approaches in crude oil price forecasting are decomposition algorithms and ensemble algorithms [17]. The decomposition method handles raw crude oil price data, which are nonlinear and random. Wang et al. used ensemble empirical mode decomposition (EEMD) to preprocess crude oil prices [18]. Experimental results showed that this decomposition method could effectively improve the accuracy of convolutional neural networks (CNNs) and LSTM networks. Liu et al. adopted variational modal decomposition (VMD) to process original data and then used an artificial neural network for each sub-series [19]. Their results showed reduced prediction errors. Lin et al. applied complementary EEMD (CEEMD) to split crude oil data into multiple subsequences [20]; the method improved the GRU's ability to recognize nonlinear data. The ensemble method blends different base predictors to reduce forecasting errors. One common ensemble approach is the heuristic method [21]. Zeng et al. proposed a hybrid model with particle swarm optimization (PSO) [22]. Their experiments on corn and wheat futures price series showed better results than other methods. Qu et al. used the sine cosine algorithm-whale optimization algorithm (SCWOA) to optimize the weights of four deep learning methods [23]. Experiments showed that the accuracy of this model was superior to that of all single models. Alruqimi et al. applied gray wolf optimization (GWO) to optimize the weights of base predictors [24] and found that GWO effectively reduced the prediction errors. These methods underscore the importance of combining different strategies to improve crude oil price forecasting performance.
In summary, scholars have made many important advances in crude oil price forecasting. Table 1 briefly summarizes the algorithms proposed in these papers, the advantages and disadvantages of the models, and their publication dates.
Although many effective prediction methods have been proposed for time series prediction, some limitations remain, and there is still potential to improve the forecasting precision of hybrid models. (1) Heuristic algorithms, as common ensemble learning methods, have been studied by many scholars, yet further breakthroughs have proven difficult. Compared with heuristic algorithms, reinforcement learning (RL) has a strong self-learning ability. Hence, the application of RL in ensemble learning is worth studying. (2) There are still some predictable components in the forecasting residuals of hybrid models [25]. To further improve the performance of a hybrid model, the residuals can be used to correct the prediction results.
Therefore, to further improve the accuracy and robustness of crude oil price forecasting, a new prediction method is proposed. This method integrates the empirical wavelet transform (EWT), three neural networks (a temporal convolutional network (TCN), a gated recurrent unit (GRU), and an echo state network (ESN)), reinforcement learning-based weight optimization, and a final error correction module. The main contributions of this study are as follows. The crude oil price series were adaptively decomposed into multiple subsequences using EWT according to their fluctuation characteristics, reducing nonlinearity and delivering higher predictive accuracy compared with classical decomposition methods. In addition, combining three deep neural networks leveraged their extensive hidden layers to manage non-stationary data and exploit complementary strengths, thereby enhancing both adaptability and generalization in forecasting. The SARSA algorithm was employed to optimize the ensemble weights among the diverse base predictors. Moreover, a machine learning-based error correction module was applied to address predictable residual components, thereby further improving the accuracy and robustness of crude oil price forecasting.
Empirical evaluations on three crude oil price datasets indicate that the model effectively handles nonlinear and volatile market conditions, achieving a mean absolute error (MAE) of USD 0.1239 and a mean absolute percentage error (MAPE) of 0.1914% in the best scenario. Furthermore, additional experiments incorporating abrupt price fluctuations induced by geopolitical events verified the model’s adaptability, as it accurately identified mutation points and outperformed advanced benchmarks, reducing the MAE to USD 0.3510 under high volatility. These findings emphasize the potential of this hybrid approach to enhance forecasting accuracy, offering a reliable tool for decision-makers in the energy sector and financial markets.
The article is organized as follows. Section 2 elaborates on the structure of our proposed model and the theoretical knowledge involved. Section 3 analyzes the crude oil price data and conducts corresponding comparative experiments. The application analysis of the proposed model is shown in Section 4. The conclusions and future research related to this article are outlined in Section 5.

2. Methodology

2.1. Structure of the Hybrid Model

The modeling process for our hybrid model is given in Figure 1. The forecasting model includes three modules: data preprocessing and prediction, prediction results ensemble, and error correction. The detailed modeling process is as follows.
A.
Data preprocessing and prediction: The raw data are divided into three parts: training sets, validation sets, and test sets. Then, EWT is applied to decompose the three parts adaptively. The base predictors, the TCN, GRU, and ESN, predict the decomposed sub-series. The details of EWT and predictors are given in Section 2.2 and Section 2.3, respectively.
B.
Prediction results ensemble: In this paper, the ensemble is formed by weighting the forecasting results of the different predictors, as shown in Equation (1). The SARSA algorithm, a reinforcement learning method, is applied to optimize the weights. The SARSA-based ensemble learning method is detailed in Section 2.4.
$$\hat{Y}(t) = w_1 \hat{y}_1(t) + w_2 \hat{y}_2(t) + w_3 \hat{y}_3(t) \tag{1}$$

where $w_i$ represents the weight of the $i$-th predictor, $\hat{y}_i(t)$ represents the forecasting result of the $i$-th predictor at time $t$, derived from its predictive model, and $\hat{Y}(t)$ represents the ensemble prediction.
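For illustration, Equation (1) is simply a weighted sum of the three forecast vectors. A minimal NumPy sketch with hypothetical weights:

```python
import numpy as np

y_tcn, y_gru, y_esn = np.array([70.2]), np.array([70.5]), np.array([69.9])
w = np.array([0.4, 0.35, 0.25])               # illustrative ensemble weights
y_ens = w @ np.stack([y_tcn, y_gru, y_esn])   # Equation (1)
print(y_ens)                                  # [70.23]
```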
C.
Calculating error and error correction: The forecasting residuals still contain predictable components after ensemble learning. Hence, there is still room to improve the forecasting accuracy. An extreme learning machine (ELM) is used for error correction. The final crude oil price prediction results are obtained by combining the error correction results with the ensemble results. The error correction module (ECM) is detailed in Section 2.5.

2.2. Empirical Wavelet Transform

EWT is an adaptive signal decomposition method based on empirical mode decomposition (EMD) [26,27]. The principle of EWT is to segment the Fourier spectrum adaptively by detecting maximum points in the frequency domain [28] and then construct several corresponding filters to process the data. EWT offers sufficient theoretical guarantees, good adaptability, simple computation, and freedom from mode aliasing. EWT is built from an empirical wavelet function and an empirical scaling function, defined as follows [29]:
$$\hat{\phi}_n(\omega) = \begin{cases} 1 & |\omega| \le (1-\gamma)\omega_n \\ \cos\!\left[\dfrac{\pi}{2}\beta\!\left(\dfrac{1}{2\gamma\omega_n}\bigl(|\omega| - (1-\gamma)\omega_n\bigr)\right)\right] & (1-\gamma)\omega_n \le |\omega| \le (1+\gamma)\omega_n \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

$$\hat{\psi}_n(\omega) = \begin{cases} 1 & (1+\gamma)\omega_n \le |\omega| \le (1-\gamma)\omega_{n+1} \\ \cos\!\left[\dfrac{\pi}{2}\beta\!\left(\dfrac{1}{2\gamma\omega_{n+1}}\bigl(|\omega| - (1-\gamma)\omega_{n+1}\bigr)\right)\right] & (1-\gamma)\omega_{n+1} \le |\omega| \le (1+\gamma)\omega_{n+1} \\ \sin\!\left[\dfrac{\pi}{2}\beta\!\left(\dfrac{1}{2\gamma\omega_n}\bigl(|\omega| - (1-\gamma)\omega_n\bigr)\right)\right] & (1-\gamma)\omega_n \le |\omega| \le (1+\gamma)\omega_n \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

where $\beta(x)$ and the parameter $\gamma$ are defined below:

$$\beta(x) = \begin{cases} 1 & x \ge 1 \\ 0 & x \le 0 \\ \text{such that } \beta(x) + \beta(1-x) = 1 & \text{otherwise} \end{cases} \tag{4}$$

$$\gamma < \min_n \frac{\omega_{n+1} - \omega_n}{\omega_{n+1} + \omega_n}, \quad \gamma \in (0, 1) \tag{5}$$

where $\gamma$ represents the width ratio of the transition region between adjacent frequency bands, constrained within $(0, 1)$; $\omega_n$ and $\omega_{n+1}$ denote the upper boundary of the $n$-th frequency band and the lower boundary of the $(n+1)$-th frequency band, respectively, expressed in normalized frequency; $\omega_{n+1} - \omega_n$ defines the frequency gap between adjacent bands, while $\omega_{n+1} + \omega_n$ normalizes the gap.
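To make the idea concrete, the following simplified sketch mimics EWT on a 1-D series: it locates local maxima in the Fourier spectrum, places band boundaries at the midpoints between the retained maxima, and recovers each sub-series by inverse FFT. Note that it uses ideal (rectangular) filters rather than the Meyer-type filters of Equations (2) and (3), and the function name is our own, so this is an illustrative approximation rather than the full transform:

```python
import numpy as np
from scipy.signal import argrelextrema

def simple_ewt(x, n_bands=3):
    """Simplified EWT sketch: band boundaries at midpoints between the
    largest local maxima of the Fourier spectrum, then ideal band-pass
    filtering. The true EWT uses smooth Meyer-type filters, Eqs. (2)-(3)."""
    X = np.fft.rfft(x)
    mag = np.abs(X)
    peaks = argrelextrema(mag, np.greater)[0]                # local maxima
    top = np.sort(peaks[np.argsort(mag[peaks])[-n_bands:]])  # n largest
    inner = (top[:-1] + top[1:]) // 2                        # midpoints
    edges = np.concatenate(([0], inner, [len(mag)]))
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(X)
        masked[lo:hi] = X[lo:hi]                             # ideal band-pass
        bands.append(np.fft.irfft(masked, n=len(x)))
    return bands                                             # sub-series list

# Example: a noisy two-tone signal decomposed into three bands.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 500)
sig = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
sig += 0.1 * rng.standard_normal(500)
subs = simple_ewt(sig, n_bands=3)
print(len(subs), np.allclose(sum(subs), sig))                # 3 True
```

Because the masked spectra partition the frequency bins, the sub-series sum exactly back to the original signal, mirroring the additive reconstruction property of the decomposition.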

2.3. Forecasting Methods

2.3.1. Temporal Convolutional Network

Bai et al. first combined several convolution techniques with a one-dimensional CNN to make the CNN more applicable to time series data; this network is called the temporal convolutional network [30]. A TCN is made up of stacked residual modules that combine extended convolution and two-layer causal convolution, and the weights of each convolution kernel are normalized. The TCN introduces nonlinearity between convolution layers using a rectified linear unit (ReLU) and avoids overfitting through dropout. In general, a TCN has four main components: causal convolution, the one-dimensional CNN, extended convolution, and residual connections. Each component is explained below.
Causal convolution: A TCN imposes causality on the convolution, so the TCN does not produce information “leakage”, improving its prediction accuracy. Assuming the input sequence of the model is $x_0, x_1, \ldots, x_T$ and the expected output sequence is $y_0, y_1, \ldots, y_T$, causal convolution ensures that the predicted output $y_t$ at time $t$ is determined only by $x_0, x_1, \ldots, x_t$ and is not affected by $x_{t+1}, x_{t+2}, \ldots, x_T$.
One-dimensional CNN: The TCN uses a one-dimensional CNN to generate an output sequence of equal length to the input sequence [31], which retains the information contained in the entire input and constructs long-term memory.
Extended convolution: The TCN uses extended (dilated) convolution to obtain a larger receptive field. For a one-dimensional sequence $x \in \mathbb{R}^n$ and a convolution kernel $f: \{0, 1, \ldots, k-1\} \to \mathbb{R}$, the extended convolution $F$ on sequence element $m$ is defined as:

$$F(m) = \sum_{i=0}^{k-1} f(i) \cdot x_{m - v \cdot i} \tag{6}$$

where $v$ is the expansion coefficient, $k$ is the size of the convolution kernel, and $m - v \cdot i$ indicates that the $(m - v \cdot i)$-th element of the upper layer is used.
From Equation (6), enlarging the convolution kernel size $k$ or increasing the expansion coefficient $v$ increases the receptive field of the network. Generally, the expansion coefficient at layer $i$ is:

$$v = O(2^i) \tag{7}$$
Residual connection: The TCN preserves the accuracy of the deep network through residual connections. With $x$ as the input and $y$ as the output, the residual connection is expressed as:

$$y = \eta(x, \rho) + x \tag{8}$$

where $\eta$ represents a series of convolution operations and $\rho$ is the weight matrix of the convolution kernels.
According to Equation (8), the output of the residual module combines the input information with the output of the convolution calculation, which ensures the accuracy of the deep TCN model.
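As a concrete sketch, the PyTorch block below implements one TCN residual module with causal (left-padded) dilated convolutions, ReLU, dropout, and the skip connection of Equation (8). The class names and layer sizes are illustrative, and weight normalization is omitted for brevity:

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Conv1d):
    """1-D convolution made causal by left-padding, so output at time t
    depends only on inputs up to t (no information leakage)."""
    def __init__(self, c_in, c_out, k, dilation):
        super().__init__(c_in, c_out, k, dilation=dilation)
        self.left_pad = (k - 1) * dilation

    def forward(self, x):
        x = nn.functional.pad(x, (self.left_pad, 0))  # pad the past only
        return super().forward(x)

class TCNBlock(nn.Module):
    """Residual block: two causal dilated convolutions plus a skip
    connection, i.e. y = eta(x) + x as in Equation (8)."""
    def __init__(self, c_in, c_out, k=5, dilation=1, dropout=0.15):
        super().__init__()
        self.net = nn.Sequential(
            CausalConv1d(c_in, c_out, k, dilation), nn.ReLU(), nn.Dropout(dropout),
            CausalConv1d(c_out, c_out, k, dilation), nn.ReLU(), nn.Dropout(dropout),
        )
        # 1x1 convolution matches channel counts for the skip connection.
        self.skip = nn.Conv1d(c_in, c_out, 1) if c_in != c_out else nn.Identity()

    def forward(self, x):
        return torch.relu(self.net(x) + self.skip(x))

# Example: 16 univariate windows of length 5 -> same-length feature maps.
x = torch.randn(16, 1, 5)
block = TCNBlock(c_in=1, c_out=32, k=3, dilation=2)
print(block(x).shape)  # torch.Size([16, 32, 5])
```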

2.3.2. Gated Recurrent Unit

An RNN considers the influence of the previous state on the current state during prediction by adding a recurrent structure to the model. This structure is beneficial for predicting time series data [32]. However, the traditional RNN is prone to vanishing or exploding gradients when handling long time series. The LSTM introduces a gated structure to solve the gradient problem of RNNs [33]. Compared with the LSTM, a GRU needs only an update gate and a reset gate, which saves memory and speeds up computation.
The update gate and reset gate of the GRU are computed as follows [34]:

$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right) \tag{9}$$

$$r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right) \tag{10}$$

where $x_t$ represents the current input, $h_{t-1}$ is the previous hidden state, $\sigma$ is the sigmoid function, and $W_z$ and $W_r$ are the corresponding weight matrices.
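A minimal PyTorch sketch of a GRU forecaster over a 5-step sliding window (the window setup is described in Section 3.1); the gating of Equations (9) and (10) is handled internally by nn.GRU, and the layer sizes here are illustrative:

```python
import torch
import torch.nn as nn

class GRUForecaster(nn.Module):
    """GRU over a sliding window of past prices -> one-step-ahead forecast."""
    def __init__(self, hidden=100, layers=3):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden,
                          num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):             # x: (batch, window, 1)
        out, _ = self.gru(x)          # gates z_t, r_t computed internally
        return self.head(out[:, -1])  # last hidden state -> next price

model = GRUForecaster()
window = torch.randn(16, 5, 1)        # 16 windows of the last 5 prices
print(model(window).shape)            # torch.Size([16, 1])
```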

2.3.3. Echo State Network

The ESN has been broadly utilized in nonlinear prediction [35]. An ESN consists of an input layer, a dynamic reservoir, and an output layer. The dynamic reservoir is the key module; it contains many hidden-layer neurons connected randomly and sparsely, giving the ESN short-term memory. The ESN has the following characteristics [36]: (a) its main structure is a random, fixed reservoir; (b) connections between neurons are generated randomly; and (c) the ESN can be trained through simple linear regression.
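A compact NumPy sketch of an ESN illustrating characteristics (a)-(c): a fixed, sparsely connected random reservoir is driven by the input, and only the linear readout is trained, here by ridge regression; all names and parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_reservoir(n=100, connectivity=0.1, spectral_radius=0.95):
    """Random, sparse, fixed recurrent weight matrix (characteristics a, b)."""
    W = rng.standard_normal((n, n)) * (rng.random((n, n)) < connectivity)
    W *= spectral_radius / np.abs(np.linalg.eigvals(W)).max()  # echo-state scaling
    return W

def reservoir_states(u, W, w_in):
    """Drive the reservoir with the input sequence u (tanh update)."""
    x, states = np.zeros(W.shape[0]), []
    for u_t in u:
        x = np.tanh(W @ x + w_in * u_t)
        states.append(x.copy())
    return np.array(states)

# Only the linear readout is trained, via ridge regression (characteristic c).
u = np.sin(np.linspace(0, 20, 300))           # toy input series
W, w_in = make_reservoir(), rng.standard_normal(100) * 0.5
S = reservoir_states(u[:-1], W, w_in)         # states for one-step-ahead task
y = u[1:]                                     # next-value targets
w_out = np.linalg.solve(S.T @ S + 1e-6 * np.eye(S.shape[1]), S.T @ y)
print(f"train MAE: {np.abs(S @ w_out - y).mean():.4f}")
```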

2.4. Multi-Predictor Ensemble

RL is a form of interactive learning in which the agent learns by interacting with the environment [37]. The agent can learn the optimal strategy and adjust it automatically according to the objective, enabling RL to maximize the return or achieve a fixed objective. RL has been utilized in many fields, such as autonomous driving, game playing, and ensemble optimization [38,39].
SARSA, proposed by Rummery and Niranjan [40], has strong scalability and excellent convergence. Hence, we used SARSA to combine the three neural networks in this paper. The elements of SARSA are defined as follows:
State: The weights of the neural networks form the state matrix $S$:

$$S = [w_1, w_2, w_3] \tag{11}$$

where $w_1$, $w_2$, and $w_3$ represent the weights of the TCN, GRU, and ESN, respectively.
Action: The action matrix $a$ represents the action that changes the weight of each network:

$$a = [\Delta w_1, \Delta w_2, \Delta w_3] \tag{12}$$

where $\Delta w_m$ represents the change in weight for the $m$-th neural network.
SARSA follows a deterministic strategy. To better balance exploration and exploitation, allowing the model to search for more potential actions, SARSA adopts the $\varepsilon$-greedy policy to select actions:

$$a_m = \begin{cases} \arg\max_a Q(S, a) & \text{with probability } 1 - \varepsilon \\ \text{random action} & \text{with probability } \varepsilon \end{cases}, \quad \varepsilon \in (0, 1) \tag{13}$$

where $\varepsilon$ is the exploration probability.
Reward: The model acquires the reward $R$ from the loss function $L$:

$$L = \frac{1}{N}\sum_{t=1}^{N}\left(Y(t) - \hat{Y}(t)\right)^2 \tag{14}$$

$$R = \begin{cases} +1 + \dfrac{L_t - L_{t+1}}{L_{t+1}} & L_{t+1} < L_t \\[4pt] -1 + \dfrac{L_t - L_{t+1}}{L_{t+1}} & L_{t+1} > L_t \end{cases} \tag{15}$$

where $Y(t)$ represents the raw crude oil price data and $\hat{Y}(t)$ is the forecast value.
Action value function $Q$: The iterative update of the value function is expressed as follows:

$$Q_{n+1}(S_n, a_n) = Q_n(S_n, a_n) + \eta_n \left[ R(S_n, a_n) + \gamma Q_n(S_{n+1}, a_{n+1}) - Q_n(S_n, a_n) \right] \tag{16}$$

where $\eta$ indicates the learning rate and $\gamma$ indicates the discount coefficient.
The solving process of the ensemble module is described in Algorithm 1, and a code sketch follows.

Algorithm 1 Ensemble module based on SARSA
Input:
  Forecasting results of the TCN, GRU, and ESN: $\hat{Y}_T$, $\hat{Y}_G$, $\hat{Y}_E$.
  Maximum number of episodes: $Z$.
  Maximum number of steps per episode: $K$.
  Discount coefficient: $\gamma$.
  Learning rate: $\eta$.
Output: Weights of the three predictors: $w_1, w_2, w_3$.
Algorithm:
1: Initialize all parameters
2: for $z = 1$ to $Z$ do
3:   for $k = 1$ to $K$ do
4:     Compute the loss function $L$ and reward $R$ using Equations (14) and (15)
5:     Select action $a$ through the $\varepsilon$-greedy policy of Equation (13)
6:     Update the Q table using Equation (16)
7:   end for
8: end for
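The following toy NumPy sketch illustrates the SARSA-based weight search: it treats small increments or decrements of each predictor's weight as actions (Equation (12)), rewards steps that reduce the validation loss as in Equation (15), follows the ε-greedy policy of Equation (13), and updates a tabular Q function per Equation (16). The state encoding and helper names are our own simplifications, not the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def sarsa_ensemble(preds, y, episodes=100, steps=100,
                   eps=0.1, eta=0.3, gamma=0.95, delta=0.05):
    """preds: (3, N) base-predictor forecasts; y: (N,) true values.
    Returns ensemble weights w1, w2, w3 found by tabular SARSA."""
    actions = []                                  # +/- delta on each weight
    for m in range(3):
        for d in (+delta, -delta):
            a = np.zeros(3)
            a[m] = d
            actions.append(a)

    def mse(w):                                   # loss L, Equation (14)
        return np.mean((y - w @ preds) ** 2)

    Q = {}                                        # tabular Q function
    key = lambda w: tuple(np.round(w, 2))         # discretized state, Eq. (11)
    w = np.full(3, 1 / 3)                         # start from equal weights
    best_w, best_loss = w.copy(), mse(w)
    for _ in range(episodes):
        s, a_id = key(w), int(rng.integers(len(actions)))
        for _ in range(steps):
            L_t = mse(w)
            w_next = np.clip(w + actions[a_id], 0.0, 1.0)
            L_next = mse(w_next)
            # Reward of Equation (15): sign bonus plus relative improvement.
            R = (1.0 if L_next < L_t else -1.0) + (L_t - L_next) / L_next
            s_next = key(w_next)
            # epsilon-greedy choice of the NEXT action (on-policy, Eq. (13)).
            if rng.random() < eps:
                a_next = int(rng.integers(len(actions)))
            else:
                a_next = max(range(len(actions)),
                             key=lambda i: Q.get((s_next, i), 0.0))
            # SARSA update of the Q table, Equation (16).
            q = Q.get((s, a_id), 0.0)
            Q[(s, a_id)] = q + eta * (R + gamma * Q.get((s_next, a_next), 0.0) - q)
            if L_next < best_loss:
                best_loss, best_w = L_next, w_next.copy()
            w, s, a_id = w_next, s_next, a_next
    return best_w / best_w.sum()                  # normalize to sum to 1

# Example with synthetic forecasts from three noisy base predictors.
y = np.sin(np.linspace(0, 6, 200))
preds = np.stack([y + rng.normal(0, s, 200) for s in (0.05, 0.1, 0.2)])
print(sarsa_ensemble(preds, y))
```

Because SARSA is on-policy, the next action used in the update is the action actually selected by the ε-greedy policy, which is what distinguishes it from Q-learning.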

2.5. Error Correction Module

When confronted with highly nonlinear, non-stationary and stochastic crude oil price series, the ensemble module often fails to capture all potential patterns, leading to errors. The error series still contain valuable information, revealing predictive elements not fully captured by the ensemble module. The ECM can analyze and leverage these error series, thereby further improving prediction accuracy. A more accurate prediction can be obtained by overlaying the correction output from the ECM with the prediction from the ensemble module.
The ELM only requires random initialization of hidden layer weights during training, and then solves the output layer weights using least squares. This approach avoids the high computational costs associated with gradient-based iterative training. In error correction, this rapid training strategy effectively captures and fits the residual information. It also decreases the number of training iterations required and minimizes the need for extensive hyperparameter tuning. Therefore, the ELM is utilized to correct the ensemble module’s prediction errors. The solving steps of the ECM are as follows:
Step 1: The error series $E_1(t)$, the difference between the validation-set prediction $\hat{Y}_1(t)$ from Module 2 and the corresponding true value $Y_1(t)$, serves as the training set of the ELM:

$$E_1(t) = Y_1(t) - \hat{Y}_1(t) \tag{17}$$

Step 2: The error correction series $E_2(t)$ is produced by the trained ELM. The final crude oil price forecast is obtained by adding $E_2(t)$ to the test-set prediction $\hat{Y}_2(t)$ from Module 2:

$$Y_2(t) = \hat{Y}_2(t) + E_2(t) \tag{18}$$
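A minimal NumPy sketch of the ELM-based error correction: hidden weights are drawn randomly and only the output weights are solved by regularized least squares, after which the predicted residual is added back to the ensemble forecast, as in Equations (17) and (18). The lag length and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

class ELM:
    """Extreme learning machine: random hidden layer + least-squares readout."""
    def __init__(self, n_in, n_hidden=100, ridge=1e-3):
        self.W = rng.standard_normal((n_in, n_hidden))
        self.b = rng.standard_normal(n_hidden)
        self.ridge = ridge

    def _h(self, X):
        return np.maximum(X @ self.W + self.b, 0.0)   # ReLU activation

    def fit(self, X, y):
        H = self._h(X)
        # Regularized least squares: no gradient iterations needed.
        self.beta = np.linalg.solve(H.T @ H + self.ridge * np.eye(H.shape[1]),
                                    H.T @ y)
        return self

    def predict(self, X):
        return self._h(X) @ self.beta

# Error correction: learn validation residuals E1(t) from lagged residuals,
# then add the predicted residual E2(t) to the ensemble's test forecast.
e1 = rng.normal(0, 0.2, 300)                     # stand-in residual series
lag = 5
X = np.stack([e1[i:i + lag] for i in range(len(e1) - lag)])
y = e1[lag:]
elm = ELM(n_in=lag).fit(X, y)
ensemble_forecast = 70.0                         # stand-in ensemble output
corrected = ensemble_forecast + elm.predict(X[-1:])[0]   # Equation (18)
print(round(corrected, 4))
```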

3. Experiments

3.1. Crude Oil Price Datasets

The closing spot prices of West Texas Intermediate (WTI) crude oil at different time scales were selected to comprehensively analyze the forecasting performance of our proposed model. The data were obtained from the U.S. Energy Information Administration (EIA) and can be accessed through its official website: http://www.eia.gov/ (accessed on 26 December 2024). These spot prices are widely recognized as a benchmark for short-term oil market analysis and reflect the actual cash market value. The three datasets were the daily price data from 15 February 2013 to 5 November 2018, the weekly price data from 27 September 1992 to 20 June 2021, and the monthly price data from July 1998 to October 2021. Figure 2 shows the detailed volatility shape of the three datasets, and Table 2 presents statistical information on them. Data from different time scales (daily, weekly, and monthly) were selected to thoroughly evaluate the model’s forecasting performance.
For consistency in dataset length, 1500 observations were used for each of the first two datasets. However, the third dataset was monthly and had fewer available records, so 400 observations were chosen. Furthermore, datasets #1 and #2 exhibited different volatility shapes. All of this verified the forecasting performance and generalization ability of our proposed model for different types of price data.
In this paper, the raw price data were divided into three parts: training sets, validation sets, and test sets, in a proportion of 3:1:1 [41,42,43]. The training sets were used to train the base predictors, the validation sets were used to train the ensemble optimization method and the error correction method, and the test sets were used to evaluate forecasting performance. The dataset was relatively small. If the validation or test set is proportionally too large, the training data become insufficient, which diminishes the model’s learning capacity. Conversely, if the validation or test set is too small, the evaluation may be unreliable. Therefore, a 3:1:1 data split provides a suitable balance.
The forecasting process used a single-step-ahead prediction framework with a sliding window of size 5. For each new time step, the most recent five real observations were fed into the neural network to generate the next single-step forecast. Thus, the state of the model is updated using real data rather than predicted data, because each time the actual value becomes available, it replaces the oldest data in the sliding window. Furthermore, forecasting is performed step by step: after the network produces a single-step forecast at time $t+1$ based on the real data at $t-4, t-3, t-2, t-1, t$, the window slides by one step, and the newly observed real value at $t+1$ enters the window for forecasting $t+2$. This approach ensures that each prediction relies on the most up-to-date and accurate input sequence, thereby reducing the compounding of errors that can occur when predicted values are fed back into the model. A minimal sketch of this setup is given below.
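The sketch below illustrates the 3:1:1 split and the size-5 sliding-window construction described above, with helper names of our own:

```python
import numpy as np

def split_3_1_1(series):
    """Chronological 3:1:1 split into training, validation, and test sets."""
    n = len(series)
    i1, i2 = int(n * 3 / 5), int(n * 4 / 5)
    return series[:i1], series[i1:i2], series[i2:]

def sliding_windows(series, window=5):
    """Each sample: the last `window` real observations -> the next value."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

prices = np.cumsum(np.random.default_rng(3).normal(0, 1, 1500)) + 60
train, val, test = split_3_1_1(prices)
X_train, y_train = sliding_windows(train)
print(len(train), len(val), len(test), X_train.shape)  # 900 300 300 (895, 5)
```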
Training assumptions: It was assumed that the raw crude oil price data, although relatively small, provided sufficient coverage of various market conditions for training the proposed model. A 3:1:1 data split was chosen to balance learning capacity and evaluation reliability, ensuring that neither the training set nor the validation/test sets were too small. During hyperparameter tuning, the models were trained for a limited number of iterations due to computational constraints. Although the number of training iterations was restricted, it was assumed that this procedure still converged to a near-optimal set of parameters and hyperparameters. It was assumed that a single-step-ahead prediction approach with a sliding window of size 5 adequately captured temporal dependencies.
Validation assumptions: It was assumed that the validation set, drawn from the same overall distribution as the training set, represented typical variations in the crude oil market despite the dataset’s limited size. Validation was performed step by step, with each new real observation being slid into the model input window to reduce the accumulation of forecast errors. It was assumed that this method preserves the inherent time dependencies and ensures a realistic performance evaluation. It was likewise assumed that the 3:1:1 data split ratio struck a balance between sufficient training data and reliable validation.
All experiments were carried out on a computer equipped with a Core i7-9800X 3.8 GHz CPU, 16 GB of memory, and a single 2080Ti GPU. The deep learning framework was PyTorch 1.10.1. Table A1 in Appendix A lists the hyperparameters of the proposed model.
Additionally, because several hybrid models were used in this paper, they were renamed for easier tracking. The correspondence between these hybrid models and their new names is shown in Table 3.

3.2. Evaluation Indicators

Statistical indicators are important for analyzing forecasting accuracy. The MAE, MAPE, and root mean square error (RMSE) are used in this article. Moreover, to directly compare the forecasting precision of different methods, the promoting percentages of the MAE, MAPE, and RMSE ($P_{MAE}$, $P_{MAPE}$, and $P_{RMSE}$) are applied. The relevant equations are as follows:

$$MAE = \frac{1}{N}\sum_{t=1}^{N}\left| Y(t) - \hat{Y}(t) \right|, \quad MAPE = \frac{1}{N}\sum_{t=1}^{N}\left| \frac{Y(t) - \hat{Y}(t)}{Y(t)} \right|, \quad RMSE = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left( Y(t) - \hat{Y}(t) \right)^2} \tag{19}$$

$$P_{MAE} = \frac{MAE_a - MAE_b}{MAE_a}, \quad P_{MAPE} = \frac{MAPE_a - MAPE_b}{MAPE_a}, \quad P_{RMSE} = \frac{RMSE_a - RMSE_b}{RMSE_a} \tag{20}$$

where $Y(t)$ represents the true crude oil price at time $t$, $\hat{Y}(t)$ represents the predicted crude oil price at time $t$, and $N$ indicates the number of samples in $Y(t)$.
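These indicators translate directly into a few lines of NumPy; the function names below are our own:

```python
import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def mape(y, y_hat):
    return np.mean(np.abs((y - y_hat) / y)) * 100   # in percent

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def promoting_percentage(metric_a, metric_b):
    """Relative improvement of model b over model a, Equation (20)."""
    return (metric_a - metric_b) / metric_a

y = np.array([70.1, 71.3, 69.8])
y_hat = np.array([70.0, 71.0, 70.2])
print(mae(y, y_hat), mape(y, y_hat), rmse(y, y_hat))
```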

3.3. Experimental Evaluation of the Proposed Model

3.3.1. Comparison with Diverse Benchmark Models

To verify the superiority of the TCN, GRU, and ESN deep networks in crude oil price forecasting, we compared the three predictors with six other algorithms: the BPNN, RBFNN, GRNN, LSTM, CNN, and GMDH. Table 4 reports the evaluation indicators of the different predictors. Figure 3 shows the prediction errors of the different predictors on crude oil price series #1, and Figure 4 presents their forecasting results on series #1. The following conclusions can be drawn from Table 4, Figure 3 and Figure 4:
a.
The forecasting precision of the deep networks was superior to that of the other traditional algorithms in all cases. This shows that the deep networks can better identify the volatility shape of the raw data. The probable reason is that deep networks have rich hidden layers, which improves their ability to handle non-stationary data and mine deep information more effectively.
b.
The TCN, GRU, and ESN showed better prediction performance than the LSTM and CNN on three datasets. Possible reasons are as follows. First, the TCN incorporates dilated convolution and residual structures. These features enable a large receptive field in a relatively shallow network and maintain stable gradient propagation while capturing long-range dependencies. Second, the GRU employs a simplified gating mechanism that lowers model complexity. Hence, it achieves better training efficiency and faster convergence when handling random fluctuations and long-term dependencies. Third, the ESN utilizes a reservoir with randomly sparse connections, which helps capture dynamic features in crude oil price data more effectively. It also maintains robust performance under highly random conditions. In contrast, the LSTM and CNN have more complex structures or limited receptive fields. When they face high noise and strong non-stationarity, they often encounter unstable training or inadequate long-term dependence capture. Therefore, the TCN, GRU, and ESN offer stronger adaptability and prediction ability for non-stationary and random crude oil price data.

3.3.2. Comparison with Models Utilizing Diverse Ensemble Strategies

To verify that the ensemble method achieved optimal prediction accuracy, model 4 was compared with the TCN, GRU, and ESN. In addition, to fully verify the superiority of the RL-based ensemble strategy over traditional heuristic ensemble methods, SARSA was compared with several classical heuristic algorithms, including the BHA, GA, and GWO. Table 5 reports the evaluation indicators of the experimental models, Table 6 presents the promoting percentages of the ensemble learning methods, and Figure 5 shows the scatter of the prediction results for the RL and heuristic ensemble methods. The following conclusions can be drawn from Table 5 and Table 6, and Figure 5:
a.
The prediction errors of all ensemble methods were lower than those of the TCN, GRU, and ESN. Hybrid model 4 improved the accuracy of the single prediction algorithms by 4–12%. This shows that ensemble learning predicted the trend of crude oil price data more accurately than a single predictor. The possible reason is that the ensemble method made excellent optimization decisions on the ensemble weights according to the volatility of the crude oil price series. Hence, ensemble approaches could combine the strengths of different single models to reduce prediction errors.
b.
Compared with the other models, model 4 showed optimal forecasting precision. This shows that RL can optimize the ensemble weights more effectively than the traditional heuristic methods and improve the precision of the ensemble method. During optimal decision-making, the agent adjusted itself through continuous interaction with its surroundings, which makes RL more intelligent and the prediction results more accurate.

3.3.3. Comparison with Diverse Decomposition Methods

To fully evaluate the effectiveness of decomposition methods, we compared the forecasting results of model 4 with those of models employing different decomposition algorithms. In addition, we compared EWT with WPD and VMD to demonstrate the superiority of EWT. Table 7 reports the evaluation indicators of the experimental models, Table 8 shows the improvement in forecasting performance over model 4 achieved by the different decomposition methods, and Figure 6 compares the scatter of the points predicted by the experimental models. Table 9 shows MSE1, MSE2, and MSE3, the prediction results of each predictor on the decomposed data before the ensemble model is applied. From these experimental results, we can conclude:
a.
The prediction performance of the proposed model utilizing a decomposition algorithm is superior to that of the model without one. The forecasting performance of the models using decomposition is improved by more than 40%. This shows that decomposition methods greatly reduce the high fluctuation of the raw data and the prediction errors.
b.
In all experiments, the prediction errors of EWT were the lowest among the three decomposition algorithms. Compared with the two other decomposition algorithms, EWT could effectively decrease the nonlinearity of the raw data and enhance the forecasting performance. The main reason is that EWT adaptively decomposes the raw series into multiple subsequences, which enhances the ability of the ensemble method to analyze volatility characteristics.

3.3.4. Comparison with Models Using Error Correction Methods

To evaluate the validity of the ECM, we compared model 8 with the model applying the ECM. Table 10 shows the indicator results of the experimental methods, Table 11 presents the promoting percentages of the ECM, and Figure 7, Figure 8 and Figure 9 show the prediction results for the different series. From Table 10 and Table 11, and Figure 7, Figure 8 and Figure 9, the following conclusions can be drawn:
a.
The ECM can effectively correct the prediction residuals. On the three datasets, model 9 improved upon model 8 by more than 15% in terms of $P_{MAE}$, $P_{MAPE}$, and $P_{RMSE}$. This indicates that the ECM can effectively extract the predictable components hidden in the residuals and improve the forecasting performance.
b.
Each module of our proposed model can effectively reduce prediction errors. For example, the MAEs of the TCN, GRU, ESN, model 4, model 8, and model 9 were USD 0.7462, USD 0.7680, USD 0.8262, USD 0.7119, USD 0.1728, and USD 0.1239, respectively. This indicates that the RL-based ensemble method in model 4 can make the optimal weight decision when combining different predictors, realizing the complementary advantages of the base predictors. The decomposition method in model 8 reduces the non-stationarity and randomness of the raw data and improves the prediction accuracy. Model 9 corrects the prediction residuals produced by the first two modules, minimizing the prediction errors of our hybrid model.

3.4. Supplementary Experiments

Emergencies, including geopolitical conflicts, extreme weather, and financial crises, trigger sharp fluctuations in crude oil prices. These fluctuations often result in atypical trading behaviors and exert significant impacts on supply and demand. For forecasting models, such extreme conditions challenge their ability to learn from historical patterns and test their robustness and adaptability to abrupt changes. Therefore, this section specifically employs crude oil price data encompassing periods of geopolitical conflict. By evaluating the proposed model’s predictive accuracy under severe price fluctuations, we more effectively gauged its capacity to handle volatile conditions. To further verify the efficiency of the model, we selected advanced forecasting methods as baselines: Liu’s model [44], Mi’s model [39], and Huang’s model [45]. Table 12 presents the evaluation indicators of the experimental models. Figure 10 shows the original crude oil price data of the supplementary experiments, and Figure 11 shows the variation trend of the forecasting results. From these results, we can draw the following conclusions.
a.
In all cases, the forecasting precision of the hybrid model was better than that of the base predictors. Because the crude oil price data showed obvious chaos and fluctuation, it was difficult for a single model to accurately capture the changes in the crude oil price, especially at mutation points. Hence, it was necessary to adopt an effective hybrid model to achieve precise forecasting of the crude oil price.
b.
Compared with the other state-of-the-art models, the EWT-SARSA-TCN-GRU-ESN-ELM model (model 9) captured the changes at the mutation points more accurately. The main reason is that our model fully combines the advantages of the single algorithms, which makes it more robust and generalizable. EWT adaptively decomposed the original data into multiple sub-sequences to decrease the nonlinearity and randomness of the raw data. In addition, the RL-based ensemble method made the optimal weight decision and combined different predictors to achieve the complementary advantages of the base predictors. Moreover, the ECM effectively extracted the predictable components hidden in the residuals. In the context of the geopolitical conflict, the ECM focused on the leftover signals in the residuals that reflected abrupt changes in price trends. A more accurate prediction could be obtained by overlaying the correction output from the ECM on the prediction from the ensemble module. Hence, our proposed model has potential applications in crude oil price forecasting.

4. Application Analysis

4.1. Real-Time Adaptability

Deep learning and reinforcement learning are utilized in our model; hence, the training process was time-consuming. To further evaluate the proposed model, this section analyzes its real-time performance and computational efficiency. All experiments were carried out on a computer equipped with a Core i7-9800X 3.8 GHz CPU, 16 GB of memory, and a single 2080Ti GPU. Table 13 presents the computational efficiency of the different models. Analyzing the data in the table led to the following conclusions.
a.
Hybrid models require longer computation times than single models. This is primarily due to the hybrid models’ intricate structure, which prolongs computation compared with individual models. Moreover, using RL for ensemble optimization is slower than employing heuristic algorithms. This occurs because RL requires extensive training iterations during ensemble optimization to explore optimal strategies. During each decision-making step, RL not only evaluates the action values of the current state but also estimates the state and action values of the next step, thereby updating the current strategy. These procedures increase computational complexity. In contrast, heuristic algorithms optimize directly using rules or prior knowledge, requiring fewer iterations and thus achieving higher computational efficiency. However, the maximum time for RL was 105.36 s, which was considerably shorter than the data update interval (one day). Therefore, despite the longer computation time for RL, such durations remain acceptable given the superior forecasting performance achieved.
b.
A highly adaptive real-time model can rapidly respond to market fluctuations and promptly update forecast outcomes, thus providing investors and decision-makers with timely and accurate references. As shown in the accompanying table, the model’s computational time ranged from 663.46 s to 1297.93 s across the three datasets. Notably, the shortest dataset interval in this study was one day, so the model’s computation time is considerably shorter than the minimum data update interval. Consequently, the proposed model demonstrated robust real-time adaptability. Furthermore, the proposed model shows substantial potential for real-world applications, providing strong technical support for accurate crude oil price forecasting.

4.2. Application

Accurate and efficient crude oil price forecasting is essential for strategic planning across multiple stakeholders. Our proposed model shows significant potential for application in crude oil price forecasting. Similarly, this principle remains valid when confronting unforeseen contingencies in the crude oil market. For instance, supplementary experiments reveal that, even under pronounced price volatility triggered by geopolitical conflicts, the model sustained exceptional predictive accuracy. Consequently, the proposed forecasting framework allows energy companies to predict future price trends, manage price risks judiciously, maintain stable production costs, and ultimately improve profitability. Furthermore, because crude oil prices serve as a pivotal commodity in the futures market, their fluctuations substantially affect the stock market and other speculative investments. Timely adaptation to these variations can help the investment sector optimize asset allocation strategies and enhance capital efficiency. Lastly, accurate crude oil price forecasts offer vital insights for governmental policy-making, helping policymakers promptly address environmental pollution and energy scarcity.

5. Conclusions and Future Research

5.1. Conclusions

Efficient and accurate crude oil price forecasting contributes to global economic development. A new model using ensemble deep reinforcement learning with error correction is proposed in this article. The article’s conclusions are as follows. First, deep neural networks with multiple hidden layers effectively handle non-stationary data and extract complex information; among the base predictors, the TCN, GRU, and ESN achieved superior forecasting accuracy. Second, the ensemble method capitalized on the complementary strengths of the base predictors, improving overall forecasting performance. The RL agent learned the optimal strategy and automatically adjusted it according to the objective, enabling the system to maximize returns or fulfill predefined goals; hence, unlike heuristic algorithms, RL could autonomously learn and make decisions. Additionally, the decomposition method effectively mitigated the nonlinearity of the original series: EWT adaptively partitions the raw data into multiple subsequences, enhancing the forecasting accuracy of the SARSA-TCN-GRU-ESN model. Furthermore, employing the ECM enhanced the forecasting performance by extracting predictable components from the ensemble model’s residuals, thus reducing prediction errors. Lastly, the proposed model exhibited robust forecasting performance during emergencies. Compared with three state-of-the-art models, it more precisely captured volatility patterns at mutation points, underscoring its promising potential in crude oil price forecasting.

5.2. Future Research

Although the proposed model exhibits excellent forecasting performance for crude oil prices, some directions for further research remain. First, incorporating additional oil-related indicators for multivariate forecasting could enhance the model’s predictive capability. This strategy enriches quantitative relationships and more accurately captures complex supply–demand and risk patterns. Additionally, because deep learning models often function like a “black box”, interpretability analysis should be strengthened. For instance, weight visualizations, feature-importance analyses, or integrated explainable models could be employed. Such approaches will help researchers and decision-makers comprehend the model’s reasoning processes and evaluate its reliability and stability.

Author Contributions

Conceptualization, Y.X.; methodology, G.Z.; validation, G.Z.; visualization, G.Z. and Y.X.; writing—original draft, G.Z. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study is supported by the Postgraduate Scientific Research Innovation Project of Hunan Province (Funding number: CX20220280).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data were chosen from the US energy information administration (EIA). The data could be accessed through their official website: http://www.eia.gov/ (accessed on 26 December 2024).

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial intelligence
ARIMA: Autoregressive integrated moving average
BHA: Black-hole optimization
BPNN: Back propagation neural network
CEEMD: Complementary ensemble empirical mode decomposition
DNN: Deep neural network
ECM: Error correction method
EELM: Extended extreme learning machine
ELM: Extreme learning machine
EEMD: Ensemble empirical mode decomposition
ESN: Echo state network
EWT: Empirical wavelet transform
GA: Genetic algorithm
GARCH: Generalized autoregressive conditional heteroskedasticity
GM: Gray model
GRNN: Generalized regression neural network
GWO: Gray wolf optimization
GRU: Gated recurrent unit
HA: Historical average
IMF: Intrinsic mode function
LSSVM: Least squares support vector machine
LSTM: Long short-term memory network
MAE: Mean absolute error
MAPE: Mean absolute percentage error
NAR: Nonlinear auto regressive
OLS: Ordinary least squares
PSO: Particle swarm optimization
RL: Reinforcement learning
RMSE: Root mean square error
RNN: Recurrent neural network
SARSA: State action reward state action
TCN: Temporal convolutional network
VMD: Variational modal decomposition
WPD: Wavelet packet decomposition

Appendix A

Table A1. The experimental parameters for the proposed model.

| Stage | Model/Algorithm | Parameter | Value |
|---|---|---|---|
| Data preprocessing | - | Split ratio | 3:1:1 |
| | | Sliding window | 5 |
| Decomposition | EWT | Detection method | scalespace |
| | | Degree for the polynomial interpolation | 6 |
| | | Maximum number of bands | 25 |
| | | Sampling rate | 1 |
| | | Filter width | 10 |
| Forecasting | ESN | Reservoir size | 400 |
| | | Spectral radius | 0.95 |
| | | Input scaling | 0.5 |
| | | Reservoir connectivity | 0.1 |
| | TCN | Size of epochs | 100 |
| | | Size of layers | 4 |
| | | Kernel size | 5 |
| | | Dropout | 0.15 |
| | | Batch size | 16 |
| | | Learning rate | 0.01 |
| | GRU | Size of epochs | 100 |
| | | Size of layers | 3 |
| | | Batch size | 16 |
| | | Size of hidden units | 100 |
| | | Learning rate | 0.01 |
| Ensemble | SARSA | Maximum number of episodes | 100 |
| | | Maximum step of each episode | 100 |
| | | Learning rate | 0.3 |
| | | Discount coefficient | 0.95 |
| Error correction | ELM | Size of hidden neurons | 500 |
| | | Activation function | ReLU |

References

  1. Kaymak, Ö.Ö.; Kaymak, Y. Prediction of crude oil prices in COVID-19 outbreak using real data. Chaos Solitons Fractals 2022, 158, 111990. [Google Scholar] [CrossRef] [PubMed]
  2. Miao, H.; Ramchander, S.; Wang, T.; Yang, D. Influential factors in crude oil price forecasting. Energy Econ. 2017, 68, 77–88. [Google Scholar] [CrossRef]
  3. Bisoi, R.; Dash, P.; Mishra, S. Modes decomposition method in fusion with robust random vector functional link network for crude oil price forecasting. Appl. Soft Comput. 2019, 80, 475–493. [Google Scholar] [CrossRef]
  4. Wang, D.; Luo, H.; Grunder, O.; Lin, Y.; Guo, H. Multi-step ahead electricity price forecasting using a hybrid model based on two-layer decomposition technique and BP neural network optimized by firefly algorithm. Appl. Energy 2017, 190, 390–407. [Google Scholar] [CrossRef]
  5. Tang, L.; Wu, Y.; Yu, L. A non-iterative decomposition-ensemble learning paradigm using RVFL network for crude oil price forecasting. Appl. Soft Comput. 2018, 70, 1097–1108. [Google Scholar] [CrossRef]
  6. Zhang, G.P. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 2003, 50, 159–175. [Google Scholar] [CrossRef]
  7. Wang, Q.; Song, X. Forecasting China’s oil consumption: A comparison of novel nonlinear-dynamic grey model (GM), linear GM, nonlinear GM and metabolism GM. Energy 2019, 183, 160–171. [Google Scholar] [CrossRef]
  8. Zhou, X.; Zhou, H.; Long, H. Forecasting the equity premium: Do deep neural network models work? Mod. Financ. 2023, 1, 1–11. [Google Scholar] [CrossRef]
  9. Chen, J.; Wei, X.; Liu, Y.; Zhao, C.; Liu, Z.; Bao, Z. Deep Learning for Water Quality Prediction—A Case Study of the Huangyang Reservoir. Appl. Sci. 2024, 14, 8755. [Google Scholar] [CrossRef]
  10. Chen, L.; Pelger, M.; Zhu, J. Deep learning in asset pricing. Manag. Sci. 2024, 70, 714–750. [Google Scholar] [CrossRef]
  11. Fan, X.; Wang, R.; Yang, Y.; Wang, J. Transformer–BiLSTM Fusion Neural Network for Short-Term PV Output Prediction Based on NRBO Algorithm and VMD. Appl. Sci. 2024, 14, 11991. [Google Scholar] [CrossRef]
  12. Cen, Z.; Wang, J. Crude oil price prediction model with long short term memory deep learning based on prior knowledge data transfer. Energy 2019, 169, 160–171. [Google Scholar] [CrossRef]
  13. Busari, G.A.; Lim, D.H. Crude oil price prediction: A comparison between AdaBoost-LSTM and AdaBoost-GRU for improving forecasting performance. Comput. Chem. Eng. 2021, 155, 107513. [Google Scholar] [CrossRef]
  14. Lytvynenko, V.; Wojcik, W.; Fefelov, A.; Lurie, I.; Savina, N.; Voronenko, M.; Boskin, O.; Smailova, S. Hybrid methods of GMDH-neural networks synthesis and training for solving problems of time series forecasting. In Lecture Notes in Computational Intelligence and Decision Making: Proceedings of the XV International Scientific Conference “Intellectual Systems of Decision Making and Problems of Computational Intelligence” (ISDMCI’2019), Salisnyj Port, Ukraine, 21–25 May 2019; pp. 513–531. [Google Scholar]
  15. Sobolewski, Ł.; Miczulski, W. Methods of constructing time series for predicting local time scales by means of a GMDH-type neural network. Appl. Sci. 2021, 11, 5615. [Google Scholar] [CrossRef]
  16. Foroutan, P.; Lahmiri, S. Deep learning systems for forecasting the prices of crude oil and precious metals. Financ. Innov. 2024, 10, 111. [Google Scholar] [CrossRef]
  17. Li, J.; Zhu, S.; Wu, Q. Monthly crude oil spot price forecasting using variational mode decomposition. Energy Econ. 2019, 83, 240–253. [Google Scholar] [CrossRef]
  18. Wang, J.; Zhang, T.; Lu, T.; Xue, Z. A hybrid forecast model of EEMD-CNN-ILSTM for crude oil futures price. Electronics 2023, 12, 2521. [Google Scholar] [CrossRef]
  19. Liu, W.; Wang, C.; Li, Y.; Liu, Y.; Huang, K. Ensemble forecasting for product futures prices using variational mode decomposition and artificial neural networks. Chaos Solitons Fractals 2021, 146, 110822. [Google Scholar] [CrossRef]
  20. Lin, H.; Sun, Q. Crude oil prices forecasting: An approach of using CEEMDAN-based multi-layer gated recurrent unit networks. Energies 2020, 13, 1543. [Google Scholar] [CrossRef]
  21. Zhou, Y.; Wang, J.; Lu, H.; Zhao, W. Short-term wind power prediction optimized by multi-objective dragonfly algorithm based on variational mode decomposition. Chaos Solitons Fractals 2022, 157, 111982. [Google Scholar] [CrossRef]
  22. Zeng, L.; Ling, L.; Zhang, D.; Jiang, W. Optimal forecast combination based on PSO-CS approach for daily agricultural future prices forecasting. Appl. Soft Comput. 2023, 132, 109833. [Google Scholar] [CrossRef]
  23. Qu, Z.; Li, Y.; Jiang, X.; Niu, C. An innovative ensemble model based on multiple neural networks and a novel heuristic optimization algorithm for COVID-19 forecasting. Expert Syst. Appl. 2023, 212, 118746. [Google Scholar] [CrossRef]
  24. Alruqimi, M.; Di Persio, L. Multistep Brent oil price forecasting with a multi-aspect meta-heuristic optimization and ensemble deep learning model. Energy Inform. 2024, 7, 130. [Google Scholar] [CrossRef]
  25. Ding, M.; Zhou, H.; Xie, H.; Wu, M.; Nakanishi, Y.; Yokoyama, R. A gated recurrent unit neural networks based wind speed error correction model for short-term wind power forecasting. Neurocomputing 2019, 365, 54–61. [Google Scholar] [CrossRef]
  26. Gilles, J. Empirical wavelet transform. IEEE Trans. Signal Process. 2013, 61, 3999–4010. [Google Scholar] [CrossRef]
  27. Hong, S.; Zhou, Z.; Zio, E.; Wang, W. An adaptive method for health trend prediction of rotating bearings. Digit. Signal Process. 2014, 35, 117–123. [Google Scholar] [CrossRef]
  28. Kong, Y.; Wang, T.; Chu, F. Meshing frequency modulation assisted empirical wavelet transform for fault diagnosis of wind turbine planetary ring gear. Renew. Energy 2019, 132, 1373–1388. [Google Scholar] [CrossRef]
  29. Bhattacharyya, A.; Singh, L.; Pachori, R.B. Fourier–Bessel series expansion based empirical wavelet transform for analysis of non-stationary signals. Digit. Signal Process. 2018, 78, 185–196. [Google Scholar] [CrossRef]
  30. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint 2018, arXiv:1803.01271. [Google Scholar]
  31. Wang, Y.; Chen, J.; Chen, X.; Zeng, X.; Kong, Y.; Sun, S.; Guo, Y.; Liu, Y. Short-term load forecasting for industrial customers based on TCN-LightGBM. IEEE Trans. Power Syst. 2020, 36, 1984–1997. [Google Scholar] [CrossRef]
  32. Wang, J.; Chen, Y.; Zhu, S.; Xu, W. Depth feature extraction-based deep ensemble learning framework for high frequency futures price forecasting. Digit. Signal Process. 2022, 127, 103567. [Google Scholar] [CrossRef]
  33. Ke, K.; Hongbin, S.; Chengkang, Z.; Brown, C. Short-term electrical load forecasting method based on stacked auto-encoding and GRU neural network. Evol. Intell. 2019, 12, 385–394. [Google Scholar] [CrossRef]
  34. Wang, B.; Wang, J. Energy futures and spots prices forecasting by hybrid SW-GRU with EMD and error evaluation. Energy Econ. 2020, 90, 104827. [Google Scholar] [CrossRef]
  35. Qin, L.; Li, W.; Li, S. Effective passenger flow forecasting using STL and ESN based on two improvement strategies. Neurocomputing 2019, 356, 244–256. [Google Scholar] [CrossRef]
  36. Wang, L.; Hu, H.; Ai, X.-Y.; Liu, H. Effective electricity energy consumption forecasting using echo state network improved by differential evolution algorithm. Energy 2018, 153, 801–815. [Google Scholar] [CrossRef]
  37. Li, B.; Wu, G.; He, Y.; Fan, M.; Pedrycz, W. An Overview and Experimental Study of Learning-Based Optimization Algorithms for the Vehicle Routing Problem. IEEE/CAA J. Autom. Sin. 2022, 9, 1115–1138. [Google Scholar] [CrossRef]
  38. Kiran, B.R.; Sobh, I.; Talpaert, V.; Mannion, P.; Al Sallab, A.A.; Yogamani, S.; Pérez, P. Deep reinforcement learning for autonomous driving: A survey. IEEE Trans. Intell. Transp. Syst. 2021, 23, 4909–4926. [Google Scholar] [CrossRef]
  39. Mi, X.; Liu, H.; Li, Y. Wind speed prediction model using singular spectrum analysis, empirical mode decomposition and convolutional support vector machine. Energy Convers. Manag. 2019, 180, 196–205. [Google Scholar] [CrossRef]
  40. Rummery, G.A.; Niranjan, M. On-Line Q-Learning Using Connectionist Systems; University of Cambridge: Cambridge, UK, 1994; Volume 37. [Google Scholar]
  41. Wang, L.; He, Y.; Liu, X.; Li, L.; Shao, K. M2TNet: Multi-modal multi-task Transformer network for ultra-short-term wind power multi-step forecasting. Energy Rep. 2022, 8, 7628–7642. [Google Scholar] [CrossRef]
  42. Song, W.; Fujimura, S. Capturing combination patterns of long-and short-term dependencies in multivariate time series forecasting. Neurocomputing 2021, 464, 72–82. [Google Scholar] [CrossRef]
  43. Hae, H.; Kang, S.-J.; Kim, T.O.; Lee, P.H.; Lee, S.-W.; Kim, Y.-H.; Lee, C.W.; Park, S.-W. Machine Learning-Based prediction of Post-Treatment ambulatory blood pressure in patients with hypertension. Blood Press. 2023, 32, 2209674. [Google Scholar] [CrossRef] [PubMed]
  44. Liu, H.; Yu, C.; Yu, C.; Chen, C.; Wu, H. A novel axle temperature forecasting method based on decomposition, reinforcement learning optimization and neural network. Adv. Eng. Inform. 2020, 44, 101089. [Google Scholar] [CrossRef]
  45. Huang, Y.; Deng, Y. A new crude oil price forecasting model based on variational mode decomposition. Knowl.-Based Syst. 2021, 213, 106669. [Google Scholar] [CrossRef]
Figure 1. Structure of the hybrid model.
Figure 2. Raw crude oil price data and their split.
Figure 3. Prediction errors for different predictors across the WTI crude oil price series #1.
Figure 4. Prediction results for different predictors across the WTI crude oil price series #1.
Figure 5. Scatter of the prediction results for SARSA and different heuristic ensemble methods.
Figure 6. Scatter of the prediction results for different decomposition methods.
Figure 7. Prediction results for experimental models across the WTI crude oil price series #1.
Figure 8. Prediction results for experimental models across the WTI crude oil price series #2.
Figure 9. Prediction results for experimental models across the WTI crude oil price series #3.
Figure 10. Raw crude oil price series during geopolitical conflicts.
Figure 11. Prediction results for experimental models.
Table 1. The literature review for different models.

| Methods | Refs. | Models | Year | Advantages | Drawbacks |
|---|---|---|---|---|---|
| Traditional methods | [6] | ARIMA | 2003 | a. Simple framework; b. Quick modeling. | a. Limited nonlinearity capture; b. Requires stationarity. |
| | [7] | GM | 2019 | | |
| Deep learning methods | [12] | LSTM | 2019 | a. Captures complex patterns; b. Learns hierarchical features; c. Adapts to nonlinearity. | a. Single-model accuracy constraints. |
| | [13] | GRU | 2021 | | |
| | [8] | DNN | 2024 | | |
| | [15] | GMDH | 2021 | | |
| | [16] | TCN | 2024 | | |
| Hybrid methods | [18] | EEMD-CNN-ILSTM | 2023 | a. Reduces data complexity through decomposition; b. Adapts to diverse patterns; c. Leverages multiple predictors’ strengths. | a. Suboptimal usage of decomposed data; b. Heuristic methods risk local optima. |
| | [19] | VMD-ANN | 2021 | | |
| | [20] | CEEMD-ML-GRU | 2020 | | |
| | [22] | PSO-based ensemble | 2023 | | |
| | [23] | SCWOA-based ensemble | 2023 | | |
| | [24] | GWO-based ensemble | 2024 | | |
Table 2. Statistical indicators of crude oil price data.

| Crude Oil Price Data | Dataset #1 (USD) | Dataset #2 (USD) | Dataset #3 (USD) |
|---|---|---|---|
| Minimum | 26.21 | 10.79 | 11.22 |
| Mean | 66.05 | 50.21 | 46.86 |
| Maximum | 110.53 | 145.29 | 140.00 |
| Standard deviation | 22.83 | 28.92 | 28.87 |
Table 3. The descriptions for different hybrid methods.

| Name | Description |
|---|---|
| Model 1 | GA-TCN-GRU-ESN |
| Model 2 | BHA-TCN-GRU-ESN |
| Model 3 | GWO-TCN-GRU-ESN |
| Model 4 | SARSA-TCN-GRU-ESN |
| Model 5 | PSO-TCN-GRU-ESN |
| Model 6 | WPD-SARSA-TCN-GRU-ESN |
| Model 7 | VMD-SARSA-TCN-GRU-ESN |
| Model 8 | EWT-SARSA-TCN-GRU-ESN |
| Model 9 | EWT-SARSA-TCN-GRU-ESN-ELM |
Table 4. Evaluation indicators for different predictors across three WTI crude oil price series.

| Series | Predictors | MAE (USD) | MAPE (%) | RMSE (USD) |
|---|---|---|---|---|
| #1 | TCN | 0.7462 | 1.1634 | 1.0092 |
| | GRU | 0.7680 | 1.1910 | 1.0048 |
| | ESN | 0.8262 | 1.2775 | 1.0821 |
| | BPNN | 2.7229 | 4.3405 | 2.9994 |
| | GRNN | 3.6410 | 5.6739 | 4.0624 |
| | RBFNN | 1.3925 | 2.1287 | 1.8051 |
| | LSTM | 0.9536 | 1.4394 | 1.1935 |
| | CNN | 0.8698 | 1.3729 | 1.1409 |
| | GMDH | 1.1592 | 1.8026 | 1.4410 |
| #2 | TCN | 2.2485 | 4.7553 | 3.2635 |
| | GRU | 2.1264 | 4.6699 | 2.8677 |
| | ESN | 2.1818 | 4.7152 | 2.9101 |
| | BPNN | 2.8466 | 6.3487 | 3.6113 |
| | GRNN | 3.5657 | 7.9179 | 4.6912 |
| | RBFNN | 3.5109 | 7.8512 | 4.8379 |
| | LSTM | 2.3411 | 5.0723 | 3.1779 |
| | CNN | 2.3602 | 5.1736 | 3.2895 |
| | GMDH | 2.9150 | 6.5246 | 3.8918 |
| #3 | TCN | 5.6431 | 12.0686 | 7.7571 |
| | GRU | 6.0148 | 13.9647 | 8.5027 |
| | ESN | 5.5273 | 12.0189 | 7.4268 |
| | BPNN | 7.1119 | 14.7113 | 9.2021 |
| | GRNN | 7.7206 | 18.4653 | 10.3241 |
| | RBFNN | 7.3445 | 18.0054 | 10.1406 |
| | LSTM | 6.0466 | 14.1694 | 8.4556 |
| | CNN | 6.2123 | 14.1316 | 8.6501 |
| | GMDH | 7.6269 | 18.3832 | 11.0227 |
Note. MAE (mean absolute error), MAPE (mean absolute percentage error), and RMSE (root mean square error) measure predictive accuracy. Lower values indicate better performance. Series #1, Series #2, and Series #3 refer to three WTI crude oil price datasets with different time spans.
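For reference, these indicators follow their standard definitions (a sketch of the usual formulas, with $y_t$ the observed price, $\hat{y}_t$ the forecast, and $N$ the number of test samples):

$$
\mathrm{MAE}=\frac{1}{N}\sum_{t=1}^{N}\lvert y_t-\hat{y}_t\rvert,\qquad
\mathrm{MAPE}=\frac{100\%}{N}\sum_{t=1}^{N}\left\lvert\frac{y_t-\hat{y}_t}{y_t}\right\rvert,\qquad
\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(y_t-\hat{y}_t\right)^2}.
$$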
Table 5. Evaluation indicators for different ensemble methods across three WTI crude oil price series.

| Series | Models | Ensemble Methods | MAE (USD) | MAPE (%) | RMSE (USD) |
|---|---|---|---|---|---|
| #1 | Model 1 | GA | 0.7327 | 1.1419 | 0.9950 |
| | Model 2 | BHA | 0.7162 | 1.1142 | 0.9739 |
| | Model 3 | GWO | 0.7218 | 1.1239 | 0.9826 |
| | Model 4 | SARSA | 0.7119 | 1.1082 | 0.9691 |
| #2 | Model 1 | GA | 2.0895 | 4.4896 | 2.7733 |
| | Model 2 | BHA | 2.1079 | 4.5164 | 2.8199 |
| | Model 3 | GWO | 2.0513 | 4.4849 | 2.7391 |
| | Model 4 | SARSA | 2.0144 | 4.3907 | 2.6651 |
| #3 | Model 1 | GA | 5.3891 | 11.8152 | 7.1542 |
| | Model 2 | BHA | 5.4202 | 12.0044 | 7.1597 |
| | Model 3 | GWO | 5.4539 | 11.8065 | 7.2562 |
| | Model 4 | SARSA | 5.2533 | 11.8783 | 6.9218 |
Note. Model 1 = GA-TCN-GRU-ESN; Model 2 = BHA-TCN-GRU-ESN; Model 3 = GWO-TCN-GRU-ESN; Model 4 = SARSA-TCN-GRU-ESN. Series #1, Series #2, and Series #3 refer to three WTI crude oil price datasets with different time spans.
Table 6. Promoting percentages for ensemble methods.

| Models | Indices | Series #1 | Series #2 | Series #3 |
|---|---|---|---|---|
| Model 4 vs. TCN | PMAE (%) | 4.5966 | 10.4114 | 6.9076 |
| | PMAPE (%) | 4.7445 | 7.6672 | 1.5768 |
| | PRMSE (%) | 3.9734 | 18.3111 | 10.7682 |
| Model 4 vs. GRU | PMAE (%) | 7.3047 | 5.2671 | 12.6604 |
| | PMAPE (%) | 6.9521 | 5.9787 | 14.9405 |
| | PRMSE (%) | 3.5529 | 7.0649 | 18.5929 |
| Model 4 vs. ESN | PMAE (%) | 11.8343 | 7.6726 | 4.9572 |
| | PMAPE (%) | 13.2524 | 6.8820 | 1.1698 |
| | PRMSE (%) | 10.4427 | 7.0649 | 6.7997 |
Note. Model 4 = SARSA-TCN-GRU-ESN. This table shows the promoting percentages of evaluation indicators for the SARSA-based ensemble compared to different single prediction methods. Series #1, Series #2, and Series #3 refer to three WTI crude oil price datasets with different time spans.
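The reported promoting percentages are consistent with the usual relative-improvement form; for an indicator $I$ (MAE, MAPE, or RMSE),

$$
P_{I}=\frac{I_{\text{baseline}}-I_{\text{proposed}}}{I_{\text{baseline}}}\times 100\%.
$$

As a worked check against Table 4 and Table 6: for Series #1, the PMAE of Model 4 vs. TCN is $(0.7462-0.7119)/0.7462\times 100\% \approx 4.5966\%$, matching the tabulated value.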
Table 7. Evaluation indicators for different decomposition methods across three WTI crude oil price series.

| Series | Models | Decomposition Methods | MAE (USD) | MAPE (%) | RMSE (USD) |
|---|---|---|---|---|---|
| #1 | Model 4 | - | 0.7119 | 1.1082 | 0.9691 |
| | Model 6 | WPD | 0.2239 | 0.3538 | 0.2671 |
| | Model 7 | VMD | 0.4131 | 0.6228 | 0.5415 |
| | Model 8 | EWT | 0.1728 | 0.2640 | 0.2207 |
| #2 | Model 4 | - | 2.0144 | 4.3907 | 2.6651 |
| | Model 6 | WPD | 0.6911 | 1.5565 | 0.9437 |
| | Model 7 | VMD | 1.1031 | 2.4394 | 1.3172 |
| | Model 8 | EWT | 0.3387 | 0.7625 | 0.4701 |
| #3 | Model 4 | - | 5.2533 | 11.8783 | 6.9218 |
| | Model 6 | WPD | 1.7792 | 4.2214 | 2.4301 |
| | Model 7 | VMD | 3.2493 | 6.6851 | 3.6351 |
| | Model 8 | EWT | 1.6131 | 3.5695 | 1.9352 |
Note. Model 4 = SARSA-TCN-GRU-ESN; Model 6 = WPD-SARSA-TCN-GRU-ESN; Model 7 = VMD-SARSA-TCN-GRU-ESN; Model 8 = EWT-SARSA-TCN-GRU-ESN. Series #1, Series #2, and Series #3 refer to three WTI crude oil price datasets with different time spans.
Table 8. Promoting percentages for decomposition algorithms.

| Models | Indices | Series #1 | Series #2 | Series #3 |
|---|---|---|---|---|
| Model 6 vs. Model 4 | PMAE (%) | 68.5490 | 65.6920 | 66.1318 |
| | PMAPE (%) | 68.0744 | 64.5501 | 64.4612 |
| | PRMSE (%) | 72.4383 | 64.5904 | 64.8920 |
| Model 7 vs. Model 4 | PMAE (%) | 41.9722 | 45.2392 | 38.1474 |
| | PMAPE (%) | 43.8008 | 44.4417 | 43.7201 |
| | PRMSE (%) | 44.1234 | 50.5760 | 47.4833 |
| Model 8 vs. Model 4 | PMAE (%) | 75.7269 | 83.1861 | 69.2936 |
| | PMAPE (%) | 76.1776 | 82.6337 | 69.9494 |
| | PRMSE (%) | 77.2263 | 82.3609 | 72.0420 |
Note. Model 4 = SARSA-TCN-GRU-ESN; Model 6 = WPD-SARSA-TCN-GRU-ESN; Model 7 = VMD-SARSA-TCN-GRU-ESN; Model 8 = EWT-SARSA-TCN-GRU-ESN. This table shows the promoting percentages of evaluation indicators for different decomposition methods compared to the SARSA-based ensemble. Series #1, Series #2, and Series #3 refer to three WTI crude oil price datasets with different time spans.
Table 9. MSE with its components MSE1, MSE2, MSE3.

| Series | Decomposition Methods | MSE1 (TCN) | MSE2 (GRU) | MSE3 (ESN) |
|---|---|---|---|---|
| #1 | WPD | 0.1353 | 0.1619 | 0.1527 |
| | VMD | 0.3731 | 0.4492 | 0.4913 |
| | EWT | 0.0975 | 0.1013 | 0.1434 |
| #2 | WPD | 1.1731 | 1.0752 | 1.5963 |
| | VMD | 2.3832 | 1.8923 | 2.4435 |
| | EWT | 0.9816 | 0.4352 | 0.7369 |
| #3 | WPD | 7.1536 | 6.7949 | 6.8753 |
| | VMD | 15.4031 | 16.7098 | 15.2039 |
| | EWT | 5.0524 | 4.1983 | 4.7721 |
Note. MSE1, MSE2, and MSE3 are the mean square errors of the TCN, GRU, and ESN predictors, respectively, on the decomposed data before the ensemble model is applied.
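Each component is assumed to follow the standard mean square error (a sketch, using the same notation as above, where $\hat{y}_t^{(i)}$ denotes the forecast of predictor $i$ on the decomposed data):

$$
\mathrm{MSE}_i=\frac{1}{N}\sum_{t=1}^{N}\bigl(y_t-\hat{y}_t^{(i)}\bigr)^2,\qquad i\in\{1,2,3\}.
$$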
Table 10. Evaluation indicators for experimental models across three WTI crude oil price series.

| Series | Models | MAE (USD) | MAPE (%) | RMSE (USD) |
|---|---|---|---|---|
| #1 | TCN | 0.7462 | 1.1634 | 1.0092 |
| | GRU | 0.7680 | 1.1910 | 1.0048 |
| | ESN | 0.8262 | 1.2775 | 1.0821 |
| | Model 4 | 0.7119 | 1.1082 | 0.9691 |
| | Model 8 | 0.1728 | 0.2640 | 0.2207 |
| | Model 9 | 0.1239 | 0.1914 | 0.1582 |
| #2 | TCN | 2.2485 | 4.7553 | 3.2635 |
| | GRU | 2.1264 | 4.6699 | 2.8677 |
| | ESN | 2.1818 | 4.7152 | 2.9101 |
| | Model 4 | 2.0144 | 4.3907 | 2.6651 |
| | Model 8 | 0.3387 | 0.7625 | 0.4701 |
| | Model 9 | 0.2529 | 0.5493 | 0.3375 |
| #3 | TCN | 5.6431 | 12.0686 | 7.7571 |
| | GRU | 6.0148 | 13.9647 | 8.5027 |
| | ESN | 5.5273 | 12.0189 | 7.4268 |
| | Model 4 | 5.2533 | 11.8783 | 6.9218 |
| | Model 8 | 1.6131 | 3.5695 | 1.9352 |
| | Model 9 | 1.3224 | 2.7004 | 1.6415 |
Note. Model 4 = SARSA-TCN-GRU-ESN; Model 8 = EWT-SARSA-TCN-GRU-ESN; Model 9 = EWT-SARSA-TCN-GRU-ESN-ELM. Series #1, Series #2, and Series #3 refer to three WTI crude oil price datasets with different time spans.
Table 11. Promoting percentages for the ECM.

| Models | Indices | Series #1 | Series #2 | Series #3 |
|---|---|---|---|---|
| Model 9 vs. Model 8 | PMAE (%) | 28.2986 | 25.3322 | 18.0212 |
| | PMAPE (%) | 27.5000 | 27.9607 | 24.3479 |
| | PRMSE (%) | 28.3190 | 28.2068 | 15.1767 |
Note. Model 8 = EWT-SARSA-TCN-GRU-ESN; Model 9 = EWT-SARSA-TCN-GRU-ESN-ELM. This table shows the promoting percentages of evaluation indicators for the model with the error correction module relative to the model without it. Series #1, Series #2, and Series #3 refer to three WTI crude oil price datasets with different time spans.
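To make the evaluation pipeline concrete, the following is a minimal sketch of how these indicators and promoting percentages can be computed (assuming NumPy arrays of observed and predicted prices; the function names are ours for illustration, not from the authors' implementation):

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean absolute error, in the same unit as the series (USD here).
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    # Mean absolute percentage error (%); assumes strictly positive prices.
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0

def rmse(y_true, y_pred):
    # Root mean square error, in USD.
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def promoting_percentage(baseline, improved):
    # Relative improvement of the improved model over the baseline (%).
    return (baseline - improved) / baseline * 100.0

# Worked check against Table 11 (Series #1, PMAE, Model 9 vs. Model 8):
# promoting_percentage(0.1728, 0.1239) evaluates to approximately 28.2986.
print(promoting_percentage(0.1728, 0.1239))
```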
Table 12. Evaluation indicators for experimental models.

| Models | MAE (USD) | MAPE (%) | RMSE (USD) |
|---|---|---|---|
| TCN | 2.3003 | 2.8941 | 3.3531 |
| GRU | 2.9344 | 3.4514 | 5.1989 |
| ESN | 2.4249 | 2.9076 | 4.4565 |
| Liu’s model | 0.4926 | 0.5575 | 1.1428 |
| Mi’s model | 0.5827 | 0.6596 | 1.3803 |
| Huang’s model | 0.4270 | 0.5446 | 0.7404 |
| Model 9 | 0.3510 | 0.4420 | 0.5818 |
Table 13. Computation time for different models.

| Forecasting Models | Training Time, Series #1 (s) | Training Time, Series #2 (s) | Training Time, Series #3 (s) |
|---|---|---|---|
| TCN | 45.32 | 47.81 | 26.93 |
| GRU | 19.21 | 18.71 | 7.34 |
| ESN | 8.13 | 7.25 | 4.15 |
| PSO-TCN-GRU-ESN | 87.75 | 91.54 | 53.51 |
| GA-TCN-GRU-ESN | 79.82 | 81.61 | 45.32 |
| SARSA-TCN-GRU-ESN | 100.23 | 105.36 | 58.86 |
| EWT-SARSA-TCN-GRU-ESN | 1185.91 | 1290.49 | 659.19 |
| EWT-SARSA-TCN-GRU-ESN-ELM | 1194.16 | 1297.93 | 663.46 |