Article

Crude Oil Price Forecasting Model Based on Neural Networks and Error Correction

Key Laboratory of Traffic Safety on Track of Ministry of Education, School of Traffic & Transportation Engineering, Central South University, Changsha 410075, China
*
Author to whom correspondence should be addressed.
Appl. Sci. 2025, 15(3), 1055; https://doi.org/10.3390/app15031055
Submission received: 3 December 2024 / Revised: 16 January 2025 / Accepted: 18 January 2025 / Published: 21 January 2025
(This article belongs to the Section Energy Science and Technology)

Abstract

Crude oil price forecasting contributes to global economic development. This study proposes a hybrid deep learning model for crude oil price forecasting. First, the empirical wavelet transform decomposes the raw data into multiple subsequences. Then, three neural networks generate preliminary forecasts, which are subsequently combined by a reinforcement learning-based ensemble method. Finally, an error correction module models the residuals, further enhancing the forecasting outcomes. Three West Texas Intermediate datasets and additional emergency scenarios were used to validate the hybrid model. The findings indicate that the proposed model achieves superior predictive performance compared with sixteen benchmark methods and three advanced models.

1. Introduction

Crude oil is a key strategic resource for social and economic development. It remains important in global industrial production, even though alternative energy sources exist [1]. In addition, crude oil price volatility has an important impact on economic growth, securities markets, and national security. Efficient and accurate crude oil price forecasting contributes to global economic development. However, the price is affected by many factors, including geopolitical conflicts, oil production, and the pace of economic development [2]. Hence, crude oil price data are highly nonlinear, non-stationary, and stochastic. To forecast the crude oil price precisely, researchers have made many contributions [3].
Common crude oil price forecasting models include causal relationship forecasting models, statistical models, machine learning models, and hybrid forecasting models [4]. The causal relationship model selects several independent variables to predict future values, but it requires many data sources, and its use is limited by complicated influencing factors, high research costs, and interactions between the independent variables [5]. The current international situation changes quickly, and slow model updates make the causal relationship model less suited to short-term forecasting.
Statistical models, in contrast, have simpler structures and meet short-term speed requirements. Typical statistical methods include the autoregressive integrated moving average (ARIMA) model [6] and gray models (GMs) [7]. Yet crude oil prices often show strong nonlinearity, so such statistical models may fail to capture sudden or large price swings. This makes accurate prediction difficult. As global markets grow more volatile, better approaches are needed to handle complex data patterns.
Many scholars use deep learning models to predict time series [8,9,10,11]. Zhou et al. employed a deep neural network (DNN) model to forecast equity premiums [8]. They compared the DNN model with ordinary least squares (OLS) and historical average (HA) models, and their experimental findings indicated that the DNN model demonstrated superior predictive performance. Cen et al. performed short-term forecasts of WTI and Brent crude oil prices using long short-term memory (LSTM) networks, reducing the influence of historical data while increasing that of current data; this approach produced lower prediction errors [12]. Busari et al. compared the predictive performance of a gated recurrent unit (GRU) model with that of a single LSTM network, revealing that the GRU effectively reduced prediction errors in crude oil price models [13]. Additionally, several studies employed group method of data handling (GMDH) networks for time series forecasting and reported favorable results on their respective datasets [14,15]. Foroutan et al. implemented 16 deep learning architectures to predict daily oil prices for WTI and Brent; their findings indicated that the temporal convolutional network (TCN) surpassed the other models [16].
However, a single machine learning model may not adapt well to all time series conditions. Therefore, more researchers are turning to hybrid forecasting models. Two common approaches in crude oil price forecasting are decomposition algorithms and ensemble algorithms [17]. The decomposition method handles raw crude oil price data, which are nonlinear and random. Wang et al. used ensemble empirical mode decomposition (EEMD) to preprocess crude oil prices [18]. Experimental results showed that this decomposition method could effectively improve the accuracy of convolutional neural networks (CNNs) and LSTM networks. Liu et al. adopted variational modal decomposition (VMD) to process original data and then used an artificial neural network for each sub-series [19]. Their results showed reduced prediction errors. Lin et al. applied complementary EEMD (CEEMD) to split crude oil data into multiple subsequences [20]; the method improved the GRU's ability to recognize nonlinear data. The ensemble method blends different base predictors to reduce forecasting errors. One common ensemble approach is the heuristic method [21]. Zeng et al. proposed a hybrid model with particle swarm optimization (PSO) [22]. Their experiments on corn and wheat futures price series showed better results than other methods. Qu et al. used the sine cosine algorithm-whale optimization algorithm (SCWOA) to optimize the weights of four deep learning methods [23]. Experiments showed that the accuracy of this model was superior to that of all single models. Alruqimi et al. applied gray wolf optimization (GWO) to optimize the weights of base predictors [24] and found that GWO effectively reduced the prediction errors. These methods underscore the importance of combining different strategies to improve crude oil price forecasting performance.
In summary, scholars have made many important advances in crude oil price forecasting. Table 1 briefly summarizes the algorithms proposed in these papers, the advantages and disadvantages of the models, and their publication dates.
Although many effective prediction methods have been proposed for time series prediction, some limitations remain, and there is still potential to improve the forecasting precision of hybrid models. (1) Heuristic algorithms, as common ensemble learning methods, have been studied by many scholars, yet further breakthroughs have proven difficult. Compared with heuristic algorithms, reinforcement learning (RL) has a strong self-learning ability. Hence, the application of RL in ensemble learning is worth studying. (2) There are still some predictable components in the forecasting residuals of hybrid models [25]. To further improve the performance of a hybrid model, the residuals can be used to correct the prediction results.
Therefore, to further improve the accuracy and robustness of crude oil price forecasting, a new prediction method is proposed. This method integrates the empirical wavelet transform (EWT), three neural networks (a temporal convolutional network (TCN), a gated recurrent unit (GRU), and an echo state network (ESN)), reinforcement learning-based weight optimization, and a final error correction module. The main contributions of this study are as follows. The crude oil price series were adaptively decomposed into multiple subsequences using EWT according to their fluctuation characteristics, reducing nonlinearity and delivering higher predictive accuracy compared with classical decomposition methods. In addition, combining three deep neural networks leveraged their extensive hidden layers to manage non-stationary data and exploit complementary strengths, thereby enhancing both adaptability and generalization in forecasting. The SARSA algorithm was employed to optimize the ensemble weights among the diverse base predictors. Moreover, a machine learning-based error correction module was applied to address predictable residual components, thereby further improving the accuracy and robustness of crude oil price forecasting.
Empirical evaluations on three crude oil price datasets indicate that the model effectively handles nonlinear and volatile market conditions, achieving a mean absolute error (MAE) of USD 0.1239 and a mean absolute percentage error (MAPE) of 0.1914% in the best scenario. Furthermore, additional experiments incorporating abrupt price fluctuations induced by geopolitical events verified the model’s adaptability, as it accurately identified mutation points and outperformed advanced benchmarks, reducing the MAE to USD 0.3510 under high volatility. These findings emphasize the potential of this hybrid approach to enhance forecasting accuracy, offering a reliable tool for decision-makers in the energy sector and financial markets.
The article is organized as follows. Section 2 elaborates on the structure of our proposed model and the theoretical knowledge involved. Section 3 analyzes the crude oil price data and conducts corresponding comparative experiments. The application analysis of the proposed model is shown in Section 4. The conclusions and future research related to this article are outlined in Section 5.

2. Methodology

2.1. Structure of the Hybrid Model

The modeling process for our hybrid model is given in Figure 1. The forecasting model includes three modules: data preprocessing and prediction, prediction results ensemble, and error correction. The detailed modeling process is as follows.
A.
Data preprocessing and prediction: The raw data are divided into three parts: training sets, validation sets, and test sets. Then, EWT is applied to decompose the three parts adaptively. The base predictors, the TCN, GRU, and ESN, predict the decomposed sub-series. The details of EWT and predictors are given in Section 2.2 and Section 2.3, respectively.
B.
Prediction results ensemble: In this paper, the ensemble is formed by weighting the forecasting results of the different predictors, as shown in Equation (1). The SARSA algorithm, a reinforcement learning method, is applied to optimize the weights. The SARSA-based ensemble learning method is detailed in Section 2.4.
$$\hat{Y}(t) = w_1 \hat{y}_1(t) + w_2 \hat{y}_2(t) + w_3 \hat{y}_3(t) \tag{1}$$

where $w_i$ represents the weight of the $i$-th predictor, $\hat{y}_i(t)$ represents the forecasting result of the $i$-th predictor at time $t$, derived from its predictive model, and $\hat{Y}(t)$ represents the ensemble prediction.
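For illustration, Equation (1) is simply a weighted sum of the three forecast vectors. A minimal NumPy sketch with hypothetical weights:

```python
import numpy as np

y_tcn, y_gru, y_esn = np.array([70.2]), np.array([70.5]), np.array([69.9])
w = np.array([0.4, 0.35, 0.25])               # illustrative ensemble weights
y_ens = w @ np.stack([y_tcn, y_gru, y_esn])   # Equation (1)
print(y_ens)                                  # [70.23]
```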
C.
Calculating error and error correction: The forecasting residuals still contain predictable components after ensemble learning. Hence, there is still room to improve the forecasting accuracy. An extreme learning machine (ELM) is used for error correction. The final crude oil price prediction results are obtained by combining the error correction results with the ensemble results. The error correction module (ECM) is detailed in Section 2.5.

2.2. Empirical Wavelet Transform

EWT is an adaptive signal decomposition method based on empirical mode decomposition (EMD) [26,27]. The principle of EWT is to segment the Fourier spectrum adaptively by detecting maximum points in the frequency domain [28] and then construct several corresponding filters to process the data. EWT offers sufficient theoretical guarantees, good adaptability, simple computation, and freedom from mode aliasing. EWT is built from an empirical wavelet function and an empirical scaling function, defined as follows [29]:
$$\hat{\phi}_n(\omega) = \begin{cases} 1 & |\omega| \le (1-\gamma)\omega_n \\ \cos\!\left[\dfrac{\pi}{2}\beta\!\left(\dfrac{1}{2\gamma\omega_n}\bigl(|\omega| - (1-\gamma)\omega_n\bigr)\right)\right] & (1-\gamma)\omega_n \le |\omega| \le (1+\gamma)\omega_n \\ 0 & \text{otherwise} \end{cases} \tag{2}$$

$$\hat{\psi}_n(\omega) = \begin{cases} 1 & (1+\gamma)\omega_n \le |\omega| \le (1-\gamma)\omega_{n+1} \\ \cos\!\left[\dfrac{\pi}{2}\beta\!\left(\dfrac{1}{2\gamma\omega_{n+1}}\bigl(|\omega| - (1-\gamma)\omega_{n+1}\bigr)\right)\right] & (1-\gamma)\omega_{n+1} \le |\omega| \le (1+\gamma)\omega_{n+1} \\ \sin\!\left[\dfrac{\pi}{2}\beta\!\left(\dfrac{1}{2\gamma\omega_n}\bigl(|\omega| - (1-\gamma)\omega_n\bigr)\right)\right] & (1-\gamma)\omega_n \le |\omega| \le (1+\gamma)\omega_n \\ 0 & \text{otherwise} \end{cases} \tag{3}$$

where $\beta(x)$ and the parameter $\gamma$ are defined below:

$$\beta(x) = \begin{cases} 1 & x \ge 1 \\ 0 & x \le 0 \\ \text{such that } \beta(x) + \beta(1-x) = 1 & \text{otherwise} \end{cases} \tag{4}$$

$$\gamma < \min_n \frac{\omega_{n+1} - \omega_n}{\omega_{n+1} + \omega_n}, \quad \gamma \in (0, 1) \tag{5}$$

where $\gamma$ represents the width ratio of the transition region between adjacent frequency bands, constrained within $(0, 1)$; $\omega_n$ and $\omega_{n+1}$ denote the upper boundary of the $n$-th frequency band and the lower boundary of the $(n+1)$-th frequency band, respectively, expressed in normalized frequency; $\omega_{n+1} - \omega_n$ defines the frequency gap between adjacent bands, while $\omega_{n+1} + \omega_n$ normalizes the gap.
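To make the idea concrete, the following simplified sketch mimics EWT on a 1-D series: it locates local maxima in the Fourier spectrum, places band boundaries at the midpoints between the retained maxima, and recovers each sub-series by inverse FFT. Note that it uses ideal (rectangular) filters rather than the Meyer-type filters of Equations (2) and (3), and the function name is our own, so this is an illustrative approximation rather than the full transform:

```python
import numpy as np
from scipy.signal import argrelextrema

def simple_ewt(x, n_bands=3):
    """Simplified EWT sketch: band boundaries at midpoints between the
    largest local maxima of the Fourier spectrum, then ideal band-pass
    filtering. The true EWT uses smooth Meyer-type filters, Eqs. (2)-(3)."""
    X = np.fft.rfft(x)
    mag = np.abs(X)
    peaks = argrelextrema(mag, np.greater)[0]                # local maxima
    top = np.sort(peaks[np.argsort(mag[peaks])[-n_bands:]])  # n largest
    inner = (top[:-1] + top[1:]) // 2                        # midpoints
    edges = np.concatenate(([0], inner, [len(mag)]))
    bands = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        masked = np.zeros_like(X)
        masked[lo:hi] = X[lo:hi]                             # ideal band-pass
        bands.append(np.fft.irfft(masked, n=len(x)))
    return bands                                             # sub-series list

# Example: a noisy two-tone signal decomposed into three bands.
rng = np.random.default_rng(0)
t = np.linspace(0, 1, 500)
sig = np.sin(2 * np.pi * 5 * t) + 0.5 * np.sin(2 * np.pi * 40 * t)
sig += 0.1 * rng.standard_normal(500)
subs = simple_ewt(sig, n_bands=3)
print(len(subs), np.allclose(sum(subs), sig))                # 3 True
```

Because the masked spectra partition the frequency bins, the sub-series sum exactly back to the original signal, mirroring the additive reconstruction property of the decomposition.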

2.3. Forecasting Methods

2.3.1. Temporal Convolutional Network

Bai et al. first combined several convolution techniques with a one-dimensional CNN to make the CNN more applicable to time series data; this network is called the temporal convolutional network [30]. A TCN is made up of stacked residual modules that combine extended convolution and two-layer causal convolution, and the weights of each convolution kernel are normalized. The TCN introduces nonlinearity between convolution layers using a rectified linear unit (ReLU) and avoids overfitting through dropout. In general, a TCN has four main components: causal convolution, the one-dimensional CNN, extended convolution, and residual connections. Each component is explained below.
Causal convolution: A TCN imposes causality on the convolution, so the TCN does not produce information “leakage”, improving its prediction accuracy. Assuming the input sequence of the model is $x_0, x_1, \ldots, x_T$ and the expected output sequence is $y_0, y_1, \ldots, y_T$, causal convolution ensures that the predicted output $y_t$ at time $t$ is determined only by $x_0, x_1, \ldots, x_t$ and is not affected by $x_{t+1}, x_{t+2}, \ldots, x_T$.
One-dimensional CNN: The TCN uses a one-dimensional CNN to generate an output sequence of equal length to the input sequence [31], which retains the information contained in the entire input and constructs long-term memory.
Extended convolution: The TCN uses extended (dilated) convolution to obtain a larger receptive field. For a one-dimensional sequence $x \in \mathbb{R}^n$ and a convolution kernel $f: \{0, 1, \ldots, k-1\} \to \mathbb{R}$, the extended convolution $F$ on sequence element $m$ is defined as:

$$F(m) = \sum_{i=0}^{k-1} f(i) \cdot x_{m - v \cdot i} \tag{6}$$

where $v$ is the expansion coefficient, $k$ is the size of the convolution kernel, and $m - v \cdot i$ indicates that the $(m - v \cdot i)$-th element of the upper layer is used.
From Equation (6), enlarging the convolution kernel size $k$ or increasing the expansion coefficient $v$ increases the receptive field of the network. Generally, the expansion coefficient at layer $i$ is:

$$v = O(2^i) \tag{7}$$
Residual connection: The TCN preserves the accuracy of the deep network through residual connections. With $x$ as the input and $y$ as the output, the residual connection is expressed as:

$$y = \eta(x, \rho) + x \tag{8}$$

where $\eta$ represents a series of convolution operations and $\rho$ is the weight matrix of the convolution kernels.
According to Equation (8), the output of the residual module combines the input information with the output of the convolution calculation, which ensures the accuracy of the deep TCN model.
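As a concrete sketch, the PyTorch block below implements one TCN residual module with causal (left-padded) dilated convolutions, ReLU, dropout, and the skip connection of Equation (8). The class names and layer sizes are illustrative, and weight normalization is omitted for brevity:

```python
import torch
import torch.nn as nn

class CausalConv1d(nn.Conv1d):
    """1-D convolution made causal by left-padding, so output at time t
    depends only on inputs up to t (no information leakage)."""
    def __init__(self, c_in, c_out, k, dilation):
        super().__init__(c_in, c_out, k, dilation=dilation)
        self.left_pad = (k - 1) * dilation

    def forward(self, x):
        x = nn.functional.pad(x, (self.left_pad, 0))  # pad the past only
        return super().forward(x)

class TCNBlock(nn.Module):
    """Residual block: two causal dilated convolutions plus a skip
    connection, i.e. y = eta(x) + x as in Equation (8)."""
    def __init__(self, c_in, c_out, k=5, dilation=1, dropout=0.15):
        super().__init__()
        self.net = nn.Sequential(
            CausalConv1d(c_in, c_out, k, dilation), nn.ReLU(), nn.Dropout(dropout),
            CausalConv1d(c_out, c_out, k, dilation), nn.ReLU(), nn.Dropout(dropout),
        )
        # 1x1 convolution matches channel counts for the skip connection.
        self.skip = nn.Conv1d(c_in, c_out, 1) if c_in != c_out else nn.Identity()

    def forward(self, x):
        return torch.relu(self.net(x) + self.skip(x))

# Example: 16 univariate windows of length 5 -> same-length feature maps.
x = torch.randn(16, 1, 5)
block = TCNBlock(c_in=1, c_out=32, k=3, dilation=2)
print(block(x).shape)  # torch.Size([16, 32, 5])
```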

2.3.2. Gated Recurrent Unit

An RNN considers the influence of the previous state on the current state during prediction by adding a recurrent structure to the model. This structure is beneficial for predicting time series data [32]. However, the traditional RNN is prone to vanishing or exploding gradients when handling long time series. The LSTM introduces a gated structure to solve the gradient problem of RNNs [33]. Compared with the LSTM, a GRU needs only an update gate and a reset gate, which saves memory and speeds up computation.
The update gate and reset gate of the GRU are computed as follows [34]:

$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right) \tag{9}$$

$$r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right) \tag{10}$$

where $x_t$ represents the current input, $h_{t-1}$ is the previous hidden state, $\sigma$ is the sigmoid function, and $W_z$ and $W_r$ are the corresponding weight matrices.
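A minimal PyTorch sketch of a GRU forecaster over a 5-step sliding window (the window setup is described in Section 3.1); the gating of Equations (9) and (10) is handled internally by nn.GRU, and the layer sizes here are illustrative:

```python
import torch
import torch.nn as nn

class GRUForecaster(nn.Module):
    """GRU over a sliding window of past prices -> one-step-ahead forecast."""
    def __init__(self, hidden=100, layers=3):
        super().__init__()
        self.gru = nn.GRU(input_size=1, hidden_size=hidden,
                          num_layers=layers, batch_first=True)
        self.head = nn.Linear(hidden, 1)

    def forward(self, x):             # x: (batch, window, 1)
        out, _ = self.gru(x)          # gates z_t, r_t computed internally
        return self.head(out[:, -1])  # last hidden state -> next price

model = GRUForecaster()
window = torch.randn(16, 5, 1)        # 16 windows of the last 5 prices
print(model(window).shape)            # torch.Size([16, 1])
```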

2.3.3. Echo State Network

The ESN has been broadly utilized in nonlinear prediction [35]. An ESN consists of an input layer, a dynamic reservoir, and an output layer. The dynamic reservoir is the key module; it contains many hidden-layer neurons connected randomly and sparsely, giving the ESN short-term memory. The ESN has the following characteristics [36]: (a) its main structure is a random, fixed reservoir; (b) connections between neurons are generated randomly; and (c) the ESN can be trained through simple linear regression.
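A compact NumPy sketch of an ESN illustrating characteristics (a)-(c): a fixed, sparsely connected random reservoir is driven by the input, and only the linear readout is trained, here by ridge regression; all names and parameter values are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def make_reservoir(n=100, connectivity=0.1, spectral_radius=0.95):
    """Random, sparse, fixed recurrent weight matrix (characteristics a, b)."""
    W = rng.standard_normal((n, n)) * (rng.random((n, n)) < connectivity)
    W *= spectral_radius / np.abs(np.linalg.eigvals(W)).max()  # echo-state scaling
    return W

def reservoir_states(u, W, w_in):
    """Drive the reservoir with the input sequence u (tanh update)."""
    x, states = np.zeros(W.shape[0]), []
    for u_t in u:
        x = np.tanh(W @ x + w_in * u_t)
        states.append(x.copy())
    return np.array(states)

# Only the linear readout is trained, via ridge regression (characteristic c).
u = np.sin(np.linspace(0, 20, 300))           # toy input series
W, w_in = make_reservoir(), rng.standard_normal(100) * 0.5
S = reservoir_states(u[:-1], W, w_in)         # states for one-step-ahead task
y = u[1:]                                     # next-value targets
w_out = np.linalg.solve(S.T @ S + 1e-6 * np.eye(S.shape[1]), S.T @ y)
print(f"train MAE: {np.abs(S @ w_out - y).mean():.4f}")
```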

2.4. Multi-Predictor Ensemble

RL is a form of interactive learning in which the agent learns by interacting with the environment [37]. The agent can learn the optimal strategy and adjust it automatically according to the objective, enabling RL to maximize the return or achieve a fixed objective. RL has been utilized in many fields, such as autonomous driving, game playing, and ensemble optimization [38,39].
SARSA, proposed by Rummery and Niranjan [40], has strong scalability and excellent convergence. Hence, we used SARSA to combine the three neural networks in this paper. The elements of SARSA are defined as follows:
State: The weights of the neural networks form the state matrix $S$:

$$S = [w_1, w_2, w_3] \tag{11}$$

where $w_1$, $w_2$, and $w_3$ represent the weights of the TCN, GRU, and ESN, respectively.
Action: The action matrix $a$ represents the action that changes the weight of each network:

$$a = [\Delta w_1, \Delta w_2, \Delta w_3] \tag{12}$$

where $\Delta w_m$ represents the change in weight for the $m$-th neural network.
SARSA follows a deterministic strategy. To better balance exploration and exploitation, allowing the model to search for more potential actions, SARSA adopts the $\varepsilon$-greedy policy to select actions:

$$a_m = \begin{cases} \arg\max_a Q(S, a) & \text{with probability } 1 - \varepsilon \\ \text{random action} & \text{with probability } \varepsilon \end{cases}, \quad \varepsilon \in (0, 1) \tag{13}$$

where $\varepsilon$ is the exploration probability.
Reward: The model acquires the reward $R$ from the loss function $L$:

$$L = \frac{1}{N}\sum_{t=1}^{N}\left(Y(t) - \hat{Y}(t)\right)^2 \tag{14}$$

$$R = \begin{cases} +1 + \dfrac{L_t - L_{t+1}}{L_{t+1}} & L_{t+1} < L_t \\[4pt] -1 + \dfrac{L_t - L_{t+1}}{L_{t+1}} & L_{t+1} > L_t \end{cases} \tag{15}$$

where $Y(t)$ represents the raw crude oil price data and $\hat{Y}(t)$ is the forecast value.
Action value function $Q$: The iterative update of the value function is expressed as follows:

$$Q_{n+1}(S_n, a_n) = Q_n(S_n, a_n) + \eta_n \left[ R(S_n, a_n) + \gamma Q_n(S_{n+1}, a_{n+1}) - Q_n(S_n, a_n) \right] \tag{16}$$

where $\eta$ indicates the learning rate and $\gamma$ indicates the discount coefficient.
The solving process of the ensemble module is described in Algorithm 1, and a code sketch follows.

Algorithm 1 Ensemble module based on SARSA
Input:
  Forecasting results of the TCN, GRU, and ESN: $\hat{Y}_T$, $\hat{Y}_G$, $\hat{Y}_E$.
  Maximum number of episodes: $Z$.
  Maximum number of steps per episode: $K$.
  Discount coefficient: $\gamma$.
  Learning rate: $\eta$.
Output: Weights of the three predictors: $w_1, w_2, w_3$.
Algorithm:
1: Initialize all parameters
2: for $z = 1$ to $Z$ do
3:   for $k = 1$ to $K$ do
4:     Compute the loss function $L$ and reward $R$ using Equations (14) and (15)
5:     Select action $a$ through the $\varepsilon$-greedy policy of Equation (13)
6:     Update the Q table using Equation (16)
7:   end for
8: end for
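The following toy NumPy sketch illustrates the SARSA-based weight search: it treats small increments or decrements of each predictor's weight as actions (Equation (12)), rewards steps that reduce the validation loss as in Equation (15), follows the ε-greedy policy of Equation (13), and updates a tabular Q function per Equation (16). The state encoding and helper names are our own simplifications, not the paper's exact implementation:

```python
import numpy as np

rng = np.random.default_rng(1)

def sarsa_ensemble(preds, y, episodes=100, steps=100,
                   eps=0.1, eta=0.3, gamma=0.95, delta=0.05):
    """preds: (3, N) base-predictor forecasts; y: (N,) true values.
    Returns ensemble weights w1, w2, w3 found by tabular SARSA."""
    actions = []                                  # +/- delta on each weight
    for m in range(3):
        for d in (+delta, -delta):
            a = np.zeros(3)
            a[m] = d
            actions.append(a)

    def mse(w):                                   # loss L, Equation (14)
        return np.mean((y - w @ preds) ** 2)

    Q = {}                                        # tabular Q function
    key = lambda w: tuple(np.round(w, 2))         # discretized state, Eq. (11)
    w = np.full(3, 1 / 3)                         # start from equal weights
    best_w, best_loss = w.copy(), mse(w)
    for _ in range(episodes):
        s, a_id = key(w), int(rng.integers(len(actions)))
        for _ in range(steps):
            L_t = mse(w)
            w_next = np.clip(w + actions[a_id], 0.0, 1.0)
            L_next = mse(w_next)
            # Reward of Equation (15): sign bonus plus relative improvement.
            R = (1.0 if L_next < L_t else -1.0) + (L_t - L_next) / L_next
            s_next = key(w_next)
            # epsilon-greedy choice of the NEXT action (on-policy, Eq. (13)).
            if rng.random() < eps:
                a_next = int(rng.integers(len(actions)))
            else:
                a_next = max(range(len(actions)),
                             key=lambda i: Q.get((s_next, i), 0.0))
            # SARSA update of the Q table, Equation (16).
            q = Q.get((s, a_id), 0.0)
            Q[(s, a_id)] = q + eta * (R + gamma * Q.get((s_next, a_next), 0.0) - q)
            if L_next < best_loss:
                best_loss, best_w = L_next, w_next.copy()
            w, s, a_id = w_next, s_next, a_next
    return best_w / best_w.sum()                  # normalize to sum to 1

# Example with synthetic forecasts from three noisy base predictors.
y = np.sin(np.linspace(0, 6, 200))
preds = np.stack([y + rng.normal(0, s, 200) for s in (0.05, 0.1, 0.2)])
print(sarsa_ensemble(preds, y))
```

Because SARSA is on-policy, the next action used in the update is the action actually selected by the ε-greedy policy, which is what distinguishes it from Q-learning.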

2.5. Error Correction Module

When confronted with highly nonlinear, non-stationary and stochastic crude oil price series, the ensemble module often fails to capture all potential patterns, leading to errors. The error series still contain valuable information, revealing predictive elements not fully captured by the ensemble module. The ECM can analyze and leverage these error series, thereby further improving prediction accuracy. A more accurate prediction can be obtained by overlaying the correction output from the ECM with the prediction from the ensemble module.
The ELM only requires random initialization of hidden layer weights during training, and then solves the output layer weights using least squares. This approach avoids the high computational costs associated with gradient-based iterative training. In error correction, this rapid training strategy effectively captures and fits the residual information. It also decreases the number of training iterations required and minimizes the need for extensive hyperparameter tuning. Therefore, the ELM is utilized to correct the ensemble module’s prediction errors. The solving steps of the ECM are as follows:
Step 1: The error series $E_1(t)$, the difference between the validation-set prediction $\hat{Y}_1(t)$ from Module 2 and the corresponding true value $Y_1(t)$, serves as the training set of the ELM:

$$E_1(t) = Y_1(t) - \hat{Y}_1(t) \tag{17}$$

Step 2: The error correction series $E_2(t)$ is produced by the trained ELM. The final crude oil price forecast is obtained by adding $E_2(t)$ to the test-set prediction $\hat{Y}_2(t)$ from Module 2:

$$Y_2(t) = \hat{Y}_2(t) + E_2(t) \tag{18}$$
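A minimal NumPy sketch of the ELM-based error correction: hidden weights are drawn randomly and only the output weights are solved by regularized least squares, after which the predicted residual is added back to the ensemble forecast, as in Equations (17) and (18). The lag length and sizes are illustrative:

```python
import numpy as np

rng = np.random.default_rng(2)

class ELM:
    """Extreme learning machine: random hidden layer + least-squares readout."""
    def __init__(self, n_in, n_hidden=100, ridge=1e-3):
        self.W = rng.standard_normal((n_in, n_hidden))
        self.b = rng.standard_normal(n_hidden)
        self.ridge = ridge

    def _h(self, X):
        return np.maximum(X @ self.W + self.b, 0.0)   # ReLU activation

    def fit(self, X, y):
        H = self._h(X)
        # Regularized least squares: no gradient iterations needed.
        self.beta = np.linalg.solve(H.T @ H + self.ridge * np.eye(H.shape[1]),
                                    H.T @ y)
        return self

    def predict(self, X):
        return self._h(X) @ self.beta

# Error correction: learn validation residuals E1(t) from lagged residuals,
# then add the predicted residual E2(t) to the ensemble's test forecast.
e1 = rng.normal(0, 0.2, 300)                     # stand-in residual series
lag = 5
X = np.stack([e1[i:i + lag] for i in range(len(e1) - lag)])
y = e1[lag:]
elm = ELM(n_in=lag).fit(X, y)
ensemble_forecast = 70.0                         # stand-in ensemble output
corrected = ensemble_forecast + elm.predict(X[-1:])[0]   # Equation (18)
print(round(corrected, 4))
```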

3. Experiments

3.1. Crude Oil Price Datasets

The closing spot prices of West Texas Intermediate (WTI) crude oil at different time scales were selected to comprehensively analyze the forecasting performance of our proposed model. The data were obtained from the U.S. Energy Information Administration (EIA) and can be accessed through its official website: http://www.eia.gov/ (accessed on 26 December 2024). These spot prices are widely recognized as a benchmark for short-term oil market analysis and reflect the actual cash market value. The three datasets were the daily price data from 15 February 2013 to 5 November 2018, the weekly price data from 27 September 1992 to 20 June 2021, and the monthly price data from July 1998 to October 2021. Figure 2 shows the detailed volatility shape of the three datasets, and Table 2 presents statistical information on them. Data from different time scales (daily, weekly, and monthly) were selected to thoroughly evaluate the model’s forecasting performance.
For consistency in dataset length, 1500 observations were used for each of the first two datasets. However, the third dataset was monthly and had fewer available records, so 400 observations were chosen. Furthermore, datasets #1 and #2 exhibited different volatility shapes. All of this verified the forecasting performance and generalization ability of our proposed model for different types of price data.
In this paper, the raw price data were divided into three parts: training sets, validation sets, and test sets, in a proportion of 3:1:1 [41,42,43]. The training sets were used to train the base predictors, the validation sets were used to train the ensemble optimization method and the error correction method, and the test sets were used to evaluate forecasting performance. The dataset was relatively small. If the validation or test set is proportionally too large, the training data become insufficient, which diminishes the model’s learning capacity. Conversely, if the validation or test set is too small, the evaluation may be unreliable. Therefore, a 3:1:1 data split provides a suitable balance.
The forecasting process used a single-step-ahead prediction framework with a sliding window of size 5. For each new time step, the most recent five real observations were fed into the neural network to generate the next single-step forecast. Thus, the state of the model is updated using real data rather than predicted data, because each time the actual value becomes available, it replaces the oldest data in the sliding window. Furthermore, forecasting is performed step by step: after the network produces a single-step forecast at time $t+1$ based on the real data at $t-4, t-3, t-2, t-1, t$, the window slides by one step, and the newly observed real value at $t+1$ enters the window for forecasting $t+2$. This approach ensures that each prediction relies on the most up-to-date and accurate input sequence, thereby reducing the compounding of errors that can occur when predicted values are fed back into the model. A minimal sketch of this setup is given below.
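The sketch below illustrates the 3:1:1 split and the size-5 sliding-window construction described above, with helper names of our own:

```python
import numpy as np

def split_3_1_1(series):
    """Chronological 3:1:1 split into training, validation, and test sets."""
    n = len(series)
    i1, i2 = int(n * 3 / 5), int(n * 4 / 5)
    return series[:i1], series[i1:i2], series[i2:]

def sliding_windows(series, window=5):
    """Each sample: the last `window` real observations -> the next value."""
    X = np.stack([series[i:i + window] for i in range(len(series) - window)])
    y = series[window:]
    return X, y

prices = np.cumsum(np.random.default_rng(3).normal(0, 1, 1500)) + 60
train, val, test = split_3_1_1(prices)
X_train, y_train = sliding_windows(train)
print(len(train), len(val), len(test), X_train.shape)  # 900 300 300 (895, 5)
```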
Training assumptions: It was assumed that the raw crude oil price data, although relatively small, provided sufficient coverage of various market conditions for training the proposed model. A 3:1:1 data split was chosen to balance learning capacity and evaluation reliability, ensuring that neither the training set nor the validation/test sets were too small. During hyperparameter tuning, the models were trained for a limited number of iterations due to computational constraints. Although the number of training iterations was restricted, it was assumed that this procedure still converged to a near-optimal set of parameters and hyperparameters. It was assumed that a single-step-ahead prediction approach with a sliding window of size 5 adequately captured temporal dependencies.
Validation assumptions: It was assumed that the validation set, drawn from the same overall distribution as the training set, represented typical variations in the crude oil market despite the dataset’s limited size. Validation was performed step by step, with each new real observation being slid into the model input window to reduce the accumulation of forecast errors. It was assumed that this method preserves the inherent time dependencies and ensures a realistic performance evaluation. It was likewise assumed that the 3:1:1 data split ratio struck a balance between sufficient training data and reliable validation.
All experiments were carried out on a computer equipped with a Core i7-9800X 3.8 GHz CPU, 16 GB of memory, and a single 2080Ti GPU. The deep learning framework was PyTorch 1.10.1. Table A1 in Appendix A lists the hyperparameters of the proposed model.
Additionally, because several hybrid models were used in this paper, they were renamed for easier tracking. The correspondence between these hybrid models and their new names is shown in Table 3.

3.2. Evaluation Indicators

Statistical indicators are important for analyzing forecasting accuracy. The MAE, MAPE, and root mean square error (RMSE) are used in this article. Moreover, to directly compare the forecasting precision of different methods, the promoting percentages of the MAE, MAPE, and RMSE ($P_{MAE}$, $P_{MAPE}$, and $P_{RMSE}$) are applied. The relevant equations are as follows:

$$MAE = \frac{1}{N}\sum_{t=1}^{N}\left| Y(t) - \hat{Y}(t) \right|, \quad MAPE = \frac{1}{N}\sum_{t=1}^{N}\left| \frac{Y(t) - \hat{Y}(t)}{Y(t)} \right|, \quad RMSE = \sqrt{\frac{1}{N}\sum_{t=1}^{N}\left( Y(t) - \hat{Y}(t) \right)^2} \tag{19}$$

$$P_{MAE} = \frac{MAE_a - MAE_b}{MAE_a}, \quad P_{MAPE} = \frac{MAPE_a - MAPE_b}{MAPE_a}, \quad P_{RMSE} = \frac{RMSE_a - RMSE_b}{RMSE_a} \tag{20}$$

where $Y(t)$ represents the true crude oil price at time $t$, $\hat{Y}(t)$ represents the predicted crude oil price at time $t$, and $N$ indicates the number of samples in $Y(t)$.
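These indicators translate directly into a few lines of NumPy; the function names below are our own:

```python
import numpy as np

def mae(y, y_hat):
    return np.mean(np.abs(y - y_hat))

def mape(y, y_hat):
    return np.mean(np.abs((y - y_hat) / y)) * 100   # in percent

def rmse(y, y_hat):
    return np.sqrt(np.mean((y - y_hat) ** 2))

def promoting_percentage(metric_a, metric_b):
    """Relative improvement of model b over model a, Equation (20)."""
    return (metric_a - metric_b) / metric_a

y = np.array([70.1, 71.3, 69.8])
y_hat = np.array([70.0, 71.0, 70.2])
print(mae(y, y_hat), mape(y, y_hat), rmse(y, y_hat))
```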

3.3. Experimental Evaluation of the Proposed Model

3.3.1. Comparison with Diverse Benchmark Models

To verify the superiority of the TCN, GRU, and ESN deep networks in crude oil price forecasting, we compared the three predictors with six other algorithms: the BPNN, RBFNN, GRNN, LSTM, CNN, and GMDH. Table 4 reports the evaluation indicators of the different predictors. Figure 3 shows the prediction errors of the different predictors on crude oil price series #1, and Figure 4 presents their forecasting results on series #1. The following conclusions can be drawn from Table 4, Figure 3 and Figure 4:
a.
The forecasting precision of the deep networks was superior to that of the other traditional algorithms in all cases. This shows that the deep networks can better identify the volatility shape of the raw data. The probable reason is that deep networks have rich hidden layers, which improves their ability to handle non-stationary data and mine deep information more effectively.
b.
The TCN, GRU, and ESN showed better prediction performance than the LSTM and CNN on three datasets. Possible reasons are as follows. First, the TCN incorporates dilated convolution and residual structures. These features enable a large receptive field in a relatively shallow network and maintain stable gradient propagation while capturing long-range dependencies. Second, the GRU employs a simplified gating mechanism that lowers model complexity. Hence, it achieves better training efficiency and faster convergence when handling random fluctuations and long-term dependencies. Third, the ESN utilizes a reservoir with randomly sparse connections, which helps capture dynamic features in crude oil price data more effectively. It also maintains robust performance under highly random conditions. In contrast, the LSTM and CNN have more complex structures or limited receptive fields. When they face high noise and strong non-stationarity, they often encounter unstable training or inadequate long-term dependence capture. Therefore, the TCN, GRU, and ESN offer stronger adaptability and prediction ability for non-stationary and random crude oil price data.

3.3.2. Comparison with Models Utilizing Diverse Ensemble Strategies

To verify that the ensemble method achieved optimal prediction accuracy, model 4 was compared with the TCN, GRU, and ESN. In addition, to fully verify the superiority of the RL-based ensemble strategy over traditional heuristic ensemble methods, SARSA was compared with several classical heuristic algorithms, including the BHA, GA, and GWO. Table 5 reports the evaluation indicators of the experimental models, Table 6 presents the promoting percentages of the ensemble learning methods, and Figure 5 shows the scatter of the prediction results for the RL and heuristic ensemble methods. The following conclusions can be drawn from Table 5 and Table 6, and Figure 5:
a.
The prediction errors of all ensemble methods were lower than those of the TCN, GRU, and ESN. Hybrid model 4 improved the accuracy of the single prediction algorithms by 4–12%. This shows that ensemble learning predicted the trend of crude oil price data more accurately than a single predictor. The possible reason is that the ensemble method made excellent optimization decisions on the ensemble weights according to the volatility of the crude oil price series. Hence, ensemble approaches could combine the strengths of different single models to reduce prediction errors.
b.
Compared with the other models, model 4 showed optimal forecasting precision. This shows that RL can optimize the ensemble weights more effectively than the traditional heuristic methods and improve the precision of the ensemble method. During optimal decision-making, the agent adjusted itself through continuous interaction with its surroundings, which makes RL more intelligent and the prediction results more accurate.

3.3.3. Comparison with Diverse Decomposition Methods

To fully evaluate the effectiveness of decomposition methods, we compared the forecasting results of model 4 with those of models employing different decomposition algorithms. In addition, we compared EWT with WPD and VMD to demonstrate the superiority of EWT. Table 7 reports the evaluation indicators of the experimental models, Table 8 shows the improvement in forecasting performance over model 4 achieved by the different decomposition methods, and Figure 6 compares the scatter of the points predicted by the experimental models. Table 9 shows MSE1, MSE2, and MSE3, the prediction results of each predictor on the decomposed data before the ensemble model is applied. From these experimental results, we can conclude:
a.
The prediction performance of the proposed model utilizing a decomposition algorithm is superior to that of the model without one. The forecasting performance of the models using decomposition is improved by more than 40%. This shows that decomposition methods greatly reduce the high fluctuation of the raw data and the prediction errors.
b.
In all experiments, the prediction errors of EWT were the lowest among the three decomposition algorithms. Compared with the two other decomposition algorithms, EWT could effectively decrease the nonlinearity of the raw data and enhance the forecasting performance. The main reason is that EWT adaptively decomposes the raw series into multiple subsequences, which enhances the ability of the ensemble method to analyze volatility characteristics.

3.3.4. Comparison with Models Using Error Correction Methods

To evaluate the validity of the ECM, we compared model 8 with the model applying the ECM. Table 10 shows the indicator results of the experimental methods, Table 11 presents the promoting percentages of the ECM, and Figure 7, Figure 8 and Figure 9 show the prediction results for the different series. From Table 10 and Table 11, and Figure 7, Figure 8 and Figure 9, the following conclusions can be drawn:
a.
The ECM can effectively correct the prediction residuals. On the three datasets, model 9 improved upon model 8 by more than 15% in terms of $P_{MAE}$, $P_{MAPE}$, and $P_{RMSE}$. This indicates that the ECM can effectively extract the predictable components hidden in the residuals and improve the forecasting performance.
b.
Each module of our proposed model can effectively reduce prediction errors. For example, the MAEs of the TCN, GRU, ESN, model 4, model 8, and model 9 were USD 0.7462, USD 0.7680, USD 0.8262, USD 0.7119, USD 0.1728, and USD 0.1239, respectively. This indicates that the RL-based ensemble method in model 4 can make the optimal weight decision when combining different predictors, realizing the complementary advantages of the base predictors. The decomposition method in model 8 reduces the non-stationarity and randomness of the raw data and improves the prediction accuracy. Model 9 corrects the prediction residuals produced by the first two modules, minimizing the prediction errors of our hybrid model.

3.4. Supplementary Experiments

Emergencies, including geopolitical conflicts, extreme weather, and financial crises, trigger sharp fluctuations in crude oil prices. These fluctuations often result in atypical trading behaviors and exert significant impacts on supply and demand. For forecasting models, such extreme conditions challenge their ability to learn from historical patterns and test their robustness and adaptability to abrupt changes. Therefore, this section specifically employs crude oil price data encompassing periods of geopolitical conflict. By evaluating the proposed model’s predictive accuracy under severe price fluctuations, we more effectively gauged its capacity to handle volatile conditions. To further verify the efficiency of the model, we selected advanced forecasting methods as baselines: Liu’s model [44], Mi’s model [39], and Huang’s model [45]. Table 12 presents the evaluation indicators of the experimental models. Figure 10 shows the original crude oil price data of the supplementary experiments, and Figure 11 shows the variation trend of the forecasting results. From these results, we can draw the following conclusions.
a.
In all cases, the forecasting precision of the hybrid model was better than that of the base predictors. Because the crude oil price data showed obvious chaos and fluctuation, it was difficult for a single model to accurately capture the changes in the crude oil price, especially at mutation points. Hence, it was necessary to adopt an effective hybrid model to achieve precise forecasting of the crude oil price.
b.
Compared with the other state-of-the-art models, the EWT-SARSA-TCN-GRU-ESN-ELM model (model 9) captured the changes at the mutation points more accurately. The main reason is that our model fully combines the advantages of the single algorithms, which makes it more robust and generalizable. EWT adaptively decomposed the original data into multiple sub-sequences to decrease the nonlinearity and randomness of the raw data. In addition, the RL-based ensemble method made the optimal weight decision and combined different predictors to achieve the complementary advantages of the base predictors. Moreover, the ECM effectively extracted the predictable components hidden in the residuals. In the context of the geopolitical conflict, the ECM focused on the leftover signals in the residuals that reflected abrupt changes in price trends. A more accurate prediction could be obtained by overlaying the correction output from the ECM on the prediction from the ensemble module. Hence, our proposed model has potential applications in crude oil price forecasting.

4. Application Analysis

4.1. Real-Time Adaptability

Deep learning and reinforcement learning are utilized in our model; hence, the training process was time-consuming. To further evaluate the proposed model, this section analyzes its real-time performance and computational efficiency. All experiments were carried out on a computer equipped with a Core i7-9800X 3.8 GHz CPU, 16 GB of memory, and a single 2080Ti GPU. Table 13 presents the computational efficiency of the different models. Analyzing the data in the table led to the following conclusions.
a.
Hybrid models require longer computation times than single models. This is primarily due to the hybrid models’ intricate structure, which prolongs computation compared with individual models. Moreover, using RL for ensemble optimization is slower than employing heuristic algorithms. This occurs because RL requires extensive training iterations during ensemble optimization to explore optimal strategies. During each decision-making step, RL not only evaluates the action values of the current state but also estimates the state and action values of the next step, thereby updating the current strategy. These procedures increase computational complexity. In contrast, heuristic algorithms optimize directly using rules or prior knowledge, requiring fewer iterations and thus achieving higher computational efficiency. However, the maximum time for RL was 105.36 s, which was considerably shorter than the data update interval (one day). Therefore, despite the longer computation time for RL, such durations remain acceptable given the superior forecasting performance achieved.
b.
A highly adaptive real-time model can rapidly respond to market fluctuations and promptly update forecast outcomes, thus providing investors and decision-makers with timely and accurate references. As shown in the accompanying table, the model’s computational time ranged from 663.46 s to 1297.93 s across the three datasets. Notably, the shortest dataset interval in this study was one day, so the model’s computation time is considerably shorter than the minimum data update interval. Consequently, the proposed model demonstrated robust real-time adaptability. Furthermore, the proposed model shows substantial potential for real-world applications, providing strong technical support for accurate crude oil price forecasting.

4.2. Application

Accurate and efficient crude oil price forecasting is essential for strategic planning across multiple stakeholders. Our proposed model shows significant potential for application in crude oil price forecasting. Similarly, this principle remains valid when confronting unforeseen contingencies in the crude oil market. For instance, supplementary experiments reveal that, even under pronounced price volatility triggered by geopolitical conflicts, the model sustained exceptional predictive accuracy. Consequently, the proposed forecasting framework allows energy companies to predict future price trends, manage price risks judiciously, maintain stable production costs, and ultimately improve profitability. Furthermore, because crude oil prices serve as a pivotal commodity in the futures market, their fluctuations substantially affect the stock market and other speculative investments. Timely adaptation to these variations can help the investment sector optimize asset allocation strategies and enhance capital efficiency. Lastly, accurate crude oil price forecasts offer vital insights for governmental policy-making, helping policymakers promptly address environmental pollution and energy scarcity.

5. Conclusions and Future Research

5.1. Conclusions

Efficient and accurate crude oil price forecasting contributes to global economic development. A new model using ensemble deep reinforcement learning with error correction is proposed in this article. The article’s conclusions are as follows. First, deep neural networks with multiple hidden layers effectively handle non-stationary data and extract complex information; among the base predictors, the TCN, GRU, and ESN achieved superior forecasting accuracy. Second, the ensemble method capitalized on the complementary strengths of the base predictors, improving overall forecasting performance. The RL agent learned the optimal strategy and automatically adjusted it according to the objective, enabling the system to maximize returns or fulfill predefined goals; hence, unlike heuristic algorithms, RL could autonomously learn and make decisions. Additionally, the decomposition method effectively mitigated the nonlinearity of the original series: EWT adaptively partitions the raw data into multiple subsequences, enhancing the forecasting accuracy of the SARSA-TCN-GRU-ESN model. Furthermore, employing the ECM enhanced the forecasting performance by extracting predictable components from the ensemble model’s residuals, thus reducing prediction errors. Lastly, the proposed model exhibited robust forecasting performance during emergencies. Compared with three state-of-the-art models, it more precisely captured volatility patterns at mutation points, underscoring its promising potential in crude oil price forecasting.

5.2. Future Research

Although the proposed model exhibits excellent forecasting performance for crude oil prices, some directions for further research remain. First, incorporating additional oil-related indicators for multivariate forecasting could enhance the model’s predictive capability. This strategy enriches quantitative relationships and more accurately captures complex supply–demand and risk patterns. Additionally, because deep learning models often function like a “black box”, interpretability analysis should be strengthened. For instance, weight visualizations, feature-importance analyses, or integrated explainable models could be employed. Such approaches will help researchers and decision-makers comprehend the model’s reasoning processes and evaluate its reliability and stability.

Author Contributions

Conceptualization, Y.X.; methodology, G.Z.; validation, G.Z.; visualization, G.Z. and Y.X.; writing—original draft, G.Z. and Y.L. All authors have read and agreed to the published version of the manuscript.

Funding

This study is supported by the Postgraduate Scientific Research Innovation Project of Hunan Province (Funding number: CX20220280).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

The data were chosen from the US energy information administration (EIA). The data could be accessed through their official website: http://www.eia.gov/ (accessed on 26 December 2024).

Conflicts of Interest

The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.

Abbreviations

The following abbreviations are used in this manuscript:
AI: Artificial intelligence
ARIMA: Autoregressive integrated moving average
BHA: Black-hole optimization
BPNN: Back propagation neural network
CEEMD: Complementary ensemble empirical mode decomposition
DNN: Deep neural network
ECM: Error correction method
EELM: Extended extreme learning machine
ELM: Extreme learning machine
EEMD: Ensemble empirical mode decomposition
ESN: Echo state network
EWT: Empirical wavelet transform
GA: Genetic algorithm
GARCH: Generalized autoregressive conditional heteroskedasticity
GM: Gray model
GRNN: Generalized regression neural network
GWO: Gray wolf optimization
GRU: Gated recurrent unit
HA: Historical average
IMF: Intrinsic mode function
LSSVM: Least squares support vector machine
LSTM: Long short-term memory network
MAE: Mean absolute error
MAPE: Mean absolute percentage error
NAR: Nonlinear auto regressive
OLS: Ordinary least squares
PSO: Particle swarm optimization
RL: Reinforcement learning
RMSE: Root mean square error
RNN: Recurrent neural network
SARSA: State action reward state action
TCN: Temporal convolutional network
VMD: Variational modal decomposition
WPD: Wavelet packet decomposition

Appendix A

Table A1. The experimental parameters for the proposed model.

| Stage | Model/Algorithm | Parameter | Value |
|---|---|---|---|
| Data preprocessing | - | Split ratio | 3:1:1 |
| | | Sliding window | 5 |
| Decomposition | EWT | Detection method | scalespace |
| | | Degree for the polynomial interpolation | 6 |
| | | Maximum number of bands | 25 |
| | | Sampling rate | 1 |
| | | Filter width | 10 |
| Forecasting | ESN | Reservoir size | 400 |
| | | Spectral radius | 0.95 |
| | | Input scaling | 0.5 |
| | | Reservoir connectivity | 0.1 |
| | TCN | Size of epochs | 100 |
| | | Size of layers | 4 |
| | | Kernel size | 5 |
| | | Dropout | 0.15 |
| | | Batch size | 16 |
| | | Learning rate | 0.01 |
| | GRU | Size of epochs | 100 |
| | | Size of layers | 3 |
| | | Batch size | 16 |
| | | Size of hidden units | 100 |
| | | Learning rate | 0.01 |
| Ensemble | SARSA | Maximum number of episodes | 100 |
| | | Maximum step of each episode | 100 |
| | | Learning rate | 0.3 |
| | | Discount coefficient | 0.95 |
| Error correction | ELM | Size of hidden neurons | 500 |
| | | Activation function | ReLU |

References

  1. Kaymak, Ö.Ö.; Kaymak, Y. Prediction of crude oil prices in COVID-19 outbreak using real data. Chaos Solitons Fractals 2022, 158, 111990. [Google Scholar] [CrossRef] [PubMed]
  2. Miao, H.; Ramchander, S.; Wang, T.; Yang, D. Influential factors in crude oil price forecasting. Energy Econ. 2017, 68, 77–88. [Google Scholar] [CrossRef]
  3. Bisoi, R.; Dash, P.; Mishra, S. Modes decomposition method in fusion with robust random vector functional link network for crude oil price forecasting. Appl. Soft Comput. 2019, 80, 475–493. [Google Scholar] [CrossRef]
  4. Wang, D.; Luo, H.; Grunder, O.; Lin, Y.; Guo, H. Multi-step ahead electricity price forecasting using a hybrid model based on two-layer decomposition technique and BP neural network optimized by firefly algorithm. Appl. Energy 2017, 190, 390–407. [Google Scholar] [CrossRef]
  5. Tang, L.; Wu, Y.; Yu, L. A non-iterative decomposition-ensemble learning paradigm using RVFL network for crude oil price forecasting. Appl. Soft Comput. 2018, 70, 1097–1108. [Google Scholar] [CrossRef]
  6. Zhang, G.P. Time series forecasting using a hybrid ARIMA and neural network model. Neurocomputing 2003, 50, 159–175. [Google Scholar] [CrossRef]
  7. Wang, Q.; Song, X. Forecasting China’s oil consumption: A comparison of novel nonlinear-dynamic grey model (GM), linear GM, nonlinear GM and metabolism GM. Energy 2019, 183, 160–171. [Google Scholar] [CrossRef]
  8. Zhou, X.; Zhou, H.; Long, H. Forecasting the equity premium: Do deep neural network models work? Mod. Financ. 2023, 1, 1–11. [Google Scholar] [CrossRef]
  9. Chen, J.; Wei, X.; Liu, Y.; Zhao, C.; Liu, Z.; Bao, Z. Deep Learning for Water Quality Prediction—A Case Study of the Huangyang Reservoir. Appl. Sci. 2024, 14, 8755. [Google Scholar] [CrossRef]
  10. Chen, L.; Pelger, M.; Zhu, J. Deep learning in asset pricing. Manag. Sci. 2024, 70, 714–750. [Google Scholar] [CrossRef]
  11. Fan, X.; Wang, R.; Yang, Y.; Wang, J. Transformer–BiLSTM Fusion Neural Network for Short-Term PV Output Prediction Based on NRBO Algorithm and VMD. Appl. Sci. 2024, 14, 11991. [Google Scholar] [CrossRef]
  12. Cen, Z.; Wang, J. Crude oil price prediction model with long short term memory deep learning based on prior knowledge data transfer. Energy 2019, 169, 160–171. [Google Scholar] [CrossRef]
  13. Busari, G.A.; Lim, D.H. Crude oil price prediction: A comparison between AdaBoost-LSTM and AdaBoost-GRU for improving forecasting performance. Comput. Chem. Eng. 2021, 155, 107513. [Google Scholar] [CrossRef]
  14. Lytvynenko, V.; Wojcik, W.; Fefelov, A.; Lurie, I.; Savina, N.; Voronenko, M.; Boskin, O.; Smailova, S. Hybrid methods of GMDH-neural networks synthesis and training for solving problems of time series forecasting. In Lecture Notes in Computational Intelligence and Decision Making: Proceedings of the XV International Scientific Conference “Intellectual Systems of Decision Making and Problems of Computational Intelligence” (ISDMCI’2019), Salisnyj Port, Ukraine, 21–25 May 2019; pp. 513–531. [Google Scholar]
  15. Sobolewski, Ł.; Miczulski, W. Methods of constructing time series for predicting local time scales by means of a GMDH-type neural network. Appl. Sci. 2021, 11, 5615. [Google Scholar] [CrossRef]
  16. Foroutan, P.; Lahmiri, S. Deep learning systems for forecasting the prices of crude oil and precious metals. Financ. Innov. 2024, 10, 111. [Google Scholar] [CrossRef]
  17. Li, J.; Zhu, S.; Wu, Q. Monthly crude oil spot price forecasting using variational mode decomposition. Energy Econ. 2019, 83, 240–253. [Google Scholar] [CrossRef]
  18. Wang, J.; Zhang, T.; Lu, T.; Xue, Z. A hybrid forecast model of EEMD-CNN-ILSTM for crude oil futures price. Electronics 2023, 12, 2521. [Google Scholar] [CrossRef]
  19. Liu, W.; Wang, C.; Li, Y.; Liu, Y.; Huang, K. Ensemble forecasting for product futures prices using variational mode decomposition and artificial neural networks. Chaos Solitons Fractals 2021, 146, 110822. [Google Scholar] [CrossRef]
  20. Lin, H.; Sun, Q. Crude oil prices forecasting: An approach of using CEEMDAN-based multi-layer gated recurrent unit networks. Energies 2020, 13, 1543. [Google Scholar] [CrossRef]
  21. Zhou, Y.; Wang, J.; Lu, H.; Zhao, W. Short-term wind power prediction optimized by multi-objective dragonfly algorithm based on variational mode decomposition. Chaos Solitons Fractals 2022, 157, 111982. [Google Scholar] [CrossRef]
  22. Zeng, L.; Ling, L.; Zhang, D.; Jiang, W. Optimal forecast combination based on PSO-CS approach for daily agricultural future prices forecasting. Appl. Soft Comput. 2023, 132, 109833. [Google Scholar] [CrossRef]
  23. Qu, Z.; Li, Y.; Jiang, X.; Niu, C. An innovative ensemble model based on multiple neural networks and a novel heuristic optimization algorithm for COVID-19 forecasting. Expert Syst. Appl. 2023, 212, 118746. [Google Scholar] [CrossRef]
  24. Alruqimi, M.; Di Persio, L. Multistep Brent oil price forecasting with a multi-aspect meta-heuristic optimization and ensemble deep learning model. Energy Inform. 2024, 7, 130. [Google Scholar] [CrossRef]
  25. Ding, M.; Zhou, H.; Xie, H.; Wu, M.; Nakanishi, Y.; Yokoyama, R. A gated recurrent unit neural networks based wind speed error correction model for short-term wind power forecasting. Neurocomputing 2019, 365, 54–61. [Google Scholar] [CrossRef]
  26. Gilles, J. Empirical wavelet transform. IEEE Trans. Signal Process. 2013, 61, 3999–4010. [Google Scholar] [CrossRef]
  27. Hong, S.; Zhou, Z.; Zio, E.; Wang, W. An adaptive method for health trend prediction of rotating bearings. Digit. Signal Process. 2014, 35, 117–123. [Google Scholar] [CrossRef]
  28. Kong, Y.; Wang, T.; Chu, F. Meshing frequency modulation assisted empirical wavelet transform for fault diagnosis of wind turbine planetary ring gear. Renew. Energy 2019, 132, 1373–1388. [Google Scholar] [CrossRef]
  29. Bhattacharyya, A.; Singh, L.; Pachori, R.B. Fourier–Bessel series expansion based empirical wavelet transform for analysis of non-stationary signals. Digit. Signal Process. 2018, 78, 185–196. [Google Scholar] [CrossRef]
  30. Bai, S.; Kolter, J.Z.; Koltun, V. An empirical evaluation of generic convolutional and recurrent networks for sequence modeling. arXiv preprint 2018, arXiv:1803.01271. [Google Scholar]
  31. Wang, Y.; Chen, J.; Chen, X.; Zeng, X.; Kong, Y.; Sun, S.; Guo, Y.; Liu, Y. Short-term load forecasting for industrial customers based on TCN-LightGBM. IEEE Trans. Power Syst. 2020, 36, 1984–1997. [Google Scholar] [CrossRef]
  32. Wang, J.; Chen, Y.; Zhu, S.; Xu, W. Depth feature extraction-based deep ensemble learning framework for high frequency futures price forecasting. Digit. Signal Process. 2022, 127, 103567. [Google Scholar] [CrossRef]
  33. Ke, K.; Hongbin, S.; Chengkang, Z.; Brown, C. Short-term electrical load forecasting method based on stacked auto-encoding and GRU neural network. Evol. Intell. 2019, 12, 385–394. [Google Scholar] [CrossRef]
  34. Wang, B.; Wang, J. Energy futures and spots prices forecasting by hybrid SW-GRU with EMD and error evaluation. Energy Econ. 2020, 90, 104827. [Google Scholar] [CrossRef]
  35. Qin, L.; Li, W.; Li, S. Effective passenger flow forecasting using STL and ESN based on two improvement strategies. Neurocomputing 2019, 356, 244–256. [Google Scholar] [CrossRef]
  36. Wang, L.; Hu, H.; Ai, X.-Y.; Liu, H. Effective electricity energy consumption forecasting using echo state network improved by differential evolution algorithm. Energy 2018, 153, 801–815. [Google Scholar] [CrossRef]
  37. Li, B.; Wu, G.; He, Y.; Fan, M.; Pedrycz, W. An Overview and Experimental Study of Learning-Based Optimization Algorithms for the Vehicle Routing Problem. IEEE/CAA J. Autom. Sin. 2022, 9, 1115–1138. [Google Scholar] [CrossRef]
  38. Kiran, B.R.; Sobh, I.; Talpaert, V.; Mannion, P.; Al Sallab, A.A.; Yogamani, S.; Pérez, P. Deep reinforcement learning for autonomous driving: A survey. IEEE Trans. Intell. Transp. Syst. 2021, 23, 4909–4926. [Google Scholar] [CrossRef]
  39. Mi, X.; Liu, H.; Li, Y. Wind speed prediction model using singular spectrum analysis, empirical mode decomposition and convolutional support vector machine. Energy Convers. Manag. 2019, 180, 196–205. [Google Scholar] [CrossRef]
  40. Rummery, G.A.; Niranjan, M. On-Line Q-Learning Using Connectionist Systems; University of Cambridge: Cambridge, UK, 1994; Volume 37. [Google Scholar]
  41. Wang, L.; He, Y.; Liu, X.; Li, L.; Shao, K. M2TNet: Multi-modal multi-task Transformer network for ultra-short-term wind power multi-step forecasting. Energy Rep. 2022, 8, 7628–7642. [Google Scholar] [CrossRef]
  42. Song, W.; Fujimura, S. Capturing combination patterns of long-and short-term dependencies in multivariate time series forecasting. Neurocomputing 2021, 464, 72–82. [Google Scholar] [CrossRef]
  43. Hae, H.; Kang, S.-J.; Kim, T.O.; Lee, P.H.; Lee, S.-W.; Kim, Y.-H.; Lee, C.W.; Park, S.-W. Machine Learning-Based prediction of Post-Treatment ambulatory blood pressure in patients with hypertension. Blood Press. 2023, 32, 2209674. [Google Scholar] [CrossRef] [PubMed]
  44. Liu, H.; Yu, C.; Yu, C.; Chen, C.; Wu, H. A novel axle temperature forecasting method based on decomposition, reinforcement learning optimization and neural network. Adv. Eng. Inform. 2020, 44, 101089. [Google Scholar] [CrossRef]
  45. Huang, Y.; Deng, Y. A new crude oil price forecasting model based on variational mode decomposition. Knowl.-Based Syst. 2021, 213, 106669. [Google Scholar] [CrossRef]
Figure 1. Structure of the hybrid model.
Figure 2. Raw crude oil price data and their split.
Figure 3. Prediction errors for different predictors across the WTI crude oil price series #1.
Figure 4. Prediction results for different predictors across the WTI crude oil price series #1.
Figure 5. Scatter of the prediction results for SARSA and different heuristic ensemble methods.
Figure 6. Scatter of the prediction results for different decomposition methods.
Figure 7. Prediction results for experimental models across the WTI crude oil price series #1.
Figure 8. Prediction results for experimental models across the WTI crude oil price series #2.
Figure 9. Prediction results for experimental models across the WTI crude oil price series #3.
Figure 10. Raw crude oil price series during geopolitical conflicts.
Figure 11. Prediction results for experimental models.
Table 1. The literature review for different models.

| Methods | Refs. | Models | Year | Advantages | Drawbacks |
|---|---|---|---|---|---|
| Traditional methods | [6] | ARIMA | 2003 | a. Simple framework; b. Quick modeling. | a. Limited nonlinearity capture; b. Requires stationarity. |
| | [7] | GM | 2019 | | |
| Deep learning methods | [12] | LSTM | 2019 | a. Captures complex patterns; b. Learns hierarchical features; c. Adapts to nonlinearity. | a. Single-model accuracy constraints. |
| | [13] | GRU | 2021 | | |
| | [8] | DNN | 2024 | | |
| | [15] | GMDH | 2021 | | |
| | [16] | TCN | 2024 | | |
| Hybrid methods | [18] | EEMD-CNN-ILSTM | 2023 | a. Reduces data complexity through decomposition; b. Adapts to diverse patterns; c. Leverages multiple predictors’ strengths. | a. Suboptimal usage of decomposed data; b. Heuristic methods risk local optima. |
| | [19] | VMD-ANN | 2021 | | |
| | [20] | CEEMD-ML-GRU | 2020 | | |
| | [22] | PSO-based ensemble | 2023 | | |
| | [23] | SCWOA-based ensemble | 2023 | | |
| | [24] | GWO-based ensemble | 2024 | | |
Table 2. Statistical indicators of crude oil price data.

| Crude Oil Price Data | Dataset #1 (USD) | Dataset #2 (USD) | Dataset #3 (USD) |
|---|---|---|---|
| Minimum | 26.21 | 10.79 | 11.22 |
| Mean | 66.05 | 50.21 | 46.86 |
| Maximum | 110.53 | 145.29 | 140.00 |
| Standard deviation | 22.83 | 28.92 | 28.87 |
Table 3. The descriptions for different hybrid methods.

| Name | Description |
|---|---|
| Model 1 | GA-TCN-GRU-ESN |
| Model 2 | BHA-TCN-GRU-ESN |
| Model 3 | GWO-TCN-GRU-ESN |
| Model 4 | SARSA-TCN-GRU-ESN |
| Model 5 | PSO-TCN-GRU-ESN |
| Model 6 | WPD-SARSA-TCN-GRU-ESN |
| Model 7 | VMD-SARSA-TCN-GRU-ESN |
| Model 8 | EWT-SARSA-TCN-GRU-ESN |
| Model 9 | EWT-SARSA-TCN-GRU-ESN-ELM |
Table 4. Evaluation indicators for different predictors across three WTI crude oil price series.

| Series | Predictors | MAE (USD) | MAPE (%) | RMSE (USD) |
|---|---|---|---|---|
| #1 | TCN | 0.7462 | 1.1634 | 1.0092 |
| | GRU | 0.7680 | 1.1910 | 1.0048 |
| | ESN | 0.8262 | 1.2775 | 1.0821 |
| | BPNN | 2.7229 | 4.3405 | 2.9994 |
| | GRNN | 3.6410 | 5.6739 | 4.0624 |
| | RBFNN | 1.3925 | 2.1287 | 1.8051 |
| | LSTM | 0.9536 | 1.4394 | 1.1935 |
| | CNN | 0.8698 | 1.3729 | 1.1409 |
| | GMDH | 1.1592 | 1.8026 | 1.4410 |
| #2 | TCN | 2.2485 | 4.7553 | 3.2635 |
| | GRU | 2.1264 | 4.6699 | 2.8677 |
| | ESN | 2.1818 | 4.7152 | 2.9101 |
| | BPNN | 2.8466 | 6.3487 | 3.6113 |
| | GRNN | 3.5657 | 7.9179 | 4.6912 |
| | RBFNN | 3.5109 | 7.8512 | 4.8379 |
| | LSTM | 2.3411 | 5.0723 | 3.1779 |
| | CNN | 2.3602 | 5.1736 | 3.2895 |
| | GMDH | 2.9150 | 6.5246 | 3.8918 |
| #3 | TCN | 5.6431 | 12.0686 | 7.7571 |
| | GRU | 6.0148 | 13.9647 | 8.5027 |
| | ESN | 5.5273 | 12.0189 | 7.4268 |
| | BPNN | 7.1119 | 14.7113 | 9.2021 |
| | GRNN | 7.7206 | 18.4653 | 10.3241 |
| | RBFNN | 7.3445 | 18.0054 | 10.1406 |
| | LSTM | 6.0466 | 14.1694 | 8.4556 |
| | CNN | 6.2123 | 14.1316 | 8.6501 |
| | GMDH | 7.6269 | 18.3832 | 11.0227 |
Note. MAE (mean absolute error), MAPE (mean absolute percentage error), and RMSE (root mean square error) measure predictive accuracy. Lower values indicate better performance. Series #1, Series #2, and Series #3 refer to three WTI crude oil price datasets with different time spans.
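For reference, these indicators follow their standard definitions (a sketch of the usual formulas, with $y_t$ the observed price, $\hat{y}_t$ the forecast, and $N$ the number of test samples):

$$
\mathrm{MAE}=\frac{1}{N}\sum_{t=1}^{N}\lvert y_t-\hat{y}_t\rvert,\qquad
\mathrm{MAPE}=\frac{100\%}{N}\sum_{t=1}^{N}\left\lvert\frac{y_t-\hat{y}_t}{y_t}\right\rvert,\qquad
\mathrm{RMSE}=\sqrt{\frac{1}{N}\sum_{t=1}^{N}\left(y_t-\hat{y}_t\right)^2}.
$$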
Table 5. Evaluation indicators for different ensemble methods across three WTI crude oil price series.

| Series | Models | Ensemble Methods | MAE (USD) | MAPE (%) | RMSE (USD) |
|---|---|---|---|---|---|
| #1 | Model 1 | GA | 0.7327 | 1.1419 | 0.9950 |
| | Model 2 | BHA | 0.7162 | 1.1142 | 0.9739 |
| | Model 3 | GWO | 0.7218 | 1.1239 | 0.9826 |
| | Model 4 | SARSA | 0.7119 | 1.1082 | 0.9691 |
| #2 | Model 1 | GA | 2.0895 | 4.4896 | 2.7733 |
| | Model 2 | BHA | 2.1079 | 4.5164 | 2.8199 |
| | Model 3 | GWO | 2.0513 | 4.4849 | 2.7391 |
| | Model 4 | SARSA | 2.0144 | 4.3907 | 2.6651 |
| #3 | Model 1 | GA | 5.3891 | 11.8152 | 7.1542 |
| | Model 2 | BHA | 5.4202 | 12.0044 | 7.1597 |
| | Model 3 | GWO | 5.4539 | 11.8065 | 7.2562 |
| | Model 4 | SARSA | 5.2533 | 11.8783 | 6.9218 |
Note. Model 1 = GA-TCN-GRU-ESN; Model 2 = BHA-TCN-GRU-ESN; Model 3 = GWO-TCN-GRU-ESN; Model 4 = SARSA-TCN-GRU-ESN. Series #1, Series #2, and Series #3 refer to three WTI crude oil price datasets with different time spans.
Table 6. Promoting percentages for ensemble methods.

| Models | Indices | Series #1 | Series #2 | Series #3 |
|---|---|---|---|---|
| Model 4 vs. TCN | PMAE (%) | 4.5966 | 10.4114 | 6.9076 |
| | PMAPE (%) | 4.7445 | 7.6672 | 1.5768 |
| | PRMSE (%) | 3.9734 | 18.3111 | 10.7682 |
| Model 4 vs. GRU | PMAE (%) | 7.3047 | 5.2671 | 12.6604 |
| | PMAPE (%) | 6.9521 | 5.9787 | 14.9405 |
| | PRMSE (%) | 3.5529 | 7.0649 | 18.5929 |
| Model 4 vs. ESN | PMAE (%) | 11.8343 | 7.6726 | 4.9572 |
| | PMAPE (%) | 13.2524 | 6.8820 | 1.1698 |
| | PRMSE (%) | 10.4427 | 7.0649 | 6.7997 |
Note. Model 4 = SARSA-TCN-GRU-ESN. This table shows the promoting percentages of evaluation indicators for the SARSA-based ensemble compared to different single prediction methods. Series #1, Series #2, and Series #3 refer to three WTI crude oil price datasets with different time spans.
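The reported promoting percentages are consistent with the usual relative-improvement form; for an indicator $I$ (MAE, MAPE, or RMSE),

$$
P_{I}=\frac{I_{\text{baseline}}-I_{\text{proposed}}}{I_{\text{baseline}}}\times 100\%.
$$

As a worked check against Table 4 and Table 6: for Series #1, the PMAE of Model 4 vs. TCN is $(0.7462-0.7119)/0.7462\times 100\% \approx 4.5966\%$, matching the tabulated value.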
Table 7. Evaluation indicators for different decomposition methods across three WTI crude oil price series.

| Series | Models | Decomposition Methods | MAE (USD) | MAPE (%) | RMSE (USD) |
|---|---|---|---|---|---|
| #1 | Model 4 | - | 0.7119 | 1.1082 | 0.9691 |
| | Model 6 | WPD | 0.2239 | 0.3538 | 0.2671 |
| | Model 7 | VMD | 0.4131 | 0.6228 | 0.5415 |
| | Model 8 | EWT | 0.1728 | 0.2640 | 0.2207 |
| #2 | Model 4 | - | 2.0144 | 4.3907 | 2.6651 |
| | Model 6 | WPD | 0.6911 | 1.5565 | 0.9437 |
| | Model 7 | VMD | 1.1031 | 2.4394 | 1.3172 |
| | Model 8 | EWT | 0.3387 | 0.7625 | 0.4701 |
| #3 | Model 4 | - | 5.2533 | 11.8783 | 6.9218 |
| | Model 6 | WPD | 1.7792 | 4.2214 | 2.4301 |
| | Model 7 | VMD | 3.2493 | 6.6851 | 3.6351 |
| | Model 8 | EWT | 1.6131 | 3.5695 | 1.9352 |
Note. Model 4 = SARSA-TCN-GRU-ESN; Model 6 = WPD-SARSA-TCN-GRU-ESN; Model 7 = VMD-SARSA-TCN-GRU-ESN; Model 8 = EWT-SARSA-TCN-GRU-ESN. Series #1, Series #2, and Series #3 refer to three WTI crude oil price datasets with different time spans.
Table 8. Promoting percentages for decomposition algorithms.

| Models | Indices | Series #1 | Series #2 | Series #3 |
|---|---|---|---|---|
| Model 6 vs. Model 4 | PMAE (%) | 68.5490 | 65.6920 | 66.1318 |
| | PMAPE (%) | 68.0744 | 64.5501 | 64.4612 |
| | PRMSE (%) | 72.4383 | 64.5904 | 64.8920 |
| Model 7 vs. Model 4 | PMAE (%) | 41.9722 | 45.2392 | 38.1474 |
| | PMAPE (%) | 43.8008 | 44.4417 | 43.7201 |
| | PRMSE (%) | 44.1234 | 50.5760 | 47.4833 |
| Model 8 vs. Model 4 | PMAE (%) | 75.7269 | 83.1861 | 69.2936 |
| | PMAPE (%) | 76.1776 | 82.6337 | 69.9494 |
| | PRMSE (%) | 77.2263 | 82.3609 | 72.0420 |
Note. Model 4 = SARSA-TCN-GRU-ESN; Model 6 = WPD-SARSA-TCN-GRU-ESN; Model 7 = VMD-SARSA-TCN-GRU-ESN; Model 8 = EWT-SARSA-TCN-GRU-ESN. This table shows the promoting percentages of evaluation indicators for different decomposition methods compared to the SARSA-based ensemble. Series #1, Series #2, and Series #3 refer to three WTI crude oil price datasets with different time spans.
Table 9. MSE with its components MSE1, MSE2, MSE3.

| Series | Decomposition Methods | MSE1 (TCN) | MSE2 (GRU) | MSE3 (ESN) |
|---|---|---|---|---|
| #1 | WPD | 0.1353 | 0.1619 | 0.1527 |
| | VMD | 0.3731 | 0.4492 | 0.4913 |
| | EWT | 0.0975 | 0.1013 | 0.1434 |
| #2 | WPD | 1.1731 | 1.0752 | 1.5963 |
| | VMD | 2.3832 | 1.8923 | 2.4435 |
| | EWT | 0.9816 | 0.4352 | 0.7369 |
| #3 | WPD | 7.1536 | 6.7949 | 6.8753 |
| | VMD | 15.4031 | 16.7098 | 15.2039 |
| | EWT | 5.0524 | 4.1983 | 4.7721 |
Note. MSE1, MSE2, and MSE3 are the mean square errors of the TCN, GRU, and ESN predictors, respectively, on the decomposed data before the ensemble model is applied.
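Each component is assumed to follow the standard mean square error (a sketch, using the same notation as above, where $\hat{y}_t^{(i)}$ denotes the forecast of predictor $i$ on the decomposed data):

$$
\mathrm{MSE}_i=\frac{1}{N}\sum_{t=1}^{N}\bigl(y_t-\hat{y}_t^{(i)}\bigr)^2,\qquad i\in\{1,2,3\}.
$$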
Table 10. Evaluation indicators for experimental models across three WTI crude oil price series.

| Series | Models | MAE (USD) | MAPE (%) | RMSE (USD) |
|---|---|---|---|---|
| #1 | TCN | 0.7462 | 1.1634 | 1.0092 |
| | GRU | 0.7680 | 1.1910 | 1.0048 |
| | ESN | 0.8262 | 1.2775 | 1.0821 |
| | Model 4 | 0.7119 | 1.1082 | 0.9691 |
| | Model 8 | 0.1728 | 0.2640 | 0.2207 |
| | Model 9 | 0.1239 | 0.1914 | 0.1582 |
| #2 | TCN | 2.2485 | 4.7553 | 3.2635 |
| | GRU | 2.1264 | 4.6699 | 2.8677 |
| | ESN | 2.1818 | 4.7152 | 2.9101 |
| | Model 4 | 2.0144 | 4.3907 | 2.6651 |
| | Model 8 | 0.3387 | 0.7625 | 0.4701 |
| | Model 9 | 0.2529 | 0.5493 | 0.3375 |
| #3 | TCN | 5.6431 | 12.0686 | 7.7571 |
| | GRU | 6.0148 | 13.9647 | 8.5027 |
| | ESN | 5.5273 | 12.0189 | 7.4268 |
| | Model 4 | 5.2533 | 11.8783 | 6.9218 |
| | Model 8 | 1.6131 | 3.5695 | 1.9352 |
| | Model 9 | 1.3224 | 2.7004 | 1.6415 |
Note. Model 4 = SARSA-TCN-GRU-ESN; Model 8 = EWT-SARSA-TCN-GRU-ESN; Model 9 = EWT-SARSA-TCN-GRU-ESN-ELM. Series #1, Series #2, and Series #3 refer to three WTI crude oil price datasets with different time spans.
Table 11. Promoting percentages for the ECM.

| Models | Indices | Series #1 | Series #2 | Series #3 |
|---|---|---|---|---|
| Model 9 vs. Model 8 | PMAE (%) | 28.2986 | 25.3322 | 18.0212 |
| | PMAPE (%) | 27.5000 | 27.9607 | 24.3479 |
| | PRMSE (%) | 28.3190 | 28.2068 | 15.1767 |
Note. Model 8 = EWT-SARSA-TCN-GRU-ESN; Model 9 = EWT-SARSA-TCN-GRU-ESN-ELM. This table shows the promoting percentages of evaluation indicators for the model with the error correction module relative to the model without it. Series #1, Series #2, and Series #3 refer to three WTI crude oil price datasets with different time spans.
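To make the evaluation pipeline concrete, the following is a minimal sketch of how these indicators and promoting percentages can be computed (assuming NumPy arrays of observed and predicted prices; the function names are ours for illustration, not from the authors' implementation):

```python
import numpy as np

def mae(y_true, y_pred):
    # Mean absolute error, in the same unit as the series (USD here).
    return np.mean(np.abs(y_true - y_pred))

def mape(y_true, y_pred):
    # Mean absolute percentage error (%); assumes strictly positive prices.
    return np.mean(np.abs((y_true - y_pred) / y_true)) * 100.0

def rmse(y_true, y_pred):
    # Root mean square error, in USD.
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def promoting_percentage(baseline, improved):
    # Relative improvement of the improved model over the baseline (%).
    return (baseline - improved) / baseline * 100.0

# Worked check against Table 11 (Series #1, PMAE, Model 9 vs. Model 8):
# promoting_percentage(0.1728, 0.1239) evaluates to approximately 28.2986.
print(promoting_percentage(0.1728, 0.1239))
```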
Table 12. Evaluation indicators for experimental models.

| Models | MAE (USD) | MAPE (%) | RMSE (USD) |
|---|---|---|---|
| TCN | 2.3003 | 2.8941 | 3.3531 |
| GRU | 2.9344 | 3.4514 | 5.1989 |
| ESN | 2.4249 | 2.9076 | 4.4565 |
| Liu’s model | 0.4926 | 0.5575 | 1.1428 |
| Mi’s model | 0.5827 | 0.6596 | 1.3803 |
| Huang’s model | 0.4270 | 0.5446 | 0.7404 |
| Model 9 | 0.3510 | 0.4420 | 0.5818 |
Table 13. Computation time for different models.

| Forecasting Models | Training Time, Series #1 (s) | Training Time, Series #2 (s) | Training Time, Series #3 (s) |
|---|---|---|---|
| TCN | 45.32 | 47.81 | 26.93 |
| GRU | 19.21 | 18.71 | 7.34 |
| ESN | 8.13 | 7.25 | 4.15 |
| PSO-TCN-GRU-ESN | 87.75 | 91.54 | 53.51 |
| GA-TCN-GRU-ESN | 79.82 | 81.61 | 45.32 |
| SARSA-TCN-GRU-ESN | 100.23 | 105.36 | 58.86 |
| EWT-SARSA-TCN-GRU-ESN | 1185.91 | 1290.49 | 659.19 |
| EWT-SARSA-TCN-GRU-ESN-ELM | 1194.16 | 1297.93 | 663.46 |