1. Introduction
Metal minerals are an important source of economic income for many metal-exporting countries, such as Chile and Zambia, and thus, fluctuations in metal prices have a significant impact on trade in these countries [1,2,3]. At the same time, price fluctuations also affect business management, raw material supply and investment risk, as metal raw materials are closely tied to many industries and are widely used in industrial production [4]. This is why price forecasting has become an important part of the response to price fluctuations. Forecasting is defined as the art and science of predicting future events [5]. Typically, the forecasting process starts with collecting historical information and extrapolating it into the future using various mathematical models. Accurately predicting the future is essential for effective planning in all areas of business, making forecasting one of the most important tools available to managers.
In strategic mine planning, the price of mineral products is the most important and effective parameter in evaluating engineering projects, such as mines. The feasibility and economics of mining projects are typically assessed using the Net Present Value (NPV) method, which calculates the present value of the economic value of a mining block [6]. Price forecasts are important in assessing the potential for economic extraction of reserves [7]. In addition, forecasts of ore price movements help mining project decision-makers make accurate project decisions, such as whether to proceed with, suspend or limit mining activities [8]. It is worth noting that establishing reliable and stable metal price forecasting models, even with small improvements in forecasting accuracy, can bring significant benefits to metal producers [9,10]. However, the uncertain nature of time series data is a significant barrier to forecasting accuracy. At the same time, metal price data exhibit a high degree of non-linear complexity and are influenced by supply and demand and by financial markets, so effective forecasting methods and high forecasting accuracy remain a challenge [1,9,11].
To address this issue, researchers have proposed a variety of forecasting methods to improve prediction accuracy. On the statistical side, a number of econometric models have been developed to forecast metal prices, such as the Autoregressive Integrated Moving Average (ARIMA) [12,13], Wavelet Analysis [14] and Generalized Autoregressive Conditional Heteroskedasticity (GARCH) models [13]. Kriechbaumer et al. proposed an improved model based on wavelet analysis and ARIMA to predict copper prices; the results showed that the wavelet-enhanced ARIMA model predicted metal prices significantly better than the other methods [14]. However, these time series models can capture the linear behavior of a series but not its specific non-linear patterns, which prevents them from further improving time series forecasts [11,15]. As technology has advanced, Artificial Intelligence (AI) models have been widely studied and used to identify and capture various features in metal price time series [8,12,16,17,18,19]. In terms of metaheuristic algorithms, Eguel et al. used simulated annealing (SA) and genetic algorithms (GAs) with a 10-month price dataset as a training set to predict copper prices, and identified the genetic algorithm as the best model for predicting copper prices [20]. Regarding hybrid intelligence algorithms, Alameer et al. developed a novel hybrid artificial intelligence model (GA-ANFIS) based on a genetic algorithm and an adaptive neuro-fuzzy inference system (ANFIS) for predicting copper prices. Several other models were used for comparison, including GARCH, ARIMA, ANFIS (without optimization) and Support Vector Machine (SVM) models; they concluded that the GA-ANFIS model predicted copper prices better than the others [21]. Hu et al. developed a novel intelligent model based on the ANN-LSTM-GARCH method for predicting copper price fluctuations using machine learning and deep learning methods; by optimizing an artificial neural network (ANN), the prediction accuracy was greatly improved [22]. However, these models also have their own drawbacks; for example, the widely used ANN can lack global search capability and easily fall into local optima [23].
Although there is a large body of literature on metal price forecasting, as described above, the uncertainty of metal price markets is rarely taken into account in forecasting, despite its prevalence. At the same time, in the face of highly volatile metal prices, deterministic forecasting struggles to achieve accurate predictions, which increases the likelihood of mispricing and can lead to irrational decisions or investments. In addition, most price forecasting focuses on single- or multi-point forecasts at deterministic times, while interval forecasting of metal prices has received little attention. The variability of forecasts due to uncertainty can be greatly reduced through interval quantification. At the same time, within the upper and lower limits at the considered significance level, interval forecasts can provide more reliable and stable references for mine managers and decision-makers. This study proposes a novel hybrid system consisting of point forecasting and interval forecasting models, in which the metal price point forecasts are used to support the price interval forecasts. For point forecasting, an innovative and stable hybrid method was developed using Variational Mode Decomposition (VMD) and a Long Short-Term Memory (LSTM) neural network optimized by the Sparrow Search Algorithm (SSA). The VMD decomposes the metal price time series data; each decomposition component is then predicted using the SSA-LSTM model, and the predictions of the components are summed to obtain the point prediction. Finally, based on the best-fitting distribution function and the point forecasting results, a range and confidence level are set and an interval forecasting model is established, with copper and aluminum price data fed into the model to obtain interval forecasting results. Three comparative models (LSTM, VMD-LSTM and SSA-LSTM) are also introduced for comparative experiments.
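The decompose, forecast and sum pipeline described above can be sketched as follows. The per-mode forecaster here is a naive persistence stand-in (a hypothetical placeholder for the SSA-LSTM model), so the sketch shows only the data flow, not the paper's actual method:

```python
import numpy as np

def persistence_forecast(mode, horizon):
    # Hypothetical stand-in for the SSA-LSTM per-mode forecaster:
    # simply repeats the last observed value of the mode.
    return np.full(horizon, mode[-1])

def hybrid_point_forecast(modes, horizon, forecaster=persistence_forecast):
    """Forecast each VMD mode separately and sum the per-mode forecasts,
    mirroring the point-forecasting stage of the hybrid system."""
    return sum(forecaster(np.asarray(m, dtype=float), horizon) for m in modes)
```

In the full system, `forecaster` would be an SSA-optimized LSTM trained on each mode; the summation step is the same.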
The remainder of this paper is organized as follows. Section 2 presents the modelling methodology; Section 3 describes the hybrid prediction model structure and the prediction process; Section 4 presents the model data, parameter settings and evaluation metrics; Section 5 presents the analysis and discussion of the experimental prediction results; and finally, Section 6 presents the overall conclusions and recommendations.
2. Methodology
This section introduces the VMD, SSA and LSTM methods and the uncertainty (interval) prediction approach.
2.1. Variational Mode Decomposition
In 1998, Huang et al. proposed Empirical Mode Decomposition (EMD) to address a shortcoming of traditional decomposition methods (including the wavelet transform and the Fourier transform), which cannot adequately handle nonlinear and non-stationary signals; however, EMD suffers from a serious mode mixing phenomenon. To solve this problem, Konstantin Dragomiretskiy et al. proposed an improved method, Variational Mode Decomposition (VMD), in 2014. Unlike EMD, VMD determines the center frequency and bandwidth of each component by iteratively searching for the optimal solution of a variational model. It is a completely non-recursive method that searches for a set of modal components and their respective center frequencies, with each mode smoothed after demodulation into baseband; experimental results show that the method is more robust to sampling and noise. In effect, the non-periodic signal is analyzed in the frequency domain and the complex signal is decomposed into multiple harmonics [24]. It is currently used in many fields [25,26].
The goal of VMD is to decompose a real-valued input signal $f$ into $K$ discrete sub-signals (modes) $u_k$, assuming that each mode is mostly compact around a center frequency $\omega_k$. The procedure for decomposing $f$ into $K$ sub-sequences is as follows.
Step 1: For each mode $u_k$, the associated analytic signal is calculated using the Hilbert transform and its one-sided spectrum is constructed.
Step 2: The spectrum of each mode is shifted to baseband by mixing with an exponential tuned to its estimated center frequency.
Step 3: The bandwidth is estimated from the Gaussian smoothness of the demodulated signal, i.e., the squared $L^2$-norm of the gradient. The resulting constrained variational problem is as follows:

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k = f,$$

where $u_k$ is the $k$-th modal component, $\omega_k$ is the center frequency corresponding to the $k$-th mode and $\delta(t)$ is the unit impulse function.
Step 4: A quadratic penalty term $\alpha$ and a Lagrange multiplier $\lambda(t)$ are introduced to transform the constrained variational problem into an unconstrained one, giving the augmented Lagrangian

$$L\left(\{u_k\},\{\omega_k\},\lambda\right) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\, f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle,$$

where the quadratic penalty term reduces the interference of Gaussian noise.
Step 5: Finally, the problem is solved using the Alternating Direction Method of Multipliers (ADMM).
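As an illustration of the steps above, the frequency-domain ADMM loop can be sketched in NumPy. This is a hedged, simplified sketch (the mirror extension and one-sided analytic spectrum of the reference algorithm are omitted, and `alpha`, `tau` and the initial center frequencies are assumed defaults), not the authors' implementation:

```python
import numpy as np

def vmd(signal, K=2, alpha=2000.0, tau=0.1, n_iter=500, tol=1e-7):
    """Simplified VMD: ADMM updates carried out in the frequency domain."""
    N = len(signal)
    f_hat = np.fft.fft(signal)
    freqs = np.fft.fftfreq(N)                 # normalized frequencies
    u_hat = np.zeros((K, N), dtype=complex)   # mode spectra
    omega = np.linspace(0.05, 0.45, K)        # initial center frequencies (assumed)
    lam = np.zeros(N, dtype=complex)          # Lagrange multiplier spectrum
    half = slice(0, N // 2)                   # positive-frequency half
    for _ in range(n_iter):
        omega_prev = omega.copy()
        for k in range(K):
            # Wiener-filter update of mode k around its center frequency
            residual = f_hat - u_hat.sum(axis=0) + u_hat[k] + lam / 2
            u_hat[k] = residual / (1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Center frequency: power-weighted mean over positive frequencies
            power = np.abs(u_hat[k, half]) ** 2
            omega[k] = np.sum(freqs[half] * power) / (np.sum(power) + 1e-12)
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))   # dual ascent step
        if np.max(np.abs(omega - omega_prev)) < tol:
            break
    modes = np.real(np.fft.ifft(u_hat, axis=1))
    return modes, np.sort(omega)
```

On a synthetic two-tone signal, the recovered center frequencies converge to the tone frequencies, which is the behavior the constrained variational problem is designed to produce.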
2.2. Sparrow Search Algorithm
The Sparrow Search Algorithm (SSA) is a novel swarm intelligence optimization algorithm inspired by the foraging and anti-predation behavior of sparrows, proposed by Xue et al. in 2020 [27]. It converges quickly and has strong search ability, and thus, this paper uses the sparrow search algorithm to optimize the LSTM parameters and improve its prediction accuracy.
In SSA, individuals are divided into three categories: finders, followers and vigilantes, and the position of each individual corresponds to a solution. The algorithm obtains the position of the optimal solution by continuously updating the positions of these three categories of individuals and recalculating the fitness of all individuals in each cycle. The main iteration steps are shown below.
Step 1: Initialize the population, the proportions of finders and followers and the number of iterations.
Step 2: Calculate the fitness values and sort them.
Step 3: Update the finder positions (1).
Step 4: Update the follower positions (2).
Step 5: Update the vigilante positions (3) (sparrows aware of danger).
Step 6: Calculate the fitness values and update the sparrow positions.
Step 7: If the stopping criterion is met, output the result; otherwise, repeat steps 2–6.
- (1)
Finder position update.
The finder checks for predators in the foraging area; if there are none, it searches extensively for food, and if there are predators, it flies to a safe area. The update expression is as follows:

$$X_{i,j}^{t+1} = \begin{cases} X_{i,j}^{t} \cdot \exp\!\left(\dfrac{-i}{\alpha \cdot T_{\max}}\right), & R_2 < ST \\ X_{i,j}^{t} + Q \cdot L, & R_2 \geq ST \end{cases}$$

where $t$ is the current iteration number; $X_{i,j}^{t}$ is the position of the $i$th sparrow in the $j$th dimension at generation $t$; $T_{\max}$ is the maximum number of iterations; $\alpha$ is a random number in $(0, 1]$; $Q$ is a random number obeying a normal distribution; $L$ is an all-ones matrix; $R_2$ is the alarm value; and $ST$ is the safety threshold.
- (2)
Follower position update.
When a follower joins, it determines whether it is eligible to compete with the finder for food, i.e., whether its position is good enough. If its fitness is too low, it is not eligible to compete and must fly to another area to forage; otherwise, the follower forages in the vicinity of the best individual. The update expression is as follows:

$$X_{i,j}^{t+1} = \begin{cases} Q \cdot \exp\!\left(\dfrac{X_{worst}^{t} - X_{i,j}^{t}}{i^{2}}\right), & i > n/2 \\ X_{P}^{t+1} + \left| X_{i,j}^{t} - X_{P}^{t+1} \right| \cdot A^{+} \cdot L, & \text{otherwise} \end{cases}$$

where $X_{worst}^{t}$ denotes the position of the worst-adapted individual at generation $t$; $X_{P}^{t+1}$ denotes the position of the best-adapted individual at generation $t+1$; and $A$ is a matrix of the same dimension as $L$ whose elements are randomly assigned 1 or −1 and which satisfies $A^{+} = A^{T}\left(AA^{T}\right)^{-1}$.
- (3)
Vigilante position update.
When individuals are at the periphery of the population, they need to adopt anti-predatory behavior to achieve a higher degree of adaptation; when they are at the center of the population, they need to move closer to their peers to stay away from the danger zone. The update expression is as follows:

$$X_{i,j}^{t+1} = \begin{cases} X_{best}^{t} + \beta \cdot \left| X_{i,j}^{t} - X_{best}^{t} \right|, & f_i > f_g \\ X_{i,j}^{t} + K \cdot \left( \dfrac{\left| X_{i,j}^{t} - X_{worst}^{t} \right|}{(f_i - f_w) + \varepsilon} \right), & f_i = f_g \end{cases}$$

where $X_{best}^{t}$ is the global optimal position at generation $t$; $\beta$ is a step-size control parameter drawn from a standard normal distribution; $K$ is a random number in $[-1, 1]$; $\varepsilon$ is a small constant that avoids the denominator being zero; and $f_g$ and $f_w$ are the current best and worst fitness values, respectively, with $f_i$ the fitness of the $i$th sparrow.
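The three update rules listed above can be sketched compactly for minimization. This is a hedged sketch with simplified handling of the $A^{+} \cdot L$ term, the alarm comparison and $\varepsilon$; all parameter choices are illustrative assumptions, not the settings used in the paper:

```python
import numpy as np

def sparrow_search(obj, dim=2, n=30, iters=100, pd_frac=0.2, sd_frac=0.1,
                   st=0.8, lb=-5.0, ub=5.0, seed=0):
    """Minimize obj over [lb, ub]^dim with a simplified Sparrow Search Algorithm."""
    rng = np.random.default_rng(seed)
    x = rng.uniform(lb, ub, (n, dim))
    fit = np.array([obj(row) for row in x])
    best_x, best_f = x[np.argmin(fit)].copy(), float(fit.min())
    n_pd = max(1, int(n * pd_frac))        # number of finders (producers)
    n_sd = max(1, int(n * sd_frac))        # number of danger-aware vigilantes
    for _ in range(iters):
        order = np.argsort(fit)            # best individual first
        x, fit = x[order], fit[order]
        worst = x[-1].copy()
        # (1) finder update: wide search if safe, flee on alarm
        if rng.random() < st:
            for i in range(n_pd):
                x[i] = x[i] * np.exp(-i / (rng.random() * iters + 1e-12))
        else:
            for i in range(n_pd):
                x[i] = x[i] + rng.normal() * np.ones(dim)
        # (2) follower update: poor followers fly off, good ones track the best finder
        for i in range(n_pd, n):
            if i > n / 2:
                x[i] = rng.normal() * np.exp((worst - x[i]) / (i ** 2))
            else:
                x[i] = x[0] + np.abs(x[i] - x[0]) * rng.choice([-1.0, 1.0], size=dim)
        # (3) vigilante update: random individuals move relative to the best
        for i in rng.choice(n, size=n_sd, replace=False):
            if fit[i] > fit[0]:
                x[i] = x[0] + rng.normal() * np.abs(x[i] - x[0])
            else:
                x[i] = x[i] + rng.uniform(-1, 1) * np.abs(x[i] - worst) / (fit[i] - fit[-1] - 1e-12)
        x = np.clip(x, lb, ub)
        fit = np.array([obj(row) for row in x])
        if fit.min() < best_f:             # keep the best-ever solution
            best_f = float(fit.min())
            best_x = x[np.argmin(fit)].copy()
    return best_x, best_f
```

In this paper's context, `obj` would be a validation error computed by training an LSTM with the candidate hyperparameters.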
2.3. Long Short-Term Memory Neural Network
Hidden variable models have long suffered from problems of long-term information preservation and short-term input deficits. The LSTM is a modified Recurrent Neural Network (RNN) proposed by Hochreiter et al. in 1997, consisting of an input layer, a hidden layer, a recurrent layer and an output layer [28]. The structure of the Long Short-Term Memory neural network is shown in Figure 1.
To solve the long-term dependency problem of RNNs on historical data, memory cell states are added in the hidden layer, enhancing long-term time series processing and alleviating the problems of vanishing and exploding gradients [29,30].
Control units are created in the hidden layer as a forget gate $f_t$, an input gate $i_t$ and an output gate $o_t$. The forget gate selectively discards information from the cell state; the input gate selectively records new information into the cell state and controls how much of the new candidate memory $\tilde{C}_t$ is used; the output gate passes the stored information to the next neuron. The expressions for each variable are as follows:

$$\begin{aligned} f_t &= \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right) \\ i_t &= \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right) \\ \tilde{C}_t &= \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right) \\ C_t &= f_t * C_{t-1} + i_t * \tilde{C}_t \\ o_t &= \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right) \\ h_t &= o_t * \tanh\left(C_t\right) \end{aligned}$$

where $\sigma$ is the sigmoid activation function; $W_f$, $W_i$, $W_C$ and $W_o$ are the weight matrices of the respective gates; $b_f$, $b_i$, $b_C$ and $b_o$ are the bias parameters of the corresponding gates; and $x_t$, $h_t$ and $C_t$ are the cell input, output and state at the current time step.
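The gate computations above can be written as a single forward step in NumPy. This is a generic LSTM cell sketch, not the paper's trained network; the dictionary keys and shapes are assumptions made for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM forward step; gates act on the concatenation [h_prev, x_t].

    W and b hold one weight matrix / bias vector per gate,
    keyed "f" (forget), "i" (input), "c" (candidate), "o" (output).
    """
    z = np.concatenate([h_prev, x_t])
    f = sigmoid(W["f"] @ z + b["f"])         # forget gate f_t
    i = sigmoid(W["i"] @ z + b["i"])         # input gate i_t
    c_tilde = np.tanh(W["c"] @ z + b["c"])   # candidate memory
    c = f * c_prev + i * c_tilde             # new cell state C_t
    o = sigmoid(W["o"] @ z + b["o"])         # output gate o_t
    h = o * np.tanh(c)                       # new hidden state h_t
    return h, c
```

Iterating this step over a price sequence and reading out `h` at the final step is the basis of the per-mode forecaster in the hybrid model.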
2.4. Interval Prediction Theory
In recent years, many researchers have produced interval predictions of time series data based on point prediction results and the most appropriate distribution function to capture the trend characteristics of the time series under study, such as airborne particulate matter and wind speed [31,32,33,34,35].
This forecasting method first requires fitting the time series data with a distribution function, mining its relevant features and finding the most appropriate distribution function. This section introduces five distribution functions (DFs), namely the Weibull, Logistic, Lognormal, Normal and Gamma distributions, to represent metal price states. The associated probability density functions and cumulative distribution functions are shown in Table 1.
Point forecast values and distribution functions are then used to provide uncertainty information about future values. A dynamic interval forecasting method is proposed that gives uncertainty information about future prices by updating the expectation at the next point using the forecast value. For example, suppose the forecast value $\hat{x}_{t+1}$ of the next point almost reaches the maximum of the historical data; if the forecasts are reliable and accurate, then the lower and upper limits of the interval at the next point are also very large values. At the significance level $\alpha$, the relationship between the upper limit $U_{t+1}$, the lower limit $L_{t+1}$ and the real value $x_{t+1}$ can be expressed by Equation (8):

$$P\left( L_{t+1} \leq x_{t+1} \leq U_{t+1} \right) = 1 - \alpha \tag{8}$$

In this paper, price values are treated as random variables whose estimates are considered as expectations of future points. Equation (8) can then be written as Equation (9):

$$P\left( L_{t+1} \leq x_{t+1} \leq U_{t+1} \,\middle|\, E[x_{t+1}] = \hat{x}_{t+1} \right) = 1 - \alpha \tag{9}$$

In addition, we assume that the distribution of the predicted values is similar to the historical distribution $f$. Therefore, once the distribution function of the original time series has been determined, the historical variance $S^2$ can be used as the variance at the unknown points. The conditional probability is then equal to Equation (10), from which the upper and lower bounds can be calculated at a given confidence level:

$$\int_{L_{t+1}}^{U_{t+1}} f\left( x;\, \hat{x}_{t+1},\, S^{2} \right) dx = 1 - \alpha \tag{10}$$

Equation (10) can also be written as Equation (11):

$$L_{t+1} = F^{-1}\!\left( \tfrac{\alpha}{2} \right), \qquad U_{t+1} = F^{-1}\!\left( 1 - \tfrac{\alpha}{2} \right) \tag{11}$$

where $F^{-1}$ is the inverse cumulative distribution function with mean $\hat{x}_{t+1}$ and variance $S^{2}$.
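Under a normality assumption, the bounds in Equation (11) reduce to normal quantiles around the point forecast. A minimal sketch using only the Python standard library (the function name and defaults are illustrative):

```python
from statistics import NormalDist

def prediction_interval(point_forecast, hist_std, alpha=0.05):
    """[lower, upper] bounds such that P(lower <= x <= upper) = 1 - alpha,
    assuming the future value is normally distributed around the point
    forecast with the historical standard deviation."""
    dist = NormalDist(mu=point_forecast, sigma=hist_std)
    return dist.inv_cdf(alpha / 2), dist.inv_cdf(1 - alpha / 2)
```

For a different best-fit DF (e.g., Weibull), the same construction applies with that distribution's inverse CDF.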
4. Hybrid Predictive Model Applications
This section is divided into three parts: data introduction, parameter setting and model evaluation metrics, as described below.
4.1. Introduction to the Data
The data source for this study is the listed prices of copper and aluminum metal (in RMB) on the Shanghai Futures Exchange, and the daily prices of copper and aluminum metal are used to test the hybrid forecasting model developed. The time interval selected was from 5 January 2012 to 28 November 2022. Specifically, these datasets consist of two subsets: the training dataset (in-sample) ranging from 5 January 2012 to 21 August 2019 (70%), which is used to build the forecasting model; and the testing dataset (out-of-sample) ranging from 22 August 2019 to 28 November 2022 (30%), which is used to validate the performance of the designed model. Details of the total number of samples, the number of training samples, the number of test samples and the maximum, minimum, mean and standard deviation of the samples for the experimental dataset are shown in Table 2.
4.2. Parameter Settings
When the VMD method is used to decompose the original metal price data, the number of modes k has a great influence on the decomposition. When k is 7, the corresponding center frequencies are well dispersed, and thus k is set to 7.
The neural network of the single LSTM model is constructed with a double hidden layer structure, with the number of neurons H in each hidden layer being 20, the number of training epochs E being 200 and the learning rate taken as 0.005.
The VMD–SSA–LSTM hybrid model was set to have a population size of 50 sparrows and a maximum number of iterations M of 10; 20% of the population were finders, with the remainder being followers. The safety threshold is 0.8: when the alarm value is less than 0.8, there is no predator; otherwise, there is a predator dangerous to the population, and the sparrows need to go elsewhere to feed. The search ranges for the LSTM parameters optimized by the sparrow search algorithm, namely the number of neurons H, the number of training epochs E and the learning rate, are [10, 200], [10, 200] and [0.001, 0.02], respectively.
4.3. Evaluation Indicators
To evaluate the predictive capability of the proposed hybrid system, seven evaluation metrics are introduced in this paper. These include five metrics measuring point prediction performance [36,37], the Mean Absolute Error (MAE), Root Mean Square Error (RMSE), Mean Absolute Percentage Error (MAPE), Nash–Sutcliffe efficiency coefficient (NSE) and coefficient of determination R2, and two generic metrics reflecting interval prediction capability [38,39], the interval forecast normalized average width (IFNAW) and interval forecast coverage probability (IFCP). The detailed formulas and definitions of these performance metrics are shown in Table 3.
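The seven metrics can be computed directly. The formulas below follow the standard definitions of these measures (MAPE in percent, IFNAW normalized by the range of the observations), which is an assumption where Table 3 is not reproduced here:

```python
import numpy as np

def point_metrics(y, yhat):
    """MAE, RMSE, MAPE (%), NSE and R^2 for point forecasts."""
    y, yhat = np.asarray(y, float), np.asarray(yhat, float)
    e = y - yhat
    mae = np.mean(np.abs(e))
    rmse = np.sqrt(np.mean(e ** 2))
    mape = np.mean(np.abs(e / y)) * 100.0           # assumes no zero prices
    nse = 1.0 - np.sum(e ** 2) / np.sum((y - y.mean()) ** 2)
    r2 = np.corrcoef(y, yhat)[0, 1] ** 2
    return {"MAE": mae, "RMSE": rmse, "MAPE": mape, "NSE": nse, "R2": r2}

def interval_metrics(y, lower, upper):
    """Coverage probability (IFCP) and normalized average width (IFNAW)."""
    y = np.asarray(y, float)
    ifcp = np.mean((y >= lower) & (y <= upper))
    ifnaw = np.mean(np.asarray(upper) - np.asarray(lower)) / (y.max() - y.min())
    return {"IFCP": ifcp, "IFNAW": ifnaw}
```

A larger IFCP with a smaller IFNAW indicates a better interval forecast, matching the criterion used in Section 5.2.2.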
5. Analysis of Experimental Results
In this section, two experiments are conducted: metal price point prediction and interval prediction. To demonstrate the forecasting capability of the hybrid model, three benchmark models, LSTM, SSA–LSTM and VMD–LSTM, are introduced for comparison with the proposed VMD–SSA–LSTM, thus illustrating the effectiveness and forecasting capability of the proposed hybrid system.
5.1. Point Forecasting of Metal Prices
In this section, an innovative hybrid model combining Variational Mode Decomposition (VMD) and SSA–LSTM, namely VMD–SSA–LSTM, is used for point forecasting of copper and aluminum prices. To demonstrate that the proposed hybrid VMD–SSA–LSTM model has good forecasting capability, three comparative models, LSTM, SSA–LSTM and VMD–LSTM, are also developed. A comparison of the price forecast and true value trends of the four models is shown in Figure 2, and a comparison of the forecast scatter is shown in Figure 3. In addition, five evaluation metrics, MAE, RMSE, MAPE, NSE and R2, are used to reflect the prediction level of the models. The detailed error values for the proposed hybrid and comparative models are shown in Table 4, where the bolded values are the best values for each evaluation indicator.
As can be seen from Table 4, the VMD–SSA–LSTM model has a clear advantage in predicting metal prices over the comparative models. For copper prices, MAE is reduced by 14.30%, 9.85% and 9.50%, respectively, and RMSE by 23.34%, 12.75% and 7.08%, respectively; MAPE decreases by 0.24%, 0.13% and 0.21%, respectively; R2 increases by 0.82%, 0.28% and 0.22%, and NSE by 0.43%, 0.19% and 0.10%. For aluminum prices, MAE decreases by 59.68%, 18.53% and 18.93%, respectively, and RMSE by 61.72%, 26.29% and 23.17%, respectively, with MAPE decreasing by 1.22%, 0.19% and 0.23%, respectively; R2 increases by 0.13%, 0.45% and 0.52%, respectively, and NSE by 3.38%, 0.47% and 0.39%, respectively.
Figure 2 reflects the comparison between the predicted and true price trends of the four models. It can be seen from Figure 2 that the LSTM model has poor prediction results due to its large deviation from the actual values, while the SSA–LSTM model has the second-best results, and the VMD–SSA–LSTM and VMD–LSTM models have better results due to their higher overlap with the measured values, although there is no significant difference between these two models. Furthermore, according to the scatter plot in Figure 3, the VMD–SSA–LSTM model has the best convergence and slightly outperforms the VMD–LSTM and SSA–LSTM models in terms of peak prediction accuracy and fluctuation.
In summary, the VMD–SSA–LSTM model has the highest prediction accuracy. It can also be seen that prediction accuracy is much higher after VMD decomposition than without it, which indicates that VMD decomposition effectively reduces noise in the original time series and extracts the complex and effective information implied in the price data, so that, to a certain extent, the intrinsic mechanism of prices is better reflected. This demonstrates the importance of VMD decomposition in time series forecasting. The SSA optimizes the LSTM model parameters and improves the efficiency of parameter selection, showing that the hybrid VMD–SSA–LSTM forecasting model proposed in this paper is effective.
5.2. Interval Forecasting of Metal Prices
5.2.1. Distribution Fitting
Numerous scholars have introduced a variety of distribution functions (DFs) to describe time series in different fields [31,40]. The coefficient of determination (0 ≤ R2 ≤ 1) is used to assess the fit of these DFs: the larger the R2, the better the fit.
In this section, five DFs, Weibull, Logistic, Lognormal, Normal and Gamma, are used to represent metal price states. The associated probability density functions and cumulative distribution functions are shown in Table 1. Calculating the parameters of the DFs is an important step in this section. The traditional method for estimating these parameters is maximum likelihood estimation (MLE). However, the parameters can also be calculated using intelligent optimization algorithms that maximize R2 as the objective function, ultimately improving the fit of the DFs.
In this paper, a Sparrow Search Algorithm (SSA) is used to optimize the parameters of interest, and MLE is used as a comparative method to illustrate the excellent optimization performance of SSA.
Table 5 shows the parameters of the five DFs estimated using both the MLE and SSA methods. These parameters describe the scales and translations of the DFs. Using the coefficient of determination R2 as an evaluation indicator of distribution-fitting ability, the resulting values are shown in Table 6. It can be seen that the values of R2 calculated based on MLE and SSA differ, with the R2 for SSA higher than that of MLE, indicating that SSA gives a better estimate in all cases and further confirming that SSA is more effective than MLE. We can also see that the Weibull distribution function is the most appropriate distribution function in most cases. It can also be seen from Table 5 that the parameter values obtained using the two methods differ but are all within one order of magnitude.
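The R2-maximizing parameter estimation described above can be sketched for the normal DF. A plain random search stands in for the SSA optimizer here, and the function names and search ranges are illustrative assumptions:

```python
import numpy as np
from statistics import NormalDist

def cdf_r2(params, data):
    """R^2 between the empirical CDF and a normal CDF with the given params."""
    mu, sigma = params
    x = np.sort(np.asarray(data, float))
    ecdf = np.arange(1, len(x) + 1) / len(x)
    model = np.array([NormalDist(mu, sigma).cdf(v) for v in x])
    ss_res = np.sum((ecdf - model) ** 2)
    ss_tot = np.sum((ecdf - ecdf.mean()) ** 2)
    return 1.0 - ss_res / ss_tot

def fit_normal_by_search(data, n_trials=200, seed=0):
    """Fit (mu, sigma) by maximizing the R^2 of the CDF fit.

    Random search is a stand-in for the SSA optimizer; the moment
    estimates serve as the starting point (the MLE-style baseline).
    """
    rng = np.random.default_rng(seed)
    mu0, s0 = float(np.mean(data)), float(np.std(data))
    best, best_r2 = (mu0, s0), cdf_r2((mu0, s0), data)
    for _ in range(n_trials):
        cand = (mu0 + rng.normal(0.0, s0), abs(s0 * (0.5 + rng.random())))
        r2 = cdf_r2(cand, data)
        if r2 > best_r2:
            best, best_r2 = cand, r2
    return best, best_r2
```

The same objective applies to the other four DFs by swapping in their CDFs.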
5.2.2. Interval Forecasts
Unlike point forecasts, interval forecasts provide upper and lower bounds on observations and construct prediction intervals at a given significance level. They can provide investors in metal financial markets with more information about uncertainty to help them analyze metal market conditions. Based on the point forecasting results of the hybrid model (VMD–SSA–LSTM), the best-fit distribution function for metal prices (the normal distribution) and the interval prediction method [31,32,33,34,35], interval forecasts of metal prices can be made at a given significance level α. In addition, the two evaluation metrics listed in Table 3 (IFNAW and IFCP) are used in this section to evaluate the performance of interval prediction. It should be noted that optimal interval prediction satisfies the following conditions: at a given significance level α, the larger the IFCP (0 ≤ IFCP ≤ 1) and, at the same time, the smaller the IFNAW, the better the predictive performance of the interval prediction.
Table 7 presents the metal price interval forecasting results for five different significance levels. Based on these results, it is evident that the interval forecasts vary in accuracy as the interval width changes with the significance level; for example, the interval forecasts for copper prices achieve 84.32% (IFCP) and 0.0383 (IFNAW) at a significance level of 0.1, and 100% (IFCP) and 0.0454 (IFNAW) at a significance level of 0.05.
For a more visual presentation of the interval forecasts, Figure 5 shows the metal price interval forecasts at three significance levels. As can be seen from the graph, the dots represent the actual values and the colored areas represent the forecast intervals. The interval forecasts clearly perform well, with a large number of actual values falling in the shaded areas. In particular, for point forecasts of a time series, errors are often larger at peaks and troughs where volatility is greater; interval forecasting can improve forecast reliability in these areas. It is also worth noting that the performance of interval forecasting depends, to a large extent, on the quality of the point forecasting results.
6. Conclusions
This paper presents a hybrid forecasting system for metal price time series. For point forecasting, a new hybrid model for metal price forecasting is built based on Variational Mode Decomposition, the Sparrow Search Algorithm and a Long Short-Term Memory (LSTM) neural network. Tests against other forecasting models showed a reduction of around 10% in the Mean Absolute Error (MAE), indicating that the hybrid forecasting model outperforms the comparative models. In addition, the point forecasting results fully illustrate that the model makes good use of the VMD technique for data noise reduction and of the SSA for optimizing the LSTM neural network parameters to improve the accuracy of metal price forecasting. In terms of interval prediction, five distribution functions are first introduced, the distribution characteristics of metal prices are analyzed and the parameters of the distribution functions are optimized using the SSA, which improves the distribution-fitting ability and shows strong optimization performance, with the normal distribution function fitting best and giving the highest coefficient of determination. The interval prediction results for metal prices were then obtained based on the best point prediction results and the optimal distribution function. The numerical results show a good interval prediction effect at different significance levels, with higher IFCP values and smaller IFNAW values; the majority of the actual values lie within the prediction interval, and the interval coverage can basically reach over 85%.
In conclusion, this study has developed a combined point and interval forecasting model for copper and aluminum prices. The model shows better accuracy and robustness in its forecasting results and capability, which can provide useful references for futures investors and policy-makers. At the same time, the authors believe that the predictions of this model should be applied in engineering practice; the next step in this research is to use the predicted prices as dynamic parameters in the dynamic cut-off grade determination of metal mines, providing theoretical guidance for the future production planning of mining enterprises. In addition, the single-step forecasting approach has some limitations, and further research will be carried out on multi-step forecasting.