Article

Research on Agricultural Product Price Prediction Based on Improved PSO-GA

1 School of Information Science and Engineering, Shandong Agricultural University, Tai’an 271018, China
2 Key Laboratory of Huanghuaihai Smart Agricultural Technology, Ministry of Agriculture and Rural Affairs, Tai’an 271018, China
3 Agricultural Big Data Research Center, Shandong Agricultural University, Tai’an 271018, China
* Authors to whom correspondence should be addressed.
Appl. Sci. 2024, 14(16), 6862; https://doi.org/10.3390/app14166862
Submission received: 6 June 2024 / Revised: 16 July 2024 / Accepted: 26 July 2024 / Published: 6 August 2024

Abstract: The accurate prediction of scallion prices can not only optimize supply chain management and help related practitioners and consumers to make more reasonable purchasing decisions, but also provide guidance for farmers’ planting choices, thus enhancing market efficiency and promoting the sustainable development of the whole industry. This study adopts the idea of decomposition–denoising–aggregation, using three decomposition and denoising techniques combined with three single prediction models to form the base models. The base models are divided into different combinations according to whether their computational structures are the same, and the optimal weights of each combination are determined by the improved particle swarm optimization–genetic algorithm (PSO-GA). The experimental results show that the scallion price in Shandong Province from 2014 to 2023 shows an overall upward trend, with a cyclical and seasonal fluctuation pattern of “high in winter and low in summer”; the semi-heterogeneous-PSO-GA model reduces the MAPE by 49.03% and improves the directional accuracy by 41.52%, compared to the optimal single prediction model, ARIMA. In summary, the combined model has the most accurate prediction and strong robustness, and it can provide ideas and references for the difficult problem of determining the optimal weights of combined models in the field of agricultural product price prediction.

1. Introduction

As the main production area of Chinese scallions, Shandong Province has an important influence on scallion price fluctuations for all participants in the agricultural industry chain [1,2,3]. When there is an oversupply, prices often fall low enough to hurt farmers and lead to stockpiling, while when supply is short, goods become scarce and expensive and market prices inflate [4,5,6]. This mismatch between production and market demand leads to high price volatility [7,8,9]. However, in the current market environment, the lack of effective price forecasting technology makes it difficult for farmers to judge the scale of planting, for supply chain managers to plan inventory and logistics effectively, and for the government to formulate relevant industry policies [10,11,12]. The price of scallions is therefore a key factor in the development of the industry. Accordingly, accurate forecasting of scallion prices helps many participants understand market trends, make rational decisions, and promote the balance of supply and demand, as well as the sustainable development of the whole industry.
In the field of agricultural product price forecasting, extensive research has been carried out at home and abroad. Previous methods for forecasting agricultural commodity prices can be roughly categorized into three types: traditional methods, machine learning methods, and hybrid methods [13]. Among the traditional methods, commonly used forecasting models include linear regression, the autoregressive integrated moving average (ARIMA) [14], and exponential smoothing (ES) [15], which are widely used in agricultural price forecasting. However, these methods often exhibit problems such as unstable prediction accuracy when dealing with non-stationary, nonlinear data, such as the scallion price series [16]. With the rapid development of artificial intelligence technology, machine learning methods have gradually emerged in agricultural price prediction [17,18]. These include the multilayer perceptron (MLP) [19], Long Short-Term Memory (LSTM) [20], and gated recurrent units (GRUs) [21]. However, a single model often struggles to fully capture the complexity of the data [22] and is prone to overfitting. In order to overcome the limitations of single models, combined modeling has gradually become a popular method for agricultural price prediction [23,24]; in the forecasting literature, the terms combined and hybrid methods refer to the same class of techniques. By integrating the prediction results of multiple single models, combined modeling aims to find the best combination for a specific prediction problem. For determining the weights of the combined model, the traditional simple statistical weighting methods [25,26] have gradually been replaced by intelligent optimization algorithms, such as the Artificial Bee Colony (ABC) [27], particle swarm optimization (PSO) [28], and genetic algorithms (GAs) [29], because of their inability to effectively utilize the characteristics of the base models [30]. However, for problems such as determining the weights of a combined model, a single optimization algorithm tends to fall into local optimal solutions. Therefore, to solve the above problems, this paper proposes an optimized PSO-GA weight assignment method [31], which applies elite learning [32], uniform mutation, and arithmetic crossover in a serial hybrid of PSO and the GA, making full use of the global search ability of particle swarm optimization and the population diversity of genetic algorithms to achieve complementary strengths [33] and form optimal weights in the combined model.
The scallion price series is a nonlinear and complex mixture, its data features are not obvious, and it is often difficult to directly utilize the raw data for forecasting to achieve the desired accuracy [34]. Some studies have shown that by extracting the effective features in the original data and reconstructing them into new series through decomposition denoising techniques, and then using them as inputs for a single model, the forecasting accuracy and the ability to capture trend changes can be significantly improved [35,36,37]. For example, some scholars combined empirical modal decomposition (EMD) with ARIMA for electricity price prediction, which enabled the model to better capture the nonlinear fluctuations of prices, and found that the prediction effect was significantly better than that of a single ARIMA [38]. Additionally, some scholars have combined singular spectrum analysis (SSA) with a wavelet neural network (WNN) for training on denoised data; this combination can better capture the complex patterns and trends in price series and has been used for predicting crude oil and gold, which are characterized by high price uncertainty [39]. Other scholars combine variational modal decomposition (VMD) with LSTM, which not only improves the robustness of the prediction model, but also enhances the model’s ability to recognize long-term price trends, forming a more advantageous crude oil price prediction model [40]. Therefore, combining advanced decomposition and denoising techniques with an accurate single prediction model is an effective way to analyze and predict the price dynamics of agricultural products.
Compared with large agricultural products, the prices of small agricultural products tend to change drastically, with obvious cyclical and seasonal characteristics [41], and there are almost no previous case studies on optimizing combination-model weights for the price of scallions in Shandong Province. Therefore, in order to accurately predict agricultural product prices and further improve forecasting performance, this study proposes a novel optimized forecasting combination framework based on the improved PSO-GA method for predicting the price of scallions in Shandong Province. Adopting the idea of decomposition–denoising–aggregation, this study combines the three decomposition and denoising techniques of SSA, EMD, and VMD with the three single prediction models of ARIMA, LSTM, and GRUs, respectively; classifies all the combined models into three major categories based on homogeneous, heterogeneous, and semi-heterogeneous strategies; and applies the averaging method, the median method, and the improved PSO-GA algorithm to optimize the weights of each combined model, in order to find the optimal combination of forecasting models and to promote research on price forecasting for small agricultural products and the optimization of combined-model weights.

2. Literature Review

The following paragraphs review the literature from three aspects: the price volatility patterns of small agricultural products in China, how combined models select their base models, and how the weights between base models are determined.
First of all, in the field of price forecasting in small agricultural products, previous researchers have conducted many studies in the field of price fluctuation law, which helped us to determine the fluctuation law of scallion price and then determine the forecasting model. For example, a researcher [41] conducted an in-depth analysis of the characteristics of garlic price fluctuations, revealing the seasonality and cyclicality in price fluctuations, and these fluctuation characteristics may also exist in scallion prices. One scholar [42] has studied combined model prediction for ginger prices by analyzing the factors affecting ginger price, studying the fluctuation law in ginger price, and constructing a ginger price prediction model based on a Prophet-SVM combined model to predict the trend of ginger price, providing an important reference for us to understand the price fluctuation factors and predict the price of green scallion. Other research results show [43] that all small agricultural products have significant seasonality, cyclicality, and trends. The results of various studies show that the price fluctuations of small agricultural products are regular and can be predicted by modeling.
Secondly, the decomposition, reconstruction, and single-model forecasting techniques used in this study have been applied in previous cases of agricultural price forecasting. In traditional forecasting, the ARIMA model is commonly used to forecast agricultural product prices; one study [44] used ARIMA as one of the base models in a combination model, exploiting its ability to interpret and forecast data in a simple and flexible linear manner. This provides empirical support for using similar methods on the scallion price data in this study. Among machine learning methods, the intelligent LSTM and GRU models are also often applied to agricultural price prediction [45]. For example, one study predicted Chinese agricultural futures prices by constructing an emerging VMD-SGMD-LSTM hybrid model [46]. LSTM is essentially an improved RNN that alleviates the vanishing- and exploding-gradient problems of the standard RNN during training, and it has been shown to have better learning and prediction ability for nonlinear time series. Another study predicted the spot price of soybean in China by constructing a CNN-GRU-Attention model [47], demonstrating the adaptive learning ability of the GRU model, which copes well with dynamic changes in commodity prices and alleviates the long-term dependence problem through its gating mechanism. As a more computationally efficient recurrent neural network, the GRU is a good choice for predicting the prices of such small agricultural commodities. In summary, the base models selected for the combined model in this study include the classical ARIMA model from traditional time series forecasting, together with the two machine learning models LSTM and GRUs, which are quite effective in the field of small agricultural products.
Finally, selecting an appropriate optimization algorithm for determining the weights of the combination model is often the key to obtaining the best combination forecasting results. It has been clearly shown that the serial hybrid optimization algorithm with PSO followed by the GA excels in optimization problems with multi-peak functions, such as determining the optimal weights of a combined model [48]. One study used PSO-GA to optimize network model parameters [49], which helped to avoid overfitting and improve the generalization ability of the model on unknown data, and the optimized network was used to forecast future electricity consumption in Wuhan. Another study used the hybrid PSO-GA algorithm to optimize the coefficients of different types of models, such as linear, exponential, and quadratic models, when forecasting energy demand in China [50]. This indicates that PSO-GA can adapt to multiple model structures, provide optimal weights for different types of models, and increase the flexibility and adaptability of models. The above findings provide strong support for selecting PSO-GA as the optimization algorithm for determining the optimal weights of the combined model in this study. The improved PSO-GA algorithm proposed in this study, by combining the global search capability of PSO with the local search capability of the GA, is not only able to determine the optimal weights of the combined model more accurately but also helps to improve the generalization capability of the combined model, so that the problem of determining optimal combination weights can be solved in more price-forecasting fields. This expands the applicability of the combination model for price forecasting and strengthens the accuracy of its predictions.
Supported by the above literature, this study proposes a novel optimized forecasting combinatorial framework based on the improved PSO-GA algorithm to achieve more accurate and stable forecasting of scallion prices in Shandong Province. This not only fills in the research gap regarding the application of the combination forecasting framework in the field of scallion price forecasting in Shandong Province but also provides a new direction and reference for the field of small agricultural product price forecasting.

3. Materials and Methods

3.1. Data Description

The data for this experiment are scallion price data from the wholesale market in Shandong Province, obtained from Brick’s Agricultural Data Intelligence Terminal, covering the period from 2 January 2014 to 31 August 2023. The unit for the price of scallions is CNY/kg. Missing data have been filled in using the mean value method. The sample data were divided into three subsets: the first 80% of the data form the training subset, the next 10% the validation set, and the most recent 10% the testing subset. The validation set is used to tune the hyperparameters of the individual prediction models and the prediction combinations, and the prediction results are compared on the test set. The scallion price dataset is shown in Figure 1, its commonly used statistics are shown in Table 1, and some example price data from the training, test, and validation sets are shown in Table 2, Table 3 and Table 4. From Figure 1, it can be seen that the scallion price in Shandong Province from 2014 to 2023 shows an overall upward trend, with a cyclical and seasonal fluctuation pattern of “high in winter and low in summer”.
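For illustration, the chronological 80/10/10 split described above could be implemented as in the following Python sketch; the file and column names are hypothetical, since the underlying dataset is not publicly distributed.

```python
import pandas as pd

# Hypothetical file and column names; the actual dataset is not public.
prices = pd.read_csv("scallion_prices.csv", parse_dates=["date"], index_col="date")
series = prices["price"]  # CNY/kg

# Fill missing values with the series mean, as described in Section 3.1.
series = series.fillna(series.mean())

# Chronological 80/10/10 split: earliest 80% train, next 10% validation, last 10% test.
n = len(series)
train = series.iloc[: int(0.8 * n)]
val = series.iloc[int(0.8 * n): int(0.9 * n)]
test = series.iloc[int(0.9 * n):]
```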

3.2. Assessment Indicators

3.2.1. Evaluation Metrics

Prediction performance is evaluated by four prediction error metrics, namely, the RMSE, MAE, MAPE, and $D_{stat}$, where $N$ is the number of observations in the test set, $y_i$ and $\hat{y}_i$ are the true and predicted values, and $A_i = 1$ if $(y_{i+1} - y_i)(\hat{y}_{i+1} - \hat{y}_i) \geq 0$; otherwise, $A_i = 0$.
$$RMSE = \sqrt{\frac{1}{N}\sum_{i=1}^{N}\left(y_i - \hat{y}_i\right)^2},$$
$$MAE = \frac{1}{N}\sum_{i=1}^{N}\left|y_i - \hat{y}_i\right|,$$
$$MAPE = \frac{1}{N}\sum_{i=1}^{N}\frac{\left|y_i - \hat{y}_i\right|}{\left|y_i\right|},$$
$$D_{stat} = \frac{1}{N}\sum_{i=1}^{N}A_i.$$
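A minimal NumPy sketch of the four evaluation metrics defined above is given below; MAPE is returned as a percentage, matching the tables in Section 4.

```python
import numpy as np

def evaluate(y_true: np.ndarray, y_pred: np.ndarray) -> dict:
    """Compute RMSE, MAE, MAPE (%), and directional accuracy D_stat."""
    err = y_true - y_pred
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    mape = np.mean(np.abs(err) / np.abs(y_true)) * 100.0
    # A_i = 1 when the predicted change has the same sign as the actual change.
    same_direction = (np.diff(y_true) * np.diff(y_pred)) >= 0
    d_stat = np.mean(same_direction)
    return {"RMSE": rmse, "MAE": mae, "MAPE": mape, "Dstat": d_stat}
```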

3.2.2. Test Methods for Predictive Validity

The validity of the predictive model is statistically tested using the Diebold–Mariano (DM) test. In this hypothesis test, the null hypothesis states that there is no difference in the prediction errors of the compared models, and the alternative hypothesis states that the prediction error of the proposed model is lower than that of the compared model. Assuming that the prediction accuracy of the target model A is equal to that of the baseline model B, the null hypothesis is as follows:
$$H_0: E\left[F\left(e_t^A\right)\right] = E\left[F\left(e_t^B\right)\right],$$
where $e_t^A$ and $e_t^B$ are the prediction errors of model A and model B, respectively. Setting the loss function $F$ as the mean square error, the Diebold–Mariano (DM) statistic can be defined as follows:
$$S_{DM} = \frac{\bar{g}}{\sqrt{V_g / T}},$$
$$\bar{g} = \frac{1}{T}\sum_{t=1}^{T} g_t, \quad g_t = \left(x_t - x_{A,t}\right)^2 - \left(x_t - x_{B,t}\right)^2, \quad V_g = \gamma_0 + 2\sum_{t=1}^{\infty}\gamma_t, \quad \gamma_t = \mathrm{cov}\left(g_{t+1}, g_t\right),$$
where $\gamma_0$ denotes the variance of $g_t$; $x_{A,t}$ and $x_{B,t}$ are the predicted values of model A and model B, respectively, in period $t$; and $T$ is the number of observations in the test set.
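The following sketch illustrates one way to compute the DM statistic under a squared-error loss, assuming SciPy is available for the normal reference distribution; it is a simplified illustration (the long-run variance is estimated from sample autocovariances), not the exact implementation used in the paper.

```python
import numpy as np
from scipy.stats import norm

def diebold_mariano(y_true, pred_a, pred_b, h=1):
    """One-sided DM test with squared-error loss: H0 equal accuracy,
    H1 model A has the smaller expected loss."""
    y_true, pred_a, pred_b = map(np.asarray, (y_true, pred_a, pred_b))
    g = (y_true - pred_a) ** 2 - (y_true - pred_b) ** 2   # loss differential g_t
    T = len(g)
    g_bar = g.mean()
    # Long-run variance of g_t from autocovariances up to lag h-1 (h=1: just the variance).
    gamma = [np.cov(g[k:], g[: T - k])[0, 1] if k > 0 else g.var() for k in range(h)]
    v_g = gamma[0] + 2 * sum(gamma[1:])
    dm_stat = g_bar / np.sqrt(v_g / T)
    p_value = norm.cdf(dm_stat)   # small p-value => model A significantly better
    return dm_stat, p_value
```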

3.3. Data Preprocessing

When analyzing the volatility of non-stationary or nonlinear time series, it is crucial to employ decomposition and denoising techniques that reduce the complexity and enhance the interpretability when forecasting time series [51]. Through multiscale decomposition, the raw time series is finely divided into different frequency components, allowing us to gain independent insights into long-term trends, seasonal fluctuations, cyclical characteristics, and random noise in scallion prices [52,53]. Once the raw data have been decomposed and denoised, recombining all the valid decomposed sequences produces a sharper and less noisy reconstructed sequence. It has been pointed out that using the reconstructed sequence as the input data for the model can effectively eliminate the interference of irrelevant information, which, in turn, significantly improves the prediction accuracy [54,55].
Therefore, three advanced denoising techniques, SSA, EMD, and VMD, are used in this study in order to accurately remove the noise components from the original price series and recombine the remaining valid information into a new input series. For the detailed techniques and principles of SSA, EMD, and VMD, please refer to Appendix A. In the SSA method, we select the top three decomposed components, whose cumulative contribution rate exceeds 99.9%, and accumulate them to form the denoised reconstructed sequence. For the EMD method, since the intrinsic mode functions with the highest frequencies tend to carry the most noise, we accumulate the residual term and all modes except the highest-frequency mode to obtain the denoised reconstructed sequence. As for VMD, we treat the highest-frequency mode as noise and combine only the other modes to obtain a clean denoised sequence.
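As an illustration of the EMD-based denoising rule described above (drop the highest-frequency IMF and keep everything else), a minimal sketch is given below. It assumes the third-party PyEMD package; the SSA and VMD variants follow the same drop-the-noisiest-component idea.

```python
import numpy as np
from PyEMD import EMD  # assumed third-party package providing empirical mode decomposition

def emd_denoise(signal: np.ndarray) -> np.ndarray:
    """EMD-based denoising: discard the highest-frequency IMF (the first one
    returned) and keep the remaining modes plus the residual."""
    imfs = EMD().emd(signal)
    # Subtracting IMF 1 from the raw series is equivalent to summing the
    # remaining IMFs and the residual term.
    return signal - imfs[0]
```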

3.4. Single Forecast

The ARIMA model, or autoregressive integrated moving average model, is a classic model in traditional time series forecasting that captures the dynamics of time series data by analyzing their autocorrelation. The ARIMA model has a wide range of applications in the field of price forecasting, especially in handling linear data and revealing seasonality and trends. LSTM and GRUs are two classic models in the field of machine learning; they are variants of recurrent neural networks (RNNs), deep learning structures specialized in capturing nonlinear relationships and temporal dynamics and particularly suitable for processing and predicting long-term dependence in time series data. LSTM avoids the gradient vanishing problem of traditional RNNs on long sequences by introducing a memory unit, whereas GRUs are a simplified variant of LSTM that reduces model complexity while maintaining performance. Because LSTM and GRUs differ in their information flow and control mechanisms during computation, incorporating both of them into the combined model as base models can capture the intrinsic features of the data more comprehensively, thus significantly enhancing the generalization ability of the model.
Therefore, this study adopts these three classical single prediction models as the source for generating the base model in the combined model, which can take advantage of the fact that the combined model has both statistical and data-driven models, capturing a wider range of data features and patterns and improving the accuracy and reliability of prediction. The specific operational steps and details of each model have been described in detail in Appendix A.
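As a concrete example of one base model, the sketch below fits an ARIMA model to the (denoised) training series with statsmodels and forecasts over the test horizon, reusing the train/test split from the Section 3.1 sketch; the order (2, 1, 2) is purely illustrative and would in practice be selected on the validation set.

```python
from statsmodels.tsa.arima.model import ARIMA

# Hypothetical order (p, d, q); in practice it would be chosen on the validation
# set (e.g., by AIC), as outlined in Appendix A.4.
arima_fit = ARIMA(train, order=(2, 1, 2)).fit()
arima_pred = arima_fit.forecast(steps=len(test))
```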
Meanwhile, in order to verify the applicability and effectiveness of combining decomposition-denoising techniques with a single forecasting model on the scallion price series, this study forms and compares a variety of integrated models built from the three single models, with and without each of the three decomposition techniques, so as to evaluate the differences in forecast accuracy and directional consistency when the different decomposition-denoising techniques are applied to the different models and to provide references for subsequent studies.

3.5. Combined Forecasts

In different prediction scenarios, prediction using only a single model often suffers from high sensitivity to data changes and poor generalization ability applied to different scenarios. In contrast, a combined model can be composed of multiple single models combined with decomposition and denoising techniques, and the prediction results of the combined model can be obtained by assembling multiple base models. This not only improves the overall prediction accuracy but also reduces the limitations caused by using only a single model for prediction [56]. In this study, weight optimization is used to incorporate the prediction results of each base model to form a combined prediction by assigning appropriate weights to each base model in the combined model (see Definition 1 for details). The combined prediction model can effectively incorporate the unique advantages of each model, thus improving the accuracy and reliability of prediction.
Definition 1.
For $m$ individual prediction models, given the corresponding weights $W = \{w_1, w_2, w_3, \dots, w_m\}$, the combination prediction can be expressed as $\hat{y}_{combined} = \hat{y}_1 w_1 + \hat{y}_2 w_2 + \dots + \hat{y}_m w_m$.
An important issue that needs to be addressed in forecasting combinations is the combination strategy. To further evaluate the performance of the PSO-GA weight assignment method in various prediction tasks and find the most suitable combination model for scallion price prediction, we apply it to three combination strategies with different combination capabilities: homogeneous, heterogeneous, and semi-heterogeneous. We classify the combinations based on the similarities and differences between the base models: if the base models use the same learning algorithm, the combination is called a homogeneous structure; if different learning algorithms are used, it is a heterogeneous structure; and a combination model that contains both homogeneous and heterogeneous structures is a semi-heterogeneous structure. By complementing the advantages of different types of models, predictive performance can be improved, overfitting can be reduced, and the generalization ability of the model can be enhanced.
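A minimal sketch of the weighted combination in Definition 1, together with an illustrative grouping of base models into the three strategies, is shown below; the strategy dictionary is an example of the grouping logic, not an exhaustive list of the combinations evaluated in Section 4.

```python
import numpy as np

def combine(predictions: list[np.ndarray], weights: list[float]) -> np.ndarray:
    """Definition 1: weighted sum of base-model predictions, weights summing to one."""
    w = np.asarray(weights)
    assert np.isclose(w.sum(), 1.0), "weights must sum to 1"
    return np.average(np.vstack(predictions), axis=0, weights=w)

# Illustrative grouping of base models into the three strategies (names are labels only).
strategies = {
    "homogeneous_ARIMA": ["ARIMA", "SSA-ARIMA", "EMD-ARIMA", "VMD-ARIMA"],
    "heterogeneous_SSA": ["SSA-ARIMA", "SSA-LSTM", "SSA-GRU"],
    "semi_heterogeneous_full": [
        "SSA-ARIMA", "SSA-LSTM", "SSA-GRU",
        "EMD-ARIMA", "EMD-LSTM", "EMD-GRU",
        "VMD-ARIMA", "VMD-LSTM", "VMD-GRU",
    ],
}
```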

3.6. Optimization Algorithm Based on PSO-GA Improvement

The problem of determining the optimal weights of a combinatorial model belongs to the optimization problem of multi-peak functions, i.e., the need to find the optimal value of the function among a number of extremes, which makes determining the optimal combination of weights for a combinatorial model in a wide search domain quite tricky. It has been pointed out that the serial hybrid approach using PSO followed by the GA is suitable for solving optimization problems with multi-peak functions, which can jump out of the local optimal solution and determine the global optimal solution efficiently [50]. Therefore, this study proposes a novel PSO-GA weight optimization algorithm to determine the weights in each combination prediction on this basis. In order to assess the superiority of the weight assignment method in this study, we compare the traditional statistical weight assignment methods, such as the mean and median methods, with the current optimization method. The mean is considered as a benchmark, assigning the same weight to each base model. The median method (Median) selects the middle value of the prediction results of each model as the prediction result of the current combination. In the same combination model, the optimal weight combination for the current combination is determined by calculating the optimal fitness values for different weight combinations under the specified fitness function (see Definition 2).
Definition 2. 
Thus, the optimization problem of computing reasonable weights for each model is defined as follows:
$$\min \; G\left(y - \hat{y}_{combined}\right), \quad \text{s.t.} \; \sum_{i=1}^{m} w_i = 1.$$
where $G$ is a predetermined loss function, which can be the sum of squared errors (SSE), mean squared error (MSE), sum of absolute errors (SAE), mean absolute error (MAE), etc., and $y$ is the observed value. The fitness function chosen for this study is the MSE. $w_i$ denotes the weight of a base model within a combination, $m$ denotes the number of base models within a combination, and the sum of the weights of all base models within the same combination is required to be one.
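The fitness function of Definition 2 can be sketched as follows; here the sum-to-one constraint is enforced by normalising the candidate weights, which is one possible way of handling the constraint rather than the paper’s exact treatment.

```python
import numpy as np

def fitness(weights: np.ndarray, base_preds: np.ndarray, y_true: np.ndarray) -> float:
    """MSE of the weighted combination (Definition 2). base_preds has shape
    (n_models, n_samples); weights are normalised so they sum to one."""
    w = np.clip(weights, 0.0, None)
    w = w / w.sum()                      # enforce the sum-to-one constraint
    y_comb = w @ base_preds              # weighted combination of base forecasts
    return float(np.mean((y_true - y_comb) ** 2))
```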

3.6.1. Particle Swarm and Genetic Algorithm Overview

The particle swarm optimization algorithm, an optimization algorithm based on group intelligence, was proposed by Kennedy and Eberhart in 1995 [51]. The idea of the particle swarm optimization algorithm originates from the social behaviors of bird flocks and fish schools in nature, and it can be used to solve problems such as optimizing combination weights. During the foraging process, each bird in the flock adjusts its flight direction by observing the flight paths and positions of its surrounding companions in order to find the food source. Similarly, in the PSO algorithm, each “particle” represents a candidate solution in the problem space, and the particle adjusts its position by tracking its own historical best position and the global best position in its neighborhood. Compared with other optimization algorithms, the PSO algorithm has fewer control parameters, particle memory, and faster convergence. However, the PSO algorithm is prone to falling into local optimal solutions during the search process due to the lack of population diversity, and the search range is small. Especially when dealing with the multi-peak function optimization problem, the particles may prematurely gather near a local optimal solution during the iteration process, leading to the premature convergence of the algorithm [52].
In contrast, genetic algorithms, inspired by evolutionary theory, search for optimal solutions by mimicking natural selection and heredity. The GA's selection, crossover, and mutation operations increase the diversity of the population, thus helping to reduce the problem of falling into local optima. It iteratively searches the entire solution space rather than only a local region, which enables the GA to find globally optimal or near-optimal solutions, and adopting a probabilistic rather than deterministic search strategy reduces the risk of becoming trapped in local optima. However, since crossover and mutation operations in genetic algorithms are randomized and do not rely on past search experience, the algorithm may take more time to converge to the optimal solution. As verified by previous studies, because these two optimization algorithms have complementary strengths and weaknesses, a hybrid of the two single algorithms is able to overcome each algorithm's limitations and achieve complementary advantages, which helps to avoid premature convergence and falling into local optima [53].

3.6.2. Improved PSO-GA

The improved PSO-GA optimization algorithm proposed in this study utilizes elite learning, the dynamic adjustment of speed parameters, and uniform variation and arithmetic crossover methods, and makes full use of the global search ability of the particle swarm optimization algorithm and the characteristics of population diversity in genetic algorithms to achieve complementary advantages in order to achieve the purpose of determining the optimal combination model weights. The flowchart of this algorithm is given in Figure 2, below. The computational steps of the proposed improved PSO-GA weight assignment method can be expressed as follows:
Step 1: Define the objective function
$$F(x) = \sum_{i=1}^{n} \left( y - \hat{y}_i x_i \right),$$
where $x_i$ is the weighting coefficient, $\hat{y}_i$ is the predicted value of the $i$th individual prediction model, and $\sum_{i=1}^{n} x_i = 1$.
Step 2: Set the solution space dimension $D$, population size $N$, maximum number of iterations $T$, acceleration coefficients $c_1$ and $c_2$, GA maximum generation $g_1$, GA mutation rate $r_1 \in [0, 1]$, and GA crossover rate $r_2 \in [0, 1]$, and initialize the other parameters.
Step 3: Randomly generate the initial positions of the $N$ particles $X = \{x_1, x_2, x_3, \dots\}$ and the corresponding velocities $V = \{v_1, v_2, v_3, \dots\}$; $x_i$ and $v_i$ denote the position and velocity of particle $i$ in $d$-dimensional space, respectively. Each $x_i$ is also a candidate solution to the problem.
Step 4: In the $t$th iteration, the particle velocities and positions are updated based on the individual and population optimal solutions through dynamic weight adjustment, with the first $T/2$ iterations facilitating individual exploration and the second $T/2$ iterations facilitating convergence, after which particle fitness is evaluated. The formulas for updating velocity and position are as follows:
$$v_{ij}(t+1) = w \times v_{ij}(t) + c_1 \times \mathrm{rand}(0,1) \times \left(pbest_{ij}(t) - x_{ij}(t)\right) + c_2 \times \mathrm{rand}(0,1) \times \left(gbest_{j}(t) - x_{ij}(t)\right),$$
$$x_{ij}(t+1) = x_{ij}(t) + v_{ij}(t+1),$$
where $v_{ij}(t+1)$ and $x_{ij}(t+1)$ denote the velocity and position of particle $i$ in dimension $j$ at iteration $t+1$.
Step 5: In each iteration, sort the particle swarm, keep the first half of the excellent particles p1, perform arithmetic crossover and uniform mutation on the last half of the particles p2, and then merge the old and new particles to form a new particle swarm.
Step 6: Update the optimal position and optimal fitness value of the population by comparing the fitness value of the current position with the optimal fitness value.
Step 7: Repeat Steps 4–6 to end the algorithm when the set maximum number of iterations or convergence accuracy is reached.
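A compact sketch of Steps 1–7 is given below. It is a simplified illustration under several assumptions (weights are kept on the simplex by renormalisation, the inertia weight decreases linearly over the iterations, and the GA operators act on the worse half of the swarm each iteration), not the authors' exact implementation.

```python
import numpy as np

def improved_pso_ga(fitness, dim, n_particles=30, max_iter=100,
                    c1=2.0, c2=2.0, w_max=0.9, w_min=0.4,
                    mutation_rate=0.1, rng=None):
    """Minimal sketch of the improved PSO-GA weight search (Steps 1-7).
    fitness(x) is minimised; x is projected onto the simplex so weights sum to 1."""
    rng = np.random.default_rng(rng)

    def project(x):                      # keep weights non-negative, summing to one
        x = np.clip(x, 1e-12, None)
        return x / x.sum()

    X = np.array([project(rng.random(dim)) for _ in range(n_particles)])
    V = rng.uniform(-0.1, 0.1, (n_particles, dim))
    pbest, pbest_fit = X.copy(), np.array([fitness(x) for x in X])
    g_idx = pbest_fit.argmin()
    gbest, gbest_fit = pbest[g_idx].copy(), pbest_fit[g_idx]

    for t in range(max_iter):
        # Step 4: PSO update with a linearly decreasing inertia weight
        # (exploration in the first half, convergence in the second half).
        w = w_max - (w_max - w_min) * t / max_iter
        r1, r2 = rng.random((n_particles, dim)), rng.random((n_particles, dim))
        V = w * V + c1 * r1 * (pbest - X) + c2 * r2 * (gbest - X)
        X = np.array([project(x) for x in X + V])
        fit = np.array([fitness(x) for x in X])

        # Step 5: keep the better half (elite learning); apply arithmetic crossover
        # and uniform mutation to the worse half, then merge old and new particles.
        order = fit.argsort()
        elite, worse = order[: n_particles // 2], order[n_particles // 2:]
        for i in worse:
            mate = X[rng.choice(elite)]
            alpha = rng.random()
            child = alpha * X[i] + (1 - alpha) * mate          # arithmetic crossover
            mask = rng.random(dim) < mutation_rate
            child[mask] = rng.random(mask.sum())               # uniform mutation
            X[i] = project(child)
            fit[i] = fitness(X[i])

        # Step 6: update personal and global bests.
        improved = fit < pbest_fit
        pbest[improved], pbest_fit[improved] = X[improved], fit[improved]
        if fit.min() < gbest_fit:
            gbest, gbest_fit = X[fit.argmin()].copy(), fit.min()

    return gbest, gbest_fit
```

Given a matrix of base-model validation forecasts, `improved_pso_ga(lambda w: fitness(w, base_preds, y_val), dim=base_preds.shape[0])` would return a weight vector usable in the combination of Definition 1.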

3.7. Methodological Framework of the Experiment

In summary, the combined forecasting framework based on the improved PSO-GA weight assignment method proposed in this study is shown in Figure 3, below, and includes four parts: data preprocessing, single forecasting, combined forecasting, and determining combined model weights. First, the original dataset is decomposed and denoised using the three techniques of SSA, EMD, and VMD to filter out external noise from the original price series and obtain reconstructed price series. Second, the reconstructed prices are used as input sequences for the single models ARIMA, LSTM, and GRUs, respectively, to generate diverse base models for selection. Finally, all the base models are divided into different combinations under the three combination strategies of homogeneous, heterogeneous, and semi-heterogeneous structures, and the improved PSO-GA weight optimization approach is used to combine the individual base models in each group to form the prediction results of each combination. This method has the advantages of good scalability and easy implementation, while improving prediction accuracy and robustness.
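Putting the four parts together, the framework can be summarised by the following orchestration sketch; `ssa_denoise`, `vmd_denoise`, `fit_arima`, `fit_lstm`, and `fit_gru` are hypothetical helper names standing in for the steps sketched in the previous sections, and `series`, `val`, and `test` refer to the Section 3.1 split.

```python
import numpy as np

# End-to-end sketch reusing the hypothetical helpers from the previous sections.
base_val_preds, base_test_preds = [], []
for denoise in (ssa_denoise, emd_denoise, vmd_denoise):       # decomposition-denoising
    clean = denoise(series.values)
    for fit_model in (fit_arima, fit_lstm, fit_gru):           # single base models
        val_pred, test_pred = fit_model(clean, n_val=len(val), n_test=len(test))
        base_val_preds.append(val_pred)
        base_test_preds.append(test_pred)

# Optimal weights are searched on the validation set, then applied to the test set.
w_opt, _ = improved_pso_ga(
    lambda w: fitness(w, np.vstack(base_val_preds), val.values),
    dim=len(base_val_preds))
final_forecast = combine(base_test_preds, w_opt)
```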

4. Results

4.1. Parameterization

For SSA, the window length L is selected based on the maximum period, and L decompositions are obtained. Decomposition contributions less than 0.1% are considered as noise, and the others are valid patterns. The final window length is set to 10, 10 decompositions are obtained, and the first three sequences are selected to reconstruct the denoising sequence. The scallion price sequence was decomposed into 10 sub-sequences using the one-dimensional EMD method, the mode with the highest frequency was treated as noise, and the other 9 modes were summed up as the denoised sequence. For VMD, the scallion price sequence was divided into 11 modes and a residual term, and all modes except the component with the highest frequency were summed to obtain the denoised sequence.
All denoised price series on the validation and test sets are shown in Figure 4, below, for clearer comparison. All three denoising techniques reduce the effect of noise while capturing the overall trend and major fluctuations. SSA produces the strongest smoothing, EMD maintains amplitude better during sharp changes, and VMD retains more detail than SSA or EMD. In the absence of severe shocks to the scallion price series, the EMD and VMD smoothed series are similar.

4.2. Forecast Analysis

The overall experimental process is divided into three modules: the effect of decomposition on each single model, the effect of different weight allocation methods on prediction, and tests of model validity using the DM test. Specifically, each module works as follows. First, by pairing the ARIMA, LSTM, and GRU models with SSA, EMD, and VMD, respectively, each prediction model in the proposed optimal prediction combination is generated, showing that denoising the input data of a single model in advance can significantly improve its prediction performance. Secondly, by combining the single models into different combinations through different weight assignment methods, the performance of the three weight assignment methods, namely, Mean, Median, and PSO-GA, is compared across the homogeneous, heterogeneous, and semi-heterogeneous combination strategies, showing that the PSO-GA-optimized semi-heterogeneous combination performs best overall. Finally, a Diebold–Mariano test is performed to determine whether there are significant differences between the different combination prediction models optimized based on PSO-GA.

4.2.1. Prediction Results from a Single Model

This subsection compares the effect of each single model with or without denoising techniques in terms of predictive performance.
Different signal decomposition techniques can reveal unique patterns in time series data. The high-frequency components usually capture the effects of random factors, while the mid-frequency components reflect the role of cyclical factors, and these components show a distinct sinusoidal waveform. The low-frequency components, on the other hand, represent trends in price changes over time. By reconstructing these decomposition sequences, the noise in the data can be effectively removed, thus improving the prediction accuracy while reducing the computational complexity. Table 5 shows the performance evaluation results of different single models with or without various decomposition techniques, where the best performing results are shown in bold and the respective MAPE results are presented in Figure 5.
First, in terms of decomposition techniques, the SSA method has the best decomposition results. Compared with the single model without decomposition, the MAPE of scallions is reduced by 41.94% on average, and this improvement can be attributed to the precise identification of data features by singular spectrum analysis (SSA), which helps to reveal the potential drivers of price volatility of agricultural products. The EMD decomposition technique is the next most effective, with an average MAPE reduction of 25.95%. The VMD is the least effective, with an average MAPE reduction of 17.70%. Overall, the application of decomposition and denoising techniques did significantly enhance the performance of the single models in scallion price prediction; however, for those models with poor model stability, too much model decomposition may trigger the accumulation of errors, which may negatively affect the accuracy of the prediction results.
Second, the prediction effects of the three single models are roughly the same, while the decomposition ensemble models mixed with SSA, EMD, and VMD decomposition techniques improve the prediction performance of the single models to different degrees. The use of decomposition denoising techniques can effectively capture trends and fluctuations in the data, thus improving the predictive ability of the models. On average, the ARIMA, LSTM, and GRU models with decomposition techniques have their directional accuracy improved by 54.39%, 43.53%, and 38.89%, respectively, and their prediction accuracy MAPE reduced by 32.80%, 32.73%, and 20.97%, respectively, compared to the single model. It can be seen that, with the same decomposition technique, the ARIMA-based models are significantly better than the LSTM- and GRU-based models in terms of both directional accuracy and prediction accuracy, while the prediction effects of the two intelligent models are approximately the same. In summary, SSA-ARIMA is the most effective in predicting the price of scallion, with a 33.02% improvement in MAPE and a 10.50% improvement in $D_{stat}$ compared to the average results of all other models. These results indicate that the application of this decomposition denoising and reconstruction technique enables the model to capture the details of price fluctuations more accurately and thus performs better in predicting short-term fluctuations and long-term trends in prices. In addition, the decomposition and reconstruction technique can also help the model identify and filter out noise and outliers in the price series, further improving the accuracy and reliability of the forecasts.

4.2.2. Prediction Results of Combined Models

This subsection demonstrates the prediction performance of the combination models by comparing the prediction results of different model combinations under three combination methods: the mean (MEAN), median (MED), and PSO-GA methods.
First, Table 6, below, demonstrates the average prediction performance of the three prediction combination methods. The data show that all three combination methods contribute to better prediction results and have stable prediction ability. The linear combination methods (the average method and the median method) determine the weights of each combination model by statistical means, and the MED method performs better than the average method in most cases, but neither considers the performance differences between the models within a combination, and both ignore the prediction bias between models. In contrast, among the intelligent optimized combination methods, PSO-GA improves the stability and accuracy of the overall prediction by taking advantage of the complementarity between models, thus clearly overcoming the limitations of the linear combination methods. As can be seen in Figure 6, the PSO-GA method has, on average, the lowest values for the RMSE, MAE, and MAPE, and the highest value for $D_{stat}$, compared to the mean and median methods. Compared to the mean method, PSO-GA has 13.77%, 13.37%, and 14.09% lower RMSE, MAE, and MAPE, respectively, and 10.67% higher directional accuracy $D_{stat}$. Compared with the median method, the RMSE, MAE, and MAPE of PSO-GA were reduced by 10.65%, 10.35%, and 11.20%, respectively, and the directional accuracy $D_{stat}$ was improved by 8.07%. This indicates that the PSO-GA method can make full use of the complementarity between the prediction results of different single models and find the global optimal solution through its powerful global search and optimization capabilities, thus improving the accuracy of the prediction results; it also effectively demonstrates the robustness of the PSO-GA combination method in predicting the price of scallions.
Secondly, different combinations fall under different combination strategies depending on the decomposition techniques and prediction models used, which makes the prediction performance of each combination strategy different. According to the similarity in base model structure and the difference in algorithm, the combination models are divided into three combination strategies: homogeneous, heterogeneous, and semi-heterogeneous. The base models under the different combination strategies are shown in Table 7, and their prediction performance is shown in Table 8, below. From the data, we can see that all three combination strategies can significantly improve the prediction performance. In terms of the prediction errors measured by the RMSE, MAE, and MAPE, the homogeneous models are reduced by, on average, 24.26%, 23.82%, and 23.73% compared to the single combination model without decomposition, and the heterogeneous models by, on average, 31.03%, 30.66%, and 29.73%. In terms of directional accuracy, the $D_{stat}$ of the homogeneous (heterogeneous) models increased by an average of 27.00% (34.72%). The semi-heterogeneous combination strategy (i.e., the all-base combination) benefits significantly from model diversity compared to the other two combination strategies and tends to produce the best predictions overall, with the lowest level errors and the highest directional accuracy. In terms of prediction errors, the semi-heterogeneous models reduced each metric by 17.43%, 17.14%, and 16.32% on average compared to the homogeneous models and by 9.32%, 8.97%, and 9.18% on average compared to the heterogeneous models. In terms of direction prediction, on average, the semi-heterogeneous model improved the value of $D_{stat}$ by 9.76% compared to the homogeneous models, while there was little change compared to the heterogeneous models. These findings reveal that, by optimizing the combination weights, the semi-heterogeneous combination strategy can achieve better prediction. Even for models designed on the same dataset, prediction performance can be significantly improved by introducing more diverse models.
In order to further compare the prediction accuracies of different combination models, the MAPE values of the eight combination sub-strategies under the three combination methods are compared. The results are shown in Table 9, below. As shown in Figure 7 and Figure 8, in most cases, determining the weights of each model in the combination by PSO-GA yields the smallest MAPE compared with the mean and median methods. In addition, the MAPE values of the PSO-GA-based combination models decreased by an average of 14.09% compared to the mean-based benchmark combination models and by 11.20% compared to the median method. The PSO-GA-optimized full (semi-heterogeneous) model was the most effective, with the lowest error and highest accuracy. Its MAPE value for scallions is 1.9209%, a decrease of 8.96% compared to the MAPE value of the best single model, SSA-ARIMA. As shown in Table 9, among the homogeneous models, the ARIMA-based combination performs better, suggesting that combining homogeneous models with different decomposition techniques can further improve prediction accuracy. Among the heterogeneous models, the SSA-based combination performs best, with the smallest MAPE error, and its prediction performance is excellent compared with the single models.
Table 5. Evaluation of denoising hybrid strategies for scallion prices.
Model | Decomposition Method | RMSE | MAE | MAPE (%) | $D_{stat}$
ARIMA | single | 0.1536 | 0.1133 | 3.7684 | 0.4174
ARIMA | SSA | 0.0871 | 0.0635 | 2.1098 | 0.6355
ARIMA | EMD | 0.0920 | 0.0697 | 2.3758 | 0.6449
ARIMA | VMD | 0.1327 | 0.0961 | 3.2247 | 0.6531
LSTM | single | 0.1675 | 0.1211 | 4.0358 | 0.4399
LSTM | SSA | 0.0906 | 0.0666 | 2.2651 | 0.7753
LSTM | EMD | 0.1433 | 0.1018 | 3.2626 | 0.6013
LSTM | VMD | 0.1072 | 0.0790 | 2.6165 | 0.5175
GRUs | single | 0.1513 | 0.1122 | 3.8860 | 0.4272
GRUs | SSA | 0.0959 | 0.0708 | 2.4120 | 0.6266
GRUs | EMD | 0.1125 | 0.0859 | 3.0221 | 0.6519
GRUs | VMD | 0.1498 | 0.1100 | 3.7797 | 0.5016
Table 6. Comparison of the PSO-GA, mean, and median methods on four indicators.
Method | RMSE | MAE | MAPE | $D_{stat}$
Mean | 0.1213 | 0.0896 | 0.0304 | 0.5683
Median | 0.1170 | 0.0866 | 0.0294 | 0.5848
PSO-GA | 0.1046 | 0.0776 | 0.0261 | 0.6362
Table 7. The individual models for different combination strategies.
Combination Strategy | Individual Models
Panel A: Homogeneous
ARIMA-based | ARIMA, EMD-ARIMA, VMD-ARIMA, SSA-ARIMA
LSTM-based | LSTM, EMD-LSTM, VMD-LSTM, SSA-LSTM
GRU-based | GRUs, EMD-GRUs, VMD-GRUs, SSA-GRUs
Panel B: Heterogeneous
SSA-based | SSA-ARIMA, SSA-LSTM, SSA-GRUs
VMD-based | VMD-ARIMA, VMD-LSTM, VMD-GRUs
EMD-based | EMD-ARIMA, EMD-LSTM, EMD-GRUs
Panel C: Semi-heterogeneous
Full-based | SSA-ARIMA, SSA-LSTM, SSA-GRUs, VMD-ARIMA, VMD-LSTM, VMD-GRUs, EMD-ARIMA, EMD-LSTM, EMD-GRUs
Panel D: Benchmark combination model
Single-based | ARIMA, LSTM, GRUs
Table 8. Evaluation of combination models for scallion price forecasting.
Combination Strategy | Models | Method | RMSE | MAE | MAPE (%) | $D_{stat}$
Homogeneous combination | ARIMA-based | Mean | 0.1009 | 0.0753 | 2.5245 | 0.6000
Homogeneous combination | ARIMA-based | Median | 0.0989 | 0.0739 | 2.5103 | 0.5968
Homogeneous combination | ARIMA-based | PSO-GA | 0.0839 | 0.0633 | 2.1190 | 0.6947
Homogeneous combination | LSTM-based | Mean | 0.1489 | 0.1081 | 3.6087 | 0.5175
Homogeneous combination | LSTM-based | Median | 0.1507 | 0.1098 | 3.6555 | 0.5111
Homogeneous combination | LSTM-based | PSO-GA | 0.1360 | 0.0991 | 3.3173 | 0.6101
Homogeneous combination | GRU-based | Mean | 0.1144 | 0.0856 | 2.9431 | 0.5587
Homogeneous combination | GRU-based | Median | 0.1167 | 0.0876 | 3.0036 | 0.5111
Homogeneous combination | GRU-based | PSO-GA | 0.0941 | 0.0709 | 2.3952 | 0.6415
Heterogeneous combination | SSA-based | Mean | 0.0994 | 0.0726 | 2.4552 | 0.6381
Heterogeneous combination | SSA-based | Median | 0.0946 | 0.0691 | 2.3464 | 0.6349
Heterogeneous combination | SSA-based | PSO-GA | 0.0871 | 0.0635 | 2.1190 | 0.6384
Heterogeneous combination | VMD-based | Mean | 0.1425 | 0.1045 | 3.5940 | 0.5714
Heterogeneous combination | VMD-based | Median | 0.1174 | 0.0851 | 2.8778 | 0.7548
Heterogeneous combination | VMD-based | PSO-GA | 0.1157 | 0.0847 | 2.8794 | 0.7278
Heterogeneous combination | EMD-based | Mean | 0.1036 | 0.0790 | 2.7204 | 0.6254
Heterogeneous combination | EMD-based | Median | 0.0990 | 0.0761 | 2.6548 | 0.6254
Heterogeneous combination | EMD-based | PSO-GA | 0.0918 | 0.0696 | 2.3808 | 0.6447
Benchmark combination | Single-model-based | Mean | 0.1541 | 0.1130 | 3.8139 | 0.4254
Benchmark combination | Single-model-based | Median | 0.1520 | 0.1125 | 3.8117 | 0.4317
Benchmark combination | Single-model-based | PSO-GA | 0.1532 | 0.1129 | 3.7718 | 0.4182
Semi-heterogeneous combination | Full models | Mean | 0.1063 | 0.0785 | 2.6724 | 0.6095
Semi-heterogeneous combination | Full models | Median | 0.1066 | 0.0784 | 2.6809 | 0.6127
Semi-heterogeneous combination | Full models | PSO-GA | 0.0746 | 0.0568 | 1.9209 | 0.7138
Table 9. The MAPE error in combination models for scallion price forecasting.
Combined Strategy | Mean | Median | PSO-GA
Panel A: Homogeneous
ARIMA-based | 2.5245 | 2.5103 | 2.1190
LSTM-based | 3.6087 | 3.6555 | 3.3173
GRU-based | 2.9431 | 3.0036 | 2.3952
Panel B: Heterogeneous
VMD-based | 3.5940 | 2.8778 | 2.8794
EMD-based | 2.7204 | 2.6548 | 2.3808
SSA-based | 2.4552 | 2.3464 | 2.1190
Panel C: Semi-heterogeneous
Full-based | 2.6724 | 2.6809 | 1.9209
Panel D: Benchmark combination model
Single-based | 3.8139 | 3.8117 | 3.7718

4.3. Statistical Tests

The results of all DM tests are shown in Table 10. From the results, it can be seen that the p-values for the full-based model relative to the other listed combination models are, in most cases, much lower than 0.05, indicating that the full-based model outperforms the other listed combination models at the 95% confidence level. The main reason for this is that the full-based combination strategy increases the size of the model pool and maintains model diversity within a certain range.

5. Conclusions

In this study, an improved PSO-GA weight allocation method is proposed to determine the optimal weights for the combination models, and a novel forecasting combination framework based on weight optimization is used to analyze the characteristics of scallion price changes in Shandong Province and to evaluate the robustness, forecasting accuracy, and directional accuracy of various combination models. The conclusions of this study are as follows:
(1)
The noise components in the scallion price sequence are removed by the SSA, EMD, and VMD decomposition and denoising methods, which makes the sequence smoother and improves the reliability and accuracy of prediction while reducing its complexity. This provides a theoretical basis for predicting the prices of more small agricultural products in the future. Meanwhile, further analysis of the reconstructed series after decomposition and denoising shows that the scallion price in Shandong Province from 2014 to 2023 has obvious cyclical and seasonal characteristics, following the pattern of “high in winter and low in summer”.
(2)
The experimental results show that combining the classical time series forecasting model (ARIMA) and intelligent models (LSTM and GRUs) with different decomposition techniques to construct diversified combination models can harness the advantages of each base model and effectively enhance the comprehensiveness and accuracy of the combination models in capturing sequence information. Compared with a single model, these combined models can significantly reduce the prediction error, and, at the same time, enhance the robustness and adaptability of the model with better generalization ability. This will provide a reference for future researchers in how to choose a base model for a combined model.
(3)
In the homogeneous, heterogeneous, and semi-heterogeneous combinatorial models, by comparing the three weight assignment methods, namely, the average method, the median method, and the improved PSO-GA, it is found that the semi-heterogeneous-PSO-GA combination model has the lowest error, the highest accuracy, and the best performance. Its MAPE value in predicting large scallions is 1.9209%, which is 8.96% lower than the best single model, SSA-ARIMA. These experimental results indicate that the weights of combinatorial models determined by the PSO-GA method innovated in this study can allow the combinatorial models to have higher horizontal and directional accuracies and can effectively solve the common problems encountered when determining the weights of combinatorial models, such as slow convergence speed, ease of falling into local optimal solutions, poor adaptability, and so on.
In terms of social impact, the accurate prediction of scallion prices through this research method can not only optimize supply chain management and help related practitioners and consumers make more reasonable purchasing decisions, but also guide farmers’ planting choices and improve market efficiency, which is of great theoretical and practical significance to promote the sustainable development of the scallion industry.

Author Contributions

The design of the overall process of the experiment, the design of the innovation point, the design of the verification model, the compilation of the code of the experiment, the sorting of the data, and the writing of the full paper were all completed by Y.L. Y.L., X.Y. and T.Z., as were investigation, resources, visualization, writing—review and editing. X.Y. assisted Y.L. in the verification and format adjustment of the English manuscript when submitting the paper. The general direction of the research and the logic of the main body of the paper were directed by P.L.; the accuracy of the sentences in the paper and the work of supervision were directed by K.Z.; and the detailed introduction of previous studies and the analysis and confirmation of the correctness of experimental results were directed by F.S. when determining the direction of this research. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Data Availability Statement

The original contributions presented in the study are included in the article; further inquiries can be directed to the corresponding authors.

Conflicts of Interest

The authors declare no conflicts of interest.

Appendix A

Method introduction:

Appendix A.1. Singular Spectrum Analysis (SSA)

Singular Spectrum Analysis is a nonparametric time series analysis technique that decomposes a time series into a sum of components representing the trend, season, and remainder. This process forms a trajectory matrix by embedding the time series into a high-dimensional space and then decomposes it using Singular Value Decomposition (SVD). Its computational steps are as follows:
1. Embedding: The time series $x = [x_1, x_2, \dots, x_N]$ is embedded into a Hankel matrix $X$ of size $K \times L$ by sliding a window of length $K$ over the series. The embedding process can be expressed as follows:
$$X = \begin{pmatrix} x_1 & \cdots & x_L \\ \vdots & \ddots & \vdots \\ x_K & \cdots & x_{K+L-1} \end{pmatrix}$$
2. Singular Value Decomposition: The trajectory matrix X is decomposed by SVD into
$$X = U \Sigma V^T$$
3. Grouping and reconstructing components: Singular vectors are grouped to form components. Each group corresponds to a specific temporal pattern. The components are reconstructed by
$$X_r = U_r \Sigma_r V_r^T,$$
where $U_r$, $\Sigma_r$, and $V_r^T$ are the selected singular vectors and values used for reconstruction.
4. Component Summation: The final reconstructed time series is obtained by summing the reconstructed components:
$$\hat{x} = \sum_{r} X_r$$
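A compact NumPy sketch of the four SSA steps is shown below; it arranges the trajectory matrix with $L$-point windows in its columns (the transpose of the notation above) and keeps the three leading singular triplets, matching the reconstruction rule used in Section 4.1.

```python
import numpy as np

def ssa_reconstruct(x: np.ndarray, L: int = 10, keep: int = 3) -> np.ndarray:
    """Minimal SSA sketch: embed, SVD, keep the leading components, diagonal-average."""
    N = len(x)
    K = N - L + 1
    # Step 1: trajectory (Hankel) matrix whose columns are length-L windows.
    X = np.column_stack([x[i:i + L] for i in range(K)])          # shape (L, K)
    # Step 2: singular value decomposition.
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    # Step 3: rank-`keep` reconstruction from the leading singular triplets.
    X_r = (U[:, :keep] * s[:keep]) @ Vt[:keep, :]
    # Step 4: diagonal averaging (Hankelisation) back to a 1-D series.
    recon = np.zeros(N)
    counts = np.zeros(N)
    for j in range(K):
        recon[j:j + L] += X_r[:, j]
        counts[j:j + L] += 1
    return recon / counts
```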

Appendix A.2. Variational Modal Decomposition (VMD)

Variational Modal Decomposition is a signal processing technique that decomposes a complex signal into a set of intrinsic modal functions (IMFs).
The VMD is based on the construction and solution of a variational problem that aims to find a set of IMFs with localized center frequencies whose sum is equal to the original signal.
1. Constructing a variational problem: Construct a variational problem that seeks a set of IMFs with finite bandwidths that are localized around their center frequencies and whose sum is equal to the original signal. The objective is to minimize a cost function consisting of the sum of the squares of the differences between the signal and the estimated IMFs and the sum of the bandwidths of the IMFs.
2. Solving the variational problem: The variational problem is solved using the alternating direction multiplier method (ADMM). In each iteration, the IMFs and their center frequencies are updated to minimize the cost function.
3. Determine the IMFs and center frequencies: Optimize the IMFs and their center frequencies through an iterative process until the convergence criteria are met. The final IMFs obtained are the decomposition of the original signal.
4. Signal Reconstruction: The original signal can be reconstructed by summing all IMFs.
The variational problem in VMD can be expressed as follows:
$$\min_{\{u_k\},\{w_k\}} \sum_{k} \left\| \partial_t \left[ \left( u_k * f(t) \right) e^{-i w_k t} \right] \right\|_2^2 + \lambda \left\| f(t) - \sum_{k} u_k \right\|_2^2$$
In the above equation, $u_k$ is the $k$th IMF, $w_k$ is the corresponding center frequency, $f(t)$ is the original signal, $\lambda$ is the equilibrium parameter, and $\partial_t$ is the derivative with respect to time.

Appendix A.3. Empirical Modal Decomposition (EMD)

Empirical Modal Decomposition is a fully data-driven signal processing technique that decomposes a signal into waveforms modulated in both amplitude and frequency; i.e., the raw data are decomposed into a set of Intrinsic Mode Functions (IMFs) and a residual term, where each IMF represents a localized feature in the data at a different time scale. The steps of the EMD method are as follows:
Step 1: Identify local extrema: Find the local maxima and minima of the input signal $x(t)$.
Step 2: Construct the upper and lower envelopes: Fit cubic splines through the local maxima and minima to create the upper envelope $E_{max}(t)$ and the lower envelope $E_{min}(t)$.
Step 3: Calculate the average envelope: Take the mean of the upper and lower envelopes to obtain the average envelope $E_{mean}(t)$.
Step 4: Extract the first IMF: Subtract the average envelope from the input signal to obtain the first IMF $c_1(t)$: $c_1(t) = x(t) - E_{mean}(t)$.
Step 5: Extract the remaining IMFs: Treat the residual $r_1(t) = x(t) - c_1(t)$ as the new input and repeat Steps 1–4 until no further IMFs can be extracted. The final decomposition is as follows:
$$x(t) = \sum_{i=1}^{n} c_i(t) + r_n(t)$$
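The simplified Python sketch below (illustrative only, not the implementation used in this study) mirrors the sifting procedure above: cubic-spline envelopes through the extrema, subtraction of their mean, and repetition on the residual. The stopping rules are deliberately crude (a fixed number of sifting passes and a minimum number of extrema).

```python
import numpy as np
from scipy.interpolate import CubicSpline

def sift_once(x):
    """One sifting pass: spline envelopes of the extrema, then subtract their mean."""
    n = np.arange(len(x))
    maxima = np.where((x[1:-1] > x[:-2]) & (x[1:-1] > x[2:]))[0] + 1
    minima = np.where((x[1:-1] < x[:-2]) & (x[1:-1] < x[2:]))[0] + 1
    if len(maxima) < 4 or len(minima) < 4:          # too few extrema for stable envelopes
        return None
    upper = CubicSpline(maxima, x[maxima])(n)       # E_max(t)
    lower = CubicSpline(minima, x[minima])(n)       # E_min(t)
    return x - (upper + lower) / 2.0                # subtract the average envelope E_mean(t)

def emd(x, max_imfs=6, n_sifts=10):
    """Simplified EMD: repeatedly sift the residual to peel off IMFs."""
    residual = np.asarray(x, dtype=float).copy()
    imfs = []
    for _ in range(max_imfs):
        refined = sift_once(residual)
        if refined is None:                         # no oscillatory content left: stop
            break
        imf = refined
        for _ in range(n_sifts - 1):                # fixed number of extra sifts for brevity
            refined = sift_once(imf)
            if refined is None:
                break
            imf = refined
        imfs.append(imf)
        residual = residual - imf
    return imfs, residual                           # x(t) is approximately sum of IMFs + residual

# Toy usage on a two-frequency signal.
t = np.linspace(0, 1, 500)
imfs, res = emd(np.sin(2 * np.pi * 4 * t) + 0.4 * np.sin(2 * np.pi * 30 * t))
```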

Appendix A.4. Autoregressive Integrated Moving Average (ARIMA)

Autoregressive Integrated Moving Average is a forecasting model that combines autoregressive (AR), moving average (MA), and integrated (I) components to analyze and forecast time-series data. It is a common method used in statistics and econometrics for modeling and forecasting time-series data. The calculation process is as follows:
Step 1: Stationarity test: The time series is tested for stationarity by examining its statistical properties. If it is not stationary, the series is differenced until it becomes stationary.
Step 2: Identify the ARIMA parameters: Determine the orders (p, d, q), for example from the autocorrelation and partial autocorrelation functions or from information criteria, and estimate the model coefficients using methods such as maximum likelihood estimation.
Step 3: Fit the model: Fit the ARIMA model to the stationary (differenced) time series.
Step 4: Forecasting: Use the fitted model to forecast the future values of the time series.
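A minimal sketch of this workflow using the `statsmodels` library is given below for illustration; the synthetic series and the model order are placeholders rather than the data or orders used in this study.

```python
import numpy as np
import pandas as pd
from statsmodels.tsa.arima.model import ARIMA
from statsmodels.tsa.stattools import adfuller

# Placeholder daily price-like series (a random walk around 2.5 standing in for real data).
rng = np.random.default_rng(0)
prices = pd.Series(2.5 + np.cumsum(rng.normal(0, 0.05, 300)),
                   index=pd.date_range("2021-01-01", periods=300, freq="D"))

# Step 1: stationarity check (ADF test); difference once if the unit-root null is not rejected.
adf_stat, p_value, *_ = adfuller(prices)
d = 1 if p_value > 0.05 else 0

# Steps 2-3: choose (p, d, q) (placeholder orders here) and fit by maximum likelihood.
model = ARIMA(prices, order=(1, d, 1)).fit()

# Step 4: forecast the next 5 daily prices.
print(model.forecast(steps=5))
```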

Appendix A.5. Long Short-Term Memory (LSTM)

Long Short-Term Memory is a type of recurrent neural network (RNN) with a cell state and three gates, namely, the forget gate, the input gate, and the output gate, which enable it to learn long-term dependencies. Through this network structure, LSTM alleviates the gradient vanishing and gradient explosion problems encountered by traditional RNNs when dealing with long sequences.
The computational formulae in LSTM are as follows:
Forget gate: $f_t = \sigma(W_f [h_{t-1}, x_t] + b_f)$
Input gate: $i_t = \sigma(W_i [h_{t-1}, x_t] + b_i)$ and $\tilde{C}_t = \tanh(W_C [h_{t-1}, x_t] + b_C)$
Cell state update: $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$
Output gate: $o_t = \sigma(W_o [h_{t-1}, x_t] + b_o)$ and $h_t = o_t \odot \tanh(C_t)$
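To make the gate equations concrete, the NumPy sketch below (illustrative only) computes a single LSTM time step directly from the formulas above; the weight shapes and the toy input window are arbitrary placeholders.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o):
    """One LSTM time step written directly from the gate equations."""
    z = np.concatenate([h_prev, x_t])        # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)             # forget gate
    i_t = sigmoid(W_i @ z + b_i)             # input gate
    c_tilde = np.tanh(W_c @ z + b_c)         # candidate cell state
    c_t = f_t * c_prev + i_t * c_tilde       # cell state update
    o_t = sigmoid(W_o @ z + b_o)             # output gate
    h_t = o_t * np.tanh(c_t)                 # new hidden state
    return h_t, c_t

# Toy usage with random weights (hidden size 4, input size 1).
hidden, inp = 4, 1
rng = np.random.default_rng(0)
params = [rng.normal(0, 0.1, (hidden, hidden + inp)) if k % 2 == 0 else np.zeros(hidden)
          for k in range(8)]                 # W_f, b_f, W_i, b_i, W_c, b_c, W_o, b_o
h, c = np.zeros(hidden), np.zeros(hidden)
for x in [0.5, 0.7, 0.6]:                    # e.g., a short window of scaled prices
    h, c = lstm_step(np.array([x]), h, c, *params)
```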

Appendix A.6. Gated Recurrent Units (GRUs)

Gated recurrent units are a variant of the LSTM that is also used to process sequence data. A GRU simplifies the structure by combining the forget and input gates of the LSTM into a single update gate and by merging the cell state and the hidden state. GRUs are particularly suited to processing and predicting time-series data, such as stock prices, natural language processing (including text generation and machine translation), and speech recognition.
The relevant formulae for GRUs are as follows:
1. Update gate: $z_t = \sigma(W_z [h_{t-1}, x_t] + b_z)$
2. Reset gate: $r_t = \sigma(W_r [h_{t-1}, x_t] + b_r)$
3. Candidate hidden state: $\tilde{h}_t = \tanh(W [r_t \odot h_{t-1}, x_t] + b)$
4. Hidden state: $h_t = (1 - z_t) \odot h_{t-1} + z_t \odot \tilde{h}_t$, where $\sigma$ is the sigmoid function, $\tanh$ is the hyperbolic tangent function, $W$ and $b$ are the weights and biases, $\odot$ denotes element-wise multiplication, $h$ is the hidden state, and $x$ is the input.
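Analogously, a single GRU time step can be written directly from the formulas above; the sketch below is illustrative only, with placeholder weights and inputs.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_step(x_t, h_prev, W_z, b_z, W_r, b_r, W_h, b_h):
    """One GRU time step written directly from the update/reset-gate equations."""
    z_in = np.concatenate([h_prev, x_t])                                  # [h_{t-1}, x_t]
    z_t = sigmoid(W_z @ z_in + b_z)                                       # update gate
    r_t = sigmoid(W_r @ z_in + b_r)                                       # reset gate
    h_tilde = np.tanh(W_h @ np.concatenate([r_t * h_prev, x_t]) + b_h)    # candidate hidden state
    return (1.0 - z_t) * h_prev + z_t * h_tilde                           # new hidden state

# Toy usage with random weights (hidden size 4, input size 1).
hidden, inp = 4, 1
rng = np.random.default_rng(0)
W_z, W_r, W_h = (rng.normal(0, 0.1, (hidden, hidden + inp)) for _ in range(3))
b_z, b_r, b_h = np.zeros(hidden), np.zeros(hidden), np.zeros(hidden)
h = np.zeros(hidden)
for x in [0.5, 0.7, 0.6]:
    h = gru_step(np.array([x]), h, W_z, b_z, W_r, b_r, W_h, b_h)
```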

References

  1. Gavirneni, S. Price fluctuations, information sharing, and supply chain performance. Eur. J. Oper. Res. 2006, 174, 1651–1663. [Google Scholar] [CrossRef]
  2. Zhao, S.; Zhu, Q. A risk-averse marketing strategy and its effect on coordination activities in a remanufacturing supply chain under market fluctuation. J. Clean. Prod. 2018, 171, 1290–1299. [Google Scholar] [CrossRef]
  3. Chen, X.; Wang, C.; Li, S. The impact of supply chain finance on corporate social responsibility and creating shared value: A case from the emerging economy. Supply Chain Manag. Int. J. 2022, 28, 324–346. [Google Scholar] [CrossRef]
  4. Hong, X.; He, Y.; Zhou, P.; Chen, J. Demand information sharing in a contract farming supply chain. Eur. J. Oper. Res. 2023, 309, 560–577. [Google Scholar] [CrossRef]
  5. Lin, Q.; Zhao, Q.; Lev, B. Influenza vaccine supply chain coordination under uncertain supply and demand. Eur. J. Oper. Res. 2022, 297, 930–948. [Google Scholar] [CrossRef]
  6. Seyedhosseini, S.M.; Hosseini-Motlagh, S.-M.; Johari, M.; Jazinaninejad, M. Social price-sensitivity of demand for competitive supply chain coordination. Comput. Ind. Eng. 2019, 135, 1103–1126. [Google Scholar] [CrossRef]
  7. Tai, P.D.; Duc, T.T.H.; Buddhakulsomsiri, J. Measure of bullwhip effect in supply chain with price-sensitive and correlated demand. Comput. Ind. Eng. 2019, 127, 408–419. [Google Scholar] [CrossRef]
  8. Ezeaku, H.C.; Asongu, S.A.; Nnanna, J. Volatility of international commodity prices in times of COVID-19: Effects of oil supply and global demand shocks. Extr. Ind. Soc. 2021, 8, 257–270. [Google Scholar] [CrossRef]
  9. Kilian, L.; Zhou, X. Modeling fluctuations in the global demand for commodities. J. Int. Money Financ. 2018, 88, 54–78. [Google Scholar] [CrossRef]
  10. Tian, A.; Zhang, H. Development path and countermeasure analysis of Zhangqiu onion industry. China Fruit Veg. 2023, 43, 80–84. [Google Scholar]
  11. Gao, Q.; Xu, H.; Li, A. The analysis of commodity demand predication in supply chain network based on particle swarm optimization algorithm. J. Comput. Appl. Math. 2022, 400, 113760. [Google Scholar] [CrossRef]
  12. Wei, L.; Wei, W.; Liu, Y.; Zhang, J.; Xu, X. Mitigating supply-demand mismatch: The relationship between inventory sharing and demand learning. Decis. Sci. 2023. [CrossRef]
  13. Sun, F.; Meng, X.; Zhang, Y.; Wang, Y.; Jiang, H.; Liu, P. Agricultural Product Price Forecasting Methods: A Review. Agriculture 2023, 13, 1671. [Google Scholar] [CrossRef]
  14. Contreras, J.; Espinola, R.; Nogales, F.; Conejo, A. ARIMA models to predict next-day electricity prices. IEEE Trans. Power Syst. 2003, 18, 1014–1020. [Google Scholar] [CrossRef]
  15. Wu, L.; Liu, S.; Yang, Y. Grey double exponential smoothing model and its application on pig price forecasting in China. Appl. Soft Comput. 2016, 39, 117–123. [Google Scholar] [CrossRef]
  16. Lu, W.; Li, J.; Wang, J.; Qin, L. A CNN-BiLSTM-AM method for stock price prediction. Neural Comput. Appl. 2021, 33, 4741–4753. [Google Scholar] [CrossRef]
  17. Bali, N.; Singla, A. Emerging Trends in Machine Learning to Predict Crop Yield and Study Its Influential Factors: A Survey. Arch. Comput. Methods Eng. 2022, 29, 95–112. [Google Scholar] [CrossRef]
  18. Jaiswal, R.; Jha, G.K.; Kumar, R.R.; Choudhary, K. Deep long short-term memory based model for agricultural price forecasting. Neural Comput. Appl. 2022, 34, 4661–4676. [Google Scholar] [CrossRef]
  19. Chen, L.; Wu, T.; Wang, Z.; Lin, X.; Cai, Y. A novel hybrid BPNN model based on adaptive evolutionary Artificial Bee Colony Algorithm for water quality index prediction. Ecol. Indic. 2023, 146, 109882. [Google Scholar] [CrossRef]
  20. Gu, Y.H.; Jin, D.; Yin, H.; Zheng, R.; Piao, X.; Yoo, S.J. Forecasting Agricultural Commodity Prices Using Dual Input Attention LSTM. Agriculture 2022, 12, 256. [Google Scholar] [CrossRef]
  21. Li, K.; Shen, N.; Kang, Y.; Chen, H.; Wang, Y.; He, S. Livestock Product Price Forecasting Method Based on Heterogeneous GRU Neural Network and Energy Decomposition. IEEE Access 2021, 9, 158322–158330. [Google Scholar] [CrossRef]
  22. Wang, J.; Wang, Z.; Li, X.; Zhou, H. Artificial bee colony-based combination approach to forecasting agricultural commodity prices. Int. J. Forecast. 2022, 38, 21–34. [Google Scholar] [CrossRef]
  23. Clemen, R.T. Combining forecasts: A review and annotated bibliography. Int. J. Forecast. 1989, 5, 559–583. [Google Scholar] [CrossRef]
  24. Zhang, Y.; Wang, Y. Forecasting crude oil futures market returns: A principal component analysis combination approach. Int. J. Forecast. 2023, 39, 659–673. [Google Scholar] [CrossRef]
  25. Kang, H. Unstable Weights in the Combination of Forecasts. Manag. Sci. 1986, 32, 683–695. [Google Scholar] [CrossRef]
  26. Blanc, S.M.; Setzer, T. When to choose the simple average in forecast combination. J. Bus. Res. 2016, 69, 3951–3962. [Google Scholar] [CrossRef]
  27. Karaboga, D.; Basturk, B. A powerful and efficient algorithm for numerical function optimization: Artificial bee colony (ABC) algorithm. J. Glob. Optim. 2007, 39, 459–471. [Google Scholar] [CrossRef]
  28. Wang, D.; Tan, D.; Liu, L. Particle swarm optimization algorithm: An overview. Soft Comput. 2018, 22, 387–408. [Google Scholar] [CrossRef]
  29. De Jong, K. Learning with genetic algorithms: An overview. Mach. Learn. 1988, 3, 121–138. [Google Scholar] [CrossRef]
  30. Wang, X.; Hyndman, R.J.; Li, F.; Kang, Y. Forecast combinations: An over 50-year review. Int. J. Forecast. 2023, 39, 1518–1547. [Google Scholar] [CrossRef]
  31. Gad, A.G. Particle Swarm Optimization Algorithm and Its Applications: A Systematic Review. Arch. Comput. Methods Eng. 2022, 29, 2531–2561. [Google Scholar] [CrossRef]
  32. Yang, Q.; Guo, X.; Gao, X.-D.; Xu, D.-D.; Lu, Z.-Y. Differential Elite Learning Particle Swarm Optimization for Global Numerical Optimization. Mathematics 2022, 10, 1261. [Google Scholar] [CrossRef]
  33. Shi, L.; Gong, J.; Zhai, C. Application of a hybrid PSO-GA optimization algorithm in determining pyrolysis kinetics of biomass. Fuel 2022, 323, 124344. [Google Scholar] [CrossRef]
  34. Vautard, R.; Yiou, P.; Ghil, M. Singular-spectrum analysis: A toolkit for short, noisy chaotic signals. Phys. D. Nonlinear Phenom. 1992, 58, 95–126. [Google Scholar] [CrossRef]
  35. He, K.; Chen, Y.; Tso, G.K. Price forecasting in the precious metal market: A multivariate EMD denoising approach. Resour. Policy 2017, 54, 9–24. [Google Scholar] [CrossRef]
  36. Yang, H.; Shi, W.S.; Li, G.H. Underwater acoustic signal denoising model based on secondary variational mode decomposition. Def. Technol. 2023, 28, 87–110. [Google Scholar] [CrossRef]
  37. Gong, X.; Chen, W.; Chen, J.; Ai, B. Tensor Denoising Using Low-Rank Tensor Train Decomposition. IEEE Signal Process. Lett. 2020, 27, 1685–1689. [Google Scholar] [CrossRef]
  38. Yaslan, Y.; Bican, B. Empirical mode decomposition based denoising method with support vector regression for time series prediction: A case study for electricity load forecasting. Measurement 2017, 103, 52–61. [Google Scholar] [CrossRef]
  39. Wang, J.; Li, X. A combined neural network model for commodity price forecasting with SSA. Soft Comput. 2018, 22, 5323–5333. [Google Scholar] [CrossRef]
  40. Huang, Y.; Deng, Y. A new crude oil price forecasting model based on variational mode decomposition. Knowl.-Based Syst. 2021, 213, 106669. [Google Scholar]
  41. Ma, H.; Zhao, X. An Empirical Analysis of the Price Fluctuation Characteristics of China’s Small Agricultural Products—Taking Garlic as an Example. J. Agrotech. Econ. 2021, 38, 33–48. [Google Scholar]
  42. Teng, J. Prediction of Ginger Price Based on Combination Model and Development of Mobile Terminal; Shandong Agricultural University: Tai’an, China, 2020. [Google Scholar]
  43. Meng, J.; Lv, X. Research on the Characteristics of Small Agricultural Products Price Fluctuations in China and the Fluctuation Pattern—Analysis based on the ARCH class models. Price Theory Pract. 2021, 87–90+197. [Google Scholar]
  44. Zeng, L.; Ling, L.; Zhang, D.; Jiang, W. Optimal forecast combination based on PSO-CS approach for daily agricultural future prices forecasting. Appl. Soft Comput. 2023, 132, 109833. [Google Scholar] [CrossRef]
  45. Deng, C.; Huang, Y.; Hasan, N.; Bao, Y. Multi-step-ahead stock price index forecasting using long short-term memory model with multivariate empirical mode decomposition. Inf. Sci. 2022, 607, 297–321. [Google Scholar] [CrossRef]
  46. Zhang, T.; Tang, Z. Agricultural commodity futures prices prediction based on a new hybrid forecasting model combining quadratic decomposition technology and LSTM model. Front. Sustain. Food Syst. 2024, 8, 1334098. [Google Scholar] [CrossRef]
  47. Liu, D.; Tang, Z.; Cai, Y. A Hybrid Model for China’s Soybean Spot Price Prediction by Integrating CEEMDAN with Fuzzy Entropy Clustering and CNN-GRU-Attention. Sustainability 2022, 14, 15522. [Google Scholar] [CrossRef]
  48. Li, H.; Peng, Y.; Deng, C.; Gong, D. Review of hybrids of GA and PSO. Comput. Eng. Appl. 2018, 54, 20–28. [Google Scholar]
  49. Yu, S.; Wang, K.; Wei, Y.-M. A hybrid self-adaptive Particle Swarm Optimization–Genetic Algorithm–Radial Basis Function model for annual electricity demand prediction. Energy Convers. Manag. 2015, 91, 176–185. [Google Scholar] [CrossRef]
  50. Yu, S.; Zhu, K.; Zhang, X. Energy demand projection of China using a path-coefficient analysis and PSO–GA approach. Energy Convers. Manag. 2012, 53, 142–153. [Google Scholar] [CrossRef]
  51. Liu, T.; Ma, X.; Li, S.; Li, X.; Zhang, C. A stock price prediction method based on meta-learning and variational mode decomposition. Knowl.-Based Syst. 2022, 252, 109324. [Google Scholar] [CrossRef]
  52. Liu, J.; Wang, P.; Chen, H.; Zhu, J. A combination forecasting model based on hybrid interval multi-scale decomposition: Application to interval-valued carbon price forecasting. Expert Syst. Appl. 2022, 191, 116267. [Google Scholar] [CrossRef]
  53. Zhang, Z.; Chen, Y.; Zhang, D.; Qian, Y.; Wang, H. CTFNet: Long-Sequence Time-Series Forecasting Based on Convolution and Time–Frequency Analysis. IEEE Trans. Neural Netw. Learn. Syst. 2023. [Google Scholar] [CrossRef] [PubMed]
  54. Xu, K.; Niu, H. Denoising or distortion: Does decomposition-reconstruction modeling paradigm provide a reliable prediction for crude oil price time series? Energy Econ. 2023, 128, 107129. [Google Scholar] [CrossRef]
  55. Rezaei, H.; Faaljou, H.; Mansourfar, G. Stock price prediction using deep learning and frequency decomposition. Expert Syst. Appl. 2021, 169, 114332. [Google Scholar] [CrossRef]
  56. Zheng, G.; Zhang, H.; Han, J.; Zhuang, C.; Xi, L. The Research on Agricultural Product Price Forecasting Service Based on Combination Model. In Proceedings of the 2020 IEEE 13th International Conference on Cloud Computing (CLOUD), Beijing, China, 18–24 October 2020; pp. 4–9. [Google Scholar]
Figure 1. Daily scallion future price series.
Figure 2. Flowchart of PSO-GA methodology.
Figure 3. Optimal forecast combination based on PSO-GA approach.
Figure 4. The smoothed price series for scallions in the test set.
Figure 5. MAPE of individual models in predicting scallion prices.
Figure 6. PSO-GA, Mean, and Median comparison of three methods in four indicators.
Figure 7. MAPE (%) of combination models for scallion price forecasting.
Figure 8. MAPE reduction rate for PSO-GA approach relative to the mean approach for scallions.
Table 1. Descriptive statistics for agricultural commodity future prices.
             Min     Max       Mean     Standard Deviation   Skewness   Kurtosis
Scallion     0.81    23.0915   1.9341   1.7611               2.8712
Table 2. Some sample data from the training set.
Date     2021-08-27   2021-08-28   2021-08-29   2021-08-30   2021-08-31
Price    2.67         2.935        3.2          2.71         2.8
Table 3. Some sample data from the test set.
Date     2022-08-27   2022-08-28   2022-08-29   2022-08-30   2022-08-31
Price    5.54         5.08         5.66         5.55         5.5
Table 4. Some sample data from the validation set.
Date     2023-08-27   2023-08-28   2023-08-29   2023-08-30   2023-08-31
Price    2.51         2.39         2.43         2.54         2.5
Table 10. DM test results of PSO-GA under different sub-strategies for scallion price (target model: Full-based).
Benchmark Model    DM Statistic   p-Value
PSO-GA-ARIMA       −2.3575        0.0204
PSO-GA-LSTM        −3.3081        0.0013
PSO-GA-GRU         −2.3502        0.0207
PSO-GA-VMD         −2.3500        0.0208
PSO-GA-EMD         −2.3628        0.0201
PSO-GA-SSA         −2.3478        0.0209
PSO-GA-3 Model     −2.3941        0.0185
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
