1. Introduction
China is an agricultural giant, and the development of agriculture is of paramount importance. Agricultural products are crucial foundations for the rural economy in China, and accurately predicting their prices is essential for ensuring stable income for farmers and meeting the daily needs of society [1,2,3]. By effectively predicting agricultural product prices, we can analyze the characteristics and patterns of price fluctuations, thereby reducing the impact of unforeseen factors on the economy. Additionally, by establishing effective models for predicting agricultural product prices, we can more accurately analyze price trends. Accurately predicting agricultural product prices not only provides rational recommendations for government decision-making but also offers effective support for determining target prices for agricultural products. This can effectively guide the development of the agricultural product industry, thereby promoting the prosperity and healthy development of the agricultural economy [4,5].
Traditional prediction models for agricultural product prices primarily encompass time series analysis, regression analysis, and an array of associated linear and nonlinear modeling techniques. Ren Weijie and Li Baisong [6] proposed a new method called the Hilbert–Schmidt independence criterion Lasso Granger causality, which is used to reveal nonlinear causal relationships between multivariate time series. The results show that the proposed method can effectively analyze the nonlinear causal relationships between multivariate time series and can simultaneously conduct causal analysis of multiple input variables for output variables, which is suitable for addressing high-dimensional problems. Weng Yucheng [7] and others tested Autoregressive Integrated Moving Average (ARIMA) models, backpropagation (BP) neural network methods, and recurrent neural network (RNN) methods to predict the short-term (several days) and long-term (several weeks or months) prices of agricultural products (cucumbers, tomatoes, and eggplants). They found that the ARIMA model requires continuous and periodic data, so it is apt for handling small-scale periodic datasets. It excels with average monthly data but struggles with daily data. Conversely, neural network approaches, such as BP networks and RNNs, are adept at forecasting daily, weekly, and monthly price volatility trends. Varun [8] and others laid a practical foundation for future predictions by studying the impact of foreign exchange rates on agricultural product price changes using multivariate regression techniques and data mining. Brandt and Bessler [9] used 24 quarters of U.S. pork prices as the research object and employed seven forecasting methods for verification and performance evaluation to assess the value of the forecasting methods.
To address various issues regarding prediction accuracy and effectiveness, experts and scholars have proposed various intelligent forecasting methods. Purohit, SK [10] introduced two additive hybrid methods (Additive-ETS-SVM, Additive-ETS-LSTM) and five multiplicative hybrid methods (Multiplicative-ETS-ANN, Multiplicative-ETS-SVM, Multiplicative-ETS-LSTM, Multiplicative-ARIMA-SVM, Multiplicative-ARIMA-LSTM) for predicting the monthly retail and wholesale prices of three commonly used vegetable crops in India (tomatoes, onions, and potatoes; TOP). They confirmed the superiority of hybrid methods in forecasting TOP prices. Raflesia [11] and others studied agricultural sales systems and applied the particle swarm optimization (PSO) algorithm to construct radial basis function neural networks for price prediction of different agricultural products. Zhao Hailei [12] first used wavelet analysis to smooth the data, then constructed a model to process the hierarchical information after signal decomposition, and finally used the ARIMA algorithm for analysis. They found that using wavelet analysis yielded better results than using the ARIMA model alone, affirming the importance of data smoothing. Liu Jindian [13] and colleagues proposed a soybean futures price prediction model based on Ensemble Empirical Mode Decomposition (EEMD) and New Attention Gating Units (NAGU). The results showed that the predictive performance of the EEMD-NAGU model was superior to other models such as LSTM, GRU, NAGU, EEMD-LSTM, EEMD-GRU, and EEMD-NGU. This model can be widely used to predict the prices of wheat, corn, gold, oil, and other time series data. Abdullah [14] and others established linear and nonlinear models to better predict coconut prices and studied the capabilities of a hybrid method combining ARIMA and NARNET (ANN) models. Experimental results showed that the proposed ARIMA-NARNET method outperformed the ARIMA model and the NARNET model in predicting coconut prices. Liu Weiping [15] and others proposed a novel method that combines Variational Mode Decomposition (VMD) and Artificial Neural Networks (ANN) into a “Decomposition and Integration” framework. This method transforms the high-volatility futures price prediction problem into predicting multi-component time series with a unique central frequency, then independently predicts all components using the ANN method, and finally integrates the prediction results of each component into the final prediction result. The results showed that both accuracy and trend accuracy were significantly better than some state-of-the-art methods, validating that the proposed VMD-ANN method can effectively predict non-stationary and nonlinear futures price sequences. Zhang [16] combined the linear prediction advantages of the ARIMA model with the nonlinear prediction features of the ANN model and found that the combined model had higher prediction accuracy. Gu Yongxian [17] and others proposed a model called Dual Input Attention Long Short-Term Memory (DIA-LSTM), which, compared to traditional models using static meteorological information from the main production areas, reduced the Mean Absolute Percentage Error (MAPE) by 2.8% to 5.5%. Moreover, its MAPE was lower than the benchmark model by 1.41% to 4.26%. Ray, S [18] and others proposed an improved hybrid ARIMA-LSTM model based on the Random Forest Lagged Selection Criterion. The results showed that the proposed model outperformed traditional statistical models, with RMSE improving by 8–25%, MAPE improving by 2–28%, and MASE improving by 2–29%. Simões [19] and Sivapragasam [20] constructed SVM prediction models based on SSA for short-term random rainfall prediction, validating that the models had higher prediction effectiveness and result confidence. Paul [21] and others used machine learning algorithms such as GRNN and SVR to forecast eggplant prices in the Indian state of Odisha and tested the accuracy of different models through the Diebold–Mariano test.
The focus of this paper is a prediction model of corn futures prices based on LSTM and the Bézier curve. A Bézier curve is a parametric polynomial curve built on the Bernstein basis functions, and the shape of the curve is adjusted through the coordinates of its control points. Designers without mathematical knowledge can also complete interactive design modification work, so it is widely used in CAD design software. The greatest advantage of Bézier curves is the flexibility to change the shape of the curve by moving the control points, and because Bernstein polynomials uniformly approximate polynomials (and even continuous functions), the Bézier curve is an excellent curve fitting tool. A Bézier curve is fitted to a sequence of data points as follows: first, the data point sequence is parameterized (uniform parameterization, cumulative chord-length parameterization, modified chord-length parameterization, etc.) so that each data point corresponds to a parameter t in the interval [0,1]; then, the least squares fitting method is used to obtain a fitting curve that minimizes the distance between each data point and the curve, thus revealing the intrinsic basic trend of the data point sequence.
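The two fitting steps described above (parameterization, then least squares) can be illustrated with a minimal sketch. It assumes a single Bézier curve, cumulative chord-length parameterization, and NumPy; the function names `chord_length_parameters`, `bernstein_matrix`, and `fit_bezier` are hypothetical and are not taken from the paper's implementation.

```python
import numpy as np
from math import comb


def chord_length_parameters(points):
    """Map each data point to t in [0, 1] by cumulative chord length."""
    points = np.asarray(points, dtype=float)
    chords = np.linalg.norm(np.diff(points, axis=0), axis=1)
    t = np.concatenate(([0.0], np.cumsum(chords)))
    return t / t[-1]


def bernstein_matrix(t, degree):
    """Row k holds the Bernstein basis values B_{i,degree}(t_k), i = 0..degree."""
    t = np.asarray(t, dtype=float)[:, None]
    i = np.arange(degree + 1)[None, :]
    coeff = np.array([comb(degree, k) for k in range(degree + 1)], dtype=float)
    return coeff * t**i * (1.0 - t) ** (degree - i)


def fit_bezier(points, degree):
    """Least-squares control points of one Bezier curve (illustrative only)."""
    points = np.asarray(points, dtype=float)
    t = chord_length_parameters(points)
    basis = bernstein_matrix(t, degree)
    # Solve min ||basis @ control - points||^2 for the control points.
    control, *_ = np.linalg.lstsq(basis, points, rcond=None)
    return control, basis @ control
```

For a price series, `points` would be the (index, price) pairs; the second return value is the smoothed curve evaluated at the data parameters.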
Krishna [22] proposed a wavelet-based convolutional recurrent neural network (ConvRNN) kernel to improve the time–frequency localization of non-stationary signals (NSS), applied Bézier–Bernstein polynomial functions to model NSS, and used inflection points for signal segmentation. From the obtained fragments, statistical time–frequency features are extracted and fed to the ConvRNN for better time–frequency localization. Bai [23] proposed an improved path planning algorithm based on deep reinforcement learning to find a class of AMR optimization paths, in which Bézier curve theory was used to smooth the planned paths.
As a special recurrent neural network structure, Long Short-Term Memory (LSTM) is mainly used to improve and solve the long-term dependence problem encountered in traditional RNN. LSTM is able to control and transmit information using a unique gating mechanism to better capture long-term dependencies.
Huang Y [24] proposed a coal seam thickness prediction method based on VMD and LSTM. The VMD method is used to denoise the signal, and it is shown to give better results than the EMD method. LSTM is then used to predict the thickness of the coal seam; compared with other benchmark models, the proposed method has higher accuracy, which indicates that data denoising can help improve prediction accuracy. Otero [25] used a multivariate empirical mode decomposition method to decompose the original time signals collected from the training set and patients with essential tremors. The decomposed components are input into the LSTM model separately for training and prediction, and all component predictions are then added together to form the final prediction result. The trained LSTM model is then tested on the remaining raw samples. The experimental results show that the proposed method is superior to all other benchmark methods in all cases.
Finally, as an important agricultural product, corn is characterized by high supply and demand and high price fluctuation. Accurately analyzing corn prices and predicting future corn prices on that basis can effectively guarantee the stable development of the corn planting, breeding, and processing industries and also help the government to regulate corn prices in a timely manner and reduce possible crises [26,27,28]. Therefore, this study focuses on the fluctuation of corn prices. With LSTM and ARIMA as benchmark models, a corn futures price prediction model based on the Bézier curve is proposed to effectively analyze and forecast the daily futures price of corn, providing a scientific basis for the government and related enterprises to formulate relevant policies.
3. Empirical Research
3.1. Experimental Environment and Data
The experiments in this paper were conducted on a Windows 10 64-bit operating system, using an AMD Ryzen 7 5800H with Radeon Graphics 3.20 GHz processor and 16 GB of memory. The programming language used was Python 3.7.5, with the Matplotlib 3.0.2 plotting tool.
The data for this study were sourced from the Dalian Commodity Exchange’s corn futures market data. The empirical analysis covers the price series from 4 January 2013 to 30 December 2022, totaling 10 consecutive years of working days, with a total of 16,038 original data points. The dataset includes 12 different indicators, namely, the previous closing price, previous settlement price, opening price, highest price, lowest price, closing price, settlement price, price change ratio 1, price change ratio 2, trading volume, trading value, and open interest. Each year includes 12 different contract prices. The original dataset is formatted as shown in Table 1 below (PCP means previous closing price, PST means previous settlement price, OP means opening price, HP means highest price, LP means lowest price, CP means closing price, SP means settlement price, PF means price fluctuations, and TV means trading volume).
- (1) Contract Selection: When selecting contracts, we refer to the main contract, which is the contract with the largest open interest. In cases where open interest is the same, we choose the contract with higher trading volume, and if trading volume is also the same, we select the contract with a later expiration date. To filter the main contract, we use the pandas library in Python for calculation. After filtering, we obtained a total of 2430 main contract data points.
- (2) Price Selection: The data source includes seven different price indicators: previous closing price, previous settlement price, opening price, highest price, lowest price, closing price, and settlement price. In our study, we choose the settlement price as the research object. The settlement price is the final delivery price of the futures contract on the expiration date or specified date, and it plays a crucial role in the futures market [50,51].
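The contract-selection rule in step (1) can be sketched with pandas as follows. The column names (`date`, `contract`, `open_interest`, `volume`, `expiry`) and the function name `select_main_contract` are hypothetical; the actual field names in the exchange data and the authors' own filtering code are not given in the text.

```python
import pandas as pd


def select_main_contract(quotes: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per trading day: the main contract.

    Ranking within a day: largest open interest, then largest trading
    volume, then latest expiration date (assumed column names).
    """
    ranked = quotes.sort_values(
        ["date", "open_interest", "volume", "expiry"],
        ascending=[True, False, False, False],
    )
    # After sorting, the first row of each day is the main contract.
    return ranked.drop_duplicates(subset="date", keep="first").reset_index(drop=True)
```

Sorting once and keeping the first row per day applies the three tie-breaking rules in order without an explicit loop over trading days.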
Therefore, by ensuring the selection of appropriate contracts and focusing on the settlement price as our key indicator, we ensure the effectiveness of corn price prediction. After organizing and processing the data, we obtained a total of 2430 daily price data points for corn futures, and the price trend chart is shown in Figure 4.
3.2. LSTM Model Construction and Prediction Analysis
Many models split the training set and the test set in a 7:3 ratio because their data volume is usually large. Because the amount of data in this study is small, the training set and the test set are divided in an 8:2 ratio so that the model does not underfit through failing to learn some features. A 9:1 split may overfit the training set and give poor test set results, and its test set is too small to show the prediction effect over a whole year. After comparing the results of the algorithm model under these three partitioning methods, we found that the 8:2 split is slightly better than the other two, and its test set covers a whole year’s data, which is more conducive to observation. Therefore, this paper divides the training set and the test set in an 8:2 ratio.
To analyze and predict the daily price dataset of corn futures, we divided the dataset into training and testing sets using an 80:20 ratio. The first 80% of the data was used as the training set, while the remaining 20% was used as the testing set. Considering the characteristics of the data and the business requirements, we used the training data of corn futures prices as input to the LSTM network and set the output layer dimension to 1. To evaluate the performance of the model, we chose Root Mean Squared Error (RMSE) as the evaluation metric and used cross-validation to continuously evaluate and optimize the parameters. We determined the following settings: 4 hidden layers, 3000 iterations, and historical time steps of 2. To improve training accuracy, we set the batch size to 480 and added a dropout layer to prevent overfitting. All layers were fully connected during training, and the sigmoid activation function was used. Based on these parameter settings, we trained the model and conducted predictive analysis on the testing set data.
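As an illustration of the data preparation described above, the following sketch performs a chronological 80:20 split and builds input windows with a historical time step of 2. It assumes NumPy and a univariate price series; the function names are hypothetical, and any scaling applied by the authors before training is not stated in the text and is therefore omitted.

```python
import numpy as np


def chronological_split(series, train_ratio=0.8):
    """Split a time series in order: first part for training, rest for testing."""
    series = np.asarray(series, dtype=float)
    cut = int(len(series) * train_ratio)
    return series[:cut], series[cut:]


def make_windows(series, time_steps=2):
    """Build (samples, time_steps, 1) inputs and next-value targets."""
    series = np.asarray(series, dtype=float)
    x = np.stack([series[i:i + time_steps] for i in range(len(series) - time_steps)])
    y = series[time_steps:]
    # The trailing axis of size 1 is the feature dimension expected by LSTM layers.
    return x[..., None], y
```

With the 2430 daily prices used here, the split yields 1944 training points and 486 test points, and each input window holds the two most recent prices.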
For model evaluation, we chose three evaluation metrics to measure the predictive performance of the model: Mean Absolute Percentage Error (MAPE), Root Mean Squared Error (RMSE), and directional statistical indicator.
- (1) MAPE (Mean Absolute Percentage Error)
Mean Absolute Percentage Error (MAPE) is calculated by taking the average of the percentage errors, where each percentage error is the absolute difference between the predicted and actual values divided by the actual value. When the actual value is small, the denominator becomes small and that term can dominate the MAPE value, which can affect the evaluation of the model. The formula for MAPE is as follows:
$$\mathrm{MAPE} = \frac{1}{n}\sum_{t=1}^{n}\left|\frac{\hat{y}_t - y_t}{y_t}\right| \times 100\%$$
- (2) RMSE (Root Mean Squared Error)
Root Mean Squared Error (RMSE) is a measure of the average deviation between predicted values and true values, calculated by taking the square root of the average of the squared differences between the predicted and true values. RMSE is used to describe the dispersion of the sample, and when performing nonlinear fitting, a smaller RMSE indicates better performance. Using RMSE to define the loss function has the advantage of being smooth and differentiable, making mathematical calculations easier.
The formula for RMSE is as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{t=1}^{n}\left(\hat{y}_t - y_t\right)^2}$$
In the equations, $n$ represents the number of samples, $\hat{y}_t$ denotes the predicted price value at time period $t$, and $y_t$ represents the true price value corresponding to time period $t$.
- (3) Dstat (Directional Statistic)
In addition, the Dstat statistical indicator is introduced on top of MAPE and RMSE to measure the directional accuracy of the model’s predictions, providing a better assessment of the model’s performance in predicting price changes. A higher Dstat value indicates better performance. Dstat can be represented as follows:
$$D_{stat} = \frac{1}{n-1}\sum_{t=1}^{n-1} a_t \times 100\%$$
where
$$a_t = \begin{cases} 1, & \left(y_{t+1} - y_t\right)\left(\hat{y}_{t+1} - y_t\right) \ge 0 \\ 0, & \text{otherwise} \end{cases}$$
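The three evaluation metrics can be computed with a few lines of NumPy, as in the sketch below. The Dstat implementation assumes the commonly used definition in which a prediction counts as directionally correct when the predicted move and the actual move, both measured from the previous actual value, have the same sign; the function names are hypothetical.

```python
import numpy as np


def mape(actual, predicted):
    """Mean absolute percentage error, in percent."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.mean(np.abs((predicted - actual) / actual)) * 100.0)


def rmse(actual, predicted):
    """Root mean squared error, in price units."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((predicted - actual) ** 2)))


def dstat(actual, predicted):
    """Directional statistic, in percent (assumed standard definition)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    actual_move = actual[1:] - actual[:-1]
    predicted_move = predicted[1:] - actual[:-1]
    # A step is a hit when both moves point the same way (or one is zero).
    return float(np.mean(actual_move * predicted_move >= 0) * 100.0)
```

Lower MAPE and RMSE indicate smaller errors, whereas a higher Dstat indicates that more price movements were predicted in the right direction.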
The corn futures price data set is divided in a ratio of 8:2; the first 80% is selected as the training set of the model, and the last 20% is selected as the test set. Considering the characteristics of the data and business requirements, the training set data of corn futures price are taken as the input of the LSTM network, the batch_size is set to 480, the network parameters are updated using the backpropagation algorithm to minimize the loss function, a dropout layer is applied to reduce overfitting, and sigmoid is selected as the activation function. The number of hidden layers is set to 4, the number of epochs to 2000, and the time step to 2. All layers are fully connected during the training process. Based on the above parameters and model training, the LSTM model is constructed to predict and analyze the multi-factor test set data. The LSTM model prediction results and index results are shown in Figure 5 and Table 2.
Based on the analysis of the evaluation metrics RMSE, MAPE, and Dstat for corn settlement prices, it is observed that these metrics did not achieve satisfactory results. In particular, the MAPE value of 1.6% is considered too high. Combining the analysis of the results with the graphs, it is concluded that using the LSTM model alone for predicting corn prices results in reasonably accurate short-term price predictions but introduces some bias in predicting long-term price trends. Although the model can roughly predict price trends and characteristics, the achieved predictive performance is not ideal.
3.3. Bezier-Based LSTM Model Construction and Prediction Analysis
Before predicting the corn price series, we first use Bézier curves to fit the data sequence, approximating the basic trend in the time series data. Segmented Bézier curves are defined by dividing the $[0,1]$ interval into 10 segments. According to the requirements of segmented Bézier curves, corresponding basis functions are provided, with each basis function corresponding to a control point. For example, for a segmented fifth-order Bézier curve, the basis functions for the first segment, $t \in [0, 0.1]$, denoted as $B_{i,5}(t)$, are as follows:
$$B_{i,5}(t) = \binom{5}{i}\,(10t)^{i}\,(1-10t)^{5-i}, \quad i = 0, 1, \ldots, 5$$
The corresponding control points are $P_0, P_1, \ldots, P_5$, totaling 6 points. The basis functions for other segments can be obtained through coordinate translation and symmetry. Therefore, a segmented fifth-order Bézier curve divided into 10 equal segments has 51 basis functions in total, since adjacent segments share the control point at their common endpoint ($10 \times 5 + 1 = 51$). Using the chord-length parameterization method, specific values of the parameter $t$ corresponding to each data point are obtained. Then, the specific coordinates of each control point are calculated through the least squares fitting method, resulting in the final fitted curve [52]. Through multiple experiments, it is observed that the segmented fifth-order Bézier curve provides better detail fitting without knotting phenomena occurring. The fitting effect is shown in Figure 6.
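The segmented fit can be sketched as a single least-squares problem over all 51 control values. The sketch below assumes that adjacent segments share the control point at their common endpoint (which is what gives 10 × 5 + 1 = 51 basis functions) and that the parameter values `t` have already been computed, for example by chord-length parameterization. The function names are hypothetical, and only one coordinate (the price) is fitted for brevity.

```python
import numpy as np
from math import comb


def piecewise_bernstein_matrix(t, segments=10, degree=5):
    """Design matrix of a piecewise Bezier curve on [0, 1].

    Column index seg * degree + i holds the i-th Bernstein basis of
    segment `seg`, so neighbouring segments share one end column.
    """
    t = np.asarray(t, dtype=float)
    n_basis = segments * degree + 1
    seg = np.minimum((t * segments).astype(int), segments - 1)
    u = t * segments - seg  # local parameter in [0, 1] within the segment
    design = np.zeros((t.size, n_basis))
    rows = np.arange(t.size)
    for i in range(degree + 1):
        design[rows, seg * degree + i] = (
            comb(degree, i) * u**i * (1.0 - u) ** (degree - i)
        )
    return design


def fit_piecewise_bezier(t, values, segments=10, degree=5):
    """Least-squares control values and the fitted (smoothed) series."""
    design = piecewise_bernstein_matrix(t, segments, degree)
    values = np.asarray(values, dtype=float)
    control, *_ = np.linalg.lstsq(design, values, rcond=None)
    return control, design @ control
```

The fitted series returned by `fit_piecewise_bezier` plays the role of the smoothed data that is subsequently passed to the LSTM network.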
With the data generated from the Bézier curve, the corn futures price dataset is divided into a ratio of 8:2, with the first 80% for training the model and the remaining 20% as a testing set. That is, the training set data are used as the input to the LSTM network, with the output layer dimension set to 1. RMSE is chosen as the evaluation metric, and parameters are continuously evaluated and optimized using cross-validation. The settings for the hidden layer are determined as 3, with 2000 iterations and a historical time step of 2. To improve training accuracy, the batch size is set to 100, a dropout layer is added to prevent overfitting, and all layers are fully connected during training. Sigmoid is chosen as the activation function. Based on these parameter settings, the model is trained, and predictions are made on the testing set. The predicted results of the LSTM model based on Bézier are shown in Figure 7. The index results of the Bézier + LSTM model are shown in Table 3.
Therefore, it can be concluded that by first processing the corn settlement prices, fitting them using Bézier curves, and then entering them into LSTM for prediction, the combination of Bézier curves and LSTM models for forecasting shows significant advantages in terms of the evaluation metrics RMSE, MAPE, and Dstat. The MAPE metric value is reduced by 0.80%. Combined with the charts, it can be inferred that using the LSTM model based on Bézier fitting for corn price prediction achieves much better predictive performance than using the LSTM model alone. The predicted results are more satisfactory, and there is a significant improvement in the model’s predictive accuracy and precision.
3.4. ARIMA Model Construction and Predictive Analysis
Prediction steps: ① Preprocess the price dataset to handle outliers and missing values. ② Test the stationarity of the series. If the p-value of the test is greater than 0.05, indicating that the original series is non-stationary, perform the differencing operation (differencing order d) until the series becomes stationary. In this case, the series becomes stationary after first-order differencing (d = 1). ③ Using the AIC or BIC criteria to determine the order of the model, we obtained p = 0 and q = 0, establishing an ARIMA (0,1,0) model. The historical data are split into training and testing sets in an 8:2 ratio, and the fitted ARIMA (0,1,0) model is applied to forecast corn futures prices. The ARIMA model’s forecast results are shown in Figure 8. The index results of the ARIMA model are shown in Table 4.
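An ARIMA (0,1,0) model is a random walk: the first difference of the series is white noise, so without a constant term the one-step-ahead forecast equals the most recent observed price. The sketch below illustrates this with plain Python. It assumes rolling one-step-ahead forecasting over the test period, which the text does not state explicitly, and the optional drift term is likewise an assumption; `random_walk_forecast` is a hypothetical name.

```python
def random_walk_forecast(train, test, with_drift=False):
    """One-step-ahead ARIMA(0,1,0) forecasts over the test period."""
    history = list(train)
    drift = 0.0
    if with_drift:
        # Mean of the first differences of the training series.
        drift = (history[-1] - history[0]) / (len(history) - 1)
    forecasts = []
    for actual in test:
        forecasts.append(history[-1] + drift)
        history.append(actual)  # roll forward with the observed price
    return forecasts
```

This makes clear why such a model tracks the price level closely on daily data while carrying little information about the direction of the next move.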
It can be concluded that the results of the evaluation metrics RMSE, MAPE, and Dstat for the corn settlement price are acceptable. Combined with the charts, it can be inferred that the predictive performance achieved by using the ARIMA model for corn price forecasting is moderately satisfactory. Compared to the results of the LSTM model, the ARIMA model performs better, but it is not as suitable as the LSTM model based on Bézier fitting. Further optimization of this model is needed to improve its predictive accuracy and precision.
3.5. SVR Model Construction and Predictive Analysis
Prediction steps: ① Divide the corn futures price data into training and testing sets in an 8:2 ratio. ② Choose a kernel function to establish the Support Vector Regression (SVR) model. Select the Gaussian kernel function (rbf). ③ Input the training data and initialize the penalty factor parameter C and the parameter gamma. ④ Based on the initialized parameters, determine the approximate range of parameters and finalize the model parameters. ⑤ Input the testing set data and compare the predicted results with the actual corn futures prices for analysis.
Through calculation and parameter optimization, it is found that setting the prediction step length to 1 and using the previous 50 data points to predict the next data point results in a small error. Initially, the penalty factor parameter C is set to 1, 3, 5, 10, 30, and 100, and the parameter gamma is set to 0.001, 0.005, 0.05, 0.1, 0.15, 0.5, and 0.8. The rbf kernel function is selected, and the final parameters are chosen from these candidates, with gamma = 0.15. Using these parameters, the testing set data are predicted; the prediction results are shown in Figure 9, and the index results are shown in Table 5.
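The parameter search described above can be sketched with scikit-learn's `SVR`. The candidate grids for C and gamma and the window of 50 lagged prices follow the text; scoring each candidate by RMSE on a held-out validation block is an assumption, since the selection criterion is not stated, and the helper names are hypothetical.

```python
import numpy as np
from sklearn.svm import SVR

C_GRID = [1, 3, 5, 10, 30, 100]
GAMMA_GRID = [0.001, 0.005, 0.05, 0.1, 0.15, 0.5, 0.8]


def lag_windows(series, lags=50):
    """Use the previous `lags` prices to predict the next price."""
    series = np.asarray(series, dtype=float)
    x = np.stack([series[i:i + lags] for i in range(len(series) - lags)])
    return x, series[lags:]


def grid_search_svr(x_train, y_train, x_val, y_val):
    """Return (validation RMSE, C, gamma) of the best rbf-kernel SVR."""
    best = None
    for c in C_GRID:
        for gamma in GAMMA_GRID:
            model = SVR(kernel="rbf", C=c, gamma=gamma).fit(x_train, y_train)
            error = float(np.sqrt(np.mean((model.predict(x_val) - y_val) ** 2)))
            if best is None or error < best[0]:
                best = (error, c, gamma)
    return best
```

Because the rbf kernel is sensitive to the scale of its inputs, prices would normally be rescaled (for example to the unit interval) before the search; that preprocessing step is also an assumption here.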
Based on the prediction results, it can be concluded that the SVR model based on corn settlement price performs well in predicting the overall price trend and features. However, it may not accurately predict some abrupt price changes, resulting in relatively smoother prediction results. Considering the evaluation metrics RMSE, MAPE, and Dstat, it can be observed that the results of all three metrics are acceptable. The predictive performance is moderate, with the MAPE and RMSE metrics lacking high precision. The Dstat metric is better than those of the LSTM and ARIMA models but inferior to that of the LSTM model based on Bézier fitting. Further optimization of this model is required to enhance its predictive accuracy and precision.
3.6. VMD-LSTM Model Construction and Predictive Analysis
The original time series data are decomposed using VMD, and then the decomposed data are used for prediction. Finally, the prediction results of the components are added together to reconstruct the prediction of the original data, which is compared with the original data. Prediction steps: (1) The daily futures price data of corn are divided into a training set and a test set according to the ratio of 8:2. (2) After several parameter adjustments, the VMD parameters are determined: alpha = 2400, tau = 0, K = 5, DC = 0, init = 1, tol = 1 × 10^−7. The decomposition results are shown in Figure 10. (3) Input the decomposed VMD data into the LSTM network as training data, set the output layer dimension to 1, and use RMSE as an evaluation index to continuously evaluate and optimize the parameters by cross-validation. The hidden layer is set to 3, the number of iterations is 2000, and the historical time step is 2. (4) Bring in the test set data and compare and analyze the forecast results with the actual price data of corn futures. Forecast results and indicator results are shown in Figure 11 and Table 6.
According to the forecast result chart, it can be concluded that the VMD + LSTM model based on the corn settlement price has a reasonable prediction effect on the overall price trend and characteristics. Combined with the evaluation indexes RMSE, MAPE, and Dstat, it can be seen that the results of the three indexes are all acceptable and the prediction effect is better, but the Dstat index is only better than that of LSTM. This shows that, after time series decomposition, VMD + LSTM can predict the corn price level well but cannot learn the direction of rises and falls. Further optimization of this model is needed to improve its prediction accuracy and precision.
3.7. Comparison of Results
Based on the displayed prediction results of corn futures prices, it is observed that the predicted values from various methods are generally close to the actual values. This indicates that the selected prediction methods for corn price forecasting are relatively reasonable. However, compared to other methods, the predictive performance of the LSTM model based on Bézier curves demonstrates better accuracy and precision in predicting corn prices. The comparison results of the above five models are shown in Table 7.
According to the calculation results in the above table, it is observed that compared with LSTM, ARIMA, and SVR models, the RMSE index of the Bezier curve-based LSTM method and VMD decomposition-based LSTM method is much lower, the MAPE index is also significantly reduced, and the prediction of price trend is more accurate. It can be judged that there are certain advantages in the prediction after noise elimination of the original data, and the prediction accuracy is also significantly improved. Through the analysis of the daily price characteristics of corn and the comprehensive consideration of the forecast results, combined with the characteristics of the prediction accuracy and the practicability of the method, it can be concluded that the combination of Bezier curve fitting and LSTM model is a better choice to forecast the price, which has certain practicability and feasibility.
A possible reason why ARIMA performs best among the three original models is that ARIMA itself involves white noise detection and stationarity tests. Compared with the other two basic models, ARIMA carries out data preprocessing so that the subsequent model can better fit the data. The Bézier + LSTM and VMD + LSTM results demonstrate that prediction after noise treatment can improve the accuracy of prediction. The low directional index value of VMD + LSTM, which fails to meet expectations, may be caused by errors in the forecast results after decomposition. The forecast result of VMD + LSTM is reconstructed by adding the forecast results of each decomposed component, and each of these predictions contains errors, which may explain why the Dstat value is not ideal even though MAPE is relatively good.
4. Conclusions
This paper focuses on using different predictive modeling methods to forecast corn futures daily price data and analyzes whether the proposed forecasting methods have higher accuracy, better predictive performance, and practical application significance.
In the study of corn daily price prediction, we use the Bézier curve fitting method to process the data set, constantly adjusting the control vertices to change the curve shape, and combine it with the long short-term memory (LSTM) network to forecast and analyze the corn daily price data. According to the calculated signal-to-noise ratio and the characteristics of corn daily price data, the B-spline multi-layer vertex fitting method is selected for de-noising processing, and the de-noised data are combined with the LSTM model for prediction analysis. The prediction results of the LSTM, ARIMA, and SVR models with the original data as input are compared, and the results of combined prediction with the LSTM model after VMD decomposition and denoising are also compared. It is concluded that the prediction effect of the LSTM model based on Bézier is clearly better and has greater practical significance.
This study provides new ideas and methods for price prediction of corn and other related agricultural products. Through big data technology, price prediction can be conducted more effectively and scientifically, which has practical significance. This modeling approach can be extended to the prediction of prices of more agricultural products, thereby better analyzing the price trends of agricultural products and providing reasonable and effective suggestions for government decision-making and the determination of target prices for agricultural products, thus better safeguarding the interests of farmers regarding the sale of agricultural products.