1. Introduction
Agricultural output comprises major and minor agricultural products, each pivotal in the agricultural economy. Major agricultural products, such as wheat, rice, and maize, are characterized by large-scale production and considerable economic value. Conversely, minor agricultural products, including certain fruits, vegetables, and specialty crops and herbs, are produced on a smaller scale and carry relatively lower economic value. With the rise of modern agriculture, minor agricultural products have become increasingly integral to the agricultural economy, significantly affecting farmers’ incomes. Minor agricultural products exhibit lower supply, demand, and transaction volumes than major ones, rendering them more vulnerable to frequent and substantial price fluctuations, as well as market shocks resulting from stochastic events [1]. In addition, the prices of minor agricultural products are characterized by high-frequency, large-amplitude fluctuations, complex influencing factors, and a lack of market regulation, which makes them more challenging to predict. Garlic, a critical minor agricultural product, has profound effects on the livelihoods of industry stakeholders and consumers because of its price volatility. Consequently, garlic price fluctuations have attracted close attention from producers, government and enterprise personnel, consumers, and other groups. In recent years, the prices of garlic, ginger, and other minor agricultural products have often experienced drastic fluctuations, and the resulting price spikes and plunges have threatened the stable and healthy development of allied industries [2,3,4,5].
China is the world’s largest garlic producer, exporter, and consumer. Volatile garlic prices have severely harmed workers’ interests and the sustainable development of the industry. Grasping the fluctuation law of garlic prices and achieving accurate price prediction remains a considerable challenge in garlic price research. To explore this fluctuation law, many researchers have studied garlic price fluctuations. Refs. [6,7,8], among others, used seasonal adjustment and filtering methods to show that garlic price fluctuations in China are seasonal and periodic; refs. [7,9] used ARCH-type models to analyze the residual fluctuation characteristics of garlic prices in China, and the results showed that these fluctuations exhibit clustering and an uncertain “risk–return” causal relationship. The study in [10] revealed that stochastic factors have a predominantly positive impact on garlic prices in China, leading to price increases. Furthermore, these stochastic events exert the greatest influence on garlic prices in the short term and are more likely to cause abnormal fluctuations.
In the field of agricultural product price prediction, experts and scholars have studied prediction methods, price fluctuations, influencing factors, and related aspects. Agricultural price prediction methods have evolved from qualitative to quantitative methods, from single models to combination models, and from traditional econometric methods to intelligent prediction methods [11]. The earliest use of econometric methods to predict agricultural product prices can be traced back to 1917 [12]. Owing to the limitations of traditional prediction methods and the non-stationary, nonlinear characteristics of agricultural product prices [13,14], combination models incorporating intelligent prediction methods have gradually shown their advantages. Wang built an SVM-ARIMA model to predict garlic prices [15] and found that the proposed combination model outperformed any single model. Ray proposed an ARIMA-LSTM model based on random forest to predict the volatility of agricultural product prices [16], and the results showed that, compared with the single models, all three indicators (RMSE, MAPE, MASE) improved. Among the many price prediction models, the combination prediction model based on decomposition and integration is one of the hotspots in current research. The decomposition–integration combination model splits a complex price sequence into several simpler sub-sequences; each sub-sequence is predicted separately, and the sub-sequence predictions are finally integrated to obtain the prediction for the original sequence. The “decomposition–integration” prediction method significantly improves prediction performance by analyzing the fluctuation patterns and trend laws of complex systems at different scales [17,18,19,20,21].
Price prediction can be divided primarily into predicting the specific value of the price and predicting its rise and fall. Current garlic price prediction mostly focuses on value prediction. Ref. [22] used Census-X12 to analyze the seasonal and irregular fluctuation patterns of garlic prices, followed by ARIMA for monthly garlic price forecasting, and assessed future garlic price trends based on the predictions. In [10], the authors developed an ARIMA-SVM hybrid model for forecasting garlic prices, considering both linear and nonlinear aspects; the results indicated that this combination model outperformed any individual model in prediction accuracy. Wang et al. [23] constructed a GARCH-family model to obtain volatility clustering and other volatility characteristics of the garlic price sequence; they then used an LSTM model to learn the complex nonlinear relationship between the garlic price sequence and its fluctuation characteristics and predicted the garlic price. The experimental results showed that the joint LSTM and GARCH-family model containing the garlic price fluctuation characteristics generally outperformed the single model, and the LSTM model combined with the GARCH and PGARCH models (LSTM-GP) achieved the best garlic price prediction in terms of mean absolute error, root-mean-square error, and mean absolute percentage error. The above research approached garlic price prediction from various angles and with various methods, all based on the regression idea of numerical prediction. In practice, however, we may not care how much the future price is, but rather whether the price is rising or falling [24]. Predicting an accurate value remains the ultimate goal, but it can be divided into two sub-goals: first, predict the future rise or fall of the price; second, use the rise-and-fall prediction result as a feature to predict the specific future price value. He [25] first used VMD to decompose the crude oil price, extracted price volatility characteristic factors for prediction from the decomposed sequences, and constructed a feature set by merging the volatility; a crude oil price trend prediction model based on multimodal data features was then constructed. The extracted data features were found to improve the model’s classification prediction performance.
Inspired by the above research, in this study we divided garlic price movements into two trends, rise and fall, and used the proposed combined-feature prediction method to predict them. First, VMD was used to decompose the garlic price and extract relevant data features. Second, the combined feature De_Vo was constructed by merging the volatility feature data with the extracted data features, enriching the feature information. Subsequently, three distinct classification algorithms, LR, SVM, and XGBoost, were employed to forecast price movements, validating the efficacy of the combined feature. This method improved the prediction accuracy and interpretability of the model by incorporating the constructed data features into each classification algorithm.
The specific structure of this paper is as follows:
Section 2 describes the data source and the algorithm used in the experiment. In
Section 3, the experimental process is explained, and the experimental results are analyzed and discussed.
Section 4 summarizes this research and outlines future research prospects.
2. Materials and Methods
In this section, the data source is first described in detail and the extraction method for the data features is specified. The algorithmic models used are then explained.
Figure 1 shows the thematic framework diagram of this study.
In this research, we first applied a volatility transformation to the weekly garlic price data to compute the volatility feature vp. Subsequently, we used VMD to decompose the weekly price data, separating high-frequency and low-frequency signal subsequences from the original time series. Feature engineering was then applied to these decomposed subsequences, from which we extracted key informational features capturing long-term trends (lf), short-term cyclicality (rp), determinate trends (dp), and frequency components (fc), assembled into a feature set designated De. This feature set De was then integrated with the volatility feature vp to form a more comprehensive feature set, De_Vo. Finally, we selected three machine learning models for our experiments, namely logistic regression (LR), SVM, and XGBoost, training and validating them on the De_Vo feature set to develop predictive models. The models’ prediction performance was quantified using an array of evaluation metrics comprising accuracy, precision, recall, and F1. By contrasting the effectiveness of the different algorithms, we identified the optimal predictive model.
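The following sketch mirrors this pipeline end to end on synthetic data. It is a minimal illustration rather than the authors’ implementation: the rolling-window columns stand in for the VMD-based De features detailed in Section 2.2, the real weekly Jinxiang prices are replaced by a random walk, and a simple chronological train/test cut is used instead of the 7:1:2 partition applied later.

```python
# A minimal, runnable sketch of the workflow in Figure 1, using a synthetic random-walk
# series in place of the real Jinxiang weekly prices; the "De" columns below are simplified
# rolling-window stand-ins for the paper's decomposition-based features, not the authors' code.
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score, f1_score

rng = np.random.default_rng(0)
price = pd.Series(5 + np.cumsum(rng.normal(0, 0.1, 1000)).clip(-4, None))  # synthetic weekly price

vp = price.pct_change().rename("vp")                       # volatility feature vp
de = pd.DataFrame({                                        # simplified stand-ins for lf, rp, dp, fc
    "lf": price.rolling(39).mean().pct_change(),
    "rp": (price - price.rolling(39).min()) / (price.rolling(39).max() - price.rolling(39).min()),
    "dp": (price - price.rolling(39).mean()) / price.rolling(39).mean(),
    "fc": price.diff().rolling(39).var() / price.rolling(39).var(),
})
de_vo = pd.concat([de, vp], axis=1).dropna()               # combined feature set De_Vo
y = (vp.shift(-1).loc[de_vo.index] > 0).astype(int)        # next-week direction: 1 = rise, 0 = fall

split = int(0.8 * len(de_vo))                              # chronological split (train vs. test)
X_tr, y_tr = de_vo.iloc[:split], y.iloc[:split]
X_te, y_te = de_vo.iloc[split:-1], y.iloc[split:-1]        # drop the last row (no next-week label)

for name, model in [("LR", LogisticRegression(max_iter=1000)),
                    ("SVM", SVC()),
                    ("XGBoost", XGBClassifier())]:
    model.fit(X_tr, y_tr)
    pred = model.predict(X_te)
    print(f"{name}: acc={accuracy_score(y_te, pred):.3f}, F1={f1_score(y_te, pred):.3f}")
```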
2.1. Data Sources
Jinxiang is located in the southwestern part of Shandong Province and is one of the main garlic-producing areas in China. The wholesale price of garlic in Jinxiang reflects the price fluctuations of the garlic market and the market laws of agricultural product trading well. Therefore, the weekly average price of garlic in Jinxiang was selected as the experimental object in this paper. The experimental data of this study originated from the garlic industry chain big data platform (http://www.garlicbigdata.cn, accessed on 7 January 2024) developed by the big data center of Shandong Agricultural University. The selected data were the daily price data for Jinxiang garlic from 22 May 2003 to 1 June 2023.
Figure 2a depicts the original daily price data, while
Figure 2b illustrates the processed weekly average price data of Jinxiang garlic. It can be seen that the weekly prices present a clearer picture of the trends in the raw data.
2.2. Data Feature Construction
For the sake of brevity in the subsequent paragraphs, the terminology and acronyms used are summarized in Figure 3.
At the same time, in order to visualize and predict price increases and decreases, a price increase is marked with a “+” sign and a price decrease with a “−” sign in Figure 4. In the actual prediction, we used “1” for a price increase and “−1” for a price decrease. The specific method was as follows: construct the volatility data vp(t) = [vr(t) − vr(t − 1)]/vr(t − 1); when vp > 0, take “1”; when vp < 0, take “−1”.
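A direct implementation of this labelling rule is shown below; the toy price values are illustrative, and, for simplicity, the undefined first week and the case vp = 0 are mapped to “−1” here.

```python
# Volatility feature and rise/fall labels: vp(t) = [vr(t) - vr(t-1)] / vr(t-1),
# mapped to 1 for a rise (vp > 0) and -1 otherwise (toy weekly prices below).
import numpy as np
import pandas as pd

def rise_fall_labels(vr: pd.Series):
    vp = vr.pct_change()                                   # (vr(t) - vr(t-1)) / vr(t-1)
    label = pd.Series(np.where(vp > 0, 1, -1), index=vr.index)
    return vp, label

vr = pd.Series([6.2, 6.5, 6.4, 6.4, 6.9])                  # toy weekly prices
vp, label = rise_fall_labels(vr)
print(label.tolist())                                      # [-1, 1, -1, -1, 1]
```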
As shown in
Figure 2, the extraction of data features comprised two steps: the first was the VMD decomposition, and the second was the extraction of features from the decomposed sequences. The two parts are described separately below.
2.2.1. VMD Decomposition
VMD is a decomposition method proposed by Dragomiretskiy et al. [26] in 2014, which is not only computationally efficient but also advantageous in handling signal noise and avoiding mode mixing. Compared with other nonlinear signal decomposition techniques, VMD formulates its theoretical model within a constrained variational framework, which not only avoids the accumulating envelope estimation errors of EMD but also adapts better to high-frequency noise and decomposes the signal more accurately. Therefore, VMD performs well in time-series forecasting, including for agricultural product price series [27,28,29,30]. VMD decomposes the original time series x(t) into K modal functions u_k by setting a reasonable preset number of modes K. If the number of IMFs is set to 2, the VMD algorithm decomposes the signal into two IMFs, one for the high-frequency portion and the other for the low-frequency portion. In this way, the high-frequency and low-frequency parts of the price series can be directly extracted to reflect the short-term fluctuations and the long-term trend of the price, respectively [25]. The principle of the VMD algorithm is briefly described below:
① For each mode u_k, the corresponding analytic signal is computed via the Hilbert transform to obtain a one-sided spectrum; the spectrum of each modal function is then shifted into the baseband by multiplying with an exponential term tuned to the respective center frequency ω_k, so that the finite band of u_k surrounds its center frequency ω_k; finally, Gaussian smoothing of the demodulated signal is used to estimate the corresponding bandwidth. Minimizing the sum of these bandwidths yields the constrained variational model

$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\} \quad \text{s.t.} \quad \sum_{k=1}^{K} u_k(t) = x(t),$$

where t is the time, δ(t) is the unit impulse function, u_k is the k-th decomposed mode, and ω_k is the center frequency corresponding to that mode; the constraint requires the sum of all modes to equal the original signal x(t).
② Introducing a quadratic penalty factor α and a Lagrange multiplier λ(t), the variational problem is transformed into an unconstrained optimization problem with the augmented Lagrangian

$$L\left(\{u_k\},\{\omega_k\},\lambda\right) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| x(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\; x(t) - \sum_{k=1}^{K} u_k(t) \right\rangle.$$

③ Using the Lagrangian function, the problem is transferred from the time domain to the frequency domain and the corresponding extreme values are calculated; the modal component u_k and its center frequency ω_k are then updated with the following expressions:

$$\hat{u}_k^{\,n+1}(\omega) = \frac{\hat{x}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \hat{\lambda}(\omega)/2}{1 + 2\alpha\left(\omega - \omega_k\right)^2}, \qquad \omega_k^{\,n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k(\omega) \right|^2 \mathrm{d}\omega}{\int_0^{\infty} \left| \hat{u}_k(\omega) \right|^2 \mathrm{d}\omega}.$$
④ The optimal solution of the constrained variational model is obtained by decomposing the original signal into K narrowband modal components using the alternating direction method of multipliers (ADMM) with alternating updates.
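As a concrete illustration of this step, the sketch below uses the open-source vmdpy package (one available VMD implementation; the package choice and parameter values are our assumptions, not the authors’ settings) to split a toy signal into K = 2 modes, from which the low- and high-frequency subsequences dl and dh can be taken.

```python
# VMD decomposition with K = 2 modes using the "vmdpy" package (illustrative parameters).
import numpy as np
from vmdpy import VMD

t = np.linspace(0, 1, 512)
signal = 6 + 0.8 * np.sin(2 * np.pi * 2 * t) + 0.15 * np.sin(2 * np.pi * 40 * t)  # toy price-like series

alpha = 2000      # bandwidth constraint (quadratic penalty factor)
tau = 0.0         # noise tolerance (no strict fidelity enforcement)
K = 2             # number of modes: low- and high-frequency parts
DC, init, tol = 0, 1, 1e-7

u, u_hat, omega = VMD(signal, alpha, tau, K, DC, init, tol)

order = np.argsort(omega[-1])        # sort modes by their final centre frequencies
dl, dh = u[order[0]], u[order[1]]    # low-frequency (trend) and high-frequency (fluctuation) subsequences
print(u.shape)                       # one row per mode, same length as the (even-length) input
```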
2.2.2. Feature Extraction
The required data features are constructed based on the price sequence vr(t), the low-frequency sequence dl(t), and the high-frequency sequence dh(t) obtained from the VMD decomposition.
① Price volatility
The constructed volatility feature reflects the degree of price volatility and its trend over time. This feature can be combined with other price features or indicators to form a richer feature space, thus improving the robustness and generalization ability of the forecast. It is calculated as vp(t) = [vr(t) − vr(t − 1)]/vr(t − 1).
② Low-frequency volatility (main trend volatility)
The low-frequency volatility feature is constructed from the decomposed low-frequency series. The volatility of the low-frequency series captures the macro-level, long-term trend of the price and helps to predict the direction of price fluctuations. The low-frequency volatility at time t is calculated as follows:
③ Relative position
The relative position reflects the direction of price fluctuations during the window period corresponding to moment t. Its value is the price relative index at moment t, which helps to predict the direction of price fluctuations and is calculated as follows:
④ Deviate position
The deviation position shows the direction and extent to which the price has deviated from the long-term trend at time t. When the price deviates significantly from the long-term trend, a reversion towards the long-term trend line tends to follow. The calculation is as follows:
⑤ Variance ratio
The variance ratio indicates the proportion of short-term volatility in the overall volatility within a recent historical window. It mainly reflects the degree of price volatility, captured primarily through the high-frequency term dh(t). A large variance ratio indicates large recent volatility and provides information about price volatility. It is calculated as follows:
In this study, the above five types of features were selected based on a comprehensive consideration of prediction performance. When using VMD for decomposition, it is difficult to ensure that the decomposition results completely retain all the information needed for forecasting. To a certain extent, feature ① compensates for this loss of information and captures both the short-term dynamics and the long-term trend of the price from another perspective. These five types of features are interrelated and complementary and can portray upward and downward price trends well.
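A possible implementation of features ②–⑤ is sketched below. Because the display formulas are not reproduced here, these window-statistic definitions over vr, dl, and dh are our own plausible interpretations of the descriptions above, not the authors’ exact formulas; w is the 39-week window selected in Section 3.1.

```python
# Illustrative construction of the decomposition-based feature set De from the price series vr
# and the VMD low-/high-frequency subsequences dl and dh (interpreted definitions, see lead-in).
import pandas as pd

def extract_de_features(vr: pd.Series, dl: pd.Series, dh: pd.Series, w: int = 39) -> pd.DataFrame:
    lf = dl.pct_change()                                   # (2) low-frequency (main trend) volatility
    win = vr.rolling(w)
    rp = (vr - win.min()) / (win.max() - win.min())        # (3) relative position of the price in the window
    dp = (vr - dl) / dl                                    # (4) deviation of the price from the long-term trend
    fc = dh.rolling(w).var() / vr.rolling(w).var()         # (5) share of short-term variance in total variance
    return pd.DataFrame({"lf": lf, "rp": rp, "dp": dp, "fc": fc})
```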
2.3. Classification Algorithm
In this study, algorithms such as logistic regression, support vector machine, and XGBoost were selected to predict price rises and falls. A brief description of each algorithm is given below.
2.3.1. Logistic Regression
Logistic regression is a linear model widely applied to classification problems, which is particularly suitable for binary classification issues. This algorithm utilizes a logistic function (typically the Sigmoid function) to map the output of a linear regression model to the interval (0, 1), thus allowing it to be interpreted as a probability. Specifically, the logistic regression model predicts the log-odds of the target variable given input features, which are then transformed into probability values through the Sigmoid function. In tasks such as predicting the rise or fall of garlic prices, logistic regression can estimate the probability of price movements using historical price data and features constructed through various methods. Mathematically, the logistic regression model can be expressed as
$$P(Y = 1 \mid X) = \frac{1}{1 + e^{-\left(\beta_0 + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_n X_n\right)}},$$

where P(Y = 1 | X) represents the probability that the target variable Y (such as the price going up) equals 1 given the features X, and 1 − P(Y = 1 | X) denotes the probability that Y (e.g., a price decline) equals 0 under the same conditions; β_0, β_1, …, β_n are the model parameters; and X_1, X_2, …, X_n are the feature variables.
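The toy computation below illustrates this mapping: a linear score β_0 + βᵀx is passed through the sigmoid to give P(Y = 1 | X), the estimated probability of a rise; the feature and coefficient values are arbitrary examples, not fitted parameters.

```python
# Sigmoid mapping of the linear score to a rise probability (toy coefficients).
import numpy as np

def predict_rise_probability(x: np.ndarray, beta0: float, beta: np.ndarray) -> float:
    z = beta0 + beta @ x               # log-odds (linear part of the model)
    return 1.0 / (1.0 + np.exp(-z))    # sigmoid maps the log-odds into (0, 1)

x = np.array([0.03, 0.6, -0.02])       # e.g. one week's [vp, rp, dp] values (toy)
print(predict_rise_probability(x, beta0=-0.1, beta=np.array([8.0, 0.5, -4.0])))  # ≈ 0.63
```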
2.3.2. Support Vector Machine
Support vector machine is a binary classification model whose basic form is a linear classifier with the maximum margin defined in the feature space. Given the data set T = {(x_1, y_1), (x_2, y_2), …, (x_N, y_N)}, where y_i ∈ {1, −1} indicates the class to which the point x_i belongs and x_i is a D-dimensional real vector, the basic idea of classification learning is to find a maximum-margin hyperplane in the sample space based on the data set T that separates the samples of the different classes. In the sample space, the dividing hyperplane can be described by the equation

$$\mathbf{w}^{\mathrm{T}}\mathbf{x} + b = 0,$$

where w = (w_1; w_2; …; w_d) is the normal vector, which determines the direction of the hyperplane, and b is the displacement term, which determines the offset between the hyperplane and the origin. The distance r from any point x in the sample space to the hyperplane (w, b) is

$$r = \frac{\left| \mathbf{w}^{\mathrm{T}}\mathbf{x} + b \right|}{\lVert \mathbf{w} \rVert}.$$

The sample points closest to the hyperplane are called “support vectors”, and the sum of the distances from two support vectors of different classes to the hyperplane is γ = 2/‖w‖, which is called the “margin”. Finding the maximum-margin dividing hyperplane means finding the parameters w and b that maximize γ. After determining w and b, for any new test sample with an unknown label, the classification result is easily obtained from the sign of wᵀx + b.
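The small numeric check below reproduces this geometry for arbitrary w, b, and x: the point-to-hyperplane distance is |wᵀx + b|/‖w‖, the margin is 2/‖w‖, and a new sample is classified by the sign of wᵀx + b (all values are toy numbers).

```python
# Point-to-hyperplane distance, margin, and sign-based classification (toy values).
import numpy as np

w = np.array([2.0, -1.0])                     # normal vector of the dividing hyperplane
b = 0.5                                       # displacement term
x = np.array([1.0, 3.0])                      # a test sample

score = w @ x + b                             # w.x + b = -0.5
distance = abs(score) / np.linalg.norm(w)     # |w.x + b| / ||w|| ≈ 0.224
margin = 2.0 / np.linalg.norm(w)              # 2 / ||w|| ≈ 0.894
label = int(np.sign(score))                   # sign < 0 -> class -1 (e.g. "fall")
print(distance, margin, label)
```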
2.3.3. Extreme Gradient Boosting
XGBoost (Extreme Gradient Boosting) is an efficient algorithm that implements the gradient boosting framework, optimizing the speed and performance of traditional gradient boosting techniques. XGBoost enhances the model by iteratively adding new predictors (decision trees) and modeling the residuals at each step to gradually reduce the error. It incorporates regularization terms to control the complexity of the model, effectively preventing overfitting and ensuring strong performance across various prediction tasks, including the classification prediction of price movements. In each iteration, XGBoost evaluates the reduction in the loss function brought about by adding a new tree and selects the tree structure that minimizes the loss. The objective function of XGBoost comprises two parts: a loss function (such as logistic loss) that measures the discrepancy between the predicted values and the actual values, and a regularization term that penalizes the complexity of the model. XGBoost has attracted considerable attention from researchers in the field of price prediction due to its powerful performance [31,32,33,34]. The mathematical expression of the XGBoost objective function is

$$Obj^{(t)} = \sum_{i=1}^{n} l\left(y_i,\; \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t),$$

where n is the number of samples, y_i is the true value of the i-th sample, t is the number of decision trees, \(\hat{y}_i^{(t-1)}\) is the prediction value of the previous t − 1 rounds of iteration, f_t(x_i) is the model prediction value of the t-th round of iteration, and Ω(f_t) is the model complexity penalty term,

$$\Omega(f_t) = \gamma T + \frac{1}{2}\lambda \lVert w \rVert^2,$$

where T is the number of leaf nodes in the decision tree, γ is the complexity parameter of the decision tree, λ is the L2 regularization coefficient, and w is the vector of leaf node weights (‖w‖ denotes its norm).
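For reference, the snippet below shows how the two regularization quantities in this objective map onto the XGBoost library’s hyperparameters: gamma corresponds to the per-leaf penalty γ and reg_lambda to the L2 coefficient λ. The specific values are illustrative and are not the settings reported in Table 1.

```python
# Configuring XGBoost so that its regularization matches the objective above (illustrative values).
from xgboost import XGBClassifier

model = XGBClassifier(
    n_estimators=200,            # number of boosting rounds (trees added iteratively)
    max_depth=4,                 # depth of each new tree
    learning_rate=0.1,           # shrinkage applied to each tree's contribution
    gamma=0.1,                   # penalty related to the number of leaves (gamma * T term)
    reg_lambda=1.0,              # L2 regularization on leaf weights (lambda * ||w||^2 term)
    objective="binary:logistic", # logistic loss for rise/fall classification
)
# model.fit(X_train, y_train) expects labels in {0, 1}; the "-1" (fall) class is mapped to 0.
```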
2.4. Evaluation Metrics
Taking binary classification as an example, the four possible prediction outcomes are defined as follows.
The predicted value is positive (Positive) and the true value is also positive (Positive), so the prediction is true (True); True Positive (TP).
The predicted value is negative (Negative), but the true value is positive (Positive), so the prediction fails (False); False Negative (FN).
The predicted value is positive (Positive) but the true value is negative (Negative), so the prediction fails (False); False Positive (FP).
The predicted value is negative (Negative) and the true value is also negative (Negative), so the prediction is true (True); True Negative (TN).
In order to evaluate the prediction results, the evaluation methods involved in this study were as follows.
① Accuracy
Accuracy is the proportion of correctly classified samples out of the total number of samples: Accuracy = (TP + TN)/(TP + TN + FP + FN).
② Precision
Precision is the proportion of samples predicted to be positive that are truly positive: Precision = TP/(TP + FP).
③ Recall
Recall is the proportion of true positive samples that are correctly classified out of all actual positive samples: Recall = TP/(TP + FN). Like precision, recall is a statistic over a subset of samples, focusing on the actual positive samples.
④ F1
When different models have advantages in recall and precision, respectively, F1 can be used to compare them. F1 is the harmonic mean of precision and recall: F1 = 2 × Precision × Recall/(Precision + Recall).
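The worked example below computes all four metrics directly from the TP/FP/FN/TN counts defined above; the counts themselves are arbitrary toy numbers, and scikit-learn’s metric functions give the same results when applied to label vectors.

```python
# Accuracy, precision, recall and F1 from the confusion-matrix counts (toy counts).
def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> dict:
    accuracy = (tp + tn) / (tp + fp + fn + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}

print(classification_metrics(tp=60, fp=20, fn=15, tn=55))
# {'accuracy': 0.767, 'precision': 0.75, 'recall': 0.8, 'f1': 0.774} (rounded)
```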
3. Analysis and Discussion of Experimental Results
This section describes the experimental setup, presents the prediction results, and discusses their interpretation and the conclusions that can be drawn.
3.1. Parameter Selection
In this study, the data were divided into a training set, a validation set, and a test set at a ratio of 7:1:2. The price volatility feature, together with the other features constructed from the decomposition, served as the input data for the classification algorithms. Therefore, the number of volatility lags and the size of the decomposition window needed to be selected.
The performance of the classification algorithms was examined for volatility lag orders of 1 to 8. According to the test results, the optimal number of lags selected in this study was 3; that is, the volatility at t − 1, t − 2, and t − 3 was used to predict the price at period t (Figure 5).
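The sketch below shows one way to realize this setup: three lagged volatility columns are appended to the feature table and the data are split chronologically in the 7:1:2 ratio (variable names such as de_vo and vp refer to the earlier sketches and are illustrative).

```python
# Lagged volatility inputs (t-1, t-2, t-3) and a chronological 7:1:2 train/validation/test split.
import pandas as pd

def add_volatility_lags(de_vo: pd.DataFrame, vp: pd.Series, n_lags: int = 3) -> pd.DataFrame:
    out = de_vo.copy()
    for k in range(1, n_lags + 1):
        out[f"vp_lag{k}"] = vp.shift(k)            # volatility at t-1, ..., t-n_lags
    return out.dropna()

def chronological_split(df: pd.DataFrame, ratios=(0.7, 0.1, 0.2)):
    n = len(df)
    i_train = int(ratios[0] * n)
    i_val = i_train + int(ratios[1] * n)
    return df.iloc[:i_train], df.iloc[i_train:i_val], df.iloc[i_val:]   # train, validation, test
```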
In order to make the extracted features reflect the price trend information as fully as possible, the size of the decomposition window had to be determined. In this study we compared windows of 13, 26, 39, and 52 weeks. The results in Figure 6 clearly show that, with a window of 39 weeks, accuracy and F1 reached higher values; that is, with a 39-week window, the data features constructed from the VMD decomposition results characterized the trend of the garlic price series well.
Figure 7a–d compare the low-frequency component of the VMD decomposition with the original price series under window scales of 13, 26, 39, and 52, respectively. It can be seen that, with a window scale of 39, the low-frequency sequence tracked the price trend best. Therefore, the window scale selected in this study was 39.
3.2. Analysis of Predicted Results
After selecting the model hyperparameters, the performance of the model needs to be tested and compared on the test set. Some parameters of the three algorithms are shown in
Table 1. This paper analyzed the prediction results from the following three perspectives.
3.2.1. Comparison of Prediction Results of Different Classification Models
As shown in
Figure 8, the predictive performance of the three classification algorithms ranked from low to high was LR, SVM, and XGBoost. Compared with the logistic regression algorithm, SVM and XGBoost had higher accuracy and F1 values, with both exceeding 70%. This is not surprising, as the garlic price has complex nonlinear characteristics, and SVM and XGBoost have stronger abilities to handle complex nonlinear relationships. Therefore, SVM and XGBoost have better classification prediction effects.
3.2.2. Comparison of Prediction Results for Different Data Characteristics
We conducted the following three sets of experiments on LR, SVM, and XGBoost models: ① using only the volatility feature Vo; ② using only the data features De constructed from the decomposition sequence; ③ using the combined feature De_Vo.
Figure 9a–c compare the prediction evaluation indicators of LR, SVM, and XGBoost under scenarios ①, ②, and ③. It can be seen from the figures that, for every classification model, the prediction indicators based on the combined feature (accuracy, precision, recall, and F1) were better than those based on the single Vo and De features. In terms of accuracy, the LR, SVM, and XGBoost models improved by 25.8%, 23.4%, and 15.5%, respectively. Although the improvement in LR’s prediction indicators was larger than that of SVM and XGBoost, the overall prediction performance of SVM and XGBoost was better than that of the LR model: in terms of accuracy, SVM and XGBoost were 14.2% and 16.6% higher than LR, respectively. This shows that, compared with single features, combined features better describe the trend of price fluctuations, thereby improving the accuracy of model predictions. The algorithm itself is also one of the key factors affecting prediction performance, so future research could consider optimizing the algorithm or using a combination model to improve the prediction ability.
3.2.3. Predicted Results in the Case of Multiple Classifications
In reality, in addition to the upward and downward trends in prices, the magnitude of price increases and decreases is also of great concern. Therefore, one solution is to treat the problem as multi-class classification and divide the garlic price trend more finely. With three classes, small price volatility can be regarded as slight fluctuation, so the trend can be divided into rising, slightly fluctuating, and falling. Similarly, with four classes the price trend can be divided into a large rise, a rise, a fall, and a large decline; with five classes it can be divided into a large rise, a rise, slight fluctuations, a fall, and a large decline.
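A sketch of such finer labelling is given below. The paper does not specify the thresholds separating "slight fluctuations" from a "rise" or a "large rise", so the ±1% and ±5% cut-offs here are purely illustrative assumptions.

```python
# Multi-class trend labels from the volatility series vp (illustrative thresholds).
import numpy as np
import pandas as pd

def trend_labels(vp: pd.Series, n_classes: int = 3, small: float = 0.01, large: float = 0.05) -> pd.Series:
    if n_classes == 3:    # fall / slight fluctuation / rise
        bins, names = [-np.inf, -small, small, np.inf], ["fall", "flat", "rise"]
    elif n_classes == 4:  # large decline / fall / rise / large rise
        bins, names = [-np.inf, -large, 0, large, np.inf], ["big_fall", "fall", "rise", "big_rise"]
    else:                 # 5 classes: add the slight-fluctuation band in the middle
        bins, names = ([-np.inf, -large, -small, small, large, np.inf],
                       ["big_fall", "fall", "flat", "rise", "big_rise"])
    return pd.cut(vp, bins=bins, labels=names)

print(trend_labels(pd.Series([-0.08, -0.02, 0.004, 0.03, 0.09]), n_classes=5).tolist())
# ['big_fall', 'fall', 'flat', 'rise', 'big_rise']
```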
Table 2 shows the prediction results when the number of classes was 2, 3, 4, and 5; it can be seen that the classification prediction ability of the XGBoost model was stronger than that of the SVM and LR models.
It can be clearly seen from
Figure 10 that the accuracy decreases gradually as the number of classes increases. From S2 to S3, the accuracy decreases the most, and the accuracy of SVM and XGBoost decreases noticeably more than that of the LR model. From S3 to S5, the accuracy of XGBoost changes more prominently, implying that the XGBoost model is more sensitive to the number of categories.
Figure 11 portrays the improvement rate of the prediction accuracy relative to a random selection model. The random selection model assumes an accuracy of 50% with two classes, 33% with three classes, and so on. S2, S3, S4, and S5 represent the cases with 2, 3, 4, and 5 classes, respectively. It can be seen that, as the number of classes increases, the improvement over the random baseline grows. From S2 to S5, the accuracy improvement rates of the LR, SVM, and XGBoost models rose from 25.1% to 51.3%, from 42.9% to 84.2%, and from 45.9% to 92.6%, respectively. The change in the XGBoost model was the most prominent, which again reflects its higher sensitivity to the number of classes. In addition, as the number of classes increases, the absolute accuracy of the model’s classification prediction decreases. Therefore, the number of classes should be chosen by weighing these trade-offs according to the actual needs of the application.
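The improvement rate appears to be the relative gain over this random baseline, (accuracy − 1/K)/(1/K); for example, the binary XGBoost accuracy of 72.9% reported in Section 3.3 gives (0.729 − 0.5)/0.5 ≈ 45.8%, consistent with the S2 value. A one-line check:

```python
# Relative accuracy gain over a K-class random-selection baseline.
def improvement_over_random(accuracy: float, n_classes: int) -> float:
    baseline = 1.0 / n_classes                  # random-choice accuracy for K classes
    return (accuracy - baseline) / baseline

print(f"{improvement_over_random(0.729, 2):.1%}")   # 45.8%
```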
3.3. Discussion
(1) This study used garlic prices as an example and predicted the price movements of minor agricultural products from a classification perspective. New features constructed from the sequences obtained via VMD decomposition were combined with volatility features to form a composite feature for garlic price classification prediction. The results show that, for binary trend prediction, this method performs well on garlic price movements, with the XGBoost model reaching an accuracy of 72.9%. Comparative experiments showed that the prediction performance of the composite feature De_Vo was better than that of the De and Vo features used alone. This prediction method has practical value and potential for further development.
(2) In terms of the similarities and differences between this study and previous studies, earlier research on the prices of many agricultural products, including garlic, mainly focused on price volatility characteristics [35,36,37,38,39] and the prediction of specific values [40,41,42,43]. This study is the first to use new data features constructed from decomposed sequences together with volatility features to predict garlic price movements. Like previous studies, it uses a feature-plus-model approach to price prediction. The differences are that (1) previous studies focused more on approximating specific values, while this study mainly addresses the trend and direction of the price; and (2) previous studies mostly used collected or processed data directly as features, rather than reconstructing new features.
(3) In terms of the theoretical significance and practical application of this study, the theoretical significance lies in constructing a garlic price fluctuation prediction method based on multi-dimensional composite features, which provides new research ideas for future work on price prediction. The method also has high practical applicability: first, it can predict price movements and help industry personnel prepare for upcoming price changes in advance; in addition, with some improvements to the model, the multi-classification prediction results could also be used for risk-level early warnings for agricultural product prices.
Specifically speaking, accurate price trend predictions are crucial for farmers, traders, and policymakers, assisting these groups in making informed economic decisions amidst market uncertainty. Farmers can leverage price forecast data to optimize their planting and selling strategies. By understanding future price movements, they can decide on the appropriate scale of cultivation and timing of sales to maximize returns. For instance, farmers may delay sales to seek higher profits when prices are predicted to rise; conversely, they might opt for early sales to avoid losses when a price decline is anticipated. For traders, accurate price predictions enable the formulation of effective inventory and pricing strategies. This information allows traders to determine when to buy or sell garlic and how to price it, aiming to optimize profits. Policymakers can use the forecast outcomes to guide the development and adjustment of agricultural policies. Based on price trends, they can implement supportive measures, such as market interventions and price controls, to maintain market stability and protect farmers’ interests.
Therefore, the predictive model developed in this study not only offers a scientific forecast of market trends but also provides strategic guidance for the economic sustainability of the agricultural sector. To ensure the effective implementation of the predictive model, specialized training is recommended to enhance stakeholders’ understanding and application capabilities. Additionally, mechanisms should be established to ensure regular updates and adjustments to the model based on market feedback, maintaining its accuracy and practicality over time.
(4) The limitations or potential defects of this study are as follows. (1) This study mainly predicted the direction of garlic price movements, and three common classification algorithms were selected. The research results show that the prediction performance of different algorithms differed considerably, so there may be better algorithms that can provide more accurate predictions; when higher prediction accuracy is required, other more suitable models can be used. (2) The features used in this model mainly comprised the volatility feature and four new features constructed from the two sequences obtained via VMD decomposition. Although these composite features can explain price fluctuations to a certain extent, they do not fully cover all factors that might affect the price. In particular, the model did not sufficiently consider key factors such as festivals, weather, and public sentiment, which could impact garlic prices. For example, regarding the festival factor, garlic is an indispensable raw material for Chinese catering and related industries, and the strong atmosphere surrounding various Chinese festivals means that garlic prices are easily affected by festival-related information and changes. The authors of [23] found that after a random event occurs, its short-term impact on garlic prices is the largest, and it is more likely to cause abnormal fluctuations in garlic prices. Therefore, when predicting price fluctuations, festival information could be incorporated as a feature in the prediction model, thereby improving its prediction ability and interpretability.