Article

Prediction of Weekly Price Trend of Garlic Based on Classification Algorithm and Combined Features

1
School of Information Science and Engineering, Shandong Agricultural University, Taian 271018, China
2
Key Laboratory of Huang-Huai-Hai Smart Agricultural Technology, Ministry of Agriculture and Rural Affairs, Taian 271018, China
3
Agricultural Big-Data Research Center, Shandong Agricultural University, Taian 271018, China
*
Author to whom correspondence should be addressed.
Horticulturae 2024, 10(4), 347; https://doi.org/10.3390/horticulturae10040347
Submission received: 17 February 2024 / Revised: 22 March 2024 / Accepted: 26 March 2024 / Published: 30 March 2024

Abstract

To promote the sustainable development of the garlic industry and provide a reference for predicting agricultural product price trends, this study used the garlic price in Jinxiang, China as the research object. First, the feature set De was extracted from the sequences obtained using variational mode decomposition (VMD). Then, the combined feature De_Vo was constructed by merging De with the volatility feature Vo. Three classification algorithms, namely logistic regression (LR), support vector machine (SVM), and XGBoost, were used to classify and predict the garlic price trend. The results showed that predictions based on the combined features were better than those based on the single De or Vo features. In the binary classification prediction, the accuracy values for LR, SVM, and XGBoost were 62.6%, 71.4%, and 72.9%, respectively. XGBoost also performed better than LR and SVM in the three-class, four-class, and five-class predictions.

1. Introduction

The agricultural output comprises major and minor agricultural products, each pivotal in the agricultural economy. Major agricultural products, such as wheat, rice, and maize, are characterized by large-scale production and considerable economic value. Conversely, minor agricultural products, including certain fruits, vegetables, and specialty crops and herbs, are produced on a smaller scale with relatively lower economic value. With the rise of modern agriculture, minor agricultural products have become increasingly integral to the agricultural economy, significantly impacting farmers’ incomes. Minor agricultural products exhibit lower supply, demand, and transaction volume compared to major ones, rendering them more vulnerable to frequent and substantial price fluctuations, as well as market shocks resulting from stochastic events [1]. In addition, the prices of minor agricultural products are characterized by high-frequency and large-amplitude fluctuations, complex influencing factors, and a lack of market regulation, which makes predicting them more challenging. Garlic, as a critical minor agricultural product, has profound effects on the livelihoods of industry stakeholders and consumers due to its price volatility. Based on this, the price fluctuation of garlic has attracted meticulous attention from producers, government and enterprise personnel, consumers, and other large groups. In recent years, the prices of garlic, ginger, and other minor agricultural products have often experienced drastic fluctuations, and the price spikes and plunges have threatened the stable and healthy development of allied industries [2,3,4,5].
China is the world’s largest garlic producer, exporter, and consumer. Volatile garlic prices have severely affected workers’ interests and the sustainable development of the industry. Understanding the fluctuation patterns of garlic prices and predicting them accurately remains a considerable difficulty in current garlic price research, and many researchers have studied these fluctuations. Refs. [6,7,8], among others, used seasonal adjustment and filtering methods to show that garlic prices in China fluctuate seasonally and periodically, whereas refs. [7,9] used ARCH-type models to analyze the residual fluctuation characteristics of garlic prices in China and found that the fluctuations exhibit clustering and uncertainty in the “risk–return” causal relationship. The study in [10] revealed that stochastic factors have a predominantly positive impact on garlic prices in China, leading to price increases. Furthermore, these stochastic events exert the greatest influence on garlic prices in the short term and are more likely to cause abnormal fluctuations.
In the field of agricultural product price prediction, different experts and scholars have conducted research on agricultural product price prediction from the aspects of prediction methods, price fluctuations, and influencing factors, etc. The agricultural product price prediction method has undergone a development process from qualitative methods to quantitative methods, from single models to combination models, and from traditional econometric prediction methods to intelligent prediction methods [11]. The earliest use of econometric methods to predict agricultural product prices can be traced back to 1917 [12]. Due to the limitations of traditional prediction methods and the impact of the characteristics of non-stationarity and non-linearity of agricultural product prices [13,14], the combination model containing intelligent prediction methods gradually showed its advantages. Wang built an SVM-ARIMA model to predict garlic prices [15], and found that the proposed combination model had a better prediction performance than any single model. Ray proposed an ARIMA-LSTM model based on random forest to predict the volatility of agricultural product prices [16], and the results showed that compared with the single model, its three indicators (RMSE, MAPE, MASE) were improved. Among the many price prediction models, the combination prediction model based on decomposition and integration is one of the hotspots in the current research field of combination prediction models. The decomposition–integration combination model refers to splitting the complex price sequence into several simpler sub-sequences. Then, each sub-sequence is predicted separately, and finally the prediction results of these sub-sequences are integrated to obtain the prediction value of the original sequence. 
The “decomposition–integration” prediction method significantly improved the prediction performance by analyzing the fluctuation patterns and trend laws of complex systems at different scales [17,18,19,20,21].
In the problem of price prediction, it can be primarily divided into the prediction of the specific value of the price and the prediction of the price rise and fall. The current prediction of garlic prices mostly focuses on the prediction of values. Ref. [22] used Census-X12 to analyze the seasonal and irregular fluctuation patterns of garlic prices, followed by ARIMA for monthly garlic price forecasting, and assessed the future garlic price trends based on the predictions. In [10], the authors developed an ARIMA-SVM hybrid model for forecasting garlic prices, considering both linear and nonlinear aspects. The results indicated that this combination model outperformed any individual model in terms of the prediction accuracy. Wang et al. [23] constructed a GARCH family model to obtain the volatility aggregation and other volatility characteristic information of the garlic price sequence; they subsequently used the LSTM model to learn the complex nonlinear relationship between the garlic price sequence and the sequence fluctuation characteristic information, and predicted the garlic price. The experimental results showed that the prediction performance of the LSTM and GARCH family joint model containing the garlic price fluctuation characteristic information was generally better than that of the single model. The LSTM model combined with the GARCH and PGARCH models (LSTM-GP) has the best prediction effect on garlic prices in terms of evaluation indicators such as mean absolute error, root-mean-square error, and mean absolute percentage error. The above research was carried out on garlic price prediction from various angles and used various methods, and all were based on the regression idea of numerical prediction. In real life, we may not care how much the future price is, but care more about whether the price is rising or falling [24]. In addition, predicting an accurate value is still our goal. However, we can divide this goal into two small goals. 
The first step is to predict the future price rise and fall; the second step is to use the rise and fall prediction result as a feature to predict the specific number of the future price. He [25] firstly utilized VMD to decompose the crude oil price, extracted the price volatility characteristic factors for prediction from the decomposed sequence, and constructed the feature set by merging the volatility. Then, a crude oil price trend prediction model based on multimodal data features was constructed. It was discovered that the extracted data features improved the classification prediction performance of the model.
Inspired by the above research, in this study we divided the garlic price trend into two classes, rise and fall, and predicted it using the proposed combined-feature method. First, VMD was used to decompose the garlic price and extract the relevant data features. Second, the combined feature De_Vo was constructed by merging the extracted features with the volatility feature data, which enriched the feature information. Subsequently, three classification algorithms, LR, SVM, and XGBoost, were employed to forecast price movements and validate the efficacy of the combined feature. Incorporating the constructed data features into each classification algorithm improved the prediction accuracy and interpretability of the model.
The specific structure of this paper is as follows: Section 2 describes the data source and the algorithm used in the experiment. In Section 3, the experimental process is explained, and the experimental results are analyzed and discussed. Section 4 summarizes this research and future research prospects.

2. Materials and Methods

In this section a detailed description of the data source is first given and the extraction method of the data features is specified. Then, the algorithmic models used are explained. Figure 1 shows the thematic framework diagram of this study.
In this research, we initially conducted a volatility transformation on the weekly price data of garlic, resulting in the computation of the volatility feature, vp. Subsequently, we utilized VMD to deconstruct the weekly price data of garlic, separating the high-frequency and low-frequency signal subsequences from the original time series. Feature engineering was further applied to these decomposed subsequences, from which we extracted key informational features to assemble a feature set, designated as De. This feature set encompassed the volatility parameter vp, long-term trends lf, short-term cyclicality rp, determinate trends dp, and frequency components fc. The resulting feature set De was then integrated with the volatility feature vp to formulate an even more comprehensive feature set, De_Vo. Finally, we selected three machine learning models for our experiments, including logistic regression (LR), SVM, and XGBoost, training and validating them based on the De_Vo feature set to develop predictive models. The prediction performance of the models was quantified using an array of evaluation metrics comprising accuracy, precision, recall, and F1. By contrasting the effectiveness of different algorithms, we identified the optimal predictive model.

2.1. Data Sources

Jinxiang, located in the southwestern part of Shandong Province, is one of the main garlic-producing areas in China. The wholesale price of garlic in Jinxiang reflects well the price fluctuations of the garlic market and the market laws of agricultural product trading. Therefore, the weekly average price of garlic in Jinxiang was selected as the experimental object in this paper. The experimental data originated from the garlic industry chain big data platform (http://www.garlicbigdata.cn, accessed on 7 January 2024) developed by the big data center of Shandong Agricultural University. The selected data were the daily prices of Jinxiang garlic from 22 May 2003 to 1 June 2023. Figure 2a depicts the original daily price data, while Figure 2b illustrates the processed weekly average price data. It can be seen that the weekly prices give a more complete picture of the trends in the raw data.
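The daily-to-weekly aggregation can be sketched as follows. The paper does not state how the weekly average was computed, so the calendar-week mean (weeks ending on Sunday), the function name `to_weekly_mean`, and the sample values are assumptions made only for illustration.

```python
import pandas as pd

def to_weekly_mean(daily: pd.Series) -> pd.Series:
    """Average a daily price series over calendar weeks ending on Sunday."""
    # Assumption: plain arithmetic mean per week; weeks with no quotes are dropped.
    return daily.resample("W").mean().dropna()

# Hypothetical daily prices for two full weeks (Monday 2 Jan to Sunday 15 Jan 2023).
days = pd.date_range("2023-01-02", periods=14, freq="D")
daily = pd.Series(range(1, 15), index=days, dtype=float)
weekly = to_weekly_mean(daily)
```

Each weekly value is indexed by the Sunday that closes its week; a different week boundary could be chosen if the platform reports prices on another schedule.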

2.2. Data Feature Construction

For the sake of brevity in the subsequent paragraphs, the terminology used, as well as the acronyms, is summarized in Figure 3.
At the same time, in order to visualize and predict the trend of price increase and decrease, a price increase is marked with a “+” sign, and a price decrease is marked with a “−” sign in Figure 4. In the actual prediction, we chose to use “1” for a price increase and “−1” for a price decrease. The specific method was as follows: construct the volatility data vp(t) = [vr(t) − vr(t − 1)]/vr(t − 1). When vp > 0, take “1”; when vp < 0, take “−1”.
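The labelling rule above can be written as a short sketch. The function name `trend_labels` and the sample prices are hypothetical, and because the text does not say how vp = 0 is handled, assigning it to “−1” here is an assumption.

```python
import numpy as np

def trend_labels(prices):
    """Return the volatility vp(t) and the rise/fall label for each t >= 1."""
    vr = np.asarray(prices, dtype=float)
    vp = (vr[1:] - vr[:-1]) / vr[:-1]      # vp(t) = [vr(t) - vr(t-1)] / vr(t-1)
    # Assumption: an unchanged price (vp == 0) is grouped with the falls.
    labels = np.where(vp > 0, 1, -1)
    return vp, labels

# Hypothetical weekly prices.
vp, labels = trend_labels([2.0, 2.5, 2.0, 2.2])
```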
As shown in Figure 2, the extraction of data features was divided into two steps in total. The first step was the VMD decomposition and the second step was the extraction of features based on the decomposition sequence. The two parts are described separately in the following.

2.2.1. VMD Decompositions

VMD is a decomposition method proposed by Dragomiretskiy et al. [26] in 2014. It is computationally efficient and has advantages in handling signal noise and avoiding mode mixing. Compared with other nonlinear signal decomposition techniques, VMD is formulated in a constrained variational framework, which avoids the accumulation of envelope-estimation errors that affects recursive methods such as EMD, adapts better to high-frequency noise, and decomposes the signal more accurately. Therefore, VMD performs well in time-series forecasting, including for the price series of agricultural products [27,28,29,30]. VMD decomposes the original time series x(t) into K modal functions uk, also called intrinsic mode functions (IMFs), where the number of modes K is preset. If K is set to 2, the VMD algorithm decomposes the signal into two IMFs, one for the high-frequency portion and the other for the low-frequency portion. In this way, the high-frequency part and the low-frequency part of the price series can be directly extracted to reflect the short-term fluctuation and the long-term trend of the price, respectively [25]. The principle of the VMD algorithm is briefly described below:
① For each mode $u_k$, the corresponding analytic signal is computed using the Hilbert transform to obtain a one-sided spectrum. The spectrum of each mode is then shifted to baseband by multiplying by an exponential term tuned to its center frequency $\omega_k$, so that the finite band of $u_k$ is centered on $\omega_k$. The bandwidth of each mode is estimated from the Gaussian smoothness of the demodulated signal, and the sum of the bandwidths is minimized. The constrained variational model is therefore constructed as
$$\min_{\{u_k\},\{\omega_k\}} \left\{ \sum_{k} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 \right\}$$
where t is the time, δ(t) is the unit impulse function, $u_k$ is the k-th decomposed mode, $\omega_k$ is the center frequency of that mode, and * denotes convolution. The constraint is $\sum_k u_k = f(t)$; that is, the sum of the modes is equal to the original signal f(t).
② Introducing a quadratic penalty factor α and a Lagrange multiplier λ, the variational problem is transformed into an unconstrained optimization problem:
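$$L(\{u_k\},\{\omega_k\},\lambda) = \alpha \sum_{k=1}^{K} \left\| \partial_t \left[ \left( \delta(t) + \frac{j}{\pi t} \right) * u_k(t) \right] e^{-j\omega_k t} \right\|_2^2 + \left\| f(t) - \sum_{k=1}^{K} u_k(t) \right\|_2^2 + \left\langle \lambda(t),\ f(t) - \sum_{k=1}^{K} u_k(t) \right\rangle$$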
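③ Transforming the Lagrangian function from the time domain to the frequency domain and solving for the corresponding extreme values gives the following expressions for the modal component $u_k$ and its center frequency $\omega_k$:
$$\hat{u}_k^{n+1}(\omega) = \frac{\hat{f}(\omega) - \sum_{i \neq k} \hat{u}_i(\omega) + \dfrac{\hat{\lambda}(\omega)}{2}}{1 + 2\alpha(\omega - \omega_k)^2}$$
$$\omega_k^{n+1} = \frac{\int_0^{\infty} \omega \left| \hat{u}_k(\omega) \right|^2 d\omega}{\int_0^{\infty} \left| \hat{u}_k(\omega) \right|^2 d\omega}$$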
④ The modes and center frequencies are updated alternately using the alternating direction method of multipliers (ADMM) until convergence, which yields the optimal solution of the constrained variational model and decomposes the original signal into K narrowband modal components.
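The frequency-domain updates in ③ and the alternating scheme in ④ can be sketched in NumPy as follows. This is a simplified illustration rather than the implementation used in the study: the function name `vmd`, the default values of `alpha`, `tau`, `tol`, and `max_iter`, the mirror extension at the boundaries, and the uniform initialization of the center frequencies are all assumptions, and the synthetic two-tone signal merely stands in for a price series.

```python
import numpy as np

def vmd(signal, K=2, alpha=2000.0, tau=0.0, tol=1e-7, max_iter=500):
    """Simplified VMD. Returns (modes, omega), sorted by center frequency."""
    f = np.asarray(signal, dtype=float)
    n = f.size
    half = n // 2
    # Mirror-extend both ends to soften boundary effects (assumption).
    ext = np.concatenate([f[:half][::-1], f, f[n - half:][::-1]])
    T = ext.size
    freqs = np.fft.fftfreq(T)                # frequencies in cycles per sample
    pos = freqs >= 0
    f_hat = np.fft.fft(ext) * pos            # one-sided spectrum of the signal
    u_hat = np.zeros((K, T), dtype=complex)  # one-sided spectra of the modes
    omega = 0.5 * np.arange(K) / K           # initial center frequencies
    lam = np.zeros(T, dtype=complex)         # Lagrange multiplier spectrum
    for _ in range(max_iter):
        u_prev = u_hat.copy()
        for k in range(K):
            others = u_hat.sum(axis=0) - u_hat[k]
            # Mode update: Wiener-like filter centered on omega[k].
            u_hat[k] = (f_hat - others + lam / 2) / (
                1 + 2 * alpha * (freqs - omega[k]) ** 2)
            # Center-frequency update: power-weighted mean frequency.
            power = np.abs(u_hat[k, pos]) ** 2
            omega[k] = np.sum(freqs[pos] * power) / np.sum(power)
        # Dual ascent on the reconstruction constraint (inactive when tau == 0).
        lam = lam + tau * (f_hat - u_hat.sum(axis=0))
        if np.sum(np.abs(u_hat - u_prev) ** 2) / T < tol:
            break
    # Rebuild real-valued modes from the one-sided spectra, then crop the mirror.
    weight = np.where(freqs > 0, 2.0, 1.0) * pos
    modes = np.real(np.fft.ifft(u_hat * weight, axis=1))[:, half:half + n]
    order = np.argsort(omega)
    return modes[order], omega[order]

# Synthetic example: a slow component plus a faster, smaller oscillation.
t = np.arange(400)
low = np.cos(2 * np.pi * t / 200)
high = 0.5 * np.cos(2 * np.pi * t / 10)
modes, omega = vmd(low + high, K=2)
```

With K = 2, `modes[0]` plays the role of the low-frequency sequence and `modes[1]` the high-frequency sequence. Setting `tau` to 0 disables the multiplier update, so the modes are not forced to sum exactly to the input.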

2.2.2. Feature Extraction

The required data features are constructed based on the price sequence vr(t) and the low-frequency sequence dl(t) and the high-frequency sequence dh(t) obtained from the VMD decomposition.
① Price volatility
The constructed volatility feature can reflect the degree of price volatility and trend over time. The feature can be combined with other price features or indicators to form a richer feature space, thus improving the robustness and generalization ability of the forecast. The formula is as follows:
$$\mathit{vp}(t) = \frac{\mathit{vr}(t) - \mathit{vr}(t-1)}{\mathit{vr}(t-1)}$$
② Low-frequency volatility (main trend volatility)
The low-frequency volatility feature is constructed from the decomposed low-frequency series. It captures the macro-level, long-term trend of the price and helps to predict the direction of price fluctuations. The low-frequency volatility at time t, computed over a window of w periods, is calculated as follows:
$$\mathit{lf}(t) = \frac{\mathit{dl}(t) - \mathit{dl}(t-w+1)}{\mathit{dl}(t-w+1)}$$
③ Relative position
The relative position reflects the direction of price fluctuations during the window period corresponding to moment t. Its value is the price relative index at moment t, which helps to predict the direction of price fluctuations and is calculated as follows:
$$\mathit{rp}(t) = \frac{\mathit{dl}(t) - \min(\mathit{dl}[t-w+1:t])}{\max(\mathit{dl}[t-w+1:t]) - \min(\mathit{dl}[t-w+1:t])}$$
④ Deviation position
The deviation position shows the direction and extent to which the price has deviated from the long-term trend at time t. When the price deviates significantly from the long-term trend, it tends to move back towards the long-term trend line. The calculation is as follows:
$$\mathit{dp}(t) = \frac{\mathit{vr}(t) - \mathit{dl}(t)}{\mathit{dl}(t)}$$
⑤ Variance ratio
The variance ratio indicates the proportion of short-term volatility in the overall volatility within a recent historical window. The variance ratio mainly reflects the degree of price volatility, which is mainly captured using the high-frequency term dh(t). A large variance ratio indicates a large recent volatility and can provide information about price volatility. It is calculated as follows:
$$\mathit{fc}(t) = \frac{\mathrm{var}(\mathit{dh}[t-w+1:t])}{\mathrm{var}(\mathit{dl}[t-w+1:t]) + \mathrm{var}(\mathit{dh}[t-w+1:t])}$$
In this study, the above five types of features were selected based on a comprehensive consideration of prediction performance. When using VMD for decomposition, it is difficult to ensure that the decomposition results can completely retain all the information we need for forecasting. To a certain extent, the features of ① can compensate for the lack of information in the decomposition and capture both the short-term dynamics and the long-term trend of the price from another perspective. These five types of features are interrelated and cross-responsive to each other, which can portray the upward and downward price trends well.
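The five features can be computed together with rolling windows, as in the sketch below. The function name `build_features`, the column names, and the toy series are hypothetical; the window length `w` is taken to be the same for every windowed feature, and the variance ratio is assumed to use the high- and low-frequency sequences dh and dl.

```python
import numpy as np
import pandas as pd

def build_features(vr, dl, dh, w=39):
    """Return vp, lf, rp, dp, and fc as columns (NaN until a window is full)."""
    vr, dl, dh = (pd.Series(np.asarray(s, dtype=float)) for s in (vr, dl, dh))
    start = dl.shift(w - 1)                            # dl(t - w + 1)
    lo, hi = dl.rolling(w).min(), dl.rolling(w).max()  # window extremes of dl
    var_h, var_l = dh.rolling(w).var(), dl.rolling(w).var()
    return pd.DataFrame({
        "vp": (vr - vr.shift(1)) / vr.shift(1),  # price volatility
        "lf": (dl - start) / start,              # low-frequency volatility
        "rp": (dl - lo) / (hi - lo),             # relative position
        "dp": (vr - dl) / dl,                    # deviation position
        "fc": var_h / (var_l + var_h),           # variance ratio
    })

# Toy example: a linear trend plus an alternating short-term component.
dl = np.arange(1.0, 11.0)
dh = np.array([1.0, -1.0] * 5)
feats = build_features(dl + dh, dl, dh, w=3)
```

The relative position is undefined (NaN) whenever the low-frequency sequence is constant over the window, which a real pipeline would need to handle explicitly.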

2.3. Classification Algorithm

In this study, algorithms such as logistic regression, support vector machine, and XGBoost were selected to predict price rises and falls. A brief description of each algorithm is given below.

2.3.1. Logistic Regression

Logistic regression is a linear model widely applied to classification problems, which is particularly suitable for binary classification issues. This algorithm utilizes a logistic function (typically the Sigmoid function) to map the output of a linear regression model to the interval (0, 1), thus allowing it to be interpreted as a probability. Specifically, the logistic regression model predicts the log-odds of the target variable given input features, which are then transformed into probability values through the Sigmoid function. In tasks such as predicting the rise or fall of garlic prices, logistic regression can estimate the probability of price movements using historical price data and features constructed through various methods. Mathematically, the logistic regression model can be expressed as
$$P(Y = 1 \mid X) = \frac{1}{1 + e^{-(\beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n)}}$$
$$P(Y = 0 \mid X) = 1 - P(Y = 1 \mid X) = \frac{1}{1 + e^{\beta_0 + \beta_1 X_1 + \cdots + \beta_n X_n}}$$
where P(Y = 1 | X) is the probability that the target variable Y equals 1 (e.g., the price rises) given the features X; P(Y = 0 | X) is the probability that Y equals 0 (e.g., the price falls) given X; β0, β1, …, βn are the model parameters; and X1, X2, …, Xn are the feature variables.
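As a small numerical illustration of the two probabilities, the sketch below evaluates the sigmoid for one sample. The function name `rise_fall_probability`, the coefficients, and the feature values are made up for the example and are not fitted to the garlic data.

```python
import numpy as np

def rise_fall_probability(X, beta0, beta):
    """Return P(Y=1|X) and P(Y=0|X) for each row of X."""
    z = beta0 + np.asarray(X, dtype=float) @ np.asarray(beta, dtype=float)
    p_rise = 1.0 / (1.0 + np.exp(-z))   # sigmoid of the linear predictor
    return p_rise, 1.0 - p_rise

# Hypothetical sample with three lagged volatility values as features.
X = np.array([[0.02, -0.01, 0.03]])
p_rise, p_fall = rise_fall_probability(X, beta0=0.0, beta=[10.0, 5.0, 2.0])
```

Here the linear predictor is 0.21, so the predicted probability of a rise is slightly above 0.5 and the sample would be labelled as a rise under a 0.5 threshold.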

2.3.2. Support Vector Machine

Support vector machine is a binary classification model. Its basic form is a linear classifier with the maximum margin defined in the feature space. Consider the data set $T = \{(x^{(i)}, y^{(i)}) \mid x^{(i)} \in \mathbb{R}^D,\ y^{(i)} \in \{-1, 1\}\}$, where $x^{(i)}$ is a D-dimensional real vector and $y^{(i)}$ is 1 or −1, indicating the class to which the point $x^{(i)}$ belongs. The basic idea of classification learning is to find, based on the data set T, a maximum-margin hyperplane in the sample space that separates the samples of different classes. In the sample space, the separating hyperplane can be described using the following equation:
$$w^T x + b = 0$$
where $w = (w_1; w_2; \ldots; w_D)$ is the normal vector, which determines the direction of the hyperplane, and b is the displacement term, which determines the distance between the hyperplane and the origin. The distance r from any point x in the sample space to the hyperplane (w, b) can be obtained from the following equation:
$$r = \frac{|w^T x + b|}{\|w\|}$$
The sample points closest to the hyperplane are called “support vectors”. When w and b are scaled so that $|w^T x + b| = 1$ at the support vectors, the sum of the distances from two support vectors of different classes to the hyperplane is $\gamma = \frac{2}{\|w\|}$, which is called the “margin”. Finding the maximum-margin hyperplane means finding the parameters w and b that maximize γ. After w and b are determined, any new test sample $x^{(i)}$ with an unknown label is classified according to the sign of $w^T x^{(i)} + b$.
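The distance, margin, and sign rule can be checked numerically with a few lines of NumPy. The helper names and the hyperplane parameters below are hypothetical, and assigning a point that lies exactly on the hyperplane to class 1 is an assumption.

```python
import numpy as np

def distance_to_hyperplane(w, b, x):
    """Distance r = |w^T x + b| / ||w|| from point x to the hyperplane (w, b)."""
    w, x = np.asarray(w, dtype=float), np.asarray(x, dtype=float)
    return abs(w @ x + b) / np.linalg.norm(w)

def margin(w):
    """Margin 2 / ||w||, valid when |w^T x + b| = 1 at the support vectors."""
    return 2.0 / np.linalg.norm(np.asarray(w, dtype=float))

def classify(w, b, x):
    """Label a sample by the sign of w^T x + b (ties go to class 1)."""
    return 1 if np.dot(w, x) + b >= 0 else -1

# Hypothetical hyperplane in two dimensions.
w, b = np.array([3.0, 4.0]), -5.0
r = distance_to_hyperplane(w, b, [3.0, 4.0])
gamma = margin(w)
```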

2.3.3. Extreme Gradient Boosting

XGBoost (Extreme Gradient Boosting) is an efficient algorithm that implements the gradient boosting framework, optimizing the speed and performance of traditional gradient boosting techniques. XGBoost enhances the model by iteratively adding new predictors (decision trees) and modeling the residuals at each step to gradually reduce the error. It incorporates regularization terms to control the complexity of the model, effectively preventing overfitting and ensuring outstanding performance across various prediction tasks, including the classification prediction of price movements. In each iteration, XGBoost evaluates the reduction in the loss function brought about by adding a new tree and selects the tree structure that minimizes the loss. The objective function of XGBoost comprises two parts: a loss function (such as logistic loss) that measures the discrepancy between the predicted values and the actual values, and a regularization term that penalizes the complexity of the model. XGBoost has attracted a lot of attention from researchers in the field of price prediction due to its powerful performance [31,32,33,34]. The mathematical expression of the XGBoost objective function is
$$L(\theta) = \sum_{i=1}^{n} l\left(y_i,\ \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)$$
where n is the number of samples, $y_i$ is the true value of the i-th sample, t is the index of the current boosting round (one decision tree is added per round), $\hat{y}_i^{(t-1)}$ is the prediction accumulated over the previous t − 1 rounds, $f_t(x_i)$ is the output of the tree added in round t, and $\Omega(f_t)$ is the model complexity penalty term,
$$\Omega(f_t) = \gamma T + \frac{1}{2} \lambda \|w\|^2$$
where T is the number of leaf nodes in the decision tree, γ is the complexity penalty applied per leaf, λ is the L2 regularization coefficient, and ‖w‖ is the norm of the vector w of leaf weights.
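To make the two parts of the objective concrete, the sketch below evaluates it for a single candidate tree using the logistic loss. It does not call the XGBoost library; the function names, the 0/1 label encoding, and all numbers are assumptions for illustration.

```python
import numpy as np

def regularization(leaf_weights, gamma, lam):
    """Omega(f_t) = gamma * T + 0.5 * lam * ||w||^2 for one tree."""
    w = np.asarray(leaf_weights, dtype=float)
    return gamma * w.size + 0.5 * lam * np.sum(w ** 2)

def objective(y, prev_margin, leaf_index, leaf_weights, gamma=1.0, lam=1.0):
    """Logistic loss of the updated scores plus the complexity penalty."""
    y = np.asarray(y, dtype=float)                    # labels encoded as 0/1
    w = np.asarray(leaf_weights, dtype=float)
    # Updated score: previous rounds plus the weight of the leaf each sample hits.
    score = np.asarray(prev_margin, dtype=float) + w[np.asarray(leaf_index)]
    # Numerically stable log loss on the raw scores.
    loss = y * np.logaddexp(0.0, -score) + (1 - y) * np.logaddexp(0.0, score)
    return loss.sum() + regularization(w, gamma, lam)

# Hypothetical tree with two leaves; one sample falls in each leaf.
value = objective(y=[1, 0], prev_margin=[0.0, 0.0], leaf_index=[0, 1],
                  leaf_weights=[0.5, -0.5], gamma=1.0, lam=2.0)
```

In this toy case the penalty is 1 × 2 + 0.5 × 2 × 0.5 = 2.5, which is larger than the loss term; a larger γ or λ therefore favours trees with fewer leaves and smaller leaf weights.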

2.4. Evaluation Metrics

Taking binary classification as an example, the following are the prediction evaluation values.
  • The predicted value is positive (Positive) and the true value is also positive (Positive), so the prediction is true (True); True Positive (TP).
  • The predicted value is negative (Negative), but the true value is positive (Positive), so the prediction fails (False); False Negative (FN).
  • The predicted value is positive (Positive) but the true value is negative (Negative), so the prediction fails (False); False Positive (FP).
  • The predicted value is negative (Negative) and the true value is also negative (Negative), so the prediction is true (True); True Negative (TN).
In order to evaluate the prediction results, the evaluation methods involved in this study were as follows.
① Accuracy
Accuracy is the proportion of samples that are correctly classified out of the total number of samples. The formula for accuracy is as follows:
$$\mathrm{Accuracy} = \frac{n_{correct}}{n_{total}} = \frac{TP + TN}{TP + FN + FP + TN}$$
② Precision
Precision is the proportion of samples predicted as positive that are truly positive.
$$\mathrm{Precision} = \frac{TP}{TP + FP}$$
③ Recall
Recall is the proportion of positive samples that are correctly classified out of the true number of positive samples. Recall is also a statistic for partial samples, focusing on the statistics of the true positive samples.
$$\mathrm{Recall} = \frac{TP}{TP + FN}$$
④ F1
When different models have advantages in recall and precision, F1 can be used to compare them. F1 is the harmonic mean of precision and recall, and it is defined as
$$F1 = \frac{2 \times \mathrm{Precision} \times \mathrm{Recall}}{\mathrm{Precision} + \mathrm{Recall}}$$
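The four metrics follow directly from the confusion counts, as in this minimal sketch. The function name `classification_metrics` and the counts are hypothetical, and the sketch assumes that no denominator is zero.

```python
def classification_metrics(tp, fn, fp, tn):
    """Compute accuracy, precision, recall, and F1 from confusion counts."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical counts for 100 test samples.
metrics = classification_metrics(tp=40, fn=10, fp=20, tn=30)
```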

3. Analysis and Discussion of Experimental Results

This section presents the parameter selection, analyzes the prediction results, and discusses the findings.

3.1. Parameter Selection

In this study, the data were divided into a training set, a validation set, and a test set at a ratio of 7:1:2. Price volatility features along with other features based on decomposition were the input data used for the classification algorithm of this study. Therefore, the number of lags of volatility and the size of the decomposition window needed to be selected.
The performance of the classification algorithm was examined when the price lag period was 1 to 8 days. According to the test results, the optimal number of lag days selected in this study was 3. That is, the volatility of t − 1, t − 2, and t − 3 was used to predict the price of the t-th day (Figure 5).
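The lagged inputs and the 7:1:2 split can be sketched as below. The function names are hypothetical, and splitting in chronological order without shuffling is an assumption, since the text gives only the ratio.

```python
import numpy as np

def lagged_matrix(vp, n_lags=3):
    """Rows hold vp(t-1), ..., vp(t-n_lags); the target is the sign of vp(t)."""
    vp = np.asarray(vp, dtype=float)
    cols = [vp[n_lags - k: len(vp) - k] for k in range(1, n_lags + 1)]
    X = np.column_stack(cols)
    y = np.where(vp[n_lags:] > 0, 1, -1)
    return X, y

def split_7_1_2(X, y):
    """Split into training, validation, and test sets in time order."""
    n = len(y)
    i_train, i_val = n * 7 // 10, n * 8 // 10
    return ((X[:i_train], y[:i_train]),
            (X[i_train:i_val], y[i_train:i_val]),
            (X[i_val:], y[i_val:]))

# Hypothetical volatility series with alternating signs.
vp = np.array([0.1, -0.2, 0.3, -0.4, 0.5, -0.6, 0.7,
               -0.8, 0.9, -1.0, 1.1, -1.2, 1.3])
X, y = lagged_matrix(vp, n_lags=3)
train, val, test = split_7_1_2(X, y)
```

Keeping the time order means the test set always lies after the training data, which avoids leaking future prices into training.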
In order to make the extracted features reflect the price trend information as much as possible, it is necessary to determine the size of the decomposition window. In this study we selected 13 weeks, 26 weeks, 39 weeks, and 52 weeks as the control groups. The results of Figure 6 clearly show that when the window period was 39, the accuracy and F1 had higher values. That is, when the window period was 39, the data features constructed using the VMD decomposition results could characterize the trend of the garlic price series well.
Figure 7a–d are the comparison graphs of the low-frequency part of the VMD decomposition graph and the original price series under the window scales of 13, 26, 39, and 52, respectively. It can be seen from the figure that when the window scale was 39, the low-frequency decomposition sequence could better simulate the price trend. Therefore, the window scale selected in this study was 39.

3.2. Analysis of Predicted Results

After selecting the model hyperparameters, the performance of the model needs to be tested and compared on the test set. Some parameters of the three algorithms are shown in Table 1. This paper analyzed the prediction results from the following three perspectives.

3.2.1. Comparison of Prediction Results of Different Classification Models

As shown in Figure 8, the predictive performance of the three classification algorithms ranked from low to high was LR, SVM, and XGBoost. Compared with the logistic regression algorithm, SVM and XGBoost had higher accuracy and F1 values, with both exceeding 70%. This is not surprising, as the garlic price has complex nonlinear characteristics, and SVM and XGBoost have stronger abilities to handle complex nonlinear relationships. Therefore, SVM and XGBoost have better classification prediction effects.

3.2.2. Comparison of Prediction Results for Different Data Characteristics

We conducted the following three sets of experiments on LR, SVM, and XGBoost models: ① using only the volatility feature Vo; ② using only the data features De constructed from the decomposition sequence; ③ using the combined feature De_Vo. Figure 9a–c show a comparison of the prediction evaluation indicators for LR, SVM, and XGBoost under scenarios ①, ②, and ③, respectively. It can be seen from the figures that for any classification model, the prediction indicators based on the combined feature, such as accuracy, precision, recall, and F1, were better than the single Vo and De features. In terms of accuracy, the LR, SVM, and XGBoost models increased by 25.8%, 23.4%, and 15.5%, respectively. Although the improvement of LR’s prediction indicators was higher than that of SVM and XGBoost, the overall prediction performance of SVM and XGBoost was better than that of the LR model. In terms of accuracy, SVM and XGBoost were 14.2% and 16.6% higher than LR, respectively. This shows that compared with single features, combined features can better describe the trend of price fluctuations, thereby improving the accuracy of model predictions. Secondly, the algorithm was one of the key factors affecting the prediction performance, so in future research, we can consider optimizing the algorithm or using a combination model to improve the prediction ability of the model.

3.2.3. Predicted Results in the Case of Multiple Classifications

In practice, the magnitude of price increases and decreases is of as much concern as their direction. The trend of the garlic price can therefore be divided more finely, which turns the task into a multi-class problem. With three classes, small price changes are regarded as slight fluctuations, so the trend is divided into rising, slight fluctuation, and falling. Similarly, with four classes the trend is divided into a large rise, a rise, a fall, and a large fall; with five classes it is divided into a large rise, a rise, slight fluctuation, a fall, and a large fall. Table 2 shows the prediction results when the number of classes was 2, 3, 4, and 5; the classification prediction ability of the XGBoost model was stronger than that of the SVM and LR models.
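One way to produce such multi-class labels is to bin the volatility with thresholds, as sketched below. The paper does not report the cut-off values, so the thresholds `small` and `large`, the function name, and the class numbering (0 for the strongest fall up to the highest index for the strongest rise) are purely hypothetical.

```python
import numpy as np

def multiclass_labels(vp, n_classes, small=0.02, large=0.10):
    """Bin volatility values into 2 to 5 ordered trend classes."""
    edges = {
        2: [0.0],                            # fall | rise
        3: [-small, small],                  # fall | slight fluctuation | rise
        4: [-large, 0.0, large],             # large fall | fall | rise | large rise
        5: [-large, -small, small, large],   # adds the slight-fluctuation class
    }[n_classes]
    # right=True puts a value equal to an edge into the lower class.
    return np.digitize(np.asarray(vp, dtype=float), edges, right=True)

# Hypothetical volatility values.
vp = [-0.2, -0.05, 0.0, 0.05, 0.2]
labels5 = multiclass_labels(vp, 5)
labels4 = multiclass_labels(vp, 4)
labels3 = multiclass_labels(vp, 3)
```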
It can be clearly seen from Figure 10 that the accuracy decreases gradually as the number of classes increases. The largest drop occurs from two classes (S2) to three classes (S3), where the accuracy of SVM and XGBoost decreases significantly more than that of the LR model. From S3 to S5, the accuracy of XGBoost changes more prominently, implying that the XGBoost model is more sensitive to the number of classes.
Figure 11 shows the rate of improvement in prediction accuracy relative to a random selection model. Under random selection, the expected accuracy is 50% with two classes, 33% with three classes, and so on. S2, S3, S4, and S5 denote the cases in which the number of classes is 2, 3, 4, and 5, respectively. The improvement over random selection tends to grow as the number of classes increases. From S2 to S5, the accuracy improvement rate of the LR, SVM, and XGBoost models rose from 25.1% to 51.3%, from 42.9% to 84.2%, and from 45.9% to 92.6%, respectively. The change was most prominent for the XGBoost model, which again reflects its higher sensitivity to the number of classes. At the same time, the absolute accuracy of the classification prediction decreased as the number of classes increased. It is therefore necessary to weigh these advantages and disadvantages and to choose the number of classes according to the actual needs of the application.
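The improvement rates quoted above can be checked against the accuracies in Table 2, assuming that the random-selection baseline for k classes is exactly 1/k and that the improvement rate is (accuracy - baseline) / baseline. The sketch below only verifies that arithmetic; it is not the authors' code, and the function name is hypothetical.

```python
# Accuracy values copied from Table 2; keys 2..5 correspond to S2..S5.
ACCURACY = {
    "LR":      {2: 0.625641, 3: 0.486842, 4: 0.315789, 5: 0.302632},
    "SVM":     {2: 0.714286, 3: 0.493421, 4: 0.381579, 5: 0.368421},
    "XGBoost": {2: 0.729323, 3: 0.514050, 4: 0.434711, 5: 0.385124},
}


def improvement_rate(accuracy, n_classes):
    """Relative gain over a uniform random guess among n_classes labels."""
    baseline = 1.0 / n_classes  # assumption: random accuracy is exactly 1/k
    return (accuracy - baseline) / baseline


if __name__ == "__main__":
    for model, by_k in ACCURACY.items():
        rates = {f"S{k}": round(100 * improvement_rate(acc, k), 1)
                 for k, acc in by_k.items()}
        print(model, rates)
```

Under these assumptions the S2 and S5 values match those reported: 25.1% and 51.3% for LR, 42.9% and 84.2% for SVM, and 45.9% and 92.6% for XGBoost.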

3.3. Discussion

(1) This study used the garlic price as an example and predicted the price fluctuation of small-scale agricultural products from a classification perspective. New features constructed from the sequences obtained using VMD decomposition were combined with volatility features to form a composite feature for garlic price classification prediction. The results show that this method performs well in binary trend prediction of garlic price fluctuation, with the accuracy of the XGBoost model reaching 72.9%. Comparative experiments showed that the composite feature De_Vo gave better prediction performance than the De or Vo features alone. The method therefore has practical value and potential for further development.
(2) Regarding similarities to and differences from previous work: earlier research on the prices of many agricultural products, including garlic, mainly focused on price volatility characteristics [35,36,37,38,39] and on predicting specific values [40,41,42,43]. This study is the first to combine new data features constructed from decomposed sequences with volatility features to predict the price fluctuation of garlic. Like previous studies, it uses features and models to predict the price. It differs in that (i) previous studies focused on approximating specific values, whereas this study addresses the trend and direction of the price; and (ii) previous studies mostly used collected or processed data as features rather than constructing new ones.
(3) Regarding theoretical significance and practical application: the theoretical significance of this study lies in constructing a garlic price fluctuation prediction method based on multi-dimensional composite features, which provides new research ideas for future work on price prediction. The method also has practical value. First, it can predict price fluctuations and help industry personnel prepare for upcoming price changes in advance. Second, with some improvements to the model, the multi-class prediction results could be used for early warning of risk levels in agricultural product prices.
Specifically speaking, accurate price trend predictions are crucial for farmers, traders, and policymakers, assisting these groups in making informed economic decisions amidst market uncertainty. Farmers can leverage price forecast data to optimize their planting and selling strategies. By understanding future price movements, they can decide on the appropriate scale of cultivation and timing of sales to maximize returns. For instance, farmers may delay sales to seek higher profits when prices are predicted to rise; conversely, they might opt for early sales to avoid losses when a price decline is anticipated. For traders, accurate price predictions enable the formulation of effective inventory and pricing strategies. This information allows traders to determine when to buy or sell garlic and how to price it, aiming to optimize profits. Policymakers can use the forecast outcomes to guide the development and adjustment of agricultural policies. Based on price trends, they can implement supportive measures, such as market interventions and price controls, to maintain market stability and protect farmers’ interests.
Therefore, the predictive model developed in this study not only offers a scientific forecast of market trends but also provides strategic guidance for the economic sustainability of the agricultural sector. To ensure the effective implementation of the predictive model, specialized training is recommended to enhance stakeholders’ understanding and application capabilities. Additionally, mechanisms should be established to ensure regular updates and adjustments to the model based on market feedback, maintaining its accuracy and practicality over time.
(4) This study has the following limitations. (i) It mainly predicted the price fluctuation of garlic using three common classification algorithms. The results show large differences in prediction performance between algorithms, so other models may exist that provide more accurate predictions; when higher accuracy is required, more suitable models can be used. (ii) The features used mainly comprised volatility features and four new features constructed from two sequences decomposed using VMD. Although these composite features explain price fluctuations to a certain extent, they do not cover all factors that might affect the price. In particular, the model did not sufficiently consider key factors such as festivals, weather, and public sentiment. Taking festivals as an example, garlic is an indispensable ingredient in Chinese catering and related industries, and Chinese festivals are widely celebrated, so garlic prices are easily affected by festival-related information and changes. The authors of [23] found that random events have their largest impact on garlic prices in the short term and are likely to cause abnormal price fluctuations. Festival information could therefore be integrated into the prediction model as a feature, improving its predictive ability and interpretability.

4. Conclusions

This paper obtained the composite feature De_Vo from the volatility feature Vo and the feature De extracted from the sequences decomposed using VMD, and used three classification models, LR, SVM, and XGBoost, to predict the price fluctuation trend of garlic. The research results and prospects of this paper are as follows:
(1) Compared with the LR model, the SVM and XGBoost models had higher accuracy and F1 values, both exceeding 70%. The XGBoost model performed best, with an accuracy of 72.9%, indicating that it can predict the price fluctuation trend of garlic well. Future research can draw on the methodology proposed in this paper to forecast upward and downward trends in agricultural commodity prices.
(2) The evaluation indicators obtained using the composite feature De_Vo were higher than those obtained using the De or Vo feature alone. With the De_Vo composite feature, the accuracy of the LR, SVM, and XGBoost models increased by 25.8%, 23.4%, and 15.5%, respectively.
(3) As the number of classes increases, the evaluation indicators of each model decrease. For every number of classes, the XGBoost model performed better than the LR and SVM models. Compared with them, the XGBoost model was more sensitive to the number of classes, that is, its accuracy declined faster. Nevertheless, it still performed best in all of S2, S3, S4, and S5. Future work can study multi-class prediction for S3, S4, and S5 in more depth and apply it to early warning of garlic price fluctuations.
(4) When predicting price fluctuations, the factors that affect them should be considered together with the actual prediction target in order to construct more suitable data features; in addition, combined prediction models or optimization algorithms can be used to improve prediction performance.
(5) Trend prediction and value prediction can be combined to improve prediction performance, for example by using the predicted rise or fall as a feature for value prediction or, conversely, by using the predicted value as a feature for trend prediction.

Author Contributions

Conceptualization, F.S. and P.L.; methodology, F.S.; writing—original draft preparation, F.S. and X.M.; writing—review and editing, X.M. and H.Z.; supervision, project administration, Y.W. and H.Z. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Key Research Development Program (Major Science and Technology Innovation Projects) of Shandong Province, grant number 2022CXGC010609.

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Data are contained within the article.

Conflicts of Interest

The authors declared no conflicts of interest.

References

  1. Chen, M.J. Characteristics of Small Agricultural Products Industry and Suggestions for Price Regulation. China Price Regul. Antimonop. 2017, 9, 45–48.
  2. Zhang, Y.W.; Xu, Y.Y.; Liu, J.J. Research on Price Risk identification of Small-scale Agricultural Products and Its Countermeasures—Taking Green Onion, Ginger and Garlic as Examples. Price Theory Pract. 2022, 10, 111–114.
  3. Shi, G.Y.; Li, M.G. Forecast and Analysis of Garlic Price Time Series in Qingdao City Based on ARIMA Model. Shandong Agric. Sci. 2017, 49, 168–172.
  4. Lv, X.; Meng, J.; Wu, Q. Dynamic Influence of Network Public Opinions on Price Fluctuation of Small Agricultural Products Based on NLP-TVP-VAR Model—Taking Garlic as an Example. Sustainability 2022, 14, 8637.
  5. Marina, A.; Sri, H.; Sri, M. Analysis of Indonesian and Chinese Garlic Volatility Prices. Int. J. Sci. Res. Sci. Eng. Technol. 2019, 6, 197–207.
  6. Zhang, L.X.; Zhang, X.C.; Chen, S.T. Does Hot Money Impact Fluctuation of Agricultural Product Price—Case of Fluctuations in Garlic Prices. J. Agrotech. Econ. 2010, 12, 60–67.
  7. Qiu, S.X. China’s garlic price volatility cycle and characterization. Stat. Decis. 2013, 15, 97–100.
  8. Li, J.D.; Zhang, J.G. Characteristics and influencing factors of Chinese small-scale agricultural products price fluctuation: An empirical analysis based on the garlic price data from 2005 to 2014. J. Hunan Agric. Univ. (Soc. Sci.) 2015, 16, 8–15.
  9. Yao, S. Characterization of Garlic Price Volatility in China—Empirical Analysis Based on ARCH Class Models. Price Theory Pract. 2012, 10, 54–55.
  10. Guan, F.X.; Yao, R.K. Study on Price Fluctuation of Small-scale Agricultural Products and Its Random Factors—Take garlic price as an example. Price Theory Pract. 2022, 9, 117–120.
  11. Sun, F.; Meng, X.; Zhang, Y.; Wang, Y.; Jiang, H.; Liu, P. Agricultural Product Price Forecasting Methods: A Review. Agriculture 2023, 13, 1671.
  12. Moore, H.L. Forecasting the Yield and the Price of Cotton; Macmillan: New York, NY, USA, 1917.
  13. Zhang, L.J.; Fu, Y.H. Research on the nonlinearity and complexity of China’s futures prices of agricultural products. Prices Mon. 2023, 12, 1–9.
  14. Hua, J.G.; Su, G.F.; Jia, Y.F. Research on Pork Price Forecasting and Risk Early Warning Under Impact of Swine Epidemic. Agric. Econ. Manag. 2022, 6, 101–113.
  15. Wang, B.J.; Liu, P.; Chao, Z.; Junmei, W.; Chen, W.; Cao, N.; O’hare, G.M.; Wen, F. Research on Hybrid Model of Garlic Short-Term Price Forecasting based on Big Data. Comput. Mater. Contin. 2018, 57, 283–296.
  16. Ray, S.; Lama, A.; Mishra, P.; Biswas, T.; Das, S.S.; Gurung, B. An ARIMA-LSTM model for predicting volatile agricultural price series with random forest technique. Appl. Soft Comput. 2023, 149, 110939.
  17. Hu, C.A.; Jiang, W. Pork Price Prediction Model Based on VMD-BO-BILSTM. J. Appl. Sci. 2023, 41, 692–704.
  18. Sun, C.; Pei, M.; Cao, B.; Chang, S.; Si, H. A Study on Agricultural Commodity Price Prediction Model Based on Secondary Decomposition and Long Short-Term Memory Network. Agriculture 2023, 14, 60.
  19. Jaiswal, R.; Choudhary, K.; Kumar, R.R. A Decomposition-Based Hybrid Model for Price Forecasting of Agricultural Commodities. Natl. Acad. Sci. Lett. 2022, 45, 477–480.
  20. Fang, X.Q.; Wu, C.Y.; Yu, S.H.; Zhang, D.B.; Ou, Y.Q. Research on Short-Term Forecast Model of Agricultural Product Price Based on EEMD-LSTM. Chin. J. Manag. Sci. 2021, 29, 68–77.
  21. Tang, Z.P.; Wu, J.C.; Zhang, T.T.; Du, X.X.; Chen, K.J. Research on grain futures price forecasting based on secondary decomposition and ensemble learning. Syst. Eng.-Theory Pract. 2021, 41, 2837–2849.
  22. Wang, Y.J.; Bai, L.; Zhao, B.H. Price Forecast of Garlic in China Based on ARIMA Mode. Vegetables 2021, 12, 50–54.
  23. Wang, Y. Agricultural products price prediction based on improved RBF neural network model. Appl. Artif. Intell. 2023, 37, 2204600.
  24. Wang, Y. Research on Combined Forecasting Model of Ginger Price Based on Multiple Influencing Factors. Master’s Thesis, Shandong Agricultural University, Taian, China, 2023.
  25. He, H.; Sun, M.; Li, X.; Mensah, I.A. A novel crude oil price trend prediction method: Machine learning classification algorithm based on multi-modal data features. Energy 2022, 244, 122706.
  26. Dragomiretskiy, K.; Zosso, D. Variational mode decomposition. IEEE Trans. Signal Process. 2013, 62, 531–544.
  27. Bisoi, R.; Dash, P.K.; Parida, A.K. Hybrid variational mode decomposition and evolutionary robust kernel extreme learning machine for stock price and movement prediction on daily basis. Appl. Soft Comput. 2019, 74, 652–678.
  28. Zhu, Q.; Zhang, F.; Liu, S.; Wu, Y.; Wang, L. A hybrid VMD–BiGRU model for rubber futures time series forecasting. Appl. Soft Comput. 2019, 84, 105739.
  29. Lin, Y.; Lu, Q.; Tan, B.; Yu, Y. Forecasting energy prices using a novel hybrid model with variational mode decomposition. Energy 2022, 246, 123366.
  30. Wu, J.; Hu, Y.; Wu, D.; Yang, Z. An Aquatic Product Price Forecast Model Using VMD-IBES-LSTM Hybrid Approach. Agriculture 2022, 12, 1185.
  31. Avanijaa, J. Prediction of house price using xgboost regression algorithm. Turk. J. Comput. Math. Educ. (TURCOMAT) 2021, 12, 2151–2155. Available online: https://www.turcomat.org/index.php/turkbilmat/article/view/1870 (accessed on 7 January 2024).
  32. Yun, K.K.; Yoon, S.W.; Won, D. Prediction of stock price direction using a hybrid GA-XGBoost algorithm with a three-stage feature engineering process. Expert Syst. Appl. 2021, 186, 115716.
  33. Jabeur, S.B.; Mefteh-Wali, S.; Viviani, J.L. Forecasting gold price with the XGBoost algorithm and SHAP interaction values. Ann. Oper. Res. 2021, 334, 679–699.
  34. Wang, Y.; Guo, Y. Forecasting method of stock market volatility in time series data based on mixed model of ARIMA and XGBoost. China Commun. 2020, 17, 205–221.
  35. Xie, Y.; Wang, Y.F.; Qiao, S. An Empirical Analysis of the Price Fluctuation Characteristics of our country’s Small Agricultural Products—Taking the Wholesale Price of Garlic as an Example. North. Econ. Trade 2023, 10, 82–87.
  36. Yao, S. Characteristics of and Influencing factors of Small-Scale Agricultural Product in China—A Case Study of Shelf-stable Agricultural Products as Garlic. Price Theory Pract. 2021, 8, 100–103+186.
  37. Yuan, T.T.; Wang, L.L.; Zhao, B.H.; Wang, J.Q. Empirical Analysis of Price Fluctuation of Garlic in China. North. Hortic. 2018, 20, 185–190.
  38. Wu, G.; Liu, P.; Chen, W.; Han, W. Analysis of Price Fluctuation Characteristics and Influencing Factors of Garlic Based on HP Filter Method. In Proceedings of the Cloud Computing and Security, ICCCS 2018, Haikou, China, 8–10 June 2018.
  39. Marfatia, H.A.; Ji, Q.; Luo, J. Forecasting the volatility of agricultural commodity futures: The role of co-volatility and oil volatility. J. Forecast. 2022, 41, 383–404.
  40. Wang, Y.; Liu, P.; Zhu, K.; Liu, L.; Zhang, Y.; Xu, G. A Garlic-Price-Prediction Approach Based on Combined LSTM and GARCH-Family Model. Appl. Sci. 2022, 12, 11366.
  41. Yin, H.; Jin, D.; Gu, Y.H.; Park, C.J.; Han, S.K.; Yoo, S.J. STL-ATTLSTM: Vegetable Price Forecasting Using STL and Attention Mechanism-Based LSTM. Agriculture 2020, 10, 612.
  42. Zhang, D.; Chen, S.; Liwen, L.; Xia, Q. Forecasting agricultural commodity prices using model selection framework with time series features and forecast horizons. IEEE Access 2020, 8, 28197–28209.
  43. Fang, Y.M.; Guan, B.; Wu, S.; Saeed, H. Optimal forecast combination based on ensemble empirical mode decomposition for agricultural commodity futures prices. J. Forecast. 2020, 39, 877–886.
Figure 1. Main framework of the study.
Figure 2. (a) Daily price of garlic; (b) weekly price of garlic.
Figure 3. Some common abbreviations used in the article.
Figure 4. Chart of weekly price increases and decreases in symbols.
Figure 5. (a) Comparison of accuracy values of the classification algorithm’s prediction results with different volatility lag days. (b) Comparison of F1 values of the classification algorithm’s prediction results with different volatility lag days.
Figure 6. (a) Comparison of accuracy values for each classification algorithm at different decomposition window scales; (b) comparison of F1 values for each classification algorithm at different decomposition window scales.
Figure 7. Decomposition of price series with different window scales: (a) window scale of 13, (b) window scale of 26, (c) window scale of 39, (d) window scale of 52.
Figure 8. Comparison of predictive indicators of different classification models.
Figure 9. Comparison of prediction results with different features. (a) LR prediction metric value. (b) SVM prediction metric value. (c) XGBoost prediction metric value.
Figure 10. Comparison of Accuracy values for different number of classifications in each model.
Figure 11. Comparison of Accuracy enhancement rate with different number of classifications in each model.
Table 1. Parameters related to the classification algorithm section.
| Model | Parameter |
| --- | --- |
| LR | 'C': 100.0, 'penalty': 'l2' |
| SVM | 'C': 100, 'gamma': 0.01, kernel = 'rbf' |
| XGBoost | objective = "multi:softmax", num_class = 3, n_estimators = 100, max_depth = 3 |
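For readers who want to try settings like those in Table 1, the sketch below shows one way they might be passed to scikit-learn and the xgboost Python package. That the study used these libraries is an assumption based on the parameter names; the dictionary names and build_models are hypothetical and introduced only for this example.

```python
# Hyperparameters as listed in Table 1.
LR_PARAMS = {"C": 100.0, "penalty": "l2"}
SVM_PARAMS = {"C": 100, "gamma": 0.01, "kernel": "rbf"}
XGB_PARAMS = {"objective": "multi:softmax", "n_estimators": 100, "max_depth": 3}

# Table 1 lists num_class = 3, which applies to the three-class case. It is
# kept separately here on the assumption that the scikit-learn style wrapper
# determines the number of classes from the training labels when fitting.
XGB_NUM_CLASS_THREE_CLASS_CASE = 3


def build_models():
    """Create unfitted classifiers configured with the Table 1 settings."""
    # Imported lazily so the parameter dictionaries above can be inspected
    # even when the third-party packages are not installed.
    from sklearn.linear_model import LogisticRegression
    from sklearn.svm import SVC
    from xgboost import XGBClassifier

    return {
        "LR": LogisticRegression(**LR_PARAMS),
        "SVM": SVC(**SVM_PARAMS),
        "XGBoost": XGBClassifier(**XGB_PARAMS),
    }
```

Each returned object follows the usual fit and predict interface, so the same feature matrix (Vo, De, or De_Vo) could be passed to all three models for comparison.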
Table 2. Prediction results of each model with different number of classifications.
| Models | Metrics | S2 | S3 | S4 | S5 |
| --- | --- | --- | --- | --- | --- |
| LR | Accuracy | 0.625641 | 0.486842 | 0.315789 | 0.302632 |
| LR | F1 | 0.620815 | 0.477432 | 0.317808 | 0.269266 |
| SVM | Accuracy | 0.714286 | 0.493421 | 0.381579 | 0.368421 |
| SVM | F1 | 0.712318 | 0.492597 | 0.379997 | 0.357876 |
| XGBoost | Accuracy | 0.729323 | 0.51405 | 0.434711 | 0.385124 |
| XGBoost | F1 | 0.72894 | 0.514669 | 0.434146 | 0.383562 |
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Sun, F.; Meng, X.; Zhang, H.; Wang, Y.; Liu, P. Prediction of Weekly Price Trend of Garlic Based on Classification Algorithm and Combined Features. Horticulturae 2024, 10, 347. https://doi.org/10.3390/horticulturae10040347
