1. Introduction
Pork is the primary source of animal protein for residents of China, and pork has consistently ranked first in domestic meat production. According to the National Bureau of Statistics (http://data.stats.gov.cn/easyquery.htm?cn=C01, accessed on 15 October 2023), hog output in China reached 55.41 million tons in 2022, accounting for 59.4% of domestic livestock output and about 50% of the world’s total. The pork supply chain comprises a wide range of links: upstream industries such as soybean and corn farming, feed processing and transportation, and slaughter, and downstream sectors such as packaging, storage, transportation, and sales to consumers. The pig farming industry therefore has a core impact on the national economy and people’s livelihood, and it also affects international and domestic pork futures indices to a certain extent. Keeping pork prices stable, avoiding sharp rises and falls, and accurately and reliably predicting how pork prices change are of great value for ensuring the safety, stability, and sustainable development of the pork supply chain and the pig farming industry.
The pork production sector makes a vital contribution to the agricultural industry. However, owing to its rapid development, poor management, incomplete regulation, and the decoupling of crop and pork production systems, pork production and the associated feed production have significantly increased environmental pollution, especially through the improper disposal of manure and slurry, the waste of feed resources, and the associated greenhouse gas emissions and use of non-renewable energy and resources. Annual pig manure production exceeded 60 Mt in 2017, accounting for about 30% of the total pollutants from the animal husbandry industry. Strengthening the prediction and monitoring of pork prices is therefore the foundation for achieving stable prices and production in the pig industry. It plays a vital role in promoting the sustainable development of the pig industry, of upstream sectors such as pig feed processing and slaughter, and ultimately of the breeding and agriculture industries [1,2,3].
Moreover, the pork supply chain significantly affects the farming of the soybean and corn used as pig feed, farmers’ income, sustainable agricultural development, and rural revitalization. Sharp fluctuations, surges, or continuous declines in pork prices not only unsettle the general public but also affect the interests of pig industry practitioners and even the stability and harmony of the economy and society. An increase in pork prices increases the number of pigs raised, thereby driving the sustainable development of the farming of feed raw materials and agricultural products and the sustainable expansion of the pig feed-processing industry. Conversely, when pork prices fall, the development of these related sectors cannot be sustained and they shrink. Studying the changes in pork prices, predicting them, and warning against fluctuation risks are therefore the foundation for the sustainable development of the pig industry and its upstream and downstream industries, and a fundamental requirement for achieving sustainable agricultural development. The reliable, reasonable, and accurate prediction of pork prices has important theoretical significance and informs decision making for the pig industry chain, pork supply chain management, the practical arrangement of production and sales activities, the price authorities that regulate pork prices, and consumers. It has long been a hot and challenging issue in price forecasting and early warning research for both government supervisors and academics [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16].
Sarle [17] and Ezekiel [18] established multiple linear regression (MLR) models to predict the price of pork (or hogs) in the 1920s. Since then, a large number of studies on the prediction and early warning of pork prices have been published. These prediction models fall into three main categories. The first type is based on the price fluctuation mechanism and its influencing factors, or independent variables (hereafter referred to as the model based on the price fluctuation mechanism). These multivariable models use different lagged periods, and the number of influencing factors ranges from as few as four or five to twenty or more. The factors mainly involve the cost of piglets, pig feeding, the price of pork substitutes, consumer demand, the feeding environment related to African swine fever, the catering industry, logistics, the international environment, the money supply, pork imports, futures indices, and so on [1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]. The second type starts from the result of price fluctuation (hereafter referred to as the model based on the result of price fluctuation): a univariate autoregressive model with multiple lagged periods is established from the time series data of pork prices (daily, weekly, half-monthly, monthly, quarterly, semi-yearly, or yearly). The third type is a hybrid multivariate model, which includes both the autoregressive time series data on pork prices and the data on influencing factors (the numbers of lagged periods can differ). Theoretically, the first model type is conducive to studying the pork price transmission mechanism, supply chain management, the pig industry chain, the management of upstream and downstream enterprises, etc. The reliability and effectiveness of its predictions depend directly on how comprehensively, scientifically, and systematically the influencing factors are determined. In the existing literature, different authors often use different independent variables (influencing factors), which shows that the factors affecting pork prices are difficult to determine reliably and reasonably; this reduces the reliability and rationality of the model and the accuracy of its predictions. Moreover, such models can generally predict the pork price only one period ahead. By contrast, all the influencing factors of pork prices, and even the interactions among them, are ultimately reflected in the time series data of pork prices, which contain the combined effect of all the influencing factors. A model established on the time series data of pork prices is therefore also reasonable and reliable.
With this model, it is very convenient to make multi-step predictions of the trend of pork prices and to predict when, or whether, an inflection point will appear within a certain period, which makes up for the deficiency of the first model type. The two model types thus study the fluctuation of pork prices from two perspectives, and both have important theoretical and practical significance. Each has its own characteristics, advantages, and disadvantages, and neither should be ignored. In principle, both are valid and reasonable. The third model type is a synthesis of the first two and integrates their advantages. Of course, it also has some deficiencies, such as predicting the price only one period ahead. As for modeling methods, almost all the methods for time series data have been applied to pork price prediction, mainly including traditional prediction models (TPMs, covering time series with a single independent variable or multiple independent variables) and modern data-mining technologies (MDTs). The TPMs include MLR [17,18,19], the grey prediction model [5], vector auto-regression (VAR) [14], the auto-regressive integrated moving average model (ARIMA) [8,18,19], etc. The MDTs (or machine learning methods) include artificial neural networks (mainly the error backpropagation neural network (BPNN), radial basis function neural networks, generalized regression neural networks, and extreme learning machine neural networks) [5,9,11,15,19], support vector machine/regression (SVM/SVR) [6], the multi-regime smooth transition autoregressive model [16], and dynamic model averaging (DMA) [4]. Meanwhile, data decomposition or variable compression methods, such as empirical mode decomposition (EMD, including EEMD, CEEMD, CEEMDAN, etc.) [6], filter algorithms [10], and principal component analysis [5], are applied to decompose the original data into components such as intrinsic mode functions (IMFs), which are then modeled by the TPMs or MDTs mentioned above. Furthermore, combination models are established using two or three of the models above [5,13]. The existing studies have achieved certain results in predicting pork prices, and most report good performance on training and validation datasets. At the same time, the modeling process still has problems with generality, applicability, and reliability.
On the other hand, research shows that projection pursuit regression (PPR) models have good generalization ability and outperform SVR/SVM, BPNN, random forest (RF), etc., in suitability, applicability, and reliability [20,21,22] for small as well as large samples. This article has four main purposes. Firstly, we apply the PPR model, which is particularly suitable for modeling high-dimensional, nonlinear, and non-normally distributed data, to monthly pork price prediction for the first time. Secondly, we introduce the principles and precautions of PPR modeling, compare the performance metrics of the PPR models with those of other models in terms of reliability, prediction accuracy, generalization ability, etc., and analyze the main problems in the literature. Thirdly, based on the established PPR models, we put forward measures and suggestions for regulating pork prices to avoid sudden increases and decreases, better promote the stable and efficient development of pig farming and its upstream and downstream industries, promote sustainable agricultural development, and enhance people’s happiness and sense of gain. Finally, we apply the PPR models to predict the monthly pork price more accurately and reliably using the latest available data.
The structure of this paper is as follows:
Section 1 is the introduction;
Section 2 is a review of the literature on price prediction of livestock, pork, and crop and PPR models;
Section 3 describes the data resources of the time series of pork price and its influencing factors;
Section 4 discusses the principles and precautions of PPR modeling based on univariate time series data and multivariate time series data such as hog–corn ratio, piglet’s price, etc.;
Section 5 presents the empirical research and the results of establishing the PPR model;
Section 6 analyses the particular procedure and results of the H-PPR model;
Section 7 is the results and the discussion;
Section 8 includes the main conclusions, policy recommendations, limitations, and future research.
2. Literature Review
In order to conduct better research, it is necessary to review the existing literature on price forecasting comprehensively, absorb valuable achievements, identify problems, and make improvements. The literature provides multiple techniques to forecast the prices of livestock, pork, and crop products. The proposed solutions include mathematical and statistical models (MLR, the grey model (GM), ARIMA, etc.) and machine learning approaches, as well as combinations of statistical and artificial intelligence models intended to provide better predictions. Among these models, ARIMA, error backpropagation neural networks (BPNN), support vector machines for regression (SVR), random forests (RF), and long short-term memory (LSTM) networks are the most popular, but other models have also been used. Refs. [23,24] comprehensively reviewed the literature on the price prediction of pork, livestock, and agricultural products. Scholars have researched pork price prediction widely since the 1920s. These previous studies add meaningful value to this article, and we provide only a brief review and summary in Table 1.
It can be seen from Table 1 that a variety of models are used, including statistical methods (e.g., ARIMA, SARIMA, and GM) and machine learning models such as BPNN, SVR, RF, and LSTM. Furthermore, swarm intelligence optimization algorithms such as FOA, WSO, and SSA are used to optimize the model parameters. Decomposition techniques such as EMD, EEMD, CEEMD, STL, and VMD are applied to decompose the time series data into several independent components; a model is established for each component, and the component models are finally combined into a price prediction model. We can conclude that more and more machine learning models and their combinations, and increasingly complicated models, are being used to predict prices. In fact, the more complicated a model is, the higher its fitting accuracy on the training samples, the poorer its generalization to the validation samples, and the greater the challenge of establishing it. Meanwhile, the conclusions of these articles are often vague or mutually inconsistent. Some scholars found that SVR and its combined models outperformed other models [34,35]; some found that LSTM and its combined models outperformed other models [25,27,34]; some found that BPNN had better generalization ability than other models [5,11,13]; Ref. [4] found that DMA performed better than other models; and Ref. [18] found that BPNN performed considerably worse than the econometric model. Theoretically, a traditional statistical model is a “white box” model with a clear working mechanism, but its flexibility is relatively limited: it achieves good results only when the pork price changes conform to the model’s functional form. The fluctuation of the monthly, weekly, or daily prices of pork, hogs, vegetables, and other agricultural products and of their futures is usually far more complex than the functional form of a traditional model. Although modern data mining technologies or machine learning models such as BPNN, SVM/SVR, RF, and LSTM have good nonlinear approximation ability, they are not a panacea for price prediction [36]. Machine learning models have good generalization ability, reliability, and applicability only if they are well designed and trained under the monitoring of validation samples so that over-training and over-fitting are avoided. Otherwise, “overtraining” and “overfitting” can easily occur. To avoid them, certain modeling principles must be followed. For example, the BPNN modeling process must follow these basic principles and steps [37,38,39,40]: (1) The sample data must be divided into training and verification subsets with similar properties. The root-mean-square error (RMSE) on the verification cases must be monitored during training; if it no longer improves and begins to rise, training is stopped (the early stopping method). Model performance is then characterized by the error on the test samples. (2) Subject to meeting the accuracy requirements, the network topology should be as compact as possible (with as few hidden layer nodes as possible). The number of training samples must be at least 3–5 times the number of network connection weights, and preferably 5–10 times or more. (3) The regularization method should be used to determine a reasonable number of hidden layer nodes. Unfortunately, much of the existing BPNN modeling literature does not follow these principles. Although the SVR model can be applied to moderate samples, it is not easy to choose reasonable model parameters. In addition, BPNN, SVR, and similar methods are data-driven “black box”, implicit models [40]. It is inconvenient to analyze their working mechanism, to study the transmission mechanism of pork prices, or to apply them subsequently, which is not conducive to formulating measures to control pork prices or to strengthening the macro management of the pork supply chain and its upstream and downstream enterprises. The DMA model, for its part, involves more than 2000 prediction models, each with 4–5 independent variables. It is very complex and theoretically significant, but its practicability is insufficient, and its prediction accuracy is not very high (see Section 7.3). Therefore, two problems exist in the existing research on pork price prediction. First, the process of establishing the machine learning models (including BPNN, RBFNN, SVR, LSTM, DMA, and various combination models) is too complex to have good applicability. Second, most of the literature establishing machine learning models does not follow the basic principles of modeling, which makes it difficult to ensure generalization and prediction ability. For SVR models (including various combination models), the results depend directly on the search range of the model’s parameters, making it difficult to ensure robustness and stability.
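The sample-size rule in principle (2) and the early stopping rule in principle (1) can be illustrated with a minimal Python sketch. It assumes a network with a single hidden layer and counts bias terms as connection weights; both are assumptions, as are the function names and the example figures (150 training samples, 12 inputs), which are hypothetical and not taken from any study cited here.

```python
def n_connection_weights(n_inputs, n_hidden, n_outputs=1):
    """Weights (biases included, by assumption) of a one-hidden-layer network."""
    return (n_inputs + 1) * n_hidden + (n_hidden + 1) * n_outputs


def max_hidden_nodes(n_train, n_inputs, ratio=3, n_outputs=1):
    """Largest hidden-layer size for which n_train >= ratio * number of weights."""
    h = 0
    while ratio * n_connection_weights(n_inputs, h + 1, n_outputs) <= n_train:
        h += 1
    return h


def early_stop_epoch(valid_rmse, patience=1):
    """Epoch with the lowest validation RMSE seen before training is stopped.

    Training stops once the validation RMSE has failed to improve for
    `patience` consecutive epochs.
    """
    best, best_epoch, waited = float("inf"), -1, 0
    for epoch, rmse in enumerate(valid_rmse):
        if rmse < best:
            best, best_epoch, waited = rmse, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                break
    return best_epoch


# Hypothetical example: 150 training samples, 12 input variables, 3x rule.
print(max_hidden_nodes(n_train=150, n_inputs=12, ratio=3))
# Hypothetical validation RMSE curve that starts rising after the third epoch.
print(early_stop_epoch([0.9, 0.7, 0.6, 0.65, 0.7]))
```

Under the stricter 5–10 times rule, the same helper can be called with `ratio=5` or `ratio=10`, which shrinks the admissible hidden layer further.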
For pork price prediction, provided that the prediction accuracy requirements are met, we should use a simple explicit model as far as possible. The model should contain as few independent variables as possible to facilitate data collection and reduce costs, which makes it more convenient to use the prediction model to take effective measures to control and adjust pork prices, analyze the pork price transmission mechanism, strengthen pork supply chain management, and improve the pig industry chain. The existing literature cannot meet these requirements. On the other hand, projection pursuit regression (PPR) is also a nonlinear data mining technology. Research has shown that it has the same nonlinear approximation ability as BPNN and is especially suitable for modeling small and medium samples that do not obey the normal distribution [20,21,22,41,42,43,44,45,46]. In the PPR model, the weights of the multiple independent variables are jointly constrained so that the sum of their squares equals 1. The PPR model has been widely used in agriculture, water conservancy, earthquake research, and experimental optimization design, where data are scarce and changes and fluctuations are complicated, but it has not been used in pork price prediction research.
This paper has the following features and contributions compared with the existing literature. First, from the perspective of theoretical model selection, modeling, and prediction ability, we established, for the first time, the PPAR model for the time series data of monthly pork prices and the H-PPR model for the monthly data of 12 influencing factors (with lag periods or sliding window data). The predation–parasitic algorithm [20,47] is adopted to obtain the true global optimal solution. Since the PPR model constrains the sum of squares of the variables’ best weights to equal 1, “overtraining” and “overfitting” can be effectively avoided. At the same time, through comparison, non-significant independent variables (influencing factors) are deleted one by one to establish more concise and practical PPAR and H-PPR models. Comparative studies show that the data-fitting ability of the several machine learning algorithms (models) is comparable, but the prediction ability of the PPAR and H-PPR models is better than that of SVR, BPNN, DMA, and other models, and their results are more robust, reliable, and reasonable. Moreover, PPAR and H-PPR are explicit models. Accordingly, given the pork price and multiple influencing factors (independent variables), this paper constructs the PPAR and H-PPR pork price prediction models and applies various performance metrics to evaluate their prediction ability, which avoids subjectivity in the choice of model and parameters, improves the effectiveness and robustness of the model, offers a new method for pork price prediction research, and provides a guiding research framework for subsequent pork price prediction modeling.
Second, from the perspective of practical application, the PPAR model established in this paper uses only the pork price data lagging 1–3 periods. The established H-PPR model, from which non-significant influencing factors were removed, includes only six independent variables. This greatly simplifies the prediction models, making the PPAR and H-PPR models more practical while achieving higher prediction accuracy. Third, according to the magnitudes and ranking of the best weights of the influencing factors, we formulate the basic principles for regulating and controlling pork prices, reveal the main factors affecting pork price fluctuation and their transmission mechanism, and put forward principles for strengthening the management of the pork supply chain. The research methods and conclusions in this paper make up for the deficiencies of the existing literature and provide an essential basis for relevant government departments to take appropriate measures to stabilize pork prices.
4. Principles of PPR Modeling
This paper mainly establishes the PPAR model and the hybrid multivariate projection pursuit regression (H-PPR) model based on the time series data of pork prices and the other independent influencing factors.
4.1. Principle of Establishing the PPAR Model
Two basic assumptions exist for establishing a PPAR prediction model for monthly pork prices based on time series historical data. Firstly, multiple factors affect the monthly pork prices, and the relationship between these factors is very complex, making it difficult to have a mathematical model to represent them. However, the results of these factors are reflected in the changes in monthly pork prices. Secondly, the changes in monthly pork prices have a certain regularity, which autoregressive time series data can represent.
According to the existing literature, the monthly price of pork exhibits several short and long cycles. Some of the literature asserts that the long cycle (low-frequency fluctuation) is around 36~48 months (3~5 years), which is too long for monthly price forecast modeling. Therefore, PPAR modeling is generally dominated by small and medium cycles. To this end, this paper analyzes whether the pork prices delayed by up to 12 months, $x(t-k)$ ($k = 1, 2, \ldots, 12$), are significantly associated with the monthly pork price $x(t)$. The modeling principle is as follows [20]:
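Step 1: The autocorrelation coefficient $R(k)$ of the delay $k$ step of the time series data $x(t)$ ($t = 1, 2, \ldots, n$) is

$$R(k) = \frac{\sum_{t=k+1}^{n} \left[x(t) - \bar{x}\right]\left[x(t-k) - \bar{x}\right]}{\sum_{t=1}^{n} \left[x(t) - \bar{x}\right]^{2}}, \qquad (1)$$

where $\bar{x} = \frac{1}{n}\sum_{t=1}^{n} x(t)$, $k = 1, 2, \ldots, m$, in general $m < [n/4]$, and $n$ is the number of time series data. As $k$ increases, the variance of $R(k)$ increases, and the estimation accuracy decreases. Therefore, $m$ is usually required to take a small value. According to the sampling distribution theory, at the confidence level ($1-\alpha$) (generally 70~80%), when the autocorrelation coefficient satisfies

$$R(k) \notin \left[\frac{-1 - u_{\alpha/2}\sqrt{n-k-1}}{n-k},\ \frac{-1 + u_{\alpha/2}\sqrt{n-k-1}}{n-k}\right], \qquad (2)$$

it can be inferred that $x(t-k)$ with delay step $k$ is significantly correlated with $x(t)$, and the series $x(t-k)$ that meet this condition are used as predictors. The quantile values $u_{\alpha/2}$ can be found in the standard normal distribution table.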
Step 2: According to the delay steps $k_1, k_2, \ldots, k_p$ selected in Step 1, we obtain the predictors $x(t-k_1), x(t-k_2), \ldots, x(t-k_p)$, where $p$ is the number of autoregressive predictors. Because it is difficult to judge the maximum pork price (it may continue to rise) and the minimum, standardization (normalization) preprocessing is generally adopted, and the prediction model of $x(t)$ with $x(t-k_1), x(t-k_2), \ldots, x(t-k_p)$ is established. According to the principle of PPAR modeling, the normalized data of the $p$-dimensional predictors $x(i,j)$ ($j = 1, 2, \ldots, p$) are projected to obtain the one-dimensional projection values

$$z(i) = \sum_{j=1}^{p} a(j)\, x(i,j), \qquad (3)$$

where $a(j)$ is the best projection vector coefficient, or weight, of the $j$th autoregressive predictor.
Step 3: Build the PPAR model between $z(i)$ and $y(i)$. To study the fitting effect and predictive ability of the model more intuitively, the monthly data of the dependent variable, the pork price $y(i)$, are not normalized. A PPAR model based on the polynomial ridge function is established between the one-dimensional projection value $z(i)$ and the pork price $y(i)$ (dependent variable). The objective function is set as the minimum sum of squared errors (least squares), that is,

$$Q(a, c) = \min \sum_{i} \left[y(i) - \hat{y}(i)\right]^{2}, \qquad (4)$$

where $\hat{y}(i)$ is the predicted value of the PPAR model. The formula based on the cubic polynomial ridge function (PRF) is

$$\hat{y}(i) = c_0 + c_1 z(i) + c_2 z(i)^{2} + c_3 z(i)^{3}, \qquad (5)$$

where $c_0, c_1, c_2, c_3$ are the coefficients of the PRF.
In practice, to prevent “overtraining” and “overfitting”, we try the linear ridge function first. The quadratic and cubic polynomial ridge functions are established if the accuracy requirements are unmet.
Step 4: Optimize the objective function (4) to obtain the global optimal solution and the fitting error of the PPAR model based on the first ridge function. If the fitting error meets the prediction accuracy requirements, stop building more PRFs and output the model parameters and performance indicators such as RMSE and MAPE. Otherwise, follow Step 5 to build more ridge functions.
Step 5: Replace the dependent variable with the residual between its actual value and the value predicted by the current model, return to Step 2, repeat Steps 3 and 4, and establish a PPAR model based on the second and third ridge functions until the prediction accuracy requirements are satisfied.
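Step 1 can be sketched in Python as follows. The sketch assumes the standard sample autocorrelation estimator and a two-sided tolerance band based on the standard normal quantile; the function names, the band formula, and the synthetic example series are illustrative assumptions, not the authors' implementation.

```python
import math
from statistics import NormalDist

import numpy as np


def autocorr(x, k):
    """Sample autocorrelation of x at delay k >= 1 (standard estimator, assumed)."""
    x = np.asarray(x, dtype=float)
    d = x - x.mean()
    return float(np.sum(d[k:] * d[:-k]) / np.sum(d * d))


def significant_lags(x, m, confidence=0.8):
    """Delays 1..m whose autocorrelation falls outside the assumed tolerance band."""
    n = len(x)
    # Two-sided standard normal quantile u_{alpha/2}, with alpha = 1 - confidence.
    u = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    lags = []
    for k in range(1, m + 1):
        half = u * math.sqrt(n - k - 1)
        lower = (-1 - half) / (n - k)
        upper = (-1 + half) / (n - k)
        r = autocorr(x, k)
        if r < lower or r > upper:
            lags.append(k)
    return lags


# Synthetic (hypothetical) monthly series with a 12-month cycle.
t = np.arange(120)
series = np.sin(2 * np.pi * t / 12)
print(significant_lags(series, m=12))
```

The delays returned by `significant_lags` would then serve as the autoregressive predictors of Step 2.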
Generally, the higher the order of the PRFs or the greater the number of PRFs, the more likely “overtraining” and “overfitting” are to occur. Therefore, a verification (test) sample should be set aside in modeling. If the verification sample error first decreases and then increases, “overtraining” and “overfitting” have occurred, and the polynomial order and the number of ridge functions reached before that point must be used.
To verify the predictive and generalization capabilities of the PPAR model, we used the monthly data of pork prices in the last 12 months as a validation sample.
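Steps 2–4 for a single ridge function, together with the choice of polynomial order on the held-out last 12 months, can be sketched as below. This is a simplified illustration under stated assumptions: a plain random search over unit-norm weight vectors stands in for the predation–parasitic algorithm, z-score normalization is assumed, and all function names are hypothetical.

```python
import numpy as np


def lag_matrix(series, lags):
    """Rows are [x(t-k) for k in lags]; targets are the corresponding x(t)."""
    x = np.asarray(series, dtype=float)
    start = max(lags)
    X = np.column_stack([x[start - k: len(x) - k] for k in lags])
    return X, x[start:]


def fit_ridge_function(X, y, degree, n_trials=2000, seed=0):
    """Random search over unit-norm weights a; least-squares polynomial in z = X @ a."""
    rng = np.random.default_rng(seed)
    best = None
    for _ in range(n_trials):
        a = rng.normal(size=X.shape[1])
        a /= np.linalg.norm(a)                 # sum of squared weights equals 1
        z = X @ a                              # one-dimensional projection values
        coef = np.polyfit(z, y, degree)        # polynomial ridge function coefficients
        sse = float(np.sum((y - np.polyval(coef, z)) ** 2))
        if best is None or sse < best[0]:
            best = (sse, a, coef)
    return best


def select_degree(series, lags, n_valid=12, degrees=(1, 2, 3)):
    """Fit linear, quadratic, and cubic PRFs; keep the order with the lowest validation RMSE."""
    X, y = lag_matrix(series, lags)
    mu = X[:-n_valid].mean(axis=0)             # normalize with training statistics only
    sd = X[:-n_valid].std(axis=0)
    Xn = (X - mu) / sd
    X_tr, y_tr = Xn[:-n_valid], y[:-n_valid]
    X_va, y_va = Xn[-n_valid:], y[-n_valid:]
    results = {}
    for d in degrees:
        _, a, coef = fit_ridge_function(X_tr, y_tr, d)
        pred = np.polyval(coef, X_va @ a)
        results[d] = (float(np.sqrt(np.mean((y_va - pred) ** 2))), a, coef)
    best_d = min(results, key=lambda d: results[d][0])
    return best_d, results
```

Trying the linear ridge function first and moving to higher orders only when the validation error improves mirrors the guard against “overtraining” described above; further ridge functions would be fitted to the residuals in the same way.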
4.2. The Principle of Establishing the H-PPR Model of Monthly Pork Price Prediction Based on Multivariate Time Series
There are two basic assumptions for establishing an H-PPR prediction model for monthly pork prices based on multivariate time series historical data. Firstly, the prices of live pigs, beef, piglets, etc., are the main factors affecting the monthly pork prices, and the effects of other factors can be ignored. Secondly, there is a specific quantitative relationship between the monthly prices of live pigs, piglets, pork, etc., that lags 1–6 periods and the current monthly pork prices.
The PPAR model generally has relatively high fitting accuracy, generalization, and prediction ability. Because it contains only the monthly pork price data, it can perform multi-period and inflection point price predictions. However, it cannot forecast rapidly soaring pork prices, it can hardly support strategic decisions for pig industry development, and it cannot be used to study the influence mechanism of pork price fluctuation. To achieve these goals, it is generally necessary to establish a nonlinear model between the monthly pork price and its influencing factors. The correlation analysis between the 12 collected factors affecting the monthly pork prices (referred to as the independent variables) and the monthly pork prices shows that all the independent variables are significantly correlated with the pork price, and that the pork prices lagging 1 to 6 periods are also significantly associated with the current pork price.
It is of no practical significance to study the relationship between the independent variables and the pork prices in the same period because these independent variables would also need to be predicted. Therefore, building a prediction model between the pork price and the independent variables lagging several periods is standard practice. According to current research results, the monthly price of piglets generally lags by six periods (months), while the other independent variables are assumed to lag by one period (scholars sometimes differ on the specific lags; see [4,5,14,15,20,48]). The correlation analysis of the monthly price of pork and the data of the other independent variables lagging 1 to 6 periods shows that (1) the longer the lag period, the lower the correlation; (2) the price of piglets is highly correlated with the monthly price of hogs for slaughter and with all the feed costs (monthly prices); and (3) the hog–corn ratio has a certain independence but is highly correlated with the monthly price of pork. Therefore, considering the model’s practicality and the need to study the transmission mechanism of the monthly price of pork, we should first establish a PPR model of the monthly price of pork and all 12 indicators. The modeling principle consists of the following two steps:
Step 1: Normalize the monthly piglet price lagging six periods and the data of the other 11 independent variables lagging one period (hereafter referred to as predictors or independent variables); the monthly price of pork in the current period is not normalized.
Step 2: Construct the sample data and make a one-dimensional projection of the $p$-dimensional predictor data $x(i,j)$ to obtain the one-dimensional projection value of sample $i$,

$$z(i) = \sum_{j=1}^{p} a(j)\, x(i,j),$$

where $a(j)$ is the weight of the $j$th predictor.
Steps three to five are the same as those for establishing the PPAR model.
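The construction of the H-PPR sample data in Steps 1 and 2 can be sketched as follows. The sketch assumes z-score normalization of the predictors and uses hypothetical variable names and toy series; only the lag of six periods for the piglet price and one period for the other variables follows the text.

```python
import numpy as np


def hppr_design(pork, factors, lags):
    """Align each factor at its own lag with the current-period pork price.

    pork    : sequence of monthly pork prices
    factors : dict mapping a factor name to its monthly series (same length as pork)
    lags    : dict mapping a factor name to its lag in months
    """
    pork = np.asarray(pork, dtype=float)
    start = max(lags.values())
    cols = [np.asarray(factors[name], dtype=float)[start - k: len(pork) - k]
            for name, k in lags.items()]
    X = np.column_stack(cols)                   # lagged predictors, raw values
    y = pork[start:]                            # current pork price, not normalized
    Xn = (X - X.mean(axis=0)) / X.std(axis=0)   # z-score normalization (assumed)
    return X, Xn, y


def project(Xn, weights):
    """One-dimensional projection z(i) with the weights rescaled to unit norm."""
    a = np.asarray(weights, dtype=float)
    a = a / np.linalg.norm(a)
    return Xn @ a


# Toy data: two hypothetical factors instead of the full set of 12.
pork = np.arange(10.0)
factors = {"piglet_price": 2.0 * np.arange(10.0), "hog_price": np.arange(10.0) + 100.0}
lags = {"piglet_price": 6, "hog_price": 1}
X, Xn, y = hppr_design(pork, factors, lags)
print(X[0], y[0])
```

Each row of `X` pairs the pork price of month t with the piglet price of month t-6 and the other factors of month t-1, so every predictor is already known when the forecast is made.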
We established two models to predict monthly pork prices: the first is a PPAR model based on the time series data of pork prices, and the second is an H-PPR model based on time series data of multiple factors with lagged periods. We compare the performance metrics of these two models with those of BPNN, SVR, LSTM, and other models, and study the applicability, advantages, and disadvantages of the models.
7. Results and Discussion
7.1. Comparison of the PPAR, H-PPR, and MLR Models
(1) The PPAR model has very high fitting and prediction accuracy. Table 2 shows that both the fitting accuracy on the training samples and the prediction ability on the verification samples of the PPAR model are higher than those of the H-PPR model. This again shows that models established on univariate time series data, such as ARIMA, BPNN, PPAR, and SVR, are feasible and meaningful for monthly pork price prediction, and confirms that the monthly pork price time series reflects the effects of a variety of factors and that its volatility and trends have a certain regularity. Compared with the ARIMA and BPNN models, the PPAR model is more concise, has a clear mathematical meaning, and has a relatively simple topology. From the best weights of the PPAR model, it is very convenient to determine how the lagged pork prices drive the fluctuations and trends of the monthly pork price. The best weight of the pork price with lag period one is 0.800, and that of the price with lag period two is −0.575, which indicates that the pork price with lag period one has the most significant impact on the monthly pork price, while the pork price with lag period two has a reverse (negative) effect and the second most significant impact. The weight of the price with lag period three is only 0.174, so its impact is markedly lower than those of lag periods one and two. By contrast, we cannot draw similar conclusions from ARIMA, BPNN, etc.
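As a quick arithmetic check on the weights quoted above, the short Python snippet below verifies that their squares sum to approximately 1, which is the constraint the PPR model places on the best weights, and ranks the lags by absolute weight. Only the three weight values come from the text; the variable names are illustrative.

```python
# Best weights of the PPAR model as reported in the text.
weights = {"lag 1": 0.800, "lag 2": -0.575, "lag 3": 0.174}

# Sum of squared weights: 0.640000 + 0.330625 + 0.030276 = 1.000901.
sum_sq = sum(w * w for w in weights.values())

# Influence ranking by absolute weight (the sign gives the direction of the effect).
ranking = sorted(weights, key=lambda name: abs(weights[name]), reverse=True)

print(round(sum_sq, 3))  # 1.001
print(ranking)           # ['lag 1', 'lag 2', 'lag 3']
```

The small deviation from 1 is consistent with the weights having been rounded to three decimal places.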
(2) The established H-PPR model fits the data well, predicts and generalizes well on the test samples, and reveals the transmission mechanism and effect of pork prices, which can help regulate the monthly pork price effectively. Although the data fitting accuracy and prediction ability of the H-PPR model are slightly lower than those of the PPAR model, the optimal weights of the multiple factors allow us to analyze the transmission mechanism of pork price changes, judge the fluctuation and trend of pork prices, and put forward more targeted measures to control pork price fluctuations or surges. Therefore, establishing the H-PPR model is essential for strengthening pork supply chain management and promoting the healthy development of the pig industry chain. The H-PPR model also provides a basis for decision making.
It can be seen from the best weights of the influencing factors in the H-PPR model that the hog price with a lag period of one has the most significant impact on the monthly pork price, followed by the beef price, the pork price with a lag period of one, the hog–corn ratio, and the piglet price with a lag period of six. Therefore, if the departments for price monitoring and management find that the hog price has risen significantly, they must provide more pork supply to the market; otherwise, the pork price will increase significantly in the next month. Similarly, if beef prices rise significantly, the departments must take corresponding measures to provide more pork or beef supply to the market; otherwise, pork prices in the following months can be expected to rise. If we remove the hog price and re-establish the H-PPR model, its objective function value is 209.88 (referring to the “H-PPR-12b” row in
Table 4), which is significantly greater than 188.00. If we delete the seven less important independent variables, such as mutton, and establish the H-PPAR-6b model, the pork price with a lag period of one has the most significant impact on the pork price, followed by the hog–corn ratio, the beef price, the finishing pig feed price, etc. Therefore, models established with different sets of variables may yield different prediction results, and we must carefully select proper and reliable influencing factors for modeling.
(3) The reliability of the MLR model is difficult to guarantee. We establish the MLR model with the same data and 12 variables. Even though the fitting accuracy of the MLR model is not low, most of the variables do not obey the normal distribution, so its reliability and robustness are theoretically difficult to guarantee. The MLR model is obtained as follows
Because there is collinearity between the variables, only the variable
is significant at level 0.01, the variables
are significant at level 0.1, and Equation (6) makes little sense. Using the stepwise regression method, we establish Equation (7) with the variables that are significant at level 0.10,
The model performance metrics of Equation (7) are shown in
Table 5.
Similarly, if the monthly pork price data end in May 2019, the MLR is established as
The model performance metrics of Equation (8) are shown in
Table 5. Comparing the performance metrics of the various models in
Table 5, it can be seen that although the performance metrics of the MLR are almost the same as those of H-PPR-6 and H-PPR-6a, the bias of Equation (7) is much greater than that of H-PPR, indicating that the predicted values of Equation (7) are skewed and that its robustness and reliability are weaker.
Comparing Equations (7) and (8), we found that some of the significant variables differ. The coefficients of the piglet, mutton, and broiler compound feed prices are less than 0, indicating that these variables negatively affect the pork price, which is difficult to explain in theory. At the same time, the equations imply that the hog–corn ratio is unrelated to pork prices, which is inconsistent with common sense and with reality. Therefore, although the fitting and prediction accuracy of the MLR is not low, its results are difficult to explain reasonably, and its practicability is poor.
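The collinearity between variables noted above can be checked numerically before an MLR is fitted. The following is a minimal sketch, not the procedure used in this paper: it computes variance inflation factors (VIFs) with NumPy on synthetic data. The function name `variance_inflation_factors`, the synthetic columns, and the common rule of thumb that a VIF above roughly 10 signals problematic collinearity are assumptions introduced here for illustration only.

```python
import numpy as np


def variance_inflation_factors(X):
    """Return the VIF of each column of X (shape: n_samples x n_features).

    VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j
    on all other columns plus an intercept.
    """
    X = np.asarray(X, dtype=float)
    n, p = X.shape
    vifs = np.empty(p)
    for j in range(p):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        A = np.column_stack([np.ones(n), others])  # intercept + other columns
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ coef
        ss_res = float(resid @ resid)
        ss_tot = float(((y - y.mean()) ** 2).sum())
        r2 = 1.0 - ss_res / ss_tot
        vifs[j] = 1.0 / (1.0 - r2)
    return vifs


# Synthetic illustration (hypothetical data, not the paper's variables).
rng = np.random.default_rng(0)
x1 = rng.normal(size=200)
x2 = x1 + 0.1 * rng.normal(size=200)  # nearly a copy of x1, so collinear
x3 = rng.normal(size=200)             # independent of x1 and x2
vifs = variance_inflation_factors(np.column_stack([x1, x2, x3]))
print(np.round(vifs, 2))
```

In this synthetic case the two nearly identical columns receive large VIFs while the independent column stays close to 1, which is the pattern one would look for among the price variables before interpreting individual MLR coefficients.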
7.2. Comparison with Xiong et al. [4]
Xiong et al. [
4] used 11 variables, including the monthly pork price, piglet price, lean pork futures prices, West Texas Intermediate (WTI) crude oil prices, etc., from January 2000 to March 2019, and established a dynamic model averaging (DMA) framework consisting of 2000 models (each model has four to five variables). Its results were compared with those of the Bayesian model, the time-varying parameter model, etc. The cutoff date of the data used for this comparison is May 2019, which is almost the same as that of Xiong et al. [
4]. The RMSE and SMAPE of the three models for the training samples (DMA, dynamic model selection, and Bayesian model averaging) are shown in
Table 6. The RMSE and SMAPE of the PPAR and H-PPR models established in this paper are also shown in
Table 6. It can be seen that the SMAPE values of the PPAR and H-PPR models for both the training and verification samples are smaller than those in Xiong et al. [
4]. Meanwhile, the RMSE is slightly larger than that of Xiong et al. [
4]. The leading cause is the large prediction error in February 2019 (AE = CNY 2.27, RE = 9.63%). From November 2018 to January 2019, the pork price was CNY 23.69, 23.16, and 22.55, respectively, declining gradually. It then suddenly turned upward in February, rising by more than CNY 1 to CNY 23.61, which resulted in a large prediction error. The prediction error in March returned to normal, indicating that the PPR model has good robustness.
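The error measures used in this comparison (AE, RE, RMSE, SMAPE) can be computed as in the following minimal sketch. It is an illustration under stated assumptions rather than the code used in this paper: the function name `forecast_metrics` is hypothetical, SMAPE is taken here as the mean of 2|y − ŷ| / (|y| + |ŷ|) (other definitions exist), and the predicted values in the example are invented; only the four actual prices are quoted from the text.

```python
import numpy as np


def forecast_metrics(actual, predicted):
    """Return common forecast error metrics as a dict (percentages in %)."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    ae = np.abs(actual - predicted)   # absolute error (AE) per sample
    re = ae / np.abs(actual)          # relative error (RE) per sample
    smape = np.mean(2.0 * ae / (np.abs(actual) + np.abs(predicted)))
    return {
        "MAE": float(ae.mean()),
        "RMSE": float(np.sqrt(np.mean((actual - predicted) ** 2))),
        "MAPE_%": float(100.0 * re.mean()),
        "SMAPE_%": float(100.0 * smape),
        "Max_AE": float(ae.max()),
        "Max_RE_%": float(100.0 * re.max()),
    }


# Actual prices (CNY) for November 2018 to February 2019 are quoted in the
# text; the predicted values are hypothetical and only show the calculation.
actual = [23.69, 23.16, 22.55, 23.61]
predicted = [23.50, 23.30, 22.80, 22.00]
metrics = forecast_metrics(actual, predicted)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Because RMSE squares the errors, a single large miss such as the February 2019 value raises RMSE more than it raises SMAPE, which is consistent with the pattern described above.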
The DMA model in Xiong et al. [
4] is too complex for practical use and is suitable mainly for theoretical research, so its applicability is limited. Establishing DMA requires solving 8000~10,000 model parameters, yet its prediction accuracy is similar to that of the PPR and PPAR models. In contrast, with PPA we need to solve only a few parameters, namely the best weights of the variables and the coefficients of the PRF, which is convenient.
Moreover, the DMA model does not allow us to analyze the transmission mechanism that affects pork prices.
7.3. Comparison of PPR with SVR, BPNN, etc.
The SVR/SVM and BPNN models each have their own characteristics and advantages. Although the results in many articles show that the fitting accuracy and prediction ability of SVR are better than those of BPNN, BPNN is still more widely used for price prediction and early warning. We apply the Data Processing System software [
50] and the STATISTICA Neural Network [
40] to establish the SVR and BPNN; the results are shown in
Table 6. The results of the SVR depend strongly on the specified ranges of the parameters to be optimized. For the univariate pork price time series data, the BPNN network topology is 3-2-1 (the numbers of neurons in the input, hidden, and output layers are 3, 2, and 1), and the number of its connection weights is 11. The network topology 6-2-1 is used for the multivariate time series data of 12 variables, and it has 17 connection weights. The 24 verification samples (accounting for about 10%) are randomly selected. During the training process, we monitor the RMSE of the verification samples, stop training when the RMSE of the verification samples begins to rise, and take the network weights from before “overtraining”. The number of training samples is more than ten times the number of connection weights, which meets the principle of modeling BPNN. The following can be seen from
Table 7: (1) For the univariate pork price time series data with lag periods of 1–3, the SVR has the smallest RMSE and SMAPE for the training samples but the largest values for the verification samples, which indicates that its generalization ability is poor. For the PPAR and BPNN, the RMSE and SMAPE of the training and verification samples are almost the same, which indicates that the PPAR and the BPNN without “overtraining” have good generalization ability. (2) For the multivariate pork price time series data, the RMSE and SMAPE of the SVR are the smallest for the training samples, but those for the verification samples are large, which indicates that the generalization ability of the SVR is poor; the RMSE and SMAPE of the training samples of the H-PPR are in good agreement with those of the verification samples, which indicates that the H-PPR model has good generalization ability and is also better than the BPNN model. The H-PPR, SVR, and BPNN outperform DMA, dynamic model selection, and Bayesian model averaging for the training samples. Therefore, compared with the BPNN and SVR, the PPAR and H-PPR have similar fitting abilities but are generally not prone to “overtraining” or “overfitting” during modeling, and they have better predictive and generalization ability.
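The connection-weight counts quoted above for the 3-2-1 and 6-2-1 topologies can be reproduced with a short calculation. The sketch below assumes a fully connected feed-forward network in which each hidden and output neuron also has one bias term counted as a weight; that assumption is ours, chosen because it matches the reported counts. The function names and the example training-set size of 200 are hypothetical.

```python
def count_connection_weights(layer_sizes, include_bias=True):
    """Count the weights of a fully connected feed-forward network.

    layer_sizes: neurons per layer, e.g. [3, 2, 1] for a 3-2-1 topology.
    include_bias: if True, each non-input neuron adds one bias weight
    (an assumption made here, not stated in the text).
    """
    total = 0
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:]):
        total += n_in * n_out      # links between adjacent layers
        if include_bias:
            total += n_out         # one bias per receiving neuron
    return total


def enough_training_samples(n_train, n_weights, ratio=10):
    """Rule of thumb from the text: samples should be >= ratio x weights."""
    return n_train >= ratio * n_weights


univariate = count_connection_weights([3, 2, 1])
multivariate = count_connection_weights([6, 2, 1])
print(univariate, multivariate)  # 11 17

# Hypothetical training-set size, used only to show the check.
print(enough_training_samples(200, multivariate))  # True
```

Under this counting, the ten-to-one rule requires at least 110 training samples for the 3-2-1 network and at least 170 for the 6-2-1 network.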
Through the above comparison, we can conclude the following. Firstly, the PPAR and H-PPR models have simple, explicit structures with precise mathematical meanings, and their prediction accuracy is higher than that of other machine learning models such as SVR and BPNN. Secondly, the PPAR and H-PPR models are semi-parametric models. When establishing them, only the coefficients (weights) of the multiple factors or autoregressive terms and the coefficients of the ridge function need to be optimized, so “overtraining” is unlikely. Thirdly, based on the established models, the importance of the multiple influencing factors or autoregressive terms can be judged directly, making it easier to analyze the transmission mechanism of the pork price, build a pork price control mechanism, and strengthen pork supply chain management. This promotes the sustainable development of pig farming and of upstream and downstream industries such as cattle, sheep, and chicken farming, agricultural product production, feed processing, and sales, and ultimately promotes sustainable agricultural and regional development.
7.4. To Predict the Pork Price Using the Latest Data Available
We collect the latest pork price data from January 2020 to November 2023 from the National Bureau of Statistics of PRC (
http://www.stats.gov.cn accessed on 5 January 2024) and the Ministry of Agriculture and Rural Affairs of PRC (
http://www.moa.gov.cn accessed on 5 January 2024). Because these websites do not provide the multivariate time series data, we establish only the PPAR model, using the data from January 2000 to November 2023, to predict pork prices in the following 13 months.
We input the normalized data into the PPA-based PPAR program, build a PPAR model with one quadratic PRF, and obtain the global optimal solution. The best weights are 0.1236, −0.4823, and 0.8672, and the best coefficients of the PRF are 22.3053, 19.7423, and −0.45775. We obtain the sample projection values
, and the predicted pork price
. The performance metrics of MAE, RMSE, MAPE, Max_AE, and Max_RE of the training samples are 0.8684, 1.5089, 3.37%, 7.708, 18.89%, and those of the verification samples are 0.8368, 1.2516, 3.23%, 2.145, 8.21%. The predicted values of the training and verification samples, as well as the forecasted samples in the following months, are shown in
Figure 4.
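The way the reported weights and PRF coefficients combine into a prediction can be illustrated with the following minimal sketch. It rests on assumptions that the text does not confirm: that the projection value is the weighted sum of three normalized lagged prices, that the quadratic PRF has the polynomial form c0 + c1·z + c2·z² with the coefficients in the order reported, and that the weights apply to the lags in the order used here. The function name `ppar_predict` and the example input are hypothetical, and the normalization scheme of this paper is not reproduced.

```python
import numpy as np

# Values reported in the text for the PPAR model with one quadratic PRF.
WEIGHTS = np.array([0.1236, -0.4823, 0.8672])
PRF_COEFFS = np.array([22.3053, 19.7423, -0.45775])


def ppar_predict(lagged_normalized, weights=WEIGHTS, coeffs=PRF_COEFFS):
    """One-step prediction under the ASSUMED form
        z = w . x,   y_hat = c0 + c1 * z + c2 * z**2,
    where x holds three normalized lagged prices (order is an assumption).
    Accepts a single vector of length 3 or an array of shape (n, 3).
    """
    x = np.asarray(lagged_normalized, dtype=float)
    z = x @ weights                 # projection value(s)
    c0, c1, c2 = coeffs
    return c0 + c1 * z + c2 * z ** 2


# The squared weights sum to approximately 1, i.e. a unit-length direction.
weight_norm = float(np.linalg.norm(WEIGHTS))
print(weight_norm)

# Hypothetical normalized lagged prices, only to show the call.
x_example = np.array([0.10, 0.05, 0.00])
y_example = float(ppar_predict(x_example))
print(y_example)
```

Under these assumptions, a multi-month forecast would be produced recursively, feeding each predicted price back in as the newest lag after applying the same normalization.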
From
Figure 4, we can conclude that the pork price will gradually increase in the following months, and the departments of price management and business administration should pay more attention to the market and provide more pork, beef, etc., to the market.
8. Conclusions, Policy Recommendations, Limitations, and Future Research
8.1. Conclusions
(1) The pig industry is an important component of animal husbandry, the feed processing industry, and agriculture, and its sustainable development significantly affects sustainable economic, social, and environmental development. Reliable and accurate prediction and risk warning of pork price fluctuations are the foundation and guarantee for achieving the sustainable development of the pig industry. Establishing the PPAR and H-PPR models and accurately and reliably predicting pork price trends help the Chinese government to establish a long-term mechanism for promoting the sustainable development of the pig industry; improve and strengthen the pork (pig) price prediction and warning system; collect information on feed prices, such as corn and finishing pig feed prices, as well as piglet prices in a timely manner; strengthen the monitoring of African swine fever and other diseases; strengthen the management of the pig industry chain; and keep price fluctuations controllable and production stable. This in turn supports the sustainable development of the pig industry (animal husbandry) and its related industries, laying a solid foundation for sustainable agricultural development.
(2) Establishing the PPAR and H-PPR models to forecast the monthly pork price has important theoretical significance and practical value. We collect the time series data of the monthly pork prices from January 2000 to September 2020 as well as the other 12 influencing factors (variables), such as the piglet and corn prices. For the monthly pork price, the results of the PPAR model with one linear or quadratic PRF show that the pork price lagged by 1–3 periods has a significant influence: the lag-one price has the largest, positive impact, while the lag-two price is of secondary significance and has a reverse, harmonic impact. The PPAR model possesses high fitting accuracy and good generalization ability. Using the time series data of the piglet price with a lag period of six, the other variables, and the pork price with a lag period of one, we established an H-PPR model with one linear PRF. We found that seven variables, namely the hog price, beef price, pork price, finishing pig feed price, piglet price, hog–corn ratio, and corn price, are important influencing factors. Among them, the hog price had the most significant impact, playing a decisive and positive role, followed by the beef price and the pork price with a lag period of one. The impacts of the other variables are almost the same. Thus, the PPAR and H-PPR models extend the methods available for monthly pork price prediction.
(3) The generalization ability and applicability of the established PPAR and H-PPR models are better than those of SVR, BPNN, DMA, and other methods. Unlike the SVR, BPNN, and DMA models, the PPAR and H-PPR models are semi-parametric, “white box” models. They are established with only a few parameters, so they are more straightforward, more explicit in mathematical meaning, and more convenient to apply than the other models. According to the best weights of the established PPR models, we can directly judge the importance of the lagged periods of the pork price and the importance and ranking of each variable, put forward practical measures for adjusting the pork price, and study the transmission mechanism and effectiveness of the pork price. According to market surveys or collected data, if the hog price has risen significantly in a month, the pork and beef supply should be increased to stabilize the pork price; otherwise, the pork price will increase dramatically in the next month. Similarly, if the corn and beef prices in a month have increased significantly, the pork price in the next month can also be expected to rise significantly. If monitoring finds that the piglet price has increased significantly, the monthly pork price will rise considerably about six months later.
(4) With the PPAR model, we can forecast the monthly pork price over multiple periods with higher accuracy, and government departments can conveniently judge the trend of the pork price. With the H-PPR model, we can forecast the monthly pork price one period ahead and study the transmission mechanism and effectiveness of the pork price, so that the related government departments can take adequate measures to strengthen pork supply chain management and control the pork price. The results of the PPAR model show that only the monthly pork prices lagged by 1–3 periods have an important impact on the current pork price; it is not necessary to introduce more lagged periods, which simplifies the model and improves its practicability. The prediction accuracy of the PPAR model is even higher than that of the H-PPR model. However, its shortcoming is that it is not suitable for studying the pork price transmission mechanism or for deriving measures and suggestions to control the pork price. According to the results of the H-PPR model, we can analyze the transmission mechanism and effectiveness of the monthly pork price, and the government authorities can strengthen the management of the pork supply chain and promote the healthy development of the pig industry chain. In establishing the H-PPR model, we deleted seven factors with lower influence, although this does not mean that these seven factors are unrelated to the monthly pork price; their influence is already reflected by factors such as the hog–corn ratio, corn price, etc. The transmission mechanism of the monthly pork price is very complex and needs to be studied further.
(5) We establish a PPAR model using the latest pork price data from January 2000 to November 2023 to forecast the trend of pork prices in the following months. The results show that the pork price will rise. The departments of price management and business administration should closely monitor the changes in pork prices and take timely measures to adjust the supply of pork, hogs, beef, etc., to ensure stable prices and increased efficiency in the pig farming industry.
8.2. Policy Recommendations
(1) To improve the monitoring of the monthly pork price, piglet price, and other information, and to improve the timeliness of monthly pork price prediction.
The pork price is the center of the whole price system of the pig industry chain. Price transmission in pig breeding has a lagged effect, price transmission across the slaughtering and sales links suffers from information asymmetry, and sudden events such as swine fever can occur; together, these are highly likely to lead to drastic price fluctuations. Therefore, if the monthly pork price is to be kept within a reasonable range, the relevant government departments must further improve the day-to-day monitoring of the monthly pork price, piglet price, and other information and give timely feedback on drastic changes in relevant prices, so as to improve the timeliness and reliability of the monthly pork price forecast.
Many factors influence the pork price. According to the results of this paper, the pork price monitoring system mainly involves primary data collection, management, processing, etc. It should focus on monitoring the piglet cost (piglet price), feeding costs (corn price, hog–corn ratio, pig and chicken feed prices, etc.), the prices of substitute products (such as beef, mutton, live chicken, etc.), the hog price index, etc. Timely data must be used to establish the PPAR and H-PPR models to ensure the timeliness of the monthly pork price prediction. Based on timely pork price predictions, market participants can make good decisions and take corresponding measures to keep pork price fluctuations within a reasonable range, ensuring the orderly operation of the market mechanism.
(2) To standardize the release of the pork price information and to realize real-time information sharing.
Information asymmetry is a fundamental reason for risk in the pork market. The regulatory information department should promptly release the pork price forecast results and the price information of related products, simplify the information query process, and realize information sharing. In this way, the market administrators, producers, and operators in the pig industry chain can grasp the market development trend in a timely and accurate manner, adjust their production and operation decisions according to the forecasting information, and actively adapt to changes in the market situation.
(3) To improve the risk early warning system of the monthly pork price and the government’s coordinating ability.
Relevant government departments should establish an emergency control mechanism for pork prices to ensure market supply and price stability. Sudden outbreaks such as African swine fever are unpredictable and can quickly lead to drastic short-term changes in pork prices. Therefore, in addition to monitoring price information, the relevant government departments must also closely monitor the epidemic situation of pigs, coordinate the release and storage of frozen pork from the central reserve in a timely fashion, and ensure a basic balance between the supply and demand of pork, so as to reduce the adverse impact of pig epidemics.
8.3. Limitations and Future Research
Theoretically, the relationship between pork supply and demand should be one of the essential factors in determining the monthly price change of pork. In addition, data decomposition techniques, such as VMD, EEMD, etc., have been widely applied in modeling time series data, although their effectiveness differs. Accordingly, this paper has two limitations. First, lacking complete data on pork supply and demand, we, like the other literature, do not consider the monthly supply and demand of pork in our modeling. Furthermore, although infectious and sow reproductive diseases have always threatened the sustainable development of the pig farming industry, related information is scarce, so we do not consider these factors either. Second, we establish the PPAR and H-PPR models using the original data; we do not decompose the pork price time series data into independent components and do not examine whether data decomposition would improve the generalization ability, applicability, and reliability. In future research, we will first collect and consider pork supply, demand, and disease factors in establishing H-PPR models. Second, we will decompose the pork price into independent components using VMD, EEMD, etc., and study whether data decomposition techniques improve the model performance.