Article

An Integrated Hog Supply Forecasting Framework Incorporating the Time-Lagged Piglet Feature: Sustainable Insights from the Hog Industry in China

1 School of Management, Huazhong University of Science and Technology, Wuhan 430000, China
2 National Hog Big Data, Rongchang, Chongqing 402460, China
* Author to whom correspondence should be addressed.
Sustainability 2024, 16(19), 8398; https://doi.org/10.3390/su16198398
Submission received: 13 August 2024 / Revised: 13 September 2024 / Accepted: 23 September 2024 / Published: 27 September 2024

Abstract
The sustainable development of the hog industry has significant implications for agricultural development, farmers’ income, and the daily lives of residents. Precise hog supply forecasts are essential both for the government to ensure food security and for industry stakeholders to make informed decisions. This study proposes an integrated framework for hog supply forecasting. Granger causality analysis is used to investigate the causal relationships among piglets, breeding sows, and hog supply and, at the same time, to determine the uncertain time lags associated with these variables, which facilitates the extraction of valuable time lag features. Seasonal and Trend decomposition using Loess (STL) is used to decompose hog supply into three components: the Autoregressive Integrated Moving Average (ARIMA) model forecasts the trend, and eXtreme Gradient Boosting (XGBoost) forecasts the seasonality and residuals. Extensive experiments are conducted using monthly data from all the large-scale pig farms in Chongqing, China, covering the period from July 2019 to November 2023. The results demonstrate that the proposed model outperforms five baseline models, with a reduction of more than 90% in Mean Squared Logarithm (MSL) loss. Including the piglet feature improves the accuracy of hog supply forecasts, reducing the MSL loss by 42.1%. Additionally, the findings reveal statistical time lag periods of 4–6 months for piglets and 11–13 months for breeding sows, at the 99% confidence level. Finally, policy recommendations are proposed to promote the sustainability of the pig industry, thereby driving the sustainable development of both upstream and downstream sectors of the swine industry and ensuring food security.

1. Introduction

The pig industry in China plays a vital role in the country’s economy and social well-being, exerting significant influence on people’s daily lives and overall socioeconomic stability. Data from the National Bureau of Statistics (NBS) reveals that the output value of China’s pig industry amounted to 1784.92 billion CNY in 2022, contributing 1.48% to the gross domestic product (GDP). With 19.18 million pig farms in operation in 2022, the industry provides significant employment opportunities. Among these farms, 18.0 million had an annual output of less than 50 hogs. This industry plays a pivotal role in advancing rural development and improving farmers’ incomes. Pork is the primary meat protein consumed by the Chinese population, especially in rural areas. Per-capita pork consumption accounts for 77.46% of total meat consumption, highlighting the importance of the pig industry to the food security of the population.
Because China is the world’s largest producer and consumer of pork, its market dynamics have profound implications not only domestically but also globally [1]. The Ministry of Agriculture and Rural Affairs (MARA) in China reported a pork production of 57.94 million tons in 2023, accounting for over half of the global total. Fluctuations in China’s hog production capacity are reflected in corresponding fluctuations in the country’s pork imports. As indicated by data from the United States Department of Agriculture (USDA), China’s pork imports in 2020 were projected to reach 4.8 million tonnes, representing approximately 46% of the global total and approximately 93.93% of the total for Germany, the world’s third-largest pork producer, in the same year.
The hog supply has exhibited significant fluctuations over time. Figure 1 illustrates the annual trends in national pork production and the growth rate over the past sixteen years. Despite notable fluctuations, there is an overall upward trajectory in pork production, attributed to technological advancements, scale expansion, and increased pork demand [2]. The fluctuations have been influenced by various factors, including epidemics, policy adjustments, and market prices [3,4,5]. Figure 2 showcases the significant seasonal variations in pork supply from 2018 to 2024, driven by breeding cycles and holiday-driven consumer demand [6,7]. Additionally, the lengthy biological growth cycle of pigs, which lasts approximately 18 months from breeding sow replenishment to market maturity, introduces substantial time lags and uncertainty in production capacity. Furthermore, human decisions, such as halting sow insemination, can induce capacity fluctuations extending over several months or even years [8].
The demand for pork has undergone a gradual stabilization following an initial period of expansion [9]. Nonetheless, the price of pork remains highly susceptible to fluctuations in pig production capacity. The Chinese government has implemented a series of policies to ensure stable pig supply in order to ensure food security. Notably, the pig industry has consistently received emphasis in the annual inaugural policy issued by the Chinese government over the past five years. Serving as the regulatory authority for the Chinese hog industry, the MARA introduced the “Implementation Plan for Capacity Regulation in the Pig Industry” in 2024. This plan highlights the crucial role of early warning systems for production capacity in maintaining industry stability, with accurate prediction of future hog supply being the key component. However, the intricate interplay among the multifaceted factors poses significant challenges to accurately forecasting hog supply, complicating efforts for academia, policy makers, and industry practitioners alike. Motivated by the importance of hog supply, the objective of this study is to establish a framework for accurately forecasting hog supply.
Despite the importance of the pig industry, the current literature on hog supply forecasting is limited. Research on hog supply has mainly employed statistical methods such as vector autoregressions (VAR) [10], linear regression analysis [11], and recursive models [12]. More recently, Machine Learning (ML) has improved resource management and enabled more sophisticated decision-making processes, revolutionizing the agricultural field [13,14,15]. In the hog industry, ML methods are utilized to analyze pig cycle patterns [16] and predict pork price trends [7,17]. The use of ML methods for forecasting hog supply, however, has received limited attention. This can be attributed to the inherent challenges associated with developing accurate predictive models in the highly volatile environment of the hog industry. Nevertheless, it is during these periods of dynamic change that precise forecasting of hog supply holds the utmost economic value [18]. Hence, this study proposes a hybrid ML forecast model leveraging the latest ML techniques to enhance the precision and effectiveness of hog supply forecasting.
This study collected five years of monthly records, from July 2019 to November 2023, from large-scale farms. These monthly data offer finer granularity than the annual hog supply data used in existing hog supply forecasting research [11], enabling the government to take more timely and scientific measures to guarantee food security. First, the piglet is incorporated as a key feature for forecasting hog supply. In the pig industry chain, the time lag period is a crucial concept that refers to the temporal gap between the occurrence of an event or factor and the subsequent realization of its actual consequences. Piglets face a shorter lag period before becoming market hogs, which reduces the influence of external factors during the pig growing process. Furthermore, the widely accepted notion of a five-month time lag from piglet to hog supply lacks rigorous theoretical research and statistical validation. This study employs the Granger causality test [19] to investigate, for the first time, the causal relationship between piglets and hog supply, and simultaneously to determine the uncertain time lags. The findings demonstrate that incorporating the piglet feature with the identified time lags leads to a 42.1% reduction in the Mean Absolute Error (MAE), a significant enhancement in predictive accuracy. This study, for the first time, statistically demonstrates that the exact time lag periods in Chongqing are 4–6 months for piglets and 11–13 months for breeding sows. This work pioneers the application of the Granger causality test in the analysis of features with uncertain time lags in the pig industry.
This study aims to develop a hybrid machine learning-based time series predictive model specifically designed for forecasting hog supply, with the goal of fostering scientific decision making. The fluctuations in hog supply are driven not only by trends and cyclical patterns [20], but also by irregular disruptions. To capture the dynamic patterns of the pig industry, this study adopts the STL (Seasonal and Trend decomposition using Loess) method [21] to decompose the time-series data into three components: trend, seasonality, and residuals. The trend component is predicted using the Autoregressive Integrated Moving Average (ARIMA) model [22], while the seasonal and residual components are predicted using the eXtreme Gradient Boosting (XGBoost) model [23]. Compared to conventional black-box ML prediction models, the model facilitates a deeper understanding of the intrinsic pattern dynamics in the pig industry and offers enhanced interpretability. In the experiments, the hybrid model demonstrates superior prediction accuracy, with more than a 60% reduction in MSE against existing time series forecasting models. Based on the prediction results and our findings, scientific decisions can be made by the government and industry stakeholders.
The main contributions of this paper are summarized as follows:
  • The Granger causality test is employed to investigate, for the first time, the causal relationship between the number of piglets and hog supply while simultaneously determining the uncertain time lags between them. The analysis indicates that piglets are a crucial factor in enhancing the accuracy of hog supply forecasts.
  • A hybrid prediction module is proposed, combining machine learning technology with the intrinsic development pattern of hog supply characterized by STL decomposition. Experimental results demonstrate that the proposed model achieves higher accuracy compared to traditional statistical and machine learning forecasting methods.
  • Based on the study findings, it is suggested that the government incorporates the number of piglets as a supplementary indicator and dynamically adjusts the monitoring baselines of breeding sow and piglet. Furthermore, the government can adopt more forward-looking policies based on future hog supply prediction.
The remainder of the paper is structured as follows: Section 2 provides a comprehensive review of the existing literature on hog supply. Section 3 outlines the data resource and data preprocessing steps. Section 4 presents the proposed hog supply forecasting model. Section 5 analyzes the experimental results. Section 6 offers policy suggestions based on the findings. Section 7 concludes the paper and provides future research directions.

2. Literature Review

2.1. Factors Related to Hog Supply

Researchers have conducted studies to examine the impact of different factors on the dynamics of hog supply [3,4,24]. These factors can be categorized into external social and economic factors, and internal pig-growing-process-related factors.
The impact of external social and economic factors on hog supply is significant and diverse, including the income level of residents, total factor productivity growth, policies, events, and seasonal factors. As the economy has progressed, the rising income levels of residents have led to a significant surge in pork demand, consequently driving an expansion in hog supply [25,26]. Total factor productivity growth and changing the pattern of production have boosted pig feeding technology development, enhancing production efficiency and increasing hog supply [2,3]. Policies have a substantial impact on hog supply through the formulation of measures concerning subsidies, environmental protection, and epidemic prevention and control [4,27]. Epidemics such as African Swine Fever (ASF) can significantly influence the hog supply, resulting in a substantial impact on market availability due to the potential mass mortality of swine during outbreaks [1,28,29,30]. These shock scenarios lead hog cycles to be irregular with varying phase and amplitude [24]. Seasonal factors like holidays directly influence the pork demand, and consequently, impact the seasonal variations in hog supply [6,31].
Internal factors associated with the pig growing process are closely related to hog supply, such as the number of breeding sows, the length of the growth cycle, the reproductive rate, and the pork price [12,32]. Hog breeding, preparation of pork, market conditions, and pricing are also used to forecast pork supply [33]. The cobweb theory [34] revealed that hog supply affects the pork price, with the pork price in turn affecting future hog supply decisions. The government has utilized the breeding sow as a crucial regulatory indicator, and scholars have relied on this indicator for forecasting future hog supply [6]. Moreover, the length of the growth cycle directly impacts the time at which hogs reach the market, and the reproductive rate affects hog inventory and the number of breeding pigs [35].
The piglet, which has a shorter lag period and is less influenced by external disturbances during the growth process, has not received sufficient attention in existing research. This study examines the correlation between piglets and hog supply, comparing it with the commonly used feature of breeding sows.

2.2. Hog Supply Forecast Models

Research on hog supply has been a subject of interest for several decades, with studies dating back to the 1970s [36]. These studies primarily utilized econometric models to establish the relationship between pork price and hog supply [36,37]. Brandt et al. [38] and Kaylen [10] adopted VAR (Vector Autoregression) for forecasting hog supply, offering a more sophisticated alternative to univariate time series methods. To enhance the prediction of pig supply, additional features were introduced into forecasting models. The pig price lag feature was incorporated to estimate the hog supply using a geometric distributed model [39] and random coefficient regression [40]. Ref. [41] utilized the generalized autoregressive conditional model to forecast the hog supply. Liang et al. [11] employed a linear regression model with many factors, such as hog price, the level of hog inventory, emergencies, and government policy. Additionally, Zhang et al. [12] constructed a recursive model to predict pig population, using features such as newly kept piglets and breeding sows.
Recently, there has been a gradual increase in the usage of ML methods within the hog industry, particularly focusing on pig price prediction and the pig cycle. ML-based hog price prediction models include techniques such as support vector regression (SVR) [42], XGBoost [43], and Long Short-Term Memory (LSTM) [44]. Despite the significant advancements in ML methods, there remains a dearth of research on utilizing these techniques for accurately forecasting hog supply. Moreover, pure ML models are considered to be black boxes. The decomposition-ensemble forecasting method disaggregates time series data into multiple scales, leveraging ensemble empirical mode decomposition (EEMD) [45], the Hodrick–Prescott (HP) filter method [17], and STL [7]. This approach facilitates a more profound comprehension of intrinsic patterns in the hog industry and enhances model interpretability. Some ensemble algorithms, such as the SFE-NET method [33] and (WSO)-CART [46], have also been applied to pig production capacity forecasting with notable success. By integrating the strengths of various algorithms, these methods achieve more accurate predictions of pig production capabilities.
In this study, the hybrid model considers multiple features and uses a decomposition-ensemble forecasting framework, providing the model with deeper interpretability than traditional ML models.

3. Materials: Data Resource

3.1. Data Description

The datasets used in this research are primarily from two sources. The first dataset consists of 53 monthly records from July 2019 to November 2023 obtained from more than 5000 large-scale farms in Chongqing, which is the largest directly administered municipality in China. The hog supply trends in Chongqing are representative of national hog supply patterns, so forecasting its hog supply provides insights into overarching trends in China’s hog production. Monthly frequency data analysis facilitates short-term forecasts and enables timely government decision making. This dataset documents the growth process of pigs in Chongqing and includes comprehensive and detailed data on the number of market hogs, the numbers of breeding sows and piglets, the number of large farms, all hogs, and hog deaths. Market hogs, which represent the supply of hogs, are considered the dependent variable in this study. The second dataset is collected from government departments such as MARA and NBS. The official data released by government departments hold significant authority and reliability, serving as valuable information resources for analyzing the pig market. This dataset contains factors such as pork prices, corn prices, bean prices, and chicken prices, which are important factors affecting the supply of hogs.
A comprehensive statistical analysis of the collected variable data is conducted. The statistical features provide rich information about the distribution, volatility, and trends of data, offering insights into its underlying patterns and dynamics. This study assesses each variable in terms of the mean and Standard Deviation (Std). The mean, often referred to as the average, represents the central value of a dataset. It is calculated by summing all the data points and dividing by the number of observations. The formula for the mean of a dataset is given by:
$$\mathrm{Mean} = \frac{1}{N} \sum_{i=1}^{N} x_i$$
where $N$ is the number of observations and $x_i$ represents each individual data point. The Std measures the dispersion or spread of the data points around the mean. The formula for the standard deviation is:
$$\mathrm{Std} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} \left( x_i - \mathrm{Mean} \right)^2}$$
A list of data descriptions is shown in Table 1.
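As a small illustration of the two formulas above, the following sketch computes the mean and the population standard deviation (dividing by N, as in the formula). The function names and the sample values are hypothetical and are not taken from Table 1.

```python
import math

def mean(values):
    """Arithmetic mean: sum of the observations divided by their count."""
    return sum(values) / len(values)

def population_std(values):
    """Standard deviation with the 1/N normalisation used in the text."""
    m = mean(values)
    return math.sqrt(sum((x - m) ** 2 for x in values) / len(values))

# Hypothetical monthly market-hog counts (illustrative numbers only).
market_hogs = [120.0, 135.0, 128.0, 142.0, 150.0]
hog_mean = mean(market_hogs)
hog_std = population_std(market_hogs)
```

Note that many libraries default to the sample standard deviation (dividing by N − 1), which would give a slightly larger value than the formula stated here.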

3.2. Data Preprocessing

The raw data necessitate meticulous processing to ensure the integrity and reliability of subsequent analyses due to the presence of noise, inconsistencies, missing values, and outliers. The data cleaning process involves handling missing information, correcting outliers, and sample frequency alignment. This rigorous data preprocessing not only enhances the quality of the data but also significantly contributes to the robustness of the predictive models developed in this study.
Since some data points may be lost due to equipment failure and data storage issues, a nonlinear polynomial interpolation technique [47] is applied to deal with the missing values. Polynomial interpolation uses the values of a function at a number of known points in an interval to make an appropriate approximation of a particular function, where the function is assumed to be in polynomial form.
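A minimal sketch of polynomial interpolation for a missing monthly value is given below. It assumes a Lagrange polynomial through the nearest known observations; the number of neighbours (`n_neighbors`), the function names, and the toy series are assumptions made for illustration, since the text does not specify the polynomial order or implementation.

```python
def lagrange_interpolate(xs, ys, x):
    """Evaluate at x the unique polynomial passing through (xs[i], ys[i])."""
    total = 0.0
    for i, (xi, yi) in enumerate(zip(xs, ys)):
        weight = 1.0
        for j, xj in enumerate(xs):
            if j != i:
                weight *= (x - xj) / (xi - xj)
        total += yi * weight
    return total

def fill_missing(series, n_neighbors=4):
    """Fill None entries with a polynomial through the nearest known points."""
    known = [(i, v) for i, v in enumerate(series) if v is not None]
    filled = list(series)
    for i, v in enumerate(series):
        if v is None:
            nearest = sorted(known, key=lambda p: abs(p[0] - i))[:n_neighbors]
            xs = [p[0] for p in nearest]
            ys = [p[1] for p in nearest]
            filled[i] = lagrange_interpolate(xs, ys, i)
    return filled

# Hypothetical series following y = t^2 with one missing month (index 3).
series = [0.0, 1.0, 4.0, None, 16.0, 25.0]
filled = fill_missing(series)
```

Because the toy series lies exactly on a quadratic, the interpolated value at index 3 recovers 9; real farm records would only be approximated.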
To identify outliers in the raw data, the z-score, a statistical technique, is used. The z-score is the ratio of each data point’s deviation from the mean to the overall standard deviation. Data points with a z-score greater than 2 are considered possible outliers. A rolling average with a sliding window (e.g., 3 months) is employed to calculate the mean and replace each such data point with the mean value within that window. This method removes abnormal short-term fluctuations while preserving the long-term trend of the data.
The sampling frequency of the pig industry data is uneven because the datasets come from different sources. The dataset from large-scale farms consists of monthly records, while data from open sources range from monthly to daily. Specifically, daily data, such as price data, are down-sampled by calculating monthly averages to match the monthly sampling frequency of the hog supply data.
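The outlier step can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the z-score is taken in absolute value, the window is centred on the flagged point, and the flagged point itself is excluded from the window mean so that it does not contaminate its own replacement. The series is hypothetical.

```python
import math

def zscores(values):
    """Deviation from the mean divided by the population standard deviation."""
    m = sum(values) / len(values)
    sd = math.sqrt(sum((x - m) ** 2 for x in values) / len(values))
    if sd == 0:
        return [0.0] * len(values)
    return [(x - m) / sd for x in values]

def smooth_outliers(values, threshold=2.0, window=3):
    """Replace points with |z| > threshold by the mean of their window neighbours."""
    z = zscores(values)
    half = window // 2
    cleaned = list(values)
    for i, score in enumerate(z):
        if abs(score) > threshold:
            lo, hi = max(0, i - half), min(len(values), i + half + 1)
            neighbours = [values[j] for j in range(lo, hi) if j != i]
            cleaned[i] = sum(neighbours) / len(neighbours)
    return cleaned

# Hypothetical monthly series with one abnormal spike at index 4.
series = [100.0, 102.0, 98.0, 101.0, 300.0, 99.0, 103.0, 100.0, 97.0, 102.0]
cleaned = smooth_outliers(series)
```

Only the spike is flagged here; the remaining points stay untouched, which is what preserves the long-term trend.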
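Frequency alignment by monthly averaging can be sketched as below. The function name and the daily price values are hypothetical; the sketch simply groups daily observations by calendar month and averages each group.

```python
from collections import defaultdict
from datetime import date

def monthly_average(daily):
    """Map a list of (date, value) pairs to {(year, month): mean value}."""
    buckets = defaultdict(list)
    for day, value in daily:
        buckets[(day.year, day.month)].append(value)
    return {key: sum(vals) / len(vals) for key, vals in sorted(buckets.items())}

# Hypothetical daily pork prices spanning two months.
daily_prices = [
    (date(2023, 1, 30), 20.0),
    (date(2023, 1, 31), 22.0),
    (date(2023, 2, 1), 24.0),
    (date(2023, 2, 2), 25.0),
    (date(2023, 2, 3), 26.0),
]
monthly = monthly_average(daily_prices)
```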

3.3. Feature Selection

Feature selection is the process of identifying and selecting the most relevant features from a dataset for use in model construction. By focusing on the most pertinent features, one can improve model performance, reduce computational costs, and gain clearer insights from the data. The Pearson correlation coefficient (r) is initially employed to identify the top k key features influencing the forecast. Multicollinearity among these features is subsequently assessed using the Variance Inflation Factor (VIF). Additionally, industry expertise is integrated to further refine the feature set. Pearson correlation coefficient is a measure of the linear relationship between two continuous variables. Pearson’s r ranges from −1 to +1. An r value of +1 indicates a perfect positive linear relationship, −1 indicates a perfect negative linear relationship, and 0 indicates no linear relationship between the variables. The formula for Pearson’s correlation coefficient is:
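$$r = \frac{\sum_{i=1}^{N} (x_i - \bar{x})(y_i - \bar{y})}{\sqrt{\sum_{i=1}^{N} (x_i - \bar{x})^2} \, \sqrt{\sum_{i=1}^{N} (y_i - \bar{y})^2}}$$
where $x_i$ and $y_i$ are individual data points for variables $X$ and $Y$, and $\bar{x}$ and $\bar{y}$ are the means of the variables.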
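For illustration, Pearson's r can be computed directly from its definition. The function name and the toy inputs are assumptions for this sketch.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: centred cross-product over the product of norms."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - x_bar) ** 2 for x in xs))
    sy = math.sqrt(sum((y - y_bar) ** 2 for y in ys))
    return cov / (sx * sy)

r_perfect = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])       # exact linear relation
r_partial = pearson_r([1, 2, 3, 4, 5], [2, 1, 4, 3, 5])  # noisy positive relation
```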
VIF is a statistical measure used to quantify the degree of multicollinearity in regression models. High multicollinearity can inflate the variance of coefficient estimates and make the model less reliable. The threshold value for VIF needs to be determined, commonly set at 10. Features with VIF values exceeding this threshold indicate high multicollinearity and may need to be addressed. The VIF for a feature x i is given by:
$$\mathrm{VIF}(x_i) = \frac{1}{1 - R_i^2}$$
where $R_i^2$ is the coefficient of determination from the regression of $x_i$ on all other features.
The Pearson correlation between each variable and market hogs is first calculated. Historical data of breeding sows and piglets are utilized with 10-month [35] and 5-month time lags, respectively, in accordance with industry practices. The top eight features are selected according to their r values. Then, the VIF values of these eight features are calculated, as shown in Table 2. According to industry expertise, breeding sows and piglets are the two dominant features indicating hog supply. Hence, in this study, these two features are retained until all other features have been processed. Features with high VIF values are removed to reduce redundancy. After each removal, VIF values are recalculated for the remaining features to ensure that multicollinearity has been sufficiently mitigated. Ultimately, two distinct groups of features are delineated, one containing breeding sows and the other containing piglets. Table 3 shows the VIF values of these two groups. All VIF values are below the threshold of 10, which suggests the absence of severe multicollinearity within each group. Finally, the two groups of features are selected for further analysis.
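The VIF-based pruning described above might be sketched as follows, assuming NumPy is available. The regression includes an intercept, the feature with the largest VIF above the threshold is dropped at each step, and features listed in `protected` are never dropped. The feature names and synthetic data are hypothetical, and the exact removal order used in the study is not stated in the text.

```python
import numpy as np

def vif(X, i):
    """VIF of column i: regress it on the other columns plus an intercept."""
    y = X[:, i]
    others = np.delete(X, i, axis=1)
    A = np.column_stack([np.ones(len(y)), others])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    r2 = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)
    return 1.0 / (1.0 - r2)

def prune_by_vif(X, names, protected=(), threshold=10.0):
    """Drop the unprotected feature with the largest VIF until all are below threshold."""
    names = list(names)
    while X.shape[1] > 1:
        vifs = [vif(X, i) for i in range(X.shape[1])]
        candidates = [(v, i) for i, v in enumerate(vifs)
                      if names[i] not in protected and v > threshold]
        if not candidates:
            break
        _, worst = max(candidates)
        X = np.delete(X, worst, axis=1)  # VIFs are recomputed on the next pass
        del names[worst]
    return X, names

# Synthetic example: "all_hogs" is almost a copy of "piglet", so both have huge VIFs.
rng = np.random.default_rng(0)
piglet = rng.normal(size=200)
pork_price = rng.normal(size=200)
all_hogs = piglet + 0.01 * rng.normal(size=200)
X = np.column_stack([piglet, pork_price, all_hogs])
X_kept, kept = prune_by_vif(X, ["piglet", "pork_price", "all_hogs"],
                            protected=("piglet",))
```

Protecting the piglet column forces the redundant companion feature out instead, mirroring the idea of keeping the two dominant features until the others have been processed.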

4. Methodology

4.1. Structure Outline

As illustrated in Figure 3, this study proposes an integrated hog supply forecasting framework, which includes four stages: data preprocessing, feature selection, feature engineering, and the hybrid prediction module. The first stage is data preprocessing, which includes missing value completion, outlier handling, and sample frequency alignment. The second stage is feature selection, which aims to identify the most relevant features from a dataset to improve the performance of the model. The third stage is feature engineering, which consists of feature analysis and feature extraction. Feature analysis focuses on comparing the effectiveness of features in prediction. The Granger causality test is leveraged to investigate the causal relationships of these features and simultaneously determine the most likely time lags. Feature extraction focuses on extracting features from existing data, such as statistical features. The fourth stage involves the hybrid hog supply prediction module, which combines STL with ML techniques. A multiplicative STL model is first used to decompose the time series of the dependent variable into three components: trend, seasonality, and residuals. Then, an ARIMA-based prediction model captures the trend component, and XGBoost-based prediction models provide predictions for the seasonal and residual components. The final prediction is the product of the predictions of these individual components. To test the capability of the proposed model, comparative analysis and model evaluation are conducted by comparing the proposed model with existing time series prediction models. The mathematical notations used are summarized in Table 4.
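Purely as a restatement of the fourth stage in symbols (the notation $Y_t$, $T_t$, $S_t$, $R_t$, the forecast horizon $h$, and the hats are assumed here for illustration and may differ from Table 4), the multiplicative decomposition and the recombined forecast can be written as:

$$Y_t = T_t \times S_t \times R_t, \qquad \hat{Y}_{t+h} = \hat{T}^{\mathrm{ARIMA}}_{t+h} \times \hat{S}^{\mathrm{XGB}}_{t+h} \times \hat{R}^{\mathrm{XGB}}_{t+h}$$

For strictly positive series, taking logarithms turns the product into a sum, $\log Y_t = \log T_t + \log S_t + \log R_t$, which is one common way to obtain a multiplicative decomposition from an additive STL routine. Whether the study proceeds this way is not stated in this section.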

4.2. Feature Engineering

Feature engineering involves feature analysis and feature extraction. Feature analysis focuses on comparing the effectiveness of features in prediction. Feature extraction focuses on extracting features related to temporal aspects, statistical properties, and other relevant characteristics. For features with uncertain time lags, a Granger causality test is employed to ascertain the time-lagged relationship.

4.2.1. Feature Analysis: Piglets and Breeding Sows

In the pig industry, maintaining equilibrium between supply and demand constitutes a complex, multifaceted challenge [35]. When supply exceeds demand, farmers may reduce the number of pregnant breeding sows. In contrast, an excess of demand over supply prompts farmers to augment the population of breeding sows to boost production. However, such decisions to expand or reduce the number of breeding sows do not immediately influence the hog supply due to the lengthy biological growth cycle of pigs. In China, the imperative to maintain sow farrowing rates and optimize the biological performance of offspring requires the selection of specific pig herds for breeding purposes. As a result, China’s pig production capacity lacks immediate responsiveness to market fluctuations. Figure 4 depicts the biological development process of pigs from sow birth to market hog, incorporating stages such as pregnancy, delivery, and fattening, based on industry knowledge and research. Typically, it requires approximately 18–20 months to generate a new supply of market hogs from the birth of a sow. Consequently, even if farmers implement measures immediately, a tangible increase in pig supply will not be observed for at least 18 months.
Traditionally, the number of breeding sows is considered the weather vane of the pig market, as it directly correlates with the forthcoming availability of hogs [6]. However, there is a significant time delay of approximately 10 months between breeding sows and the market hogs. In this lengthy process, capacity can be influenced by factors including the farrowing rate and the survival rates. Additionally, human interference, such as intentionally delaying the gestation of sows, can impact the final hog supply.
The number of piglets, however, seems to be a more accurate barometer of hog supply. As early as 1995, ref. [48] inferred the hog price from the piglet price, demonstrating the significant relationship between piglets and hogs. In reality, piglets exhibit a shorter time lag of approximately 5 months in responding to market demand compared to breeding sows. Moreover, piglets are less influenced by human decisions and external factors, including changes in policy and market price fluctuations. This diminished exposure to uncertainties contributes to a more stable and predictable process from piglet growth to market hogs. These reasons make piglets a potentially promising feature for predicting pig supply, which motivates us to assess its feasibility.
To elucidate the impact of piglets and breeding sows on the hog supply volume, this study compares historical data trends to intuitively demonstrate the trends and correlations. As mentioned above, the breeding sow and piglet variables are lagged by 10 months [35] and 5 months, respectively, following industry practices. Additionally, breeding sow data are multiplied by the MSY (Market pigs per Sow per Year) to estimate hog supply, ensuring a consistent scale for data comparison. A graphical representation of the trends and correlations is depicted in Figure 5, including the estimated hog supply from breeding sows, the number of piglets, and the hog supply. The Pearson correlation coefficient is computed to measure the degree of association between variables. As illustrated in Figure 6, the correlation coefficient between breeding sows and hog supply is 0.73. The correlation matrix reveals a higher correlation of 0.83 between piglets and hog supply, indicating a strong positive relationship.
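The lagged comparison can be illustrated with a short sketch that pairs the feature at month t − lag with the hog supply at month t and computes Pearson's r for each candidate lag. All names and the synthetic series are hypothetical; the toy hog series is constructed to follow the piglet series with a 5-month delay, so the scan is expected to recover that lag.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - x_bar) ** 2 for x in xs))
    sy = math.sqrt(sum((y - y_bar) ** 2 for y in ys))
    return cov / (sx * sy)

def lagged_correlation(feature, target, lag):
    """Correlate feature[t - lag] with target[t] over the overlapping months."""
    xs = feature[:len(feature) - lag]
    ys = target[lag:]
    return pearson_r(xs, ys)

# Synthetic data: hogs at month t equal 90% of piglets at month t - 5.
piglets = [50 + (t * 37) % 23 for t in range(30)]
hogs = [45.0] * 5 + [0.9 * p for p in piglets[:25]]

correlations = {lag: lagged_correlation(piglets, hogs, lag) for lag in range(3, 8)}
best_lag = max(correlations, key=correlations.get)
```

A correlation scan of this kind is only descriptive; it does not by itself establish the statistical significance of a lag.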
Hence, in this study, the number of piglets is recognized as a key feature and is incorporated to enhance the hog supply prediction accuracy.

4.2.2. Time Lag Features

There are many features with time lags, such as piglets and breeding sows. Because the growth cycle of live pigs is susceptible to human and external influences, there is uncertainty in the time lag. Theoretical investigations into the precise statistical time lag periods are currently lacking. To address this research gap, this study employs the Granger causality test to identify the causal relationship and determine the uncertain time lag periods between variables. The procedural steps of the Granger causality test are as follows:
  • Unit Root Test. The Granger causality test begins by checking the time series for stationarity via the ADF (Augmented Dickey–Fuller) test. If the ADF test indicates nonstationarity (i.e., the test statistic is not more negative than the critical value and the p-value is above the significance threshold), the data must be differenced until stationarity is achieved.
  • Cointegration Test. If both time series are integrated of the same order, a cointegration test is needed to check for a long-term equilibrium relationship between them. Cointegration implies a stable long-run relationship but does not confirm causality. After ADF tests are performed on the two variables, Ordinary Least Squares (OLS) is used to formulate the relationship between them. If the residuals, denoted as resid, are found to be stationary by the ADF test, there is a cointegration relationship between the variables.
  • Causality Test. Upon establishing cointegration between the two variables, a Granger causality test based on a VAR model is applied. Granger causality is present if past values of one variable significantly predict current values of the other at a specific lag order.
  • Time Lag Period Determination. The range of candidate time lag periods is first determined using industry knowledge. For each candidate time lag, step 3 (the causality test) is executed, followed by an F-test. The time lag periods with a confidence level of 95% or more are accepted. Note that, unlike cases with a fixed time lag, there may be multiple possible values for the uncertain time lag. The causal relationship at each of these values exhibits a confidence level of at least 95%.
Thus, the Granger causality test can be employed to ascertain causal relationships and identify the time lag periods with significant confidence. In this study, time lag features are investigated, specifically the number of piglets and the number of breeding sows, and their impact on the supply of hogs.

4.2.3. Temporal Features

In time series forecasting, temporal features primarily reflect the characteristics of time series data as they evolve over time. These features are crucial in time series analysis for capturing periodicity, trends, and other significant patterns. Fundamental time units such as years, quarters, months, weeks, days, hours, and minutes reveal variations in the data across different time scales. Additionally, special temporal points such as holidays, weekends, the beginning or end of months, and the start or end of years can significantly impact time series data, necessitating particular attention.
To capture these seasonal patterns, temporal features are extracted from the dataset at regular intervals, comprising year, quarter, and month tag features. In addition, demand surges around cultural celebrations such as the Spring Festival, driven in part by the making of cured meat, so a Spring Festival tag feature is incorporated to carry holiday information. Correlating specific times with changes in hog supply enables the model to account for predictable fluctuations arising from cyclical farming processes.
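As a rough illustration of these tags for monthly data, the sketch below derives year, quarter, month, and a Spring Festival flag from a month's first day. The flag definition (1 if the festival falls in that calendar month) and the function name `temporal_features` are assumptions made for this example; the study does not specify how its tag is encoded.

```python
from datetime import date

# Gregorian dates of the Spring Festival (Chinese New Year's Day) within the study period.
SPRING_FESTIVAL = {
    2020: date(2020, 1, 25),
    2021: date(2021, 2, 12),
    2022: date(2022, 2, 1),
    2023: date(2023, 1, 22),
}


def temporal_features(month_start: date) -> dict:
    """Build tag features for one monthly observation (assumed encoding)."""
    festival = SPRING_FESTIVAL.get(month_start.year)
    in_festival_month = festival is not None and festival.month == month_start.month
    return {
        "year": month_start.year,
        "quarter": (month_start.month - 1) // 3 + 1,
        "month": month_start.month,
        "spring_festival": int(in_festival_month),
    }


example = temporal_features(date(2021, 2, 1))
```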

4.2.4. Statistical Features

The statistical features embedded in time-series data provide rich information about the data's distribution, volatility, and trends, offering insights into its underlying patterns and dynamics. These features are akin to interpreting signals within the data, providing a deeper understanding of its behavior over time. In this study, statistical features are extracted from hog supply data. The statistical features at time step $t$ are computed using the data within a sliding window of size $w$ prior to $t$. In addition to commonly used statistical features such as the mean, standard deviation, maximum, minimum, and slope, the skewness and kurtosis of the data within the window are also considered.
  • Skewness: Skewness, denoted as $X_t^{skewness}$, measures the asymmetry of the data distribution. It is calculated as:
    $$X_t^{skewness} = \frac{\frac{1}{w}\sum_{i=t-w}^{t-1}\left(Y_i-\bar{Y}\right)^3}{\left(\frac{1}{w-1}\sum_{i=t-w}^{t-1}\left(Y_i-\bar{Y}\right)^2\right)^{3/2}}$$
    where $Y_i$ is the $i$-th hog supply observation and $\bar{Y}$ is the mean of the $w$ observations in the window.
  • Kurtosis: Kurtosis, denoted as $X_t^{kurtosis}$, measures the sharpness of the peak of the data distribution. The formula is:
    $$X_t^{kurtosis} = \frac{\frac{1}{w}\sum_{i=t-w}^{t-1}\left(Y_i-\bar{Y}\right)^4}{\left(\frac{1}{w}\sum_{i=t-w}^{t-1}\left(Y_i-\bar{Y}\right)^2\right)^{2}} - 3$$
These statistical features provide important information about the data and help us to understand and analyze the nature and trends of the data.
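The two window statistics can be computed directly from their definitions. The following is a small sketch under the assumption that the window for time step $t$ holds the $w$ observations immediately before $t$; the function names are illustrative, and a constant window (zero variance) would need separate handling.

```python
from statistics import fmean


def window_skewness(window):
    """Third central moment over the 1.5 power of the (w - 1)-normalized variance."""
    w = len(window)
    mean = fmean(window)
    third_moment = sum((y - mean) ** 3 for y in window) / w
    variance = sum((y - mean) ** 2 for y in window) / (w - 1)
    return third_moment / variance ** 1.5


def window_kurtosis(window):
    """Excess kurtosis: fourth central moment over squared second moment, minus 3."""
    w = len(window)
    mean = fmean(window)
    fourth_moment = sum((y - mean) ** 4 for y in window) / w
    second_moment = sum((y - mean) ** 2 for y in window) / w
    return fourth_moment / second_moment ** 2 - 3


def rolling_shape_features(series, w):
    """Compute both features for every t that has w prior observations."""
    rows = []
    for t in range(w, len(series)):
        window = series[t - w:t]  # observations strictly before t
        rows.append({
            "t": t,
            "skewness": window_skewness(window),
            "kurtosis": window_kurtosis(window),
        })
    return rows


features = rolling_shape_features([1.0, 2.0, 3.0, 4.0, 5.0, 6.0], w=5)
```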

4.2.5. Other Relevant Features

Many other features are relevant to hog supply. This study considers further internal factors, such as the total hog inventory, the number of hog deaths, and the pork price, as well as external factors such as the corn price and the chicken price. Analyzing the total number of pigs in stock and the number of dead pigs gives a more complete picture of the production status and health of the pig market. The economic returns of pig farmers are directly affected by fluctuations in pig prices, which subsequently influence their production decisions. The price of corn, the primary feed source for hog farming, directly affects the cost of feed. Additionally, the price of chicken plays a significant role: pork and chicken are often substitutes, so lower chicken prices may lead consumers to prefer chicken as their meat source, thereby reducing the demand for pork.

4.3. Hybrid Prediction Module

A hybrid hog supply prediction module is proposed that combines STL and ML techniques. The STL model is employed to decompose the time series of the dependent variable into three components: trend, seasonality, and residuals. The trend component is then predicted using the ARIMA model, while the seasonal and residual components are predicted using the XGBoost algorithm. The proposed forecast model is elaborated in detail below.

4.3.1. STL Time Series Decomposition

When dealing with time-series data, STL offers an efficient approach to decompose the data into its underlying components. By dissecting the data, a better understanding of its intrinsic structure and nature can be gained. Since the factors that affect hog supply can be categorized into trend, seasonal, and irregular factors, this model introduces the STL decomposition method to decompose the time-series data $Y_t$ into three components: the trend term $T_t$, the seasonal term $S_t$, and the residual term $R_t$. This study hypothesizes that seasonal and trend effects evolve over time, with their magnitudes varying in conjunction with changes in other influencing factors. Consequently, a multiplicative model depicts these dependencies more accurately and thereby yields more precise outcomes.
STL mainly consists of inner and outer loops. Suppose the inner loop has $n_{(i)}$ iterations and the outer loop has $n_{(o)}$ iterations. After the $k$-th iteration of the inner loop, the three decomposed terms are denoted as $T_t^{(k)}$, $S_t^{(k)}$, and $R_t^{(k)}$, respectively. The multiplicative model is expressed as $Y_t = T_t \times S_t \times R_t$, which is equivalent to $\log Y_t = \log T_t + \log S_t + \log R_t$. The number of observations in one cycle is $n_{(c)}$. At iteration $k+1$, the steps of the inner loop are as follows:
  • Detrending. The time series data $Y_t$ is detrended as $\log Y_t - \log T_t^{(k)}$, with the initial value $\log T_t^{(0)}$ set to 0.
  • Smoothing of cycle-subseries. LOESS (Locally Weighted Scatterplot Smoothing) processes the cycle-subseries with a smoothing parameter $n_{(s)}$. The resulting smoothed sequence is $C_t^{(k+1)}$.
  • Smoothed cycle-subseries low-pass filtering. The low-pass component $L_t^{(k+1)}$ is retrieved from $C_t^{(k+1)}$ by a low-pass filter, which consists of a moving average of length $n_{(c)}$, a moving average of length 3, and a LOESS regression with parameter $n_{(l)}$.
  • Smoothed cycle-subseries detrending. Removing the low-pass component from the smoothed cycle-subseries yields the seasonal term $\log S_t^{(k+1)} = C_t^{(k+1)} - L_t^{(k+1)}$.
  • Deseasonalizing. The seasonal term is then removed from the time series data $Y_t$ to obtain the deseasonalized term $\log Y_t - \log S_t^{(k+1)}$.
  • Trend smoothing. The deseasonalized term is then smoothed via LOESS regression with parameter $n_{(t)}$ to give $\log T_t^{(k+1)}$. The output is provided if it converges; otherwise, the procedure returns to Step 1.
After an inner loop is performed, the residual term can be obtained:
$$\log R_t^{(k+1)} = \log Y_t - \log T_t^{(k+1)} - \log S_t^{(k+1)}$$
The outer loop part is undertaken thereafter and primarily utilized for determining the robust weights. Once the iteration results have stabilized, the time series data for hog supply Y can be decomposed into three components, i.e., T, S, and R, to facilitate deeper understanding of the intrinsic system dynamics.

4.3.2. ARIMA-Based Trend Prediction

The ARIMA model comprises three components: autoregression (AR), differencing (I), and moving average (MA). The AR component captures the influence of previous values on the current value. The differencing component ensures stationarity of the series by applying differencing operations. The MA component accounts for the impact of past errors on the current value. The model is formally denoted as $ARIMA(p, d, q)$, where $p$ is the number of autoregressive terms, $d$ is the degree of differencing required to render the series stationary, and $q$ is the number of moving average terms. ARIMA utilizes autocorrelation and stochasticity to extract patterns and trends from historical data for future prediction. It is, therefore, well suited to modeling the trend component of the STL decomposition. In this study, the historical hog supply data from the previous $L$ time steps are fed into the ARIMA model, and the one-step-ahead trend $T_t$ is predicted.
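For reference, the standard ARIMA formulation applied to the trend component can be written as below. The symbols $\phi_i$, $\theta_j$, $\varepsilon_t$, and the backshift operator $B$ are notation introduced here for illustration, and the constant term is omitted as a simplifying assumption; the special case uses the orders $p = d = q = 1$ reported in Section 5.1.1.

```latex
% General ARIMA(p, d, q) in backshift notation, where B T_t = T_{t-1}
\left(1 - \sum_{i=1}^{p} \phi_i B^i\right) (1 - B)^d \, T_t
  = \left(1 + \sum_{j=1}^{q} \theta_j B^j\right) \varepsilon_t

% Special case p = d = q = 1
T_t - T_{t-1} = \phi_1 \left(T_{t-1} - T_{t-2}\right)
  + \varepsilon_t + \theta_1 \varepsilon_{t-1}

% One-step-ahead forecast, with \hat{\varepsilon}_t the last estimated error
\hat{T}_{t+1} = T_t + \phi_1 \left(T_t - T_{t-1}\right) + \theta_1 \hat{\varepsilon}_t
```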

4.3.3. XGBoost-Based Seasonal and Residual Prediction

XGBoost is a gradient boosting tree model that builds trees sequentially and takes the sum of all trees as its output. It approximates the loss function with a second-order Taylor expansion, uses the second-order derivative information to optimize the loss, and greedily decides whether to split a node according to whether the split reduces the loss. XGBoost is well suited to forecasting seasonal components because it can discern recurring cycles. It is also robust to noise and anomalies, which makes the forecast of residual components more resilient. Additionally, the decision tree framework inherent in XGBoost offers a transparent view of the predictive process, improving the interpretability of the model.
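The second-order expansion and the greedy split criterion mentioned above take the following standard form in the XGBoost literature. The notation ($l$ for the loss, $f_m$ for the tree added at round $m$, $g_i$ and $h_i$ for the first and second derivatives, $J$ for the number of leaves, $\gamma$ and $\lambda$ for regularization weights) is introduced here for illustration and is not taken from this study.

```latex
% Objective at boosting round m and its second-order Taylor approximation
\mathcal{L}^{(m)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(m-1)} + f_m(x_i)\right) + \Omega(f_m)
\approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(m-1)}\right)
  + g_i f_m(x_i) + \tfrac{1}{2} h_i f_m(x_i)^2 \right] + \Omega(f_m)

% Gradient statistics and tree regularization (w_j is the weight of leaf j)
g_i = \frac{\partial\, l\left(y_i, \hat{y}_i^{(m-1)}\right)}{\partial\, \hat{y}_i^{(m-1)}}, \qquad
h_i = \frac{\partial^2 l\left(y_i, \hat{y}_i^{(m-1)}\right)}{\partial \left(\hat{y}_i^{(m-1)}\right)^2}, \qquad
\Omega(f) = \gamma J + \tfrac{1}{2} \lambda \sum_{j=1}^{J} w_j^2

% Gain of splitting a node into left (L) and right (R) children,
% where G and H are sums of g_i and h_i over the samples in each child
\mathrm{Gain} = \tfrac{1}{2} \left[ \frac{G_L^2}{H_L + \lambda} + \frac{G_R^2}{H_R + \lambda}
  - \frac{(G_L + G_R)^2}{H_L + H_R + \lambda} \right] - \gamma
```

A node is split only when the best candidate split has positive gain, which is the sense in which the split decision depends on whether the loss is reduced.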
The XGBoost model is introduced to forecast the seasonal and residual components obtained from the STL decomposition. In this study, the historical hog supply data and the features extracted from the previous $L$ time steps are fed into the XGBoost model, and the one-step-ahead seasonal component $S_t$ and residual component $R_t$ are predicted.

5. Results and Discussion

5.1. Experimental Setting

5.1.1. Setting

The entire dataset is rolled with a stride of 1 to generate input–output sample pairs. Each sample contains historical observations over $L = 7$ time steps and a future horizon of $h = 1$ time step. The dataset is split in the ratio 0.7:0.1:0.2. For ARIMA, the number of autoregressive terms is $p = 1$, the degree of differencing is $d = 1$, and the number of moving average terms is $q = 1$. For XGBoost, the number of estimators is set to 100, the maximum tree depth is 10, and the learning rate is 0.05.
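The rolling construction of samples can be illustrated with a short sketch. It assumes 53 monthly observations (July 2019 to November 2023 with no months dropped), a chronological split, and floor rounding of the split sizes; none of these details are stated in the study, and the function names are hypothetical.

```python
def make_samples(series, lookback=7, horizon=1, stride=1):
    """Slide a window over the series to build (history, target) pairs."""
    samples = []
    last_start = len(series) - lookback - horizon
    for start in range(0, last_start + 1, stride):
        history = series[start:start + lookback]
        target = series[start + lookback:start + lookback + horizon]
        samples.append((history, target))
    return samples


def chronological_split(samples, ratios=(0.7, 0.1, 0.2)):
    """Split samples in time order; the last part takes whatever remains."""
    n = len(samples)
    n_first = int(n * ratios[0])
    n_second = int(n * ratios[1])
    first = samples[:n_first]
    second = samples[n_first:n_first + n_second]
    third = samples[n_first + n_second:]
    return first, second, third


monthly_series = list(range(53))  # placeholder values standing in for hog supply
samples = make_samples(monthly_series, lookback=7, horizon=1, stride=1)
part_a, part_b, part_c = chronological_split(samples)
```

Under these assumptions the 53 observations yield 46 samples, divided into 32, 4, and 10 samples respectively.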
The hog supply series before and after data preprocessing is presented in Figure 7.

5.1.2. Benchmarks

To comprehensively assess the performance of the prediction model proposed in this paper for hog supply forecasting, the model is compared with a variety of classical forecasting algorithms. ARIMA, SARIMAX, Linear, RF, and SVR are chosen as benchmarks.
  • ARIMA [22]: The ARIMA model is a time series model that combines autoregression and moving averages.
  • SARIMAX [22] (Seasonal Autoregressive Integrated Moving Average Model with Exogenous variable): SARIMAX is an extension of ARIMA that can handle time series data with seasonal characteristics.
  • Linear: The linear regression establishes a linear relationship between the independent and dependent variables.
  • Random Forest [49] (RF): RF is a powerful ML method used for both classification and regression tasks. It builds on the concept of decision trees and combines multiple trees to improve predictive performance and robustness.
  • Support Vector Regression [50] (SVR): SVR is a machine learning algorithm that aims to find a function that predicts the target values (continuous outcomes) while ensuring that the prediction error is within a specified margin of tolerance.
Comparing prediction accuracy verifies the effectiveness and superiority of the proposed model.

5.1.3. Metrics for Evaluation Models

In regression forecasting, the most commonly used measure of model performance is the Mean Squared Error (MSE). Since the time-series data in this paper are non-normally distributed with extreme values, the more robust measures MAE and MSL are used instead. In addition, EVS and R-squared are used to measure the explanatory ability of the model.
  • MAE (Mean Absolute Error): The MAE is calculated as the average of the absolute differences between the predicted and true values.
  • MSL (Mean Squared Logarithm): MSL is a regression prediction error metric that is relatively robust to extreme values.
  • EVS (Explained Variance Score): EVS is an indicator that measures how well the model explains the variance of the dependent variable. The closer the EVS value is to 1, the better the model can explain the variation in the dependent variable’s variance.
  • $R^2$ (R-Squared): $R^2$ is similar to EVS and indicates the proportion of the variance of the dependent variable that can be explained by the model.
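The four metrics can be written out in a few lines. The sketch below uses common textbook definitions, which are assumptions here because the study does not give formulas; in particular, MSL is taken to be the mean squared difference of $\log(1 + y)$ between true and predicted values.

```python
import math


def _mean(values):
    return sum(values) / len(values)


def _variance(values):
    """Population variance (normalized by n)."""
    m = _mean(values)
    return sum((v - m) ** 2 for v in values) / len(values)


def mae(y_true, y_pred):
    return _mean([abs(t - p) for t, p in zip(y_true, y_pred)])


def msl(y_true, y_pred):
    # Assumed definition: squared error between log(1 + y) values.
    return _mean([(math.log1p(t) - math.log1p(p)) ** 2 for t, p in zip(y_true, y_pred)])


def evs(y_true, y_pred):
    errors = [t - p for t, p in zip(y_true, y_pred)]
    return 1 - _variance(errors) / _variance(y_true)


def r2(y_true, y_pred):
    m = _mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - m) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


# Toy values (not from the study): every prediction is too high by 10.
y_true_demo = [100.0, 200.0, 300.0]
y_pred_demo = [110.0, 210.0, 310.0]
```

With a constant bias as in the toy values, EVS stays at 1 because the errors have zero variance, while $R^2$ drops below 1; this is the main practical difference between the two scores.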

5.2. Statistical Results of Granger Causality Tests

Granger causality tests are conducted between each of the piglet and breeding sow variables and hog supply.
Unit root and cointegration tests for the piglet variable are shown in Table 5. The ADF values of both $X^{piglet}$ and $Y$ exceed the 1% critical value, and the p-values are greater than 0.05, indicating that both variables are nonstationary. After first-order differencing, the ADF values of $d(X^{piglet})$ and $d(Y)$ are less than the critical value and the p-values are much less than 0.05, implying that the differenced variables are stationary and meet the conditions of the cointegration test. The residual unit root test shows that the ADF value is −4.616, which is less than the 1% critical value of −3.568, and the p-value is 0.0001 < 0.05. This allows us to conclude that the two variables are cointegrated.
Then, the range of candidate time lag periods for the causality test is determined. Piglets younger than 4 months cannot yet be marketed as hogs, and piglets older than 10 months are typically considered too large to be slaughtered. Therefore, the candidate time lag periods to be tested for the piglet variable are set to 4 to 10 months. Breeding sows need time to become pregnant, so the candidate time lag periods are set to 6 to 15 months.
Statistical results are shown in Table 6, where p-values below 0.01 are highlighted in bold. The results reveal that in most of the cases tested, lags of 4 to 9 months show significant causality with p-values less than 0.05. Notably, the causal relationship between piglet quantity and pig supply is most significant when the lag periods are 4–6 months, with p-values less than 0.01.
Similarly to piglets, a Granger causality test is performed on breeding sows. The results are presented in Table 7, with the p-values below 0.01 in bold. The results indicate a significant causal relationship between breeding sows and hog supply, with time lag periods of 11–13 months.
Through the Granger causality test, this study not only establishes the existence of a causal relationship between piglets, breeding sows, and hog supply, but also determines the specific time lag periods to be 4–6 months for piglets and 11–13 months for breeding sows. In contrast to the widely held belief that breeding sows have a fixed time lag period of 10 months, the results show that a range of time lag values can exhibit a strong causal relationship. The primary reason is that pig farms in Chongqing tend to adopt longer breeding cycles, which may vary across farms. These results provide benchmarks for the time lag, backed by strong statistical evidence, for government and industry stakeholders.

5.3. Analysis of STL Decomposition

The STL decomposition of the hog supply time series yields the trend, seasonal, and residual components, as shown in Figure 8.
The trend in Figure 8 illustrates consistent growth in pig production capacity in China from January 2020 to January 2024, which aligns well with the actual situation. The capacity increase is due to the effective control measures implemented to combat the ASF outbreak in 2018 and the joint efforts by the government and the livestock industry to boost the hog supply starting from 2019. As the market gradually recovered and consumer confidence strengthened, the increase in pig demand further propelled the rapid growth of production capacity. These results are also reflected in the national pork production (Figure 1).
The seasonal component reflects the fluctuation pattern of annual pork consumption. The seasonal pattern depicted in the graph indicates that the hog supply reached its peak in January during the Chinese New Year celebrations and the preparation of cured pork products. Subsequently, the hog supply experienced a rapid decline, reaching its lowest point in March or April due to the availability of abundant preserved pork. It showed a minor peak in July as the stored pork was depleted, and then gradually declined again due to the decrease in pork consumption demand caused by high temperatures. From October onwards, there was a subsequent increase in supply until the period before the Spring Festival, as a new round of cured pork production began.
The residual component of the hog supply reflects the influence of various complex factors, including policy interventions, epidemics, and unforeseen events. The data for December 2020 contain outliers. The decline in production capacity halted in October 2020 and was succeeded by a recovery phase, which led to a surge in December 2020. This surge in production capacity can be attributed to the implementation of capacity recovery policies prior to March 2020, as indicated by data from the Chongqing Municipal Commission of Agriculture and Rural Affairs.
By employing STL time series decomposition, a better understanding of intrinsic operational dynamics is gained, thereby enhancing the predictive accuracy and practicality of the model.

5.4. Model Selection for the Season and Residual Components

Based on our comparative analysis of various models for predicting seasonal and residual components, the results indicate that XGBoost consistently outperforms other models in terms of MAE loss. Table 8 summarizes the MAE for seasonal and residual predictions across different models: Linear, SVR, RF, and XGBoost.
XGBoost significantly outperforms other models in predicting seasonal components. The MAE for the XGBoost model is substantially lower than that of the Linear (0.0737), SVR (0.0841), and RF (0.0676) models. This suggests that XGBoost provides a highly accurate representation of seasonal trends, effectively minimizing prediction errors. Similarly, in terms of residual prediction, XGBoost again demonstrates superior performance with an MAE of 0.01314. This is notably lower than the MAE values of the other models, where SVR (0.0226) shows the next best performance, followed by Linear (0.0664) and RF (0.6642). The substantial error in the RF model underscores its less effective handling of residuals compared to XGBoost.
The consistently low MAE values for both seasonal and residual predictions confirm that XGBoost is the most effective model among those evaluated. Its ability to handle complex patterns and provide accurate predictions makes it a superior choice for modeling both seasonal components and residual errors. This extensive comparative analysis supports the selection of XGBoost as the preferred model due to its robust performance and reliability in capturing intricate patterns within the data.

5.5. Comparison of Prediction Models

Experiments assess the performance of the proposed model against existing ARIMA, SARIMA, linear models, RF, and SVR. Table 9 presents four metrics regarding the out-of-sample performances of the models.
The proposed model outperforms the compared models with an MAE of 10,396.05. Compared to the ARIMA and SARIMA models, it reduces the MAE by approximately 67%. It also achieves a 61% reduction in MAE compared to the Linear model, and reductions of 73.7% and 71.2% compared to RF and SVR, respectively. In terms of MSL, the proposed model performs well with a value of 0.0014, a reduction of 91% compared to the Linear model, which has the best performance among the other methods. Furthermore, the proposed model has a significant advantage in terms of EVS and $R^2$, with an EVS of 0.9768 and an $R^2$ of 0.974. Both values are very close to 1, indicating that the model explains over 97% of the variation in the dependent variable and has strong explanatory power. Compared to the ARIMA model, the proposed model improves both EVS and $R^2$ by more than 20%. Similarly, the improvement in EVS and $R^2$ is 11% compared to the Linear model, and approximately 25% compared to SVR. RF performs particularly poorly on the EVS and $R^2$ metrics, with values of 0.2489 and 0.2388, respectively, indicating notably limited explanatory power. In conclusion, the comparative experiments validate the superior predictive capabilities of the proposed model, as it consistently outperforms the other five models across all four evaluation metrics.
Figure 9 displays the prediction results of the proposed model and the baseline models against real hog supply data. In Figure 9a, it is evident that the ARIMA model has a significant prediction delay, especially at turning points. Figure 9b demonstrates that while the SARIMA model incorporates seasonality, its forecasts still lag behind the actual series. Figure 9c shows that the Linear model performs well in predicting inflection points and following overall trends but lacks accuracy in matching predicted values to true values. Figure 9d shows that the RF model fits the training data very well but underperforms on the test set, as reflected in its poor EVS and $R^2$ metrics, due to a lack of generalizability. Figure 9e demonstrates that SVR performs worse than the other models: it captures the trend changes of hog supply but fits the actual values poorly. Figure 9f showcases the prediction results of the proposed model. Compared to the previous models, the model excels in matching predicted and true values, especially in accurately capturing inflection points by considering piglet features. Furthermore, the model surpasses all other models in terms of immediacy.
These experimental comparisons dispel the stereotype that traditional methods perform poorly. Methods exhibit varying performance in temporal forecasting scenarios [51]. Time series data typically exhibit temporal dependencies, where future values are influenced by historical values. Pure machine learning models, however, do not account for these temporal dependencies and treat each data point as independent rather than as part of a sequence [52]. Consequently, applying SVR and RF directly fails to leverage the temporal characteristics of the data effectively, resulting in suboptimal predictive performance. RF, in particular, struggles to capture trends in time series data and often underperforms on test samples. In contrast, our decomposition approach considers the temporal properties and inherent relationships within the time series data, thereby fully utilizing the predictive capabilities of machine learning models and achieving improved forecasting accuracy. Evidently, the proposed model, which integrates STL decomposition with ML, has significant advantages in terms of all prediction metrics compared to the other models.

5.6. Comparison of Prediction Models with and without the Piglet Feature

In this section, a series of comprehensive comparative experiments is conducted to explore the impact of incorporating piglet features on the predictive performance of the model. For this purpose, three models are compared: Model M1, which includes only breeding sow features; Model M2, which includes only piglet features; and Model M3, which incorporates both breeding sow and piglet features. These models systematically examine the influence of various feature combinations on performance.
The results are summarized in Table 10. Comparative analysis indicates that the M1 model performs relatively poorly among the three models. In contrast, the M2 model, which incorporates piglet features, exhibits clear performance improvements over the M1 model. Specifically, the MAE is reduced by 1.12% and the MSL loss by 26.31%, while the EVS increases by 0.85% and the $R^2$ coefficient by 0.74%. This observation further confirms the findings discussed in Section 4.2.1: incorporating piglet features indeed improves the accuracy of hog supply forecasts.
It is noteworthy that the M3 model achieves the best prediction performance. This model includes both breeding sow and piglet features as inputs, achieving a more comprehensive feature fusion. As a result, the MAE loss of the M3 model is reduced by 27.8% and the MSL loss by 42.1%, a substantial reduction in prediction errors. Additionally, the EVS and $R^2$ coefficients are improved by 1.2% and 1.24%, respectively, further highlighting the advantages of the model in terms of prediction performance.
In conclusion, the integration of breeding sow and piglet features yields a pronounced improvement in the accuracy of capacity forecasting. The findings underscore the noteworthy relevance of piglets, which has frequently been disregarded, as a crucial feature for predicting hog supply.

5.7. Interpretability Analysis

Interpretability analysis is a crucial aspect of understanding and explaining the proposed prediction model. This study uses the feature importance in XGBoost to calculate the importance of each feature. This method provides the importance of features based on their frequency as split points across all trees, which helps us understand which features have the greatest impact on the model’s predictions.
The results for the seasonal prediction (Figure 10a) show that the month variable has the highest F score, 581. This indicates that the month has the most significant impact on predicting the seasonal variation in hog supply. The quarter and skewness variables follow with F scores of 110 and 53, respectively. The month variable typically captures the finest level of seasonal variation, and many seasonal changes, such as demand fluctuations and production cycles, are also reflected at the monthly level. The quarter variable aggregates monthly information into broader seasonal periods. While monthly data provide fine-grained insights, quarterly data smooth out some of the short-term fluctuations and capture broader trends. Furthermore, the skewness describes the shape of the hog supply distribution, reflecting irregular movements that can influence seasonal predictions.
In the residual prediction (Figure 10b), slope and quarter have the highest F scores. Slope represents the rate of change in price, where a high slope means rapid changes in supply trends that the prediction model may struggle to capture, leading to larger residuals. Pig prices often exhibit seasonal fluctuations, which cause residual changes, so the quarter variable can help predict the residuals caused by seasonality. Piglet-5 has an F score of 254, reflecting the influence of past piglet numbers on hog supply. The lagged variable captures this effect, explaining how earlier values affect current predictions and, in turn, the residuals.

6. Recommendations and Policies

Based on the study analysis, valuable recommendations and policy suggestions can be provided to the government to support intelligent decision making, thereby benefiting the stable development of the pig industry.
The study identifies the causal relationship between piglet and hog supply, and results indicate that the inclusion of piglets can enhance hog supply prediction. In current practice, both the government and the industry primarily emphasize the breeding sow index. Effective measures for regulating pig production capacity and prices involve monitoring the pig feed-to-grain price ratio, changes in breeding sow inventory, and average retail prices of lean meat in 36 major cities, while piglets are ignored. It is, therefore, recommended that the government consider incorporating this new piglet indicator to complement existing control measures. By monitoring piglet indicators, the government can understand future pig supply and implement proactive regulatory measures.
The study shows significant seasonal fluctuations in pig supply. Currently, for the sake of food security, the government sets a benchmark of 39 million breeding sows per month, with a normal range of 92% to 105%. However, this regulatory approach may be too rigid to address fluctuations in pig production capacity. The research results reveal that the time lag periods are 4–6 months for piglets and 11–13 months for breeding sows in Chongqing. In reality, the number of breeding sows and piglets should be flexibly adjusted according to market supply and demand dynamics within the lag period. It is suggested that the government adjust the number of breeding sows and piglets dynamically based on future hog supply, taking the time lag into account. To ensure the stability of pig supply and smooth market operation, the government should conduct nationwide research on the lag period based on the lagged relationship studied in this paper. By integrating the calculated lags with the nationwide pig supply–demand dynamics across different months, along with the supply forecasts generated from this study, the government can effectively predict the forthcoming monthly pig supply and flexibly regulate the quantity of breeding sows and piglets within a reasonable threshold. In this way, monitoring baseline values for the number of piglets and breeding sows can be established dynamically.
The prediction results obtained from the hog supply forecast model can guide the government in adopting forward-looking, comprehensive, and diversified regulatory policies. Historically, China has treated pork as a strategic reserve commodity, aiming to address extreme situations and stabilize the pig market. When an insufficient pork supply is predicted, the government should proactively release reserves to meet market demand in a timely manner and avoid sharp increases in pork prices. Conversely, when an excess supply is predicted, the government should prepare to stockpile, reducing market supply and avoiding sharp declines in pork prices. The government can also take further regulatory action, such as increasing financial support and providing incentives and subsidies.

7. Conclusions

Based on STL decomposition and ML technologies, this study proposes an integrated framework for hog supply forecasting and carries out an empirical study using real datasets collected from Chongqing. (1) Statistical time lag periods are revealed to be 4–6 months for piglets and 11–13 months for breeding sows in Chongqing. (2) The inclusion of the piglet feature significantly improves the accuracy of hog supply forecasts, resulting in a 42.1% reduction in MSL loss. (3) STL decomposition results indicate the growth trend and seasonal fluctuation pattern of annual hog supply, corroborating the national trend in pork consumption. (4) Comparative analyses demonstrate that our model outperforms existing models, achieving a reduction of over 60% in MAE compared to the other five baselines.
Future work will identify more effective factors related to hog supply and apply ML models with greater interpretability to uncover the inherent relationships between different factors and hog supply. Furthermore, because the cattle and sheep industries share similarities with the pig industry in the animal husbandry sector, the framework proposed in this study can be extended to predict production capacity in other animal husbandry sectors.

Author Contributions

Conceptualization, M.X., X.L., Y.Z., B.O., J.S. and S.D.; Methodology, M.X., X.L., B.O. and S.D.; Software, X.L. and B.O.; Validation, M.X., X.L. and Y.Z.; Formal analysis, M.X., X.L. and S.D.; Investigation, M.X., X.L., Y.Z., Z.L., J.S. and S.D.; Resources, J.S.; Data curation, X.L. and J.S.; Writing—original draft, M.X., X.L., Y.Z., B.O. and S.D.; Writing—review & editing, M.X., X.L., Y.Z., Z.L. and S.D.; Visualization, M.X., X.L. and Y.Z.; Project administration, S.D.; Funding acquisition, S.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by the Major Program of the National Natural Science Foundation of China (71931005); the Innovative Research Group Project of the National Natural Science Foundation of China (71821001); the Graduate Innovation Fund of Huazhong University of Science and Technology (YCJJ20241301); and the Fundamental Research Funds for the Central Universities (2024JYCXJJ057).

Institutional Review Board Statement

Not applicable.

Informed Consent Statement

Not applicable.

Data Availability Statement

Restrictions apply to the availability of these data. Data were obtained from National Hog Big Data and are available at https://www.hogdata.cn with the permission of National Hog Big Data, Rongchang, Chongqing.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Xiong, T.; Zhang, W.; Chen, C.T. A Fortune from misfortune: Evidence from hog firms’ stock price responses to China’s African Swine Fever outbreaks. Food Policy 2021, 105, 102150. [Google Scholar] [CrossRef]
  2. Maples, J.G.; Lusk, J.L.; Peel, D.S. Technology and evolving supply chains in the beef and pork industries. Food Policy 2019, 83, 346–354. [Google Scholar] [CrossRef]
  3. Xiao, H.; Wang, J.; Oxley, L.; Ma, H. The evolution of hog production and potential sources for future growth in China. Food Policy 2012, 37, 366–377. [Google Scholar] [CrossRef]
  4. Hu, X.; Wang, M. Causes and inspirations of fluctuations in pig production and prices in the United States. Agric. Econ. Issues 2013, 9, 98–109. [Google Scholar]
  5. Zhou, J.; Ding, S.; Ruan, D. The empirical analysis on influencing factors of pig production fluctuation in China. Res. Agric. Mod. 2014, 6, 750–756. [Google Scholar]
  6. Xiao, H.; Wang, M. Analysis of the causes of fluctuations in China’s pig production. Agric. Econ. Issues 2012, 12, 28–32. [Google Scholar]
  7. Zhu, H.; Xu, R.; Deng, H. A novel STL-based hybrid model for forecasting hog price in China. Comput. Electron. Agric. 2022, 198, 107068. [Google Scholar] [CrossRef]
  8. Piewthongngam, K.; Vijitnopparat, P.; Pathumnakul, S.; Chumpatong, S.; Duangjinda, M. System dynamics modelling of an integrated pig production supply chain. Biosyst. Eng. 2014, 127, 24–40. [Google Scholar] [CrossRef]
  9. Pang, J.; Yin, J.; Lu, G.; Li, S. Supply and Demand Changes, Pig Epidemic Shocks, and Pork Price Fluctuations: An Empirical Study Based on an SVAR Model. Sustainability 2023, 15, 13130. [Google Scholar] [CrossRef]
  10. Kaylen, M.S. Vector autoregression forecasting models: Recent developments applied to the US hog market. Am. J. Agric. Econ. 1988, 70, 701–712. [Google Scholar] [CrossRef]
  11. Liang, X.; Liu, X.; Yang, F. Prediction model on Chinese annual live hog supply and its application. J. Syst. Sci. Complex. 2015, 28, 409–423. [Google Scholar] [CrossRef]
  12. Zhang, F.; Wang, F.; Wang, F. Forecasting model and related index of pig population in China. Symmetry 2021, 13, 114. [Google Scholar] [CrossRef]
  13. Javaid, M.; Haleem, A.; Khan, I.H.; Suman, R. Understanding the potential applications of Artificial Intelligence in Agriculture Sector. Adv. Agrochem 2023, 2, 15–30. [Google Scholar] [CrossRef]
  14. Fan, X.; Luo, P.; Mu, Y.; Zhou, R.; Tjahjadi, T.; Ren, Y. Leaf image based plant disease identification using transfer learning and feature fusion. Comput. Electron. Agric. 2022, 196, 106892. [Google Scholar] [CrossRef]
  15. Brignoli, P.L.; Varacca, A.; Gardebroek, C.; Sckokai, P. Machine learning to predict grains futures prices. Agric. Econ. 2024, 55, 479–497. [Google Scholar] [CrossRef]
  16. Yue, D.; Wang, Z. Research on the Fluctuation Cycle of Swine Production in China. J. Agrotech. Econ. 2010, 6, 18–25. [Google Scholar]
  17. Liu, Y.; Duan, Q.; Wang, D.; Zhang, Z.; Liu, C. Prediction for hog prices based on similar sub-series search and support vector regression. Comput. Electron. Agric. 2019, 157, 581–588. [Google Scholar] [CrossRef]
  18. Colino, E.V.; Irwin, S.H.; Garcia, P. Improving the accuracy of outlook price forecasts. Agric. Econ. 2011, 42, 357–371. [Google Scholar] [CrossRef]
  19. Granger, C.W. Investigating causal relations by econometric models and cross-spectral methods. Econom. J. Econom. Soc. 1969, 37, 424–438. [Google Scholar] [CrossRef]
  20. Chen, R. Analysis of the cyclical fluctuations in pig production in China. Agric. Technol. Econ. 2009, 3, 77–86. [Google Scholar]
  21. Cleveland, R.B.; Cleveland, W.S.; McRae, J.E.; Terpenning, I. STL: A seasonal-trend decomposition. J. Off. Stat. 1990, 6, 3–73. [Google Scholar]
  22. Box, G.E.; Jenkins, G.M.; Reinsel, G.C.; Ljung, G.M. Time Series Analysis: Forecasting and Control; John Wiley & Sons: Hoboken, NJ, USA, 2015. [Google Scholar]
  23. Chen, T.; Guestrin, C. XGBoost: A scalable tree boosting system. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, San Francisco, CA, USA, 13–17 August 2016; pp. 785–794. [Google Scholar]
  24. Wang, J.; Wang, X.; Yu, X. Shocks, cycles and adjustments: The case of China’s Hog Market under external shocks. Agribusiness 2023, 39, 703–726. [Google Scholar] [CrossRef]
  25. Yu, X. Productivity, efficiency and structural problems in Chinese dairy farms. China Agric. Econ. Rev. 2012, 4, 168–175. [Google Scholar] [CrossRef]
  26. Padilla, S.L.; MacLachlan, M.J.; Vaiknoras, K.; Schulz, L.L. Disasters, population trends, and their impact on the US pork packing sector. Food Policy 2023, 118, 102458. [Google Scholar] [CrossRef]
  27. Li, C.; Wang, G.; Shen, Y.; Amètépé Nathanaël Beauclair, A. The Effect of Hog Futures in Stabilizing Hog Production. Agriculture 2024, 14, 335. [Google Scholar] [CrossRef]
  28. Ma, M.; Wang, H.H.; Hua, Y.; Qin, F.; Yang, J. African swine fever in China: Impacts, responses, and policy implications. Food Policy 2021, 102, 102065. [Google Scholar] [CrossRef]
  29. Zhang, L.; Wang, Y.; Dunya, R. How Does Environmental Regulation Affect the Development of China’s Pig Industry. Sustainability 2023, 15, 8258. [Google Scholar] [CrossRef]
  30. Liu, H.; Zheng, K. Analysis of the Chinese government’s subsidy programs to restore the pork supply chain: The case of African swine fever. Omega 2024, 124, 102995. [Google Scholar] [CrossRef]
  31. Bancroft, J. The big hog Cycle-What Goes Down, Must go up. In Proceedings of the Report for Canadian Centre for Swine Improvement Inc., Genetics for Swine Production Meeting, Des Moines, IA, USA, 18–19 November 2003. [Google Scholar]
  32. Harlow, A.A. Factors Affecting the Price and Supply of Hogs; Number 1274; US Department of Agriculture: Washington, DC, USA, 1962. [Google Scholar]
  33. Chuluunsaikhan, T.; Kim, J.H.; Park, S.H.; Nasridinov, A. Analyzing Internal and External Factors in Livestock Supply Forecasting Using Machine Learning: Sustainable Insights from South Korea. Sustainability 2024, 16, 6907. [Google Scholar] [CrossRef]
  34. Ezekiel, M. The cobweb theorem. Q. J. Econ. 1938, 52, 255–280. [Google Scholar] [CrossRef]
  35. Sun, Z.; Zhang, L. Analysis of the national breeding sow production capacity survey and recommendations for the future market. China Swine Ind. 2020, 15, 12–16. [Google Scholar]
  36. Maki, W.R. Forecasting livestock supplies and prices with an econometric model. J. Farm Econ. 1963, 45, 612–624. [Google Scholar] [CrossRef]
  37. Crom, R. A Dynamic Price-Output Model of Beef and Pork Sectors; Technical Bulletin; United States Department of Agriculture, Economic Research Service: Washington, DC, USA, 1970. [Google Scholar]
  38. Brandt, J.A.; Bessler, D.A. Forecasting with vector autoregressions versus a univariate ARIMA process: An empirical example with US hog prices. North Cent. J. Agric. Econ. 1984, 6, 29–36. [Google Scholar] [CrossRef]
  39. Meilke, K.; Zwart, A.; Martin, L. North American hog supply: A comparison of geometric and polynomial distributed lag models. Can. J. Agric. Econ. Can. D’agroecon. 1974, 22, 15–30. [Google Scholar] [CrossRef]
  40. Dixon, B.L.; Martin, L.J. Forecasting US pork production using a random coefficient model. Am. J. Agric. Econ. 1982, 64, 530–538. [Google Scholar] [CrossRef]
  41. Rezitis, A.N.; Stavropoulos, K.S. Modeling pork supply response and price volatility: The case of Greece. J. Agric. Appl. Econ. 2009, 41, 145–162. [Google Scholar] [CrossRef]
  42. Xiong, T.; Bao, Y.; Hu, Z. Multiple-output support vector regression with a firefly algorithm for interval-valued stock price index forecasting. Knowl.-Based Syst. 2014, 55, 87–100. [Google Scholar] [CrossRef]
  43. Wang, Y.; Guo, Y. Forecasting method of stock market volatility in time series data based on mixed model of ARIMA and XGBoost. China Commun. 2020, 17, 205–221. [Google Scholar] [CrossRef]
  44. Cao, H.; Xu, P.; Chen, J.; Zhong, L.; Li, X. Domestic Pig Price Prediction and Experiment Based on Bi-RNN-LSTM Model. Mech. Electr. Eng. Technol. 2023, 52, 260–263+289. [Google Scholar]
  45. Xiong, T.; Li, C.; Bao, Y. An improved EEMD-based hybrid approach for the short-term forecasting of hog price in China. Agric. Econ. 2017, 63, 136–148. [Google Scholar] [CrossRef]
  46. Qin, J.; Yang, D.; Zhang, W. A Pork Price Prediction Model Based on a Combined Sparrow Search Algorithm and Classification and Regression Trees Model. Appl. Sci. 2023, 13, 12697. [Google Scholar] [CrossRef]
  47. Gasca, M.; Sauer, T. Polynomial interpolation in several variables. Adv. Comput. Math. 2000, 12, 377–410. [Google Scholar] [CrossRef]
  48. Gjølberg, O. Are piglet prices rational hog price forecasts? Agric. Econ. 1995, 13, 119–123. [Google Scholar] [CrossRef]
  49. Breiman, L. Random forests. Mach. Learn. 2001, 45, 5–32. [Google Scholar] [CrossRef]
  50. Cortes, C. Support-Vector Networks. Mach. Learn. 1995, 20, 273–297. [Google Scholar] [CrossRef]
  51. Qiu, X.; Hu, J.; Zhou, L.; Wu, X.; Du, J.; Zhang, B.; Guo, C.; Zhou, A.; Jensen, C.S.; Sheng, Z.; et al. Tfb: Towards comprehensive and fair benchmarking of time series forecasting methods. arXiv 2024, arXiv:2403.20150. [Google Scholar] [CrossRef]
  52. Hyndman, R. Forecasting: Principles and Practice; OTexts: Melbourne, Australia, 2018. [Google Scholar]
Figure 1. National annual pork production and growth rate from year 2008 to year 2024 in China (source: the National Bureau of Statistics).
Figure 2. National quarterly pork production from year 2018 to year 2024 in China (source: the National Bureau of Statistics). The blue bars represent the quarterly pork production, and the orange curve connects the bars. The value labels indicate the numerical values at the seasonal peaks.
Figure 3. The proposed hog supply forecasting framework.
Figure 4. The growing process from sow birth to market hog.
Figure 5. Hog supply, the number of piglets with a 5-month time lag, and the number of breeding sows with a 10-month time lag.
Figure 6. The correlation between variables.
Figure 7. Hog supply time series data from July 2019 to November 2023 before and after outlier filtering.
Figure 8. STL decomposition results of hog supply series.
Figure 9. Comparison of prediction models.
Figure 10. Feature importance values.
Table 1. Data description.

| Data Source | Variable Description | Mean | Std | Frequency |
| --- | --- | --- | --- | --- |
| Large-scale farm | Number of market hogs (heads) | 293,089.1 | 95,538.3 | monthly |
| | Number of breeding sows (heads) | 239,894.5 | 60,454.0 | monthly |
| | Number of breeding sows purchased (heads) | 10,673.1 | 4309.5 | monthly |
| | Number of breeding sows for sale (heads) | 7966.9 | 4371.1 | monthly |
| | Number of piglets (heads) | 316,251.0 | 97,059.0 | monthly |
| | Number of piglets purchased (heads) | 109,132.9 | 45,232.4 | monthly |
| | Number of piglets for sale (heads) | 137,872.1 | 47,430.1 | monthly |
| | Number of large-scale farms | 4942.2516.84 | | monthly |
| | Number of all hogs | 2,168,410.6 | 660,182.7 | monthly |
| | Number of hog deaths | 23,314.7 | 12,337.6 | monthly |
| Government department | Price of pork (RMB/kg) | 26.2 | 11.3 | daily |
| | Price of corn (RMB/ton) | 2854.3 | 141.9 | daily |
| | Price of bean (RMB/ton) | 3722.5 | 683.1 | daily |
| | Price of chicken (RMB/kg) | 20.5 | 2.4 | monthly |
| | Number of import pigs (heads) | 189,253.0 | 105,046.9 | monthly |
Table 2. Top eight variables by correlation coefficient r and their VIF values.

| Feature | Piglet | Corn | Breeding Sow | All Hog | Farm | Hog Deaths | Bean | Piglet for Sale |
| --- | --- | --- | --- | --- | --- | --- | --- | --- |
| r | 0.83 | 0.74 | 0.74 | 0.73 | 0.71 | 0.67 | 0.66 | 0.66 |
| VIF | 33.84 | 14.68 | 45.05 | 38.66 | 17.06 | 4.54 | 7.47 | 19.81 |
Table 3. VIF values of the selected features.

| Variable | VIF | Variable | VIF |
| --- | --- | --- | --- |
| corn | 9.96 | corn | 9.33 |
| bean | 6.19 | bean | 6.37 |
| piglet | 5.40 | breeding sow | 5.67 |
| hog deaths | 1.89 | hog deaths | 2.28 |
Table 4. A summary of notations.

| Notation | Description |
| --- | --- |
| t | time index |
| Y_t | the quantity of hog supply in period t |
| T_t | trend of STL model in period t |
| S_t | seasonality of STL model in period t |
| R_t | residual of STL model in period t |
| X_piglet | the quantity of piglets in period t |
| X_sow | the quantity of breeding sows in period t |
| d(X_piglet) | the first-order difference variable of piglets |
| d(X_sow) | the first-order difference variable of breeding sows |
| n | the time lag period |
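The lagged and differenced variables in Table 4 can be illustrated with a minimal sketch. The helper names `lag` and `first_difference` and the piglet counts below are hypothetical and serve only to show how a feature with time lag n and a first-order difference d(X) might be constructed; they are not taken from the study's code or data.

```python
from typing import List, Optional


def lag(series: List[float], n: int) -> List[Optional[float]]:
    """Shift a series by n periods so that position t holds series[t - n].

    The first n positions have no lagged observation and are set to None.
    """
    if n < 0:
        raise ValueError("n must be non-negative")
    if n == 0:
        return list(series)
    return [None] * min(n, len(series)) + list(series[:-n])


def first_difference(series: List[float]) -> List[float]:
    """Return d(X)_t = X_t - X_{t-1}; the result is one element shorter."""
    return [curr - prev for prev, curr in zip(series, series[1:])]


# Hypothetical monthly piglet counts (heads), for illustration only.
piglets = [300.0, 320.0, 310.0, 330.0, 340.0, 350.0, 345.0, 360.0]

# With n = 5, the hog supply in month t is paired with the piglet count of month t - 5.
lagged_piglets = lag(piglets, 5)
d_piglets = first_difference(piglets)
```

In this sketch the first five entries of `lagged_piglets` are `None`, so the corresponding months would have to be dropped before model fitting.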
Table 5. Unit root and cointegration test.

| Variables | ADF Value | p-Value | 1% Critical Value | Conclusion |
| --- | --- | --- | --- | --- |
| X_piglet | −0.53 | 0.886 | −3.565 | not stable |
| d(X_piglet) | −7.579 | 2.70 × 10^−11 | −3.568 | stable |
| Y | 0.286 | 0.977 | −3.601 | not stable |
| d(Y) | −5.378 | 3.76 × 10^−6 | −3.601 | stable |
| resid | −4.616 | 0.0001 | −3.568 | stable |
Table 6. Granger causality test for the piglet variable.

| Time Lags | F-Value | p-Value |
| --- | --- | --- |
| n = 4 | 6.943 | 0.0003 |
| n = 5 | 4.334 | 0.0034 |
| n = 6 | 3.769 | 0.0057 |
| n = 7 | 3.201 | 0.0118 |
| n = 8 | 2.573 | 0.0314 |
| n = 9 | 2.347 | 0.046 |
| n = 10 | 1.666 | 0.1557 |
Table 7. Granger causality test for the breeding sow feature.

| Time Lags | F-Value | p-Value |
| --- | --- | --- |
| n = 6 | 2.035 | 0.0878 |
| n = 7 | 2.083 | 0.0756 |
| n = 8 | 1.493 | 0.2043 |
| n = 9 | 1.176 | 0.3520 |
| n = 10 | 1.648 | 0.1578 |
| n = 11 | 3.586 | 0.007 |
| n = 12 | 3.142 | 0.0065 |
| n = 13 | 5.486 | 0.0021 |
| n = 14 | 4.148 | 0.0146 |
| n = 15 | 4.350 | 0.0285 |
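The reported lag windows can be read directly from the p-values in Tables 6 and 7. The sketch below assumes that a lag counts as significant when its p-value falls below 0.01 (the 99% level); the helper name `significant_lags` is hypothetical, and the p-values are copied from the two tables.

```python
from typing import Dict, List

# p-values of the Granger causality tests, keyed by time lag n (months).
piglet_p: Dict[int, float] = {
    4: 0.0003, 5: 0.0034, 6: 0.0057, 7: 0.0118,
    8: 0.0314, 9: 0.046, 10: 0.1557,
}
sow_p: Dict[int, float] = {
    6: 0.0878, 7: 0.0756, 8: 0.2043, 9: 0.3520, 10: 0.1578,
    11: 0.007, 12: 0.0065, 13: 0.0021, 14: 0.0146, 15: 0.0285,
}


def significant_lags(p_values: Dict[int, float], alpha: float = 0.01) -> List[int]:
    """Return the lags whose p-value is strictly below alpha, in ascending order."""
    return sorted(n for n, p in p_values.items() if p < alpha)


piglet_lags = significant_lags(piglet_p)  # lags 4, 5 and 6
sow_lags = significant_lags(sow_p)        # lags 11, 12 and 13
```

Under this threshold the selected lags are 4–6 months for piglets and 11–13 months for breeding sows. Relaxing `alpha` to 0.05 would widen both windows (to lags 4–9 and 11–15, respectively).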
Table 8. Model comparison for seasonal and residual predictions.

| Model | Season_MAE | Resid_MAE |
| --- | --- | --- |
| Linear | 0.0737 | 0.0664 |
| SVR | 0.0841 | 0.0226 |
| RF | 0.0676 | 0.6642 |
| XGBoost | 0.0021 | 0.01314 |
Table 9. Comparison of the out-of-sample performance between the proposed model and existing models.

| Metrics | ARIMA | SARIMA | Linear | RF | SVR | Ours |
| --- | --- | --- | --- | --- | --- | --- |
| MAE | 31,848.38 | 31,846.33 | 26,849.81 | 39,602.40 | 36,148.86 | 10,396.05 |
| MSL | 0.0255 | 0.03 | 0.0167 | 0.0282 | 0.0269 | 0.0014 |
| EVS | 0.7846 | 0.8022 | 0.8683 | 0.2489 | 0.7968 | 0.9768 |
| R² | 0.7688 | 0.802 | 0.8678 | 0.2388 | 0.7662 | 0.974 |
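For readers who wish to reproduce the four metrics in Table 9, a minimal pure-Python sketch follows. It assumes the usual definitions: MAE as the mean absolute error, MSL as the mean squared difference of log(1 + y) between actual and predicted values (this form is an assumption, since the exact formula is not restated here), EVS as one minus the ratio of residual variance to the variance of the actual values, and R² as one minus the ratio of the residual sum of squares to the total sum of squares. The function names and the toy series are hypothetical.

```python
import math
from typing import Sequence


def mae(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def msl(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Mean squared logarithmic error; assumes non-negative inputs."""
    return sum((math.log1p(a) - math.log1p(b)) ** 2 for a, b in zip(y, y_hat)) / len(y)


def _variance(values: Sequence[float]) -> float:
    """Population variance."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def evs(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Explained variance score: 1 - Var(y - y_hat) / Var(y)."""
    residuals = [a - b for a, b in zip(y, y_hat)]
    return 1.0 - _variance(residuals) / _variance(y)


def r2(y: Sequence[float], y_hat: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical actual and predicted monthly supply, for illustration only.
actual = [100.0, 200.0, 300.0]
predicted = [110.0, 190.0, 310.0]
scores = {
    "MAE": mae(actual, predicted),
    "MSL": msl(actual, predicted),
    "EVS": evs(actual, predicted),
    "R2": r2(actual, predicted),
}
```

Under these definitions EVS and R² coincide when the mean residual is zero and EVS is larger otherwise, which is consistent with EVS being at least as large as R² for every model in Table 9.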
Table 10. Comparative analysis with and without the piglet feature.

| Metrics | M1 (Sow) | M2 (Piglet) | M3 (Sow + Piglet) |
| --- | --- | --- | --- |
| MAE | 10,521.19 | 10,396.05 | 7592.76 |
| MSL | 0.0019 | 0.0014 | 0.0011 |
| EVS | 0.9685 | 0.9768 | 0.9807 |
| R² | 0.9668 | 0.974 | 0.9788 |
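The headline percentages can be checked against Tables 9 and 10 with a short calculation. The sketch below assumes that each reduction is measured relative to the comparison model, that the baseline comparison uses the best-performing baseline for each metric, and that the 42.1% figure compares M1 (Sow) with M3 (Sow + Piglet); this reading is consistent with the tabulated values but is an interpretation, and the helper name `relative_reduction` is hypothetical.

```python
from typing import Dict


def relative_reduction(baseline: float, proposed: float) -> float:
    """Percentage by which `proposed` is lower than `baseline`."""
    return 100.0 * (baseline - proposed) / baseline


# Out-of-sample errors of the five baselines (Table 9).
baseline_mae: Dict[str, float] = {
    "ARIMA": 31848.38, "SARIMA": 31846.33, "Linear": 26849.81,
    "RF": 39602.40, "SVR": 36148.86,
}
baseline_msl: Dict[str, float] = {
    "ARIMA": 0.0255, "SARIMA": 0.03, "Linear": 0.0167,
    "RF": 0.0282, "SVR": 0.0269,
}
ours_mae, ours_msl = 10396.05, 0.0014

# Reduction against the strongest baseline for each metric.
mae_cut = relative_reduction(min(baseline_mae.values()), ours_mae)
msl_cut = relative_reduction(min(baseline_msl.values()), ours_msl)

# Effect of adding the piglet feature: MSL of M1 (Sow) vs. M3 (Sow + Piglet), Table 10.
piglet_cut = relative_reduction(0.0019, 0.0011)
```

With these inputs the MAE reduction against the best baseline (Linear) is about 61.3%, the MSL reduction is about 91.6%, and adding the piglet feature lowers MSL by about 42.1%.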
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

Share and Cite

MDPI and ACS Style

Xu, M.; Lai, X.; Zhang, Y.; Li, Z.; Ouyang, B.; Shen, J.; Deng, S. An Integrated Hog Supply Forecasting Framework Incorporating the Time-Lagged Piglet Feature: Sustainable Insights from the Hog Industry in China. Sustainability 2024, 16, 8398. https://doi.org/10.3390/su16198398
