1. Introduction
Poultry are highly sensitive to environmental changes and adapt poorly to adverse conditions [1]. Environmental control is a critical part of intensive poultry farming [2]. In intensive systems, high stocking density leads to problems such as temperature and humidity imbalances in the rearing environment and excessive pollutant emissions, which negatively affect poultry health and production [3,4]. Relative humidity (RH) is one of the key indicators of the poultry-rearing environment, with an optimal range of approximately 50% to 80%. Low RH increases the risk of virus transmission, while high RH accelerates feed mold growth, reduces the evaporative cooling rate of poultry, increases heat stress, and impairs physiological function and egg production performance [5,6,7]. In addition, RH and temperature inside the poultry house are coupled [8], and changes in one affect the other. Compared with temperature, however, RH fluctuates over a wider range and a shorter period, making it more unstable [9]. Therefore, continuous monitoring of RH in the poultry house, combined with the necessary regulation and intervention, is beneficial for ensuring poultry health and improving laying rates [10].
In recent years, various machine learning (ML) and deep learning (DL) time-series prediction models have been applied to environmental monitoring in livestock and poultry breeding. Arulmozhi et al. [11] compared the performance of different ML models in predicting pigsty humidity and found that random forest regression (RFR) performed best. Liu et al. [12] used extreme gradient boosting (XGBoost) to predict and regulate odor concentration in chicken coops, helping maintain a clean environment. Lee et al. [13] utilized a recurrent neural network (RNN) to predict and control the temperature and RH of duck houses. Wang et al. [14] proposed a pigsty ammonia concentration prediction model based on a convolutional neural network (CNN) and a gated recurrent unit (GRU), which can promptly capture the trend of ammonia concentration changes. Environmental data collected by sensors are complex, nonlinear, and affected by irregular noise, so hybrid prediction methods can achieve better predictive performance [15]. Existing hybrid methods mainly involve feature selection, data denoising or decomposition, and the selection and optimization of prediction models. Shen et al. [16] employed empirical mode decomposition (EMD) to decompose environmental parameters and an Elman neural network to predict ammonia concentration in pigsties; data decomposition simplifies a complex time series. Song et al. [17] employed kernel principal component analysis (KPCA) to extract the main component information from multiple environmental factors and established a QPSO-RBF combination prediction algorithm to predict ammonia concentration levels in cowsheds. Yin et al. [18] employed LightGBM and recursive feature elimination (RFE) to screen out environmental factors highly correlated with carbon dioxide in sheep houses and established an SSA-ELM model to predict carbon dioxide concentration. Feature selection reduces model training time, while optimization algorithms reduce the time required to determine the prediction model's initialization parameters. Huang et al. [19] used the wavelet transform (WT) to remove noise from environmental data and a temporal convolutional network (TCN) to predict the pollution index in waterfowl breeding farms, effectively improving data quality.
Although a certain research foundation exists for environmental prediction in animal husbandry, it is insufficient for predicting RH within poultry houses. Current research mainly focuses on one-step-ahead prediction, which estimates the next value from partial past observations [20]. Poultry are sensitive and responsive to environmental changes; however, owing to their biological characteristics, changes in environmental conditions do not immediately affect their egg production and health indicators. The effect takes time to manifest, showing a certain lag. Short-term point prediction is therefore not conducive to resource scheduling and management in intensive poultry farming, let alone to accurate regulation of breeding-period variables and assessment of the whole-life-cycle health status of poultry. Achieving multi-step-ahead RH prediction is thus particularly urgent and necessary. However, as the number of prediction steps increases, the predictive performance of the model inevitably decreases, introducing more errors and risks and making it difficult for regulators to make decisions. Interval prediction can effectively quantify the risk of multi-step point predictions. Unlike point prediction, it constructs prediction intervals (PI) at different confidence levels that are expected to contain future observations. For regulators, this provides more useful information than point prediction and supports decision-making and management [21].
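To make the distinction between one-step-ahead and multi-step-ahead prediction concrete, the sketch below (a minimal, hypothetical example; the window lengths and series are invented for illustration and are not the paper's implementation) frames a univariate RH series as supervised samples with `n_past` past observations as input and `n_ahead` future values as the multi-step target:

```python
import numpy as np

def make_windows(series, n_past, n_ahead):
    """Frame a univariate series as supervised samples:
    n_past observations in, n_ahead future values out."""
    X, Y = [], []
    for i in range(len(series) - n_past - n_ahead + 1):
        X.append(series[i:i + n_past])
        Y.append(series[i + n_past:i + n_past + n_ahead])
    return np.array(X), np.array(Y)

# Toy RH-like sequence of 10 readings
rh = np.arange(10, dtype=float)
X, Y = make_windows(rh, n_past=4, n_ahead=3)
print(X.shape, Y.shape)  # (4, 4) (4, 3)
```

A multi-step model then maps each row of `X` to the corresponding row of `Y`, rather than to a single next value as in one-step-ahead prediction.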
Building on the aforementioned research, this study proposes a comprehensive and practical hybrid medium and long-term prediction model for both point values and interval ranges of RH in intensive poultry farming environments. The main contributions and innovations of this study are as follows:
Exploring methods to enhance the quality of the model's input data. Spearman rank correlation analysis and grey relational analysis (GRA) are used to eliminate redundant environmental factors, and complete ensemble empirical mode decomposition with adaptive noise (CEEMDAN) combined with permutation entropy is used to denoise the RH data. Feature selection and data denoising eliminate interference from redundant data.
Proposing a deep learning model based on BiGRU and an attention mechanism to achieve effective medium and long-term point prediction of poultry house RH. Compared with common models, the BiGRU-Attention model improves the utilization of multi-dimensional, long-term data, fully extracts the causal relationships between variables and targets, and enhances the accuracy of medium and long-term RH prediction.
Demonstrating measures to reduce decision-making risks caused by point prediction errors. Kernel density estimation (KDE) is used to fit the errors generated by point prediction, and PIs at different confidence levels are calculated to quantify the risk introduced by those errors. This provides regulators with more useful information.
4. Discussion
4.1. Analysis of Model Results in Comparison Based on Feature Selection
After feature selection, the MAE of the FS-BiGRU-Attention and FS-BiGRU models for predicting the future 3 steps decreased by 23.0% and 18.1%, respectively, their RMSE decreased by 25.6% and 32.2%, and their MAPE decreased by 22.8% and 18.1%. As shown in Figure 11, the predictive errors of the other baseline models were also reduced, and their prediction performance improved significantly after feature selection. This indicates that the feature selection method based on Spearman rank correlation analysis and GRA can effectively select environmental factors that are highly correlated with RH and exhibit similar trends. By eliminating redundant environmental factors, feature selection helps models focus on relevant and useful covariates, making it easier to uncover causal relationships between input and output data and thereby improving predictive performance.
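As a rough illustration of the two screening criteria, the sketch below implements a plain Spearman rank correlation (without tie handling, which is adequate for continuous sensor data) and Deng's grey relational grade with resolution coefficient ρ = 0.5. The series names, noise levels, and seed are invented for the example and are not the paper's data:

```python
import numpy as np

def spearman(x, y):
    """Spearman rank correlation (simple version, no tie handling)."""
    rx = np.argsort(np.argsort(x)).astype(float)
    ry = np.argsort(np.argsort(y)).astype(float)
    rx -= rx.mean(); ry -= ry.mean()
    return float((rx @ ry) / np.sqrt((rx @ rx) * (ry @ ry)))

def grey_relational_grades(ref, factors, rho=0.5):
    """Deng's grey relational grade of each factor against the reference series;
    min/max deviations are taken over all factors jointly."""
    norm = lambda s: (s - s.min()) / (s.max() - s.min())
    D = np.abs(np.stack([norm(f) for f in factors]) - norm(ref))  # deviation matrix
    coef = (D.min() + rho * D.max()) / (D + rho * D.max())        # relational coefficients
    return coef.mean(axis=1)                                      # grade per factor

rng = np.random.default_rng(0)
rh = rng.normal(65.0, 5.0, 200)               # pseudo RH reference series
temp = 0.8 * rh + rng.normal(0.0, 2.0, 200)   # strongly related factor
dust = rng.normal(0.0, 1.0, 200)              # unrelated factor
g = grey_relational_grades(rh, [temp, dust])
print(spearman(rh, temp), spearman(rh, dust))
print(g)
```

Factors with both a high rank correlation and a high relational grade against RH would be retained; the others are treated as redundant.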
4.2. Analysis of Model Results in Comparison Based on CEEMDAN Denoising
After data denoising, the MAE of the CEEMDAN-BiGRU-Attention and CEEMDAN-BiGRU models for predicting the future 3 steps decreased by 13.7% and 5.2%, respectively, the RMSE decreased by 15.7% and 20.1%, and the MAPE decreased by 12.5% and 12.4%. As shown in Figure 12, the other baseline models also showed reduced errors and improved predictive performance after data denoising. This demonstrates that the CEEMDAN-based method combined with permutation entropy can effectively remove irregular noise from RH, resulting in a more regular and stable RH curve. After denoising, the predictive models can extract useful information more simply and efficiently, free of interference from redundant information, thereby enhancing robustness and accuracy.
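Permutation entropy, which the denoising step uses to flag noise-dominated components, can be sketched in a few lines. This is a minimal implementation with embedding dimension m and delay τ; the CEEMDAN decomposition itself (available in libraries such as PyEMD) and the paper's actual threshold are not reproduced here:

```python
import math
import numpy as np

def permutation_entropy(x, m=3, tau=1):
    """Normalised permutation entropy: 0 for a fully regular series,
    approaching 1 for white noise."""
    n = len(x) - (m - 1) * tau
    counts = {}
    for i in range(n):
        # ordinal pattern of each length-m embedding window
        pattern = tuple(np.argsort(x[i:i + (m - 1) * tau + 1:tau]))
        counts[pattern] = counts.get(pattern, 0) + 1
    p = np.array(list(counts.values()), dtype=float) / n
    return float(-(p * np.log(p)).sum() / math.log(math.factorial(m)))

rng = np.random.default_rng(1)
pe_noise = permutation_entropy(rng.normal(size=500))  # irregular component
pe_trend = permutation_entropy(np.arange(500.0))      # smooth component
print(pe_noise, pe_trend)
```

Components whose permutation entropy exceeds a chosen threshold can be treated as noise-dominated and discarded before reconstructing the denoised RH series; the threshold itself is a design choice.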
4.3. Analysis of Results Based on the CEEMDAN-FS-BiGRU-Attention Model
To substantiate the superiority of the proposed CEEMDAN-FS-BiGRU-Attention hybrid prediction model, it was compared with multiple models. As shown in Table 5, under the same prediction framework, BiGRU-Attention outperformed the other baseline models in predictive performance. Compared to CEEMDAN-FS-LSTM, CEEMDAN-FS-BiGRU-Attention reduced the MAE, RMSE, and MAPE for predicting the future 3 steps by 29.1%, 26.4%, and 27.0%, respectively.
From the perspective of the prediction framework, feature selection and data denoising fully extract features highly correlated with RH, remove noise, and eliminate the influence of redundant factors, thereby enhancing the quality of the model input. From the perspective of the baseline model, BiGRU can fully exploit the correlation between model inputs and outputs and has good fitting ability. The attention mechanism captures the long-term dependence of the RH sequence and alleviates the loss of important information that occurs when BiGRU processes long sequences. Compared with LSTM, CEEMDAN-FS-BiGRU-Attention reduced the MAE, RMSE, and MAPE for predicting the future 3 steps by 57.7%, 48.2%, and 56.6%, respectively, demonstrating outstanding predictive performance.
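For readers unfamiliar with these building blocks, the sketch below walks through a single (unidirectional) GRU cell and a dot-product attention pooling step in plain NumPy. It is a didactic toy with random weights and invented dimensions, not the trained BiGRU-Attention model:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def gru_cell(x, h, Wz, Uz, Wr, Ur, Wh, Uh):
    """One GRU step (biases omitted for brevity)."""
    z = sigmoid(Wz @ x + Uz @ h)              # update gate: how much state to renew
    r = sigmoid(Wr @ x + Ur @ h)              # reset gate: how much history to use
    h_cand = np.tanh(Wh @ x + Uh @ (r * h))   # candidate hidden state
    return (1.0 - z) * h + z * h_cand

def attention_pool(H, w):
    """Score each hidden state against vector w, softmax, and pool."""
    scores = H @ w
    alpha = np.exp(scores - scores.max())
    alpha /= alpha.sum()                      # attention weights sum to 1
    return alpha, alpha @ H                   # weighted context vector

rng = np.random.default_rng(42)
d_in, d_h, T = 4, 8, 6                        # features, hidden size, time steps
W = [rng.normal(scale=0.1, size=(d_h, d_in)) for _ in range(3)]
U = [rng.normal(scale=0.1, size=(d_h, d_h)) for _ in range(3)]
h, H = np.zeros(d_h), []
for x in rng.normal(size=(T, d_in)):          # forward pass over the sequence
    h = gru_cell(x, h, W[0], U[0], W[1], U[1], W[2], U[2])
    H.append(h)
alpha, context = attention_pool(np.stack(H), rng.normal(size=d_h))
print(alpha, context.shape)
```

A BiGRU runs this recurrence in both directions and concatenates the states; the attention weights `alpha` show how the pooled context can emphasize informative time steps instead of relying only on the final hidden state.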
4.4. Comparative Analysis of Interval Prediction Performance
To compare and assess the interval prediction performance of different models, this study conducted a comparative analysis using BiGRU and BiLSTM as baseline models.
Figure 13 shows the error probability density curves of the different baseline models' validation sets fitted by KDE-Gaussian. The error distribution of CEEMDAN-FS-BiGRU-Attention is relatively concentrated, mainly within [−7, 7], whereas the error distributions of CEEMDAN-FS-BiGRU and CEEMDAN-FS-BiLSTM are more dispersed, mainly within [−9, 9]. This indicates that CEEMDAN-FS-BiGRU-Attention has smaller errors on the validation set and better predictive performance than the other two models.
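The KDE-based interval construction can be sketched as follows: fit a Gaussian-kernel density to the validation-set point-prediction errors, discretize its CDF on a grid, and read off the central (1 − α) error quantiles. The bandwidth rule, grid size, and synthetic errors below are illustrative assumptions, not the paper's exact settings:

```python
import numpy as np

def kde_interval(errors, alpha=0.05, grid_n=2000):
    """Fit a Gaussian-kernel KDE to point-prediction errors and return the
    central (1 - alpha) error interval from the discretised KDE CDF."""
    e = np.asarray(errors, dtype=float)
    h = 1.06 * e.std() * len(e) ** (-1 / 5)       # Silverman's rule-of-thumb bandwidth
    grid = np.linspace(e.min() - 3 * h, e.max() + 3 * h, grid_n)
    # kernel density evaluated on the grid, then normalised discretely
    pdf = np.exp(-0.5 * ((grid[:, None] - e[None, :]) / h) ** 2).sum(axis=1)
    pdf /= pdf.sum()
    cdf = np.cumsum(pdf)
    lo = grid[np.searchsorted(cdf, alpha / 2)]
    hi = grid[np.searchsorted(cdf, 1 - alpha / 2)]
    return lo, hi

rng = np.random.default_rng(7)
errors = rng.normal(0.0, 2.0, 1000)               # pseudo validation-set errors
lo, hi = kde_interval(errors, alpha=0.05)
print(lo, hi)                                     # roughly symmetric about zero here
```

The PI for a point forecast ŷ at confidence level 1 − α is then [ŷ + lo, ŷ + hi].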
To further substantiate the efficacy of the KDE-Gaussian method, we compared its performance in constructing prediction intervals for the future 3 steps with the commonly used normal distribution estimation (NDE) method and the Bootstrap method. NDE is a parametric method that assumes the sample follows a normal distribution, while Bootstrap estimates the error distribution by random resampling. As shown in Table 6, the KDE-Gaussian method has the best and most stable interval prediction performance. Compared with KDE-Gaussian, both NDE and Bootstrap tend to form prediction intervals with a PICP lower than the confidence level. A lower PICP indicates that the intervals do not cover the true RH data well and may lead regulators to incorrect decisions. The KDE-Gaussian method stably outputs suitable prediction intervals, meeting the required confidence level without producing excessively wide intervals, which makes it more reliable and practical.
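The coverage and width criteria behind this comparison can be computed directly. The sketch below uses PICP together with the PI normalized average width (PINAW), a commonly used width measure assumed here for illustration, on invented toy numbers:

```python
import numpy as np

def picp(y, lower, upper):
    """Prediction Interval Coverage Probability: share of truths inside the interval."""
    return float(np.mean((y >= lower) & (y <= upper)))

def pinaw(y, lower, upper):
    """PI Normalised Average Width: mean interval width over the target's range."""
    return float(np.mean(upper - lower) / (y.max() - y.min()))

y = np.array([60.0, 62.0, 65.0, 70.0, 75.0])  # toy true RH values
lower = y - 2.0
upper = y + 2.0
upper[2] = 64.0                               # make one interval miss the truth
print(picp(y, lower, upper), pinaw(y, lower, upper))
```

A reliable method keeps PICP at or above the nominal confidence level while holding PINAW as small as possible; one without the other is not useful to a regulator.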
From the perspective of the baseline models, under the same prediction framework the validation-set errors of the BiGRU-Attention model are lower than those of the other two models; thus, although the PICP of the prediction intervals formed by BiGRU-Attention is lower than that of BiGRU and BiLSTM, it still meets the confidence-level requirement. It is worth noting that, compared with BiGRU and BiLSTM, BiGRU-Attention maintains a narrower interval width while fulfilling the confidence-level requirement, accurately describing the uncertainty of RH variations, which makes it perform better in practical applications.
In conclusion, the prediction intervals formed by the proposed CEEMDAN-FS-BiGRU-Attention-KDE-Gaussian model closely track the trend of the RH sequence, achieving reliable coverage while maintaining a narrow interval width. The model can provide more accurate and useful information for regulators and is suitable for precise prediction and control of RH in poultry houses.
5. Conclusions
This study proposes an effective hybrid point and interval prediction framework for RH, which significantly improves the accuracy and stability of medium and long-term RH prediction. Through comparison with multiple models, CEEMDAN-FS-BiGRU-Attention has been proven to be a reliable and efficient RH prediction model. Additionally, using the KDE-Gaussian method to form prediction intervals based on point prediction error distribution has demonstrated excellent interval prediction performance under different confidence levels and prediction steps.
The specific conclusions are as follows:
(1) Owing to various influences, the RH data collected by sensors inevitably contain noise, which causes random interference in model training and prediction. After data denoising, the MAE of BiGRU-Attention, BiGRU, and BiLSTM for predicting the future 3 steps was reduced by 13.8%, 13.2%, and 5.2%, respectively. This indicates that the data denoising method based on CEEMDAN and permutation entropy effectively removes irregular noise from RH, making it easier for the model to learn useful information while suppressing overfitting.
(2) Environmental factors in poultry houses influence each other. Comprehensive analysis and selection of environmental factors with high correlation and similar trends are important for improving RH prediction accuracy. After feature selection, the MAE of BiGRU-Attention, BiGRU, and BiLSTM for predicting the future 3 steps was reduced by 23.0%, 18.1%, and 22.2%, respectively. This indicates that the feature selection method based on Spearman rank correlation analysis and GRA can select important environmental factors, reduce input dimensionality, and improve prediction accuracy.
(3) Common baseline models in existing research lose important information when sequences are too long, which is not conducive to predicting long time series. The self-attention mechanism is an efficient solution. Compared with BiGRU and BiLSTM, the MAE of BiGRU-Attention for predicting the future 3 steps decreased by 15.6% and 11.3%, respectively, illustrating that the attention mechanism can improve the utilization of past data, suppress the loss of useful information, and effectively improve model prediction performance.
(4) Point prediction outputs only a single value and thus provides relatively little information. Moreover, as the prediction horizon increases, point predictions inevitably fluctuate and incur larger errors, so interval prediction of RH is necessary. Compared with the commonly used PI construction methods NDE and Bootstrap, KDE-Gaussian has better interval construction performance, outputting reliable and narrow prediction intervals. This method can provide more useful information for producers' decisions and warnings.
(5) In terms of the overall prediction framework, the CEEMDAN-FS-BiGRU-Attention model proposed in this paper has the best point prediction performance: the MAE, RMSE, and MAPE for predicting the future 3 steps were reduced by 57.7%, 48.2%, and 56.6%, respectively, compared with LSTM. Moreover, the CEEMDAN-FS-BiGRU-Attention-KDE-Gaussian method forms the most appropriate prediction intervals at different confidence levels.
This study provides guidance for predicting and controlling RH and other environmental factors in livestock breeding from multiple environmental inputs and is of great significance for intelligent breeding. However, some limitations remain, including subjectivity in the feature selection process and the high time cost of parameter optimization. In future work, we will focus on more objective and effective feature selection methods and on heuristic optimization algorithms for initializing model parameters.