1. Introduction
China has the largest number of sheep and goats in the world and is the largest producer and consumer of meat sheep [1,2]. To meet the huge market demand for healthy meat sheep, transforming and upgrading meat sheep farming from the traditional free-range model to a modern, large-scale, intensive model is inevitable [3,4]. However, the air quality in sheep barns can easily deteriorate under large-scale, intensive farming, and when environmental regulation is not timely, it can threaten the normal growth and breeding of meat sheep by inducing disease outbreaks and even causing mass mortality [5,6].
Housing sheep in a good breeding environment is the basis for disease prevention and control, alongside the genetic and nutritional advantages of meat sheep [7]. Factors affecting the air quality of sheep housing mainly include temperature, humidity, wind speed, and harmful gases. The gases in sheep barns mainly include O2, CO2, NH3, and H2S, among which CO2 is the main greenhouse gas. The CO2 in sheep barns is mainly produced by respiration and fecal decomposition, and its emission is influenced by the growth stage, body weight, exercise habits, and ventilation rate of the sheep [8,9,10,11]. A CO2 mass concentration within the normal range is not harmful to the health of sheep. When the CO2 mass concentration in sheep sheds is too high, however, the oxygen content is relatively insufficient, and sheep living in such an environment for a long time suffer from chronic hypoxia, mental depression, loss of appetite, delayed weight gain, weakness, reduced production levels, stress, and susceptibility to infectious diseases, which seriously impairs their welfare [12,13]. Therefore, studying methods of predicting CO2 concentrations in the barn environments of large-scale meat sheep farms, so as to accurately grasp the trend of CO2 changes and precisely regulate air quality, is of great research value for reducing the impact of environmental stress on meat sheep growth and reproduction, preventing disease and epidemics, reducing stress, and safeguarding the welfare of sheep.
Research on predicting CO2 mass concentration with traditional machine learning methods has produced results for pig houses [14], composting environments [15], and building construction environments [16,17], as well as for urban carbon emissions [18,19], crop CO2 emissions [20], and ambient air pollution [21,22]. Although these prediction models can express the internal trends of CO2 changes in the environment and achieve reasonable prediction results, they require large amounts of valid data as experimental support, which creates a large and tedious workload. In addition, they can suffer from long training times, slow convergence, susceptibility to local optima, and poor generalization, which make it difficult to meet the requirements for the timely and accurate prediction and regulation of CO2 mass concentration in the barns of large-scale meat sheep farms [23,24,25].
In recent years, with the rapid development of artificial intelligence and deep learning technology, researchers have applied deep learning techniques to a wide range of real-world problems [26,27,28,29,30,31,32,33,34,35,36,37]. Deep learning techniques have been used in crop detection [28,29], data prediction for agricultural management processes [30,31,32], crop disease detection and classification [33,34], and animal behavior recognition [35,36,37].
CO2 mass concentrations in the barns of large-scale meat sheep farms can be collected online as nonlinear time series data, and the LSTM model, a typical deep learning method, can mine future data trends by extracting features from historical time series, which allows it to achieve good results in time series prediction tasks [38,39,40,41,42,43,44,45]. Wang et al. improved the LSTM model's prediction performance by adding an adaptive attention module, which allowed the model to obtain more critical information from time series data and accurately predict the remaining service life of lithium-ion batteries [38]. Lin et al. first used LSTM to obtain the long time series relationships in heart rate data, then used BiLSTM to obtain the forward and backward correlation information of the data, and finally combined these with an attention mechanism to accurately predict heart rate [39]. Wu et al. constructed a hybrid model combining LSTM and kinetic models to accurately predict drought occurrence [40]. Zhang et al. combined convolutional neural network (CNN) and LSTM models into a hybrid model (CNN-LSTM), and then combined it with the spatiotemporal characteristics of the soil temperature field (STF) to predict the outlet temperature of energy piles [41]. Wang et al. used several machine learning methods combined with LSTM models to achieve fast and accurate estimation of winter wheat yield over large areas from remote sensing data [42]. In summary, considering the potential of LSTM models for complex time series prediction tasks, in this paper we use an LSTM model to predict CO2 mass concentration in sheep barns.
To address the difficulty of predicting and regulating CO2 mass concentrations in the sheds of large-scale meat sheep farms in a timely and accurate manner, we propose a prediction method based on an RF-PSO-LSTM model. First, to handle possible packet loss, distortion, or singular values in the ambient air quality data collected online from sheep sheds, we repaired the problematic data using mean smoothing and linear interpolation and obtained a high-quality dataset. Second, because the collected parameters differ in units and magnitudes, and to facilitate the study of correlations in the ambient air quality data of meat sheep barns, we normalized the data with a standard scaler. Then, because the large variety of ambient air quality parameters, with possible redundancy or information overlap, would not only produce a complex prediction network structure but also lead to high computational complexity and low execution efficiency, we used the RF algorithm to screen and evaluate the features most important to ambient CO2 concentration, reducing the structural complexity and computational cost. In addition, because the predictions of LSTM models are sensitive to hyperparameters and manual hyperparameter setting is time-consuming and subjective, we used the PSO algorithm to find the optimal hyperparameter combination. Finally, we tested the effectiveness of the method by comparing its predictions with actually collected data.
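As an illustration of the preprocessing steps described above, the sketch below combines outlier removal, linear interpolation, mean smoothing, and standardization. The window size, the median-based outlier rule, and all names are our own illustrative assumptions rather than the exact procedure used in this paper:

```python
import numpy as np
import pandas as pd

def repair_and_normalize(series, window=5):
    """Repair gaps/outliers in a sensor series, then z-score normalize.

    Hypothetical helper illustrating the paper's preprocessing steps;
    the window size and the median-based outlier rule are assumptions.
    """
    s = pd.Series(series, dtype=float)
    # Flag singular values: points far from the median relative to the
    # median absolute deviation are treated as missing.
    dev = (s - s.median()).abs()
    s[dev > 10 * dev.median()] = np.nan
    # Linear interpolation fills dropped or distorted readings.
    s = s.interpolate(method="linear", limit_direction="both")
    # Mean smoothing with a centered rolling window.
    s = s.rolling(window, center=True, min_periods=1).mean()
    # Standardize so parameters with different units are comparable.
    return ((s - s.mean()) / s.std()).to_numpy()

# Toy CO2 readings with one dropped packet (NaN) and one singular value.
co2 = [1200.0, 1210.0, np.nan, 1230.0, 99999.0, 1250.0, 1260.0]
clean = repair_and_normalize(co2)
```

After repair, the series is gap-free and zero-mean, ready to feed into the correlation analysis and the prediction model.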
3. Results and Discussion
3.1. PSO Algorithm Parameter Setting
The parameters of the PSO algorithm in this paper were set as shown in Table 3. The LSTM model used the mean square error (MSE) loss function to calculate the loss value during training, and the optimizer was Adam. The number of neurons, dropout probability, and batch size of the LSTM model were obtained through the PSO algorithm's search for the best results.
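For illustration, the velocity-and-position update at the core of the PSO search can be sketched as follows. The inertia weight, acceleration coefficients, swarm size, and the surrogate objective below are placeholder assumptions of ours, not the settings in Table 3; in the actual method, the objective is the LSTM model's validation loss:

```python
import numpy as np

def pso_minimize(objective, bounds, n_particles=20, n_iter=60,
                 w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal global-best PSO sketch; hyperparameters are illustrative."""
    rng = np.random.default_rng(seed)
    lo, hi = np.array(bounds, dtype=float).T
    dim = len(bounds)
    pos = rng.uniform(lo, hi, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()
    pbest_val = np.array([objective(p) for p in pos])
    g = pbest[pbest_val.argmin()].copy()
    for _ in range(n_iter):
        r1, r2 = rng.random((2, n_particles, dim))
        # Velocity: inertia + pull toward personal best + global best.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (g - pos)
        pos = np.clip(pos + vel, lo, hi)
        val = np.array([objective(p) for p in pos])
        improved = val < pbest_val
        pbest[improved], pbest_val[improved] = pos[improved], val[improved]
        g = pbest[pbest_val.argmin()].copy()
    return g, pbest_val.min()

# Surrogate objective standing in for the LSTM validation loss over
# (neurons, dropout, batch size), with a known optimum for illustration.
best, loss = pso_minimize(
    lambda x: (x[0] - 64) ** 2 + (x[1] - 0.1) ** 2 + (x[2] - 32) ** 2,
    bounds=[(16, 256), (0.0, 0.5), (16, 128)])
```

In the real search, evaluating one particle means training an LSTM with that hyperparameter combination and scoring it on the validation set.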
3.2. Determination of LSTM Model Structure
When predicting a time series with an LSTM model, the structure of the model, consisting of input, hidden, and output layers, must first be determined. The input layer is the entry point for data into the model and the first part of the whole model involved in making predictions. The output layer outputs the model's prediction of CO2 mass concentration and is the final link in making predictions. LSTM models usually have only one input layer and one output layer, so the structural differences between LSTM models lie mainly in their hidden layers; therefore, we need to determine the number of hidden layers in the model.
With too few hidden layers, the model may not fully learn the relationships in the data, its fitting ability may be insufficient, and as a result it may not achieve the expected CO2 mass concentration prediction results. Too many hidden layers will lead to overfitting and poor generalization, and the model will have more parameters and a more complex structure. Therefore, the number of hidden layers must be set reasonably.
In this experiment, we varied the number of hidden layers from one to five and used the RMSE, MAE, R2, and the number of model parameters as the evaluation metrics, with the time step set to 20 by default. The test results are shown in Table 4.
From Table 4, we can see that the model with one hidden layer has the lowest number of parameters, while the model with five hidden layers has the highest.
With one hidden layer, the RMSE of the model was 123.959 μg·m−3, the MAE was 95.315 μg·m−3, the R2 was 0.978, and the parameter size was 32,251. In this case, the model may not be able to fully learn the complex action relationships in the data due to the low number of hidden layers.
With two hidden layers, the RMSE was 108.177 μg·m−3, the MAE was 83.187 μg·m−3, the R2 was 0.984, and the parameter size was 52,451; these metric values were the best among the models with one to five hidden layers.
Compared with the model with one hidden layer, the RMSE of the model with two hidden layers decreased by 15.782 μg·m−3, the MAE decreased by 12.128 μg·m−3, and the R2 increased by 0.006, showing that the model with two hidden layers could adequately learn the connections in the data and have fewer errors in predicting the CO2 mass concentration. Models with three to five hidden layers may have a relatively complex structure due to more parameters, which leads to larger errors in the prediction results.
The experiments show that the model with two hidden layers had a better prediction effect, less structural complexity, less computation, faster training, and faster running speed. Thus, the model structure with two hidden layers was chosen in this study.
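The parameter counts reported in Table 4 follow from the standard LSTM weight layout: each layer has four gates (input, forget, cell, output), and each gate has an input weight matrix, a recurrent weight matrix, and a bias vector. A small sketch of this bookkeeping; the layer sizes below are illustrative and do not reproduce the exact configurations behind Table 4:

```python
def lstm_params(input_dim, units):
    # Four gates, each with input weights (input_dim x units),
    # recurrent weights (units x units), and a bias (units).
    return 4 * (input_dim * units + units * units + units)

def stacked_lstm_params(input_dim, layer_units, output_dim=1):
    total, prev = 0, input_dim
    for units in layer_units:
        total += lstm_params(prev, units)
        prev = units
    # Final dense layer mapping the last hidden state to one output.
    return total + prev * output_dim + output_dim

# Example: 4 input features, two hidden layers of 128 and 32 units.
n = stacked_lstm_params(4, [128, 32])
```

This is why adding hidden layers grows the parameter count quadratically in the layer widths, and with it the structural complexity and training cost.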
3.3. Optimal Time Step
We used the LSTM model for time series prediction, which requires feature acquisition by a time step. The time step is a very important parameter for LSTM models because it determines the size of the feature composition structure and the amount of data required for the model during training, validation, and testing. The size of the time step directly affects the performance of model training and prediction; thus, we needed to set a reasonable value for this parameter to ensure good model performance.
In our experiments, we used the grid search method over time steps T ∈ {1, 20, 40, 60, 80, 100} [49,50,51]. We used the RMSE, MAE, and R2 as the evaluation indexes of the model to filter the optimal time step. The experimental results are shown in Table 5.
Table 5 shows the values of the performance metrics of the LSTM model for time steps T ∈ {1, 20, 40, 60, 80, 100}. Comparing T = 1 and T = 20, the model performs better with T = 20. The reason is that a time step of T = 1 produces a single feature data point, which does not constitute time series data over a time span and cannot express the relationship between consecutive feature data points.
When the value range of the time step is T ∈ {20, 40, 60}, the prediction error of the LSTM model gradually decreases and the performance gradually improves. The LSTM model with T = 60 performs best because the feature data produced by this time step can better represent the relationship between consecutive feature data points.
When the time step is T ∈ {60, 80, 100}, the overall prediction error of the LSTM model increases and the performance decreases as the time step grows. With a larger time step, fewer samples are available for training; thus, we believe the model is not sufficiently trained, which agrees with the conclusion reached in [51].
In summary, the prediction error of the LSTM model is the lowest and the performance is the best when time step T = 60. Therefore, in this paper, the time step of the model was chosen as T = 60, and subsequent experiments were conducted on this basis.
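With T = 60, each training sample is a window of 60 consecutive feature vectors whose target is the next CO2 reading. A minimal sketch of this windowing, with hypothetical names and random stand-in data:

```python
import numpy as np

def make_windows(features, target, t_step=60):
    """Slice a multivariate series into (samples, t_step, n_features)
    inputs and next-step targets, as an LSTM expects."""
    X, y = [], []
    for i in range(len(features) - t_step):
        X.append(features[i:i + t_step])   # 60 consecutive feature rows
        y.append(target[i + t_step])       # the reading that follows them
    return np.array(X), np.array(y)

# Stand-in data: 500 time points, 4 features per point.
feats = np.random.rand(500, 4)
co2 = np.random.rand(500)
X, y = make_windows(feats, co2, t_step=60)
```

Note that a larger T shrinks the number of usable samples (here 500 − 60 = 440), which is consistent with the observation that very large time steps leave the model insufficiently trained.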
3.4. Feature Importance Ranking and Filtering
We collected a total of nine categories of environmental quality parameters in the sheep barn using the IoT. CO2 mass concentration is influenced by a variety of parameters, some of which show a strong correlation with it; these parameters are called important features.
To filter the important features, we used the RF algorithm to calculate the degree of importance of the eight candidate parameters and rank them in the following order: light intensity, air relative humidity, air temperature, PM2.5 mass concentration, PM10 mass concentration, noise, TSP mass concentration, and H2S mass concentration; the scores are shown in Table 6.
To verify the effectiveness of the RF ranking, we input different numbers of parameters into the model in order of their ranking, and obtained the MAE variation curve shown in Figure 5.
As seen in Figure 5, with one feature parameter, although the input dimension of the model is the smallest, the model fits poorly and the MAE is the largest. With three feature parameters, although the input dimension is still small, the model cannot learn adequately, the fitting effect is average, and the MAE can be reduced further.
With four feature parameters, the MAE of the model is further reduced, the model fits better, and the input dimension is smaller. With five to eight feature parameters, the MAE is not much different from that of the model with four feature parameters, and the fitting effect is similar, but the input dimension increases.
In summary, with four feature parameters, the model can learn adequately, the mean absolute error is relatively low, the fitting effect is satisfactory, and the input dimension is reasonable.
In order to reduce the input dimension, optimize the network structure, and reduce the computational complexity of the model, we selected the top four parameters (light intensity, air relative humidity, air temperature, and PM2.5 mass concentration) as the prediction model inputs.
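The importance scores in Table 6 come from the RF algorithm's impurity-based feature ranking. The sketch below illustrates the same procedure on simulated data; the coefficients, sample size, and the fact that only four features drive the target are our assumptions, not the barn dataset:

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
# Simulated stand-in: 8 candidate parameters, of which only the first
# four actually influence the target (mimicking the finding that four
# features suffice).
X = rng.random((400, 8))
y = (3 * X[:, 0] + 2 * X[:, 1] + X[:, 2] + 0.5 * X[:, 3]
     + 0.05 * rng.standard_normal(400))

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)
importance = rf.feature_importances_          # scores sum to 1
ranking = np.argsort(importance)[::-1]        # most important first
```

Selecting the top-ranked columns of `X` as model inputs then reduces the input dimension exactly as done here with the four barn parameters.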
3.5. PSO Results for Hyperparameter Search
After determining the LSTM model structure, optimal step size, and important features, we used the PSO algorithm to find the optimal number of neurons, dropout probability, and batch size for the LSTM model. The results of the PSO algorithm were as follows: 64 neurons in the input layer, 128 neurons in hidden layer 1, 32 neurons in hidden layer 2, a dropout probability of 0.1, and a batch size of 32. We then trained the LSTM model with these hyperparameters, and the training loss value changes are shown in Figure 6.
Figure 6 shows that the initial values for the training loss and validation loss of the model were 0.0213 and 0.0057, respectively. Although the initial loss of the model was high, the value rapidly decreased as the training proceeded, because the model updates the weights during the backpropagation process, gradually improving the fitting ability.
When the network training exceeded 260 epochs, the training loss value gradually stabilized between 0.0008 and 0.0009, and the validation loss value gradually stabilized between 0.0006 and 0.0007. The convergence of the loss values of the overall model shows only slight oscillations, indicating the completion of network model training.
The final RF-PSO-LSTM model in this paper was obtained after training was completed; Figure 7 shows the model's prediction of CO2 mass concentration in a sheep barn. As can be seen in Figure 7a, the overall trend of our model's predicted CO2 mass concentration was similar to the actual CO2 mass concentration. Our model predicted the peak at the same point at which the sheep house CO2 mass concentration reached its peak. This demonstrates the ability of our proposed model to act as an early warning when the CO2 mass concentration in a sheep barn reaches a certain level, safeguarding the welfare of meat sheep to some extent.
Figure 7b further shows the difference between the CO2 mass concentration predicted by our model and the actual CO2 mass concentration in the sheep barn. Although the predicted concentration is very close to the actual concentration, it is not as smooth as the actual values. We attribute this to the fact that the environmental parameters of the sheep barn change slowly and interact slowly, so the actual changes in CO2 mass concentration in the barn are relatively smooth.
3.6. Comparative Analysis of Hyperparameter Predictions
In order to verify the effectiveness of the PSO algorithm's hyperparameter search for the LSTM model, we set different hyperparameters for the LSTM model in comparison tests. The model evaluation metrics were the RMSE, MAE, and R2, and the experimental results are shown in Table 7.
To determine the effectiveness of the PSO algorithm for the batch size hyperparameter search of the LSTM model, we only changed the batch size of the model, and set the RF-LSTM_1 model to have a batch size of 64 and the RF-LSTM_2 model to have a batch size of 128. It can be seen from the table that the RF-PSO-LSTM model with a batch size of 32 has the lowest RMSE and MAE values, indicating that this model has the least error in predicting the CO2 mass concentration in the sheep shed.
To determine the effectiveness of the PSO algorithm for the dropout hyperparameter search of the LSTM model, we changed only the magnitude of the dropout value and set the RF-LSTM_3 model with a dropout value of 0.2 and the RF-LSTM_4 model with a dropout value of 0.3. The table shows that as the dropout value increased, the RF-LSTM_4 model had a larger prediction error than the RF-LSTM_3 model in the prediction of CO2 mass concentration in the sheep shed. The RF-PSO-LSTM model with a dropout value of 0.1 predicted the CO2 mass concentration with fewer errors and better results.
To determine the effectiveness of the PSO algorithm for optimizing the numbers of neurons in the LSTM model, we experimented by changing only the number of neurons in the input layer, hidden layer 1, and hidden layer 2. Specifically, we set up the RF-LSTM_5–11 models for comparison with the RF-PSO-LSTM model proposed in this paper. The test indicators in the table show that more neurons are not necessarily better, nor are fewer; the number of neurons in each layer must be configured reasonably to maximize the performance of the model.
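The RMSE, MAE, and R2 used throughout these comparisons have standard definitions; for reference, a minimal implementation with toy values (not the paper's data):

```python
import numpy as np

def rmse(y, p):
    # Root mean square error: penalizes large deviations.
    return float(np.sqrt(np.mean((y - p) ** 2)))

def mae(y, p):
    # Mean absolute error: average magnitude of the errors.
    return float(np.mean(np.abs(y - p)))

def r2(y, p):
    # Coefficient of determination: 1 minus residual/total variance.
    ss_res = np.sum((y - p) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1 - ss_res / ss_tot)

# Toy actual vs. predicted concentrations for illustration only.
y_true = np.array([1500.0, 1600.0, 1700.0, 1800.0])
y_pred = np.array([1510.0, 1590.0, 1720.0, 1790.0])
```

Lower RMSE and MAE indicate smaller prediction errors, while an R2 closer to 1 indicates a more reliable fit, which is how the models in Tables 7 and 8 are compared.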
3.7. Comparative Analysis of Model Predictions
In order to verify the difference between our proposed RF-PSO-LSTM model and other models in predicting the CO2 mass concentration in sheep sheds, we carried out an experimental analysis using the gradient boosting regression tree (GBRT), light gradient-boosting machine (LightGBM), support vector regression (SVR), and random forest regression (RFR) models. The results are shown in Table 8.
The RFR, SVR, GBRT, and LightGBM models were obtained by training with all the features. The RF-RFR, RF-SVR, RF-GBRT, and RF-LightGBM models were obtained by training with the four features filtered by the RF algorithm.
It can be seen from Table 8 that, compared with the RFR model, the RMSE of the RF-RFR model increased by 4.471 μg·m−3, the MAE decreased by 6.861 μg·m−3, and the R2 decreased by 0.002. Compared with the SVR model, the RMSE of the RF-SVR model decreased by 43.488 μg·m−3, the MAE decreased by 43.174 μg·m−3, and the R2 increased by 0.065. Compared with the GBRT model, the RMSE of the RF-GBRT model decreased by 4.475 μg·m−3, the MAE decreased by 2.176 μg·m−3, and the R2 increased by 0.004. Compared with the LightGBM model, the RMSE of the RF-LightGBM model decreased by 8.332 μg·m−3, the MAE decreased by 7.127 μg·m−3, and the R2 increased by 0.006.
We found that models that first use the RF algorithm to filter the important features and then train on the filtered features generally have lower MAE values and higher R2 values than models trained on all features. Low RMSE and MAE values indicate a small error in predicting CO2 mass concentration, and a high R2 value indicates high reliability in predicting CO2 mass concentration.
Among the compared models, the RF-RFR model predicted an RMSE of 220.844 μg·m−3, an MAE of 138.994 μg·m−3, and an R2 of 0.937 for the CO2 mass concentration in sheep sheds, which are the best predicted results.
The differences between our proposed model and the RF-RFR model can be seen in the table. Specifically, compared with RF-RFR, the RMSE of our model decreased by 145.422 μg·m−3, the MAE decreased by 87.155 μg·m−3, and the R2 increased by 0.055. Our proposed model had a better performance than the other models in predicting the CO2 mass concentration in sheep barns.
In summary, the RF-PSO-LSTM prediction model has higher accuracy and a better fit, and is well suited to single time series prediction with good real-time performance. Our model can be used for predicting CO2 mass concentrations in the barns of large-scale meat sheep farms, providing a strong decision basis for early warning while improving the welfare of sheep.