1. Introduction
According to statistics, energy costs account for 20% to 30% of the total production costs in the industrialized production of edible mushrooms [
1]. Among those costs, the air conditioning systems used to regulate the temperature in mushroom houses account for approximately 40% of the total energy cost [
2]. It is expected that with the increasing proportion of industrialized production in the edible mushroom industry in China [
3], energy consumption will continue to rise. Energy-efficient, sustainable mushroom house buildings are therefore the direction of industry development. Researchers have proposed improvements such as energy-saving and environmentally friendly enclosure structures, green energy utilization, and high-efficiency air conditioning equipment. Among these, intelligent control is one of the most valuable and widely applicable sustainable measures for reducing energy consumption [
4]. However, uncertainties, such as weather and human factors, result in significant randomness in mushroom house energy consumption. Therefore, accurate load forecasting for mushroom houses is crucial for decision making and control of air conditioning systems.
The air conditioning load forecasting methods can be mainly divided into white-box models, grey-box models, and black-box models based on their forecasting principles [
5]. White-box models are difficult to establish because accurate modeling is challenging: they involve a large number of parameters, such as the thermal capacitance and thermal conductivity of various components, and the thermal parameters of the mushroom substrate must also be considered, which are difficult to measure precisely. Simple grey-box models such as the RC model are widely used in model predictive control (MPC). Their advantages are that their parameters can be identified with limited data and that, thanks to the constraints imposed by physical laws, they are more reliable than black-box models. However, as the available data increase, these models face performance limitations [
6]. Research has shown that MPC methods using black-box models can achieve energy savings of 8.4%, higher than the energy savings of 7.4% for white-box models and 7.2% for grey-box models [
7].
With the development of Internet of Things (IoT) technology, the cost of acquiring a large amount of real-time data has been significantly reduced. In this context, data-driven prediction methods have gained widespread attention from scholars domestically and abroad. Data-driven prediction models can be divided into statistical models represented by autoregressive (AR) models [
8], autoregressive integrated moving average (ARIMA) models [
9], and machine learning models represented by support vector machines (SVMs) [
10], artificial neural networks (ANNs) [
11], random forests (RFs) [
12], support vector regression (SVR) [
13], and deep learning models represented by long short-term memory (LSTM), gated recurrent units (GRUs), bi-directional LSTM (BiLSTM), etc.
In machine learning, Wang et al. [
14] combined ANN and ensemble methods for the short-term prediction of building loads. Compared with a single ANN model, the prediction error was reduced by 24%. Yong et al. [
15] used the RF method for building energy consumption prediction; the RF model improved the coefficient of determination R² by 10% and 6% compared with the backpropagation neural network (BPNN) and SVM, respectively. Ahmad et al. [
16] predicted the short-term, medium-term, and long-term energy consumption of building environments using a binary decision tree (BDT), compact regression Gaussian process model (CRGPM), and other methods. Dai et al. [
17] improved SVM-based prediction by tuning its parameters with an improved particle swarm optimization (PSO) algorithm. The results showed that the mean absolute percentage error (MAPE) of the improved PSO-optimized SVM was 0.0412%, better than those of minimal redundancy maximal relevance–genetic algorithm–SVM (mRMR-GA-SVM), BPNN, and mRMR-BPNN, at 0.0493%, 0.0447%, and 0.0438%, respectively. This indicates that tuning a prediction model's parameters with optimization algorithms can significantly improve its accuracy.
In deep learning, Olu-Ajayi et al. [
18] compared nine algorithms, including RF, SVM, and linear regression, for predicting annual average energy consumption. They found that the deep neural network (DNN) achieved an R² of 0.95 and a root mean square error (RMSE) of 1.16 kWh/m², outperforming the other models. Zhou et al. [
19] predicted air conditioning electricity consumption using LSTM and BPNN. The results showed that compared to BPNN, LSTM reduced the MAPE of daily electricity consumption by 49% and the MAPE of hourly electricity consumption by 36.61%. Bohara et al. [
20] applied BiLSTM for short-term load forecasting in residential buildings. The results showed that compared to the LSTM model, BiLSTM reduced the RMSE by 5.06%. As an improved network of LSTM, BiLSTM can handle more complex long-term sequences, extract information from both directions in the sequence, and better capture bidirectional temporal features.
To further improve the prediction accuracy, some researchers have proposed combining different algorithm models [
21]. For example, Song et al. [
22] proposed a combination model based on a hybrid convolutional neural network (CNN) and LSTM for predicting hourly loads in heating substations. In this model, CNN effectively extracted spatial feature matrices of the load and influencing factors, while LSTM captured the temporal features of the load. The results showed that compared to SVM, random forest regression (RFR), multilayer perceptron (MLP), and gradient-boosting regression (GBR), the RMSE of load prediction at the heating substations decreased by 0.235 GJ, 0.244 GJ, 0.237 GJ, and 0.236 GJ, respectively. The attention mechanism, which simulates the human brain’s resource allocation, has been widely applied in time-series forecasting due to its ability to enhance the weights of key influencing factors. Chitalia et al. [
23] studied attention-based recurrent neural network (RNN) models in buildings such as laboratories, offices, and schools. The model's predictive performance improved by 45% compared to ARIMA, RNN, and CNN. Wan et al. [
24] combined a CNN, LSTM, and an attention mechanism for power prediction in two units. Compared with the LSTM and CNN-LSTM models, the CNN-LSTM-attention model reduced the prediction error by 1.815 MW and 1.57 MW for one unit and by 0.066 MW and 0.026 MW for the other, outperforming both the single model and the model without an attention mechanism. He et al. [
25] proposed a hybrid prediction model combining CNN, GRU, and an attention mechanism. The results showed that, compared to LSTM, the RMSE of the predictions decreased by 1.38% and the R² improved by 1.49%.
Combination models can effectively improve the prediction accuracy compared to single models. However, relying solely on the performance of algorithms has limited effectiveness in improving prediction accuracy. Load data show characteristics, such as volatility, dynamics, and complexity, including trends, seasonality, noise, etc. [
26]. This is particularly true for very-short-term load forecasting, i.e., load forecasting within one hour [
27]. Very-short-term load forecasting (VSTLF) is beneficial for optimizing the operation of energy systems [
28]. The speed and accuracy of VSTLF results determine the performance of MPC for energy-saving control, and they are the prerequisite for mining energy-saving potential [
29]. However, traditional prediction models struggle to accurately capture the inherent features of the raw data.
To solve such problems, various decomposition methods for time-series data have been proposed [
30], such as wavelet transform (WT), empirical mode decomposition (EMD), ensemble empirical mode decomposition (EEMD), etc. Wang et al. [
31] used WT to decompose photovoltaic power data and established deep convolutional neural networks (DCNNs) for each item of decomposed data, combined with quantile regression (QR) for prediction. The reconstructed results outperformed BPNN, SVM, etc., with an average improvement of 52.25% in RMSE and 56.64% in mean absolute error (MAE) compared to other models. Gao et al. [
32] used EMD to decompose the original load data, selected decomposition sequences highly correlated with the original load through Pearson correlation analysis, and combined them with the original data sequence as inputs to the GRU network for short-term load forecasting. Compared to single models, such as GRU, SVR, RF, as well as EMD-GRU, EMD-SVR, and EMD-RF models, the proposed method achieved the best performance in terms of RMSE (1484.17 kWh) and MAPE (3.08%) for July forecasting in the M1 dataset. Mounir et al. [
33] proposed a method that combines EMD and BiLSTM to achieve more accurate load forecasting. By decomposing the information on multiple time scales using EMD and capturing short-term and long-term information in each component using BiLSTM, the EMD-BiLSTM model reduced the MAE by 0.07 and the MAPE by 0.14% compared to the BiLSTM model. The introduction of EMD effectively captures complex temporal and spectral features in the data, improving the prediction accuracy. However, EMD decomposition may suffer from mode mixing and endpoint effects. To address this, He et al. [
34] proposed a short-term wind power prediction method based on EEMD and the least absolute shrinkage and selection operator–quantile regression neural network (LASSO-QRNN). Compared to LASSO-QRNN without EEMD, it achieved a 60.96% reduction in MAE. However, EEMD, which suppresses mode mixing by adding white noise, introduces reconstruction errors that affect the prediction accuracy, and its repeated noise additions and EMD decompositions increase computational complexity and reduce efficiency. Moreover, none of the above methods analyze the characteristics of each decomposed component in order to select the most suitable prediction method for it.
Empirical wavelet transform (EWT), proposed by Gilles in 2013 [
35], is an adaptive method that can select frequency bands, overcome mode mixing, and reduce computational complexity. It has promising applications. The accuracy of the prediction model can be effectively improved by analyzing the different characteristics of the decomposed load data and establishing the most suitable prediction model based on these characteristics [
36,
37].
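As an illustration of band-wise decomposition, the sketch below splits a signal into components using ideal FFT-domain masks. This is a simplified stand-in for EWT: the real method detects band boundaries adaptively from the spectrum and uses smooth Meyer-type filters, so the fixed boundary used here is an assumption for illustration only.

```python
import numpy as np

def bandpass_components(x, boundaries):
    """Split signal x into frequency-band components via ideal FFT masks.

    boundaries: normalized frequencies in (0, 0.5) separating the bands.
    A simplified stand-in for EWT, which detects boundaries adaptively
    and uses smooth Meyer-type filters instead of ideal masks."""
    n = len(x)
    freqs = np.abs(np.fft.fftfreq(n))          # normalized |frequency| per bin
    edges = [0.0, *boundaries, 0.5 + 1e-12]    # band edges from DC up to Nyquist
    X = np.fft.fft(x)
    comps = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (freqs >= lo) & (freqs < hi)    # keep only bins in this band
        comps.append(np.fft.ifft(X * mask).real)
    return comps

# usage: a slow and a fast sinusoid are separated by one boundary at 0.1
t = np.arange(256)
x = np.sin(2 * np.pi * 8 / 256 * t) + 0.3 * np.sin(2 * np.pi * 64 / 256 * t)
low, high = bandpass_components(x, boundaries=[0.1])
```

Because the masks partition the spectrum, the components sum back exactly to the original signal, mirroring the reconstruction property exploited after decomposition-based forecasting.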
In summary, deep learning neural network methods have been widely studied and have shown good application results in commercial buildings and power distribution. However, mushroom cultivation facilities in edible fungus factories differ from commercial buildings: the density of the mushroom substrate inside is high, the indoor thermal hysteresis is significant, and the indoor target temperature must be adjusted in synchrony with the growth cycle of the mushrooms. This article proposes a prediction method, based on EWT decomposition, for the rapidly changing cooling and heating loads of mushroom cultivation facilities. Firstly, the EWT method is used to decompose the load into multiple intrinsic mode functions. Then, the Lempel–Ziv algorithm is used to classify the modal components, which are reconstructed into high-frequency and low-frequency parts. Finally, the CNN-BiLSTM-attention model and the ARIMA model are used to predict the high-frequency and low-frequency parts, respectively. The proposed method is validated and compared with commonly used prediction methods using actual data from a mushroom factory, demonstrating its effectiveness in load prediction.
2. Materials and Methods
2.1. Data Source
The experimental site was located in a factory producing the edible fungus Hypsizygus marmoreus in Tongzhou District, Beijing. The length, width, and height of the experimental mushroom house were 14 m, 8 m, and 5 m, respectively. The floor of the mushroom house was hardened concrete, and the surrounding walls and roof were polyurethane sandwich color-steel panels approximately 100 mm thick. The mushroom house was equipped with control equipment such as fixed-frequency air conditioners, fresh air fans, exhaust fans, and humidifiers.
In order to obtain the data relevant to the mushroom house load, the air temperature and humidity inside and outside the mushroom house were continuously measured using HOBO U23-001A loggers (Onset Computer Corporation, Bourne, MA, USA) (accuracy: ±0.2 °C, ±2.5%). The working status of the air conditioner was monitored via a HOBO CTV-C current collector (Onset Computer Corporation) (accuracy: ±5 A), with the sampling interval set to 1 min. The fresh air velocity was measured using a Delta HD2903T wind speed sensor (Delta OHM S.r.l., Padua, Italy) (accuracy: ±3%); the CO2 concentration was measured using a Vaisala GMP252 sensor (Vaisala Oyj, Vantaa, Finland); the solar radiation was measured using a Kipp & Zonen SMP3 sensor (Royal Kipp & Zonen B.V., Amsterdam, The Netherlands); and the wall heat flux was measured using a Hukseflux HFP01 heat flux sensor (Hukseflux Thermal Sensors B.V., Amersfoort, The Netherlands), connected to a Campbell CR1000X data logger (Campbell Scientific, Inc., Logan, UT, USA) for continuous recording at a 1 min sampling interval.
The deployment of the mushroom house measuring equipment is shown in
Figure 1. The temperature and humidity sensors were arranged evenly in the middle of the mushroom house, with five measurement points in the east–west direction and two layers at equal distances in the vertical direction of the cultivation shelves, for a total of ten sensors. The wind speed sensor was installed at the air inlet of the fresh air pipeline; the CO2 sensor was placed in the center of the house; and the outdoor air temperature, humidity, and solar radiation sensors were attached to a pole 5 m from the mushroom house at a height of 2 m. Data were collected from 11 July 2022 to 15 September 2022, totaling 729,300 data points.
2.2. Data Processing
Data processing includes three parts: outlier handling, feature filtering, and normalization, as shown in
Figure 2.
The acquired data may contain missing or abnormal values caused by the equipment itself or by transmission, which affects the accuracy of the prediction model. Missing data are filled by linear interpolation of the data before and after the missing position, as shown in Equation (1); abnormal data are smoothed and filtered using the mean method, as shown in Equation (2):

$$x_t = \frac{x_{t-1} + x_{t+1}}{2} \quad (1)$$

where $x_t$ is the missing data at time $t$; $x_{t-1}$ is the original data at time $t-1$; and $x_{t+1}$ is the original data at time $t+1$:

$$\tilde{x}_t = \frac{x_{t-1} + x_{t+1}}{2} \quad (2)$$

where $\tilde{x}_t$ is the abnormal data after smoothing, and $x_{t-1}$, $x_{t+1}$ are the adjacent valid data.
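The gap-filling and smoothing steps can be sketched as follows. Treating each gap or outlier as isolated, filling it from the mean of its two neighbors, and the deviation threshold used to flag outliers are assumptions of this sketch, not details stated in the text.

```python
import numpy as np

def fill_missing(x):
    """Fill isolated NaN gaps with the mean of the neighboring samples
    (linear interpolation between the front and rear valid values)."""
    x = x.astype(float).copy()
    for i in np.flatnonzero(np.isnan(x)):
        x[i] = (x[i - 1] + x[i + 1]) / 2.0
    return x

def smooth_outliers(x, threshold):
    """Replace samples that jump away from BOTH neighbors by more than
    `threshold` with the neighbor mean (a simple mean filter; the
    threshold value is an illustrative assumption)."""
    x = x.astype(float).copy()
    for i in range(1, len(x) - 1):
        if abs(x[i] - x[i - 1]) > threshold and abs(x[i] - x[i + 1]) > threshold:
            x[i] = (x[i - 1] + x[i + 1]) / 2.0
    return x

# usage: one missing value and one spike in a temperature-like series
series = np.array([20.0, 20.2, np.nan, 20.6, 35.0, 20.8, 21.0])
series = fill_missing(series)        # NaN -> (20.2 + 20.6) / 2 = 20.4
series = smooth_outliers(series, 5)  # 35.0 -> (20.6 + 20.8) / 2 = 20.7
```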
In order to improve the efficiency of model prediction, the Boruta algorithm is used to select the key input features and construct a feature set for model training. The Boruta algorithm evaluates features against the binomial distribution of selection probability over multiple iterations, as shown in
Figure 3.
The red area represents the rejection region, where features falling within this area are eliminated during iteration. The blue area indicates the uncertain region, where features falling within this area remain pending throughout the iterative process and require further determination based on the selection threshold and feature importance. The green area represents the acceptance area, where features falling within this area are directly retained.
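The shadow-feature idea behind Boruta can be illustrated with a deliberately simplified sketch. Real Boruta derives importances from a random forest (e.g., via the BorutaPy package); here, absolute Pearson correlation with the target stands in for that importance, and the 1.96-sigma acceptance threshold mimics the binomial decision region described above. All of these substitutions are assumptions of the sketch.

```python
import numpy as np

def boruta_like_select(X, y, n_iter=50, seed=0):
    """Toy Boruta-style selection: a feature is accepted if its importance
    beats the best shuffled 'shadow' feature significantly more often than
    chance. |corr(feature, y)| is a stand-in for the random-forest
    importance used by the real Boruta algorithm."""
    rng = np.random.default_rng(seed)
    n_feat = X.shape[1]

    def importance(M):
        # |Pearson correlation| with the target as a stand-in importance score
        return np.array([abs(np.corrcoef(M[:, j], y)[0, 1])
                         for j in range(M.shape[1])])

    real_imp = importance(X)
    hits = np.zeros(n_feat, dtype=int)
    for _ in range(n_iter):
        # shuffling each column destroys any real relationship with y
        shadow_imp = importance(rng.permuted(X, axis=0))
        hits += real_imp > shadow_imp.max()   # beat every shadow this round?
    # acceptance region: hit count above the upper binomial tail (normal approx.)
    threshold = 0.5 * n_iter + 1.96 * np.sqrt(0.25 * n_iter)
    return np.flatnonzero(hits > threshold)

# usage: one informative feature (index 2) among three noise features
rng = np.random.default_rng(1)
X = rng.normal(size=(400, 4))
y = 3.0 * X[:, 2] + 0.1 * rng.normal(size=400)
selected = boruta_like_select(X, y)
```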
In order to eliminate the influence of the differing dimensions of the mushroom house load parameters on the training of the neural network prediction model, all factors are normalized, as shown in Equation (3):

$$x^{*} = \frac{x - x_{\min}}{x_{\max} - x_{\min}} \quad (3)$$

where $x_{\max}$ is the maximum value; $x_{\min}$ is the minimum value; and $x^{*}$ is the normalized value.
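Min-max normalization and its inverse (needed later to de-normalize the network's predictions) can be sketched as follows; the function names are illustrative.

```python
import numpy as np

def minmax_scale(x):
    """Min-max normalization to [0, 1]; also returns (min, max) so that
    predictions can be inverse-normalized afterwards."""
    x_min, x_max = x.min(), x.max()
    return (x - x_min) / (x_max - x_min), x_min, x_max

def minmax_inverse(x_scaled, x_min, x_max):
    """Undo min-max normalization."""
    return x_scaled * (x_max - x_min) + x_min

load = np.array([10.0, 15.0, 20.0, 30.0])
scaled, lo, hi = minmax_scale(load)   # -> [0.0, 0.25, 0.5, 1.0]
```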
2.3. Model Experimental Environment
The computer configuration of the experimental platform is as follows: equipped with AMD Ryzen (TM) 7 5800H CPU (Advanced Micro Devices, Inc., Santa Clara, CA, USA), NVIDIA GeForce GTX 1650 4G GPU (NVIDIA Corporation, Santa Clara, CA, USA), 16 GB of memory, and 64-bit Windows 10 operating system. The software uses Keras-2.9 as a deep learning tool, Tensorflow-gpu-2.3 as a deep learning framework, the programming language is Python, the Python version is 3.7, and the integrated development environment is Visual Studio Code.
2.4. Model Prediction Evaluation Indicators
In order to quantitatively evaluate the accuracy of the model's predictions, this paper uses RMSE, MAE, MAPE, and R² as evaluation indicators. Their calculation methods are shown in Equations (4)–(7):

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2} \quad (4)$$

$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right| \quad (5)$$

$$\mathrm{MAPE} = \frac{100\%}{n}\sum_{i=1}^{n}\left|\frac{y_i - \hat{y}_i}{y_i}\right| \quad (6)$$

$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2} \quad (7)$$

where $y_i$ is the actual value; $\hat{y}_i$ is the predicted value; $\bar{y}$ is the mean of the actual values; and $n$ is the total amount of data.
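The four evaluation indicators are standard and can be written directly in numpy:

```python
import numpy as np

def rmse(y, y_hat):
    """Root mean square error."""
    return float(np.sqrt(np.mean((y - y_hat) ** 2)))

def mae(y, y_hat):
    """Mean absolute error."""
    return float(np.mean(np.abs(y - y_hat)))

def mape(y, y_hat):
    """Mean absolute percentage error, in percent; assumes y has no zeros."""
    return float(np.mean(np.abs((y - y_hat) / y)) * 100.0)

def r2(y, y_hat):
    """Coefficient of determination."""
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - np.mean(y)) ** 2)
    return float(1.0 - ss_res / ss_tot)

y_true = np.array([10.0, 20.0, 30.0, 40.0])
y_pred = np.array([12.0, 18.0, 33.0, 41.0])
```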
2.5. Process Design
The model prediction process proposed in this article is shown in
Figure 4; the specific steps are as follows:
Exception data processing: Missing and abnormal values in the collected data were processed using interpolation and smoothing filter methods, respectively, to obtain a complete dataset;
Load data decomposition: The EWT method was used to decompose the load of the mushroom house into modal components of different scales. The Lempel–Ziv method was then applied to classify the modal components into high-frequency and low-frequency categories, reconstructing all components into high-frequency and low-frequency feature components;
High-frequency feature selection: The Boruta algorithm was used to determine the input features of the neural network model. An input feature set was constructed, and the dataset was divided into training, validation, and testing sets in a ratio of 7:2:1;
Predictive model construction: For the high-frequency feature components, the dataset from Step 3 was first normalized, and the CNN-BiLSTM-attention model was then used for prediction; the high-frequency prediction results were obtained after inverse normalization. For the low-frequency feature components, an ARIMA prediction model was established to compensate for the neural network's insensitivity to linear features. Finally, the results of the high-frequency and low-frequency prediction models were combined and reconstructed to obtain the final result;
Model prediction: Based on the test data and the prediction values from the model, model error metrics were obtained to evaluate the model’s prediction performance.
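The Lempel–Ziv classification in Step 2 can be illustrated with the classic LZ76 complexity measure: an irregular (high-frequency) component yields many distinct phrases, a smooth (low-frequency) one few. Binarizing each component about its median before counting is a common convention and an assumption of this sketch, since the text does not specify these details.

```python
import numpy as np

def lempel_ziv_complexity(bits):
    """LZ76 complexity: the number of distinct phrases found while scanning
    the binary sequence left to right (Lempel & Ziv, 1976)."""
    s = "".join(map(str, bits))
    phrases, i = 0, 0
    while i < len(s):
        length = 1
        # grow the phrase until it has not appeared in the prefix before it
        while i + length <= len(s) and s[i:i + length] in s[:i + length - 1]:
            length += 1
        phrases += 1
        i += length
    return phrases

def binarize(x):
    """Binarize a series about its median before computing LZ complexity."""
    return (np.asarray(x) > np.median(x)).astype(int)

# usage: a smooth sinusoid versus an irregular random component
rng = np.random.default_rng(0)
t = np.arange(512)
smooth = np.sin(2 * np.pi * t / 128)   # candidate low-frequency component
noisy = rng.normal(size=512)           # candidate high-frequency component
c_low = lempel_ziv_complexity(binarize(smooth))
c_high = lempel_ziv_complexity(binarize(noisy))
```

Components whose complexity falls above a chosen threshold would be grouped into the high-frequency part, the rest into the low-frequency part.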
2.6. Predictive Model Structure
The overall structure of the hybrid ARIMA and CNN-BiLSTM-attention model proposed in this paper is shown in
Figure 5.
Input layer: For the high-frequency part of the load, the input features are filtered using the Boruta method to establish the key feature vectors; the previous 4 h of load data are then used to predict the load change over the next 10 min. Input features with a data dimension of (24, 7) are constructed and fed into the CNN layer. The low-frequency part of the load is input directly into the ARIMA model.
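The (24, 7) samples described above can be built with a simple sliding window: 24 past 10-min steps (4 h) of 7 features, labeled with the load one step (10 min) ahead. The array names below are illustrative.

```python
import numpy as np

def make_windows(features, target, window=24):
    """Build supervised samples: each input is `window` consecutive rows of
    the feature matrix (24 steps of 10 min = 4 h, 7 features), and the label
    is the target value one step (10 min) ahead of the window."""
    X, y = [], []
    for i in range(len(features) - window):
        X.append(features[i:i + window])
        y.append(target[i + window])
    return np.array(X), np.array(y)

# usage: 100 ten-minute records with 7 input features
data = np.random.rand(100, 7)
load = np.random.rand(100)
X, y = make_windows(data, load)   # X: (76, 24, 7), y: (76,)
```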
CNN layer: The input feature matrix information is extracted by two CNN layers. After the first convolutional layer, the data dimension becomes (24, 7, 15); after the pooling layer, it is transformed into (24, 3, 15) and input into the second convolutional layer, where it becomes (24, 3, 1). A Squeeze layer then reduces the data dimension to (24, 3) before input into the BiLSTM layer. All activation functions in this process are ReLU.
BiLSTM layer: A single BiLSTM layer is used, with Tanh as the activation function. After the BiLSTM layer, the data dimension becomes (24, 128).
Attention layer: Since the various influencing factors affect the load to different degrees over the time series, an attention mechanism is introduced; the calculation process is shown in Equations (8) and (9):

$$F(Q, K) = Q K^{T} \quad (8)$$

$$\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\left(F(Q, K)\right) V \quad (9)$$

where K, V, and Q represent the Key, Value, and Query, all with dimension (24, 128); F(Q, K) represents the similarity between the Query and Key, calculated as the dot product of Q and K [
38].
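Dot-product attention of this kind can be sketched in a few lines of numpy. The 1/√d scaling added below is the common stabilization from scaled dot-product attention and is an assumption here; the (24, 128) shapes follow the text.

```python
import numpy as np

def attention(Q, K, V):
    """Dot-product attention: weights = softmax(Q K^T / sqrt(d)), output =
    weights @ V. The sqrt(d) scaling is a common stabilization choice."""
    d = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d)
    scores = np.exp(scores - scores.max(axis=-1, keepdims=True))  # stable softmax
    weights = scores / scores.sum(axis=-1, keepdims=True)
    return weights @ V, weights

# usage with the (24, 128) dimensions from the model description
Q = np.random.rand(24, 128)
K = np.random.rand(24, 128)
V = np.random.rand(24, 128)
out, w = attention(Q, K, V)   # out: (24, 128); each row of w sums to 1
```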
ARIMA layer: The ADF unit root test is used to check whether the low-frequency load component is non-stationary. After one differencing transformation, the sequence becomes stationary and is input into the ARIMA layer; p, d, and q are 2, 1, and 1, respectively.
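The one-step ARIMA(2,1,1) forecast can be sketched by hand to show what d = 1 differencing and the AR/MA recursion do. The coefficients and series below are purely illustrative; in practice the model would be fitted, e.g., with statsmodels.

```python
import numpy as np

def arima_211_forecast(y, phi1, phi2, theta1, eps_prev):
    """One-step ARIMA(2,1,1) forecast with given (illustrative) coefficients.

    Works on the first-differenced series dy_t = y_t - y_{t-1}:
        dy_hat = phi1 * dy[-1] + phi2 * dy[-2] + theta1 * eps_prev
    and integrates back (d = 1): y_hat = y[-1] + dy_hat."""
    dy = np.diff(y)                                  # the differencing step
    dy_hat = phi1 * dy[-1] + phi2 * dy[-2] + theta1 * eps_prev
    return y[-1] + dy_hat

# usage: differences of the toy series are 1.0, 1.5, 1.0
y = np.array([10.0, 11.0, 12.5, 13.5])
y_hat = arima_211_forecast(y, phi1=0.5, phi2=0.2, theta1=0.1, eps_prev=0.0)
# dy_hat = 0.5*1.0 + 0.2*1.5 = 0.8 -> y_hat = 13.5 + 0.8 = 14.3
```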
Output layer: A dropout layer is added before the output layer to deactivate neurons (deactivation rate 0.1) and prevent overfitting in the high-frequency part. After dropout, the data dimension is (32); a fully connected layer then yields the high-frequency load prediction result, while the low-frequency load prediction result is obtained through the ARIMA layer. The two results are then combined to obtain the final load prediction value.