5.1. Factors Involved in CO2 Release and Dataset Selection
Considering the unique characteristics of CO2 micro/nanobubble release, data were collected within 2 hours of spraying CO2 micro/nanobubble water in the experimental environment. According to the experimental method described above, 12,600 datasets were obtained. Two methods were used to divide the datasets.
Dataset 1: Within 2 hours of spraying CO2 micro/nanobubble water, the CO2 concentration was monitored by sensors at three different spatial distances. The dataset included three parameters: spatial distance, time after spraying, and CO2 concentration. The time-series datasets of CO2 concentration were then built from the calculated spatiotemporal correlation coefficients. The first 80% of the CO2 concentration time-series data were used as the training set to establish the ARIMA time-series model, which predicts the CO2 concentration coupled with spatiotemporal characteristics. The remaining 20% were used as the validation set to test the prediction performance.
Dataset 2: The CO2 concentration data predicted by the ARIMA model coupled with spatiotemporal characteristics were used as the actual values, and the other four parameters were used to construct a dataset for the neural network prediction model. The datasets were divided into training and test sets in an 8:1 ratio. The training set was used for model creation and training, whereas the test set was used to test the performance of the model.
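The chronological 80/20 split described for Dataset 1 can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `chronological_split` and the synthetic readings are assumptions. The data are not shuffled, because a time-series model must be validated on observations that follow the training period.

```python
def chronological_split(series, train_fraction=0.8):
    """Split a time series into a leading training part and a trailing validation part."""
    if not 0.0 < train_fraction < 1.0:
        raise ValueError("train_fraction must lie strictly between 0 and 1")
    cut = round(len(series) * train_fraction)
    return series[:cut], series[cut:]


# Hypothetical 1-min CO2 readings over 2 h (120 samples); values are illustrative only.
readings = [400.0 + 0.5 * t for t in range(120)]
train, valid = chronological_split(readings)
print(len(train), len(valid))
```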
5.2. Simulation Parameters
The process of selecting the model parameters and experimental platform used in this study was as follows:
(1) ARIMA model
Step 1: Calculate the spatiotemporal correlation coefficients between the spaces at distances of 0.6 m and 0.9 m and the target space at 0.3 m. The results are shown in Figure 5. When the time-delay parameter of the two nontarget spaces is 60 s, their correlation coefficients with the target space are the largest, namely 0.895 and 0.837, respectively. Therefore, the collection interval of the CO2 concentration was set to 1 min, and the CO2 concentration time-series dataset was constructed by coupling with the 0.3 m space.
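The search for the time delay that maximizes the correlation in Step 1 can be sketched as below. This is only an illustration under stated assumptions: the Pearson coefficient is assumed as the correlation measure, and the 10 s sampling period, the synthetic signals, and the function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def lagged_correlation(target, other, lag):
    """Correlation between target[t] and other[t + lag], with lag in samples (>= 0)."""
    if lag == 0:
        return pearson(target, other)
    return pearson(target[:-lag], other[lag:])


def best_lag(target, other, max_lag):
    """Return the lag with the highest correlation and the full lag-to-score table."""
    scores = {lag: lagged_correlation(target, other, lag) for lag in range(max_lag + 1)}
    return max(scores, key=scores.get), scores


# Synthetic signals: the nontarget sensor sees the target signal 6 samples later.
SAMPLE_PERIOD_S = 10  # hypothetical sampling period
target = [math.sin(0.3 * t) for t in range(200)]
other = [target[0]] * 6 + target[:-6]
lag, scores = best_lag(target, other, max_lag=12)
print(lag * SAMPLE_PERIOD_S, "s delay, r =", round(scores[lag], 3))
```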
Step 2: Make a preliminary judgment of the stationarity of the sequence by inspecting the plot of the CO2-release concentration change, and then apply the augmented Dickey–Fuller (ADF) test. The ADF test results are shown in Table 1.
The table shows that the sequence is non-stationary before differencing. After first-order differencing, the ADF statistic of the dataset is −24.44, which is clearly below the critical values at the 1%, 5%, and 10% significance levels. The ADF test therefore indicates that the data are stationary after first-order differencing and meet the stationarity requirement of the ARIMA model.
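The first-order differencing used in Step 2 (the d = 1 term of the ARIMA model) can be sketched as below. The function name and the sample values are illustrative assumptions. In practice, the differenced series would then be passed to an ADF routine such as `adfuller` in `statsmodels.tsa.stattools`, which is not shown here.

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times: y[t] = x[t] - x[t - 1]."""
    out = list(series)
    for _ in range(order):
        out = [b - a for a, b in zip(out, out[1:])]
    return out


# Illustrative trending series (not measured data).
concentration = [400, 402, 405, 409, 414]
first_diff = difference(concentration)            # removes a linear trend
second_diff = difference(concentration, order=2)  # removes a quadratic trend
print(first_diff, second_diff)
```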
Step 3: Use the autocorrelation and partial autocorrelation coefficients to estimate the model order. The results of the correlation analysis are shown in Figure 6 and Figure 7. To further identify the order of the model, numerous (p, q) combinations were set, and the AIC, BIC, and HQIC values were compared across these combinations. Figure 8 shows that the AIC value was the smallest (635.66) when the (p, q) combination was (4, 6); the minimum BIC value was 631.89, obtained with ARIMA (4, 1, 4); and the HQIC value was the smallest when the (p, q) combination was (3, 5). Taking the lowest AIC value as the primary criterion, (p, q) was set to (4, 6). For this combination, the BIC and HQIC are 640.77 and 647.60, respectively, and their differences from the corresponding minimum values are the smallest. Therefore, the experimental model is ARIMA (4, 1, 6).
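The order selection in Step 3 can be illustrated with the standard definitions of the three criteria, AIC = 2k − 2 ln L, BIC = k ln n − 2 ln L, and HQIC = 2k ln(ln n) − 2 ln L. The sketch below is not the authors' procedure: the log-likelihood values, the sample size, and the parameter count k = p + q + 1 are hypothetical, and the resulting numbers do not reproduce those in Figure 8.

```python
import math


def information_criteria(log_likelihood, n_params, n_obs):
    """Return AIC, BIC and HQIC for a fitted model."""
    aic = 2 * n_params - 2 * log_likelihood
    bic = n_params * math.log(n_obs) - 2 * log_likelihood
    hqic = 2 * n_params * math.log(math.log(n_obs)) - 2 * log_likelihood
    return {"AIC": aic, "BIC": bic, "HQIC": hqic}


# Hypothetical log-likelihoods of ARIMA(p, 1, q) fits, keyed by (p, q).
fits = {(3, 5): -308.0, (4, 4): -307.5, (4, 6): -305.0}
n_obs = 95  # hypothetical number of differenced observations

# Assumption: k counts the AR and MA coefficients plus the noise variance.
table = {
    pq: information_criteria(ll, pq[0] + pq[1] + 1, n_obs)
    for pq, ll in fits.items()
}
best_by_aic = min(table, key=lambda pq: table[pq]["AIC"])
print(best_by_aic, table[best_by_aic])
```

Because the BIC penalizes each parameter more heavily than the AIC when n is large, the two criteria can prefer different orders, which is why a tie-breaking rule such as the one used in the text is needed.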
Step 4: Check the ARIMA model by using the residuals to test the model quality. The Durbin–Watson statistic can be used to test a model when the regression model contains an intercept term, the explanatory variables are non-random, or the random disturbance term follows a first-order linear autocorrelation [34]. On this basis, a white-noise test was applied to the residual sequence, that is, the autocorrelation function plot of the residual sequence was checked to determine whether it fell within the confidence interval. Figure 9 and Figure 10 show the test results. The residual autocorrelations lie almost entirely within the confidence interval, indicating that the residual sequence is white noise and that the ARIMA model is effective.
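The residual checks in Step 4 can be sketched as below. This is an illustration under assumptions, not the authors' code: the approximate 95% bounds ±1.96/√n are assumed for the confidence interval, and the residuals are synthetic Gaussian noise. A Durbin–Watson value near 2 indicates no first-order autocorrelation.

```python
import math
import random


def durbin_watson(residuals):
    """Durbin-Watson statistic: sum of squared successive differences over sum of squares."""
    num = sum((b - a) ** 2 for a, b in zip(residuals, residuals[1:]))
    den = sum(e * e for e in residuals)
    return num / den


def acf(residuals, max_lag):
    """Sample autocorrelations at lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    dev = [e - mean for e in residuals]
    c0 = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t + k] for t in range(n - k)) / c0
        for k in range(1, max_lag + 1)
    ]


def fraction_within_bounds(residuals, max_lag, z=1.96):
    """Share of autocorrelations inside the approximate white-noise bounds +/- z / sqrt(n)."""
    bound = z / math.sqrt(len(residuals))
    values = acf(residuals, max_lag)
    return sum(abs(v) <= bound for v in values) / len(values)


# Synthetic residuals standing in for the ARIMA residual sequence.
rng = random.Random(0)
residuals = [rng.gauss(0.0, 1.0) for _ in range(200)]
dw_random = durbin_watson(residuals)
share = fraction_within_bounds(residuals, max_lag=20)
print(round(dw_random, 2), share)
```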
(2) BPNN model
In this hybrid model, the BPNN model describes the nonlinear relationship between the ambient temperature, humidity, equipment pressure, amount of bubble water sprayed, and residual CO2 concentration predicted by ARIMA. Therefore, these parameters were taken as the input values, and the CO2 concentration in a specific space coupled with space–time properties was taken as the output value to train the network. The neural network had four layers: an input layer, two hidden layers, and an output layer. The number of neurons in the input layer was equal to the number of model input parameters, that is, five, and the number of neurons in the output layer was one. The range for the number of neurons in each hidden layer was obtained from empirical formula (6), and the number of hidden-layer nodes giving the best fit was found by repeatedly testing values within that range.
$$h = \sqrt{m + n} + a \tag{6}$$
where $h$ is the number of neurons in a hidden layer, $m$ and $n$ represent the number of neurons in the input and output layers, respectively, and $a$ is an integer in the range (1, 10). A comparison of the training results shows that the training mean squared error reaches a minimum when the numbers of neurons in the two hidden layers are (7, 5). Figure 11 shows the structure of the neural network used in this experiment.
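The hidden-layer search range can be illustrated as below, assuming the widely used empirical rule h = √(m + n) + a, with a taken as an integer from 1 to 10 inclusive and the result rounded to the nearest integer. Both the rule and the rounding are assumptions of this sketch, as are the function names. The second helper counts the weights and biases of a fully connected 5-7-5-1 network.

```python
import math


def hidden_size_candidates(n_inputs, n_outputs, a_values=range(1, 11)):
    """Candidate hidden-layer sizes from h = sqrt(m + n) + a, rounded to integers."""
    base = math.sqrt(n_inputs + n_outputs)
    return [round(base + a) for a in a_values]


def count_parameters(layer_sizes):
    """Number of weights and biases in a fully connected feed-forward network."""
    return sum(i * o + o for i, o in zip(layer_sizes, layer_sizes[1:]))


candidates = hidden_size_candidates(n_inputs=5, n_outputs=1)
n_params = count_parameters([5, 7, 5, 1])
print(candidates, n_params)
```

Under these assumptions, both selected hidden-layer sizes, 7 and 5, fall inside the candidate range.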
The parameters for the model training are listed in Table 2.
Figure 12 shows the variation of the root-mean-square error (RMSE) with the number of iterations during the learning process for the training and test datasets. When the number of iterations reached 1000, the RMSE of the model was stable and reached its optimum value. The RMSEs of the training and test sets were $3.58 \times 10^{-5}$ and $3.07 \times 10^{-4}$, respectively.
5.3. Model Evaluation Index
This study adopted three commonly used statistical measures, the RMSE, mean absolute error (MAE), and correlation coefficient ($R^2$), to evaluate the predictive ability of the combined model, that is, the deviation between the predicted and actual values. The calculation formulas are as follows:
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|y_i - \hat{y}_i\right|$$
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}{\sum_{i=1}^{n}\left(y_i - \bar{y}\right)^2}$$
In these three formulas, $n$ is the number of samples, $y_i$ and $\hat{y}_i$ are the measured and model-predicted values, respectively, and $\bar{y}$ is the mean value of the sample data. The smaller the RMSE and MAE values, the higher the accuracy of the prediction model and the better its prediction performance. $R^2$ represents the goodness of fit between the predicted results and measured values; the closer $R^2$ is to 1, the better the independent variables explain the dependent variable in the regression model [25].
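The three evaluation indices can be computed as in the following sketch, which assumes the standard definitions of the RMSE, MAE, and R² (one minus the ratio of the residual sum of squares to the total sum of squares). The function names and sample values are illustrative and are not taken from the experiment.

```python
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between measured and predicted values."""
    return math.sqrt(sum((y - p) ** 2 for y, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error between measured and predicted values."""
    return sum(abs(y - p) for y, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    """Goodness of fit: 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))
    ss_tot = sum((y - mean) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


# Illustrative values only.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(rmse(measured, predicted), mae(measured, predicted), r_squared(measured, predicted))
```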